Discovery of topic flows of authors

With the increase in the number of Web documents, the number of proposed methods for knowledge discovery on Web documents has increased as well. The documents do not always provide keywords or categories, so unsupervised approaches are desirable, and topic modeling is such an approach for knowledge discovery without using labels. Further, Web documents usually carry time information such as publication years, so knowledge patterns over time can be captured by incorporating this information. Such temporal patterns of knowledge can be used to develop useful services such as a graph of research trends, finding authors similar to a particular author (potential co-authors), or finding top researchers in a specific research domain. In this paper, we propose a new topic model, the Author Topic-Flow (ATF) model, whose objective is to capture the temporal patterns of research interests of authors over time, where each topic is associated with a research domain. The state-of-the-art model, the Temporal Author Topic model, has the same objective as ours, but it computes the temporal patterns of authors by combining the patterns of topics. We believe that such 'indirect' temporal patterns will be poorer than the 'direct' temporal patterns of our proposed model. The ATF model allows each author to have a separate variable that models the temporal patterns, so we call this a 'direct' topic flow. The design of the ATF model is based on the hypothesis that 'direct' topic flows will be better than 'indirect' topic flows. We support this hypothesis by a structural comparison between the two models and show the effectiveness of the ATF model with empirical results.


Introduction
As the number of Web documents increases exponentially, it becomes important to develop methods to extract useful information or knowledge from the documents. There are many knowledge discovery problems, one of which is the discovery of academic research interests. The discovery of research interests may give an insight into research trends during a particular period, and may further help researchers make wise decisions about their future research topics.
It is important to note the difference between the discovery of academic interests and the identification of experts [7]. The task of discovering academic interests is to find people who write about particular topics, whereas expert identification is to find the most skilled people for particular research topics. The discovery of academic interests therefore assumes that an author writes much about a particular topic when the author is interested in that topic. If the academic interests are plotted along the time sequence, the result is a trend of academic interests. The trend of academic interests can be used to develop useful services such as finding authors similar to a particular author (potential co-authors), finding top researchers in a specific research domain, or a graph of research trends. For example, a research trend graph is provided by Microsoft Academic Research, as shown in Fig. 1. When we pick the research topic Networks and Communications, the dominant author list is shown at the bottom. By navigating such a research trend graph, we can obtain useful information about questions such as 'Which research area is hot nowadays?', 'Who is the expert in this area?', or 'Will this research area rise again?'.
There are three practical factors to consider in the task of discovering research interests from academic Web documents. First, academic documents typically do not have consistent labels or categories. Although each academic document provides keywords and categories that it belongs to, these keywords and categories are not standardized across other documents, conferences, or journals. The inconsistent keywords and categories make the discovery of academic interests more difficult. Second, it is necessary to capture latent semantic structures (i.e., research topics) that are common across all the documents. Each academic document may contain multiple research topics, so every topic should be shared by all the documents. Third, academic Web documents are dynamic, so research topics rise or fall as time goes by. The time factor conveys temporal patterns of research interests, which is important information. To address these factors, clustering techniques have been investigated [15]. In this paper, we adopt a topic modeling approach, which is one such technique, where each topic is regarded as a specific research domain. When we use a word as a feature, each document can be represented as a sequence of words. This representation, however, is not practical because the number of possible sequences of unique words grows exponentially as the size of each document increases. Therefore, the order of words is usually ignored and only the word frequencies are used, which is called the bag-of-words (BOW) method. Although this method loses the position information of words, it has shown good performance with various types of data. The topic modeling approach also adopts the BOW method. Each topic or cluster obtained from the topic modeling approach conveys latent semantic meaning across the documents. When the number of unique words, called the vocabulary size, is V, a topic is represented as a V-dimensional vector of weighted unique words. Within each topic, each unique word has a weight, and unique words with greater weights are more representative of the topic. By looking at the top representative words of each topic, humans can interpret and give a name to the topic. The word weights in each topic sum to 1, so each topic is a probability distribution over the set of unique words, and different topics represent different semantic meanings by assigning different weights to the unique words. For example, if there are only three unique words {music, film, rock} and the top two words of a particular topic are {film, music}, then we can interpret the topic as OST music. If the top two words of another topic are {music, rock}, then we might interpret the topic as rock music. In this paper, we regard each topic as a research domain, so the topic proportions can be interpreted as the interests in the domains. Further, if the topic proportions are listed according to time information, they form a temporal pattern, or trend, of research interests. Our proposed topic model captures such temporal patterns of each author from academic Web documents.
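The bag-of-words representation and the topic-as-distribution idea described above can be sketched as follows (a minimal illustration reusing the three-word vocabulary {music, film, rock} from the example; the weights are hypothetical):

```python
from collections import Counter

# A tiny vocabulary; word order in the document is discarded.
vocab = ["music", "film", "rock"]
doc = "music film music rock music".split()

# Bag-of-words: the document becomes a vector of word counts over the vocabulary.
counts = Counter(doc)
bow = [counts[w] for w in vocab]
print(bow)  # -> [3, 1, 1]

# A topic is a V-dimensional distribution over the same vocabulary (weights sum to 1).
topic = {"film": 0.5, "music": 0.4, "rock": 0.1}
top_two = sorted(topic, key=topic.get, reverse=True)[:2]
print(top_two)  # -> ['film', 'music'], interpretable as an "OST music" topic
```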
The rest of this paper is organized as follows. Section 2 provides preliminary background for understanding topic modeling approaches and related studies. Section 3 describes our motivation and approach in detail, and Section 4 presents experiments and results. Finally, Section 5 concludes this paper and discusses future work.

Background
Past efforts for discovering research interests can be divided into three categories, as described in Daud [6]. First, from a piece of text, research topics are identified by considering predefined features such as sentence length, forensic linguistics, and authorship attributions [8,12]. The predefined features have a tremendous impact on performance, so these approaches rely heavily on the features. One big limitation of this approach is that it requires huge effort to develop different feature sets for different types of data. Second, from explicit connections between authors (e.g., co-writing), arbitrary relationships between the authors can be extracted in the form of a graph structure [11,21,30]. This approach usually does not utilize the content of documents, so it is not likely to exactly capture the research interests of the authors. For example, if two authors A and B wrote a paper together, the two authors might have similar research interests. However, this does not help to answer the question 'What is the most favorite research topic of A?'. Moreover, this approach is not applicable to authors who usually write papers alone. Third, from the contents of documents, arbitrary semantic structures can be extracted. As the number of documents without annotations (i.e., unstructured texts) is growing exponentially, unsupervised methods are preferred [5,23,31]. Topic modeling is one such method, and it captures the latent semantic structures across the documents. Many topic models and related studies have been proposed [19,26,28], mainly motivated by the probabilistic latent semantic analysis (PLSA) model [14] or the latent Dirichlet allocation (LDA) model [3]. For instance, some models extract topics from the perspective of authors (Mimno and McCallum [20], Rosen-Zvi et al. [24], Steyvers et al. [25]). These models commonly assume that authors have topic distributions. In Chang et al. [4] and Newman et al. [22], topics are extracted
from the perspective of entities, where the entities are commonly supposed to have topic distributions. These models, however, do not consider the time dimension, so they suffer from a topic exchangeability problem. Assume that one uses the AT model [24] to identify the academic trends of a given author over two years. The AT model simply assumes that each author has a topic distribution that carries no time information or temporal patterns. Thus, the model has to be applied to each year separately. In other words, there will be two AT models for the two years, and two sets of topics φ^{first year} and φ^{second year}. It should be noted that the tth topic φ^{first year}_t of the first year will not be equivalent to the tth topic φ^{second year}_t of the second year. In the worst case, the tth topic of the first year may not exist at all in the second year. This inconsistency of topics is called the exchangeability problem [6].
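The exchangeability problem can only be worked around by post-hoc matching, e.g., aligning each topic of one year with its most similar topic of the next year. A minimal sketch of such matching via cosine similarity over the V-dimensional topic vectors (the toy topic values are hypothetical):

```python
import math

def cosine(p, q):
    # Cosine similarity between two V-dimensional topic vectors.
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm

# Hypothetical topics from two separately trained models; index order is arbitrary.
topics_first_year = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
topics_second_year = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]

# For each first-year topic, find the most similar second-year topic.
matches = [max(range(len(topics_second_year)),
               key=lambda j: cosine(t, topics_second_year[j]))
           for t in topics_first_year]
print(matches)  # -> [1, 0]: the tth topic of one year is NOT the tth of the next
```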
We propose a new topic model to obtain the research interests of authors without the drawbacks of the exchangeability problem. As each topic model has a unique generative structure that represents its own hypothesis, it is helpful to discuss the existing salient models related to our proposed model.

Latent Dirichlet allocation (LDA) model and Author Topic (AT) model
Figure 2 shows the graphical representations of the LDA model [3] and the Author Topic (AT) model [24]. Given D documents, the LDA model hypothesizes that each document is written by an iterative process of N_d steps, where each step generates a single word. As the N_d-dimensional word sequence is observable, it is represented by the shaded node w. For each nth word, the topic variable z is sampled from the topic distribution θ of the dth document, where θ is derived from the hyperparameter α. The nth word is also derived from the T topics φ, where the topics are derived from the hyperparameter β. The hyperparameters α and β are usually given by a human. Assuming there are V unique words in the documents, every topic φ is a V-dimensional distribution over the vocabulary words. In other words, every topic is a vector of the same length, but different topics have different distributions. It is important to note that a topic φ is a probability distribution over the unique words, while the topic distribution θ is a probability distribution over the topics. This implies that each document can be represented by its topic distribution θ rather than by a distribution over the unique words. As the number of topics T is usually much smaller than the vocabulary size V, the topic distribution θ makes it easy to compute a distance or similarity between documents.
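The per-document generative process of the LDA model described above can be sketched as follows (toy sizes; Dirichlet and categorical sampling via `numpy`; all values are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
V, T = 5, 2          # vocabulary size and number of topics (toy values)
alpha, beta = 0.1, 0.1

# Topics phi: T distributions over the V unique words, drawn from Dirichlet(beta).
phi = rng.dirichlet([beta] * V, size=T)

def generate_document(n_words):
    # Each document has its own topic distribution theta ~ Dirichlet(alpha).
    theta = rng.dirichlet([alpha] * T)
    words = []
    for _ in range(n_words):
        z = rng.choice(T, p=theta)    # sample a topic for this word position
        w = rng.choice(V, p=phi[z])   # sample a word from that topic
        words.append(int(w))
    return theta, words

theta, words = generate_document(10)
```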
The LDA model hypothesizes that the word sequence is generated from the topics, where each document has a topic distribution. While the LDA model captures latent patterns from the documents' point of view, the AT model extracts patterns from the authors' point of view. That is, as shown in Fig. 2, the AT model hypothesizes that there is a topic distribution θ_a for every ath author rather than for every dth document. If the total number of authors is A, then there are A topic distributions of the authors. For each document, the author list a_d and the word sequence w are observable, so they are represented as shaded nodes. For each nth word, the author variable a is sampled from the author list a_d. The topic variable z is then sampled from the topic distribution θ of the sampled author a. The nth word is generated by the same process as in the LDA model. The biggest difference between the LDA model and the AT model is the position of the topic distribution θ. Thus, the hypothesis of the AT model differs from that of the LDA model, and this difference makes each of them best suited to different types of data. To discover the research interests of authors, it is clear that the AT model is better suited than the LDA model. Figure 3 shows sample topics of five authors obtained from the AT model. For example, the most prominent topic of the author Jordan can be interpreted as mixture of experts.

(Fig. 2: Left, a graphical representation of the latent Dirichlet allocation (LDA) model [3]; right, a graphical representation of the Author Topic (AT) model [24]. Fig. 3: Sample topics of five authors obtained from the AT model.)

Dynamic Topic (DT) model, Sequential LDA (S-LDA) model, and Sequential Entity Group Topic (SEGT) model
Although the LDA model and the AT model are capable of capturing latent semantics across documents, both have the limitation that they do not consider temporal patterns. Temporal patterns can be important for tasks such as trend analysis and action recognition. There are topic models that capture the temporal patterns of topics [2,9,10,16]. Blei and Lafferty proposed the Dynamic Topic (DT) model [2], which captures the evolution of topics by chaining them over time with a Gaussian model. The topics are obtained from sequentially organized data, and the topics change over time, e.g., over years. This implies that the research topics of different years will be different, which in turn requires a human to interpret the topics of every year. It also means that it is not possible to obtain the rise and fall of a particular topic (i.e., a research trend). If the topics are not changed, then the flow of a particular topic can be obtained by concatenating the proportions of the corresponding topic over every time span (i.e., year). Assume that there are documents with three time tags 2010, 2011, and 2012, where each document has only one time tag. A T-dimensional topic distribution is obtained for every year, where the topics are the same for every year. If the topic distributions are aligned and ordered by the time tags, the result is a T × 3 (time span) matrix, where each tth row is the temporal pattern of the tth topic. We call this temporal pattern a topic flow in this paper.
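The construction of the T × (time span) flow matrix described above can be sketched as follows (the per-year topic distributions are hypothetical; the time tags are those of the example):

```python
import numpy as np

# Per-year topic distributions over the SAME T = 3 topics (hypothetical values).
theta_by_year = {
    2010: np.array([0.5, 0.3, 0.2]),
    2011: np.array([0.2, 0.5, 0.3]),
    2012: np.array([0.1, 0.2, 0.7]),
}

years = sorted(theta_by_year)
# Align the distributions by time tag: a T x (time span) matrix,
# where row t is the temporal pattern (topic flow) of the tth topic.
flows = np.column_stack([theta_by_year[y] for y in years])
print(flows.shape)  # -> (3, 3): T topics by 3 years
print(flows[2])     # -> [0.2 0.3 0.7]: the 3rd topic rises over time
```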
The Sequential LDA (S-LDA) model [10] was proposed to capture topic flows. This model is designed based on the hypothesis that each segment (time span) is influenced by its previous segment. The hypothesis is represented by a nested extension of the two-parameter Poisson-Dirichlet process (PDP), which essentially applies a kind of smoothing effect to adjacent segments. In other words, it provides a smoothed version of topic flows under the assumption that the topic distributions do not rise and fall rapidly. However, the topics obtained by the S-LDA model are not from the authors' point of view.
The Sequential Entity Group Topic (SEGT) model [16] was proposed to capture topic flows from the entity's (or entity group's) point of view. It is basically an extension of the S-LDA model, so it uses the nested extension of the two-parameter PDP to model the temporal patterns. The biggest difference between the SEGT model and the S-LDA model is the position of the topic distributions: the SEGT model allows entities or entity groups to have topic distributions, while in the S-LDA model the documents have them. Although the results of the S-LDA and SEGT models were quite impressive, they share the limitation that they are designed to capture topic flows within only one document. That is, their authors assume that the smoothing effects between adjacent segments are valid only within a single document.

Topics Over Time (TOT) model and Temporal Author Topic (TAT) model
To obtain topic flows from a set of documents, Wang and McCallum [29] proposed the Topics Over Time (TOT) model. The model has a time stamp variable y which is generated from the topic-specific beta distributions ψ, alongside the words generated from the topic-specific multinomial distributions φ. Its limitation is that it generates the topics from the perspective of the documents. To obtain the topic flows of authors from a set of documents, the Temporal Author Topic (TAT) model [6] was proposed as a variation of the Author Conference Topic (ACT1) model [27].
The graphical representations of the TOT model and the TAT model are depicted in Fig. 4. The biggest difference between the two models is the position of the topic distribution θ. In the TOT model, each document has a topic distribution, which implies that the topics are obtained from the perspective of the documents. On the other hand, each author has a topic distribution in the TAT model, which means that the topics are generated from the authors' point of view. The TAT model is basically an extension of the AT model. The word sequence generation process of the TAT model is the same as that of the AT model, except for the additional variable ψ. The meaning of the notation of the TAT model is summarized in Table 1. Note that the number of variables φ is the same as the number of variables ψ. That is, there are T topics, each of which is a distribution over the V unique words, while each time span distribution ψ is a distribution over the time spans. Thus, ψ represents the flow (i.e., the rise and fall) of every tth topic. The TAT model is the model most similar to ours, and its objective is the same as ours.
We believe that the TAT model has a limitation in that it uses only one set of variables ψ to represent the flows of all topics. This implies that the topic flows are shared by every author, which will lead to wrong results for authors whose topic flows are very different from those of many other authors. Moreover, to compute the topic flows of authors, it is required to combine the temporal patterns of the topics. Such topic flows are called indirect topic flows in this paper. On the other hand, our proposed model allows each author to have a topic distribution for each time span, so the topic flows of authors are directly represented by the topic distributions. We call these direct topic flows in this paper. The proposed model is designed based on the hypothesis that directly modeling topic flows will be more effective than indirect topic flows.

Finding topic flows of authors over time
In this section, we clarify the definition of the problem to be solved and state our contributions. Then we describe the newly proposed topic model in detail.

Problem definition and contributions
The problem is to discover the research trends of authors. If we regard a research field as a topic in topic modeling, then the problem becomes the task of finding the topic flows of authors. If an author a writes many papers on a particular topic t in year y, then the author will have a larger topic proportion for topic t in that year. Likewise, if the author writes only a few papers on topic t in year y + 1, then the topic proportion for that year will decrease. The two consecutive topic proportions can be concatenated into a flow of the topic t.
The topic flows of authors can be used to answer the following questions: 'Who are the authors who wrote most about topic Z in year Y?', 'Which topics did author P write most about in year Y?', and 'Can we see the temporal patterns of interests, e.g., the topic flows of author P over the past 5 years?' The TAT model [6] addressed the same problem as ours, but we demonstrate with extensive experimental results that the proposed model answers the above questions more effectively. The contributions of our work are as follows: (1) a new topic model to obtain the topic flows of authors without suffering from the exchangeability problem; (2) experimental verification of the effectiveness of the proposed model by comparison with the TAT model on a real-world dataset.

Author Topic-Flow (ATF) model
Motivated by the hypothesis that direct topic flows will be more effective than indirect topic flows, we designed a new topic model called the Author Topic-Flow (ATF) model. The graphical representation of the ATF model is shown in Fig. 5, and the meaning of the notation is given in Table 2.
To effectively explain the structure of the ATF model, we here clarify three big differences between the ATF model and the TAT model. First, the position of the time span distribution ψ is different. Both models utilize the time span distribution to capture the temporal patterns of topics, so the position of ψ determines how the temporal patterns of topics are represented. The TAT model says that each topic has a time span distribution, which means that each topic has a flow or a temporal pattern. A higher proportion ψ_t of the tth topic in the yth time span implies that there are more documents about the tth topic in the yth time span, and vice versa. It is worth noting that the topic distributions of every author contribute to the global time span distribution, so the time span distribution ψ is globally shared by all the authors. That is, it makes many authors have topic flows similar to each other, which is obviously a wrong result. Moreover, it is required to combine the topic distribution and the temporal patterns of the topics in order to indirectly obtain the topic flow of a particular author. On the other hand, the ATF model says that each author has a time span distribution ψ and a topic distribution θ for every time span (e.g., year). The ATF model thus allows each author to have a different topic flow, which can be directly used as the topic flow of the author. Second, the number of topic distributions θ is different. In the TAT model, each author has only one topic distribution, while in the ATF model each author has a topic distribution for every time span. Together with the first difference, this allows each author to have a different topic flow.

(From Table 2: w_dn is the observed nth word in the dth document; w is the vector of all observed words; C^{AYT}_{ayt} is the number of words assigned to the tth topic and the ath author within the yth year; C^{AY}_{ay} is the number of words assigned to the ath author within the yth year; C^{TV}_{tv} is the number of times the vth unique word is assigned to the tth topic over all documents.)
Third, in the ATF model, every author a has the prior variables α_a and γ_a, which carry prior information about the topic distributions and the time span distribution, respectively. This helps the model avoid an overfitting problem that stems from the restriction that every future document has the same topic distribution as was seen in the training documents [3].
One may ask why we do not also move the topics φ into each author. If each author had its own topics φ, then different authors would have different topics. That is, the exchangeability problem would appear between the authors, making it impossible to measure a similarity or a distance between authors. To avoid the exchangeability problem, the topics φ should be shared by all the authors and all the time spans. The ATF model can be described more formally by the following generative process for each word position n of the dth document: (i) given the author list a_d, sample an author a_dn uniformly from a_d; (ii) given a_dn and the document's time span y_d, sample a topic z_dn from Multinomial(θ_{a_dn, y_d}); (iii) given z_dn, generate a word w_dn from Multinomial(φ_{z_dn}).
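The per-word generative structure of the ATF model can be sketched as a toy simulation as follows (the topics φ are shared globally while θ is per author and per time span; all sizes and values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
A, Y, T, V = 2, 3, 4, 6   # authors, time spans, topics, vocabulary (toy sizes)

phi = rng.dirichlet([0.1] * V, size=T)          # topics shared by ALL authors/years
theta = rng.dirichlet([0.1] * T, size=(A, Y))   # theta[a, y]: per-author, per-year

def generate_word(author_list, y):
    a = rng.choice(author_list)        # i.   sample an author from the author list
    z = rng.choice(T, p=theta[a, y])   # ii.  sample a topic from theta[a, y]
    w = rng.choice(V, p=phi[z])        # iii. sample a word from topic phi[z]
    return int(a), int(z), int(w)

a, z, w = generate_word(author_list=[0, 1], y=2)
```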
In the ATF model, each document has an observable time span y, which intuitively makes sense because each research article typically has only one time tag. The TAT model, in contrast, has a time span for every word, which is less natural. The probability of generating the nth word w_dn with year j given an author g of the dth document is represented as

P(w_dn, y = j | a = g) = ψ_{gj} Σ_{t=1}^{T} θ_{gjt} φ_{t, w_dn},   (1)

where the meaning of the notation is described in Table 2. As the exact computation of the parameters is intractable, we use a collapsed Gibbs sampling algorithm, one of the MCMC algorithms, to approximate the parameters [1,13]. For each step of the Markov chain, the latent topic assignment z_dn and author assignment a_dn of the nth word in the dth document are drawn from the conditional posterior distribution of Eq. (2), where the notation is as in Table 2, with the minor exception that C^{AYT}_{ayt}, C^{AY}_{ay}, and C^{TV}_{tv} in that expression exclude the nth word. The ATF model has three parameters: the topic distributions θ, the word distributions (topics) φ, and the time span distributions ψ. To obtain the parameters, we keep track of a V × T (word by topic) count matrix and a T × Y × A (topic by time span by author) count matrix. From these count matrices, the three parameters can be obtained as

θ_{ayt} = (C^{AYT}_{ayt} + α) / (C^{AY}_{ay} + Tα),  φ_{tv} = (C^{TV}_{tv} + β) / (Σ_{v'} C^{TV}_{tv'} + Vβ),  ψ_{ay} = (C^{AY}_{ay} + γ) / (Σ_{y'} C^{AY}_{ay'} + Yγ).

The time span distribution ψ is not associated with the two latent variables a and z, so it can be computed exactly without iterative approximation. This implies that the time span distribution in Eq. (2) does not change during the approximation, so each step of the Markov chain can be made faster by using a fixed time span distribution. The probability of the ath author given the tth topic and the yth year can be obtained from the joint probability as

P(a | t, y) = P(a, t, y) / Σ_{a'=1}^{A} P(a', t, y).   (6)

In terms of the joint probability, we compare the ATF model with the TAT model in order to emphasize the structural differences between them, which result in the performance gaps that we show in the experiments. The joint probability of the TAT model is

P(a, t, y) = P(t, y | a) P(a) = P(t | a) P(y | a) P(a),   (7)

where P(y | a) = Σ_{z=1}^{T} P(y | z) P(z | a), P(y | z) = ψ_{zy}, P(z | a) = θ_{az}, and P(a) = C^{AY}_{ay} / Σ_{a'=1}^{A} C^{AY}_{a'y}. The joint probability of the ATF model is

P(a, t, y) = P(t | a, y) P(a, y) = P(t | a, y) P(y | a) P(a),   (8)

where P(y | a) = ψ_{ay}, P(t | a, y) = θ_{ayt}, and P(a) = C^{AY}_{ay} / Σ_{a'=1}^{A} C^{AY}_{a'y}. In the TAT model, the conditional probability P(t | a, y) of the tth topic given the ath author and the yth year can be computed from the joint probability P(a, t, y). However, to compute this joint probability, the time span distribution of each author P(y | a) must be 'indirectly' obtained by combining the time span distribution of every topic with the topic distribution of the author. This causes the poorer performance of the TAT model on the task of discovering the topic flows of authors. Moreover, the topic flow of the ath author is generated by combining a set of such conditional probabilities, which means that the topic flow of the author is smoothed by the global time span distributions of the topics. This also means that every author tends to have similar temporal patterns of research interests, which is obviously a wrong result.
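The structural difference captured by Eqs. (7) and (8) can be made concrete with toy numbers: in the TAT model the per-author time pattern is a mixture of the global per-topic patterns, while in the ATF model it is a free per-author parameter. All values below are hypothetical:

```python
import numpy as np

T, Y = 2, 3

# TAT: global per-topic time span distributions psi[z] and per-author theta[a].
psi_tat = np.array([[0.6, 0.3, 0.1],    # flow of topic 0
                    [0.1, 0.3, 0.6]])   # flow of topic 1
theta_a = np.array([0.8, 0.2])          # topic distribution of author a

# Indirect: P(y|a) = sum_z P(y|z) P(z|a) -- a mixture of the global flows.
p_y_given_a_indirect = theta_a @ psi_tat
print(p_y_given_a_indirect)   # -> [0.5 0.3 0.2]

# ATF: psi[a] is the author's OWN time span distribution, read off directly;
# it can deviate freely from the global patterns.
p_y_given_a_direct = np.array([0.1, 0.1, 0.8])
```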
In contrast, the ATF model has the parameter θ_ayt, which itself is the conditional probability P(t | a, y), and the parameter ψ_ay, which is the time span distribution of the ath author. Thus, the topic flow of each author can be 'directly' obtained from θ_ayt. We believe that this difference between the two models makes a huge impact on the results. To effectively demonstrate the structural differences, the graphical representations of the three models (the AT, TAT, and ATF models) are depicted in Fig. 6, where lines of different colors denote different distributions. The generative process of the AT model can be seen as two steps: the first step samples the author and the topic variable z, and the second step samples the word. The generative process of the TAT model has an additional step of sampling the year (time span variable). The difference from the AT model is highlighted with red dotted lines. Note that the variable ψ, which represents the temporal patterns of topics, is a global variable. This implies that the temporal patterns of topics are shared by every author. On the other hand, in the generative process of the ATF model, each author has a separate variable ψ. This allows the temporal patterns of topics of a particular author to be independent of the temporal patterns of other authors. We believe that this structural difference makes our proposed model better at capturing the topic flows of authors, and we validate it with experimental results.

Description of dataset and environment
The dataset is a set of 816 research articles on computer science, collected from the Microsoft Academic Research site. The dataset was collected by searching for articles of some key authors and articles of their co-authors. It covers artificial intelligence, algorithms, databases, natural language processing, information retrieval, machine learning, networks, real-time systems, and so on. The time spans of the dataset were the publication years from 2007 to 2011. The articles contained only abstracts, not the full texts. We removed stop-words, punctuation, and numbers. All words were lowercased, and we performed stemming with the Porter stemmer. Words and authors that appeared fewer than three times in the dataset were removed. Sentences were delimited by '.', '?', '!', and newline characters. The vocabulary size was V = 8300, the total number of words was 70,373, and the number of authors was A = 1127.
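The preprocessing pipeline described above can be sketched as follows. This is our reading of the description, not the authors' code: a real run would use NLTK's PorterStemmer and full English stop-word list, while the stemmer and stop-word set below are minimal stand-ins:

```python
import re
from collections import Counter

# Tiny illustrative stop-word subset (a real pipeline would use a full list).
STOP = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}

def crude_stem(word):
    # Placeholder for Porter stemming: strip a few common suffixes.
    for suf in ("ing", "es", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def preprocess(text):
    # Lowercase, drop punctuation and numbers, remove stop-words, then stem.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP]

def prune_rare(docs, min_count=3):
    # Remove words appearing fewer than `min_count` times across the dataset.
    freq = Counter(w for d in docs for w in d)
    return [[w for w in d if freq[w] >= min_count] for d in docs]

print(preprocess("Topic flows of the authors"))  # -> ['topic', 'flow', 'author']
```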
For a fair comparison, the parameters of the two models were initialized equally. We used symmetric hyperparameters α = 0.1, β = 0.1, and γ = 0.1. We varied the number of topics T and found that diverse topics with few redundancies were obtained when T = 50. Therefore, for all experiments except the author prediction test, we set T = 50. The number of Gibbs sampler iterations for both models was 1000.

Topic discovery
The purpose of this experiment was to assess the quality of the topics obtained from the ATF model by comparing them with the topics of the TAT model. We used the three criteria of Jo and Oh [17] for measuring the quality. First, the discovered topics should be coherent; in other words, they should be meaningful or comprehensible to people. Second, they should be specific enough to capture the details of the research interests in the dataset. Third, they should be those that are discussed the most in the dataset; in other words, the topics should sufficiently reflect the dataset.
Two sample topics are shown in Table 3, where the top 10 representative words were selected according to their weights in each topic. The topic names were manually determined by people in accordance with the top representative words. The topics obtained from the two models were similar. Specifically, for each topic, the list of top representative words of the ATF model was similar to that of the TAT model. Not only these two topics, but also the overall topics obtained from the ATF model were similar to those obtained from the TAT model. The topics obtained from both models were comprehensible to people and specific enough to capture the details of the research interests. For example, with respect to the topic Document searching, it was trivial for people to name the topic Document searching in accordance with the top representative words. Further, the topics of both models were generally among those discussed the most in the dataset. The topics Descriptive languages, Robot group control, and Document searching were discussed in the research fields of artificial intelligence, real-time systems, and information retrieval, respectively.

Topic-wise research interest over years
In this subsection, we demonstrate that the ATF model is effective in obtaining the topic-wise research interests of authors over years, which can be obtained using the probabilities of authors given a particular topic and a year, computed through Eq. (6). In other words, given a particular topic t, it determines how much each author is interested in topic t over the years. Therefore, the topic-wise research interest of authors over years can be thought of as the temporal patterns of research interests of authors from a particular topic's point of view.
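Under the ATF parameterization, the probability of an author given a topic and a year (Eq. (6)) amounts to normalizing the joint probability of Eq. (8) over authors. A sketch with hypothetical parameter values:

```python
import numpy as np

# Hypothetical ATF parameters for A=2 authors, Y=2 years, T=2 topics.
theta = np.array([[[0.9, 0.1], [0.5, 0.5]],
                  [[0.2, 0.8], [0.5, 0.5]]])   # theta[a, y, t] = P(t | a, y)
psi = np.array([[0.7, 0.3],
                [0.4, 0.6]])                   # psi[a, y] = P(y | a)
p_a = np.array([0.5, 0.5])                     # P(a)

def p_author_given_topic_year(t, y):
    # P(a, t, y) = P(t|a, y) P(y|a) P(a) (Eq. 8); normalize over authors (Eq. 6).
    joint = theta[:, y, t] * psi[:, y] * p_a
    return joint / joint.sum()

print(p_author_given_topic_year(t=0, y=0))  # author 0 dominates topic 0 in year 0
```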
A sample of the research interest patterns of five authors for the topic Descriptive languages is depicted in Fig. 7, where the patterns of different authors are plotted in different colors. Although the two graphs in the figure are obtained from the same dataset, they are significantly different. For instance, for the author Ian Horrocks, the TAT model says that his interest in Descriptive languages increased in 2008 but rapidly decreased in 2010, while the ATF model says that his interest decreased slowly between 2008 and 2010. For the author Bernardo Cuenca, the TAT model says that his research interest slowly decreased from 2008 to 2011, while the ATF model says that his research interest was highest in 2009. These differences imply that one of the models represents the topic-wise research interest over years better than the other. To determine which model is more effective, we manually counted the number of papers written by the authors for each year, as described in Table 4, where each digit in brackets is the number of papers about the topic Descriptive languages (the digits are the numbers of papers co-authored with Ian Horrocks). The plot obtained from the ATF model is more consistent with Table 4.

The prediction process for unseen documents is described in Fig. 11. We divided the dataset into 10 subsets and performed tenfold cross-validation. We assumed that unseen documents were written by the known authors of the training dataset, so we ignored authors who appeared only in unseen documents. We again compared the ATF model with the TAT model; each model was used to rank all the known authors for each unseen document given the publication year of the document. The ranking is done using the symmetric KL divergence between the topic distribution of the unseen document and the topic distributions of the authors. For each unseen document, we collected the gap between the predicted ranks and the ranks of the ground-truth authors, who are assumed to be at the first rank. The mean and standard deviation of the collected gaps are depicted in Fig. 12 for various numbers of topics from 10 to 90, where a lower mean and standard deviation are better. The performance improved as the number of topics T increased from 10 to 50, because the latent research interests were well discovered at T = 50. The performance declined, and the gap between the two models narrowed, when the number of topics T was greater than 50, because redundant topics were generated for T > 50. Both models performed best at T = 50, and the ATF model generally outperformed the TAT model. It is worth noting that author prediction is not the purpose of the ATF model; the ATF model is designed to grasp the temporal patterns of the research interests of authors.

For every model, we used the same machine with a dual-core 3.20 GHz CPU, 8 GB RAM, and a 500 GB HDD running Microsoft Windows 7.
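The ranking step described above can be sketched as follows; the smoothing constant `eps` and all variable names are illustrative, not from the paper:

```python
import numpy as np

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two topic distributions."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

def rank_authors(doc_topics, author_topics):
    """Rank authors by closeness of their topic distribution to the
    unseen document's topic distribution (smaller divergence first).

    doc_topics    : [T] topic distribution of the unseen document
    author_topics : [A, T] topic distribution of each known author
                    (for the document's publication year)
    Returns author indices, best match first.
    """
    divs = [sym_kl(doc_topics, a) for a in author_topics]
    return list(np.argsort(divs))

# Toy example: the document's topics are closest to author 1.
doc = np.array([0.7, 0.2, 0.1])
authors = np.array([[0.1, 0.1, 0.8],
                    [0.6, 0.3, 0.1],
                    [0.3, 0.3, 0.4]])
ranking = rank_authors(doc, authors)
```

The gap collected for each unseen document is then the position of each ground-truth author in `ranking`, since the ground-truth authors are assumed to be at the first rank.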
We used Java and Eclipse Juno for the implementation and experiments. For 10 to 90 topics, the parameter approximation time with a fixed number of iterations (e.g., 1000 iterations) of collapsed Gibbs sampling was measured and averaged. The result is shown in Table 12, which shows that the AT model takes the shortest time because its sampling function for parameter approximation is the simplest among the three models. The TAT model takes a little more time than the ATF model because the TAT model has to update the topic-year distribution at every sampling iteration.

Conclusion
The number of Web documents increases exponentially, which makes it necessary to develop systems or models that automatically capture latent patterns among the documents. The latent patterns typically change over time, composing topical flows or trends. In this paper, we proposed the Author Topic-Flow (ATF) model, which effectively discovers research trends from each author's point of view. The state-of-the-art model, the Temporal Author Topic (TAT) model, combines the temporal patterns of topics to compute the research trends of authors, so we refer to this as indirect topic flow. In contrast, our proposed model has a variable that directly represents the research trends of each author, so we refer to this as direct topic flow. That is, it allows each author to directly have a temporal pattern of research interests, while each author in the TAT model has only a topic proportion covering the whole time span. We performed empirical comparisons with the TAT model on a real-world dataset and showed that the ATF model is more effective and efficient in capturing the temporal patterns of the research interests of authors. We hope that this study will be helpful in other research areas, such as user-adapted Web content recommendation on mobile platforms [18] or out-of-domain detection in intelligent dialog systems.

Fig. 1 Sample of research trends obtained from Microsoft academic research

Fig. 4 (Left) A graphical representation of the Topics Over Time (TOT) model [29]; (right) a graphical representation of the Temporal Author Topic (TAT) model [6]

Fig. 5 A graphical representation of the Author Topic-Flow (ATF) model

1. For the tth topic:
   (a) Draw a word distribution φ_t from Dirichlet(β).
2. For the ath author:
   (a) Draw a time span distribution ψ_a from Dirichlet(γ_a).
   (b) For the yth time span:
       i. Draw a topic distribution θ_ay from Dirichlet(α_a).
3. For the dth document:
   (a) Generate a list of authors a_d and a time span y_d from Multinomial(ψ).
   (b) For the nth word:
       i. Choose an author a_dn from Uniform(a_d).
       ii. Given a_dn and y_d, choose a topic z_dn from Multinomial(θ_{a_dn, y_d}).
       iii. Choose a word w_dn from Multinomial(φ_{z_dn}).
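The generative process above can be sketched programmatically. The hyperparameter values and corpus sizes below are toy choices, and the way a single time span is drawn for a multi-author document is a simplification of step 3(a):

```python
import numpy as np

rng = np.random.default_rng(0)

T, A, Y, V = 3, 4, 5, 20            # topics, authors, time spans, vocabulary size
beta, gamma, alpha = 0.1, 0.5, 0.5  # toy Dirichlet hyperparameters

# 1. Per-topic word distributions: phi_t ~ Dirichlet(beta)
phi = rng.dirichlet(np.full(V, beta), size=T)

# 2. Per-author time span distributions psi_a ~ Dirichlet(gamma) and
#    per-author, per-year topic distributions theta_ay ~ Dirichlet(alpha)
psi = rng.dirichlet(np.full(Y, gamma), size=A)
theta = rng.dirichlet(np.full(T, alpha), size=(A, Y))

def generate_document(authors, n_words):
    """3. Generate one document for the given author list a_d."""
    # Simplification: draw the time span from the first author's psi
    # (the paper generates a_d and y_d jointly from Multinomial(psi)).
    y = rng.choice(Y, p=psi[authors[0]])
    words = []
    for _ in range(n_words):
        a = rng.choice(authors)                # author ~ Uniform(a_d)
        z = rng.choice(T, p=theta[a, y])       # topic ~ Multinomial(theta_ay)
        words.append(rng.choice(V, p=phi[z]))  # word ~ Multinomial(phi_z)
    return y, words

y, words = generate_document(authors=[0, 2], n_words=50)
```

The key structural point of the ATF model is visible in the shape of `theta`: each author has one topic distribution per time span, rather than a single topic proportion shared across all years as in the TAT model.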

Fig. 6 Graphical representations of the three models (the AT, TAT, and ATF models)

Fig. 10 Research interest patterns of two topics, Robot group control and Networks, for the author Christos Ampatzis, which were obtained from the TAT model (left) and the ATF model (right)

Table 3 Sample topics obtained from the two models

Table 6 Number of papers about the two topics, Robot group control and Networks, written by Christos Ampatzis in each year
The research interest patterns for two topics, Robot group control and Networks, of the author Christos Ampatzis are shown in Fig. 10, where the left figure is obtained from the TAT model and the right figure from the ATF model. The plots of the two models are significantly different. The ATF model says that the research interest in the topic Robot group control increases from 2008, while the TAT model says that it decreases from 2009. To determine which model gives the right result, we again manually counted the number of papers written by the author for each year, as described in Table 6. It is obvious that the plot of the ATF model is more consistent with Table 6. For instance, his research interest in the topic Robot group control grew from 2008, which is well depicted in the plot of the ATF model. The reason for the difference between the plots of the two models is the same as in the experiment in the previous subsection: in the TAT model, each author has only a topic proportion without a time span distribution, so its representation of interest patterns is poorer than that of the ATF model, in which each author has explicit research interest patterns.

Table 7 Top five authors who are interested in the topic Descriptive language, obtained from the TAT model

Table 8 Top five authors who are interested in the topic Descriptive language, obtained from the ATF model
We made a table of the top five authors according to the number of papers written by the authors for each year, as shown in Table 9. The top-ranked author in Table 9 is Ian Horrocks in every year, which may seem consistent with the result of the TAT model. If we look at the set of authors for each year, however, Table 9 is more consistent with the result of the ATF model, as there are more common authors between Table 8 and Table 9 in every year. Since each author in the TAT model has only a topic proportion covering all the years, the research interest of each author for a particular year is smoothed by the temporal patterns of topics, as described in Eq. (7). In other words, every author is more likely to share similar temporal patterns of research interests.
This makes the authors ranked in 2007 also rank highly in the other years, finally resulting in poor results. The ATF model allows each author to have a unique temporal pattern of research interest, so it is more sensitive to the variations of the research interests of each author. For example, in the year 2011, the author Frank Wolter appears in the result of the ATF model.

Table 10 Top three authors similar to Ian Horrocks, obtained from the TAT model, for each year

Table 11 Top three authors similar to Ian Horrocks, obtained from the ATF model, for each year
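This excerpt does not spell out how author similarity is computed; a plausible sketch, consistent with the symmetric KL divergence used in the author prediction experiment, compares the per-year topic distributions of authors (the `[A, Y, T]` array `theta` and all names are hypothetical):

```python
import numpy as np

def similar_authors(theta, target, year, k=3, eps=1e-12):
    """Return the k authors most similar to `target` in a given year,
    by symmetric KL divergence between per-year topic distributions.

    theta : [A, Y, T] per-author, per-year topic distributions
    """
    def sym_kl(p, q):
        p = (p + eps) / (p + eps).sum()
        q = (q + eps) / (q + eps).sum()
        return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

    t = theta[target, year]
    divs = [np.inf if a == target else sym_kl(t, theta[a, year])
            for a in range(theta.shape[0])]
    return [int(i) for i in np.argsort(divs)[:k]]

# Toy example: 3 authors, 1 year, 3 topics; author 0's nearest
# neighbor is author 1.
theta = np.array([[[0.8, 0.1, 0.1]],
                  [[0.7, 0.2, 0.1]],
                  [[0.1, 0.1, 0.8]]])
print(similar_authors(theta, target=0, year=0, k=2))
```

Under the TAT model, by contrast, each author has only a single topic proportion, so a per-year comparison must be derived indirectly through the temporal patterns of topics.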

Table 12 Time spent training the three models (seconds)
In contrast, the time span distribution ψ of the ATF model can be computed exactly without an iterative approximation. Thus, our proposed ATF model is not only more effective than the TAT model but also more efficient in terms of training time.