Introduction

A notable trend has been observed in library and information science (LIS): the inclusion of “information” as part of the discipline’s name. After the foundation of the first library school in 1887, the School of Library Science at the University of Pittsburgh added “information” to its name in 1964, becoming the School of Library and Information Science. By the 1990s, almost all former library schools had followed the University of Pittsburgh’s example (Hjørland 2018). In the twenty-first century, the diminishing use of “library” and related terms was found in LIS dissertations (Sugimoto et al. 2010) and a decreased interest in library management was found in LIS publications (Figuerola et al. 2017); the tendency today is to use the label “Information Science” alone (Olson and Grudin 2009). These shifts in nomenclature indicate that the research areas of LIS have changed substantially over time, and the research focus is shifting to informational issues.

To understand the development of the discipline and how its research topics have changed over time, many researchers have explored changes in research topics based on the literature of the field. Bibliometric methods are prevalent in such evaluation studies (Zhao and Strotmann 2008, 2014; White and McCain 1998; Chang et al. 2015). Content analysis has also been widely applied (Järvelin and Vakkari 1993; Koufogiannakis et al. 2004; Blessinger and Frasier 2007). Today, there is increased interest in using model-based approaches to explore the intellectual structure of a domain (Sugimoto et al. 2010; Liu et al. 2015; Figuerola et al. 2017). These approaches allow researchers to examine large document collections.

Many studies have consulted journal rankings in a single year to compile a journal list for a diachronic analysis. However, such data corpora might have limitations. First, highly cited journals identified in a single year may limit the topic spectrum for the period of study. Nearly all journals are biased towards a certain research area to some extent, which can often be inferred from the name of the journal. The preference of selected journals may heavily influence the results. Second, journals that gained researchers’ attention decades ago may no longer be the center of focus due to the rapidly changing environment. Emerging topics may not be fully represented in the analyses. Neglecting these issues may lead to research results that are not representative enough to capture topic changes in the domain.

In response to such limitations, this study applied an improved method for journal selection by consulting all available journal citation reports to generate a dynamic journal list. The analysis was divided into five periods covering the years 1996–2019. Correspondingly, five datasets that include the most influential journals for each period were included in the evolution analysis. Furthermore, to address the research gap that diachronic analyses are rarely conducted based on journal articles using model-based approaches, this study utilized the latent Dirichlet allocation (LDA) topic model to uncover the underlying topics in the text corpora. Many studies have confirmed that LDA can effectively cluster meaningful and interpretable topics from a large number of documents (Blei and Lafferty 2007; Yau et al. 2014; Suominen and Toivanen 2015). The central research questions are as follows: What research topics were addressed in LIS between 1996 and 2019? How have the research topics changed over time? By combining an innovative journal selection method and the LDA topic model, this study aims to contribute a new perspective on observing and understanding the research development in LIS.

Literature review

Studies of topic changes and intellectual structure in LIS can be divided into three groups according to the methods used: content analysis, bibliometric methods, and model-based approaches. This section organizes the relevant literature by method category. A structured review and comparison of the literature, with the essential properties of these studies, is provided in “Appendix 1”.

Content analysis

In content analysis, researchers identify the composition of research content and sort articles into classification schemes by analyzing a data corpus with a certain number of articles. Studies using content analysis to detect research development in LIS have either adopted the classification schemes of other researchers or have devised new schemes (Chang et al. 2015). This method is relatively dated and primarily appears in publications from the 1970s and 1980s (Tuomaala et al. 2014).

Blessinger and Frasier (2007) analyzed 10 influential journals using a combination of content analysis and citation analysis for the period 1994–2004. The study found that in 1994–2004, librarians were still mainly writing about the profession’s practical issues. In addition, they found that new technologies in information science, most notably the Internet, had a tremendous impact on almost every aspect of the profession during this decade. Tuomaala et al. (2014) conducted a content analysis of LIS evolution. They examined a total of 42 journals from the years 1965, 1985, and 2005. They identified the four most prominent research areas in LIS: information storage and retrieval, scientific communication, library and information-service activities, and information seeking. They further concluded that information retrieval was the most popular area of research from 1965 to 2005. The most significant changes in the investigated period were the decreasing interest in library and information-service activities and the growth of research about information seeking and scientific communication.

Bibliometric methods

Bibliometrics comprises various techniques, including keywords analysis, direct citation analysis, co-citation analysis, and bibliographic coupling analysis. Studies of the intellectual structure and development of LIS frequently use these techniques.

Keywords analysis

Onyancha (2018) investigated the evolution of LIS by tracking author-supplied keywords in research articles published between 1971 and 2015. The author found that LIS evolved from information systems design and management in the 1970s to encompass scientific communication, information storage and retrieval, information access, information and knowledge management, and user education in 2015.

Citation analysis

Larivière et al. (2012) presented an encapsulated history of LIS by examining approximately 96,000 papers in 61 journals over the field’s first hundred years (1900–2010). Their analysis of lexicon frequency and bibliometric indicators revealed two major structural shifts: in 1960, LIS changed from a professional field focused on librarianship to an academic field focused on information and use, and in 1990, LIS began to receive more citations from outside the field. The study of Åström (2007) examined the most-cited articles from 21 LIS journals to identify the changes in research fronts from 1990 to 2004. The study showed that the main fields in LIS are information seeking and retrieval (ISR) and informetrics. The author also found that changes in the discipline can be seen primarily within these two fields rather than in new fields entering the discipline. The study showed that the IR field had become ISR and that webometrics had grown considerably, to the extent that it has come to dominate LIS research over 2000–2004. Chang et al. (2015) analyzed keywords, bibliographic coupling, and co-citation to track changes in LIS research subjects during four periods between 1995 and 2014. By examining 580 highly cited LIS articles, they found that the two subjects “information seeking (IS) and information retrieval (IR)” and “bibliometrics” appeared in all four periods. However, they observed that the percentage of articles in which the topics appeared was decreasing for IS and increasing for bibliometrics.

Model-based approaches

Model-based approaches have frequently been employed to detect the intellectual structure of a scientific domain based on the aggregated literature. This approach enables researchers to examine a larger corpus of text data than content analysis and bibliometric methods.

Liu et al. (2015) investigated the intellectual structure of library and information science using the formal concept analysis (FCA) method. By analyzing the papers published in 16 prominent journals in the LIS domain from 2001 to 2013, the authors identified nine main LIS research themes: bibliometrics, scientometrics, and informetrics; citation analysis; information retrieval; information behavior; libraries; user studies; social network analysis; information behavior; and webometrics. Sugimoto et al. (2010) examined 3121 doctoral dissertations using an LDA model to explore the development of LIS from 1930 to 2009. They found that LIS topics have changed substantially over time. Nonetheless, some themes occurred in multiple periods, representing core areas of the field: library history, citation analysis, information-seeking behavior, information retrieval, and information use. The authors noted the diminishing use of the word “library” and related terms. Another study using LDA was conducted by Figuerola et al. (2017), who analyzed the titles and abstracts of academic publications to investigate significant trends and subdomains in LIS. They examined a total of 92,705 documents for the period 1978–2014 in the database LISA (Library and Information Science Abstracts). In the results, they identified 19 dominant topics, which were further clustered into four main areas: process, information technology, library, and specific areas of information application. Furthermore, they observed a notable growth in the specialized documentation for specific areas of activity (business, health, law, education, media, heritage), a decrease in the relative importance of libraries, and a constant but changing interest in information technology.

The variation of research methods and data-selection criteria makes it difficult to compare results. However, these studies exhibit the different characteristics of LIS research subjects, enabling researchers to observe the research trends in LIS from different perspectives.

Methodology

In this study, the author devised an improved method for journal selection by consulting all available JCRs for LIS. A dynamic journal list and five datasets were generated for further analysis. The author used LDA topic modeling to detect the underlying topics in the text corpus.

Journal selection and data collection

The author developed an improved data-selection process by consulting all available journal rankings in LIS from 1997 to 2018. The Journal Citation Reports (JCR) is an annual publication that provides the impact factors and rankings of journals based on citations. The journals with the highest impact factors are often considered core journals in the field and attract the attention of researchers. As a result, journals with high impact factors are often used to study disciplinary development and evolution. Since 1997, JCR has incorporated LIS as a discipline. Up to and including the 2018 report, there were 22 yearly reports for LIS. Based on these reports, the most influential journals for each period between 1996 and 2019 (1996–2000, 2001–2005, 2006–2010, 2011–2015, and 2016–2019) were identified. For each of the five periods, the occurrence of each of the top 20 journals was recorded. In this way, the nine journals with the highest occurrence were selected (Table 1). Articles were sourced from Scopus based on the journal list. For the topic analysis, the title, abstract, and keywords of each document were used. In total, 14,053 articles published from 1996 to 2019 were collected (Table 2). The datasets were further processed and analyzed by LDA topic modeling.
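The counting step described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function name `select_core_journals`, the toy rankings, and the alphabetical tie-breaking rule are all assumptions introduced here, and the journal names are placeholders rather than JCR data.

```python
from collections import Counter

def select_core_journals(yearly_rankings, years, top_n=20, keep=9):
    """Count how often each journal appears in the yearly top-N lists of a
    period and return the `keep` most frequent journals."""
    counts = Counter()
    for year in years:
        # Each yearly list is assumed to be ordered by impact factor, best first.
        counts.update(yearly_rankings.get(year, [])[:top_n])
    # Sort by descending count; ties are broken alphabetically (an assumption,
    # the article does not state how ties were handled).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [journal for journal, _ in ranked[:keep]]

# Hypothetical toy rankings for one period (placeholder names, not real data).
toy_rankings = {
    1997: ["Journal A", "Journal B", "Journal C"],
    1998: ["Journal A", "Journal C", "Journal D"],
    1999: ["Journal A", "Journal B", "Journal D"],
    2000: ["Journal B", "Journal A", "Journal E"],
}

core = select_core_journals(toy_rankings, years=range(1997, 2001), top_n=3, keep=2)
print(core)
```

With the study's settings, `top_n` would be 20 and `keep` would be 9, applied once per period.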

Table 1 Journal list
Table 2 Number of articles

LDA

As a result of the information explosion, new algorithmic tools are needed to understand, organize, and search information from large informational corpora. Topic modeling was designed to uncover hidden topical patterns in vast corpora. LDA is a probabilistic model proposed by Blei et al. (2003). The model analyzes the thematic structure of a corpus and can also perform topic clustering or text classification based on the topic distribution. Previous studies (Blei and Lafferty 2007; Yau et al. 2014; Suominen and Toivanen 2015) found that LDA performed well for understanding the rich underlying topical structure of a field.

LDA treats data as arising from a generative probabilistic process. This approach assumes that a document is composed of a group of words and that there is no sequential relationship between them. Therefore, it represents typical bag-of-words modeling. The intuition behind LDA is that documents include multiple topics, and so each document has its own probability distribution over topics. Each topic is depicted as a distribution over terms in a fixed vocabulary, with different topics represented by different probabilities of words within the vocabulary. The LDA topic model can be visualized by a graphical model (Fig. 1). The boxes are plates representing replications. The figure can be explained as follows: There are \(K\) topics in the collection. Each topic \(\beta_k\) is a multinomial distribution over the vocabulary and is assumed to have been drawn from a Dirichlet(\(\eta\)). The generative process is performed for each document \(d\) as follows: First, select a distribution over topics \(\theta_d\) from Dirichlet(\(\alpha\)). Then, for each word \(n\) in the document, draw a topic index \(z_{d,n}\) from the topic proportions \(\theta_d\). Finally, draw the observed word \(w_{d,n}\) from the selected topic \(\beta_{z_{d,n}}\).

Fig. 1

Latent Dirichlet allocation graphical model (Blei 2009). \(\alpha\): Dirichlet parameter. \(\theta_d\): topic distribution of document \(d\). \(\beta_k\): distribution over terms of topic \(k\). \(z_{d,n}\): topic assignment of word \(n\) in document \(d\). \(w_{d,n}\): observed word \(n\) in document \(d\). \(\eta\): topic hyperparameter
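The generative process described above can be made concrete with a small simulation. The sketch below uses only the Python standard library and symmetric Dirichlet priors; the function names, the toy vocabulary, and the parameter values are assumptions chosen for illustration. It shows how LDA assumes documents are produced, not how topics are inferred from real text.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def generate_corpus(vocab, num_topics, num_docs, doc_length, alpha, eta, seed=0):
    """Simulate the LDA generative process with symmetric priors alpha and eta."""
    rng = random.Random(seed)
    # beta_k ~ Dirichlet(eta): one distribution over the vocabulary per topic.
    beta = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # theta_d ~ Dirichlet(alpha): topic proportions of document d.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_length):
            # z_{d,n}: topic index drawn from the topic proportions theta_d.
            z = rng.choices(range(num_topics), weights=theta)[0]
            # w_{d,n}: observed word drawn from the selected topic beta_z.
            words.append(rng.choices(vocab, weights=beta[z])[0])
        docs.append((theta, words))
    return beta, docs

# Toy vocabulary and settings (hypothetical, for illustration only).
vocab = ["library", "citation", "retrieval", "user", "journal", "search"]
beta, docs = generate_corpus(vocab, num_topics=2, num_docs=3,
                             doc_length=8, alpha=0.5, eta=0.5)
for theta, words in docs:
    print([round(p, 2) for p in theta], words)
```

Fitting an LDA model reverses this process: given only the observed words, it estimates the topic proportions of each document and the term distribution of each topic.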

There are many implementation tools for LDA. This analysis applied the Gensim Python library (Gensim: LDA Model, n.d.) to perform LDA. Before applying LDA, one must decide the number of topics for the corpus. Perplexity analysis, which estimates the performance of topic clustering based on a smaller set of data, is often used to determine the number of topics. Other researchers have chosen the number of topics based on their own judgment or tests (Blei et al. 2003; Newman and Block 2006; Figuerola et al. 2017). In the present study, the author consulted the cluster number used in the studies of Sugimoto et al. (2010) and Yan (2015). (Despite using different approaches to determine the topic number, they both used 50 topics in their studies.) The number of topics for each text corpus was independently decided based on LDA tests on sample texts and ranged from 30 to 50 topics.
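For reference, the perplexity criterion mentioned above is commonly defined on a held-out set of \(M\) documents, following Blei et al. (2003). The symbols \(D_{\text{test}}\), \(\mathbf{w}_d\), and \(N_d\) are introduced here for illustration and are not used elsewhere in this article:

\[
\text{perplexity}(D_{\text{test}}) = \exp\left\{ -\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}
\]

where \(\mathbf{w}_d\) denotes the words of held-out document \(d\) and \(N_d\) is its length. A lower perplexity indicates that the model generalizes better to unseen documents, so candidate topic numbers can be compared by their held-out perplexity.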

After obtaining the results based on 14,053 articles, the topics for each period were ranked by probability value, and the top ten clusters with their 10 most relevant keywords were selected as the most representative topics. To facilitate the understanding and analysis of topics, the author manually examined a number of articles: For each corpus, a set of randomly selected articles was generated by selecting every 20th document. In this sample set, the original text, including title, keywords, and abstract, was examined in detail along with the topic distribution of each document, enabling the identification of the most representative document and the interpretation of the top topics embedded in different research contexts. The most relevant documents and their sources are displayed in “Appendices 2–6”.
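The sampling and inspection step can be sketched as follows. The helper names and the toy document-topic matrix are hypothetical, and starting at the first document is an assumption, since the article does not state the starting offset. With that assumption, a step of 20 reproduces the sample sizes reported for each period in the Results.

```python
def systematic_sample(documents, step=20, start=0):
    """Return every `step`-th document, beginning at index `start`."""
    return documents[start::step]

def most_representative(doc_topic, topic_id):
    """Index of the document with the highest probability for `topic_id`."""
    return max(range(len(doc_topic)), key=lambda d: doc_topic[d][topic_id])

# Corpus sizes per period as reported in the Results section.
corpus_sizes = {"1996-2000": 1509, "2001-2005": 2029, "2006-2010": 3223,
                "2011-2015": 3840, "2016-2019": 3452}

# Document indices stand in for the actual articles.
sample_sizes = {period: len(systematic_sample(list(range(n))))
                for period, n in corpus_sizes.items()}
print(sample_sizes)

# Hypothetical topic distributions for three documents over two topics.
toy_doc_topic = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]
print(most_representative(toy_doc_topic, topic_id=1))
```

In practice, the rows of the document-topic matrix would come from the fitted LDA model rather than being typed in by hand.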

Results

For a clear overview, each topic was labeled with a few keywords. Since the definition of LIS subdomains is still under discussion in the field, the author attempted to label the topics by objectively reflecting their keywords and representative documents rather than pre-defining the domains. This may contribute to a straightforward understanding of the topics and an easier comparison with previous studies.

1996–2000

In this period, 1509 documents were included in the corpus. The sample texts set examined for further interpretation contained 76 articles. Ten representative articles and their sources are listed in the “Appendix 2”. Table 3 provides an overview of the results.

Table 3 Latent Dirichlet allocation results for the period 1996–2000

Library science dominates the period’s top two topics, A1 and A2. Public libraries are the main research context in A1. The cluster includes issues such as digital resources, librarians, and technologies. Topic A2 addresses issues including librarianship, collection management, and research activities with an emphasis on academic libraries. Topic A3 outlines a research field related to information management, including issues regarding data processing, technologies, and information use. Citation analysis (A7) and relevant clusters were also frequently discussed in this period. Issues related to impact factors (A4) attracted the most attention in the domain. In addition, studies of patent publications (A8) and scientific collaborations (A5) formed two small clusters that received much attention from researchers. The keywords in cluster A6 identify the issue of information retrieval, including both information systems and users. Topic A9 represents the field of information seeking and behavior. A review of relevant documents indicates that topic A10 primarily includes studies about information system design and improvement.

2001–2005

The corpus for 2001–2005 included 2029 documents. The sample set included 102 articles. Ten representative texts are listed in the “Appendix 3”. An overview of the results is provided in Table 4.

Table 4 Latent Dirichlet allocation results for the period 2001–2005

The keywords in B1 provide a rather broad description of the digital information environment. A review of representative documents indicates that the topic is closely related to information retrieval. Topic B2 suggests that various methods and algorithms were explored and evaluated based on text sources. The research demonstrated in cluster B3 is closely related to informational activities in organizations and industries, including information technology, information processing, and information systems. The analysis of sample documents relevant to cluster B4 shows that the topic demonstrates the continuity of the study of information systems from the last period (A3). In this period, library-related issues merged into one cluster (B5). Topic B6 addresses issues related to the WWW. The keywords of topic B7 reveal an interest in networks, communities, and social environments. Topic B8 includes two subjects: the implementation of new systems or models in an organization and testing the acceptance of customers. Topic B9 shows the continuity of the citation analysis cluster (Topic A4) from the last period. In cluster B10, the majority of keywords concentrate on user studies. A review of sample texts suggests a similar research area to the 1996–2000 period (Topic A9).

2006–2010

The corpus for this period included 3223 documents. The sample set included 162 articles. Ten sample texts and their sources are listed in the “Appendix 4”. An overview of the results is provided in Table 5.

Table 5 Latent Dirichlet allocation results for the period 2006–2010

Citation analysis (Topic C1) with an emphasis on journal impact factor grew considerably, to the extent that it became the most significant research area in LIS over the years 2006–2010. Cluster C2 encompasses documents related to information retrieval, with a preference for documents about text-based retrieval. The keywords in topic C3 clearly identify the field of information seeking and behavior. Knowledge management (Topic C4) first emerged as a major LIS subdomain in 2006–2010. The keywords reveal an essential feature of this topic: knowledge management studies had a close relationship with commercial organizations. The topic of governmental information management is reflected in cluster C5. A review of representative documents revealed that service, digital resources, and e-government were popular themes in this topic. The keywords of topic C6 reveal an interest in scientific collaboration. Linguistic analysis based on text mining was identified as an influential topic cluster (C7) in this period. After the first appearance in 1996–2000 as topic A8, patent analysis occurred again in this period (C8). In the research included in cluster C9, the interaction between users and systems was thoroughly investigated to test the acceptance of new technology. A similar cluster was also identified in the last period (Topic B8). Topic C10’s most prevalent terms show that its theme is scientific performance. The sample document, which has the highest reference probability for the topic, also supports this assumption. However, the sample set does not explain the appearance of words like “Wikipedia” and “economic.”

2011–2015

In this period, 3840 documents were included in the corpus. The sample set included 192 articles. Ten sample texts together with their source are listed in the “Appendix 5”. An overview of the results is provided in Table 6.

Table 6 Latent Dirichlet allocation results for the period 2011–2015

Bibliometric analysis (Topic D1) was the largest topic in the 2011–2015 period. The keywords reveal that trend analyses of publications within a scientific field or a country were frequently conducted during this period. Because it includes some of the same words as topic D1, Topic D2 seems to be a replicated topic. However, a detailed examination of the keywords reveals an emphasis on “evaluation” and “performance.” The research performance in various fields was intensively studied using various methods. The sample set indicates that comparisons between research fields, nations, and regions were also popular during this period. Topic D3 concentrates on the techniques and methods applied for measurement in citation analysis. Topic D4 may be considered a continuation of topic C6, scientific collaboration, from the last period. Citation analysis with an emphasis on journal impact factors (D5) appears again as one of the most important topics in this period. The keywords in topic D6 cover information management in multiple contexts. Governmental issues (D7) continued to receive considerable interest from researchers in this period. Topic D8 demonstrates research interest in the online-community environment. The sample texts reveal that various research questions were investigated, including user behavior, business activities, and social media. Topic D9 encompasses studies of informational activities in various types of organizations. Analyses related to ranking activities (D10) are identified as a popular topic in this period.

2016–2019

In this period, the corpus includes 3452 documents. The sample set included 173 articles. Ten sample texts and their sources are listed in the “Appendix 6”. An overview of the results is provided in Table 7.

Table 7 Latent Dirichlet allocation results for the period 2016–2019

Citation analysis (Topic E1) with an emphasis on journal impact factor continues to constitute a large volume of today’s LIS publications. The continual appearance of this topic in all study periods (topics A4, B9, C1, and D5) indicates that citation analysis is a steady and essential field in LIS. Social media (E2) is a fast-growing field that emerges as a significant topic during this period. Information management and processing (E3) is also a stable area appearing across multiple periods covered in this study (B3, C4, and D9). Topic E4 focuses on the applications of various algorithm- and model-based approaches on large document collections. Similar topics were also identified for previous periods (B2, C3, and C8). Knowledge sharing (E5) is a new topic emerging in this period. Studies within this topic usually investigate group communication mechanisms to improve work efficiency. A review of documents representative of topic E6 suggests a continuous interest in e-governance during this period. Topic E7 identifies informational activities within organizations. Studies under this topic investigated various strategies and systems related to information management to improve companies’ performance. Topic E8 includes a series of studies regarding online commercial activities, especially user behavior. The topic of mobile applications is clearly demonstrated in topic E9. Similar to social media, the mobile application topic comprises a wide range of issues, such as user behavior, health information, and information privacy. Based on a review of sample documents, topic E10 is considered one branch of studies related to knowledge management.

Discussion

Aggregated results

Aggregated results based on all 50 topics analyzed above are provided in this section for a diachronic track of the research trends in LIS.

Field level

Table 8 provides a summary of all topics from 1996 to 2019. The topics are organized spatially from top to bottom in descending order by their probability value. The topics were grouped into three categories to gain a holistic view of the LIS domain: library science (orange), bibliometrics (green), and information science and related issues (blue). A clear decrease in library science can be observed: During the period 1996–2000, library science still dominated the field with the top two clusters; in 2001–2005, library science shrank to a single cluster with a much lower proportion of total research; and since 2006, no clusters representing library issues have appeared in the top 10 topics.

Table 8 Overview of results (field level). (Color table online)

Bibliometrics was grouped as another category in LIS for its large number of documents and stable subclusters. In this study, the term “bibliometrics” represents bibliometrics, scientometrics, and informetrics, which share overlapping interests in the dynamics of disciplines as reflected in their literature (Hood and Wilson 2001). Although the number of topic clusters related to bibliometrics fluctuates across the periods, bibliometrics has proven to be a stable area in LIS. Some topics repeatedly occur across periods, and in rare cases, new topics emerged. Although citation analysis comprises only one cluster in the recent period, the field was nonetheless identified as the largest topic in the corpus.

On the contrary, information science and its related fields show a stable number of clusters but reflect intensive changes within the field. Table 9 provides more information about topic-level changes.

Table 9 Overview of results (topic level). (Color table online)

Topic level

In Table 9, topics are ranked by stability (the number of occurrences across periods) rather than probability value. Similar clusters are highlighted in the same color for a diachronic overview. The purple color at the bottom includes all topics that occurred less than twice across all periods.

Citation analysis is the most stable cluster and appeared in all periods. In this cluster, the topic of journal impact factors was intensively investigated. Information retrieval appears in all periods except 2011–2015, where bibliometrics thrived and formed more subclusters. Information retrieval has proven to be the most important area within information science, as researchers have focused more on model and algorithm-based text analysis.

The next layer of topics comprises information systems and organizational activities. These two fields have demonstrated a close relationship through time and are sometimes difficult to separate. In the first two periods, information systems were discussed mainly in terms of their application and design, whereas in the last two periods, the topic had a close relationship with commercial activities. It is worth noting that knowledge management and knowledge sharing were discussed intensively from 2016 to 2019.

The field of information seeking and behavior was stable in the first three periods. However, studies conducted since 2011 did not mention the field as frequently as previous studies. This does not necessarily imply that user studies are becoming less prevalent because such studies may be dispersed over other topics, such as social media. Scientific collaboration and research performance appeared in three periods. The review of relevant documents shows that the field has a close relationship with citation analysis.

Library-related issues occurred in the first two periods while government issues were studied during the last three. The study of Liu and Yang (2019), which exclusively examined research topics in library journals for the period 2008–2017, reveals the frequent use of the keywords “e-government” and “government” in library science. It is reasonable to assume that governmental issues may have taken over the interest in libraries to some extent.

Topics that occurred less than twice across all periods are listed at the bottom of the table. These topics included WWW, technology applications, networks and online communities, research ranking, social media, organizational innovation, and mobile applications. Topics that occurred only once have undergone a change of technological context from the Internet and networks to social media and mobile applications. These short-lived topics have one common feature: most of them describe a research context and are motivated by the development of technology. Although only appearing for a limited time, these topics have received so much attention that the discipline LIS is recognized as technology-driven.

Comparison with previous studies

Sugimoto et al. (2010) conducted a representative study using the author-topic model, an extension of LDA, to investigate the evolution of LIS. The authors investigated dominant topics in doctoral dissertations between 1930 and 2009. Table 10 lists the topics identified by Sugimoto et al. (2010) for periods that overlap with this study.

Table 10 Summary of topics during 1990–2009 in Sugimoto et al. (2010, p. 193)

Regarding the areas of information retrieval, information seeking, and library science, the two studies are consistent in the overlapping period. However, there are two significant differences. First, the present study identifies bibliometrics as a substantial component of LIS, while there is no such topic in the overlapping period in the results of Sugimoto et al. (2010). Sugimoto et al. (2010) discussed a similar issue when comparing their findings to the study of Åström (2007), who used highly cited journals as a data source. This may suggest that bibliometrics was not intensively investigated among LIS dissertations. Second, this study presents a broader spectrum of topics than the study of Sugimoto et al. (2010). This may reflect the different data sources applied in the two studies, suggesting that journal articles are more flexible and sensitive to external social circumstances and technological development, whereas dissertations have a relatively narrow research scope concentrating on the core areas of LIS.

The study of Åström (2007) used co-citation analysis to investigate research topics in LIS based on highly cited journal articles. The author’s findings regarding bibliometrics and information science are consistent with this study (Table 11). However, Åström (2007) found no clusters relevant to libraries in periods that overlap with this study and fewer subfields overall. This discrepancy may reflect the combined effect of journal selection and research methods. Chang et al. (2015) conducted another representative bibliometric analysis. The authors applied three methods to examine LIS research subjects based on 580 highly cited journal articles (1995–2014). They identified three areas in LIS: bibliometrics, IS and IR, and AIT (application of Internet technology). They found a decreasing trend in IS and IR and an increasing trend in bibliometrics, which can also be found in the periods that overlap with this study. Across these two studies, bibliometric methods based on highly cited journals rarely identified library science as a dominant LIS topic.

Table 11 Summary of topics identified by Åström (2007) for 1995–2004

LDA

The interpretation of the topics generated in this study is generally straightforward. LDA proved to be an excellent method for understanding the rich underlying topical structure of a field and for demonstrating emerging and sustained trends. One essential feature of LDA is that it assumes that one document addresses multiple topics. First, this assumption enables researchers to detect the underlying structure of a corpus more precisely. Second, it provides a detailed perspective on how the topics are combined in the documents, which is especially essential for LIS because it is interdisciplinary and technology-driven. Until now, most studies of interdisciplinarity have been based primarily on indirect indicators such as faculty composition, co-authorship, or citations (e.g., Chang and Huang 2011; Huang and Chang 2012; Prebor 2010). In contrast, topic-level analysis enables the direct inspection of the topic components of a large text corpus and the examination of how the different topics are combined. In this way, LDA may provide a new level of granularity for examining highly interdisciplinary areas of LIS (Yan 2015).
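How topic combinations might be inspected at the document level can be sketched as follows. This is a minimal illustration, not the procedure used in this study: the topic labels, the document-topic proportions, and the 0.2 threshold are all hypothetical, and the proportions stand in for the per-document topic distribution that a fitted LDA model would return.

```python
from collections import Counter
from itertools import combinations

# Hypothetical topic labels and document-topic proportions (each row sums to 1).
# In practice these rows would come from a fitted LDA model.
TOPICS = ["information retrieval", "bibliometrics", "social media", "library services"]

doc_topic = [
    [0.55, 0.05, 0.35, 0.05],
    [0.10, 0.80, 0.05, 0.05],
    [0.05, 0.40, 0.50, 0.05],
    [0.05, 0.05, 0.30, 0.60],
]


def topic_cooccurrence(doc_topic, threshold=0.2):
    """Count, for each topic pair, the documents in which both topics
    reach at least `threshold` (an assumed cut-off for "present")."""
    counts = Counter()
    for row in doc_topic:
        present = [k for k, p in enumerate(row) if p >= threshold]
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


pairs = topic_cooccurrence(doc_topic)
for (a, b), n in pairs.most_common():
    print(f"{TOPICS[a]} + {TOPICS[b]}: {n} document(s)")
```

In this toy data, the social media topic is paired with a different topic in each of three documents, which is the kind of pattern that would indicate where a technology-related subject meets established LIS areas. The choice of threshold affects the counts and would need to be justified for a real corpus.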

The relationship between technology and LIS can be further elucidated using LDA. For example, today, numerous disciplines take social media as a popular research subject, including psychology, social science, computer science, and economics. In order to contribute to a better understanding of the nature of LIS, it is important to determine which aspects of social media were integrated into the field, rather than merely claiming that social media is a hot topic in the field and placing it alongside other topics such as information retrieval or user studies. Numerous such topics make the development of LIS rather confusing and unclear. LDA enables further exploration of the point where technology and LIS meet.

Without information on the correlation between topics, the identification of relationships between clusters largely depends on researchers’ manual interpretation. To address this limitation, Blei and Lafferty (2007) proposed a modified topic model, the correlated topic model (CTM), which “gives a more realistic model of the latent topic structure where the presence of one latent topic may be correlated with the presence of another” (p. 19). Another extension of LDA is the dynamic topic model (DTM), which was specifically designed to study the evolution of topics over time. DTM can be used to capture the evolution of topics in a large sequential text corpus, observing how topics emerge and disappear over time in a field (Blei and Lafferty 2006). Future studies of research trends and the intellectual structure of a domain may also consider using these modified models.

Library and information science

In its early stages, library science focused on the profession of librarianship and on collection management. This focus on practical library issues rather than the management of information in books led to the name “library science,” which misled some about the nature of the subject by suggesting a science that takes the library as an institution as its primary research subject. This led to various critiques of the field in its early stage. However, what makes the library a distinct organization is that it manages a large amount of information. Without the efficient tools available today, collection management at that time largely depended on professional librarians with technical skills. It was not until the transfer of information from paper to digital form that activities and studies related to information management radically changed. The shift of attention to information systems has been accompanied by the fading of library science. Libraries have become ordinary organizations, as the large volume of information that they manage is no longer unusual among organizations equipped with modern technologies. The focus of libraries has gradually moved to user service and governance. In the new digital environment, researchers must acquire skills in information management to solve problems and provide better access to users, much as earlier librarians had to gain professional skills. As a result, LIS and computer science are intimately related. This relationship has been verified by numerous studies of various aspects of LIS, such as journal rankings, university faculty, and research topics.

Today, the scope of information science as an independent domain is larger than that of the information science that evolved from LIS. One remarkable feature is the field’s close relationship with the economy and information activities within organizations (Stock and Stock 2013), which was also demonstrated by the results in this study during 2016–2019. These subjects are not widely accepted as classic fields in LIS, which is why many researchers exclude certain journals from the LIS categories in JCR. However, given that LIS will develop further towards information science, such merging is inevitable and, in fact, is already reflected in the rankings of influential journals in LIS. Another issue that complicates the definition of information science as a field is its intimate relationship with information technology. The rapid development of technology causes the field’s research focus to change constantly. However, the field’s changing topics have one feature in common: they all address the properties of the external information environment. Two constant features are information and humans. Saracevic (1999) noted that it is hard to predict the future of LIS because the field is, by its nature, technology driven. However, perhaps precisely because of this nature, the future path of LIS is foreseeable when the true nature of the field is understood.

The third broad cluster included in the field is bibliometrics. Bibliometrics, especially citation analysis, is very stable across the periods. It has developed into one of the most dominant areas of LIS today. According to Hayes (2009), the major impact of the automation of libraries in the period of 1990–2008 was on print journals, which are rapidly disappearing and being replaced by electronic access via the Internet. This might be one of the main reasons for the thriving development of this field in the later periods. Apart from citation analysis, some areas related to scientific communication, research performance and international publications remained popular over the decades.

The data corpus in the period of 2015–2019 shows frequent use of the word “knowledge,” partially replacing the word “information” in some circumstances. Fig. 2 shows the term frequency of the words “library,” “information,” and “knowledge” over the years in the datasets. The words “information” and “knowledge” have experienced evident growth, whereas the word “library” has gradually been used less. The findings in the figure are consistent with the conclusions of the topic analysis regarding library science and the use of the word “knowledge.”

Fig. 2 Term frequency
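A term-frequency comparison of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: the records are invented toy texts, and frequency is computed as each word’s share of all tokens in a year, which may differ from the counting scheme behind Fig. 2. Only exact word forms are matched, so “libraries” would not be counted as “library.”

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy records (year, text); a real corpus would supply the
# titles, abstracts, and keywords of the articles instead.
records = [
    (1996, "Library automation and library collection management"),
    (1996, "Information retrieval in the library catalogue"),
    (2019, "Knowledge sharing and information behavior on social media"),
    (2019, "Knowledge management and information systems"),
]

TERMS = ("library", "information", "knowledge")


def term_frequency_by_year(records, terms=TERMS):
    """Return {year: {term: share of all tokens in that year}}."""
    token_counts = defaultdict(Counter)
    for year, text in records:
        # Lowercase and keep alphabetic tokens only (a simplifying assumption).
        token_counts[year].update(re.findall(r"[a-z]+", text.lower()))
    result = {}
    for year, counts in token_counts.items():
        total = sum(counts.values())
        result[year] = {t: counts[t] / total for t in terms}
    return result


freq = term_frequency_by_year(records)
for year in sorted(freq):
    print(year, {t: round(v, 3) for t, v in freq[year].items()})
```

Normalizing by the number of tokens per year is one way to keep years with more articles from dominating the comparison; raw counts would be an equally simple alternative if the yearly corpus sizes were similar.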

Limitations

The topic clusters are sensitive to the amount of data being analyzed. A journal rarely features complete coverage of research areas in LIS. Instead, most journals are, to some extent, biased towards a certain domain. Because the number of articles differed from journal to journal, joining all documents into one corpus caused a cluster heterogeneity issue in LDA. With a larger volume of texts, some topics may generate multiple small clusters that would otherwise be assigned to a single category by manual interpretation. This issue is especially noticeable in the period 2011–2015, where topics about bibliometrics formed more sub-clusters.

To ensure that the research trends and the most prevalent topics in each period could be sensitively detected, this study applied a journal selection method based on all available JCRs of LIS. This limited the study period to 1996–2019. The period before 1996 was excluded because of the lack of consistent journal selection criteria, leaving a gap that is worthy of further investigation.

Conclusion

This study used an improved method of journal selection to investigate the evolution of research topics in LIS based on LDA modeling. The analysis was divided into five periods covering the years from 1996 to 2019. A dynamic journal list with the most influential journals of each period was generated. In total, 14,053 journal articles were included in the analysis, and their titles, abstracts, and keywords were used as the text corpus for LDA to identify underlying topics. For each period, the top 10 topics and their keywords were identified for further analysis.

This diachronic analysis shows that library science is gradually losing its dominance within LIS. One of the most remarkable indicators is the decrease in clusters related to library issues. In information science, information retrieval has consistently been the dominant domain, with its interests gradually shifting towards model- or algorithm-based text processing. Information seeking and behavior is also a stable field which tends to disperse among various topics rather than be identified as a distinct topic. Information systems and organizational activities have been discussed continuously and have developed a close relationship with e-commerce. The short-lived topics (those that appear in only one period) evidence a shift in technological context from the Internet and networks to social media and mobile applications. Bibliometrics has proven to be a stable area in LIS. Some topics, including citation analysis, scientific collaboration, and research performance, repeatedly occur across periods, and in rare cases, new topics emerged.

This study presents the leading research topics in LIS for the period 1996–2019. By combining a unique journal selection method and LDA topic modeling, the research contributes a new perspective on the evolution of the domain. This study exhibits a diversity of research topics in LIS and reveals some diachronic research trends. In future work, the research topics in LIS journals before 1996 may still be worth exploring using LDA. Furthermore, the structure and development of interdisciplinarity could be further examined by analyzing topic distribution in documents. From a holistic view, the author argues for further classification of research topics and the establishment of a systematic topic frame, for example, distinguishing topics by different attributes such as method, research context, content, and user group.