Research Practice and Progress of Models and Algorithms Applied in Topic Identification and Prediction Based on the Analysis of CNKI

Guo, Sicheng; Si, Li; Liu, Xianrui

doi:10.3390/app13137545

Open Access Review

by Sicheng Guo ¹,*, Li Si ¹,²,* and Xianrui Liu ¹,*

¹ School of Information Management, Wuhan University, Wuhan 430072, China

² Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China

* Authors to whom correspondence should be addressed.

Appl. Sci. 2023, 13(13), 7545; https://doi.org/10.3390/app13137545

Submission received: 8 March 2023 / Revised: 14 June 2023 / Accepted: 16 June 2023 / Published: 26 June 2023

(This article belongs to the Special Issue New Techniques of Machine Learning and Deep Learning in Text Classification)

Abstract:

As a hot topic in library and information science, research on topic recognition and trend prediction has received close attention from academic circles. This paper uses a systematic literature review, bibliometric analysis and a classification method. Through the systematic literature review, 96 studies on topic identification and evolution prediction models are selected from the CNKI database. Bibliometric analysis with VOSviewer reveals the key research content and themes. Through the classification method, EXCEL is used to summarize comprehensively the models and algorithms used in the literature. It is found that topic identification models and algorithms can be divided into four categories: ① topic models based on LDA and related derivative models; ② machine learning and deep learning methods; ③ methods based on reference relations; ④ text mining methods. Trend prediction models and algorithms mainly cover two categories: ① deep learning or machine learning models and algorithms based on time sequences; ② link prediction algorithms based on complex networks. We have also summarized the common index system involved in each study and the ways of evaluating the effectiveness of each method. This paper thus comprehensively reveals the application progress in academic circles of topic identification and prediction models and algorithms from the last 10 years and beyond, based on the CNKI database. The purpose is to determine the most popular models and algorithms applied in research, generalize the corresponding indicator systems and validation methods, and finally provide references for model choice or evaluation when identifying and predicting topics in the future. Thus, this paper can help us to understand the overall progress made in text analysis research, and provides a useful reference for selecting and applying the appropriate models, algorithms and indicators.

Keywords:

topic identification; topic prediction; models; algorithms; machine learning; indicators; evaluation

1. Introduction

The exchange and diffusion of knowledge has always been a major requisite and core link in the process of knowledge production, development and application [1]. According to the different objects and contents, it can be divided into scientific theme diffusion, which reflects scientific research achievements, and technological theme diffusion, which emphasizes knowledge application and scientific and technological innovation [2]. The two concepts are both independent and intrinsically linked. The comprehensive identification and prediction of scientific literature topics and technology patent topics is an effective approach to understanding the nature of knowledge, and is an important way to control its development trend [3]. Topic identification and prediction are critical to exploring the progress of the discipline, clarifying knowledge hotspots and judging the development trends. Hence, determining which models, algorithms and common indicators are most widely used is of high concern.

Identifying topics and predicting the evolution of patented technology have become major focuses of library and information science in recent years. Before the 20th century, the main methodology for predicting the evolution of a subject and its related technologies was based on the collective wisdom of experts, and its reliability depended entirely on the vision and scientific research level of those experts. The modern approach to the prediction of subject topics has gradually split into two major systems: statistical regression prediction methods and machine learning prediction methods. Statistical regression prediction methods include linear regression, the AdaBoost regressor, and so on. These models are less computationally intensive, quicker to build, and can be interpreted separately for each variable [4]. Machine learning prediction methods include LDA, SVM, Word2vec, K-means, TF-IDF, BERT, etc. These can be divided into topic models (e.g., LDA) and classification algorithms (e.g., SVM). Topic models can aggregate large amounts of text to uncover hidden semantic structures, and thereby indicate potential topics within the text [5]. Classification algorithms attempt to learn the relationship between a set of feature variables and a target variable of interest. They have wide applicability in data mining-related scenarios, wherein many objects can be characterized by finding correlations between features and targets [6]. Recent studies have provided references, outlining methods and perspectives relevant to this study. However, the scenarios and applications of these two types of systems need to be further explored [7]. It is also not clear which indicators and features should be used to select the most appropriate algorithms and models for topic identification and prediction [8,9].
Typical representatives of each type of model and algorithm should be taken for analysis, enabling us to identify the most frequently used algorithms and models in both groups of systems, and providing us with indicators to assess their effectiveness. At the same time, the most recent studies have focused on one or two kinds of model or algorithm, and have mainly used quantitative methods. However, we have not analyzed these studies one by one, and so our conclusion can only be taken as a general reference at a more coarse-grained level.

This study aims to elucidate the scenarios and applications of algorithms and models used for topic identification and prediction in China, identify the most frequently used algorithms and models in both groups of systems, and provide indicators to assess their effectiveness. In order to fully understand Chinese research from recent years on topic identification and evolution prediction models, as well as their application, at a more granular level, we have systematically collected relevant research on topic identification and evolution, and undertaken statistical analyses of the models, methods and indicators adopted, with a view to providing a reference for future research on topic identification and trend prediction.

2. Literature Review

A survey of previous research shows that most existing literature reviews have been carried out from the two aspects of topic identification and evolution prediction.

2.1. Research on Topic Identification

Research on topic identification mainly focuses on the overall research and the summary of methods adopted. Generally speaking, Wang summarized topic discovery and evolution research outlined in Chinese scientific and technological documents based on the topic model, using the document analysis method [10]. Kai and Dou analyzed the research status of disruptive technology identification from the three perspectives of data sources, technical characteristics and identification methods [11]. Zhang and Dong analyzed the existing index system for subversive technology identification from the perspectives of technical characteristics, market characteristics and the macro environment [12]. Qiao et al. sorted out the existing subversive technology identification methods from the perspective of technology, the market, and a combination of technology and the market [13]. Park et al. proposed a text mining framework to extract the main topics from speeches [14]. Savin et al. identified around 20 topics related to service robotics by applying topic modeling [15].

As regards the application methods, Yang summarized emerging topic identification methodology systems, including an index and trend analysis method based on external feature statistics, a citation analysis method based on document citation patterns, a network analysis method based on co-occurrence relationships and a text mining method based on semantic analyses [2]. Li and Zhao summarized topic discovery technology based on co-word analyses, and improved the partition clustering algorithm, hierarchical clustering algorithm and other clustering methods [16]. Liu and Tan focused on quantitative research methods based on literature assessments and data mining, and classified them into three categories—subject word or literature statistics methods, citation network clustering methods, and text mining analysis methods—and carried out comparative analyses of these methods [17]. Luo et al. reviewed urban flood numerical simulations by summarizing the calculation methods of surface runoff, drainage systems, and coupled models [18]. Shelton et al. provided an overview of the methods of data collection, sampling, and analysis in qualitative research [19]. Shuo et al. compared different research methods related to the soil cracking field [20].

In general, these studies give an overview of topic identification research and the methods applied, but they do not subject the models and algorithms separately to analysis and study, and these studies have not taken a quantitative approach to assessing the topic identification methods either.

2.2. Research on Topic Prediction

The research on trend prediction mainly focuses on summaries of application methods. Zhang, Fang and Wang summarized the main research methods used in assessing technology evolution into five categories: patent citation analysis, text mining, classic model of technology life cycle, TRIZ theory and network analysis [21]. Zhou summed up the quantitative methods applied to address technology foresight, including growth curves, literature measurements, patent analyses, data mining, social network analysis, TRIZ, TFDEA, etc. [22]. Disruptive technology identification and prediction methods have drawn wide attention. Liu and Wu conducted research from the perspectives of technology management and application, and statistical analyses of patents or scientific papers [23]. Wang et al. carried out method classification according to their different uses, namely, technology supply and demand, scoring models, future scenario assumption, quantitative models and literature measurements [24]. Coelho et al. provided an overview of machine learning (ML) methods applied in predicting electrochemical corrosion [25]. Hond et al. addressed AI-based prediction model (AIPM) development, evaluation and implementation in computer science [26].

These literature reviews related to topic prediction provide references, with methods and perspectives relevant to this study. However, these studies have mainly used quantitative methods, and do not address the papers one by one. By employing a literature research and classification method, this paper begins with quantitative statistics, and analyzes the models and algorithms used in the classification analysis of the sampled literature one by one, aiming to objectively and comprehensively reveal the general situation and application prospects of current research methods.

3. Research Design

In order to achieve its research target, this study presents three major research questions. To address these questions, we adopt the most appropriate research methods and steps. The details are as follows.

3.1. Research Questions

The aim of this study is to analyze the current state of topic prediction and identification algorithms and models based on CNKI, and to provide a reference for subsequent research. This study is conducted via the three aspects of application overview, application preferences and application evaluation. Thus, the following questions are specifically addressed:

Question 1—What is the general state of the application of topic identification and evolutionary models and algorithms used in the literature?

Question 2—Which topic identification and prediction models and algorithms are widely used, and how are they applied in the literature specifically?

Question 3—What are the common indicators used in topic identification and prediction, and how are they validated in the literature?

Through Question 1, this study composes an overview of the application of algorithms and models, including all types, numbers and chronological distribution of algorithms and models. Since the current research focuses more on the application of algorithms or models in a specific field [27,28], it is essential to understand the whole landscape so as to better promote the effective selection and application of algorithms and models. By exploring Question 2, essential algorithms and models are filtered out, and their specific modes of application are parsed. This can be correlated with previous studies [29,30] that sought to dissect exactly how specific algorithms and models are applied, so as to inform the subsequent selection and optimization of algorithms and models. Finally, addressing Question 3 provides indicators for selecting and evaluating algorithms and models, and determining the most appropriate algorithms and models when analyzing within texts, allowing validity to be verified. This allows for the optimization and evaluation of existing algorithms and models [9,10], enabling us to advance research in the field of text analysis. Based on this, this paper provides a comprehensive insight into the current state of algorithms and models in the field of topic analysis and prediction.

3.2. Research Method and Process

3.2.1. Research Method

Three kinds of research methods are applied, as follows:

Systematic literature review (SLR). This is a specific type of literature review that is characterized as methodical, comprehensive, transparent, and replicable. It consists of raising research questions on a specific topic, searching and obtaining all relevant literature in a comprehensive way, and conducting a systematic assessment and integration of analysis to address the research question. This method contains five steps: defining research scope, developing selection criteria, planning the search, collecting and screening the literature, and presenting results [31];
Bibliometric analysis. This is a quantitative method used to explore and describe previous studies, and is helpful for evaluating academic studies in a specific field [32,33]. It offers a systematic, transparent, and reproducible review process, and thus enhances the reliability and quality of the review [34]. In this study, VOSviewer software has been used as a tool to perform the co-occurrence analysis, and then to realize the visualization of the intellectual structure;
Classification method. This study uses the classification method to delineate the types of models and algorithms used in the literature for topic classification and prediction. The essence of classification is the grouping together of things or concepts that are similar in some way and, in doing so, comparing a given object with the object that the assessor believes best represents the category [35]. Here, we have used EXCEL to record the algorithms and models used in each article and the domain to which they belong, and to classify them accordingly.

3.2.2. Research Process

The process of data acquisition and analysis is as follows (shown in Figure 1):

Step 1, determining the search strategy. The CNKI literature database is selected as the data source to ensure the thorough gathering of data. CNKI (China National Knowledge Infrastructure) is currently the most comprehensive academic repository in China, with a total of over 200 million titles, containing more than 95% of officially published Chinese academic resources. Therefore, using the CNKI database as the data source ensures adequate access to high-quality Chinese research in this field, which fits our research aims. Most research on topic identification or prediction is found in the scientific literature. In addition, science and technology patents are also important entities reflecting the generation and application of domain knowledge topics. The search terms “research frontiers”, “dynamic evolution”, “link prediction” and “network links” have been excluded for the sake of search completeness and accuracy. After several trial searches, an advanced search has been conducted with the keywords “technology identification” OR “technology prediction” OR “subversive technology”, excluding “topic identification” OR “topic prediction” OR “emerging topics” OR “topic evolution”. The document types are limited to journal papers, conference papers and dissertations, with no restriction on publication date. The search cut-off date is 30 June 2022.

Step 2, screening the study sample. First, we read the titles, abstracts and keywords of the obtained literature to filter out the samples that are relevant to this field. Then, we review the full text of these samples one by one, selecting articles that explicitly identify the use of models and algorithms for topic analysis in their research methodology, finally choosing 96 documents as the research sample of this paper.

Step 3, analyzing models and algorithms used in the sample literature. First, the bibliographic data of the 96 documents are imported into VOSviewer for co-word analysis and mapping of the co-word network. Second, we create an EXCEL document, and manually scan and extract the main models and algorithms used in the sample literature from two perspectives: topic identification and trend prediction. These methods are recorded according to the original text of the paper. Three experts read all samples separately and decide on the final version of the EXCEL document. Then, classification is conducted to calculate the frequency of models and algorithms applied in the sample literature. A consistency test is conducted by randomly selecting the records of the three researchers and calculating the Cohen’s Kappa statistic. The Cohen’s Kappa scores are all above 0.8 (a score of 0.75 is taken to indicate good consistency), indicating credible results.
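The consistency test in Step 3 can be illustrated with a minimal sketch of Cohen's Kappa for one pair of coders. The category codes and the two label lists below are hypothetical and are not the study's records; with three coders, the statistic would presumably be computed for each pair, which is an assumption about the procedure rather than something the text states.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement expected from each rater's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical category codes assigned by two coders to ten sampled papers.
coder_1 = ["LDA", "LDA", "ML", "ML", "TextMining",
           "LDA", "Citation", "ML", "LDA", "TextMining"]
coder_2 = ["LDA", "LDA", "ML", "LDA", "TextMining",
           "LDA", "Citation", "ML", "LDA", "TextMining"]

kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

In this toy case the coders disagree on one paper out of ten, so the observed agreement is 0.9 and the chance agreement is 0.31, which places the score above the 0.8 level mentioned in the text.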

Step 4, constructing an indicator system and related verification methods. We extract and summarize the indicators used in the documents to construct an indicator system for topic identification and prediction. The indicator validation methods used in the literature are grouped.

4. Overview of Models’ and Algorithms’ Applications in Topic Identification and Prediction

The authors carefully investigated the models and algorithms applied in 96 sample papers one by one, and the results of the statistical analyses are shown in Appendix A. By analyzing the year of publication and the research methods of these samples, this study reveals the chronological distribution and numbers of algorithms and models used, which are specified in Section 4.1 and Section 4.2.

4.1. Chronological Distribution

This study counted the number of topic identification and prediction documents per year to obtain an overview of their overall development. This is shown in Figure 2. It can be seen that since 2010, research on topic identification and trend prediction has been on the rise, and this rise can be divided into two stages. The first stage is the beginning stage, from 2010 to 2015, with zero to one article published per year. The second stage is a steady rise from 2016, with 5 to 15 identification articles and 7 to 12 prediction articles published per year, which indicates that attention has gradually increased. After 2019, the number of identification articles per year has increased from 15 to 20, reflecting that topic identification and prediction have become an important subject of academic research.

4.2. Number of Types

Based on the statistics on the types of models and algorithms used in the literature (Table A1 in Appendix A), the number of types of models and algorithms used in each sample has been calculated. The results are shown in Table 1. Among the applications of identification models and algorithms (87 documents), 45 documents (51.7%) used one type of model or algorithm, 32 documents (36.8%) used two types, 9 documents (10.3%) used three types, and 1 document (1.1%) used four types. Slightly more than half of the papers therefore used a single type of model or algorithm, more than one-third combined two types, and nearly half (48.3%) used two or more.

As regards the application of prediction models and algorithms (55 documents), 33 documents (60%) used one type, 19 documents (34.5%) used two types, 2 documents (3.6%) used three types, and 1 document (1.8%) used four types. Thus, 60% of the literature used one type of model or algorithm, and 40% used two or more.
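As a quick check on the arithmetic, the shares can be recomputed from the document counts reported in this section. The sketch below uses only those counts; the function name `type_shares` is introduced here for illustration.

```python
def type_shares(counts):
    """Percentage share of each count, rounded to one decimal place."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}

# Documents grouped by how many types of model or algorithm they used,
# using the counts reported in the text.
identification = {1: 45, 2: 32, 3: 9, 4: 1}  # 87 documents
prediction = {1: 33, 2: 19, 3: 2, 4: 1}      # 55 documents

ident_shares = type_shares(identification)
pred_shares = type_shares(prediction)

# Share of documents that combined two or more types.
ident_multi = round(100 * (sum(identification.values()) - identification[1])
                    / sum(identification.values()), 1)
pred_multi = round(100 * (sum(prediction.values()) - prediction[1])
                   / sum(prediction.values()), 1)

print(ident_shares, ident_multi)
print(pred_shares, pred_multi)
```

Recomputing in this way makes it easy to confirm that each group's shares sum to roughly 100% and that the "two or more" figures follow directly from the single-type counts.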

4.3. Keywords Analysis

Due to the number of articles, keywords with a frequency less than three occurrences are hidden, resulting in a total of 48 keywords. The keyword co-occurrence relations are shown in Figure 3, and larger circles with more keyword tags denote greater significance. There is a total of four clusters. These clusters are connected, while their boundaries are fuzzy. The first cluster is topic identification, whose related keywords include LDA, Word2Vec, K-means, VOLDA, PLDA, knowledge discovery, etc. The second cluster is topic prediction, and its related keywords include Markov model, exponential smoothing method, ARIMA model, social network analysis and so on. The third cluster is technology prediction, and its related keywords consist of data fusion, high-frequency words, word co-occurrence, evolutionary trends, etc. The fourth cluster is patent analysis, whose related keywords include multidimensional scale analysis, frontier technology, artificial intelligence, KELIM algorithm and so on. As mentioned earlier, technology and patents are also important entities reflecting the generation and application of domain knowledge topics. Thus, the first cluster and third cluster are analyzed later together as part of topic identification. The second cluster and the fourth cluster are discussed later together as part of the topic prediction.

5. Specific Application of Topic Identification Models and Algorithms

Using the statistics shown in Table A1 in Appendix A, classification has been conducted to find the most frequently used topic identification models and algorithms. The Dycharts tool is used for analyzing and visualizing data (see Figure 4).

Figure 4 shows each algorithm model and the number of times it has been applied. Topic recognition models and algorithms can be generally divided into the following four categories, denoted with different colors in the figure: (1) LDA theme model and related derivative models (accounting for 53%); (2) machine learning and deep learning models (accounting for 26%), mainly including K-means, Word2Vec, BERT and TF-IDF; (3) text mining (accounting for 14%); (4) citation analysis (accounting for 7%). The application frequencies of each method are as follows: LDA theme model and related derivative models (46 times), machine learning and deep learning (22 times), text mining (12 times) and citation analysis (6 times). It should be noted that although LDA can also be regarded as one of the classical algorithms of machine learning, due to its unique role in the field of topic recognition, this paper classifies studies using only the LDA model into category 1. These four kinds of topic identification models and algorithms are specifically discussed in Section 5.1, Section 5.2, Section 5.3 and Section 5.4.

5.1. Application of LDA Theme Model and Its Derivative Models

As shown in Figure 4, LDA and its derivative models are the most widely used in the research sample. Their specific applications are described and summarized in the following sections.

5.1.1. LDA Theme Model

LDA was proposed by Blei et al. in 2003. It is a three-layer Bayesian probability model with a hierarchy of words, topics and documents. The LDA topic model overcomes two shortcomings of earlier approaches, namely sparse text matrices and the neglect of text semantics, and is one of the most effective methods for analyzing large unstructured document sets [36]. It finds word clusters by maximizing the co-occurrence probability of words, and uses the Dirichlet distribution to describe the document generation process and to cluster documents. The LDA topic model is the most widely used method in topic recognition, particularly in the fields of graphene [37], digital journalism [38], marine diesel engines [39], stem cell research [40] and so on.
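To make the word, topic and document layers concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, one common inference scheme for this model. It is not the implementation used in any of the reviewed studies; the toy corpus, the hyperparameter values `alpha` and `beta`, and the function name `lda_gibbs` are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens of word w in topic k
    topic_total = [0] * num_topics                      # tokens in topic k
    assignments = []
    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed estimates of the document-topic and topic-word distributions.
    theta = [[(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
              for t in range(num_topics)] for d, doc in enumerate(docs)]
    phi = [[(topic_word[t][v] + beta) / (topic_total[t] + V * beta)
            for v in range(V)] for t in range(num_topics)]
    return theta, phi, vocab

# Hypothetical toy corpus of four tokenised abstracts.
docs = [
    "graphene carbon material graphene oxide".split(),
    "carbon nanotube material graphene".split(),
    "patent citation network patent analysis".split(),
    "citation network analysis patent".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
for t, row in enumerate(phi):
    top_words = [w for _, w in sorted(zip(row, vocab), reverse=True)[:3]]
    print(t, top_words)
```

Each row of `theta` is a document's mixture over topics and each row of `phi` is a topic's distribution over the vocabulary, which corresponds to the three-layer structure described above. On realistic corpora a library implementation would normally be preferred over a hand-written sampler.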

Cui extended the LDA model to the online text stream and designed an online topic evolution mining model for network public opinion situation analyses [41]. Xue used the LDA model to assess the topics of scientific and technological studies in combination with controlled vocabulary [42]. Chen used LDA to realize the unsupervised automatic identification and acquisition of potential scientific and technological topics, and uncover topic distribution in a large number of patent texts [43]. Mao added topic filtering on the basis of existing evolution methods and proposed a method to judge the evolution status of a topic [44]. Chen et al. used the LDA algorithm to determine the clustering of potential technical topics in patent studies, and analyzed the distribution characteristics and evolution law of topics in various periods [45]. Liu et al. used the LDA model to identify the research topics within fund projects and the texts of papers, and used the similarities to build correlations between topics [46]. Yue et al. used the LDA model to identify topics, and obtained time series data of topic changes in the field of information construction [40]. Liu et al. used the LDA model to mine the topic features within the deep learning field, and subdivided the research topics [47]. Li et al. used the LDA model to identify emerging topics in graphene studies by the National Science Foundation of the United States [48].

5.1.2. The Derivative Models of LDA Theme Model

In light of the limitations of traditional LDA in determining the number of topics using a Bayesian model or perplexity calculation, Wang proposed a topic model called hLDA based on hierarchical probability, and used time information to automatically mine potential topic information from scientific and technological documents [49]. Aiming at the problem of the fixed time window that arises in the traditional online topic model OLDA, Pei introduced a topic similarity matrix into the model and proposed a variable online topic model, VOLDA [50]. In view of problems such as the mixing of new and old topics caused by the fixed weight of the content evolution matrix in the traditional OLDA model, Luo introduced a dynamic weight calculation method to produce the IOLDA (improved OLDA) online topic model, which can identify science and technology information topics [51]. Zhang introduced word vector distribution into LDA and proposed the LDA2vec topic model [52]. In view of the semantic sparsity and insufficient co-occurrence information of LDA models used for medical comment topic mining, Gao et al. proposed a CO-LDA model that combines word co-occurrence analysis with the LDA model to mine online medical comment topics [27]. Xu and Wang selected the PLDA model from the Kmine experimental platform to identify themes in the graphene field [28]. Wu et al. identified topics and keywords in the comment texts based on a parallel latent Dirichlet allocation model, PLDA [6].

5.2. Application of Machine Learning and Deep Learning Models and Algorithms

Our classification analysis has revealed that machine learning and deep learning methods mainly focus on the single or combined use of common models such as Word2Vec, K-means, TF-IDF and BERT. See Figure 4 for details.

5.2.1. Word2Vec

Word2vec is an open-source tool built by Tomas Mikolov of Google in 2013 to express words as real vectors. It is a model for deriving semantic knowledge from a large text corpus in an unsupervised manner. It is a two-layer shallow neural network, and is widely used in natural language processing. Li combined the LDA model with the Word2vec method to extract topics and features that improve the accuracy and stability of the model’s topic mining ability [53]. Yang et al. combined LDA and Word2vec models to identify technical topics in the field of artificial intelligence [54]. Chen et al. comprehensively used LDA and Word2vec models to identify breakthrough innovation topics in the field of blockchain [43]. Teng et al. used the Word2Vec model to build a dynamic semantic dependency network, and found theme groups based on it [55].

5.2.2. K-Means

The K-means clustering algorithm is an unsupervised clustering algorithm, which groups unlabeled data into a pre-specified number of clusters according to their implicit characteristics. Ruan and Xia combined the LDA model with K-means clustering and co-occurrence analyses in order to mine and identify interdisciplinary research topics [29]. Song and Zhu integrated the LDA model and K-means algorithm to realize patent text clustering and identify cutting-edge technical topics in the field of human intelligence [30]. Gao et al. clustered the patent technology of the patent diversity matrix to form a technology cluster through multidimensional scale analysis combined with the K-means algorithm and the Pearson correlation coefficient, based on the patent IPC number [56].
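The clustering step shared by these studies can be sketched with a plain implementation of Lloyd's algorithm, the standard iteration behind K-means. The two-dimensional points stand in for document vectors and are hypothetical, and the farthest-first initialisation is an assumption chosen here to keep the example deterministic; the reviewed studies do not specify their initialisation.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with a deterministic farthest-first initialisation."""
    # Start from the first point, then repeatedly add the point farthest
    # from the centroids chosen so far (an assumption for reproducibility).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical 2-D stand-ins for document vectors, forming two groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Note that the number of clusters `k` is an input, so in practice it has to be chosen beforehand, for example by comparing results across several values.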

5.2.3. TF-IDF Algorithm

TF-IDF is a statistical method used to evaluate the importance of a word to one document within a document set or corpus. In the research on topic recognition, TF-IDF is often paired with the K-means clustering algorithm. Huang used the TF-IDF algorithm and the K-means clustering method to construct a technical theme identification and analysis framework, enabling the identification of industrial wastewater treatment technology topics [57] and the IPC classification number. Ren et al. applied the TF-IDF method to vectorize each document, and used the K-means++ algorithm for clustering to identify emerging technology topics in the field of unmanned combat platforms [58].
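The weighting itself can be sketched in a few lines. The version below uses the basic form, relative term frequency multiplied by the logarithm of the inverse document frequency; library implementations often apply smoothing or normalisation, so their values will differ. The three tokenised documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenised patent abstracts.
docs = [
    "wastewater treatment technology membrane filtration".split(),
    "wastewater treatment technology biological process".split(),
    "patent classification technology membrane".split(),
]
weights = tf_idf(docs)
for term, value in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(term, round(value, 3))
```

A term that occurs in every document, such as "technology" here, receives a weight of zero, while a term confined to one document ranks highest within it. The resulting weight vectors are the kind of input that a clustering step such as K-means would then operate on.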

5.2.4. BERT

The full name of BERT is Bidirectional Encoder Representations from Transformers. It is a pre-trained language representation model. It adopts the masked language model (MLM) objective, conducts pre-training based on the attention mechanism of bidirectional Transformers, and achieved state-of-the-art performance on a range of natural language processing tasks when it was released. Gui classified the technical fields addressed in the papers and patent data to determine the development status of each technical field, based on a BERT-LSTM model for identifying the subjects of scientific and technological documents [59]. Xu and Gui used the optimal BERT-LSTM text classification model to identify the topics in all SCI papers and patent data in the field of new energy vehicles [5]. Tan took the technical relationships represented by semantic relationships between subject words in patent documents as the training data, and used the BERT pre-training model to realize the task of identifying technical relationships [60].

5.3. Application of Citation Analysis and Its Associated Models and Algorithms

Citation analysis is a method for analyzing the relationships between citing and cited documents. It quantitatively reflects how the literature inherits and develops earlier work through the network of relationships between citing documents and their references, and it is also used in topic discovery.

Zhu and Leng selected the most highly cited papers in the field of “carbon nanotube fiber” as their research object, and used the C-value term extraction method for topic identification [61]. Li constructed an association matrix based on paper–patent references and text associations, and used it to identify innovative topics in the field of Bruton tyrosine kinase inhibitors [62]. Yu et al. used backward patent references (where patents cite other patents) in combination with the Bass model to identify disruptive technologies in solar photovoltaic power generation [63]. Zhang et al. selected the most highly cited papers in the field of artificial intelligence for analysis, and identified the cited themes by combining the K-means, LDA and GSDMM clustering methods and models [12]. Based on changes in the direction of patent references, Wang et al. built a new method to identify disruptive technologies in the field of speech recognition [64]. Wu et al. used patent citation networks and genetic backward–forward path methods to identify the main paths, thus enabling them to track technology evolution and identify disruptive technologies in the field of surgical robotics [65].

5.4. Application of Text Mining and Its Associated Models and Algorithms

According to Figure 3, analysis methods based on co-word analysis, co-occurrence matrices and SAO structural semantics are the most widely used.

5.4.1. Co-Word Analysis and Co-Occurrence Matrix

The co-word analysis method was established in the 1970s, based on the concepts of citation coupling and co-citation in bibliometrics. It counts how often pairs of words or noun phrases occur together in the same document; the words or noun phrases are then hierarchically clustered to reveal the affinity between them, so that the structural changes of the disciplines and topics they represent can be analyzed.
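
The counting step behind a co-word (co-occurrence) matrix can be sketched as follows. This is a minimal illustration under simple assumptions (each paper is reduced to a keyword list, and a pair is counted once per paper); the function name `co_word_counts` and the sample keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_word_counts(keyword_lists):
    """Count in how many documents each unordered keyword pair co-occurs."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so that each pair has one canonical order.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists for three papers.
papers = [
    ["topic model", "LDA", "patent"],
    ["LDA", "patent", "clustering"],
    ["topic model", "LDA"],
]
pairs = co_word_counts(papers)
```

The resulting pair counts are the off-diagonal entries of a symmetric co-word matrix, which can then be clustered or turned into a network for community detection.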

Kui et al. used the Ucinet software to obtain a high-frequency co-word matrix and generate a topic network evolution map in the field of intelligence science [66]. Meng utilized a subject word co-occurrence model combined with a community discovery algorithm to build a co-word matrix and identify the main topics in synthetic biology [67]. Chen et al. obtained a word co-occurrence matrix by using co-word analysis, and derived term sets by combining professional terms and high-frequency term analyses [68]. Wang et al. established associations between technologies based on the co-occurrence of patent classification codes [69]. He focused on the achievements of library and information science, as well as computer software and application science, and used co-word analysis methods to explore the topic intersections between the two subjects [70].

5.4.2. SAO Structure

SAO is a subject recognition model based on subject, action, and object, which can improve the semantic information of keywords.

Guo constructed a technical solution evolution path recognition model based on SAO semantic mining [71]. Fan utilized SAO semantic analyses, similarity calculations, and clustering analyses to mine topics within potentially disruptive technology [72]. Li applied the SAO semantics method to identify and extract technical elements and their relationships [73]. Ma et al. proposed a core technology topic recognition method that extracts the SAO structures of scientific and technological papers and patent text data [74]. Ma et al. constructed an SAO co-occurrence network and carried out classification to identify subversive technologies [75].

6. Specific Application of Topic Prediction Models and Algorithms

Of the 96 sample studies, 55 address topic prediction. Classification analysis shows that the models and algorithms used fall mainly into the following two categories: (1) Deep learning or machine learning analysis models and algorithms based on time series. Twenty-one documents apply this method, accounting for 38%. (2) Link prediction models and algorithms based on complex networks. Six documents apply this method, accounting for 11%. These two categories are presented and analyzed in the following subsections.

6.1. Deep Learning or Machine Learning Analysis Models and Algorithms Based on Time Sequence

Time series analysis takes a sequence of numerical observations in chronological order, processes it with mathematical statistics, and forecasts its development trend. It is often combined with the LSTM model (six documents), the Markov model (five documents), the SVM model (four documents), exponential smoothing (three documents) and ARIMA (three documents).

6.1.1. LSTM Model

The LSTM model is a long short-term memory network proposed by Hochreiter and Schmidhuber. On the basis of the recurrent neural network (RNN), it introduces a forget gate, an input gate and an output gate, and handles long-distance dependencies in time series data through gate control. LSTM is used in a large number of sequence learning tasks, and it is also one of the most widely applied methods in trend prediction.
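
For readers who want the gating mechanism in symbols, one commonly used formulation of an LSTM cell is given below. The notation is an assumption introduced here for illustration ($x_t$ is the input at time $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid and $\odot$ the element-wise product); the reviewed studies may use variants of this cell.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate controls how much of the previous cell state is retained, which is what allows the network to carry information across long time lags.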

Li incorporated the three features of topic intensity, keyword frequency and number of documents, and proposed a technical topic prediction model based on the LSTM neural network [53]. Zhu et al. used topic relevance, novelty degree and migration degree as the predictors of the LSTM model in order to predict the evolution trends of privacy research topics [36]. Xu and Gui established an EMD-LSTM technology prediction model based on empirical mode decomposition (EMD) and LSTM to predict developments in the robotics field [5]. Huo et al. constructed a subject topic popularity prediction model, including subject topic extraction, subject topic popularity calculations, and subject topic popularity prediction, and predicted topics’ popularity in the LIS field [76]. Huo et al. carried out the evolution prediction of LIS subject topics at home and abroad based on the TPP-LSTM subject popularity prediction model [9]. Gui proposed an EEMD-LSTM technology prediction model combining ensemble empirical mode decomposition (EEMD) and LSTM to predict the key technologies in the field of lithium-ion battery use in new energy vehicles [59].

6.1.2. The Markov Model

Depending on whether the system’s state is fully observable, and whether the system is autonomous or controlled, the most common Markov models can be divided into four categories: the Markov Chain, the Hidden Markov Model (HMM), the Markov Decision Process (MDP), and the Partially Observable Markov Decision Process (POMDP). As a statistical model, the Markov model is also applied in topic prediction.
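
The simplest of these four models, the Markov chain, can be estimated directly from an observed sequence of states. The sketch below is illustrative only: the state labels and the function name `transition_probabilities` are hypothetical, and it uses plain maximum-likelihood counts with no smoothing.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Estimate first-order Markov transition probabilities from a sequence."""
    counts = defaultdict(Counter)
    # Count each observed transition current -> next.
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    # Normalize each row of counts into probabilities.
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


# Hypothetical yearly status of one research topic.
history = ["rising", "rising", "stable", "rising", "stable", "declining"]
probs = transition_probabilities(history)
```

Given the current state of a topic, the corresponding row of `probs` gives the estimated distribution of its state in the next period.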

Based on the LDA model, Mao proposed a method to predict subject evolution trends in combination with the Markov Chain [44]. Chen et al. used the Hidden Markov Model (HMM), a doubly stochastic process, to quantitatively predict the technical trends of marine diesel engines [39]. Lin combined the LDA theme model with the HMM model to predict research themes and development trends related to 3D printing technology [77]. Tian used Markov chain and hidden Markov models to build an evolution prediction model of each theme’s persistence and degree of interest, and to predict topics in the field of cloud computing [1].

6.1.3. SVM Model

SVM is a common machine learning algorithm. It adopts the SRM (structural risk minimization) strategy and has high prediction accuracy, and it is often used in various classification and regression tasks.

Li et al. used NSF graphene project data as the data source to propose a new trend prediction method for fund projects based on time series analysis and the SVM model [37]. Xu and Wang proposed a support vector machine prediction model optimized with an improved particle swarm optimization algorithm to predict development trends in the graphene field [28].

6.1.4. Exponential Smoothing Method

Exponential smoothing is a forecasting method that assumes the trend of a time series is stable and regular, so that its past and present behavior can reasonably be extended into the future; recent observations are weighted more heavily than older ones. It is one of the most important social prediction methods.
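
The basic recursion can be sketched as follows. Note that this is single (first-order) exponential smoothing, shown only to illustrate the idea; it is not the cubic variant used in some of the studies cited in this subsection. The smoothing factor `alpha` and the sample counts are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Single exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [series[0]]  # Initialize with the first observation.
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical yearly document counts for one topic.
counts = [10, 12, 16, 20]
levels = exponential_smoothing(counts, alpha=0.5)
forecast = levels[-1]  # One-step-ahead forecast is the last smoothed level.
```

A larger `alpha` makes the forecast react faster to recent changes, while a smaller one yields a smoother, slower-moving estimate.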

Song and Zhu used the cubic exponential smoothing method to predict the development trends of cutting-edge technology topics in the field of artificial intelligence [30]. Song and Ran used the exponential smoothing method to predict the future development trends in research topics related to smart libraries [78].

6.1.5. ARIMA Model

ARIMA, the autoregressive integrated moving average model, is a classic time series analysis method that uses differencing to transform a non-stationary series into a stationary one.

Yue, Zhou and Chen proposed a subject topic trend prediction method based on the ARIMA model to predict research topic trends in the field of information construction [40]. Yue, Liu and Hu used the ARIMA model to measure the change trends in the intensity of topics in the US stem cell field [3]. Cui et al. adopted the ARIMA model and the exponential smoothing method for fitting and predicting in the field of underwater information perception technology [41].
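
The differencing step on which ARIMA relies can be illustrated with a few lines of Python. This sketch covers only the "integrated" part of the model (the autoregressive and moving-average terms are not fitted here), and the function name `difference` and the sample series are hypothetical.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    for _ in range(order):
        # Replace the series by the changes between consecutive values.
        series = [b - a for a, b in zip(series, series[1:])]
    return series


# Hypothetical topic intensity series with an accelerating trend.
trend = [3, 5, 8, 12, 17]
first = difference(trend)             # still trending upward
second = difference(trend, order=2)   # constant, i.e., stationary
```

In this toy case the series becomes constant after two rounds of differencing, which corresponds to a differencing order of d = 2 in ARIMA(p, d, q) notation.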

6.1.6. Other Models and Algorithms

In addition to the model methods mentioned above, naive Bayes, regression models, TRIZ theory and other methods have also been applied.

For example, Li and Wu used the dynamic bending algorithm to conduct a trend analysis of the time series of theme popularity in the literature within the field of innovation management [79]. Nie et al. used algorithms including linear regression, support vector machine, radial basis function regression and radial basis function neural network to predict the time series development trends of the animal genetics and breeding discipline [80]. Zheng used three classification algorithms (naive Bayes, support vector machine and XGBoost) to build a prediction model for multi-source data fusion [4]. With the Word2Vec model and TRIZ theory, Liu put forward a hierarchical classification model for patent inventions based on word vectors to identify technological maturity and development, and thereby predict technology development trends [81]. Tan applied TRIZ theory to the study of technology evolution, noting the strong subjectivity of the technology evolution analysis process, and pointed out that the details of the technology evolution process can be understood by analyzing the technology evolution diagram constructed by this method [60].

6.2. Application of Link Prediction Model

Link prediction uses known information, such as network nodes and network structure, to estimate the likelihood that a link will form between two nodes that are not yet connected by an edge. This method has been widely used in the prediction of emerging technologies in recent years.
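
Three widely used neighborhood-based similarity indices (common neighbors, the Jaccard coefficient and the Adamic–Adar index) can be sketched as follows. This is a minimal illustration on a hypothetical undirected, unweighted network; the helper names and node labels are assumptions, and the weighted variants used in some of the reviewed studies are not covered.

```python
import math


def neighbors(edges):
    """Build an adjacency map {node: set of neighbors} from undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    # Each common neighbor z contributes 1 / log(degree(z)); a common
    # neighbor of two distinct nodes has degree >= 2, so the log is positive.
    return sum(1 / math.log(len(adj[z])) for z in adj[u] & adj[v])


# Hypothetical topic co-occurrence network; A and B are not yet linked.
edges = [("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("D", "E")]
adj = neighbors(edges)
score_cn = common_neighbors(adj, "A", "B")
score_jc = jaccard(adj, "A", "B")
score_aa = adamic_adar(adj, "A", "B")
```

Unconnected node pairs are ranked by such scores, and the highest-ranked pairs are taken as the most likely future links.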

Liu proposed three similarity-based link prediction methods, based respectively on weighted common neighbors, inter-node resource interaction and weighted inter-node topology, and conducted a comparative analysis [81]. Ma applied the link prediction concept to predict the potential links that have not yet appeared in a technical network, and used the node similarity calculation method to identify the technical opportunities of using gold nanoparticles in cancer treatment [82]. Liu et al. used the Adamic–Adar (AA) index, a similarity index based on the degrees of common neighbors, to predict the future link probability of unconnected node pairs in a deep learning domain knowledge network [47]. Wu et al. proposed a link prediction module based on a topic co-occurrence network, and analyzed the weighted and non-weighted networks using multiple similarity-based indicators to predict related topics in the field of diabetes drugs [6]. Huang et al. introduced the complex network link prediction method and the neural network algorithm to build an approach to the dynamic prediction of emerging technology development networks [83]. Zhang et al. constructed a weighted co-occurrence network of technical topic fields, and proposed an improved technology prediction method based on the link prediction algorithm combined with technical life cycle theory [84].

7. Indicator System and Effectiveness Verification

In order to better understand and apply the above models and algorithms, this research further analyzes the indexes and the methods involved to verify their effectiveness. The indicators used in the models and algorithms were extracted and assembled into an index system by scanning the sample papers obtained (see Appendix A). The most commonly used methods for verifying the validity of indicators were likewise summarized from the literature. The indicators and validation methods are discussed in the following two subsections.

7.1. Indicator System

In the identification and prediction of topics, the scientific and reasonable construction of relevant indicator systems and feature selection is a key process. Based on a literature analysis, a more systematic and comprehensive indicator system has been built with the following five categories: basic scalar, topic popularity and intensity, emerging degree, cross diffusion and network structure, as shown in Table 2. The table also lists the specifics of each category (see Secondary Category column), the corresponding indicator of the measurement used and the reference source. Due to the large volume of master’s and doctoral dissertations and the broad coverage of system models and research methods, this paper mainly summarizes the characteristics of the indicators involved in 67 journal papers, excluding dissertations.

Among these, the basic scalar feature indicators are generally obtained through more direct descriptive quantitative statistics. Time series prediction is conducted based on word frequency or citation frequency. For example, Nie et al. used word frequency time series integration to predict popular topic words for a period of time in the future through subject word mining [80]. Chen et al. conducted time series evolution analyses using the quantitative trend evolution model of technology prediction to predict new technology and development trends in the area of volatile organic compounds [68]. Zhu et al. pointed out that LSTM and other time series prediction models can be applied to topic correlation and prediction under multiple time series windows [36]. The characteristic indicators in the basic scalar category are changes in subject words and word frequency in the literature or patents.

Topic popularity is also called theme intensity in some studies. The two terms overlap and are often used interchangeably. They are mostly used to reflect the weight and direction of a theme. For example, Huo et al. proposed a subject theme popularity calculation index based on journal impact factors to carry out theme evolution prediction for LIS disciplines [9,76]. Li et al. pointed out that topic intensity reflects research focus and concerns, and that topic words and weight distribution can be derived from the topic probability identification model [37].

The novelty and growth rate of topics are usually considered together to measure their degree of emergence. For example, Cao et al. selected 26 relevant emerging technology themes as the analysis object for breakthrough prediction based on novelty and growth rate [85]. Ren et al. built an evaluation system for the degree of emergence of a given technology topic, focusing on the three characteristics of novelty, growth and originality, and summarized the top 10 emerging technologies in the field of “unmanned combat platforms” [58].

With the development of society and of the disciplines themselves, multi-point breakthroughs and the cross-integration of disciplines are becoming more and more common. In research on interdisciplinary topics, Dong et al. pointed out that analyzing the distribution of interdisciplinary topics and the role of interdisciplinarity is of great significance for exploring the frontier of disciplinary development and promoting interdisciplinary knowledge fusion [86].

Scientific relevance represents the flow of knowledge from the scientific literature to patented technology. Li et al. argue that the greater the degree of scientific relevance, the more science-based the technology is, and the more disruptive it will be [87]. Zhu et al. proposed that the theme migration degree describes the probability of a topic moving from the current time window to the next one, reflecting the evolution trend and timing characteristics of the research theme [36].

In addition, some studies are based on the construction of network relationships; most of their features relate to the topology of the network or are derived from traditional link prediction methods. Building on the centrality characteristics of complex networks, these studies combine indicators such as the centrality of topic nodes and the similarity of local neighbor nodes or walking paths [47,74,83].

7.2. Validation of Method Effectiveness

Validating the effectiveness of different models and choosing the best methods to conduct topic identification and prediction is also significant. The methods for verifying effectiveness can generally be divided into the following three categories, on the basis of the indicator system summarized above:

Time series models for prediction based on quantitative indicators.

The prediction indicators involved in such tasks are mostly the basic scalar indicators described above, such as scale, word frequency, popularity intensity or novelty, so these are essentially regression tasks. They are therefore evaluated with the error statistics commonly used in time series analysis and prediction.

For example, Huo and Dong used LSTM neural networks to capture the time series characteristics of the evolution of subject topic popularity, and built a subject topic popularity prediction model based on LSTM [9]. Cui et al. used the ARIMA model and the exponential smoothing method to predict the trends of topics characterized by topic intensity and novelty [88]. Liu et al. identified frontier themes with topic emergence degree and an attention index, and studied the lag effect of topic diffusion evolution using an autoregressive distributed lag model [46].

Most of these studies use root mean square error (RMSE), mean absolute error (MAE), relative error (RE) and other indicators to evaluate prediction accuracy. RMSE is the square root of the sum of squared deviations between the predicted and actual values divided by the number of predictions. MAE is the sum of the absolute deviations between the predicted and actual values divided by the number of predictions, i.e., the average absolute error. RE is the ratio of the difference between the predicted value and the actual value to the actual value. The smaller the value of these indicators, the higher the prediction accuracy of the model. Some studies also use R² to represent the prediction effect of the model: the closer R² is to 1, the better the goodness of fit of the model.
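
These error statistics can be computed in a few lines. The sketch below follows the standard definitions; the function names and the sample values are hypothetical, and the signed form of the relative error is an assumption (some studies report its absolute value or a percentage).

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def relative_errors(actual, predicted):
    """Signed relative error of each prediction (actual values must be non-zero)."""
    return [(p - a) / a for a, p in zip(actual, predicted)]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted topic popularity values.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]
scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "R2": r_squared(actual, predicted),
}
```

Because RMSE squares the deviations, it penalizes large individual errors more heavily than MAE, which is one reason studies often report both.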

Link prediction models based on complex networks.

Such tasks are mostly based on multi-relationship and multi-type networks, such as co-word and citation networks. Topics are identified or predicted through the common indicators of centrality or link prediction tasks, such as the common neighbor indicators and the Jaccard coefficient. For such models, the AUC (area under the ROC curve) is the most commonly used accuracy measure.

For example, Liu took research in the field of deep learning as the object, took related topics as the nodes, and predicted the possibility of emerging topics through link prediction [47]. Zhang took the field of virus nucleic acid detection technology as an example, constructed a weighted co-occurrence network of technology subject fields, and proposed an improved technology prediction method based on the link prediction algorithm [84]. Huang et al. introduced the method of complex network link prediction combined with a neural network algorithm to build a dynamic prediction network for emerging technology development based on the patent data of Derwent [83]. The AUC is the area under the ROC curve. The ROC curve is a characteristic curve based on a confusion matrix, wherein the abscissa is the false positive rate and the ordinate is the true positive rate. When evaluating the effectiveness of different methods, the area under the ROC curve is the most intuitive overall measure of the accuracy of each index in the link prediction algorithm.
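
In link prediction, the AUC is often computed through its equivalent pairwise interpretation: the probability that a held-out true link receives a higher score than a non-existent link, with ties counted as one half. The sketch below assumes this formulation; the function name `auc` and the sample scores are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0   # true link ranked above the non-existent link
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical similarity scores from a link prediction index.
held_out_links = [0.9, 0.6, 0.4]      # links that really appeared later
nonexistent_links = [0.5, 0.4, 0.1]   # node pairs that stayed unconnected
value = auc(held_out_links, nonexistent_links)
```

A value of 0.5 corresponds to random ranking, and values closer to 1 indicate that the index separates true links from non-existent ones more reliably.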

Classification models for topic recognition based on deep learning.

The process of identifying topics is also a classification problem in essence. Whether the task is to classify topics into corresponding fields, or to decide whether a topic is emerging, subversive or breakthrough in nature, it can be included in this category. The evaluation indicators for such tasks basically follow those of traditional machine learning classification models.

For example, Xu et al. combined deep learning, machine learning and other methods to identify technological innovation opportunities within SCI papers and Derwent patent data. The BERT-LSTM text classification model proposed by Xu and Gui is more applicable to the classification of papers and patent topics, and has led to improvements in accuracy [89]. Xu and Gui proposed a multi-modal input text classification model based on deep learning for topic recognition. The results of topic recognition in the field of robotics are consistent with the conclusions given in the 2018 report of Clarivate Analytics, and the prediction accuracy of this model is also better than that of other classification models. Common indicators of machine learning and deep learning classification models have been applied, such as precision, recall and the F1 value.
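
Precision, recall and F1 for one class of interest can be computed as follows. This is a minimal sketch of the standard definitions for a single positive class; the function name, the labels and the sample predictions are hypothetical.

```python
def precision_recall_f1(actual, predicted, positive):
    """Precision, recall and F1 for the class labeled `positive`."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: is each topic "emerging" or not?
actual = ["emerging", "emerging", "other", "other", "emerging"]
predicted = ["emerging", "other", "emerging", "other", "emerging"]
precision, recall, f1 = precision_recall_f1(actual, predicted, "emerging")
```

Precision measures how many of the topics flagged as emerging really are emerging, recall measures how many truly emerging topics were found, and F1 is their harmonic mean.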

8. Conclusions

This paper offers a comprehensive survey of the models and algorithms used in research on topic identification and trend prediction based on CNKI over the last 10 years and beyond. It systematically summarizes the overall state of the application of models and algorithms in topic identification and prediction analysis, and describes the specific application scenarios of the models and algorithms from the points of view of age distribution, type, number of categories, and analysis of specific applications. The main conclusions are as follows.

Research on topic identification and prediction has become an important topic since 2019. More than half of the research has used one type of model or algorithm, while more than one-third of the research used two types of models and algorithms. The main themes of interest in this area are topic identification, topic prediction, technology prediction and patent analysis, including analysis methods such as LDA, Word2Vec, K-means, VOLDA, PLDA and other approaches.

In the application of topic recognition models and algorithms, the approaches ranked from most to least frequently used are the LDA topic model and its derived models, machine learning and deep learning, text mining-based methods, and reference relationship-based analysis methods. Two kinds of methods are mainly used in trend prediction: the first is deep learning or machine learning based on time series, and the second is link prediction algorithms based on complex networks. The methods involved are summarized above.

The characteristics of the indicators used in the sample literature are summarized, and a primary indicator system including five characteristic dimensions is proposed, namely, basic scalar, topic popularity and intensity, emerging degree, cross diffusion and network structure, covering 15 secondary indicators. The methods of validity verification for the topic recognition and prediction models are shown.

In short, models and algorithms based on machine learning and deep learning have become the most popular application. Internationally, these methods have also been used widely to conduct data mining and text analysis in multiple subjects [90,91]. The main challenges met in modeling and applying algorithms include accuracy, computability, reliability, viability and sustainability [92,93].

8.1. Academic and Practical Contributions

This research contributes in three aspects. First, this study is a literature review undertaken in a systematic and reproducible manner, demonstrating a standardized research process. Compared to former studies [94], it follows the same standardized research process while providing a more fine-grained, detailed and in-depth analysis of the content of the literature. This can theoretically guide the classification and application of algorithms and models, and allows us to explore methods and techniques for the analysis of multiple types of text or data. Based on this, researchers can better cope with the emergence of big data in text classification and promote cutting-edge machine learning and deep learning techniques. Second, this study comprehensively shows the progress of algorithm and modeling applications in China, and can thus be used as a reference for further research in multiple fields. Unlike research focusing on specific fields [95,96], this research addresses the application of models and algorithms in multiple fields, which can help to elucidate the overall progress made in text analysis research. Third, this study provides ideas and methods for selecting suitable models and algorithms, and also provides a reference for constructing an indicator system to assess the effectiveness of a model or algorithm. These indicators are also useful for reconfiguring and optimizing algorithms and models [97].

8.2. Limitations and Suggestions

The limitation of this paper is that the comparative analysis of the various methods and the validation of the evaluation indicators for algorithms and models are still insufficient; addressing this would make the features of each type of algorithm and model more prominent and provide more operational guidance. At the same time, it would be useful to include more material from other databases to enable comparisons between research in China and in other nations. In future studies, researchers should obtain a larger sample of the literature from several databases, and consider the uniqueness of various identification and prediction models and algorithms, comparing their advantages and disadvantages so as to choose the best methods to solve research questions. In addition, the indicators summarized here should be further applied to algorithms and models in the future, so that their validity can be verified.

Author Contributions

Conceptualization, S.G. and L.S.; methodology, S.G., L.S. and X.L.; software, X.L.; validation, S.G. and L.S.; formal analysis, S.G.; investigation, S.G. and L.S.; resources, S.G.; data curation, S.G. and X.L.; writing—original draft preparation, S.G., L.S. and X.L.; writing—review and editing, S.G., L.S. and X.L.; visualization, S.G. and X.L.; supervision, L.S.; project administration, L.S.; funding acquisition, L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data used in this research can be obtained from the corresponding authors upon request.

Acknowledgments

The authors are grateful to the National Demonstration Center for Experimental Library and Information Science Education.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

This table includes the numbering of the sample of literature analyzed, the algorithms and models used and the domain to which they belong, with references to the original text.

Table A1. Application statistics of subject identification and prediction models and algorithms.

No.	Topic Identification Models and Algorithms	Topic Prediction Models and Algorithms	Domain	References	No.	Topic Identification Models and Algorithms	Topic Prediction Models and Algorithms	Domain	References
1	LDA (latent Dirichlet allocation) model, keywords	TPP-LSTM (topic popularity prediction model based on long short-term memory) topic hot prediction model	library and information science	Huo et al. (2021) [9]	49	ETM (embedding theme model)		epidemic policy of China and US	Zhou and Wu (2021) [98]
2	LDA model	ARIMA (autoregressive integrated moving average model) model	information construction	Yue et al., (2019) [3]	50	Relim algorithm	machine learning algorithm	animal genetics and breeding	Nie (2019) [99]
3	LDA model	LSTM (long short-term memory) model	privacy research	Zhu et al., (2020) [36]	51	LDA model		digital journalism	Chao et al., (2021) [38]
4	LDA model	Link prediction	deep learning	Liu et al., (2019) [47]	52	BERT-LDA model		agricultural robot	Wang and Gao (2021) [100]
5	LDA model, Word2vec (word to vector)	LSTM model	technical topic	Li (2019) [53]	53	Leiden community discovery algorithm		additive manufacturing	Li, Guo et al., (2021) [101]
6	LDA model, TP-JIF (topic popularity computing model based on journal impact factor) model	TPP-LSTM (topic popularity prediction model based on long short-term memory) topic hot prediction model	library and information science	Huo et al. (2021) [76]	54	Generative topographic mapping, BERT-LSTM model		new energy vehicles	Xu and Gui (2021) [89]
7	VOLDA (variable online-LDA) model	ESG (EEMD-SVM-GMDH), EEMD (ensemble empirical mode decomposition), SVM (support vector machine), GMDH (group method of data handling)	topic popularity	Pei (2018) [50]	55	Word2Vec	TRIZ (theory of the solution of inventive problems)	patent text	Liu (2017) [81]
8	keywords analysis	grey prediction model	information science	Xu et al., (2016) [102]	56	LDA model		network public opinion	Cui (2010) [88]
9	Relim algorithm	time series integration method	animal genetics and breeding	Nie et al., [80]	57	SAO		hematopoietic stem cells	Ma et al., (2021) [74]
10	Co-LDA (word co-occurrence analysis combined with LDA) model	link prediction	diabetes drugs	Wu et al., (2021) [6]	58		link prediction of weighted network, neural network	application of perovskite materials	Huang et al., (2019) [83]
11	LDA model	time series analysis, SVM model	graphene field	Li et al., [37]	59	LDA model		patent	Chen (2015) [39]
12	HDP (Hierarchical Dirichlet Processes)	Naive Bayes, SVM, XGBoost (extreme gradient boosting) classification algorithm	massive news data source	Zheng (2019) [4]	60	LDA-kmeans	EMD (empirical mode decomposition)-LSTM technology prediction model	robot technology	Xu and Gui (2020) [89]
13		causal model	panel data	Du et al. (2016) [103]	61	conditional random field, domain vocabulary	link prediction technology, opportunity identification	gold nanoparticles	Ma (2016) [82]
14		citation curve	stem cells	Cao et al. (2020) [85]	62	LDA model	Hidden Markov Model	marine diesel engine	Chen et al. (2018) [43]
15	PLDA (probabilistic LDA) model	SVM, improved particle swarm optimization algorithm	graphene	Xu and Wang (2019) [28]	63	analytic hierarchy process		industrial robot	Cao et al. (2022) [104]
16	LDA model	Markov chain, HDP	NIPS papers	Mao (2016) [44]	64	text classification, text clustering		3D printing	Zhao (2017) [105]
17	LDA model	curve fitting prediction method	fuel cell	Bai et al. (2020) [106]	65	SAO, similarity calculation, clustering analysis		artificial intelligence	Fan (2020) [72]
18	LDA model	HMM (hidden Markov model)	3D printing	Lin (2020) [77]	66	patent citation, Bass model		solar photovoltaic power generation	Yu et al. (2021) [63]
19	LDA model, controlled vocabulary	AR (autoregressive) model	NLP	Xue (2013) [42]	67	fuzzy consistent matrix		smart and traditional cellphone, quantum and optical fiber communication, additive manufacturing	Li et al. (2021) [87]
20	Co-word analysis, community discovery algorithm	Emergence scores (EScores) indicator	synthetic biology	Meng (2018) [67]	68	patent citation, questionnaire		speech recognition	Wang et al. (2022) [64]
21	IOLDA (improved online LDA) model	ESA (EEMD-SVR-AdaBoost), EEMD (ensemble empirical mode decomposition), SVR (support vector regression) model	scientific and technological information	Luo (2018) [51]	69	citation, genetic backward forward path, technical discontinuity theory		surgical robot	Wu et al. (2022) [65]
22	LDA model	Emerging index, amount index	artificial intelligence	Liu et al. (2018) [107]	70	LDA model, mean value, linear regression fitting	ARIMA model, word2vec	stem cell field	Yue et al. (2020) [3]
23	LDA model	structural hole, Delphi method	chip	Xuan (2020) [108]	71	subject words relation recognition, BERT model, TRIZ theory		data mining	Tan (2019) [60]
24	Co-word matrix	Blondel partition algorithm, node coincidence degree	information science	Kui et al. (2016) [66]	72	SAO, catastrophe theory		stem cells	Ma et al. (2022) [75]
25	LDA model	Markov model, hidden Markov model	cloud computing	Tian (2021) [1]	73	DSE (discovery–select–evaluation) identification ideas			Deng et al. (2022) [109]
26	Co-occurrence matrix		clinical medicine	Chen et al. (2017) [110]	74	TF-IDF, K-means++		unmanned combat platform in ship field	Ren et al. (2022) [58]
27	LDA model, K-means	exponential smoothing method	artificial intelligence	Song and Zhu (2021) [30]	75	LDA model, Word2vec		blockchain	Chen et al. (2022) [39]
28	LDA model	BP (back propagation) neural network, SVR machine learning algorithm	emerging topics	Ye et al. (2022) [111]	76	LDA model, sentiment analysis method, entropy method, CRITIC (criteria importance through intercriteria correlation) method		intelligent connected vehicle	Tang and Qiu (2021) [112]
29		topic prediction using basic research results		Wu et al. (2022) [113]	77	Chunk-LDAvis toolkit		nano agriculture	Liu et al. (2019) [46]
30	PLDA, product technical attribute words-technical feature vocabulary	comment topic identification, multi-dimensional analysis of technical attributes	smart phones	Wu et al. (2021) [114]	78	CSToT (content similarity–topics over time)		domestic information science research	He et al. (2018) [115]
31	LDA model	topic model, Sen’s slope estimation method, Mann–Kendall test, exponential smoothing method	smart library	Song and Ran (2022) [78]	79	LDA model, Rao–Stirling index		nanotechnology	Han et al. (2018) [116]
32		network topology evolution model, link prediction		Liu (2016) [117]	80	hLDA (hierarchical latent Dirichlet allocation)		library science and information science	Wang (2014) [49]
33		Knowledge–technology–environment three-dimensional framework	perovskite solar battery	Xie (2019) [118]	81	Word2Vec, Fast Unfolding algorithm, PageRank algorithm		governor mailbox data	Teng et al. (2022) [55]
34	LDA model, Word2vec	topic intensity and content evolution analysis, expert evaluation	artificial intelligence	Yang et al. (2022) [54]	82	PathSelClus (integrating meta-path selection with user-guided object clustering) algorithm		genetically engineered vaccine	Xu et al. (2019) [89]
35	co-occurrence matrix	time series evolution	volatile organic compounds	Chen et al. (2020) [68]	83	LDA model		artificial intelligence	Li et al. (2022) [48]
36	SAO (subject–action–object)	technology evolution model of patent feature mining	slow and controlled release fertilizer	Li (2020) [73]	84	CO-LDA model		online medical review	Gao et al. (2019) [27]
37	patent classification code co-occurrence analysis	multidimensional technology correlation trend evolution model	medicine	Wang et al. (2021) [69]	85	scientific metrological characteristics of IDR theme		artificial intelligence	Dong et al. (2022) [86]
38	multidimensional scaling analysis, K-means (K-means clustering algorithm)		artificial intelligence	Gao et al. (2020) [56]	86	LDA model, science and technology citation and text relevance		Bruton’s tyrosine kinase inhibitor	Li (2020) [62]
39		Multi-level comprehensive index evaluation system	solid oxide fuel battery	Hou and Zhu (2014) [119]	87	VOS (VOSviewer) clustering, strategic coordinate analysis	ARIMA model, exponential smoothing method	underwater information perception technology	Cui et al. (2022) [88]
40		polynomial regression model	home appliance industry	Wu et al. (2022) [120]	88	AP (affinity propagation) nearest neighbor propagation clustering algorithm	time series clustering	innovation management	Li and Wu (2019) [79]
41	TF-IDF (term frequency–inverse document frequency) algorithm, K-means	keywords patent map prediction method	industrial wastewater treatment	Huang (2019) [57]	89	OVL (overlap function) superposition algorithm, linear weighting method, K-means algorithm		industrial robot	Tian and Zhang (2021) [121]
42	topic identification on knowledge flow perspectives	quantitative technology prediction frame	harmonic reducer	Liao (2017) [122]	90	citation content analysis		carbon nanotube fiber	Zhu and Leng (2014) [61]
43	SAO	SAO, morphological analysis prediction model	dye-sensitized solar battery	Guo (2016) [71]	91	K-means, LDA model, GSDMM (Gibbs sampling algorithm for the Dirichlet multinomial mixture model)	time series analysis	artificial intelligence	Zhang et al. (2022) [97]
44	BERT (bidirectional encoder representations from transformers)-LSTM identification model	EEMD-LSTM technology prediction model	lithium-ion battery for new energy vehicles	Gui (2021) [59]	92	LDA model, Rao–Stirling index		solar photovoltaic	Han et al. (2021) [123]
45	LDA2vec (LDA combined with word2vec), text similarity theory and calculation method		antidepressants	Zhang (2019) [52]	93	LDA model, co-word analysis		intelligent technology	He (2021) [70]
46	LDA-GS model		graphene industry	Wu et al. (2021) [124]	94	LDA model, K-means, co-word analysis		library and information science and pedagogy	Ruan and Xia (2018) [29]
47		technology prediction model based on knowledge evolution	molecular breeding	Li (2021) [125]	95	calculation method of frontier topic characteristic index		treatment and prognosis of cardiovascular disease	Fan et al. (2018) [126]
48		link prediction weighted co-occurrence matrix, technology life cycle theory	nucleic acid detection technology	Zhang et al. (2021) [84]	96	LDA model, patent–paper hybrid co-citation analysis, expert interview		intelligent security technology	Zhu (2020) [92]

References

Tian, Y.D. Research on the Evolution and Prediction of Knowledge Topic. Inf. Sci. 2021, 6, 123–133.
Yang, J.Q.; Wei, Y.H.; Huang, S.Z.; Luo, W.; Lu, W. Research Review on Emerging Topic Identification Based on Scientific Literatures. Inf. Sci. 2020, 8, 159–163, 177.
Yue, L.X.; Liu, Z.Q.; Hu, Z.Y. Evolution Analysis of Hot Topics with Trend-Prediction. Data Anal. Knowl. Discov. 2020, 6, 22–34. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7i8oRR1PAr7RxjuAJk4dHXov3eiaK-qy0V0MEwvWDcF540nJEGMnun7si7FR8zZ3ek&uniplatform=NZKPT (accessed on 10 June 2023).
Zheng, X.Q. Research on System Construction Based on Emerging Topic Detection and Prediction Method; Xiamen University: Xiamen, China, 2019; pp. 43–54.
Xu, X.G.; Gui, M.Z. Identifying Technology Innovation Opportunities Based on GTM Reverse Mapping. Inf. Stud. Theory Appl. 2021, 6, 146–153, 198.
Wu, S.N.; Tian, R.N.; Pu, H.J.; Liang, W.Q.; Zhang, Y.F.; Yu, Q.; He, P.F. Research on the Prediction Method of Related Topics in the Medical Field Based on Social Media. Data Anal. Knowl. Discov. 2021, 12, 98–109. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iJTKGjg9uTdeTsOI_ra5_Xf80tvfaHzPNRCyn4BN7A8H7prX-1A3ec2RlY9xqy9u6&uniplatform=NZKPT (accessed on 10 June 2023).
Wang, C.Y.; Koo, L.J.; Yuan, J.F. A study on the classification of universities based on discipline characteristics: An example of “double first-class” universities. China High. Educ. Res. 2022, 351, 38–44.
Khaire, U.M.; Dhanalakshmi, R. Stability of feature selection algorithm: A review. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 1060–1073.
Huo, C.G.; Dong, K.; Si, X.Y. Evolution Analysis and Prediction of Scientific Topic Popularity in the Field of LIS. Doc. Inf. Knowl. 2021, 2, 35–47, 57.
Wang, Y.P. Research Progress of Scientific and Technical Literature Topic Detection and Evolution Based on Topic Model in China. Libr. Inf. Serv. 2016, 3, 130–137.
Kai, Q.; Dou, Y.X. Survey of Disruptive Technology Identification. J. Intell. 2021, 11, 31–38. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iy_Rpms2pqwbFRRUtoUImHdOR7NvLKjZ7LPyq3EM1bLXtZ4TSi4gYzC4yFBDYNafB&uniplatform=NZKPT (accessed on 10 June 2023).
Zhang, J.W.; Dong, Y. Research Progress of Disruptive Technical Identification Indicators. Inf. Stud. Theory Appl. 2020, 6, 194–199.
Qiao, Y.L.; Huang, Y.; Zhang, S.; Wang, X.F. The Identification of Disruptive Technology from a Multi-dimensional Perspective: Research Progress and Future Prospects. J. Intell. 2022, 8, 45–52. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iJTKGjg9uTdeTsOI_ra5_XYdCGNxWtU8QthnBzxOQtLhYpHolN8oHXTTQIxuntxNM&uniplatform=NZKPT (accessed on 10 June 2023).
Park, J.; Lee, H.J.; Cho, S. Hot topic detection in central bankers’ speeches. Expert Syst. Appl. 2023, 230, 120563.
Savin, I.; Ott, I.; Konop, C. Tracing the evolution of service robotics: Insights from a topic modeling approach. Technol. Forecast. Soc. Chang. 2022, 174, 121280.
Li, L.P.; Zhao, X.B. Review on Topic Discovery Methods Based on Text Clustering. Inf. Res. 2020, 11, 121–127. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7i8oRR1PAr7RxjuAJk4dHXoirdakr6rX4kUmaZQhq2PnSRWn4lxXUdckURk3QC9M-d&uniplatform=NZKPT (accessed on 10 June 2023).
Liu, X.L.; Tang, Z.Y. Progress on Methods of Emerging Technology Topics Identification. Libr. Inf. Serv. 2020, 11, 145–152.
Luo, P.; Luo, M.; Li, F.; Qi, X.; Huo, A.; Wang, Z.; He, B.; Takara, K.; Nover, D.; Wang, Y. Urban flood numerical simulation: Research, methods and future perspectives. Environ. Model. Softw. 2022, 156, 105478.
Shelton, R.C.; Philbin, M.M.; Ramanadhan, S. Qualitative research methods in chronic disease: Introduction and opportunities to promote health equity. Annu. Rev. Public Health 2022, 43, 37–57.
Xu, S.; Nowamooz, H.; Lai, J.; Liu, H. Mechanism, influencing factors and research methods for soil desiccation cracking: A review. Eur. J. Environ. Civ. Eng. 2023, 3, 1–25.
Zhang, X.; Fang, S.; Wang, C.H. Review on Technology Evolution Research from Patent Citation Perspective. Sci. Sci. Manag. Sci. Technol. 2016, 3, 58–67. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7ijP0rjQD-AVm8oHBO0FTadljcow_eXGgKMEvDsj9YhRltpkI3SFHHp3VmYIua-TbX&uniplatform=NZKPT (accessed on 10 June 2023).
Zhou, Y.; Liu, H.L.; Liao, L.; Xue, L. Literature Review of Quantitative Technology Foresight Methods Based on Topic Modeling. Sci. Technol. Manag. Res. 2017, 11, 185–196. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iAEhECQAQ9aTiC5BjCgn0RmB4NtpdPbcvLESyvJs_62iRWk01nAtHn56clFBDlm3M&uniplatform=NZKPT (accessed on 10 June 2023).
Liu, Q.Y.; Wu, X.N. Review on Disruptive Technology Discovery Methods. Libr. Inf. Serv. 2017, 7, 127–136.
Wang, C.; Xu, H.Y.; Fang, S. Progress of Approaches for Identification and Forecasting of Disruptive Technologies. Sci. Technol. Prog. Policy 2018, 9, 152–160. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7i0-kJR0HYBJ80QN9L51zrPz0d_hXdda6BS1nZFoPQi2jI7Ig1FkXPfFMsz_DuKaae&uniplatform=NZKPT (accessed on 10 June 2023).
Coelho, L.B.; Zhang, D.; Van Ingelgem, Y.; Steckelmacher, D.; Nowé, A.; Terryn, H. Reviewing machine learning of corrosion prediction in a data-oriented perspective. NPJ Mater. Degrad. 2022, 6, 8.
Hond, A.A.H.; Leeuwenberg, A.M.; Hooft, L.; Kant, I.M.J.; Nijman, S.W.J.; van Os, H.J.A.; Aardoom, J.J.; Debray, T.P.A.; Schuit, E.; van Smeden, M.; et al. Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: A scoping review. NPJ Digit. Med. 2022, 5, 2.
Gao, H.Y.; Liu, J.W.; Yang, S.X. Identifying Topics of Online Healthcare Reviews Based on Improved LDA. Trans. Beijing Inst. Technol. 2019, 4, 427–434.
Xu, L.L.; Wang, F. Scientific Frontier Prediction Model Based on Support Vector Machine and Improved Particle Swarm Optimization. Inf. Sci. 2019, 8, 22–28.
Ruan, G.C.; Xia, L. Research on Interdisciplinary Topics Identification. Inf. Sci. 2020, 12, 152–157.
Song, K.; Zhu, Y.J. Patent Frontier Technology Topic Identification and Trend Prediction. J. Intell. 2021, 1, 33–38. Available online: https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLAST2021&filename=QBZZ202101005&uniplatform=NZKPT&v=fDjsTQ_pkUc34IT6lc0BsMuRPAfV82K8s9TjMuvX273LH7EwJeLtgNhpZ2jL4g7G (accessed on 10 June 2023).
Siddaway, A.P.; Wood, A.M.; Hedges, L.V. How to do a systematic review: A best practice guide for conducting and reporting narrative reviews, meta-analyses, and meta-syntheses. Annu. Rev. Psychol. 2019, 70, 747–770.
Rey-Martí, A.; Ribeiro-Soriano, D.; Palacios-Marqués, D. A bibliometric analysis of social entrepreneurship. J. Bus. Res. 2016, 69, 1651–1655.
Small, H. Co-citation in the scientific literature: A new measure of the relationship between two documents. J. Am. Soc. Inf. Sci. 1973, 24, 265–269.
Bellis, N.D. Bibliometrics and Citation Analysis: From the Science Citation Index to Cyber-Metrics; Scarecrow Press: Lanham, MD, USA, 2009.
Sarker, I.H. Deep Learning: A comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput. Sci. 2021, 2, 420.
Zhu, G.; Liu, L.; Li, F.J. Research on Topic Relation and Prediction Based on LDA and LSTM. J. Mod. Inf. 2020, 8, 38–50. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7i8oRR1PAr7RxjuAJk4dHXoge9KT-AMPGtQmxVZzBNW4KlYBpdAL03RUjJN4cNQfFw&uniplatform=NZKPT (accessed on 10 June 2023).
Li, J.; Xu, L.L.; Zhao, S.J. Prediction and Visualization of Emerging Topics of Fund Sponsored Projects Based on Time Series Analysis and SVM Model. Inf. Stud. Theory Appl. 2019, 1, 118–123, 152.
Chao, N.P.; Han, S.Q.; Wu, X.T. Topic Discovery and Evolution Analysis of Digital Journalism. J. Mass Commun. 2021, 9, 4–13.
Chen, W.; Lin, C.R.; Li, J.Q.; Yang, Z.L. Analysis of the Evolutionary Trend of Technical Topics in Patents Based on LDA and HMM. J. China Soc. Sci. Technol. Inf. 2018, 7, 732–741.
Yue, L.X.; Zhou, X.Y.; Chen, Y.N. Thematic Trend Prediction of Information Architecture Based on the ARIMA Model. Doc. Inf. Knowl. 2019, 5, 54–63, 72.
Cui, K. The Research and Implementation of Topic Evolution Based on LDA. Natl. Univ. Def. Technol. 2010, 5, 18–39. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C475KOm_zrgu4lQARvep2SAkhskYGsHyiXlyV6jw0YcPLAKe5A8Jxr64DzLnGUNqeloGZjehXwbAzHL9PlAzVpfL&uniplatform=NZKPT (accessed on 10 June 2023).
Xue, Y.B. Topic Discovery and Trend Forecasting in the Science and Technology Literature. Harbin Inst. Technol. Univ. 2014, 3, 20–29. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C475KOm_zrgu4lQARvep2SAk8URRK9V8kZLG_vkiPpTeIb4GvxjDyNTXVh9b-4cZ5dIgkzg75Tfct0iLdXZyEBGS&uniplatform=NZKPT (accessed on 10 June 2023).
Chen, H.S. Research on Topic Model Based Patent Mining and Its Applications. Beijing Inst. Technol. 2016, 4, 71–88. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C447WN1SO36whLpCgh0R0Z-iv9r0YoQXiId4v9BfOE9rDiv39gx3RLlwMF6A-xU1fhweMSwWcdDHO04FFWlwEWPJ&uniplatform=NZKPT (accessed on 10 June 2023).
Mao, L.F. Study of text evolution analysis and prediction based on topic model. Nanjing Univ. Posts Telecommun. 2017, 2, 12–43. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C475KOm_zrgu4lQARvep2SAkkyu7xrzFWukWIylgpWWcEgwSw85v4Y4_8hlSQb4pGaghtZSQvRWf7CLlQO1wCp0d&uniplatform=NZKPT (accessed on 10 June 2023).
Chen, H.S.; Song, Y.H.; Jin, Q.Q.; Wang, X.F. Radical Innovative Topic Identification from a Perspective of Dynamic Topic Network. Libr. Inf. Serv. 2022, 10, 45–58.
Liu, Z.Q.; Xu, H.Y.; Yue, L.X.; Fang, S. Research on Lagging Effect of Topic Diffusion Evolution Face to Prediction of Research Front. J. China Soc. Sci. Technol. Inf. 2018, 10, 979–988.
Liu, J.W.; Long, Z.X.; Wang, F.F. Finding Collaboration Opportunities from Emerging Issues with LDA Topic Model and Link Prediction. Data Anal. Knowl. Discov. 2019, 1, 104–117. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iLik5jEcCI09uHa3oBxtWoB3Uo_fHZa_LSTlXA0PMbyBDEXz9KCLopwPKpVKMK4_O&uniplatform=NZKPT (accessed on 10 June 2023).
Li, W.S.; Tan, L.M.; Zhang, G.L.; Wen, X.F.; Liao, T.; Shi, M. Research on Topic Recognition of Key Core Technology in Industrial Chain Based on Multi-source Information Fusion. J. Inf. Resour. Manag. 2022, 1, 116–126.
Wang, P. Topic Extraction and Evolution for Scientific Literature Based on Hierarchical Probabilistic Topic Model. Libr. Inf. Serv. 2014, 22, 70–77.
Pei, K.F. Research on Topic Heat Prediction Based on VOLDA Theme Model and ESG Prediction Model. Nanjing Univ. Aeronaut. Astronaut. 2019, 2, 11–31. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C475KOm_zrgu4lQARvep2SAkWfZcByc-RON98J6vxPv10VISgQ7zi55MhQcfqP16cdwQUaQEW3OxxXcGsmFc7YK7&uniplatform=NZKPT (accessed on 10 June 2023).
Luo, Y. Research on Topic Discovery and Evolutive Prediction and Its Application Based on Scientific and Technical Literature. Univ. Electron. Sci. Technol. China 2021, 1, 15–22.
Zhang, H. Research on Technology Forecasting Method from the Perspective of Data Fusion. Jilin Univ. 2020, 2, 69–78.
Li, Y.J. Technology Topic Prediction Research Based on LSTM. Xiangtan Univ. 2020, 2, 21–25.
Yang, H.; Wang, R.F.; Zhang, L. Technology Prediction Based on Core Patents Technology Topic Recognition and Evolution Analysis. J. Intell. 2022, 7, 49–56. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iJTKGjg9uTdeTsOI_ra5_XYdCGNxWtU8Qv57pnMYKPbXWo9jcdJG_AZN3v9SZv61k&uniplatform=NZKPT (accessed on 10 June 2023).
Teng, J.; Hu, G.W.; Wang, T. Topic Identification and Evolution Path Analysis of Social Appeal Based on Dynamic Semantic Dependency Network. Inf. Doc. Serv. 2022, 3, 20–33. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iJTKGjg9uTdeTsOI_ra5_XdZ9ekciUQMV-5xDMlhiMVCtMvA9BpHyD424dblG52gE&uniplatform=NZKPT (accessed on 10 June 2023).
Gao, N.; Peng, D.Y.; Fu, J.Y.; Zhao, Y.H. Research on Technology Fronts Prediction Based on Patent IPC Classification and Text Information. Inf. Stud. Theory Appl. 2020, 4, 123–129.
Huang, N. Research on the Evolution Path of Industrial Wastewater Treatment Technology Based on Patent Mining. Tianjin Univ. 2022, 1, 31–57.
Ren, H.C.; Huang, Q.L.; Zhang, Z.G.; Yu, W.D. Research on Topic Identification Technology of Emerging Technology in the Ship Field. Inf. Stud. Theory Appl. 2022, 11, 103–106.
Gui, M.Z. Research on Key Technology Forecasting Based on Intelligent Methods. Shanghai Univ. 2018, 6, 32–63. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C447WN1SO36whLpCgh0R0Z-iTEMuTidDzndci_h58Y6ouQoOYLrrcHvDLa4hVZVgdQbHG__kFiWIAu3hXgrmFB1e&uniplatform=NZKPT (accessed on 10 June 2023).
Tan, T.T. Research Technology Evolution Analysis Method Oriented on Patent. Nanjing Univ. Sci. Technol. 2022, 1, 35–55.
Zhu, Q.S.; Leng, F.H. Topic Identification of Highly Cited Papers Based on Citation Content Analysis. J. Libr. Sci. China 2014, 1, 39–49.
Li, P.X. Identifying Innovation Topic Within the Relevancy Between Texts and Citations. Peking Union Med. Coll. 2021, 5, 23–47.
Yu, G.H.; Ning, Z.; Li, H.F. Research on Identification Method of Disruptive Technology Based on Patent and Bass Model. Stud. Sci. Sci. 2021, 39, 1467–1473, 1536.
Wang, K.; Chen, Y.; Wang, Y.Q.; Song, C. Research on Disruptive Technology Identification Based on Patent Citation Changes. J. Intell. 2022, 1, 74–80, 169. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iJTKGjg9uTdeTsOI_ra5_XYdCGNxWtU8QLn9VUsU5lRkyBz4xKUYcfNWC3W9x6G3k&uniplatform=NZKPT (accessed on 10 June 2023).
Wu, K.F.; Wang, W.; Zhang, S.Y.; Che, H.X.; Cai, L.; Chen, X. Research on Disruptive Technology Identification Methods from the Perspective of Technology Discontinuities. Inf. Stud. Theory Appl. 2022, 10, 125–131.
Kui, L.; Xu, H.Y.; Hu, Z.Y.; Dong, K.; Wang, C.; Pang, H.S. Multiple-pattern Analysis and Prediction of Topic Evolution Path Based on Topic Correlation. Libr. Inf. Serv. 2016, 13, 71–81.
Meng, J.Y. Research on Scientific Topic Evolution and Forecasting. Beijing Eng. Technol. Univ. 2021, 7, 26–45.
Chen, R.; He, C.C.; Sun, J.Q.; Yan, S.M.; Liu, Y. Research on Technology Forecasting Based on Trend Evolution Analysis. Sci. Technol. Manag. Res. 2020, 24, 47–53. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iy_Rpms2pqwbFRRUtoUImHYrFaEO8QzFsfoWWIb05eWYtkwmQEJ49RFfVRisoi8i0&uniplatform=NZKPT (accessed on 10 June 2023).
Wang, S.S.; Yan, S.M.; Chen, R.; Li, J.X. Research on the Evolution of Technological Relatedness Trends Based on Patent Codes Co-Occurrence. J. Intell. 2021, 40, 53–61. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iy_Rpms2pqwbFRRUtoUImHdOR7NvLKjZ755L5uOAdibkX9XuTPd8hgP4bKZxr431A&uniplatform=NZKPT (accessed on 10 June 2023).
He, Y.J. Topic Recognition and Feature Analysis about the Researches in the Field of Intelligent Technology. Nanchang Univ. 2022, 8, 36–62.
Guo, J.F. A Semantic Mining-Based Method to Analyze and Evaluate Technology Innovation Pathway. Beijing Inst. Technol. 2018, 6, 67–92.
Fan, M.J. Early Identification of Disruptive Technology Based on Multi-Source Heterogeneous Data. Beijing Univ. Technol. 2021, 6, 23–31.
Li, X.M. Technology Evolution Analysis Based on Patent Elements Features. Chin. Acad. Agric. Sci. 2020, 1, 35–53.
Ma, M.; Wang, C.; Zhou, Y.; Xu, H.Y.; Hu, Z.Y.; Xiong, G.H. Research on Core Technology Topic Identification and Evolution Trend Analysis Based on Semantic Information. Inf. Stud. Theory Appl. 2021, 9, 106–113.
Ma, M.; Wang, C.; Zhang, W.R.; Xu, H.Y.; Zhou, Y. Research on the Methods of Identifying and Analyzing Potential Disruptive Technologies from the Perspective of Catastrophe. Inf. Stud. Theory Appl. 2022, 3, 157–164, 156.
Huo, C.G.; Huo, F.F.; Dong, K. The Popularity Prediction of Scientific Topics Based on LSTM. Doc. Inf. Knowl. 2021, 2, 25–34.
Lin, C.R. Research on Key Generic Technology Identification and Foresight Based on Patent Data Mining. Harbin Eng. Univ. 2021, 4, 104–127.
Song, K.; Ran, C.J. A Method for Development Hierarchy Division and Trend Prediction of Subject Research Topic. Inf. Sci. 2022, 7, 136–144.
Li, H.L.; Wu, X.L. Research on Topic Discovery and Evolution Based on Time Series Clustering. J. China Soc. Sci. Technol. Inf. 2019, 10, 1041–1050. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iLik5jEcCI09uHa3oBxtWoDWrTnpDfVAo-fIwIFrlvmtwZcyJNu6yBrTT_vR1ksws&uniplatform=NZKPT (accessed on 10 June 2023).
Nie, X.; Xie, N.F.; Wu, S.S.; Li, X.Y.; Wang, H.J. Prediction of Hot Trends in Animal Genetics and Breeding Based on Machine Learning. Agric. Outlook 2020, 1, 101–105. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7i8oRR1PAr7RxjuAJk4dHXoh-KV_Gm6cjyLpUKC8fcuXARynP-1YTKDCXq-yd-cKyz&uniplatform=NZKPT (accessed on 10 June 2023).
Liu, H. Research on Technology Trend Prediction Method Based on Word Vector. Beijing Univ. Technol. 2018, 7, 29–43. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C475KOm_zrgu4lQARvep2SAkZIGkvqfmUZglMdu7fCR481K0pCE4mjcx4y3CC_afkpF2rrQ8Xcih-IibouS2UCTu&uniplatform=NZKPT (accessed on 10 June 2023).
Ma, J. Applying Text Mining to Technology Opportunities Analysis in Biomedical Field. Beijing Inst. Technol. 2019, 9, 79–89. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C447WN1SO36whLpCgh0R0Z-iDdIt-WSAdV5IJ_Uy2HKRASVjT2vCJYzcCU8cA7s4Y38B3DRVz6yzPls0j7biIx2k&uniplatform=NZKPT (accessed on 10 June 2023).
Huang, L.; Zhu, Y.H.; Zhang, Y. Research on Identification of Emerging Topics Based on Link Prediction with Weighted Networks. J. China Soc. Sci. Technol. Inf. 2019, 4, 335–341. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iLik5jEcCI09uHa3oBxtWoDWrTnpDfVAojkdgbYFroK9HWgn_Nf0F5Ysnw3lY3pgl&uniplatform=NZKPT (accessed on 10 June 2023).
Zhang, Y.; Lin, Y.H.; Hou, J.H. Technology Prediction Method Based on Data Fusion and Life Cycle. J. China Soc. Sci. Technol. Inf. 2021, 5, 462–470. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iy_Rpms2pqwbFRRUtoUImHZHj7Gr3AZ3G1gh3PVbAJnQrfDM9i5UaoyVaWdqAjZQ7&uniplatform=NZKPT (accessed on 10 June 2023).
Cao, Y.W.; Xu, H.Y.; Wu, H.W.; Luo, R.J. Study on Radical Innovation Prediction to Emerging Technology Topics Based on Citation Curve Fitting. Libr. Inf. Serv. 2020, 5, 100–113.
Dong, K.; Xu, H.Y.; Luo, R.; Fang, S. Research on Multi-dimensional Interdisciplinary Topics Identification Method Based on Scientific Literature Contents Analysis. Inf. Stud. Theory Appl. 2018, 5, 131–136.
Li, Q.R.; Guo, J.F.; Huang, Y.; Wang, X.F. Research on the Method of Disruptive Technology Identification Based on Patent Bibliometrics. Stud. Sci. Sci. 2021, 39, 1166–1175.
Cui, J.; Zhang, J.P.; Bao, Z.; Ding, S.C. Development Forecast of Core Theme in Science and Technology Field Based on Trend Analysis. Data Anal. Knowl. Discov. 2022, 9, 1–13. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iJTKGjg9uTdeTsOI_ra5_Xf80tvfaHzPNpC0tB6QvIH8O60mGM8cg0RoB–CdGM0wu&uniplatform=NZKPT (accessed on 10 June 2023).
Xu, X.G.; Gui, M.Z. Technology Forecast Based on Deep Learning. J. Intell. 2020, 8, 53–62. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7i8oRR1PAr7RxjuAJk4dHXoh96Y1LJOLtJy0g5PBpP–g0i6yrybnHvJK–_LTVplKsC&uniplatform=NZKPT (accessed on 10 June 2023).
Raeesi Vanani, I. Text analytics of customers on twitter: Brand sentiments in customer support. J. Inf. Technol. Manag. 2019, 11, 43–58.
Caruso, F.P.; Scala, G.; Cerulo, L.; Ceccarelli, M. A review of COVID-19 biomarkers and drug targets: Resources and tools. Brief. Bioinform. 2021, 22, 701–713.
Zhu, J.H. Research on the Technical Theme and its Technical Principle of Intelligent Security Technology from the Perspective of Patent Literature. Dalian Univ. Technol. 2021, 6, 19–40.
Zhang, J.; Tan, L.; Tao, X.; Pham, T.; Chen, B. Relational intelligence recognition in online social networks—A survey. Comput. Sci. Rev. 2020, 35, 1–33.
Kleminski, R.; Kazienko, P.; Kajdanowicz, T. Analysis of direct citation, co-citation and bibliographic coupling in scientific topic identification. J. Inf. Sci. 2022, 48, 349–373.
Quille, R.V.E.; Barros, J.M.; Barbado, M.; De Almeida, F.V.; Correa, P.L.P. Detecting Favorite Topics in Computing Scientific Literature via Dynamic Topic Modeling. IEEE Access 2023, 11, 41535–41545.
Ebrahimi, F.; Dehghani, M.; Makkizadeh, F. Analysis of Persian Bioinformatics Research with Topic Modeling. BioMed Res. Int. 2023, 2023, 3728131.
Zhang, J.Z.; Qiu, M.M.; Wang, Q.Y. Citation Topic Identification and Evolution Based on Citation Content Clustering. Inf. Sci. 2023, 3, 1–21. Available online: http://kns.cnki.net/kcms/detail/22.1264.g2.20220617.1314.022.html (accessed on 9 January 2023).
Zhou, X.Q.; Wu, P. Research on Topic Detecting of Pandemic Policies of China and the United States of America Based on Embedded Topic Model. Inf. Stud. Theory Appl. 2022, 5, 173–180.
Nie, X.P. Hot Topic Prediction Based on Time Series. Chin. Acad. Agric. Sci. 2019, 9, 13–47. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C475KOm_zrgu4lQARvep2SAkOsSuGHvNoCRcTRpJSuXuqRwJWPPGVr5l_Dijs2KuLmIwYu5CQpWnAyD8GOhDBDfA&uniplatform=NZKPT (accessed on 10 June 2023).
Wang, X.H.; Gao, M. The Key Technology Identification Method Based on BERT-LDA and Its Empirical Research. Libr. Inf. Serv. 2021, 22, 114–125.
Li, Q.R.; Guo, J.F.; Huang, Y.; Wang, X.F. Topic Evolution Research of Disruptive Technology Based on Mutation and Fusion Perspective. Stud. Sci. Sci. 2021, 39, 2129–2139.
Xu, Y.; Meng, W.X.; Li, G.J. Forecasting Hot Topics of Information Science Based on Grey Prediction Model. Inf. Sci. 2016, 7, 3–6, 30. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C475KOm_zrgu4lQARvep2SAkHr3ADhkADnVu66WViDP_3J4sNt5fDsyHNihxOdt36XBMPF1Qs4u2WnTW-wjYK8G6&uniplatform=NZKPT (accessed on 10 June 2023).
Du, H.; Guo, Y.; Fan, Y.X.; Zhang, J.; Yu, Z.H.; Cheng, X.Q. Calculation and Prediction of Topic Popularity Based on Causal Model. J. Chin. Inf. Process. 2016, 2, 50–55. Available online: https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLAST2016&filename=MESS201602008&uniplatform=NZKPT&v=a2h6sDzgRXvJvrea6GXuMcpOWMqvKUh31vGJL2SQ5IrDwmD_-IzMUopZ3Obx8x_Q (accessed on 10 June 2023).
Cao, R.; Bai, C.; Zhang, Y.J.; Zhang, X.X.; Zhang, Y.; Li, M.Y. Research on Disruptive Technology Recognition Model. China Sci. Technol. Resour. Rev. 2022, 2, 81–92. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7i0–kJR0HYBJ80QN9L51zrPzpwteHiqxLlCtMdfKvcFNSaAplnNxV2D4m8qbogkCZt&uniplatform=NZKPT (accessed on 10 June 2023).
Zhao, G. Disruptive Technology Identification Based on Multi-source Heterogeneous Data. Huazhong Univ. Sci. Technol. 2019, 4, 16–27. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C475KOm_zrgu4lQARvep2SAkWfZcByc-RON98J6vxPv10ZEYtqW41TXSx5zS6kgxUKZFQOK3fFN0Q5bJGVWEyNA4&uniplatform=NZKPT (accessed on 10 June 2023).
Bai, J.J.; Yan, D.W.; Chen, Q. Trend Prediction of Emerging Topics Based on Topic Model and Curve Fitting. Inf. Stud. Theory Appl. 2020, 7, 130–136, 193.
Liu, Z.Q.; Xu, H.Y.; Yue, L.X.; Fang, S. Research on Core Technology Topic Identification Based on Chunk-LDAvis. Libr. Inf. Serv. 2019, 9, 73–84.
Xuan, H.S. Research on Technology Innovation Topic Discovery and Development Prediction for Patent Data. Xidian Univ. 2021, 5, 20–35.
Deng, J.J.; Liu, A.R.; Cao, X.Y.; Zhang, K.; Bai, G.Z.; Chen, R.; Gao, Z.J.; Dai, H.W.; Wang, W.N. Methodological Framework of Identifying Disruptive Technologies on Emerging Stage. Bull. Chin. Acad. Sci. 2022, 5, 674–684.
Chen, Y.Y.; Tian, W.F.; Wu, J.H. Visualization Analysis Methods of Subject Area Research Hotspots Tracking and Trend Prediction. Inf. Stud. Theory Appl. 2017, 6, 117–121.
Ye, G.H.; Wang, C.C.; Li, S.Y. Recognition and Prediction of Emerging Topics in Interdisciplinary Scientific Research Collaboration Based on SciTS Conference Text. Inf. Sci. 2022, 7, 126–135.
Tang, H.; Qiu, Y.W. Emerging Technology Topic Identification Based on Multi. J. Intell. 2021, 3, 81–88. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iy_Rpms2pqwbFRRUtoUImHdOR7NvLKjZ7c92OY79rnd6ugzo0KvZMiHncCKAP0FH6&uniplatform=NZKPT (accessed on 10 June 2023).
Wu, F.F.; Li, Y.W.; Miao, H. Prediction of Frontier Development and Research Topics of Essential Technology Fields in China Based on Linkage Between Basic Research and Technology Development. J. Intell. 2022, 1, 23, 60–66. Available online: https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLAST2022&filename=QBZZ202201010&uniplatform=NZKPT&v=wgMWPIxfIaVJDknHfLmgfeuhtJ7obMNaCyrx6lkh2o9bsuvL9Nar8tMRiRV8ge-C (accessed on 10 June 2023).
Wu, Y.P.; Bai, R.J.; Liu, M.Y.; Wang, X.Y. Research on Technology Opportunity Discovery Based on Comment Topic Identification and Multi Dimension Analysis of Technical Attributes. Libr. Inf. Serv. 2021, 10, 56–67.
He, W.L.; Feng, G.H.; Xie, H.L. Analyzing Scientific Literature with Content Similarity—Topics over Time Model. Data Anal. Knowl. Discov. 2018, 11, 64–72. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7i0-kJR0HYBJ80QN9L51zrPxcrlhWtdSrQcxBnEn4vHtCdFXUmn3kt1IvYWiCTrf6H&uniplatform=NZKPT (accessed on 10 June 2023).
Han, Z.Q.; Liu, X.P.; Kou, J.J. Interdisciplinary Literature Discovery Based on Rao-Stirling Diversity Indices. Inf. Sci. 2020, 2, 116–124.
Liu, S.X. Research on Key Technologies of Link Prediction and Network Evolution of Complex Network. Inf. Eng. Univ. 2021, 1, 53–87.
Xie, Q.Q. Studying the Evolution Trajectory and Forcasting Development Trend of Emerging Technologies Based on Multi-Source Heterogeneous Data. Beijing Univ. Technol. 2019, 8, 63–79.
Hou, J.H.; Zhu, X.Q. Evaluation Indicators System of Technology Forecasting and its Empirical Study Based on the Patent. Libr. Inf. Serv. 2014, 18, 77–82, 116.
Wu, Y.W.; Ji, Y.J.; Gu, X.J. Industrial Generic Technology Prediction Based on Dynamic Complex Network of Patents. Comput. Integr. Manuf. Syst. 2020, 26, 3185–3194.
Tian, P.W.; Zhang, X. Research on the Patented Technology Topics Identification Based on Heterogeneous Information Network. J. Intell. 2021, 8, 45–52. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iy_Rpms2pqwbFRRUtoUImHdOR7NvLKjZ7euRoAQ7qHQX9SWHW2eOjr–quJpcuuoFP&uniplatform=NZKPT (accessed on 10 June 2023).
Liao, L. Method and Case Study on Text Mining and Main Path Analysis based Technological Tendency Forecasting. Huazhong Univ. Sci. Technol. 2017, 3, 16–47. Available online: https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CMFD&dbname=CMFD201901&filename=1018801970.nh&uniplatform=NZKPT&v=XXgOgghaYaM3NAoQzIjz6GGe_59ortQLlK_1LNOP4XTYDzm2EGsDBlDnvq5tqPRv (accessed on 10 June 2023).
Han, F.; Zhang, S.T.; Feng, L.Z.; Yuan, J.P. Identifying Breakthrough Patent Topics by Measuring Technological Convergence. Data Anal. Knowl. Discov. 2021, 12, 137–147. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7iJTKGjg9uTdeTsOI_ra5_Xf80tvfaHzPNB9FXl2F0WSEPyA8FuYRGUN5HqZftefIL&uniplatform=NZKPT (accessed on 10 June 2023).
Wu, C.; Wang, Q.H.; Li, Y.; Zhang, L.F. Forecast and Cooperation Potential of Frontier Technology Fields of Strategic Emerging Industries. Syst. Eng. 2021, 4, 151–158. Available online: https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLAST2021&filename=GCXT202104015&uniplatform=NZKPT&v=-590lXLM7-XMe7FOxMEWJ_BHlLOWyF3i2AIitCElCS_qN3d-G6x_JU3V7rQ62UzM (accessed on 10 June 2023).
Li, N. Research on Technology Foresight Method from the Perspective of Knowledge Evolution. Chin. Acad. Agric. Sci. 2022, 1, 26–55.
Fan, S.P.; An, X.Y.; Yan, G.L.; Li, Y. Study on the Recognition Method of Frontier Topic in the Medical Field. J. China Soc. Sci. Technol. Inf. 2018, 7, 686–694. Available online: https://kns.cnki.net/kcms2/article/abstract?v=3uoqIhG8C44YLTlOAiTRKibYlV5Vjs7i0-kJR0HYBJ80QN9L51zrPzpwteHiqxLlLR76MCYwv8H3y2FYRYxuXOedF6BS0qn-&uniplatform=NZKPT (accessed on 10 June 2023).

Figure 1. Framework of research process.

Figure 2. Chronological distribution of models and algorithms.

Figure 3. Co-occurrence of keywords.

Figure 4. Classification of topic identification models and algorithms.

Table 1. Numbers of models and algorithms applied.

Category	1	2	3	4
Topic identification	45	32	9	1
Topic prediction	33	19	2	1

Column headings give the number of models and algorithms applied in a study; cell values give the number of documents.

Table 2. Topic identification and prediction indicator system.

Category	Secondary Category	Indicator	References
Basic scalar	Word frequency	Number of subject words in documents or patents	Nie et al. (2020) [80]
	Citation/Cited quantity	Citation frequency of documents or patents	Chen et al. (2020) [68]
	Scale	Number of patented products/patentees	Zhu et al. (2020) [36]
Topic popularity and intensity	Topic popularity/topic intensity	Weighting of the number of documents issued and citations	Li et al. (2019) [37]
		Number or proportion of supporting documents	Huo et al. (2021) [76]
		Keyword weight	Huo et al. (2021) [9]
Emerging degree	Topic novelty	Article/patent publication age	Cao et al. (2020) [85]
	Topic growth rate	Slope of the document/patent authorization curve	Ren et al. (2022) [58]
Cross diffusion	Crossing degree	Distribution of subject words and their cross interactions among disciplines	Dong et al. (2018) [86]
	Scientific relevance	Number of scientific documents cited by patents	Li Dong et al. (2021) [87]
	Topic relevance	Correlation of a topic's evolution between adjacent time periods	Li Dong et al. (2021) [87]
	Migration degree	Transition probability of the topic to the next time period	Zhu et al. (2020) [36]
Network structure	Topic node centrality	Degree/betweenness/closeness/eigenvector centrality	Liu et al. (2019) [47]
	Local information similarity	Common Neighbors/Cosine Similarity/Jaccard/Sørensen/Preferential Attachment, etc.	Ma et al. (2021) [74]
	Path similarity	Local Path/Katz	Huang et al. (2019) [83]
	Random walk similarity	Average commute time/cosine similarity based on random walk/SimRank	Huang et al. (2019) [83]
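
As an illustration of the local information similarity indicators listed in Table 2, the following minimal Python sketch computes the common-neighbor count, cosine (Salton), Jaccard, Sørensen and preferential-attachment scores for a pair of nodes. The toy network, the node names and the function `local_similarity` are hypothetical and are not taken from any of the reviewed studies; the formulas are the standard textbook definitions, which individual studies may adapt.

```python
from math import sqrt

# Hypothetical toy keyword co-occurrence network: node -> set of neighbours.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
    "E": set(),  # isolated node, used to show the zero-degree case
}


def local_similarity(graph, x, y):
    """Return standard neighbourhood-based similarity scores for nodes x and y."""
    nx, ny = graph[x], graph[y]
    common = len(nx & ny)      # number of shared neighbours
    union = len(nx | ny)       # size of the combined neighbourhood
    kx, ky = len(nx), len(ny)  # node degrees
    return {
        "common_neighbors": common,
        "cosine": common / sqrt(kx * ky) if kx and ky else 0.0,
        "jaccard": common / union if union else 0.0,
        "sorensen": 2 * common / (kx + ky) if kx + ky else 0.0,
        "preferential_attachment": kx * ky,
    }


scores = local_similarity(graph, "A", "C")
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In a link prediction setting, such scores would typically be computed for every unconnected node pair and ranked, with higher-scoring pairs treated as more likely future links.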

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Guo, S.; Si, L.; Liu, X. Research Practice and Progress of Models and Algorithms Applied in Topic Identification and Prediction Based on the Analysis of CNKI. Appl. Sci. 2023, 13, 7545. https://doi.org/10.3390/app13137545

