LDAViewer: An Automatic Language-Agnostic System for Discovering State-of-the-Art Topics in Research Using Topic Modeling, Bidirectional Encoder Representations From Transformers, and Entity Linking

Advancements in knowledge are pivotal to academic progress, necessitating efficient methods for discovering the state-of-the-art in various fields. Existing approaches, however, are language-specific and lack automation, limiting their efficacy. This study aims to develop a language-agnostic software tool that streamlines the process of identifying state-of-the-art research across diverse academic topics. The software automatically retrieves articles from multiple databases and preprocesses the content through tokenization, case folding, token cleansing, stopword removal, and lemmatization. Subsequently, a numeric document-phrase matrix is created and analyzed using latent Dirichlet allocation (LDA) and bidirectional encoder representations from transformers (BERT) to discover and label topics automatically. The study introduces a novel topic-filtering method based on entity linking, filtering model outputs against a knowledge database to ensure topic relevance. The visual representation employs nested bubble and line charts, effectively illustrating current topics, gaps, and research evolution trends. A user survey distributed to 52 student researchers, assessing the interface, topic relevance, and research output of the developed software, revealed that the interface is user-friendly and easy to navigate, and that the presented information is comprehensible. Survey results also indicated that the generated topics are consistent with the processed article content and relevant to the investigated topic. The visualization effectively aids in understanding the state-of-the-art and research map. This study demonstrates that integrating LDA, BERT, and the proposed topic filtering and labeling method yields a robust tool for preliminary research analysis with high precision and relevance.

To address this issue, researchers have employed technology in the search for state-of-the-art research. One such method was proposed by Oliveira et al. (2019), who utilized bibliometric analysis to map state-of-the-art research and trends in literature. This approach requires researchers to perform systematic literature review steps, such as determining the search database, searching articles using predefined search strings, and extracting bibliometric data (e.g., article title, publication year, journal name, author names) for input into software like VOSViewer. This software can then aid researchers in conducting bibliometric analysis to provide insights into trends and relationships among studies based on the extracted data [11].
Another approach involves using topic modeling algorithms to extract dominant topics from the searched materials. Topic modeling identifies latent topics within documents, where each generated topic is represented as a probability distribution over the document's vocabulary. Topic modeling has been used for various purposes, such as summarizing public opinions on events (e.g., reactions to product launches or controversial news), identifying prominent topics in specific domains, and tracking topic evolution in texts [12].
Given the capabilities of topic modeling, some studies have employed it to identify state-of-the-art research. Latent Dirichlet Allocation (LDA), an unsupervised, language-agnostic topic modeling algorithm, has been used to extract dominant topics receiving the most attention and identify topic trends within specific research areas. LDA's unsupervised nature and independence from specific languages make it a versatile approach for analyzing research article content. The output of topic modeling can represent recent research topics and reveal the evolution of investigated topics in specific research areas.
However, current approaches still involve manual processes and require expertise in data preprocessing for analysis [11]. Researchers must independently search, extract, and integrate data into the software. Sometimes, researchers only obtain data from a single source, which may limit information on a topic. Current approaches also lack an interface for searching up-to-date topics from various scientific databases and automatically providing insights and information on recent developments. Current bibliometric analysis methods solely analyze keywords from collected articles, which may produce inaccurate analysis results if the words used do not adequately reflect the article's content. Moreover, existing approaches have not yet been able to provide a comprehensive view of topic evolution trends in research, crucial information that can reveal paradigm shifts in research over time.
Considering the ongoing need for state-of-the-art discovery in academia and the limitations of previous approaches, this study aims to develop a new method for identifying state-of-the-art research. Because LDA can analyze article content collections in an unsupervised manner, extract latent topics from a set of articles, and provide additional information such as topic evolution over time, this research proposes a state-of-the-art discovery framework that builds upon the LDA-based approach.
This study results in a software tool that collects articles from various academic databases using information retrieval (IR) methods. The collected article content is preprocessed through tokenization, case folding, token cleansing, stopword removal, and lemmatization. An LDA model is then dynamically built based on the collected article content to obtain latent topics in research. The LDA model output is enriched using bidirectional encoder representations from transformers (BERT) and topic filtering through entity linking methods to ensure that the resulting topics are intuitive and represent real-world subjects. The topic outputs are then visualized with a nested bubble chart and line chart to display state-of-the-art topics, gaps, and evolutionary trends in research, which can aid researchers in preliminary research analysis. Based on the background described, several research questions are raised in this study, including: a) How can LDA be implemented to accelerate the state-of-the-art discovery process and, more generally, preliminary analysis in research? b) How can a software tool be developed that integrates LDA and open-access scientific literature retrieval for state-of-the-art discovery?

II. LITERATURE REVIEW

A. RESEARCH
In conducting research, researchers generally follow a series of steps summarized in the flow diagram in Figure 1. The figure illustrates the scientific method employed by researchers to discover, develop, or validate theories [13]. The steps followed by researchers begin with the first stage, ''Wonder: Start from existing theories and observations,'' in which existing theories and phenomena are observed. This is followed sequentially through problem formulation, hypothesis formation, and hypothesis testing, to the sixth stage, ''Selection among competing theories,'' or the process of a theory's endurance among other theories in explaining a phenomenon.
The scientific method in the figure is represented in six sequential and recursive steps, with explanations of each step as follows:
1. Posing questions in the context of existing knowledge (theories and observations), where the questions can be ones that existing theories can answer or questions that require the formulation of new theories.
2. Formulating a hypothesis as a tentative answer. This step involves formulating a hypothesis based on existing theories and observations to explain the phenomenon under investigation and provide a starting point for research.
3. Formulating consequences and predictions from the hypothesis, which will later be used to test the hypothesis's validity. These predictions and consequences guide experimental design and data collection.
4. Testing the hypothesis in specific experimental or theoretical domains. New hypotheses must be compatible with the existing worldview. If a hypothesis leads to contradictions and demands radical changes in the existing theoretical background, the hypothesis must be tested very carefully. This process can iterate from the second step to modify the hypothesis until an agreement is reached. If a discrepancy in the hypothesis is found, the process must start anew.
5. When consistency in hypothesis test results is obtained, the hypothesis becomes a theory and provides a coherent set of propositions defining a new class of phenomena or new theoretical concepts. A theory then becomes the framework within which observations or theoretical facts are explained and predictions are made.
6. The constructed theory then competes with other theories, undergoing a process of ''natural selection.'' Theories that can best explain existing phenomena endure.
Overall, the flow diagram in Figure 1 provides a clear and comprehensive overview of the research process and the main stages involved in discovering, developing, and validating knowledge. This study focuses on the first step in conducting scientific research, posing questions in the context of existing knowledge. In general, acquiring the context of existing knowledge can be accomplished by understanding the state-of-the-art of theories typically used to explain the phenomena being investigated.

B. STATE-OF-THE-ART
State-of-the-art, according to Santos [14], describes the current knowledge about a phenomenon to be studied. Discovering the state-of-the-art can provide a comprehensive overview of what has been done in a particular field and indicate the potential for further investigation into different cases that may not be fully explained by previous theories. Referring to the scientific method diagram, this is undoubtedly a crucial initial step in conducting research, as it forms the basis for formulating research questions or problems [14].
There are several types of state-of-the-art that can be identified. For example, there is theoretical state-of-the-art, which includes the most widely accepted theories and models in the field. There is also empirical state-of-the-art, which refers to the latest findings and discoveries based on empirical research. Additionally, there is technological state-of-the-art, which relates to the latest tools, technologies, and methodologies used in the field [14].
In literature reviews, this term is used to describe the current state of knowledge in a specific research area and to identify recent developments. A literature review on the state-of-the-art generally provides an overview of the current state of knowledge in a particular field or research area, highlighting recent developments in research. This includes a comprehensive summary of the latest research, methods, and techniques proposed and used in the field, as well as offering insights into future research directions.
In the context of this study, the research focuses on discovering the theoretical state-of-the-art, specifically in finding the latest research topics in a particular field. The proposed approach in this study, which uses LDA, BERT, filtering with entity linking and knowledge bases, as well as automatic data collection, processing, and intuitive visualization, is expected to help in revealing the latest research topics in a field of study.

C. SYSTEMATIC LITERATURE REVIEW
The widely known and used method to date for discovering state-of-the-art is literature analysis, often referred to as a literature review. There are various definitions of literature review, but it can be defined as the collection, analysis, and synthesis of findings, ideas, and perspectives from the literature on a particular topic. This provides a strong knowledge context foundation for the topic to be investigated and can help identify unexplored areas where the proposed research contributes [14], [15], [16], [17]. Literature reviews can be divided into three types, which include systematic, semi-systematic, and integrative reviews. Of these three types, in most fields of science, including computer science, the systematic literature review has become the most developed and agreed-upon way to review literature, with many researchers noting it as the gold standard of literature reviews [17], [18].
A systematic literature review is a literature review conducted systematically and reproducibly. Systematic literature reviews are typically performed in three main stages, consisting of planning, conducting, and reporting, as illustrated in Figure 2. In the planning stage, the need for a literature review and the development of a review protocol (selection of keywords, search databases, and inclusion or exclusion criteria for articles) are determined. In the conducting stage, literature is collected according to the review protocol previously defined by the researcher. The collected literature is then filtered to identify relevant works, after which a synthesis of ideas from each of them is carried out. Finally, the reporting stage is where the synthesis results are shared [17], [19].
Although systematic literature review is praised as the ''gold standard'' for state-of-the-art discovery, there are many challenges in conducting it. Performing a systematic literature review can be difficult and time-consuming, as it involves analyzing, comparing, evaluating, and linking different sources, which requires hours of manual work in reading and organizing content. Moreover, as knowledge production is increasing at an incredible pace, literature selection can be challenging when faced with a continuously growing number of candidates. Consequently, this leads to difficulties in understanding, evaluating, and synthesizing the state-of-the-art of the topic being investigated [5], [7], [9], [17], [20].

D. BIBLIOMETRIC ANALYSIS
In an effort to make the process of extracting and understanding documents more efficient and automated, methods for gaining insights and comprehending documents have become a popular research area in informatics, more specifically in the fields of scientometrics and informetrics. With rapid advancements in technology and the continuous development of research in this field, many theories and algorithms have been created to explore the understanding of documents. The initial effort to understand and process scientific literature began in 1955, when Garfield proposed the idea of using citation indexes to link ideas among literature. Nowadays, clustering documents and measuring their similarity using bibliometric data have become one of the cutting-edge ways to process scientific literature [21].
In line with Garfield's vision, the development of software such as VOSViewer or CitNet Explorer, capable of processing bibliometric data and performing co-citation, co-authorship, and keyword co-occurrence analysis, represents one approach that uses bibliometric analysis to help researchers understand specific research topics. With this approach, researchers need to collect bibliometric data from one or more scientific databases. The data can then be input into the software, and with the use of such software, the discovery of relationships between research, research rankings, and keyword analysis can be performed and can assist in the preliminary analysis of research topics [11]. However, bibliometric analysis has proven limited in representing topics within the body of knowledge, as it is restricted to grouping based on bibliometric data [21].

E. TOPIC MODELING
As an alternative, topic modeling techniques from the field of computer science provide another approach that can be employed in understanding documents. Topic modeling is a statistical modeling method used to discover latent topics within a collection of documents [20], [22], [23], [24]. In the realm of scholarly literature comprehension, topic modeling has been extensively utilized in characterizing scientific knowledge [25], [26], [27], due to its capability to uncover latent topics in large and unstructured document collections. Within topic modeling, the topics present in documents are identified based on the distribution of words within those documents. In other words, topics that are frequently mentioned in documents will have a higher weight in the model. In scientific research, the topics that are most frequently discussed and garner the attention of researchers are considered to be the most important and significant within that field. Topic modeling can help identify these topics automatically and objectively without requiring the subjective judgment of researchers. Furthermore, by analyzing trends in topic distribution over time, topic modeling can also help identify shifts in research focus and advances in the state-of-the-art within a given field.
One of the most recent implementations for conducting topic modeling involves using probabilistic topic models, with Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA) being the most popular algorithms. Moreover, these two algorithms are the most commonly employed in topic modeling research, and numerous previous studies have compared their performance in discovering state-of-the-art topics in various fields [20], [21], [22], [28]. LDA was first introduced by Blei, Ng, and Jordan in 2003 in their study entitled ''Latent Dirichlet Allocation.'' Figure 3 depicts a plate notation illustrating the generative process of the LDA algorithm. Conceptually, LDA assumes that a topic is represented by similar words and that each document can be represented by multiple topics. Additionally, LDA presumes that data follow a Dirichlet distribution [21], [22]. The primary objective of LDA is to map each document to a topic encompassing the words within the document. Since LDA treats each document equally (as a bag of words), reordering word sequences or modifying document content is unnecessary.
Technically, an LDA model has several parameters used to generate topics, illustrated in Figure 3. Here, α represents the per-document topic distribution, which manages the topic distribution for each document in the corpus. A higher α value will result in documents being assigned to more topics. θ is the document's topic distribution, representing the probability of the current document being assigned to a specific topic. Z denotes the topic for a specific word within the current document, w denotes the specific word, N is the number of words in a particular document, D is the number of documents, K is the number of topics, ϕ is the word distribution for each topic, and finally, β is the per-topic word distribution, which manages the word distribution for each topic. A higher β value will cause topics to be represented by more words. With the above parameters, LDA works to generate topics as follows [21]: 1. p(topic t | document d): This represents the proportion of words in document d that are currently assigned to topic t. The higher this proportion, the more likely it is that the word w belongs to t.
The α parameter governs this topic assignment distribution for documents. 2. p(word w | topic t): This represents the proportion of assignments to topic t across all documents that come from word w. This proportion tries to capture how many documents belong to topic t because of the word w. 3. LDA represents documents as a mixture of topics and topics as a mixture of words. If a word has a high probability of being in a topic, all documents containing that word will be strongly associated with that topic as well. The calculation of these probabilities results in updates to the ϕ parameter values, and this distribution is governed by the β hyperparameter. 4. Update the probability of word w belonging to topic t as p(word w with topic t) = p(topic t | document d) * p(word w | topic t), updating the z parameter values. Once the probabilities are estimated, a set of words to construct or represent a particular topic can be determined by selecting words above a certain probability threshold or the top N probabilities. Topic assignments for each document can be performed in a similar manner.
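To make the roles of these parameters concrete, the following minimal sketch fits an LDA model with gensim on a hypothetical toy corpus; the corpus, the choice of two topics, and the hyperparameter settings are illustrative assumptions rather than the configuration used later in this study.

```python
# Minimal sketch (gensim, toy corpus) relating the plate-notation parameters to code:
# K = num_topics, alpha = per-document topic prior (α), eta = per-topic word prior (β),
# theta = a document's topic distribution (θ), phi = a topic's word distribution (ϕ).
from gensim.corpora import Dictionary
from gensim.models import LdaModel

texts = [
    ["face", "recognition", "neural", "network"],
    ["fingerprint", "recognition", "biometric", "system"],
    ["text", "recognition", "document", "image"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]          # bag-of-words representation

lda = LdaModel(corpus, num_topics=2, id2word=dictionary,
               alpha="auto", eta="auto", passes=50, random_state=0)

theta = lda.get_document_topics(corpus[0])               # θ for the first document
phi = lda.show_topic(0, topn=5)                          # top words and weights of topic 0 (ϕ)
print(theta, phi)
```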
Latent Semantic Analysis (LSA), on the other hand, is a topic modeling algorithm that employs a multistep process to determine the structure of a corpus by modeling the context in which a word or term appears. In its operation, LSA constructs a document-term matrix (DTM), which is subsequently reduced using singular value decomposition (SVD) to approximate the eigenvectors of each word and the similarity between words. A factor rotation is then applied to the eigenvectors, and topics are generated from the clustering of word eigenvectors from the corpus [28].
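As a rough illustration of this pipeline, the following minimal sketch builds a document-term matrix and applies truncated SVD with scikit-learn on a hypothetical toy corpus; the factor rotation step mentioned above is omitted for brevity.

```python
# Minimal LSA sketch: document-term matrix followed by truncated SVD, with each SVD
# component read as a "topic" over the vocabulary. The toy corpus is a placeholder.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "face recognition with deep neural networks",
    "fingerprint recognition for biometric systems",
    "text recognition in scanned documents",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)                          # document-term matrix (DTM)
svd = TruncatedSVD(n_components=2, random_state=0).fit(X)   # low-rank approximation via SVD
terms = vectorizer.get_feature_names_out()

for i, comp in enumerate(svd.components_):
    top = [terms[j] for j in comp.argsort()[-4:][::-1]]     # top-weighted terms per topic
    print(f"topic {i}:", top)
```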
In the field of state-of-the-art discovery, several studies have employed LDA and LSA topic modeling methods to identify research trends and provide an overview or summary of a topic. For example, Lee et al. [22] used LDA and bibliometric analysis to identify research trends in digital transformation in the manufacturing and engineering fields. Similarly, Li et al. employed the LDA algorithm, K-means clustering, and normalized term frequency-inverse document frequency (NTF-IDF) to discover research trends in deep learning within the computer vision field [30]. LSA has also been used in the social sciences to identify and analyze thematic patterns in large text collections [28]. These previous studies demonstrate that topic modeling, particularly LDA and LSA, can be used to discover the state-of-the-art in a topic and assist researchers in conducting preliminary analysis of research.
However, upon a comprehensive review of the literature, no approach has yet been found that can identify state-of-the-art research on any given topic while automatically processing scholarly literature from various sources. Consequently, the contribution of this research is an integrated software system with a topic modeling component capable of uncovering the latest developments in a particular field.
The software extracts bibliometric data and article content from multiple research databases, employing information retrieval (IR), a process involving the collection and organization of information from a database [31]. More specifically, the software will utilize web scraping techniques, an efficient method for automatically extracting targeted data from websites. Owing to its ability to gather information from websites on a large scale, this technique is well-suited for this research, as the software will amass vast amounts of data from various research sites [32].
Subsequently, the software will process the collected data using a Latent Dirichlet Allocation (LDA) model, with the ultimate output being a visualization of topic modeling results. This will enable researchers to better understand current topics and emerging trends in research for any specified subject area. Figure 4 illustrates a flowchart that visualizes the necessary steps to conduct the research. There are five primary steps performed sequentially in this study: literature review, topic modeling algorithm determination, software development, software evaluation, and software deployment. Specifically, for the second and third steps, several sub-processes are carried out within those stages. This is symbolized by the yellow blocks encapsulating sub-processes (marked in purple). A more detailed description of each step and sub-process will be elaborated in the subsequent subsections.

A. LITERATURE REVIEW
The research begins with a literature review, a crucial step that provides a deeper understanding of concepts and previous researchers' efforts to address the problem of state-of-the-art discovery. Through this review, researchers can identify weaknesses and limitations in previous studies and formulate output solutions relevant to a novel context. Explanations of concepts, insights, and previous researchers' efforts that form the foundation of this study have been presented earlier.

B. TOPIC MODELING ALGORITHM DETERMINATION
Algorithm selection is a critical step for state-of-the-art discovery in this research. The two most commonly used topic modeling algorithms, LDA and LSA, are chosen for comparison. The rationale for selecting these algorithms is based on their popularity and widespread use in previous research. LDA and LSA have gained popularity in recent years due to their ability to extract meaningful topics from large, unstructured text datasets and have outperformed other topic modeling algorithms in various studies. Previous research has compared LDA and LSA across different fields, including natural language processing, information discovery, and state-of-the-art discovery [33], [34], [35]. For example, Blei et al. [36] compared LDA to other models, demonstrating its ability to reveal the underlying structure of a corpus.
Other studies have shown that LSA is particularly effective for topic modeling on documents with shorter content, while LDA performs better on documents with more extensive content [36], [37]. Moreover, LSA and LDA have been widely applied in research that discovers and analyzes research maps and topics within a field of study, aiming to identify current trends and gaps in a research area, as previously outlined in the literature review. Given the popularity and widespread use of LSA and LDA in previous research, as well as their effectiveness in discovering state-of-the-art topics, this study compares both algorithms. The comparison aims to identify the most suitable algorithm for automatic state-of-the-art discovery based on its ability to reveal relevant topics and provide meaningful labels.
Therefore, after conducting the literature review, the LDA and LSA topic modeling algorithms are implemented, and their performance is evaluated on a benchmark database to determine the topic modeling algorithm to be used. Data collection for algorithm determination is conducted by gathering research data, specifically abstract data, from research in the field of ''image recognition'' in the CORE academic database. This is done using the Python programming language with the requests library, which is used to make hypertext transfer protocol (HTTP) requests to the application programming interface (API) provided by CORE, with the search query as follows: ''image recognition'' AND fieldsOfStudy:''computer science'' AND documentType:''research''.
This query will return all academic articles in the field of computer science that contain the topic of image recognition. From the obtained data, only abstracts will be used, in line with previous research that recommends using abstracts based on data availability and consistent writing style [27], [38]. The CORE database is a comprehensive collection of open-access research papers, making it a valuable resource for this study [39]. However, it should be noted that the dataset obtained from CORE might not be entirely representative of the entire field of computer science and image recognition research.  One potential limitation is that the CORE database might not include all available articles in this field, particularly those published in journals or conferences that do not make their content openly accessible. Despite these limitations, the use of data from CORE as a data source for the determination of the algorithm still provides a valuable snapshot of the current research landscape in the field of image recognition.
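As an illustration of this retrieval step, the following minimal sketch issues such a request with the requests library; the endpoint, query parameters, and response fields are assumptions about the CORE API, and the API key is a placeholder that must be supplied by the user.

```python
# Minimal sketch of abstract retrieval from CORE over HTTP. Endpoint, parameter names,
# and response fields are assumptions; API_KEY is a placeholder.
import requests

API_KEY = "YOUR_CORE_API_KEY"  # placeholder
query = '"image recognition" AND fieldsOfStudy:"computer science" AND documentType:"research"'

resp = requests.get(
    "https://api.core.ac.uk/v3/search/works",             # assumed CORE search endpoint
    params={"q": query, "limit": 100},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()

# Keep only records that actually include an abstract.
abstracts = [item["abstract"] for item in resp.json().get("results", [])
             if item.get("abstract")]
```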
Additionally, data collection for the algorithm determination phase is limited to 500 articles. This limitation was primarily driven by practical considerations, as conducting initial comparisons and algorithm evaluations with a smaller dataset ensures a manageable computational load and reduces processing time. The metric used to measure model performance is the UMass coherence metric [12], [22], [40], with data obtained from the CORE academic database for a research topic. The model with the best coherence metric value and coherent topic output is used.

C. DATA COLLECTION
After the model has been determined, a data collection pipeline is developed. The pipeline includes retrieving abstract data from five academic databases, CORE, arXiv, ScienceOpen, Emerald, and Garuda, according to the provided search query. The decision to use these databases was influenced by their extensive coverage of topics and disciplines, as well as their widespread recognition and use in the academic community. The choice to include Garuda, an Indonesian academic database, is to test the system's capability to process multilingual content. Nonetheless, relying solely on these databases may introduce selection bias, as some relevant studies not included in these databases may be overlooked.
The data is retrieved using the respective academic database's API through HTTP requests using the requests library, or, if the API is unavailable, through website scraping using the selenium library. Both data retrieval methods, however, have limitations. For instance, academic databases impose rate limits on API requests, which could hinder scalability and performance. Similarly, website scraping, while a viable temporary solution, requires close monitoring and maintenance due to potential changes in website structure, which could render previously scraped elements unavailable.
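Where no API is available, the scraping fallback might look roughly like the following minimal sketch; the URL and CSS selector are hypothetical placeholders, since the real selectors depend on each database's page structure and may break when that structure changes, as noted above.

```python
# Minimal sketch of the selenium-based scraping fallback (placeholder URL and selector).
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()                                # assumes a compatible ChromeDriver
driver.get("https://example-academic-database.org/search?q=image+recognition")  # placeholder URL

# Collect abstract texts from the result page (placeholder selector).
abstracts = [el.text for el in driver.find_elements(By.CSS_SELECTOR, ".abstract")]
driver.quit()
```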
The focus on the collection and analysis of research article abstracts is consistent with previous studies and is supported by several other reasons, such as (i) the availability of abstract data from the utilized databases, (ii) abstracts representing a brief summary of the article, minimizing the likelihood of identifying minor topics, and (iii) abstracts across articles being relatively comparable in terms of format and writing style [27], [38].
Nevertheless, using only abstracts may not capture the full breadth and depth of a research study, which could result in the derived topics being overly general. This is a potential trade-off when compared to processing full-text articles. However, [27] argues that abstracts are a common starting point for researchers to assess articles. Meanwhile, [38] highlights the consistency in the structure and content of abstracts across papers, which ensures that the key themes are not obscured by differences in writing style or presentation, thereby facilitating a more reliable comparison and analysis of different papers.
These studies also highlight that abstracts can still be a representative snapshot of a paper's content and effectively distill the essence of the research. Additionally, abstracts, due to their shorter length, require fewer computational resources to process, making them an ideal choice for topic modeling and text analysis. Key findings from similar studies in the fields of text analysis and topic modeling also indicate that the common use of abstracts is due to their accessibility, cost- and computational-resource effectiveness, and adequacy of content [41], [42], [43]. Therefore, the use of abstracts in our research aligns with previous research, underscoring the effectiveness of abstracts in representing the core content and facilitating the analysis of academic articles.

D. DATA PREPROCESSING
Before using the data for analysis, preprocessing is performed to ensure that the data is suitable for modeling [44]. Preprocessing steps are carried out using the gensim library. A detailed explanation of the preprocessing steps undertaken is as follows:

1) TOKENIZATION
Tokenization is the first step of the preprocessing phase, where the text data is split into smaller units, referred to as tokens. This procedure is typically done at the word or subword level. In this study, we adopt word-level tokenization. This tokenization process operates by breaking a complex sentence down into words using space and punctuation as delimiters. For instance, the sentence ''I am studying machine learning.'' would be tokenized into ['I', 'am', 'studying', 'machine', 'learning', '.'] [45], [46].

2) CASE FOLDING
Case folding is then performed to ensure that the text is standardized in terms of the case. This process ensures that machine learning models interpret words with different character cases (e.g., ''You'' and ''YOU'') in the same way, minimizing the likelihood that the model will weigh identical words differently [45], [46].

3) TOKEN CLEANSING
Token cleansing is also performed, which includes several sub-processes, such as accent and diacritic normalization, where words like ''cliché'' are changed to ''cliche'' to reduce the variety of tokens. Additionally, the removal of links, punctuation, numbers, and non-Latin characters is also carried out to keep the focus on meaningful content [45], [46].

4) STOPWORD REMOVAL
Stopword removal is performed to discard words that frequently appear in the text but do not contribute to its meaning [45], [46]. The list of stopwords used is a combination of Indonesian and English stopwords provided by the Natural Language Toolkit (NLTK) library [47].

5) LEMMATIZATION
Lemmatization, as the final preprocessing step, is performed to reduce words to their base forms. This aims to ensure that derived words are interpreted in the same way by the model [45], [46], allowing the model to treat the variations of a word as a single token. For lemmatization, the stanfordnlp library with its Indonesian NLP pipeline is used to find the lemma of Indonesian text [48], and spacy is used for English text [49]. The application of lemmatization helps reduce the complexity of the model and increases the interpretability of the results.
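Taken together, the preprocessing steps above might look roughly like the following minimal sketch for English text; the spaCy model name and the exact regular expressions are illustrative assumptions, and the Indonesian lemmatization via stanfordnlp is omitted for brevity.

```python
# Minimal sketch of the preprocessing pipeline for English text
# (case folding, token cleansing, NLTK stopwords, spaCy lemmatization).
import re
import unicodedata
import spacy
from nltk.corpus import stopwords

nlp = spacy.load("en_core_web_sm")   # requires: python -m spacy download en_core_web_sm
stop_words = set(stopwords.words("english")) | set(stopwords.words("indonesian"))

def preprocess(text):
    text = text.lower()                                                              # case folding
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()    # "cliché" -> "cliche"
    text = re.sub(r"http\S+", " ", text)                                             # remove links
    text = re.sub(r"[^a-z\s]", " ", text)                                            # remove punctuation, numbers, non-Latin chars
    doc = nlp(text)                                                                  # tokenization
    return [tok.lemma_ for tok in doc                                                # lemmatization
            if tok.lemma_ not in stop_words and not tok.is_space and len(tok.lemma_) > 1]

print(preprocess("I am studying machine learning."))   # roughly ['study', 'machine', 'learning']
```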

E. VECTORIZATION
With the preprocessing complete, the data is transformed into a numerical representation before it can be used in modeling. Most topic modeling implementations do this by building a frequency matrix for each token, where a token is an n-gram sequence and n is determined by the researcher. This implementation is not adopted in this study for several reasons. Firstly, determining n-grams requires the researcher to have domain knowledge to choose appropriate values, which is not always practical or feasible. Moreover, an n-gram configuration suitable for one case may not be applicable to other cases, so determining n-grams requires repeated experimentation. This can also result in the model output producing ''awkward'' and grammatically incorrect phrases [50], [51], [52].

Therefore, this study uses an approach that extracts sequences or phrases using part-of-speech (POS) patterns. The POS patterns used are patterns widely used in research to extract phrases, where all phrases consisting of an adjective followed by a noun, a noun followed by a noun, or a noun followed by a preposition and a noun are extracted [53], [54], [55], [56]. In addition, there is additional filtering in the data vectorization process, where phrases or words that appear in more than 93% of the documents in the corpus are discarded. This ensures that the resulting vectors are not dominated by overly frequent terms and therefore contain more nuanced information about each document's content.

The result of this process is a document-phrase matrix, where each row corresponds to a document and each column corresponds to a unique phrase. The entries in this matrix represent the frequency of each phrase in each document. This matrix is then used in the subsequent modeling phase, where LDA and BERT are applied for automatic discovery and labeling of topics. With this approach, manual n-gram definition is no longer needed, and the modeling results can produce outputs that are expected to be more coherent and grammatically correct.
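A minimal sketch of this vectorization step is shown below, using a spaCy Matcher for the POS patterns and scikit-learn's CountVectorizer for the document-phrase matrix; the toy documents, pattern details, and the direct mapping of the 93% cut-off to max_df are illustrative assumptions.

```python
# Minimal sketch of POS-pattern phrase extraction and the document-phrase matrix.
import spacy
from spacy.matcher import Matcher
from sklearn.feature_extraction.text import CountVectorizer

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
matcher.add("PHRASE", [
    [{"POS": "ADJ"}, {"POS": "NOUN"}],                    # adjective + noun
    [{"POS": "NOUN"}, {"POS": "NOUN"}],                   # noun + noun
    [{"POS": "NOUN"}, {"POS": "ADP"}, {"POS": "NOUN"}],   # noun + preposition + noun
])

def extract_phrases(text):
    doc = nlp(text)
    return [doc[start:end].text.lower() for _, start, end in matcher(doc)]

docs = [
    "Convolutional neural networks improve image recognition accuracy.",
    "Face recognition systems rely on deep feature extraction.",
]
phrase_docs = [extract_phrases(d) for d in docs]

# Document-phrase matrix: rows = documents, columns = unique phrases;
# max_df drops phrases appearing in more than 93% of documents.
vectorizer = CountVectorizer(analyzer=lambda tokens: tokens, max_df=0.93)
X = vectorizer.fit_transform(phrase_docs)
print(vectorizer.get_feature_names_out())
```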

F. MODELING
The numerical matrix extracted in the previous stage is then used to determine the parameter values in the model to be built. Some parameters that will be hardcoded in the model are as follows:
• max_iter = 50
• learning_method = batch
• Parameters α and β are determined automatically using variational Bayes inference.
The parameter K, the number of topics, is determined using an elbow cut-off, where the K value with the highest UMass intrinsic coherence score is used. An early stopping mechanism is also applied to determine the value of K, with ε = 8, where ε is the patience parameter. This early stopping follows the following logic: for K = i, ..., j, where i and j are predetermined bounds (in this study, 2 to 30), the search for the K value is stopped if the coherence score does not increase over ε consecutive values of K. After obtaining the parameter values that can be assumed to be optimal, LDA model creation is carried out with the parameter values found. This modeling process is performed at runtime, meaning that the model is created dynamically using the abstract data from the user's search results.
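A minimal sketch of this search, assuming scikit-learn's LatentDirichletAllocation for the model and gensim's CoherenceModel for the UMass score, is shown below; the placeholder documents and helper names are illustrative, not the exact implementation.

```python
# Minimal sketch of the K search: batch variational Bayes LDA (max_iter = 50) scored
# with UMass coherence, with an elbow-style early stop after eps = 8 values of K
# without improvement.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel

docs = [                                                   # placeholder preprocessed abstracts
    "face recognition with convolutional neural networks",
    "fingerprint recognition for biometric authentication systems",
    "text recognition in scanned document images",
    "object detection and image recognition benchmarks",
]

vectorizer = CountVectorizer(max_df=0.93)
X = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

# Gensim dictionary/corpus over the same token stream, needed for UMass coherence.
analyzer = vectorizer.build_analyzer()
texts = [analyzer(d) for d in docs]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

def umass(model, topn=10):
    topics = [[terms[i] for i in comp.argsort()[-topn:][::-1]] for comp in model.components_]
    return CoherenceModel(topics=topics, corpus=corpus, dictionary=dictionary,
                          coherence="u_mass").get_coherence()

best_k, best_score, waited, eps = None, -np.inf, 0, 8      # eps: patience parameter (ε)
for k in range(2, 31):                                     # K searched from 2 to 30
    lda = LatentDirichletAllocation(n_components=k, max_iter=50,
                                    learning_method="batch", random_state=0).fit(X)
    score = umass(lda)
    if score > best_score:                                 # UMass: higher (closer to 0) is better
        best_k, best_score, waited = k, score, 0
    else:
        waited += 1
        if waited >= eps:                                  # stop if no improvement for eps values of K
            break
```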

G. LABELING
Automatic topic labeling is then carried out in a similar manner as described in the initial experiment, where the top 25 words from each topic are combined into one text. Then, using the phrase extraction process accommodated by the pretrained multilingual BERT deep learning model paraphrase-multilingual-MiniLM-L12-v2 [57], the ten phrases that best represent each topic are found. This is done by searching for phrases whose embeddings have the closest cosine similarity to the combined 25 words from each topic. It is assumed that higher similarity indicates that these phrases can represent the overall topic well. The reason for using a multilingual model is that the corpus processed in modeling may contain Indonesian and English data, and each text may contain words, phrases, or terms in other languages. In addition, the pretrained BERT model used has been fine-tuned to identify and find semantic similarity between phrases in different languages, making it suitable for use on article content containing multiple languages.

Choosing the labeling method using the BERT model is also due to the quality of the embeddings produced. Other methods for generating embeddings, such as word2vec, GloVe, FastText, and ELMo, are generally not trained on text corpora as large, do not use a transformer architecture, and do not take into account the context of words in a text. BERT as a base model has been trained on a very large text corpus covering 104 languages, thus having seen many different contexts, and is capable of producing embeddings that take into account the context in which a word appears [58], [59]. Additionally, because the model used is pretrained and has been fine-tuned on a phrase-matching task, it is capable of producing more accurate and useful embeddings for natural language processing tasks compared to other embedding methods.
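A minimal sketch of this labeling step with the sentence-transformers library is shown below; the word list and candidate phrases are placeholders standing in for a topic's top 25 words and the POS-extracted candidate phrases.

```python
# Minimal sketch: rank candidate phrases by cosine similarity of their embeddings
# to the embedding of the topic's combined top words.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

top_words = ["face", "recognition", "feature", "image", "detection"]      # placeholder top words
candidate_phrases = ["face recognition", "feature extraction",
                     "image classification", "traffic prediction"]        # placeholder phrases

topic_text = " ".join(top_words)                          # combine the topic's top words into one text
topic_emb = model.encode(topic_text, convert_to_tensor=True)
phrase_embs = model.encode(candidate_phrases, convert_to_tensor=True)

scores = util.cos_sim(topic_emb, phrase_embs)[0]          # cosine similarity to the combined topic text
ranking = scores.argsort(descending=True)                 # best-matching phrases first
labels = [candidate_phrases[int(i)] for i in ranking[:10]]  # keep up to ten representative phrases
print(labels)
```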

H. CANDIDATE LABEL PHRASE FILTERING
Generated phrases may be ''random'' or fail to represent an original concept or topic. Therefore, we introduce a novel topic-filtering method, grounded in the principle of entity linking. Entity linking is a process in which each phrase is resolved to a knowledge base (KB) [60], [61], [62]. This process thereby helps leverage structured world knowledge to enhance the coherence and relevance of the generated phrases. In our methodology, each phrase is cross-referenced with two knowledge bases: ConceptNet and Wikipedia, and phrases not found in both KBs are discarded. ConceptNet is a knowledge database developed by MIT, containing information about words or phrases and their relationships with other data [63]. Similar to ConceptNet, Wikipedia is an encyclopedia encompassing information across various fields of knowledge [64].
The fundamental assumption is that, since ConceptNet and Wikipedia include human-verified concepts and relationships, a phrase not present in either platform is deemed non-existent and is removed. Essentially, each phrase is used as a search query in the KB. If a phrase is not found in the KB and has more than one word, the program constructs n-gram combinations of the words in the phrase, for N = 2 up to the total number of words in the phrase, and searches the KB. If any combination is found, the phrase is retained. Additional libraries used in this process include conceptnet_lite as an interface to the ConceptNet database and the wikipedia library for accessing Wikipedia data via HTTP requests.
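The filtering logic might look roughly like the following minimal sketch; for illustration it queries the public ConceptNet HTTP API (whose endpoint is an assumption) rather than the local conceptnet_lite database, and uses the wikipedia library's search function, with placeholder candidate phrases.

```python
# Minimal sketch of knowledge-base filtering with an n-gram fallback.
import requests
import wikipedia

def in_conceptnet(phrase):
    term = phrase.lower().replace(" ", "_")
    r = requests.get(f"https://api.conceptnet.io/c/en/{term}", timeout=10)  # assumed public endpoint
    return r.ok and bool(r.json().get("edges"))

def in_wikipedia(phrase):
    return len(wikipedia.search(phrase)) > 0

def ngrams(words, n):
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def keep_phrase(phrase, lookup):
    if lookup(phrase):
        return True
    words = phrase.split()
    # Fall back to n-gram combinations of the phrase for N = 2 .. number of words.
    for n in range(2, len(words) + 1):
        if any(lookup(g) for g in ngrams(words, n)):
            return True
    return False

candidates = ["face recognition", "human show video"]       # placeholder phrases
kept_conceptnet = [p for p in candidates if keep_phrase(p, in_conceptnet)]
kept_wikipedia = [p for p in candidates if keep_phrase(p, in_wikipedia)]
```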

I. VISUALIZATION
The output of the modeling, i.e., the phrases representing each topic along with their weights, is used in the visualization of the final topic modeling results. For each collected abstract document, its probabilities over the topics are obtained using the built model. Each document is then categorized into the topic with the highest probability.
The first visualization produced is the distribution of topic groupings for the user's search query using a nested bubble chart. Based on the number of documents, the representative phrases, and the weight of each phrase for each topic, a bubble is created for each topic: its size reflects the number of documents categorized into that topic, its label is the phrase with the highest weight, indicating the core idea of the topic, and nested bubbles represent the other representative phrases (if any). Nested bubble charts were chosen for their ability to display hierarchical structures in data, where larger bubbles represent primary or trending topics, and smaller, nested bubbles represent subtopics or related phrases. The nested nature of the bubbles helps to highlight the intricate structure of academic research, where broad themes often encompass a variety of subtopics and related concepts. By providing a snapshot of the distribution of topic groupings, it aids users in quickly identifying trending topics and potential gaps, where smaller bubbles or absent themes may indicate areas that are less explored.
In addition to the bubble chart, a line chart is also produced to show the evolution of topic frequency over time based on user search queries. The Y-axis on the chart will represent the number of documents, and the X-axis represents the time range. Each data point on the chart will represent the number of documents for a specific topic and time. The line chart serves as a tool to track the development and progression of research trends over time, offering insights into the changing landscape of the academic field in question. By visualizing the frequency of topics over time, users can understand past and current research trends and possibly predict future directions.
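The data preparation behind both charts might look roughly like the following minimal sketch, where the per-document topic probabilities and publication years are hypothetical placeholders: each document is assigned its highest-probability topic, and counts are then aggregated per topic (bubble sizes) and per topic and year (line-chart points).

```python
# Minimal sketch of the chart data preparation (placeholder inputs; in the software
# these come from the fitted LDA model and the collected article metadata).
import numpy as np
import pandas as pd

doc_topic = np.array([[0.7, 0.2, 0.1],     # θ for document 0
                      [0.1, 0.8, 0.1],
                      [0.2, 0.1, 0.7],
                      [0.6, 0.3, 0.1]])
years = [2021, 2021, 2022, 2023]

assignments = doc_topic.argmax(axis=1)     # assign each document to its highest-probability topic
df = pd.DataFrame({"topic": assignments, "year": years})

bubble_sizes = df["topic"].value_counts()                                # documents per topic (bubble size)
evolution = df.groupby(["year", "topic"]).size().unstack(fill_value=0)  # topic frequency per year (line chart)
print(bubble_sizes, evolution, sep="\n")
```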

J. SOFTWARE IMPLEMENTATION
The software developed as the culmination of this research is web-based and deployed for ease of access. Figure 5 displays the architecture of the software.
The architecture of the developed software consists of three primary components: database, backend, and frontend service. The web-based nature of this software facilitates seamless implementation and accessibility for the general public, enabling any internet-connected device to access and utilize the software.
The backend service of the software is an API constructed using the Python programming language and the FastAPI library. The API's primary responsibilities include retrieving academic data from the database based on user-provided search queries, performing topic modeling on the obtained data, and storing search histories in a MySQL database.
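A minimal sketch of such a search endpoint with FastAPI is shown below; the route name, request schema, and run_pipeline helper are illustrative assumptions, not the software's actual API.

```python
# Minimal sketch of a FastAPI search endpoint wrapping the topic modeling pipeline.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SearchRequest(BaseModel):
    query: str

def run_pipeline(query: str) -> dict:
    # Placeholder for the real pipeline: retrieve abstracts for the query, preprocess,
    # fit the LDA model, label topics with BERT, filter labels via entity linking,
    # and return chart-ready topic and evolution data.
    return {"topics": [], "evolution": []}

@app.post("/search")
def search(req: SearchRequest) -> dict:
    return run_pipeline(req.query)
```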
The frontend service is a website developed with Next.js, connected to the API, and serves as an interface for conducting state-of-the-art searches on user-specified topics. In addition, this interface transforms the data received from the API, following the modeling process, into visual representations such as nested bubble charts and line charts. The application's frontend is hosted on the Vercel cloud platform, providing enhanced performance and scalability. The use of the platform also significantly helps in the development and delivery of the software, as servers do not need to be managed and the service can scale elastically. However, it is also subject to limitations inherent to cloud platforms, such as potential latency or downtime during peak traffic and concerns about data privacy.
The backend and database, on the other hand, are hosted on a robust server operating on the widely-used Ubuntu system, which is situated within the DigitalOcean infrastructure. While this setup offers the advantage of robustness and a widely-used operating system, potential challenges such as difficulty in maintenance and scaling may arise, as server management must be handled manually. It is also important to acknowledge that the hosting decision was influenced by cost-efficiency and availability considerations, which might not necessarily align with optimal performance requirements.
This server is outfitted with 8 GB of random-access memory (RAM) and a central processing unit (CPU) containing four virtual cores. These specifications were selected to ensure optimal performance and responsiveness for the anticipated user load. However, as user load increases or computational requirements for topic modeling change, these specifications might become insufficient, resulting in slower response times or limitations in handling concurrent user requests. To ensure optimal performance and responsiveness of the application, various measures have been taken, including optimizing the code base (such as using parallel processing techniques in the modeling process and when issuing requests to third-party services), regular server maintenance, and monitoring system performance. However, potential bottlenecks such as network latency, system overload, or computational resource limitations may still occur, particularly during peak usage times. Therefore, future iterations of the software may require upgrades to the server specifications or other architectural improvements to retrieve and handle larger datasets and models.

K. EVALUATION
The performance of the developed software in discovering state-of-the-art is tested before deploying it for public use.

1) SOFTWARE SURVEY
The built software is tested to ensure that it is qualitatively suitable for use in research and for general public use. This qualitative evaluation process includes a questionnaire designed to assess two primary dimensions of the research output, namely, the application and the topic modeling results. The questionnaire is distributed to individuals who are faculty members and/or student researchers. The rationale behind selecting individuals with these qualifications is to ensure that respondents have a clear understanding of the state-of-the-art in their respective research focus areas. This respondent selection criterion is also used to ensure that each respondent is competent and can assess the relevance of the topic outputs and their alignment with the desired field of study. Moreover, the questionnaire clarifies that the state-of-the-art referred to is the theoretical state-of-the-art, pointing to the latest topics in the field of study.
Each dimension has three indicators that are used to provide an overall evaluation of that dimension and are rated on a Likert scale: Strongly Disagree (SD, score 1), Disagree (D, score 2), Agree with Doubt (AD, score 3), Agree (A, score 4), and Strongly Agree (SA, score 5). The items used in the questionnaire can be seen in Table 1.
The questionnaire items presented in Table 1 are created in a form using Google Forms to facilitate the collection of evaluations of the software output. Each participant is asked to try the developed software and complete the provided form based on their individual assessment. If the evaluation results show that, on average, the values of each item for both dimensions are above three, the software is assumed to be suitable for public use.

2) OUTPUT COMPARISON OF SOFTWARE WITH OPEN KNOWLEDGE MAPS
Sample outputs from one of the entries submitted by respondents are also evaluated and compared with the results of software with similar purposes and visualizations, such as Open Knowledge Maps. Open Knowledge Maps is a software developed to aid in searching, discovering, and understanding scientific literature in various fields of study. In line with the research objectives, Open Knowledge Maps was built based on the difficulty of finding research maps from the vast amount of available academic content. Therefore, the software provides an interface that visualizes a research topic in the form of a knowledge map containing major themes in the field of study represented as circles [65], [66], [67].
Open Knowledge Maps has been widely used in research across various fields to help researchers understand research maps, identify research trends, and discover gaps in research. For example, [68], [69] used Open Knowledge Maps to find and confirm the main themes in research related to agro-, agri-, and rural tourism. [70] also used Open Knowledge Maps to map the development of artificial intelligence applications in the Internet of Things. Additionally, Open Knowledge Maps has been used in various other fields, such as business [71], [72], [73], [74], marketing [75], civil engineering [76], and health [77], [78], [79], for similar purposes.
By comparing the research output with the output of Open Knowledge Maps, a software tool that has been tested and used by other researchers, the effectiveness and accuracy of the proposed approach can be evaluated, and areas where the software can be improved can be identified.

A. TOPIC MODELING ALGORITHM DETERMINATION
To determine the most suitable topic modeling algorithm, two standard algorithms, namely Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA), were implemented and compared. Figure 6 illustrates the coherence values for various K values.

1) LDA
As shown, the optimal coherence value for LDA modeling is obtained with K = 3, with a coherence value of -27.69. Consequently, K is set to 3, and these parameters are employed to construct the LDA model. By visualizing the model output using the pyLDAvis library in the form of an intertopic distance map, it is evident that each topic generated by LDA is well-segregated (see Figure 7).
The LDA modeling results, which return topics, are further analyzed using phrase extraction involving part-of-speech (POS) patterns and the BERT model. Through this advanced analysis, it is apparent that each topic discusses distinct and coherent subjects.
For instance, the output shown in Figure 8 reveals discussions on text recognition (first topic), face recognition (second topic), and fingerprint recognition (third topic) within the field of image recognition. Some phrases are still ''random'' or do not represent an original concept, such as face use or human show video. These are further filtered by performing entity linking with the ConceptNet and Wikipedia knowledge bases. The outcomes of the phrase filtering and normalization are presented in Table 2, which displays the extracted phrases before filtering and normalization, as well as the phrases retained after filtering and normalization through entity linking with the ConceptNet and Wikipedia knowledge bases.
For the first topic, it is observed that filtering with ConceptNet only outputs the phrase neural network as the sole valid phrase. On the other hand, filtering with Wikipedia yields several phrases, such as text recognition, neural network, network model, and system image. For the other two topics, the filtering and normalization results are quite similar for both methods, with the second topic containing only face recognition, show video (for ConceptNet), and human video (for Wikipedia), and the third topic represented solely through the phrase fingerprint recognition.
By performing filtering, the discussion themes of each topic become clearer and more detailed. Besides ensuring that each phrase from the topic represents a real concept, this process also demonstrates the potential for automatic topic labeling by showing filtered and normalized representative phrases. Evaluating the topic outputs and their representative phrases, it can be concluded that LDA effectively identifies topics from research abstracts, meaning that each topic is well-segregated and coherent. This approach has the potential to significantly enhance topic analysis and understanding, as well as streamline the process of categorizing and organizing research findings in a more structured manner.

2) LSA
In addition to LDA, LSA was also implemented as a basis for determining the topic modeling algorithm to be employed for this research. In its implementation, LSA only requires the determination of parameter K or the number of topics. The value of parameter K was determined similarly to LDA using the elbow cut-off method. Figure 9 displays the coherence values for various K values. For the LSA model, the best coherence value was achieved with K = 2, having a coherence value of -35.80. Consequently, K was set to 2. After modeling was performed, similar to LDA, phrase extraction using the BERT model was conducted to assist in the topic labeling process. The outputs of the extracted phrases can be seen in Figure 10.
Unlike the LDA model, the intertopic distance map visualization was deemed unsuitable for the LSA model, primarily because it does not represent topics as probability distributions over words. This characteristic poses challenges for visualization, as distance metrics such as the Jensen-Shannon divergence necessitate probability distributions for accurate measurement [80], [81]. However, filtering and normalization of the phrases generated by the LSA model were also performed using ConceptNet and Wikipedia. Table 3 shows the phrases before filtering and normalization, as well as the outputs of filtering and normalization through the ConceptNet and Wikipedia knowledge bases.
For both topics, LSA appeared to struggle with segregating the two topics, as evidenced by the similar phrases between the two topics, both discussing face recognition. Further filtering and normalization with ConceptNet and Wikipedia confirmed that the two topics were not well-segregated, as both could be concluded to discuss face recognition. Table 4 shows a comparison of the outcomes between the two algorithms employed. The table displays the algorithms tested, LDA and LSA, the coherence values obtained from each model, and whether the topic outputs were well-segregated.

3) COMPARISON AND DETERMINATION
Considering the performance of both models in finding topics within the given abstract data, it can be observed that LDA outperforms LSA in several aspects, such as coherence value and topic separation. The LDA model used had a coherence value of −27.69, while the LSA model's coherence value was lower at −35.80. Moreover, concerning topic outputs, LDA generated three topics, each representing a distinct topic: artificial neural networks, face recognition, and fingerprint recognition.
On the other hand, LSA produced two topics; however, both topics represented the same concept, face recognition, indicating that the topic separation of the LSA model was imperfect. It should be noted that, for the phrase extraction process that could be used for topic labeling, both models produced representative phrases. Taking into account the results obtained from the preliminary experiments and the literature review, which demonstrates the prevalence of LDA in similar research, LDA will be employed as the primary algorithm for topic modeling in this study.

B. SOFTWARE IMPLEMENTATION
With the LDA algorithm selected, as well as the data collection pipeline, modeling methods, and visualization presented in the previous subsections, a search interface has been created to explore the latest topics in research areas of interest. The software can be accessed at https://ldaviewer.vercel.app/.

C. SOFTWARE SURVEY
The developed software was further tested to ensure that it is qualitatively suitable for use in research and by the general public. This qualitative evaluation includes a questionnaire designed to assess two main dimensions of the research output, namely the application (software) and topic modeling results. The questionnaire was distributed and completed by 52 individuals who were student researchers. The analysis of the obtained survey results will be explained further. There were 52 respondents from various faculties with diverse research interests (see Figure 11).
The majority of respondents were researchers from the industrial technology faculty, but there were also many respondents from other faculties such as civil engineering and planning, psychology and sociocultural sciences, mathematics and natural sciences, medicine, religious sciences, business and economics, and education.
Regarding the application dimension, specifically ease of navigation, Figure 12 shows that the average score given was 4.3 on a scale of 1-5, with 5 being the best score, and with the highest distribution on the fourth and fifth scales. This score implies that the application is generally easy to use and navigate, which may be because the software is designed with an intuitive interface. Interface elements such as the search bar are familiar from other popular software like Google, and placeholders and labels provide usage instructions, making it easier for users to navigate the application.
Next, in terms of ease of information acquisition with the application interface, an average score of 4.1 was given and, similar to the previous item, the highest distribution was on the fourth and fifth scales (see Figure 13). This indicates that the application interface can convey information that is easily accessible to users. The way information is presented in the software, with labels preceding each piece of information and accompanying disclaimers, may explain why users can generally access information easily.
Lastly, in Figure 14, it can be seen that for the ease of information digestibility from the application, there are somewhat different results, with some respondents giving scores of 2 and 3, but still with the majority of scores at 4 and 5. Therefore, the average for this aspect is 4.1, indicating that the information from the application can be easily understood and digested. This may be because the software displays topic modeling output in the form of intuitive nested bubble and line chart visualizations that are easily understood by users. Additionally, one of the factors that may support this ease of information digestibility could be attributed to the labels explaining the components of the topic modeling output.
From Figure 15, regarding the relevance of the topics generated by the software to the topics chosen by the respondents, most respondents gave high scores, with an average score of 4.2 on a 1-5 scale. This indicates that the topics generated by the software are quite relevant to the topics chosen by the respondents. This can be explained by the new topic modeling approach proposed in this study, which extracts phrases dynamically from article content using POS patterns and then processes them through the LDA, BERT, and entity linking models. Through this approach, each generated topic is labeled with related phrases that refer to real topics, which may explain why the topic outputs are relevant to user search queries.
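To make the phrase extraction step concrete, the following is a minimal sketch of POS-pattern matching with spaCy, assuming the small English model is installed (python -m spacy download en_core_web_sm). The pattern shown, zero or more adjectives followed by one or more nouns, is illustrative only; the exact patterns used by the software may differ.

```python
import spacy
from spacy.matcher import Matcher
from spacy.util import filter_spans

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# Illustrative pattern: (ADJ)* (NOUN)+ captures spans such as "circular business model"
matcher.add("KEY_PHRASE", [[{"POS": "ADJ", "OP": "*"}, {"POS": "NOUN", "OP": "+"}]])

doc = nlp("Circular business models reduce plastic waste through waste management.")
spans = [doc[start:end] for _, start, end in matcher(doc)]
# Keep only the longest, non-overlapping spans as candidate phrases
phrases = [span.text.lower() for span in filter_spans(spans)]
print(phrases)
```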
Meanwhile, for the evaluation of the quality of the topics assigned to the articles used, the average score given by respondents is 4.3 (see Figure 16). This indicates that most respondents believe the articles used are in line with the labels assigned by the topic modeling process. Through its Dirichlet priors, the LDA model essentially ensures that each topic is represented by words or phrases that often co-occur and are similar. Therefore, the article-topic suitability can be attributed to this process, as the article content can be represented by topics with similar keywords.
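As a small illustration of this article-topic assignment, the sketch below trains a tiny gensim LDA model on hypothetical tokenized abstracts and labels each article with its highest-probability topic; the corpus and the number of topics are placeholders, not the values used in the study.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Hypothetical tokenized abstracts
articles = [
    ["face", "recognition", "image", "feature"],
    ["fingerprint", "recognition", "minutiae", "feature"],
    ["neural", "network", "training", "layer"],
]
dictionary = Dictionary(articles)
corpus = [dictionary.doc2bow(a) for a in articles]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=3, random_state=0)

def dominant_topic(tokens):
    # Full document-topic distribution, then the topic id with the highest probability
    dist = lda.get_document_topics(dictionary.doc2bow(tokens), minimum_probability=0.0)
    return max(dist, key=lambda pair: pair[1])[0]

print([dominant_topic(a) for a in articles])
```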
From Figure 17, the average score given for the outputs and visualizations provided by the software from topic modeling results is 4.4, indicating that the provided visualizations can help deliver an intuitive research overview to researchers. Nested bubble and line chart visualizations can help provide an intuitive overview of research results because both types of visualizations can offer a simple visual display and integrate topic modeling information from collected articles. Intuitively, the nested bubble chart presents quantitative data, such as the number of documents classified into a particular topic, while the line chart can be used to show changes in topic frequency over time. By combining both types of visualizations, researchers can quickly identify current topics in a particular field and the evolution of topic trends over time, which is helpful for understanding the research landscape on a specific topic.
Based on the survey results, several conclusions can be drawn. First, most respondents found the developed application interface easy to navigate and facilitated their access to information. Second, the information presented in this application was considered easy to digest by most respondents. Third, most respondents also felt that the topics obtained through this application were relevant to the selected topics. Fourth, most respondents also felt that the articles used were in line with the topic labels given in the topic modeling process. Finally, most respondents also found the developed model's output and visualizations useful for their research.

D. OUTPUT COMPARISON OF SOFTWARE WITH OPEN KNOWLEDGE MAPS
The following section discusses a sample output from the software for a search query conducted by one of the respondents concerning ''circular economy'', a topic in business and economics that focuses on the sustainable and efficient use of resources to reduce environmental impact [82], [83], [84]. The modeling output for this search query is evaluated and compared to the topic outputs from Open Knowledge Maps. The respondent conducted the search using the standard search feature, which retrieves articles from all databases with a limit of 100 articles per database. As shown in Figure 18, the search results were obtained in 17.36 minutes, with a total of 495 processed articles: 100 from each database except CORE, from which only 95 documents were obtained. The modeling generated 18 topics with a topic coherence value of -14.39, suggesting that the topics are relatively coherent, although there is room for improvement in the modeling process. These results show that the software was able to process and analyze a large number of articles in a relatively short amount of time, providing valuable insights into the research landscape of the circular economy. Table 5 shows the results of the topic modeling from the software used to analyze the text related to the circular economy; it presents the topics generated by the software, the labels representing those topics, and other phrases that can represent related topics.
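For illustration, the following is a minimal sketch of the per-database retrieval step for one of the five sources, arXiv, using its public Atom API via feedparser; the query string, the 100-result limit, and the use of titles and abstracts only are assumptions, and the other databases expose different APIs.

```python
import feedparser

# Hypothetical query against the public arXiv Atom API, capped at 100 results
url = ("http://export.arxiv.org/api/query"
       "?search_query=all:circular+economy&start=0&max_results=100")
feed = feedparser.parse(url)

# Keep the title and abstract of each returned entry for downstream preprocessing
records = [{"title": e.title, "abstract": e.summary} for e in feed.entries]
print(len(records))
```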
The generated topics cover various aspects of the circular economy, such as circular supply chain (sustainable supply chain), government policies, circular migration in cities, macroeconomy, sectoral level trade, business cycle, bank security against terrorism, global economy, local knowledge communication, provincial economy, sectoral resources, abstract economy, government fuel, regulation implementation, music industry, payment in Indonesia, circular decision-making, and mushroom substrate. Each topic has a label representing it, as well as other phrases that can represent related topics. For example, the first topic, circular supply chain, is associated with phrases such as ''Circular business model'', ''Circular economy business'', ''Circular economy strategy'', ''Circular economy principle'', ''Circular business case'', ''Plastic waste'', ''Waste management'', ''Chain management'', and ''Environmental impact economy''.
The resulting topics from the modeling are relevant to the circular economy theme, including topics about the circular supply chain, government policies, circular migration in cities, macroeconomy, and the business cycle. In addition, these topics are associated with phrases relevant to the circular economy, such as circular business model, waste management, and environmental impact economy. This shows that the topic modeling process used can provide accurate and relevant results for the circular economy topic. Moreover, it demonstrates that the topic modeling process can handle multilingual content simultaneously and group it into a single topic; for example, the topic of government policy is represented by phrases drawn from both English- and Indonesian-language articles. This shows how the process is able to handle both English and Indonesian articles, signifying that the topic modeling process is language-agnostic and can potentially be used without limitations on the language of the text being processed. The results of the topic modeling conducted on this search query are shown in Figures 19 and 20, which are nested bubble and line chart visualizations. The nested bubble chart visualization shows the frequency of documents labeled under specific topics, and the line chart shows the trend of topics over time.
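A minimal sketch of how such a line chart can be produced is shown below, assuming a hypothetical table with one row per labeled article and columns for publication year and assigned topic label; the actual software builds its charts from the full modeling output.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical article-level output: publication year and assigned topic label
df = pd.DataFrame({
    "year": [2015, 2016, 2016, 2017, 2018, 2018, 2019],
    "topic_label": ["macroeconomy", "macroeconomy", "government policy",
                    "government policy", "circular supply chain",
                    "government policy", "circular supply chain"],
})

# Count documents per topic per year and plot one line per topic
counts = df.groupby(["year", "topic_label"]).size().unstack(fill_value=0)
counts.plot(kind="line", marker="o")
plt.xlabel("Year")
plt.ylabel("Number of articles")
plt.show()
```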
As seen in Figure 19, the circular supply chain topic has the highest number of articles in this field. This is intuitively reasonable, since the circular supply chain is one of the main concepts that facilitates the flow of materials and resources in a closed loop through three main steps: reduction, reengineering, and recycling of resources. Topics concerning government policies and the migration of cities toward a circular economy also appear frequently, indicating that the implementation of the circular economy requires government support through the realization of related policies and can be implemented in cities.
However, the bubbles for the mushroom substrate, decision-making, and regulation implementation topics are tiny, indicating that little research has been conducted on these subjects. If generalized, the mushroom substrate topic concerns fungi, materials that are often discarded but could be recycled, and it may be rarely discussed because it is highly technical and specific to certain materials. The small bubbles for decision-making and regulation implementation can be explained by researchers' focus on developing regulations to support the circular economy without paying much attention to how governments can make decisions and implement these regulations in the circular economy field. This can also be interpreted as a gap in the literature, where there is a lack of research on these topics. The line chart shown in Figure 20 displays the frequency trend of documents for each topic over time.
From Figure 20, it can be inferred that there has been a significant increase in research on government policies for the circular economy, indicating growing interest in this topic from both researchers and governments. The visualization also reveals that prior to 2017, the majority of research was focused on general themes such as sectoral level trade, the macroeconomy, and the global economy. It would be intriguing to investigate further the reasons behind the surge in research on government policies and other topics, such as the migration to a circular economy in cities, from 2017 onwards. To compare the presented output sample with the results from Open Knowledge Maps, a research map was created using Open Knowledge Maps with the same keyword, ''circular economy'', selecting the BASE database because it is the option that incorporates all fields of study. The visualization output from Open Knowledge Maps, the generated research map, can be observed in Figure 21.
From the research map generated by Open Knowledge Maps in the field of circular economy, it is evident that the topics of sustainable development approach, European Union, and circular economy approach are the largest themes in this area. In addition, related topics such as circular business models, resource management, and waste regulation also emerge as primary subjects in the circular economy field. This is somewhat different from the research output results, where the main topics discuss circular supply chains and government policies in realizing a circular economy.
Nevertheless, there are also several similarities between the topics generated by Open Knowledge Maps and the research output. For instance, the topics of circular supply chains, waste management and plastic (waste management, plastic products, waste recycling), and government regulations (tax policy and government policies) appear in both software outputs. From the research output perspective, this may indicate that the findings are accurate and relevant, as Open Knowledge Maps has been widely utilized in research and is considered a reliable source for identifying up-to-date research topics.
There are also distinct topics for each software output. From the Open Knowledge Maps results, the topics of sustainable development, agribusiness, circular-oriented innovation, and product design emerge, which were not generated by the research output. However, there are also topics present in the research output but absent from the Open Knowledge Maps results, such as migration and the communication of circular economy to cities and communities, mushroom substrate, and the decision-making and implementation of government regulations and policies. Additionally, the research output generates Indonesian-language circular economy topics alongside English-language ones. The differing topic results from the two software may be attributed to the use of different academic databases, as Open Knowledge Maps only supports PubMed and BASE, while the research output utilizes five academic databases, namely CORE, arXiv, ScienceOpen, Emerald, and Garuda.
In Open Knowledge Maps, it is noticeable that several circles feature the repeating topic of ''circular economy'', which is counterintuitive given that this phrase was itself the search keyword and the field of study under investigation. In the research output, the topic ''circular economy'' does not recur. This can be explained by the frequency filtering method applied, which prevents overly frequent terms from appearing in the topic outputs. Moreover, the research output provides nested bubble visualizations and line charts, enabling users to visualize not only current topics but also gaps and trends in the evolution of research. These features can offer users a more comprehensive understanding of advancements in a particular field and can help identify new research areas or potential collaborations.
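A minimal sketch of such frequency filtering is shown below, assuming the extracted phrases are managed in a gensim Dictionary; the threshold values and the sample phrase documents are illustrative, not the ones used by the software.

```python
from gensim.corpora import Dictionary

# Hypothetical per-article phrase lists for a "circular economy" search
phrase_docs = [
    ["circular economy", "circular business model", "plastic waste"],
    ["circular economy", "government policy", "tax policy"],
    ["circular economy", "waste management", "circular supply chain"],
]
dictionary = Dictionary(phrase_docs)

# Drop phrases occurring in more than 80% of documents (illustrative threshold),
# so the search keyword itself cannot dominate every topic
dictionary.filter_extremes(no_below=1, no_above=0.8)
print(sorted(dictionary.token2id))  # "circular economy" has been filtered out
```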
Based on the comparative results, it can be concluded that the research output generates academically sound topics, as evidenced by the similarities between the research output and Open Knowledge Maps. Additionally, the research output offers supplementary features that can enhance users' understanding of current topics and research evolution and provide an accurate and comprehensive representation of research.
From the user search query samples presented and the comparison made with software widely used by other researchers, it is apparent that the research output can generate state-of-the-art topics within a research area. Furthermore, its application can be tested on research topics in any field and language, owing to its language-agnostic and context-independent processing nature. This demonstrates that the topic modeling process presented in this research can be employed across various fields and can yield accurate and relevant results when analyzing trends in research.
Considering the survey results and the advanced evaluation of user search query samples in the software, it can be concluded that the developed research output can assist researchers in discovering the state-of-the-art within a research topic and provide insights into the latest developments automatically. The resulting software can be utilized by researchers in general across various research fields to obtain state-of-the-art research topics and can also be employed to identify gaps in research.

V. CONCLUSION
Efforts to advance scientific knowledge should ideally become easier over time. However, the vast amount of digital information available has made this process increasingly challenging without tools capable of processing this information and providing insights to researchers. Previous studies have proposed various approaches to assist researchers in processing this information and formulating research maps. Nevertheless, existing approaches are not yet fully automated, and the outputs obtained are limited due to a reliance on minimal bibliometric data.
Based on the conducted research and identified problems, it can be concluded that the developed software can help researchers discover the state-of-the-art within a research topic. This research addresses the proposed research question and provides a solution that can aid researchers in identifying state-of-the-art topics and gaining insights into recent developments automatically. The primary findings of this study are that the developed software can autonomously gather data from various academic databases, apply machine learning models and text analysis to identify research topics, and present topic outputs in an easily interpretable visualization format.
The use of the LDA machine learning model, automatic phrase extraction using POS patterns, and the BERT model has proven effective in identifying topics related to state-of-the-art research. The topic filtering method based on entity linking ensures that the topic outputs are appropriate, accurate, and relevant to the research topic. The language-agnostic nature of the LDA model, coupled with the multilingual BERT model, demonstrates the research output's ability to synthesize and identify topics across languages. The topic visualizations presented in nested bubble charts and line charts also facilitate researchers' understanding of trends and gaps in research. The implications of these research findings are that the topic modeling process and developed software can support researchers in discovering state-of-the-art research and provide insights into the latest developments automatically. This research output can be utilized by researchers in various fields to obtain up-to-date information and identify research gaps.

VI. LIMITATIONS AND FUTURE RESEARCH
This study faced several challenges and has several limitations that can be addressed in future research. First, ensuring the applicability and scalability of the preprocessing phase for diverse languages posed significant challenges. Extensive research, validation, and iterative refinements were required to handle linguistic variations across multiple languages. We addressed this by employing robust language processing libraries such as StanfordNLP and spaCy, capable of handling multilingual data. The determination of the POS pattern for the vectorization process presented another challenge due to the variability of key phrase structures across languages.
We recognized that the chosen pattern might not be universally optimal and may yield less satisfactory results in some contexts. Future research could explore dynamic POS pattern selection methods based on the characteristics of the specific language or academic discipline.
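As one possible direction, the following is a minimal sketch of language-aware preprocessing, assuming langdetect for language identification and downloaded stanza models for English and Indonesian (stanza.download("en"), stanza.download("id")); the choice of languages and the fallback behavior are assumptions, not the exact pipeline used in this study.

```python
import stanza
from langdetect import detect

# Hypothetical two-language setup; additional languages could be added the same way
pipelines = {
    "en": stanza.Pipeline(lang="en", processors="tokenize,pos,lemma"),
    "id": stanza.Pipeline(lang="id", processors="tokenize,pos,lemma"),
}

def lemmatize(text):
    lang = detect(text)                          # e.g. "en" or "id"
    nlp = pipelines.get(lang, pipelines["en"])   # fall back to English if unsupported
    doc = nlp(text)
    return [word.lemma for sentence in doc.sentences for word in sentence.words]

print(lemmatize("Ekonomi sirkular mengurangi limbah plastik."))
```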
Our topic filtering method assumes that valid topics should exist in the knowledge bases, potentially invalidating emerging or niche topics not well-documented in ConceptNet or Wikipedia. To address this limitation, future iterations of our software could incorporate additional knowledge bases or employ other methods, such as integrating large language models trained on academic texts, to better capture such topics.
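A minimal sketch of this knowledge-base check is given below, assuming the public ConceptNet REST API; the endpoint form, the edge-count criterion, and the example phrases are illustrative and may not match the software's exact filtering rules.

```python
import requests

def in_conceptnet(phrase, lang="en"):
    # Treat a phrase as a valid topic label only if its ConceptNet node has edges
    term = phrase.lower().replace(" ", "_")
    resp = requests.get(f"https://api.conceptnet.io/c/{lang}/{term}", timeout=10)
    return resp.ok and len(resp.json().get("edges", [])) > 0

# Established concepts are likely kept, while an emerging or niche phrase may be
# rejected even if it is a legitimate research topic -- the limitation noted above.
print(in_conceptnet("circular economy"))
print(in_conceptnet("mushroom substrate valorization"))
```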
The software can currently collect data only from a select few academic databases, namely CORE, arXiv, ScienceOpen, Emerald, and Garuda, which means not all research data can be obtained. There is also a possibility that the topic outputs may still contain irrelevant topics, as not all data can be effectively filtered using the available knowledge databases. Additionally, the software currently takes a considerable amount of time to analyze the text and generate outputs.
Potential future research directions include developing software capable of gathering data from additional sources, such as Google Scholar, ScienceDirect, and other academic databases. Subsequent research could also explore alternative approaches for generating more accurate and relevant topics automatically, such as using the full-text content of the obtained articles in conjunction with large language models (LLMs) trained on academic texts.
Regarding the survey conducted, there were some potential limitations and biases due to the chosen sample. Although the 52 respondents were student researchers drawn from a range of faculties, they may not fully represent the diversity of the larger academic community in terms of research interests, academic levels, and familiarity with the software. Moreover, the sample size of 52, though sufficient for a preliminary analysis, could limit the ability to generalize the findings to a broader population of researchers.
Larger studies with more diverse samples in terms of academic levels and fields of study could help to better evaluate the software's efficacy across different user profiles. It should also be noted that there could be a potential bias due to the self-selection of participants, as those who opted to respond to the survey might be more tech-savvy or more positive towards the software. Future studies could aim for a more randomized selection of participants to reduce this bias.
Further research could also evaluate the software's effectiveness in improving researchers' efficiency when searching for state-of-the-art topics, employ more rigorous validity and reliability tests, and assess the software's ability to present research findings in a more detailed and intuitive manner.