Mapping of Cross-Lingual Emotional Topic Model Research Indexed in the Scopus Database from 2000 to 2020

Scientific research on Cross-Lingual Emotional Topic Models (CLETM) has attracted rising interest over the years. This study uses bibliometric tools to identify the publication trends and growth potential of CLETM studies, offering a better understanding of the field and potential future research directions. All published articles related to ‘Cross Lingual’ or ‘Emotional topic model’ in Scopus were identified and analyzed using the Bibliometrix R-package and VOSviewer software. A total of 1,188 publications from 2000 to 2020 were identified, published in 429 journals and contributed by 2,529 authors, with a Collaboration Index of 2.22 and 2.13 authors per document. Lecture Notes in Computer Science was the most productive source, with 120 articles and an h-index of 12. The most active country was China, with 145 documents (TNP = 145). The National Natural Science Foundation of China was the leading organization funding CLETM research. Li H, who is affiliated with the Department of Electrical and Computer Engineering, National University of Singapore, was the most active author, with 17 articles and an h-index of 9. Tsinghua University was the top author affiliation, with 30 articles. The findings of this study provide landmarks, baseline information on research productivity, and insights into the historical progression of CLETM research.


INTRODUCTION
Cross-Lingual Emotional Topic Models (CLETM) are a machine learning technique for discovering semantic topics in a document collection, which provides a convenient way to analyze large amounts of unclassified text. [1][4] Topic models are a mathematical framework that has helped many users in computer science better understand large document collections: not just to find individual documents but to understand the general themes present in the collection, and to explore and browse extensive collections of scholarly literature with new tools. [2,5] Bibliometric research on topic models and sentiment analysis has been absent, even though the assessment of scientific production is considered important, [6][7][8] and bibliometrics has become an essential branch of informatics with wide use in various scientific fields. [9,10] We therefore set out to use available bibliometric tools to analyze both the qualitative and quantitative attributes of published documents. The potential of digital libraries lies not only in making documents more accessible but also in providing automated tools that can analyze the literature and help readers better understand scientific contributions. [11]
Bibliometrics is a field of study that uses the bibliographic data of publications and their citation relations to evaluate and reveal the structure of previous research and disciplines. [12] Recent examples from medicine include studies on progress in COVID-19 research, [13] the effects of the COVID-19 pandemic on mental health, [14] the use of artificial intelligence and machine learning in oncology, [15] infectious disease, [16,17] Ebola, [18] and childhood obesity. [19,20] Such studies help make sense of the rapid growth of scientific research, which is a difficult task, and offer a road map of future research directions and challenges for filling research gaps. Therefore, bibliometric studies stand out as a useful research technique to evaluate the continuous and rapidly evolving literature concerning CLETM and possibly identify future research directions.
In this study, we aimed to provide a comprehensive analysis of research outputs in order to better assess scientific research productivity regarding CLETM, and to characterize the high-impact articles, annual growth patterns of published documents, authorship, and scientific collaboration between researchers in the fields of computer science and mathematics over the past 20 years. The main contribution of this study is to bridge this gap and provide a broader understanding of the latest trends in global CLETM publications indexed in the Scopus database from 2000 to 2020.

Study design
This study uses bibliometric analysis, a scientific research approach adopted by many scholars to monitor research performance and scientific progress and to support appropriate policy actions by researchers or governments. The basic bibliometric variables, namely annual trends in the number of publications and citations, the number of authors, institutions, countries, and journals, collaboration and corresponding-author analysis, and research hotspots, were assessed to give researchers a greater understanding of the documents published in the field of CLETM and of the past, current, and future directions of scientific research progress.

Data Sources
The data for this study were collected by retrieving documents indexed in the Scopus database (http://www.scopus.com/). Scopus is a unique database that can be used for bibliometric analysis.

Search strategy
A search strategy was developed, and a comprehensive search of the CLETM literature was performed in Scopus on 3 September 2021. The study used the keywords "crosslingual" or "Emotional topic model" to retrieve CLETM documents published in 2000-2020 from the Scopus database. The keywords were searched in article titles so as to maximize the accuracy of the retrieved output. Regarding manuscript types, only documents written in English, including research articles, conference papers, and review papers, were considered for analysis (Figure 1). Two reviewers (IHM and IZ) independently screened the titles to compile a list of the top 10 documents on CLETM. Finally, all documents were downloaded in text file formats (bib.txt, bib.ris, and CSV), and as a result, 1,188 publications related to CLETM were subjected to further analysis using the aforementioned bibliometric analysis techniques.
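The search criteria described in this section can be written down as a single Scopus advanced-search string. The following Python sketch is an assumption about how the stated criteria might be combined, not the query the authors actually ran: the helper name `build_scopus_query` is hypothetical, and the field codes (TITLE, PUBYEAR, LANGUAGE, DOCTYPE with the codes ar, cp, and re for articles, conference papers, and reviews) follow Scopus advanced-search conventions.

```python
def build_scopus_query(keywords, start_year, end_year, doc_types, language="english"):
    """Assemble a Scopus-style advanced search string (illustrative helper only)."""
    # Quote each keyword so multi-word phrases are searched as phrases in titles.
    title_clause = " OR ".join(f'"{kw}"' for kw in keywords)
    type_clause = " OR ".join(doc_types)
    return (
        f"TITLE({title_clause}) "
        # PUBYEAR uses strict inequalities, so widen the bounds by one year.
        f"AND PUBYEAR > {start_year - 1} AND PUBYEAR < {end_year + 1} "
        f"AND LANGUAGE({language}) "
        f"AND DOCTYPE({type_clause})"
    )


query = build_scopus_query(
    keywords=["crosslingual", "Emotional topic model"],  # keywords as stated in the text
    start_year=2000,
    end_year=2020,
    doc_types=["ar", "cp", "re"],  # article, conference paper, review
)
print(query)
```

Keeping the query in code makes the time span, language, and document-type filters explicit, which may help when the search is repeated on a later date.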

Data analysis and visualization
The data were analysed with visualization software tools [21] such as the "Biblioshiny app" (using RStudio Cloud). [22] The VOSviewer (version 1.6.6) package program (Leiden University, Leiden, The Netherlands) was used for mapping analysis; it also facilitates visualization of the dynamics and structure of information in the analysed documents. [23]

Characteristics of the meta-data
The retrieved documents were published in 2000-2020 in 429 journals, with contributions from 2,529 authors and a Collaboration Index of 2.22. According to the analysis, the documents received a total citation score of 11,562. The majority of the documents were conference papers (948; 79.80%), followed by articles (233; 19.61%) and review papers (7; 0.59%) (Table 1). The annual trends of publications and citations during the study period are shown in Figure 2.
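The descriptive ratios reported for the data set follow directly from the raw counts. The short Python sketch below recomputes them from the figures given in the text (1,188 documents, 2,529 authors, 11,562 citations, and the three document-type counts); the variable names are illustrative, and the Collaboration Index is not recomputed because it requires per-document author lists that are not reported here.

```python
total_docs = 1188
total_authors = 2529
total_citations = 11562
doc_types = {"conference paper": 948, "article": 233, "review": 7}

# Percentage share of each document type, rounded to two decimals.
shares = {name: round(100 * n / total_docs, 2) for name, n in doc_types.items()}

# Simple per-document averages.
authors_per_doc = round(total_authors / total_docs, 2)
citations_per_doc = round(total_citations / total_docs, 2)

print(shares)
print(authors_per_doc, citations_per_doc)
```

The document-type counts sum to the total of 1,188, and the resulting shares and the authors-per-document ratio agree with the values reported in the text; the citation average of about 9.7 per document is consistent with the rounded figure of 10 citations per document.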

Most cited documents
The top 10 cited documents are presented in Table 2. The most cited paper in CLETM research is 'Co-training for cross-lingual sentiment classification', which received over 324 citations. This paper proposes the co-training approach to address the problem of cross-lingual sentiment classification. [24] It is followed by 'Learning a multilingual subjective language via cross-lingual projections' with 232 citations. [25] The latter study proposed a new unified framework for monolingual and cross-lingual information retrieval.
Our findings show that Li H, from the Department of Electrical and Computer Engineering, National University of Singapore, is the most productive author, with 17 published articles (Table 3).

Output analysis of the top 10 countries by publications
A total of 66 countries contributed to the 1,188 papers. China tops the list with around 145 publications, followed by the United States of America (USA) with 58 papers and Germany with 46 papers. These three countries are the most productive in CLETM research. Additionally, China ranks first in both the number of documents and the number of citations (Table 4).

Output analysis of the top 10 most cited journal sources
The 1,188 documents on CLETM were published in 429 journals; the 10 journals with the most publications are listed in Table 5.

The top 10 affiliations and funding agencies
The study also reports the top 10 affiliations and funding agencies. Tsinghua University is the top author affiliation with 30 articles (18.75%), followed by The University of Edinburgh with 27 (16.88%). The National Natural Science Foundation of China is the top funding agency for CLETM research with 97 articles (61.01%), followed by the European Commission with 55 articles (34.59%) (Table 6).
The top 10 most frequent keywords were as follows: "cross-lingual" (589 times), "computational linguistics" (324 times), "natural language processing systems" (319 times), "semantics" (258 times), "information retrieval" (177 times), "translation (languages)" (163 times), "linguistics" (150 times), "speech recognition" (118 times), "target language" (115 times), and "machine translations" (114 times), among others.
The conceptual structure among the reported keywords was visualized using three methods: Correspondence Analysis (CA), Multiple Correspondence Analysis (MCA), and Multidimensional Scaling (MDS) (Figure 4A, B, and C). The analysis shows that the red cluster contains the most keywords, which reflects researchers' attention to the CLETM subject theme of the study.
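Keyword frequencies of the kind reported for CLETM can be obtained by counting, for each keyword, the number of records in which it appears. The Python sketch below illustrates the idea under stated assumptions: the function name `keyword_frequencies` and the three sample records are hypothetical, and the semicolon-separated keyword field is assumed to resemble a keyword column in an exported bibliographic file. It is not the procedure implemented by the software used in this study.

```python
from collections import Counter


def keyword_frequencies(records, sep=";"):
    """Count in how many records each keyword occurs (case-insensitive)."""
    counts = Counter()
    for field in records:
        # A set avoids counting a keyword twice within the same record.
        terms = {term.strip().lower() for term in field.split(sep) if term.strip()}
        counts.update(terms)
    return counts


# Hypothetical keyword fields, one string per document.
records = [
    "Cross-lingual; Semantics; Information retrieval",
    "cross-lingual; Speech recognition",
    "Semantics; Cross-Lingual; Computational linguistics",
]

top = keyword_frequencies(records).most_common(2)
print(top)
```

Lower-casing and trimming the terms before counting merges spelling variants such as "Cross-lingual" and "cross-lingual", which otherwise would be reported as separate keywords.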

Collaboration analysis between countries and institutions on CLETM
To uncover new knowledge and determine the collaboration between researchers within the top 50 institutions and countries, R software was used.
Thematic evolution based on article titles reported the topics that caught readers' attention and identified the extent to which topics are related to each other (Figure 6B).
This study presented an analysis of the global publication output and outlined possible future directions for researchers just venturing into the field of CLETM by providing sufficient information on the growth and development of the literature, on active authors, journals, countries, institutions, and funding agencies, and a complete keyword analysis of the terms most frequently used in CLETM research.
The findings showed that research on CLETM increased steadily and peaked in 2010 and 2020. The retrieved publications on CLETM received a high number of citations, with an average of 10 citations per document, which is indicative of a large number of readers and scholars. Most documents in the CLETM literature were published as conference papers (79.80%), followed by full research articles (19.61%) and review papers (0.59%). These findings show that researchers in the field of CLETM prefer to publish their work as conference papers, which they believe can gain more attention from the community than other types of documents.
The collaboration analysis showed that Tsinghua University, Nanyang Technological University, National University of Singapore, Soochow University, and Microsoft Research were located in cluster one, with closeness values of 0.005, 0.004, 0.004, 0.004, and 0.0004, respectively. The analysis of collaboration between countries also shows that China, the USA, and Hong Kong are located in cluster one, with closeness values of 0.012, 0.014, and 0.011, respectively (supplementary Table S1).
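For readers unfamiliar with the closeness measure reported for the collaboration networks, the Python sketch below computes it on a small toy graph. It assumes the unnormalized definition (the reciprocal of the sum of shortest-path distances from a node to every node it can reach), which tends to produce small values in large networks; the text does not state which variant was used, and the node labels A to D are hypothetical placeholders rather than institutions from this study.

```python
from collections import deque


def closeness(graph, node):
    """Unnormalized closeness: 1 / (sum of shortest-path distances from node)."""
    dist = {node: 0}
    queue = deque([node])
    # Breadth-first search gives shortest paths in an unweighted graph.
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return 1 / total if total else 0.0


# Hypothetical co-authorship network: an edge means at least one joint paper.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

scores = {name: closeness(toy, name) for name in toy}
print(scores)
```

In this toy network, node C collaborates directly with every other node and therefore has the highest closeness, while node D, which is connected only through C, has the lowest.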

DISCUSSION
The effect that bibliometrics has had in past years is significant, whether in governance, policymaking, or efforts to better understand certain scientific fields. [26] The data for this study were retrieved from Scopus; this database provides different h-index ratings for authors, which are needed to track citations and determine the impact of publications. [27] A total of 1,188 documents published during 2000-2020 were selected from the Scopus database. The study focused on the comprehensive analysis ... are positively correlated with the h-index of the author, institution, and country. [37] Regarding top funding agencies, most of the research concerning CLETM was funded by the National Natural Science Foundation of China. Therefore, the findings further highlight the need to enhance research in the area of CLETM and to increase collaboration among authors in future research.
[40][41] Significant correlations were noted between the number of citations and the years since publication, number of countries, number of authors, and authors' h-index.
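Because the h-index is used throughout this study to rank authors and sources, a minimal Python sketch of its standard definition may be helpful: an author has index h if h of their papers have at least h citations each. The function name `h_index` and the citation counts in the example are illustrative and are not data from this study.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's six papers.
example = [25, 8, 5, 3, 3, 1]
print(h_index(example))
```

In the example, three papers have at least three citations each, but there are not four papers with at least four citations, so the index is 3 even though one paper has been cited 25 times.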

CONCLUSIONS AND FUTURE RECOMMENDATIONS
The literature on CLETM has grown continuously over the last ten years. We analyzed the literature published from 2000 to 2020 and found that it was produced by 2,529 authors across 66 countries and published in 429 sources indexed in Scopus. The keyword occurrence analysis highlighted 'crosslingual', 'computational linguistics', 'natural language processing systems', 'semantics', 'information retrieval', 'translation (languages)', 'linguistics', 'speech recognition', 'target language', and 'machine translations', among others. These top 10 keywords can be used to identify future research hotspots for CLETM. The analysis of the conceptual areas using three models shows the distribution of topics and reflects researchers' attention to the CLETM subject theme of the study.
In addition, according to the thematic evolution analysis, cross-lingual and speech recognition research received more attention during the 2000-2020 time slice. The articles 'Co-training for cross-lingual sentiment classification' by Wan (2009) [24] and 'Learning a multilingual subjective language via cross-lingual projections' by Mihalcea et al. (2007) attracted the interest of most scientists and had the highest citation counts. [28] These articles originally introduced CLETM to the international scientific community.
The analysis of the top cited articles identified the article by Wan (2009) as the most cited, with more than 324 citations. [24][30][31][32][33][34][35] Thereafter, research on the crosslingual Emotional topic model increased after 2010 and 2020. Moreover, highly cited articles are very different from 'ordinary' cited articles. [36] Since citation is used as a key indicator of research quality, highly cited publications enrich the basis for proposing and implementing solid policy measures aimed at promoting CLETM in a wider range of contexts.

a: Frequency distribution of keywords associated with the document by Scopus; b: Frequency distribution of the authors' keywords; c: Number of author appearances; d: Scientific collaboration is the social process by which two or more researchers work together, sharing their intellectual and material resources to produce new scientific knowledge.

Figure 2: Annual trends of publication and citation times.

Figure 3 depicts the result of the Keywords Plus analysis, which reveals the most used keywords in the CLETM literature. This enables the identification of research themes and topics that have been heavily studied by researchers and documented in the Scopus database during the past 20 years. The top 10 frequent

Figure 4 shows the conceptual structure of keywords, which was used to represent the conceptual structure of the current literature on CLETM and to capture article content for a deeper understanding of the scientific concepts in CLETM research over the past 20 years. to the analysis of CLETM mapping

Figure 3: Word clouds illustrating high-frequency words in CLETM research in the Scopus database for the search keyword set (n = 100).

Figure 6 shows the prevailing and emerging themes on CLETM. Keyword analysis of thematic evolution identified two thematic areas over four sub-periods of time slices (2000-2011; 2012-2014; 2015-2017; 2018-2019, and 2020-2020), as presented in Figure 6A. Moreover, forty-three thematic areas were identified over these four sub-periods based on article titles (Figure 6B).

Figure 4: Conceptual structure map of CLETM themes using Correspondence Analysis (A), Multiple Correspondence Analysis (B), and Multidimensional Scaling (C), for n = 50 keywords.

Figure 5: Co-authorship analysis between countries based on the Links (L) and Total Link Strength (TLS) between two countries.

Figure 6: Representative thematic evolution Sankey diagram based on Keywords Plus (A) and article title (B).

Table 5. Conference proceedings published as Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) accounted for the greatest number of publications (120), followed by Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH with 38 papers, and CEUR Workshop Proceedings with 35 papers.

Table 6: Top 10 affiliations and funding agencies that enhanced research.
Affiliation | Pubs (%) | Funding agency | Pubs (%)
Chinese Academy of Sciences | 17 (10.63) | Ministry of Science and Technology of the People's Republic of China | 22 (13.84)
Google LLC | 16 (10.00) | National Key Research and Development Program of China | 20 (12.58)
Source: Author's affiliation and funding sponsor (from Scopus database). Pubs: Publications.

Table S1: Collaboration analysis of institutions and countries.
Collaboration analysis of top 25 institutions: Institution | Cluster | Betweenness | Closeness | PageRank
Collaboration analysis of top 25 countries: Country | Cluster | Betweenness | Closeness | PageRank
publication, number of countries, number of authors, and authors' h-index. In our analysis, we noticed that, based on the number of articles and citation score, the authors Vulic I, Zhang Y, and Li H, all from developed countries, dominated the list. This could imply the usability and relevance of CLETM research in developed countries' communities. Based on total link strength (TLS), the United States, the United Kingdom, Germany, and China showed strong co-occurrence in CLETM research.
Despite its many advantages, our study has some limitations, which can be addressed in future research on CLETM. First, we used only one database, Scopus, to obtain the publications. Therefore, published documents not indexed in Scopus are not included in the analysis. Furthermore, we included only articles, conference papers, and review papers in the analysis. Future bibliometric studies might consider using more diverse data sources, such as Web of Science and Google Scholar, to provide a more comprehensive overview of research productivity in the field of CLETM.
The National Natural Science Foundation of China funded the greatest number of studies. The study provides indicators for uncovering vital research hot spots in the field of CLETM. In addition, the study delivers further information on country collaborations for future researchers, based on single-country and multiple-country publications. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) has been very active in publishing documents on CLETM over the past years. Its published documents had a relatively high reported scientific contribution and ranking, with around 100 documents receiving a citation score of 827. Thus, based on the analysis and evidence reported, it seems that researchers are particularly interested in publishing their research as lecture notes in these domains and subsequently cite these notes in other publications. The most active author in this field is Li H from the Department of Electrical and Computer Engineering, National University of Singapore. Furthermore, the collaboration analysis indicates that Tsinghua University, Nanyang Technological University, National University of Singapore, Institute for Information Research, Soochow University, and Microsoft Research collaborated closely over the study period. The study ranks China as the leading country in CLETM research. The ongoing funding and support from the National Natural Science Foundation of China led to Chinese authors having the largest number of publications in this domain.
Following the geographical distribution of CLETM research, the analysis showed that over 50% of research on CLETM was produced by high-income countries. Several developing countries might be facing massive challenges in the field; therefore, further analysis of empirical research encompassing low-income and lower-middle-income countries would