Bibliometric Analysis of References Selection that Influence Citations among Articles of Thai Multidisciplinary Journals

The purpose of this study is to examine the association between reference list factors and citation counts among articles published in Thai multidisciplinary journals indexed in the Scopus database. Spearman's rank correlation and negative binomial regression models were used for univariate and multivariate analysis, respectively. The results from 900 articles revealed that several reference list factors, namely the number of references, the proportion of references from the past 10 years, the proportion of source-item references and the impact of references, had a significantly positive correlation with citation counts in both univariate and multivariate analysis. The number of references remains a good basic quantitative indicator because the information is easy to access. The impact of the references list was measured by the average citation count and the h-Index of the references list. Although these impact indicators also reflect the quality of the research, they are poorly suited to evaluating unpublished articles in the submission process: they need a long period to accumulate, require a long time to collect and are not flexible.


INTRODUCTION
Currently, researchers and academic staff place increasing emphasis on the citation counts of their research publications because citations are an accepted indicator of research impact. Articles that attract numerous citations are taken to reflect desirable research quality. Therefore, citation counts are used as an indicator in evaluating research quality at all levels, including the individual level, the faculty or academic institution level and the national level, and are also used to measure the quality of published journals through journal quality indicators, or journal metrics. Citation counts from the previous three years, excluding self-citations, are used in estimating the journal quality indicators of the Scopus database, such as Source Normalized Impact per Paper (SNIP) and the SCImago Journal Rank (SJR); [1] another reputable indicator is the Journal Impact Factor of the ISI Web of Knowledge database, which is calculated from all citation counts of the previous two years. [1,2] Thai educational institutions have paid more attention to international publications and citations in both the Scopus and ISI Web of Knowledge databases in order to become world-class universities and international research universities in accordance with the Thailand 4.0 policy. This policy promotes the development of researchers and academic staff and the quality assessment of researchers and articles, including improving the quality of Thai journals in the Thai national database so that they become international journals indexed in the Scopus database, as part of the push to raise research articles published in a national database to a more international level. At present, about 40 Thai journals have reached international level on the Scopus database, and at least 20 journals came from the project to improve the quality of Thai journals on the Thai-Journal Citation Index Centre (TCI).
One of the guidelines for the development of national journals and the TCI database is the study of bibliometric analysis. [3] Bibliometric analysis is quantitative research that applies statistical methods to bibliographic data, or publication information, contained in online research databases. It is useful for evaluating and comparing research performance and for studying research directions or trends in each subject, and it is also an important tool for indicating the characteristics and value of research published in journals. [4] Citation analysis and the study of predictors of citations are part of bibliometric research. It is accepted that the citation of the research

Bibliometric Data and Citation Analysis
Bibliometric data are the characteristic data of published academic articles, books and research articles that appear in online databases. These online databases are considered a source of big data for assessing the quality, value or performance of research publications at the international level. Early bibliometric analysis was statistical analysis of the distribution of research publications in each year, together with the growth and trends of research articles published in various fields. Such analysis involves collecting time series data. Later, more studies examined the relationships between bibliometric factors by using cross-sectional or individual data. This has become an important research topic in the field of library and information science. [9] Agarwel et al. (2016) [10] described bibliometrics as statistical analysis for extracting measurable data on research publications and on knowledge or scientific merit, as well as on the quantity (productivity) and quality (performance) of publications. Bibliometrics is also a key method for measuring the impact of scholarly publications. Brown and Gutman (2019) [1] claimed that bibliometrics can also provide helpful information about publication profiles, identify weak points, and reveal research networks and research collaboration, which are related to research funding.
The citation count of scholarly publications is part of bibliometric data and is an accepted indicator for evaluating the quality of articles. The quality of articles is difficult to assess with clear, concrete measurements, but citation count is one of the most widely used indicators for evaluating the quality of research publications of both academic institutions and journals. [5,11,12] Citation analysis must nevertheless take self-citation into account, because it can make the quality assessment inaccurate. [13,14] Another factor to consider when using the citation count as a measure of research quality is citation speed, or time to first citation. [15] Studies usually set a time limit after publication for collecting citation data. [16] This is consistent with the estimation of the journal quality indicators SCImago Journal Rank (SJR) and Source Normalized Impact per Paper (SNIP), which use the citation counts of articles within three years of publication and exclude self-citations. [17] Another widely used indicator is the journal impact factor, which uses all citation counts within two years of publication and was introduced by Garfield (1972). [5]

References list Factors
Garfield (1972) [5] investigated citation analysis and argued that citation count is a function of many variables besides scientific merit, for example, the author's reputation, the controversial nature of the subject matter, circulation and others. The citation of a research article does not depend solely on the research quality, research methodology or scientific merit, but also on various factors that can be found from the analysis of bibliometric data. [5] The number of citations is mainly influenced by the quality of the publishing journal. In addition, there are other variables that need to be considered, such as fields, institutions, authors, the author's h-index and the references list.
Liu, Yang and Chen (2021) [6] have classified citation into three forms: direct citation, co-citation and bibliographic coupling, which is a study of citation in the literature. This study reflects the reference literature influencing indirect citation. It also found that the use of references in research articles plays a different role in each part of article writing (Zhang, Liu and Wang, 2021). [7] For editors of journals, according to the citation analysis, there are many aspects to assess the quality of the research that has been published in journals, which can compare the characteristics or values of the research articles with other journals in the same field, and it can also be used to define the indicators or criteria to consider research published in the journals. These assessment indicators are quantifiable and tangible.
The purpose of this study is to analyze citations and their association with reference list factors in Thai multidisciplinary journals indexed in the Scopus database. We used articles from journals in the same subject field to control some confounding factors. For the associated factors, this study focused on the quantity and impact of the references list of the articles. This is one of the issues considered for publication in research journals. [8] In addition, the study aims to inform guidelines for improving the criteria for considering and evaluating the quality of research articles published in Thai journals on the Scopus database. The characteristics of the reference lists of research articles are analyzed in more depth. For data collection, we gathered the number of references and other relevant details such as the number of references from open access journals, the number of references from source items (Scopus), the number of references published not more than 10 years earlier, and the impact of the references list in terms of average citation and h-Index. The main objective is to indicate the association between these two factors, citation counts and references lists, by using bibliometric data from the Scopus database. Spearman's rho correlation was used for univariate analysis between all independent variables and citation counts. For multivariate analysis, regression models for count data were used to analyze the influence of the reference list on citation counts when controlling for other significant factors. Further detail is presented in the research methodology.
Consistent with the conclusions in the study of [18], bibliometric studies have revealed factors that influence citation counts. The factors that should be considered in statistical analysis of bibliometric data include: 1) Field-dependent factors; 2) Journal-dependent factors (journal accessibility, visibility, internationality, journal quality indicators, etc.); 3) Article-dependent factors (article type, number of authors, number of references, article length, etc.); and 4) Author/reader-dependent factors (article language, social network effect, etc.). In addition, analysis of the factors that influence citation counts must take into account that citation counts are count data with a non-normal distribution. The appropriate regression models are Poisson regression, negative binomial regression, and zero-inflated Poisson or zero-inflated negative binomial regression. The choice depends on model suitability tests; when the data are overdispersed, the Poisson model cannot be used (Zong et al., 2020). [19] The references list of an article is the set of scholarly publications cited by the new article, which often cites the basic knowledge, theory, concepts and results of these publications. Ucar et al. (2014) [20] showed the growth in the number of references used per research article in engineering journals. In addition, an editorial perspective showed the increasing trend in the number of references and its association with journal impact factors, and emphasized the importance of using references for the publication community. [21] When the number of references affected the evaluation of scholarly performance, the increase in references per publication was statistically significant. However, some researchers are also interested in the quality of reference documents, especially the accuracy of references. Spivey and Wilks (2004) [8] reported the percentage of accurate references in social network journals.
They suggested solutions to the problem of inaccuracies in the reference list, following those introduced by Fenton et al. (2000), [22] such as sharing responsibility between the authors, editors and reviewers: making the references list part of the acceptance process, limiting the number of references, presenting the first page of the references list when submitting articles, and using reference management software such as Endnote. Mitchell-Williams et al. (2017) [23] studied the same concept and stressed the importance of the accuracy and completeness of the references list, which affected the quality assessment of articles.
Regarding association with citation counts, many articles have found a significant positive correlation between the number of references and citation counts. Moreover, the number of references could still predict citation counts when other bibliometric factors were controlled. [24][25][26] Other research articles have investigated different aspects of references list factors. Didegah and Thelwall [27] identified factors that predicted citation counts by using zero-inflated negative binomial regression. The results showed that the number of references and the impact of references were positively correlated with citation counts in nanoscience and nanotechnology. The impact of references was measured by the average number of citations to the cited references. Bornmann et al. [28] also found an association between the references list factor and citations when using a negative binomial regression model. The single publication h index of the cited references was used as the references list variable. Jiang et al. [29] investigated the correlation between article citations and references factors. That study focused only on variables in the references and their impact measures in order to answer two research questions: the effect of the research process on the quality of research output, and the selection of the references most useful to scholars and their contribution to the research. The results indicated that several variables had a positive relationship with citation counts, such as the citation count of the references, the impact factor of the references' journals, and the h-index of the authors in the references list.

Population and Sample Size
The population in this study is research articles published in Thai international multidisciplinary journals indexed in the Scopus database, identified by searching the list of journals from Scimago Journal and Country Rank (SCImago, 2020). [30] We found five multidisciplinary journals: Chiang Mai University Journal of Natural Sciences, Maejo International Journal of Science and Technology, ScienceAsia, Songklanakarin Journal of Science and Technology, and Walailak Journal of Science and Technology. All five journals have similar characteristics and mainly publish applied science and natural science. Four of the journals are published by Thai universities; the fifth, ScienceAsia, is published by The Science Society of Thailand and The National Research Council of Thailand. Studying citations within the same field helps control the effect of the field on citation impact. The sample size was calculated with the sample size formula for correlation analysis (Montgomery, 2012) [31] as follows: $n = \left(\frac{Z_{\alpha} + Z_{\beta}}{C}\right)^{2} + 3$, where $C = \frac{1}{2}\ln\frac{1+r}{1-r}$, with one-sided α = 0.05, β = 0.2 and r the expected correlation coefficient. The correlation coefficient was taken from the study of So et al. (2015), [25] who studied factors affecting citation networks in science and technology. Their results revealed that the number of references had a significant positive correlation with citation counts (r = 0.085; p < 0.001). Substituting this correlation coefficient gave a minimum sample size of 855 articles. For the sampling procedure, we used purposive sampling with the inclusion criteria that the item be a research article of not fewer than three pages. Bibliometric data were gathered from all research articles of the five journals, starting from 2016 and going back until the sample size was close to the calculated value.
Therefore, this study selected research articles published in 2014-2016, a total of 902 articles, of which two were excluded because they were found to be messages from the editorial board.
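The sample size calculation can be checked with a short sketch. It assumes the usual Fisher z-transformation formula for correlation analysis, n = ((Z_α + Z_β) / C)² + 3 with C = 0.5 ln((1 + r) / (1 - r)); the function name `sample_size_for_correlation` is hypothetical and is not taken from the study.

```python
import math
from statistics import NormalDist


def sample_size_for_correlation(r, alpha=0.05, beta=0.20):
    """Minimum n to detect a correlation r with a one-sided test.

    Assumed formula (Fisher z-transformation):
        n = ((Z_alpha + Z_beta) / C)^2 + 3,  C = 0.5 * ln((1 + r) / (1 - r))
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)    # value for power = 1 - beta
    c = 0.5 * math.log((1 + r) / (1 - r))      # Fisher z of r
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)


# r = 0.085 is the coefficient reported by So et al. (2015) in the text
n_min = sample_size_for_correlation(0.085)
print(n_min)
```

Under these assumptions the result agrees with the minimum of 855 articles stated in the text.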

Variables
This study collected citation counts, characteristics of references lists and other bibliometric data, including publication year, journal quartile ranking in the publication year, number of pages, number of authors, number of author institutions, number of countries and number of keywords, which were collated from the Scopus and Scimago Journal and Country Rank databases. For citation counts, we collated the number of citations within three years after publication, not excluding self-citation. All citation count data were collected up to 31 January 2020. For the references list factors, we emphasized the number of references and the characteristics that may affect citation counts, such as the number of references from open access journals, the number of references from the same database (Scopus), the number of new references (not more than 10 publication years old) and the impact of references, measured by two variables: the h-Index and the average citations of the references list. For the impact indicators, we collected data in two parts: 1. Data on references from the Scopus database; and 2. Data on references from the Scopus database that were not more than 10 years old. As shown in Table 1, all characteristic variables, when used to analyze relationships, were transformed into proportions, with the number of references used as the basis for calculation.
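As an illustration of how such reference list variables could be derived for one article, the sketch below computes the proportion of Scopus references, the average citations of those references, and their h-Index expressed as a proportion. The function names, the sample data and the choice of the Scopus reference count as the denominator for the h-Index are assumptions made for illustration; the exact definitions used in the study are those of its Table 1.

```python
def h_index(citations):
    """Largest h such that at least h items have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def reference_list_factors(n_refs, scopus_ref_citations):
    """Derive illustrative reference list variables for a single article.

    n_refs: total number of references in the article
    scopus_ref_citations: citation counts of the references found in Scopus
    """
    n_scopus = len(scopus_ref_citations)
    return {
        "prop_scopus": n_scopus / n_refs,
        "avg_citations": sum(scopus_ref_citations) / n_scopus if n_scopus else 0.0,
        # Assumption: h-Index is normalised by the number of Scopus references
        "prop_h_index": h_index(scopus_ref_citations) / n_scopus if n_scopus else 0.0,
    }


# Hypothetical article: 10 references, 8 of them indexed in Scopus
factors = reference_list_factors(10, [120, 45, 30, 12, 9, 7, 3, 0])
print(factors)
```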

Statistical Analysis
The univariate association between the references list variables shown in Table 1 and three-year citation counts was analyzed by using Spearman's rho correlation, which is appropriate for citation data because they are count data with a non-normal, right-skewed distribution. Spearman's rho correlation has also been used to analyze the relationship between citations and other bibliometric or social variables (Rahimi, Soheili and Amini Nia, 2020). [32] Variables that were statistically significant at the 95% confidence level were used as control variables in the multivariate models.
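A minimal sketch of Spearman's rho is given here to show why it suits skewed count data: it is the Pearson correlation of the ranks, with tied values (common in citation counts, where many articles have zero citations) receiving their average rank. The helper names and the small data set are hypothetical; this is not the software used in the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical data: number of references and three-year citation counts
n_references = [10, 20, 30, 40]
citations = [0, 0, 1, 3]
rho = spearman_rho(n_references, citations)
print(round(rho, 4))
```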
The aim of the multivariate analysis was to indicate the influence of references list factors on citation counts when controlling for significant bibliometric variables. We preferred negative binomial regression because it is consistent with the characteristics of the dependent variable, which is count data with an overdispersion problem. The impact of references factors such as "AC_SC_REF" and "AC_10y_REF", which record the total number of citations of the references list divided by the number of references, have a very wide range. When citation counts are predicted from these variables in a regression model, the beta coefficient is close to 0 even when it is statistically significant, and it cannot be interpreted clearly. To solve this problem, these variables were transformed into categorical variables by using the 1st, 2nd and 3rd quartile values as cut-off points for categorizing into four groups. The quartile ranking of each multidisciplinary journal, which is collected from the Scimago Journal and Country Rank website, differs between publication years because the calculated SJR indicators change every year. Table 2 shows
Table 3 shows the descriptive analysis of citation counts within the first three publication years, excluding self-citations. The results show that 487 articles, or 54.1%, had not yet been cited, followed by 207 articles, or 23.00%, that had been cited once. The maximum number of citations per article was 10, reached by only 1 article, and the mean citation count with standard deviation was 0.97 ± 1.52. Overall, more than 90% of the articles published in these five journals had been cited no more than three times.
For the Spearman's rho correlation analysis, Table 4 shows two factors that had a significant association with citation counts at the 95% confidence level. The first was the journal quality indicator, measured by journal quartile ranking, which had a negative correlation with citations.
A higher quartile ranking number (that is, a lower journal quality index) was associated with lower citation counts of articles (r_s = -0.122; p < 0.001). In addition, the number of authors had a positive correlation with citation counts (r_s = 0.066; p = 0.048). The journal quartile ranking and the number of authors were used as control variables when analyzing the relationship between reference list factors and citation counts with negative binomial regression. Table 5 shows the descriptive statistics (range, mean, standard deviation, median and inter-quartile range) of the references list factors and their correlation with citation counts using Spearman's rho correlation. The descriptive analysis revealed that the number of references cited in articles published in these five journals ranged from 3 to 83, with a mean and standard deviation of 25.89 ± 11.59 references. The mean and standard deviation of the proportion of references from open access journals was 0.11 ± 0.12, or 11% of the number of references. The proportion of references from open access journals was the only factor that showed no relationship with citation counts in either univariate or multivariate analysis. The mean and standard deviation of the proportion of references from journals indexed in the Scopus database was 0.70 ± 0.21, and 0.40 ± 0.20 for the proportion of references from journals indexed in Scopus and not more than 10 publication years old. For the impact of the references list, we studied two indicators: h-Index and average citations. The mean and standard deviation of the h-Index factor was 0.83 ± 0.14 based on all Scopus references and 0.84 ± 0.21 based on Scopus references not more than 10 publication years old, which are similar values. This can be interpreted as the h-Index of the reference articles being close to 80% of the number of references.
As for the average citation of the references list, which had been gathered for the past five years, the mean and standard deviation was 212.17 ± 421.49 based on references from Scopus database journals, and 80.89 ± 213.92 for references not more than 10 years old.
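The quartile-based categorization of the wide-ranging average citation variables can be sketched as follows. The sample values and the function name `quartile_groups` are hypothetical, and the cut-off points come from Python's `statistics.quantiles` with its default method, which may differ slightly from the quartile definition used by the study's statistical software.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_groups(values):
    """Assign each value to group 1-4 using the sample quartiles as cut-offs.

    Values at or below Q1 fall in group 1, values above Q3 in group 4.
    """
    cut_points = quantiles(values, n=4)  # [Q1, Q2, Q3]
    return [bisect_left(cut_points, v) + 1 for v in values]


# Hypothetical average citations of references lists (very wide range)
avg_citations = [2, 15, 40, 90, 150, 300, 800, 2500]
groups = quartile_groups(avg_citations)
print(groups)
```

Entering the four groups as a categorical predictor avoids a near-zero coefficient per single citation, at the cost of discarding variation within each group.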

RESULTS
The Spearman's rho correlation analysis revealed that many references list factors had a significantly positive correlation with the citation counts of articles at the 95% confidence level, including the number of references (r_s = 0.167; p < 0.001), the proportion of references from source items (r_s = 0.067; p = 0.046), the proportion of references from Scopus within the last 10 years (r_s = 0.066; p = 0.048) and the average citation counts of the references list for both the same-database list and the new list (r_s = 0.118; p < 0.001 and r_s = 0.130; p < 0.001, respectively).
Negative binomial regression was used to determine the level of influence of references list factors on citation counts when controlling for other bibliometric factors, namely journal quartile ranking and number of authors; all results are shown in Table 6. Before the analysis, the Vuong test was used to choose between NB regression and zero-inflated NB (ZINB) regression. Every model had a p value > 0.05; therefore, it can be concluded that NB regression is more appropriate for the data than ZINB regression. In addition, the LR test was used to compare NB regression with Poisson regression, and NB regression was found to be more appropriate.
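An informal way to see why the Poisson model is unlikely to fit is to compare the variance of the citation counts with their mean, since a Poisson distribution has equal mean and variance. The sketch below uses only the mean and standard deviation reported for the three-year citation counts (0.97 ± 1.52); it is a rough check under that assumption and not the LR test performed in the study.

```python
def dispersion_ratio(mean, sd):
    """Variance-to-mean ratio; values well above 1 suggest overdispersion."""
    return sd ** 2 / mean


# Mean and standard deviation of three-year citation counts from the text
ratio = dispersion_ratio(mean=0.97, sd=1.52)
print(round(ratio, 2))
```

A ratio well above 1 is consistent with the preference for negative binomial regression over Poisson regression.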
The results confirmed that references list factors can positively influence the citation counts of articles even when journal quartile and number of authors are controlled. Variables that had a statistically significant influence on citation counts were NREF (Coef.  Table 5. In addition, a more interesting result, which differs from the univariate analysis, is that the proportion of the h-Index of references from source items within a period of 10 years can affect citation counts when confounding factors are controlled (Coef. = 0.637; 95% CI = 0.065, 1.209).
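To make the reported coefficient easier to read, it can be converted to an incidence rate ratio. The sketch assumes the log link that is standard for negative binomial regression, so that exp(coefficient) is the multiplicative change in expected citation count per one-unit increase in the predictor. Because the predictor here is a proportion between 0 and 1, a one-unit increase spans its whole range, so a 0.1 increase is also shown. The variable names are hypothetical.

```python
import math

# Coefficient and 95% CI reported in the text for the 10-year h-Index proportion
coef, ci_low, ci_high = 0.637, 0.065, 1.209

# Incidence rate ratio for a full one-unit change (proportion from 0 to 1)
irr = math.exp(coef)
irr_ci = (math.exp(ci_low), math.exp(ci_high))

# Rate ratio for a more realistic 0.1 increase in the proportion
irr_per_tenth = math.exp(coef * 0.1)

print(round(irr, 3), tuple(round(v, 3) for v in irr_ci), round(irr_per_tenth, 3))
```

The lower confidence bound stays above 1 after exponentiation, which matches the statement that the effect is statistically significant.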

DISCUSSION AND CONCLUSION
Currently, bibliometric data analysis is a field that has seen more article publications. This research is useful in big data analytic tools for online research databases as well as to propose policies for the development of research potential both at individual and institutional levels and for evaluating the quality of research published in various journals. Many studies show the relationship of citation counts, which is an indicator of research article quality and various bibliometrics data, not merely the quality of the research in terms of scientific merit. This study focuses mainly on references list factors of articles which are considered as one of the indicators in the publication submission process of various journals, each with different evaluation criteria both in terms of quantity and quality.
The quantity of a references list, that is, the number of references used in a research article, is still a factor that is positively associated with citation counts as a quality indicator of research. The results reflect two points. First, the number of references is one of the indicators for quality assessment of scholarly publications. Second, researchers may not only refer to the content, knowledge or results of a cited article, but also use the references of the cited article as a source of instruction, theory or knowledge. As a result, articles in the references list are often cited as well. However, when researchers are aware of such  [22] as it appears in the article of Spivey and Wilks (2004), [8] limiting the number of references should be part of the submission process of published research articles so that the accuracy of the references list can be checked.
Regarding the quality of a references list, defining quality metrics often runs into problems in creating concrete indicators, because quality is interpreted in various ways and also depends on the individual. This study collected more detailed information on references lists in order to determine which elements are of interest and which can be used as quality assessment indicators. We would like to emphasize two issues in the quality of references lists found in this study, namely, the proportion of references that are in the same database as the research article, and the proportion of references that are in the same database as the research article and are new references. Both of these factors, although having a low correlation with citation counts, still had a significantly positive influence in both univariate and multivariate analysis. During data collection, we found that some articles have a high proportion of reference documents that are inaccessible or come from non-source items (Chi, 2014). [33] Such an issue may hinder the editorial boards of journals in checking the authenticity of the references. It also affects citations because the reader will not be able to take advantage of all references. Another important issue of interest is the use of local language: reference documents that are outside Scopus but in a local database are usually research articles in the local language, which also prevents readers from making full use of the references list. Regarding the newness of the references list, in both research funding decisions and the publication submission process, researchers are often examined on their references to previous studies or literature reviews. The reference documents should report results that match the current global situation in that research field.
The publication period of scholarly work is therefore an important consideration, especially for references to data analysis results and research outcomes. In addition, reference documents that were published long ago are usually cited for basic theory or background information. Hou et al. (2011) [34] proposed a new method of counting references that puts more emphasis on detailed content than on counting from the references list in the last section of the article. This addresses the problem that some references are included only for background information or are mentioned incidentally. As for the proportion of OA references, this study did not find any correlation with citation counts.
Other studies, however, have found that an OA policy benefits the articles published in that journal and is positively correlated with citations (Fraser, Momeni, Mayr and Peters, 2020; [35] Oh et al. 2017). [36] In this study, OA policy had little or no effect on citations, possibly because this study examined the impact of OA policy as an indirect Finally, another indicator for quality assessment concerns the impact of references lists and whether it measures quality better or not. This study collected two variables, h-Index and average citation, estimated from the number of references in the Scopus database, comparing all references with the newest references (10 publication years).
The study found that the impact of the reference list can affect citation counts and yields better results when the focus is on citing articles that are high impact and published within the past 10 years, as measured by the h-Index of the references list. This finding on the impact of references is consistent with the results of many articles. [27][28][29] Although citation and h-Index values are highly accepted indicators for evaluating the quality of research and the use of research results, we see indications that assessment by citation or h-Index is useful mainly for assessing the quality of research after publication. [37] For journal editorial boards in the submission process, they may not be appropriate indicators because they require a long evaluation period and are not flexible. Papers in the submission process are still unable to take full advantage of bibliometric analysis to describe all the references lists in online research article databases.
This paper highlights the benefits of bibliometric data analysis, which is bringing big data to statistical analysis and can contribute to the development of researcher performance, journal quality and research of online databases. Bibliometric analysis is still necessary for research potential improvement at the individual, institutional and national level. The authors of this study have received the question, "How can we improve the research potential of researchers and our institution so that research articles will be published internationally and will be cited more?" This issue led to the research question and the beginning of the bibliometric analysis study. However, this study may not be consistent with the size of the Scopus database, which is considered as one of the largest research database. The database contains 23,452 active journals, 120,000 conferences and 206,000 books with approximately 3 million records added every year. [38] From this point on, the sample size needs to be enlarged so that it can be a good representation. In terms of macro-level analysis, we still need to focus on the analysis of other issues as well, not just in the references list and cover other research fields. Important issues that must be considered are, "Can we evaluate the quality of the research request from other indicators in concrete terms, in addition to citation of articles?"

LIMITATIONS
This study is limited by its small size: the research articles selected for analysis came from a single subject area, and only enough data were collected to meet the minimum appropriate sample size for statistical analysis. This limits the generalizability of the results. Therefore, it is necessary to expand the scale of the research, as mentioned above. Future studies should consider research articles in various fields, examine a wider range of factors, increase the sample size and include articles published in journals from other countries, so that the results can be applied to other journals that are in the process of raising their quality to an international level.