ScientoBASE: a framework and model for computing scholastic indicators of non-local influence of journals via native data acquisition algorithms

Ginde, Gouri; Saha, Snehanshu; Mathur, Archana; Venkatagiri, Sukrit; Vadakkepat, Sujith; Narasimhamurthy, Anand; Daya Sagar, B. S.

doi:10.1007/s11192-016-2006-2

Published: 11 June 2016

Scientometrics, Volume 108, pages 1479–1529 (2016)

Gouri Ginde¹, Snehanshu Saha¹, Archana Mathur¹, Sukrit Venkatagiri¹, Sujith Vadakkepat¹, Anand Narasimhamurthy² & B. S. Daya Sagar³


Abstract

Defining and measuring internationality as a function of the influence diffusion of scientific journals is an open problem. There exists no metric to rank journals based on the extent or scale of their internationality. Measuring internationality is qualitative, vague, open to interpretation and limited by vested interests. With the tremendous increase in the number of journals in various fields and the unflinching desire of academics across the globe to publish in “international” journals, it has become an absolute necessity to evaluate, rank and categorize journals based on internationality. In the current work, the authors define internationality as a measure of influence that transcends geographic boundaries. The authors also raise concerns about unethical practices in journal publication whereby the scholarly influence of a select few is artificially boosted, primarily through editorial maneuvers. To counter the impact of such tactics, the authors propose a new method that defines and measures internationality by eliminating such local effects when computing the influence of journals. A new metric, the Non-Local Influence Quotient, is proposed as one such parameter for computing internationality, along with another novel metric, the Other-Citation Quotient, defined as the complement of the ratio of self-citations to total citations. In addition, SNIP and the international collaboration ratio are used as two further parameters. As these journal parameters are not readily available in one place, algorithms to scrape them are written and documented as part of the current manuscript. The Cobb–Douglas production function is used as a model to compute the Journal Internationality Modeling Index (JIMI). The current work elucidates the metric acquisition algorithms while arguing for the suitability of the proposed model. The acquired data are corroborated by different supervised learning techniques.
As future work, the authors present a bigger picture, the Reputation and Global Influence Score, which will be computed to facilitate the formation of clusters of journals of high, moderate and low internationality.
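The two quantitative ingredients named in the abstract can be illustrated with a short Python sketch. The Other-Citation Quotient follows directly from its stated definition, 1 - (self-citations / total citations). The index function uses the generic Cobb–Douglas form, a scale constant times the product of each input raised to an exponent. The exponents and input values below are hypothetical placeholders, not the values used in the paper, and the Non-Local Influence Quotient is taken as a precomputed input because its formula is not given in this section.

```python
import math


def other_citation_quotient(self_citations: int, total_citations: int) -> float:
    """Complement of the ratio of self-citations to total citations."""
    if total_citations <= 0:
        raise ValueError("total_citations must be positive")
    if not 0 <= self_citations <= total_citations:
        raise ValueError("self_citations must lie between 0 and total_citations")
    return 1.0 - self_citations / total_citations


def cobb_douglas_index(inputs: dict, exponents: dict, scale: float = 1.0) -> float:
    """Generic Cobb-Douglas form: scale * prod(x_i ** alpha_i).

    The exponents are caller-supplied; this sketch does not reproduce
    any fitted values from the paper.
    """
    if inputs.keys() != exponents.keys():
        raise ValueError("inputs and exponents must name the same parameters")
    if scale <= 0 or any(x <= 0 for x in inputs.values()):
        raise ValueError("scale and all inputs must be positive")
    # Sum logarithms instead of multiplying powers directly, for numerical stability.
    log_value = math.log(scale) + sum(exponents[k] * math.log(inputs[k]) for k in inputs)
    return math.exp(log_value)


# Hypothetical journal: 120 of its 1000 citations are self-citations.
params = {
    "nliq": 0.60,  # hypothetical, precomputed Non-Local Influence Quotient
    "ocq": other_citation_quotient(120, 1000),  # 0.88
    "snip": 1.20,  # hypothetical SNIP value
    "icr": 0.30,   # hypothetical international collaboration ratio
}
weights = {name: 0.25 for name in params}  # hypothetical exponents summing to 1
jimi = cobb_douglas_index(params, weights)
```

With exponents summing to 1, the form has constant returns to scale: doubling every input doubles the index, which is one reason the Cobb–Douglas form is convenient to reason about.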


Notes

1. http://scholarlyoa.com/publishers/, as accessed on 6 Mar 2016.
2. https://en.wikipedia.org/wiki/Jeffrey_Beall, as accessed on 6 Mar 2016.
3. http://www.scimagojr.com/journalrank.php, as accessed on 6 Mar 2016.
4. https://aminer.org/billboard/citation, as accessed on 21 Jan 2016.
5. GitHub repository for MATLAB code, information on JIMI. https://github.com/SciBase-Project/internationality-journals/blob/master/JIMI-JIS/ScientoBASE_appendix, as accessed on 7 Mar 2016.
6. http://wokinfo.com/essays/impact-factor/.
7. http://www.crummy.com/software/BeautifulSoup/, as accessed on 6 Mar 2016.
8. http://www.journalmetrics.com/faq.php, as accessed on 7 Mar 2016.
9. http://www.journalmetrics.com/values.php, as accessed on 6 Mar 2016.
10. http://www.journalindicators.com/, as accessed on 6 Mar 2016.
11. A Guide to Support Vector Regression http://web.mit.edu/6.034/wwwbob/svm-notes-long-08, as accessed on 7 Mar 2016.

References

Abrizah, A., Zainab, A. N., Kiran, K., & Raj, R. G. (2013). LIS journals scientific impact and subject categorization: A comparison between Web of Science and Scopus. Scientometrics, 94, 721–740. doi:10.1007/s11192-012-0813-7.
Battese, G. E., & Broca, S. S. (1997). Functional forms of stochastic frontier production functions and models for technical inefficiency effects: a comparative study for wheat farmers in Pakistan. Journal of Productivity Analysis, 8, 395–414.
Beall, J. (2012). Predatory publishers are corrupting open access. Nature, 489, 179.
Bhattacharjee, Y. (2011). Saudi universities offer cash in exchange for academic prestige. Science, 334(6061), 1344–1345. doi:10.1126/science.334.6061.1344.
Buchandiran, G. (2011). An exploratory study of Indian science and technology publication output, Department of Library and Information Science, Loyola Institute of Technology Chennai. http://www.webpages.uidaho.edu/~mbolin/buchandiran.htm.
Buela-Casal, G., Perakakis, P., Taylor, M., & Checa, P. (2006). Reflections and perspectives on academic journals. Scientometrics, 67(1), 45–65.
Chang, C.-L., McAleer, M., & Oxley, L. (2013). Coercive journal self citations, impact factor, journal influence and article influence. Mathematics and Computers in Simulation, 93, 190–197.
Cobb, C. W., & Douglas, P. H. (1928). A theory of production. American Economic Review, 18(Supplement), 139–165.
Crawford, W. (2014). Journals, ‘Journals’ and wannabes: Investigating the list. Cites & Insights (vol. 14, p. 7). ISSN: 1534-0937.
Das, A. K., & Mishra, S. (2014). Genesis of altmetrics or article-level metrics for measuring efficacy of scholarly communications: Current perspectives. Journal of Scientometric Research, 3(2), 82–92.
Gingras, Y. (2014). The abuses of research evaluation. University World News. Retrieved from http://www.universityworldnews.com/article.php?story=20140204141307557.
Ginde, G., Saha, S., Balasubramaniam, C., Harsha, R. S., Mathur, A., Dayasagar, B. S., & Anand, M. N. (2015). Mining massive databases for computation of scholastic indices: Model and quantify internationality and influence diffusion of peer-reviewed journals. In Proceedings of the fourth national conference of Institute of Scientometrics, SIoT.
Haddow, G., & Genoni, P. (2010). Citation analysis and peer ranking of Australian social science journals. Scientometrics, 85(2), 471–487.
Harzing, A. W. (2007). Publish or Perish. http://www.harzing.com/pop.htm. Accessed 6 March 2016.
Heilig, L., & Voß, S. (2014). A scientometric analysis of cloud computing literature. IEEE Transactions on Cloud Computing, 2(3), 266–278.
Jangid, N., Saha, S., Narasimhamurthy, A., & Mathur, A. (2015). Computing the Prestige of a journal: A Revised Multiple Linear Regression Approach. WCI-ACM digital library (accepted), August 10–13.
Jangid, N., Saha, S., Gupta, S., & Rao, J. M. (2014). Ranking of journals in science and technology domain: A novel and computationally lightweight approach. IERI Procedia, 10, 57–62. doi:10.1016/j.ieri.2014.09.091.
Jenab, S. M. H., & Nejati, A. (2014). Evaluation of the scientific production of countries by a resource scaled two-dimensional approach. Journal of Scientometric Research, 3(3).
Kao, C. (2009). The authorship and internationality of industrial engineering journals. Scientometrics, 80(3), 123–136.
Liping, Y., Yuqing, C., Yuntao, P., & Yishan, W. (2009). Research on the evaluation of academic journals based on structural equation modeling. Journal of Informetrics, 3(4), 304–311.
Moed, H. F. (2010). Measuring contextual citation impact of scientific journals. Journal of Informetrics, 4(3).
Saha, S., Dwivedi, A., Dwivedi, N., Ginde, G., & Mathur, A. (2015). JIMI, journal internationality modelling index: An analytical investigation. In Proceedings of the fourth national conference of institute of scientometrics, SIoT.
Saha, S., Jangid, N., Mathur, A., & Anand, M. N. (2016). DSRS: Estimation and forecasting of journal influence in the science and technology domain via a lightweight quantitative approach. arXiv:1604.03215.
Saha, S., Sarkar, J., Dwivedi, A., Dwivedi, N., Narasimhamurthy, A. M., & Roy, R. (2016). A novel revenue optimization model to address the operation and maintenance cost of a data center. Journal of Cloud Computing: Advances, Systems and Applications. doi:10.1186/s13677-015-0050-8.
Tan, B. H. (2008). Cobb–Douglas Production Function [Online Database]. http://docentes.fe.unl.pt/jamador/Macro/cobb-douglas. Accessed 9 March 2016.
Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., & Su, Z. (2008). ArnetMiner: Extraction and mining of academic social networks. In Proceedings of the fourteenth ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD 2008) (pp. 990–998).
Waltman, L., van Eck, N. J., van Leeuwen, T. N., & Visser, M. S. (2013). Some modifications to the SNIP journal impact indicator. Journal of Informetrics, 7, 272–285.
Zupanc, G. K. H. (2014). Impact beyond the impact factor. Journal of Comparative Physiology A, 200, 113–116.


Author information

Authors and Affiliations

Department of Computer Science and Engineering, PESIT South Campus, Bangalore, India
Gouri Ginde, Snehanshu Saha, Archana Mathur, Sukrit Venkatagiri & Sujith Vadakkepat
Department of Computer Science, BITS-Pilani, Hyderabad Campus, Hyderabad, India
Anand Narasimhamurthy
Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, India
B. S. Daya Sagar

Corresponding author

Correspondence to Sukrit Venkatagiri.

Additional information

The additional file on GitHub (see footnote 5) contains MATLAB source code that generates a video file showing frames of a 3D plot of the Cobb–Douglas production function. It also contains sample snapshots of the proposed toolkit, as well as the other source code used in the course of this manuscript.

Appendices

Appendix 1 Matlab code for Cobb–Douglas function

Matlab code for generating movie

Source code for computation of self-citations and journal name extraction

https://github.com/MaQuest/computeSelfcites as accessed on 7 Mar 16.
http://github.com/SciBase-Project/internationality-journals as accessed on 7 Mar 16.
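The self-citation scripts are only linked here. As a rough illustration of the task, the sketch below counts self-citations for one journal from a list of (citing journal, cited journal) pairs, after normalizing journal names so that variants such as "SCIENTOMETRICS." and "Scientometrics" match. The record format and the normalization rule are assumptions for illustration, not the behavior of the linked repositories.

```python
import re
from typing import Iterable, Tuple


def normalize_journal_name(name: str) -> str:
    """Hypothetical normalization: lowercase, '&' -> 'and', drop punctuation, collapse spaces."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^\w\s]", " ", name)
    return " ".join(name.split())


def count_self_citations(citations: Iterable[Tuple[str, str]], journal: str) -> Tuple[int, int]:
    """Return (self_citations, total_citations_received) for `journal`.

    Each item in `citations` is a hypothetical (citing_journal, cited_journal) pair.
    """
    target = normalize_journal_name(journal)
    self_cites = total = 0
    for citing, cited in citations:
        if normalize_journal_name(cited) != target:
            continue  # citation to some other journal
        total += 1
        if normalize_journal_name(citing) == target:
            self_cites += 1
    return self_cites, total


# Invented records for illustration only.
records = [
    ("Scientometrics", "Scientometrics"),
    ("J. Informetrics", "Scientometrics"),
    ("SCIENTOMETRICS.", "Scientometrics"),
    ("Scientometrics", "Journal of Informetrics"),
]
result = count_self_citations(records, "Scientometrics")
```

Real citation data would need a more careful name-matching step (abbreviations such as "J. Informetrics" are not resolved here), which is where most of the difficulty in journal name extraction lies.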

The videos can be viewed at:

https://www.youtube.com/watch?v=8IuQzq8fEYU as accessed on 7 Mar 16.
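The MATLAB code for the Cobb–Douglas movie is not reproduced in this section. A minimal Python/NumPy sketch of the same idea is shown below, assuming a two-input Cobb–Douglas function y = A * x1**alpha * x2**beta evaluated on a grid, with a sweep over alpha producing successive frames. The grid range, the fixed beta and the swept alpha values are hypothetical choices.

```python
import numpy as np


def cobb_douglas_surface(alpha, beta, scale=1.0, n=50, lo=0.1, hi=1.0):
    """Evaluate y = scale * x1**alpha * x2**beta on an n-by-n grid over [lo, hi]^2."""
    axis = np.linspace(lo, hi, n)
    x1, x2 = np.meshgrid(axis, axis)
    return x1, x2, scale * x1**alpha * x2**beta


def movie_frames(alphas, beta=0.5, **grid_kwargs):
    """One surface (z-values only) per alpha; each could be rendered as one animation frame."""
    return [cobb_douglas_surface(a, beta, **grid_kwargs)[2] for a in alphas]


frames = movie_frames(np.linspace(0.1, 0.9, 9))
```

Each frame could be drawn with matplotlib's `plot_surface` on 3D axes and written out with an animation writer; that rendering step is omitted to keep the sketch dependency-light.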

Appendix 2 Journal Influence Score

The notion of “internationality” in the proposed model is based on the quantitative features of a journal. The Journal Influence Score (JIS), derived from scientometric data, serves as the most important tool for forming a cluster of internationality. A relatively new journal can then be evaluated, albeit loosely, for internationality by measuring its proximity to, or inclusion in, the known cluster. The authors believe that the metric serves as a strong indicator of internationality. Such a score could help formulate publication appraisal policies for institutions across the country. JIS could also serve as a useful guideline for funding.
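How proximity to, or inclusion in, the known cluster is measured is not specified here. One simple reading, sketched below under stated assumptions, represents each journal by a vector of (ideally standardized) scientometric features, takes the centroid of the known cluster, and treats a new journal as inside the cluster if its Euclidean distance to the centroid does not exceed a chosen quantile of the members' own distances. The feature vectors, the Euclidean metric and the quantile rule are all hypothetical.

```python
import numpy as np


def cluster_proximity(cluster_features, candidate, radius_quantile=1.0):
    """Return (distance to cluster centroid, whether the candidate falls inside the cluster).

    cluster_features: (m, d) array, one row per journal already in the cluster.
    candidate: length-d feature vector of the journal being evaluated.
    """
    members = np.asarray(cluster_features, dtype=float)
    x = np.asarray(candidate, dtype=float)
    centroid = members.mean(axis=0)
    member_dist = np.linalg.norm(members - centroid, axis=1)
    # Cluster "radius": a quantile of the members' own distances (1.0 = farthest member).
    radius = np.quantile(member_dist, radius_quantile)
    distance = float(np.linalg.norm(x - centroid))
    return distance, bool(distance <= radius)


# Invented, already-standardized features for four journals in the known cluster.
known = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
near = cluster_proximity(known, [1.0, 1.0])
far = cluster_proximity(known, [5.0, 5.0])
```

Lowering `radius_quantile` below 1.0 makes inclusion stricter, so that a single outlying member does not inflate the cluster boundary.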

As shown in Fig. 21, we use a multiple linear regression (MLR) model in which JIS is the response variable. The response variable y (JIS in our case) is expressed as a function of k predictor variables $x_1,x_2,\ldots ,x_k$ using a linear model of the form

$$y = b_0 + b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_kx_k + e$$

where $b_0$ is the intercept, $b_1,\ldots ,b_k$ are fixed parameters that give the weight of each factor, and $e$ is the error term.
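Fitting the linear model above amounts to ordinary least squares. The following NumPy sketch adds the column of ones that estimates $b_0$, solves for the weights and reports $R^2$. It is a generic illustration rather than the authors' code, and the data are synthetic.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk + e.

    Returns (coefficients [b0, b1, ..., bk], R^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # leading column of ones estimates b0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


# Synthetic example: response built exactly as 1 + 2*x1 - 0.5*x2.
X_demo = np.array([[1, 2], [2, 0], [3, 1], [4, 3], [5, 5]])
y_demo = 1 + 2 * X_demo[:, 0] - 0.5 * X_demo[:, 1]
coef, r2 = fit_mlr(X_demo, y_demo)
```

Because the synthetic response has no noise, the recovered coefficients are 1, 2 and -0.5 and $R^2$ is 1 up to rounding; with real journal data, $R^2$ measures how much of the variation in JIS the predictors explain.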

Sample selection

To train and validate our model, we used data from the SCImago Journal and Country Rank (SJR) portal (see footnote 3), which covers the journals indexed in Elsevier’s Scopus. The portal provides journal and country scientific indicators developed from the information contained in the Scopus database, including statistics for the features listed below:

SCImago Journal Rank (SJR) indicator: the average number of weighted citations received in the selected year by the documents published in the selected journal in the three previous years.
H Index: the journal’s number of articles (h) that have received at least h citations each.
Total Docs.: total document output of the selected period. All types of documents are considered, including citable and non-citable documents.
Total Docs. (3 years): documents published in the three previous years (selected-year documents are excluded).
Total References: all the bibliographical references in a journal in the selected period.
Total Cites (3 years): number of citations received in the selected year by a journal to the documents published in the three previous years.
Citable Documents: number of citable documents published by a journal in the three previous years (selected-year documents are excluded). Only articles, reviews and conference papers are considered.
Cites per Doc. (2 years): average citations per document in a 2-year period, computed from the number of citations received by a journal in the current year to the documents published in the two previous years.
Cites per Doc. (3 years): average citations per document in a 3-year period, computed from the number of citations received by a journal in the current year to the documents published in the three previous years.
Cites per Doc. (4 years): average citations per document in a 4-year period, computed from the number of citations received by a journal in the current year to the documents published in the four previous years.
Ref./Doc.: average number of references per document in the selected year.
Self Cites: number of the journal’s self-citations in the selected year to its own documents published in the three previous years.
Non-citable Documents (available in the graphics): ratio of non-citable documents in the period.
Cited Documents (Cited Doc.): number of documents cited at least once in the three previous years.
Uncited Documents (Uncited Doc.): number of uncited documents in the three previous years.
% International Collaboration: ratio of documents whose affiliations include more than one country address.
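Two of the indicators listed above are easy to compute from raw counts. The sketch below implements the h index as defined (the largest h such that h articles have at least h citations each) and a windowed cites-per-document ratio. The input layout is hypothetical, and the denominator is assumed to be the number of documents (for example, citable documents) published in the window; SCImago's exact document-type rules are not reproduced.

```python
from typing import Mapping, Sequence


def h_index(citation_counts: Sequence[int]) -> int:
    """Largest h such that h articles have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def cites_per_doc(cites_to_pub_year: Mapping[int, int],
                  docs_in_pub_year: Mapping[int, int],
                  year: int, window: int) -> float:
    """Citations received in `year` by documents from the `window` previous years,
    divided by the number of documents published in those years.

    cites_to_pub_year[y]: citations received in `year` to documents published in y.
    docs_in_pub_year[y]: documents published in y (assumed denominator).
    """
    years = range(year - window, year)  # e.g. year=2013, window=2 -> 2011, 2012
    cites = sum(cites_to_pub_year.get(y, 0) for y in years)
    docs = sum(docs_in_pub_year.get(y, 0) for y in years)
    return cites / docs if docs else 0.0
```

For example, citation counts of 10, 8, 5, 4 and 3 give an h index of 4: the fourth most cited article has 4 citations, but the fifth has fewer than 5.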

Data acquisition

We used a set of 12 parameters available from the SCImago portal. In addition, we used the quarter as an extra parameter, $Q_i = i/4$, where $i$ is the quarter in which the journal was published. The input parameters (predictor variables) thus include the Quarter, H-Index, Total Docs 2012, Total Docs 3 years, Total Cites 3 years, Citable Docs 3 years, Ref/Doc, Cites/Doc 3 years and Total Ref. Intuitively, a journal evaluated in the first quarter of the year is more likely to show greater influence, since the number of its publications at that point is mostly limited. Hence, the quarter of publication should be statistically significant, and the results validate the use of the quarter (in which the journal issue was published) in our model.
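The quarter feature and the resulting nine-element predictor vector can be sketched as follows. The field names and the example values are hypothetical stand-ins for the SCImago columns listed above; only the encoding $Q_i = i/4$ comes from the text.

```python
# Hypothetical field names for the nine predictors listed in the text.
PREDICTORS = (
    "quarter", "h_index", "total_docs_2012", "total_docs_3y", "total_cites_3y",
    "citable_docs_3y", "ref_per_doc", "cites_per_doc_3y", "total_refs",
)


def quarter_feature(i: int) -> float:
    """Encode the quarter of publication as Q_i = i / 4."""
    if i not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return i / 4


def predictor_row(record: dict, quarter: int) -> list:
    """Assemble one journal's predictor vector in a fixed column order."""
    values = dict(record, quarter=quarter_feature(quarter))
    return [float(values[name]) for name in PREDICTORS]


example = {
    "h_index": 45, "total_docs_2012": 310, "total_docs_3y": 900,
    "total_cites_3y": 2100, "citable_docs_3y": 870, "ref_per_doc": 38.5,
    "cites_per_doc_3y": 2.41, "total_refs": 11900,
}  # invented values for illustration
row = predictor_row(example, quarter=3)
```

Fixing the column order in one place keeps every journal's row aligned with the same regression coefficients.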

Statistical procedure

Starting with the initial set of input parameters, a two-phase approach was employed to obtain a more compact set of variables. In the first step, the number of variables was reduced using correlation and MLR, yielding a down-selected set of input variables. In the second step, pairwise correlation was applied to this reduced set, and the few parameters that explained ${>}90\,\%$ of the variability were retained. The final model was an MLR model on the parameters retained after the second phase. These steps are described below.

Step 1: Down-selection using correlation with the response variable and multiple linear regression. In this phase, all the initially selected input parameters are used to analyze the correlation and regression statistics. The correlation of each individual parameter with the response variable was computed, and parameters that had both a low correlation (${<}0.4$) and a high p value (${>}0.05$) were removed. As shown in Table 1 (Saha et al. 2016), the input variable Ref./Doc can be removed on this basis. The regression was repeated until no further parameters could be discarded under the above criteria.
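The Step 1 rule can be expressed as an iterative elimination loop: fit the MLR, drop a remaining parameter that has both an absolute correlation below 0.4 with the response and a p value above 0.05, and refit until nothing qualifies. This is a minimal NumPy sketch under several assumptions: it uses the absolute correlation, it computes p values from a normal approximation to the t distribution (reasonable only for large samples) instead of exact t-based values, and it removes one parameter per iteration, the candidate with the largest p value.

```python
import math

import numpy as np


def ols_pvalues(X, y):
    """Two-sided p values for the slope coefficients of y ~ 1 + X.

    Uses a normal approximation to the t distribution (large-sample assumption).
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - A.shape[1]
    sigma2 = float(resid @ resid) / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    z = np.abs(coef / se)
    p = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])  # P(|Z| > z)
    return p[1:]  # drop the intercept


def down_select(X, y, names, corr_min=0.4, p_max=0.05):
    """Repeatedly drop the parameter with the largest p among those with
    |corr(x_j, y)| < corr_min and p > p_max, refitting after each removal."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_pvalues(X[:, keep], y)
        corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in keep]
        candidates = [(p[i], i) for i in range(len(keep))
                      if corr[i] < corr_min and p[i] > p_max]
        if not candidates:
            break
        _, worst = max(candidates)
        del keep[worst]
    return [names[j] for j in keep]


# Synthetic check: y depends on x1 only; x2 is orthogonal to y and x1 by construction.
x1 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
x2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
noise = 0.1 * np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
y_demo = 2 * x1 + noise
selected = down_select(np.column_stack([x1, x2]), y_demo, ["x1", "x2"])
```

Refitting after every removal matters because p values of the remaining parameters change once a correlated or irrelevant predictor leaves the model.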

Step 2: Down-selection based on pairwise correlation of the set of input variables obtained in Step 1. The down-selected set of variables computed in Step 1 for multiple journals was used to compute the overall variance from the covariance matrix. We computed pairwise correlations and identified a smaller set of variables such that the correlation between any two variables in the set was small. These variables can then be used to compute the percentage of variability each accounts for individually, as shown in Table 1 (Saha et al. 2016). This reduced the number of input variables to only five, and the $R^2$ value was very similar to that obtained when all 9 input variables were considered. We did not perform a Principal Component Analysis (PCA) because we were interested in down-selecting the original features. Although PCA produces principal components that are orthogonal by design and provides an elegant way to reduce dimensionality based on the percentage of variability explained, the transformed variables are difficult to interpret with respect to the original input variables.
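Step 2 can be sketched as a greedy pass: rank variables by their share of the total variance (the diagonal of the covariance matrix divided by its trace), accept a variable only if its absolute correlation with every already accepted variable is below a threshold, and stop once the accepted variables cover a target share such as 90 %. The 0.7 threshold and the greedy order are assumptions, not values from the text; note also that raw variance shares depend on the units of each variable.

```python
import numpy as np


def select_low_correlation(X, names, max_abs_corr=0.7, target_share=0.9):
    """Greedy down-selection on pairwise correlation and variance share.

    Returns (selected names, share of total variance covered by them).
    """
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)
    corr = np.corrcoef(X, rowvar=False)
    share = np.diag(cov) / np.trace(cov)  # each variable's fraction of total variance
    selected, covered = [], 0.0
    for j in np.argsort(share)[::-1]:  # largest variance share first
        if all(abs(corr[j, s]) < max_abs_corr for s in selected):
            selected.append(int(j))
            covered += float(share[j])
            if covered >= target_share:
                break
    return [names[j] for j in selected], covered


a = np.array([1, 2, 3, 4, 5], dtype=float)
b = 2 * a                                      # perfectly correlated with a
c = np.array([1, -1, 1, -1, 0], dtype=float)   # weakly correlated with a and b
chosen, covered = select_low_correlation(np.column_stack([a, b, c]), ["a", "b", "c"])
```

In this toy case `b` is kept for its large variance, `a` is rejected as redundant with `b`, and `c` is added because it carries information the other two do not; unlike PCA, the result is still expressed in terms of the original variables.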

Appendix 3

See Fig. 22.

Appendix 4

See Fig. 23.


About this article

Cite this article

Ginde, G., Saha, S., Mathur, A. et al. ScientoBASE: a framework and model for computing scholastic indicators of non-local influence of journals via native data acquisition algorithms. Scientometrics 108, 1479–1529 (2016). https://doi.org/10.1007/s11192-016-2006-2


Received: 09 March 2016
Published: 11 June 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s11192-016-2006-2
