ScientoBASE: a framework and model for computing scholastic indicators of non-local influence of journals via native data acquisition algorithms

Scientometrics

Abstract

Defining and measuring internationality as a function of the influence diffusion of scientific journals is an open problem. No metric exists to rank journals by the extent or scale of their internationality. Current assessments of internationality are qualitative, vague, open to interpretation and limited by vested interests. With the tremendous increase in the number of journals across fields and the persistent desire of academics worldwide to publish in “international” journals, it has become necessary to evaluate, rank and categorize journals based on internationality. In the current work, the authors define internationality as a measure of influence that transcends geographic boundaries. The authors raise concerns about unethical practices in journal publication whereby the scholarly influence of a select few is artificially boosted, primarily through editorial maneuvers. To counter the impact of such tactics, the authors propose a method that defines and measures internationality by eliminating such local effects when computing the influence of journals. A new metric, the Non-Local Influence Quotient, is proposed as one parameter for computing internationality, along with another novel metric, the Other-Citation Quotient, defined as the complement of the ratio of self-citations to total citations. In addition, SNIP and the international collaboration ratio are used as two further parameters. As these journal parameters are not readily available in one place, algorithms to scrape these metrics are written and documented as part of the current manuscript. The Cobb–Douglas production function is used as a model to compute the Journal Internationality Modeling Index. The current work elucidates the metric acquisition algorithms and presents arguments for the suitability of the proposed model. The acquired data is corroborated by different supervised learning techniques.
As part of future work, the authors present a bigger picture, the Reputation and Global Influence Score, which will be computed to facilitate the formation of clusters of journals of high, moderate and low internationality.

Notes

  1. http://scholarlyoa.com/publishers/, as accessed on 6 Mar 2016.

  2. https://en.wikipedia.org/wiki/Jeffrey_Beall, as accessed on 6 Mar 2016.

  3. http://www.scimagojr.com/journalrank.php, as accessed on 6 Mar 2016.

  4. https://aminer.org/billboard/citation, as accessed on 21 Jan 2016.

  5. GitHub repository for MATLAB code, information on JIMI. https://github.com/SciBase-Project/internationality-journals/blob/master/JIMI-JIS/ScientoBASE_appendix, as accessed on 7 Mar 2016.

  6. http://wokinfo.com/essays/impact-factor/.

  7. http://www.crummy.com/software/BeautifulSoup/, as accessed on 6 Mar 2016.

  8. http://www.journalmetrics.com/faq.php, as accessed on 7 Mar 2016.

  9. http://www.journalmetrics.com/values.php, as accessed on 6 Mar 2016.

  10. http://www.journalindicators.com/, as accessed on 6 Mar 2016.

  11. A Guide to Support Vector Regression http://web.mit.edu/6.034/wwwbob/svm-notes-long-08, as accessed on 7 Mar 2016.

References

  • Abrizah, A., Zainab, A. N., Kiran, K., & Raj, R. G. (2013). LIS journals scientific impact and subject categorization: A comparison between Web of Science and Scopus. Scientometrics, 94, 721–740. doi:10.1007/s11192-012-0813-7.

  • Battese, G. E., & Broca, S. S. (1997). Functional forms of stochastic frontier production functions and models for technical inefficiency effects: a comparative study for wheat farmers in Pakistan. Journal of Productivity Analysis, 8, 395–414.

  • Beall, J. (2012). Predatory publishers are corrupting open access. Nature, 489, 179.

  • Bhattacharjee, Y. (2011). Saudi universities offer cash in exchange for academic prestige. Science, 334(6061), 1344–1345. doi:10.1126/science.334.6061.1344.

  • Buchandiran, G. (2011). An exploratory study of Indian science and technology publication output. Department of Library and Information Science, Loyola Institute of Technology, Chennai. http://www.webpages.uidaho.edu/~mbolin/buchandiran.htm.

  • Buela-Casal, G., Perkakis, P., Taylor, M., & Checha, P. (2006). Reflections and perspectives on academic journals. Scientometrics, 67(1), 45–65.

  • Chang, C.-L., McAleer, M., & Oxley, L. (2013). Coercive journal self citations, impact factor, journal influence and article influence. Mathematics and Computers in Simulation, 93, 190–197.

  • Cobb, C. W., & Douglas, P. H. (1928). A theory of production. American Economic Review, 18(Supplement), 139–165.

  • Crawford, W. (2014). Journals, ’Journals’ and wannabes: Investigating the list. Cites & Insights (vol. 14, p. 7). ISSN: 1534-0937.

  • Das, A. K., & Mishra, S. (2014). Genesis of altmetrics or article-level metrics for measuring efficacy of scholarly communications: Current perspectives. Journal of Scientometric Research, 3(2), 82–92.

  • Gingras, Y. (2014). The abuses of research evaluation. University World News. Retrieved from http://www.universityworldnews.com/article.php?story=20140204141307557.

  • Ginde, G., Saha, S., Balasubramaniam, C., Harsha, R. S., Mathur, A., Dayasagar, B. S., & Anand, M. N. (2015). Mining massive databases for computation of scholastic indices: Model and quantify internationality and influence diffusion of peer-reviewed journals. In Proceedings of the fourth national conference of Institute of Scientometrics, SIoT.

  • Haddow, G., & Genoni, P. (2010). Citation analysis and peer ranking of Australian social science journals. Scientometrics, 85(2), 471–487.

  • Harzing, A. W. (2007). Publish or Perish. http://www.harzing.com/pop.htm. Accessed 6 March 2016.

  • Heilig, L., & Voß, S. (2014). A scientometric analysis of cloud computing literature. IEEE Transactions on Cloud Computing, 2(3), 266–278.

  • Jangid, N., Saha, S., Narasimhamurthy, A., & Mathur, A. (2015). Computing the Prestige of a journal: A Revised Multiple Linear Regression Approach. WCI-ACM digital library (accepted), August 10–13.

  • Jangid, N., Saha, S., Gupta, S., & Rao, J. M. (2014). Ranking of journals in science and technology domain: A novel and computationally lightweight approach. IERI Procedia, 10, 57–62. doi:10.1016/j.ieri.2014.09.091.

  • Jenab, S. M. H., & Nejati, A. (2014). Evaluation of the scientific production of countries by a resource scaled two-dimensional approach. Journal of Scientometric Research, 3(3).

  • Kao, C. (2009). The authorship and internationality of industrial engineering journals. Scientometrics, 80(3), 123–136.

  • Liping, Y., Yuqing, C., Yuntao, P., & Yishan, W. (2009). Research on the evaluation of academic journals based on structural equation modeling. Journal of Informetrics, 3(4), 304–311.

  • Moed, H. F. (2010). Measuring contextual citation impact of scientific journals. Journal of Informetrics, 4(3).

  • Saha, S., Dwivedi, A., Dwivedi, N., Ginde, G., & Mathur, A. (2015). JIMI, journal internationality modelling index: An analytical investigation. In Proceedings of the fourth national conference of Institute of Scientometrics, SIoT.

  • Saha, S., Jangid, N., Mathur, A., & Anand, M. N. (2016). DSRS: Estimation and forecasting of journal influence in the science and technology domain via a lightweight quantitative approach. arXiv:1604.03215.

  • Saha, S., Sarkar, J., Dwivedi, A., Dwivedi, N., Narasimhamurthy, A. M., & Roy, R. (2016). A novel revenue optimization model to address the operation and maintenance cost of a data center. Journal of Cloud Computing, Advances, Systems and Applications. doi:10.1186/s13677-015-0050-8.

  • Tan, B. H. (2008). Cobb–Douglas Production Function [Online Database]. http://docentes.fe.unl.pt/jamador/Macro/cobb-douglas. Accessed 9 March 2016.

  • Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., & Su, Z. (2008). ArnetMiner: Extraction and mining of academic social networks. In Proceedings of the fourteenth ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD2008) (pp. 990–998).

  • Waltman, L., van Eck, N. J., van Leeuwen, T. N., & Visser, M. S. (2013). Some modifications to the SNIP journal impact indicator. Journal of Informetrics, 7, 272–285.

  • Zupanc, G. K. H. (2014). Impact beyond the impact factor. Journal of Comparative Physiology A, 200, 113–116.

Author information

Corresponding author

Correspondence to Sukrit Venkatagiri.

Additional information

The additional file on GitHub (see footnote 5) contains MATLAB source code that generates an audio/video interface file showing frames of a 3D plot of the Cobb–Douglas production function. It also contains sample snapshots of the proposed toolkit, as well as other source code used in the course of this manuscript.

Appendices

Appendix 1

Matlab code for Cobb–Douglas function

figure l
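As a rough illustration of the Cobb–Douglas form named above, the following is a minimal sketch in Python rather than MATLAB. It assumes the general form y = A · ∏ x_i^α_i applied to the four parameters listed in the abstract (Non-Local Influence Quotient, Other-Citation Quotient, SNIP and international collaboration ratio). The function name `cobb_douglas`, the input values and the equal elasticities are hypothetical choices made for illustration, not values taken from the manuscript.

```python
import math


def cobb_douglas(inputs, elasticities, scale=1.0):
    """Return scale * prod(x_i ** alpha_i) for strictly positive inputs x_i."""
    if len(inputs) != len(elasticities):
        raise ValueError("inputs and elasticities must have the same length")
    if any(x <= 0 for x in inputs):
        raise ValueError("Cobb-Douglas inputs must be strictly positive")
    # Work in log space: log y = log A + sum(alpha_i * log x_i).
    log_y = math.log(scale) + sum(
        a * math.log(x) for x, a in zip(inputs, elasticities)
    )
    return math.exp(log_y)


# Hypothetical journal: [NLIQ, OCQ, SNIP, international collaboration ratio].
# These numbers and the equal elasticities are illustrative assumptions only.
params = [0.6, 0.8, 1.2, 0.3]
alphas = [0.25, 0.25, 0.25, 0.25]
score = cobb_douglas(params, alphas)
print(f"illustrative score: {score:.4f}")
```

Because the elasticities in this example sum to one, doubling every input doubles the output (constant returns to scale). Whether that property is desirable depends on the elasticities actually estimated for the model.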

Matlab code for generating movie

figure m

Source code for computation of self-citations and journal name extraction

The videos can be viewed at location

Appendix 2

Journal Influence Score

The notion of “internationality” proposed in this work is based on the quantitative features of a journal. The Journal Influence Score (JIS), derived from scientometric data, serves as the most important tool for forming a cluster of internationality. A relatively new journal is then evaluated for internationality by measuring its proximity to, or inclusion in, the known cluster, albeit loosely. The authors believe that the metric serves as a strong indicator of internationality. Such a score could help institutions across the country formulate a publication appraisal policy. JIS could serve as a useful guideline for funding

As shown in Fig. 21, we use a multiple linear regression (MLR) model in which the JIS is the response variable. The response variable, say y (JIS in our case), can thus be expressed as a function of k predictor variables \(x_1,x_2,\ldots ,x_k\) using a linear model of the form

$$y = b_0 + b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_kx_k + e$$

where \(b_0,b_1,\ldots ,b_k\) are fixed parameters that signify the weight of factors and e is the error.
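The linear model above can be fitted by ordinary least squares. The sketch below is one minimal way to do so in plain Python by solving the normal equations. The helper names `fit_mlr` and `predict` and the synthetic data are assumptions for illustration and are not the authors' implementation or data.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk + e.

    X is a list of rows with k predictor values each.
    Returns the coefficient list [b0, b1, ..., bk].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    n = len(rows[0])
    # Augmented matrix of the normal equations (A^T A) b = A^T y.
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]


def predict(b, x):
    """Evaluate the fitted model at one predictor vector x."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


# Synthetic, noise-free data generated from y = 1 + 2*x1 + 3*x2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * a + 3 * b for a, b in X]
coeffs = fit_mlr(X, y)
print([round(c, 6) for c in coeffs])
```

With real journal data the error term e is non-zero, so the recovered coefficients would only approximate the underlying weights.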

Fig. 21 Computation model of Journal Influence Score (JIS)

Sample selection

To train and validate our model, we used data from the SCImago Journal and Country Rank (SJR) portal (see footnote 3), which covers journals in Elsevier's Scopus. The portal includes the journal and country scientific indicators developed from the information contained in the Scopus database. This data source provides statistics for the features listed below:

  • SCImago Journal Rank (SJR) indicator: The average number of weighted citations received in the selected year by the documents published in the selected journal in the three previous years.

  • H Index: The journal's number of articles (h) that have received at least h citations.

  • Total Docs./Total Documents: Output of the selected period. All types of documents are considered, including citable and non-citable documents.

  • Total Docs. (3 years): Published documents in the three previous years (selected year documents are excluded).

  • Total references: All the bibliographical references in a journal in the selected period.

  • Total Cites (3 years): Number of citations received in the selected year by a journal to the documents published in the three previous years.

  • Citable Documents: Number of citable documents published by a journal in the three previous years (selected year documents are excluded). Only articles, reviews, and conference papers are considered.

  • Cites per Doc (2 years): Average citations per document in a 2 year period. It is computed from the number of citations received by a journal in the current year to the documents published in the two previous years.

  • Cites per Doc (3 years): Average citations per document in a 3 year period. It is computed from the number of citations received by a journal in the current year to the documents published in the three previous years.

  • Cites per Doc (4 years): Average citations per document in a 4 year period. It is computed from the number of citations received by a journal in the current year to the documents published in the four previous years.

  • Ref./Doc.: Average number of references per document in the selected year.

  • Self Cites: Number of the journal's self-citations in the selected year to its own documents published in the three previous years.

  • Non-citable documents (available in the graphics): The ratio of non-citable documents in the period.

  • Cited Documents (Cited Doc.): Number of documents cited at least once in the three previous years.

  • Uncited Documents (Uncited Doc.): Number of uncited documents in the three previous years.

  • % International Collaboration: Ratio of documents whose affiliation includes more than one country address.
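Several of the indicators listed above are simple ratios of raw counts, and the Other-Citation Quotient described in the abstract (the complement of the ratio of self-citations to total citations) can be derived from the same fields. The following is a minimal sketch. The `JournalRecord` class, its field names and the sample numbers are hypothetical, and the use of total documents (rather than citable documents) as the denominator for cites per document is an assumption.

```python
from dataclasses import dataclass


@dataclass
class JournalRecord:
    """Hypothetical container for raw counts of one journal in one year."""
    total_docs: int        # documents published in the selected year
    total_docs_3y: int     # documents published in the three previous years
    total_refs: int        # references in the selected year
    total_cites_3y: int    # citations in the selected year to the 3-year window
    self_cites: int        # journal self-citations within those citations


def cites_per_doc_3y(j):
    """Average citations per document over the three-year window."""
    return j.total_cites_3y / j.total_docs_3y


def refs_per_doc(j):
    """Average number of references per document in the selected year."""
    return j.total_refs / j.total_docs


def other_citation_quotient(j):
    """Complement of the self-citation share: 1 - self_cites / total_cites."""
    if j.total_cites_3y == 0:
        raise ValueError("undefined for a journal with no citations")
    return 1.0 - j.self_cites / j.total_cites_3y


# Illustrative numbers only; not taken from the SCImago portal.
journal = JournalRecord(total_docs=100, total_docs_3y=300,
                        total_refs=2500, total_cites_3y=600, self_cites=60)
print(cites_per_doc_3y(journal), refs_per_doc(journal),
      round(other_citation_quotient(journal), 3))
```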

Data acquisition

We used a set of 12 parameters available from the SCImago portal. In addition, we used the Quarter as a parameter, \(Q_i = { i}/4\), where i is the quarter in which the journal was published. The input parameters (predictor variables) thus include the Quarter, H-Index, Total Docs 2012, Total Docs 3 years, Total Cites 3 years, Citable Docs 3 years, Ref/Doc, Cites/Doc 3 years and Total Ref. Intuitively, a journal evaluated in the first quarter of the year is more likely to have greater influence, since the number of publications is mostly limited. Hence, the quarter of publication should be statistically significant. The results validate the use of the quarter (in which the journal issue was published) in our model.

Statistical procedure

Starting with the initial set of input parameters, a two-phase approach was employed to obtain a more compact set of variables. In the first step, the number of variables was reduced using correlation and MLR, yielding a down-selected set of input variables. In the second step, pairwise correlation was applied to this reduced set, and the few parameters that explained \({>}90\,\%\) of the variability were retained. The final model was an MLR model on the parameters retained after the second phase. These steps are described below.

Step 1: Down-selection using correlation with the response variable and multiple linear regression. In this phase, all the initially selected input parameters are used to analyze the correlation and regression statistics. The correlation of each individual parameter with the response variable was computed. Parameters that had both a low correlation (\({<}0.4\)) and a high p value (\({>}0.05\)) were removed. As shown in Table 1 (Saha et al. 2016), the input variable Ref./Doc can be removed. The regression was repeated until no further parameters could be discarded based on the above criteria.
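The removal rule in Step 1 can be sketched as follows. This is a simplified, single-pass illustration, not the authors' procedure: it uses the absolute Pearson correlation, and it substitutes a permutation test for the p value, whereas the p values in the manuscript presumably come from the regression statistics. The function names, the feature names and the synthetic data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p value for the correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def down_select(features, response, r_min=0.4, p_max=0.05):
    """Drop a feature only if it has BOTH |r| < r_min and p > p_max."""
    kept = {}
    for name, values in features.items():
        r = pearson(values, response)
        p = permutation_p_value(values, response)
        if abs(r) < r_min and p > p_max:
            continue  # weakly correlated and not significant: remove
        kept[name] = values
    return kept


# Synthetic data: one informative feature, one nearly uncorrelated feature.
response = list(range(20))
features = {
    "informative": [2 * v + 1 for v in response],
    "uninformative": [(-1) ** i for i in range(20)],
}
kept = down_select(features, response)
print(list(kept))
```

The thresholds 0.4 and 0.05 are the ones stated in Step 1; the number of permutations and the random seed are arbitrary choices.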

Step 2: Down-selection based on pairwise correlation of the set of input variables obtained in Step 1. The down-selected set of variables computed in Step 1 for multiple journals was used to compute the overall variance from the covariance matrix. We computed pairwise correlations and identified a smaller set of variables such that the correlation between any two variables in this set was small. These variables can then be used to compute the percentage of variability accounted for individually, as shown in Table 1 (Saha et al. 2016). This reduced the number further to only five input variables. The \(R^2\) value was very similar to that obtained when 9 input variables were considered. We did not perform a Principal Component Analysis (PCA) since we were interested in down-selection of features. Although the principal components in PCA are orthogonal to each other by design and PCA provides an elegant means of dimensionality reduction based on the percentage of variability explained, the transformed variables are difficult to interpret in terms of the original input variables.
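One simple way to obtain a set of variables with small pairwise correlations, as described in Step 2, is a greedy pass that keeps a variable only if it is weakly correlated with every variable already kept. The sketch below illustrates that idea. The greedy strategy, the threshold of 0.5, the function name and the synthetic data are all assumptions, since the text does not specify how "small" was defined or how the set was searched.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_weakly_correlated(features, max_abs_r=0.5):
    """Greedily keep variables whose |r| with all kept variables is small.

    Variables are visited in dictionary order, so earlier entries win ties.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < max_abs_r for k in kept):
            kept.append(name)
    return kept


# Synthetic variables: x2 nearly duplicates x1, x3 is only weakly related.
features = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 12.5],
    "x3": [1, -1, 1, -1, 1, -1],
}
selected = select_weakly_correlated(features)
print(selected)
```

Unlike PCA, this keeps original variables rather than transformed components, which matches the stated preference for interpretability. The result depends on the visiting order, so ordering the variables by their correlation with the response first would be a reasonable refinement.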

Appendix 3

See Fig. 22.

Fig. 22 Tool to calculate Journal Influence Score (JIS)

Appendix 4

See Fig. 23.

Fig. 23 Screenshot of the survey form

About this article

Cite this article

Ginde, G., Saha, S., Mathur, A. et al. ScientoBASE: a framework and model for computing scholastic indicators of non-local influence of journals via native data acquisition algorithms. Scientometrics 108, 1479–1529 (2016). https://doi.org/10.1007/s11192-016-2006-2

  • DOI: https://doi.org/10.1007/s11192-016-2006-2
