Citation analytics: Data exploration and comparative analyses of CiteScores of Open Access and Subscription-Based publications indexed in Scopus (2014–2016)

Citation counts are among the important metrics used to measure the relevance and impact of research publications. The potential of citation analytics may be exploited to understand the gains of publishing scholarly peer-reviewed research outputs in either Open Access (OA) sources or Subscription-Based (SB) sources in a bid to increase citation impact. However, the data required for such comparative analysis must be freely accessible to support evidence-based findings and conclusions. In this data article, citation scores (CiteScores) of 2542 OA sources and 15,040 SB sources indexed in Scopus from 2014 to 2016 are presented and analyzed based on a set of five inclusion criteria. The dataset, which contains the CiteScores of the included OA and SB publication sources, is attached as supplementary material to this data article to facilitate further reuse. Descriptive statistics and frequency distributions of OA CiteScores and SB CiteScores are presented in tables. Boxplot representations and scatter plots are provided to show the statistical distributions of OA CiteScores and SB CiteScores across the three sub-categories (Book Series, Journal, and Trade Journal). Correlation coefficient and p-value matrices are made available within the data article. In addition, Probability Density Functions (PDFs) and Cumulative Distribution Functions (CDFs) of OA CiteScores and SB CiteScores are computed, and the results are presented using tables and graphs. Furthermore, Analysis of Variance (ANOVA) and multiple comparison post-hoc tests are conducted to understand the statistical difference (and its significance, if any) in the citation impact of OA publication sources and SB publication sources based on CiteScore. In the long run, the data provided in this article will help policy makers and researchers in Higher Education Institutions (HEIs) to identify the appropriate publication source type and category for dissemination of scholarly research findings with maximum citation impact.

© 2018 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Subject area
Data Analytics

More specific subject area
Citation Analytics

Type of data
Tables, graphs, figures, and spreadsheet file

How data was acquired
Data was acquired from the publication source list available in the Scopus online database [1]. A set of five inclusion criteria was established, namely: publication source must be indexed in the Scopus database; publication source must be active as at 28th December 2017; publication must be written in the English language; publication source type must be either Book Series, Journal, or Trade Journal; and publication source must have CiteScores in 2014, 2015, and 2016.

Data format
Secondary, analyzed

Experimental factors
Publication sources that failed to meet one or more of the five inclusion criteria in the period under consideration were excluded.
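The exclusion rule can be sketched in code. The following is a minimal illustration in Python, not the authors' procedure: the record layout (`Source` and its field names) is hypothetical, and the four sample records are invented purely to exercise each criterion.

```python
from dataclasses import dataclass, field

# Source types and CiteScore years named in the inclusion criteria.
ALLOWED_TYPES = {"Book Series", "Journal", "Trade Journal"}
REQUIRED_YEARS = (2014, 2015, 2016)


@dataclass
class Source:
    # Hypothetical record layout; field names are illustrative assumptions.
    indexed_in_scopus: bool
    active: bool              # active on the cut-off date
    language: str
    source_type: str
    citescores: dict = field(default_factory=dict)  # year -> CiteScore


def meets_inclusion_criteria(src: Source) -> bool:
    """Return True only if the source satisfies all five criteria."""
    return (
        src.indexed_in_scopus
        and src.active
        and src.language == "English"
        and src.source_type in ALLOWED_TYPES
        and all(src.citescores.get(y) is not None for y in REQUIRED_YEARS)
    )


# Invented sample records (not taken from the dataset).
sources = [
    Source(True, True, "English", "Journal", {2014: 1.2, 2015: 1.4, 2016: 1.5}),
    Source(True, False, "English", "Journal", {2014: 0.8, 2015: 0.9, 2016: 1.0}),
    Source(True, True, "English", "Conference Proceeding",
           {2014: 0.3, 2015: 0.4, 2016: 0.5}),
    Source(True, True, "English", "Book Series", {2015: 0.6, 2016: 0.7}),
]

included = [s for s in sources if meets_inclusion_criteria(s)]
```

Only the first sample record passes: the second is inactive, the third has a source type outside the three allowed categories, and the fourth lacks a 2014 CiteScore.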

Experimental features
Descriptive statistics, boxplot representations, scatter plots, frequency distributions, correlation and regression analyses, Probability Density Functions (PDFs), Cumulative Distribution Functions (CDFs), Analysis of Variance (ANOVA) tests, and multiple comparison post-hoc tests were performed to explore the dataset provided in this data article. All statistical computations were done using the Statistics and Machine Learning Toolbox in MATLAB 2016a.
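For readers without MATLAB, the one-way ANOVA step can be approximated with a short script. The sketch below is an assumption-laden illustration in plain Python rather than the toolbox routine used for this article: the function name `one_way_anova_f` is hypothetical, the two input lists are toy values and not CiteScores from the dataset, and the p-value is omitted because it requires the F distribution, which the standard library does not provide.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values for illustration only (not taken from the dataset).
oa_scores = [1.0, 2.0, 3.0]
sb_scores = [2.0, 3.0, 4.0]

f_stat, df_b, df_w = one_way_anova_f([oa_scores, sb_scores])
```

A large F statistic relative to the F distribution with `df_b` and `df_w` degrees of freedom would indicate that the group means differ by more than within-group variation alone would explain.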

Data source location
Data is available as supplementary material to this data article.

Data accessibility
In a bid to facilitate further work on citation analytics, detailed datasets are made publicly available in a Microsoft Excel spreadsheet file.

Value of the data
The dataset generated and made publicly available based on the stipulated criteria will help foster further investigation into the importance of Elsevier CiteScore and other source ranking methods [2][3][4].
Presenting this data in open access format will help researchers identify relevant sources as veritable outlets for dissemination of their research findings [5,6].
Many research findings end up in subscription-only sources. This limits access to such works and can significantly reduce their impact on future research. This shortfall is mitigated by isolating and analyzing the OA sources of the largest global indexing body for scientific research [7][8][9].
Descriptive statistics, frequency distributions, one-way ANOVA and multiple comparison post-hoc tests that are presented in tables, plots, and graphs will make data interpretation much easier for useful insights, inferences, and logical conclusions [10][11][12][13].
The detailed datasets made publicly available in the Microsoft Excel spreadsheet file attached to this article will encourage further exploratory studies in this field of research.

Data
Analytics seeks to discover, interpret, and effectively communicate patterns in any given dataset. These attributes explain why analytics is becoming pervasive across various disciplines, including the ranking of Higher Education Institutions (HEIs). In ranking HEIs, a very high premium is placed on scholarly research output, evidenced by publication in relevant sources, as a proxy measure of excellence. Scopus by Elsevier is currently the world's largest abstract and citation database of peer-reviewed literature, with over 70 million records. CiteScore™, a measure of the average citations received per document published in a serial, is one of the three major indices used by Scopus to rank publication sources [14][15][16]. In this source ranking method, higher is better. The metric is comprehensive and transparent, and it is freely available for current sources indexed in Scopus.
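The definition of CiteScore as an average can be written out explicitly. The expression below is a sketch that assumes the three-year publication window Scopus used for the 2014 to 2016 CiteScore values; the symbols C and N are notation introduced here for illustration and are not necessarily those used in the article's own equations.

```latex
\[
\mathrm{CiteScore}_{Y}
  = \frac{C_{Y}}{N_{Y-3} + N_{Y-2} + N_{Y-1}}
\]
% C_Y : citations received in year Y by documents the serial
%       published in years Y-3, Y-2, and Y-1
% N_t : number of documents the serial published in year t
%
% Example for Y = 2016:
% CiteScore_2016 = C_2016 / (N_2013 + N_2014 + N_2015)
```

Under this reading, a serial whose 2013 to 2015 documents number 200 and were cited 300 times in 2016 would have a 2016 CiteScore of 1.5.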
The potential of citation analytics may be exploited to understand the gains of publishing scholarly peer-reviewed research outputs in either Open Access (OA) sources or Subscription-Based (SB) sources in a bid to increase citation impact. However, the data required for such comparative analysis must be freely accessible to support evidence-based findings and conclusions. In this data article, citation scores (CiteScores) of 2542 OA sources and 15,040 SB sources indexed in Scopus from 2014 to 2016 are presented and analyzed based on a set of five inclusion criteria. Two publication [...] the hybrid model is a subset of the subscription-based model. Hence, in this data article, the hybrid model is fully captured under the SB category.

Experimental design, materials and methods
In this data article, CiteScores of 2542 OA sources and 15,040 SB sources indexed in Scopus from 2014 to 2016 are presented and analyzed. The methodology for calculating the CiteScore metric is straightforward, as represented by Eqs. (1) and (2). The methodology is further explained and illustrated in [...] According to Scopus, the 3-year CiteScore time window was chosen as the best fit for all subject areas. Research shows that a 3-year publication window is long enough to capture the citation peak of the majority of disciplines.

A set of five inclusion criteria was established, namely: publication source must be indexed in the Scopus database; publication source must be active as at 28th December 2017; publication must be written in the English language; publication source type must be either Book Series, Journal, or Trade Journal; and publication source must have CiteScores in 2014, 2015, and 2016. The source identification numbers were carefully anonymized using the format OA##### for OA publication sources and SB##### for SB publication sources, where # is a digit. Hence, the sequential Publication ID is OA00001 through OA02542 for OA publication sources, and SB00001 through SB15040 for SB publication sources.

The descriptive statistics of the CiteScores of OA and SB scholarly research output sources for the three-year period are presented in Table 2. In order to measure the central tendency of the CiteScore data, boxplots are drawn for each publication source type. The boxplot representations of [...] Tables 3 and 4 respectively. In like manner, the distribution fitting parameters for SB CiteScore data, and their estimates and standard errors, are presented in Tables 5 and 6 respectively.
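The anonymized Publication ID scheme described above can be reproduced with a few lines of Python. This is a minimal sketch, assuming every ID is zero-padded to five digits as the OA##### and SB##### patterns suggest; the helper name `make_publication_id` is hypothetical and is not part of the supplementary material.

```python
def make_publication_id(prefix: str, index: int, width: int = 5) -> str:
    """Build an anonymized ID such as 'OA00001' from a prefix and a 1-based index."""
    if prefix not in ("OA", "SB"):
        raise ValueError("prefix must be 'OA' or 'SB'")
    if not 1 <= index < 10 ** width:
        raise ValueError(f"index must fit in {width} digits")
    # Zero-pad the index to a fixed width so that IDs sort lexicographically.
    return f"{prefix}{index:0{width}d}"


# Source counts reported in this data article.
N_OA, N_SB = 2542, 15040

oa_ids = [make_publication_id("OA", i) for i in range(1, N_OA + 1)]
sb_ids = [make_publication_id("SB", i) for i in range(1, N_SB + 1)]
```

Fixed-width padding keeps all IDs the same length, so sorting them as text preserves the numeric order of the sources.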
Furthermore, correlation analyses are performed to assess the linear relationship between the OA CiteScores and the SB CiteScores. The correlation coefficient matrices and their corresponding p-values are presented in Tables 7-12. Analysis of Variance (ANOVA) and multiple comparison post-hoc tests are conducted to understand the statistical difference (and its significance, if any) in the citation impact of OA publication sources and SB publication sources based on CiteScore. The results of the