Who Reads PLOS Research Articles? Extensive Analysis of the Mendeley Readership Categories of PLOS Journals

Altmetrics have been used as a complement or alternative to traditional citation metrics, offering other ways to measure societal impact and public engagement with scientific publications. Conceivably, journal articles are first read by stakeholders on social referencing platforms such as Mendeley and are then cited in future work. Thus, the readership of scientific journals is an informative indicator for the different stakeholders involved in scholarly practice. The purpose of this paper is to examine the readership patterns and characteristics of PLOS journals. This article compares Scopus citation counts and Mendeley readership counts for the articles of seven PLOS journals that were published in 2017. The Mendeley API in Webometric Analyst software was used to obtain Mendeley readership data. The results show that Scopus citations are positively and strongly correlated with readership counts in Mendeley for all investigated journals. Most of the readers are Ph.D. students and master’s students. The USA accounts for the highest number of readers of PLOS journals. We observed that PLOS articles tend to attract more readers than citations. Therefore, the results suggest that readership data should be accepted as an impact indicator for all PLOS journals.


INTRODUCTION
Since the 1960s, both formal and informal evaluations of scholarly publications have been based on citation counts. Citations, as a basis for quantitative measures of scientific work, have therefore been used by stakeholders for science policy-making, career advancement, performance evaluation, funding decisions and award selections. These citation-based research evaluations have several limitations [1,2] in evaluating the broader scope of research. [3] In 2010, the term 'altmetrics' was proposed by Priem et al. [4] for non-traditional metrics of research evaluation, which have been used as a complement or alternative to traditional citation metrics, offering other ways to measure societal impact and public engagement with scientific publications. Altmetrics has been proposed as a collective term for data from different social media platforms. However, many researchers have argued against the term 'altmetrics' and have proposed 'influmetrics' or 'web-based social influence' [5] and 'uses metrics' [6] instead, because the concept covers other aspects of scientific work such as views, likes, shares, downloads and social media attention. Bornmann [7] discussed four advantages of altmetrics compared to citation counts: broadness (they measure impact in both the academic and professional communities), diversity (they cover not only papers but also other types of documents), speed (real-time integration of data or impact) and openness (altmetrics data are easy to collect). He also stated some limitations: commercialization of data sources; data quality, where data can be biased or duplicated; missing evidence, i.e. a lack of large-scale studies; and manipulation, where data can be manipulated.
The readership of scientific journals is an informative indicator for the different stakeholders involved in scholarly practice. [8] Researchers across the globe are concerned with deciding where to publish their scientific work to reach the widest desired audience. Traditionally, LIS professionals use readership statistics to measure the value of scholarly journals [9] and to make decisions regarding collection development. Editors and publishers can use readership statistics to examine the performance of scholarly artifacts in the scholarly community.
Journal of Scientometric Research, Vol 9, Issue 3, Sep-Dec 2020
Mendeley is a very popular referencing tool among reference managers such as CiteULike, Connotea, Endnote and Zotero, which allow researchers to search for scholarly work on the web, to store it and to share it with their peers in real time. Users can easily register their own or others' publications in Mendeley, which allows them to create a reference list of publications for their research work and to see others' Mendeley lists in order to communicate with them. [10] Mendeley had reached 2.23 million registered users and 539.5 million documents as of May 3, 2020 (footnote 1). Li et al. [11] conducted a study using 1,613 articles from the journals Nature and Science and found that more than 90% of the articles were stored in Mendeley, whereas approximately 60% of the articles were found in CiteULike.
In this study, we investigate the various activities associated with Mendeley readership counts of articles across seven PLOS journals by collecting citation data from Scopus and the corresponding readership counts from Mendeley. Furthermore, we analyse the relationship between citations and readership counts, with particular emphasis on the locational variation of Mendeley users, readership categories and the relationship of these categories with citations.
Different social referencing tools such as Mendeley, BibSonomy and CiteULike allow the global usage of scholarly publications to be tracked. Several studies have used Mendeley data on scientific publications, but the coverage and distribution of Mendeley readers vary substantially across disciplines. Thelwall and Sud [12] reported that Mendeley covers 45-90% of the publications in the Scopus database. A similar study was conducted by Zahedi et al. [13] on a broad range of WoS disciplines and found that 62.6% of publications were indexed in Mendeley. The coverage of Mendeley readers also varies across journals. For instance, a study of PLOS articles by Priem et al. [14] shows that approximately 80% of PLOS articles were included in the Mendeley library, while Delicious and CiteULike included only 10% and 31% of the articles, respectively. Haustein and Larivière [15] conducted a study on journal articles that covered four broad disciplines. This study revealed that 65.9% of articles were covered by Mendeley with at least one user. In another study, Bansal et al. [16] found that 27.2% of Indian articles were covered by Mendeley.
Some previous studies have compared social referencing tools (Mendeley or CiteULike) with citation databases (WoS or Scopus) by means of correlations. Several studies have shown a positive correlation between citations and readership counts in Mendeley [10,17-20] and CiteULike [11] across different subject categories. Mohammadi et al. [18] noticed a significant correlation between Mendeley reader counts and WoS citation counts for the types of users considered to often author scientific articles. Mohammadi and Thelwall [19] found overall medium and weak correlations between Mendeley readers and WoS citations for social science and humanities disciplines. A similar study by Thelwall and Wilson [10] examined 45 fields of medical science from Scopus and observed that Mendeley readership counts had a strong correlation with Scopus citations for all sub-fields of medical science. Meanwhile, Eldakar [17] conducted a study on Egyptian articles using the Scopus database and noted that Scopus citations were significantly correlated with different user categories in Mendeley for all Egyptian articles. Shrivastava and Mahajan [21] reported that Scopus citations for the top 100 highly cited papers were positively correlated with Mendeley readership counts, but the results varied with publication year.
1. http://web.archive.org/web/20140214110051/http://www.mendeley.com/
Few papers have reported journal-wise readership counts in Mendeley. For example, Maflahi and Thelwall [22] conducted a study based on articles published in four LIS journals during 1996-2013 and found a positive correlation (approximately 0.6) between readership counts and citations for all investigated years. A similar study by Pooladian and Borrego [8] was based on 55,655 articles indexed in WoS from 1995 to 2014. Their findings indicated that 75% of the LIS literature published in the last five years was mentioned at least once in Mendeley. Some previous studies based on PLOS articles have taken an altmetrics perspective to establish the relationship between citations and ALMs; [23-25] citations and ASS; [26] tweets, citations and article views; [27] traditional metrics and altmetrics; [28] and the influence of altmetrics on citation growth. [29]
In summary, existing studies have focused on disciplinary variations of Mendeley readership counts and their relationship with citation indicators. A few previous studies have investigated journal-wise readership counts of Mendeley users [8,11,22,30], but there is a lack of research on the relationship between the different occupational categories of users and citations, and on the locational differences of Mendeley users. The present work addresses these research gaps and examines the relationship of Scopus citations with Mendeley readership counts and with the occupational variations of readers for seven PLOS journals.

Research questions
The primary objective of this study is to demonstrate the readership activities of the articles of seven PLOS journals. To achieve this objective, we framed the following research questions:
Seven PLOS journals (i.e., PLOS ONE, PLOS Biology, PLOS Computational Biology, PLOS Genetics, PLOS Medicine, PLOS Neglected Tropical Diseases and PLOS Pathogens) were selected as the focus of this study. Of the two internationally accepted databases, WoS and Scopus, Scopus was preferred because of its more extensive journal coverage. [31] The bibliographic and citation information of the PLOS journals was downloaded from the Scopus database on 27th April 2020. The PLOS articles from 2017 were chosen for this study, excluding review articles, editorials, etc. The year 2017 was selected to allow a minimum citation window, usually three years after publication [32], for articles to accumulate citations and so give a reasonable chance of finding a high correlation between citation counts and readership counts. [10] All articles from six journals were included in the analysis. The multidisciplinary journal PLOS ONE, however, published 20,085 articles in 2017 on its own. Because of a technical limitation of Scopus (as of April 2020 it returned citation information for at most 20,000 articles), the 20,000 most highly cited PLOS ONE articles were selected for analysis. Duplicate records of PLOS articles were identified and removed using Scopus' unique IDs. All of the 21,942 articles with citations and Mendeley readership counts (both zero and non-zero) from the seven journals were chosen for further analysis.
2. http://lexiurl.wlv.ac.uk/
For each article, the Mendeley API provides user-supplied information about the readers' profession, nationality and discipline. It provides 13 categories for the occupational status of readers. We merged these 13 categories into 10 categories (Table 1).
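The duplicate-removal step mentioned above can be sketched as follows. This is a minimal illustration in Python, not the authors' actual procedure: the text only states that duplicates were removed using Scopus' unique IDs, so the field name `scopus_id` and the sample records are hypothetical.

```python
def drop_duplicates(records, key="scopus_id"):
    """Keep the first record seen for each ID and discard later repeats."""
    seen = set()
    unique = []
    for record in records:
        if record[key] in seen:
            continue  # a record with this ID was already kept
        seen.add(record[key])
        unique.append(record)
    return unique


# Hypothetical sample: field names and values are illustrative only.
sample = [
    {"scopus_id": "A1", "citations": 12, "readers": 40},
    {"scopus_id": "B2", "citations": 0, "readers": 7},
    {"scopus_id": "A1", "citations": 12, "readers": 40},  # repeat of the first record
]
unique_sample = drop_duplicates(sample)
print(len(sample), "records,", len(unique_sample), "unique")
```

Keeping the first occurrence preserves the original record order, which makes it easy to match the de-duplicated list back against the citation data.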
Next, the datasets from Mendeley were matched with citations taken on or before 28th April 2020 from the Scopus database. The complete datasets were imported and analyzed using SPSS 25.0 software. Each dataset was analyzed with the Spearman correlation method as a basic measure of the degree of association between two variables, i.e. readership counts and citations. Spearman correlation was used instead of Pearson correlation because the obtained data (citations) are generally too skewed to satisfy the normality assumption of the Pearson test. For the Pearson test, the data would need to be transformed towards a normal (Gaussian) distribution by applying mathematical functions [11,33] such as the log, square-root, reciprocal, Box-Cox or Yeo-Johnson transformation. However, our data contain many zero values (usually for uncited articles), and several of these transformations, such as the log and reciprocal transformations, give undefined or infinite values for uncited articles. Therefore, we applied the Spearman correlation technique in our study.
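To make the reasoning above concrete, the following sketch computes a Spearman coefficient as the Pearson correlation of average ranks and shows that a log transformation is undefined for an uncited article. It is an illustration only: the study itself used SPSS 25.0, the function names are our own, and the sample values are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def log_is_defined(count):
    """Return False when math.log cannot be applied (e.g. a zero count)."""
    try:
        math.log(count)
        return True
    except ValueError:
        return False


# Hypothetical skewed sample with two uncited articles.
citations = [0, 0, 1, 3, 8, 40]
readers = [2, 5, 4, 12, 30, 95]
rho = spearman(citations, readers)
print(round(rho, 3), log_is_defined(0))
```

Because the coefficient depends only on ranks, zero counts and extreme values need no transformation before the analysis.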

RESULTS
A total of 22,977 articles were published by the seven PLOS journals in 2017. Of these, 22,021 (95.84%) articles were found in Mendeley and 79 (0.36%) of those were duplicate records (Table 2). PLOS ONE had the highest number of duplicate records (75 articles). Among the seven journals, the Biology discipline accounts for five journals (i.e. Biology, Computational Biology, Genetics, Neglected Tropical Diseases and Pathogens), followed by Medicine and Multidisciplinary with one journal each. A previous study has shown that journals in the Medical and Biology disciplines tend to get higher altmetrics coverage than others. [34] In terms of readership statistics, all journals (unique articles) had 100% readership data in Mendeley except PLOS ONE, which had 99.57%. Journal-wise citation counts (all articles) and readership counts (unique articles) are presented in Table 3. Furthermore, we calculated citations per paper (CPP) and readers per paper (RPP) and citations per reader (CPR

Readers and Occupational Status of readers
To address RQ1, we examine the patterns of readership across the occupational or professional categories that users provided when they registered in Mendeley. In Medicine, most of the readers are master's students.
On the other hand, associate professors account for more Mendeley readers than professors and assistant professors for all PLOS journals. Figure 1 shows the journal-wise readership categories available for PLOS journal articles.

Correlation Analysis
To answer the second research question, the Spearman correlation coefficient was calculated for all journals over all unique readership records in Mendeley. According to Cohen, [35] correlation coefficients (r) of 0.5 or more, 0.3 or more and 0.1 or more, whether positive or negative, are considered large, medium and small, respectively, with medium and large correlations considered substantial. The results show that there are large positive correlations between citation counts and Mendeley readership counts for all investigated journals (Table 4). The values of the correlation coefficient ranged from 0.525 (PLOS ONE) to 0.691 (Biology).
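The thresholds attributed to Cohen above can be expressed as a small helper. This is an illustrative sketch rather than part of the study's SPSS workflow; the function name and the "negligible" label for values below 0.1 are our own assumptions.

```python
def cohen_strength(r):
    """Label a correlation coefficient using the thresholds 0.5, 0.3 and 0.1."""
    size = abs(r)  # the sign does not affect the strength label
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumed label; not specified in the text


# The two extreme coefficients reported for the PLOS journals.
for journal, r in [("PLOS ONE", 0.525), ("Biology", 0.691)]:
    print(journal, r, cohen_strength(r))
```

Both reported extremes fall in the "large" band, which is consistent with the statement that all investigated journals show large positive correlations.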
Across journals, the mean number of citations ranged from 7.94 (PLOS ONE) to 24.02 (Medicine) and the median from 6 (PLOS ONE) to 19 (Medicine). The citation distributions are skewed, with the mean greater than the median; the same holds for the Mendeley readership counts, except for Biology and Computational Biology.
The Spearman correlation was also calculated to show the relationship between citations and readership categories (RQ3). Positive correlations were found for all journals except for the non-academic category of the Biology journal (Table 5). However, the correlation strength varies with the readership categories of the different journals. Higher correlations were found for Ph.D. students, master's students, graduate students, professors, associate professors, researchers and other categories. The readership categories of non-academics, librarians and assistant professors show lower correlations for all journals.

Top readers' countries of PLOS articles
This study analyzed the top reader countries of PLOS articles based on the nationality information that users provide on Mendeley (RQ4). Webometric Analyst software helps to extract readers' geographical location using the Mendeley API. Nevertheless, it is not possible to obtain exact counts of country-level reader data because the location field in Mendeley is optional: users may or may not fill in their geographical location when the account is created. Out of 21,942 articles, only
There were positive and statistically significant correlations between citation counts and readership counts for all PLOS journals. However, the study is limited to a small number of articles and a single year of publication. The correlation coefficients between citation counts and readership counts of PLOS journal articles ranged from 0.525 (PLOS ONE; p < 0.001) to 0.691 (Biology; p < 0.001). The correlations for two journals, PLOS ONE and Neglected Tropical Diseases, were comparatively weaker; the results may have been affected by the large size of the datasets and by the many documents that were read by users but did not receive any citation. Correlations for the different user categories are positive for all user categories except the non-academic category of the Biology journal. Moreover, the non-academic category has the lowest correlation with citations for all journals (negative for the Biology journal). The likely reason is that only a very small percentage of articles were read by non-academic people. This is an important issue: non-academic personnel may read PLOS research articles without registering them in Mendeley, the articles may not be useful for their daily activities, or they may not be registered Mendeley members.
Another issue is that PLOS ONE recorded the highest number of non-academic users among all journals; a likely reason is that the journal is multidisciplinary. The correlation values are comparatively higher for the Ph.D. student, master's student, graduate student, professor, associate professor and post-doc researcher categories.
The mean and median numbers of Mendeley readers for all journals are more than double those of Scopus citations. A likely reason is that PLOS articles were read by many people, irrespective of occupational status, who did not cite them.
Our results corroborate existing studies [11,21,22], but larger correlation values were found in the present dataset. A likely reason is that Mendeley has become a very popular online reference management tool among young researchers and the number of registered users has increased over time.
The findings on the occupational status of the readers differ slightly from existing studies, [15,17,18] which concluded that the largest share of articles was included by Ph.D. students and the lowest by librarians. In this study, we found the lowest inclusion by non-academic people. Meanwhile, the results on the geographical location of users are similar to those of other studies, [16,17] where the USA has the highest number of readers for all journals. Researchers from other countries who read PLOS articles include those from the UK, Brazil, Argentina, Spain, France, Belgium, etc. The readers of PLOS articles come from 33 different countries across the globe, which indicates a wide acceptance of PLOS articles within the scientific community.
Beyond these remarks, the present study has several limitations. The study is based entirely on 21,942 articles published in seven PLOS journals, and the results might not be the same for other journals. Citation data for these journal articles were collected from the Scopus database, which does not cover all citations of the articles. Moreover, Mendeley is not the only reference manager at present; many other tools are available in academia, such as CiteULike, Zotero and Endnote. Here, we collected only Mendeley reader data. We investigated only articles published in a particular year, so variation of the results over time could not be included in this study. The information about readers that the Mendeley API provides (occupation, country, etc.) is supplied by users and may not have been updated over time. For instance, some graduate student readers are likely to be researchers who have not changed their Mendeley status since graduating [19].

CONCLUSION
This study analyzed 21,942 articles from seven PLOS journals in terms of coverage, readership categories and geographical location (country) of readers, and calculated Spearman correlations between citation counts and readership counts. The majority of readers are Ph.D. students, master's students and post-doc researchers, while librarians and other categories are not so prominent in terms of readership counts. However, the data on geographical location and user categories cannot be determined accurately because their availability is limited. PLOS articles tend to gather more Mendeley readers than citations. The results indicate that Mendeley readership has a positive and statistically significant correlation with Scopus citations for all PLOS journals. Therefore, Mendeley readership should be accepted as an impact indicator for all PLOS journals.