Networks of reader and country status: An analysis of Mendeley reader statistics

The number of papers published in journals indexed by the Web of Science core collection is steadily increasing. In recent years, nearly two million new papers were published each year; somewhat more than one million when only primary research articles are considered. Sophisticated and compact bibliometric methods have to be applied in order to obtain an overview. One popular method is a network-based analysis. In this study, we analyze Mendeley readership data of a set of 1,133,224 articles and 64,960 reviews with publication year 2012 to generate three networks: (1) The network based on disciplinary affiliations points out similarities of and differences in readerships of papers. (2) The status group network shows which status groups (e.g. students, lecturers, or professors) commonly read and bookmark papers. (3) The country network focuses on global readership patterns: it visualizes similar and different reading patterns of papers at the country level. With these networks we explore the usefulness of readership data for networking.


Introduction
Bibliometrics is not only a mature research field that develops advanced indicators for research evaluation purposes, but also a research field that studies patterns in science. The best method for studying these patterns is bibliometric networking or science mapping. Here, bibliometric data are used to generate networks of citation relations (e.g. between scholarly journals), networks of co-authorships (e.g. between highly-cited researchers in information science), or networks of co-occurrence relations between keywords, words in abstracts and/or words in titles (e.g. co-occurrence relations between words in abstracts of papers published in information science) (van Eck & Waltman, 2014). Powerful computers have made it possible to analyze large networks (which may include the whole Web of Science, WoS, Thomson Reuters, database) (Milojević, 2014). Today, these networks are of interest not only to specialists in bibliometrics or networking, but also to stakeholders from publishers, research institutions, and funding agencies. According to Martin, Nightingale, and Rafols (2014), "network and science-mapping visualisations have considerably enhanced the capacity to convey complex information to users. These tools are now sufficiently mature to be used not only available in academia but also in consultancy and funding organisations" (p. 4). Overviews of publications dealing with networking and mapping have been published, for example, by Börner, Sanyal, and Vespignani (2007) and Leydesdorff (2014).
In recent years, altmetrics has developed into a popular research field in bibliometrics (Bornmann, 2014). Altmetrics counts and analyzes views, downloads, clicks, notes, saves, tweets, shares, likes, recommends, tags, posts, trackbacks, discussions, bookmarks, and comments to scholarly papers. Because it is not clear what these counts really measure, most of the studies in this field have calculated the correlation between altmetric counts and citation counts (Bornmann, in press). A substantial positive correlation points to a certain, but otherwise undefined, meaning of altmetrics in a scientific context. Similar to bibliometric data, altmetric data can be used not only for research evaluation purposes, but also for networking or science mapping. Kraker, Schlögl, Jack, and Lindstaedt (2014) presented a methodology and prototype for creating knowledge domain visualizations based on readership statistics (from Mendeley).
Haunschild and Bornmann (2015) generated a readership network which is based on Mendeley readers per (sub-)discipline for a large dataset of biomedical papers.
In this study, we use Mendeley readership data for all papers (articles and reviews where a DOI was available) from 2012 to generate three networks: (1) The network based on disciplinary affiliations will point out similarities of and differences in readerships of papers. (2) The status group network will show which status groups (e.g. students, lecturers, or professors) commonly read papers (or not). (3) The country network focuses on global readership patterns: it will visualize similar and different reading patterns of papers at the country level. Using these networks, we explore the usefulness of readership data for network analysis.

Dataset used
Between the 11th and 23rd of December 2014, Mendeley readership statistics for n_A = 1,133,224 articles and n_R = 64,960 reviews were retrieved via the Application Programming Interface (API), which was made available in 2014, using HTTP GET requests from R (http://www.r-project.org/). An example R script is available at http://dx.doi.org/10.6084/m9.figshare.1335688.
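To illustrate what such a DOI-based retrieval can look like, the following is a minimal Python sketch (not the R script referenced above). The endpoint `https://api.mendeley.com/catalog`, the `view=stats` parameter, the bearer-token header, and the response field names are assumptions about the Mendeley catalog API and should be checked against its current documentation; the function names are hypothetical. The request is only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Mendeley catalog API; verify before use.
CATALOG_URL = "https://api.mendeley.com/catalog"


def build_stats_request(doi, access_token):
    """Build (but do not send) a GET request for the reader statistics of one DOI."""
    query = urllib.parse.urlencode({"doi": doi, "view": "stats"})
    return urllib.request.Request(
        f"{CATALOG_URL}?{query}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def extract_reader_counts(response_body):
    """Extract reader-count breakdowns from a JSON response body.

    The field names used here are assumptions about the response layout.
    Returns None when the DOI was not found (empty result list).
    """
    records = json.loads(response_body)
    if not records:
        return None
    record = records[0]
    return {
        "total": record.get("reader_count", 0),
        "by_status": record.get("reader_count_by_academic_status", {}),
        "by_discipline": record.get("reader_count_by_subdiscipline", {}),
        "by_country": record.get("reader_count_by_country", {}),
    }
```

Separating request construction from response parsing keeps the parsing step usable on stored responses, which is convenient when more than a million DOIs are queried over several days.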
All papers studied here were published in 2012. The DOIs of the papers were obtained from the in-house database of the Max Planck Society (MPG) based on the WoS and administered by the Max Planck Digital Library (MPDL). The DOI was used to identify the papers in the Mendeley API. 1,074,407 articles (94.8%) and 62,771 reviews (96.6%) were found at Mendeley. In total, we recorded 9,352,424 reader counts for articles and 1,335,764 reader counts for reviews.
It is optional for the users of Mendeley to provide their discipline (selecting from predefined sub-disciplines) and location. However, Mendeley does not prescribe the possible values for country names. Therefore, we used the ISO names (http://countrycode.org) as possible values. Of the 237 countries, only 59 were not found. However, we cannot distinguish between a country value which is not possible and a country from which no reader bookmarked any of the papers. We were less surprised to find no reader from countries like Holy See (Vatican City) than from countries such as Singapore and Greenland.
We retrieved 1,572,240 reader counts (16.8%) for articles and 212,693 reader counts (15.9%) for reviews where the location information was shared. Country-specific readership information was available for 558,221 (49.3%) articles and 42,935 (66.1%) reviews. The academic status seems to be a mandatory piece of information, as the total number of Mendeley readers found agrees with the status-specific readership information. The self-assigned sub-discipline is not mandatory but most Mendeley users seem to provide it. Only 4,924 (0.05%) of the Mendeley article readers and 531 (0.04%) review readers did not share their sub-discipline information.
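The coverage shares reported in this section follow directly from the counts given in the text. As a quick consistency check, they can be recomputed as follows (a small sketch; the helper name `share` is ours):

```python
def share(part, whole):
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


# Counts as reported in the text.
ARTICLES, REVIEWS = 1_133_224, 64_960
ARTICLE_READERS, REVIEW_READERS = 9_352_424, 1_335_764

coverage = {
    "articles found at Mendeley": share(1_074_407, ARTICLES),
    "reviews found at Mendeley": share(62_771, REVIEWS),
    "article reader counts with location": share(1_572_240, ARTICLE_READERS),
    "review reader counts with location": share(212_693, REVIEW_READERS),
    "articles with country information": share(558_221, ARTICLES),
    "reviews with country information": share(42_935, REVIEWS),
}

for label, value in coverage.items():
    print(f"{label}: {value:.1f}%")
```

The contrast is worth noting: location is known for only about one sixth of the reader counts, yet roughly half of the articles have at least one reader with a known country.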

Software and Statistics
The data were organized at three levels of aggregation: a) groups of individual readers who bookmark the papers, in terms of disciplinary affiliations; b) groups of readers in terms of their professional status (Professor, PhD student, postdoc, etc.); c) groups of readers in terms of their countries of origin.
The bookmarking can be considered as referencing, and then the analysis is analogous to bibliographic coupling (Kessler, 1963) in bibliometrics. Pajek is used for the network visualization and analysis. The largest component is extracted in each case, and further analyzed using the community-finding algorithm of Blondel, Guillaume, Lambiotte, and Lefebvre (2008).
The results are visualized using VOSviewer.
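The coupling logic described above can be sketched in a few lines of Python. This is an illustration only, not the Pajek workflow used in the study: the edge weight (the number of papers bookmarked by both groups) is an assumption of this sketch, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_reader_network(paper_readers):
    """Link two reader groups whenever both bookmarked the same paper.

    paper_readers maps a paper id to {group: reader count}. The edge weight
    counts shared papers; this weighting is an assumption of the sketch.
    """
    weights = defaultdict(int)
    for counts in paper_readers.values():
        groups = sorted(g for g, n in counts.items() if n > 0)
        for a, b in combinations(groups, 2):
            weights[(a, b)] += 1
    return dict(weights)


def largest_component(nodes, edges):
    """Return the node set of the largest connected component."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        component, stack = {start}, [start]
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy data: three papers with reader counts per group.
papers = {
    "p1": {"Sociology": 3, "Psychology": 1},
    "p2": {"Psychology": 2, "Artificial Intelligence": 4},
    "p3": {"Air and Space Law": 1},
}
edges = co_reader_network(papers)
groups = sorted({g for counts in papers.values() for g in counts})
core = largest_component(groups, edges)
```

In the toy data, the group that shares no paper with any other group stays outside the largest component, which mirrors how isolated affiliations drop out before community detection.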

Results
a. disciplinary affiliations
A total of 470 disciplinary affiliations can be distinguished, of which 465 (98.94%) form a largest component. The five affiliations which are not connected are: "Judaism", "Catholicism", "Transport Law", "Entertainment, Sports and Gaming Law", and "Air and Space Law". These five affiliations belong to the humanities (theology and law, respectively).
In addition to the social sciences, the network shown in Figure 1 also includes some reading in the computer sciences and mathematics. The relation seems to be via cognitive psychology, artificial intelligence, etc. The humanities are more at the periphery of this set. The sub-disciplines taxation law and German language are not directly connected to this sub-group, but are nevertheless sorted into it by the community-finding algorithm. The number of readers providing bookmarks to these disciplines is low.
In Figure 2, we did not use the links in order to keep the distinction between the two sets (with different colors) focal to the visualization. A version with the network links visible can be webstarted from http://www.vosviewer.com/vosviewer.php?map=http://www.leydesdorff.net/mendeley/fig2_map.txt&network=http://www.leydesdorff.net/mendeley/fig2_net.txt&n_lines=10000
It is somewhat surprising to see the sub-disciplines "regional law" and "Latin" sorted into the network of mainly bio-medical sciences in Figure 2.
As the links in the web-started version show, these bookmarks have many links to several sub-disciplines within the bio-medical network.
Table 1 shows the authority weights of the different status groups among networked Mendeley users. According to de Nooy, Mrvar, and Batagelj (2011), groups with high authority weights (in this case, students) are more central, because they share their interest in publications with many other groups. Senior Lecturers, a group with the lowest authority weight, seem to be interested in publications different from those of the other status groups. However, the authority weight is strongly influenced by the absolute number of reader counts. The Spearman rank correlation coefficient between authority weight and reader counts is 0.986.
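For readers unfamiliar with these two quantities, the following Python sketch shows one common way to compute them: authority weights by power iteration of the hubs-and-authorities scheme, scaled to unit Euclidean length, and the Spearman coefficient as the Pearson correlation of ranks. It is a generic illustration under these assumptions, not a reimplementation of the Pajek routine, and the toy matrix is hypothetical.

```python
import math


def authority_weights(adjacency, iterations=100):
    """Approximate authority weights by power iteration on A^T A."""
    n = len(adjacency)
    auth = [1.0] * n
    for _ in range(iterations):
        # Hub score of i: summed authority of the nodes that i links to.
        hubs = [sum(adjacency[i][j] * auth[j] for j in range(n)) for i in range(n)]
        # Authority score of j: summed hub score of the nodes linking to j.
        auth = [sum(adjacency[i][j] * hubs[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(a * a for a in auth))
        if norm == 0:  # network without links
            return auth
        auth = [a / norm for a in auth]
    return auth


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical symmetric co-reading weights between three status groups.
toy = [[0, 5, 1],
       [5, 0, 2],
       [1, 2, 0]]
weights = authority_weights(toy)
strengths = [sum(row) for row in toy]
rho = spearman(weights, strengths)
```

In an undirected network the hub and authority vectors coincide, so a group with many strong links necessarily receives a high weight. This is one way to see why the weights track the absolute reader counts so closely.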
Note that the status indication may differ among nations. For example, the ranks of "assistant professor" and "lecturer" are virtually non-existent in some countries. On the other hand, ranks such as "reader" (sometimes different from "lecturer") and "habilitand" are not covered by the Mendeley classification system. Furthermore, some status groups seem redundant, e.g. "doctoral student" and "Student PhD".

c. decomposition in terms of nations
Among the 200+ countries in the world, 178 countries are indicated among the readership of Mendeley that actively bookmarked records in this database. These countries are all connected with an average degree of 76.023; the density of the network is 0.43. The authority weights of the countries vary only between 0.054507 and 0.07686. This small variation of authority weights between countries is probably due to the high connectivity of the countries, although there is a large variation of reader counts from 1 (Liberia) to 396,198 (USA).
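The reported density is consistent with the average degree: for a simple undirected network without loops, density equals the average degree divided by n - 1. The sketch below recomputes this relation. The third helper additionally assumes that authority weights are scaled to unit Euclidean length (an assumption about the normalization); under that scaling, 178 equally weighted countries would each receive about 0.075, which lies within the reported range.

```python
import math


def density(average_degree, n_nodes):
    """Density of a simple undirected network: mean degree / (n - 1)."""
    return average_degree / (n_nodes - 1)


def edge_count(average_degree, n_nodes):
    """Number of links implied by the mean degree (each link has two ends)."""
    return average_degree * n_nodes / 2


def uniform_unit_weight(n_nodes):
    """Weight of each node if all weights are equal and have unit Euclidean length."""
    return 1 / math.sqrt(n_nodes)


print(round(density(76.023, 178), 2))     # 0.43
print(round(edge_count(76.023, 178)))     # 6766
print(round(uniform_unit_weight(178), 3)) # 0.075
```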
The community-finding algorithm distinguishes four groups. However, the modularity among these four groups is low (Q = 0.0185) because of cross-group network connections:
1. a group of 53 nations that are core to the scientific enterprise, including most OECD countries, Russia, and China (Figure 5);
2. a largest group of 115 nations centered around Brazil and India, but also including Norway (Figure 6);
3. and 4. The other two groups are probably based on typographic confusions, such as one group of ten smaller nations with "Niger" and "Nigeria" as the central core, and a fourth group with only "Guinea" and "Guinea Bissau".
the subsets in VOSviewer for obtaining more details. The results of these finer-grained decompositions do not obviously make sense to us.
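To make the role of cross-group connections concrete, modularity can be computed directly for a given partition. The following Python sketch uses the standard Newman definition for weighted undirected networks, Q = sum over communities of (internal weight / m) - (community degree / 2m)^2. The toy networks are hypothetical and only illustrate the effect; they are not derived from the Mendeley data.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a weighted, undirected network.

    edges maps (node_a, node_b) to a weight; community_of maps each node
    to its community label.
    """
    total = sum(edges.values())  # m: total edge weight
    internal = {}                # edge weight inside each community
    degree = {}                  # summed node strength per community
    for (a, b), w in edges.items():
        ca, cb = community_of[a], community_of[b]
        degree[ca] = degree.get(ca, 0.0) + w
        degree[cb] = degree.get(cb, 0.0) + w
        if ca == cb:
            internal[ca] = internal.get(ca, 0.0) + w
    return sum(
        internal.get(c, 0.0) / total - (d / (2 * total)) ** 2
        for c, d in degree.items()
    )


# Two triangles joined by a single link: clearly separated groups.
separated = {(1, 2): 1, (2, 3): 1, (1, 3): 1,
             (4, 5): 1, (5, 6): 1, (4, 6): 1, (3, 4): 1}
q_separated = modularity(separated, {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"})

# Fully connected network of four nodes split into two pairs: most links cross groups.
dense = {("a", "b"): 1, ("a", "c"): 1, ("a", "d"): 1,
         ("b", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
q_dense = modularity(dense, {"a": "x", "b": "x", "c": "y", "d": "y"})
```

The first partition yields Q = 5/14 (about 0.36), whereas the densely connected case yields a negative value (-1/6). A network in which nearly half of all possible links exist therefore cannot be expected to decompose into sharply delineated groups.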

Discussion
Network analysis is one of the most important and popular methods to analyse bibliometric data. In this study, we point out that Mendeley data can also be successfully used as a data source for networks. It is a great advantage of Mendeley data that they can be retrieved for comprehensive publication sets using an API. Thus, one can download readership data on a large scale, which are then very suitable for network analyses.
The Mendeley readership networks can be generated by using different types of user information: their (1) discipline, (2) status, and (3) country. All three types of information are able to produce meaningful networks. In terms of disciplines, first, we find four groups: (1) biology, (2) social science and humanities (including relevant computer science), (3) bio-medical sciences, and (4) natural science and engineering. In all four groups, the category with the addition "miscellaneous" prevails. Probably, the readers who identify themselves with cross-disciplinary research interests are more inclined to generate these "bibliographic couplings" than more narrowly specialized readers. The pronounced position of the social sciences and the humanities (albeit perhaps mediated by computer scientists) was not expected.
The decomposition in terms of status hierarchies within the network makes clear that this hierarchy is inverted in Mendeley. The lead among these users is taken by students working on theses. More than professionals, students have time to explore the literature beyond their specialization. Lecturers and Senior Lecturers entertain a different reading pattern, given their primary tasks in education. Librarians make use of Mendeley (and scholarly literature) differently from researchers. However, the reader count distribution is skewed. The calculated authority weight correlates strongly with the absolute number of observed reader counts.
The decomposition in terms of nations highlights the worldwide divide between developed and less-developed nations. A similar prevailing divide was recently also found in portfolio analysis of journal literature by Leydesdorff, Heimeriks, & Rotolo (in press). More fine-grained delineations can partially be recognized as regional, but could not always be provided with an obvious interpretation.
The academic status information is provided by every Mendeley user and nearly every Mendeley user provides (sub-)discipline information, while only a minority of Mendeley users seem to provide their location. This makes it harder to analyze the reader counts broken down by countries. Some Mendeley academic status groups seem redundant, while others seem to be tailored to the US system. Surprisingly, the vast majority of Mendeley readers assign the miscellaneous sub-discipline of their main discipline to themselves. It is not clear to what extent Mendeley users assign the precise sub-discipline, status, and location information to themselves and whether they update this information regularly. Despite these shortcomings of the Mendeley classification system and the quality of information the users provide, we presented a network analysis of Mendeley reader counts from three different perspectives. This leads us to the conclusion that useful network analysis can be performed using Mendeley reader counts.
The 465 affiliations in the main component are sorted into four groupings by the community-finding algorithm of Blondel et al. (2008); the modularity is Q = 0.245. The four groups are, respectively:

Figure 2: 71 affiliations in the bio-medical sciences (yellow) and 84 affiliations in the natural sciences and engineering (pink).

Figure 3 visualizes the entire network. It shows that the core set is occupied by readers who

Figure 4: Network of co-readers in terms of professional status.

Table 1: Authority weights and absolute number of reader counts N of different status groups among networked Mendeley users (using the Hubs & Authorities routine in Pajek).