Research coauthorship 1900–2020: Continuous, universal, and ongoing expansion

Abstract
Research coauthorship is useful to combine different skill sets, especially for applied problems. While it has increased over the last century, it is unclear whether this increase is universal across academic fields and which fields coauthor the most and least. In response, we assess changes in the rate of journal article coauthorship 1900–2020 for all 27 Scopus broad fields and all 332 Scopus narrow fields. Although all broad fields have experienced reasonably continuous growth in coauthorship, in 2020, there were substantial disciplinary differences, from Arts and Humanities (1.3 authors) to Immunology and Microbiology (6 authors). All 332 Scopus narrow fields also experienced an increase in the average number of authors. Immunology and Classics are extreme Scopus narrow fields, as exemplified by 9.6 authors per Journal for ImmunoTherapy of Cancer article, whereas 93% of Trends in Classics articles were solo in 2020. The reason for this large difference seems to be the need for multiple complementary methods in Immunology, making it fundamentally a team science. Finally, the reasonably steady and universal increases in academic coauthorship over 121 years show no sign of slowing, suggesting that ever-expanding teams are a central part of current professional science.


INTRODUCTION
Research collaboration may be essential to address complex societal challenges (Hall, Vogel et al., 2018), as exemplified by the need for cooperation to reduce the impact of the COVID-19 pandemic (Cai, Fry, & Wagner, 2021; Chakraborty, Sharma et al., 2020). International collaboration is also widely believed to be beneficial (Matthews, Yang et al., 2020), and huge authorship teams involving large numbers of countries (Adams & Gurney, 2018) or hundreds of people (Thelwall, 2020) are also essential for some problems. This article focuses on collaboration as expressed in journal article coauthorship for the practical reasons that this is the type for which the largest scale evidence is available (e.g., 62% of articles found by a review of team science used bibliometric data; Hall et al., 2018), and this type of collaboration is important for research evaluation.

The Evolution of Research Coauthorship
Historically, collaboration appeared in science when it transformed from an amateur to a professional occupation (Beaver & Rosen, 1978), and this presumably expressed itself increasingly often in coauthored papers. The size of research teams and consequently the number of authors per paper increased after the Second World War in richer nations, driven by the cost of research, as part of the development of "big science" (Price, 1963). Coauthorship has become more common and easier with faster, cheaper travel, the internet (Melin, 2000), and cloud computing (Langmead & Nellore, 2018), but is still more likely between partners that are geographically close (Hoekman, Frenken, & Tijssen, 2010). It has been partly driven by funding programs that mandate or encourage collaboration (Melin, 2000), with large grants associating with greater and more diverse collaborations (Bozeman & Corley, 2004), aligning with the big science idea (Price, 1963).
Many empirical studies have found evidence of increasing coauthorship in individual fields (e.g., economics: Jones, 2021; translation studies: Rovira-Esteva, Aixelá et al., 2020), but the focus here is on science-wide trends. International coauthorship increased substantially in the Web of Science (WoS), from 10% in 1990 to 25% in 2011 (Wagner, Park, & Leydesdorff, 2015). An investigation of 20 million WoS research articles from 1955 (science and engineering), 1956 (social sciences), or 1975 (arts and humanities) to 2000 found increases in average (arithmetic mean) team sizes and the proportion of coauthored articles in all cases, with solo research dominating the arts and humanities, solo articles being about half of all social science articles in 2000, and team publications being the norm in science and engineering (Wuchty, Jones, & Uzzi, 2007). An updated and larger scale investigation with the Science Citation Index Expanded, Social Sciences Citation Index, and Arts and Humanities Citation Index 1900-2011 found that an increasing proportion of articles had more authors when split into two groups: natural and medical sciences, and social sciences and humanities. The percentage of solo papers also decreased substantially in both groups during this period. According to various field-specific catalogues, coauthorship increased between 1800 and 1999 in mathematics, physics, and logic, although the main expansions occurred in different half centuries in each case (Wagner-Döbler, 2001). Despite these studies, no prior published academic journal article seems to have analyzed long-term trends in research coauthorship across all broad or narrow fields of science.

Types of Coauthorship
Coauthorship occurs for at least two reasons that seem to apply to all areas of scholarship: between PhD students and supervisors, and for social reasons, such as to work with friends (Jha & Welch, 2010; Melin, 2000). Researcher characteristics also affect the likelihood that they collaborate (not equated with coauthorship) (Van Rijnsoever & Hessels, 2011). Other reasons for coauthorship vary between fields and probably change more over time, such as the need for large-scale studies or diverse sets of skills to tackle specific problems. At the individual level, researchers may coauthor an article because they have complementary areas of expertise for common interests, and so can produce more complex studies, but they may also join to access scientific instruments or other resources (Thorsteinsdottir, 2000; Tomáška, Cesare et al., 2020).
There are many ways of contributing to research projects, only some of which might usually be classed as collaboration or recognized with a coauthorship attribution (Katz & Martin, 1997; Laudel, 2002). For example, a minor contribution to a project might not be thought of as a collaboration and some teams might only allocate a coauthorship to major contributions. Other contributions might generate an acknowledgment instead, or no formal recognition (Cronin, 2001; Laudel, 2002). In contrast, noncollaborators might receive gift authorships (Chawla, 2020) and team leaders might be listed as last authors for overall management of a project or laboratory without a specific contribution to a study. Thus, the criteria for authorship are not fixed. There are some attempts to standardize the requirements for authorship, however. For example, the CRediT (Contributor Roles Taxonomy) system lists the following 14 authorship roles (CASRAI, 2022): Conceptualization; Data curation; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Resources; Software; Supervision; Validation; Visualization; Writing-original draft; Writing-review & editing. From this list, it may be assumed that any of these types of contribution, if substantial enough, might qualify a person to be a coauthor. Nevertheless, other guidelines are stricter, such as by requiring that all authors contribute to writing the article and are accountable for "all aspects of the work" (ICMJE, 2021).

Disciplinary Differences in Coauthorship
There are disciplinary differences in the likelihood of coauthoring papers. Informal collaboration without coauthorship is thought helpful in the social sciences and humanities, whereas formal collaborations are believed to be more important in the natural sciences (Lewis, Ross, & Holden, 2012). Many other factors (discussed below) seem less applicable to the social sciences, arts, and humanities. Nevertheless, social sciences and humanities scholars may also experiment with collaboration because it is promoted as an effective research strategy, even though it is rare in their home fields (Graham Bertolini, Weber et al., 2019).
Fields that need to combine to create pooled resources, such as global biodiversity data sets or specialized nuclear reactors, engender new coauthorships of different types based on these resources or their production (e.g., Arita, Karsch-Mizrachi, & Cochrane, 2021; D'Ippolito & Rüling, 2019). Similarly, large health-related cohort studies may need multiple centers to collect all the data and researchers in later stages may need to collaborate and coauthor with earlier researchers to understand or process all the data collected (Eblen, Fabsitz et al., 2012). In some areas of science, researchers operate largely as part of teams rather than individuals, and so collaboration and coauthorship are core to their work (Ziman, 1994).
There is some empirical evidence about disciplinary differences in the extent of coauthorship. International coauthorship has been more common in basic fields (Frame & Carpenter, 1979). Nevertheless, evidence from National Science Foundation funded studies in the United States suggests that it grew more rapidly during 1997-2012 in applied fields and particularly in the medical sciences and emerging applied hybrid fields (Coccia & Bozeman, 2016). The fields in which single authorship is most common are mathematics and the arts and humanities (Farber, 2005) and there are more authors per paper in the natural and medical sciences compared to the social sciences and humanities.

Research Questions
Although the above brief review suggests that coauthorship is broadly increasing, the extent of the historical trends and fine-grained disciplinary variations in coauthorship have not been systematically evaluated. This is an important omission for future attempts to identify the causes of the apparent coauthorship advantage. This article assesses the historical evolution of the number of coauthors of journal articles at the finest grained cross-science level yet. It also contrasts with a previous WoS study by using a different data source: Scopus. It focuses on coauthorship as an easily available metadata type, even though there is a range of types of contribution to research (including none) that can translate into an authorship attribution (Larivière et al., 2016; Mongeon, Smith et al., 2017; Rahman, Mac Regenstein et al., 2017). It also focuses on global trends in coauthorship team sizes rather than changes in the nature of coauthorships because there is little data from which to deduce the nature of a collaboration on a science-wide or long-term scale. The following research questions drive this study.
• Has the average number of authors per publication for journal articles increased during 1900-2020 in all Scopus broad and narrow fields, and is it still increasing in 2020?
• Which Scopus broad and narrow fields had the highest and lowest average number of authors per publication in 2020?

METHODS
The raw data for this study is the metadata records of all Scopus documents of type journal article published between 1900 and 2020, as downloaded by the Scopus API up to September 2021. Scopus was chosen for wider coverage than WoS (Martín-Martín, Thelwall et al., 2021) and a greater number of narrow fields (332 vs. 252), supporting a finer grained analysis. See the Discussion for coverage limitation issues.
Scopus broad fields (n = 27) and narrow fields (n = 332, excluding empty fields) were used for the classification process (for a list of narrow fields and journals, see Elsevier (2021)). The fields are predominantly journal-based and inferior to article-based classifications (Klavans & Boyack, 2017), but were used for transparency and the availability of expert-validated field names. Articles classified by Scopus into multiple broad fields were included within each one. Including these duplicates, 88 million articles were analyzed. Numbers for each broad field and year are in the online supplement (https://doi.org/10.6084/m9.figshare.17064419).
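The treatment of multiply classified articles can be illustrated with a minimal sketch. The records, identifiers, and the function name `group_by_broad_field` are hypothetical and are not taken from the study's own processing code; the sketch only shows how an article assigned to two broad fields contributes its author count to both, so that the total including duplicates exceeds the number of distinct articles.

```python
from collections import defaultdict

# Hypothetical records: (article_id, number_of_authors, broad_fields).
articles = [
    ("a1", 3, ["Chemistry", "Materials Science"]),
    ("a2", 1, ["Arts and Humanities"]),
    ("a3", 5, ["Chemistry"]),
]


def group_by_broad_field(records):
    """Return {field: [author counts]}, repeating multi-field articles in each field."""
    by_field = defaultdict(list)
    for _article_id, n_authors, fields in records:
        for field in fields:
            by_field[field].append(n_authors)
    return dict(by_field)


groups = group_by_broad_field(articles)
# Three distinct articles, but four field-level records once duplicates are counted.
total_with_duplicates = sum(len(counts) for counts in groups.values())
print(groups)
print(total_with_duplicates)
```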
The average number of authors per article was calculated using the geometric mean (based on averaging the logged numbers of authors, then applying the exponential function) rather than the arithmetic mean. The geometric mean is a more appropriate measure of central tendency for highly skewed data (Smothers, Sun, & Dayton, 1999) and this is important because of the presence of some huge teams. For example, a few articles by the 2,862-person ATLAS collaboration (e.g., https://doi.org/10.1007/JHEP09(2016)074) would skew the arithmetic means for their fields to unrepresentative numbers. The median is also appropriate for skewed data but is not fine-grained enough for this study because the average coauthorship values (for the geometric mean) are all between 1 and 7, as can be seen in the figures. The harmonic mean could also have been used to reduce the impact of highly coauthored outliers, but the geometric mean seems more intuitive because it is based on averaging logarithms rather than reciprocals.
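The effect of this choice can be sketched as follows. The function names and the sample field are illustrative assumptions rather than the authors' code; the sample combines 99 three-author articles with a single 2,862-author article to show how one huge team moves the arithmetic mean far more than the geometric mean. Records with no authors are dropped, in line with the exclusion rule used in this study.

```python
import math


def authored_only(author_counts):
    """Drop records with no listed authors."""
    return [n for n in author_counts if n > 0]


def geometric_mean(author_counts):
    """Average the natural logs of the author counts, then apply exp."""
    counts = authored_only(author_counts)
    if not counts:
        raise ValueError("no authored articles")
    return math.exp(sum(math.log(n) for n in counts) / len(counts))


def arithmetic_mean(author_counts):
    """Ordinary mean of the author counts, for comparison."""
    counts = authored_only(author_counts)
    if not counts:
        raise ValueError("no authored articles")
    return sum(counts) / len(counts)


# Hypothetical field: 99 three-author articles plus one 2,862-author article.
sample = [3] * 99 + [2862]

print(f"arithmetic mean: {arithmetic_mean(sample):.2f}")
print(f"geometric mean:  {geometric_mean(sample):.2f}")
```

In this invented sample, the arithmetic mean is above 31 authors, whereas the geometric mean stays close to the typical value of 3.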
Articles with no authors were excluded from all calculations because an article must be written by someone (excluding fake computer-generated texts). Unauthored articles seemed to be primarily entries that had been misleadingly classed as journal articles (e.g., special issue introductions, errata, corrections, announcements [e.g., "The Topics of the Month of IASR"], and editorials) but might also sometimes be journal articles with indexing errors. In at least one case, the article authors were not explicitly included in the original article ("Germ-cell migration. Finding the way to the gonad in Drosophila", Current Biology, 1994, 4(1), 47), either by a publishing accident or because it is not a full journal article (an annotated picture in the above case).

RESULTS

Broad Scopus Fields
In all 27 broad Scopus fields, the average number of authors per article has increased over the past 121 years (Figures 1-4). In nearly all broad fields, single authorship was overwhelmingly most common in 1900. The two exceptions were Materials Science and Chemistry, with averages close to 1.5 (Figure 1). By 2020, most broad fields had at least four authors per paper on average (geometric mean), including all multidisciplinary and natural science broad fields (Figure 1), the Energy and Chemical Engineering broad fields (Figure 2), and all life science and health broad Scopus fields (Figure 3). In contrast, all social science and humanities broad fields had fewer than 3.5 authors per paper in 2020. The largest difference is between the average of 6 authors per Immunology and Microbiology journal article in 2020 (Figure 3) and 1.3 for Arts and Humanities (Figure 4).

Narrow Scopus Fields
In almost all cases (329 out of 332), the average (geometric mean) number of authors per article in the most recent publishing year of a Scopus narrow field (almost always 2020) was higher than in the first publishing year (usually after 1900). Two of the three exceptions, Nurse Assisting and Agricultural and Biological Sciences (miscellaneous), were due to a single article published in their first year. Ignoring this year, there was an increase in average authorship for these two fields. The third exception, Drug Guides, began publishing in 1973 with four articles and, ignoring years with fewer than five articles, has increased to 2020. It also has an overall increasing trend, albeit with a decrease since 2013 (see online supplement for figure: https://doi.org/10.6084/m9.figshare.17064419), so is not an exception.
Alternatively, trends could be assessed by comparing the average number of authors for the first 100 articles (or more, counting whole years) against the most recent 100 articles. A minimum of 100 ensures that the average is not dominated by a few articles. With this calculation, all 332 Scopus narrow fields had an increase in the average number of authors from their inception to 2020 (or their termination in Scopus). Hence, on the basis of both of these types of test, it is reasonable to claim that coauthorship has increased in all 332 Scopus narrow fields.
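This second test can be sketched as below, under stated assumptions. The function names, the per-year data structure, and the example field are hypothetical. The handling of fields with fewer than 200 articles in total is not specified above, so here the two pooled sets are simply allowed to overlap. Whole years are pooled from the start, and separately from the end, until at least 100 articles have been collected, and the geometric means of the two pools are then compared.

```python
from statistics import geometric_mean


def pooled_whole_years(counts_by_year, years, minimum=100):
    """Pool whole years in the given order until at least `minimum` articles are collected."""
    pooled = []
    for year in years:
        pooled.extend(counts_by_year[year])
        if len(pooled) >= minimum:
            break
    return pooled


def coauthorship_increased(counts_by_year, minimum=100):
    """Compare the geometric mean of the latest pool against the earliest pool."""
    years = sorted(counts_by_year)
    earliest = pooled_whole_years(counts_by_year, years, minimum)
    latest = pooled_whole_years(counts_by_year, reversed(years), minimum)
    return geometric_mean(latest) > geometric_mean(earliest)


# Hypothetical narrow field: invented author counts for each publishing year.
field = {
    1950: [1] * 60,
    1951: [1] * 50 + [2] * 10,
    2019: [2] * 40 + [3] * 30,
    2020: [3] * 80 + [4] * 20,
}

print(len(pooled_whole_years(field, sorted(field))))  # 1950 and 1951 are both needed
print(coauthorship_increased(field))
```

Pooling whole years rather than exactly 100 articles avoids having to choose an arbitrary order for articles within a year.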

Immunology and Microbiology: Highest Average Coauthorship
The Scopus broad field with the highest coauthorship rate, Immunology and Microbiology, was further investigated for coauthorship rates in its constituent subfields. All seven narrow subfields have high rates of coauthorship, with the highest being Immunology in most years (Figure 5). Perhaps confusingly, the "Immunology and Microbiology (all)" narrow field includes all types of Immunology and Microbiology, but it does not contain all Immunology and Microbiology articles.
High coauthorship was the norm for 19 of the 20 largest Immunology narrow field journals in 2020 (Figure 6), with the exception being European Journal of Molecular and Clinical Medicine. The highest average coauthorship for any journal with at least 100 articles was 9.6 authors: Journal for ImmunoTherapy of Cancer, apparently due to its reporting of complex multiexperiment studies (see online supplement for a detailed analysis of this journal: https://doi.org/10.6084/m9.figshare.17064419).

Arts and Humanities: Lowest Average Coauthorship
The Scopus broad field with the lowest coauthorship rate, Arts and Humanities, includes a wide variety of Scopus narrow fields in terms of average authorship (Figure 6). Solo authorship is still the norm in many of the fields, as evidenced by many average author counts being below 1.4 (this is the maximum geometric mean that guarantees a mode of 1). Presumably, the fields overlapping with social science or technical specialties are more likely to have coauthors. The Arts and Humanities Scopus narrow field with the fewest average authors in 2020 is Classics, which is only marginally below Literature and Literary Theory.
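The 1.4 threshold can be motivated with a short calculation, offered here as a sketch rather than as the authors' own derivation. If a share p of articles is solo and every other article has at least two authors, then the log of the geometric mean is at least (1 − p) ln 2, so a geometric mean below √2 ≈ 1.414 forces p > 1/2 and hence a mode of 1. The function name `min_solo_share` and the counterexample are illustrative assumptions.

```python
import math
from statistics import mode


def min_solo_share(geo_mean):
    """Lower bound on the share of solo articles implied by a geometric mean.

    Assumes every non-solo article has at least two authors, so that
    ln(geo_mean) >= (1 - solo_share) * ln(2).
    """
    return max(0.0, 1.0 - math.log(geo_mean) / math.log(2))


# Hypothetical counterexample just above the threshold:
# 49 solo articles and 51 two-author articles give a geometric mean of 2 ** 0.51.
counterexample = [1] * 49 + [2] * 51
counterexample_geo_mean = 2 ** (51 / 100)

print(f"minimum solo share at 1.4: {min_solo_share(1.4):.3f}")
print(f"counterexample geometric mean: {counterexample_geo_mean:.3f}")
print(f"counterexample mode: {mode(counterexample)}")
```

The counterexample shows that the guarantee fails just above √2: its geometric mean is slightly larger than 1.414, yet two-author articles are the most common type.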
Within Classics, only one journal published more than 100 articles in 2020: Trends in Classics (see online supplement for figure: https://doi.org/10.6084/m9.figshare.17064419). These 106 articles included 99 with solo authors, six with two authors and one with four.
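As a worked check on these counts, the sketch below rebuilds the 106 author counts and computes the solo share together with the geometric and arithmetic means. The two mean values are derived here from the reported counts and are not figures quoted from the study; the variable names are illustrative.

```python
from statistics import geometric_mean, mean

# Author counts reported above for the 106 Trends in Classics articles from 2020.
author_counts = [1] * 99 + [2] * 6 + [4] * 1

solo_share = author_counts.count(1) / len(author_counts)
geo_mean = geometric_mean(author_counts)
arith_mean = mean(author_counts)

print(f"articles: {len(author_counts)}")
print(f"solo share: {solo_share:.1%}")
print(f"geometric mean: {geo_mean:.3f}")
print(f"arithmetic mean: {arith_mean:.3f}")
```

The solo share of 99 out of 106 is about 93%, and the geometric mean is roughly 1.05 authors, well below the 1.4 threshold discussed above.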

DISCUSSION
Two important data limitations of this study are that it relies on articles accurately reflecting coauthors and Scopus accurately indexing them. Neither assumption holds in every case. In particular, hugely coauthored articles may list the consortium as the author, and Scopus seems to limit the number of coauthors to 100, at least for some papers. This mainly affects highly coauthored articles, however, and its influence is reduced by using geometric mean averaging in this paper. "Authors" that are consortia include "The European Polycystic Kidney Disease Consortium" listed in Scopus as the author of "The polycystic kidney disease 1 gene encodes a 14 kb transcript and lies within a duplicated region on chromosome 16." This seems to be rare, however, with only 14 Scopus first-author names including "consortium" and only 268 including "collaboration" (e.g., "Event Horizon Telescope Collaboration"), although other consortium names are almost certainly also indexed.
Another important limitation is that the extent to which Scopus's coverage of academic journals has changed over time is unclear, although it is known that its total coverage has expanded greatly (Thelwall & Sud, 2022). Some journals may not have been produced, or may have been lost, during the Second World War as universities and libraries were looted, damaged, or destroyed (Van der Hoeven & Van Albada, 1996), for example. Journals from fields that died out may also be more difficult to obtain, as might early journals that were published by scholarly societies and distributed relatively informally before the commercialization of much academic publishing (Fyfe, Coate et al., 2017). Scopus may also have been more (or less) inclusive in its selection policy for older years, given that it does not attempt to index everything (Mabe & Amin, 2001). More recently, because the coverage of Scopus changes every year, the inclusion of single large journals or groups of journals from a country or publisher can also impact the results (Thelwall & Sud, 2022). It is impossible to know how reliable the results are from different decades because of changes over time in the coverage of Scopus, but it seems likely that older results are increasingly less reliable, as some of the data may have been lost when retrospectively added after Scopus was created. The results should therefore be interpreted as accurate reflections of Scopus but not necessarily uniformly representative of published academic research since 1900.
The disciplinary differences in coauthorship rates shown above overlap with and echo some previous studies. This is true for the low rate of coauthorship for mathematics (Farber, 2005) and the social sciences and humanities. The results extend prior findings with updated, finer grained, and longer term information. In particular, while the arts and humanities are known to have the lowest coauthorship rates, Classics and Literature and Literary Theory do not seem to have been identified as the least collaborative or with the lowest level of coauthorship. Similarly, Immunology and Microbiology does not seem to have been previously singled out as the highest coauthorship area, at least with the Scopus classification scheme. Other fields seem to be better known for large authorship teams, such as astronomy (many large telescopes) (Kahn, 2018), genetics (Human Genome Project) (Dinh & Cheng, 2018), and particle physics (Large Hadron Collider) (Kahn, 2018), but Immunology and Microbiology seems to have consistently large coauthorship teams for its studies because of the typical need to combine multiple methods for a single article. Microbiology has been important for human health since germ theory displaced the miasma theory of disease in the late 19th century and was used to develop treatments and cures. This critical role, together with the complexity of its topic, may be part of the reason why studies are unusually elaborate and comprehensive, needing large teams of scientists. These teams are sometimes within a single medical center or hospital-university connection, rather than being higher profile international collaborations.
The apparently continuing increase in authorship raises the question of whether it is reaching a natural peak in any field. Plotting the authorship data on a graph with a logarithmic y-axis can help check this. On such a graph, exponential growth (i.e., a constant percentage increase each year, so that the absolute rate of growth keeps increasing) would become a straight line, whereas constant or decreasing absolute growth rates would translate into a line bending downwards. With this interpretation, the logarithmic graphs for all fields for the authorship data show no sign of the rate of increase of authorship slowing down (Figure 7). In fact, the rate of increase seems to be accelerating in many broad Scopus fields. Thus, the data suggests that the level of authorship is not close to reaching a natural maximum, and raises the possibility that authorship team sizes will continue to grow in all broad fields for the foreseeable future.
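The logarithmic-axis check can also be expressed numerically, as in the following sketch. It is an illustrative alternative to visual inspection and not the procedure used for Figure 7: the function names, the split into two halves, and both synthetic series are assumptions. The slope of the logged averages is estimated by least squares for the earlier and later halves of a series; equal slopes correspond to a straight line on a logarithmic axis, and a smaller later slope corresponds to a line bending downwards.

```python
import math


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def log_slopes_by_half(years, averages):
    """Slopes of log(average) in the earlier and later halves of the series."""
    logs = [math.log(a) for a in averages]
    mid = len(years) // 2
    return slope(years[:mid], logs[:mid]), slope(years[mid:], logs[mid:])


# Synthetic series sampled every decade (invented values, not study data).
years = list(range(1900, 2021, 10))
exponential = [1.2 * 1.012 ** (y - 1900) for y in years]  # straight on a log axis
linear = [1.2 + 0.03 * (y - 1900) for y in years]         # bends downwards on a log axis

early_exp, late_exp = log_slopes_by_half(years, exponential)
early_lin, late_lin = log_slopes_by_half(years, linear)
print(f"exponential: early {early_exp:.5f}, late {late_exp:.5f}")
print(f"linear:      early {early_lin:.5f}, late {late_lin:.5f}")
```

A later slope that exceeds the earlier one would correspond to the accelerating pattern described above for many broad fields.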

CONCLUSIONS
Coauthorship has increased universally since 1900 in terms of broad and narrow fields, but with substantial differences between broad and narrow fields. The graphs in this paper can serve as benchmarks for future studies to check against the coauthorship rate of any field, to see how it compares to the overall average for the corresponding Scopus broad field (if geometric means are used).
The fact that the average number of authors per paper has been universally and reasonably steadily increasing for 121 years with no sign of reaching a natural maximum, even in its most prevalent fields, demonstrates a fundamental change in academic research and the possibility of a continued acceleration in article author numbers for the foreseeable future. This suggests that team size growth may be a fundamental part of modern academic research, for example due to the increasingly complex nature of studies needed to investigate beyond the expanding research frontier, or extrascientific pressure from funders to collaborate, leading to larger authorship teams.
The increases in the average numbers of authors per paper suggest, but do not prove (because of differing authorship attribution cultures), that collaboration has also increased steadily. Although all academic specialties are different, with their own trends and requirements, the results suggest that research group leaders and research managers should plan on the basis that coauthorship teams will continue to get larger and that this increase will need to be supported. This support might take the form of investing in infrastructure for coauthorship, encouraging communication between different sets of researchers that may coauthor, and encouraging the formation of larger research groupings or intergroup coauthorship. These types of support may help researchers to form and operate within the increasingly large coauthorship teams that are appearing in all types of research. Managers should be sensitive to disciplinary differences and the importance of variety in research, however.
At the individual researcher level, the increases in average numbers of authors per article suggest that it may be beneficial to seek opportunities to enhance cooperation with other individuals or teams, perhaps leading to future coauthorships. It may also be useful for a researcher to look out for the advantages that coauthorships have brought to published research in their field so that beneficial types of coauthorship can be sought.