The Burden of Knowledge in Mathematics

We investigate how the potential burden of processing ever more knowledge has affected the careers and research output of researchers in mathematics over the past 64 years. We construct a panel dataset of 48,851 researchers who published in ten top-ranking journals in mathematics. For this population of researchers, we supplement the dataset with years of birth from public sources. Our results show a significant increase in the average age of researchers at their first publication in one of our top-ranking journals, in the number of references of single-author articles, and in the number of coauthors who contribute to an article. Our findings extend earlier empirical findings on patents, as well as on researchers in economics, and hint at a burden of knowledge pervading different areas of human development. Moreover, our results indicate that researchers develop strategies like the division of labor to deal with this burden.


Introduction and Related Literature
Science is an essential driver of economic growth (Solow, 1962; Ayres, 1996; Stephan, 1996) and hence, several endogenous growth models explain economic growth by the accumulation of knowledge (Romer, 1986; Lucas, 1988; Romer, 1990). The application of knowledge leads to technological progress, increases productivity, and thus also results in economic growth (Nelson and Phelps, 1966; Galor and Tsiddon, 1997). Past rates of technological and scientific progress made scientists and businessmen like Gordon E. Moore (1975), Ray Kurzweil (1990), or Brynjolfsson and McAfee (2011) believe in ever-growing technological progress. Price (1963, p. 20 f.) reports a doubling of scientific knowledge every ten to fifteen years. Following Price, several bibliometric studies over the past decades have confirmed such growth rates for scientific publications (e.g. Tague et al., 1981; Archibald and Line, 1991; Tabah, 1999; Mabe and Amin, 2001; Fernández-Cano et al., 2004; Bornmann and Mutz, 2015).
However, there are also more pessimistic reports than those mentioned above. An article in The Economist from 2013 describes how several figures like economic growth, wages, and life expectancies have stagnated in recent years. Gordon (1999) reports decreasing growth rates for the US economy, and Bloom et al. (2017) discuss several cases from business, technology, and science for which growth rates appear to stagnate while productivity rates are decreasing. Bloom et al. reason that constant growth rates are the result of respective increases in input rather than productivity. Concerning the growth of scientific results and publications, Price (1963) already mentions that it is driven by an increase of manpower, i.e. the number of researchers working in science. Related studies support similar reasoning. For instance, Evenson (1984, 1993) shows decreasing returns from research and development per inventor. Schankerman and Pakes (1986) report decreasing patent values per inventor, and Conley et al. (2013) show decreasing productivity rates for scientists in the first six years after their PhD.
The above-mentioned increasing number of scientific results and respective publications may pose a burden especially on young researchers, who need to process an increasing amount of knowledge before they reach the cutting edge in their field and contribute to it themselves. Jones (2009) follows this line of thought and investigates US patents and the respective inventors. He finds several indicators for what he calls the burden of knowledge. Inventors are increasingly old at the moment of their first patent filing. Furthermore, inventors file patents with increasingly many colleagues and are increasingly specialized in a subfield of their industry. A related study by Schweitzer and Brendel (2020) transfers Jones's measures to scientific publications in economics and reports analogous results for the respective authors. Moreover, Schweitzer and Brendel show increasing numbers of references stated in articles, which they interpret as the amount of processed preceding knowledge needed to write the respective articles.
Indeed, there are several indicators for a burden of knowledge in science. For instance, Arbesman (2011) reports increasing difficulties for some sciences to provide new results. Levin and Stephan (1991) report decreased per capita productivity for physics and geosciences as well as Gonzalez-Brambila and Veloso (2007) for biology, engineering sciences, and social sciences in Mexico.
Several studies report increased team sizes for different scientific disciplines: McDowell and Melvin (1983), Laband and Tollison (2000), Torgler and Piatti (2013), and Hamermesh (2015) for economics, Laband and Tollison (2000) for biology, Henriksen (2016) for different social sciences, and Jones (2009) as well as Singh and Fleming (2010) for patents. Except for arts and the humanities, where still 90 % of articles are solo-authored, Wuchty et al. (2007) find similar trends in the natural, engineering, and social sciences. Between 1955 and 2000 they find that the share of co-authored articles in mathematics increased from 19 to 57 %. The number of coauthors increased especially in the natural sciences, which leads Cronin (2001) to speak of hyperauthorship. For instance, Castelvecchi (2015) reports articles from physics and biology with several thousand authors. Less direct forms of collaboration are also increasingly common, as Laband and Tollison (2000) and Cronin et al. (2003) find when investigating the acknowledgements in articles.
There might be several reasons for increased collaboration and increased team sizes like better possibilities of communication and lower costs of cooperation. However, there might be other effects as well. For instance, Simonton (2013) argues that many fields within the natural sciences have grown rapidly which led to a large and complex knowledge basis and increased specialization among scientists. Hence, increasingly, scientists need to cooperate to cover the necessary knowledge for solving a problem. In their comprehensive review of top-tier publishing in economics, Torgler and Piatti (2013) describe how collaboration shifted from tightly integrated joint endeavors of thinking, in the 1980s, towards a more mechanical process of labor division that is made necessary for a more rapid output of publications in an increasingly competitive environment.
We are interested in whether there are similar trends for authors and publications in Mathematics. However, none of the above-mentioned studies on team size and references focuses on a specific moment of the authors' careers. We add to the existing literature by examining authors and their publications at the beginning of their career. Hence, following Schweitzer and Brendel (2020) who investigated the field of Economics, for authors and publications in Mathematics, we investigate the age of authors at their first article, the team size at their first article, and the number of references cited in an author's first single-author article.
There are two main reasons why we chose the example of mathematics for our analysis. First, by developing the methodological apparatus for any formal analysis, mathematics provides a necessary precondition for natural sciences, engineering, and social sciences. To a certain degree, progress in the methodological apparatus of many scientific disciplines depends on progress in mathematics. This makes mathematics uniquely constitutive for scientific progress. Second, collaboration patterns in mathematics have been established over a long history and are quite stable relative to other disciplines. Importantly, research is usually performed by single researchers or small teams. Hence, publication data is not contaminated by a trend towards large author clusters that increasingly occur in natural sciences such as physics, where large-scale laboratory experiments lead to papers with hundreds of authors whose contributions are difficult to disentangle when analyzing publication data. Scholarly articles published in journals are the main indicator of academic success, and individual mathematicians are incentivized to demonstrate their mathematical excellence early in their academic life. Furthermore, if true, the presumption that mathematicians are especially likely to attain high achievements at a young age would make this field especially prone to a potential burden of knowledge (see Torgler and Piatti (2013) for a discussion of evidence for and against this presumption).
The rest of the paper is structured as follows: in Section 2, we elaborate on the data used in this paper. In Section 3, we introduce the measures investigated and present our results. Section 4 discusses the results and concludes.

Data
We make use of the publication database JSTOR. The database provides data on scientific articles from different disciplines including mathematics. For this study, we processed the publication year, the name of the journal, the names of the authors, the title, and the number of references of articles. We supplemented this data with years of birth of the authors by manually searching public sources like Wikipedia, the authors' CVs, profiles on their home pages, or online search engines like prabook.com, intelius.com, and birthdatabase.com. Moreover, we complemented the data with the authors' gender using the R-package "R gender".1

Naturally, we had to restrict our data collection to a manageable quantity. We restricted our data set to articles published between 1950 and 2013. After omitting publications such as errata, lists of accepted manuscripts, editorial statements, and book reviews, the data includes 50,682 articles from ten top-ranked journals in mathematics (see Table 1).

Table 1 gives an overview of the publication years, the number of articles (column "Articles"), the average number of published articles per year (column "Articles p.a."), and the number of authors (column "Authors") covered. As can be seen in the table, the time periods and the number of articles covered vary between journals, also on the per-year level (column "Articles p.a."). Hence, to account for the differences between journals, we control for journal-fixed effects in our regression analyses.

Table 1: Overview of journals and publication years covered as well as the numbers of articles, articles per year, number of authors that published in the respective journal, and the number of authors for whom the year of birth is known and who had their first article within the respective journal.

Table 1 columns: Journal, Publication years, Articles, Articles p.a., Authors, Birthyears.

Overall there are 35,118 authors in the data set.2 We found years of birth for 837 of these authors who were between 25 and 45 at their first article in one of our top-ranked journals. For the sake of simplicity, throughout this text, 'first article' refers to the authors' first appearance in one of our ten top-ranked journals. To be clear, the corresponding article is not necessarily their very first publication; authors can have earlier publications that are not considered in the scope of our analysis if they appeared in other journals. Beyond the information listed above, Table 1 also depicts the number of authors for whom we found birth years and who had their first article among our ten top journals within the respective journal (column "Birthyears"), i.e. if an author for whom we have birth-year information published in more than one of our top journals, he appears only once in the column "Birthyears", namely for the first top journal he published in.

Measures and Results
Using the data available, we investigate the measures introduced by Jones (2009) for patents and transferred directly to scientific articles by Schweitzer and Brendel (2020):
- Age at first article: An author's age at his first article in one of the top-ranked journals from our data set.
- Team size: The number of authors listed for each article with at least one debut author.3
- Number of references: The number of articles referenced in an author's first single-author article. This measure serves as a proxy for the immediate knowledge an author had to process before writing his first article. Note that we cannot separate between the breadth and the depth of knowledge represented by references since we do not know if multiple references serve to cover multiple topics or deepen only one topic's understanding.
References were drawn from JSTOR. They are available for 75.3 percent of articles in our dataset.
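The three measures listed above can be illustrated with a minimal sketch. The record layout, field names, author names, and values below are hypothetical toy data for illustration only; they are not the JSTOR schema and not the procedure actually used in this study.

```python
# Hypothetical toy records; field names are illustrative assumptions.
articles = [
    {"id": "a1", "year": 1990, "authors": ["Ada", "Ben"], "n_refs": 12},
    {"id": "a2", "year": 1993, "authors": ["Ada"], "n_refs": 20},
    {"id": "a3", "year": 1995, "authors": ["Ben", "Cleo", "Dan"], "n_refs": 30},
    {"id": "a4", "year": 1996, "authors": ["Cleo"], "n_refs": None},  # references unavailable
]
birth_years = {"Ada": 1962, "Cleo": 1965}  # known only for some authors


def compute_measures(articles, birth_years):
    """Return (age at first article, team size of debut articles,
    references in first single-author article)."""
    ordered = sorted(articles, key=lambda a: a["year"])
    first_article = {}  # author -> first article within the sample
    first_solo = {}     # author -> first single-author article
    for art in ordered:
        for author in art["authors"]:
            first_article.setdefault(author, art)
            if len(art["authors"]) == 1:
                first_solo.setdefault(author, art)

    # Age at first article, only for authors with a known birth year.
    age_at_first = {
        author: first_article[author]["year"] - born
        for author, born in birth_years.items()
        if author in first_article
    }
    # Team size of every article that has at least one debut author.
    debut_ids = {art["id"] for art in first_article.values()}
    team_size = {
        art["id"]: len(art["authors"]) for art in ordered if art["id"] in debut_ids
    }
    # Number of references, skipping articles without reference data.
    references = {
        author: art["n_refs"]
        for author, art in first_solo.items()
        if art["n_refs"] is not None
    }
    return age_at_first, team_size, references


age_at_first, team_size, references = compute_measures(articles, birth_years)
```

In this toy example, articles a1 and a3 are debut articles, so only their team sizes are recorded, and the single-author article without reference data is dropped from the reference measure, mirroring the incomplete reference coverage noted above.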
In our regression analyses (see Tables 2-4), we included several control variables. "Female" accounts for differences in authors' gender. The binary variable is one for female authors and zero for others. In addition to being used as an output variable to measure the burden of knowledge (see above and Table 4), "Team size" is also used as a control variable in Tables 2 and 3 to account for the number of authors listed on a given paper. The term "Journal" indicates whether we included fixed effects for the different journals in a given model ("YES" if we did). The terms "Start" and "N" indicate the first publication year included in the regression analysis for a given model and the number of observations, respectively.

Significance codes: *** p < 0.001; ** p < 0.01; * p < 0.05; + p < 0.1

2 Authors were identified heuristically based on the names listed for each article using an algorithm for author disambiguation (Schweitzer, 2017).
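To illustrate what controlling for journal fixed effects does to a time trend, the following minimal sketch estimates a single slope using the within transformation (demeaning by journal). It is a simplification under stated assumptions: the paper's models include further regressors (e.g. "Female" and "Team size"), whereas this sketch uses publication year only, and the journal labels and numbers are invented toy data.

```python
from collections import defaultdict


def within_slope(rows):
    """Slope of outcome on year with group fixed effects.

    rows: iterable of (group, year, outcome). Each variable is demeaned
    within its group, then the demeaned data are pooled for the slope.
    """
    groups = defaultdict(list)
    for group, x, y in rows:
        groups[group].append((x, y))
    sxy = 0.0
    sxx = 0.0
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx


# Invented toy data: two hypothetical journals covering different periods,
# each with the same within-journal trend but a different level.
rows = [
    ("Journal A", 1950, 2.0), ("Journal A", 1960, 7.0), ("Journal A", 1970, 12.0),
    ("Journal B", 1990, 1.0), ("Journal B", 2000, 6.0), ("Journal B", 2010, 11.0),
]

fe_slope = within_slope(rows)
# Ignoring journals is the same computation with a single group.
pooled_slope = within_slope([("all", x, y) for _, x, y in rows])
```

With these toy numbers the within-journal slope is 0.5 per year while the pooled slope is only 0.05, which shows why differences in level and coverage period between journals can distort a pooled time trend.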
3 The author of an article is a debut author if the article is his first article in one of our journals.

Figures 1-3 depict some of our key findings. As shown in Figure 1, the average age of an author at his first article in one of our top-ranked journals has steadily increased from 28.3 years to 33.3 years over the course of the investigated 64-year period. This increase in the average age at an author's first article also holds for shorter observation periods starting in the 1960s or 1970s. It is more pronounced in the first decades of the observation period and flattens out thereafter. As shown in Table A.8 in the appendix, the positive coefficient of age remains in the second half of the observation period but is not significant, and neither are the other explanatory variables.

[Figure 1 axes: publication year (x), age at first article (y)]

We also find an increasing number of references stated in the articles (Figure 2) and an increased team size (Figure 3). The average number of references at an author's first single-author article increased from 4.4 in 1950 to 25.9 in 2013, while the average team size for debut articles increased from 1.3 to 2.8 over the same period of time. The spike in the average number of references in the late 1970s is driven by higher means for the SIAM Review in these years. While the mean number of references in the SIAM Review in 1977 is 8.7, it increases to 72.7 in 1978. Figure A.4 in the appendix depicts the differences in the number of references between journals that deliberately try to encourage an integrative up-to-date perspective of topics (SIAM) and the remaining journals. One can see that both curves share a common baseline, with the SIAM journals having two spikes in the late 1970s and late 1990s that are both driven by similar spikes for the SIAM Review (Figure A.4). The observed overall trends for the number of references and team size also show for most individual journals (Figures A.6 and A.7 in the appendix).
For the age at first publication, we observe two notable exceptions (Figure A.5 in the appendix). For the SIAM Journal on Numerical Analysis, we observe a decline in the average age at first publication. Also, for the SIAM Review, we observe a decline that is mainly driven by an outlier at the end of the observation period in 2009. The reason for the latter finding is that we found the lowest number of birth years for the SIAM Review (birth years for 16 authors, Table 1). Hence, this trend is highly volatile and prone to outliers.

[Figure 2 axes: publication year (x), average number of references (y)]

Table 2 depicts the results for the age at first article for different linear regression models under several controls. In all regression models, time effects are positive and highly significant. The effects of team size at an author's first article are also positive and highly significant. Hence, the bigger the team an author wrote his first article with, the older the author at his first article. Controls for female authors are not significant.

Table 3 depicts the regression results for the number of references. As one can see, we obtain a highly significant result for the time trend and a quite pronounced effect of female authors citing fewer references in their first single-author articles. Overall, the mean number of references stated in an article increases by approximately one every four years.

Finally, Table 4 depicts our results for the average team size at the authors' debut articles. Again, we see highly significant time effects in all models.

4 Since the number of references and team size are count variables, regressions were also computed using negative binomial regression. However, results remain the same, except for the concrete coefficients, under the alternative regression models.
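As a rough plausibility check on the magnitudes reported in this section, the endpoint averages for 1950 and 2013 can be converted into average annual changes. This is only a back-of-the-envelope sketch: it uses two endpoint values rather than the full panel, so it need not coincide with the regression coefficients, which are estimated under controls. The function name is a hypothetical helper.

```python
def average_annual_change(start_value, end_value, start_year, end_year):
    """Average change per year between two endpoint observations."""
    return (end_value - start_value) / (end_year - start_year)


# Endpoint averages as reported in the text for 1950 and 2013.
refs_per_year = average_annual_change(4.4, 25.9, 1950, 2013)
team_per_year = average_annual_change(1.3, 2.8, 1950, 2013)

print(f"references: {refs_per_year:.3f} per year")
print(f"team size:  {team_per_year:.4f} per year")
```

The endpoint-based rate for references is about one additional reference every three years, somewhat faster than the regression-based figure of about one every four years; a gap of this kind is to be expected when controls and year-to-year fluctuations such as the SIAM Review spikes are taken into account.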

Discussion and Conclusion
Our results concerning authors' increasing age at their first publication are in line with the results of Schweitzer and Brendel (2020) and the results on inventors' age at their first patent filing reported by Jones (2009). These results indicate that it takes increasing time to come up with scientific results and to get them published. Likewise, our results on the number of references and articles' team sizes are in line with existing studies in different scientific fields. However, our paper is the first to focus on the beginning of a researcher's career, i.e. on their first publication, for the field of mathematics. Hence, in our line of reasoning, the increased number of references indicates that researchers process an increasing amount of literature and further information before they reach the cutting edge of their field and contribute to it themselves in the form of their first article. Finally, our results on team sizes indicate that even young researchers who are beginning their careers are increasingly looking for opportunities for cooperation and for a division of the work that comes with the writing and publication of an article. Overall, our results might be further evidence of what Jones (2009) called a burden of knowledge: with every article or book published in a given field, the amount of knowledge that must be processed before one can come up with something new increases.
Besides an increasing amount of knowledge, there might be other factors contributing to an increasingly high number of references. One example is lower transaction costs of searching and referencing literature. Given today's search engines and levels of digitization, existing papers are easier to find, specific software tools help manage and structure literature, and including a new reference has become a task of a few clicks. The beneficial effects of new information technologies might also be a reason why in our data the increase of authors' age at their first publication appears less pronounced in the latter half of our observation period. For the future, however, it is unclear if further improvements in information technology will be able to keep up with an increasing burden of knowledge.
Changing research norms in the past decades are another possible factor that could contribute to the observed increase in references. A fear of the allegation of plagiarism and search for confirmation of one's arguments might increasingly incentivize authors to cite existing literature. Also, existing measures of research success rely on the number of citations (e.g. Hirsch, 2005; Garfield, 2006) and incentivize authors to cite their own research and to be cited by colleagues and vice versa (e.g. Bethard and Jurafsky, 2010; Lin and Huang, 2012; Teodorescu and Andrei, 2014). Despite these downsides, numerous other studies such as Bornmann and Mutz (2015) or Cordero et al. (2016) rely on the analysis of references.
Naturally, our study comes with limitations. One of these is the limited number of top-ranked journals investigated. Because of this limitation, we cannot distinguish between effects of increased competition within top-ranked journals and increased times of education and research. However, previous studies indicate that not all of the effect can be attributed to increased competition. For example, Jones (2009) observes increasing age at inventors' first invention even though patenting is less prone to increased competition, as there are no equivalents to top-journals with space restrictions. Moreover, in a related study reporting age at first publication in economics research, Schweitzer and Brendel (2020) split their data into two samples: a broad sample with all journals and articles available on EconLit and a narrow one with only nine top journals in economics. They find that the effects, especially concerning the age at first publication, are more pronounced for the very competitive journals, however, the trends are similarly increasing also for the wider sample of journals. Finally, Ellison (2002) estimates that only three to four months of the reported twelve to eighteen months increase of publishing times can be attributed to increased competition. Notwithstanding these studies, a follow-up study of the present work could build on a broader dataset of journals and author data (i.e. more journals overall and including lower-ranked journals) to shed more light on the factor of increased competition.
If a more comprehensive dataset on author data were available, it would also be a very fruitful endeavor for future research to include additional personal and institutional controls in the regression analyses. For instance, the age of an author at his first article might depend on his educational background (e.g. years in school), although the number of years in school might already be a manifestation of an increased amount of knowledge that students need to process. At the same time, the institutions themselves might have affected the time it took until the first publication. For instance, one might control for the institutions' reputation (e.g. rankings) or for whether an institution is academic (i.e. university, research center) or non-academic (e.g. bank, industrial company).
Moreover, it would be interesting to investigate the composition of research teams of published articles in more detail. One can think of many different aspects that might be worth looking at. Among these are differences in gender and age of the individual authors, as well as differences in their fields of expertise. For instance, one could investigate how a team's homogeneity concerning the individual authors' research fields affects publication outcomes, the size of the team, and the number of references cited. Another factor that might influence the moment a young researcher publishes his first article might be whether he publishes together with his supervisor or in another constellation with other coauthors.
While our findings are limited to academic publishing in top-ranked mathematics journals, together with related results from previous studies such as Jones (2009) for patents and Schweitzer and Brendel (2020) for publications in economics, they hint at a more general mechanism that could eventually slow down human progress. As stated before, scientific knowledge and progress are essential for economic growth and hence, a burden of knowledge might eventually also affect commercial applications and have various policy implications. Some articles report stagnation for various indicators such as economic growth or real hourly earnings in manufacturing and a decrease in productivity (Economist, 2013; Bloom et al., 2017).
A burden of knowledge in mathematics, in particular, might have policy implications with respect to the career trajectories of early career researchers in the field. In the presence of a burden of knowledge, we expect that hopes of early discoveries will increasingly be disappointed. Thus, career incentives and expectations need to be managed. An appropriate policy can motivate researchers to tackle the burden. For example, alternative forms of academic performance such as the creation of review studies could be rewarded when evaluating young researchers. Also, the returns on improvements in information and learning technologies that could alleviate the burden of knowledge can be high.