Exploring the role of interdisciplinarity in physics: Success, talent and luck

Although interdisciplinarity is often touted as a necessity for modern research, the evidence on the relative impact of sectorial versus interdisciplinary science is qualitative at best. In this paper we leverage the bibliographic data set of the American Physical Society (APS) to quantify the role of interdisciplinarity in physics, and that of talent and luck in achieving success in scientific careers. We analyze a period of 30 years (1980-2009), tagging papers and their authors by means of the Physics and Astronomy Classification Scheme (PACS), to show that some degree of interdisciplinarity is quite helpful in reaching success, using either the number of articles or the citation score as a proxy. We also propose an agent-based model of the publication-reputation-citation dynamics which reproduces the trends observed in the APS data set. On the one hand, the results highlight the crucial role of randomness and serendipity in real scientific research; on the other, they shed light on a counterintuitive effect: the most talented authors are not necessarily the most successful ones.

on a sample of data from the APS data set by minimizing the ratio of false negatives (the same author/affiliation treated as two different ones) and false positives (different authors/affiliations treated as the same one).

Due to the large number of authors and affiliations, we ran into a computational bottleneck: performing all possible pairwise comparisons takes quadratic time. To make the cleaning step feasible, we combined the similarity computation with Locality Sensitive Hashing (LSH) [3]. LSH is a hashing-based technique that quickly identifies similar pairs of objects without comparing them directly. Using this technique we were able to reduce the computational effort from quadratic to linear. All the code was developed in PHP and the data, once cleaned, were stored in the relational database MySQL (v. 5.1). Further manipulation and analysis of the cleaned data were done in R.

The three different interdisciplinarity levels are represented with different colors: red (level 1), green (level 2) and blue (level 3). The bar for $I^{APS}$ between j and j + 1 represents the number of researchers with $I^{APS} \in ]j, j + 1]$. In particular, the first two bars contain only researchers with $I^{APS} = 1$ and $I^{APS} = 2$, respectively.

assigned to it and, according to our choice, we assign different PACS codes to a paper only if these codes differ in the first digit; otherwise, we merge them into a single code. In this way we assign to each paper a number of PACS codes equal to the number of different broad (less specific) areas related to it. It follows that only PACS-classified papers are considered.
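As an illustration of the LSH-based cleaning step described above, the following Python sketch finds candidate duplicate author names using MinHash signatures and banding. It is not the authors' PHP implementation: the shingle length, the number of hash functions and bands, the function names and the sample names are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(name, k=3):
    """Character k-grams of a lower-cased, whitespace-normalised name."""
    s = " ".join(name.lower().split())
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, salted by the seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens, num_hashes=32):
    """MinHash signature: the minimum hash value under each salted hash."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def candidate_pairs(names, num_hashes=32, bands=16):
    """Index pairs (i, j), i < j, whose signatures agree on at least one band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for idx, name in enumerate(names):
        sig = minhash(shingles(name), num_hashes)
        for b in range(bands):
            # Names whose signatures match on a whole band share a bucket.
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity, used to confirm a candidate pair."""
    return len(a & b) / len(a | b)


# Hypothetical sample: entries 0 and 1 differ only in case and spacing.
names = ["A. Rossi", "a.  ROSSI", "Zhang Wei"]
pairs = candidate_pairs(names)
```

Only the pairs that end up in a common bucket need an explicit similarity check (for instance with `jaccard`), which is what avoids comparing every name with every other name.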

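The first-digit rule for PACS codes can be sketched in a few lines of Python. The function name and the sample codes are hypothetical; the only assumption about the data is that each PACS code is a string whose first character is the digit of its broad area.

```python
def broad_areas(pacs_codes):
    """Distinct first digits of a paper's PACS codes, in sorted order."""
    digits = {code.strip()[0] for code in pacs_codes if code.strip()}
    # Keep only genuine digits, so malformed codes are ignored.
    return sorted(d for d in digits if d.isdigit())


# Hypothetical paper carrying three PACS codes, two of which share the first digit.
paper_codes = ["05.45.-a", "05.70.Ln", "89.75.Hc"]
areas = broad_areas(paper_codes)  # the two codes starting with 0 collapse into one area
```

The length of the returned list is the number of broad areas assigned to the paper under this rule.
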
Having at our disposal the PACS-coded areas of all the papers, we can use them to define an index that quantifies the variety of disciplines (areas) covered by the scientific production of any researcher. This variety is two-fold: a researcher may explore many different areas one at a time, i.e. publishing in many different PACS codes through papers that each carry a single code; or she may explore few different areas but jointly, i.e. publishing papers that carry several codes together. In other words, a researcher's production can be interdisciplinary either because of the total number of areas it covers, or because of the average number of areas jointly covered by one of its typical papers. As will become evident, apart from an obvious constraint, these two degrees of interdisciplinarity are independent of each other. This observation led us to define an interdisciplinary index $I^{APS}_k$ for the researcher $A_k$ as

The 7303 researchers on which we conducted our analysis are those remaining after a filtering procedure designed to study the researchers' careers appropriately over a period of thirty years, from 01/01/1980 to 31/12/2009.

The first requirement of the filtering is that a researcher must have published her first paper in the period from 01/01/1975 to 31/12/1985 (see the left panel of Fig. S2). This ensures that all the researchers in the set started their careers within a fairly short period, so that a premature end of a researcher's publication activity is unlikely to be due to her age. In this way, unless one started publishing at an old age, which is a rather remote possibility, all the researchers in the set have comparable ages. Moreover, the PACS classification was introduced in 1975, which allows us to refer only to papers published from that year onwards. The second requirement is that a researcher must have published a minimum number of (PACS-classified) papers, which we chose to be 3. The third and last requirement is related to the way in which the raw APS database at our disposal was cleaned (explained extensively in the specific section).

Table S1: Statistical indicators of the 89949 published papers over the three defined classes of interdisciplinarity. A paper is counted in more than one class if it is coauthored by researchers belonging to different classes, so the sum of the reported numbers of papers exceeds 89949. A positive correlation between scientific production and interdisciplinarity level is found: the number of papers per researcher (PpA = papers/authors) increases quite strongly as the interdisciplinarity level grows.
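The first two filtering requirements can be written as a short predicate, sketched here in Python. The function name, the constant names and the input format (a list of publication dates of one researcher's PACS-classified papers) are assumptions; the third requirement depends on the cleaning procedure and is not modelled.

```python
from datetime import date

# Thresholds taken from the text; the names are hypothetical.
FIRST_PAPER_FROM = date(1975, 1, 1)
FIRST_PAPER_TO = date(1985, 12, 31)
MIN_PAPERS = 3


def passes_filter(paper_dates):
    """True if the first paper falls in 1975-1985 and there are at least 3 papers."""
    if len(paper_dates) < MIN_PAPERS:
        return False
    return FIRST_PAPER_FROM <= min(paper_dates) <= FIRST_PAPER_TO


# Hypothetical career: first paper in 1978, three papers overall.
career = [date(1978, 5, 1), date(1990, 1, 1), date(2001, 3, 3)]
keep = passes_filter(career)
```

A researcher whose first paper appeared in 1986, or who has only two PACS-classified papers, would be dropped by this predicate.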
the right panel of Fig. S3, where the cumulative number of papers is reported as a function of time. The positive correlation between scientific production and interdisciplinarity level is confirmed in Table S1.

(Fig. S4, left panel). A positive correlation between scientific impact, in terms of citations received, and interdisciplinarity level is found: the number of citations per author (CpA = citations/authors) rises as the interdisciplinarity level increases.

Comparing the number of papers per author (PpA) with the (real) average number of papers per author (avg. PpA), we also find a stronger presence of coauthoring in the level 1 and level 2 classes than in the level 3 class. This is mainly due to the fact that a lower percentage of researchers in the level 3 class participated in large scientific collaborations, compared with the other two classes.
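The reason why the gap between the two indicators signals coauthoring can be seen in a small Python sketch. It assumes that PpA divides the number of distinct papers of a class by its number of authors, while avg. PpA averages each author's own paper count; the function name and the toy data are hypothetical.

```python
def ppa_indicators(authorship):
    """Return (PpA, avg. PpA) for a class given as {author: set of paper ids}."""
    n_authors = len(authorship)
    if n_authors == 0:
        raise ValueError("the class has no authors")
    # PpA: each paper is counted once, however many class members signed it.
    distinct_papers = set().union(*authorship.values())
    ppa = len(distinct_papers) / n_authors
    # avg. PpA: a paper is counted once for every class member who signed it.
    avg_ppa = sum(len(papers) for papers in authorship.values()) / n_authors
    return ppa, avg_ppa


# Hypothetical class of three authors; paper "p2" is coauthored by all of them.
toy_class = {"A": {"p1", "p2"}, "B": {"p2", "p3"}, "C": {"p2"}}
ppa, avg_ppa = ppa_indicators(toy_class)
```

Under these assumptions the ratio avg. PpA / PpA equals the mean number of same-class authors per paper, so it stays at 1 when no paper is shared within the class and grows with coauthoring.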

Looking closely at their production, one finds that all of them did research in the areas of particle [...] and groups working in other areas. As shown by the composition of the three interdisciplinarity classes in terms of the ten PACS-coded areas (see Table S2), most of the researchers in our set who are involved in these large collaborations belong to the level 2 class, which explains the heavier tail found for this class compared with those found for the other two classes (Fig. S3).

One easily notes that these indicators clearly underestimate the real productivity of the researchers, but it must be kept in mind that they refer only to (PACS-coded) publications in APS journals and that the actual number of researchers decreased over the thirty years, as shown in Fig. S2.

shown for the production of papers, we found a positive correlation between scientific impact, in terms of citations