The endless frontier? The recent increase of R&D productivity in pharmaceuticals

Studies on the early 2000s documented increasing attrition rates and duration of clinical trials, leading to a representation of a “productivity crisis” in pharmaceutical research and development (R&D). In this paper, we produce a new set of analyses for the last decade and report a recent increase of R&D productivity within the industry. We use an extensive data set on the development history of more than 50,000 projects between 1990 and 2017, which we integrate with data on sales, patents, and anagraphical information on each institution involved. We devise an indicator to quantify the novelty of each project, based on its set of mechanisms of action. First, we investigate how R&D projects are allocated across therapeutic areas and find a polarization towards high uncertainty/high potential reward indications, with a strong focus on oncology. Second, we find that attrition rates have been decreasing at all stages of clinical research in recent years. In parallel, for each phase, we observe a significant reduction of time required to identify projects to be discontinued. Moreover, our analysis shows that more recent successful R&D projects are increasingly based on novel mechanisms of action and target novel indications, which are characterized by relatively small patient populations. Third, we find that the number of R&D projects on advanced therapies is also growing. Finally, we investigate the relative contribution to productivity variations of different types of institutions along the drug development process, with a specific focus on the distinction between the roles of Originators and Developers of R&D projects. We document that in the last decade Originator–Developer collaborations in which biotech companies act as Developers have been growing in importance. Moreover, we show that biotechnology companies have reached levels of productivity in project development that are equivalent to those of large pharmaceutical companies. Our study reports on the state of R&D productivity in the bio-pharmaceutical industry, finding several signals of an improving performance, with R&D projects becoming more targeted and novel in terms of indications and mechanisms of action.


Background
It is no coincidence that Vannevar Bush devoted the first chapter of his The Endless Frontier [1] to "the war against disease", as the life sciences and pharmaceuticals are a key area for the long term evolution of the relationships between science, innovation, economic growth and society.
Notwithstanding the persistent contribution of scientific research to pharmaceutical R&D [2][3][4], in the early 2000s many concerns were raised on the ongoing process of drug development, which culminated in a diffuse perception of a "productivity crisis" [5,6]. Data showed a progressive increase of attrition rates at all stages of drug

Open Access
Journal of Translational Medicine *Correspondence: fabio.pammolli@polimi.it 1 Dipartimento di Ingegneria Gestionale, Politecnico di Milano, Via R. Lambruschini, 20156 Milano, Italy Full list of author information is available at the end of the article Page 2 of 14 Pammolli et al. J Transl Med (2020) 18:162 development, together with a significant increase of the time needed for the completion of clinical trials [5,7]. Several hypotheses were introduced to explain these trends, including a gestation lag associated with the fundamental transformations of scientific knowledge bases following the "omics" revolution [5,8,9]. Recently, signals have started to emerge of a change of tendency: (i) the number of New Therapeutic Entities (NTE) approved by year has increased regularly [10,11]; (ii) research in oncology has benefited from the introduction of biomarkers for the targeting of therapies [12,13]; (iii) several innovations are shaping the process of pharmaceutical R&D, from artificial intelligence to 3D printing for drug design and production [14,15]. In parallel, pharmaceutical companies have been rethinking the entire R&D process, implementing novel organizational solutions [16] and devoting great efforts to the early detection of nonviable drug candidates [17]. Finally, the recent upsurge of advanced therapies (e.g. CAR-T cell therapies) has been interpreted as a sign of a gestation lag of further major breakthroughs coming to an end [12,15,18].
Concurrently, regulatory agencies such as the US Food and Drug Administration (FDA) have worked to accelerate the drug approval process. Requests for Breakthrough Therapy Designation [19], conceived to speed up approval for drugs that exhibit outstanding performances in preclinical research, have been increasing steadily passing from an average approval rate of 33% in the first years of application (2013)(2014)(2015) to 44% in more recent years (2016)(2017)(2018).
In this paper, we ensure comparability of results with Pammolli et al., 2011 [5] and provide an updated and accurate picture of the current state of pharmaceutical R&D, using data on drug pipelines up to 2017. Our measures of productivity refer to the R&D process (e.g. attrition rates, phase durations), rather than to R&D expenditures [20][21][22]. This allows us to focus on a comprehensive data set of more than 50,000 R&D projects, whose processes have been registered with time and space signatures. Information on drug pipelines is integrated with links to an enriched patent database and to sales figures for marketed compounds. Moreover, we provide a broad classification of the indications associated with each project ("chronic", "lethal", "multifactorial", "rare") and identify the type of each institution (i.e. pharmaceutical and biotechnological companies, non-industrial institutions) involved in the R&D process either as an Originator or as a Developer of each project. Finally, we introduce two measures of novelty, respectively for project indications and mechanisms of action.
We identify the therapeutic areas that have attracted a stronger effort, and we are able to ascribe the observed changes in attrition rates to institution types, and to different configurations of Originator-Developer collaborations [23].

Data
Data on R&D projects has been collected from R&D Focus, a comprehensive proprietary database on pharmaceutical R&D pipelines. Data on R&D projects has been complemented establishing specific matchings with sales figures from IMS/IQVIA, and with patent data from Regpat and USPTO.
R&D Focus contains information about over 43,000 medical compounds developed until September 2018, both successful and failed. For each compound, a number of details are available. In particular, in this paper we use the following pieces of information: • ATC codes, classifying compounds into groups on the basis of the organ on which they act and their therapeutic and chemical properties; it is a hierarchical classification envisaging five levels. We refer to the first three classification levels, defined as follows: • Indications, i.e., the diseases for which the compound is/will be used. To ensure compatibility with previous studies [5], each indication has been classified as rare/not rare, lethal/not lethal, chronic/not chronic, multifactorial/not multifactorial. 3 We underline that this classification is preliminary, but can anyway be profitably employed to take into account the effects of disease types on other variables, as we will see below. Extending this classification with further categories would be an interesting future work. Pammolli et al. J Transl Med (2020) 18:162 • Mechanisms of action, representing the biochemical interactions and pathways through which the drug produces its effects. • Institutions that have participated to the R&D activity, distinguishing Originators (patent owners) and Developers (institutions developing the compound). • Codes of the patents related to the compound. • Commercial summary, which is a description of facts and events related to the compound in natural language.
Each pair (compound, indication), which defines an R&D project, is connected with information that reconstructs its development history. The development history is the sequence of development phases that the compound has undergone until its marketing or failure for any given indication. Phases are Preclinical, Phase I, Phase II, Phase III, Registration, Marketed. Each phase is associated with date and country of reference. Only projects started in USA, EU or Japan have been taken into account. We end up with a database covering the history of 50,150 R&D projects. IMS/IQVIA data record sales in Euros of all marketed pharmaceutical products from 2002 to 2016, in 35 countries. The database contains 202,651 products corresponding to 48,402 distinct compounds. The compound names of marketed products have been linked to the R&D compounds via text matching. In our R&D dataset we cover 2333 marketed compounds, and we have been able to connect 2123 of them with sales entries (91.0%). Globally, we identify 3584 projects, i.e., pairs (compound, indication), associated with sales figures.
We link the compounds listed in our R&D dataset with both USPTO and Regpat (EPO and PCT 4 patents); we establish a correspondence between the compounds that we list in the R&D dataset and, respectively, 2917 USPTO patents, 3441 EPO patents and 2419 PCT patents. For 14,263 R&D projects, i.e., pairs (compound, indication), we establish an association with a patent and its assignee(s), that is, its owning institution(s). Each institution name is then disambiguated and matched to a specific institutional type. In particular, we classify each patent assignee and each developing institution according to six categories: three industrial categories (pharmaceutical, biotech and other industrial) and three non-industrial categories (university, hospital and other research centers). For 84.6% of the R&D patents we are able to classify the assignees, while for 87.7% of the R&D compounds we are able to classify all the institutions involved in the R&D process.
Additional file 1: Fig. S1, proposes a flow chart summarizing the criteria and the steps leading to the construction of the datasets employed in the experiments (R&D projects, R&D projects associated with patents, R&D projects associated with sales).

Data processing Attrition rate
The attrition rate for a given development phase in a given year is defined as the percentage of R&D projects that entered the focal phase in that year and passed to the subsequent phase within 4 years (accordingly, the maximum possible starting year in our data is 2013). If information on the subsequent phase is missing but a more advanced one is recorded, the transition is deemed accomplished without imposing time constraints.

Phase duration
The duration of a given development phase in a year is defined as the median time required to the R&D projects that entered the focal phase in the given year to pass to the subsequent phase. The median is computed considering transitions with duration lower than or equal to 4 years, to make a sound comparison across decades.

Probability of success
The probability of success of projects developed within a given ATC3 is measured by the number of projects that reach the market over the total number of projects in that ATC3. Projects started after 2013 are not taken into consideration.

Novelty measure
Various kinds of novelty measures have been used in the scope of drug development [24]. In this paper we introduce a more comprehensive measure and we apply it to assess, for each project, the degree of novelty of both indications and mechanisms of action: where n ind/moa,<t is the number of times an indication/ mechanism of action listed in project i has appeared in previous projects, while N p,<t is the total number of previous projects. We select min(n ind/moa,<t ) to identify the "newest" mechanism of action amongst the ones related to the focal project.

Statistical techniques Statistical tests
To assess the significance of a change in productivity measures in two different time spans, we use Wilcoxon test [25], which is a nonparametric test to detect differences in the medians of two distributions. To perform such a test on attrition rates, we compare the distributions of phase transitions, treated as a binary variable indicating failure/success for each phase occurrence, in the two time spans.

Changepoint detection
Changepoint detection [26] identifies the time instants (changepoints) corresponding to abrupt changes in a function. Identifying the changepoints divides the function into sections. In particular, we split the attrition rate in correspondence of the years where the regression line changes the most. This is obtained by finding the sections of the function such that the sum of the residual errors of the regressions in each section is minimized. Note that adding more changepoints keeps reducing the value of the residual error, leading to overfitting. To avoid this problem, the error metric needs to envisage also a term penalizing high numbers of changepoints. Let x 1 , . . . , x n be the points of the function that we are studying, and let f p,q be the regression line approximating the function between the time instants p and q ( p < q ). The changepoint detection procedure finds the K time instants k 1 , . . . , k K minimizing the following metric: where in this formula k 0 represents time instant 1 and k K +1 represents the last time instant (n). The internal summation describes the residual error of the regression between the time instants k r and (k r+1 − 1) . The term βK , where β is a parameter to be set, penalizes the addition of new changepoints. It can be easily shown that a new changepoint is rejected if it does not bring an improvement to the residual error of the regression at least equal to β . In this work the threshold β has been set to twice the variance of the function, meaning that we stop adding changepoints when the subsequent new one would increase the R 2 determination coefficient of the regression of less than 2/n.

Regression with dummy variables
We model a set of response variables in a regression framework: • Phase-by-phase transition: binary variable identifying the successful passage from the focal phase to the next one; • Sales: logarithm of the sum of sales of the focal drug.
Our main explanatory variable is the binary variable identifying the type of Originator-Developer (O-D) relationship under study. For each project, we define the Originator(s) according to the assignee(s) of the related patent(s), and the Developers according to the developing institution listed at each stage of the R&D process. We define O-D relationships according to the presence of at least 1 assignee or developer in one of the different institutional types. We treat the "university", "hospital" and "other research" classifications as "non-industrial". Also, we define as the baseline O-D relationship the one that has a pharmaceutical company acting both as Originator and Developer. Then In addition, we use a few dummy variables to control for fixed effects characterizing the focal project: the starting year, the indication and the classification of the indication. Please notice that the four indication classes that we have indicated above (i.e. "lethal", "chronic", "rare", "multifactorial") are overlapping, and therefore multiple fixed effects related to the indication type may be relevant for a given R&D project.
In synthesis, the regression model for the generic response variable X can be written as: where OD is the binary variable classifying each project by either a relevant project according to the O-D relationship under study, or a baseline project (pharmaceutical as originator and developer both).

The evolution of R&D productivity in pharmaceuticals
We identify an R&D project as a specific indicationcompound association, and select projects started in either the US, Europe or Japan since 1990. We first focus on phase-by-phase attrition rates (Fig. 1a). At each stage of development, we define a success when we observe a transition to the next stage within 4 years, or, in case of missing data, to any other subsequent phase, without time constraint. As a consequence, in our analysis of (3) attrition rates we study projects which entered any phase of development by 2013, while we consider data until 2017 to detect phase transitions. We use changepoint detection analysis [26] to pinpoint the most relevant shifts in regression slopes in the data. We have found that attrition rates in clinical phases have been declining in recent years, though they have remained above the values observed in 1990-1999. We also observe a reduction of attrition rates in preclinical research. To portray a comprehensive picture of recent trends, we show in Fig show that the observed decreases are statistically significant for all phases, except for Phase III (see "Methods" for details). Attrition rates in late-stages clinical trials (i.e. Phase II and III) remain quite high (Fig. 1a, b). Market launches (i.e. projects that are registered by a regulatory agency and marketed, see the Registration panel in Fig. 1a) have increased steadily. As a first attempt to identify drivers of decreasing attrition rates, we compute the relative performance of R&D projects targeting different therapeutic areas. To this end, we classify the projects according to their corresponding a b Fig. 1 a Time evolution of attrition rates at different stages of drug development. Black circles: data; red solid lines: linear regression in the corresponding time window; blue vertical solid line: changepoint. In a given year, the attrition rate for each development phase is defined as the percentage of projects that started the phase in that year and failed to pass to the subsequent phase within 4 years (accordingly, 2013 is the last year we do consider). b Average (± standard deviation) yearly phase-by-phase attrition rates in three different time intervals (1990-1999, 2000-2009, 2010-2013). We also report the significance level of a Wilcoxon rank sum test [25]  first-level ATC class. Additional file 1: Table S2 reports the values involved in this computation (phase-byphase attrition rates, project share, significance level of the observed changes). Results provide a few relevant insights. First, at early stages of development (Preclinical and Phase I) attrition rates have been decreasing quite ubiquitously, but significant reductions (Wilcoxon test [25] at 0.05 significance level) are to be ascribed to cancer research (ATC class L: Antineoplastic and Immunomodulating Agents). Considering its share in the data, this result makes oncological research of utmost importance in the overall attrition rate reduction that we do observe for early stages of pharmaceutical R&D. Then, at later stages of the drug development process, significant reductions occur in class J (General Anti-Infectives Systemic), which also shows a statistically significant improvement in Phase III, in class B (Blood and Blood Forming Organs), in class C (Cardiovascular System) and in class P (Parasitology). When we move to consider ATC classes that provided a negative contribution to the decrease of phase-by-phase attrition rates, only a few of the observed results are statistically significant, with the notable exception of increase of Phase III attrition rates for class N (Nervous System), confirming the difficulty of research in mental/brain diseases [27].
In order to get some further insights on the observed results, we focus on specific subsets of R&D projects: two important sets of biologics, i.e. advanced therapeutics (cell and gene therapies) and monoclonal antibodies, and the R&D projects related to Alzheimer's disease. Biologics in fact are experiencing a remarkable growth in sales: according to EvaluatePharma [28], in 2020 sales of biological compounds are expected to increase by 50 billion USD. Finally, R&D projects on Alzheimer's disease represent a large class of neurological R&D projects [29]. In Additional file 1: Table S3, we show attrition rates in 2000-2009 and 2010-2013 for advanced therapies and monoclonal antibodies, while in Additional file 1: Table S4 we show attrition rates for Alzheimer's disease for the same periods, including also a focus on the projects connected to the amyloid hypothesis, 5 which we were able to identify in our data. As per advanced therapies, the significant decrease of attrition rates in the early phases of development is confirmed, with a very remarkable reduction for Phase I; please notice that the development of these therapies has been growing in recent years and so we do not observe any project passing Phase III until 2013, while our data contain eight projects reaching the market in 2014-2017. For monoclonal antibodies we record a significant decrease of attrition rate for the Preclinical phase. Regarding Alzheimer's disease, Additional file 1: Table S4 highlights the absence of significant improvement in attrition rates; in particular, our data do not record any R&D projects passing Phase III. The further focus on the projects related to the amyloid hypothesis, accounting for about 50% of the Alzheimer's projects, shows a similar pattern.
We now analyze the duration of R&D activities at different stages of the drug development process. First, we measure the time needed to identify non-viable R&D projects (Fig. 2a). Interestingly, ≃ 70% of projects that had started between 2000 and 2009 were terminated in the year they entered preclinical research, with a ≃ 20% increase with respect to the previous decade. For successful projects, we measure the time lag from date of patent to date of market launch. In the inset of Fig. 2a we show the distribution of the time lag between patent filing and market launch of successful projects in the three decades under study. Interestingly, despite the increase observed for projects started in the 1990s and in the 2000s, this measure has decreased, showing that the development of at least a fraction of the projects has become faster in recent years.
To track the evolution of phase durations, we compute the time needed to progress along the pipeline in the different decades of observation (to ensure comparability of projects in different decades we imposed a constraint of 4 years (48 months) as the maximum observable duration for a given phase). As shown in Fig. 2b, the time needed to complete preclinical research has been slightly increasing, but we did not find significant differences between decades (Wilcoxon test [25] at 0.05 significance level). In Phase I of clinical research, projects progression has become significantly faster in the latest decade. The duration of Phase II saw a significant increase in 2000-2009, but then this trend has halted. The duration of Phase III has increased progressively and significantly remaining the longest, due to the complexity of inherent activities (regulatory requirements, increasing patient sample size, multi-center logistics [6]).

Finding the niche
Evidences presented in the previous section documented that attrition rates have been decreasing in recent years. We now move to investigate which therapeutic areas research has focused on.
To this end, similarly to Pammolli et al. [5], we partition projects under study based on their ATC, identifying their main therapeutic areas at the 3-digit hierarchical level (ATC3). In Fig. 3 we show how projects are distributed across therapeutic areas, as a function of the corresponding probability of success (POS; i.e. how many Pammolli et al. J Transl Med (2020) 18:162 projects have reached the market from the preclinical phase, overall) up to 2013 and of the yearly average sales between 2002 and 2016. In general, results show that high uncertainty/high potential reward projects (i.e. low POS and high yearly sales) continue to polarize Research and Development efforts (Fig. 3a) [5]. Expectedly, projects in therapeutic areas with higher revenues and higher attrition rates have experienced the highest share increase between 2002-2009 and 2010-2017 (Fig. 3b). In particular, monoclonal antibody neoplastics (L1G) and immunosuppressants (L4X) have increased their share, while the still prevailing class L1X (antineoplastic and immunomodulating agents) has remained constant.
The concentration of projects in oncology has become even more apparent in the last decade: (i) 4 out the top 5 ATC3 classes, ranked by their overall share in projects ongoing between 2000 and 2017, are related to oncology (L); (ii) more than 40% of ongoing studies, currently listed on ClinicalTrials.gov 6 are oncology-related (see Additional file 1: Table S5). Other relevant fields that showed up in rankings include degenerative diseases of the central nervous system (N7X), with specific reference to Alzheimer's disease (N7D), another area in which unmet medical need is high [32,33]. Interestingly, coherently with results presented in the previous section, while projects in cancer research have improved their attrition rates after 2010, the performance of projects in class N has worsened in most cases (see Additional file 1: Table S2). In fact, out of 86 projects on Alzheimer's disease in the last 10 years, only one has received approval [15].
The focus on R&D projects of high complexity in relatively unexplored areas is also witnessed by the fact that orphan drugs indications and approval have increased in recent years [34]. The number of yearly NME approvals for orphan drugs has more than doubled from 2000-2009 to 2010-2017, while drug repositioning approvals towards rare indications have tripled in the same period [34]. We use a manual classification of indications to retrieve the share of rare diseases (defined as having a prevalence of ≤ 200,000 affected individuals in the US) by year of project start (Fig. 4a). In the observation period, this share has increased from 3 % in 1990 to about 16% in 2017.
The general tendency towards development of drugs on orphan indications and treatments that are more and more specific and target relatively small subpopulations seems to act as a factor of increasing difficulty of projects. It has been observed, for instance, that orphan drug . We highlight in green the area between the two curves. We show the fraction of projects that are discontinued after x years from the start of preclinical research. The distribution accounts for a maximum discontinuation time of 8 years, so we focus on projects started before 2010. Inset: boxplot of the time interval between patent filing and market launch years, based on the year of market launch, in three different time intervals (1990-1999, 2000-2009, 2010-2017). b Median phase duration per each phase of the drug development process, in three different time intervals (phases started in 1990-1999, 2000-2009, 2010-2013). The duration of a development phase in a year is defined as the median time required to the projects that started the focal phase in the given year to pass to the subsequent phase. The median duration is computed considering only transitions with duration lower than or equal to 4 years, to make a sound comparison across decades. When the median of a phase duration is not significantly different from that of the previous decade, the corresponding value is barred Pammolli et al. J Transl Med (2020) 18:162 development takes, on average, 2.3 years longer [35]. This is due, for instance, to the recruiting challenges associated with smaller and geographically dispersed patient populations, and to the scarcity of available animal models and biomarkers. An additional factor of complexity might be the increasing relevance of multifunctional drugs, which have emerged, in opposition to single-target drugs, as a new approach to treatment [36][37][38].
We then consider the average number of mechanisms of action per drug, by starting year of the project (Fig. 4a), observing a clear positive trend with a pronounced increase after 2010. This trend is confirmed also for the individual ATC1 classes, as shown in Additional file 1: Fig. S2. Between 1990 and 2017, the average number of mechanisms of action per drug has nearly tripled. This result may reflect a general improvement of drug efficacy, as they act on multiple targets, while it also reflects the increasing difficulty of drug design [37].
The evidences that we have presented so far might seem to lead to apparently contradictory conclusions: on the one hand, we found that research is focusing on difficult and risky areas like oncology and rare diseases; on the other hand, we observed a recent increase in productivity. A few explanations for our results can be introduced. As per cancer drugs, recent reports [11,14] show findings similar to ours and predict even greater sales and market share, due to the combination between high medical need and advances in the relevant science basis [6]. Advances in scientific knowledge bases sustaining R&D activities in oncology are worth mentioning here. In fact, our data show that advanced therapies (i.e. cell or gene therapies) are mostly focused on oncology (Additional file 1: Fig. S3), and that projects on advanced therapies have been on a steep rise in the last few years (Additional file 1: Fig. S4) [15]. Also, the rising importance of anti-cancer antibodies (class L1G) could be a factor of simplification of drug preparation for preclinical test and clinical trials, as the efficiency of monoclonal antibody production has improved significantly in recent times [39]. As per orphan indications, FDA data [40] show that they cover a majority share in fast-track programs, while projects in these areas are affected less by the "better than the Beatles" problem described by Scannell et al. [6]. In fact, in Additional file 1: Table S6 we show that the average phase-by-phase attrition rates have been declining in the subset of projects focusing on rare indications, with the notable exception of Phase III, in which trials set-up is known to be more demanding because of the small size of target population [35].
To complete the analysis, we investigate the degree of novelty associated with each research project. For each R&D project, we measure the novelty of indications and mechanisms of action. To this end, we devise an indicator that counts the number of times a given indication/ mechanism of action listed in the focal project appeared in earlier projects, taking into account the total number of previous projects (see "Methods" for details). Interestingly, the median value of both these measures has been increasing in recent years (Fig. 4b). In other words, research has tended to focus on novel indications and mechanisms of action. Recent reports [15] show, in fact, that 34% of mechanisms of action in FDA-approved drugs in 2018 are first-in-class (i.e. they were different from those of existing therapies). To gain insights into the relationships between novelty and market launches, we divide our dataset in successful (i.e. marketed) and failed projects. We have found a significantly higher median novelty of successful projects (0.083 vs 0.015; a Wilcoxon test rejects the null hypothesis that the two distributions have the same median with p << 0.01 ). In other words, an increasing fraction of marketed drugs tend to be based on novel mechanisms of action and target novel indications.

The division of innovative labor
As shown in [5,41,42], the contribution of different institutions (pharmaceutical and biotech companies, non-industrial institutions) to R&D performance might differ significantly. In this section, we first investigate the role of different institutional categories as Developers of R&D projects. Then, we study the Originator-Developer contractual relationships, where the Originator of a drug project is defined as the institution that holds the relevant patent and is assumed to have started the R&D project. We distinguish pharmaceutical companies, biotech companies, universities, hospitals, other non-industrial research centers. Overall, we cover a subset of the projects (see the "Data" section) that shows statistics comparable to the whole sample (Additional file 1: Table S7). In Table 1 we list attrition rates for each institution type and the corresponding share of projects developed in the two periods. Here we focus on Developers, while in the last part of this section we concentrate on Originators. We measure the contribution of institution type i to the variation in attrition rates in phase p for phases started in 2000-2009 and 2010-2013 using the formula δ ip = (�AR ip · Sh i ) * 100/�AR tot,p , where AR ip is the variation observed in attrition rates in p in the projects developed by i, Sh i is the share of phases belonging to projects developed by i in 2010-2013, and AR tot,p is the total attrition rate observed for phase p for all projects for which institution classification was available. Strikingly, results show (Additional file 1: Fig. S5) that a relevant fraction of the observed reduction of the attrition rates are to be ascribed to projects developed by biotechnological companies, with significant attrition rate decreases (Wilcoxon test [25] at 0.05 significance level) in all phases (except for Registration). The contribution of pharmaceutical companies to the total attrition rate changes tends to be high due to to the large share of projects in which they act as Developers (see e.g. in Phase I), but no significant changes are reported (except for Registration). Finally, non-industrial institutions acting as Developers experience a reduction of their attrition rates in Phase II and Registration.
We now focus on the Originator-Developer relationships. While division of innovative labor and R&D alliances have become increasingly important within the industry [43], academic and non-industrial institutions have been advocated as pivotal in driving early development of candidate drugs [44], while influence of interfirm and public/private knowledge transfer on R&D productivity has been underlined [45,46]. We study the effect of different Originator-Developer (O-D) relationships on attrition rates and sales for marketed products (i.e. the logarithm of composite sales for 2002-2016). In general, we identify an O-D relationship for each of 4860 R&D projects (1863 in the decade 1990-1999, 2997 in 2000-2013). In Additional file 1: Table S8 we show a full count of these projects by the corresponding relative O-D relationship. We have found that since the year 2000 biotechnological companies have increased their role both as Originators and Developers, while pharmaceutical companies are now less dominant as Developers than they were in the past. This trend seems to be confirmed by recent reports [15]. In Table 2 we show the results of the regressions of two response variables, phase-by-phase transition rates and sales, accounting for different O-D relationships against the baseline (i.e. a pharmaceutical company being both Originator and Developer; the complete results can be found in Additional file 1: Table S9). We run the regression for data before and after 2000, taken as a reference year. We consider fixed effects of time and a broad proxy for project difficulty,   based on the classification of the targeted indication as "chronic", "lethal", "rare", "multi-factorial". The transition rate results presented in Table 2 lead to two main conclusions. First, no O-D configuration seems to outperform the baseline. This result, while not surprising, confirms that large pharmaceutical companies have kept strong development capabilities, while they have continued to improve in discovery and preclinical research, also through acquisitions of small, research intensive, biotechnology companies [4,23,43]. However, biotech firms have increased the share of R&D projects in which they act as Developers, and their performance has improved over time, converging to the benchmark provided by the baseline. This result is important, because it is showing that the transition of some biotechnology companies from being oriented mostly to discovery to becoming integrated pharmaceutical companies has been providing a positive contribution to the recent recovery of R&D productivity in pharmaceuticals.
When we consider the size of the market, we see that since the early 2000s biotech firms and non-industrial institutions after 2000 have acted as Developers of R&D projects leading to smaller markets. This result is coherent with the higher share of projects focusing on an indication specified as "rare" (last columns of Table 2).

Concluding discussion
Our analyses in this paper revealed significant improvements in different features of R&D productivity in pharmaceuticals. Attrition rates at all stages of drug development have decreased. Our findings are statistically significant, except for Phase III, due to the low number of observations after 2013. The recent decrease of attrition rates in preclinical research is a piece of evidence that will deserve further monitoring. Research on CNS has continued to experience the highest attrition rates. We found that pharmaceutical R&D has continued to focus on therapeutic indications where medical need is high (i.e. oncology and degenerative diseases of the CNS). These increasing efforts correspond to high uncertainty and high potential reward projects. Interestingly enough, we found evidence that the time to discontinuation of non-viable projects has tended to decrease.
As a possible driver of decreasing attrition rates at all stages of pharmaceutical R&D we mention the higher reliance on validation of drug targets in preclinical research, in terms of their role in the disease and their toxicity. Indeed, the extensive genetic validation of drug targets has become more widely embraced in different therapeutic areas [47,48] and it has been shown to improve the chances of passing through clinical stages [49]. Better selection of patient subsets for clinical trials via "stratification" based on biomarkers [50] is a possible factor of improvement of success rates. "Precision" diagnostic assays are increasingly used as clinical endpoints [51], contributing to strengthen selection capabilities in drug development. Finally, the higher number of monoclonal antibodies as new candidate drugs has positively affected both preclinical development and clinical grade batch preparation [52].
We found that many of the detected improvements are widespread across projects in different therapeutic areas and at different stages of development, except for Phase III, in which performances show a higher variability and the impact of molecular stratification of patients seems to be still in its infancy. R&D projects on different types of cancer have experienced significant attrition rate decreases in early stages of development (Preclinical and Phase I) while improvements at later stages (II and III) have been driven more by anti-infective drugs.
The low productivity in CNS research can be explained by several motivations: e.g., patient heterogeneity, complexity of neurodegenerative diseases that often involve multiple molecular targets, the relatively low predictive validity of experimental animal models, the relative lack of established clinical biomarkers [27,53]. To improve this situation, changes are needed in both therapeutic research and regulatory policies, and specific programs and initiatives to promote such changes are being undertaken [54,55]. At present, R&D productivity in CNS is still lagging behind.
Moreover, our analyses showed that the number of mechanisms of action in drug projects has grown over time, and that the novelty of mechanisms of action and indications has increased. New drugs are increasingly based on novel mechanisms of action. The rise in the number and novelty of mechanisms of action and indications that we discovered for recent projects and the increasing focus on high uncertainty and high potential reward projects shows that new research trajectories are opening up. We found, though, that phase duration at late stages of drug development is increasing, particularly in Phase III, pointing at requirements in terms of trial organization and outcomes. Our findings documented that increasing numbers of candidate drugs tend to target multiple (and novel) mechanisms of action, following improvements in the understanding of the genetic, molecular and cellular bases of diseases. Though this paradigm shift may result into the generation of more efficacious drugs, it might also affect the length of the process of drug design. In fact, the duration of successful preclinical research has slightly increased after 2010.
When looking at the relative contribution of different institutional types to the growth of R&D productivity, we found that a relevant fraction of the detected increases are due to better performance of biotechnological companies, in preclinical and clinical research. The rising