When fragments link: a bibliometric perspective on the development of fragment-based drug discovery.

Fragment-based drug discovery (FBDD) is a highly interdisciplinary field, rich in ideas integrated from pharmaceutical sciences, chemistry, biology, and physics, among others. To enrich our understanding of the development of the field, we used bibliometric techniques to analyze 3642 publications in FBDD, complementing accounts by key practitioners. By mapping its core papers, we traced the transfer of knowledge from academia to industry. Co-authorship analysis showed that university-industry collaboration has grown over time. Moreover, we show how ideas from other scientific disciplines have been integrated into the FBDD paradigm. Keyword analysis showed that the field is organized into four interconnected practices: library design, fragment screening, computational methods, and optimization. This study highlights the importance of interactions among various individuals and institutions from diverse disciplines in newly emerging scientific fields.


Introduction
Fragment-based drug discovery (FBDD) is now a widely adopted approach to lead discovery [1,2]. The field's origin can be traced back to its first demonstration 20 years ago at Abbott Laboratories by Shuker, Hajduk, Meadows and Fesik [3]. The historical development of FBDD has been discussed as anecdotes, for example during lectures at various conferences [4] and in scientific publications [5,6]. The technical aspects of the approach have also been described in key reviews [7][8][9]. Still, there are insights to be learned by systematically studying how the field has developed. In this paper, we look at the organizational and social aspects of FBDD's development by analyzing scientific publications that describe new developments in the FBDD field and the references that are provided in those publications. To analyze these records, we employ bibliometrics, an approach in information sciences to analyze the relationship among written publications.
In the past, technological breakthroughs have resulted from scientists working together at the interface of diverse disciplines, recombining knowledge from various fields [10]. FBDD's emergence can be seen as the convergence of various scientific fields: computational methods, molecular biology and medicinal chemistry. With pharmaceutical sciences becoming more multidisciplinary and the pharmaceutical industry seeking more collaborations, especially in pre-clinical development [11][12][13], it is appealing to investigate the drivers that made FBDD so successful. With the increasing interest in how organizational factors can enable drug discovery [14], we seek to understand the roles of various groups from industry and academia in the rise of FBDD. By tracing how each publication from academia and industry influenced the field, we can better understand the role of each institution in driving innovation.
Finally, looking at the trends in keyword usage in the publications over time and identifying which keywords usually appear together can lead to a better grasp of how the field is organized. More importantly, by looking at how each keyword trends over time, we can get a sense of how the focus of FBDD has changed and what its current direction is.

Data and Methods
The papers analyzed in this study were downloaded from Thomson Reuters' Web of Science (WOS). To collect an initial set of papers in the field of fragment-based drug discovery, a keyword search was used (see Figure 1).

Figure 1. Data Collection for FBDD Publications
The keyword search generated 3,208 papers. To ensure that the keyword "fragment" was used to refer to the field, we looked at the abstract, title and keyword fields of the publications and tallied the phrases that co-occurred most often with "fragment." We removed the combinations that were unrelated to the field, resulting in a dataset of 2,781 papers. To verify whether these papers were representative of FBDD, we inspected the dataset and found that some key publications in the field were not captured by the keywords used in the preliminary search. Examples include Hopkins's paper on ligand efficiency [15] and Hann's paper on molecular complexity [16], as these do not mention any of the keywords used (see Figure 1). Thus, an additional data collection step was performed. Using the first set of papers, we checked for their most cited references. Analyzing the references, we identified 861 additional publications that were cited at least 10 times. It is important to note that this list contains some publications that may not be directly related to FBDD development but nevertheless helped to shape the field. An example is the frequently cited publication by Berman describing the Protein Data Bank (PDB), which marks the pivotal role of protein structural information in FBDD [17]. Merging these publication lists resulted in a total of 3,642 publications spanning the years 1953 to 2016.
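The reference-based expansion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (dictionaries with `id` and `references` fields) and the helper name `expand_by_citations` are hypothetical and do not reflect the actual Web of Science export format or the tooling used in this study.

```python
from collections import Counter

def expand_by_citations(records, min_citations=10):
    """Return ids of references cited at least `min_citations` times
    by the seed set and not already contained in it.

    `records` is a hypothetical list of dicts with "id" and "references".
    """
    seed_ids = {rec["id"] for rec in records}
    counts = Counter()
    for rec in records:
        # Count each reference at most once per citing paper.
        counts.update(set(rec["references"]))
    return {ref for ref, n in counts.items()
            if n >= min_citations and ref not in seed_ids}

# Toy seed set: 12 papers all cite "PDB"; only 3 of them cite "rare".
seed = [{"id": f"p{i}", "references": ["PDB"] + (["rare"] if i < 3 else [])}
        for i in range(12)]
added = expand_by_citations(seed, min_citations=10)
merged = {rec["id"] for rec in seed} | added
```

In this toy example only the frequently cited reference passes the threshold and is merged into the dataset, mirroring how a resource paper can enter the collection without matching any search keyword.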
To understand the development of FBDD, we set the hallmark publication of Shuker et al. [3] in 1996 as the starting point of our analysis. We analyzed papers in the dataset that were published from 1996 to 2016 in five-year intervals. Various analyses were done to show the role of prior scientific knowledge in adjacent fields and of university-industry collaborations in the development of the field. First, the most cited articles in our dataset of FBDD articles were identified to find the core papers in FBDD. For further analysis, we used the software CitNetExplorer [18] to map the top 100 cited papers, showing the citation relationships among them and allowing us to trace the evolution of knowledge. To study how collaboration between academia and industry evolved in the last 20 years, we generated co-authorship network maps using the software VOSviewer [19]. In an effort to uncover the scientific roots of FBDD, we also analyzed the scientific fields that the FBDD articles belonged to. Moreover, cluster analysis of keywords was performed. By plotting a network map of keywords that co-occur in publications, we are able to identify the disciplines that researchers study.
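As a rough illustration of how co-authorship links can be tallied per time window before they are drawn as a network map, consider the sketch below. The period boundaries, the record fields (`year`, `institutions`) and the placeholder institution names are assumptions made for the example; the maps in this study were produced with VOSviewer, not with this code.

```python
from collections import Counter
from itertools import combinations

# Assumed windows; the final window is taken to run through 2016.
PERIODS = [(1996, 2000), (2001, 2005), (2006, 2010), (2011, 2016)]

def period_of(year):
    """Return the (start, end) window containing `year`, or None."""
    for start, end in PERIODS:
        if start <= year <= end:
            return (start, end)
    return None

def coauthorship_edges(papers):
    """Count co-authoring institution pairs for each window."""
    edges = {p: Counter() for p in PERIODS}
    for paper in papers:
        period = period_of(paper["year"])
        if period is None:
            continue  # outside the analyzed time span
        insts = sorted(set(paper["institutions"]))
        edges[period].update(combinations(insts, 2))
    return edges

# Hypothetical records with placeholder institution names.
papers = [
    {"year": 1996, "institutions": ["FirmA"]},
    {"year": 2004, "institutions": ["FirmB", "UnivC"]},
    {"year": 2005, "institutions": ["UnivC", "FirmB"]},
    {"year": 1990, "institutions": ["FirmA", "UnivC"]},
]
edges = coauthorship_edges(papers)
```

A single-institution paper contributes no edge, which is how independently working institutions end up as isolated nodes in such a map.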

Results and Discussion
Emergence as "Fragment-based Drug Discovery"
The fragment-based approach to drug development is widely recognized to have started in 1996 with its first demonstration at Abbott Laboratories [3]. This seminal paper referred to the approach as Structure-Activity Relationship by Nuclear Magnetic Resonance (SAR by NMR), demonstrating for the first time the detection, ranking and progression of small, weak-affinity binders.
In our analysis, FBDD publications in the first five years mostly operated under the general umbrella of drug discovery instead of distinguishing themselves as a particular discipline. However, traces of keywords related to fragment-based drug discovery were already present in the nineties, for example in the computational work of Joseph Moon and Jeffrey Howe [20] at Upjohn, Sergio Rotstein and Mark Murcko [21] at Vertex, and Hans-Joachim Böhm [22] at BASF. Synonyms such as needles and needle screening, used to describe early applications by Böhm and co-workers, by then at F. Hoffmann-La Roche [23], were not adopted by the scientific community, as these keywords were used in fewer than 5 publications in any year. As seen in Figure 2, it would take a few more years before research in the field would come together under a term such as 'fragment-based drug discovery', which first appeared in the abstract of the 2002 paper of Murray and Verdonk [24]. Even then, the field would alternate between the keywords 'lead' and 'drug'. The term lead discovery dominated in the early years, stimulated by influential reviews from Astex [25][26][27] in the mid-2000s. Differentiating between the two, the term lead emphasizes the early stage in which fragments are used (for example, before pharmacokinetic properties are considered). On the other hand, the term drug can be helpful in that it contextualizes the ultimate goal that fragments aim to achieve, which is to develop drugs.
By 2009, the term fragment-based drug discovery had finally become the top keyword that researchers used to identify the field, while lead discovery has lost favor since its peak in 2005, as shown in Figure 2. As it currently stands, the field is still divided between drug discovery and drug design. Discovery refers more to the finding of a new drug or drug candidate, while drug design puts more emphasis on the rational approaches to build the new drug (candidate). Drug discovery is favored over drug design, being used as much as three times more often in 2016 according to the Web of Science, although the two terms seem to be used as synonyms.
Figure 2. Occurrence of FBDD umbrella keywords in the literature. These keywords were chosen as these were the terms used to refer to the field in various important reviews.
Aside from the more extensive keyword use, the growth of the field is shown by the increase in the number of publications (Table 1). From an initial 277 publications in its first five years, output has increased roughly six-fold to 1709 publications in 2011-2016. There has also been an increase in the number of unique institutions, authors and countries associated with the field, clearly indicating that the approach is being adopted by an increasing number of scientists.
[Table 1 not recovered from the source. Surviving fragment: Big Pharma 1 7 7 7. Footnote a: a threshold needed to be set, as some firms and researchers may co-author publications but not necessarily practice FBDD.]
From Ideas to Application: The Role of Industry
Clearly, industry has played a pivotal role in developing FBDD. Although the approach was first demonstrated at the big pharmaceutical firm Abbott Laboratories [3], other organizations in the private sector were instrumental in subsequent FBDD development, in particular by improving emerging technologies and approaches to allow application in drug discovery. In the first few years of the field, the majority of articles came from industry. This is noteworthy because an inherent bias towards universities is expected when focusing on scientific publications, due to the incentive of academia to publish. Considering that industry has the opposite incentive of withholding information for competitive advantage [28,29], this emphasizes how influential industry was in the development of FBDD. This is also supported by looking at the top institutes in terms of scientific impact, as measured by citations. As seen in Table 2, especially for the first years of FBDD, industry clearly led the field. Abbott Laboratories dominated in the late nineties. Astex (founded in 1999 by University of Cambridge professors Tom Blundell and Chris Abell and by Dr. Harren Jhoti, former head of structural biology and bioinformatics at GlaxoWellcome) led in the next decade.
Only in the most recent five years has there been a surge in publications from academia in the top 10 list. Table 2 also shows that biotech companies such as Astex, Vertex and Sunesis have played an important role in establishing the field. However, it is also important to note that some prominent biotechs and pharmaceutical companies in FBDD do not show up in this particular analysis, as they might have put less emphasis on authoring scientific publications. The important role that the private sector has played in FBDD innovation is also apparent when looking at the top 10 cited papers from our collection of FBDD papers (Table 3). A striking nine of the ten top publications were written by industry researchers. The only paper in the top 10 coming from academia is Berman's publication on the Protein Data Bank [17], which does not strictly belong to FBDD but is a fundamental resource for drug discovery research in general and for FBDD in particular, as many hit fragment optimization programs have been guided by protein structural data. Next to some influential reviews, including work from Hajduk (Abbott), Congreve and Rees (both from Astex) and Erlanson (at that time working for Sunesis Pharmaceuticals), the conceptual Ligand Efficiency (LE) work of Hopkins and co-workers (at that time working for Pfizer) has made an enormous impact (rank 2, Table 3). LE assesses the average contribution of each atom to the affinity of the ligand and is used to select the most promising fragment hits and to guide the growing of fragments into bigger drug-like molecules. Also, the theoretical work of Hann and co-workers at GlaxoSmithKline (rank 6, Table 3) on understanding how molecular complexity impacts hit finding has been influential for FBDD. Among other things, this work led to the realization that fragments should be simple, small molecules that can interrogate binding sites with higher resolution.
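To make the ligand efficiency concept mentioned above concrete, the sketch below uses the commonly quoted definition of LE as the free energy of binding divided by the number of non-hydrogen atoms, with the free energy estimated from a dissociation constant. The function name, the default temperature of 300 K and the two example compounds are illustrative assumptions, not values taken from the papers discussed here.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def ligand_efficiency(kd_molar, heavy_atoms, temperature=300.0):
    """Approximate LE in kcal/mol per heavy atom.

    Assumes delta_G = R*T*ln(Kd), which is negative for Kd below 1 M.
    """
    delta_g = R_KCAL * temperature * math.log(kd_molar)
    return -delta_g / heavy_atoms

# Hypothetical compounds: a weak 1 mM fragment with 12 heavy atoms
# versus a potent 10 nM lead with 38 heavy atoms.
le_fragment = ligand_efficiency(1e-3, 12)
le_lead = ligand_efficiency(1e-8, 38)
```

Although the hypothetical lead binds far more tightly, the fragment contributes more binding energy per atom, which is the kind of comparison LE is used for when prioritizing fragment hits.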
This has resulted, among other things, in the guidelines captured in the Rule of Three that define quality fragments. This popular mantra was attractively pitched by Congreve and co-workers (ranked 4) as a variation on Lipinski's Rule of Five (ranked 7, Table 3), which defines the properties of soluble and permeable drug-like molecules, the ultimate goal of FBDD efforts. However, if we look at the top cited publications in recent years (Table 4), seven of the 10 most cited publications from 2009 onwards were authored by academia. This adoption by academia is also confirmed by the increase in the share of publishing universities and research institutions in FBDD in the last five years. One of the factors behind the adoption by academia is the rise of academic medicinal chemistry and drug discovery groups [11,30]. We can also speculate on the mobility of researchers, including experts from industry who move to universities and set up academic drug discovery research groups. As there has been an increase of interest in how researcher mobility affects innovation [31], the impact of this mobility and transfer of knowledge on FBDD's development will be the topic of future research. We then extended the list to the top 100 cited articles in FBDD, representing the core papers of FBDD. By creating a citation map of these articles over time, we can visualize the evolution of ideas within FBDD and the changing roles of industry and academia in shaping these ideas. Whereas Table 2 and Table 3 reveal the dominant role of industry in establishing FBDD, the plot in Figure 3 reveals that ideas and tools developed in academia provided the groundwork for the field.
Figure 3. Citation map of 100 core papers in FBDD. Each paper is labelled by its last author. The colors reflect the affiliation of the authors. Square highlights mark review articles.
Most of the theoretical grounding of FBDD came from ideas developed in academia as early as the 1970s. This early academic influence can be seen explicitly in the paper of Jencks from Brandeis University [32]. In his paper on the additivity of binding energies, he puts forward the idea that large molecules can be considered as a combination of fragments.
On the upper left side of the citation map, a number of papers authored in academia can also be seen.
These are foundational publications about influential drug discovery tools such as the Protein Data Bank in 1977 [33], molecular docking approaches by Ferrin and co-workers in 1982 [34], the molecular modelling software CHARMM by Karplus and co-workers in 1983 [35], Goodford's computational procedure for determining energetically favorable binding sites in 1988 [36] and functionality maps of binding sites by Karplus et al. in 1991 [37]. Other computational chemistry efforts (e.g., Karplus, Schneider, Hubbard) to develop de novo structure generation and molecular docking software have also made a tremendous impact. Frequently, the developed algorithms use fragment-based approaches as computational "tricks" to avoid the complication of having to assess and weigh the various properties of bigger, drug-like compounds. In the early nineties, the technologies and protocols to determine fragment binding to proteins, using for example sensitive biophysical technologies, were not yet available. The computational approaches were also adopted by industry, for example by Schneider at Roche and both Klebe and Böhm at BASF. The latter scientist also contributed to the pioneering needle screening work at Roche (vide supra) that combines in silico approaches with biochemical and biophysical screening as an early example of fragment-based approaches in hit finding and lead development. The impact of Abbott Laboratories in developing the applications is apparent not only from the work of Fesik and co-workers with NMR technology, but also from the work of Greer and co-workers, which focuses on discovering ligands using X-ray crystallographic screening. Later, their crystallographic screening method (called CrystaLEAD) was further developed and exploited by influential scientists like Hubbard (University of York, Vernalis), Rees, Jhoti (Astex), Abell, Blundell (University of Cambridge, co-founders of Astex).
These high throughput X-ray crystallographic screening efforts were supported by academic activities such as the development by Cowtan and co-workers of the software COOT, a program that is used to display electron density maps and atomic models.
With academia laying out FBDD's foundations and Big Pharma first demonstrating the technique in 1996, the road was now clear for the field's valorization. The next decade of key FBDD publications would almost exclusively come from industry. Especially in the early 2000s, smaller structure-based drug discovery companies like Astex, Vertex and Sunesis came to play an important role. These biotechs specialized in specific FBDD technologies and approaches (e.g., crystal soaking, biochemical assays, tethering) and perfected them for application in hit finding and lead generation. Fragments provided a way for these companies to obtain hits without the need to invest millions in the compound libraries and robotics that are needed for typical high-throughput screening (HTS) approaches [6]. It is noted that not all known technologies and FBDD companies appear in this bibliometric analysis, possibly because of their restricted efforts to publish in the scientific literature. It is interesting to see that those companies that do publish make a significant impact when considering collaborations that publish FBDD work, as seen in the co-authorship network maps.
In the early years of FBDD, the majority of institutions involved were carrying out research independently. In this period, only a small group of mostly academic institutions were collaborating with a few players in industry (panel a of the co-authorship network maps). This is seen in the mostly fragmented nodes on the right side of the plot.
However, by the early 2000s, a network of university-industry collaborations started to form (panel b of the co-authorship network maps). With research in FBDD becoming more collaborative, institutions from big pharma, spin-offs and academia co-authored more and more articles together. Especially in the final period from 2011-2016, a greater degree of integration among practicing institutions can be observed. The tight integration shows that FBDD is a high-tech and multidisciplinary research field in which specialists in various research areas collaborate in developing new pharmaceuticals. It is noted that the development of this field also coincides with the transition of the pharmaceutical landscape in which the big companies outsource more of their pre-clinical research [38,39], an important change that seems to have shaped the FBDD field.
To further understand how FBDD integrates knowledge from various scientific disciplines, we manually classified the previous core papers according to their content and discipline of origin, as seen in the resulting classified citation map. Before 1996, the scientific groundwork that would eventually be integrated in FBDD came from two separate fronts. On one end, as seen on the upper right side of the map, we have the work of Jencks, which provided the theoretical rationalization for fragments. On the other end (green cluster of the map), the previously discussed methodologies that are fundamental in FBDD research can be seen. These computational approaches form an independent branch that used fragment approaches in binding energy calculations and de novo structure generation software. The map shows a clear separation between these two branches, with no paper citing both prior to Fesik's hallmark publication.
This shows how pivotal the SAR by NMR Science paper by Fesik and co-workers was in jumpstarting the field. As shown in the citation map, the paper serves as a hub from which a dense set of publications branches. The publication of Fesik brought the two separate branches together, explicitly referring to the paper of Jencks while also referring to Böhm's LUDI [22], Hubbard's HOOK [40] and Murcko's GroupBuild [21]. In this way, the theoretical considerations and the computer-aided drug design capabilities were combined, enabled by the emerging biophysical screening technologies (e.g., NMR) and X-ray crystallography to respectively measure and visualize low-affinity fragment binding.
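The notion of a paper that bridges two previously separate citation branches can be expressed as a simple set check. The sketch below is a toy illustration only: the paper identifiers, the branch assignments and the helper `bridging_papers` are hypothetical, and the citation map in this study was built with CitNetExplorer rather than with this code.

```python
def bridging_papers(citations, branch_a, branch_b):
    """Return papers that cite at least one member of each branch.

    `citations` maps a paper id to the set of paper ids it cites.
    """
    return sorted(paper for paper, refs in citations.items()
                  if refs & branch_a and refs & branch_b)

# Toy citation network with a theory branch and a methods branch.
citations = {
    "theory_1": set(),
    "method_1": set(),
    "method_2": {"method_1"},
    "hub": {"theory_1", "method_1", "method_2"},
    "followup": {"hub"},
}
bridges = bridging_papers(citations,
                          branch_a={"theory_1"},
                          branch_b={"method_1", "method_2"})
```

Only the paper that cites both branches is returned; papers that cite within a single branch, or that cite the hub without citing either branch directly, are not counted as bridges.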
We looked at the categories of the journal sources of FBDD papers. Doing so allows us to see the disciplines that FBDD was building on. In Figure 6, before 1995, FBDD literature cited articles coming from the fields of biophysics, biochemistry and molecular biology, and computer science. This signals that advancements in knowledge on these various fronts were necessary for FBDD to expand. It also gives a clear indication that FBDD is going mainstream, with many publications now appearing in the more applied Medicinal Chemistry field, whereas in the early years most papers were in the fields of biochemistry & molecular biology, biophysics and computational chemistry.
Figure 6. Categories of journals over time
Although the green cluster of computational methods includes the pre-1990s computational techniques described before, its influence extends into the early 2000s, including de novo structure generation and docking algorithms such as Glide [41] in 2004 and the development of frequently used databases such as ZINC [42] in 2005.
Referring back to the classified citation map, the blue cluster on the right side is composed of what are considered integral FBDD publications. These include principles and demonstrations of how various biophysical techniques can be used in the paradigm of FBDD. Included as well are applications of FBDD to various therapeutic targets, i.e., the actual use in drug discovery [43,44]. Moreover, it also includes 16 key reviews that summarize and integrate knowledge in the field.
We also see a violet cluster at the early stages of FBDD from 1996-2002, which describes concepts on the molecular basis of the approach. One way of interpreting this is that there are researchers (like Fesik) who bridge the gap between a new field and established methods. In this case, they provided the molecular basis of FBDD. By formulating principles from their outsider perspective, they are able to integrate previously unexploited
The citation map also shows an orange cluster that was integrated into FBDD relatively more recently. These are papers in the field of crystallography such as the CCP4 suite [45] in 1994, Minor's processing of X-ray diffraction data [46] and Dodson's refinement of macromolecular structures [47], both in 1997.
The impact that crystallography would bring to FBDD carries on today. By analyzing the keywords used in the abstract and title of the publications in the field, we can get a sense of the methods that catch the interest of practitioners. Although nuclear magnetic resonance was the dominant technique in the first years of FBDD, it has been replaced by X-ray crystallography in the last five years, as seen in the corresponding keyword trend plot. It is important to note that this does not perfectly reflect the usage of such techniques in various laboratories but rather reflects the identifiers that are used by authors to attract their targeted readership.
Today, the field is organized into four interrelated practices. To arrive at these four classifications, the top keywords in FBDD were plotted and clustered according to how often they occur together per paper (see the keyword co-occurrence map). Four clusters were detected, corresponding to the four major interrelated activities in FBDD: designing the fragment library, screening it using for example biophysical techniques, modeling using computational methods and optimizing the lead. Although the position of a keyword generally indicates its category and its relatedness to other keywords, the position must be taken with a grain of salt, as keywords are more often than not related to the three other dimensions of FBDD.
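The input to such a keyword map is a table of how often keyword pairs appear in the same paper. The sketch below counts those pairs and then applies an association-strength normalization (pair count divided by the product of the individual keyword counts), which is one common way to correct for very frequent keywords. Both the normalization choice and the toy keyword lists are assumptions made for illustration; the clustering in this study was performed with VOSviewer and its own settings.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(keyword_lists):
    """Count keyword occurrences and pairwise co-occurrences per paper."""
    occ, pairs = Counter(), Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))
        occ.update(unique)
        pairs.update(combinations(unique, 2))
    return occ, pairs

def association_strength(occ, pairs):
    """Normalize each pair count by the product of keyword counts."""
    return {pair: n / (occ[pair[0]] * occ[pair[1]])
            for pair, n in pairs.items()}

# Hypothetical keyword lists, one per paper.
papers = [
    ["nmr", "screening"],
    ["nmr", "screening", "library design"],
    ["docking", "library design"],
    ["docking", "screening"],
]
occ, pairs = cooccurrence(papers)
strength = association_strength(occ, pairs)
```

Pairs with a high normalized score would be drawn close together and tend to fall in the same cluster, while a keyword that co-occurs with many others ends up between clusters, which is why keyword positions should be read with caution.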
To see what the trends in FBDD have been over the years, these keywords were colored according to the average publication year of the papers in which they occur, as seen in the colored keyword map. Interestingly, there is a trend towards the upper left cluster of molecular biology, with more keywords occurring more recently. This is expected, as the field has been moving towards applying FBDD instead of building the basic knowledge that comes from the other clusters.
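The coloring by average year amounts to a per-keyword mean over the publication years of the papers that use the keyword. A minimal sketch, assuming hypothetical records with `year` and `keywords` fields and invented example values:

```python
from collections import defaultdict

def average_year(papers):
    """Return the mean publication year for each keyword."""
    years = defaultdict(list)
    for paper in papers:
        # Count a keyword once per paper even if it is listed twice.
        for keyword in set(paper["keywords"]):
            years[keyword].append(paper["year"])
    return {kw: sum(vals) / len(vals) for kw, vals in years.items()}

# Hypothetical records for illustration only.
papers = [
    {"year": 1998, "keywords": ["nmr"]},
    {"year": 2002, "keywords": ["nmr", "crystallography"]},
    {"year": 2014, "keywords": ["crystallography", "bromodomain"]},
]
avg = average_year(papers)
```

Keywords with a later average year would be drawn in the "recent" end of the color scale, so a cluster dominated by such keywords indicates where current activity is concentrated.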
As FBDD matures, it has been applied to more targets. This can be seen in the curious case of the publication of Bradner [48]. Going back to the citation map, this publication does not cite the core FBDD literature yet is cited by many of the recent papers in FBDD. Its subject, the inhibition of BET bromodomains, has been an area of interest for FBDD researchers in recent years.
Together with other targets, the focus for FBDD is now its application. The most cited references in recent years (as seen in Table 4) have been reviews showing how more and more leads originating from FBDD are entering clinical trials. Not only industry is using the technique, but also various academic groups. As Baker [2] put it in a news article in 2013, FBDD has indeed grown up.

Conclusions
In our paper, we have shown the history of FBDD through bibliometric methods. In the early days of the field, research in FBDD was highly fragmented, operating under the general umbrella of drug discovery. Today, scientific progress in the field is organized under the leading keywords "fragment-based lead discovery" (FBLD), "fragment-based drug discovery" and "fragment-based drug design." Although all these terms refer to the same approaches, they put emphasis on different aspects of the work and the ultimate aim of the endeavors.
The history of FBDD provides a solid case for how recombining knowledge from various worlds can advance science. This was seen on two levels. First, on the organizational level, industry and academia each played their respective roles well. Academia laid down the theoretical foundations and also generated research on methods that could later be implemented industrially. With the basic science laid out, industry was able to valorize the knowledge and integrate it in actual drug discovery efforts. Progress in FBDD occurred alongside a growing interconnected network of collaborations among various institutions. Our analyses clearly identify an increasing interconnectedness between academia and industry. Interestingly, the FBDD research field has developed in the same years that the pharmaceutical research landscape was undergoing major changes, with the big pharmaceutical companies outsourcing more and more pre-clinical research work [49]. As such, FBDD forms an interesting topic for further exploring business development and innovation management in the pharmaceutical sciences. Using the bibliometric database as a starting point, we would like to deepen the knowledge on how collaborations are formed. Also, with collaborations in FBDD increasingly being created, it is of value to understand how these collaborations are maintained so that the complementary abilities of each partner are synergized instead of working separately. Finally, it is essential to evaluate the success of these initiatives towards open innovation.
Second, the technical aspect of FBDD's development shows us that integration of outsider technologies with solid theoretical grounding is a useful approach to innovation. Being able to spot opportunities for integration is becoming a more valuable skill for researchers wanting to stay on top of their fields. It is of interest then to understand how both academia and industry cope with this need. Further surveys should be done on this front.
Future studies can address the limitations of our current approach. In this bibliometric analysis, we only focused on scientific publications in FBDD. This analysis has identified the key opinion leaders of the field, and clearly, publications that are accessible to the worldwide research community make an obvious impact. However, certain key contributions to the FBDD field are excluded from the analysis. As pharmaceutical companies and biotechs are often not incentivized to publish, analyzing the patent landscape might better characterize the current state of collaborations in the field. Collecting additional data sources such as company disclosures, conference attendance and new chemical entities on the market could provide a comprehensive picture of the growth of FBDD. By connecting and analyzing these data together, it would be possible to better understand the factors that allow companies to successfully bring their laboratory results to the market. We believe that building a better understanding of business development and innovation management in such a well-defined and recently developed research area as FBDD offers useful case studies to describe the changing landscape of pharmaceutical sciences.

GRANT SUPPORT
This work was supported by the European Union's Horizon2020 MSCA Programme under grant agreement 675899 (FRAGNET).