Donated chemical probes for open science

Potent, selective and broadly characterized small molecule modulators of protein function (chemical probes) are powerful research reagents. The pharmaceutical industry has generated many high-quality chemical probes and several of these have been made available to academia. However, probe-associated data and control compounds, such as inactive structurally related molecules and their associated data, are generally not accessible. The lack of data and guidance makes it difficult for researchers to decide which chemical tools to choose. Several pharmaceutical companies (AbbVie, Bayer, Boehringer Ingelheim, Janssen, MSD, Pfizer, and Takeda) have therefore entered into a pre-competitive collaboration to make available a large number of innovative high-quality probes, including all probe-associated data, control compounds and recommendations on use (https://openscienceprobes.sgc-frankfurt.de/). Here we describe the chemical tools and target-related knowledge that have been made available, and encourage others to join the project.

"Man must shape his tools lest they shape him" (Arthur Miller) The function of a protein can be explored in several different ways. Genetic approaches are used to suppress the expression of the respective gene/protein, for example using gene editing methods such as siRNA or shRNA or by CRISPR/Cas9 (Mali et al., 2013). However, in drug discovery, these methods have some deficiencies: they commonly remove or suppress the entire protein and thus cannot easily reveal the function of a specific druggable protein domain -although domain-based CRISPR is becoming a more widely used method; they are not reversible; their effects are not instantaneous; and they not only disrupt the protein, but also the protein interactome around the targeted protein. Selective small molecule modulators ('chemical probes'), in contrast, can probe the particular function of a targeted domain and can, therefore, be used to study its role in biological processes and in human disease in a dose and time-dependent manner across a wide range of cell and animal models. These probes can also be modified to enhance the degradation of the protein(s) they bind to (Mali et al., 2013;Toure and Crews, 2016).
Small molecules can be used in a broad panel of assay systems comprising primary cells, tissues and in vivo models, as well as other systems that are not easily amenable to even state-of-the-art genetic target validation methods. Although non-selective compounds cast a wide net and can be used to uncover interesting polypharmacologies, having a panel of selective probes that can be used in combination will facilitate data deconvolution and target identification. These properties, together with the possibility of further development of probes into drug candidates, make them among the most versatile tools to explore the relevance of a protein for therapeutic development. However, the necessary characterization data are often missing for chemical compounds, and inhibitors are announced as being 'selective' without a comprehensive selectivity profile. Tool compounds that are chemically unstable or not comprehensively characterized are therefore limited in their utility. Moreover, poorly characterized chemical modulators generate misleading results and litter the literature with contradictory data on a target's function and its role in biology. This is also true for probes that are used improperly, e.g. at higher than appropriate concentrations, thus inhibiting other proteins in addition to the target or causing non-specific cellular toxicity. Unfortunately, reactive and non-specific inhibitors are widely used in the academic research community, often resulting in incorrect functional annotation (Baell and Walters, 2014).
Ideal chemical probes need to be selective, active in cells and chemically stable. The recent discussion on best practice within the chemical biology community suggested a number of stringent quality criteria for chemical probes (Blagg and Workman, 2017; Edwards et al., 2009; Bunnage et al., 2013). Typical criteria as applied by the Structural Genomics Consortium (SGC) are shown in Figure 1, although these may vary slightly depending on the specific protein.
A diverse set of chemical tool compounds is available to cell biologists. However, characterization data associated with these compounds are often either incomplete or buried in patents or supplemental data files of publications. Thus, scientists face a challenge in deciding which tools to use for their research. Help is provided, for example, by the Chemical Probes Portal (Baell and Walters, 2014; Blagg and Workman, 2017), which was established in 2015 to provide a comprehensive overview of published and newly released tool compounds that are annotated with a simple star-rating system. All compounds submitted to the portal are reviewed by at least three members of an independent expert scientific advisory board. Only probes that receive three stars ('Best available probe for this target, or a high-quality probe that is a useful orthogonal tool') or four stars ('Recommended as a probe for this target') are recommended to be used. Of all the compounds submitted to the probe portal so far (about 400), 125 have achieved a rating of three stars or better, thereby showing that there is an urgent need for more high-quality tool compounds to foster reproducible research.
Figure 1. Chemical probes need to fulfil stringent criteria to qualify as research tools. Shown here are target and compound related criteria applied by the Structural Genomics Consortium. DOI: https://doi.org/10.7554/eLife.34311.002
"Excellence, then, is not an act, but a habit" (Will Durant [Durant, 1926]) Like drug discovery, probe development is a multi-disciplinary effort involving experts from several areas including protein chemistry, biochemistry, cell biology, pharmacology and medicinal chemistry (Dahlin and Walters, 2014;Garbaccio and Parmee, 2016). Once a target has been selected, the first step is the design of a project-specific screening cascade. The screening procedure needs to reflect target-related probe criteria as well as the desired compound properties.
A typical screening cascade for a kinase probe discovery project is shown in Figure 2. The screening cascade consists of a primary assay (usually a biochemical activity assay) plus an assay with an orthogonal readout, e.g., a biophysical assay, a number of selectivity assays for the target and a cell-based assay to demonstrate on-target activity in the cellular environment. If possible, a crystallization system should be established to elucidate the binding modes of selected compounds, enabling the rational design of better inhibitors. In silico analyses to exclude undesired compounds such as frequent hitters and pan-assay interference compounds (PAINS), and assays characterizing the physical and chemical properties of the identified hits (Hughes et al., 2011), complement the analysis (Baell and Walters, 2014). Medicinal chemistry optimization is then started for selected compound classes. For a typical project, multiple rounds of the Design-Make-Test-Analyze cycle (Plowright et al., 2012) are needed before a suitable probe candidate is identified. Importantly, cross-correlating results from different assays within a compound class (e.g. tracking of cellular read-outs with biochemical potency and cellular target engagement data) provides a continuing consistency check on whether observed effects are truly a function of inhibiting the target of interest. Experience within the SGC shows that approximately 1-2 years and €2 million are needed to generate one chemical probe fulfilling these stringent criteria (Donner, 2014). This observation is in line with the experience of many medicinal chemists at pharmaceutical companies. As there may be similarities among binding sites on both related and unrelated proteins, unwanted binding to proteins other than the original target is regularly observed. This selectivity challenge can never be completely avoided, and thus end users should be aware that unknown cross-reactivities may exist.
A way to reduce the risk of non-specific effects is the use of a suitable control compound having a chemical structure closely related to that of the probe but lacking activity on the target. A wider profiling against a panel of pharmacologically active targets or proteomics analysis of the compound provides additional information about compound selectivity. Having access to multiple probes from structurally different chemical series further reduces the risk that unknown off-target activities give rise to incorrect conclusions about target functions.

"Well begun is half done" (Aristotle)
Many quality probe compounds are buried in the chemical vaults of the pharmaceutical industry, depriving the scientific community of useful tools and limiting the impact of the original research. In some cases, particular compounds, their properties and some structure-activity relationships (SAR) have been published (Nara et al., 2014; Siebeneicher et al., 2016; Takahashi et al., 2015; Wu-Wong et al., 1999). However, often only selected data are published and the proprietary compounds are not made available to researchers except via restrictive contractual agreements, which impedes their use and their impact. Indeed, in the nuclear hormone receptor field, we showed that any legal encumbrances to compound access significantly reduced the subsequent use of the compound in the literature (Isserlin, 2011). Thus, the open access/open science approach is the fastest route to reach the end users and thereby to have a positive effect on research.
This evidence, as well as impact from the SGC epigenetics probes project, has convinced the SGC partner companies that the release of previously hidden compounds and data to the public will provide value to science and to the companies (Lee, 2015). To this end, seven pharmaceutical companies associated with the SGC have each agreed to donate 10 of these valuable compounds, stemming from their research pipelines, for a total of 70 high-quality small molecules, thus providing a major boost to the chemical biology toolbox. The compounds have been selected based on a variety of criteria, which are different for each participating company. These include profiling available for the compound, feasibility of generating a control compound, availability of physical compound, target class, intellectual property considerations, and other factors. This is an exciting development, but many of the compounds will require wider profiling to meet today's more stringent quality criteria. As the primary focus of the pharmaceutical industry is not to generate chemical probes, but to develop new drugs, not all donated probes have been profiled to the depth that is required of a high-quality chemical probe. Moreover, specificity for a particular target is not a requirement for an effective drug. Thus, although most of the pharma-donated probes have been extensively characterized, they often need to be better adapted for use as a single chemical tool (Figure 1). In particular, no bespoke control compounds have been generated, as the progress of the probe compounds within the company is usually followed by extensive SAR across a series of analogues. Selection and characterization of the control compound is needed to complete a probe package. In addition, control compounds also have to be carefully characterized to weed out promiscuous compounds.
The aim of our partnership is to provide this comprehensive characterization. We believe this to be a valuable contribution to the community. Once broadly characterized and accompanied by relevant control compounds, the initial set of 70 probes reflects a collective contribution of at least €140 million to the public domain (Figure 3). These donated probes cover a broad array of targets from different protein families relevant for a number of disease indications (see Table 1).
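As a back-of-the-envelope check, this figure can be reproduced from numbers given elsewhere in the text. The sketch below assumes that the SGC estimate of roughly 2 million euros per probe applies equally to every donated compound, which is a simplification; the symbols are introduced here for illustration only.

```latex
\begin{align*}
N_{\text{probes}} &= 7 \ \text{companies} \times 10 \ \text{probes per company} = 70 \\
C_{\text{total}} &\approx N_{\text{probes}} \times 2 \ \text{million EUR} = 140 \ \text{million EUR}
\end{align*}
```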
In order to guarantee the quality of the compounds, the donated probe candidates and control compounds are subjected to a two-tier scientific review process: the first review takes place internally, including partners who have not been involved in the probe project, and the second review is performed by a panel of renowned scientists, who have agreed to act as independent reviewers. The first 30 proposals were presented to the internal review committee during a two-day meeting in June 2017 in Frankfurt am Main, Germany, where a process for their release to the public was also established. At this 'historic' meeting scientists from the seven pharmaceutical companies scrutinized the quality of the probes proposed by the other partners and made constructive suggestions on improvement of the associated data packages (Figure 4A).
In the initial set, most targets are uniquely addressed by only one chemical compound, but a maximum of two chemical probes for the same target will be accepted if they represent different chemotypes as judged by the review panels. The remaining probe sets will be provided during the course of 2018/2019. All approved probes are measured against the same quality criteria (Figure 1) and will be profiled in assay panels comprising >500 assays, including broad panels of pharmacologically active targets such as GPCRs, kinases, ion channels and proteases to identify off-target activities (Table 2). Disease-specific phenotypic panels, such as assays in primary tissues established by SGC partners, will provide an initial characterization of their biological effects.
The proposed probes range from completely novel 'best in class', to probes that have been selected because they are provided as a complete set, with control compounds. Although some of the proposed compounds themselves are already commercially available, for most there is no widely characterized partner control compound (Figure 4B).
The current probe proposals cover proteins from many different families such as GPCRs, kinases and proteases as well as other protein targets implicated in a variety of therapeutic areas ranging from oncology to inflammatory diseases and neurodegenerative disorders. An excellent example of a donated probe is the recently published p300/CBP histone acetyltransferase (HAT) inhibitor (A-485), which was shown to have efficacy in several cell models of malignancies (Lasko et al., 2017). This probe, including its control compound, has been approved by both internal and external reviewers and is now available to the scientific community. In contrast, other donated probes are not published or only mentioned in patents and therefore have not been accessible at all. Examples include a novel coagulation factor II thrombin receptor (F2R/PAR-1) inhibitor, which has potential for thrombosis management, and an inhibitor for focal adhesion kinase (FAK) and proline-rich tyrosine kinase 2 (PYK2), which has been in clinical trials for advanced non-haematologic malignancies, but for which profiling data have not yet been available. Even previously published probes are not always widely accessible. For example, the set includes a probe for the solute carrier NHE1, a target associated with ischemia/reperfusion-induced cell death; a peptidomimetic agonist for the KISS1 receptor, which plays a crucial role in cellular hormone function and puberty; and an inhibitor of the protease gamma secretase (GSI), which may have potential in targeting Alzheimer's disease.
Using the infrastructure and established processes of past SGC probe projects, a simple, non-bureaucratic distribution process has been implemented. This process involves distribution in bespoke probe libraries under a simple web-accessible Open Science Trust Agreement (http://www.thesgc.org/click-trust) as well as through trusted commercial vendors. To the best of our knowledge, our initiative is unique in enabling open access to well-validated probes, including controls, generated in the pharmaceutical industry for diverse target families.
All supporting potency and selectivity data, as well as advice for the appropriate use of the compounds for cellular assays and, if applicable, in vivo assays, will be easily accessible via the public database (https://openscienceprobes.sgc-frankfurt.de/). The launch of the first version is planned for the beginning of 2018. The database supports the data needs of both biologists and chemists. The first version focusses on a search for the target proteins, probes, control compounds and recommendations on use. In the second version, additional features such as chemical substructure searches will be accessible. Full assay details will be provided and the reagents used will be listed, so that scientists using the probes can judge the quality of the data provided and reproduce key data in their own lab. For example, it is important to know whether a protein kinase has been screened in a binding or activity assay, and which ATP concentration has been used. Further, the protein construct used to perform certain assays is of significance.
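To make the point about assay details concrete, the following is a minimal sketch of how such annotation could be represented. It is not the schema of the Open Science Probes database: the class, the field names, the helper function and the example values are all hypothetical and chosen only to mirror the details named above (assay type, ATP concentration, protein construct).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KinaseAssayRecord:
    """Hypothetical annotation for one kinase assay result (illustrative only)."""
    compound_role: str                      # "probe" or "control"
    target: str                             # name of the target protein
    assay_type: str                         # "binding" or "activity"
    atp_concentration_um: Optional[float]   # ATP concentration in micromolar, if any
    protein_construct: str                  # e.g. "full length" or "kinase domain"

    def __post_init__(self) -> None:
        # Reject values outside the small vocabularies assumed in this sketch.
        if self.compound_role not in {"probe", "control"}:
            raise ValueError(f"unknown compound role: {self.compound_role!r}")
        if self.assay_type not in {"binding", "activity"}:
            raise ValueError(f"unknown assay type: {self.assay_type!r}")
        # An activity assay without a stated ATP concentration is hard to interpret.
        if self.assay_type == "activity" and self.atp_concentration_um is None:
            raise ValueError("activity assays should report the ATP concentration")


def is_comparable(a: KinaseAssayRecord, b: KinaseAssayRecord) -> bool:
    """Return True only if both results were obtained under matching conditions."""
    return (
        a.target == b.target
        and a.assay_type == b.assay_type
        and a.atp_concentration_um == b.atp_concentration_um
        and a.protein_construct == b.protein_construct
    )
```

With records of this kind, a probe and its control measured under the same conditions would be flagged as directly comparable, whereas two activity measurements taken at different ATP concentrations would not.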
As both the probes and the negative controls will be characterized in more than 500 assays, we will generate more than 70,000 biological data sets within the next 1-2 years: a rich and easily accessible source for future analyses. By providing the data in a comprehensive way we hope to extend our understanding of each particular mechanism or protein in a way that leads to new therapeutic approaches.
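The figure of 70,000 data sets follows from simple multiplication. A minimal sketch, assuming exactly one control compound per probe and exactly 500 assays per compound (the text says 'more than 500', so the result is a lower bound); the function and parameter names are illustrative only:

```python
def expected_data_sets(n_probes: int, controls_per_probe: int, n_assays: int) -> int:
    """Count data sets when every probe and every control is run in every assay."""
    n_compounds = n_probes * (1 + controls_per_probe)  # probes plus their controls
    return n_compounds * n_assays


# 70 donated probes, one control each (assumption), 500 assays per compound.
total = expected_data_sets(n_probes=70, controls_per_probe=1, n_assays=500)
print(total)  # 70000
```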
The new tool compounds and the corresponding data will help to improve the quality of research and will deepen our understanding of the target biology. However, comprehensive characterization, which ideally should be consistent to make data comparable and facilitate data mining, comes at a cost, and in many cases also requires resources for the (re-)synthesis of the chemical probe. The biggest problem is in the availability and characterization of the control compounds. The distribution of probes will occur via commercial vendors, but it is no trivial task to make the well-characterized control compounds available to the user. Due to reduced revenues from the control compounds, vendors are often reluctant to offer these important controls. Regrettably, researchers often perform experiments without the appropriate control compound due to cost reasons or because initial experiments have already been performed without the control. A trial kit, which we will offer, including both probe and control compound, and/or sets of pre-diluted compounds may help researchers to perform properly controlled experiments from the start. It is up to the combined efforts of researchers, vendors, journal editors and referees to make the use of chemical probes in combination with their available controls standard practice in biomedical research.
Figure 4. A. Attrition rate of the approval process of the proposed chemical probes. B. Approved probes were categorized to show their differentiation from available chemical modulators: (i) targets for which there are currently no high-quality probes available; (ii) targets for which the donated probe promises a significant (e.g. 10-fold) benefit in potency or selectivity; (iii) cases in which the new donated probe has similar potency/selectivity as currently available probes but an entirely different chemotype; (iv) best in class compound where none of the above points apply and where the benefit lies in the availability of the control compound and/or the data annotation.
"From a small seed a mighty trunk may grow" (Aeschylus) While in the past almost all aspects of pharmaceutical research and development (R&D) were seen as competitive, the thinking in the field has shifted remarkably over the last decade. More and more challenges in the R&D process are seen as precompetitive, resulting in public-private partnerships and multilateral, critical mass consortia jointly addressing overarching issues. Many pharmaceutical companies have initiated open innovation projects interacting with the academic community. A key success factor for these endeavours is the easy access to knowhow and reagents without complicated contractual arrangements (Nilsson and Felding, 2015;Ehrismann and Patel, 2015). We hope that the project initiated here will entice other companies and academics to follow suit and join us in the quest to increase the availability of well-validated probes meeting stringent quality criteria for the scientific community and decide to make some of their assets openly available. Whilst ultimately, the success of the project will depend on the willingness and support of the scientific community, additional pharmaceutical companies and funding bodies to engage, we believe this is an exciting first step in uncovering and delivering high-quality chemical probes to unlock new biology and ultimately new highquality targets for drug discovery.