Evaluating New Chemistry to Drive Molecular Discovery: Fit for Purpose?

Abstract As our understanding of the impact of specific molecular properties on applications in discovery‐based disciplines improves, the extent to which published synthetic methods meet (or do not meet) desirable criteria is ever clearer. Herein, we show how the application of simple (and in many cases freely available) computational tools can be used to develop a semiquantitative understanding of the potential of new methods to support molecular discovery. This analysis can, among other things, inform the design of improved substrate scoping studies; direct the prioritization of specific exemplar structures for synthesis; and substantiate claims of potential future applications for new methods.


Introduction
The exploration of chemical space is an enduring challenge associated with the discovery of functional small molecules. To address this challenge, synthetic methods need to enable the exploration of diverse and novel molecular property space that is relevant to the specific discovery application in question. Guidelines have been formulated to help steer chemists towards property space that is relevant to particular classes of functional small molecules, including drugs [1] and agrochemicals. [2] In spite of this, the synthetic toolkit that currently dominates molecular discovery is remarkably narrow, [3] which not only contributes to our collective unsystematic and uneven exploration of chemical space, [4] but can also drive discovery away from optimal property space! [3a] Recent high-profile articles authored by industrial scientists have challenged the academic community to develop innovative synthetic methods that align with future discovery needs. [5][6][7][8][9][10] In many cases, compounds that would provide good starting points for discovery have substantially different properties to those of final optimized [1,2] functional molecules, and specific attributes of screening compounds, [5] fragments, [6] building blocks, [7] and robust reactions [8] for drug discovery applications have been clearly described. Such articles can galvanize the academic synthetic chemistry community. In addition, major funded initiatives can facilitate the translation of synthetic approaches developed in academia: for example, the European Lead Factory has harnessed innovative synthetic methods in the construction of its distinctive compound collection. [11] We argue, however, that in order to substantiate the potential value of new synthetic methods for discovery applications, it is necessary for researchers to demonstrate that (and contextualize how) appropriate property space may be targeted. [12]
Although computational tools for library enumeration and molecular property prediction are widely used within industry, such tools are not yet widely used by the academic synthetic chemistry community, with only limited examples to date. [13] Herein, we illustrate some of the limitations imposed by the failure to appropriately benchmark substrate scope; we demonstrate that simple and readily available computational methods can help design thorough investigations of the scope and limitations of new synthetic methods; and we show that these analyses can help identify future applications of emerging methods to support molecular discovery.

2. Demonstrating the Scope of Synthetic Methods: Not All "R Groups" are Equal
The take-up of new synthetic methods by end-user laboratories is, in part, [12] reliant on demonstrating that the substrate scope is relevant to discovery applications. For example, studies should ideally demonstrate that a new method is compatible with substrates bearing heterocyclic and functionalized substituents, in addition to carbocyclic and unfunctionalized variants. In many array syntheses, the presence of polar functionality in reactants or reagents correlates strongly with unsuccessful reactions, due either to intrinsic failure of the chemistry or to purification issues. Arrays of produced compounds hence tend to be systematically less polar than designed, an undesirable outcome termed "logP drift". [5] An unrepresentative focus on specific structural features such as particular aromatic regioisomers can also cause unanticipated issues. For example, a historic bias towards p-substitution has been observed in medicinal chemistry, which tends to reduce the three-dimensionality of the possible products. [4b] To illustrate the bias in the reported substrate scope of new reactions within the published literature, we analyzed the synthetic methodology papers that appeared in the first issues of Angew. Chem. Int. Ed. (23 papers) and J. Am. Chem. Soc. (6 papers) in 2016 (Figure 1). Specifically, we recorded the range of variable aryl and hetaryl groups that had been reported.
Some trends were apparent. First, the variation of (substituted) phenyl groups tends to be investigated much more thoroughly than that of hetaryl groups (Figure 1A). Second, there are relatively few examples in which the introduced (het)aryl group bears functionalizable handles (such as halides, (protected) amines and alcohols, carbonyl and carboxyl groups, and boronic acid derivatives; Figure 1B). Finally, the introduction of p-substituted phenyl rings is demonstrated much more frequently than other substitution patterns (Figure 1C), paralleling historic medicinal chemistry investigations. [4b] The prominence of particular types of substituent may accurately describe the actual scope of the synthetic method, or may simply reflect the fact that a limited range of substrates was investigated experimentally. To distinguish between these possibilities, we suggest that researchers should, as best practice, routinely investigate a wider range of substrates, and that all unsuccessful examples should be reported (for example, in the Supporting Information). Of course, coupled with this, reviewers will need to regard the reporting of such "negative" results not necessarily as a limitation of the reported method, but rather as adding value to the reader and prospective end user, and indeed as a spur for future developments and improvements to existing methods. In the course of the above analysis, despite the undesirable bias in the variable groups reported when viewed in aggregate, there were some examples of good practice, for example, a new method for α-(het)arylation of α-aminomethyltrifluoroborates using dual Ir/Ni catalysis, in which a very wide range of substituted aryl and hetaryl groups was explored (25% heteroaryl; >60% functionalizable). [14]
More generally, we note that Glorius and Collins have described a very useful screening approach that can help identify robust reactions that are particularly tolerant of a broad range of functional groups, [15] which is now being adopted by others. [16] Given the apparent unsystematic exploration (or reporting) of molecular property space in many published studies, we demonstrate below how computational tools may facilitate the design of thorough analyses of the scope of new synthetic methods.
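The kind of tally described above (share of heteroaryl groups, share of groups bearing a functionalizable handle, and prevalence of p-substitution among phenyl groups) can be sketched in a few lines of Python. This is only an illustrative sketch: the `ScopeEntry` record, the `scope_summary` function, and the example entries are all hypothetical and are not taken from the surveyed papers or from any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeEntry:
    """One (het)aryl group from a substrate scope table (hypothetical record)."""
    label: str
    heteroaryl: bool         # ring contains a heteroatom
    functionalizable: bool   # bears a handle such as a halide or protected amine
    pattern: str             # "ortho", "meta", "para", or "other"


def scope_summary(entries):
    """Return the fraction of entries in each category discussed in the text."""
    n = len(entries)
    if n == 0:
        raise ValueError("no scope entries supplied")
    phenyl = [e for e in entries if not e.heteroaryl]
    para = sum(e.pattern == "para" for e in phenyl)
    return {
        "heteroaryl": sum(e.heteroaryl for e in entries) / n,
        "functionalizable": sum(e.functionalizable for e in entries) / n,
        # p-substitution is only meaningful for the (substituted) phenyl groups
        "para_among_phenyl": para / len(phenyl) if phenyl else 0.0,
    }


# Invented example data, for illustration only.
example = [
    ScopeEntry("4-MeO-C6H4", False, False, "para"),
    ScopeEntry("4-Br-C6H4", False, True, "para"),
    ScopeEntry("2-Me-C6H4", False, False, "ortho"),
    ScopeEntry("3-pyridyl", True, False, "other"),
]
summary = scope_summary(example)
```

Reporting such fractions alongside a scope table would let readers see at a glance whether a method has been tested beyond p-substituted phenyl groups.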

Aligning Emerging Synthetic Methods with Specific Discovery Applications
The identification of potential applications of new synthetic methods may be facilitated by computational tools for the enumeration and analysis of virtual libraries. Herein, we have used our open-access computational tool Lead Likeness And Molecular Analysis (LLAMA); [17] however, other open-access tools (e.g., KNIME) [18] and many commercial tools are also available. Such tools could also be used by synthetic chemistry researchers to select reactants that allow the full scope of their methods to be demonstrated. In most cases, we used a standard set of typical medicinal chemistry capping groups to decorate the scaffolds (see the Supporting Information). Herein, we highlight some recent exciting synthetic methods and provide evidence to suggest that they have the potential to drive the discovery of different classes of functional small molecules. For clarity, our analyses focus on specific combinations of molecular properties in each case to demonstrate the point in question, though clearly in industrial applications, the full range of relevant properties needs to be considered. Attempts to capture performance against a range of molecular property measures have been made, for example in a CNS drug-likeness score, [19] a metric capturing the chemical beauty of drugs, [20] and a lead-likeness penalty. [17]

Methods to Target Property Space Relevant to Discovery Applications
As discussed in Section 2 above, it is highly desirable that any presentation of a new synthetic method examines an appropriately broad substrate set, for example in terms of chemical functionality and substitution patterns, but significant additional useful information can be garnered by examining the properties of the product set as a whole. Willis and co-workers have developed a novel synthesis of sulfonamides in which an organometallic reagent is reacted with a sulfur dioxide equivalent (DABSO) to give a product that is then directly reacted with an aqueous solution of an amine under oxidative conditions (Figure 2A). [21] The method was found to be implementable in array format, and using a combination of 7 organometallic reagents and 10 amines, the synthesis of 65 (of 70 possible) products was successfully demonstrated. The diverse range of functionalized substrates used in this investigation was particularly notable.
The authors' own molecular property analysis suggested that their method has the potential to support both drug and agrochemical discovery. Our analysis (Figure 2B) confirms that all of the array products have molecular weights and calculated lipophilicities (AlogP) consistent with Lipinski's guidelines for orally bioavailable drugs. Moreover, a significant proportion also have molecular weights and lipophilicities that meet Clarke's guidelines [2a] for insecticides (60% of the compounds; 210 ≤ MW ≤ 500 and 0.9 ≤ logP ≤ 6.6), fungicides (41% of the compounds; 210 ≤ MW ≤ 380 and 1.4 ≤ logP ≤ 4.8), and herbicides (57% of the compounds; 230 ≤ MW ≤ 430 and 0.7 ≤ logP ≤ 4.9). Whilst other molecular properties are also relevant in drug and agrochemical discovery, this simple analysis confirms the potential value of the method to support future discovery needs.
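A property-window analysis of this kind reduces to counting how many products fall inside each molecular weight and logP range. The sketch below uses the numerical limits quoted in the text for insecticides, fungicides, and herbicides; treating the bounds as inclusive is an assumption made here, and the function names and the example `products` list are hypothetical (the values are invented and are not the array products from the study).

```python
# Property windows quoted in the text (MW in Da; logP is dimensionless).
# Inclusive bounds are assumed for this sketch.
CLARKE_WINDOWS = {
    "insecticide": {"mw": (210, 500), "logp": (0.9, 6.6)},
    "fungicide": {"mw": (210, 380), "logp": (1.4, 4.8)},
    "herbicide": {"mw": (230, 430), "logp": (0.7, 4.9)},
}


def in_window(mw, logp, window):
    """True if both properties lie inside the given window."""
    mw_lo, mw_hi = window["mw"]
    lp_lo, lp_hi = window["logp"]
    return mw_lo <= mw <= mw_hi and lp_lo <= logp <= lp_hi


def coverage(compounds, windows=CLARKE_WINDOWS):
    """Fraction of (mw, logp) pairs that fall inside each window."""
    if not compounds:
        raise ValueError("no compounds supplied")
    return {
        name: sum(in_window(mw, lp, w) for mw, lp in compounds) / len(compounds)
        for name, w in windows.items()
    }


# Invented (MW, logP) pairs, for illustration only.
products = [(250.3, 1.2), (320.4, 2.5), (410.5, 3.9), (455.6, 5.2)]
fractions = coverage(products)
```

In practice the (MW, logP) pairs would come from a property calculator applied to the enumerated or synthesized products; the counting step itself is independent of how the properties are computed.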

Design of Investigations to Test the Scope and Limitations of Synthetic Methods
Molecular property analyses could also be used prospectively to design investigations into the scope and limitations of new synthetic methods, for example by prioritizing those substrate and reagent combinations that enable the most representative examination of molecular property space. We noted the virtues of Wolfe's Pd-catalyzed aminoarylation reactions for the synthesis of diverse pyrrolidines 1 (Figure 3A). [22] Distinctively, the approach enables both the synthesis of the scaffold and the introduction of a variable substituent on carbon through the aryl halide component.

Angewandte Chemie
We sought to demonstrate the application of aminoarylations to the synthesis of drug-relevant scaffolds. To guide this study, we determined the molecular properties of virtual libraries derived from the previously reported scaffold (1a) as well as five related scaffolds that might also be prepared (Figure 3B): in each case, the scaffolds were decorated once with the set of exemplar capping groups. Notably, the mean calculated lipophilicity of the derived libraries could be varied enormously (ca. 3 AlogP units; Figure 3C) within the highly structurally conserved series. We investigated the synthesis of scaffolds 1d-f experimentally, and demonstrated that all three scaffolds could be prepared in good (60-87%) yield and with high diastereoselectivity. [13c] The synthetic approach was subsequently exploited in the synthesis of more than 500 compounds that were added to the compound collection of the European Lead Factory.
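Comparing scaffolds by the mean (and spread) of a calculated property across each derived virtual library is a simple grouping operation. The following is a minimal sketch using only the Python standard library; the function name `summarize_by_scaffold` and the AlogP values are hypothetical and were chosen purely to illustrate a spread of about 3 AlogP units between two libraries.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_scaffold(records):
    """Group (scaffold_id, value) pairs and return {scaffold_id: (mean, sd)}."""
    grouped = defaultdict(list)
    for scaffold, value in records:
        grouped[scaffold].append(value)
    return {
        scaffold: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for scaffold, values in grouped.items()
    }


# Invented AlogP values for decorated derivatives of two scaffolds.
records = [("1a", 3.0), ("1a", 4.0), ("1a", 5.0), ("1d", 0.5), ("1d", 1.5)]
summary = summarize_by_scaffold(records)

# Range of mean AlogP across the scaffold series.
means = [m for m, _ in summary.values()]
spread = max(means) - min(means)
```

Ranking scaffolds by such summaries before any synthesis is one way to decide which members of a structurally conserved series are worth preparing.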

Identification of Novel Scaffolds for Drug Discovery Applications
The ability to elaborate diverse, novel scaffolds for library enrichment in a synthetically efficient manner is a significant challenge. Bode and co-workers have developed SnAP reagents that enable the synthesis of structurally diverse heterocycles from aldehydes or ketones (Figure 4). [23] As an example, condensation of the amino-substituted stannane 2a with benzaldehyde gives the corresponding morpholine 3a (Figure 4A). A range of SnAP reagents (many commercially available) has been described, each leading to distinctive heterocyclic scaffolds. The potential value of the products as building blocks for drug discovery has been noted. [24] To demonstrate potential applications of the approach, we considered a range of building blocks that might be prepared by the combination of known SnAP reagents with benzaldehyde (Figure 4B). In each case, to complement the diversity possible by varying the aldehyde reactant, the building blocks were additionally decorated once using the exemplar capping groups (see the Supporting Information). The molecular properties of the resulting virtual libraries are shown in Figure 4C.
We note that all of the scaffolds (3b-j) have at least some derivatives with lead-like molecular properties. [5] However, with the set of capping groups used, scaffolds 3b, 3c, 3f, 3g, and 3i enable lead-like property space to be targeted most effectively. Of course, the molecular properties of the derivatives may be tuned by varying the specific aldehyde and/or capping groups used. However, as others have noted, [25] the presence of benzo-fused rings such as those in 3e and 3h has a very significant effect on the properties of derivatives. In terms of molecular novelty, we noted that (after virtual Boc deprotection) only 3b is found as a substructure in a search of a random 2% selection from the ZINC database. We conclude that the scaffolds 3c,d, 3f, 3g, and 3i,j may enable significant novel regions of lead-like property space to be explored.
Figure 3. Synthesis of 2,5-disubstituted pyrrolidines using Pd-catalyzed aminoarylation reactions. A) Synthesis of pyrrolidine 1a, as reported by Wolfe. [22a] B) Pyrrolidines that might also be accessible using the method. C) Mean molecular properties of virtual libraries derived from the scaffolds 1. In each case, the Boc group was removed and the scaffold was decorated once using a range of standard capping groups. Lipinski's guidelines for orally bioavailable drugs are indicated by the solid black line. Novel scaffolds (compared to a random 2% selection of the ZINC database) are shown in black, whilst known substructures are shown in grey. Standard deviations are indicated.

Drug discovery against central nervous system (CNS) targets raises the additional challenge of penetration of the blood-brain barrier. The challenge is exacerbated in the search for high-quality lead molecules, which generally need to be both smaller and less lipophilic than the final drug candidates. [5] Marcaurelle and co-workers have described the synthesis of a range of azetidine-based scaffolds (4-11) that were designed to meet the needs of CNS drug discovery (Figure 5A). [26] We decorated each scaffold once using our standard set of medicinal chemistry capping groups. To establish relevance to CNS drug discovery, we assessed the resulting virtual libraries against established guidelines for CNS drugs (Figure 5B). [19] We note that 81% of the compounds satisfied guidelines for both molecular size and lipophilicity (AlogP ≤ 3; heavy atoms ≤ 26), whilst the rest of the compounds fell into a transitional area (3 < AlogP < 5 or 26 ≤ heavy atoms ≤ 36). We also analyzed the molecular properties of the compounds on a per-scaffold basis, and concluded that, using our decoration strategy, the scaffolds 4-6 and 9-11 allow CNS drug-like space to be targeted most effectively. Marcaurelle and co-workers have also described the pairwise decoration of 9 to yield more than 1900 compounds. The cell permeability of seven exemplar final compounds was measured experimentally, confirming their suitability for transport in the gut and at the blood-brain barrier.
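The assessment against CNS guidelines amounts to assigning each virtual compound to a zone based on its AlogP and heavy-atom count. The sketch below encodes the optimal and transitional thresholds quoted in the text; the function name `cns_zone`, the treatment of boundary values, and the assumption that everything beyond the transitional area is "undesirable" are choices made here for illustration, and the example compounds are invented.

```python
from collections import Counter


def cns_zone(alogp, heavy_atoms):
    """Classify a compound as 'optimal', 'transitional', or 'undesirable'.

    Thresholds follow the values quoted in the text; how boundary values are
    assigned, and the 'undesirable' catch-all, are assumptions of this sketch.
    """
    if alogp <= 3 and heavy_atoms <= 26:
        return "optimal"
    if alogp < 5 and heavy_atoms <= 36:
        return "transitional"
    return "undesirable"


def zone_fractions(compounds):
    """Fraction of (alogp, heavy_atoms) pairs in each zone."""
    if not compounds:
        raise ValueError("no compounds supplied")
    counts = Counter(cns_zone(lp, ha) for lp, ha in compounds)
    return {zone: n / len(compounds) for zone, n in counts.items()}


# Invented (AlogP, heavy atom count) pairs, for illustration only.
library = [(1.5, 20), (2.8, 25), (3.6, 24), (2.0, 30)]
fractions = zone_fractions(library)
```

Running the same classification per scaffold, rather than over the whole library, gives the per-scaffold comparison described in the text.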

Establishing the Shape Diversity of sp3-Rich Small-Molecule Libraries
Novel sp3-rich molecular scaffolds that combine well-defined molecular topologies with functional-group handles for diversification have been noted to have significant value in drug discovery applications. [6,7,25a] However, the impact of a scaffold on the three-dimensionality of its derivatives is often not obvious by simple inspection. Carreira and Rogers-Evans have developed efficient syntheses of small sp3-rich spirocyclic scaffolds (such as 12 and 13; Figure 6A). [27] We analyzed the diversity of molecular shapes that may be explored through decoration of these scaffolds. Similar analyses could also be undertaken for other synthetic approaches to diverse molecular scaffolds. [28]

Figure 4. Bode's approach to nitrogen-containing heterocycles using SnAP reagents. [23] A) Exemplar reaction of benzaldehyde with a SnAP reagent. B) Additional scaffolds that have been, or might be, prepared from benzaldehyde using the approach. C) Mean molecular properties of virtual libraries derived from scaffolds 3 after one decoration reaction. Novel scaffolds are shown in black, whilst those that are found as substructures in a random 2% sample from the ZINC database are shown in grey. Standard deviations are shown. Lead-like molecular property space [5] is indicated by the black box.

Figure 5. Assessment of the relevance of scaffolds to CNS drug discovery. A) The scaffolds considered in this study. [26] B) Mean molecular properties of virtual libraries derived from the scaffolds 4-11 after one decoration. Standard deviations are shown. Molecular property space is shaded according to Pfizer's guidelines for relevance to CNS drug discovery (pale pink: optimal, dark pink: transitional area, red: undesirable). [19]

We enumerated virtual libraries in which four scaffolds 12a/b and 13a/b were decorated once or twice with the standard set of capping reagents. While there are several metrics by which shape diversity can be assessed, [29,30] we used principal moments of inertia (PMI) plots. PMIs were determined for a low-lying conformer of each compound, and the mean PMIs for the compounds based on each scaffold are presented in Figure 6B. We note that the shape of the resulting compounds depends critically on the position of the functionalizable groups within the scaffolds (and hence the vectors that may be explored). Compounds derived from the scaffolds 12a and 13a are systematically more linear than those derived from the regioisomeric scaffolds 12b and 13b; however, the significant difference in mean PMIs between scaffolds 12a and 13a (versus the closely aligned values for 12b/13b) is notable and could not have been predicted by simple inspection.
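PMI plots are conventionally drawn from normalized ratios: with the three principal moments of a conformer sorted so that I1 ≤ I2 ≤ I3, the coordinates are (I1/I3, I2/I3), and the plot is a triangle whose vertices correspond to rod-like (0, 1), disk-like (0.5, 0.5), and sphere-like (1, 1) shapes. The sketch below assumes the principal moments have already been computed for a low-lying conformer by a separate tool; the function names and the example moments are hypothetical.

```python
import math

# Vertices of the normalized PMI triangle.
VERTICES = {"rod": (0.0, 1.0), "disk": (0.5, 0.5), "sphere": (1.0, 1.0)}


def normalized_pmi(i1, i2, i3):
    """Return (I1/I3, I2/I3) for three principal moments given in any order."""
    smallest, middle, largest = sorted((i1, i2, i3))
    if smallest < 0 or largest <= 0:
        raise ValueError("principal moments must be non-negative and not all zero")
    return smallest / largest, middle / largest


def nearest_vertex(npr1, npr2):
    """Name of the triangle vertex closest to a point on the PMI plot."""
    return min(VERTICES, key=lambda name: math.dist((npr1, npr2), VERTICES[name]))


# Invented principal moments (arbitrary units) for a near-linear molecule.
point = normalized_pmi(10.0, 1.0, 10.0)
shape = nearest_vertex(*point)
```

Averaging the normalized coordinates over all compounds derived from one scaffold gives a per-scaffold mean position of the kind the text compares; points far from the rod-disk edge indicate more three-dimensional shapes.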
Bull and co-workers have developed an efficient method for the synthesis of diverse substituted oxetanes, which were designed to explore three-dimensional fragment space (Figure 7). [31] As an example, Rh-catalyzed reaction of the diazo compound 14 with the 2-bromo alcohol 15 gave the corresponding O-H insertion product 16; subsequent base-mediated cyclization gave the substituted oxetane 17a (Figure 7A).
To assess the potential value of the method to yield distinctive fragments, we enumerated a virtual library of amides by combining six deprotected scaffolds (Figure 7B) with 28 small commercially available amines (see the Supporting Information). 36% (61/168) of the virtual products had fragment-like properties (9 ≤ heavy atoms ≤ 17; -1 ≤ AlogP ≤ 3). We compared the shapes of low-lying conformers of these 61 fragments with those of 257 commercially available fragments and 261 randomly chosen fragments from the GDB-17 database of exhaustively enumerated compounds (Figure 7C). [32] The GDB-17 database provides an insight into the shape diversity of all possible fragments; this potential diversity is poorly sampled by commercially available fragments, which tend to lie close to the rod-disk edge of a PMI plot. Our analysis shows that Bull's scaffolds can be decorated to yield fragments that are significantly more three-dimensional than commercially available fragments.
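The size of the virtual amide library and the fragment-likeness filter can be reproduced with a short sketch. Only the counts (six scaffolds, 28 amines, 61 fragment-like products) and the property limits come from the text; the placeholder scaffold and amine names, the function names, and the assumption of inclusive bounds are hypothetical.

```python
from itertools import product


def enumerate_amides(scaffolds, amines):
    """Pair every scaffold with every amine (one virtual amide per pair)."""
    return list(product(scaffolds, amines))


def is_fragment_like(heavy_atoms, alogp):
    """Fragment-like window from the text; inclusive bounds are assumed."""
    return 9 <= heavy_atoms <= 17 and -1 <= alogp <= 3


# Placeholder identifiers standing in for real structures.
scaffolds = [f"scaffold_{i}" for i in range(1, 7)]
amines = [f"amine_{j}" for j in range(1, 29)]
library = enumerate_amides(scaffolds, amines)

# Percentage of fragment-like products, using the count reported in the text.
percent_fragment_like = round(100 * 61 / len(library))
```

In a real workflow each (scaffold, amine) pair would be converted into a product structure and its heavy-atom count and AlogP calculated before `is_fragment_like` is applied.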

Conclusions and Future Perspectives
We have demonstrated that simple, readily available computational tools can be used to inform a semiquantitative understanding of the likely value of new chemical methods to the discovery sciences. While examples of the retrospective use of these tools are beginning to appear in the literature, we argue that it is important that they are now used to ensure that the scope and limitations of new methods are fully established.
The information from such analyses can be used, among other things, to design suitably representative substrate sets against which to test reactions, or to prioritize the synthesis of "high-value" scaffolds from a range of possible products. We argue (as have others in related contexts) [33] that such an approach should not be regarded as a restriction, but rather as a challenge and inspiration for the academic synthetic chemistry community. A better, shared understanding of the scope and limitations of new methods will ultimately lead to more-rapid uptake of the most valuable methods, thereby benefitting both the end-user community (availability of trustworthy new tools) and the academic authors (higher citation, demonstrable impact).

Figure 6. Evaluation of the shape diversity of virtual libraries based on spirocyclic scaffolds reported by Carreira and Rogers-Evans. [27] A) The scaffolds evaluated in this study. B) Mean principal moments of inertia of virtual libraries generated by decoration of the scaffolds once or twice using the standard set of capping groups. See the Supporting Information for further details.

Figure 7. Bull's approach to substituted oxetanes. [31] A) Synthesis of an exemplar oxetane (R = Bn, which was virtually removed before the computational study). B) Other scaffolds that were combined with 28 small amines to yield a virtual library of amides (see the Supporting Information). C) PMI plot of the 61 fragment-like compounds found in the virtual library (black), 257 fragments randomly selected from the eMolecules database (light gray) and 261 fragments randomly selected from the GDB-17 database (dark gray). A plane-of-best-fit analysis (Ref. [30]) is also provided in the Supporting Information.