Open Source Drug Discovery: Highly Potent Antimalarial Compounds Derived from the Tres Cantos Arylpyrroles

The development of new antimalarial compounds remains a pivotal part of the strategy for malaria elimination. Recent large-scale phenotypic screens have provided a wealth of potential starting points for hit-to-lead campaigns. One such public set is explored, employing an open source research mechanism in which all data and ideas were shared in real time, anyone was able to participate, and patents were not sought. One chemical subseries was found to exhibit oral activity but contained a labile ester that could not be replaced without loss of activity, and the original hit exhibited remarkable sensitivity to minor structural change. A second subseries displayed high potency, including activity within gametocyte and liver stage assays, but at the cost of low solubility. As an open source research project, unexplored avenues are clearly identified and may be explored further by the community; new findings may be cumulatively added to the present work.


■ INTRODUCTION
Malaria remains one of the world's most deadly diseases. There were an estimated 214 million cases of malaria in 2015, including around 438,000 deaths of which the majority, tragically, were young children. 1 Besides the threat to human health, there is significant economic and social impact on the affected communities with malaria costing Africa billions of dollars per annum in direct losses and even more when considering lost economic growth. 2,3 The continual threat of drug resistance has led to the World Health Organization (WHO) recommending that all treatments should only be used in combination; artemisinin combination therapies (ACTs) comprising an artemisinin derivative and a 4-aminoquinoline or amino alcohol currently represent the front line. However, the inevitable reports of resistance or tolerance, in the form of increased parasite clearance times, have already appeared. 4−6 Loss of the artemisinin class of drugs is a terrifying scenario that requires urgent risk mitigation. Apart from the introduction of the ACTs, no viable new drug for malaria has entered the market in the past 15 years, and the recent results of the Mosquirix vaccine phase III trials showed 18−36% efficacy depending on patient age and other factors. 7,8 New chemical series that can replace and complement the ACTs are urgently needed and are being sought by a combination of academic and industrial groups, sometimes in collaboration with nongovernmental organizations (NGOs). 9−11 Of particular interest are lead candidates with differentiated activity profiles, ideally targeting gametocyte or liver stage parasites in addition to blood stages. 12−15 The ability of the pharmaceutical industry to provide new medicines cost-effectively is diminishing. 16 The industry acknowledges that lack of innovation is a problem. 
17 Pharma is responsible for the creation of most marketed drugs, yet many of these are arguably not innovative; in contrast, academia and the biotech industry generate more innovative leads, but many are orphan drugs. 18 Such challenges disproportionately affect research into new medicines for tropical diseases, which would inevitably generate a slim profit margin unlikely to recoup the necessary expenses of research and development. 19 The weak status of many drug development pipelines in the pharmaceutical industry is driving the exploration of alternative models. The current model of drug discovery, whether in academia 20 or industry, can generally be characterized by secrecy and an underlying profit motive. 21 In the area of tropical diseases there are significant philanthropic efforts being made by many companies in providing treatments 22 and engaging in drug development 23 but also in conducting not-for-profit research. 24,25 There have been calls for greater sharing of data in the neglected tropical disease (NTD) field, 26,27 including the development of patent pools 28 and new Product Development Partnerships. 29 Repositioning of existing drugs is seen as a possible general strategy for the development of new antimalarials, even though the challenges of such an approach are clear. 30 There has been much recent discussion of the need for "Open Innovation", a term with a nebulous definition but typically describing a range of ideas from the sharing of data in a precompetitive environment through to competitions that allow organizations to bring in the best external ideas to complement in-house research but for which there is no requirement for any collaboration. 31 A model that has been mooted, 32−38 but never properly implemented and evaluated, is drug discovery and development where all data and ideas are freely shared, there are no barriers to participation, and there are no patents: so-called "Open Source" Drug Discovery. 
The requirement for total sharing of data as well as workflows in drug discovery and development (i.e., the experimental science, as opposed to the software used in the project 39 ) would mirror the same practices in open source software development, a model that has created robust and successful products in widespread use and formed the foundation of major industries as well as spawning for-profit open source software companies. It was shown recently that the opening up of a laboratory-based chemical research project to unrestricted participation by anyone accelerated the research because experts unknown to the core team were able to join the project and solve transient project needs. 36 That project involved the discovery of a new synthetic route to a known compound, specifically the active enantiomer of the drug of choice for the treatment of schistosomiasis, praziquantel. 40 The project benefited from the open sharing of chemical data and procedures on the Internet, i.e., using the Internet as a medium that facilitated collaboration and peer review where participants could influence the direction of the research before it occurred, rather than using the online content merely as an information resource. Several other initiatives have leveraged the advantage of open online discussion of chemical data and results generated by others. 41−43 The present work extends this idea to the identification of novel bioactive compounds. A previous demonstration of an open drug discovery cycle was provided by the Usefulchem project,

ACS Central Science
Research Article
which found four micromolar hits from a small product library after in silico target prediction and docking. 44 The Open Source Drug Discovery project in India has carried out a crowdsourcing project for annotation of the Mycobacterium tuberculosis genome. 45−47 In the field of biotechnology the CAMBIA organization used patents to enforce a code of conduct based on the open sharing of technologies, in that experimental tools could be freely used, provided no further patents were taken to restrict the use of those tools by others. 48 The "open source" moniker is not merely semantic and distinguishes such projects from other "open" ventures in several important regards, 38 described by the six laws that governed the operation of the present project (Figure 1, Figure S1). 49 Crucially, the research process (i.e., strategic discussions, issues involving doubt) takes place in the public domain. The Creative Commons license covering project content ensures free reuse of all content including for commercial purposes (CC-BY), 50 and the project uses a free or open source composite technical platform that has recently been reviewed. 51,52 In recent years, teams led by Novartis, St. Jude's Children's Research Hospital, and GlaxoSmithKline (GSK) Tres Cantos have released large data sets of antimalarial compounds derived from phenotypic high throughput screening (HTS) campaigns. 53−56 The GSK Tres Cantos Antimalarial (TCAMS) data set contained 13,533 filtered hits which were subsequently prioritized and grouped to provide numerous starting points for other research groups; 57 all of these 47 scaffolds have been explored either internally by GSK, by independent groups, or in collaboration between GSK and academic groups. GSK themselves have published evaluations of two of these priority series, the cyclopropyl carboxamides 58,59 and an indoline series. 60 No further optimization work has been performed by GSK on these two series due to the inherent risks identified. 
The original GSK data set was used to identify starting points for the present campaign, resulting in the selection of TCMDC-123812 (OSM-S-5, Figure 2A) and its 4-aminoantipyrine derivative TCMDC-123794 (OSM-S-6) because of their attractive physicochemical properties, such as low logP and molecular weight, coupled with promising bioactivity and therefore presumed high ligand efficiency. (Compound numbering in this paper is based on the original project numbering, rather than a renumbering for this paper, allowing simpler cross-correlation between this paper and the live Web sites. The convention used is Open Source Malaria (OSM)-first letter of the city in which the compound was first made (e.g., S = Sydney)-incremental integer, as in OSM-S-5. Batch numbers are included in internal project numbering schemes.) A number of compounds in the original TCAMS set 53 featured the arylpyrrole core with alternative head groups lacking the ester linkage (Figure 2B). TCMDC-123563 was discounted (Figure S2) 61 as it represented a singleton and contained an unfavorable ketone linkage. A cluster of related compounds, the 2-iminothiazolidin-4-ones (the "near neighbors", NN), was targeted because the members (including TCMDC-124103, -125697, and -125698) possessed promising activities without the ester and indicated tolerance to variation elsewhere in the structure. While this work was being written up for publication, Gilbert et al. published details of a series of pyrrolones (identified through an unpublished screen performed by the World Health Organization Special Programme for Research and Training in Tropical Diseases (WHO-TDR) but also identified in the Novartis antimalarial data set 55 ) that have some structural similarities to the NN compounds (i.e., an arylpyrrole joined through a double bond to a different heterocycle), and comparisons will be drawn below with this series. 
62,63 It was important to resynthesize the original hit compounds (Figure 2A) to confirm their activity with authentic samples prior to beginning a hit to lead campaign where the NN compounds would act as a potential backup subseries.
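The compound numbering convention described above (OSM, then a city letter, then an incremental integer) lends itself to a small parser when cross-referencing labels against the project data sets. The following is only a minimal sketch: the function and class names are hypothetical, and the assumption that the city code is always a single capital letter is inferred from labels appearing in the text (S, A, E) rather than stated by the project. Batch numbers, which the project uses internally, are not handled.

```python
import re
from typing import NamedTuple

# Labels look like "OSM-S-35": project prefix, one city letter, integer.
# Assumption: the city code is a single capital letter. Only "S = Sydney"
# is defined in the text; other letters are accepted but not interpreted.
_LABEL = re.compile(r"^OSM-([A-Z])-(\d+)$")


class CompoundLabel(NamedTuple):
    city_code: str  # first letter of the city where the compound was first made
    number: int     # incremental integer


def parse_label(label: str) -> CompoundLabel:
    """Split a label such as 'OSM-S-35' into its city code and integer."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not an OSM compound label: {label!r}")
    return CompoundLabel(match.group(1), int(match.group(2)))


def format_label(parsed: CompoundLabel) -> str:
    """Rebuild the text label from its parsed parts."""
    return f"OSM-{parsed.city_code}-{parsed.number}"
```

For instance, `parse_label("OSM-S-35")` yields the city code `"S"` and the integer `35`, while a TCAMS identifier such as `"TCMDC-123812"` is rejected with a `ValueError` because it does not follow the OSM convention.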

Experimental details for the chemistry in this paper may be found in full in the relevant electronic laboratory notebook 64 and are summarized in the Supporting Information (Chemical Protocols; figures with the prefix "SC" may be found therein).
The two original GSK hits (OSM-S-5 and -6) were successfully resynthesized via a novel pyrrole acid (OSM-S-4, Figure SC1) that was prepared via a Paal−Knorr cyclization of the relevant aniline and ethyl 2-acetyl-4-oxopentanoate. 65,66 This approach was found to be superior to an alternative method involving initial synthesis of the unfunctionalized N-arylpyrrole, followed by conversion to the corresponding aldehyde with a Vilsmeier−Haack reaction (a procedure that was improved through a community suggestion (Figure S3) 67 ) and then oxidation to a carboxylic acid, because the pyrrole aldehyde was found to be remarkably resilient to a range of oxidants. (An alternative route using a Friedel−Crafts acylation between the unsubstituted pyrrole and ethyl chloroformate, suggested in an e-mail from the community, gave only starting material in two attempts.) To confirm the promise of the two starting points OSM-S-5 and -6, they were evaluated against 3D7 (drug-sensitive) and K1 (chloroquine-resistant) strains of Plasmodium falciparum in a whole cell assay and against HEK-293 cells as a cytotoxicity marker (Table SB1, Biological Protocols; tables with the prefix "SB" may be found in the Supporting Information). Biological data may be browsed in a static data set taken as a snapshot for this paper (Data Sets S1 (Excel), S2 (SDF)), online in a database constructed through periodic batch uploads, 68 or in a "living database" to which future data may be added; 69 the latter may be visualized in a web browser 70 using an open source system that was recently deployed in the Wikipedia Chemical Structure Explorer, 71 or the data may be downloaded and visualized offline with proprietary (Figure S4) 72 or open source (Figure S5) 73 tools. 
The evaluation was performed in three different institutions using different assays and widely employed controls; controls are important to assess reproducibility and interassay variability 74 but also to minimize possible bias in evaluating compounds where there are pre-existing data from other researchers already in the public domain (Figure S6). 75 The tests confirmed that the original compounds TCMDC-123812 (OSM-S-5) and TCMDC-123794 (OSM-S-6) are potent (300−500 nM range), although slightly less so than previously reported (IC50 of ca. 330 and 54 nM respectively from a colorimetric LDH assay over 72 h 53 ), with low associated cytotoxicities and similar efficacy against 3D7 and K1 strains. The stability of the ester linker under biologically relevant conditions was expected to be poor, so the aldehyde, ethyl ester, and carboxylic acid 4-fluoropyrroles made en route to these compounds were evaluated as potentially active fragments but were found to exhibit relatively low activity, suggesting that the parent compounds do not act as prodrugs in this way (similar inactivity was observed for the 4-H, 4-Me, and 4-CF3 aldehydes, esters, and acid fragments) (Data Sets S3−S5).
Amide Analogues of the Original Hits. Replacement of the ester with a hydrolytically more stable amide was undertaken through synthesis of eight derivatives (Figure 3), most of which were obtained through SOCl2-mediated conversion of acid OSM-S-4 to the acid chloride. Compound OSM-S-16 served as a control lacking the pyrrole moiety, OSM-S-8 served as a truncated "des-glycinyl" analogue, and the importance of amide methylation was explored with compounds OSM-S-59 and -93. In addition, the six most relevant commercially available compounds were purchased (Figure SC2). (For more on strategies for selecting compounds for commercial acquisition, colloquially known as "SAR by catalog", see the Supporting Information (Text S2 and the component files referred to therein).) The conversion of ester to amide resulted in essentially total loss of biological activity in all cases (Table SB2), even for the analogues involving minimal changes (OSM-S-19 and OSM-S-21), so the amide series was not explored further.
The Alternative "Near Neighbor" 2-Iminothiazolidinones. In parallel with the initial evaluation of the amide analogues, a number of NN analogues were synthesized (Figure 4), typically from a Vilsmeier−Haack formylation of the relevant pyrrole to the corresponding aldehyde and then condensation with the appropriate 2-iminothiazolidin-4-one (Figure SC3). 76−78 While double bond geometry was undefined for the original hits, Z-geometry was established here by X-ray crystallography on four compounds and therefore assumed more broadly for the series (Figure SC4). Given the low predicted solubility of these NN compounds, several were generated with lower clogP values and submitted for biological testing (OSM-S-109 through -115, OSM-S-108, and OSM-S-138), alongside analogues synthesized and submitted by an independent undergraduate laboratory cohort: OSM-A-1 through -4 and resyntheses of OSM-S-37 and -111.
Many compounds in the NN series showed high potencies (shown schematically in Figure 4; raw data in Table SB3), with several found to be more active than the original TCAMS hits. The compounds exhibited low associated cytotoxicity. The aryl component of the pyrrole moiety was moderately tolerant to changes (though small changes could result in large changes in activity, exemplified by the 4-F and 3-F isomers), while the thiazolidinone component was found to be more sensitive. Incorporation of cyclopentyl, phenyl, and acetyl components was tolerated, but the methylenenitrile group was not. Replacement of the arylpyrrole moiety with a phenyl resulted in loss of activity.
While high potency was obtained with several members of the NN series, this was achieved at the cost of high calculated lipophilicity, with many compounds in this series exhibiting calculated logP values of 5 or more and correspondingly poor lipophilic efficiency (Figures S15 and S16). 79 Solubility was a challenge in several of the assays examining these compounds. Those compounds with more polar groups on the constituent aromatic rings suffered a drop in potency, but the pyridinyl analogue OSM-S-51 (calculated logP = 1.8) provides a possible future line of inquiry for the community. A lipophilicity/potency trend was also generally seen in the pyrrolone series, 62 though potency was seen for several compounds containing a substituted piperidine ring in place of the N-aryl group. 63 The open nature of the project enabled regular consultation with the wider medicinal chemistry community in real time, i.e., where community input could influence the direction of the
research. An important contribution to the project from outside the core experimental team (Figure S17) 80 was discussion of whether the most potent members of the NN series were pan assay interference compounds (PAINS), i.e., compounds frequently appearing as hits in high throughput screens yet which do not exhibit a straightforward "drug-like" interaction with a biological target. 81 The OSM compounds were run through the KNIME PAINS filter, 82,83 and both the 2-imino-4-thiazolidinone and arylpyrrole components of these compounds were flagged as potential PAINS, with the proposed cause of the interference of the former being the thiazolidinone exo-double bond acting as a Michael acceptor, though it has been noted that this core is much more problematic in rhodanines than in their 2-imino counterparts. 84 The topic of PAINS has been the subject of extensive recent discussion in papers 84−88 and in online communities (Figures S18 and S19). 89,90 Although most concern has centered on rhodanines, any related structure could be problematic if it contains a potentially reactive conjugated exo-double bond. In the area of chemotherapeutic and antiparasitic agents, such motifs may still be present in viable leads. 84 The negative view of rhodanine derivatives in the medicinal chemistry community is generally derived from academic reports and patents where positive assay hits have been reported without adequate evaluation of SAR or elucidation of the mode of action. A complicating factor, often not considered, is that PAINS are defined on the basis of results from target-based screens, where one would not link a cellular readout to the target if nonspecific protein reactivity were possible. 84 Conversely, a covalent modifier from a cellular screen may still be useful as a probe under some circumstances. 
91 The present compounds may not be problematically reactive and may still be progression candidates for the following reasons: (1) the parent hit compounds (TCMDC-123812 (OSM-S-5) and -123794 (OSM-S-6), Figure 2) were shown not to be "promiscuous" frequent hitters in the original GSK data; 53 (2) the assay data described above show that the relevant 2-imino-4-thiazolidinone fragment was inactive on its own (OSM-S-55, Table SB3, entry 30); and (3) preliminary experimental controls performed to assess the reactivity of the exocyclic double bond (Figure SC5) showed no reactivity to hydrogenation or to the addition of hydride or a thiol in model cases. Yet it was noted that these compounds not only are closely related to known PAINS but also fail by ALARM NMR filtering, which is designed to detect known protein-reactive cores. 92−94 Ultimately, the authorship team adopted varying positions. Overall, given the poor physical properties of these compounds and the extensive additional work that would be needed to fully mitigate the series risks, and encouraged by one of us (J.B.B.) to move away from PAINS-like structures, the team decided not to further pursue this subseries.
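The trade-off between potency and lipophilicity noted above for the NN series (calculated logP values of 5 or more and poor lipophilic efficiency) can be made concrete with the usual definition of lipophilic efficiency, LipE = pIC50 − logP. The sketch below is illustrative only: the function names are hypothetical, the paper does not state which exact formula or logP calculator underlies Figures S15 and S16, and the numbers in the usage note are invented inputs rather than measured project data.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def lipophilic_efficiency(ic50_nm: float, clogp: float) -> float:
    """LipE = pIC50 - logP (standard definition, assumed here)."""
    return pic50_from_nm(ic50_nm) - clogp
```

As a hypothetical illustration, a compound with an IC50 of 10 nM and a calculated logP of 5.0 has a pIC50 of 8 and therefore a LipE of 3, so a tenfold gain in potency is cancelled, in LipE terms, by a one-unit rise in logP.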
Poor solubility with excessive lipophilicity may not just impart poor pharmacokinetic properties but also drive nonspecific protein reactivity through hydrophobic burial that may not be picked up in in vitro assays. The emphasis for this series moved toward analogues that promised improved solubility. As with all key strategic decision points in this open source project, discussion of possible structures took place in an online public consultation. 95 The meeting recalibrated the project focus with selection of the next synthetic compounds and agreement on which commercially available analogues to purchase and evaluate (Figure S24). 96 This community consultation confirmed ethers, amines, sulfonamides, oxadiazoles, and substituted ester analogues of the original arylpyrrole hits as the most valuable targets (Figure S25). 97 Synthesis of a shortlist of such analogues was planned and undertaken by whoever wished to do so (half of the ten top-ranked shortlisted compounds were ultimately made; synthetic planning assistance was received gratis from the private sector (Supporting Information, Texts S6 and S7)), and the most relevant commercially available compounds were purchased and evaluated. GSK assessed the proposed compounds and confirmed (publicly) that none of the molecules had previously been evaluated for antimalarial activity as part of the TCAMS screen.
Analogues in the Arylpyrrole Series. Several analogue series of the original arylpyrroles were synthesized and evaluated (Figure 5; raw data in each case may be found in the Supporting Information).
Ethers. One of the most promising proposed analogues was the ether OSM-S-236. A number of approaches failed to generate the desired product, attributed to the instability of either the arylpyrrole alcohol starting material OSM-S-11 (which decomposed on silica and when stored under inert conditions at 2 °C) or (if formed) the desired product itself (Figure SC6). The synthesis of this compound was abandoned, given that similar side reactivity may be seen in vivo, although such reactivity could be mitigated through the use of more electron-deficient pyrroles.
Amines. Four amine analogues (OSM-S-58, -60, -94, and -95) were successfully prepared by reductive alkylation of the relevant pyrrole aldehyde, and three additional amine analogues were purchased (OSM-S-88 to -90, Figure SC7). All were found to be inactive. The synthesized structures were found to be prone to decomposition under ambient conditions.
Modified Esters. In an attempt to decrease the hydrolytic lability of the ester, methyl groups were introduced adjacent to this functional group (OSM-S-116 and -68) (Figure SC8). To assess the influence of introducing methyl groups to the terminal amide, analogues OSM-S-82 and OSM-S-91 were purchased. All compounds were found to exhibit low potency (Table SB4), further suggesting that minor structural changes to the potent TCAMS hit compounds reduce activity.
Ketones. Friedel−Crafts acylation of a pyrrole precursor with succinic anhydride and subsequent amidation provided three ketone analogues (OSM-S-98, -102, and -103, Figure SC9) that were found to exhibit essentially zero potency, with only OSM-S-103 showing any activity at higher concentrations.
Sulfonamides. Three sulfonamide analogues (OSM-E-1 to -3) were synthetically derived directly from the relevant arylpyrrole (Figure SC10) and were also found to lack potency.
Pyrazoles. Two pyrazole analogues (OSM-S-57 and -92) were synthesized (Figure SC11) and evaluated to assess the impact of modifying the arylpyrrole core. Both compounds were found to be inactive; the fluoro analogue of OSM-S-57 was thus not synthesized. A related heterocycle alteration was also found to be detrimental to the literature pyrrolone series. 62
Oxazoles. The oxazole analogue OSM-S-105 was prepared from the carboxylic acid of the corresponding arylpyrazole, though the analogous sequence for the parent arylpyrrole series could not be completed (Figure SC12). This compound and several of the synthetic precursors were evaluated and all found to be inactive.
Triazoles. The triazole analogue OSM-E-8 was synthesized via a Cu(I)-catalyzed cycloaddition reaction (Figure SC13) and evaluated alongside four synthetic precursors and side products, but all compounds were found to be inactive.
Synthetic Threads That Remain Open. Several synthetic targets for this series remain open ( Figure 6) such as the ether compound OSM-S-236 (though it seems likely that this compound will be unstable) and the oxazole OSM-S-246. The oxadiazole shown was proposed, and preliminary experiments toward its synthesis were performed (see precursor OSM-S-269 in the project laboratory notebooks for further details). 64 The oxadiazole is a common replacement for carbonyl containing compounds in hit to lead campaigns, 98 and so it was reasoned that the inclusion of this heterocycle might have favorable consequences for drug metabolism, though a commercial oxadiazole analogue (OSM-S-85, Supporting Information) had been found to be inactive, leading to the synthetic effort toward the oxadiazole being downgraded. The tolerance of the NN set to the introduction of the pyridine in OSM-S-51 (Figure 4) could be explored further as a means to increase solubility in that cluster, and a pendant substituted piperidine was found to lead to several potent compounds in the pyrrolone series that possesses some structural similarities to the NN series. 63 Indeed variation of the aromatic group in the arylpyrrole series was not explored given the intractability of substituting the ester: given the tight SAR, low solubility, and poor metabolic stability observed for the series, the project viewed the probabilities of success as limited and so did not pursue these targets.
In silico pharmacophore modeling has to date proven ineffective at high-confidence analogue prediction, but this remains an open challenge 99 to which others may contribute given the data set available. 68,69 Some preliminary results suggesting a common feature map for the arylpyrrole and NN subseries were of particular interest and could be explored with the more substantial bioactivity data now available.
Along these lines, an automated bioisosteric replacement analysis 100 was performed for the NN series (Text S8, Data Set S19) focusing on replacement of the pyrrole phenyl substituent, and output suggestions were generated for potent compound OSM-S-35 (Figures S27−S30); these may be explored in the future.
All these structures are available for investigation by the community, building on the unsuccessful attempts detailed in the online laboratory notebooks. It is hoped that the sharing of negative synthetic data in this way (16 attempts in the case of the ether OSM-S-236) will lead to a faster completion of syntheses of analogues in the future since prior attempts are not "orphaned" in undisclosed or unpublished notebooks. Participants in future work may be physically located anywhere; they are requested (but not obligated) to operate open source (unrestricted sharing of all data and ideas) to avoid wasteful duplication of effort. The results, whether substantial or incremental, may be added to the series wiki. 101 However, it is important to acknowledge the limitations of the series identified to date, meaning further analogue synthesis undertaken by the community in the absence of better knowledge of the biological target is likely to be unproductive.
Advanced Biological Evaluation. To assess further the attractiveness of this class of compounds, attention was focused on the most potent of the newly discovered analogues and the original hits. Given that the compounds arose from a phenotypic assay, no mechanism of action (MoA) was known, and preliminary investigations described below were designed to probe this, with a view to minimizing any potential MoA overlap with other compounds already in development.
Metabolic and Solubility Assays. The two original GSK hits (OSM-S-5 and -6) and six NN compounds were evaluated for their kinetic solubility in phosphate buffer and their metabolic stability in human liver microsomes (Table SB5, Data Sets S20 and S21). Both the arylpyrrole esters showed good solubility, but showed degradation in microsomes even in the absence of cofactors for cytochrome P450 and glucuronidation enzymes (NADPH and UDPGA, respectively) suggesting degradation by nonspecific enzymes. The iminothiazolidinone (NN) compounds showed generally low rates of metabolic degradation but at a cost of very low solubility (a general feature also of the literature pyrrolone series 62 that displayed typically higher rates of metabolic clearance).
Oral Efficacy in Mice. Representative compounds from both subseries were evaluated in an oral in vivo mouse trial (Data Set S22). The original starting points OSM-S-5 and -6 (Figure 2) were chosen along with NN representative OSM-S-35 (Figure 4). All three compounds were found to be inactive in vivo in Plasmodium berghei ANKA infected mice at 50 mg/kg after 4 days po. It is plausible that at least some of the arylpyrrole ester compounds would be degraded by general hydrolysis during absorption, and this could explain their inactivity in the mouse model despite their more favorable (although still high) cLogP values. Analysis of the plasma samples from the trial with OSM-S-5 showed that the compound was indeed orally available, but concentrations in the blood were above the EC50 for only approximately 4 h (Data Set S23). The same compound was evaluated for stability in human and mouse plasma and found to be susceptible to hydrolysis in mouse plasma (t1/2 114 min), but it was stable in human plasma with no measurable loss after 240 min (Data Set S24). Esterase activity is higher in rodents than in other species, 102 as confirmed by using p-nitrophenol acetate as a control compound in this assay. By way of comparison, the literature pyrrolone series had also exhibited low oral bioavailability that likely resulted from a combination of low solubility and significant metabolic clearance. 62,63 A metabolite identification and glutathione trapping experiment was carried out on OSM-S-35 in the presence of metabolic activation (human microsomes, Data Set S25). A number of metabolites were detected, mainly oxygenated species (mono-, bis-, and trioxygenated metabolites) with the predominant metabolite (based on peak area and assuming similar response factors for each metabolite) arising from likely hydroxylation of the pyrrole-substituted benzene. 
In the presence of glutathione ethyl ester (GSH-EE), weak signals for adducts were observed for both parent compound and mono- and bisoxygenated metabolites. In the case of the parent compound, the GSH-EE adduct was observed both in the presence and in the absence of metabolic activation. Further characterization of the adducts was precluded by the very weak MS/MS spectra; however, the detection of adducts suggests that the formation of reactive species cannot be ruled out.
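To put the plasma half-life quoted above (t1/2 of 114 min for OSM-S-5) in context, the fraction of compound expected to remain after a given incubation time can be estimated from the half-life. This is a sketch under a stated assumption: it treats the hydrolysis as a simple first-order process, which the text does not explicitly confirm, and the function names are hypothetical.

```python
import math


def rate_constant(half_life_min: float) -> float:
    """First-order rate constant k = ln(2) / t_half, in 1/min."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_min


def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of parent compound left after t_min, assuming first-order decay."""
    return math.exp(-rate_constant(half_life_min) * t_min)
```

Under that assumption, a half-life of 114 min would leave a little under a quarter of the parent compound after the 240 min incubation, which contrasts with the absence of measurable loss reported over the same period in human plasma.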
hERG Liability. There is increasing awareness of the importance of hERG inhibition for drug candidates; hERG (the human ether-a-go-go-related gene ion channel) is sensitive to blockade by many drug-like structures. Such blockade has led to a number of prominent postmarketing withdrawals. 103 Regulators are sensitive to any hERG activity, and a hERG counterscreen is often now run early in hit characterization and series prioritization. Compounds OSM-S-5 and OSM-S-35 were shown not to suffer from significant hERG activity (IC50 > 33 μM vs 0.7 μM for control compound quinidine), implying that the original ester and NN class of compounds are not likely to exhibit undesirable cardiac side effects later in development (Data Set S26, Text S11).
Late Stage Gametocyte Assay. There are relatively few compounds effective against the gametocyte stage of the parasite, 12,13 though such compounds are important in the prevention of disease transmission. Four compounds found to be active in the asexual assay were evaluated against late stage (IV−V) gametocytes. The results (Table 1, Data Sets S27 and S28) indicated that the NN compounds OSM-S-38 and OSM-S-39 exhibited highly promising IC50 values of 4 nM and 2.6 nM respectively, comparable to the activities of artemisinin and artesunate. Compound OSM-S-9 also showed good activity while one of the original hit compounds (OSM-S-5) exhibited low levels of activity. Several compounds evaluated were found to lead to an unusual parasite morphology that may be indicative of a slow-acting mechanism of action (Supporting Information, late stage gametocyte assays 1 and 2).
When the compounds were evaluated in a dual gamete formation assay (DGFA) 104 to evaluate separately the susceptibilities of male and female mature stage V gametocytes to both the hit ester (OSM-S-5) and a selection of NN compounds, it was found that all compounds possessed low activity at 1 μM against both sexes (Text S10). The discrepancy in the data arising from these two gametocyte assays may arise from the slightly different cell biology assayed (stage IV/V gametocytes vs stage V mature gametocytes) or most likely is a function of the relative compound exposure times (96 h vs 24 h).
Liver Stage. Liver stage activity is strategically important in antimalarial drug discovery because compounds that block development of exoerythrocytic parasite stages in hepatic cells often prove to have causal prophylactic activity in animal models. 14,105 Three compounds (original hit OSM-S-5 and the NN compounds OSM-S-38 and -111) were assessed for their activity against sporozoites in liver cells (vs atovaquone as positive control) and displayed varying potencies (Table 2; Data Sets S29−31; Figure S31) that track with blood stage potency. Of particular note is the striking potency of the nontoxic compound OSM-S-38, which may at this level provide protection from malaria to people who have been treated with the compound after an infectious bite. Given the similarity observed in the possible mode of action (see next section), it is unclear why the arylpyrrole compound OSM-S-5 should exhibit such low levels of activity in the present assay vs the NN compounds. Possible explanations include differential solubility or stability; the level of cytotoxicity observed for OSM-S-5 suggests that the liver stage potency is probably just a generic toxic effect.
Mode of Action. As with most phenotypic hit-to-lead projects, the activity assays were carried out on whole parasites, giving a more realistic measure of activity than enzyme-based assays at the expense of an unknown mode of action. To predict the mode of action of the members of these series, eight representative active compounds (structures in the Supporting Information) across both subseries were evaluated in a yeast-based genetic sensitivity Hip Hop assay 106,107 that seeks mutations that result in enhanced compound potency, i.e., to identify any sensitive biological processes. The most potent compounds (OSM-S-39 and -51) showed enrichments for processes involved in chromatin remodeling and DNA repair. Indeed, inspection of the rank-ordered genes for all compounds showed a preponderance of genes for diverse components of chromatin architecture, remodeling, and post-translational modification of chromatin proteins. These results are consistent with a perturbation of chromatin condensation with predicted follow-on effects on gene expression. Several groups have highlighted apparent plasticity of global gene expression in the malaria parasite which is manifested as a complex program of histone modification. 108,109 To attempt to gain additional insight from the genome-wide screens, we compared the data from several OSM compounds to a data set of 3200 similar genome-wide screens to identify the 10 most similar profiles for each of these OSM analogues. This provided clear concordance between these results and screens for other compounds, for OSM-S-39 (danthron, an anthraquinone derivative, as well as artemisinin), OSM-S-51 (the antineoplastics mitoxantrone and mitomycin C), and OSM-S-9 (the antiarrhythmic agent DTBHQ). Taken together, these data point to a global dysregulation of chromatin architecture and suggest that further bioinformatics analysis both within the Hip Hop data set and to orthologous data sets may be informative.
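The comparison of each OSM profile against a compendium of reference screens to retrieve the 10 most similar profiles can be sketched as a simple nearest-profile search. The text does not state which similarity measure was used; Pearson correlation between per-strain fitness scores is an assumption here, and the function names and data layout are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no ranking information
    return cov / (sx * sy)

def most_similar(query, reference, k=10):
    """Return the k reference profiles best correlated with the query.

    `reference` maps a screen name to its profile (one score per strain,
    in the same strain order as `query`). The metric is an assumption.
    """
    scored = [(name, pearson(query, prof)) for name, prof in reference.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy profiles, purely illustrative (not data from this study).
toy_reference = {
    "screen_a": [2.0, 4.0, 6.0, 8.0],
    "screen_b": [4.0, 3.0, 2.0, 1.0],
    "screen_c": [1.0, 1.0, 2.0, 2.0],
}
top = most_similar([1.0, 2.0, 3.0, 4.0], toy_reference, k=2)
print([name for name, _ in top])
```

With real data the query would be one compound's genome-wide profile and the reference would hold the 3200 compendium screens.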
The finding that the profile of the compounds is similar to that of known antimalarials (e.g., artemisinin) is provocative and warrants consideration. While it is formally possible that the predicted mechanism of select OSM compounds is similar to that of artemisinin, in our experience the degree of concordance we see between the OSM compounds and artemisinin suggests that the OSM compounds perturb a similar, but distinct, cellular target or pathway. This observation is consistent with our recent observation that, in a compendium of 3200 chemogenomic screens, the majority of compounds fell into ∼45 cellular response classes. 106 Our finding that different compounds from the same series have different profiles is also compelling and is consistent with previously published work demonstrating that single atom changes within a compound class can produce distinct cellular responses, which we attribute to small differences in compound structure resulting in significant fitness differences. 110 In order to derive experimental evidence for possible differences in mode of action, a parasite reduction ratio (PRR, generically "rate of killing") 111 assay was run for six of these compounds (across both arylpyrrole and NN series) even though this assay is not a direct measure of differences in gene expression. The results (Text S1, Figure S37) suggest a common mechanism of action between the subseries and one that is distinct from artemisinin, which has previously exhibited a substantially faster killing profile. It remains a possibility that the in vivo target and/or impact on gene expression of the compounds in the arylpyrrole/near neighbor series are distinct and yet they still share a common mode of action, but this was not investigated further here.
The potency of the near neighbor set led to an in silico target prediction study being performed on OSM-S-39 (Text S11). The method employed a naive Bayes statistical model associating molecular structural features of small molecules with protein targets, using the ChEMBL database. Similar statistical approaches based on known activities of compounds have been successfully applied to identify targets of antitubercular compounds. 112−114 A total of 1,287 proteins were included for which at least 50 active compounds were known with activities <10 μM, with the other compounds in the database comprising the inactives. After the OSM compounds were scored against each target, the scores were standardized against the scores obtained for a random set of >10K compounds. The prediction list was culled for proteins relevant to malaria (3D7 proteome), and the analysis gave as the most likely candidate targets, in order of significance, carboxy-terminal domain RNA polymerase II polypeptide A small phosphatase 1 (Q8I3U9), dihydroorotate dehydrogenase (DHODH, Q08210), SUMO-activating enzyme subunit 2 (Q8I553) and 1 (Q8IHS2), and cyclin-dependent kinase 1 (P61075). To kick-start the process of exploring these predictions, this compound and 24 others representative of both subseries were evaluated in an experimental assay for DHODH inhibition (Data Set S36; Text S14) vs two positive control compounds (TCMDC-125840 and -123822) known to inhibit this enzyme. 115 None of the compounds exhibited any activity, strongly suggesting that this is not the target for either subseries of compounds. Equivalent assays against the other targets identified in the in silico prediction remain an open line of inquiry.
Involvement of PfATP4, a putative plasma membrane ion pump, in the mechanism of action was ruled out. The relevant experiments measure the effect of a compound on the cytosolic [Na+] ([Na+]i) in isolated (trophozoite-stage) parasites preloaded with the Na+-sensitive fluorescent indicator SBFI. PfATP4 is important since it is the proposed target of the antimalarial compound KAE609 116 that has recently successfully completed phase 2 trials, 117 as well as that of a pyrazoleamide, 118 a dihydroisoquinolone ((+)-SJ-733), 119 various aminopyrazoles, 120 28 of the 400 potent antiplasmodial compounds comprising the MMV Malaria Box, 121 and a triazolopyrazine series being investigated by the Open Source Malaria Consortium. 122 All of these compounds cause an immediate increase in parasite [Na+]i on addition to isolated parasites. Here, several compounds across both subseries were tested at 1 μM for their effect on [Na+]i in saponin-isolated SBFI-loaded trophozoite stage 3D7 parasites. 116 In each case there was no significant effect on [Na+]i, consistent with PfATP4 not being the relevant biological target of this compound class (Data Set S37; Figure S38).
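As a rough illustration of the in silico target prediction described earlier in this section (a naive Bayes score per target, standardized against scores for a random compound set), the sketch below works on compounds represented as sets of structural feature identifiers. The feature representation, the add-one smoothing, and the z-score form of the standardization are all assumptions made for illustration; the text does not specify them, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

def train_feature_weights(actives, inactives):
    """Per-feature log-likelihood ratios for one target.

    Each compound is a set of feature identifiers. Add-one smoothing is
    an assumption, used so that unseen features do not give log(0).
    """
    features = set().union(*actives, *inactives)
    n_act, n_inact = len(actives), len(inactives)
    weights = {}
    for f in features:
        p_act = (sum(f in c for c in actives) + 1) / (n_act + 2)
        p_inact = (sum(f in c for c in inactives) + 1) / (n_inact + 2)
        weights[f] = math.log(p_act / p_inact)
    return weights

def score(compound, weights):
    """Naive Bayes score: sum of weights for the features present."""
    return sum(weights.get(f, 0.0) for f in compound)

def standardized_score(compound, weights, background):
    """Z-score of a compound's score against a random background set.

    Needs at least two background compounds with differing scores.
    """
    bg = [score(c, weights) for c in background]
    return (score(compound, weights) - mean(bg)) / stdev(bg)

# Toy feature sets, purely illustrative (not data from this study).
actives = [{"a", "b"}, {"a", "c"}]
inactives = [{"c", "d"}, {"d"}, {"b", "d"}]
background = [{"d"}, {"c", "d"}, {"a", "d"}, {"b"}]
w = train_feature_weights(actives, inactives)
print(standardized_score({"a", "b"}, w, background))
```

Repeating this per target and ranking the standardized scores would give a prediction list of the kind described, which could then be filtered to proteins present in the organism of interest.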

■ CONCLUSIONS
The public deposition of novel antimalarial hits from phenotypic whole-cell assays has had a significant effect on worldwide antimalarial drug discovery by providing an embarrassment of riches for the early stages of discovery. The plethora of alternative structures available for investigation has ultimately led to the series described in this paper being "parked" in favor of other possible avenues of inquiry. Interestingly the "stop" decision was straightforward to make in part because the decisions taken communally in the project had to be justifiable to all onlookers. The time taken to reach the stop decision in this case was probably slightly longer than would be expected from a traditionally structured project because certain contributions were not paid for or grant-supported, necessitating a lower priority than core interests of the contributing laboratories.
Both subseries investigated have members with major strengths, including potency with low molecular weight and significant late stage gametocyte and liver-stage activity coupled with low toxicity and low levels of activity in the hERG assay. Indeed, OSM-S-5 is bioavailable. Many of the most obvious structural changes to the hits, in several cases changes of a single atom, led to total loss, rather than moderation, of biological activity, known colloquially as "activity cliffs" (Figure 7). In the case of the ester-containing hit OSM-S-5, the ester was likely the key metabolic liability but could not be replaced with other common isosteres without loss of activity, yet (in the mouse model, at least) was found to be too short-lived in plasma to be taken on further. Other minor changes remote from the ester were also found to result in a precipitous decrease in potency. The "near neighbor" analogue set displayed impressive potency coupled with low cytotoxicity but alongside low solubility that could not easily be engineered out through side chain modifications.
Figure 7. Sensitivity of the initial hit OSM-S-5 to minor structural changes.
The emergence of large amounts of open data in the field of drug discovery for malaria makes it straightforward to search for new ways forward in a project where a stop decision has been made. A similarity network map was generated for the most potent compound identified (OSM-S-39, Data Set S38, Figure S39) that identifies those compounds most similar in structure known to the open databases ( Figure 8). Some of these are now represented in the MMV Malaria Box, 123 making simpler their investigation by other groups. A portion of this map is represented, alongside the structures and potencies of the closest neighbors in the map. While it is clear from this analysis that OSM-S-39 remains the most potent compound identified in this class, exploration of the neighboring structures in such open data sets may identify a way forward for this series that could suggest unexplored strategies to increase solubility with a realistic chance of maintaining potency, such as the potential scaffold-hop to the triazine TCMDC-125770. (One of the compounds shown from the Novartis screen, GNF-Pf-5137/1137, is from the same series as the published pyrrolones. 62 ) The recommended way forward, rather than further analogue synthesis in the present series, is to pursue clarity in the mechanisms of action of such series using, for example, generation of resistance coupled with genomic sequencing, or pull-down studies. Subsequent screening might then establish better starting points from related structures (informed by what has already been tried via comparison with network maps) against the relevant target.
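A similarity network of the kind described above can be thought of as a graph whose edges link compounds that exceed a structural similarity threshold. The minimal sketch below uses Tanimoto similarity on fingerprints represented as sets of feature indices; the choice of metric, the threshold, and the compound labels are assumptions for illustration, since the text does not describe how the map was built.

```python
from itertools import combinations

def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_edges(fingerprints: dict, threshold: float = 0.7):
    """List (name1, name2, similarity) for every pair at or above threshold."""
    edges = []
    for (n1, f1), (n2, f2) in combinations(fingerprints.items(), 2):
        s = tanimoto(f1, f2)
        if s >= threshold:
            edges.append((n1, n2, s))
    return edges

# Toy fingerprints with hypothetical labels (not real compound data).
toy = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {1, 2, 3, 5},
    "cmpd_C": {7, 8},
}
print(similarity_edges(toy, threshold=0.5))  # [('cmpd_A', 'cmpd_B', 0.6)]
```

An edge list in this form can be written out as a simple table and loaded into a network viewer for inspection.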
One of the unique features of this project, the open source research method, ensures that the unexplored lines of inquiry remain open alongside the attendant data posted online that makes it straightforward for others to resume any portion of the research project as fully fledged participants, with access to both positive and negative data, details of all procedures as they were carried out (to aid reproducibility), and anecdotal insight into project loose ends that are easy to explore.
The machine-readability of the present project (for example the use of cheminformatic strings in the online electronic lab notebook) permits an unusually straightforward link between a high throughput screening result in a public database and a "live" research project that has investigated that compound.
Contributions to this project arose from disparate sources: (i) core government/NGO/foundation support, via both direct grant support and in-kind contributions (it is likely that direct support from public/philanthropic funds will always be needed as part of any open source drug discovery campaign); (ii) additional compound synthesis by undergraduate or postgraduate university students, either individually or as part of crowdsourced classes; (iii) expertise and advice emerging from the wider online community, both solicited and spontaneous; and (iv) additional laboratory contributions from other specialist centers, such as academic laboratories with block grants or pharmaceutical companies. These sources have considerable potential and can be mobilized further. For example, the time of researchers in major organizations (pharmaceutical companies, universities, CROs) could be contributed via policies that allow staff to work part-time on their own projects mimicking the pro bono system of the major law firms; such policies have been sporadically adopted (Figure S40) 124 and are a component of the GSK Tres Cantos research campus that kick-started and then supported the work described in this paper. In cases where resources are not forthcoming, direct payment (e.g., generated through crowdsourcing) could be awarded via competition, as occurred in this paper to create the graphical abstract. Each situation in the diverse base of support for an open source project will be different, and may be driven by appropriate selfless and selfish motives (e.g., contribution to a public health project vs securing publication authorship). Besides the transparency and completeness of the data arising from the research, this experiment with open source drug discovery made clear some of the advantages of this way of working: the breadth of people and expertise arising from "external" sources that were contributed rather than actively solicited, such as successful compound synthesis by federated groups of graduate and undergraduate students, or the analysis of data obtained by others. Allowing participation by "strangers" introduces costs, such as the time required by a "core" team to maintain the project status, data, and methodology to allow meaningful contributions (in particular a real-time data set making clear the series SAR). We predict that tools and practices will arise to make easier some of these challenges, as open source methods in drug discovery and development become more widely adopted.

Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10
Chemical and biological protocols, other text-based files, and screengrabs (PDF)
All molecules (XLS)
All molecules in SDF format (ZIP)
Potency data from Ralph assay 1 (XLS)
Potency data from GSK assay 1 (XLS)
Potency data from Avery assay 1 (XLS)
Similarity network data for Tres Cantos series in Cytoscape format (ZIP)
Map of purchaseable compounds around OSM-S-35 in Cytoscape format (ZIP)
Potency data from Ralph assay 2 (ZIP)
Potency data from GSK assay 2 (XLS)
Potency data from Avery assay 2 (XLS)
Potency data from Avery assay 3a (PDF)
Potency data from Avery assay 3b (XLS)
X-ray structural information for OSM-S-35, -42, -54, and -9 (ZIP)
Potency data from Avery assay 5 (XLS)
Potency data from Guy assay (XLS)
Potency data from Avery assay 3c (XLS)
Potency data from Dundee assay 1 (PDF)
Potency data from Dundee assay 2 (PDF)
Bioisostere analysis in Cytoscape format (ZIP)
Solubility and microsomal stability data for OSM-S-5, -6, -9, -10, -37, -38, -39, and -54 (ZIP)
Solubility and microsomal stability data for OSM-S-111 (XLS)
Data from oral in vivo P. berghei mouse trial for compounds OSM-S-5, -6, and -35 (XLS)
Pharmacokinetic data from oral in vivo P. berghei mouse trial (XLS)
Human and mouse plasma stability data for OSM-S-5 (XLS)
Metabolite identification assay data for OSM-S-35 in human liver microsomes (XLS)
hERG assay data for compounds OSM-S-5 and -35 (XLS)
Avery late stage gametocyte assay 1 (XLS)
Avery late stage gametocyte assay 2 (XLS)
UCSD liver stage assay 1 (XLS)
UCSD liver stage assay 2 (XLS)
UCSD liver stage assay 3 (XLS)
Nislow gene set enrichment analysis data (XLS)
Nislow gene set enrichment analysis data, Spotfire format (ZIP)
Nislow gene set enrichment analysis data (PDF)
Nislow fitness defect scores for all deletion strains (XLS)
Data from GSK DHODH assay (XLS)
Data from Kirk ion regulation assay (XLS)
Similarity map surrounding OSM-S-39 in Cytoscape format (ZIP)
of Cambridge, now Charterhouse School) and Joie Garfunkel (Merck). For their generous time and insights, we thank the other (sometimes anonymous) online contributors. We thank Viputheshwar Sitaraman (Draw Science) for creating the graphical abstract as part of a competition for this purpose. We thank Ginger Taylor for creating The Synaptic Leap website, which hosted much of the early activity in the project. We would like to thank student volunteers (Min Kyung Chong, Jun Ki Hong, Martina Yousif, Sebastien Dath) for help with online data entry.