High-Throughput Phenotypic Screening and Machine Learning Methods Enabled the Selection of Broad-Spectrum Low-Toxicity Antitrypanosomatidic Agents

Broad-spectrum anti-infective chemotherapy agents with activity against Trypanosomes, Leishmania, and Mycobacterium tuberculosis species were identified from a high-throughput phenotypic screening program of the 456 compounds belonging to the Ty-Box, an in-house industry database. Compound characterization using machine learning approaches enabled the identification and synthesis of 44 compounds with broad-spectrum antiparasitic activity and minimal toxicity against Trypanosoma brucei, Leishmania infantum, and Trypanosoma cruzi. In vitro studies confirmed the model predictions for compound 40, which emerged as a new lead featuring an innovative N-(5-pyrimidinyl)benzenesulfonamide scaffold, promising low micromolar activity against two parasites, and low toxicity. Given the volume and complexity of data generated by the diverse high-throughput screening assays performed on the compounds of the Ty-Box library, the chemoinformatic and machine learning tools enabled the selection of compounds eligible for further evaluation of their biological and toxicological activities and aided in the decision-making process toward the design and optimization of the identified lead.


■ INTRODUCTION
Poverty-related infectious diseases such as tuberculosis, malaria, trypanosomiasis, and leishmaniasis afflict a massive global population. It has been estimated that, overall, over 200 million people are affected or are at risk. The common problems among all these infectious diseases are the limited number of therapeutic drugs (Figure 1), their poor safety profile due to their toxicity, low compliance by patients, low accessibility, and drug resistance development.1,2 In the case of tuberculosis (TB) infections, the current standard treatment is effective, even though clinical practice suggests that patients with uncomplicated drug-susceptible TB are required to take multiple antibiotics for 6 months. Since compliance is low, WHO recommends that treatment be directly supervised and possibly replaced with a therapy that not only ensures higher compliance but is also shorter in duration and demonstrates effectiveness in the short term. This requirement has added a huge layer of infrastructure to the long treatment program. With the rise of drug resistance, treatment failure rates have also increased, along with the use of more toxic therapies that are far more costly and hence less accessible. Improved interventions could have a substantial effect on our ability to decrease the morbidity and mortality associated with the disease and to limit its further spread, as treatment of active TB is the major modality for preventing transmission in most of the world.
Recent analysis reports that 75% of all emerging human infectious diseases in the past three decades worldwide originated in animals.3 Poor and disadvantaged populations (subtropical regions) and European Mediterranean countries may spread new infectious agents.4 Infections caused by Trypanosomatidae such as Chagas disease, Human African Trypanosomiasis (HAT), and Leishmaniasis account for 17% of the estimated global burden of all infectious diseases (700,000 deaths/year).5 Chagas disease is caused by Trypanosoma cruzi (Tc), which is more widespread in South America; in its chronic form Chagas disease leads to severe organ damage, affecting mainly the cardiovascular and digestive systems.6 HAT is mostly found in African countries and is caused by Trypanosoma brucei (Tb) subspecies; while Tb rhodesiense is responsible for a more acute infection and a faster-progressing disease that also affects the central nervous system, Tb gambiense establishes a chronic infection and a slowly progressing disease.7 Leishmaniasis is endemic in many tropical and subtropical countries and is caused by infection with diverse Leishmania spp., inducing three main clinical forms, namely, cutaneous, mucocutaneous, and visceral diseases.8 All the aforementioned infections, addressed as neglected infectious diseases (NID), will continue to cause a severe population burden if the drug candidate pipeline is not enriched. Therefore, there is an unmet medical need for novel medicinal chemistry efforts to develop new treatments. To accelerate the discovery of hits or leads, different technologies have been adopted. High-throughput screening (HTS) or medium-throughput screening (MTS) technologies have been widely adopted as the preferred approach, in particular for phenotypic screening,9,10 and this has generally been more successful than a target-based approach, although both can be complementary for infectious diseases.
The management of the large amounts of data available from HTS approaches may require a powerful method of analysis.−22 This has enabled the prioritization of compounds for testing23 and, in several cases, the identification of molecules with in vivo efficacy.14 In recent years, we have developed our own in-house machine learning software called Assay Central and demonstrated it by curating Mycobacterium tuberculosis (Mtb) data, leading to 18,886 molecules (with activity cutoffs of 10 μM, 1 μM, and 100 nM).21 The 100 nM cutoff model was tested with an evaluation set (153 compounds) and showed statistics in line with those observed with 5-fold cross validation (accuracy 0.83).21 A more recent analysis of over 100 active leads for Mtb that are representative of thousands of active compounds generated over the decade suggested a very limited chemical diversity, and this in turn will be reflected in any machine learning models generated with such data.24 The conventional approaches to drug discovery are identifying compounds that are the same as or very similar to those preceding them, suggesting that our cheminformatics approaches need to evolve to go beyond the current chemical property space to find new leads. Databases related to structure−activity relationships for NIDs and parasitic diseases are rare, and ChEMBL25 represents a curated source of such data. Similarly, PubChem26 is another source of such public domain data that can, after some cleanup and preparation, be used for machine learning.23 In the present work, we apply various commercial (Discovery Studio) and proprietary (Assay Central) machine learning tools to model all the data generated and use this to predict new compounds to test. In this process, we also compared multiple machine learning algorithms with these data sets and demonstrate how such Bayesian machine learning models can be used for lead optimization.
Figure 2. Workflow of the key actions of the study. Starting from the in-house Ty-Box library, the 456 noncommercial compounds were assessed (A) in a whole cell-based HTS primary screening against T. brucei, L. infantum, and T. cruzi and against replicant (H37Rv) and nonreplicant (ss18b) strains of Mtb and in parallel for in vitro human metabolic interference (five CYP isoforms) and toxicity (hERG, mitochondria, and A549 cell line). The primary hits were prioritized using chemoinformatics and Pareto ranking (B), which identified the best hits with high activity and low toxicity from the active compounds. The selected primary hits were validated in secondary screening (C) for the determination of the dose−response curve. In parallel, Bayesian classification models (D) were generated to discriminate the structural determinants responsible for biological activity from those accounting for human toxicity. This information was exploited to design, prepare, and assess a second generation of hits (E−H) with the main purpose of validating the Bayesian models and identifying the most promising compounds to be evaluated in vivo for pharmacokinetic properties (I). One lead candidate was identified at the end of the discovery pipeline.
Some large compound libraries were collected from different consortia (More Medicines for Tuberculosis (MM4TB)27,28 and New Medicine for Trypanosomatidic Infection (NMTrypI)29) with the aim of finding a new antitubercular and/or antiparasitic agent30−36 to overcome the unmet medical needs still represented by tuberculosis and parasite infections. The compounds come from drug discovery and development studies aimed at optimizing the translation of compounds from the discovery phases to the preclinical and clinical models. One of the approaches adopted in searching for new potential chemical hits relied on the screening of noncommercially available libraries of compounds using a target-based or a cell-based screen. The in-house chemical libraries are usually assembled from compounds and intermediates produced during other medicinal chemistry investigations performed within the academic or industrial research group. This validated screening approach was also used to fish out, from the unique Italian Institute of Technology (IIT) LIBRA compound library, novel compounds targeting the pteridine reductase 1 (PTR1) enzyme from T. brucei (TbPTR1) and from Leishmania major (LmPTR1), validated parasite targets, which offer the potential to be progressed as African trypanosomiasis and Leishmaniasis drugs, respectively.31 Another application was developed for the identification of new antituberculosis compounds.37 These libraries may be limited in that they cover only a small part of the putatively very large chemical property space. To explore greater chemical diversity, we decided to investigate the antiparasitic and antitubercular potential of another in-house chemical library provided by TYDOCK Pharma, namely, the Ty-Box library. The Ty-Box library herein disclosed (see Supporting Information) consists of 456 compounds characterized by a subgroup of sulfonamide derivatives. Many of these compounds were previously explored for their anti-infective activity. This characteristic is important as we were looking for anti-infective agents. The library profile, with the respective molecular properties, including chemical-physical properties and human toxicity profile, has not been assessed previously. Therefore, this work also provides information that will be useful for future studies with these molecules for additional targets or diseases.
The workflow of the key actions of the study to identify investigational leads is reported in Figure 2. The compound library was screened in whole cell-based HTS campaigns against three kinetoplastids (T. brucei, Leishmania infantum, and T. cruzi) and against replicant (H37Rv) and nonreplicant (ss18b) strains of Mtb (Figure 2A) with the objective of identifying potentially pan-antiparasitic compounds, or single compounds or classes with a promising in vitro lead profile.38 Since potential liability and toxicity represent a limiting bottleneck in the progression of compounds in the preclinical phase, the entire library was evaluated at an early stage for drug−drug interactions involving cytochrome P450 (CYP) enzyme inhibition, considering that compounds altering the CYP enzymes can alter the metabolic transformation of other drugs coadministered to patients and are thus generically addressed as influencing drug−drug interactions.39 Other aspects of drug evaluation are associated with assessing human toxicity. This includes examining hERG (the human ether-a-go-go-related gene) for the evaluation of potential cardiotoxicity due to the inhibition of the voltage-gated potassium channel; additionally, toxicity assessment against A549, a human non-small-cell lung cancer cell line, serves as a model to study compound cytotoxicity, whereas mitochondrial toxicity addresses the potential effects of investigational drugs on compound-metabolizing systems.40 These data were obtained by adopting in vitro HTS technologies for the antiparasitic activities (six strains). The large number of parameters generated in vitro in the study of the compounds of the Ty-Box library provided substantial data to support the use of the machine learning approach.
More than 20,000 data points were generated and then further processed using a machine learning methodology to generate a predictive model to identify and optimize compound features. The prioritization of compounds was guided by these chemoinformatic approaches to identify the best primary hits for antiparasitic potency and reduced/null human toxicity (Figures 2B,C and 1B,C). In addition, Bayesian classification models (Figure 2D) were generated to identify the structural features of each chemotype accounting for activity and toxicity, to guide the design and synthesis of a second-generation compound library of optimized hits, and to provide a quality lead compound with potential for further refinement toward a preclinical candidate (Figure 2E−H). In summary, we first generated a large amount of in vitro biological data and then used machine learning methods as part of a combined virtual (chemoinformatic) and experimental biological HTS approach to identify new potential drug candidates for the treatment of broad-spectrum kinetoplastid infections.
■ RESULTS AND DISCUSSION
Ty-Box Compound Library Properties. The in-house Ty-Box compound library consists of 456 noncommercial small molecules synthesized and assessed during several drug discovery projects performed in the past several years by our research group. The chemical property space covered by the Ty-Box library was explored using extended-connectivity fingerprints of maximum diameter 6 (ECFP_6) and was represented using the 3D and 2D t-distributed stochastic neighbor embedding (t-SNE) algorithm (Figure 3A). For all the 456 Ty-Box compounds, the drug-likeness, in accordance with Lipinski's "rule of five (RO5)",41 was determined by computing molecular weight (MW), clogP, number of H-bond acceptors (HBA), number of H-bond donors (HBD), total polar surface area (TPSA), and number of rotatable bonds. Allowing no more than one violation of the rule, 87.5% of the compounds were in accordance with the RO5 (Figure 3B). Moreover, the chemical space was analyzed for the maximal common substructure. Cluster analysis revealed 10 main clusters each containing >10 compounds (Figure 3C). Four main chemical families are represented in the Ty-Box: (i) the thiazole/thiazinepyrimidone, distinct in dihydrothiazolopyrimidinone/dihydropyrimidothiazinone (Figure 3C_1), and dihydrothiazolo(thiazino)purinone (Figure 3C_2); (ii) the acetanilide (Figure 3C_12); (iii) the benzothiadiazine (Figure 3C_4) and its congeners benzothiazinone (Figure 3C_3) and dihydrobenzothiadiazine (Figure 3C_5); and (iv) the benzenesulfonamide (BS). The latter is the most heavily represented family of compounds. The sulfonamide is decorated and derivatized with different and complex chemical functions or aromatic scaffolds that mask and deeply influence the chemical-physical-structural characteristics of these compounds, so that it is possible to distinguish several independent subclasses, namely, benzenesulfonguanidine (Figure 3C_6), 2-aminobenzenesulfonamide (Figure 3C_7), N-heteroaryl-BS (Figure 3C_8), N-aryl/alkylsulfamoylphenylacetamide (Figure 3C_9), pyrimidonyl-BS (Figure 3C_10), 4-amino-N-arylbenzenesulfonamide (Figure 3C_11), and N-pyrimidinyl-BS (Figure 3C_13). Moreover, sulfonamide is a chemical function that is the basis of several groups of drugs, such as the antibacterial sulfonamides. The sulfonylureas and thiazide diuretics are other examples of drugs based upon the antibacterial sulfonamides. Over time, the application of sulfonamides has been extended from their use as antimicrobial agents to anticarbonic anhydrase, antiobesity, diuretic, hypoglycemic, antithyroid, antitumor, and antineuropathic pain activities, among others.42 Thus, the abundance of sulfonamide compounds in the Ty-Box could be useful for investigating the potential of this functional group for the design of antiparasitic or antituberculosis drugs and represents a good premise for a pan-antiparasitic agent.
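The drug-likeness filter described above (compliance with the RO5 when no more than one violation is allowed) can be illustrated with a minimal sketch. It assumes that the descriptors have already been computed by some cheminformatics package and applies only the four classical Lipinski criteria (MW ≤ 500, clogP ≤ 5, HBD ≤ 5, HBA ≤ 10); TPSA and rotatable bonds, which were also computed in this work, are not part of that count here. The `Descriptors` class, the compound names, and all numerical values are hypothetical and are not taken from the Ty-Box data.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mw: float     # molecular weight (Da)
    clogp: float  # calculated logP
    hbd: int      # number of H-bond donors
    hba: int      # number of H-bond acceptors


def ro5_violations(d: Descriptors) -> int:
    """Count violations of the four classical Lipinski criteria."""
    return sum([
        d.mw > 500,
        d.clogp > 5,
        d.hbd > 5,
        d.hba > 10,
    ])


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """A compound is accepted if it has at most `max_violations` violations."""
    return ro5_violations(d) <= max_violations


# Illustrative values only; these are not Ty-Box compounds.
library = {
    "cpd_A": Descriptors(mw=342.4, clogp=2.1, hbd=2, hba=6),
    "cpd_B": Descriptors(mw=612.7, clogp=5.8, hbd=3, hba=9),
    "cpd_C": Descriptors(mw=520.1, clogp=4.2, hbd=1, hba=8),
}

compliant = [name for name, d in library.items() if passes_ro5(d)]
fraction_compliant = len(compliant) / len(library)
```

In this toy set, cpd_B violates two criteria and is rejected, whereas cpd_C has a single violation and is retained, which mirrors the "no more than one violation" tolerance used for the library.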
High-Throughput Primary Screening against Kinetoplastids and M. tuberculosis. The Ty-Box library was tested against three kinetoplastid parasites (bloodstream form of T. brucei, amastigote L. infantum, and amastigote T. cruzi) and against replicant (H37Rv) and nonreplicant (ss18b) strains of Mtb using a primary whole-cell high-throughput phenotypic screening assay (step 2A, Figure 2). Results from the Mtb screening have been previously reported in Neres et al.37 However, no early toxicity data were obtained nor was chemoinformatics analysis performed. In the present work, all compounds were screened at 50 μM in the T. brucei, L. infantum, and T. cruzi assays, and at 20 μM in the Mtb assays. The overall results of the primary screening against the three kinetoplastids and M. tuberculosis are reported as a heatmap in Figure 4.
Figure 4. Hierarchical clustering groups hit compounds with a similar chemical structure based on the ECFP_6 fingerprint. Antiparasitic activity, drug−drug interactions, and early human toxicity that emerged from primary screening for each compound are represented through a heatmap. All the data of the primary screening are reported in Table S1.
Among the 456 compounds of the Ty-Box library, 153 hits possess single or multiple antiparasitic and/or antitubercular activity (Figure 5). The T. brucei assay relies on indirect determination of parasite numbers by quantification of total DNA released from cells present in the wells of plates using the SYBR Green I DNA fluorescent dye. Pentamidine was used as the reference compound, exhibiting an EC50 of 1.6 ± 0.2 nM, which is comparable with the value reported in the literature.43,44 The results are expressed as a percentage of cell growth inhibition at 50 μM. A cutoff of 80% of cell growth inhibition at the tested concentration was set for picking out the most active compounds (Figure 5), resulting in 48 primary hits (11% of the Ty-Box). In addition, 144 molecules (31% of the Ty-Box) showed a moderate activity in the range 30−78% of cell growth inhibition, whereas the remaining 266 compounds (58% of the Ty-Box) were found to be inactive. L. infantum antiparasitic activity was determined according to the method of Siqueira-Neto et al. with some modifications.45 These compounds were screened in an intracellular assay of amastigote L. infantum-infected THP1-derived macrophages, which is a more physiological and disease-relevant model than assays that rely on insect stages or axenic amastigote screens. Miltefosine and amphotericin B were the positive controls, yielding EC50 values of 2.5 and 0.2 μM, respectively.30 A cutoff of 40% of cell growth inhibition at 50 μM was set since it is usually more difficult to identify antiparasitic agents able to hit the intracellular amastigote form of this parasite. Also in this case, a total of 48 primary hits were identified, corresponding to an overall hit rate of 11% of the Ty-Box (Figure 5). Anti-Chagas activity was determined using the human osteosarcoma U2OS cell line infected with trypomastigote forms of the Y strain of T. cruzi in the presence of compounds. As for L. infantum, since T. cruzi is an intracellular parasite, the compounds were tested against the amastigote stage of the parasite for a more reliable antiparasitic evaluation. Infected cells were incubated for 72 h prior to fixation, staining, and quantification of antiparasitic activity by image analysis. Benznidazole was used as the reference, exhibiting an EC50 of 2.4 μM, which is comparable with the value reported in the literature.46 A cutoff of 40% of cell growth inhibition at 50 μM was set as for L. infantum, resulting in 45 hits with anti-T. cruzi activity (10% of the Ty-Box, Figure 5). Lastly, for the identification of the most promising hits with antitubercular activity against replicant (H37Rv) and nonreplicant (ss18b) strains of Mtb, a cutoff of 20% of bacterial growth inhibition at 20 μM was set. Only 10% of compounds (45 molecules) against the nonreplicant ss18b strain and 12% of compounds (57 molecules) against the replicant H37Rv strain showed bactericidal activity at the tested concentration (Figure 5).
Figure 6. Prioritization of the primary hits active against at least one parasitic strain (according to the defined cutoff described in the main text) by Pareto ranking. The hits marked with a high score and reported with a green bar on the positive y-axis were prioritized and assessed in dose−response assays; conversely, the deprioritized or penalized hits (yellow and red bars, respectively) were rejected. The negative y-axis reports the cumulative activity and toxicity profile of the analyzed primary hits to highlight the weight of the antiparasitic activity (pale green bar), of the lack of A549 toxicity (aquamarine bar), and of the early toxicity (CYPs and hERG inhibition and mitochondria toxicity, in gradient blue bars) on the Pareto rank.
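The assay-specific cutoffs used above to call primary hits (80% for T. brucei, 40% for L. infantum and T. cruzi at 50 μM, and 20% for both Mtb strains at 20 μM) can be expressed as a short sketch. The cutoff values and the library size of 456 are taken from the text; the dictionary keys, the function names, the treatment of a missing measurement as "not a hit", and the example inhibition values are assumptions made only for illustration.

```python
from typing import Dict, List

# Percent-inhibition cutoffs stated in the text for the primary screening.
ACTIVITY_CUTOFFS: Dict[str, float] = {
    "T_brucei": 80.0,    # screened at 50 uM
    "L_infantum": 40.0,  # screened at 50 uM
    "T_cruzi": 40.0,     # screened at 50 uM
    "Mtb_H37Rv": 20.0,   # screened at 20 uM
    "Mtb_ss18b": 20.0,   # screened at 20 uM
}


def call_hits(inhibition: Dict[str, float]) -> List[str]:
    """Return the assays in which a compound exceeds the cutoff.

    Assays without a measurement are treated as 0% inhibition
    (an assumption of this sketch).
    """
    return [
        assay
        for assay, cutoff in ACTIVITY_CUTOFFS.items()
        if inhibition.get(assay, 0.0) > cutoff
    ]


def hit_rate(n_hits: int, library_size: int = 456) -> float:
    """Hit rate as a percentage of the screened library."""
    return 100.0 * n_hits / library_size


# Hypothetical percent-inhibition profile for a single compound.
example = {
    "T_brucei": 92.0,
    "L_infantum": 35.0,
    "T_cruzi": 55.0,
    "Mtb_H37Rv": 5.0,
    "Mtb_ss18b": 21.0,
}
hits = call_hits(example)
```

With these definitions, `hit_rate(48)` evaluates to about 10.5%, which corresponds to the 11% reported above for the T. brucei and L. infantum assays after rounding.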
In Vitro Early ADME-Tox Related Studies. The entire Ty-Box library was assessed in parallel for potential early toxicity-related issues using HTS in vitro assays. These studies included five CYP isoforms (1A2, 2C9, 2C19, 2D6, and 3A4), for the evaluation of drug−drug interactions; hERG liability, for the evaluation of potential cardiotoxicity by inhibition of the voltage-gated potassium channel; and mitochondrial toxicity and cytotoxicity in the A549 cell line, for preliminary evaluation of in vivo toxicity. The compounds were screened at 10 μM and the results reported as % inhibition (hERG and CYPs), % toxicity (mitochondrial toxicity in the A549 cell line), and % viability in the A549 cell line. The overall early in vitro ADME-Tox profile of the Ty-Box library is represented using a heatmap visualization (Figure 4), and the numerical data are reported in Table S1. Almost all compounds present a safe CYP inhibition profile, with an average inhibition of around 15% at 10 μM and few exceptions exceeding this trend. Interestingly, the entire library showed an overall negligible adverse effect against hERG, mitochondrial toxicity, and cytotoxicity in the A549 cell line. Pentamidine, benznidazole, amphotericin B, and miltefosine were used as reference controls, and their early toxicity is in line with the literature data (Table S1).30,36
Active and Not Toxic Hit Selection by Pareto Ranking and Validation in Secondary Assays. Chemoinformatic techniques can assist the drug discovery process by analyzing the relationship between biological activity, toxicity, molecular properties, and chemical structure of the tested compounds to help prioritize selections and increase efficiency. Using the various chemoinformatic techniques, the bioactive hits resulting from the primary screening reported in the previous paragraph, and collected in Table S1 in the Supporting Information, were ranked by a Pareto model. The best hits for activity against at least one parasitic strain and low/null toxicity (step B, Figure 2) were selected.
Figure 7. Chemical structures of only the validated primary hits and the corresponding antiparasitic activity expressed as EC50. For compounds that did not meet the defined cutoff against at least one parasitic strain, the dose−response was not performed (n.d., not determined). SD for all the assays is within ±10% of each value.
The Pareto ranking enabled the selection of compounds endowed with activity against at least one parasite and/or Mtb strain and with low potential early toxicity-related issues according to the cutoff criteria: percentage of cell growth inhibition >80% at 50 μM against T. brucei, >40% at 50 μM against L. infantum and T. cruzi, >20% at 20 μM against replicant and nonreplicant strains of Mtb, <50% at 10 μM for CYPs inhibition and mitochondrial toxicity, <20% for hERG inhibition, and >70% for A549 cell growth (Figure 6). Twenty-seven primary hits were selected for the secondary assay, and dose−response studies were performed for the determination of EC50 values. The antiparasitic and/or antitubercular activity was confirmed for 10 out of 27 hits (Figure 7).
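The exact Pareto scoring scheme used for Figure 6 is not detailed in this section, so the following is only a generic illustration of the idea of Pareto ranking: compounds are sorted into successive nondominated fronts, where a compound is dominated if another one is at least as good on every objective and strictly better on at least one. The two objectives used here (percentage activity and a "safety" value defined as 100 minus the worst toxicity readout), the function names, and all numbers are hypothetical.

```python
from typing import Dict, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if `a` is at least as good as `b` on every objective and
    strictly better on at least one (all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_ranks(points: Dict[str, Tuple[float, ...]]) -> Dict[str, int]:
    """Assign rank 1 to the nondominated front, remove it, and repeat."""
    remaining = dict(points)
    ranks: Dict[str, int] = {}
    rank = 1
    while remaining:
        front = [
            name
            for name, p in remaining.items()
            if not any(dominates(q, p)
                       for other, q in remaining.items() if other != name)
        ]
        for name in front:
            ranks[name] = rank
            del remaining[name]
        rank += 1
    return ranks


# Hypothetical objectives: (% activity, 100 - worst % toxicity readout).
candidates = {
    "hit_1": (95.0, 90.0),
    "hit_2": (85.0, 95.0),
    "hit_3": (80.0, 60.0),
    "hit_4": (50.0, 50.0),
}
ranks = pareto_ranks(candidates)
```

In this toy example, hit_1 and hit_2 trade activity against safety and therefore share the first front, whereas hit_3 and hit_4 are dominated and fall into later fronts; in a prioritization of the kind described above, only the top-ranked compounds would be carried forward to dose−response assays.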
Ty-103, Ty-109, Ty-125, and Ty-127 (Figure 7) were the most interesting compounds of the library since they exerted broad antiparasitic activity against all three kinetoplastids and the two strains of Mtb with EC50 values in the low micromolar range (from 2.0 to 35.4 μM, Figure 7). These four primary hits share the dihydrothiazolpyrimidone scaffold (for Ty-103 and Ty-127, Figure 3C, cluster 1) or its extended ring homologue dihydrothiazinpyrimidone scaffold (for Ty-109 and Ty-125, Figure 3C, cluster 1). Although these hits possess a nitroso substituent on the pyrimidinone ring, in contrast with what was expected, they showed a safe metabolic and toxicological profile against the entire panel of targets considered in this study. The nitroaromatic moiety is a well-known motif recurring in several examples of antiparasitic agents.47 The presence of a nitro group does not guarantee that a compound will be toxic or a substrate for nitroreductases, as the anti-TB drug delamanid (a nitroimidazole-containing compound) is not a substrate for these enzymes.48 The expansion of the dihydrothiazinpyrimidone scaffold toward the tricyclic system of the dihydrothiazinopurinone (Figure 3C, cluster 2) overall led to a loss of antiparasitic and antitubercular activity. Another interesting chemical class resulting from the primary screening was based on the benzothiadiazine scaffold and its subclusters benzothiadiazinone (Figure 3C, cluster 3), benzothiadiazine, and dihydrobenzothiadiazine (Figure 3C, cluster 4). Twenty-five compounds out of 84 were active against at least one parasite strain in the primary screening, and two of those (Ty-251 and Ty-281, Figure 7) were prioritized by Pareto ranking and validated in the secondary screening.
Ty-251 was the more promising of the two, showing a dual antiparasitic activity against T. brucei (EC50 0.18 μM) and T. cruzi (EC50 7.6 μM). In contrast, moving from the benzothiadiazinone scaffold toward the dihydrobenzothiadiazine Ty-281 (Figure 3C, cluster 5), the antiparasitic activity against T. brucei and L. infantum is completely lost, whereas the compound retained a submicromolar anti-Chagas activity, with an EC50 of 0.45 μM, and antitubercular activity (ss18b Mtb), with an EC50 of 0.33 μM. The chemical class of the sulfonamides and the subclusters of the 2-amino-BS (Figure 3C, clusters 7 and 9), N-heteroaryl benzenesulfonamides (Figure 3C, clusters 8 and 11), N-pyrimidonyl-BS (Figure 3C, cluster 10), and N-pyrimidinyl-BS (Figure 3C, cluster 13) were the most representative in terms of the number of primary hits active against at least one parasite strain (76 active hits out of 167 compounds). Among this heterogeneous class of compounds, only two of the prioritized primary hits were validated (Ty-537 and Ty-678, Figure 7). These two compounds showed anti-T. brucei activity with EC50 values of ca. 50 and 0.1 μM, respectively. Ty-537 (Figure 3C, cluster 7) showed in addition an interesting activity against replicant Mtb with an EC50 of 0.17 μM. Particularly interesting was compound Ty-678 (Figure 3C, cluster 13) that, besides the nanomolar activity against T. brucei (EC50 of 0.10 μM), showed promising activity toward the amastigote stage of L. infantum (EC50 of 11 μM, Figure 7). Lastly, two primary hits belonging to the less represented chemical cluster of uracil were identified and validated. Ty-146 (Figure 7) showed a broad antiparasitic and antitubercular activity with EC50 values in the low micromolar range (EC50 of 28.8, 39.0, 3.5, and 10.1 μM against T. brucei, T. cruzi, and both replicant and nonreplicant Mtb, respectively). Interestingly, the analogous bioisostere thiouracil Ty-366 retained solely the anti-T. brucei activity with an EC50 of 19.5 μM.
In summary, primary screening of the Ty-Box identified a pool of primary hits with diverse chemical scaffolds and promising antikinetoplastid and/or antitubercular activity.This represents a valuable starting point for initiating a chemoinformatic-guided hit-to-lead program aimed at the optimization of these primary hits toward the identification of a lead compound with a balanced activity and toxicity profile.
Bayesian Models to Predict New Active and Nontoxic Hits. Bayesian machine learning modeling is a chemoinformatic approach useful for discriminating between user-defined active and inactive compounds present in a screening data set and therefore can be used to predict the likelihood of new hits not present in the training set (step D, Figure 2). Moreover, since drug leads must not only be efficacious but also safe, the Bayesian model can be jointly adopted for predicting and discriminating potentially toxic from safe compounds. Thus, Bayesian models can be constructed to correlate 2D chemical structural features of the compounds with activity and toxicity or lack thereof. We have therefore generated Bayesian models using the 456 molecules of the Ty-Box library as a training set, combining the bioactivity (against three different kinetoplastids and Mtb) and the enzymatic and cytotoxicity features. The screened molecules of the library were first classified into active/inactive and toxic/safe compounds according to the results achieved from each assay of the primary screening. The software was instructed using the cutoff values described above for discriminating active/inactive and toxic/safe primary hits. Compounds were set as active if they showed (i) antiparasitic activity >80% at 50 μM against T. brucei, (ii) antiparasitic activity >40% at 50 μM against amastigote L. infantum and T. cruzi, and (iii) activity >20% at 20 μM against both replicant and nonreplicant strains of Mtb. Analogously, compounds with % adverse activity in the early toxicity-related assays at 10 μM >50% against the five CYP isoforms and mitochondrial toxicity, >20% against hERG, and >30% A549 cytotoxicity were set as toxic. All the compounds that did not fulfill these criteria were considered inactive or safe compounds. Models were generated using a standard protocol in Discovery Studio (Biovia, San Diego, CA) with the following molecular descriptors: molecular functional-class fingerprints of maximum diameter 6 (FCFP_6), AlogP, molecular weight, number of rotatable bonds, number of rings, number of aromatic rings, number of hydrogen bond acceptors, number of hydrogen bond donors, and molecular fractional polar surface area. The generated model was validated by a receiver operating characteristic (ROC) plot based on leave-one-out cross-validation. The ROC was calculated for each model and assumed as a measurement of

Journal of Medicinal Chemistry
the accuracy of the model to identify the true positives and the true negatives. An ideal model should have an ROC of 1. The models generated for the prediction of T. brucei, L. infantum, and T. cruzi activity were particularly predictive, with ROC values between 0.7 and 0.8, whereas the ROC values were lower for the two Mtb strains, at 0.64 and 0.68 for ss18b and H37Rv, respectively (Figure S1). Interestingly, the Bayesian machine learning models resulted in excellent predictions for the chemical features responsible for the inhibition of the five CYP isoforms and for the inhibition of hERG, with ROC values ranging from 0.76 to 0.92. Lower ROC values were obtained for the models of mitochondrial toxicity and A549 cell-line toxicity (0.64 and 0.58, respectively) (Figure S2). Using the FCFP_6 descriptors, the fragments for Bayesian models were generated. The 20 highest- and lowest-scoring fragments accounting for activity/toxicity were also identified with the software. Given the clusters above, the top-scoring fragments accounting for antiparasitic activity were dominated by sulfonamide motifs (such as BZ, N-heteroarylbenzene-BS, N-pyrimidonyl-BS, and N-pyrimidinyl-BS) and bicyclic scaffolds (such as dihydrothiazinpyrimidone or dihydrothiazinopurinone). Conversely, the bottom-scoring fragments included the benzothiadiazine scaffold (and its variants), although a few relevant exceptions to this trend were present, especially against T. cruzi. Interestingly, although both sulfonamides and dihydrothiazine derivatives could account for some interference with the activity of the five CYP isoforms, they were found to be less involved in human toxicity, especially against hERG, mitochondrial toxicity, and cytotoxicity. We have also generated individual Assay Central machine learning models for these data sets using ECFP_6 descriptors alone. The results for 5-fold cross validation with Assay Central are summarized in Figure S3 and Table S2 and are very comparable to those produced with Discovery Studio and FCFP_6 descriptors. These models were also compared with a wide array of additional machine learning algorithms, namely, random forest, k-nearest neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and a deep learning architecture, as previously described.49 Radar plots for the various 5-fold cross validation metrics show broadly similar patterns (Figure S4), and a comparison of the distance from the top normalized scores (Kruskal−Wallis) shows that the random forest, support vector classification, and Assay Central Bayesian algorithms perform the best (Table S2). Similarly, the mean rank difference shows the same (Table S3). Different statistical tests on the rank normalized score such as Kruskal−Wallis (independent) did not show statistically significant differences (Table S4), whereas pairwise comparisons of the rank normalized score showed statistically significant differences for random forest, support vector classification, and our Assay Central Bayesian algorithm (Tables S5−S8 and Figure S5).
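The ROC values quoted above summarize how well a model's scores separate the two classes. As a minimal sketch, the area under the ROC curve can be computed from a list of class labels and model scores as the fraction of active/inactive pairs in which the active compound receives the higher score (ties counted as one half). This is a generic pairwise calculation and not the Discovery Studio or Assay Central implementation; in a leave-one-out setting, each score would be the prediction obtained for a compound by a model trained without it. The function name and the example labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve from binary labels (1 = active, 0 = inactive).

    Computed as the probability that a randomly chosen active compound
    is scored higher than a randomly chosen inactive one.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical leave-one-out scores for four compounds.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)
```

A value of 1 corresponds to the ideal model mentioned above, and a value of 0.5 to a ranking no better than chance, which helps to put the lower values reported for the Mtb, mitochondrial toxicity, and A549 models into context.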
Given the large volume and complexity of the in vitro data generated by the diverse HTS assays performed on the compounds of the Ty-Box library, the chemoinformatic and machine learning tools allowed us to explore the biological and toxicological activities and aided the decision-making process toward the design and preparation of a second generation of compounds with an improved activity/toxicity profile.
Validation of Bayesian Machine Learning Models for Whole-Cell Activity, Design, and Selection of the Test Set of Compounds. Given the potency and synthetic tractability of the validated hits and the identification of the chemical features responsible for activity and CYP inhibition, we initiated a medicinal chemistry elaboration focused on optimizing the primary hits (while maintaining or improving the antikinetoplastid or antitubercular activity) and on validating the Bayesian models to establish early leads for further development. Thus, we mainly focused on the benzothiadiazine scaffold (and its variants, clusters 3−5, Figure 3C) and on the N-pyrimidonyl-BS and N-pyrimidinyl-BS scaffolds (clusters 10, 11, and 13, Figure 3C). Ty-251 and Ty-678 were taken as reference compounds for each of these chemical classes, and focused structural modifications were introduced on the main scaffold. The virtual compounds were designed to be putatively active against at least one parasite strain and to have null or reduced human toxicity. Putative decoy compounds were designed as well. In particular, the main scaffolds of Ty-251, Ty-659, and Ty-678 were decorated with chemical modifications and substituents to map the surrounding chemical space and to draft a preliminary structure−activity relationship (SAR), which was needed to further validate these primary hits and to support the identification of a lead compound. The designed virtual library was filtered for predicted activity and toxicity through the previously elaborated Bayesian models, and a resulting set of 47 secondary hits, 1−47, was prioritized (Figure S6). We explored in particular the benzothiadiazine and benzothiadiazinone scaffolds variously decorated on the aromatic ring and on the nitrogen in position 2, as well as the reduction of the thiadiazinone ring to thiadiazine (Figure 8). Conversely, starting from the N-pyrimidinyl-BS scaffold, two divergent structural modifications were examined. The chemical structure of Ty-678 was modified by introducing diverse alkyl or phenylalkyl chains at the sulfur or oxygen atoms of the 6-amino-2-mercaptopyrimidinol ring or by modifying the position of the amino or nitro group on the benzenesulfonamide moiety, whereas starting from Ty-659, we mainly focused on the bioisosteric replacement of the 2,4-dimethoxypyrimidine scaffold and on doubling the sulfonamide portion (Figure 8). Compounds 1−44 were thus synthesized, whereas compounds 45−47 were commercially available and purchased (Figure 8).
All the secondary hits were assessed for antiparasitic activity against T. brucei, L. infantum, and T. cruzi and for potential early toxicity-related issues; they also served as a test set for the prospective validation of the identified primary hits and for the assessment of the predictive capacity of our Bayesian models. From this point, we focused on the compounds active against the kinetoplastids of most interest to us.
Description of Chemical Structures of the Test Set and Synthesis. The synthesis of the benzothiadiazinones 1−5 is reported in Scheme 1. Compounds 1 and 2 and the intermediate I were prepared in good yield by the reaction of the appropriate aniline (6-chloroaniline for 1, N-methyl-6,7-dichloroaniline for 2, and 7-chloroaniline for I) with chlorosulfonyl isocyanate in nitromethane, followed by the addition of anhydrous aluminum chloride. The intermediate I was further reacted with the appropriate alkyl bromide (bromopropane for 3, propargyl bromide for 4, and 1-bromobutane for 5) under standard SN2 conditions,50 using K2CO3 as the base in DMF at room temperature overnight, to afford the N2-substituted benzothiadiazinones 3−5. Benzothiadiazines 6−8 were synthesized following different synthetic approaches according to the substituent to be introduced in position 3 of the benzothiadiazine ring. Compound 6 was directly obtained by reacting aniline with N′-(chlorosulfonyl)-N,N-dimethylcarbamimidoyl chloride and DIPEA in anhydrous DCM at room temperature for 1.5 h (Scheme 1).
Biological Evaluation of the Test Set and Validation of the Bayesian Models. Regardless of the activity predicted by the Bayesian models against one or more parasite strains, the 47 secondary hits were first screened for antiparasitic activity against the three kinetoplastids. The results of the screening of the test set are reported in Table S9 and represented as a heat map in Figure 9.
The test compounds were divided into three clusters according to structural similarity and derivation from the parent hits identified in the primary screening. The compounds were tested at 50 μM, and the % cell growth inhibition is reported as a heat map ranging from red (no cell growth inhibition) to green (100% cell growth inhibition at the tested concentration). For the validation of the Bayesian machine learning models, the same cutoff values used in the primary screening were adopted to discriminate actives from inactives. Figure 10 shows the results of the study. In detail, 36 and 11 compounds were predicted to be active and inactive, respectively, against T. brucei. Among the 36 predicted active compounds, 22 showed inhibition >80% at the tested concentration, whereas 10 of the 11 predicted inactive derivatives were inactive, as expected. The model for T. brucei therefore achieved an accuracy of 68%. A lower accuracy of 47% was obtained for the test set against L. infantum. Although 34 test compounds were predicted to be potentially active against L. infantum, only 10 compounds showed inhibition >40% at the tested concentration. Conversely, this machine learning model seems to be more effective in predicting compounds inactive against Leishmania, since of the 13 compounds predicted as inactive, only one showed antileishmanial activity above the set cutoff. Interestingly, the most impressive results were achieved with the T. cruzi test set. While only a few test compounds showed modest activity against T. cruzi, the Bayesian model was able to discriminate effectively between active and inactive compounds, resulting in an accuracy of 91%. Only six compounds were predicted to be active against T. cruzi, and 4 of them showed activity >40% at 50 μM, whereas among the 41 predicted inactive compounds, only 2 showed modest cell growth inhibitory activity. All the synthesized compounds were assessed for potential early toxicity-related issues, and the results of the in vitro screening were compared with the potential liability predicted by the Bayesian models for the test set of compounds. Tables S10 and S11 show the predictions of the Bayesian models versus the experimental CYP inhibition and versus the experimental mitochondrial and hERG toxicity of the secondary hits, respectively. Interestingly, the models achieved an accuracy of 86% for hERG, 87% for CYP1A2, 91% for CYP2C19, 70% for CYP3A4, and 77% for mitochondrial toxicity. Conversely, only fair accuracy was achieved for CYP2C9 and CYP2D6, although the models discriminated the nontoxic compounds more effectively. However, to identify only the compounds with the most promising antiparasitic activity, more stringent selection criteria were applied. The secondary hits showing T. brucei growth inhibition >80% at 50 μM were subjected to a secondary screen, a dose−response assay for the determination of the EC50. Of the 22 tested compounds responding to this criterion, 11 were [...] antileishmania agent in the secondary screen, whereas 38 proved to be a false positive in the primary screening since its activity was not confirmed in the dose−response assay. Lastly, none of the compounds showed any promising anti-T. cruzi activity in the primary screening, and therefore they were not investigated further. In summary, the test set exhibited an overall antiparasitic profile with a higher number of compounds showing anti-T. brucei activity (Figure 9).
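The accuracies quoted above for the three kinetoplastid models follow directly from the counts given in the text. The short sketch below reconstructs the confusion-matrix counts from those sentences and recomputes the accuracy as (true positives + true negatives) / total; the names `accuracy`, `COUNTS`, and `ACCURACY_PERCENT` are illustrative and not part of the original workflow.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of compounds whose predicted class matched the experiment."""
    return (tp + tn) / (tp + fp + tn + fn)


# Counts derived from the text for the 47-compound test set:
# tp/fp = predicted active and experimentally active/inactive,
# tn/fn = predicted inactive and experimentally inactive/active.
COUNTS = {
    "T. brucei": {"tp": 22, "fp": 14, "tn": 10, "fn": 1},    # 36 vs 11 predicted
    "L. infantum": {"tp": 10, "fp": 24, "tn": 12, "fn": 1},  # 34 vs 13 predicted
    "T. cruzi": {"tp": 4, "fp": 2, "tn": 39, "fn": 2},       # 6 vs 41 predicted
}

ACCURACY_PERCENT = {
    name: round(100 * accuracy(**counts)) for name, counts in COUNTS.items()
}
# -> {"T. brucei": 68, "L. infantum": 47, "T. cruzi": 91}
```

The T. cruzi value illustrates a general caveat of accuracy on imbalanced sets: with only six experimentally active compounds, most of the 91% comes from correctly predicted inactives.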
By analyzing the antiparasitic and toxicity profiles of the 47 test compounds, compound 40 (Figure 11A) emerged as the most promising derivative. Indeed, it showed confirmed dual antiparasitic activity against T. brucei and L. infantum, with IC50 values of 3.0 and 6.0 μM, respectively. Moreover, with 43% cell growth inhibition at 10 μM against amastigote T. cruzi (IC50 of ca. 12 μM), it could represent a valuable starting point for the further design of a pan-antikinetoplastid compound (Figure 11B). While other compounds showed interesting activity and a safe early toxicity profile, only compound 40 was confirmed in secondary assays (e.g., 8 was not confirmed in the same assays). Besides, 40 showed between 16 and 48% inhibition of hERG and of all CYPs and minimal mitochondrial toxicity (8% at 10 μM) (Figure 11B). The strongest inhibitory effect was observed on CYP2C9. These results for antiparasitic activity and early toxicity are in accordance with the predictions of the machine learning models, which successfully identified the features of 40 required for biological activity and early toxicity (Figure 11C). The toxicity profile shows a few liabilities, but these are likely linked to the presence of the nitro moiety. The nitro group should not always be regarded as a toxicophore, in particular when the drugs are used as antiparasitic agents.47,51 A few existing drugs contain the nitro group, such as nifurtimox and benznidazole (for Chagas disease treatment) and fexinidazole, which has recently been approved for HAT. However, if the compound proceeds further in the development program, mutagenicity assays will be an important test to perform. To evaluate the novelty of the identified lead, a structure similarity search of 40 was performed and returned only two substances with a structural similarity between 70 and 74%. The outcome of the search in different databases (Reaxys, SciFinder, and ChEMBL) clearly indicates that compound 40 represents a new and unexplored molecule. Based on the results achieved so far, 40 was selected as a promising hit with dual antiparasitic activity and low or null early toxicity, to be investigated in further in vivo studies. Before proceeding to in vivo studies, the preliminary pharmacokinetic profile of 40 was evaluated in healthy BALB/c mice using a SNAP-PK approach. Compound 40 was administered i.v. at a dose of 1 mg/kg. Interestingly, after a single-dose administration, compound 40 showed a long half-life (t1/2 12.7 h), and it was not detected in blood after 48 h (Figure 11D). The compound reached a maximum plasma concentration (Cmax) of 51.4 nM, which is lower than its in vitro anti-T. brucei potency (EC50 3 μM). A significantly higher dose could be used (e.g., 100 mg/kg, if tolerated), and different delivery formulations or routes should be attempted to progress the compound to in vivo antiparasitic studies in infected mice. Compound 40 therefore represents a new pharmaceutical tool as an unexplored chemical scaffold for the design of improved anti-T. brucei compounds, and it will subsequently be the subject of a lead optimization program to improve its properties.
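The gap between the plasma exposure and the in vitro potency of 40 can be put in numbers with a back-of-the-envelope sketch. It uses only the reported values (t1/2 12.7 h, Cmax 51.4 nM at 1 mg/kg i.v., EC50 3 μM) and rests on assumptions that were not established in this work: first-order (mono-exponential) elimination and dose-proportional exposure. All function names are hypothetical.

```python
import math

T_HALF_H = 12.7    # reported half-life (h)
CMAX_NM = 51.4     # reported Cmax (nM) after 1 mg/kg i.v.
EC50_NM = 3000.0   # reported anti-T. brucei EC50 (3 uM), expressed in nM


def elimination_rate(t_half_h):
    """First-order elimination constant k = ln(2) / t1/2 (ASSUMED kinetics)."""
    return math.log(2) / t_half_h


def fraction_remaining(t_h, t_half_h):
    """Fraction of the initial concentration left after t_h hours."""
    return math.exp(-elimination_rate(t_half_h) * t_h)


def fold_below_ec50(cmax_nm, ec50_nm):
    """How many times lower Cmax is than the in vitro EC50."""
    return ec50_nm / cmax_nm


def scaled_cmax(cmax_nm, dose_from, dose_to):
    """Cmax at another dose, ASSUMING dose-proportional (linear) exposure."""
    return cmax_nm * dose_to / dose_from


fold = fold_below_ec50(CMAX_NM, EC50_NM)        # roughly 58-fold below the EC50
left_48h = fraction_remaining(48.0, T_HALF_H)   # roughly 7% of the starting level
cmax_100 = scaled_cmax(CMAX_NM, 1.0, 100.0)     # about 5.1 uM if exposure were linear
```

Under these assumptions, a 100 mg/kg dose would bring Cmax into the range of the EC50, which is consistent with the suggestion above to test a substantially higher dose; real exposure may deviate from linearity, so this is a rough guide only.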

■ CONCLUSIONS
In summary, a workflow was applied to prioritize the hits resulting from an HTS phenotypic screening of the Ty-Box library of 456 diverse chemical compounds in order to identify new lead compounds against kinetoplastidae and/or Mtb. We progressed stepwise with the compound selection and obtained different readouts (Figure S7). We explored the large amount of activity and toxicity data generated by HTS with several computational approaches that are unique in the realm of NID research and have been successfully implemented in the NID drug discovery pipeline. Over 20,000 data points were generated, which required the use of cheminformatic and machine learning approaches to assist in lead identification. We have subsequently demonstrated that antiparasitic and antituberculosis drug discovery can exploit these Bayesian machine learning models to assist in selecting compounds for in vitro screening that may have a higher probability of activity against T. brucei, L. infantum, T. cruzi, and Mtb and, at the same time, a lower probability of undesirable off-target effects due to CYP inhibition, hERG inhibition, mitochondrial toxicity, or cytotoxicity. These new compounds could in turn lead to novel therapeutic treatments. The integrated in vitro and computational approaches also highlighted molecular substructures that contributed to the various activities and toxicities measured in vitro. Ultimately, the synthesis of similar and focused analogues solved some of the chemical or biological liabilities that would have prevented the molecules from proceeding further in the pipeline (Figure S7). Compound 40 was successfully identified as a new chemical entity with low similarity to existing molecules, dual inhibitory activity against the parasitic enzymes, and low toxicity. It is anticipated that the improved accessibility of such drug discovery data sets, when combined with machine learning tools like those described herein, will facilitate the calculation of these activities more routinely in drug discovery. The outcomes of this research are publicly disclosed for the first time. We expect that this information will enable the scientific community to seed future lead discovery programs and to apply and validate other computational approaches, and we hope that it will eventually aid in delivering innovative treatments for these important but neglected diseases.
Finally, we have attempted a balanced use of different machine learning tools to support decision-making in drug discovery, which led us to identify innovative, optimized, and developable compounds. Our efforts also used objective criteria and clearly identified chemical/biological liabilities and opportunities in a compound library previously unexplored in the field of NID.
■ EXPERIMENTAL SECTION
Synthetic Procedures. All reagents, solvents, and other chemicals were used as purchased without further purification unless otherwise specified. Air- or moisture-sensitive reactions were performed under an argon atmosphere. Reactions were monitored by thin-layer chromatography on silica gel plates (60F-254, E. Merck) and visualized with UV light, cerium ammonium sulfate, and alkaline KMnO4 aqueous solution. Column liquid chromatography (LC) purifications were carried out using Merck silica gel 60 (230−400 mesh, ASTM). Flash chromatography purification was performed with an ISOLERA SP1-Biotage system. The structures of all isolated compounds were confirmed by nuclear magnetic resonance (NMR) and mass spectrometry. 1H and 13C NMR (1D and 2D experiments) spectra were recorded on a DPX-400 Avance (Bruker) spectrometer at 400 MHz or on a DPX-600 Avance (Bruker) spectrometer at 600 MHz.
In Vitro Evaluation of Activity against T. brucei. The efficacy of compounds against T. brucei bloodstream forms was evaluated using a modified resazurin-based assay previously described in the literature.52 Mid-log bloodstream forms were added to an equal volume of serial dilutions of compounds in supplemented complete HMI-9 medium at a final cell density of 5 × 10³/mL. Following incubation for 72 h at 37 °C and 5% CO2, 20 μL of a 0.5 μM resazurin solution was added, and plates were incubated for a further 4 h under the same conditions. Fluorescence was measured at excitation and emission wavelengths of 540 and 620 nm, respectively, using a Synergy 2 Multi-Mode Reader (Biotek). This assay was successfully miniaturized into a 384-well microtiter plate format and met the criteria for suitability in a screening campaign. The parameters investigated included cell concentration, assay media composition, incubation time and temperature, Z′, DMSO tolerance, and reproducibility of the potency of the reference compound pentamidine (3.17 ± 0.69 nM). The T. brucei phenotypic screen was performed using a Janus MDT (PerkinElmer) equipped with a 384-head, a WellMate (Thermo), and a MixMate plate mixer (Eppendorf AG). Assay measurements were taken using the EnVision Multilabel 2103 Reader (PerkinElmer) or the Infinite M1000 PRO plate reader (Tecan Group Ltd.). Images from the mitochondrial toxicity assay were taken using an Opera automated microscope (PerkinElmer). The CYP450 assay was performed using the Tecan Fluent liquid-handling automation platform (Tecan Group Ltd.).
In Vitro Evaluation of Activity against L. infantum Intramacrophage Amastigotes. Antiparasitic activity against L. infantum intracellular amastigotes at 10 μM was determined according to the literature.53 Briefly, THP-1 cells were differentiated into macrophages, infected with L. infantum promastigotes and, 24 h after infection, treated with compounds, incubated for another 72 h, and then subjected to high-content analysis to determine antiparasitic activity. The Operetta high-content automated imaging system (PerkinElmer) was used to acquire images, and the Harmony software (PerkinElmer) was optimized to quantify the number of host cells, the infection ratio, and the number of parasites per infected cell. The ratio between infected cells and the total number of cells was then calculated and defined as the infection ratio (IR). The Z′-factor was determined for each plate based on the IR of the control wells and used as a quality-control criterion for plate approval. The IR was normalized to positive (amphotericin B-treated infected cells) and negative (vehicle-treated infected cells) controls and was used to determine the reduction of infection as the antiparasitic activity.
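The plate quality control and normalization described above can be sketched as follows. The Z′-factor is computed with its standard definition, Z′ = 1 − 3(σpos + σneg)/|μpos − μneg|. The normalization formula, which maps the negative-control IR to 0% and the positive-control IR to 100% activity, is a common convention that we assume here; the exact expression used in the screening software is not stated in the text. The control values are hypothetical.

```python
from statistics import mean, stdev


def z_prime(pos_controls, neg_controls):
    """Z'-factor of a plate from its positive and negative control wells."""
    spread = 3.0 * (stdev(pos_controls) + stdev(neg_controls))
    window = abs(mean(pos_controls) - mean(neg_controls))
    return 1.0 - spread / window


def percent_activity(ir_sample, pos_controls, neg_controls):
    """Reduction of infection (%) relative to the plate controls.

    ASSUMED convention: vehicle-treated wells (negative control) = 0%,
    amphotericin B-treated wells (positive control) = 100%.
    """
    mu_pos, mu_neg = mean(pos_controls), mean(neg_controls)
    return 100.0 * (mu_neg - ir_sample) / (mu_neg - mu_pos)


# Hypothetical infection ratios (IR) for one plate.
pos = [0.04, 0.05, 0.06]   # amphotericin B-treated infected cells
neg = [0.78, 0.80, 0.82]   # vehicle-treated infected cells

plate_z = z_prime(pos, neg)                  # about 0.88
activity = percent_activity(0.425, pos, neg) # about 50% reduction of infection
```

A Z′ value above 0.5 is widely taken to indicate a robust assay window; the acceptance threshold applied in this work is not given in the text.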
In Vitro Culture of T. cruzi. The drug assay method consists of infecting the osteosarcoma-derived human U2OS cell line with tissue-derived trypomastigote forms of T. cruzi for 24 h prior to the addition of the compounds to 384-well plates, as previously described.46 Infected cultures were exposed to compounds for 72 h. Plates were processed for high-content analysis as described above for the Leishmania assay.
M. tuberculosis. As described in ref 37.
hERG Assay. This assay made use of the Predictor hERG fluorescence polarization assay (Thermo). The assay uses a membrane fraction containing the hERG channel (Predictor hERG Membrane) and a high-affinity red fluorescent hERG channel ligand, or "tracer" (Predictor hERG Tracer Red), whose displacement by test compounds can be determined in a homogeneous, fluorescence polarization (FP)-based format.54
Cytochrome P450 1A2, 2C9, 2C19, 2D6, and 3A4 Assays. These assays made use of the P450-Glo assay platform (Promega). Each CYP450 assay made use of microsomal preparations of cytochromes from baculovirus-infected insect cells. Action of the CYP450 enzymes upon each substrate ultimately resulted in the generation of light, and a decrease in this signal was indicative of inhibition of the enzymes.54
Cytotoxicity Assay against the A549 Cell-Line. The assays were performed using the CellTiter-Glo assay (Promega). The assay detects cellular ATP content, with the amount of ATP being directly proportional to the number of cells present. The A549 cells were obtained from DSMZ (German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany), and WI38 cells were obtained from ATCC (ATCC CCL-75); both were grown in DMEM with FCS (10% v/v), streptomycin (100 μg/mL), and penicillin G (100 U/mL).54
Assessment of Mitochondrial Toxicity. This assay made use of MitoTracker Red chloromethyl-X-rosamine (CMXRos) uptake and high-content imaging to monitor compound-mediated mitochondrial toxicity in the 786-O (renal carcinoma) cell line. Cells were maintained in RPMI-1640 medium containing 2 mM glutamine, FCS (10% v/v), streptomycin (100 μg/mL), and penicillin G (100 U/mL).54
Biological Data Analysis. The screening data were analyzed using ActivityBase (IDBS, Guildford, UK), and outlier elimination in the control wells was performed using the 3-sigma method. Unless stated otherwise, dose−response experiments were performed in the 11-point format, with the IC50 (or EC50) value, Hill slope, minimum signal, and maximum signal for each dose−response curve obtained using a four-parameter logistic fit in the XE module of ActivityBase (IDBS).
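For readers unfamiliar with the two data-analysis steps named above, the sketch below shows a single-pass 3-sigma outlier filter for control wells and the four-parameter logistic (4PL) model whose parameters (minimum, maximum, EC50, Hill slope) are obtained from the fit. It is an illustration under our own assumptions, not a description of the ActivityBase implementation: the filter is applied once using the sample standard deviation, and the curve-fitting step itself is not shown.

```python
from statistics import mean, stdev


def three_sigma_filter(values):
    """Drop control-well values more than 3 standard deviations from the mean.

    ASSUMPTION: one pass, sample standard deviation of all wells.
    """
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) <= 3.0 * sigma]


def four_pl(conc, bottom, top, ec50, hill):
    """Four-parameter logistic response at concentration conc (> 0).

    bottom/top: minimum and maximum signal; ec50: concentration giving the
    half-maximal response; hill: Hill slope. The response rises with conc.
    """
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


# Hypothetical control wells: fifteen at 100 and one aberrant well at 200.
controls = [100.0] * 15 + [200.0]
cleaned = three_sigma_filter(controls)      # the 200.0 well is removed

# By construction, the response at conc == ec50 is the midpoint of the curve.
midpoint = four_pl(3.0, bottom=0.0, top=100.0, ec50=3.0, hill=1.0)   # 50.0
```

In practice the four parameters are estimated by nonlinear least squares over the 11 concentration points of each curve.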
Statistical Analysis. Bayesian Machine Learning. We generated and validated Laplacian-corrected naïve Bayesian classifier models using Discovery Studio version 4.1 (Biovia, San Diego, CA). Values of AlogP; molecular weight; number of rotatable bonds, rings, aromatic rings, hydrogen bond acceptors, and hydrogen bond donors; molecular fractional polar surface area; and FCFP_6 were used as the molecular descriptors. Compounds were set as active when they showed (i) antiparasitic activity >80% at 50 μM against T. brucei, (ii) antiparasitic activity >40% at 50 μM against amastigote L. infantum and T. cruzi, and (iii) >20% activity at 20 μM against both replicant and nonreplicant strains of Mtb. Similarly, compounds with adverse activity (>50% against the five CYP isoforms at 50 μM, >20% against hERG at 10 μM, >50% mitochondrial toxicity at 10 μM, and >30% cytotoxicity against the A549 cell-line) were set as toxic. Computational models were validated using leave-one-out (LOO) cross validation, in which each sample was left out one at a time, a model was built using the remaining samples, and that model was used to predict the left-out sample. Each model was internally validated, receiver operator characteristic curve (ROC) plots were generated, and the cross-validated ROC "area under the curve" was calculated. Then, 5-fold cross validation (i.e., leaving out 20% of the data set, repeated five times) was also performed.
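The principle of a Laplacian-corrected naïve Bayesian classifier with leave-one-out validation can be sketched in a few lines. The feature weight used below, ln[(A_F + 1)/(N_F · P_base + 1)], where N_F is the number of training compounds containing feature F, A_F the number of those that are active, and P_base the overall fraction of actives, follows the commonly published description of this estimator; it is our assumption that it approximates the Discovery Studio behavior, which we do not reproduce. Fragment names are hypothetical stand-ins for FCFP_6 features.

```python
import math
from collections import Counter


def train(samples):
    """samples: list of (set_of_features, is_active). Returns feature weights."""
    base_rate = sum(1 for _, active in samples if active) / len(samples)
    seen, seen_active = Counter(), Counter()
    for features, active in samples:
        for f in features:
            seen[f] += 1
            if active:
                seen_active[f] += 1
    # Laplacian correction: rarely seen features are pulled toward weight 0.
    return {
        f: math.log((seen_active[f] + 1.0) / (seen[f] * base_rate + 1.0))
        for f in seen
    }


def score(weights, features):
    """Sum of the weights of the features present; unseen features add 0."""
    return sum(weights.get(f, 0.0) for f in features)


def leave_one_out_scores(samples):
    """Score each compound with a model trained on all the other compounds."""
    return [
        score(train(samples[:i] + samples[i + 1:]), features)
        for i, (features, _) in enumerate(samples)
    ]


# Hypothetical toy training set (fragment names are illustrative only).
toy = [
    ({"sulfonamide", "pyrimidine"}, True),
    ({"sulfonamide", "nitro"}, True),
    ({"sulfonamide"}, True),
    ({"benzothiadiazine"}, False),
    ({"benzothiadiazine", "chloro"}, False),
    ({"chloro"}, False),
]
weights = train(toy)
loo = leave_one_out_scores(toy)
```

In this toy set, the fragment found only in actives receives a positive weight and the one found only in inactives a negative weight, mirroring the top- and bottom-scoring fragment lists discussed in the Results.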
−61 It uses automated workflows to detect molecules that are problematic for integration into machine learning methods; these can then be rapidly corrected using automated standardization and human re-curation. This software also outputs a high-quality data set and a Bayesian machine learning model that may be used to predict the potential bioactivity of additional compounds. These models are generated with ECFP6 descriptors produced from the CDK library54 that have been used for structure−activity relationships.62 Each Bayesian machine learning model also includes metrics to evaluate internal 5-fold cross-validation performance,21 including receiver operator characteristic (ROC), recall, precision, F1 score, Cohen's kappa,63,64 and Matthews correlation coefficient.65 The prediction scores generated with the Assay Central models62,66 are probability-like scores determined by the ratio of fingerprints (e.g., ECFP_6) in active and inactive molecules, with a value of 0.5 or greater designating a chemical as active at the modeled target.
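The threshold-dependent metrics listed above are all functions of the confusion matrix obtained once scores are binarized (here, at the 0.5 cutoff mentioned in the text). The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and unrelated to the models in this work.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Recall, precision, F1, Cohen's kappa, and MCC from confusion counts."""
    n = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0
    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "kappa": kappa, "mcc": mcc}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Unlike accuracy, kappa and MCC stay near zero for a model that simply predicts the majority class, which makes them more informative for imbalanced screening data.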
Comparison of Machine Learning Algorithms. The data sets were used for the comparison of additional machine learning algorithms, namely, random forest, k-nearest neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep learning architecture.49 These alternative machine learning methods use ECFP6 as the molecular descriptors. The 5-fold cross-validation metrics were compared across all algorithms with a rank normalized score.61,67 These scores can be evaluated pairwise (i.e., method per training set) or independently (i.e., general method comparison). A further measure is a "difference from the top" (ΔRNS) metric, which subtracts the rank normalized score for each algorithm from the highest rank normalized score for a specific training data set. This method maintains the pairwise results from each training set score by algorithm and allows a direct performance comparison of any two machine learning algorithms while maintaining the information from the other six algorithms.
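The "difference from the top" metric described above reduces to a simple subtraction per training set. The sketch below takes precomputed rank normalized scores as input (their calculation is described in the cited references and is not reproduced here); the function name `delta_rns` and the score values are hypothetical.

```python
def delta_rns(rns_by_algorithm):
    """Difference from the top: best RNS in the training set minus each RNS.

    rns_by_algorithm: mapping of algorithm name to its rank normalized score
    for ONE training data set. The best algorithm gets 0; larger is worse.
    """
    top = max(rns_by_algorithm.values())
    return {name: top - value for name, value in rns_by_algorithm.items()}


# Hypothetical rank normalized scores for a single training set.
example = {
    "random forest": 0.82,
    "support vector classification": 0.80,
    "k-nearest neighbors": 0.70,
}
deltas = delta_rns(example)
```

Because every algorithm is measured against the best performer on the same training set, ΔRNS values remain comparable across data sets of differing difficulty.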

Figure 1. Representative chemotypes of known drugs used for the therapy of tuberculosis and protozoan (Leishmania and Trypanosoma spp.) infections.

Figure 3. Chemical and physicochemical characterization of the Ty-Box library. (A) 3D and 2D t-SNE of the Ty-Box compound library. Clustering of the library compounds was based on chemical similarity. Thirteen clusters were identified; the common chemical scaffold of the compounds of each cluster is reported in panel (C). (B) (Top) Distribution of the extended rule-of-five (RO5) parameters for the 456 compounds of the Ty-Box library. The bars are color-coded according to the chemical clusters identified in the t-SNE; (bottom) summary of the physicochemical parameters of the Ty-Box compound library. (C) Scaffold analysis of the compounds from the Ty-Box library. The most frequently occurring atomic frameworks (Murcko scaffolds), present in more than ten representative compounds, are reported.

Figure 4. Hierarchical clustering groups hit compounds with a similar chemical structure based on the ECFP_6 fingerprint. Antiparasitic activity, drug−drug interactions, and early human toxicity that emerged from the primary screening are represented for each compound as a heat map. All the data of the primary screening are reported in Table S1.

Figure 5. Venn diagram reporting the selectivity profile of the primary hits retrieved from the HTS according to the cutoffs: % of cell growth inhibition >80% at 50 μM against T. brucei, >40% at 50 μM against L. infantum and T. cruzi, and >20% at 20 μM against replicant and nonreplicant strains of Mtb. The hits prioritized by Pareto ranking for dose−response studies are underlined.

Figure 7. Chemical structures of the validated primary hits and the corresponding antiparasitic activity expressed as EC50. For compounds that did not meet the defined cutoff against at least one parasitic strain, the dose−response assay was not performed (n.d., not determined). SD for all the assays is within ±10% of each value.

Figure 8. Structural similarity trees of the secondary hit compounds. The compounds were divided into three clusters according to structural similarity and derivation from the parent hits identified in the primary screening. The modification in the chemical structure of each hit, with respect to the most similar analogues, is highlighted in red.

Figure 9. Heat map reporting the antiparasitic activity, drug−drug interactions, and early human toxicity for the secondary hits.

Figure 10. Statistical analysis for the validation of the Bayesian machine learning models by comparison of the predicted vs experimental data. For each target and off-target, the following are reported: the correlation between the predicted state and the experimental activity/toxicity; the confusion matrix at the selected cutoff (antiparasitic activity >80% at 50 μM against T. brucei; antiparasitic activity >40% at 50 μM against amastigote L. infantum and T. cruzi; inhibitor activity >50% against CYP isoforms, >20% against hERG, and mitochondrial toxicity >50% at 10 μM); and the ROC curve for the evaluation of the ability of the Bayesian machine learning model to discriminate the true state of the test set of compounds.

Figure 11. Biological activity summary of 40. (A) Chemical structure of the secondary hit 40. (B) Summary of the antiparasitic activity and liability of 40. (C) Atom coloring of 40 with Assay Central models for targets and off-targets generated in this work. (D) Pharmacokinetic profile of compound 40 after i.v. administration at a dose of 1 mg/kg in mice.