Virtual Screening Based on Ligand and Structure with in vitro Assessment of Neolignans against Trypanosoma cruzi

Chagas disease, caused by the parasite Trypanosoma cruzi, occurs most commonly in Latin America. Because current treatment is highly toxic and ineffective in the chronic phase of the disease, alternative treatments are needed. Using quantitative structure-activity relationship (QSAR) modeling together with ligand-based and structure-based virtual screening methods, we predicted the trypanocidal potential of 47 neolignans against three target enzymes: cruzain, trypanothione reductase, and sterol 14-alpha demethylase. A combined analysis allowed the selection of potent inhibitors of Trypanosoma cruzi. Two of these compounds were isolated and shown to inhibit the growth of epimastigotes at concentrations of 9.64 and 8.72 µM, and of trypomastigotes at 4.88 and 2.73 µM. Therefore, the compounds (2R,3R)-2,3-dihydro-2-(4-methoxyphenyl)-3-methyl-5-(E)-propenylbenzofuran (46) and ottomentosa (47) are promising growth inhibitors of these parasite stages and warrant additional study.


Introduction
Trypanosoma cruzi, the protozoan parasite that causes Chagas disease (CD), was initially endemic to Latin America but has spread to other regions, including Canada, the United States, Europe, Australia, and Japan. 1 It currently affects 6-7 million people worldwide and causes approximately 50,000 deaths per year. 1,2 Transmission of T. cruzi can occur congenitally, through organ transplantation or blood transfusion, or through ingestion of food and drinks contaminated by the parasite. 3 The parasites are transmitted to humans predominantly as metacyclic trypomastigote (MT) forms, through the contaminated feces of blood-sucking triatomines deposited at the bite site. After internalization by host cells near the entry site, MTs initially reside in a parasite-containing vesicle, the parasitophorous vacuole, from which they escape into the host cell's cytoplasm and differentiate into the proliferative amastigote form. After several cycles of replication, amastigotes differentiate into motile flagellated trypomastigotes, which are released into the bloodstream, from where they can spread by infecting distant tissues or be taken up by the triatomine vector during a blood meal. Ingested blood trypomastigotes become epimastigotes in the vector's midgut, where they multiply and then differentiate into infectious metacyclic trypomastigotes. Infectious trypomastigotes and intracellular replicative amastigotes are the clinically relevant life-cycle stages for drug targeting. 3,4
Currently, no vaccines prevent diseases caused by trypanosomatids, and although chemotherapeutic drugs have been available for decades, they are highly toxic and have unpleasant side effects. Benznidazole, a nitroimidazole derivative, and nifurtimox, a nitrofuran compound, both developed more than 40 years ago, are currently the only drugs available for the treatment of CD. 5 Although benznidazole is the first-line drug because of its better tolerability, both drugs have significant side effects. 6 Therefore, patients should be monitored frequently. Unfortunately, the available medications are effective only in the acute phase, and in 20% of cases the treatment must be stopped because of side effects. 7,10,11 Some studies 12-14 have reported the importance of neolignans as promising compounds in the treatment against T. cruzi.
Through a literature review, we identified three targets that are important for the proliferation and survival of the parasite in its three cellular forms: the enzymes cruzain, trypanothione reductase (TR), and sterol 14-alpha demethylase (CYP51). Cruzain and TR are found only in trypanosomatids, which favors the design of selective drugs that do not harm humans. Cruzain is the main cysteine protease of T. cruzi, TR is essential for redox metabolism, and CYP51 is the key enzyme in ergosterol biosynthesis. 17 We combined computational and experimental studies to select the most promising trypanocidal compounds that may be effective in the acute and chronic phases of CD and present low toxicity. The compounds identified may therefore serve as drug candidates for both phases of the disease.

Results and Discussion
Quantitative structure-activity relationship (QSAR) modeling
Three prediction models were built using the random forest (RF) algorithm to perform ligand-based virtual screening. To construct these models, molecular descriptors were calculated for molecules with known activity against cruzain, TR and CYP51 of T. cruzi obtained from the ChEMBL database. 18,19 The RF models were evaluated for predictive power using specificity, sensitivity, precision, accuracy, positive predictive value (PPV) and negative predictive value (NPV), and for performance and robustness using the receiver operating characteristic (ROC) curve, its area under the curve (AUC), and the Matthews correlation coefficient (MCC). Table 1 describes the characteristics of the models in terms of predictive power and robustness, and Figure 1 shows their performance. The models provided satisfactory classification, performance, and robustness, except for the CYP51 model, whose accuracy and specificity values were below 0.6. This model was therefore not used for prediction.
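The evaluation metrics named above can all be derived from the four counts of a confusion matrix. The following is a minimal Python sketch, not the KNIME workflow used in this study; the function name and the example counts are hypothetical and are not taken from Table 1.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return binary-classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical external-test-set counts, for illustration only.
metrics = classification_metrics(tp=40, tn=35, fp=10, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under a rule like the one applied to the CYP51 model, a model whose `accuracy` or `specificity` falls below 0.6 would be set aside.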
After validation, the models were used to screen the set of neolignans and select compounds potentially active against cruzain and TR.
The cruzain model selected a single compound as potentially active, with a probability of 50% (Table 2). The TR model classified all 47 compounds as potentially active, with probabilities ranging from 54 to 84% (Table 2). According to these results, the neolignans have greater potential for activity against TR than against cruzain.

Docking consensus
The 47 neolignans also underwent a consensus docking assessment to increase the reliability of the method and decrease false positives. The enzymes cruzain, TR and CYP51 were used for the docking studies. The docking results were generated using three different scoring functions and validated for each enzyme by redocking the PDB ligand (the inhibitor co-crystallized with the protein in the Protein Data Bank, PDB). 20 For most scoring functions, more negative values indicate better predictions.
After docking, the results were standardized for each scoring function using the docking probability (Prob_Dc) formula: 21
$$\mathrm{Prob}_{Dc} = \frac{E_{Lig}}{E_{MLig}}, \quad \text{if } E_{Lig} < E_{Inib} \qquad (1)$$
where E_Lig is the energy of the ligand, E_MLig is the energy of the ligand with the highest score, and E_Inib is the energy of the inhibitor obtained from the crystallographic data of the test protein. The highest possible value of Prob_Dc is 1. Thus, only compounds whose interaction energy was at least as favorable as that of the crystallographic inhibitor were considered potentially active. The standardized results of the scoring functions were then averaged for each compound.
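To make the standardization concrete, the sketch below assumes that Equation 1 divides each ligand energy by the energy of the best-scoring ligand (E_Lig / E_MLig), that all energies are negative, and that the consensus is a plain average over scoring functions. The function names, compound labels and energy values are hypothetical; this is an illustration under those assumptions, not the authors' implementation.

```python
def prob_dc(ligand_energies, inhibitor_energy):
    """Normalize docking energies from one scoring function.

    Returns (per-ligand Prob_Dc, Prob_Dc of the crystallographic inhibitor).
    Assumes more negative energies are better and the best energy is negative.
    """
    best = min(ligand_energies.values())  # E_MLig: best-scoring ligand
    if best >= 0:
        raise ValueError("expected a negative best docking energy")
    probs = {name: e / best for name, e in ligand_energies.items()}
    return probs, inhibitor_energy / best


def consensus(tables):
    """Average the normalized values over all scoring functions."""
    return {name: sum(t[name] for t in tables) / len(tables) for name in tables[0]}


# Hypothetical energies from two scoring functions (different units/scales).
scores_a = {"cmpd_A": -120.0, "cmpd_B": -90.0, "cmpd_C": -60.0}
scores_b = {"cmpd_A": -8.0, "cmpd_B": -10.0, "cmpd_C": -5.0}

probs_a, inhib_a = prob_dc(scores_a, inhibitor_energy=-100.0)
probs_b, inhib_b = prob_dc(scores_b, inhibitor_energy=-9.0)

ligand_consensus = consensus([probs_a, probs_b])
inhibitor_consensus = (inhib_a + inhib_b) / 2

# Keep compounds scoring at least as well as the crystallographic inhibitor.
actives = sorted(n for n, p in ligand_consensus.items() if p >= inhibitor_consensus)
print(ligand_consensus, inhibitor_consensus, actives)
```

Normalizing before averaging is what lets scoring functions with different scales contribute equally to the consensus.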
The docking results were validated by redocking the crystallographic ligand and calculating the root-mean-square deviation (RMSD) of the resulting poses. Redocking consists of positioning the crystallographic ligand in the enzyme's active site and predicting its binding affinity. The RMSD measures the deviation between the poses obtained by redocking and the experimentally determined ligand structure. For the fit to be considered reliable, the RMSD must be 2.0 Å or less. 22 The RMSD values obtained for cruzain, TR and CYP51 were 0.77, 0.64 and 0.31 Å, respectively.
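The redocking criterion can be illustrated with a short RMSD calculation. This sketch assumes that the redocked and crystallographic poses list the same atoms in the same order and share the protein's coordinate frame, so no superposition is applied; the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as input) between two poses."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical heavy-atom coordinates in angstroms.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked_pose = [(0.3, 0.0, 0.0), (1.5, 0.4, 0.0), (1.5, 1.5, 0.5)]

value = rmsd(crystal_pose, redocked_pose)
is_reliable = value <= 2.0  # threshold stated in the text
print(f"RMSD = {value:.2f} A, reliable = {is_reliable}")
```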
Among the 47 neolignans analyzed by molecular docking, two compounds were potentially active against cruzain. The cruzain test inhibitor had a Prob_Dc value of 0.88, and only two neolignans showed higher values: 0.91 (compound 1) and 0.90 (compound 35). For TR, 31 neolignans were considered active, with Prob_Dc values ranging from 0.49 to 0.81; the TR inhibitor had a Prob_Dc value of 0.49.
For CYP51, the inhibitor obtained a value of 0.56, and 18 neolignans obtained an equal or greater value. These results indicate that the neolignans, in general, are more likely to be active against TR and CYP51 than against cruzain. These results are shown in Tables 3-5.

Structure and ligand-based combined analyses
A second consensus analysis was carried out to identify potentially active, multitarget neolignans, that is, compounds likely to be active against more than one protein according to both the RF and docking models. For this purpose, we combined all predictions of biological activity for the neolignans with the docking results. In addition to selecting active compounds, this combined analysis allowed the selection of the most potent compounds by uniting two complementary methodologies, ligand-based and structure-based. The following formula was used: 21
$$\mathrm{Prob}_{Comb} = \frac{\mathrm{Prob}_{Dc} + (1 + ESP) \times P_{Activity}}{2 + ESP}, \quad \text{if } \mathrm{Prob}_{Comb} > 0.5 \qquad (2)$$
where Prob_Comb is the combined probability between the RF model and the docking model, Prob_Dc is the probability of a compound being active in the molecular docking analysis, ESP is the specificity of the RF model and P_Activity is the probability of a compound being active according to the RF model. The combined probability, based on ligand and structure, can increase the predictive power of the models and decrease the number of false positives. For a molecule to be considered potentially active, its Prob_Comb value must be equal to or greater than 0.5. The higher the Prob_Comb value, the greater the potential of the molecule. Prob_Comb values were calculated for all neolignans and each target enzyme, and we analyzed which molecules were multitarget. Only for CYP51 was it impossible to calculate Prob_Comb, because the low quality of the RF model left us without activity predictions for this enzyme. Therefore, we considered only the Prob_Dc values for this target, which had to be equal to or higher than that of the crystallographic ligand (0.56).
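As a worked illustration of the combined probability, the sketch below assumes that Equation 2 is a weighted mean in which the RF probability receives weight (1 + ESP) and the docking probability weight 1, that is, (Prob_Dc + (1 + ESP) × P_Activity) / (2 + ESP). The function name and input values are hypothetical.

```python
def prob_comb(prob_dc, p_activity, specificity):
    """Combine docking and RF probabilities, weighting RF by (1 + specificity)."""
    return (prob_dc + (1 + specificity) * p_activity) / (2 + specificity)


# Hypothetical values for one compound and one target.
combined = prob_comb(prob_dc=0.80, p_activity=0.60, specificity=0.70)
is_potentially_active = combined >= 0.5  # cutoff stated in the text
print(f"Prob_Comb = {combined:.3f}, active = {is_potentially_active}")
```

Under this assumed form, a more specific RF model pulls the combined value toward its own prediction, and a specificity of zero reduces the expression to a simple average of the two probabilities.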
After performing the combined ligand- and structure-based analysis and applying the formula to identify potentially active and multitarget molecules, we identified 29 potentially active molecules for more than two target enzymes from the entire set of neolignans analyzed. In addition, after the combined probability analysis, we selected the multitarget compounds that passed the applicability domain for all enzymes in this study. Using Prob_Comb, we selected 22 compounds with a probability of activity ranging from 50 to 65% for cruzain (Table 3). The combined analysis allowed us to select more active compounds for this target than the P_Activity and Prob_Dc results used separately. It was also possible to select 46 neolignans potentially active against TR, with Prob_Comb values ranging from 52 to 82% (Table 4), while 18 neolignans obtained a probability between 56 and 88% for CYP51 (Table 5).
The set of 47 neolignans was then evaluated with several predictive parameters to identify the compounds with the best pharmacokinetic, pharmaco-chemical and pharmacological profiles. Initially, using physicochemical properties, we looked for compounds with good absorption, taking the Lipinski rule as a parameter.
We evaluated absorption and bioavailability using the Lipinski rule, 23 under which molecules with a molecular weight below 500 Da, a calculated partition coefficient (cLogP) below five, no more than five hydrogen-bond donors, no more than ten hydrogen-bond acceptors and ≤ 10 rotatable bonds are expected to show excellent absorption and bioavailability. Molecules that violate two or more of these rules do not demonstrate sufficient absorption. Only one neolignan in our set did not meet this requirement. Therefore, 97.87% of the neolignans showed good absorption and bioavailability.
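The filter described above amounts to counting rule violations per compound. The sketch below is a minimal illustration that follows the criteria as listed in this section (including the rotatable-bond limit) and assumes the descriptors have already been computed by a tool such as SwissADME; the class, function names and descriptor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float      # Da
    clogp: float
    h_donors: int
    h_acceptors: int
    rotatable_bonds: int


def rule_violations(d):
    """List the criteria from the text that a compound violates."""
    checks = {
        "molecular weight": d.mol_weight < 500,
        "cLogP": d.clogp < 5,
        "H-bond donors": d.h_donors <= 5,
        "H-bond acceptors": d.h_acceptors <= 10,
        "rotatable bonds": d.rotatable_bonds <= 10,
    }
    return [name for name, ok in checks.items() if not ok]


def has_good_absorption(d):
    # Two or more violations indicate insufficient absorption.
    return len(rule_violations(d)) < 2


# Hypothetical descriptor values, not taken from Tables S1-S4.
small = Descriptors(326.4, 3.9, 1, 4, 5)
large = Descriptors(612.7, 6.2, 2, 8, 7)
print(has_good_absorption(small), rule_violations(large))
```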
Factors such as lipophilicity and solubility contribute to the distribution of a drug in vivo, which is a requirement for advancing to preclinical and clinical testing. The most common descriptor of lipophilicity is the partition coefficient between n-octanol and water (logP). All neolignans had logP values below 5.0, within the ideal range.
Pharmacokinetics is essential for understanding drug metabolism in the body, half-lives, and toxic metabolites. Unfortunately, many compounds fail in preclinical and clinical testing because of metabolism, poor absorption, or effects on the brain. An early assessment of these effects is therefore necessary, and an in silico approach contributes substantially to mitigating adverse reactions. The results showed that 47.55% of the neolignans were not substrates of the CYP enzymes and did not cross the blood-brain barrier. Toxicity was also evaluated, and only nine neolignans showed low or high toxicity risk in at least one of the parameters evaluated: mutagenicity, tumorigenicity, negative effects on the reproductive system, and irritability. Therefore, 80.85% of the neolignans were considered to have the best ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties, as they present no toxicity risks. Tables S1-S4, in the Supplementary Information (SI) section, show the ADMET profiles of the entire set of neolignans.

Interaction analysis
Several neolignans in this study obtained promising results for binding energy, predicted biological activity, combined analysis, pharmacokinetic and pharmaco-chemical properties, and toxicity. We chose to analyze the interactions of the two compounds (33 and 43) that stood out in all these properties and of the two isolated compounds (46 and 47), which obtained good docking results (Figures 2 to 4).

Cruzain
Cruzain is the main cysteine protease in T. cruzi and is essential at all stages of the parasite's development. Overexpression of cruzain increases the transformation of the parasite into an infectious form, and the enzyme is therefore considered an attractive target in drug design. 15 Compound 33 formed a hydrophobic interaction with Leu37, a steric interaction with Ala133 and three hydrogen bonds with the amino acids Trg59, Asp60 and Asp158. Compound 43 formed only two hydrogen bonds, with Ser61 and Asn70, in addition to several hydrophobic interactions with residues Met68, Gly65, Leu67, Gly66 and Ala133. Compound 47 showed a hydrogen bond with Gly66, a van der Waals interaction with Gly65 and hydrophobic interactions with Leu67, Ala133, Asp156 and Leu157. According to Durrant et al., 24 an irreversible cruzain inhibitor, benzoyl-Tyr-Ala, could interact with the cruzain active site in a manner similar to the neolignans. According to the authors, interactions with Met68, Leu67 and Glu205 were important for inhibiting the enzyme's activity.

TR
TR, found in the epimastigote and trypomastigote forms of the parasite, is a key enzyme in redox metabolism and is essential for trypanosomes. This enzyme is absent in humans, where its role is played by glutathione and glutathione reductase, which offers a target for selective inhibition.
Compound 33 formed three strong hydrogen bonds with residues Gly14, Ser15 and Glu19. Hydrophobic interactions were also observed with Cys53, Tyr111 and Met114. Compound 43 formed more hydrogen bonds than compound 33, with Ser15, Glu19, Ile107, Ser110, Tyr111 and Met114; only one weaker interaction, with Ile339, was observed. Compound 46 formed an interaction with Thr335 and several hydrophobic interactions with Gly57, Cys58, Lys61, Ile200, Phe204 and Asp327. Compound 47 formed more stable bonds with Ser110, Tyr111 and Met114, and hydrophobic interactions with Ser15 and Glu19. A study by Saravanamuthu et al. 16 showed that an inhibitor of T. cruzi TR formed several of the interactions also observed between the neolignans and the active site of TR, notably those with Glu19, Trp22, Cys53, Ser110, Tyr111, Asp117 and Leu399.

CYP51
The CYP51 enzyme catalyzes a key step in ergosterol biosynthesis, the oxidative demethylation of intermediate sterols through its heme group. It is essential for the parasite's survival, development and proliferation, which is why it is present in all cellular forms. 17 Compound 33 formed four strong hydrogen bonds with the amino acids Ala287, Ala291, Met358 and Ala414 of the CYP51 active site. It also showed several hydrophobic interactions with the residues Tyr103, Met106, Ala115, Met123, Leu127 and Leu356. Compound 43 showed hydrophobic interactions with residues Phe110, Ala115, Ala287 and Leu356, two hydrogen bonds with Tyr116 and Phe290, and a steric interaction with Leu130.

Molecular dynamics simulations
After the virtual screening and the analysis of the activity potential of several neolignans against important T. cruzi enzymes, we conducted molecular dynamics simulations with the two compounds that we were able to isolate (compounds 46 and 47) to assess the flexibility and stability of the enzymes and their interactions in the presence of factors such as solvent, ions, pressure, and temperature. This information complements the docking results and allows us to evaluate whether the compounds remain bound to the studied enzymes under conditions resembling those in the host organism. We chose TR for this analysis because the neolignans were more selective for this protein. The RMSD was then calculated separately for the Cα atoms of the complexed enzyme and for the structure of each ligand.
The RMSD of TR complexed with the crystallographic ligand ranged from 0.35 to 0.45 nm over the 50 ns simulation, indicating high stability (Figure 5). The same pattern was observed for the enzyme complexed with the neolignans. The stability of this protein is essential to keep compounds bound to the active site.
When we analyzed the flexibility of the ligands, we found that the crystallographic ligand was markedly less stable than the neolignans throughout the simulation (Figure 6). This suggests that, in the presence of solvent, ions and other factors, the neolignans can maintain stable interactions with the active site.
To identify the residues that contribute to conformational changes in TR, root-mean-square fluctuation (RMSF) values were calculated for each amino acid of the enzyme. High RMSF values suggest greater flexibility, while low RMSF values reflect less flexibility. Since amino acids with fluctuations above 0.3 nm contribute to the flexibility of the protein structure, we found that residues at positions 1, 80-90, 460-462, and 486 contribute to conformational changes in TR (Figure 7). None of the amino acids associated with these conformational changes were components of the active site, which favors the permanence of the neolignans in the active site.
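The RMSF criterion can be sketched in a few lines. This example assumes a trajectory that has already been aligned to a reference structure, with one coordinate (for instance the Cα atom) per residue in nanometers; the function names and the two-frame toy trajectory are hypothetical, and a real analysis would use the output of the simulation package.

```python
import math


def rmsf_per_residue(frames):
    """RMSF of each residue around its time-averaged position.

    frames: list of frames, each a list of (x, y, z) tuples, one per residue.
    """
    n_frames = len(frames)
    n_residues = len(frames[0])
    values = []
    for i in range(n_residues):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def flexible_residues(rmsf_values, cutoff_nm=0.3):
    """Return 1-based residue numbers whose RMSF exceeds the cutoff."""
    return [i + 1 for i, v in enumerate(rmsf_values) if v > cutoff_nm]


# Toy trajectory: residue 1 moves along x, residue 2 stays fixed.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
]
rmsf = rmsf_per_residue(frames)
flexible = flexible_residues(rmsf)
print(rmsf, flexible)
```

Comparing the flagged residue numbers with the list of active-site residues is then a simple set intersection.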
Using molecular modeling graphics programs, we analyzed 2D interaction maps at different times during the molecular dynamics simulation (Figure 8). Most of the interactions observed in docking were also observed in the dynamics simulations, that is, they persisted even in the presence of solvent and ions. Notable among them are the interactions with Val58, Ile106, Tyr110 and Met113.

Free energy calculations
The molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) method was used to explore the interactions further and estimate the binding free energy after the molecular dynamics (MD) simulation. As seen in Table 6, the crystallographic ligand had a more favorable binding free energy than the neolignans that obtained the best results in docking and in the prediction of biological activity. Although neolignans 46 and 47 showed higher binding affinity than the crystallographic ligand in docking, only the MM/PBSA calculations made it possible to verify that, in the presence of solvent, the inhibitor performed better than the neolignans. The analysis of the energy contributions shows that the electrostatic and van der Waals terms favored the inhibitor's binding free energy. However, although these results are excellent for the inhibitor, they do not indicate that the neolignans are inactive, as hydrogen bonds are strong and are not evaluated by the MM/PBSA method.
Some previous studies have demonstrated the trypanocidal activity of neolignans, but this is the first report of trypanocidal activity for these two neolignans. A study by Cabral et al. 13 showed that two neolignans, licarin A and burchellin, inhibited the growth of epimastigotes by 45 and 20%, respectively. The authors also found that licarin A and burchellin induced trypomastigote death with IC50/24 h values of 960 and 520 μM, respectively (IC50 is the concentration required for 50% inhibition of the parasites). Pelizzaro-Rocha et al. 25 demonstrated that the neolignan eupomatenoid-5 was active against trypomastigotes, the infective form of T. cruzi (half-maximal effective concentration, EC50, of 40.5 μM), leading to ultrastructural alterations and lipoperoxidation in the cell membrane. In addition, they reported that the trypanocidal action of eupomatenoid-5 might be associated with mitochondrial dysfunction and oxidative damage, which can trigger destructive effects on the biological molecules of T. cruzi, leading to the death of the parasite. Ferreira et al. 26 prepared a semi-synthetic library of 23 derivatives of the neolignan dehydrodieugenol B to explore synthetically accessible structure-activity relationships (SAR) against T. cruzi. Five compounds demonstrated activity against trypomastigotes (IC50 values from 8 to 64 μM) and eight showed activity against intracellular amastigotes (IC50 values from 7 to 16 μM).

Conclusions
We used a comprehensive computational approach to investigate the potential of neolignans in the treatment of CD, which led to the isolation and experimental testing of natural products against cellular forms of T. cruzi. The predictive models built for essential enzymes of the parasite performed satisfactorily, with an accuracy greater than 75%, and selected one neolignan with a 50% probability of being active against cruzain. For TR, the model achieved an accuracy of 85% and selected all neolignans, with activity probabilities between 54 and 84%. Therefore, the neolignans were considered selective for TR.
For the structure-based investigation, a consensus docking analysis was conducted to complement the RF models and reduce the number of false positives. Among the 47 neolignans analyzed by molecular docking, two compounds were considered potentially active against cruzain, 31 against TR, and 18 against CYP51. These results indicate that the neolignans, in general, are more likely to be active against TR and CYP51 than against cruzain.
A combined structure- and ligand-based analysis, employed to increase predictive power, identified potentially active molecules using the RF models and molecular docking: 22 compounds with a probability of activity ranging from 50 to 65% for cruzain and 46 neolignans potentially active against TR with a probability ranging from 52 to 82%. The combined analysis expanded the selection of active compounds for cruzain relative to the RF model and molecular docking used separately. We also found that the neolignans were more selective for TR. MD simulations revealed that neolignan-complexed TR was stable in the presence of solvent and ions and under the simulated temperature and pressure, with only small variations observed for some complexed compounds. Therefore, the binding between protein and ligands is unlikely to be affected by environmental changes. In addition, none of the amino acids responsible for the enzyme's conformational changes were at the active site, allowing the active site to remain stable. Finally, free energy calculations with the MM/PBSA method showed that, although the crystallographic ligand presented a more favorable binding energy under these conditions, strong hydrogen bonds also favor the permanence of the neolignans at the active site of TR.
Two neolignans with excellent ADMET profiles, identified by virtual screening as potentially active inhibitors of cruzain and TR, were isolated from Krameria tomentosa and subjected to in vitro tests. The two neolignans (46 and 47) inhibited T. cruzi at concentrations of 9.64 and 8.72 µM for the epimastigote forms and 4.88 and 2.73 µM for the trypomastigote forms, respectively. Therefore, the compounds (2R,3R)-2,3-dihydro-2-(4-methoxyphenyl)-3-methyl-5-(E)-propenylbenzofuran (46) and ottomentosa (47) proved to be promising growth inhibitors of the epimastigote and trypomastigote stages of the parasite.
We also concluded that the neolignans considered active against cruzain and CYP51 in this study could be potent inhibitors of these enzymes in amastigotes, since these enzymes are present in all cellular forms of T. cruzi. Therefore, the neolignans selected in this study serve as a starting point for the development of new antichagasic compounds and should be investigated for their potential activity against the most clinically relevant form of the parasite, the amastigote.

Data collection and curation
The biological activity and 3D structure data of the enzymes cruzain, trypanothione reductase (TR), and sterol 14-alpha demethylase (CYP51) were investigated. Datasets with compounds and their activity values for each selected enzyme (Table 8) were downloaded from the ChEMBL database 18,19 under the codes CHEMBL3563 (cruzain), CHEMBL5131 (TR) and CHEMBL1075110 (CYP51). These compounds were used to build the predictive models and were classified as active or inactive based on their pIC50 (−log IC50) values, as described in Table 8. The IC50 value represents the concentration required for 50% inhibition. The higher the pIC50, the lower the concentration of compound required and, consequently, the more potent its activity. The pIC50 classification was chosen so that each class contained enough samples for the model to distinguish active from inactive compounds, which increases the probability of correctly selecting active molecules. Figure 9 summarizes all the procedures performed in this study.
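The pIC50-based labeling can be illustrated as follows. This sketch assumes IC50 values reported in micromolar and converts them to mol/L before taking the negative base-10 logarithm; the cutoff of 5.0 is a hypothetical placeholder, since the actual class boundaries are those given in Table 8, and the compound names and values are invented for illustration.

```python
import math


def pic50_from_ic50_um(ic50_um):
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)


def label_activity(pic50, cutoff):
    """Label a compound as active when its pIC50 reaches the cutoff."""
    return "active" if pic50 >= cutoff else "inactive"


HYPOTHETICAL_CUTOFF = 5.0  # placeholder, not the value used in Table 8

ic50_values_um = {"cmpd_X": 0.5, "cmpd_Y": 50.0}  # hypothetical data
labels = {
    name: label_activity(pic50_from_ic50_um(value), HYPOTHETICAL_CUTOFF)
    for name, value in ic50_values_um.items()
}
print(labels)
```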
In addition, the ChEMBL database was searched for lignans extracted from natural products. A total of 47 lignans were found and evaluated by virtual screening to identify molecules with potential activity against the three enzymes listed above, which are involved in the proliferation and survival of T. cruzi, following the workflows presented by Fourches et al. 27 Three-dimensional structures were generated and standardized using Standardizer v.18.17.0. 28 Standardization is essential for creating consistent compound libraries and comprised the following steps: addition of hydrogens, aromatization, generation of the 3D structure (cleaning the molecular graph in three dimensions) and export of the compounds in SDF format. For a more detailed description of how the dataset was curated, please refer to the workflows described by Fourches et al. 27,29,30 Codes for the structures of the neolignans are available in Table S1, SI section.
Quantitative structure-activity relationship (QSAR) modeling
Knime 3.6.2 software (Konstanz Information Miner, Zurich, Switzerland) 31,32 was used to perform QSAR modeling. Given the success of our previous studies, 33,34 we opted to perform a 3D QSAR analysis. To generate descriptors, all compounds with standardized chemical structures were saved in SDF format and imported into Dragon 7.0 software (Kode Chemoinformatics SRL, Pisa, Italy), which calculated a total of 5270 molecular descriptors. 35,36 The descriptors generated in Dragon were imported into Knime, and the random forest (RF) algorithm was used to build the prediction models.
Each dataset was divided using the "Partitioning" tool with the "stratified sample" option to create a training set and an external test set, representing 80 and 20% of the compounds, respectively. Although the compounds were selected randomly, the same proportion of active and inactive samples was maintained in both sets.
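The idea behind a stratified 80/20 partition can be sketched with the Python standard library. This is an illustrative stand-in for the KNIME "Partitioning" node, not a reproduction of it; the function name, the seed and the class counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=1):
    """Split sample indices so each class keeps its proportion in both sets."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed for a reproducible partition
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical dataset: 60 active and 40 inactive compounds.
labels = ["active"] * 60 + ["inactive"] * 40
train_idx, test_idx = stratified_split(labels)

n_active_test = sum(labels[i] == "active" for i in test_idx)
print(len(train_idx), len(test_idx), n_active_test)
```

Sampling within each class separately is what preserves the active-to-inactive ratio in both the training and test sets.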
For internal validation, we employed 5-fold cross-validation using randomly selected, stratified groups. The distribution of activity classes was maintained in all validation groups and in the training set. 38,39 All RF models were built with 200 trees and a single seed for random number generation.
Using Knime nodes, the most important descriptors in each prediction model were evaluated. The external performance of the selected models was analyzed in terms of sensitivity (true positive rate, i.e., active rate), specificity (true negative rate, i.e., inactive rate) and accuracy (overall predictability). The positive (PPV) and negative (NPV) predictive values give the probability that predicted positives and predicted negatives are true positives and true negatives, respectively. In addition, the sensitivity and specificity summarized by the receiver operating characteristic (ROC) curve were found to describe true performance more clearly than accuracy. 39 The models were also analyzed using the Matthews correlation coefficient (MCC), which evaluates a model globally from its confusion matrix. The MCC is a correlation coefficient between observed and predicted binary classifications. It takes a value between −1 and +1, where +1 represents a perfect prediction, 0 is no better than a random prediction, and −1 indicates total disagreement between prediction and observation. 40 The MCC can be calculated with the following formula:
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives.
The applicability domain (APD) was used to evaluate whether the predictions for the compounds of the test sets were reliable. The APD is based on Euclidean distances: similarity measures between the descriptors of the training-set compounds are used to define the limit of the domain. If a test-set compound lies beyond this limit, its prediction is not reliable. The APD is calculated with the formula:
$$APD = d + Z\sigma$$
where d and σ are the Euclidean distance and the standard deviation, respectively, of the compounds in the training set, and Z is an empirical cutoff value. 43

Study of molecular docking
Molecular docking was used to investigate the mechanism of action of the selected compounds against the enzymes cruzain (PDB ID 1AIM), 15 TR (PDB ID 1GXF) 16 and CYP51 (PDB ID 4CK9). 44 The 3D structures of the enzymes were obtained from the Protein Data Bank (PDB). 20 Information about the complexed enzymes and inhibitors is given in Table 9. Initially, all water molecules were removed from the crystal structure. To evaluate the docking procedure, we redocked the crystallographic ligand and calculated the root-mean-square deviation (RMSD) between the redocked and experimental poses, which indicates the reliability of the fit. Redocking is considered successful if the RMSD is less than 2.0 Å, meaning that the predicted binding mode is close to the experimental structure. The inhibitors co-crystallized with each enzyme were used as templates to define the active-site region of the protein.
We used the Molegro Virtual Docker v.6.0.1 (MVD) software 45,46 with its predefined parameters. The docking wizard was then set up with the enzymes and ligands to analyze the system's stability through the interactions with the enzyme's active site, using the energy value of the MolDock Score 45 as a reference. The MolDock SE (Simplex Evolution) algorithm is based on differential evolution and was used with the following parameters: a total of 10 runs with a maximum of 1,500 iterations using a population of 50 individuals, 2,000 minimization steps for each flexible residue and 2,000 steps of global minimization per run. The MolDock Score (GRID) and PLANTS Score (GRID) scoring functions were used to calculate the docking energy values. The grid resolution was set at 0.3 Å and the search sphere radius at 15 Å. For the analysis of the ligand energy, electrostatic interactions, hydrogen bonds and sp²-sp² torsions were evaluated.

AutoDock Vina (Vina)
We used AutoDock Vina 47,48 through the graphical interface of the PyRx virtual screening tool, 49,50 maintaining the default parameters of the software. This program combines a stochastic global search with local optimization and an empirical scoring function. The protein and ligand files were converted to pdbqt format, and the grid box was positioned on the active site region. AutoDock Vina generated 10 conformations for each ligand, which were used for analysis. The binding affinity scoring function (kcal mol⁻¹) corresponds to the sum of intermolecular and intramolecular contributions, and its potentials combine knowledge-based and empirical scoring terms.
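As an illustration of how the generated conformations can be compared, the sketch below reads the binding affinities from a Vina output file, where each pose carries a `REMARK VINA RESULT:` line whose first number is the affinity in kcal/mol. The helper names and the example values are hypothetical, and the study itself used the PyRx interface rather than a script.

```python
def parse_vina_affinities(pdbqt_text):
    """Return the binding affinity (kcal/mol) of each pose, in file order."""
    affinities = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Fields after the tag: affinity, RMSD lower bound, RMSD upper bound.
            affinities.append(float(line.split(":", 1)[1].split()[0]))
    return affinities


def best_pose(pdbqt_text):
    """Return (1-based pose number, affinity) of the most favorable pose."""
    affinities = parse_vina_affinities(pdbqt_text)
    if not affinities:
        raise ValueError("no Vina result lines found")
    index = min(range(len(affinities)), key=affinities.__getitem__)
    return index + 1, affinities[index]


# Hypothetical two-pose output, trimmed to the lines that matter here.
example = """MODEL 1
REMARK VINA RESULT:      -7.8      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:      -7.1      1.932      3.540
ENDMDL
"""
print(best_pose(example))
```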

Consensus docking
A consensus analysis of three different scoring functions was performed to decrease the number of false positives.
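The text does not give the consensus formula, so the following is only one plausible sketch of such an analysis, under stated assumptions: each scoring function's energies are divided by that function's best (most negative) value, which puts them on a common 0 to 1 scale, and the scaled values are averaged per compound. The function names, compound labels and scores are hypothetical.

```python
def normalize(scores):
    """Scale energies so the most negative (best) score maps to 1.0."""
    best = min(scores.values())
    if best >= 0:
        raise ValueError("expected at least one negative (favorable) score")
    # Unfavorable (positive) energies are clipped to 0.
    return {name: max(value / best, 0.0) for name, value in scores.items()}


def consensus_score(tables):
    """Average the normalized scores of each compound over all scoring functions."""
    compounds = set(tables[0])
    if any(set(table) != compounds for table in tables):
        raise ValueError("every scoring function must score the same compounds")
    normalized = [normalize(table) for table in tables]
    return {c: sum(t[c] for t in normalized) / len(normalized) for c in compounds}


# Hypothetical energies from three scoring functions for three compounds.
moldock = {"A": -120.0, "B": -90.0, "C": -60.0}
plants = {"A": -80.0, "B": -100.0, "C": -50.0}
vina = {"A": -8.0, "B": -6.0, "C": -4.0}

result = consensus_score([moldock, plants, vina])
ranking = sorted(result, key=result.get, reverse=True)
print(ranking)
```

A compound that ranks well under only one scoring function is pulled down by the other two, which is how a consensus reduces false positives.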

Prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties
ADME parameters were calculated using the SwissADME open-access web tool, 51,52 which offers a set of rapid predictive models for the assessment of physicochemical, pharmacokinetic and pharmacological properties. The toxicity prediction was performed in the OSIRIS Property Explorer, 53,54 based on the following parameters: mutagenicity, tumorigenicity, reproductive effects and irritant effects. For absorption, the factors included membrane permeability, intestinal absorption and P-glycoprotein (P-gp) substrate or inhibitor status. Thus, we selected compounds with no more than two violations of Lipinski's rule and a consensus logP not greater than 4.15. In addition, the compounds were required not to be substrates of the permeability glycoprotein (P-gp). Distribution was evaluated by factors that included blood-brain barrier permeability (logBB) and central nervous system (CNS) permeability. Metabolism was predicted based on the CYP substrate or inhibition models (CYP1A2, CYP2C19, CYP2C9, CYP2D6 and CYP3A4).
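The absorption criteria stated above (at most two Lipinski violations, consensus logP not greater than 4.15, not a P-gp substrate) amount to a simple filter. The sketch below assumes the predicted values have already been exported from the web tool. The record layout, field names and example compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdmeProfile:
    name: str
    lipinski_violations: int
    consensus_logp: float
    pgp_substrate: bool


def passes_absorption_filter(profile, max_violations=2, max_logp=4.15):
    """Apply the three absorption criteria given in the text."""
    return (
        profile.lipinski_violations <= max_violations
        and profile.consensus_logp <= max_logp
        and not profile.pgp_substrate
    )


# Hypothetical predictions for three compounds.
candidates = [
    AdmeProfile("cpd_1", 0, 3.20, False),
    AdmeProfile("cpd_2", 1, 4.60, False),  # fails the logP limit
    AdmeProfile("cpd_3", 0, 3.90, True),   # fails the P-gp criterion
]
selected = [c.name for c in candidates if passes_absorption_filter(c)]
print(selected)
```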

Molecular dynamics simulations
56,57 The protein and ligand topologies were also prepared using the GROMOS96 54a7 force field. The MD simulation was performed using the simple point charge (SPC) water model in a cubic box. 58 The system was neutralized by adding ions (Cl⁻ and Na⁺) and minimized to remove bad contacts between the molecules of the complex and the solvent. The system was then equilibrated at 300 K for 100 ps using the V-rescale algorithm in the NVT ensemble (constant number of particles, volume, and temperature), followed by equilibration at 1 atm of pressure for 100 ps using the Parrinello-Rahman algorithm in the NPT ensemble (constant number of particles, pressure, and temperature). MD simulations were performed in 5,000,000 steps, for a total of 50 ns. To determine the flexibility of the structure and whether the complex is stable and close to the experimental structure, RMSD values of all Cα atoms were calculated relative to the starting structures. RMSF values were also analyzed to understand the roles played by residues near the receptor binding site. The RMSD and RMSF graphs were generated in Grace software 59 and the protein and ligands were visualized in UCSF Chimera. 60
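To make the RMSF analysis concrete, the sketch below computes the per-atom root-mean-square fluctuation as the root of the mean squared deviation of each atom from its time-averaged position. It is an illustration only, not the simulation package's own tool: it assumes the frames have already been superimposed on a reference structure, and the function name and toy trajectory are hypothetical.

```python
import math


def rmsf(frames):
    """Per-atom RMSF from a list of frames, each a list of (x, y, z) tuples."""
    if not frames:
        raise ValueError("at least one frame is required")
    n_frames = len(frames)
    n_atoms = len(frames[0])
    fluctuations = []
    for atom in range(n_atoms):
        # Time-averaged position of this atom.
        mean = [
            sum(frame[atom][axis] for frame in frames) / n_frames
            for axis in range(3)
        ]
        # Mean squared deviation from the average position.
        msd = sum(
            sum((frame[atom][axis] - mean[axis]) ** 2 for axis in range(3))
            for frame in frames
        ) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations


# Toy trajectory: atom 0 is rigid, atom 1 oscillates by ±1 Å along x.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(trajectory))
```

Residues with high values in such a profile are the flexible regions, which is why the RMSF of residues near the binding site is informative.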

Free energy calculations
The molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) approach was used to calculate the binding free energy of the protein-ligand complex in the study of the molecular behavior of the sulfotransferase enzyme and its respective ligands. The GROMACS g_mmpbsa module 61,62 was applied to estimate the binding free energy of the selected complex using the trajectory files obtained in the molecular dynamics simulation. The GROMACS MM-PBSA calculation consisted of three steps. First, the potential energy in vacuum was calculated; then, the polar solvation energy and, finally, the nonpolar solvation energy were estimated. The nonpolar solvation energy was calculated using the solvent accessible surface area (SASA) model. The required input files and solvation energy values were then selected to evaluate the following energetic components: van der Waals energy, electrostatic energy, polar solvation energy, nonpolar solvation energy, and binding free energy.
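The energetic components listed above combine in the usual MM/PBSA way. The relations below are the standard textbook form, written here as a sketch. The symbols γ and b (the fitting constants of the SASA model) are not given in the text and are shown only symbolically, and the entropy term is omitted because it does not appear among the listed components.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - \left( G_{\mathrm{protein}} + G_{\mathrm{ligand}} \right) \\
G_{x} &= \langle E_{\mathrm{MM}} \rangle + \langle G_{\mathrm{solv}} \rangle,
  \qquad x \in \{\mathrm{complex},\ \mathrm{protein},\ \mathrm{ligand}\} \\
E_{\mathrm{MM}} &= E_{\mathrm{vdW}} + E_{\mathrm{elec}} \quad \text{(nonbonded terms; potential energy in vacuum)} \\
G_{\mathrm{solv}} &= G_{\mathrm{polar}} + G_{\mathrm{nonpolar}},
  \qquad G_{\mathrm{nonpolar}} = \gamma \,\mathrm{SASA} + b \\
\Delta G_{\mathrm{bind}} &\approx \Delta E_{\mathrm{vdW}} + \Delta E_{\mathrm{elec}}
  + \Delta G_{\mathrm{polar}} + \Delta G_{\mathrm{nonpolar}}
\end{align}
```

The angle brackets denote averages over the frames of the MD trajectory, so a more negative binding free energy indicates a more favorable complex.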

Parasites and cells
Epimastigote forms of T. cruzi, strain Y, were cultured in liver infusion tryptose (LIT) medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin (10,000 IU per 10 mg) and kept at 28 °C in a biochemical oxygen demand (BOD) incubator. The parasites used in the experiments were taken from cultures in the exponential growth phase, determined using a 10-day growth curve.
Trypomastigote forms were obtained by infection of LLC-MK2 cells. The cells (2 × 10⁵) were cultured in DMEM medium supplemented with 10% FBS and 1% penicillin-streptomycin (10,000 IU per 10 mg) and maintained at 37 °C and 5% CO₂. After reaching confluence, they were infected with a suspension of epimastigotes (1 × 10⁷). The infected cells were kept in DMEM medium with 2% FBS at 37 °C and 5% CO₂ for six days. Finally, the trypomastigote forms were collected from the supernatant, centrifuged and made available for testing. 66

Figure 1. Receiver operating characteristic (ROC) curves generated for the random forest (RF) models. (a) Test and (b) cross-validation for the enzyme cruzain and (c) test and (d) cross-validation for TR.

Figure 2. 3D and 2D interactions of neolignans 33, 43 and 47 with the cruzain enzyme. Hydrogen bonds are highlighted in green, hydrophobic interactions are highlighted in pink, and electrostatic interactions are highlighted in red.

Figure 3. 3D and 2D interactions of neolignans 33, 43, 46 and 47 with the TR enzyme. Hydrogen bonds are highlighted in green, hydrophobic interactions are highlighted in pink, and electrostatic interactions are highlighted in red.

Figure 4. 3D and 2D interactions of neolignans 33 and 43 with the CYP51 enzyme. Hydrogen bonds are highlighted in green, hydrophobic interactions are highlighted in pink, and electrostatic interactions are highlighted in red.

Figure 5. RMSD values for the Cα atoms of the TR enzyme complexed with neolignans and the Protein Data Bank (PDB) ligand.

Figure 6. The RMSD values of the Cα atoms of the neolignans and the PDB ligand.

Figure 7. Root-mean-square fluctuation (RMSF) for the Cα atoms of the TR enzyme complexed with the neolignans and the PDB ligand.

Figure 9. Outline of all study procedures.

Figure 10. Structures of the compounds isolated from the roots of Krameria tomentosa.

Table 1. Summary of parameters corresponding to the results obtained for all models. a Positive predictive value; b negative predictive value; c Matthews correlation coefficient; d receiver operating characteristic. TR: trypanothione reductase.

Table 2. Activity probabilities (pActivity) of the neolignans against cruzain and TR as assessed by the RF model. The compounds considered active in the models are highlighted in bold.

Table 3. Best results for the combined probability (Prob_comb) between the prediction model and molecular docking analysis for potential activity against cruzain. The compounds shown are those considered active (Prob_comb equal to or above 0.50), with their values of binding energy, molecular docking probability (Prob_Dc) and probability of biological activity (Prob_Ac).

Table 4. Best results for the combined probability (Prob_comb) between the prediction model and molecular docking analysis for potential activity against TR. The compounds shown are those considered active (Prob_comb equal to or above 0.50), with their values of binding energy, molecular docking probability (Prob_Dc) and probability of biological activity (Prob_Ac). a Probability of a compound being active in the molecular docking analysis; b probability of biological activity; c combined probability between the ligand-based and structure-based approaches.

Table 5. Best results for the probability of molecular docking (Prob_Dc) for potential activity against CYP51. The compounds shown are those considered active (Prob_Dc equal to or greater than 0.56). a Probability of a compound being active in the molecular docking analysis.

Table 7. Effect on Trypanosoma cruzi and renal cells. IC₅₀: concentration required for 50% inhibition of the parasites.

Table 8. Sets of molecules from the ChEMBL database for each of the cruzain, TR and CYP51 data sets of T. cruzi. TR: trypanothione reductase; IC₅₀: concentration required for 50% inhibition of the parasites.