Molecular modelling and de novo fragment-based design of potential inhibitors of beta-tubulin gene of Necator americanus from natural products

The emergence of drug resistance against the known hookworm drugs namely albendazole and mebendazole and their reduced efficacies necessitate the need for new drugs. Chemically diverse natural products present plausible templates to augment hookworm drug discovery. The present work utilized pharmacoinformatics techniques to predict African natural compounds ZINC95486082, ZINC95486052 and euphohelionon as potential inhibitory molecules of the hookworm Necator americanus β tubulin gene. A library of 3390 compounds was screened against a homology-modelled structure of β tubulin. The docking results obtained from AutoDock Vina was validated with an acceptable area under the curve (AUC) of 0.714 computed from the receiver operating characteristic (ROC) curve. The three selected compounds had favourable binding affinities and were predicted to form no interactions with the resistance-associated mutations Phe167, Glu198 and Phe200. The compounds were predicted as anthelmintics using a Bayesian-based technique and were pharmacologically profiled to be druglike. Further molecular dynamics simulations and MM-PBSA calculations showed the compounds as promising anthelmintic drug leads. Novel critical residues comprising Leu246, Asn247 and Asn256 were also predicted for binding. Euphohelionon was selected as a template for the de novo fragment-based design of five compounds labelled A1, A2, A3, A4 and A5; with four of them having SAscore values below 6, denoting easy synthesis. All the five de novo molecules docked firmly in the binding pocket of the β tubulin with no binding interactions with the three known resistance mutation residues. Binding energies of −8.2, −7.6, −7.3, −7.2 and −6.8 kcal/mol were obtained for A1, A2, A3, A4 and A5, respectively. The identified compounds can serve as treasure troves from which future potent anthelmintics can be designed. The current study strives to assuage the hookworm disease burden, especially making available molecules with the potential to circumvent the chemoresistance.


Introduction
Hookworm infection remains a significant health burden globally. The disease is caused by the hookworm, an intestinal parasitic worm, that infects over 600 million persons especially in resource-limited countries and results in 135,000 deaths annually [1,2]. The hookworm infection belongs to a larger group of diseases called Neglected Tropical Diseases (NTDs) known to cause debilitating effects. Some NTDs are caused by parasitic worms including schistosomes, filarial and guinea worms [3]. Areas largely affected by hookworm include countries in sub-Saharan Africa, South Asia, Latin America, the Caribbean, the Middle East, and North Africa [4]. The human hookworm infection is primarily caused by Ancylostoma duodenale and Necator americanus [5] with the latter accounting for more than 85% of all hookworm infections [5,6]. The infection is known to affect growth as well as limit memory and cognition in children [7].
The β (beta) tubulin gene of the hookworm is critical for intracellular processes, cellular division, overall mobility, entry and migration of the parasite in the host cell. It is a subunit of the microtubule which, plays a crucial role in cell division and maintenance of the cytoskeleton [8]. It binds to two molecules of guanosine-5-triphosphate (GTP), at the positive end of microtubules [9]. Beta tubulin has so far been extensively exploited as a crucial target for anthelmintic and as a target for several other compounds [8]. Treatment using benzimidazoles including albendazole and mebendazole selectively target the betatubulin gene isotype, disrupting the polymerization of microtubules. Benzimidazoles confer their anthelminthic effect on susceptible nematodes by binding to their beta-tubulin gene resulting in the prevention of microtubule polymerization causing the destabilization of the intracellular processes and cellular division within the parasite and an overall immobility effect [10]. Community-based treatment methods normally employ a mass drug administration (MDA) strategy involving a combination regimen of albendazole, Ivermectin (IVM), and praziquantel (PZQ) [11][12][13][14]. This control strategy is, however, considered a high-risk approach in both humans and animals since it rather induces resistance [11]. For instance, there are reported cases of drug resistance in livestock in parts of Mali, Australia, South Africa and Paraguay [11]. Drawing lessons from this, it is therefore recommended a careful use of human anthelminthic in a way that will allow the prolonged use of these drugs and at the same time manage the problem of drug resistance and treatment failures.
Genetic mutations in the beta-tubulin gene are among the factors that have been reported to confer resistance in several parasitic nematodes including local strains of N. americanus [15,16]. These mutations occur at codons 167, 198 and 200 result in amino acid changes [15,17,18]. The changes lead to the substitution of phenylalanine with tyrosine at codons 167 and 200, and glutamate with alanine at amino acid position 198 [17,19,20]. Therefore, the binding and stabilization by alternative small molecule inhibitors with these residues Glu198, Phe167 and Phe200 at the active site of the beta-tubulin molecule is highly undesirable since it can lead to a decrease in binding affinity [21]. There is the need for molecules with different modes of binding.
Natural products (NPs) are structurally and chemically diverse compared to synthetic libraries [22]. Studies have shown that more than 50% of NPs pass the Lipinski rule of five for drug-likeness [23]. This makes natural products more absorbable than their synthetic counterparts do. Recent studies have reported a group of Dichapetalin compounds notably Dichapetalin A with promising anthelminthic activity against N. americanus [24].
In silico strategies for drug discovery presents an advantage in terms of time and cost for identifying novel chemotherapeutic agents by exploring a compendium of small compounds retrieved from freely available public databases. These methods have been used to identify some promising therapeutics against diseases [25,26]. Some of these databases include the African natural product database (AfroDB) and the North African Natural Product database (NANPDB) which contains a diverse and highly potent chemical landscape of natural compounds that could be explored for potential anthelminthic leads [27,28].
We sought to identify potential novel anthelminthic leads by performing virtual screening of natural products of African origin against a modelled 3D structure of beta-tubulin receptor of N. americanus. Also, we aimed at generating novel molecules using de novo fragment-based drug design. We characterized the mechanisms of binding for the resistanceassociated mutations within the active site. In addition, pharmacological profiling, and relevant biological activity predictions were performed to determine the therapeutic utility of the potential lead molecules.

Methodology
A series of computational applications and techniques were employed in this study (Supplementary Table 1).

Template selection and homology modelling of N. americanus beta-tubulin gene
A search in Protein Data Bank (PDB; http://www.rcsb.org/) revealed no solved tertiary structure of N. americanus beta tubulins, hence the primary sequence of the protein (Gene ID: NECAME_11014, Accession number: W2T758) was retrieved from UniProt [29]. For template identification, Aguayo-Ortiz et al. employed the Iterative Threading ASSEmbly Refinement (I-TASSER) software tool to predict a suitable template based on a binding site containing residues associated with mutations [30]. Using a similar approach, our protein sequence comprising of 449 amino acids was submitted for template and binding site identification via the Iterative Threading ASSEmbly Refinement (I-TASSER) server [31], which predicted four plausible templates. The D chain of the subunit of the multimeric structure of tubulin tyrosine ligase (T2R-TTL) [PDB ID: 5C8Y], was selected for further studies due to the presence of amino acid residues associated with resistance as well as a high sequence identity to the template. The crystallographic structure of T2R-TTL (PDB ID: 5C8Y, resolution: 2.59 Å) was downloaded from PDB and used as a template in homology modelling of the beta-tubulin receptor using MODELLER v 9.17 [32]. The align2d function in MODELLER v9.17 was used to align the sequence of the target with the template and five candidate 3-D models generated using MODELLER v 9.17. The best model was then chosen based on the lowest value of the Discrete Optimized Potential Energy (DOPE) and high GA341 scores [32]. The selected best model and the template were structurally aligned using SuperPose [33].

Model refinement and assessment
The selected best model was refined to fix steric clashes and bumps using the WHAT IF server [34]. This is due to models having undesired bond lengths, bond angles, torsion angles and contacts. The refined structure was then energy minimized using GROMOS43B1 force field in Swiss-PdbViewer v 4.10 [35] to correct local bond and angle geometry and to relax the close contacts in the geometric chain. The WHAT IF program implements WHAT_CHECK to ascertain and fix steric clashes based on the overlap of two non-bonding atoms of distance cutoff set at 0.4 Å. The minimized model was visualized with PyMOL v 1.74 [36] and further optimized using molecular dynamics simulations. This was performed to evaluate the overall stability, folding, and obtain insight into the conformational changes as well as the dynamics of the refined model. The molecular dynamics (MD) simulation of the structure was performed with the Linux version of GROMACS v 5.1.4 [37] software package by utilizing the GROMOS 96_43a1 force field and the simple point charge (SPC) water model by passing the "-water spce" command. The modelled structure was first immersed in a periodic water box of cubic shape (1 nm thick). After solvating the receptor, the net charge on the protein was +8e. Genion command in GROMACS was used to add eight chloride (Cl-) counter ions to neutralize the net charge on the protein. Electrostatic energy was computed via the particle mesh Ewald method with a computational load of 0.19. The cutoff distance for the calculation of the Coulomb and van der Waals interaction was 1.0 Å with the cutoff scheme being Verlet for minimization of 50000 steps. The system was subjected to a two-step ensemble process (NVT and NPT) at a temperature of 300 K and pressure of 1 bar (P) for 2 ps. Linear Constraint Solver (LINCS) constraints were performed for all bonds, with position restraint on the protein and allowing only the water molecules to move to equilibrate with the protein structure. The final production run was performed for the minimization of 5 ns under the same equilibration conditions of 300 K and 1 bar. The results were analyzed using GROMACS v 5.1.4 and GRACE v 5.1.25 [38] which uses the command xmgrace in a Linux terminal. The last frame of the minimized protein in gro format was saved as a pdb format using Visual Molecular Dynamics (VMD) v1.9.3 [39]. The minimized model was then validated by generating a Ramachandran plot [40] using the PROCHECK v 3.5.4 [41]. Other programs such as ERRAT, VERIFY3D and Qmean [42,43] were also used to corroborate the PROCHECK results.

Prediction and analysis of binding site
The potential binding site of the beta-tubulin model containing the amino acid residues of interest was predicted using the Computed Atlas of Surface Topography of proteins (CASTp) [44] and further visualized in PyMOL [36].

Protein preparation for docking
AutoDockTools v 4.2.6 [45] was used in the preparation of the protein model which involved the assignment of Gasteiger partial charges [46]. All existing water or solvent molecules were removed to eliminate the influence of solvent interactions in the proteinligand docking. The receptor file saved in Protein Data Bank partial charge and atom type (pdbqt) format was used as an input receptor file for AutoDock Vina.

Virtual screening
AutoDock Vina [54] tool in PyRx v 0.8 [55] was used to virtually screen a library of 3390 compounds against the binding pocket of the predicted model. The grid box with dimensions (22.5 Å × 22.5 Å x 22.5 Å) and centre (−18.35 Å, −8.23 Å, −22.48 Å) was set around the predicted active site residues of the receptor. A default exhaustiveness of 8 was used. After completion of the virtual screening process, the top hits, with the lowest binding energies (kcal/mol) relative to the known inhibitors, were selected for further visualization in PyMOL [36]. The top 30 hits with reasonably good binding energies and binding poses were selected for further evaluation.

Docking protocol validation
Five molecules comprising albendazole and mebendazole, as well as three potent dichlorosubstituted benzoxazole-triazolo-thione derivatives with PubChem CIDs [52] 53327690, 723308 and 53327692 were used as actives in the validation of docking protocol. Albendazole and mebendazole are broad-spectrum anthelmintic drugs targeting beta-tubulin [48,50]. Moreover, the choice of including the three dichloro-substituted benzoxazoletriazolo-thione derivatives was based on their reported efficacious inhibitory activities against helminth beta tubulins [51]. The SMILES of the five molecules were used to obtain a dataset of decoys via the Directory of Useful Decoys-Enhanced (DUD-E) [56]. Fifty decoys were obtained for each ligand, then each set of decoys has similar physicochemical properties to the parent ligand but is dissimilar in topology [57]. Furthermore, the dataset of 255 compounds comprising of 250 decoys and 5 ligands were screened against the predicted active site of the modelled beta tubulin receptor using AutoDock Vina. The docking result containing the respective binding energies of the ligands and their decoys was used to compute the Area Under the Curve (AUC) of the Receiver Operating Curve (ROC) via easyROC v 1.3.1 [58]. The variables were set to "nonparametric" as the method for curve fitting, "DeLong (1988) [+]"as the method for standard error estimation and confidence interval, and a "Type I" error of 0.05.

Protein-ligand interaction profiling
Ligplot+ [59] was used to study the 2D protein-ligand interaction which includes the hydrogen bonding and the hydrophobic contacts. The best poses of the hits were saved in a '.pdb' file format. The protein-ligand complex for these poses was visualized using PyMOL. The complexes were then loaded into the Ligplot+ workspace to generate the 2D schematic intermolecular interactions between the protein and the ligands.

Exploring the anthelmintic activity of the hits
Prediction of Activity Spectra for Substances (PASS) [60] which a is Bayesian-based machine learning technique was used to predict the anthelmintic activity of the top 30 hits. Anthelmintics are drugs for the treatment of helminth infections, including hookworm infection. A DrugBank and literature search for anthelmintic activity was also done to support the PASS predictions based on the structural information of the hits.

Drug likeness and pharmacological profiling
The SMILES of the hits were used to obtain the pharmacological profiles and physicochemical properties of the PASS predicted anthelmintic hits via SwissADME [61]. Parameters considered include molecular weight, number of hydrogen bond acceptors and donors for evaluating drug-likeness and gastrointestinal absorption, P-gp substrates, and cytochrome P450 inhibition for evaluating the pharmacokinetics.

Toxicity prediction analysis
The toxicity profiles of the PASS anthelmintic predicted compounds were evaluated and analyzed using the OSIRIS property explorer in DataWarrior v 4.5.2 [62]. OSIRIS explorer uses a precomputed set of known toxic compounds and fragments to predict relevant properties comprising tumorigenicity, mutagenicity reproductive effect and irritancy [62]. ADVERPred [63] was also used to predict the nephrotoxicity and hepatotoxicity of the compounds.

Molecular dynamics of protein-ligand complexes and MM-PBSA analyses
The MD simulations of the protein-ligand complex were performed in GROMACS v 2018 [64]. The protein topology of the modelled beta-tubulin was generated using the GROMOS 96_43a1 force field whereas the ligand topologies were generated via PRODRG [65], with defined settings (Chirality: Yes, Charges: Full, and EM: No). A protein-ligand complex was then constructed. The complex was solvated with SPC water in a cubic box of size 1.0 nm and neutralized with Na and Cl ions. Energy minimization of the complex was conducted for 50,000 steps using the steepest descent algorithm. The ligands were restrained before the NVT and NPT ensemble simulations. Equilibration simulation was run for 100 ps for each ensemble. A production MD of 100 ns simulation was performed on the complexes. The g_mmpbsa [66] was used to compute the free binding energies of the complexes over the 100 ns simulation using the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method. R programming package was used to generate graphs for the MM-PBSA simulation.

De novo design of inhibitors
A potential lead complex was used as input for the de novo design of novel ligands via e-LEA3D [67]. A binding site radius of 10 Å and a weight in a final score of 1 were selected in setting up the docking function. The coordinates of the active site as used in molecular docking studies was considered. The maximum number of conformers, number of generations and population size were set at 1, 30 and 30, respectively.

Template identification, homology modelling of proteins, molecular dynamic simulations and validation
Aguayo-Ortiz et al. used the crystal structure of Ovis aries beta-tubulin as a template to generate the homology model of the beta-tubulin of the nematode Trichinella spiralis and to predict a binding site that contained conserved residues [30]. Similarly, using I-TASSER [31,68], the D chain of the crystal structure of tubulin tyrosine ligase (T2R-TTL) with PDB ID 5C8Y was chosen as the best template for homology modelling based on the presence of conserved amino acid residues Phe167, Glu198 and Phe200, within the binding site. A sequence alignment with the template also showed that the beta-tubulin is homologous to the subunit of the T2R-TTL with 88% sequence identity and 94.3% sequence similarity (Supplementary Figure 2). In addition, the results of sequence alignment revealed that the predicted active site contained residues Phe167 and Glu198. Model 5 was selected as the best structural model of the beta-tubulin of N. americanus (UniProt ID W2T75) based on the lowest discrete optimized potential energy (DOPE) score of −53055.30859 obtained ( Table 1). The DOPE score and GA341 are methods used for evaluating the accuracy and reliability of modelled protein structures. DOPE is an atomic distance-dependent statistical method that is useful in evaluating the energies of generated models [69]. Models with the lowest DOPE score and highest GA341 score are considered reasonably better. The resulting protein model is a monomer, folded into a β domain that consists of 11-stranded β-sheets, and 11 α-helices ( Fig. 2A and B). Structural alignment of the 3D model and template using SuperPose [33] gave a root mean square deviation (RMSD) of 1.32 Å. The RMSD value obtained indicates an acceptable difference between the protein model and the template [33]. In addition, the Modeller objective function (molpdf) was used to measure the sum of all the restrains.
The overall structure of the model is similar to the template protein as expected (Fig.  2B). Additionally, the structural alignment revealed conserved key amino acid residues at positions 167 and 198 but only a residue difference at position 200 where phenylalanine (aromatic non-polar residue) is present in the model structure but a tyrosine (aromatic polar residue) in the template; these two residues differ by only a hydroxyl group. Further, there were binding site residue differences at positions 165 and 166, where two serine residues (polar amino acid) were present in the model while asparagine and threonine (also polar amino acids) occupied those respective positions in the template. In addition, there were amino acid differences at positions 317 and 370, where methionine and valine (hydrophobic amino acid residues) were present in the model and isoleucine (also hydrophobic amino acid) at the respective positions in the template.

Model refinement and assessment
Protein model refinement and energy minimization were performed on our best model using Swiss-PdbViewer [70] and GROMACS [64,71], respectively. Analysis of the GROMACS generated trajectories of the refined protein model indicated that the RMSD increased from the beginning but after a period of 0.5 ns, it remained relatively stable for the rest of the duration of the simulation (Fig. 3). This suggests that the model had very low RMSD for the backbone with less Root Mean Square (RMS) fluctuations and flexibility, and thus indicating that the model was well-conditioned.
The quality of the optimized structure was finally evaluated by generating a Ramachandran plot using PROCHECK [41]. Ramachandran plots highlight the most favoured, allowed, generously allowed and disallowed regions of the modelled protein structure. Ideally, a model of reasonably high quality should have at least 90% residues in the core regions [40]. The Ramachandran plot for the predicted model revealed that 92.3% of residues were within the most favourable region, whilst 7.7% were in the allowed region (Fig. 4), supportive that the predicted model was of reasonably high quality. In addition, the overall quality factor predicted by the ERRAT for the model was 89.327 (Supplementary Figure 3) which corroborates the quality of the protein model structure. ERRAT [72] provides the overall quality factor for non-bonded atomic interactions and the generally accepted value should be greater than 50 for a high-quality model. When the model was further validated using VERIFY 3D [42], 88.77% of the residues were predicted as having an average 3D-1D score greater than 0.2.

Prediction and analysis of binding site
The residues within the binding site of tubulins generally have a high sequence and structural conservation among helminth tubulins which are different from other families of tubulins. The predicted binding pocket included all residues whose mutation are linked to anthelminthic resistance (Supplementary Figure 4) and had a computed area of 401.920 Å 2 and volume of 164.250 Å 3 . Forty-three (43) residues formed the putative binding pocket and these included conserved residues Phe167, Glu198 and Phe200 (Fig. 5, Supplementary  Figure 4).

Virtual screening
Molecular docking is a useful computational technique in informing the selection of lead compounds [73]. After screening an integrated library of 3390 against the active site, the top hits were selected for further analysis based on their binding energies. The docking poses of the top hits were further visualized using PyMOL. Hits that were not docked firmly in the active site pocket were ferreted out. The binding energies and docking poses were used as criteria to select 30 hits for further evaluation. These compounds had lower binding energies than the five known inhibitors. Table 2 shows the docking results of the top 30 hits ranked according to their binding energies. S,5Z,8Z,11Z,13E, 17Z)-15-hydroxy-1-(2,4,6trihydroxyphenyl)-15-methylicosa-5,8,11, 13,17-pentaen-1-one had the lowest binding energy of −8.7 kcal/mol, suggesting a plausible strong molecular interaction. The binding energies of the top 30 hits ranged from −8.7 kcal/mol to −7.7 kcal/mol. Additionally, the screening results of the known hookworm beta-tubulin inhibitors namely albendazole and mebendazole and three other dichloro substituted benzoxazole-triazolo-thione derivatives were included (Table 2). Mebendazole had the lowest binding energy of −7.0 kcal/mol amongst the known inhibitors followed by the compound of PubChem ID 53327692 with a binding energy of −6.4 kcal/mol ( Table 2). The docking results showed the top 30 hits having relatively lower binding energies than the known inhibitors.

Docking protocol validation
The docking protocol of AutoDock Vina [54] was validated using the ROC curve analysis. Five hookworm's beta-tubulin inhibitory ligands comprising albendazole, mebendazole, and PubChem IDs 53327690, 723308 and 53327692 with their respective decoys were screened against the beta-tubulin model to generate the ROC curve. The ROC curve evaluates a docking model's ability to discriminate between actives and decoys [74]. An area under the curve (AUC) value of 1 is considered a perfect classification, and below 0.5 is poor discrimination [75,76]. Accordingly, an AUC value of 0.7-0.8 is considered acceptable and an AUC value of 0.8-0.9 is interpreted as excellent. Moreover, an AUC value of above 0.9 is considered outstanding [75,77]. By screening 5 actives and 250 decoys against hookworm beta-tubulin via AutoDock Vina, an AUC value of 0.714 was computed from the ROC curve (Fig. 6). The computed AUC value is considered acceptable [75].
To support the ROC evaluation of the AutoDock Vina, previously reported studies had employed AutoDock Vina to successfully screen compounds against helminth beta-tubulin receptors [51,78] The predicted compounds were experimented and confirmed as potent helminth inhibitors, suggesting that AutoDock Vina is an effective tool for deriving potential inhibitors for beta tubulins of N. americanus.

The anthelmintic activities of the hits
The anthelmintic related biological activity was predicted and reinforced using structural similarity evaluation with known anthelmintics. The main goal was to predict scaffolds that could be optimized for the development of anthelmintics inhibitors. PASS [60] was used to predict the pharmacological activities of the top 30 hits. The prediction was based on the structural-activity relationship between the compound of interest and a training set of over 26000 compounds with known biological activities [60]. The relevant biological activity considered in this study was the propensity of the hits to be anthelminthic. PASS was used in previous studies to predict the anthelmintic activity of novel compounds [82,83]. Experimental results from these studies mostly corroborated the PASS predictions, indicating that PASS is a plausible Bayesian-based technique for predicting the anthelmintic activity of a compound. For a given compound, PASS predicts Probable activity (P a ) and Probable inactivity (P i ), with both ranging between 0.000 and 1.000 for a predicted activity. A total of 19 compounds out of the 30 hits were predicted to be anthelmintic, with P a > P i ( Table 4) which denote their likely anthelmintic potential [83]. The compound 6,10-dimethyl-9-methylene-2-(4-methyl-1,2-dioxabicyclo[2.2.2]oct-5en-l-yl)undec-5-ene obtained from NANPDB had the highest P a of 0.759 and P i of 0.003 and with such P a > P i , there is a high propensity for it to be a potent anthelmintic warranting further pharmacological evaluation [82,83]. Furthermore, a Drugbank similarity search [84] of 6,10-dimethyl-9-methylene-2-(4-methyl-1,2-dioxabicyclo[2.2.2]oct-5-en-l-yl) undec-5-ene was done to support the PASS predictions. The hit had a DrugBank similarity score of 0.511 to terpenin-4-ol, an isomer of terpineol. Terpine-4-ol has been reported to be an anthelmintic with potent activity against the eggs and larvae of helminth Haemonchus contortus [85,86]. The DrugBank [84] performs structure similar searches by using locally developed SMILES string comparison method to identify related structures. The similarity scoring ranges from 0-1, with 1 being the exact compound.
ZINC15120680 had the second highest P a of 0.722 and P i of 0.003. It also had a DrugBank similarity score of 0.599, 0.572 and 0.571 to Kaempherol, Quercetin and Genistein, respectively. Kaempherol, quercetin and genistein are naturally occurring products with multiple pharmacological actions, including anthelmintic activity [87]. Kaempherol and quercetin for instance were found to contribute to the anthelmintic property of 10 East African plants [88]. Moreover, these two natural products were reported to cause paralysis and death of known helminths Taenia solium (tape worm) and Pheritima posthuma (earthworm) [89]. Phytochemical analysis of the root tuber peels revealed rich genistein contents [87,90]. Genistein appears to have an eclectic mechanism of action against helminth parasites [91]. For instance, genistein was found to alter the carbohydrate metabolism of cestode parasites [87,92]. It also disrupts the spines and the tegumental surface of trematodes, leading to total paralysis or death. Moreover, genistein and its derivatives interfere with the microtriches, vesiculations and nuclear pyknosis of cestodes E. multicularis and E. granulosus [87,93]. ZINC95486052 and ZINC95486082 were structurally similar to naringenin with DrugBank similarity scores of 0.768 and 0.818, respectively. Naringenin was reported to have a synergistic effect on the synthesis and shedding of cuticle larval stages of helminth H. contortus [94]. A substructure probe via PubChem [52] revealed a common 2-phenylchromen-4-one substructure in ZINC15120680 and its similar DrugBank compounds (kaempherol, quercetin and genistein). ZINC95486052, ZINC95486082, ZINC28462577, ZINC95485922, ZINC14760755, robustaflavone, tetrahydrorobustaflavone and ZINC95485928 all had a 2-phenylchromen-4-one substructure. This substructure is native to the flavonoid family of natural products. Studies have also attributed the high anthelmintic property in flavonoids to the presence of reactive 2-phenylchromen-4-one structure [89].
ZINC95485928 was predicted to possess anthelmintic activity with P a of 0.687 and P i of 0.004. It had a DrugBank structural similarity score of 0.500 to Diosmetin. Diosmetin extracts from plant species D. insularis was found to be efficacious against gastrointestinal nematodes in goats [95]. The authors further suggested that flavones, a family of natural products could probably be responsible for anthelmintic activities [95]. Furthermore, the NANPDB hit Euphohelionon, which is a constituent of the Euphorbia helioscopia L. plant [96] was predicted to have an anthelminthic activity with P a of 0.378 and P i of 0.02. A study of the methanolic extracts of E. helioscopia, the parent plant of euphohelionon, revealed significant larvicidal and ovicidal effects against H. contortus worms, both in vitro and in vivo [97]. Table 4 summarizes the PASS anthelmintic predictions of the 19 hit compounds from AfroDB [28] and NANPDB [27].

Drug likeness and pharmacological profiling
Lipinski's Rule of Five (RO5) [98] was used to evaluate the drug-likeness of the 19 predicted anthelmintic hits. The rule has been applied in analyzing African natural product libraries [28,99,100]. For a molecule to be qualified as "drug-like" and orally active, it must not violate more than one of the following criteria: a molecular weight ≤500 Da; logarithm of n-octanol/water partition ≤5, which determines the lipophilicity; hydrogen bond donors ≤5; and hydrogen bond acceptors ≤10. Table 5a provides the physicochemical properties and evaluation of the drug-likeness of the 19 predicted anthelmintic hits of which 12 did not violate any of the RO5 criteria. Furthermore, only three hits violated one of the RO5s. In total, 15 out of 19 hits were predicted to be druglike via SwissADME [61]. The remaining four hits with the least drug-likeness were tetrahydrorobustaflavone, robustaflavone, ZINC13480348 and euphohelionon.
Pharmacological profiling studies are important components of the drug development pipeline. The pharmacokinetic properties considered for this study included human intestinal absorption, Permeability Glycoprotein (P-gp) binding, blood-brain barrier and cytochrome P450 (CYPs450) inhibition [101]. A total of 10 hits were predicted to have high intestinal absorption. Permeability glycoprotein (P-gp) is an ATP-dependent efflux pump that moves substances from the intracellular space to the extracellular space [102]. P-gp has a broad substrate specificity [103] and the excretion of drugs back into the gut lumen by P-gp pharmacokinetically reduces the potency of some drugs which are P-gp substrates [104]. Moreover, studies have implicated P-gp as a contributing factor in helminth drug resistance [105]. Only three of the hits were predicted to be P-gp substrates. Another important pharmacokinetic parameter is the Cytochromes P450 (CYPs) inhibition. Cytochromes P450 (CYPs), which are major enzymes involved in drug metabolism belong to a protein super family containing heme [106]. Inhibition of CYPs may lead to possible drug-drug interactions [107]. The CYP1A2, CYP2C19, CYP2C9, CYP2D6 and CYP3A4 inhibition of the compounds were predicted via SwissADME (Table 5b). Toxicity predictions were also performed on the compounds due to their potential contribution to drug attrition [108]. The mutagenicity, tumorigenicity, irritancy and reproductive effect were predicted using Osiris Property Explorer (Table 5c). Furthermore, the nephrotoxicity and hepatotoxicity were predicted via ADVERPred, which utilized Structure−Activity Relationships (SARS) models built using the PASS [63]. The probability of activity (Pa) and probability of inactivity (Pi) for these parameters for the 19 compounds have been provided in Table 5c.

Molecular dynamics simulation of selected compounds
Three African natural compounds namely ZINC95486082, ZINC95486052, and euphohelionon were selected for downstream MD simulation. These compounds had low binding energies, relatively good pharmacological profiles, predicted anthelmintic propensity and most importantly did not interact with the mutation-associated residues. Additionally, two known inhibitors comprising PubChem ID 53327692 and mebendazole were selected as controls. Both compounds had relatively better binding energies amongst the five inhibitors in this study. A 100 ns MD simulation was performed to expound on the structural stability and conformational changes when situated in diverse and dynamic physiological conditions [109]. The parameters studied included the root mean square deviation (RMSD), the radius of gyration (Rg), and the root mean square fluctuation (RMSF). The RMSD evaluates the deviation of the protein-ligand complex during the simulation from the initial protein backbone atomic coordinates and is used as a plausible measure of a protein's stability [110]. An RMSD plot over simulation time revealed the backbones of the five complexes stabilized after 25 ns (Fig. 8A). The ZINC95486082-betatubulin complex rose from 0 nm to 0.5 nm during the start of the simulation. It then peaked around 23 ns, and later averaged around 0.75 nm over the remaining simulation time. Furthermore, the ZINC95486052-beta-tubulin complex maintained a steady RMSD of approximately 0.64 nm after the initial 20 ns of simulation. Additionally, the RMSD of Euphohelionon-beta-tubulin complex rose steadily from 0 nm and peaked around 1.0 nm, after which it began to decline. It then maintained an average RMSD of approximately 0.77 nm. The mebendazole-beta tubulin complex had the highest deviation of 1.0 nm from its protein backbone. The PubChem CID 53327692-beta tubulin complex also maintained an RMSD of approximately 0.64 nm.
The compactness of the complexes was evaluated using the Rg [111]. The Rg of all five complexes decreased over the first 20 ns. The trend then stabilized over the remaining simulation time (Fig. 8B). ZINC95486082-beta tubulin, ZINC95486052-beta-tubulin, Euphohelionon-beta-tubulin, Mebendazole-beta tubulin and PubChem ID 53327692-betatubulin complexes had average Rg of approximately 2.1 nm, 2.13 nm, 2.12 nm, 2.14 nm, and 2.13 nm, respectively. A stably folded and compact protein maintains a reasonable steady radius of gyration throughout the simulation [112]. As indicated in Fig. 7B, the trends appeared steady indicating the protein remained stably folded after binding.
Finally, the stability of the individual residues was assessed using their RMSF plots [109]. All the complexes possessed similar residue fluctuations within the same regions, with little deviation from the unbound protein core (Fig. 8C). Sizeable residue fluctuations were observed in the regions from residue index 420-449 which might be the region with the highest flexibility.

Evaluation of potential leads using the MM-PBSA calculations
The binding free energies of the complexes were estimated using MM-PBSA calculations. The calculations address some limitations of current scoring functions [113]. Contributing terms to the binding free energy include the electrostatic, polar, non-polar, and van der Waals energies. The binding free energies were computed in terms of average energies and standard deviations. Mebendazole had a relatively low average binding free energy of −150.810 kJ/mol (Table 6), thus affirming its binding energy against hookworm betatubulin. In addition, PubChem ID 53327692 had binding energy of −122.239 kJ/mol. The energy contributions per individual residues to the binding was also evaluated, whereby residues contributing >5.0 kJ/mol or < −5.0 kJ/mol were considered critical [114]. The amino acid residue, Glu198 contributed 34.3977 kJ/mol and 30.2668 kJ/mol to ligand binding of mebendazole and PubChem ID 53327692, respectively (Supplementary Figure  5).
Euphohelionon had the lowest binding free energy of −123.620 kJ/mol amongst the three potential compounds (Table 6). Its binding energy plot over the simulation time showed initial binding energy of 106 kJ/mol. It then averaged within the range of −150 kJ/mol and −90 kJ/mol, further peaking at 50 ns. A final binding energy of −194 kJ/mol was obtained at 100 ns (Fig. 9). The van der Waal, electrostatic and polar energies contributed favourably to the free binding energy (Table 6). An energy decomposition plot revealed Leu246, Leu253 and Asp327 contributed energies beyond the ± 5.0 kJ/mol threshold ( Fig. 10; Supplementary Table 2). The observed high energies corroborate the aforementioned assertion that Leu246 and Leu253 are critical for ligand binding and stabilization. Besides, the resistance mutation associated residues Phe167, Glu198 and Phe200 contributed energies of −0.0220 kJ/mol, 2.4911 kJ/mol and −0.0742 kJ/mol to ligand binding, respectively ( Fig. 10; Supplementary Table 2). This fell short of the threshold required for critical residues, suggesting that perhaps these are not the foremost ligand-binding residues.
ZINC95486052 was predicted by AutoDock Vina to have binding energies of −8.5 kcal/ mol. However, MM-PBSA calculations showed it had high binding free energy of 307.175 kJ/mol. This was predominantly due to high electrostatic and polar energies of 341.594 kJ/mol and 179.040 kJ/mol, respectively. Also, ZINC95486082 had high binding free energy of 336.564 kJ/mol. Molecules with high binding free energies may have limited lead likeness [114], hence a decreased propensity for further exploitation [115]. A binding energy decomposition per residue evaluation revealed Glu198 contributed favourably (>5.0 kJ/mol) to ligand binding of both compounds. Leu256 was observed to contribute favourable binding energies (< −5.0 kJ/mol) in both complexes.

Selection of parent molecule for de novo ligand design
The parent compound was selected based on its binding energy, protein-ligand interactions, predicted anthelmintic activity and free binding energies. After virtual screening an integrated library of 3385 compounds of African origin against the modelled beta-tubulin receptor, 30 hit compounds were selected based on their binding energies and how they are firmly docked in the binding pockets. All the selected hits had lower binding energy than the five known inhibitors. Compounds that form strong biomolecular interactions with resistance-associated mutation residues could lose interactions, leading to loss of affinity [21]. Four compounds namely anchinopeptolide A, euphohelionon, ZINC95486082 and ZINC 95486052 formed no molecular interactions with the resistance mutation associated residues (Table 3). However, anchinopeptolide A violated three rules of the RO5s (Table  5a). More so, it was not predicted as anthelmintic, hence warranted no further downstream exploration. ZINC95486082 and ZINC95486052 had binding of −8.5 kcal/mol and −8.3 kcal/mol, respectively. They were both predicted anthelmintics with Pa>0.3 and Pa>Pi (Table 4). Substructure searches revealed a common 2-phenylchromen-4-one substructure for ZINC95486082 and ZINC95486052. The 2-phenylchromen-4-one substructure was reported to be responsible for the high anthelmintic properties in flavonoids [89]. Both ZINC95486052 and ZINC95486082 were structurally similar to naringenin, a known anthelmintic with similarity scores of 0.768 and 0.818, respectively. However, evaluation of free binding energies of the two compounds via MM-PBSA calculations revealed high binding energies of 307.175 and 336.564 kJ/mol, respectively (Table 6), hence possibly decreasing their likelihood to be a lead compound [114].
Euphohelionon had binding energy of −7.9 kcal/mol and formed one hydrogen bond with critical residue Asn256 and nine hydrophobic contacts. It was predicted as anthelmintic with a Pa of 0.378 and Pi of 0.02. When the Pa>0.3 and Pa>Pi for a particular biological activity, the compound warrants further experimental validation. Furthermore, euphohelionon is sourced from the E. helioscopia a local African plant, which has reported anthelmintic activities. Additionally, euphohelionon had a generally good pharmacological profile (Table  5). Further evaluation of the binding energies via MM-PBSA calculations revealed a good binding free energy of −123.620 kJ/mol (−29.546 kcal/mol). The energy contribution of the individual residues also revealed that three critical residues namely Leu246, Leu253 and Asp327, contributed energies beyond the ± 5.0 kJ/mol to the euphohelionon-beta-tubulin binding. Furthermore, per energy decomposition of the residues revealed it would likely not interact with the known resistance-associated mutation residues. Considering these factors, euphohelionon was selected as a parent compound for de novo design of potential inhibitor molecules of hookworm beta-tubulin. The step-by-step stages in the selection of a parent compound have been illustrated in Fig. 11.

De novo fragment-based design of novel potential anthelmintics
The e-LEA3D server [67] was used to generate a total of 50 new molecules using euphohelionon as a parent compound. Duplicates and error generating molecules were eliminated. A total of 15 molecules with good physicochemical properties were used in molecular docking studies. The molecules with binding energies < −6.0 kcal/mol were selected for downstream analysis. Five novel molecules with identifications A1, A2, A3, A4 and A5 had good binding energies relative to the known inhibitors. Binding energies of −8.2 kcal/mol, −7.6 kcal/mol, −7.3 kcal/mol, −7.2 kcal/mol and −6.8 kcal/mol were obtained for A1, A2, A3, A4, and A5, respectively (Table 7). Only A1 had better binding energy than the parent compound, euphohelionon. Insights into the mechanism of binding revealed similar interactions akin to the euphohelionon-beta-tubulin complex. A1 formed a total of thirteen hydrophobic contacts with binding site residues including Leu246, Lys350 and Asn256. In addition, A2 formed two hydrogen bonds and six hydrophobic contacts (Fig. 12, Table 7). A3 formed nine hydrophobic contacts as well as three hydrogen bonds of varying lengths with Ala314. Furthermore, A4 formed two hydrogen bonds with Asn256 and Asn247. A5 formed a total of 11 hydrophobic contacts. Most importantly, none of the molecules interacted with the resistance mutation related residues ( Table 7).
The novelty, physicochemical and pharmacokinetic profiles of the five compounds were assessed via SwissADME [61]. A search using the SMILES of the five compounds uncovered that all the molecules had no exact structure in the database, asserting their uniqueness. More so, all the molecules complied with R05s [98] except A1 which violated it. A1 violated two R05s comprising MW = 538.57 g/mol and logP = 4.35. Synthetic accessibility score (SAscore) is a useful parameter for estimating the ease of synthesizing a molecule. SAscore ranges from 1 (easy to synthesize) to 10 (difficult to synthesize). Molecules with SAscore <6.0 have lower propensity to synthesize [116,117]. Estimated SAscore values of 4.51, 4.94, 6.04, 3.1, and 3.22 were obtained for A1, A2, A3, A4 and A5, respectively. All the molecules were found to be within the acceptable threshold, except A3. Pharmacokinetic predictions also revealed all the molecules except A3 had high intestinal absorption (Supplementary Table 3). Undesirable pharmacological profiles of a compound can be improved through chemical modifications. Amongst the hits, A5 had no predicted toxicity (Supplementary Table 3). Additionally, anthelmintic activity predictions via PASS revealed A5 had a P A of 0.448 and Pi of 0.028. When Pa>Pi and Pa>0.3, there is a high propensity for it to be a potent anthelmintic warranting further pharmacological evaluation [82,83]. Also, drug off-target activity as a result of drugs having different biological activities other than intended [118,119] cannot be discounted. The two-dimensional structures of the five de novo molecules have been provided (Table 8).

Contribution to the field
Computational techniques have been proven to be a faster and cheaper means of exploring novel drugs. This study supplements current efforts by identifying three African natural compounds with good pharmacological profiles and most importantly predicted to have no interactions with the resistance-associated residues. The three-dimensional structure of the hookworm beta-tubulin was modelled with high confidence. Moreover, the de novo fragment-based design was employed to identify five unique compounds with the potential of circumventing the mutation associated residues. Additionally, a novel mechanism of binding was elucidated which could aid in the design of future potent hookworm betatubulin inhibitors.

Conclusion
This study identified three molecules with anthelminthic potential by screening a library composed of 3385 African natural products against a modelled 3D structure of the betatubulin gene product of Necator americanus. The AutoDock Vina molecular docking protocol was validated using the ROC with an acceptable AUC of 0.714. The three molecules, ZINC95486082, ZINC95486052 and euphohelionon, were predicted to have distinct mechanisms of binding relative to those of known drugs. These compounds did not interact with the benzimidazole resistance-related residues Phe167, Glu198 and Phe200 in N. Americanus beta-tubulin. A sub-structure analysis indicated that ZINC95486082 and ZINC95486052 contained 2-phenylchromen-4-one, a known anthelmintic moiety. Euphohelionon was selected as a parent compound for de novo ligand design because of its predicted anthelmintic properties, molecular interactions, and low binding energy consolidated with MM-PBSA calculations. Also, five novel molecules were generated using the fragment-based drug design and four can be synthesized easily. They also had favourable binding energies when compared to known inhibitors, and exhibited no contacts with any of the resistance mutation residues. These compounds can subsequently be synthesized and experimentally characterized for the development of much-needed novel anthelmintics.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

NP
Natural Product A flow chart for the preparation and selection of ligands for virtual screening. The library consisted of African natural products and five known hookworm beta-tubulin inhibitors. RMSD plot of the molecular dynamics simulation using GROMACS. A plot of RMSD in nanometres (nm) against time in nanoseconds (ns). The RMSD increased from 0 ns to 0.5 ns and levelled off with slight fluctuations towards the end of 1 ns.

Fig. 4.
Ramachandran plot of beta-tubulin model from N. americanus obtained by PROCHECK: 92.3% residues in favourable regions (A, B, L); 7.7% residues in the additional allowed region (a, b, l, p); 0.0% residues in generously allowed regions (-a,-b,-p,-l); and 0% residues in disallowed regions.    Binding energy plot of euphohelionon complex. The graph shows binding energy (kJ/mol) versus time over 100 ns simulation. The schema shows the step-by-step approaches involved in the filtering and selection of compounds. Firstly, 3390 compounds comprising 3385 African natural products and 5 known beta-tubulin inhibitors were screened against the modelled beta-tubulin of the hookworm. Thirty compounds with low binding energies were selected as top hits for downstream analysis. The physicochemical and pharmacological properties, protein-ligand interactions and biological activity predictions of the 30 compounds were assessed in  Table 2 Docking results for the top 30 hits and 5 beta-tubulin inhibitors. The table shows the IDs or common names, IUPAC names and their binding energies (kcal/mol) of the hits. The compounds are arranged in increasing order of binding energy. The lower the binding energy, the better the binding affinity.

Table 5a
Physicochemical  Table 7 A summary of the protein-ligand interactions of the five novel compounds generated with Ligplot+. The table shows the binding energy, the synthetic accessibility score, the interacting residues in the active site, and the respective intermolecular bonds.  Table 8 The names, generated IUPAC names via ChemAxon and two-dimensional chemical structures of the novel potential lead compounds generated using euphohelionon as a template via e-LEA3D [67].
Name Generated IUPAC Name Generated Chemical structure