In Silico Screening of Isocitrate Lyase for Novel Anti-Buruli Ulcer Natural Products Originating from Africa

Buruli ulcer (BU) is caused by Mycobacterium ulcerans and is predominant in both tropical and subtropical regions. The neglected debilitating disease is characterized by chronic necrotizing skin lesions attributed to a mycolactone, which is a macrolide toxin secreted by M. ulcerans. The preferred treatment is surgical excision of the lesions followed by a prolonged combination antibiotic therapy using existing drugs such as rifampicin and streptomycin or clarithromycin. These antibiotics appear not to be adequately potent and efficacious against persistent and late stage ulcers. In addition, emerging drug resistance to treatment poses great challenges. There is a need to identify novel natural product-derived lead compounds, which are potent and efficacious for the treatment of Buruli ulcer. Natural products present a rich diversity of chemical compounds with proven activity against various infectious diseases, and therefore, are considered in this study. This study sought to computationally predict natural product-derived lead compounds with the potential to be developed further into potent drugs with better therapeutic efficacy than the existing anti-buruli ulcer compounds. The three-dimensional (3D) structure of Isocitrate lyase (ICL) of Mycobacterium ulcerans was generated using homology modeling and was further scrutinized with molecular dynamics simulations. A library consisting of 885 compounds retrieved from the AfroDb database was virtually screened against the validated ICL model using AutoDock Vina. AfroDb is a compendium of “drug-like” and structurally diverse 3D structures of natural products originating from different geographical regions in Africa. The molecular docking with the ICL model was validated by computing a Receiver Operating Characteristic (ROC) curve with a reasonably good Area Under the Curve (AUC) value of 0.89375. Twenty hit compounds, which docked firmly within the active site pocket of the ICL receptor, were assessed via in silico bioactivity and pharmacological profiling. The three compounds, which emerged as potential novel leads, comprise ZINC38143792 (Euscaphic acid), ZINC95485880, and ZINC95486305 with reasonable binding energies (high affinity) of −8.6, −8.6, and −8.8 kcal/mol, respectively. Euscaphic acid has been reported to show minimal inhibition against a drug-sensitive strain of M. tuberculosis. The other two leads were both predicted to possess dermatological activity while one was antibacterial. The leads have shown promising results pertaining to efficacy, toxicity, pharmacokinetic, and safety. These leads can be experimentally characterized to assess their anti-mycobacterial activity and their scaffolds may serve as rich skeletons for developing anti-buruli ulcer drugs.


Homology Modeling
The protein sequence of isocitrate lyase from M. ulcerans retrieved from the NCBI database had the GenBank accession number EUA86150.1 and comprised 428 amino acid residues [35]. The query sequence was compared to all sequences of known structures stored in Protein Data Bank [36,37] (PDB) via a Basic Logical Alignment and Search Tool (BLAST) search [38,39], which generated a list of protein structures that were similar to the query sequence. To corroborate the results, another template query using SWISS-MODEL template search interface (https://swissmodel.expasy.org/ interactive) was performed [40]. Both template search platforms gave the same results. The chain A of the structures were considered for the study. Templates considered were isocitrate lyase from Mycobacterium tuberculosis with PDB codes of 5DQL, 1F8I, 1F8M, and 1F61 and percentage sequence identities of 91.12%, 91.10%, 91.10%, and 90.87%, respectively. They were considered for homology modeling because they had higher sequence identity when compared to the other templates. All four templates comprising 5DQL, 1F8I, 1F8M, and 1F61 were solved using X-ray crystallography with resolutions of 1.8, 2.0, 1.8, and 2.2 Å, respectively.
The structural and sequence similarity between the templates (5DQL, 1F8M, 1F61, and 1F8I) were assessed by calculating their multiple sequence alignment using the malign command in Modeller 9v17 [41] and further structural alignments of the four three-dimensional (3D) structures used malign3d command in Modeller 9v17.
The 3D structure of ICL from M. tuberculosis with PDB ID 5DQL was obtained as the most plausible template for homology modeling because it had a slightly higher sequence identity of 91.12% when compared to the others despite having the same resolution of 1.8 Å with 1F8M. Five homology models were generated using Modeller 9v17 and model 5 ( Figure 1) was chosen as the best based on the least Discrete Optimized Potential Energy (DOPE) score of −47291.21875 (Table 1). DOPE is an atomic distance-dependent statistical potential that is used to determine the native states of proteins. Therefore, it gives an account of the finite and spherical shape of the native structures [42][43][44].  Table showing five successfully generated models with Modeller 9v17. The DOPE score is mainly based on probability theory and it provides information on the energy of the protein generated via Modeller 9v17.

Models DOPE Score
Model 1 −47200.15625 Model 2 −47099.78906 Model 3 −47185.48047 Model 4 −47193.96484 Model 5 −47291.21875 ModRefiner [45,46] was then used to refine the protein model before carrying out molecular dynamics simulation in GROMACS [47][48][49]. The refined protein had residue Glu155 in the disallowed region. Loop refinement of the protein was then performed using the Modloop server [50] to move the residue back into allowed regions of the protein.  ModRefiner [45,46] was then used to refine the protein model before carrying out molecular dynamics simulation in GROMACS [47][48][49]. The refined protein had residue Glu155 in the disallowed region. Loop refinement of the protein was then performed using the Modloop server [50] to move the residue back into allowed regions of the protein.

Structure Validation and Quality Prediction
PROSA [51] is a quality measure tool that compares the overall model quality score of the protein to that of experimentally solved protein structures in the PDB database, which was used to validate the protein model and the results were displayed in a plot (Figure 2a). A z-score of −8.0 was obtained, which showed that the modeled protein falls within the range of X-rays solved protein structures. A more negative z-score implies a better protein model [52]. The local model quality of the protein was also generated (Figure 2b). Amino acids residues with more negative energy levels have a high tendency of contributing to the overall quality of the tertiary structure. The protein was also submitted to the ProQ server to predict protein quality based on its LGscore [53] and MaxSub score, which are its quality measures. MaxSub is a quality measure calculated from the largest number of residues that can be found in which all distances between the model and the correct structure are shorter than 3.5 Å [53]. Likewise, the LGscore is a p-value score for the significance of a structural similarity match. The protein had an LGscore of 3.690 and a MaxSub score of 0.296, which falls in the range of a "very good model" per prediction ranges [54]. The stereochemical quality of the refined protein was checked using the Ramachandran plot, which was generated with PROCHECK [55]. The

Structure Validation and Quality Prediction
PROSA [51] is a quality measure tool that compares the overall model quality score of the protein to that of experimentally solved protein structures in the PDB database, which was used to validate the protein model and the results were displayed in a plot (Figure 2a). A z-score of −8.0 was obtained, which showed that the modeled protein falls within the range of X-rays solved protein structures. A more negative z-score implies a better protein model [52]. The local model quality of the protein was also generated (Figure 2b). Amino acids residues with more negative energy levels have a high tendency of contributing to the overall quality of the tertiary structure. The protein was also submitted to the ProQ server to predict protein quality based on its LGscore [53] and MaxSub score, which are its quality measures. MaxSub is a quality measure calculated from the largest number of residues that can be found in which all distances between the model and the correct structure are shorter than 3.5 Å [53]. Likewise, the LGscore is a p-value score for the significance of a structural similarity match. The protein had an LGscore of 3.690 and a MaxSub score of 0.296, which falls in the range of a "very good model" per prediction ranges [54]. The stereochemical quality of the refined protein was checked using the Ramachandran plot, which was generated with PROCHECK [55]. The Ramachandran plots of the modeled protein structure before and after loop refinement were generated. The Ramachandran plot drawn before loop refinement placed 95.5% of residues in allowed regions with 0.3% of them in disallowed region while, after loop refinement, 95.5% of residues were placed in allowed regions with no residues in disallowed regions ( Figure 3). This implies that the loop refinement process was successful as Glu155, which was initially within the disallowed regions, was successfully placed in the allowed regions of the protein. The secondary structure of the protein model is composed of 17 helices and 6 strands. Ramachandran plots of the modeled protein structure before and after loop refinement were generated. The Ramachandran plot drawn before loop refinement placed 95.5% of residues in allowed regions with 0.3% of them in disallowed region while, after loop refinement, 95.5% of residues were placed in allowed regions with no residues in disallowed regions ( Figure 3). This implies that the loop refinement process was successful as Glu155, which was initially within the disallowed regions, was successfully placed in the allowed regions of the protein. The secondary structure of the protein model is composed of 17 helices and 6 strands.

Molecular Dynamics Simulations
GROMACS 5.1.1 [47][48][49] was used to minimize the energy of the protein until it reached stability. The minimization was done using the steepest descent method for 1647 steps and the production run was performed for 1 ns. The overall potential energy observed was −1.9786255e+06 kcal/mol where the maximum force converged to 8.9850818e+02 kcal/mol, which is less than the allowable tolerance of 1000 kcal/mol that was set prior to simulation (Figure 4a). Both the NVT and NPT ensemble were run for 100 ps. An average temperature of 300 K was obtained after the 100 ps equilibration phase, which is shown in the temperature graph (Figure 4b). The pressure graph generated showed that the pressure fluctuated widely over the course of the 100 ps equilibration phase. During equilibration, the average value of the pressure was 1.09 bar (Figure 4c). A plot of density against time was generated at the end of isothermal-isobaric ensemble and the running average density was recorded to be 1018.14 kg/m 3 (Figure 4d). The values obtained for the density variation over 100 ps remained stable over time, which indicates that the system was well equilibrated. The RMSD graph shows a sharp increase in the deviation starting from around 0.2 Å to about 1.9 Å and then stabilizes around 1.6 Å within 1000 ps.

Virtual Screening Library of Natural Products
Molecular docking is most frequently used in SBDD because it has the ability to predict, with a substantial degree of accuracy, the conformation of small-molecules (ligands) within the appropriate target binding site [56]. The AutoDock Vina search space center was set to spatial coordinates of 49.7731, 52.376, and 72.30 Å in the X, Y, and Z coordinate axes. The dimensions of the grid box were

Virtual Screening Library of Natural Products
Molecular docking is most frequently used in SBDD because it has the ability to predict, with a substantial degree of accuracy, the conformation of small-molecules (ligands) within the appropriate target binding site [56]. The AutoDock Vina search space center was set to spatial coordinates of 49.7731, 52.376, and 72.30 Å in the X, Y, and Z coordinate axes. The dimensions of the grid box were set to 59.9038, 66.8920, and 44.779 Å by taking into account the entire search space of the protein molecule in order to perform docking. The ligand protein complexes were visualized in PyMOL to identify ligands, which docked firmly within the active site pocket and also had high negative binding energy values. The top 100 hit compounds were visualized after the virtual screening process. Each of the hundred compounds had their binding energies less than or equal to −8.4 kcal/mol with the lowest binding energy being −10.5 kcal/mol. The more negative the values of the binding energy, the better the predicted binding affinity between the ligand and the target. The binding energy values of the virtual screening results were measured as kcal/mol. After visualization of the pose of hundred ligands within the active site, 20 of the ligands were observed to dock firmly and deeply within the active site. Their LigPlots showed the individual residues interacting with the ligands via hydrogen bonding and hydrophobic interactions ( Table 2 and Table S2).

Protein-Ligand Interactions
Hydrogen bond interactions between the ligand and the protein were studied via the Ligplot of the ligand-protein complexes using Ligplot+ and PyMOL. A total of 20 selected ligands of the 100 top hits were observed to dock properly upon visualization after careful study. Hydrogen bonding and hydrophobic interactions are weak intermolecular forces that play key roles in stabilizing ligands energetically at favorable regions of a protein structure [57]. After visualization of pose, the 20 selected hits compounds that formed hydrogen and hydrophobic bond interactions with the residues of the active site of the receptor are shown in Table 2 and Table S2. Each of the ligands had several hydrophobic interactions with a majority of residues within the active site. In terms of hydrogen bonding, ZINC95486006, ZINC95486007, ZINC38143792, and ZINC95485880 had the highest number of hydrogen bond interactions. ZINC95486006 had four hydrogen bonds, and therefore, interacted with residues Asn75, Ser357, Glu380, and Ala390. Similarly, ZINC95486007 had three hydrogen bonds with residues Glu380, Asn75, and Ala390. ZINC38143792 also formed three hydrogen bonds with Arg379, Glu380, and Ser357. Finally, ZINC95485880 also formed three strong hydrogen bond interactions with Glu380, Arg386, and His393. The bond length of all hydrogen bond interactions were less than 5 Å. Upon careful observation, each of these four ligands formed hydrogen bonding with Glu380. Two other ligands formed hydrogen bond interactions with Glu380. Likewise, Asn75 was also involved in the hydrogen bond interactions of six of the ligands. Additionally, Glu380 and Asn75 could be essential residues in the active site of the protein ( Table 2 and Table S2). ZINC95486305, ZINC95486303, and ZINC95485905 had two hydrogen bonds interactions each with the receptor. ZINC95485882 was the only top ligand that did not have any hydrogen bond interactions with any of the residues despite its good binding energy, and therefore, was ranked to the bottom of the Table (Table 2 and Table S2). Each of the remaining ligands had one hydrogen bond interaction with only one residue in the active site except ZINC95486001, which had two hydrogen bond interactions with the same residue (Asn 75).

. Superimposition and Alignment
After re-docking and superimposition, the predicted docking poses and the experimentally determined poses of the co-crystallized ligands upon alignment shared common interactions with some specific residues in the active site. However, superimposition of the crystallographic ligand and redocking pose was used as a means of validating docking [58]. We superimposed the co-crystal complex and the re-docked ligand complex in order to identify critical overlapping residues. The overall goal still remained unchanged since we were able to reasonably validate AutoDock4 using the re-docking method. When the re-docked pose of Succinic acid of 1F8I was superimposed with that of the co-crystalized ligand, there was overlap of six critical residues with their corresponding hydrogen bond interactions ( Figure 6). These residues include Gly192, His193, Asn313, Ser315, Ser317 and Thr347. This shows that AutoDock4 was able to virtually reproduce a similar pose in the same environment we performed the virtual screening. Likewise, for the predicted pose of pyruvic acid of 1F8M receptor, there was overlap of five critical residues with their corresponding hydrogen and hydrophobic bond interactions. The overlapped residues comprise Trp93, Cys191, His193, Asn313 and Ser315 ( Figure  S1). In addition, for the 5DQL used as a template for the structural modeling, the re-docked pose of 4-hydroxy-2-oxobutanoic acid ligand was superimposed with the co-crystalized ligand. There was an overlap of critical residues comprising Trp93, Cys191, and Thr347 with their corresponding hydrogen and hydrophobic bond interactions (Figure 7). Among the ligands, only glyoxylic acid of 1F8I formed overlapping molecular interactions with two of the critical residues comprising Arg228 and Thr347 upon re-docking and subsequent superimposition ( Figure S2). Upon docking the four ligands to the generated ICL model of M. ulcerans, only pyruvic acid had one overlapping interaction involving Asn313 upon comparing with the co-crystalized structure (PDB ID 1F8M) ( Figure S3). Even though, the other three ligands formed hydrogen and hydrophobic bond interactions with residues within the selected active site, no overlapping interactions with their corresponding residues were observed upon superimposition with their co-crystalized ligands. They, nonetheless, had binding energies below −5 kcal/mol. The binding energies for the ligands comprising 4-hydroxy-2-oxobutanoic acid, glyoxylic acid, succinic acid, and pyruvic acid are −4.0, −4.5, −4.9, and −4.6 kcal/mol, respectively. Furthermore, the validation of the molecular docking was also undertaken by aligning the re-docked ligands with their respective co-crystalized complexes [59]. The RMSD values of the alignment of the re-docked ligands with the co-crystalized complexes of 1F8I, 1F8M, and 5DQL are 1.801 Å, 1.218 Å, and 1.769 Å, respectively. All the RMSD values of the alignments are well below 2.0 Å, which is considered the threshold for good alignment [60].

ROC Curve Analysis
Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) is a plausible metric for evaluating the classification ability of a docking model to distinguish between docked decoys and active ligands. The ROC curve provides a graphical representation of the overall performance of the docking to discriminate amongst the active ligands and the decoys when screened against the ICL receptors [61][62][63]. When AUC of ROC is closer to 1, the higher the ability of the model to discriminate between active ligands and decoys and closer to 0 is an indication of poor classification. AUC of 1 means perfect classification between active ligands and decoys with the system able to distinguish between true and false cases without errors. The value of 0.5 implies poor prediction ability with average random selection, and less than 0.70 is indicative of moderate discrimination [62,64]. The values of the AUC of the ROC curve for the active ligands and the 199 decoys screened separately against the model of ICL of M. ulcerans, 1F8I, 1F8M, and 5DQL are 0.89375, 07625, 0.76938, and 0.73567, respectively (Figure 8 and Figures S4-S6). AUC values from 0.8 to 0.9 are considered to be reasonably good while 0.7 to 0.8 are acceptable [65,66].

ROC Curve Analysis
Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) is a plausible metric for evaluating the classification ability of a docking model to distinguish between docked decoys and active ligands. The ROC curve provides a graphical representation of the overall performance of the docking to discriminate amongst the active ligands and the decoys when screened against the ICL receptors [61][62][63]. When AUC of ROC is closer to 1, the higher the ability of the model to discriminate between active ligands and decoys and closer to 0 is an indication of poor classification. AUC of 1 means perfect classification between active ligands and decoys with the system able to distinguish between true and false cases without errors. The value of 0.5 implies poor prediction ability with average random selection, and less than 0.70 is indicative of moderate discrimination [62,64]. The values of the AUC of the ROC curve for the active ligands and the 199 decoys screened separately against the model of ICL of M. ulcerans, 1F8I, 1F8M, and 5DQL are 0.89375, 07625, 0.76938, and 0.73567, respectively (Figures 8 and S4-S6). AUC values from 0.8 to 0.9 are considered to be reasonably good while 0.7 to 0.8 are acceptable [65,66].

Pharmacological Studies for Discovery of Leads
After virtual screening, the 20 selected hit compounds and 5 known drugs comprising rifampicin, streptomycin, clarithromycin, moxifloxacin, and amikacin were subjected to ADME/Tox studies and physicochemically profiled using Lipinski's rule of five (molecular weight not more than 500 Da, hydrogen bond donor not more than 5, hydrogen bond acceptors not more than 10, and logp value not greater than 5) [67,68]. Upon prediction via SwissADME [69], eight of the top compounds and four known drugs were observed to have violated two or more of Lipinski's rule (Tables 3 and  S3). In addition, 12 out of the 20 top compounds were predicted as either water insoluble or poorly soluble (Tables 3 and S3). Out of the five drugs, rifampicin was predicted to be poorly soluble while

Pharmacological Studies for Discovery of Leads
After virtual screening, the 20 selected hit compounds and 5 known drugs comprising rifampicin, streptomycin, clarithromycin, moxifloxacin, and amikacin were subjected to ADME/Tox studies and physicochemically profiled using Lipinski's rule of five (molecular weight not more than 500 Da, hydrogen bond donor not more than 5, hydrogen bond acceptors not more than 10, and log-p value not greater than 5) [67,68]. Upon prediction via SwissADME [69], eight of the top compounds and four known drugs were observed to have violated two or more of Lipinski's rule (Table 3 and Table  S3). In addition, 12 out of the 20 top compounds were predicted as either water insoluble or poorly soluble (Table 3 and Table S3). Out of the five drugs, rifampicin was predicted to be poorly soluble while amikacin was predicted to be soluble. This property, therefore, implied that the majority of the predicted hit compounds may exhibit poor oral administration (Table 3 and Table S3). It was also observed that five of the predicted top compounds comprising ZINC38143792, ZINC95485880, ZINC95486231, ZINC95485943, and ZINC03941105 complied with all ADMET filtering rules including that of Lipinski's. Among these five compounds, ZINC38143792 and ZINC95485880 were considered as leads because aside complying with all Lipinski rules, these compounds also had satisfactory solubility properties. Furthermore, they were not inhibitors to any of the Cytochrome P450 isoenzymes, which implies sufficient drug elimination properties via metabolic biotransformation, and therefore, should be given more priority (Table 4). ZINC38143792 was also predicted to be a non-Blood Brain Barrier Permeant (BBB) ( Table 4). However, ZINC95485880 failed and we can suggest that these compounds could likely not interfere with the activities of the nervous system by permeating or crossing boundaries of the blood brain barrier. While some compounds violated a rule of Lipinski, they, nonetheless, did very well in other ADMET properties such as solubility, bioavailability score, gastrointestinal absorption, and toxicity. Prominent among them was ZINC95486305, which was then added to the list of probable leads. The essence of toxicity profiling was to determine the cardiac toxicity and mutagenicity using ADMET predictor (Version 8.1). Cardiac toxicity was based on the hERG model, which predicts whether the compound blocks the hERG K+ channel or not. A "yes" indicate a compound has the likelihood to block the channel and a "no" indicate otherwise. From Table 5, ZINC95486182 and ZINC95486142 failed the toxicity tests while all the others passed. None of the compounds were predicted to be mutagenic (Table 5).

Prediction of Lead Compounds
Generally, discovering lead compounds after virtual screening is based on three primary criteria which are binding energy, molecular bond interactions and pharmacological profiling. Lead compounds are the most probable compounds, which have very low binding energies, strong hydrogen and hydrophobic bond interactions as well as reasonably good ADMET properties. Therefore, after virtual screening of the AfroDb database, 20 top compounds were selected as promising candidates for further analysis based on their low binding energies and reasonably good poses within the active site pocket. Out of this number, ZINC95486006 and ZINC95486007 formed four and three hydrogen bonds, respectively, within the active site but both had unfavorable ADMET properties. Irrespective of their strong hydrogen bond interactions, these ligands had very high molecular weights of 666.805 Da and 668.821 Da. In addition, each possessed 12 hydrogen bond acceptors which violated the Lipinski's rule of five. ZINC95486183, ZINC95486184, ZINC95486142, ZINC95486182, and ZINC95486303 also formed hydrogen bond interactions, but their ADMET properties were nonetheless very poor and fell short in molecular weight and log-p value of the Lipinski's rule.
Among the lead molecules, Euscaphic acid was isolated from Hoslundia opposita, which is an aromatic medicinal herb that grows all over Mozambique. Euscaphic acid exhibited a minimum inhibitory concentration of 50 µg mL −1 against a drug-sensitive strain of M. tuberculosis [72]. Since Mycobacterium ulcerans is a close homologue of Mycobacterium tuberculosis, Euscaphic acid can be screened against the ICL of Mycobacterium ulcerans as a possible inhibitor. Additionally, the biological activities predicted by PASS [73,74] relevant to M. ulcerans were dermatological and antibacterial for the other two leads. PASS predicted dermatological and antibacterial activity for ZINC95486305 with probable activity (Pa) and probable inactivity (Pi) values of 0.507 and 0.031 for dermatology, and Pa and Pi values of 0.354 and 0.042 for antibacterial activity, respectively. Similarly, PASS predicted dermatological activity for ZINC95485880 with Pa and Pi values of 0.300 and 0.087, respectively. Pa is based on the probability that a compound under investigation belongs to a subclass of active compounds and Pi is the probability that it belongs to a subclass of inactive compounds within PASS training datasets. When the Pa of a compound is greater than Pi, there is a drive to further investigate the pharmacological activity [74]. Since the Pa values in both dermatological and antibacterial activities were more than the Pi, it is necessary to explore the predicted pharmacological properties of the two leads in vitro. To support the aforementioned, a study had shown that herbal preparations with dermatological and antimicrobial properties possessed anti-M. ulcerans activity [75].  Table 4. Pharmacokinetics properties of predicted compounds and five known drugs. The pharmacokinetics properties comprised cytochrome inhibition, the blood brain barrier permeant (BBB), P-glycoprotein (P-gp) substrates, and gastrointestinal (GI) absorption.     Table 5. Cardiac Toxicity and Mutagenicity tests. Cardiac toxicity was based on the hERG model, which predicts whether the compound blocks the hERG K+ channel or not. A "yes" indicate a compound that has the likelihood to block a channel and a "no" indicate otherwise and "negative" means compound might not cause any mutation in host genes.

Induced Fit Docking
Docking techniques have emerged as useful tools in drug design to virtually screen libraries with the aim of discovering new inhibitors of protein targets. However, the authenticity and credibility of docking predictions are sometimes constrained by difficulties in modeling protein flexibility during ligand binding [76,77]. Induced-fit docking (IFD) was used to validate the predicted leads since IFD consider both the ligands and receptors as flexible [78,79]. The GlideScore, which is used to rank resulting complexes after induced-fit docking, is an empirical scoring function that provides an estimate of the binding affinity between a ligand and a receptor. Lower GlideScores are mostly representative of reasonably good binding between a ligand and a receptor [80][81][82]. Therefore, the more negative the GlideScores, the more plausible the binding [83]. The values obtained for docking scores and GlideScores were the same for all the three lead complexes. The complexes of ZINC95485880, ZINC95486305, and ZINC38143792 had GlideScores of −7.182 kcal/mol, −6.808 kcal/mol, and −5.449 kcal/mol as well as IFD scores of −883.740 kcal/mol, −885.405 kcal/mol, and

Induced Fit Docking
Docking techniques have emerged as useful tools in drug design to virtually screen libraries with the aim of discovering new inhibitors of protein targets. However, the authenticity and credibility of docking predictions are sometimes constrained by difficulties in modeling protein flexibility during ligand binding [76,77]. Induced-fit docking (IFD) was used to validate the predicted leads since IFD consider both the ligands and receptors as flexible [78,79]. The GlideScore, which is used to rank resulting complexes after induced-fit docking, is an empirical scoring function that provides an estimate of the binding affinity between a ligand and a receptor. Lower GlideScores are mostly representative of reasonably good binding between a ligand and a receptor [80][81][82]. Therefore, the more negative the GlideScores, the more plausible the binding [83].
The values obtained for docking scores and GlideScores were the same for all the three lead complexes. The complexes of ZINC95485880, ZINC95486305, and ZINC38143792 had GlideScores of −7.182 kcal/mol, −6.808 kcal/mol, and −5.449 kcal/mol as well as IFD scores of −883.740 kcal/mol, −885.405 kcal/mol, and −892.462 kcal/mol, respectively. The induced-fit pose and molecular interactions of ZINC95485880 are shown in Figures 10 and 11, while those of ZINC95486305 and ZINC38143792 are shown in Figures S8 and S9, respectively. The IFD scores estimate the most plausible conformations of the ligand complex [84]. Lower IFD scores usually represent favorable binding [85] and they are calculated using Equation (1). The three ligand complexes subjected to induced-fit docking formed major interactions with residues comprising THR73, ASN75, GLU380, ARG386, HIS352, ARG379, and SER357, which were also predicted to be key interacting residues within the active site of ICL after molecular interaction analysis with LigPlot+ ( Table 2 and Table S2).
Equation (1): IFDScore = 1.0 × Prime_Energy + 9.057 × GlideScore + 1.428 × Glide_Ecoul [86,87]. The Prime_Energy is the total energy of the system while the Glide_Ecoul is the Coulomb term (Coulomb energy).  Figures 10 and 11, while those of ZINC95486305 and ZINC38143792 are shown in Figures S8 and S9, respectively. The IFD scores estimate the most plausible conformations of the ligand complex [84]. Lower IFD scores usually represent favorable binding [85] and they are calculated using Equation (1). The three ligand complexes subjected to induced-fit docking formed major interactions with residues comprising THR73, ASN75, GLU380, ARG386, HIS352, ARG379, and SER357, which were also predicted to be key interacting residues within the active site of ICL after molecular interaction analysis with LigPlot+ (Tables 2 and S2).

Sequence Retrieval and Homology Modeling
The protein sequence of isocitrate lyase was retrieved from the National Center for Biotechnology Information (NCBI) database. The sequence was compared to all sequences of available 3D structures stored in the Protein Data Bank using Basic Logical Alignment and Search Tool (BLAST) in order to find suitable templates. Modeller 9.17 [41] was then used to model the 3D structure of the isocitrate lyase using the selected template sequences via the homology modeling process.

Protein Structure Refinement
After homology modeling, all potentially available bumps and clashes in the protein structure were removed using the WHAT IF server [88] by rotating side chain torsion angles and checking all contact distances between atom pairs. ModRefiner was then used to refine the protein model by drawing the model close to its native state in terms of its side-chain positioning, backbone topology, and hydrogen bonds [46]. It was also used to generate significant improvements in the local structure.

Sequence Retrieval and Homology Modeling
The protein sequence of isocitrate lyase was retrieved from the National Center for Biotechnology Information (NCBI) database. The sequence was compared to all sequences of available 3D structures stored in the Protein Data Bank using Basic Logical Alignment and Search Tool (BLAST) in order to find suitable templates. Modeller 9.17 [41] was then used to model the 3D structure of the isocitrate lyase using the selected template sequences via the homology modeling process.

Protein Structure Refinement
After homology modeling, all potentially available bumps and clashes in the protein structure were removed using the WHAT IF server [88] by rotating side chain torsion angles and checking all contact distances between atom pairs. ModRefiner was then used to refine the protein model by drawing the model close to its native state in terms of its side-chain positioning, backbone topology, and hydrogen bonds [46]. It was also used to generate significant improvements in the local structure.

Molecular Dynamics Simulation of Protein Structure
The refined structure of the protein model was subjected to molecular dynamics simulation using Gromacs 5.1.1 [47][48][49] with OPLS-AA as a force field. To carry out the simulation, the initial protein structure was solvated in a cubical box using the SPC/E water model, which is a generic equilibrated three-point solvent model. The solvated system contained a charged protein with a net charge of −13 electrons. 13 sodium Na + counter ions were added to neutralize the net charge. The solvated electro-neutral system was then assembled for energy minimization to ensure that the system had no steric clashes or inappropriate geometry. The energy of the relaxed structure was first minimized using the steepest decent method. Equilibration was conducted under two phases comprising NVT and NPT ensembles. The NVT ensemble is conducted under a constant number of particles, volume, and temperature. The pressure was conducted under NPT ensemble with the number of particles, pressure, and temperature kept constant. Both the NVT ensemble and the NPT ensemble were run for 100 ps. The temperature was set to 310 K (37 • C), which represents the normal physiological temperature for the human body. Upon completion of the two equilibration phases, position restraints were released and production molecular dynamics (MD) was run for 1 ns.
CASTp [91,92] was used to determine the most plausible binding pockets of the protein model. The computation of the binding site is based on quantitative characterization of surface pockets and internal voids, which are the important concave regions associated with binding properties of the protein structure [91]. The predicted cavity by CASTp was confirmed by the blind docking [93] process using AutoDock Vina within PyRx software version 0.8 [94,95].

Molecular Docking and Mechanisms of Binding
A library of 885 natural compounds were retrieved from AfroDb for docking against the protein model. AfroDb [32] is a database that contains African natural compounds and a subset catalogue of the ZINC database [96]. All the AfroDb drug-like compounds were retrieved in an SDF format in a single file. The ligands were energy minimized using Open Babel in PyRx (Version 0.8) prior to docking in order to obtain 3D ligand structures with proper bond lengths between atoms. Energy minimization was carried using the Universal Force Field (UFF), which is reasonably good for molecules containing elements found within the periodic Table [97]. The optimization algorithm employed in this study was the conjugate algorithm. This was followed by conversion of all the ligands into AutoDock PDBQT format. AutoDock Vina embedded in the PyRx software was used to perform molecular docking and virtual screening keeping the ligands flexible and the receptor rigid. Each ligand was allowed nine conformers for every docking process. Known inhibitors and ligands obtained from BindingDB [71,98], PDB [36,99], and literature were screened against the generated homology model of ICL of Mycobacterium ulcerans and the crystal structures of M. tuberculosis. The mechanisms of binding between the ligand-receptor complexes were profiled using Ligplot+ [100]. For most docking programs, a flexible ligand docks to rigid receptor but understanding the flexibility of the receptor is very vital to conformational changes since most protein structures experience side-chain or backbone movement upon binding to a ligand [77,83]. Induced Fit Docking (IFD) of the leads was done using the IFD module in the Schrödinger software suite [79]. The GlideScore and IFD score were generated for each pose.

Validation of Docking Protocol
In order to validate the docking protocol, ligands with experimentally determined pose within crystallographic protein structures were extracted from receptor-ligand complexes and re-docked to the receptors [101]. The complexes used were templates of the ICL of M. tuberculosis, a close homologue of M. ulcerans [16,102] and they were 5DQL, 1F8I, and 1F8M with their co-crystallized ligands comprising 4-hydroxy-2-oxobutanoic acid, succinic acid, glyoxylic acid, and pyruvic acid. The co-crystalized ligands were initially removed from their respective protein active sites and later re-docked into the receptors, which was done previously [60]. The predicted docking poses were compared to their corresponding crystallographic complexes and were superimposed in order to assess how well they align to each other and their common residues of molecular interactions. These four ligands were also docked within the modeled structure of ICL of M. ulcerans. Additionally, LigAlign was used to align the re-docked ligand complexes with the ligands in the co-crystalized complexes. LigAlign enables the analysis of ligand alignments within active sites [59].
As part of the validation of the molecular docking, the four known ligands in the co-crystalized structures of M. tuberculosis were extracted from the structures of the complexes comprising 1F8I, 1F8M, and 5DQL to aid in generating the receiver operating characteristics (ROC) curve. The ligands, which include succinic acid, glyoxylic acid, pyruvic acid, and 4-hydroxy-2-oxobutanoic acid, were used to obtain decoys via the Directory of useful decoys and enhanced (DUD-E) [103]. The decoys generated have similar physicochemical properties to the ligands but different 2-D topology. The number of generated decoys of succinic acid, glyoxylic acid, pyruvic acid, and 4-hydroxy-2-oxobutanoic acid were 50, 50, 50, and 49, respectively. Duplicated protonated ligands and their corresponding decoys were eliminated to prevent analog bias [62]. A total of 199 decoys and the four active ligands were screened separately against the ICL model structure of M. ulcerans, 1F8I, 1F8M, and 5DQL in order to calculate the area under the curve (AUC) value of the computed ROC curve. The ROC curves were generated with default settings using easyROC (Ver. 1.3), which operates in the R language environment [63]. The default settings consist of non-parametrically fitted ROC curve, type I error of 0.05, Standard error estimation of DeLong (1988), and confidence interval of DeLong (1988) [104].

Prediction of Activity Spectra for Substances (PASS) for Leads
The SMILES files of ZINC95486305 and ZINC95485880 were used to predict the biological activity using the prediction of activity spectra for substances (PASS) tool [73]. PASS is a tool that predicts over 3500 different kinds of biological activity, pharmacological effects, and mechanisms of action, toxicity, and interactions with metabolic enzymes and transporters as well as influence gene expression. The PASS algorithm is based on the analysis of structure activity relationships for over 250,000 bioactive substances including drugs, drug candidates, leads, and toxic compounds. PASS, therefore, estimates the probability that a compound belongs to a particular class of active compounds.

Conclusions
Isocitrate lyase is a key enzyme in the glyoxylate cycle of Mycobacterium ulcerans, and in this present study, its 3D structure was successfully generated using homology modeling techniques. In addition, molecular dynamics simulation was successfully performed on the predicted protein model. Furthermore, using molecular docking, 885 natural compounds retrieved from AfroDb were screened via the predicted active site of the modeled structure. Out of 885 compounds, 20 hit compounds were found based on a low binding energy (strong binding affinity) and reasonable docking pose upon visualization. An ROC curve with a reasonably good AUC value of 0.89375 was used to validate the docking protocol for the model structure of ICL of M. ulcerans. The hit compounds were further analyzed using ADMET testing and physicochemical profiling. Therefore, we propose that ZINC38143792, ZINC95485880, and ZINC95486305 are the lead compounds since they emerged as the best compounds among the 20 hits based on their favorable ADMET properties and strong active site interactions. ZINC38143792, which is also known as Euscaphic acid, has been reported to inhibit a drug-sensitive strain of M. tuberculosis [72]. ZINC95486305 and ZINC95485880 were both predicted to possess dermatological activity while ZINC95486305 also had antibacterial properties.
If the efficiencies of the leads are successfully proven via biochemical assays, these molecules could be important inhibitors to ICL, which is an essential target in the Buruli ulcer disease mechanism. The predicted leads can serve as scaffolds for further development of potent anti-buruli ulcer drugs.
Supplementary Materials: Figures S1-S9, and Tables S1-S3 are available as supplementary materials accessible online.

Funding:
The study was not funded by any funding body.