In silico prediction of new antimicrobial peptides and proteins as druggable targets towards alternative anti-schistosomal therapy



Introduction
Despite mass drug administration, schistosomiasis continues to be a major threat to human health. Schistosomiasis is a neglected tropical disease that is endemic in the tropics. It has been reported in 78 countries and has infected an estimated 240 million people, leaving over 700 million people at risk of infection [1]. Schistosoma mansoni, Schistosoma haematobium and Schistosoma japonicum are the three major schistosome species responsible for morbidity and mortality in humans [2]. The life cycle of the parasite involves two hosts, an intermediate host (freshwater snails) and a definitive host (mammals). The eggs voided by the mammalian definitive host hatch in water to form miracidia that enter the tissue of the snail intermediate host [3]. Cercariae, the Schistosoma larvae that can penetrate the skin of the mammalian host, are then released by the snails. The free-living cercariae transform into schistosomula that travel through the bloodstream to the liver, where they develop and mature into adult worms that lay eggs into the bloodstream of the host, with these eggs lodging within the host tissue. At this stage, the host suffers impaired blood flow and defects in the reproductive organs, lungs, brain and gut, predisposing patients to genital schistosomiasis, neuroschistosomiasis, hepato-intestinal schistosomiasis and pulmonary schistosomiasis [4]. Conditions such as cancer, pulmonary and portal hypertension, malaise, liver cirrhosis and skin dermatitis, amongst others, have been attributed to schistosomiasis [4]. The introduction of Praziquantel (PZQ) several years ago has improved the treatment of this disease [5]. PZQ is easily accessible, inexpensive and effective against the three major species parasitizing mammals, but it shows no activity against the juvenile worm [6]. In addition, PZQ does not prevent re-infection, and parasite resistance has been reported from some regions of the world. Various other compounds have reportedly been used in the treatment of schistosomiasis, including metrifonate, oxamniquine, niridazole, oltipraz and hycanthone [7]. Additionally, some compounds have been proposed as candidates for treating schistosomiasis; these are anti-microbial peptides (AMPs), miltefosine and imatinib [4, 7].
The ability to protect oneself from pathogens and parasites is indispensable for good health. The innate immune system, a conserved defence mechanism in animals, is responsible for this protection [8]. Several studies have documented AMPs as part of the innate immune system [8, 9]. AMPs are natural antibiotics with multifunctional properties that are produced by all living species and are currently being explored as a vital source for the development of new drugs [10]. Oyinloye and co-workers [11] proposed AMPs as excellent candidates for the treatment and control of schistosomal infections, and further suggested that the schistosome worm evades the innate immune system by mimicking or manipulating it, thereby reducing the efficacy of the host immune response. This creates a favourable environment for the schistosomes, promoting their survival and co-habitation within the host in an adaptive host-parasite complex mechanism. This co-habitation leads to immunosuppression, resulting in severe complications and predisposing the host to secondary infections. AMPs, however, are known to have immunostimulatory potential. In addition, AMPs can scavenge the disease-causing reactive oxygen species (ROS) produced by the schistosomes [11]. Owing to their various mechanisms of action and characteristics, AMPs have been proposed as emerging drug hits in drug design and discovery [10, 12]. Given these characteristics, it is imperative to identify a class of these biomolecules that can act as therapeutic agents against schistosomiasis or, alternatively, as adjuvants to PZQ. Because experimental investigations are time- and capital-intensive, the in silico discovery of new druggable compounds using AMPs as the foundation, together with the modelling of AMPs and schistosome proteins using various bioinformatics techniques, will aid and accelerate the discovery of new schistosomicides.

Data mining and data retrieval
For the purpose of stochastic model construction, AMP databases such as the Antimicrobial Peptide Database (APD) [13], the Collection of Anti-microbial Peptides (CAMP) [14] and PepBank [15] were queried to extract and collect experimentally validated AMPs with anti-parasitic activity. The retrieved AMPs were stored in FASTA format; three-quarters of them were used as the training set for model construction, whilst the remaining one-quarter was used as the testing set. Subsequently, profile hidden Markov models, as implemented in HMMER [16], were employed in the construction of the stochastic models.
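The three-quarters/one-quarter partition described above can be sketched as follows. This is a minimal illustration only: the function names, the random seed and the placeholder records are assumptions and are not taken from the study, which does not state how the sequences were assigned to each set.

```python
import random


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_train_test(records, train_fraction=0.75, seed=0):
    """Shuffle reproducibly, then keep the first 3/4 for training."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder records (NOT real AMP sequences), one per retrieved peptide.
records = [(f"amp_{i}", "K" * (i + 5)) for i in range(13)]
train, test = split_train_test(records)
print(len(train), len(test))  # 9 4
```

With 13 records, the split yields 9 training and 4 testing sequences, which matches the set sizes reported later in the results.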

Construction of AMPs profiles using HMMER
HMMER version 2.3.2 was used to construct the predictive profile. The software runs on POSIX-compatible platforms such as Mac OS X, UNIX and Linux. HMMER was employed in this study because it enables users to infer the AMP family of a query sequence and helps to capture new peptides with similar functions from public databases. The constructed profiles enabled protein sequence similarity searches using probabilistic techniques. In this study, all HMMER profiles were constructed on the Ubuntu 16.04.2 LTS (Xenial Xerus) operating system, which is built on a Linux kernel. The training set of experimentally proven AMPs was used to construct the HMMER profile, and the robustness of the created profile was validated using the testing dataset.

Profile creation
To create the predictive profile, the training dataset was first aligned for use with HMMER. The multiple sequence alignment of the training set was carried out using the ClustalW alignment tool with the command line:

clustalw -align -output=gcg -case=upper -seqnos=off -outorder=aligned -infile=family.fasta

The resulting alignment was saved in msf (GCG) format (family.msf).

Building of the profile
The resulting file was used in the second step of HMMER profile construction, known as build profile. The hmmbuild module builds a new HMM profile from the multiple sequence alignment file using the command line:

hmmbuild family.hmm family.msf

The built profile is saved in hmm format (family.hmm).

Calibration of the profile
The hmmcalibrate step calibrates the HMM search statistics, which helps to improve the sensitivity of the profile.

Testing of the profile
Following profile calibration, the hmmsearch step was carried out to query the testing dataset. The calibrated HMM profile 'family.hmm' was used to evaluate the performance of the profile by testing it against a list of independent AMPs. The testing set (one-quarter of the retrieved, experimentally validated AMPs) served as the positive dataset and was queried against the constructed profile. This was achieved using the command line:

hmmsearch -E 1e-2 family.hmm family(test set).fasta
The cut-off E-value to query the constructed model was then set to 0.01 as indicated in the command line, since lower E-values indicate a more significant hit.
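The four steps above (alignment, profile building, calibration and search) can be summarised in a short sketch that only assembles the command lines and does not execute them. The executable name `clustalw`, the function name and the test-set file name `family_test.fasta` (a stand-in for the test-set FASTA file) are assumptions made for illustration; the options shown are those quoted in the text, written in the single-dash form used by ClustalW and HMMER 2.

```python
import shlex


def hmmer2_pipeline(family="family", test_fasta="family_test.fasta", e_value=1e-2):
    """Return the four command lines of the profile workflow as argument lists."""
    fasta, msf, hmm = f"{family}.fasta", f"{family}.msf", f"{family}.hmm"
    return [
        # 1. Multiple alignment of the training set, written in GCG/MSF format.
        ["clustalw", "-align", "-output=gcg", "-case=upper", "-seqnos=off",
         "-outorder=aligned", f"-infile={fasta}"],
        # 2. Build the profile HMM from the alignment.
        ["hmmbuild", hmm, msf],
        # 3. Calibrate the search statistics of the profile (HMMER 2 only).
        ["hmmcalibrate", hmm],
        # 4. Search the test sequences with an E-value cut-off.
        ["hmmsearch", "-E", str(e_value), hmm, test_fasta],
    ]


for cmd in hmmer2_pipeline():
    print(shlex.join(cmd))
```

Keeping each command as an argument list makes it straightforward to pass to `subprocess.run` on a machine where ClustalW and HMMER 2.3.2 are installed.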

Performance evaluation of created profile based on the prediction of both positive and negative testing sets
The constructed profile was queried against a negative dataset of 250 neuropeptide sequences known to lack anti-parasitic activity. Sensitivity, specificity, accuracy and the Matthews Correlation Coefficient (MCC) were used to measure the strength and statistical performance of the created profile.
Sensitivity: This is the percentage of anti-schistosomal AMPs (testing set) correctly predicted as anti-schistosomal AMPs (positive).

Specificity: This is the percentage of non-anti-schistosomal AMPs (negative set) correctly predicted as non-anti-schistosomal AMPs (negative).

Accuracy: This is the percentage of correctly predicted anti-schistosomal and non-anti-schistosomal AMPs.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.

Matthews Correlation Coefficient (MCC): This is a measure of both sensitivity and specificity. It is worth noting that MCC = 0 indicates completely random prediction, while MCC = 1 indicates perfect prediction.
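A minimal sketch of how these four measures can be computed from a confusion matrix is given below. The formulas are the standard textbook definitions, assumed here to be the ones applied in the study; the function name and the example counts are purely illustrative and are not the study's results. Values are returned as fractions and can be multiplied by 100 to obtain percentages.

```python
import math


def evaluate(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and MCC as fractions."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Illustrative counts only (not taken from Table S1).
scores = evaluate(tp=8, tn=9, fp=1, fn=2)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```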

Identification of novel putative anti-schistosomal AMPs from proteome sequence databases
One thousand three hundred proteome sequences in FASTA format were retrieved from the ENSEMBL server (http://www.ensembl.org/index.html) and the UniProt database (http://www.uniprot.org). Using the hmmsearch module of HMMER, all the retrieved proteome sequences were scanned against the calibrated profile, with the cut-off E-value for the search of anti-schistosomal AMPs set at 0.01, using the command line:

hmmsearch -E 1e-2 family.hmm family(test set).fasta

Based on the set E-value, all identified peptides were considered to be putative anti-schistosomal AMPs. Finally, physicochemical properties of the putative AMPs, such as net charge, Boman index, instability index, extinction coefficient, hydrophobic residues and isoelectric point, were calculated using the Bactibase (http://bactibase.hammamilab.org/physicochem) and APD (http://aps.unmc.edu/AP/prediction/prediction_main.php) interfaces.
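Two of the simpler properties listed above, net charge and the proportion of hydrophobic residues, can be approximated directly from a sequence. The sketch below is not the Bactibase or APD implementation: it assumes a net charge that counts Lys/Arg as +1 and Asp/Glu as −1 (ignoring His and the termini) and an assumed hydrophobic residue set of A, I, L, M, F, V, W and C. The example peptide is made up for illustration.

```python
# Assumed hydrophobic residue set; the web servers may use a different one.
HYDROPHOBIC = set("AILMFVWC")


def net_charge(sequence):
    """Approximate net charge: (K + R) minus (D + E)."""
    seq = sequence.upper()
    positive = sum(seq.count(res) for res in "KR")
    negative = sum(seq.count(res) for res in "DE")
    return positive - negative


def hydrophobic_percent(sequence):
    """Percentage of residues that belong to the assumed hydrophobic set."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(res in HYDROPHOBIC for res in seq) / len(seq)


peptide = "KKLLKKAGDE"  # hypothetical sequence, not one of the putative AMPs
print(net_charge(peptide))           # 2
print(hydrophobic_percent(peptide))  # 30.0
```

Properties such as the Boman index, instability index and isoelectric point depend on published residue scales and pKa tables and are best taken from the servers themselves.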

Identification of novel druggable schistosomes proteins
The STITCH (Search Tool for Interactions of Chemicals) database, accessed at http://stitch.embl.de, was used to identify schistosome target proteins, with Praziquantel, the drug currently effective against the worms, as the query. Schistosoma was selected as the organism and all families of interacting proteins were displayed; the family with the highest interaction confidence was then selected.

Receptors and ligands modelling and molecular docking
The Iterative Threading ASSEmbly Refinement (I-TASSER) server for predicting protein structure and function (http://zhanglab.ccmb.med.umich.edu/I-TASSER/) was used to predict the structures of the schistosome proteins and of the top six putative AMPs, selected on the basis of their E-values. The amino acid sequences of the putative AMPs and the novel proteins were submitted to the online I-TASSER server. Thereafter, the predicted models of the two proteins were refined using GalaxyRefine [17], accessible at http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=REFINE. The GalaxyRefine server has been documented to be one of the best methods for improving the local quality of structures. To achieve this, the server rebuilds and repacks the side chains and then performs overall relaxation of the structure using molecular dynamics simulation. The server also moderately improves the quality of the backbone structure. In addition, RAMPAGE [18] was used to validate the quality of the GalaxyRefine models. In silico molecular docking of the two identified schistosome proteins with the six putative anti-schistosomal AMPs was carried out using PatchDock (https://bioinfo3d.cs.tau.ac.il/PatchDock/), an online server with a geometry-based molecular docking algorithm designed to find docking transformations that yield good molecular shape complementarity [19]. Finally, the docked complexes were refined using FireDock (http://bioinfo3d.cs.tau.ac.il/FireDock/), a tool that refines docked complexes by rearranging and optimizing the side chains and then rescoring the complex. The resulting docked conformations and the interactions observed between the two biomolecules were examined and visualized using the PyMol Molecular Graphics System (2003), DeLano Scientific, LLC, USA (http://www.pymol.org/2/).

Retrieval of experimentally validated AMPs
Experimentally validated anti-parasitic AMPs were retrieved from various databases, together with the accompanying publication for each AMP. After retrieval, a final list of 13 AMPs was compiled using Cluster Database at High Identity with Tolerance (CD-HIT) to remove duplicated sequences. Four AMPs from the final list were used as the testing set, whilst the remaining nine were used as the training set. Following the creation of the anti-parasitic profile, scanning the profile against the testing set confirmed that it recognizes AMPs of this class, as it identified AMPs with known anti-parasitic activity within the testing set. Furthermore, scanning the 250 neuropeptides (negative set) against the profile returned no hits, consistent with the negative set lacking anti-parasitic activity (Table S1a). After scanning against the negative and testing sets, statistical performance measures (sensitivity, specificity, accuracy and MCC) were calculated to ascertain the robustness of the profile, as shown in Table S1b. The performance of the profile was rated only on its specificity and accuracy values. The created model had a specificity of 100% and an accuracy of 99.3%; specificity and accuracy values of around 95% or higher imply that the constructed profile has more than 95% confidence in predicting a peptide to be a putative anti-schistosomal AMP.

Identification of putative anti-schistosomal AMPs
As mentioned in the methodology section, the constructed profile was scanned against the proteome sequences to identify peptides with the same motifs and activity as the profile. All the peptides recognized were termed putative anti-schistosomal AMPs and were ranked according to their respective E-values. The significance of a hit is measured by its E-value, which is computed from the bit score and estimates the number of hits with an equal or better score that would be expected by chance. An E-value of 0.01 therefore means that about 0.01 such chance hits are expected in the search, which for values this small corresponds to roughly a 1% chance that the hit is a false positive. A final list of 20 AMPs was identified and the six with the lowest E-values were selected for further study. The E-values of the putative anti-schistosomal AMPs were 3.2E-55 for TAK1, 4.8E-08 for TAK2, 5E-07 for TAK3, 4.6E-06 for TAK4, 2.7E-05 for TAK5 and 1.7E-04 (0.00017) for TAK6. These E-values are all well below the cut-off value of 0.01.
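The filtering and ranking step can be sketched in a few lines. The six E-values below are those reported in the text; the function name and the extra entry labelled `hypothetical_hit` are assumptions added only to show how the cut-off excludes a weak hit.

```python
def select_top_hits(hits, cutoff=0.01, top_n=6):
    """Keep hits at or below the E-value cut-off and return the best top_n."""
    passing = [(name, e_value) for name, e_value in hits.items() if e_value <= cutoff]
    return sorted(passing, key=lambda item: item[1])[:top_n]


hits = {
    "TAK3": 5e-07,
    "TAK1": 3.2e-55,
    "TAK6": 0.00017,
    "TAK2": 4.8e-08,
    "TAK5": 2.7e-05,
    "TAK4": 4.6e-06,
    "hypothetical_hit": 0.5,  # illustrative only; fails the 0.01 cut-off
}

for name, e_value in select_top_hits(hits):
    print(f"{name}\t{e_value:.1e}")
```

Sorting in ascending order of E-value places the most significant hit (TAK1) first and the least significant of the six (TAK6) last.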

Identification of novel schistosome proteins based on interaction study
In order to identify potential proteins interacting with PZQ, the STITCH 5.0 database was queried; the result is shown in Fig. S1. Two new proteins, Axonemal dynein intermediate chain, putative (Smp_103920) and Glycosyltransferase (Smp_052330), were identified as potential interacting partners of PZQ, with confidence scores of 0.430 and 0.425 respectively. The amino acid sequences of the proteins were retrieved for further in silico analysis and literature mining.

Analysis of the physicochemical parameters of schistosome proteins and putative anti-schistosomal AMPs
The physicochemical properties of the six putative AMPs were computed using Bactibase [20] and APD [13] to ascertain whether the putative AMPs possess properties similar to those of known AMPs (Table 1). The putative AMPs were considered novel since they did not match any existing AMP in the database libraries searched.

3-D homology modelling of schistosome proteins and putative anti-schistosomal AMPs and docking studies
Homology modelling of both the schistosome proteins and the putative AMPs was carried out via the I-TASSER server. The outputs generated from this process were saved and visualized using the PyMol Molecular Graphics System (2003), DeLano Scientific, LLC, USA, as depicted in Fig. 1. The C-score, TM-score and RMSD (Table 2) were used to evaluate the quality of the predicted structures of the putative anti-schistosomal AMPs and the schistosome proteins. The C-score data revealed that the modelled structures were of high quality. Moreover, the structures of glycosyltransferase and axonemal dynein intermediate chain were refined using GalaxyRefine in order to perform an overall structural relaxation by molecular simulation. The server provided five refined models for each protein; these were evaluated with RAMPAGE, and the top-ranked refined model based on the RAMPAGE result (Fig. S2) was selected for further analysis. Structural analysis showed that axonemal dynein intermediate chain has 86.5% of its amino acid residues in the most favoured region and 10.1% in the allowed regions, whereas glycosyltransferase has 84.1% of its residues in the favoured region and another 10.9% in the allowed region. Thereafter, the binding orientation and the binding strength of each putative AMP (ligand) when bound to the schistosomal proteins (receptors) were determined using the online software PatchDock. This software discards candidate complexes with unacceptable steric clashes between the atoms of the ligand and those of the receptor. Finally, the top 20 docking conformations for the most probable interactions, based on the geometric shape complementarity (binding affinity) score and global energy, were displayed (Tables 3 and 4). Based on the binding affinity score, TAK5 and TAK3 have the highest propensity to bind to glycosyltransferase and axonemal dynein intermediate chain respectively. The PDB file of the best-scoring complex for each docking run was saved and visualized using the PyMol Molecular Graphics System (2003) (Fig. S3 and S4). Immediately after docking, each schistosome protein-AMP complex was analyzed to ascertain the binding orientation of the ligand AMPs on the protein receptors; the interacting amino acid residues are listed in Tables S2 and S3. The right binding orientation is the one in which the AMPs could prevent the invasion of the human system by the schistosomal worm. Additionally, this analysis helps in drawing a better conclusion on which putative anti-schistosomal AMPs should be developed as potent inhibitory molecules against schistosomal infections. Therefore, the positions of the amino acid residues of the protein receptors interacting with the AMP ligands were mapped as shown in Fig. S3 and Fig. S4.

Discussion
To contain the various diseases affecting mankind, researchers have employed computational techniques to identify lead compounds that can serve as alternative therapeutic options or as adjuvants to existing drugs [21][22][23][24][25]. The HMMER algorithm was used to identify six putative anti-schistosomal AMPs; the best peptide has an E-value of 3.2E-55, which indicates a very low probability that the peptide is a falsely predicted anti-schistosomal AMP. The highest E-value observed among the six selected peptides is 1.7E-4, meaning that only about 1.7E-4 chance hits of this quality are expected in the search, which corresponds to a probability of roughly 0.017% that the peptide is a false prediction. Thus, all the HMMER-predicted putative anti-schistosomal AMPs had excellent scores for being considered true anti-schistosomal AMPs. The physicochemical analysis showed that all the putative AMPs are novel, since none of them matched any existing AMP related to schistosome inhibition in the AMP database libraries searched (Table 1). All the peptides had a positive net charge, owing to the abundance of positively charged (basic side chain) amino acid residues, and a total hydrophobicity greater than 30%, which is standard for known AMPs as shown by Tincho and co-workers [26]. The binding potential of antimicrobial peptides, as estimated by the Boman index, is reported to range between 2.53 and 3.04 kcal/mol for peptides with multifunctional and high binding potential. Conversely, a low Boman index signifies an antibacterial agent with minimal side effects [27]. In this study, five of the AMPs had values below the 2.53 kcal/mol cut-off; only TAK1 had a value above it. This suggests that the majority of the AMPs tested could possess anti-bacterial activity [27].
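The link between an E-value and the probability of a false hit can be made explicit with a short worked calculation. It assumes, as is usual for database-search statistics, that the number of chance hits follows a Poisson distribution with mean $E$; this assumption is added here for illustration and is not stated in the study.

$$P(\text{at least one chance hit}) = 1 - e^{-E} \approx E \quad \text{for } E \ll 1$$

For the cut-off and for the highest E-value among the six selected peptides this gives

$$E = 0.01:\; P = 1 - e^{-0.01} \approx 0.00995 \approx 1\%, \qquad E = 1.7 \times 10^{-4}:\; P \approx 1.7 \times 10^{-4} = 0.017\%$$

so for small E-values the E-value itself, read as a fraction and multiplied by 100, gives the approximate percentage chance of a false positive.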

Modeling of putative AMPs and schistosome proteins
The C-score, TM-score and RMSD were used to measure the quality of the models predicted by I-TASSER, thereby enabling validation of the structural models. The C-score is a confidence score for the quality of a predicted model and ranges from −5 to 2; a higher C-score indicates a model with higher confidence, and a lower C-score a model with lower confidence [28][29][30]. TM-score and RMSD are established criteria for assessing the structural similarity between two structures and are generally employed to evaluate the accuracy of structural modelling when the native structure is known. Apart from TAK1, all the predicted structures had a low C-score, which could be indicative of the lack of available templates for their modelling [29]. Since all TM-scores were above or close to 0.5, the predicted structures have the correct topology [28]. Also, all the putative antimicrobial peptides, TAK1 to TAK6, had good RMSD values. After refinement, the protein 3D structures attained good quality, as ascertained by Ramachandran plot scores of 84.1% (glycosyltransferase) and 86.5% (axonemal dynein intermediate chain).

Docking study of the interaction of putative AMPs with schistosome proteins
Molecular docking was done to ascertain the interaction of the putative anti-schistosomal AMPs with glycosyltransferase. This could be a new approach to the inhibition of glycosyltransferase, an enzyme responsible for the biosynthesis of glycans, which help to maintain cell membrane integrity in schistosomes [31, 32]. The approach also helps to establish how the interaction of these putative anti-schistosomal AMPs with axonemal dynein intermediate chain could be a new strategy for inhibiting the protein, with the AMPs acting as antagonists. Axonemal dynein intermediate chain presumably powers the cilia [33]; the cilia are linked to cell cycle development and proliferation, and play a major role in the development and everyday life of schistosomes. The docking results displayed very high binding affinity scores, all above 8731, which indicates good binding for all the putative anti-schistosomal AMPs [34]. TAK5 had the best geometric shape complementarity score (13,474) for glycosyltransferase, which can be attributed to its good Boman index (1.45 kcal/mol). Remarkably, this peptide also possesses the lowest global energy, −54.01. Moreover, the high abundance of lysine in this putative anti-schistosomal AMP makes it an excellent drug lead compound, because documented evidence has shown that poly-L-lysine destroys the surface membrane of adult schistosomes during perfusion [35, 36]. TAK3, another peptide with a high abundance of lysine, had the second highest binding affinity to glycosyltransferase and a global energy of −3.83. Further analysis revealed that TAK3 had the highest geometric shape complementarity score, 14,518, for axonemal dynein intermediate chain. The FireDock server revealed that TAK1, TAK2 and TAK6 have lower global energies, of −23.44, −21.50 and −22.11 kcal/mol respectively (Table 5). The high binding score attained by TAK3 can be attributed to its good Boman index of 1.82 kcal/mol and to its high lysine content. Therefore, TAK5 and TAK3 are the most probable and energetically favourable models for glycosyltransferase and axonemal dynein intermediate chain respectively. All the interactions between the proteins and the putative anti-schistosomal AMPs are shown in Fig. S3 and S4 respectively. Mapping of the interactions showed that TAK1, TAK4 and TAK5 share a conserved binding site on glycosyltransferase, whereas TAK1, TAK2, TAK4 and TAK5 share a conserved binding site on axonemal dynein intermediate chain. Overall, this study shows that the binding affinity of the putative AMPs for the proteins does not decrease in parallel with the E-value ranking, from the putative anti-schistosomal AMP with the lowest E-value to the one with the highest E-value. This is in agreement with the study of Tincho and colleagues [26], where ten AMPs were retrieved using similar methods, with two of the AMPs used in developing a lateral flow device able to accurately detect both HIV-1 and HIV-2 [37].

Conclusions
The emergence of resistance to anti-schistosomal drugs reveals a clear need to design and develop novel therapeutic agents [38, 39]. AMPs possess broad-spectrum anti-parasitic, anti-fungal, anti-protist, anti-viral and anti-bacterial activities. Therefore, exploring the inhibition of schistosome proteins using AMPs will provide insights into the discovery of new schistosomicides, and in silico methods, which are less expensive and less time-consuming, will hasten this discovery. A plausible conclusion here is that these AMPs bind to the target receptors with high affinity, which indicates good interaction. Several in vitro and in vivo studies will have to be carried out to confirm or further ascertain the strength of these interactions and the stoichiometry of the protein-peptide interactors. Furthermore, elucidating the mechanism of action by which the AMPs elicit their anti-schistosomal properties remains crucial, as it will shed more light on the design and development of a new or alternative treatment regimen for schistosomiasis.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative and MCC = Matthews Correlation Coefficient

Table 1
Physico-chemical properties for the 6 putative anti-schistosomal AMPs.

Table 2
Quality assessment scores of the predicted 3D structures of putative anti-schistosomal AMPs and schistosome proteins generated by I-TASSER.

Table 3
Geometric scores of the binding affinity and binding energy obtained from the docking of the putative anti-schistosomal AMPs and glycosyltransferase protein.

Table 4
Geometric scores of the binding affinity and binding energy obtained from the docking of the putative anti-schistosomal AMPs and axonemal dynein intermediate chain protein.