Comparative Genomics Analysis of Cryptosporidium parvum and Repurposing of Triazole Derivatives as Anti-Cryptosporidial Agents

Cryptosporidium, a member of the phylum Apicomplexa, is an opportunistic pathogen of humans and of economically important livestock. It is the causative agent of cryptosporidiosis, a water-borne infectious diarrheal disease with a comparatively high mortality rate among children and immunocompromised patients worldwide. Statistics show that cryptosporidiosis is among the top three threats to the survival of infants, especially in developing countries. To date, no fully effective drug therapy is available to treat cryptosporidiosis. Therefore, the discovery and development of an effective anti-cryptosporidial drug with a novel mechanism of action has become an urgent task for controlling the disease. The literature shows that many heterocyclic aromatic compounds have valuable biological properties, such as anti-fungal, anti-bacterial and anti-parasitic activity. Among these, triazole is one of the most promising candidates and has attracted the attention of researchers, chemists, microbiologists, and pharmacologists through various success stories. The triazole nucleus is present in various natural anti-infective and medicinal compounds. In this research, we collected triazole compounds from published works and created a database of these compounds for further exploration as anti-cryptosporidial agents. It is hoped that this research provides new insights for the rational design of anti-cryptosporidial chemotherapeutic agents that are more active and less toxic.

Cryptosporidiosis is an infectious disease caused by the protozoan parasite Cryptosporidium. To date, 27 species and 60 genotypes of the parasite have been identified worldwide. Among them, C. parvum and C. hominis are mainly responsible for human cryptosporidiosis 1. Human cryptosporidiosis is characterized by profuse, greenish, watery, fetid diarrhea, dehydration, abdominal pain, fever, and vomiting, while failure to gain weight and malnutrition are among the symptoms in chronic cases. An immunocompetent host recovers within two weeks without any treatment, but the situation is worse in children and in hosts with an impaired immune system. Immunocompromised individuals, such as those infected with Human Immunodeficiency Virus (HIV), experience unmanageable, lethal diarrhea 2. This acute diarrheal disease remains the largest threat to the health of young children and agriculturally important livestock worldwide 3. The field reports of GEMS (the Global Enteric Multicenter Study), a three-year case-control study of more than 22000 children under five years of age at seven diverse sites across Africa and Asia, identified the parasite as the second most common diarrhoea-causing agent after rotavirus 4.
Cryptosporidium infection accounts for 20 % and 9 % of diarrheal cases in the young population of developing and developed nations, respectively. In parallel, the findings of the Global Network for the Study of Malnutrition and Enteric Diseases (MAL-ED), obtained through a five-year birth cohort study of 2145 children aged 0-24 months at eight sites in South America, Asia and Africa, placed Cryptosporidium species among the top five agents responsible for diarrhoeal mortality in the first year of life 4. In India, the highest occurrence of cryptosporidiosis is noted in infants and young children in the southern and northern regions 5.
The Cryptosporidium life cycle consists of both sexual and asexual stages that are completed in a single host. Infected hosts shed oocysts in their feces; the oocyst is both the infective and the environmentally resistant stage 6. Oocysts survive extreme environmental conditions and remain unaffected by chemicals, environmental stresses and even most drinking-water purification and sewage treatment methods. Pathogenesis begins when oocysts are ingested by the host. Four spindle-like sporozoites are released into the gastrointestinal tract and then differentiate into trophozoites inside parasitophorous vacuoles that are extracytoplasmic to the intestinal epithelial cells. Trophozoites initiate the asexual cycle and undergo two consecutive rounds of merogony, producing type I and type II meronts. Subsequently, type I and type II merozoites differentiate from type I and type II meronts, respectively. Type I merozoites infect neighboring cells, whereas type II merozoites develop into microgamonts (male gametes) and macrogamonts (female gametes), which marks the initiation of the sexual phase of reproduction. Fertilization between microgametes and macrogametes then results in oocyst development. Sporogony yields thick-walled oocysts, which are released in feces, and thin-walled oocysts, which remain in the host 7.
It is now established that there are two main reasons for the widespread prevalence of cryptosporidiosis. The first is that an infected person sheds a large number of oocysts, which are immediately infective to a healthy person. Transmission takes place through the fecal-oral route, either directly or indirectly. The second is the lack of a fully effective chemotherapeutic agent to treat cryptosporidiosis in all patients 8.
The identification of drug-target enzymes that differ from their human counterparts is the foundation of the comparative genomics approach. Previous studies revealed that Cryptosporidium has many unique metabolic pathways in comparison to other apicomplexan species 9.
However, no sufficiently effective drug or antibiotic is available against this protozoan disease. Currently available drugs such as decoquinate, sulphaquinoxaline, halofuginone and paromomycin have shown low efficacy. FDA-approved nitazoxanide has shown a beneficial effect; however, it is not effective in immunodeficient patients 10. Controlling this disease is very difficult because the oocysts of C. parvum are highly resistant to environmental stresses and chemical treatment. Thus, it is important to identify possible drugs to control this life-threatening disease.

Retrieval of Proteome and Identification of Non-Homologous Crucial Proteins
The complete genome of C. parvum was retrieved from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/genomes/apicomplexa/) in FASTA format 11. All sequences were manually inspected, and short sequences (fewer than 100 amino acids) were removed because short proteins are the least likely to be encoded by essential genes. The CryptoDB database (http://CryptoDB.org) was used to access the essential genes of the parasite 12.
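The length filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used in the study: the function names and the toy identifiers `protA` and `protB` are hypothetical, and the input is assumed to be plain FASTA text.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_by_length(records, min_length=100):
    """Keep proteins with at least min_length residues; shorter ones are dropped."""
    return [(h, s) for h, s in records if len(s) >= min_length]

# Toy input with hypothetical identifiers; real input would be the downloaded proteome.
demo = StringIO(">protA\n" + "M" * 120 + "\n>protB\n" + "M" * 40 + "\n")
kept = filter_by_length(read_fasta(demo))
print([header for header, _ in kept])
```

The threshold is exposed as a parameter so that the 100-residue cut-off can be varied without changing the parser.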

Selection of Non-Homologous Proteins
CD-HIT is a suite of programs for comparing and clustering protein and nucleotide sequences. The whole proteome of C. parvum was submitted to the CD-HIT server (weizhong-lab.ucsd.edu/cdhit-web-server/cgi-bin/index.cgi?cmd=cd-hit) to filter out analogous, paralogous and homologous proteins 13. All parameters were left at their defaults except the sequence identity cut-off, which was set to 0.7. The non-homologous proteins of the parasite were obtained through BLASTP analysis against the human genome with the expectation value cut-off set to 10^-4 and all other parameters left at their defaults 11. The non-homologous proteins of the parasite were then submitted to the CryptoDB database (http://CryptoDB.org) to extract the essential parasite genes. At this step, the expectation value (E-value) cut-off and the bit score cut-off were set to 10^-10 and 100, respectively, and the remaining parameters were left at their defaults 14.
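The two BLASTP filtering steps above can be illustrated with a short Python sketch. It assumes the BLAST results are available in the standard 12-column tabular layout (query in column 1, subject in column 2, E-value in column 11, bit score in column 12); the function names, the toy protein identifiers, and the choice to treat the cut-offs as inclusive are assumptions made for illustration only.

```python
def parse_blast_tabular(lines):
    """Parse BLAST tabular rows (12 default columns) into small dicts."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        hits.append({
            "query": fields[0],
            "subject": fields[1],
            "evalue": float(fields[10]),
            "bitscore": float(fields[11]),
        })
    return hits

def non_homologous(all_ids, human_hits, evalue_cutoff=1e-4):
    """Proteins with no human hit at or below the E-value cut-off."""
    with_hit = {h["query"] for h in human_hits if h["evalue"] <= evalue_cutoff}
    return [q for q in all_ids if q not in with_hit]

def essential(candidates, essential_hits, evalue_cutoff=1e-10, bitscore_cutoff=100.0):
    """Candidates with a hit that passes both the E-value and bit score cut-offs."""
    passing = {h["query"] for h in essential_hits
               if h["evalue"] <= evalue_cutoff and h["bitscore"] >= bitscore_cutoff}
    return [q for q in candidates if q in passing]

def row(query, subject, evalue, bitscore):
    """Build one toy 12-column row; the middle columns are placeholders."""
    return "\t".join([query, subject, "50.0", "200", "0", "0",
                      "1", "200", "1", "200", str(evalue), str(bitscore)])

# Hypothetical toy data: p1 has a strong human homolog, p2 and p3 do not.
proteome_ids = ["p1", "p2", "p3"]
human_hits = parse_blast_tabular([row("p1", "humanX", 1e-30, 150.0),
                                  row("p2", "humanY", 0.5, 30.0)])
essential_hits = parse_blast_tabular([row("p2", "essA", 1e-50, 300.0),
                                      row("p3", "essB", 1e-5, 60.0)])

candidates = non_homologous(proteome_ids, human_hits)
targets = essential(candidates, essential_hits)
print(candidates, targets)
```

Keeping the two filters as separate functions mirrors the two-stage logic of the text: first exclude proteins with human homologs, then retain those that match essential genes.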

Metabolic Pathway Analysis
The KEGG Automatic Annotation Server (KAAS; http://www.genome.jp/kegg/kaas/) was selected for metabolic pathway analysis. The KAAS program supports functional annotation of the genome and assigns a functional role to each gene. This analysis is also helpful in retrieving unique proteins as drug targets 15.

Target Protein Model Generation
Template selection is the process of identifying a suitable protein of known structure that shares nearly the same fold as the query protein, which lacks an experimentally determined 3D structure. Template selection is a critical step in comparative protein modeling. Templates can be chosen with various tools such as BLAST, FASTA and SWISS-MODEL 16. With BLAST and FASTA, the protein sequence is uploaded in FASTA format and templates are selected manually by considering the score and the E-value 17. The SWISS-MODEL server, in contrast, chooses the template automatically and models the protein structure. In this study, we chose the BLAST tool for identifying a template. A high level of sequence identity should guarantee a more accurate alignment between the target sequence and the template structure 18. The sequence of CpIMPDH was retrieved from UniProt and submitted to BLAST against the PDB to obtain a prospective 3D structure 19. Hits that fulfilled the criteria of query coverage > 95 %, sequence identity > 80 % and PDB structure resolution < 3.0 Å were selected as templates for homology modeling of IMPDH.
Inosine 5'-monophosphate dehydrogenase (IMPDH) is a key molecular target for target-specific drug design because this protozoan parasite cannot salvage guanosine and therefore relies on IMPDH for the synthesis of guanine nucleotides to survive. Interestingly, C. parvum seems to have acquired the IMPDH gene from proteobacteria through lateral gene transfer, and the IMPDH of C. parvum is functionally and structurally distinct from eukaryotic IMPDH enzymes 20. Therefore, mutational or inhibitory action against IMPDH is considered to be effective and target specific. In silico identification and optimization of therapeutic inhibitors is cost and time effective.
Here, additional effective inhibitor derivatives have been identified and optimized to inactivate the functional domain of IMPDH for treating cryptosporidiosis. Previously, we also reported a new chemical scaffold against the IMPDH of C. parvum using a structure-based pharmacophore approach 21.
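The template-selection thresholds given in this section (query coverage > 95 %, sequence identity > 80 %, resolution < 3.0 Å) can be expressed as a simple filter. The sketch below is illustrative only: the `Hit` record, the tie-breaking rule (highest identity first, then best resolution), and the placeholder entries `TMPL1` to `TMPL3` are assumptions, not data from the study.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    pdb_id: str
    query_coverage: float  # percent of the query covered by the alignment
    identity: float        # percent sequence identity
    resolution: float      # structure resolution in angstroms

def select_templates(hits, min_coverage=95.0, min_identity=80.0, max_resolution=3.0):
    """Return hits that pass all three thresholds, best candidates first."""
    passing = [h for h in hits
               if h.query_coverage > min_coverage
               and h.identity > min_identity
               and h.resolution < max_resolution]
    # Assumed ranking: higher identity first, then finer (smaller) resolution.
    return sorted(passing, key=lambda h: (-h.identity, h.resolution))

# Hypothetical hits with placeholder identifiers and values.
hits = [
    Hit("TMPL1", query_coverage=99.0, identity=97.0, resolution=2.2),
    Hit("TMPL2", query_coverage=90.0, identity=99.0, resolution=1.8),  # coverage too low
    Hit("TMPL3", query_coverage=98.0, identity=85.0, resolution=3.4),  # resolution too coarse
]
templates = select_templates(hits)
print([t.pdb_id for t in templates])
```

All three comparisons are strict, matching the ">" and "<" signs in the stated criteria.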

Active Site Prediction of Target Protein
CASTp 3.0 (Computed Atlas of Surface Topography of Proteins; http://www.sts.bioe.uic.edu/castp) is a free online tool that scans the structural geometry of a macromolecule and reports its cavities together with the physicochemical properties of the surrounding residues 22. The modeled structure of the IMPDH protein was submitted to this tool to locate binding pockets. The most prominent binding sites were selected for further in silico analysis.

Ligand Data Set Preparation for Virtual Screening
Virtual screening is a foundation of the computer-aided drug design paradigm. In this computer-assisted process, large chemical libraries are evaluated virtually by docking molecules into the target protein, and the compounds are prioritized on the basis of their binding affinities 23.
In this study, we chose the PubChem database (https://pubchem.ncbi.nlm.nih.gov) for finding new molecular scaffolds that selectively interact with the target protein. The NCBI-operated PubChem database stores biochemical information on small molecules in the Pccompounds, Pcbioassay, and Pcsubstances subsets 24.
Triazole is a five-membered heterocyclic ring compound with the molecular formula C2H3N3. Depending on the relative positions of the nitrogen atoms in the ring, triazole exists in two isomeric forms, and each isomer has two tautomers. Triazole derivatives interact with biomolecules by forming various noncovalent interactions and are therefore exploited as medicinal drugs 25. These compounds were considered for the study.
The 2D structures of the chemical compounds were drawn in Marvin Sketch 26. The PDB coordinates of 30 triazole derivatives were obtained by converting their respective mol files with Open Babel 27. The ligand dataset comprises 30 triazole derivatives 25, and detailed information regarding the ligands is provided in Table 1.
The PDB coordinates of each ligand were converted into pdbqt format after adding Gasteiger charges, merging non-polar hydrogens, detecting aromatic carbons and setting up a torsion tree.

Molecular Docking Studies
Molecular docking studies provide significant insights into the binding interactions of a ligand with the target protein. AutoDock 4.2 tools were employed for this study 28. The tool was used for rigid docking, in which the ligand and the target interact in a rigid state without alteration of bond angles, bond lengths, or torsion angles 29.
The parasite lacks several key enzymes in comparison with the host; for example, it scavenges or transports building-block biomolecules from the host instead of synthesizing them de novo. The whole-genome sequence was manually curated for any observable anomaly. The CD-HIT suite of programs (http://weizhong-lab.ucsd.edu) was used for redundancy removal and identification of non-paralogous proteins. The whole genome in FASTA format was submitted to CD-HIT (http://weizhong-lab.ucsd.edu/cdhit_suite/cgi-bin/index.cgi?cmd=cd-hit), and the resulting non-homologous proteins were again subjected to BLASTP analysis against the CryptoDB database with an E-value threshold of 10^-10. This resulted in 323 proteins that are described as essential for the propagation of the parasite inside the host.

Metabolic Pathway Analysis in Cryptosporidium parvum
Metabolic pathway analysis shows that the parasite has a highly reduced metabolic machinery. Further analysis also suggests that the parasite has few apicoplast and mitochondrial proteins. Anabolic functions for carbohydrates, amino acids, and nucleic acids are also reduced. The parasite depends predominantly on glycolysis for energy production. This reduced metabolic machinery is consistent with the parasite's unique life cycle.

Target Protein Model Generation
In the results of the BLAST search against the PDB, only one reference protein, 3FFS, showed a high level of sequence identity; its identity with the domain is 99 % 16.
We therefore chose 3FFS (PDB ID) as the reference structure for modeling the Inosine Monophosphate dehydrogenase domain. Coordinates from the reference protein (3FFS) for the structurally conserved regions (SCRs), structurally variable regions (SVRs), N-termini and C-termini were assigned to the target sequence based on the satisfaction of spatial restraints 30.
The sequence of the reference structures was extracted from the respective structure files and aligned with the target sequence using the default parameters in ClustalW.
In the cladogram, the Inosine Monophosphate dehydrogenase and the template are separated by a short distance of 0.33, which indicates that the two are closely related in origin 31.
The 3FFS structure was used as the template for building the 3D model of the Inosine Monophosphate dehydrogenase using MODELLER 9v7 32. In a polypeptide chain, the N-Cα bond (phi angle) and the Cα-C bond (psi angle) remain free to rotate, whereas the C-N peptide bond (omega angle) remains rigid because of its partial double-bond character. Although the phi and psi angles can in principle range from -180° to +180°, steric hindrance allows only a limited set of values. These dihedral angles describe the specific secondary conformation of the protein. The Ramachandran plot helps in determining secondary structure and assists in structure prediction simulations 33.
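To make the phi and psi angles concrete, the following sketch computes a dihedral angle from four points in space, which is the quantity plotted on each axis of a Ramachandran plot. It is a generic geometric illustration, not the PROCHECK implementation; the function name and the toy coordinates are assumptions. For a residue i, phi would be computed from the atoms C(i-1), N(i), Cα(i), C(i) and psi from N(i), Cα(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (range -180 to +180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not taken from any structure) for three reference geometries.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1))
print(cis, trans, quarter)
```

Using `atan2` rather than `acos` keeps the sign of the angle, which is needed to distinguish the left-hand and right-hand regions of the plot.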
The final structure was further checked with the Verify3D graph, and the results are shown in Figure 6. The overall scores indicate an acceptable protein environment.
After the refinement process, validation of the model was carried out using Ramachandran plot calculations computed with the PROCHECK program. ... function should be identical. The template structure was compared with the final refined model of the Inosine Monophosphate dehydrogenase domain using the SPDBV program, and the result is shown in Figure 8.

Active Site Identification
The predicted model was submitted to the CASTp tool for the exploration of major binding pockets. The top three binding pockets in terms of surface area and volume were selected. The details of these potential binding sites are given in Supplementary Table 1 and shown in Figure 9. We observed that THR11, PHE12, GLU13, SER22, LEU25, SER48, ALA49, MET50, ASP163, SER164, ALA165, HIS166, SER169, ASN191, VAL193, LYS210, GLY212, ILE213, VAL214, VAL229, GLN231, ALA234, ASP252, ARG256, TYR257 and ASP260 are binding cavity residues.
AutoDock 4.2 is a freely accessible computational tool for docking small molecules to a target receptor macromolecule. Bound water molecules and heteroatoms were removed from the target protein. Further, polar hydrogens and rotatable bonds were selected. This step was followed by the computation of Gasteiger charges. Target protein preparation was completed by choosing the amino acid residues in the active site based on the AutoGrid method. Amino acid residues present in the binding site include Ser164, Ala165, Ser169, His166, Ser22, Val24, Leu25, Pro26, Asn171, Asn191, and Asp252. Binding site residues were selected and three-dimensional grid boxes were created.
AutoDock uses the Lamarckian genetic algorithm to place the ligand into the active site. Docking was executed by considering all stereochemical configurations of the chemical compounds. The top 100 poses of the ligands were considered in order to enhance accuracy and efficacy. Docking scores are given in Table 2 in ascending order of binding free energy.
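Ranking ligands in ascending order of binding free energy, as done for Table 2, can be sketched as follows. The snippet also converts each energy to an estimated inhibition constant through the standard thermodynamic relation ΔG = RT ln(Ki). The ligand names and energies are hypothetical placeholders, and the temperature of 298.15 K is an assumption.

```python
import math

R_KCAL = 1.9872e-3  # gas constant in kcal/(mol*K)

def estimated_ki(delta_g, temperature=298.15):
    """Estimated inhibition constant in mol/L from a binding free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temperature))

def rank_by_energy(scores):
    """Sort (ligand, energy) pairs so the most negative energy comes first."""
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical docking scores in kcal/mol; not values from Table 2.
scores = {"ligand_a": -7.4, "ligand_b": -10.2, "ligand_c": -9.8}

for name, delta_g in rank_by_energy(scores):
    print(f"{name}: dG = {delta_g:.2f} kcal/mol, Ki ~ {estimated_ki(delta_g):.2e} M")
```

A more negative binding free energy corresponds to a smaller estimated Ki, that is, to tighter predicted binding.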

Discussion
IMPDH is an attractive target for the treatment of C. parvum infections. In the present study, we analyzed the efficacy of 30 triazole derivatives against C. parvum using in silico methodologies. Molecular docking studies were carried out to assess the binding of the 30 triazole compounds to the CpIMPDH protein. Amino acid residues present in the binding site include Ser164, Ala165, Ser169, His166, Ser22, Val24, Leu25, Pro26, Asn171, Asn191, and Asp252. The docking results revealed significant interactions of the triazole derivatives with IMPDH. Among the 30 compounds, 4-(1-(4-chloronaphthalen-1-yloxy)ethyl)-1-(4-chlorophenyl)-1H-1,2,3-triazole (compound ID: S8000011) showed the most favorable binding free energy, -12.19 kcal/mol. This result is consistent with the interactions made by previously reported inhibitors with CpIMPDH. From the docking results, it is evident that compounds possessing both electron-releasing and electron-withdrawing groups exhibited good binding free energy values. The presence of electron-withdrawing groups enhances H-bonding potential. However, when electron-withdrawing groups dominate the chemical structure, electrons from these groups are used to stabilize the compound through resonance and hence cannot be used to make significant interactions with the target protein. This is evident in ethyl α-bromocyclopropaneacetate (compound ID: S8000011), which exhibited the least favorable binding free energy.