Elucidation of Computational 3D Models of Protein Drug Targets for Colletotrichum falcatum, a Fungal Plant Pathogen Causing Red Rot of Sugarcane

The objective of this study is to develop 3D structures of the potential drug target proteins of the ascomycete plant pathogen Colletotrichum falcatum, which causes red rot, a grave disease of the sugarcane crop. The study uses online databases such as UniProt, DrugBank, PDB and PMDB to retrieve and submit biological information. The online web tools SWISS-MODEL and BLASTp were used to construct homology models and to find sequence similarity, respectively. A total of 72 protein sequences were identified as potential drug targets and were retrieved from UniProt in .fasta file format. Based on the available template structures, a total of 52 proteins were successfully modelled using the SWISS-MODEL web tool. Among these 52 predicted models, 41 were identified as significant based on Ramachandran plot analysis. These 41 models were submitted to the PMDB server for public access. The study has created a dataset of 3D homology models of drug target proteins that can support selective drug discovery and development against Colletotrichum falcatum, and can in turn help the pharmaceutical industry develop agricultural antifungals that favour sugarcane cultivation.


The Red-Rot Disease
Saccharum officinarum (sugarcane) is a monocotyledonous plant species that belongs to the family Gramineae and is cultivated in most of the tropical and subtropical regions of the world 1 . It is the 2nd most valuable agro-industrial crop (cash crop) in India, next to cotton 2, 3 . It occupies around 4.2% (4.36 million hectares) of the total area under cultivation and contributes 7.5% to the gross value of agricultural production of the country 1,2 . Global contributors to sugarcane production include Brazil, India, China, Thailand, the US, and the UK. India is the 2nd largest producer of sugarcane in the world after Brazil and is the largest consumer, with an average consumption of 19 million MT a year 3 . Sugarcane is economically very important because it stores a high concentration of sucrose in its stalk tissues, but several biotic and abiotic factors affect this sucrose yield. Approximately a hundred diseases have been reported globally, with 100 fungi, 10 bacteria, 10 viruses, and 50 nematode species known to cause destructive diseases of sugarcane 3 . The most devastating of these is red rot, caused by the fungus Colletotrichum falcatum.

Colletotrichum falcatum
Colletotrichum falcatum is a facultative saprophyte that belongs to the subdivision Ascomycotina 3 . It causes losses of 18-31% in sugarcane production. According to the GOI study of 2017, various biotic stress factors, including red rot disease, hamper the production of sugarcane in many important parts of the country 3,4 . The pathogen affects the economically valuable stalk by entering through the nodes 5 . Meteorological factors, along with soil pH and waterlogging, play a major but compound role in aiding the infection mechanism 3, 6-9 . The susceptibility of the cultivar, along with the age of the stalk tissues and the time of infestation, determines the symptoms shown by the plant 7 . Infections decrease juice purity and deplete important micro- and macronutrients such as iron, copper, zinc, potassium, and phosphorus, but increase the content of calcium and nitrogen to a large extent 10,11 . Stalk weight is reduced by 29%, and sugar retrieval capacity is reduced by up to 31%. Affected tissues cannot be used commercially because they lead to deterioration of the product 3,11 .

Homology Modeling
In this study, the 3D structures of Colletotrichum falcatum proteins are developed using the homology modelling technique. These structures are needed to design and develop drugs that could stop the spread of this disease. The lack of protein structures has hindered the understanding of the binding specificities of proteins and ligands, which is a prerequisite for drug design and development 12 . Well-established and recognized databases and tools were employed to obtain computational structures of potential drug target proteins, which are needed both to contain the disease and to reduce the monetary burden it places on farmers. Homology modelling was carried out using the pre-existing FASTA sequences of the proteins, and the structure of each protein was predicted from available templates that were similar to the respective query sequence 13,14,15 . Without such structures, poor understanding of protein-ligand binding specificity would continue to limit the availability of drugs against this disease. The structures developed in this study could be further exploited to design and develop new and efficient drugs, which would be a boon to the cultivators of this crop.

NCBI Database
This database was used to screen the amount of pre-existing information available on the chosen organism. A simple search with the organism's name revealed the amount of data available in the different databases (https://www.ncbi.nlm.nih.gov/).

Sequence Retrieval
The UniProt Knowledgebase (www.uniprot.org), a centralised, reliable and publicly accessible collection of non-redundant protein sequences 16 , was used to retrieve the amino acid sequences required for this study. The organism's name was used as the search criterion. The best sequences were sorted by the length of the amino acid chain and downloaded in .fasta file format. Repeated sequences were omitted to prevent ambiguity, and the downloaded sequences were stored along with their respective accession IDs.
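
The de-duplication and length sorting described above can be illustrated with a minimal Python sketch. It is not the authors' procedure, which was carried out through the UniProt web interface; the function names, the sample accession labels, and the longest-first ordering are assumptions made only for illustration.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header without the leading ">", sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def deduplicate_and_sort(records: List[Record]) -> List[Record]:
    """Keep the first record for each distinct sequence, longest first.

    Longest-first ordering is an assumption; the text only says the
    sequences were sorted by length.
    """
    seen = set()
    unique: List[Record] = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((header, seq))
    return sorted(unique, key=lambda rec: len(rec[1]), reverse=True)


# Hypothetical input: ACC_1 and ACC_3 carry the same sequence.
sample = ">ACC_1 example\nMKT\nAA\n>ACC_2 example\nMK\n>ACC_3 example\nMKTAA\n"
for header, seq in deduplicate_and_sort(parse_fasta(sample)):
    print(header, len(seq))
```

Keeping the header alongside each sequence preserves the accession ID, which the later steps rely on.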

Sequence Alignment
The query amino acid sequences retrieved from UniProt were compared using the BLASTp server (www.uniprot.org), an online tool that compares a query sequence with pre-existing protein sequences in the Protein Data Bank in order to obtain the percentage similarity 17,14,18 . A first BLAST search was carried out to obtain similarity values against the sequences of the non-redundant database, followed by a second BLAST search to obtain similarity values against the sequences present in the Protein Data Bank (www.rcsb.org). The percentage similarity values thus obtained were noted for further reference.
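
As a rough illustration of what a percentage value for an alignment means, the sketch below computes percent identity for two sequences that have already been aligned (gaps written as "-"). It does not reproduce the BLAST search or scoring algorithm, and it measures identity rather than a substitution-matrix-based similarity. The function name and the example strings are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Assumes both strings come from one pairwise alignment and therefore
    have equal length; gap columns count toward the alignment length but
    never as matches.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Hypothetical alignment: 4 identical columns out of 6.
print(round(percent_identity("MKT-AL", "MKSGAL"), 1))
```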

Structure Prediction
The 3D structures of the selected drug target proteins were predicted by homology modelling, using the online tool SWISS-MODEL (https://swissmodel.expasy.org/). This tool uses the amino acid sequence of the protein together with templates available in the Protein Data Bank to predict the 3D structure of the protein 19,20 . Sequences retrieved from UniProt were uploaded in .fasta file format and the predicted 3D models were retrieved in .pdb file format. The quality of the models depended on the availability and percentage similarity of the templates.
The models thus developed were analyzed using Ramachandran plots. These are graphical plots that represent protein structures in terms of backbone torsion angles and hence play a significant role in assessing the accuracy of a predicted structure 21 . The best models were downloaded in .pdb format after the analysis was complete.

Model Analysis
Qualitative analysis of the reliability of each predicted model was performed via the Ramachandran plot provided by SWISS-MODEL as a built-in feature. The torsion angles of all residues were expected to fall within the most favoured regions of the Ramachandran plot, which determines the quality of the predicted structure. Residues that fall outside these favoured regions are considered unfavourable predictions, or outliers. These outliers reduce the confidence score of the predicted model.
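
The two axes of a Ramachandran plot are the backbone torsion angles phi and psi of each residue. The sketch below shows how a single torsion angle can be computed from four atom positions; it is a generic geometric calculation, not the code used by SWISS-MODEL, and the function name and coordinates are illustrative assumptions. For residue i, phi is the torsion over C(i-1), N(i), CA(i), C(i), and psi is the torsion over N(i), CA(i), C(i), N(i+1).

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 must be distinct points")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates only: the outer bonds are perpendicular.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Whether a given (phi, psi) pair lies in a most favoured region depends on the region definitions of the validation tool, which are not reproduced here.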

Model Submission
The 3D models predicted with a good-quality Ramachandran plot (confidence score) were then submitted to the public database Protein Model Database [PMDB] (https://bioinformatics.cineca.it/PMDB/), a resource that stores manually built protein models published in the scientific literature 22 . All the models were uploaded in .pdb file format and each entry was given a unique PMDB ID for future reference.

Drug Target Selection
A bibliographical search for Colletotrichum falcatum on the NCBI website showed that there are no known protein structures of the organism, although about 600 protein sequences have been reported. Hence, this organism was ideal for the construction of computational protein models. Preliminary screening of the 600 protein sequences suggested that non-enzymatic protein components such as ribosomal subunits were predominantly reported. These ribosomal and non-enzymatic proteins were therefore excluded from homology model development, leaving a total of 125 protein sequences for further analysis. These 125 protein sequences were then checked against the DrugBank database (www.drugbank.ca) to confirm whether they had previously been reported as possible drug targets. Among them, 72 sequences were identified as potential drug targets based on literature evidence for the mechanism of action of known antibiotics.

Building Homology Model
The selected 72 protein sequences were retrieved from the UniProt database (www.uniprot.org), saved in FASTA file format (.fasta), and then subjected to BLASTp, also known as Protein BLAST (https://blast.ncbi.nlm.nih.gov/), searching within the Protein Data Bank (www.rcsb.org) to identify templates for homology modelling. Protein structures with more than 80% sequence similarity were chosen as ideal templates. Only 12 sequences had a template with a similarity of more than 80%. For the other sequences, more than one template, each with less than 80% sequence similarity, was used for homology modelling. All 72 protein sequences with appropriate template models were submitted for homology model construction using the SWISS-MODEL web server (https://swissmodel.expasy.org/). Homology models were built successfully for 52 sequences.
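
The template rule described above (a single ideal template above 80% similarity, otherwise several lower-scoring templates) can be sketched as follows. This is an illustration under stated assumptions, not the authors' workflow: the function name, the data layout, the target labels and the template IDs are all hypothetical, and the similarity values are invented for the example.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (template ID, percent similarity to the target)


def plan_templates(
    hits: Dict[str, List[Hit]], threshold: float = 80.0
) -> Dict[str, Dict[str, object]]:
    """Decide, per target, between one ideal template and several weaker ones."""
    plan: Dict[str, Dict[str, object]] = {}
    for target, candidates in hits.items():
        ranked = sorted(candidates, key=lambda hit: hit[1], reverse=True)
        if ranked and ranked[0][1] > threshold:
            # The best hit exceeds the threshold: use it alone.
            plan[target] = {"mode": "single", "templates": [ranked[0][0]]}
        elif ranked:
            # No hit exceeds the threshold: keep all candidates.
            plan[target] = {
                "mode": "multiple",
                "templates": [template for template, _ in ranked],
            }
        else:
            plan[target] = {"mode": "none", "templates": []}
    return plan


# Hypothetical BLASTp results for three targets.
example_hits = {
    "target_A": [("tmpl_1", 91.5), ("tmpl_2", 62.0)],
    "target_B": [("tmpl_3", 45.0), ("tmpl_4", 58.3)],
    "target_C": [],
}
for target, decision in plan_templates(example_hits).items():
    print(target, decision["mode"], decision["templates"])
```

A target with no usable template falls into the "none" case, which corresponds to sequences for which no model could be built.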

Ramachandran Plot Validation
SWISS-MODEL generated a maximum of 5 different models for each query sequence. The best of these was selected based on the Ramachandran plot analysis integrated with the server. The Ramachandran plots of all developed models were examined, and the model with the highest percentage of residues within the most favoured regions of the plot was considered the ideal, final model. The percentage of residues within the most favoured regions was also used as the confidence percentage score. The Ramachandran plots of the least preferred and best preferred models, with the lowest and highest confidence scores, are shown in Figure 1(A) and Figure 1(B). The structure analysis of the homology model constructed for the WRKY transcription factor 37 protein is represented graphically in Figure 2.

Submission to PMDB
A total of 52 homology models were developed with the SWISS-MODEL online tool. However, Ramachandran plot analysis showed that only 41 protein models had more than 90% of residues within the most favoured regions. Hence, these 41 models were submitted to the Protein Model Database (https://bioinformatics.cineca.it/PMDB/) for public access and further research. The 41 protein 3D models submitted to PMDB are summarised in Table 1, with their PMDB ID, drug target name, UniProt ID and confidence score.
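
The two-stage selection described in this section (best model per target by percentage of residues in the most favoured regions, then a cutoff of more than 90%) can be sketched as below. The function names, target labels, model IDs and percentages are hypothetical and serve only to illustrate the logic; they are not values from Table 1.

```python
from typing import Dict, Tuple

# target -> {model ID -> percent of residues in most favoured regions}
Candidates = Dict[str, Dict[str, float]]


def pick_best_models(candidates: Candidates) -> Dict[str, Tuple[str, float]]:
    """For each target, keep the model with the highest favoured percentage."""
    best: Dict[str, Tuple[str, float]] = {}
    for target, models in candidates.items():
        if not models:
            continue  # no model could be built for this target
        model_id = max(models, key=models.get)
        best[target] = (model_id, models[model_id])
    return best


def filter_for_submission(
    best: Dict[str, Tuple[str, float]], cutoff: float = 90.0
) -> Dict[str, Tuple[str, float]]:
    """Keep only models with strictly more than `cutoff` percent favoured."""
    return {t: mv for t, mv in best.items() if mv[1] > cutoff}


# Hypothetical scores for three targets.
scores = {
    "target_A": {"model_1": 88.2, "model_2": 93.4, "model_3": 90.1},
    "target_B": {"model_1": 84.0, "model_2": 86.5},
    "target_C": {},
}
best = pick_best_models(scores)
print(filter_for_submission(best))
```

Here the favoured percentage doubles as the confidence score, matching the definition given in the text.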

Conclusion
The objective of this study was to prepare 3D models of proteins of Colletotrichum falcatum, a major pathogen responsible for red rot disease in sugarcane. The protein structures of this ascomycete were successfully developed and analyzed using the SWISS-MODEL workspace. A similar study by Divya et al. (2018) was performed on a less studied organism, the endoparasitic pathogen P. marinus, and other Perkinsus spp. that cause devastating losses in the cultivation of shellfish and mollusc species worldwide; the authors developed and submitted 3D structures of drug target proteins of the organism based on homology modelling 23 .
These structures could be further exploited either to develop new antifungal drugs or to revise existing ones. This is an important aspect of the computer-aided, in silico drug design approach, which has gained significant momentum in recent years 24 . Furthermore, the active sites of the modelled proteins can be identified, aiding docking studies of the interactions between protein and ligand. A study by Daisy et al. (2013) involving an in silico drug design approach for biotin protein ligase of Mycobacterium tuberculosis provides better insight into the significant role played by predicted protein structures in drug design and development 25 .
The structures can also prove useful for understanding the mechanism of infection. They can be used to design a pathway for understanding the effect of inhibiting selected proteins that may act as good antifungal drug targets, which may help in designing an antifungal compound that is more effective as well as non-toxic, unlike the traditional fungicides and insecticides that are proven to be harmful in the long run. This study provides a basic platform for future in silico work on Colletotrichum falcatum-specific drug design and development.

Acknowledgement