In silico structural and functional assessment of hypothetical protein L345_13461 from Ophiophagus hannah

Ophiophagus hannah (king cobra) is widely distributed across Asia. Many pharmacologically active toxin proteins have been identified in its venom. The aim of the present study was to assess the structure and function of the hypothetical protein L345_13461 of Ophiophagus hannah in order to better understand king cobra venom. Using an in silico approach, the 3D structure was generated by homology modelling, while the functions were profiled with the ProFunc tool. Primary and secondary structure analysis revealed that the hypothetical protein L345_13461 is a stable, cytoplasmic protein containing a large proportion of random coils. Homology modelling was carried out on the SWISS-MODEL server, where a sequence identity of 34.76% was observed with the template PDB ID 5ZZ3. Several quality assessment and validation parameters indicated a stable protein model of good quality. Functional analysis was carried out with InterProScan, ProFunc, DeepGOPlus and KEGG KAAS, suggesting that the hypothetical protein is a cytoplasmic protein that plays important roles in protein binding, metabolism, signalling and cellular processes, and genetic information processing. Finally, we propose that experimental support would help in investigating the structures and functions of other hypothetical proteins from various living organisms.


Introduction
Hypothetical proteins (HPs) are proteins whose existence has been predicted but whose function has not yet been demonstrated in vivo [1]. HPs usually account for a large share of the protein-coding regions in many genomes. Even though their functions are not yet well characterized, they may be important for completing genomic and proteomic information [2, 3]. Proper structural and functional annotation of the HPs of a given genome may lead to the discovery of novel structures and new functions and help to describe additional protein pathways and cascades, thereby completing our patchy knowledge of the mosaic of proteins [4]. Resolving the structural and functional enigmas of these HPs could also improve our understanding of protein-protein interactions and protein-protein networks in various forms of life, including plants and microorganisms [5]. Additionally, new HPs could serve as markers and pharmacological targets for drug development, discovery, and screening [6, 7]. In recent years, many hypothetical proteins have been identified in the genomes of many life forms. However, because of limitations such as the cost and time required for experimental procedures, whole-genome annotation has not yet been completed. Likewise, the large number of hypothetical proteins in a genome makes their study a difficult task. Bioinformatics applies in silico approaches to identify gene loci, predict the transcripts of a particular gene, predict the structure and location of a particular protein inside the cell, and identify the disease(s) associated with the abnormal structure or function of that protein.

Results and Discussion

Physicochemical properties of hypothetical protein L345_13461
We used the ExPASy ProtParam server to compute the theoretical physicochemical properties of hypothetical protein L345_13461 from its amino acid sequence. Most of the parameters computed by this server relate to protein stability, because stability is linked to proper function [32]. The protein consists of 220 amino acids and has a molecular weight of 24506.08 Daltons, while its isoelectric point (pI) is 8.90, indicating a basic protein that carries a net positive charge at neutral pH. The instability index was calculated as 39.27, which is below the threshold of 40 and classifies the protein as stable. The negative GRAVY index of -0.209 indicates that the protein is hydrophilic and soluble in nature. The most abundant amino acid residues were Valine (21), Leucine (21) and Glycine (21), followed by Serine and Lysine (18 each). The least abundant was Histidine (1). The sequence comprises 24 negatively charged residues (Aspartic acid + Glutamic acid) and 28 positively charged residues (Arginine + Lysine). The molecular formula of the protein was computed as C1114H1726N292O319S6, and the total number of atoms in the protein is 3457.
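The reported formula, atom count and molecular weight can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the ProtParam implementation: the atomic masses are standard average values supplied here as an assumption, and the helper names (`parse_formula`, `total_atoms`, `average_mass`, `is_predicted_stable`, `is_hydrophilic`) are hypothetical. The cut-off of 40 for the instability index and the sign rule for GRAVY follow the usual interpretation of these two indices.

```python
import re

# Standard average atomic masses in Da (assumed constants, not taken from the paper).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula):
    """Return {element: count} for a flat formula such as 'C2H6O'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing number after an element symbol means a count of one.
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def total_atoms(counts):
    """Sum the atom counts over all elements."""
    return sum(counts.values())


def average_mass(counts):
    """Average molecular mass in Da computed from the elemental composition."""
    return sum(ATOMIC_MASS[element] * n for element, n in counts.items())


def is_predicted_stable(instability_index, cutoff=40.0):
    """An instability index below the cut-off is read as 'stable'."""
    return instability_index < cutoff


def is_hydrophilic(gravy):
    """A negative GRAVY value indicates an overall hydrophilic protein."""
    return gravy < 0


counts = parse_formula("C1114H1726N292O319S6")
print(total_atoms(counts))                                  # 3457
print(round(average_mass(counts), 2))                       # close to the reported 24506.08 Da
print(is_predicted_stable(39.27), is_hydrophilic(-0.209))   # True True
```

The atom count reproduces the reported 3457, and the mass computed from the formula agrees with the reported molecular weight to within 1 Da; the small residual depends on the atomic mass table used.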

Subcellular localization of hypothetical protein L345_13461
Protein subcellular localization prediction is the computational prediction of where a protein resides inside a cell. Predicting the subcellular localization of uncharacterized proteins can provide evidence about their cellular functions. This evidence can be used to better understand disease mechanisms and to support drug design [33]. The subcellular localization of hypothetical protein L345_13461 was predicted with the SOSUI server, which found the protein to be cytoplasmic; this was validated with the PSORTb v3.2.0 and PredictProtein servers.

Secondary structure of hypothetical protein L345_13461
Initially, the secondary structure of the protein was predicted with the SOPMA server. Random coil was the predominant element (36.82%), followed by extended strand (33.64%) and alpha helix (18.64%), while beta turn accounted for 10.91%. Similar results were obtained with the PredictProtein and PSIPRED servers. The secondary structure of L345_13461 generated by the PSIPRED server [34] is presented in Fig. 1. Cofactory 1.0 predicted one Rossmann fold spanning residues 150 to 192, with cofactor scores of 0.001 for FAD, 0.149 for NAD and 0.022 for NADP. The protein distance constraints matrix [35] was built using DistanceP 1.0 and is shown in Fig. 2. EasyPred 1.0 was used for binding motif prediction. Clustering was first done using the Henikoff & Henikoff 1/nr method, while visualization was performed using the Kullback-Leibler method with the Seq2Logo-2.0 program (Fig. 3).
Homology modelling predicts the 3D structure of a given protein sequence by building a model from its alignment to one or more proteins of known structure [36]. To carry out the homology modelling, the sequence of hypothetical protein L345_13461 was provided as input to the SWISS-MODEL server [37]. The server automatically ran a BLASTP search to identify templates for homology modelling. For each identified template, the quality was estimated from features of the target-template alignment, and the templates with the highest quality were selected for model building. In this search, PDB ID 5ZZ3, an X-ray diffraction structure of a butyrophilin protein with 34.76% sequence identity to the target, was selected as the template; this identity was considered adequate to start modelling. The 3D model was visualized with UCSF Chimera 1.8.1 and is shown in Fig. 4.
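To make the notion of target-template sequence identity concrete, the sketch below computes percent identity from a gapped pairwise alignment. It is an illustration only: the two aligned strings are made-up toy sequences, not the L345_13461 or 5ZZ3 sequences, the function name `percent_identity` is hypothetical, and the convention used here (identical residues divided by the number of columns without gaps) is one common definition that may differ from the one SWISS-MODEL applies internally.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences contribute a residue.
    columns = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical sequences, for illustration only).
target = "MKV-LSGA"
template = "MRVALSG-"
print(round(percent_identity(target, template), 2))  # 83.33
```

In this toy case six columns are free of gaps and five of them match, giving 83.33%. The 34.76% reported for 5ZZ3 is the same kind of ratio, taken over a much longer alignment.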

Quality assessment and validation of hypothetical protein L345_13461
The reliability of the generated model was first validated with ERRAT, which analyses the statistics of non-bonded interactions between different atom types. The overall quality factor was 83.5714, which is good enough to use this model. Evaluation with the Verify3D program showed that 87.88% of the residues of the hypothetical protein had an average 3D (atomic model)-1D (amino acid) score ≥ 0.2, suggesting that the structure is compatible with its sequence (Fig. 5). The Z-score of the hypothetical protein L345_13461 model was determined with ProSA-web. The Z-score is used to estimate model quality, using experimentally resolved protein structures as references [33]. The Z-score of the hypothetical protein was -6.02, indicating that the model is of good quality (Fig. 6). The stereochemical quality of the model of hypothetical protein L345_13461 was examined with Ramachandran plots generated by the PROCHECK server and validated using RAMPAGE. Ramachandran plot analysis placed 93.3% of the residues of the model structure in the favoured region, with 5.5% and 1.2% of residues in the allowed and outlier regions, respectively, demonstrating that the model is reliable and of good quality (Fig. 7). The final protein structure was deposited in PMDB and is available under ID PM0083165.
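The validation figures above can be collected and compared against cut-offs in one place. The sketch below is illustrative only: the reported values come from the text, whereas the cut-offs in `THRESHOLDS` are commonly quoted rules of thumb introduced here as assumptions (they are not stated in this study), and the names `REPORTED`, `THRESHOLDS`, `check_metrics` and `ramachandran_total` are hypothetical. The ProSA Z-score is left out because it is judged against the distribution of native structures of similar size, not against a fixed cut-off.

```python
# Values reported in the text for the L345_13461 model.
REPORTED = {
    "errat_quality_factor": 83.5714,
    "verify3d_percent_ge_0_2": 87.88,
    "ramachandran_favoured_percent": 93.3,
}

# Rule-of-thumb cut-offs (assumptions for illustration, not taken from the paper).
THRESHOLDS = {
    "errat_quality_factor": 50.0,
    "verify3d_percent_ge_0_2": 80.0,
    "ramachandran_favoured_percent": 90.0,
}


def check_metrics(reported, thresholds):
    """Return {metric: True/False} for whether each value meets its cut-off."""
    return {name: reported[name] >= cutoff for name, cutoff in thresholds.items()}


def ramachandran_total(favoured, allowed, outlier):
    """Sanity check: the three Ramachandran percentages should sum to about 100."""
    return favoured + allowed + outlier


results = check_metrics(REPORTED, THRESHOLDS)
for name, passed in results.items():
    print(f"{name}: {REPORTED[name]} (passes assumed cut-off: {passed})")
print(round(ramachandran_total(93.3, 5.5, 1.2), 1))  # 100.0
```

With these assumed cut-offs all three metrics pass, and the Ramachandran percentages are internally consistent because they sum to 100%.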

Conclusion
The present study aimed to produce the 3D structure and suggest possible functions of the hypothetical protein L345_13461 from Ophiophagus hannah. The 3D model of the protein was generated by homology modelling and assessed with several structural validation approaches, and the final model was of good quality. We found that this novel protein is a stable cytoplasmic protein containing a SPRY domain and a concanavalin A-like lectin/glucanase domain. The protein is predicted to be involved in various biological, molecular, metabolic, genetic, and cellular signalling processes. Additionally, this kind of procedure could be helpful for the structural and functional annotation of further hypothetical proteins.