Functional and Structural Analysis of Predicted Proteins Obtained from Homo sapiens' Minisatellite 33.15-Tagged Transcript pAKT-45 Variants

Spermatozoa are transcriptionally dormant cells that are recognized as an archive of mRNA coding for a variety of functionally crucial cellular proteins. This repository of mRNA is predicted to play a role after fertilization and during early embryogenesis. mRNA transcripts tagged with minisatellites have been implicated in the regulation of gene function and organization. However, very little information is available on the expression of minisatellite-tagged transcripts in spermatozoa. Therefore, to understand the functions and conformational behavior of the proteins expressed from these minisatellite-tagged transcripts, we performed a detailed in silico analysis using the transcript sequences. The protein predicted from KF274549 showed functionalities similar to uncharacterized C4orf26 proteins, while that obtained from KF274557 was predicted to be a metallophosphoesterase. Furthermore, the structural folds of these predicted proteins were analyzed by homology modeling, and their conformational behavior in explicit water was analyzed by Molecular Dynamics (MD) simulations. This detailed analysis will facilitate the understanding of these proteins in spermatozoa and can be used for uncovering other attributes of the metabolic network.


Introduction
Ejaculated spermatozoa are terminally differentiated cells in which nuclear-encoded mRNAs are neither transcribed nor translated. Consequently, spermatozoa were long thought to carry only the paternal genome to the ooplasm. The discovery of a variety of soluble signaling molecules, transcription factors, and other molecules carried by the spermatozoa into the zygotic cytoplasm during fertilization has overturned this assessment (Saunders, Larman et al. [1][2][3]). Although spermatozoa generally remain in a transcriptionally dormant state, the mRNA transcripts they contain encode a variety of transcription factors and other proteins that may be involved in cell proliferation, signal transduction, regulation of sperm motility, the acrosome reaction, chromatin condensation, and capacitation [1,2,4-6]. Furthermore, the release of spermatozoal transcripts into the ooplasm is predicted to have a significant role during fertilization and the stages that follow. Spermatozoa generally contain around 3000-5000 mRNA transcripts, which may be involved in the expression and regulation of various constituent molecules [7][8][9].
Repeats in DNA sequences are dynamic elements of the genome and form the major portion of the satellite regions and transposable components [10,11]. These repeats are usually found in the noncoding fragments of genomes, whereas a minor portion is preserved inside the transcriptome [12][13][14] and is involved in gene regulation during gene silencing, transcription, and translation [9,15,16]. The mechanisms governing the expression and organization of these conserved repeats in mammalian transcriptomes, particularly in spermatozoa, remain undiscovered. Therefore, an in silico methodology was used to understand the functions and conformational behavior of proteins expressed from such satellite DNA sections. Different functionalities were observed for the two proteins predicted from the Homo sapiens minisatellite 33.15-tagged transcript sequences KF274549 and KF274557. The KF274549 protein showed similarities to the family of C4orf26 proteins, which are a group of hypothetical proteins. Hypothetical or uncharacterized proteins are predicted polypeptides for which no experimental evidence exists at the biochemical level [17][18][19][20][21]. Information about the functionalities of such proteins can be helpful in understanding the hidden mechanisms behind the pathogenesis of a variety of microbial organisms [17][18][19][20]. Furthermore, the protein obtained from KF274557 showed the presence of metallophosphoesterase activities. These analyses can be useful in understanding the expression behavior of the satellite regions and how they modulate the functionalities of other biomolecules present in the metabolic networks.

Materials and Methods
An in silico methodology was used to analyze the proteins predicted from KF274549 and KF274557. The primary stages involved establishing the phylogenetic relationships among the close homologs of the predicted proteins. The sequences of these proteins were then used as inputs to predict the conserved motifs and domains, along with the functions they may perform in the metabolic network. We also searched the biological databases for possible interaction partners that may modulate their functionalities. Furthermore, the three-dimensional (3-D) structures of these predicted proteins were modeled by using the X-ray crystal structures present in publicly available databases. We also analyzed the conformational changes of these proteins in an explicit water environment by using MD simulations. The phases of the adopted methodology are explained here in detail.

Protein Translation.
The protein sequences for KF274549 and KF274557 were predicted by using the "Translate Tool" of the Expasy server on the basis of the standard genetic code and validated by using the "Translation" module of Discovery Studio 4.0 [22]. The Expasy Translate tool predicted six different protein sequences for each query DNA sequence. BLASTx [23] was used to select the most suitable translated protein sequence for further study. For KF274549, the output based on 3′-5′ Frame 2 was selected, while for KF274557, the 5′-3′ Frame 3 was selected.
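The six-frame translation step described above can be illustrated with a minimal Python sketch. It is not the Expasy or Discovery Studio implementation; the function names and the frame labels are assumptions made for illustration, and only the standard genetic code named in the text is used.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate one reading frame; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    residues = []
    for i in range(offset, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


def six_frame_translation(dna):
    """Translate three forward and three reverse-complement frames.

    The frame labels are hypothetical and only mimic the naming in the text.
    """
    rc = reverse_complement(dna)
    frames = {}
    for k in range(3):
        frames[f"5'3' Frame {k + 1}"] = translate(dna, k)
        frames[f"3'5' Frame {k + 1}"] = translate(rc, k)
    return frames
```

Each of the six outputs would then be compared against a protein database (BLASTx in the study) to decide which frame to carry forward.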

Sequence Analyses.
A diverse range of bioinformatics tools was utilized for the functional analyses of the obtained proteins. To obtain the close homologs of the respective proteins, sequence similarity tools such as BLASTp [23], HMMER [24], and HHpred [25] were utilized. For each sequence search, proteins with low sequence identity (<20%) and low query coverage (<50%) were excluded. Homologs with high sequence identities (>40%) were considered close homologs. The sequences of the close homologs were then compared by using multiple sequence alignment methods such as PRALINE [26]. The ClustalW method [27] was also utilized for interpreting the multiple sequence alignments. On the basis of these alignments, the phylogenetic relationships were established by using the PHYLIP software package (http://evolution.genetics.washington.edu/phylip.html). The functional domains were annotated by using a variety of curated databases such as Pfam [28], SUPERFAMILY [29], PANTHER [30], SVMProt [31], CDART [32], SMART [33], InterPro [34], and ProtoNet [35]. Similarly, the conserved motifs in the sequences of the KF274549 and KF274557 proteins were identified by using the MEME suite [36]. Moreover, the interaction partners in the metabolic network were predicted by using the STRING [37] database.
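The identity and coverage cutoffs stated above can be expressed as a small filter. The sketch below is illustrative only: the `Hit` record and `classify_hits` function are hypothetical names, and it assumes that a hit is discarded when either cutoff is failed, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search hit (hypothetical record, values in percent)."""
    accession: str
    identity: float
    coverage: float


def classify_hits(hits, min_identity=20.0, min_coverage=50.0,
                  close_identity=40.0):
    """Split hits into retained homologs and close homologs.

    Assumption: a hit is dropped if identity < 20% OR coverage < 50%.
    Hits with identity > 40% are additionally flagged as close homologs.
    """
    retained, close = [], []
    for hit in hits:
        if hit.identity < min_identity or hit.coverage < min_coverage:
            continue
        retained.append(hit)
        if hit.identity > close_identity:
            close.append(hit)
    return retained, close
```

The same thresholds would be applied to the parsed output of each search tool before the alignment step.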

Structural Analyses.
The protein functionalities were further explored by analyzing their structural elements. The structures of the proteins were predicted by satisfaction of spatial restraints using the MODELLER [38] module of Discovery Studio (DS). The accuracy of the predicted models was evaluated using the Ramachandran plot. The topology of the generated models was analyzed by using PDBsum [39], and the structural homologs were searched by using the Dali server [40]. Furthermore, the conformational behavior of the predicted proteins was studied by performing Molecular Dynamics (MD) simulations using the GROMACS 4.6.5 [41] software package. The proteins were solvated by using the SPC/E water model [42], and energy minimization was performed with the steepest descent algorithm using a convergence criterion of 0.005 kcal mol⁻¹.
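To make the minimization step concrete, the following is a toy Python sketch of steepest descent with an adaptive step size on a simple harmonic potential. It is not GROMACS code: the potential, the starting point, the initial step, and the step-scaling factors (1.2 on acceptance, 0.2 on rejection) are assumptions chosen for illustration, and the tolerance of 0.005 is used here as a unitless force threshold.

```python
def energy(pos, k=1.0):
    """Harmonic toy potential E = 0.5 * k * |r|^2 (illustrative only)."""
    return 0.5 * k * sum(x * x for x in pos)


def force(pos, k=1.0):
    """Force is the negative gradient of the toy potential."""
    return [-k * x for x in pos]


def steepest_descent(pos, tol=0.005, step=0.01, max_steps=1000):
    """Minimize the toy potential by moving along the force direction.

    A trial move is accepted only if it lowers the energy; the step grows
    after an accepted move and shrinks after a rejected one. Iteration
    stops when the largest force component falls below `tol`.
    """
    pos = list(pos)
    e = energy(pos)
    for n in range(max_steps):
        f = force(pos)
        fmax = max(abs(c) for c in f)
        if fmax < tol:
            return pos, e, n, True
        trial = [x + step * c / fmax for x, c in zip(pos, f)]
        e_trial = energy(trial)
        if e_trial < e:
            pos, e = trial, e_trial
            step *= 1.2  # accepted: try a larger step next time
        else:
            step *= 0.2  # rejected: retry with a smaller step
    return pos, e, max_steps, False
```

In a real minimization the energy and forces come from the force field, and the run ends either at convergence or at the step limit.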

Results and Discussion
Minisatellites are known to be involved in the regulation of gene expression, in unstable regions of the chromosomes, and in genome imprinting [7][8][9]. Yet, the expression profiles and biological implications of their association with coding regions remain largely unresolved. In this study, we performed in silico analyses to understand the characteristics of the translated proteins obtained from KF274549 and KF274557. The rationale for choosing these transcripts is explained in previously published work [7][8][9]. The detailed analysis of each protein is discussed here separately.

KF274549 (T1).
T1 showed high sequence similarity to uncharacterized C4orf26 proteins (Figure S1), with the highest similarity to the unnamed protein GI:18676786 (Figure 1), whereas the outputs of Pfam and InterPro showed that T1 belongs to DUF4721, a protein family with a domain of unknown function. Similarly, PANTHER classifies T1 into the family of uncharacterized proteins PTHR40376. The SMART server indicates that T1 may belong to the superfamily of ubiquitin-like proteins. Proteins belonging to the ubiquitin-proteasome system have an established role in protein degradation in the human cell [43]. During spermatogenesis, the ubiquitination enzymes play a major role in the formation of normal sperm by replacing histone with protamine [43]. More recently, histone ubiquitin ligases were found to play crucial roles in several stages of spermatogenesis, such as the DNA damage response, meiotic sex chromosome inactivation, and spermiogenesis [43]. Furthermore, ProtoNet classifies T1 into cluster 3965083, which is a collection of mammalian secreted proteins. The STRING database predicted that T1 interacts with protein member H, matrix metallopeptidase 20, enamelin, amelogenin, WD repeat domain 72, amelogenin, solute carrier family 24 (sodium/potassium/calcium exchanger), and ameloblastin (Figure 2(a)). In addition, we observed three motifs in the sequence of KF274549 by using the MEME suite, namely, 47′-CNHRFPFQ, 119′-IKYPKHHLGRW, and 82′-SEGRET. The structure of T1 was predicted by using the crystal structure of the hypothetical protein ORF126 (PDB ID-2X5R), which is involved in the molecular process of metal ion binding. The predicted structure contains one alpha helix and six beta strands as secondary structural elements (Figure 3(a)).
The stereochemical analysis based on the Ramachandran plot showed that 97.7% of the residues were present in the allowed region of the plot, which is indicative of the reliability of the predicted model. Similarly, PDBsum identified 1 sheet, 3 beta hairpins, 1 psi loop, 3 beta bulges, 6 strands, 1 helix, 16 beta turns, and 5 gamma turns in the topological description of the modeled structure (Figures 3(b) and 3(c)). The Dali server identified structural homologs with the functionality of carbamoyl-phosphate synthetase large chain. Therefore, T1 may be involved in pyrimidine biosynthesis in mammalian spermatozoa [44]. The topology of the predicted model was generated by using the OPLS all-atom force field. The model was then solvated by using the SPC/E water model and minimized for 1000 steps of steepest descent. NVT and NPT equilibrations were each carried out for 100 ps. Finally, the protein was simulated for 200 ns. The trajectories of the simulations were analyzed by using GROMACS utilities. The RMSD values, which provide an assessment of system stability, showed a continuous increase until 25 ns and became stable after 40 ns, with fluctuations between 0.6 nm and 0.8 nm (Figure 4(a)). Similarly, the radius of gyration (Rg), which is a measure of the compactness of the protein structure, was observed to be around 1.6 nm (Figure 4(b)). Furthermore, the constituent residues showed relatively higher fluctuations around residues 75-100, which correspond to the alpha-helix region of the predicted structure (Figure 4(c)). The free energy landscapes were projected and showed relatively unstable conformational behavior in T1 (Figure 4(d)).
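The two trajectory measures discussed above, RMSD and radius of gyration, follow from simple formulas. The Python sketch below is a minimal illustration and not the GROMACS utilities used in the study; the function names are hypothetical, coordinates are assumed to be in nm, and the RMSD function assumes the two frames have already been superimposed.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of 3-D coordinates.

    Rg = sqrt( sum_i m_i |r_i - r_cm|^2 / sum_i m_i ).
    If no masses are given, all atoms are weighted equally.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    center = [
        sum(m * p[d] for m, p in zip(masses, coords)) / total
        for d in range(3)
    ]
    weighted = sum(
        m * sum((p[d] - center[d]) ** 2 for d in range(3))
        for m, p in zip(masses, coords)
    )
    return math.sqrt(weighted / total)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized frames.

    Assumption: the frames are already fitted onto each other, so no
    superposition is performed here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("frames must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for pa, pb in zip(coords_a, coords_b)
        for a, b in zip(pa, pb)
    )
    return math.sqrt(squared / len(coords_a))
```

Evaluating these functions for every frame against the starting structure gives time series comparable in kind to the curves reported in Figures 4(a) and 4(b).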

KF274557 (T2).
T2 showed high sequence similarity to metallophosphoesterases (Figure S2) as well as similarities to a diverse set of proteins (Figure 5). The outputs generated by CDART, SMART, and ProtoNet also supported the presence of metallophosphatase activities in T2. Mammalian phosphodiesterases in spermatozoa were found to be significant for capacitation and fertilization [45]. This category of proteins is also involved in modulating the cellular levels of cyclic nucleotides by catalyzing their degradation [46]. The cyclic nucleotides are involved in the regulation of sperm motility as well as the acrosome reaction. The predicted interaction partners of T2 are post-GPI attachment to protein 3, guanine nucleotide-binding protein (G protein), zinc finger protein 426, ubiquitin-specific peptidase 10, ATPase-type 13A3, ubiquitin C, and post-GPI attachment to protein 1 (Figure 2(b)). Moreover, the MEME suite identified three motifs, namely, 18′-MNSDFGEQ, 119′-IYYLAIFQCNW, and 137′-SDGEQT.
Figure 5: The outcomes of molecular phylogenetic analyses for T2, which showed higher homology towards metallophosphoesterases.
The three-dimensional (3-D) structure of T2 was predicted by using the restriction endonuclease MspI (PDB ID-1SA3). The predicted structure of T2 contains the characteristic six alpha helices and four beta strands (Figure 6(a)). The modeled structure showed 98.3% of the constituent residues in the allowed region of the Ramachandran plot. In addition, PDBsum identified 1 sheet, 1 beta alpha beta unit, 2 beta hairpins, 2 beta bulges, 4 strands, 7 helices, 5 helix-helix interactions, 19 beta turns, and 2 gamma turns in the predicted structure of T2 (Figures 6(b) and 6(c)). The structural homologs identified by the Dali server showed functionalities of type 2 restriction endonucleases and can be considered significant for the sperm acrosome reaction [47]. Furthermore, the solvated model of T2 was minimized for 1200 steps of the steepest descent algorithm. After equilibration, the final MD simulations were executed for 200 ns. The dynamical analyses showed that the RMSD values were continuously perturbed up to 40 ns and then remained constant at 0.7 nm (Figure 4(a)). Moreover, the Rg values showed continuous fluctuations around 1.75-1.85 nm (Figure 4(b)), while the constituent residues showed higher RMSF values in the region of residues 25-75, corresponding to the beta-sheet of the predicted structure (Figure 4(c)). Furthermore, the free energy landscapes showed a comparatively stable structural conformation of T2 (Figure 4(e)).
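A free energy landscape of the kind referred to above is commonly obtained by histogramming sampled conformations along chosen coordinates and applying the Boltzmann relation G = -k_B T ln(P/P_max). The Python sketch below illustrates that relation only. The choice of coordinates (for example RMSD and Rg), the bin width, and the temperature of 300 K are assumptions, since the text does not specify them, and the function name is hypothetical.

```python
import math
from collections import Counter

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def free_energy_landscape(samples, bin_width, temperature=300.0):
    """Estimate relative free energies from sampled coordinate tuples.

    Each sample is a tuple of collective-variable values (assumed here to
    be something like (RMSD, Rg) in nm). Samples are binned on a regular
    grid, and each populated bin receives
    G = -k_B * T * ln(count / max_count),
    so the most populated bin has G = 0 kJ/mol.
    """
    counts = Counter(
        tuple(math.floor(value / bin_width) for value in sample)
        for sample in samples
    )
    max_count = max(counts.values())
    return {
        bin_index: -K_B * temperature * math.log(count / max_count)
        for bin_index, count in counts.items()
    }
```

A landscape with a single deep basin indicates a stable conformation, while several shallow basins of similar depth indicate that the structure moves between states.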

Conclusions
The current study identified the possible functions and conformational behavior associated with the proteins translated from Homo sapiens' minisatellite 33.15-tagged transcript pAKT-45 variants. T1 did not show any functional identity at the primary structure level, as high similarities were observed only with hypothetical proteins. Therefore, structure-based methods were utilized, which predicted T1 homology to carbamoyl-phosphate synthetase large chain, while the sequence- and structure-based function of T2 was predicted with high confidence. Furthermore, T2 showed a higher stability profile than T1 during the course of the 200 ns MD simulations. This is attributed to the difference in the structural compactness of the two proteins. This study provides a better understanding of the behavior exhibited by the minisatellite regions at the protein level and their possible functionalities in the metabolic networks.

Data Availability
The data used to support the findings of this study are included in the article.

Conflicts of Interest
The authors declare no conflict of interest.