Identi cation and Computational Analysis of 3D- Structure Of MOCS1


 Cyclic pyranopterin monophosphate (cPMP) is one of the most stable intermediates in Molybdenum cofactor (MoCo) biosynthetic pathway. In humans, synthesis of cPMP from Guanosine triphosphate (GTP) requires functional genes i.e. Molybdenum Cofactor Synthesis-1 (MOCS1) genes that contains for two catalytic proteins MOCS1A and MOCS1B. Importance of MOCS1A and MOCS1B for biosynthesis of MoCo reveals from the fact that its deficiency leads to MoCo type A deficiency. As there is no structure available for MOCS1 genes in the literature, tertiary structure of MOCS1 genes were investigated in this research via threading or folds recognition method by i-TASSER and validation was done using ERRAT, Verify3D and Ramachandran plots. Binding sites were predicted and validated. Docking of MOCS1A with GTP and MOCS1B with 3, 8 dihydroguanosine was done using Autodock via PyRx. Apart from this, highly confident mutations were also predicted using SIFT and polyphen2 that can alter the structure and function of MOCS1 gene.


Introduction
Molybdenum cofactor biosynthesis protein 1 (encoded by MOCS1 gene) is a human protein involved in the formation of molybdenum cofactor (MOCO). It is also present in animals, fungi and some cellular slime molds [1]. Cytogenetic location of this gene is 6p21.2, which means that it is present on short arm p of chromosome 6 at position 21.2. Molecular Location is from base pairs 39,904,170 to 39,934,462 on chromosome 6 [2]. The most interesting feature of the MOCS1 gene (MOCS1A and MOCS1B proteins) is that it is responsible in the formation of rst intermediate, "precursor Z (cyclic pyranopterin monophosphate (cPMP)" in MOCO biosynthesis pathway ( Fig. 1) [3,4].
First step is the formation of rst intermediate cyclic pyranopterin monophosphate (cPMP) from GTP, which is converted into Molybdopterin (MPT). MPT is adenylated followed by insertion of molybdate into adenylated MPT and release of AMP. Final step involves conversion of MPT-AMP into mature Moco [4].
Molybdenum cofactor (MOCO) containing enzymes are very important as they help in the catalysis of many redox reactions involved in various cycles like carbon, sulfur and nitrogen. Most crucial molybdenum containing enzyme is sul te oxidase (SO) as it helps in degradation of sulfur containing amino acids [5]. Nitrate reductase (NR) is similar to sulphite oxidase but is abundant in autotrophic organisms. Xanthine oxidoreductase are involved in purine catabolism and leads to apoptosis. Moco depicts an essential role in sul te oxidase, aldehyde oxidases, xanthine dehydrogenase, and hypotaurine dehydrogenase, its de ciency can cause multiple diseases related to kidney functions [6]. The molybdenum cofactor de ciency may occur due to molybdenum de ciency or disturbed molybdopterin cofactor synthesis.
The rst model of MOCO biosynthesis was given by Rajagopalan and co-worker [7]. In eukaryotes like fungi, plants, and humans, six agents have been identi ed that catalyzes the reactions involved in synthesis of Moco. The naming of these agents in plants is based on cnx nomenclature i.e. cofactor for nitrate reductase and xanthine dehydrogenase, however, in humans they are named as MOCS (molybdenum cofactor synthesis) [8]. MOCO biosynthesis pathway is divided into four major steps, as shown in Fig. 1. As one can observe, the rst step is the conversion of GTP into the rst intermediate cyclic pyranopterin monophosphate (cPMP), which is aided by MOCS1A and MOCS1B [9].
MOCS1A is responsible for the cyclization of GTP to (8S)-3',8-cyclo-7,8-dihydroguanosine 5'-triphosphate while MOCS1B is responsible for the subsequent conversion of (8S)-3',8-cyclo-7,8-dihydroguanosine 5'triphosphate to cPMP [9] (Fig. 3). In E. Coli, MoaA [10] and MoaC [11] (homologous proteins of MOCS1A and MOCS1B) are responsible for the generation of cPMP by complicated reactions. MoaA and MoaC proteins puri ed from S. Aureus for the synthesis of cPMP were most recently studied by Hanzelman and Schindelin using an in vitro system [12]. MoaA is involved in the rst step but it was the last protein to be characterized structurally after reported crystal structure of S. Aureus MoaA by Hanzelman and Schindelin in 2004.
The importance of MOCS1A and its homologous proteins like MoaA and MoaC are described by their structures where two highly conserved cysteine clusters (one in the N-terminal and one in the C-terminal) are proposed to be involved in binding of FeS cluster [13]. Several mutations were identi ed in molybdenum cofactor de ciency patients these mutations were positioned in the conserved cysteine motifs. This indicates the functional importance for MOCS1/ homologous proteins as their mutations lead to Molybdenum Cofactor de ciency.
Molybdenum cofactor de ciency (MoCD) is a rare human disease responsible for many neurological disorders including brain atrophy which leads to delay in brain development [14,15]. MOCO de ciency leads to a loss in activity of Mo-dependent enzymes. There are three types of MoCD, i.e. type A, B, and C caused by the lack/ dysfunctioning of proteins MOCS1, MOCS2, and Gephyrin respectively [17]. Until now, no exact mechanism or reason has been identi ed that causes the dysregulation in the function of these enzymes. But many studies have suggested that mutation in the sequence and structure of gene may be responsible for this abnormality in MOCO pathway. To determine the cause, one may need to check the structural composition of the protein involved.
To best of our knowledge, in the protein database PDB, the crystallographic 3D structure of MOCS1A/B is not yet reported. Therefore, computational analysis of MOCS1 structure can be of advantage in the prediction of its probable structure. Therefore, we performed computational modeling and simulation in the hope to understand the properties and structure of MOCS1. Later, GTP ligand was docked to MOCS1A after determining binding sites and computational analysis were performed.

3D Structure Prediction
In this study, sequence retrieval was done from Uniprot database for protein MOCS1 with Uniprot ID Q9NZB8. For MOCS1A identi er: Q9NZB8-5 and for MOCS1B identi er: Q9NZB8-1 was retrieved [18]. The MOCS1A/B FASTA sequence functioned as a query sequence against a non-reductant protein database (nr) in BLAST for homology search [19]. SOPMA i.e. Self-optimized prediction method with alignment was used for secondary structure prediction of MOCS1A and MOCS1B proteins [20]. It is a self-optimized prediction method that improves the success rate in the prediction of the secondary structure of protein.
The number of conformational states: 4(helix, Sheet, Turn, and Coil), similarity threshold: 8, and window width 17 were set as parameters for SOPMA server [21]. 3D structure prediction was performed using i-TASSER (https://zhanglab.ccmb.med.umich.edu/I-TASSER/) [22], i.e. iterative Threading ASSEmbly Re nement, a platform for protein function and structure prediction from its amino acid sequence by threading [23]. The accuracy of predicted structure is determined by con dence score or C-score, ranges from 0 to 1. The higher value of the C-score, the more reliable is the prediction [24]. The 3D structure evaluation in this study was performed by online quality evaluation tools via Structural analysis and veri cation server (SAVES) (https://servicesn.mbi.ucla.edu/SAVES/). There are about ve servers in SAVES i.e. Verify 3D [24], ERRAT [25], PROCHECK [26], WHATCHECK, and IFCHECK [27]. In our study, we used ERRAT and Verify3D for structure validation. RAMPAGE and PROCHECK servers are used for Ramachandran plot and statistics analysis [26].

Molecular Dynamics
For protein structure having bad scores in model evaluation, 500,000 steps MD simulations were carried out to relax the bad contacts of proteins. By using GROMACS version 5.1 software and OPLS-AA force eld, molecular dynamics simulations were performed to re ne models [28]. PyMOL and visual molecular dynamics (VMD) applications were used to visualize molecule structure Post-MD simulations [29]. Grace application in Linux was used to display graphs. Gromacs does not work alone, so all of the above three applications were used for MD simulation analysis [30]. Moreover, MD simulation helps in xing the bad contacts at the active site of the protein. It also improves bad ERRAT or Verify3D scores [31]. Structure comparison of MOCS1 before and Post-MD simulation was done using Chimera [32]. Ligand binding site for MOCS1 in this study was predicted using DoGSiteScorer [33] and RaptorX [34]. Two software were used for analysis so that we can gain more con dence in the result. The best scores among these were used for further analysis.

Molecular docking
Molecular docking of ligands into their respective proteins was performed using PyRx [35] which is a Virtual Screening software for Computational Drug Discovery. Autodock, which is a part of PyRx, was used as docking software [36]. While docking runs, the target 3D structure was xed. To nd the best binding modes, the ligand is rotated and moved [35]. 3. Results And Discussion cPMP formation, the rst step in molybdenum cofactor biosynthesis, needs two proteins i.e. MOCS1A and MOCS1B [37]. The structure of MOCS1, to best of our knowledge, is not reported in PDB till now. So structure of MOCS1A and MOCS1B was required to be predicted to understand the formation of cPMP. Therefore, to understand the structure and function of MOCS1, structure prediction protocol was followed where computational modeling and simulation techniques were applied to predict the structures of MOCS1A and MOCS1B. Later, binding site of MOCS1A and MOCS1B were also determined. 3

Tertiary Structure Prediction
Tertiary structure prediction was carried out using i-TASSER, SWISS-MODEL, and RaptorX. Out of ve best models for MOCS1A and MOCS1B generated by. i-TASSER simulation, model 1 ( Figure S1 of Supplementary Material le) with − 1.49 con dence score ( Table 2) was selected for MOCS1A, and model 3 ( Figure S2 of Supplementary Material le) with C-score − 2.07 was selected for MOCS1B for further study. Swiss model generated two models for MOCS1A and three models for MOCS1B ( Figure S3 and S4 based on template from PDB, query sequence being partitioned into units), domain 1 has P-value of 7.57e-08 and uGDT 211 while domain 2 has P-value 6.18e-09 and uGDT 392 ( Table 2).

Model evaluation and validation
To determine the quality of predicted structures, the overall structure's evaluation was carried out by using Ramachandran plot statistics obtained from RAMPAGE for comparing the results generated by i-TASSER, Swiss-Model, and RaptorX. Ramachandran plot statistics for MOCS1A/B protein models obtained by three different methods are presented in Table 3. Analysis of the results (Table 3) shows that i-TASSER generated models are with satisfactory results as compared to Swiss-Model and RaptorX, as they both have predicted 3D structure for MOCS1A and MOCS1B partially. Therefore, 3D models for MOCS1A and MOCS1B generated by i-TASSER were considered for evaluation and validation in SAVES server by ERRAT and Verify3D.
ERRAT is a protein structure veri cation tool that is suitable for evaluating the progress of crystallographic model building and re nement. It works by analyzing the statistics of non-bonded interactions between different atom types and comparing them with databases of high-resolution structures. The region of the model that fall in 90-99% can be rejected. ERRAT gives an overall quality score besides highlighting the error-prone region. Figure 4 shows the overall quality factor of 83.82%.for MOCS1A model by i-TASSER whereas Fig. 5 shows plot for MOCS1B protein structure having 91.06% ERRAT score that is good score showing that model is satisfactory.
The average 3D-1D pro le score was obtained using Verify3D tool. Verify3D helps to investigate the compatibility of an atomic model with its own 1D amino acid sequence. It usually shows the 3D-ID score ranges from − 1 (bad score) to + 1 (good score). The Verify3D plot shows that for the protein MOCS1A, 84.68% of residues have averaged 3D-1D score > = 0.2 (Fig. 6) and for protein MOCS1B the average 3D-1D score is 76.26% (Fig. 7).
The structures of MOCS1A and MOCS1B were validated through RC plot. Based on these factors, it seemed that the models were pretty good. The Ramachandran plot of MOCS1A and MOCS1B model proteins obtained from i-TASSER (Figs. 8 and 9) shows that various residues are falling under allowed, favored, and outer regions.
The Ramachandran plot statistics for both model proteins MOCS1A and MOCS 1B are presented in Table 4 which shows that for MOCS1A about 80% residues are in the allowed region and 3.6% residues (only 12 residues) are in disallowed region which is negligible. For MOCS1B, more than 90% residues are in the allowed region and 1.8% residues (only 10 residues) are in disallowed region. It can also be seen from Ramachandran plots that most data lie in favored region.

Molecular Dynamics Simulation
For removing con icts and bad contacts between the main chain and side chains of modeled MOCS1 proteins, MD simulations were performed using GROMACS5.1 to re ne the models. Analysis of output data was done according to Energy, root-mean-square deviation (RMSD), and Gyration radius. The total energy (sum of the kinetic and potential energy of the molecules) should be constant throughout the simulation process. Kinetic energy is expected to be constant or should follow a declining trend. This is because the increase in the level of kinetic represents confusion of protein structure. While potential energy should follow an increasing trend or should be constant to ensure structure stability.
Distance between the backbones of protein atoms is measured by RMSD. Figures 10 & 11 show plot for RMSD of MOCS1A and MOCS1B respectively. The plot shows that RMSD gets leveled at 0.25nm for MOCS1A and 0.35 nm in the case of MOCS1B, representing protein stability. The structural re nement displays that RMSD plots for predicted models of MOCS1 attens during 1 ns, which indicates that proteins are stable throughout the simulation.
The compactness (folded or unfolded) of a protein structure was measured using radius of gyration plots. Results indicate that the MOCS1 proteins are very stable in its compact form over the course of 1000ps (1ns) at 300K. Radius of gyration plots for MOCS1A and MOCS1B are presented in the supplementary data le ( Figure S7 and S8).
Structure comparison of MOCS1A and MOCS1B Pre and Post-MD simulation was done using chimera. The results showed that predicted structures are constantly stable and it ensured proper folding of the protein. Figures 12 and 13 shows structure comparison of MOCS1A & MOCS1B Pre and Post-MD simulation.
Ramachandran plot analysis was carried out on Post-MD simulation structures using RAMPAGE, showing improvement in statistics of plots (Table 5). For Post-MD MOCS1A, 82.5% residues are in core region and for MOCS1B, 87.2% residues are in the most favoured region.

Ligand binding site prediction
To study the reaction mechanism associated with MOCS1 proteins, the knowledge of the active site was very important. Therefore, after the structure prediction of MOCS1 proteins, MOCS1A and MOCS1B, identi cation of binding pockets and the active site was required. There is no binding pocket identi ed in MOCS1 enzyme for MOCS1A to bind GTP and MOCS1B to bind 3',8-cyclo-7,8-dihydroguanosine 5'triphosphate in literature. Uniprot only suggested the active site residues located around 317-319 for GTP binding but not the exact active site was reported. DoGSite Scorer and RaptorX were utilized to determine the active site and the important residues present at that site for ligand binding.

Conclusion
Computational analysis was done on the MOCS1 proteins, MOCS1A and MOCS1B, involved in the rst step, formation of cPMP from GTP, of biosynthesis of molybdenum cofactor in humans. To study its reaction mechanism, the knowledge of the active site was very important. As there was no structure available in PDB for MOCS1A and MOCS1B proteins, therefore, in the rst step the 3D structure of MOCS1A and MOCS1B were identi ed using i-TASSER and veri ed using Ramachandran plots. After structure prediction the second step was the identi cation of binding pocket and active sites for the ligands involved in the formation of cPMP. This was done using DoGSite Scorer and RaptorX. After successful prediction of binding pockets, molecular docking was performed. This shows the interaction of GTP and 3',8-cyclo-7,8-dihydroguanosine 5'-triphosphate with the amino acid residues of MOCS1A and MOCS1B respectively, which thus led us to better understanding of the reaction involved in the formation of cPMP from GTP in molybdenum cofactor biosynthesis. ERRAT plot for MOCS1B Figure 6 Verify3D score for MOCS1A Figure 9 Ramachandran plot for MOCS1B Figure 10 RMSD plot for MOCS1A Page 18/21   PyMol view of First Binding pocket for GTP obtained from RaptorX