Structural and Molecular Dynamics Studies of UDP Glucose Pyrophosphorylase Dimerization in Rice (Oryza sativa L.)

UGPase enzyme is involved in sucrose synthesis as a catalyzing agent in the reaction, Mg +2 -UTP + Glu-1-P PPi + UDP-Glu. It catalyzes both forward and reverse reaction depending on the metabolic status of the tissue. UGPase contains three domains: N-terminal domain, catalytic domain with nucleotide binding loop at center and the C-terminal domain. UGPase as a monomer is the functionally active in most photosynthetic plants and microorganisms. Comparative structural modelling of UGPase was performed using models predicted from Modeller, Swiss-Model and I-TASSER. All predicted models were further assessed, evaluated and refined through QMEAN, WHATIF and SAVES server to achieve most stable model. To study the interaction between monomers units’, homodimers were built through protein-protein docking. Hydrogen bonding between monomer units, solvent accessible area and ligand binding/active sites were predicted with H Bond, NACCESS and CastP server. Solvent accessible area in dimer was approx. 100-200 Å smaller as compared to monomer. This study showed that reduction in solvent accessible area and subsequent smaller active site resulted in loss of activity due to dimerization in UGPase.

The UDP-Glucose is significant for the synthesis of cell wall polysaccharides as well as in production of carbohydrate moiety of glycolipids, glycoproteins and proteoglycans (Bishop et al., 2002). UDP-Glucose also serves as precursor for the development of sucrose and cell wall polymers like callose, hemicellulose, cellulose etc. Moreover, within both non-photosynthetic tissue and developing leaves UGPase acts as catalyst in sucrose degradation pathway. During sucrose degradation, UGPase produces UTP and glucose-1-P from UDP-Glucose and pyrophosphate. This Glucose-1-P is not only a substrate for ADP-Glucose pyrophosphorylase but also a key player of starch formation in heterotrophic tissues such as oilseed rape embryos and barley endosperm (Schwender et al., 2015).
The rice genome contains two homologous UGPase gene, OsUgp1 on chromosome 9 (Abe et al. 2002) and OsUgp2 on chromosome 2 (Mu. 2002). OsUgp1 gene has six isoforms, named as P1 to P6. These isoforms undergoes conserved but dissimilar N-terminal acetylation. The conserved protein modification might be responsible for the distinct roles of these isoforms during plant development and their respective regulations.
UGPase exists as a mixture of monomer, dimer and higher order oligomers (Martz et al., 2002). In rice, monomer is reported as most active and dimer as inactive. Inactivity of oligomers might be because of their post translational modifications. Comparative structural modeling and molecular dynamic simulation studies might help in revealing conformational differences between UGPase monomer and dimers.
Keeping this in view, the present study was conducted to undertake molecular simulation for dimerization of UDP-Glucose pyrophosphorylase and to compare structural analysis of predicted dimers of UDP-Glucose pyrophosphorylase. Models were subsequently simulated by Gromacs-2019. Steepest descent method for energy minimization of models was opted with <1000 KJ/mol energy cutoff. MD simulation with parameters such as temperature (300K), pressure (1.0 bar) and density (1000 kg m -3 ) were stabilized over a period (100ps) and executed for 20 ns (i.e. time=nstep*dt, nstep=1crore, dt=0.002). RMSD of simulated and unsimulated monomer models were graphically plotted.

MATERIALS AND METHODS
Homodimers using predicted models were generated by pyDock and ZDOCK protein-protein docking servers. Afterwards dimers were verified through SAVES and WHATIF server. Generated dimers were then simulated by Gromacs-2019 and analyzed for intra-protein interaction with Dim plot. Active/ligand binding sites in monomers were predicted by Cast P server. Solvent accessible surface area for residues was calculated in both monomer and dimers by using NACCESS tool (Hubbard, 1996).

RESULT AND DISCUSSION
In silico characterization Sequence for rice (Oryza sativa L) UGPase was retrieved from NCBI with accession number ACA50487.1. Additionally 114 complete UGPase protein sequences were retrieved from NCBI. Monomer of rice UGPase was predicted thermo as well as test tube stable through instability and aliphatic index score of 23.59 and 100.96 respectively by Protparam. Rice UGPase was characterized with -0.163 GRAVY, 51682.29 molecular weight and 5.43 theoretical pI. Melting point of rice UGPase was estimated by using Tm predictor to be in 55-65 O C range. GRAVY score, aliphatic index and melting point altogether points towards the globular stucture of UGPase. Similar results were obtained from other UGPase sequences denoting the conservation in their physiochemical properties during the course of evolution.

Phylogenetic Analysis
A Total of 114 protein sequeces from Plants, Fungi and Bacterial UGPase retrieved from NCBI were selected for construction of phylogenetic tree. Maximum parsimony methodolgy from MEGA7 tool was employed for building phylogenetic tree using selected sequences. Tree depicted closer evolutionary relationship within plantae, algae, fungi amd bacterial clades. Poaceae family member were observed to be in same clade highlighted with yellow color and rice UGPase was estimated closer to sugarcane and common millet UGPase (Fig. 1). Comparative modelling 3D models constructed from Modeller9.20, Swiss Model and I-TASSER named Model1, Model2 and Model3 respectively. 5WEG was used as template for all the three models. The secondary structures of predicted models were studied and compared to template as shown in Table 1. Loop region in all three models were found comparable whereas helix region of Model1 and sheet region of Model2 were observed to be consistent with template.
The structural quality of models was assessed using QMEAN and Z score. QMEAN score of Model1, Model2 and Model3 were 0.541, 0.695 and 0.718 respectively. Similarly, Z-score were -1.45, -2.54 and -1.85 for Model1, Model2 and Model3 respectively. The scores of models constructed in present study were found closer to optimal Z score of 0 and within optimal QMEAN range of 0-1. Both QMEAN and Z score of models complied with structural quality of solved Xray crystallography reference structures. The minimum energy calculated for all Mode1 (-22420.309KJ/Mol), Model2 (-23439.170KJ/Mol) and Model3 (-24260.711KJ/Mol) further supported that the models were stable and can be taken for further studies.
All models were then validated by WHATIF server. Coarse Packing quality was computed -1.047 for Model1, -0.548 for Model2 and -0.584 for Model3, illustrating all models are of good quality. Omega average, anomalous bond length and anomalous bond angles were other validation programs used to check structural errors. Trans-peptide omega angle in predicted models was approximately +178° and fulfills Gaussian distribution average for protein structures. The calculated Z-score of anomalous bond angle were 1.011, 0.872 and 0.935; and anomalous bond lengths were 0.665, 0.712 and 0.726 for Model1, Model2 and Model3 respectively. By and large, models were fulfilling protein structural criteria and can be used in subsequent studies.
Models were further analyzed by SAVES (Structural Analysis and Verification Server) using PROCHECK, PROVES, VERIFY3D and ERRAT packages (Table 2). VERIFY3D determined 3D model compatibility with its amino acid sequences and with a >84% score every predicted models/monomers were given a PASS status. Likewise, ERRAT compared predicted models/monomers with refined known structures by calculating non-bonded interactions between atoms and estimated them as Good with >90% score. Ramachandran plots for assessment of backbone conformation were generated from PROCHECK. Plots showed core region between 85.1 to 94.3% and disallowed region between 0 to 0.5% in all the models. PROVES calculated the atomic volume and Z-score for quality estimation in all models. Overall evaluation and verification deemed all models to fit for further simulation studies.

Molecular Dynamic Simulation
Topology file was generated by using force field OPLS-AA/L all-atom force field (Jorgensen L., 1996) with water model SPCE (extended simple point charge model). Models were positioned in cubic box center with 1.0 nm distance from boundary. All models were stabilized with negative potential energy and F max < 1000.0 ( Fig. 3(a)). Model1 having lowest energy, followed by Model3 and Model2, was considered most stable. Radius of gyration measuring protein compactness for simulated models was around 2.4 nm viz. acceptable for the protein folding ( Fig. 3(b)). RMSD between simulated and unsimulated models were calculated aprox. 1.1 Å over a period (Fig. 3(c)). Additionally, this ascertains that no major change in simulated structure has been taken place as compared to un simulated models. Active site prediction Ligand binding sites for simulated models was predicted using Cast P server (Tian et al., 2018). Active residues such as were also identified from NCBI's Conserved Domain Database. Sixteen amino acids were determined as active residues including K100, P190, G192, K257, G259, E272 etc. The largest pocket of central catalytic domain containing active residues was selected for subsequent analysis. Solvent accessible surface area and volume of the active site for each model was computed (Table 3).

Dimer formation
Homodimers were generated by using proteinprotein docking servers (ZDOCK and pyDOCK) using the simulated models. Dimer A, Dimer B and Dimer C were generated by submitting two copies of Model1, Model2 and Model3 respectively to Z-Dock server (Pierce et al., 2014). Correspondingly, Dimer X, Dimer Y and Dimer Z were generated using two copies of Model1, Model2 and Model3 respectively to pyDock server (Cheng et al., 2007). Dimers were subsequently evaluated by VERIFY 3D.

Dimer analysis and verification
Six dimers were then validated with WHATIF server. Coarse Packing quality was computed and found within the range of -1.159 to -1.573, illustrating all dimers are of good quality. Anomalous bond angle and anomalous bond length Z-score were also recorded standard range of 1.943 to 2.029 and Z-1.486 to 1.554 respectively.
Dimers were also analyzed by SAVES (Structural Analysis and Verification Server) using PROCHECK, VERIFY3D and ERRAT packages (Table 4). PROCHECK results showed core region in the range of 81.5 and 84.2% and disallowed region in the range of 0.2 and 1.0% in all the Dimers. Overall, all dimers were suitable according to structural principles of known protein.

Molecular Dynamic Simulation of Dimer
Topology file was generated by using force field CHARMM27 all-atom force field (MacKerell et al., 1998) with TIP3P water model. Dimers were placed in triclinic box center with the distance 1.0 nm from the edge. All dimers were stable with negative potential energy and F max<1000.0 (Fig. 4). Out of ZDOCK dimers, Dimer A was recorded with lowest energy followed by Dimer B and Dimer C. Similarly out of pyDOCK dimers Dimer X was found having lowest energy followed by Dimer Z and Dimer Y. RMSD calculated between simulated and un simulated dimers from superimposition was listed approximately <1.0 over a period (Fig. 5).
Minimium free energy for every monomer and dimer models were deduced and listed in Table 5. Dimers with lesser free energy were determined as more stable compare to monomer and might be responsible for dimerization.

Dimer analysis
At the end dimers were analyzed for identification of structural changes due to dimerization by Dim plot. It analyzes probable hydrogen bond and other non-covalent interactions between chains of oligomeric proteins. A total of 9, 10, 3, 5, 9 and 7 hydrogen bonds were determined between polypetide chains of DimerA, DimerB, DimerC, DimerX, DimerY and DimerZ respectively. Residues involved in hydrogen bond formation are shown in Table 6 and their solvent accessible surface area was calculated. Table 7 shows the total solvent accessible area for monomer and their respective Dimer chain A. The major change in solvent accessible area of amino acids constituting catalytic pocket like L90, K100, S101, S137, N139, T140, P190, G192, L253, K257, G259, E272, I273, F293, N296, K323, A337, I382, I395 etc. was observed. The change in solvent accessible area corresponds to change in binding site due to dimer formation. This might also coincide with the change in biological activity of monomer and dimer.       K592-R106, K593-R106, N540-D372, N575-N155, K618-S152, S620-S152

CONCLUSION
The presence of rice UGPase in monomer and various oligomeric states was found to be associated with its function. The rice UGPase is reported biologically active and dimer as biologically inactive. In absence of detailed structural and functional information, the link between oligomeric state and biological activity is yet to investigate. This link was explored through computational modeling and subsequent comparative analysis of monomer and dimer of rice UGPase. The present study involves first UGPase model generation via homology modeling and threading approach then their optimization through MD simulations. This was followed by subsequent generation and simulation of dimers. All predicted models satisfied the ponametus of best possible computational modeling and had high value of correctness. The active residues of UGPase catalytic domain were identified with Cast P and CDD. Based on solvent accessible surface area calculation of monomer vs dimers and corresponding change