Force Field Parameters for Fe2+4S2−4 Clusters of Dihydropyrimidine Dehydrogenase, the 5-Fluorouracil Cancer Drug Deactivation Protein: A Step towards In Silico Pharmacogenomics Studies

The dimeric dihydropyrimidine dehydrogenase (DPD), metalloenzyme, an adjunct anti-cancer drug target, contains highly specialized 4 × Fe2+4S2−4 clusters per chain. These clusters facilitate the catalysis of the rate-limiting step in the pyrimidine degradation pathway through a harmonized electron transfer cascade that triggers a redox catabolic reaction. In the process, the bulk of the administered 5-fluorouracil (5-FU) cancer drug is inactivated, while a small proportion is activated to nucleic acid antimetabolites. The occurrence of missense mutations in DPD protein within the general population, including those of African descent, has adverse toxicity effects due to altered 5-FU metabolism. Thus, deciphering mutation effects on protein structure and function is vital, especially for precision medicine purposes. We previously proposed combining molecular dynamics (MD) and dynamic residue network (DRN) analysis to decipher the molecular mechanisms of missense mutations in other proteins. However, the presence of Fe2+4S2−4 clusters in DPD poses a challenge for such in silico studies. The existing AMBER force field parameters cannot accurately describe the Fe2+ center coordination exhibited by this enzyme. Therefore, this study aimed to derive AMBER force field parameters for DPD enzyme Fe2+ centers, using the original Seminario method and the collation features Visual Force Field Derivation Toolkit as a supportive approach. All-atom MD simulations were performed to validate the results. Both approaches generated similar force field parameters, which accurately described the human DPD protein Fe2+4S2−4 cluster architecture. This information is crucial and opens new avenues for in silico cancer pharmacogenomics and drug discovery related research on 5-FU drug efficacy and toxicity issues.


The Study
Although there is an increased interest in the protein metal interactions, prompted by the essential physiological roles played by metal ions [15][16][17], the Fe-S (Gln) coordination in cluster 1026 is yet to be reported in other Fe 2+ 4S 2− 4 cluster containing proteins [1,2]. Metal ions such as iron (Fe 2+ ) are crucial components of a protein's electron transportation, We can gain insights into metal coordinating environments through computational studies, especially via molecular dynamics (MD) simulation. However, MD calculations are highly dependent on force fields derived through quantum mechanics (QM), and molecular  [20,21]. MM methods employ classical-type models to predict the amount of energy in a molecule, based on its conformation [22]. Compared to QM approaches, MM methods are computationally cheaper and sufficient for describing atomic interactions and dynamics of a purely organic system. However, most of the available MM force fields cannot accurately describe the metal/organic interface occurring in metalloproteins, as they ignore their induced explicit electronic degree of freedom [23]. To account for the electronic effects of the metals, de novo QM/MM calculations have been employed to describe the precise electron structure of atoms around a metal center [24][25][26]. Due to the importance of metals in protein function, the development of novel force field parameters using either hybrid QM/MM or pure QM approaches for describing various transition metal architectures is gaining pace [27]. This has led to numerous modified force fields that have been incorporated in several force field families, such as the optimized potentials for liquid simulations (OPLS-AA) [28], Gronigen molecular simulation (GROMOS) [29], chemistry at Harvard molecular mechanics (CHARMM) [30,31], and assisted model building with energy refinement (AMBER) [32]. Both CHARMM and AMBER are widely used. They give a large palette of atom types, allowing several organic molecules to be represented by assigning atom types based on chemical similarity [33,34]. OPLS-AA [35,36] optimizations focus on the condensed phase properties of small molecules, and have since been extended to include a diverse set of small molecule model compounds. However, atom type assignment must be done manually. It worth noting that a commercial implementation of OPLS-AA with atom typing functionality is available [37]. On the other hand, CHARMM has been enhanced with the CHARMM general force field (CGenFF), which not only covers a wide range of chemical groups found in biomolecules and drug-like molecules, but also many heterocyclic scaffolds [38,39]. Furthermore, a web interface for automatic atom typing and analogy-based parameter and charge assignment is now available [40,41]. The GROMOS force field atom type palette offers a pool of diversity for the construction of small molecule models with a force field derived from biopolymer parameters [29]. The general AMBER force field (GAFF) [42] and the antechamber toolkit are now included in AMBER [33,43,44] allowing the user to generate an AMBER [32,45] force field model for any input molecule. Besides the associated simulation speeds and exportable parameters, the development of a Python-based metal parameter builder (MCPB.py) [46], which supports various AMBER force fields and >80 metal ions, has made the parametrization of inorganic constituents in proteins more facile. These advantages make AMBER the most preferred platform for the development of metal parameters for use in simulations involving metalloproteins. Hitherto, various methods, such as the polarization model, and non-bonded, semi-bonded, and bonded models, have been implemented to characterize metalloproteins. The non-bonded model uses non-covalent (van der Waals and electrostatic forces) interaction to define metal centers [43,44], whereas, semi-bonded [47,48] models put dummy atoms around metals to resemble electron orbitals. However, these two methods are incapable of taking into account charge transfer and polarization effects around the metal centers [49]. These shortcomings have been solved by incorporating the charge transfer and polarization effects in potential energy function models [50,51]. Contrastingly, the bonded model utilizes defined harmonic energy terms, which have been introduced into possible energy function to account for the bond formation between atoms and metal centers [48,52,53][ The approaches mentioned above have been extensively used in studies to characterize Fe 2+ centers in a range of metalloproteins [52][53][54][55]. Among other Fe 2+ clusters, Carvalho and colleagues [54] satisfactorily generated AMBER force field parameters for Fe 2+ 4 S 2− 4 clusters coordinated by cysteine residues. However, none of these parameters featured glutamine residue coordination to the Fe 2+ center or developed parameters for the structures of composite multiple clusters, besides applying two approaches. To the best of our knowledge, this is the first study to determine the human DPD protein metal force field parameters.
Collectively, the current study integrates MM with QM techniques to determine accurate force field parameters for 8 × Fe 2+ 4 S 2− 4 cluster complexes of the modeled human DPD proteins. We utilized the bonded method of QM and Seminario techniques in our calculations [27]. More specifically, the density functional theory (DFT) of the QM approach was used to derive Fe 2+ center AMBER parameters for two models using different Seminario methods. The first method (viz. Model 1) used the original Seminario [56] method [46], whereas the second method (viz. Model 2) used collation features Visual Force Field Derivation Toolkit (VFFDT) Seminario [57]. A comparison of the parameters from the two methods was performed and their reliability evaluated via all atom MD simulations. For the first time, the current study reports novel force field parameters for multiple Fe 2+ 4 S 2− 4 clusters, coordinated to both cysteine and glutamine residues. Furthermore, the reliability of the two parameter generation approaches was also evaluated and found to be comparable. The newly derived force field parameters can be adopted by other systems depicting a similar Fe 2+ coordinating environment. More importantly, the establishment of these parameters creates an avenue for further molecular studies to fully understand the functional mechanism of the human DPD protein, and to decipher the effects of missense mutations on drug metabolism and cancer drug toxicity issues. As part of our ongoing investigations about the effects of known variants in human DPD, especially on its structure and stability, the reliability of the current parameters has been confirmed and the findings will be published as a follow-up study. Furthermore, different methods, such as the identification of new mutants, coupled with structural analysis and clinical studies, i.e., phenotyping of DPD, has had a great impact on the understanding of the structural and functional effects of these mutations [6]. Together, these results will be crucial, not only for understanding how mutations lead to 5-FU toxicities, but also to better inform the implementation of precision medicine in cancer treatment.

Human DPD 3D Wild Type (WT) Complete Structure Determined via Homology Modeling Approaches
The availability of accurate and complete 3D structural information is a fundamental aspect for molecular studies aimed at understanding protein function. With the absence of the human DPD X-ray structure in the protein data bank (PDB) [8], homology modeling approaches were used to calculate accurate models of the human DPD enzyme using MODELLER v9.15 [58], DiscoveryStudio4.5 [59], and pig X-ray structure (PDB ID: 1H7X, 2.01 Å) as a template [1,2]. The choice of the template was guided by the high sequence identity (93%) with the target human DPD enzyme. Additionally, it was in complex with the drug of interest (5-FU) and had a complete query coverage of 100%. Using the very slow refinement level in MODELLER v9.15, 100 apo protein models were generated. The three best models, with the lowest z-DOPE scores of −1.36, −1.36, and −0.88, were chosen for further validation. z-DOPE score evaluates the closeness of a model in comparison with the native structure, based on atomic distance-displacement statistical potential, with a score of ≤−1.0 being considered as a near-native structure [60,61]. Consequently, holo (apo and cofactors) and holo-drug (5-FU) complex structures were generated by incorporating the non-protein coordinates from the template in Discovery Studio 4.5 [59]. Additional model quality assessment (Table S1) was performed using the VERIFY3D webserver [62], qualitative model energy analysis (QMEAN) [63], protein structure analysis (ProSA) [64], and program to check the stereochemical quality of protein structures (PROCHECK) [65]. VERIFY3D utilizes pairwise interaction derived energy potentials to evaluate the local quality of a model, based on each residue structure environment [62]. High-quality structures are predicted to have more than 80% of their residues with a 3D-1D score of 0.2 or higher [62]. The modeled structures had 3D-ID scores of 0.2 or higher (Table S1) in 85.01% of its residues. QMEAN estimates the quality of the submitted model based on its physicochemical properties, then derives a value corresponding to the overall quality of the structure and compares it to the calculated QMEAN-scores of 9766 high-resolution experimental structures [63]. The modelled structures of DPD holo and holo-drug complexes had a QMEAN-score of 0.90 and 0.89, which is similar to that of high-resolution experimental structures. ProSA assesses the quality of the submitted model by calculating its potential energy and comparing the resulting score to that of the experimental structures available in PDB [64]. The Z-score of each monomer of the holo and holo-drug complexes was between −13.41 and −13.56, which is similar to NMR structures of the same size.
PROCHECK assesses the stereochemical quality of the submitted protein models based on their phi/psi angle arrangement and then produces Ramachandran plots, which show the protein residues positions on the most favored, allowed, and disallowed regions [65]. Each generated model had more than 83.8%, 16.0%, and 0.2% of their residues in the most favored, allowed, and disallowed regions, respectively, suggesting a good distribution of torsion angles (Table S1). Overall, constructed holo and holo-drug complexes with consistently high-quality scores were obtained.
To remove steric clashes in the generated models (holo and holo-drug), 100 steps of minimization, with the steepest descent algorithm using the GROMACS 5.14 MD simulation package [66], were performed and determined to be suitable for subsequent calculations.

AMBER Force Field Parameters Generated Using Bonded Approaches
The metal coordination geometries in proteins are highly dependent on the protonation states of the residues involved. Thus, to achieve the correct geometry arrangements in the human DPD protein, the protonation states of all titratable resides were determined at a pH of 7.5, using the H++ webserver (http://biophysics.cs.vt.edu/H++, accessed on 12 December 2019) [67] (Table S2). To ensure correct protonation, a visual inspection of all titratable residues was performed and corrected using Schrödinger Maestro version 11.8 [68]. Table 1 shows the protonation states of residues forming a bond with the metal ions in the Fe 2+ 4 S 2− 4 clusters. Cys was protonated as CYM and interacted with the Fe 2+ center through a sulfur (SG) bond. On the other hand, Gln was protonated as GLH to coordinate with the Fe 2+ ion through the oxygen (OE) atom. The AMBER force field parameters of the Fe 2+ 4 S 2− 4 clusters in the human DPD protein were calculated using two approaches: the original Seminario method (Model 1) and the collation features Seminario approach in visual-force field derivation tool (VFFDT) (Model 2). In each chain, two distinct residue coordinating environments were identified. Cluster 1026 (4 × Fe 2+ , 4 × S 2− , 3 × Cys and 1 × Gln) coordination was different from those of clusters 1027, 1028, and 1029 (4 × Fe 2+ , 4 × S 2− and 4 × Cys). The four Fe 2+ (FE1, FE2, FE3, FE4) bonded to the four S 2− (S1, S2, S3, S4) ions to form internal coordinates.  [69][70][71]. Model 2 calculations failed at the B3LYP level of theory, therefore, the parameters for single internal coordinates (S3 and FE3) were obtained using a Los Alamos double-zeta basis (LSDA/LANL2DZ) approach [72]. However, those for the external coordinates ((Cys and Fe 2+ ) and (Gln and Fe 2+ )) were derived using a geometry, frequency, noncovalent, extended TB (GFN1-xTB) method ( Figure S1) [73,74]. concerning Fe 2+ and Cys [54,77,78]. However, there is limited literature on Fe 2+ and Gln force field interactions, which has been sufficiently addressed in this study. Despite the slight differences, the values of force constant from both systems (Model 1 and 2) were within the same range, and consistent with those obtained from previous studies [54,78]. Commonly, force field parameter values of a model conducted under different systems are not exact, but fall within an expected range [56,57,79]. In generating new parameters, the state of the structural geometry optimization is thought to be a contributing factor to the varied observations [80]. Previous findings [81] ascribed the discrepancies to the different methods used in obtaining the force constant and the opposite manners in which the connectivity's were defined. Most importantly, the derived values showed that both models maintained the subsets structural geometry following the optimization step.

Geometry Optimization
The subset structures for Model 1 attained the local minima at step number 238 initiating the optimization process ( Figure 2C,D). During the optimization process, a significant energy variation between steps 120 and 230 was observed. The main cause of the energy variation was the formation of a repulsive bond between Fe 2+ and Fe 2+ ions instead of the Fe 2+ and S 2− ions in cluster 1026. Nevertheless, the subset structures achieved correct optimization, while maintaining their geometry, as seen in Figure 2B.
The original Seminario method derived individual point value parameters for the subsets in Model 1 (Table S3). Contrastingly, the VFFDT (Model 2) approach/method generated average related parameters for internal bond length and angles, whereas the external parameters were averaged manually (Table S4). The equilibrium bond length and angle values obtained from QM (Models 1 and 2) showed some deviation of the crystal structure (Tables 2-4). These disparities might have been due to deficient phase information on the x-ray structure, since they give a static snapshot of the dynamic structure, contributing to spurious values [75]. Moreover, the disparity might have resulted from the lack of solvent effects and intermolecular interactions during the QM gas-phase optimization step [75,76]. As expected, the average bond length and angle for Model 2 were within the range of those obtained from Model 1. Furthermore, consistent with previous studies, in both models, the bond distances between Gln(OE)-Fe 2+ were seemingly lower (Model 1: 1.92 Å and Model 2: 1.93 Å) ( Table 2) compared to the bond between Cys(S)-Fe 2+ , with force constants of 60.40 and 24.97 kcal·mol − ·Å − , respectively. The short bond length might be attributed to the smaller atom radius of oxygen in Gln compared to that of sulfur in Cys [1,2]. These values coincided with those obtained from previous related studies concerning Fe 2+ and Cys [54,77,78]. However, there is limited literature on Fe 2+ and Gln force field interactions, which has been sufficiently addressed in this study.
Despite the slight differences, the values of force constant from both systems (Model 1 and 2) were within the same range, and consistent with those obtained from previous studies [54,78]. Commonly, force field parameter values of a model conducted under different systems are not exact, but fall within an expected range [56,57,79]. In generating new parameters, the state of the structural geometry optimization is thought to be a contributing factor to the varied observations [80]. Previous findings [81] ascribed the discrepancies to the different methods used in obtaining the force constant and the opposite manners in which the connectivity's were defined. Most importantly, the derived values showed that both models maintained the subsets structural geometry following the optimization step.

RESP Charges
Partial atomic charge calculations were derived for each atom interacting with the Fe 2+ center for the optimized subset structures. Figure S2 and Table S5, illustrate differences in the WT DPD atomic charge distribution in the oxidized subsets. The RESP method derived these charges by fitting the molecular electrostatic potential obtained from the QM calculation, based on the atom-centered point charge model. In their oxidized state, atoms within the DPD Fe 2+ (S 2− , Gln and Cys) center exhibited varied atomic charges due to the large electrostatic environment around the protein's metal sphere. Such variations are known to influence charge transfer at the redox center bringing stability around the coordinating sphere of metalloproteins [79]. As such, they are vital components in the achievement of accurate inter-and intra-molecular potential electrostatic interaction [75].    The newly generated Fe 2+ force field parameters for subsets 1026-A and 1027-A (Tables S6 and S7) were inferred to the remaining Model 1 DPD clusters corresponding to their geometries mentioned earlier. Similarly, the generated internal and external parameters (Table S4) for Model 2 were also inferred to the corresponding clusters, accordingly. At the end, each model featured a holo and a holo-drug (5-FU cancer drug) protein complex, totaling 64 internal (Fe-S) and 32 external (30 Cys-Fe; 2 Gln-Fe) parameter calculations for the DPD (Fe 2+ 4 S 2− 4 ) clusters. In terms of energy profile and range of force constants for Model 1 and 2, there were no significant differences observed in terms of DPD Fe 2+ ion coordination to Cys, Gln residues, and S 2− ions. Tables 2-4 show a summary of equilibrium bond length, angle, and related force constants, with detailed information available in the supporting information (Tables S6 and S7). Dihedral-related force constants were derived manually from the respective structures (Table S8). Accurate parameters are necessary for maintaining the coordinating geometry of a metal center in metalloproteins [55]. Therefore, to evaluate the accuracy and reliability of the derived parameters (Model 1 and 2), all atom MD simulations (150 ns) for holo system and holo-drug complexes were performed. The derived parameters were validated by assessing the root mean square deviation (RMSD) (Figure 3A), the radius of gyration (Rg) ( Figure 3B), and root mean square fluctuation (RMSF) ( Figure 3C). Simulations of both models for holo and holo-ligand complexes showed minimal deviation from their initial structures, which were maintained across the simulation process ( Figure 3A). Model 1 systems (holo and holo-drug) displayed a multimodal RMSD density distribution, implying they sampled various local minima, whereas each of the Model 2 proteins attained a single local minimum (unimodal distribution). The Rg ( Figure 3B) revealed that the compactness of the various protein models remained the same during dynamics. However, differences were observed between the holo and holo-drug bound proteins. The ligand-bound protein was seen to generally have a higher Rg than the non-ligand bound protein in both model systems. This may be attributed to the presence of the drug. Proteins from both models exhibited similar RMSF profiles ( Figure 3C). However, the ligand-bound proteins appeared slightly more flexible than the non-ligand bound ones. As expected, the loop regions, which constitute~43% of the entire protein structure, including the active-site loop (residues 675-679), were the most flexible regions, while the metal site residues displayed minimal fluctuation ( Figure S3). Visualization of the different trajectories through visual molecular dynamics (VMD) [82] verified a high conformational change of the loop areas, while the protein central core containing Fe 2+ clusters had vibrational-like movements.
The profiles of the RMSDs ( Figure 3A) exhibited higher variation in conformational changes across all systems. These variations were more apparent in the Model 1 system's proteins compared to the Model 2 system. Considering the similarity of protein behavior with drug binding, it is apparent that both models showed similar atomic tendencies in the drug and non-drug bound systems. The disparities arising from conformational changes were because of the slight differences in the approaches used in the models' preparation. For instance, fixed bond parameters were assigned between Fe-S, Fe-Fe, and the connecting residues (Fe-Cys or Fe-Gln) of Model 2, based on averages of crystallographic structure (Table S2), whereas Model 1 parameters were attained from single point atom calculation of the crystallographic structure. The RMSF values of both the holo and holo-drug bound complexes demonstrated a region of higher flexibility between residues in all models ( Figure 3C). the loop areas, while the protein central core containing Fe 2+ clusters had vibrational-like movements. The profiles of the RMSDs ( Figure 3A) exhibited higher variation in conformational changes across all systems. These variations were more apparent in the Model 1 system's proteins compared to the Model 2 system. Considering the similarity of protein behavior with drug binding, it is apparent that both models showed similar atomic tendencies in the drug and non-drug bound systems. The disparities arising from conformational changes were because of the slight differences in the approaches used in the models' preparation. For instance, fixed bond parameters were assigned between Fe-S, Fe-Fe, and the connecting residues (Fe-Cys or Fe-Gln) of Model 2, based on averages of crystallographic structure (Table S2), whereas Model 1 parameters were attained from single point atom calculation of the crystallographic structure. The RMSF values of both the holo and holodrug bound complexes demonstrated a region of higher flexibility between residues in all models ( Figure 3C).
Proteins are dynamic entities and as such they undergo conformational changes as part of their functionality. Elucidating these changes is necessary for understanding how their functionality is maintained [83]. Hence, we evaluated the conformational variations sampled by each system during the simulation by plotting the free energy of each system snapshot as a function of RMSD and Rg using the Boltzmann constant ( Figure 4). In both models, free energy investigations revealed similar tendencies to the kernel density map in all the systems. Both holo and holo-drug bound proteins populated three main conformations in Model 1. However, the holo bound protein attained three energy minima at 0.18, 0.20, and 0.25 nm, while the drug-bound protein energy minima were attained later, at 0.22, 0.25, and 0.35 nm. On the other hand, Model 2 equilibrated at single energy minima for both the drug (0.28 nm) and holo (0.22 nm) bound complexes. Model 1 proteins repeatedly attempted to find a high probability region that guaranteed more thermodynamic stability for its conformational state than Model 2. However, upon drug binding Proteins are dynamic entities and as such they undergo conformational changes as part of their functionality. Elucidating these changes is necessary for understanding how their functionality is maintained [83]. Hence, we evaluated the conformational variations sampled by each system during the simulation by plotting the free energy of each system snapshot as a function of RMSD and Rg using the Boltzmann constant ( Figure 4). In both models, free energy investigations revealed similar tendencies to the kernel density map in all the systems. Both holo and holo-drug bound proteins populated three main conformations in Model 1. However, the holo bound protein attained three energy minima at 0.18, 0.20, and 0.25 nm, while the drug-bound protein energy minima were attained later, at 0.22, 0.25, and 0.35 nm. On the other hand, Model 2 equilibrated at single energy minima for both the drug (0.28 nm) and holo (0.22 nm) bound complexes. Model 1 proteins repeatedly attempted to find a high probability region that guaranteed more thermodynamic stability for its conformational state than Model 2. However, upon drug binding the conformation entropy was increased in both models, which destabilized the transitional state and simultaneously slowed down the protein equilibration. Visualization of the trajectories in VMD for establishing the cause of the trimodal ensemble showed alternating movements in the loop regions, including the C-terminal, N-terminal, and active-site loop areas. More importantly, the Fe 2+ 4 S 2− 4 cluster's geometry was maintained during the simulation ( Figure S4).

Fe 2+ 4 S 2− 4 Clusters Exhibited Stability during MD Simulations
Assessment of the inter-or intra-molecular distances between groups of interest can be used to investigate stability changes during MD simulations [84]. In this study, distances between the center of mass (COM) of; 1) the entire DPD protein and each of the eight Fe 2+ 4 S 2− 4 clusters ( Figure 5A); 2) each chain and the four Fe 2+ 4 S 2− 4 clusters therein ( Figure 5B); and 3) the active site of each chain and its Fe 2+ 4 S 2− 4 clusters, were evaluated ( Figure 5C) for each model (Model 1 and 2: holo and holo-drug). From these calculations, the overall stability of the key components involved in the electron transfer process was evaluated. Generally, the inter-COM distances between the various groups in both models were nearly the same ( Figure 5A-C). Moreover, data were distributed with a less standard deviation (uni-modal distribution), as seen from most kernel density plots, suggesting the distances within metal clusters remained in the same range across the 150 ns simulation and maintained stability within the metal clusters. Thus, the two methods can reliably be used to achieve accurate parameters for other metalloproteins.
Molecules 2021, 26, 2929 14 of 28 the conformation entropy was increased in both models, which destabilized the transitional state and simultaneously slowed down the protein equilibration. Visualization of the trajectories in VMD for establishing the cause of the trimodal ensemble showed alternating movements in the loop regions, including the C-terminal, N-terminal, and activesite loop areas. More importantly, the Fe 2+ 4S 2− 4 cluster's geometry was maintained during the simulation ( Figure S4).

Fe 2+ 4S 2− 4 Clusters Exhibited Stability During MD Simulations
Assessment of the inter-or intra-molecular distances between groups of interest can be used to investigate stability changes during MD simulations [84]. In this study, distances between the center of mass (COM) of; 1) the entire DPD protein and each of the eight Fe 2+ 4S 2− 4 clusters ( Figure 5A); 2) each chain and the four Fe 2+ 4S 2− 4 clusters therein (Figure 5B); and 3) the active site of each chain and its Fe 2+ 4S 2− 4 clusters, were evaluated ( Figure  5C) for each model (Model 1 and 2: holo and holo-drug). From these calculations, the overall stability of the key components involved in the electron transfer process was evaluated. Generally, the inter-COM distances between the various groups in both models In addition to the group inter-COM distance calculations, the distances between the Fe 2+ centers and the coordinating residues were also determined for the holo-drug complexes in both models ( Figure 6). Using this approach, the integrity of the coordinating geometry could be accessed during simulations. From the results, a high bond length consistency was observed within all Fe 2+ 4 S 2− 4 centers; an indication that the derived parameters were accurately describing the cluster geometries. Furthermore, the obtained bond lengths were in agreement with those reported previously [54,55]. The maintenance of the bond distances signified that the desired functionality and stability had not been jeopardized given that this is dependent on the protein environment [54]. Notably, Zheng et al.'s protocol for the evaluation of metal-binding structure confirmed that the coordinating tetrahedral geometry of Fe 2+ 4 S 2− 4 clusters was maintained during the entire simulation run. Although our calculations agreed with previous findings [54,56,77,78], it is worth noting that, to the best of the authors' knowledge, none of the studies featured the force field parameters for glutamine interaction with a single or multiple Fe 2+ 4 S 2− 4 cluster in a single protein.
were nearly the same ( Figure 5A-C). Moreover, data were distributed with a less standard deviation (uni-modal distribution), as seen from most kernel density plots, suggesting the distances within metal clusters remained in the same range across the 150 ns simulation and maintained stability within the metal clusters. Thus, the two methods can reliably be used to achieve accurate parameters for other metalloproteins. In addition to the group inter-COM distance calculations, the distances between the Fe 2+ centers and the coordinating residues were also determined for the holo-drug complexes in both models ( Figure 6). Using this approach, the integrity of the coordinating geometry could be accessed during simulations. From the results, a high bond length consistency was observed within all Fe 2+ 4S 2− 4 centers; an indication that the derived parameters were accurately describing the cluster geometries. Furthermore, the obtained bond lengths were in agreement with those reported previously [54,55]. The maintenance of the bond distances signified that the desired functionality and stability had not been jeopardized given that this is dependent on the protein environment [54]. Notably, Zheng et al.'s protocol for the evaluation of metal-binding structure confirmed that the coordinating tetrahedral geometry of Fe 2+ 4S 2− 4 clusters was maintained during the entire simulation run. Although our calculations agreed with previous findings [54,56,77,78], it is worth noting that, to the best of the authors' knowledge, none of the studies featured the force field parameters for glutamine interaction with a single or multiple Fe 2+ 4S 2− 4 cluster in a single protein. , and (C) active-site. Generally, a uni-modal distribution was seen across all clusters in both models. The distance between the Fe 2+ cluster and backbone of the protein remained within the same range during dynamics. Cluster compactness is an indication of the system stability. Respective clusters are colored accordingly.

Validation of Derived Parameters in IH7X Crystal Structure
The derived Fe 2+ 4 S 2− 4 parameters coordinated uniquely to Cys and Glu residues were inferred to the template structure (PDB ID: 1H7X) for additional validation purposes. As with the modelled human structures, the four Fe 2+ 4 S 2− 4 clusters in each chain of the template maintained the correct geometry, as shown in Figure S5.

Essential Motions of Protein in Phase Space
Proteins are dynamic entities, whose molecular motions are associated with many biological functions, including redox reactions. Collective coordinates derived from atomic fluctuation principal component analysis (PCA) are widely used to predict a lowdimensional subspace in which essential protein motion is expected to occur [85]. These molecular motions are critical in biological function. Therefore, PCA was calculated to investigate the 3D conformational study and internal dynamics of the holo and holo-drug complexes of both models (Model 1 and Model 2). The first (PC1) and the second (PC2) principal components captured the dominant protein motions of all atoms in the 150 ns MD simulation (Figure 7). Both holo structures (Model 1 and Model 2) showed a U-shaped time evolution from an unfolded state (yellow) emerging from simple Brownian motion and ending in a native state (dark blue), over 150 ns. Strikingly, the projection of holo-drug complexes from both models (1 and 2) adopted a V-shaped time evolution space, emerging from an unfolded state (yellow) and ending in a native state (dark blue). Model 1 and Model 2 holo structures accounted for 44.95% of the total global structural variances. The holo-drug complexes displayed 48.95% and 36.5% of global total variance for Model 1 and Model 2, respectively. In overall, the holo-drug complexes (Model 1 and Model 2) exhibited an altered conformational evolution over time in-comparison to their respective holo structure, suggesting that the newly derived force field parameters in both models did not alter protein function. 1026A and 1027A of Model 2, the holo-drug bound system. The coordinating distances between the Fe 2+ and the connecting residue was seen to be maintained throughout the simulation in both models.

Validation of Derived Parameters in IH7X Crystal Structure
The derived Fe 2+ 4S 2− 4 parameters coordinated uniquely to Cys and Glu residues were inferred to the template structure (PDB ID: 1H7X) for additional validation purposes. As with the modelled human structures, the four Fe 2+ 4S 2− 4 clusters in each chain of the template maintained the correct geometry, as shown in Figure S5.

Essential Motions of Protein in Phase Space
Proteins are dynamic entities, whose molecular motions are associated with many biological functions, including redox reactions. Collective coordinates derived from atomic fluctuation principal component analysis (PCA) are widely used to predict a lowdimensional subspace in which essential protein motion is expected to occur [85]. These molecular motions are critical in biological function. Therefore, PCA was calculated to investigate the 3D conformational study and internal dynamics of the holo and holo-drug complexes of both models (Model 1 and Model 2). The first (PC1) and the second (PC2) principal components captured the dominant protein motions of all atoms in the 150 ns MD simulation (Figure 7). Both holo structures (Model 1 and Model 2) showed a U-shaped time evolution from an unfolded state (yellow) emerging from simple Brownian motion and ending in a native state (dark blue), over 150 ns. Strikingly, the projection of holodrug complexes from both models (1 and 2) adopted a V-shaped time evolution space, emerging from an unfolded state (yellow) and ending in a native state (dark blue). Model 1 and Model 2 holo structures accounted for 44.95% of the total global structural variances. The holo-drug complexes displayed 48.95% and 36.5% of global total variance for Model 1 and Model 2, respectively. In overall, the holo-drug complexes (Model 1 and Model 2) exhibited an altered conformational evolution over time in-comparison to their respective holo structure, suggesting that the newly derived force field parameters in both models did not alter protein function.

Materials and Methods
A graphical workflow of the methods and tools used in this study is presented in Figure 8.

Materials and Methods
A graphical workflow of the methods and tools used in this study is presented in Figure 8.

Homology Modeling of Native DPD Protein.
Due to the absence of human DPD protein crystal structural information in the Protein Data Bank (PDB) database [10], a homology modeling approach was used to obtain a complete 3D structure using MODELLER v9.15 [61]. This technique has become indispensable for obtaining 3D model structures of proteins with unknown structures and their assemblies by satisfying spatial constraints based on similar proteins with known structural information [86]. The restraints are derived automatically from associated structures and their alignment with the target sequence. The input consists of the alignment of the sequence to be modeled with a template protein whose structure has been resolved, and a script file (Table S9). At first, the target sequence (human DPD enzyme: UniProt accession: Q12882) was obtained from the Universal Protein Resources [87]. Both HHPred [88] and PRIMO [89] were used to identify a suitable template for modeling the human DPD protein. From the potential templates listed by the two webservers, PDB 1H7X, a DPD crystal structure from pig with a resolution 2.01 Å , was identified as the top structural template having a sequence identity of 93% [1,2]. A pir alignment file was prepared between the Uniprot (UniProt accession: Q12882) target sequence and that of template using multiple sequence comparison by log-expectation (MUSCLE). Therefore, the template PDB ID: 1H7X was utilized. In MODELLER v9.15 [90], a total of 100 human DPD holo models were generated at the "very-slow" refinement level, guided by the selected template. The resulting models, devoid of both drugs (5-FU and cofactors), were ranked based on their lowest normalized discrete optimized protein energy (z-DOPE) score [60], and the top three models were selected for further modeling. To incorporate the non-protein structural information, each of the selected models was separately superimposed onto the template in Discovery Studio 4.5 [59], and all non-protein information was copied. The coordinates for cofactors and the drug were then transferred directly to the modeled structures. Further quality assessment of the resulting complexes was performed using VER-IFY3D [62], PROCHECK [65], QMEAN [63], and ProSA [64]. The best model showing a consistently high-quality score across the different validation programs was chosen for further studies.

Protonation of Titrarable Residues.
To account for the correct protonation states of the system, all DPD titratable residues were protonated at pH 7.5 [1], a system salinity of 0.5 M, and internal and external default dielectric constants of 80 and 10, respectively, in the H++ web server [67]. System coordinates (crd) and topology (top) files were used to build protonated protein structure files.

Homology Modeling of Native DPD Protein.
Due to the absence of human DPD protein crystal structural information in the Protein Data Bank (PDB) database [10], a homology modeling approach was used to obtain a complete 3D structure using MODELLER v9.15 [61]. This technique has become indispensable for obtaining 3D model structures of proteins with unknown structures and their assemblies by satisfying spatial constraints based on similar proteins with known structural information [86]. The restraints are derived automatically from associated structures and their alignment with the target sequence. The input consists of the alignment of the sequence to be modeled with a template protein whose structure has been resolved, and a script file (Table S9). At first, the target sequence (human DPD enzyme: UniProt accession: Q12882) was obtained from the Universal Protein Resources [87]. Both HHPred [88] and PRIMO [89] were used to identify a suitable template for modeling the human DPD protein.
From the potential templates listed by the two webservers, PDB 1H7X, a DPD crystal structure from pig with a resolution 2.01 Å, was identified as the top structural template having a sequence identity of 93% [1,2]. A pir alignment file was prepared between the Uniprot (UniProt accession: Q12882) target sequence and that of template using multiple sequence comparison by log-expectation (MUSCLE). Therefore, the template PDB ID: 1H7X was utilized. In MODELLER v9.15 [90], a total of 100 human DPD holo models were generated at the "very-slow" refinement level, guided by the selected template. The resulting models, devoid of both drugs (5-FU and cofactors), were ranked based on their lowest normalized discrete optimized protein energy (z-DOPE) score [60], and the top three models were selected for further modeling. To incorporate the non-protein structural information, each of the selected models was separately superimposed onto the template in Discovery Studio 4.5 [59], and all non-protein information was copied. The coordinates for cofactors and the drug were then transferred directly to the modeled structures. Further quality assessment of the resulting complexes was performed using VERIFY3D [62], PROCHECK [65], QMEAN [63], and ProSA [64]. The best model showing a consistently high-quality score across the different validation programs was chosen for further studies.

Protonation of Titrarable Residues.
To account for the correct protonation states of the system, all DPD titratable residues were protonated at pH 7.5 [1], a system salinity of 0.5 M, and internal and external default dielectric constants of 80 and 10, respectively, in the H++ web server [67]. System coordinates (crd) and topology (top) files were used to build protonated protein structure files. A visual inspection of all titratable residues was performed, and incorrect protonation corrected using Schrödinger Maestro version 11.8.

New Force Field Parameter Generation.
Prior to the parameter generation process, the residue coordinations present in chain-A and chain-B Fe 2+ 4 S 2− 4 centers were evaluated to identify representative subsets. Two unique coordination subset arrangements, viz. 1026A (4 × Fe 2+ , 4 × S 2− , 3 × Cys and 1 × Gln) and 1027B (4 × Fe 2+ , 4 × S 2− and 4 × Cys), were identified. The two subsets (1026A and 1027B) represented the coordinating geometry of all Fe 2+ 4 S 2− 4 clusters in the protein. Subsequently, force field parameters describing the coordinating interactions in these unique centers were determined via two approaches. First, the original Seminario method (Model 1) was implemented using the bonded model approach in Am-berTools16 [57] and Python-based metal center parameter builder (MCPB) [46]. Gaussian 09 [91,92] input files (com) of the protonated protein incorporating the subsets structures (1026A and 1027B) were prepared. Thereafter, their geometries were optimized utilizing the hybrid DFT method at a B3LYP correlation function level of theory. This process utilized double split-valence with a polarization [6-32G(d)] basis set [71,92] (Table S1). Sub-matrices of Cartesian Hessian matrix were used in the derivation of the metal geometry force field parameters [56]. Bond and angle force constants were obtained via fitting to harmonic potentials. The potential energy of the relative position for each atom in the system was determined by AMBER force field parameters calculated from Equation (1) below: where the bond lengths, angles values, torsion values, and the interatomic distances were obtained. The first and second term of the harmonic potential energy function relates to bond bending and bond stretching, respectively, whereas the torsion angles are described by the third term. Lastly, the van der Waals forces and electrostatic interaction are given by the non-bonded energy function involving the Lennard Jones (12-6) potential and Coulomb potential, respectively [32,56]. The optimized/minimized structures were then visualized in GaussView 5.0.9 [93] to confirm that the bonds in the centers were intact. The atomic charges of the optimized subset structures were then derived from electrostatic potential (ESP). However, ESP assigns unreasonably charged values to the buried atoms, which impair their conformational transferability. Therefore, the restrained electrostatic potential (RESP) fitting technique, which considers the Coulomb potential for the calculation of electrostatic interaction, was employed to address these issues. This method has been highly regarded and widely used for assigning partial charges to various molecules utilizing B3LYP/6-31G(d) gas phase [45]. Restraints, in terms of penalty functions, are applied on the buried atoms, leading to multiple possible charged values. Hence, the quality of fit to the QM ESP is not compromised [94]. Herein, a default Merz-Kollman restrained electrostatic potential (RESP) radius of 2.8 Å was allocated to the metal centers. An additional approach (herein named as Model 2) using the collation features Seminario: VFFDT program was used [57]. Analysis data were acquired following optimization of subset Fe 2+ -S 2− , Fe 2+ -Cys, and Fe 2+ -Gln coordination; the calculations were performed using density functional theory (DFT) featuring the LSDA/LANL2DZ (Table S2) [72]. This factored in the internal covalent bonds; note that the calculation was not successful at the B3LYP level of theory [69]. The external non-covalent bond calculation was determined by GFN1-xTB [73,74]. Retrieval of the force field parameters for the entire molecule was done through the Protocol menu item "FF" for the whole "General Small Molecule". Since the system in this study was symmetrical, the atom types were left identical to Fe or S. The AMBER force field parameters for Fe 2+ metal center bond and angles were then generated automatically. Individual detailed statistics were derived but only the final values were utilized for further calculations. The obtained parameters were then inferred to the other clusters in the modeled structures, as well as the template crystal structure (PDB ID: 1H7X) using the LEaP [95] program. This was based on the similarity of the clusters coordinating geometry. As such, cluster 1026A was inferred to 1029B, and those for 1027A were inferred to 1027B, 1028A, 1028B, 1029A, and 1029B, as they depict an identical coordination geometry. In total, 2 × ([Fe 2+ 4 S 2− 4 (S-Cys) 3 (S-Gln)]) and 6 × ([Fe 2+ 4 S 2− 4 (S-Cys) 4 ]) cluster parameters were derived for each model. No other 3D structure with metal centers, such as the human DPD coordinating environment, was available in the PDB. Therefore, the pig crystal structure was used to validate the reliability and accuracy of the newly generated force field parameters.

Force Field Parameters Validation and Analysis
To evaluate the reliability of the generated parameters derived from the original and automated Seminario approaches, duplicate all-atom MD simulations were conducted using the GROMACS 5.14 MD package [66]. For each model system (Model 1, Model 2, 1H7X crystal structure), the holo (protein with only cofactors) and holo-drug (5-FU) complexes were considered for simulation studies. At first, AMBER topologies for each system were generated by Leap modeling with the AMBER ff14SB force field to incorporate all the generated parameters [96]. The resulting system topologies were converted to GROMACScompatible input files for the structure (gro) and the topology (top), with the correct atom types and charges using the AnteChamber Python Parser interface (ACPYPE) tool [97]. The infinite systems were then solvated in an octahedron box system using the simple point charge (SPCE216) water model [98], and with a padding distance of 10 Å set between the protein surface and the box face. The net charge for all systems was subsequently neutralized by adding 0.15 M NaCl counter-ions [99]. The neutralized systems were then subjected to an energy minimization phase (without constraints) using the steepest descent integrator 0.01 nm, and a maximum force tolerance of 1000 kJ·mol −T ·nm −m was attained. This was necessary to get rid of steric clashes that may have resulted during incorporation of the parameters and water molecules. Subsequently, the systems were equilibrated to ensure that they attained the correct temperature and pressure using a two-step conical ensemble (each 100 ps). First, the temperature was set at 300 K (NVT-number of particles, volume, and temperature) using a modified Berendsen thermostat. This was followed by pressure equilibration at 1 atm (NPT-number of particles, volume and temperature) using the Parrinello-Rahman barostat algorithm [100]. The ensembles utilized the revised coulomb type for long range electrostatic interactions with a gap cut of 8.0 Å, as described by the particle mesh Ewald (PME) [101] method, and the LINCS algorithm was used to constrain bonds between all atoms [102]. Finally, production MD simulations of 150 ns were performed for all the systems at the Centre for High Performance Computing (CHPC) in Cape Town South Africa using 72 Linux CPU cores, with time integrations step of 2 fs. Coordinates were written to file every 10 ps. The obtained MD trajectories were stripped off all periodic boundary conditions (PBC) and fitted to the reference starting structure.

Root Mean Square, Root Mean Square Fluctuation, and Radius of Gyration Analysis
Global and local conformational behaviors of the replicate ensembles were determined using various GROMACS modules, viz. gmx rms, gmx rmsf, gmx gyrate, gmx distance, and analyzed in RStudio [103]. These packages were used to analyze the root mean square deviation (RMSD), root mean square fluctuation (RMSF), the radius of gyration (Rg), and the inter-center of mass between groups of interest, respectively. The overall conformational changes per system were observed using visual molecular dynamics (VMD) [82] to ensure that the derived parameters correctly maintained the geometry of the various Fe 2+ 4 S 2− 4 clusters.

Principal Component Analysis
Principal component analysis (PCA) was conducted in MDM-TASK-web to investigate the time evolution of the protein's conformational changes in MD trajectories [85,104]. PCA is a linear transformation technique that extracts the most important element from a data set by using a covariance matrix built from atomic coordinates defining the protein's accessible degree of freedom. The calculations of the coordinate covariance matrix for the Cα and Cβ atoms were implemented after RMS best-fit of the trajectories was applied to an average structure [85,104]. Corresponding eigenvectors and eigenvalues were then obtained from a diagonalized matrix. Protein coordinates were then projected using eigenvectors. PC1 versus PC2 plots were then derived from the normalized primary and secondary projections.
To ascertain how accurate the generated force field parameters were, the average bond lengths and force constants from the derived parameters were compared to those of the x-ray structure. All statistical calculations were performed using Welch t-test in RStudio v1.1. 456 [103], where a p-value (<0.05) was considered significant.

Conclusions
In addition to the nucleotide metabolizing function of the DPD metalloenzyme in humans, the dimeric protein also serves as an important anti-cancer drug target [4][5][6]. Deficiency or dysfunction of the enzyme, because of mutations, results in increased exposure to active fluoropyrimidines metabolites, leading to severe toxicity effects. Computational approaches such as MD simulations have become integral components of elucidating protein function, as well as the effects of mutations [4]. MD simulations allow the elucidation of the conformational evolution of protein systems over time during a reaction process [26,31,32]. MD simulations require the appropriate mathematical functions and a set of parameters collectively known as force fields, which describe the protein energy as a function of its atomic coordinates. In cases where adequate parameters are lacking, especially those describing non-protein components in a system, additional descriptors are necessary. In this work, which forms a platform for future studies towards anti-cancer personalized medicine, we reported new validated AMBER parameters that can be used to accurately describe the complex Fe 2+ 4 S 2− 4 clusters in the DPD protein and related systems. This was motivated by the absence of ready to use force field parameters enabling in silico studies on the DPD system. The development of combined QM/MM methods has provided the most effective, accurate, and theoretical description of the molecular system [92]. They enable a comprehensive analysis of the structural, functional, and coordinating environment in metal-binding sites [26]. Thus, we highlighted the two similar methods' capabilities, yet with different approaches and aspects of the algorithms for deriving authentic force field parameters for Fe 2+ centers in DPD protein.
First and foremost, we reported the generation of force field parameters using the original Seminario method [56]. We went further and exploited the collation features of the VFFDT Seminario method for obtaining the force field parameters of the same Fe 2+ ions as a supportive measure [57]. This was performed by considering the dimeric functionality of the human DPD protein, which relies on the well-organized inter-chain electron transfer across an eight Fe 2+ 4 S 2− 4 cluster complex. A double displacement reaction across the two chains leads to the activation and deactivation of the third most commonly prescribed anticancer (5-FU) drug globally [111]. It was remarkable that we successfully derived the desired force constants and bond distances for the Fe 2+ centers using both Seminario approaches. The parameters obtained from other studies [54] did not address the coordinating geometry of the clusters in this study. Moreover, none of the studies focused on force field parameters for multiple clusters in a protein. Therefore, from the range of force field parameters generated from both approaches, it would be best to obtain averages of such force fields for future use in other similar systems. These averaged values will allow for some degree of transferability.
Above all, the derived parameters could easily be incorporated into consolidated MM packages. Furthermore, we ascertained that irrespective of the DFT (B3LYP HF/6-31G* and (LSDA/LANL2DZ and GFN1-xTB) logarithm application, the original Seminario approach is not inferior to the modified Seminario (collation features VFFDT) approach. Despite the role of DFT calculations (such as B3LYP) in deciphering the reactivity mechanisms of the DPD systems, the method is faced with the major limitation of neglecting dispersion interactions [112]. As a result, additional correction approaches, such as DFT-D3 [113], DFT-D [114], and BJ-damping [115] methods, are included in the calculations. In calculations where the dispersion interactions were most critical in Model 2, the DFT-D3 correction, which is part of the Grimme's GFN1-xTB, was used. However, for Model 1, consideration of the most DFT correction method will be applied in future calculations. Owing to the possible occurrence of paramagnetism effects, due to the presence of unpaired electrons in the non-trivial DPD system Fe 2+ 4 S 2− 4 clusters, an attempt at implementing unrestricted calculations in Model 2 resulted in a higher energy compared to under restricted conditions.
The validation of the Fe 2+ force field parameters obtained from this study using MD simulations produced satisfactory results. This will provide more insight into atomistic or electronic information, regarding the effects of site-specific interactions on the reaction path, in the DPD protein and the detrimental mutants [26,31,32].
Most importantly, concerning the generation of AMBER force field parameters, the authors acknowledge no other compatible parameters for this unique system. The derived novel force field parameters have paved the way for further simulations and enhanced the mechanistic understanding of metal cluster function in the human DPD protein through higher-level MD simulation methods. Additionally, the derived parameters are currently being applied to study the structural and changes in stability effects due to existing mutations in the human DPD protein. Together, the results from these studies will provide the atomistic details of mutation effects involving the DPD protein. This will open a platform for the implementation of in silico cancer pharmacogenomics and drug discovery research on 5-FU drug efficacy and toxicity effects.