ReFOLD3: refinement of 3D protein models with gradual restraints based on predicted local quality and residue contacts

Abstract ReFOLD3 is unique in its application of gradual restraints, calculated from local model quality estimates and contact predictions, which are used to guide the refinement of theoretical 3D protein models towards the native structures. ReFOLD3 achieves improved performance by using an iterative refinement protocol to fix incorrect residue contacts and local errors, including unusual bonds and angles, which are identified in the submitted models by our leading ModFOLD8 model quality assessment method. Following refinement, the likely resulting improvements to the submitted models are recognized by ModFOLD8, which produces both global and local quality estimates. During the CASP14 prediction season (May–Aug 2020), we used the ReFOLD3 protocol to refine hundreds of 3D models, for both the refinement and the main tertiary structure prediction categories. Our group improved the global and local quality scores for numerous starting models in the refinement category, where we ranked in the top 10 according to the official assessment. The ReFOLD3 protocol was also used for the refinement of the SARS-CoV-2 targets as a part of the CASP Commons COVID-19 initiative, and we provided a significant number of the top 10 models. The ReFOLD3 web server is freely available at https://www.reading.ac.uk/bioinf/ReFOLD/.


INTRODUCTION
In silico modelling of protein structures provides a potential solution for bridging the protein sequence-structure gap. Although protein structures can now be predicted with high accuracy, using either advanced template based or deep learning-based methods, the resulting 3D models may still include significant local errors including unfavourable contacts, irregular hydrogen bonds, geometrical clashes and unusual angles. These errors may limit the usage of the modelled structures in cases where high accuracy is required, such as in drug discovery and protein engineering (1)(2)(3). The refinement of predicted protein structures refers to the correction of the local errors and improvement of the overall quality of theoretical 3D models, which is often considered as the 'last mile' for structure prediction (4).
Methods for the refinement of 3D protein models aim to increase model accuracy by fixing the local errors with the adjustment of secondary structure and modification of the sidechain interactions (3,5). Ironically, until recent years, the refinement of 3D models has more often than not resulted in decreases in average model accuracy and it remains a challenge for refinement groups to consistently improve all 3D models across all categories of target difficulty (3,6).
The employment of physics-based molecular dynamics (MD) simulations has been the main stay for the non-serverbased refinement methods. Since the CASP9 experiment, leading MD methods have generated refined 3D models by taking advantage of new force fields, parallel computing and restraint strategies (3). Although the MD-based protocols have been used by many top-performing groups in recent CASP experiments, they have suffered from a high computational time cost and the inadequacy of force fields for directing the generation of the 3D models towards the native basin (2)(3)(4)7,8). Both the automated server-based and non-server-based approaches for refinement sample dozens of 3D models in many alternative conformations (9). The most native like conformation among all 3D models are subsequently identified in the scoring stage (3). However, it is often problematic to recognise the improved models using either energy functions or model quality assessment programs (MQAPs), as the similarity among the 3D models generated by the sampling approaches (3).
The original ReFOLD server (10) was developed by our group to refine 3D models using a hybrid approach, which included both rapid and MD-based sampling and leading model quality estimates, to produce the predicted local and global quality scores for each sampled 3D model. The first protocol included the refinement of the initial structure using i3Drefine in 20 refinement cycles (1,11). The second protocol accommodated an MD-based protocol inspired by that of Feig and Mirjalili (7,12,13), but using more modest computational resources compared to the supercomputer scale resources, which were originally used. All 3D models generated by these sampling approaches were ranked using ModFOLD6 (14) at the last stage. The original ReFOLD method (10) was used for the refinement of CASP12 targets, and it showed promising performance, boosting the accuracy of our prediction pipeline (10). Nevertheless, structural drifts from the native state often occurred, especially for the templated based modelling (TBM) targets (10).
The ModFOLD server (14)(15)(16)(17) has been a leading approach for predicting the local and global quality of theoretical 3D models for more than a decade, according to both the CASP and CAMEO experiments (2,(14)(15)(16)(17). The local quality estimates provide significant information about the predicted accuracy score for each residue in each 3D model. In CASP13, ReFOLD2 was developed, which utilized a fixed restraint threshold, based on the per-residue accuracy scores produced by ModFOLD7 (the predicted distances from the native structure for each C-alpha atom), in order to guide the MD-based protocol. The CASP13 targets were relatively larger compared to the previous CASP experiments (3,10,18), and we observed that fixed restraint thresholds were not always appropriate for larger targets containing domains with varying accuracy. Therefore, with ReFOLD3, we developed a more targeted, novel gradual restraint strategy based on the local quality estimates for each starting model, which considered the required level of refinement for each individual residue.
In CASP13, contact prediction methods showed significant progress and reached up to 70% accuracy following the application of advanced deep learning methods (19)(20)(21). Residue-residue contact prediction methods have been used for the prediction of 3D models, protein-ligand interactions (22) and 3D model quality estimation (23)(24)(25)(26), and more recently, the performance of the tertiary structure prediction has been boosted by contact prediction methods (19)(20)(21). Thus for ReFOLD3, we also utilized contact predictions for the refinement of the predicted 3D models with the ap-plication of an additional gradual restraint strategy based on our contact distance agreement (CDA) score (14). The CDA score measures the agreement between the contacts, which were predicted by the DeepMetaPSICOV method (21,27,28), and the contacts between residues in the predicted 3D model, measured according to their Euclidean distance (14). The CDA score has been a local scoring component of the ModFOLD server since CASP11, and it will become increasingly important as contact prediction accuracy improves (14,21,(27)(28)(29).
In CASP14, ReFOLD3 gave us a performance boost in the main tertiary structure prediction category, where it enabled us to further improve the quality of some of the very best initial server models. Our group ranked in the top 10 in the refinement category itself, according to the official assessment. This was significant in the context of the very high-quality models of tertiary structures produced by many modelling groups. In CASP14, the models were much harder to refine as there was less room for improvement. Despite this major progress in modelling, most server models still contained significant local errors, which were both detectable by our ModFOLD8 method and fixable using ReFOLD3. It is notable that ReFOLD3 performed better at refining the most highly accurate starting models and it ranked within the top 5 refinement approaches, for models with GDT-TS scores >70. As 3D models of proteins become more widely used, it is important that biologists have access to freely available tools, such as ReFOLD3, in order to confidently identify and fix local errors in their theoretical models and bring them even closer to experimental quality.

MATERIALS AND METHODS
ReFOLD3, consisted of four protocols, which included improvements on the original version (10). The major improvement for ReFOLD3 was the accommodation of the two new comprehensive MD-based strategies (Supplementary Figure S1). The first and final protocol used i3Drefine (1,11,30) to refine the input model with 20 rapid iterations, the second and third protocols both employed more CPU/GPU intensive molecular dynamic simulation strategies, and models were ranked and selected at each stage using our ModFOLD8 server (14,16).
The second protocol included the introduction of molecular dynamics simulations that were guided by the per-residue accuracy scores obtained from ModFOLD8 (14,16,31). For the gradual restraint strategy, it was assumed that the regions identified as highly accurate should be restrained by applying a stronger harmonic positional restraint to prevent deviations from the native basin (3,7,13). A weaker restraint was also applied to the poorly predicted regions to allow for increases in the quality of these regions towards the native state (Supplementary Figure  S2A). Thus, the gradual restraints ranged from weak (0.05 kCal/mol/Å 2 ) to strong (1 kCal/mol/Å 2 ) harmonic positional restraints, which were applied to all atoms, including C-alphas, according to the distribution of the per-residue accuracy scores produced by ModFOLD8 (Supplementary  Table S1) (3,7,10,18,31). For the third protocol, a similar gradual restraint strategy was used to guide the MD simulation, but this time it was based on the residue-residue contact predictions. We utilized the CDA score method (14,31), which is based on the agreement between the residue contacts predicted by DeepMetaPSICOV (21) and the contacts in the model. If the CDA score was high, a stronger restraint was applied to preserve those residue contacts in the predicted 3D model. Lower CDA scores indicated where residues were likely to be further away from the native structure, and so weaker restraints were applied (Supplementary Table S2 and Supplementary Figure S2B).
The MD-based protocols were carried out at normal cellular conditions at 298 K with 1 bar of pressure in explicit solvent using nanoscale molecular dynamics (NAMD) (32) version 2.10 in Graphics processing unit (GPU) mode. The CHARMM22/27 force field (33) and the TIP3P water model (34) were also used for the simulation of the protein system. The system was also neutralized by inserting Na+ or Cl-ions to balance the net charge using Particle Mesh Ewald (PME) (35). The non-bonded interactions (mostly van der Waal's) were cut off by 12Å to the exclusion of bonded interactions by using the CHARMM27 default parameter file with the switching distance of 10Å (10). Using the pairlistdist function with 14Å distance between atom pairs for inclusion in pair lists made the switching function more efficient (10). The rigidBonds functions were also used to rigidify hydrogen bonds with a 2 fs timestep (10). The system's electrostatics and the temperature were calculated by PME with the temperature control using Langevin dynamics under the NTP conditions (constant number of particles, temperature and pressure) (10,36). The correction of clashes and the minimization of the system were carried out by 1000 steps in the first step of the MD-based protocol (10). The minimization step was followed by the implementation of the defined MD simulation to refine each target (10). Four parallel simulations were run for 2 ns, making 8 ns in total for a target as in the original MD-based protocol of ReFOLD (10). After the completion of the simulation run, 164 refined models were generated per target by taking a snapshot every 50ps, for each MD-based protocol (10).
Refined models generated from the MD-based protocols were then assessed and ranked using ModFOLD8 rank (10,14,31). The fourth protocol was a combination of the MD-based approaches, where the top-ranked model from the second and third protocol was then further refined using i3Drefine (1,10,11,14,31). Finally, all of the refined models generated by each of these protocols and the input model were pooled and re-ranked again according to the Mod-FOLD8 rank global scores (1,10,11,14,31).

Server inputs and outputs
The required inputs for ReFOLD3 are the protein sequence and a 3D model (in PDB format) for refinement. Optionally, users may provide a name for their protein sequence and their email address. The ReFOLD3 server results page ( Figure 1) includes an accurate estimate of the likely percentage improvement in model quality score for each refined 3D model, and it is unique in providing a series of individual per-residue error plots ( Figure 1B), which show the local quality estimates for the refined models compared to the uploaded protein structure. Therefore, users may easily visualize the specific local improvements in every refined 3D model at a glance. Users may also click buttons to compare the refined and original 3D models interactively directly within the browser. The superposition of the refined models with the input models can be visualized using the JSmol/HTML5 framework. Therefore, models can be viewed in 3D directly within in the browser, including on mobile devices, without the requirement of any plugins ( Figure 1C). The ReFOLD3 server will typically provide the results within ∼21 h for a target with 100 residues, with the MD simulation stages lasting ∼4 h (with Intel Xeon Platinum 8268 CPUs and Nvidia Tesla T4 GPUs). The server should complete the majority of refinement jobs within 48 h, once they are running.

Independent benchmarking and cross validation
CASP14 and CASP Commons 2020: The ReFOLD3 protocol was used to provide a significant proportion of the top ten identified 3D models for the ten SARS-2-CoV targets in CASP Commons COVID-19 initiative (Supplementary Tables S3-S10) (1,10,11,14,31). ReFOLD3 was also subsequently blind tested by independent assessors in the CASP14 experiment (May-Aug 2020) (2,10,14,18,31). Our refinement pipeline was employed to refine the best server 3D models for each target, which were selected by Mod-FOLD8 in the regular prediction category, as well as the starting models provided by the CASP assessors for the refinement category itself (1,10,11,14,31). ReFOLD3 generated numerous improved models compared to the initial structures, and this enhanced the performance of our prediction pipeline in the CASP14 experiment (Table 1 and Figure 2) (1,10,11,14,31). ReFOLD3 was ranked as the ninth best approach according to the assessors' formula (Supplementary Table S11) and tenth, according to the cumulative GDT-TS score in the CASP14 refinement category (Supplementary Table S12).
For the regular CASP14 prediction category, in our prediction pipeline, the top 3D model was selected by Mod-FOLD8 and then refined by the ReFOLD3 protocol in an attempt to further increase the accuracy of the predicted structure (1,10,11,14,31). From the results in Table 1, it is clear that ReFOLD3 managed to improve the quality for 68% of the top server 3D models selected by ModFOLD8, according to GDT-TS and lDDT scores ( GDT TS = 4.69, lDDT = 0.52). The GDT-TS score (37) is based on the superposition of C-alpha atoms between the predicted and native structure, while the lDDT score (38) calculates scores based on the atom-atom distance between the predicted or refined 3D model and the observed structure, considering all atoms (38). Thus, the results show that ReFOLD3 increased the accuracy of both the backbone and local regions for 68% of the structures. Furthermore, whereas the original version of ReFOLD was not successful in increasing the accuracy of the models for TBM targets in CASP12 (10), ReFOLD3 performed well across all target definitions in CASP14, including TBM (Figure 2, Supplementary Tables S13-S15) The refinement of the best server 3D model in the regular prediction category is arguably closer to a real-world scenario, compared to refinement of the starting models provided by assessors in the refinement category. The starting models chosen by the CASP assessors for the refinement category, may have already been pre-refined in their modelling pipelines and they often represent the very highest quality starting models, which makes their further refinement much more difficult. Nevertheless, ReFOLD3 managed to improve the quality for 38% of the starting models in the refinement category according to the observed scores (Supplementary Table S16). Using ReFOLD3, our group ranked ninth in the refinement category for the regular timeframe targets, according to the official assessors' formula (Supplementary Table S11). The ReFOLD3 method has also shown similar performance for each of the CASP14 target definitions, for the refinement category as well as the regular prediction category (Supplementary Table S17-S19). The datasets used for benchmarking are freely available for all to download via https://predictioncenter.org/ download area/CASP14/.
The performance of the ReFOLD3 was also analysed according to the size (number of amino acids) and the GDT-TS score intervals of the starting models by the CASP assessors in the refinement category (Supplementary Tables S20-S25). It is clear that the CASP14 refinement targets were much harder to refine overall and there was less room for improvement, as high-quality models of tertiary structures were produced by many modelling groups, which were then chosen by the assessors (20,39). However, it is important to note that many models still contained significant local errors, which were both detected and fixed using our ReFOLD3 pipeline (10,14,31). In that context, while many groups did not perform particularly well for targets with GDT-TS scores >70 (Supplementary Table  S25), ReFOLD3 performed relatively better at refining the most highly accurate starting models and was ranked within the top 5 refinement approaches for targets with GDT-TS scores >60, according to the cumulative Z-score (Supplementary Tables S24 and S25).

CONCLUSIONS
ReFOLD3 is one of the leading freely available fully automated servers for the refinement of theoretical 3D models of proteins. The method is unique in utilizing a gradual restraint strategy, based on both contact predictions and local quality estimation, to guide the refinement of protein models. Our application of gradual, rather than fixed, restraints has proved to be more successful for guiding our MDsimulations. As well as existing as an independent server, the ReFOLD3 method is integrated with the latest Mod-FOLD8 server (14,31). This integration enables users to identify the local errors in their 3D models and then correct them more conveniently, through the targeted improvement of specific regions. Using the ReFOLD3 server, non-expert users can easily visualize the likely improvements to their models, via intuitive per-residue error plots and 3D model superpositions.