A protocol to automatically calculate homo-oligomeric protein structures through the integration of evolutionary constraints and ambiguous contacts derived from solid- or solution-state NMR

Protein assemblies are involved in many important biological processes. Solid-state NMR (SSNMR) spectroscopy is a technique suitable for the structural characterization of samples with high molecular weight and thus can be applied to such assemblies. A significant bottleneck in terms of both effort and time required is the manual identification of unambiguous intermolecular contacts. This is particularly challenging for homo-oligomeric complexes, where simple uniform labeling may not be effective. We tackled this challenge by exploiting coevolution analysis to extract information on homo-oligomeric interfaces from NMR-derived ambiguous contacts. After removing the evolutionary couplings (ECs) that are already satisfied by the 3D structure of the monomer, the predicted ECs are matched with the automatically generated list of experimental contacts. This approach provides a selection of potential interface residues that is used directly in monomer-monomer docking calculations. We validated the protocol on tetrameric L-asparaginase II and dimeric Sod1.


INTRODUCTION

[...] on a single protein sequence and thus on a single MSA. While this simplifies the construction of the alignment, it makes the identification of ECs belonging to inter-molecular contacts much more complicated because such information is hidden among hundreds or thousands of ECs of which the majority are tertiary contacts (dos Santos et [...]

In the present work we developed a protocol to extract information on the protein-protein interface of homo-complexes from SSNMR-derived ambiguous contact lists, which can be automatically generated, using coevolution analysis. All the ECs with a relevant probability to be true residue interactions in either the monomer (intra-monomeric contacts) or in the homo-oligomerization interface (inter-monomeric contacts) are considered.
The removal of intra-monomeric ECs requires the availability of the structure of the monomer. The predicted ECs with possible matches to experimental peaks are used to identify candidate interface residues. The final list of such residues is used directly in protein-protein docking calculations. The same protocol can also be applied using only solution-state NMR data.

RESULTS

Our protocol aims to predict the structure of homo-oligomeric complexes by using ambiguous NMR contacts to identify reliable inter-monomeric contacts within the list of ECs. The whole procedure, which is described in detail in the next section, can be divided into two main parts. First, intra-monomeric evolutionary couplings (ECs) are removed from the list of ECs based on the 3D structure of the monomer. Second, the list of ECs predicted to potentially be at the complex interface is compared with the list of ambiguous NMR contacts to extract all residue pairs matching both the predicted and the experimental dataset. The protocol was validated by predicting the tetrameric structure of Escherichia coli L-asparaginase II (Cerofolini et al., 2019) (PDB ID: 6EOK), in which two distinct dimeric conformations must be recognized to reconstruct the functional complex (Fig. 1). Furthermore, the robustness of the procedure in the identification of complexes with small interface regions was tested by predicting the structure of dimeric human apo Sod1 (Bertini et al., 2009) (PDB ID: 3ECU) (Fig. 1).

Description and application of the protocol

This protocol calculates a list of putative interface residues to be used as input to HADDOCK for docking calculations. It needs four inputs (Fig. 2): one or more files with the list of ECs, the structure of the monomer, the experimental NMR-derived list of ambiguous contacts and the Naccess file (rsa format) with the per-residue relative solvent accessible area.
The ECs of the target protein are obtained from so-called coevolution analysis. A number of servers performing coevolution analysis are available online (see Methods). In general, they need the protein sequence as input to predict a contact map from multiple sequence alignments (MSAs). The output is a list of residue pairs scored for the probability that they are actually in contact in the monomeric or oligomeric structure. We apply a probability cutoff P to remove ECs with low probability of being true interactions. Coevolution analysis usually outputs from hundreds to thousands of ECs that cannot be assigned as intra-monomeric or inter-monomeric contacts without any structural information. As a consequence, our protocol calculates for each EC the corresponding Cα-Cα distance in the 3D structure of the monomer and all the ECs below the distance cutoff of 12 Å are classified as intra-monomeric and removed.

After the removal of intra-monomeric ECs, the resulting list is enriched in contacts across the interaction interface (inter-monomeric ECs [...] inter-monomeric contacts an arduous task. Our protocol overcomes this bottleneck by matching the predicted inter-monomeric ECs with the experimental list to extract information present in both the datasets. In practice, residue pairs in the predicted inter-monomeric EC list are matched to ambiguous assignments in the experimental list, providing a list of interface residue pairs.

The number of residual false-positives in the matched list is further decreased by removing all the residues with a relative solvent accessibility lower than 40% in both main-chain and side-chain (i.e. buried residues). The remaining residues constituting the output list from our protocol can be used directly as ambiguous interaction restraints (AIRs) in monomer-monomer docking calculations with HADDOCK.
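The filtering steps described above can be illustrated with a minimal Python sketch. This is not the script distributed with the SI Appendix: the function name, the argument names and the in-memory data layout are assumptions made for illustration, and only the three thresholds (probability cutoff P, 12 Å Cα-Cα cutoff, 40% relative solvent accessibility) are taken from the text. Parsing of the EC, PDB, NMR and Naccess files is deliberately left out.

```python
import math


def candidate_interface_residues(ecs, ca_coords, nmr_pairs, rsa,
                                 p_cutoff=0.25, dist_cutoff=12.0,
                                 rsa_cutoff=40.0):
    """Return sorted residue numbers proposed as interface residues.

    ecs        list of (res_i, res_j, probability) tuples (hypothetical layout)
    ca_coords  dict: residue number -> (x, y, z) of its C-alpha atom, in Angstrom
    nmr_pairs  set of frozenset({res_i, res_j}) found among the ambiguous
               NMR assignments
    rsa        dict: residue number -> (main_chain_rsa, side_chain_rsa), percent
    """
    # Step 1: keep only ECs at or above the probability cutoff P.
    confident = [(i, j) for i, j, p in ecs if p >= p_cutoff]

    # Step 2: ECs whose C-alpha atoms are closer than the distance cutoff in
    # the monomer are classified as intra-monomeric and dropped.
    inter = [(i, j) for i, j in confident
             if math.dist(ca_coords[i], ca_coords[j]) >= dist_cutoff]

    # Step 3: keep the pairs that also appear among the ambiguous NMR contacts.
    matched = [(i, j) for i, j in inter if frozenset((i, j)) in nmr_pairs]

    # Step 4: discard buried residues, i.e. those below the accessibility
    # cutoff for BOTH main chain and side chain.
    residues = {r for pair in matched for r in pair}
    return sorted(r for r in residues
                  if not (rsa[r][0] < rsa_cutoff and rsa[r][1] < rsa_cutoff))


# Toy example with made-up numbers (not data from this work).
toy_ecs = [(1, 2, 0.9), (1, 50, 0.6), (3, 60, 0.1), (4, 70, 0.5), (5, 80, 0.7)]
toy_ca = {1: (0, 0, 0), 2: (3.8, 0, 0), 50: (20, 0, 0), 3: (0, 5, 0),
          60: (0, 30, 0), 4: (0, 0, 5), 70: (0, 0, 40), 5: (1, 1, 1),
          80: (30, 30, 30)}
toy_nmr = {frozenset((1, 50)), frozenset((5, 80)), frozenset((1, 2))}
toy_rsa = {1: (50, 10), 50: (5, 60), 5: (10, 20), 80: (45, 45)}
print(candidate_interface_residues(toy_ecs, toy_ca, toy_nmr, toy_rsa))
# prints [1, 50, 80]
```

In the toy data, pair (1, 2) is dropped as intra-monomeric, pair (4, 70) has no NMR match, and residue 5 is removed because both its main chain and side chain are below 40% accessibility.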
The protocol can be run using the Python script provided as supplementary material (SI Appendix).

We assessed the accuracy of the protocol in predicting residues at the homo-oligomeric interface for different probability cutoffs (Tables 1 and 2). Furthermore, we evaluated the NMR data contribution to the prediction accuracy by comparing the results obtained with or without matching with the NMR data ("ECs + NMR" and "ECs only", respectively). A residue accurately predicted at the complex interface is defined as a true-positive (TP) residue. Specifically, a TP residue has at least one atom at a distance < 7 Å from any atom located on a different chain in the crystal structure of the complex.

In the case of the L-asparaginase II protein, the crystallographic complex is formed by four subunits with D2 symmetry. Thus, the ensemble of all TP residues contains the amino acids at both dimeric interfaces. For this system, the inclusion of NMR data enhances the positive predictive value (PPV), defined as true-positive (TP) residue predictions over all predictions [TP/(TP+FP)], at all the probability cutoffs assessed (Table 1). In fact, on the basis of the "ECs only" analysis the absolute number of TP residues present in the prediction is significantly higher than the number of TPs obtained after the match with NMR data. However, the same analysis also outputs a much greater number of FPs. Consequently, the "ECs + NMR" analysis features a PPV of 100% for P >= 0.35; the PPV remains very high (>= 80%) even at low probabilities (P < 0.35) and the number of predicted interface residues is sufficient to successfully drive docking calculations (see next section).

Table 1. Number of residues predicted to make contacts across the L-asparaginase II homomeric interface.
The protocol was applied as depicted in Figure 2 with the ECs matched with the NMR data ("ECs + NMR") and without the matching step with NMR data ("ECs only"). P indicates the probability threshold used to accept ECs. PPV = TP/(TP+FP).

Figure 2. Scheme of the protocol adopted to predict the structure of homo-oligomeric complexes using coevolution analysis and ambiguous NMR contacts.

In contrast, the Sod1 complex contains two subunits with C2 symmetry and a small protein-protein interface. As a consequence, in the central part of the interface the inter-monomeric contacts involve residue pairs that are also at an intra-monomer distance smaller than the 12 Å threshold that we used to remove intra-monomeric ECs. In practice, this structural organization significantly reduces the number of detectable TPs because the aforementioned inter-monomeric contacts are discarded. Furthermore, small interfaces are harder to predict computationally and also provide a lower number of NMR-detectable contacts. All these features make the Sod1 system challenging but useful to test the limits of the protocol. When considering the Sod1 protein, the "ECs only" protocol yielded a reasonable PPV for P >= 0.55, but with only a handful of TPs in the prediction (Table 2). Instead, the match with NMR data removed the signal for P >= 0.45 while retaining information at lower P values, especially for P = 0.30.

These results suggest that the quality of the initial EC prediction is quite important for the performance of our protocol, leading to a larger enhancement of the PPV when the prediction includes a larger number of TPs. When the EC data are weaker and mixed with noise, our protocol retains a good part of the available information but the PPV is mostly unchanged.

HADDOCK calculations for L-asparaginase II

The ECs at the P cutoff of 0.25 were matched with a solid-state 2D ¹³C-¹³C DARR dataset (mixing time 200 ms) holding 4937 ambiguous assignments, resulting in 19 surface residues predicted to be at the protein-protein interface (corresponding to 14% of the whole protein surface). The final 200 water-refined models generated by HADDOCK were analyzed by measuring the RMSD from the structure with the lowest HADDOCK score. The clustering algorithm grouped the models in 7 clusters (Fig. 3A). The first cluster was the most populated and included the models with the lowest score.
Indeed, the lowest HADDOCK score model of the first cluster was a dimer with an RMSD of 0.7 Å from the crystallographic dimer formed by chain A and chain C of the tetrameric protein (Fig. 3B). In addition to the HADDOCK score, the desolvation energy calculated using empirical atomic solvation parameters proved to be a useful scoring function (Fernández-Recio et al., 2004), allowing the identification of the correct A-C dimer (Fig. S1).

Both the predicted inter-monomeric ECs and the experimental NMR inter-monomeric contacts include residue pairs belonging to all the pairs of chains effectively in contact in the functional complex. In the case of the tetrameric L-asparaginase II, besides the largest A-C interface, chains A and D also share a relevant number of contacts. Accordingly, in a single docking run one might expect to sample both relevant dimeric configurations (A-C and A-D) in two different clusters. Indeed, by checking the position of the 19 predicted interface residues within the crystal structure, it appears that the A-C and A-D interfaces were both mapped (Fig. 4). In fact, the largest portion of residues effectively in contact belonged to dimer A-C and the smallest portion to dimer A-D. However, the structural configuration present in the other clusters did not correspond to the A-D dimer. This could be easily verified by observing that the superimposition of the two dimers on the common chain A resulted in evident steric clashes between the subunits, as shown for cluster 3 (Fig. 5). If the two dimers actually corresponded to the A-C and A-D dimers of the tetrameric structure, the superimposition on the A chain would have caused no significant clashes.

In principle, the absence of the second compatible dimer in calculations can be due to two reasons. First, the interface residues belonging to the second configuration were not present in the AIRs dataset. Second, the residues belonging to the second interface region were present, but the correct structural configuration had a HADDOCK score worse than the wrong sampled configurations. In the present case, the latter reason was the relevant one. In fact, the wrong dimer models in general contained some contacts from both interface regions, thus satisfying a higher number of AIRs than the correct dimer A-D.

To obtain a model of the A-D dimer, we performed a second docking run in which the restraints already satisfied in the best cluster (containing the most favored configuration) of the first run were removed from the input dataset. To this end, we looked at the violation analysis of HADDOCK, and retained all contacts that were not satisfied by the majority of the members of the first cluster by at least 3 Å. This resulted in 9 residues being used as input to a second monomer-monomer docking run. As in the previous calculation, the first cluster was the largest and contained the models with the best HADDOCK score and desolvation energy (Fig. 6A and S2). Superimposing the lowest HADDOCK score water-refined model with the crystal structure resulted in an RMSD of 0.9 Å from the dimer A-D (Fig. 6B).

In summary, the two correct dimeric conformations A-C and A-D were obtained by performing two distinct docking runs, the first one with the whole AIRs dataset and the second one with the subset resulting from the removal of the AIRs satisfied in the best cluster of the first run. Crucially, this procedure provided us with two compatible non-overlapping dimeric models that, by symmetry, can be used to reconstruct the tetrameric model (Fig. 7). This step strictly depended on the correct identification of the structural model on which the distance violation analysis was carried out. In fact, selecting the third cluster of Fig. 3 to perform the violation analysis instead of the best one resulted in a second docking run that sampled again the dimer A-C in the two best clusters and incompatible structural configurations in the others (Fig. S3).

Extracting the monomer from the PDB of the complex results in a protein model with the side chains oriented in a contact-ready state that favors the correct assembly, in terms of both docking score and RMSD from the experimental structure, as compared to incorrect docking poses. Thus, to test our protocol in a more realistic condition we generated 15 homology models of L-asparaginase II using the structure of the homolog from Wolinella succinogenes (Lubkowski et al., 1996) as the structural template (PDB ID 1WSA, chain A). The homology models had a backbone RMSD lower than 1 Å from the crystal structure of the E. coli protein, but differed widely in the orientation of the surface side chains. Each model was used in protein-protein docking with the same input AIRs of the "crystal P 0.25" runs, for both the A-C and A-D dimers. The results of Table 3 show the significant influence of the orientation of side chains on the ability of the docking calculations to sample the correct dimer in the best cluster. Based on the HADDOCK score of the best cluster for each model, the A-C runs showed that the five runs with the best score also had the lowest RMSD from the crystal A-C dimer (green gradient in the table). However, for these five models the second calculation with the AIRs providing the A-D dimer resulted in wrong dimeric conformations. Nevertheless, by inspecting the results for all models (Table 3), it turned out that the runs with the best HADDOCK scores (for their first clusters) indeed provided conformations close to the crystallographic A-D dimer (in particular models 6 and 15).
For further comparison, we performed a docking run of the crystallographic monomer with the 34 residues (25% of the whole protein surface) output by the protocol run at a P cutoff of 0.20. Replacing the AIRs dataset with a larger one having the same PPV did not significantly affect the results.

Overall, the results described above pointed out the importance of generating a sufficiently large number of homology models to sample many different side chain orientations, thus increasing the probability of capturing the orientation permitting residue-residue contacts across the monomeric interface. The best clusters of the two crystal runs showed that ideal side chain orientations provided the top HADDOCK score values. In line with this, the models that had the best HADDOCK scores resulted in the configurations closest to the crystal structure, with a backbone RMSD between 1 and 3 Å from it. For these models, the HADDOCK scores themselves were similar to the values observed for the runs starting from the crystal monomer. Indeed, superimposing on chain A the A-C dimer of model 13 and the A-D dimer of model 15 or model 6 showed two compatible dimeric models that, taken together, can be used to reconstruct the tetrameric structure (Figure S4).

Table 3. Docking results for homology models of L-asparaginase II. The two "Crystal" runs were performed using chain A of the crystal structure. The models mainly differ in the orientation of side chains. For each run the HADDOCK score of the best cluster (calculated as the average value of the 4 best structures of the cluster) and the RMSD of its best structure from the experimental dimer are reported.

The predicted inter-monomeric ECs at P = 0.30 were matched with 7611 ambiguous assignments from a solution-state 3D ¹H ¹⁵N NOESY-HSQC spectrum. The protocol yielded 18 putative interface residues, corresponding to 23% of the whole monomer surface. By comparing the prediction to the crystal structure, it appeared that 7 out of 18 residues effectively formed inter-monomeric contacts (Fig. S5).

From the docking calculation starting with the crystal monomer we obtained 7 clusters with comparable HADDOCK score values (Fig. 8A). However, the distribution of the desolvation energies discriminated the second cluster as the most favored (Fig. 8B). Indeed, the structural alignment of the best model of this cluster with the experimental dimer revealed an impressive RMSD of 0.6 Å (Fig. S6A). Instead, the same superimposition on the crystal structure of the first cluster resulted in a dimer in which one of the two monomeric units was rotated by 180° with respect to the corresponding experimental monomer, while preserving the same interface region (Fig. S6B).

DISCUSSION

Solid-state NMR is an attractive technique to study large protein assemblies as even systems with high molecular weight can provide very good spectra. However, the determination of their 3D structure involves two very time-consuming steps: the assignment of the side chains in contact at the interface between the subunits and, for homo-oligomeric complexes, the discrimination of intra- vs inter-monomer contacts. In particular, the correct identification of inter-monomer contacts usually requires extensive efforts by an experienced user. From the bioinformatics point of view, focusing on homo- rather than hetero-oligomers makes the interpretation of coevolution signals [...]

the coevolution analysis of homo-oligomers is based on a single protein MSA, which is relatively effortless to build. Unfortunately, the availability of the three-dimensional structure of the monomeric unit is necessary to successfully separate intra-monomeric and inter-monomeric ECs (Uguzzoni et al., 2017). In this work, we developed a protocol to integrate ECs with NMR-derived ambiguous contacts in order to identify interface residues in homo-oligomers. The input lists of ambiguous contacts can be automatically generated from appropriate solution or solid-state NMR spectra. Our protocol was validated by predicting two difficult cases: the tetrameric L-asparaginase II, in which two distinct dimeric conformations must be recognized to reconstruct the functional complex, and the dimeric Sod1, in which the interface region is comparatively small.

The correct identification of interface residues was readily verified by comparing the output of the protocol with the known interfaces in the crystal structures of the two systems (Tables 1 and 2). This analysis showed that NMR data can be beneficial by enriching the predictions in true contacts (i.e. higher PPV). This improvement comes at the cost of reducing the absolute number of predicted residues, which however did not limit the subsequent docking calculations. The requisite for the integration of ECs and NMR data to be successful is that the initial list of potential inter-monomeric ECs contains enough information. This is clearly exemplified by the case of Sod1, for which the absolute number of predictions, after removing all contacts that could be satisfied within the monomer, was quite low. Consequently, many NMR signals could not be matched and the benefit in PPV was modest. Nevertheless, when the total number of predicted interface residues is in a reasonable range (15%-20% of all surface residues, i.e. 12-16 residues for Sod1) the prediction resulting from the integration of ECs and NMR data is more reliable than that based only on ECs.

To generate a 3D structural model of the oligomer, the output of our protocol can be exploited in docking calculations. As a proof-of-principle, we ran these calculations starting from the monomer conformation observed in the crystal structure. This is an ideal case, where all the side chains at the protein-protein interface are already in the correct rotameric state to engage in the formation of the complex. All the same, it is important to perform this step to ensure that the output contains enough information to successfully drive the docking. This was indeed the case for the main dimer of L-asparaginase II (A-C) as well as for Sod1. The calculation with the complete AIR dataset could not identify the A-D dimer even though the dataset contained contacts belonging to both interfaces. The A-D interface is somewhat smaller than the A-C interface; as HADDOCK aims to satisfy the highest number of AIRs, the situation where the second chain of the dimer is positioned in between the two interfaces, thus partly satisfying both subsets of AIRs, is favored over the situation in which all of the A-D and none of the A-C AIRs are satisfied. To circumvent this bottleneck, it is necessary to separate the residues belonging to each interface. This was done by removing the contacts already satisfied in the first docking calculation to run a second calculation only with the unsatisfied restraints. The best cluster of the second run indeed matched closely the A-D dimer of the tetramer (Fig. 6). Intriguingly, the AIRs derived from ECs only at a P cutoff of 0.8 (Table 1), whose number was similar to the number of AIRs used in the "ECs + NMR" calculations, did not contain information on the A-D dimer interface (not shown).
Thus, the information provided by ECs at high levels of confidence is not balanced over the two interfaces, presumably due to the evolutionary history of the system. This makes it necessary to use data at lower P cutoffs, which are efficiently filtered by the ambiguous contacts provided by solid-state NMR. The experimental data in fact contain information on both interfaces and thus are useful to extract both sets of true contacts from the list of ECs.

In a more realistic scenario one would use a homology model of the monomer as the input structure to docking calculations. We tested this scenario by generating 15 different models of L-asparaginase II (Table 3) and using the same input AIRs used in the docking of the crystal monomer for all calculations, so that the structure was the only source of variability. For the A-C dimer, we observed that in four cases the best model of the adduct was within 2 Å from the crystal structure, while an additional calculation provided a model with an RMSD of 2.2 Å. The A-D dimer resulted in a similar situation, with two structures within 3 Å and another two at 3.2 Å. Remarkably, there was a very good correlation between the HADDOCK score and the RMSD, allowing the more accurate models to be identified quite straightforwardly. It is also noteworthy that the best results obtained with the homology models had scores close to those obtained with the crystal monomer, which can be reasonably assumed to represent the best possible score. It thus appears that sampling a relatively extensive ensemble of different conformations is an important factor to obtain accurate models of the oligomer in a real-life setting.

In summary, our protocol allowed us to predict homo-oligomeric structures in multimers and in the presence of a small homodimerization interface.
Notably, this goal was achieved with minimal user effort, making the determination of the 3D structure of the complex faster than using experimental data alone. The only parameter that must be decided by the user is the probability cutoff P below which the ECs are removed. In our hands, selecting a P cutoff such that the number of predicted interface residues was 15%-20% of the number of surface residues in the monomer worked well.

Computational aspects

The protocol described in the "Results" section can be carried out by running the Python script provided (SI Appendix). The script needs four inputs: the ECs files, the PDB structure of the monomeric protein, the experimental ambiguous NMR contacts list and the Naccess file (rsa format) with the relative solvent accessibility of the residues. Details about input preparation, script steps, and the docking protocol adopted for L-asparaginase II and Sod1 are described below.

The ECs for both proteins were collected using 3 servers available online: Gremlin (Ovchinnikov et al., 2014) [...] as an excellent threshold in the selection of true contacts across the interface (Uguzzoni et al., 2017).

The experimental procedure for the generation of the ambiguous NMR contacts list is described in the next section.

The per-residue relative solvent accessible area for both main chain and side chain was calculated with Naccess (Hubbard and Thornton, 1993). Our Python script requires the Naccess file in the rsa format to automatically remove all the residues with a relative solvent accessible area below 40% for both the side chain and the main chain.

The monomer-monomer docking calculations were carried out with the HADDOCK software (Dominguez et al., 2003). The residues chosen to drive the docking run were given as active residues (directly involved in the interaction) to generate ambiguous interaction restraints (AIRs) with the default upper distance limit of 2 Å.
The water-refined models were clustered based on the fraction of common contacts (Rodrigues et al., 2012), with FCC = 0.75 and a minimum cluster size of 4. For the docking runs starting from crystal structures, chain A was used as the input monomer. The number of models generated for each step of the HADDOCK docking procedure was set as follows: 10000 for rigid-body energy minimization, 400 for semi-flexible simulated annealing and 400 for refinement in explicit solvent. The distance violation analysis was performed on the best cluster and the corresponding output written in the ana_dist_viol_all.lis file. In this file we selected all the residues with a violation larger than 3 Å to generate a subset of AIRs to drive a second docking run. The second docking run was performed using exactly the same conditions as the first one.

We generated 15 models of monomeric E. coli L-asparaginase II with Modeller (Eswar et al., 2007), using the structure of Wolinella succinogenes L-asparaginase (Lubkowski et al., 1996) as a template (PDB ID 1WSA, chain A). The two proteins have 55% sequence identity. The resulting template-based models featured a very similar backbone conformation, with an RMSD lower than 1 Å from the E. coli crystal structure, but different side chain orientations. Each model was assessed in protein-protein docking using the same AIRs used in the "crystal P 0.25" runs, with all the AIRs (A-C dimer calculation) and after the removal of the ones already satisfied by the A-C dimer (A-D dimer calculation), respectively. The number of models generated for each step was reduced as follows: 1000 for rigid-body energy minimization, 200 for semi-flexible simulated annealing and 200 for refinement in explicit solvent.

All the RMSD values reported in this work were measured on the Cα atoms.
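The selection of restraints for the second docking run, described earlier in this section, can be sketched in Python as follows. This is an illustrative assumption rather than the procedure actually coded by the authors: the layout of the ana_dist_viol_all.lis file is not reproduced here, so the sketch starts from a hypothetical dictionary of per-model violations that the reader would have to fill by parsing the HADDOCK output. The function name and the majority rule over cluster members follow the wording used in the Results, with the 3 Å threshold taken from the text.

```python
def residues_for_second_run(violations_per_model, threshold=3.0):
    """Select residues whose restraints stay unsatisfied in the best cluster.

    violations_per_model  dict: residue number -> list of distance violations
                          (Angstrom), one value per member of the best cluster
                          (hypothetical layout, not the HADDOCK file format)
    threshold             violation (Angstrom) above which a restraint counts
                          as unsatisfied in a given model
    """
    selected = []
    for residue, violations in violations_per_model.items():
        # Count the cluster members in which the restraint is violated.
        n_violated = sum(1 for v in violations if v > threshold)
        # Keep the residue only if that is true for the majority of members.
        if n_violated > len(violations) / 2:
            selected.append(residue)
    return sorted(selected)


# Toy example with made-up violations for a four-member cluster.
toy_violations = {
    10: [0.0, 0.5, 0.2, 0.1],   # satisfied everywhere
    25: [4.1, 3.5, 5.0, 2.0],   # violated in 3 of 4 models
    31: [3.2, 0.0, 0.0, 3.1],   # violated in exactly half: not a majority
    48: [6.0, 6.2, 5.9, 7.1],   # violated everywhere
}
print(residues_for_second_run(toy_violations))
# prints [25, 48]
```

The residues returned by such a filter would then be supplied as active residues to the second run, with all other docking settings unchanged.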

Solid- and solution-state NMR data

The L-asparaginase II protein [U-¹³C, ¹⁵ [...] respectively; 128 scans were acquired; the inter-scan delay was set to 1.5 s in all the experiments.

All the spectra were processed with the Bruker TopSpin 3.2 software package and analyzed with the program CARA (Keller, 2007).

The assignment of the carbon resonances of the 2D ¹³C-¹³C DARR spectra of rehydrated freeze-dried ANSII was easily obtained by comparison with the 2D ¹³C-¹³C DARR spectrum collected on the crystalline and PEGylated preparations of L-asparaginase II (Cerofolini et al., 2019; Ravera et al., 2016).

The experimental data used for the Sod1 protein were taken from deposited solution-state 3D ¹H-¹⁵ [...]

Financial support was provided by the European Commission (project no. 777536). We thank Prof. Gaetano Montelione for many useful discussions.

The python script to perform the protocol can be downloaded at the following LINK.