Cancer-related Mutations with Local or Long-range Effects on an Allosteric Loop of p53

The tumor protein 53 (p53) is involved in transcription-dependent and independent processes. Several p53 variants related to cancer have been found to impact protein stability. Other variants, on the contrary, might have little impact on structural stability and have local or long-range effects on the p53 interactome. Our group previously identiﬁed a loop in the DNA binding domain (DBD) of p53 (residues 207–213) which can recruit different interactors. Experimental structures of p53 in complex with other proteins strengthen the importance of this interface for protein–protein interactions. We here characterized with structure-based approaches somatic and germline variants of p53 which could have a marginal effect in terms of stability and act locally or allosterically on the region 207–213 with consequences on the cytosolic functions of this protein. To this goal, we studied 1132 variants in the p53 DBD with structure-based approaches, accounting also for protein dynamics. We focused on variants predicted with marginal effects on structural stability. We then investigated each of these variants for their impact on DNA binding, dimerization of the p53 DBD, and intramolecular contacts with the 207–213 region. Furthermore, we identiﬁed variants that could modulate long-range the conformation of the region 207–213 using a coarse-grain model for allostery and all-atom molecular dynamics simulations. Our predictions have been further val-idated using enhanced sampling methods for 15 variants. The methodologies used in this study could be more broadly applied to other p53 variants or cases where conformational changes of loop regions are essential in the function of disease-related proteins.


Introduction
The TP53 gene, and its protein product p53, is best known as the guardian of the human genome 1 but recently also as the guardian of cellular homeostasis. 2 However, while p53 is predominantly known as one of the most important tumor suppressors, it can support cell survival at the benefit of cancer cells in certain scenarios, implying that p53 has a more complex role at the cellular level than previously assumed. This complexity may be investigated by looking at p53 as a dual-role gene, i.e., a gene that acts in a contextdependent manner as a tumor suppressor or tumor promoter. [3][4] The p53 protein has remarkably diverse functions in the cell, ranging from transcription-dependent (mostly associated with its nuclear localization) and transcription-independent (mostly cytosolic) functions. [5][6] For example, p53 transcription-independent functions that depend on its cytosolic pool [7][8] are related to mitochondrial apoptosis, 8 autophagy 9 or accumulation of Ca 2+ ions within the endoplasmic reticulum. 10 In normal cells, the p53 protein is generally expressed at a low level, and it is often post-translationally modified and stabilized upon stimuli such as DNA damage, cellular stresses, and other alterations. 6,11 The complexity of p53 functions and behavior suggests that mutations can have heterogeneous consequences depending on the context and cellular compartment. TP53 is among the most frequently mutated genes in human cancers, [12][13] with a high occurrence of missense mutations, which can result in loss of its transcriptional activity or gain-of-function (GOF).
Recent research has highlighted that cancerpredisposing syndromes are a major cause of malignancies development, especially in pediatric patients. [14][15][16][17] Germline variants in the TP53 gene cause the so-called Li-Fraumeni Syndrome (LFS), 18 which is often underdiagnosed, and approximately one-third of the affected people develop cancer during their childhood. [19][20] Mutant variants of p53 have been claimed to be undruggable for a long time, but recent studies suggest the opposite [21][22] and pave new directions for the possibility of rescuing structural mutants of p53.
p53 is a multi-domain protein, and the vast majority of mutations found in human cancers are located in the DBD (residues 101-292 as reported in UniProt). 13 P53 DBD features an immunoglobulin-like b-sandwich fold with several disordered loops, including those for DNA binding. The L1 loop (residues [113][114][115][116][117][118][119][120][121][122][123] can adopt different conformations that are more (extended conformations) or less (recessed states) favorable for interactions with DNA through K120. 23 p53 DBD evolved to be characterized by marginal stability. 24 The effects exerted by p53 variants at the structural and functional levels can be different and context dependent. Some alterations might impact interaction with DNA or other biological partners, posttranslational modifications (PTMs), or alter structural stability. 21 Other variants can have more elusive and long-range effects, acting as allosteric variants. [25][26][27][28][29][30] It cannot be ruled out that some variants can affect different properties at the same time, i.e., influencing the structural stability and exerting local effects on binding or more complex longrange mechanisms. 31 In previous work, we illustrated the importance of the dynamics and conformations of the S6-S7 loop (i.e., the loop connecting the beta-strands 6 and 7 of the DBD) to recruit different binding partners and for longrange communication with the DNA binding loops. 25 We also highlighted the importance of the disordered region upstream to the DBD, i.e., residues 91-100, in modulating the conformation of the S6-S7 loop. We thus focus our investigation on a construct including the DBD and the disordered region (namely p53  ). Moreover, molecular dynamics (MD) simulations of destabilizing mutants of p53 DBD suggested that open states of the S6-S7 loop can form a pocket that is likely to be druggable. 32 Here, we studied 1132 variants in p53  for their effect on the allosteric communication regulating the conformation of the S6-S7 loop of p53. As a consequence, these variants are likely to reshape the interaction preferences of the protein or the long-range communication to the DNA binding site. A schematic workflow of the study is reported in Figure 1.

Overview of the variants investigated in this study
We aggregated all the variants that have been found in the p53  and reported in cancer genomic databases, including CBioPortal and COSMIC (see Materials and Methods), for a total of 1091 different variants (Table S1, Figure 2(A)). All the positions of the DBD have been found to be mutated in the aggregated dataset. We compared these mutations to 1098 somatic and 237 germline missense variants reported in the TP53 database 33 and we observed that the majority overlap with somatic variants (Figure 2(A)). Nevertheless, more than 150 variants have been found both as somatic and germline variants, depending on the cohorts under investigation (Figure 2(A)). 215 variants in our aggregated dataset from cBioPortal and COSMIC were not reported in the TP53 database. We also noticed that the majority of the germline variants overlapped with the somatic ones and seven of them were not found in our aggregated dataset (i.e., N210Y, R290H/L/C, K292I, F134Y, and K291R).
Comparing these data with 118 variants reported in gnomAD 51 for p53, we also identified three additional variants occurring with low frequency in the population that have not been reported in any of the datasets mentioned above (i.e., S116A, L114S, and G112A) which we also included in this study.

Selection of structures to use in the study
We scrutinized 31 available structures for the p53 DNA binding domain (DBD) and selected highresolution ones in free or DNA-bound states for structure-based calculations (Table S2). We refer to p53 in the different assemblies as p53 91-289_monomer and p53 91-289_dimer . In particular, we used the monomeric state as a convenient minimalistic model to understand the effect of the mutations on the DBD, as we did in our previous study. 25 Additionally, we studied the monomeric state in complex with DNA (DNA:p53 91- Figure 2. Overview of the datasets used in the study. (a) The Venn diagram illustrates the source of the cancer variants. The majority of mutations come from COSMIC, cBioPortal and the TP53 database. Ten variants are not covered by these, including N120Y, R290H, R290L, R290C, K292I, F134Y, S116A, L114S, and G112A. Furthermore, the Venn diagram depicts how the same variants have been found either as somatic or germline, depending on the study. The encircling bar chart represents the sequence of the p53 DBD starting with tryptophan at position 91 and ending with leucine at position 289. The height of the bar indicates the number of variants at that site in the datasets. G245, the position with the highest number of alterations, has 13 observed amino acid substitutions. Furthermore, sites of interest for this study such as the L1 loop, S6-S7 loop, and zinc-binding sites are marked on the chart and colored in grey. (b) The flow diagram in panel b illustrates the workflow for the selection of the experimentally determined structures to use for the calculations carried out in this study. We aimed to select: i) a structure to use to represent the monomeric state of p53 DBD, ii) a structure to use in complex with DNA, iii) and a structure representative of the dimeric assembly observed when p53 is localized in the cytosol. We selected 29 out of 221 structures for further inspection and we selected the PDB entries 2XWR (chain A), 3KZ8 (chain B), and 2OCJ (chain B and D).  . Comparison of methods used to predict effects on structural stability. We evaluated the performances of the methods and influence of the initial structure on the prediction of changes in folding free energy upon mutation of p53 DBD. To this goal, we compared the predicted values to experimentally determined ones, selecting variants of p53 that are reported in the ThermoMutDB database. (a) Illustrates how nine of the mutations have multiple estimated experimental values, depicting the variability in the experimental results. (b) The panel shows the average DDG experimental values for each of the ThermoMutDB database mutations. It is these average values used for the comparison. (c) The panel illustrates the linear relationship as Pearson correlation between the experimental values and the predicted values of the five different combinations: FoldX calculations based on a molecular dynamics (MD) ensemble, the X-ray structure, the X-ray structure after refinement with PDB-REDO, along with Rosetta calculations based on an X-ray structure and the structure after PDB-REDO refinement. The calculations based on FoldX and Rosetta using the MD ensemble and the X-ray structure respectively featured the highest Pearson coefficients. (d) We evaluated the performance of the different combinations. In this case, we divided in classes of effect, i.e., stabilizing (DDG < À1.2 kcal/mol), neutral (À1.2 DDG 1.2 kcal/mol), destabilizing (1.2 < DDG 3 kcal/mol), and highly destabilizing (>3 kcal/mol). The evaluation is not solely based on correlation and introduces flexibility. FoldX has the best accuracy for this case. 5 289_monomer ) to assess the impact of the mutations on DNA binding. We used the dimeric state to simulate the conformation in the cytosol, where the favored state for p53 appears to be a dimer, according to cellular studies 52 in which the two monomers interact together in a conformation different from the ones observed in the DNA-bound tetramer of p53. [52][53] The process of selection is illustrated in Figure 2(B).
Evaluation of the approaches used for free energy calculations to predict the effects of mutations on structural stability Here, we aim to identify mutations that could alter the conformation of the S6-S7 loop either through local or distal (allosteric) effects and, in turn, affect the function of the loop in engaging in proteinprotein interactions to modulate the transcriptionindependent functions of p53. 25 In the first part of the work, we aimed to identify mutations that are expected to have a marginal effect on the protein structural stability of p53 91-289_monomer and, as such, could be more meaningful for the characterization of their functional impact. We selected two different methods based on different energy functions and sampling approaches to estimate changes in folding free energy (i.e., FoldX and Rosetta, see Materials and Methods).
Before performing the free energy calculations to select the variants to retain in the study, we evaluated the accuracy of the predictions against a pool of p53  variants extracted from the ThermoMutDB database, 54 which reports experimentally determined unfolding DDG values. We identified 43 annotated variants with single mutations. Of these variants, 35 correspond to mutations that we also identified in the cancer datasets described above (Table S1). We identified nine mutations with at least two different experimental measurements for the same mutation (Table S1), which allows us to appreciate the differences in the estimates at the experimental level ( Figure 3 (A)). We could observe that some discrepancies in the experimentally determined values for these substitutions are likely due to differences in the experimental protocols or in the biophysical methods. Overall, the standard deviations for these experimental measurements were in the range of 0.01-1 .21 kcal/mol, which suggests a large spread in the values. To avoid potential over-reliance on the singular experimental values, we used the average value for each amino acidic substitution (Figure 3 (B)).
Furthermore, we aimed to assess the accuracy of different approaches to predict changes in folding DDGs, along with the influence of using different initial structures for the calculations. Thus, we compared the predicted and experimentally determined DDG values for the 43 mutations reported in ThermoMutDB. The initial structures were an X-ray structure covering p53 91-289 , the same structure upon refinement with PDB-REDO 55 X-ray PDB-REDO , and an ensemble of 20 structures from a Molecular Dynamics (MD) simulation (see Materials and Methods). We collected the following calculations: i) scans with the Rosettabased free energy function for the X-ray and Xray PDB-REDO structures, ii) scans with the FoldX free energy function for the X-ray, X-ray PDB-REDO , and the MD ensemble. The MD ensemble was used exclusively for FoldX scans to evaluate if it was possible to compensate for limitations with backbone remodeling of FoldX using an average over an ensemble of structures. The accuracy of the predictive methods was evaluated by the linear relationship between experimental measurements and predictive estimation (Figure 3(C)). According to the values reported in Figure 3(C), we observed that: i) there is no clear advantage of applying the PDB-REDO refinement to repair the X-ray structure before the mutational scans of the p53 91-289_monomer structure used in this study, ii) using an MD ensemble yields better results than a single structure with FoldX and, finally (iii) while the Rosetta algorithm on the X-ray structure generates a better correlation than any other method, the algorithm cannot be deemed superior under these conditions due to the low correlation found on the X-ray PDB-REDO structure. In light of the variability observed among the experimentally derived values (Figure 3(A)) and the fact that the empirical energies functions used in the study generally properly recapitulate differences observed for classes of effects rather than exact DDG values, 56 we classified the mutations found in ThermoMutDB in four different classes: i) stabilizing (DDG < À1.2 kcal/mol), ii) neutral (À1.2 DDG 1.2 kcal/mol), iii) destabilizing (1.2 < DDG 3 kcal/mol), and iv) highly destabilizing (>3 kcal/mol) (Table S1). We then applied the same criteria to the predicted values from Rosetta or FoldX. Introducing classes based on threshold values offers some flexibility and may help overcome the limitations due to variability in the predicted and experimental values. We then evaluated the prediction accuracy according to this classification ( Figure 3 (D)) and observed that FoldX performed slightly better than Rosetta on p53 in this application.
Hence, the validity of the predictive values has been evaluated based on multiple methods and structural starting points. While regression correlations and accuracy measures of a confusion matrix are not comparable, both approaches illustrate reasonable predictive powers indicating that both FoldX and Rosetta can be applied for this study. Based on these analyses, we excluded the combination of the Xray PDB-REDO structure and the Rosetta energy function due to the poor agreement with the experimental data from further analyses.
Lastly, we assessed the extent to which each of the remaining methods classified the mutations in the same way. Accordingly, each amino acid substitution was classified according to its associated change in free energy using the aforementioned criteria. A majority vote then determined a single class for the mutation. Based on these two pieces of information, we calculated a fraction indicating the level of agreement leading to the majority vote classification. Given the four methods, this fraction could be either 0.5 for a 2/4 majority, 0.75 for a 3/4 majority, or 1.0 for a complete agreement. We also calculated the percentage of cases in each class fraction to evaluate if method agreement is higher in one class than another. According to this analysis, the methods agree when it comes to the distinction between neutral and highly destabilizing mutations, while the stabilizing and destabilizing mutations are more difficult to discriminate from the neutral mutations ( Figure S1). This result is also in line with the fact that FoldX energy function suffers from limitations in predicting stabilizing mutations. [57][58][59] Identification of p53 variants found in cancer studies with impact on structural stability In light of the comparison with the experimental data, we used the saturation scans to extract information on all the mutations reported in Figure 4(A). For this part of the work, we relied on the results from the MD ensemble with FoldX which is in better agreement with experimental data than using single conformations ( Figure 3 (C)), and the results with Rosetta on the X-ray structure. We thus classified each of the p53 mutations collected in the datasets reported in Figure 2(A) in these four categories: stabilizing, neutral, destabilizing, and highly destabilizing for each of the two approaches mentioned above (Table S1). As mentioned above, our goal was to identify mutations with the potential to alter the transcription-independent functions of p53 through local or long-range effects on the conformation of the S6-S7 loop (Figure 4(B)). In this context, for mutations that caused an increase in folding DDGs, the destabilization of the protein structure is expected to be the predominant signature at the cellular level and hide other functional effects due to changes in protein-protein interactions. In consideration of this, we applied strict selection criteria and retained only those mutations predicted to be neutral (or stabilizing) from both FoldX and Rosetta. Accordingly, we identified 395 likely neutral variants for the protein structural stability (Table S1). These mutations are distributed over 143 residues of the protein (Figure 4(B)).
Furthermore, we used the MD ensemble to derive a Protein Structure Network based on atomic contacts (acPSN) representation and identified residues with hub behavior, [60][61] as detailed in the Materials and Methods. Hub residues in a PSN are known to be important hotspots for structural stability or key residues in long-range structural communication. We thus evaluated if each of the selected 143 mutation sites has a hub behavior (Figure 4(C)). 45 sites are predicted as hub residues but the majority with a degree of 3 or 4. Some of the mutations at these sites might not affect this property as they could retain similar capabilities to engage in atomic contacts (for example mutations to phenylalanine at Y163, Y205, Y234, and Y236) and should be re-assessed upon analyses of MD simulations of each mutant variant (see below).

Local effect of the mutations on the DNA binding interface
In the nucleus, p53 binds DNA to activate or inhibit transcription. In this compartment, the protein is mostly found in its tetrameric state, where each monomer binds its DNA target. [62][63][64][65][66][67] We used a minimalistic model for the interaction between a monomer of p53 and DNA ( Figure 2 (B)) that resembles the interaction of a p53 mono- On the right, the saturation mutagenesis scans are illustrated as a heatmap for the two loops as an example. The full set of data is reported in the GitHub repository associated with the publication. Highly destabilizing amino acid substitutions for structural stability are highlighted in yellow, while dark blue indicates mutations that are predicted to be stabilizing the folded structure of the protein. These saturation scans are used to extract predicted DDG values for all the possible variants of p53 DBD, including the ones of interest for the study (Figure 1). (b) The panel shows the predicted DDG values with Rosetta or FoldX along the protein sequence of the p53 DBD. The goal was to identify and retain for further analyses alterations with marginal effects on structural stability. The red line indicates the threshold defined to discriminate between possible neutral and destabilizing amino acidic substitutions (i.e., 1.2 kcal/mol). We selected variants from the aggregated datasets of Figure 1 that were classified as neutral in both the FoldX and Rosetta datasets, resulting in 395 variants overall. The 395 mutations are distributed over 143 mutational sites. (c) We obtained a protein structure network based on atomic contacts (acPSN) from the MD ensemble of p53 91-289_monomer . We then estimated the hub residues (node degree equal to 3 or higher) that are mapped on the 3D structure of p53 91-289_monomer . The color code and ribbon thickness of the structure highlight the node degree of each hub using a color gradient from dark green (node degree = 3) to yellow (node degree = 7). mer with DNA in the tetrameric assembly (DNA: p53 91-289_monomer ). 25,32,68 Before assessing the effect of the 395 selected mutations on the longrange communication to S6-S7, we evaluated if any of them also has local effects on DNA binding. To this goal, we estimated the effects of the mutations found in our datasets on the interaction with DNA using FoldX-based calculations of changes in binding Gibbs free energy upon mutation (Figure 5(A)).
In addition, to gain an overview of potential hotspots, we performed the same calculation considering the 1132 variants collected in this study. We identified five main areas that had a change in the interaction energy. The hotspots were respectively in positions 119, 120 and 121, which are known positions to mediate DNA binding, 23 position 132, positions 239-249, positions 273-281, and positions 285, all of which directly face the DNA. Within these hotspots, we identified 36 substitutions with a destabilizing effect on the binding between p53 and DNA (i.e., DDG values higher than 0.8 kcal/mol) and three mutations with the opposite effect (i.e., DDG values lower than À0.8 kcal/mol). When reducing the pool of mutations to the 395 selected variants, we identified 21 variants that are likely to have a local effect and alter the interaction with DNA ( Figure 5(A)). Of these, S121F, N247Y, A276F were predicted to have a stabilizing effect on binding, whereas K120N/Q/E, Figure 5. Local effects of mutations on binding interfaces. (a) The panel shows p53 monomer in complex with DNA, where 10 residues account for a total of 21 mutations that are all neutral for the stability of the protein but impact the binding with DNA. The structure illustrates the positional placement of the residues, and the bar chart shows the predicted changes in DNAbinding free energy upon mutation. The black error bars represent the standard deviations of the performed runs. Mutations S121F, N247Y, and A276F are all predicted as stabilizing the binding of p53 DBD to the DNA, while the remaining 19 mutations are predicted as destabilizing the interaction, possibly resulting in decreased transcription. (b) The panel illustrates the cartoon representation of the p53 DBD dimer (which is representative of the assembly of p53 DBD in the cytosol). In particular, we focused on the residues at the binding interface for the sake of clarity. Each position is colored according to the sum of mutations that have a destabilizing effect at each position upon a saturation scan with the FoldX energy function. The residues for which neutral mutations in terms of structural stability were identified are shown as sticks. The bar chart shows on the x-axis the list of mutations for which the DDG of binding is higher than 0.8 kcal/mol or lower than À0.8 kcal/mol and on the y-axis the average DDG of binding in kcal/mol. Positive values indicate a destabilizing effect, while negative values a stabilizing one. The black bars represent the standard deviation of the predicted values. S141V, R273H, A276P/S/T, C277W/Y/R, R280I/T/ K, D281V/Y, and E285A/Q had destabilizing effects. These alterations may result in a reduction of DNA-p53 binding in terms of time and affinity which subsequently could lead to alteration of p53 activity as a transcription inhibitor or activator. Hence, these 21 mutations exemplify nondestabilizing mutations with functional impact. The 21 mutations are found in Table S3.
Local effect of the mutations on the binding interface in the dimeric state of p53 in the cytosol In the cytosol, where the transcriptionindependent functions of p53 are carried out, the preferred state is as a dimer. 52 Hence, we evaluated if any of the 395 selected variants had an impact on the local binding between the two monomers of p53 in the p53 91-289_dimer structure using the FoldX energy function to estimate the changes in DDGs of binding ( Figure 5(B)). Most of the mutations did not have any direct effect on the binding according to our calculation with the exception of 23 positions for which the DDGs of binding upon certain mutations was DDG < À0.8 kcal/mol or DDG > 0.8 kcal/mol (see Materials and Methods), thus indicating a stabilizing or destabilizing effect on the dimer formation respectively. According to all the performed scans (see Materials and Methods), C277Y, H179Y, and N263I mutations seem to stabilize the interaction. C277Y, which is predicted to destabilize the binding to the DNA (see above), might favor the binding of two p53 monomers through the formation of a hydrogen bond with D148. On the other hand, A276D, H178D/L/N/Q/Y, M243I/K/R/T/V, R181C/G/H/L/S/Y, S106R, and S261C/G/N were classified as destabilizing the binding, with H178 predicted to be a hotspot of highly destabilizing mutations (i.e., DDG > 3 kcal/ mol). Interestingly, the newly identified germline mutation R181H appears to be deleterious for dimer formation, and, in turn, for interaction with other binding partners. Indeed, despite its neutral classification in terms of protein stability, we observed an average binding DDG of 1.249 ± 0.093 kcal/mol for p53 91-289_dimer . Analysis of the structures generated by FoldX upon the R181H mutation indicates the tendency of the H181 sidechain to form a hydrogen bond with the carbonyl oxygen of P177, located on the same chain, instead of pointing towards the other monomer. However, it is difficult to predict the behavior of the histidine sidechain without experimental data available that could suggest the correct protonation and tautomeric state of the imidazole ring (see Discussion Section).

Identification of mutations with local effects on conformation and dynamics of S6-S7 loop
In a previous study, 25 we showed that the conformation of the S6-S7 loop (residues 207-213) can Figure 6. The residues of the S6-S7 loop are involved in local atomic contacts with some of the mutation sites of p53  . The figure illustrates the analysis of atomic contacts from the MD ensemble of p53 91-289_monomer . The heatmaps show the occurrence of each atomic contact for residues of the S6-S7 residues in the MD ensemble. The color gradient indicates contacts from low (yellow) to high (dark purple) occurrence. The right panel illustrates the location on the 3D structure of p53 91-289 of the local contacts of the residues of the S6-S7 loop. We represent each contact as a stick connecting the Ca atoms of the pair of residues involved in the interaction. The sticks are colored according to the same gradient used for the corresponding heatmaps. The radius of the sticks is proportional to the occurrence of the atomic contact (with a maximum radius of 0.4 A for a contact occurring in all the simulation frames). We show the Ca atoms of the residues of the S6-S7 loop (white spheres) and the mutation sites (grey spheres). We identify 20 mutation sites involved in local atomic contacts with the S6-S7 loop. be modulated by changes in the surrounding residues, such as phosphorylation at the S215 position. 25 To be able to quantify mutational sites that might have a local effect on the conformation and function of the S6-S7 loop, we estimated atomic contacts between each residue of the loop and its surroundings and their lifetime (see Materials and Methods for details). The results are depicted in Figure 6 and summarized in Table S1. Overall, we found 20 mutation sites that could have a local effect on the conformation of the S6-S7 loop. For example, the analysis highlighted the importance of S215, whose phosphorylation has been shown to entrap the conformation of the loop in an accessible state 25 and inhibit DNA binding. 69 In addition, we identified mutation sites located in the Nterminal disordered region (P92, L93, S95, S96, and V97), which are also of interest since movements of this region can regulate the conformation of S6-S7 according to our previous study, and NMR experiments. [70][71] We also identified M160 and M169, whose conformation is sensitive to conformational changes in the area where the disordered region of the N-terminal and the S6-S7 loop are located according to NMR experiments. 70 Identification of long-range allosteric mutations in p53 DBD with marginal effect on protein stability In a previous work, 25 we also illustrated that the S6-S7 loop (residues 207-213) can be allosterically modulated by conformational changes occurring in the DNA binding loop L1 (residues 114-124). Here, we carried out similar PSN-based analyses to estimate the paths of communication between the two loops. We then evaluated which of the 143 mutation sites is part of these communication routes (Table S1 and Figure 7(A)). Apart from mutation sites located in the endpoints for the communication (L114, S116, T118, V122, C124, D207, T211, and R213), we found 17 additional mutation sites in intermediate residues for the communication (K132, M133, F134, Q136, I162, K164, H193,  I195, Y205, H214, Y236, L252, E271, E285, R273, and C275). Among the intermediate residues in the communication paths between S6-S7 and L1 loops, H193, H214, Q192, and Y205 are also in direct contact with the S6-S7 loop ( Figure 6). Their mutations are thus likely to have a combined effect on the local structure of the loop and interfere with long-range communication.
Finally, we analyzed with a similar approach if any of the 143 mutation sites had the potential for longrange communication to the S6-S7 loop (Figure 7 (B)). The analysis resulted in 42 of the selected mutation sites being connected through paths of communication to D207 of the S6-S7 loop and 39 sites interested by paths connecting to the triad of residues D208, T211, and R213 in the S6-S7 loop. Due to the large number of mutation sites identified with a possible long-range effect on the conformation of the S6-S7 loop, we further quantified the allosteric communication with a different method based on AlloSigMA 2, 72 which allows modeling the steric effect of the mutations. Indeed, finding a site that is involved in a path of communication is not direct proof of an effect induced by a mutation at that site on a distal loop. Some amino acid substitutions could have no significant effect on the path of communication or result in the formation of alternative routes for distal communication, resulting in a neutral effect on the conformation of the S6-S7 loop. We thus applied the coarse-grained model implemented in AllosSigMA 2 to estimate the potential of each mutation to allosterically regulate the S6-S7 loop. In particular, the model allows to evaluate the effect of increasing (UP mutation) or decreasing (DOWN mutation) the steric hindrance at a certain site (e.g., our selected 143 mutation sites) and estimate the changes in allosteric free energy at distal sites (e.g., the S6-S7 loop) (Figure 7(C)). We noticed that the majority of the DOWN mutations resulted in a negative allosteric free energy, Dg, for the residues of the S6-S7 loop with only two sites interested by destabilizing  We then analyzed the mutation sites that occur in the long-range communication paths between L1 and S6-S7 loops or that communicate long-range to S6-S7 (Figure 7(C)) for their changes in allosteric Dg values. This analysis allowed us to pinpoint some variants that do not seem to alter the allosteric free energy substantially despite being in intermediate or terminal residues for long-range communication (such as T118 and V122 of the L1 loop). In addition, the analysis allowed us to confirm variants that are involved in the shortest communication paths to the S6-S7 loop that can have a destabilizing (in 12 mutation sites) or stabilizing effect (in 22 mutation sites) on the dynamics and conformation of the loop. Intriguingly, the effect can be dual in some sites, with mutations destabilizing the loop conformation (E198Q, R273Y, T140I, T230I, and T231I) or stabilizing it, reducing its flexibility (E198D, R273H/L/C/Q, T140S, T230S, and T231A). The latter is interesting to investigate further with molecular simulations (see below) since the reduced flexibility of the S6-S7 loop could correlate with its entrapment in one of the occluded or open states that could decrease or enhance its interactions with biological partners, respectively. 25 Furthermore, we also noticed other sites that seem to exert an allosteric effect on the S6-S7 loop if mutated, such as R181, R280, and R282.

Molecular dynamics investigation of 15 variants for their long-range effects on S6-S7 loop dynamics and conformation
To overcome the limitations of the AlloSigMA 2 approach, which is a convenient high-throughput approach for assessing the effects of mutations but does not allow the study of the protein structure at the atom level, we turned our attention to explicit solvent MD simulations.
We thus selected 15 interesting variants (K120E, K120N, R158L, R181H, R248Q, R248W, R273C, R273H, R280K, R280T, R282Q, S215G, S215I, S215R, and V217M) to carry out unbiased MD and/or enhanced sampling metadynamics simulations. We selected the 15 variants to cover different cases: i) four mutations at sites that could have local effects on the conformation of the S6-S7 loop (S215G/I/R); ii) in regions in proximity of the L1 loop that can allosterically communicate to S6-S7 loop (K120E/N); iii) in allosteric communication according to the PSN and AlloSigMA 2 analyses with S6-S7 (R181H, R273C/H, R282Q, R248W, and R280K/T, V217M) or iv) as a control since are expected not to have major effects (R158L, R248Q). Some of these variants have been also investigated in other studies and seem to be connected to changes in the loop of interest for this study. 32,73 At first, we evaluated if any of the mutations caused marked changes in the local atomic contacts (Figure 8). In particular, we focused, as we did for the wild type, on the effects of the mutations on the atomic contacts involving residues of the L1 loop or the S6-S7 loop in comparison with the wild-type variant. None of the variants showed a significant change in the pattern of intramolecular contacts during the simulation time, except for R181H, R273C, and S215G substitutions. In the first two cases, the residues of the L1 loop establish contacts with higher occurrence with the surrounding residues, while the S215G variant has a marked effect in the inter-residue contacts of the S6-S7 loop. The results are summarized in Table S4.
Next, we used the PSN-MD approach also for the 15 variants with a focus on changes in hub behavior and shortest paths for long-range communication.
In terms of hubs, 80 of the 143 sites did not change their hub or non-hub behavior upon any of the mutations of interest. We noticed that the mutants caused hub fluctuations (i.e., changes in the number of edges connected to the hub, that kept the hub behavior) at 18 sites. We identified 21 sites that are hubs in the wild type but not in some variants, and 24 sites that become hubs upon mutation. Interestingly, the mutations R273C and R282Q cause the loss of the corresponding hub at that position, while S215I and S215R acquired the hub behavior at position 215. The results are summarized in Table S5.
Finally, we evaluated if any of the mutations could alter the long-range communication paths that we identified for the wild-type variant (Table 1). We observed that the communication paths between the loops L1 and S6-S7 were not completely abolished upon most of the amino acid substitutions. Many variants only interfered with a small subset of intermediate nodes for long-range communication preserving the endpoints for it. Only R181H and mutations at the 215 sites to glycine or arginine seem to have pronounced effects. In addition, R158L, R273C/H, and R282Q abolished additional routes for possible allosteric regulation of the S6-S7 loop that is not connected to the L1 loop (Table 1, third column).
MD simulations can also be analyzed using dimensionality reduction techniques, such as Principal Component Analysis (PCA). 74-75 PCA allows to identify the most important motions during the protein dynamics and can also be used to compare different simulations in a reduced subspace. We here applied PCA to compare each of the variants to the wild-type in the same essential subspace with a focus on the conformation of the S6-S7 loop ( Figure 9) and the L1 loop that is in communication with it ( Figure S2). We then analyzed the projection along the first and the second principal components in detail, which gives an indication of the conformation space covered by the S6-S7 loop in the different simulations, and, as such, provides a measurement of which mutations are likely to retain the native dynamics (and conformation) of the loop and which ones can alter it. We observed that S215G, R181H, and R280T seem to have the most pronounced effects on the conformation of the S6-S7 loop. Unbiased classical MD simulations suffer from limitations when it comes to the analyses of effects induced by mutations and they encounter the risk of giving a limited view of the conformational space accessible to the protein, especially in the case of loop dynamics. We thus relied on a method to enhance the sampling and accurate estimate of the free energy landscape 76 associated with the conformation of the S6-S7 loop based on metadynamics (see Materials and Methods). We obtained converged free energy profiles for ten of the mentioned variants (K120E, R158L, R181H, R248Q, R248W, R273C, R273H, S215G, S215I, and S215R), which allowed us to more accu-rately quantify if the mutations influence the conformation of the S6-S7 loop and the related population of each state (open or occluded) in solution (Figure 10). In Figure 10, we showed as an example the results for one of the collective variables from the metadynamics simulations (i.e., the distance between the Ca atoms of R156 and D207). This is also one of the most representative collective variables used for the previous study on the S6-S7 loop. 25 According to our previous work, occluded states of the loop correspond to values of this collective variable lower than 1.5 nm, whereas the open and accessible states for biological interactions are generally at values higher than 2 nm. It has to be taken into consideration that this distance refers to the Ca atoms only and is used here to pro- Figure 8. Mutations in the p53 DBD can alter the intramolecular contacts of the S6-S7 loop. The panels illustrate an example of how some of the variants can reshape the contact map of the S6-S7 loop. We used the CONAN software to perform the analyses. The heatmaps elucidate how the inter-residue contacts of R213, belonging to the S6-S7 loop, changed between the wild-type and the S215G variant. The contacts are colored with a gradient from yellow (low occurrence) to dark purple (high occurrence). On the right, we report the contact observed for the S215G variant with a focus on the residue R213 of the S6-S7 loop. The panel shows only the interactions with occurrence higher than 0.1 (i.e., 10%). Each contact is represented as a stick that connects the Cɑ atoms of the interacting residues. The radius of the stick is proportional to the occurrence in the simulation frames (i.e., 100% occurrence corresponds to a radius of 0.4 A) and the colors of the sticks correspond to the one of the corresponding heatmaps on the left. The Cɑ atoms are represented as white (R213) and gray (all the other residues not belonging to the S6-S7 loop) spheres. vide a suitable first screening of the effect of many variants. Nonetheless, to identify the structural details of the conformational changes of the S6-S7 loop other properties should be measured as well, such as solvent accessibility and the location of specific side chains important for intermolecular interactions. These properties will be the subject of future studies in which we will investigate the mechanistic details of each variant.
The K120E substitution does not feature a marked impact on the sampling of the two states ( Figure 10(A)) but it seems to decrease the energy barrier for the interconversion among the two states. The R248W substitution causes a long-range perturbation that promotes a higher conformational heterogeneity in the S6-S7 loop, where also intermediate states between occluded and open states are observed (Figure 10(B)).
The R273C mutation disfavors the sampling of occluded states and seems to stabilize more open conformations of the S6-S7 loop (Figure 10(C)). In our previous study, open states of this loop correlated with conformations of L1 less prone to DNA binding, thus, suggesting that the R273C mutation could result in a p53 variant more prone to protein-protein interactions for transcriptionindependent functions of the protein and with compromised transcriptional activity. R273H, instead, does not seem to compromise the capability to explore states of the loop characterized by different degrees of accessibility but it decreases the energy barrier associated with the conformational changes ( Figure 10(D)).
The S215G mutation (Figure 10(E)), which alters a phosphorylation site that was known to entrap the protein with an open state of the S6-S7 loop, 25 (Figure 10(F)). The S215R substitution (Figure 10(G)) introduces a pos- Table 1 Effects of mutations on structural communication routes from different sites to the S6-S7 loop. We collected PSN-MD analyses for each of the 15 mutant variants of p53  . We then evaluated if any of the wild-type intermediate or end points for the communication between the L1 loop and the S6-S7 loop was altered. 'D' in the second column of the table indicates that a certain residue part of the wild-type communication routes is lost in the simulations of the mutant variants. 'Intermediate' or 'end point' indicates the role of the residue at that position in the communication routes found for wild-type p53  . We also evaluated if the mutation caused a change in the paths from the mutation site to the S6-S7 loop. This information is reported in the third column of the table. Here, 'D' indicates that the mutation caused the loss of communication between that mutation site and specific residues of the S6-S7 loop, whereas '+' indicates that there is a newly formed communication route induced by the mutation between the site and the S6-S7 loop, which was not previously observed for the wild-type. '-' indicates that there is no substantial change upon mutation, according to the PSN-MD analyses.

Variant
Paths from L1 to S6-S7 loop Paths from the mutation site to S6-S7 loop itively charged residue that promotes open states of the loop, which points in the direction that including a charged residue at this position can regulate the loop conformation. Indeed, a similar effect was previously observed upon phosphorylation at S215, which introduces a negatively charged residue. This means that both electrostatic repulsive and attractive mechanisms could be in play in this region to achieve the same conformational effect of activating the loop-related functions. R158L does not have a marked effect on the loop conformation apart from disfavoring the more extreme open states (Figure 10(H)). R248Q (Figure 10(I)), which was not predicted by the previous analyses using unbiased classical MD simulations to have an effect, causes a stabilization of the loop in its occluded state, suggesting that this mutation will affect the recruitment of p53 interactors through the S6-S7 loop. This result also emphasizes the importance of employing enhanced sampling approaches in molecular simulations to accurately understand the effects of mutations. The germline variant of unknown significance R181H tends to disfavor conformations of the S6-S7 loop that allow for interactions with other biomolecules with a potential effect on the transcription-independent functions of p53 (Figure 10(J)). Summary of the effects of the selected 15 variants and comparison with scores for pathogenicity or driver impact For the 15 variants that we studied using molecular simulations, we summarized the results from the different properties estimated in this study. The results are reported in Table 2 and we also compared our results with the scores for pathogenic or driver effects, as estimated by REVEL 77 and CSscape-somatic, 78 respectively. According to the predictors for pathogenicity or driver mutations, each of these mutations features very high scores despite their marginal effect on We here show the results from the metadynamic simulations of the wild-type p53 91-289 variant and ten variants which are representative of amino acid substitutions in direct contact with the loop or located at distal sites. We reported the mono-dimensional free energy surface (FES) profile for one of the collective variables employed in the simulation (i.e., the distance between the Cɑ atoms of D207 and R156). The wild-type and each of the mutants is highlighted in black and blue-green, respectively. As reported in the first panel, distances shorter than 1.5 nm are indicative of an occluded state of the loop, while distances greater than 2 nm refer to open conformations (i.e., more prone to proteinprotein interactions). The wild-type and each of the variants is highlighted in black and blue-green, respectively. We show different effects on the loop conformation induced by each variant (see also Table 2). Interestingly, K120E decreased the energy barrier between the two low-energy states, R273C and S215R stabilized the open state, while S215G, R181H, and R248Q favored the occluded one.  protein stability. Our functional assessment allows us to provide complementary information on the more likely effect of each of them and divide the mutations into the different classes: i) effects on direct interaction with DNA (K120E/N, R280T/K); ii) allosteric effect on the S6-S7 interface (R248Q/ W, R273C, R273H, R282Q), iii) local effects on S6-S7 (S215G/I/R), iv) combined local effects and allosteric (R181H); v) minor or neutral effects on all the properties under investigation (R158L).

Discussion
Limitations and strengths of the approach The process of analyzing cancer variants from a structural viewpoint with computational methods may shed light on biological processes yet to be researched comprehensively. While this is a great strength, the validity of our findings must be considered in light of the methodological framework and its known limitations. The methodological framework used here can be divided into two main areas: an assessment of either local or long-range effects of the mutations at the structural level. Both areas of this research depend on the choices of the starting structure, the employed algorithms and analysis strategies, and inherent challenges with molecular simulations or free energy calculations.
In this study, we focused on the DNA binding domain of p53 since it has a key role in the function of the S6-S7 loop which is at the center of the work. It was also a convenient case study to establish a multi-layer framework for analyzing mutations, that could be reused in the future to study other structural domains or full-length proteins. The framework was designed also according to our previous study in which we discussed how the conformational properties of the loop are not substantially affected by the tetrameric state of p53. 25 A future extension to this framework could include the possibility to account for the interactions of the DBD with the other p53 domains such as the transactivation domain, the proline-rich region, the nuclear localization signal, and the tetramerization domain.
In the first part of the work, we applied two different methods to estimate changes in folding free energy based on different algorithms to achieve more robust results. We also used a structural ensemble based on MD simulations to overcome the limits of FoldX in sampling the flexibility of the backbone of a protein. However, the two energy functions used here have both limitations 57,79 in relation to the conformational sampling accessible during the calculations. We found a reasonable correlation with experimental data for p53, in the range of 0.7-0.8, using both approaches which support their usage for high-throughput screening of many variants. The usage of more accurate methodologies such as the ones based on alchemical free energy protocols [80][81] could in the future be used to better appreciate and understand the effects of amino acid substitutions that are challenging for FoldX or Rosetta to predict.
We also used FoldX to calculate changes in binding free energy upon mutation for the binding to DNA or for the monomer-monomer interaction of the p53 DBD in the dimeric structure, which represents its cytosolic assembly. These calculations also suffer from limitations due to backbone rigidity when considering FoldX. This implies that we have been able to identify only very local effects with these methodologies. Other variants could also impact the binding to DNA or the dimerization which will require further modeling of the conformational changes of these complexes.
A challenge in this context is posed by mutations to histidine, such as R181H and R273H. Indeed, the different protonation and tautomeric state of the histidine imidazole ring can dramatically affect both the interaction pattern that the residue can establish and, in turn, the dynamics of the whole system. 82 Unfortunately, the lack of experimental data, especially from NMR spectroscopy, hampers the correct prediction of the preferred protonation and tautomeric state of the histidine side chain. 83 This issue could be mitigated by the prediction of pKa values of ionizable residues, like PROPKA3 84 on an experimental structure of the mutant variants of interest or the availability of chemical shifts from NMR experiments that can allow to also define the tautomeric state in solution. 85 In light of the above observation, approaches based on energy decomposition could be an additional tool to introduce in the structure-based assessment of the effects of variants as applied to other studies. 86 The advantage of this approach is that it is applied to MD simulations and all-atom force fields. It can then be useful to allow a better description of mutations to residues as histidines or better explore effects depending on backbone changes. It calculates an interaction matrix for the protein structure by considering the non-bonded interaction energies between residue pairs. The principal eigenvector resulting from the diagonalization of the matrix estimates the non-bonded interactions and residues with the most important contributions to the structural stability. The energy decomposition approach has been employed to estimate the effects of mutations on structural stability, showing performance comparable to FoldXbased calculations. 86 Integrating this method into our framework could contribute to the characterization of variants that are challenging to estimate by FoldX or Rosetta.
Overall, despite the limitations described above, the possibility to carry out high-throughput free energy calculations with empirical energy functions has allowed us to conveniently identify a subset of mutations to further explore in terms of elusive allosteric effects. In parallel, the approach has been important to disclose cases in which multiple mechanisms elicited by the mutations are at play, which could have different implications at the cellular level. We here used PCA in a non-quantitative manner to compare the effects of the mutated variants on the dynamics and conformations of the loop of interest. This analysis has been mostly applied as a first exploration before engaging in enhanced sampling with metadynamics. Methods based on the analysis of the covariance matrix of atomic fluctuations and PCA, such as the one employed here, are widely used to investigate MD ensembles. However, they have limitations in describing collective motions and identifying conformational states. 87 Approaches for structure-based assessments of variants as the ones presented here, if carried out systematically and linked to a biological readout, as we recently proposed for another protein, 88 have the potential to generate a large amount of data on 3D protein features that could be exploited in machine-learning setting for predictive or classification purposes, as recently done in other fields of molecular modeling and simulations. 89 Of note, we also noticed that the usage of PSNbased methods alone to predict allosteric mutations is not sufficient since very often the effects on the communication paths that we can measure are limited due to the existence of multiple paths and the robustness of the nodes of a network towards mutations. [90][91] It is thus important to integrate this convenient formalism with other approaches to better predict whether the amino acidic substitutions of interest influence the conformations of the endpoints of communication or if, on the contrary, their effect is negligible. In this context, we here illustrated that the PSN approach together with enhanced sampling simulations based on collective variables that capture the conformational states of interest are a remarkable combination to disentangle the effects of the amino acid substitutions. These methods also offer the possibility of understanding the effects of the mutations in terms of changes in the population of the different conformational states of the protein during dynamics.

Comparison of our data with available data on p53 variants
Our investigation provides a detailed understanding of the impact of 15 cancer-related variants of the p53 DBD, covering changes in folding free energies, interactions with DNA, dimerization occurring in the cytosol, and local or allosteric effects on the conformations of the functional interface provided by the S6-S7 loop ( Table 2). In addition, we also provided precalculated scores for the changes in folding or binding free energies for more than 1000 germline or somatic variants of p53.
Of note, R248Q is predicted in our study with a low impact on structural stability. At the same time, it is one of those mutations for which more than an experimental measurement is available [92][93][94] in ThermoMutDB and they can vary from neutral to destabilizing effects (i.e., 1.87, 3.07, or À0.74 kcal/mol). Thus, a destabilizing effect due to this variant cannot be ruled out. Many studies have been carried out with cellular or mouse models which point to a destabilizing effect for the R248Q variant resulting in misfolding and aggregation of p53. [95][96] The empirical free energy functions used in this study predict a marginal effect of this mutation on the stability of the monomer, representing one of those cases, as discussed above, in which more sophisticated methods accounting for protein dynamics and large-scale conformational changes are needed to appreciate the effects of this substitution. It is also important to highlight that p53 is characterized by low cellular stability already in its wildtype form and thus even mutations that destabilize it by a few kcal/mol might be enough to elicit a marked effect at the cellular level. This also suggests that, when using predictions from methods such as FoldX and Rosetta, variants with DDGs values around 1 kcal/mol, i.e., close to the thresholds used to classify the variants, could be worth further investigation. On the other side, the effect of R248Q might be highly context-dependent and other studies suggest that it is a variant associated with higher levels of expression of p53 and with an oncogenic effect. [97][98] We observed a long-range effect triggered by the R248Q variant on the S6-S7 loop. In another unbiased 500-ns molecular dynamics study, 32 R248Q was observed to perturb allosterically the DBD and to favor open states of the loop. These contrast with the free energy profile from the metadynamics used in this study, where we observe the opposite behavior (i.e., R248Q stabilizes more occluded states of the loop), according to the selected collective variables (Figure 10(I), Table 2). These discrepancies could be due to multiple factors, such as different coverage of the conformational space accessible to the loop during the simulations or different force fields used in the two studies or different temperature conditions. Enhanced sampling approaches, like the ones used here, should in principle allow for a better estimate of the population of different conformational states for a highly dynamic protein such as p53, however their accuracy also depends on how wellconverged they are. More calculations applying different approaches for enhanced sampling or different starting structures could help to shed light on these discrepancies. An interesting direction to pursue in the future using the R248Q variant as a case study could be to test the influence of using different force fields and different sampling approaches on the conformational dynamics of the S6-S7 loop and in parallel collect NMR data on its dynamics to better understand the mechanism in play. In addition, occluded states of the loop, according to our mechanistic model should imply a decreased binding to the p53 interactors recruited at the S6-S7 interface (see below) and experimental assays for proteinprotein interactions on the wild-type and mutant variant could also provide a suitable strategy to validate the results. We investigated variants at residues of p53 DBD that make direct contact with DNA and are frequently mutated in human cancers (R248Q/W, R273C/H). 99 For R248Q/W, R273C, and R273H we observed allosteric effects on the conformations of the S6-S7 loop ( Table 2), suggesting that these variants likely impact interaction preferences of the protein at this interface more than just acting locally by affecting the binding with DNA. This result suggests that multiple mechanisms are at play and can be triggered simultaneously by these substitutions. Experimental and computational data support our results, indicating that different variants of p53 DBD at R273 (R273H, R273C, and R273L) not only alter the DNA binding and transcriptional activity of p53, but also affect other properties and this may have a role in pathogenesis. 100 Furthermore, experimental studies show that variants in the p53 DBD, such as R273C/H and R248Q, act via diverse context-dependent mechanisms, such as by affecting physiological functions, have dominant-negative effects, or could gain oncogenic activities. 99,[101][102][103] Our assessment provides additional information to profile the effect of these variants in the p53 DBD. In addition, we characterized alterations at site 215 of p53 DBD that not only abolish a regulatory post-translational modification 69 but also act locally on the conformation of the S6-S7 loop and can have both inhibitory or activatory effects depending on the substitution (Figure 1), also pointing to multiple mechanisms that can have different relevance depending on the cellular context and conditions.
In this study, we systematically investigated the mutational impact one residue at a time. Further work may consider the mutational burden and effect of hypermutability of tumors to understand the global alterations of the protein and the context of cancer. Indeed, multiple occurring mutations may lead to either synergistic effects, where the collection of mutations is more damaging than the sum of the individual alterations (negative epistasis) or neutralize each other due to the compensatory nature of the mutations (positive epistasis). 104 This is also true for p53, where some compensatory mutations can rescue the protein function in the face of a pathogenic mutation. [105][106] Although the calculation of the mutational burden is common practice, using a mutational rate evaluating the number of mutations in a particular gene compared to genomic data, complex epistatic interactions are often omitted since assessment and prediction of compensatory mutations is not a straightforward task. 107 Approaches to assessing these epistatic interactions often rely heavily on evolutionary mechanisms, fitness landscapes, or as presented using double force scanning 105 assessing rescue sites without any evolutionary assumptions such as distance between rescue mutation and rescued site. Most recently, a new and faster implementation of the double force scanning has been proposed which could be included in an effective manner within our structure-based framework. 108 Other methods derive epistatic information more indirectly using evolutive approaches, e.g., based on coevolution 109 or using epistasis as a basis to understand the fitness or pathogenicity of a given mutation. [110][111] In conclusion, to comprehensively understand the impact of p53 variants or other similar cancer targets, tools to unveil the intricate combinations and their effects on conformational alterations are a field of active development. Combining these tools with a firm understanding of multimerization, cellular compartment, and thus preferred conformational state could provide invaluable information in the future. Prominently, this research may play a part to bridge the gap between mutational understanding in a structural space and hypermutability of tumors found in the clinic. Future research could lead to effectively explaining some hallmarks of cancer 110 from a structural perspective unveiling sites of interest that could be targeted by therapeutic agents.
Biological relevance of the assessment of cancer-related mutations on the allosteric regulation of the S6-S7 loop One of the main results of this work is that we identified a group of mutations of p53 that are likely to have a marginal impact on structural stability but still with a possible pathogenic impact due to alteration of an allosteric mechanism to regulate an interface for recruitment of proteinprotein interactions, i.e., the S6-S7 loop.
These results are of broader relevance if we consider that the p53 DBD is central in the cellular hub activity of p53. It interacts with DNA, transcription factors, and other protein partners to elicit p53 transcriptional-dependent and independent functions. [111][112] Different interaction partners for the p53 DBD, which binds in the proximity of the S6-S7 interface, have been predicted 25,112 or experimentally identified. 113 In addition, a recent X-ray structure shows the complex of p53 DBD with a short peptide of the p53upregulated modulator of apoptosis (PUMA) protein. 114 In this complex, PUMA directly interacts with the p53 DBD via residues of the N-terminal region and the S6-S7 loop.
In addition, molecular simulations of p53 DBD in complex either with the DNA or the repressor protein iASPP suggest a communication path from K120 in loop L1 to R213 of the S6-S7 loop, 115 like the ones observed in our previous work. 25 The authors propose that in complex with DNA, this communication path favors the binding with partners. On the other hand, the interaction with iASPP rewires the communication path making the interaction with DNA. 115 As we stated above, alterations in the coding region of the TP53 gene can elicit different outcomes, such as loss-of-function, gain-offunction, and dominant-negative effects. It becomes thus important to understand how cancer mutations can alter the conformation of the S6-S7 loop toward occluded (i.e., not favorable to intermolecular interactions) or open (i.e., accessible to intermolecular interactions) conformations. Substitutions such as R181H, R248Q, and S215G, which favor occluded states are likely to have an inhibitory effect on the intermolecular interactions either through a local effect (S215G) or an allosteric effect elicited at a long distance from the mutation site (R248Q). Other substitutions, such as R273C or S215R, could promote the opposite effect and enhance these interactions, making them constitutively active. The availability of a nanobody binding at this interface and of the interaction with PUMA could be exploited to design experiments (and simulations) to test these hypotheses.
In addition, in a recent study of the p53 DBD, it was shown that mutations that affect the motions of loop S6-S7 could favor its open states and the formation of a putative pocket with druggable properties, 22,32 which makes the effects of cancerrelated variants of p53 on this interface even more interesting for applicative purposes.
Towards a more generalized framework for structure-based assessments of diseaserelated mutations In this and previous works, 88,[116][117][118] we showed how using a multi-tier approach based on computational structural methods can help to shed light on the complexity behind the effects of mutations found in cancer or other diseases. Many of the techniques applied in this study will work more broadly for other proteins and can help to fill our gap of knowledge in the field of cancer structural biology. In addition, as shown in a previous study on a ubiquitin-like protein, 88 we can use these analyses to prioritize variants for experimental validation and help to identify the best biological readouts for the experiments depending on the functional effect that we predict.
A supplementary applied endpoint is the targeting of alterations therapeutically. Indeed, well-founded knowledge of structural cancer alterations may give rise to the development of therapeutics such as antibodies or CAR T-cells, where the alteration is the epitope. Additionally, such an approach could even identify isoform alterations purely on the sequence, suggesting a progression towards a personalized treatment approach.

Identification of variants to use in the study
We collected and aggregated cancer-related missense variants for the DBD region of p53 from cBioPortal 119 and COSMIC version 94. 120 The results are reported in TableS1. The data have been collected in November 2021 and updated/verified in January 2022.
We have used the gnomAD API to gather all the variants associated with TP53 in gnomAD version 2.1. We have filtered them by keeping variants with a defined protein change that were defined on the canonical Uniport isoform. We have then calculated the allele frequency as the ratio between the mutation allele count and the total number of high-quality genotypes for wholegenome sequencing data. Finally, we have filtered the dataset by keeping only the mutations with low frequency ( 0.0001).
We retrieved curated somatic and germline variants from the TP53 database 33 and filtered the data to retain only the missense variants in the region 91-289 of interest for this study. In addition, we have added in-house missense variants in the region of interest identified with 30x whole genome sequencing (method:, 14 or clinical NGS gene panels.
The Venn Diagram was produced with DeepVenn. 121

Selection of PDB structures for p53 DBD
The selection of experimentally solved threedimensional (3D) structures for the modeling of p53 is not trivial due to the uncommonly large number of options within the Protein Data Bank (PDB). Consequently, the choice was based on a set of criteria implemented in a Python script. We identified 226 PDB entries associated with UniProt P04637 and we retained information regarding the experimental method used, the resolution where applicable, and the deposition date of the structure. Each structure was chosen applying the following criteria: X-ray structures were preferred to electron microscopy, which was preferred to NMR spectroscopy. We selected the X-ray structure characterized by the best resolution. In cases of more structures with thesame resolution, we selected the most recent structure.
For each structure the sequence was retrieved and compared to the sequence of the DNA binding domain (residues 102-292), discarding any structures that did not contain an exact match (n = 31). 16 of these were bound to DNA, 11 were in complex with other peptides, one had undergone a post-translational modification and three were free chains. We selected three structures for further analyses: one free singlechain structure (2XWR, R = 1.68 A), 122 a single chain structure in complex with DNA (3KZ8, R = 1 .91 A, selected chain B). 66 In addition, we used the structure 2OCJ (chains B and D), 123 as a reference for the dimeric free state of p53 in the cytosol. See Table S2 for the full list of the 31 structures, each annotated for potential utility.
Modeling of the dimeric state and monomeric state in complex with DNA of p53  We used MODELLER version 10.1 124 to generate models of the dimeric state and monomeric state in complex with DNA of p53 91-289 (p53 91-289_dimer and DNA:p53 91-289_monomer , respectively) For p53 91-289_dimer , we used as templates the X-ray structure of the dimer of p53  (PDB entry 2OCJ), 123 and the structure after PDB-REDO refinement. We used the orientation of the two monomers defined by chains B and D in the PDB files. We selected this orientation since it closely resembles the orientation of the monomers in the X-ray structure of the dimer of p53  in complex with BCL-xL (PDB entry 6LHD), 53 which is the dimeric structure expected for p53 in the cytosol. We modeled only the N-terminal region (residues 91-96) for which the coordinates are not available in the PDB file using as a template the X-ray structure of p53  (PDB entry 2XWR chain A). 122 For DNA: p53 91-289_monomer , we used as a template the X-ray structure of the tetramer of p53  (PDB entry 3KZ8). 66 We used the monomer of p53  in chain B and the single DNA strand in chain C. We generated the second DNA strand (chain D) via symmetry operations in PyMOL (The PyMOL Molecular Graphics System, Version 2.0 Schrö dinger, LLC). We modeled only the regions for which the coordinates are not available in the PDB file (the N-terminal region (residues 91-96), residues S185-G187, and part of the side chain of residues K101, Q104, R110, D148, L188, L201, R202, E204, E221, E224, V225, D228, E287) using as a template the X-ray structure of p53  (PDB entry 2XWR chain A).
We generated 100 models from each template using six distance restraints between five pairs of residues forming non-covalent contacts in the Nterminal region of the p53 domain (i.e, W91-R174, S94-T211, S94-S96, S94-T170, P92-F212). We used harmonic potential restraints, restraining each distance around its value in the template Xray structure of p53  (PDB entry 2XWR, chain A). We then calculated in the 100 models the six atom-atom distances used as restraints and discarded models having variation at a distance larger than 0.5 A. We then ranked the models on the root mean square deviation (RMSD) of the atoms of the residues in the N-terminal region with respect to the template structure of p53 91-289 (PDB entry 2XWR chain A). We selected the model with the lowest RMSD as the final one for each MODELLER run.

Free energy calculations of stability and binding
To calculate changes in Gibbs free energy, we applied the FoldX energy function 125 performing in-silico saturation mutagenesis using MutateX. 126 We followed the protocol described in, 116 applying the BuildModel module from the FoldX5 suite with five independent runs and subsequently finding the average DDG of the difference in DG between mutant and wild-type variants. We used different initial structures for the FoldX scans: i) the chain A from the X-ray structure with PDB entry 2XWR, ii) a structure obtained from the PDB-REDO refinement of 2XWR, and iii) an ensemble of 20 conformations extracted from a MD simulation (see below) to account for flexibility in the protein. We have previously illustrated that the performance of the FoldX algorithm increases with the use of MD ensembles as opposed to a single crystallographic structure. 126 We also showed that the proportional addition of frames only increases performance until a certain point as the application of 50 frames yielded a higher correlation with experimental values than 100 frames. 126 Additionally, stability prediction upon mutation was calculated using the protocol devised by Park and coworkers to predict the change in folding free energy in a protein monomer upon mutation Rosetta. 127 As recommended by the original publication, we used the ref2015 Rosetta energy function. This protocol takes a single structure as input, relaxes it using the Rosetta relax application, and further optimizes it in Cartesian space. Three rounds of Cartesian space optimization provide three pairs of wild type and mutant structures for each mutation. The change in folding free energy is calculated for each pair, and the final change in folding free energy associated with each mutation corresponds to the average over the three values obtained.
We also used the self-scan option of MutateX using the BuildModel module from the FoldX suite to mutate only the original amino acids to themselves, which was used as a control experiment allowing us to assess the local quality of the initial structures. A similar approach was applied to Rosetta. In the case of MutateX, we carried out the self-scans with and without the repair step of FoldX. In this regard, we expect that the self mutations should be very close to 0 kcal/mol since we are replacing a side chain with itself. In cases of deviations from this behavior, it could mean that upon repair that position or the surroundings are entrapped into some unfavorable conformations (self-scan with repair function on) or that the initial structure used for the calculation had some local placement of sidechains that could be improved with respect to the residues in the surrounding (self-scan with repair function off). The majority of the scans with MutateX or Rosetta did not give any self mutations with values above the cutoffs used to classify the mutations in stabilizing or destabilizing. The only exceptions are the positions H193 and H214 in FoldX-based calculations on the MD ensemble. The deviations for H193 and H214 were only observed in two or three models, and we verified that the final average results used for the selection of variants of interest at these positions were not affected by the local inaccuracy highlighted by the self-scan results. In the case of Rosetta, we also noticed 78 self-mutations with values above the cutoffs when we used the structure refined by PDB-REDO as the initial structure for the calculations. This result might be also due to sampling issues with this specific calculation more than to be related to the quality of the initial structure. For the mutations introduced with Rosetta on the X-ray structure, four self-mutations exceed the thresholds, at positions 131, 132, 165, and 247. The results of the predicted changes upon mutations at these positions thus need to be taken with caution.
Furthermore, we evaluated the accuracy of the predictions using a pool of variants of p53  from the ThermoMutDB database 54 with experimentally determined unfolding DDG values. We collected the values available in the database and sign-inverted the values so that they are consistent with our computational predictions (i.e., negative values represent stabilizing mutations while positive values represent destabilizing mutations). For mutations with multiple experimental measurements for the same mutation, we compared these and calculated the standard deviations to illustrate the error margin within the experimental findings. Furthermore, we calculated the average DDG value, resulting in one single value for comparison for each mutation.
We verified that the p53 mutations from ThermoMutDB used in this study were not part of the original training dataset for the development of the FoldX energy function. 128 The dataset contained 339 experimentally characterized mutant variants in nine different proteins (barnase, Cl-2, spectric, Src SH3, Sso7d, Tenascin, FKBP, Ada2h and CheY).
We then compared the predicted and experimentally determined DDGs values for the mutations reported in ThermoMutDB to evaluate: i) the accuracy of the approaches to predict changes in unfolding DDGs, ii) the influence of different initial structures for the calculations. As stated above, we used, as starting structures: i) the X-ray crystallographic structure as taken from the Protein Data Bank (X-ray), ii) a version of the X-ray structure upon refinement with PDB-REDO (X-ray PDB-REDO ), iii) and an ensemble of 20 structures from a Molecular Dynamics (MD) simulation with the CHARMM22* force field.
We used the X-ray and X-ray PDB-REDO to perform scans with both the FoldX and Rosetta free energy functions. The MD ensemble was used only for FoldX scans to evaluate if it can rescue some limitations due to lack of backbone remodeling during the calculations. To assess the accuracy of the methods used for prediction or the influence of the starting structure, we evaluated both the correlation with experimental data and the classification accuracy, which describes the performance of a classification model. The correlation was assessed by calculating the Pearson correlation coefficient by comparing the experimental values to the predicted ones. The classification accuracy was assessed based on the following classification model: stabilizing (DDG < À1.2 kcal/mol), neutral (-1.2 DDG 1.2 ), destabilizing (1.2 < DDG 3 kcal/mol), and highly destabilizing (>3 kcal/mol) (Table S1), applied to both the experimental and predicted values for each mutation.
We also used MutateX to calculate binding DDGs upon mutation using the structures of the dimer of p53. Both saturation and self-scans were run on the x-ray structure of the homodimer (PDB entry 2OCJ), 123 on the X-ray PDB-REDO structure, or their corresponding models designed with MODELLER (see above). We selected chains B and D for the MutateX scans since their orientation resembles the one observed in the cytosolic p53 fusion complex with BCL-xL (PDB entry 6LHD). 53 The self scans provided values within the cutoff to classify the mutations. We used as thresholds binding DDG values higher than 0.8 kcal/mol or lower than À0.8 kcal/mol for destabilizing or stabilizing mutations, respectively (i.e., the error of the predictor suggested by the developers of the FoldX energy function). We chose the same threshold values to investigate the effects of the mutations on the binding DDGs upon saturation scan at the interface between the two p53 monomers.

Changes in DNA binding free energies upon mutation
We used FoldX 5 to evaluate the change in binding energy for the binding between p53 and DNA upon mutations. This evaluation comprises a four-step process, encompassing a preparation step, a repair step, a step to build the model with the relevant mutation, and a step to analyze the DNA binding energy and calculate the difference in energy between the mutated and wild-type models.
This investigation relies on a crystallographic structure of p53 bound to DNA (PDB entry 3KZ8, chain B) upon preparatory and remodeling steps with PyMOL and MODELLER version 10.1. 124 PyMOL was used to model the missing parts of the double-stranded DNA, whereas MODELLER was used to model the missing residues of the disordered N-terminal region (residues 91-96) and the residues 185-187 using the X-ray structure of p53  (PDB entry 2XWR chain A) as a template. We generated 100 models from each template using six distance restraints between pairs of residues forming non-covalent interactions in the N-terminal region of the p53 domain, as done for the dimer model described above. We then ranked the models on the root mean square deviation (RMSD) of atomic positions in the N-terminal region to the template structure of p53  (PDB entry 2XWR chain A) and finally we selected the model with the lowest RMSD. The resulting structure is labelled 3KZ8b_DNA. The additional steps were completed using a Snakemake pipeline incorporating the prepared PDB file, FoldX, and a mutational input file, which included the previously described aggregated mutations. The pipeline includes a repair step, which aims to identify and repair residues containing suboptimal torsion angles, Van der Waals clashes, or total energy. After repair, the mutation step was carried out using the FoldX BuildModel command, where each mutation is modeled five times to avoid reliance on the deviation of a singular run. Concurrently, five self scans were carried out as a control, as we did for the monomer and the dimer above. For all ten models for each mutation, the analysis of Gibbs's interaction free energy between DNA and protein was performed using the FoldX command AnalyzeComplex. We estimated the average DDG values from the five runs.

Molecular dynamics and metadynamics simulations
We carried out all-atom explicit-solvent MD and well-tempered metadynamics 129 (WT-MetaD) simulations of wild-type and a selection of mutant variants of the p53 91-289 in its free state (i.e., DNAunbound) using GROMACS 130 and the CHARMM22* force field. 131 We selected the X-ray structure of p53 DBD in its monomeric and DNAunbound state (PDB entry 2XWR chain A residues 91-289) as a starting structure for the simulations. The structure of each mutant variant was obtained with the mutagenesis tools in PyMOL, retaining the original orientation of the side chain. We used a dodecahedral box of TIP3P water molecules 132 with a minimum distance between the protein and the box edges of 15 A applying periodic boundary conditions. We used a concentration of NaCl of 150 mM, neutralizing the net charge of the system. We modeled all the histidine residues in the Ne2-H tautomeric state and the system was equilibrated in multiple steps. We used the Particle-mesh Ewald summation scheme and we truncated Van der Waals and Coulomb interactions at 9 A. We collected 500-ns productive unbiased MD simulations and 300-ns WT-MetaD. The simulations were carried out at 298 K in the canonical ensemble using the velocity rescale thermostat. 133 As a control, WT-MetaD simulations for the wild-type and some selected variants have been extended to 700 ns to ensure that our estimate of the populations of the conformational states of the S6-S7 loop was reasonably converged.
The WT-MetaD simulations were carried out with GROMACS and PLUMED. 134 We used the same four CVs that we applied in a previous study of p53 DBD and which were discriminating between the solvent-exposed and occluded states of the S6-S7 loop and that were sensitive enough to understand the effects of residue modifications on the populations of the loop. 25

Atomic contact analyses with CONAN
We employed the software CONtact Analysis (CONAN) 135 to perform statistical analyses of intramolecular contacts along MD trajectories. The tool computes the inter-residue contacts using different user-defined cutoff distances, with a protocol similar to the one used in a previous study. 88 We processed the data to estimate the occurrence of each contact, calculated as the ratio between the number of frames in which the contact is detected and the total number of frames of the trajectory, and to estimate the number of encounters during the simulation (i.e., a number indicating how many times the contacts is formed and broken).

Protein Structure Networks (PSN)
We used a PSN method based on atomic contacts (acPSN) as originally formulated by Vishveshwara's group 60,136 and implemented in PyInteraph2. 61 We applied this method to MD simulations (PSN-MD). We retained only pairs of residues whose sequence distance is higher than 1 for the edge calculations and at a distance lower than 4.5 A. In the calculation, normalization factors for each residue are applied as previously discussed. 136 The normalization factors are used as proxies for the propensity of the residues to form contacts. For each pair of residues, the number of pairs of atoms lying within 4.5 A is divided by the square root of the product of the normalization factors of the two residues, and the result is multiplied by 100. This value represents the 'interaction strength' between the two residues. If the interaction strength exceeds a pre-defined cut-off (I min estimated as 2.5 for our simulations), an edge is drawn between the two residues with weight equal to the interaction strength. In addition, the acPSN will only include as edges those that have an occurrence higher than 50% of the frames in the ensemble. The interaction strength associated with each edge is in the final acPSN the average interaction strength for the edge over all the structures of the ensemble. The final acPSN is weighted according to the average interaction strength. We estimated the paths of communications between selected starting and end residues using PyInteraph2, which integrates the algorithm implemented in NetworkX for the calculation of all shortest paths between pairs of nodes. We estimated as hubs in the acPSN residues with an edge degree higher than 3, as previously suggested. 136 Estimate of allosteric free energy changes upon mutation We used the structure-based statistical mechanic model of allostery implemented in AlloSigMA 2 72 to predict the changes in allosteric free energy upon mutations. This coarse grained model can model mutations as 'UP' or 'DOWN' mutations depending on the changes in steric hindrance after the substitution. A UP mutation will mimic a residue replacement with a bulky residue, whereas the DOWN mutation will resemble the effect of a replacement with a smaller side chain. To match each of the selected 395 mutations with the corresponding Allo-SigMA mode, we estimated the changes in the volume of the amino acid elicited by the mutation with an in-house Python script (allosigma-classify), starting from an available list of residue volumes. 137 If the absolute value of the change was 5 A 3 we discarded the mutation from this analysis since we do not expect that the prediction could be reliable for small changes of a steric hindrance (examples are substitutions from tyrosine to phenylalanine or serine to alanine). This analysis led to discarding 66 variants from the allosteric energy estimates. In addition, we removed variants that are in residues in direct contact with the S6-S7 loop according to the atomic contacts or that are located in the S6-S7 loop (residues 207-213). As a result, 334 variants were retained for analyses with AlloSigMA 2. We then performed the full allosteric signaling map analysis as available on the AlloSigMA 2 webserver on the X-ray p53 DBD structure (PDB ID 2XWR, chain A) and downloaded the corresponding session file (February 2022). Of note, in the discussion of the results from AlloSigMA 2 we refer to the changes in allosteric free energy upon mutations as Dg, in alignment with the original AlloSigMA 2 terminology.
We used an in-house developed Python script (allosigma-heatmap) to extract the response information on the full protein as Dg UP or Dg DOWN for each position on which a mutation was found and depending on the mutation class (UP or DOWN). We further restricted the response sites to specific regions of interest. The resulting allosteric free energy changes (Dg UP or Dg DOWN ) can have positive or negative signs. They predict the regions of the protein sensitive to destabilization (i.e., increased flexibility) upon mutation and the ones that are stabilized (i.e., decreased dynamics), respectively. The effect of a mutation is modeled through the modification of the strength of the intramolecular contacts among the protein residues.
Principal Component Analysis (PCA) for L1 and S6-S7 loop dynamics PCA can be used to analyze MD trajectories with reference to the eigenvectors (i.e., principal components, PCs) of the mass-weighted covariance matrix of the atomic positional fluctuations. 75 We here applied PCA, upon fitting on the Ca atoms, to calculate the eigenvectors associated with the C a of the residues of the L1 and S6-S7 loops. This provides a description in a reduced conformational space of the dynamics and conformation of these loops during the simulations. We used for the comparison a common subspace derived from a PCA of a concatenated MD trajectory including all the 15 variants and the wild type protein. The first three PCs are sufficient to account for more than 80% of the variance for the loops.

CScape-somatic
CScape-somatic is a tool that uses a single kernel method to combine several features to predict somatic mutation driver potential. 78 It is based on a forward selection approach where sequential learning identified an optimal combination of feature groups based on 30 possible data sources. The final model feature groups included conservation, GC content, sequence uniqueness, local mutation frequency, proximity to gene features, spectrum, and functional elements. All used data were based on the GRCh37/hg19 version of the human genome. The prediction scores are given as probability estimates in the range [0, 1]. We here applied the cutoff of 0.89 for high confidence predictions of variants in the coding region.
To apply CScape-somatic, we used an R script to convert the data available from the aggregated metatable of variants. From this, we extracted index, residue position, wild type residue, mutated residue, and genomic mutation. The potential list of several genomic mutations for each residue was converted from HGVS format to separate columns.
Furthermore, we converted all coordinates to hg19 with liftOver to match the reference genome used by CScape-somatic.

REVEL
We used MyVariant.info Web resource 138 to retrieve the REVEL score 77 for each mutation of the p53 DBD. REVEL is an ensemble method to predict pathogenic mutations. The score can range from 0 to 1, with higher values indicating more detrimental mutations. We used as a cutoff a REVEL score of 0.4 to classify the mutations, as suggested in a recent benchmarking study. 139 This value represents a good trade-off between sensitivity and specificity.