Structural destabilization and chaperone-assisted proteasomal degradation of MLH1 as a mechanism for Lynch syndrome

1: The Linderstrøm-Lang Centre, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen, Denmark. 2: Current address: Computational Biology Laboratory, Danish Cancer Society Research Center, Strandboulevarden 49, DK-2100 Copenhagen, Denmark. 3: Department of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen, Denmark. 4: Department of Surgical Gastroenterology, Aalborg University Hospital, DK-9000 Ålborg, Denmark. 5: Department of Clinical Genetics, Rigshospitalet, Blegdamsvej 9, DK-2100 Copenhagen, Denmark. 6: Tohoku University Hospital, Tohoku University, Sendai, Japan.


Introduction
The DNA mismatch repair (MMR) pathway corrects mismatched base pairs inserted during replication. The MutSα (MSH2-MSH6) heterodimer initiates repair by detecting the mismatch, after which the MutLα (MLH1-PMS2) heterodimer promotes the process by generating a nick in the newly synthesized DNA strand, thereby stimulating downstream repair proteins (Jiricny, 2006; Jun et al., 2006). The MMR pathway is phylogenetically highly conserved, emphasizing its importance as a key DNA repair mechanism of the cell (Jiricny, 2013; Sachadyn, 2010). Loss of MMR activity causes genome instability and can result in both sporadic and inherited cancer, such as Lynch syndrome (LS). In this study, we investigated whether structural destabilization and subsequent degradation underlie the loss of function of LS-linked variants of the MLH1 protein.
We determined the cellular abundance of 69 missense variants and show that several destabilized LS-linked MLH1 variants are targeted for chaperone-assisted proteasomal degradation and are therefore present at reduced cellular amounts. In turn, this lower amount of MLH1 results in degradation of the MLH1-binding proteins PMS1 and PMS2. In silico saturation mutagenesis and computational prediction of the thermodynamic stability of all possible single-site missense variants in the structurally resolved regions of MLH1 revealed a correlation between the structural destabilization of MLH1, reduced steady-state levels and the loss-of-function phenotype. Accordingly, the thermodynamic stability predictions accurately separate disease-linked MLH1 missense mutations from benign MLH1 variants, and therefore hold potential for classifying MLH1 missense variants of unknown consequence, and hence for LS diagnostics.
Further, by suggesting a mechanistic origin for many LS-causing MLH1 missense variants our studies provide a starting point for development of novel therapies.

In silico saturation mutagenesis and thermodynamic stability predictions
Most missense protein variants are less structurally stable than the wild-type protein (Tokuriki and Tawfik, …). Using the structures of the N- and C-terminal domains of MLH1 (… and 3RBN) (Fig. 1A), we performed in silico saturation mutagenesis, introducing all possible single-site amino acid substitutions into the wild-type MLH1 sequence at the 564 structurally resolved residues. We then applied the FoldX energy function (Schymkowitz et al., 2005) to estimate the change in thermodynamic folding stability relative to wild-type MLH1 (ΔΔG) (Fig. 1B,C). Negative values indicate mutations that are predicted to stabilize MLH1, while positive values indicate mutations that may destabilize it. Thus, variants with ΔΔG > 0 kcal/mol are expected to have a larger population of fully or partially unfolded structures that, in turn, may be prone to protein quality control (PQC)-mediated degradation. Our saturation mutagenesis dataset comprises 19 (amino acids, excluding the wild-type residue) × 564 (residues resolved in the N- and C-terminal structures) = 10,716 different MLH1 variants, thus covering 75% of all possible missense variants in MLH1. We illustrate a subsection as a heat map in Fig. 1D (the entire dataset is included as supplemental material file 1). The predictions reveal that 34% of the substitutions are expected to change the stability of MLH1 by less than 0.7 kcal/mol, which is the typical error of the predictions (Guerois et al., 2002) (Fig. 1E). A comparable fraction (32%), however, is predicted to cause substantial destabilization (>2.5 kcal/mol) of the MLH1 protein (Fig. 1E).
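The variant count and the ΔΔG bins quoted above can be reproduced with a few lines of bookkeeping. The sketch below is illustrative only: it assumes the canonical 756-residue human MLH1 length (which gives the ~75% coverage), uses the 0.7 and 2.5 kcal/mol cutoffs from the text, and does not call FoldX itself; `single_site_variants` and `classify_ddg` are hypothetical helper names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
MLH1_LENGTH = 756                      # canonical human MLH1 length (assumption, see lead-in)
RESOLVED_RESIDUES = 564                # residues resolved in the N- and C-terminal structures


def single_site_variants(sequence):
    """Yield (position, wild_type, mutant) for every single-site substitution (1-based positions)."""
    for pos, wt in enumerate(sequence, start=1):
        for mut in AMINO_ACIDS:
            if mut != wt:
                yield pos, wt, mut


def classify_ddg(ddg):
    """Bin a predicted ddG (kcal/mol; positive = destabilizing) using the cutoffs quoted in the text."""
    if abs(ddg) < 0.7:        # within the typical FoldX error
        return "near-neutral"
    if ddg > 2.5:             # substantial destabilization
        return "strongly destabilizing"
    return "moderately destabilizing" if ddg > 0 else "stabilizing"


n_variants = 19 * RESOLVED_RESIDUES             # 10,716 variants
coverage = n_variants / (19 * MLH1_LENGTH)      # fraction of all possible missense variants
print(n_variants, round(coverage, 3))           # 10716 0.746
```

In practice, each tuple from `single_site_variants` would be passed to a stability predictor and the resulting ΔΔG binned with `classify_ddg`.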

Thermodynamic stability calculations predict severely reduced MLH1 steady-state levels
To test whether the in silico stability predictions are predictive of cellular stability, abundance, and function, we selected 69 naturally occurring MLH1 missense variants with predicted ΔΔGs ranging from -1.6 to >15 kcal/mol (Table 1). We further ensured that the selected mutations were distributed throughout the MLH1 gene, thus probing all structured parts of the MLH1 protein (Fig. 1A).
Then, the variants were introduced into MLH1-negative HCT116 cells and analyzed by automated immunofluorescence microscopy using antibodies to MLH1.
As expected, wild-type MLH1 localized primarily to the nucleus (Fig. 2A). This localization pattern was also observed for all the MLH1 variants, and we did not detect any protein aggregates. We did, however, observe large variations in fluorescence intensity, and consequently in steady-state protein levels, between the different MLH1 variants (Fig. 2A). To quantify these differences, we first excluded non-transfected cells based on the intensity of the non-transfected control. We then measured the total MLH1 fluorescence intensity in each cell and normalized it to the intensity for wild-type MLH1. This analysis revealed up to 12-fold differences in intensity between variants, indicating sizable differences in abundance.
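The per-cell quantification described above can be sketched as follows. The text does not state how the non-transfected control was used to exclude cells, so the percentile cutoff here is a hypothetical choice; the function name and inputs (arrays of total MLH1 intensity per cell) are likewise illustrative assumptions.

```python
import numpy as np


def normalized_abundance(variant_cells, wt_cells, control_cells, percentile=99.0):
    """Mean MLH1 intensity of transfected cells for a variant, relative to wild type.

    Cells count as transfected when their total MLH1 intensity exceeds the given
    percentile of the non-transfected control (hypothetical cutoff; the text does
    not specify it).
    """
    threshold = np.percentile(np.asarray(control_cells, dtype=float), percentile)

    def transfected_mean(cells):
        cells = np.asarray(cells, dtype=float)
        selected = cells[cells > threshold]   # keep only cells above background
        if selected.size == 0:
            raise ValueError("no cells above the non-transfected threshold")
        return selected.mean()

    return transfected_mean(variant_cells) / transfected_mean(wt_cells)


control = [1.0] * 10 + [2.0]                 # toy non-transfected intensities
print(normalized_abundance([0.5, 10, 20], [1, 30, 30], control))  # 0.5
```

Averaging per-cell totals before taking the ratio keeps a few very bright cells from dominating less than a pooled-sum approach would; either choice is plausible.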
To examine whether these variations in cellular abundance correlate with thermodynamic stability, we plotted the normalized values against the predicted structural stabilities (ΔΔGs). This analysis indeed reveals that MLH1 variants predicted to be structurally destabilized (high ΔΔGs) also displayed reduced steady-state levels (Fig. 2B). Given that decreased levels of MLH1 protein could cause loss of MMR function, we also examined whether cellular abundance correlated with pathogenicity. Of the 69 variants that we studied, 29 are classified as pathogenic or likely pathogenic in the ClinVar database (Landrum et al., 2018), whereas 12 are (likely) benign and 28 are variants of unknown significance. We found that all (likely) benign variants appeared stable, with steady-state levels >70% of wild type (Fig. 2B). Conversely, 18 of the 29 pathogenic variants (62%) had steady-state levels <70% of wild type (Fig. 2B), suggesting that protein destabilization is a common feature of many LS-linked MLH1 variants and that stability predictions might be useful for classifying variants.
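The comparison against ClinVar amounts to counting, per annotation class, how many variants fall below the 70% abundance cutoff. A minimal sketch, assuming abundances are expressed as fractions of wild type (1.0 = wild-type level); the function name and toy data are hypothetical.

```python
from collections import Counter


def low_abundance_by_class(variants, threshold=0.70):
    """Count, per ClinVar class, how many variants fall below an abundance cutoff.

    variants: iterable of (clinvar_class, abundance), abundance relative to wild type.
    Returns {class: (n_low, n_total, fraction_low)}.
    """
    totals, low = Counter(), Counter()
    for clinvar_class, abundance in variants:
        totals[clinvar_class] += 1
        if abundance < threshold:
            low[clinvar_class] += 1
    return {c: (low[c], totals[c], low[c] / totals[c]) for c in totals}


toy = [("pathogenic", 0.3), ("pathogenic", 0.9), ("benign", 0.95)]
print(low_abundance_by_class(toy))
print(round(18 / 29, 2))   # 0.62, the pathogenic fraction reported in the text
```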
Next, we analyzed how the measured steady-state levels and the stability predictions correlated with previously published in vivo functional data on MLH1 (Takahashi et al., 2007). In that study, human MLH1 variants were tested in a number of assays based on their dominant mutator effect (DME) when expressed in yeast cells (Shimodaira et al., 1998), and we ranked the variants from 0 (no function) to 3 (full function) according to these results. Our comparison revealed that variants with reduced steady-state levels and high predicted destabilization are in general less likely to be functional (Fig. 2C,D), again indicating that reduced structural stability may be linked to the observed loss-of-function phenotype. For example, while 22 of the 23 variants with DME = 3 have steady-state levels >70%, only five of the 23 variants with DME = 0 reach this level. These functional differences are also reflected in the correlation between loss of stability (ΔΔG) and function (Fig. 2D). In particular, none of the fully functional variants (DME = 3) is predicted to be destabilized by more than 3 kcal/mol, whereas 18 of the 23 variants with DME = 0 are predicted to be destabilized by at least this amount. The unstable and non-functional variants do not appear structurally clustered but are found throughout both the N- and C-terminal domains of MLH1 (Fig. 2E). In contrast, the linker region is depleted of detrimental variants, while functional (Takahashi et al., 2007) and benign (Landrum et al., 2018) variants are found in both structured and unstructured regions (supplemental material, Fig. S1).
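One way to quantify the association between predicted destabilization and the ordinal DME ranking is a Spearman rank correlation. The text does not state which statistic, if any, underlies Fig. 2D, so the following is only an illustrative sketch using numpy; tied values (common for DME scores) receive average ranks.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the rank-transformed data."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


ddg = [0.1, 1.0, 4.0, 8.0]   # toy predicted ddG values (kcal/mol)
dme = [3, 3, 1, 0]           # toy DME groups
print(spearman_rho(ddg, dme))  # strongly negative
```

A negative coefficient would indicate that higher predicted destabilization goes with lower function, as described above.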

Proteasomal degradation causes reduced steady-state levels of destabilized MLH1 variants
Next, we analyzed why the steady-state levels of certain MLH1 variants were reduced. For this purpose, we selected eight of the 69 missense MLH1 variants for in-depth analyses (E23D, G67R, R100P, T117M, I219V, R265C, K618A, and R659P). As before, these variants were chosen so that the mutations were distributed across the MLH1 gene and represented a broad range of predicted structural stabilities (ΔΔGs) as well as different pathogenicity annotations in the ClinVar database (Table 1).
The variants were transiently transfected into HCT116 cells. Consistent with the fluorescence-based observations, six of the variants (G67R, R100P, T117M, R265C, K618A, R659P) displayed reduced steady-state levels, while two variants (E23D, I219V) showed wild-type-like levels (Fig. 3A,B). Co-transfection with a GFP expression vector showed that this was not caused by differences in transfection efficiency, since the amount of GFP was unchanged (Fig. 3A).
Next, to investigate whether the reduced MLH1 levels were caused by degradation, we monitored the amounts of MLH1 over time in cultures treated with the translation inhibitor cycloheximide (CHX). This revealed that the variants with reduced steady-state levels were indeed rapidly degraded (half-lives between 3 and 12 hours), whereas wild-type MLH1 and the other variants appeared stable (estimated half-life >>12 hours) (Fig. 3C,D). Treating the cells with the proteasome inhibitor bortezomib (BZ) significantly increased the steady-state levels of the unstable variants, whereas the levels of wild-type MLH1 and the stable variants were unaffected, indicating degradation by the proteasome (Fig. 3E,F). Based on these results, we conclude that certain missense MLH1 variants are structurally destabilized, which in turn leads to proteasomal degradation, reduced steady-state protein levels, and a loss-of-function phenotype as scored by the DME.
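Half-lives from a CHX chase are commonly estimated by assuming first-order decay and fitting a straight line to the log-transformed protein levels. The text does not describe the fitting procedure used, so this is a sketch under that assumption; levels must be positive (for example, band intensities relative to time zero).

```python
import math

import numpy as np


def half_life(times_h, levels):
    """Half-life (hours) from a CHX chase, assuming first-order decay L(t) = L0 * exp(-k t).

    Fits ln(level) against time by least squares; returns math.inf when no decay is detected.
    """
    t = np.asarray(times_h, dtype=float)
    log_levels = np.log(np.asarray(levels, dtype=float))
    slope, _intercept = np.polyfit(t, log_levels, 1)   # slope = -k
    k = -slope
    if k <= 0:
        return math.inf
    return math.log(2) / k


times = [0, 3, 6, 12]
levels = [100 * 0.5 ** (t / 4) for t in times]   # toy data with a 4 h half-life
print(half_life(times, levels))                   # ~4.0
```

For stable proteins, such as wild-type MLH1 here, the fitted slope is close to zero and the estimate becomes very large, matching the ">>12 hours" description.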

PMS1 and PMS2 are destabilized when MLH1 is degraded
In order to carry out its role in MMR, MLH1 must associate with PMS2 to form the MutLα heterodimer. To test whether MLH1 affects the stability of its binding partners, we measured the stability of endogenous PMS1 and PMS2 in HCT116 cells with or without introduction of wild-type MLH1. In cells treated with cycloheximide, the absence of MLH1 led to rapid degradation of both PMS1 and PMS2 (t½ ~ 3-5 hours). However, when wild-type MLH1 was present, PMS1 and PMS2 were dramatically stabilized (t½ ~ 12 hours) (Fig. 4A,B). Treating untransfected HCT116 cells with bortezomib increased the amount of endogenous PMS1 and PMS2, showing that their degradation is proteasome-dependent (Fig. 4C). The stabilizing effect of MLH1 on PMS1 and PMS2 was also observed for the stable MLH1 variants (Fig. 4D,E).
Accordingly, we found that the MLH1 levels correlated with the PMS1 and PMS2 levels (Fig. 4F).
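A correlation of this kind is typically computed on densitometry values normalized to a loading control and to wild type. The sketch assumes actin as the loading control and a Pearson coefficient; neither choice is specified in the text, and the helper names and numbers are hypothetical.

```python
import numpy as np


def relative_levels(band_intensities, actin_intensities, wt_index=0):
    """Normalize band intensities to the actin loading control, then to wild type (index wt_index)."""
    normalized = np.asarray(band_intensities, dtype=float) / np.asarray(actin_intensities, dtype=float)
    return normalized / normalized[wt_index]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])


mlh1 = relative_levels([2.0, 1.0, 0.5], [1.0, 1.0, 0.5])   # toy MLH1 bands
pms2 = relative_levels([4.0, 2.2, 2.0], [1.0, 1.0, 1.0])   # toy PMS2 bands
print(mlh1, pearson_r(mlh1, pms2))
```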
Collectively, these results suggest either that the unstable MLH1 variants are present at levels too low to form complexes with PMS1/2, or that only stable MLH1 variants are able to bind PMS1 and PMS2, and that this binding in turn protects PMS1 and PMS2 from proteasomal degradation. To distinguish between these possibilities, we assessed the PMS2-binding activity of the selected MLH1 variants. To this end, HCT116 cells were co-transfected with MLH1 and YFP-tagged PMS2. Importantly, the overexpressed YFP-PMS2 protein did not affect the MLH1 level and appeared stable in the absence of MLH1 (Fig. 4G), allowing us to directly compare the PMS2-binding activity of the selected MLH1 variants. To ensure that the cells contained sufficient levels of the unstable MLH1 variants, the cells were treated with bortezomib prior to lysis. We found that wild-type MLH1 and the stable variants (E23D, I219V) efficiently co-precipitated with YFP-tagged PMS2 (Fig. 4H).
Several of the unstable MLH1 variants did not display appreciable binding to PMS2, even after their degradation was blocked, suggesting that these variants are structurally perturbed or unfolded to an extent that prevents complex formation with PMS2. Interestingly, the K618A variant displayed a strong interaction with PMS2 (Fig. 4H), indicating that this unstable variant retains the ability to bind PMS2 and could therefore potentially engage in mismatch repair. We note that this result is

HSP70 is required for degradation of destabilized MLH1 variants
Because structurally destabilized proteins tend to expose hydrophobic regions that are normally buried in the native conformation, molecular chaperones, including the prominent HSP70 and HSP90, often engage such proteins in an attempt to refold them or to target them for proteasomal degradation (Arndt et al., 2007). Indeed, both HSP70 and HSP90 are known to interact with many missense variants, though with different specificities and cellular consequences (Karras et al., 2017), and a previous study has linked HSP90 to MLH1 function (Fedier et al., 2005).
To test the involvement of molecular chaperones in degradation of the selected MLH1 variants, we analyzed their interaction with HSP70 and HSP90 by co-immunoprecipitation and Western blotting.
Similar to above, the cells were treated with bortezomib to ensure detectable amounts of MLH1.
Interestingly, four of the destabilized MLH1 variants (G67R, R100P, T117M and R265C) displayed a strong interaction with HSP70, up to approximately 7-fold greater than that of wild-type MLH1 (Fig. 5A,B). In contrast, HSP90 bound all the tested MLH1 variants, including the wild type (Fig. 5C,D), indicating that HSP90 may be involved in the function or general de novo folding of wild-type MLH1, while HSP70 may be involved in handling certain destabilized MLH1 variants, potentially playing a role in their degradation.
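A fold-change in chaperone binding such as the ~7-fold value above is usually the co-precipitated signal normalized to the corresponding input and expressed relative to wild type. The text does not specify the normalization, so the following helper is a hypothetical sketch of that common convention.

```python
def relative_binding(coip_signal, input_signal, wt_coip_signal, wt_input_signal):
    """Co-precipitated signal per unit input, expressed relative to wild-type MLH1.

    All arguments are densitometry values (hypothetical inputs); 1.0 means wild-type-like binding.
    """
    if min(input_signal, wt_input_signal, wt_coip_signal) <= 0:
        raise ValueError("signals must be positive")
    return (coip_signal / input_signal) / (wt_coip_signal / wt_input_signal)


print(relative_binding(3.5, 0.5, 1.0, 1.0))   # 7.0 with these toy numbers
```

Normalizing to input matters here because, even with bortezomib treatment, the unstable variants may be present at lower levels than wild type.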
To test this hypothesis, we measured the steady-state levels of the MLH1 variants following inhibition of HSP70 and HSP90, respectively. We treated cells with the HSP70 inhibitor YM01 or the HSP90 inhibitor geldanamycin (GA) and compared with the MLH1 levels in untreated cells. Interestingly, the levels of some destabilized MLH1 variants increased significantly following HSP70 inhibition.
Three variants in particular (G67R, R100P, T117M) were affected (Fig. 5E); all three were also found to bind HSP70 (Fig. 5A) and, of the eight tested variants, had the lowest steady-state levels prior to HSP70 inhibition. Together, these results suggest that HSP70 actively participates in detecting certain destabilized MLH1 variants and directing them to degradation. In contrast, we did not observe any effect of HSP90 inhibition on MLH1 protein levels for any of the tested variants (Fig. 5F).

Structural stability calculations for predicting pathogenic mutations
Our results show that unstable protein variants are likely to be rapidly degraded, suggesting that predicted changes in the thermodynamic stability of missense MLH1 variants could be used to estimate whether a particular variant is pathogenic. Compared with the sequence-based tools (e.g. PolyPhen2, PROVEAN) currently employed in the clinic (Adzhubei et al., 2010; Choi and Chan, 2015), the FoldX energy predictions provide an orthogonal, structure-based prediction, independent of sequence conservation, of whether a mutation is likely to be pathogenic.
Unlike most variant consequence predictors, FoldX was not trained on whether mutations were benign or pathogenic, but solely on biophysical stability measurements (Guerois et al., 2002). This considerably reduces the risk of overfitting to known pathogenic variants. More importantly, because of the mechanistic link to protein stability, FoldX predictions provide insight into why a particular mutation is problematic (Kiel et al., …) … suggesting that these MLH1 proteins are stable. Accordingly, with only a few exceptions, the most common MLH1 alleles reported in gnomAD also appeared functional (high DME scores) (Fig. 6A).
To further test the performance of the structural stability calculations for identifying pathogenic MLH1 variants, we then compared the ΔΔG values for ClinVar-annotated MLH1 variants. This revealed that the benign MLH1 variants all appeared structurally stable, while many pathogenic variants appeared destabilized (Fig. 6B).
While many pathogenic variants are severely destabilized, others are predicted to be as stable as non-pathogenic variants. This could be explained, e.g., by inaccuracies in our stability calculations or by loss of function via other mechanisms, such as direct loss of enzymatic activity, post-translational modifications or protein-protein interactions (Wagih et al., 2018). Therefore, as a separate method for predicting the biological consequences of mutations, we explored whether sequence analysis of the MLH1 protein family across evolution would reveal differences in selective pressure between benign and pathogenic variants. We analyzed a multiple sequence alignment of MLH1 homologs, considering both conservation at individual sites and coevolution between pairs of residues (Balakrishnan et al., 2011; Stein et al., 2019). Turning these data into a statistical model allowed us to score all possible missense MLH1 variants. As this statistical sequence model is based on homologous sequences shaped by evolutionary pressures, it is expected to generally capture which residues, and pairs of residues, are tolerated (Balakrishnan et al., 2011).
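Models of this type (as fitted by Gremlin) are Potts models with site fields h_i(a), capturing conservation, and pairwise couplings J_ij(a, b), capturing coevolution. A variant can be scored by the change in statistical energy between mutant and wild-type sequences. The sketch below uses the generic Potts energy with a sign chosen so that high scores mean rare substitutions, matching the convention in the text. It is not Gremlin's code, and the array shapes and integer encoding of amino acids are assumptions.

```python
import numpy as np


def potts_energy(h, J, seq):
    """Statistical energy E(s) = -(sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j)).

    h: (L, q) site fields; J: (L, L, q, q) couplings with J[i, j, a, b] == J[j, i, b, a]
    and J[i, i] == 0; seq: integer-encoded sequence of length L.
    """
    L = len(seq)
    e = sum(h[i, seq[i]] for i in range(L))
    e += sum(J[i, j, seq[i], seq[j]] for i in range(L) for j in range(i + 1, L))
    return -float(e)


def mutation_score(h, J, seq, pos, new_aa):
    """E(mutant) - E(wild type) for a single substitution; positive = disfavoured."""
    seq = np.asarray(seq)
    others = np.flatnonzero(np.arange(len(seq)) != pos)   # all positions except pos
    old_aa = seq[pos]
    d_field = h[pos, new_aa] - h[pos, old_aa]
    d_coupling = np.sum(J[pos, others, new_aa, seq[others]] - J[pos, others, old_aa, seq[others]])
    return -float(d_field + d_coupling)


# Toy model with random symmetric parameters (L = 3 positions, q = 2 states).
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 2))
J = rng.normal(size=(3, 3, 2, 2))
J = (J + J.transpose(1, 0, 3, 2)) / 2
for i in range(3):
    J[i, i] = 0.0
print(mutation_score(h, J, [0, 1, 0], 2, 1))
```

Only terms involving the mutated position change, so `mutation_score` avoids recomputing the full energy; the toy parameters serve purely to check it against `potts_energy`.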
In contrast to stability calculations (e.g. via FoldX), this approach is not directly linked to an underlying mechanistic model. Thus, we generally expect destabilizing substitutions to be recognized as detrimental by both FoldX and the evolutionary statistical energies, while variants in functionally active sites might only be recognized by the latter if they do not affect protein stability. On the other hand, stability calculations could capture effects specific to human MLH1 that are more difficult to disentangle through sequence analyses. In our implementation, low scores indicate substitutions that appear tolerated during evolution, while high scores mark substitutions that are rare and therefore more likely to be detrimental to protein structure and/or function. Indeed, the average sequence-based score for the benign variants is lower (substitutions more likely to be tolerated) than that for the ClinVar-curated pathogenic variants (Fig. 6C). The full matrix of evolutionary statistical energies is included as supplemental material file 2.
Finally, to compare how well the above-described evolutionary statistical energies, FoldX, and the more traditional sequence-based methods (PolyPhen2 and PROVEAN) separate pathogenic from non-pathogenic variants, we applied these approaches to a set of known benign and disease-causing MLH1 variants. We then used receiver operating characteristic (ROC) analyses to compare how well the different methods distinguish 16 benign variants from 66 known pathogenic variants (Fig. 6D). The results show that, although all predictors perform fairly well, the structural calculations and evolutionary statistical energies are slightly better at distinguishing disease-linked missense variants from harmless variants. Lastly, we combined structure-based stability calculations and evolutionary statistical energies into a two-dimensional landscape of variant tolerance (Fig. 6E), which largely agrees with the functional classification by Takahashi et al. (2007). Three variants (T662P, I565F, G244V) have low evolutionary statistical energies (typically indicating tolerance) but predicted and experimentally confirmed destabilization relative to wild-type MLH1. Further, a number of stable variants have high DME scores (indicating wild-type-like function) but also high evolutionary statistical energies, indicating likely loss of function. Indeed, comparison to ClinVar showed that several of these are classified as pathogenic (supplemental material, Fig. S2). A possible explanation for this discrepancy is a difference in sensitivity between human variants and the employed yeast assays (Takahashi et al., 2007).
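The ROC comparison can be sketched with the Mann-Whitney formulation of the AUC. According to the Fig. 6 legend, the 66 pathogenic variants were split into four sets, each compared with the 16 benign variants, and the results averaged. How the split was made is not stated, so the random split below is an assumption, and only AUCs (not full curves) are averaged. Scores must be oriented so that higher means more likely pathogenic; for PROVEAN, where more negative scores indicate deleterious variants, the scores would be negated first.

```python
import random


def auc(benign_scores, pathogenic_scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count one half).

    Scores are oriented so that higher = more likely pathogenic.
    """
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))


def subset_averaged_auc(benign_scores, pathogenic_scores, n_sets=4, seed=0):
    """Split pathogenic variants into n_sets random subsets, compute AUC of each vs. all benign, average."""
    if len(pathogenic_scores) < n_sets:
        raise ValueError("need at least one pathogenic variant per subset")
    shuffled = list(pathogenic_scores)
    random.Random(seed).shuffle(shuffled)
    subsets = [shuffled[i::n_sets] for i in range(n_sets)]   # near-equal subset sizes
    return sum(auc(benign_scores, s) for s in subsets) / n_sets


print(subset_averaged_auc([0.1, 0.2, 0.3], [1, 2, 3, 4, 5, 6, 7, 8]))   # 1.0 (perfect separation)
```

Splitting the larger class keeps each comparison roughly balanced, which is presumably the motivation for the subset design described in the legend.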

Discussion
Missense variants in the MLH1 gene are a leading cause of Lynch syndrome (LS) and colorectal cancer (Peltomaki, 2016). In recent years, germline mutations that cause structural destabilization and subsequent protein misfolding have emerged as the cause of several diseases, including cystic fibrosis … Although the variant appears functional in yeast cells (Takahashi et al., 2007), its evolutionary statistical energy of 0.9 indicates that this substitution is rare across MLH1 protein family evolution and thus might be detrimental. Our results for the K618A VUS suggest that this variant, albeit unstable, is still able to associate with PMS2 and may therefore be functional but prematurely degraded. Of the 69 MLH1 variants that we analyzed, 30 have VUS status. Of these, our analysis identified several (e.g. G54E, G244V, and L676R) with characteristics indicating that they are pathogenic: steady-state levels below 50% of wild type, high ΔΔG and low in vivo functionality scores (Takahashi et al., 2007) (Table 1).
The potential use of stability predictions for LS diagnostics is supported by the clear separation of predicted MLH1 stabilities between disease-linked and benign variants. Moreover, we observe that MLH1 alleles occurring more frequently in the population are in general predicted to be stable, suggesting that these common alleles are either benign or disease-causing only with low penetrance. However, not all unstable variants were accurately detected by the structural predictions. For instance, of our eight selected variants, three (R100P, R265C, K618A) appeared unstable but were not predicted to be so (Table 1). Here, it is important to note that the stability predictions report on the global stability of the protein, whereas in a cellular context it is unlikely that any of the variants are fully unfolded. Instead, local elements likely unfold, and although refolding may occur, the locally unfolded state allows chaperones and other protein quality control components to associate with the protein and target it for proteasomal degradation (Fig. 7). A better understanding of the importance of local unfolding events for cellular stability is an important area for further research.
Similar to other destabilized proteins, we found that the degradation of some structurally destabilized MLH1 variants depends on the molecular chaperone HSP70, suggesting that HSP70 recognizes these variants and targets them for proteasomal degradation. Accordingly, we … In conclusion, our results support a model (Fig. 7) in which missense mutations can cause destabilization of the MLH1 protein, leading to exposure of degrons that, in turn, trigger HSP70-assisted proteasomal degradation, disruption of the MMR pathway and, ultimately, an increased cumulative lifetime risk of cancer in LS patients.

Plasmids
Plasmids for expression of wild-type and mutant MLH1 variants have been described before … hours. Cells were lysed in SDS sample buffer (94 mM Tris/HCl pH 6.8, 3% SDS, 19% glycerol and 0.75% β-mercaptoethanol) and protein levels were analyzed by SDS-PAGE and Western blotting.

SDS-PAGE and Western blotting
Proteins were resolved by SDS-PAGE on 7x8 cm 12.5% acrylamide gels, and transferred to 0.2 μm nitrocellulose membranes (Advantec, Toyo Roshi Kaisha Ltd.). Blocking was performed using PBS

Evolutionary statistical energy calculations
To assess the likelihood of finding any given variant in the protein family, we created a multiple sequence alignment of human MLH1 using HHblits (Zimmermann et al., 2018) and then calculated a sequence log-likelihood score combining site-conservation and pairwise co-variation using Gremlin (Balakrishnan et al., 2011). Scores were normalized to a range of (0,1), with low scores indicating tolerated sequences and high scores indicating variants that are rare or unobserved across the multiple sequence alignment. Positions at which the number of distinct homologous sequences was too small to extract meaningful evolutionary statistical energies are set to NA (supplemental material file 2).
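The normalization step can be sketched as min-max scaling of the raw statistical energies, with low-coverage positions reported as NA (here `None`). Min-max scaling is an assumption, since the text only states that scores were normalized to (0,1), and the coverage cutoff `min_sequences` is a hypothetical parameter because the actual threshold is not given.

```python
def normalize_energies(raw, n_sequences, min_sequences):
    """Min-max scale raw statistical energies to [0, 1]; low-coverage positions become None (NA).

    raw: dict mapping (position, amino_acid) -> raw energy (higher = rarer).
    n_sequences: dict mapping position -> number of distinct homologous sequences at that position.
    min_sequences: hypothetical coverage cutoff.
    """
    usable = {k: v for k, v in raw.items() if n_sequences.get(k[0], 0) >= min_sequences}
    if not usable:
        raise ValueError("no position passes the coverage cutoff")
    lo, hi = min(usable.values()), max(usable.values())
    span = hi - lo
    if span == 0:
        raise ValueError("all usable energies are identical")
    return {k: ((raw[k] - lo) / span if k in usable else None) for k in raw}


raw = {(1, "A"): 2.0, (1, "C"): 4.0, (2, "A"): 10.0}
print(normalize_energies(raw, {1: 100, 2: 5}, min_sequences=50))
```

Excluding low-coverage positions before computing the minimum and maximum keeps unreliable energies from stretching the scale for all other positions.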
Other sequence-based predictions of functional variant consequences were retrieved from the webservers of PROVEAN (Choi and Chan, 2015) and PolyPhen2 (Adzhubei et al., 2010).

Dominant mutator effect (DME)
We grouped the functional classification observations from Takahashi et al. (2007) into four categories by counting the number of assays in which each variant showed functional behavior.
Thus, variants in group 0 were non-functional in all three assays, those in group 3 were functional in all three assays, and the rest were functional in some but not all assays.
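The grouping rule above, together with the cross-tabulation against the 3 kcal/mol ΔΔG cutoff discussed in the Results, can be sketched as follows. The input format (a list of three boolean assay outcomes per variant) and the helper names are illustrative assumptions.

```python
from collections import defaultdict


def dme_group(assay_outcomes):
    """Number of assays (out of three) in which the variant behaved functionally."""
    outcomes = list(assay_outcomes)
    if len(outcomes) != 3:
        raise ValueError("expected outcomes from exactly three assays")
    return sum(1 for ok in outcomes if ok)


def destabilized_by_group(variants, cutoff=3.0):
    """variants: iterable of (assay_outcomes, ddg). Returns {group: (n_above_cutoff, n_total)}."""
    counts = defaultdict(lambda: [0, 0])
    for outcomes, ddg in variants:
        group = dme_group(outcomes)
        counts[group][1] += 1
        if ddg > cutoff:          # destabilized by more than the cutoff (kcal/mol)
            counts[group][0] += 1
    return {g: tuple(c) for g, c in sorted(counts.items())}


toy = [([True] * 3, 0.5), ([False] * 3, 4.0), ([False] * 3, 1.0), ([True, False, True], 3.5)]
print(destabilized_by_group(toy))   # {0: (1, 2), 2: (1, 1), 3: (0, 1)}
```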
No competing interests declared.

…; the latter are color-coded by DME. Note that the leftmost group of colored dots are variants that have been reported in patients but are not recorded in gnomAD (thus their allele frequency in gnomAD is zero). Variants with common to intermediate frequencies are all predicted to be stable, while some rare variants are predicted to be destabilized. (B) FoldX ΔΔG for benign (blue), likely benign (cyan), likely pathogenic (orange), and pathogenic (red) variants reported in ClinVar with "at least one star" curation. (C) Evolutionary sequence energies for ClinVar-reported variants, color scheme as in (B). (D) ROC curves for FoldX ΔΔGs, evolutionary sequence energies, and sequence-based predictors (PolyPhen2, PROVEAN) to assess their performance in separating benign from pathogenic variants. As there are 16 benign and 66 pathogenic MLH1 missense variants in ClinVar, we split the pathogenic variants into 4 sets, performed ROC analysis of each subset against the benign variants, and report the average ROC curves and areas under the curve (AUCs). TPR, true positive rate. FPR, false positive rate. (E) Landscape of variant tolerance combining changes in protein stability (x axis) and evolutionary sequence energies (y axis), such that the upper right corner indicates the most likely detrimental variants, while those in the lower left corner are predicted stable and observed in MLH1 homologs. The green background density illustrates the distribution of all variants listed in gnomAD. The combination of metrics captures most non-functional variants (DME scores 0 or 1). Outliers are discussed in the main text.

The wild-type (green) MLH1-PMS2 heterodimer promotes DNA mismatch repair. Disease-linked missense MLH1 variants (red) may also promote DNA repair but are at risk of dissociating from PMS2 due to structural destabilization. The structural destabilization of MLH1 may also cause partial unfolding of MLH1, which is recognized by the molecular chaperone HSP70 and leads to proteasomal degradation of the MLH1 variant. In turn, degradation of MLH1 leaves PMS2 without a partner protein, resulting in proteasomal degradation of PMS2.