Introduction

The pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1,2,3] has led to intensive research into its spike-protein (S-protein, Fig. 1a), whose evolution may lead to changes in its surface that evade the human immune system [4,5,6]. The S-protein is a glycosylated homo-trimer on the surface of coronaviruses responsible for their characteristic shape [7, 8]. During entry of the virus particle into human host cells, the S-protein binds to the cell-surface receptor angiotensin-converting enzyme 2 (ACE2), Fig. 1b [9,10,11].

Fig.1
figure 1

Structure of the SARS-CoV-2 S-protein, studied in this work. a Representative structure of the apo-S-protein state (PDB: 7DF3) [39] with the studied mutated sites of the RBD shown in red color. b Representative cryo-EM structure of the S-protein in complex with ACE2, (PDB: 7DF4). c Overlay of the eight prefusion S-protein structures used in this study to account for structural heterogeneity when computing mutation stability effects

Due to the importance of S-protein epitopes for immune recognition, presentation of the S-protein is the rationale behind a range of important vaccines [12, 13]. The presence of prominent antibodies in a population may lead to selection pressure to change the S-protein surface to evade the antibodies learned from vaccination or infection with earlier variants [14]. Such antigenic drift may lead to variants capable of escaping vaccine-induced immunity, a problem that is likely to persist for many years. The omicron variant has been a hallmark example of such evolution, leading to a very large number of breakthrough infections in late 2021 and early 2022 [15].

Understanding such evolution effects requires protein structural data [16,17,18,19]. Since function is structure-dependent, the structure is the platform on which the amino acid evolution occurs, with evolution rates typically depending on the structural context of the amino acid site [20,21,22]. The recent technical breakthroughs in cryo-electron (cryo-EM) microscopy of large molecules [23,24,25,26,27] are an excellent example of basic science as essential for innovation, enabling publication of hundreds of SARS-CoV-2 S-protein structures during the pandemic both of the protein alone (apo-S-protein), in various conformation states, and in complex with a large range of antibodies and ACE2 at resolutions typically at 2–4 Å [28].

Among the selection pressures acting on a protein beyond its direct function is the need to maintain the overall fold stability and translational fidelity, and thus evolution of new mutations often occurs with a tradeoff not to undermine these properties [17, 29,30,31,32]. Before fusion with a host cell, the S-protein is in a metastable conformation state under selection to evade antibodies and enhance ACE2 binding, which requires conversion to an “open” conformation state with the receptor binding domain (RBD) in an upward conformation state [33,34,35,36,37]. A new arising mutation with enhanced ACE2 binding or antibody evasion could affect epidemiology and clinical presentation (transmission / antigenic drift) but less likely so if the S-protein lost structural integrity. By conditioning the fitness space, stability function tradeoffs common to other aspects of protein evolution [29] may thus affect epidemiological and clinical relevance of new variants, an important topic even in the post-pandemic period [38].

Structure-based computational models based on machine learning or energy-based force fields can compute changes in protein stability upon mutation, [21, 40,41,42,43,44,45,46,47] and describe protein stability, antibody binding, and ACE2 binding for any possible mutation in the S-protein, using S-protein structures as input. If accurate, these models could then estimate the epidemiological (and perhaps clinical) impact of mutations, since transmission potential (reproduction numbers), incubation period, and virulence relate to these molecular properties of the virus [14, 48]. Unfortunately, there are major limitations to the accuracy of such methods [42, 49,50,51,52,53,54,55]. Good experimental data would help to train models suitable for predicting effects for new mutations where data are unavailable. The few experimental S-protein stability data from time-consuming differential scanning calorimetry or denaturation experiments prevent statistically meaningful analysis. Instead, we hypothesized that expression levels, available for very many SARS-CoV-2 S-protein mutations, may be a proxy of fold stability, as argued previously [56] and explored below for the first time.

Methods

Data for expression and ACE2 binding

Data for the effect of S-protein RBD single-point mutations on yeast expression and ACE2-binding were published by Starr et al. [56]. These mutation sites are marked in red color in Fig. 1a. The effect of mutations on expression is reported as the difference in log-mean fluorescence intensity (MFI) relative to wild-type (ΔlogMFI = logMFIvariant−logMFIwild-type), such that a positive value indicates higher RBD expression [56]. The effect of mutations on ACE2-binding (Fig. 1b) is calculated from the apparent dissociation constants (KD, app) and shown as the difference in log10(KD,app) relative to wild-type (Δlog10(KD,app) = log10(KD,app)wild-type−log10(KD,app)variant), such that a positive value indicates higher variant ACE2 binding [56]. There were two independent measurements (one from each of two independent libraries) for each mutation, providing an important way to assess the quality and reproducibility of the individual data points.

Data curation

There is an overall good correlation between the results obtained from the two independent libraries (Figure S1a). We removed outliers by filtering out observations where the two replicates are different (for binding or expression), defined as residuals > 1, or where data for one replicate is missing. The rationale is that data points not similar in the two experimental replicates cannot be considered reproduced and may erroneously affect analysis. We furthermore removed data points with effects on binding in either replicate < −4.5 to avoid values near the detection limit (see Figure S1a). The correlation between the replicates after curation is shown in Figure S1b. Removing outliers and data points at the detection limit also excluded stop codon mutations (Figure S2). The described curation removed 685 of the original 4221 data points. Further analyses were performed using the remaining 3536 data points and the average of the binding and expression data from the two independent libraries, with detailed data collected in the supplementary file Table_S1.csv.

Structures and computer models used to compute stability effects

For each mutation in the dataset, the change in protein free energy of folding (ΔΔG, kcal/mol) was computed using three different methods: The relatively new (2019) neural network method DeepDDG [57], the graph-based machine learning method mCSM [45] (mCSM), and the linear regression model SimBa-IB [58]. DeepDDG and mCSM were accessed via their respective web servers (http://protein.org.cn/ddg.html, http://biosig.unimelb.edu.au/mcsm/stability), and Simba-IB was run from the command-line (http://github.com/kasperplaneta/SimBa2). These data are available in the supplementary data file Table_S2.csv.

We have previously shown that analysis of functional properties can depend on input structure used, [59] and such heterogeneity is also seen in published cryo-EM structures of the S-protein [28]. Since our interest here is in exploring if computationally estimated stability changes of mutations correlate with experimental expression data of the prefusion S-protein, we used eight experimental cryo-EM structures of the prefusion S-protein from the Protein Data Bank (PDB) which reflect the state evaluated in the expression data better than an antibody or ACE2-bound state: 6VXX by Walls et al. [35], 6X6P by Herrera et al. [60], 6X79 by McCallum et al. [61], 6Z97 by Huo et al. [62], 6ZB4 by Toelzer et al. [63], 7CAB by Lv et al. [64], 7DDD by Zhang et al. [65], and 7DF3 by Zhang et al. [39] to account for such heterogeneity. Structural alignment of these eight structures using Schrodinger [66] indicates pairwise RMSD values of 0.64–3.10 Å. 7DDD and 7DF3 are made by the Shanghai group (Cong, Huang et al.), and 6VXX and 6X79 by Veesler’s group; the others by different groups. All structures are closed, except 6Z97 having a partly open prefusion state for one of its RBDs. The structures are shown in structural overlay in Fig. 1c.

The relative solvent accessible surface area (RSA) of the mutated sites was calculated using SimBa-IB [58], which uses FreeSASA for this task [67]. Because the eight PDB structures represent homo-trimers, the ΔΔG and RSA values reported in this study are average ΔΔG and RSA values for the three chains (A, B, and C) of each structure.

It is important to note that the cryo-EM structures discussed may not reflect very precisely the real conformations of the S-protein at physiological temperature (37 °C): Cryo-EM structures are typically obtained with samples deposited with vitrified ice and rapidly cooled using a cryo-agent [24, 25, 68]. The freezing may remove some conformational dynamics [69,70,71,72,73]. Conformational changes of the S-protein may also be temperature-dependent [74]. In addition, the physiologically relevant state of the S-protein tends to be heavily glycosylated, which none of the experimental assays studying mutation impacts so far has addressed. Our goal is to identify whether experimental apo-S-protein data can be approximated by computed data, without translating these to the physiological significance of these data.

Results and discussion

Experimental and computed genotype–phenotype heat maps

Our main interest was to investigate whether experimentally measured expression and ACE2 binding of S-protein RBD mutants [56] can be related to the thermodynamic stability of the mutants. Amino acid substitutions often change the stability of a protein with a tendency for the distribution of such effects to be skewed toward destabilization [75]. Mutations in RBD affect expression of the S-protein and binding to ACE2 differently, but as discussed before [56], there is a relatively strong correlation (R = 0.59) between the effect of mutations on expression and ACE2 binding (Figure S3a). This relationship is not intuitive but could relate to underlying effects such as ACE2 recognition and stability both relating to the mutating site’s solvent accessibility or correlations between codon use [76] which affects replication efficiency [77] and amino acid properties. Expression levels could also affect apparent binding constants even at the same specific ACE2 affinity, even if this is apparently accounted for, due to complex (e.g., tertiary) interactions. The large number of mutations having a small or “nearly neutral” impact, as noted by the authors [56] and reasonably expected, may also spuriously affect the relationships. However, if the data are grouped to adjust for the skewed distribution, the correlation between expression and binding becomes even larger and quite remarkable (R = 0.88) (Figure S3b).

To assess the correlation between the effects of mutations on expression, ACE2 binding and protein stability, we computationally predicted the stability changes (ΔΔG) caused by the mutations in RBD using three state-of-the-art methods (DeepDDG, mCSM, and SimBa-IB) and compared with the experimentally observed changes in expression and binding. Figure 2a, b shows heat maps build from the experimental data by Starr et al. [56] after curation for non-reproducible replicates as described in Methods. These experimental heat maps are compared with the computational heat maps of ΔΔG derived in the present work, using DeepDDG (Fig. 2c), mCSM (Fig. 2d), and SimBa-IB (Fig. 2e). SimBa produces more stabilizing trends overall, as it was developed to handle destabilization biases. (The method performs similarly to other methods in benchmarks despite this feature.) [49, 58]

Fig. 2
figure 2

Heat maps of mutation effects on RBD experimental expression and ACE2 binding (from Starr et al. [56]) compared with computed stability change (this work). a expression; b ACE2-binding affinity. Protein stability change computed with: (c) deepDDG; d mCSM e SimBa-IB. Rectangles are colored by mutational effect according to scale bars on the right. Black dots indicate the wild-type residues. White rectangles represent mutations for which there is no data in the curated dataset

The heat maps in Fig. 2 of experimental binding and expression do have some residual similarities, as also mentioned by Starr et al. [56]. The computed stability effects provide estimates of the impact of all possible mutations in the RBD, and show some similarities to the expression data, notably with nearly neutral effects (gray) being common to both experimental and computed data in the N- and C-terminals and in some regions of the protein around 444–450 and a larger area around 470–488. In contrast, mutations in the region 388–396 have strong effects on protein stability but are nearly neutral with respect to expression and ACE2 binding. Overall, there is less similarity between ACE2 binding and the computed stability changes as perhaps expected.

RBD sites with non-neutral effects cluster in the structured regions

Figure 3 shows the effect of mutation at each site on expression, ACE2 binding and predicted protein stability mapped onto the RBD structure. The effect at each site is calculated as the average of the absolute effect (without sign) of all 19 mutations as a measure of the tolerance to mutation at each site. Mutations that affect expression are mainly located in the core RBD subdomain, and in particular in the central beta sheet and its flanking alpha helices (Fig. 3a). Mutations that affect binding are located in the ACE2 binding subdomain or, similarly to mutations that affect expression, in the central beta sheet, while large parts of the RBD domain are tolerant to mutations with regard to ACE2 binding (Fig. 3b). Mutations that affect protein stability are mainly located in the structured core of the RBD subdomain, similarly to the effect of mutations on expression (Fig. 3c–e). When evaluating the computational methods in this way, their ability to identify the most tolerant sites for mutation (or equally, the sites more likely to be neutral from an evolutionary perspective) becomes more evident than in the heat maps of Fig. 2, while the better agreement with expression remains clear.

Fig. 3
figure 3

Effects of mutations mapped onto the SARS-CoV-2 RBD structure. Each residue is color-coded from cyan to red according to the average absolute effects of mutation. a RBD expression. b ACE2 binding. c Protein stability estimated by DeepDDG. d Protein stability estimated by mCSM, and e Protein stability estimated by SimBa-IB. Cyan indicates no effect, red indicates a strong effect. The color scales are relative within each panel. The ACE2 binding motif is shown on the right-hand side in each structure. (PDB: 6Z97, Chain A)

Computed and experimental S-protein mutant properties correlate significantly

As discussed above, mutations in the RBD affect expression, ACE2 binding and protein stability differently, but with some overlap in site effects especially for protein stability and expression. To quantify this relationship, we plotted the predicted ΔΔG values for all mutations against the observed changes in RBD expression (Fig. 4a) and ACE2 binding (Fig. 4b), respectively, for each experimental structure used as input for the three methods (24 comparisons for all studied RBD mutations for expression and 24 comparisons for ACE2 binding). Correlations between RBD expression and ΔΔG for individual PDB structures are consistently observed, but their magnitude depends on the prediction method, with correlation coefficients ranging from 0.40 to 0.48 for DeepDDG, 0.27 to 0.34 for mCSM, and 0.12 to 0.22 for SimBa-IB (Fig. 4a).

Fig. 4
figure 4

Predicted change in protein stability as a function of mutations effect on RBD expression and ACE2 binding. Three different prediction methods (DeepDDG, mCSM, and SimBa-IB; indicated on the right) were used to predict the change in protein stability for each mutation using eight different experimental structures of the S-protein (indicated on top). Orange lines indicate the resulting linear regression, and the correlation coefficients (R) are shown. The p-values for all correlations are < 0.001. a ΔΔG plotted against RBD expression, and b ΔΔG plotted against ACE2 binding

The observed ACE2 binding and the predicted ΔΔG also correlate, depending on the prediction method (Fig. 4b), which is in line with the correlation between experimental measures of expression and ACE2 binding (Figure S3). However, the correlations are weaker than for expression with correlation coefficients ranging from 0.29 to 0.37 for DeepDDG, 0.13 to 0.26 for mCSM, and 0.09 to 0.15 for SimBa-IB. In all 48 comparisons of computed and experimental data in Fig. 4, the correlations are statistically significant at the 99% confidence level (p-values of linear regression < 0.01).

Figure 5 shows the aggregate data for all structures, to account for structural heterogeneity effects. The relationships observed in Fig. 4 still hold true when averaging over all eight structures, i.e., our result is robust to structural heterogeneity. The effect of mutations on protein stability predicted using DeepDDG correlates better with both expression and ACE2 binding than using mCSM and especially SimBa-IB (Figs. 4 and 5). Judging from Fig. 3c–e, the three methods agree to a large extent on how they predict the tolerability of a site to mutation, whereas they differ in how they predict the effect of individual mutations (Fig. 2c–e), and this is most likely the cause of the differences in the resulting correlations.

Fig. 5
figure 5

Average change in protein stability for the eight used structures vs. effect on RBD expression and ACE2 binding. Stability changes upon mutation for DeepDDG, mCSM, and SimBa-IB correlate with the experimental data at 99% significance (p-values of linear regression). Orange lines indicate the resulting linear regression

Both the binding and expression effects are highly skewed, with an overrepresentation of data points between 0 and −1 (Figure S2). In order to adjust for this, we grouped the data into bins of 0.5-width and 0.25-width and calculated the mean expression and binding effect in each bin, and the mean predicted ΔΔG value for each bin. Figure 6 shows these data as averages of all PDBs (data for individual structures in Figures S4–S7). The correlation increases substantially upon binning, with each data point more well-determined, whereas the p-values decrease substantially due to the few aggregate data points after averaging. Remarkably, the computational data correlate extremely well with the binned experimental data, especially for DeepDDG, much more than normally seen [49]. Considering that the models were developed to predict fold stability effects, not expression (which also depends on effects at the nucleic acid level) and that we used the full S-protein structures, whereas the experiments express RBD on the yeast surface with expected modifications, this result is very surprising. One interpretation of this result is that broader functional properties of the mutant space of the S-protein are in fact largely determined by a few simple features, due to underlying correlations.

Fig. 6
figure 6

Predicted change in protein stability as a function of mutations effect on RBD expression and binding to ACE2 using binned data. Three different prediction methods (DeepDDG, mCSM, and SimBa-IB) were used to predict the change in protein stability for each mutation using eight different experimental structures of Spike-protein. The average change in stability for the eight structures is used in the calculation. a Binding and expression data grouped in 0.5-value bins b Binding and expression data grouped in 0.25-value bins

To understand the correlations on a per-site basis, Fig. 7 shows the relationships between the experimental and computed data averaged over sites, as an indicator of the site's tolerance to mutations. As this removes some amino acid specific variations between the three computer models used, the correlations now become more similar between the methods. In all cases, there is a significant correlation (99% confidence level, p-values of linear regression), such that sites more neutral to expression and ACE2 binding effects experimentally are also more neutral toward computed stability effects.

Fig. 7
figure 7

Site-averaged impact on ACE2 binding and expression vs. stability changes. This figure shows experimental-computational relationships for site-averaged properties (all mutations with data for each site-averaged) as an indicator of the site's tolerance to mutation

As shown in Figure S8, the three methods when applied to the same structures show generally good correlations, with R = 0.55–0.68, i.e., they have a large overlap in their description of the general trends in the total data set. Still, deepDDG is a clearly better method for estimating the experimental data, especially the expression data (Figs. 4, 5, 6), although the reasons can be several (it is a neural network method trained on > 5000 data points; [57] SimBa is a simpler linear regression model, [58, 78], and mCSM, a graph-based machine learning method, is somewhat older [45]).

Mutations in residues that are buried in the core of a protein tend to have larger effect on protein stability, which is also the case for the RBD domain, where mutations in the core subdomain are less tolerated (Fig. 3c–e) than mutations near the surface. To quantify if surface exposure of residues in RBD correlated with the effect of the mutations on expression and binding, we plotted these variables against the RSA for each site, and we see a moderate correlation (R = 0.29 for binding, and R = 0.43 for expression) with a notable overall tolerance to mutation at sites with high solvent exposure (Figure S9). As expected, there is also a good correlation between surface exposure and predicted stability changes, with correlation coefficients ranging from 0.27 (for SimBa-IB) to 0.60 (for DeepDDG; Figure S10).

Whereas enough data points required for statistical analysis are available for expression, there are a few good studies studying directly the turnover/stability of selected mutations. Barrett et al. provided data on D614G, A831V, D839Y/N/E, S943P, and P1263L, showing diverse fusion effects but turnover/stability mostly similar to the reference Wuhan strain [79]. Although our study focuses on RBD mutations as in the experimental assays, this agrees well with ΔΔG = − 0.4 kcal/mol for D614G and − 0.1 kcal/mol for S943P (using SimBa; sites 831 and 839 and 1263 not being available in the 6X6P structure used for this estimate; average stability effect of all possible mutations − 1.2 kcal/mol, destabilizing). However, there are too few data to make these comparisons statistically significant and agreement could be coincidental.

Also of interest, Teng et al. [80] investigated S-protein mutation stability effects exhaustively using other models (FoldX primarily), however without validation against experimental data. Still, the cited relatively small stabilization effect for D614G in that study is reasonably consistent with the experiments by Barrett et al. [79] given the uncertainty in computations and experimental data, but in contrast to another computational study comparing many computer models, but finding D614G destabilizing [81]. As computational estimates for single mutations can thus vary, broader benchmarking of groups of mutations against many experimental data, as here, may be necessary.

A major limitation moving forward on computational structure-based SARS-CoV-2 evolution is the in vivo relevance of the cryo-EM structures used as input and the in vitro protein states using to generate the experimental data, as the S-protein is heavily glycosylated [82]. Another major limitation specific to single-mutation data (both experimental and computational) is epistasis modulating the impact of mutation effects when multiple mutations are present together [83]. These limitations are in addition to those of wild-type-structure-based computer models extrapolating effects to a mutant protein state [42, 50, 51, 58]. Also, pathogens with many proteins contributing to fitness and phenotype, e.g., bacteria, would require models for many proteins, posing additional major challenges.

Concluding remarks and biological implications

The SARS-CoV-2 S-protein fusion with human ACE2 is a prerequisite for host cell entry [9, 10]. However, S-protein antibodies induced by previous infection or vaccines bind the S-protein and neutralize some virus particles, thus reducing infectivity. During evolution of SARS-CoV-2, these two effects are under selection [28]. Most SARS-CoV-2 evolution until the omicron variant involved selection for better ACE2 binding [84], whereas omicron reflects substantial evasion of existing antibodies via its many mutations in the S-protein [85]. In addition to these two effects, protein fold stability is an important constraint on protein evolution of new functionality (function-stability tradeoffs) [29, 30, 86, 87] and may play a role in SARS-CoV-2 S-protein evolution as well [28]. In order to understand and possibly predict SARS-CoV-2 evolution, an important health challenge, [28, 38] we explored if computer models can predict experimental mutant expression data, as a proxy of stability.

Our work shows that computed protein stability effects correlate significantly for all 48 comparisons of data sets (eight structures, three methods, and two properties) with expression levels and to lesser extent ACE2 binding observed in experiments [56]. The correlations between ACE2 binding and expression may reflect underlying correlations to codon use, amino acid chemical properties, and site solvent exposure. Such correlations could impact the mutability of the SARS-CoV-2 S-protein and affect phenotype tradeoffs, and thus ultimately virus evolution, given that S-protein fold stability in the prefusion state is important [28, 33]. To the extent that protein stability maintenance is important, it will affect SARS-CoV-2 S-protein evolution via constraints on antigenic drift.

Single amino acid changes as analyzed above and in assays [88, 89] are unlikely to be fully additive in variants with multiple substitutions, due to amino acid correlations (a form of intra-gene epistasis), and possibly epistasis with other virus genes [83, 90,91,92]. These epistasis effects haunt the protein evolution field and are not easily accounted for, but recent work suggests that substantial parts of the epistasis is already utilized, [93] although this remains to be studied in future work, and does not include potential inter-gene epistasis such as processing of S-protein RNA by non-structural proteins during virus replication within the host cell. Still, the maintenance of S-protein fold stability in the lipid surface of the virion is likely to be an important constraint on ACE2 binding and antigenic drift making the heat maps studied here of interest both as a proxy of expression and S-protein stability but possibly also as a contribution to computational estimates of the fitness function of SARS-CoV-2.

Our work represents the first benchmark of computer models against a large experimental data set of S-protein mutation effects. The finding that expression data correlate to computationally estimated stability effects suggests that computer models may estimate the stability/expression effects of new mutations for which we do not have data, although many challenges remain, as described. If and only if these challenges are addressed, appropriate models could perhaps eventually help to rationalize from the molecular impact of the amino acid changes to the observable characteristics of the virus, including its epidemiology.