Shifting mutational constraints in the SARS-CoV-2 receptor-binding domain during viral evolution

SARS-CoV-2 has evolved variants with substitutions in the spike receptor-binding domain (RBD) that impact its affinity for ACE2 receptor and recognition by antibodies. These substitutions could also shape future evolution by modulating the effects of mutations at other sites—a phenomenon called epistasis. To investigate this possibility, we performed deep mutational scans to measure the effects on ACE2 binding of all single amino-acid mutations in the Wuhan-Hu-1, Alpha, Beta, Delta, and Eta variant RBDs. Some substitutions, most prominently N501Y, cause epistatic shifts in the effects of mutations at other sites. These epistatic shifts shape subsequent evolutionary change, for example enabling many of the antibody-escape substitutions in the Omicron RBD. These epistatic shifts occur despite high conservation of the overall RBD structure. Our data shed light on RBD sequence-function relationships and facilitate interpretation of ongoing SARS-CoV-2 evolution.

Consultants CMYC-45F) to detect yeast-displayed RBD protein and 1:200 PE-conjugated streptavidin (Thermo Fisher S866) to detect bound ACE2. At each ACE2 sample concentration, single RBD + cells were partitioned into bins of ACE2 binding (PE fluorescence) using a BD FACS Aria II as shown in Fig. S1A. A minimum of 10 million cells were collected at each sample concentration. Cells in each bin were grown overnight in 1 mL SD-CAA+pen-strep, and plasmid was isolated using 96-well yeast miniprep kits (Zymo D2005) according to manufacturer instructions, with the addition of an extended (>2 hr) Zymolyase treatment and a -80°C freeze/thaw cycle prior to cell lysis. N16 barcodes in each post-sort sample were PCR amplified as described in Starr et al. (2) and submitted for Illumina HiSeq 50bp single end sequencing. Sequencing reads are available on the NCBI Sequence Read Archive, BioProject PRJNA770094, BioSample SAMN25944367.
Demultiplexed Illumina barcode reads were aligned to library barcodes in barcodemutant lookup tables using dms_variants (version 0.8.9), yielding a table of counts of each barcode in each FACS bin which is available at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS_variants/blob/main/results/counts/variant_counts.csv. Read counts in each FACS bin were downweighted by the ratio of total sequence reads from a bin to the number of cells that were sorted into that bin from the FACS log.
We estimated the level of ACE2 binding of each barcoded mutant at each ACE2 sample concentration based on its distribution of counts across FACS bins as the simple mean bin (34) as described in (2). We determined the binding constant KD describing the affinity of each barcoded mutant for ACE2 along with free parameters a (titration response range) and b (titration curve baseline) via nonlinear least-squares regression using the standard noncooperative Hill equation relating the mean sort bin to the ACE2 labeling concentration: bin = a × [ACE2] / ([ACE2] + KD) + b The measured mean bin value at a given ACE2 concentration was excluded from curve fitting for a barcode if fewer than 2 counts were observed across the four FACS bins at that ACE2 concentration. Individual concentration points were also excluded from the curve fit if they exhibited bimodality (>40% of counts of a barcode were found in two non-consecutive bins). To avoid errant fits, we constrained the baseline fit parameter b to be between 1 and 1.5, the response parameter a between 2 and 3, and the KD parameter between 1e-15 and 1e-5. The fit for a barcoded mutant was discarded if the average count across all sample concentrations was below 2, or if more than one sample concentration was missing or excluded. We also discarded curve fits where the normalized mean square residual (residuals normalized relative to the fit response parameter a) was >40 times the median normalized mean square residual across all titration fits. Final KD binding constants were expressed as -log10(KD), where higher values indicate higher binding affinity. The computational pipeline for computing per-barcode binding constants is available at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS_variants/blob/main/results/summary/compute_binding_Kd.md.
Because most mutants in the library were independently associated with more than one N16 barcode, we were able to average across internal replicates to derive final mutant affinities. Barcode affinities associated with identical RBD genotypes were first averaged within each experimental replicate. The correlations in per-replicate collapsed affinities are shown in Fig.  S1C. The final average was then determined as the average of each replicate value. The final collapsed KD for each mutant is given in Data S1 and https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS_variants/blob/main/results/final_variant_scores/final_variant_scores.csv. The median mutant's ACE2 binding affinity measurement collapses 15 total barcodes across the replicate titration experiments (Fig. S1E). The number of independent barcodes collapsed into each mutation's final KD is reported in the raw data table linked above.

RBD expression deep mutational scanning experiments
Libraries were grown and induced for RBD expression as described above. Induced cells were washed, labeled with 1:100 FITC-conjugated chicken anti-Myc, and washed in preparation for FACS. Single cells were partitioned into bins of RBD expression (FITC fluorescence) using a BD FACS Aria II as shown in Fig. S1B. A total of >10 million viable cells (estimated by plating of post-sort dilutions) were collected for each library. Cells in each bin were grown out, plasmid isolated, and N16 barcodes sequenced as described above, except read count downweighting used the post-sort colony counts instead of the FACS log counts. Sequencing reads are available on the NCBI Sequence Read Archive, BioProject PRJNA770094, BioSample SAMN25944367.
We estimated the level of RBD expression (mean fluorescence intensity, MFI) of each barcoded mutant based on its distribution of counts across FACS bins and the known logtransformed fluorescence boundaries of each sort bin using a maximum likelihood approach (2,34), implemented using the fitdistrplus package (version 1.0.14) in R (35). Expression measurements were discarded for barcodes for which fewer than 10 counts were observed across the four FACS bins. The full pipeline for computing per-barcode expression values is available at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS_variants/blob/main/results/summary/compute_expression_meanF.md. Final mutant expression values were collapsed within and across replicates as described above (correlation between replicates shown in Fig. S1D), with a median of 10 barcodes collapsed into each mutant expression measurement (Fig. S1E). Final mutant expression values are available in Data S1 and https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS_variants/blob/main/results/final_variant_scores/final_variant_scores.csv.

Quantification of epistasis
Epistatic shifts at each site between pairs of RBD variants were quantified via a summary metric, Jensen-Shannon divergence, that captures variation in the 20 amino-acid-level affinity or expression phenotypes measured at a position (36). First, affinity (dissociation constant KD) or expression (MFI) phenotypes fi of each mutant i at a site were transformed to a probability analog πi: Note that for affinity, the resulting πi values correspond to Boltzmann weights, since -ln(KD) is proportional to the free energy in units of kBT. The Jensen-Shannon divergence between vectors p and q representing the 20 amino acid πi at a site in two RBD variants is the average Kullback-Leibler divergence of p and q from their per-index-mean vector m: ( || ) = ( || ) + ( || ) 2 where the Kullback-Leibler divergence is calculated as: The Jensen-Shannon divergence ranges from 0 for two vectors of probabilities that are identical to 1 for two vectors that are completely dissimilar. For context, scatterplots in Fig. 2C and Figs. S3, S4C illustrate the amino-acid-level epistatic perturbations that give rise to a range of Jensen-Shannon divergence values. To avoid noisier measurements artifactually inflating the epistatic shift metric, a given amino acid mutation was only included in the comparison between a pair of RBD variants if its final measurement was averaged across 3 or more individual barcodes in each RBD background. The calculation of epistatic shifts can be found at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS_variants/blob/main/results/summary/epistatic_shifts.md. For epistatic interactions between particular mutations (e.g., Fig. 3A,B,D and Fig.  S5D,E), the expected difference in binding of a multiply mutated genotype compared to wildtype in the absence of epistasis is the sum of the Δlog10(KD) measurements of the component single mutations. Any deviation from this addition is a reflection of epistasis. An observed phenotype that is higher than the expected additive phenotype is considered "positive" epistasis, while an observed phenotype that is lower than the additive expectation is considered "negative" epistasis.

Analysis of SARS-CoV-2 genetic variation
A real-time phylogenetic tree of global SARS-CoV-2 genomes was used to count substitution accrual on internal branches of the SARS-CoV-2 phylogeny. The SARS-CoV-2 mutation-annotated tree described by McBroome et al. (22) was downloaded on May 25, 2022. Nucleotide mutations annotated on the tree were converted to amino acid mutations using matUtils (version 0.4.8) (22), using the Wuhan-Hu-1 genome (NCBI RefSeq NC_045512.2) as a reference. Amino acid substitutions were tabulated, excluding substitutions that occurred to isolated terminal branches. Substitutions were counted as occurring on N501 versus Y501 genomes, and substitutions that occurred coincidental with a 501 substitution were counted as accruing on the derived 501 state from that branch. A pseudocount of 1 was added to all substitution counts to enable log-ratio comparison of mutation accrual on N501 versus Y501 genomes (Fig. 3E). Note that we are counting occurrence of substitutions as individual ancestral events on the SARS-CoV-2 phylogeny, and not mutation frequencies which are conflated by variation in amplification of certain lineages and biases in depth of sampling over space and time.

Pseudovirus generation, titering, and neutralization experiments
A spike-pseudotyped lentiviral platform (37) was used to measure entry efficiency and antibody neutralization of spike variants. Single and double amino acid mutations and the entire suite of mutations found in Omicron BA.1 (8) were introduced into a previously described spikeexpression plasmid containing D614G and a 21-amino-acid deletion in the cytoplasmic tail (Addgene 158762) (37). Plasmids were purified in triplicate for independent viral rescues from separate plasmid stocks.
Media was changed 24 hours post-transfection. Approximately 65 hours post-transfection, viral supernatants were collected, filtered through a 0.45 μm syringe filter, and stored at -80°C. Particle concentration of each viral supernatant was determined in technical duplicate by p24 ELISA (Advanced Bioscience Laboratories Cat. # 5421) versus a known particle standard according to manufacturer instructions.
Antibody neutralization of spike-pseudotyped particles was measured on ACE2-high cell lines in technical duplicate. Cells were seeded in 96-well plates as described above for virus titering. 24 hours later, viral supernatants were diluted to a to a target of 400,000 RLU per well, and incubated for 1 hour at 37°C with monoclonal antibody across 8 four-fold dilution points starting at a concentration 400-fold higher than the expected IC50 (18). 100 μL of antibody-virus mixture was added to cells, and luciferase activity was measured ~65 hours post-infection as described for titering experiments. Fraction infectivity of each sample was calculated relative to a no-antibody well inoculated with the same viral supernatant in a matched row of the 96-well plate. We used neutcurve (https://jbloomlab.github.io/neutcurve, version 0.5.7) to calculate the inhibitory concentration 50% (IC50) of each antibody against each virus by fitting a Hill curve with fixed baseline 0 and plateau 1.

Recombinant protein production
SARS-CoV-2 Beta RBD for crystallization (residues 328-531 of S protein from GenBank NC_045512.2 with N-terminal signal peptide and C-terminal 8xHis-tag) was expressed in Expi293F (Thermo Fisher Scientific) cells in the presence of 10 µM kifunensine at 37°C and 8% CO2. Transfection was performed using the ExpiFectamine 293 Transfection Kit (Thermo Fisher Scientific). Cell culture supernatant was collected four days after transfection and supplemented with 10x PBS to a final concentration of 2.5x PBS (342.5 mM NaCl, 6.75 mM KCl and 29.75 mM phosphates). SARS-CoV-2 Beta RBD was purified using a 5 mL HisTalon Superflow cartridge (Takara Bio) followed by buffer exchange into PBS using a HiPrep 26/10 desalting column (Cytiva).
Human ACE2 for crystallization (residues 19-615 from Uniprot Q9BYF1 with a Cterminal thrombin cleavage site-TwinStrep-10xHis-GGG-tag, and N-terminal signal peptide) was expressed in Expi293F cells in the presence of 10 µM kifunensine at 37°C and 8% CO2. Transfection was performed using the Expi293 transfection kit (Thermo Fisher Scientific). Cell culture supernatant was collected five days after transfection and supplemented to a final concentration of 80 mM Tris-HCl pH 8.0, 100 mM NaCl, and then incubated with BioLock (IBA GmbH) solution. ACE2 was purified using a 1 mL StrepTrap HP column (Cytiva). Proteincontaining fractions were pooled and digested with EndoH and thrombin at 4°C overnight.

MD simulation
Coordinates of the Wuhan-Hu-1 RBD:hACE2 structure were prepared as previously described (45), where the RBD was taken from PDB 6M0J, hACE2 from PDB 1R42, and their complex generated by aligning the proteins to the CST 6M0J structure. Complex glycans were then added to the structure at positions 53, 90, 103, 322, 432, 546, and 690 on hACE2 and 343 on the RBD followed by refinement with ISOLDE (42). These coordinates were then prepared using QuickPrep (MOE v2020.0901, https://www.chemcomp.com). The Omicron RBD:hACE2:S304:S309 structure from PDB 7TN0 and Beta RBD:hACE2:S304:S309 structure determined here were glycosylated as described above and prepared using QuickPrep. The Wuhan-Hu-1+Q498R RBD:hACE2 and Omicron+Y501N RBD:hACE2:S304:S309 structures were generated by mutating the sidechain of Q498 to R and Y501 to N respectively followed by selecting the lowest energy rotamer (in MOE) for the mutated residues; further refinement was performed by excluding these residues from the restraint masks during minimization and equilibration (see below) The coordinates of each complex were parameterized using tleap (46). The following force fields were used: Amber ff14SB for the protein (47), GLYCAM_06j-1 for glycans (48), TIP3P for water (49) and for the neutralizing 0.15 M of NaCl, Joung & Cheatham parameters (50) were used; the Li parameters (51) for Zn and divalent ions.
For each complex, a nine-stage restrained minimization and equilibration protocol was used as previously described (52) using Amber20 (46). Initial minimization involved 10,000 steps and positional restraints applied to all heavy atoms resolved in the crystal structures with a 100 kcal/molÅ 2 force constant. 8 independent simulations were initialized using different initial velocities drawn randomly from the Maxwell-Boltzmann distribution, in an NVT ensemble, gentle heating from 100 K to 300 K over 100 ps, and positional restraints on all heavy atoms with a 100 kcal/molÅ 2 force constant. Next the ensemble was switched to NPT, a constant temperature of 300 K over 100 ps, and positional restraints on all heavy atoms with a 100 kcal/molÅ 2 force constant. Next, a 250 ps stage was run in NPT at 300 K with positional restraints on all heavy atoms with a 10 kcal/molÅ 2 force constant. Next minimization for 10,000 steps was performed with positional restraints on backbone atoms with a 10 kcal/molÅ 2 force constant. This was followed by 100 ps of NPT at 300 K with positional restraints on backbone atoms with a 10 kcal/molÅ 2 force constant. Next 100 ps of NPT at 300 K with positional restraints on backbone atoms with a 1.0 kcal/molÅ 2 force constant. Next 100 ps of NPT at 300 K with positional restraints on backbone atoms with a 0.1 kcal/molÅ 2 force constant. The final stage of equilibration was 2.5 ns of unrestrained MD in NPT at 300 K for each of the 8 independent trajectories for each complex. The 8 pairs of simulations were extended for 700 ns each in NPT at 300 K. Snapshots were saved every 1 ns for post-processing. 5.6 μs of aggregated MD simulations were obtained for Omicron RBD:hACE2:S304:309 and 5.6 μs for Wuhan-Hu-1 RBD:hACE2.
Simulations were post-processed using cpptraj (53). Coordinates were imaged, RMSaligned using Cα atoms to the respective crystal structure coordinates, water and ions stripped and saved to netcdf files. Distances were computed using the atoms of Y ( 33-38.]. Default parameters were used to compute the atomic densities observed over a grid, where the width of gaussian functions centered at each grid point bore widths equal to the atomic radii in each respective residue and then weighted by the atomic mass. The double sum of these gaussians over the grid points and over the course of the simulation were used to generate an isosurface map. The isosurfaces were rendered using an occupancy threshold of 0.5.

Data visualization
Interactive visualizations available at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS_variants/ were built using altair, version 4.2 (60). Sites of strong antibody escape ( Fig. 2A and Fig. S4A) were those with an average normalized total escape of >0.05 across all antibodies in the aggregate tool described by (11) as of November 27, 2021 (https://raw.githubusercontent.com/jbloomlab/SARS2_RBD_Ab_escape_maps/03910f5bb6bc86 ab823e9cf40f34b07b403f26d2/processed_data/escape_data.csv). using dimeric ACE2 (2). Monomeric ligand unmasks affinity-enhancing effects that can be masked by avidity with dimeric ligand, while dimeric ligand exhibits a larger dynamic range over moderate-to-low affinity variants.  (Fig. S1). The wildtype amino acid in each variant is indicated with an "x", and gray squares indicate missing mutations in each library. An interactive version of this map is at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS_variants/RBD-heatmaps/, and raw data are in Data S1.  Fig. 2C. Orange letters are mutations that were sampled with fewer than 3 unique barcodes across titration replicates in the Beta and/or Wuhan-Hu-1 data and were therefore not included in the epistatic shift computation as described in Methods. Scatterplots for all sites can be visualized at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS_variants/epistatic-shifts/.  Fig. 2C and Fig. S3. These scatter plots suggest that at the shifted sites, the Beta RBD is more sensitive to mutation compared to Wuhan-Hu-1, reflected in the concave shape of each plot. This is in contrast to idiosyncratic epistatic shifts in individual mutation's effects on ACE2 binding as seen in e.g. Fig.  2C.  (Fig. 3B). Antibody escape is estimated from a calculator that aggregates >250 antibody deep mutational scanning escape profiles to determine the expected antigenic effect of mutations (11). (C) Cumulative ACE2-binding affinity versus antibody escape of Omicron BA.1 mutations ( Fig. 3B and (B)). Antibody escape calculated as in (B), except mutations were introduced consecutively into the calculator in the order shown, which accounts for possible redundancy in escape mutations at overlapping antibody epitopes. (D) Double mutant cycle diagram illustrating negative sign epistasis between N501Y and G446V. (E) Triple mutant cycle illustrating additivity among the three mutations in the Beta RBD. See also Fig. S4E.  (18). Each point is the mean and standard error from technical duplicates. S2E12 is a negative control antibody, which was not expected to be escaped by the Y449H or N501Y mutations based on prior deep mutational scanning data (18). (E) Fold-change in neutralization of spikepseudotyped lentivirus from the curves in (D). The Y449H/N501Y double mutant shows 2.4x and 6.3x synergistic escape from S2H13 and S2X58, respectively, compared to the multiplicative fold-change of the single mutants.  2C). Site 403 has not previously been implicated in key events in the functional or antigenic evolution of SARSrelated coronaviruses, but mutations in this region have been shown to transmit allosterically to distal regions of the ACE2-binding surface (61,62), which may be further reflected in many epistatic shifts in mutation effects on RBD expression in this region (Fig. S4). For each property, the left hand plot shows the value for each variant RBD versus Wuhan-Hu-1 as a lineplot across RBD sites; sites with the largest displacements in the Beta structures are labeled. Right Hand plot shows relationship across RBD sites between the property and epistatic shift from deep mutational scanning ( Fig. 2A). Calculated properties include the mainchain Cα displacement from X-ray (B) or cryoEM (D) structures, and all-atom average displacement from X-ray (C) or cryoEM (E) structures.  Table S1. Crystallographic data collection and refinement statistics. Data S1. (separate file) Raw data containing the effect of each mutation in each RBD on ACE2-binding affinity and RBD expression. bind (and delta_bind) represent the -log10(KD) of a mutant (and log10(KD) relative to the respective parental sequence). expr (and delta_expr) represent the log(MFI) of a mutation (and log(MFI) relative to the respective parental sequence). Additional table elements give the number of unique experimental replicates and number of internally replicated barcode sequences with which each mutant phenotype was determined.