A pseudovirus system enables deep mutational scanning of the full SARS-CoV-2 spike

Summary A major challenge in understanding SARS-CoV-2 evolution is interpreting the antigenic and functional effects of emerging mutations in the viral spike protein. Here, we describe a deep mutational scanning platform based on non-replicative pseudotyped lentiviruses that directly quantifies how large numbers of spike mutations impact antibody neutralization and pseudovirus infection. We apply this platform to produce libraries of the Omicron BA.1 and Delta spikes. These libraries each contain ∼7,000 distinct amino acid mutations in the context of up to ∼135,000 unique mutation combinations. We use these libraries to map escape mutations from neutralizing antibodies targeting the receptor-binding domain, N-terminal domain, and S2 subunit of spike. Overall, this work establishes a high-throughput and safe approach to measure how ∼105 combinations of mutations affect antibody neutralization and spike-mediated infection. Notably, the platform described here can be extended to the entry proteins of many other viruses.


INTRODUCTION
The spike protein is the key target of neutralizing antibodies against SARS-CoV-2. Unfortunately, spike has undergone rapid evolution, which has eroded the potency of serum neutralization and enabled escape from most monoclonal antibodies. [1][2][3][4] Deep mutational scanning experiments can prospectively measure the effects of large numbers of mutations even before they emerge in viral variants, and therefore, they have been a valuable tool for rapidly interpreting how newly observed mutations in the spike affect antibody binding and protein folding or function. 1,5,6 The high-throughput nature of deep mutational scanning experiments has also enabled the generation of huge datasets that can inform computational methods for predicting the antigenic properties of possible future viral variants. 1,7 However, prior deep mutational scanning of the SARS-CoV-2 spike has been limited to either solely focusing on the receptor-binding domain (RBD), 1,8,9 other subdomains, 10,11 or just a small number of mutations across spike. 12 Furthermore, all previous spike deep mutational scanning experiments have been based on cell-surface display using either yeast 8,13 or mammalian cells 10-12 and therefore are limited to measuring antibody binding rather than neutralization, despite the fact that neutralization is thought to be a more relevant correlate of protection. 14, 15 Here, we describe a deep mutational scanning platform that directly measures how mutations affect cellular infection and antibody neutralization in the context of the full SARS-CoV-2 spike pseudotyped on non-replicative lentiviral particles. The key innovation behind the platform is a two-step pseudovirus generation protocol that enables the creation of large pseudovirus libraries with a link between the lentiviral genotype and the particular spike protein variant on the pseudovirus surface. We demonstrate that this platform can be used to create large genotype-phenotype linked pseudovirus libraries and map how mutations to spike affect both cellular infection and neutralization by antibodies targeting diverse regions of spike, including the RBD, N-terminal domain (NTD), and S2 subunit.

RESULTS
Producing pseudoviruses with genotype-phenotype link To characterize thousands of mutations in spike glycoprotein, we first established a lentiviral pseudotyping platform that maintains a genotype-phenotype link between the lentiviral genome and the spike variant on the virion's surface. Lentiviral spike-pseudotyping usually involves transfection of a backbone that carries a reporter gene flanked by the lentiviral long terminal repeats (LTRs), helper plasmids that code for structural and nonstructural genes required for the lentiviral life cycle, and an expression plasmid that codes for the spike variant of interest ( Figure S1A). [16][17][18] When these components are transfected into producer cells, virions are formed that carry lentiviral genomes and display spikes on their surface. However, because genome incorporation into a virion does not depend on the expressed spike, there is no link between the virion's genotype and the phenotype of the spike on its surface ( Figure S1A). The absence of a genotype-phenotype link is not problematic when only a single spike variant is used for transfection; however, it precludes deep mutational scanning studies that involve studying thousands of variants in a single pooled experiment.
To create a lentiviral genotype-phenotype link, we first generated a lentivirus backbone with the following key elements ( Figure 1A): (1) we restored the ability of the lentivirus to transcribe its full genome after integration by repairing the 3 0 LTR deletion present in traditional lentivirus vectors, 19,20 (2) we placed spike in the lentivirus backbone under an inducible promoter, and (3) we added a second constitutive promoter driving both a fluorescent reporter (ZsGreen) and a puromycin resistance gene.
Next, we developed a multi-step protocol that creates a genotype-phenotype link by ensuring that each producer cell only expresses a single variant of spike ( Figure 1B). In the first step of this protocol, we transfected cells with the spike-encoding backbone, a vesicular stomatitis virus G protein (VSV-G) expression plasmid, and the necessary helper plasmids. This produces non-genotype-phenotype-linked VSV-G-pseudotyped lentivi-ruses that we use to infect target cells at low multiplicity of infection so that most infected cells receive no more than one lentiviral genome. Next, we select for cells with integrated lentiviral genomes using puromycin, which yields a population of cells where each cell stores only a single spike variant. The spike is under an inducible promoter, which is only activated by the addition of doxycycline. To produce virions, we induce spike expression with doxycycline and transfect the helper plasmids necessary to produce lentiviruses. We validated that this approach can be used to generate genotype-phenotype-linked spike-pseudotyped viruses with titers >10 5 transduction units per mL (Figure S1B). We can further increase viral titers by 5to 10-fold by infecting cells in the presence of amphotericin B, as has been reported previously ( Figure S1C). [21][22][23] Design of mutations in SARS-CoV-2 spike deep mutational scanning library Rather than create deep mutational scanning libraries containing all possible amino acid mutants of spike, we chose to introduce only mutations that seem likely to arise during natural evolution and yield a functional spike protein. We had two rationales for designing our libraries in this way: (1) it reduces the total number of mutations that need to be included in the library, and (2) it increases the probability that variants with multiple mutations will remain functional by reducing the fraction of mutations that are highly deleterious.
Specifically, we included only mutations that have been observed in spike sequences deposited on the GISAID database, 24 reasoning that these mutations would represent mostly functional spike proteins. We introduced mutations at a higher frequency when they have emerged in spike independently many times according to the pre-built SARS-CoV-2 phylogenies from UShER. 25 Finally, we included every possible amino acid change at sites in spike that are evolving under positive selection. 26 We also included deletions at sites where such mutations are observed frequently in natural SARS-CoV-2 evolution. In total, our library design targeted 7,004 mutations in the BA.1 spike and 6,852 mutations in the Delta spike.
To introduce these mutations in the spike gene, we used a PCR-based mutagenesis method with a primer pool containing the desired mutations. 27 Importantly, this method introduces multiple mutations in each spike variant: we targeted 2 to 3 codon mutations per variant, ensuring the effects of most mutations are measured in multiple genetic backgrounds. The mutated spike genes were then barcoded with 16 random nucleotides placed downstream of the spike-coding sequence (Figure 1A) and cloned into the lentivirus backbone. As described below, after integration of the libraries into cells, these barcodes can be linked to the full set of mutations in each spike variant to facilitate downstream sequencing. 28,29 Production of pseudotyped BA.1 and Delta spike deep mutational scanning libraries We used the genotype-phenotype linked pseudovirus production strategy in Figure 1B to make BA.1 and Delta deep mutational scanning libraries. We created three independent BA.1 libraries each containing 100,000 barcoded variants and two independent Delta libraries each containing 50,000 barcoded variants (Figures 1E and S2A; a ''barcoded variant'' is a spike with a unique nucleotide barcode and some random mutation set; different barcoded variants usually but not always contain different mutations). After integrating the libraries into cells at low multiplicity of infection, we generated VSV-G-pseudotyped lentivirus from these cells by co-transfecting a plasmid expressing VSV-G alongside the other lentiviral helper plasmids ( Figure 1B, top right). The use of a VSV-G-pseudotyped virus ensures that we generate infectious lentiviral virions from all integrated backbones regardless of whether they encode a functional spike mutant. We then infected this VSV-G-pseudotyped lentivirus into a new round of cells and performed long-read PacBio sequencing to link the barcodes to the full set of spike mutations for each variant. We performed the PacBio barcode-mutation linking after integration into cells because recombination of the pseudodiploid lentiviral genome during integration 30,31 means the barcodemutation pairings may be different in the integrated cells to those in the original lentiviral backbone plasmids. 32 Importantly, linking barcodes to spike variants allows us to use short-read Illumina sequencing of the barcode to obtain the full spike genotype in all subsequent experiments.
Overall, the sequencing revealed that we had successfully introduced 99% of the targeted mutations in the BA.1 and Delta spike libraries ( Figures 1E and S2B), with at least one amino acid mutation at each site in spike for both the BA.1 and Delta libraries. The barcoded variants in the BA.1 libraries had on average 2 codon mutations per spike, whereas the variants in the Delta libraries had 3 codon mutations per spike ( Figures 1C and S2C). The number of mutations per variant is roughly Poisson distributed, so some variants had zero or one mutation, whereas others had many more ( Figure S3).
We then generated the actual spike-pseudotyped deep mutational scanning libraries from the variants stored at a single copy in the cells ( Figure 1B, lower right). We calculated a functional score for each variant based on its relative frequency in the spike-versus VSV-G-pseudotyped libraries. Positive functional scores indicate spike variants mediate pseudovirus infection better than the parental spike, whereas negative functional scores indicate worse pseudovirus infection. As expected, spike variants with premature stop codons had highly negative functional scores, whereas unmutated and synonymously mutated spike variants had functional scores close to zero (Figures 2, S2D, and S2E). Some variants with nonsynonymous mutations had functional scores close to zero, whereas others had more negative scores, reflecting the fact that some but not all nonsynonymous mutations are deleterious (Figures 2 and S2D; recall that our library design protocol preferentially introduced nonsynonymous mutations expected to yield functional spikes). Variants with multiple nonsynonymous mutations tended to have lower functional scores than variants with just one nonsynonymous mutation (Figures 2 and S2D), reflecting the cost of accumulating multiple often mildly deleterious mutations. Variants that only contained mutations designed into the libraries tended to have higher functional scores than variants that also carried other unintended nonsynonymous mutations ( Figure S2E), suggesting that our library design strategy indeed enriched for functionally tolerated mutations.
Use of an absolute standard to measure viral neutralization by deep sequencing Traditional neutralization assays measure the infectivity of a single virus variant at multiple antibody concentrations. Deep sequencing can measure the relative infectivities of many viral variants in pooled infections in the presence of an antibody. However, to convert the relative infectivities measured by deep sequencing into actual neutralization values, it is necessary to have an absolute standard that does not vary in its infectivity as a function of antibody concentration ( Figure 3A). To enable such measurements in our experiments, we added a barcoded VSV-G-pseudotyped virus into our libraries. This virus is produced from a separate cell line that has integrated lentiviral backbones carrying a barcoded mCherry gene (see STAR Methods). Importantly, this VSV-G-pseudotyped virus is not neutralized by any of the spike-binding antibodies ( Figures 3A  and S4A), and so the counts for VSV-G barcodes provide an absolute neutralization standard. To calculate the non-neutralized fraction for each viral variant, we simply compute the change in its barcode frequencies relative to the VSV-G standard ( Figure 3B).
To validate this approach, we added the VSV-G absolute standard at 1% of our BA.1 library titers and incubated the   Figure S2.

OPEN ACCESS
virus library with increasing concentrations of the LY-CoV1404 antibody, as schematized in Figure 3B. We then infected the library into ACE2-expressing target cells overnight, recovered viral genomes, and quantified the abundance of each viral barcode using deep sequencing. As expected, the fraction of VSV-G standard reads increased with antibody concentration because fewer spike variants could still infect in the presence of antibodies ( Figure 3C). We then calculated the nonneutralized fraction for each viral variant in our libraries after selection at different concentrations of the antibody. As expected, increasing antibody concentrations led to decreased non-neutralized fraction averaged over variants ( Figure 3D). Notably, variants with a greater number of substitutions had higher non-neutralized fractions, as expected if some substitutions escape the antibody.
Mapping antibody escape using a full spike deep mutational scanning system To demonstrate that pseudovirus-based deep mutational scanning can map escape from neutralizing antibodies targeting any region of spike, we chose a set of BA.1-neutralizing antibodies that bind distinct regions of spike: RBD-binding LY-CoV1404, NTD-binding 5-7, and S2-binding CC67.105. [33][34][35] Note that LY-CoV1404, also known as bebtelovimab, is one of the few clinically approved antibodies that retains potency against BA.1, BA.2, and other major Omicron lineages. 3,34 We first mapped escape from LY-CoV1404, applying the approach in Figure 3B to our three independent BA.1 libraries, and performing a technical replicate for one library. We used a biophysical model to decompose the measurements for spike variants in our libraries (some of which are multiply mutated)  into escape scores for individual mutations. 36 These mutationescape scores correlated well among technical and biological replicates ( Figure 4A). As expected, the key LY-CoV1404 escape sites were in the antibody's previously described epitope in the RBD, 34 which spans sites 439-452 and 498-501 (Figures 4B-4D and https://dms-vep.github.io/SARS-CoV-2_Omicron_BA. 1_spike_DMS_mAbs/LyCoV-1404_escape_plot.html). However, our deep mutational scanning emphasizes that only some mutations at these sites escape LY-CoV1404 neutralization. For instance, many amino acid mutations at site 446 strongly escape LY-CoV1404, but mutating this site from G (the identity in Wuhan-Hu-1) to S (the identity in BA.1 and BA.2.75) does not have a large effect. This observation emphasizes the somewhat serendipitous nature of the preserved potency of LY-CoV1404. However, this antibody may soon be escaped because sub-variants of BA.5 and BA.2.75 with mutations in the key escape site of K444 are increasingly being detected. 1,37 To validate the LY-CoV1404 deep mutational scanning, we cloned a set of mutations in the BA.1 spike with a range of effects in the deep mutational scanning data and performed standard pseudovirus neutralization assays ( Figure 4E). All the tested mutations exhibited neutralization phenotypes consistent with those measured in the deep mutational scanning. Furthermore, the neutralization assay IC 50 values correlated well with those predicted by our biophysical model 36 parameterized by the deep mutational scanning data ( Figure 4F).
We also compared the full spike LY-CoV1404 deep mutational scanning measurements to results from our previously described yeast-display system for deep mutational scanning of only the RBD. 8,13 The escape scores between the two experimental approaches correlated well ( Figure S5A) and both methods identified the same epitope ( Figure S5B).
To show that we can map escape from non-RBD-targeting antibodies, we next mapped the NTD-targeting 5-7 antibody. 33 This antibody targets an epitope outside the defined antigenic supersite in NTD and is one of the few NTD-targeting antibodies isolated pre-Omicron that still retains some potency against Omicron variants. 2,33,38 The deep mutational scanning showed that the key escape sites for 5-7 were in a hydrophobic pocket next to the N4 loop (site 172-178) (Figures 5A-5C and https:// dms-vep.github.io/SARS-CoV-2_Omicron_BA.1_spike_DMS_ mAbs/NTD_5-7_escape_plot.html), consistent with prior structural characterization of this antibody's epitope. 33 In addition, deletions in 167-171 ß-sheet, as well as mutations at the base of the adjacent loops such as G103 and V126 also escaped antibody 5-7 ( Figures 5A and 5B). We validated these deep mutational scanning results by performing individual neutralization assays with pseudoviruses containing L176K, S172N, and G103F mutations ( Figure 5D), all of which had the expected effect of completely escaping neutralization ( Figure 5E).
We next applied the full spike deep mutational scanning to S2 stem-helix targeting antibodies CC9.104 and CC67.105, which were isolated using both the SARS-CoV-2 and Middle East respiratory syndrome coronavirus (MERS-CoV) spike proteins as baits. 35 Both CC9.104 and CC67.105 broadly neutralize SARS-related coronaviruses, and CC9.104 also retains some potency against MERS-CoV. As expected, our deep mutational scanning showed that escape sites for both antibodies cluster in the S2 stem-helix region (Figures 6A-6E; https://dms-vep. github.io/SARS-CoV-2_Omicron_BA.1_spike_DMS_mAbs/CC67. 105_escape_plot.html and https://dms-vep.github.io/SARS-CoV-2_Omicron_BA.1_spike_DMS_mAbs/CC9.104_escape_plot. html). Our data also explain why only CC9.104 neutralizes MERS-CoV. The deep mutational scanning shows that the CC67.105 epitope centers on sites D1146, D1153, and F1156 ( Figures 6B and 6D), and consistent with the deep mutational scanning, mutating these sites leads to complete escape in validation neutralization assays ( Figure 6G). By contrast, the deep mutational scanning shows that although CC9.104's epitope also includes sites D1153 and F1156, mutations at site D1146 cause only modest or no escape ( Figures 6C and  6E), and validation neutralization assays again confirm these deep mutational scanning results ( Figure 6G). Notably, sites D1153 and F1156 are conserved between SARS-CoV-2 and MERS-CoV S2 stem-helix regions, but site D1146 is mutated to isoleucine in MERS-CoV ( Figure 6F). Based on our deep mutational scanning data we expect changes at site D1146 to have significant effects on escape from the CC67.105 antibody but only modest effects on escape from the CC9.104 antibody. Although our deep mutational scanning libraries do not include isoleucine at site D1146 (which is found in MERS-CoV), we confirmed that other mutations at D1146 lead to complete escape from CC67.105 and do not substantially impact neutralization by CC9.104 ( Figure 6G). Note that site D1163 is also mutated to isoleucine in MERS-CoV and both antibodies show some escape at that site, which may explain why CC9.104's potency against MERS-CoV is lower than against SARS-CoV-2.
The above deep mutational scanning of escape from the S2 antibodies emphasizes the difference between SARS-related coronavirus breadth and resistance to escape in SARS-CoV-2. Both CC9.104 and CC67.105 neutralize many diverse SARSrelated coronaviruses, but Omicron sub-variants with mutations that lead to almost complete escape from these antibodies have been detected (e.g., D1153Y in BA.2.46 and BA.2.59). Therefore, even pan-sarbecovirus neutralizing antibodies can be escaped by mutational diversity within SARS-CoV-2, which emphasizes the importance of directly mapping escape mutations in SARS-CoV-2 in addition to assessing breadth across other natural SARS-related coronaviruses.
To show that we can perform deep mutational scanning of the spikes from different SARS-CoV-2 strains, we mapped escape from the REGN10933 antibody using Delta spike deep mutational scanning libraries ( Figures S6A and S6B). REGN10933 is a class 1 antibody that directly competes with ACE2 binding and was part of the REGN-COV2 therapeutic cocktail used early in the pandemic but has lost potency against Omicron variants. 2,39,40 Escape sites for REGN10933 mapped with our deep mutational scanning system overlapped with the antibody binding footprint and included previously described escape mutations ( Figure S6C) 39,40 and correlated well with previously described yeast-based RBD deep mutational scanning data for this antibody (Figures S5C and S5D). 5 Finally, to show that pseudovirus-based deep mutational scanning can be used to map escape from polyclonal sera, we mapped escape from two Delta breakthrough sera using our     Delta libraries ( Figure S7). Major escape sites for both sera resided in RBD class I and II epitopes (sites 417, 484, and 486) ( Figure S7) as has been shown previously 7 and correlated well with changes in IC50 measured in conventional neutralization assays.

Functional effects of mutations on spike-mediated pseudovirus infection
Our deep mutational scanning also enables measurement of how mutations affect spike-mediated viral infection in the absence of antibodies. We can make these measurements by computing a functional score for each variant from its relative frequency in infectious spike-pseudotyped lentiviruses generated from our single-copy cell integrated cells versus VSV-G-pseudotyped lentivirus generated from the same cells ( Figure 1B). Spike variants with negative functional scores are worse at mediating cellular infection than the parental unmutated spike, whereas variants with positive functional scores are better at mediating infection ( Figure 2). To deconvolve the functional scores for the variants (which often contain multiple mutations) into the effects of individual mutations on spike-mediated entry, we used global epistasis models. 41,42 As expected stop codon mutations to the BA.1 spike were highly deleterious for spike-mediated infection, whereas amino acid mutations showed a wide range of effects ranging from slightly beneficial to roughly neutral to highly deleterious (Figure 7A; recall that our library design excludes many of the most deleterious amino acid mutations). To test whether the mutations measured to have slightly beneficial effects actually improved spike-mediated infection, we chose five mutations that the deep mutational scanning indicated improved infection ( Figure 7B) and generated pseudovirus mutants carrying these mutations. The validation experiments confirmed that all the tested mutations indeed slightly improved spike-mediated infection ( Figure 7C), validating that our deep mutational scanning can identify mutations that increase spike-mediated pseudovirus infection.
To examine the relationship between the functional effects of spike mutations in the deep mutational scanning and the actual evolution of human SARS-CoV-2, we determined the extent that mutations are enriched or depleted across a phylogenetic tree of all publicly available human SARS-CoV-2 sequences. 25 To do this, we calculated the number of independent observations of each mutation on the tree and compared these observed numbers to the expected numbers under neutrality as estimated from 4-fold synonymous sites, 43 analyzing only mutations expected to have R20 occurrences (see STAR Methods for details). Our deep mutational scanning measurements of the effects of mutations on spike-mediated infection were reasonably correlated with the enrichment of mutations among actual sequences ( Figure 7D), indicating our experiments at least partially reflect the functional selection actually shaping spike evolution. We performed a similar analysis for prior spike deep mutational scanning using yeast display of the RBD, 13 or mammalian cell display of the NTD 10 or a region of S2 11 ( Figure 7D). Our pseudovirus-based spike deep mutational scanning measurements were more correlated with the enrichment of mutations during actual evolution than any of these prior cell-surface display deep mutational scanning studies, presumably because our experiments mimic the true biological function of spike better than cell-surface display experiments.
However, none of the mutations with positive deep mutational scanning functional scores that we validated to improve spikemediated infection in a pseudovirus context are enriched during actual SARS-CoV-2 evolution ( Figure 7D). We suggest that this is because there is some divergence between the selection pressure in our pseudovirus-based experiments and true natural selection on spike. For instance, mutations at sites P1140 and P1143, which are located at the beginning of the S2 stem-helix, could potentially destabilize the prefusion trimer leading to more rapid cell entry in a pseudovirus context but negatively affect spike stability in the context of actual human transmission. Nonetheless, our functional measurements still provide the most accurate large-scale measurements to date on the effects of mutations to spike and should be useful for assessing which antibody-escape mutations are well enough tolerated to pose a plausible risk of emerging naturally. We also note that our experiments indicate that there are no further mutations to the BA.1 spike that improve pseudovirus titers to the same extent as the D614G mutation that fixed early in SARS-CoV-2's evolution in humans. 44-46

DISCUSSION
We have developed a deep mutational scanning system for assessing the antigenic and functional effects of mutations in the SARS-CoV-2 spike. This deep mutational scanning system is the first to measure how mutations to the entirety of spike affect   cellular infection and therefore enables the mapping of escape from antibodies targeting any part of the spike. Furthermore, our system directly quantifies how mutations affect antibody neutralization, and we show that these measurements correlate well with traditional pseudovirus neutralization assays. We expect that the ability to directly measure neutralization as opposed to binding will be especially useful when applied to polyclonal sera since the magnitude of how mutations affect neutralization versus binding can differ in a polyclonal context. 1,47 Using the deep mutational scanning system, we have mapped mutations that escape monoclonal antibodies targeting the RBD, NTD, and S2 domains. Although we only characterized a few antibodies here, this work is a first step toward generating similar data for much larger antibody panels. For the RBD, such data has proven valuable for interpreting the antigenic phenotypes of the new SARS-CoV-2 variants constantly emerging in the human population. 1,7 The ability to generate comparable data for other spike domains should enable better antigenic assessment of new variants, which can help inform vaccine design and strain selection. 48 In addition, our work on the S2 antibodies emphasizes the distinction between breadth on natural viruses and potential for escape within SARS-CoV-2: an antibody that crossneutralizes MERS-CoV and all tested SARS-related viruses is still escaped by mutations already present in some SARS-CoV-2 variants. Therefore, direct prospective mapping of escape mutations 5,6 should be useful for informing the development of therapeutic antibodies targeting even regions of spike conserved across most natural viruses.
We also used the deep mutational scanning system to measure how mutations in spike affect its ability to mediate pseudovirus infection. These measurements complement existing deep mutational scanning datasets on how spike mutations affect the molecular phenotypes of ACE2 affinity, 13 membrane fusion, 11 and cell-surface expression. 10,11,13 None of these experimental measurements fully reflect how mutations affect actual viral fitness, which is an emergent property of many molecular phenotypes. 49,50 However, understanding how mutations affect molecular phenotypes is an important step toward interpreting evolutionary outcomes, and our pseudovirus-based system currently provides the deep mutational scanning measurements that best reflect natural selection on the spike of human SARS-CoV-2.
The deep mutational scanning system we describe here can also be straightforwardly extended to any virus with an entry protein amenable to lentiviral pseudotyping. This set of viruses includes other coronaviruses, influenza viruses, filoviruses, arenaviruses, and henipaviruses-all of which have receptor-binding and fusion proteins for which lentiviral pseudotyping provides a safe way to study cellular infection and antibody neutralization without requiring direct work with the actual pathogenic virus. 51-55 Deep mutational scanning of the entry proteins of all these viruses could provide valuable information for antigenic surveillance and vaccine design since these proteins are the dominant target of neutralizing antibodies. However, we note that data generated by such a system could in principle be used to inform the introduction of gain-of-function mutations into actual potential pandemic viral pathogens. We therefore suggest that advances in the high-throughput characterization of mutations to viral proteins should be coupled with thoughtful limits on any downstream experiments with actual replicating viruses 56 to ensure that safely generated information is used to benefit public health without creating new risks.

Limitations of the study
The full spike lentiviral-based deep mutational scanning system described in this paper has both limitations and advantages compared with other systems to identify SARS-CoV-2 antibody-escape mutations. Yeast-display-based RBD deep mutational scanning 1,8,13 is probably more high-throughput (e.g., Cao et al. 1 have applied it to thousands of antibodies) but is limited to only RBD antibodies and measures binding rather than neutralization. The passaging of spike-pseudotyped VSV in the presence of antibodies can effectively identify both individual escape mutations and combinations of mutations, and better simulates natural evolution, 39,57 but does not provide comprehensive maps because the mutations that arise in any given experiment are stochastic. The passaging of live SARS-CoV-2 has similar benefits and limitations to the spike-pseudotyped VSV 58 but also involves biosafety concerns. The system we describe here is comprehensive and applies to the full spike, but will only identify escape mutations introduced into the original library design.
An important caveat is that our system measures the effects of spike mutations on viral entry in a pseudotyped system in cell culture. Although this system captures many of the actual constraints on spike, it does not capture all of them as evidenced by the fact that some of the mutations we identify that increase viral entry in our system do not appear to be favorably selected during the actual evolution of human SARS-CoV-2. Therefore, it is important to keep in mind that the measurements that are generated by our system come from a simplified experimental surrogate for the full role of spike during infection and transmission in humans.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

INCLUSION AND DIVERSITY
We support inclusive, diverse, and equitable conduct of research.

Design of lentiviral backbone and spike gene nucleotide sequence optimization
The structure of lentiviral backbone is shown in Figure 1A. spike-containing backbone is at https://github.com/dms-vep/SARS-CoV-2_Delta_spike_DMS_REGN10933/blob/main/library_ design/reference_sequences/pH2rU3_ForInd_sinobiological_617.2_Spiked21_CMV_ZsGT2APurR.gb. Note our BA.1 spike includes the EPE insertion after position 214 (Wuhan-Hu-1 numbering). The vector is based on a pHAGE2 lentiviral backbone in which we repaired the 3' LTR sequence, 19 which allows us to re-rescue the pseudovirus from the cells in which lentiviral backbones have been integrated. The lentiviral backbone is non-replicative unless helper plasmids (Gag/Pol (NR-52517), Tat1b (NR-52518), and Rev1b (NR-52519)) are also transfected into the cells containing this backbone. Expression of the spike gene in the lentivirus backbone is driven both by inducible TRE3G promoter and by Tat1b. TRE3G promoter is activated by addition of doxycycline in the presence of the reverse tetracycline transactivator (rtTA), which is endogenously expressed in HEK-293T-rtTA cells. The spike gene has been codon optimized and lacks 21 amino acids in its cytoplasmic tail. The cytoplasmic tail deletion has been previously shown to significantly increase pseudovirus titers. 60,61 For spike sequence codon optimization we tested a large panel of optimized sequences and found that virus titers can vary between codon optimizations by as much as 100-fold. While we did observe some titer differences when virus was generated from just transfection reactions using lentiviral backbones and helper plasmids, these differences were even more noticeable when virus was generated from cells with integrated spike-carrying lentiviral backbones. Of the tested codon optimizations we found that the sequence optimized spike from SinoBiological (VG40609-UT) gave by far the best virus titers; we therefore based all variant sequences on the original SinoBiological optimization. In addition to the inducible promoter and spike gene, the backbone also has a CMV promoter that drives expression of the ZsGreen gene linked by a T2A linker to the puromycin resistance gene. ZsGreen is used as a reporter gene to detect pseudovirus infection and the puromycin resistance gene is used as a selection marker for cells with successfully integrated lentiviral backbones.

Design of spike mutations to include in BA.1 and Delta full spike deep mutational scanning libraries
We aimed to create a library with mutations that would result in mostly functional spike proteins and include the mutations likely to be important for the antigenic evolution of spike. To this end, for BA.1 and Delta deep mutational scanning libraries we included the following types mutations: (1) mutations (nonsynonymous changes and deletions) observed in spike sequences deposited on GISAID database, (2) mutations that reoccur in spike phylogeny independently multiple times, (3) all possible amino acid changes at sites in spike that show positive selection. Specifically for the BA.1 library, we also included all possible amino acid changes for sites that are mutated in the BA.1 spike relative to Wuhan-Hu-1.
The following criteria were used to select the above described mutations for BA.1 library: nonsynonymous mutations need to be present in GISAID database >16 times, deletions need to occur in the NTD and be observed on GISAID database >300 times, nonsynonymous mutations need to reoccur on spike phylogenetic tree independently at least 21 times. To get all spike mutations observed in GISAID deposited sequences we used a CoVsurver curated spike amino acid frequency table (with sequences deposited up to January-31-2022). 24 To get independently recurring spike mutation counts we used pre-built SARS-CoV-2 phylogenies from UShER. 25 Information on sites in spike undergoing positive selection was taken from taken from table here https://raw. githubusercontent.com/spond/SARS-CoV-2-variation/master/windowed-sites-fel-2021-07.csv which was built using methods described in Maher et al. 26 The full list of mutations included in the BA.1 library can be found at https://github.com/dms-vep/SARS-CoV-2_Omicron_BA.1_spike_DMS_mAbs/blob/main/library_design/results/aggregated_ mutations.csv.
The following criteria were used to select the above described mutations for the Delta library: nonsynonymous mutations and deletions need to be observed on GISAID database more than once and nonsynonymous mutations need to reoccur on spike phylogenetic tree independently more than 7 times. To get all spike mutations observed in GISAID deposited sequences we aligned all spike sequences deposited on GISAID up to July-26-2021 and extracted mutation frequency counts. Independently recurring spike mutations and positively selected sites were identified as described for BA.1 library above. The full list of mutations included in the Delta library can be found at https://github.com/dms-vep/SARS-CoV-2_Delta_spike_DMS_REGN10933/blob/main/library_design/ results/aggregated_mutations.csv.

Design of primers for BA.1 and Delta spike mutagenesis
For each set of mutations described in the section above we designed separate primer pools: (1) a pool of primers for observed mutations, (2) a pool of primers for recurrent mutations, (3) a pool of primers for positive selection site mutations, and (4) a pool of primers that would cover changes at multiple positive selection sites if those positive selection sites are close enough to each other so that the primers in the pool (3) would overlap. For the BA.1 library we also designed primers that would introduce multiple amino acid deletions at recurrent deletion regions described in McCarthy et al. 62 and included them in the observed mutation primer pool. Also for the BA.1 library, the set of primers that cover all possible amino acid changes at the sites already mutated in BA.1 was pooled with the positive selection site primer pool.
All primer pools were ordered from Integrated DNA Technologies as oPools. Scripts for designing the BA.1 library primer pools and the resulting oPools that were ordered are at https://github.com/dms-vep/SARS-CoV-2_Omicron_BA.1_spike_DMS_mAbs/tree/main/library_design and scripts for designing the Delta library primer pools are at https://github.com/dms-vep/SARS-CoV-2_Delta_spike_DMS_REGN10933/tree/main/library_ design.

Design of full-spike deep mutational scanning plasmid libraries
Making of the plasmid libraries for deep mutational scanning required the following three steps (1) mutagenesis of the spike gene, (2) barcoding of the mutagenised spike sequence, and (3) cloning of the mutagenised and barcoded spike into the lentiviral backbonecarrying plasmid.
Spike mutagenesis was carried out by first amplifying BA.1 or Delta spike gene sequence from a plasmid carrying lentiviral backbone with a codon optimized spike sequence (see section 'Design of lentiviral backbone and spike gene nucleotide sequence optimization' for plasmid maps). Spike sequence was amplified using 'Spike amplification' primers from Table S1 with the following PCR conditions: 1.5 ml of 10mM forward primer, 1.5 ml of 10 mM reverse primer, 10 ng of amplified spike gene template, 25 ml of KOD polymerase (KOD Hot Start Master Mix, Sigma-Aldrich, Cat. No. 71842), and water for the final volume of 50 ml. PCR cycling conditions were as follows: 1. 95 C for 2 min 2. 95 C for 20 s 3. 62 C for 15 s 4. 70 C for 2 min (return to step 2 for another 19x cycles) 5. Hold at 4ºC Amplified spike sequence was first gel-purified using NucleoSpin Gel and PCR Clean-up kit (Takara, Cat. No. 740609.5) and then further purified using Ampure XP beads (Beckman Coulter, Cat. No. A63881) at 1:2.6 sample to bead ratio.
Next the purified spike template was used in mutagenesis PCR using protocol described previously in Bloom 27 with a few modifications (see also https://github.com/jbloomlab/CodonTilingPrimers for general background on this protocol). Forward and reverse primers for mutagenesis PCR were pooled into separate pools at 1 : 2 : 2 : 0.2 per primer molar ratio between observed primer pool : recurrent primer pool : positive selection primer pool : paired positive selection primer pool. The pooling ratios are determined by the fact that recurrent and positively selected sites may be more antigenically and structurally important for spike. Two independent mutagenesis reactions were performed for each spike creating two independent biological library replicates (which means that they will have a unique set of barcodes and a unique set of mutation combinations in spike). For BA.1 libraries we performed two rounds of mutagenesis with the first round consisting of 8 mutagenic PCR cycles followed by the second round of 10 mutagenic PCR cycles. For Delta libraries one biological replicate consisted of a single round of 10 mutagenic PCR cycles and the second biological replicate consisted one round of 8 and another round of 10 mutagenic PCR cycles. Each mutagenesis PCR reaction was divided into forward and reverse reactions based on whether the reaction used the forward or reverse pool of mutagenesis primers. The PCR conditions for mutagenesis were as follows: 1.5 ml of 5 mM of forward or reverse mutagenesis primer pool, 1.5 ml of 5 mM of forward or reverse 'Spike amplification and joining PCR' primer (see Table S1), 1.5 ml of 3 ng/ml linearised spike template, 4 ml of water, and 15 ml of KOD. The PCR cycling conditions were as follows: 1. 95 C, 2 min 2. 95 C, 20 s 3. 70 C, 1 s 4. 54 C, 20 s, cooling at 0.5ºC/s 5. 70 C, 100 s (return to step 2 for the number of cycles described above) 6. 4ºC, hold Between each mutagenic PCR round 20 cycles of joining PCR were performed. For joining PCR we used 4 ml each of 1:4 diluted forward and reverse mutagenesis reactions, 1.5 mL each of forward and reverse joining primers (see 'Spike amplification and joining PCR' primers in Table S1), 4 ml of water, and 15 ml of KOD. PCR cycling conditions were identical to the mutagenesis PCR conditions. Joined PCR mutagenesis products were gel and Ampure XP purified after each joining reaction.
After the spike sequence was mutagenised we performed a barcoding PCR that appended a random 16 nucleotide barcode sequence downstream of the spike gene stop codon. We chose 16 nucleotide barcodes as it allows for a total of 4 16 unique barcoded variants, which is much greater diversity of barcodes than the final size of our deep mutational scanning plasmid libraries and therefore limits potential barcodes duplications. For barcoding 'Spike barcoding' primers from Table S1 were used with the following PCR conditions 1.5 ml of 10mM forward primer, 1.5 ml of 10 mM reverse primer, 30 ng of the mutagenised spike gene template, 25 ml of KOD polymerase, and water for final volume of 50 ml. PCR cycling conditions were as follows: 1. 95ºC, 2 min 2. 95 C, 20 s 3. 70 C, 1 s 4. 55.5 C, 20 s, cooling at 0.5ºC/s 5. 70 C, 2 min (return to step 2 for another 9x cycles) 6. 4 C hold The mutagenised and barcoded spike was then cloned into lentiviral backbone-containing plasmid. First, we digested a lentiviral backbone containing plasmid using MluI and XbaI restriction sites. The map of plasmid used for vector digestion is at https://github. com/dms-vep/SARS-CoV-2_Omicron_BA.1_spike_DMS_mAbs/blob/main/library_design/reference_sequences/other_plasmid_ maps_for_library_design/3137_pH2rU3_ForInd_mCherry_CMV_ZsGT2APurR.gb. Digested vector was gel and Ampure XP purified. We then used 1:3 insert to vector ratio in a 1 hour Hifi assembly reaction using NEBuilder HiFi DNA Assembly kit (NEB, Cat. No. E2621). After the HiFi assembly, we Ampure XP purified the reaction and eluted it in 20 ml of water (note that elution in water as opposed to elution buffer enhances the subsequent electroporation efficiency). We used 1 ml of the purified HiFi product to transform 20 ml of 10-beta electrocompetent E. coli cells (NEB, C3020K). We performed 10 electroporation reactions to get a final count of > 2 million CFUs per library and plated transformed cells out on LB+ampicillin plates. We aim to make plasmid library from a much greater number of CFUs than the number of variants in our final virus libraries to minimize barcode duplication, as explained in the next section. About 16 hours after transformation bacterial colonies were scraped using liquid LB+ampicillin and plasmid stocks were prepared using QIAGEN HiSpeed Plasmid Maxi Kit (Cat. No. 12662). The final structure of the lentiviral genome with mutagenised spike cloned into it is shown in Figure 1A.  Figure 1B.
To generate VSV-G-pseudotyped virus for each library we plated 0.5 million HEK-293T cells per well in eight wells of two 6-well tissue culture dishes. Note we aim to produce VSV-G-pseudotyped virus stocks that have a greater number of infectious particles than the number of colonies scraped for plasmid libraries in order to not introduce any bottleneck on barcodes at this stage. The next day we transfected 0.25 mg of each helper plasmid (Gag/Pol, Tat1b, and Rev1b), 0.25 mg of VSV-G expression plasmid (https:// github.com/jbloomlab/SARS-CoV-2-BA.1_Spike_DMS_validations/blob/main/plasmid_maps/29_HDM_VSV_G.gb) and 1 mg of mutagenised and barcoded spike containing lentiviral vector (described in the section above). Transfections were done using BioT reagent (Bioland Scientific, Cat. No. B01-02) according to the manufacturer's instructions. 48 hours post transfection supernatants from each well were pooled, filtered through a surfactant-free cellulose acetate 0.45 mm syringe filter (Corning, Cat. No. 431220), and stored at -80ºC. VSV-G-pseudotyped viruses were titrated as described in Crawford et al. 16 Next we infected HEK-293T-rtTA cells with the generated VSV-G-pseudotyped virus. The number of infectious virus units used in these infections allowed us to bottleneck the library size at the desired final variant number. For BA.1 libraries we attempted to bottleneck libraries at 100,000 variants and for Delta libraries we bottlenecked the libraries at 50,000 variants. Notably, we used a substantially lower number of variants to infect cells compared to the possible diversity of variants in our plasmid libraries. This allows us to limit any potential duplication of barcodes between different variants due to recombination in the lentivirus genome. 30-32 Note for BA.1, libraries Lib-1 and Lib-2 originate from the same mutagenised lentiviral backbone plasmid stock but independent VSV-G virus infections and Lib-3 originates from independent mutagenised plasmid library stock. For Delta libraries Lib-1 and Lib-2 are both from independent mutagenised spike plasmid stocks. Infections were performed at MOI < 0.01 (in order to ensure that only a single spike variant is integrated in each cell), which was verified 48 hours after infection using fluorescence-activated cell sorting by detecting ZsGreen expression from the lentiviral backbone. After MOI was verified, we expanded cells for another 48 hours and then started puromycin selections to select for cells with successfully integrated lentivirus genomes. Selection was done using 0.75 mg/ml of puromycin with a fresh change of puromycin-containing D10 (see 'Cell lines' section below) every 48 hours. Selections were terminated when visual inspection using a fluorescent microscope indicated that all cells express ZsGreen (approximately 6-8 days). After puromycin selection was finished we expanded cells for another 48 hours in fresh D10 and froze cell aliquots in tetracycline-free FBS (Gemini Bio, Cat. No. 100-800) containing 10% DMSO. Frozen cell aliquots were stored in liquid nitrogen long-term.

Generation of spike and VSV-G-pseudotyped viruses from cell-stored spike deep mutational scanning libraries
To generate spike-pseudotyped viruses from cell-stored deep mutational scanning libraries we plated 100 million library-containing cells per 5-layer flask (Corning Falcon 875cm 2 Rectangular Straight Neck Cell Culture Multi-Flask, Cat. No. 353144) in 150 ml of D10 without phenol red supplemented with 1 mg/ml for doxycycline (which allows to induce spike expression ahead of pseudovirus production). 24 hours after plating we transfected cells with 50 mg of each helper plasmid (Gag/Pol, Tat1b, Rev1b) using BioT reagent according to manufacturer's instructions. 48 hours post transfection cell supernatant was collected and filtered through a 0.45 mm SFCA Nalgene 500mL Rapid-Flow filter unit (Cat. No. 09-740-44B). Filtered supernatant was then concentrated by spinning at 4ºC 3000 rcf for 30 min using Pierce Protein Concentrator (ThermoFisher, 88537). Virus aliquots were stored long-term at -80ºC. Titers for spike-pseudotyped libraries titrated on HEK-293T-ACE2 cells ranged between 0.5x10 6 -2x10 6 transcription units per ml after concentrating the virus by 15-20 fold. We note that the library virus needs to be concentrated because the library virus titers are significantly lower than the wild-type single integrant titers (shown in Figure S1) due to the presence of deleterious mutations in many of the library variants.
To generate VSV-G-pseudotyped viruses (for functional selection and long-read PacBio sequencing) from cell-stored deep mutational scanning libraries we plated 60 million library-containing cells per 3-layer flask (Corning Falcon 525cm 2 Rectangular Straight Neck Cell Culture Multi-Flask, 353143) in 90 ml of D10 without phenol red (note we do not add doxycycline in this case). 24 hours after ll OPEN ACCESS Cell 186, 1263-1278.e1-e12, March 16, 2023 e5 plating we transfected cells with 30 mg of each of the helper plasmid (Gag/Pol, Tat1b, Rev1b) and 18.75 mg of VSV-G expression plasmid using BioT reagent according to manufacturer's instructions. 32-36 hours post transfection cell culture supernatant was collected and filtered through a 0.45 mm SFCA Nalgene filter unit. Filtered supernatant was then concentrated by spinning at 4ºC 3000 rcf for 30 min using Pierce Protein Concentrator. Virus aliquots were stored long-term at -80ºC. Titers for VSV-G-pseudotyped libraries titrated on HEK-293T-ACE2 cells ranged between 10x10 6 -30x10 6 transcription units per ml after concentrating the virus by 15 fold.
Long-read PacBio sequencing of barcoded spike variants in deep mutational scanning libraries Long-read PacBio sequencing was used to acquire reads spanning the spike and the random 16 nucleotide barcode sequences. To prepare amplicons for PacBio sequencing we infected 1 million HEK-293T cells with 30 million VSV-G-pseudotyped lentiviruses carrying the deep mutational scanning libraries. This number of viruses is significantly greater than the expected number of variants in the library, which allows us to achieve high variant coverage, avoid bottleneck of barcode diversity and correct for any potential PCR or sequencing errors. 12-15 hours after infection cells were trypsinized, washed with PBS and non-integrated lentiviral genomes were recovered using QIAprep Spin Miniprep Kit (Cat. No. 27106X4). 63, 64 We use non-integrated viral genomes as our sequencing templates because they are the more abundant forms of the lentiviral genome than the integrated proviruses. 65-68 Elution volume for the miniprep was adjusted to 144 ml. Next we performed two rounds of PCR to amplify the region in the lentivirus genome spanning the spike and the random 16 nucleotide barcode. In the first round of PCR we use primers containing single nucleotide tags, which allow us to later detect strand exchange that may occur during PCR amplification. To limit strand exchange during PCR (which would disrupt barcode/spike variant linkage) we also minimize the number of PCR cycles performed and do multiple PCR reactions per sample. 69,70 Each sample was split into eight PCR reactions, four of which use 'tag_1' forward and reverse primers and four of which use 'tag_2' forward and reverse primers from the 'Spike gene amplification for PacBio long-read sequencing' primer set in Table S1. PCR reaction conditions were as follows: 1 ml of forward primer, 1 ml of reverse primer, 20 ml of KOD, and 18 ml of sample. PCR cycling conditions for round 1 PCR were as follows: 1. 95 C for 2 min 2. 95 C for 20 s 3. 70 C for 1 s 4. 60 C for 10 s (ramp 0.5ºC/s) 5. 70 C for 2.5 min (go to 2 for another 7 cycles) 6. 70 C for 5 min 7. 4 C hold After the first PCR round we pooled all reactions for each sample and purified them using Ampure XP beads with 1:0.8 beads to sample ratio and the PCR product was eluted in 84 ml of elution buffer. Eluted PCR product was divided into four PCR tubes and the second round of PCR was performed using 'RND2' forward and reverse primers from the 'Spike gene amplification for PacBio longread sequencing' primer set in the Table S1. PCR reaction conditions were as follows: 2 ml of forward primer, 2 ml of reverse primer, 25 ml of KOD, and 21 ml of purified sample. PCR cycling conditions were the same as for the round 1 PCR for a total of 10 PCR cycles. PCR reactions for each sample were pooled, purified using Ampure XP beads with 1:0.8 beads to sample ratio, and eluted in 27 ml of elution buffer. Barcodes were attached to each sample using sample SMRTbell prep kit 3.0 before multiplexing. Multiplexed SMRTbell libraries were then bound to polymerase using Sequel II Binding Kit 3.2 and sequenced with PacBio Sequel IIe sequencer with a 20-hour movie collection time.
Antibody and sera escape mapping using full spike deep mutational scanning libraries For antibody and sera escape mapping we used between 4-15 times more infectious virions than the estimated total number of barcodes in a deep mutational scanning library. Using significantly more infectious virions relative to the number of variants per library avoids bottlenecking by having multiple copies of each variant. Note that we expect there to be several fold more lentiviral genomes per selection experiment than the amount of infectious units used because we are recovering the non-integrated viral genomes for sequencing, which are more abundant than integrated proviral DNA 65-68 on which our library virus titers are based. For each antibody escape mapping experiment we made a master mix of library spike-pseudotyped virus mixed with VSV-G pseudotyped neutralization standard (described below). Neutralization standard was added at 1-2% of the total virus titer used in the experiment. Virus master mix was then aliquoted into eppendorf tubes to which either different mounts of antibody or no antibody was added. For LY-CoV1404, CC9.104 and CC67.105 antibodies we performed selection experiments at 3 concentrations, starting with IC 99 concentration predetermined using standard pseudovirus neutralization assay and then increasing this concentration 4 fold and 16 fold. We start with IC 99 concentration intending that around 1% of the library will be able to escape antibody selection. We use additional concentrations as it helps us to cover a greater dynamic concentration range in cases where the exact IC 99 value is difficult to determine. Also, the use of multiple concentrations enables more precise mutation-escape predictions by the biophysical model used to decompose single-mutation effects. 36 For LY-CoV1404 starting concentration was 0.654 mg/ml, for CC9.104 -68 mg/ml, for CC67.105 -52.5 mg/ml. For the REGN10933 we started at IC 99.5 at 0.146 mg/ml and also increased that concentration by 4 fold and 16 fold. For the NTD 5-7 antibody, which does not fully neutralize the virus, we started with >IC 96 concentration at 150 mg/ml and then increased that concentration by 2 fold. For Delta breakthrough sera we started at 0.1164 and 0.00352 sera dilutions for 267C and 279C samples, respectively. For 267C we increased serum concentration by 2.5 fold and for 279C we increased serum concentration by 4 and 8 fold. Virus was mixed with the antibody or serum by inverting tubes several times, spun down at 300 g, and incubated at 37ºC for 1 h. After incubation virus and antibody mix or no antibody control were used to infect approximately 0.5 million target cells, which were plated a day before in D10 supplemented with 2.5 mg/ml of amphotericin B (Sigma, Cat. No. A2942) (which increases viral titers as shown in Figure S1C). The target cell line for different antibodies is determined by whether an antibody is able to neutralize pseudovirus on that cell line. We have previously described this phenomena in Farrell et al. 59 where we show that non-ACE2 competing antibodies do not fully neutralize pseudovirus on ACE2 overexpressing cells. While testing antibodies for the current study we also noticed that some S2-targeting antibodies are also not affected by ACE2 overexpression. Therefore, for LY-CoV1404, CC9.104 and CC67.105 antibodies we used HEK-293T-ACE2 as target cells but for NTD-targeting 5-7 antibody we used HEK-293T-ACE2-medium cells. For REGN10933 and Delta breakthrough sera we used HEK-293T-ACE2-TMPRSS2 as target cells, because TMPRSS2 overexpression increases Delta pseudovirus titers. 12-15 hours after infection cells were trypsinized, washed with PBS and non-integrated lentiviral genomes were recovered using QIAprep Spin Miniprep Kit and eluted in 21 ml of Qiagen elution buffer. Barcode reads for each sample were then prepared for Illumina sequencing using a method described in 'Barcode amplicon preparation for Illumina sequencing' section below.

Functional selections using full spike deep mutational scanning libraries
To perform functional spike selections we infected 1 million HEK-293T-ACE2 cells with 1-2 millions of the spike or VSV-G-pseudotyped viruses produced from deep mutational scanning library carrying cells (described earlier). As for antibody selections, the amount of virus used is greater than the number of variants in each library which limits potential bottlenecking of the library barcodes. 12-15 hours after infection cells were trypsinized, washed with PBS and non-integrated lentiviral genomes were recovered using QIAprep Spin Miniprep Kit. Barcode reads for each sample were then prepared for Illumina sequencing using methods described in 'Barcode amplicon preparation for Illumina sequencing' section below.

Barcode amplicon preparation for Illumina sequencing
To prepare barcode reads for Illumina sequencing we performed two rounds of PCR. In the first round of PCR we used primers that align to Illumina Truseq Read 1 primer site located directly upstream of the barcode in the lentiviral backbone and a primer annealing downstream of the barcode containing an overhand with Illumina Truseq Read 2 sequence (see 'Illumina barcode sequencing 1st round PCR primers' in the Table S1). Conditions for the first round PCR were as follows: 1 ml of 10uM forward primer, 1 ml of 10uM reverse primer, 26 ml of KOD, and 24 ml of miniprepped sample DNA. PCR cycling conditions for round 1 PCR were as follows: 1. 95ºC for 2 min 2. 95ºC for 20 s 3. 70ºC fro 1 s 4. 58ºC for 10 s, cooling at 0.5ºC per s 5. 70ºC 20 s (return to step 2 for another 27 cycles) 6. 4ºC hold PCR reactions were purified with Ampure XP beads using 1:3 sample to beads ratio and eluted in 37 ml of Qiagen elution buffer. Second round of PCR used primers primer annealing to the Illumina Truseq Read 1 primer site with P5 Illumina adapter overhang and reverse primers from the PerkinElmer NextFlex DNA Barcode adaptor set, which anneal to Truseq Read 2 site and contain P7 Illumina adapter and i7 sample index. Conditions for the second round PCR were as follows: 1.5 ml of 10uM universal primer, 1.5 ml of 10uM indexing primer, 25 ml of KOD, and 20 ng of first round PCR product. PCR cycling conditions were the same as the first round PCR for a total of 20 cycles. After the second PCR round all samples were pooled at desired ratios and gel and Ampure XP bead purified. Barcode amplicons were sequenced using NextSeq 2000 with either P2 or P3 reagent kits.

Production of barcoded neutralization standard
To make the neutralization standard that we add into our deep mutational scanning libraries we used the same general barcoding approach as described above for the deep mutational scanning plasmid library generation with a few important differences. The lentiviral backbone used for the neutralization standard consists of TRE3G inducible mCherry protein and CMV promoter driven ZsGreen. The plasmid map of the template backbone is at https://github.com/dms-vep/SARS-CoV-2_Omicron_BA.1_spike_ DMS_mAbs/blob/main/library_design/reference_sequences/other_plasmid_maps_for_library_design/2871_pH2rU3_ForInd_mCherry_ CMV_ZsG_NoBC_cloningvector.gb. Note this backbone does not encode any viral glycoproteins and to rescue VSV-G-pseudotyped virus we provide VSV-G expression plasmid in trans. We amplified mCherry plasmid from the lentiviral template and barcoded it in two independent PCR reactions using 2 sets of primers containing 4 unique barcodes (see 'Neutralization standard barcoding primers' in Table S1). Importantly, the unique barcodes were balanced so that there is a unique nucleotide at each position of the 16-nucleotide barcode between each of the four barcoding primers in a PCR reaction. Furthermore, the 8 barcoding

EXPERIMENTAL MODEL AND SUBJECT DETAILS
HEK-293T were acquired from ATCC (CRL3216), HEK-293T-ACE2 cells are described in Crawford et al.,16 generation and characterization of HEK-293T-ACE2-medium cells is described in Farrell et al. 59 (referred to 'medium' cells in the reference), generation of HEK-293T-rtTA cells is described below. All cells were grown in D10 media (Dulbecco's Modified Eagle Medium with 10% heat-inactivated fetal bovine serum, 2 mM l-glutamine, 100 U/mL penicillin, and 100 mg/mL streptomycin). For antibody selection experiments D10 was made with phenol-free DMEM (Corning DMEM With 4.5g/L Glucose, Sodium Pyruvate; Without L-Glutamine, Phenol Red from Fisher, Cat. No. MT17205CV). For experiments with HEK-293T-rtTA cells D10 was made with tetracycline-free FBS (Gemini Bio, Cat. No. 100-800) to avoid any expression of spike unless doxycycline is added.
To produce HEK-293T-rtTA expressing cells (used for storing deep mutational scanning libraries and required for TRE3G promoter activation) we first generated VSV-G-pseudotyped lentivirus carrying rtTA gene. To produce this virus we transfected 0.5 million HEK-293T cells with 0.25 mg of each helper plasmid (Gag/Pol, Tat1b, Rev1b), 0.25 mg of VSV-G expression plasmid and 1 mg of lentiviral backbone carrying plasmid into which rtTA has been cloned (plasmid map https://github.com/dms-vep/SARS-CoV-2_ Omicron_BA.1_spike_DMS_mAbs/blob/main/library_design/reference_sequences/other_plasmid_maps_for_library_design/2850_ pHAGE2_EF1a_TetOn_IRES_mCherry.gb). Note this lentiviral backbone is self-inactivating and therefore would not be produced and packaged into lentiviral particles upon helper plasmid transfection. 48 hours after transfection virus-containing cell supernatant was collected and filtered through a surfactant-free cellulose acetate 0.45 mm syringe filter. 5ml of the virus was used to infect 0.5 million low passage HEK-293T cells. 48 hours post infection single cell clones were sorted into a 96-well plate using BD Aria II cell sorter with 610/20 filter in PE-Texas Red channel. Single clones were expanded and tested for ability to produce high virus titers. High virus titers are essential for performing deep mutational scanning experiments on a practical experimental scale and we found that individual cell clones can vary significantly in the virus titers they can produce. Clonal cell population producing the highest virus titers was selected for expansion, frozen down in 10% DMSO and 20% FBS D10 media, and stored long-term in liquid nitrogen freezer.

Computational analysis
Overview of data analysis pipeline To analyze the deep mutational scanning data, we created a modular analysis pipeline. At the core of this pipeline is a set of common steps that are expected to be shared across analysis of many different datasets. We implemented this set of common steps in a standalone GitHub repository named dms-vep-pipeline that is publicly available at https://github.com/dms-vep/dms-vep-pipeline and is designed to be included in project-specific analyses as a git submodule. The dms-vep-pipeline consists of a series of Snakemake 72 rules that run Python scripts or Jupyter notebooks, and specifies a conda environment that provides details on the software used for the analysis. For this paper, we used version 1.01 of the dms-vep-pipeline.
For each specific project (in this case, deep mutational scanning of the BA.1 and Delta spikes) we created a separate GitHub repository that included dms-vep-pipeline as a submodule. The repository for BA.1 is publicly available at https://github.com/dms-vep/ SARS-CoV-2_Omicron_BA.1_spike_DMS_mAbs and the repository for Delta is at https://github.com/dms-vep/SARS-CoV-2_ Delta_spike_DMS_REGN10933 Note how each repository has a configuration file (the config.yaml file), project-specific input data (the data subdirectory), and a top-level Snakemake file (the Snakefile) that runs the analysis. The output of running the pipeline is placed in a results subdirectory, although only key results files are tracked in the GitHub repository since many of them are very large. The pipeline also generates HTML rendering of the key analysis notebooks and result plots, which are available at https://dms-vep. github.io/SARS-CoV-2_Omicron_BA.1_spike_DMS_mAbs for BA.1 and https://dms-vep.github.io/SARS-CoV-2_Delta_spike_ DMS_REGN10933 for Delta. Looking at these websites is the easiest way to understand the analysis. Note that many of the plots are interactive charts created with Altair, 73 and we encourage readers to use the interactive features to better explore the data. Analysis of PacBio data to link barcodes to spike mutations To link each barcode to its spike variant, we used alignparse 74 to process the PacBio CCSs to determine the barcode and spike mutations for each CCS.
We performed several quality control steps. First we examined the synonymous tags introduced at the end of each amplicon during the library preparation to identify CCSs with discordant tags indicative of strand exchange during library preparation: typically <2% of CCSs had discordant tags, indicating a low rate of strand exchange. CCSs with identified strand exchange or out of frame indels were discarded. Next, we computed the empirical accuracy of the CCSs by examining how often CCSs with the same barcode reported the same spike sequence (the exact method used to compute the empirical accuracy is implemented here: https://jbloomlab.github. io/alignparse/alignparse.consensus.html#alignparse.consensus.empirical_accuracy). The empirical accuracies were between 0.65 and 0.75, indicating that fraction of CCSs correctly report the actual mutations. The inaccuracies are due to a combination of sequencing errors, reverse transcription errors, PCR strand exchange, and occasional actual association of the same barcode with different variants in different cells (which can especially arise if the complexity of the initial virus library integrated into cells at single copy is not much higher than the complexity of the final cell library).
We then built consensus sequences for each barcode with at least three CCSs, using the method implemented at https:// jbloomlab.github.io/alignparse/alignparse.consensus.html#alignparse.consensus.simple_mutconsensus with max_minor_sub_frac and max_minor_indel_frac both set to 0.2. This approach of requiring multiple concordant CCSs to call a consensus is expected to lead to higher accuracy in the final barcode / spike variant linking, and will generally discard barcodes that are not uniquely linked to a single spike variant.
Files containing the final barcode / variant lookup tables and the analysis notebooks with resulting quality control plots are linked to the main HTML pages in the documentation for the BA.1 and Delta experiments as provided in the Processed data section below. Analysis of Illumina data to count barcodes for each variant in each experiment For each experiment, we processed the Illumina barcode sequencing with the parser implemented at https://jbloomlab.github.io/ dms_variants/dms_variants.illuminabarcodeparser.html to determine the counts of each variant in each condition. Barcoded variants were only retained for subsequent analysis if their ''pre-selection'' counts (no-antibody selection for antibody escape experiments, VSV-G-pseudotyped infections for functional selections) met some minimal count threshold specified in the config.yaml file of the GitHub repos for the BA.1 and Delta spikes. This thresholding removes variants that are expected to have substantial noise due to low counts. Note that a caveat that should be kept in mind is that the actual key bottleneck is expected to usually occur at the stage of infection with the virus library rather than sequencing, since the barcodes are generally sequenced to a depth that greatly exceeds the complexity of the libraries used for the infections. Therefore, although variants with low counts are expected to have more noise, the counts do not enable a quantitative estimate of the actual bottleneck size experienced by each variant.
Files containing the barcode counts and the analysis notebook with resulting quality control plots are linked to the main HTML pages in the documentation for the BA.  Figure 2.
To deconvolve the functional scores for all spike variants (which often contain multiple mutations) into estimates of the effects of individual amino-acid mutations, we fit global epistasis models 41 to the variant functional scores, using the models implemented at https://jbloomlab.github.io/dms_variants/dms_variants.globalepistasis.html with the Gaussian likelihood function. This fitting estimates how each mutation affects an underlying latent phenotype, as well as the shape of the global epistasis function relating the latent phenotype to the observed functional score. We also then re-transform the inferred effect of each mutation on this latent phenotype through the global epistasis function to estimate its effect on the observed phenotype. This approach provides a way to deconvolve the information in the multi-mutant variants to more accurately estimate the effects of mutations under the assumptions of a global epistasis model. 41 For final reporting, we then took the average (median) of the estimated functional effect of each mutation across all the replicates and libraries for each different spike (BA.1 or Delta). In Figure 7 of this paper we report the functional effects on the observed (rather than latent) phenotype, as that is a more relevant measure of its expected impact on spike-mediated infection.
Files containing the effects of mutations on both the latent and observed phenotypes for both individual replicates/libraries and averages across them, the analysis notebooks with relevant quality control plots, and interactive plots summarizing the final estimates are linked to the main HTML pages in the documentation for the BA.1 and Delta experiments as provided in the Processed data section below.
For the figures in this paper and the default view of the interactive plots described in the Processed data section below, we only include mutations seen in at least three distinct variants (averaged across libraries). Computing antibody escape by mutations For the antibody selections, we computed the non-neutralized fraction (probability of escape) p v ðcÞ for each variant v at each antibody concentration c for a given antibody as is the count of variant v at antibody concentration c, n v 0 is the count of variant v in the no-antibody control, S c is the total counts of the neutralization standard at antibody concentration c, and S 0 is the total concentration of the neutralization standard in the no-antibody control. These values should in principle fall between 0 (variant is totally neutralized by antibody) and 1 (variant is not neutralized by antibody), and in practice we clip any values measured as >1 to a value of 1.
To deconvolve mutation-level escape values from the measured p v ðcÞ values for the variants (which often contain multiple mutations) at multiple concentrations, we used the approach implemented in polyclonal software package (https://jbloomlab.github.io/ polyclonal/), 36 constraining the fits to a single epitope (since we are only analyzing monoclonal antibodies). This analysis yields a mutation-level escape score for each observed variant (the b m;e values in the nomenclature of the polyclonal package) which will be zero for mutations that have no effect on antibody escape, and >0 for mutations that mediate antibody escape. These are the values plotted in the heat maps shown in the antibody escape figures in this paper; the line plots show site-level summaries of these values (eg, the sum of the escape values at each site). Note that the polyclonal models 36 can use the escape values inferred from the deep mutational scanning to predict the non-neutralized fraction for arbitrary mutants, and we correlated those measurements with the IC 50 values measured by standard neutralization assays in the antibody escape figures in this paper. For all antibodies we had replicate measurements (multiple libraries, and in some cases technical replicates of the same library), and the final reported values are the average (median) across these replicates.
Files containing the escape values for both individual replicates/libraries and averages across them, the analysis notebooks with relevant quality control plots, and interactive plots summarizing the final estimates are linked to the main HTML pages in the documentation for the BA.1 and Delta experiments as provided in the Processed data section below.
For the figures in this paper and the default view of the interactive plots described in the Processed data section below, we only include mutations seen in at least three distinct variants (averaged across libraries), and with a functional effect of at least -1.38 (for Omicron BA.1) or at least -1.47 (for Delta). Comparison of deep mutational scanning data to enrichment of mutations during actual human SARS-CoV-2 evolution In Figure 7, we compare the functional effects of mutations measured in deep mutational scanning to the enrichment of mutations in actual human SARS-CoV-2 evolution compared that expected given the underlying viral mutation rate. The computation of the ''expected'' versus actual observed mutation counts is performed by the code at https://github.com/jbloomlab/SARS2-mut-rates/tree/ 25204ea0c868c4c86d0cc16bd5f203b1b0607868 The approach for calculating the enrichment of observed versus expected mutations is inspired by the strategy used in Neher 75 Briefly, we downloaded the pre-built UShER tree of all publicly available SARS-CoV-2 sequences 25 as of Sept-26-2022from http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/2022/09/26/public-2022-09-26.all.masked.nextclade. pangolin.pb.gzWe then used matUtils25 to separate the tree by the pre-annotated Nextstrain clades, and extract the number of unique occurrences of each nucleotide mutation along the tree branches for each clade. Note that we are counting unique occurrences of each mutation along the tree (number of branches with a mutation), not the number of observations of the mutation in the final set of aligned mutationsThis is an important distinction, as a mutation could be common in the final aligned sequences even if it only had one or a few occurrencesIn counting these occurrences, we ignored mutations on any branch with more than four nucleotide mutations, or more than one nucleotide mutation to either the Wuhan-Hu-1 reference sequence or the founder sequence for that clade, as taken from Neher 75 The reason for excluding mutations on branches with high numbers of mutations as these can often indicate problematic sequences; the reason for excluding mutations on branches with multiple reversions to reference is that sometimes sequences with missing coverage have identities automatically called to reference. After tabulating these mutation counts, we ignored any mutations that are reversions from the clade founder to the Wuhan-Hu-1 reference or reversions of these, a set of sites known to be subject to sequencing errors, 76 or were at sites with unusually high mutation rates (see sites manually specified in the config.yaml file at the GitHub repo listed above). Overall, this process provided a list of the counts of nucleotide mutations for each Nextstrain clade after filtering branches and mutations/sites expected to be affected by various possible errors.
We then calculated the relative rates of each type of the 12 possible nucleotide mutations at four-fold synonymous third codon positions by counting the mutation at such positions in each Nextstrain clade, and then normalizing the fraction of these positions that had each nucleotide identity at each site, retaining only Nextstrain clades with at least 5000 such mutation counts. 43 We then calculated for each clade the expected number of counts of each nucleotide mutation based on these rates and the actual counts at the four-fold synonymous sites. The result of these calculations is how many times we expect to observe each nucleotide mutation if all sites are under the same constraint as the four-fold synonymous sites. Next, we aggregated the expected and observed counts across all Nextstrain clades, and also across all nucleotide mutations that encoded for the same amino-acid mutation to get expected and observed counts for amino-acid mutations, and only retained amino-acid mutations with at least 20 expected counts. Figure 7D plots the log base 2 of the expected versus observed counts after adding a pseudocount of 0.5.
We compared these counts to deep mutational scanning data from our study and several other spike deep mutational scanning studies as described in the main text. The tabulated deep mutational scanning measurements and the enrichments of mutations in actual SARS-CoV-2 evolution are at https://github.com/dms-vep/SARS-CoV-2_Omicron_BA.1_spike_DMS_mAbs/blob/main/ results/compare_muteffects/all_natural_enrichment_vs_dms.csv

Processed data
The key results from the analysis are stored in the results subdirectory of the GitHub repos for BA.1 (https://github.com/dmsvep/SARS-CoV-2_Omicron_BA.1_spike_DMS_mAbs) and Delta (https://github.com/dms-vep/SARS-CoV-2_Delta_spike_DMS_ REGN10933). The easiest way to navigate these results are via the HTML documentation at https://dms-vep.github.io/SARS-CoV-2_Omicron_BA.1_spike_DMS_mAbs and https://dms-vep.github.io/SARS-CoV-2_Delta_spike_DMS_REGN10933 These pages contain links to the key data files, as well as interactive heat maps of the functional effects of mutations and the effects of mutations on antibody escape. Note that these plots are interactive, and allow you to filter by certain regions of the protein, the number of variants in which a mutation is seen, the maximum magnitude of an effect at a given site, and other relevant parameters.
In the final output files, mutations are numbered in reference-based (Wuhan-Hu-1) spike numbering. The GitHub repos contain files that convert sequential numbering of the BA.1 and Delta spike to reference based numbering.
Modeling of antibody escape was performed using polyclonal software implemented in https://jbloomlab.github.io/polyclonal/ and as described in computational analysis section and in Yu et al. 36 Modeling of functional effects of mutations on spike entry was performed by fitting global epistasis models as described in computational analysis section using dms_variants package as implemented at https://jbloomlab.github.io/dms_variants/dms_variants. globalepistasis.html.
Measurement of enrichment of mutations in spike during natural evolution was performed as described in computational analysis section and implemented at https://github.com/jbloomlab/SARS2-mut-fitness/tree/25204ea0c868c4c86d0cc16bd5f203b1b0607868.

Raw sequencing data
The raw PacBio and Illumina sequencing data have been deposited on the NCBI's Sequence Read Archive with BioProject number PRJNA888402 for the Omicron BA.1 data and PRJNA889020 for the Delta data. The PacBio sequencing linking variants to barcodes can be found under BioSample: SAMN31220980 for Omicron BA.1 and BioSample: SAMN31230634 for Delta. The Illumina barcode sequencing can be found under BioSample: SAMN31216920 for Omicron BA.1 and BioSample: SAMN31230628 for Delta.

Ethics statement
Delta breakthrough sera were collected after written and informed consent. The breakthrough sera were part of the prospective longitudinal Hospitalized or Ambulatory Adults with Respiratory Viral Infections (HAARVI) study from Washington State, USA, which was approved by University of Washington Institutional Review Board (protocol #STUDY00000959).  Figure S1. Pseudovirus titers from phenotype-genotype linked lentiviruses, related to Figure 1 (A) Traditional lentivirus pseudotyping method. The lentivirus backbone used for pseudotyping does not code for the spike gene. To make spike-pseudotyped lentivirus, lentiviral helper plasmids, backbone, and spike expression plasmid are transfected into producer cells to make spike-pseudotyped lentivirus. This method produces lentiviruses that lack a genotype-phenotype link because the spike expressed on the surface of a viral particle is not coded by the lentiviral genome.    Figure S5. Comparison between antibody-escape mapping using full spike pseudovirus deep mutational scanning versus our previously described yeast-display deep mutational scanning of just the RBD, related to Figure 4 (A) Correlation between measured mutation-level escape scores for LY-CoV1404 antibody in pseudovirus and yeast-display deep mutational scanning experiments. Yeast display data is taken from Starr et al. 13 . (B) Surface representation of SARS-CoV-2 RBD colored by the sum of escape scores for LY-CoV1404 antibody at that site. (C) Correlation between measured mutation-level escape scores for REGN10933 antibody in pseudovirus and yeast-display deep mutational scanning experiments. Yeast display data is taken from Starr et al. 5