Identifying vaccine escape sites via statistical comparisons of short-term molecular dynamics

The identification of viral mutations that confer escape from antibodies is crucial for understanding the interplay between immunity and viral evolution. We describe a molecular dynamics (MD)-based approach that goes beyond contact mapping, scales well to a desktop computer with a modern graphics processor, and enables the user to identify functional protein sites that are prone to vaccine escape in a viral antigen. We first implement our MD pipeline to employ site-wise calculation of Kullback-Leibler divergence in atom fluctuation over replicate sets of short-term MD production runs thus enabling a statistical comparison of the rapid motion of influenza hemagglutinin (HA) in both the presence and absence of three well-known neutralizing antibodies. Using this simple comparative method applied to motions of viral proteins, we successfully identified in silico all previously empirically confirmed sites of escape in influenza HA, predetermined via selection experiments and neutralization assays. Upon the validation of our computational approach, we then surveyed potential hotspot residues in the receptor binding domain of the SARS-CoV-2 virus in the presence of COVOX-222 and S2H97 antibodies. We identified many single sites in the antigen-antibody interface that are similarly prone to potential antibody escape and that match many of the known sites of mutations arising in the SARS-CoV-2 variants of concern. In the Omicron variant, we find only minimal adaptive evolutionary shifts in the functional binding profiles of both antibodies. In summary, we provide an inexpensive and accurate computational method to monitor hotspots of functional evolution in antibody binding footprints.


INTRODUCTION
In current attempts to prevent the spread of the coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), clinicians and scientists have focused their efforts on the development of vaccines that are hoped to induce broad, long-lasting, neutralizing antibodies. However, the selective pressures imposed by the presence of the neutralizing antibodies in the host population can also drive the evolution of viruses toward adaptations that allow them to escape neutralization. For example, in the case of influenza viruses, immunity is provided by antibodies that target the hemagglutinin (HA), responsible for viral attachment and viral fusion to the host cell. However, these antibodies drive selection for amino acid substitutions in the HA, causing the influenza virus to rapidly evolve every year (1). Similarly, the antibody selection by the host immune system can also drive the emergence of new SARS-CoV-2 variants. Therefore, when developing vaccines that elicit antibodies against a broad range of strains, research efforts should also be aimed at identifying potential mutations that can facilitate viral escape from the neutralization effects of specific antibodies (2,3).
The traditional approach to identifying these mutations is by passaging the virus in the presence of antibodies in a directed selection experiment, followed by validation of the variants that arise with neutralization assays. For example, in influenza viruses, escape mutant selection using a panel of monoclonal antibodies (mAbs) was used to identify the five major antigenic regions, Sa, Sb, Ca1, Ca2, and Cb (4)(5)(6). However, a significant drawback of this approach is that the directed selection typically only favors one of the many potential mutations that can escape a given antibody. Another approach is to test antibody binding to a panel of viral variants. In a typical 500-residue viral protein, there are about 10 4 potential single amino acid mutants (7). Creating all individual mutants and then testing the mutants against the antibodies is an impossible task, thus causing researchers to limit themselves to exploring only a small portion of protein space (e.g., examining only mutations to alanine) (8,9). Such studies cannot give a complete picture of the mutational spectra that can allow a virus to escape neutralization by a given antibody (10).
Functional evolutionary studies of viral vaccine escape are often supplemented with protein structural determination via x-ray crystallography or cryogenic electron microscopy and subsequent contact mapping of heavy atoms within a specific cutoff distance. While structural information regarding the details of antibody binding are often considered the gold standard in defining epitopes, particularly for patent application, it has also been understood that structure alone cannot completely define the individual sites or "hotspots" that drive interactions within given protein-protein interaction (PPI) binding interfaces (11). To better identify sites that control PPI, a host of computational methods have been developed. These include coevolutionary rate comparison to identify functional pairings of sites across the PPI interface (12), complex scoring methods to predict PPI specificity (13,14), and, more recently, graph network and machine learning (ML) methods aimed at easing this PPI scoring challenge (15,16). So, while structural biology can provide a static image to verify that an antibody physically contacts a viral protein, it often cannot provide complete information regarding which amino acid sites are more prone to single replacement mutations that may drive sudden evolutionary changes affecting vaccine efficacy (i.e., hotspot sites or residues). Although binding can be observed and verified in structure, a much richer picture of the functional sites of binding may be determined by the study of changes to their motion (i.e., molecular dynamics [MD]) upon the formation of the PPI interface.
We propose here that a supplementation of structural analysis with a proper comparative statistical study of a bound versus unbound protein's short-term dynamics may help to directly resolve these hotspots of PPI specificity in any given PPI interface without imparting the potential biases of scoring functions and/or ML training data sets. The prediction of hotspot residues is crucial as these sites on the antibody-antigen complex have a strong propensity to disrupt binding interactions within the antibody-antigen interface (17,18). Recently, it has been demonstrated, via site-directed selection experiments and neutralization assays, that these hotspot regions share a common biophysical feature. They all tend to harbor single amino acid sites that have significant large affects upon binding interactions in the antibody-antigen interface (19). Given this common feature of hotspot residues, we further hypothesize an important role for comparative MD simulations of the antibody-antigen interface to help predict potential vaccine escape mutations before they happen, allowing for important functional context to real-time sequencebased surveillance of current and future pandemics.
Here, we utilize a relatively simple method of comparative statistical analysis of MD simulations developed by our lab (DROIDS-detecting relative outlier impacts in dynamic simulation or DROIDS 4.0) that employs a sitewise Kullback-Leibler (KL) divergence metric and a multiple test-corrected two-sample Kolmogorov-Smirnov (KS) test to successfully validate previously known sites of antibody escape in the influenza HA (19)(20)(21)(22)(23). We then utilized our site-wise comparative MD approach to identify potential sites prone to antibody escape in the spike protein of SARS-CoV-2. Specifically, in the Omicron variant we were able to identify sites in the receptor binding domain (RBD) that support the binding efficiency to two general neutralizing antibodies and its competitive binding to the natural receptor, human angiotensin-converting enzyme 2 (hACE2). We also compare our dynamicsbased method to a more traditional structure-based method of counting heavy atom contacts using distance cutoffs within the antibody-viral binding interface. In summary, we present a method to accurately identify hotspot residues that are prone to single point mutations with large functional effects upon the antibody-antigen binding interface and thus are likely preadapted to allow for vaccine escape. Identifying such residues in silico will be essential for prescreening the antigenic consequences of viral genetic variations and designing better vaccines that induce long-term and broadly neutralizing antibodies against viral pathogens.

PDB structure, glycosylation, and model preparation
The protein structures used for the primary models for analyzing the MD of antibody interactions with HA and the SARS-CoV-2-RBD are listed in Fig. 1 A ( [24][25][26][27][28][29]. Any crystallographic reflections were removed along with any other small molecules used in crystallization. Hydrogens were added, and crystallographic waters were removed using pdb4amber (AmberTools18) (30). Glycosylation was deleted using the swapaa function in UCSF Chimera 1.14 (31). Predicted glycosylation was rebuilt for the Amber force field using the glycoprotein builder on the glycam.org webserver and the GLYCAM06j-1 force field (32,33). Any missing loop structure in the files was inferred via homology modeling using the "refine loop" command to Modeler within UCSF Chimera (34,35). The globular head domain of the influenza HA is stable in its monomeric form. However, the stalk domain needs to be in a trimer to be stable (36). Therefore, MD simulations of PDB: 4GMS was performed using a trimmed (residues 57-270) monomeric form of the head domain. For MD simulations of the PDB: 3ZTJ and PDB: 4HLZ, the trimeric form of the stalk domain was used. The antibodies were trimmed, and only the heavy and light chains of the fragment antigen binding (Fab) were used. Finally, for the Omicron (B.1.1.529) variant simulations, we used the swapaa function to model the 15 RBD mutations onto PDB: 7OR9 (Fig. 4 B).

MD simulation protocols
All comparative MD analysis via our DROIDS pipeline was based upon 100 replicate sets of 1 ns accelerated MD runs (i.e., 100 Â 1 ns MD runs in each comparative state, e.g., unbound versus bound). MD simulations were conducted using the particle mesh Ewald method via accelerated MD (pmemd.cuda) in Amber18 and the ff14SB protein FIGURE 1 Description of PDB files and antibodies used in this study, and the comparison between KL divergence and average fraction of variants surviving. (A) Table summarizing the protein structure used for primary models for analyzing the MD of antibody interactions with the influenza HA and SARS-CoV-2 RBD, and ACE2 interactions with wild-type and mutated SARS-CoV-2 RBD (includes VOC mutations). The table includes amino acid positions and the corresponding ÀKL divergence value denoting atomic fluctuations dampening for antibody-antigen/protein 1-protein 2 MD simulations. D value and the level of significance for the corresponding amino acid position are also given. (B) S139/1 (green), FI6v3 (blue), and C179 (red) epitopes are mapped onto the HA trimer, shown in gray (PDB: 1RVX). Overlapping epitopes between FI6v3 and C179 are shown in cyan. Next to the HA trimer is a phylogenetic tree of HA subtypes. Circles denote reported antibody binding or neutralization against that subtype. (C) Box and whisker plots showing the KL divergence values within 2 sd for amino acid sites that had significant average fraction of variants surviving and nonsignificant average fraction of variants in the presence of monoclonal antibody S139/1 (19). Only sites K156, G158, and N193 of the HA head domain had significant average fraction surviving viral particles. and GLYCAM06j-1 (32)) force fields and implemented on two RTX 2080 Ti graphics processor units controlled via a Linux Mint 19 operating system (32,(37)(38)(39)(40)(41). These replicate sets were preceded by energy minimization, 300 ps of heating to 300 K, and 10 ns of equilibration, followed by random equilibration intervals for each replicate ranging from 0 to 0.5 ns. All protein systems were prepared via tLeAP (Ambertools18) and explicitly solvated, and charge neutralized with Na þ and Clions in a Tip3P octahedral water box set to 12 nm beyond the surface of each protein with periodic boundaries (30,42). All simulations were regulated using the Anderson thermostat at 300 K and one atmospheric pressure (43). Root mean-square atom fluctuations and atom correlations were conducted in CPPTRAJ using the atomicfluct and atomicorr commands (44).

Comparative protein dynamics analyses with DROIDS 4.0 and statistical analyses
Comparative signatures of dampened atom fluctuation during antibody binding were presented as protein site-wise divergence in atom fluctuation in the antibody-bound versus unbound states for each viral target protein (both influenza HA and SARS-CoV-2 RBD). Divergences were calculated using the signed symmetric KL divergence calculation in DROIDS 4.0. This divergence or relative entropy (45) is taken between the homologous distributions of atom fluctuation (i.e., root mean-square fluctuation or rmsf taken from 0.01-ns time slices of total MD simulation time) comparing the MD of antibody-bound and unbound spike proteins averaged over the four protein backbone atoms, achieving individual amino acid resolution. The rmsf value is thus where v represents the set of XYZ atom coordinates for i backbone atoms (C, N, O, or Ca) for a given amino acid residue over j time points and w represents the average coordinate structure for each MD production run in a given ensemble. The KL divergence (i.e., relative entropy) or similarity between the rmsf of two homologous protein sites representing a functional binding interaction (i.e., where 0 ¼ unbound state and 1 ¼ antibody-bound state) can then be described by where rmsf represents the average root mean-square deviation of a given atom over time. More specifically, the rmsf is a directionless root mean-square fluctuation sampled over an ensemble of MD runs with similar time slice intervals. Since mutational events at the protein level are typically amino acid replacements, this calculation is useful if applied to resolution of single amino acids rather than single atoms.
Note that amino acid side chain dynamics are ignored because only the backbone atoms (N, Ca, C, and O) are homologous between residues. However, because the sidechain atoms always attach to this backbone, rmsf still indirectly samples the dynamic effect of amino acid sidechain as they are still present in the simulation. The reference state of the protein is always unbound while the query state is bound. Therefore, the pairwise comparison represents the functional impact of binding on the unbound protein's normal motion in its solvated state. Thus, binding contact dampens the fluctuation of atoms at the sites of binding to the degree to which they are involved in the binding interaction. Multiple test-corrected two-sample KS tests are used to determine the statistical significance of local site-wise differences in the rmsf distributions. As the analysis is site-wise and a statistical hypothesis test is conducted for each site, the Benjamini-Hochberg method (46) was used to adjust p values for the false discovery rate generated by the higher probability of false positive results generated by the analysis of the multiple sites of a given protein structure. Therefore, all significance tests and p values for these site-wise differences were calculated in DROIDS 4.0 using two-sample KS tests with the Benjamini-Hochberg multiple test correction in DROIDS 4.0. Further mathematical details of DROIDS 4.0 site-wise comparative protein dynamics analysis were published previously by our group and can be found here (21,22). This code is available at our GitHub web landing: https://gbabbitt.github. io/DROIDS-4.0-comparative-protein-dynamics/, and is also available at our GitHub repository https://github.com/gbabbitt/DROIDS-4. 0-comparative-protein-dynamics.
We also use this method to compare the mutational impacts on the bound SARS-CoV-2 RBD/angiotensin-converting enzyme 2 (ACE2) interface for all current variants of concern (VOC) mutations. Thus, rather than determining the effect of binding on the viral protein dynamics, we instead determine the effect of the mutations on the viral protein dynamics when it is bound to its human target receptor ACE2.

Epitope map identification
Previously published articles were surveyed to identify the epitope mapping of the HA and SARS-CoV-2 RBD antibodies (Table 1). In brief, the identified epitopes were highlighted on the amino acid back bone of the corresponding antigen. These studies utilized some combination of experimental methods for epitope mapping that included structural determination, site-directed mutagenesis, pepscan analysis, hydrogen deuterium exchange, and cross-linking coupled with mass spectrometry.
To further assess the relationship between the structures of the interfaces in our models and the changes in their dynamics upon binding, we used the ContPro webserver (49) to compute the closest distances between all amino acid sites within 6 Å proximity across the PPI interface of both our Covox222-RBD and S2H97-RBD models. We then correlated these interface distances to the change in dynamics upon binding at each amino acid site (i.e., dFLUX or KL divergence).

Relation of dampened atom motion and the number of heavy atom contacts in the antibody-viral binding interface
The AmberTools20 "nativecontacts" function was utilized to extract both native and nonnative heavy atom contacts for the first frame (i.e., static structure) and the subsequent 100,000 frames from a 20-ns MD simulation on the antibody-bound structures of the SARS-CoV-2 RBD and the COVX222 and S2H97 antibodies (i.e., providing both a static view of the contacts in space as well as a dynamic view of the contacts over time). We subsequently analyzed the correlations of our atom dampening metric (i.e., KL divergence) to both the static heavy atom contacts and the fraction of contacts over time. All supporting files, data, and custom code are supplied in a folder labeled "S3 [heavy atom contact analyses]" in the Data S1 file (data_ RajendranFerranBabbitt_2022). The commands we used to invoke nativecontacts in the AmberTools program are given in the READ-ME.txt file in this folder. Fig. S3 A shows an overview of the pipeline, including the use of our custom parsing scripts written in python and our correlation analyses written in R language.

RESULTS
Using our comparative MD analysis pipeline (DROIDS 4.0), we successfully identified and validated previously known antibody escape sites in the head and stalk domains of the influenza HA. We then expanded the scope of our study to identify potential mutational sites prone to antibody escape in the spike protein of SARS-CoV-2 RBD and its recent Omicron genetic variant.

Computational validation of antibody escape mutation hotspots in influenza HA head and stalk domains
We first conducted comparative site-wise analyses of various HA viral spike protein dynamics in both antibody-bound versus unbound states (Fig. 1 A). We then applied our approach to anti-HA antibodies with a wide range of breadth against different influenza strains. Negative-signed KL divergences indicated universal dampening of atom fluctuation at key epitope sites in the antibody-bound state. A two-sample KS test corrected for the number of sites analyzed confirmed the statistical significance of these binding effects ( Fig. 1 A). S139/1 is a one such mAb known to neutralize both group 1 (H1, H2, H5, H9, and H13) and group 2 (H3) influenza viruses ( Fig. 1 B) (50). Crystallization studies have revealed that the antibody targets highly conserved residues in the RBD of the head domain (26) (Fig. 1 B). We applied a Mann-Whitney U-test to compare our KL divergence values, generated using MD simulations, to amino acid residues with significant average fraction of variants surviving and nonsignificant average fraction of variants surviving in the presence of S139/1, in accordance with (19) (Fig. 1 C). In our comparative analyses of MD simulations, we found significantly higher KL divergence in the average fraction of variants surviving during directed selection (p < 0.001), supporting that our KL divergence metric proves to be a useful quantitative measure in discriminating sites with mutations leading to vaccine escape. Comparative MD simulations of HA bound to S139/1 and unbound have revealed strong dampening of atom fluctuations occurring at sites K156, G158, and N193 of the HA, with ÀKL divergence values À4.65 (D ¼ 0.810, p < 0.001), À5.81 (D ¼ 0.895, p < 0.001), and À4.81 (D ¼ 0.807, p < 0.001), respectively (Figs. 1 A and 2 A). Directed selection using mAb S139/1 revealed that these three residues match empirically identified sites of strong escape (Fig. 2 B) (19). As expected, the three amino acid residues with the highest negative KL divergence in our comparison of atom fluctuations from MD simulations also fall directly in the empirically determined physical binding footprint of the antibody. And they are the same three sites where previous works have selected escape mutants in the H1, H2, and H3 HAs (26,50). It should be noted that residue 156 is a part of the 150 loop of the influenza HA and forms electrostatic interactions by inserting itself into the acidic pocket in the Fab formed by residues Glu H35 and Glu H50 . Another 150 loop binding determinant is residue 158, which has the largest dampening of atomic fluctuations. Gly158 is closely stacked on the light chain complementarity determining region 3 of S139/1 and is further stabilized by a main chain hydrogen bond between S159 and Asn L92 . Thus, any mutations at these positions will cause clashes at the antibody interface. S139/1 also recognizes members of the 190 helix. In the residues of 190 helix, Asn193 plays a significant role in antibody recognition and is buried by heavy chain complementarity determining regions 1-3 of S139/1 (50). Furthermore, it should be noted that site 193 was known to SARS-CoV-2 RBD (29) interact with the host receptor molecule (sialic acid moiety), suggesting the contribution of this residue to receptor binding of HA (51,52). Therefore, isolates with mutated residues at site 193 have reduced viral fitness due to their inability to bind to the host receptors (53). Selective pressure created by S139/1 on residue 193 has other secondary effects on the virus, such as the reduced binding activity of the influenza virus. With mAb S139/1, atomic fluctuation dampening is also seen in residues 133-134 of the influenza HA (Fig. 2 A). These residues are part of the 130 loop of the HA, and make a significant portion of the RBD of the influenza virus. Finally, in addition to atomic fluctuation dampening at residues 133-134 and at residues 156/158, we also see dampening at 145-147 (Fig. 2 A). Upon further examination, there lie no epitopes of S139/1 between residues 145 and 147, supporting that this dampening may be an artifact of the residue's location between two epitope sites (130 loop and 150 loop) of S139/1. Therefore, in summary, the three sites determined to have the largest physical effect upon antibody binding in our simulations (i.e., showing the largest negative KL divergence) are exactly the same three sites most likely to evolve antibody escape under directed selection experiments of (19). We also chose two broad antibodies that target the stalk domain of the HA (FI6v3 and C179) to further validate our computational approach to identify potential sites of escape. FI6v3 was first isolated by highthroughput screening of immortalized antibodysecreting cells and was found to bind to both group 1 (H1, H2, H5, H6, H8, H9, and H13) and group 2 (H3, H4, H7, and H10) viruses ( Fig. 1 B) (24). Antibody C179 was first isolated from a mouse that had been immunized with the H2N2 virus and was later found to cross-neutralize H1, H2, H5, H6, and H9 subtypes ( Fig. 1 B) (47). Both FI6v3 and C179 have epitopes that lie in the stalk domain and are known to interfere with membrane fusion (Fig. 1 B). Our MD simulations do not reveal dampening of atomic fluctuations in the FIGURE 2 Epitope prediction and validation of hotspot residues of the influenza hemagglutinin in the presence of neutralizing antibodies. Sequence positional plotting of dampening of atom motion on the influenza hemagglutinin (HA) head domain by (A) S139/1 and (B) matching hotspots identified in lab work by Doud et al. (19) and additional sites of moderated dampening of atom motion on the HA stalk domain by (C) FI6v3 and (D) C179. The sequence profile of the ÀKL divergence between S139/1 and the head domain produces strong negative peaks in (A) at K156, G158, and N193. A modest negative peak is observed in the stalk domain in (C) at E47 in the presence of FI6v3. HA1 numbering is used to identify the amino acid positions in (A). HA2 numbering is used to identify the amino acid position in (C) and (D). The gray bar on top of the ÀKL divergence plots denotes the HA amino acid backbone with the location of (A) S139/1 epitopes shown in green, (C) FI6v3 epitopes in blue, and (D) C179 epitopes in red. (B) Logo plots of S139/1 show the amino acid position that has the largest effect. Letter heights are proportional to the excess fraction of virions with that mutation that survive the antibody, as indicated by scale bars. The logo plot was prepared using deep mutational scanning data from Doud et al. (19). stalk domain in the presence of FI6v3 or C179. When we look at the difference in atomic fluctuations of the stalk antibodies on the same scale as S139/1, we only see a few sites with only weak to moderate levels of dampening in atom fluctuation (Fig. 2, C and D). In conclusion, antibody selection experiments revealed very similar results as our comparative MD simulation studies, with only few sites in the stalk domain with slightly increased fraction of variants surviving mAb FI6v3 again, much in accordance with the experiments of (19). As a result, these authors concluded that the stalk domain is less capable of escaping antibodies by single mutations (19). We also find that, in the presence of FI6v3, both directed selection and MD simulation show a small bump at site 47 (KL ¼ À2.41, D ¼ 0.647, p < 0.001), which is of importance (Figs. 1A and 2C), also in accordance with (19). Other mutational scanning studies have demonstrated that the introduction of E47R in the stalk domain has increased the resistance to FI6v3 (54). Like these studies, we also conclude that the HA stalk is more intolerant of mutations and, confirmed by us here, by a weaker signature of negative KL divergence at individual sites with more dampened atom fluctuations upon binding (Fig. 2, C  and D). The presence of sites with only small diver-gences in atomic fluctuations in our comparative MD simulations on the HA stalk would seem to confirm that the antistalk antibodies studied here readily target sites with high mutational tolerance, as suggested by other site-directed experiments (55), or that the binding energetics at protein-protein interfaces can be asymmetrically distributed across all sites, thus preventing us from identifying the mutational tolerant sites (17,18).

Computational prediction of antibody escape mutation hotspots in SARS-CoV-2 RBD
Upon validation of our computation approach using influenza HA, we implemented our computational pipeline to identify potential hotspot residues in the RBD of the spike protein of SARS-CoV-2 in the presence of two recently published antibodies against the virus: COVOX-222 and S2H97 (Fig. 3 A). COVOX-222 is known to bind to different residues than S2H97 and is known to neutralize strains P.1 (gamma) from Brazil, B.1.351 (beta) from South Africa, and B.1.1.7 (alpha) from the UK (48). Termed "super-antibody," S2H97, is known to bind with high affinity across all sarbecovirus clades and prophylactically protects hamsters from  1 A and 3 B).
It should be noted that the residues picked up by our computational method are part of the epitopes of COVOX-222. Of all the residues identified by the comparison of MD simulation, the residues with the greatest dampening of atomic fluctuation have a higher chance of being classified as hotspot residues. They are more prone to mutate, thus allowing the virus to escape COVOX-222. Residue 417, one of the residues with moderate dampening of atomic fluctuation, makes a weak salt bridge interaction with the heavy chain complementarity determining region 3 residue E99 of COVOX-222 (48). Also, residue N501, one with residues with the most dampening of atomic fluctuations, is known to interact with light chain complementarity determining region 1 residue P30 of the antibody via a stacking interactions (48,57). In the case of the mAb S2H97, we see a moderate dampening of atomic fluctuations at sites D428 (KL ¼ À3.81, D ¼ 0.692, p < 0.001) and K462 (KL ¼ À3. 37 (Figs. 1 A  and 3 C). Four of the five sites, except E484, fall in the epitope footprint of S2H97 (29). Like mAb COVOX-222, we predict that the sites with the most dampening of atomic fluctuations (i.e., negative KL divergence) are more prone to functionally evolve under the selection pressure of the vaccine, thus allowing the virus to potentially escape the binding of S2H97. However, E484 does not fall in the S2H97 footprint; it has been shown that the mutation of this site is known to enhance immune escape form neutralizing antibodies and also increase affinity to hACE2 (58).
Many of the residues identified as potential sites of escape, either in the presence of COVOX-222 or S2H97, overlap with mutations seen in the VOC. At present, there are mainly five kinds of VOC: Alpha, Beta, Gamma, Delta, and Omicron. We employed comparative MD simulation of the SARS-CoV-2 RBD bound to hACE2 (PDB: 6VW1 trimmed to only include the viral RBD) modeled with and without the recently arising VOC mutations in the alpha to delta variant strains. Three of the VOC mutations (417, 484, and 501) that were previously identified as potential escape sites using our approach (Figs. 1 A, 3, B and C) here were shown to significantly increase viral binding to hACE2 at the protein interface. This additional comparative dynamics analysis of wild-type versus mutant viral bound structures demonstrates that the VOC mutations have an overall effect to increase binding at the protein interface (i.e., overall negative KL divergence in Fig. 4 A). In addition to the three sites that we confirmed previously by comparing bound versus unbound dynamics (i.e., 417, 484, and 501), we also saw significant differences in the viral bound atomic fluctuations (i.e., impacts of mutations themselves) at positions 452 (KL ¼ À0.206, D ¼ 0.155, p < 0.001) and 478 (KL ¼ À0.129, D ¼ 0.190, p < 0.001) of RBD (Fig. 4 B). Investigation with pseudoviruses possessing RBD mutations harbored by VOC demonstrated that the plasmaneutralizing activity of vaccinated individuals showed one-to threefold significant decreases against E484K, N501Y, or the K417N þ E484K þ N501Y triple mutant (59). These results confirm what we have observed using our comparative MD analysis (Figs. 1 A and 3, A-D) and extend our method to the investigation of biophysical impacts on function the individual VOC mutations themselves (Fig. 4, A and B). Furthermore, evidence from clinical trials on the impact of VOC on vaccine efficacy confirms what we have observed as well. For example, the ChAdOx1 nCoV-19 vaccine and the single-dose JNJ-78436735 (Johnson & Johnson/ Janssen) vaccine have reduced vaccine efficacy by 10.4 and 57%, respectively, against the B.1.351 variant, which contains K417N, E484K, and N501Y mutations (60)(61)(62). Therefore, when evaluating vaccine efficacy and the effect of neutralizing antibodies against the SARS-CoV-2 virus, the focus should be given to amino acid positions that are prone to mutate and escape the antibody, as these positions might cause the emergence of new VOCs.
Finally, we also modeled the most recent VOC, Omicron. The Omicron RBD is known to harbor 15 different mutations, with several mutations linked to greater transmissibility, lower vaccine efficiency, and increased risk of reinfection (63). To further understand the evolution of the variant, we ran comparative MD simulations with the Omicron variant bound and unbound to mAb COVOX-222. In the wild-type RBD we see a dampening of atomic fluctuation at residues that correspond very well to the antibody footprint (Fig. 5 A and B). Interestingly, in the Omicron RBD, we see some increase in the atomic fluctuation dampening at several of the residues that correspond to the COVOX-222 epitopes (Fig. 5). Upon closer examination, the residues that overlap between the antibody footprint and the Omicron mutations are sites where the hACE2 is known to interact with the RBD. As a result, the increase in atomic fluctuation dampening might not be because of the direct evolution of stronger binding to the antibody, but also due to the evolution of increased binding affinity to hACE2. Other studies have shown that the mutations in the Omicron variant increase the number of salt bridges and hydrophobic interactions between RBD and hACE2, resulting in a higher binding efficiency to hACE2 (64). In addition, it has also been shown that the structural changes in the RBD domain, caused by the mutations, reduce the antibody interactions (65,66). We present a similar analysis of the functional binding of the Omicron variant with S2H97 (Fig. S2), which also indicates minimal overall effects of the genetic changes on potential antibody neutralization, with the exception of E484A, T478K, and S477N sites, in which Omicron appears to have lost interaction with S2H97.
To provide comparison of our results with more traditional structure-based methods of contact mapping and counting heavy atom contacts within the structure of the binding interface, we conducted native and nonnative heavy atom contact analysis using AmberTools20. We report moderate and significant correlations between the degree of dampened atom motion (KL divergence) and the number of static heavy atom contacts, indicating roughly 25% association (Table S1 and Fig. S3 B). We do not find that counting the fraction of static contacts over time of 20 ns adds any more information, probably not surprising because the bound protein structures are well equilibrated and do not largely change in structural conformation over time (Fig. S3 B). However, we do find that both the number of sites identified, as well as the slopes of the linear models describing the relation between heavy atom counts and KL divergence are quite dependent upon the distance cutoff chosen for counting (Fig. S3, B, C, and D, respectively) While some sites identified by both methods are quite congruent (Fig. S3, C and D) when a good cutoff distance is chosen, we conclude that our analyses of short-term dynamics is better than contact mapping in its ability to fully identify the larger landscape of sites that may be prone to vaccine escape. Our method is also not prone to potential bias imparted by the user's arbitrary choice of cutoff distances in defining heavy atom contacts. Finally, we also demonstrate that our results are quite reproducible over identically initiated sets of replicate MD runs (Fig. S4). In summary, we observe that introduction of mutations sites in the Omicron RBD have only slightly altered the COVOX-222 and S2H97 interactions with the viral RBD. Most importantly, our computational method appears to be an important, reproducible, and effective way to prescreen, quantify and monitor these changes without much additional effort in the lab.

DISCUSSION
Here, we report a remarkably fine agreement between experimental approaches to identify vaccine escape mutations and a relatively simple comparative computation applied to MD simulations conducted on the primary functional structural states of antibody-antigen interfaces. Furthermore, we demonstrate the capacity of comparative MD to describe the overall trajectory of evolution of the SARS-CoV-2/ACE2 binding interface throughout the ongoing pandemic. While some of what we report might be able to be deduced from structural data alone, our additional comparative analysis of the dynamic states induced by the structure-function relationship adds a clear and quantitatively grounded level of resolution allowing for the identification of single sites (i.e., hotspots) contributing to the epitope binding of antibodies. Directed selection experiments, followed by confirmation through neutralization assays, are the classical approach for identifying the location of these hotspot residues. Not only are the selection experiments time consuming, unless they are conducted in a sophisticated high-throughput manner, they only identify one of many potential mutations that may escape an antibody (67). Compared with the classical method, our simple computational approach can identify the locations of sites prone to vaccine escape in a matter of days on modern GPU hardware (e.g., 30 h on an Nvidia RTX 3080Ti). In addition, our method identifies positions of all amino acids that could mutate to escape the antibody in a single in silico experiment. And, compared with other computational tools, our MD-based approach does not require large and potentially biased ML training data sets. Furthermore, the de novo prediction is based on a given experimental structure, which enables an unprecedented close synergy between our computational approach and existing laboratory methods for identifying potential routes of viral evolution leading to enhanced transmissibility and vaccine escape.
Over the years, many sequence-and structure-based computational methods have been developed for the purpose of deducing protein sites contributing to PPI specificity. These have ranged from coevolutionary rate analysis (12), simple contact mapping combined with a variety of contact scoring methods (13,68,69), and, more recently, graph network and ML approaches aimed at solving the "contact scoring challenge" (15,16) originally raised by the work of Bogan and Thorn (11). In addition, many tools that have been more specifically designed for epitope prediction also include sequence-based methods, ML methods, and structurebased methods. The epitope surface accessibility to antibody binding is generally used by the sequencebased methods (70,71). The availability of an antigen sequence is crucial for the sequence-based method; however, the predicted epitope residues are not grouped into the corresponding epitopes (72). The ML-based epitope prediction methods include several steps: a collection of data sets with clean and comprehensive data, extraction of antigen features of the sequence (e.g., physicochemical properties, evolutionary information, amino acid composition), and training the model using ML algorithms (73). Some of the commonly used ML tools for epitope prediction are ABCPred (uses artificial neural network), COBEpro (uses support vector machine), EPSVR (uses support vector regression method), and BepiPred (based on random forest algorithm) (70,71,(74)(75)(76). Finally, structure-based methods identify epitopes by antigen structure and epitope-related propensity scales, including specific physicochemical properties and geometric attributes (71,77). As the evolution of antibody-antigen interactions seems largely driven by single-site mutations with large functional antibody-antigen binding effects that would impart obvious biophysical impacts on MD, we offer a simple and effective alternative computational method to study vaccine escape in silico using readily available MD simulation software. We have combined this with a classic and easily interpretable approach to the statistical comparison of distributions of local atom fluctuations on virus proteins in the presence/absence of antibodies. While our comparative dynamics method may be computationally heavy compared with many of these, it is not burdened by potential bias imparted by the data sets used to develop the contact scoring method or to train ML algorithms. However, it is subject to the main caveats of all MD studies, mainly that the force fields employed are accurate for the systems being modeled, and that the dynamics of the systems are appropriately stabilized and representatively sampled.
In summary, we have computationally identified hotspot residues known to mutate in a viral antigen in the presence of a neutralizing antibody. We first validated our approach using the influenza spike protein and three well-characterized antibodies against the influenza HA (13). We then implemented our approach to identify sites in the RBD of SARS-CoV-2 that are known to mutate in the presence of neutralizing mAbs S2H97 and COVOX-222 (42,43). We further identified that residues known to mutate in the presence of the antibody overlap with the mutations seen in the VOC. Finally, we also identified sites in the Omicron variant mutations that enhance binding efficiency to hACE2. While the method described here is not a complete substitute for laboratory-based methods and takes longer to conduct than simple analyses of antibody-bound viral protein structure, we believe it can greatly complement these methods by allowing time and cost saving through the computational prescreening of the underlying biophysics that may drive outcomes of many potential lab experiments. Determining which viral mutations escape from antibodies will be crucial for designing future therapeutics and vaccines and assessing future antigenic implications of ongoing viral evolution.

AUTHOR CONTRIBUTIONS
M.R. wrote the manuscript, designed and performed the experiments, and analyzed the data. G.A.B. developed the code, revised and edited the manuscript, designed the experiments, and analyzed the data. M.C.F. edited the manuscript and analyzed the data.

DECLARATION OF INTERESTS
The authors declare no competing interests.