Persistent cross-species SARS-CoV-2 variant infectivity predicted via comparative molecular dynamics simulation

Widespread human transmission of SARS-CoV-2 highlights the substantial public health, economic and societal consequences of virus spillover from wildlife and also presents a repeated risk of reverse spillovers back to naive wildlife populations. We employ comparative statistical analyses of a large set of short-term molecular dynamic (MD) simulations to investigate the potential human-to-bat (genus Rhinolophus) cross-species infectivity allowed by the binding of SARS-CoV-2 receptor-binding domain (RBD) to angiotensin-converting enzyme 2 (ACE2) across the bat progenitor strain and emerging human strain variants of concern (VOC). We statistically compare the dampening of atom motion across protein sites upon the formation of the RBD/ACE2 binding interface using various bat versus human target receptors (i.e. bACE2 and hACE2). We report that while the bat progenitor viral strain RaTG13 shows some pre-adaption binding to hACE2, it also exhibits stronger affinity to bACE2. While early emergent human strains and later VOCs exhibit robust binding to both hACE2 and bACE2, the delta and omicron variants exhibit evolutionary adaption of binding to hACE2. However, we conclude there is a still significant risk of mammalian cross-species infectivity of human VOCs during upcoming waves of infection as COVID-19 transitions from a pandemic to endemic status.

Widespread human transmission of SARS-CoV-2 highlights the substantial public health, economic and societal consequences of virus spillover from wildlife and also presents a repeated risk of reverse spillovers back to naive wildlife populations. We employ comparative statistical analyses of a large set of shortterm molecular dynamic (MD) simulations to investigate the potential human-to-bat (genus Rhinolophus) cross-species infectivity allowed by the binding of SARS-CoV-2 receptorbinding domain (RBD) to angiotensin-converting enzyme 2 (ACE2) across the bat progenitor strain and emerging human strain variants of concern (VOC). We statistically compare the dampening of atom motion across protein sites upon the formation of the RBD/ACE2 binding interface using various bat versus human target receptors (i.e. bACE2 and hACE2). We report that while the bat progenitor viral strain RaTG13 shows some pre-adaption binding to hACE2, it also exhibits stronger affinity to bACE2. While early emergent human strains and later VOCs exhibit robust binding to both hACE2 and bACE2, the delta and omicron variants exhibit evolutionary adaption of binding to hACE2. However, we conclude there is a still significant risk of mammalian crossspecies infectivity of human VOCs during upcoming waves of infection as COVID-19 transitions from a pandemic to endemic status. aegyptiacus) resulted in transient subclinical infection with oral and faecal shedding [29]. Given that the probable sources of SARS-CoV-2 or its progenitor are bat species, the potential risk of reverse zoonotic transmission from humans to bats and the subsequent negative impacts have to be recognized and studied. Furthermore, a significant concern in such secondary spillover events is the subsequent evolution of mutant strains leading to increased transmissibility and/or mortality in humans, reduced sensitivity to neutralizing antibodies and reduced vaccine efficacy.
A very crucial and unresolved key question of concern regarding the continued evolution of this pandemic is whether and how long the human VOC's remain capable of reverse spillovers into other species of mammals as they adapt their binding to more specifically target human ACE2. We have recently introduced new statistical applications for comparing the divergence of short-term rapid MD of proteins in functionally relevant molecular binding states (i.e. comparing the divergence of atom fluctuation between bound versus unbound protein states) [30][31][32]. We have applied this computational method to study of the evolution of emergent and endemic viral strains related to SARS-CoV-2 [33] and to the study of the evolution of antibody-binding escape mutations as well [34]. Here, we use this same comparative molecular dynamics-based approach to study individual amino acid sites involved in the binding of the various strains of the SARS-CoV-2 viral receptor-binding domain (RBD) to both the human and various Rhinolophus bat ACE2 orthologs (hACE2 and bACE2 resp.). We present multiple-test corrected site-wise statistical comparisons of the SARS-CoV-2 RBD binding signatures of all currently reported VOCs in the presence of both hACE2 and various bACE2, identifying sites that have led to significantly increased hACE2 binding of SARS-CoV-2 related viral strains as they evolved during the course of the pandemic. We report that while some specific adaptations to hACE2 have emerged in the Delta and Omicron VOC, there is still a persistent risk of reverse spillover to bats and likely many other mammals even several years after the start of the pandemic.

PDB structure and model preparation
Structures of the RBD of spike proteins, hACE2, Big-Eared horseshoe bat Rhinolophus macrotis, bACE2, and ACE2 from other different bat species (Rhinolophus sinicus [Chinese rufous horseshoe bat], Rhinolophus affinis [Intermediate horseshoe bat] and Rhinolophus ferrumequinum [Greater horseshoe bat]) were either obtained from the Protein Data Bank (PDB) or modelled using Alphafold2. The summary of the structures used for MD simulations is listed in table 1. Upon downloading the structures from PDB, any crystallographic reflections and other small molecules used in crystallization were removed. During the cleanup of the PDB, any glycans were removed and then later rebuilt using glycoprotein builder [35] so that the PDB file structure regarding atom types was compatible with the Amber 20 preprocessing software tLeAP [36]. See details below. When preparing the structures, we needed each of the variants (RaTG13, Wuhan WT, Alpha, Beta, Delta, Kappa, Epsilon, Omicron BA.1, Omicron BA.2 and Omicron BA.4/BA.5) bound to hACE2 and bACE2. We were able to find structures in the PDB where the RaTG13 variant was bound to bACE2 and the human variants bound to hACE2. Therefore, to model the RaTG13 variant bound to hACE2 and the human variants bound to bACE2, we used UCSF Chimera's MatchMaker superposition tool to properly place the receptor belonging to the opposite species [37]. Using pdb4amber (AmberTools20), hydrogen atoms were added, and crystallographic waters were removed [36]. Any missing loop structures in the files were inferred via homology modelling using the 'refine loop' command to modeller within UCSF chimera [38,39]. The structure of Omicron BA.2 RBD was not available on PDB, and therefore Alphafold2 was used to create a three-dimensional structure, details of which are given below. Similarly, Alphafold2 was used to model ACE2 of R. affinis, R. macrotis and R. ferrumquinum.

Molecular dynamic simulation protocols
MD simulation protocol was followed as previously described, with slight modifications [30][31][32]41]. Briefly, for each MD comparison, large replicate sets of accelerated MD simulations were prepared and then conducted using the particle mesh Ewald method implemented on the graphical processor unit (GPU) hardware by pmemd.cuda (Amber20) [42][43][44]. The MD simulations were either performed on a Linux mint 19 operating system (two Nvidia RTX 2080 Ti or two Nvidia RTX 3080 Ti) or on high performance computing cluster (Nvidia A100). All comparative MD analysis via our DROIDS pipeline was based upon 100 replicate sets of 1 nanosecond accelerated MD run (i.e. 100 × 1 ns MD run in each comparative state, e.g. RBD bound to the receptor, unbound RBD). Explicitly solvated protein systems were first prepared using teLeap (AmberTools 20) using the ff14SB protein forcefield, in conjunction with the GLYCAM_06j-1 forcefield [40,45]. Solvation was generated using the Tip3p water model in a 12 nm octahedral water box [46]. Automated charge neutralization was also done with teLeap software with Na + and Cl − ions. Each replicate set was preceded by energy minimization, 300 picoseconds of heating to 300 K, a 10 ns of equilibration, followed by random equilibration intervals for each replicate ranging from 0 to 0.5 nanoseconds. All simulations were regulated using the Anderson thermostat at 300 K and one atmospheric pressure [47]. Root mean square atom fluctuations were calculated in CPPTRAJ using the atomicfluct command [48].

Comparative protein dynamic analyses with DROIDS 4.0 and statistical analyses
Comparative signatures of dampened atom fluctuation during RBD binding to ACE2 were presented as protein site-wise divergence in atom fluctuation in the ACE2 bound versus unbound states for each RBD. Divergences were calculated using the signed symmetric Kullback-Leibler (KL) divergence calculation in DROIDS 4.0. Significance tests and p-values for these site-wise differences were calculated in DROIDS 4.0 using two-sample Kolmogorov-Smirnov tests with the Benjamini-Hochberg multiple test correction in DROIDS 4.0. The mathematical details of DROIDS 4.0 site-wise comparative protein dynamics analysis were published previously by our group and can be found here [30][31][32]. This code is available at our GitHub web landing: https://gbabbitt.github.io/DROIDS-4.0-comparative-proteindynamics/, which is also available at our GitHub repository https://github.com/gbabbitt/DROIDS-4. 0-comparative-protein-dynamics. The supporting data generated by this code for this manuscript are in repository at Zenodo/CERN https://zenodo.org/record/6477772#.YmMX09rMI2w with digital identifier https://doi.org/10.5281/zenodo.6477772 [49].

Alphafold2 three-dimensional prediction
We used Alphafold2 to create predicted protein structures of Omicron BA.2 RBD (Accession # UJE45220.1), R. affinis ACE2 (Accession # QMQ39227.1), R. sinicus ACE2 (Accession # AGZ48803.1) and R. ferrumequinum ACE2 (Acession # BAH02663.1). AlphaFold2 is a neural network-based deep learning model which first searches for homologous sequences with existing structures to use as a scaffold on which to place the new sequence [50]. The AlphaFold2-based prediction was run with the 'single sequence' mode using the predicted TM-score (PTM) method. We also specified that the algorithm should run an Amber relaxation procedure (i.e. energy minimization) to repair any structural violations in the predicted model [51].

Stronger binding of RatG13 RBD to bACE2 than to hACE2
We performed MD simulations of RaTG13 RBD bound and unbound to R. macrotis bACE2 and RaTG13 RBD bound and unbound to hACE2. To compare atomic fluctuations between RatG13 RBD bound and unbound structures, we used site-wise KL divergence along with multiple test corrected two-sample KS tests. The more negative the KL divergence value of a specific amino acid residue, the stronger the dampening of atomic fluctuations due to the RBD interactions with bACE2/hACE2. As one would expect, the bat RaTG13 RBD has a better binding with stronger amino acid-specific interactions with the bACE2 (figure 1a,c). However, in the case of hACE2, dampening of atomic fluctuations is lesser at those specific sites due to ACE2 being of human origin (figure 1a,d). Interestingly, the amino acid residues involved with the interactions of both bACE2 and hACE2 are very similar. This was observed in the normalized KL divergence graph, where the normalized KL divergence values per amino acid are similar between bACE2 and hACE2 (figure 2a). Other studies have found that 26 residues of the bACE2 and nine residues of the RaTG13 RBD are present at the interface. These residues create 12 H-bonds, two salt bridges and 157 non-bonded contacts [52]. The residues of RatG13 RBD that are strongly dampened by bACE2 include K417, L455, F456, S477, N487 and D501 (figure 1a). The weaker dampening of atomic fluctuations of RaTG13 RBD and hACE2 is primarily due to the lesser number of interactions between the RBD and hACE2. Compared to bACE2, hACE2 only makes 113 non-bonded contacts and 9 H-bonds [53]. Lastly, MD comparison of RaTG13 bound to bACE2 and RaTG13 bound to hACE2 show that almost half of the amino acids in the RBD of RaTG13 behave statistically differently (figure 1b). When comparing RaTG13 RBD bound to hACE2 royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 220600 royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 220600 with RaTG13 RBD bound to bACE2, the significance tests are conducted site-wise. Therefore, a separate test was conducted at each given amino acid site to compare the significant difference in fluctuations of the backbone atoms. Thus, a separate D value from a two-sample KS test is obtained for each amino acid site, and a multiple test correction (Benjamini-Hochberg) was applied to adjust the p-value to account for Wuhan-Hu-1 Lastly, to quantify the interaction of the different variants with bACE2 and hACE2, we also calculated the area under the curve (AUC) of the non-normalized KL divergence values, true atomic fluctuation dampening due to the RBD interaction with ACE2. As expected, the bat progenitor strain, RaTG13, has a higher AUC with bACE2 than hACE2. Since the Wuhuna-Hu-1 RBD is the first variant to make the transmission from bats to humans, the AUC profile is similar for both bACE2 and hACE2 ( figure 3a). Interestingly, all VOC (Alpha, Delta, Omicron BA.1, Omicron BA.2, Omicron BA.4/BA.5) have a higher AUC for the hACE2 than bACE2. When a paired t-test was performed to compare the AUC values of the VOC bound to bACE2 and hACE2, we saw a significantly higher AUC profile of the VOC with hACE2 ( figure 3a,b). And lastly, there is no clear trend for the AUC profile of the VBM. This is also clearly seen with no significant difference in AUC values of VBM between bACE2 and hACE2 (figure 3a,c).

Stronger binding of Delta RBD to hACE2 than to bACE2
The Delta variant (B.1.617.2) was first identified in India in October 2020. The two mutations that make up the RBD of the Delta variant include L452R and T478K. The site-wise KL divergence interaction of the Delta RBD between bACE2 and hACE2 denotes that hACE2 certainly has better binding dynamics than the bACE2 (figure 4a). KS test between the Delta RBD bound to bACE2 and Delta RBD bound to hACE2 royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 220600 shows very few amino acids with statistically different atomic fluctuations (35%). About 65% of the amino acid of Delta RBD behave very similarly when either bound to bACE2 or hACE2. At the sites corresponding to the Delta variant mutation (L452R and T478K), we see a high level of significance in the difference in atomic fluctuation between the bACE2 and hACE2 (figure 4b). As mentioned previously, when comparing Delta RBD bound to hACE2 with Delta RBD bound to bACE2, significance tests were conducted in a site-wise manner, with a D value from a two-sample KS test calculated for each amino acid, and Benjamini-Hochberg multiple test correction was applied to adjust the p-values. The test correction was done to account for the multiple significance tests. Lastly, compared to bACE2, the binding of the hACE2 also has a dampening effect on the residues that are farther away from the RBD/ACE2 interface ( figure 4c,d).

ACE2 protein of different bat species display similar binding dynamic with different RBDs
In addition to ACE2 from R. macrotis, we also analysed the binding dynamics of ACE2 from other bat species with RBDs from RatG13, Alpha and Omicron BA.4/BA.5 variants. The different bat species includes R. macrotis, R. affinis, R. ferrumequinum and R. sinicus (figure 7c). In summary ACE2 from R. macrotis, R. affinis, R. ferrumequinum and R. sinicus were simulated with RBD from RatG13, Alpha and Omicron BA.4/BA.5 variants. The atomic fluctuations of the SARS-CoV-2 variants bound to the four different bat ACE2s are very identical (electronic supplementary material, figure S4). As mentioned previously, to quantify the interactions between RBD and ACE2, we also calculated the AUC of the KL divergence values. The AUC profile is very similar for the different MD simulations we performed (figure 7a). When the four different bat species were compared group wise we also see no statistical difference in the AUC values, indicating that the atomic fluctuations of the different RBDs are identical across different bat ACE2s (figure 7b).

Discussion/conclusion
Using our comparative MD simulation pipeline, we compared the binding profile of bACE2 and hACE2 with the RBD domains of different SARS-CoV-2 variants (RaTG13, Wuhan-Hu-1, Alpha, Beta, Delta, Kappa, Epsilon, Omicron BA.1 and Omicron BA.2). MD simulations of RatG13 spike protein revealed a better binding profile and stronger dampening of amino acids on the interface of RBD and bACE2 compared to hACE2. In the case of the RBD from Wuhan-Hu-1, the original 2019 SARS-CoV-2 virus, we see no difference in the binding profile between bACE2 and hACE2. Lastly, we also observed two different profiles with the VBM and VOC. In the case of VBM (Beta, Kappa and Epsilon variants), we see a similar binding profile and atomic fluctuation dampening in the RBD/bACE2 and RBD/hACE2 interfaces. On the other hand, VOCs (Alpha, Delta and Omicron variants) show preferable binding to hACE2 than to bACE2, indicating higher AUC values when RBD is bound to hACE2 than RBD bound to bACE2. The SARS-CoV-2 virus was first reported from pneumonia patients of Wuhan city in China's Hubei province. The spillover of SARS-CoV-2 from animals to humans occurred at the beginning of December 2019, when some of the pneumonia patients were involved in the wet animal market in the Hunnan district [55]. Genomic sequences, homology of ACE2 receptor and single intact ORF on gene 8 of the virus indicate bats as the natural reservoir of these viruses. However, an unknown animal is yet to be identified as an intermediate host [56,57]. It should be noted that even though the initial spread of the disease was due to a spillover even, the rapid spread of the disease was primarily due to human-tohuman transmission [55]. royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 9: 220600

13
The receptor usage by the coronavirus has been well known to be a significant determinant of host range, tissue tropism and pathogenesis. Therefore, it is reasonable to assume that SARS-CoV-2 can infect humans, bats and other species. As a matter of fact, several in vivo infection and seroconversion studies have confirmed that SARS-CoV-2 can infect rhesus monkeys, feline, ferret and canines [58,59]. Our MD simulation has shown that RaTG13 RBD can bind to both bACE2 and hACE2, however to bACE2 with stronger binding, shown by severe dampening of atomic fluctuations in the bACE2/RBD interface (figure 1a). Even though hACE2 doesn't severely dampen the atomic fluctuations, the KL divergence profile is very similar to that of bACE2, indicating the flexibility of the virus to jump hosts. Additionally, it has also been shown that the RatG13 RBD Y493 is speculated to confer a potential steric clash to hACE2 [60].
As mentioned previously, for zoonotic spillover events to occur, humans must be exposed to the viruses. This can occur through direct contact with viruses excreted from infected bats or bridge hosts or through other contacts with infected animals such as slaughtering or butchering. The nature and intensity of the bat-human interface are critical to determining spillover risk. Human behaviour is a primary determinant of exposure, which may increase contact with bats or with other animals (bridge hosts) that may expose susceptible humans. Little is known about the specific conditions of coronavirus spillovers. Still, human behaviours that may increase viral exposure include activities such as bat hunting and consumption, guano farming and wildlife trading [52][53][54][61][62][63]. Due to recent spillover events, we see an identical binding profile with both bACE2 and hACE2 in the case of Wuhan-Hu-1 RBD (figure 2b). In the case of Wuhan-Hu-1 RBD, our MD simulation shows identical dampening of the atomic fluctuations and that the amino acid backbone in the RBD interacts with both bACE2 and hACE2 identically.  Regardless, in the case of human SARS-CoV-2 strains, we see a slightly different trend. In the case of VOC, including Alpha, Delta and Omicron variants, we see a slightly higher binding profile with hACE2 than bACE2 (figures 3a,b and 4-6). However, unlike RaTG13 RBD, we do not see very strong differences in the atomic fluctuations in the bACE2/hACE2 interface with the VOC RBDs. We don't see any significant differences in the binding profile with the VBM, including Beta, Epsilon and Kappa variants. At some areas of the RBD, the atomic fluctuations of the VBM are very similar between bACE2 and hACE2. The ability of the recent VOC and VBM to bind to both hACE2 and bACE2 with only slight differences supports the reverse zoonosis theory. The transmission of SARS-CoV-2 from humans to numerous animals and conducted in vitro infection experiments make it clear that the virus can infect and be transmitted between a wide range of distantly related mammal species. For example, case reports on cats (Felis catus) living in the same household with COVID-19 patients in Europe, Asia, North America and South America revealed that these animals could be infected with SARS-CoV-2, showing clinical manifestations ranging from asymptomatic to severe respiratory illness [55,56,64,65]. The reports show that 14% of tested cats in Hong Kong were SARS-CoV-2 positive by RT-PCR [66].
Furthermore, the seroprevalence screening performed among pets living in SARS-CoV-2-positive households in Italy demonstrated that 3.3% of dogs and 5.8% of cats were seropositive [67]. The high seroprevalence and SARS-CoV-2 detection rates in cats and, to some extent, in dogs indicate that these animals can be infected with SARS-CoV-2 [68]. Several other zoo animals, like tigers, lions, cougars and gorillas, were found to test positive for the virus. Farmed minks are highly susceptible to SARS-CoV-2 infection, and, in some cases, they have transmitted the virus back to humans. SARS-CoV-2positive minks were detected in 290 fur farms in Denmark, 69 mink fur farms in the Netherlands, 13 of 40 mink farms in Sweden, 23 out of 91 mink farms in Greece, 17 fur farms in the USA, four farms in Lithuania, two farms in Canada and one fur farm each in Italy, Latvia, Poland, France and Spain [69,70]. As a result of the virus being able to infect multiple species and also being able to jump hosts, there are concerns that the introduction and circulation of new virus strains in humans could result in modifications of transmissibility or virulence and decreased treatment and vaccine efficacy.
In conclusion, our MD simulations identified that the original bat progenitor RaTG13 RBD shows preferential binding to its host bACE2 receptor than hACE2. However, some of the recent human variants show differential binding between bACE2 and hACE2. Lastly, the VOC RBD shows slightly higher binding to hACE2 than the bACE2. These findings provide evidence that recent human SARS-CoV-2 variants may re-infect bats and that the extensive species diversity of bats may also have profound effects on SARS-CoV-2 evolution in the future. Given that the phylogenetic distance between bats and humans is comparable to that of most domestic pets and livestock, we also suggest that these mammals, and probably many others, could readily become host reservoirs that further promote the evolution of persistent cross-species infectivty as well. Additionally, our method provides a relatively fast and efficient computational MD-based approach for the functional surveillance of the virus and the viral receptor that can potentially enhance the functional interpretation of current efforts of genomic surveillance of emerging human viral variants of concern.