The fatty acid site is coupled to functional motifs in the SARS-CoV-2 spike protein and modulates spike allosteric behaviour

Graphical abstract


Introduction
The COVID-19 pandemic, which is having a devastating social and economic impact worldwide, is caused by the severe acute respiratory syndrome 2 (SARS-CoV-2) coronavirus. Since the initial outbreak in late 2019, SARS-CoV-2 has caused >261 million confirmed cases of COVID-19 disease and >5.2 million deaths [1] (perhaps as many as 7-13 million [2] worldwide as of 29th November 2021. SARS-CoV-2 is an enveloped, single-stranded RNA virus that belongs to the Betacoronavirus genus of the Coronaviridae family which includes pathogenic human coronaviruses that cause SARS severe acute respiratory syndrome) and MERS (Middle East respiratory syndrome) [3,4]. It initially infects respiratory epithelial cells by binding to the angiotensin-converting 2 enzyme (ACE2) [5,6]. When it first emerged as a human pathogen, SARS-CoV-2 was thought to cause predominantly respiratory disease, particularly pneumonia and severe acute respiratory distress syndrome [7,8]. However, it is now known that its effects are not limited to the respiratory tract: COVID-19 can cause severe inflammation and damage in other organs [9][10][11], including the heart, kidneys, liver and intestines, and can lead to neurological problems [12]. In common with other enveloped viruses, SARS-CoV-2 fuses its viral envelope with a host cell membrane to infect cells. Membrane attachment and fusion with the host cell is mediated by the SARS-CoV-2 spike protein, which primarily binds to the host ACE2 receptor [5,6]; however, the spike can also interact with https://doi.org/10.1016/j.csbj.2021. 12 neuropilin-1 [13,14] and potentially with other receptors [15,16]. The spike is a glycoprotein [17,18] and is found on the surface of the virion, in a trimeric form Fig. 1A).
Each monomer is formed of three regions: a large ectodomain, a transmembrane anchor and a short cytoplasmic tail [18]. The ectodomain comprises two subunits: S1 is responsible for binding to ACE2 [6,18], and S2 for viral-host membrane fusion [18,20]. The SARS-CoV-2 spike contains two proteolytic cleavage sites [18]: one 'furin protease recognition' site at the S1/S2 boundary, thought to activate the protein [21], and a second in the S2 subunit S2 ) that releases the fusion peptide [18,20]. The SARS-CoV-2 spike contains three free fatty acid (FA) binding sites, each located at the interface between every two neighbouring receptor-binding domains ( Fig. 1A and S3) [19]. The FA binding sites are lined by aromatic and hydrophobic residues (Fig. 1B) and a positively charged residue from a neighbouring monomer, namely R408, which acts as an anchor for the FA carboxylate headgroup [19]. The open spike conformation, with at least one RBD pointing upwards, is needed to interact with ACE2 receptors on the human host cell. It was shown by surface plasmon resonance that the presence of the FA linoleic acid (LA) reduces binding of the spike to ACE2 [19]. LA stabilizes the locked spike conformation, in which the receptorbinding motif (RBM) is occluded and cannot bind to the human ACE2 receptor [19], but there is no obvious connection between the FA sites and other structural motifs relevant for membrane fusion, or with antigenic epitopes. MD simulations showed persistent and stable interactions between LA and the spike trimer [19,22]. These simulations also revealed that LA rigidifies the FA binding site, and these effects extend to the N-terminus domain [22]. The cryo-EM structure of the spike from pangolin coronavirus (which is closely related to SARS-CoV-2) shows that the spike also binds LA in an equivalent FA pocket [23]. An equivalent FA binding site has also been found on the Novavax SARS-CoV-2 construct expressed and purified from insect cells [24].
Simulations, particularly atomistic molecular dynamics (MD) simulations, have provided crucial atomic-level insight into the structure, dynamics and interactions of the SARS-CoV-2 spike [13,19,22,[25][26][27][28][29][30][31][32][33]. Here, we apply dynamical-nonequilibrium MD simulations [34][35][36][37] to investigate the response of the SARS-CoV-2 spike to LA removal. We have shown this approach to be effective in identifying structural communication pathways in a variety of proteins, e.g. in identifying a general mechanism of interdomain signal propagation in nicotinic acetylcholine receptors [38,39] and mapping the networks connecting the allosteric and catalytic sites in two clinically relevant b-lactamase enzymes [40]. This approach is based on equilibrium simulations of the system in question, which generate configurations for multiple dynamicalnonequilibrium simulations, through which the effect of a perturbation can be studied. Running a large number of nonequilibrium simulations allows for the determination of the statistical significance of the structural response observed [37].

Dynamic response of the wild-type spike
A model of the locked wild-type spike was created from the cryo-EM structure (PDB code: 6ZB5) of the SARS-CoV-2 spike protein bound to three linoleate molecules [19]. Missing loops were built to generate the wild-type sequence according to the Uniprot accession number P0DTC2 for the unglycosylated ectodomain of the spike bound with LA (for details, see Supplementary Material). The locked structure had 42 disulphides per trimer. It remained intact and faithfully retained the structure and overall fold of the cryo-EM structure over the equilibrium simulation time [19,22]. Three equilibrium MD simulations (Fig. S1), 200 ns each, were per- Fig. 1. Cryo-EM structure of the ectodomain of the SARS-CoV-2 spike trimer with linoleic acid (LA) bound to the fatty acid-binding sites [19]. (A) Three-dimensional structure of the complex of the locked (in which all receptor-binding motifs (RBMs) are occluded) ectodomain of the SARS-CoV-2 spike trimer with linoleic acid (PDB code: 6ZB5) [19]. The spike protein is a homotrimer [18]: each monomer is shown in a different colour, namely green, orange and blue. LA molecules are highlighted with spheres. Each fatty acid (FA) binding site is located at the interface between two neighbouring monomers, and is formed by residues from two adjacent receptor-binding domains. (B) Detailed view of the FA binding site: this pocket is lined by hydrophobic and aromatic residues, and the LA acidic headgroup is close to R408 and Q409. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) formed for the locked form of the unglycosylated and uncleaved (no cleavage at the S1/S2 interface) ectodomain of the spike bound with LA and used as starting points for 90 dynamicalnonequilibrium simulations (Fig. S2). Here, we used models of the uncleaved spike ectodomains in order to detect any potential effects on structurally distant sites influenced by ligand in the FA sites in the intact spike. In the nonequilibrium simulations, all LA molecules were (instantaneously) annihilated. This triggers a response of the protein, as it adapts to LA removal. This annihilation is carried out for multiple configurations sampled from equilibrium MD (top panel in Figure S2), and comparison between the equilibrium and short dynamical-nonequilibrium MD trajectories identifies the structural response of the protein. Running multiple (in this case, 90) dynamical-nonequilibrium simulations reduces the noise associated with the structural response of the protein and allows for the determination of the statistical significance of the observed response. Nonequilibrium simulations of this type are emerging as an effective tool to study signal transmission and identify communication networks within proteins [38][39][40][41][42].
Here, the direct comparison between the equilibrium LA-bound and nonequilibrium apo spike simulations using the Kubo-Onsager approach [34][35][36][37] (bottom panel in Figure S2), and the average of the results over all the 90 replicates, allows for identification of the temporal sequence of conformational changes associated with the response of the spike to LA removal (Figs. 2 and S4), and also the determination of their statistical significance (Figure S5). The structure that we simulate here corresponds to the unglycosylated wild-type spike (Uniprot accession number P0DTC2), not cleaved at the 'furin recognition/cleavage' site. It was built based on the cryo-EM structure that originally revealed the FA binding site (PDB code: 6ZB5) [19]. Although a few glycans (e.g. at positions N165, N234, and N343) have been shown to be involved in the spike infection mechanism by altering the dynamics of receptor binding domain opening [25,43], and the glycan shield plays a vital role in the biological function of the spike, the internal networks and response of the protein scaffold identified here are not likely to be qualitatively altered by the glycans, which predominantly cover the exterior of the spike. The spike structure used for these simulations, namely the tightly packed closed structure [19], is only glycosylated on the outside, and as such, glycans are unlikely to affect the protein's internal allosteric communication networks. As Casalino et al. have shown, glycan dynamics are fast relative to the dynamics of the protein [25]. Note that the perturbation introduced here (LA annihilation) is not intended to mimic the physical process of LA (un)binding, but rather to promote a rapid response and force signal transmission within the protein, thus mapping the mechanical and dynamic coupling between the structural elements involved in this response. Note also that, due to the non-physical nature of the perturbation, the timescales observed for the protein's response do not represent the physical timescales of conformational change, however, the responses of similar systems (e.g. wild-type and D614G spike) can be meaningfully compared.
Despite small variations in amplitude, the structural response to LA removal is similar for the three monomers (Figs. 2 and S4). Figs. 2 and S4 show the time evolution of the average C apositional deviation of each individual monomer in the 5 ns following LA. Comparing these figures shows that all the monomers respond similarly, with the same motifs and order of events associated with signal propagation observed for each.
The simulations show that the structural response to LA removal occurs in very specific and well-defined regions of the spike. It is striking that some functional motifs, including regions distant from the FA site, are particularly affected by LA removal (Figs. 2 and S4). Some of these conformational changes occur in solvent-exposed, flexible regions of the protein (Figs. S6-S8). This might initially be surprising, but it should be noted that flexibility means that little energy is required to induce the conformational rearrangement. This is one reason why loops often play essential roles in protein function and allostery (e.g. [44]), and are frequently involved in signal propagation in proteins (e.g. [38][39][40][41]). Upon LA removal, the response within the spike starts in the FA binding pocket region, and it rapidly propagates through the receptorbinding domain (RBD) to the N-terminal domain (NTD), furin cleavage/recognition site and residues surrounding the fusion peptide (FP). All of the monomers are closely intertwined, and therefore signal propagation does not occur simply within an individual monomer, but rather involves a complex network of conformational changes spanning all three chains (Figs. S6-S8). For example, LA removal from the FA pocket formed by monomers A and C induces structural responses in the NTD of monomer B (NTD B ), the RBD of monomer C (RBD C ) and the furin cleavage/ recognition site of monomer A, as shown by the propagation of the signal from the first site to these regions (Figs. S6-S8).
Upon LA removal, the hydrophobic and aromatic residues lining the FA site reorient their side chains towards the inside of the FA site (Fig. S9), as part of the contraction and collapse of this site (Figs. S9-S10). Furthermore, in the absence of the partnering Fig. 2. Average Ca-positional deviation for the first monomer in the five nanoseconds after LA removal from the FA sites in the SARS-CoV-2 spike. The structural deviations were calculated using the Kubo-Onsager approach [34][35][36][37] for the pairwise comparison between the equilibrium LA-bound and dynamical-nonequilibrium apo spike simulations and averaged over all 90 replicates. A similar response to LA removal is observed for the other two monomers ( Figure S4). Some relevant motifs are highlighted in grey, namely N-terminal domain (NTD), receptor-binding domain (RBD), receptor-binding motif (RBM), fusion peptide (FP), fusion-peptide proximal region (FPPR), heptad repeat 1 (HR1), central helix (CH) and connector domain (CD). carboxylate headgroup of LA, R408 establishes alternative hydrogen bond interactions with nearby polar residues, such as Q404 and S375 (Fig. S11).
FA sites modulate key motifs for membrane fusion or antigenic epitopes in the spike The RBMs respond rapidly to LA removal ( Fig. 3 and S12-S13), due to their close proximity to the FA sites. From the time evolution of the protein response, signal transmission from the FA sites to the RBMs is apparently mediated by S366-A372 and R454-K458 (Fig. S14). The structural changes induced by LA removal in S366-A372 (a segment containing some of the residues lining the FA sites) are directly transmitted to R454-K458, and from there to the rest of the RBMs. 0.1 ns after LA removal, significant structural rearrangements are already apparent in the RBMs, mainly in the A475-C488 segment ( Fig. 3 and S12-S13). Subsequently, a gradual increase in deviations is observed for A475-C488. The RBM lies between the b4 and b7 strands of the RBD and contains most of the residues that directly interact with ACE2 [45,46]. This motif is one of the most variable regions of SARS-CoV-2 spike [47] and a major target for neutralising antibodies [48][49][50].
The NTDs also show a fast and significant response to LA removal, in particular, H146-E156 and L249-G257 ( Fig. 4 and S15-S16). The communication pathway connecting the FA sites to the NTDs apparently involve P337-A348 (a segment comprising some of the residues that directly interact with LA), W353-I358 and C166-P174 ( Figure S17). The rearrangements induced by LA removal in the P337-A348 region promptly spread to W353-I358 and from there to the NTDs via the C166-P174 region ( Figure S17). The NTD of the spike is a surface-exposed domain structurally linked to the RBD of a neighbouring monomer [19,45]. Although directly coupled to the RBD, the NTD does not bind to ACE2 [5,6] and its function in SARS-CoV-2 infection remains unclear. The spike NTDs of other related coronaviruses have been suggested to play a role in infection [51][52][53] and are known epitopes for neutralising antibodies [16,53]. Human antibodies targeting the NTD of the SARS-CoV-2 spike have been isolated from convalescent COVID-19 patients e.g. [16,54,55]) and this region was shown to be a super-antigenic site [56]. A cryo-EM structure of the complex between the spike and the 4A8 monoclonal antibody shows that the NTD loops L141-E156 and R246-A260 (two of the regions that show the largest responses to LA removal in Fig. 4 and S15-S16) directly mediate the interaction between the proteins [54]. Both of these loops are candidates for vaccine and therapeutic developments [54]. The conformational changes in the H146-E156 and L249-G257 segments are further transmitted, over the following 5 ns, to other parts of the NTD, namely S71-R78, N122-N125 and F175-F186. The N122-N125 segment is a conserved NxxN sequence motif present in the NTD of spikes from several coronaviruses, and its function remains unknown [57]. The F175-F186 region is located immediately before a recently identified epitope for human antibodies [55]. The S71-R78 segment is part of the GTNGTKR insertion shared by the SARS-CoV-2 and bat-CoV RaTG13 spikes but not the SARS-CoV spike [57]. This motif, which is also found in structural proteins of several other viruses, and proteins from other organisms, has been suggested to allow the SARS-CoV-2 spike to bind to other receptors besides ACE2 [57]. The coupling identified here between the FA site and specific regions of the NTD is remarkable and highlights the complex allosteric connections within the spike, with distant sites apparently able to modulate the response of the NTDs.
Both the furin cleavage/recognition site and V622-L629 region, which are >40 Å away from the FA site, respond notably to the removal of LA (Fig. 5 and S18-S19). Both regions respond rapidly, with a significant conformational response observed almost immediately after LA removal. The furin cleavage/recognition site is located at the boundary between the S1 and S2 subunits [18,19] and furin cleavage is thought to be important for the activation of the spike [21]. This site contains a polybasic PRRA insertion not found in other SARS-CoV-related coronaviruses [58]. Cellbased assays show that deletion of the PRRA motif affects virus infectivity [21,[58][59][60][61][62]. Note that in the simulations presented here, the furin cleavage/recognition site (located between R685 and S686) is not cleaved (see Supplementary Material). From the time evolution of the protein response to LA removal, signal propagation from the FA site to the furin cleavage/recognition site appears to occur via the C525-K537, F318-I326, and L629-Q644 regions (Figure S20). Upon LA removal, the structural changes in the FA site are first propagated to C525-K537 and then sequentially transmitted to F318-I326 and L629-Q644, ultimately reaching V622-L629 and the furin site. The furin cleavage/recognition site and V622-L629 region are among the spike regions most affected by LA removal and show increasingly large deviations (larger than most other loop regions of the protein) over the simulations. The conformational changes in these regions propagate to segments immediately adjacent to the fusion peptide, namely the downstream fusion peptide proximal region (FPPR) and the upstream D808-S813. The FPPR is a $ 25-residue segment located in S2 immediately downstream of the fusion peptide, which has been suggested to play an essential role in the structural transitions between preand post-fusion conformations of the spike [20]. The D808-S813 region is located upstream of the fusion peptide (FP), immediately preceding the S2 0 protease recognition and cleavage site R815) [60]. Both proteolytic sites in the SARS-CoV-2 spike are known epitopes for neutralising antibodies [63,64].
The close connection between the furin cleavage/recognition site, V622-L629, and the regions adjacent to the FP, identified here for the intact, wild-type spike is remarkable. Due to this crosstalk, mutations in or close to the furin cleavage/recognition site or V622-L629 are likely to affect signal transmission to the FPsurrounding regions, i.e. the FPPR and the S2 0 cleavage site. This is worthy of experimental investigation.  S18-S19 for the responses observed for the other two FA sites). Analogous connections between the other two FA sites and the furin cleavage/recognition sites, FPPR and S2 0 cleavage sites are also observed (Figs. S17-S18). For more details, see the legend of Fig. 3.

Simulations of the D614G spike
We also performed equilibrium and dynamical-nonequilibrium simulations of the D614G mutant spike. The D614G mutation is now dominant in SARS-CoV-2 lineages circulating worldwide [65] and confers increased transmissibility. Other variants are emerging as increasing numbers of infections provide further opportunities for mutations to arise. The B.1.1.7 (also known as Alpha) variant, largely responsible for the surge in cases in the UK in the winter of 2020/21, has increased infectivity without the D614G mutation [66,67]. However, three of the four variants, involved in the April 2021 surge of cases in India as well as the B.1.617.2 (Delta) and B.1.1.529 (Omicron) variants do include D614G among the mutations (COVID-19 Genomics UK Consortium, https://www.cogconsortium.uk and https://www.ecdc.europa.eu/). That a single amino acid replacement of the aspartate residue in position 614 by a glycine leads to more efficient viral transduction into host cells and greater infectivity [68][69][70][71] is worthy of mechanistic exploration. Here, three equilibrium 200 ns MD simulations were performed for the locked form of the unglycosylated and uncleaved ectodomain region of the D614G spike modelled from a wild-type model [19,22] based on the cryo-EM structure 6ZB5 [19] with and without LA bound (see Supplementary Material). Note that D614G has the same sequence as the wild-type except for the residue in position 614, which was mutated from an aspartate to a glycine. In the original wild-type spike, D614 is located at the interface between monomers, with its sidechain directly interacting with residues across the subunit interface [61]. Root mean square fluctuation (RMSF) profiles from equilibrium MD for the wild-type and D614G apo spikes are similar ( Figure S21). However, unlike the wild-type with or without LA, a single chain in one replicate of the D614G with LA bound exhibits greater fluctuation in the middle of the RBM corresponding to exposed loop residues Q474-N487 ( Figure S21). The RBM loop flipped orientation coincident with a transient loss of a salt bridge on the stem of this loop between K458 and E471 of the same chain.
The Q474-N487 loop is interesting because, although the RBD in the closed conformation remains inaccessible for binding to ACE2, residues Q474-N487 of the RBM (shown in magenta in the insert in Figure S21F) it includes the epitope Y473-P479 (YQAGSTP) [72], which may still provide a target for neutralizing antibodies in the closed conformation, depending on the degree of glycan shielding [73] (being close to the S349 O-glycosylation site [21]). This behaviour, albeit in a single chain of our equilibrium MD simulations of the unglycosylated, uncleaved wild-type and D614G LA-bound spikes, suggests that the D614G may influence mobility in this important region. There is some evidence that local flexibility and local sequence context can affect the fraction of occupancy of glycosylation sites in different proteins [74]. Spike glycosylation occurs co-translationally and it is also thought that LA binds cotranslationally. LA and the D614G mutation may affect glycosylation [75].
The trans-interface interactions of the carboxylate of D614 in the wild-type involve four potential candidate residues, K854, K835, Q836 and T859. In the equilibrium MD simulations, T859 came within 5 Å of D614 across an interface and made transient, weak interactions throughout the simulations of the apo and LAbound spike systems Figure S22). The carboxylate of D614 and NZ of K854 remains within salt-bridging distance throughout the wild-type simulations ( Figure S23). The D614-K854 trans-subunit interface interaction dominates throughout the simulation time, regardless of the presence of bound LA ( Figure S22). In the 200 ns open wild-type apo spike simulations, the trans-subunit interaction between D614 and K854 persisted for 99% (standard deviation 0.57%) of the frames across eight of the nine subunit interfaces. This contact was lost at only one interface and that was between an open and closed chain in one of the repeats ( Figure S22).
An analogous analysis was performed on the D614G mutant to establish whether K854 makes alternative hydrogen-bond or salt-bridge contacts across the 3 subunit interfaces (averaged over 3 Â 200 ns replicates) in the absence of a partnering D614 carboxylate. In the D614G spike, K854 fails to find any alternative salt-bridge and only occasionally comes within hydrogen-bonding distance of residues Q613 and N317 ( Figure S24). This supports the inference drawn from cryo-EM structures of the head region of the D614G spike that this mutation disrupts the inter-monomer saltbridge and hydrogen bond networks in this region, which may cause reduced stability of the trimer. This corresponds to the observation that the D614G mutant was mostly in an open conformation on the EM grids and suggests that loss of the D614-K854 interaction somehow destabilises the closed conformation (e.g. [76,77]). Dynamical-nonequilibrium simulations of the D614G variant were also performed to test whether the D614G mutation affects the response of the spike to LA. A distribution of conformations taken from the equilibrium simulations of the locked form of the unglycosylated and uncleaved D614G spike with LA bound was used as the starting point for the dynamical-nonequilibrium simulations. The same perturbation as for the wild-type spike was applied to the system, namely LA removal. The Kubo-Onsager approach [34][35][36][37] was again used to extract the response of the system ( Figure S25) and determine the statistical significance of the observed responses ( Figure S26). In the D614G variant, there is notably less symmetry across the monomers in the response of the spike to LA removal, compared to the wild-type ( Fig. 6 and S27-S34). For instance, the amplitude of the structural response of the V266-L629 and furin cleavage/recognition site regions in monomer C ( Figure S33) of the D614G is substantially smaller than in monomers B and A ( Fig. 6 and S34).
The conformational responses of the wild-type and D614G spikes can be directly compared because the same perturbation was used for both in the dynamical-nonequilibrium simulations. The conformational response of the NTDs and RBDs to LA removal is generally similar in the wild-type and D614G spike (Figs. S27-S33) with small variations in the amplitude of the structural rearrangements of some functional motifs, e.g. RBMs. However, the D614G mutation significantly affects inter-monomer communication, with reduction of signal transmission from the furin cleavage/recognition site and V622-L629 of one monomer to the FPPR of another ( Fig. 6 and S33-S34) compared to the wild-type ( Fig. 5 and S18-S19). In the D614G spike, only minor deviations of the FPPR are observed. Furthermore, the region located upstream of the fusion peptide, namely D808-S813, also shows different rates of signal propagation between the wild-type and D614G proteins ( Fig. 6 and S33-S34). The differences identified here may relate to functionally important differences between the wild-type and D614G spikes. The results here show that the D614G mutation alters the allosteric networks connecting the FA site to the regions surrounding the FP, particularly the FPPR. There is reduced communication between the monomers in the D614G spike. As noted above, the response of the D614G spike to LA is also less symmetrical than the wild-type.

Conclusions
Our findings show that changes in ligand occupancy at the FA site influence the dynamic behaviour of functionally important motifs distant from the FA site. The simulations identify a complex network of structural pathways connecting the FA sites to key structural motifs within the SARS-CoV-2 spike. These networks extend far beyond the regions surrounding the FA sites, with structural responses being observed in the RBM, NTD, furin cleavage/ recognition site and FP-adjacent regions (Movie 1). The results also show strong crosstalk between the furin cleavage/recognition site, V622-L629 and the regions adjacent to the FP. Disrupting or alter-ing these communication networks may be a novel strategy for drug development against COVID-19.
Simulations of the D614G spike show that this mutation affects communication between the FA site and the FPPR and the S2 0 cleavage site. The D614G mutant shows reduced response of the FPPR and a slower rate of signal propagation to the S2 0 cleavage site compared to the wild-type protein (Movie 2). These results indicate that the D614G mutation affects the allosteric behaviour and the response to LA of the spike, which may be related to the changes in viral fitness associated with this mutation [78].
The results here further highlight the potential of dynamicalnonequilibrium simulations for identifying pathways of allosteric communication [38][39][40] and suggest that this approach may be useful in analysing mutations and differences in functionally important dynamical behaviour, and possibly different effects of LA, between SARS-CoV-2 spike variants of clinical relevance, such as the Alpha, Beta, Gamma, and Delta, and (now) Omicron.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.