Predicted structural mimicry of spike receptor-binding motifs from highly pathogenic human coronaviruses

Introduction
Viruses have long been known to utilize molecular mimicry of host proteins to interrupt and exploit host biochemical pathways during infection [1,2]. Alongside the need to employ host machinery for the viral replication cycle, the evolution of viral protein motifs that resemble host proteins can result in new virulence mechanisms, such as inducing inflammation and evading the immune system [3]. Coronaviruses, in particular, have been suspected to have acquired human protein mimics throughout the long record of human coronavirus infections [4,5]. As further evidence, the highly pathogenic human coronaviruses, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), SARS-CoV, and the Middle East Respiratory Syndrome Coronavirus (MERS-CoV), have been shown to encode numerous short linear motifs across their genomes that are homologous to human proteins [6]. Although coronavirus infections are typically localized to the lungs, resulting in respiratory infections, viral material has also been found in other organs, such as the kidney, brain, and heart, resulting in more life-threatening infections [7]. Furthermore, infection with SARS-CoV-2 (the causative agent of the COVID-19 pandemic) has presented symptoms not previously seen in other coronavirus infections, such as conjunctival discharge from the eyes [8,9]. Investigations into coronavirus host mimicry may shed light on viral tropism and infection severity [10].
The structure of the receptor-binding motif (RBM) on the spike glycoprotein is particularly important for determining the tropism of the virus [11]. Host receptors that contain motif(s) that complement the electrochemical and spatial configurations of the viral RBM will interact and, thus, initiate viral entry [12,13]. Angiotensin-converting enzyme II (ACE2) has been established as the primary cell entry receptor for SARS-CoV-2 and SARS-CoV, and dipeptidyl peptidase IV (DPP4) as the primary cell entry receptor for MERS-CoV. However, several reports, some preliminary, have proposed additional coronavirus cell entry receptors, such as transferrin receptor protein 1, kidney injury molecule-1, kremen protein 1, and αv integrins for SARS-CoV-2 [14][15][16][17][18][19][20]. Additionally, coronavirus spike proteins have been proposed to interact with host factors to facilitate infection aside from their role in cell entry [21]. For instance, two studies found that the SARS-CoV-2 spike protein alone can interact with the blood-brain barrier [22,23]. Its importance in receptor binding and the low glycosylation surrounding its residues make the coronavirus RBM an attractive target for inhibition by small-molecule drugs, therapeutic peptides, and neutralizing antibodies [24][25][26].
To date, there has been limited investigation into the structural similarity of highly pathogenic coronavirus RBMs [27]. Identifying structurally analogous human proteins may give insight into endogenous biochemical pathways that the virus is hijacking to facilitate infection or may help explain autoimmune disorders triggered by coronavirus infections [28,29]. Detecting similar microbial proteins may reveal shared host receptors or antibody cross-immunity [30]. Short linear motifs on coronavirus spike RBMs have been shown to share high amino acid sequence identity with human proteins, which may indicate host mimicry [31][32][33][34]. However, protein structure and fold similarity have been shown to be more informative than amino acid sequence similarity in predicting molecular mimicry [35,36]. Drayman et al. performed a structural similarity search using bacterial and viral motifs and experimentally validated that the simian vacuolating virus 40 major capsid protein mimics Gas6 binding to the TAM (Tyro3, Axl, and Mer) receptors, demonstrating that structural analogs with low amino acid identity may still act as molecular mimics. Thus, to add to the understanding of host mimicry of highly pathogenic coronavirus RBMs, we used structural bioinformatics tools to model and map the extent to which the three-dimensional structures of the SARS-CoV-2, SARS-CoV, and MERS-CoV spike RBMs potentially mimic the interactions of experimentally determined protein structures. We used structural alignment tools with distinct methodologies to perform a structural similarity screen between the RBMs and all known protein structures and, subsequently, tested potential RBM interactions with protein-protein docking simulations. Several cell signaling proteins, innate immune factors, snake and spider toxins, and microbial antigens were found to share structural features with the three RBMs. This information may help guide experimental efforts to elucidate spike RBM interactions, including vaccine design and cell entry receptor discovery.

Receptor-binding motif structural similarities and characteristics
Several models of the spike protein for each of the highly pathogenic coronaviruses have been experimentally determined; however, many of them are missing residues due to the difficulty in resolving the structure of flexible protein motifs [37]. To overcome this issue and obtain a representative three-dimensional model of each spike receptor-binding motif (RBM), we used ProtCHOIR, a recently developed pipeline to automate the modelling of homooligomers, to model each trimeric spike protein and, subsequently, manually selected the RBM residues for each coronavirus (Fig. 1). All generated models were structurally aligned to experimental models using TM-align to determine modelling precision. On a scale from 0 to 1, a TM-score of over 0.5 between two proteins implies that they have the same fold, while below 0.2 suggests a random alignment. Each RBM alignment with the corresponding experimental structure reported a TM-score over 0.95, reflecting high-quality modelling. Although receptor-binding of coronavirus spike proteins has been shown to be an elaborate process that involves interactions with glycans and multiple protein domains, we selected the most interactive region of the spike RBD with primary receptors (i.e. ACE2 for SARS-CoV and SARS-CoV-2; DPP4 for MERS-CoV) from experimental models as the receptor-binding motif (RBM) [38].
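The TM-score thresholds quoted above can be written as a small helper. This is only an illustrative sketch: the function name `interpret_tm_score` and the "intermediate" label for scores between 0.2 and 0.5 are assumptions introduced here, not TM-align terminology, and the example inputs are placeholders.

```python
def interpret_tm_score(score: float) -> str:
    """Map a TM-score in [0, 1] to the qualitative reading given in the text."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("a TM-score must lie between 0 and 1")
    if score > 0.5:
        return "same fold"         # threshold stated in the text
    if score < 0.2:
        return "random alignment"  # threshold stated in the text
    return "intermediate"          # label assumed here for the 0.2-0.5 range


if __name__ == "__main__":
    # Placeholder inputs; 0.96 stands in for "over 0.95".
    for score in (0.96, 0.35, 0.10):
        print(f"{score:.2f} -> {interpret_tm_score(score)}")
```

Under this reading, the model-to-experiment alignments (TM-score over 0.95) fall well inside the "same fold" range.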
The structural similarity of the RBMs to one another was quantitatively assessed using TM-align before assessing their similarity to other known proteins. The SARS-CoV-2 and SARS-CoV RBMs were very similar, with a TM-score of 0.71, while the TM-scores of MERS-CoV with the other two were both less than 0.25 (Fig. 1). This divergence is also reflected at the amino acid sequence level: the RBDs of SARS-CoV-2 and SARS-CoV share 64.6% sequence identity, whereas the MERS-CoV RBD shares 19% and 21.6% identity with SARS-CoV-2 and SARS-CoV, respectively.
As seen in Fig. 1, the SARS-CoV-2 RBM is composed almost exclusively of hydrophobic and polar non-charged amino acids, with the exception of one acidic glutamate and one basic lysine. The SARS-CoV RBM is similar in that it is composed mostly of hydrophobic and polar non-charged residues, with a few single amino acid differences, such as an acidic aspartate in the middle of the SARS-CoV RBM. The MERS-CoV RBM consists of more acidic and basic amino acids and contains fewer polar non-charged residues. Of note, the SARS-CoV-2 and SARS-CoV RBMs have 7 and 8 aromatic residues, respectively, exposed on the receptor-binding surface of the RBM. The recently discovered N501Y and E484K mutations add a potentially functional aromatic and basic residue, respectively, to the SARS-CoV-2 RBM, both of which have been proposed to increase binding to ACE2 [39]. Modelling of the mutants yielded very small structural changes in the SARS-CoV-2 RBM: TM-scores of the mutant RBMs aligned to the reference structure were above 0.9.
In terms of global architecture, the SARS-CoV-2 and SARS-CoV RBMs contain two anti-parallel beta-strands connecting three loops, although the SARS-CoV-2 RBM has two short beta-strands leading to a cystine disulfide loop (Fig. 1). Both SARS-CoV and SARS-CoV-2 contain a similar cystine disulfide bond helping shape one end of the respective RBMs. The MERS-CoV RBM consists of three beta-strands connecting four loops. Because loop flexibility may affect overall structure, we submitted each RBD to the CABS-flex 2.0 web server and found that the cystine disulfide loop of both the SARS-CoV-2 and SARS-CoV RBMs displayed high flexibility (RMSF > 9); otherwise, the RBM residues on all three RBMs were predicted to exhibit low RMSF (less than 6.5) (Supplementary Fig. 1). The flexibility predictions from CABS-flex 2.0 were supported by separate studies on coronavirus RBMs [40,41]. The high flexibility of the cystine loops in the SARS-related RBMs motivated the use of two additional models provided by CABS-flex 2.0 for the structural similarity screen. The added models reported surprisingly low TM-scores compared to the references (0.42 and 0.65 for SARS-CoV-2 and 0.45 and 0.41 for SARS-CoV), reflecting the high flexibility in these loops (Supplementary Fig. 1). Overall, SARS-CoV-2 and SARS-CoV were found to share higher structural homology with one another than with MERS-CoV.

Structural similarity screen
After RBM model generation, we performed a structural similarity screen for each RBM. Four sequence-independent 3D-structure alignment tools with different methodologies were used to quantify the structural similarity between the RBMs and known 3D protein structures in order to better understand shared structural features between the RBMs and potential mimics. Notably in this study, although spike may engage in interactions within human cells, we focused on protein structures that would be found in the extracellular matrix (excluding antibodies, due to their structural diversity) to gain more insight into potential cell entry receptors, immunopathies, and shared antigenicity with other microorganisms [42].
The PDBeFold, RUPEE, and HMI-PRED web servers were used, and TM-align was installed locally and run pairwise against a downloaded copy of the PDB clustered at 100% sequence identity. The TM-score distributions of SARS-CoV-2 and SARS-CoV were quite similar, while the MERS-CoV RBM was similar to a greater number of proteins (Fig. 2A). The MERS-CoV RBM returned 3,954 structures with a TM-score of over 0.5 (approximately the top 1% of TM-scores) out of 245,055 total RBM-chain alignments and an average TM-score of 0.33. The SARS-CoV-2 and SARS-CoV RBMs had lower average TM-scores, 0.298 and 0.297 respectively, and the top 1% corresponded roughly to the 0.4 TM-score line. Thus, structures with a TM-score of > 0.4 were selected for further analysis for the SARS-related viruses: 4,025 for SARS-CoV-2 and 3,561 for SARS-CoV. PDBeFold returned 621-806 and 1,163 structures for the SARS-related and MERS-CoV RBM models, respectively. The top 1,000 hits from each RUPEE run were recorded. HMI-PRED outputs ranged from 20 to 50 mimicked PDB templates per RBM. All alignments of interest were manually inspected to validate the potential for structural mimicry. Aligned proteins returned by each tool were linked to their corresponding PDB and UniProt codes. UniProt codes shared between two or more tools were regarded as high-confidence hits. Biologically relevant structural alignments specific to each tool were also inspected and considered. Structural alignments that would not make sense biologically, such as when the RBM is facing the inside of the protein, were discarded, while alignments that were logical but found outside of protein-protein interfaces were included on a case-by-case basis. Returned structures not shown to be found in the extracellular matrix were removed. All tools returned their respective spike structures, confirming their validity.
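The cut-off selection described above, keeping roughly the top 1% of TM-scores, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `top_fraction_cutoff` and `select_hits` are hypothetical, the strict `>` comparison mirrors the "> 0.4" wording, and the scores in the usage are placeholders rather than values from the screen.

```python
import math


def top_fraction_cutoff(scores, fraction=0.01):
    """Return the lowest score that still falls in the top `fraction` of `scores`."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    # round() guards against floating-point noise before taking the ceiling
    keep = max(1, math.ceil(round(fraction * len(ranked), 9)))
    return ranked[keep - 1]


def select_hits(scores_by_chain, cutoff):
    """Keep alignments whose TM-score is strictly above the cut-off."""
    return {chain: s for chain, s in scores_by_chain.items() if s > cutoff}


if __name__ == "__main__":
    # Placeholder chain identifiers and scores, not data from the study.
    demo = {"chainA": 0.62, "chainB": 0.41, "chainC": 0.28, "chainD": 0.19}
    cutoff = top_fraction_cutoff(list(demo.values()), fraction=0.5)
    print(cutoff, select_hits(demo, cutoff))
```

Deriving the cut-off from each RBM's own score distribution, rather than fixing it at 0.5 for all three, is what yields comparable numbers of candidate structures for the SARS-related and MERS-CoV RBMs.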
A total of 62 UniProt codes, excluding 28 toxins, were considered biologically relevant; these comprised 35, 19, 19, and 8 selected alignments from RUPEE, HMI-PRED, PDBeFold, and TM-align, respectively. When comparing tools (Fig. 2B), the RUPEE and PDBeFold web servers shared 7 UniProt codes for at least one RBM, while TM-align shared 1 with PDBeFold and 0 with RUPEE. HMI-PRED shared 2 structures with RUPEE, 1 with PDBeFold, and 0 with TM-align. Little overlap was shown between most of the tools, which is consistent with structural similarity-based studies on HIV and human proteins [43]. The combined returned UniProt codes, excluding toxins, from all four tools totalled 39, 23, and 29 for the SARS-CoV-2, SARS-CoV, and MERS-CoV RBMs, respectively. The top alignments consisted of cytokines, chemokines, and growth factors and their receptors; structures containing EGF-like domains; complement activation proteins; cystine disulfide-rich toxins derived from snakes and spiders; and antigenic microbial proteins. A Venn diagram showing some shared hits between the three RBMs can be seen in Fig. 2C, and a full listing of the hits, alignment values, and tools can be found in Tables 1 and 2. The SARS-CoV and SARS-CoV-2 RBMs shared more structural domains, while MERS-CoV returned more unique hits compared to the other two. Altogether, these results indicate that proteins from completely different protein families may interact with coronavirus spike RBMs.
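The cross-tool comparison described above (codes shared by two or more tools, and pairwise overlaps between tools) amounts to simple set operations. The sketch below is illustrative only: the function names are hypothetical, and the accession strings `ACC_A` to `ACC_E` are placeholders, not UniProt codes from the screen.

```python
from itertools import combinations


def high_confidence_hits(hits_by_tool, min_tools=2):
    """Return {code: set of tools} for codes reported by at least `min_tools` tools."""
    tools_by_code = {}
    for tool, codes in hits_by_tool.items():
        for code in set(codes):
            tools_by_code.setdefault(code, set()).add(tool)
    return {c: t for c, t in tools_by_code.items() if len(t) >= min_tools}


def pairwise_overlap(hits_by_tool):
    """Count codes shared by each pair of tools (pairs keyed in sorted order)."""
    return {
        (a, b): len(set(hits_by_tool[a]) & set(hits_by_tool[b]))
        for a, b in combinations(sorted(hits_by_tool), 2)
    }


if __name__ == "__main__":
    # Placeholder accessions; not results from the study.
    demo = {
        "RUPEE": {"ACC_A", "ACC_B", "ACC_C"},
        "PDBeFold": {"ACC_B", "ACC_C"},
        "TM-align": {"ACC_C", "ACC_D"},
        "HMI-PRED": {"ACC_E"},
    }
    print(high_confidence_hits(demo))
    print(pairwise_overlap(demo))
```

Keying on UniProt codes rather than PDB identifiers means that two tools returning different structures of the same protein still count as agreeing.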

Analysis of predicted structural mimicry
Further examination of the structural alignments and their relevance to biological activity was performed to elucidate potential mechanisms of molecular mimicry by the SARS-CoV-2, SARS-CoV, and MERS-CoV spike RBMs. The UniProt and STRING databases were used to link the predicted mimics with potential interaction partners, and the PDB provided template structures to determine whether the alignments were found in ligand-binding regions. Selected high-confidence potential interactions were further evaluated using protein-protein docking with ClusPro PIPER in order to better understand electrochemical, in addition to structural, complementarity considering the low amino acid sequence identity. The docked models were then analyzed with the FoldX AnalyseComplex program to determine the complex interaction energy. Docking of the natural ligand to the receptor was performed to obtain a control interaction energy. The energy of the original PDB protein complex was also predicted as an experimental control. The exploration of these interactions with structural alignment visualization and protein-protein docking may help explain their potential roles in infection.
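The comparison of a docked RBM complex against its two controls can be expressed as a short helper. This is a hedged sketch, not the authors' analysis script: the function name is hypothetical, the input values are placeholders, and it assumes the usual convention that a more negative interaction energy indicates more favourable predicted binding.

```python
def compare_to_controls(candidate, docking_control, experimental_control):
    """Compare a docked complex's interaction energy with its two controls.

    Assumes more negative energies mean more favourable predicted binding.
    All three values must be in the same units.
    """
    return {
        "delta_vs_docking_control": candidate - docking_control,
        "delta_vs_experimental_control": candidate - experimental_control,
        "beats_docking_control": candidate < docking_control,
        "beats_experimental_control": candidate < experimental_control,
    }


if __name__ == "__main__":
    # Placeholder energies, not results from the study.
    print(compare_to_controls(candidate=-2.0,
                              docking_control=-6.0,
                              experimental_control=-7.0))
```

A positive delta means the RBM complex is predicted to be less favourable than the corresponding control; the docking control isolates the effect of the docking procedure itself, while the experimental control reflects the original complex.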
The potential mimics were split into two categories, endogenous (human) and exogenous (non-human), to more effectively describe the results in the context of infection. Mimicry of endogenous proteins may reveal which specific human pathways the viral RBM is hijacking; structurally similar exogenous proteins may exhibit shared interference of human interaction pathways or antigenicity with the coronavirus RBMs. Endogenous hits, discovered by either single or multiple structural alignment tools, are summarized in Table 1 and exogenous hits in Table 2.

Endogenous
Several proteins containing EGF-like domains were found to be similar to all three RBMs. EGF-like domains are evolutionarily conserved domains that share homology to the epidermal growth factor and have been shown to function primarily in tissue organization and repair [44,45]. Both the cystine disulfide loop and the central beta-strand sub-motif structures in the SARS-CoV-2 and SARS-CoV RBMs and the MERS-CoV beta-strands were found to mimic EGF-like domains.
The EGF-like domain of the urokinase-type plasminogen activator (uPA) in complex with its receptor, urokinase plasminogen activator receptor (uPAR) (PDB: 2fd6), was found to be similar to both the SARS-CoV-2 and SARS-CoV RBMs using RUPEE. Interestingly, the uPA/uPAR system has been implicated in SARS-CoV-2 pathogenesis, with uPAR as an early predictor of severe respiratory failure [46,47]. Although the RBMs protrude into the receptor in the structural alignments, the alignments suggest that the RBMs might bind to uPAR (Supplementary Fig. 2A).
The neurogenic locus notch homolog protein 1 (NOTCH1) EGF-like domain was returned for the SARS-CoV-2 RBM central beta-strands and the MERS-CoV RBM by RUPEE and PDBeFold. NOTCH1 is involved in developmental, innate immunity, and inflammation signaling pathways, and natural ligands of the NOTCH1 EGF-like domains include jagged-1, jagged-2, delta-like 1 (DLL1), DLL3, and DLL4 [48]. Alignment of the SARS-CoV-2 and MERS-CoV RBMs with the EGF-like domain of NOTCH1 bound to DLL4 (PDB: 4xl1) shows potential for molecular mimicry, i.e. the coronavirus RBMs may bind to DLL4 (Supplementary Fig. 2B) [49]. The SARS-CoV-2 RBM was also found similar to NOTCH2 by RUPEE, but no PDB complex models were available for further inspection. No direct interactions with the NOTCH1 pathway have been revealed, but its inhibition has been proposed to help fight SARS-CoV-2 infection [50].
All three RBMs were found to potentially mimic the EGF-like domain of coagulation factor VIIa. Further inspection of the alignment in complex with tissue factor (PDB: 1dan) showed potential for mimicry (Supplementary Fig. 2C) [51]. Interestingly, tissue factor expression has been shown to be up-regulated in severe SARS-CoV-2 infections, although there are several plausible theories [52,53]. The cystine disulfide loops of SARS-CoV and SARS-CoV-2 were found to resemble the EGF-like domains of coagulation factors X and IX and fibrillin, which are known to bind calcium [54,55]. However, there is no evidence for calcium binding to the RBMs. All three RBMs were found to mimic the EGF-like domain of thrombomodulin, specifically in the region that binds thrombin (PDB: 1dx5), by RUPEE and PDBeFold, while the SARS-CoV-2 similarity was also detected by HMI-PRED [56]. Studies have shown that both thrombin and thrombomodulin blood concentrations are correlated with SARS-CoV-2 infection severity [57,58]. The verification by three tools and relevance to the literature led us to explore the potential mimicking of thrombomodulin binding to thrombin by the SARS-CoV-2 RBM using protein-protein docking (Fig. 3A). Calculation of the interaction energies revealed that the reference docking and experimental controls showed similar affinities of −6.12 and −6.88 kJ/mol, and the SARS-CoV-2 RBM bound at a lower affinity of −1.96 kJ/mol. The similarity to thrombomodulin might help explain the prothrombotic coagulopathy presented in SARS-CoV-2 infections [59].
The central beta-strands of the SARS-CoV-2 RBM were found to be structurally similar to the transforming growth factor alpha, epiregulin, and epigen EGF-like domains using RUPEE; however, alignment of the RBM with the proteins in complex with the epidermal growth factor receptor (EGFR) (PDBs: 1mox, 5wb7, 5wb8) showed that the RBM was just out of ligand-binding range (Supplementary Fig. 2E) [60,61]. There is no evidence for interaction of the SARS-CoV-2 RBM with the extracellular domain of EGFR; however, the alignments were included for potential off-target effects related to the EGF-like domains.
Structural mimicry of chemokine and cytokine signaling has been reported for several viruses [1]. Viral proteins can mimic the chemokine, as in the case of HIV gp120 and CCL5, or they can mimic the receptor and bind directly to the cytokine (inhibiting its function), such as the vaccinia virus B15R protein that mimics the IL-1B receptor and binds to IL-1B [62,63].
Several cell signaling ligands and receptors were found to be similar to the coronavirus RBMs. The SARS-CoV-2 and SARS-CoV RBMs were both found to mimic IL-8-like chemokines, fibroblast growth factor 1, C-C motif chemokine 3, interleukin-18, and ephrins; they also individually mimic BMP2 and von Willebrand factor, respectively. The MERS-CoV RBM structurally resembled C-X-C motif chemokines 2 and 4 and growth/differentiation factor 5. The alignments of the RBMs with IL-8-like chemokines, C-C motif chemokines 2 (CCL2), 3, and 4, and IL-18 in complex with their respective receptors show only partial alignment with the ligand-binding regions (Supplementary Fig. 2F). Interestingly, however, expression levels of these cytokines have all been shown to correlate with SARS-CoV-2 infection, although other explanations have been proposed [64][65][66][67]. For example, IL-33 release by damaged lower respiratory cells during SARS-CoV-2 infection has been demonstrated to trigger inflammation, increasing CCL2 and CCL3 expression [68].
Fibroblast growth factor 1 (FGF1) was shown to be similar to the SARS-CoV-2 and SARS-CoV RBMs using PDBeFold and HMI-PRED. A transcriptomic profiling study revealed that FGF1 was upregulated in coronavirus infections [69]. Thus, to look more closely at potential interference, we docked the SARS-CoV RBM with the fibroblast growth factor receptor 2 (FGFR2) (PDB: 3OJ2), which was predicted by HMI-PRED (Fig. 3B) [70]. The RBM-FGFR2 docking analysis predicted a potentially favourable affinity of −2.49 kJ/mol, although not as high as the experimental complex (−9.78 kJ/mol) (Fig. 3D). The FGF1 signaling pathway may, thus, be modulated by the coronavirus spike RBMs.
HMI-PRED predicted that the SARS-CoV-2 RBM mimics ephrin-A5 and ephrin-B2 binding to the ephrin type-A receptor 4 (EPHA4), and that the SARS-CoV RBM mimics ephrin-A5 binding to the ephrin type-A receptor 3. EPHA4 is unique among known class A ephrin receptors in that it binds both ephrin-A and ephrin-B ligands [71,72]. The structural similarity of the SARS-CoV-2 RBM to two ligands of the same receptor motivated further testing with protein-protein docking with EPHA4. Although there is no evidence for ephrin receptor involvement in coronavirus infections, other viral surface proteins have been shown to utilize ephrin receptors for cell entry, such as the rhesus r virus [73]. The docking revealed similar affinities between the SARS-CoV-2 RBM-EPHA4, ephrin-A5-EPHA4, and experimental (PDB: 4m4r) complexes: −1.41, −3.32, and −0.45 kJ/mol, respectively (Fig. 3A).
The platelet glycoprotein Ib (GP-Ib) binding domain of von Willebrand factor (VWF) was found to be similar to SARS-CoV by HMI-PRED. VWF-GP-Ib interaction has been shown as critical in modulating thrombosis and inflammation [74]. Although there is no literature on VWF and SARS-CoV infection, blood concentration levels of VWF have been shown as correlated with SARS-CoV-2 infection severity, which may indicate potential pathway interference [75].
SARS-CoV-2 was predicted to mimic bone morphogenetic protein 2 binding to activin receptor type-2B and MERS-CoV to mimic growth/differentiation factor 5 (GDF5) binding to bone morphogenetic protein receptor type-1A (BMPRT1A) by HMI-PRED; however, no experimental evidence is available for either case. To explore the potential involvement of the MERS-CoV RBM in cell signaling, we docked the MERS-CoV RBM to BMPRT1A in the GDF5-binding region (PDB: 3qb4) [76]. The docking of the RBM and of GDF5 to BMPRT1A displayed similar affinities of −6.73 and −6.99 kJ/mol, respectively, while the experimental complex bound with −10.58 kJ/mol.
RUPEE detected structural resemblance between the SARS-CoV-2 RBM and IL-6 receptor alpha and beta chains, both of which show [...] (Supplementary Fig. 2G). IL-6 has been reported as an overexpressed cytokine in SARS-CoV-2 infections, which can lead to induction of a hyper-innate inflammatory response [77,78]. Mimicry of the IL-6 receptors by the RBM could result in binding and, thus, interference with IL-6-related interactions. However, several alternative theories have been proposed to explain the increases in IL-6 during severe infection; for example, the SARS-CoV nucleocapsid protein has been shown to activate IL-6 expression [79]. HMI-PRED additionally predicted MERS-CoV RBM mimicry of the binding of the T cell receptor beta chain to the major histocompatibility complex class I-related gene protein and of interferon lambda receptor 1 binding to the beta subunit of the interleukin-10 receptor, both of which could have implications in immunosurveillance and inflammatory pathways [80,81].
Different tumor necrosis factor-related ligands and receptors were found to be structurally analogous to the MERS-CoV RBM and SARS-CoV-2 RBM, respectively. Tumor necrosis factor receptor superfamily (TNFRSF) 1A, 4, 13C, and 14 were returned for the cystine disulfide loop of the SARS-CoV-2 RBM by RUPEE, while the tumor necrosis factor ligand superfamily (TNFSF) 13B and 14 were found to resemble the MERS-CoV RBM by RUPEE and HMI-PRED. Similarity of the SARS-CoV-2 RBM to TNFRSF 13C was also found by PDBeFold. These signaling pathways normally promote B-cell and T-cell survival and maturation [82,83]. The structural similarity of this family of ligands and receptors to the SARS-CoV-2 and MERS-CoV RBMs led us to further inspect the interactions with protein-protein docking: mimicry of SARS-CoV-2 to TNFRSF 13C and MERS-CoV to TNFSF 13B. Thus, we simulated the binding of the SARS-CoV-2 RBM to TNFSF 13B and of the MERS-CoV RBM to TNFRSF 13C (PDB: 3v56) [84]. Both cases revealed that the RBM is predicted to dock at a higher affinity than the natural ligand (Fig. 3D).
The complement system comprises a series of protein cascades that form an integral part of the innate immune response to viruses [85]. Viruses are generally susceptible to the complement system; however, viral proteins can utilize complement proteins through molecular mimicry in a variety of ways, such as using complement receptors for viral entry or evading detection by the immune system [86]. Infections from all three highly pathogenic coronaviruses have been reported to activate the complement system, enhancing pathogenicity, although the exact mechanisms remain unclear [87]. The spike protein of SARS-CoV-2 has been shown to localize near C4d and C5b-9 in lung vasculature, and mutations in several complement activation proteins, such as complement factors H, I, and III, have been found to correlate with infection severity [88,89]. The structural similarity screen yielded three motifs from the complement system that the RBMs potentially mimic: the complement factor I (CFI) binding domain of CFH for all three RBMs, and both the complement C3d binding domain of complement receptor 2 (CR2) and the complement C1r binding domain of complement C1s for the MERS-CoV RBM. Interestingly, CFH and the SARS-CoV-2 spike protein have been proposed to compete for heparan sulfate binding [90]. The SARS-CoV RBM, however, was predicted to be similar to CFH by RUPEE, PDBeFold, and HMI-PRED; thus, we docked the SARS-CoV RBM to CFI (PDB: 5o32) and found that the natural ligand was predicted to bind at a slightly higher affinity than the SARS-CoV RBM: −1.60 vs. −0.14 kJ/mol, respectively [91]. The C3d-binding domain of CR2 for the MERS-CoV RBM was also identified by RUPEE, PDBeFold, and HMI-PRED and was, thus, explored with docking of the MERS-CoV RBM to C3 (PDB: 3oed) (Fig. 3C) [92]. The MERS-CoV RBM was predicted to bind at a higher affinity than both the control docking and experimental complexes: −5.10 vs. −0.93 and −1.20 kJ/mol, respectively (Fig. 3D). Additionally, HMI-PRED found that the MERS-CoV RBM also mimics the complement C1r binding site of complement C1s. Additional experimental efforts are needed to validate the relationship between coronavirus spike proteins and the complement activation pathway.
Other endogenous hits included several unrelated proteins, such as protease inhibitors and serotransferrin. The MERS-CoV RBM resembled the fibronectin type III (FNIII) domains of mouse myosin-binding protein C and tenascin-X using RUPEE. Although myosin-binding protein C is intracellular, FNIII domains are found across the domains of life and function in diverse ways, from cell adhesion to cell signaling [93]. Drayman et al. found that the West Nile virus envelope glycoprotein E resembles the structural architecture of the FN10 domain of fibronectin, which is a natural ligand for integrin αvβ3. Thus, we checked and found that the MERS-CoV RBM shares structural properties with other FNIII domains, such as those from fibronectin and neural cell adhesion molecule 1 (PDBs: 1fnf and 2haz, respectively) (Fig. 3C) [94,95]. The MERS-CoV RBM was also found to mimic part of the jagged-2, DLL1, and DLL4 proteins; however, the alignment was largely out of ligand-binding range when compared to jagged-1 in complex with NOTCH1 (PDB: 5uk5), although the alignment may be relevant in other scenarios (Supplementary Fig. 2E) [96]. Protease inhibitors included neuroserpin for the SARS-CoV-2 RBM and the clitocypin-5 cysteine protease inhibitor for the SARS-CoV RBM. The alignment of the SARS-CoV RBM with the clitocypin-5 cysteine protease inhibitor showed potential binding to cathepsin L2 (PDB: 3h6s) [97]. The role of cathepsins in coronavirus cell entry has been described as helping process the spike protein for viral and host membrane fusion [98]. To investigate the potential for additional interactions between coronavirus RBMs and cathepsins, we performed protein-protein docking. The binding of the SARS-CoV RBM to cathepsin L2 was predicted to be more favourable than the docking and experimental controls (Fig. 3D). Experimental evidence is required to validate this interaction, however.
Both SARS-related RBMs resembled motifs of serotransferrin using TM-align, and, interestingly, the transferrin receptor protein 1 has been proposed as a potential cell entry receptor for SARS-CoV-2 [17]. However, the alignments were generally out of ligand-binding range (PDB: 1suv) (Supplementary Fig. 2H); since no binding mode was apparent, it was not considered for docking [99]. HMI-PRED predicted that the SARS-CoV-2 RBM mimics the dimerization domain of C-type lectin domain family 5 and that the SARS-CoV RBM mimics intercellular adhesion molecule 5 binding to integrin alpha-L [100,101]. Integrins have been proposed to bind to the SARS-CoV-2 spike protein, although that is due to a new RGD motif in the RBD. Of note, the RGD motif is not included in the residues selected for this study's SARS-CoV-2 RBM since it does not interact with ACE2 in experimental models [16]. Because integrin binding has not been hypothesized outside of the discussion of the SARS-CoV-2 RGD motif, docking was not pursued. Both the SARS-CoV-2 and SARS-CoV RBMs mimicked the nicotine-binding domain of the nicotinic acetylcholine receptor by RUPEE, which may have implications in the 'nicotinic hypothesis' [102].

Exogenous
We classified the exogenous hits by the pathogen type. There were motifs from apicomplexan parasites, viruses, one bacterial protein, and snake and spider toxins found to resemble the coronavirus RBMs.
The EGF-like domains from merozoite surface protein 1 (MSP1) of several Plasmodium species were found to be structurally similar to all three RBMs using RUPEE. Compared to the other two, the SARS-CoV-2 RBM was found to be similar to MSP1 from the most Plasmodium species: falciparum, yoelii, cynomolgi, knowlesi, and vivax. The SARS-CoV RBM returned P. yoelii MSP1 and the MERS-CoV RBM returned P. falciparum MSP1. A closer inspection of the P. falciparum MSP1 alignments revealed that two EGF-like domains on the same PDB structure (1ob1) were found to resemble the SARS-CoV-2 RBM (Fig. 4A) [103]. The PDB structure originally models the antibody-binding epitope of the EGF-like domain of MSP1; however, the antibody epitope is located on a loop just outside of the EGF-like domain. Thus, antibody binding to the SARS-CoV-2 RBM could not be verified, but the presence of two EGF-like domains near an epitope may motivate experimental testing. The P. falciparum apical membrane antigen 1 epitope (PDB: 2j5l) was also found to resemble the SARS-CoV-2 and SARS-CoV RBMs, although, as in the case of MSP1, both RBMs aligned to a region outside of the antibody-interacting residues (Fig. 4B) [104]. These EGF-like domains from Plasmodium parasites may provide structural epitope scaffolding for cross-reactivity against the coronavirus spike RBMs [105]. Recent studies have pointed to a potential protective effect of P. falciparum infections against SARS-CoV-2 infection, although direct experimental evidence is yet to be established [106][107][108][109]. The MERS-CoV RBM was also found to resemble the rhoptry neck protein 2 and thrombospondin-related anonymous protein from P. falciparum. The surface antigen 3 of Toxoplasma gondii was found to be similar to the MERS-CoV RBM (Fig. 4C). Although there are no data on MERS-CoV and T. gondii coinfections, SARS-CoV-2 has been shown to have negative covariation with toxoplasmosis, which may indicate a protective effect from T. gondii [110].
The coronavirus RBMs were found to structurally mimic several motifs on the HIV and influenza spike proteins; however, these motifs were either facing inwards or buried inside the mimicked protein and were, therefore, discarded. PDBeFold and TM-align indicated that the SARS-CoV-2 and SARS-CoV RBMs structurally mimic several hepatitis C virus (HCV) antibody epitopes. The SARS-CoV-2 and SARS-CoV RBMs were found to be similar to 10 and 6 PDB structures of HCV E2 protein epitopes, respectively (Supplementary Table 2). The HCV E2 protein is implicated in host entry and has been explored as an inhibitory target for neutralizing antibodies [111,112]. A closer inspection of the mapping of the epitopes to the SARS-CoV-2 RBM shows that they are distributed across the RBM (Fig. 4D). Some studies have suggested that HCV may be negatively correlated with SARS-CoV-2 infection [113][114][115]. Since several of the epitopes were aligned in ways that were accessible to antibodies in the original PDB structure, we selected three epitopes, one at each region of the RBM, and docked the respective antibody to the SARS-CoV-2 RBM using the ClusPro PIPER 'antibody' mode (Fig. 4D). As shown in Fig. 4E, the RBM-antibody docking results were compared to docking and experimental controls: the antibodies bound in a similar way to docking controls in all three cases, while the experimental complexes were predicted to bind more tightly. These structural similarities may contribute to potential cross-reactivity between HCV and coronavirus infections. Of note, two recently proposed cell entry receptors for the SARS-CoV-2 spike protein, ASGR1 and APOA4, have been shown to be potentially implicated in mediating HCV viral entry [116].
In an interesting case, the MERS-CoV RBM was found to structurally mimic both the Machupo virus glycoprotein polyprotein GP complex RBM (TM-score: 0.47) and its receptor, transferrin receptor protein 1 (TM-score: 0.52) (PDB: 3kas) using TM-align, although the transferrin receptor scores slightly higher (Supplementary Fig. 3A) [117].
Only one bacterial protein was selected in the structural similarity screen. The adhesin-binding fucosylated histo-blood group antigen of Helicobacter pylori was found to be similar to the MERS-CoV RBM by TM-align. The structure (PDB: 5f7l) shows binding of the bacterial protein to a nanobody; however, the RBM alignment is just outside of the nanobody binding site (Supplementary Fig. 3B) [118]. No studies have detailed any connections between MERS-CoV and H. pylori.
Motifs from snake, spider, and cone snail toxins were found to be similar to all three RBMs using PDBeFold and RUPEE. The SARS-CoV-2 and SARS-CoV RBMs shared similarity to four toxins, whereas the MERS-CoV RBM only returned unique proteins. The two SARS-related viruses mimicked three-finger bungarotoxins and inhibitor cystine-knot toxins, such as psalmotoxin-1, while the MERS-CoV RBM resembled other three-finger toxins, like cytotoxin 4 (Supplementary Fig. 3C, D) [119][120][121][122][123]. Taken together, these toxins may bind to several receptors involved in nociception, e.g. ASIC1 and Nav1.7, which may be relevant to the taste and pain perception changes experienced during SARS-CoV-2 infection [124]. Importantly, and perhaps confoundingly, a recent study found no changes in depolarization for Nav1.7 and Cav2.2 upon exposure to the SARS-CoV-2 RBD [125]. Thus, further experimental work is necessary to validate these interactions.

Spike receptor-binding motif model generation and characterization
Amino acid sequences of the SARS-CoV-2 (NCBI code: NC_045512), SARS-CoV (NC_004718), and MERS-CoV (NC_038294) spike proteins were extracted as FASTA files from the NCBI Viral Genomes Resource [133]. Each amino acid sequence corresponds to one of three identical protomers of the full homo-oligomeric spike trimer. Due to the high number of available experimentally resolved structures for each spike protein, representative models were generated using ProtCHOIR, a recently developed bioinformatic tool to automate 3D homology modelling of homo-oligomers [134]. ProtCHOIR builds homo-oligomeric assemblies by searching for homolog templates on a locally created homo-oligomeric protein database using PSI-BLAST, performing a series of structural analyses on the input protomer structure or sequence using Molprobity, PISA, and GESAMT (all three tools are part of the CCP4 Molecular Graphics package), and carrying out comparative homology modelling using MODELLER (version 9.24) with molecular dynamics-level optimization and refinement [135][136][137][138][139][140]. Trimerization was detected for all three coronavirus spike proteins.
The residues for the receptor-binding domains (RBD) and RBMs of each spike model were manually selected based on experimental structures with primary receptors (residues defined in Supplementary Table 1) and made into sub-structures during manual inspection of full-length models in PyMOL (The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC). Global amino acid sequence alignment of RBDs was performed with EMBOSS Needle [141]. The full-length SARS-CoV-2 spike protein modelled with the lipid bilayer displayed in Fig. 1 was retrieved from the SARS-CoV-2 3D database [142]. No records exist of N-linked or O-linked glycosylation motifs near the three RBMs, which was supported by NetNGlyc 1.0 and NetOGlyc 4.0 predictions [143,144]. To determine the flexibility of each residue in the RBDs, we used CABS-flex 2.0, a web server that offers fast simulations of protein structure flexibility [145]. Default values were used and residue flexibility was reported as root mean square fluctuation (RMSF) [146].
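As an illustration of the flexibility measure reported above, the following minimal sketch computes a per-residue RMSF from a set of conformations. It is not the CABS-flex 2.0 implementation: the function name `rmsf_per_residue`, the input layout (one coordinate per residue, e.g. a C-alpha atom) and the assumption that all conformations are already superimposed are choices made here for illustration only.

```python
import math


def rmsf_per_residue(frames):
    """Return the RMSF of each residue over a list of conformations.

    frames: list of conformations; each conformation is a list of
    (x, y, z) tuples, one per residue. The conformations are assumed
    to be superimposed already (an assumption of this sketch).
    """
    n_frames = len(frames)
    n_residues = len(frames[0])
    rmsf = []
    for i in range(n_residues):
        # Mean position of residue i across all conformations.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        rmsf.append(math.sqrt(msd))
    return rmsf


# Two toy conformations of a two-residue fragment (hypothetical coordinates).
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
toy_rmsf = rmsf_per_residue(toy_frames)
```

In this toy input the first residue does not move and the second oscillates by 1 Å around its mean position, so a higher value marks the more flexible residue.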

Structure similarity screen
Several web servers and stand-alone tools have become available to perform pairwise or multiple sequence-independent protein structure alignments, such as DALI, FATCAT, iSARST, MADOKA, PDBeFold, TM-align, and RUPEE [147][148][149][150][151][152][153]. After testing each tool, the PDBeFold web server, RUPEE web server, and a locally installed version of TM-align were selected due to the diversity of their structural alignment methodologies, ease of use, data accessibility, and widespread usage. A newly published web server for structural prediction of host-microbe interactions based on interface mimicry, HMI-PRED, was also included in the analysis.
Of note, mTM-align (the web server version of TM-align) was considered, but no non-spike proteins were shown, restricting the downstream analysis [154]. Thus, all 3D models in the PDB database clustered at 100% sequence identity were downloaded, and TM-align was run in a pairwise manner, using GNU parallel, between each RBM model and each chain of every downloaded PDB file (O. Tange (2018): GNU Parallel, March 2018, https://doi.org/10.5281/zenodo.1146014). TM-align works by first combining secondary structure similarity alignments, defined by DSSP (Define Secondary Structure of Proteins), and TM-score-based structural alignments [155,156]. A structure rotation matrix is applied to the alignments in order to maximize the TM-score, which was used to rank the alignments for each RBM.
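To make the ranking criterion concrete, the sketch below evaluates the published TM-score formula for a fixed superposition: each aligned residue pair at distance d contributes 1 / (1 + (d / d0)^2), and the sum is divided by the length used for normalization. It does not reproduce the alignment or rotation search performed by TM-align. The function names and the 0.5 Å floor on d0 for short chains are assumptions made here for illustration.

```python
def tm_d0(norm_length):
    """Length-dependent distance scale d0 (in angstroms).

    Uses d0 = 1.24 * (L - 15)^(1/3) - 1.8. The 0.5 floor for short
    chains is a simplifying assumption of this sketch.
    """
    if norm_length <= 15:
        return 0.5
    return max(1.24 * (norm_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, norm_length):
    """TM-score of one superposition.

    distances: distance (angstroms) between each aligned residue pair.
    norm_length: number of residues of the chain used for normalization.
    """
    d0 = tm_d0(norm_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / norm_length


def rank_hits(hits):
    """Sort (hit_id, score) pairs from highest to lowest TM-score."""
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical hits: identifiers and distances are placeholders.
perfect = tm_score([0.0] * 30, 30)       # every pair superimposed exactly
half_covered = tm_score([0.0] * 30, 60)  # only half of the chain aligned
ranked = rank_hits([("hit_A", half_covered), ("hit_B", perfect)])
```

Because the score is normalized by chain length, a short motif aligned perfectly to part of a long chain scores differently depending on which chain is used for normalization, which is worth keeping in mind when comparing small RBM models against full PDB chains.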
The PDBeFold web server utilizes SSM, a graph-matching algorithm that superimposes PROMOTIF-defined secondary structures and, subsequently, maps backbone carbon atoms of, first, matched and, second, unmatched secondary structures [151,157,158]. The hits are ranked by their Q-score, which rewards a lower root mean squared deviation (RMSD) and a higher number of aligned residues. Since the highest percentage of secondary structure matches for the SARS-CoV RBM was 67% (while SARS-CoV-2 and MERS-CoV returned hits with 100% secondary structure matches), we set the PDBeFold search parameters to 65% structural similarity at "highest precision" for each RBM.
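The trade-off captured by the Q-score can be sketched with the formula given in the SSM publication, Q = N_align^2 / ((1 + (RMSD / R0)^2) * N1 * N2). The code below is only an illustration of that formula, not PDBeFold code; the function name `q_score` and the value R0 = 3.0 Å are taken as assumptions here.

```python
def q_score(n_aligned, rmsd, n_res_query, n_res_target, r0=3.0):
    """Q-score of a structural alignment.

    n_aligned: number of aligned residues.
    rmsd: root mean squared deviation of the aligned residues (angstroms).
    n_res_query, n_res_target: lengths of the two compared structures.
    r0: empirical distance scale; 3.0 angstroms is assumed in this sketch.
    """
    return (n_aligned ** 2) / (
        (1.0 + (rmsd / r0) ** 2) * n_res_query * n_res_target
    )


# Identical 50-residue structures: all residues aligned with zero RMSD.
q_identical = q_score(50, 0.0, 50, 50)
# Same coverage, but an RMSD equal to r0 halves the score.
q_deviating = q_score(50, 3.0, 50, 50)
# Zero RMSD, but only half of each structure aligned.
q_partial = q_score(25, 0.0, 50, 50)
```

The score reaches 1 only for identical structures and drops both when the aligned residues deviate and when fewer residues are aligned, which matches the ranking behaviour described above.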
The RUPEE web server performs structural similarity comparisons using a purely geometric approach: 1) a linear encoding of the protein structure is defined to identify separable regions of permissible torsion angles for DSSP secondary structure assignments; 2) the encoding is converted into a bag of features; 3) a protein structure indexing method is established using min-hashing and locality sensitive hashing; 4) the top 8,000 matches are sorted based on adjusted Jaccard similarity scores; and 5) if running in "Top-Aligned" mode (used in this analysis), the alignments are re-scored using TM-align [153]. RUPEE allows the specific comparison of a query protein to the CATH, SCOP, PDB, and ECOD databases [159][160][161][162]. Additional settings are also offered, such as the "contains" (finding query protein inside database protein) and "contained in" (small protein motif detection in query protein) options, both of which were used in this analysis.
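The idea behind steps 3 and 4 can be sketched generically: min-hashing summarizes a set of features by a short signature, and the fraction of matching signature positions estimates the Jaccard similarity between two sets. The code below is a textbook illustration, not RUPEE's implementation. It treats the bag of features as a plain set, omits RUPEE's adjustment of the Jaccard score and the locality sensitive hashing index, and uses hypothetical feature labels and function names.

```python
import hashlib


def jaccard(features_a, features_b):
    """Exact Jaccard similarity of two feature collections (as sets)."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(features, num_hashes=64):
    """Min-hash signature of a non-empty feature collection.

    Each of the num_hashes positions stores the minimum value of one
    seeded hash function over all features.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(
            min(
                int.from_bytes(
                    hashlib.sha1(f"{seed}:{feat}".encode()).digest()[:8], "big"
                )
                for feat in features
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of equal positions."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical feature labels standing in for encoded structure regions.
query = ["f1", "f2", "f3"]
target = ["f2", "f3", "f4"]
exact = jaccard(query, target)
approx = estimate_jaccard(minhash_signature(query), minhash_signature(target))
```

Comparing fixed-length signatures instead of full feature sets is what makes it practical to screen a query against an entire structure database before the slower TM-align re-scoring step.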
HMI-PRED combines TM-align with NACCESS to search through template host protein-protein complexes [163,164]. The structural alignment model of the ligand and putative receptor is refined with RosettaDock to quantify electrochemical complementarity, which is not included in a strict structural alignment screen [165]. Alignments for both the RBM and RBD of each of the three coronaviruses were collected from HMI-PRED.
Since the flexibility of the cystine disulfide loop on the SARS-CoV-2 and SARS-CoV RBMs may affect the global structure of the RBMs and, thus, the search outcome, we used two additional models provided by CABS-flex 2.0 for both RBMs, making a total of three conformations for each of SARS-CoV-2 and SARS-CoV. Results from the different conformations were pooled together for each SARS-related RBM. The top-scoring alignments from all four tools for each RBM were matched with their corresponding PDB and, subsequently, UniProt accession codes [166]. The UniProt accession codes were then compared across tools to identify shared top hits.
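The cross-tool comparison described above amounts to counting, for each UniProt accession, how many tools returned it. The sketch below shows one way to do this. The function name `shared_hits`, the threshold of two tools and the accession strings are placeholders chosen here for illustration; they are not results or settings taken from this study.

```python
from collections import defaultdict


def shared_hits(hits_by_tool, min_tools=2):
    """Return accessions reported by at least min_tools tools.

    hits_by_tool: dict mapping a tool name to an iterable of accession
    codes (pooled over conformations where applicable).
    Returns a dict mapping each shared accession to the sorted list of
    tools that reported it.
    """
    tools_per_accession = defaultdict(set)
    for tool, accessions in hits_by_tool.items():
        for accession in accessions:
            tools_per_accession[accession].add(tool)
    return {
        accession: sorted(tools)
        for accession, tools in tools_per_accession.items()
        if len(tools) >= min_tools
    }


# Placeholder accession codes, not real screen results.
example_hits = {
    "PDBeFold": ["ACC_1", "ACC_2"],
    "RUPEE": ["ACC_2", "ACC_3"],
    "TM-align": ["ACC_2", "ACC_3", "ACC_4"],
    "HMI-PRED": ["ACC_5"],
}
shared = shared_hits(example_hits)
```

Storing hits as sets per accession means that pooling several conformations of the same RBM does not inflate the count for a tool that returns the same protein more than once.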

Protein-protein docking and interaction energy prediction
A local installation of ClusPro PIPER (version 1.1.5) was used for protein-protein docking [167][168][169][170]. Annotations informing potential protein-protein interactions were obtained from the PDB, STRING, and UniProt databases [171]. The ClusPro PIPER "antibody" docking mode was used to dock RBDs with the hepatitis C antibody PDB structures (PDBs: 4g6a, 5fgc, 5nph), and the "others" mode was used for all non-antibody docking. In order to minimize non-biologically relevant binding during the docking runs, residues outside of the RBM on the spike RBDs and outside of the ligand-binding region on the predicted interaction partners were masked. The docked models were minimized using CHARMM22. To gain better insight into the binding strength of the potential RBD-receptor complex interactions, we used the FoldX (version 4.0) AnalyseComplex program, which predicts the interaction energy by finding the difference in stability between the individual unfolded molecules and the overall complex [172]. The original PDB ligands were also docked to the receptor in order to obtain a "Reference docking energy" for comparison with the predicted RBD-receptor energy. The "PDB complex energy" was obtained from the original PDB containing the ligand and receptor to understand binding resolution of the experimental complex.
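The comparison between the three energies can be sketched as simple arithmetic: an interaction energy is the stability of the complex minus the summed stabilities of its separated partners, and the predicted RBD-receptor value is then set against the reference docking and PDB complex values. This is an illustration of that bookkeeping only, not FoldX code. The function names and all numerical values below are hypothetical.

```python
def interaction_energy(complex_energy, partner_a_energy, partner_b_energy):
    """Interaction energy as complex stability minus the partners' sum.

    All energies share one unit (e.g. kcal/mol); a negative result
    indicates that forming the complex is predicted to be favourable.
    """
    return complex_energy - (partner_a_energy + partner_b_energy)


def compare_to_controls(predicted, reference_docking, pdb_complex):
    """Differences between the predicted energy and both controls.

    A negative difference means the predicted RBD-receptor complex is
    predicted to bind more tightly than that control.
    """
    return {
        "vs_reference_docking": predicted - reference_docking,
        "vs_pdb_complex": predicted - pdb_complex,
    }


# Hypothetical stabilities, for illustration only.
predicted_energy = interaction_energy(-50.0, -20.0, -22.0)
comparison = compare_to_controls(predicted_energy, -6.0, -12.0)
```

Keeping the docked reference and the experimental complex as separate controls distinguishes the error introduced by docking itself from the difference between the RBD and the original ligand.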

Data analysis and visualization
A full representation of the pipeline and tools used can be found in Fig. 5. Data were analyzed and plots were generated using R version 3.6.3 (2020-02-29). Protein structural alignments were visualized with PyMOL (version 1.8.4.0) [173]. Pdb-tools was used to manipulate and organize PDB files [174]. The graphical abstract was adapted from the "SARS-CoV-2 Spike Protein Conformations" template on BioRender. Fig. 5 was created on draw.io. Raw data and alignment models are made available at https://github.com/tlb-lab.

Conclusions
This study involved the structural bioinformatics characterization of potential molecular mimicry by highly pathogenic coronavirus spike protein RBMs. Using protein homology modelling, we built representative models of the spike RBMs and tested structural changes in the SARS-CoV-2 RBM induced by recently recorded mutations, which had little effect on overall RBM structure. Comparison of the RBMs revealed that the SARS-CoV and SARS-CoV-2 RBMs share higher structural homology with each other than with the MERS-CoV RBM, which was underlined by the number of common returned proteins in the structural similarity screen using four structural alignment tools. The flexibility of the cystine disulfide loop in the SARS-related RBMs was found to permit large global changes in RBM structure; however, since most of the predicted mimicry was mapped to the RBM central beta-strands, which are quite rigid, the models of different conformations did not return significantly different proteins from the structural alignment tools. The structural alignment screen highlighted the similarity of the RBMs to evolutionarily unrelated human and non-human proteins. Further validation of the alignments with protein-protein docking revealed that all tested coronavirus RBM-endogenous protein interactions were predicted to be energetically favourable, suggesting that the structural similarity screen may be useful in identifying potential molecular mimics.
The predicted endogenous mimicry comprised proteins in cell signaling, adhesion, and complement pathways. Potential mimicry of several microbial antigenic proteins and exogenous toxins was also discovered. The EGF-like domains of both endogenous and exogenous proteins structurally resemble all three RBMs. Predicted mimicked endogenous interactions include the EGF-like domain of thrombomodulin binding to thrombin, NOTCH1 binding to DLL4, and coagulation factor VIIa binding to tissue factor. Interference in these pathways may partially explain coagulopathies in coronavirus infections [126]. Exogenous EGF-like domains of MSP1 from different Plasmodium species, on the other hand, may provide a structural epitope scaffold for cross-reactivity between coronavirus and Plasmodium infections [106]. Epitope similarity was further explored among the several antibody-bound hepatitis C virus E2 protein motifs that were structurally analogous to the SARS-related RBMs. Structural similarity to antigenic proteins from other microbes may confer cross-immunity and, thus, also potentially guide vaccine design [127]. Cell signaling pathway proteins, such as TNF-related and ephrin ligands, were also found as potential mimics of the coronavirus RBMs, which may lead to the use of alternative co-receptors for viral entry or modulation of signaling cascades. Complement factor H was returned for all three RBMs and has also been implicated in coronavirus infections [90]. The mimicry of complement proteins is widespread among viruses, and the spike RBM may have secondary roles interfering in these pathways [128]. Many snake and spider toxins were also found to be similar to the coronavirus RBMs, which implies the potential usage of receptors involved in pain, muscle contraction, cell adhesion, and coagulation pathways [129][130][131].
The prediction of evolutionarily unrelated, yet structurally similar, potential protein mimics reveals that previously unidentified pathways could be altered by the spike RBMs. The structural variation between coronavirus RBMs and their resulting molecular mimics can possibly be connected to differences in tropism, infection severity, and immune system reactivity between coronaviruses.
Although experimental verification of the predicted interactions is required to take these results further, the findings presented in this study provide insight into the potential molecular mimicry utilized by highly pathogenic coronavirus RBMs. The data can be used to support inhibitory drug, peptide, and antibody design efforts in order to prevent viral cell entry and virulence mechanisms related to coronavirus RBMs [132]. Additional work is needed to better understand how coronaviruses co-opt host machinery to enhance fitness.

Permission note
Permission has been granted to use all text, illustrations, charts, tables, photographs, or other material from previously published sources.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Author contributions
CAB performed analyses and wrote the paper; ARJ, AFA, and LC performed analyses and provided expertise on bioinformatics tools and statistical analysis; AvT, SH, BPB, and SET provided expertise on sequence and structural investigations; SCV helped acquire funding and provided expertise on the structural mutation analysis; PHMT and TLB supervised and provided overall direction for the study.