SARS-CoV-2 and human retroelements: a case for molecular mimicry?

The factors driving the late phase of COVID-19 are still poorly understood. However, autoimmunity is an evolving theme in the pathogenesis of COVID-19. Additionally, deregulation of human retroelements (RE) is found in many viral infections and has also been reported in COVID-19. Unexpectedly, coronaviruses (CoV), including SARS-CoV-2, harbour many RE-identical sequences (up to 35 base pairs), and some of these sequences are part of SARS-CoV-2 epitopes associated with COVID-19 severity. Furthermore, RE are expressed in healthy controls and human cells and become deregulated after SARS-CoV-2 infection, showing changes mainly in long interspersed nuclear element (LINE1) expression, but also in endogenous retroviruses. CoV and human RE share coding sequences that are targeted by antibodies in COVID-19 and thus could induce an autoimmune loop by molecular mimicry.

The RE share a reverse transcriptase as a common denominator. Together with an endonuclease, they can move by "copy and paste". Based on the presence of long terminal repeats (LTR), they can be divided into LTR-positive and LTR-negative elements. LTR retrotransposons and endogenous retroviruses (ERV), which are distinguished by the presence of an envelope gene, belong to the LTR-positive elements. Long interspersed nuclear elements (LINE), short interspersed nuclear elements (SINE) and SVA elements (SINE-R, VNTR and Alu) belong to the LTR-negative elements [32][33][34][35]. The LINE contain at least two open reading frames (ORFs): ORF1, coding for a nucleic acid binding protein with chaperone activity (ORF1p), and ORF2, which codes for a reverse transcriptase/endonuclease (ORF2p) [35,36]. Importantly, RE make up 50-70% of the human genome [37,38]. About 20% of the genome is made up of LINE sequences (c. 500,000 copies), of which more than 100 LINE1 family members are still intact and about 68 are active in humans. The LINE1 show strong interpersonal differences [39,40] and an age-dependent expression pattern [41][42][43]. By comparison, ERV make up about 8% of the human genome. Although most ERV are, like LINE, inactivated, there are still hundreds of intact viral promoters and open reading frames from which the expression of ERV transcripts and proteins is possible [44][45][46]. RE activation is known from many viral infections, such as HIV [47], dengue [48], influenza A [48], Zika virus [48], West Nile virus [48], measles [48], Epstein-Barr virus [49] and cytomegalovirus [50]. Therefore, I looked for the relationship of coronaviruses (CoV) to human RE based on genome, transcriptome, epitope and peptide array data. Here, transcriptome analysis coincidentally revealed many RE-identical sequences and shared epitopes in the CoV family members investigated, such as SARS-CoV-2, MERS-CoV and HKU1. To the best of my knowledge, these findings have never been reported.
Importantly, epitopes are shared between human LINE1 and SARS-CoV-2 proteins, and antibodies against some of these epitopes have been found to be correlated with the severity of COVID-19. In addition, RE are expressed in healthy controls and deregulated in COVID-19 patients, as well as in SARS-CoV-2-infected human cells.

Results
The CoV genomes harbour a large number of RE-identical sequences. Several of these sequences represent shared RE-SARS-CoV-2 epitopes. Importantly, antibodies against some of these epitopes are correlated with the severity of COVID-19. In addition, RE are widely expressed in healthy controls and deregulated in COVID-19 patients, as well as in SARS-CoV-2-infected human cells.
A cut-off of ≥18 bp (corresponding to potential epitopes of at least 6 aa) was chosen for downstream analysis for sensitivity and epitope size reasons. A 6 aa cut-off corresponds well to the known immuno-relevant linear epitope length of 4-12 aa, as about 50% of these epitopes have a length ≤ 8 aa (about 25% ≤ 6 aa, and only a few of 4 aa) [51]. At this cut-off point, the largest number of RE-identical sequences is seen in HKU1 (332), followed by NL63 (206) and SARS-CoV-2 (191) (Fig. 2A and B, Table 1). SARS-CoV-2 and RE sequence data were further explored with "LAST" in order to allow single nucleotide polymorphisms to be included, which revealed alignments to RE sequences of up to 35 bp (Supplementary Table 2). In the RE-CoV data, LINE1 represent the majority of all shared sequences, while alignment to ERV sequences is a relevant minority and includes the 35 bp hits (Fig. 1B, Supplementary Tables 1 and 2). In conclusion, genome analysis revealed the presence of many short RE-identical sequences in CoV genomes, including SARS-CoV-2.

Fig. 2 Sequence alignments of CoV genomes to retroelements by nucmer (cut-off ≥18 bp). A. Proportion of LINE1 (L1) and endogenous retrovirus sequences, showing a dominance of L1 sequences in all virus genomes analysed (nucmer). B. Dot plot of shared RE sequences in CoV genomes, showing the highest number of RE-identical sequences in HKU1, followed by NL63 and SARS-CoV-2 (nucmer). Each dot represents a ≥18 bp retroelement sequence also found in the respective CoV genome.
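The relationship between the ≥18 bp cut-off and a 6 aa peptide, and the notion of an "RE-identical sequence", can be illustrated with a minimal sketch. It is not the nucmer or LAST workflow used in the analysis. It is a naive exact-match search on the forward strand only, and the function names and toy sequences are invented for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_exact_kmers(virus, retroelement, k=18):
    """Return k-mers present in both sequences (forward strand only)."""
    return kmers(virus, k) & kmers(retroelement, k)


MIN_BP = 18            # cut-off used in the text
MIN_AA = MIN_BP // 3   # three nucleotides per codon, hence 6 aa

# Toy sequences, invented for illustration only (not real CoV or RE data).
core = "ATGGCTAAAGTTCCTGAACTG"      # 21 bp stretch shared by both toys
toy_virus = "CCGGA" + core + "TTTAC"
toy_re = "GGATC" + core + "ACGTA"

hits = shared_exact_kmers(toy_virus, toy_re, MIN_BP)
for hit in sorted(hits):
    print(hit, len(hit), "bp ->", len(hit) // 3, "aa")
```

A 21 bp shared stretch contains 21 - 18 + 1 = 4 overlapping 18 bp windows, so a dedicated aligner would report it as one hit rather than four. This is one reason why a tool such as nucmer is preferable to a plain set intersection for real data.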

Shared epitopes between SARS-CoV-2 and retroelement proteins
Subsequently, all RE-identical sequences ≥18 bp were compared to the coding regions of the SARS-CoV-2 genome. Accordingly, 70 sequences showing identical aa sequences in CoV and RE were identified (Supplementary Table 1). These sequences were then compared to results from a peptide array that investigated epitope signatures in COVID-19 patients (severe vs. mild) [52]. An overlap of human LINE1 proteins with SARS-CoV-2 epitopes from the RNA-dependent RNA polymerase (RdRp), helicase and 2′-O-ribose methyltransferase was detected for epitopes targeted with > 2-fold elevated antibody levels in severe cases (Fig. 3). Importantly, antibodies targeting an epitope of the SARS-CoV-2 RdRp, which is identical to an epitope of the LINE1 ORF2p endonuclease domain, were 39-fold elevated in severely affected compared to mildly affected COVID-19 patients (Fig. 3A). The same is seen with antibodies targeting the shared CoV-RE epitopes from the 2′-O-ribose methyltransferase (Fig. 3C) and helicase (Fig. 3D). The latter is also a known B cell epitope, aa "PARARVECFDKFKV" (the known B cell epitope is depicted in bold) [53]. Many other shared RE-CoV peptides (similar to those displayed in Fig. 3B) were not targeted by antibodies in severe vs. mild COVID-19 (Supplementary Table 2), but some are known as T cell epitopes, such as the one present in all three chains of the spike protein shown in Fig. 3B (aa VKQIYKTPPIKDF, the known T cell epitope sequence is depicted in bold) [54]. Taken together, SARS-CoV-2 and RE share peptide sequences, some of which are epitopes correlated with COVID-19 severity.

Transcriptome analysis of retroelements in SARS-CoV-2-infected cells
An RE analysis of COVID-19 patient data (bronchoalveolar lavage fluid, BALF), SARS-CoV-2-infected lung epithelial cells and SARS-CoV-2-infected macrophages was performed to explore the presence of and changes in RE expression after SARS-CoV-2 infection. Infection resulted in a significant (adjusted p-value ≤0.05) and relevant (fold change ≥2) deregulation of human RE in all samples. Transcriptome data from COVID-19 patients' BALF compared to healthy controls show an upregulation of 2035 and a downregulation of 3144 RE (Fig. 4A). Among the top deregulated RE are mainly LINE1 (Fig. 4D). SARS-CoV-2-infected epithelial lung cells (Calu-3) show 34 up- and 29 downregulated RE (Fig. 4E), while infected human macrophages have 8 up- and 24 downregulated RE. Among the top deregulated RE for both are also mainly LINE1 (Fig. 4E, F).
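The two thresholds named above (adjusted p-value ≤0.05 and fold change ≥2) can be sketched as a simple classification rule. This is an illustrative assumption about how such a filter is typically applied, not the pipeline used for Fig. 4. In particular, treating downregulation as a fold change of ≤1/2 (a log2 fold change of -1 or lower) is an assumption, and the record names and values are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class ReResult:
    name: str       # retroelement identifier (hypothetical here)
    log2_fc: float  # log2 fold change, infected vs. control
    padj: float     # adjusted p-value


def classify(r, min_fc=2.0, max_padj=0.05):
    """Return 'up', 'down' or 'unchanged' for one retroelement."""
    if r.padj > max_padj:
        return "unchanged"
    threshold = math.log2(min_fc)  # fold change >= 2 means |log2 FC| >= 1
    if r.log2_fc >= threshold:
        return "up"
    if r.log2_fc <= -threshold:
        return "down"
    return "unchanged"


# Invented values for illustration only; not taken from the study.
toy = [
    ReResult("RE_a", 2.3, 0.001),
    ReResult("RE_b", -1.4, 0.020),
    ReResult("RE_c", 3.0, 0.300),  # large change but not significant
    ReResult("RE_d", 0.4, 0.001),  # significant but small change
]

counts = {"up": 0, "down": 0, "unchanged": 0}
for r in toy:
    counts[classify(r)] += 1
print(counts)
```

Both conditions must hold at once: a large fold change without statistical support (RE_c) and a significant but small change (RE_d) are both left out of the up- and downregulated counts.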
In conclusion, RE are expressed in COVID-19 patients and human cells and become deregulated after SARS-CoV-2 infection, showing mainly changes in LINE1 expression.

Discussion
The factors driving the late phase of COVID-19 are still not fully understood [11,12]. However, there is evidence that autoantibodies and autoreactive lymphocytes could contribute to the disease's final outcome [13][14][15][16][17][18][19][20][21][22][23][24][25][26][27]. Therefore, the question of autoantibody formation in COVID-19 has to be asked. The employment of a comprehensive RE database revealed many RE-identical sequences in the ten CoV family members investigated, such as SARS-CoV-2, MERS-CoV and HKU1 (Figs. 1 and 2). Crucially, it was found that the LINE1 proteins ORF1p and ORF2p have peptides identical to SARS-CoV-2 epitopes (Fig. 3), and that some of these epitopes are associated with the severity of COVID-19, as shown by correlation with COVID-19 patients' antibody titres (Fig. 3). In addition, RE are deregulated in COVID-19 patients (Fig. 4A), as well as in SARS-CoV-2-infected human epithelial lung cells and macrophages (Fig. 4B and C), which has occasionally been reported in the last few months for cell lines and patients [28][29][30][31]. Among the analysed RE, LINE1 are strongly represented in all results (Figs. 2, 3 and 4, Supplementary Tables 1 and 2). The LINE1 code for at least a nucleic acid binding protein with chaperone activity (ORF1p) and a reverse transcriptase/endonuclease (ORF2p). Importantly, autoantibodies targeting the LINE1 ORF2p endonuclease domain have been reported in 41% of SARS-CoV-1 patients [55]. The RE are also targeted by autoantibodies in several connective tissue diseases; for example, antibodies against LINE1's ORF1p or the envelope protein of the ERV HERV-K have been described in patients with systemic lupus erythematosus, lupus nephritis, rheumatoid arthritis, Sjögren's syndrome and mixed connective tissue disease [56][57][58][59][60][61][62][63][64][65].
With regard to SARS, the autoantibodies' target, LINE1 ORF2p, was prominently stained post-mortem in lung macrophages (residing in blood vessels), leading the authors to suspect a build-up of autoreactive CD4+ Th cells and, thus, an autoimmune loop in SARS [55]. Importantly, there is also increasing evidence for an autoimmune pathogenesis in severe COVID-19 [13-27, 66, 67]. One explanation for autoantibody formation is molecular mimicry, i.e. shared epitopes between pathogens and hosts [68][69][70][71][72]. The evolution of mimicry epitopes in pathogens could be based on chance. However, although the RE-identical sequences observed in CoV are short (12-35 bp), their lengths make formation by chance highly unlikely. For example, four possible bases (A, T, C, G) over a sequence of 18 bp give $4^{18}$ = 68,719,476,736 possible combinations; thus, the chance of obtaining one given identical sequence is about 1 in 69 billion. Additionally, more than 18,000 12 bp events (Table 1) occurring by chance is stochastically very unlikely ($4^{12}$ = 16,777,216 possible combinations). Moreover, an observed 35 bp hit such as ERVL_Xq21.31b corresponds to $4^{35} \approx 1.18 \times 10^{21}$ possible combinations; thus, the chance of obtaining an identical sequence is about 1 in 1.2 sextillion, without accounting for all the other matching sequences. Therefore, recombination activities more probably account for the phenomena observed. The exchange of genetic material by recombination in RNA viruses is generally associated with virulence, host range and host response [73]. It is known that recombination in CoV can take place at a high frequency during co-infections by homologous and non-homologous recombination [74][75][76]. Mechanistically, an explanation could be the switching of the RdRp between multiple available RNA strands during replication [77]. This could have happened in a CoV host/ancestor with relevant LINE1 expression, as is possible in some bat species.
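The combinatorial figures quoted in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below only reproduces the counting step, under the same simplifying assumption as the text that all four bases are equally likely and independent at each position. It does not model genome length or the number of sequences compared, and the function name is chosen for illustration.

```python
def n_sequences(length_bp, alphabet_size=4):
    """Number of distinct sequences of a given length over A, T, C, G."""
    return alphabet_size ** length_bp


# Lengths discussed in the text: 12 bp, 18 bp and the 35 bp ERV hit.
for length in (12, 18, 35):
    n = n_sequences(length)
    # Python integers are exact, so large powers are not rounded here.
    print(f"{length} bp: {n:,} possible sequences (1 in {n:.2e})")
```

Because $4^{35} = 2^{70}$, the last value is exactly 1,180,591,620,717,411,303,424, which matches the rounded figure of about $1.18 \times 10^{21}$.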
The black-bearded tomb bat (Taphozous melanopogon), for example, harbours two active LINE families [78] and shows relevant SARS-CoV-2 infection efficiency [79]. Moreover, many ERV families also reside in bats [80]. Therefore, serial acquisition of RE sequences, possibly taken up by CoV in host animals (starting many million years ago), is a feasible scenario. With regard to the rather short sequence lengths observed, there might be an evolutionary functional constraint working against the uptake of longer RE sequences, but a benefit for the virus in coating itself with host self-antigens ("self-peptide coat"). This would dampen the innate and adaptive immune response by the presentation of "viral but self-like" peptides. The consequence of this hypothesis is in line with the view of autoimmune disease as a breakdown of self-tolerance [81,82]. Based on the findings, autoantibodies targeting human RE could be a factor in CoV-induced disease, such as COVID-19. However, this report has limitations, as the data needed for a more extensive analysis of anti-RE autoantibodies in COVID-19 do not yet exist.