A Bioinformatics Analysis Reveals Novel Pathogens as Molecular Mimicry Triggers of Systemic Sclerosis

A recent bioinformatic analysis revealing dominant B cell epitopes of systemic sclerosis-specific autoantibodies, including anti-centromere B, anti-topoisomerase I and anti-fibrillarin, has demonstrated the existence of several in silico antigenic mimics of pathogens that could act as triggers of the respective dominant autoepitopes. Based on those findings, the aim of the present study was to use a more comprehensive bioinformatic analysis. We demonstrated the presence of a plethora of novel microbial mimics, unnoticed by the studies so far conducted, which share remarkable amino acid similarities with the respective autoantigenic epitopes. This bioinformatic approach coupled by in vitro testing of the homologous self/non-self-mimics in serum samples from patients with systemic sclerosis may provide novel evidence of immunological cross-reactivity, implicating currently ignored or overlooked pathogens, which may indeed play a role in the induction of SSc-specific autoantibodies and assist efforts to understand the pathogenesis of this enigmatic disease.


INTRODUCTION
Molecular mimicry has been proposed as a pathogenic mechanism for an autoimmune disease's development, as well as a probe useful in uncovering its aetiologic agents. 1,2 The concept of molecular mimicry is based on the abundant epidemiological, clinical, and experimental evidence of an association of infectious agents with specific autoimmune diseases, and observed immunological cross-reactivities between self-targets and microbial determinants. 3,4 Immunological cross-reactivity is demonstrated when antibodies against a viral or microbial epitope (linear or conserved) can recognize a self-sequence of a known autoantigen, because of their antigenic similarity. In the case of linear epitopes, this similarity is likely to be a consequence of amino acid (aa) similarities between the core epitopic region of the respective self/non-self-epitopes. 2,4 If this occurs, a cross-reactive immune response against the determinant shared by the host and the virus can evoke a tissue-specific immune response that is presumably capable of eliciting cell and tissue destruction. The probable mechanism is generation of autoantibodies or cytotoxic cross-reactive effector lymphocytes, that recognise specific determinants on target cells. 4 By a complementary mechanism, the microbe can induce cellular injury and release self-antigens, which generate immune responses that cross-react with additional but distinct self-antigens, a mechanism which we have termed "multiple hit" molecular mimicry. However, the ultimate documentation of this mechanism has been elusive in humans. Presently, proof for molecular mimicry relies on the availability of structural data of viral and self-peptides that elicit cross-reactive immune responses and pathological features of the disease in experimental animal models. 2,4,5 The information derived from animal studies is of undoubted value, but experimental models of human disease suffer from severe limitations since they rarely reproduce the human condition faithfully. [6][7][8] Thus, several of the mimics that have been able to induce experimental autoimmune disease do not ultimately relate to the human setting. Molecular mimicry studies conducted in human biomaterial are almost impossible to undertake because they necessitate access to biomaterial (sera, cells, tissue) before exposure to the pathogen; such material is rarely accessible for testing. 9 Several studies have implicated molecular mimicry as a likely mechanism responsible for the induction of systemic sclerosis (SSc)-specific autoantibodies. [10][11][12][13][14][15][16][17][18] Sera from SSc patients recognize a sequence originated from human cytomegalovirus (CMV) late protein UL94 which appears to be homologous to the novel antigen-2 (NAG-2), which is expressed in endothelial cells. 12 Anti-UL94 antibodies from SSc patients not only bind to NAG-2 on endothelial cells, but also can provoke apoptosis. 14 This is rather intriguing, because when anti-UL94 CMV antibodies are bound to fibroblasts, they obtain a profibrotic phenotype. Another similarity between UL70 protein shared with Topoisomerase I, a major autoantigen in SSc, has been reported as a likely triggering factor in this disease, but no evidence of cross-reactivity has so far been obtained. 10 A recent paper by Gourh et al. has revisited this topic and investigated the relationships between disease-related autoantibodies, their respective autoantigens/autoepitopes, and genetic HLA associations, which confer susceptibility to geographically/ethnic distinct SSc cohorts. 19 Through their meticulous analysis, and the application of bioinformatics tools, these investigators recognized viral-obtained peptidyl sequences from the Mimiviridae and Phycodnaviridae families, which are highly homologous to their SSc-specific autoantigenic counterparts, and could theoretically act as inducers of the disease via immunological cross-reactivity and molecular mimicry. 19 Going through their identified viral sequences, it became apparent to us that the number of reported mimics was relatively smaller than what we thought we could obtain. We therefore decided to comprehensively assess the extent of self/non-self-mimicry and to provide a complete list of likely mimics also using a bioinformatics approach.

MATERIAL AND METHODS
Basic Local Alignment Search Tool Protein (BLASTp), BLASTp2 (National Center for Biotechnology Information, NCBI, National Library of Medicine, Bethesda, MD, USA) (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins) and Proteininfo (Rockefeller University, New York, NY, USA) (http://prowl.rockefeller.edu/prowl/proteininfo. html) computer-assisted programmes were used for aa comparisons and identification of peptidyl sequences homologues to those used as reference sequences reported by Gourh et al. "RQRAVALYFIDKLAL", from the immunodominant topoisomerase I epitope and "LQEAAEAFLVHLFED" from the centromere A (CENPA) epitope. 19 The (BLASTp) is a sequence comparison algorithm, enables comparison of a protein or peptide sequence with a database of sequences and identifies similar and potentially homologous sequences. The comparison of sequences is based on substitution matrices (such as BLOSUM62 that is used by BLAST by default) that score alignments of evolutionarily divergent protein sequences. We performed our BLASTp analyses as per Gourh et al. with slight modifications. When necessary, confirmatory Proteininfo analyses were performed, as we previously described. [20][21][22][23][24][25] The Ethics Committee of the University General Hospital of Larissa, Greece granted an approval for the completion of the study in the University of Thessaly. Statistical analyses for the significance of the observed similarities were automatically obtained by the acquisition of aligned sequences via BLASTp search engines.

RESULTS
Taking advantage of the work performed by Gourh et al., who determined the immunodominant human topoisomerase I epitope, CENPA fibrillarin and its microbial mimics, we expanded this work emphasizing key elements that have gone unnoticed. First, we explored homology between the bioinformatically predicted immunodominant peptide sequences and microbial protein sequences to assess whether the mimicries we identified by BLASTp analysis differ to those reported by Gourh et al. (Figure 1 and 2, Suppl. . In addition to that, we performed a Proteininfo analysis using 5meric sequences originated from topoisomerase I epitope "RQRAVALYFIDKLAL," and CENPA sequence "LQEAAEAFLVHLFED", 19 and requested for identical motifs being present in microbial and human sequences. Initially, the bioinformatically predicted immunodominant peptide sequences from topoisomerase I were compared for homology with microbial protein sequence databases. The approaches used by Gourh et al. and by the present study are illustrated in Figure 3. The Several hundreds of homologous sequences were identified in various species some of which differed from those recently published. 19 Gourh et al. reported only one sequence in topoisomerase I, "RQRAVALYFIDKLAL," that had remarkable high-scored matches within the viral database at an Expected (E) value of <0.05. E-value represents the number of BLAST alignments with the observed score or higher than that are expected to occur by chance in a database search, and is a measure of the significance of homology. Some of the alignments produced by BLAST could be due to chance and not due to a biologically meaningful relationship between the two sequences. These alignments would have a high E-value. In contrast, alignments with low E-values are not random, and the two sequences might be related biologically. This is why Gourh et al. reported only highscored matches with the E values of <0.05. The Authors reported that the matched homologous peptides originated from the nucleocytoplasmic large DNA virus clade viruses, in the Mimiviridae family. 19 We anticipated that analyses limited to matches with high quality matched, might lead to an underestimation of potential antigenic mimics for reasons we thoroughly For each sequence (i.e. the 15-mer of and the 15-mer of topoisomerase I), the first five-homology pairs with viruses and the first five-homology pairs with bacteria were chosen, according to the following criteria: -The pairs ought to have at least 6 (≥6) consecutive homologous aa, allowing one aa difference, but of the same side chain group (i.e. IV or YF) -Only one pair was chosen from same species -No pairs from phages were included -No pairs from uncultured bacteria or viruses were included -No hypothetical proteins were included   Table 3 provides details of the Taxid numbers used during BLASTp search. E: expected value. The "Identities" value corresponds to the number of amino acids that are identical between the two groups, while the "Positives" value corresponds to the number of amino acids that are identical or conservatively substituted. The "Gaps" value denotes the number of gaps in the alignment. In Figures  1 and 2, we included the values "Consecutive identity" and "Local E-value" to emphasize the local similarity between the aligned sequences. The "Consecutive identity" value corresponds to the number of amino acids that are identical in a row and the "Local E-value" corresponds to the E-value calculated after aligning the original 15-mer sequence with each consecutive identical sequence alone. groups, while the "Positives" value corresponds to the number of amino acids that are identical or conservatively substituted. The "Gaps" value denotes the number of gaps in the alignment. In Figures 1 and   4 2, we included the values "Consecutive identity" and "Local E-value" to emphasize the local similarity between the aligned sequences. The "Consecutive identity" value corresponds to the number of amino 5 acids that are identical in a row and the "Local E-value" corresponds to the E-value calculated after aligning the original 15-mer sequence with each consecutive identical sequence alone.

TITLE
discuss. An extensive list of matches provided by our analysis in Suppl Tables

DISCUSSION
In the present study, we extended the understanding for the potential role played by molecular mimicry in the induction of SSc-specific autoimmunity. On the basis of a bioinformatics analysis, we provided an exhaustive list of microbial mimics of dominant B-cell autoantigenic epitopes, implying that several previously unnoticed pathogens may indeed play role in the development of autoreactive B cell responses related to SSc. This enormous number of pathogenic determinants shares remarkable aa homologies with two key autoepitopes, one originated from topoisomerase I (Scl70) and one from centromere A. Based on these findings, hypothesis-driven experiments working on the extent by which synthetic peptides spanning the respective aa sequences could be tested for antibody reactivity using sera from anti-centromere A and anti-Scl70 positive SSc patients.  For each sequence (i.e. the 15-mer of and the 15-mer of CENPA), the first five-homology pairs with viruses and the first five-homology pairs with bacteria were chosen, according to the following criteria: -The pairs ought to have at least 6 (≥6) consecutive homologous aa, allowing one aa difference, but of the same side chain group (i.e. IV or YF) -Only one pair was chosen from same species -No pairs from phages were included -No pairs from uncultured bacteria or viruses were included -No hypothetical proteins were included   Table 3 provides details of the Taxid numbers used during BLASTp search. E: expected value. The "Identities" value corresponds to the number of amino acids that are identical between the two groups, while the "Positives" value corresponds to the number of amino acids that are identical or conservatively substituted. The "Gaps" value denotes the number of gaps in the alignment. In Figures 1 and 2, we included the values "Consecutive identity" and "Local E-value" to emphasize the local similarity between the aligned sequences. The "Consecutive identity" value corresponds to the number of amino acids that are identical in a row and the "Local E-value" corresponds to the E-value calculated after aligning the original 15-mer sequence with each consecutive identical sequence alone.
Supplementary Table 3  it is well-known that homologies do not need to be remarkable in order to induce cross-reactivity; mimicries restricted to 5-6-aa long, may have great success in immunological terms. 26 For example, Kanduc et al. 26 found that 34 pentameric sequences from the viral capsid protein of human papillomavirus shared sequences with human proteins that are associated with cardiovascular diseases. Trost et al. showed that all human proteins present bacterial motifs at the level of 5-, 6-, 7-and 8-mer aa. 27,28 Thus, aa similarity or even antigenic mimicry (shared motifs between foreign and self-antigens) is not sufficient to establish a significant connection. If the mimicries do not involve human proteins with biologic function such as enzymes or proteins with critical function, such targeting may embrace biologic meaningless, and must be disregarded. Foreign/human T-cell mimicry is more complicated, as motif sharing may involve very limited number of aa, and most molecular mimics cannot be identified by sequence alignment alone. 5 Sharing of a 6-aa long peptide between hepatitis B virus DNA polymerase and myelin basic protein was enough to induce experimental encephalomyelitis in rabbits. In fact, databases currently exist, which provide open access to experimentally verified peptide sequences displaying molecular mimicry and immunological cross-reactivity, and can serve as sources of in vivo and in vitro research. 29 For example, epidemiological studies, micro-biological data and immunological evidence of mimicry currently support a link between ankylosing spondylitis, HLA B27, and bacterial infection. 30 A single substitution of one aa is more than enough to abrogate such an effect, highlighting the importance of the shared motifs. Dozens of papers have been published so far (including those generated by our group), reporting the existence of such mimicries that operate both at the B-cell (cross-reactive antibodies) and the T-cell level at various autoimmune diseases. 4,8,[20][21][22][23]31,32 We have also been able to show that mimicking alignments per se are not sufficient enough to reveal cross-reactive mimics. 24,25,33 On the other hand, epitopic sequences, which do not share 100% identity, may well be able to act as cross-reactive targets of an antibody, if the dissimilar aa are not sufficient to change the antigenicity of the sort peptides. This is exemplified in particular at T-cell epitope mimics. 5 Given the relatively limited number of previously observed similarities which involve SSc-specific autoantigens and pathogens, we hypothesized that BLAST search could identify more mimics, and therefore would uncover likely microbial and viral triggers of SSc autoantibodies. Such identification could provide a convincing theoretical explanation for non-self-triggered autoimmunity in SSc by the involvemnt of molecular (antigenic) mimicry and subsequent immunological cross-reaction. If the attack Step-wise approaches used by approaches used by Gourh et al. 19 and the current study. Step-wise approaches used by approaches used by Gourh et al. 19 and the current study. 14 15 TITLE of the pathogen is met by the formation of strong T-and B-cells responses, aiming at controlling or preventing the infiltration of the virus in cells and tissues, and if the immune system is equipped with the 'wrong' genetic background, which restricts such responses at epitopic level that accidentally share homologies with self-epitopes, a cataclysm of reactions may happen which leads to autoimmunity and immune-mediated inflammation. Of particular interest was that we recognized several mimics that Gourh et al. did not identify, just because they could not fulfill their strict scoring criteria, which, by the way, are totally justifiable. With these results at hand, we speculate that local identical mimics must further be examined. Neither Gourh et al. nor we have experimentally tested the identified mimics. Immunological cross-reactivity can and must be defined through humoral and cellular studies involving synthetic peptides, which must be recognized by serum samples or T-cells from patients with SSc seropositive for their respective disease-specific autoantibodies. Only then, we can speculate on their potential pathogenic significance.
In conclusion, the present bioinformatics analysis allows the development of a hypothesis that is valid and can/ must be tested in vitro and in vivo. It also provides important clues as to whether molecular mimicry is a likely mechanism for the induction of autoimmunity related to SSc, and opens a window of opportunity for specifically designed research focusing on the similarities so far published.

CONFLICT OF INTEREST
The authors declare no conflict of interest.