Conservation of the OmpC Porin Among Typhoidal and Non-Typhoidal Salmonella Serovars

Salmonella enterica infections remain a challenging health issue, causing significant morbidity and mortality worldwide. Current vaccines against typhoid fever display moderate efficacy, whilst no licensed vaccines are available for paratyphoid fever or invasive non-typhoidal salmonellosis. Therefore, there is an urgent need to develop high-efficacy, broad-spectrum vaccines that can protect against both typhoidal and non-typhoidal Salmonella. The Salmonella outer membrane porins OmpC and OmpF have been shown to be highly immunogenic antigens that efficiently elicit protective antibody and cellular immunity. Furthermore, enterobacterial porins, particularly OmpC, show a high degree of sequence and structural homology, making them suitable vaccine candidates. However, the degree of amino acid conservation of OmpC among typhoidal and non-typhoidal Salmonella serovars is currently unknown. Here, we used bioinformatic analysis to classify typhoidal and non-typhoidal Salmonella OmpC amino acid sequences into clades independently of their serological classification. Our analysis further determined that OmpC contains several amino acid sequences that are highly conserved among both typhoidal and non-typhoidal Salmonella serovars. Critically, some of these highly conserved sequences were located in the transmembrane β-sheets of the porin β-barrel and have immunogenic potential for binding to MHC-II molecules, making them suitable candidates for a broad-spectrum Salmonella vaccine. Collectively, these findings suggest that these highly conserved sequences may be used for the rational design of an effective broad-spectrum vaccine against Salmonella.

INTRODUCTION
Salmonella enterica infections remain a significant worldwide health problem, accounting for more than 120 million cases and approximately 1 million deaths annually (1,2). These high morbidity and mortality rates are caused mainly by enteric fevers (typhoid and paratyphoid) and by non-typhoidal Salmonella (NTS) gastroenteritis (1-3). Furthermore, invasive NTS bacteremia (iNTS) is a common complication in immunocompromised adults and in young children with severe malaria and malnutrition (4). The currently available licensed vaccines against Salmonella are the oral live attenuated Ty21a, the Vi capsular polysaccharide (Vi-CPS), and the Vi-tetanus toxoid conjugate (Vi-TT); all three target only the Typhi serovar and have shown variable efficacy: 50% (95% CI 35-61%) for Ty21a, 55% (95% CI 30-70%) for Vi-CPS, and 54.6% (95% CI 26.8-71.8%) for Vi-TT (5,6). No licensed vaccines against iNTS are currently available (7). Although vaccination with Ty21a can induce cross-reactivity against the Paratyphi A, Paratyphi B, and Enteritidis serovars (8,9), cross-protection has been reported only against Paratyphi B (10). Therefore, there is an urgent need for novel broad-spectrum vaccines against Salmonella that must be based on shared key structural components inducing protective immune responses against both typhoidal and NTS serovars.
Porins are among the most abundant outer membrane proteins (Omp) in Gram-negative bacteria; they play a crucial role in the diffusion of small hydrophilic compounds and are essential for bacterial survival and pathogenicity (11,12). Porins are β-barrel structures consisting of 16 β-sheets (β), with 8 internal periplasmic turns (T) and 8 extracellular loops (L) (13,14). Salmonella and other Gram-negative bacteria express two major porins, OmpC and OmpF (15-17). We have previously shown that S. Typhi OmpC and OmpF porins efficiently elicit innate immune responses through TLR-mediated activation of antigen-presenting cells (18) and induce long-lasting porin-specific bactericidal antibody and cell-mediated immune responses (19-22). However, the basis of the antigen specificity of Salmonella porins is not well understood. Previous studies have shown that OmpC has a high degree of sequence and structural homology among Enterobacteriaceae porins (11,13,15,23,24). Consistent with this, antibody and cell-mediated cross-reactivity among porins of different Salmonella serovars has been widely reported in mouse models (19,24-28). However, the degree of amino acid conservation of OmpC among typhoidal and NTS serovars remains unknown. Using bioinformatics, we found that typhoidal and NTS OmpC amino acid sequences can be classified into eight clades that are independent of serovar classification. In addition, we found that OmpC contains three distinct amino acid sequences that are highly conserved among typhoidal and NTS serovars. These highly conserved sequences are located along the transmembrane β-sheet domains of the porin β-barrel. Furthermore, we found that one of the highly conserved OmpC sequences is present exclusively in Salmonella and not in other enterobacterial porins, and has the potential to bind MHC-II molecules.
Collectively, our results show that the porin OmpC of Salmonella contains highly conserved amino acid sequences, which could be used for the rational design of an effective, broad-spectrum vaccine against Salmonella.

Conservation Analysis
Full-length OmpC protein sequences from typhoidal and NTS serovars (Typhi, Paratyphi A, B, and C, Typhimurium, Enteritidis, Dublin, and Gallinarum) were collected from the NCBI Entrez protein database using Taxonomy IDs (Txid). The OmpC sequences from all serovars were aligned using the Clustal Omega standalone binary 1.2.1 (29) and used to construct a neighbor-joining tree with the Jukes-Cantor model, resampled with 100 bootstraps; the sequences were then separated into 8 clades. Porin conservation was assessed using in-house software based on a sliding-window approach. Amino acid conservation within each clade was assessed using a 15-amino-acid window, and each window was assigned a mean conservation value between 0 and 1, determined by amino acid similarity, where 0 represents a fully conserved window. Windows with a mean value below the first quartile of all windows were classed as conserved (intra-clade). These values were used to generate an intra-clade conservation plot representing the mean window conservation across the entire protein sequence. Conservation across clades (inter-clade) was then assessed by identifying windows at the same position that were conserved within their respective clades (i.e., mean window value below the first quartile); each such position was given an arbitrary value between 0 and 1000 to indicate the magnitude of inter-clade conservation. A consensus sequence was created from the clades with shared conservation.
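The sliding-window scoring described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' in-house software: the per-column score used here (1 minus the frequency of the most common residue, so 0 means fully conserved) is an assumed identity-based stand-in for the similarity measure, gaps are treated as ordinary characters, all clade alignments are assumed to share the same column coordinates, and the 0-1000 inter-clade scaling is not modelled. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean, quantiles


def column_scores(alignment):
    """Per-column diversity score; 0.0 means every sequence has the same residue.

    Assumption: identity-based (1 - frequency of the most common residue),
    used here in place of the similarity measure of the in-house software.
    """
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):  # iterate over aligned columns
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(1.0 - top_count / n_seqs)
    return scores


def window_means(scores, window=15):
    """Mean score of every contiguous window (one value per window start)."""
    return [mean(scores[i:i + window]) for i in range(len(scores) - window + 1)]


def conserved_windows(alignment, window=15):
    """Window starts whose mean lies strictly below the first quartile of all windows."""
    means = window_means(column_scores(alignment), window)
    q1 = quantiles(means, n=4)[0]
    return {i for i, m in enumerate(means) if m < q1}


def shared_conservation(clade_alignments, window=15):
    """For each window start, count the clades in which that window is conserved.

    Assumes every clade alignment uses the same column coordinates.
    """
    per_clade = [conserved_windows(aln, window) for aln in clade_alignments]
    n_windows = len(clade_alignments[0][0]) - window + 1
    return [sum(start in conserved for conserved in per_clade) for start in range(n_windows)]


# Toy example with invented sequences and a 3-residue window.
clade = ["AAAAAAAA", "AAAACCAA", "AAACDDCA", "AACDEECA"]
print(conserved_windows(clade, window=3))  # {0}
```

Following the text, a window counts as conserved only when its mean is strictly below the first quartile, so windows tied at the quartile value (for example, many fully conserved windows all scoring 0) are not counted.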

MHC-II Peptide-Binding Prediction
To evaluate the potential immunogenicity of the conserved OmpC sequences, the MHC-II binding prediction tool of the Immune Epitope Database (IEDB, https://www.iedb.org) was used. MHC-II binding predictions were performed on October 18, 2019, using the IEDB analysis resource Consensus tool (34,35). The predicted output is given as IC50 values (nM) for the combinatorial library and SMM_align methods; hence, a lower number indicates a higher affinity. As a rough guideline from the IEDB, peptides with IC50 values <50 nM are considered high affinity, <500 nM intermediate affinity, and <5,000 nM low affinity. Most known epitopes have high or intermediate affinity (36).
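The IEDB rough guideline quoted above amounts to a simple binning of predicted IC50 values. The sketch below is illustrative only; the function name, the label used for values of 5,000 nM or more, and the example predictions are hypothetical and are not IEDB output.

```python
def affinity_class(ic50_nm):
    """Bin a predicted IC50 (nM) using the IEDB rough guideline (lower = stronger binding)."""
    if ic50_nm < 0:
        raise ValueError("IC50 must be non-negative")
    if ic50_nm < 50:
        return "high"
    if ic50_nm < 500:
        return "intermediate"
    if ic50_nm < 5000:
        return "low"
    return "outside guideline"  # hypothetical label for values >= 5,000 nM


# Hypothetical predictions as (allele, IC50 in nM); placeholders, not real results.
predictions = [("allele_1", 35.0), ("allele_2", 420.0), ("allele_3", 7200.0)]
# Keep alleles in the high/intermediate range, where most known epitopes fall.
likely = [allele for allele, ic50 in predictions
          if affinity_class(ic50) in ("high", "intermediate")]
print(likely)  # ['allele_1', 'allele_2']
```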

BLAST Analysis
The conserved sequences of the Salmonella OmpC porin were compared against the non-redundant (nr) GenBank database using the standard protein Basic Local Alignment Search Tool (BLAST), BLASTP 2.10.0+, with default options (37). Fast minimum evolution pairwise alignment trees were constructed using the default options of BLASTP 2.10.0+ (maximum sequence difference 0.85, Grishin distance).
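To make the percent-identity figures reported from BLASTP concrete, the toy function below computes the percent identity of the best gap-free placement of a short peptide within a longer sequence. It is a simplified, hypothetical stand-in for illustration only: BLASTP performs seeded, gapped local alignment with a substitution matrix and E-value statistics, none of which is reproduced here, and the subject sequences used are made up.

```python
def best_ungapped_identity(query, subject):
    """Return (percent identity, offset) of the best gap-free placement of query in subject.

    Toy illustration only: no gaps, no substitution matrix, no E-values (unlike BLASTP).
    """
    if len(query) > len(subject):
        raise ValueError("query is longer than subject")
    best = (-1.0, -1)
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        matches = sum(q == s for q, s in zip(query, window))
        identity = 100.0 * matches / len(query)
        if identity > best[0]:  # keep the first best-scoring offset
            best = (identity, offset)
    return best


R1 = "KGETQINDQLTGY"  # conserved region R1 as given in the text
print(best_ungapped_identity(R1, "MMKGETQINDQLTGYMM"))  # made-up subject; (100.0, 2)
```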

Statistical Analysis
Statistics were calculated using linear regression in GraphPad Prism 6.0. P values ≤ 0.05 were considered significant.
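For readers without GraphPad Prism, the quantities behind these regressions can be reproduced with ordinary least squares. The sketch below (hypothetical function names, pure Python) returns the slope, intercept, and R², plus the t statistic for a non-zero slope on n - 2 degrees of freedom; converting t to a p value requires the Student t distribution, which is not implemented here. As a check, the R² = 0.7709 reported in the Results for 8 clades gives t ≈ 4.49 on 6 degrees of freedom, which lies between the two-sided critical values for p = 0.005 and p = 0.002, consistent with the reported p = 0.00413.

```python
from math import sqrt


def linear_regression(x, y):
    """Ordinary least squares fit y = slope * x + intercept; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def slope_t_statistic(r_squared, n):
    """t statistic for H0: slope = 0, with n - 2 degrees of freedom."""
    return sqrt(r_squared * (n - 2) / (1 - r_squared))


print(round(slope_t_statistic(0.7709, 8), 2))  # 4.49
```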

Identification of Conserved Amino Acid Sequences in the Porin OmpC Among Typhoidal and Non-Typhoidal Salmonella Serovars
To determine the degree of conservation among OmpC amino acid sequences from clinically relevant typhoidal (Typhi, Paratyphi A, B, and C) and non-typhoidal (Typhimurium, Enteritidis, Dublin, and Gallinarum) Salmonella serovars (38), we retrieved and aligned 761 Salmonella OmpC amino acid sequences and assessed conservation within serovars using in-house software (see Methods) (Table 1). However, sequences within serovars showed very poor identity, likely because serovars are defined serologically rather than by OmpC sequence (39). Therefore, OmpC sequences from these serovars were used to construct a neighbor-joining tree, and the sequences were separated into 8 clades (Figure 1A, Table 2). Each clade contained a mixture of serovars (Figures 1B,C). Clades A, B, C, D, F, and H consisted mostly of non-typhoidal serovars, while clades E and G consisted mostly of typhoidal serovars, with clade E containing only sequences from Paratyphi A and B. As expected, the greater the number of sequences per clade, the greater the number of serovars it contained (R² = 0.7709, **p = 0.00413). Clade D contained sequences from all 8 serovars analyzed, followed by clade A with 7 serovars, clades C and F with 6 serovars, clade H with 5 serovars, clades B and G with 4 serovars, and clade E with 2 serovars. The most widely distributed serovars among the clades were Paratyphi A and B, which had sequences present in all clades. Salmonella Enteritidis was found in all clades except clade E, while Typhimurium and Dublin were each found in 6 clades. Typhi and Paratyphi C were present in only 3 clades, whereas Gallinarum was found in only one clade. Next, we assessed the degree of conservation of full-length OmpC sequences within each clade (intra-clade; Figure 2A).
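The clade composition above can be tabulated from (clade, serovar) membership pairs: counting distinct serovars per clade and distinct clades per serovar gives the two margins of the same incidence table, so both must sum to the same total. The sketch below uses hypothetical function names and toy membership pairs; as a check, the counts reported in the text sum to 42 in both directions.

```python
from collections import defaultdict


def incidence_marginals(memberships):
    """memberships: iterable of (clade, serovar) pairs, one per serovar found in a clade.

    Returns (serovars per clade, clades per serovar) as dicts of counts;
    duplicate pairs are counted once.
    """
    serovars_by_clade = defaultdict(set)
    clades_by_serovar = defaultdict(set)
    for clade, serovar in memberships:
        serovars_by_clade[clade].add(serovar)
        clades_by_serovar[serovar].add(clade)
    per_clade = {c: len(s) for c, s in serovars_by_clade.items()}
    per_serovar = {s: len(c) for s, c in clades_by_serovar.items()}
    return per_clade, per_serovar


# Counts reported in the text; both margins of the incidence table should total 42.
reported_per_clade = {"A": 7, "B": 4, "C": 6, "D": 8, "E": 2, "F": 6, "G": 4, "H": 5}
reported_per_serovar = {"Paratyphi A": 8, "Paratyphi B": 8, "Enteritidis": 7,
                        "Typhimurium": 6, "Dublin": 6, "Typhi": 3,
                        "Paratyphi C": 3, "Gallinarum": 1}
assert sum(reported_per_clade.values()) == sum(reported_per_serovar.values()) == 42
```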
This analysis showed a pattern of diversity and conservation across the protein that was unique to each clade; however, some clades showed similar conservation fingerprints. For example, clades D and F showed a similar trend at the N-terminus, whereas clades F and G were more similar toward the C-terminus. There was no significant correlation between the median conservation value of a clade and either the number of serovars it contained (R² = 0.09655, p = 0.6412) or the number of sequences it contained (R² = 0.008383, p = 0.8293). Conservation between clades was then assessed (Figure 2B): 5, 15, 23, 28, 16, 8, and 2% of the protein was covered by regions conserved in 8, 7, 6, 5, 4, 3, and 2 clades, respectively, whereas 3% of the protein sequence, located centrally at positions 236-246, showed no conservation across clades. Remarkably, when more than 5 clades were compared, two regions of distinct cross-clade conservation were seen at amino acid positions 50-200 (X) and 320-430 (Y), suggesting two regions of functional importance. Within these locations, three sequences (R1-R3) showed a high degree of conservation among all Salmonella clades analyzed (Table 3). Collectively, these data show that the OmpC porin contains distinct amino acid sequences that are highly conserved among typhoidal and non-typhoidal Salmonella serovars.
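Given, for each aligned position, the number of clades that share conservation there (for example, per-window counts from a sliding-window analysis), the coverage percentages and the contiguous fully shared regions can be summarized as below. This is a hypothetical sketch with invented toy input, not the authors' software. As a sanity check on the reported values, 5 + 15 + 23 + 28 + 16 + 8 + 2% for regions shared by 8 down to 2 clades, plus 3% with no cross-clade conservation, sums to 100%.

```python
from collections import Counter


def coverage_by_clade_count(shared_counts):
    """Percent of positions conserved in exactly k clades, for each observed k."""
    total = len(shared_counts)
    tally = Counter(shared_counts)
    return {k: 100.0 * n / total for k, n in sorted(tally.items())}


def conserved_runs(shared_counts, min_clades):
    """Contiguous (start, end) ranges, end exclusive, where >= min_clades clades share conservation."""
    runs, start = [], None
    for i, k in enumerate(shared_counts):
        if k >= min_clades and start is None:
            start = i  # a run begins
        elif k < min_clades and start is not None:
            runs.append((start, i))  # a run ends before position i
            start = None
    if start is not None:
        runs.append((start, len(shared_counts)))  # run extends to the end
    return runs


toy_counts = [8, 8, 7, 0, 8, 8]  # invented per-position clade counts
print(coverage_by_clade_count(toy_counts))
print(conserved_runs(toy_counts, 8))  # [(0, 2), (4, 6)]
```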

The Conserved Amino Acid Sequences Are Located Along the β-Sheets of OmpC Porin
Next, we sought to identify the location of the conserved regions along the secondary structure of S. Typhi OmpC, the only crystal structure of a Salmonella OmpC porin reported to date (Figure 3A) (32). The conserved region R1 (KGETQINDQLTGY) was located partially along the β3 β-sheet, the periplasmic turn T3, and part of the β4 β-sheet. The conserved region R2 (WTRLAFAGLKFA) was located along the β5 β-sheet. Finally, the conserved region R3 (GFANKTQNFEVVAQYQFDFGLRPSQAYLSKG) was located along the β13 β-sheet, the periplasmic turn T7, and the β14 β-sheet. Visualization of the conserved amino acid sequences on the S. Typhi OmpC crystal structure showed that most of the conserved sequences were distributed along the porin β-barrel (Figure 3B). Collectively, our data show that the amino acid sequences conserved among Salmonella clades are located along the β-sheets and periplasmic turns of the OmpC porin β-barrel.
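Mapping a conserved motif onto secondary-structure elements reduces to locating the motif in the structure's sequence and intersecting that span with annotated segment boundaries. In the sketch below, the function names, the padded sequence, and the segment coordinates are all hypothetical toy values, not the actual S. Typhi OmpC topology; real boundaries would come from the crystal structure (32).

```python
def locate_motif(sequence, motif):
    """0-based (start, end) of the first exact occurrence of motif, end exclusive."""
    start = sequence.find(motif)
    if start == -1:
        raise ValueError("motif not found")
    return start, start + len(motif)


def overlapping_segments(span, segments):
    """Names of annotated segments (name, start, end; end exclusive) that overlap span."""
    s, e = span
    return [name for name, seg_start, seg_end in segments if seg_start < e and s < seg_end]


# Toy inputs: a padded R1 motif and invented segment coordinates (hypothetical).
toy_sequence = "XXXXKGETQINDQLTGYXXXX"
toy_segments = [("beta3", 0, 8), ("T3", 8, 12), ("beta4", 12, 30), ("L_next", 30, 40)]
span = locate_motif(toy_sequence, "KGETQINDQLTGY")
print(span, overlapping_segments(span, toy_segments))  # (4, 17) ['beta3', 'T3', 'beta4']
```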

The Conserved Amino Acid Sequence R1 Is Exclusive to Salmonella
Because porins from Enterobacteriaceae show high sequence similarity (11,13,15,24), we asked whether the conserved sequences were exclusive to the Salmonella porin OmpC. BLASTp analysis of R1 indicated that this amino acid sequence was found in several OmpC porins from Salmonella enterica serovars, as well as in other Salmonella porins, such as OmpS2 and PhoE (E-value 3 × 10⁻⁴, 100% identity). In addition, BLASTp results showed that the R1 sequence (KGETQINDQLTGY) was also present in the OmpC and OmpF porins of the plant-associated genus Pantoea (40) (Figure 4). Conversely, the R2 sequence (WTRLAFAGLKFA) was not exclusive to Salmonella serovars, since BLASTp results showed that it was also found in the porins OmpC and OmpN of Escherichia coli and Klebsiella sp. (E-value 3 × 10⁻³, 100% identity) (Figure 5). Finally, the R3 sequence (GFANKTQNFEVVAQYQFDFGLRPSQAYLSKG) was present in several Salmonella enterica serovar OmpC porins, but also in other porins, such as PhoE, OmpC, and OmpF, from other Enterobacteria, including E. coli, Enterobacter sp., Citrobacter sp., Klebsiella sp., and Rahnella sp. (E-value 6 × 10⁻¹⁹ to 4 × 10⁻¹⁸, 90.62% identity) (Figure 6). Next, based on HLA allele frequencies and reference sets with maximal population coverage, we predicted the MHC-II alleles to which the conserved R1 sequence could bind (Table 4).

Figure 2. (A) Intra-clade conservation of OmpC; a conservation value below the first quartile was classed as conserved for each clade. (B) Inter-clade conservation of OmpC among typhoidal and non-typhoidal Salmonella serovars; colors indicate the number of clades sharing conservation, arrows mark the regions conserved among all clades, and gray bars mark regions of distinct cross-clade conservation. X-axis: position in the aligned amino acid consensus sequence; Y-axis: conservation value.

DISCUSSION
The development of novel tools for detecting conserved sequences among vaccine candidates is particularly relevant to the discovery of shared antigenic determinants, which could be used for the rational design of broad-spectrum vaccines. In this study, we used in-house software to evaluate the amino acid sequence conservation of the OmpC porin among typhoidal and NTS serovars. Although OmpC has been reported to have a high degree of sequence and structural homology among Enterobacteriaceae porins, most of these works have focused on defining differences between the OmpC amino acid sequences of Salmonella serovars and those of other Enterobacteria, and have not shown the degree of conservation of OmpC among serovars (11,13,15,23,24). To our knowledge, this work is the first to determine the degree of amino acid conservation of the OmpC porin among typhoidal and NTS serovars and to define the conserved regions of OmpC among Salmonella serovars. Previous reports have shown that the OmpC transmembrane regions are homologous in sequence and structure among Enterobacteria (13,41). Consequently, it was expected that most of the conserved sequences among Salmonella OmpC would be located along the transmembrane β-sheets of the porin β-barrel, as our results show. By contrast, none of the conserved amino acid regions were located along the surface-exposed loops; furthermore, we identified a region of OmpC with no conservation across clades that corresponds to the external loop L4, which has been shown to be one of the most antigenically variable regions of OmpC (42-44).
Frontiers in Immunology | www.frontiersin.org
Our results show that the Salmonella OmpC conserved regions are located along the transmembrane β-sheets of the porin β-barrel. One explanation for this could be that some of the amino acid sequences in the conserved regions R1 and R3 of Salmonella OmpC are located in subunit contact regions, which are highly conserved among Enterobacteriaceae porins (24,45).
Likewise, it has been reported that the arginine residue (R-95) contained in the conserved region R2 is involved in pore formation (15,46), which could explain the conservation of this region among Salmonella serovars.
The finding that the conserved sequence R1 was found exclusively in Salmonella porin sequences and not in other gut-associated Enterobacteria suggests that the immune response induced by this sequence should be Salmonella-specific. Conversely, the sequences of regions R2 and R3 were also found in porins of other commensal and pathogenic Enterobacteria, such as E. coli, Klebsiella sp., Enterobacter sp., Citrobacter sp., and Rahnella sp., indicating that they are not exclusive to Salmonella porins. Our results show that the conserved OmpC sequence R1 can potentially bind to human MHC-II molecules. Similar results have been reported for human CD4+ T cell epitopes conserved between meningococcal and gonococcal Neisseria porins (47,48). Furthermore, it has been reported that Enterobacteriaceae porins have crucial antigenic epitopes corresponding to regions buried within the outer membrane, which are also highly conserved among enterobacterial species (24). Some of the amino acid residues from the OmpC conserved region R1 (GFKGETQ) have also been shown to be highly conserved among Enterobacteriaceae porins because of their location in a crucial domain involved in porin subunit interactions (24,45). In addition, some of the amino acid sequences conserved among Salmonella serovars (R1 and R3) have previously been reported as antibody targets or predicted as potential B cell epitopes (11,24,49); however, it remains unknown whether any of the conserved sequences can also be recognized by antibodies. Further studies are also needed to determine the contribution of MHC-restricted responses to the immunogenicity of the conserved OmpC peptides in T cells.
Previous work identified two MHC-I-restricted epitopes in the Salmonella OmpC porin (50); strikingly, one of these CD8+ T cell-specific peptides (TRVAFAGL) contains residues identical or similar to those of the conserved region R2. Future work will need to determine whether CD8+ T cells from healthy donors or convalescent patients also recognize some of the conserved OmpC sequences.
Although this work has shed some light on the antigen specificity of the Salmonella OmpC porin among typhoidal and NTS serovars, several questions remain unanswered. For instance, the cytokine profile of OmpC-specific CD4+ T cells remains to be determined, as we have previously shown that vaccination of healthy volunteers with either the Ty21a vaccine or Salmonella porins induces IFN-γ- and TNF-α-producing CD4+ T cells (20,22). Furthermore, it remains unknown whether the conserved OmpC peptide sequences can also be recognized by T cells from convalescent patients or from healthy volunteers challenged with typhoidal and NTS Salmonella serovars. Finally, because our current porin-based vaccine candidate is a mixture of the OmpC and OmpF porins, the degree of conservation of OmpF among typhoidal and non-typhoidal Salmonella remains to be determined using the same methodology.
In conclusion, our work is the first to specifically establish the degree of conservation of the porin OmpC among typhoidal and non-typhoidal Salmonella serovars and to define the specific amino acid sequences with the highest degree of conservation among typhoidal and NTS serovars. Furthermore, we found that one of the highly conserved OmpC amino acid sequences is exclusive to Salmonella and has immunogenic potential for MHC-II binding. Considering that porins are highly immunogenic and protective vaccine candidates against Salmonella infections, our findings may lead to a better understanding of the basis of the antigen specificity of Salmonella porins, which could be used to design tools for monitoring the porin-specific immune response after challenge or vaccination and could have direct implications for the rational design of a broad-spectrum vaccine against Salmonella.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/supplementary material.

AUTHOR CONTRIBUTIONS
NV-P and JB performed the experiments, analyzed the results, and wrote the paper. MP-T and GA-V analyzed the results and wrote the paper. PKe wrote the in-house computer program. AK, CP-S, and CG-C analyzed results and revised the manuscript. IW-B, LS-T, RP-P, and AI analyzed the results and revised the manuscript. AR-S, PKl, and CL-M designed the study, supervised the experiments, and revised the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.