The Invasin and Complement-Resistance Protein Rck of Salmonella is More Widely Distributed than Previously Expected

ABSTRACT The rck open reading frame (ORF) on the pefI-srgC operon encodes an outer membrane protein responsible for invasion of nonphagocytic cell lines and resistance to complement-mediated killing. Until now, the rck ORF was only detected on the virulence plasmids of three serovars of Salmonella subsp. enterica (i.e., Bovismorbificans, Enteritidis, and Typhimurium). The increasing number of Salmonella genome sequences allowed us to use a combination of reference sequences and whole-genome multilocus sequence typing (wgMLST) data analysis to probe the presence of the operon and of rck in a wide array of isolates belonging to all Salmonella species and subspecies. We established the presence of partial or complete operons in 61 subsp. enterica serovars as well as in 4 other subspecies with various syntenies and frequencies. The rck ORF itself was retrieved in 36 subsp. enterica serovars and in two subspecies with either chromosomal or plasmid-borne localization. It displays high conservation of its sequence within the genus, and we demonstrated that most of the allelic variations identified did not alter the virulence properties of the protein. However, we demonstrated the importance of the residue at position 38 (at the level of the first extracellular loop of the protein) in the invasin function of Rck. Altogether, our results highlight that rck is not restricted to the three formerly identified serovars and could therefore have a more important role in virulence than previously expected. Moreover, this work raises questions about the mechanisms involved in rck acquisition and about virulence plasmid distribution and evolution.
IMPORTANCE The foodborne pathogen Salmonella is responsible for a wide variety of pathologies depending on the infected host, the infecting serovars, and its set of virulence factors. However, the implication of each of these virulence factors and their role in the specific host-pathogen interplay are not fully understood. The significance of our research is in determining the distribution of one of these factors, the virulence plasmid-encoded invasin and resistance to complement killing protein Rck. In addition to providing elements of reflection concerning the mechanisms of acquisition of specific virulence genes in certain serotypes, this work will help to understand the role of Rck in the pathogenesis of Salmonella.

(plasmid-encoded fimbriae) operon located just upstream of pefI. SrgA is a disulfide reductase involved in fimbriae biogenesis and in SpiA (T3SS-2 effector) folding. The protein encoded by srgB contains a putative lipoprotein signal sequence and displays a weak homology with the thermostable phytase superfamily. Finally, srgD and srgC encode putative transcription factors homologous to the LuxR and AraC families, respectively (19,20). Currently, no functional link has been established between rck and any of the other ORFs in the pefI-srgC operon.
To date, the rck ORF has been identified in only three serovars (i.e., S. Enteritidis, S. Typhimurium, and S. Bovismorbificans), with a tendency toward specific associations between alleles and serovars. However, hybridization experiments using a DNA probe targeting the rck region and microarray experiments led to the hypothesis that the gene might also be carried on S. Blegdam, S. Pullorum, S. Moscow, S. Typhisuis, and S. Sendai genomes (18,20,21). Moreover, several studies reported the presence of other genes (e.g., spvC and pefA) traditionally associated with Salmonella virulence plasmids in serovars such as S. Blegdam, S. Onarimon, and S. Heidelberg (22-24). Together, these data suggested that the pefI-srgC operon, and therefore rck, might be more widely distributed than expected. In this study, our aim was to determine the distribution of the rck ORF within the Salmonella genus and to study the functionality of the variants of the protein. We took advantage of the large number of Salmonella genome assemblies available in the public database Enterobase to perform a large-scale screening of the pefI-srgC ORFs (25). One or more of these ORFs were found in 67,596 genomes associated with 61 serovars of S. enterica subsp. enterica but also in subsp. salamae, arizonae, diarizonae, and houtenae. The rck ORF was retrieved in 36 of these serovars, thus greatly extending the list of serovars encoding this virulence factor. As a polymorphism at the rck locus was observed, in vitro studies were also carried out to assess the functionality of new Rck variants concerning their ability to confer complement resistance and invasion capacity to the pathogen.

RESULTS
Analysis of the pefI-srgC operon within Salmonella reference genomes. In order to find new strains carrying the pefI-srgC operon, we first looked for the presence of the operon's ORFs in the genomes of reference strains. A BLAST search of all these ORFs on 27 genomes retrieved from the ATCC genomic database allowed us to extend the list of pefI-srgC+ serovars and highlighted variations in the composition and organization of the pefI-srgC operon according to the serovar.
As stated by Mambu et al. (4), two observations can be made when comparing the three complete operons previously described (retrieved on the virulence plasmids of S. Typhimurium, S. Enteritidis, and S. Bovismorbificans). The first observation is that the structure of the operon varies between these three serovars, as an inversion was observed between the pefI-srgD locus and the srgA ORF in S. Bovismorbificans compared to the sequences of S. Typhimurium and S. Enteritidis (Fig. 1A). The second observation is that the putative promoter sequences of the operon are also completely different between these serovars. While the inversion found in S. Bovismorbificans appears to have transposed the intergenic srgD-srgA region upstream of the operon, the promoter region found in S. Enteritidis differs completely from the one found in S. Typhimurium due to a completely different nucleotide sequence between orf5 (in the pef operon) and pefI in these two serovars. This observation illustrates the results presented by Abed et al. highlighting the differences in the regulatory mechanisms of the operon in these 2 serovars (17). The conditions allowing the transcription of the pefI-srgC operon in S. Enteritidis and S. Bovismorbificans remain currently unknown.
The complete pefI-srgC operon was not found in any other tested genomes retrieved from the ATCC database. However, several genomes harbored one to four ORFs of the operon. The sequence found on the S. Paratyphi C virulence plasmid (pSPCV) carries pefI, srgD (orf7), and truncated srgB ORFs and is very similar to the corresponding virulence plasmid sequence of S. Enteritidis (pSLA5), especially the upstream region that displays a very similar synteny and high homology (≥97%), suggesting a similar regulatory scheme and a common origin (Fig. 1A). S. Dublin, S. Paratyphi A, and S. Typhi genomes also carry a part of the operon (i.e., 5′-pefI-srgD-srgA-srgB-3′), lacking the two last ORFs rck and srgC. These partial operons are located on the chromosome of their respective strains near the sef (S. Enteritidis fimbriae) operon in the Salmonella pathogenicity island (SPI) SPI-10 (Fig. 1B), thus confirming the chromosomal location of a partial pefI-srgC operon in some serovars as suggested by Collighan et al. using Southern hybridizations (18). These data suggest a common evolutionary process of S. Paratyphi A, S. Typhi, S. Dublin, and S. Enteritidis chromosomes in the pefI-srgB region. It is also interesting to note that a srgB allele displaying 81% identity with p14028s srgB was retrieved alone on 11 other serovar chromosomes, including S. Paratyphi B, S. Poona, S. Sendai, and S. Montevideo (data not shown). Even if we did not find any new complete pefI-srgC operon sequence in the reference genomes studied, these results highlighted the presence of partial pefI-srgC operons in several new serovars and prompted us to perform a wider analysis using Enterobase.
FIG 1 The pefI-srgC operon on reference genomes of Salmonella retrieved from ATCC. Alignment of the pefI-srgC operon and its surrounding regions on the currently described virulence plasmids of S. Typhimurium ATCC 14028 (p14028s), Bovismorbificans 3114 (pVIRBov), Enteritidis LA5 (pSLA5), and [...]
Distribution of the pefI-srgC operon in Salmonella species and subspecies using Enterobase. To extend this analysis, we retrieved the whole-genome multilocus sequence typing (wgMLST) sequence types (ST) generated by Enterobase from a total of 188,233 assemblies of Salmonella assigned by the Salmonella in silico typing resource (SISTR) prediction tool to the two Salmonella species and to all subspecies, including 465 different serovars of S. enterica subsp. enterica. One bias must be kept in mind when interpreting the results: the data set contains numerous false-negative hits. For example, srgC was not detected in any S. Bovismorbificans strain and usually was lacking in S. Enteritidis by wgMLST, while contig analysis clearly indicated its presence downstream of rck (data not shown). Consequently, the absence of an ORF in the wgMLST scheme corresponds either to a real absence of this ORF or to a false negative. As specified in the database documentation, this phenomenon occurs when the sequence of the allele of interest is not trustworthy, being either fragmented or duplicated (https://enterobase.readthedocs.io/en/latest/pipelines/backend-pipeline-nomenclature.html). In contrast, a positive result always signifies the presence of an ORF.
Among the extracted assemblies, 67,596 were associated with at least one ORF of the pefI-srgC operon, most of them belonging to S. Enteritidis and S. Typhimurium serovars, whose genomes are the most represented in the database. The profiles of the pefI-srgC operon in each serovar are described in Table S2 in the supplemental material. More than 1,500 profiles were identified using this method. Almost all genomes of S. Enteritidis carry the pefI-srgC operon (n = 35,969/35,996), and only half of the S. Typhimurium genomes harbor this locus (n = 17,562/37,533). However, 90% of the assemblies of the S. Typhimurium monophasic variant (antigenic formula: 4;[5];12:i:-) carry the operon, most of the time in its complete form (Table S2). As expected, we recovered numerous pefI-srgC+ assemblies from S. Bovismorbificans (n = 157/653), although the total number of genomes associated with this serovar was far smaller than that of S. Enteritidis or S. Typhimurium. Most of the assemblies of S. Dublin (2,771/2,872), S. Typhi (8,021/8,037), and S. Paratyphi A (1,673/1,673) display the pefI-srgD-srgA-srgB profile described above on the ATCC reference strains, supporting a wider distribution of a truncated operon on the chromosomes of these serovars (Table S2). Interestingly, the diversity of the allelic profile of these serovars was strongly biased toward a dominant profile displaying very high frequencies (i.e., according to the allelic profiling categories used in Enterobase; S. Dublin: profile 5-4-6-64-0-0, P = 0.977; S. Paratyphi A: profile 20-2-35-55-0-0, P = 0.992; S. Typhi: profile 2-2-4-3-0-0, P = 0.911). Altogether, these results confirm the data obtained with reference genomes. The pefI-srgC operon was also identified in new serovars, some of them displaying a very high frequency of the operon.
Indeed, all the wgMLST data retrieved from the Berta (680/680) and Manchester (13/13) serovars, as well as most of the Gallinarum serovar (295/297), presented a profile containing at least the first four ORFs of the operon. In a similar manner to what has been previously observed on S. Typhi, S. Paratyphi A, and S. Dublin chromosomes, these ORFs are located near the sef operon in SPI-10 and share high identity (≥90%) with these reference genomes for this region (Fig. S1). Similar to what was described above, S. Berta and S. Manchester display low allelic profile diversity in this region, with their dominant profile displaying high frequencies (profile 5-4-6-64-0-0, P = 0.904, and profile 2-2-42-3-0-0, P = 0.846, respectively). By contrast, the allelic profiles appear more diverse on S. Gallinarum assemblies (Pmax = 0.313). Highly prevalent serovars S. Newport, S. Kentucky, and S. Infantis also carry some ORFs of the operon but at a very low frequency (9/9,364; 1/5,855; and 2/7,405, respectively) (Fig. 2; Table S2). It is important to note that the complete operon was retrieved in 14 serovars, including S. Agona, S. Baildon, and S. Paratyphi B (Table S2). Finally, ORFs of the operon have also been found in genomes from 51 other serovars of S. enterica subsp. enterica but also within genomes of non-enterica subspecies. Indeed, four assemblies of our data set associated with S. enterica subsp. salamae were found to harbor the first five ORFs of the operon, with very different synteny from one strain to another. While two assemblies associated with S. enterica subsp. diarizonae appear to carry multiple ORFs of the operon (profiles 8-0-16-23-7-0 and 9-0-69-0-0-0), srgA was retrieved, alone, in numerous S. enterica subsp. arizonae (116/419), diarizonae (37/769), and houtenae (2/383) assemblies (Table S2).
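The dominant-profile frequencies (P) quoted above can be recomputed from a list of per-assembly allelic profiles. The following Python sketch is only an illustration and is not the pipeline used in this study. It assumes that each profile is a hyphen-separated string of six allele numbers in the order pefI-srgD-srgA-srgB-rck-srgC (an order inferred from the profiles cited in the text) and that 0 means no allele was called, which may be a real absence or a false negative. The function names and the toy data are hypothetical.

```python
from collections import Counter

# Assumed ORF order within a profile string (inferred from the text, not from Enterobase).
ORFS = ("pefI", "srgD", "srgA", "srgB", "rck", "srgC")


def detected_orfs(profile):
    """Return the ORFs that carry a nonzero allele number in one profile string."""
    alleles = [int(a) for a in profile.split("-")]
    if len(alleles) != len(ORFS):
        raise ValueError(f"expected {len(ORFS)} allele numbers, got {len(alleles)}")
    # 0 = no allele called: either a real absence or a false negative.
    return [orf for orf, allele in zip(ORFS, alleles) if allele != 0]


def dominant_profile(profiles):
    """Return (most common profile, its frequency) for a non-empty list of profiles."""
    counts = Counter(profiles)
    profile, n = counts.most_common(1)[0]
    return profile, n / len(profiles)


# Toy data (hypothetical): 9 assemblies share one profile, 1 differs.
toy = ["5-4-6-64-0-0"] * 9 + ["5-4-6-12-0-0"]
best, freq = dominant_profile(toy)
print(best, freq)           # 5-4-6-64-0-0 0.9
print(detected_orfs(best))  # ['pefI', 'srgD', 'srgA', 'srgB']
```

Because a 0 cannot distinguish absence from a false negative, any ORF missing from `detected_orfs` would still need to be checked against the contigs before concluding that it is absent.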
rck ORF distribution within the assembled genomes database Enterobase. The Rck invasin, encoded by rck, is the most characterized protein of the pefI-srgC operon. It acts as an invasin through interaction with EGFR to mediate a zipper-like internalization within a variety of mammalian cell lines in vitro. It also confers resistance to complement through interaction with multiple inhibitors of this innate defense system. We thus decided to extend our analysis of the distribution of this virulence protein using the wgMLST data retrieved from Enterobase. We were able to confirm the presence of the ORF in S. Typhimurium (monophasic variant: 1,459/1,560; diphasic variant: 15,814/35,973), S. Enteritidis (31,480/35,996), and S. Bovismorbificans (157/653) assemblies. Additionally, 33 new serovars and two subspecies were associated with the gene encoding this invasin (Fig. 3). Some of these serovars rank, alongside S. Enteritidis and S. Typhimurium, among the most frequently isolated serovars responsible for human salmonellosis in the European Union from 2017 to 2019, such as S. Newport (n = 8/9,364), S. Brandenburg (n = 8/525), or S. Stanley (n = 5/1,310) (32). It is also noteworthy that the rck ORF was detected on 11 assemblies (11/165) associated with S. Paratyphi C, for which only a partial operon containing the pefI, srgD, and srgB ORFs was described above.
To estimate whether rck in these strains is plasmid borne or chromosomally encoded, we compared the genetic environment surrounding the rck ORF in the contigs where it was detected with the corresponding regions of the S. Typhimurium or S. Enteritidis large virulence plasmids. Evidence indicated that some strains acquired the virulence plasmid of S. Typhimurium (e.g., S. Agona, S. Baildon, and S. Stanleyville) (Fig. 4A; Fig. S2A) or S. Enteritidis (e.g., S. Stanley) (Fig. S2B) or a close derivative of these plasmids (e.g., S. Bispebjerg) (data not shown) through horizontal transfer, as contigs displaying from 90% to nearly 100% identity with these references were retrieved in some of these strain assemblies.
Interestingly, the contigs harboring rck on some S. Paratyphi C assemblies seem to indicate the existence of a novel, undescribed virulence plasmid. In view of its gene content, this plasmid appears to be distinct from the S. Typhimurium and S. Enteritidis virulence plasmids. Indeed, this plasmid, while displaying ≥90% identity on the shared regions with both of these references, lacks the whole samA to traS region of the S. Enteritidis virulence plasmid (pSENV) but kept the traS to finO region of this plasmid. The loss of srgA from the operon should also be highlighted (Fig. S2C).
However, the contigs retrieved in some assemblies of strains associated with serovars S. Poona (Fig. 4B), S. Johannesburg, S. Newport, and S. Stanley (Fig. S2D to F) suggested that the operon might be carried on a nonvirulence plasmid, as it colocalized with plasmid-related genes (e.g., tra, par, etc.) but not with the traditional markers of Salmonella virulence plasmids (i.e., spv locus) (16). The absence of spv genes was confirmed with wgMLST data, which indicate that spvRABCD ORFs were missing from the other contigs of these assemblies.
Finally, colocalization of rck with chromosomal genes as observed on rck+ contigs from S. Sandiego (e.g., phoN, uvrA, and dnaB) and S. Brandenburg (e.g., bigA, damX, ybbN, and phnT) assemblies led us to suggest a chromosomal integration (Fig. 4C; Fig. S2G). It is also interesting to note that the genomic localization may vary among strains of the same serovar. This is notably the case for S. Stanley, which displays two kinds of genetic support: some rck ORFs are associated with a pSENV-like large virulence plasmid, while others seem to be carried on nonvirulence plasmids (Fig. 4D).
As expected, the consequences of the diversity of rck genomic localizations were found at the level of the promoter region of the pefI-srgC operon. The strains that probably acquired the ORF by horizontal transfer of an entire virulence plasmid retained the sequence upstream of the operon found in S. Typhimurium and S. Enteritidis and presumably a similar pattern of regulation of its expression. However, this same region on other assemblies presents dissimilarities with those described to date, suggesting different regulatory mechanisms in these cases and different evolutionary routes (Fig. S2A to G).
Polymorphism of the rck ORF. rck+ sequences were found to harbor a noteworthy diversity of allelic variants. Indeed, 89 different alleles (including those previously described on p14028s [number 1], pSENV [number 3], and pVIRBov [number 15]) harbored by a total of 49,001 strains were retrieved from our data set. Comparison of these sequences revealed that the rck ORF is very well conserved, with 81/88 displaying more than 95% nucleotide identity with allele number 1. However, the phylogenetic tree based on this alignment revealed the existence of two clades, one related to allele number 1 and the other to allele number 3 (Fig. 3). In addition to being the dominant allele retrieved from S. Typhimurium assemblies (monophasic variant: 1,455/1,459; diphasic variant: 15,603/15,814), allele number 1 was also retrieved from 60 S. Enteritidis assemblies as well as from 16 assemblies associated with a total of 10 other serovars, including S. Agona, S. Baildon, and S. Stanleyville. Similarly, allele number 3, the dominant allele of S. Enteritidis (30,748/35,996), was retrieved in 12 assemblies associated with 9 other serovars (e.g., S. Typhi, S. Agona, or S. Infantis), while allele number 15 (retrieved on pVIRBov) appeared to be specific to S. Bovismorbificans. In total, 63 undescribed alleles were retrieved from S. Typhimurium and S. Enteritidis assemblies, while assemblies from other serovars harbored a total of 24 undescribed alleles. These latter alleles showed high identity (ranging from 99.4% to 99.8%) with allele number 1, except seven alleles that displayed less than 76% nucleotide identity with this reference (Fig. 3). These large differences are mainly due to deletion of various portions of the ORF, leading to aberrant Rck proteins. Still, they remain very rare, as they are only found in 13 assemblies (among 49,001 rck+ assemblies) associated with serovar S. Typhimurium (allele number 20: n = 2), S. Soerenga (allele number 58: n = 1; allele number 59: n = 1), S. Javiana (allele number 62: n = 3), S. Enteritidis (allele number 69: n = 2), S. Sandiego (allele number 79: n = 2), and S. enterica subsp. salamae (allele number 58: n = 1; allele number 61: n = 1).
Altogether, these alleles encode proteins that are greatly similar in their sequence to the Rck protein of S. Typhimurium 14028 strain. They exhibit between 96% and 100% amino acid identity, except for alleles 20, 23, 37, 39, 58, 59, 61, 62, 69, 79, 86, 89, and 90 for which the introduction of a frameshift led to the production of truncated proteins.
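The identity values reported above compare each allele, or the protein it encodes, with the allele number 1 reference. As a rough illustration of how such a value can be defined, the Python sketch below computes percent identity for two sequences that have already been aligned to the same length, with gaps written as "-". The choice of denominator (all alignment columns, gaps included) is an assumption, since the convention and the tool used in this study are not stated here, and the toy sequences are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy example (invented sequences): a 3-nucleotide deletion in a 12-column alignment.
reference = "ATGAAAAAGATT"
variant = "ATGAAA---ATT"
print(percent_identity(reference, variant))  # 75.0
```

With this convention a deletion lowers identity in proportion to its length, which matches the observation that the most divergent alleles are those carrying deletions of portions of the ORF.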
Phenotypic characterization of Rck variants. Several reports demonstrated that even minor variation in the amino acid sequence might generate important variation in protein function (33,34). We therefore sought to evaluate the impact of the protein polymorphism observed in Rck sequences on its ability both to promote bacterial invasion of host cells and to protect these bacteria from complement attack. To better understand the potential impact of this polymorphism on the three-dimensional (3D) structure of Rck, we generated an in silico model of the protein encoded by allele number 1. This model is in good agreement with the prediction made by Guiney's laboratory and confirms the overall β-barrel transmembrane structure of the protein, exposing four extracellular loops similar to its Ail homolog in Yersinia pestis (Fig. 5A) (3).
The impact of the polymorphism described above was evaluated with a noninvasive E. coli strain sensitive to complement killing to exclude all the other Salmonella factors involved in these phenotypes that could mask the role of Rck. In this heterologous system, 12 uncharacterized variants of the protein, termed Rck C to Rck N, specifically recovered from Enterobase rck+ assemblies were overexpressed and compared to Rck A (of S. Typhimurium 14028, encoded by allele number 1) and Rck B (of S. Enteritidis, encoded by allele number 3), the two Rck proteins phenotypically characterized in the literature (10,14) (Table 1). The variants were selected based on their distribution within the new rck+ serovars and their identity level with the most characterized variant of Rck (i.e., Rck A) (Fig. 5B). Variants presenting less than 96% identity with Rck A were not studied, as we considered that the accumulation of mutations in these variants would be highly likely to change the overall structure of the protein.
Each variant was first tested for its ability to confer complement resistance by comparing bacterial survival following incubation with normal and decomplemented serum. The MC1061 E. coli strain carried either an empty pSUP202 plasmid for the negative control (absence of serum resistance), the pSUP202 plasmid expressing the Rck protein of S. Typhimurium (Rck A) as the positive control of serum resistance, or the pSUP202 plasmid expressing one of the 13 other variants. The first conclusion that emerged from these tests is that all the evaluated variants confer a certain resistance to the action of complement factors. As expected, E. coli MC1061 harboring the mock vector did not survive this innate immunity defense system, showing more than 7 logs of killing. In contrast, all the strains overexpressing one Rck variant exhibited less than 4 logs of killing compared to results obtained with decomplemented serum (Fig. 6A), thus confirming the conservation of the complement resistance function of these variants. Nevertheless, a deletion of the amino acids at positions 114 and 115 (Rck M) induced a greater sensitivity to complement, because a significant decrease in the survival rate of bacteria overexpressing this variant was observed (3.54 ± 0.07 log kill; P = 6.4E-3) compared to strains overexpressing Rck A (2.23 ± 0.23 log kill). These results are in accordance with previous studies showing the importance of the third extracellular loop on the virulence-associated phenotypes granted by Rck and highlight the role of these two amino acids in the function of Rck in serum resistance (3,10).
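The "log kill" values above express killing in normal serum relative to decomplemented serum. A minimal Python sketch of this calculation is given below, assuming that log kill is the difference between the log10 of the CFU recovered from decomplemented serum and the log10 of the CFU recovered from normal serum. This definition, the function name, and the counts are assumptions made for illustration and are not taken from the methods of this study.

```python
import math


def log_kill(cfu_decomplemented, cfu_normal):
    """log10 reduction in survivors in normal serum relative to decomplemented serum."""
    if cfu_decomplemented <= 0 or cfu_normal <= 0:
        raise ValueError("CFU counts must be positive")
    return math.log10(cfu_decomplemented) - math.log10(cfu_normal)


# Invented counts: 1e8 CFU/ml survive in decomplemented serum, 1e6 CFU/ml in normal serum.
print(log_kill(1e8, 1e6))  # 2.0
```

Under this definition, a higher value means greater sensitivity to complement, so the mock vector (more than 7 logs) is far more sensitive than any strain overexpressing an Rck variant (less than 4 logs).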
Then, the variants were tested for their ability to mediate E. coli invasion into JEG-3 cells. These cells were chosen because this is one of the cell lines for which we observed the greatest difference between the control strains, thus allowing for easier detection of slight effects of amino acid substitutions (10). The first observation from these experiments was that the polymorphism observed between the three previously sequenced alleles of S. Typhimurium, S. Enteritidis, and S. Bovismorbificans generates proteins (Rck A, Rck B, and Rck C, respectively) that share the same ability to promote invasion of JEG-3 cells (Fig. 6B). The second observation was that, unlike what was observed during resistance to complement assays, not all the variants granted the ability to invade this cell line. Indeed, overexpression of Rck H, Rck I, and Rck M led to a significant decrease of the ability of E. coli to invade JEG-3 cells compared to Rck A. The quantity of internalized bacteria counted following treatment with gentamicin indicated a significant decrease of 83.7% (P = 4.3E-4), 88.8% (P = 1.5E-4), and 99.5% (P = 1.6E-11) in the entry rate induced respectively by these variants compared to Rck A (Fig. 6B).
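The percentages above express the entry rate of each variant relative to Rck A. As a small worked illustration, the Python sketch below computes such a relative decrease from two counts of internalized bacteria. The function name and the counts are hypothetical and were chosen only so that the result matches the order of magnitude reported for Rck H.

```python
def percent_decrease(reference, variant):
    """Relative decrease (in %) of a variant's count compared with the reference count."""
    if reference <= 0:
        raise ValueError("reference count must be positive")
    return 100.0 * (reference - variant) / reference


# Invented counts of internalized bacteria after gentamicin treatment.
rck_a_count = 1000
variant_count = 163
print(round(percent_decrease(rck_a_count, variant_count), 1))  # 83.7
```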
Rck I only differs from its parental variant (Rck F), which does not induce altered invasion, by one substitution in the predicted signal peptide in position 17 (A17V substitution) (Fig. 5B). Although the explanation for this decrease in cellular invasion (88.8%) remains unclear, one can hypothesize that the accumulation of two mutations in the signal peptide would lead to a significant, but not a total, decrease in the membrane addressing of this variant. E. coli overexpressing Rck H displayed a significant decrease of invasion rate of 83.7% compared to Rck A. This phenotype might appear surprising when comparing the sequence of this variant with the sequence of Rck C given the localization of the substitution observed in the first loop (I38V). Indeed, it was demonstrated that the third loop is sufficient to promote invasion (10). One hypothesis that could explain this result is a cooperation between the loops (3).
FIG 4 Legend (Continued) multiple molecular supports within the same serovar (D). On the sequences displaying consistent annotations, red, orange, and green arrows represent the ORFs of the pefI-srgC operon, the pef operon, and the spv locus, respectively. Gray areas between the schematic sequences denote nucleotide identity with a gradient specified in the bottom-right corner of the figures.
[...] Table 1. The four loops are indicated by colored arrows above the sequence. The gray area represents the 114 to 159 peptide necessary and sufficient to promote invasion of mammalian cells. Underlined amino acids represent the signal peptide. Amino acids indicated in red denote polymorphic residues compared to the sequence of the Rck A variant. Dashes in the Rck M sequence correspond to the deletion of two amino acids.
Finally, the phenotype observed when comparing the invasion rate of Rck M with our reference protein does not appear surprising given the location of the deletion present in this variant. Indeed, as the amino acids at positions 114 and 115 constitute the anchor point of the third extracellular loop in the bacterial membrane, we suggest that the deletion present in this variant greatly alters the structure of the loop, thus hindering a proper interaction with EGFR.

DISCUSSION
The rck ORF, which belongs to the pefI-srgC operon itself carried on the large virulence plasmid of Salmonella, encodes an OMP involved in cell invasion and complement resistance. Little was known about its distribution, and while it was assumed that it was restricted to only a few serovars, several reports have established the presence of virulence plasmid-associated genes in less characterized serovars (18,26,30). Moreover, other ORFs of the operon (pefI, srgD, srgA, and srgB) have been described on the serovar-specific pathogenicity island SPI-10 (35,36). The constant progress in genomics and the improvement of sequencing techniques has allowed for an increase in the number of sequenced and characterized genomes. We took advantage of the large number of Salmonella genome sequences available to investigate further the distribution of the operon and more specifically of the rck ORF using both complete genomes on ATCC and Enterobase databases.
The complete or partial pefI-srgC operon was found in 61 S. enterica subsp. enterica serovar assemblies, while the rck ORF itself was retrieved in 36 of them (i.e., 33 more than previously known). Moreover, the complete or partial operon was identified for the first time in four Salmonella enterica subspecies (salamae, arizonae, diarizonae, and houtenae) and rck itself in two of them (salamae and diarizonae). Altogether, these results highlight a greater distribution of this operon and of the rck virulence gene, thus suggesting a more important role of this virulence factor than previously expected. This is all the more true given that a significant number of false-negative hits occur when working with the Enterobase-generated wgMLST data. The identification of the operon in subsp. salamae, arizonae, diarizonae, and houtenae but not in subsp. indica is interesting. Indeed, these subspecies share a lower ancestry with subsp. enterica than subsp. indica, which is a sister phylogroup of enterica (37), suggesting that the operon may have spread among all the enterica subspecies. The complete or partial pefI-srgC operon was found either on the chromosome or on related plasmids. Some serovars harbor an incomplete operon, which colocalized with the sef fimbrial operon within SPI-10. The integration of this genetic element was previously described for serovars Typhi, Paratyphi A, and Enteritidis, although the reason why the latter carries both a plasmid-borne and a chromosomal copy of these ORFs remains unknown (18,35). Additionally, another study previously detected the presence of the pathogenicity island (through the presence of the sef operon) on genomes of S. Washington and S. Typhimurium isolates (38). However, no assemblies of S. Washington were retrieved in our data set, preventing us from predicting the composition of the island. The same study did not detect the island on S. Dublin isolates, while our results demonstrate that the island might be retrieved on a large proportion of this serovar population. The presence of these ORFs on the chromosome of some strains of these serovars suggests that they play a role, although yet undetermined, in these strains.
When the operon was found on plasmids, some strains appeared to carry plasmids very similar, both in size and gene content, to the previously described virulence plasmids of S. Typhimurium (p14028s) or S. Enteritidis (pSENV). While p14028s exhibits the genes necessary for self-conjugation and was previously characterized for its ability to promote it both in vitro and in vivo in the distal portion of the small intestine (39,40), pSENV has suffered severe degradation along its evolutionary course, especially deletions of the tra operon, making it unable to promote self-conjugation. The detection of the operon with a different architecture on spv-negative plasmids using wgMLST ST and the sequences of rck+ assemblies of a few serovars (S. Poona, for example) also raised questions concerning the virulence plasmids and their evolution. Combined with previous works on pRST98, a hybrid virulence plasmid of S. Typhi carrying both drug resistance genes and virulence genes, including rck (41,42), our results encourage us to investigate further the evolutionary mechanisms explaining the emergence of these plasmids.
Whatever the genetic support, we observed remarkably low diversity in the sequence of the rck ORF. This relative stability of the nucleotide sequence among the serovars has direct repercussions on the amino acid sequence of the protein, which also displays great conservation. The impact of substitutions in virulence determinants on their functional properties has already been documented for some of them. Notably, several studies from the Schifferli lab have highlighted a host-specific tropism of strains according to FimH adhesin allelic variants (34, 43). More recently, they characterized the functionality and the affinity of several allelic variants of the PagN invasin of Salmonella for its interactants and highlighted the importance of two amino acids in PagN binding to laminin and mammalian cells as well as in cell invasion (33). Here, we assessed the functionality of 14 variants of Rck, engineered in our laboratory to reproduce the proteins presenting polymorphisms in the different domains of the protein (i.e., signal peptide, all four extracellular loops, and transmembrane β-barrel regions). Only three variants (Rck H, Rck I, and Rck M) displayed altered function (i.e., resistance to complement-mediated killing and/or invasion). The impact of the polymorphism in Rck M (Δ114 to 115) confirmed the essential role of the third loop (G116 to G143) in all the virulence properties of the protein described so far (3, 10). However, the explanation for the decrease in invasion induced by the two other variants remains unclear. One can hypothesize that the I38V substitution harbored by the Rck H variant affects the affinity of the first loop for EGFR, in a cooperation between loops during the adhesion/invasion process. A similar suggestion was made by Cirillo et al. when they observed that adding the D43K substitution to Rck mutants with a G118D substitution induced a greater decrease in serum resistance than that conferred by Rck G118D alone (3). As both isoleucine and valine are hydrophobic amino acids, this substitution probably did not greatly alter the structure of the first loop but rather the affinity of the variant for its interactant. The last phenotype observed, induced by the A17V substitution found in Rck I, might, however, result from the successive mutations of the signal peptide, possibly reducing, but not abrogating, protein targeting to the outer membrane. Indeed, as the variant Rck I was not altered in the complement resistance assay, a sufficient quantity of Rck protein must be correctly expressed in the outer membrane to ensure the recruitment of inhibitors of the different complement pathways involved in the action of Rck. However, this quantity does not seem sufficient to properly mediate the interaction with EGFR, resulting in a decrease in the invasion rate. These two hypotheses require deeper investigation to fully understand the phenotypes of these variants. Nevertheless, these three altered variants are harbored by only 4 strains from 4 different serovars, supporting the idea that the function of the Rck protein is generally very well conserved independently of the variant. As the functionality of the remaining Rck variants is preserved, the other amino acids presenting polymorphisms do not seem to be involved in resistance and invasion phenotypes, suggesting that the amino acids essential for both phenotypes remain conserved among the variants.
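The substitution notation used above (e.g., I38V, A17V) can be illustrated with a minimal sketch that compares a variant with a reference sequence and reports each difference as reference residue, 1-based position, and variant residue. The function name `call_substitutions` and the toy 40-residue peptides are assumptions introduced for illustration only; they are not the Rck sequence and not the pipeline used in this study, and the sketch assumes pre-aligned sequences of equal length without indels.

```python
def call_substitutions(reference: str, variant: str) -> list[str]:
    """Report substitutions as <ref residue><1-based position><variant residue>.

    Assumes both sequences are already aligned and of equal length (no indels).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        f"{ref_aa}{pos}{var_aa}"
        for pos, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=1)
        if ref_aa != var_aa
    ]


# Hypothetical toy peptide (NOT the Rck sequence): isoleucine placed at position 38.
toy_reference = "M" + "A" * 36 + "I" + "GK"
# Same peptide with position 38 (index 37) changed to valine.
toy_variant = toy_reference[:37] + "V" + toy_reference[38:]

print(call_substitutions(toy_reference, toy_variant))  # ['I38V']
```

A deletion such as the one carried by Rck M would need a true pairwise alignment rather than this position-by-position comparison.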
Finally, it is important to point out that this study does not consider the multiplicity of regulation patterns of expression of the pefI-srgC operon. To date, most studies of the pefI-srgC operon have focused on the one found on the S. Typhimurium virulence plasmid, and consequently only the regulatory scheme of this operon's transcription has been described (17, 44, 45). As the promoter region of the pefI-srgC operon is completely different between the Typhimurium, Enteritidis, and Bovismorbificans serotypes, the results concerning the regulation of rck from S. Typhimurium cannot be extended to the Enteritidis and Bovismorbificans serotypes. In S. Typhimurium, pefI-srgC transcription depends on the presence of acyl-homoserine lactones (AHLs) through the quorum-sensing regulator SdiA and has been postulated to occur within the mammalian gastrointestinal tract in response to AHL production by other bacterial populations or communities (45). However, work performed in our laboratory to determine the kinetics of expression of the operon during infection in murine models of gastroenteritis and typhoid fever using chromosomal transcriptional fusions demonstrated that this signaling pathway is not activated under these physiological conditions. These unpublished results agree with those obtained by B. Ahmer's lab, which showed that S. Typhimurium does not detect AHLs during its transit through the gastrointestinal tract of several animal species (i.e., guinea pigs, rabbits, pigs, chicks, calves, and mice) (46). More recent work identified FliA, a broadly conserved σ factor, as a component of this regulatory scheme, acting at different levels on both sdiA and rck transcription (47). Investigations of the regulation of rck should clearly be pursued further to obtain a more comprehensive picture of the role of this invasin in Salmonella pathogenesis.
To conclude, this work extended the known distribution of the pefI-srgC operon in the Salmonella genus and pointed out the presence of different genetic elements carrying the operon, thus raising questions concerning the mechanisms governing these acquisitions and virulence plasmid evolution. Moreover, the Rck virulence protein was found to be very well conserved, and this study highlighted the importance of the first and third loops for its virulence properties. Altogether, these results suggest a more important role of the Rck protein in the virulence of Salmonella than previously expected.
Rck structure modeling. The 3D structure of Rck was predicted based on the sequence of S. Typhimurium 14028 using the protein structure homology-modeling server SWISS-MODEL (52). Two models presenting GMQE and QMEAN over 0.6 and −2.6, respectively, were generated based on the crystal structures of Ail and OmpX. The second model was chosen to be shown in Fig. 5A, as it exhibits the best overall ERRAT quality factor (88.49). The picture was formatted using the Swiss-PdbViewer software (http://www.expasy.org/spdbv/) (53).
Bacterial strains and plasmid construction. The bacterial strains and plasmids used in this study are listed in Table 3. rck was amplified by PCR from the genomes of S. Enteritidis LA5 or S. Bovismorbificans 201910217 strains (carrying rck alleles number 3 and number 15, respectively) using primers rck-BamHI-fwd and rck-SalI-rev (Table S1 in the supplemental material). The resulting PCR products were digested with the restriction enzymes BamHI and SalI, ligated into BamHI/SalI-digested pSUP202, and used to transform chemically competent E. coli MC1061 bacteria.