Comparative genome analysis reveals important genetic differences among serotype O1 and serotype O2 strains of Y. ruckeri and provides insights into host adaptation and virulence

Abstract Despite the existence of a commercial vaccine routinely used to protect salmonids against Yersinia ruckeri, outbreaks still occur, mainly caused by nonmotile and lipase-negative strains (serotype O1 biotype 2). Moreover, epizootics caused by other uncommon serotypes have also been reported. At the moment, one of the main concerns for the aquaculture industry is the expanding host range of this pathogen and the emergence of new biotypes and serotypes that cause mortality in fish farms and against which the vaccine cannot protect. The comparative analysis of the genome sequences of five Y. ruckeri strains (150, CSF007-82, ATCC29473, Big Creek 74, and SC09) isolated from different hosts and classified into different serotypes revealed important genetic differences between the genomes analyzed. In particular, a clear genetic differentiation was found between serotype O1 and O2 strains. The presence of 99 unique genes in Big Creek 74 and 261 in SC09 could explain the adaptation of these strains to salmon and catfish, respectively. Finally, the absence of 21 genes in ATCC29473 which are present in the other four virulent strains could underpin the attenuation described for this strain. Further investigation of the genes highlighted in this study could provide insights into the virulence and niche adaptive mechanisms of Y. ruckeri.

Recently, a significant positive correlation between genetic and geographical distances was observed by Bastardo et al. (2015). Their results revealed that Y. ruckeri has experienced population changes that were probably induced by biogeographical forces in the past and, much more recently, by adaptive processes resulting from aquaculture expansion.
During the last few years, nine genome sequences of Y. ruckeri strains isolated from different niches have been uploaded to NCBI (MKFJ00000000, NZ_CP011078, NZ_CP009539, JPFO00000000, CQBN00000000, CPUZ00000000, JPPT00000000, CCYO00000000, and JRWX00000000). Here, we present for the first time in this species a comparative analysis of five of those genomes, belonging to strains isolated from different hosts and classified into different serotypes. The study reveals data that are important for a better understanding of the mechanisms underlying the niche adaptation and virulence of Y. ruckeri.

| Y. ruckeri strains used for genome comparison
Five previously sequenced Y. ruckeri strains were selected for comparative genome analysis based upon their characteristics and hosts (Table 1). Three strains were from serotype O1, isolated from rainbow trout (Oncorhynchus mykiss), of which two were virulent (Y. ruckeri 150 and Y. ruckeri CSF007-82), while the other, ATCC29473 type strain, was described as nonvirulent (Furones, Gilpin, Alderman, & Munn, 1990). The other two strains included in the analysis were Y. ruckeri Big Creek 74, belonging to serotype O2 and isolated from salmon, and Y. ruckeri SC09 isolated from catfish and of unknown serotype.

| Comparative analysis of Y. ruckeri genomes
Identification of putative protein-encoding genes and annotation of Y. ruckeri genomes were performed with Rapid Annotation using Subsystem Technology (RAST) (Brettin et al., 2015). Before comparative analysis, the sets of proteins from the five selected genomes were compared against UniRef90 using BLAST to associate each translation product with a UniRef90 protein cluster. A protein from one genome was considered orthologous to a protein from another genome when both were assigned to the same cluster. Based on this clustering, Venn diagrams of shared (orthologous) proteins were constructed using the VennDiagram package in R (Chen & Boutros, 2011). Pairwise genome alignments were performed with MAUVE (Darling, Mau, Blattner, & Perna, 2004).
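The cluster-based ortholog definition above can be illustrated with a small Python sketch. It is not the authors' pipeline (the diagrams were drawn with the VennDiagram package in R); it only shows how, once every protein has been assigned to a cluster, each region of a five-way Venn diagram corresponds to one exact combination of strains. The function name `venn_regions` and the cluster identifiers `c1` to `c8` are invented placeholders, not real UniRef90 entries.

```python
from collections import Counter


def venn_regions(clusters_by_strain):
    """Count clusters for each exact combination of strains (one Venn region each)."""
    strains = sorted(clusters_by_strain)
    all_clusters = set().union(*clusters_by_strain.values())
    regions = Counter()
    for cluster in all_clusters:
        # The set of strains carrying this cluster identifies its Venn region.
        members = frozenset(s for s in strains if cluster in clusters_by_strain[s])
        regions[members] += 1
    return regions


# Toy input: the cluster IDs are hypothetical and chosen only for illustration.
toy = {
    "150": {"c1", "c2", "c3"},
    "CSF007-82": {"c1", "c2", "c3", "c4"},
    "ATCC29473": {"c1", "c2", "c5"},
    "Big Creek 74": {"c1", "c6", "c7"},
    "SC09": {"c1", "c6", "c8"},
}

regions = venn_regions(toy)
core_size = regions[frozenset(toy)]      # clusters present in all five strains
pan_size = sum(regions.values())         # every cluster is counted exactly once
o1_only = regions[frozenset({"150", "CSF007-82", "ATCC29473"})]
```

Because every cluster falls into exactly one region, the region counts sum to the pangenome size, and the region that contains all five strains is the core genome.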

| Y. ruckeri whole-genome comparisons
The pairwise full-genome alignments revealed a mosaic pattern of homology organized in locally collinear blocks (LCBs) between 150 and each of the other four strains (Figure 1). The 150 strain shares larger portions of genetic information with CSF007-82 and ATCC29473 than it does with Big Creek 74 and SC09. This result indicates that Y. ruckeri strains belonging to serotype O1 and having rainbow trout as a host (150, ATCC29473, CSF007-82) are genetically more similar to each other than to strains of other serotypes isolated from different animals, suggesting that differences in cell surface antigens and host specificity may have a marked genetic basis.
To identify orthologs shared by Y. ruckeri strains, a five-way Venn diagram was constructed (Figure 2). The pangenome consists of 4,117 protein-coding genes with a core of 3,090 genes (75.05%). A total of 370 genes were found to be strain-specific: two in CSF007-82, eight in ATCC29473, 99 in Big Creek 74, and 261 in SC09. More than half of these unique genes (57%) were annotated as coding for hypothetical proteins. Interestingly, while serotype O1 strains isolated from rainbow trout have few unique genes (150 has none), the other two strains, Big Creek 74 and SC09, have many more, 99 and 261, respectively. Most of these genes could be related to host adaptation processes, in particular to survival in salmon in the first case and in catfish in the second.
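The counts reported above can be cross-checked with a few lines of arithmetic. The short Python sketch below uses only numbers stated in the text. The derived quantities (the accessory genome and the part of it shared by two to four strains) are simple subtractions added here for illustration and are not figures reported by the authors.

```python
# Counts taken from the text.
pan_genome = 4117
core = 3090
unique = {"150": 0, "CSF007-82": 2, "ATCC29473": 8, "Big Creek 74": 99, "SC09": 261}

core_pct = round(100 * core / pan_genome, 2)   # share of the pangenome that is core
strain_specific = sum(unique.values())         # genes found in exactly one strain

# Derived for illustration only (not reported in the text):
accessory = pan_genome - core                  # genes missing from at least one strain
shared_accessory = accessory - strain_specific # genes found in two to four strains

print(core_pct, strain_specific, accessory, shared_accessory)
```

The core percentage and the strain-specific total reproduce the 75.05% and 370 genes given in the text.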
As can be seen from the Venn diagram, the 150, ATCC29473, and CSF007-82 strains share between them, and not with the other two strains, a total of 268 genes.

| Genes exclusively shared by serotype O1 strains
As mentioned above, a total of 268 genes were shared by the 150, ATCC29473, and CSF007-82 strains, all of them belonging to serotype O1 and isolated from rainbow trout. These genes include 113 that encode hypothetical proteins, 33 mobile genetic elements, 24 that encode phage-related proteins, and 98 that encode proteins with different functions (Table 2). Some of them are associated with restriction-modification and toxin-antitoxin systems. Both systems cause the death of cells that have lost one of the components (the antitoxin or the modification enzyme), and both affect global gene expression, which results in altered adaptive phenotypes. For example, the antitoxin of the Escherichia coli MqsR-MqsA toxin-antitoxin system directly represses the transcription of the gene encoding the master stress regulator RpoS, while the degradation of the antitoxin during stress leads to a switch from the high-motility state to biofilm formation (Wang et al., 2011). In the same way, methylation events produced by restriction-modification systems may affect nearby gene expression. Methylation by Type III RM systems, for instance, controls the expression of certain genes, leading to two distinct cell types with two distinct phenotypes ("phasevarion") (Srikhanta, Fox, & Jennings, 2010).
A relevant finding shared by all serotype O1 strains was a cluster of genes involved in the biosynthesis of legionaminic acid, a nine-carbon diamino monosaccharide that is found coating the surface of various bacterial human pathogens, where it is the major component of the LPS. Interestingly, these genes, which are grouped in a cluster of at least 18 genes, are absent in other Yersinia species but present in other aquatic bacteria such as Vibrio vulnificus, Aeromonas salmonicida, Vibrio fischeri, or Photobacterium profundum. It is possible that this cluster provides an adaptive advantage for survival in the aquatic environment or that, as happens in some organisms such as Campylobacter jejuni (Zebian et al., 2016), it is related to virulence. Legionaminic acid is essential for flagella assembly in several species (Morrison & Imperiali, 2014) and, for this reason, the genes involved in its biosynthesis are novel targets for the development of antivirulence agents (Table 2).
Other genes that are exclusive to O1 serotype strains code for a bacteriocin similar to colicin-Ib of Escherichia coli (WP_062877260) and for virulence factors such as a type IV secretion system previously analyzed by Méndez et al. (2009) and an invasin present in other Enterobacteriaceae such as Yersinia pestis (EIR59646), Y. pseudotuberculosis (WP_050128752), and Edwardsiella tarda (WP_047059316) (Table 2).

| Genes exclusively shared by Big Creek 74 and SC09 strains
As indicated in the Venn diagram (Figure 2), Big Creek 74 and SC09 share a total of 122 genes, which include 37 ORFs encoding proteins of unknown function, 50 encoding phage-related proteins, and 35 encoding proteins with similarity to proteins involved in a variety of functions such as restriction-modification systems, toxin-antitoxin systems, or fimbriae synthesis (Table 3). One such case is a cluster involved in fimbriae biosynthesis, similar to the Stf cluster of Salmonella typhimurium, which has been associated with differences in virulence and host range between the different serotypes (Emmerth, Goebel, Miller, & Hueck, 1999). Although one copy of the Stf cluster is present in the five genomes analyzed (Figure 3a), an additional complete copy of this cluster was only found in the genome of SC09 and, with the exception of the gene encoding the minor fimbrial subunit (stfE), also in Big Creek 74 (Figure 3b). The last copy seems to be the result of several genetic rearrangements, so it is probably not functional in those strains (Figure 3b).
Three insecticidal toxin complex (Tc)-like proteins were also identified as unique to these strains. They are similar to the TcdA, TcdB, and TcdC proteins of Vibrio parahaemolyticus, which are involved in the production of acute hepatopancreatic necrosis disease in penaeid shrimp (Tang & Lightner, 2014).
One of the most interesting findings was that Big Creek 74 and SC09 strains share a cluster of seven genes involved in the utilization of sorbitol (Figure 4), a previously described characteristic associated with Y. ruckeri serotype O2 strains (Davies & Frerichs, 1989), which supports the hypothesis that SC09 belongs to this serotype.
The first three genes encode the three subunits of the sorbitol transporter of the phosphoenolpyruvate-dependent phosphotransferase system (PTS), involved in the uptake and phosphorylation of sorbitol, while gutD encodes a sorbitol-6-phosphate 2-dehydrogenase that synthesizes D-fructose 6-phosphate from D-sorbitol 6-phosphate.
Although the role of this protein in sorbitol metabolism is unclear, it could be a regulatory molecule involved in expression of the gut operon (Meredith & Woodard, 2005). In the plant pathogen Erwinia amylovora, the presence of this operon has been linked to virulence and suggested to contribute to host specificity (Aldridge, Metzger, & Geider, 1997).

| Unique genes of Big Creek 74
As was seen in the Venn diagram (Figure 2), Big Creek 74 strain has a total of 99 unique genes, which include 53 encoding hypothetical proteins, eight phage genes, four mobile genetic elements, and 34 genes which encode proteins with known function. As was mentioned above, the presence of some of these genes may underpin its adaptation to salmon, since the host of the other four strains is rainbow trout or catfish.
FIGURE 3 Analysis of the stf genes in Y. ruckeri genomes. Two copies of the stf cluster were found in Y. ruckeri strains. One copy of the cluster stfACDEFG is complete in the five strains (a), while a second copy is only complete in SC09, and with the exception of stfE gene, in Big Creek 74. The second copy of the stf cluster in 150, ATCC29473, and CSF007-82 strains is only constituted by stfA and stfC genes (b). Note that the gene represented by a striped arrow, which encodes a lipid A core-O-antigen ligase, was affected by a translocation and an inversion event, resulting in a different localization in the two clusters.
Among the proteins with known function (Table S1), we can find restriction-modification systems, transcriptional regulators, transferases, or proteins involved in polysaccharide biosynthesis. Especially interesting is the gene encoding an ATP-dependent Clp protease proteolytic subunit, a relevant regulatory enzyme in different bacteria, related also to virulence, environmental adaptation, and antibiotic resistance in microorganisms such as Staphylococcus aureus (Frees, Gerth, & Ingmer, 2014) or the fish pathogen Pseudomonas fluorescens (Liu, Chi, & Sun, 2015).

| Unique genes of SC09
SC09 has a total of 261 genes that are not present in the other strains: 148 of them encode hypothetical proteins, 17 are phage-related genes, nine are mobile genetic elements, and the remaining 87 encode proteins with different functions. As was suggested for Big Creek 74, some of these genes may underpin the adaptation of this strain to survive inside the host (catfish) or under certain environmental conditions.
Among these unique proteins are transcriptional regulators, proteins related to type IV secretion systems, restriction-modification and toxin-antitoxin components, and proteins associated with cellular energy homeostasis (Table S2). One of the most interesting proteins is a thymidylate synthase, an enzyme linked to virulence in several microorganisms such as Staphylococcus aureus (Kriegeskorte et al., 2014) or Salmonella typhimurium, in which it was necessary for intracellular growth, both in macrophage-like and Hep-2 human epithelial cell lines, and also for complete virulence in a BALB/c mice model (Kok, Bühlmann, & Pechère, 2001). A finding which is worthy of further investigation was the presence, only in this strain, of a cluster of 12 genes related to cell wall polysaccharide biosynthesis, in particular the O-antigen.

| Genes solely absent in the avirulent strain ATCC29473
Among the five strains included in the study, ATCC29473 was defined as avirulent. It was therefore of interest to determine which genes are absent in this strain and present in the others, in order to elucidate the genetic basis of its attenuation. A total of 21 genes were found (Table 4), all of them encoding proteins with an assigned function, which were probably lost during the evolution of this strain. It is significant that 17 out of 21 genes are adjacent in the other four genomes from virulent strains (Figure 5). This region of 19,566 bp contains genes encoding a Crp-Fnr family transcriptional regulator, a hypothetical protein, an enzyme related to an enterobactin-like siderophore, and three different gene clusters: one formed by three genes involved in iron transport, a group of three genes related to hexose phosphate uptake, and a region containing nine genes involved in the uptake and metabolism of citrate. Since most of these genes are related to virulence (Gray, Freitag, & Boor, 2006; Moisi et al., 2013; Urbany & Neuhaus, 2008), it is possible that the absence of this region could explain, at least in part, the attenuation of Y. ruckeri ATCC29473. This is important for future studies and may help to shed light on the virulence of the species.
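The two steps in this analysis (finding genes present in every other strain but missing from one, and checking whether those genes lie next to each other) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the study: the function names `absent_only_in` and `longest_adjacent_run`, the gene labels `g1` to `g9`, and the strain names `strain_A` to `strain_C` are all hypothetical, and gene order is assumed to be available as a simple list for one reference genome.

```python
def absent_only_in(target, genes_by_strain):
    """Genes present in every strain except `target`, and missing from `target`."""
    others = [genes for strain, genes in genes_by_strain.items() if strain != target]
    return set.intersection(*others) - genes_by_strain[target]


def longest_adjacent_run(gene_order, selected):
    """Longest stretch of consecutive genes in `gene_order` that are all in `selected`."""
    best, current = [], []
    for gene in gene_order:
        if gene in selected:
            current.append(gene)
            if len(current) > len(best):
                best = list(current)
        else:
            current = []  # a gene outside the selection breaks the run
    return best


# Toy data with hypothetical gene labels (not taken from Table 4).
genes_by_strain = {
    "strain_A": {"g1", "g2", "g3", "g4", "g5", "g6"},
    "strain_B": {"g1", "g2", "g3", "g4", "g5", "g7"},
    "strain_C": {"g1", "g5", "g8"},          # plays the role of the attenuated strain
}
missing = absent_only_in("strain_C", genes_by_strain)

# Gene order along one genome that still carries the region.
order_in_A = ["g1", "g2", "g3", "g4", "g5", "g6"]
run = longest_adjacent_run(order_in_A, missing)
```

In the toy data, `missing` contains `g2`, `g3`, and `g4`, and all three form a single adjacent run, which mirrors the situation described above where most of the absent genes sit in one contiguous region.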

| CONCLUSION
This study presents, for the first time, a comparative analysis of five genome sequences of Y. ruckeri. Although the five strains shared approximately 75% of their genes, our study has revealed important genetic differences between the five genomes. Aside from the genetic differentiation found between serotype O1 and O2 strains, especially relevant are the high number of unique genes found in Big Creek 74 and SC09 relative to serotype O1 strains and the 21 genes absent in the avirulent strain ATCC29473. These findings could explain the host specificity of the first two strains and the virulence attenuation of ATCC29473. Further investigation of those genes could provide insights into the pathogenesis of Y. ruckeri and its mechanisms of adaptation to different environments.