Comparative Genome Analyses of Vibrio anguillarum Strains Reveal a Link with Pathogenicity Traits

Comparative genome analysis of strains of a pathogenic bacterial species can be a powerful tool to discover acquisition of mobile genetic elements related to virulence. Here, we compared 28 V. anguillarum strains that differed in virulence in fish larval models. By pan-genome analyses, we found that six of nine highly virulent strains had a unique core and accessory genome. In contrast, V. anguillarum strains that were medium to nonvirulent had low genomic diversity. Integration of genomic and phenotypic features provides insights into the evolution of V. anguillarum and can also be important for survey and diagnostic purposes.

V. anguillarum pan-genome increased with each addition of a new genome and had at least 3,973 and 1,932 genes for chromosomes I and II, respectively (see Fig. S1 in the supplemental material). In contrast to this increase, the V. anguillarum core genome decreased with the addition of each new genome, as expected (Fig. S1). The V. anguillarum average gene contents were 1,891 and 479 genes for chromosomes I and II, respectively ( Fig. 1; Fig. S1). These open reading frames (ORFs) belonging to the core genome were assigned to putative functional categories using the Clusters of Orthologous Groups of Proteins (COG) database (Fig. S1B). Approximately 55.4 and 17.3% of the predicted genes in the core genome were dedicated to metabolic functions for chromosomes I and II, respectively. Of these genes, 36.2 and 13.7% were split between cellular process/signaling functions and information storage/processing functions for chromosomes I and II, respectively. Finally, functions of the predicted genes in the remaining 8.4 and 69% of the core genome were assigned as uncharacterized proteins for chromosomes I and II, respectively (Fig. S1B).
Distribution of virulence factors. Several virulence-associated genes have been described in V. anguillarum strain 775 (21). More than 90% of these virulence genes were present in all strains (see Table S3 in the supplemental material). Genes involved in iron transport, metalloproteases, motility, chemotaxis, type IV pilus, and quorum sensing were found in all strains (Table S3). However, V. anguillarum strains PF4, PF430-3, PF7, and S2 2/9 lacked the type VI secretion system present on chromosome I. Moreover, none of the V. anguillarum strains had the transcriptional regulator hylU or the unknown protein related to catechol siderophore metabolism positioned in chro-  (Table S3). All strains in which the plasmid pJM1 was found carried genes related to anguibactin and iron metabolism (Table S3). Although our results showed a global distribution of homologous virulence factors, nucleotide sequence dissimilarities could be translated to changes in amino acid level, which affect the dynamic functions or activities of these pathogenicity factors, leading to the development of a more invasive infection (32)(33)(34). To reveal the evolution of the virulence, we inferred the genetic diversity of 163 representative virulence factors shared by the V. anguillarum strains using the maximum likelihood algorithm (Fig. 2). V. anguillarum strains with medium to low virulence tended to cluster in a homogenous group, and the high-virulence strains 90-11-286, PF4, PF430-3, PF7, DSM21597, and HI610 clustered as separate groups (Fig. 2).
Phylogenetic relationship. In order to examine potential associations between the core genome composition and the virulence properties, the phylogeny of V. anguillarum strains was inferred by constructing a genome-relatedness maximum likelihood tree using orthologous alignment of 2,370 protein-coding genes (both chromosomes) of the core genome (Fig. 1). The evolutionary tree displayed different cluster patterns, which varied in the levels of diversity. Interestingly, 20 of the 28 strains that were medium to nonvirulent in the larval assays grouped with a very low genetic diversity (Fig. 3), with the exception of strains 601/90, 178/90, and 91-7-154, which were highly virulent in the larval systems (30) (Table S1). In contrast, V. anguillarum strains 90-11-286, PF4, PF430-3, PF7, DSM21597, and HI610, all of which were highly virulent, each constituted a separate cluster (Fig. 3). V. anguillarum strain DSM21597 was the most distant lineage in the phylogenetic tree. However, strains S2 2/9 and 4299, which were medium to low virulence, grouped with strains 90-11-286 and HI610, respectively (Fig. 3). These findings suggest an association between virulence and shared gene content. Moreover, the core phylogenetic tree indicated a geographic association among the most genetically diverse V. anguillarum strains. For example, the V. anguillarum strains from Chile, PF4, PF430-3, and PF7, shared a common ancestor. Similarly, strains HI610 and 4299, isolated in Norway, and strains S2 2/9 and 90-11-286, isolated in Denmark, clustered according to the geographic locality of isolation (Fig. 3).
GIs, prophages, and strain-specific genes. Ten genomic islands (GIs) have been described in V. anguillarum strain 775 (21), and their distribution in our collection was determined (see Fig. S2 in the supplemental material). All GIs (GIs 1 through 10) were found only in strains T265 and 775. Interestingly, the specific GIs 4, 6, 7, 8, and 10 were present in 82, 68, 64, 57, and 55% of the strains in the V. anguillarum collection, respectively (Fig. S2). For the purpose of this study, a GI was defined as a specific genomic region containing five or more ORFs (Ͼ5 kb). We detected a total of 64 strain-specific GIs between 6 and 132.1 kb for strains 90-11-286, PF4, PF430-3, PF7, HI610, DSM21597, 4299, and S2 2/9, all associated with transposases or integrases (see Table S4 in the supplemental material). Six of these nine strains were all highly virulent in cod, turbot, and halibut larva systems (31) (Table S1). A total of 1,067 strain-specific ORFs were found in these GIs. The GϩC contents ranged from 26.2 to 44.9% for GIs in chromosome I and 32.0 to 52.9% for GIs in chromosome II (Table S4). Genes related to toxins, fitness factors, modification-restriction systems, antitoxin-toxin systems, transport, and metabolism were found in the GIs. V. anguillarum strain 90-11-286 had 21  (Table S2). Bootstrap values Ͻ80% were removed from the tree. The horizontal bar at the base of the figure represents 0.01 substitutions per amino acid site. The virulence ranking of the strains is based on three fish larva models (31). HV, high virulence; LV, low virulence; MV, medium virulence.
specific GIs: one of them had an aerolysin toxin (GI 1), one contained diaguanylate cyclase and hemagglutinin genes (GI 9), and two harbored genes related to toxin RTX and toxin ABC transporter (GI 14) and iron and phosphate uptake systems (GI 17) (Fig. 4A). This strain contained a GI (GI 5) encoding a DprA protein, which has been associated with DNA transport and natural transformation competence (Fig. 4A).
V. anguillarum strain PF4 had a GI of 24.8 kb that encoded three antitoxin-toxin systems and one oxidoreductase gene (GI 23) (Fig. 4B). Also, strain PF7 had a GI that FIG 3 Core genome phylogeny of V. anguillarum strains. The maximum likelihood tree was obtained from a concatenated nucleotide sequence alignment of the orthologous core genes (1,723 genes for both chromosomes) for the 28 V. anguillarum strains. The virulence properties of the strains and geographical places of isolation were added to improve comparison. Bootstrap values of Ͻ80% were removed from the tree. The horizontal bar at the base of the figure represents 0.6 substitution per nucleotide site. The virulence ranking of the strains is based on three fish larva models (31). HV, high virulence; LV, low virulence; MV, medium virulence. encoded many acyltransferases, RTX toxin, nitrate reductase, and one glyoxalase gene related to antibiotic resistance (GI 31) (Fig. 4B). One GI of strain S2 2/9 carried genes coding for HipA protein and an antitoxin-toxin system (GI 35) (Fig. 4C). Strain HI610 harbored a GI with the presence of 50 rRNA methyltransferase, which is related to antibiotic resistance and RTX toxin Ca 2ϩ -binding protein (GI 48) (Fig. 4C). The GI of strain DSM21597 had genes encoding zonula occludens toxins (Zots) and the protein MarC related to resistance to antibiotics (Fig. 4D). Many other GIs harbored genes of ecological interest, but specific details for each of these are out of the scope of this article.
Finally, a new set of specific putative virulence factors were identified in strains 90-11-286 and 91-7-154 (see Table S5 in the supplemental material). V. anguillarum strain 90-11-286 had three additional hemagglutinin proteins, one toxin Fic protein, and one cytotoxic necrotizing factor 2 protein (Table S5). Strain 91-7-154 had a Zot. Interestingly, noticeable amino acid similarities (Ͼ74%) of these virulence factors were found to those of other Vibrio species (V. cholerae, V. ordalii, and V. harveyi), indicating that the genes coding for these proteins may have extrachromosomal origin (Table S5).
Prophages. Fifty-five different prophage-related elements were detected in the V. anguillarum genome sequences: of these, 9% were intact prophages, and the rest were defined as a "cryptic" or incomplete prophages. Both types of prophages were Vibrio anguillarum, Genomes, Pan-Genome, and Virulence found in chromosomes I and II (see Tables S6 and S7 in the supplemental material). Specifically, 40 (72%) unique phage-related sequences between 5.3 and 49.2 kb were specific in 17 of the 28 V. anguillarum strains (Table S6). On other hand, 15 phagerelated sequences (28%) were shared in 24 out of 28 V. anguillarum strains independently of locality and year of isolation (Table S7). Investigation of the presence of virulence or fitness factors encoded inside these sequences showed that V. anguillarum strains T265 and Ba35 carried a prophage-like element of 9.2 kb linked to a Zot-like toxin ( Fig. 4E; Table S7 [prophage 43]).

DISCUSSION
The 28 V. anguillarum strains analyzed in the present study represent the largest collection of genome-sequenced strains for this fish-pathogenic bacterium. The multiscale comparative approach used in this work provides insights into the diversity of V. anguillarum strains (Fig. 1 to 4). Identification and characterization of accessory genome included genes that confer resistance to antibiotics and encode toxins and/or genes that improve the fitness of the organism, which may have been acquired via lateral gene transfer (35) (Fig. 4). In addition, core genome diversity indicated that the most virulent strains grouped in different genetic clusters (Table S1; Fig. 3). Thus, altogether our data indicate that virulence is multifactorial in V. anguillarum and that both the core and accessory genomes affect the pathogenicity of this Vibrio species. Similarly, the accessory and core genomes are significant sources of virulenceassociated genes in Klebsiella pneumoniae (36), Escherichia coli (37), Staphylococcus aureus (38), and Pseudomona aeruginosa (39).
The plasmid pJM1 has been described as an important virulence factor in V. anguillarum (21,40). However, V. anguillarum strains 90-11-286, PF4, PF7, and HI610, which were highly virulent against fish larvae, did not contain the plasmid pJM1 (Table 1;  Table S1), but they had a functional vanchrobactin locus (not interrupted by a transposon) in the bacterial chromosome (31). This observation is in accordance with a previous study in which virulent pJM1-deficient strains did not carry the anguibactin system but produced the chromosomally encoded siderophore vanchrobactin, which is potentially a virulence factor (23,41,42). Thus, our results suggest that the presence or absence of the pJM1 plasmid is not an essential factor for V. anguillarum to cause disease in fish larvae.
The pan-genome analysis revealed that V. anguillarum contained a core genome of 2,370 nonduplicated ORFs for both chromosomes ( Fig. 1; Fig. S1). This level of core gene content is higher than that reported for V. mimicus (43) but lower than those in V. parahaemolyticus (44) and V. cholerae (45). When the V. anguillarum core genome was used to determine phylogeny, the 28 V. anguillarum strains clustered in 5 groups, and six of the nine most virulent strains were found in separate clusters (Fig. 3). In contrast, most of the strains that showed moderate to no virulence in our larval systems shared a very similar backbone, and hence probably all originated from a common ancestor (Fig. 3). These characteristics could be relevant for understanding the relationship between the core genome diversity and the influence of acquired mobile elements on pathogenicity in V. anguillarum. We speculate that these phylogenetically distant bacteria could occupy different niches in fish farms and that the acquisition of GIs may vary in these aquatic systems and be influenced by the genetic background. Interestingly, three V. anguillarum strains (91-7-154, 601/90, and 178/90) that displayed highvirulence properties in the larval systems clustered with the strains that showed medium-or low-virulence properties ( Fig. 3; Table S1). This observation allows us to suggest that specific mutations in the core genome may be linked to their highvirulence phenotypes.
Both chromosomes contained strain-specific elements and new virulence genes as a result of insertion of different genomic islands, prophages, and/or acquisition of other mobile genetic elements (described below) (Tables S4, S6, and S7). These results are in contrast to previous analysis in V. antiquaries (46), or V. mimicus (43), where chromosome II represents a collection of accessory elements and likely participate in the adaptation to different niches, having a critical role in the speciation and evolution of the genus Vibrio (47). However, our results indicated that chromosomes I and II both have genome plasticity ( Fig. 1; Fig. S1), leading us to suggest that both chromosomes are involved in and driving the evolution in V. anguillarum.
To capture the dynamic nature of virulence gene repertoires across V. anguillarum, we screened for Ͼ200 virulence related genes (21) (Table S3). A total of 163 genes were present in all of the strains (belonging to core genome), and phylogenetic relationships displayed a functional divergence in six out of nine of the most virulent strains (Fig. 2). This diversification leads us to suggest that virulence activity is under strong selection, affecting the dynamic functions or activities of these proteins, leading to the development of a stronger interaction with the host in the different steps of infection, as has been proposed for Pseudomonas syringae (32) and P. aeruginosa (48).
Genomic islands contribute to the evolution and diversification of microbial communities (49). We found 64 GIs that belonged to the accessory genome among six out of the nine most virulent V. anguillarum strains (Table S4). V. anguillarum strain 90-11-286 was highly virulent in the larval models (Table S1) and harbored GIs carrying a diversity of virulence factors (Fig. 4A). For example, one GI encoded the channelforming toxin aerolysin, which has been associated with diarrheal diseases and deep wound infections, by interacting with eukaryotic cells and aggregating to form pores, leading to the destruction of the membrane permeability barrier and osmotic lysis (50). A second GI had a diguanylate cyclase gene, which affects the adhesive and invasive capabilities of the human pathogen Porphyromonas gingivalis (51). Finally, a third GI contained a hemolysin toxin, RTX, associated with toxin ABC transporter (52). Interestingly, this strain harbored the highest number of accessory genes in both chromosomes ( Fig. 1; Fig. S1), and this feature could be associated with the presence of the protein DprA (Fig. 4A), which participates in uptake, transport, and protection of DNA in the natural transformation process (53).
Strains PF4, PF7, and S2 2/9 harbored GIs that encoded several acyltransferases and toxin-antitoxin systems ( Fig. 4B and C). Acyltransferases are enzymes that transfers acyl groups to specific targets and may be an important factor regulating quorum-sensing virulence-related phenotypes, including the production of virulence factors, motility, and biofilm formation (54). Toxin-antitoxin systems, which were originally linked to the plasmid maintenance and stabilization of the bacterial chromosome, are now known to be involved in general stress response (55), persistence (56), biofilm formation (57), and virulence capacity of pathogenic bacteria (58). Finally, strains HI610 and DSM21597 exhibited GIs which encode toxins and resistance to reactive oxygen species (Fig. 4C and D). Genomic island in strain HI610 had a 50 rRNA methyltransferase, which contribute to the virulence in Staphylococcus aureus by conferring resistance to oxidative stress (59) and hemolysin toxin RTX Ca 2ϩ binding (52). Strain DSM21597 had two zonula occludens toxins (Zot), described previously in V. cholerae, whose function is to increase intestinal permeability by interacting with a mammalian cell receptor, with subsequent activation of intracellular signaling leading to the disassembly of the intercellular tight junctions (60,61).
A set of prophage-related elements were identified in all 28 V. anguillarum strains (Table S6 and S7). Two prophage-like elements in the strains T265 and Ba35 contained Zot genes, which shared homology with V. cholerae (Fig. 4E). The presence of this toxin has also been documented in a prophage genome in V. coralliilyticus (62). However, recent studies have indicated that prophage elements may provide a benefit on virulence or fitness evolution, even if they do not carry virulence factors (63). Thus, this finding could be a starting point for future experimental studies on the role of bacteriophages as a potential central driver of pathogenicity in V. anguillarum.
Comparative genome analysis also revealed that most virulent strains 90-11-286, PF4, PF430, PF7, HI610, and DSM21597, and the low-virulence strain 4299 carried a new set of gene clusters related to biosynthesis, modification, and transport of exopolysaccharides (Fig. 5). In contrast to the limited diversity observed in this region among the 21 remaining strains, it clearly indicated that these clusters have been horizontally transferred. The existence of these accessory genes is regarded as essential virulence factor in Burkholderia pseudomallei (64) and V. cholerae (45). More importantly, the presence of these genes has been associated with the modification of capsule polysaccharide content and evasion of immune response (65).
Unlike Salmonella enterica serovar Typhi (66), Bacillus anthracis (67) and V. parahaemolyticus O3:K6 (68), which showed low genetic diversity, V. anguillarum offered an example of how lateral gene transfer has an important role in accessorizing the genome, providing genes essential for pathogenicity or fitness ( Fig. 4; Table S5). Taken altogether, we propose a hypothetical model of evolution in V. anguillarum occurring in distinct phylogenetic groups, which shows that the high-virulence properties of some strains were obtained mainly via acquisition of pathogenic genomic islands occurring in the natural environment (see Fig. S3 in the supplemental material). It should be noted that the stability and transmission of these GIs were speculative. For example, GIs 4 and 6 detected previously in strain 775 were present in 82 and 68% of the strains, respectively, independently of the geographic localities of isolation (Fig. S2). Thus, we assumed a vertical transmission and loss of these specific GIs in some phylogenetic clusters. In contrast, GIs 3 and 5 were presented in 7% and 14%, respectively (Fig. S2), and their transmission could be horizontal for those strains. Similarly, we hypothesized that temperate bacteriophages infected putative V. anguillarum genetic ancestors (e.g., Pp41) (Table S7) and consequently the prophage-related elements were transmitted vertically, while other bacteriophages could be strain specific (e.g., Pp 16). Altogether, this study proposes that GIs and prophage-related elements outside the core genome may be a driving force in diversity and pathogenicity of V. anguillarum (Fig. S3).
V. anguillarum is an important part of the autochthonous marine microbial communities with a specific ecological niche, such as fish, where selective pressure may allow acquisition of genetic traits that could increase fitness and virulence potential (69). Data presented here clearly support this view, where genomic islands carrying a suite of virulence genes and other mobile elements are probably driving the pathogenic and/or fitness evolution of V. anguillarum ( Fig. 4; Tables S4 and S5). It has been suggested that virulence factors have a dual function and are used by pathogens both during the host infection and in environmental adaptation (70). For example, the toxin hemagglutinin in V. cholerae has a role in intestinal colonization, but has also recently been implicated in biofilm formation on chitin-containing surfaces in aquatic environment (71). In the same way, V. antiquarius, isolated in a deep sea hydrothermal vent, exhibited Zot and RTX toxins (46), indicating a multifaceted role outside the host. Thus, the presence of these genes in the V. anguillarum strains ( Fig. 4; Tables S4 and S5) suggests a dual role in non-host environments; however, clearly a new outlook is needed for inferring the putative secondary role of pathogenic genes in this bacteria.
Conclusions. Using a comparative pan-genomic analysis of V. anguillarum, we identified new pathogenic genomic islands, prophages, and virulence factors, suggesting that independent acquisition of these mobile genetic elements could play an important role in the evolution and virulence of V. anguillarum. The phylogenetic relationship based on core genome and shared virulence factors revealed different cluster groups, which suggested a possible link with the virulence properties and supported the idea that pathogenicity is also driven by core genome content within this bacterial species. Altogether, the genome sequences analyzed could serve as a reference point for studies of pathogenicity in aquaculture when V. anguillarum is present.

MATERIALS AND METHODS
Strain selection, medium composition, and growth conditions. Twenty-eight V. anguillarum strains isolated from different geographic localities (Ͼ13,000 km), temporal scales (Ͼ25 years), and hosts (Table 1) were included in the analyses. The strains were stored at Ϫ80°C in LB broth (12106; Mo-Bio) with 15% glycerol. Strains were grown in LB broth and incubated at 22°C with agitation for 24 h (72).
DNA extraction. Bacterial DNA from V. anguillarum strains was extracted from cells harvested by centrifugation (5000 ϫ g, 10 min) using the NucleoSpin tissue kit (Macherey-Nagel). The amount of genomic DNA was measured using a Nanodrop2000 UV-visible light (UV-Vis) spectrophotometer (Thermo Scientific).
Genome sequencing, assembly, and annotation. The genomes of 25 V. anguillarum strains were sequenced using Illumina HiSeq platform (BGI, China) with paired-end read sizes of 100 bp. Library construction, sequencing, and data pipelining were performed in accordance with the manufacturer's protocols. The Illumina data were assembled into contiguous sequences using Geneious software (version 9.1.4) (73), and short-and low-coverage contigs were filtered out. The remaining contigs were aligned using chromosomes I and II and plasmid pJM1 of V. anguillarum strain 775 as references (GenBank accession no. CP002284.1 [chromosome I], CP002285.1 [chromosome II], and AY312585 [plasmid pJM1]; December 2014). The genome assembly process was performed using the Geneious software version 9.1.4 and assembled into two scaffolds of 35 to 71 contigs with an average coverage of Ͼ88ϫ for each isolate. The genome of V. anguillarum strain 90-11-286 was already fully sequenced and previously described (74). Annotation of the genomes was done by the NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP) (75). Alternatively, genomic annotation was done by RAST (76) and BaSys (77).
Identification of genomic islands, prophage-like elements, and virulence factors. IslandViewer and MAUVE v2.3.1 were used to predict the putative genomic islands (GIs) (78,79). IslandViewer integrated sequence composition-based genomic island prediction programs, including IslandPath-DIMOB, SIGI-HMM, and the comparative genome-based program IslandPick. The MAUVE alignment procedure allows the detection of unique regions using a comparative genomics approach. Putative virulence genes were predicted using the virulence database MvirDB (80). All predicted genes of the 28 V. anguillarum strains were searched against the MvirDB by BLASTP with loose criteria (E value, Ն1EϪ5; identity, Ն35%; coverage, Ն80%). Also, VirulenceFinder 1.2 (81) was used to screen for putative virulence factors using selected databases from Escherichia coli, Enterococcus, and Streptococcus aureus. Prophagerelated sequences were identified and selected by running bacterial genomes in Phage_Finder v2.1 (82) and PHAST (83).
Virulence-related genes and genomic islands (GIs) of V. anguillarum strain 775 were used to identify homologs in the V. anguillarum genomes by BLAST analyses using the tBLASTn 2/2/25ϩ tool and an E value threshold of Յ10 Ϫ10 (84). These DNA sequences were verified as reciprocal best hits.
Pan-genome analysis. To predict the possible genomic dynamic changes at V. anguillarum, EDGAR (85) was used to predict the pan-genome: i.e., to determine the accessory genome (specific genes found in only one genome) and core genome (common genes mutually conserved). Comparative analyses at the protein level were done by an all-against-all comparison of the annotated genomes. The algorithm used was BLASTP with a standard scoring matrix, BLOMSUM62, and an E value cutoff of 10 Ϫ4 . All BLAST hits were normalized according to the best score (84). The score ratio value (SRV), which shows the quality of the hit, was calculated by dividing the scores of further hits by the best hit (86). Two genes were considered orthologous when revealing a bidirectional best BLAST hit with single SRV exceeding the predetermined cutoff of 76 (85).
Functional annotation of genes and transposase identification was accomplished by BLASTp alignment of annotated ORFs against the COG database (87) using BLASTϩ v2.2.24 (88).
Phylogenomic tree reconstruction. To reveal the phylogenetic relationship among V. anguillarum strains based on virulence factors, we selected 163 putative pathogenicity genes from V. anguillarum strain 775 (21). For each gene, protein sequences were aligned using ClustalW version 2.0 (89), and individual proteins were concatenated to infer phylogeny using maximum likelihood in Geneious version 9.1.4 (73). Similarly, to determine the core genome phylogenetic relationship among V. anguillarum strains based on genomic data, we selected a set of orthologous genes shared by all 28 strains and V. parahaemolyticus strain RIMD 2210633 (outgroup to root the tree) (1,723 genes present in a single copy, with paralogs not included) using OrthoMCL with an E value cutoff of 10 Ϫ10 (90). The sets of 1,723 single core genes were first aligned at the amino acid level using ClustalW version 2.0 (89) and then back-translated to DNA sequences using PAL2NAL (91). The alignment of all orthologous sequences was concatenated using FASconCAT (92). The gene tree was constructed using PhyML (93).
Accession number(s). Accession numbers for chromosomes and plasmids are listed in Table 1.