Genomic Analysis of Pseudomonas sp. Strain SCT, an Iodate-Reducing Bacterium Isolated from Marine Sediment, Reveals a Possible Use for Bioremediation

Strain SCT is an iodate-reducing bacterium isolated from marine sediment in Kanagawa Prefecture, Japan. In this study, we determined the draft genome sequence of strain SCT and compared it to complete genome sequences of other closely related bacteria, including Pseudomonas stutzeri. A phylogeny inferred from concatenation of core genes revealed that strain SCT was closely related to marine isolates of P. stutzeri. Genes present in the SCT genome but absent from the other analyzed P. stutzeri genomes comprised clusters corresponding to putative prophage regions and possible operons. They included pil genes, which encode type IV pili for natural transformation; the mer operon, which encodes resistance systems for mercury; and the pst operon, which encodes a Pi-specific transport system for phosphate uptake. We found that strain SCT had more prophage-like genes than the other P. stutzeri strains and that the majority (70%) of them were SCT strain-specific. These genes, encoded on distinct prophage regions, may have been acquired after branching from a common ancestor following independent phage transfer events. Thus, the genome sequence of Pseudomonas sp. strain SCT can provide detailed insights into its metabolic potential and the evolution of genetic elements associated with its unique phenotype.

Bay, Kanagawa Prefecture, Japan, by enrichment culture of iodate-reducing bacteria (Amachi et al. 2007). A phylogenetic analysis based on 16S rRNA gene sequencing indicated that strain SCT was most closely related to Pseudomonas stutzeri (Amachi et al. 2007).
Results by Amachi et al. (2007) suggest "SCT is a dissimilatory iodate-reducing bacterium and that its iodate reductase is induced by iodate under anaerobic growth conditions". However, whereas the iodate-reducing activity of strain SCT has been extensively studied, its other potential functions remain largely unknown. The purpose of this study was to gain insights into the evolution and functional potential of strain SCT. To this end, we performed whole-genome shotgun sequencing of strain SCT and comparative genomic analyses with closely related bacteria including P. stutzeri.

Software
Bioinformatics analyses were conducted using Python version 3.5.5 and its molecular biology package Biopython version 1.72 (Cock et al. 2009). Statistical computing was implemented using R version 3.4.3, available at https://www.r-project.org/ (R Development Core Team 2008).

Genome sequencing, assembly, and annotation of strain SCT
Strain SCT was grown aerobically in Lysogeny broth (LB) medium. Genomic DNA was extracted using a DNeasy blood and tissue kit (Qiagen, Hilden, Germany). According to the manufacturer's protocols, an Illumina paired-end library (with an average insert size of 550 bp) was prepared, and whole-genome sequencing was performed using an Illumina MiSeq sequencing platform (Illumina, San Diego, CA) at the National Institute for Environmental Studies. The sequencer produced 1,084,678 paired-end reads (2 × 300 bp).
Protein-coding DNA sequences (CDSs) were predicted and functional annotations (gene and product names) were assigned using Prokka version 1.11, which coordinates a suite of existing bioinformatics tools and databases for annotation of prokaryotic genome sequences (Seemann 2014), incorporating Prodigal (Hyatt et al. 2010), BLAST+ (Altschul et al. 1997), and HMMER (http://hmmer.org/). Prokka was run with the following parameters: "--kingdom Bacteria --compliant" (https://github.com/tseemann/prokka). We then performed similarity searches of all the predicted protein sequences against the UniRef90 sequence database (UniRef90 Release 2016_08 consisting of 44,448,796 entries) using the BLASTP program with an E-value cutoff of 1e-5, and assigned functional annotations from the most similar (best hit) protein sequences. BlastKOALA (Kanehisa et al. 2016) was used to assign KEGG Orthology identifiers to the protein sequences obtained by BLAST searches, for which taxonomy group information on "Bacteria" and the KEGG Genes database of "family_eukaryotes + genus_prokaryotes" were selected at https://www.kegg.jp/blastkoala/.
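The best-hit assignment described above can be illustrated with a minimal Python sketch. It assumes that the BLASTP results are available in the standard 12-column tabular format (BLAST+ "-outfmt 6"), which the text does not state; the function name best_hits and the example identifiers are hypothetical and serve only as an illustration.

```python
import csv
import io


def best_hits(tabular_text, evalue_cutoff=1e-5):
    """Return {query: (subject, evalue, bitscore)} with one best hit per query.

    Assumes the 12-column BLAST tabular layout, in which column 11 is the
    E-value and column 12 is the bit score. Hits with an E-value above the
    cutoff are discarded; among the rest, the highest bit score wins.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > evalue_cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example rows (identifiers and scores are made up).
example = "\n".join([
    "PSCT_00001\tUniRef90_A\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550",
    "PSCT_00001\tUniRef90_B\t60.0\t290\t100\t2\t1\t290\t5\t294\t1e-40\t160",
    "PSCT_00002\tUniRef90_C\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25",
])
hits = best_hits(example)
```

In this toy input, the second query has no hit below the cutoff and therefore receives no annotation, mirroring the proteins that had no match in UniRef90.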

Search for mobile genetic elements
We searched the Pseudomonas genomes for mobile genetic elements such as phages. The PHASTER search tool (Arndt et al. 2017) was used to identify putative prophage regions in the 13 Pseudomonas genomes analyzed. A new search for phage sequences in the SCT genome was performed at http://phaster.ca/, and pre-calculated results for the other 12 Pseudomonas genomes were obtained from http://phaster.ca/submissions.

Phylogenetic analysis
To infer phylogenetic relationships among the 13 Pseudomonas strains, we used single-copy core genes, which are shared by all genomes and contain only a single copy from each genome (and thus contain orthologs, but not paralogs). The core genes were built using the Roary pangenomic analysis pipeline (Page et al. 2015) with default parameters. All nucleotide alignments of the core genes were done in MAFFT (Katoh et al. 2002) and then concatenated by Roary. We used RAxML version 8.2.11 (Stamatakis 2014) for maximum likelihood-based inference of a phylogenetic tree on the concatenated sequence alignment under the GTR+CAT model. RAxML was run as follows: "raxmlHPC-PTHREADS -f a -x 12345 -p 12345 -# 100 -m GTRCAT -s ./core_gene_alignment.phy -n outfile". The resulting tree was drawn using FigTree version 1.4.3, available at http://tree.bio.ed.ac.uk/software/figtree/.
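The definition of single-copy core genes given above can be expressed as a short Python sketch. This is not Roary's implementation; it only illustrates the selection criterion, assuming a table of per-genome copy numbers for each gene family. The function name single_copy_core, the family names, and the copy numbers are hypothetical.

```python
def single_copy_core(gene_counts, genomes):
    """Return gene families present in exactly one copy in every genome.

    gene_counts maps a gene family name to a {genome: copy_number} dict;
    a genome missing from the inner dict is treated as having zero copies.
    """
    core = []
    for family, counts in sorted(gene_counts.items()):
        if all(counts.get(genome, 0) == 1 for genome in genomes):
            core.append(family)
    return core


# Toy input: three genomes and four hypothetical gene families.
genomes = ["SCT", "RCH2", "A1501"]
gene_counts = {
    "family_1": {"SCT": 1, "RCH2": 1, "A1501": 1},  # single-copy core
    "family_2": {"SCT": 2, "RCH2": 1, "A1501": 1},  # paralog in SCT
    "family_3": {"SCT": 1, "RCH2": 1},              # absent from A1501
    "family_4": {"SCT": 1, "RCH2": 1, "A1501": 1},  # single-copy core
}
core_families = single_copy_core(gene_counts, genomes)
```

Families with a duplicated copy or a missing genome are excluded, which is why the retained set contains orthologs but not paralogs.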

Gene conservation analysis
To assess the conservation of SCT protein-coding genes in the other Pseudomonas genomes, we used the large-scale blast score ratio (LS-BSR) (Sahl et al. 2014). Briefly, the LS-BSR pipeline performed a TBLASTN search using the protein sequence of strain SCT as a query and the whole nucleotide sequence of each of the Pseudomonas strains as a database, and calculated the BSR. The obtained BSR value ranged from 0 (no sequence similarity) to 1 (maximal sequence similarity) and was used as a measure of the degree of conservation of SCT genes in the other Pseudomonas genomes. The '$prefix_bsr_matrix.txt' file contains the BSR value for each gene in each genome, and the '$prefix_dup_matrix.txt' file was used to determine gene presence and absence in each genome (https://github.com/jasonsahl/LS-BSR).
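As an illustration of the BSR measure, the following Python sketch computes the ratio of the bit score of a query against a target genome to the bit score of the query against itself, and then lists genes whose BSR falls below a cutoff in every comparison genome. It is not the LS-BSR code. The function names, the toy matrix, and the cutoff value are hypothetical; in particular, the cutoff is an assumption and is not taken from this study, which used the LS-BSR output files to call presence and absence.

```python
def blast_score_ratio(target_bitscore, self_bitscore):
    """Return the BLAST score ratio, clamped to the interval [0, 1]."""
    if self_bitscore <= 0:
        raise ValueError("self bit score must be positive")
    ratio = target_bitscore / self_bitscore
    return max(0.0, min(1.0, ratio))


def absent_everywhere(bsr_matrix, cutoff):
    """Return genes whose BSR is below the cutoff in every genome.

    bsr_matrix maps a gene name to a {genome: bsr} dict. The cutoff is
    supplied by the caller because no value is given in the text.
    """
    return sorted(
        gene
        for gene, row in bsr_matrix.items()
        if all(value < cutoff for value in row.values())
    )


# Toy matrix with made-up values for two hypothetical genes.
toy_matrix = {
    "gene_a": {"RCH2": 0.98, "A1501": 0.91},
    "gene_b": {"RCH2": 0.10, "A1501": 0.00},
}
illustrative_cutoff = 0.4  # assumption for this example only
specific = absent_everywhere(toy_matrix, illustrative_cutoff)
```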

Data availability
The draft genome sequence of Pseudomonas sp. strain SCT has been deposited at GenBank/EMBL/DDBJ under BioProject number PRJDB5044, BioSample number SAMD00059319, and accession number BDJA00000000 (accession range: BDJA01000001-BDJA01000035). The version described in this paper is the first version, BDJA01000000. The raw reads have been deposited in the DDBJ Sequence Read Archive

Phylogeny
An accurate phylogenetic tree of a group of organisms provides a valid inference of its evolutionary history, including gene gain and loss events (Song et al. 2017). The subgroups of the genus Pseudomonas, including the P. stutzeri group, have been defined based on phylogenetic analyses of 16S rRNA gene sequences (Anzai et al. 2000). Recent studies have demonstrated that 16S rRNA gene sequences do not contain enough phylogenetic signal to distinguish closely related bacteria such as strains within the same species (Fox et al. 1992; Özen and Ussery 2012). To attain higher phylogenetic resolution, we inferred the phylogenetic relationship of SCT and other Pseudomonas strains based on 53 conserved core genes. Figure 1 shows the maximum likelihood phylogenetic tree based on the concatenated nucleotide sequence alignment of core genes. Core genome phylogeny with 100% bootstrap support indicated that strain SCT and P. stutzeri strain RCH2 (Chakraborty et al. 2017) formed a monophyletic group or clade that also included, in decreasing order of relatedness, P. stutzeri strain 19SMN4 (Rosselló-Mora et al. 1994) and P. stutzeri strain CCUG 29243 (Brunet-Galmés et al. 2012). These results suggest that SCT belongs to P. stutzeri and is the sister strain to P. stutzeri strain RCH2.
The earliest branching lineage in this tree (Figure 1) was P. balearica, followed by the P. stutzeri clade. The latter contained three subgroups: the first one comprised strains SCT, RCH2, 19SMN4, and CCUG 29243; the second comprised strains SLG510A3-8, DSM 4166, CGMCC 1.1803, and A1501; and the third comprised strains 28a24, 273, and DSM 10701. A previous study revealed that distinct subgroups for the P. stutzeri clade could be accredited to ecotype status resulting from niche-specific adaptations; accordingly, the first subgroup contained marine isolates and the second subgroup contained soil/sludge isolates (Sharma et al. 2015). Given the primary niche of the first subgroup strains (SCT, RCH2, 19SMN4, and CCUG 29243), and a comparison to other strains in the P. stutzeri clade (Table 1), this phylogeny suggests that adaptation to marine/aquifer and soil/rhizosphere environments might have evolved after divergence of these subgroups from a common ancestor.

Genome features
Genome features can reflect not only phylogenetic positions but also lifestyles or ecological niches, as indicated by free-living soil bacteria with large G+C-rich genomes and obligatory intracellular symbionts with small G+C-poor genomes (Dutta and Paul 2012). Table 1 shows the genome features (size, G+C content, and CDS number) of the 13 Pseudomonas strains included in this analysis. Genome size ranged from 4.17 Mb to 5.07 Mb with a median of 4.69 Mb, G+C content ranged from 60.3 to 64.7% with a median of 63.2%, and CDS numbers ranged from 3,931 to 4,714 with a median of 4,353. The genome features of strain SCT were thus in line with those of other Pseudomonas strains. The G+C content of the first subgroup in our phylogenetic tree (Figure 1), which ranged from 62.3 to 62.7%, was lower than that of the second subgroup, which ranged from 63.9 to 64.0%.
In contrast to previous studies showing a positive correlation between genome size and G+C content for sequenced bacterial genomes (McCutcheon et al. 2009; Dutta and Paul 2012), the correlation between genome size and G+C content for the 13 Pseudomonas strains used in this study was weakly negative and not statistically significant (Pearson's product-moment correlation coefficient, r = -0.34 and p-value = 0.25; Spearman's rank correlation coefficient, rho = -0.33 and p-value = 0.27).
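For readers who wish to reproduce this kind of comparison, the two correlation coefficients can be computed as in the following minimal Python sketch. The study itself used R; this sketch uses only the Python standard library, does not compute p-values, and runs on made-up numbers rather than the values in Table 1. All function and variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation coefficient of two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative values only (not the genomes analyzed in this study).
toy_size_mb = [4.2, 4.5, 4.7, 5.0]
toy_gc_percent = [64.5, 63.0, 63.5, 61.0]
r = pearson(toy_size_mb, toy_gc_percent)
rho = spearman(toy_size_mb, toy_gc_percent)
```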

Mobile genetic elements
Horizontal transfer of DNA occurs generally via three different mechanisms: conjugation, transformation, and transduction (Kwong et al. 2000). It is now widely accepted that mobile genetic elements, such as plasmids and phages, contribute to the evolution of bacteria and the spreading of virulence and drug resistance in microbial communities (Frost et al. 2005).
Plasmids can confer resistance to antibiotics and heavy metals on their hosts (Popowska and Krawczyk-Balska 2013). P. stutzeri strains isolated from polluted environments tend to contain plasmids (Ginard et al. 1997) and some have been reported to harbor plasmid-encoded silver (Haefeli et al. 1984) or mercury resistance (Barbieri et al. 1989). P. stutzeri strain RCH2 (Chakraborty et al. 2017) contained three plasmids, pPSEST01, pPSEST02, and pPSEST03 (12,763 bp, 9,865 bp, and 2,804 bp, respectively, in length), and strain 19SMN4 (Rosselló-Mora et al. 1994) contained one plasmid, pLIB119 (107,733 bp in length). The four plasmids were not similar to each other based on all-by-all BLASTN searches with a cutoff E value of 1e-5. Given the presence of plasmids in these two strains and their absence in the other Pseudomonas strains analyzed, our phylogeny (Figure 1) suggests that these plasmids may have been acquired independently by each lineage, along the branch leading to the ancestor of either strain RCH2 or strain 19SMN4.

Gene annotations
The genome of Pseudomonas sp. strain SCT contained 4,520 CDSs, of which 1,089 are currently annotated as unknown functions (i.e., product names of "hypothetical proteins"), 2,709 are annotated by UniProtKB (Boutet et al. 2016), 509 by PFAM (Finn et al. 2014), 310 by CDD (Marchler-Bauer et al. 2011), and 20 by HAMAP (Lima et al. 2009). Of the 4,520 proteins, 4,401 (97.5%) had matches with 4,329 unique records in the UniRef90 database, and 2,543 (56.3%) were assigned to the 1,991 unique KEGG Orthology identifiers. Among the 4,520 CDSs, length in amino acids (Laa) ranged from 30 to 2,842 with a median of 272, and G+C content at the third codon position (GC3) ranged from 33 to 100%, with a median of 79%. G+C content varies more widely at the third codon position than at the first or second positions, which are constrained by protein-coding requirements (Sharp et al. 2005). The corresponding data for the SCT genes are shown in Supplementary Table S1.
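The two per-gene statistics used here, Laa and GC3, can be computed from a coding sequence as in the following Python sketch. It assumes an in-frame CDS, counts every codon supplied (including a terminal stop codon, if present) toward GC3, and excludes a terminal stop codon from Laa; the text does not specify how stop codons were handled, so both choices are assumptions. The function names and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc3_percent(cds):
    """Percentage of G or C at the third position of each codon."""
    seq = cds.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("CDS must be non-empty and a multiple of 3 in length")
    third_positions = seq[2::3]  # every third base, starting at base 3
    gc = third_positions.count("G") + third_positions.count("C")
    return 100.0 * gc / len(third_positions)


def length_in_amino_acids(cds):
    """Number of codons, not counting a terminal stop codon."""
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    codons = len(seq) // 3
    if codons and seq[-3:] in STOP_CODONS:
        codons -= 1
    return codons


# Made-up sequences for illustration.
gc3_high = gc3_percent("ATGGCC")    # third positions are G and C
gc3_low = gc3_percent("ATGAAATAA")  # third positions are G, A, and A
laa = length_in_amino_acids("ATGAAATAA")
```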
The following paragraphs detail the functional annotations assigned to SCT genes by Prokka (gene and product names) as well as the UniRef90 and KEGG databases. The SCT genome contains genes encoding cytochrome c and related proteins such as cytochrome c oxidase subunits. A cluster of genes for type II secretion system proteins M, L, K, J, I, H, G, F, and E (locus_tag range: PSCT_03513 to PSCT_03521) was also identified.

Gene conservation
Conservation of SCT protein genes in the genome of all Pseudomonas strains (Table 1) was determined using the gene screen method with the TBLASTN tool in the LS-BSR pipeline. Of the 4,520 SCT genes, 1,908 (42%) were conserved in all 11 P. stutzeri strains, and 1,318 (29%) were conserved in all 13 Pseudomonas strains examined. Among P. stutzeri strains, more SCT genes were conserved in the three strains belonging to the first subgroup (RCH2, 19SMN4, and CCUG 29243; 3,578 to 3,919 genes) than in the strains belonging to the second subgroup (SLG510A3-8, DSM 4166, CGMCC 1.1803, and A1501; 3,461 to 3,547 genes), or the third subgroup (28a24, 273, and DSM 10701; 2,367 to 2,580 genes). Thus, the conservation of SCT genes in the Pseudomonas strains analyzed reflects their phylogenetic relationships (Figure 1).
A total of 451 genes were present in the SCT genome but absent from the other 10 P. stutzeri genomes analyzed; they are referred to here as the "SCT strain-specific gene set". They included 254 hypothetical proteins as well as clusters of genes corresponding to putative prophages or possible operons (Supplementary Table S1). The SCT strain-specific genes may have been acquired following separation from common ancestors (gained on the branch leading to the SCT ancestor) and may be associated with SCT strain-specific phenotypic properties (e.g., iodate reduction and living in marine environments).
Sequence statistics (e.g., Laa and GC3) were used to compare SCT strain-specific genes and other genes in the SCT genome. The median Laa value for SCT strain-specific genes (183 aa) was smaller than that for the other genes (280 aa), the difference being highly significant according to a Wilcoxon rank sum test (p-value < 2.2e-16). Thus, in general, the SCT strain-specific genes tended to be shorter than the remaining genes in the SCT genome. The median value of GC3 (G+C content at the third codon position) was lower for the SCT strain-specific genes (70%) than for the other genes (79%). Again, the difference was significant according to a Wilcoxon rank sum test (p-value < 2.2e-16). Thus, in general, GC3 tended to be lower for the strain-specific genes than for the remaining genes in the SCT genome. For example, GC3 values for a cluster of five genes (contig accession number: BDJA01000002; locus_tag range: PSCT_01674 to PSCT_01678) were lower (ranging from 33 to 41%) than those for the flanking genes (>50%). BLASTP best hits in the UniRef90 database for the flanking genes were identified as belonging to Pseudomonas taxa, whereas those for the five genes were unknown (PSCT_01676 and PSCT_01678) or did not belong to the Pseudomonas genus; i.e., Halorhodospira halochloris (Class: Gammaproteobacteria; Order: Chromatiales) for PSCT_01674, Inquilinus limosus (Class: Alphaproteobacteria) for PSCT_01675, and Alteromonas confluentis (Class: Gammaproteobacteria; Order: Alteromonadales) for PSCT_01677. Studies have revealed that genes acquired by recent horizontal/lateral transfer often bear unusual nucleotide compositions (Lawrence and Ochman 1997) and are usually rich in A+T (Daubin et al. 2003). Thus, the nucleotide composition of genes, such as total and positional G+C content (GC3), has been used to detect horizontally transferred genes in various complete bacterial genomes (Becq et al. 2010).
Some P. stutzeri strains are competent for natural genetic transformation (Lorenz and Sikorski 2000), a process whereby DNA is taken up from external environments and is heritably integrated into the genome. This ability enables the bacterium to adapt to various conditions and, not surprisingly, P. stutzeri has been found in a wide range of environments (Lalucat et al. 2006). Natural transformation of P. stutzeri requires a competence phase and the formation of functional type IV pili (Meier et al. 2002). Genes for natural transformation (comA, exbB, and pil genes for type IV pili) were found in P. stutzeri strains CCUG 29243 (Brunet-Galmés et al. 2012) and DSM 10701, a model organism for natural transformation (Chakraborty et al. 2017). The SCT genome contains two distinct pil gene clusters for type IV pilus assembly protein. One cluster consists of pilQ-pilP-pilO-pilN-pilM genes (contig accession number: BDJA01000004; locus_tag range: PSCT_02297 to PSCT_02301; KEGG: K02666, K02665, K02664, K02663, and K02662). Another cluster is SCT strain-specific, and consists of pilY1-pilX-pilW-pilV-fimT-fimT genes for "type IV pilus assembly protein" and "type IV fimbrial biogenesis protein FimT" (contig accession number: BDJA01000006; locus_tag range: PSCT_03390 to PSCT_03395; KEGG: K02674, K02673, K02672, K02671, K08084, and K08084). GC3 values for the six SCT strain-specific genes pilY1-pilX-pilW-pilV-fimT-fimT were lower (ranging from 53 to 61%) than those for the flanking genes (>73%). Taxa of the BLASTP best hits in the UniRef90 database for the six genes were unknown (PSCT_03390 and PSCT_03392) or did not belong to P. stutzeri; i.e., Microbulbifer agarilyticus (Class: Gammaproteobacteria; Order: Alteromonadales) for PSCT_03391, Pseudomonas taeanensis for PSCT_03393, Marinimicrobium agarilyticum (Class: Gammaproteobacteria; Order: Alteromonadales) for PSCT_03394, and Thiobacillus (Class: Betaproteobacteria) for PSCT_03395. Evidence suggests that P. aeruginosa minor pilins PilV, PilW, and PilX require PilY1 for inclusion in surface pili and vice versa (Nguyen et al. 2015). A comprehensive list describing conservation of type IV pili accessory and assembly proteins (FimT, FimU, PilV, PilW, PilX, PilY1, PilY2, and PilE) among P. aeruginosa strains was produced by Asikyan et al. (2008). Present results suggest that strain SCT may be competent for natural genetic transformation and has been subjected to horizontal gene transfer events via natural transformation.
Of the 184 proteins encoded on the putative prophage regions in the SCT genome mentioned above, 129 were SCT strain-specific: 3/13 for BDJA01000001, 52/54 for BDJA01000002, 50/50 for BDJA01000003, 8/8 for BDJA01000004, 6/35 for BDJA01000005, and 10/24 for BDJA01000006. The majority (129/184, 70%) of the putative prophage genes are thus SCT strain-specific. It is likely that the genes within each cluster (i.e., each prophage region) were gained together during a single phage transfer event, rather than through several independent events. Our results suggest that phage-mediated gene transfer events have occurred since separation from common ancestors (on the branch leading to the SCT ancestor).
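The per-contig counts given above can be checked with a few lines of Python. The counts are copied from the text; the variable names are hypothetical.

```python
# (strain-specific proteins, total prophage proteins) per contig,
# as reported in the text for the putative prophage regions.
prophage_counts = {
    "BDJA01000001": (3, 13),
    "BDJA01000002": (52, 54),
    "BDJA01000003": (50, 50),
    "BDJA01000004": (8, 8),
    "BDJA01000005": (6, 35),
    "BDJA01000006": (10, 24),
}

specific_total = sum(specific for specific, _ in prophage_counts.values())
protein_total = sum(total for _, total in prophage_counts.values())
percent_specific = 100.0 * specific_total / protein_total

# Per-contig fraction of strain-specific proteins.
fraction_by_contig = {
    contig: specific / total
    for contig, (specific, total) in prophage_counts.items()
}
```

The per-contig fractions show that the regions on BDJA01000002, BDJA01000003, and BDJA01000004 are almost entirely strain-specific, whereas those on BDJA01000001 and BDJA01000005 are mostly shared.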

Conclusion
We report a draft genome assembly for strain SCT, an iodate-reducing bacterium isolated from a marine environment. Phylogenetic analysis indicates that strain SCT belongs to the species Pseudomonas stutzeri and is closely related to marine isolates of P. stutzeri including strain RCH2. The SCT genome contains genes putatively involved in hydrocarbon degradation, nitrogen metabolism, and arsenic resistance and metabolism. Gene conservation analysis identified a set of genes present in the SCT genome but absent from the other P. stutzeri genomes analyzed. This SCT strain-specific gene set included (i) the pil gene cluster encoding minor pilins of the type IV pilus system for natural transformation, (ii) mer gene clusters encoding resistance systems for mercury, (iii) the pst gene cluster encoding Pst systems for uptake of phosphate, and (iv) gene clusters corresponding to putative prophage regions. These results suggest that strain SCT has evolved in marine environments and those polluted by hydrocarbons and heavy metals (e.g., arsenic and mercury) and has been subjected to horizontal gene transfer events via natural transformation and (phage-mediated) transduction. Accordingly, strain SCT has potential in bioremediation of hydrocarbon- and heavy-metal-polluted environments. Finally, bioinformatics analyses of the Pseudomonas sp. strain SCT genome sequence have identified a number of new gene targets, whose function will be revealed by future experimental testing.