Janthinobacterium CG23_2: Comparative Genome Analysis Reveals Enhanced Environmental Sensing and Transcriptional Regulation for Adaptation to Life in an Antarctic Supraglacial Stream

As many bacteria detected in Antarctic environments are neither true psychrophiles nor endemic species, their proliferation despite environmental extremes points to genome adaptations. Janthinobacterium sp. CG23_2 is a bacterial isolate from the Cotton Glacier stream, Antarctica. To understand how Janthinobacterium sp. CG23_2 has adapted to its environment, we compared its genomic traits with those of 35 published Janthinobacterium species/strains. While we hypothesized that genome shrinkage and specialization to narrow ecological niches would be energetically favorable for dwelling in an ephemeral Antarctic stream, the genome of Janthinobacterium sp. CG23_2 was on average 1.7 ± 0.6 Mb larger and encoded 1411 ± 499 more predicted coding sequences than the other Janthinobacterium spp. Putatively identified horizontal gene transfer events contributed 0.92 Mb to the genome size expansion of Janthinobacterium sp. CG23_2. Genes with high copy numbers in the species-specific accessory genome of Janthinobacterium sp. CG23_2 were associated with environmental sensing, locomotion, response and transcriptional regulation, stress response, and mobile elements, functional categories that also showed molecular adaptation to cold. Our data suggest that genome plasticity and the abundance of complementary genes for sensing and responding to the extracellular environment supported the adaptation of Janthinobacterium sp. CG23_2 to this extreme environment.


Introduction
Environments with temperatures permanently below 5 °C make up more than 80% of the Earth's biosphere. While much of the cold biosphere consists of the world's oceans, permafrost regions, 198,000 glaciers, and two ice sheets together cover 37% of the land area [1][2][3]. Adaptation to cold temperatures and associated stresses (i.e., desiccation, water activity, radiation, pH, ionic strength, and nutrient availability) [4] allows psychrophilic/psychrotolerant microorganisms to inhabit and even thrive in these extreme and often inhospitable environments. Integrating 'omics' technologies with physiological studies on cold adaptation has advanced our understanding of functional and evolutionary biological processes at the molecular level [5][6][7][8]. Specifically, genome comparisons are a …
The genomes of 35 Janthinobacterium species/strains publicly available at the National Center for Biotechnology Information (NCBI) were compared with the Janthinobacterium sp. CG23_2 genome (Table 1). Because of their low percentage match to the other Janthinobacterium spp. (Table S1), the genomes of Janthinobacterium spp. B9-8, Marseille, HH01, CG3, and NBRC.102515 were excluded from the core gene analysis. Core genes were identified using the algorithm described by Cleary et al. [32], which constructs a compressed de Bruijn graph (CDBG) of a genome population and identifies core genes from the frequently visited regions of the graph. The coding sequences (CDSs) of the Janthinobacterium sp. CG23_2 genome were then mapped to the core genes using GMAP v1.8 [33] with default settings, except that the "maximum number of paths to show" option was set to 1. GMAP alignments were filtered for ≥90% query coverage and ≥80% identity. The same GMAP parameters were used for pairwise comparisons of the CDSs identified in the 36 Janthinobacterium spp. genomes. A Janthinobacterium spp. genome tree was inferred using CheckM [34].
CheckM identifies, annotates, concatenates, and aligns 43 marker genes, which are then placed onto a reference genome tree with the integrated pplacer software package [35]. The Maximum Likelihood tree was computed in MEGA7 [36] and visualized using iTOL version 4.3 [37]. Whole-genome in silico DNA-DNA hybridization (DDH) and average nucleotide identity (ANI) values between Janthinobacterium sp. CG23_2 and the other Janthinobacterium spp. (Table 1) were determined with the genome-to-genome distance calculator GGDC 2.1 (default settings, BLAST+ alignment tool) [38] and the Average Nucleotide Identity calculator [39,40], respectively. As recommended by the developers [38], DDH estimates from Formula 2 were used, because they are more robust to the use of incomplete draft genomes.
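The two genome-relatedness measures can be illustrated with a short Python sketch. It assumes the fragment-based ANIb scheme as it is commonly described (the query genome is cut into 1020-nt fragments, and a fragment's best hit counts if its identity exceeds 30% and more than 70% of the fragment aligns) and takes GGDC Formula 2 as one minus the summed identities divided by the summed HSP lengths. The fitted model that GGDC uses to convert this distance into a DDH estimate is not reproduced, and the ANI calculator cited above may use a different (e.g., reciprocal) scheme. All function and class names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


def split_into_fragments(genome: str, size: int = 1020) -> list[str]:
    """Cut a query genome into consecutive fragments, dropping the short tail."""
    return [genome[i:i + size] for i in range(0, len(genome) - size + 1, size)]


@dataclass
class FragmentHit:
    identity: float       # percent identity of the best BLASTN hit of one fragment
    aligned_length: int   # aligned nucleotides of that hit
    fragment_length: int  # length of the query fragment


def anib(hits: list[FragmentHit], min_identity: float = 30.0,
         min_aligned: float = 0.70) -> float:
    """Mean identity over fragments whose best hit passes both filters (assumed ANIb rule)."""
    kept = [h.identity for h in hits
            if h.identity > min_identity
            and h.aligned_length / h.fragment_length > min_aligned]
    if not kept:
        raise ValueError("no fragment hit passed the filters")
    return mean(kept)


@dataclass
class HSP:
    length: int      # alignment length of one high-scoring segment pair
    identities: int  # identical positions within that HSP


def ggdc_formula2_distance(hsps: list[HSP]) -> float:
    """Formula 2 (assumed form): 1 - sum(identities) / sum(HSP lengths).

    Only aligned regions enter the ratio, so sequence missing from a draft
    genome does not inflate the distance.
    """
    total = sum(h.length for h in hsps)
    if total == 0:
        raise ValueError("no HSPs")
    return 1 - sum(h.identities for h in hsps) / total


def same_species(ddh_percent: float, ani_percent: float) -> bool:
    """Species boundaries quoted in the text: DDH >= 70% and ANI >= 95%."""
    return ddh_percent >= 70.0 and ani_percent >= 95.0
```

With the values reported for Janthinobacterium sp. CG23_2, `same_species(21.5, 79.3)` returns `False`.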

Molecular Analysis of Cold Adaptation
Cold adaptation of proteins from Janthinobacterium sp. CG23_2 was identified using a publicly available Python script [6]. Two custom databases were generated from Janthinobacterium spp. isolated from mesophilic (31 annotated genomes) and polar/glacial environments (four annotated genomes) (Table 1). The annotated genome of Janthinobacterium sp. CG23_2 was compared with these databases using BLASTP [41] with an E-value cutoff of ≤10⁻¹⁵. Cold-adaptation scores were assigned to each protein based on the following indices: arginine to lysine (R/K) ratio, frequency of acidic residues, frequency of proline residues, aromaticity, aliphacity, and grand average of hydropathicity (GRAVY) [6]. An index was taken to indicate cold adaptation if it changed significantly in the expected direction, i.e., fewer proline and acidic residues, or a lower R/K ratio, aliphacity, aromaticity, or GRAVY. Proteins with ≥3 such indices were classified as cold-adapted [6]. Clusters of orthologous groups (COGs) were annotated with WebMGA using default settings [42], and the identified COGs were mapped against the updated COG database [43].
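The per-protein indices and the ≥3-index rule can be sketched in Python as follows. This is not the published script [6]: the Kyte-Doolittle scale for GRAVY, Ikai's aliphatic index for aliphacity, F/W/Y for aromaticity, the floor on the lysine count, and the one-sided z-score test against homolog values (with a hypothetical threshold `z_crit`) are assumptions standing in for the script's own definitions and statistics.

```python
from statistics import mean, stdev

# Kyte-Doolittle hydropathy scale, used here for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

# For every index described in the text, the cold-adapted direction is "lower".
INDEX_NAMES = ("rk_ratio", "acidic", "proline", "aromaticity", "aliphatic_index", "gravy")


def cold_indices(seq: str) -> dict[str, float]:
    """Compute the six sequence indices for one protein."""
    seq = "".join(aa for aa in seq.upper() if aa in KYTE_DOOLITTLE)  # drop X, *, gaps
    n = len(seq)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")
    c = seq.count
    return {
        "rk_ratio": c("R") / max(c("K"), 1),  # assumption: lysine count floored at 1
        "acidic": (c("D") + c("E")) / n,
        "proline": c("P") / n,
        "aromaticity": (c("F") + c("W") + c("Y")) / n,
        # Aliphatic index: mole% Ala + 2.9 * mole% Val + 3.9 * mole% (Ile + Leu).
        "aliphatic_index": 100 * (c("A") + 2.9 * c("V") + 3.9 * (c("I") + c("L"))) / n,
        "gravy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
    }


def count_cold_indices(query: dict[str, float], homologs: list[dict[str, float]],
                       z_crit: float = 1.96) -> int:
    """Count indices for which the query lies significantly below its homologs."""
    hits = 0
    for name in INDEX_NAMES:
        values = [h[name] for h in homologs]
        if len(values) < 2:
            continue  # no spread can be estimated from fewer than two homologs
        sd = stdev(values)
        if sd == 0:
            continue
        if (query[name] - mean(values)) / sd <= -z_crit:
            hits += 1
    return hits


def is_cold_adapted(query: dict[str, float], homologs: list[dict[str, float]],
                    min_indices: int = 3, z_crit: float = 1.96) -> bool:
    """Apply the >=3-index rule described in the text."""
    return count_cold_indices(query, homologs, z_crit) >= min_indices
```

In this sketch, `homologs` would hold the index values of the BLASTP hits of one protein in either the mesophilic or the polar/glacial database.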

Horizontal Gene Transfer (HGT)
HGTector [44] was used to identify putatively horizontally transferred genes. All-against-all BLASTP [41] was performed against the NCBI non-redundant protein sequence database with an E-value cutoff of ≤10⁻¹⁰, ≥30% identity, and ≥70% sequence coverage. The protein sequence database, taxonomy database, and a protein-to-taxonomy dictionary were downloaded from the NCBI website on August 15, 2018. Up to 100 non-redundant hits per protein were retained. Hits to more than one organism under the same species were excluded. Janthinobacterium was defined as the self group (NCBI taxonomic ID: 29580) and Burkholderiales as the close group (NCBI taxonomic ID: 1224); the distal group comprised all other organisms. Cutoffs of 7.04, 5.55, and 0.79 for the self, close, and distal groups, respectively, were computed by kernel density estimation. The self-group cutoff was included to predict putatively HGT-derived genes acquired by specific organisms within the self group. COGs of the predicted HGT-derived genes were annotated with WebMGA using default settings [42], and the identified COGs were mapped against the updated COG database [43]. Because of their phylogenetic distance and lack of gene similarity to the other Janthinobacterium spp. (0.0-3.8%; Table S1), Janthinobacterium spp. Marseille and B9-8 were excluded from this analysis.
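How kernel density estimation can yield a weight cutoff, and how close and distal cutoffs can then flag candidate genes, is sketched below in Python. The Gaussian kernel with Silverman's bandwidth, taking the cutoff at the density trough nearest the main peak, and flagging a gene when its close weight falls below and its distal weight exceeds the respective cutoff are all assumptions; HGTector's own implementation may differ in each of these details, and the self-group rule is not modeled. The default cutoffs are the close and distal values reported above; all function names are hypothetical.

```python
import math
from statistics import stdev
from typing import Optional


def gaussian_kde(values: list[float], grid: list[float]) -> list[float]:
    """Gaussian kernel density estimate evaluated at each grid point."""
    n = len(values)
    bw = 1.06 * stdev(values) * n ** (-1 / 5)  # Silverman's rule of thumb
    norm = 1.0 / (n * bw * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - v) / bw) ** 2) for v in values)
            for x in grid]


def trough_cutoff(values: list[float], side: str, points: int = 512) -> Optional[float]:
    """Weight at the density minimum nearest the main peak on the given side.

    Use side="left" when candidate genes form a low-weight tail (close group)
    and side="right" when they form a high-weight tail (distal group).
    """
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    dens = gaussian_kde(values, grid)
    peak = max(range(points), key=dens.__getitem__)
    minima = [i for i in range(1, points - 1) if dens[i - 1] > dens[i] <= dens[i + 1]]
    if side == "left":
        candidates = [i for i in minima if i < peak]
        return grid[max(candidates)] if candidates else None
    if side == "right":
        candidates = [i for i in minima if i > peak]
        return grid[min(candidates)] if candidates else None
    raise ValueError("side must be 'left' or 'right'")


def predict_hgt(close_w: float, distal_w: float,
                close_cut: float = 5.55, distal_cut: float = 0.79) -> bool:
    """Flag a gene whose close weight is low and whose distal weight is high."""
    return close_w < close_cut and distal_w > distal_cut
```

A trough-based cutoff adapts to each weight distribution instead of relying on a fixed global threshold, which is one plausible reason for deriving separate values for the self, close, and distal groups.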

Genome Statistics of Janthinobacterium spp.
Genome statistics for all 36 Janthinobacterium spp. are summarized in Table 1. Genome sizes (4.11-7.85 Mb) and numbers of CDSs (3870-6859) varied across the analyzed species/strains. All species/strains had a high GC content (60.5-65.5 mol%), except for Janthinobacterium spp. Marseille (54.25 mol%) and B9-8 (48.7 mol%). Janthinobacterium sp. CG23_2 had the largest genome (7.85 Mb); on average, its genome was 1.7 ± 0.6 Mb larger and encoded 1411 ± 499 more predicted CDSs than those of the other Janthinobacterium spp. (Table 1). The DDH and ANI values calculated for Janthinobacterium sp. CG23_2 were 21.5 ± 1.8% and 79.3 ± 0.4%, respectively (Table S2), and thus below the threshold boundaries for members of the same species (DDH: 70%; ANI: 95%). Phylogenetic reconstruction of the 36 Janthinobacterium spp. using 43 marker genes identified by CheckM placed Janthinobacterium sp. CG23_2 near the root of the tree (Figure 1). Janthinobacterium spp. isolated in close proximity to one another (i.e., Janthinobacterium spp. CG23_2/CG3; 551a/334; GW spp.; and HH100-spp.) were more closely related. Janthinobacterium spp. Marseille and B9-8, the two species with the smallest genome sizes and considerably lower GC contents, were the most distantly related to all other members of the genus Janthinobacterium investigated (Figure 1 and Figure S1).
Collectively, Janthinobacterium sp. CG23_2 had only 1282 CDSs (18.7%) in common with the other 35 Janthinobacterium spp., as determined by pairwise comparison. The overlap between Janthinobacterium sp. CG23_2 and each of the 35 Janthinobacterium spp. genomes was low, ranging from 7.2% to 12.1% (Figure 2). Likewise, Janthinobacterium spp. B9-8, Marseille, HH01, CG3, and NBRC.102515 had little genome overlap with the other Janthinobacterium spp. (0-25% on average; Table S1). These six Janthinobacterium spp. (CG23_2 and the five strains above), each with a low percentage match (<25%) to the other Janthinobacterium spp., were excluded from the core gene analysis. For the remaining 30 Janthinobacterium spp., 3260 core CDSs were identified. Only 164 CDSs were shared between Janthinobacterium sp. CG23_2 and the core genes, indicating the prevalence of the species-specific accessory genome of Janthinobacterium sp. CG23_2. Core genes were 12- and 4-fold enriched in COG categories J and C, respectively, relative to the genes specific to Janthinobacterium sp. CG23_2 (Figure 3). Conversely, the species-specific accessory genome of Janthinobacterium sp. CG23_2 was 13-fold enriched in COG category T, and 4- to 5-fold enriched in categories V, M, G, and Q, relative to the core genes (Figure 3). COG categories N, W, and X were identified only in the species-specific accessory genome of Janthinobacterium sp. CG23_2.
Relative abundance of COG functional categories among annotated protein coding genes, normalized to the total number of protein coding genes, for the core genome, for genes shared between Janthinobacterium sp. CG23_2 and n ≥ 2 species/strains, and for genes specific to Janthinobacterium sp. CG23_2. Protein coding genes lacking specific functional assignments were excluded. COGs: clusters of orthologous groups.
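The normalization and fold-enrichment comparison summarized in Figure 3 can be sketched in Python as below. The sketch assumes that each gene carries a string of one or more COG category letters (or `None` when unassigned), that multi-letter assignments count toward every letter, and that fractions are normalized to the number of assigned genes in each set; the function names and the toy gene sets are hypothetical and are not the study's data.

```python
from collections import Counter
from typing import Optional


def category_fractions(gene_cogs: list[Optional[str]]) -> dict[str, float]:
    """Fraction of assigned genes falling into each COG category letter."""
    assigned = [c for c in gene_cogs if c]  # drop genes lacking a functional assignment
    if not assigned:
        return {}
    counts = Counter(letter for c in assigned for letter in c)
    return {cat: n / len(assigned) for cat, n in counts.items()}


def fold_enrichment(target: list[Optional[str]], reference: list[Optional[str]],
                    category: str) -> float:
    """Fraction of `category` in the target set divided by its fraction in the reference."""
    t = category_fractions(target).get(category, 0.0)
    r = category_fractions(reference).get(category, 0.0)
    if r == 0:
        # A category absent from the reference (e.g. found only in one set).
        return float("inf") if t > 0 else float("nan")
    return t / r


if __name__ == "__main__":
    # Invented toy gene sets, for illustration only.
    core = ["J"] * 6 + ["C"] * 2 + ["T"] * 2
    specific = ["J"] + ["T"] * 6 + ["C"] + ["N"] * 2 + [None] * 3
    print("J, core vs specific:", fold_enrichment(core, specific, "J"))
    print("T, specific vs core:", fold_enrichment(specific, core, "T"))
    print("N, specific vs core:", fold_enrichment(specific, core, "N"))
```

Returning infinity for a category missing from the reference mirrors the situation of categories N, W, and X, which occur only in the species-specific accessory genome.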
The species-specific accessory genome of Janthinobacterium sp. CG23_2 contained 81% of all CDSs, corresponding to 5144 protein sequences that clustered into 3793 orthologous groups. COGs with copy numbers n ≥ 10 are summarized in Figure 4. The largest group of genes in these COGs was associated with signal transduction histidine kinases (n = 123).
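Selecting the high-copy COGs shown in Figure 4 amounts to counting how many proteins are assigned to each COG and keeping those with n ≥ 10. A minimal Python sketch, assuming one COG identifier per protein (or `None` when unassigned); the function name and the placeholder identifiers are hypothetical.

```python
from collections import Counter
from typing import Optional


def high_copy_cogs(cog_ids: list[Optional[str]], min_copies: int = 10) -> list[tuple[str, int]]:
    """COGs with at least `min_copies` members, most abundant first (ties by ID)."""
    counts = Counter(c for c in cog_ids if c)  # ignore proteins without a COG
    kept = [(cog, n) for cog, n in counts.items() if n >= min_copies]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

For example, twelve proteins in a placeholder `COG_a`, ten in `COG_b`, and three in `COG_c` would yield `[("COG_a", 12), ("COG_b", 10)]`.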

Genome-Wide Molecular Cold-Adaptation of Janthinobacterium sp. CG23_2
Cold adaptation across the entire Janthinobacterium sp. CG23_2 genome was inferred from amino acid substitution patterns, using the protein coding sequences of the other 35 Janthinobacterium spp. as comparative databases. Across the Janthinobacterium sp. CG23_2 genome, 27% (n = 1760) and 9% (n = 577) of the protein coding sequences indicated cold adaptation when compared with the mesophilic (i.e., 31 Janthinobacterium spp.) and polar/glacial (i.e., four Janthinobacterium spp.) databases, respectively. The number of protein coding sequences classified as neutral also differed notably between the two comparisons: 2.3 times more protein coding sequences (n = 1687) showed no significant change in amino acid content when Janthinobacterium sp. CG23_2 was compared with Janthinobacterium spp. from other polar/glacial environments than when it was compared with the database built from Janthinobacterium spp. isolated from mesophilic environments. Overall, Janthinobacterium sp. CG23_2 had significantly more proteins with lower aliphacity, R/K (arginine/lysine) ratios, and aromaticity than their counterparts from mesophilic environments (Bonferroni-
Figure 4. Janthinobacterium sp. CG23_2 species-specific clusters of orthologous groups (COGs). Only COGs with copy numbers n ≥ 10 are shown.
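The surviving text mentions a Bonferroni correction but not which tests it was applied to. The standard adjustment multiplies each p-value by the number of tests m and caps the result at 1, which is equivalent to comparing raw p-values with α/m. A minimal Python sketch; the function name and the α = 0.05 default are assumptions.

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> tuple[list[float], list[bool]]:
    """Return Bonferroni-adjusted p-values and which tests remain significant."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]  # p * m, capped at 1
    return adjusted, [p <= alpha for p in adjusted]
```

With three tests, a raw p-value of 0.02 becomes 0.06 after adjustment and no longer passes α = 0.05, which illustrates how conservative the correction is when several indices are tested at once.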

Discussion
Unlike core genomes, which may consist of conserved genes essential to the lifestyle of specific taxonomic groups, the accessory genome is more likely to be subject to genome evolution and to provide selective advantages under specific environmental conditions [45]. Only 18.7% of all CDSs identified in the Janthinobacterium sp. CG23_2 genome matched protein sequences in one or more of the other 35 Janthinobacterium spp. genomes. Further, only 164 CDSs were identified in both the Janthinobacterium sp. CG23_2 genome and the core gene set of 30 Janthinobacterium spp.; these results underscore the importance of the species-specific accessory genome and the genomic variability of Janthinobacterium sp. CG23_2. Although gene duplication was not assessed, putatively identified HGT events alone account for 0.92 Mb of the Janthinobacterium sp. CG23_2 genome. HGT events were mainly associated with signal transduction histidine kinases, second messengers, response regulators, and functions linked to defense/stress mechanisms (Table 2), all of which are advantageous traits for survival and adaptation in extreme environments (discussed below). HGT is made possible primarily by the mobilome, including transposons and bacteriophages. Notably, 27 COGs denoting transposases occurred in the species-specific accessory genome of Janthinobacterium sp. CG23_2; transposases catalyze the rearrangement or transfer of mobile genetic elements (i.e., transposons) within or between cells [46]. In a meta-analysis of 384 bacterial genomes, Newton and Bordenstein [47] determined that up to ~6% of a bacterial genome can consist of bacteriophage genes, and they found that larger genomes tend to contain more bacteriophage genes. Although the Janthinobacterium sp. CG23_2 genome is by far the largest of the 36 Janthinobacterium genomes investigated, bacteriophage genes account for only 0.6% of its gene content. Smith et al. [48] reported virus to bacterium ratios of 0.12 to 0.44 for the Cotton Glacier stream, 10- to 1000-fold lower than in other polar inland waters [49]. Such low viral abundance in the Cotton Glacier stream may have limited the integration of phage genes into the chromosome of Janthinobacterium sp. CG23_2.
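The text does not state how the 0.92 Mb contribution of HGT-derived genes was computed. One straightforward way, assumed here, is to sum the genomic coordinates of the flagged CDSs, counting overlapping genes only once, and divide by the genome length. The Python sketch below uses half-open `[start, end)` intervals; the function names are hypothetical. The 0.6% bacteriophage figure, by contrast, is a share of gene content and would be a simple count ratio.

```python
from typing import Optional


def merged_length(intervals: list[tuple[int, int]]) -> int:
    """Total bp covered by half-open [start, end) intervals, counting overlaps once."""
    total = 0
    cur_start: Optional[int] = None
    cur_end = 0
    for start, end in sorted(intervals):
        if cur_start is None or start > cur_end:
            if cur_start is not None:
                total += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the block
    if cur_start is not None:
        total += cur_end - cur_start
    return total


def genome_fraction(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Share of the genome covered by the given gene intervals."""
    return merged_length(intervals) / genome_length
```

Merging overlaps matters because neighboring CDSs in bacterial genomes can overlap by a few nucleotides; summing raw gene lengths would slightly overstate the contribution.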
The species-specific accessory genome of Janthinobacterium sp. CG23_2 is dominated by functions associated with environmental signaling and transcriptional regulation (Figure 4). While both functions are predominant in the core genome of the genus Janthinobacterium [22], their enrichment in the species-specific accessory genome of Janthinobacterium sp. CG23_2 underscores their role in adaptation to life in an ephemeral supraglacial Antarctic stream. The importance of environmental sensing and the control of gene expression was also reflected in the cold-adaptation patterns of protein coding sequences (Table S3). Of particular relevance were gene categories related to signal transduction histidine kinases and response regulators containing CheY-like receiver domains. Histidine kinases and response regulators are the building blocks of two-component signal transduction systems, which enable an adaptive response to environmental stimuli (e.g., changes in pH and osmolarity, thermal and oxidative stress, light, nutrients and metal ions, and antimicrobials), mainly by modulating gene expression [50]. Histidine kinases also play a central role in signal integration in the bacterial chemotaxis pathway, in which the autophosphorylated histidine kinase transfers its phosphoryl group to CheY, forming CheY-P [51]. The diffusible response regulator CheY-P then interacts with the flagellar motor and reverses the direction of flagellar rotation [51]. In line with these findings, the species-specific accessory genome of Janthinobacterium sp. CG23_2 contains chemotaxis genes for sensing environmental cues and moving towards conditions that favor survival. Methyl-accepting chemotaxis proteins were the predominant chemoreceptors; these proteins have been implicated in biofilm formation and exopolysaccharide production, flagellum biosynthesis, degradation of xenobiotic compounds, and toxin production [52].
In addition to chemotaxis proteins, the presence of c-di-GMP phosphodiesterases and c-di-GMP synthetases suggests possible reciprocal interactions between different chemosensory systems. The second messenger c-di-GMP inhibits the methylation of methyl-accepting chemotaxis proteins; ultimately, this modulation affects the phosphorylation of CheY-like proteins and chemotactic responses [53]. As such, the c-di-GMP signaling system regulates the transition between motile and sessile states [54], lifestyle switches that enhance adaptation to environmental fluctuations [55]. The species-specific accessory genome of Janthinobacterium sp. CG23_2 contains gene categories associated with flagellar biosynthesis and with basal body, hook, and motor proteins (n = 80), as well as pilus assembly proteins (n = 32). Whereas pili aid the adhesion of bacterial cells to surfaces, flagella enable chemotaxis-guided active locomotion. Temperature, osmolarity, pH, and nutrient concentration can trigger expression of the flagellar master operon, which facilitates switching between motile and sessile states [56]. Because flagella are also involved in detecting wetness [57], they contribute to sensing the environmental conditions crucial for successful propagation in a supraglacial stream.
The genome composition of Janthinobacterium sp. CG23_2 points to low temperature and oxidative stress as major environmental challenges in a supraglacial stream. Overall, 1760 and 577 protein coding sequences in the Janthinobacterium sp. CG23_2 genome were predicted to be cold-adapted relative to Janthinobacterium spp. isolated from mesophilic and polar/glacial habitats, respectively. Both the elevated UV radiation above the Antarctic Ice Sheet and low temperatures can lead to the formation of reactive oxygen species, which pose a lethal threat to bacterial cells. Genes protecting against oxidative damage in the Janthinobacterium sp. CG23_2 genome included catalases, hydroperoxide reductases, peroxiredoxins, cytochrome c peroxidases, glutathione peroxidases, glutaredoxins, and thioredoxin reductases, many of which were cold-adapted (Table S3). The species-specific accessory genome of Janthinobacterium sp. CG23_2 also contains 15 copies of cold-adapted glutathione S-transferases. Bacterial glutathione transferases not only protect against oxidative stress but also play a key role in cellular detoxification, including the biodegradation of xenobiotics and antimicrobial drug resistance [58]. Further, choline dehydrogenases (n = 7) were found in the species-specific accessory genome of Janthinobacterium sp. CG23_2; these enzymes catalyze the first of the two oxidation steps in the synthesis of glycine betaine, converting choline to betaine aldehyde [59]. Glycine betaine is a known cryoprotectant and osmolyte and is believed to prevent cold-induced protein aggregation and to maintain membrane fluidity [59,60]. In addition to genes for coping with environmental stresses, the species-specific accessory genome of Janthinobacterium sp. CG23_2 includes Rhs protein families (n = 35). Rhs proteins are part of a contact-dependent growth inhibition system.
Intercellular competition is mediated by injecting toxins that inhibit the growth of neighboring cells [61], thereby providing a competitive advantage in a low-nutrient environment such as the Cotton Glacier stream [29].
For bacteria to sense and adapt to their ever-changing environment, modifications in signaling and gene regulation pathways are essential [62]. By implication, genotypic selection would depend on the complexity of the environment. The temporal heterogeneity of the supraglacial Cotton Glacier stream, Antarctica, poses challenges for its microbial inhabitants on the time scales of both a single generation and multiple generations. A major survival advantage in such a fluctuating environment would be the ability to anticipate environmental change [63,64]. Mitchell et al. [65], for instance, showed that heat shock as an early stimulus gave certain bacterial or yeast cells protection against subsequent stresses (e.g., oxidative stress, oxygen depletion). Similarly, a specific response to one stress can increase resistance to another [66]. Although experimental evidence for stress anticipation or physiological cross-protection against secondary stresses was beyond the scope of the present study, the genome of Janthinobacterium sp. CG23_2 is well equipped with genes for environmental sensing, transcriptional regulation, and stress response.
In the context of the Black Queen Hypothesis, cells can evolve in one of two ways: by losing gene functions and depending on other members of the community, or by retaining large genomes that express many genes not essential to central metabolism, growth, and reproduction [30]. Although the latter would seem energetically unfavorable in an extreme environment such as the Cotton Glacier stream, Janthinobacterium sp. CG23_2 has evolved through genome plasticity (i.e., horizontal gene transfer and transposase activity), features that have been suggested to enable adaptation to life in cold environments [67,68]. Whether this gene acquisition represents a common trend in adaptation to the Cotton Glacier stream environment, or whether Janthinobacterium sp. CG23_2 has acquired a key status as a function-performing helper within the microbial community in the sense of the Black Queen Hypothesis, invites further studies on the network of interactions between co-occurring organisms and their genome evolution. Both the whole-genome in silico DDH (21.5 ± 1.8%) and ANI (79.3 ± 0.4%) values fell well below the cut-off values for species boundaries [69]. These results are consistent with the distant branching of Janthinobacterium sp. CG23_2 within the Maximum Likelihood tree (Figure 1). Based on these molecular and phylogenetic indicators, Janthinobacterium sp. CG23_2 appears sufficiently different to constitute a separate species, and the name Janthinobacterium cottonii is proposed. Additional taxonomic studies will help to place Janthinobacterium cottonii within the genus Janthinobacterium.
In conclusion, comparative sequence analysis of 36 Janthinobacterium spp. genomes revealed a high degree of genomic divergence of Janthinobacterium sp. CG23_2. We initially hypothesized that ecological niche specialization would result in genome streamlining; however, the genome of Janthinobacterium sp. CG23_2 is considerably larger than those of the other Janthinobacterium spp. and has distinct accessory genome features (i.e., environmental sensing, locomotion, response and transcriptional regulation, stress response, and mobile elements) that appear well suited to proliferation under the ephemeral and extreme conditions of the Antarctic stream. The results highlight how the genome plasticity of closely related organisms can support the adaptation of individual species to specific environmental niches.