Draft genome sequence of Lampropedia cohaerens strain CT6T isolated from arsenic rich microbial mats of a Himalayan hot water spring

Lampropedia cohaerens strain CT6T, a non-motile, aerobic and coccoid strain was isolated from arsenic rich microbial mats (temperature ~45 °C) of a hot water spring located atop the Himalayan ranges at Manikaran, India. The present study reports the first genome sequence of type strain CT6T of genus Lampropedia cohaerens. Sequencing data was generated using the Illumina HiSeq 2000 platform and assembled with ABySS v 1.3.5. The 3,158,922 bp genome was assembled into 41 contigs with a mean GC content of 63.5 % and 2823 coding sequences. Strain CT6T was found to harbour genes involved in both the Entner-Duodoroff pathway and non-phosphorylated ED pathway. Strain CT6T also contained genes responsible for imparting resistance to arsenic, copper, cobalt, zinc, cadmium and magnesium, providing survival advantages at a thermal location. Additionally, the presence of genes associated with biofilm formation, pyrroloquinoline-quinone production, isoquinoline degradation and mineral phosphate solubilisation in the genome demonstrate the diverse genetic potential for survival at stressed niches.


Introduction
The genus Lampropedia, a member of the family Comamonadaceae [1] was established by Schroeter in 1886 [2] with the description of square, tablet forming cells of Lampropedia hyalina. Henceforth, strains of the same species, L. hyalina have been isolated from pond water [2], liquid manure of a dairy farm yard [3], fistulated heifer [4] and activated sludge [5]. L. hyalina was isolated from activated sludge, and was tested for its phosphate removal capabilities and was classified as belonging to the functional group of polyphosphate accumulating microorganisms [5]. Another species, L. cohaerens strain CT6 T [6] was isolated from arsenic rich microbial mats of a Himalayan hot water spring from Manikaran, India as a continuation to our efforts to explore the culturable [7][8][9][10] and unculturable [11] diversity at the Himalayan hot spring to understand the role played by niche specific genetic determinants in shaping the genomes of organisms inhabiting this stressed niche. L. cohaerens, a biofilm forming and arsenic tolerating bacterium [6], showed limited carbohydrate assimilation potential but could utilize some organic acids. Currently, the genus Lampropedia is represented by three species, L. hyalina ATCC 11041 T [12], "L. puyangensis 2-bin T " (not validly published) [13] and L. cohaerens CT6 T [6], leading to the description of the genus being emended [6], however, the genomic potential of this small group remains unresolved. The genome of strain CT6 T , which is the type strain for Lampropedia cohaerens was sequenced in order to supplement the phenotypic taxonomical observations with genetic data and obtain genomic insights into heavy metal resistance and metabolic potential of gene complements of this microbial mat dweller. Here, we describe the summary classification, properties, genome sequencing, assembly and annotation of L. cohaerens CT6 T (DSM 100029 T =KCTC 42939 T ).

Organism information
Classification and features L. cohaerens was characterized by using a polyphasic approach with the integration of genotypic, phenotypic and chemotaxonomic methods [6]. This Gram-stain-negative, aerobic bacterial strain, forms white, smooth colonies with irregular margins on LB agar [6]. Transmission electron microscopy (TEM) revealed coccoid, unflagellated cells approximately 0.62 μm × 0.39 μm in dimension (Fig. 1). Summary characteristics are mentioned in Table 1. The slightly thermophilic and arsenic tolerant L. cohaerens strain CT6 can tolerate temperature in the range 20-55°C and can tolerate arsenic trioxide up to 80 parts per billion [6]. The NaCl tolerance for strain CT6 T was tested as 1-3 % (w/v) and pH range as 6-9. Biofilm formation is observed in LB media, inspiring its etymology. L. cohaerens showed closest phylogenetic similarity to "L. puyangensis 2-bin T " (96.4 %) and L. hyalina ATCC 11041 T (95.4 %) on the basis of 16S rRNA gene sequences. A maximum-likelihood [14] phylogenetic tree based on Jukes-Cantor [15] model using MEGA version 6 [16] constructed with closely related members of family Comamonadaceae on the basis of Blast-n [17] of 16S rRNA gene placed strain CT6 T along with the members of genus Lampropedia with bootstrap [18] confidence value of 98 % (Fig. 2). Positive biochemical tests included the hydrolysis of tween 20, tween 80 and starch and utilization of capric acid, malic acid, citric acid, xanthine and hypoxanthine [6]. Catalase test was positive whereas oxidase test was negative [6]. The most prominent fatty acid methyl esters were C 16:0 , summed feature 8 (C 18:1 ω7c/ C 18:1 ω6c), C 14:0 , C 19:0 ω8c cyclo and summed feature 3 (C 16:1 ω7c/C 16:1 ω6c) [6]. The major polar lipids detected in  , not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [48] strain CT6 T were phosphatidylethanolamine, phosphatidylglycerol and a glycolipid [6]. Strain CT6 T demonstrated the presence of putrescine, 2-hydroxyputrescine and spermidine as the major polyamines and ubiquinone-8 as the major quinone [6].

Genome sequencing information
Genome project history Whole genome sequencing was performed at Beijing Genomics Institute Technology Solutions, Hong Kong, China using the Illumina HiSeq 2000 technology. Sequencing was done using 500 bp and 2 kbp paired end libraries. Raw data was generated within a duration of 3 months. De-novo assembly was performed in-house at the University of Delhi. The draft genome sequence was submitted to NCBI under the accession number LBNQ00000000 (version 1 LBNQ01000000). The sequences were also submitted to IMG-JGI portal under GOLD Analysis Project ID Ga0079366. Sequence project information in compliance with MIGS version 2.0 is given in Table 2.
Growth conditions and genomic DNA preparation

Genome annotation
For initial annotations, sequences were submitted to the NCBI Prokaryotic Genomes Annotation Pipeline. Additionally, the sequences were uploaded on Integrated Microbial Genomes pipeline [22] under the umbrella of Joint Genome Institute [22]. Coding sequence prediction was performed using Prodigal V2.6.2 [23]. rRNA operons were predicted using RNAmmer version 1.2 [24]. tRNAs and tmRNAs were predicted using ARAGORN [25]. Phage Search Tool [26] was used to find phages in the genome. CRISPRs were found online by CRISPR finder online server [27]. For prediction of signal peptides and transmembrane domains, SignalP 4.1 server [28] and TMHMM server v. 2.0 [29] were used respectively. COG category assignment and Pfam domain predictions were done using WebMGA server [30].

Genome properties
The final draft genome consists of 41 contigs with a total of 3,158,922 bp and a G+C mol% of 63.5. A total of 2909 coding sequences were predicted accounting for a coding density of 88.92 %. Out of the total coding sequences, 83.84 % were assigned functions. Protein coding genes were 2823 and comprised 97.04 % of the total; RNA coding genes were 86 in number and 56 tRNAs were detected. Five rRNA operons were predicted with complete 5S-16S-23S rRNA genes (Fig. 3). Three confirmed CRISPRs were detected, one on contig 13 and two on contig 33. Two incomplete phages were also detected having a phage integrase and an attR site for integration. Pfam domains were detected for 2539 genes, 238 genes were found to code for proteins harbouring signal peptides and 665 genes with transmembrane domains ( Table 3). Out of the total genes, 2713 (92.09 %) were assigned to COG categories. COG category assignment placed majority of genes to general function prediction only (10.62 %), amino acid transport and metabolism (10.31 %), inorganic ion transport and metabolism (6.92 %) and energy production and conversion (6.21). 6.24 % genes were placed in the function  [49]. From outside to centre, ring 1 and 2 show protein coding genes on both the forward and reverse strand; ring 3 shows G+C% content plot, and ring 4 shows GC skew unknown category, whereas 7.91 % genes were not placed into the COGs (Table 4).

Insights from the genome sequence
Consistent with the limited metabolic potential of L. cohaerens, the genome sequence was found to lack hexokinase and glucokinase, key enzymes involved in glycolysis. Additionally, the lack of pentose phosphate pathway genes glucose-6-phosphate 1-dehydrogenase and 6-phosphogluconolactonase are responsible for the organism's inability to utilize carbohydrates. However, genes involved in Entner-Doudoroff pathway and nonphosphorylated ED pathways were identified. nED pathway enzyme D-gluconate dehydratase (EC 4.2.1.39) which brings about the conversion of D-gluconate to 2keto-3-deoxy-D-gluconate [31] was identified, along with conventional ED pathway enzyme 2-keto-3-deoxy-6-phosphogluconate aldolase (EC 4.1.2.14) which brings about the conversion of KDPG (generated after the first step in ED pathway) to pyruvate and glyceraldehyde-3phosphate [32]. Although L. cohaerens possesses enzymes involved in both the ED and nED pathway, the link between the two could not be established as the enzyme KDG kinase which brings about the conversion of KDG to KDPG could not be identified. L. cohaerens CT6 T was isolated from hot spring microbial mats, known to be rich in heavy metal sulfides. Microbiota present at hot springs have developed resistance mechanisms to withstand and survive high heavy metal concentrations. Consequently, L. cohaerens demonstrated a repertoire of heavy metal resistance genes.
Among genes imparting resistance against arsenic, arsenate reductase genes arsC (AAV94_10615), arsenic resistance genes arsH (AAV94_10620), arsenic transporter ACR3 (AAV94_10610) and transcriptional regulator arsR (AAV94_10600, AAV94_10605) were found. Two arsenic resistance clusters were found on contig 33 harbouring two copies of arsR, a copy of ACR3, and a copy of arsenate reductase arsC. In one of the clusters, an additional gene arsH, coding for arsenical resistance protein was found. Additionally, a gene arsB coding for arsenic efflux pump protein was identified. Among heavy metals, copper, a trace mineral element is taken up by living cells to get incorporated into a number of enzymes, particularly cytochrome oxidases; however, in excess it becomes toxic to the cells. Copper resistance mechanisms in bacteria involve the cus system, the cue system and the pco system [33]. Excess copper is removed either by efflux of the cations or by periplasmic detoxification. Cue system genes copA, an ion translocating ATPase; cueO  The total is based on either the size of the genome in base pairs or the total number of protein coding genes predicted in the annotated draft genome [34], a perplasmic multicopper oxidase and cueR, a copper response metalloregulatory protein which acts as the regulator of both copA and cueO [35] were identified. Cus system genes cusA and cusB both coding for cation efflux proteins are harboured by L. cohaerens. Additionally, pco system genes, copC and copD were present in its genome. Genes imparting resistance to other heavy metals including cobalt, zinc and cadmium efflux system genes czcA, czcD,czcB and czcC which code for outer membrane transporter efflux proteins were identified. Magnesium and cobalt transport protein encoding genes corA and corC were identified. Transcriptional regulators of merR family were found in six copies. MerR transcriptional factors are known to be regulators of various environmental stimuli, particularly, high concentrations of heavy metals and oxidative stress [36].
The genetics of biofilm formation in bacteria is a complex process and is dependent on the modulation of expression of a number of genes, mainly those involved in adhesion and autoregulation [37]. The PGA operon is comprised of genes coding for the synthesis of a secreted polysaccharide poly-β-1,6-N-acetyl-D-glucosamine responsible for cell-cell and cell-surface adhesion in biofilms. Strain CT6 T demonstrated biofilm formation in vitro, the genes responsible for which were found in its genome. The PGA operon genes pgaA -biofilm secretion outer membrane secretion, pgaBbiofilm PGA synthesis deacetylase and pgaC -biofilm PGA synthesis N-glucosyltransferase were found to be harboured as a single operon in the genome.
Members of the family Comamonadaceae have been shown to possess a mineral phosphate solubilisation phenotype. Genes associated with the MPS phenotype include a glucose dehydrogenase and a pyrroloquinoline-quinone synthase system. PQQ is a cofactor for glucose dehydrogenase. PQQ, a small molecule that serves as a redox cofactor in several enzymes has been found to be produced by Pseudomoas fluorescens, Enterobacter intermedium and many other bacteria. PQQ production has been shown to be involved in plant growth promoting effects in soil dwelling bacteria. Additionally, PQQ production has been associated with higher tolerance to radiation and free oxygen radicals, thus bringing to light its free radical scavenging role in bacteria [38]. PQQ dependent enzymes like GDH play a role in the availability of insoluble phosphates to plants, thus contributing to their mineral phosphate solubilisation phenotype. The MPS phenotype contributes significantly to the mineralization of phosphates, playing a key role in geochemical cycling of the element. Consequently, three copies of PQQ dependent glucose dehydrogenase gene were found. PQQ synthase genes pqqB, D, E were also found. Further, genes coding for isoquinoline 1-oxidoreductase α and β subunit corresponding to the isoquinoline degradation system were found. Isoquinoline 1-oxidoreductase catabolizes the first step in the hydroxylation of isoquinoline, a N-heterocyclic compound which is commonly associated with coal gasification, shale oil, coal tar, crude oil contaminated sites.

Conclusions
The genome of L. cohaerens strain CT6 T , a biofilm forming and arsenic tolerating bacterium was found to harbour the genes necessary for arsenic tolerance and biofilm formation. Genes related with the transport and efflux of copper, cobalt, zinc and cadmium were identified. Limited metabolic potential was attributed to lack of key glycolysis and pentose phosphate pathway genes. A metabolically unique combination of genes involving both ED pathway and the nED pathway was encountered. Phylloquinolinequinone synthetic genes were identified along with PQQ requiring glucose dehydrogenase. This was consistent with the phosphate removal phenotype of Lampropedia from sewage slugde samples [5]. L. cohaerens, which harbours MPS phenotype imparting genes, can be considered to belong to the group of MPS bacteria which are used to enhance the fertility of soil by ensuring availability of trapped phosphates to plants. The presence of isoquinoline degrading genes may be employed for removal of oil contaminations. Further experiments can be performed to link the genetic determinants of L. cohaerens with its actual functional potential. The genetic repertoire of L. cohaerens points towards survival capabilities at diverse stressed niches. The genes harboured by L. cohaerens enable the organism to survive at heavy metal rich microbial mats of hot spring. Biofilm formation may be considered as a niche specialised strategy adapted to survive the hot spring waters forming microbial mats. The diverse survival instincts are reflected in the genome by the presence of genes for a PQQ synthase system and PQQ-dependent glucose dehydrogenases. Isoquinoline degradation genes provide a supplemental benefit for survival at oil contaminated sites. Further, the presence of isoquinolinedegradation genes makes L. cohaerens a potential candidate for bioremediation of oil contaminated sites.