The large genome of Synechococcus moorigangaii CMS01 isolated from a mangrove ecosystem: evidence of motility and adaptive features

The whole genome of Synechococcus moorigangaii CMS01, isolated from the Indian Sundarbans mangroves of the Bay of Bengal, is about 5.5 Mbp in size, with approximately 0.5 Mbp of plasmid sequence. Genome annotation revealed a total of 5806 genes, of which 5701 were CDSs; 5616 of these were protein-coding. Along with genes coding for essential metabolic proteins, transport proteins and other cellular apparatus, the genome also codes for proteins involved in flagella and pilus formation, which have not been widely reported before in any coastal species of Synechococcus. The genome contains one incomplete prophage sequence. The genome analysis revealed adaptive features of S. moorigangaii CMS01 that are consistent with its ubiquitous distribution in the coastal waters of the Bay of Bengal.


Introduction
The picocyanobacterial taxon Synechococcus, represented by numerous species, is unicellular, coccoid or rod-shaped, and ubiquitously distributed in marine ecosystems [1]. Synechococcus cells (<3 μm in diameter) have the ability to take up nutrients at sub-micromolar concentrations [2]. Members of this genus can grow across a range of light intensities and spectral qualities [3] and can utilize a variety of nitrogen sources for growth [4]. Synechococcus is a polyphyletic group characterized by genes acquired through horizontal gene transfer (HGT). HGT plays a crucial role in the evolution of cyanobacterial genomes [5], and horizontally acquired regions can be identified as genomic islands within the genome. Genomic islands are thought to be acquired during infection by cyanophages [6]. Genes acquired by HGT can code for proteins involved in photosynthesis, in the metabolism of key elements such as carbon and phosphorus, and in other stress responses [7].
We describe the genome sequence of a previously described new species of Synechococcus, Synechococcus moorigangaii CMS01 [8], and highlight some adaptive features that have resulted in its ubiquity in the coastal waters of the Bay of Bengal.

Methods
Isolation and genomic DNA extraction
Synechococcus moorigangaii CMS01 was isolated and identified previously [8]. The culture was grown for 15 days under continuous light. Genomic DNA (gDNA) was extracted using a modified published protocol [9].
Whole-genome sequencing
The genome sequencing library was generated using the Illumina-compatible SureSelect QXT whole genome preparation kit (Agilent, USA), followed by amplification and sequencing on the Illumina MiSeq platform. Sequence data quality was checked using FastQC, and adapters were trimmed using Cutadapt [10]. Quality-checked paired-end reads were assembled into contigs using Unicycler [11].
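As an illustration of how a set of assembled contigs can be summarised, the following minimal Python sketch parses FASTA-formatted text and reports the number of contigs, the total number of bases and the GC percentage. The function name `assembly_summary` and the toy sequences are hypothetical and are not part of the published workflow; the tools named above produce their own reports.

```python
from typing import Dict, Iterable


def assembly_summary(fasta_lines: Iterable[str]) -> Dict[str, float]:
    """Return contig count, total length and GC percentage for FASTA text."""
    contigs = 0
    total = 0
    gc = 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            contigs += 1  # each header line starts a new contig
            continue
        seq = line.upper()
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / total if total else 0.0
    return {
        "contigs": contigs,
        "total_bases": total,
        "gc_percent": round(gc_percent, 2),
    }


# Toy input (hypothetical sequences, not real CMS01 contigs).
toy_fasta = """>contig_1
ATGCGC
GGCC
>contig_2
ATAT
""".splitlines()

summary = assembly_summary(toy_fasta)
print(summary)
```

Applied to a real assembly file, the same function could be called on an open file handle, since it only iterates over lines.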

Whole-genome sequence annotation and comparisons
The genome of S. moorigangaii CMS01 was visualized as a circular map using the CGView server [12]. The genome sequences of S. moorigangaii CMS01, Synechococcus sp. PCC 7003 and Synechococcus sp. PCC 7335 were aligned in progressiveMauve [13]. Whole-genome-sequence-based phylogeny was inferred using the Type (Strain) Genome Server (TYGS) [14]. The genome data were compared using the MASH algorithm, and genome distances were calculated using the Genome BLAST Distance Phylogeny (GBDP) approach. The resulting intergenomic distances were used to infer a balanced minimum evolution tree via FASTME 2.1.4, including SPR post-processing [15]. Branch support was inferred from 100 pseudo-bootstrap replicates.
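The MASH comparison rests on MinHash sketches of k-mers: the Jaccard similarity j of two k-mer sets is estimated from the smallest hash values, and the Mash distance is then D = -(1/k) ln(2j / (1 + j)). The following Python sketch illustrates the idea under simplifying assumptions: only the forward strand is hashed (no canonical k-mers), SHA-1 from the standard library stands in for the hash function, and k and the sketch size are toy values. All function names are hypothetical, and this is not the implementation used by TYGS.

```python
import hashlib
import math
from typing import Set


def kmer_hashes(seq: str, k: int) -> Set[int]:
    """Hash every k-mer of seq to a 64-bit integer (forward strand only)."""
    seq = seq.upper()
    hashes = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.sha1(seq[i:i + k].encode("ascii")).digest()
        hashes.add(int.from_bytes(digest[:8], "big"))
    return hashes


def bottom_sketch(hashes: Set[int], s: int) -> Set[int]:
    """Keep the s smallest hash values (a bottom-s MinHash sketch)."""
    return set(sorted(hashes)[:s])


def jaccard_estimate(sketch_a: Set[int], sketch_b: Set[int], s: int) -> float:
    """Estimate Jaccard similarity from two bottom-s sketches."""
    merged = sorted(sketch_a | sketch_b)[:s]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in sketch_a and h in sketch_b)
    return shared / len(merged)


def mash_distance(j: float, k: int) -> float:
    """Convert a Jaccard estimate into the Mash distance."""
    if j <= 0.0:
        return 1.0  # no shared hashes: maximal distance
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)


k, s = 4, 5  # toy parameters; real analyses use far larger values
a = bottom_sketch(kmer_hashes("ATGCGTACGTTAGC", k), s)
b = bottom_sketch(kmer_hashes("ATGCGTACGTTAGC", k), s)
c = bottom_sketch(kmer_hashes("CCCCCCCC", k), s)
d_same = mash_distance(jaccard_estimate(a, b, s), k)
d_diff = mash_distance(jaccard_estimate(a, c, s), k)
print(d_same, d_diff)
```

Identical sequences give a distance of zero and sequences with no shared k-mers give the maximal distance of one; real genomes fall in between.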
Genomic relatedness with the closest relatives was determined using the OrthoANIu algorithm [16]. Digital DDH values were calculated using the genome-to-genome distance calculator (GGDC 2.1), applying Formula 2 (identities/HSP length) [17]. The genome sequences used for the GGDC and OrthoANIu analyses were those of Synechococcus sp. PCC 7335, Synechococcus sp. PCC 7003, Synechococcus sp. PCC 7002 and Synechococcus sp. NKBG15041c. Average amino acid identity (AAI) was determined using AAI-profiler [18]. In silico phenotyping was performed using Traitar [19]. Genome annotation was carried out using Prokka [20] and revalidated using the Prokaryotic Genome Annotation Pipeline (PGAP) [21]. Genomic islands were predicted using IslandViewer 4 [22]. The resulting protein profile was viewed by plotting the data in a circular map using GView [23]. Prophage sequences within the genome and plasmids were identified using PHASTER [24,25]. The accession number for the submitted genome data is SAMN12191289.
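The ratio named for Formula 2 (identities/HSP length) can be illustrated with a short Python sketch. It assumes that each high-scoring segment pair (HSP) is summarised by its number of identical positions and its length, and that the distance is one minus the pooled ratio. The tuple layout, the function name and the toy values are hypothetical; how GGDC combines HSPs from both search directions and converts the distance into a digital DDH estimate is not reproduced here.

```python
from typing import List, Tuple

# (identical positions, HSP length); a hypothetical layout for illustration
HSP = Tuple[int, int]


def identity_ratio_distance(hsps: List[HSP]) -> float:
    """Return 1 - (sum of identities / sum of HSP lengths)."""
    total_identities = sum(identities for identities, _ in hsps)
    total_length = sum(length for _, length in hsps)
    if total_length == 0:
        return 1.0  # no HSPs at all: treat as maximally distant
    return 1.0 - total_identities / total_length


# Toy HSPs (made-up numbers, not results from this study).
toy_hsps = [(950, 1000), (480, 500), (70, 100)]
distance = identity_ratio_distance(toy_hsps)
print(distance)
```

Because the ratio is normalised by HSP length rather than genome length, a measure of this kind is less sensitive to missing sequence in draft assemblies.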

Genome analysis
The draft genome of S. moorigangaii CMS01 consisted of 6066887 bases assembled into 227 contigs (figure 1). Approximately 500000 bases were identified as plasmid sequence. The genome is about 5.5 Mbp in size, nearly twice as large as those of its closest relatives, including Synechococcus sp. PCC 7002, Synechococcus sp. PCC 7003 and Synechococcus sp. NKBG15041c (figure S1 available online at stacks.iop.org/IOPSN/1/034001/mmedia). The GC content was 56.53%. Genome annotation indicated the presence of 5806 genes in total, of which 5701 were CDSs; 5616 of these were protein-coding. The genome codes for 105 RNAs: three 5S rRNAs, three 16S rRNAs, four 23S rRNAs, 88 tRNAs and 7 ncRNAs. A total of 85 pseudogenes and 1 CRISPR array were found. Based on whole-genome phylogeny, S. moorigangaii CMS01 was confirmed as a new species (figure S3).
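The annotation counts reported above can be cross-checked with simple arithmetic. The Python sketch below uses only the numbers given in the text and assumes, which the text does not state explicitly, that total genes are the sum of CDSs and RNA genes and that the protein-coding CDSs are the CDSs that are not pseudogenes. Variable names are hypothetical.

```python
# Counts as reported in the text for S. moorigangaii CMS01.
total_genes = 5806
cds_total = 5701
protein_coding_cds = 5616
pseudogenes = 85
rna_counts = {
    "5S rRNA": 3,
    "16S rRNA": 3,
    "23S rRNA": 4,
    "tRNA": 88,
    "ncRNA": 7,
}

# Sum the RNA categories and compare against the reported total of 105.
rna_total = sum(rna_counts.values())

# Assumed relationships (not stated explicitly in the text).
genes_from_parts = cds_total + rna_total
coding_from_parts = cds_total - pseudogenes

print("RNA genes:", rna_total)
print("CDSs + RNA genes:", genes_from_parts)
print("CDSs - pseudogenes:", coding_from_parts)
```

Under these assumptions the three derived values match the reported 105 RNAs, 5806 total genes and 5616 protein-coding CDSs.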

Possible phenotypic traits from genotype
Metabolism
In silico phenotyping indicated the organism to be aerobic, motile and Gram-negative. The isolate is predicted to be susceptible to bile and to produce enzymes including casein hydrolase, arginine dihydrolase, alkaline phosphatase, oxidase, catalase, lipase, lysine decarboxylase, gelatinase, coagulase, urease and DNase. It is predicted to use glycerol, pyrrolidonyl-beta-naphthylamide, D-mannitol, acetate, L-arabinose, mucate and tartrate. Growth on carbon sources including sucrose, D-mannose and trehalose was also predicted. The organism is predicted to convert nitrite to nitrogen gas and to grow on MacConkey agar in the presence of a high NaCl concentration (figure S2).
Energy generation and nutrient uptake
Some of the genes identified on the positive and negative strands of the gDNA are shown in figure S4. Genome annotation revealed genes involved in photosynthesis, including photosystem II reaction centre proteins (psbN, psbH, psbL, psbJ, psbZ, psbQ), photosystem I iron-sulfur centre protein (psaC) and photosystem I reaction centre subunits VIII and IX, as well as genes involved in nitrogen regulation, including Mo-dependent nitrogenase and the global nitrogen regulator (ntcA). Genes involved in arsenic regulation, including arsenic-transporting ATPase, ACR3 family arsenite efflux transporter, arsenic resistance protein (arsH) and arsenate reductase (arsC), were found. Other genes involved in toxin systems, including a type II toxin-antitoxin system (vapC), were identified. Genes involved in iron regulation, including thioredoxin (trxA), were found. Transport-protein-coding genes, including TonB-dependent receptor and MFS transporter, were identified. Genes specific for the uptake and use of urea, including the urtABCDE transport system and the urease cluster ureABCDEFG, were present. A large number of cyanobacteria harbour genes encoding the urea-catabolizing urease (ureABCDEFG), but not together with a urea transport system [26]. The isolate can grow in the presence of urea (1 μM concentration), although nitrate is the preferred nitrogen source for growth [8]. A large number of genes coding for phosphate/phosphite/phosphonate ABC transporters indicate the capacity to take up phosphorus in different forms from the environment. Indeed, phosphate can be limiting in estuarine mangroves [27], and these genes reflect the metabolic ability to take up the available forms of phosphorus.

Environmental adaptations
Transporter-coding genes indicated the possible adaptive capability of S. moorigangaii CMS01 to cope with extreme conditions, including temperature and salinity, which vary seasonally in the Sundarbans; this capability may underlie its ubiquitous distribution [28]. Molecules such as glutathione help to maintain cell redox homeostasis and protect the cell membrane lipids from oxidative stress in cold conditions [29]. The genome codes for RpsB, RpsC, RpsR, RpsG, RpsH, RpsI, RpsJ, RpsK, RpsM, RpsN, RpsO, RpsP, RpsQ and RpsS proteins. The genome codes for linker polypeptides that are necessary for the correct assembly of phycobiliproteins in phycobilisome rods [2]. and NblA-related protein were detected which function in the degradation of phycobilisomes during nutrient stress in cyanobacteria [30]. The genome codes for phycocyanin alpha and beta subunits along with allophycocyanin. Multiple copies of the smpB gene, which codes for a protein required for rescuing ribosomes stalled on defective messages, were identified in the genome [31]. The genes associated with possible estuarine adaptations of this species are summarized in table S1. The genes possibly acquired by HGT, as deduced by IslandViewer 4, are listed in table S2. The presence of multidrug efflux systems indicates the need to export toxins and highlights the potential competition for resources experienced by S. moorigangaii CMS01. The possible locations of genomic islands are shown in figure 2. One incomplete prophage sequence of 14.6 kb length was identified. This sequence lies between nucleotide positions 90307 and 105001 within the genome and codes for 18 proteins. This prophage sequence shows maximum identity with Paracoccus phage vB_PmaS-IMEP1 (Accession number: NC_026608).