Draft genome of Prochlorothrix hollandica CCAP 1490/1T (CALU1027), the chlorophyll a/b-containing filamentous cyanobacterium

Prochlorothrix hollandica is a filamentous, non-heterocystous cyanobacterium that possesses chlorophyll a/b light-harvesting complexes. Despite growing interest in unusual green-pigmented cyanobacteria (prochlorophytes), only a few genomes from prochlorophyte genera have been sequenced to date. In this study, we sequenced the genome of Prochlorothrix hollandica CCAP 1490/1T (CALU1027). The resulting draft genome assembly (5.5 Mb) contains 3737 protein-coding genes and 114 RNA genes.


Introduction
The majority of cyanobacteria use chlorophyll (chl) a as the sole magnesium tetrapyrrole and the common phycobilisome as the bulk light-harvesting complex (LHC). The prochlorophytes are a unique pigment subgroup of the phylum Cyanobacteria: besides chl a, they contain other chls (b; 2,4-divinyl a; 2,4-divinyl b; f; g) as antenna pigments and do not depend on phycobiliprotein (PBP)-containing photoreceptors [1]. Prochlorophytes with these features are few and encompass three marine unicellular genera (Prochloron, Prochlorococcus, Acaryochloris) and one freshwater filamentous genus (Prochlorothrix). Unicellular Prochlorococcus spp. dominate the phytoplankton of oligotrophic regions of the world's oceans and are of exceptional importance for global primary productivity [2]. Prochloron sp. and Acaryochloris sp. were isolated in symbiotic association with colonial ascidians [3,4]. In contrast to other prochlorophytes, P. hollandica is characterized by low abundance and patchy distribution [5]; more detailed genome analysis could help explain the ecophysiological background of this microorganism.
The genus Prochlorothrix is represented by two cultivable free-living species, Prochlorothrix hollandica and Prochlorothrix scandica, as well as a number of uncultured strains known from environmental 16S rRNA sequences [6]. The distinction between P. hollandica and P. scandica is based predominantly on molecular-genetic characters: DNA reassociation of less than 30 % and a difference in DNA GC content (mol%) of more than 5 % [5].
P. hollandica was isolated from a water bloom in Lake Loosdrecht (near Amsterdam, the Netherlands) and validly published under the rules of the Bacteriological Code as the type strain CCAP 1490/1T [7,8]. Strain CCAP 1490/1 was generously supplied in 1994 by Dr. Hans C.P. Matthijs (Amsterdam University) and has since been stored as CALU1027 at the Collection of Cultures of Algae and Microorganisms of St. Petersburg State University (CALU) [9]. Prochlorothrix hollandica is also maintained under the collection indexes CCMP34, CCMP682, NIVA-5/89 and SAG10.89, and strain PCC9006 has been reported as well [10]. Another filamentous species, Prochlorothrix scandica, was isolated from the phytoplankton of Lake Mälaren (Sweden) and is maintained as NIVA-8/90 and CALU1205 [11].
Among prochlorophytes, the small genomes of unicellular Prochlorococcus sp. strains from the LL- and HL-clades were sequenced first [2,12,13]. The four sequenced genomes of symbiotic Prochloron didemni P1-P4 are second in number [14]. Acaryochloris marina genomes have been sequenced for strains CCME5410 and MBIC11017 [15], but only one paper has mentioned a P. hollandica PCC9006 genome, sequenced by Shich et al. in the context of improving the global cyanobacterial phylogeny [16]. Here we report the sequencing of genomic DNA of P. hollandica CCAP 1490/1T (CALU1027) and the annotation of the resulting draft genome, undertaken to support comparative genomics of cyanobacteria and prochlorophytes.

Classification and features
A representative genomic 16S rDNA sequence of strain P. hollandica CCAP 1490/1T (CALU1027) was compared with those of other prochlorophytes and with cyanobacterial type strain sequences obtained from GenBank. The tree was reconstructed using the neighbor-joining method with the Kimura 2-parameter substitution model in MEGA 6.0 [17,18]. The phylogenetic position of P. hollandica CALU1027 is shown in Fig. 1. Representatives of the genus Prochlorothrix are morphologically similar to other filamentous non-heterocystous cyanobacteria (Subsection III, Oscillatoriales) [19]. In particular, P. hollandica CALU1027 produces long (>300 μm), straight, unbranched, non-motile trichomes (Fig. 2). Individual cells are 1.6 ± 0.1 μm wide and 11.8 ± 0.9 μm long, which matches previously reported data [2,4]. The opaque polar aggregates of gas vesicles resemble those of the Pseudanabaena type, but P. hollandica trichomes have slighter intercellular constrictions (1/5 to 1/8 of the cell diameter). Trichomes multiply by occasional breakage without forming hormogonia. A sheath or mucilaginous capsule visible by light or electron microscopy was never observed; the cell envelope shows a typical Gram-negative triple-layer contour [5]. A brief survey of P. hollandica CALU1027 properties according to MIGS recommendations [20] is given in Table 1.
Table 1 note. Evidence codes: TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology Project [25].

Genome sequencing information

Genome project history
The WGS project AJTX02 has been deposited at DDBJ/EMBL/GenBank under accession AJTX00000000 (20.02.2013) and was updated in this study as Draft Genome Project AJTX00000000.2 (29.04.2015). The assembled contigs have been deposited in NCBI. The project information and its association with the MIGS are summarized in Table 2.

Growth conditions and genomic DNA preparation
P. hollandica CALU1027 was grown in BG-11 medium [2]. The strain is a moderate mesophile that grows well at 20-22 °C under continuous light. For DNA isolation, cells were harvested by centrifugation and treated with 2 μg/mL Proteinase K in 0.1 M Tris-HCl (pH 8.5), 1.5 M NaCl, 20 mM Na2EDTA and 2 % cetyltrimethylammonium bromide at 55 °C for 3-4 h. DNA was purified by a standard protocol of organic extraction and ethanol precipitation.

Genome sequencing and assembly
For genome sequencing, DNA was randomly fragmented using a Q800R sonicator system. After size selection, 500 bp DNA fragments were used to construct sequencing libraries, which were then sequenced with a 250 bp paired-end read method on the Illumina MiSeq platform according to the manufacturer's protocol, resulting in 3,679,738 read pairs. Reads were processed with Trimmomatic 0.32 [21], and 3,665,348 read pairs remained after filtration. The filtered reads were assembled with SPAdes 3.5 [22]. From the resulting assembly, the P. hollandica CALU1027 contigs were selected and scaffolded with Contiguator 2.7.4 [23], using assembly GCF_000332315.1 as a reference. The draft genome of [...] (Table 3).
Table note. The total is based on the total number of protein-coding genes in the annotated genome.
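As a rough sanity check on the read counts reported above, the short Python sketch below computes the fraction of read pairs retained after filtering and an upper bound on sequencing depth. It assumes that every retained read keeps its full 250 bp length and that all reads derive from the 5.5 Mb draft assembly; neither assumption is stated in the text, and trimming and the removal of non-target contigs would both lower the true depth. The variable names are illustrative only.

```python
# Figures taken from the text; everything derived from them is an estimate.
RAW_PAIRS = 3_679_738        # read pairs produced by the MiSeq run
FILTERED_PAIRS = 3_665_348   # read pairs remaining after Trimmomatic
READ_LENGTH_BP = 250         # nominal read length (assumed untrimmed)
ASSEMBLY_SIZE_BP = 5_500_000 # draft assembly size, 5.5 Mb

# Pairs discarded by filtering and the fraction that survived.
removed_pairs = RAW_PAIRS - FILTERED_PAIRS
retained_fraction = FILTERED_PAIRS / RAW_PAIRS

# Upper bound on depth: two full-length reads per retained pair.
max_bases = FILTERED_PAIRS * 2 * READ_LENGTH_BP
max_coverage = max_bases / ASSEMBLY_SIZE_BP

print(f"read pairs removed: {removed_pairs}")
print(f"read pairs retained: {retained_fraction:.2%}")
print(f"upper-bound coverage: {max_coverage:.0f}x")
```

The estimate is deliberately an upper bound, so it should be read as an order-of-magnitude figure rather than the depth of the deposited assembly.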

Genome annotation
Protein-coding genes of the draft genome assembly were predicted using the NCBI Prokaryotic Genome Annotation Pipeline (v.2.10), which applies the best-placed reference protein set method with GeneMarkS+ [24]. The annotated features were genes, CDS, rRNA, tRNA, ncRNA and repeat regions. Functional assignments of the predicted ORFs were based on BLASTP homology searches against the WGS data of the phylogenetically closest cyanobacteria and the NCBI non-redundant database. Functional assignment was also performed with a BLASTP homology search against the Clusters of Orthologous Groups (COG) database [25,26]. In total, 2855 genes (66 %) were assigned a putative function; the remaining genes were annotated as either hypothetical proteins or proteins of unknown function.
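The homology-based assignment described above can be illustrated with a minimal Python sketch that reads BLAST tabular output (the default 12-column `-outfmt 6` layout) and keeps the highest-scoring hit per query. This is not the pipeline used in the study: the function name `best_hits`, the E-value cutoff of 1e-5 and the example rows are all assumptions made for illustration, since the text does not report the thresholds that were applied.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for the top hit per query.

    max_evalue is a hypothetical cutoff chosen for illustration only.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        evalue = float(rec["evalue"])
        bitscore = float(rec["bitscore"])
        if evalue > max_evalue:
            continue  # hit too weak to support a putative function
        current = best.get(rec["qseqid"])
        if current is None or bitscore > current[2]:
            best[rec["qseqid"]] = (rec["sseqid"], evalue, bitscore)
    return best


# Made-up rows; identifiers are placeholders, not real accessions.
example = io.StringIO(
    "orf1\tprotA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-80\t300\n"
    "orf1\tprotB\t60.0\t180\t72\t2\t5\t184\t3\t182\t1e-40\t150\n"
    "orf2\tprotC\t30.0\t50\t35\t1\t1\t50\t1\t50\t0.5\t25\n"
)
hits = best_hits(example)
for query, (subject, evalue, bitscore) in hits.items():
    print(query, subject, evalue, bitscore)
```

In this toy input, `orf1` receives its stronger hit and `orf2` is left unassigned, which mirrors how ORFs without a convincing match end up annotated as hypothetical proteins.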

Genome properties
The GC content of the P. hollandica CALU1027 genome is 54.56 %. Gene annotation revealed 3737 protein-coding genes, 12 rRNA genes and 44 tRNA genes. COG annotations of the protein-coding genes are presented in Table 4.
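For readers who want to reproduce a GC figure of this kind from a set of contigs, the following Python sketch shows one common way to compute it. It assumes that ambiguous bases (such as N in scaffold gaps) are excluded from the denominator; the text does not say how the reported value was calculated, and the function names and toy sequences are illustrative only.

```python
def gc_content(seq):
    """Return GC content in percent, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def assembly_gc(contigs):
    """Length-weighted GC content over an iterable of contig sequences."""
    return gc_content("".join(contigs))


# Toy contigs, not real P. hollandica sequence.
toy_contigs = ["GGCCAATT", "GCGCNNAT"]
print(f"{assembly_gc(toy_contigs):.2f} %")
```

Concatenating the contigs before counting weights each contig by its length, which is what a genome-wide GC value normally implies; averaging per-contig percentages would give a different number.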

Insights from the genome sequence
Assembly and analysis of the P. hollandica CALU1027 genome annotation revealed a repertoire of genes necessary for autonomous energy and substrate metabolism: 743 detected genes relevant to 129 metabolic pathways have orthologs in P. hollandica CALU1027 and other cyanobacteria (Table 5). Comparative genome analysis of P. hollandica CALU1027 against the filamentous heterocystous cyanobacterium Anabaena variabilis ATCC29413 and the unicellular prochlorophytes Prochlorococcus marinus CCMP1375 and Acaryochloris marina MBIC11017 revealed that the main differences lie in amino acid and carbohydrate metabolism, membrane transport and stress response systems (data not shown). Chl a/b-containing Prochlorothrix and Prochloron were long considered to share a common ancestry with the chloroplasts of green algae and higher plants [27,28]. However, P. hollandica and other prochlorophytes were shown to possess the unique genes pcbA-pcbC, which encode chl a/b-LHC apoproteins that are dissimilar from the CAB apoprotein superfamily of the chloroplast antenna [19-30]. Notably, we found some PS II proteins that are commonly absent in cyanobacteria but usually present in the chloroplasts of green algae and higher plants: PsbW (6.1 kDa, nuclear encoded), PsbT (5 kDa, nuclear encoded), PsbR (10 kDa) and PsbQ (16 kDa, oxygen-evolving complex protein). We also found that P. hollandica contains an ortholog of the hetR gene (a key regulator of heterocyst differentiation), although these filamentous non-heterocystous cyanobacteria lack nitrogenase and the other prerequisites for diazotrophy [31,32].

Conclusions
Study of the P. hollandica CCAP 1490/1T (CALU1027) genome is valuable for analysing the evolution of photosynthesis genes and for comparative genomics of cyanobacterial adaptation.