Complete genome sequence and genomic characterization of Microcystis panniformis FACHB 1757 by third-generation sequencing

The cyanobacterial genus Microcystis is well known as the main group that forms harmful blooms in water. A strain of Microcystis, M. panniformis FACHB1757, was isolated from Meiliang Bay of Lake Taihu in August 2011. The whole genome was sequenced using PacBio RS II sequencer with 48-fold coverage. The complete genome sequence with no gaps contained a 5,686,839 bp chromosome and a 38,683 bp plasmid, which coded for 6,519 and 49 proteins, respectively. Comparison with strains of M. aeruginosa and some other water bloom-forming cyanobacterial species revealed large-scale structure rearrangement and length variation at the genome level along with 36 genomic islands annotated genome-wide, which demonstrates high plasticity of the M. panniformis FACHB1757 genome and reveals that Microcystis has a flexible genome evolution.


Introduction
The massive development of bloom-forming cyanobacteria is causing problems in eutrophic water bodies worldwide. Among the cyanobacteria, Microcystis is perhaps the most notorious. Many Microcystis species have been reported to be able to produce microcystins [1][2][3][4], which threaten many aquatic ecosystems and cause serious and occasionally fatal human liver, digestive, neurological, and skin diseases [5][6][7].
Microcystis is a genus of unicellular colony-forming cyanobacteria whose taxonomy is still unclear [8]. Although morphological criteria have been proposed to distinguish Microcystis species from field samples, such criteria have long been questioned for use in species identification within the genus [9]. Several studies attempted to reconcile molecular and morphological taxonomy in Microcystis [9][10][11][12][13][14], and a morphologybased taxonomic system has been dominantly used. Microcystis panniformis was first reported in 2002 and was morphologically described as having flattened, irregular, monolayer colonies with small holes inside and later disintegrated into small pieces [15]. Since the M. panniformis strain SPC 702 was successfully isolated from Lago das Garças, São Paulo in 1999, studies addressing different aspects of this species have been performed [16][17][18][19][20][21][22][23][24][25]. In China, M. panniformis was reported as a newly recorded species in 2012 [26], and one strain (FACHB1757) was isolated from Lake Taihu. Microcystis panniformis was originally thought to only be distributed in tropical regions, but we showed that this species has invaded the subtropical regions with a monsoon climate [26]. Global expansion of harmful cyanobacteria has been thought to be linked to climate changes, particularly increasing amounts of atmospheric CO 2 and surface temperature, which may promote Microcystis growth and enhance the potential for bloom occurrence [27][28][29]. Therefore, a deeper understanding of the ecology and physiology of M. panniformis FACHB1757 by obtaining a robust genome reference may provide insight into the expansion and invasion mechanisms of Microcystis.

Classification and features
A water bloom sample was collected directly from the water surface using a plastic bucket in Meiliang Bay of Lake Taihu in August 2011 (Fig. 1a). Lake Taihu (E 30°5 6′-31°33′,N 119°54′-120°36′), the third largest freshwater lake in China, is located in the south of the Yangtze River Delta. The total area of the lake is 2338 km 2 , with an average depth of 2 m and total capacity of 47.6 × 10 8 m 3 . Lake Taihu is situated in the subtropical zone with a humid and semi-humid monsoon climate, and has suffered from severe eutrophication over the past three decades. Meiliang Bay is located in the northern part of Lake Taihu (Fig. 1a), which has a surface area of 100 km 2 , depth of 1.8-2.3 m, and is currently hypereutrophic [30].
Some Microcystis colonies in the sample disintegrated during the sample collection process; thus, only those macroscopic colonies with significant monolayer were collected with 3-ml pipets (BD Falcon, USA), and transferred into 50-ml centrifuge tubes (Corning, USA), and immediately shipped to the laboratory. Finally, macroscopic colonies that had flattened irregular up to monolayers with small holes (in old colonies) were identified as M. panniformis by examination under an optical microscope. Microcystis panniformis FACHB1757 was obtained, and this strain was then stored at the Freshwater Algae Culture Collection at the Institute of Hydrobiology, Chinese Academy of Sciences. . a The precise position of the isolated sample is indicated by a star; WT means Wutang station in Lake Taihu. b The morphology of the strain colonies in the white disk, which were collected directly from the water surface using a plastic bucket (on September 15, 2013 in Meiliang Bay, photo with a Nikon D7000). c, d Flat colonies with small holes as viewed under an optical microscope The general characteristics of M. panniformis FAC HB1757 are summarized in Table 1, and a phylogenetic tree based on 16S rRNA sequences is shown in Fig. 2. The spherical cells are estimated with a diameter of 2.6 to 6.8 μm (mean 4.7 μm), be densely agglomerated, and form irregular colonies with small holes. The young stages formed small clusters of cells, which were flat or circular in outline, sometimes spheroidal, and with or without an internal hollow. The old stages formed colonies with small holes, which later disintegrated into small groups. The mucilage (margin of colonies) was diffuse, and cells did not overlap. The margin of the colonies was smooth or (in old colonies) irregular. Cell density was regular and evenly agglomerated, sometimes in indistinct rows. Diagnostic characteristics included flat colonies with small holes, toxicity, homogeneously arranged cells, and life cycle was characterized by distinct benthic and planktonic phases [15,31]. The distribution was tropical, and this is likely a pantropical species (e.g., S. Africa, N. Australia, S. America, Africa, China, Vietnam and New Zealand) [13,15,16,26,31,32].

Phylogenetic analysis
Whole genome comparative analysis between M. panniformis FACHB1757 and 13 other cyanobacterial species was performed. General information of related genome data is shown in Table S1 (Additional file 1), and all data sets were downloaded from NCBI. The main water bloom-forming cyanobacterial species in freshwater and brackish water worldwide, particularly those in the Lake Taihu region, were included. Unicellular colony-forming Microcystis and filamentous heterocystous Dolichospermum (formerly known as the planktonic Anabaena) were the main components of cyanobacterial blooms in Lake Taihu [33]. The Aphanizomenon, Pseudanabaena, Cylindrospermopsis, Raphidiopsis, Planktothrix, Synechocystis, and Synechococcus species occurred as dominant species or accompanying species in blooms of Lake Taihu (including Lake Wuli) across different seasons. Among the 14 genome sequences, 691 single-copy gene families were annotated by OrthoMCL (version 2.0.9) [34], and MEGA6 [35] was used to construct a phylogenetic tree based on these sequences (Fig. 3).
The phylogenetic tree shows that M. panniformis FACHB1757 and M. aeruginosa NIES843 shared a significantly high similarity, and there was no clear division between M. panniformis and M. aeruginosa strains in the phylogenetic tree. The Microcystis lineage is distinct from the lineage that contains the unicellular Synechocystis, Synechococcus, and other multicellular cyanobacteria. Furthermore, the Synechocystis sp. PCC 6803 genome is more closely related to Microcystis than other strains. This result is congruent with previously published results based on 16S rRNA sequences [36][37][38][39]. Topological relationships between species in the phylogenetic tree based on single-copy gene families were generally consistent with the phylogenetic tree based on 16S rRNA sequences (Fig. 2).
Although Microcystis can be identified based on 16S rRNA and single-copy gene families sequences at the genus level, taxonomy of Microcystis at the species level was controversial in the past few decades, and , not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [78] five species have even been unified into a single species [13]. 16S rRNA sequence estimation can be ambiguous when analyzing certain Microcystis species with distinct morphologies, as occurred when analyzing M. panniformis and M. ichthyobabe (Fig. 2). Therefore, the whole reference genome sequence data was expected to play a crucial role in species classification of Microcystis. However, the currently available cyanobacterial genome sequences are highly limited. Only three Microcystis strains with complete genomic sequences are available, including M. aeruginosa NIES843 and NIES2549, and M. panniformis FACHB1757 reported here. Furthermore, the further species concepts and more useful molecular approaches should be proposed to classify the species/strain divergences in Microcystis [40,41].

Genome sequencing information
Genome project history Microcystis panniformis FACHB1757 was selected for sequencing because of its obvious morphological characteristics; in particular, the macroscopic colonies with significant monolayer can even exceed 30 mm during the summer and early autumn in Lake Taihu A summary of the project information can be found in Table 2.

Growth conditions and genomic DNA preparation
Microcystis panniformis FACHB1757 colonies collected from the field were grown in MA medium [42] and incubated in 24-well culture plates for 4 wk. Then, floating colonies were transferred to the capped tubes that contained 5 ml of MA culture medium to finally form a unialgal culture. All cultures were grown at 25 ± 1°C with a 12 h light/12 h dark cycle under a photon irradiance of 25 μmol photons/(m 2 · sec) provided by daylight fluorescent lamps. Total genomic DNA of M. panniformis FACHB1757 was extracted using a commercial DNA isolation kit (DNeasy ® Plant Mini Kit, Qiagen, USA) following the manufacturer's instructions, and analyzed by micro-volume fluorescence detection (Nano-Drop ™ 8000 Spectrophotometer, Thermo Scientific, USA) and electrophoresis in 0.8 % agarose gel stained with ethidium bromide. The isolated DNA was eluted with 50 μl of the elution buffer from the commercial kit and then stored at −20°C until subsequent analyses.

Genome sequencing and assembly
First, the genome was surveyed using an Illumina Hiseq sequencer to detect the purity of the cultured unialgal strain. The insert size of the next generation pair-end library was 100 bp, and 1 Gbp raw data was produced in total. All reads were mapped to the M. aeruginosa NIES843 reference complete genome, and more than 80 % of reads matched well. Subsequently, the genome was sequenced using PacBio RS II. Genomic DNA was sheared by Covaris S220 g-TUBE. A 10 Kb library was constructed using a PacBio template prep kit and sequenced using the PacBio SMRT platform. In total, two SMRT cells were run, and 303 megabase pair raw data was obtained. After filtering, the mean read length was 7143 bp with a quality of 0.84, and the longest read was 31,225 bp. HGAP (version 2.2.3) was used for genome assembly. Long reads were chosen as seeds, and the other reads were mapped to the seeds using Blasr (version 1.3.1.132871) [43] for error correction. After alignment, the accuracy of seed sequences were optimized to 99 % to meet the requirements of the Sanger assembly software. There was a total of

Genome annotation
TRs were predicted by Tandem Repeat Finder (version 4.07b) [46] and Microsatellite identification tool (version 1.0), which can both identify perfect and compound micro-satellites. Prediction and annotation of the genome were done using the RAST server (version 2.0) [47]. RAST integrated tRNAscan-SE, and the search_for_rnas tool was used to call RNA genes across the chromosome. For gene estimation, GLIMMER2 was used to represent putative genes. Subsequently, a similar search was performed against FIGfams to identify the determined genes and annotate their functions. Moreover, all putative protein-coding genes were assigned to a category using databases including Clusters of Orthologous Groups (COG), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-Prot, and Non-Redundant Protein Database.

Genome properties
The genome assembly contained a complete circular chromosome sequence (5.69 M) and a plasmid (38.68 K). The schematic representation of the circular chromosome of M. panniformis FACHB1757 was showed in Fig. 4. Related genome assembly and annotation information can be found in Table 3. Nucleotide homology search of M. panniformis FACHB1757 and M. aeruginosa NIES843 genomes was conducted by BLAST, and similarity between the two species was 83.82 % (Additional file 1: Figure S1). A total of 1944 TRs were found in the genome, including 27 microsatellites, 1742 mini-satellites, and 176 satellites. Genome statistics are shown in Table 4. In total, there were 6567 genes, which included 48 RNA genes and 6519 protein-coding genes. Among the 6519 proteins, most contained around 100 amino acids (Additional file 1: Figure S2), and by compared with function databases mentioned above, 60.15 % of them were determined to have specific functions. There were 42 tRNA genes, and two copies of the rRNA gene cluster were found in the same direction. Function assignments of 6519 putative protein-coding genes were searched against several frequently used databases mentioned above; 3260 genes were assigned to COGs, of which 235 participated in signal transduction. Search of Pfam omains detected 3997 candidates. According to the subsystem classification results processed by RAST, 72 % of determined genes belong to specific subsystems, and the distribution of each category is demonstrated in (Additional file 1: Figure S3). The result of COG function annotation is shown in Table 4, and details of each COG cluster can be found in Additional file 2. The genes assigned to GO categories by InterProScane (version 5.4-47.0) [48] were classified into cellular components, molecular functions, and biological processes. Genes distributed in each category and their functions are shown in (Additional file 1: Figure S4). In the GO data, 309 signal function-related genes were found. KEGG matched 897 functional genes to related systems, as shown in (Additional file 1: Figure S5). Final gross function annotation outcomes are provided in (Additional file 1: Table S2).

Insights from the genome sequence
Comparative Microcystis species genomes Gene ortholog analysis Genes of four Microcystis species were compared (Fig. 5), and 2669 highly conserved orthologous genes were shared, which are representative of the core genome. Moreover, each genome had strain-specific genes, which varied from 296 to 1900. The M. aeruginosa NIES2549 genome, which has 1388 unique genes, is 1.5 Mbp smaller than that of M. aeruginosa NIES843, which only has 296 unique genes (M. aeruginosa NIES843 has 1388).
Microcystis panniformis FACHB1757 was shown to have 1900 specific genes, which was the greatest amount among the four strains, even though its genome was not the longest.

Secondary metabolite gene clusters
Microcystin was reported to enhance colony formation in Microcystis spp. and plays a key role in the persistence of their colonies and the dominance of Microcystis [49].

Conserved gene clusters
Four functional clusters of conserved genes related to microcystin synthesis, colony formation, photoregulation, and nutrient assimilation were also compared among the four Microcystis strains. In the microcystin synthesis gene cluster, the mcy and mcn gene clusters were not found in M. aeruginosa NIES2549. This is consistent with the results of a previous study, which showed that M. aeruginosa NIES2549 is a nontoxic strain [50]. With regard to colony formation, M. aeruginosa, M. wesenbergii, and M. panniformis all have typical macroscopic colony structure when observed by naked eye in Lake Taihu during summer and autumn water blooms.
Microcystis panniformis seems to be the largest, and can even have more than 30 mm colonies. Polysaccharides and microcystin play important roles in the process of Microcystis colony formation. The maximum EPS content was found in M. wesenbergii and M. aeruginosa, which are not the largest and are only approximately 100 μm [51], but positive correlations between EPS and Microcystis colony size in cultures were supported by previous studies [52][53][54]. mrpC and epsL were absent from all four strains, and only M. aeruginosa NIES843 contained cpsF, although tagH, capD, csaB, and rfbB were conserved in all four strains. Furthermore, mvn codes for a lectin in M. panniformis FACHB1757 and M. aeruginosa PCC7806, which specifically binds to a sugar moiety present on the surface of Microcystis cells. Additionally, a binding partner of MVN was identified in the lipopolysaccharide fraction of M. aeruginosa PCC7806, which involved in the Microcystis colony formation [55]. Together, the toxin-, EPS-, and lectin-related genes may explain the reason why M. panniformis FACHB1757 usually aggregates and produces a larger colony in Lake Taihu during water blooms. In the photo-regulation cluster, psb, apc and gvp with the exception of gvpC were all detected. It is interesting that gvpC is absent from M. panniformis FACHB1757, because this gene encodes GvpC, which is a highly conserved expressed protein in some genera that is closely related to gas vesicles [56][57][58]. Genes related to nutrient assimilation include ntc, pst, and sph clusters. ntcB, pstA, pstB1, pstB2, and pstC were only absent from M. aeruginosa PCC7806 among the four strains, which The total is based on the total number of protein-coding genes in the genome  [59][60][61]. Microcystis aeruginosa NIES-843 is the first strain of the genus Microcystis to be sequenced for its complete genome with the ABI 3770xl sequencer. Since then, the second completed Microcystis genome (of M. aeruginosa NIES-2549) was released on the April 29, 2015. Thus, the whole genome at the nucleic acid level was compared between M. panniformis FACHB 1757 and M. aeruginosa NIES-843. Mauve, which was designed for identification and alignment of conserved genomic sequences with rearrangements and horizontal transfer, was used to conduct comparative genomic sequence analysis [62]. As shown in (Additional file 1: Figure S1), M. panniformis FACHB1757 underwent extensive chromosome structure rearrangement, which indicates that Microcystis genomes are highly plastic [36].

Self-defense system Restriction modification system
Comparison with REBASE [63], a restriction enzyme database containing information about restriction enzymes, revealed that DNA methyltransferases and related proteins are involved in the biological process of R-M, and  277 restriction enzymes were found. Detailed classification revealed that 12 and 130 enzymes belonged to type I and type II systems, respectively, which together represented 46.93 % of all enzymes, and are categories of rapidly evolving genes [64]. Sixty-three, 10, and 2 enzymes, respectively, belonged to type IIG, type III, and type IV systems, and one control protein restriction enzyme and 58 unknown enzymes were also found.

Methylation modification analysis
It is widely thought that methylation modification is associated with R-M systems and participates in self-defense against foreign genome invasion. Genome methylation modification and methyl-transferase recognition sequence motifs were analyzed using SMRT (version 2.3.0). In the chromosome, 3204 m4C (N 4 -methylcytosine), 9,758 m6A (N 6 -methyladenine), and 31,845 other  modified bases were marked as modified (details are available in Additional file 3). Corresponding motif information is included in Table 5.

CRISPR system
MinCED derived from the CRT [65], was used to predict CRISPR structure. CRISPR are extensively found in prokaryotes and are thought to compose a CRISPRassociated system, which is a putative immune system based on RNA-interference [66]. Three candidate CRISPR clusters on chromosome sequence were annotated under strict parameter and 1 CRISPR on plasmid (further information is available in Additional file 4).

Genomic islands
GEIs are particularly influential in microorganism genomes with regard to virulence, antibiotic resistance, metabolic, symbiosis, or other important adaptations [67]. GEIs have substantial roles in horizontal gene transfer, which is now widely acknowledged as an important force that shapes bacterial genome structure. Island Viewer (version 2.0) [68] was used to predict the GEIs in M. panniformis FACHB1757. Island Viewer integrates SIGI-HMM, Island Pick, and Island Path-DIMOB and built-in databases, including the Virulence Factor Database and Antibiotic Resistance Gene Database. Thirty-six GEIs were found using Island Viewer, and their positions are shown in Fig. 6. Different kinds of functions were identified and are summarized in Table 6. Transposases were identified in most of the GEIs, as they participated in horizontal gene transfer. Toxin-related gene clusters were annotated in six GEIs and probably affect competitiveness and fitness. Some functional genes, such as hat/hatR, were also detected, which indicates the enhanced adaptability and metabolic versatility in this strain.

Conclusions
This study presents the complete whole genome sequence of a newly recorded species in China, M. panniformis, and demonstrates several genomic perspectives, including comparison with nine other water bloomforming cyanobacterial species. A 5.6 Mbp chromosome with a 38 Kbp plasmid was reported, and gene function, methylation modification, CRISPR, and GEIs throughout the genome were described. Large-scale of structure variation was demonstrated by comparison with M. aeruginosa genomes. A Venn diagram of four Microcystis strains showed gene quantity and category variation as a result of evolutionary divergence and revealed that Microcystis has underwent flexible genome evolution.

Additional files
Additional file 1: Tables S1-3 and Figures S1-5. Table S1. General information regarding phylogenomics and comparative genomic analysis. Table S2. Function annotation assignment from different databases. Table S3. Conserved gene cluster function and position in the genomes of four Microcystis strains. Figure S1. Whole genome comparison with M. aeruginosa NIES843 at the nucleic acid level. Large-scale structural rearrangement was obvious at the nucleic acid level. Blocks on the line for M. panniformis FACHB1757 show that the sequences of the two species were in the same direction, whereas blocks under the line indicated that the sequences ran in the opposite direction. Figure S2.
Length distribution of annotated protein-coding genes. The majority of fractions varied in length from 100 to 150. Figure S3. Subsystems and category distributions of genes. Functional annotation by RAST. Figure S4.
Histogram demonstrating functional annotation results against GO. Figure S5.