Metagenomics dataset used to characterize microbiome in water and sediments of the lake Solenoe (Novosibirsk region, Russia)

This is data on the microbial diversity in the floating cyanobacterial community and sediment samples from the lake Solenoe (Novosibirsk region, Russia) obtained by metagenomic methods. Such a detailed data of the microbial diversity of the Novosibirsk oblast lake ecosystem was carried out for the first time. The purpose of our work was to reveal microbial taxonomic diversity and abundance, metabolic pathways and new enzyme findings the studied lake ecosystem using the next-generation sequencing (NGS) technology and metagenomic analysis. The data was obtained using metagenomics DNA whole genome sequencing (WGS) on Illumina NextSeq and NovaSeq. The raw sequence data used for analysis is available in NCBI under the Sequence Read Archive (SRA) with the BioProjects and SRA accession numbers: PRJNA493912 (SRR7943696), PRJNA493952 (SRR7943839) and PRJNA661775 (SRR12601635, SRR12601634, SRR12601633) corresponding to floating cyanobacterial community and sediment layers samples, respectively.


a b s t r a c t
This is data on the microbial diversity in the floating cyanobacterial community and sediment samples from the lake Solenoe (Novosibirsk region, Russia) obtained by metagenomic methods. Such a detailed data of the microbial diversity of the Novosibirsk oblast lake ecosystem was carried out for the first time. The purpose of our work was to reveal microbial taxonomic diversity and abundance, metabolic pathways and new enzyme findings the studied lake ecosystem using the next-generation sequencing (NGS) technology and metagenomic analysis. The data was obtained using metagenomics DNA whole genome sequencing (WGS) on Illumina NextSeq and No-vaSeq. The raw sequence data used for analysis is available in NCBI under the Sequence Read Archive (SRA) with the BioProjects and SRA accession numbers: PRJNA493912 (SRR7943696), PRJNA493952 (SRR7943839) and PRJNA661775 (SRR12601635, SRR12601634, SRR12601633) corresponding to floating cyanobacterial community and sediment layers samples, respectively. Value of the Data • The obtained data can be used for investigation of microbial diversity and metabolic pathways and processes occurring in the unique hypersaline lake. • Metagenomic data provide complete taxonomic profiles of microbial diversity and abundance in floating cyanobacterial community and sediment samples from the lake Solenoe (Novosibirsk region, Russia). The data may be very important for researchers which work in Bioinformatics, Biodiversity, Biochemistry, Biotechnology and other research areas. • This is the first data on microbial diversity and metabolic pathways in cyanobacterial community and sediment samples obtained for the lake Solenoe. It can be used to compare taxonomic profiles and metabolic pathways of other ecosystems.

Data Description
The Solenoe lake belongs to the poorly studied lake system in the south of West Siberia. It is relatively small and weakly alkaline, with mineralization reaching 230 g/l in certain ears. In summer, dense or loose floating cyanobacterial communities form near the banks of the lake.
A description of the sampling points is given in Table 1 .
Reads were extracted from each sample, aligned to the 16S rRNA gene sequence and identified to different taxonomic levels using the PhyloFlash program.
The metagenome of each sample were aligned on the bacterial and archaeal genome sequences and identified to different taxonomic levels using the MG-RAST program.
Taxonomic identifications of metagenomes of the samples obtained by 16S RNA and by WGS analyses are mostly similar despite some differences.
The distribution of the phylums and classes of received OTUs ( operational taxonomic unit s) is shown in Fig. 1 .
In the studied samples, the percentage of Archaea was 1.1-14.4% according to PhyloFlash, and 4.4-17.9%, to MG-Rast; Euryarchaeota and Crenarchaeota were the most abundant archaeal phyla. The maximum percentage of Archaea was detected in the 4th layer of the bottom sediment (R4).
Among bacteria, Proteobacteria were the most abundant, especially in the upper layer (R1) where they accounted from 53.3% (PhyloFlash) and 56.6% (MG-Rast) of all microorganisms. Alphaproteobacteria were represented by up to 18.4% of the total sample; Deltaproteobacteria, by up to 24.7%; Gammaproteobacteria, up to 42.9%. Bacteroidetes were also abundant, especially in the R2 layer at 21.5% (MG-Rast). Cyanobacteria were the most frequent in R1, R2, and R3, with up to 29%. Firmicutes were found in all layers with the maximum of 14.4% in R4. Chloroflexi were the most abundant in the lowermost R5 layer at 9.9% (according to PhyloFlash).
Taxonomic composition of the studied communities: Top line, taxa according to WGS (16S RNA, PhyloFlash, SILVA); bottom line, according to WGS metagenomes Table 1 Description of the sampling points.

R1
Floating cyanobacterial community. Loose green mass concentrated in the upper layers of the water column.

R2
Loose layer of bottom sediments from the depth of 0-3 cm R3 Bottom sediment from the depth of 3-5 cm R4 Bottom sediment from the depth of 10-15 cm R5 Bottom sediment from the depth of 15-19 cm The raw sequencing data had the characteristics shown in Table 2 .

Sample collection and description
The Solenoe lake is located near the village Lepokurovo (Bagan region, Novosibirsk oblast). The area of the studies was described by us earlier [ 1 , 2 ]. During our field studies (July 2017) the salinity of the lake was 122 g/l; рН, 8.46; Eh, 226 mV. Samples of the floating cyanobacterial community and bottom sediments were taken in the littoral zone in one point of the lake. First we sampled the mat, then pressed a plastic pipe 100 mm in diameter for sediment sampling. The samples were put into sterile 50 ml falcon tubes and kept in ethanol at −70 °C. Each sample was taken and analysed in triplicate.
In this study, the metagenome sample of the floating cyanobacterial community is referred to as R1; of the bottom sediments, as R2, R3, R4, and R5 ( Table 1 ).

DNA extraction
Total DNA was isolated from the samples using the Genomic DNA from soil NucleoSpin® Soil kit (Macherey-Nagel) according to the manufacturer's protocol.
Total DNA extraction from deep bottom sediments was performed as follows: 300 μl of the sample was placed in a tube with ceramic beads for particle fragmentation and centrifuged for 1 min at 16 0 0 0 g. The supernatant was discarded, and 500 μl of the buffer containing 10 0 0 mM Tris-HCl and 100 mM EDTA (pH 8.0) was added to the sediment. After vortexing, 100 μl of lysozyme (10 mg/ml) and 10 μl of RNAse were added and incubated at 37 °C for 25 min; every 5 min the tube was mixed. To this mixture, we added 100 μl each of 10% SDS (sodium dodecyl sulfate), 10% sarcosyl (Sigma), and chloroform. Samples were frozen in liquid nitrogen, then heated to 65 °С and vortexed. This procedure was repeated three times, and then another three times after vortexing for 1 min and the addition of 100 μl of 10% polyvinylpyrrolidone (Sigma). The resulting mixture was centrifuged at 13 0 0 0 g for 10 min. The supernatant was transferred to a sterile tube with the equal volume of isopropanol and incubated for 30 min at -20 °С . The tube was centrifuged for 15 min at 16 0 0 0 g; 30 0 μl of 75% ethanol was added to the sediment, vortexed, centrifuged, and the supernatant was removed. This procedure was repeated one more time. After that, the sediment was dried at room temperature, dissolved in 150 μl of the buffer containing 10 mM Tris-HCl and 10 mM EDTA, and purified on CleanMag DNA magnetic particles (Eurogene) according to the standard protocol.  [4] with the options -nextseq-trim 20 -m 100. Contamination was reduced using bowtie2 (v. 2.3.5) by removing the sequences that could be mapped on the human genome [5] . Assembly of reads into contigs was done by metaSPAdes (v. 3.11.1), with the -only-assembler option, because the reads were already processed [6] . For the taxonomic analysis of microbial communities, we used the method implemented in phyloFlash v3.3b2 that is based on the alignment of reads against the SSU rRNA reference database [7] . This procedure was performed with the standard parameters. For automated phylogenetic and functional analysis of metagenomes we used the MG-RAST Web server v. 4.0.3 [8] . MG-RAST automatically does quality control, nucleotide and amino acid sequence alignments with database accessions. This allows one to perform taxonomic identification, detect metabolic pathways, and do comparative analysis of metagenomes. The data processing pipeline of MG-RAST includes read quality control (Solex-aQA, DRISEE, Bowtie2) and annotation (FragGeneScan with the search against the protein M5nr database that integrates the GenBank, SEED, IMG, UniProt, KEGG, and eggNOGs databases).

Ethics Statement
The work did not involve human subjects, animals, cell lines or endangered species of wild fauna and flora.

Declaration of Competing Interest
The authors declare that they have not known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.