Metagenomic data on the composition of bacterial communities in lake environment sediments for fish farming by next generation Illumina sequencing

This article contains data on the bacterial communities of lagoon sediments with fish-farming potential in the Central Andes of Peru. Surface sediment samples were collected from four lagoons designated for inland-water fish farming. DNA was extracted from 0.5 g of sample using the Presto™ Soil DNA Extraction Kit. 16S rRNA amplicon sequencing was performed on the DNA extracted from the sediment. At least 36 bacterial phyla were detected, and the bacterial communities were dominated by Proteobacteria, Cyanobacteria, Actinobacteria, Firmicutes and Chloroflexi. These data can be used for predictive analysis to gain a better understanding of the dynamics of bacterial communities in environments under pressure from fish farming.


© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license.

Value of the Data
• These data are the first generated using 16S rRNA genes from bacterial communities in lake environments pressured by fish farming in the Peruvian Andes.
• These metagenomic data may be useful to other researchers to expand molecular studies and compare the composition of bacterial communities under different environmental and anthropogenic factors.
• These data can be used for predictive analysis to gain a better understanding of the dynamics of bacterial communities in environments under pressure from fish farming.

Study area
The study was conducted in the Pomacocha, Habascocha, Tipicocha and Tranca Grande lagoons of glacial origin, located in the Central Andes of Peru, in the upper basin of the Perene River, at an altitude between 4310 and 4330 m.a.s.l. [3]. The four lagoons are used for intensive farming of Oncorhynchus mykiss (rainbow trout) in large floating cages (Fig. 1).

Analytical data
The metagenomic data presented in this manuscript provide information on the bacterial communities of lagoon sediments intended for the cultivation of Oncorhynchus mykiss in the Central Andes of Peru. The bacterial taxonomic composition generated through sequencing of the 16S rRNA amplicon using the standard next-generation Illumina MiSeq protocol is shown in Fig. 2. Analysis of the final reads revealed the Bacteria and Archaea domains. The reads revealed 33 phyla, 64 classes and 127 orders in the Habascocha lagoon; 30 phyla, 61 classes and 120 orders in the Pomacocha lagoon; 34 phyla, 61 classes and 130 orders in the Tipicocha lagoon; and 31 phyla, 55 classes and 127 orders in the Tranca Grande lagoon. The reads also revealed 276 bacterial families across the four lagoons. However, between 10% and 14% of the total reads could not be classified. Table 1 shows the abundance of bacteria, by phylum, in surface sediments of lagoons with fish-farming potential in the Central Andes of Peru, obtained through high-throughput sequencing. Table 2 shows the mean abundance and percentage contribution of bacterial phyla to the differentiation or similarity between groups, according to the SIMPER analysis. The phylum Actinobacteria presented the highest percentage contribution to the bacterial communities (29.20%), followed by Cyanobacteria (16.11%) and Proteobacteria (14.66%). The grouping of bacterial orders by SIMPROF analysis identified five statistically different groups in relation to sample number and sampling site (Fig. 3). The distribution of bacterial families in surface sediments of the lagoons at a 70% contribution cut-off in the SIMPER analysis is shown in Fig. 4.

Sediment sampling
Surface sediment samples (10 cm) were collected from four inland-water fish (Oncorhynchus mykiss) culture lagoons in November 2019. Sediment samples from each lagoon were placed in airtight plastic bags and transported on ice to the Universidad Nacional de Tumbes laboratory for analysis [4].

Bioinformatic analysis of sequence readings
The FASTQ files were first inspected with FastQC v0.11.9 to determine read length, base quality and the percentage of each nucleotide base. Quality filtering and removal of primer and adapter regions from the reads were then performed with Trimmomatic v0.39 [9], using a minimum quality threshold of Q30 and discarding reads shorter than 30 bp. Each sample yielded more than 150,000 reads, with a read length of 251 nucleotides and a quality value greater than 30 for each sequenced base. The taxonomic analysis was performed using the program [10], based on the database minikraken_20171019_4GB. This program also provides multiple scripts for circular representation. Finally, operational taxonomic units were identified and their abundances calculated [11, 12].
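
The quality filtering step can be illustrated with a minimal Python sketch. It is not Trimmomatic's algorithm and is not the authors' pipeline: it only shows, under simplifying assumptions, what a Q30 threshold and a 30 bp minimum length mean for a single read. The function names, the Phred+33 encoding and the end-trimming strategy are assumptions made for illustration.

```python
from typing import Iterator, List, Optional, Tuple

# A read is (header, sequence, quality string); this layout is an assumption.
Read = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].strip(), lines[i + 1].strip(), lines[i + 3].strip()


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string, assuming Phred+33 encoding."""
    return [ord(char) - offset for char in quality]


def trim_and_filter(read: Read, min_q: int = 30, min_len: int = 30) -> Optional[Read]:
    """Trim bases below min_q from both ends; drop the read if too short."""
    header, seq, qual = read
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:  # trim the 3' end
        end -= 1
    start = 0
    while start < end and scores[start] < min_q:  # trim the 5' end
        start += 1
    if end - start < min_len:
        return None  # read is shorter than the minimum length after trimming
    return header, seq[start:end], qual[start:end]
```

In this sketch a read is kept only if at least 30 bases remain after low-quality ends are removed; real trimming tools typically use sliding windows and adapter matching, which are not reproduced here.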

Statistical analysis
Similarity percentage analysis (SIMPER) was performed to calculate the relative contribution of each taxon to the overall average dissimilarity observed between two or more groups of taxonomic assemblages. The groups were defined on the basis of a preliminary similarity profile clustering analysis (SIMPROF) of the same taxonomic occurrence data set [13]. The SIMPROF analysis made it possible to test for multivariate structure within groups of samples. Square-root transformed abundances were used to calculate Bray-Curtis similarities [14], and patterns between samples were identified from significant similarity measurements (p < 0.05) using clustering and ordination [15]. These analyses were performed in PRIMER v7.
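
As a rough illustration of the quantities involved, the following Python sketch computes Bray-Curtis similarity on square-root transformed abundances and a SIMPER-style percentage contribution per taxon between two groups. It follows the standard textbook formulas and is not the PRIMER implementation; the SIMPROF permutation test is not covered. All function names and the abundance values in the usage note are hypothetical.

```python
from itertools import product
from math import sqrt
from typing import Dict, List, Sequence


def sqrt_transform(sample: Sequence[float]) -> List[float]:
    """Square-root transform raw abundances to down-weight dominant taxa."""
    return [sqrt(x) for x in sample]


def bray_curtis_dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences divided by the sum of all abundances."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


def bray_curtis_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity on a 0-100 scale."""
    return 100.0 * (1.0 - bray_curtis_dissimilarity(a, b))


def percent_contributions(
    group_a: List[Sequence[float]],
    group_b: List[Sequence[float]],
    taxa: List[str],
) -> Dict[str, float]:
    """Percentage contribution of each taxon to between-group dissimilarity."""
    totals = {taxon: 0.0 for taxon in taxa}
    for raw_a, raw_b in product(group_a, group_b):  # every between-group pair
        a, b = sqrt_transform(raw_a), sqrt_transform(raw_b)
        denominator = sum(a) + sum(b)
        if denominator == 0:
            continue  # two empty samples carry no information
        for taxon, x, y in zip(taxa, a, b):
            totals[taxon] += abs(x - y) / denominator
    # Dividing by the number of pairs would cancel out in the percentages.
    overall = sum(totals.values())
    if overall == 0:
        return {taxon: 0.0 for taxon in taxa}
    return {taxon: 100.0 * value / overall for taxon, value in totals.items()}
```

For example, with made-up counts `[16, 4, 0]` and `[4, 4, 0]` for three taxa, the square-root transformed profiles are `[4, 2, 0]` and `[2, 2, 0]`, so the dissimilarity is 2/10 = 0.2 and the first taxon accounts for the entire difference.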

Nucleotide sequence accession numbers
The 16S rRNA gene sequences reported in this study were deposited in the GenBank database under accession number PRJNA657251 (https://www.ncbi.nlm.nih.gov/sra/PRJNA657251).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.