Data on metagenomic profiles of activated sludge from a full-scale wastewater treatment plant

The data in this article mainly present the sequences of activated sludge from a full-scale municipal wastewater treatment plant (WWTP) carrying out simultaneous nitrogen and phosphorus removal in Beijing, China. Data include the operational conditions and performance, dominant microbes and taxonomic analysis in this WWTP, and function annotation results based on the SEED, Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Sequencing data were generated on an Illumina HiSeq 2000 platform according to the recommendations of the manufacturer. The sequencing data have been deposited in the MG-RAST server (project ID: mgm4735473.3). For more information, see “Unraveling microbial structure and diversity of activated sludge in a full-scale simultaneous nitrogen and phosphorus removal plant using metagenomic sequencing” by Guo et al. (2017) [1].


Value of the data
Data will be useful for investigating microbial community structure in wastewater treatment plants carrying out simultaneous nitrogen and phosphorus removal.
Data can be used to predict possible nitrogen conversion pathways in biological systems for nitrogen removal from wastewater.
Sequencing data can be used to identify core microbes by comparing to similar data sets generated for simultaneous nitrogen and phosphorus removal plants with different treatment processes.
Accessibility of the metagenomic sequence data allows researchers to perform new analyses for their own research purposes.

Data
Data on microbial community and functional profiles within activated sludge from a full-scale municipal wastewater treatment plant (WWTP) carrying out simultaneous nitrogen and phosphorus removal (SNPR) are presented [1]. Data include the operational conditions and performance of this WWTP (Table 1), dominant microbes and taxonomic analysis (Table 2 and

Sampling of activated sludge
A 50 mL sample of activated sludge was taken with a plastic dipper from an aeration tank of a full-scale WWTP in Beijing, China. This WWTP treats a mean influent flow of 1 × 10⁶ m³/day. The preliminary wastewater treatment consists of bar screens, aerated grit chambers and primary sedimentation. The plant has an Anaerobic-Anoxic-Oxic (A2O) configuration, in which nitrification, denitrification and biological phosphorus removal are achieved simultaneously. The hydraulic retention time is around 6-8 h and the solids retention time is 10-15 days. The excess sludge from the biological treatment settles in the secondary clarifiers and enters the sludge treatment together. The sludge treatment consists of thickening tanks, anaerobic mesophilic digestion and dewatering.

Table 1 Operational conditions and pollutant removal performance of the full-scale WWTP (data collected during the 6 months prior to sampling).
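As a rough consistency check on the operating figures above, the reactor volume implied by the stated flow and hydraulic retention time can be estimated from V = Q × HRT. The sketch below is not part of the original study: it assumes that the reported HRT applies to the whole biological reactor at the mean influent flow, and the function name is purely illustrative.

```python
def implied_reactor_volume_m3(flow_m3_per_day: float, hrt_hours: float) -> float:
    """Volume implied by V = Q * HRT, with HRT converted from hours to days."""
    return flow_m3_per_day * hrt_hours / 24.0


# Values stated in the text: mean influent flow and the HRT range.
MEAN_FLOW_M3_PER_DAY = 1e6
HRT_RANGE_HOURS = (6.0, 8.0)

if __name__ == "__main__":
    for hrt in HRT_RANGE_HOURS:
        volume = implied_reactor_volume_m3(MEAN_FLOW_M3_PER_DAY, hrt)
        print(f"HRT = {hrt:.0f} h -> implied volume of about {volume:,.0f} m^3")
```

Under this assumption the stated figures correspond to a total reactor volume on the order of 250,000 to 333,000 m³. This is an inferred estimate, not a value reported in the data set.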

DNA extraction
Briefly, a 2 mL sample was centrifuged at 4000 rpm for 5 min at 4 °C and the sludge pellet was collected. DNA was extracted using the FastDNA SPIN Kit for Soil (QBIOgene, Carlsbad, CA, USA) according to the manufacturer's instructions. DNA integrity was assessed by gel electrophoresis (1% agarose), and DNA concentrations were measured using a Qubit Fluorometer (Thermo, USA).

DNA library construction and sequencing
Metagenomic sequencing was performed on an Illumina HiSeq 2000 platform. For library construction, the extracted DNA sample was processed according to the Paired-end Genomic DNA Sample Prep Kit protocol (Illumina) to generate 2×100 bp paired-end reads. Briefly, DNA was fragmented using a Covaris S2 Ultrasonicator, and the fragments were then subjected to end-repair, A-tailing, and adapter ligation. After DNA size-selection, PCR amplification and amplicon purification, a ~170 bp DNA fragment library was constructed for sequencing. The base-calling pipeline (version Illumina Pipeline-0.3) was used to generate sequences. In this study, 4.5 Gb of reads were generated for the metagenomic dataset. Quality filtering was performed as described previously [3] by removing raw reads that contained more than 3 ambiguous nucleotides, were shorter than 35 bp, had more than 15 bp overlap with adapter sequences, included more than 36 nucleotides with a quality value lower than 20, or were potential duplicates arising from amplification artifacts. After quality filtering, more than 4.0 Gb of high-quality reads were assembled into contigs using the SOAPdenovo assembler (v 1.05, with parameters -p 8 -F -M 3 -D 1 -L 90 -u) [4]. The detailed pipeline for bioinformatic analyses can be found in our study [1].
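The read-level filtering criteria listed above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study (which follows [3]); it only encodes the stated thresholds. Several details are assumptions made for illustration: ambiguous nucleotides are taken to be `N` bases, adapter overlap is measured as the longest contiguous stretch shared between read and adapter, duplicates are detected as exact sequence matches on single reads rather than read pairs, and all function and constant names are hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# Thresholds taken from the filtering criteria stated in the text.
MAX_AMBIGUOUS = 3         # more than 3 ambiguous bases -> remove
MIN_LENGTH = 35           # shorter than 35 bp -> remove
MAX_ADAPTER_OVERLAP = 15  # more than 15 bp shared with an adapter -> remove
MAX_LOW_QUALITY = 36      # more than 36 bases below the quality cutoff -> remove
MIN_QUALITY = 20          # quality value cutoff

Read = Tuple[str, List[int]]  # (sequence, per-base quality values)


def longest_shared_run(read: str, adapter: str) -> int:
    """Length of the longest contiguous substring common to read and adapter."""
    best = 0
    prev = [0] * (len(adapter) + 1)
    for base in read:
        curr = [0] * (len(adapter) + 1)
        for j, adapter_base in enumerate(adapter, start=1):
            if base == adapter_base:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def passes_filter(seq: str, quals: List[int], adapter: str) -> bool:
    """Apply the per-read criteria (everything except duplicate removal)."""
    seq = seq.upper()
    if seq.count("N") > MAX_AMBIGUOUS:  # assumption: ambiguous means 'N'
        return False
    if len(seq) < MIN_LENGTH:
        return False
    if longest_shared_run(seq, adapter.upper()) > MAX_ADAPTER_OVERLAP:
        return False
    if sum(q < MIN_QUALITY for q in quals) > MAX_LOW_QUALITY:
        return False
    return True


def filter_reads(reads: Iterable[Read], adapter: str) -> Iterator[Read]:
    """Yield reads that pass all criteria, keeping the first copy of duplicates."""
    seen: Set[str] = set()
    for seq, quals in reads:
        if not passes_filter(seq, quals, adapter):
            continue
        key = seq.upper()
        if key in seen:  # assumption: duplicate means identical sequence
            continue
        seen.add(key)
        yield seq, quals
```

A call such as `list(filter_reads(reads, adapter))` would return the retained reads, where `reads` is an iterable of `(sequence, qualities)` tuples and `adapter` is the adapter sequence used for the library. Production tools handle paired reads and adapter alignment more carefully than this sketch does.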