Comparing microbial community compositions of biogas and sewage treatment plants by analyzing 16S rRNA gene data

This article contains data of bacterial and archaeal community compositions and process parameters of 16 biogas (BPs) and ten sewage treatment plants (STPs). Nucleic acids of BPs were extracted using a CTAB-based method while a phenol-chloroform-based method was applied for STPs. Amplicon sequencing data of the 16S rRNA gene was achieved by MiSeq-technology. Raw data were processed and statistically analyzed by several approaches including network analyses by using molecular ecological network analysis (MENA). Nodes of each network (BPs and STPs) were classified as generalists and peripherals to identify key players for a stable biogas production. Network parameters as well as interaction of generalistic species were compared between both plant types.


How data was acquired
Nucleic acid extraction of samples was followed by 16S rRNA gene analysis using MiSeq-technology. Raw data were processed and network analysis using Molecular Ecological network analysis (MENA) was applied.

Data format
Raw, processed, filtered, analyzed Experimental factors 16 biogas plants and ten sewage treatment plants were sampled in August 2016. Important process parameters of each plant were recorded.

Experimental features
DNA was extracted using plant-specific methods. Sequencing of DNA extracts was done using MiSeq-technology. Data were processed and analyzed using network analysis and other methods.

Data source location
All plants are located in Northern Bavaria (Germany).

Data accessibility
All tables and figures are within this article and supplementary material except for merged sequence reads. They are stored in the SRA of NCBI (ID: SRP126305) and are accessible under following URL: https://www.ncbi.nlm.nih.gov/sra/SRP126305

Value of the data
Surveys of bacterial and archaeal community compositions of many full-scale anaerobic digestion systems are still lacking. This issue is tackled in this article.
Process parameters of at least ten plants of each plant type were documented, enabling a solid comparison of process parameters and bacterial and archaeal community compositions.
Primers (515YF and 806R) are common, which ease data comparisons of this article with other data sets using similar methods.
Data include potential new bacterial as well as archaeal interaction patterns that could be further investigated in co-culturing experiments or other experimental set-ups.

Data
Paired-end sequence reads were produced by PCR-generated amplicons of the V4 variable regions of the 16S rRNA gene on an Illumina MiSeq. Reads were processed and finally merged by a minimum overlap of 15 base pairs and OTUs were clustered using an identity threshold of 99%.
Filtered, absolute abundances of OTUs were thereafter used for calculating a variety of diversity indices, a Bray-Curtis similarity heatmap and network analyses.

Sampling and DNA-extraction
16 biogas plants (BPs) and digestions tanks of 10 sewage treatment plants (STPs) were sampled in August 2016. Prior to sampling, pipelines were flushed and sludge was discarded. Afterwards, sludge was mixed thoroughly and samples were shock-frozen in liquid nitrogen and stored at À 80°C. Available plant parameters were collected from the plant operators (Tables 1 and 2). Different methods for DNA-extraction were tested. Nucleic acids from sludge of BPs and of STPs was finally extracted according to a modified CTAB-based method [1] and a modified phenol-chloroform-based method [2], respectively. Detailed description of each method can be found elsewhere [3].

Generating and processing the raw data of 16S rRNA gene sequences
Nucleic acid extracts obtained from each STP and BP plant were amplified by a two-Step PCR using the primer set 515F and 806R (details see [3]) and sequence libraries were created by v2 500 cycles kit (Illumina, San Diego, CA, USA) according to manufacturer's instructions. Sequencing of the libraries was carried out by 300-bp paired-end sequencing on an Illumina MiSeq system. Raw data were demultiplexed, quality filtered and adaptor trimmed using the Illumina real time analysis software (Illumina, San Diego, CA, USA). Quality of the reads was checked with the FastQC software, version 0.11.5 [4]. Adaptors were trimmed using the software cutadapt v.1.9.2.dev0 [5]. Reads that could not be trimmed were discarded. Merging of the trimmed forward and reverse reads, considering a minimum overlap of 15 bases, was done by using USEARCH version 8.1.1861 [6]. Clustering of OTUs with a 99% similarity level, discarding singletons and chimeras, was performed using USEARCH. Taxonomic assignment of OTUs was done with an identity threshold of 0.7 against the Greengenes v13.8 database [7]. Finally, an OTU table of absolute abundances of each OTU for each plant was generated. All steps mentioned above were performed by Microsynth AG (Balgach, Switzerland). Processed, merged reads were stored in the Short Read Archive of the NCBI (ID: SRP126305) and are accessible under following URL: https://www.ncbi.nlm.nih.gov/sra/SRP126305.

Data analyses of processed 16S rRNA gene data sets
Total number of sequence reads, number of bacterial and archaeal sequence reads as well as archaeal proportions of reads were summarized (Table 3). Generated OTU table was used as input matrix to generate a Bray-Curtis similarity matrix using function vegdist of R-package vegan v.2.4-3 [8]. Resulting matrix was visualized using Origin 2017 (OriginLab Corporation, Northampton, MA, USA) enabling a comparison of both bacterial and archaeal community composition between biogas and sewage treatment plants (Fig. 1). Based on the results of Bray-Curtis similarity heatmap, both plant types (BPs and STPs) were thereafter separately analyzed. Relative abundances of each OTU were calculated for each plant. OTUs below a minimum relative abundance of 0.1% in at least one plant of respective plant type were discarded. To avoid loss of too many archaeal OTUs, total relative abundance of bacterial and archaeal sequence reads were separated and each data set was set to 100%. Relative abundances of remaining OTUs were re-normalized to 100% and filtered OTU tables for both groups were generated (Supplementary Tables 5 and 6). Absolute abundances of remaining OTUs were then used for calculating alpha diversity measurements (OTU richness, Shannon, Simpson (1-D)) and OTU richness estimators (Chao1, abundance based coverage estimator (ACE)) using R-package vegan v.2.4-3 [8] (Table 4). Significance between mean values of both plant types was calculated using Mann-Whitney U test. Additionally, filtered OTU tables were summarized up to family level (Supplementary Tables 7 and 8) for a less complex overview of community compositions.

Network analyses
Network analysis was carried out separately for each plant group. Filtered relative abundances were used as an input for Molecular Ecological Network Analysis Pipeline [9] (accessible under http://ieg4.rccc.ou.edu/ mena/). Network was created based on Spearman similarity matrices. For comparable similarity thresholds of the Spearman similarity matrices, majority for BPs was set to four, while majority of STPs was chosen to be five. Modules were calculated by using fast greedy modularity optimization as described elsewhere [9]. Calculated networks (Fig. 2) were visualized with Cytoscape-software version 3.4.0 [10]. According to Maslov-Sneppen procedure [11], 100 random networks for each of the both networks were calculated for statistical assessment. Topological roles of each node were assigned calculating P i -(among-module connectivity) and z-values (within-module connectivity) following the simplified version [12] of the original Table 4 Diversity indices for BPs and STPs of OTUs with a relative abundance Z 0.1%. The significance thresholds were defined as p r 0.001 (**) and p r 0.0001 (***).

Plant
OTU richness*** Shannon*** Simpson (1-D)** Chao1*** ACE*** definition [13]. Therefore, Peripheral nodes (specialists) were defined by z r 2.5 and P i r 0.62, connectors by z r 2.5 and P i 4 0.62, module hubs by z 4 2.5 and P i r 0.62 and network hubs by z 4 2.5 and P i 4 0.62 (columns G-I of Supplementary Tables 5 and 6). Based on their putative metabolic abilities each generalistic OTU was assigned to one or more steps of anaerobic degradation. Considering that generalists are more important for network functionality than specialists [9], the focus was laid on interactions between generalistic OTUs. To avoid misinterpretations, role during anaerobic degradation of OTUs with insufficient taxonomic assignment was classified as unknown. Finally, a matrix was generated indicating the number of OTU interactions in function of their contribution in anaerobic degradation (Supplementary Table 9).