Metagenomes and metagenome-assembled genomes from a sequentially fed anaerobic digester treating solid organic municipal waste

ABSTRACT We present a data set of four metagenomes and 281 metagenome-assembled genomes describing the microbial community of a laboratory-scale high solids anaerobic digester. Our objective was to obtain information on the coding potential of the microbial community and draft genomes of the most abundant organisms in the digester.

1). No DNA shearing was performed on this sample because the DNA size distribution was already close to the desired 12-15 kb range.
Table 1 footnotes b to e:
b The IMG taxon ID for S79W4 and S80W4 refers to the co-assembly of Illumina and PacBio reads, while the taxon ID for S62W2 and S80W1 refers to the assembly of the Illumina reads.
c Contigs > 500 bp before trimming were submitted to GenBank.
d For PacBio, this refers to the number of reads, not read pairs.
e For PacBio, this refers to the N50 for reads.

The Illumina reads from each sample were quality trimmed and assembled separately with the Anvi'o v6.2 metagenomic Snakemake workflow (4). This pipeline uses Illumina-utilities (5) with the read quality control filtering developed by Minoche et al. (6), which involves B-tail trimming (removal of low-quality bases at the end of reads), removal of reads containing uncalled bases, and retention of reads only if at least two-thirds of the bases in the first half of the read have quality values of Q ≥ 30. The quality-trimmed reads were assembled using SPAdes v3.12.0 (metaSPAdes mode [7]). In addition, PacBio reads from S80W4 were combined with the Illumina reads from S79W4 in a hybrid assembly using SPAdes v3.14.1. Taxonomic profiling of the Illumina reads was done using phyloFlash v3.4, which assembles and classifies rRNA genes (8). The assemblies were binned using MetaBAT 2 v2.15 (9) and MaxBin 2 v2.2.7 (10). Default parameters were used for all software. All bins generated from all assemblies were compared and dereplicated using dRep v2.5.4 (11) with a minimum completeness of 50%, a maximum contamination of 25%, and an average nucleotide identity threshold of 99%. The taxonomy of the resulting metagenome-assembled genomes (MAGs) was obtained by comparison to the Genome Taxonomy Database (12) using GTDB-Tk v1.5.1 (13), while quality was assessed using Anvi'o v7.1 and CheckM v1.1.2 (14).
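As an illustration of the read-filtering criteria described above, the following minimal Python sketch applies the uncalled-base rule and the two-thirds rule to a single read. It is not the Illumina-utilities implementation: the function name `passes_quality_filter` and its arguments are hypothetical, quality values are assumed to be already decoded Phred scores, and B-tail trimming is not reproduced.

```python
from typing import Sequence


def passes_quality_filter(seq: str, quals: Sequence[int], min_q: int = 30) -> bool:
    """Return True if a read satisfies the two retention rules from the text.

    seq   -- base calls; 'N' is assumed to mark an uncalled base
    quals -- Phred quality scores, one per base (hypothetical input format)
    """
    # Reject empty reads or reads whose sequence and quality lengths disagree.
    if not seq or len(seq) != len(quals):
        return False

    # Rule 1: remove reads containing uncalled bases.
    if "N" in seq.upper():
        return False

    # Rule 2: at least two-thirds of the bases in the first half of the read
    # must have a quality value of Q >= min_q.
    first_half = quals[: len(quals) // 2]
    if not first_half:
        return False
    high_quality = sum(1 for q in first_half if q >= min_q)
    # Integer comparison avoids floating-point rounding at the 2/3 boundary.
    return 3 * high_quality >= 2 * len(first_half)
```

For an 8-base read, the first half contains four bases, so at least three of them must reach Q ≥ 30 for the read to be kept; the quality of the second half does not enter the rule.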
The sequencing information and assembly statistics are shown in Table 1. Binning and dereplication gave 281 MAGs, of which 254 were of medium quality (completeness > 50% and contamination < 10%); 166 of these had completeness ≥ 75% and contamination < 10%. The 10 most abundant phyla in the metagenomes, based on rRNA genes and MAGs, are shown in Fig. 1. Functional annotation of the metagenomes was done with JGI's IMG database (15). These annotations will be used to understand the higher-than-expected efficiency of this solid-state digester configuration.
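The MAG counts above follow from simple thresholds on completeness and contamination. The sketch below shows one way such a tally could be computed from a table of per-MAG estimates. The function name, the tuple layout, and the example values are hypothetical and do not come from the data set; only the thresholds are taken from the text.

```python
from typing import Dict, Iterable, Tuple

# (MAG name, completeness in %, contamination in %)
MagRecord = Tuple[str, float, float]


def summarize_mag_quality(mags: Iterable[MagRecord]) -> Dict[str, int]:
    """Count MAGs in the quality classes used in the text."""
    mags = list(mags)
    # Medium quality: completeness > 50% and contamination < 10%.
    medium = [m for m in mags if m[1] > 50.0 and m[2] < 10.0]
    # Subset of the medium-quality MAGs with completeness >= 75%.
    completeness_ge_75 = [m for m in medium if m[1] >= 75.0]
    return {
        "total": len(mags),
        "medium_quality": len(medium),
        "completeness_ge_75": len(completeness_ge_75),
    }


# Hypothetical example values, for illustration only.
example_mags = [
    ("MAG_001", 98.2, 1.5),
    ("MAG_002", 76.0, 9.9),
    ("MAG_003", 60.4, 3.0),
    ("MAG_004", 55.0, 12.0),  # fails the contamination threshold
    ("MAG_005", 50.0, 2.0),   # fails the strict completeness > 50% threshold
]

if __name__ == "__main__":
    print(summarize_mag_quality(example_mags))
```

Because the second class is defined as a subset of the first, the 166 MAGs reported above are counted within the 254 medium-quality MAGs rather than in addition to them.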

FIG 1 Relative abundance and taxonomic distribution of MAGs and rRNA genes obtained from the Illumina metagenome reads. The 10 most abundant phyla are shown. The 16S rRNA reads were obtained using the phyloFlash v3.4 pipeline (8) with classification to order level using the Ref NR99 database from SILVA v138 (16). The abundance of MAGs was obtained from read mapping using the Anvi'o v7.1 pipeline, and taxonomy was obtained from GTDB release 202 and converted to SILVA v138 format using the ar122_metadata_r202.xlsx and bac120_metadata_r202.xlsx files provided for download at GTDB.
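A phylum-level summary of the kind shown in the figure can be sketched as a sum of per-MAG abundances within each phylum, followed by ranking. The example below is an assumption about the general approach, not the procedure used by the authors: the function name, the input dictionaries, and the abundance values are hypothetical, and how Anvi'o derives abundance from read mapping is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def top_phyla(
    mag_abundance: Dict[str, float],
    mag_phylum: Dict[str, str],
    n: int = 10,
) -> List[Tuple[str, float]]:
    """Return the n most abundant phyla as (phylum, relative abundance) pairs.

    mag_abundance -- per-MAG abundance from read mapping (hypothetical units)
    mag_phylum    -- per-MAG phylum assignment; missing MAGs are 'Unclassified'
    """
    totals: Dict[str, float] = defaultdict(float)
    for mag, abundance in mag_abundance.items():
        totals[mag_phylum.get(mag, "Unclassified")] += abundance

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    # Rank phyla by summed abundance, largest first, and normalize to fractions.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(phylum, value / grand_total) for phylum, value in ranked[:n]]


# Hypothetical example values, for illustration only.
example_abundance = {"MAG_1": 6.0, "MAG_2": 2.0, "MAG_3": 2.0}
example_phylum = {"MAG_1": "Firmicutes", "MAG_2": "Bacteroidota", "MAG_3": "Firmicutes"}

if __name__ == "__main__":
    for phylum, fraction in top_phyla(example_abundance, example_phylum):
        print(f"{phylum}\t{fraction:.2f}")
```

Normalizing after summation keeps the reported fractions relative to all binned abundance, so phyla outside the top n still contribute to the denominator.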

TABLE 1
Accession numbers, sequencing information, and assembly statistics of metagenomes from four samples from Daisy the digester a

Sample ID | SRA, GenBank accession number | IMG taxon ID b | Reagent kit and sequencing platform | Number of read pairs | Assembly size (bp) c

a GenBank accession numbers, assembly size, and N50 are for the SPAdes assembly of Illumina reads.