Shotgun metagenomic data of microbiomes on plastic fabrics exposed to harsh tropical environments

The development of more affordable high-throughput DNA sequencing technologies and powerful bioinformatics is making shotgun metagenomics a common tool for effective characterization of microbiomes and robust functional genomics. A shotgun metagenomic approach was applied to characterize microbial communities associated with plasticized fabric materials exposed to a harsh tropical environment for 14 months. High-throughput sequencing of TruSeq paired-end libraries was conducted using a whole-genome shotgun (WGS) approach on an Illumina HiSeq2000 platform generating 100 bp reads. A multifaceted bioinformatics pipeline was developed and applied to conduct quality control and trimming of raw reads, microbial classification, assembly of multi-microbial genomes, binning of assembled contigs to individual genomes, and prediction of microbial genes and proteins. The bioinformatic analysis of the large 161 Gb sequence dataset generated 3,314,688 contigs and 120 microbial genomes. The raw metagenomic data and the detailed description of the bioinformatics pipeline applied in data analysis provide an important resource for the genomic characterization of microbial communities associated with biodegraded plastic fabric materials. The raw shotgun metagenomic sequence data of microbial communities on plastic fabric materials have been deposited in MG-RAST (https://www.mg-rast.org/) under accession numbers mgm4794685.3–mgm4794690.3. The datasets and raw data presented here are associated with the main research work “Metagenomic characterization of microbial communities on plasticized fabric materials exposed to harsh tropical environments” (Radwan et al., 2020).


Published by Elsevier Inc. This is an open access article under the CC BY license.

Value of the Data
• Raw metagenomic data of microbial communities can serve as a source of genomic information on the structure and composition of microbial communities associated with biodegraded plastic fabric materials.
• Draft genomes identified from the dataset can be used to understand the underlying mechanisms by which microorganisms biodegrade plastics, and may help in the development of biodegradation-resistant materials and new plastic bioremediation approaches.
• These metagenomic data are valuable genomic sources for comparative metagenomics and can be used as a reference by other research teams interested in better understanding the pathways and mechanisms involved in biodeterioration of plastic materials.
• Functional annotation of sequenced reads from the six different plastic fabric materials will help in elucidating the true composition and behavior of the complex microbiomes associated with environmentally exposed fabrics.

Data Description
The datasets presented in this article are the raw sequences of paired-end reads, 100 bp in length, generated on an Illumina HiSeq2000 platform. Shotgun metagenomics of six plastic fabric materials exposed to a harsh tropical environment produced 1.61 billion raw reads, for a total of 161 Gb of 100 bp sequences [1]. The data files in FASTQ format were deposited in MG-RAST (https://www.mg-rast.org/) and can be retrieved using accession numbers mgm4794685.3–mgm4794690.3.

In this article, Fig. 1 provides a summary of the in-house pipeline that was established for bioinformatics analysis of the metagenomic data. Table 1 contains a summary of raw reads, trimmed reads and total sequences (bp) from each sample. Table 1 also presents the number of sequences after trimming and the percentage of surviving reads compared with the raw reads. Surviving paired-end reads are the reads that remain after the trimming procedure is applied. Table 2 summarizes the assembled genomic contigs generated by the MEGAHIT assembler program using trimmed sequences from the six fabrics. The sum (Mb), number of contigs > 500 bp, L50, N50 and the longest contig from each sample are presented in Table 2. N50 is the length of the shortest contig in the set of largest contigs that together cover at least 50% of the total assembly, while L50 is the number of contigs in that set.

Table 1. Summary of raw reads, trimmed reads, and total sequence reads (bp) from each sample, together with the percentage of paired-end reads surviving the trimming procedure.
Table footnotes: * The longest contig (bp) in each sample. ** Samples are listed in descending order of their sum (Mb).

Fig. 2 shows an overall summary of microbial distribution and taxon paths in one of the six plastic fabric materials. The data shown in Fig. 2 were generated by KAIJU [2], a rapid and sensitive bioinformatic pipeline for taxonomic classification of short predicted proteins from metagenomic reads. Tables 3–8 summarize the results of the MaxBin analysis, which provided the different microbial genomes in each of the six plastic fabrics that were exposed to a harsh tropical environment for 14 months. Those microbial genomes were initially classified as different species of algae, black yeast, fungi and bacteria using KAIJU [2]. Tables 3–8 also show the genome size (Mb), GC content, classification and genome identification for each identified microbial genome. Fig. 3 shows the percent completeness and contamination of each microbial genome presented in Tables 3–8, calculated using the CheckM bioinformatic program [3]. A functional annotation summary of proteins predicted with MG-RAST from the sequenced reads is presented in Table 9. Table 9 shows twenty-eight functional categories, with Carbohydrates; Amino Acids and Derivatives; Protein Metabolism; Cofactors, Vitamins, Prosthetic Groups, Pigments; and Respiration being the foremost categories in the sequences from the six plastic fabric samples.
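The N50 and L50 assembly statistics reported in Table 2 can be illustrated with a minimal Python sketch. It assumes the widely used convention in which N50 is a contig length and L50 is a contig count; the function name `n50_l50` and the example contig lengths are hypothetical and are not taken from the dataset.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    N50: length of the shortest contig among the largest contigs that
         together cover at least half of the total assembly length.
    L50: number of contigs in that set.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count


# Made-up contig lengths (bp), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(example_lengths)
print(f"N50 = {n50} bp, L50 = {l50} contigs")
```

In practice the lengths would be read from the assembler's contig FASTA file, optionally after dropping contigs below a length cutoff such as the 500 bp threshold used in Table 2.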

Samples and exposure environments
The U.S. Army Research, Development and Engineering Center (Natick, MA) provided six plastic fabric materials after 14 months of exposure to a harsh tropical environment in the Republic of Panama [1]. The plastic fabric samples were used for DNA extraction, library preparation of genomic DNA, high-throughput sequencing, and bioinformatic analysis.

Library preparation and DNA sequencing for metagenomic study
DNA from the six plastic fabric materials was extracted with the Qiagen DNeasy UltraClean Microbial extraction kit (Cat# 12224-250), and then used for library preparation and DNA sequencing. A total of 300 ng of DNA from each fabric sample was used to prepare the genomic library using the PrepX DNA Library kit and the Apollo 324 NGS automatic library prep system (WaferGen, Fremont, CA). High-throughput sequencing of TruSeq paired-end libraries was conducted using a whole-genome shotgun (WGS) approach on an Illumina HiSeq2000 platform generating 100 bp reads. A TruSeq SBS kit v3 for 2 × 101 cycles of Incorporation Reagent (ICR) was used for read sequencing (Illumina, Inc., San Diego, CA).

Bioinformatics analysis for metagenomics study
An in-house multifaceted bioinformatics pipeline (Fig. 1) was established for the stepwise processing of sequence data required for completion of the metagenomic study. Quality control of raw reads was performed with Trimmomatic version 0.36 [4], which trimmed low-quality bases and removed short reads from the raw reads. Trimmed reads were processed with the BBTools (https://jgi.doe.gov/data-and-tools/bbtools/) script "bbnorm.sh" to normalize the paired-end reads and keep the pairs compatible before assembly into contigs with the MEGAHIT assembler program [5]. Bowtie2 [6] was employed for mapping raw reads to the contigs produced by MEGAHIT, and the BAM file from each fabric sample was used for generating the coverage matrix and abundance files. Binning of individual genomes in each fabric sample was performed with the MaxBin program [7] using the abundance file and the FASTA contigs generated by MEGAHIT.
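The order of the steps described above can be sketched as a small Python helper that only builds the command lines, without running them. This is an illustration under stated assumptions, not the authors' actual pipeline: the function name, file layout, thread count, and the Trimmomatic settings (`SLIDINGWINDOW:4:15`, `MINLEN:36`) are placeholders, because the article does not report the parameters that were used. The conversion of Bowtie2 alignments into the abundance file is also not shown, so the abundance file path is a hypothetical input.

```python
from pathlib import PurePosixPath


def build_pipeline_commands(sample, r1, r2, workdir="work", threads=8):
    """Return the ordered command lines for one fabric sample.

    All paths and parameter values are illustrative assumptions.
    """
    w = PurePosixPath(workdir) / sample
    # Trimmomatic PE writes paired (P) and unpaired (U) outputs per mate.
    trimmed = {k: str(w / f"{sample}_{k}.fastq.gz") for k in ("1P", "1U", "2P", "2U")}
    norm1 = str(w / f"{sample}_norm_1.fastq.gz")
    norm2 = str(w / f"{sample}_norm_2.fastq.gz")
    assembly_dir = w / "megahit"
    contigs = str(assembly_dir / "final.contigs.fa")  # MEGAHIT's contig file
    index = str(w / "contigs_index")
    abundance = str(w / f"{sample}_abundance.txt")  # hypothetical, made elsewhere

    return [
        # 1. Quality control and trimming of raw reads.
        ["java", "-jar", "trimmomatic-0.36.jar", "PE", "-threads", str(threads),
         r1, r2, trimmed["1P"], trimmed["1U"], trimmed["2P"], trimmed["2U"],
         "SLIDINGWINDOW:4:15", "MINLEN:36"],
        # 2. Normalization of the surviving read pairs.
        ["bbnorm.sh", f"in={trimmed['1P']}", f"in2={trimmed['2P']}",
         f"out={norm1}", f"out2={norm2}"],
        # 3. Assembly into contigs.
        ["megahit", "-1", norm1, "-2", norm2, "-o", str(assembly_dir),
         "-t", str(threads)],
        # 4. Mapping raw reads back to the contigs.
        ["bowtie2-build", contigs, index],
        ["bowtie2", "-p", str(threads), "-x", index, "-1", r1, "-2", r2,
         "-S", str(w / f"{sample}.sam")],
        # 5. Binning contigs into individual genomes.
        ["run_MaxBin.pl", "-contig", contigs, "-abund", abundance,
         "-out", str(w / "bins" / sample), "-thread", str(threads)],
    ]


for cmd in build_pipeline_commands("fabric1", "fabric1_R1.fastq.gz", "fabric1_R2.fastq.gz"):
    print(" ".join(cmd))
```

Keeping the commands as lists makes it straightforward to pass each one to `subprocess.run` once the tools are installed and the placeholder paths are replaced with real ones.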

Functional annotation of metagenomic reads and genome identification
The functional annotation of metagenomic reads of each fabric sample exposed to the tropical environment was extracted from the MG-RAST analysis (https://www.mg-rast.org/mgmain.html?mgpage=project&project=mgp85570). Both the RNAmmer [8] and CheckM [3] programs were used for ribosomal RNA identification in each binned genome generated by the MaxBin program. KAIJU [2], a fast and sensitive bioinformatic pipeline, was used for taxonomic classification of predicted proteins from metagenomic reads. Additionally, CheckM was used to assess the completeness and contamination of the microbial genomes generated by MaxBin.
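Completeness and contamination estimates of the kind reported by CheckM are often used to sort binned genomes into quality tiers. The following Python sketch shows one way to do this. The thresholds are assumptions modeled on commonly used draft-genome cutoffs; the article does not state that any cutoff was applied, and the function name and example values are hypothetical.

```python
def bin_quality(completeness, contamination):
    """Assign an illustrative quality tier to one binned genome.

    completeness and contamination are percentages (0-100).
    Thresholds are assumptions, not values from the article.
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


# Made-up (bin name, completeness %, contamination %) records.
example_bins = [
    ("bin.001", 95.2, 1.3),
    ("bin.002", 60.0, 8.0),
    ("bin.003", 40.0, 2.0),
]
for name, completeness, contamination in example_bins:
    print(name, bin_quality(completeness, contamination))
```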

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.