Metagenomic 16S rDNA amplicon data on bacterial diversity profiling and its predicted metabolic functions of varillales in Allpahuayo-Mishana National Reserve

The white-sands forests, or varillales, of the Peruvian Amazon are characterized by distinct physical features, patchy distribution, and endemism [1, 2]. Much research has been conducted on the specialized plant and animal communities that inhabit these ecosystems, yet their soil microbiomes have not been studied. Here we provide metagenomic 16S rDNA amplicon data of soil microbiomes from three types of varillales in Allpahuayo-Mishana National Reserve near Iquitos, Peru. Composite soil samples were collected from very low varillal, high-dry varillal, and high-wet varillal. Purified metagenomic DNA was used to prepare and sequence 16S rDNA metagenomic libraries on the Illumina MiSeq platform. Raw paired-end sequences were analyzed using the Metagenomics RAST server (MG-RAST) and Parallel-Meta3 software, revealing a high percentage of undiscovered sequences that may indicate specialized bacterial communities in these forests. Several metabolic functions were also predicted from this dataset. The raw sequence data in FASTQ format are available in the public repository Mendeley Data (https://data.mendeley.com/datasets/syktzxcnp6/2) and at NCBI's Sequence Read Archive (SRA) under accession numbers SRX7891206 (very low varillal), SRX7891207 (high-dry varillal), and SRX7891208 (high-wet varillal).


© 2020 The Author(s).

Value of the data
• This is the first metagenomic 16S rDNA amplicon dataset on the bacterial profiles and predicted metabolic functions of varillales in Allpahuayo-Mishana National Reserve of the Peruvian Amazon.
• These data provide information on the bacterial diversity of these varillales and its predicted metabolic functions.
• The metagenomic 16S rDNA amplicon data revealed a high percentage of undiscovered sequences, which may indicate that varillales contain specialized bacterial communities.

Sample collection
In this dataset, soil samples were collected from varillales of Allpahuayo-Mishana National Reserve (Supplementary Fig. S1), which is located in lowland tropical rain forest of the Peruvian Amazon between 130 and 153 m.a.s.l. Soil samples were obtained from three types of varillales as classified by [1]: 1) very low varillal (3°57′54.293″S, 73°26′10.110″W), characterized by a high density of small trees (height < 5 m) and an organic soil horizon thicker than 11 cm; 2) high-dry varillal (3°58′33.185″S, 73°25′37.165″W), characterized by larger trees (height > 15 m) and an organic soil horizon ≤ 11 cm thick; and 3) high-wet varillal (3°58′21.535″S, 73°25′54.369″W), which also has larger trees (height > 15 m) but is distinguished by an organic soil horizon thicker than 11 cm. Samples were obtained in October 2018, during the high-water season. To obtain a representative sample of soil bacterial diversity, thirteen soil cores (10 cm in diameter and 10 cm deep) were collected in each varillal. The first soil core was designated the reference point for geographic coordinates. The remaining twelve cores were taken at five-meter intervals along each cardinal direction, three cores per direction. All thirteen cores from a given reference point were pooled, homogenized into one composite soil sample per varillal type, and passed through a 2 mm mesh sieve (Supplementary Fig. S2). The sieved soil samples were stored temporarily at −20 °C until further processing.
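The sampling layout can be sketched as follows. This is a minimal illustration under our own reading of the protocol: we assume the three cores per direction lie 5, 10, and 15 m from the reference core, and the helper names `core_layout` and `dms_to_decimal` are hypothetical. The coordinate helper shows how the reported degree/minute/second reference points map to the signed decimal degrees used by most GIS tools (south and west negative).

```python
"""Sketch of the composite-sampling layout (assumed geometry, see text above)."""

# Unit vectors (east, north) for each cardinal direction.
DIRECTIONS = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


def core_layout(spacing_m=5.0, cores_per_direction=3):
    """Return (label, east_m, north_m) offsets from the reference core.

    Assumes cores lie at spacing_m, 2 * spacing_m, ... along each direction.
    """
    layout = [("ref", 0.0, 0.0)]  # reference core = recorded geographic coordinate
    for name, (dx, dy) in DIRECTIONS.items():
        for k in range(1, cores_per_direction + 1):
            dist = k * spacing_m
            layout.append((f"{name}{k}", dx * dist, dy * dist))
    return layout


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W negative)."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


if __name__ == "__main__":
    cores = core_layout()
    print(len(cores), "cores pooled into one composite sample")
    # Reference core of the very low varillal: 3°57′54.293″S, 73°26′10.110″W
    lat = dms_to_decimal(3, 57, 54.293, "S")
    lon = dms_to_decimal(73, 26, 10.110, "W")
    print(f"{lat:.6f}, {lon:.6f}")
```

With one reference core plus three cores in each of four directions, the layout yields the thirteen cores pooled per varillal.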

Metagenomic DNA isolation
Metagenomic DNA was isolated from composite soil samples following the protocol of Devi et al. [3]. To remove humic and fulvic acid contaminants and exclude smaller fragments, the partially purified metagenomic DNA was subjected to agarose gel (0.6%) electrophoresis for 30 min at 100 V; DNA fragments > 20,000 bp were excised with a sterile scalpel, placed in 2 mL microtubes, and purified with the PureLink™ Quick Gel Extraction Kit (Invitrogen™, Catalog: K210012) following the manufacturer's instructions. The quality and quantity of the purified metagenomic DNA (approximately 10,000 bp in size) were verified by electrophoretic and spectrophotometric analysis using a NanoDrop 2000 (Thermo Scientific).
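For readers reproducing the spectrophotometric check, the sketch below shows the conventional arithmetic behind it: dsDNA concentration from absorbance at 260 nm (an A260 of 1.0 over a 1 cm path corresponds to roughly 50 ng/µL dsDNA) and the A260/A280 and A260/A230 purity ratios. The thresholds (1.8 for both ratios) and the function names are illustrative assumptions, not values reported for this dataset; the NanoDrop software computes these quantities itself.

```python
"""Conventional spectrophotometric DNA QC arithmetic (illustrative thresholds)."""

# Widely used conversion: A260 = 1.0 (1 cm path) corresponds to ~50 ng/uL dsDNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration in ng/uL from absorbance at 260 nm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratios(a260, a280, a230):
    """Return the (A260/A280, A260/A230) purity ratios."""
    return a260 / a280, a260 / a230


def passes_qc(a260, a280, a230, min_260_280=1.8, min_260_230=1.8):
    """True if both ratios reach the (assumed) minimum thresholds.

    A low A260/A230 is commonly read as carry-over of contaminants that absorb
    strongly in the low UV range, such as humic substances in soil extracts.
    """
    r280, r230 = purity_ratios(a260, a280, a230)
    return r280 >= min_260_280 and r230 >= min_260_230


if __name__ == "__main__":
    # Hypothetical absorbance readings, not measurements from this study.
    print(dsdna_concentration(0.9), purity_ratios(0.9, 0.48, 0.45), passes_qc(0.9, 0.48, 0.45))
```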

Library preparation and next-generation DNA sequencing
Amplicon libraries were prepared following the Illumina 16S Metagenomic Sequencing Library Preparation protocol (Part # 15044223 B). First, metagenomic DNA was amplified using primers targeting the V3 and V4 regions of the 16S rDNA [4]: 16S rDNA Amplicon PCR Forward Primer = 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3′ and 16S rDNA Amplicon PCR Reverse Primer = 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-3′. These locus-specific primers were synthesized with overhanging Illumina adapter sequences. A second PCR was performed to incorporate multiplexing indices and Illumina sequencing adapters. Amplicon libraries were then purified using 0.8x AMPure XP beads (Beckman Coulter), and their size was verified on a Bioanalyzer 2100 (Agilent Technologies) using an Agilent High Sensitivity DNA Kit. Libraries were quantified using the Qubit™ dsDNA HS Assay Kit (Thermo Fisher Scientific), normalized, pooled, and paired-end sequenced on the Illumina MiSeq platform.
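Each primer above is an Illumina overhang adapter followed by a degenerate 16S-binding sequence. The sketch below separates the two parts and counts how many concrete oligonucleotides each degenerate primer encodes, using standard IUPAC nucleotide codes. It assumes the standard Illumina overhang sequences used in the protocol cited above; the helper names are hypothetical.

```python
"""Split Illumina-tailed 16S primers and count their degenerate variants."""
from itertools import product

# IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Overhang adapter sequences of the Illumina 16S library preparation protocol.
FWD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# Full primers as listed in the text (5' -> 3').
FWD_PRIMER = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG"
REV_PRIMER = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC"


def locus_specific(primer, overhang):
    """Strip the adapter overhang and return the 16S-binding portion."""
    if not primer.startswith(overhang):
        raise ValueError("primer does not start with the expected overhang")
    return primer[len(overhang):]


def degeneracy(seq):
    """Number of distinct oligonucleotides encoded by a degenerate sequence."""
    count = 1
    for base in seq:
        count *= len(IUPAC[base])
    return count


def expand(seq):
    """List every concrete sequence encoded by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


if __name__ == "__main__":
    for name, primer, overhang in [("forward", FWD_PRIMER, FWD_OVERHANG),
                                   ("reverse", REV_PRIMER, REV_OVERHANG)]:
        core = locus_specific(primer, overhang)
        print(name, core, degeneracy(core), "variants")
```

The forward locus-specific sequence carries one N and one W (4 × 2 = 8 variants); the reverse carries one H and one V (3 × 3 = 9 variants).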

Sequence analysis
Raw paired-end sequences were uploaded as FASTQ files and analyzed using the MG-RAST server v4.0.3 [5][6][7]. Reads passing quality control were subjected to taxonomic analysis by comparison against several ribosomal RNA databases using open- and closed-reference Operational Taxonomic Unit (OTU) picking strategies. OTUs were classified using the Greengenes 13_8 16S reference database [8], and taxonomy was assigned to each OTU using the RDP classifier [9] and SILVAngs [10]. Finally, sequence coverage (by rarefaction analysis) and the alpha diversity of species in each varillal were computed by the MG-RAST pipeline. Microbial metabolic pathways were predicted from the 16S rDNA data using Parallel-Meta3 software v3.5.3 [11, 12].
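MG-RAST computes rarefaction and alpha diversity internally; the sketch below only illustrates the underlying quantities on a hypothetical OTU count vector. It uses Shannon entropy and its exponential (the effective number of equally abundant taxa) as the alpha-diversity measure; to our understanding MG-RAST derives its alpha diversity from Shannon diversity, but this is not MG-RAST code and its documentation should be consulted for the exact definition. The rarefaction helper takes a single random subsample per depth, whereas production pipelines usually average many draws.

```python
"""Shannon-based alpha diversity and a single-draw rarefaction curve."""
import math
import random


def shannon(counts):
    """Shannon entropy H' (natural log) of an OTU count vector."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def effective_taxa(counts):
    """exp(H'): the number of equally abundant taxa giving the same entropy."""
    return math.exp(shannon(counts))


def rarefaction_curve(counts, depths, seed=0):
    """Observed OTUs after subsampling reads without replacement at each depth.

    One random draw per depth; real pipelines typically average many draws.
    """
    rng = random.Random(seed)
    reads = [otu for otu, c in enumerate(counts) for _ in range(c)]  # one entry per read
    curve = []
    for depth in depths:
        if depth > len(reads):
            raise ValueError("depth exceeds the number of reads")
        curve.append((depth, len(set(rng.sample(reads, depth)))))
    return curve


if __name__ == "__main__":
    otu_counts = [120, 80, 40, 20, 10, 5, 1]  # hypothetical, not data from this study
    print(round(shannon(otu_counts), 3), round(effective_taxa(otu_counts), 2))
    print(rarefaction_curve(otu_counts, [10, 50, 100, 200, 276]))
```

A curve that flattens before the full read depth suggests that sequencing captured most of the OTUs present in the sample.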