16S rRNA metagenomic data of microbial diversity of Pheidole decarinata Santschi (Hymenoptera: Formicidae) workers

Metagenomic datasets of the microbial DNA of Pheidole decarinata Santschi (Hymenoptera: Formicidae) workers collected around houses are presented for three replicates. Next-generation sequencing of the microbial DNA was performed on an Illumina MiSeq platform. QIIME (version 1.9.1) was used to analyze the raw fastq files. The metagenome of the three (3) samples consists of 333,708 sequences representing 137,359,149 bps with an average length of 413.67 bps. The sequence data are available at the NCBI SRA under BioProject number PRJNA632430. Community analysis revealed that Proteobacteria was the predominant phylum (84.77%) in the microbial community of P. decarinata workers.

LG1, LG2 and LG3 are codes given to Pheidole decarinata collected from different locations within Nasarawa, Nigeria. The letters LG stand for Lafia GRA, while the numbers (1, 2 and 3) represent the different collection sites (foraging trails/colonies).

Specifications Table
Subject: Biology
Specific subject area: Microbiology, Genomics
Type of data

Value of the Data
• The data describe the microbial diversity of Pheidole decarinata collected in urban areas.
• Little or no data exist on the community metagenomics of Pheidole ants commonly observed around residential and non-residential areas in urban communities.
• The data offer the possibility of discovering novel bacteria that have not previously been reported from ants.
• The metagenomic data will also benefit future comparative and phylogenetic studies of the microbial diversity of P. decarinata.

Data description
The Illumina MiSeq sequencer produced 138,548, 120,456 and 74,704 sequences with average read lengths of 404.18, 409.71 and 427.14 base pairs (bps) from the samples LG1, LG2 and LG3, respectively, as shown in Table 1. LG1, LG2 and LG3 are codes given to the different locations where the samples were collected within Lafia GRA (LG) in Nasarawa, Nigeria.
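The per-sample figures can be cross-checked against the totals quoted in the abstract. The following is a minimal sketch that uses only the read counts and mean read lengths reported above; the variable names are illustrative, and the base-pair total it derives is an estimate because the published mean lengths are rounded to two decimals.

```python
# Read counts and mean read lengths (bp) as reported for each sample.
samples = {
    "LG1": {"reads": 138_548, "mean_len": 404.18},
    "LG2": {"reads": 120_456, "mean_len": 409.71},
    "LG3": {"reads": 74_704, "mean_len": 427.14},
}

# Total number of sequences across the three samples.
total_reads = sum(s["reads"] for s in samples.values())

# Unweighted mean of the three per-sample mean lengths.
unweighted_mean_len = sum(s["mean_len"] for s in samples.values()) / len(samples)

# Estimated total bases (reads x mean length, summed over samples).
# Approximate only: the reported mean lengths are rounded values.
est_total_bp = sum(s["reads"] * s["mean_len"] for s in samples.values())

# Read-weighted mean length across all three samples.
weighted_mean_len = est_total_bp / total_reads

print(f"total reads:          {total_reads:,}")
print(f"unweighted mean (bp): {unweighted_mean_len:.2f}")
print(f"estimated total bp:   {est_total_bp:,.0f}")
print(f"weighted mean (bp):   {weighted_mean_len:.2f}")
```

The summed read count reproduces the 333,708 sequences quoted in the abstract, and the unweighted mean of the three per-sample lengths comes to about 413.68 bps, in line with the 413.67 bps reported there. The read-weighted mean is slightly lower (about 411.3 bps) because LG3 contributes the fewest but longest reads.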

Insect sampling
Workers (major and minor) of P. decarinata were collected in and around houses in Lafia, Nigeria. The ants were observed foraging in households, baited with other insects (American cockroach) and then collected into sterile tubes. Ants from each foraging trail were preserved separately, then sorted and identified using standard taxonomic keys. Metagenomic analysis of the ant samples was carried out to determine their bacterial composition. The ant samples were collected from different locations, and P. decarinata workers collected from the same trail were pooled prior to DNA extraction [1]. Ten workers (both major and minor) of P. decarinata from the same trail were used for each replicate. The samples were rinsed several times [2] with sterile distilled water to remove soil and other debris before further molecular analysis.

DNA extraction
Next-generation sequencing (NGS) was employed to investigate the microbial diversity of Pheidole decarinata. Genomic DNA was extracted from the insect samples using the HiYield™ Genomic DNA isolation kit (Real Biotech Corporation, Taiwan) according to the manufacturer's protocol with minor modifications. The ant specimens were washed several times with sterile distilled water to remove soil, plant material and other debris attached to the surface of the samples. Ten (10) major and minor workers from the same foraging trail (colony) were pooled and ground in 200 μL of 1X PBS with a sterile pestle according to [3]. 1X PBS was used in place of QGT Buffer and was mixed concurrently with Proteinase K and QGB Buffer before tissue homogenization and incubation. Incubation time was reduced to 2 h [3]. Other steps, such as DNA binding, washing and elution, were done according to the HiYield™ Genomic DNA isolation kit protocol. The minor modifications to the DNA isolation procedure were made to obtain high molecular weight DNA [3]. Three (3) DNA samples (LG1, LG2 and LG3) extracted from Pheidole decarinata were sent to MyTACG Bioscience Enterprise (Kuala Lumpur, Malaysia) for Illumina sequencing.

PCR amplification, amplicons purification and quantification
The V3-V4 marker region of the bacterial 16S rRNA gene was amplified by PCR (95 °C for 2 min

Library construction and Illumina sequencing
Library construction was done by removing adapter dimers using beads, and single-stranded DNA fragments were generated using sodium hydroxide. Sample libraries were pooled in equimolar amounts and paired-end sequenced (2 × 250/300 bp) on an Illumina MiSeq platform according to the standard protocols. Raw fastq files were demultiplexed and quality-filtered using QIIME (version 1.9.1) [4]. Low-quality reads with an average quality score < 20 were trimmed using Trimmomatic software [5], and trimmed reads shorter than 50 bp were discarded. Paired reads were merged into single reads using FLASH (Fast Length Adjustment of Short reads) [6] based on their overlap. Reads that could not be assembled were discarded. Operational taxonomic units (OTUs) were clustered with a 97% similarity cutoff using UPARSE (version 7.1, http://drive5.com/uparse/) [7]. Chimeric sequences were identified and removed using UCHIME [8]. The taxonomy of each 16S rRNA gene sequence was assigned by RDP Classifier (http://rdp.cme.msu.edu/) [9] against the Silva (SSU123) 16S rRNA database [10] using a confidence threshold of 0.7.
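To make the two read-level thresholds concrete, the following is a minimal sketch of a filter that keeps a read only if its mean Phred quality is at least 20 and its length is at least 50 bp. It is an illustration of the stated cutoffs, not the pipeline used for this dataset: it does not reproduce Trimmomatic's trimming steps, it assumes Phred+33 quality encoding and four-line FASTQ records, and all function and variable names are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO

MIN_MEAN_QUALITY = 20  # average quality cutoff stated in the text
MIN_LENGTH = 50        # minimum read length (bp) stated in the text
PHRED_OFFSET = 33      # assumption: Phred+33 encoded quality strings


class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield FastqRecord(header, sequence, quality)


def mean_quality(quality: str) -> float:
    """Mean Phred score of a quality string (0.0 for an empty string)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - PHRED_OFFSET for ch in quality) / len(quality)


def passes_filters(record: FastqRecord) -> bool:
    """True if the read meets both the length and mean-quality cutoffs."""
    return (
        len(record.sequence) >= MIN_LENGTH
        and mean_quality(record.quality) >= MIN_MEAN_QUALITY
    )


def filter_reads(handle: TextIO) -> List[FastqRecord]:
    """Return only the reads that pass both filters."""
    return [rec for rec in read_fastq(handle) if passes_filters(rec)]


# Toy input: one good read, one too-short read, one low-quality read.
toy_fastq = (
    "@good\n" + "A" * 60 + "\n+\n" + "I" * 60 + "\n"    # Q40, 60 bp
    "@short\n" + "A" * 30 + "\n+\n" + "I" * 30 + "\n"   # Q40, 30 bp
    "@lowq\n" + "A" * 60 + "\n+\n" + "#" * 60 + "\n"    # Q2, 60 bp
)
kept = filter_reads(io.StringIO(toy_fastq))
print([rec.header for rec in kept])
```

In the toy input only the first read satisfies both cutoffs; the second fails on length and the third on mean quality.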

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.