Data set on the diversity and core members of bacterial community associated with two specialist fruit flies Bactrocera melastomatos and B. umbrosa (Insecta, Tephritidae)

Bactrocera melastomatos Drew & Hancock and Bactrocera umbrosa (Fabricius) are fruit flies of the subfamily Dacinae under the family Tephritidae [1]. B. melastomatos occurs in India (Andaman Island), Thailand, Peninsular Malaysia, Singapore, and Indonesia (Sumatra, Kalimantan, Java) [1] while B. umbrosa is distributed from southern Thailand and Malaysia to New Guinea and New Caledonia [2]. The adult male flies of B. melastomatos are attracted to Cue lure while the adult male flies of B. umbrosa are attracted to methyl eugenol [3]. Fruit flies of Bactrocera melastomatos infest Melastomataceae while those of B. umbrosa infest Moraceae. We compare the diversity of microbiota associated with the wild adult males of these two specialist fruit flies infesting different families of host plants. Targeted 16S rRNA gene (V3-V4 region) was sequenced using the Illumina MiSeq platform. Six bacterial phyla (Actinobacteria, Armatimonadetes, Bacteroidetes, Cyanobacteria/Melainabacteria group, Firmicutes, Proteobacteria) were detected at 97% similarity clustering and 0.001% abundance filtering. Four phyla (Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria) were present in all the specimens studied. Proteobacteria was the predominant phylum in both B. melastomatos and B. umbrosa. Enterobacteriaceae was the predominant family in UM B. melastomatos and B. umbrosa, and Orbaceae was the predominant family in Awana B. melastomatos. Klebsiella was the predominant genus in B. umbrosa, Citrobacter in UM B. melastomatos, and Orbus in Awana B. melastomatos. Double Wolbachia infections were present in UM B. melastomatos. In general, the bacterial diversity and richness varied within and between the samples of B. melastomatos and B. umbrosa.



Value of the Data
• The data provide information on the core members and different taxa of the bacterial community associated with B. melastomatos, which infests only the fruits of Melastomataceae, and B. umbrosa, which infests only Artocarpus fruits of Moraceae.
• The data are useful for comparative analysis of the abundance and core members of the bacterial community with those of other specialist as well as generalist fruit flies.
• The data are useful for culture-dependent studies of the microbiota associated with these two species of specialist fruit flies.
• The data are valuable for developing pest management programmes to control the fruit flies infesting host plants.

Data Description
The high-throughput sequencing generated a total of 2,205,662 raw sequence reads. After quality filtering and chimera removal, the number of sequences per sample ranged from 59,907 in BU2 to 85,783 in BM7. The number of reads varied among the specimens of B. melastomatos (74,176-76,713 in Awana samples and 61,018-85,783 in UM samples) and B. umbrosa (59,907-81,575). The species richness varied considerably within and across the three groups of samples (Fig. 1). The raw datasets for 16S rRNA gene amplicon sequencing generated for this paper have been deposited in the GenBank Sequence Read Archive (accession number PRJNA528573).
The overall bacterial community in the samples of B. melastomatos and B. umbrosa consisted of six phyla, 11 classes, 23 orders, 30 families, 64 genera, and 122 putative species (Table 1; Supplementary Tables S1, S2). Of the six bacterial phyla, four (Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria) were represented in all the fruit fly specimens, forming the core members of the bacterial community. Of the four core phyla, the respective numbers of core OTUs were: Proteobacteria (4 classes, 4 orders, 4 families, 9 genera, 11 species); Bacteroidetes (2 classes, 2 orders, 2 families, 1 genus, 1 species); Firmicutes (1 class, 1 order, 1 family, 1 genus, 1 species); and Actinobacteria (1 class). In general, the bacterial OTU diversity varied within and between the samples of B. melastomatos and B. umbrosa (Table 3; Figs. 2-4). The richness also varied within and between the samples. The bacterial community in the UM B. melastomatos samples was more diverse than those in the Awana B. melastomatos and B. umbrosa samples. On the other hand, the bacterial community in the B. umbrosa samples was more variable. Non-parametric statistical test analysis of simi-
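The notion of a core member used in this section (a taxon detected in every specimen) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `core_taxa`, the sample labels, and the read counts are hypothetical and serve only to show the set-intersection logic.

```python
from typing import Dict, Set


def core_taxa(counts: Dict[str, Dict[str, int]]) -> Set[str]:
    """Return the taxa that have a non-zero read count in every sample.

    `counts` maps a sample label to a {taxon: read count} dictionary.
    A taxon missing from a sample's dictionary is treated as absent.
    """
    samples = list(counts.values())
    if not samples:
        return set()
    # Start from the taxa detected in the first sample ...
    core = {taxon for taxon, n in samples[0].items() if n > 0}
    # ... and keep only those also detected in each remaining sample.
    for sample in samples[1:]:
        core &= {taxon for taxon, n in sample.items() if n > 0}
    return core


# Hypothetical phylum-level counts, for illustration only (not the paper's data).
toy_counts = {
    "fly_A": {"Proteobacteria": 900, "Firmicutes": 50, "Cyanobacteria": 3},
    "fly_B": {"Proteobacteria": 700, "Firmicutes": 20, "Cyanobacteria": 0},
    "fly_C": {"Proteobacteria": 800, "Firmicutes": 10},
}

print(sorted(core_taxa(toy_counts)))  # ['Firmicutes', 'Proteobacteria']
```

The same function applies at any taxonomic rank, provided the counts are first aggregated to that rank.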

Sample Collection and DNA Extraction
Wild adult male flies of B. melastomatos were collected using Cue lure, while those of B. umbrosa were collected using methyl eugenol. These fruit flies were collected in Peninsular Malaysia: B. melastomatos, 2 specimens from the Universiti Malaya (UM) campus (3.

Targeted Metagenomics Sequencing
Demultiplexed raw sequences were extracted from the Illumina MiSeq system in FASTQ format, and FastQC software was used to evaluate the quality of the sequences [5]. The CLC Genomics Workbench v.7.5.1 (https://www.qiagenbioinformatics.com/) was used to pair, merge, trim and filter the raw sequences. Ambiguous bases, low-quality reads and sequences with a read length below 200 bp were discarded. UCHIME was used to identify and remove potential chimeric sequences [6,7]. UCLUST, with the open-reference OTU picking approach in Quantitative Insights into Microbial Ecology (QIIME v.1.9.0), was used to cluster the sequence reads into Operational Taxonomic Units (OTUs) at 97% similarity [6,8]. A representative sequence for each OTU was selected for taxonomic assignment with reference to the Greengenes 13_8 release database [9] and was additionally searched with BLAST against the NCBI 16S microbial database to gain further insight at the species level.
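The length and ambiguous-base criteria described above can be sketched in a few lines of Python. This is only an illustration of the two stated rules; it is not the CLC Genomics Workbench procedure, it treats any read containing a non-ACGT symbol as discarded (one possible reading of the text), and it does not cover quality-score trimming. The names `keep_read` and `filter_reads` and the toy reads are hypothetical.

```python
from typing import Iterable, List, Tuple

MIN_LENGTH = 200          # reads shorter than 200 bp are discarded (from the text)
UNAMBIGUOUS = set("ACGT")  # any other symbol (e.g. N) counts as ambiguous


def keep_read(seq: str, min_length: int = MIN_LENGTH) -> bool:
    """Return True if the read is long enough and has no ambiguous bases."""
    seq = seq.upper()
    return len(seq) >= min_length and set(seq) <= UNAMBIGUOUS


def filter_reads(reads: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the (name, sequence) pairs that pass `keep_read`."""
    return [(name, seq) for name, seq in reads if keep_read(seq)]


# Hypothetical reads, for illustration only.
toy_reads = [
    ("read_ok", "ACGT" * 60),            # 240 bp, unambiguous: kept
    ("read_short", "ACGT" * 40),         # 160 bp: discarded
    ("read_ambig", "ACGT" * 60 + "N"),   # contains N: discarded
]

print([name for name, _ in filter_reads(toy_reads)])  # ['read_ok']
```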

Bioinformatics and Statistical Analyses
Alpha and beta diversity analyses and Principal Coordinate Analysis (PCoA) were performed as described earlier [10,11]. One-way ANOVA with a post-hoc Tukey HSD test was used to compare the mean relative abundance of OTUs of the different samples. A heatmap with OTU abundance and hierarchical clustering of samples was generated using R version 3.2.4 with Euclidean distances specified [12].
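For readers unfamiliar with these quantities, the following Python sketch computes two widely used alpha diversity measures (observed richness and the Shannon index) and the Euclidean distance between two OTU abundance profiles. The source does not state which alpha diversity indices were calculated, so the choice of the Shannon index here is an assumption, and all function names are hypothetical.

```python
import math
from typing import Sequence


def observed_richness(counts: Sequence[float]) -> int:
    """Number of OTUs with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with p_i > 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two samples' OTU abundance vectors."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical OTU counts for two samples, for illustration only.
sample_even = [10, 10, 10, 10]   # four equally abundant OTUs
sample_skewed = [37, 1, 1, 1]    # one dominant OTU

print(observed_richness(sample_even), observed_richness(sample_skewed))
print(shannon_index(sample_even) > shannon_index(sample_skewed))  # True
print(euclidean_distance(sample_even, sample_skewed))
```

Both toy samples have the same richness, yet the evenly distributed one has the higher Shannon index, which is why richness and diversity are reported separately.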

Ethics Statements
These fruit flies are not endangered or protected by law and permits are not required to study them.