Metagenomic 16S rDNA amplicon data of microbial diversity of Cimex hemipterus (F.) (Hemiptera: Cimicidae) treated with insect growth regulators (IGR)

This metagenomic dataset is based on bacterial 16S rDNA gene amplicons of DNA extracted from tropical bed bugs (Cimex hemipterus). Amplicon sequencing was performed on the Illumina MiSeq platform, and the raw sequence data were analyzed with QIIME (version 2022.8.3). The dataset comprises ten samples: C1 (133,511 bp), C2 (108,920 bp), CH1 (106,562 bp), CH2 (101,778 bp), P1 (103,618 bp), P2 (133,258 bp), T1 (113,558 bp), T2 (133,952 bp), TM1 (125,335 bp), and TM2 (118,345 bp). The sequence data are available at the NCBI SRA under BioProject PRJNA918835. The most abundant bacterial phylum in C. hemipterus is Proteobacteria, with more than 99% relative abundance.
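As a quick consistency check on the per-sample figures listed above, the following Python sketch totals them and reports the smallest and largest samples. A check like this is useful, for example, before choosing an even sampling depth for diversity comparisons. The values are copied from the abstract with the unit as reported; the function name `summarize` is illustrative and is not part of any pipeline used in the study.

```python
from statistics import mean
from typing import Dict

# Per-sample values copied from the abstract (reported there in "bp").
SAMPLE_COUNTS: Dict[str, int] = {
    "C1": 133_511, "C2": 108_920,
    "CH1": 106_562, "CH2": 101_778,
    "P1": 103_618, "P2": 133_258,
    "T1": 113_558, "T2": 133_952,
    "TM1": 125_335, "TM2": 118_345,
}


def summarize(counts: Dict[str, int]) -> dict:
    """Return the number of samples, total, mean, and the smallest/largest samples."""
    smallest = min(counts, key=counts.get)
    largest = max(counts, key=counts.get)
    return {
        "n_samples": len(counts),
        "total": sum(counts.values()),
        "mean": mean(list(counts.values())),
        "min": (smallest, counts[smallest]),
        "max": (largest, counts[largest]),
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE_COUNTS).items():
        print(f"{key}: {value}")
```

With these values the total is 1,178,837, the mean is 117,883.7, and the range runs from CH2 (101,778) to T2 (133,952).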


Value of the data
• The metagenomic data offer comprehensive taxonomic profiles of bacterial abundance and diversity in Cimex hemipterus treated with insect growth regulators.
• The dataset provides information on the effect of insect growth regulators on the microbial community of C. hemipterus.
• The data also support investigation of how variation in the microbiome may influence bed bugs, which may inform pest management approaches that explore novel chemical, genetic, and biological control targets for the family Cimicidae.

Objective
Bed bugs (family Cimicidae) remain a significant challenge for household pest management. Infestations have not been fully curbed because bed bugs have developed resistance that prolongs their survival and sustains their reproduction. Insect growth regulators (IGRs) have therefore been adopted as a key strategy for suppressing bed bug infestations. IGRs include juvenile hormone analogs, which disrupt embryogenesis and reproduction, and chitin synthesis inhibitors, which impede exoskeleton formation. However, signs of resistance to IGRs are beginning to emerge as bed bugs re-establish themselves as a nuisance pest. To understand the potential causes of this resistance, it is important to consider how the microbiome interacts with IGR application. This study therefore characterizes the microbial communities of IGR-treated and untreated Cimex hemipterus using 16S rDNA metagenomic analysis. Further research is required to determine whether microbial diversity differs significantly between control (untreated) and treated tropical bed bugs.

Data description
The dataset consists of bacterial 16S rDNA amplicon sequences from control (untreated) and IGR-treated tropical bed bugs, generated on an Illumina MiSeq sequencer. The ten samples, comprising two untreated control samples (C1 and C2) and eight IGR-treated samples (CH1, CH2, P1, P2, T1, T2, TM1, and TM2), are presented in Table 1. Samples sharing the same letter prefix are two biological replicates from the same site and strain. The two control samples served as negative controls and were not treated with growth regulators. Community analysis identified Proteobacteria as the predominant phylum, accounting for more than 98% of sequences in both treated and untreated samples. Within this phylum, sequences were assigned mainly to two genera, Wolbachia and Pectobacterium. Spirochaetota accounted for approximately 1% of reads, while eight other groups (Actinobacteriota, Bacteroidota, Desulfobacterota, Fibrobacterota, Firmicutes, Unclassified, the Rs-K70 termite group, and Synergistota) each accounted for less than 1% of reads (Figure 1).
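The phylum-level percentages above are relative abundances: each phylum's read count divided by the sample's total, with phyla below 1% pooled into an "Others" category for plotting. A minimal Python sketch of that calculation follows; the function name and the example counts are hypothetical and are not taken from this dataset.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int], threshold: float = 0.01) -> Dict[str, float]:
    """Convert per-phylum read counts to fractions, pooling rare phyla as "Others".

    Phyla whose fraction of the sample is below `threshold` (1% by default)
    are summed into a single "Others" entry.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    result: Dict[str, float] = {}
    others = 0.0
    for phylum, n in counts.items():
        fraction = n / total
        if fraction < threshold:
            others += fraction
        else:
            result[phylum] = fraction
    if others > 0:
        result["Others"] = others
    return result


# Hypothetical read counts for one sample; NOT values from this dataset.
example = {"Proteobacteria": 9_830, "Spirochaetota": 120, "Firmicutes": 30, "Bacteroidota": 20}
print(relative_abundance(example))
```

Pooling after computing fractions, rather than before, keeps every phylum's share relative to the full sample total, so the reported fractions still sum to one.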

Experimental design, material, and methods
The metagenomic analysis was carried out on tropical bed bugs, C. hemipterus, collected from residential areas on Penang Island, to characterize their bacterial composition. Collected bed bugs were treated with two classes of IGRs, juvenile hormone analogs and chitin synthesis inhibitors [1]. Once the treated bed bugs had died, they were pooled for DNA extraction, using five bed bugs per triplicate. Before molecular analysis, the samples were surface-sterilized with 75% ethanol for 30 s and then rinsed with sterile distilled water for 1 min to remove external contaminants. The bed bugs were homogenized, and genomic DNA was extracted with the HiYield™ Genomic DNA isolation kit (Real Biotech Corporation, Taiwan) according to the manufacturer's protocol with minor modifications: 40 µL of Proteinase K was used instead of 20 µL, and the incubation period was reduced from three hours to one hour (Ashigar & Ab Majid, 2020 [2]). The V3-V4 region of the bacterial 16S rRNA gene was amplified from the extracted gDNA by polymerase chain reaction (PCR) (95 °C for 2 min; 25 cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s; final extension at 72 °C for 5 min) (Ashigar & Ab Majid, 2021 [3]).
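The PCR programme described above can be written out as data, which also makes its total programmed hold time easy to check. Ramping between temperatures depends on the thermocycler and is ignored here. The `Step` class and the other names below are illustrative assumptions, not part of the published protocol.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    temp_c: float  # programmed block temperature (°C)
    seconds: int   # hold time at that temperature


# Thermal profile as described in the text.
INITIAL_DENATURATION: List[Step] = [Step(95, 120)]
CYCLE: List[Step] = [Step(95, 30), Step(55, 30), Step(72, 30)]  # denature, anneal, extend
N_CYCLES = 25
FINAL_EXTENSION: List[Step] = [Step(72, 300)]


def total_hold_seconds(initial: List[Step], cycle: List[Step],
                       n_cycles: int, final: List[Step]) -> int:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return (
        sum(s.seconds for s in initial)
        + n_cycles * sum(s.seconds for s in cycle)
        + sum(s.seconds for s in final)
    )


seconds = total_hold_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES, FINAL_EXTENSION)
print(f"{seconds} s = {seconds / 60} min")  # 2670 s = 44.5 min
```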
Based on the standard Illumina MiSeq protocol, the amplicon library was constructed, and QIIME (version 2022.8.3) was used to analyze the raw sequences. Reads were trimmed with Trimmomatic, which discarded reads shorter than 50 bp or with an average quality score below 20. Paired reads were then merged into single reads with FLASH (Fast Length Adjustment of Short Reads) based on their overlap: pairs overlapping by more than 10 bp were assembled, and unassembled reads were discarded (Lim & Ab Majid, 2020 [5]). Because of the large number of reads analyzed, sequences were clustered into operational taxonomic units (OTUs) at a 97% similarity cut-off with UPARSE, and chimeric sequences were identified with UCHIME (López-García et al., 2018 [6]; Xie et al., 2016 [8]). OTUs were assigned taxonomy with the Ribosomal Database Project (RDP) Classifier (Gu et al., 2013 [4]), which classified the 16S rRNA gene sequences against the SILVA 16S rRNA database at a confidence threshold of 0.7. Sequence coverage of each of the ten samples (C1, C2, CH1, CH2, P1, P2, T1, T2, TM1, and TM2) was assessed with mothur and R (Fig. 2) (Wu et al., 2019 [7]).

Figure 1. … (CH1, CH2, T1, T2, TM1, TM2, P1, and P2) samples of Cimex hemipterus. The X-axis represents two biological replicates of pooled samples for each growth regulator, and the Y-axis shows taxon abundance. Phyla with less than 1% abundance were grouped as 'Others'.
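The Trimmomatic criteria described in the methods (drop reads shorter than 50 bp or with a mean quality below 20) can be sketched in Python as follows, assuming Phred+33-encoded FASTQ quality strings. This illustrates the filtering rule only and is not Trimmomatic's implementation; within Trimmomatic these criteria correspond to its MINLEN and AVGQUAL steps. The function names are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(seq: str, quality: str, min_len: int = 50, min_avg_q: float = 20.0) -> bool:
    """Keep a read only if it is >= min_len bases long and its mean quality is >= min_avg_q."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    return len(seq) >= min_len and mean_phred(quality) >= min_avg_q


# Hypothetical reads: "I" encodes Q40 and "+" encodes Q10 under Phred+33.
reads = [
    ("A" * 60, "I" * 60),             # 60 bp, mean Q40 -> kept
    ("A" * 40, "I" * 40),             # too short -> dropped
    ("A" * 60, "+" * 60),             # mean Q10 -> dropped
    ("A" * 60, "I" * 30 + "+" * 30),  # mean Q25 -> kept
]
kept = [r for r in reads if passes_filter(*r)]
print(len(kept))  # 2
```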
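The read-merging step in the methods can be illustrated with a simplified overlap merger in the spirit of FLASH. The reverse read is reverse-complemented, candidate overlaps are scored by mismatch ratio, and pairs without an acceptable overlap are left unassembled and discarded. This sketch is not FLASH's actual algorithm. The 10 bp minimum overlap follows the text, while the 0.25 mismatch-ratio limit, the tie-breaking rule, and the function names are assumptions chosen for illustration.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_ratio: float = 0.25) -> Optional[str]:
    """Merge a read pair through the overlap between read1's end and rc(read2)'s start.

    Every overlap length >= min_overlap is scored by its mismatch ratio; the lowest
    ratio wins, with ties going to the longer overlap. Returns None when no overlap
    qualifies, i.e. the pair stays unassembled and is discarded.
    """
    rc2 = reverse_complement(read2)
    best_len: Optional[int] = None
    best_ratio = float("inf")
    for olen in range(min_overlap, min(len(read1), len(rc2)) + 1):
        mismatches = sum(a != b for a, b in zip(read1[-olen:], rc2[:olen]))
        ratio = mismatches / olen
        if ratio <= max_mismatch_ratio and ratio <= best_ratio:
            best_len, best_ratio = olen, ratio
    if best_len is None:
        return None
    # Simplification: keep read1's bases across the overlap
    # (real mergers such as FLASH use base qualities to resolve mismatches).
    return read1 + rc2[best_len:]


fragment = "ACGTTGCAATGCCGTAAGCTTGACCTAGGATCCAGT"  # hypothetical 36 bp amplicon
read1 = fragment[:25]
read2 = reverse_complement(fragment[11:])  # reverse read: 25 bp from the other end
print(merge_pair(read1, read2) == fragment)  # True
```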
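OTU clustering at a 97% similarity cut-off can be illustrated with abundance-sorted greedy clustering, the general strategy UPARSE follows. Sequences are processed from most to least abundant, and each either joins an existing centroid at 97% identity or more or founds a new OTU. This is a simplification under stated assumptions: identity is computed here from a plain Levenshtein distance, and UPARSE's own alignment scoring and chimera filtering are not reproduced. The function names are hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, using a rolling single-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Fractional identity, defined here as 1 - edit distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_otus(abundances: Dict[str, int], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Abundance-sorted greedy clustering.

    Sequences are visited from most to least abundant; each joins the first existing
    centroid it matches at >= threshold identity, otherwise it becomes a new centroid.
    """
    otus: Dict[str, List[str]] = {}
    for seq in sorted(abundances, key=lambda s: (-abundances[s], s)):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]
    return otus


base = "ACGT" * 25                                         # hypothetical 100 bp sequence
variant = base[:10] + "T" + base[11:50] + "A" + base[51:]  # 2 substitutions -> 98% identity
distant = "T" * 100                                        # far below 97% identity to base
otus = greedy_otus({base: 100, distant: 50, variant: 5})
print(len(otus))  # 2
```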
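The RDP Classifier reports a bootstrap confidence for each taxonomic rank. With a 0.7 threshold, a lineage is typically reported only down to the deepest rank whose confidence reaches that value. A minimal sketch of that truncation rule follows; the classifier output shown is hypothetical, with invented confidences, and the function name is illustrative.

```python
from typing import List, Tuple

Assignment = Tuple[str, str, float]  # (rank, taxon, bootstrap confidence in [0, 1])


def truncate_lineage(assignments: List[Assignment], cutoff: float = 0.7) -> List[Tuple[str, str]]:
    """Report ranks from the top down, stopping at the first rank below `cutoff`."""
    kept: List[Tuple[str, str]] = []
    for rank, taxon, confidence in assignments:
        if confidence < cutoff:
            break  # this rank and everything below it is left unassigned
        kept.append((rank, taxon))
    return kept


# Hypothetical classifier output for one OTU; the confidences are invented.
lineage = [
    ("domain", "Bacteria", 1.00),
    ("phylum", "Proteobacteria", 0.98),
    ("class", "Alphaproteobacteria", 0.95),
    ("order", "Rickettsiales", 0.90),
    ("family", "Anaplasmataceae", 0.85),
    ("genus", "Wolbachia", 0.60),
]
print(truncate_lineage(lineage)[-1])  # ('family', 'Anaplasmataceae')
```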
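The text does not specify which coverage metric was computed with mothur and R. Good's coverage, C = 1 - F1/N, is one commonly reported estimator, and mothur includes a calculator for it. The sketch below assumes that metric and computes it from one sample's OTU counts; the counts and the function name are hypothetical.

```python
from typing import Iterable


def goods_coverage(otu_counts: Iterable[int]) -> float:
    """Good's coverage, C = 1 - F1 / N.

    F1 is the number of OTUs observed exactly once (singletons) and N is the
    total number of reads in the sample.
    """
    counts = list(otu_counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / total


# Hypothetical OTU counts for one sample: 100 reads, 5 singleton OTUs.
print(goods_coverage([50, 30, 10, 5, 1, 1, 1, 1, 1]))  # 0.95
```

A value close to 1 indicates that few reads belong to OTUs seen only once, i.e. that further sequencing of that sample would be expected to reveal few new OTUs.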

Ethics statements
The study was conducted according to the guidelines of the Declaration of Helsinki and was approved by the Universiti Sains Malaysia Research Ethics Committee (Human) (JEPeM; code: USM/JEPeM/19120868).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Metagenomic data of microbial diversity of Cimex hemipterus: raw sequence reads (NCBI SRA, BioProject PRJNA918835)