Next-generation sequencing dataset of bacterial communities of Microcerotermes crassus workers associated with Ironwood trees (Casuarina equisetifolia) in Guam

Ironwood trees (Casuarina equisetifolia) in Guam have been suffering from Ironwood Tree Decline (IWTD) since 2002. Putative plant pathogenic bacteria such as Ralstonia solanacearum and Klebsiella species were identified in the ooze of declining trees and considered to be linked to IWTD. In addition, termites were found to be significantly associated with IWTD. Microcerotermes crassus Snyder (Blattodea: Termitidae) was identified as a termite species that attacks ironwood trees in Guam. Since termites harbor a diverse community of symbiotic and environmental bacteria, we sequenced the microbiome of M. crassus workers attacking ironwood trees in Guam to assess the presence of IWTD-associated pathogens in termite bodies. This dataset contains 652,571 raw sequencing reads from M. crassus worker samples collected from six ironwood trees in Guam, obtained by sequencing the V4 region of the 16S rRNA gene on the Illumina NovaSeq (2 × 250 bp) platform. Sequences were taxonomically assigned in QIIME2 using SILVA 132 and NCBI GenBank as reference databases. Spirochaetes and Fibrobacteres were the most dominant phyla in M. crassus workers. No putative plant pathogens of the genera Ralstonia or Klebsiella were found in the M. crassus samples. The dataset has been made publicly available through NCBI GenBank under BioProject ID PRJNA883256. This dataset can be used to compare the bacterial taxa present in M. crassus workers in Guam with the bacterial communities of related termite species from other geographical locations. It can also be used to investigate the relationship between termite microbiomes and the microbiomes of the ironwood trees they attack and of the surrounding soil.



Value of the Data
• This dataset contributes to the investigation of Ironwood Tree Decline, which is significantly associated with the presence of putative bacterial plant pathogens and termites on ironwood trees in Guam [1, 2].
• Microcerotermes crassus workers harbor diverse bacteria and have been shown to attack ironwood trees in Guam [3]. Therefore, this dataset of bacterial communities of M. crassus workers associated with ironwood trees in Guam would be useful to conservation scientists, foresters, and pest management professionals for assessing whether worker termites of this species could be vectors for IWTD pathogenic bacteria.
• This dataset can also be used to investigate an association between the microbiota of termites and the microbiota of the ironwood trees they attack or of the soil surrounding those trees.

Objective
Casuarina equisetifolia (ironwood) trees in Guam have been dying in large numbers from ironwood tree decline (IWTD) since 2002 [4][5][6][7]. Bacteria such as the bacterial wilt pathogen Ralstonia solanacearum and wetwood bacteria of the genus Klebsiella (Klebsiella oxytoca and Klebsiella variicola) were identified in the ooze of declining trees and considered predictors of IWTD [6, 7, 2, 8]. The presence of termites was found to be significantly associated (p < 0.01) with IWTD [1], and termites are known to harbor a high bacterial diversity [9]. This dataset was generated to describe the bacterial composition of M. crassus termite workers collected from ironwood trees and to assess whether workers carry putative bacterial pathogens associated with IWTD.

Data Description
To identify the bacterial taxa present in M. crassus workers, the V4 variable region of the 16S rRNA gene was amplified and sequenced on the Illumina NovaSeq platform. The links and accession numbers to the fastq files in this dataset are provided in Table 1. The bacterial sequences present in all the samples were assigned to their respective taxa using the Quantitative Insights into Microbial Ecology (QIIME2 version 2021.8) pipeline [10]. A total of 652,571 raw sequencing reads were obtained across six M. crassus worker samples collected from six ironwood trees in Guam. After quality filtering with DADA2, 378,976 sequence reads represented by 2,165 Amplicon Sequence Variants (ASVs) remained. Removing ASVs with no taxonomic assignment at 97% identity to references in the SILVA 132 database left 231,967 sequence reads and 831 ASVs.
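The read counts reported above can be turned into retention percentages with a few lines of Python. This is only an illustrative calculation on the three totals given in the text; the variable and function names are ours and are not part of the published analysis code.

```python
# Read and ASV totals as reported in the Data Description.
# Each entry: (stage label, sequence reads, ASVs or None if not reported).
stages = [
    ("raw reads", 652_571, None),
    ("after DADA2 quality filtering", 378_976, 2_165),
    ("after removing unassigned ASVs", 231_967, 831),
]


def retention(stage_list):
    """Return (label, reads, percent of raw reads retained) for each stage."""
    raw_reads = stage_list[0][1]
    return [
        (label, reads, 100.0 * reads / raw_reads)
        for label, reads, _ in stage_list
    ]


for label, reads, pct in retention(stages):
    print(f"{label}: {reads:,} reads ({pct:.1f}% of raw reads)")
```

On these totals, about 58% of the raw reads survive quality filtering and about 36% remain after unassigned ASVs are removed.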
Rarefaction curves (Fig. 1) were plotted to assess whether sequencing depth, sample numbers, and coverage were sufficient to capture most of the bacterial diversity present in the M. crassus worker samples. The sequence-depth-based rarefaction curves were generated by plotting sequencing depth against different alpha diversity metrics. The curves for ASV richness and Faith's phylogenetic diversity (PD) of the ASVs (Fig. 1a) started to level out at a sequencing depth of around 1,500. The curve for Shannon diversity levelled out at a sequencing depth of 500. The levelling out of the rarefaction curves indicated that the sequencing depth of the samples captured most of the bacterial diversity within each sample.
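The idea behind a sequence-depth-based rarefaction curve can be sketched in plain Python: reads are repeatedly subsampled without replacement to a given depth and an alpha diversity metric is averaged over the subsamples. The sketch below is not the QIIME2 implementation; the toy count vector, the number of iterations, and the use of the natural logarithm for Shannon diversity are assumptions made for illustration.

```python
import math
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a per-ASV count vector."""
    pool = [asv for asv, n in enumerate(counts) for _ in range(n)]
    subsample = [0] * len(counts)
    for asv in rng.sample(pool, depth):
        subsample[asv] += 1
    return subsample


def richness(counts):
    """Number of ASVs observed at least once."""
    return sum(1 for n in counts if n > 0)


def shannon(counts):
    """Shannon diversity (natural log; an assumption of this sketch)."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def rarefaction_curve(counts, depths, metric, iterations=10, seed=0):
    """Mean value of `metric` over repeated subsamples at each depth."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        values = [metric(rarefy(counts, depth, rng)) for _ in range(iterations)]
        curve.append((depth, sum(values) / iterations))
    return curve


# Hypothetical sample: a few dominant ASVs and many rare ones (1,000 reads).
toy_sample = [400, 250, 150, 80, 40, 20] + [5] * 8 + [1] * 20

for depth, value in rarefaction_curve(toy_sample, [50, 100, 250, 500, 1000], richness):
    print(depth, round(value, 1))
```

A curve that flattens as depth increases suggests that additional reads would add few new ASVs, which is the reading applied to Fig. 1a.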
Sample- and coverage-based rarefaction curves (Fig. 1b, c) were generated by plotting effective diversity against the number of samples and the estimated sample coverage, respectively. The effective diversity measures both relative abundance and richness and is quantified by Hill numbers (parameterized by q). ASV richness, Shannon diversity, and Simpson diversity were quantified at q = 0, 1, and 2, respectively. The sample- and coverage-based rarefaction curves were extrapolated to twice the sample size to compute the effective diversity. The sample-based rarefaction curves (Fig. 1b) started to level out at an effective diversity of around 150 and 120 for Shannon diversity and Simpson inverse, respectively, and extrapolation did not considerably increase these values. The sample-based rarefaction curve for ASV richness continued to increase after extrapolation. However, this increase would be due to rare ASVs, since it was not accompanied by an increase in Shannon diversity or Simpson inverse (Fig. 1b). Six samples provided around 80% sample coverage (Fig. 1c). Extrapolation of the curves to twice the sample size increased the coverage to 95%.
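For readers unfamiliar with Hill numbers, the following sketch computes the effective diversity of order q directly from observed counts: q = 0 gives richness, q = 1 gives the exponential of Shannon entropy, and q = 2 gives the inverse Simpson index. It uses the plain observed-proportion formula, not the interpolation and extrapolation estimators of the iNEXT package, and the example counts are hypothetical.

```python
import math


def hill_number(counts, q):
    """Effective diversity (Hill number) of order q from raw ASV counts."""
    total = sum(counts)
    proportions = [n / total for n in counts if n > 0]
    if q == 1:
        # Limit of the general formula as q approaches 1: exp(Shannon entropy).
        return math.exp(-sum(p * math.log(p) for p in proportions))
    return sum(p ** q for p in proportions) ** (1.0 / (1.0 - q))


# Hypothetical community dominated by one ASV.
uneven = [90, 5, 5]
for q in (0, 1, 2):
    print(q, round(hill_number(uneven, q), 2))
```

Because higher orders of q give more weight to abundant ASVs, rare ASVs raise the q = 0 value much more than the q = 1 or q = 2 values. This is why a rising richness curve alongside flat Shannon and Simpson curves points to rare ASVs.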

Termite collection and DNA sequencing
Workers and soldiers of M. crassus were collected by a University of Guam team in 2021 from six different ironwood trees in Guam (Table 3). Trees were separated by at least 30 meters to ensure that all the termite samples collected were from different colonies. Termite samples were preserved in 95% ethanol and sent to Louisiana State University for analysis. The soldier caste was used for morphological identification of the termite species [3]. Five workers per sample were pooled and DNA was extracted using the DNeasy Blood & Tissue kit (Qiagen, Germantown, MD). We used sterile techniques throughout the DNA extraction process to minimize the risk of contamination. The concentration of extracted DNA was measured with the Invitrogen Qubit 4 Fluorometer (Thermo Fisher Scientific, Wilmington, DE) using the Qubit dsDNA BR Assay Kit (Invitrogen™, Life Technologies™). The DNA was sent to the University of New Hampshire Hubbard Center for Genome Studies for sequencing. The V4 region of the 16S rRNA gene of the bacterial DNA was amplified using the primers 515F and 926R [11] and sequenced on the Illumina NovaSeq (2 × 250 bp) platform using the Illumina Nextera Dilute library protocol with a spike-in of 1% PhiX (Illumina, San Diego, CA).

Bioinformatics and statistical analysis
Bioinformatic analysis was performed using the QIIME2 version 2021.8 pipeline [10]. Demultiplexed fastq sequences were obtained from the University of New Hampshire Hubbard Center for Genome Studies after sequencing. Primers and chimeric sequences were removed and the Phred quality scores of the sequences were checked using DADA2 [12]. All the sequences were of good quality (Phred quality score > 30); therefore, no trimming was required. Sequence reads were 251 nucleotides long, and the forward reads were used for further analysis. Sequence-depth-based rarefaction curves were plotted in QIIME2 after subsampling the sequence reads to 4,118, the number of sequences in the sample with the lowest sequencing depth. Sample-size- and coverage-based rarefaction curves were plotted using the R package iNEXT (iNterpolation/EXTrapolation) [13]. The ASVs obtained after DADA2 quality filtering were assigned to their respective taxa by comparing them to the SILVA 132 reference database [14] using the consensus method in BLAST at a 97% pairwise identity cutoff. The ASVs that were not assigned to any taxonomic group were removed from the dataset before generating taxa bar plots showing the relative abundance of ASVs at the phylum level. The taxonomic assignments of the 20 ASVs with the highest number of reads were cross-checked against references in the NCBI GenBank database (2021) using BLAST [15]. Code for the analyses in this manuscript is available at https://github.com/garima-setia/Microcerotermes-crassus .
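The last two processing steps, dropping unassigned ASVs and summarizing relative abundance at the phylum level, can be illustrated with a small stand-alone Python sketch. It does not reproduce the QIIME2 commands used for this dataset. The ASV identifiers, the counts, and the simplified "Kingdom;Phylum" lineage format are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def phylum_relative_abundance(asv_counts, asv_taxonomy):
    """Drop unassigned ASVs, then return each phylum's share of the remaining reads.

    `asv_taxonomy` maps ASV id -> semicolon-separated lineage with the phylum
    in second position (a simplified, hypothetical format).
    """
    totals = defaultdict(int)
    for asv, count in asv_counts.items():
        lineage = asv_taxonomy.get(asv, "Unassigned")
        if lineage == "Unassigned":
            continue  # ASVs without any taxonomic assignment are removed
        ranks = [rank.strip() for rank in lineage.split(";")]
        phylum = ranks[1] if len(ranks) > 1 else "Unclassified phylum"
        totals[phylum] += count
    assigned_reads = sum(totals.values())
    return {phylum: reads / assigned_reads for phylum, reads in totals.items()}


# Hypothetical ASV table for one sample.
counts = {"asv1": 60, "asv2": 30, "asv3": 10, "asv4": 50}
taxonomy = {
    "asv1": "Bacteria;Spirochaetes",
    "asv2": "Bacteria;Fibrobacteres",
    "asv3": "Bacteria;Spirochaetes",
    "asv4": "Unassigned",
}

for phylum, share in sorted(phylum_relative_abundance(counts, taxonomy).items()):
    print(f"{phylum}: {share:.0%}")
```

Relative abundances are computed after filtering, so they sum to 1 over the assigned reads only, matching the order of operations described above.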

Ethics Statements
Not applicable.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Next-generation sequencing dataset of bacterial communities of Microcerotermes crassus workers associated with Ironwood trees (Casuarina equisetifolia) in Guam (Original data) (NCBI).