Draft genome sequence, annotation and SSR mining data of Oryctes rhinoceros Linn. (Coleoptera: Scarabaeidae), the coconut rhinoceros beetle

The coconut rhinoceros beetle (CRB), Oryctes rhinoceros Linn. (Coleoptera: Scarabaeidae), is one of the major pests of coconut and causes severe yield losses. The adult beetles feed on the unopened spear leaf (resulting in the typical ‘V’-shaped cuts), spathes, inflorescences, and tender nuts, leading to stunted palm growth and yield reduction. Moreover, this damage predisposes palms to other fatal enemies, viz., red palm weevil and bud rot disease, causing yield losses as high as 10%. CRB attacks juvenile palms through the collar region, affecting their growth and initial establishment. While the immature stages of CRB subsist on organic debris, the adult beetles are ubiquitous pests of coconut and other palms. The discovery of a new invasive haplotype of CRB from Guam and other Pacific Islands, insensitive to Oryctes rhinoceros nudivirus (OrNV), a potent biocontrol agent, has raised serious concerns. The draft genome sequence and simple sequence repeat (SSR) marker data for this important pest of coconut are presented here. A total of 30 Gb of sequence data from an individual third instar larva was obtained on an Illumina HiSeq X Five platform. The draft genome assembly was 372 Mb in length, with 97.6% completeness based on Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment. Functional gene annotation predicted 16,241 genes. In addition, a total of 21,999 putative SSR markers were identified. The draft genome is a valuable resource for understanding population genetics, dispersal patterns, phylogenetics, and species behavior.



Value of the Data
• The dataset provides the draft genome sequence for a notorious pest of coconut, the coconut rhinoceros beetle (CRB) (Oryctes rhinoceros).
• This dataset will be useful to palm entomologists working in the area of genomics and phylogenetics.
• The dataset would enable the prediction of genes conferring pesticide resistance in this economically important coleopteran.
• The genome can be mined to identify effective and novel targets for the control of CRB and targeting of specific genes with molecular tools like RNAi (RNA interference).

Data Description
The first draft genome assembly, annotation, and SSR marker data of Oryctes rhinoceros Linn. (Coleoptera: Scarabaeidae) are presented in this article. Sequencing generated a total of 215 million reads (equating to 30 Gb of data). Summary statistics for the draft genome and its features are provided in Table 1. The assembly consisted of 25,242 scaffolds with an N50 of 5.42 Mb (Table 1) and was 372.39 Mb in total length. The genome size of 355.25 Mb estimated from k-mer spectra using Jellyfish (with the k-mer size set to 77) is consistent with this assembly length. Table 2 provides the assessment of genome completeness using the BUSCO tool. The analysis revealed that 96.9% of the core Insecta orthologs were complete and single-copy, 0.7% complete and duplicated, 1.4% fragmented, and 1% missing. These results indicate that the genome is of good quality. Detailed information on the repetitive elements detected in the assembled CRB genome is provided in Supplementary file S1. About 90.3 Mb (24.2%) of the draft genome of CRB was predicted to consist of repeats. The functional gene annotation pipeline predicted 16,241 genes, of which 13,779 gene isoforms were annotated against the NCBI RefSeq and UniProt databases. The GO term classification and distribution, visualized using the Web Gene Ontology Annotation Plot (WEGO), can be seen in Fig. 1. The assembled draft genome of CRB was also used for the identification of simple sequence repeat (SSR) or microsatellite markers, the details of which are provided in Table 3. In total, 21,999 SSRs were identified, of which 1414 were present in compound formation.
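For readers less familiar with the summary statistics quoted above, the following minimal Python sketch shows how a scaffold N50 is commonly defined and how the reported percentages relate to one another. The scaffold lengths are hypothetical toy values, not the CRB scaffolds, and the function name `n50` is introduced here for illustration only; the BUSCO and repeat figures are the ones reported in this section.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths (NOT the CRB assembly).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)

# BUSCO categories as reported in the text (percent of Insecta orthologs).
busco = {
    "complete_single": 96.9,
    "complete_duplicated": 0.7,
    "fragmented": 1.4,
    "missing": 1.0,
}
busco_complete = round(busco["complete_single"] + busco["complete_duplicated"], 1)
busco_total = round(sum(busco.values()), 1)

# Repeat content as a percentage of the total assembly length (both in Mb).
repeat_percent = round(90.3 / 372.39 * 100, 1)

print(toy_n50, busco_complete, busco_total, repeat_percent)
```

The sum of the two "complete" categories gives the overall completeness figure quoted in the abstract, and the repeat length divided by the assembly length reproduces the reported repeat percentage.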

DNA extraction and sequencing
A standardized procedure was used for the rearing of the beetle. One individual third instar larva (60 days) was taken, and DNA was extracted from the whole body using the DNeasy Blood & Tissue Kit (Qiagen). The sample DNA concentration was ascertained using a Qubit 4 Fluorometer (Thermo Fisher Scientific). The DNA library was prepared using the KAPA HyperPlus Kit (Roche) as per the standard protocol. Sequencing was undertaken on an Illumina HiSeq X Five platform.

Data pre-processing and genome profiling
The raw sequence reads were initially processed by fastp (v0.2.0) [1] to trim adapter sequences and eliminate low-quality reads. The k-mer profile was then computed, with the values of k ranging between 71 and 101, with an interval of 4, using Jellyfish (v2.3.0) [2]. The obtained k-mer frequencies were processed with GenomeScope [3] to estimate major genome characteristics, including genome size, heterozygosity, and repetitiveness. A k-mer size of 77 was recommended by KmerGenie (v1.7051) [4] for an optimal genome assembly.
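The idea behind k-mer based genome profiling can be illustrated with a small, self-contained Python sketch. It is a deliberately simplified heuristic (total k-mers divided by the depth of the main histogram peak) and is not the model that GenomeScope fits, nor a substitute for Jellyfish: it counts forward-strand k-mers only and ignores heterozygosity and repeats. All function names and the toy reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only) across the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_histogram(kmer_counts):
    """Map each depth to the number of distinct k-mers seen that many times."""
    return Counter(kmer_counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Simplified estimate: total k-mers / depth of the main peak.

    Depths below min_depth are dropped as likely sequencing errors
    (an assumption of this sketch, not a tool default).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(d * n for d, n in usable.items())
    return round(total_kmers / peak_depth)


# Hypothetical toy data: a 12 bp "genome" sequenced five times, error-free.
toy_genome = "ACGTAGCTTGCA"
toy_reads = [toy_genome] * 5
toy_hist = depth_histogram(count_kmers(toy_reads, k=4))
toy_estimate = estimate_genome_size(toy_hist)
print(dict(toy_hist), toy_estimate)
```

In this toy case every 4-mer occurs at depth 5, so the estimate equals the number of distinct k-mers in the sequence. Real data show a spread of depths around the peak, which is why a model-based tool is used in practice.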

Genome assembly and evaluation
The primary assembly was constructed using ABySS 2.0 [5]. BESST [6] was used to scaffold the primary assembly. The assembly was further improved by reference-based scaffolding: RaGOO [7] was used with the available Oryctes borbonicus genome [NCBI Accession No. GCA_902654985.1] to reorient and order the scaffolds. All contigs below 1000 bp were discarded from the final assembly. Finally, the assembled genome was evaluated for completeness with BUSCO [8] by searching against the insecta_odb10 database.
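The length filter applied to the final assembly can be sketched in plain Python as follows. This is an illustrative stand-in, not the authors' script: the helper names `read_fasta` and `filter_by_length` and the toy records are hypothetical, and the 1000 bp cutoff is taken from the text above.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_by_length(records, min_length=1000):
    """Keep records whose sequence is at least min_length bp long."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


# Hypothetical toy input: 1200 bp, 800 bp (split over two lines) and 1000 bp.
toy_fasta = [
    ">scaffold_1", "ACGT" * 300,
    ">scaffold_2", "ACGT" * 100, "ACGT" * 100,
    ">scaffold_3", "A" * 1000,
]
kept = filter_by_length(read_fasta(toy_fasta))
kept_names = [name for name, _ in kept]
print(kept_names)
```

Sequences of exactly 1000 bp are retained here, matching the wording "below 1000 bp were discarded".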

Identification of simple sequence repeats (SSRs)
All assembled contigs of CRB were screened for simple sequence repeats (SSRs) using MISA-web [20], with minimum repeat numbers of 10 for mono-, 6 for di-, and 5 for tri-, tetra-, penta- and hexa-nucleotide motifs.
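The thresholds above can be made concrete with a short regular-expression sketch in Python. It is a simplified illustration of perfect-repeat detection under those minimum repeat numbers, not the MISA implementation: compound SSRs are not detected, and the function names and toy sequence are hypothetical.

```python
import re

# Minimum repeat numbers per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """Return False if the motif is itself a repeat of a shorter unit
    (e.g. 'AA' or 'ACAC'), so it is reported under the shorter unit only."""
    n = len(motif)
    for size in range(1, n):
        if n % size == 0 and motif == motif[:size] * (n // size):
            return False
    return True


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if not is_primitive(motif):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Hypothetical toy sequence containing (A)10, (AG)6 and (ACG)5.
toy_sequence = "GG" + "A" * 10 + "CC" + "AG" * 6 + "TT" + "ACG" * 5 + "T"
toy_ssrs = find_ssrs(toy_sequence)
print(toy_ssrs)
```

Start positions are 0-based. Because each motif length is scanned independently with non-overlapping matches, unusual overlapping repeats may be missed; a dedicated tool should be used for real analyses.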

Ethics Statement
Not applicable.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Supplementary Materials
Supplementary material associated with this article can be found in the online version at doi: 10.1016/j.dib.2021.107424.