Draft genome sequence, annotation, and SSR mining data of Elaeidobius kamerunicus Faust., an essential oil palm pollinating weevil

Elaeidobius kamerunicus Faust. (Coleoptera: Curculionidae) is an essential insect pollinator in oil palm plantations. Recently, research has been undertaken to improve pollination efficiency using this species. A fundamental understanding of the genes related to this pollinator's behavior is necessary to achieve this goal. Here, we present the draft genome sequence, annotation, and simple sequence repeat (SSR) marker data for this pollinator. In total, 34.97 Gb of sequence data from one male individual (monoisolate) were obtained using the Illumina NextSeq 500 short-read platform. The draft genome assembly was 269.79 Mb in size and about 59.9% complete based on Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment. Functional gene annotation predicted 26,566 genes. In addition, a total of 281,668 putative SSR markers were identified. This draft genome sequence is a valuable resource for understanding the population genetics, phylogenetics, dispersal patterns, and behavior of this species.


Specifications
Subject: Omics: Genomics
Specific subject area: Insects, Coleoptera, Oil Palm, Weevil, Whole-genome sequencing (WGS)
Type of data

Value of the Data
• This article provides the draft genome sequence data of Elaeidobius kamerunicus Faust. (Coleoptera: Curculionidae) and thus addresses a gap in genome sequence knowledge within the order Coleoptera.
• The draft genome sequence of this species will be useful for entomologists interested in functional genomics, population genetics, phylogenetics, and selection by breeding.
• This dataset can be used as a reference for future complete genome assembly of this species.
• The newly developed SSR marker dataset in this report should be a useful tool for assessing the genetic diversity, conservation, and biomanagement of this species.

Data Description
Elaeidobius kamerunicus Faust. (Coleoptera: Curculionidae) is an essential insect pollinator in oil palm plantations. This species is native to tropical Africa but has been introduced into Asia, including Indonesia [1]. The introduction of this weevil species into oil palm plantations successfully improved fruit set, increased the yield of oil palm, and reduced the need for assisted pollination [2]. Recent studies of this species have focused only on analyzing genetic diversity and species identification [1,3,4]. Interestingly, several divergent mitochondrial lineages in this species have been discovered based on cytochrome c oxidase subunit I (COI) and cytochrome c oxidase subunit II (COII) gene sequences [1,3]. Our recent study obtained the complete mitochondrial genome of E. kamerunicus from a subset of the dataset in this report, representing the first complete mitogenome for this species [5]. Nevertheless, the genomic resources of E. kamerunicus remain underdeveloped compared with those of many other agricultural insect species. This article presents the first draft genome assembly, annotation, and SSR marker data of the oil palm pollinating weevil, Elaeidobius kamerunicus Faust. (Coleoptera: Curculionidae). All raw sequencing read data (34.97 Gb) used for genome assembly were deposited in the NCBI Sequence Read Archive (SRA) database. These SRA data are retrievable under the accession numbers SRR12726955-SRR12726958.
The draft genome was assembled from the filtered reads, which made up about 73.71% of the total raw sequence reads. The final draft genome assembly was 269.79 Mb, containing 364,527 scaffolds with a GC content of 31.71% (Table 1). The genome project information has been deposited in NCBI GenBank under the BioProject ID PRJNA637822. The whole-genome sequencing (WGS) data can be retrieved from NCBI GenBank under accession JACGEL000000000.
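As a rough illustration of how fragmented the assembly is, the mean scaffold length can be derived from the two figures quoted above. The sketch below assumes 1 Mb = 10^6 bp and uses only the assembly size and scaffold count from the text. The function name is hypothetical, and the mean is not a substitute for the contiguity statistics reported in Table 1.

```python
# Figures quoted in the text above (1 Mb is assumed to be 10^6 bp).
ASSEMBLY_SIZE_BP = 269.79e6   # total assembly length
SCAFFOLD_COUNT = 364_527      # number of scaffolds in the assembly


def mean_scaffold_length(total_bp: float, n_scaffolds: int) -> float:
    """Return the arithmetic mean scaffold length in base pairs."""
    if n_scaffolds <= 0:
        raise ValueError("n_scaffolds must be positive")
    return total_bp / n_scaffolds


mean_len = mean_scaffold_length(ASSEMBLY_SIZE_BP, SCAFFOLD_COUNT)
print(f"Mean scaffold length: {mean_len:.0f} bp")
```

The result is roughly 740 bp per scaffold, consistent with a highly fragmented short-read assembly.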
The assembled draft genome of E. kamerunicus was used to identify simple sequence repeat (SSR) or microsatellite markers. In this dataset, we report 4,396 perfect SSRs (pSSRs), 3 compound SSRs (cSSRs), 251,377 imperfect SSRs (iSSRs), and 25,892 variable number tandem repeats (VNTRs) in the E. kamerunicus genome. The annotation files of pSSRs, cSSRs, iSSRs, and VNTRs are provided in Supplementary files S1-S4, respectively. The distribution of pSSRs by motif and by sequence length is shown in Figs. 1 and 2, respectively. Table 2 provides detailed information on the repetitive elements detected in this assembled genome. The annotation data of repetitive elements can be found in Supplementary file S5. The functional gene annotation pipeline predicted 26,566 genes, of which 14,145 were assigned GO terms. The GO term classification and distribution are shown in Fig. 3. The genome annotation data can be found in Supplementary file S6.
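The category counts above can be cross-checked against the total number of putative SSR markers given in the abstract. The following minimal sketch does this arithmetic, assuming the reported figures are whole-number counts; it also derives the share of predicted genes that carry a GO term. The variable names are illustrative only.

```python
# Marker counts per category, as reported in the text.
ssr_counts = {
    "pSSR": 4_396,     # perfect SSRs
    "cSSR": 3,         # compound SSRs
    "iSSR": 251_377,   # imperfect SSRs
    "VNTR": 25_892,    # variable number tandem repeats
}

# The four categories should add up to the total quoted in the abstract.
total_markers = sum(ssr_counts.values())
assert total_markers == 281_668

# Share of each category among all putative markers.
for name, count in ssr_counts.items():
    print(f"{name}: {count} ({count / total_markers:.2%})")

# Fraction of predicted genes with at least one GO term assigned.
predicted_genes = 26_566
genes_with_go = 14_145
go_fraction = genes_with_go / predicted_genes
print(f"Genes with GO terms: {go_fraction:.1%}")
```

The four categories sum to the reported total, imperfect SSRs account for the large majority of markers, and a little over half of the predicted genes have a GO term.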

Sample collection and sequencing
Elaeidobius kamerunicus samples were captured from oil palm female inflorescences during anthesis. All samples were collected from an oil palm plantation of PT. Gunung Sejahtera Ibu Pertiwi, Kalimantan Tengah, Indonesia, at the geospatial coordinates 2°25′28.6″S, 111°47′26.8″E. The samples were then identified based on their morphological characteristics. One male E. kamerunicus (monoisolate) was then selected for genome sequencing.
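Mapping tools and sample metadata forms often expect decimal degrees rather than degrees, minutes, and seconds. The sketch below converts the sampling site coordinates, assuming the three figures given per axis are degrees, minutes, and seconds; the helper name is hypothetical.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float,
                   hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and west hemispheres are returned as negative values.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Sampling site as read from the text (assumed to be in DMS notation).
latitude = dms_to_decimal(2, 25, 28.6, "S")
longitude = dms_to_decimal(111, 47, 26.8, "E")
print(f"{latitude:.6f}, {longitude:.6f}")
```

Under that reading, the site lies at about 2.4246 degrees south and 111.7908 degrees east.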
Total genomic DNA was extracted using the gSYNC™ DNA Extraction Kit (Geneaid) following the manufacturer's instructions. The quantity and quality of genomic DNA were measured using a NanoDrop spectrophotometer (Thermo Fisher Scientific) and a Qubit fluorometer (Invitrogen), followed by visualization on a 0.8% agarose gel.
The NGS libraries were prepared using the Nextera XT library prep kit, and their quality and quantity were determined using an Agilent TapeStation 4200 (Agilent), a Qubit fluorometer (Invitrogen), and an ABI 7500 Fast System qPCR instrument (Applied Biosystems). Libraries of about 350-600 bp were used for sequencing. Four paired-end libraries were sequenced on the Illumina NextSeq 500 platform.

Genome assembly and evaluation
The quality of the reads was assessed with FastQC v. 0.11.2 [6]. For genome assembly, each base of a read was required to reach a sequencing quality of Q30 (Phred scale). The raw reads were trimmed using Trimmomatic v. 0.17 [7]. K-mer length estimation for genome assembly was conducted using KmerGenie [8]. Paired and unpaired high-quality reads were used as input to the SPAdes v. 3.10.1 genome assembler [9] with the following options: --careful -k 17,19,21,23,25,31,33,35,37,41,43,45,47,51,53,55,57,61. Scaffolds shorter than 200 bp were removed manually. Foreign DNA contaminants, such as residual adapter/vector sequences and organellar DNA, were removed during submission to the NCBI GenBank database. The genome assembly statistics were obtained using QUAST [10]. The completeness of the E. kamerunicus genome assembly was evaluated using BUSCO v. 3 [11] against the Arthropoda database (odb9), which consists of 1,066 orthologs constructed from 60 species.
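The text states that scaffolds shorter than 200 bp were removed manually. As an illustration only, the same length filter could be scripted as in the sketch below; this is not the procedure the authors used. The function names and the toy sequences are hypothetical, and the parser assumes plain FASTA input.

```python
from typing import Iterable, Iterator, Tuple

MIN_LENGTH = 200  # scaffolds shorter than this are dropped, per the text

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_scaffolds(records: Iterable[Record],
                     min_length: int = MIN_LENGTH) -> Iterator[Record]:
    """Keep only records whose sequence is at least min_length bases."""
    for header, sequence in records:
        if len(sequence) >= min_length:
            yield header, sequence


# Toy input: one 40 bp scaffold and one 240 bp scaffold split over two lines.
example = [
    ">short_scaffold", "ACGT" * 10,
    ">long_scaffold", "ACGT" * 30, "ACGT" * 30,
]
kept = list(filter_scaffolds(read_fasta(example)))
print([header for header, _ in kept])
```

For a real assembly, the same functions could be applied to an open file handle in place of the example list; a scaffold of exactly 200 bp is retained, matching the "< 200 bp" removal rule.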

Repeat identification, masking, and genome annotation
Repetitive content, such as transposable elements, retroelements, and total interspersed repeats, was detected using RepeatMasker v. 4.1 [13] with default parameters and the insect Dfam repeat database [14]. The repeat-masked scaffold sequences were then subjected to functional gene annotation. Functional annotation and gene ontology (GO) mapping of the final set of predicted protein sequences were carried out using OmicsBox v. 1.3.11 [15,16].

Ethics Statement
Not applicable. No ethics protocols are required for Coleoptera in Indonesia.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.