Draft genome dataset of Tapinoma indicum (Forel) (Hymenoptera: Formicidae) in Penang Island, Malaysia

Abstract
Tapinoma indicum is a household pest that is widely distributed in Asian countries. It is known as a nuisance pest that causes annoyance and disturbance by nesting and foraging in buildings for food and water. This article documents the draft genome dataset of T. indicum collected on Penang Island, Malaysia, generated by next-generation sequencing on the Illumina platform. It presents the 150 bp paired-end genome dataset and the quality of the sequencing results. This dataset provides information for further understanding of T. indicum at the molecular level and an opportunity to develop novel methods for pest control and management. The dataset is available in the Sequence Read Archive (SRA) database under accession number SRR10848807.
© 2020 The Author(s).

Value of the data
• This is the first sequenced draft genome dataset of Tapinoma indicum.
• T. indicum is one of the major nuisance pests widely distributed in Asian countries.
• The T. indicum draft genome data could be used for microsatellite marker design.
• Further studies could potentially develop novel pest control and management approaches based on the genetic diversity of the pest (T. indicum).
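As an illustration of the microsatellite use case, the sketch below scans a DNA sequence (a read or an assembled contig) for perfect simple sequence repeats (SSRs), the usual starting point for marker design. It is a minimal Python sketch and not the authors' method: the function name `find_microsatellites` and its thresholds (2 to 6 bp motifs, at least five tandem copies) are hypothetical choices, and dedicated SSR-mining tools apply more refined rules for compound and imperfect repeats.

```python
import re


def find_microsatellites(seq: str, min_unit: int = 2, max_unit: int = 6,
                         min_repeats: int = 5) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    Hypothetical thresholds: motifs of min_unit..max_unit bp repeated
    at least min_repeats times in tandem.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # A unit-length motif followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "ACAC"), so a dinucleotide SSR is not reported twice.
            if re.fullmatch(r"(.+?)\1+", motif):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)
```

For example, `find_microsatellites("GGG" + "AC" * 6 + "TTT")` returns `[(3, 'AC', 6)]`: one dinucleotide repeat starting at position 3 with six copies. Flanking sequence around each hit would then be the region in which PCR primers are designed.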

Data description
The dataset described in this article is the whole-genome paired-end sequencing result of BioSample SAMN13707189 under BioProject PRJNA598521. It is deposited in the Sequence Read Archive (SRA) database under accession number SRR10848807. The dataset comprises two high-throughput sequencing FASTQ files. Base quality scores fall between Q30 and Q40, where Q30 corresponds to a base-call accuracy of 99.9% and Q40 to a base-call accuracy of 99.99% (Fig. 1). The per-base error rate along the read positions is below 0.08% (Fig. 2). The overall GC content is 40.98% (Fig. 3). Of the 16,363,685 raw reads, 99.72% are clean reads, 0.27% contain adapter sequences, and 0.01% contain N (undetermined) bases (Fig. 4).

The forward adapter sequence is 5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′ and the reverse adapter sequence is 5′-GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG-3′.
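The quality metrics above follow standard definitions: a Phred score Q corresponds to an error probability P = 10^(−Q/10), so Q30 means a 1-in-1,000 chance of a wrong base call and Q40 a 1-in-10,000 chance. The Python sketch below shows how these metrics, the GC content, and the clean/adapter/N read categories could be computed from FASTQ records. It is illustrative only and not the pipeline that produced Figs. 1 to 4: it assumes Phred+33 quality encoding, all function names are hypothetical, and its adapter check (an exact search for the first 12 bases of the reverse adapter listed above) is a deliberate simplification of the alignment-based matching done by dedicated trimming tools.

```python
from collections import Counter
from collections.abc import Iterable

# Assumption: Phred+33 (Sanger / Illumina 1.8+) ASCII quality encoding.
PHRED_OFFSET = 33
# First 12 bases of the reverse adapter reported for this dataset, used as a
# simplified exact-match seed for adapter detection (hypothetical choice).
ADAPTER_SEED = "GATCGGAAGAGC"


def phred_to_error(q: float) -> float:
    """Convert a Phred score Q to an error probability: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_quality(qual: str) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def gc_content(seqs: Iterable[str]) -> float:
    """Fraction of G and C among all bases (N bases count toward the total)."""
    gc = total = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        total += len(s)
    return gc / total if total else 0.0


def classify_read(seq: str) -> str:
    """Assign a read to one category; adapter hits are checked before N bases."""
    s = seq.upper()
    if ADAPTER_SEED in s:
        return "adapter"
    if "N" in s:
        return "contains_N"
    return "clean"


def category_percentages(seqs: Iterable[str]) -> dict[str, float]:
    """Percentage of reads falling into each category."""
    counts = Counter(classify_read(s) for s in seqs)
    n = sum(counts.values())
    return {cat: 100 * k / n for cat, k in counts.items()} if n else {}
```

Under Phred+33, the quality character `?` decodes to Q30 and `I` to Q40, which bracket the range reported in Fig. 1.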

Sampling and DNA extraction
Tapinoma indicum workers were collected by baiting with peanut butter and honey [1]. The baits were left at the chosen baiting location for 3 h. After collection, the ants were immediately freeze-killed and stored in 95% ethanol at −20 °C.

Genomic DNA (gDNA) was extracted from a total of five T. indicum workers. Their abdomens were removed before extraction to minimize the risk of DNA contamination by the gut microbiome [2]. The gDNA was extracted using the HiYield Plus™ Genomic DNA Mini Kit (Blood/Tissue/Cultured Cells) (Real Biotech Corp., Taipei, Taiwan) according to the manufacturer's instructions, with one minor modification: the elution step was performed twice to maximize DNA yield. The head and thorax tissues were vortexed in lysis buffer with Proteinase K and incubated at 60 °C for 1 h. After the DNA was bound to the filter column and given an ethanol wash, it was eluted twice with 50 μl of elution buffer, giving a total of 100 μl of gDNA solution [3]. The extracted gDNA was quantified using a NanoDrop 2000c Spectrophotometer (Thermo Fisher Scientific, Massachusetts, US).
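Spectrophotometric quantification of this kind rests on simple arithmetic: for double-stranded DNA, an absorbance at 260 nm (A260) of 1.0 over a 10 mm path corresponds to roughly 50 ng/μl. The sketch below converts a reading into concentration and total yield in the 100 μl eluate, and into the eluate volume needed for the 1.0 μg input used for library preparation. The absorbance values are hypothetical because the article does not report the measured concentration, the function names are illustrative, and the A260/A280 window of 1.8 to 2.0 is a common rule of thumb rather than a criterion stated by the authors.

```python
# Standard conversion for double-stranded DNA: A260 = 1.0 (10 mm path) ~ 50 ng/ul.
DSDNA_NG_PER_UL_PER_A260 = 50.0
ELUATE_VOLUME_UL = 100.0   # two 50 ul elutions, as described in the text
LIBRARY_INPUT_NG = 1000.0  # 1.0 ug gDNA used for library preparation


def dsdna_concentration(a260: float) -> float:
    """Concentration (ng/ul) from a path-length-normalised A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260


def total_yield_ng(conc_ng_per_ul: float, volume_ul: float = ELUATE_VOLUME_UL) -> float:
    """Total DNA mass (ng) in the eluate."""
    return conc_ng_per_ul * volume_ul


def volume_for_input(conc_ng_per_ul: float, target_ng: float = LIBRARY_INPUT_NG) -> float:
    """Eluate volume (ul) needed to supply the library input mass."""
    return target_ng / conc_ng_per_ul


def purity_ok(a260: float, a280: float, low: float = 1.8, high: float = 2.0) -> bool:
    """Rule-of-thumb purity check on the A260/A280 ratio (assumed window)."""
    return low <= a260 / a280 <= high


if __name__ == "__main__":
    # Hypothetical reading, for illustration only.
    a260, a280 = 0.5, 0.27
    conc = dsdna_concentration(a260)
    print(f"{conc:.1f} ng/ul, {total_yield_ng(conc):.0f} ng total, "
          f"{volume_for_input(conc):.1f} ul for 1.0 ug, purity ok: {purity_ok(a260, a280)}")
```

With the hypothetical A260 of 0.5, the eluate would hold 25 ng/μl (2.5 μg in 100 μl), so 40 μl would supply the 1.0 μg library input.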

Library preparation and sequencing
The sequencing library was generated using the NEBNext® DNA Library Prep Kit (New England Biolabs, Ipswich, Massachusetts, US) following the manufacturer's recommendations. A total of 1.0 μg of gDNA was randomly sheared into fragments of approximately 350 bp. The fragments were end-repaired, dA-tailed, ligated to NEBNext adapters for Illumina sequencing, and PCR-amplified using P5 and indexed P7 oligos. After the PCR products were purified with the AMPure XP system (Beckman Coulter, Indianapolis, US), the libraries were analysed for size distribution using an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, US) and quantified by real-time PCR. The qualified libraries were pooled and sequenced on an Illumina HiSeq 2000 platform to generate 150 bp paired-end reads.
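The fragment and read lengths above determine how the two mates of each pair relate. The sketch below computes the unsequenced gap between mates (insert size minus twice the read length), flags inserts short enough for reads to run into the adapter, and estimates total sequenced bases and mean depth as N × L / G. The function names are hypothetical. The article does not state whether the 16,363,685 reads in the Data description are single reads or read pairs, and it reports no genome size for T. indicum, so any `genome_bp` value passed to `mean_depth` is an assumption.

```python
def inner_distance(insert_bp: int, read_len: int) -> int:
    """Gap between paired reads; a negative value means the mates overlap."""
    return insert_bp - 2 * read_len


def adapter_read_through(insert_bp: int, read_len: int) -> bool:
    """A read extends into the adapter when the insert is shorter than the read."""
    return insert_bp < read_len


def total_bases(n_reads: int, read_len: int) -> int:
    """Sequenced bases, assuming every read has the full nominal length."""
    return n_reads * read_len


def mean_depth(n_reads: int, read_len: int, genome_bp: int) -> float:
    """Mean coverage C = N * L / G (Lander-Waterman)."""
    return total_bases(n_reads, read_len) / genome_bp


if __name__ == "__main__":
    print(inner_distance(350, 150))            # 50 bp gap between mates
    print(adapter_read_through(350, 150))      # False for the nominal insert
    print(total_bases(16_363_685, 150))        # if the count refers to single reads
```

With nominal 350 bp fragments and 150 bp reads, the mates are separated by about 50 bp and do not overlap. Because sheared fragment lengths vary around the target, the shortest fragments in the distribution may still cause adapter read-through, which is one way adapter-containing reads such as those counted in Fig. 4 can arise.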