The mitogenome data of Holothuria (Mertensiothuria) leucospilota (Brandt, 1835) from Malaysia

White threads fish, Holothuria (Mertensiothuria) leucospilota (Brandt, 1835), locally known as bat puntil, is a neritic marine species widely distributed in the Indo-Pacific. The species plays important roles in ecosystem services and has been found to contain many bioactive compounds of medicinal value. However, despite its abundance in Malaysian waters, there are still no records of the H. leucospilota mitochondrial genome (mitogenome) from Malaysia. Here, we present the mitogenome of H. leucospilota from Sedili Kechil, Kota Tinggi, Johor, Malaysia. Whole-genome sequencing was performed on an Illumina NovaSeq 6000 system, and the mitochondrial-derived contigs were assembled de novo. The mitogenome is 15,982 bp long and contains 13 protein-coding genes (PCGs), 21 transfer RNAs and 2 ribosomal RNAs. The overall nucleotide composition was estimated at 25.8% T, 25.9% C, 31.8% A and 16.5% G (A + T content of 57.6%). Maximum likelihood phylogenetic analysis of the mitochondrial PCG sequences showed that our H. leucospilota is most closely related to H. leucospilota MK940237 and H. leucospilota MN594790, followed by H. leucospilota MN276190, and that H. leucospilota forms a sister group with H. hilla (MN163001), known as the tiger tail sea cucumber. This mitogenome will be valuable for genetic research, as a mitogenome reference, and for the future conservation management of sea cucumbers in Malaysia. The mitogenome data of H. leucospilota from Sedili Kechil, Kota Tinggi, Johor, Malaysia are available in the GenBank database under accession number ON584426.



Value of the Data
• This data provides the mitogenome sequence of H. leucospilota from Malaysia, which will be valuable for species identification, molecular taxonomy, species conservation, DNA barcoding and phylogenetics of Malaysian sea cucumbers.
• This data can be applied in environmental DNA (eDNA) metabarcoding for non-invasive biodiversity monitoring of ecosystems.
• This data provides sequences that can be used for partial gene identification and comparison, helping researchers resolve both taxonomic issues and product mislabelling in Malaysian sea cucumber markets.
• This data provides PCGs that are useful for phylogenetic tree construction, giving higher statistical confidence and better resolution than partial gene sequences.
• This data updates and improves the genetic documentation of H. leucospilota in Malaysia, as well as in public genetic database repositories.

Objective
In Malaysia, H. leucospilota (Phylum Echinodermata; Class Holothuroidea; Order Aspidochirotida) is known as bat puntil, balat hitam, bat hitam [6] or patola [7]. The species is currently listed as 'Least Concern' on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species and is considered a low-value species in markets [8]. However, previous reports indicate that it is often vulnerable to overexploitation once the high-value sea cucumber species in a fishing zone are depleted [7, 8], as there are few or no regulations on the species fished [9]. These issues consequently push the species toward the brink of local extinction [7]. Presently, there is still no record of an H. leucospilota mitogenome from Malaysia; the most recent H. leucospilota mitogenome records in the NCBI GenBank repository are from China [10–12]. Thus, our objective was to obtain a complete mitogenome of H. leucospilota from Sedili Kechil, Kota Tinggi, Johor, Malaysia.

Data Description
The mitogenome of H. leucospilota has a total length of 15,982 bp and encodes 13 protein-coding genes (COX1, COX2, COX3, ND4L, CYTB, ATP8, ATP6, ND1, ND2, ND3, ND4, ND5, ND6), 21 transfer RNAs and 2 ribosomal RNAs (12S rRNA and 16S rRNA) (Fig. 1). The overall nucleotide composition was estimated at T 25.8%, C 25.9%, A 31.8% and G 16.5%, with an A + T content of 57.6%. One transfer RNA gene (tRNA-Ile) was missing. The putative control region between trnT (UGU) and trnP (UGG) was also not determined, possibly because of low sequencing coverage and the difficulty of sequencing and assembling repetitive DNA regions [13, 14]. Nonetheless, all 13 PCGs were used for the phylogenetic reconstruction of sea cucumber species in this study, as PCGs provide better resolution of functional divergence and speciation [15]. Moreover, whole-mitogenome phylogenetic trees do not resolve relationships well, because transfer RNA genes evolve relatively fast, at approximately 7- to 10-fold the genome-wide average, which can disrupt phylogenetic tree construction [16, 17]. Table 1 presents the 13 PCGs of H. leucospilota together with the other genes (36 genes in total), while Table 2 shows the base composition and relative skewness (AT skew and GC skew) of the H. leucospilota mitogenome. ND6 is the only PCG encoded on the reverse strand; the other 12 PCGs are encoded on the forward strand. Most PCGs use the typical mitochondrial start codon ATG (methionine) [18], and the most common stop codon is TAA, except for ND4 and ND6, which terminate with TAG.
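Table 2 reports AT and GC skew. Assuming the commonly used definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C), the minimal Python sketch below computes base composition from a sequence and applies the skew formulas to the rounded whole-mitogenome percentages given above. The function names are illustrative, and because the inputs are rounded, the results may differ slightly from the values listed in Table 2.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, T, G and C in a nucleotide sequence (other symbols such as N are ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ATGC")
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return {b: 100.0 * counts[b] / total for b in "ATGC"}


def at_skew(a: float, t: float) -> float:
    # AT skew = (A - T) / (A + T); positive when A is more frequent than T
    return (a - t) / (a + t)


def gc_skew(g: float, c: float) -> float:
    # GC skew = (G - C) / (G + C); negative when C is more frequent than G
    return (g - c) / (g + c)


# Rounded whole-mitogenome percentages reported in the text, used as example input
reported = {"A": 31.8, "T": 25.8, "G": 16.5, "C": 25.9}
print(f"A + T = {reported['A'] + reported['T']:.1f}%")               # A + T = 57.6%
print(f"AT skew = {at_skew(reported['A'], reported['T']):.3f}")      # AT skew = 0.104
print(f"GC skew = {gc_skew(reported['G'], reported['C']):.3f}")      # GC skew = -0.222
```

With these inputs, the positive AT skew reflects the excess of A over T, and the negative GC skew reflects the excess of C over G.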
The mitogenome PCG data were compared with the PCGs of three other H. leucospilota mitogenomes from the NCBI GenBank repository using pairwise alignment in BLAST nucleotide (BLASTn; https://blast.ncbi.nlm.nih.gov/) (Table 3). The most similar sequence is from [12] (accession number: MK940237), at 99.56% identity, followed by [11] (accession number: MN594790) and [10] (accession number: MN276190), both at 99.40% identity. Maximum likelihood analysis was implemented in MEGA v11.0 [19] based on the 13 concatenated PCGs from 13 individual sea cucumber species obtained from the NCBI GenBank repository. The General Time Reversible model with invariant sites and gamma-distributed rates (GTR + I + G) was selected as the best-fit evolutionary model for the maximum likelihood phylogenetic tree.
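The concatenation of the 13 PCGs before tree building can be sketched as follows. This is a minimal Python illustration that assumes each gene has already been aligned across species (with ND6 in its coding orientation); the gene order, function names and toy sequences are hypothetical, and the paper does not describe how its concatenated MEGA input was prepared. The sketch joins per-gene alignments into one row per species, pads any gene missing from a species with gaps, and records the partition boundaries of each gene.

```python
# Hypothetical fixed gene order (the order in which the PCGs are listed in the text)
GENE_ORDER = ["COX1", "COX2", "COX3", "ND4L", "CYTB", "ATP8", "ATP6",
              "ND1", "ND2", "ND3", "ND4", "ND5", "ND6"]


def concatenate_pcgs(alignments, gene_order=GENE_ORDER, gap="-"):
    """Join per-gene alignments into one supermatrix row per species.

    alignments: {gene: {species: aligned_sequence}}. Within a gene, all sequences
    must already share one aligned length. Species lacking a gene get a gap block.
    Returns (matrix, partitions), where partitions holds 1-based (gene, start, end).
    """
    species = sorted({sp for gene in gene_order for sp in alignments[gene]})
    rows = {sp: [] for sp in species}
    partitions = []
    start = 1
    for gene in gene_order:
        block = alignments[gene]
        lengths = {len(seq) for seq in block.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one common length")
        length = lengths.pop()
        for sp in species:
            rows[sp].append(block.get(sp, gap * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {sp: "".join(parts) for sp, parts in rows.items()}, partitions


def to_fasta(matrix):
    """Serialise the concatenated matrix as FASTA text."""
    return "".join(f">{sp}\n{seq}\n" for sp, seq in matrix.items())


# Toy example with two short hypothetical "genes"
toy = {
    "COX1": {"sp_A": "ATGCC", "sp_B": "ATGCT"},
    "CYTB": {"sp_A": "ATG-A", "sp_B": "ATGTA", "sp_C": "ATGTA"},
}
matrix, parts = concatenate_pcgs(toy, gene_order=["COX1", "CYTB"])
print(to_fasta(matrix), end="")
print(parts)  # [('COX1', 1, 5), ('CYTB', 6, 10)]
```

Keeping a fixed gene order and recording partition coordinates makes it straightforward to check that every species row has the same length before it is passed to a phylogenetics program.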
In the maximum likelihood phylogenetic tree, H. leucospilota from Sedili Kechil clusters with H. leucospilota from accession number MK940237 [12] and H. leucospilota from accession number MN594790 [11], followed by H. leucospilota from accession number MN276190 [10], which forms an independent branch slightly distinct from the other individuals of the same species. H. leucospilota is the sister group to H. hilla (accession number: MN163001), known as the tiger tail sea cucumber, and clusters with other species of the order Holothuriida in a monophyletic clade (Fig. 2).

Specimen sampling and DNA extraction
The individual specimen (BioSample number: SAMN27554787 [3]) was collected from the intertidal zone of Tanjung Sedili Beach during low tide in November 2021 (latitude 1.82611°N, longitude 104.15869°E) (Fig. 3). The occurrence of H. leucospilota at this locality was confirmed by reference to a previous article [20], and the specimen was identified by its characteristic features and behaviour: an entirely black, cylindrical, elongated snake-like body, moderately tapered at the anterior and posterior ends but broader in the posterior half, and a mouth with 20 peltate tentacles [9, 10]. Under stress, the specimen expelled white sticky threads (Cuvierian tubules) and internal organs through the anal opening. The specimen was anaesthetized in a 5% MgSO4 solution prepared with seawater, then preserved in 95% ethanol and stored at 4 °C with proper labelling. Total genomic DNA (gDNA) was isolated from the muscle tissue of the specimen using the FavorPrep™ Tissue Genomic DNA Extraction Mini Kit (Favorgen, Taiwan) according to the manufacturer's instructions with minor modifications. The quantity and quality of the extracted gDNA were verified with a NanoPhotometer® N50 Touch (IMPLEN, Germany) and by 2% (w/v) horizontal agarose gel electrophoresis (Bio-Rad).

Library preparation and mitogenome assembly
For library preparation, approximately 100 ng of DNA was fragmented to 350 bp using a Bioruptor, followed by NEB Ultra II library preparation (NEB, Ipswich, MA) according to the manufacturer's instructions. Whole-genome sequencing was performed on an Illumina NovaSeq 6000 (San Diego, CA) with a 2 × 150 bp run configuration, generating approximately 1 Gb of data per sample. The raw data were deposited in the NCBI Sequence Read Archive (SRA) under accession number SRS12836453 [1]. The raw reads were then quality-checked and trimmed with fastp v0.21 [21], which removed low-quality bases and Illumina adapter sequences to yield clean data. The trimmed reads were assembled de novo into contigs with MEGAHIT using default settings [22]. The mitochondrial-derived contigs were identified, circularised and annotated using MitoZ [23].
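The trimming and assembly steps can be outlined as command lines. The Python sketch below only builds and prints the fastp and MEGAHIT commands and does not run them. The file and directory names are hypothetical, and the fastp options shown (paired-end input and output, adapter auto-detection, HTML/JSON reports) are an assumed reasonable configuration, because the exact parameters used are not reported. MitoZ is left as a comment since its command-line interface differs between versions. The sketch also converts the reported ~1 Gb at 2 × 150 bp into an approximate number of read pairs.

```python
import shlex
from pathlib import Path


def approx_read_pairs(total_bases: float, read_length: int = 150) -> int:
    """Approximate read pairs for a paired-end run: each pair yields 2 x read_length bases."""
    return round(total_bases / (2 * read_length))


def build_commands(r1: str, r2: str, workdir: str = "hleu_run") -> list:
    """Return the fastp trimming and MEGAHIT assembly commands as argument lists.

    File names are hypothetical. The work directory must exist before fastp runs;
    MEGAHIT creates its own output directory and refuses to reuse an existing one.
    """
    work = Path(workdir)
    clean1 = work / "clean_R1.fastq.gz"
    clean2 = work / "clean_R2.fastq.gz"
    fastp_cmd = [
        "fastp",
        "-i", r1, "-I", r2,                      # raw paired-end reads
        "-o", str(clean1), "-O", str(clean2),    # trimmed reads
        "--detect_adapter_for_pe",               # detect adapters in paired-end mode
        "-h", str(work / "fastp.html"),          # HTML quality report
        "-j", str(work / "fastp.json"),          # JSON quality report
    ]
    # MEGAHIT with default assembly settings, as described in the text
    megahit_cmd = [
        "megahit", "-1", str(clean1), "-2", str(clean2),
        "-o", str(work / "megahit_out"),
    ]
    return [fastp_cmd, megahit_cmd]


if __name__ == "__main__":
    print(f"~{approx_read_pairs(1e9):,} read pairs for 1 Gb at 2 x 150 bp")
    for cmd in build_commands("raw_R1.fastq.gz", "raw_R2.fastq.gz"):
        print(shlex.join(cmd))
    # Next step (not shown): pass megahit_out/final.contigs.fa to MitoZ to identify,
    # circularise and annotate mitochondrial contigs; its subcommands and options
    # depend on the installed MitoZ version.
```

Building the commands as argument lists rather than shell strings keeps file names with spaces safe and makes it easy to pass each list to `subprocess.run` once the inputs are in place.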

Ethics Statements
The experiment complied with the ARRIVE guidelines and was carried out in accordance with the U.K. Animals (Scientific Procedures) Act, 1986 and associated guidelines; EU Directive 2010/63/EU for animal experiments; or the National Institutes of Health guide for the care and use of laboratory animals (NIH Publications No. 8023, revised 1978).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
The