The first complete mitochondrial genome data of Hippocampus kuda originating from Malaysia

The population of the spotted seahorse, Hippocampus kuda, is decreasing exponentially worldwide owing to habitat loss driven by massive coastal urbanization and to heavy exploitation for Chinese herbal medicine. Genomic data would be highly useful for improving the biomonitoring of seahorse populations in Malaysia through non-invasive approaches such as water environmental DNA. Here we report the first complete mitogenomes of two H. kuda individuals originating from Malaysia, generated using a BGISEQ-500RS sequencer. Both mitogenomes are 16,529 bp long and consist of 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and a control region. The overall base composition was 32.46% A, 29.40% T, 14.73% G and 23.41% C, showing an AT-rich feature (61.86%). The gene organization of the Malaysian H. kuda mitogenomes was similar to that of most teleost species. A phylogenetic analysis of the mitogenomes against mtDNA data from other Hippocampus species showed that the Malaysian H. kuda samples clustered with H. capensis, H. reidi and H. kuda. Notably, however, analysis of the data using BLASTn revealed 99.18% similarity to H. capensis and only 97.66% to H. kuda and H. reidi, all of which are part of the unresolved H. kuda complex. The mitogenomes are deposited in GenBank under the accession numbers MT221436 (HK1) and MT221436 (HK2).



Value of the Data
• The mitogenomes will be useful for monitoring H. kuda using a water environmental DNA approach.
• The data generated will be useful for resolving the H. kuda complex in phylogenetic, population and evolutionary studies.
• The data will contribute to our understanding of any adaptive introgression that takes place within the H. kuda clade.

Data Description
The spotted seahorse, Hippocampus kuda Bleeker, 1852a, is regarded as a species complex owing to its exceptionally wide distribution around the world [2]. Within the genus Hippocampus, this species is decreasing due to overexploitation for its alleged medicinal properties [3]. Globally, seahorse populations are suffering an exponential decline due to anthropogenic and environmental pressures that threaten their survival [3]. Massive development of coastal areas in Malaysia for mega urbanization projects, in regions that serve as the species' natural habitat, is also a clear threat to its populations [4]. Currently, this species is listed as Vulnerable on the IUCN Red List of Threatened Species [5].
Here, we provide the Malaysian H. kuda mitogenomes, each 16,529 bp in length. The data for each individual are presented in Table 1. The representative complete mitogenome map in Fig. 1 shows a gene arrangement similar to that of other seahorse mitogenomes [6], containing 37 genes: 13 protein-coding genes (PCGs), 22 tRNA genes and two rRNA genes, together with a non-coding A + T rich control region (D-loop). The total length of the 13 PCGs is 11,319 bp, encoding 3773 amino acids. The overall base composition is estimated to be 32.46% A, 29.40% T, 14.73% G and 23.41% C, indicating an obvious AT-rich feature (61.86%). NAD6 and eight tRNA genes are encoded on the light (L) strand, while the remaining mitochondrial genes are encoded on the heavy (H) strand (Table 2, supplementary data 1).
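The composition figures above follow from simple base counting. The sketch below is a minimal illustration, not the authors' script: the function names and the toy sequence are hypothetical, and only the reported percentages and the PCG length are taken from the text. It also checks that the reported A and T percentages sum to the stated AT content and that 11,319 bp corresponds to 3773 codon positions.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, T, G and C among unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ATGC")
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return {base: 100.0 * counts[base] / total for base in "ATGC"}


def at_content(composition: dict[str, float]) -> float:
    """AT content is the sum of the A and T percentages."""
    return composition["A"] + composition["T"]


# Toy sequence for illustration only (hypothetical, not from the dataset).
toy_composition = base_composition("ATGCATTAAC")
toy_at = at_content(toy_composition)

# Consistency checks on the values reported in the text.
reported = {"A": 32.46, "T": 29.40, "G": 14.73, "C": 23.41}
reported_at = round(at_content(reported), 2)   # compare with 61.86
pcg_codons, pcg_remainder = divmod(11319, 3)   # compare with 3773 and 0
```

Applied to a full mitogenome, `base_composition` would take the sequence string read from the deposited FASTA record.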
A phylogenetic tree of all available Hippocampus mitogenomes was also constructed (Fig. 2). In total, we included the eighteen Hippocampus species available in GenBank along with both Malaysian H. kuda mitogenomes generated in this work. The mitogenomes were first aligned using MUSCLE [7], after which a phylogenetic tree was constructed using the neighbor-joining (NJ) method. We also compared the mitogenomes against GenBank using BLASTn and found that the closest match for both Malaysian H. kuda mitogenomes, at 99.18% similarity, was an H. capensis sample (NC_042791.1) collected from the Bozhou Chinese herbal medicine market (Bozhou, China) [8]. The next closest match, at 97.66%, was the sole H. kuda mitogenome currently available in GenBank (NC_010272.1), from a sample originating from the Vancouver Aquarium, Canada [9]. A similar match was also found to an H. reidi sample (NC_027931.1) [10]. Interestingly, neither H. capensis nor H. reidi is found in the Malaysian region. However, it is worth noting that there is an ongoing debate about these species being associated with the unresolved 'H. kuda clade'. Due to their large distribution, these species exhibit localized haplotypes, phylogeographic structuring, and variable morphology [2]. These findings underscore the need for future studies using nuclear DNA (nuDNA) to fully resolve the relationships among these Hippocampus species.
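The similarity percentages above are of the kind BLASTn reports as percent identity: identical columns divided by the total number of alignment columns, with gap columns counting as differences. The sketch below illustrates that calculation on two already-aligned sequences. It is an assumption-laden illustration, not a reimplementation of BLASTn; the function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap are skipped; a gap opposite
    a base counts as a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information for this pair
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / columns


# Toy alignment (hypothetical): 6 identical columns out of 8.
toy_identity = percent_identity("ATGC-TTA", "ATGCATAA")
```

The complementary quantity, the proportion of differing sites, is one simple distance measure that a neighbor-joining analysis can take as input; the substitution model used for the tree is not stated in the text.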

Biological samples
Two individuals of Hippocampus kuda were caught incidentally as bycatch at Pulai River, Johor, Malaysia (1°22′59.99″ N, 103°31′59.99″ E), and identified based on their morphometric features [2]. Tissue samples of the two individuals, H. kuda (HK1) and H. kuda (HK2), were collected from the tips of the tails. Genomic DNA was extracted using the Qiagen Blood and Tissue Kit (Qiagen, Valencia, CA). The DNA was then fragmented to 300-400 bp using an M220 Focused-ultrasonicator (Covaris, USA) [11], and BGISeq-compatible shotgun sequencing libraries were built using the Blunt-End-Single-Tube (BEST) library protocol [12]. Quantitative PCR was performed before index PCR to ensure that the libraries were not over-amplified prior to sequencing. Each library was purified using Solid-Phase Reversible Immobilization (SPRI) bead solution. The quantity and quality of the generated libraries were assessed using a Qubit 2.0 Fluorometer (Invitrogen, Merelbeke, Belgium) and an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, USA). The libraries were pooled in equimolar amounts with 15 other libraries (not related to this work). The pooled libraries were then sent for shotgun sequencing on the BGISEQ-500 platform in 100 bp paired-end mode (PE100) (BGI, Shenzhen, China). The data generated were first demultiplexed by index prior to mitogenome construction.

Fig. 2. Phylogenetic tree of the Hippocampus genus constructed with the combined protein-coding gene nucleotide sequences using MEGAX [24]. The tree was generated with the NJ method using pipefish as an outgroup. Bootstrap values were generated from 1000 replicates for the NJ analysis. The number at each node indicates the bootstrap probability of the NJ analysis.
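The equimolar pooling step described in this section can be illustrated with the commonly used conversion from mass concentration to molarity for double-stranded DNA, which assumes an average mass of about 660 g/mol per base pair. The sketch below is not taken from the authors' protocol: the function names, library concentrations, fragment lengths and target amount are all hypothetical.

```python
def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA concentration (ng/uL) to molarity (nM).

    Assumes an average mass of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def equimolar_volumes(
    libraries: dict[str, tuple[float, float]], fmol_per_library: float
) -> dict[str, float]:
    """Volume (uL) of each library that contributes the same molar amount.

    `libraries` maps a name to (concentration in ng/uL, mean fragment bp).
    1 nM equals 1 fmol/uL, so volume = target fmol / molarity in nM.
    """
    return {
        name: fmol_per_library / library_molarity_nM(conc, length)
        for name, (conc, length) in libraries.items()
    }


# Hypothetical Qubit and Bioanalyzer readings, for illustration only.
example_libraries = {"HK1": (6.6, 400.0), "HK2": (3.3, 400.0)}
example_volumes = equimolar_volumes(example_libraries, fmol_per_library=50.0)
```

With these made-up readings the more dilute library needs twice the volume of the more concentrated one to contribute the same number of molecules to the pool.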

Complete mitogenome generation
The quality of the raw reads was verified using the FastQC program (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The raw reads were trimmed of sequencing adapters, low-quality stretches, and leading/trailing Ns using AdapterRemoval v2.2.2 [13]. Forward and reverse reads were interleaved into a single file prior to assembly. The assembly of H. kuda (HK1) and H. kuda (HK2) was conducted using MITObim v1.8 [14] (default k-mer size of 31), which performs reference-based assemblies using MIRA iterations [15]. The reference sequence used for the assembly was H. kuda from the Vancouver Aquarium, Canada (GenBank accession number: NC_010272.1). Next, we used the PALEOMIX v1.2.6 BAM pipeline [16] with default parameters to remove reads shorter than 25 bp after trimming. The trimmed reads were aligned using the Burrows-Wheeler Aligner [17] against the newly assembled mitogenome constructed by MITObim. Alignments showing PCR duplicates and low quality scores were then filtered using the MarkDuplicates program from Picard tools [18]. Next, the IndelRealigner tool from the Genome Analysis Toolkit (GATK) [19] was used to locally realign the reads around small insertions and deletions (indels) in order to improve overall genome quality. After the analysis, the sequencing statistics for each individual were generated, as displayed in Table 1. Tablet software [20] was used to manually check the indels and read coverage along the assembled mitogenomes. The mitogenomes were annotated using MitoAnnotator [21] and the GB2sequin annotation web application [22]. The circular mitochondrial genome map was drawn using OGDRAW [23] (Fig. 1).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.