The complete mitochondrial genome data of the Common Rose butterfly, Pachliopta aristolochiae (Lepidoptera, Papilionoidea, Papilionidae) from Malaysia

Here, we present the complete mitochondrial genome of Pachliopta aristolochiae, the Common Rose butterfly, from Malaysia. The sequence was generated using the Illumina NovaSeq 6000 sequencing platform. The mitogenome is 15,235 bp long and consists of 13 protein-coding genes, 22 transfer RNAs, two ribosomal RNAs, and two D-loop regions. The total AT content was 81.6%, with a base composition of A (39.3%), T (42.3%), C (11.0%) and G (7.3%). The gene order of the three tRNAs was trnM-trnI-trnQ, which differs from the ancestral insect gene order trnI-trnQ-trnM. Phylogenetic analysis showed that the P. aristolochiae sequenced here is closely related to Losaria neptunus (NC_037868), with high support in both the maximum-likelihood (ML) and Bayesian inference (BI) analyses. The data presented in this work provide a useful resource for researchers studying the phylogenetic relationships of Lepidoptera and the diversification of Pachliopta species. In addition, because P. aristolochiae is a bioindicator species, these data can be used to assess environmental changes in terrestrial and aquatic ecosystems via environmental DNA approaches. The mitogenome of P. aristolochiae is available in GenBank under the accession number MZ781228.



Value of the Data
• The mitochondrial genome of the Common Rose butterfly, Pachliopta aristolochiae, sequenced here represents a Pachliopta specimen originating from Malaysia.
• Because P. aristolochiae is a bioindicator species, these mitogenome data can be used to assess environmental changes in terrestrial and aquatic ecosystems via environmental DNA approaches.
• The additional P. aristolochiae mitogenome data also provide information that other researchers can use to study the phylogenetic relationships of Lepidoptera and the diversification of Pachliopta species in greater depth.

Data Description
The mitogenome of the Common Rose butterfly, Pachliopta aristolochiae, is a circular DNA molecule with a total length of 15,235 bp (Fig. 1). Table 1 shows the summary statistics for the sequence reads. The mitogenome encodes 13 protein-coding genes (PCGs), 22 transfer RNAs, two ribosomal RNAs, and two D-loop regions (Table 2). The tRNA gene order between the D-loop and NAD2 is trnM-trnI-trnQ, which has been observed in most Lepidoptera mitogenomes but differs from the ancestral insect gene order, trnI-trnQ-trnM [1]. The PCGs total 11,178 bp in length and the tRNAs total 1,452 bp, with individual tRNAs ranging from 60 bp to 71 bp. The 12S and 16S rRNAs are 719 bp and 1,280 bp long, respectively. The majority of the PCGs (NAD2, COX1, COX2, ATP8, ATP6, COX3, NAD3, NAD6, CYTB) are encoded on the heavy strand, whereas NAD5, NAD4, NAD4L and NAD1 are encoded on the light strand. Of the 13 PCGs, 12 are initiated by a typical ATN codon; the exception is COX1, which uses CGA as its start codon. As for stop codons, two PCGs (COX2 and NAD4) terminate with the incomplete stop codon T, and the others terminate with either TAA or TAG. Incomplete termination codons have been observed in most Lepidoptera mitogenomes and are associated with the polyadenylation process [2]. The mitogenome of P. aristolochiae has an AT content of 81.64%, with a base composition of A (39.3%), T (42.3%), C (11.0%), and G (7.3%), as shown in Table 3. The nucleotide skew statistics for the whole mitogenome indicate that T is more frequent than A and C is more frequent than G, with an AT-skew of -0.037 and a GC-skew of -0.202.
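The start and stop codon categories described above can be illustrated with a minimal Python sketch. The function names `classify_start` and `classify_stop` are hypothetical and introduced only for illustration; this is not the annotation procedure used to produce the data, and it only encodes the categories named in the text (ATN starts, TAA/TAG stops, and the incomplete stop codon T).

```python
def classify_start(codon: str) -> str:
    """Label a start codon as 'ATN' (ATA, ATT, ATC or ATG) or 'non-canonical'."""
    codon = codon.upper()
    if len(codon) == 3 and codon.startswith("AT") and codon[2] in "ACGT":
        return "ATN"
    return "non-canonical"


def classify_stop(codon: str) -> str:
    """Label a stop codon as 'complete' (TAA or TAG), 'incomplete' (T) or 'other'."""
    codon = codon.upper()
    if codon in ("TAA", "TAG"):
        return "complete"
    if codon == "T":
        # A single T that is assumed to be completed to TAA by polyadenylation.
        return "incomplete"
    return "other"


# Examples taken from the description above.
print(classify_start("ATG"))  # a typical ATN start codon
print(classify_start("CGA"))  # the COX1 start codon reported here
print(classify_stop("T"))     # the COX2 and NAD4 stop codon reported here
```

Under this scheme, CGA is labelled as non-canonical, which matches its description as the single exception among the 13 PCGs.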
Two D-loop regions were found in the P. aristolochiae mitogenome sequenced here. The first region spans positions 6148 bp to 6192 bp, between trnS1 and trnE. This region is 45 bp long and contains a microsatellite-like (AT) repeat element. The second D-loop region is 420 bp long and is located between the 12S rRNA and trnM. It contains a conserved ATAGA motif followed by a poly-T stretch, and the microsatellite-like elements (AT)9 and (TA)6 after an ATTTA motif, as commonly found in Lepidoptera mitogenomes [4].
Maximum-likelihood (ML) and Bayesian inference (BI) trees were generated using the 13 PCGs of 22 Lepidoptera mitogenomes from the families Papilionidae and Lycaenidae obtained from GenBank, including the P. aristolochiae sequenced here (Table 4). The ML and BI analyses yielded identical topologies (Fig. 3). Most of the nodes are highly supported, with bootstrap values above 70% in the ML analysis and Bayesian posterior probabilities above 0.95 in the BI analysis. The P. aristolochiae sequence from this study (MZ781228) clusters with the previously sequenced P. aristolochiae (NC_034280), and both are closely related to Losaria neptunus (NC_037868), supported by a bootstrap value of 100% in ML and a posterior probability of 1.0 in BI. A BLASTn analysis was also conducted to compare the two P. aristolochiae mitogenomes: the sequence from this study (MZ781228) is 99.42% similar to P. aristolochiae (NC_034280) deposited in GenBank.
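The D-loop features described above (an ATAGA motif followed by a poly-T stretch, and microsatellite-like AT or TA repeats) can be located in a sequence with simple pattern matching. The following Python sketch is illustrative only: the function names, the thresholds `min_units` and `min_t`, and the sequence `TOY_REGION` are assumptions made for this example, and `TOY_REGION` is an invented string, not the actual control-region sequence of MZ781228.

```python
import re


def find_at_repeats(seq: str, min_units: int = 5):
    """Return (start, end, unit, n_units) for each run of AT or TA repeated
    at least min_units times. Coordinates are 0-based and end-exclusive."""
    pattern = re.compile(r"(?:AT){%d,}|(?:TA){%d,}" % (min_units, min_units))
    hits = []
    for m in pattern.finditer(seq.upper()):
        run = m.group(0)
        hits.append((m.start(), m.end(), run[:2], len(run) // 2))
    return hits


def find_ataga_poly_t(seq: str, min_t: int = 8):
    """Return (motif_start, poly_t_length) for the first ATAGA motif that is
    directly followed by at least min_t T bases, or None if there is none."""
    m = re.search(r"ATAGA(T{%d,})" % min_t, seq.upper())
    if m is None:
        return None
    return m.start(), len(m.group(1))


# Invented toy sequence (NOT real data) with the kinds of elements described.
TOY_REGION = "CC" + "ATAGA" + "T" * 15 + "GG" + "ATTTA" + "AT" * 9 + "GC" + "TA" * 6 + "CC"

for start, end, unit, n in find_at_repeats(TOY_REGION):
    print(f"({unit}){n} at {start}-{end}")
print(find_ataga_poly_t(TOY_REGION))
```

Because the scan runs left to right and matches do not overlap, a run such as ATATAT... is reported once as an (AT) repeat rather than again as a shifted (TA) repeat.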

Sample collection, DNA extraction and pre-processing
The Pachliopta aristolochiae specimen (voucher no. DIB022) was collected from Sungai Semawak, Taman Negara Endau-Rompin, Johor, Malaysia (5.62 N, 100.46 E) in March 2019. Genomic DNA was extracted from a fresh tissue sample using the Qiagen Blood and Tissue Kit (Qiagen, Valencia, CA) and fragmented using a Bioruptor® system [5]. Library preparation was carried out using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina®, following the manufacturer's instructions. The library was then sequenced on the Illumina NovaSeq 6000 platform in 150 bp paired-end mode (PE150). A total of 10,102,764 raw reads were obtained and first assessed for quality using the FastQC program (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Next, sequencing adapters, low-quality bases and Ns were trimmed from the raw reads [6,7] using AdapterRemoval v2.3.2 [8]. Reads with a quality score of 20 or above were retained. The forward and reverse reads were then interleaved into a single file before being processed with PALEOMIX [9].
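The interleaving step mentioned above places each forward read directly before its reverse mate in a single file. A minimal Python sketch of this layout is given below. It is not the tool used in this work; the function names `fastq_records` and `interleave` are hypothetical, and the sketch assumes standard four-line FASTQ records with both input files listing read pairs in the same order.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines
    (header, sequence, separator, qualities) without trailing newlines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield [line.rstrip("\n") for line in record]


def interleave(forward: Iterable[str], reverse: Iterable[str]) -> Iterator[str]:
    """Yield lines alternating forward and reverse records:
    pair 1 forward, pair 1 reverse, pair 2 forward, pair 2 reverse, ..."""
    pairs = zip_longest(fastq_records(forward), fastq_records(reverse))
    for r1, r2 in pairs:
        if r1 is None or r2 is None:
            raise ValueError("forward and reverse inputs have different numbers of reads")
        yield from r1
        yield from r2


# Toy usage with in-memory lines; open file handles can be passed the same way.
fwd = ["@pair1/1", "ACGT", "+", "IIII"]
rev = ["@pair1/2", "TTGA", "+", "IIII"]
print("\n".join(interleave(fwd, rev)))
```

Raising an error when the two inputs differ in length is a deliberate choice in this sketch, since silently dropping unpaired reads would leave later mates out of phase.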

Mitogenome assembly, annotation and sequence analysis
The mitogenome was assembled using the NOVOPlasty v.4.2 [10] program with default parameters. The reference sequence and seed input were taken from BOLD public data (http://barcodinglife.org/), with the sequence ID BKKP127-18.COI-5P. Next, the assembled mitogenome was run through the PALEOMIX BAM pipeline [9] using default parameters to remove reads shorter than 15 bp after trimming. The mitogenome was annotated using the MITOS v2 web server [11], with the reference set 'RefSeq 81 Metazoa' and genetic code 5 (invertebrate mitochondrial). The predicted proteins were then verified with BLASTP through the Open Reading Frame (ORF) Finder server (https://www.ncbi.nlm.nih.gov/orffinder/). To improve the genome annotation, the proteins predicted by the MITOS v2 web server [11] and ORF Finder were aligned with the reference sequence of Pachliopta aristolochiae (NC_034280) from GenBank using Jalview 2 v11.1.4 [12]. Tablet software [13] was used to manually check for insertions and deletions of bases, as well as the sequence coverage. The total base composition was calculated using BioEdit [14]. The AT and GC skews were calculated as follows: AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C), where each letter represents the percentage of the respective base. The annotated mitogenome sequence file was converted into GenBank format using the GB2sequin web application [15]. The GenBank file was then used to generate the circular mitogenome map with OGDRAW [3].
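The skew formulas above can be checked with a short Python sketch. This is an illustration and not the BioEdit calculation itself; the helper names `base_composition` and `skew` are hypothetical, and ambiguous bases such as N are assumed to be excluded from the totals.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the percentages of A, T, C and G among the unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ATCG")
    if total == 0:
        raise ValueError("sequence contains no A, T, C or G")
    return {b: 100.0 * counts[b] / total for b in "ATCG"}


def skew(x: float, y: float) -> float:
    """Generic skew (x - y) / (x + y).
    Pass (A, T) for the AT skew and (G, C) for the GC skew."""
    return (x - y) / (x + y)


# Using the base percentages reported for this mitogenome.
at_skew = skew(39.3, 42.3)
gc_skew = skew(7.3, 11.0)
print(round(at_skew, 3))  # -0.037
print(round(gc_skew, 3))  # -0.202

# The same functions applied to a short invented sequence (not real data).
comp = base_composition("AATTTCG")
print(round(skew(comp["A"], comp["T"]), 3))
```

With the reported percentages, the sketch reproduces the AT-skew of -0.037 and GC-skew of -0.202 given in the Data Description; negative values indicate more T than A and more C than G.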

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.