Complete mitochondrial genome of the Five-dot Sergeant Parathyma sulpitia (Nymphalidae: Limenitidinae) and its phylogenetic implications

The complete mitochondrial genome of Parathyma sulpitia (Lepidoptera, Nymphalidae, Limenitidinae) was determined. The entire mitochondrial DNA (mtDNA) molecule was 15 268 bp in size. Its gene content and organization were the same as those of other lepidopteran species, except for the presence of a 121 bp intergenic spacer between trnS1 (AGN) and trnE. Twelve of the 13 protein-coding genes (PCGs) started with a typical ATN codon; the exception was cox1, which used CGA as its initial codon. All PCGs terminated with the common stop codon TAA, except nad4, which used a single T as its terminating codon. All 22 tRNA genes possessed the typical clover-leaf secondary structure except for trnS1 (AGN), in which the DHU stem was replaced by a simple loop. Excluding the A+T-rich region, the mitogenome of P. sulpitia harbored 11 intergenic spacers, the longest of which (121 bp, between trnS1 (AGN) and trnE) also had the highest A+T content (100%). As in other lepidopteran species, there was an 18-bp poly-T stretch at the 3'-end of the A+T-rich region, together with a few short microsatellite-like repeat regions but no conspicuous macro-repeats. Phylogenetic analyses of the published complete mitogenomes of nine Nymphalidae species were conducted using the concatenated sequences of the 13 PCGs with maximum likelihood and Bayesian inference methods. The results indicated that Limenitidinae was sister to Heliconiinae among the main Nymphalidae lineages in this study, strongly supporting the results of previous molecular studies while contradicting speculations based on morphological characters.

Insect mitochondrial DNA (mtDNA) is a circular DNA molecule 14-20 kb in size with 13 protein-coding genes (PCGs), two ribosomal RNA genes, 22 tRNA genes, and one A+T-rich region, which contains the initiation sites for transcription and replication (Boore, 1999; Clayton, 1992; Wolstenholme, 1992). In recent years, owing to its maternal inheritance, lack of recombination and accelerated nucleotide substitution rate compared with that of nuclear DNA, the mitochondrial genome has been widely used in studies of phylogenetics, comparative and evolutionary genomics, population genetics, and molecular evolution.
The Nymphalidae is one of the largest groups of butterflies, comprising about 7 200 described species throughout the world. Its systematics and evolutionary history have long been a matter of controversy (Ackery, 1984, 1999; de Jong et al, 1996; Ehrlich, 1958; Harvey, 1991). To date, however, only eight complete or nearly complete mitogenome sequences have been determined from Nymphalidae among some forty sequences for Lepidoptera: two from Heliconiinae, two from Satyrinae, and one each from Calinaginae, Apaturinae, Danainae, and Libytheinae.
Limenitidinae is a subfamily of Nymphalidae that includes the admirals and their close relatives. This butterfly group has long been the subject of scientific curiosity, serving as a model system in diverse fields such as genetics, developmental biology, and evolutionary ecology (Fiedler, 2010; Platt & Maudsley, 1994). However, its sub-group classification and its phylogenetic relationships with the other Nymphalidae groups remain unresolved on both morphological and molecular criteria (Freitas & Brown, 2004; Wahlberg et al, 2003, 2005; Wahlberg & Wheat, 2008; Zhang et al, 2008).
Parathyma sulpitia is a representative species of the subfamily Limenitidinae (Lepidoptera: Nymphalidae) and is widely distributed in southeastern Asia, including Vietnam, Burma, India, and China. We determined its complete mitochondrial genome sequence and compared it with those of the other eight nymphalid butterfly species available. Additionally, we performed phylogenetic analyses using maximum likelihood and Bayesian inference methods based on the concatenated sequences of the 13 protein-coding genes (PCGs). The new sequence data and related analyses may provide useful information about the systematics and evolution of Nymphalidae at the genomic level.

Specimen collection
Adult butterflies of P. sulpitia were collected from the Jiulianshan National Nature Reserve, Jiangxi Province, China. The specimens were preserved immediately in 100% ethanol and then stored at −20 °C before genomic DNA extraction.

DNA extraction, PCR amplification and sequencing
Whole genomic DNA was extracted from thoracic muscle tissue with the DNeasy Tissue Kit (Qiagen) following the protocol of Hao et al (2005). Some universal PCR primers for short-fragment amplification of the cox1, cob and rrnL genes were synthesized (Simon et al, 1994). The remaining short and long primers were designed from the sequence alignment of the available complete lepidopteran mitogenomes using Primer Premier 5.0 software (Singh et al, 1998).
The entire mitogenome of P. sulpitia was amplified in six fragments (cox1-cox3, cox3-nad5, nad5-nad4, nad4-cob, cob-rrnL, rrnL-cox1) using long-PCR techniques with TaKaRa LA Taq polymerase under the following cycling conditions: initial denaturation for 5 min at 95 °C, followed by 30 cycles of 95 °C for 50 s, 45−50 °C for 50 s, and 68 °C for 2 min 30 s, and a final extension step of 68 °C for 10 min. The PCR products were visualized by electrophoresis on a 1.2% agarose gel, then purified using a 3S Spin PCR Product Purification Kit and sequenced directly with an ABI-377 automatic DNA sequencer. For each long PCR product, the full, double-stranded sequence was determined by primer walking. The mitogenome sequence data were deposited in the GenBank database under the accession number JQ347260.

Sequence analysis and annotation
The tRNA genes and their secondary structures were predicted using tRNAscan-SE software v.1.21 (Lowe & Eddy, 1997); the putative tRNA genes not found by tRNAscan-SE were identified by sequence comparison of P. sulpitia with other lepidopterans. The PCGs and rRNAs were confirmed by sequence comparison with Clustal X 1.8 software and the NCBI BLAST search function (Altschul et al, 1990). Nucleotide composition and codon usage were calculated with DAMBE software (Xia & Xie, 2001).

Phylogenetic analysis
Multiple sequence alignments of the concatenated sequences of the 13 PCGs of the nine nymphalid species with available mitogenomes (Tab. 2) were conducted using Clustal X 1.8 software (Thompson et al, 1997) and then proofread manually. The phylogenetic trees were constructed using maximum likelihood (ML) (Abascal et al, 2007) and Bayesian inference (BI) (Yang & Rannala, 1997) methods, with the moth species Manduca sexta (Cameron & Whiting, 2008) (Tab. 2) used as the outgroup. The ML analyses of the nucleotide and amino acid sequences were implemented in PAUP* (version 4.0b8) (Swofford, 2002) with TBR branch swapping (10 random addition sequences). The best-fitting nucleotide substitution model (GTR+I+Γ) was selected using Modeltest version 3.06 (Posada & Crandall, 1998), and the confidence values of the ML tree were evaluated via a bootstrap test with 100 iterations. The Bayesian analyses were performed using MrBayes 3.1.2 (Ronquist & Huelsenbeck, 2003) with a partitioned strategy, using the best-fitting substitution model selected as in the ML analysis. The MCMC analyses (with random starting trees) were run with one cold and three heated chains simultaneously for 1 000 000 generations, sampled every 100 generations; Bayesian posterior probabilities were calculated from the sample points after the MCMC algorithm started to converge.
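The concatenation step that precedes a partitioned analysis can be sketched as follows. This is a minimal illustration only, not the procedure actually used in this study: the function name, the toy sequences and the taxon labels are all hypothetical, and the per-gene alignments are assumed to have been produced beforehand.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    gene_alignments: Dict[str, Dict[str, str]], gene_order: List[str]
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Join per-gene alignments into one supermatrix.

    Returns the concatenated sequence per taxon and a list of
    (gene, first_column, last_column) partitions, 1-based inclusive.
    """
    taxa = sorted(gene_alignments[gene_order[0]])
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    position = 0
    for gene in gene_order:
        alignment = gene_alignments[gene]
        if sorted(alignment) != taxa:
            raise ValueError(f"{gene}: taxon set differs from the first gene")
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(alignment[taxon])
        partitions.append((gene, position + 1, position + length))
        position += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy data (hypothetical sequences, two genes and two taxa only).
toy = {
    "cox1": {"P_sulpitia": "CGAAAA", "M_sexta": "CGAAAG"},
    "nad4": {"P_sulpitia": "ATT---", "M_sexta": "ATTTAA"},
}
matrix, parts = concatenate_alignments(toy, ["cox1", "nad4"])
print(matrix["P_sulpitia"])  # CGAAAAATT---
print(parts)                 # [('cox1', 1, 6), ('nad4', 7, 12)]
```

The partition boundaries returned here are the kind of information a partitioned analysis needs in order to assign a model to each gene block. With the MCMC settings given above, each run would yield 1 000 000 / 100 = 10 000 sampled generations before burn-in removal.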

Genome organization
The mitogenome of P. sulpitia was a circular molecule 15 268 bp long and consisted of 13 PCGs [cytochrome oxidase subunits 1-3 (cox1-3), NADH dehydrogenase subunits 1-6 and 4L (nad1-6 and nad4L), cytochrome b (cob), and ATP synthase subunits 6 and 8 (atp6 and atp8)], two ribosomal RNA genes for the small and large subunits (rrnS and rrnL), 22 transfer RNA genes (one for each amino acid, with two each for leucine and serine) and a non-coding A+T-rich region. The gene orientation and order of the P. sulpitia mitogenome were identical to those of the other available lepidopteran mitogenomes, except for the presence of the 121 bp intergenic spacer between trnS1(AGN) and trnE (Tab. 1, Fig. 1). As in many insect mitogenomes, the major strand coded for more genes (nine PCGs and 14 tRNAs) and the A+T-rich region, whereas fewer genes were coded on the minor strand (four PCGs, eight tRNAs and two rRNA genes).

Protein-coding genes, tRNA and rRNA genes and A+T-rich region
Twelve PCGs in the P. sulpitia mitogenome were initiated by typical ATN codons (seven with ATG, four with ATT, one with ATA); the exception was the cox1 gene, whose start codon was tentatively designated as CGA (Tab. 1). Twelve PCGs had the common stop codon TAA, whereas the nad4 gene terminated with a single T.
The 22 tRNAs varied from 61 [trnC and trnS1(AGN)] to 71 bp (trnK) in size and presented the typical clover-leaf structure, with the sole exception of trnS1(AGN), which lacked the dihydrouridine (DHU) stem (Fig. 2). The P. sulpitia tRNAs harbored a total of 24 pair mismatches in their stems, including six pairs in the DHU stems, eight pairs in the amino acid acceptor stems, two pairs in the TΨC stems and eight pairs in the anticodon stems. Among these 24 mismatches, 18 were G·U pairs, which form a weak bond in the secondary structure, and the other six were U·U pairs (Fig. 2). As in other insect mitogenome sequences, two rRNA genes (rrnL and rrnS) were detected in P. sulpitia, located between trnL1 (CUN) and trnV, and between trnV and the A+T-rich region, respectively (Fig. 1). The lengths of rrnL and rrnS were 1 319 bp and 779 bp, respectively.
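The way stem pairs can be tallied into Watson-Crick pairs, G·U pairs and other mismatches is sketched below. This is an illustrative sketch under stated assumptions, not the procedure used to produce the counts above: the function names and the example stem are hypothetical, and both strands are assumed to be supplied 5'→3'.

```python
from collections import Counter
from typing import Dict

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_PAIRS = {("G", "U"), ("U", "G")}


def classify_pair(base_5p: str, base_3p: str) -> str:
    """Classify one stem pair; T is treated as U so DNA input also works."""
    pair = (
        base_5p.upper().replace("T", "U"),
        base_3p.upper().replace("T", "U"),
    )
    if pair in WATSON_CRICK:
        return "watson_crick"
    if pair in GU_PAIRS:
        return "GU"
    return "other"


def summarize_stem(strand_5p: str, strand_3p: str) -> Dict[str, int]:
    """Count pair classes in a stem; both strands are written 5'->3'."""
    if len(strand_5p) != len(strand_3p):
        raise ValueError("stem strands must have equal length")
    # The 3' strand is reversed so that position i pairs with position i.
    counts = Counter(
        classify_pair(a, b) for a, b in zip(strand_5p, reversed(strand_3p))
    )
    return dict(counts)


# Hypothetical 4-bp stem: G-C, G.U, C-G and U.U when paired antiparallel.
print(summarize_stem("GGCU", "UGUC"))
# {'watson_crick': 2, 'GU': 1, 'other': 1}
```

In the terminology of this paper both the "GU" and "other" classes count as mismatches, so the mismatch total for a stem would be the sum of those two classes.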

Tab. 1 Summary of the mitogenome of Parathyma sulpitia
The A+T-rich region of P. sulpitia was 349 bp in size. There was an 18-bp poly-T stretch at the 3'-end of the A+T-rich region, and some short microsatellite-like repeat regions without conspicuous macro-repeats throughout the A+T-rich region.

Phylogenetic analysis
The tree topologies resulting from the ML and Bayesian analyses based on the nucleotide and amino acid sequences were the same, with only slight differences in their bootstrap support or posterior probability values. Owing to space limitations, only the trees based on the nucleotide sequences are shown.

Genome structure, organization and composition
The P. sulpitia mitogenome size (15 268 bp) was well within the range observed in completely sequenced lepidopteran insects, from 15 140 bp in Artogeia melete (GenBank accession no. NC_010568; Hong et al, 2009) to 16 094 bp in Agehana maraho (GenBank accession no. NC_014055; Wu et al, 2010). The A+T content of the P. sulpitia mitogenome major strand was 81.9%, a strongly biased value and the highest of all the nymphalid species determined to date (Tab. 2). To evaluate the degree of base bias in the P. sulpitia mitogenome, base skewness was also measured. The AT- and GC-skewness values of the whole genome (measured from the major strand) were −0.048 and −0.178, respectively, indicating that T and C were used more frequently than A and G, similar to the results for the other nymphalid species in this study (Tab. 3). However, when the two skewness values were considered separately, the AT skew was the highest and the GC skew was the lowest of all the nymphalids in this study.
Table notes: Total codons were exclusive of the initial and termination codons; the skewness of the whole PCGs and the whole genome was calculated from the major strand. * Outgroup.
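The skew statistics discussed above can be illustrated with a short sketch. It assumes the commonly used definitions AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C), which the text does not state explicitly; the function name and the toy sequence are hypothetical.

```python
from typing import Dict


def base_composition_stats(sequence: str) -> Dict[str, float]:
    """Return A+T content, AT-skew and GC-skew for one strand.

    Assumed definitions: AT-skew = (A - T) / (A + T) and
    GC-skew = (G - C) / (G + C). Ambiguous bases are ignored.
    """
    seq = sequence.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
    }


# Toy sequence (hypothetical): A=2, T=3, G=1, C=2.
stats = base_composition_stats("AATTTGCC")
print(round(stats["at_content"], 3))  # 0.625
print(round(stats["at_skew"], 3))     # -0.2
print(round(stats["gc_skew"], 3))     # -0.333
```

Under these definitions a negative AT-skew means more T than A, and a negative GC-skew means more C than G, which matches the interpretation given for the values −0.048 and −0.178.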

Protein-coding genes
Twelve PCGs of the P. sulpitia mitogenome were initiated by typical ATN codons; the exception was the cox1 gene, for which no typical ATN initiator was found in its starting region or in the neighboring trnY sequence. A wide variety of cox1 initiation codons has been reported in animals. Tetranucleotides such as TTAG in Coreana raphaelis (Kim et al, 2006) and ATAA in Drosophila yakuba (Clary & Wolstenholme, 1985) have been proposed, as have hexanucleotides such as TATTAG in Ostrinia nubilalis and Ostrinia furnacalis (Coates et al, 2005), TTTTAG in Bombyx mori (Yukuhiro et al, 2002), TATCTA in Penaeus monodon (Wilson et al, 2000), and ATTTAA in Anopheles gambiae (Beard et al, 1993), Anopheles quadrimaculatus (Mitchell et al, 1993), and Ceratitis capitata (Spanos et al, 2000). The trinucleotide TTG has also generally been assumed to be the cox1 start codon for some invertebrate taxa, including insect species such as Pyrocoelia rufa (Bae et al, 2004), Caligula boisduvalii (Hong et al, 2008), and Acraea issoria. In this study, however, based on sequence homology with other available insect species, the codon CGA was hypothesized to be the cox1 initiator, a synapomorphic character of most lepidopteran species (Kim et al, 2009, 2010). The nad4 gene of P. sulpitia terminated with a single T rather than the common stop codon TAA. Incomplete termination codons are frequently observed in insect mitogenomes, including all the lepidopteran mitogenomes sequenced to date (Kim et al, 2009), and have been interpreted in terms of post-transcriptional polyadenylation, in which two A residues are added to create the TAA terminator (Anderson et al, 1981; Ojala et al, 1981).
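The polyadenylation interpretation of incomplete stop codons can be made concrete with a small sketch. It is illustrative only: the function name and the toy sequences are hypothetical, the reading frame is assumed to start at the first base, and TAA and TAG are taken as the complete stop codons of the invertebrate mitochondrial code.

```python
COMPLETE_STOPS = ("TAA", "TAG")


def complete_stop_codon(cds: str) -> str:
    """Return the coding sequence with an incomplete T or TA stop
    extended to TAA, mimicking the A residues added by polyadenylation.

    Sequences that already end with an in-frame TAA or TAG are
    returned unchanged; anything else raises ValueError.
    """
    seq = cds.upper()
    remainder = len(seq) % 3
    if remainder == 0:
        if seq[-3:] in COMPLETE_STOPS:
            return seq
        raise ValueError("in-frame sequence does not end with TAA or TAG")
    tail = seq[-remainder:]
    if tail not in ("T", "TA"):
        raise ValueError(f"trailing bases {tail!r} cannot be completed to TAA")
    return seq + "A" * (3 - remainder)


# Toy sequences (hypothetical), reading frame starting at the first base.
print(complete_stop_codon("ATGAAAT"))    # ATGAAATAA  (single T, two A added)
print(complete_stop_codon("ATGAAATA"))   # ATGAAATAA  (TA, one A added)
print(complete_stop_codon("ATGAAATAA"))  # ATGAAATAA  (already complete)
```

For a gene such as nad4 here, which ends in a single T, the sketch adds two A residues, which corresponds to the mechanism described by Ojala et al (1981).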
The A+T content of all PCGs was 80.6%, whereas the corresponding values for the major and minor strands were 79.2% and 83.1%, respectively. Both values were the highest of all the nymphalids analysed in this study (Tab. 4). Furthermore, the A+T content of the PCG third codon position was 96.7%, markedly higher than those of the first (74.8%) and second (70.5%) codon positions. This value was also the highest of all the corresponding values among the nymphalids (Tab. 4). The AT-skew was calculated separately for the PCGs on each strand of the P. sulpitia mitogenome: the major strand showed a value of −0.172, whereas the minor strand showed a value of −0.154. In contrast, for the GC-skew, the major and minor strands showed values of −0.100 and 0.266, respectively (Tab. 3). Additionally, the relative synonymous codon usage (RSCU) of the P. sulpitia PCGs revealed that codons with A or T in the third position were used more frequently than other synonymous codons (Tab. 5).
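The two codon-level statistics used above can be sketched as follows. This is a minimal illustration rather than the DAMBE calculation itself: the function names, the toy sequence and the codon counts are hypothetical, and RSCU is taken in its usual sense of the observed count of a codon divided by the mean count of all codons in its synonymous family.

```python
from typing import Dict, Sequence, Tuple


def at_content_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """Return the A+T fraction at codon positions 1, 2 and 3.

    The frame is assumed to start at the first base; trailing bases
    that do not form a full codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for offset in range(3):
        bases = seq[offset:usable:3]
        fractions.append(sum(base in "AT" for base in bases) / len(bases))
    return fractions[0], fractions[1], fractions[2]


def rscu(codon_counts: Dict[str, int], family: Sequence[str]) -> Dict[str, float]:
    """Relative synonymous codon usage within one synonymous family."""
    total = sum(codon_counts.get(codon, 0) for codon in family)
    if total == 0:
        return {codon: 0.0 for codon in family}
    expected = total / len(family)  # count expected under equal usage
    return {codon: codon_counts.get(codon, 0) / expected for codon in family}


# Toy coding sequence (hypothetical): codons ATT, GCA, ACT.
print(at_content_by_codon_position("ATTGCAACT"))

# Made-up counts for a four-codon family; values above 1 mean over-use.
print(rscu({"CTT": 30, "CTC": 2, "CTA": 16, "CTG": 0},
           ["CTT", "CTC", "CTA", "CTG"]))
```

In the made-up family, the codons ending in T or A receive RSCU values above 1 and those ending in C or G fall below 1, which is the pattern described for Tab. 5.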

Transfer RNA and ribosomal RNA genes
The P. sulpitia mitogenome harbored 22 tRNA genes, which were scattered throughout the genome, as is typically observed in metazoans including insects (Cha et al, 2007; Crozier & Crozier, 1993; Hong et al, 2008; Kim et al, 2010; Wilson et al, 2000; Yukuhiro et al, 2002). All tRNAs presented the typical clover-leaf structure, with the sole exception of trnS1 (AGN), which lacked the dihydrouridine (DHU) stem (Fig. 2). The P. sulpitia tRNAs harbored a total of 22 pair mismatches in their stems, roughly the same number as detected in other lepidopteran species such as Antheraea pernyi and Eriogyna pyretorum, but fewer than in Ochrogaster lunifer (Salvato et al, 2008). These tRNA mismatches can be corrected through RNA-editing mechanisms, which are well known for arthropod mtDNA (Lavrov et al, 2000).
As in all other insect mitogenome sequences, two rRNA genes (rrnL and rrnS) were detected in P. sulpitia. They were located between trnL1 (CUN) and trnV, and between trnV and the A+T region, respectively (Fig. 1). The length of the rrnL was determined to be 1 319 bp, which was within the size range observed in the other available sequenced insects, from 470 bp in Bemisia tabaci (Thao et al, 2004) to 1 426 bp in Hyphantria cunea (Liao et al, 2010). The length of the rrnS was determined to be 779 bp, which was well within the size range observed in other completely sequenced insects, from 434 bp in Ostrinia nubilalis (Clary & Wolstenholme, 1985) to 827 bp in Locusta migratoria (Flook et al, 1995).

Intergenic spacers and overlapping regions
The mitogenome of P. sulpitia included a total of 213 bp of intergenic spacer sequence spread over 11 regions ranging in size from 1 to 121 bp. The largest spacer (121 bp) was located between trnS1 (AGN) and trnE, rather than between trnQ and nad2 as found in other lepidopteran mitogenomes (Tab. 1). This spacer had the highest A+T content (100%) of all the corresponding regions in the lepidopterans determined so far. Alignment of this spacer with part of the A+T-rich region revealed a sequence homology of 74.4% (Fig. 3), suggesting that the spacer may have originated from a partial duplication of the A+T-rich region. The second largest intergenic spacer was 52 bp long and located between the trnQ and nad2 genes. This spacer is present in all lepidopteran mitogenomes sequenced, but absent in all non-lepidopteran insects (Hong et al, 2008). Alignment of this spacer with the neighboring nad2 gene revealed a sequence homology of 62%, and the spacer was therefore proposed to have originated from a partial duplication of the nad2 gene (Kim et al, 2009), with similar cases reported in other sequenced lepidopterans, such as Artogeia melete (70%), C. raphaelis (62%) (Kim et al, 2006), Parnassius bremeri (70%) (Kim et al, 2009), and Phthonandria atrilineata (70%). The other nine, smaller intergenic spacers ranged in size from 1 to 11 bp and were dispersed throughout the genome; their details are listed in Tab. 1.
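A percentage such as the 74.4% reported above is typically derived from a pairwise alignment. The sketch below shows one plausible way to compute it, under the assumption (not stated in the text) that the value is the share of identical columns and that a gap opposite a base counts as a mismatch; the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns with identical bases.

    Assumption: columns where both sequences have a gap are skipped,
    and a gap opposite a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical): 6 identical columns out of 8.
print(percent_identity("ATTA-TAT", "ATAATTAT"))  # 75.0
```

Other conventions, such as excluding all gapped columns from the denominator, would give somewhat different percentages for the same alignment, so the convention matters when values are compared across studies.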
A total of 92 bp were identified as overlapping sequences, varying from 1 to 35 bp in 15 regions of the genome (Tab. 1). The longest overlap (35 bp) was located between the cox2 and trnK genes, and the second longest (20 bp) between trnF and nad5. The third longest was 8 bp between trnW and trnC, with similarly sized overlaps also detected in other lepidopteran species (Hong et al, 2008). As expected, the 7 bp overlap between the atp8 and atp6 reading frames, which is characteristic of many animal mitogenomes (Boore, 1999; Hong et al, 2008), was also detected in this study. In addition, a 5 bp and a 3 bp overlap were located between cox1 and trnL (UUR), and between trnI and trnQ, respectively. Details of the remaining nine overlaps, each 1 or 2 bp in size, are listed in Tab. 1.
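Spacer and overlap lengths of this kind follow directly from the annotated gene coordinates. The sketch below illustrates the arithmetic with hypothetical gene names and coordinates (not the values of Tab. 1); it assumes 1-based inclusive positions on a single coordinate system and does not handle the junction that wraps around the circular genome.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int]  # (name, start, end), 1-based inclusive


def junction_gaps(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """Gap between each pair of neighbouring genes.

    Positive values are intergenic spacers, negative values are
    overlaps and zero means the genes abut directly.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    gaps = []
    for (name_a, _start_a, end_a), (name_b, start_b, _end_b) in zip(
        ordered, ordered[1:]
    ):
        gaps.append((name_a, name_b, start_b - end_a - 1))
    return gaps


def summarize_gaps(gaps: List[Tuple[str, str, int]]) -> Dict[str, int]:
    """Total number and length of spacers and overlaps."""
    spacers = [gap[2] for gap in gaps if gap[2] > 0]
    overlaps = [-gap[2] for gap in gaps if gap[2] < 0]
    return {
        "spacer_regions": len(spacers),
        "spacer_bp": sum(spacers),
        "overlap_regions": len(overlaps),
        "overlap_bp": sum(overlaps),
    }


# Hypothetical annotation: a 7 bp overlap, an abutting pair, a 52 bp spacer.
toy_genes = [("geneA", 1, 100), ("geneB", 94, 200),
             ("geneC", 201, 260), ("geneD", 313, 400)]
gaps = junction_gaps(toy_genes)
print(gaps)
# [('geneA', 'geneB', -7), ('geneB', 'geneC', 0), ('geneC', 'geneD', 52)]
print(summarize_gaps(gaps))
# {'spacer_regions': 1, 'spacer_bp': 52, 'overlap_regions': 1, 'overlap_bp': 7}
```

Applied to a full annotation table, the same summary would reproduce totals of the kind reported here (213 bp in 11 spacers and 92 bp in 15 overlaps).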

A+T-rich region
The A+T-rich region of P. sulpitia was 349 bp in size and located between rrnS and trnM (Fig. 1). This region showed the second highest A+T content (94.6%), lower only than that of the largest intergenic spacer (100%). It included the ON (origin of minority or light strand replication), which was identified by the motif ATAGA located 20 bp downstream from rrnS. Additionally, an ATAGA motif followed by a 19 bp poly-T stretch, which has been suggested as the structural signal for protein recognition in the replication initiation of minor-strand mtDNA, was detected, similar to that observed in other lepidopteran species such as Bombyx mori (Yukuhiro et al, 2002). Finally, a few short microsatellite-like repeat regions, such as the (AT)7 located 195 bp upstream from rrnS and preceded by the ATTTA motif, were present, as expected given that they are also detected in the majority of other sequenced lepidopterans (Hong et al, 2008; Hu et al, 2010; Kim et al, 2009; Mao et al, 2010; Pan et al, 2008; Wang et al, 2011; Xia et al, 2011). No tRNA-like sequences or tandemly repeated elements, which are often reported in other lepidopteran species (Kim et al, 2009; Pan et al, 2008), were detected in the P. sulpitia A+T-rich region.
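Motifs of the kind described above can be located with simple pattern matching. The following sketch is illustrative only: the function names, thresholds and toy sequence are hypothetical, the poly-T stretch is assumed to follow ATAGA immediately, and only the strand supplied is searched (the reverse complement would need a separate pass).

```python
import re
from typing import List, Tuple


def find_ataga_poly_t(sequence: str, min_t: int = 10) -> List[Tuple[int, int]]:
    """Find ATAGA immediately followed by a poly-T run.

    Returns (1-based start of ATAGA, length of the T run) per hit.
    """
    pattern = re.compile(r"ATAGA(T{%d,})" % min_t)
    return [
        (match.start() + 1, len(match.group(1)))
        for match in pattern.finditer(sequence.upper())
    ]


def find_at_repeats(sequence: str, min_units: int = 5) -> List[Tuple[int, int]]:
    """Find (AT)n microsatellite-like repeats with n >= min_units.

    Returns (1-based start, number of AT units) per hit.
    """
    pattern = re.compile(r"(?:AT){%d,}" % min_units)
    return [
        (match.start() + 1, (match.end() - match.start()) // 2)
        for match in pattern.finditer(sequence.upper())
    ]


# Toy sequence (hypothetical): ATAGA + 19 T, then ATTTA followed by (AT)7.
toy = "CC" + "ATAGA" + "T" * 19 + "GGATTTA" + "AT" * 7 + "CC"
print(find_ataga_poly_t(toy))  # [(3, 19)]
print(find_at_repeats(toy))    # [(34, 7)]
```

The minimum run lengths are arbitrary defaults chosen for the illustration; stricter or looser thresholds would change which stretches are reported as microsatellite-like.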

Phylogenetic analysis
An up-to-date and comprehensive classification of Nymphalidae based on morphological characters was made by Ackery et al (1999), while work on the molecular systematics of various lineages within Nymphalidae is beginning to clarify their relationships, with interesting results (Brower et al, 2000; Wahlberg et al, 2003, 2005). Although the twelve subgroups of Nymphalidae (Libytheinae, Danainae, Charaxinae, Morphinae, Satyrinae, Calinaginae, Heliconiinae, Limenitidinae, Cyrestinae, Biblidinae, Apaturinae, and Nymphalinae) are widely accepted at the subfamily level, some relationships within this group remain unresolved. For example, the phylogenetic positions of Danainae, Libytheinae, and Limenitidinae within Nymphalidae are still controversial.
The sister group of Limenitidinae within the Nymphalidae has been the subject of substantial debate (Freitas & Brown, 2004; Harvey, 1991). From a morphological perspective, a close relationship among Limenitidinae, Heliconiinae, Nymphalinae, and Apaturinae has never been suggested (de Jong et al, 1996; Freitas & Brown, 2004; Harvey, 1991). For example, Freitas & Brown (2004) conducted a cladistic analysis of Nymphalidae based on immature and adult morphological characters, and their results showed Limenitidinae as sister to the grouping (Apaturinae + (Calinaginae + Satyrinae)), exclusive of the remaining nymphalid taxa. However, phylogenetic analyses based on molecular sequence data have convincingly suggested that Limenitidinae is the sister group of Heliconiinae (Brower, 2000; Wahlberg et al, 2003, 2005; Zhang et al, 2008). In this study, the ML and BI phylogenetic analyses based on the mitogenomic data of the nine available nymphalids, including that of P. sulpitia and other unpublished species, revealed the following relationships: (Danainae + ((Libytheinae + ((Satyrinae + Calinaginae) + (Apaturinae + (Heliconiinae + Limenitidinae) + Nymphalinae))))) with high support values (Fig. 4), which is congruent with the relationships reported by Wahlberg et al (2003, 2005) and Brower (2000).
In conclusion, the complete mitogenome of P. sulpitia harbored nearly the same characters as those of other nymphalids. Phylogenetic analysis at the mitogenomic level indicated that Limenitidinae was more closely related to Heliconiinae than to the other groups of Nymphalidae in this study, strongly supporting the results of former molecular studies while contradicting the prevailing speculations based on morphological characters.