Complete mitogenome of the Lesser Purple Emperor Apatura ilia (Lepidoptera: Nymphalidae: Apaturinae) and comparison with other nymphalid butterflies

Abstract: The complete mitochondrial genome of Apatura ilia (GenBank accession no. JF437925) was determined to be a circular DNA molecule of 15 242 bp containing the usual set of 13 putative protein-coding genes, 2 rRNA genes, and 22 tRNA genes, in the same arrangement as in other sequenced lepidopterans. All protein-coding genes had the typical start codon ATN, except for COI, which uses CGA as its start codon, as previously demonstrated in other lepidopteran species. Comparison of the A. ilia mitogenome with those of ten other Nymphalidae species showed nearly identical gene orientation and arrangement, with only a few differences in non-coding fragments. The nucleotide composition and codon frequency both fell within the range reported for the order Lepidoptera. The A. ilia mitochondrial genome had the canonical set of 22 tRNA genes, all folding into the typical cloverleaf structure with the single exception of tRNA Ser (AGN). The mitochondrial genes of A. ilia overlapped by a total of 33 bp at 9 locations and were interleaved with a total of 155 bp of intergenic spacers spread over 12 regions, ranging in size from 1 to 49 bp. Furthermore, the spacer between ND6 and Cyt b harbored a microsatellite-like repeat, (TA)23, not found in other completely sequenced nymphalid genomes. The 403 bp AT-rich region harbored two conserved motifs (ATAGA, ATTTA), a 21 bp poly-T stretch, a 10 bp poly-A region, and two microsatellite-like repeats, (TA)10 and (TA)7, as detected in other nymphalid butterflies.

CLC number: Q969.42; Q969.439.2; Q754 Document code: A Article ID: 0254-5853-(2012)02-
Current butterfly taxonomy places the Lesser Purple Emperor, Apatura ilia, in the subfamily Apaturinae (Lepidoptera: Nymphalidae) (Chou, 1998). The species is widely distributed across Palaearctic Europe and Asia. Its name comes from the bright violet reflection the butterfly shows in sunlight. The larvae feed on the leaves of trees including trembling poplar, poplar, aspen and willow, whereas the adults feed mainly on tree sap and animal dejecta (Chou, 2000). Owing to habitat destruction and environmental degradation, A. ilia has been designated as endangered or under priority protection in some countries and regions, including Belgium (Li & Fu, 2000).
The mitochondrial genome of insects is a circular, double-stranded molecule of approximately 14-20 kb with a highly conserved gene arrangement, comprising 37 genes: 13 protein-coding genes (PCGs), 22 tRNA genes, and 2 rRNA genes (srRNA (12S) and lrRNA (16S)) (Boore, 1999;Taanman, 1999). Compared with the nuclear genome, the mitogenome is smaller, evolves faster, has a more conserved gene content, is maternally inherited, and undergoes little recombination (Brown, 1983;Avise, 1994). It has therefore been widely used for taxonomic and phylogenetic studies in many animal groups. Complete mitogenome data from more species would markedly improve the accuracy and efficiency of research in areas such as molecular phylogenetics, phylogeography and taxonomy.
Complete mitochondrial genomes are available for nearly 240 insect species, of which only 10 belong to the Papilionoidea, despite the high diversity of this group (17 500 species) (Robbins, 1982). The Nymphalidae is the largest butterfly family, and its relationships with other butterfly groups remain unclear. Mitogenome data therefore need to be integrated into the reconstruction of nymphalid phylogenies. In this study, we sequenced the entire mitogenome of A. ilia, a representative species of the subfamily Apaturinae, and compared its nucleotide organization with those of other representative nymphalid butterflies. Our aim is to provide molecular data that help clarify the phylogenetic relationships between A. ilia and other nymphalid butterflies.

Sample collection and DNA extraction
Adult individuals of A. ilia were collected from Mount Yandangshan, Zhejiang, China, in August 2008. Samples were immediately preserved in 100% ethanol and stored at -20 °C until DNA extraction. Total genomic DNA was isolated from a single frozen butterfly using the proteinase K-SiO2 method (Hao et al, 2005), as follows. About 5 mm3 of thorax muscle was placed in a 10 mL Eppendorf tube, washed twice with ddH2O, and soaked for 2~3 h. The tissue was then incubated with 500 μL DNA extraction solution (5 mmol/L NaCl, 0.5% SDS, 15 mmol/L EDTA, 10 mmol/L Tris-HCl, pH 7.6) and 40 μL proteinase K (20 mg/mL) at 55 °C for 10~12 h, and centrifuged at 4 000 rpm for 2 min. The supernatant was transferred to a new 10 mL Eppendorf tube containing 500 μL 8 mol/L GuSCN and 40 μL 50% clean glass liquid mixture, incubated at 37 °C for 1~2 h with shaking every 10 min, and centrifuged at 4 000 rpm for 1 min. The supernatant was discarded, and the sediment was washed twice with 75% alcohol and once with acetone. The sample was dried thoroughly in a vacuum dryer at 45 °C before 60 μL TE (10 mmol/L Tris-Cl, 1 mmol/L EDTA, pH 8.0) was added. The solution was then incubated at 56 °C for 30 min and centrifuged for 1 min, with the speed gradually increased to 4 000 rpm. The supernatant containing total genomic DNA was transferred to a clean 1.5 mL Eppendorf tube and stored at -20 °C until use.

Primer design, PCR amplification and DNA sequencing
The universal PCR primers for short-fragment amplification of the srRNA, COI and Cyt b genes were synthesized after Simon et al (1994) and Simons & Weller (2001). Long primers, and certain short primers for genes including COIII and ND5, were designed from multiple sequence alignments of all available complete lepidopteran mitochondrial genomes (Tab. 1), using the software ClustalX 1.8 (Thompson et al, 1997) and Primer Premier 5.0 (Singh et al, 1998).
Long PCRs were performed using TaKaRa LA Taq polymerase with the following cycling parameters: initial denaturation for 5 min at 95 °C; 30 cycles of 95 °C for 50 sec, 47-61 °C for 50 sec, and 68 °C for 2 min 30 sec; and a final extension at 68 °C for 10 min. The short fragments were amplified with TaKaRa Taq polymerase: initial denaturation for 5 min at 94 °C; 35 cycles of 94 °C for 1 min, 45-53 °C for 1 min, and 72 °C for 2 min; and a final extension at 72 °C for 10 min. The PCR products were checked by electrophoresis in a 1.2% agarose gel, purified using the 3S Spin PCR Product Purification Kit, and sequenced directly on an ABI-3730 automatic DNA sequencer. The mitogenome sequence has been deposited in GenBank under accession number JF437925.

Sequence analysis
All genes and the AT-rich region of the A. ilia mitogenome were identified by sequence alignment using ClustalX 1.8. The nucleotide sequences of the protein-coding genes were translated according to the invertebrate mtDNA genetic code. Fifteen of the 22 tRNA genes were identified using the software tRNAscan-SE 1.21 (Lowe & Eddy, 1997) and RNAstructure 4.3 (Mathews, 2006). The secondary structures of the remaining 7 tRNA genes were drawn manually after comparison with known homologous regions of other lepidopteran insects. MEGA 5.0 (Tamura et al, 2007) was used to analyze nucleotide composition and codon usage.
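The translation step mentioned above can be illustrated with a minimal sketch. It assumes that "the invertebrate mtDNA genetic code" corresponds to NCBI translation table 5, which differs from the standard code in that TGA encodes Trp, ATA encodes Met, and AGA/AGG encode Ser. The function and variable names are hypothetical, and this is not the software used in the study.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, listed in TCAG codon order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD = dict(zip(CODONS, STANDARD_AA))

# Assumed invertebrate mitochondrial code: standard code plus four reassignments.
INVERT_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "S", "AGG": "S"}


def translate(seq, table=INVERT_MITO):
    """Translate a coding sequence codon by codon.

    Trailing bases that do not form a full codon are ignored;
    codons containing ambiguous bases are rendered as 'X'.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    toy = "ATAATGTGAAGATAA"  # toy sequence, not taken from the A. ilia genome
    print(translate(toy))                  # invertebrate mitochondrial code
    print(translate(toy, table=STANDARD))  # standard code, for comparison
```

Under this table the codon CGA still encodes arginine, so a COI gene annotated as starting with CGA would translate with a leading R unless the start codon is handled separately.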

Genome organization
The complete mitogenome of A. ilia is 15 242 bp in size. As in most insects, it contains a set of 37 genes: 13 protein-coding, 22 tRNA and 2 rRNA genes. A large non-coding A+T-rich region was also identified (Fig. 1). This region is highly variable in length among insects and is generally thought to contain the origin sites of replication and transcription of the two mtDNA strands (Clayton, 1992). Furthermore, the mitogenome of A. ilia was found to be highly similar to those of most sequenced lepidopterans in gene order and orientation. Nine protein-coding genes were located on the major strand and the remaining 4 on the minor strand (Tab. 1). In addition, the mitogenome of A. ilia has 9 overlapping sequences and 12 intergenic sequences.
Abbreviations for genes are as follows: COI-III refers to the cytochrome oxidase subunits, Cyt b refers to cytochrome b, and ND1-6 and ND4L refer to NADH dehydrogenase components. tRNAs are denoted by one-letter symbols according to the IUPAC-IUB single-letter amino acid codes.

Protein-coding genes, transfer RNA genes and ribosomal RNA genes
Thirteen protein-coding genes encoding 3 711 amino acids were identified in the mitochondrial genome of A. ilia (Tab. 2). The longest is COI (1 533 bp) and the shortest is ATP8 (only 159 bp). Twelve protein-coding genes were initiated by the conventional start codon ATN, whereas CGA was tentatively designated as the start codon of COI. Nine protein-coding genes ended with TAN stop codons (7 with TAA, 2 with TAG), while four ended with a single T immediately upstream of a tRNA gene (Tab. 1).
Twenty-two tRNA genes were found in the mitogenome (Fig. 2), all with the cloverleaf secondary structure except for tRNA Ser (AGN), in which the DHU arm forms a simple loop. The tRNA genes ranged in length from 62 bp (tRNA Arg and tRNA Ser (AGN)) to 71 bp (tRNA Lys) (Tab. 3).
As in other lepidopteran species, the A. ilia mitogenome was found to harbor two rRNA genes, srRNA (776 bp) and lrRNA (1 333 bp). They are located between tRNA Leu (CUN) and the A+T-rich region, and are separated by tRNA Val.

A+T-rich region, intergenic spacer and overlapping sequences
The AT-rich region of A. ilia was found to be 403 bp in length, located between srRNA and tRNA Met. It had the highest AT content (92.5%) of the whole mitogenome (Tab. 2), within the range reported for Nymphalidae, from 89.6% (Melanitis leda) (Unpublished) to 96.3% (Libythea celtis) (Unpublished). In addition, the AT skew and GC skew of this region were -0.07 and -0.14, respectively.
Thirteen intergenic spacer sequences were identified, with a total length of 155 bp (Tab. 4). The two longest spacers were both 49 bp long and were located between tRNA Gln and ND2 and between ND6 and Cyt b. The shortest was only 1 bp in size. The mitogenome of A. ilia also contains nine overlapping sequences, ranging from 1 bp to 8 bp and totaling 33 bp in length.
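How intergenic spacers and overlaps such as those above follow from annotated gene coordinates can be sketched as below. The coordinates are invented for illustration and are not the A. ilia annotation; the function name is hypothetical. The sketch assumes 1-based inclusive coordinates sorted by start position and ignores the junction that closes the circular molecule.

```python
def junction_gaps(genes):
    """Classify each junction between consecutive genes.

    genes: list of (name, start, end) tuples with 1-based inclusive
    coordinates, sorted by start position.
    Returns (spacers, overlaps), each a list of (left, right, length_bp).
    """
    spacers, overlaps = [], []
    for (left, _, left_end), (right, right_start, _) in zip(genes, genes[1:]):
        gap = right_start - left_end - 1  # >0 spacer, <0 overlap, 0 adjacent
        if gap > 0:
            spacers.append((left, right, gap))
        elif gap < 0:
            overlaps.append((left, right, -gap))
    return spacers, overlaps


if __name__ == "__main__":
    # Hypothetical coordinates, chosen only to show each case.
    toy_genes = [
        ("geneA", 1, 100),
        ("geneB", 150, 300),   # 49-bp spacer after geneA
        ("geneC", 294, 400),   # 7-bp overlap with geneB
        ("geneD", 401, 500),   # directly adjacent to geneC
    ]
    spacers, overlaps = junction_gaps(toy_genes)
    print("spacers:", spacers, "total", sum(n for *_, n in spacers))
    print("overlaps:", overlaps, "total", sum(n for *_, n in overlaps))
```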

Gene organization and composition
The length of the complete mitogenome of A. ilia falls within the known range for lepidopteran insects, from 15 122 bp in M. leda to 16 094 bp in Agehana maraho (Papilio maraho). A. ilia showed the common lepidopteran gene order in which tRNA Met is followed by tRNA Ile and then tRNA Gln, different from that of other insect groups (tRNA Ile followed by tRNA Gln and then tRNA Met) (Tab. 1). Gene order has been used to explore the presumed independent evolution of lepidopteran lineages after their divergence from common ancestors (Boore et al, 1998).
The nucleotide composition of the A. ilia mitogenome showed a considerable bias towards A+T (80.5%) (Tab. 2), a common characteristic of insect mitochondrial genomes, in which A+T content ranges from 69.5% to 84.9% (Crozier & Crozier, 1993;Dotson & Beard, 2001). The content of base T (40.7%) was slightly higher than that of base A (39.8%), resulting in an AT skew of -0.012. The GC content (19.5%) was correspondingly lower than the AT content (Tab. 2), and the GC skew was -0.21.
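The skew statistics quoted above can be illustrated with a short sketch. It assumes the commonly used definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); the source does not state the formulas explicitly, and the function name is hypothetical. Values computed from rounded percentages may differ slightly from values computed from raw base counts.

```python
from collections import Counter


def composition_stats(seq):
    """Return A+T content, AT skew and GC skew of a nucleotide sequence.

    Assumed definitions:
        AT skew = (A - T) / (A + T)
        GC skew = (G - C) / (G + C)
    Characters other than A, C, G and T are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t) if a + t else 0.0,
        "gc_skew": (g - c) / (g + c) if g + c else 0.0,
    }


if __name__ == "__main__":
    toy = "A" * 4 + "T" * 6 + "G" * 2 + "C" * 3  # toy sequence
    for name, value in composition_stats(toy).items():
        print(f"{name}: {value:.3f}")
```

A negative AT skew indicates more T than A on the strand analyzed, and a negative GC skew indicates more C than G.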

Protein-coding genes
The putative start codons were the same as those typical of lepidopteran mtDNA (ATN codons: 3 with ATA, 6 with ATG, 3 with ATT) (Tab. 1), except for the COI gene, which has no uniform start codon (Lessinger et al, 2000;Yukuhiro et al, 2002). In general, lepidopteran insects are quite conservative in using CGA to initiate the COI gene, as in Eriogyna pyretorum (Jiang et al, 2009), Hyphantria cunea (Liao et al, 2010) and Adoxophyes honmai. However, there are some exceptions. A previous study using transcript information from cDNA sequences showed that the start codon of the COI gene was TCG (serine) in dipteran insects (Krzywinski et al, 2006). In addition, TTAAAG has been proposed as the start codon of the COI gene in Pieris rapae (Mao et al, 2010), ATTACG in Papilio xuthus (Feng et al, 2010), TTAG in Coreana raphaelis, and TTG in Acraea issoria and Calinaga davidis (Xia et al, 2011).

Tab. 4 Overlapping and intergenic spacer sequences of nymphalid mitogenomes
In A. ilia, four genes ended with a single T immediately upstream of tRNA genes (Tab. 1). The single T residue can be completed to a full stop codon by polyadenylation (Clary et al, 1985), and the tRNA secondary structure serves as a signal for the precise cleavage of the mature protein-coding transcripts from the primary multicistronic transcripts (Ojala et al, 1980, 1981). The AT bias of the protein-coding genes in A. ilia was prominent, with an AT content of 78.9%, the same as the average value for sequenced nymphalid mitogenomes (78.9%) (Tab. 2). Additionally, the nucleotide frequency of the PCGs was T>A>G>C, giving an AT skew of -0.15 and a GC skew of 0.02, both comparable to those of other sequenced lepidopterans (Liao et al, 2010). Examination of the 13 concatenated PCGs showed that the third codon position had a higher AT content (91.7%) than the first (74.4%) and second (70.9%) positions, as in other sequenced lepidopteran species such as Hipparchia autonoe (Kim et al, 2010). Among the 13 PCGs, ATP8 had the highest A+T content (93.9%) and COI the lowest (72.4%) (Tab. 2).
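The per-position AT content reported above for the concatenated PCGs can be computed along the following lines. This is a minimal sketch on a toy sequence, assuming the input is an in-frame coding sequence; the function name is hypothetical and this is not the MEGA procedure used in the study.

```python
def at_content_by_codon_position(cds):
    """Return the A+T fraction at codon positions 1, 2 and 3.

    cds: in-frame coding sequence. Trailing bases that do not complete
    a codon (e.g. an incomplete stop codon such as a single T) are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    fractions = []
    for position in range(3):
        bases = cds[position:usable:3]  # every third base from this offset
        at = sum(base in "AT" for base in bases)
        fractions.append(at / len(bases) if bases else 0.0)
    return tuple(fractions)


if __name__ == "__main__":
    toy_cds = "ATGGCTTTAGCAT"  # four codons plus a trailing single T
    first, second, third = at_content_by_codon_position(toy_cds)
    print(f"1st: {first:.2f}  2nd: {second:.2f}  3rd: {third:.2f}")
```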

Transfer RNA genes and ribosomal RNA genes
The DHU arm of tRNA Ser (AGN) forms only a simple loop, as is common in most insects (Hong et al, 2008;Salvato et al, 2008;Wolstenholme, 1992). As in Parnassius bremeri (Kim et al, 2009), the tRNAs of A. ilia have 7 base pairs in the amino-acyl stem, 5 base pairs in the anticodon stem, and 7 bases in the anticodon loop. However, the sizes of the other tRNA regions vary, especially the TΨC loop (3-10 bp) (Tab. 3).
Among the 22 tRNA genes, 32 mismatched base pairs were found: 10 in the amino-acyl stem, 9 in the DHU stem, 1 in the TΨC stem and 9 in the anticodon stem. Twenty were between guanine and uracil, which is acceptable in terms of structural stability (Topal & Fresco, 1976). Some unconventional mismatches were also observed, e.g. A-C (1), A-G (1), and U-U (8). Similar cases have been seen in other lepidopteran species. For example, C. raphaelis has 8 U-U mismatches in its tRNAs; A. issoria has a G-A and a C-U mismatch in tRNA Ile; H. autonoe has a U-U and an A-C mismatch in tRNA Leu (UUR); and H. cunea has U-U mismatches in tRNA Ala, tRNA Leu (CUN), and tRNA Leu (UUR) (Liao et al, 2010). These mismatches can be corrected through RNA-editing mechanisms that are well known for arthropod mtDNA (Lavrov et al, 2000).
As in other lepidopteran species, the A. ilia mitogenome was found to harbor two rRNA genes (776 bp srRNA and 1 333 bp lrRNA). The AT content of srRNA was 84.9%, similar to those of other lepidopteran insects (87.5% for Phthonandria atrilineata, 82.0% for Ostrinia furnacalis (Coates et al, 2005)) (Tab. 2). The AT content of lrRNA (85.0%) was also close to the values for other lepidopteran insects (85.1% for P. atrilineata, 81.4% for Ochrogaster lunifer (Salvato et al, 2008)) (Tab. 2).
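The distinction drawn above between Watson-Crick pairs, G-U pairs and unconventional mismatches in tRNA stems can be sketched as follows. The stem sequences are toy examples, not A. ilia tRNAs, and the function names are hypothetical. The sketch assumes both strands of a stem are written 5' to 3' and have equal length.

```python
from collections import Counter

WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
WOBBLE = frozenset("GU")


def classify_pair(base1, base2):
    """Classify one stem pairing as 'watson-crick', 'G-U' or 'mismatch'."""
    pair = frozenset(b.upper().replace("T", "U") for b in (base1, base2))
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair == WOBBLE:
        return "G-U"
    return "mismatch"


def count_stem_pairs(five_prime, three_prime):
    """Count pairing classes in a stem.

    Both strands are given 5'->3'; position i of the 5' strand pairs
    with position i counted from the 3' end of the other strand.
    """
    if len(five_prime) != len(three_prime):
        raise ValueError("stem strands must have equal length")
    return Counter(
        classify_pair(x, y) for x, y in zip(five_prime, reversed(three_prime))
    )


if __name__ == "__main__":
    # Toy stem: G-C, G-U, U-A, U-U, A-U
    print(count_stem_pairs("GGUUA", "UUAUC"))
```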
Like spacer 1, spacer 2 was 49 bp in length; it was found between ND6 and Cyt b. Notably, a microsatellite-like repeat, (TA)23, was identified within this region. This feature is extremely rare among the known nymphalid mitogenomes (the only other example is A. metis, which has (TA)12 in this region).
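Microsatellite-like repeats such as the (TA)n tract described above can be located with a simple pattern search. The sketch below uses a regular expression on a toy sequence; the minimum repeat number is an arbitrary illustrative threshold, and the function name is hypothetical.

```python
import re


def find_tandem_repeats(seq, unit="TA", min_repeats=5):
    """Find uninterrupted tandem repeats of `unit`.

    Returns a list of (start_index, repeat_count) tuples with 0-based
    start positions. `min_repeats` is an illustrative threshold only.
    """
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    # Toy spacer: a (TA)23 tract and a short (TA)3 tract below the threshold.
    toy = "GGC" + "TA" * 23 + "CCG" + "TA" * 3 + "G"
    print(find_tandem_repeats(toy))
```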
Spacer 3 was 13 bp long and located between tRNA Ser (UCN) and ND1. This spacer contained the 7-base motif ATACTAA. A similar motif has been identified in previous studies and appears to be conserved in all lepidopteran species sequenced so far (Kim et al, 2009;Cameron & Whiting, 2008;Liao et al, 2010;Salvato et al, 2008). It may be functionally essential for recognition by mtTERM (the transcription termination peptide). Recent mitogenomic sequence data have shown mixed results in Nymphalidae species. For example, in H. autonoe a 6-bp spacer (Kim et al, 2010) was located between tRNA Ser (UCN) and ND1, and the 7-bp motif ATACTAA was located within tRNA Ser (UCN), whereas in C. davidis, A. hyperbius and A. issoria the two genes overlapped by 1 or 2 bp and the motif was located at the 3' end of the ND1 gene.
The 12-bp spacer 4 was located between ND4L and tRNA Thr, and was longer than those of other nymphalid species. All remaining spacers in A. ilia are shorter than 10 bp.
We also found two overlapping sequences that are conserved in Lepidoptera, one 7 bp long and the other 8 bp. The 7-bp overlap (ATGATAA) was located between ATP8 and ATP6, and the 8-bp overlap (AAGCCTTA) was located between tRNA Trp and tRNA Cys. These two sequences have also been detected in other lepidopteran insects, such as P. bremeri, H. cunea and P. atrilineata. Based on the lepidopteran mitogenomes sequenced to date, the two overlapping sequences are postulated to be conserved across lepidopteran taxa. Another 5-bp overlap was located between COI and tRNA Leu. The remaining five overlapping sequences range from 1 to 3 bp in size. So far, the 3-bp overlap between tRNA Ile and tRNA Gln has been found in all sequenced Nymphalidae species.

A+T-rich region
The AT-rich region is functional in mtDNA replication and transcription (Taanman, 1999). The origin of major-strand replication was first studied in the AT-rich region of vertebrates (Tapper & Clayton, 1981), followed by the detection of origins on both strands of mtDNA in Drosophila species (Clary & Wolstenholme, 1987;Fauron & Wolstenholme, 1980). In recent years, investigations have been extended to the replication origin of the minor strand in Diptera, Lepidoptera, Coleoptera and Orthoptera, and the results suggest that the minor-strand replication origin of insect mtDNA is located before the poly-T structure, which lies at the 3' end of the AT-rich region (Saito et al, 2005).
The AT-rich region of A. ilia shares several structural features with those of other lepidopteran insects. It harbors a 21-bp poly-T stretch located 18 bp upstream of srRNA and preceded by the motif ATAGA. Together, the poly-T stretch and the motif are thought to constitute the origin of minor-strand replication, recognizable as a structural signal by regulatory proteins (Kim et al, 2009). Additionally, there are two microsatellite-like repeats, (TA)10 and (TA)7, preceded by a conserved ATTTA motif located upstream of the (TA)10 repeat. At the 5' end of this region there is a 14-bp poly-A structure, shortened to 10 bp by the insertion of a single T base. Similar cases have been found in other nymphalid butterflies; for example, the poly-A of A. issoria mtDNA is interrupted by a guanine. Functionally, the poly-A was assumed to be the replication origin of the mtDNA major strand because of its connection with tRNA Met (Kim et al, 2009).
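A search for the ATAGA motif followed by a poly-T stretch, as described above, might look like the sketch below. It assumes the motif is immediately followed by the T run and that the sequence is given on the strand where the stretch reads as T; neither detail is specified in the text. The minimum run length is an arbitrary threshold, and the function name is hypothetical.

```python
import re


def find_motif_poly_t(seq, motif="ATAGA", min_t=10):
    """Locate `motif` immediately followed by a run of at least `min_t` Ts.

    Returns a list of (motif_start_index, t_run_length) tuples,
    with 0-based start positions.
    """
    pattern = re.compile(f"{re.escape(motif)}(T{{{min_t},}})")
    return [
        (match.start(), len(match.group(1)))
        for match in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    # Toy AT-rich fragment containing ATAGA followed by a 21-bp poly-T stretch.
    toy = "CCG" + "ATAGA" + "T" * 21 + "AAC"
    print(find_motif_poly_t(toy))
```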