Characterization of the Complete Mitochondrial Genome of Leucoma salicis (Lepidoptera: Lymantriidae) and Comparison with Other Lepidopteran Insects.

The complete mitochondrial genome (mitogenome) of Leucoma salicis (Lepidoptera: Lymantriidae) was sequenced and annotated. It is a circular molecule of 15,334 bp, containing the 37 genes usually present in insect mitogenomes. All protein-coding genes (PCGs) are initiated by ATN codons, other than cox1, which is initiated by CGA. Three of the 13 PCGs had an incomplete termination codon, T or TA, while the others terminated with TAA. The relative synonymous codon usage of the 13 protein-coding genes (PCGs) was consistent with those of published lepidopteran sequences. All tRNA genes had typical clover-leaf secondary structures, except for the tRNASer (AGN), in which the dihydrouridine (DHU) arm could not form a stable stem-loop structure. The A + T-rich region of 325 bp had several distinctive features, including the motif 'ATAGA' followed by an 18 bp poly-T stretch, a microsatellite-like (AT)7 element, and an 11-bp poly-A present immediately upstream of tRNAMet. Relationships among 32 insect species were determined using Maximum Likelihood (ML), Neighbor Joining (NJ) and Bayesian Inference (BI) phylogenetic methods. These analyses confirm that L. salicis belongs to the Lymantriidae; and that Lymantriidae is a member of Noctuoidea, and is a sister taxon to Erebidae, Nolidae and Noctuidae, most closely related to Erebidae.

. Map of the mitogenome of L. salicis. tRNA genes are labeled according to the IUPAC-IUB threeletter amino acids; cox1, cox2 and cox3 refer to the cytochrome c oxidase subunits; cob refers to cytochrome b; nad1-nad6 refer to NADH dehydrogenase components; rrnL and rrnS refer to ribosomal RNAs.
Overlapping and intergenic spacer regions. We identified four overlapping gene sequences, varying from 1 bp to 8 bp, making up 19 bp in total. The longest overlapping region was 8 bp between tRNA Trp and tRNA Cys ; there was a 7 bp overlap between atp8 and atp6; 3 bp overlap between tRNA Ile and tRNA Gln , and 1 bp between tRNA Ala and tRNA Arg (Table 1).

Continued
Intergenic spacers were spread over 18 regions, and ranged in length from 1 bp to 47 bp. The longest (47 bp) contained an A + T-rich region and occurred between tRNA Gln and nad2. The 10 bp spacer region between tRNA Ser (UCN) and nad1 included an ' ATACTAA' motif (Fig. 6A).
The A + T-rich region. The 325 bp long A + T-rich region of L. salicis was located between the rrnS and tRNA Met genes (Table 1). A + T content in the A + T-rich region was 91.69%, and both AT (− 0.248) and GC  (− 0.408) skews were negative ( Table 2). The A + T-rich region did not contain long repeats, though some short repeating sequences scattered over the entire region were present: an ' ATAGA' motif followed by an 18 bp poly-T stretch, a microsatellite-like (AT) 7 and a poly-A element upstream of the tRNA Met gene (Fig. 6B).  Phylogenetic relationships. We established phylogenetic relationships among 32 insects (Table 3), based on nucleotide sequences of 13 PCGs, using Maximum Likelihood (ML), Neighbor Joining (NJ) and Bayesian Inference (BI) methods. Species clustered by family (Fig. 7A, B and C). Within Lymantriidae, L. salicis was most closely related to G. menyuanensis. Lymantriidae clustered with Erebidae, while Noctuidae clustered with Nolidae. Noctuoidea was most closely related to Bombycoidea in ML and NJ trees, while in the BI tree Bombycoidea was most closely related to Geometroidea. Papilionoidea and Tortricoidea branched together in ML and NJ methods, but were separated from each other in the BI tree.

Discussion
At the family level, the length of the L. salicis mitogenome (15,334 bp) is marginally smaller than that of Euproctis pseudoconspersa (15, Table 2). It is slightly higher than that of other sequenced mitogenomes in Noctuoidea, including Ctenoplusia agnata (− 0.023), G. menyuanensis (0.003) and E. pseudoconspersa (0.011). A similar trend has been observed in other lepidopteran superfamilies such as Bombycoidea, where AT skew varies from 0.001 (Sphinx morio) to 0.059 (Bombyx mori) 11 . In all sequenced lepidopteran mitogenomes, GC skew ranges from − 0.268 in G. menyuanensis to − 0.155 in Paracymoriza distinctalis ( Table 2). The L. salicis mitogenome is moderately skewed (− 0.254), showing the presence of more Cs than Gs. The AT skew value (0.063) of the protein-coding gene region in the L. salicis mitogenome is higher than that of several previously sequenced mitogenomes. Its negative GC skew (−0.234) is similar to that seen in other animals. Cox1 is thought to initiate with CGA, as found in other lepidopteran insects 12,13 . Cox1 and cox2 terminate with a single T, while nad4 terminates with TA. Similar results have been documented in several sequenced lepidopteran mitogenomes, including Artogeia melete 14 , Phthonandria atrilineata 15 , Ochrogaster lunifer 16 , H. cunea 17 and Amata emma 18 . The common termination codon TAA is usually created via post-transcriptional polyadenylation 19 . The relative synonymous codon usage of the 13 protein-coding genes (PCGs) in L. salicis is consistent with those of published lepidopteran sequences. Similarly, codons with A or T in the third codon position being overrepresented relative to other synonymous codons, is consistent with previous observations of lepidopterans 9 ; likewise the absence or underrepresentation of high-GC codons 18,20 .
The A + T content (83.91%) of rRNA genes is similar to that seen in Lymantriidae (83.05% in G. menyuanensis). The positive AT (0.029) and negative GC (−0.144) skew seen in the L. salicis mitogenome has also been reported in several sequenced lepidopterans ( Table 2). For example, H. cunea has a positive AT (0.024) and negative GC (−0.137) skew 17 ; and L. dispar also has positive AT (0.023) and negative GC (−0.155) skew.
The secondary structure of L. salicis tRNA Ser (AGN) lacks the dihydrouridine (DHU) arm and forms a simple loop. This has also been observed in several other animal mitogenomes 21 , including those of insects 15,22,23 . Ten tRNA genes have 11 mismatches in their secondary structures; most of these are located in the acceptor, DHU and anticodon stems. In addition, tRNA Cys and tRNA Ser (UCN) contain an A-A mismatch in the anticodon stem. Unmatched base pairs observed in tRNA sequences can be corrected by RNA-editing mechanisms that are well known for arthropod mtDNA 24 .
Four overlapping sequences occur in the mitogenome of L. Salicis. The 7 bp overlap between atp8 and atp6 has been documented in several other lepidopteran mitogenomes 25,26 . The 10 bp intergenic spacer region containing an ' ATACTAA' motif, between tRNA Ser (UCN) and nad1, has also been documented in at least nine other species, suggesting that this region is highly conserved among most of the lepidopteran mtDNAs sequenced to date 27 . The length of the A + T-rich region of L. salicis (325 bp) is shorter than those of G. menyuanensis (449), L. dispar (371), H. cunea (357) and B. thibetaria (350), and longer than those of Lista haraldusalis (310) and Choaspes benjaminii (293). Extra tRNA-like structures are often found in the A + T-rich region of lepidopteran mitogenomes. For example Antheraea yamamai has tRNA Ser (UCN)-like and tRNA Phe -like sequences, each with correct anticodon structure and forming a clover-leaf structure, which suggests that they may be functional, though each has several mismatches in both aminoacyl and anticodon stem regions 28 . Extra tRNA-like structures have not been seen in L. salicis. The presence of multiple tandem-repeat elements is described as being characteristic of insect A + T-rich regions 29 . Antheraea pernyi has a repeat element of 38 bp tandemly repeated six times 25 ; and Cnaphalocrocis medinalis has a duplicated 25 bp repeat element 25,30 . Long conspicuous repeats were not observed in the A + T-rich region of L. salicis, though shorter repeating sequences, an ' ATAGA' motif and other features were. These characteristic features have each been found in previously sequenced lepidopteran species 27,31,32 .
In general, the L. salicis mitogenome contains several features in nucleotide composition, structure of tRNAs and PCGs as well as in the A + T rich region. Particularly in advanced technologies like PCR-RFLP methods 3 and DNA barcodes 33 , these similarities and differences between L. salicis and other insects could be used as potential markers in species identification, especially the differences.
Phylogenetic relationships were established using Maximum Likelihood (ML) Neighbor Joining (NJ) and Bayesian Inference (BI) methods. Species clustered in families, and results were broadly consistent with previous work, e.g. Dong et al. 26 and Dai et al. 34 . Results obtained from our analyses also supported the classification proposed by Fibiger and Lafontaine 35 , including within Lymantriidae a clade comprised of E. pseudoconspersa, L. salicis, L. dispar and G. menyuanensis. The present analysis showed that within Lymantriidae, L. salicis was most closely related to G. menyuanensis, which is consistent with a recent study on E. pseudoconspersa 26 . Interestingly,

Superfamily
Family Species Size (bp) GenBank No.
L. dispar is more closely related to G. menyuanensis than E. pseudoconspersa in ML and NJ trees ( Fig. 7A and B), whereas in the BI consensus tree L. dispar and E. pseudoconspersa branch together with 0.6406 posterior probabilities (Fig. 7C). We conclude from the above results that differences between BI, ML and NJ methods generate different results on the relationship among different Noctuoidea species. Because most previous classifications of Lymantriidae species have been based on morphological features, the precise position of Lymantriidae within the Noctuoidea is still unclear. Kitching has suggested that the Lymantriidae are the sister group to a paraphyletic Pantheidae, sharing apomorphies such as the presence of secondary setae in first instar larvae 36 . Zahiri et al. reclassified the Noctuoidea on the basis of molecular analyses, making the group currently named Lymantriinae a subfamily of Erebidae 37 . Our results suggest that Lymantriidae can be regarded as a sister group to other families (Erebidae, Nolidae and Noctuidae) in the Noctuoidea, being most closely related to Erebidae that is consistent with previous study of Fibiger and Lafontaine (2005) on higher Noctuoidea classification. They placed the Lymantriidae from a position in front of the Nolidae to a position after Arctiidae to reflect the close association of the arctiids and lymantriids, and moved the Nolidae, Arctiidae and Lymantriidae in front of the upgraded family Erebidae so that their close relationship with the "quadrifids' is better reflected 35 . It is concluded that further studies are needed on sequencing and characterization of mitogenomes of the family Lymantriidae that will provide insight to classification of Noctuoidea.
At the level of superfamilies, Noctuoidea was closely related to Bombycoidea in our ML and NJ analyses, while in the BI tree, Bombycoidea was closely related to Geometroidea. Papilionoidea and Tortricoidea branched together in ML and NJ trees, but in the BI tree they formed separate branches, more in line with previous studies. Hepialoidea was the sister group to all other superfamilies, as found previously by Salvato et al. 16 and Chai et al. 38 . While several previous studies have been undertaken on mitogenomes of Noctuoidea, relatively little is known about Lymantriidae specifically. Further taxon sampling within Lymantriidae and related families is required to resolve the placement of Lymantriidae in Noctuoidea.

Materials and Methods
Sample collection and mitochondrial DNA extraction. L. salicis larvae were collected from willow trees within the campus of Anhui Agricultural University, Hefei, China. Total genomic DNA was extracted using the Aidlab Genomic DNA Extraction Kit (Aidlab Co., Beijing, China) according to the manufacturer's instructions. Quality of extracted DNA was assessed by electrophoresis on a 1% agarose gel stained with ethidium bromide.
Primer design, PCR amplification and sequencing. The full mitochondrial genome of L. salicis was PCR amplified in thirteen overlapping fragments, based on primers that were designed from known mitogenomes of Lymantriidae, and synthesized by Invitrogen Co. Ltd. Shanghai, China (Table 4). All PCRs were performed in a 50 μ L reaction volume, including 35 μ L sterilized distilled water, 5 μ L 10 × Taq buffer (Mg 2 + ), 4 μ L dNTP (25 mM), 1.5 μ L DNA, 2 μ L of each primer (10 μ M) and 0.5 μ L (1 unit) Taq (TaKaRa Co., Dalian, China). PCR conditions were as follows: 4 min at 94 °C, followed by 35 cycles of 30 s at 94 °C, 40 s at 46-58 °C (Table 4), and 1-3 min (depending on putative length of the fragments) at 72 °C; and then a final extension step of 72 °C for 10 min.
All PCR products were visualized by electrophoresis on a 1.0% TAE agarose gel, and purified using a DNA gel extraction kit (Transgen Co., Beijing, China). The purified PCR fragments were ligated into the T-vector (TaKaRa Co., Dalian, China) and transformed into Escherichia coli DH5α , using the manufacturer's protocol. Recombinants were cultured overnight at 37 °C on Luria-Bertani (LB) solid medium containing Ampicillin  Table 4. Details of the primers used to amplify the mitogenome of L. salicis.
Sequence assembly and gene annotation. The complete mtDNA sequence was assembled using the SeqManII program from the Lasergene software package (DNAStar Inc., Madison, USA). Sequence annotation was performed using the NCBI's web interface for BLAST (http://blast.ncbi.nlm.nih.gov/Blast). Nucleotide sequences of the PCGs were translated into putative proteins based on insect sequences available in GenBank. Initiation and termination codons were identified using an alignment created in ClustalX version 2.0, with other lepidopteran sequences as references. To describe base composition, we analyzed skew as described by Junqueira 39  The tRNA genes were verified using the program tRNAscan-SE with default settings 41 , in addition to using the alignment to visually identify sequences with the appropriate anticodons capable of folding into the typical clover-leaf secondary structure. In the A + T-rich region, tandem repeats were found with the Tandem Repeats Finder program (http://tandem.bu.edu/trf/trf.html) 42 .

Phylogenetic analysis.
A total of 29 sets of 13 PCG sequences were used to perform phylogenetic analysis, including those of L. salicis. Those from other taxa were downloaded from GenBank, with Drosophila melanogaster (U37541.1) 43 and Locusta migratoria (JN858212) 44 sequences used as an outgroup. Alignments of the 13 concatenated PCGs were conducted using ClustalX version 2.0. Maximum likelihood (ML) phylogenetic analysis was performed using MEGA 5.0 with Tamura-Nei model 40 . Neighbor Joining (NJ) distance analysis was performed using PAUP4b10 45 , and Bayesian Inference (BI) MCMC phylogenetic analysis was performed using MrBayes 3.2 46 . The ML analysis was pseudosampled with 1000 bootstrapped datasets. The NJ analysis was done with 1000 bootstrap replicates. The BI analysis used four chains MCMC, running for 1,000,000 generations, with trees being sampled every 1000 generations. The consensus tree was visualized using FigTree v1.4.0 (http://tree. bio.ed.ac.uk/software/figtree/).