The complete mitochondrial genome of Dysgonia stuposa (Lepidoptera: Erebidae) and phylogenetic relationships within Noctuoidea

To determine the Dysgonia stuposa mitochondrial genome (mitogenome) structure and to clarify its phylogenetic position, the entire mitogenome of D. stuposa was sequenced and annotated. The D. stuposa mitogenome is 15,721 bp in size and contains 37 genes (protein-coding genes, transfer RNA genes, ribosomal RNA genes) usually found in lepidopteran mitogenomes. The newly sequenced mitogenome contained some common features reported in other Erebidae species, e.g., an A+T biased nucleotide composition and a non-canonical start codon for cox1 (CGA). Like other insect mitogenomes, the D. stuposa mitogenome had a conserved sequence ‘ATACTAA’ in an intergenic spacer between trnS2 and nad1, and a motif ‘ATAGA’ followed by a 20 bp poly-T stretch in the A+T rich region. Phylogenetic analyses supported D. stuposa as part of the Erebidae family and reconfirmed the monophyly of the subfamilies Arctiinae, Catocalinae and Lymantriinae within Erebidae.


INTRODUCTION
Dysgonia stuposa (Lepidoptera: Erebidae) is an important pest species with a wide distribution throughout southern and eastern Asia. Its larvae mainly consume the leaves of Punica granatum (pomegranate), resulting in considerable economic losses. In the northern areas of China, D. stuposa pupates during the winter to avoid the harsh environment (Piao, Fan & Zheng, 2012). Identifying and controlling D. stuposa at the pupal stage based on morphological characteristics is quite difficult for taxonomists and population ecologists. Despite its economic importance, our understanding of D. stuposa biology and phylogenetic status at the molecular level is still in its infancy. Molecular techniques such as DNA barcoding and PCR-RFLP are considered more reliable than morphology for studying animal taxonomy (Arimoto & Iwaizum, 2014; Raupach et al., 2010). Applying molecular techniques to study the D. stuposa mitogenome sequence will help in its precise identification and classification while contributing to future genetic ecology and evolutionary analyses.
The insect mitogenome is typically a circular, double-stranded DNA molecule of 14-19 kb (Boore, 1999). Compared to the nuclear genome, the mitogenome is small and comparatively easy to sequence. Mitogenomes usually share several typical characteristics, such as a stable gene composition and a conserved gene arrangement, and they are therefore widely used in molecular identification, population genetics, systematics and biogeographic studies (Wolstenholme, 1992; Wilson et al., 2010). Given the vast diversity of insects, mitogenome analyses are beneficial for species identification and are broadly employed in the study of genomic evolution and phylogenetic relationships (Lu et al., 2013; Cameron, 2014).
Noctuoidea is one of the largest superfamilies of Lepidoptera, with over 42,400 described species (Nieukerken et al., 2011). A metathoracic tympanal organ is a characteristic feature that distinguishes Noctuoidea species from other superfamilies (Miller, 1991). However, morphology-based phylogenetics has failed to resolve classification conflicts at the family and subfamily level. Furthermore, the initial molecular studies were also unable to provide sufficient information, as most of them relied on one or two genes and only 29-49 species (Mitchell et al., 1997; Fang et al., 2000). Mitchell, Mitter & Regier (2006) conducted systematic analyses based on two nuclear genes (elongation factor-1α (EF-1α) and dopa decarboxylase (DDC)) with increased taxon sampling (146 species); these analyses supported the monophyly of subfamilies and proposed an LAQ clade (Lymantriidae and Arctiidae became subordinate subfamilies within quadrifid noctuids). Zahiri et al. (2011) reconstructed the molecular phylogeny of Noctuoidea using one mitochondrial gene (cox1) and seven nuclear genes (EF-1α, wingless, RpS5, IDH, CMDH, GAPDH and CAD) from 152 species with the Maximum Likelihood (ML) method. They proposed a new perspective, splitting up the traditional group of quadrifid noctuids and re-establishing Erebidae and Nolidae as families (Zahiri et al., 2011). However, this study failed to clarify phylogenetic relationships between Erebidae subfamilies (Zahiri et al., 2012). Additionally, morphological studies were not entirely consistent with the molecular studies, which challenged some traditional synapomorphies such as the "quadrifid" forewing venation and the presence of a transverse sclerite in the pleural region of segment A1 (Minet, Barbut & Lalanne-Cassou, 2012).
Complete mitogenomes and individual mitochondrial genes are increasingly applied to understand phylogenetic relationships. For example, Wang et al. (2015) proposed two new tribes within Lymantriinae and established the relationships between them by using two mitochondrial genes (cox1 and rrnL) along with six nuclear genes, analyzed with ML and Bayesian Inference (BI). The nucleotide and amino acid sequences of mitochondrial protein-coding genes (PCGs) are also broadly used to determine the taxonomic status of species and to analyze phylogenetic relationships within Erebidae (Yang & Kong, 2016; Liu et al., 2017). Furthermore, as the mitogenome differs from the nuclear genome, it has been increasingly used to investigate poorly supported phylogenetic questions such as the position of Nymphalidae within Papilionoidea (Yang et al., 2009). Because many species of the genus Dysgonia have been moved to other genera within Erebidae and Noctuidae based on the classification of Holloway & Miller (2003), the taxonomic status of many species remains uncertain. In our study, we sequenced the complete mitogenome of D. stuposa and reconstructed phylogenetic relationships to assess its phylogenetic position within Noctuoidea. The newly sequenced mitogenome supported new phylogenetic relationships within Erebidae and will provide a foundation for further studies of Noctuidae and Erebidae mitogenomics, biogeography, and phylogenetics.

Specimen collection and genomic DNA extraction
The D. stuposa moths were collected from the Xiangshan mountains (N33°59′, E116°47′), Huaibei, Anhui, China. Based on morphological characteristics, the collected specimens were identified as D. stuposa using the record in Fauna Sinica (Chen, 2003). Genomic DNA (containing both the nuclear genome and the mitogenome) of D. stuposa was isolated using the Animal Genomic DNA Isolation Kit (Sangon, Shanghai, China) according to the manufacturer's instructions.

PCR amplification and fragment sequencing
To amplify the D. stuposa mitogenome, universal (F1-R13) and specific (S1F-S3R) primers were used for PCR amplification (Table 1) (Sun et al., 2016). All PCR amplifications were carried out using a high-fidelity DNA polymerase (PrimeSTAR® GXL, Takara, Dalian, China). PCRs were performed according to Sun et al. (2016), with extension times depending on the putative length of the target fragment. PCR product size was determined by agarose gel electrophoresis with TAE buffer, and the products were then sequenced at General Biosystems (General, Chuzhou, China) in both forward and reverse directions on an ABI 3500 Genetic Analyzer by the Sanger method. For long fragments, internal sequencing primers were designed based on the known fragment sequence. For the A+T rich region, the fragment was sequenced in both directions, and the sequencing was repeated three times.

Sequence assembly and annotation
The complete mitogenome was assembled using DNAMAN (https://www.lynnon.com/index.html). Sequence annotation (supplied in the supplemental files) was performed with the MITOS2 Web Server (http://mitos2.bioinf.uni-leipzig.de/index.py) and confirmed by BLAST against homologous sequences in NCBI (https://blast.ncbi.nlm.nih.gov/Blast.cgi). To determine PCG initiation and termination codons, sequences were aligned with other published Noctuoidea sequences using ClustalX 2.0 (Larkin et al., 2007). AT skew and GC skew values were calculated using the methods given by Perna & Kocher (1995). MEGA 5.0 software was used to analyze relative synonymous codon usage (RSCU) (Tamura et al., 2011). tRNA genes were identified with the tRNAscan Search Server (http://lowelab.ucsc.edu/tRNAscan-SE/), and their secondary structures were inferred by folding them into the canonical clover-leaf structure (Lowe & Eddy, 1997). rRNA genes were identified with the MITOS2 Web Server and confirmed by BLAST against homologous sequences in NCBI. Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html) was used to analyze non-coding regions for tandem repeats (Benson, 1999).
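The skew statistics mentioned above are simple ratios of base counts. The following is a minimal sketch, assuming the commonly used definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C); the function name `base_skews` and the decision to ignore ambiguous bases are illustrative assumptions, not details taken from the study.

```python
from collections import Counter


def base_skews(sequence: str) -> dict:
    """Return A+T content (%) and AT/GC skew for a nucleotide sequence.

    Assumed definitions (the form usually attributed to Perna & Kocher, 1995):
        AT skew = (A - T) / (A + T)
        GC skew = (G - C) / (G + C)
    """
    counts = Counter(sequence.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # ambiguous bases such as N are ignored (assumption)
    if a + t == 0 or g + c == 0:
        raise ValueError("sequence needs at least one A/T and one G/C base")
    return {
        "at_content": 100.0 * (a + t) / total,
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }


if __name__ == "__main__":
    # Toy sequence, not real data: A=3, T=2, G=1, C=3
    print(base_skews("AAATTGCCC"))
```

A negative AT skew indicates more T than A on the analyzed strand, and a negative GC skew indicates more C than G.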

Phylogenetic analysis
To infer the phylogenetic relationships among Noctuoidea at the superfamily level, the nucleotide sequences of PCGs from 42 species (Table 2) were aligned and concatenated. All of the sequences were downloaded from GenBank. Bombyx mori (Bombycidae; AY048187) and Antheraea pernyi (Saturniidae; AY242996) (Liu et al., 2008) were used as outgroups. Sequences were aligned using ClustalX 2.0 software (Larkin et al., 2007). ML and BI were used to reconstruct phylogenetic relationships. For the ML analysis, nucleotide sequences were partitioned and analyzed in IQ-TREE (http://iqtree.cibiv.univie.ac.at/) with the best-fit model GTR+F+I+G4 (Trifinopoulos et al., 2016), and clade support was assessed with 1,000 bootstrap replicates. For the BI analysis, the GTR model with invgamma rate variation across sites was applied in MrBayes 3.2.6 (Ronquist et al., 2012). One cold chain and three heated chains were run with the dataset for 10 million generations, with the tree being sampled every 1,000 generations. After discarding

Genome organization and composition
The D. stuposa mitogenome is a circular DNA molecule of 15,721 bp (accession number: MK262707) (Fig. 1). The size of the newly sequenced mitogenome is comparable to that of other Noctuoidea species, which range from 15,377 bp (Agrotis ipsilon) to 15,801 bp (Gynaephora minora) (Table 3). The variation in size is generally due to differences in the length of the non-coding regions (intergenic spacers and A+T rich region) (Lv, Li & Kong, 2018). Annotation found the typical 37 genes and a non-coding A+T rich region, as in most sequenced insect mitogenomes (Table 4). An A and T biased nucleotide composition is a characteristic feature of insect mitogenomes (Boore, 1999), and D. stuposa is no exception. The nucleotide composition of D. stuposa was highly biased towards A and T (A = 39.98%, T = 40.38%, G = 7.5%, C = 12.14%) (Table 3); the total A+T content of 80.36% is comparable to that of previously sequenced lepidopterans (ranging from 77.84% in Ochrogaster lunifer to 81.49% in Gynaephora minora).

Protein-coding genes and codon usage
PCGs identified from the D. stuposa mitogenome had a total length of 11,269 bp, accounting for 71.7% of the mitogenome. In insects, most PCGs are on the J strand (majority), while some of them reside on the N strand (minority) (Simon et al., 1994). In D. stuposa, nine of the thirteen PCGs (nad2, cox1, cox2, atp8, atp6, cox3, nad3, nad6 and cob) are encoded on the J strand, while the remaining PCGs (nad5, nad4, nad4L and nad1) are on the N strand. An ATN codon initiated all PCGs except cox1, which uses a CGA codon, as in most Lepidoptera (Table 4). The use of non-canonical initiation codons for cox1 is a common feature across insects (Liu et al., 2016; Dai et al., 2016).
To estimate codon usage among Noctuoidea species and to assess similarities and variations in codon usage and distribution, the PCG nucleotide sequences of seven Noctuoidea species (belonging to four families: Erebidae, Noctuidae, Nolidae and Notodontidae) were compared (Fig. 2). In D. stuposa, phenylalanine (Phe), asparagine (Asn), leucine (Leu), methionine (Met), tyrosine (Tyr) and isoleucine (Ile) were the most commonly used amino acids, while cysteine (Cys) was the most rarely used. Codon usage is similar across Noctuoidea. Furthermore, we used the codons per thousand (CDspT) metric to illustrate the codon distribution in different species (Dai et al., 2015) (Fig. 3). CDspT results exhibited similar trends across the Noctuoidea superfamily, with the maximum CDspT values observed for Asn and Ile. Relative Synonymous Codon Usage (RSCU) for Noctuoidea species is presented in Fig. 4. Codon usage within a given amino acid varied between species. All codons were found in D. stuposa, except ACG and CCG. Some noctuid species lack GC-rich synonymous codons with G or C at the third codon position, such as GCG, CGC, GGC and CCG (e.g., these are not present in A. ipsilon) (Wu, Cui & Wei, 2015). The rarity or complete absence of GC-rich codons occurs in various insect species (Sun et al., 2017; Li et al., 2018).
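The two codon statistics discussed above can be sketched in a few lines. This is a minimal illustration, not the MEGA procedure used in the study: it assumes NCBI translation table 5 (invertebrate mitochondrial), treats all codons of an amino acid as one synonymous family (so Leu and Ser are not split into Leu1/Leu2 and Ser1/Ser2), and interprets CDspT as codons per amino acid per 1,000 codons. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(cds: str) -> Counter:
    """Count in-frame codons; incomplete trailing codons are dropped."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(counts: Counter) -> dict:
    """RSCU = observed codon count / mean count over its synonymous family."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def codons_per_thousand(counts: Counter) -> dict:
    """Codons per amino acid, scaled to 1,000 codons (assumed CDspT form)."""
    total = sum(counts.values())
    per_aa = Counter()
    for codon, n in counts.items():
        per_aa[CODON_TABLE[codon]] += n
    return {aa: 1000.0 * n / total for aa, n in per_aa.items()}


if __name__ == "__main__":
    toy = count_codons("TTTTTTTTCATA")  # toy CDS: TTT TTT TTC ATA
    print(rscu(toy))
    print(codons_per_thousand(toy))
```

An RSCU above 1 means a codon is used more often than expected under equal usage of its synonyms; a codon that never appears, such as ACG or CCG here, would receive a value of 0.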

Ribosomal RNA and transfer RNA genes
The D. stuposa mitogenome contains the large (rrnL) and small (rrnS) ribosomal RNA genes, which are encoded on the N strand and are 1,308 bp and 782 bp long, respectively (Fig. 1, Table 4). In D. stuposa, rrnL was located between trnL1 and trnV, while rrnS resided between trnV and the A+T rich region, as reported in previously sequenced mitogenomes (Yang et al., 2009).
There are 22 tRNA genes in the D. stuposa mitogenome, ranging in size from 57 bp (trnA) to 71 bp (trnK) (Table 4). Almost all tRNAs had the canonical clover-leaf secondary structure, except trnS1, which lacks the dihydrouridine (DHU) arm (Fig. 5), a common feature of trnS1 across insect mitogenomes (Lavrov, Brown & Boore, 2000; Zhang et al., 2013). Stem pair mismatches were observed in the secondary structures of the tRNAs, including an A-A mismatch (trnM), U-G mismatches (trnI, trnQ, trnW, trnY, trnL2, trnG, trnF, trnH, trnT, trnP, trnV), U-U mismatches (trnY, trnL2, trnS2) and a U-C mismatch (trnA). These mismatches may be corrected by an RNA-editing process, as proposed by Lavrov, Brown & Boore (2000), but this has not been investigated fully in Lepidoptera.

Overlapping, intergenic spacer and A+T rich regions
Overlapping genes have been proposed to extend the genetic information that can be encoded within the limited size of the genome, and they are commonly observed in metazoan mitogenomes (Wolstenholme, 1992). We identified nine overlapping regions with a total length of 144 bp (Table 4). A seven bp overlapping region at the boundary of atp6 and atp8 has also been reported in many other insects. The D. stuposa mitogenome also had 21 intergenic spacer regions, ranging in size from 1 to 105 bp. The 105 bp spacer is located between trnA and trnR and has a high A and T content (A = 47.62% and T = 49.52%); a similar spacer has been described in Andraca theae (77 bp spacer with A = 46.75% and T = 44.16%). We also observed a 22 bp spacer containing an 'ATACTAA' motif between nad1 and trnS2 (Fig. 6A). This region exists in most insect mitogenomes, although it varies in size between lepidopteran species (Cameron & Whiting, 2008). Metazoan mitogenomes usually have a single large non-coding region, named the A+T rich region (Clayton, 1991). It contains initiation signals for DNA transcription and replication (Fernández-Silva, Enriquez & Montoya, 2003). The A+T rich region of the D. stuposa mitogenome is located between rrnS and trnM and is 406 bp in size (Table 4), with a negative GC skew (−0.355) and the highest A+T content (92.37%) of the genome (Table 3). The A+T rich region usually contains multiple tandem repeat elements (Zhang & Hewitt, 1997); however, D. stuposa did not have macro-repeats but does include short repeating sequences. It has the 'ATAGA' motif along with a 20 bp poly-T stretch, a microsatellite-like (AT)10 repeat and a poly-A stretch upstream of trnM (Fig. 6B). The length of the poly-T stretch varies between species (Dai et al., 2015), but the 'ATAGA' motif is conserved in insects (Zhang & Hewitt, 1997).
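Short elements such as those described above can be located with simple pattern matching. The sketch below is only illustrative: the minimum poly-T length, the allowed gap after 'ATAGA', and the minimum number of AT units are assumed thresholds, the function names are hypothetical, and the search runs on the given strand only, so a sequence may first need to be reverse-complemented to match the reported orientation.

```python
import re

# Illustrative thresholds (assumptions, not values from the study):
# up to 5 bases may separate 'ATAGA' from a poly-T run of at least 10 T.
ATAGA_POLY_T = re.compile(r"ATAGA[ACGT]{0,5}?(T{10,})")
# A microsatellite-like run of at least 5 consecutive AT units.
AT_MICROSATELLITE = re.compile(r"(?:AT){5,}")
SPACER_MOTIF = "ATACTAA"


def find_control_region_elements(region: str) -> dict:
    """Report the 'ATAGA' + poly-T element and the first (AT)n run, if present."""
    region = region.upper()
    found = {}
    match = ATAGA_POLY_T.search(region)
    if match:
        found["ATAGA_start"] = match.start()  # 0-based position
        found["poly_T_length"] = len(match.group(1))
    match = AT_MICROSATELLITE.search(region)
    if match:
        found["AT_repeat_units"] = len(match.group(0)) // 2
    return found


def find_motif(sequence: str, motif: str = SPACER_MOTIF) -> list:
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    return [m.start() for m in re.finditer(f"(?={motif})", sequence.upper())]


if __name__ == "__main__":
    # Synthetic toy sequence, not the real A+T rich region.
    toy = "GGATAGA" + "T" * 20 + "CC" + "AT" * 10 + "AAAA"
    print(find_control_region_elements(toy))
    print(find_motif("CCATACTAAGG"))
```

For repeat discovery without a predefined pattern, a dedicated tool such as Tandem Repeats Finder, which the study used, remains the more appropriate choice.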

Phylogenetic relationships
To determine the phylogenetic position of D. stuposa, we reconstructed phylogenetic relationships with other Noctuoidea species. In phylogenetic analyses, mitogenome PCGs are less sensitive to analytical bias than other genes such as the tRNA or rRNA genes (Yang et al., 2015). Here, we used the nucleotide sequences of the 13 PCGs for phylogenetic analyses with the BI and ML methods. Results showed that D. stuposa is closely related to Grammodes geometrica, forming a clade that was well supported by both methods (Figs. 7A and 7B). D. stuposa belongs to the family Erebidae and subfamily Catocalinae, consistent with the reported classification of Erebidae (Zahiri et al., 2011). Erebidae is a large noctuoid family (Yang et al., 2015); however, its monophyly has remained unconfirmed, especially for Catocalinae (Zahiri et al., 2012). In the present study, Catocalinae was recovered as monophyletic, but nodal support was low (posterior probability of 0.76 in BI and bootstrap value of 31% in ML). There is still some controversy about the relationships of Catocalinae within Erebidae. Zahiri et al. (2011) demoted Catocalinae to a tribe, Catocalini, within the subfamily Erebinae, and upgraded Anobini (formerly a tribe within Catocalinae; Holloway, 2005) to the subfamily Anobinae. Several species of the genus Dysgonia have been reclassified into Noctuidae (Holloway & Miller, 2003), which further complicates phylogenetic analysis. Within Erebidae, our study supported the monophyly of subfamilies and suggested that Catocalinae is a subfamily, most closely