Mitochondrial phylogeny and comparative mitogenomics of closely related pine moth pests (Lepidoptera: Dendrolimus)

Background Pine moths, Dendrolimus spp. (Lasiocampidae), are serious economic pests of conifer forests. Six closely related species (Dendrolimus punctatus, D. tabulaeformis, D. spectabilis, D. superans, D. houi, and D. kikuchii) occur in China and cause serious damage to coniferophyte. The complete mito genomes of Dendrolimus genus are significant to resolve the phylogenetic relationship and provide theoretical support in pest control. Methods The complete mitogenomes of three species (D. superans, D. houi, and D. kikuchii) were sequenced based on PCR-amplified with universal primers, which were used to amplify initial fragments. Phylogenetic analyses were carried out with 78 complete mitogenomes of lepidopteran species from 10 superfamilies. Results The complete mitochondrial genomes of these three species were 15,417, 15,381, and 15,377 bp in length, separately. The phylogenetic analyses produced consistent results for six Dendrolimus species based on complete mitogenomes, two major clades were formed, one containing D. spectabilis clustered with D. punctatus + D. tabulaeformis, and D. superans as the sister group to this three-taxon clade, the other containing D. kikuchii and D. houi. Comparative analyses of the congeneric mitochondrial genomes were performed, which showed that non-coding regions were more variable than the A+T rich region. The mitochondrial nucleotide diversity was more variable when compared within than among genus, and the concatenated tRNA region was the most conserved and the nd6 genes was the most variable.


INTRODUCTION
Pine moths in the genus Dendrolimus (Lasiocampidae) are major economic pests of coniferous trees, such as Pinus, Larix, Picea, and Abies. The caterpillars feed extensively on conifer needles; the resulting damage may reduce the tree's seed yield and can lead to heavy defoliation, dieback, and death (Hou, 1987;Chen, 1990;Liu & Wu, 2006;Zhang et al., 2003a;Zhang et al., 2003b). During an outbreak period, a pine tree can be consumed (Curole & Kocher, 1999) providing an understanding of evolutionary dynamics and trends in a phylogenetic framework.
In this study, six complete mitogenomes from three species (D. superans, D. houi, and D. kikuchii, two individuals per species) were newly sequenced. These were combined with the complete mitogenomes of three other species (D. punctatus, D. tabulaeformis, D. spectabilis), which have been published previously (Qin et al., 2015), to investigate the taxonomic status of species in the genus Dendrolimus. To place the relationships of the genus Dendrolimus within a broader context, we also conducted phylogenetic analyses of mitogenomes from other lepidopteran species (mainly moth species). In order to investigate the evolutionary dynamics among six Dendrolimus species, comparative analyses were conducted based on 14 mitogenomes (including two subspecies of D. punctatus), comparing nucleotide composition, codon usage, differences of overlap and non-coding regions.

MATERIALS AND METHODS
Sample collection, DNA extraction, PCR amplification, sequencing, sequence assembly, and annotation Adult pine moth specimens were sampled at four locations in China, including Inner Mongolia, Yunnan and Guizhou provinces (Table 1). All specimens were preserved in 95% ethanol in the field and stored at 4 C in the laboratory until DNA extraction. The specimens were identified by Chun-sheng Wu, Institute of Zoology, Chinese Academy of Sciences, China, using morphological characters. Six individuals of three species (D. kikuchii, D. houi, and D. superans, two individuals for each species) were selected for sequencing in this study. Total genomic DNA was extracted from thoracic muscle tissue and leg muscle tissue using a DNeasy BLOOD and Tissue kit (QIAGEN, Hilden, Germany) following the manufacturer's protocol.
Mitochondrial genomes were PCR-amplified and sequenced as described in our previous study (Qin et al., 2015). In brief, universal primers were used to amplify initial fragments. Specific fragments were then designed to amplify overlapping regions (i.e., primer walking) (Salvato et al., 2008;Gissi, Iannelli & Pesole, 2008). PCR recipes and conditions followed Qin et al. (2015). All reactions were performed using Takara LA taq (TaKaRa Co., Dalian, China). PCR fragments containing the control region were cloned into the pEASY-T3 Cloning Vector (Beijing TransGen Biotech Co., Ltd., Beijing, China) and then sequenced by using tailed primers, M13-F (CGCCAGGGTTTTCCCAGTCACGAC) and M13-R (GAGCGGA TAACAATTT CACACAGG) primers. Raw sequences were checked manually and assembled on the basis of overlapping regions with the Bioedit V7.0.5 (Hall, 1999). The tRNA genes were identified by tRNAscan-SE Search Server v.1.21 (Simon et al., 1994). Protein-coding and rRNA genes were determined by comparing homologous sequences with other published lepidoptera mitochondrial genomes (following (Qin et al., 2015)). The sequence data have been deposited in GenBank under accession numbers KY000409-KY000414.

Phylogenetic analysis
To place the relationships of the genus Dendrolimus within a broader context, we reconstructed the phylogenetic tree of mitogenomes from other 64 lepidopteran species (mainly moth species). On the other hand, in order to investigate the phylogenetic status of Lasiocampoidea, comparative analyses were conducted based on 14 mitogenomes (including six species) of Lasiocampoidea and 13 mitogenomes of Bombycoidea. Other than the six mitogenomes obtained in this study, other complete mitogenomes were mined from Genbank representing lepidopteran species from 10 superfamilies (Supplemental Information 1). Moreover, in order to clarify the relationship between D. superans and D. sibiricus, the COI and COII sequences of D. sibiricus (GenBank No. KJ007773, KJ007800, KJ007771, and KJ007815) were used as supplement data to reconstructed phylogenetic tree, which was carried out based on 13 PCGs with maximum likelihood (ML) method.
Nucleotide sequences of the 13 PCGs were aligned based on the translated amino acid sequences using a customized perl script. Non-protein coding region were aligned using MUSCLE with default settings (Edgar, 2004). The separated genes and partitions were concatenated with SequenceMatrix software (Vaidya, Lohman & Meier, 2011). The concatenated sets of nucleotides were organized into two datasets: dataset 1 representing the 13 PCGs only and dataset 2 representing 37 genes (13 PCGs + 22 transfer RNA genes (tRNA) + 2 ribosomal RNA genes (rRNA)). Substitution saturations of 2 datasets were tested with software DAMBE (Xia & Xie, 2001), and both datasets were used in phylogenetic analyses, under the optimality criteria of ML and Bayesian inference (BI) (Ronquist & Huelsenbeck, 2003).
In order to standardize the partitioning strategy as recommended for phylogenetic analyses with mitogenomes (Zardoya & Meyer, 1996), PartitionFinder v1.1.1 software was used to select the optimal partitioning scheme and to find the best-fitting substitution model for each partition under the Bayesian Information Criterion (Lanfear et al., 2012). Not only that, optimized nucleotide substitution models could avoid being affected by the long branch attraction to some extent (Bergsten, 2005). The maximum possible partition scheme was 15 partitions: each PCG as a separate partition, the concatenated 22 tRNA genes and the concatenated rRNA genes).
Maximum likelihood analysis was performed with RAxML v7.9.6 and BI analysis with a parallel version of MrBayes v 3.2.2 (Stamatakis, 2006;Ronquist et al., 2012). The GTR+G+I model was selected for each partition in the two datasets. Support values for the ML topologies were evaluated via bootstrap tests with 1,000 iterations (in RaxML). BI analysis was conducted with two sets of four independent Markov chains run for 10 million Metropolis-coupled generations, with tree sampling occurring every 1,000 generations, and burn-in set to 25% of the trees. After 10 million generations, all runs reached stationary as determined by the program Tracer v1.5.0 (Rambaut & Drummond, 2007).

Genetic distance analysis among closely related species of Dendrolimus
In order to test the intraspecific and interspecific differentiation of Dendrolimus, 14 mitogenomic were used to calculate the genetic distance across the two datasets described above, which including two subspecies of D. punctatus (D. punctatus punctatus and D. punctatus wenshanensis) and other five species. Genetic distances were calculated using the GTR model selected as the best model by akaike information criterion which performed with Modeltest 3.7 (Posada & Buckley, 2004;Ronquist et al., 2012). Genetic distances were calculated using a custom C++ script that uses the bio++ function library (Guéguen et al., 2013). A correlation matrix was also estimated according to obtained genetic distance matrix. Correlation values ranged from -1 to 1, where values closer to one are indicative of a closer relationship. A graphical visualization of the genetic distances and correlation matrix was drawn using the corrplot.mixed function in R package (Wei, 2013).

Phylogenetic analyses of Lasiocampoidea
In the phylogenetic analyses of 78 moth mitogenomes, the monophyly of each superfamily was generally well-supported, in which Lasiocampoidea and Bombycoidea were monophyletic and clustered together as sister groups with high support (Figs. 1 and 2). Similar trees were obtained based on both two datasets (13PCGs and 37 genes), the only difference was among the superfamilies Bombycoidea, Geometroidea, Lasiocampoidea and Noctuoidea, which altogether constitute approximately 73,000 described species (Minet, 1991). The 13 PCGs dataset phylogeny placed Geometroidea with Bombycoidea and Lasiocampoidea, and Noctuoidea as the sister group to this three-taxon clade (53% BP support and 0.78 posterior probabilities) ( Fig. 1), which revealed similar relationship with a prior study . Nonetheless, the 37 gene dataset phylogeny (Supplemental Information 2) placed Bombycoidea + Lasiocampoidea as the sister group to Geometroidea + Noctuoidea with higher branch support (100% BP support and 1.0 posterior probabilities). The latter relationship was demonstrated with morphological and multigenetic proofs (Regier et al., 2009;Nieukerken Van et al., 2011;Bazinet et al., 2013;Kawahara & Breinholt, 2014). Likelihood and Bayesian inference analyses of (i) 13 protein coding genes (13PCGs); (ii) 37 genes (13 protein-coding genes + 22 transfer RNA genes + 2 ribosomal RNA genes, 37 gene). Numbers above or below branches indicate posterior probabilities and bootstrap percentages across the difference analyses and datasets (13PCGs-BI/13PCGs-ML/37gene-BI/37gene-ML). (B) Cladogram constructed using Bayesian inference analysis of nucleotide sequences of 13 mitochondrial protein-coding genes of Lepidopteran (moth) species, plus outgroups. Numbers above or below branches indicate posterior probabilities.

Phylogenetic and genetic distance analyses of Dendrolimus
Phylogenetic analyses of Dendrolimus resulted in a fully resolved tree with robust support for nearly all nodes (Figs. 1 and 2). Phylogenetic analyses inferred from different datasets exhibited the same topology. Six species formed two major clades: D. punctatus + D. tabulaeformis + D. spectabilis + D. superans (Clade 1) was the sister group to D. kikuchii and D. houi (Clade 2). Within "Clade 1," D. spectabilis clustered with D. punctatus + D. tabulaeformis, and D. superans was the sister group to this three-taxon clade (Fig. 1A). Among six morphospecies, D. superans was successfully found as monophyletic species of these three species (D. tabulaeformis, D. punctatus, and D. spectabilis), which was also confirmed by the results based on COI, COII and ribosomal internal transcribed spacer (ITS) (Dai et al., 2012;Kononov et al., 2016). Different from the previous view that D. sibiricus was a subspecies of D. superans (Liu & Wu, 2016), Kononov et al. (2016) demonstrate that D. sibiricus and D. superans were clearly distinguished from each other based on the phylogenetic analysis of ITS2 sequences. According to the phylogenetic analyses inferred from COI and COII sequences, D. sibiricus and D. superans were closed related species (Fig. 1B). However, the taxonomic status of D. sibiricus was still ambiguous and more data from nuclear markers were required. The genetic distance analyses produced results which were consistent with the results of the phylogenetic analyses. The correlation values obtained from genetic distance analysis among specimens of Dendrolimus showed that in many cases intraspecific and interspecific values were very similar. Values for intraspecific and interspecific correlations in the group comprising D. tabulaeformis and two subspecies of D. punctatus were equal or very close to one, which suggests these sequences all have quite a few differences, which would generally be regarded within the range of intraspecific variation. To illustrate the relationship of Dendrolimus more clearly, we re-calculated genetic distance with considered D. punctatus and D. tabulaeformis as an integral taxon (Group A). The genetic distance between D. spectabilis and Group A were 0.05, whereas D. superans and Group A were 0.07 (Fig. 2). Furthermore, both the correlation value between D. houi-Group A and D. kikuchii-Group A were negative, highlighting the relatively distant genetic relationship with other four species (D. punctatus, D. tabulaeformis, D. spectabilis, and D. superans) (Fig. 3).

Base composition and skewness
Metazoan mitogenomes usually exhibit a clear strand bias toward adenine (A) and thymine (T) in nucleotide composition. Consistent with previous observations of Dendrolimus mitogenomes, the mitochondrial sequence of three newly-sequenced Dendrolimus species were biased toward A and T. The A+T content of the majority strand ranged from 78.7% and 78.8% for D. kikuchii, 80% and 79.9% for D. houi, and 80.1% and 80.2% for D. superans (Supplemental Information 3). The strand bias also can be measured as AT-and GC-skews. The average AT-skew across all available Dendrolimus mitochondrial genomes was 0.028, ranging from 0.037 to 0.017, whereas the average GC-skew of the Dendrolimus mitochondrial genomes was -0.23, ranging from -0.26 to -0.22.

Codon usage and RSCU
Condon usage and RSCU results were compared across all available Dendrolimus mitogenomes (Supplemental Information 4). The analysis showed that Leu2 (UUR), Ile, Phe, Met, Asn, Gly, Ser2 (UCN), Tyr are the eight most frequent amino acids and were represented by at least 50 codons per thousand codons. Two codon families, Leu2 and Ile, had at least 100 codons per thousand codons. Leu2, a hydrophobic amino acid, was significantly more frequent than other amino acids, which may relate to the function of chondriosomes in many transmembrane proteins. The rarest used codon family was Cys.
The usage of both twofold and fourfold degenerate codons was biased towards the use of codons with A or T in third position (Supplemental Information 5). Codons which have relatively high G and C content are likely to be abandoned, reflecting a finding across other lepidopteran insects. Examination of the fourteen individual Dendrolimus mitogenomes showed that Leu2 (UUA), Ser2 (UCU), Arg (CGA), Ala (GCU), Ser1 (AGA) are the five most frequent relative synonymous codons.

Sliding-window analysis
Sliding-window analysis was conducted to compare nucleotide diversity among the mitochondrial PCGs and non-coding regions of 14 individuals in Dendrolimus (Fig. 4).
The intergenic region has the highest nucleotide diversity which is likely attributable to the large indels in this region. This was followed by nd6, cytb, cox2, atp6, cox3, nd3, A+T rich region, nd1, cox1, nd2, nd5, nd4, nd4l, atp8, rRNA, tRNA. It is notable that the nucleotide diversity of the A+T rich region was moderate; lower than many PCGs. The tRNA was the most conserved region and cox1 was the most conserved PCG. In contrast, sliding-window analyses using all 78 lepidopteran mitogenomes (same dataset as the phylogenetic analyses) produced substantially similar patterns: the nd6 gene had the highest level of divergence and tRNA was the most conserved region, while the cox1 was the most conserved than all PCGs.

DISCUSSION
The monophyly of each superfamily was generally well-supported based on the 78 mitogenome analysis, which was consistency with prior studies (Yang et al., 2009;Kawahara & Breinholt, 2014;Qin et al., 2015). Previous studies have included Lasiocampidae within Bombycoidea (Brock, 1971;Scoble, 1992;Kawahara & Breinholt, 2014), while other studies have treated Lasiocampidae as a distinct superfamily Lasiocampoidea (Minet, 1991;Regier et al., 2009;Van Nieukerken et al., 2011;Zwick et al., 2011;Bazinet et al., 2013;Wu et al., 2016). However, according to our results based on mitogenomes, Lasiocampoidea, and Bombycoidea were monophyletic and clustered together as sister groups with high support. Within the Bombycoidea, the relationship among the families Bombycidae, Sphingidae, and Saturniidae has been difficult to resolve in previous study (Regier et al., 2013). In our study, the analysis of both datasets placed the Bombycidae as the sister group to Saturniidae and Sphingidae with high support (100% bootstrap), which is consistent with the phylogenetic relationship based on transcriptomic data (2,696 genes) (Breinholt & Kawahara, 2013). The topology of our mitogenome Dendrolimus phylogeny showed some differences from the topology proposed by previous studies. Zhang et al. (2014) constructed a phylogeny of Dendrolimus based on one pheromone-binding proteins (PBP1) and two general odorant-binding proteins (OBPs) in which D. kikuchii and D. houi was proposed as basal species of D. tabulaeformis, D. punctatus, and D. spectabilis. The relationships of these three species were verified with mitogenomes analysis, sharing a closer relationship to each other with respect to D. superans. This result was also proved by the phylogenetic analysis of Dendrolimus based on COI and COII genes (Dai et al., 2012;Qin et al., 2015;Kononov et al., 2016). Two species, D. kikuchii and D. houi, were the most basal species in Dendrolimus. According to the phylogenetic tree based on whole mitogenomes, these two species could both be accept as monophyletic taxa, which was consistent with previous studies depended on COII and ITS genes (Dai et al., 2012;Kononov et al., 2016). This result was different from the previous studies based on the analysis with PBP1 and OBPs. The primary cause of this discrepancy is due to the different modes of inheritance, maternal for mtDNA and biparental for PBPs, OBPs, and ITS. Moreover, the dissimilarity in mutation rates and number of information sites could also lead to inconsistent results. For example, D. kikuchii and D. houi were the most closed related species based on the COI gene (Dai et al., 2012), however, paraphyletic groups were showed based on 5′-end portion of COI gene (Kononov et al., 2016). Not only that, the closed relationship of D. tabulaeformis, D. punctatus, and D. spectabilis was also proved by the genetic distance analysis. D. superans was the sister species to these three species, which also had different mitogenome component characteristics. The largest intergenic spacer of whole mitogenome is the A+T rich region, which not only has the characteristics of non-coding genes, but also contains important sites for the regulation of transcription and replication (Gissi, Iannelli & Pesole, 2008), as well as useful phylogenetic signals, particularly for determining congeneric relationships and relationships among recently diverged species. The results of phylogenetic analysis using the A+T rich region produced similar but slightly different topology comparing with the whole mitogenomes. This suggests the intergenic regions might be too variable to be useful for phylogenetic analyses; nevertheless, the A+T rich region might be an effective molecular tool in solving phylogenetic relationships among recently diverged species.

CONCLUSION
In this study, both phylogenetic and genetic distance analyses obtained consistent results regarding the relationships among six closely related species. The whole mitogenomes failed to provide enough information to distinguish D. tabulaeformis from D. punctatus, which suggest there might not be a clear species boundary between these two species. This finding is consistent with the results of previous studies, in which D. tabulaeformis was regarded as ecological type of D. punctatus based on several DNA markers and experiments of interspecific hybridization. Meanwhile, D. spectabilis fell as sister to these two sibling species, and D. superans fell as sister to these three taxa. D. kikuchii and D. houi are sister species, having relatively close relationship comparing with other four species. The phylogenetic relationships of Dendrolimus based on complete mitogenome could provide a theoretical basis for pest control of pine caterpillar, thereby reducing the economic losses of forests.
Congeneric species exhibit similar mitochondrial genome features, such as genome organization, nucleotide composition, codon usage and RSCU. Within the genus Dendrolimus, start and stop codons were variable in mitochondrial genome and the change of stop codons were rarer than start codons. Non-coding regions were the most variable regions in mitochondrial genomes. When comparing nucleotide diversity, the nad6 gene had the highest level of divergence and the tRNA region was the most conserved.