Mitochondrial genomes of three Tetrigoidea species and phylogeny of Tetrigoidea

The mitochondrial genomes (mitogenomes) of Formosatettix qinlingensis, Coptotettix longjiangensis and Thoradonta obtusilobata (Orthoptera: Caelifera: Tetrigoidea) were sequenced in this study, and almost the entire mitogenomes of these species were determined. The mitogenome sequences obtained for the three species were 15,180, 14,495 and 14,538 bp in length, respectively, and each sequence included 13 protein-coding genes (PCGs), partial sequences of rRNA genes (rRNAs), tRNA genes (tRNAs) and a A + T-rich region. The order and orientation of the gene arrangement pattern were identical to that of most Tetrigoidea species. Some conserved spacer sequences between trnS(UCN) and nad1 were useful to identify Tetrigoidea and Acridoidea. The Ka/Ks value of atp8 between Trachytettix bufo and other four Tetrigoidea species indicated that some varied sites in this gene might be related with the evolution of T. bufo. The three Tetrigoidea species were compared with other Caelifera. At the superfamily level, conserved sequences were observed in intergenic spacers, which can be used for superfamily level identification between Tetrigoidea and Acridoidea. Furthermore, a phylogenomic analysis was conducted based on the concatenated data sets from mitogenome sequences of 24 species of Orthoptera in the superorders Caelifera and Ensifera. Both maximum likelihood and bayesian inference analyses strongly supported Acridoidea and Tetrigoidea as forming monophyletic groups. The relationships among six Tetrigoidea species were (((((Tetrix japonica, Alulatettix yunnanensis), Formosatettix qinlingensis), Coptotettix longjiangensis), Trachytettix bufo), Thoradonta obtusilobata).

Because of their small size and minor importance as agricultural pests, this group has been of little concern and the focus of few studies. The study of Tetrigoidea focused primarily on behaviour, morphology, anatomy and cytology before the 1990s, and included bioecological observations (Paranjape & Bhalerao, 1985) and karyology (Warchałowska-Śliwa & Maryańska-Nadachowska, 1989;Maryańska-Nadachowska & Warchałowska-Śliwa, 1991;Del Cerro, Jones & Santos, 1997;Ma & Zheng, 1994;Ma & Guo, 1994). Research on the molecular systematics of Tetrigoidea gradually appeared later, with most of the studies focusing only on single genes. For example, the phylogenetic results of Flook & Rowell (1997a) and Flook & Rowell (1997b) support the monophyly of Tetrigoidea and a close relation between Tetrigoidea and Tridactyloidea. In a study of the phylogeny of Tetrigoidea, Jiang (2000) showed that Scelimenidae was sister group to all other Tetrigoidea of the sampling, and Tetrigidae located at the end of the phylogeny. However, according to the research of Chen (2005) and Yao (2008), Batrachididae was located at a more basal position and sistered to all other Tetrigoidea.
Animal mitogenome sequencing has exploded in recent years, and over 40,000 mitogenomes are avalialbe in the NCBI database (Tan et al., 2017). The insect mitogenome is typically a small, double-stranded circular molecule that ranges in size from 14 to 19 kb and encodes 37 genes (Kim et al., 2005). The mitogenome is one of the most extensively studied genomic systems and a widely used molecular component in the phylogenetic analysis of insects (Cameron, 2014), such as Tarragoilus diuturnus (Zhou, Shi & Zhao, 2014) and Lerema accius (Cong & Grishin, 2016).
Tetrigoidea is an important group in the phylogenetic and systemic studies of Caelifera; however, few complete mitogenomes were found in the GenBank database. Thus, currently, the phylogeny of Tetrigoidea is almost completely unknown (Song et al., 2015). For further study of the phylogenetic relationships among Tetrigoidea, the mitogenomes of Formosatettix qinlingensis, Coptotettix longjiangensis and Thoradonta obtusilobata were determined in this study. The phylogenetic analysis based on mitogenome data will provide a new insight for better understanding the phylogenetic relationship of Caelifera as well as Tetrigoidea.

Sample collection and DNA extraction
Specimens of F. qinlingensis were collected in Shaanxi, China, those of C. longjiangensis in the Guangxi Zhuang Autonomous Region, China, and those of T. obtusilobata in Guizhou, China. Insects were preserved in 100% ethanol and stored at 4 • C. The total genomic DNA was extracted using the standard phenol/chloroform method (Sambrook & Russell, 2001).

PCR amplification and sequencing by primer walking
Ten primary pairs of primers (Table S1) were used to amplify contiguous and overlapping fragments of the mitogenomes of F. qinlingensis, C. longjiangensis and T. obtusilobata, based on other published primer pairs (Zhou, 2008;Simon et al., 2006). PCR was performed in a total volume of 25 µL containing 12.5 µL of r-Taq mix (TaKaRa, Dalian, China), 9.5 µL of ddH 2 O, 1 µL of each primer (10 µmol), and 1 µL of template DNA. The amplifications were performed under the following conditions: predenaturation at 96 • C for 2 min followed by 40 cycles of 96 • C for 20 s, 50.4 • C for 90 s and 68 • C for 4 min and a final extension at 72 • C for 7 min. PCR products were sequenced by Beijing Huada Gene Technology Co., LTD.

Sequence assembly, annotation and analysis
The mitogenome sequences of F. qinlingensis, C. longjiangensis and T. obtusilobata were assembled using the Staden package 1.7.0 (Staden, Beal & Bonfield, 2000). Most of the transfer RNAs were identified by tRNAscan-SE 1.21 (Lowe & Eddy, 1997), and the other genes were determined by comparison with T. japonica (GenBank accession number JQ340002). The secondary structures of rRNA were inferred by comparison with those of Pedopodisma emiensis (Zeng, 2014) and Gomphocerus sibiricus (Zhang, 2013).

Phylogenetic analyses
In this study, the complete mitogenomes of 21 members of Caelifera, including three newly determined sequences of F. qinlingensis, C. longjiangensis and T. obtusilobata were used in the phylogenetic analysis (Table S2). Three species of Ensifera were used as the out-groups (Table S2). Thirteen protein-coding genes (PCG) and two rRNA genes were used for the construction of phylogenetic trees. All PCGs were aligned at the amino acid level using the default settings in MEGA 6.0 (Tamura et al., 2013), and the alignments were back translated to the corresponding nucleotide sequences. Because of high variability, the stop codons in PCGs were excluded in the alignment (Zhang et al., 2014;Shuang-Shuang et al., 2014). Two rRNA genes were aligned using Clustal X1.83 (Thompson et al., 1997), respectively. Finally, a PCG12 data set of 7,580 bp containing the first and second codon sites of 13 PCGs, a PCG123RY data set of 11,370 bp containing 13 PCGs with the third codon sites employing RY-coding strategy, a PCG12rRNA data set of 9,950 bp containing the first and second codon sites of 13 PCGs and two rRNA genes, and a PCG123RYrRNA data set of 13,740 bp containing 13 PCGs with the third codon sites employing RY-coding strategy and two rRNA genes were used for the phylogenetic analyses. PartitionFinder v1.1.1 (Lanfear et al., 2012) was used to search the optimal partitions and best models, with the ''unlinked'' branch lengths, ''BIC'' model selection, and ''greedy'' algorithm (Table 1).
The phylogenies were determined using both maximum likelihood (ML) and Bayesian inference (BI) methods. The ML analysis was performed using the program RAxML version 7.0.3 (Stamatakis, 2006), and the optimal partitions and best models were selected by using PartitionFinder v1.1.1 (Lanfear et al., 2012). A bootstrap analysis was performed with 1,000 replicates. The BI analysis was performed using MrBayes version 3.1.2 (Ronquist & Huelsenbeck, 2003), and also employing the optimal partitions and best models selected by PartitionFinder v1.1.1 (Lanfear et al., 2012). According to Markov Chain Monte Carlo analysis, four chains (one cold and three heated chains) were set to run simultaneously for 1,000,000 generations. Each set was sampled every 100 generations with a burn-in of 25%, and the remaining samples were used to obtain the consensus tree. The effective sample size (ESS) values were analyzed by Tracer v1.5 (Rambaut, Suchard & Drummond, 2004), with ESS values greater than 200.

Mitochondrial genomic structure
The size of the mitogenome sequence obtained from F. qinlingensis, C. longjiangensis and T. obtusilobata was 15,180, 14,495 and 14,538 bp, respectively ( Table 2). The three mitogenomes were deposited in the GenBank database under accession numbers KY798412 (F. qinlingensis), KY798413 (C. longjiangensis) and KY798414 (T. obtusilobata). The gene composition, order, and orientation of all three mitogenomes were the same as those of the mitogenomes of other Tetrigoidea species, such as T. japonica (JQ340002), and each sequence included 13 PCGs, partial sequences of rRNA genes (rRNAs), tRNA genes (tRNAs) and a A + T-rich region (Table 2; Fig. 1). As shown in other Tetrigoidea species, transcribed from the light strand were two rRNAs, four PCGs and eight tRNAs ( Table 2). The A + T contents were 75.6%, 73.1% and 71.8% in the mitogenomes of the Tetrigoidea species F. qinlingensis, C. longjiangensis and T. obtusilobata, respectively.

Nucleotide composition and skew
A comparative analysis of A + T content vs AT-skew and G + C content vs GC-skew within Caelifera mitogenomes is shown in Fig. 2. The approximately positive correlations were found between A + T content and AT-skew, and as well as between G + C content and GC-skew ( Figs. 2A and 2B). The trends of increased A + T content and AT-skew were roughly Tridactyloidea < Eumastacoidea < Acridoidea/Tetrigoidea, while the increased G + C content and GC-skew were roughly Acridoidea/Tetrigoidea < Tridactyloidea.
The average AT-skew of Caelifera mitogenomes was 0.15, ranging from 0.01 in Ellipes minuta to 0.22 in C. longjiangensis (Table S3). The average GC-skew of mitogenomes was −0.19, ranging from −0.30 in E. minuta to −0.11 in Pielomastax zhengi (Table S3). The Tridactyloidea had lower A + T content and A-skew, higher G + C content and C-skew compared with other superfamily in Caelifera.

Protein-coding genes
In F. qinlingensis, C. longjiangensis and T. obtusilobata, the A + T content of PCGs was 74.7%, 72.0% and 70.5%, respectively. For each PCG of the three Tetrigoidea mitogenomes, the A + T contents of atp8 and nad6 were much higher and those of COX genes in all three species lower than those of the other genes (Fig. S1), which are similar results to those found by Zhang et al. (2013b). Four PCGs (nad5, nad4, nad4L and nad1) coded by the N-strand had a T-skewed value, whereas each PCG in the J-strand was C-skewed, and each PCG in the N-strand was G-skewed (Fig. S1), which are results similar to those for Gomphocerinae mitogenomes (Zhang et al., 2013b).
For the initial and termination codons, the most common start codon was ATG. Start codons GTG, ATT, ATC, ATA, GTA and ACA also occurred in the Tetrigoidea species, with some of them conserved, such as ATC in cox1. The same use of ATC in cox1 is found in other  For all three Tetrigoidea species, stop codon usage was consistent in 11 PCGs (nad2, cox2, atp8, atp6, cox3, nad3, nad5, nad4, nad4L, nad6 and nad1). Cox3 and nad5 were terminated with the incomplete stop codon T in the three Tetrigoidea species. The terminal T serves as a stop signal after it is completed to UAA via post-transcriptional polyadenylation (Ojala, Montoya & Attardi, 1981). The relative synonymous codon usage of Caelifera was analyzed. The use of the anticodons NNA and NNU was relatively frequent, while NNG and NNC was lower (Table S4). This result revealed the preference for A or T in the third position, which was similar to the results of whiteflies (Chen et al., 2016). Mitogenome encoded 22 tRNA genes, which were used to synthesis 20 amino acids. Some mostly used synonymous codons of PCGs did not correspond to the tRNA anticodons of mitogenomes. For example, UUU is the mostly used synonymous codon of Phe(F) (Table S4), while anticodon of trnF in the mitogenomes is UUC (Fig. S4). This result shows that the protein synthesis of mitogenomes not only depends on mitochondria encoded tRNAs, but also needs nuclear encoded tRNAs.
The average ratio of Ka/Ks was calculated for each PCG of six Tetrigidae mitogenomes. The results showed that atp8 had the highest evolutionary rate, while cox1 was the lowest (Table S5). The average ratios of Ka/Ks for each PCG were all below 1 (Table S5), indicating the existence of purifying selection. A roughly negative correlation was observed between the average ratio of Ka/Ks and the G + C content of each PCG (Table S5), which was also found in true bug mitogenomes (Li et al., 2012). The evolutionary patterns of mitochondrial genes were probably caused by the varied G + C content (Hua et al., 2008). Furthermore, the ratios of Ka/Ks for atp8 gene were above 1 in some pairwise comparison (Fig. 4), indicating under positive selection. The varied sites of atp8 gene might be associated with the evolution of T. bufo (Fig. 5).

Ribosomal and transfer RNA genes
As in most insect mitogenomes, two rRNA genes (rrnL and rrnS) occurred in the three Tetrigoidea mitogenomes between trnL(cun) and the A + T-rich region, separated by trnV. The lengths of rrnS and rrnL determined in F. qinlingensis were 781 and 1,292 bp, respectively, and the A + T content of rrnS and rrnL was 76.7% and 79.3%, respectively. The overall rrnS structure of F. qinlingensis included three domains (Fig. S2), which were identical with those predicted for other Caelifera species, such as G. sibiricus (Zhang et al., 2013b). The secondary structure of rrnL in F. qinlingensis contained six domains with domain III degenerated to a single strand as one bond (Fig. S3), which is a structure similar to that found in the study of Zhang et al. (2013b). The percentage of conserved sites in the six domains among the three Tetrigoidea species showed that more conserved sites were in domains IV, V and VI than in other domains, whereas domain III had more variable sites.
A total of 22 tRNAs were found interspersed in the mitogenomes of F. qinlingensis and C. longjiangensis, which ranged in size from 54 bp (trnI) to 72 bp (trnV). Both trnL and trnS had two copies in the mitogenomes. Most of the tRNAs could be folded into the canonical cloverleaf secondary structure, except for trnS(agn) (Fig. S4). The trnS(agn) lacked the DHU arm in the three Tetrigoidea mitogenomes, which is a feature commonly observed in other Caelifera species (Zhao et al., 2010;Liu & Huang, 2010). Twenty-two non-Watson-Crick pairings were identified in tRNA genes of the F. qinlingensis mitogenome, including 18 G-U mismatches. Most of these G-U pairs were found in tRNAs on the N-strand. By contrast, in the study of Asakawa et al. (1991), G-U pairs are found more frequently in tRNAs of the J-strand in mitogenomes of the various animals they examined. Two A-G pairs were predicted in the acceptor arm of trnW and trnR; one A-A pair was predicted in the acceptor arm of trnQ; and one C-U pair was predicted in the TψC arm of trnH (Fig. S4).

A + T-rich region
A 600 bp A + T-rich region was observed between rrnS and trnI in the mitogenome of F. qinlingensis, which was composed of 80.8% A + T. The high mutation rate of this region might be related to the high A + T content and low selection pressure (Yang et al., 2011). F. qinlingensis had a larger A + T-rich region than that of other species of Tetrigoidea, e.g., 460 bp in A. yunnanensis (JQ272702) and 531 bp in T. japonica (JQ340002). Conserved or variable sections are not observed in the A + T-rich regions of insects; whereas tandem repetitions and conserved structural elements have been observed (Zhang, Szymura & Hewitt, 1995;Zhang & Hewitt, 1997). The A + T-rich region of F. qinlingensis contained tandem repeated sequences, and the repeats with (AATAATAAAAAAA)n (n = 3.1) were found at the 5 end of the A + T-rich region (nt 30-71), with more A nucleotides.

Phylogenetic analyses
The phylogenetic trees resulting from the PCG-ML and PCG-BI analyses were consistent, except for Myrmecophilus manni ( Fig. 6 and Fig. S5). The ML and BI topologies of mitochondrial datasets generated similar tree topologies ( Fig. 6 and Fig. S5).
In this study, Acrididae was the sister group of Pyrgomorphidae in Acridoidea. The phylogenetic relationships of subfamilies in Acrididae were ((((Gomphocerinae + Oedipodinae) + ((Calliptaminae + Catantopinae) + (Oxyinae + Melanoplinae))) + (Acridinae + Oedipodinae)) + Thrinchinae). However, the phylogenetic relationships within Acrididae obtained in this study contained some differences with other studies (Zhang et al., 2013a), such as a clade including A. cinerea and L. migratoria, which might be caused by different sampling approaches. Apart from different sampling approaches, hybridization might be a major reason for the difference, as hybridization has been observed and described in a number of acridoid species (Gottsberger, 2007;Hochkirch & Lemke, 2011;Rohde et al., 2015).

Grant Disclosures
The following grant information was disclosed by the authors: National Natural Science Foundation of China: 31601846. Natural Science Foundation of Shaanxi Province, China: 2017JQ3014. China Postdoctoral Science Foundation: 2016M602760. Fundamental Research Funds for the Central Universities, China: GK201603109, GK201603112.