The Evaluation of Genetic Relationships within Acridid Grasshoppers (Orthoptera, Caelifera, Acrididae) on the Subfamily Level Using Molecular Markers

Over the last few decades, molecular markers have been extensively used to study phylogeny, population dynamics, and genome mapping in insects and other taxa. Phylogenetic methods using DNA markers are inexpensive, fast and simple to use, and may help greatly to resolve phylogenetic relationships in groups with problematic taxonomy. However, different markers offer different levels of phylogenetic resolution, and it is important to choose the right set of molecular markers for the taxonomic level studied. Acrididae is the most diverse family of grasshoppers. Many attempts to resolve the phylogenetic relationships within it did not result in a clear picture, partially because of the limited number of molecular markers used. We tested the phylogenetic resolution of three sets of the most commonly utilized mitochondrial molecular markers available for Acrididae in the sequence database: (i) complete protein-coding mitochondrial sequences, (ii) concatenated mitochondrial genes COI, COII, and Cytb, and (iii) concatenated mitochondrial genes COI and COII. We then complemented the analysis by testing the nuclear ITS2 region. Adequate phylogenetic resolution of Acrididae subfamilies can be achieved using three (COI, COII, and Cytb) or more mitochondrial markers. Moreover, we found the ITS2 and concatenated COI/COII markers to be the least informative, providing poor resolution. All the studied acridids fall into three well-supported phylogenetic groups that include 13 subfamilies. Acridinae, Gomphocerinae, Oedipodinae, and Catantopinae are shown to be polyphyletic, while the remaining subfamilies are in accordance with current Acrididae systematics. Our study provides a basis for more comprehensive phylogenetic analyses of Acrididae on the subfamily and lower levels.

than others. A suitable phylogenetic marker is one that resolves relationships in a particular group within clades of similar age (families, subfamilies, etc.) (PATWARDHAN et al. 2014).
Currently, for many acridids and other insects, a great number of DNA sequences representing various markers (both mitochondrial and nuclear) are available in the NCBI GenBank database (BENSON et al. 2005). The most commonly used markers are the mitochondrial protein-coding genes cytochrome oxidase subunit 1 (COI), cytochrome oxidase subunit 2 (COII), and cytochrome b (Cytb), as well as nuclear (18S, 28S, including internal transcribed spacer 2 (ITS2)) and mitochondrial (16S and 12S) ribosomal RNA genes. The number of markers used in different studies of the Acrididae phylogeny also varies, from single genes (REN et al. 2004; HUANG et al. 2013) to several concatenated sequences (FLOOK & ROWELL 1997; FRIES et al. 2007; CHAPCO & CONTRERAS 2011; HUSEMANN et al. 2012; SONG et al. 2015, 2018; SUKHIKH et al. 2019). It was shown that ribosomal genes (18S, 28S) do not resolve the phylogenetic relationships of Acrididae on the subfamily level and lower because their sequences are highly conserved (SONG et al. 2015). The most recent comprehensive phylogenetic analysis was based on mitochondrial and nuclear genes for 134 taxa covering 21 of the 26 acridid subfamilies (SONG et al. 2018). However, it is unclear which minimum set of the markers that prevail in the databases can provide a sufficient and statistically supported resolution of the Acrididae subfamilies.
In our study, we conducted a phylogenetic analysis for four sets of phylogenetic markers: complete protein-coding mitochondrial (CPCM), concatenated COI/COII/Cytb, concatenated COI/COII, and ITS2 sequences in order to investigate the minimal marker composition with sufficient phylogenetic resolution, including the largest number of Acrididae species. Through comparison of the trees' properties, such as the position and support values of clusters on the trees, we identified the most suitable set of the markers for studying Acrididae phylogeny which can be recommended for future studies on the subfamily and lower levels.

DNA sampling and extraction, PCR amplification and sequencing
To obtain original data in addition to published DNA sequences, we used 66 species belonging to nine subfamilies of Acrididae. The specimens were collected in the Far East of Russia, Siberia, Central Asia, the Caucasus, Turkey, Japan, and South Africa between 2008 and 2015.
Total DNA was isolated from the thigh and jaw muscle tissue of the specimens using a DNeasy Blood and Tissue Kit (QIAGEN, Germany) according to the manufacturer's protocol.

Multiple sequence alignment and data partitioning
We searched the NCBI GenBank database for known Acrididae sequences of COI, COII, Cytb, and ITS2, as well as complete mitochondrial nucleotide sequences. All obtained sequences were organized into four sets: complete mitochondrial, concatenated COI+COII+Cytb, concatenated COI+COII, and ITS2 sequences. All multiple nucleotide sequence alignments were made using the MAFFT v7.312 program (KATOH & STANDLEY 2013) with the parameters --localpair and --maxiterate 1000.
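The alignment step can be sketched in Python as follows. This is only an illustration of the stated MAFFT settings, not the authors' actual pipeline: the function names and file names are hypothetical, and the sketch assumes a `mafft` executable is available on the PATH. MAFFT documents these options with two leading hyphens and prints the alignment to standard output.

```python
import subprocess
from typing import List


def mafft_command(input_fasta: str, max_iterations: int = 1000) -> List[str]:
    """Build the MAFFT argument list for an iterative local-pair alignment."""
    return ["mafft", "--localpair", "--maxiterate", str(max_iterations), input_fasta]


def run_mafft(input_fasta: str, output_fasta: str) -> None:
    """Run MAFFT (assumed to be on PATH) and save the alignment it prints to stdout."""
    with open(output_fasta, "w") as handle:
        subprocess.run(mafft_command(input_fasta), stdout=handle, check=True)
```

Building the argument list separately from running it makes the exact settings easy to log alongside each of the four marker sets.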
To analyze complete mitochondrial sequences, we extracted all 13 protein-coding genes from the whole sequences and concatenated them. The obtained alignment was partitioned by gene, and each gene was partitioned by codon position. Then, using PartitionFinder v2.1.1 (LANFEAR et al. 2017), we searched for the best-fit scheme to further partition our data using the "greedy" algorithm, with branch lengths set to "linked". Concatenated COI+COII+Cytb and COI+COII sequences were partitioned in the same way as complete mitochondrial sequences. Unlike the mitochondrial data, ITS2 was not partitioned, as it is a relatively short non-coding sequence; the alignment was treated as a single unit for further analysis.
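The gene and codon partitioning described above can be illustrated with a minimal Python sketch. It assumes each gene is already aligned in frame (so that the first column of each gene is a first codon position) and held in memory as a dictionary. The function names are hypothetical, and the output lines follow the data block notation used in PartitionFinder configuration files.

```python
from typing import Dict, List, Tuple


def concatenate_genes(
    genes: Dict[str, Dict[str, str]]
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments and record 1-based gene boundaries.

    `genes` maps gene name -> {taxon: aligned sequence}.
    Taxa missing from a gene are padded with gaps.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    concatenated = {taxon: "" for taxon in taxa}
    boundaries = []
    start = 1
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            concatenated[taxon] += aln.get(taxon, "-" * length)
        boundaries.append((gene, start, start + length - 1))
        start += length
    return concatenated, boundaries


def codon_data_blocks(boundaries: List[Tuple[str, int, int]]) -> List[str]:
    """Return one data block line per gene and codon position."""
    lines = []
    for gene, start, end in boundaries:
        for pos in range(3):
            # "start-end\3" selects every third column from the given start.
            lines.append(f"{gene}_pos{pos + 1} = {start + pos}-{end}\\3;")
    return lines
```

For 13 protein-coding genes this yields 39 starting data blocks, which a greedy search can then merge into a best-fit scheme.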

Phylogenetic analysis
For each set of sequences, the IQ-tree program (TRIFINOPOULOS et al. 2016) was used to construct maximum likelihood phylogenetic trees of the family Acrididae, with substitution models determined by the program using the -auto and +R (FreeRate heterogeneity) options and the lowest corrected Akaike information criterion (AICc) as the main criterion. Output from PartitionFinder2 (LANFEAR et al. 2017) also included recommended models; however, we opted for the IQ-tree model selection, because not all of the PartitionFinder2 models are available in IQ-tree. Two statistical tests available in IQ-tree were used to evaluate the credibility of the phylogenetic clusters: the approximate likelihood ratio test (SH-like aLRT, 1000 replicates) and the ultrafast bootstrap (UfBoot, 1000 replicates). According to the literature and the IQ-tree manual (MINH et al. 2013; TRIFINOPOULOS et al. 2016), clusters with aLRT/UfBoot support ≥ 80/95, respectively, may be considered credible. However, in order to retain the phylogenetic signal on lower resolution trees, we chose the threshold ≥ 80/85 for aLRT and UfBoot supports, respectively.
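The support thresholds above can be expressed as a small Python check. This is a sketch under one assumption: that each branch label has the form "aLRT/UfBoot" (for example "92.1/88"), which is how IQ-tree annotates branches when both tests are requested. The function names are hypothetical.

```python
from typing import Tuple


def parse_support(label: str) -> Tuple[float, float]:
    """Split an 'aLRT/UfBoot' branch label into two percentages."""
    alrt, ufboot = label.split("/")
    return float(alrt), float(ufboot)


def is_credible(label: str, min_alrt: float = 80.0, min_ufboot: float = 85.0) -> bool:
    """Apply the relaxed 80/85 thresholds; pass min_ufboot=95 for the standard rule."""
    alrt, ufboot = parse_support(label)
    return alrt >= min_alrt and ufboot >= min_ufboot
```

A cluster labelled "92.1/88" passes the relaxed 80/85 rule used here but would fail the stricter 80/95 rule.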
In addition to the maximum likelihood analysis, we performed a Bayesian analysis using MrBayes 3.2.6 (HUELSENBECK & RONQUIST 2001; RONQUIST & HUELSENBECK 2003). PartitionFinder2 (LANFEAR et al. 2017) was used to select substitution models for the Bayesian analysis. For each set of sequences, the starting run parameters were as follows: 2.5 million generations (ngen), sampling every 250 generations (samplefreq), with eight Markov chain Monte Carlo chains (nchains=8), and a temperature of 0.2 (temp). The analysis continued for additional generations until three conditions were satisfied: (i) average standard deviation of split frequencies ≤ 0.01, (ii) no tendency of increase or decrease over time on the MrBayes 3.2.6 sump plot, and (iii) potential scale reduction factor (PSRF) values close to 1.0. For every set of sequences, all PSRF values differed by less than 0.1. At the end of the analysis, we discarded the first 25% of the trees as burn-in. Acceptable Bayesian posterior probabilities may vary depending on the nature of the analysis. In our case, we chose a limit of 0.85.
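The sampling arithmetic and the numerical stopping conditions can be sketched as follows. The function names are hypothetical. The PSRF tolerance of 0.1 is an assumption based on one reading of the statement above, and the count ignores the initial state of the chain. Condition (ii) is a visual check of the sump plot and is not modelled.

```python
from typing import Iterable, Tuple


def retained_samples(ngen: int, samplefreq: int, burnin_fraction: float) -> Tuple[int, int, int]:
    """Return (total, discarded, retained) sample counts for one run."""
    total = ngen // samplefreq
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


def converged(
    asdsf: float,
    psrf_values: Iterable[float],
    max_asdsf: float = 0.01,
    psrf_tolerance: float = 0.1,  # assumed meaning of "close to 1.0"
) -> bool:
    """Check conditions (i) and (iii): split-frequency deviation and PSRF."""
    psrf_ok = all(abs(value - 1.0) < psrf_tolerance for value in psrf_values)
    return asdsf <= max_asdsf and psrf_ok
```

With the starting parameters, 2.5 million generations sampled every 250 give 10,000 samples per run, of which 2,500 are discarded and 7,500 retained.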
To annotate taxa from the tribe to family level on the obtained trees, we used classification obtained from the Orthoptera Species File (CIGLIANO et al. 2019).

Results and Discussion
We constructed three phylogenetic trees, starting with the complete protein-coding mitochondrial (CPCM) sequences, followed by the concatenated mitochondrial sequences of three (COI+COII+Cytb) and two (COI+COII) genes, comprising the same species that we used on the CPCM tree (Suppl. Figs 1S, 2.1S, 3.1S). However, the advantage of shorter marker sets is the greater number of available sequences. Thus, we added all available species into the analysis of concatenated sequences of three and two mitochondrial genes (Suppl. Figs 2.2S, 3.2S). Finally, we analyzed the nuclear ITS2 sequences in order to validate the mitochondrial data (Suppl. Fig. 4S). The resulting topologies of all trees are presented in Figure 1. The total list of studied taxa is presented in Suppl. Table 1S.

Complete protein-coding mitochondrial sequences
In the analysis of the CPCM sequences, we used 63 species from 11 subfamilies of Acrididae obtained from the NCBI GenBank database (BENSON et al. 2005). In addition, we used the CPCM sequences of 16 species from nine different families of the suborder Caelifera as an outgroup for Acrididae. We also used four species of Tettigoniidae (suborder Ensifera) as an outgroup for Caelifera.
All studied species of Acrididae formed a single monophyletic clade. Each of the other nine Caelifera families, as well as Tettigoniidae, formed a separate clade unique to that family (Fig. 1A, Suppl. Fig. 1S).
The resulting phylogeny shows three statistically well-supported major phylogenetic groups, which embrace the 11 subfamilies of Acrididae (Fig. 1A, Suppl. Fig. 1S). Group I includes species from three subfamilies: Oxyinae, Spathosterninae, and Hemiacridinae. Group II comprises only species of the subfamily Melanoplinae. Group III covers all seven remaining subfamilies: Acridinae, Calliptaminae (=Calopteninae), Catantopinae, Cyrtacanthacridinae, Eyprepocnemidinae, Gomphocerinae, and Oedipodinae. It should be noted that the subfamilies Acridinae, Gomphocerinae, and Oedipodinae form a well-supported subgroup within Group III. Interestingly, within this subgroup, representatives of these subfamilies each split into two lineages, implying the possible polyphyletic nature of these subfamilies (Fig. 1A, Suppl. Fig. 1S). Moreover, these lineages further organize into higher-level, well-supported clusters that contradict the currently accepted morphology-based systematics. Catantopinae also appear to be polyphyletic.
Previously, in the work of SONG et al. (2018), analysis of the mitochondrial genomes and nuclear sequences showed four main clades inside the Acrididae family, with one clade consisting entirely of subfamilies that are not represented in our work. It should be noted that this is the only well-supported main clade in the work of SONG et al. (2018). Our data from the first tree, based on the same phylogenetic methods as in SONG et al. (2018), are mostly congruent with SONG et al. (2018). The position and separation of subfamilies are the same, but in our analysis, the clades have significant statistical support values. The lack of contradictions between our phylogeny and the data from SONG et al. (2018) shows that the first tree (Fig. 1A, Suppl. Fig. 1S) offers a well-supported and robust phylogeny of the Acrididae species.

COI, COII, and Cytb sequences
Although the use of the CPCM sequences is obviously preferable for establishing the most robust phylogenetic relationships of Acrididae, it is not efficient in terms of time and resources. Thus, we tested the efficiency of a smaller set of markers: concatenated sequences of the three mitochondrial genes most commonly represented in the database, COI, COII, and Cytb. These were similar to the COI/COII/Cytb sets used in several previous studies, although those studies analyzed fewer species (AMEDEGNATO et al. 2003; CONTRERAS & CHAPCO 2006; FRIES et al. 2007; CHAPCO & CONTRERAS 2011).

Fig. 1. Only the clusters with statistical support values higher than 80% (aLRT), 85% (UfBoot) and 0.85 (Bayes) are shown. Labels to the right of the branches correspond to different lineages, consisting of species from certain subfamilies, denoted accordingly by subfamily names. Detailed trees with species names and support values can be found in Supplementary Figs 1-4, and accession numbers for the corresponding sequences are presented in Supplementary Table 1. The width of the branches corresponds to the number of species within. Phylogenetic groups are marked by colored blocks on the right, and additionally denoted with Roman numerals. Arabic numerals denote different lineages of the subfamilies that are polyphyletic in the present analysis and correspond between all four trees.
A: the tree constructed based on the 63 complete protein-coding mitochondrial sequences of Acrididae species. The tree was rooted with four species from the family Tettigoniidae (suborder Ensifera), and 16 species of nine different Caelifera families were used as an outgroup for Acrididae (not shown on the figure).
B: the tree constructed based on the 141 concatenated COI/COII/Cytb mitochondrial sequences of Acrididae species. The tree was rooted with eight Pamphagidae species (not shown on the figure).
C: the tree constructed based on the 231 concatenated COI/COII mitochondrial sequences of Acrididae species. The tree was rooted with eight Pamphagidae species (not shown on the figure).
D: the tree constructed based on the 96 ITS2 sequences of Acrididae species. The tree was rooted with three Pamphagidae species (not shown on the figure).

The resulting tree, consisting of the same species as in the CPCM analysis, was largely congruent with the CPCM tree, although with slightly lower support values (Suppl. Fig. 2.1S). In order to increase the species count and verify whether the larger dataset would affect the phylogenetic resolution, we included 141 species compared to 63 in the CPCM analysis. This included the same 11 subfamilies as on the CPCM tree, with the addition of the subfamily Pezotettiginae (Fig. 1B, Suppl. Fig. 2.2S).
The three major phylogenetic groups that are supported on this tree are similar to those on the CPCM tree, although they differ in composition (Fig. 1A,B, Suppl. Figs 1S, 2.1S, 2.2S). On the COI/COII/Cytb level, there was not enough statistical support for Group I, resulting in the separation of the Oxyinae, Spathosterninae, and Hemiacridinae subfamilies. The position of Melanoplinae (Group II) remained as on the CPCM tree. In Group III, only three subfamilies (Acridinae, Gomphocerinae, and Oedipodinae) remained conjoined compared to the CPCM tree, and they correspond to the respective subgroup in it. The species of these subfamilies formed five clusters inside phylogenetic Group III, four of which completely corresponded to the four clusters found on the CPCM tree (Fig. 1A,B, Suppl. Figs 1S, 2.2S). The remaining five subfamilies, which belong to Group III on the CPCM tree (Fig. 1A, Suppl. Fig. 1S), namely Calliptaminae, Catantopinae, Cyrtacanthacridinae, Eyprepocnemidinae, and Pezotettiginae, were placed outside Group III, each forming a separate lineage (Fig. 1B, Suppl. Fig. 2.2S). This could be due to the lower resolution of the shorter COI/COII/Cytb sequences compared to the CPCM ones. The polyphyletic nature of the Catantopinae subfamily is also supported on this level.
Interestingly, on the COI/COII/Cytb tree, the subgroup of the Acridinae, Gomphocerinae, and Oedipodinae subfamilies demonstrates the same phylogenetic patterns that were observed on the CPCM tree (Fig. 1A-B, Suppl. Figs 1S, 2.1S, 2.2S). The fact that the structure within the subgroup remains unchanged as the length of the analyzed sequences decreases indicates that these clusters present a strong and meaningful phylogenetic signal. Therefore, a more detailed look at Acridinae, Gomphocerinae, and Oedipodinae systematics may be required.

COI and COII sequences
To further decrease the time and resources spent, we tested the resolution of the concatenated COI and COII sequences. COI/COII markers are more commonly used in phylogenetic studies concerning smaller groups of Acrididae and other grasshoppers, such as separate subfamilies, tribes or genera (COLOMBO et al. 2005; CHAPCO 2013; WOLLER et al. 2014; SUKHIKH et al. 2019).
With the COI/COII sequence set, the positions of the Groups (I, II, III) were no longer concordant with the higher resolution trees, even in the analysis of the same species composition as on the CPCM tree (Fig. 1A-C, Suppl. Figs 1S, 3.1S, 3.2S). Increasing the species count to 231 allowed us to represent 14 subfamilies. Compared with the previous tree, the subfamilies Conophyminae and Proctolabinae were added (Fig. 1B-C, Suppl. Figs 2.2S, 3.2S). However, even though several subfamilies are well supported and separated (Conophyminae, Eyprepocnemidinae, Cyrtacanthacridinae, Calliptaminae, Oxyinae, Melanoplinae), it is impossible to infer the phylogenetic relationships between them on either the maximum likelihood or the Bayesian trees. Moreover, the subfamilies Acridinae, Gomphocerinae, and Oedipodinae, which are well established as a single group or a subgroup on the higher resolution trees, do not form a single clade on the COI/COII tree (Fig. 1C, Suppl. Fig. 3S). However, some intersubfamily clusters observed within this subgroup on the previous trees remain unchanged.

ITS2 sequences
Additionally, to verify the resolution of the mitochondrial markers, we tested a nuclear phylogenetic marker, the non-coding ITS2 region. Although the ITS2 phylogenetic marker is rarely used in studies above the genus level (SWORD et al. 2007), other commonly used nuclear markers (18S, 28S ribosomal RNA genes) are shown to be non-informative when analyzing the relationships below the Acridoidea family level (SONG et al. 2015).

Concluding Remarks
In this study, we analyzed the application of different sets of phylogenetic markers to infer the relationships between the Acrididae subfamilies. As expected, the most comprehensive phylogeny was obtained from the CPCM sequences (Fig. 1A, Suppl. Fig. 1S). However, using CPCM sequences comes at the cost of the limited number of taxa represented in the database, while obtaining new sequences still requires significant effort from researchers. Therefore, we tested whether shorter sets of phylogenetic markers could provide a resolution level similar to that of the CPCM tree.
The concatenated COI/COII/Cytb sequences proved to be informative enough to establish phylogenetic relationships comparable to those resulting from CPCM sequences on the subfamily level and below. However, the resolution of the relationships between subfamilies was limited (Fig. 1A-B, Suppl. Figs 1S, 2.2S). The COI/COII set of markers did not provide sufficient resolution to study the phylogenetic relationships of Acrididae on the subfamily level, although it resolved phylogenetic relationships between closely related tribes (Fig. 1C, Suppl. Fig. 3.2S). Therefore, care should be taken in the analysis of distantly related tribes of a single subfamily (such as in the Oedipodinae). The ITS2 marker was shown to be the least informative in the study of phylogenetic relationships between the Acrididae subfamilies. Moreover, although the ITS2 marker was able to separate the subfamilies, it should be used with caution, as the relationships between tribes and genera inside the subfamilies are poorly supported and not clear (Fig. 1D, Suppl. Fig. 4S). Thus, we can recommend the COI/COII/Cytb sequences as an alternative minimal set of markers to infer phylogenetic relationships between Acrididae species on the subfamily level.
Finally, using the selected small set of markers, we have demonstrated that some problematic points of Acrididae systematics can already be observed. The most obvious example is the relationship between and within the Acridinae, Gomphocerinae, and Oedipodinae subfamilies, as well as relationships among Catantopinae species.