Mitochondrial genomes of Macropsini (Hemiptera: Cicadellidae: Eurymelinae): Structural features, codon usage patterns, and phylogenetic implications

Abstract Macropsini is a tribe of Eurymelinae in the family Cicadellidae that is distributed worldwide. Its taxonomic status has nevertheless been unstable, and the genus-level classification of certain clades has been controversial. The aim of this study is to address the patterns and processes that explain the structure and evolution of the mitogenomes of Macropsini, while contributing to the resolution of systematic issues involving five of its genera. To this end, the mitogenomes of 26 species of the tribe were sequenced and characterized, and their phylogenetic relationships were reconstructed. The results revealed that the nucleotide composition of mitochondrial genes in these 26 species was significantly skewed toward A and T. In terms of relative synonymous codon usage, codons ending with T or A were significantly more prevalent than those ending with C or G. The parity plot, neutrality plot, and correspondence analysis revealed that both mutation and selective pressure affect codon usage patterns. In the phylogeny of Macropsini, the monophyly of Pedionis and Macropsis was well supported, whereas Oncopsis was recovered as paraphyletic with respect to Pediopsoides. In conclusion, this research not only contributes valuable data to the understanding of the mitogenome of Macropsini but also provides a reference for future investigations on codon usage patterns, potential adaptive evolution, and the phylogeny of the mitogenome within the subfamily Eurymelinae.


| INTRODUCTION
The mitochondrial genome has been widely used in molecular systematics and population genetics due to its moderate length, relatively conserved gene arrangement, and rapid evolutionary rate (Ballard & Whitlock, 2004). Currently, with the popularization of high-throughput sequencing technology, the number of studies comparing the mitogenomes of different insect taxa has gradually increased. These studies mainly concentrate on genome structure, base composition bias, and codon usage bias (Cameron, 2013; Jiang et al., 2019; Yang et al., 2023).
Codon usage bias (CUB) refers to the phenomenon in which synonymous codons of a species or a gene are used at different frequencies (Hasegawa et al., 1979). The study of CUB is beneficial for exploring genetic evolution and understanding gene expression characteristics.
The evolution of CUB is a complex and debated issue. The mutation-selection-drift equilibrium model is the most widely recognized theory (Duret & Mouchiroud, 1999; Li & Tzagoloff, 1979). However, the impact of these evolutionary forces on different species remains undefined (Hershberg & Petrov, 2008). Additionally, various biological factors associated with CUB have been identified, such as base composition characteristics, GC content, gene expression level, gene length, tRNA abundance, and amino acid properties (hydrophobicity and hydrophilicity) (Min & Hickey, 2007; Yadav & Swati, 2012).
Macropsini (Hemiptera: Auchenorrhyncha: Membracoidea: Cicadellidae: Eurymelinae) is a tribe of Eurymelinae in the family Cicadellidae. Macropsini is mainly characterized by a very broad, short crown that protrudes forward. The pronotum is strongly convex and covered with wrinkles and dots (Figure 1). These minute insects, colloquially known as hoppers, are plant feeders that suck sap from grasses, shrubs, or trees. They undergo incomplete metamorphosis and, interestingly, have host associations that range from very generalized to very specific. Macropsini are distributed worldwide, with over 750 species reported in 19 genera globally. In China, there are records of 138 species belonging to eight genera (Li, Dai, & Webb, 2023; Li, Li, et al., 2023; Li, Wang, et al., 2023; Wang, Wu, Yang, & Dai, 2020). These insects are significant ecological and commercial forest pests that mainly feed on woody plants, causing direct damage by sucking plant sap and by oviposition (Li et al., 2014). In addition, some species are potential pests due to their ability to spread plant pathogens, such as Oncopsis alni (Schrank), a vector of grapevine yellows (Beirne, 1954; Kunkel, 1935; Maixner & Reinert, 1999).
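The taxonomic history of Macropsini has seen several changes. Evans initially established Macropsidae with Macropsis Lewis as the type genus (Evans, 1936) and later reduced it to Macropsinae, which included Macropsini and Nioniini (Evans, 1946). Nioniini was subsequently excluded, leaving only one tribe in the subfamily Macropsinae (Linnavuori, 1978). Dietrich and Thomas (2018) further downgraded it to Macropsini and placed it and Idiocerini in the subfamily Eurymelinae. However, there are still significant uncertainties in the genus-level classification of Macropsis, Pedionis, and Pediopsoides. To address this, various researchers have conducted systematic studies. For instance, Li and Dai (2018) analyzed 22 Macropsini species based on the COX1 gene, supporting the monophyly of Oncopsis and Pedionis but not of Macropsis and Pediopsoides. Subsequently, Xue et al. (2020) conducted a systematic study of Macropsini based on molecular fragments and morphological characteristics, supporting the monophyly of Macropsis, but leaving the taxonomic status of Pediopsoides unclear.
At present, there have been few studies on the complete mitogenome of the Macropsini, and only two whole mitogenome sequences are available on NCBI (https://blast.ncbi.nlm.nih.gov/Blast.cgi). In this study, the mitogenomes of 26 species of the Macropsini were sequenced using high-throughput sequencing technology and analyzed. We combined different strategies to explore the characteristics of codon usage bias in the mitogenome of the Macropsini, providing a reference for a deeper understanding of their evolution. In addition, this study reconstructed the phylogenetic relationships among genera and species of the Macropsini using mitogenome sequences, which provides a solid molecular basis for understanding the evolution of the tribe and supports further studies of the population genetics and evolution of Eurymelinae.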

| Sample collection and DNA extraction
Adult samples were collected in the field by daytime sweep netting and nighttime light trapping and were preserved in anhydrous ethanol. The samples were returned to the laboratory and stored in a −20°C freezer until DNA extraction. The adult specimens were identified according to their morphological traits (Li et al., 2019; Li, Dai, & Webb, 2023; Li, Li, et al., 2023). Genomic DNA was extracted from the head and thoracic muscle tissue of a single adult male using the DNeasy® Blood & Tissue Kit according to the manufacturer's recommendations. The extracted genomic DNA was stored at −20°C for further analysis. DNA samples and voucher specimens with male external genitalia were preserved at the Institute of Entomology, Guizhou University, Guiyang, China (GUGC).

| Sequence assembly and annotation
Whole-genome sequencing of the 26 species was performed using Illumina sequencing technology (Illumina HiSeq 4000 platform, 150 bp paired-end reads with an average insert size of 350 bp and 2 GB of clean data; Berry Genetics, Beijing, China). The reads from the NGS data were mapped in Geneious v 2019.2.1 using the "Map to Reference" function with Medium-Low sensitivity and five iterations, using 600 bp COI sequences of Macropsis notata (GenBank NC_042723) and Oncopsis nigrofasciata (GenBank MG_813492) (Wang, Wu, Yang, & Dai, 2020) as references. The resulting consensus then served as a new reference sequence, and the assembly procedure described above was repeated until all the mitogenomic reads were extracted. The retrieved mitogenome sequence was preliminarily annotated using the MITOS web server (http://mitos.bioinf.uni-leipzig.de/index.py) (Bernt et al., 2012) based on the invertebrate mitochondrial genetic code. The NCBI ORF Finder function was then used to locate the 13 protein-coding genes (PCGs) based on the same genetic code, and abnormal initiation and termination codons were determined by comparison with, and correction against, previously published mitochondrial PCGs from related species (Wang et al., 2017). ARWEN v.1.2 (Laslett & Canbäck, 2007) and tRNAscan-SE v.1.21 (Lowe & Eddy, 1997) were used to locate the 22 tRNA genes. The rRNA genes were identified by comparison with those of other hemipteran insects, and their boundaries were determined from the positions of the neighboring tRNA genes. Finally, the assembled sequences were uploaded to NCBI (https://www.ncbi.nlm.nih.gov/), and the accession numbers are shown in Table 1.

| Nucleotide composition and diversity analysis
The sequences were analyzed using MEGA X (Kumar et al., 2018) to obtain the overall nucleotide composition of each mitogenome (A%, T%, C%, G%), the nucleotide composition at the third codon position (A3%, T3%, C3%, G3%), the GC content (G + C), the AT content (A + T), and the mean G + C frequency at the first and second codon positions (GC12). Strand asymmetry was calculated using the following formulas: GC skew = (G − C)/(G + C), and AT skew = (A − T)/(A + T) (Perna & Kocher, 1995).
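The composition and skew statistics defined above can be illustrated with a minimal Python sketch. It is not the MEGA X procedure used in this study; the function name `composition_and_skew` and the toy sequence are hypothetical, and ambiguous bases are assumed to be ignored.

```python
from collections import Counter


def composition_and_skew(seq: str) -> dict:
    """Base percentages, A+T and G+C content, and strand skews for one sequence."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # assumption: ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, T, G, or C")
    return {
        "A%": 100 * a / total,
        "T%": 100 * t / total,
        "G%": 100 * g / total,
        "C%": 100 * c / total,
        "AT%": 100 * (a + t) / total,
        "GC%": 100 * (g + c) / total,
        # Skew formulas as given in the text (Perna & Kocher, 1995)
        "AT_skew": (a - t) / (a + t) if a + t else 0.0,
        "GC_skew": (g - c) / (g + c) if g + c else 0.0,
    }


toy = "AAAATTTGCC"  # hypothetical toy sequence, not real data
print(composition_and_skew(toy))
```

A positive AT skew indicates more A than T on the analyzed strand, and a negative GC skew indicates more C than G.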
The polymorphic sites and nucleotide diversity (Pi) of each PCG among species were determined using DnaSP v5.0 (Librado & Rozas, 2009). A sliding window of 200 bp with a step size of 20 bp was used to calculate Pi across the PCGs and rRNA genes in the alignment of the 26 mitogenomes (Yang et al., 2023).
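As a rough illustration of the sliding-window calculation, the sketch below computes Pi as the mean proportion of differing sites over all sequence pairs in each window. It assumes a gap-free alignment of equal-length sequences and does not reproduce how DnaSP treats gaps or missing data; the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    n_sites = len(seqs[0])
    if n_sites == 0 or any(len(s) != n_sites for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * n_sites)


def sliding_pi(alignment, window=200, step=20):
    """Return (window midpoint, Pi) tuples; midpoints are 0-based offsets."""
    length = len(alignment[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in alignment]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Hypothetical toy alignment with a small window so the output is easy to check
toy_alignment = ["AAAAAAAA", "AAAATAAA", "AAAATAAT"]
for midpoint, pi in sliding_pi(toy_alignment, window=4, step=2):
    print(midpoint, round(pi, 4))
```

With the defaults (`window=200`, `step=20`) the loop mirrors the window and step sizes stated above.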

| Codon usage
The relative synonymous codon usage (RSCU) of the 13 PCGs in the mitogenome was analyzed using MEGA X (Kumar et al., 2018). A heatmap illustrating the RSCU values of the PCGs, excluding stop codons, in the 26 newly sequenced mitogenomes was plotted using Chiplot (https://www.chiplot.online/). RSCU is the ratio of the observed frequency of a specific synonymous codon to the frequency expected if all synonymous codons for that amino acid were used equally; because it is independent of gene length and amino acid frequency, it reflects the level of CUB more accurately (Sharp & Li, 1986).
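The RSCU definition can be sketched in Python as follows. This is an illustrative sketch rather than the MEGA X implementation: it assumes the invertebrate mitochondrial genetic code (NCBI translation table 5) and groups all codons of an amino acid into one synonymous family, whereas some tools split the Leu and Ser families. The function name `rscu` and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# NCBI translation table 5 (invertebrate mitochondrial), codons ordered over TCAG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds: str) -> dict:
    """RSCU per sense codon: observed count * family size / family total."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop an incomplete trailing codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(
        c for c in codons if c in CODON_TABLE and CODON_TABLE[c] != "*"
    )
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded, as in the text
            families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


toy_cds = "AAAAAAAAAAAG"  # hypothetical: three AAA codons and one AAG codon (Lys)
print(rscu(toy_cds))
```

A value above 1 means the codon is used more often than expected under equal usage within its family, and a value below 1 means it is used less often.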

| Parity Rule2 (PR2) bias plot
PR2 bias plot analysis was performed with AT-bias = A3/(A3 + T3) as the abscissa and GC-bias = G3/(G3 + C3) as the ordinate to explore the effects of mutation pressure and natural selection. Theoretically, A and T (or G and C) are used in equal proportions at the third codon position when the gene is influenced by base composition alone (Kawabe & Miyashita, 2003). In contrast, the combination of selective pressure and mutation pressure can result in unequal use of A and T or of G and C (Sueoka, 1995). The central position of the plot (0.5, 0.5) corresponds to A = T and G = C, which would indicate no bias arising from mutation or natural selection (Yengkhom et al., 2019).
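The two PR2 coordinates can be computed directly from third codon positions, as in the following minimal sketch. The function name `pr2_point` and the toy sequences are hypothetical, and the sketch assumes an in-frame coding sequence from which any stop codon has already been removed.

```python
def pr2_point(cds: str) -> tuple:
    """Return (AT-bias, GC-bias) = (A3/(A3+T3), G3/(G3+C3)) for one coding sequence."""
    # Slicing from index 2 in steps of 3 picks the third base of each complete codon
    third = cds.upper().replace("U", "T")[2::3]
    a3, t3, g3, c3 = (third.count(base) for base in "ATGC")
    if a3 + t3 == 0 or g3 + c3 == 0:
        raise ValueError("third positions lack A/T or G/C; bias is undefined")
    return a3 / (a3 + t3), g3 / (g3 + c3)


# Hypothetical toy sequences: the first is perfectly balanced at third positions
print(pr2_point("ATAATTGCGGCC"))
print(pr2_point("GCAGCAGCTGCG"))
```

Points with an AT-bias below 0.5 have T3 > A3, and points with a GC-bias above 0.5 have G3 > C3.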

| Neutrality plot
Neutrality plot analysis is a method for quantitatively analyzing the influence of directional mutation pressure and natural selection on CUB. Here, we constructed the neutrality plot with GC12 as the y-axis and GC3 as the x-axis. The correlation between GC12 and GC3 was analyzed using SPSS v26.0 software based on the Pearson correlation coefficient. When the slope of the regression line approaches 0 and there is no significant correlation between GC12 and GC3, CUB is considered to be shaped entirely by natural selection. In contrast, when the slope is close to or equal to 1 and the correlation is significant, CUB is mainly affected by mutation pressure (Sueoka, 1988).
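The quantities behind a neutrality plot can be sketched as follows: GC12 and GC3 for each sequence, then an ordinary least-squares slope and the Pearson correlation coefficient of GC12 on GC3. This is an illustrative sketch, not the SPSS analysis used here; it does not compute p-values, and the function names and toy values are hypothetical.

```python
import math


def gc_by_position(cds: str) -> tuple:
    """Return (GC12, GC3) for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    p1, p2, p3 = (cds[i:usable:3] for i in range(3))

    def gc(bases):
        return sum(b in "GC" for b in bases) / len(bases)

    return (gc(p1) + gc(p2)) / 2, gc(p3)


def neutrality_fit(points):
    """Fit GC12 = slope * GC3 + intercept; points are (GC12, GC3) tuples."""
    n = len(points)
    mean_x = sum(gc3 for _, gc3 in points) / n
    mean_y = sum(gc12 for gc12, _ in points) / n
    sxx = sum((gc3 - mean_x) ** 2 for _, gc3 in points)
    syy = sum((gc12 - mean_y) ** 2 for gc12, _ in points)
    sxy = sum((gc3 - mean_x) * (gc12 - mean_y) for gc12, gc3 in points)
    if sxx == 0 or syy == 0:
        raise ValueError("GC3 or GC12 is constant; fit is undefined")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)


# Hypothetical (GC12, GC3) values for three species
slope, intercept, r = neutrality_fit([(0.30, 0.10), (0.32, 0.20), (0.34, 0.30)])
print(round(slope, 3), round(intercept, 3), round(r, 3))
```

In this toy case the slope is 0.2, which under the interpretation above would point to a comparatively small contribution of mutation pressure.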

| Correspondence analysis (COA)
Correspondence analysis (COA) is a multivariate statistical method for exploring the relationships between variables. It was used here to identify the main factors affecting the codon usage patterns of the 13 PCGs (Perrière & Thioulouse, 2002; Shields & Sharp, 1987). The RSCU values of the 13 PCGs were analyzed using Past 4.09 software to further investigate the specific causes of CUB. All-zero rows and the stop codons (UAA and UAG) were deleted from the matrix.

| Grand average of hydropathy (GRAVY)
The amino acid composition of the 13 PCGs from the 26 species was analyzed using MEGA X (Kumar et al., 2018). The grand average of hydropathy (GRAVY) value of the 13 PCGs was calculated using Galaxy (https://galaxyproject.org) (Jalili et al., 2020). The GRAVY value is the sum, over all amino acids, of the product of each amino acid's frequency and its hydropathy index; a positive value indicates a hydrophobic protein and a negative value a hydrophilic one (Kyte & Doolittle, 1982).
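A GRAVY value can be reproduced with a few lines of Python using the Kyte and Doolittle (1982) hydropathy scale. This is a minimal sketch and not the Galaxy tool used in this study; the function name `gravy` and the toy peptides are hypothetical, and residues outside the 20 standard amino acids are assumed to be skipped.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Mean hydropathy per residue; positive is hydrophobic, negative hydrophilic."""
    # Assumption: symbols outside the standard 20 (e.g. X or *) are skipped
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


print(gravy("MLLIFF"))  # hypothetical hydrophobic peptide, positive value
print(gravy("KDEKRN"))  # hypothetical hydrophilic peptide, negative value
```

Averaging per residue is equivalent to summing frequency times hydropathy index over the amino acids, which is the definition given above.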

| Phylogenetic analysis
In this study, the mitogenomes of 28 species of Macropsini were selected as the ingroup, including the 26 new mitogenome sequences reported here as well as the mitogenomes of M. notata (GenBank
To assess the presence of phylogenetic information in the sequences, three datasets (AA, PCG12-rRNA, and PCG-rRNA) underwent saturation analysis using DAMBE v7.0.35 (Xia et al., 2003).
PartitionFinder v2.1.1 (Lanfear et al., 2017) was used to determine the best-fitting model for each dataset, so that each gene fragment was assigned its own model. Phylogenetic trees were constructed using Maximum Likelihood (ML) and Bayesian Inference (BI) for each dataset. ML trees were generated with IQ-TREE v1.6.3 using the ultrafast bootstrap approximation (Nguyen et al., 2015) with 10,000 replicates. BI trees were constructed using MrBayes v3.2.6 (Huelsenbeck & Ronquist, 2001). The BI analysis used default settings with four independent runs of one million generations each, sampling every 1000 generations. Once the average standard deviation of split frequencies had fallen below 0.01, the first 25% of samples were discarded as burn-in, and the remaining samples were used to build a consensus tree and calculate the posterior probabilities (PP).

| Nucleotide composition and diversity analysis
The sequence lengths of the mitogenomes from the 26 species ranged from 15,279 bp (Pediopsis sp.) to 16,546 bp (Pedionis sagittata) (Table S1). A comparative analysis revealed that the lengths of

| Analysis of codon usage
The

| Parity Rule2 (PR2) bias plot analysis
The relationship between the third bases of the codons of 13 PCGs

| Neutrality plot analysis
The results of the neutrality plot analysis of the 13 PCGs are shown in Figure 7. The correlation analysis showed a highly significant correlation (p < .01) between GC12 and GC3 for the ATP6, COX1, CYTB, ND2, ND5, and ND6 genes, as well as a significant correlation for ATP8 (p < .05), indicating that the mutation pattern of bases at codon positions one and two was similar to that at position three and that mutational pressure played a significant role in the CUB of these seven genes. Nevertheless, the slopes of the regression lines of the PCGs were generally lower than 0.5, except for the ND2 and ND6 genes, indicating that natural selection was an important factor influencing the CUB of PCGs. Together, these results indicate that the CUB of the ATP6, COX1, CYTB, ND5, and ATP8 genes was influenced by both mutation and natural selection, whereas ND2 and ND6 were mainly influenced by mutational pressure and the remaining genes were mainly influenced by natural selection.

| Correspondence analysis (COA) of codon usage
The results of the correspondence analysis are shown in Figure 8. The percentage of the variation represented by Axis 1 ranged from 13.01% (COX2) to 26.96% (ND1). The percentage of the variation represented by Axis 2 ranged from 11.67% (ATP8) to 16.72% (ND2). For several genes, some codons were discretely dispersed away from the central axis, suggesting that, besides mutation, natural selection is also influencing codon usage.
FIGURE 5 The frequency distribution of start codons and stop codons of 13 PCGs in 26 species of the Macropsini.
FIGURE 6 PR2-plot analysis of the 13 mitochondrial protein-coding genes, where each point represents one of the 26 evaluated species.

| Analysis of amino acid composition and protein properties
The total frequency of utilization of each amino acid of the 13 PCGs in the mitogenome is displayed in Figure 9. It was found that the hydrophobic amino acids Leu, Met, Ile, Phe, and Ser were most frequently used. In addition, Table 2 shows that the 13 PCGs of these species were more inclined to use hydrophobic amino acids.

| Phylogenetic analysis
The substitution saturation test revealed that none of the three candidate datasets (AA, PCG12-rRNA, and PCG-rRNA) were saturated and that the value of the substitution saturation index (Iss) was considerably lower than the critical value (Iss.cSym or Iss.cAsym). The two-tailed test found that Iss significantly differed from Iss.cSym and Iss.cAsym (Table S2). This suggested that the retrieved data were suitable for further phylogenetic analyses. BI and ML analyses were performed on 28 species in the Macropsini using three datasets,

FIGURE 9 The overall frequency of amino acid usage for 13 mitochondrial protein-coding genes in 26 species.

The majority of the 13 PCGs in these 26 species utilize the typical triplet start codon ATN (ATT, ATA, ATC, ATG), whereas TAA is the most frequently used termination codon. This pattern is likely due in large part to selective avoidance of translational readthrough (TR). Previous studies have indicated that each termination codon has a different intrinsic error rate in eukaryotes, in the order TGA > TAG > TAA (Geller & Rich, 1980; Parker, 1989). Therefore, it may be possible to reduce the incidence of TR by selecting for TAA (Ho & Hurst, 2022). Similarly, this study found that codons encoding the same amino acid are not used equally, and codons ending in T or A are significantly more frequent than those ending in C or G, consistent with previous findings (Wei et al., 2014).
Mutation and selection pressure play a significant role in the formation of codon usage patterns (Sharp et al., 2010; Wang, Meng, & Wei, 2018). A combination of PR2-plot, neutrality plot, and COA analyses indicated that the codon usage patterns in the mitogenomes of these 26 species may be shaped by a mixture of mutation and natural selection. PR2-plot analysis revealed unequal frequencies of the four bases at the third codon position, suggesting that both mutation and natural selection may influence codon usage in these genes. Neutrality plot analysis indicated that codon usage of the ATP6, COX1, CYTB, ND5, and ATP8 genes is influenced by both mutation and natural selection. COA indicated that, besides mutation, natural selection is also influencing codon usage. The exact mechanisms behind these phenomena are not fully understood, but a balance appears to exist between natural selection factors (such as gene length, gene function, and translational selection) and mutational pressures (including base content and mutation location) during the evolution of codon usage patterns (Wang et al., 2018). Additionally, we found that the majority of amino acids in the mitochondrial genomes of these 26 species are hydrophobic.
Previous studies have shown that hydrophobic interaction plays a particularly dominant role in the stability of native structures (Kauzmann, 1959).It is well known that the hydrophobicity of amino acid residues plays an important role in protein folding, and it has been reported that local hydrophobicity has a greater impact on the formation of β-sheets than on α-helices (Kanehisa & Tsong, 1980), with codons overrepresented in β-sheets being underrepresented in α-helices (Das et al., 2006).
The appropriate sequencing and characterization of the target set of mitogenomes also increases the resolution of the phylogenetic relationships within Macropsini. The phylogenetic relationships, based on the three datasets in this study, supported the monophyly of both Pedionis and Macropsis. This result not only supported the monophyly of Pedionis but also provisionally resolved the monophyly of Macropsis, which was previously recovered as paraphyletic by Li and Dai (2018). Macropsis, the largest genus within Macropsini, encompasses species with highly similar morphological features, especially genitalia (Li et al., 2012, 2014). These features can be distinguished from those of other genera, further supporting the monophyly of Macropsis. In addition, the phylogenetic relation-
further downgraded it to Macropsini and placed it and Idiocerini in the subfamily Eurymelinae. However, there are still significant uncertainties in the genus-level classification of Macropsis, Pedionis, and Pediopsoides. To address this, various researchers have conducted systematic studies. For instance, Li and Dai (2018) analyzed 22 Macropsini species based on the COX1 gene, supporting the monophyly of Oncopsis and Pedionis but not of Macropsis and Pediopsoides. Subsequently, Xue et al. (2020) conducted a systematic study of Macropsini based on molecular fragments and morphological characteristics, supporting the monophyly of Macropsis, but leaving the taxonomic status of Pediopsoides unclear.

FIGURE 1 Dorsal and lateral views of Macropsis huangbana Li & Tishechkin.
RSCU values of each codon corresponding to the specific amino acids of PCGs in the mitogenomes of 26 species were examined. The findings are presented in Figure 4. It was discovered that 62 codons (aside from the two stop codons) were used unevenly in the process of coding genes, with the third position of the more frequent codons ending mostly with A or U. Thirteen codons, UUA (L), UCA (S), CCA (P), CGA (R), GCA (A), ACA (T), AUA (M), AAA (K), CAA (Q), GAA (E), GUU (V), GGA (G), and AGA (S), were overrepresented (RSCU > 1.6). There were 25 underrepresented codons (RSCU < 0.6), and all of them ended in G or C.
FIGURE 2 The GC skew versus G + C content (%) and AT skew versus A + T content (%) across the mitochondrial genomes of 26 species. Points are grouped in colors according to content, where each point represents a single species.
Except for ND5 and ATP8, which started mostly with TTG, the remaining PCGs utilized the conventional start codon ATN (ATT, ATA, ATC, ATG), with ATG being the most often used initiation codon and ATC being the least frequently used. TAA, TAG, and T- were the termination codons of 13 PCGs, with TAA being the most frequently used and T- being the least frequently used (Figure 5), indicating that mitochondrial genes prefer A/T bases.
was analyzed by Parity Rule2 bias. The results of the PR2-plot analysis are shown in Figure 6. The ND1, ND4, ND4L, and ND5 genes were mostly scattered unevenly in the second quadrant, with few species distributed on the median line. The frequencies of the four bases at the third codon position of these genes were unequal, showing G > C, with mean G3/(G3 + C3) values of 0.68, 0.63, 0.70, and 0.62, respectively, and T > A, with mean A3/(A3 + T3) values of 0.33, 0.30, 0.30, and 0.31, respectively. The rest of the genes were mainly irregularly distributed in the fourth quadrant, where A is used more frequently than T and C more frequently than G. It was thus evident that their codon preferences are affected by factors such as natural selection in addition to base mutations.

FIGURE 3 Sliding window analysis presenting distribution of Pi values across PCGs and rRNA genes, as evaluated in mitochondrial genomes of 26 species.
FIGURE 4 Heat map of RSCU values estimated for each codon across the mitochondrial PCGs of the 26 target species.
yielding six phylogenetic trees (Figure 10; Figure S1). The results showed that the genus-level topology of the ML and BI trees based on the mitogenome was completely consistent and most nodes received high nodal support values. In the present research, the monophyly of Pedionis was well supported (bootstrap value [BS] = 100; Bayesian posterior probability [PP] = 1); the monophyly of Macropsis received relatively high support values in BI analyses (PP > 0.88) and moderate support values in ML analyses (BS = 64-100); Oncopsis and Pediopsoides clustered into a large branch and formed a sister group relationship with Pedionis (BS = 100; PP = 1); Oncopsis was recovered as paraphyletic with respect to Pediopsoides, but with relatively low support in ML analyses; Pediopsis was the earliest-diverging lineage of Macropsini (BS = 100; PP = 1). Unfortunately, Pediopsis has only one representative in the dataset, making it impossible to test its monophyly. Their phylogenetic relationships were as follows: ((((Oncopsis + Pediopsoides) + Pedionis) + Macropsis) + Pediopsis).
4 | DISCUSSION
Comparative analysis indicated that the nucleotide composition of the mitochondrial genes of 26 species of the Macropsini was significantly skewed toward A and T. This finding is consistent with the previous studies of the mitogenomes of several leafhopper species, including Macrosteles quadrimaculatus (Du et al., 2019), Idioscopus
FIGURE 7 Neutrality plot analysis of 13 mitochondrial protein-coding genes showing the correlation between GC values at the first and second versus the third codon positions in the 26 evaluated species.
FIGURE 8 Correspondence analysis presenting the distribution of codons regarding their relative synonymous codon usage values across different protein-coding genes of the 26 evaluated species.
ships in this study indicated that Oncopsis was paraphyletic with respect to Pediopsoides. Xue et al. (2020) conducted a study based on molecular fragments (28S D2, 16S, COX1, H2A, H3) and 86 morphological characters, which also found that Pediopsoides forms a paraphyletic group and that Oncopsis is closely related to Pediopsoides. Moreover, examination of their morphological characteristics showed that some species within Pediopsoides exhibit broader faces and nearly horizontally distributed markings on the pronotum. Therefore, resolving the intricate relationship between Oncopsis and Pediopsoides may require the addition or substitution of molecular markers, or a reconsideration of their taxonomic status. Dietrich et al. (2017) and Dietrich and Thomas (2018), based on morphological characteristics of fossil leafhoppers and molecular systematic studies, placed Macropsinae and Idiocerinae in Eurymelinae as Macropsini and Idiocerini, respectively. However, systematic studies based on mitochondrial genomes have revealed that the phylogenetic relationship between Macropsini and Idiocerini is distant (Li, Wang, et al., 2023; Wang, Wu, Dai, &
FIGURE 10 A phylogenetic tree of Macropsini constructed using MrBayes v3.2.6 based on amino acid sequences (BI: AA). An asterisk marks species whose mitogenomes were previously published.
TABLE 1 Accession numbers in NCBI of 26 species of the Macropsini.