Characterization of two new Pylorgus mitogenomes (Hemiptera, Lygaeidae, Ischnorhynchinae) and a mitochondrial phylogeny of Lygaeoidea

Abstract
Lygaeidae is a large family of Hemiptera (Heteroptera) currently separated into three subfamilies: Ischnorhynchinae, Lygaeinae, and Orsillinae. In this study, the complete mitogenomes of the ischnorhynchines Pylorgus porrectus Zheng, Zou & Hsiao, 1979 and Pylorgus sordidus Zheng, Zou & Hsiao, 1979 were sequenced, and the phylogeny of Pylorgus and the other Lygaeidae with known complete mitogenomes was examined. The mitogenomes are 15,174 bp and 15,399 bp in size, respectively, and comprise 13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNAs), two ribosomal RNA genes (rRNAs), and a control region (D-loop). Nucleotide composition is biased toward A and T, and the gene order is identical to the putative ancestral arrangement of insects. Eleven PCGs begin with a typical ATN codon, and the remaining two (cox1 and nad4l) begin with TTG. All tRNAs have a typical cloverleaf secondary structure, although some have individual base mismatches. The phylogenetic analyses based on the concatenated nucleotide sequences of the 13 PCGs, using Bayesian inference and maximum likelihood, support the monophyly of Lygaeidae. The results show that P. porrectus and P. sordidus cluster with nine other Lygaeidae. This study provides the first complete mitochondrial genomes of two Pylorgus species, which will provide important data for studying the phylogenetic position of Lygaeidae in Lygaeoidea and reconstructing the phylogenetic relationships within Pentatomomorpha.


Introduction
The Lygaeoidea is the second largest superfamily within the infraorder Pentatomomorpha and includes over 4660 described species in 16 families (Henry 2017; Dellapé and Henry 2020). Most Lygaeoidea feed mainly on mature seeds (Schuh and Slater 1995), but there are exceptions. Blissidae, Colobathristidae, Malcidae, and Piesmatidae predominantly feed on plant sap (Sweet 2000; Henry et al. 2015). Berytidae are mostly phytophagous, with a few becoming pests, although some have been shown to be predatory (Henry 2000). Geocoridae are primarily predators but sometimes also feed on seeds and leaves of plants (Sweet 2000).
Currently, three subfamilies of Lygaeidae (sensu stricto) are recognized: Ischnorhynchinae, Lygaeinae, and Orsillinae (Dellapé and Henry 2020). The main diagnostic characters of Lygaeidae are as follows: bucculae well developed, pronotal calli with an impressed transverse groove, scutellum usually with a raised cross-shaped carina, hamus present on wings, and abdominal spiracles on segments II to VII dorsal (Malipatil et al. 2020).
To date, the phylogeny of Lygaeidae remains unresolved (Yao et al. 2012; Zhang et al. 2019), and the status of Orsillinae and Ischnorhynchinae in relation to Lygaeidae (sensu stricto) continues to be discussed. Henry (1997) proposed that Orsillinae and Ischnorhynchinae be classified as subfamilies of Lygaeidae. However, Sweet (2000) recognized them as families separate from the Lygaeidae (Orsillidae and Ischnorhynchidae). A few workers have followed Sweet in adopting the family Orsillidae (Eyles and Malipatil 2010; Malipatil 2010; Ge and Li 2019), whereas Henry et al. (2015), supported by Schuh and Weirauch (2020), disagreed with Sweet, who provided no evidence to support his hypothesis.
Complete mitochondrial genome data for nine species of Lygaeidae are available on NCBI, of which only two belong to Ischnorhynchinae. However, for Pylorgus, the largest genus in this subfamily, no mitochondrial genome data are available. Therefore, in the present study, we obtained the complete mitochondrial genomes of two Pylorgus species, Pylorgus porrectus Zheng, Zou & Hsiao, 1979 and Pylorgus sordidus Zheng, Zou & Hsiao, 1979, using next-generation sequencing. Furthermore, we constructed phylogenetic trees based on the mitogenomes of 21 species of the superfamily Lygaeoidea and four outgroup species, which will provide important data for further studies on the phylogenetic position of Lygaeidae in Lygaeoidea and will also be useful for reconstructing the phylogenetic relationships within Pentatomomorpha.

Sample collection, DNA extraction, and mitogenome sequencing
Adults of Pylorgus porrectus (Fig. 1a Total genomic DNA was extracted from an adult specimen using a Rapid Animal Genomic DNA Isolation Kit (Sangon Biotech, Shanghai, China). Libraries were prepared on an Illumina MiSeq PE300 platform (Sangon Biotech, Shanghai, China). Low-quality and short reads were removed using Fastp v. 0.36 (Chen et al. 2018) to obtain clean reads and ensure the quality of the downstream analyses.

Mitogenome assembly, annotation, and analyses
SPAdes v. 3.15 (Bankevich et al. 2012) was used to assemble the high-quality next-generation sequencing data de novo into contigs and scaffolds. After assembly, we evaluated and quality-controlled the results, excluded any contamination that might originate from the host genome from subsequent analysis, and retained only the scaffolds derived from the organelle genome. We used BLASTn to compare the scaffolds with the NCBI library to obtain sequence similarity information, extracted the sequencing depth and coverage of each scaffold, and manually selected candidate target scaffolds after weighing this information together. GapFiller v. 1.11 (Boetzer and Pirovano 2012) was then used to fill gaps in the assembled contigs, and PrInSeS-G was used to correct base errors and small insertions and deletions introduced during assembly, yielding the complete mitochondrial genome.
For mitochondrial gene annotation, we used tBLASTn and GeneWise to align against databases of closely related reference sequences to obtain the coding sequence (CDS) gene boundaries, and MiTFi to annotate the transfer RNA genes (tRNAs). The non-coding ribosomal RNA genes (rRNAs) were identified by cmsearch alignment against Rfam, and all results were finally combined into a complete annotation.
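The depth-based part of the scaffold screening described above can be illustrated with a minimal Python sketch. It relies on the fact that SPAdes records length and k-mer coverage in each scaffold name (NODE_<id>_length_<n>_cov_<x>). The function name, the size window, the coverage cutoff, and the example headers are all assumptions made for illustration, not values used in this study, and the BLASTn comparison and manual inspection are not reproduced here.

```python
import re

# SPAdes names scaffolds like "NODE_7_length_15399_cov_812.4".
NODE_RE = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)")

def candidate_scaffolds(names, min_len=14000, max_len=17000, min_cov=100.0):
    """Return (name, length, coverage) for scaffolds of mitogenome-like size.

    The thresholds are hypothetical defaults, not values from the paper.
    """
    hits = []
    for name in names:
        match = NODE_RE.search(name)
        if match is None:
            continue  # header does not follow the SPAdes naming pattern
        length, cov = int(match.group(2)), float(match.group(3))
        if min_len <= length <= max_len and cov >= min_cov:
            hits.append((name, length, cov))
    # Sort by coverage, highest first, so the deepest scaffolds are inspected first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)

# Made-up headers for illustration only.
headers = [
    "NODE_1_length_48210_cov_6.1",
    "NODE_7_length_15399_cov_812.4",
    "NODE_9_length_15174_cov_40.0",
]
print(candidate_scaffolds(headers))
```

In practice such a filter would only shortlist scaffolds; the similarity search and manual checks described in the text remain necessary.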

Phylogenetic analyses methods
The mitochondrial genome data of 25 species in Pentatomomorpha were used to reconstruct the phylogenetic relationships of Lygaeoidea, with 21 species of Lygaeoidea as the ingroup and four species as the outgroup (Table 1). All sequences were standardized and the 13 protein-coding genes (PCGs) were extracted using PhyloSuite v. 1.2.2 (Zhang et al. 2020). The 13 PCGs of the 25 species were aligned individually using codon-based multiple alignment in MAFFT v. 7.313 with default settings (Katoh and Standley 2013). Gblocks v. 0.91b was used to remove the intergenic gaps and ambiguous sites (Talavera and Castresana 2007), and all PCG sequences were concatenated in PhyloSuite v. 1.2.2. The best partitioning scheme and evolutionary models for constructing Bayesian inference (BI) and maximum-likelihood (ML) trees were selected by PartitionFinder2 (Lanfear et al. 2016), with a greedy algorithm, the BIC criterion, and the gene and codon model. BI phylogenies were inferred using MrBayes v. 3.2.6 (Ronquist et al. 2012) under the partition model (2,000,000 generations), with the initial 25% of sampled data discarded as burn-in. ML phylogenies were inferred using IQ-TREE (Nguyen et al. 2015) under the edge-linked partition model for 5000 standard bootstraps with 1000 replicates.
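The concatenation step can be sketched as follows. This is a minimal illustration, not the PhyloSuite implementation: the function name, the data layout (one dictionary of aligned sequences per gene), and the toy sequences are assumptions. The sketch joins the per-gene alignments into one supermatrix and records the 1-based column range of each gene, which is the information a partitioned analysis needs.

```python
def concatenate(alignments, gene_order):
    """Join per-gene alignments into a supermatrix.

    alignments: {gene: {taxon: aligned_sequence}} (hypothetical layout)
    Returns (supermatrix, partitions), where partitions is a list of
    (gene, first_column, last_column) with 1-based inclusive columns.
    """
    taxa = sorted({taxon for gene in gene_order for taxon in alignments[gene]})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene in gene_order:
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        n = lengths.pop()
        for taxon in taxa:
            # A taxon lacking this gene is padded with gaps.
            pieces[taxon].append(aln.get(taxon, "-" * n))
        partitions.append((gene, start, start + n - 1))
        start += n
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions

# Toy alignments with invented sequences.
toy = {
    "cox1": {"sp1": "ATGAAA", "sp2": "ATGAAG"},
    "nad3": {"sp1": "ATTTTT", "sp2": "ATT---"},
}
matrix, partitions = concatenate(toy, ["cox1", "nad3"])
print(matrix)
print(partitions)
```

Each gene range could be split further by codon position (every third column) to mirror the gene and codon scheme passed to PartitionFinder2.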

Genome structure and composition
The assembled complete mitogenomes of Pylorgus porrectus and P. sordidus are circular DNA molecules of 15,174 bp and 15,399 bp in length, respectively, which is within the range of the sequenced mitogenomes of Lygaeidae in GenBank (Table 1). Both mitogenomes have the typical insect mitogenome structure, a closed-circular, double-stranded DNA molecule containing 13 PCGs, 22 tRNAs, two rRNAs, and a control region (D-loop) (Fig. 2). The order of the mitochondrial protein-coding genes is the same as that in other Lygaeoidea (Cao et al. 2020). Among the 37 genes, 23 (nine PCGs and 14 tRNAs) are on the majority strand (J-strand), while the remaining four PCGs, eight tRNAs, and two rRNA genes are on the minority strand (N-strand).
The base composition of P. porrectus is A = 42.7%, T = 31.8%, G = 9.6%, and C = 15.8%, and that of P. sordidus is A = 42.8%, T = 33.1%, G = 9.6%, and C = 14.5%. Both mitochondrial genome sequences are biased toward A and T. The AT content of P. porrectus was 63.74% and that of P. sordidus was 64.12%. The AT-skew value is greater than 0, whereas the GC-skew value is less than 0, indicating that the base composition of P. porrectus and P. sordidus shows a strong A-bias and C-bias (Table 2).
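The sign of the two skew values can be checked directly from the base percentages given above. The sketch below assumes the usual definitions, AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C), which are not spelled out in the text; the function and variable names are illustrative, and values computed from rounded percentages may differ slightly from those in Table 2.

```python
def skew(x, y):
    """Strand-asymmetry statistic (x - y) / (x + y)."""
    return (x - y) / (x + y)

# Base percentages as reported in the text.
composition = {
    "P. porrectus": {"A": 42.7, "T": 31.8, "G": 9.6, "C": 15.8},
    "P. sordidus": {"A": 42.8, "T": 33.1, "G": 9.6, "C": 14.5},
}

skews = {
    species: (skew(b["A"], b["T"]), skew(b["G"], b["C"]))
    for species, b in composition.items()
}

for species, (at_skew, gc_skew) in skews.items():
    print(f"{species}: AT-skew = {at_skew:+.3f}, GC-skew = {gc_skew:+.3f}")
```

A positive AT-skew means A is more frequent than T, and a negative GC-skew means C is more frequent than G.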

Protein-coding genes
The total length of the 13 PCGs of P. porrectus and P. sordidus was 10,991 bp and 10,993 bp, respectively. Of these, nine PCGs are located on the J-strand, and the other four are encoded on the N-strand (Fig. 2). Most PCGs start with ATN, except for cox1 and nad4l, which begin with TTG. Ten PCGs terminate with TAA/TAG, and the remaining three PCGs (cox1, cox2, and cox3) terminate with incomplete T residues (Tables 3, 4). It has been speculated that these incomplete termination codons are completed by the addition of A residues to the transcript (Ojala et al. 1981; Lavrov et al. 2002) and do not affect translation.
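The start and stop codon categories used above can be assigned mechanically from a coding sequence. The following sketch is illustrative only: the function names and the toy sequences are invented, and it assumes that each sequence is given 5' to 3' on its coding strand. A sequence whose length is not a multiple of three is reported as ending in an incomplete stop (for example "T-").

```python
STOP_CODONS = ("TAA", "TAG")  # complete stop codons reported in the text

def start_class(cds):
    """Return 'ATN' for ATA/ATC/ATG/ATT starts, otherwise the codon itself."""
    codon = cds[:3].upper()
    return "ATN" if codon.startswith("AT") else codon

def stop_class(cds):
    """Return the stop codon, or the leftover base(s) plus '-' if incomplete."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder:
        # One or two trailing bases (T or TA) form an incomplete stop codon.
        return cds[-remainder:] + "-"
    last = cds[-3:]
    return last if last in STOP_CODONS else "no stop"

# Invented toy sequences, not real gene sequences.
toy = {
    "geneX": "ATGAAATTTTAA",
    "geneY": "TTGAAATTTT",
    "geneZ": "ATAAAATTTTAG",
}
summary = {name: (start_class(seq), stop_class(seq)) for name, seq in toy.items()}
print(summary)
```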
The relative synonymous codon usage (RSCU) of the two species was calculated (Fig. 3). The most frequently used codons were TTA-Leu and AGA-Arg. Most of the frequently used codons are composed of A and T, which may be related to the fact that the A-T skewness is higher than the G-C skewness in the PCGs of the two species. The nucleotide diversity (Pi) of the two species based on the 13 PCGs was computed (Fig. 4) and ranged from 0.05 to 0.11. Among the PCGs, nad3 (0.11) had the highest Pi value and nad4l (0.05) the lowest, which implies that nad4l is the most conserved gene in Pylorgus.
The Ka/Ks ratio of each of the 13 PCGs was also computed (Fig. 5). All Ka/Ks values were less than 1 and ranged from 0.01 to 0.13, indicating that the genes have been subjected to purifying selection. In particular, the Ka/Ks values were highest for nad4 and nad5, suggesting that they have evolved fastest, and lowest for cox1, indicating the slowest evolution.
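As a reminder of what the RSCU values express, the sketch below computes them for a single synonymous family. It assumes the standard definition (the observed count of a codon divided by the mean count of all codons in its synonymous family), so that a value above 1 indicates a codon used more often than expected under equal usage. The function names, the one-family codon table, and the toy sequence are illustrative assumptions; a real analysis would use the full invertebrate mitochondrial genetic code.

```python
from collections import Counter

def count_codons(cds):
    """Count complete in-frame codons of a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def rscu(codon_counts, families):
    """RSCU = observed count / (family total / number of synonymous codons).

    families: {amino_acid: [synonymous codons]} (supplied by the caller)
    """
    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the data
        expected = total / len(codons)
        for codon in codons:
            values[codon] = codon_counts.get(codon, 0) / expected
    return values

# Toy example with one two-codon family (TTA/TTG) and an invented sequence.
families = {"Leu (TTR)": ["TTA", "TTG"]}
counts = count_codons("TTATTATTATTG")
print(rscu(counts, families))
```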

Gene overlaps and intergenic spacers
Eleven gene overlaps were observed in the two mitogenomes, ranging from 1 bp to 25 bp (Tables 3, 4), with the longest overlap between nad4l and trnT. Intergenic spacers were also identified in the two mitogenomes, with lengths ranging from 1 bp to 38 bp (Tables 3, 4). The longest intergenic spacer, 38 bp in P. sordidus, is located between trnH and nad4.
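Overlaps and spacers of this kind follow directly from the annotated gene coordinates. The sketch below is a minimal illustration under stated assumptions: genes are given as (name, start, end) with 1-based inclusive coordinates and sorted by start, the gene names and coordinates are invented, and the junction that wraps around the circular molecule is ignored. A positive value is an intergenic spacer, a negative value an overlap, and zero means the genes are directly adjacent.

```python
def junctions(genes):
    """Return (upstream, downstream, gap) for each pair of neighbouring genes.

    gap > 0: intergenic spacer of that many bp
    gap < 0: overlap of abs(gap) bp
    gap == 0: genes directly adjacent
    """
    result = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(genes, genes[1:]):
        result.append((name_a, name_b, start_b - end_a - 1))
    return result

# Invented coordinates for three hypothetical genes.
toy_genes = [
    ("geneA", 100, 165),
    ("geneB", 204, 1500),
    ("geneC", 1494, 1780),
]
for upstream, downstream, gap in junctions(toy_genes):
    kind = "spacer" if gap > 0 else "overlap" if gap < 0 else "adjacent"
    print(f"{upstream}/{downstream}: {kind} {abs(gap)} bp")
```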

Transfer RNA and ribosomal RNA genes
Both mitogenomes contain the complete set of 22 tRNA genes typical of Lygaeidae mitogenomes, ranging from 60 to 71 bp in length, which is consistent with previously sequenced mitogenomes of Lygaeidae (Cao et al. 2020; Huang et al. 2021). Fourteen of the 22 tRNAs are on the J-strand, and eight are on the N-strand (Fig. 2).
All tRNAs have the typical cloverleaf secondary structure, including the TΨC arm, the amino acid acceptor arm, the anticodon arm, and the dihydrouridine arm. Some of the tRNA genes (trnY, trnA, trnS1, trnF, trnH, trnP, and trnV) show individual base mismatches, which is a common phenomenon in insect mitogenomes (Zhang et al. 2019).
The rrnL genes of the two mitogenomes are located between trnL and trnV, with lengths ranging from 1217 bp to 1221 bp. The rrnS genes are located between trnV and the D-loop and are both 590 bp in length. Both rRNAs are located on the N-strand.

Phylogenetic analysis
Phylogenetic relationships within Lygaeoidea were reconstructed based on the 13 mitochondrial PCGs using BI and ML methods (Figs 6, 7). A total of 21 Lygaeoidea species were selected as the ingroup, and an additional four species from Pyrrhocoroidea, Coreoidea, Rhopalidae, and Alydidae were used as the outgroup. Compared to the ML tree, the BI tree had higher support values, and the monophyly of all the studied families was supported except for Rhyparochromidae.
The clades making up the Lygaeidae had high support values in the BI results, confirming the monophyly of Lygaeidae (Figs 6, 7). The monophyly of Lygaeidae was also supported in the ML results, but the nodal support is not so high. However, the Lygaeidae clusters as sister to Malcidae in the ML tree but as sister to Geocoridae in the BI tree, implying that the positions of Geocoridae and Malcidae are unstable. The two species of Rhyparochromidae are not clustered together. Neolethaeus assamensis clusters as sister to the Pyrrhocoroidea species, and together they are sister to the remaining ingroups.

Discussion
In this study, we sequenced and analyzed the mitogenomes of Pylorgus porrectus and P. sordidus, which share a similar structure. The mitochondrial genome of each of the two Pylorgus species is a double-stranded closed loop containing a non-coding control region and encoding 37 genes. The two species show a substantial nucleotide bias toward A and T, as do other Pentatomomorpha (Zhang et al. 2019; Cao et al. 2020; Huang et al. 2021; Carapelli et al. 2021; Xu et al. 2021; Zhu et al. 2023). All PCGs begin with ATN except for cox1 and nad4l, which start with TTG. In total, 10 PCGs terminate with TAA/TAG, and the remaining three PCGs (cox1, cox2, and cox3) terminate with incomplete T residues. The Ka/Ks values revealed that nad4 and nad5 have relatively higher evolutionary rates, and cox1 was found to be the most conserved gene. Eleven gene overlaps were observed in the two sequenced mitogenomes, and gene overlaps have also been found in other known Lygaeidae mitogenomes (Cao et al. 2020). All tRNA molecules have a typical cloverleaf structure (Li et al. 2017).
The phylogenetic results using the 13 PCGs confirm the monophyly of Lygaeidae, which supports the opinions of Henry (1997), Henry et al. (2015), and Schuh and Weirauch (2020). The ML tree shows that the topology within Lygaeidae is Ischnorhynchinae + (Lygaeinae + Orsillinae) (Fig. 6; Table 1). This result is in agreement with Cao et al. (2020) and Carapelli et al. (2021) but differs slightly from Henry's (1997) morphological hypothesis of Lygaeinae + (Ischnorhynchinae + Orsillinae). However, in the ML tree, P. porrectus and P. sordidus cluster with Kleidocerys resedae and then with Crompus oculatus of Ischnorhynchinae (Fig. 6; Table 1), whereas in the BI tree, P. porrectus and P. sordidus cluster only with K. resedae, and C. oculatus clusters with the other species of Lygaeinae and Orsillinae (Fig. 7; Table 1). We think this is mainly because of the limited number of published mitogenomes within the Lygaeidae, a problem that could be solved by sequencing additional mitogenomes of lygaeid species. The two selected species of Rhyparochromidae are not clustered together, which is similar to the results of Cao et al. (2020), Carapelli et al. (2021), and Huang et al. (2021). Neolethaeus assamensis clusters as sister to the Pyrrhocoroidea species, and together they are sister to the remaining ingroups in our result. More mitochondrial genomes need to be determined to better understand the monophyly of Rhyparochromidae. Overall, our results enrich the understanding of mitochondrial genome structure in the Lygaeidae and further support the monophyly of the family containing the three subfamilies Ischnorhynchinae, Lygaeinae, and Orsillinae.

Acknowledgements
collecting specimens. We thank Thomas J. Henry (National Museum of Natural History, Washington, DC), the two anonymous reviewers, and the subject editor, Jader Oliveira, for their helpful and constructive comments. We also thank Guangyu Yu (Jiangxi Agricultural University) for revising the manuscript.