Phylogenomics of the family Lachesillidae (Insecta: Psocodea: Psocomorpha)

Lachesillidae is one of the largest families of bark lice and includes more than 420 described species, in 26 genera and three subfamilies. This family belongs in the suborder Psocomorpha, infraorder Homilopsocidea. The classification of Lachesillidae is based on male and female genital morphologies, but questions remain regarding the monophyly of the family and some of its genera. Here, we used whole genome and transcriptome data to generate a 2060 orthologous gene data matrix of 2,438,763 aligned bp and used these data to reconstruct the phylogenetic relationships of species of Lachesillidae and relatives. Taxon sampling included 24 species from Lachesillidae and 23 additional species belonging to related families from the infraorders Homilopsocidea and Caeciliusetae. Phylogenetic relationships reconstructed with maximum likelihood and coalescent‐based analyses indicated paraphyly of Lachesillidae, and monophyly of the tribe Graphocaeciliini and the genus Lachesilla were also never recovered. Instability was observed in the position of Eolachesilla chilensis, which was recovered either as sister to Elipsocidae or to Mesopsocidae species, so we cannot conclusively determine the position of this genus within the Homilopsocidea. Given our results, a reclassification is necessary, but more taxon sampling of other species in Mesopsocidae and Peripsocidae would be useful to add to a tree in future before proposing a new classification.

Morphology has played an important role in the systematics of Lachesillidae, and the monophyly of the family is in part supported by morphological data. Yoshizawa (2002) established the monophyly of the family based on morphological systematics, but once the genus Eolachesilla Badonnel (Eolachesillinae) was included, the monophyly of the family was called into question. In a similar way, (Schmidt & New, 2004) established the phylogenetic relationships of the family Elipsocidae, with Eolachesilla chilensis New & Thornton being the sister group of the elipsocids. This analysis gave support to the previous hypothesis, based on the male genital characters, that this genus is more closely related to Elipsocidae than to Lachesillidae (New & Thornton, 1981).
Within Lachesillinae, the highly diverse genus Lachesilla has been divided into 20 species groups based on morphological similarity (Garcia Aldrete, 1974;García Aldrete, 2014;García Aldrete & Mockford, 2011). However, a morphological phylogenetic analysis revealed paraphyly of the genus Lachesilla, with the pedicularia species group + genus Nadleria Badonnel standing apart from the remaining Lachesilla species, which are closely related to the genus Hemicaecilius Enderlein (Saenz Manchola et al., 2019). Similarly, several Asian genera of the subfamily Lachesillinae have been discussed, and their diagnosis may need a major taxonomic revision. For example, the genus Dicrolachesillus was placed in synonymy with Lachesilla by Lienhard (2003), whereas the monotypic genus Cyclolachesillus (Cyclolachesillinae) could possibly be an elipsocid based on the illustrations of C. ningxiaensis Li (García Aldrete, 2006).
As with morphological data, prior molecular phylogenetic analyses based on Sanger sequencing have not recovered the monophyly of Lachesillidae, with the genus Lachesilla being the main source of instability within the infraorder Homilopsocidea (Yoshizawa & Johnson, 2014). This Sanger sequence-based phylogeny recovered Lachesilla as sister to Peripsocidae, whereas Eolachesilla + Anomopsocus were recovered as sister to Elipsocidae. Recently, a phylogenomic study of higher level relationships within Psocodea revealed generally stable relationships within Psocomorpha, although the monophyly of the infraorder Homilopsocidea was not supported nor was the monophyly of Lachesillidae and Elipsocidae (de Moya et al., 2021). It should be noted that these Sanger and phylogenomic analyses were not focused on resolving relationships within Lachesillidae. However, recent mitochondrial genomics (Saenz Manchola et al., 2021) and UCE (Saenz Manchola et al., 2022) data sets with extensive taxon sampling of Lachesillidae also did not recover monophyly of the family. These data sets also had aspects of instability for some higher level relationships within Psocomorpha, including the relationships among the major clades of Lachesillidae, Elipsocidae and Mesopsocidae.
Here, we used whole genome and transcriptome data to generate a 2060 orthologous gene data matrix to reconstruct the phylogenetic relationships of bark louse species of Lachesillidae. Taxon sampling included 24 species from the family Lachesillidae, plus 23 additional species belonging to related families in the infraorders Homilopsocidea and Caeciliusetae. We performed concatenated (including examination of different codon positions, Binary RY based coding and reduced gene data matrix) and coalescent base methods to explore phylogenetic relationships of this group.

Genomic sequencing
For 35 species, whole genome sequence (WGS) data were generated by extracting total genomic DNA using a Qiagen DNeasy Microkit.
Library preparation and Illumina sequencing were conducted at the Roy J. Carver Biotechnology Centre at the University of Illinois. A Covaris M220 machine was used to sonicate DNA fragments to approximately 300-500 bp. Libraries were prepared using a Hyper Library construction kit from Kapa Biosystems. Libraries were quantified by qPCR and pooled for sequencing using Illumina HiSeq2500 or NovaSeq6000 S4 lanes for 151 cycles. Pooling was done to achieve between around 30-60X coverage based on an estimated (but unknown) genome sizes of 200-400Mbp, using genome sizes of other members of Psocodea for which genome size is known. The bcl2fastq v2.20 Conversion Software was used to demultiplex and generate FASTQ files. Raw reads were deposited in the NCBI Sequence Read Archive (Table 1).

Gene assembly
Transcriptome data used in this study were previously published by Johnson et al. (2018). Gene assembly was performed with aTRAM2 v2.2.0 , using a gene set of 2395 protein-coding orthologs as reference. This gene set was identified in the annotated genome of the human body louse, Pediculus humanus Linnaeus and previously used for phylogenomic at deep levels for the hemipteroid T A B L E 1 Taxa included in this study with their number of genes assembled and GenBank accession numbers insects ; Hemiptera: Auchenorrhyncha, (Skinner et al., 2020); bark lice and parasitic lice order Psocodea (de Moya et al., 2021), and shallow levels for the bird louse, genus Penenirmus . aTRAM2 was set to 1 iteration, using the amino acid sequences of the reference genes to assembly target genes with ABySS assembler (Simpson et al., 2009). The resulting exon sequences for each gene were stitched together with Exonerate (Slater & Birney, 2005) implemented in aTRAM2.
Additionally, to explore possible impact of molecular biases (GC bias), LBA and/or Incomplete Lineage Sorting (ILS) on phylogeny, several methods were implemented. We generated a binary RY coding-based data set from the full concatenated matrix with a customized python script (Braun & Kimball, 2021), which was analysed under an ML approach, using the same parameters as those for the full concatenated data matrix on IQTREE2. Similarly, two additional matrices based on the codon positions of the full concatenated matrix were generated with PAUP v4.0a (Swofford, 2003); first and second codon positions combined (Supplementary data 2) and second codon positions only (Supplementary data 3). We performed a coalescent species tree analysis with ASTRAL-III v5.7.7 (Zhang et al., 2018) using as input the individual gene trees generated with IQTREE2 (Àm MFP) and computing local posterior probabilities (LPP) for branch support (Sayyari & Mirarab, 2016). Finally, TreeShrink v1.3.9 (Mai & Mirarab, 2018) with default parameters was used to prune potential outlier species with abnormally long branches. Based on the TreeShrink analysis, we generated a reduced data set which was analysed under ML with IQTREE2, whereas individual gene trees pruned were used as input for an additional coalescent species tree analysis with ASTRAL-III.

RESULTS
From the 2395 single copy ortholog genes used as reference, aTRAM 2 assembled, on average, 2199 genes for the species generated in this study (Table 1). The final concatenated data matrix included 2060 single copy ortholog genes and 2,438,763 aligned bp (general statistics per gene associated to each data matrix can be found in Table S1).
Similarly, derived from the TreeShrink prune analysis, a reduced concatenated data matrix with 690 outlier-free genes containing 858,528 aligned bp was generated. Based on these concatenated data matrices, with the exception of the AA and the second codon only data sets, the majority of the ML analyses did not recover Eolachesilla chilensis Badonnel in a monophyletic clade with the remaining species of the family Lachesillidae. Also, monophyly of the infraorder Homilopsocidea was generally unsupported and relationships were generally unstable, especially at deep phylogenetic levels. In contrast, the infraorder Caeciliusetae always was recovered as monophyletic, regardless the data set and analysis used.
Excluding E. chilensis, the remaining species of Lachesillidae were grouped as a monophyletic clade with both concatenated and the majority of the ML analyses (Figures 1a,b, S1 and S2). The second codon position ( Figure S3) and AA ( Figure S4) analyses, clustered E. chilensis in a monophyletic subfamily Eolachesillinae, sister to the Elipsocidae + Mesopsocidae clade, rather than the subfamily Lachesillinae, whereas RY coding data set ( Figure S5) The subfamily Lachesillinae was always recovered as monophyletic (Figures 1a,b, S1-S5). For Lachesillinae, this study included two species of the genus Hemicaecilius Enderlein and ten species of the highly diverse genus Lachesilla Westwood, representing seven species groups, but monophyly of Lachesilla never was recovered. All data sets and analysis recovered the species group forcepeta (two species included here) sister to Hemicaecilius, whereas the species group pedicularia (which include the family, genus and group type species With the AA data set, deep relationships received high UFB (100-97%). Here, Ectopsocidae species were recovered sister to the remaining Homilopsocidea + Caeciliusetae, with Peripsocidae begin sister to the later infraorder ( Figure S4). Some differences were observed between full 2060 ASTRAL tree and the reduced 690 ASTRAL tree regarding the position of Ectopsocidae and Peripsocidae; the former data set recovered Peripsocidae as sister to Caeciliusetae + remaining species of Homilopsocidae (LPP = 0.45, Figure S1, node 1), whereas Ectopsocidae was recovered as sister to a clade which includes Caeciliusetae plus Elipsocidae + Mesopsocidae + E. chilensis, but with low branch support (LPP = 0.43, Figure S1, node 2-3). The latter data set recovered Peripsocidae sister to Caeciliusetae (LPP = 0.81, Figure S1, node 1) and Ectopsocidae sister to the remaining Homilopsocidea species (LPP = 0.58, Figure S1, node 2-3).

DISCUSSION
Relationships within family Lachesillidae have been discussed in the past based on morphology, but a phylogenetic framework has not been proposed until recently. Mockford and Sullivan (1986)  Lachesilla (Table 2) is the most specious genus, with more than 340 described and at least 100 undescribed species (García Aldrete & da Silva-Neto, 2020) and a nearly cosmopolitan distribution.
The phylogenomic tree of the family Lachesillidae presented here provides a new framework to better understand the phylogenetic relationships of some genera of the family. For example, the genus Eolachesilla has been an issue since Badonnel (1951) included this genus within Lachesillidae. This genus was transferred by New and Thornton (1981) to Elipsocidae and subsequently placed back into Lachesillidae by Mockford and Sullivan (1986) based on morphological characters. It was not until Yoshizawa (2002) that a morphological systematic approach explored the phylogenetic relationships of suborder Psocomorpha and found that the monophyly of the family

Lachesillidae including Eolachesilla is uncertain and noticed that
Eolachesilla may represent its own family, close to Lachesillidae.
Our results support the hypothesis that Eolachesilla is closely related to the Elipsocidae + Mesopsocidae rather than to Lachesillidae, as was suggested by Schmidt and New (2004). However, the sys- Such discordance between topologies (considering data sets and gene tree-species trees) may be the result of several issues that can affect phylogenetic estimation. These include faster substitution rate in nonadjacent phylogenetic lineages, poor taxon sampling due to extinction or unavailability of some taxa, and unsuitable models of sequence evolution that do not account for base compositional heterogeneity can be associated with long-branch attraction biases (LBA, Lartillot et al., 2007;Qu et al., 2017). Here, the topological conflict observed between nucleotide vs amino acid analysis could be caused by a model misspecification, which has been proposed as source of inaccurate phylogeny estimation and potentially resulting in conflicting topologies hypothesis under nucleotide vs amino acid (Gillung et al., 2018).
However, conflict among nucleotides or amino acids topologies is relatively common in phylogenomics, affecting Lepidoptera, Hymenoptera, Coleoptera, Hemiptera, amount others groups (see Rota et al., 2022, for a summary of insect phylogenomics with topologies conflict).
One of the proposed strategies to reduce the effects of LBA is to exclude long-branch taxa and/or fast-evolving genes from the analysis (Lartillot et al., 2007;T. Li et al., 2014) which was implemented here with the TreeShrink analysis. These resulted in both ML and coalescent analyses highly congruent, especially at shallow levels, whereas ASTRAL 2060 vs 690 data sets showed some incongruence, especially for Ectopsocidae + Peripocidae species and their position at deep levels ( Figure S1, nodes 1-2). Unlike for the 2060 vs 690 concatenated data sets, relationships between E. chilensis and Mesopsocidae + Elipsocidae differs (Figure 1a,b, nodes 3-4), which may be associated with the topology impact of the outlier taxa and genes in the phylogeny.
On the other hand, base composition bias (GC bias) may cause erroneous phylogenetic estimation, which has been identified in the past as a source of conflict in phylogenetic studies of psocids (de Moya et al., 2021;Saenz Manchola et al., 2022;Yoshizawa & Johnson, 2014).
In our current data set, E. chilensis and M. unipunctatus contain the largest average and codon-specific GC content among all the species analysed (Table S2). Both species were recovered as sister taxa in the full data set topology (which contains 54.44 and 50.82 GC %, respectively), whereas they were recovered as more distantly when the third codon was excluded and with the second codon position only data sets (containing 39.27 and 38.52 GC %, respectively, Table S2). An additional F I G U R E 2 Tree topologies for Lachesillidae species obtained with ML analysis for: (a) 2060 nuclear ortholog gene data set, (b) 2081 UCE loci data set and (c) 37 mitochondrial genes data set. Grey squares indicate Lachesillidae species. Black circles indicate monophyly not supported for Lachesillidae groups. Species pictures from left to right, tribe Graphocaeciliini: Graphocaecilius and Dagualachesilla. Subfamily Lachesillinae: Lachesilla (pedicularia species group) and Hemicaecilius. Credits to Dr. Ranulfo González strategy we implemented to avoid GC bias was the RY coding-based analysis ( Figure S5), and this result was similar to that obtained with the second codon position only and the amino acid data sets (i.e. subfamilies Eolachesillinae and Lachesillinae not clustered together in a monophyletic clade). However, the RY data set resulted in E. chilensis sister to the Mesopsocidae + Elipsocidae species rather that Eolachesillinae, which could be an indicator of the impact of GC bias has on the conflicting positions of this species in the phylogeny.
Given this instability, it is difficult to clearly establish a systematic position for Eolachesilla within Homilopsocidea, but we consider that a reclassification is likely necessary, thus excluding Eolachesilla chilensis from Lachesillidae and declaring this genus incertae sedis within Elipsocidae + Mesopsocidae. Additionally, we strongly suggest that more taxon sampling of other species in Elipsocidae and Mesopsocidae would be useful to add to a tree in future before finalizing a new classification for this problematic genus.
Apart from the unstable position of Eolachesilla, highly supported clades were found within Lachesillidae. For example, monophyly of tribe Graphocaeciliini was not recovered with any of the data sets or analysis (Figures 1a,b and S1-S5), and these results also were supported by previous UCE and Mitophylogenomics phylogenies ( Figure 2b,c). The grouping G. interpretatus + Anomopsocus spp. + Genus 1 was recovered in a clade apart from the remaining genera and species of graphocaeciliines with high branch support (UFB =100, LPP = 1, Figure 1a,b). A close relationship between the genera Graphocaecilius and Anomopsocus was recognized by Mockford and Sullivan (1986), based on the phallosome and epiproct morphology.
The arrangement of the mitochondrial genome indicates that this clade also shares a unique mitochondrial gene rearrangement, which supports the close relationships between these species (Saenz Manchola et al., 2021 (Figure 2b,c) and the male claspers-phallosome structure, we suggest that the species group forcepeta could be considered as a separate entity from the remaining species of Lachesilla.
Similarly, with the results presented here, the remaining species groups of Lachesilla should be re-evaluated, considering the species in the pedicularia species group, which includes the type species of the genus (L. pedicularia), as Lachesilla sensu strictu.

CONCLUSIONS
Since Mockford and Sullivan (1986) Figure 1a), with each one grouped either to Caeciliusetae, with the remaining Homilopsocidea species (second codon only data set, Figure S3), or as sister to the remaining species in a monophyletic infraorder Homilopsocidea (third codon excluded data set, Figure S2).

SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting Information section at the end of this article.