Ant backbone phylogeny resolved by modelling compositional heterogeneity among sites in genomic data

Cai, Chenyang

doi:10.1038/s42003-024-05793-7


Article
Open access
Published: 17 January 2024


Chenyang Cai ORCID: orcid.org/0000-0002-9283-8323¹

Communications Biology volume 7, Article number: 106 (2024)


Abstract

Ants are the most ubiquitous and ecologically dominant arthropods on Earth, and understanding their phylogeny is crucial for deciphering their character evolution, species diversification and biogeography. Although recent genomic data have shown promise in clarifying intrafamilial relationships across the tree of ants, inconsistencies between molecular datasets have also emerged. Here I re-examine the most comprehensive published Sanger-sequencing and genome-scale datasets of ants using model comparison and substitution models that account for among-site compositional heterogeneity, to understand the sources of conflict in phylogenetic studies. My results under the best-fitting model, selected on the basis of Bayesian cross-validation and posterior predictive model checking, identify contentious nodes in ant phylogeny whose resolution is modelling-dependent. I show that the Bayesian infinite mixture CAT model outperforms empirical finite mixture models (C20, C40 and C60) and that, under the best-fitting CAT-GTR + G4 model, the enigmatic Martialis heureka is sister to all ants except Leptanillinae. This result rejects the more popular hypothesis, supported under worse-fitting models, that places it as sister to Leptanillinae. These analyses resolve a lasting controversy in ant phylogeny and highlight the significance of model comparison and adequate modelling of among-site compositional heterogeneity in reconstructing the deep phylogeny of insects.


Introduction

Ants are the most ubiquitous and ecologically dominant arthropods in terrestrial ecosystems, largely owing to the evolution of eusociality^1,2. A robust phylogeny of the group is key to understanding ant character evolution, species diversification and biogeography. Over the past two decades, developments in molecular phylogenetics^1,2,3,4,5,6, along with the discovery of exceptional Cretaceous fossils^7,8,9,10, have propelled substantial advances in our understanding of ant evolutionary history.

Among the three major groups of extant ants (formicoids, leptanilloids and poneroids; Fig. 1), the intersubfamilial relationships within the formicoids, a clade encompassing the vast majority of ant species, have crystallised from recent molecular phylogenies^2,6,11,12. By contrast, the intersubfamilial relationships within the poneroid clade remain unclear in recent phylogenetic studies, although the monophyly of the clade is strongly supported^2,3,4. The most contentious open questions of ant phylogeny concern the morphologically peculiar leptanilloid clade (Leptanillinae and Martialinae; Fig. 1)^2,4,5, and some phylogenetic studies reject the monophyly of this group^13,14. The subfamily Martialinae, represented by a single Neotropical species, Martialis heureka Rabeling & Verhaagh, was initially recovered as the sister group to all other extant ants¹³. However, this hypothesis was questioned by Kück et al.¹⁴, who, using an improved alignment of the Sanger-sequencing data, alignment masking and data partitioning, recovered Leptanillinae as sister to all other extant Formicidae. Such inconsistencies may result from confounding factors in phylogenetic analyses such as long-branch attraction¹⁵ and compositional heterogeneity across lineages⁴. Recently, a genome-scale phylogenomic study based on protein-coding genes and ultra-conserved elements, using both concatenation and coalescent methods, consistently and strongly supported the monophyly of the leptanilloid clade². Whether Martialis or Leptanillinae is sister to all other ants is thus the most pressing outstanding question in ant systematics and evolution¹, since the answer is fundamental to our understanding of ant phenotype and ecology.

**Fig. 1: Widely accepted knowledge of phylogenetic relationships among the extant subfamilies of ants.**

In the phylogenomic era, molecular phylogenies, even maximally supported ones, are often misled by systematic errors when the properties of molecular evolution are not adequately modelled^16,17,18,19,20. To tackle this issue, I thoroughly explored the recently published Sanger-sequencing⁴ and genome-scale² datasets of ants by testing the fit of substitution models and modelling among-site compositional heterogeneity. My analyses of large-scale datasets under the best-fitting model, selected on the basis of model comparison and posterior predictive model checking, identify modelling-dependent signals and shed light on the contentious early divergences in the tree of ants.

Results

Sanger-sequencing datasets of Borowiec et al.⁴

I analysed three of the four datasets presented by Borowiec et al.⁴ (the complete 11-gene matrix and two matrices from which the most AT-rich or the most GC-rich outgroups were excluded, each with 7451 NT sites) under the site-heterogeneous CAT-GTR + G4 model. The resulting trees (Fig. 2) were largely consistent with those obtained under the partition model in the original study⁴. The trees based on the complete dataset (Fig. 2a) and on the matrix with the most GC-rich outgroups removed (Fig. 2c) had identical ingroup topologies and largely similar support. In both, Leptanillinae were recovered as sister to all other ants, and Martialis was weakly supported as the second-branching lineage (Bayesian posterior probability [BPP] = 0.58 and 0.74, respectively). For the matrix with the most AT-rich outgroups removed (Fig. 2b), Martialis was very weakly supported as sister to Leptanillinae (BPP = 0.35), but all other relationships were congruent with those from the complete dataset. Because site filtering reduced both the phylogenetic signal and the compositional bias⁴, the fourth, compositionally homogeneous matrix (3995 NT sites) yielded an overall weakly supported tree under CAT-GTR + G4 (Fig. 2d): Martialis was weakly supported as sister to Leptanillinae (BPP = 0.77), and the intersubfamilial relationships within the poneroid and formicoid clades were also weakly supported. Overall, the deep relationships among ants, i.e., the placements of Leptanillinae and Martialis, could not be resolved with confidence from these four datasets.

**Fig. 2: Phylogenetic analyses of Sanger-sequencing datasets from Borowiec et al.⁴ under the site-heterogeneous CAT-GTR + G4 model in PhyloBayes.**

Nuclear genomic datasets of Romiguier et al.²

In my maximum likelihood analyses using IQ-TREE, Matrices 1, 2 and 4 under the LG4X + R and LG + C20 + F + G models yielded consistent results (Figs. 3a and 4a; Supplementary Figs. 1, 2 and 4): the clade Martialis + Leptanillinae was maximally supported (maximum likelihood bootstrap [MLB] = 100, except for Matrix 4 under the LG4X + R model, MLB = 92), as was the monophyly of the poneroid and formicoid clades (MLB = 100). Matrix 3 under the site-homogeneous LG + F + G model and the empirical site-heterogeneous LG + C20, C40 and C60 models likewise recovered a topology similar to that derived from Matrix 2, although support for Martialis + Leptanillinae was lower (MLB = 56, 89, 90 and 80, respectively; Supplementary Fig. 3). Interestingly, Matrix 5 under the LG4X + R, LG + C20 + F + G and GHOST (LG + FO*H4) models consistently yielded a different placement of Martialis, as sister to poneroids + formicoids (MLB = 81, 97 and 73, respectively; Supplementary Figs. 5 and 6).

**Fig. 3: Phylogenomic analyses of filtered nuclear genomic 4151-gene dataset (Matrix 1) from Romiguier et al.².**

By contrast, under the site-heterogeneous CAT-GTR + G4 mixture model, all datasets (Matrices 1–5) yielded a consistent topology with respect to the positions of Martialis and Leptanillinae (Figs. 3b, c and 4b; Supplementary Figs. 1c, 2c, 3f, 4c and 5c). All nodes in the Bayesian analyses of Matrices 1, 2 and 4 were maximally supported (BPP = 1), suggesting a strong phylogenetic signal in these matrices. Leptanillinae were sister to all other ant subfamilies, and Martialis was recovered as sister to the clade formed by the monophyletic poneroids and formicoids. The intersubfamilial relationships of formicoids were identical in all analyses and agree with the currently accepted topology based on recent phylogenetic studies^1,2,3,4. The interrelationships of poneroid subfamilies differed slightly between the 4151-gene Matrix 1 and the 2343-gene Matrices 2 and 3. Based on Matrix 1, Ponerinae were maximally supported as sister to the remaining five poneroid subfamilies (Fig. 3a; Supplementary Fig. 1c). Based on Matrices 2 and 3, however, Apomyrminae + Amblyoponinae were sister to the other poneroid lineages (Fig. 4b; Supplementary Figs. 2c and 3f).

Model comparison

LOO-CV (leave-one-out cross-validation) and wAIC (widely applicable information criterion) scores were computed for Matrix 3 only, given the heavy computational burden. The two scores were close to each other, suggesting that wAIC closely approximates LOO-CV for this 2343-gene amino acid dataset. ∆CV and ∆wAIC were calculated as the difference in estimated predictive performance between the best-fitting model and each other model under consideration. As shown in Fig. 4c, the CAT-GTR + G4 mixture model fitted the amino acid dataset better than any of the other models, including the site-homogeneous LG + F + G model and the LG + C20 + F + G, LG + C40 + F + G and LG + C60 + F + G models, according to both LOO-CV (∆CV = −28.28 + 29.44 = 1.16) and wAIC (∆wAIC = −28.28 + 29.44 = 1.16). The LG + C20 + F + G model used by Romiguier et al.² fitted clearly better than the LG + F + G model (∆CV = −29.66 + 30.39 = 0.73), but worse than the CAT-GTR + G4 model. Therefore, topologies reconstructed under the CAT-GTR + G4 model were used as the preferred trees for elucidating the relationships of ant subfamilies.
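The ∆ values above follow a simple sign convention: LOO-CV and wAIC scores are log-scale predictive scores, so higher (less negative) values indicate better fit, and each ∆ is the best model's score minus a competing model's score. The short Python sketch below reproduces that arithmetic for the two score pairs implied by the text. It is illustrative only; the label `other_model` is a hypothetical placeholder, because the text does not name the model whose score is −29.44.

```python
def delta_from_best(scores: dict[str, float]) -> dict[str, float]:
    """Gap between the best model's score and each model's own score.

    Scores are log-scale predictive scores (LOO-CV or wAIC), so higher is
    better and every returned delta is >= 0 (the best model gets 0.0).
    """
    best = max(scores.values())
    # Rounded to two decimals to match the precision reported in the text.
    return {model: round(best - score, 2) for model, score in scores.items()}
```

For example, `delta_from_best({"CAT-GTR+G4": -28.28, "other_model": -29.44})` assigns a ∆ of 1.16 to `other_model`, and the LG + C20 + F + G (−29.66) versus LG + F + G (−30.39) pair gives 0.73, matching the values reported above.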

Posterior predictive model checking

To test whether the available models could adequately describe among-site amino acid preferences, I performed posterior predictive analyses of site-specific amino acid diversity (PPA-DIV), i.e., the mean number of distinct amino acids observed at each site, on Matrix 3 (Table 1). Z scores are reported because the MCMC estimate of the p-value was 0. PPA-DIV Z scores varied widely across the tested models, from 156.891 under LG + F + G to 3.974 under CAT-GTR + G4. For this dataset, absolute Z scores greater than 5 were obtained under the LG + F + G and LG + C20, C40 and C60 models, indicating strong rejection of the null hypothesis that the model adequately describes the data. The PPAs thus show that CAT-GTR + G4 (Z score = 3.974), the model favoured by my model comparison, describes site-specific amino acid preferences substantially better than the other tested models. This result is not surprising, since CAT-GTR + G4 is known to be by far the best-fitting of the models that explicitly accommodate among-site compositional heterogeneity^21,22,23,24.
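The PPA-DIV statistic and its Z score can be sketched in a few lines of Python. This is a minimal illustration, not PhyloBayes's code: it assumes the observed alignment is a list of equal-length strings, that replicate values of the statistic have already been computed from alignments simulated under posterior samples, and that the symbols `-`, `?` and `X` mark gaps or missing data and are not counted. The sign convention of the Z score (predicted mean minus observed) is also an assumption; the |Z| > 5 criterion used above does not depend on it.

```python
import statistics

# Assumption: these symbols mark gaps or missing data and are not counted.
GAP_CHARS = set("-?X")


def mean_site_diversity(alignment: list[str]) -> float:
    """Mean number of distinct residues per column (the PPA-DIV statistic)."""
    n_sites = len(alignment[0])
    total = 0
    for i in range(n_sites):
        residues = {seq[i] for seq in alignment if seq[i] not in GAP_CHARS}
        total += len(residues)
    return total / n_sites


def ppa_z_score(observed: float, replicates: list[float]) -> float:
    """Standardised distance between the observed statistic and the mean of
    its posterior predictive replicates (predicted mean minus observed)."""
    mu = statistics.mean(replicates)
    sd = statistics.stdev(replicates)
    return (mu - observed) / sd


def ppa_p_value(observed: float, replicates: list[float]) -> float:
    """One-sided p-value: fraction of replicates no more diverse than observed."""
    return sum(r <= observed for r in replicates) / len(replicates)
```

When no replicate is as low as the observed value, `ppa_p_value` returns 0, which is the situation described above in which Z scores are reported instead of p-values.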

Table 1 Comparing model adequacy.


Discussion

Impacts of outgroup choice and data filtering on tree topology

My detailed analyses of both Sanger-sequencing and genome-scale datasets provide a basis for reassessing the subfamilial relationships of ants. In line with the recent phylogenomic study², my results based on multiple supermatrices (variants of the fewer-outgroup 4151-gene and more-outgroup 2343-gene datasets) demonstrate that outgroup selection does not affect the internal phylogeny of ants when the phylogenetic signal is strong. By contrast, outgroup choice does influence the basal relationships when the much smaller Sanger-sequencing nucleotide datasets are analysed⁴, which may stem from sequence divergence and insufficient phylogenetic signal. My phylogenomic analyses also show that, under CAT-GTR + G4, neither data filtering of varying stringency nor taxon subsampling affects the basal topology of the ant tree when large-scale matrices are used. Moreover, my comparative phylogenomic results under both site-homogeneous and site-heterogeneous models, integrated with formal model comparison, clearly demonstrate that modelling among-site compositional heterogeneity is key to reconstructing a natural ant tree of life.

Significance of model comparison and modelling compositional heterogeneity

Modelling of amino acid replacement is central to phylogenomic inference, particularly for deep relationships and rapid radiations. Model comparison is therefore a crucial yet computationally challenging step in modern phylogenomics. In the most recent phylogenomic study of ants², the empirical finite mixture model LG + C20 + F + G was selected to mitigate long-branch attraction artefacts by modelling among-site compositional heterogeneity. The choice of this particular model (instead of LG + C40 + F + G, LG + C60 + F + G or other theoretically better-fitting models) was apparently a compromise, since analyses of supermatrices under the C40 and C60 models in IQ-TREE are computationally expensive in both running time and memory. A recent study of model comparison using Bayesian cross-validation, however, shows that amino acid profile mixture models (CAT models) outperform all single-matrix models (LG, WAG), and that free mixtures (CAT-GTR + G4) consistently outperform empirical finite mixtures (e.g., LG + C20, C40 and C60)²⁵. Not surprisingly, my cross-validation analyses of the ant dataset reached the same conclusion: CAT-GTR + G4 outperforms the other tested models (including LG + C20 + F + G and LG + C60 + F + G). In addition, similarly to a recent simulation study on the rooting of the animal tree²⁶, my posterior predictive analyses demonstrate that CAT-GTR + G4 best describes site-specific amino acid preferences in ant phylogenomics, so CAT-GTR + G4 should be preferred to C60. Overall, my analyses of genome-scale data highlight the significance of model comparison and adequate modelling of among-site compositional heterogeneity in deciphering the deep phylogeny of ants.

My reanalyses of the 11-locus datasets from Borowiec et al.⁴ suggest that the deeper phylogeny of ants and the position of Martialinae cannot be unambiguously resolved when the phylogenetic signal is weak. Among these analyses, the highest support for any placement of Martialis was BPP = 0.77, obtained from the compositionally homogeneous matrix (Fig. 2d). However, in that particular analysis, support for nodes within formicoids was exceptionally weak and, more importantly, the relationships within formicoids were inconsistent with the widely accepted topology^1,2 and with my phylogenomic results.

Position of Martialis in the ant Tree of Life

The discovery of Martialis heureka (Martialinae), based on a single stray worker from the Amazon (north of Manaus, Brazil)¹³, was both exciting and perplexing. Because M. heureka displays a bizarre combination of plesiomorphic and autapomorphic traits, it was placed in its own subfamily, and its precise phylogenetic position has been contentious ever since. Rabeling et al.¹³ recovered M. heureka as sister to all remaining extant subfamilies, whereas a reanalysis by Kück et al.¹⁴ recovered it as the second-branching lineage after Leptanillinae. Subsequent integrated analyses based on a handful of loci^4,27 continued to reach divergent conclusions. Based on broad taxon sampling and genome sequencing, Romiguier et al.² retrieved high support for the leptanillomorph clade (Leptanillinae and Martialinae) as the sister group to all other extant ants. My reanalyses of multiple datasets from the most comprehensive studies^2,4 under better-fitting models show that M. heureka is sister to all ants except Leptanillinae, agreeing with the conclusion of Kück et al.¹⁴ but rejecting other topologies^2,4,10,13. These analyses resolve a lasting enigma in ant phylogeny and offer a backbone topology for investigating the character evolution, biogeography and ecology of early ants. For instance, Boudinot et al.¹⁰ have recently adduced some potential synapomorphies of Leptanillomorpha (Martialis + Leptanillinae), but the present phylogenomic results suggest that these morphological similarities could result from convergent acquisition of features adapting these ants to a subterranean lifestyle.

The relationships among formicoid subfamilies have largely been resolved¹, but the intersubfamilial relationships of poneroids remain unsettled. Monophyly of the morphologically heterogeneous poneroids has been consistently recovered in molecular phylogenetic studies, including Moreau et al.⁶, some analyses of Brady et al.¹¹, Ward and Fisher²⁸, Borowiec et al.⁴ and phylogenomic studies^2,5. In more recent studies^4,5, however, many deeper nodes within poneroids were only weakly to moderately supported. The resolution of poneroid relationships was much improved in the genome-based phylogenomic study of Romiguier et al.². In my analyses of the filtered 4151-gene dataset (Matrix 1) under CAT-GTR + G4, Ponerinae are sister to the clade (Amblyoponinae, Apomyrminae) + (Paraponerinae, (Agroecomyrmecinae, Proceratiinae)) (Fig. 3); the remaining incongruences in poneroid relationships between that study and the present analyses remain to be addressed by future studies with broader taxon sampling. These phylogenomic results provide a foundation for understanding ant evolution and for comparative studies of evolutionary innovations among ants.

Methods

Dataset collation

I used the most comprehensive Sanger-sequencing and nuclear genome alignments from Borowiec et al.⁴ and Romiguier et al.², respectively. The datasets (Sanger-sequencing datasets²⁹ and genome-scale datasets³⁰) were downloaded from the Zenodo data repository.

For the Sanger-sequencing (11 nuclear loci) data, I used all four nucleotide [NT] matrices generated by Borowiec et al.⁴: (1) the full 11-locus matrix (123 taxa, 7451 NT sites); (2) the full matrix with the most AT-rich outgroups excluded (117 taxa, 7451 NT sites); (3) the full matrix with the most GC-rich outgroups excluded (117 taxa, 7451 NT sites); and (4) the homogeneous matrix with heterogeneous partitions removed (123 taxa, 3995 NT sites).

For the nuclear genomic data, I used the two BUSCO-gene amino acid [AA] supermatrices from Romiguier et al.²: (1) the fewer-outgroup AA dataset (83 taxa, 4151 single-copy protein-coding genes, 1,692,050 AA sites); and (2) the more-outgroup AA dataset (188 taxa, 2343 single-copy protein-coding genes, 983,951 AA sites), which was designed to test the impact of outgroup selection on tree inference. To balance taxon sampling across subfamilies, focus on the deeper phylogeny of ants and, more importantly, speed up computationally heavy Bayesian runs, I subsampled the 4151-gene supermatrix and removed constant sites using BMGE v.1.1³¹, resulting in Matrix 1 (38 taxa, 647,114 AA sites). Similarly, I randomly pruned outgroup taxa, retained all representative ingroup genera of the 2343-gene supermatrix, and removed ambiguously aligned sites using BMGE with default settings (-m BLOSUM62 -h 0.5), yielding Matrix 2 (47 taxa, 623,908 AA sites). Additionally, as sensitivity tests of the potential impact of my data filtering and subsampling methods on tree inference, I (1) filtered the 2343-gene supermatrix using a stringent setting (-m BLOSUM30 -h 0.1:0.5) to select slow-evolving AA sites, resulting in Matrix 3 (47 taxa, 95,201 AA sites); (2) removed distantly related outgroups and selected representative ant genera but kept all AA sites of the 4151-gene supermatrix, resulting in Matrix 4 (17 taxa, 1,692,050 AA sites); and (3) filtered the 4151-gene supermatrix using BMGE with a very stringent setting (-m BLOSUM30 -h 0.2:0.3), resulting in Matrix 5 (82 taxa, 21,902 AA sites).
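BMGE's -h option sets the entropy threshold(s) a column must satisfy (a single maximum such as 0.5, or a min:max pair such as 0.2:0.3), where BMGE's entropy is weighted by the substitution matrix given with -m and smoothed along the alignment. The Python sketch below is a deliberately simplified stand-in rather than BMGE's algorithm: it uses plain Shannon entropy scaled to [0, 1] by log(20), with no BLOSUM weighting and no smoothing, and treats `-`, `?` and `X` as missing data (an assumption). It only illustrates how a min:max window keeps moderately variable columns; any positive minimum also discards constant, zero-entropy columns.

```python
import math
from collections import Counter

# Assumption: these symbols are treated as missing data.
GAP_CHARS = set("-?X")
N_AMINO_ACIDS = 20


def normalised_entropy(column: str) -> float:
    """Shannon entropy of one column's residues, scaled to [0, 1] by log(20).

    Simplified stand-in: BMGE instead uses a BLOSUM-weighted, smoothed entropy.
    """
    counts = Counter(c for c in column if c not in GAP_CHARS)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = -sum((n / total) * math.log(n / total) for n in counts.values())
    return h / math.log(N_AMINO_ACIDS)


def filter_columns(alignment: list[str], h_min: float, h_max: float) -> list[str]:
    """Keep only columns whose normalised entropy lies in [h_min, h_max]."""
    n_sites = len(alignment[0])
    keep = [
        i
        for i in range(n_sites)
        if h_min <= normalised_entropy("".join(seq[i] for seq in alignment)) <= h_max
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]
```

With a window of 0.1 to 0.5, for instance, columns in which every sequence carries the same residue are removed together with highly variable ones, which mirrors in spirit the combined effect of stringent min:max settings.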

Phylogenetic analyses

Phylogenomic analyses of the nuclear genomic datasets (Matrices 1–5) were conducted with IQ-TREE v.2.1.3³⁵ using the simpler LG4X + R model^32,33 and the site-heterogeneous LG + C20 + F + G model³⁴. For LG + C20 + F + G, the posterior mean site frequency (PMSF) model³⁶ was applied using the respective LG4X + R tree as the guide tree. In addition, the comparatively small Matrix 3 was analysed with IQ-TREE v.2.1.3 under the site-homogeneous LG + F + G model and the site-heterogeneous LG + C40 + F + G and LG + C60 + F + G models, corresponding to the models evaluated in the model comparison (see below). For Matrix 5, the heterotachous GHOST model (General Heterogeneous evolution On a Single Topology³⁷) was also tested using IQ-TREE v.2.1.3.
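In the PMSF approach, a site-heterogeneous mixture such as C20 is fitted once on the guide tree and then replaced by a single amino acid profile per site: the mixture's class profiles averaged with weights equal to each class's posterior probability at that site. The Python sketch below shows only that averaging step. It assumes that the class weights, the class profiles and the per-site class likelihoods on the guide tree (already marginalised over rate categories) are available; IQ-TREE computes these internally, and this is not its code.

```python
def pmsf_profiles(
    weights: list[float],
    site_class_likelihoods: list[list[float]],
    class_profiles: list[list[float]],
) -> list[list[float]]:
    """Posterior mean site frequency profile for every site.

    weights[k]                    prior weight of mixture class k
    site_class_likelihoods[i][k]  likelihood of site i under class k (guide tree)
    class_profiles[k][a]          equilibrium frequency of state a in class k
    """
    n_states = len(class_profiles[0])
    profiles = []
    for lik in site_class_likelihoods:
        # Posterior probability of each class at this site (Bayes' rule).
        joint = [w * lk for w, lk in zip(weights, lik)]
        total = sum(joint)
        posterior = [j / total for j in joint]
        # Posterior-weighted average of the class profiles.
        profiles.append(
            [
                sum(p * prof[a] for p, prof in zip(posterior, class_profiles))
                for a in range(n_states)
            ]
        )
    return profiles
```

Because each site's profile is fixed after this step, the subsequent tree search only needs one profile per site rather than the full mixture, which is why PMSF is much cheaper than running C20 (or C60) directly.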

As the subfamilial interrelationships of ants are expected to be affected by long-branch attraction artefacts^1,2, I used the compositionally site-heterogeneous infinite mixture model CAT-GTR + G4 implemented in PhyloBayes MPI 1.9³⁸, which has been shown to be effective in mitigating such systematic errors by modelling among-site compositional heterogeneity. The four Sanger-sequencing matrices (as nucleotides) and five genome-based supermatrices (as amino acids) were analysed under the CAT-GTR + G4 model. For each analysis, two Markov chain Monte Carlo (MCMC) chains were run, and convergence was assessed using the bpcomp and tracecomp tools implemented in PhyloBayes³⁹. Approximately 30% of samples were discarded as burn-in. Detailed information on the PhyloBayes runs (burn-in samples, total number of cycles, bpcomp maxdiff and tracecomp minimal overall effective size) is given in the figure caption of each analysis.
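PhyloBayes's bpcomp assesses convergence by comparing the posterior frequencies of bipartitions sampled by each chain after burn-in, and reports the largest (maxdiff) and mean (meandiff) discrepancies between chains. The Python sketch below mimics that comparison under simplifying assumptions: each sampled tree is represented as a frozenset of hypothetical bipartition labels instead of a parsed tree, and burn-in is given as a fraction of samples (0.3 by default, matching the ~30% used here). It is not bpcomp's code.

```python
from collections import Counter


def bipartition_frequencies(
    trees: list[frozenset[str]], burnin: float
) -> dict[str, float]:
    """Posterior frequency of each bipartition after discarding the burn-in."""
    kept = trees[int(len(trees) * burnin):]
    counts = Counter(bp for tree in kept for bp in tree)
    return {bp: n / len(kept) for bp, n in counts.items()}


def compare_chains(
    chain_a: list[frozenset[str]],
    chain_b: list[frozenset[str]],
    burnin: float = 0.3,
) -> tuple[float, float]:
    """Return (maxdiff, meandiff) of bipartition frequencies between two chains.

    A bipartition absent from one chain counts as frequency 0 in that chain.
    """
    fa = bipartition_frequencies(chain_a, burnin)
    fb = bipartition_frequencies(chain_b, burnin)
    diffs = [abs(fa.get(bp, 0.0) - fb.get(bp, 0.0)) for bp in set(fa) | set(fb)]
    return max(diffs), sum(diffs) / len(diffs)
```

A maxdiff near 0 means both chains visit the same bipartitions at the same rates; a value of 1 means at least one bipartition is always present in one chain and never in the other, the typical sign of a stuck chain.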

Model comparison

For the filtered genome-scale AA dataset (Matrix 3), I used comparatively efficient and reliable approaches, namely leave-one-out cross-validation (LOO-CV) and the widely applicable information criterion (wAIC)⁴⁰, to estimate the relative fit of alternative models (CAT-GTR, LG + G, LG + C20, LG + C40 and LG + C60) in PhyloBayes MPI 1.9. The general idea of cross-validation is to split the data set into two subsets, use one subset to train the model, and then evaluate the fit of the model on the remaining subset. In a Bayesian framework, a natural way to implement CV is to average the likelihood of the validation set over the posterior distribution obtained from the training set. The resulting CV score is then log-transformed and averaged over multiple random splits of the original data set into training and validation sets. In LOO-CV, each observation is set aside in turn for validation, and the n − 1 remaining observations are used to train the model⁴⁰. The LOO-CV and wAIC scores were compared to select the best-fitting model, whose tree was taken as my preferred tree of ants.

Testing model adequacy

Posterior predictive analyses (PPA)³⁹ were performed on Matrix 3 using PhyloBayes MPI 1.9 to test whether LG + F + G, LG + C20 + F + G, LG + C40 + F + G, LG + C60 + F + G or CAT-GTR + G4 can adequately describe site-specific amino acid preferences in the dataset. These models (especially LG + F + G and LG + C20 + F + G) were tested because they had previously been used in the recent study of ant phylogenomics² that yielded a topology conflicting with my preferred tree.

Statistics and reproducibility

For Bayesian phylogenetic analyses, I followed the practices recommended in the PhyloBayes manual^39,40, which include detailed methods (bpcomp and tracecomp) for statistical evaluation. Detailed information on the PhyloBayes runs (burn-in samples, total number of cycles, bpcomp maxdiff and tracecomp minimal overall effective size) is given in the supplementary figure caption of each analysis. In addition, as sensitivity tests of the potential impact of my data filtering and subsampling methods on tree inference, I designed three additional supermatrices for comparison.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data sets and output files generated in my phylogenomic analyses have been deposited in the Dryad Digital Repository [https://doi.org/10.5061/dryad.pk0p2ngsj]⁴¹.

References

Ward, P. S. The phylogeny and evolution of ants. Annu. Rev. Ecol. Evol. Syst. 45, 23–43 (2014).
Article Google Scholar
Romiguier, J. et al. Ant phylogenomics reveals a natural selection hotspot preceding the origin of complex eusociality. Curr. Biol. 32, 2942–2947 (2022).
Article CAS PubMed Google Scholar
Borowiec, M. L., Moreau, C. S. & Rabeling, C. Ants: Phylogeny and Classification, In Encyclopedia of Social Insects. (ed C. K. Starr), pp. 1–18. (Springer International Publishing, Cham, 2020).
Borowiec, M. L. et al. Compositional heterogeneity and outgroup choice influence the internal phylogeny of the ants. Mol. Phylogenet. Evol. 134, 111–121 (2019).
Article PubMed Google Scholar
Branstetter, M. G., Longino, J. T., Ward, P. S. & Faircloth, B. C. Enriching the ant tree of life: enhanced UCE bait set for genome‐scale phylogenetics of ants and other Hymenoptera. Methods Ecol. Evol. 8, 768–776 (2017).
Article Google Scholar
Moreau, C. S., Bell, C. D., Vila, R., Archibald, S. B. & Pierce, N. E. Phylogeny of the ants: diversification in the age of angiosperms. Science 312, 101–104 (2006).
Article CAS PubMed Google Scholar
Barden, P., Perrichot, V. & Wang, B. Specialized predation drives aberrant morphological integration and diversity in the earliest ants. Curr. Biol. 30, 3818–3824 (2020).
Article CAS PubMed Google Scholar
Barden, P. & Grimaldi, D. A. Adaptive radiation in socially advanced stem-group ants from the Cretaceous. Curr. Biol. 26, 515–521 (2016).
Article CAS PubMed Google Scholar
LaPolla, J. S., Dlussky, G. M. & Perrichot, V. Ants and the fossil record. Annu. Rev. Entomol. 58, 609–630 (2013).
Article CAS PubMed Google Scholar
Boudinot, B. E. et al. Evolution and systematics of the Aculeata and kin (Hymenoptera), with emphasis on the ants (Formicoidea: †@@@idae fam. nov., Formicidae). Preprint at bioRxiv https://doi.org/10.1101/2022.02.20.480183 (2022).
Brady, S. G., Schultz, T. R., Fisher, B. L. & Ward, P. S. Evaluating alternative hypotheses for the early evolution and diversification of ants. Proc. Natl Acad. Sci. USA 103, 18172–18177 (2006).
Article CAS PubMed PubMed Central Google Scholar
Ward, P. S., Brady, S. G., Fisher, B. L. & Schultz, T. R. The evolution of myrmicine ants: phylogeny and biogeography of a hyperdiverse ant clade (Hymenoptera: Formicidae). Syst. Entomol. 40, 61–81 (2015).
Article Google Scholar
Rabeling, C., Brown, J. M. & Verhaagh, M. Newly discovered sister lineage sheds light on early ant evolution. Proc. Natl Acad. Sci. USA 105, 14913–14917 (2008).
Article CAS PubMed PubMed Central Google Scholar
Kück, P., Hita Garcia, F., Misof, B. & Meusemann, K. Improved phylogenetic analyses corroborate a plausible position of Martialis heureka in the ant tree of life. PLoS ONE 6, e21031 (2011).
Article PubMed PubMed Central Google Scholar
Jermiin, L. S., Ho, S. Y., Ababneh, F., Robinson, J. & Larkum, A. W. The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. Syst. Biol. 53, 638–643 (2004).
Article PubMed Google Scholar
Cai, C. Y. et al. Integrated phylogenomics and fossil data illuminate the evolution of beetles. R. Soc. Open Sci. 9, 211771 (2022).
Article CAS PubMed PubMed Central Google Scholar
Kapli, P., Flouri, T. & Telford, M. J. Systematic errors in phylogenetic trees. Curr. Biol. 31, R59–R64 (2021).
Article CAS PubMed Google Scholar
Kapli, P., Yang, Z. & Telford, M. J. Phylogenetic tree building in the genomic age. Nat. Rev. Genet. 21, 428–444 (2020).
Article CAS PubMed Google Scholar
Tihelka, E. et al. The evolution of insect biodiversity. Curr. Biol. 31, R1299–R1311 (2021).
Article CAS PubMed Google Scholar
Tihelka, E. et al. Fleas are parasitic scorpionflies. Palaeoentomology 3, 641–653 (2020).
Article Google Scholar
Lartillot, N., Brinkmann, H. & Philippe, H. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol. Biol. 7, S4 (2007).
Article PubMed PubMed Central Google Scholar
Feuda, R. et al. Improved modeling of compositional heterogeneity supports sponges as sister to all other animals. Curr. Biol. 27, 3864–3870 (2017).
Article CAS PubMed Google Scholar
Cai, C. Y., Tihelka, E., Liu, X. Y. & Engel, M. S. Improved modelling of compositional heterogeneity reconciles phylogenomic conflicts among lacewings. Palaeoentomology 6, 49–57 (2023).
Article Google Scholar
Li, Y. D., Engel, M. S., Tihelka, E. & Cai, C. Phylogenomics of weevils revisited: data curation and modelling compositional heterogeneity. Biol. Lett. 19, 20230307 (2023).
Article PubMed PubMed Central Google Scholar
Bujaki, T. & Rodrigue, N. Bayesian cross-validation comparison of amino acid replacement models: contrasting profile mixtures, pairwise exchangeabilities, and gamma-distributed rates-across-sites. J. Mol. Evol. 90, 468–475 (2022).
Article CAS PubMed PubMed Central Google Scholar
Giacomelli, M., Rossi, M. E., Lozano-Fernandez, J., Feuda, R. & Pisani, D. Resolving tricky nodes in the tree of life through amino acid recoding. iScience 25, 105594 (2022).
Article CAS PubMed PubMed Central Google Scholar
Moreau, C. S. & Bell, C. D. Testing the museum versus cradle tropical biological diversity hypothesis: phylogeny, diversification, and ancestral biogeographic range evolution of the ants. Evolution 67, 2240–2257 (2013).
Article PubMed Google Scholar
Ward, P. S. & Fisher, B. L. Tales of dracula ants: the evolutionary history of the ant subfamily Amblyoponinae (Hymenoptera: Formicidae). Syst. Entomol. 41, 683–693 (2016).
Article Google Scholar
Borowiec, M. L. et al. Compositional heterogeneity and outgroup choice influence the internal phylogeny of the ants [Data set]. Zenodo https://doi.org/10.5281/zenodo.2549806 (2019).
Romiguier, J. et al. Ant phylogenomics reveals a natural selection hotspot preceding the origin of complex eusociality [Data set]. Zenodo https://doi.org/10.5281/zenodo.5705739 (2022).
Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210 (2010).
Article PubMed PubMed Central Google Scholar
Le, S. Q., Dang, C. C. & Gascuel, O. Modeling protein evolution with several amino acid replacement matrices depending on site rates. Mol. Biol. Evol. 29, 2921–2936 (2012).
Article CAS PubMed Google Scholar
Yang, Z. A space-time process model for the evolution of DNA sequences. Genetics 139, 993–1005 (1995).
Article CAS PubMed PubMed Central Google Scholar
Le, S. Q., Gascuel, O. & Lartillot, N. Empirical profile mixture models for phylogenetic reconstruction. Bioinformatics 24, 2317–2323 (2008).
Article Google Scholar
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Article CAS PubMed PubMed Central Google Scholar
Wang, H. C., Minh, B. Q., Susko, E. & Roger, A. J. Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Syst. Biol. 67, 216–235 (2018).
Crotty, S. M. et al. GHOST: recovering historical signal from heterotachously-evolved sequence alignments. Syst. Biol. 69, 249–264 (2020).
Lartillot, N., Rodrigue, N., Stubbs, D. & Richer, J. PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst. Biol. 62, 611–615 (2013).
Lartillot, N. PhyloBayes: Bayesian phylogenetics using site-heterogeneous models. In Phylogenetics in the Genomic Era. (eds C. Scornavacca, F. Delsuc & N. Galtier), pp. 1.5:1–1.5:16. No commercial publisher, authors open access book (2020).
Lartillot, N. Identifying the best approximating model in Bayesian phylogenetics: Bayes factors, cross-validation or wAIC? Syst. Biol. 72, 616–638 (2023).
Cai, C. Data sets for phylogenomic analyses in: Ant backbone phylogeny resolved by modelling compositional heterogeneity among sites in genomic data [Data set]. Dryad https://doi.org/10.5061/dryad.pk0p2ngsj (2024).


Acknowledgements

This research was supported by the National Natural Science Foundation of China (42222201, 42288201) and the Second Tibetan Plateau Scientific Expedition and Research project (2019QZKK0706). I thank Prof. Philip Ward, Dr. Jonathan Romiguier, Dr. Marek Borowiec, and Mr. Erik Tihelka for their helpful discussions.

Author information

Authors and Affiliations

State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology, Chinese Academy of Sciences, Nanjing, 210008, China
Chenyang Cai

Contributions

C.C.: conceptualisation, project administration, data curation, formal analysis, funding acquisition, investigation, visualisation, writing.

Corresponding author

Correspondence to Chenyang Cai.

Ethics declarations

Competing interests

The author declares no competing interests.

Peer review

Peer review information

Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Katie Davis and Joao Valente.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figures 1–6

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cai, C. Ant backbone phylogeny resolved by modelling compositional heterogeneity among sites in genomic data. Commun Biol 7, 106 (2024). https://doi.org/10.1038/s42003-024-05793-7

Received: 01 July 2023
Accepted: 08 January 2024
Published: 17 January 2024
DOI: https://doi.org/10.1038/s42003-024-05793-7