The Prevalence and Impact of Model Violations in Phylogenetic Analysis

Abstract
In phylogenetic inference, we commonly use models of substitution which assume that sequence evolution is stationary, reversible, and homogeneous (SRH). Although the use of such models is often criticized, the extent of SRH violations and their effects on phylogenetic inference of tree topologies and edge lengths are not well understood. Here, we introduce and apply the maximal matched-pairs tests of homogeneity to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogenetic data sets. We show that roughly one-quarter of all the partitions we analyzed (23.5%) reject the SRH assumptions, and that for 25% of data sets, tree topologies inferred from all partitions differ significantly from topologies inferred using the subset of partitions that do not reject the SRH assumptions. This proportion increases when comparing trees inferred using the subset of partitions that rejects the SRH assumptions, to those inferred from partitions that do not reject the SRH assumptions. These results suggest that the extent and effects of model violation in phylogenetics may be substantial. They highlight the importance of testing for model violations and possibly excluding partitions that violate models prior to tree reconstruction. Our results also suggest that further effort in developing models that do not require SRH assumptions could lead to large improvements in the accuracy of phylogenomic inference. The scripts necessary to perform the analysis are available at https://github.com/roblanf/SRHtests, and the new tests we describe are available as a new option in IQ-TREE (http://www.iqtree.org).


Introduction
Phylogenetics is an essential tool for inferring evolutionary relationships between individuals, species, genes, and genomes. Moreover, phylogenetic trees form the basis of a huge range of other inferences in evolutionary biology, from gene function prediction to drug development and forensics (Eisen 1998; Farrell et al. 2000; Mäser et al. 2001; Gardner et al. 2002; Yao et al. 2003, 2004; Grenfell et al. 2004; Salipante and Horwitz 2006; Gray et al. 2009; Brady and Salzberg 2011; Dunn et al. 2011).
Most phylogenetic studies use models of sequence evolution which assume that the evolutionary process follows stationary, reversible, and homogeneous (SRH) conditions. Stationarity implies that the marginal frequencies of the nucleotides or amino acids are constant over time, reversibility implies that the evolutionary process is stationary and undirected (substitution rates between nucleotides or amino acids are equal in both directions), and homogeneity implies that the instantaneous substitution rates are constant along the tree or over an edge (Felsenstein 2004; Yang and Rannala 2012; Jermiin et al. 2017). However, these simplifying assumptions are often violated by real data (Foster and Hickey 1999; Tarrío et al. 2001; Paton et al. 2002; Goremykin and Hellwig 2005; Murray et al. 2005; Bourlat et al. 2006; Hyman et al. 2007; Sheffield et al. 2009; Nesnidal et al. 2010; Nabholz et al. 2011; Martijn et al. 2018). Such model violation may lead to systematic error that, unlike stochastic error, cannot be remedied simply by increasing the size of a data set (Felsenstein 2004; Ho and Jermiin 2004; Jermiin et al. 2004; Philippe et al. 2005; Sullivan and Joyce 2005; Kumar et al. 2012; Brown and Thomson 2017; Duchene et al. 2017). As phylogenetic data sets are steadily growing in terms of taxonomic and site sampling, it is vital that we develop and employ methods to measure and understand the extent to which systematic error affects phylogenetic inference (systematic bias), and explore ways of mitigating this systematic bias in empirical studies.
One approach to accommodate data that have evolved under non-SRH conditions is to employ models that relax the SRH assumptions. A number of non-SRH models have been implemented in a variety of software packages (Foster 2004; Lartillot and Philippe 2004; Blanquart and Lartillot 2006; Boussau and Gouy 2006; Jayaswal et al. 2007, 2011, 2014; Knight et al. 2007; Dutheil and Boussau 2008; Sumner et al. 2012; Zou et al. 2012; Groussin et al. 2013; Nguyen et al. 2015; Woodhams et al. 2015). However, such models remain infrequently used as searching for optimal phylogenetic trees under these models is computationally demanding (Betancur-R et al. 2013) and the implementations are often not easy to use. As a result, the vast majority of empirical phylogenetic inferences rely on models that assume sequences have evolved under SRH conditions, such as the general time reversible family of models implemented in many of the most widely used phylogenetics software packages (Swofford 2001; Drummond and Rambaut 2007; Guindon et al. 2010; Ronquist et al. 2012; Bazinet et al. 2014; Bouckaert et al. 2014; Stamatakis 2014; Nguyen et al. 2015; Höhna et al. 2016).
Another approach to accounting for data that may have evolved under non-SRH conditions is to test for model violations prior to tree reconstruction. Here, one first screens data sets or parts of data sets, and reconstructs trees exclusively from data that do not reject SRH conditions. A number of methods have been proposed to test for violation of SRH conditions in aligned sequences prior to estimating trees (Bowker 1948; Stuart 1955; Rzhetsky and Nei 1995; Kumar and Gadagkar 2001; Weiss and von Haeseler 2003; Ababneh et al. 2006; Ho et al. 2006), and there are also a posteriori tests for absolute model adequacy which are employed after trees have been estimated (Goldman 1993; Bollback 2002; Brown and ElDabaje 2009; Brown 2014; Duchene et al. 2017; Brown and Thomson 2018).
Allowing the data to reject the model when the assumptions of the model are violated is an important approach to reducing systematic bias in phylogenetic inference (Brown 2014). Knowing in advance which sequences and loci are inconsistent with the SRH assumptions will allow us to choose more complex models or to omit some of these sequences and loci from downstream analyses (Kumar and Gadagkar 2001). The need for methods that assess the evolutionary process prior to phylogenetic inference becomes more important as the number of sequences and sites per data set increases, because systematic bias has an increasing effect on inferences from larger phylogenetic data sets (Jermiin et al. 2004; Phillips et al. 2004; Delsuc et al. 2005).
In this article, we evaluate the extent and effect of model violation due to non-SRH evolution using 35 empirical data sets with a total of 3,572 partitions. We determine whether the SRH assumptions are violated by extending the matched-pairs tests of homogeneity and applying them to each partition. We then compare the phylogenetic trees for each data set estimated from all of the partitions, the partitions that reject the SRH assumptions, and the partitions that do not reject the SRH assumptions, in order to evaluate the effect of violating SRH conditions on phylogenetic inference. Our results suggest that violating SRH assumptions can have substantial impacts on phylogenetic inference.

Empirical Data Sets
In order to assess the impact of model violation in phylogenetics, we first gathered a representative sample of 35 partitioned empirical data sets that had been used for phylogenetic analysis in recent studies (table 1). Within the constraints of selecting data that were publicly available and suitably annotated, that is, such that all loci and all codon positions within protein-coding loci could be identified, we selected the data sets to provide as representative a sample as possible of the data types, taxa, and genomic regions most commonly used to infer bifurcating phylogenetic trees from concatenated alignments. These data sets include nucleotide sequences from nuclear, mitochondrial, plastid, and virus genomes, and include protein-coding DNA, introns, intergenic spacers, tRNA, rRNA, and ultraconserved elements. The number of taxa and sites in these data sets range from 27 to 355 and from 699 to 1,079,052, respectively. The clades represented in these data sets include animals, plants, and viruses. We partitioned all data sets to the maximum possible extent based on the biological properties of the data, that is, we divided every locus and every codon position within each protein-coding locus into a separate partition. All partitioning information is available at the GitHub repository (https://github.com/roblanf/SRHtests/tree/master/datasets), and the full details of every data set are provided in table 1.

Workflow Summary
Figure 1 outlines the workflow. For each partition in each data set, we used a new approach based on the three matched-pairs tests of homogeneity to ask whether the evolution of the aligned sequences in the partition rejects the SRH assumptions. The three matched-pairs tests of homogeneity, described in more detail below, test three slightly different assumptions about the historical process that generated each aligned pair of sequences in a given partition. A significant result from any test suggests that the nature of the evolutionary process required to explain the aligned sequences violates at least one of the three SRH conditions. For each test, we classify each partition as pass if the result of the test is nonsignificant or fail if the result of the test is significant. We then denote the original data set as D all, the concatenation of pass partitions as D pass, and the concatenation of fail partitions as D fail (fig. 1).
To investigate the impact of model violation on phylogenetic inference, we infer and compare three phylogenetic trees, T all , T pass , and T fail , estimated from D all , D pass , and D fail , respectively.
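The classification step in this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the partition names, and the P values in the usage note are hypothetical, and the 0.05 threshold is the cutoff used for the tests in this study.

```python
def split_partitions(p_values, alpha=0.05):
    """Split partitions into pass/fail lists from a {partition: P value} mapping.

    A partition fails when its P value is below alpha (the test is significant)
    and passes otherwise.
    """
    passed = sorted(name for name, p in p_values.items() if p >= alpha)
    failed = sorted(name for name, p in p_values.items() if p < alpha)
    return passed, failed


def trees_to_infer(passed, failed):
    """List the trees that can be estimated for one test.

    T_all is always available; T_pass and T_fail need at least one partition.
    """
    trees = ["T_all"]
    if passed:
        trees.append("T_pass")
    if failed:
        trees.append("T_fail")
    return trees
```

For example, `split_partitions({"gene1_pos1": 0.40, "gene1_pos3": 0.001})` places `gene1_pos1` in the pass list and `gene1_pos3` in the fail list, so all three trees could be inferred for that test.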

Matched-Pairs Tests of Homogeneity
The three matched-pairs tests of homogeneity that are applied to pairs of sequences are: the MPTS (matched-pairs test of symmetry), MPTMS (matched-pairs test of marginal symmetry), and MPTIS (matched-pairs test of internal symmetry). The statistics are computed on an m-by-m (m is 4 for nucleotides and 20 for amino acids) divergence matrix $D$ with elements $d_{ij}$, where $d_{ij}$ is the number of alignment sites having nucleotide (or amino acid) $i$ in the first sequence and nucleotide (or amino acid) $j$ in the second sequence.
The MPTS tests the symmetry of $D$ by computing Bowker's (1948) test statistic as the $\chi^2$ distance between $D$ and its transpose:
$$S_B^2 = \sum_{i<j} \frac{(d_{ij} - d_{ji})^2}{d_{ij} + d_{ji}},$$
where the sum is taken over the pairs with $d_{ij} + d_{ji} > 0$. A P value is then obtained by a $\chi^2$ test with $f$ degrees of freedom, where $f$ is the number of $(i, j)$ pairs with $i < j$ for which $d_{ij} + d_{ji} > 0$. A small P value (e.g., <0.05) indicates that the assumption of symmetry is rejected at that significance level, suggesting that evolution is nonstationary, nonhomogeneous, or both.
The MPTMS tests the equality of nucleotide or amino acid composition between two sequences. To do so, the MPTMS computes Stuart's test statistic
$$S_S^2 = u^T V^{-1} u$$
using the difference between the nucleotide or amino acid frequencies of the two sequences, $u$, and its variance-covariance matrix, $V$. In detail, $u$ is given by
$$u = (d_{1\cdot} - d_{\cdot 1},\; d_{2\cdot} - d_{\cdot 2},\; \ldots,\; d_{m-1,\cdot} - d_{\cdot,m-1})^T,$$
where $d_{i\cdot} = \sum_j d_{ij}$ and $d_{\cdot i} = \sum_j d_{ji}$ are the row and column sums of $D$, and $V$ is the $(m-1)$-by-$(m-1)$ matrix with elements
$$v_{ij} = \begin{cases} d_{i\cdot} + d_{\cdot i} - 2d_{ii}, & i = j, \\ -(d_{ij} + d_{ji}), & i \neq j. \end{cases}$$
A P value is obtained by a $\chi^2$ test with $m - 1$ degrees of freedom, and a small P value indicates that the stationarity assumption is rejected. Note that when $V$ is not invertible, Stuart's statistic $S_S^2$ is ill-defined and the MPTMS is not applicable.
The MPTIS uses as its test statistic the difference between Bowker's and Stuart's statistics:
$$S_I^2 = S_B^2 - S_S^2.$$
A P value is obtained by a $\chi^2$ test with $f - m + 1$ degrees of freedom, and a small P value indicates that the homogeneity assumption is rejected.
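To make the three statistics concrete, the following Python sketch computes them from a single divergence matrix. It is a minimal illustration and not the IQ-TREE implementation: the function names are hypothetical, the degrees of freedom for Stuart's statistic (m - 1) and for the internal-symmetry statistic (f - m + 1) follow the standard formulation of these tests, and the P values use a closed-form chi-square tail for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total


def solve(A, b):
    """Solve A x = b by Gaussian elimination; return None if A is singular."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            return None
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def matched_pairs_tests(D):
    """Return Bowker (S_B), Stuart (S_S) and internal (S_I) statistics for D."""
    m = len(D)
    s_b, f = 0.0, 0
    for i in range(m):
        for j in range(i + 1, m):
            n = D[i][j] + D[j][i]
            if n > 0:  # pairs with no observations contribute nothing
                s_b += (D[i][j] - D[j][i]) ** 2 / n
                f += 1
    out = {"S_B": s_b, "df_B": f, "p_B": chi2_sf(s_b, f) if f > 0 else None,
           "S_S": None, "df_S": m - 1, "p_S": None,
           "S_I": None, "df_I": f - m + 1, "p_I": None}
    row = [sum(D[i]) for i in range(m)]
    col = [sum(D[i][j] for i in range(m)) for j in range(m)]
    u = [row[i] - col[i] for i in range(m - 1)]
    V = [[row[i] + col[i] - 2 * D[i][i] if i == j else -(D[i][j] + D[j][i])
          for j in range(m - 1)] for i in range(m - 1)]
    x = solve(V, u)
    if x is None:  # V not invertible: Stuart's statistic is undefined
        return out
    s_s = sum(ui * xi for ui, xi in zip(u, x))
    out.update(S_S=s_s, p_S=chi2_sf(s_s, m - 1))
    if out["df_I"] > 0:
        out.update(S_I=s_b - s_s, p_I=chi2_sf(s_b - s_s, out["df_I"]))
    return out
```

As a small hand-checkable case, the hypothetical 3-by-3 matrix `[[10, 5, 1], [1, 10, 3], [2, 1, 10]]` gives Bowker's statistic of about 4 with 3 degrees of freedom, Stuart's statistic of about 1 with 2 degrees of freedom, and an internal-symmetry statistic of about 3 with 1 degree of freedom.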
The MPTS, MPTMS, and MPTIS test different aspects of the symmetry with which differences accumulate between pairs of sequences due to the substitution process. The MPTS is a comprehensive and sufficient test to determine whether the data comply with the SRH assumptions, but it cannot provide any information about the source of a violation. Some information on the underlying source of model violation may be obtained by performing the other two tests of symmetry: the MPTMS and the MPTIS. If the violation of the SRH assumptions stems from differences in base composition between the sequences, this should affect the marginal symmetry of the sequence pair, which can in principle be detected by the MPTMS. If the violation of the SRH assumptions stems from changes in the relative substitution rates over time, this should affect the internal symmetry of the sequence pair, which can in principle be detected by the MPTIS. However, even after performing all three tests, it is difficult to ascertain which of the three SRH assumptions is violated during the evolutionary process because the relationship between the SRH conditions and the three matched-pairs tests is neither bijective nor injective, that is, there is not a one-to-one correspondence between the three tests and violation of the three SRH conditions.
The three matched-pairs tests of homogeneity are appropriate to test for SRH assumptions as they consider the alignment on a site-by-site basis. The basic intuition that underlies these tests is that two sequences diverging under SRH conditions should accumulate differences symmetrically (e.g., both sequences are equally likely to accumulate a C to T change at a site at which both originally shared a C). This symmetry of accumulation is reflected by symmetries in the resulting divergence matrix, violations of which can be assessed statistically. However, these tests were designed to ask whether any single pair of sequences rejects the SRH conditions. To ask whether a given partition rejects SRH conditions, we developed an approach to extend the matched-pairs tests of homogeneity to accommodate data sets with more than two sequences.

Maximum Symmetry Test
In order to determine whether a given multiple sequence alignment rejects SRH conditions, we consider only the pair of taxa with the maximum divergence. To find this pair, we compute a divergence score for every pair of sequences: the sum of the off-diagonal elements of the pair's divergence matrix divided by the sum of all of its elements, that is, the proportion of compared sites at which the two sequences differ. If more than one pair shares the maximum divergence score, we randomly choose one of them. By using the most divergent sequence pair, we maximize our power to detect model violations without a priori knowledge of the underlying tree topology and the dependencies that it induces in the data. For the maximum divergent pair, we then apply the matched-pairs tests of homogeneity and calculate their χ² P values. If the obtained P value is <0.05, then we consider that the null hypothesis of SRH evolution is rejected for the corresponding partition and we add it to the D fail data set. Otherwise, we add it to the D pass data set. We denote our applications of the MPTS, MPTMS, and MPTIS to the maximum divergent pair as MaxSymTest, MaxSymTest mar, and MaxSymTest int, respectively.
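The selection of the maximum divergent pair can be sketched as follows. This is an illustrative Python outline rather than the IQ-TREE code: the function names are hypothetical, and skipping sites where either sequence has a gap or an ambiguous character is an assumption made here, since the treatment of such sites is not described above.

```python
import itertools
import random

ALPHABET = "ACGT"  # nucleotides; an amino acid alphabet would work the same way


def divergence_matrix(seq_a, seq_b, alphabet=ALPHABET):
    """Count sites with state i in seq_a and state j in seq_b.

    Assumption: sites where either sequence has a character outside the
    alphabet (gaps, ambiguity codes) are skipped.
    """
    index = {state: k for k, state in enumerate(alphabet)}
    D = [[0] * len(alphabet) for _ in alphabet]
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in index and b in index:
            D[index[a]][index[b]] += 1
    return D


def divergence_score(D):
    """Off-diagonal sum divided by the sum of all elements of D."""
    total = sum(sum(row) for row in D)
    if total == 0:
        return 0.0
    diagonal = sum(D[i][i] for i in range(len(D)))
    return (total - diagonal) / total


def max_divergent_pair(alignment, rng=None):
    """Return ((name_1, name_2), score) for the most divergent pair.

    alignment maps taxon names to aligned sequences. Ties are broken at random.
    """
    rng = rng or random.Random()
    scores = {}
    for a, b in itertools.combinations(sorted(alignment), 2):
        scores[(a, b)] = divergence_score(divergence_matrix(alignment[a], alignment[b]))
    best = max(scores.values())
    tied = [pair for pair, score in scores.items() if score == best]
    return rng.choice(tied), best
```

The divergence matrix of the returned pair is then the input to the three matched-pairs tests. Passing a seeded `random.Random` makes the tie-breaking reproducible.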

Phylogenetic Inference
We used IQ-TREE (Nguyen et al. 2015) to infer up to seven phylogenetic trees for every data set: T all (all partitions from the original data set; D all); and T pass and T fail based on the D pass and D fail data sets from each of the three tests (MaxSymTest, MaxSymTest mar, MaxSymTest int), provided that there was at least one partition in each category. We ran IQ-TREE using the default settings with the best-fit fully partitioned model (Chernomor et al. 2016), which allows each partition to have its own evolutionary model and edge-linked rate, as determined by ModelFinder (Kalyaanamoorthy et al. 2017), followed by 1,000 ultrafast bootstrap replicates (Hoang et al. 2018).

Distance between Trees
For each of the three tests (MPTS, MPTMS, MPTIS) we calculated the Normalized Path-Difference (NPD) and quartet distance (QD) (Steel and Penny 1993; Sand et al. 2014) between all three possible pairs of trees (T all vs. T pass; T all vs. T fail; and T pass vs. T fail), as long as D pass and D fail were nonempty and so T pass and T fail had been estimated. The path-difference metric (PD) is defined as the Euclidean distance between the vectors of path lengths separating every pair of taxa in the two trees (Steel and Penny 1993; Mir and Russello 2010). In this study, because we are interested only in differences between topologies, we use the variant of the PD metric that ignores branch lengths, so that the path length between two taxa is the number of edges separating them. In order to compare path distances between trees with different numbers of taxa, we normalized PD (to obtain NPD) by the mean of a null distribution of PDs generated from 10,000 random pairs of trees with the same number of taxa (Bogdanowicz et al. 2012). Thus, an NPD of 0 indicates an identical pair of trees, an NPD of 1 indicates that a pair of trees is as similar as a pair of randomly selected trees with the same number of taxa, and an NPD >1 indicates a pair of trees that are less similar than a randomly selected pair of trees with the same number of taxa. Since path differences are always nonnegative, the NPD is also guaranteed to be nonnegative.
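The topology-only path difference can be illustrated with a short Python sketch. It assumes, for illustration only, that each unrooted tree is given as a list of edges between string-named nodes, with taxa as the degree-one nodes; the function names are hypothetical, and the null distribution of PDs is taken as an input rather than simulated here.

```python
import math
from collections import defaultdict, deque


def leaf_path_lengths(edges):
    """Number of edges on the path between every pair of leaves."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    leaves = sorted(node for node in adjacency if len(adjacency[node]) == 1)
    lengths = {}
    for source in leaves:
        seen = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from this leaf
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen[neighbour] = seen[node] + 1
                    queue.append(neighbour)
        for target in leaves:
            if source < target:
                lengths[(source, target)] = seen[target]
    return lengths


def path_difference(edges_1, edges_2):
    """Euclidean distance between the two trees' path-length vectors."""
    d1, d2 = leaf_path_lengths(edges_1), leaf_path_lengths(edges_2)
    if d1.keys() != d2.keys():
        raise ValueError("trees must have the same set of taxa")
    return math.sqrt(sum((d1[pair] - d2[pair]) ** 2 for pair in d1))


def normalized_path_difference(pd, null_pds):
    """Divide a PD by the mean PD of random tree pairs (supplied by the caller)."""
    return pd / (sum(null_pds) / len(null_pds))
```

For two four-taxon trees that group A with B and A with C, respectively, four of the six taxon pairs change their path length by one edge, so the PD is 2.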
The QD metric is defined as the fraction of quartets (subsets of four taxa) that induce different subtrees between the two trees being compared. QD ranges between 0 and 1, where 0 means that two trees are identical and 1 means that they do not share any quartet subtrees. Compared with PD, QD has the advantage that its distribution is less sensitive to the underlying distribution of tree topologies (Steel and Penny 1993).
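A brute-force version of the quartet distance is easy to write for small trees. The sketch below is illustrative only and is far slower than dedicated algorithms such as that of Sand et al. (2014); the function names are hypothetical, trees are assumed to be edge lists between string-named nodes, and an unresolved quartet is treated as different from any resolved one.

```python
from collections import defaultdict, deque
from itertools import combinations


def leaf_distances(edges):
    """Return the sorted leaves and the edge-count distance between them."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    leaves = sorted(node for node in adjacency if len(adjacency[node]) == 1)
    dist = {}
    for source in leaves:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen[neighbour] = seen[node] + 1
                    queue.append(neighbour)
        for target in leaves:
            dist[(source, target)] = seen[target]
    return leaves, dist


def quartet_topology(dist, a, b, c, d):
    """Split induced on four taxa, found with the four-point condition.

    The pairing with the smallest summed distance is the induced split;
    None is returned when the quartet is unresolved (all sums equal).
    """
    sums = {
        ((a, b), (c, d)): dist[(a, b)] + dist[(c, d)],
        ((a, c), (b, d)): dist[(a, c)] + dist[(b, d)],
        ((a, d), (b, c)): dist[(a, d)] + dist[(b, c)],
    }
    smallest = min(sums.values())
    best = [split for split, total in sums.items() if total == smallest]
    return best[0] if len(best) == 1 else None


def quartet_distance(edges_1, edges_2):
    """Fraction of four-taxon subsets whose induced subtrees differ."""
    leaves_1, dist_1 = leaf_distances(edges_1)
    leaves_2, dist_2 = leaf_distances(edges_2)
    if leaves_1 != leaves_2:
        raise ValueError("trees must have the same set of taxa")
    quartets = list(combinations(leaves_1, 4))
    if not quartets:
        raise ValueError("at least four taxa are required")
    differing = sum(
        quartet_topology(dist_1, *q) != quartet_topology(dist_2, *q)
        for q in quartets
    )
    return differing / len(quartets)
```

Because every set of four taxa is enumerated, the running time grows with the fourth power of the number of taxa, so this version is only suitable for checking small examples.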

Tree Topology Tests
The NPD and the QD give us measures of the differences between pairs of trees, but they do not tell us whether the differences are statistically significant for the three data sets (D pass, D all, and D fail) derived from a given test. For example, trees that differ due to stochastic error associated with small data sets may be very different, but such differences may not be statistically significant. To assess the significance of the differences between T pass, T all, and T fail, we used the weighted Shimodaira-Hasegawa (wSH) test (Shimodaira and Hasegawa 1999; Shimodaira 2002) implemented in IQ-TREE with 1,000 RELL replicates (Kishino et al. 1990). Given the alignment (D pass), the wSH test computes a P value for each tree, where a small P value (<0.05) implies that the corresponding tree has a significantly worse likelihood than the best tree in the set of T pass, T all, and T fail. We use D pass for these tests because it is, by definition, the only data set that does not reject the underlying assumptions of the SH test. As such, we only compute wSH P values when D pass is nonempty. Thus, we performed a wSH test for each of the three MaxSymTest variants, each of which asks whether T all and/or T fail can be rejected in favor of T pass.

Correlation between Number of Substitutions and Model Violation
We hypothesized that partitions with more substitutions may be more likely to violate the SRH assumptions, since substitutions form the raw data for the matched-pairs tests of homogeneity. To assess this, we fitted a generalized linear mixed-effects model for each of the three tests using the glmer function from the lme4 package in R (Bates et al. 2015). In this model, we treat each partition as a datapoint, the number of substitutions measured for that partition as a fixed effect, and the data set from which that partition was taken as a random effect. This allows us to estimate the extent to which the number of substitutions in a partition is associated with whether the partition fails a given test of symmetry, after accounting for differences between the data sets. To calculate the R² value, we use the r.squaredGLMM function from the MuMIn package in R (Barton 2009; Nakagawa and Schielzeth 2013).

Software Implementation
We implemented a new option -symtest in IQ-TREE to perform the three MaxSymTest matched-pairs tests of symmetry. In addition, the option -symtest-remove-bad allows users to remove from the final analysis partitions that fail the MaxSymTest. One can change the removal criterion to MaxSymTest mar or MaxSymTest int via the -symtest-type MAR|INT option. In addition, the cutoff P value can be changed using the -symtest-pval NUM option, where the default value is 0.05.
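For scripted analyses, the options above can be assembled programmatically. The helper below is a hypothetical convenience function, not part of IQ-TREE; it only arranges the four option names given in this section and assumes that -symtest-remove-bad can be used on its own, which is not stated here. The remainder of the command line (executable name, alignment and partition arguments) is left to the caller.

```python
def symtest_options(remove_bad=False, test_type=None, p_value=None):
    """Build the symmetry-test part of an IQ-TREE argument list.

    remove_bad: use -symtest-remove-bad instead of -symtest (assumed to be
                sufficient on its own; combine them if your version requires it).
    test_type:  "MAR" or "INT" to change the removal criterion, or None.
    p_value:    cutoff P value, or None to keep the default of 0.05.
    """
    options = ["-symtest-remove-bad"] if remove_bad else ["-symtest"]
    if test_type is not None:
        if test_type not in ("MAR", "INT"):
            raise ValueError("test_type must be 'MAR' or 'INT'")
        options += ["-symtest-type", test_type]
    if p_value is not None:
        if not 0.0 < p_value < 1.0:
            raise ValueError("p_value must lie strictly between 0 and 1")
        options += ["-symtest-pval", str(p_value)]
    return options
```

The returned list can be appended to the rest of an argument list before it is passed to a process launcher such as `subprocess.run`.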

Reproducibility
The GitHub repository (https://github.com/roblanf/SRHtests) contains the raw data and Python and R scripts necessary to perform all analyses reported in this study.

Results
The proportion of partitions failing each test varied substantially among data sets (fig. 2), but on average, 21.8% of the partitions in each data set failed the MaxSymTest, 27.5% failed the MaxSymTest mar, and 5.1% failed the MaxSymTest int.
The fraction of failing partitions also varied with the genome type (e.g., mitochondrial, chloroplast, or nuclear) and context (e.g., protein-coding, UCE, tRNA) from which the partition was sequenced (table 2) although we note that a substantial proportion of the partitions from almost every category failed at least one of the tests (table 2).
There were no clear differences in the substitution models that were selected for the partitions that pass or fail the tests (see supplementary extended tables 1-3, Supplementary Material online). However, we note that the two most frequently selected substitution models (for 35% of the partitions) were relatively simple: K80 (Kimura 1980) and HKY (Hasegawa et al. 1985).

Model Violation Has a Large Influence on Tree Topologies
Using both MaxSymTest and MaxSymTest mar, we compared the tree inferred from each full data set (T all) to the corresponding trees estimated from the failed (T fail) and passed (T pass) partitions. Disturbingly, for each of the two tree distance metrics that we considered (NPD and QD), we find that the tree inferred from the original data set tended to be more similar to the tree estimated from the failed partitions than to the tree estimated from the passed partitions (table 3 and supplementary extended table 4, Supplementary Material online). Furthermore, the mean NPD between T pass and T fail across all 35 data sets for the MaxSymTest was 0.69, that is, the two trees are 69% as dissimilar as random pairs of trees. This suggests that violations of SRH assumptions drive large changes in tree topologies.
The results of the wSH tests (table 4) confirm that the differences between trees that we observe tend to be statistically significant. For example, when using the MaxSymTest mar, T pass is a significantly better description of the D pass data than T all in ~37% of the data sets, and better than T fail in ~89% of the data sets.
Although the number of substitutions in a partition is a highly significant (P < 2e-16) predictor of passing or failing any of the tests, the fact that it explains only about a quarter of the variation suggests that other factors, such as underlying differences in the extent to which partitions violate the SRH assumptions, are driving the remaining ~75% of the variation.

Model Violation Due to Non-SRH Evolution Affects the Inferred Relationship between Even-Toed and Odd-Toed Ungulates in the Tree of Mammals
To examine the effects of model violation in more detail, we selected two data sets for more detailed consideration.
Conflicting support for the placement of Xenacoelomorpha, the clade that contains Xenoturbella and Acoelomorpha, in the tree of life across different analyses has led to various hypotheses about the evolution of Bilateria (Cannon et al. 2016). In addition, the interordinal relationships in Laurasiatheria, especially the relationships among the orders of Fereuungulata (Perissodactyla, Cetartiodactyla, Carnivora, and Pholidota), in the tree of placental mammals are controversial (Cao et al. 1998; Zhou et al. 2012). It has been suggested that such inferences might be strongly affected by model violation and systematic error (Cao et al. 1998; Delsuc et al. 2005; Philippe et al. 2011; Tsagkogeorga et al. 2013). To assess whether data that pass or fail the MaxSymTest mar show different signals regarding the evolution of the Bilateria and the superorder Laurasiatheria, we examined in more detail the T all, T pass, and T fail trees from recent studies that explored the tree of placental mammals (Lartillot and Delsuc 2012) and the tree of all animals (Cannon et al. 2016). The mammal data set comprises 78 mammalian taxa, including 73 placental mammals, with 51 partitions representing the first, second, and third codon positions of the 17 genes (Lartillot and Delsuc 2012; see also fig. 3b and supplementary extended fig. 6, Supplementary Material online). The animal data set comprises 76 metazoan taxa, 2 choanoflagellate outgroups, 212 genes, and 424 partitions representing first and second codon positions (Cannon et al. 2016). The tree reconstructed from all of the partitions (T all) is identical to the tree reconstructed from the 381 partitions that pass the MaxSymTest (T pass), to the tree reconstructed from the partitions that fail the MaxSymTest (T fail, 43 partitions), and to the tree shown in the original paper from both DNA and amino acid data (Cannon et al. 2016), which places Xenacoelomorpha as the sister group of Nephrozoa (Deuterostomia and Protostomia) with 100% bootstrap support (supplementary extended figs. 1-3, Supplementary Material online).

Discussion
In this article, we show that model violation is prevalent and has a strong impact on tree reconstruction in many phylogenetic data sets. This impact varies substantially between different data sets and different types of partitions. The trees inferred from different groups of partitions from the same data set often have topologies that are biologically and statistically significantly different.
Our results show great heterogeneity in the extent of model violation among different data sets and partitions. This is demonstrated by the varying proportion of partitions that failed the matched-pairs tests of homogeneity in each data set and in each genomic context (codon position, rRNA, tRNA, UCE, or other) and type of genome (nuclear, mitochondrial, plastid, and virus). Model violations are most frequently observed in the third codon positions for viral, mitochondrial, and nuclear genomes, and in intergenic spacers in plastid sequences. Yet, our results affirm that non-SRH evolution is far from confined to these genomic regions. For example, in a data set of placental mammals, of the 22 partitions that failed the MaxSymTest, only 11 are third codon positions. The tree inferred from the partitions that show significant violation of the SRH conditions (T fail) differs in its topology from the tree inferred from the partitions that do not show significant violation of the SRH conditions (T pass) with respect to the interordinal relationships in Laurasiatheria (fig. 3). The tree inferred from partitions that violate the SRH conditions (T fail) is consistent with the results from the original paper in that it places Perissodactyla as a sister group to Carnivora + Pholidota (Lartillot and Delsuc 2012). However, other studies using ML analysis show Perissodactyla to be a sister group to Cetartiodactyla (Graur et al. 1997; Murphy et al. 2001; Tsagkogeorga et al. 2013; Liu et al. 2017), which is also the relationship we find in this study with the tree inferred from partitions that do not show significant violation of the SRH assumptions.
Examining the results of the two other tests (MaxSymTest mar and MaxSymTest int), we noticed that all the partitions that failed the MaxSymTest also failed the MaxSymTest mar, suggesting that those partitions violate the models mainly due to nonstationarity. Based on this observation, GC content may drive the differences between the trees inferred from all partitions and those inferred from partitions that failed neither MaxSymTest nor MaxSymTest mar. Trees with partitions that violate the models tend to group together clades with similar GC content (e.g., as in Betancur-R et al. 2013). However, it is hard to discern any clear evidence for this from examining the GC content of the clades (fig. 3). Yet, our results show that all the clades in the partitions that failed the MaxSymTest have on average a higher GC content (fig. 3).
The results of our study also provide some insight into the likely cause of model violation in the data sets we examined. Figure 2 shows that violation of marginal symmetry (assessed with MaxSymTest mar) was much more common than violation of internal symmetry (assessed with MaxSymTest int). This suggests that nonstationarity, which is associated with marginal symmetry, is likely a more common cause of systematic bias than nonhomogeneity in the data sets that we examined (see also Jayaswal et al. 2005; Ababneh et al. 2006; Song et al. 2010). Yet, the difference between the proportion of partitions that failed the MaxSymTest mar and the proportion of partitions that failed the MaxSymTest int could also be due to the higher power of the MaxSymTest mar. Either way, this result hints that the development and application of nonstationary models (Yang 1994; Roberts and Yang 1995; Yap and Speed 2005) may be an important avenue toward reducing systematic bias in future analyses. Moreover, our results show a clear preference for simple substitution models with a single transition/transversion ratio over more complex models such as the general time reversible model. This suggests that developing nonstationary models with a single parameter for the transition/transversion ratio might be sufficient to reduce systematic bias in phylogenetic analysis.
One limitation of using the tests that we propose in this article is that their power will be limited if there are few differences between the sequences being examined. Indeed, our analyses show that in our representative sample of >3,500 partitions from published data sets, roughly 25% of the variance in whether a partition passes or fails a given test can be attributed to the number of observed differences between the sequences. Nevertheless, this implies that the remaining ~75% of the variance in whether a partition passes or fails a test could be attributable to other processes, such as variation in the extent of model violation among partitions. This suggests that we should be cautiously optimistic: although a lack of power on small or slowly evolving partitions may induce some false negatives (i.e., failures to identify partitions that have evolved under non-SRH conditions), the tests we propose still have significant power to identify partitions that show evidence of model violation. It is possible that removing such partitions from phylogenetic analyses may improve the accuracy of results by reducing the overall burden of model violation on the inference of the tree topology. We hope that our implementation of these tests in the user-friendly software IQ-TREE will allow empirical phylogeneticists to continue to explore whether this is the case.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.