Phenotypic Anchoring of Gene Expression Changes during Estrogen-Induced Uterine Growth

A major challenge in the emerging field of toxicogenomics is to define the relationships between chemically induced changes in gene expression and alterations in conventional toxicologic parameters such as clinical chemistry and histopathology. We have explored these relationships in detail using the rodent uterotrophic assay as a model system. Gene expression levels, uterine weights, and histologic parameters were analyzed 1, 2, 4, 8, 24, 48, and 72 hr after exposure to the reference physiologic estrogen 17β-estradiol (E2). A multistep analysis method, involving unsupervised hierarchical clustering followed by supervised gene ontology–driven clustering, was used to define the transcriptional program associated with E2-induced uterine growth and to identify groups of genes that may drive specific histologic changes in the uterus. This revealed that uterine growth and maturation are preceded and accompanied by a complex, multistage molecular program. The program begins with the induction of genes involved in transcriptional regulation and signal transduction and is followed, sequentially, by the regulation of genes involved in protein biosynthesis, cell proliferation, and epithelial cell differentiation. Furthermore, we have identified genes with common molecular functions that may drive fluid uptake, coordinated cell division, and remodeling of luminal epithelial cells. These data define the mechanism by which an estrogen induces organ growth and tissue maturation, and demonstrate that comparison of temporal changes in gene expression and conventional toxicology end points can facilitate the phenotypic anchoring of toxicogenomic data.

represents the most abundant and diversified order of living mammals and by variations in molecular evolutionary rate and mode in some families. For example, several molecular studies have suggested paraphyly of Rodentia or Glires [1][2][3], while others (and the majority of morphological data) support the monophyly of both groups [4][5][6]. Both molecular approaches and morphological analyses have their limitations. Critics of conclusions based on molecular characters cite the limited number of sequences considered and the apparent dependence of conclusions on the analytical methodologies employed, while adherents of molecular data point out that the predominantly dental and cranial characters employed in morphological analyses are likely subject to homoplastic evolution as a result of shared ecological constraints. Some intra-ordinal phylogenetic relationships in Rodentia also remain poorly resolved. For example, while the monophyly of many classically diagnosed Rodentia groups (Hystricognathi, a grouping of Myoxidae and Sciuridae, and the Muroidea/Dipodidae group) has been supported by molecular analyses (e.g. [4,7]), relationships between these groups, as well as the placement of a few under-studied taxa (such as the Anomaluridae), are controversial. Discrepancies between molecular and other data are not restricted to tree topologies. Molecular dating approaches (typically employing mitochondrial DNA sequences) have tended to provide estimates of divergence times which conflict with inferences drawn from the fossil record. More recently the availability of relaxed and local molecular clock approaches [8], which allow evolutionary rates to differ across the tree, has allowed some reconciliation of molecular and fossil-derived divergence time estimates within Euarchontoglires [9,10].
In the current study, we have sequenced and analysed the complete mitochondrial genome of Anomalurus sp. as a representative of the Anomaluridae, a family of flying squirrel-like rodents which possess two rows of pointed, raised scales on the undersides of their tails and whose cranial anatomy does not indicate a close relationship with sciurid flying squirrels. Indeed, the phylogenetic affinities of the Anomaluridae, which consists of three extant genera and whose geographic distribution is currently restricted to central Africa, have remained enigmatic owing both to the aforementioned weakness of morphological characters in the systematics of Rodentia and a relative lack of available molecular sequence data (currently restricted to five nuclear and two mitochondrial gene sequences). Previous studies based on molecular data have suggested alternative phylogenetic placements for Anomalurus, while weakly supporting various relationships between the Hystricognathi, the Sciuridae, and the Muroidea/Dipodidae group [11][12][13], while morphological classifications have suggested almost all possible placements for Anomalurus (reviewed in [14]).
We have performed extensive phylogenetic analyses of the protein coding regions of all available Primates, Lagomorpha and Rodentia mitochondrial genomes at both nucleotide and inferred amino acid sequence levels. We show that the sequence data suggest a phylogenetic affinity between Anomalurus and the Hystricognathi. However, statistical tests of alternative tree topologies do not exclude other phylogenetic hypotheses, either for the placement of Anomalurus sp. or for higher-level relationships within Rodentia. These observations are at least partially explained by a Bayesian relaxed molecular dating approach which generates estimates of divergence times within Euarchontoglires that are compatible with fossil and biogeographical data and suggest that a rapid evolutionary radiation within Glires occurred around 60 million years ago.

The mitochondrial genome of Anomalurus
The mtDNA of Anomalurus is 16,923 bp long and presents the common vertebrate gene organization. The entire genome sequence has been submitted to the EMBL sequence database under accession number AM_159537. Start and end positions of all protein coding, tRNA and rRNA genes were easily identifiable through homology searches using characterized mammalian mitochondrial protein sequences as probes. The control region (D-loop containing region) is 1439 bp long and shows the typical tripartite structure observed in mammals with the central conserved domain (15,062) and the CSB domain (16,923) both identifiable. Of the two conserved blocks known to be located in the ETAS domain, ETAS1 and ETAS2 [15], only a 40 bp long conserved sequence corresponding to ETAS1 can be identified (15641-15681). Indeed, only this element is conserved across Rodentia [16]. The CSB domain includes all three known conserved sequence blocks (CSB1, CSB2 and CSB3), and contains a tandem repeat array made up of a 40-fold repetition of an 8 bp long monomer (CGTACAGC).
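A tandem repeat array of the kind described above can be located with a simple scan. The following is a minimal sketch only: the function name is hypothetical, the toy sequence is synthetic (it is not the Anomalurus control region), and the monomer is assumed to read CGTACAGC.

```python
def longest_tandem_run(sequence, monomer):
    """Return (start, copies) for the longest run of back-to-back monomer copies.

    start is a 0-based index into sequence; (-1, 0) means the monomer is absent.
    """
    k = len(monomer)
    best_start, best_copies = -1, 0
    for start in range(len(sequence) - k + 1):
        # Count how many consecutive copies begin at this position.
        copies = 0
        pos = start
        while sequence.startswith(monomer, pos):
            copies += 1
            pos += k
        if copies > best_copies:
            best_start, best_copies = start, copies
    return best_start, best_copies


# Synthetic example: 40 copies of an 8 bp monomer span 40 * 8 = 320 bp.
toy_region = "ACGT" + "CGTACAGC" * 40 + "TTGA"
start, copies = longest_tandem_run(toy_region, "CGTACAGC")
print(start, copies, copies * len("CGTACAGC"))
```

The scan is quadratic in the worst case, which is acceptable for a control region of roughly 1.4 kb; dedicated repeat finders would be preferable for genome-scale input.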

Phylogenetic analyses
While a concatenated dataset of unambiguously aligned regions of H-strand protein sequences (all protein-coding genes apart from NAD6) passed the compositional homogeneity test implemented in TREE-PUZZLE [17], many corresponding DNA sequences failed the equivalent test. Compositional heterogeneity was reduced by the removal of third codon positions from the DNA dataset, although several sequences still failed the chi square test. We have previously shown that first position synonymous leucine codon usage (Leu-SynP1) varies extensively between mitochondrial genomes and is a source of compositional heterogeneity [6]. Accordingly, we removed Leu-SynP1 codons from the alignment, resulting in a dataset where only sequences from the Cercopithecinae (Papio, Macaca, Chlorocebus) failed the test of compositional homogeneity. Bearing this result in mind, phylogenetic analyses at the DNA level were performed both in the presence and absence of sequences from Primates. Fig. 1 shows the Bayesian consensus tree of Euarchontoglires relationships inferred from protein sequences (with Bayesian Posterior Probabilities (PP) and distance bootstrap (BP) values associated with branches). Relationships recovered within Anthropoidea all have high bootstrap and posterior support and are uncontroversial. However, Primates emerge as a paraphyletic group with a clade defined by Tarsius, Nycticebus and Lemur diverging before the Dermoptera. Both the placement of Tarsius as sister to Nycticebus and Lemur, and the paraphyly of primates have been observed in other analyses of mitochondrial sequences (e.g. [18,6]). Scandentia (represented by the tree-shrew Tupaia) emerge basal to the Primates-Glires split.
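The compositional screening described at the start of this section can be illustrated with a short sketch. It is written in the spirit of a chi-square composition test and is not a reimplementation of TREE-PUZZLE: the function names are hypothetical, expected frequencies are simply pooled over the alignment, and the sequences are synthetic. The critical value 7.815 is the 5% point of a chi-square distribution with 3 degrees of freedom (four nucleotides).

```python
from collections import Counter

CHI2_CRIT_DF3_5PCT = 7.815  # 5% critical value, chi-square with 3 d.f.


def drop_third_positions(coding_seq):
    """Keep codon positions 1 and 2 of an in-frame coding sequence."""
    return "".join(b for i, b in enumerate(coding_seq) if i % 3 != 2)


def composition_chi2(seq, expected_freqs):
    """Chi-square statistic of one sequence against expected base frequencies."""
    counts = Counter(seq)
    n = sum(counts[b] for b in expected_freqs)
    stat = 0.0
    for base, freq in expected_freqs.items():
        expected = n * freq
        if expected > 0:  # skip bases absent from the whole alignment
            stat += (counts[base] - expected) ** 2 / expected
    return stat


def failing_sequences(alignment):
    """Names of sequences whose composition deviates at the 5% level."""
    pooled = Counter()
    for seq in alignment.values():
        pooled.update(b for b in seq if b in "ACGT")
    total = sum(pooled.values())
    freqs = {b: pooled[b] / total for b in "ACGT"}
    return [name for name, seq in alignment.items()
            if composition_chi2(seq, freqs) > CHI2_CRIT_DF3_5PCT]


# Synthetic alignment: five balanced sequences and one G+C-rich outlier.
toy = {"taxon%d" % i: "ACGT" * 30 for i in range(5)}
toy["gc_rich"] = "GGCC" * 30
print(failing_sequences(toy))
```

In practice the filter would be applied after each reduction step (third positions removed, then Leu-SynP1 codons removed) to see which taxa still fail.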
Given the controversy surrounding apparent discrepancies between mitochondrial and nuclear data with respect to these relationships, we conducted Approximately Unbiased (AU) tests [19] of competing topologies representing all plausible inter-relationships of Primates, Dermoptera, and Scandentia. At the 5% confidence level, the protein data exclude monophyletic primates regardless of whether Tarsius is sister to Anthropoidea or to Lemur and Nycticebus, while the DNA data permit monophyly of primates with Tarsius placed as sister to Anthropoidea (P = 0.059) or to Lemur and Nycticebus (P = 0.178). The protein (P = 0.062), but not the DNA sequences allow Scandentia to emerge immediately basal to Primates/Dermoptera (see discussion) but both datasets exclude a sister relationship between Scandentia and Dermoptera.
Notably, and in accord with our previous analyses [6], monophyly of Rodentia is supported with high Bayesian posterior support for protein-based analyses. However, protein distance-based bootstrap support for this partition is low (30%). Inspection of bootstrap partitions reveals that decay in support of Rodentia monophyly is caused by the sequence of Anomalurus and to a lesser extent those of Thryonomys and Cavia (Hystricognathi) that have a tendency to cluster with the outgroup sequences. Likewise, the monophyly of Glires receives high posterior support but does not emerge on the bootstrap consensus tree, owing to a tendency of Lagomorpha to emerge basal to the Rodentia/(Primates + Dermoptera) divergence in some bootstrap datasets. In accordance with other molecular studies [11,12], Bayesian analyses of protein sequences strongly support the monophyly of Hystricognathi, the monophyly of Myoxidae and Sciuridae and the monophyly of the Muroidea/Dipodidae grouping (all with BP = 100, PP = 1.0). In both Bayesian and distance bootstrap trees, Anomalurus emerges as sister of the Hystricognathi. Both methods suggest that the Hystricognathi/Anomalurus group is sister to a clade composed of Myoxidae/Sciuridae and the Muroidea/Dipodidae cluster. However, both the position of Anomalurus and the interrelationships between super-families within Rodentia receive only moderate posterior or bootstrap support. Bayesian analysis of the DNA data in the presence of the Cercopithecinae sequences yielded an identical tree topology apart from the position of Anomalurus, which emerged as a poorly supported basal branch in the Primates/Dermoptera clade, while Bayesian analyses of Glires, Scandentia and outgroup sequences alone generated an identical topology for Glires as the protein sequences (not shown).
Figure 1. Relationships within Euarchontoglires inferred from Bayesian analysis of 3519 unambiguously aligned amino acids encoded by H-strand mitochondrial genes.
In order to evaluate the degree of support for alternative hypotheses of relationships between Rodentia superfamilies, we generated a series of topologies where the constitution and internal topology of uncontroversial clades (outgroup, Primates/Dermoptera, Lagomorpha, Muroidea/Dipodidae, Myoxidae/Sciuridae and Hystricognathi) were constrained, but where inter-relationships of these groups and the placement of Anomalurus were varied. These topologies were tested under the AU test of alternative tree topologies. Selected results are shown in Table 1. In brief, both the protein and DNA data reject (at the 5% level) all topologies where Anomalurus is placed as sister to the Myoxidae/Sciuridae group, while all topologies depicting Anomalurus as sister to either Hystricognathi or the Muroidea/Dipodidae grouping are accepted, as are various topologies placing Anomalurus either as the basal divergence among Rodentia or as sister to clades of Hystricognathi + Muroidea/Dipodidae, Hystricognathi + Myoxidae/Sciuridae or Muroidea/Dipodidae + Myoxidae/Sciuridae. Rodentia monophyly is moderately supported in that we were unable to identify acceptable topologies depicting Rodentia as non-monophyletic apart from those suggested by Bayesian analyses of the DNA data (Anomalurus as a basal divergence among Primates); topologies where Anomalurus emerges as the basal branch of Glires were rejected by the protein data (P = 0.048) but were accepted by the DNA data. However, many topologies depicting non-monophyly of Glires (Lagomorpha divergence prior to the Rodentia/(Primate/Dermoptera) split, or Lagomorpha as sister to Tupaia, as suggested by other authors using mitochondrial sequence data [20,21]) were accepted by the AU test.

Slow-fast method
Table 1 (legend). In all cases, the local topology of clades found in the Bayesian tree of protein sequences was retained and interrelationships between the sequences/clades specified were rearranged. Branch lengths and site likelihoods were optimized using PAML and the AU test implemented in the software CONSEL was applied. Topologies excluded by the AU test are marked with an asterisk.
Given the apparent lack of resolution of Rodentia infraordinal relationships afforded by the mitochondrial sequence data and the tendency of the Anomalurus sequence to emerge in unexpected positions, especially for the DNA data, we wished to investigate whether undetected compositional or other types of systematic (or stochastic) biases manifested in faster evolving sites might be responsible for the apparent decay in the phylogenetic signal. Accordingly we have used a variation on the "Slow-Fast" phylogenetic analysis methodology [22] where faster evolving sites are progressively removed from the protein alignment and bootstrap partitions recalculated.
In the presence of misleading signal derived from homoplasy or biases at fast evolving sites, we might expect support for correct basal splits to increase as signal from slower evolving sites begins to predominate. We have used the SiteVarProt methodology [23] to estimate site-specific relative amino-acid substitution rates. We removed the fastest evolving 5% or 25% of sites (beyond this level, the preponderance of constant and autapomorphic sites tends to lead to the generation of very poorly resolved and supported topologies). Distance bootstrap analyses were performed on these datasets and support for key partitions was compared with support derived from the complete dataset (see Table 2). The set with the 5% of fastest evolving sites removed generated an identical bootstrap topology to the full set, with bootstrap support values at all nodes within ±10% of those obtained from the original data. However, when the most variable 25% of sites were removed, Anomalurus was recovered as sister to the Muroidea/Dipodidae clade, albeit with low (30%) bootstrap support, while bootstrap support for the partition Anomalurus+Hystricognathi fell to 28%. Additionally, support for Muroidea+Dipodidae+Sciuridae+Myoxidae fell to 13% while Hystricognathi emerged as sister to the Anomalurus/Muroidea/Dipodidae clade (23% BP, not shown in Table 2). Strikingly, and in accordance with the AU tests, Anomalurus never emerged as monophyletic with Sciuridae and Myoxidae, regardless of which set of sites was analysed.
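The site-removal step of the Slow-Fast procedure can be sketched as follows. This is an illustration under stated assumptions: per-site relative rates are taken as given (in the study they come from SiteVarProt, which is not reproduced here), the function name is hypothetical, and the alignment is a toy example.

```python
def remove_fastest_sites(alignment, site_rates, fraction):
    """Return a copy of the alignment without the fastest-evolving sites.

    alignment  : dict mapping taxon name to aligned sequence (equal lengths)
    site_rates : one relative substitution rate per alignment column
    fraction   : share of columns to discard, e.g. 0.05 or 0.25
    """
    n_sites = len(site_rates)
    n_remove = int(n_sites * fraction)
    # Rank column indices from fastest to slowest and discard the top share.
    ranked = sorted(range(n_sites), key=lambda i: site_rates[i], reverse=True)
    removed = set(ranked[:n_remove])
    kept = [i for i in range(n_sites) if i not in removed]
    return {name: "".join(seq[i] for i in kept)
            for name, seq in alignment.items()}


# Toy data: 8 columns, two of which (indices 1 and 4) evolve fastest.
toy_alignment = {"taxonA": "MKTLSVAG", "taxonB": "MRTLAVAG"}
toy_rates = [0.1, 2.0, 0.3, 0.0, 1.5, 0.2, 0.0, 0.4]
for share in (0.05, 0.25):
    print(share, remove_fastest_sites(toy_alignment, toy_rates, share))
```

Each reduced alignment would then be passed to the same distance bootstrap analysis as the full dataset, so that support for the partitions of interest can be compared across removal levels.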

Molecular dating
We employed the Bayesian protein phylogeny for use in a Bayesian relaxed clock dating approach [24] to estimate divergence times between major lineages. Relaxed clock methods allow substitution rates to vary over the tree and thus do not rely on strict clock-like evolution of the sequences under consideration. Following the method of Amer and Kumazawa [25] we have incorporated the mtREV24 + gamma model into the MULTIDISTRIBUTE software in order to use a substitution model developed with mitochondrial protein sequences. We also estimated divergence times using the DNA codon position 1 and 2 data under the F84 + gamma substitution model. For calibration points we specified that: 1) the Rodentia/Lagomorpha divergence should have occurred between 61 and 90 Million Years Ago (MYA) [26,27], 2) the basal divergence in the sampled Lagomorpha should have occurred between 35 and 40 MYA [28] and 3) the divergence of Pongo should have occurred between 13 and 18 MYA [29]. The inferred times of some key divergences (with associated errors) are shown in  [13,30]. The estimates presented were generated using prior assumptions that the mean and standard error of the probability distribution describing the substitution rate at the root of the tree (a parameter required by the MULTIDIVTIME software) were equal to the mean of the substitution rate over the tree (assuming that Euarchontoglires is 75 million years old). However, the results of the Bayesian dating were extremely robust to the value specified for this parameter. Repeated runs with differing values yielded extremely similar estimates of divergence times (not shown).

Evolutionary rates
The Bayesian dating analysis also permits estimates of variation in evolutionary rates across the tree. Evolutionary rates of proteins estimated for branches leading to some nodes of interest are shown in Table 3. Like the estimates of divergence dates, the rate estimates were rather robust to the parameterization of the distribution of evolutionary rates at the base of the tree. The amino acid substitution rate inferred for the divergence between Primates/Dermoptera and Glires (0.13%/MY) remains relatively constant until the divergence of the Tarsius+Nycticebus+Lemur clade (0.15%/MY), after which a sharp rise in substitution rates is observed (0.24%/MY at the divergence of Cynocephalus and 0.35%/MY at the divergence between Old World and New World monkeys). Rates remain high (or continue to increase) within the Old World monkeys, but within the Hominoidea there is a notable decrease in substitution rates (0.26%/MY at the divergences of both Homo and Gorilla). With respect to the Glires, there is a slight tendency to increased amino acid substitution rates in the Muroidea, Hystricognathi and Anomalurus (0.23-0.26%/MY), while rates remain relatively stable in Lagomorpha and Sciurus. We wished to investigate whether the observed lineage-specific shifts in amino acid substitution rates were a general property of mitochondrial protein coding genes or whether particular genes (or respiratory complexes) have undergone changes in evolutionary rates (a scenario that might indicate adaptive or functional changes). Accordingly, we used estimates of site-specific relative variability generated by the SiteVarProt methodology. For each major lineage in our dataset, we both counted the number of amino acid positions that are perfectly conserved within the group and calculated mean gene-specific normalized relative substitution rates for all variable sites (Table 4).
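The two per-lineage summaries described above (the count of perfectly conserved positions and the mean relative rate over variable positions) can be sketched for one gene and one group of taxa. This is a simplified illustration: the function name and toy data are hypothetical, gaps are not treated specially, and the normalization applied in the study is not specified in the text, so none is attempted here.

```python
def conserved_and_mean_variable_rate(group_seqs, site_rates):
    """Summarise one gene alignment for one taxonomic group.

    group_seqs : list of aligned amino acid sequences from the group
    site_rates : one relative substitution rate per alignment column
    Returns (number of perfectly conserved columns,
             mean rate over the remaining, variable columns).
    """
    columns = list(zip(*group_seqs))
    is_conserved = [len(set(col)) == 1 for col in columns]
    n_conserved = sum(is_conserved)
    variable_rates = [rate for rate, conserved in zip(site_rates, is_conserved)
                      if not conserved]
    mean_rate = (sum(variable_rates) / len(variable_rates)
                 if variable_rates else 0.0)
    return n_conserved, mean_rate


# Toy group of three sequences; columns 2 and 3 vary within the group.
toy_group = ["MKTL", "MKSL", "MRTL"]
toy_rates = [0.0, 1.0, 2.0, 0.0]
print(conserved_and_mean_variable_rate(toy_group, toy_rates))
```

Running the same function on, say, a primate group and a rodent group for each gene would yield the kind of paired comparison that Table 4 summarises.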
Intriguingly, our data show that for proteins that are part of the cytochrome-c oxidase complex (COX1, COX2, COX3) and the cytochrome b protein, the normalized mean relative variability (of variable sites) is higher in Primates/Dermoptera than in Rodentia, while the number of perfectly conserved sites is lower in Primates/Dermoptera. These observations are highly consistent with previous studies that have identified accelerated rates of evolution of nuclear and mitochondrially encoded components of the cytochrome c oxidase complex and cytochrome b in some Primates (eg [31][32][33]). Conversely, for proteins that are components of complex I (NADH dehydrogenase complex) the mean relative variability of variable sites is somewhat lower in Primates/Dermoptera than in Rodentia while the number of perfectly conserved sites tends to be higher in Rodentia.

Protein vs. DNA sequences
The relative merits of performing phylogenetic analyses on nucleotide or corresponding amino acid sequences have been discussed extensively (e.g. [34]). In brief, while DNA sequences allow the complete parameterization of substitution models through the use of the data under examination, amino acid substitution models typically allow only amino acid frequencies to be adjusted according to the available data. On the other hand, the degree of substitutional saturation and homoplastic character evolution is expected to be higher among nucleotide sequences due to the restricted number of character states, and mild to moderate compositional biases in DNA sequences are not expected to cause extensive perturbation of amino acid composition owing to the degeneracy of the genetic code (but see [35]). It is clearly desirable that DNA and associated inferred amino acid sequences should generate congruent phylogenetic hypotheses; in the absence of such congruent results it is necessary to assess whether inferences derived from DNA and protein sequences are statistically incongruent and, if so, attempt to explain observed differences in terms of characteristics of the data. In the current investigation, neither dataset discriminates between the two Bayesian consensus trees according to the approximately unbiased test. It is of some concern that the Bayesian consensus tree generated from the DNA data recovers Anomalurus not within Rodentia but among Primates. However, we note that the DNA dataset considered includes several primate sequences that fail the chi square test of compositional homogeneity. When Primates are excluded, Anomalurus is recovered in an identical position to the amino acid analyses (as sister to the Hystricognathi). Furthermore, while distance bootstrap analyses of protein sequences support, albeit weakly, the monophyly of Rodentia (Fig. 1), equivalent analyses performed on DNA sequences yield poorly supported consensus trees depicting non-monophyletic Glires, Rodentia and Primates/Dermoptera (not shown). Finally, no potential amino-acid synapomorphies link Anomalurus with the Primates/Dermoptera clade (while potential synapomorphies with the Hystricognathi and with the Muroidea/Dipodidae clade have been identified). We therefore consider results derived from protein sequences to be more reliable in this case, although we suggest that there is no significant incongruence between inferences derived from the protein and DNA data.

The phylogeny of Euarchontoglires and the evolutionary placement of Anomalurus
Bayesian and distance bootstrap analyses of concatenated first and second codon positions and inferred protein sequences of Rodentia, Primates/Dermoptera, Scandentia and Lagomorpha generated well-supported hypotheses of relationships within Primates/Dermoptera. In accordance with other analyses of mitochondrial sequences [21,6], we recover Primates as paraphyletic with Dermoptera emerging as sister-group to the Anthropoidea with high bootstrap and posterior support. Our protein, but not DNA, data reject monophyly of primates as assessed by the AU test of competing tree topologies. Analyses of concatenated nuclear (or nuclear and mitochondrial) data usually (e.g. [36][37][38]), but not always [39], prefer the traditional hypothesis of Primates monophyly. However, support for the position of Dermoptera as sister to Scandentia is often scarce and/or dependent on the analytical method employed [36]. The positioning of Tarsius as sister to Lemur and Nycticebus is unexpected in the light of morphological and nuclear data, but consistent with other analyses of mt data (for discussion see [18]) and some analyses of nuclear data [39,36]. The evolutionary affinities of Scandentia (represented in our analyses by Tupaia) have not been satisfactorily resolved by molecular data (see [40,20,39,36,37,41,21,38] and references therein) although current thinking tends to favour a sister relationship with Dermoptera in a clade which emerges basal to the primates. The analyses of mt protein data presented here are in accord with our previous analyses of mt DNA data [6] in suggesting that Tupaia represents the basal divergence of Euarchontoglires rather than constituting the sister taxon of Lagomorpha, Primates, Primates/Dermoptera or Dermoptera. However, where tests of competing tree topologies have been performed, the position of Scandentia has remained unclear [36].
Thus, while our mitochondrial dataset refutes what must be considered a weakly supported nuclear consensus for relationships between Dermoptera, Scandentia and Primates, it is not clear how inconsistent the nuclear data may be with the mitochondrially-derived hypothesis. Importantly, the question of Dermoptera/Primates relationships at least has recently been addressed through examination of the distribution of Short Interspersed Nuclear Elements in these organisms [42]. These data should be free of many of the problems associated with analysis of molecular sequences (substitutional saturation, model choice, compositional bias, etc.) and strongly support the traditional hypothesis of Primate monophyly, suggesting that available mitochondrial and (to a lesser extent) nuclear sequence data have failed to correctly resolve Primates/Dermoptera relationships.
With respect to relationships within Glires, inferred protein sequences suggested a specific relationship between Anomalurus and the Hystricognathi. However, first and second codon positions of the gene sequences tended to place Anomalurus among the basal divergences in the Primates/Dermoptera clade. This placement was not robustly supported and indeed Bayesian analyses of nucleotide sequences in the absence of Primates (some of whose sequences failed tests of compositional homogeneity) favoured the same placement as suggested by the protein sequences. While we are not aware of published hypotheses suggesting this relationship, it should be noted that a relationship between Anomaluridae and Ctenodactylidae has been proposed on the basis of morphological features [14]. Recent molecular and many classical studies have suggested an affinity between Hystricognathi and Ctenodactylidae (e.g. [11,7,13]). Unfortunately, at the present time, no complete mitochondrial genome sequences from Ctenodactylidae are available. Some molecular data have suggested that the Anomaluridae are specifically related to the Pedetidae (Spring Hares) [12]. Recent analyses that have included sequences from either of these taxa have tended to place these organisms as weakly supported basal branches in a clade containing Dipodidae, Muridae, Geomyidae and Heteromyidae (sister to the Dipodidae/Muroidea clade in our sampling) [36,11,7,43]. Our analyses of constrained tree topologies recovered this placement as a viable alternative to our preferred hypothesis of a relationship between Anomalurus and Hystricognathi (and presumably Ctenodactylidae).
With respect to relationships between other families/superfamilies within Rodentia, we consistently recover previously proposed relationships between Dipodidae and Muroidea and between Sciuroidea and Gliridae with high bootstrap and posterior probability support. Our analyses, however, like those based on other genes or gene concatenations [39,36,37,41,11,12,38,7,43], fail to unambiguously resolve relationships between these groups and the Hystricognathi in the sense that high posterior probabilities for higher order relationships within Rodentia are often accompanied by moderate or low bootstrap support and valid probabilistic tests of alternative topologies have seldom been presented. While our data and analyses prefer the hypothesis that the basal divergence within Rodentia consists of Hystricognathi (and by inference Ctenodactylidae) + Anomaluridae, leaving the Dipodidae/Muroidea and Gliridae/Sciuridae clades as sisters to each other, our data do not exclude a multitude of other evolutionary scenarios.
The Slow-Fast method, in which faster evolving sites are progressively removed from the dataset and changes in support for nodes of interest are examined, was employed to investigate whether sites supporting different hypotheses of relationships could be partitioned according to evolutionary rates. Exclusion of fast evolving sites has little impact on the resolution of either the position of Anomalurus (when the 25% of sites inferred to be fastest evolving were removed, we recover Anomalurus as a weakly supported sister to the Muridae/Dipodidae clade in accordance with constrained topologies discussed previously) or other relationships within Rodentia, suggesting that "noise" from fast evolving sites is not obscuring phylogenetic signal present in slower evolving sites. We interpret this finding as an indication that phylogenetic signal for higher-order relationships within Rodentia is rather scarce. In accordance with this proposal, we observe that the inferred amino acid sequences derived from Anomalurus (3519 unambiguously aligned amino acids) share only three potential synapomorphies with the Hystricognathi and three with the Muroidea/Dipodidae clade. There are no potential synapomorphies linking all Rodentia, or associating Anomalurus with Lagomorpha, the Myoxidae/Sciuridae clade, Primates/Dermoptera, or any possible sister group set of Rodentia families.

Molecular dating of divergences in Euarchontoglires
Molecular dating of divergences within Euarchontoglires based on mitochondrial sequence data and a global molecular clock has historically yielded estimates in conflict with the fossil record, particularly with respect to Rodentia (e.g. [44,45]). More recently several approaches that allow substitution rates to vary over the tree have been developed (for review see [8]). [...] These findings are notable as they highlight a fundamental problem in the resolution of higher order relationships within Rodentia. Accounting for the 5% error intervals of our dating estimates, the divergence of Rodentia from Lagomorpha, the divergence of Hystricognathi from other Rodentia and the divergence of Sciuridae/Myoxidae and Muroidea/Dipodidae potentially occurred within 3.1 million years of each other around 60 million years ago, leaving relatively little time for the evolution of lineage-specific characters (molecular or morphological) which may be used in the reconstruction of phylogenetic affinities. Conversely, the relatively long subsequent independent evolutionary history of lineages considered here, in conjunction with the limited available taxonomic sampling, is likely to have led to extensive symplesiomorphy and homoplasy, further complicating phylogenetic reconstruction.

Conclusion
The use of mitochondrial sequences for the investigation of even relatively shallow phylogenetic relationships within Rodentia has recently been questioned [47,48]. Indeed it has long been suspected that fast evolutionary rates and compositional biases can lead to misleading phylogenetic signal and poorly supported splits for deeper relationships. While we agree that saturation and compositional biases present a major problem for the reconstruction of ancient divergences, we stress that conclusions from mitochondrial sequences regarding divergence times are consistent with fossil data. Indeed recent studies using individual and concatenated nuclear or nuclear and mitochondrial gene sequences also fail to robustly resolve higher-level relationships within Rodentia [36,37,11,12,38]. Given the aforementioned considerations, we suggest that difficulties in the reconstruction of correct and unambiguous higher-order relationships within Glires do not reflect limitations of either nuclear or mitochondrial sequence data, but are likely to be inherent consequences of a rapid evolutionary radiation which occurred around 60 million years ago.

DNA extraction, amplification and sequencing
Mitochondrial DNA was extracted from 4.5 g of frozen liver of an Anomalurus sp. (scaly-tailed flying squirrel) specimen captured in central Africa (specimen provided by F. Catzeflis), according to previously described methods for mammalian species [49].
The entire mitochondrial genome was amplified using the Polymerase Chain Reaction with eight pairs of heterologous primers designed on the basis of highly conserved regions of the complete mitochondrial sequences of several representative mammalian species [6]. Amplifications were performed in 100 μl reaction volumes containing 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 0.001% (w/v) gelatin, 0.25 mM of each dNTP, 0.5 μM of each primer, and 2.5 U of TaqGold polymerase (Roche Applied Science). PCR cycling conditions were 10 min of hot start at 95°C for the activation of the enzyme, followed by 30-35 amplification cycles (45 s of denaturation at 95°C, 45 s of primer annealing at temperatures from 55 to 65°C, and 2 to 3 min of extension at 72°C) followed by a final cycle of 7 min at 72°C. Single amplification products with lengths between 1.2 and 3.5 kb were consistently obtained and produced overlapping fragments that covered the whole mitochondrial genome. PCR products were purified using the Amicon Microcon-PCR Centrifugal Filter Devices (Millipore) following the manufacturer's instructions. Fragments were sequenced either directly or after cloning in the pGEM-T Easy vector (Promega). Sequencing reactions were performed using the Thermo Sequenase Cy5.5 Dye Terminator Cycle Sequencing Kit (Amersham Pharmacia Biotech) in 8 μl reaction volumes and following the manufacturer's instructions. After purification, DNA sequences were analyzed on a Seq4×4 automated sequencer (Amersham Pharmacia Biotech). A double-strand primer walking strategy provided contiguous sequence information for both strands in all fragments. All overlapping regions between amplified fragments matched perfectly and all predicted open reading frames followed the vertebrate mitochondrial genetic code, leading us to exclude the possibility that we had amplified fragments of mitochondrial genome that had been inserted into the nuclear genome.
The mtDNA sequence of the flying squirrel Anomalurus sp. has a G+C content of 46.16% and has been deposited in the EMBL database under the accession number AM_159537.
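As an illustration of how a G+C content figure such as the one above can be obtained, the following minimal Python sketch computes the percentage of G and C among unambiguous bases. The function name `gc_content` and the toy sequence are hypothetical; how ambiguous bases were treated in the original calculation is not stated, so ignoring them here is an assumption.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C percentage over unambiguous bases (A, C, G, T).

    Assumption: ambiguity codes such as N are ignored rather than counted.
    """
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    toy = "ATGCGCNNTA"  # toy sequence, not real data
    print(f"G+C content: {gc_content(toy):.2f}%")
```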

Phylogenetic analyses
Conceptually translated coding sequences of H-strand genes derived from all available complete mitochondrial genomes of Primates, Dermoptera, Scandentia, Rodentia and Lagomorpha species were aligned using the program MUSCLE [50] [see Additional file 1]. Sequences from the Laurasiatheria species sheep, dog and mole were included as outgroups (a total of 41 taxa, see table included in supplementary materials). Alignments were manually adjusted and DNA sequences were reverse aligned to correspond with the protein alignments. Regions of low alignment quality were identified using the program G-Blocks [51] and excluded from subsequent analyses. Protein sequences and the ungapped first and second codon positions of DNA sequences (after exclusion of codons with first-position leucine synonymous substitutions (Leu-SynP1)) were included in concatenated datasets for phylogenetic analyses (5358 nucleotides, 3519 amino acids).
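The codon-position filtering described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function names are hypothetical, and the Leu-SynP1 rule is interpreted here as dropping any codon column in which both TTR and CTN leucine codons occur (that is, where a synonymous first-position leucine change is observable), which is an assumption.

```python
from typing import Dict, List

# Leucine codons of the TTR family in the vertebrate mitochondrial code.
LEU_TTR = {"TTA", "TTG"}


def has_leu_p1_synonymy(codons: List[str]) -> bool:
    """True if the column mixes TTR and CTN leucine codons (assumed rule)."""
    has_ttr = any(c in LEU_TTR for c in codons)
    has_ctn = any(len(c) == 3 and c.startswith("CT") for c in codons)
    return has_ttr and has_ctn


def first_second_positions(alignment: Dict[str, str]) -> Dict[str, str]:
    """Keep codon positions 1 and 2 from ungapped, non-Leu-SynP1 codon columns."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n = lengths.pop()
    if n % 3 != 0:
        raise ValueError("alignment length must be a multiple of three")

    kept: Dict[str, List[str]] = {name: [] for name in alignment}
    for start in range(0, n, 3):
        codons = [seq[start:start + 3].upper() for seq in alignment.values()]
        if any("-" in c for c in codons):
            continue  # skip codon columns containing gaps
        if has_leu_p1_synonymy(codons):
            continue  # skip columns with first-position leucine synonymy
        for name, codon in zip(alignment, codons):
            kept[name].append(codon[:2])  # drop the third codon position
    return {name: "".join(parts) for name, parts in kept.items()}


if __name__ == "__main__":
    toy = {"taxonA": "ATGTTACCC---", "taxonB": "ATGCTACCAGGG"}  # toy data
    print(first_second_positions(toy))
```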
Phylogenetic analyses were carried out using the program MrBayes 3.1 [52] using the General-Time-Reversible (GTR) substitution model for nucleotide sequences and "mtrev24" model for protein sequences, in both cases with the invariant site plus gamma options (eight categories). Two parallel analyses, each composed of one cold and three incrementally heated chains were run for 2,000,000 generations. Trees were sampled every 50 generations and 20,000 trees were discarded as "burn-in" (sufficient to allow convergence according to the tests indicated by the program).
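The sampling arithmetic implied by these settings can be checked with a short calculation. The sketch below is illustrative only: variable names are hypothetical, the tree saved at generation zero is not counted, and whether the 20,000 discarded trees refer to each run or to both runs combined is not stated in the text, so both readings are shown.

```python
def samples_per_run(generations: int, sample_every: int) -> int:
    """Number of trees sampled in one run (generation 0 not counted)."""
    return generations // sample_every


generations = 2_000_000   # generations per run, from the text
sample_every = 50         # sampling interval, from the text
burn_in = 20_000          # discarded trees, from the text
n_runs = 2                # two parallel analyses, from the text

per_run = samples_per_run(generations, sample_every)
total = per_run * n_runs

# Reading 1 (assumption): burn-in applies to each run separately.
kept_if_per_run = (per_run - burn_in) * n_runs
# Reading 2 (assumption): burn-in applies to the pooled sample.
kept_if_pooled = total - burn_in

if __name__ == "__main__":
    print(per_run, total, kept_if_per_run, kept_if_pooled)
```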
Distance bootstrap analyses were performed using the shell script PUZZLEBOOT (available from the TREE-PUZZLE website) in conjunction with TREE-PUZZLE [17] and the programs SEQBOOT, NEIGHBOR and CONSENSE from the PHYLIP package [53], using the substitution models employed in Bayesian analyses with rate heterogeneity parameters estimated by TREE-PUZZLE on the relevant Bayesian tree topology.
For tests of alternative tree topologies, site likelihoods were calculated under the GTR + gamma and mtrev24 + gamma models (for DNA and protein data respectively) using the PAML package [54]. The Approximately Unbiased (AU) tests were performed using the software CONSEL [19].
Bayesian relaxed molecular clock dating analyses were performed using the MULTIDISTRIBUTE package [24] in conjunction with programs from the package PAML. For DNA sequences, the F85 + gamma model (the most complex model available in BASEML) was employed. For protein sequences, following the method of Amer and Kumazawa [25], a modified version of CODEML was used to estimate model parameters for the mtrev24 + gamma model. In both cases the program ESTBRANCHES [24] was used to estimate variances of branch lengths and MULTIDIVTIME [24] was used to estimate divergence times.
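The resampling step that underlies a bootstrap analysis of this kind (the role played by SEQBOOT in the pipeline above) can be sketched in a few lines of Python. This is a generic illustration of nonparametric column resampling, not a reimplementation of SEQBOOT; the function name and toy alignment are hypothetical.

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, keeping the original length.

    The same column indices are applied to every sequence so that each
    pseudo-replicate remains a valid alignment.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n = lengths.pop()
    columns = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "ATGTTC"}  # toy data
    rng = random.Random(1)  # fixed seed for reproducibility
    for _ in range(3):
        print(bootstrap_alignment(toy, rng))
```

Each pseudo-replicate would then be passed to a distance estimation and tree-building step, and the resulting trees summarised in a consensus.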
Analyses of compositional homogeneity were performed using the chi-square test implemented in the program TREE-PUZZLE. Site-specific relative substitution rates were estimated using the SiteVarProt algorithm [23].
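The general idea of a chi-square test of compositional homogeneity can be illustrated as follows: the character counts of each sequence are compared with the counts expected from the frequencies pooled over all sequences. The sketch below is a generic version under that assumption and is not claimed to reproduce the exact TREE-PUZZLE implementation; the function name and toy data are hypothetical, and conversion of the statistic to a p-value is left out.

```python
from collections import Counter
from typing import Dict, Tuple


def composition_chi_square(
    alignment: Dict[str, str], alphabet: str = "ACGT"
) -> Tuple[Dict[str, float], int]:
    """Return per-sequence chi-square statistics and the degrees of freedom.

    Expected counts for each sequence are its length (in counted characters)
    multiplied by the frequencies pooled over the whole alignment.
    """
    counts = {
        name: Counter(ch for ch in seq.upper() if ch in alphabet)
        for name, seq in alignment.items()
    }
    pooled: Counter = Counter()
    for c in counts.values():
        pooled.update(c)
    grand_total = sum(pooled.values())
    if grand_total == 0:
        raise ValueError("no characters from the alphabet were found")
    # Only states actually observed contribute to the test.
    freqs = {s: pooled[s] / grand_total for s in alphabet if pooled[s] > 0}

    stats: Dict[str, float] = {}
    for name, c in counts.items():
        n = sum(c.values())
        stats[name] = sum((c[s] - n * f) ** 2 / (n * f) for s, f in freqs.items()) if n else 0.0
    return stats, len(freqs) - 1


if __name__ == "__main__":
    toy = {"taxonA": "AAAACCGT", "taxonB": "ACGTACGT", "taxonC": "GGGGCCAT"}  # toy data
    statistics, dof = composition_chi_square(toy)
    for name, value in statistics.items():
        print(f"{name}: chi2 = {value:.3f} (df = {dof})")
```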