Phylogenomic analyses of echinoid diversification prompt a re-evaluation of their fossil record

Echinoids are key components of modern marine ecosystems. Despite a remarkable fossil record, the emergence of their crown group is documented by few specimens of unclear affinities, rendering their early history uncertain. The origin of sand dollars, one of its most distinctive clades, is also unclear due to an unstable phylogenetic context. We employ 18 novel genomes and transcriptomes to build a phylogenomic dataset with a near-complete sampling of major lineages. With it, we revise the phylogeny and divergence times of echinoids, and place their history within the broader context of echinoderm evolution. We also introduce the concept of a chronospace – a multidimensional representation of node ages – and use it to explore methodological decisions involved in time calibrating phylogenies. We find the choice of clock model to have the strongest impact on divergence times, while the use of site-heterogeneous models and alternative node prior distributions show minimal effects. The choice of loci has an intermediate impact, affecting mostly deep Paleozoic nodes, for which clock-like genes recover dates more congruent with fossil evidence. Our results reveal that crown group echinoids originated in the Permian and diversified rapidly in the Triassic, despite the relative lack of fossil evidence for this early diversification. We also clarify the relationships between sand dollars and their close relatives and confidently date their origins to the Cretaceous, implying ghost ranges spanning approximately 50 million years, a remarkable discrepancy with their rich fossil record.


Introduction
The fossil record represents the best source of primary data for constraining the origins of major lineages across the tree of life. However, the fossil record is not perfect, and even for groups with an excellent fossilization potential, constraining their age of origin can be difficult (Smith and Peterson, 2002;Donoghue and Benton, 2007). Furthermore, as many traditional hypotheses of relationships have been revised in light of large-scale molecular datasets, the affinities of fossil lineages and their bearings on inferred times of divergence have also required a reassessment. An exemplary case of this is Echinoidea, a clade comprising sea urchins, heart urchins, sand dollars, and allies, for which phylogenomic trees have questioned the timing of previously well-constrained nodes (Mongiardino Koch et al., 2018;Mongiardino Koch and Thompson, 2021d).
Echinoids are easily recognized by their spine-covered skeletons or tests, composed of numerous tightly interlocking plates. Slightly over 1000 living species have been described to date (Kroh and Mooi, 2020), a diversity that populates every marine benthic environment from intertidal to abyssal depths (Schultz, 2015). Echinoids are usually subdivided into two morpho-functional groups with similar species-level diversities: 'regular' sea urchins, a paraphyletic assemblage of hemispherical, epibenthic consumers protected by large spines; and irregulars (Irregularia), a clade of predominantly infaunal and bilaterally symmetrical forms covered by small and specialized spines. In today's oceans, regular echinoids act as ecosystem engineers in biodiverse coastal communities such as coral reefs (Edmunds and Carpenter, 2001) and kelp forests (Harrold and Pearse, 1987), where they are often the main consumers. They are first well known in the fossil record on either side of the Permian-Triassic (P-T) mass extinction event when many species occupied reef environments similar to those inhabited today by their descendants (Zonneveld et al., 2016;Thompson et al., 2017b). This extinction event was originally thought to have radically impacted the macroevolutionary history of the clade, decimating the echinoid stem group and leading to the radiation of crown group taxa from a single surviving lineage (Kier, 1977b;Twitchett and Oji, 2005). However, it is now widely accepted that the origin of crown group Echinoidea (i.e., the divergence between its two main lineages, Cidaroidea and Euechinoidea) occurred in the Late Permian, as supported by molecular estimates of divergence (Smith et al., 2006;Thompson et al., 2017a), as well as the occurrence of Permian fossils with morphologies typical of modern cidaroids (Smith and Hollingworth, 1990; Thompson et al., 2015). However, a recent total-evidence study recovered many taxa previously classified as crown group members along the echinoid stem, while also suggesting that up to three crown group lineages survived the P-T mass extinction (Mongiardino Koch and Thompson, 2021d). This result increases the discrepancy between molecular estimates and the fossil record and renders uncertain the early evolutionary history of crown group echinoids. Constraining the timing of origin of this clade relative to the P-T mass extinction (Mongiardino Koch et al., 2018;Mongiardino Koch and Thompson, 2021d) is further complicated by the poor preservation potential of stem group echinoids, and the difficulty assigning available disarticulated remains from the Late Paleozoic and Early Triassic to specific clades (Kier, 1977b;Twitchett and Oji, 2005;Smith, 2007;Kroh and Smith, 2010;Thompson et al., 2018;Thompson et al., 2019).
Compared to the morphological conservatism of regular sea urchins, the evolutionary history of the relatively younger Irregularia was characterized by dramatic levels of morphological and ecological innovation (Kier, 1982;Saucède et al., 2006;Barras, 2008;Hopkins and Smith, 2015). Within the diversity of irregulars, sand dollars are the most easily recognized (Figure 1). The clade includes greatly flattened forms that live in high-energy sandy environments where they feed using a unique mechanism for selecting and transporting organic particles to the mouth, where these are crushed using well-developed jaws (Mooi, 1990a;Nebelsick, 2020). Sand dollars (Scutelloida) were long thought to be most closely related to sea biscuits (Clypeasteroida) given a wealth of shared morphological characters (Mooi, 1990a;Kroh and Smith, 2010). The extraordinary fossil record of both sand dollars and sea biscuits suggested their last common ancestor originated in the early Cenozoic from among an assemblage known as 'cassiduloids' (Mooi, 1990a;Saucède et al., 2006), a once diverse group that is today represented by three depauperate lineages: cassidulids (and close relatives), echinolampadids, and apatopygids (Smith, 2016;Kroh and Smith, 2010). These taxa not only lack the defining features of both scutelloids and clypeasteroids but have experienced little morphological change since their origin deep in the Mesozoic (Kier, 1962;Smith, 2016;Hopkins and Smith, 2015;Souto et al., 2019). However, early molecular phylogenies supported both cassidulids and echinolampadids as close relatives of sand dollars (e.g., Littlewood and Smith, 1995;Smith et al., 2006), a topology initially disregarded for its conflicts with both morphological and paleontological evidence, but later confirmed using phylogenomic approaches (Mongiardino Koch et al., 2018). While many of the traits shared by sand dollars and sea biscuits have since been suggested to represent a mix of convergences and ancestral synapomorphies secondarily lost by some 'cassiduloids' (Mongiardino Koch et al., 2018;Mongiardino Koch and Thompson, 2021d), the strong discrepancy between molecular topologies and the fossil record remains unexplained. Central to this discussion is the position of apatopygids, a clade so far unsampled in molecular studies. Apatopygids have a fossil record stretching more than 100 million years and likely have phylogenetic affinities with even older extinct lineages (Kier, 1962;Kroh and Smith, 2010;Souto et al., 2019;Mongiardino Koch and Thompson, 2021d). Although current molecular topologies already imply ghost ranges for scutelloids and clypeasteroids that necessarily extend beyond the Cretaceous-Paleogene (K-Pg) boundary, the phylogenetic position of apatopygids could impose even earlier ages on these lineages ( Figure 1). Constraining these divergences is necessary to understand the timing of origin of the sand dollars, one of the most specialized lineages of echinoids (Mooi, 1990a;Smith, 2016;Hopkins and Smith, 2015;Nebelsick, 2020). Resolving some phylogenetic relationships within scutelloids has also been complicated by their recurrent miniaturization and associated loss of morphological features (Figure 1; Mooi, 1990a;Mooi, 1990b;Mongiardino Koch, 2021a).
Echinoidea constitutes a model clade in developmental biology and genomics. As these fields embrace a more comparative approach Dunn et al., 2018;Smith et al., 2020), robust and time-calibrated phylogenies are expected to play an increasingly important role.   (Kroh and Smith, 2010). Bottom: A recent total-evidence study split cassiduloid diversity into a clade of extant lineages closely related to scutelloids, and an unrelated clade of extinct forms (Nucleolitoida; Mongiardino Koch and Thompson, 2021d). Divergence times are much older and conflict with fossil evidence. Cassidulids and apatopygids lacked molecular data in this analysis. Scale bars = 10 mm.
Likewise, the extraordinary fossil record of echinoids and the ease with which echinoid fossils can be incorporated in phylogenetic analyses make them an ideal system to explore macroevolutionary dynamics using phylogenetic comparative methods (Mongiardino Koch, 2021a;Mongiardino Koch and Thompson, 2021d). In this study, we build upon available molecular resources with 18 novel genome-scale datasets and build the largest molecular matrix for echinoids yet compiled. Our expanded phylogenomic dataset extends sampling to 16 of the 17 currently recognized echinoid orders -plus the unassigned apatopygids (Kroh, 2020) -and is the first to bracket the extant diversity of both sand dollars and sea biscuits and include members of all three lineages of living 'cassiduloids' (cassidulids, echinolampadids, and apatopygids). We also incorporate a diverse sample of outgroups, providing access to the deepest nodes within the crown groups of all other echinoderm classes (holothuroids, asteroids, ophiuroids, and crinoids). With it, we reconstruct the phylogenetic relationships and divergence times of the major lineages of living echinoids and place their diversification within the broader context of echinoderm evolution.

Phylogeny of Echinoidea
Analyses relied on a 70% occupancy supermatrix composed of 1346 loci (327,695 amino acid sites), and including 54 echinoid terminals plus 12 outgroups. Inference was performed under multiple concatenation and coalescent-aware methodologies, as well as relying on maximum likelihood and Bayesian implementations of site-homogeneous and site-heterogeneous models, as these approaches are known to differ in their susceptibility to model violations (Lartillot et al., 2007;Kainer and Lanfear, 2015;Jiang et al., 2020; see Materials and methods for further details). Phylogenetic relationships supported by the full dataset were remarkably stable, with all nodes but one being identically resolved and fully supported across all methods ( Figure 2A). While recovering a topology similar to those of previous molecular studies (Littlewood and Smith, 1995;Smith et al., 2006;Thompson et al., 2017a;Mongiardino Koch et al., 2018;Lin et al., 2020;Mongiardino Koch and Thompson, 2021d), this analysis is the first to sample and confidently place micropygoids and aspidodiadematoids within Aulodonta, as well as resolve the relationships among all major clades of Neognathostomata (scutelloids, clypeasteroids and the three lineages of extant 'cassiduloids'). Our results show that Apatopygus recens is not related to the remaining 'cassiduloids' but is instead the sister clade to all other sampled neognathostomates. The strong support for this placement, as well as for a clade of cassidulids and echinolampadids (Cassiduloida sensu stricto) as the sister group to sand dollars, provides a basis for an otherwise elusive phylogenetic classification of neognathostomates. Our topology also confirms that Sinaechinocyamus mai, a miniaturized species once considered a plesiomorphic member of Scutelloida based on the reduction or loss of diagnostic features (Figure 1), is in fact a derived paedomorphic lineage closely related to Scaphechinus mirabilis (Mooi, 1990b). Salenioida is another major lineage sampled here for the first time, and whose exact position among regular echinoids proved difficult to resolve. While some methods supported salenioids as the sister group to a clade of camarodonts, stomopneustoids, and arbacioids (a topology previously supported by morphology; Kroh and Smith, 2010), others recovered a closer relationship of salenioids to Camarodonta + Stomopneustoida, with arbacioids sister to them all (as shown in Figure 2A). As revealed using likelihood mapping, these results do not stem from a lack of phylogenetic signal, but rather from the presence of strong and conflicting evidence in the dataset regarding the position of salenioids ( Figure 2B). However, a careful dissection of these signals shows that loci with high phylogenetic usefulness (as defined by Mongiardino Koch, 2021b;Mongiardino Koch and Thompson, 2021d; see Materials and methods) favor the topology shown in Figure 2A, with the morphological hypothesis becoming dominant only after incorporating less reliable loci ( Figure 2C). In line with these results, moderate levels of gene subsampling (down to 500 loci) targeting the most phylogenetically useful loci unambiguously support the placement of arbacioids as sister to the remaining taxa, regardless of the chosen method of inference ( Figure 2D). More extreme subsampling (down to 100 loci) again results in disagreement among methods. This possibly stems from the increasing effect of stochastic errors in smaller datasets, as less than half of the sampled loci in these reduced datasets contain data for all branches of this quartet (see Figure 2C). This result shows the importance of ensuring that datasets (especially subsampled ones) retain appropriate levels of occupancy for   Figure 2. Phylogenetic relationships among major clades of Echinoidea. (A) Favored topology, as obtained using the full supermatrix and a best-fit partitioning scheme in IQ-TREE (Nguyen et al., 2015). With the exception of a single contentious node within Echinacea (marked with a yellow star), all methods supported the same pattern of relationships, and assigned maximum support values to all nodes. Numbers below major clades correspond to the current numbers of described living species (obtained from Kroh and Mooi, 2020). (B) Likelihood-mapping analysis showing the proportion of quartets supporting different resolutions within Echinacea. While the majority of quartets support the topology depicted in A (shown in red), a relatively large number support an alternative resolution that has been recovered in morphological analyses (shown in blue; Kroh and Smith, 2010). (C) Difference in likelihood score (delta likelihood) for the two resolutions of Echinacea most strongly supported in the likelihood-mapping analysis. Genes were sorted based on their inferred phylogenetic usefulness (Mongiardino Koch, 2021b), and gene-wise delta scores were averaged for datasets composed of multiples of 20 loci. Support for a clade of Salenioida + (Camarodonta + Stomopneustoida), as depicted in A, is seen as positive delta scores and is predominantly concentrated among the most phylogenetically useful loci. This signal is attenuated in larger datasets that contain less reliable genes, eventually favoring an alternative resolution (as seen by negative scores for the largest datasets). Only the 584 loci containing data for the three main lineages of Echinacea were considered. The line corresponds to a second-degree polynomial regression. (D) Resolution and bootstrap scores (see color scale) of the topology within Echinacea found using datasets of different sizes and alternative methods of inference. clades bracketing contentious nodes (Dell'Ampio, 2014). Despite these disagreements, several lines of evidence favor the topology shown in Figure 2A, including the results of likelihood mapping, and the increased support for this resolution among the most phylogenetically useful loci and when using more complex methods of reconstruction, such as partitioned and site-heterogeneous models, which always favor this topology regardless of dataset size ( Figure 2D).

Sensitivity of node ages
While alternative methods of inference had minor effects on phylogenetic relationships, they did impact the reconstruction of branch lengths ( Figure 3). Site-heterogeneous models (such as CAT + GTR + G) returned longer branch lengths overall, but also uncovered a larger degree of molecular change among echinoderm classes. Branches connecting these clades were stretched to a much larger extent than those within the ingroup, a phenomenon that might affect the inference of node ages. We tested this hypothesis by exploring the sensitivity of divergence times to the use of alternative models of molecular evolution (site-homogeneous vs. site-heterogeneous), as well as different clocks (autocorrelated vs. uncorrelated), prior node distributions (Cauchy vs. uniform), and gene sampling strategies (using five different approaches; see Materials and methods). All combinations of these factors were explored, resulting in 40 different time calibration settings that were run using Bayesian approaches under a constrained tree topology (shown in Figure 2A). While the nodes connecting some outgroup taxa were among those most sensitive to these methodological decisions, large effects were also seen among nodes relating to the origin and diversification of the echinoid clades Cidaroidea, Aulodonta, and Neognathostomata. All of these nodes varied in age by more than 35 Myr -and up to 115 Myramong the consensus topologies of different analyses ( Figure 4).
In order to isolate and visualize the impact of each of these factors on divergence time estimation, chronograms were represented in a multidimensional space of node dates, with each axis representing the age of a given node. We term this type of graph a chronospace given its similarities to the treespaces commonly used to explore topological differences among phylogenetic trees (Hillis et al., 2005). Each observation (chronogram) was classified as obtained under a specific clock, model of molecular evolution, node prior distribution, and gene sampling strategy, and the major effects of each of these choices were extracted with the use of between-group principal component analyses (bgPCAs). The single dimension of chronospace maximizing the distinctiveness of chronograms obtained under different clocks explained 53.4% of the total variance in node ages across all analyses ( Figure 5). In contrast, the choice of different loci, models of molecular evolution, and prior distributions on node ages showed much lesser effects, explaining 10.7%, 3.9%, and 0.4% of the total variance, respectively ( Figure 5 and Figure 5-figure supplement 1). Even though most of these decisions affected a similar set of sensitive nodes (those mentioned above, as well as some relationships within Atelostomata), the choice of clock model modified the ages of 17 of these by more than Best-fit partitioning scheme (site-homogeneous) Figure 3. Estimated branch lengths across different models of molecular evolution. Different site-homogeneous models (left) infer similar levels of divergence, and the choice between them induces little distortion in the general tree structure. Site-heterogeneous models on the other hand not only infer a larger degree of divergence between terminals relative to site-homogeneous ones (center and right), but they also distort the tree (i.e., impose a non-isometric stretching), with branch lengths connecting outgroup taxa expanding much more than those within the ingroup clade.
20 Myr ( Figure 5-figure supplement 2). This degree of change was induced on only four nodes by selecting alternative loci, and was not induced on any node by enforcing different models of evolution or node age priors (Figure 5-figure supplements 3-5). Regarding gene choice, the ages most different to those obtained under random loci selection were found when using the most clock-like genes ( Figure 5C).

Echinoid (and echinoderm) divergence times
Even when the age of crown Echinodermata was constrained to postdate the appearance of stereom (the characteristic skeletal microstructure of echinoderms) in the Early Cambrian (Bottjer et al., 2006;Zamora et al., 2013), only analyses using the most clock-like loci recovered ages concordant with this (i.e., median ages younger than the calibration enforced; Figure  Our results also recover an early origin of crown group Holothuroidea (sea cucumbers; range of median ages = 350.4-384.2 Ma), well before the crown groups of other extant echinoderm classes. These dates markedly postdate the first records of holothuroid calcareous rings in the fossil record (Reich, 2015;Miller et al., 2017), and imply that this trait does not define the holothuroid crown group but instead evolved from an echinoid-like jaw-apparatus along its stem (Rahman et al., 2019). The other noteworthy disagreement between our results and those of previous studies (Rouse et al., 2013) involves dating crown group Crinoidea to times that precede the P-T mass extinction (range of median ages = 268.0-329.7 Ma, although highest posterior density intervals are always wide and include Triassic ages).  Across all of the analyses performed, the echinoid crown group is found to have originated somewhere between the Pennsylvanian and Cisuralian, with 30.2% posterior probability falling within the late Carboniferous and 69.1% within the early Permian ( Figure 6 and Figure 6-figure supplement 1). An origin of the clade postdating the P-T mass extinction is never recovered, even when such ages are common under the joint prior ( Figure 6-figure supplement 2). While the posterior distribution of ages for Euechinoidea spans both sides of the P-T boundary, the remaining earliest splits within the echinoid tree are constrained to have occurred during the Triassic, including the origins of Aulodonta, Carinacea, Echinacea, and Irregularia ( Figure 6 and Figure 4-figure supplement 1). Many echinoid orders are also inferred to have diverged from their respective sister clades during this period, including aspidodiadematoids, pedinoids, echinothurioids, arbacioids, and salenioids. Lineage-through-time plots confirm that diversification proceeded rapidly throughout the Triassic ( Figure 6B). Despite the topological reorganization of Neognathostomata, the clade is dated to a relatively narrow time interval in the Late to Middle Jurassic (range of median ages = 169.48-180.93 Ma), in agreement with recent estimates (Mongiardino Koch and Thompson, 2021d). Within this clade, the origins of both scutelloids and clypeasteroids confidently predate the K-Pg mass extinction (posterior probability of In the latter case, only the first two out of four bgPCA dimensions are shown. The inset shows the centroid for each loci sampling strategy, and the width of the lines connecting them are scaled to the inverse of the Euclidean distances that separates them (as a visual summary of overall similarity). The proportions of total variance explained are shown on the axis labels. The impact of the clock model is such that a bimodal distribution of chronograms can be seen even when bgPCA are built to discriminate based on other factors (as in C).

Ophiuroidea
The online version of this article includes the following figure supplement(s) for figure 5:     origination before the boundary = 1.00 and 0.97, respectively), despite younger ages being allowed by the joint prior ( Figure 6-figure supplement 2).

Discussion
The echinoid tree of life In agreement with previous phylogenomic studies ( Mongiardino Koch et al., 2018;Mongiardino Koch and Thompson, 2021d), echinoid diversity can be subdivided into five major clades (Figure 2A). Cidaroids form the sister group to all other crown group echinoids (Euechinoidea). Some aspects of the relationships among sampled cidaroids are consistent with previous molecular (Brosseau et al., 2012) and morphological studies (Kroh and Smith, 2010), including an initial split between   Histocidaris and the remaining taxa, representing the two main branches of extant cidaroids (Kroh, 2020;Kroh and Mooi, 2020). Others, such as the nested position of Prionocidaris baculosa within the genus Eucidaris, not only implies paraphyly of this genus but also suggests the need for a taxonomic reorganization of the family Cidaridae. Within euechinoids, the monophyly of Aulodonta is supported for the first time with sampling of all of its major groups. The subdivision of these into a clade that includes diadematoids plus micropygoids (which we propose should retain the name Diadematacea), sister to a clade including echinothurioids and pedinoids (Echinothuriacea sensu Mongiardino Koch et al., 2018) is strongly reminiscent of some early classifications (e.g., Durham and Melville, 1957).
Our expanded phylogenomic sampling also confirms an aulodont affinity for aspidodiadematoids (Kroh, 2020;Mongiardino Koch and Thompson, 2021d) and places them within Echinothuriacea as the sister group to Pedinoida. The remaining diversity of echinoids, which forms the clade Carinacea (Figure 2), is subdivided into Irregularia and their sister clade among regulars, for which we amend the name Echinacea to include Salenioida. Given the striking morphological gap separating regular and irregular echinoids, the origin of Irregularia has been shrouded in mystery (Durham and Melville, 1957;Saucède et al., 2006;Kroh and Smith, 2010). Our complete sampling of major regular lineages determines Echinacea sensu stricto to be the sister clade to irregular echinoids. A monophyletic Echinacea was also supported in a recent total-evidence analysis (Mongiardino Koch and Thompson, 2021d), but the incomplete molecular sampling of that study resulted in a slightly different topology that placed salenioids as the sister group to the remaining lineages. However, an overall lack of morphological synapomorphies uniting these clades had previously been acknowledged (Kroh and Smith, 2010). While the relationships within Echinacea proved to be difficult to resolve even with thousands of loci, multiple lines of evidence lead us to prefer a topology in which salenioids form a clade with camarodonts + stomopneustoids, with arbacioids sister to all of these ( Figure 2).
As has been already established (Littlewood and Smith, 1995;Smith et al., 2006;Kroh and Smith, 2010;Mongiardino Koch et al., 2018;Mongiardino Koch and Thompson, 2021d), the lineages of irregular echinoids here sampled are subdivided into Atelostomata (heart urchins and allies) and Neognathostomata (sand dollars, sea biscuits, and 'cassiduloids'). Despite the former being the most diverse of the five main clades of echinoids (Figure 2), its representation in phylogenomic studies remains low, and its internal phylogeny poorly constrained (Kroh, 2020). On the contrary, recent molecular studies have greatly improved our understanding of the relationships among neognathostomates (Mongiardino Koch et al., 2018;Lin et al., 2020;Mongiardino Koch and Thompson, 2021d), revealing an evolutionary history that dramatically departs from previous conceptions. Even when scutelloids and clypeasteroids were never recovered as reciprocal sister lineages by molecular phylogenies (e.g., Littlewood and Smith, 1995;Smith et al., 2006;Thompson et al., 2017a), this result was not fully accepted until phylogenomic data confidently placed echinolampadids as the sister lineage to sand dollars (Mongiardino Koch et al., 2018). At the same time, this result rendered the position of the remaining 'cassiduloids', a taxonomic wastebasket with an already complicated history of classification (Suter, 1994;Smith, 2016;Kroh and Smith, 2010;Souto et al., 2019), entirely uncertain. An attempt to constrain the position of these using a total-evidence approach (Mongiardino Koch and Thompson, 2021d) subdivided the 'cassiduloids' into three unrelated clades: Nucleolitoida, composed of extinct lineages and placed outside the node defined by Scutelloida + Clypeasteroida, and two other clades nested within it (see Figure 1G). Extant 'cassiduloids' were recovered as members of one of the latter clades, representing the monophyletic sister group to sand dollars. Here, we show that Apatopygus recens does not belong within this clade but is instead the sister group to all other extant neognathostomates. Given this phylogenetic position, as well as the morphological similarities between Apatopygus and the entirely extinct nucleolitids (Mortensen, 1948;Kier, 1966;Suter, 1994;Kroh and Smith, 2010;Souto et al., 2019), it is likely that the three extant species of apatopygids represent the last surviving remnants of Nucleolitoida, a clade of otherwise predominantly Mesozoic neognathostomates (Mongiardino Koch and Thompson, 2021d). Because of the renewed importance in recognizing this topology, we propose the name Luminacea for the clade uniting all extant neognathostomates with the exclusion of Apatopygidae (Figure 2A). This nomenclature refers to the dynamic evolutionary history of the Aristotle's lantern (i.e., the echinoid jaw-apparatus) within the clade (present in the adults of both clypeasteroids and scutelloids, but found only in the juveniles of Cassiduloida sensu stricto), the inclusion of the so-called lamp urchins (echinolampadids) within the clade, and the illumination provided by this hitherto unexpected topology. The previous misplacement of Apatopygus (Mongiardino Koch and Thompson, 2021d; see Figure 1G) is likely a consequence of tip-dating preferring more stratigraphically congruent topologies (King, 2020), an effect that can incorrectly resolve taxa on long terminal branches (Turner et al., 2017). Given the generally useful phylogenetic signal of stratigraphic information (Mongiardino Koch et al., 2021c), this inaccuracy further highlights the unusual evolutionary history of living apatopygids.

Chronospaces: a statistical exploration of time calibration strategies
Calibrating phylogenies to absolute time is crucial to understanding evolutionary history, as the resulting chronograms provide a major avenue for testing hypotheses of diversification, character evolution, and other macroevolutionary processes. However, the accuracy and precision of the inferred divergence times hinge upon many methodological choices (calibration strategies, prior distributions on node ages, clock models, etc.), that are often difficult or time-consuming to justify (Warnock et al., 2012;Sauquet, 2013;dos Reis et al., 2015;Reis et al., 2018;Carruthers et al., 2020;Carruthers and Scotland, 2021), and whose impact can be hard to quantify.
Here, we analyze the sensitivity of node ages to alternative criteria to sample loci from phylogenomic datasets, as well as different assumptions regarding patterns of molecular evolution across sites, variation in evolutionary rates among lineages, and ways in which fossils are translated into plausible times of divergence. To do so, we introduce an approach to visualize the distribution of chronograms in a multidimensional space of node ages, a chronospace, and measure the overall effect of these decisions on inferred dates using multivariate statistical methods. Our results reveal a minimal impact of selecting between alternative distributions to model the prior ages of calibrated nodes. This result conflicts with previous results (e.g., Inoue et al., 2010;dos Reis et al., 2015;Strassert et al., 2021), and may reflect the way these distributions are implemented in the software employed (Phylo-Bayes v4.1; Lartillot et al., 2013). Similarly, divergence times obtained under site-homogeneous and site-heterogeneous models (such as CAT + GTR + G) are broadly comparable. This happens despite the latter estimating higher levels of sequence divergence and stretching branches in a non-isometric manner (Figure 3). While site-heterogeneous models have become common for the inference of phylogenetic relationships, the degree to which they impact estimates of node ages has received less scrutiny. The lack of a meaningful effect uncovered here, coupled with their high computational burden (Whelan and Halanych, 2017), questions their usefulness for time-scaling phylogenies. A similar result was recently found when comparing site-homogeneous models with different numbers of parameters (Tao et al., 2020), suggesting that relaxed clocks adjust branch rates in a manner that buffers the effects introduced by using more complex models of sequence evolution.
The choice between different loci also has a small effect on inferred ages, with little evidence of a systematic difference between the divergence times supported by randomly chosen loci and those found using targeted sampling criteria, such as selecting genes for their phylogenetic signal, usefulness, occupancy, or clock-likeness. A meaningful effect was restricted to a few ancient nodes (e.g., Echinodermata), for which clock-like genes suggested younger ages that are more consistent with fossil evidence. While this validates the use of clock-like genes for inferring deep histories of diversification (Smith et al., 2018;Carruthers et al., 2020), the choice of loci had no meaningful effect on younger ages. Finally, the choice between alternative clock models induced differences in ages that were between five and one hundred times stronger than those of other factors, emphasizing the importance of either validating their choice (e.g., Tao et al., 2019) or -as done here -focusing on results that are robust to them.

Echinoid origins and diversification
The origin and early diversification of crown group Echinoidea have always been considered to have been determined (or at least strongly affected) by the P-T mass extinction (Kier, 1977b;Twitchett and Oji, 2005;Thompson et al., 2015;Thompson et al., 2019). However, estimating the number of crown group members surviving the most severe biodiversity crisis in the Phanerozoic (Raup and Sepkoski, 1982) has been hampered by both paleontological and phylogenetic uncertainties (Smith and Hollingworth, 1990;Smith et al., 2006;Thompson et al., 2017a;Thompson et al., 2017b;Thompson et al., 2018;Mongiardino Koch and Thompson, 2021d). Our results establish that multiple crown group lineages survived and crossed this boundary, finding for the first time a null posterior probability of the clade originating after the extinction event. While the survival of three crown group lineages is slightly favored (Figure 6-figure supplement 1), discerning between alternative scenarios is still precluded by uncertainties in dating these early divergences. Echinoid diversification during the Triassic was relatively fast ( Figure 6B) and involved rapid divergences among its major clades. Even many lineages presently classified at the ordinal level trace their origins to this initial pulse of diversification following the P-T mass extinction.
The late Paleozoic and Triassic origins inferred for the crown group and many euechinoid orders prompt a re-evaluation of fossils from this interval of time. Incompletely known fossil taxa such as the Pennsylvanian Eotiaris? meurevillensis, with an overall morphology akin to that of crown group echinoids, has a stratigraphic range consistent with our inferred date for the origin of the echinoid crown group . Additionally, the Triassic fossil record of echinoids has been considered to be dominated by stem group cidaroids, with the first euechinoids not known until the Late Triassic (Kier, 1984;Smith, 2007). However, the Triassic origins of many euechinoid lineages supported by our analyses necessitate that potential euechinoid affinities should be re-considered for this diversity of Triassic fossils. This is especially the case for the serpianotiarids and triadocidarids, abundant Triassic families variously interpreted as cidaroids, euechinoids, or even stem echinoids (Kier, 1984;Smith, 1994;Mongiardino Koch and Thompson, 2021d). A reinterpretation of any of these as euechinoids would suggest that the long-implied gap in the euechinoid record (Thompson et al., 2015;Thompson et al., 2018) is caused by our inability to correctly place these key fossils, as opposed to an incompleteness of the fossil record itself.
While our phylogenomic approach is the first to resolve the position of all major cassiduloid lineages, the inferred ages for many nodes within Neognathostomata remain in strong disagreement with the fossil record. No Mesozoic fossil can be unambiguously assigned to either sand dollars or sea biscuits, a surprising situation given the good fossilization potential and highly distinctive morphology of these clades (Kier, 1977a, Kier, 1982Mooi, 1990a;Kroh and Smith, 2010). While molecular support for a sister group relationship between scutelloids and echinolampadids already implied this clade (Echinolampadacea) must have split from clypeasteroids by the Late Cretaceous (Smith et al., 2006;Kroh and Smith, 2010;Mongiardino Koch et al., 2018;Mongiardino Koch and Thompson, 2021d), this still left open the possibility that the crown groups of sand dollars and sea biscuits radiated in the Cenozoic. Under this scenario, the Mesozoic history of these groups could have been composed of stem forms lacking their distinctive morphological features, complicating their correct identification. This hypothesis is here rejected, with the data unambiguously supporting the origination of the sand dollar and sea biscuit crown groups preceding the K-Pg mass extinction ( Figure 6C). While it remains possible that these results are incorrect even after such a thorough exploration of the time calibration toolkit (see for example Carruthers et al., 2020;Field et al., 2020), these findings call for a critical reassessment of the Cretaceous fossil record, and a better understanding of the timing and pattern of morphological evolution among fossil and extant neognathostomates. For example, isolated teeth with an overall resemblance to those of modern sand dollars and sea biscuits have been found in Lower Cretaceous deposits (Caramés et al., 2019), raising the possibility that other overlooked and disarticulated remains might close the gap between rocks and clocks.

Conclusions
Although echinoid phylogenetics has long been studied using morphological data, the position of several major lineages (e.g., aspidodiadematoids, micropygoids, salenioids, apatopygids) remained to be confirmed with the use of phylogenomic approaches. Our work not only greatly expands the available genomic resources for the clade, but finds novel resolutions for some of these lineages, improving our understanding of their evolutionary history. The most salient aspect of our topology is the splitting of the extant 'cassiduloids' into two distantly related clades, one of which is composed exclusively of apatopygids. This result is crucial to constrain the ancestral traits shared by the main lineages of neognathostomates, helping unravel the evolutionary processes that gave rise to the unique morphology of the sand dollars and sea biscuits (Mooi, 1990a;Hopkins and Smith, 2015;Mongiardino Koch and Thompson, 2021d).
Although divergence time estimation is known to be sensitive to many methodological decisions, systematically quantifying the relative impact of these on inferred ages has rarely been done. Here, we propose an approach based on chronospaces that can help visualize key effects and determine the sensitivity of node dates to different assumptions. Our results shed new light on the early evolutionary history of crown group echinoids and its relationship with the P-T mass extinction event, a point in time where the fossil record provides ambiguous answers. They also establish with confidence a Cretaceous origin for the sand dollars and sea biscuits, preceding their first appearance in the fossil record by at least 40-50 Myr, respectively (and potentially up to 65 Myr). These clades, therefore, join several well-established cases of discrepancies between the fossil record and molecular clocks, such as those underlying the origins of placental mammals (Ronquist et al., 2016) and flowering plants (Coiro et al., 2019).

Sampling, bioinformatics, and matrix construction
This study builds upon previous phylogenomic matrices ( Mongiardino Koch et al., 2018;Mongiardino Koch and Thompson, 2021d), the last of which was augmented through the addition of eight published datasets (mostly expanding outgroup sampling), as well as 18 novel echinoid datasets (16 transcriptomes and 2 draft genomes). For all novel datasets, tissue sampling, DNA/RNA extraction, library preparation, and sequencing varied by specimen, and are detailed in Appendix 1. Raw reads have been deposited in NCBI under Bioproject accession numbers PRJNA767441, PRJNA746411, and PRJNA746412. Final taxon sampling included 12 outgroups and 54 echinoids (SRA accession numbers and sampling details can be found in Appendix 1-table 1).
Raw reads for all transcriptomic datasets were trimmed or excluded using quality scores with Trimmomatic v. 0.36 (Bolger et al., 2014) under default parameters. Further sanitation steps were performed using the Agalma 2.0 phylogenomic workflow (Dunn et al., 2013), and datasets were assembled de novo with Trinity v. 2.5.1 (Grabherr et al., 2011). For genomic shotgun sequences, adapters were removed with BBDuk (https://sourceforge.net/projects/bbmap/), and UrQt v. 1.0.18 (Modolo and Lerat, 2015) was used to filter short reads (size <50) and trim low-quality ends (score <28). Datasets were then assembled using MEGAHIT v. 1.1.2 (Li et al., 2015). Draft genomes were masked using RepeatMasker v. 4.1.0 (Smit et al., 2015;Hoff and Stanke, 2019), before obtaining gene predictions with AUGUSTUS v. 3.2.3 (Stanke et al., 2006). A custom set of universal single-copy orthologs (USCOs) obtained from the Strongylocentrotus purpuratus genome assembly v. 5.0 was employed as the training dataset. Settings and further details of these analyses can be found in Appendix 1.
Multiplexed transcriptomes were sanitized from cross-contaminants using CroCo v. 1.1 (Simion et al., 2018), and likely non-metazoan contaminants were removed using alien_index v. 3.0 (Ryan, 2014), removing sequences with AI scores > 45. Datasets were imported back into Agalma, which automated orthology inference (as described in Dunn et al., 2013;Guang et al., 2021), gene alignment with MAFFT v. 7.305 (Katoh and Standley, 2013; using the E-INS-i algorithm), and trimming with GBLOCKS v. 0.91b (Talavera and Castresana, 2007). The amino acid supermatrix was reduced using a 70% occupancy threshold, producing a final dataset of 1346 loci (327,695 sites). As a final sanitation step, gene trees were obtained using ParGenes v. 1.0.1 (Morel et al., 2018), which performed model selection (minimizing the Bayesian information criterion) and phylogenetic inference using 100 bootstrap (BS) replicates. Trees were then used to remove outlier sequences with TreeShrink v. 1.3.1 (Mai and Mirarab, 2018). We specified a reduced tolerance for false positives and limited removal to at most three terminals which had to increase tree diameter by at least 25% (-q 0.01 -k 3 -b 25). Statistics for the supermatrix and all assemblies can be found in Appendix 1-table 2.

Phylogenetic inference
Coalescent-based inference was performed using the summary method ASTRAL-III , estimating support as local posterior probabilities (Sayyari and Mirarab, 2016). Among concatenation approaches, we used Bayesian inference under an unpartitioned GTR + G model in ExaBayes v. 1.5 (Aberer et al., 2014). Two chains were run for 2.5 million generations, samples were drawn every one hundred, and the initial 25% was discarded as burn-in. The entire posterior distribution of trees was composed of a single topology, and 210 out of 213 parameters attained adequate potential scale reduction factors (i.e., lower than 1.1). We also explored maximum likelihood inference with partitioned and unpartitioned models. For the former, the fast-relaxed clustering algorithm was used to find the best-fitting model among the top 10% using IQ-TREE v. 1.6.12 (Nguyen et al., 2015;Kalyaanamoorthy et al., 2017;-m MFP + MERGE -rclusterf 10 -rcluster-max 3000), and support was evaluated with 1000 ultrafast BS replicates (Hoang et al., 2017). For the latter, we used the LG4X + R model in RAxML-NG v. 0.5.1 (Kozlov et al., 2019) and evaluated support with 200 replicates of BS. Finally, we also implemented the site-heterogeneous LG + C60 + F + G mixture model using the posterior mean site frequency approach to provide a fast approximation of the full profile mixture model (Wang et al., 2018), allowing the use of 100 BS replicates to estimate support. Given some degree of topological conflict between the results of the other methods (see below), multiple guide trees were used to estimate site frequency profiles, but the resulting phylogenies were identical.
Given conflicts between methods in the resolution of one particular node (involving the relationships among Arbacioida, Salenioida, and the clade of Stomopneustoida + Camarodonta), all methods were repeated after reducing the matrix to 500 and 100 loci selected for their phylogenetic usefulness using the approach described in Mongiardino Koch, 2021b;Mongiardino Koch and Thompson, 2021d, and implemented in the genesortR script (https://github.com/mongiardino/genesortR). This approach relies on seven gene properties routinely used for phylogenomic subsampling, including multiple proxies for phylogenetic signal -such as the average BS and Robinson-Foulds (RF) similarity to a target topology -as well as several potential sources of systematic bias (e.g., rate and compositional heterogeneity). Outgroups were removed before calculating these metrics. RF similarity was measured to a species tree that had the conflicting relationship collapsed so as not to bias gene selection in favor of any resolution. A PCA of this dataset resulted in a dimension (PC 2, 17.6% of variance) along which phylogenetic signal increased while sources of bias decreased (Appendix 3figure 1), and which was used for loci selection. For the smallest subsampled dataset (100 loci), we also performed inference under the site-heterogeneous CAT + GTR + G model using PhyloBayes-MPI (Lartillot et al., 2013). Three runs were continued for >10,000 generations, sampling every two generations and discarding the initial 25%. Convergence was confirmed given a maximum bipartition discrepancy of 0.067 and effective sample sizes for all parameters > 150.
Two other approaches were used in order to assist in resolving the contentious node. First, we implemented a likelihood-mapping analysis (Strimmer and Von Haeseler, 1997) in IQ-TREE to visualize the phylogenetic signal for alternative resolutions of the quartet involving these three lineages (Arbacioida, Salenioida, and Stomopneustoida + Camarodonta) and their sister clade (Irregularia; other taxa were excluded). Second, we estimated the log-likelihood scores of each site in RAxML (using best-fitting models) for the two most strongly supported resolutions found through likelihood mapping. These were used to calculate gene-wise differences in scores, or δ values (Shen et al., 2017). In order to search for discernable trends in the signal for alternative topologies, genes were ordered based on their phylogenetic usefulness (see above) and the mean per-locus δ values of datasets composed of multiples of 20 loci (i.e., the most useful 20, 40, etc.) were calculated.

Time calibration
Node dating was performed using relaxed molecular clocks in PhyloBayes v4.1 using a fixed topology and a novel set of 22 fossil calibrations corresponding to nodes from our newly inferred phylogeny (listed in Appendix 2). Depending on the node, we enforced both minimum and maximum bounds, or either one of these. A birth-death prior was used for divergence times, which allowed for the implementation of soft bounds (Yang and Rannala, 2006), leaving 5% prior probability of divergences falling outside of the specified interval. We explored the sensitivity of divergence times to gene selection, model of molecular evolution, type of clock, and prior distribution on node ages. One hundred loci were sampled from the full supermatrix according to four targeted sampling schemes: usefulness (calculated as explained above, except incorporating all echinoderm terminals), phylogenetic signal (i.e., smallest RF distance to species tree), clock-likeness (i.e., smallest variance of rootto-tip distances), and level of occupancy. For clock-likeness, we only considered loci that lay within one standard deviation of the mean rate (estimated by dividing total tree length by the number of terminals; Telford et al., 2014), as this method is otherwise prone to selecting largely uninformative loci (Figure 7; Mongiardino Koch, 2021b). A fifth sample of randomly chosen loci was also evaluated. These five datasets were run under two unpartitioned models of molecular evolution, the sitehomogeneous GTR + G and the site-heterogeneous CAT + GTR + G, and both uncorrelated gamma (UGAM; Drummond et al., 2006) and autocorrelated log-normal (LN; Thorne et al., 1998;Kishino et al., 2001) clocks were implemented. Finally, fossil calibrations were translated into node age priors with the use of both uniform and Cauchy distributions (under default parameters), the latter of which account for the incompleteness of the fossil record by assuming that the most likely origination times occur at a distance from the minimum bound (dos Reis et al., 2015). While some methods have been developed to guide the selection of these parameters, exploring the sensitivity of results to a broad spectrum of conditions (even if some are suboptimal) can provide a better picture of the robustness of results to underlying assumptions. Furthermore, guidance methods can also be subject to their own assumptions. For example, CorrTest (Tao et al., 2019), an approach to select between alternative clock models, either supported or rejected the presence of autocorrelated rates depending on the species tree used from among those obtained under different methods of phylogenetic inference (see above).
The combination of these settings (loci sampled, model of evolution, type of clock, and node prior distribution) resulted in 40 analyses. For each, two runs were continued for 20,000 generations, after which the initial 25% was discarded and the chains thinned to every two generations (see loglikelihood trace plots in Appendix 3- figure 2). To explore the sensitivity of divergence times to these . Relationship between the root-to-tip variance (a proxy for the clock-likeness of loci) and the rate of evolution. The most clock-like loci (shown in red), which are often favored for the inference of divergence times (e.g., Smith et al., 2018;Carruthers et al., 2020), are among the most highly conserved and can provide little information for constraining node ages (see also Mongiardino Koch, 2021b). Clock-like genes with a higher information content were used instead by choosing the loci with the lowest root-to-tip variance from among those that were within one standard deviation from the mean evolutionary rate (shown in blue).
methodological decisions, 400 random chronograms were sampled from each analysis (200 from each run), and their node dates were subjected to bgPCA using package Morpho (Schlager, 2017) in the R statistical environment (R Development Core Team, 2019). bgPCA involves the use of PCA on the covariance matrix of group means, followed by the projection of original samples onto the obtained axes. The result is a multidimensional representation of divergence times -a chronospace -rotated so as to capture the distinctiveness of observations obtained under different settings. Separate bgPCAs were performed for each of the four factors explored, and the proportion of total variance explained by bgPC axes was taken as an estimate of the relative impact of these choices on divergence times. Finally, lineage-through-time plots were generated using ape (Paradis and Schliep, 2018). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. Tissue subsamples were finely chopped with a scalpel and preserved in RNAlater (Ambion) buffer solution for 1 day at 4°C to allow the RNAlater to effectively penetrate the tissues, followed by long-term storage at -80°C until RNA extraction. Total RNA was extracted from 1.5 mm Triple-Pure High Impact Zirconium beads (Benchmark Scientific) in Trizol (Ambion), using Direct-zol RNA Miniprep Kit (Zymo Research) with in-column DNase I incubation to remove genomic DNA. Prior to mRNA capture, total RNA concentration was estimated using Qubit RNA HS Assay Kit (Invitrogen; range = 8.33-96 ng/μl), and quality was assessed using either High Sensitivity RNA ScreenTape or RNA ScreenTape with an Agilent 4200 TapeStation. Mature mRNA was isolated from total RNA and libraries were prepared using KAPA mRNA HyperPrep Kit (KAPA Biosystems) following the manufacturer's instructions (including sample customization based on total RNA quantity and quality values), targeting an insert size circa 500 base pairs (bp), and using custom 10-nucleotide Illumina TruSeq style adapters (Faircloth and Glenn, 2012). Post-amplification, DNA concentration was estimated using Qubit dsDNA BR Assay Kit (Invitrogen; range = 6.06-13.4 ng/μl). Concentration, quality, and molecular weight distribution of libraries (range = 547-978 bp) were also assessed using Genomic DNA ScreenTape with an Agilent 4200 TapeStation. Ten libraries (including two annelids not employed here) were sequenced on one lane of a multiplexed run using NovaSeq S4 platform with 100 bp paired-end reads at the IGM Genomics Center (University of California San Diego). The assembled transcriptomes were sanitized from cross-contaminant reads product of multiplexed sequencing using CroCo v. 1.1 (Simion et al., 2018). These eight datasets, as well as two annelid transcriptomes not employed in this study but sequenced in the same lane, were incorporated. Transcripts considered over-or under-expressed across samples (as defined by default parameters) were kept. This resulted in an average removal of 1.26% of assembled transcripts (range: 0.4% for Aspidodiadema to 3.06% for Histocidaris).

Author contributions
Clypeaster japonicus, Encope emarginata, Leodia sexiesperforata, Peronella japonica. Eggs were collected from a single female specimen of each species and immediately placed into RNA later for preservation. The samples were left in RNAlater at 4°C for at least 1 day and then transferred to a -80°C freezer until RNA extraction. RNA extraction was performed using Trizol and was then treated with Ambion's Turbo DNA-free kit. RNA was quantified using both Nanodrop and an Agilent 2100 Bioanalyzer. Only RNA samples with an RNA integrity number (RIN) of >7 were used. RNA-Seq libraries were prepared using the NEB Ultra Directional kit using the standard protocol and multiplexed with 6 bp NEB multiplex primers. Paired-end 100 libraries were generated from the resultant libraries at the UC Berkeley Genome Center using an Illumina HighSeq 2500.
Bathysalenia phoinissa, Micropyga tuberculata: The material was collected by the deep-sea cruise BIOMAGLO (Corbari et al., 2017)  , with the financial support of the European Union (Xe FED). Specimens were sorted into taxonomic bins at class level and collectively fixed in 70% ethanol at room temperature. Following taxonomic identification, tissue samples were obtained using sterilized forceps and disposable scalpel blades. Tissue subsamples were collected in high-purity 96% ethanol and stored at −40°C until DNA extraction. Total genomic DNA was extracted using the DNeasy Blood and Tissue Kit (QIAGEN) following the manufacturer's instructions, complemented by the addition of 4 µl RNAse A (QIAGEN, 100 mg/ml) for RNA digestion. Elution buffer volume was reduced to 60 µl and the elution step repeated twice per sample in order to maximize DNA concentration and yield. After collection of the first elution, spin-column membranes were rinsed again twice with fresh elution buffer (40 µl) to further increase yield. Elutions were collected separately, with the first one used for sequencing and the second for quality control. For the latter, a portion of the mitochondrial cytochrome c oxidase subunit I was amplified using PCR. Amplifications were conducted with TopTaq DNA Polymerase (QIAGEN) using 1 μl of extracted genomic DNA (approx. 10-15 ng). Details on primers and protocols can be found in Bronstein et al., 2019. PCR products were visualized on a 1% agarose gel, purified using ExoSAP-IT (Affymetrix), and sequenced at Microsynth GmbH (Vienna, Austria) with the same primers. Sequences are deposited in GenBank under the accession numbers MZ568824 and MZ568825.
Prior to library preparation, total genomic DNA concentration was estimated using Qubit DNA BR Assay Kit (Invitrogen) in order to determine library preparation strategy. This step was carried out by Macrogen (Korea), using an Illumina TruSeq DNA PCR-free kit (for Micropyga) and an Illumina TruSeq Nano DNA kit (for Bathysalenia). Library preparation followed the manufacturer's instructions, targeting an insert size of 350 bp. The libraries were sequenced by Macrogen (Korea) on a shared lane of a multiplexed run using an Illumina HiSeq X instrument with 150 bp pairedend reads. Sequencing depth was 134.1 and 161.3 million reads for Bathysalenia and Micropyga, respectively. Pre-processing of Illumina shotgun data was carried out by removal of adapters using BBDuk (https://sourceforge.net/projects/bbmap/) with settings: minlen = 50 ktrim = r k = 23 mink = 11 tpe tbo; followed by quality trimming of reads using UrQt v. 1.0.18 (Modolo and Lerat, 2015), discarding regions with phred quality score <28 and sequences of size <50. Assembly of the data was carried out with MEGAHIT v. 1.1.2 (Li et al., 2015), in an iterative approach followed by assessment of assembly quality using BUSCO v. 3.0.2. scores (Seppey, 2019) using the metazoan dataset ODB v. 9 (Kriventseva et al., 2015).  (Smit et al., 2015) prior to gene prediction, using settings: -norna -xsmall; resulting in 3.82% and 2.9% masked bases for Bathysalenia and Micropyga, respectively. Gene prediction was carried out using AUGUSTUS v. 3.2.3 (Stanke et al., 2006) using settings: --strand= both --singlestrand= false --genemodel= partial --codingseq= on --sample = 0 --alternatives-from-sampling= false --exonnames= on --softmasking= on. This step relied on a training dataset composed of USCOs derived from the most recent S. purpuratus genome (Spur v. 5).
Echinothrix calamaris, Tripneustes gratilla. Tissue samples were taken from live sea urchins housed in laboratory aquaria and total RNA was immediately extracted from 150 mg of tissue using a PureLink RNA extraction kit (Invitrogen). The quality of total RNA was assessed on a BioAnalyzer 2100 (Agilent Technologies, Santa Clara, CA) to ensure a RIN > 9 for all samples. Mature mRNA was extracted from 1 µg of total RNA and cDNA libraries were constructed using a TruSeq kit (Illumina). Quality control of libraries was assessed on a BioAnalyzer and quantification measured using qPCR. NEBNext Multiplex adaptors were added via ligation, and the cDNA libraries were sequenced at Genome Quebec, McGill University. Three libraries were multiplex sequenced on one lane of Illumina at a concentration of 8 pM per cDNA library using HiSeq 2500 transcriptome sequencing to generate 125 bp paired-end reads. This resulted in approximately 80 million reads for each transcriptome.
Tetrapygus niger. Adult specimens were collected and transported alive to the University of Concepcion. Tissue samples (female gonad and tube feet) from one individual were finely chopped with a scalpel, and total RNA was extracted immediately using TriReagent (Sigma), following the manufacturer's instructions. Total RNA concentration was estimated using QuantiFluor RNA System (111.84-339.04 ng/µl) and quality was assessed using Agilent RNA 6000 Nano kit with an Agilent 2100 TapeStation (RIN: 5.4-8.4). cDNA libraries were prepared at Genoma Mayor SpA (Chile) using TruSeq Stranded mRNA Kit (Illumina) and sequenced on a multiplexed run using Illumina HiSeq 4000 platform with 150 paired-end reads, resulting in 54.1 M reads for gonad female and 52.7 M reads for tube feet, respectively. These two transcriptomic datasets were combined before performing all steps of bioinformatic processing.
Eucidaris metularia. A single specimen was preserved in RNAlater after being starved overnight and kept for 1 day at 4°C before long-term storage at -80°C. Total RNA was extracted using Directzol RNA Miniprep Kit (with in-column DNase treatment; Zymo Research) from Trizol. mRNA was isolated with Dynabeads mRNA Direct Micro Kit (Invitrogen). RNA concentration was estimated using Qubit RNA broad range assay kit, and quality was assessed using RNA ScreenTape with an Agilent 4200 TapeStation. Values were used to customize downstream protocols following manufacturers' instructions. Library preparation was performed with KAPA-Stranded RNA-Seq kits, targeting an insert size in the range of 200-300 bp. The quality and concentration of libraries were assessed using DNA ScreenTape. The library was sequenced in a multiplexed pair-end runs using Illumina HiSeq 4000 with seven other libraries in the same lane (all previously published in Mongiardino Koch et al., 2018). In order to minimize read crossover, 10 bp sequence tags designed to be robust to indel and substitution errors were employed (Faircloth and Glenn, 2012).
The assembled transcriptome was filtered from cross-contaminant reads product of multiplexed sequencing using CroCo v. 1.1 (Simion et al., 2018). Seven other transcriptomes sequenced together were also employed, and results of this step are reported in Fig. S1 of Mongiardino Koch and Thompson, 2021d. Appendix 1-table 1. Transcriptomic/genomic datasets added in this study relative to the taxon sampling of Mongiardino Koch et al., 2018 andThompson, 2021d. Taxa with citations were taken from the literature, and details can be found in the corresponding studies and associated NCBI BioProjects. Taxa are shown in alphabetical order; those identified with 'OG' are outgroup taxa (i.e., non-echinoids). Voucher specimens are deposited at the Benthic Invertebrate Collection, Scripps Institution of Oceanography (SIO-BIC), and the Echinoderm Collection, Muséum National d'Histoire Naturelle (MNHN-IE). If multiple accession numbers are shown for a given taxon, these datasets were merged before assembly. Similar details for all other specimens can be found in Mongiardino Koch et al., 2018 andThompson, 2021d. Appendix 1-table 2. Details of molecular datasets and supermatrix. Statistics for raw reads and assemblies are shown for datasets incorporated in this study relative to the sampling of Mongiardino Koch et al., 2018 andThompson, 2021d, where similar statistics can be found for the other datasets. Taxa are shown in alphabetical order; those identified with 'OG' are outgroup taxa (i.e., non-echinoids). Novel datasets correspond to Illumina pair-end transcriptomes, except for two draft genomes (Bathysalenia phoinissa and Micropyga tuberculata). Mean insert size is expressed in number of base pairs. For transcriptomes, read pairs (initial) shows numbers input into Agalma (Dunn et al., 2013), that is, after processing with Trimmomatic (Bolger et al., 2014) or UrQt (Modolo and Lerat, 2015), while read pairs (retained) show those that passed the Agalma curation checks (including ribosomal, adapter, quality, and compositional filters), and represent the final number of read pairs used for assembly. For genomes, see information in the bioinformatic details above. Assemblies were further sanitized with either alien_index (Ryan, 2014) alone or in combination with CroCo (Simion et al., 2018), and the Appendix 1-table 1 Continued number of assembled transcripts/contigs shows the size of datasets after these curation steps. The number of loci shows the occupancy of terminals in the supermatrix output by Agalma (1346 loci at 70% occupancy), after which loci were further removed by TreeShrink (Mai and Mirarab, 2018), resulting in the final occupancy scores.  Appendix 3-figure 1. Ordering of loci enforced using genesortR (Mongiardino Koch, 2021b) and its relationship to the seven gene properties employed. High ranking loci (i.e., the most phylogenetically useful) show low root-to-tip variances (or high clock-likeness), low saturation, and low compositional heterogeneity, as well as high average bootstrap and Robinson-Foulds similarity to a target topology (in this case, with the contentious relationship among major lineages of Echinacea collapsed).