What to compare and how: Comparative transcriptomics for Evo‐Devo

ABSTRACT Evolutionary developmental biology has grown historically from the capacity to relate patterns of evolution in anatomy to patterns of evolution of expression of specific genes, whether between very distantly related species, or very closely related species or populations. Scaling up such studies by taking advantage of modern transcriptomics brings promising improvements, allowing us to estimate the overall impact and molecular mechanisms of convergence, constraint or innovation in anatomy and development. But it also presents major challenges, including the computational definitions of anatomical homology and of organ function, the criteria for the comparison of developmental stages, the annotation of transcriptomics data to proper anatomical and developmental terms, and the statistical methods to compare transcriptomic data between species to highlight significant conservation or changes. In this article, we review these challenges, and the ongoing efforts to address them, which are emerging from bioinformatics work on ontologies, evolutionary statistics, and data curation, with a focus on their implementation in the context of the development of our database Bgee (http://bgee.org). J. Exp. Zool. (Mol. Dev. Evol.) 324B: 372–382, 2015. © 2015 The Authors. J. Exp. Zool. (Mol. Dev. Evol.) published by Wiley Periodicals, Inc.

as in situ hybridization or quantitative RT-PCR. With microarrays, then RNA-seq, it has become possible to study expression patterns on a genome-wide scale. Yet, application of these techniques to questions from Evo-Devo has lagged somewhat behind other fields. This stands in contrast to the rapid and productive use that was made of the increasingly available genome sequences, illustrated for example by the in-depth study of the evolution of Hox genes (Hoegg and Meyer, 2005), or enhancers sequences (Holland et al., 2008).
Although there has been recent progress in the comparison of transcriptomes of closely related species (Gallego Romero et al., 2012;Necsulea and Kaessmann, 2014) the difficulties in answering many of the questions posed in Evo-Devo remains. How do changes in expression patterns translate to changes in the function of homologous organs? What is the molecular basis for morphological innovation? What can deep conservation of expression tell us about homology relations between distant species? How much of transcriptome evolution is convergent versus parallel? Here, we discuss some of these issues and emerging solutions. We believe that the next years will be a very fruitful time for the application of transcriptomics to Evo-Devo, and that this will yield important insights into the evolution of development and of anatomy.

WHAT TO COMPARE? THE PROBLEM OF ANATOMICAL HOMOLOGY AND FUNCTIONAL EQUIVALENCE
A major problem in comparing expression patterns on a large scale between species is the definition of what is comparable. In comparative genomics, orthologous genes can be readily identified by sequence comparison, and reliable data are now available from numerous databases (Sonnhammer et al., 2014). But expression patterns are much more complex combinations of genes and of anatomical and temporal patterns. Thus criteria to guide the comparison between these structures are necessary.
The most obvious comparison criterion is homology, as is used for the genes themselves. Homology is most commonly taken in the sense of "historical homology," a definition in which the common evolutionary origin has to derive from common descent, that is, the common ancestor of both species possessed the anatomical structure, and both present day species being compared inherited it from this ancestor. But the interpretations of the homology concept have changed with progresses in biology, and different subfields operate with different definitions . Notably, the "biological homology" concept considers organs homologous if they share a set of developmental constraints (Wagner,'89). This processoriented and more mechanistic definition, mostly used in Evo-Devo, encompasses repeated parts in the same organism (e.g., somites), as well as sexually differentiated parts of individuals of the same species (e.g., testis and ovaries), which do not pass the common ancestry criterion, yet are important to our understanding of the relation between development and the evolution of form.
For transcriptome comparisons between very closely related species, such distinctions are usually not an issue: homologous structures are mostly obvious and comply with most definitions of the homology concept. For example, in comparisons among Great Apes, such as between humans and chimpanzees [see Gallego Romero et al. (2012) for a review], it is mostly trivial to accept that structures with the same name are homologous, and that moreover they essentially perform the same function. The situation is similar for the comparison of domesticated plants and their wild relatives (Koenig et al., 2013), or even two slightly more divergent C3 and C4 Cleomaceae species (Kulahoglu et al., 2014). In the latter case, there is for example little doubt that the sepals or petals are homologous over the ca. 10-20 million years of divergence investigated.
But when more distant species are compared, which anatomical structures should be compared becomes less obvious. Moreover, it is not always clear whether the most biologically relevant comparisons should involve homologous or functionally equivalent structures. Indeed, some homologies are well established, but correspond to organs which are widely divergent in function and structure [e.g., tetrapod lung and actinopterygian fish swim bladder (Zheng et al., 2011)]. For other structures, such as the vomeronasal organ in different tetrapods, or the ovary between teleost fishes and mammals, similarity of function is clear but homology is debated [for a discussion see Niknejad et al. (2012)]. Similarly, the homology of segmentation mechanisms among bilaterian lineages remains unclear (De Robertis, 2008).
To take into account the complexity of homology on a large scale, we have been developing computational representations of homology, through ontologies (Parmentier et al., 2010;Roux and Robinson-Rechavi, 2010;Niknejad et al., 2012), and using them in the annotation of transcriptome data to anatomical information. The distribution of the ontologies (Noy et al., 2009), and the recent merger of our efforts into Uberon (Mungall et al., 2012;Haendel et al., 2014), allows any comparative transcriptomics or comparative functional genomics project with an interest in homology to built upon this work (Robinson and Webber, 2014).
It is worth noting that the comparison of homologous organs may not be the relevant criterion if the aim is to detect sets of genes whose expression is responsible for the emergence or maintenance of particular functions (Bolker, 2014). It is generally assumed that the expression of core genes responsible for conserved functions should be itself conserved (Levine and Davidson, 2005). But in homologous structures, the signal of functional constraint might be difficult to characterize from the background similarity in gene expression due to their common origin. The task of isolating the common molecular signals responsible for functional similarity could thus be facilitated by comparing non-homologous organs with similarity in function, for example, tetrapod lung versus fish gills, both used as respiratory organs, or structures used in pregnancies of female mammals versus male Syngnathids (seahorses and pipefishes) (St€ olting and Wilson, 2007). Homologous organs that present functions that were not inherited from the common ancestor but have evolved in parallel can also be used, for example, body parts of cichlid fishes which evolved in parallel in African or Central American lakes to perform the same functions, in species that colonized similar ecological niches (Elmer and Meyer, 2011). In this context many studies have successfully related patterns of phenotypic convergence to patterns of gene expression, at the scale of a few genes.
Between closely related species, it was notably shown that independent genetic changes could be responsible for regulatory changes, often of homologous genes, leading to the parallel evolution of similar phenotypes (Martin and Orgogozo, 2013). Examples include the evolution of black spots on wings of different Drosophila species, involving the regulation of the yellow gene (Gompel et al., 2005;Prud'homme et al., 2006), or the parallel evolution of opsins expression affecting color perception in cichlids O'Quin et al., 2010). Between marine and freshwater populations of the stickleback fish, differences in the number of armor plates have been associated to an increased linkage disequilibrium within intron 2 of the EDA gene (Colosimo et al., 2005), likely to influence the regulation of this signaling gene between the two morphs (Knecht et al., 2007). A recent genome-wide study confirmed this finding, and found that across all genomic variants implicated in the repeated adaptations of marine populations to freshwater, regulatory changes likely played a substantially more prominent role than coding changes (Jones et al., 2012). Interestingly, parallel patterns of regulatory changes in the same genes in different species or populations are often caused by nucleotide substitutions affecting the same cis-regulatory elements (Elmer and Meyer, 2011).
There are still few transcriptome-wide studies of convergent anatomical structures, but recent work highlights the diversity of signatures underlying phenotypic convergence. For example, the convergent evolution of bacterial photophores of two squid species extends beyond anatomy, to highly convergent transcriptome profiles (Pankey et al., 2014). Whereas the convergent evolution of external genitalia in different amniotes is mirrored in patterns of transcriptomes which relate more to the tissue of origin of these new structures than to the evolution of their functional equivalence (Tschopp et al., 2014). It remains to be seen which case is more general: higher transcriptome similarity between convergent structures, or higher transcriptome similarity between homologous (yet functionally divergent) structures. And it remains to be seen which factors could lead to deep reorganization of transcriptomes with the evolution of new functions.
At larger evolutionary distances, such as between insects and vertebrates, there are few unambiguous homologous anatomical structures between species, and most shared functions are the result of convergence or parallel evolution. But fascinating, and unexpected, examples of gene expression patterns similarity were uncovered by Evo-Devo studies. The most famous example is probably the case of vertebrate and arthropods eyes, which are not homologous, but express similar transcription factors during their developmental cascade [including Pax6 (Halder et al.,'95)]. The concept of "deep homology" (Shubin et al., 2009) has been proposed to describe the relation between structures which share some homology in their development, notably in their key patterning genes, also share similarity of function and of structure, yet are not homologous at the anatomical level, certainly not in the sense of historical homology. Deep homology is formalized in our HOM ontology ) (http://purl.obolibrary.org/obo/HOM_0000044) as a special case of parallelism, which like convergence is a case of homoplasy. Other examples illustrate such a co-option of common, reusable toolkits, such as the developmental program of arthropod and vertebrate appendages, implicating the transcription factor Dll/Dlx (Nielsen and Martinez, 2003). Interestingly, this gene is also expressed in the distal tips of "outgrowths" in diverse animals, for example, the horns of beetles (Moczek, 2006). Interestingly, it seems that some orthologous loci behave as mutational "hotspots" for parallel quantitative variation of phenotypes, over phylogenetic distances spanning populations to phyla (e.g., insects and nematodes, or mollusks and vertebrates) (Martin and Orgogozo, 2013).
Conversely, convergent or parallel phenotypes can evolve with non-homologous gene expression. For example, the convergent evolution of coat color in different subspecies of beach mice probably evolved through changes in expression of different genes (Manceau et al., 2010). More striking, homologous structures are not always patterned by conserved expression patterns. For example, the vulva of different species of nematodes are patterned by non-homologous pathways (Schlager et al., 2006) and the proteins expressed in the crystalline lens of different vertebrate species are unrelated (Piatigorsky and Wistow,'91).
Overall, the correspondence between homology at the levels of gene expression and of anatomy is complex (Hall,'94), yet with proper experimental design can be very informative. Genes can be co-opted to conduct the same function in non-homologous structures, while vastly divergent expression can underlie homologous structures. Determining which anatomical comparisons are likely to be informative in the comparison of transcriptomes between species is not easy, especially between very distant species, and requires good anatomical and developmental knowledge. A challenge of future Evo-Devo transcriptomics is to integrate such knowledge with genome-scale studies to provide relevant insight into the relation between the evolution of organ function and homology.

WHAT TO COMPARE? THE PROBLEM OF HETEROCHRONY
For developmental processes, as for anatomy, we need to define what is comparable if we are to perform meaningful transcriptome comparisons. But unlike for anatomical structures, there are no clear homology relations between developmental stages. This is mostly because of heterochrony-differences in the relative timing of developing organs between species. For example, the human and rat developmental stages at which heart reaches the same developmental milestone might differ, and other structures in these two species can reach their developmental milestones in a different order [detailed in Jeffery et al. (2002)]. Thus, even a pair of very similar developmental stages between two species will usually include some organs or tissues whose development is more advanced in one species than the other.
Accordingly, heterochrony patterns are recovered in transcriptome data. For example, clear examples of timing differences in the expression of several gene pathways were found in a whole body comparison of closely related Xenopus species (Yanai et al., 2011). In a more anatomically targeted study focusing on the development of human, chimpanzee and rhesus macaque prefrontal cortex, as many as 71% of genes were found to change in timing of expression (Somel et al., 2009). In both of these examples, closely related species were compared. For more distant species, very broad developmental periods can be compared, to diminish the impact of heterochrony and ensure some level of comparability. For example, in Comte et al. (2010), we calculated the conservation of expression of orthologous genes between zebrafish and mouse for seven broad developmental stages, which lacked fine resolution, but was sufficient to detect significant differences between early development (up to neurula) and late development. These broad bilaterian "metastages" are available in our database of gene expression evolution Bgee (Bastian et al., 2008). Alternatively, transcriptome data themselves can be used to decide which developmental time points should be compared. The expression levels of genes possess information on the temporal order of samples from different developmental stages (Anavy et al., 2014;Snoek et al., 2014). It is thus possible to perform all-against-all comparisons of transcriptome time series from pairs of species, to align their developmental stages and define the "most comparable" ones. This approach has allowed to identify interesting patterns of expression conservation or divergence over ontogeny (Parikh et al., 2010;Irie and Kuratani, 2011), notably a minimum temporal divergence between Drosophila species at mid development (Kalinka et al., 2010;Gerstein et al., 2014;Li et al., 2014).

CAN WE COMPARE LEVELS OF EXPRESSION BETWEEN SPECIES ON A LARGE-SCALE?
In classical small-scale Evo-Devo studies, gene expression patterns have been typically compared in terms of presence or absence (Carroll, 2005). Yet expression levels can be key in understanding changes in ontogeny. This is for example illustrated by patterning genes whose expression levels influence developmental phenotypes-for example, Ubx for the formation of Drosophila halteres (de Navas et al., 2011)-but also by gradient genes whose expression levels control growth and differentiation during development-for example, Dpp for Drosophila eye morphogenesis (Wartlick et al., 2014), retinoic acid and fibroblast growth factor signaling for vertebrate neurogenesis (Pera et al., 2014), or KNOX genes for tomato ripening (Nadakuduti et al., 2014).
Such changes in expression levels have been shown to be involved in the emergence of developmental innovations during evolution. For example, differences in metamorphosis among vertebrates, including heterochronic shifts, seem to be driven in large part by changes in the expression level of thyroid hormone receptors (Laudet, 2011). Other examples include floral pigmentation differences between plant species, determined in part by changes in expression level of genes from the anthocyanin pathway (Sobel and Streisfeld, 2013), wing pigmentation differences across Drosophila species, controlled by the interplay between the expression levels of the yellow, ebony, and Dll genes (Wittkopp et al., 2002;Arnoult et al., 2013), and the fin-to-limb transition in evolution, involving changes in the complex interactions between Hox, Shh, Fgf8, and Wnt3a signals (Yano and Tamura, 2013).
Thus it is clear that the level of gene expression, and its evolution, plays a role in morphogenesis and its evolution. To understand the extent of this role, large-scale quantitative comparisons are necessary. A comparison of expression levels in mammalian organs identified numerous modules of genes with conserved tissue-specific expression, but whose absolute expression levels shifted in particular taxonomic groups, for example, primates (Brawand et al., 2011;Necsulea and Kaessmann, 2014). And the comparison of developmental transcriptomes of Xenopus species revealed that the dominant mode of divergence between species was through changes in levels of expression, which occurred at a much higher frequency than heterochronic shifts in expression (Yanai et al., 2011). It is thus relevant to consider expression levels as a primary evolving phenotype.
Microarrays allowed the first large-scale studies of the evolution of gene expression between species, but their analysis in an evolutionary context poses specific problems. Most microarray datasets whose comparison would be interesting were generated by different studies carried out separately in different species. Absolute expression levels are thus difficult to compare directly in such cases because they suffer from important batch effects. There can also be large biases in signal between microarray platforms and technologies used in different species, for example because the hybridization strength depends on probes and on gene sequence composition. Thus, to detect conserved or divergent levels of expression between species, post-processing approaches are often used. It is for example possible to compare lists of co-expressed genes, or modules, derived individually in each species (Bergmann et al., 2003;Stuart et al., 2003;Lu et al., 2009;Piasecka et al., 2012a). It is also possible to compare lists of differentially expressed genes between different conditions in each species, or to compare lists of functional categories of genes enriched for differentially expressed genes in each species (Lu et al., 2009).
Correlative methods have also been used to compare the hybridization signals across species. Gene expression is considered conserved if the orthologs are strongly correlated across different conditions or tissues between species. The most commonly used methods are Pearson or Spearman correlations and Euclidian distance (Su et al., 2004;Liao and Zhang, 2006;Xing et al., 2007;Parikh et al., 2010). However, the results of these approaches have been shown to differ greatly when different measures or data normalization schemes are used, and they behave differently for tissue-specific genes or broadly expressed (housekeeping) genes (Piasecka et al., 2012b), which complicates their interpretation. Additionally, correlative methods rarely take into account the variability of expression levels within species.
Direct comparing of hybridization signal is more legitimate when technical problems of microarrays in comparative studies are minimized or circumvented, for example, when the hybridization of samples from different species is performed within the same study, on the same microarray platform (Ranz et al., 2003;Rifkin et al., 2003;Nuzhdin et al., 2004;Ometto et al., 2011). Unfortunately, such an approach only works for the comparison of very closely related species where the hybridization of cDNAs on probes of the microarray is not perturbed by the low nucleotide divergence in the sequence of orthologous genes (Lu et al., 2009). Otherwise, probes that do not match exactly all orthologous target genes have to be ignored (Khaitovich et al., 2005;Kalinka et al., 2010). Customized multi-species microarrays have also been designed to eliminate the sequence mismatch effects. Samples from two species are competitively hybridized to the probes and intensity ratios are estimated by averaging signal on probes from the different species (Gilad et al., 2005;Vallee et al., 2006;Oshlack et al., 2007). However these approaches are dependent on having a high quality genome annotation for each species in order to locate sequence differences and to account for their potential effect.
RNA-seq is now allowing major progresses in describing gene expression variation between species. This technique has a larger dynamic range than microarrays, and can also be used to study differences in exon usage and alternative splicing (Gallego Romero et al., 2012). Importantly, it allows the study of nonmodel species in the absence of a sequenced genome (Grabherr et al., 2011;Perry et al., 2012), or when the genome sequence is of poor quality. The advantages of RNA-seq allow more straightforward direct comparisons of expression levels between species, and interesting insights have been provided by the first evolutionary studies using this technology (Blekhman et al., 2010;Brawand et al., 2011;Perry et al., 2012;Khan et al., 2013).
When expression levels are directly compared across species, heuristic approaches have mostly been used to identify patterns of conservation or of directional selection (Fraser, 2011). Expression divergence between species is compared to the divergence within biological replicates of the same species. This procedure is similar to the McDonald and Kreitman test (MK test) of positive selection, widely used to characterize selection on protein-coding sequences (McDonald and Kreitman,'91). Genes under stabilizing selection display limited expression variation both within and between species, whereas genes under directional selection in a given lineage display a shift in mean expression between species, while maintaining a low variation within species. Of note, alternative explanations for the latter pattern are possible, such as a relaxation of selective constraints, although this scenario should be accompanied with an increase of within species variation. Progress has recently been made towards the development of formal models to test for positive selection on expression levels (Rohlfs et al., 2014). Still, these approaches cannot control for differences in environmental input to gene regulation between species (Gallego Romero et al., 2012).
Even though many technological challenges have been overcome in the past years, the comparison of expression levels between species holds numerous remaining difficulties (Dunn et al., 2013), particularly when the species compared are distant. First, expression levels measured in a given tissue are a mix of expression levels of the cell types constituting this tissue. Tissues of different species often display differences in cellular composition, notably in the proportion and localization of different cell types. This factor can result in artificial differences in expression levels and bias the comparison between species (Pantalacci and Semon, 2014). For example, brain, blood or pancreatic samples were shown to typically vary substantially in their cellular composition between even closely related species (Hill and Walsh, 2005;Magalhaes et al., 2010;Steiner et al., 2010).
Second, a basic source of measurement noise both in microarray and RNA-seq transcriptomic studies comes from alternative splicing (Barbosa-Morais et al., 2012;Merkin et al., 2012;McManus et al., 2014). Alternative-splicing patterns have been shown to evolve fast, and in consequence to be mostly species-specific. A difference of the isoforms expressed between species can affect the measure of expression level because of differences in length (e.g., longer transcripts produce more reads in RNA-seq) and in exon structure (e.g., microarray probes may be designed to bind only some exons). Although RNA-seq facilitates the reconstruction of the pool of all isoforms for each gene, which then could ideally be examined in isolation, this task still proves to be very challenging (Hayer et al., 2014).
Third, the heterogeneity of annotation may bias expression comparison between species. Gene models for orthologs may contain fragments that are not orthologous. A solution to this problem is to carefully align the nucleotide sequences of orthologs sets to compare, and restrict further analysis only to fragments that are strictly orthologous (Blekhman et al., 2010;Brawand et al., 2011). The difference in the genomic context of orthologous genes between species can also be an issue. Different genomes contain different numbers of paralogous genes and pseudogenes, which may both be sources of spurious reads or of unspecific hybridization background. This challenges the comparison of genomic regions whose sequences are unique in the genome of some species but not others, leading to differential mappability of RNA-seq reads.
Although several recent papers illustrate the power of comparative RNA-seq and other functional genomics approaches between close species, and even try to extend them to the comparison of distant species (Boyle et al., 2014;Chen et al., 2014;Gerstein et al., 2014;Li et al., 2014), these studies remain limited in the anatomical and developmental complexity which is covered. In consequence their applications to fundamental Evo-Devo questions remains limited. A notable study focused on the evolution of gene expression in distant Dictyostelium species, which was facilitated by the conserved morphology of these species (Parikh et al., 2010). But to link functional genomics to more fine questions of Evo-Devo, more detailed transcriptome profiles are needed, which will in turn raise the issues of homology and heterochrony which have so far been circumvented.

DATA SETS FOR COMPARATIVE TRANSCRIPTOMICS ARE INCREASINGLY AVAILABLE AND INTEGRATED
Transcriptomics data, whether from microarrays or RNA-seq experiments, is increasing exponentially in public databases (Rustici et al., 2013). Although it is mostly generated to answer biomedical and other practical questions, it covers an increasing number of organs, tissues, developmental stages, and species, of interest for Evo-Devo studies. The challenge is to identify, recover and organize the relevant data among the tens of thousands of tumor and cell line samples. While there are several ongoing efforts to organize transcriptome data [notably the Gene Expression Atlas (Kapushesky et al., 2010) or the Gene Expression Barcode (McCall et al., 2014); see also Rung and Brazma (2013)], we will focus here on our database Bgee (http://bgee.org) (Bastian et al., 2008), which is the only effort to our knowledge which focuses on Evo-Devo concerns: inter-species comparisons using anatomical homology, detailed developmental stage annotation, and present/absent calls for different expression data types. We provide an overview of some major transcriptomic datasets of interest for Evo-Devo, whether they are already integrated into Bgee or will be in the near future.
All microarray and RNA-seq datasets come from public repositories such Gene Expression Omnibus (GEO) (Barrett et al., 2013) or ArrayExpress (Rustici et al., 2013) and are carefully annotated by Bgee curators to appropriate anatomical structures and developmental stages classes of the ontologies (Haendel et al., 2014). All selected datasets and samples were collected in normal (non-treated) conditions, and from wild-type animals. This latter condition is critical for evolutionary studies, where expression should be as much as possible relevant to the evolutionary history and wild-type selective pressures experienced by the species compared.
It is clear that the integration of datasets for non-model species obtained with RNA-seq is much easier than equivalent microarray datasets prepared using different technologies and often even with in-house designed platforms. Such data, from diverse species, is increasingly available, and much of it is not yet integrated into Bgee. For example, a newly emerging source of transcriptomic data from many primate species is the non-human primate reference transcriptome resource (NHPRTR). Their full RNA-seq dataset consists of 157 libraries across 14 species or subspecies: Chimpanzee (P. Squirrel monkey (Saimiri sp.), across up to 14 different tissues (Pipes et al., 2013).
Additionally, the ENCODE and modENCODE consortia have recently released large amounts of RNA-seq data from mouse, human, worm and fly, sampling numerous developmental stages, tissues and cell types, many of them highly relevant to Evo-Devo studies. Notably, more than 140 RNA-seq worm samples and 250 fly samples cover their development at great resolution (35 developmental stages for worm and 30 developmental stages for fly) (Graveley et al., 2011;Gerstein et al., 2014;Li et al., 2014). Although the majority of the human transcriptomic data from the ENCODE project comes from cancer or immortalized cell lines, which limits its use for evolutionary analysis, more than 200 human RNA-seq libraries were recently used along with worm and fly data for the identification of evolutionary conserved transcription modules between these distantly related species (Gerstein et al., 2014).
On a final note, while RNA-seq and microarrays provide genome-wide information, they often lack the fine resolution of in situ hybridizations, which are mostly small-scale. Yet, there exist a few large-scale projects aimed at generating in situ hybridizations for every gene of a species at different developmental time points, for example for mouse (Diez-Roux et al., 2011) or for zebrafish (Thisse et al., 2004). Apart from these systematic efforts, a collection of many small-scale studies can sum up to a large valuable resource, as reflected by the success of the Gene Ontology functional annotations (du Plessis et al., 2011). The abundance of in situ hybridizations integrated from published studies now provides an almost genomic overview of expression patterns in some species (Kassahn et al., 2009). Bgee integrates in situ hybridization data from four databases: BDGP for fruit fly (Hammonds et al., 2013), MGI for mouse (Smith et al., 2014), Xenbase for frog (Bowes et al., 2010), and ZFIN for zebrafish (Bradford et al., 2011). Thus it includes information from tens of thousands of independent in situ hybridization experiments. Unlike for microarrays and RNA-seq data, the primary extraction and annotation of information from papers is done by the curators of each of these respective databases, while Bgee provides mapping to the bilaterian anatomy ontology Uberon (Haendel et al., 2014), and to ontologies of developmental stages. These data cover anatomical complexity in an extremely detailed manner.
Thus overall there exist abundant transcriptome and gene expression data, which are relevant to the questions posed by Evo-Devo, and modern database efforts make these data increasingly available for genome-wide investigation into evolutionary developmental biology.

FUTURE DIRECTIONS
A challenge as well as an opportunity of comparative transcriptomics for Evo-Devo is relating large scale quantitative trends with morphological observations. The main limitation of comparative approaches so far is that, while it is relatively easy to establish significant conservation of expression, it is very difficult at a large scale to characterize significant changes of evolutionary relevance. Expression can differ because of many experimental reasons, from environmental conditions to measurement errors, as well as because of a lack of stabilizing selection. Yet it can also differ because of developmental innovations, convergent evolution, or other evolutionary scenarios of interest to understanding morphological variety. In the absence of a good baseline, similar to the synonymous substitution rate of protein-coding gene codon models (Yang, 2006), calling such relevant changes in transcriptomic studies remains a major challenge. In the end, the evolutionary scenarios that are the most difficult to unveil are often the most fascinating, which motivates further methodological work into evolutionary transcriptomics.
Another important challenge is establishing causality: mechanistic causality, as in determining which gene expression changes determine changes in morphology; and evolutionary causality, as in determining which expression patterns are fixed for their role in morphological innovations or constraints. For this, we will probably need to obtain more transcriptomic data of variation both within and between species, and combine it with other functional genomics information, such as transcription factor binding patterns (ChIP-seq) and large-scale mutant screens.
Despite the challenges and limitations, we feel that we are at a privileged moment in the long pursuit of the goal of linking genome evolution to the evolution of form, with its constraints and its adaptive roles. First, the progress of RNA-seq is unlocking the function of the genomes of many species beyond the classical model organisms. Second, new transcriptomics techniques are arriving which promise to combine genomic scale with anatomical precision. The development of single-cell RNA-seq has opened a wide array of possibilities for the study of the transcriptomics signatures of precise sets of cells extracted from developing organs (Tang et al., 2011). For example it has been recently used to study the evolutionary history of the endoderm germ layer . There has been recent progress towards retrieving full transcriptomes of individual cells in situ, which proves particularly useful to study rare cell populations without compromising the quality of their transcriptomes (Battich et al., 2013;Avital et al., 2014;Lee et al., 2014;Lovatt et al., 2014;Crosetto et al., 2015). Finally, the improvement of ontologies and bioinformatics methods allow us to increasingly take advantage of these genomic datasets to answer long standing questions of Evo-Devo.