1. Duncan T Odom (corresponding author)
  1. University of Cambridge, United Kingdom

There is an old saying in computational circles that researchers in bioinformatics would rather use someone else’s toothbrush than use someone else’s code. One example of this adage being true can be seen in previous attempts to compare the rates at which differences in the mechanisms that control DNA accumulate in different species and lineages.

The information contained in DNA is first accessed by dedicated proteins called transcription factors (TFs) that bind to preferred sequences of bases in the DNA. These sequences are typically short, between 8 and 20 bases in length (Vaquerizas et al., 2009), although some can be as long as 35 bases (Filippova et al., 1996). After transcription factor binding has taken place, the basal transcription machinery and its associated complexes open the region’s chromatin and begin transcribing DNA into RNA. These crude transcripts must undergo extensive processing and maturation before they can be exported to the cytoplasm as mature messenger RNA (mRNA). Understanding the rate at which all these steps (notably transcription factor binding and the production of mRNA) change during evolution is a long-standing goal in genetics (Wray, 2007; Wittkopp and Kalay, 2012).

Technically, it is (relatively) easy to map all the contacts between the transcription factors and the DNA, and also to map all the mRNA molecules, in a biological sample using high-throughput sequencing technologies. A number of research groups have compared the amount of transcription factor binding in many species of flies and mammals (He et al., 2011; Paris et al., 2013; Schmidt et al., 2010; Ballester et al., 2014). Based on this work it seemed as if transcription factor binding evolved rapidly in mammalian tissues (Weirauch and Hughes, 2010), but only very slowly in fruit flies (He et al., 2011). However, it can be difficult to compare the first results generated in an entirely novel field of study because different groups often use very different approaches. And in this case this difficulty is further compounded by the toothbrush issue.

Now, in eLife, Trey Ideker and colleagues at the University of California San Diego – including Anne-Ruxandra Carvunis, Tina Wang and Dylan Skola as joint first authors – report that they used a new analysis pipeline to study the raw data for more than 25 species of complex eukaryotes across three animal lineages (mammals, birds and insects) that previously had only been studied in isolation (Carvunis et al., 2015). In other words, they have cleaned everyone’s teeth with the same toothbrush. Moreover, their pipeline could be tweaked to vary the analysis parameters for all the datasets across three lineages at once, thus allowing them to make like-with-like comparisons.

This intellectual scrubbing resulted in two major insights. First, it appears that transcription factor binding (which dictates the function of the genome) and mRNA both evolve at a shared (and perhaps even fundamental) rate in complex eukaryotes. This result is somewhat surprising since most evolutionary geneticists think that the mechanisms that influence genome or functional evolution for the lineages studied by Carvunis et al. are radically different.

Second, particularly in mammals, the evolution of the genome sequence en masse is much more rapid than the evolution of transcription factor binding and transcription. This disconnect may be linked to the instability of the large number of largely silent repeat elements in mammalian genomes, and/or to the fact that insects and birds have more stable genomes.

Moreover, Carvunis et al. have powerfully demonstrated why it is important for all of us in the functional genomics community to meticulously curate our raw data and to make it readily available for others to analyse. None of the insights reported in this work would have been possible without easy access to carefully annotated sequencing reads from the original studies.

Article and author information

Author details

  1. Duncan T Odom, Reviewing Editor

    Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, United Kingdom
    For correspondence
    Duncan.Odom@cruk.cam.ac.uk
    Competing interests
    The author declares that no competing interests exist.

Publication history

  1. Version of Record published: February 11, 2016 (version 1)

Copyright

© 2016, Odom

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

  1. Duncan T Odom (2016) Comparative Genomics: One for all. eLife 5:e14150. https://doi.org/10.7554/eLife.14150