Breakpoint graphs and ancestral genome reconstructions

  Max A. Alekseyev and Pavel A. Pevzner¹
  Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0404, USA

    Abstract

    Recently completed whole-genome sequencing projects marked the transition from gene-based phylogenetic studies to phylogenomic analysis of entire genomes. We developed an algorithm, MGRA, for reconstructing ancestral genomes and used it to study the rearrangement history of seven mammalian genomes: human, chimpanzee, macaque, mouse, rat, dog, and opossum. MGRA relies on the notion of multiple breakpoint graphs to overcome some limitations of existing approaches to ancestral genome reconstruction. MGRA also generates rearrangement-based characters that guide phylogenetic tree reconstruction when the phylogeny is unknown.

    Footnotes

    • 1 Corresponding author.

      E-mail ppevzner{at}cs.ucsd.edu; fax (858) 534-7029.

    • 2 Breakpoint graphs are a popular technique for rearrangement analysis because they reveal pairs of breakpoints that represent footprints of rearrangement events. See chapter 10 of Pevzner (2000) for background information on genome rearrangements and breakpoint graphs.

    • 3 In this study, we use the term “reversal” (common in bioinformatics literature) instead of the term “inversion” (common in biology literature). For circular chromosomes, fusions and translocations are not distinguishable, that is, every fusion of circular chromosomes can be viewed as a translocation, and vice versa.

    • 4 Detailed information about synteny blocks and assembly builds is provided in the Supplemental material. Out of 1360 synteny blocks (kindly provided by Jian Ma), three represent intermixed segments of chromosome X and other chromosomes (mouse chromosome 7 and rat chromosomes 15 and 20). Since these blocks are short (16, 47, and 17 kb, respectively), we have discarded them to simplify the chromosome X analysis below. For better illustration of the breakpoint graphs, the vertex ∞ is shown in multiple copies as black dots, each connected by a single multi-edge to a regular vertex.

    • 5 Switching from black rearrangements to a mixture of black and green rearrangements is a simple but powerful paradigm that proved to be useful in previous studies (Bafna and Pevzner 1998; Tannier and Sagot 2004).

    • 6 The use of Graphic-consistent 2-breaks here is motivated by an important property: every Graphic-consistent transformation can be turned into a strict Graphic-consistent transformation by changing the order of its 2-breaks. Therefore, we do not directly address the strictness requirement in MGRA, which first produces a Graphic-consistent transformation of the genomes P1, P2, …, Pk into the genome X and then reorders it into a strict transformation.

    • 7 While this representation is not unique, all these representations are equivalent (i.e., they produce the same final result). Figure 6B illustrates transformation of a simple cycle on six vertices into three complete multi-edges with two 2-breaks.

    • 8 In the special case where x_0 = x_{m+1} = ∞ and the flanking edges are of the same Graphic-consistent multi-color, we perform a fusion 2-break as shown in Figure 6D. In the case of m = 1 (i.e., when p contains a single simple multi-edge), c represents a complete multi-edge rather than a cycle (Fig. 6C) and does not require further processing.

    • 9 One can prove that the topology of the resulting graph does not depend on the order in which good cycles/paths are processed.

    • 10 We use the MRD node of the phylogenetic tree in Figure 5 to approximate the Boreoeutherian ancestor. While this study focuses on the Boreoeutherian ancestor, MGRA reconstructs ancestral genomes for every node of the phylogenetic tree. We emphasize that while reconstruction starts with selection of the root branch (as in Fig. 5), the choice of this branch and the exact location of the root X on this branch are rather arbitrary and not correlated with a specific ancestral genome of interest (in contrast to the alternative “root-driven” approach described in Supplement C). As described in the Reconstructing Ancestral Genomes section, the ancestral genomes are defined by the reverse transformation from the root genome X, whichever one is chosen, to the leaf genomes. Ideally, different choices of the root branch and of the location of the root X itself will result in the same set of ancestral genomes.

    • 11 We are not claiming that this association does not exist, since it may be present in some of the 100+ genomes with available cytogenetics data. However, there is no support for this association in the six mammalian genomes. We remark that Ma et al. (2006) also did not find support for this association.

    • 12 inferCARs reconstructions slightly differ from those reported in Ma et al. (2006) since we use the synteny blocks from the latest builds of mammalian genomes (provided by Jian Ma, University of California, Santa Cruz). Similar to Ma et al. (2006) and Kemkemer et al. (2006), we ignore very short CARs blocks in both inferCARs and MGRA reconstructions to simplify the analysis (see Supplemental Table S14).

    • 13 Adding the seventh genome increases the number of synteny blocks to 1746 (by ∼30%) but reduces the coverage of the genomes by synteny blocks from 89% to 79%.

    • 14 In contrast, Ma et al. (2006) assumed the primate–rodent topology.

    • [Supplemental material is available online at www.genome.org.]

    • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.082784.108.

      • Received June 30, 2008.
      • Accepted January 22, 2009.
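
Footnote 7 states that a simple cycle on six vertices can be turned into three complete multi-edges with two 2-breaks. The following minimal Python sketch illustrates that count in a simplified two-color setting (one "black" genome and one "gray" target), where a complete multi-edge reduces to a black edge lying parallel to a gray edge. The names `edge`, `two_break`, and `sort_by_two_breaks` are hypothetical helpers introduced for illustration only; this is not MGRA's implementation and it ignores multi-colors and the vertex ∞.

```python
def edge(u, v):
    """An undirected edge, stored as a frozenset of its two endpoints."""
    return frozenset((u, v))


def two_break(matching, e1, e2, new1, new2):
    """Replace edges e1, e2 by new1, new2 on the same four endpoints."""
    assert e1 in matching and e2 in matching
    assert set(e1) | set(e2) == set(new1) | set(new2)
    return (matching - {e1, e2}) | {new1, new2}


def sort_by_two_breaks(source, target):
    """Greedily transform the perfect matching `source` into `target`.

    Each step creates one edge of `target` with a single 2-break.
    Returns the final matching and the list of 2-breaks applied.
    Assumes both matchings cover the same vertex set.
    """
    source = set(source)
    steps = []
    for t in sorted(target, key=sorted):  # fixed order for reproducibility
        if t in source:
            continue  # this target edge is already matched
        u, v = sorted(t)
        eu = next(e for e in source if u in e)  # source edge touching u
        ev = next(e for e in source if v in e)  # source edge touching v
        (x,) = eu - {u}
        (y,) = ev - {v}
        new1, new2 = edge(u, v), edge(x, y)
        source = two_break(source, eu, ev, new1, new2)
        steps.append((eu, ev, new1, new2))
    return source, steps


# A simple alternating cycle on six vertices a..f.
black = {edge("a", "b"), edge("c", "d"), edge("e", "f")}
gray = {edge("b", "c"), edge("d", "e"), edge("f", "a")}

result, steps = sort_by_two_breaks(black, gray)
print(len(steps))      # 2
print(result == gray)  # True
```

In this two-color analogue, each 2-break splits one matched edge pair off the cycle, so an alternating cycle on 2k vertices needs k − 1 steps, which gives two steps for six vertices.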
