Pan-human consensus genome significantly improves the accuracy of RNA-seq analyses

  1. Alexander Dobin1
  1. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA;
  2. Tri-Institutional Ph.D. Program in Computational Biology and Medicine, Weill Cornell Graduate School of Medical Sciences, New York, New York 10065, USA;
  3. Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Darlinghurst, New South Wales 2010, Australia;
  4. School of Medical Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
  • 5 Present address: Department of Physiology and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1A8, Canada

  • Corresponding authors: jesse.gillis@utoronto.ca, dobin@cshl.edu
  • Abstract

    The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor to the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. To find the best haploid genome representation, we constructed consensus genomes at the pan-human, superpopulation, and population levels, using variant information from The 1000 Genomes Project Consortium. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of approximately two to three when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no further improvement in accuracy over using the pan-human consensus, suggesting a limit to the utility of incorporating more specific genomic variation. Replacing the reference with consensus genomes also affects functional analyses, such as differential expression of isoforms, genes, and splice junctions.
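
The idea of a consensus genome can be illustrated with a minimal Python sketch. It assumes, as a simplification not spelled out in the abstract, that the consensus is formed by replacing the reference allele with the alternate allele wherever the alternate allele frequency in the chosen population exceeds 0.5. The names `Variant` and `build_consensus`, the 0-based coordinates, and the toy sequence are hypothetical and are not taken from the article or its software.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    pos: int         # 0-based start position on the reference sequence (assumption)
    ref: str         # reference allele
    alt: str         # alternate allele
    alt_freq: float  # alternate allele frequency in the chosen population


def build_consensus(reference: str, variants: Iterable[Variant],
                    threshold: float = 0.5) -> str:
    """Return a haploid consensus by inserting major alternate alleles.

    A variant is applied only if its alternate allele frequency is above
    `threshold`; variants overlapping an already applied one are skipped.
    """
    pieces = []
    cursor = 0  # first reference position not yet copied to the output
    for v in sorted(variants, key=lambda v: v.pos):
        if v.alt_freq <= threshold or v.pos < cursor:
            continue
        if reference[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"reference allele mismatch at position {v.pos}")
        pieces.append(reference[cursor:v.pos])  # unchanged reference stretch
        pieces.append(v.alt)                    # major (alternate) allele
        cursor = v.pos + len(v.ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


# Toy example: one common SNV, one common deletion, one rare SNV.
toy_reference = "ACGTACGT"
toy_variants = [
    Variant(1, "C", "T", 0.8),   # applied: alternate allele is the major allele
    Variant(4, "AC", "A", 0.6),  # applied: deletion of one base
    Variant(7, "T", "G", 0.2),   # skipped: alternate allele is the minor allele
]
print(build_consensus(toy_reference, toy_variants))  # prints ATGTAGT
```

Restricting the allele frequencies to a superpopulation or population before calling such a function would give the more specific consensuses described above, while using frequencies over all individuals would give the pan-human one.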

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.275613.121.

    • Freely available online through the Genome Research Open Access option.

    • Received April 7, 2021.
    • Accepted March 2, 2022.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
