Analysis of differential gene expression and alternative splicing is significantly influenced by choice of reference genome

  1. Colleen J. Doherty1
  1. Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695, USA
  2. Crop and Soil Science Department, North Carolina State University, Raleigh, North Carolina 27695, USA
  3. International Rice Research Institute (IRRI), DAPO Box 7777, Metro Manila, Philippines
  4. Max Planck Institute of Molecular Plant Physiology, D-14476, Potsdam, Germany
  5. Department of Agronomy, Kansas State University, Manhattan, Kansas 66506, USA
  1. Corresponding author: cjdohert{at}ncsu.edu
  1. 6 These authors contributed equally to this work.

Abstract

RNA-seq analysis has enabled the evaluation of transcriptional changes in many species including nonmodel organisms. However, in most species only a single reference genome is available and RNA-seq reads from highly divergent varieties are typically aligned to this reference. Here, we quantify the impacts of the choice of mapping genome in rice where three high-quality reference genomes are available. We aligned RNA-seq data from a popular productive rice variety to three different reference genomes and found that the identification of differentially expressed genes differed depending on which reference genome was used for mapping. Furthermore, the ability to detect differentially used transcript isoforms was profoundly affected by the choice of reference genome: Only 30% of the differentially used splicing features were detected when reads were mapped to the more commonly used, but more distantly related reference genome. This demonstrated that gene expression and splicing analysis varies considerably depending on the mapping reference genome, and that analysis of individuals that are distantly related to an available reference genome may be improved by acquisition of new genomic reference material. We observed that these differences in transcriptome analysis are, in part, due to the presence of single nucleotide polymorphisms between the sequenced individual and each respective reference genome, as well as annotation differences between the reference genomes that exist even between syntenic orthologs. We conclude that even between two closely related genomes of similar quality, using the reference genome that is most closely related to the species being sampled significantly improves transcriptome analysis.

  • Received January 5, 2019.
  • Accepted March 6, 2019.

This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
