Elsevier

Gene

Volume 533, Issue 1, 1 January 2014, Pages 366-373
Methods paper
Genome-wide identification of allele-specific effects on gene expression for single and multiple individuals

https://doi.org/10.1016/j.gene.2013.09.029

Highlights

  • We developed a maximum likelihood model to characterize ASE based on RNA-seq data.

  • Our method can identify ASE in multiple individuals.

  • The fraction of ASE genes was stable and close to the fraction obtained with experimental methods.

  • In simulations, our model showed better performance and robustness than the binomial test.

Abstract

The analysis of allele-specific gene expression (ASE) is essential for the mapping of genetic variants that affect gene regulation, and for the identification of alleles that modify disease risk. Although RNA sequencing offers the opportunity to measure expression at allele levels, the availability of powerful statistical methods for mapping ASE in single or multiple individuals is limited. We developed a maximum likelihood model to characterize ASE in the human genome. Approximately 17% of genes displayed an allele-specific effect on gene expression in a single individual. Simulations using our model gave a better performance and improved robustness when compared with the binomial test, with different coverage levels, allelic expression fractions and random noise. In addition, our method can identify ASE in multiple individuals, with enhanced performance. This is helpful in understanding the mechanism of genetic regulation leading to expression changes, alternative splicing variants and even disease susceptibility.

Introduction

Allele-specific gene expression (ASE) is the representation of the two alleles of a given gene in the corresponding mRNA. Normal development and cellular processes require the ratio of expression of the two alleles to be different from the allelic representation in genomic DNA (50:50). However, the precise mechanisms by which allele-specific gene expression occurs are not yet understood and there may be multiple mechanisms. Studies of expression quantitative trait loci (eQTLs) have shown that ASE usually reflects cis-acting genetic polymorphisms (Stranger et al., 2007), whereas trans-genetic regulatory or epigenetic mechanisms are relatively rare (Stranger et al., 2005, Zeller et al., 2010). It is generally believed that cis-regulatory polymorphism is the primary source of phenotypic difference and is associated with many diseases. The functional cis-regulatory variation can be mapped by measurement of ASE, using statistical or experimental approaches (Campino et al., 2008, Pastinen et al., 2005, Serre et al., 2008, Verlaan et al., 2009). In addition, although monoallelic expression is relatively rare, epigenetic mechanisms of allelic expression, such as imprinted genes, can also be detected by measuring ASE (Babak et al., 2008).

The precise identification of ASE genes has been the focus of much attention. Studies using the Illumina Allele-Specific Expression BeadArray platform and quantitative sequencing of real-time polymerase chain reaction (RT-PCR) products showed that differential allelic expression is a widespread phenomenon, which affects the expression of 20% of human genes in individuals of European descent (Serre et al., 2008). In addition, quantitative measurements of allelic expression in different HapMap populations (60 Caucasians of Northern and Western European origin (CEU), 45 unrelated Chinese individuals from Beijing University (CHB), 45 unrelated Japanese individuals from Tokyo (JPT), and 60 Yoruba from Ibadan, Nigeria (YRI)), using the Illumina BeadChips, found that approximately 18% of human genes showed differential allelic expression (Dimas et al., 2008). Statistical analyses of the Illumina BeadChip data have been used to identify genome regions that exhibit ASE. These analyses included the integration of z-score computations and a machine learning approach, based on hidden Markov models (Wagner et al., 2010). Recently, high-throughput RNA sequencing (RNA-seq) has provided a platform-independent method, similar to the microarray approach, which has allowed identification of the genetic regulatory variants at the transcript, isoform and allele levels. Statistical approaches have been proposed to characterize ASE on the basis of RNA-seq data. The binomial exact test has been applied to single nucleotide polymorphisms (SNPs) to test whether the expressed fraction of the reference allele differs from 0.5 (Degner et al., 2009). In addition, Nothnagel et al. (2011) developed a statistical framework, based on the likelihood ratio test, that examines allelic imbalance at single SNPs in RNA-seq data and allows for allele miscalls. Skelly et al. (2011) developed a Bayesian hierarchical model, using RNA-seq data from a diploid hybrid of two diverse Saccharomyces cerevisiae strains, which can test for ASE at the level of both a SNP and a gene.
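As an illustration of the single-SNP binomial exact test mentioned above, the following is a minimal sketch in Python. It assumes that the reads at one heterozygous SNP have already been counted per allele; the function names and the use of a log-space probability mass function are illustrative choices, not taken from Degner et al. (2009).

```python
from math import exp, lgamma, log


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of k successes in n trials (0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p0: float = 0.5) -> float:
    """Exact two-sided p-value for H0: reference allele fraction equals p0."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence against H0
    log_obs = log_binom_pmf(ref_reads, n, p0)
    total = 0.0
    for k in range(n + 1):
        log_pk = log_binom_pmf(k, n, p0)
        # Add every outcome that is no more likely than the observed one
        # (small tolerance guards against floating-point ties).
        if log_pk <= log_obs + 1e-9:
            total += exp(log_pk)
    return min(1.0, total)


# Hypothetical counts: 10 reference reads, 0 alternative reads.
print(binomial_two_sided_p(10, 0))  # approximately 0.00195
```

A test of this kind treats each SNP independently and assumes that reads are sampled without mapping bias or overdispersion, which is one reason gene-level and multi-individual models have been proposed.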

Although some statistical approaches have been developed to test for ASE using RNA-seq data, they mainly focus on a single SNP or a single individual. To address the lack of statistical methods for detecting ASE from high-throughput RNA-seq data, we developed a maximum likelihood model to characterize ASE in individuals and populations. In a single individual, approximately 17% of genes showed ASE or variable ASE, with a false discovery rate (FDR) of 7.50%. Simulation experiments indicated that our method is accurate and robust across different allelic fractions, read coverage levels and levels of random noise. Furthermore, we identified more ASE genes in populations. These data provide insights into the genetic mechanism of cis-acting regulatory variants and the inconsistent effects of regulatory variants observed in different individuals.
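To make the idea of a gene-level likelihood-based test concrete, the sketch below pools phased read counts over the heterozygous SNPs of one gene and compares the maximum likelihood estimate of the allelic fraction with the null value of 0.5 using a likelihood ratio test. This is an assumption-laden illustration and not the model developed in this paper, whose details are not given in this excerpt: it assumes a single shared binomial fraction per gene, independent reads, and a chi-square reference distribution with one degree of freedom. The function names are hypothetical.

```python
from math import erfc, log, sqrt


def binom_loglik(hap1_reads: int, total_reads: int, p: float) -> float:
    """Binomial log-likelihood up to a constant (the constant cancels in the ratio)."""
    ll = 0.0
    if hap1_reads > 0:
        ll += hap1_reads * log(p)
    if total_reads - hap1_reads > 0:
        ll += (total_reads - hap1_reads) * log(1 - p)
    return ll


def gene_lrt(snp_counts):
    """Likelihood ratio test for one gene.

    snp_counts: list of (haplotype1_reads, haplotype2_reads), one pair per
    heterozygous SNP, with alleles assigned to haplotypes by phasing.
    Returns (estimated fraction, test statistic, p-value).
    """
    hap1 = sum(a for a, _ in snp_counts)
    total = sum(a + b for a, b in snp_counts)
    if total == 0:
        return 0.5, 0.0, 1.0
    p_hat = hap1 / total  # maximum likelihood estimate of the haplotype 1 fraction
    stat = 2.0 * (binom_loglik(hap1, total, p_hat) - binom_loglik(hap1, total, 0.5))
    stat = max(stat, 0.0)  # guard against tiny negative rounding errors
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2.0))
    return p_hat, stat, p_value


# Hypothetical counts for two SNPs in the same gene.
p_hat, stat, p_value = gene_lrt([(30, 10), (25, 5)])
print(p_hat, stat, p_value)
```

Counts from several individuals could be pooled in the same way only if the same haplotype is expected to be over-expressed in each of them, which is a strong assumption; a model that allows the fraction to vary between individuals would need additional parameters.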

Section snippets

Human reference genome construction of SNP data

Phased variant sets were obtained from the 1000 Genomes Project (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase1/analysis_results/integrated_call_sets), which included phased genotypes from NA12891, NA12892 and CEU individuals (lymphoblastoid samples from HapMap individuals from the CEPH, Centre d'Etude du Polymorphisme Humain). All heterozygous SNP genome locations were mapped and phase information was converted to the Browser Extensible Data (BED) format. The mitochondrial chromosome, Y
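The conversion of phased heterozygous SNPs to BED records described in this snippet can be sketched as follows. This is a minimal illustration, assuming tab-separated VCF input with the genotype (GT) as the first FORMAT subfield, biallelic SNPs only, and a hypothetical fourth BED column that stores the two alleles in haplotype order; the authors' actual pipeline and file layout are not given in this excerpt.

```python
def phased_het_snps_to_bed(vcf_lines, sample_index=0):
    """Return BED lines (0-based, half-open) for phased heterozygous SNPs.

    vcf_lines: iterable of VCF text lines.
    sample_index: which sample column to read (0 = first sample).
    """
    bed = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 + sample_index:
            continue  # no genotype column for the requested sample
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if len(ref) != 1 or len(alt) != 1:
            continue  # keep biallelic single-nucleotide variants only
        genotype = fields[9 + sample_index].split(":")[0]
        if genotype not in ("0|1", "1|0"):
            continue  # keep phased heterozygous calls only
        alleles = (ref, alt)
        hap1, hap2 = (alleles[int(a)] for a in genotype.split("|"))
        # VCF positions are 1-based; BED start is 0-based and end is exclusive.
        bed.append(f"{chrom}\t{pos - 1}\t{pos}\t{hap1}|{hap2}")
    return bed


example = [
    "##fileformat=VCFv4.1",
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1",
    "1\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t1|1",
    "1\t300\trs3\tC\tT\t.\tPASS\t.\tGT\t1|0",
]
for record in phased_het_snps_to_bed(example):
    print(record)
```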

Global distribution of allelic fraction in genomic DNA data and RNA-seq data

Data from genomic DNA mapping of an individual (NA12891) were downloaded from the 1000 Genomes Project. To eliminate read mapping and count bias, the analysis was restricted to SNPs covered by at least 10 reads, including 20,299 heterozygous sites. Two thousand nine hundred ninety-four genes, containing 13,894 heterozygous SNPs, were detected and the distribution of allelic read counts was studied (Fig. 1A). As shown in Fig. 1A, the distribution of RNA-seq data was significantly different
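The coverage filter and allelic fraction described in this snippet can be sketched as below. The threshold of 10 reads comes from the text; the input layout (a dictionary mapping a SNP identifier to reference and alternative read counts) and the function name are assumptions made for illustration.

```python
def allelic_fractions(snp_counts, min_reads=10):
    """Reference allele fraction for each SNP covered by at least min_reads reads.

    snp_counts: dict mapping SNP id -> (reference_reads, alternative_reads).
    """
    fractions = {}
    for snp_id, (ref_reads, alt_reads) in snp_counts.items():
        total = ref_reads + alt_reads
        if total >= min_reads:  # drop low-coverage sites
            fractions[snp_id] = ref_reads / total
    return fractions


# Hypothetical counts: rs2 has only 5 reads and is filtered out.
counts = {"rs1": (8, 4), "rs2": (3, 2), "rs3": (5, 5)}
print(allelic_fractions(counts))
```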

Discussion

Allele-specific expression is normally used to map genetic variants that affect gene regulation and to identify alleles that modify disease risk. Identification of ASE genes is helpful in understanding the divergence of phenotypes between individuals, including the difference in gene expression under cis-acting regulatory mechanisms, alternatively spliced transcript isoforms under genetic control and the association with disease. Recently, genome-wide allele-specific approaches that harness

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that can inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled “Genome-wide Identification of Allele-specific Effects on Gene Expression for Single and Multiple Individuals”.

Funding

This work was supported by the National Natural Science Foundation of China [grant numbers 3001304, 61073136 and 31200998]; and the National Science Foundation of Heilongjiang Province [grant number D200834].

References (23)

  • M. Nothnagel. Statistical inference of allelic imbalance from transcriptome data. Hum. Mutat. (2011)

    1

    These authors contributed equally to this work.
