Methods paper
Genome-wide identification of allele-specific effects on gene expression for single and multiple individuals
Introduction
Allele-specific gene expression (ASE) is the representation of the two alleles of a given gene in the corresponding mRNA. Normal development and cellular processes require the ratio of expression of the two alleles to be different from the allelic representation in genomic DNA (50:50). However, the precise mechanisms by which allele-specific gene expression occurs are not yet understood and there may be multiple mechanisms. Studies of expression quantitative trait loci (eQTLs) have shown that ASE usually reflects cis-acting genetic polymorphisms (Stranger et al., 2007), whereas trans-genetic regulatory or epigenetic mechanisms are relatively rare (Stranger et al., 2005, Zeller et al., 2010). It is generally believed that cis-regulatory polymorphism is the primary source of phenotypic difference and is associated with many diseases. The functional cis-regulatory variation can be mapped by measurement of ASE, using statistical or experimental approaches (Campino et al., 2008, Pastinen et al., 2005, Serre et al., 2008, Verlaan et al., 2009). In addition, although monoallelic expression is relatively rare, epigenetic mechanisms of allelic expression, such as imprinted genes, can also be detected by measuring ASE (Babak et al., 2008).
The precise identification of ASE genes has been the focus of much attention. Studies using the Illumina Allele-Specific Expression BeadArray platform and quantitative sequencing of real-time polymerase chain reaction (RT-PCR) products showed that differential allelic expression is a widespread phenomenon, which affects the expression of 20% of human genes in individuals of European descent (Serre et al., 2008). In addition, quantitative measurements of allelic expression in different HapMap populations (60 Caucasians of Northern and Western European origin (CEU), 45 unrelated Chinese individuals from Beijing University (CHB), 45 unrelated Japanese individuals from Tokyo (JPT), and 60 Yoruba from Ibadan, Nigeria (YRI)), using the Illumina BeadChips, found that approximately 18% of human genes showed differential allelic expression (Dimas et al., 2008). Statistical analyses of the Illumina BeadChip data have been used to identify genome regions that exhibit ASE. These analyses included the integration of z-score computations and a machine learning approach based on hidden Markov models (Wagner et al., 2010). Recently, high-throughput RNA sequencing (RNA-seq) has provided a platform-independent method, similar to the microarray approach, which has allowed identification of genetic regulatory variants at the transcript, isoform and allele levels. Statistical approaches have been proposed to characterize ASE on the basis of RNA-seq data. The exact binomial test has been applied to individual single nucleotide polymorphisms (SNPs) to test whether the fraction of reads carrying the reference allele differs from 0.5 (Degner et al., 2009). In addition, Nothnagel et al. (2011) developed a statistical framework, based on the likelihood ratio test, to examine allelic imbalance at single SNPs in RNA-seq data while allowing for allele miscalls. Skelly et al. (2011) developed a Bayesian hierarchical model, using RNA-seq data from a diploid hybrid of two diverse Saccharomyces cerevisiae strains, that can test for ASE at both the SNP level and the gene level.
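As an illustration of the single-SNP exact binomial test mentioned above, the following minimal sketch computes a two-sided p-value for the null hypothesis that the reference-allele fraction equals 0.5. The function name, the input layout (reference and alternative read counts at one heterozygous SNP) and the choice of two-sided rule are assumptions made for this example; it is not the implementation used in any of the cited studies.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial p-value for H0: reference-allele fraction == p0.

    Hypothetical helper for illustration only. It enumerates the full
    probability mass function, so it is meant for modest read depths.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence against H0
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the probabilities of all outcomes that are no more likely than
    # the observed one (small tolerance guards against rounding ties).
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Example: 10 reference reads and 0 alternative reads at one SNP.
print(binomial_two_sided_p(10, 0))
```

For very deep coverage the enumeration above becomes slow and can overflow when converting large binomial coefficients to floats; a library routine such as scipy.stats.binomtest would be a more practical choice in that case.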
Although some statistical approaches have been developed to test for ASE using RNA-seq data, they mainly focus on a single SNP or a single individual. To address this limitation, we developed a maximum likelihood model to characterize ASE in single individuals and in populations. In a single individual, approximately 17% of genes showed ASE or variable ASE, at a false discovery rate (FDR) of 7.50%. Simulation experiments indicate that our method is accurate and robust across different allelic fractions, read coverage levels and levels of random noise. Furthermore, we identified more ASE genes in populations. These data provide insights into the genetic mechanism of cis-acting regulatory variants and the inconsistent effects of regulatory variants observed in different individuals.
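To make the idea of a likelihood-based, gene-level test concrete, the sketch below pools phased read counts from all heterozygous SNPs in a gene and compares a binomial model with a freely estimated allelic fraction against the null fraction of 0.5 using a likelihood ratio test. This is a deliberately simplified stand-in, not the maximum likelihood model developed in this paper: the shared-fraction binomial model, the chi-square approximation with one degree of freedom, and all names are assumptions made for illustration.

```python
from math import erfc, log, sqrt


def _binomial_loglik(h1: int, h2: int, p: float) -> float:
    """Binomial log-likelihood up to an additive constant (0 * log 0 taken as 0)."""
    loglik = 0.0
    if h1:
        loglik += h1 * log(p)
    if h2:
        loglik += h2 * log(1 - p)
    return loglik


def gene_lrt(snp_counts):
    """Likelihood ratio test for allelic imbalance pooled over one gene.

    snp_counts: iterable of (haplotype1_reads, haplotype2_reads) pairs, one
    per heterozygous SNP (hypothetical input layout).
    Returns (estimated haplotype-1 fraction, LRT statistic, p-value).
    """
    snp_counts = list(snp_counts)
    h1 = sum(a for a, _ in snp_counts)
    h2 = sum(b for _, b in snp_counts)
    n = h1 + h2
    if n == 0:
        return 0.5, 0.0, 1.0
    p_hat = h1 / n  # maximum likelihood estimate of the shared fraction
    stat = 2 * (_binomial_loglik(h1, h2, p_hat) - _binomial_loglik(h1, h2, 0.5))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return p_hat, stat, p_value


# Example: two SNPs in one gene, both favouring haplotype 1.
print(gene_lrt([(30, 10), (20, 10)]))
```

Pooling across SNPs assumes a single allelic fraction per gene and independent reads; genes whose SNPs disagree in direction or magnitude would need a richer model than this sketch provides.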
Human reference genome construction of SNP data
Phased variant sets were obtained from the 1000 Genomes Project (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase1/analysis_results/integrated_call_sets), which included phased genotypes from NA12891, NA12892 and CEU individuals (lymphoblastoid samples from HapMap individuals from the CEPH, Centre d'Etude du Polymorphisme Humain). The genomic locations of all heterozygous SNPs were mapped, and the phase information was converted to the Browser Extensible Data (BED) format. The mitochondrial chromosome, Y
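The conversion of phased heterozygous SNPs to BED described above could look roughly like the following sketch. It assumes tab-separated VCF records with a GT field, keeps only biallelic single-nucleotide sites with a phased heterozygous genotype (0|1 or 1|0), and writes the two haplotype alleles into the BED name column. The function name, the name-column layout and the sample-selection parameter are hypothetical and are not taken from the paper.

```python
def phased_het_snps_to_bed(vcf_lines, sample_index=0):
    """Convert phased heterozygous SNP records from VCF text lines to BED lines.

    Output columns: chrom, 0-based start, 1-based end, "hap1allele|hap2allele".
    """
    bed_lines = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _snp_id, ref, alt = fields[:5]
        if len(ref) != 1 or len(alt) != 1:
            continue  # keep biallelic single-nucleotide variants only
        format_keys = fields[8].split(":")
        sample_values = fields[9 + sample_index].split(":")
        genotype = sample_values[format_keys.index("GT")]
        if genotype not in ("0|1", "1|0"):
            continue  # drop homozygous and unphased calls
        alleles = (ref, alt)
        hap1, hap2 = (alleles[int(i)] for i in genotype.split("|"))
        start = int(pos) - 1  # VCF is 1-based; BED start is 0-based
        bed_lines.append(f"{chrom}\t{start}\t{pos}\t{hap1}|{hap2}")
    return bed_lines


# Example with made-up records (not real 1000 Genomes data).
records = [
    "1\t1000\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1",
    "1\t2000\trs2\tC\tT\t.\tPASS\t.\tGT\t1|1",
]
print(phased_het_snps_to_bed(records))
```

BED intervals are 0-based and half-open, so a SNP at VCF position 1000 becomes the interval 999 to 1000.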
Global distribution of allelic fraction in genomic DNA data and RNA-seq data
Genomic DNA mapping data for one individual (NA12891) were downloaded from the 1000 Genomes Project. To eliminate read-mapping and count bias, the analysis was restricted to SNPs covered by at least 10 reads, which comprised 20,299 heterozygous sites. In total, 2,994 genes containing 13,894 heterozygous SNPs were detected, and the distribution of allelic read counts was studied (Fig. 1A). As shown in Fig. 1A, the distribution of RNA-seq data was significantly different
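The coverage filter and allelic fraction described above can be sketched as follows. The threshold of 10 reads comes from the text; the function name and the input layout (a mapping from SNP identifier to reference and alternative read counts) are assumptions made for this example.

```python
def allelic_fractions(snp_counts, min_coverage=10):
    """Return the reference-allele fraction for SNPs with enough coverage.

    snp_counts: dict mapping a SNP identifier to (ref_reads, alt_reads).
    SNPs covered by fewer than min_coverage reads are dropped.
    """
    fractions = {}
    for snp_id, (ref_reads, alt_reads) in snp_counts.items():
        total = ref_reads + alt_reads
        if total >= min_coverage:
            fractions[snp_id] = ref_reads / total
    return fractions


# Example with made-up counts: the second SNP has only 5 reads and is removed.
counts = {"snp_a": (6, 4), "snp_b": (3, 2), "snp_c": (0, 12)}
print(allelic_fractions(counts))
```

In genomic DNA the resulting fractions are expected to cluster around 0.5, so plotting them gives a baseline against which the RNA-seq distribution can be compared.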
Discussion
Allele-specific expression is normally used to map genetic variants that affect gene regulation and to identify alleles that modify disease risk. Identification of ASE genes is helpful in understanding the divergence of phenotypes between individuals, including the difference in gene expression under cis-acting regulatory mechanisms, alternatively spliced transcript isoforms under genetic control and the association with disease. Recently, genome-wide allele-specific approaches that harness
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled “Genome-wide Identification of Allele-specific Effects on Gene Expression for Single and Multiple Individuals”.
Funding
This work was supported by the National Natural Science Foundation of China [grant numbers 3001304, 61073136 and 31200998]; and the National Science Foundation of Heilongjiang Province [grant number D200834].
References (23)
Global survey of genomic imprinting by transcriptome sequencing. Curr. Biol. (2008)
Non-invasive screening of HLA-DPA1 and HLA-DPB1 alleles for persistent hepatitis B virus infection: susceptibility for vertical transmission and toward a personalized approach for vaccination and treatment. Clin. Chim. Acta (2011)
Validating discovered Cis-acting regulatory genetic variants: application of an allele specific expression approach to HapMap populations. PLoS One (2008)
Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics (2009)
Modifier effects between regulatory and protein-coding variation. PLoS Genet. (2008)
Natural selection on cis and trans regulation in yeasts. Genome Res. (2010)
Tissue effect on genetic control of transcript isoform variation. PLoS Genet. (2009)
RNA sequencing reveals the role of splicing polymorphisms in regulating human gene expression. Genome Res. (2011)
et al. Fast gapped-read alignment with Bowtie 2. Nat. Methods (2012)
Transcriptome genetics using second generation sequencing in a Caucasian population. Nature (2010)
Statistical inference of allelic imbalance from transcriptome data. Hum. Mutat.
1 These authors contributed equally to this work.