Establishing Informative Prior for Gene Expression Variance from Public Databases

Li, Nan; McCall, Matthew N.; Wu, Zhijin

doi:10.1007/s12561-016-9172-x

Published: 01 June 2017

Volume 9, pages 160–177 (2017)

Statistics in Biosciences

Abstract

Identifying differentially expressed genes across various conditions or genotypes is the most typical approach to studying the regulation of gene expression. Most differential expression (DE) detection methods, including linear models (e.g., for transformed and normalized microarray data) and generalized linear models (e.g., for count data in RNAseq), need an estimate of gene-specific variance to assess statistical significance. Because sample sizes are commonly limited, this variance estimate is often unstable in small experiments. Shrinkage estimates using empirical Bayes methods have proven useful in improving the variance estimate, and hence in improving the detection of DE. The most widely used empirical Bayes methods borrow information across genes within the same experiment; in these methods, genes are considered exchangeable, or exchangeable conditional on expression level. We propose that, with the increasing accumulation of expression data, borrowing information from historical data on the same gene can provide a better estimate of gene-specific variance and thus further improve DE detection. Specifically, we show that the variation of gene expression is truly gene-specific and reproducible between different experiments. We present a new method to establish an informative gene-specific prior on the variance of expression using existing public data, and illustrate how to use it to shrink the variance estimate and detect DE. We demonstrate improvement in DE detection under our strategy compared with leading DE detection methods.
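To make the shrinkage idea concrete, the sketch below uses the limma-style moderated variance, in which the posterior variance of gene g is a degrees-of-freedom-weighted average of a prior variance and the observed sample variance: s̃²_g = (d0·s0²_g + d_g·s²_g) / (d0 + d_g). The abstract does not spell out the authors' estimator, so this is an illustration under stated assumptions, not their method. The helper names `pooled_prior`, `moderated_variance`, and `moderated_t` are hypothetical. The gene-specific prior s0²_g is assumed here to be a simple df-weighted pool of the same gene's variances from past experiments, and the prior degrees of freedom `d0` are simply supplied by the user rather than estimated.

```python
import math
from statistics import mean, variance


def pooled_prior(hist_vars, hist_dfs):
    """Hypothetical helper: gene-specific prior variance s0^2, pooled from one
    gene's sample variances in K historical experiments, weighted by their df."""
    total_df = sum(hist_dfs)
    return sum(d * v for d, v in zip(hist_dfs, hist_vars)) / total_df


def moderated_variance(s2, d, s0_2, d0):
    """limma-style posterior variance: (d0 * s0^2 + d * s^2) / (d0 + d).
    With d0 = 0 this reduces to the ordinary sample variance s^2."""
    return (d0 * s0_2 + d * s2) / (d0 + d)


def moderated_t(x, y, s0_2, d0):
    """Two-group moderated t statistic for a single gene.
    Returns (t, total degrees of freedom d0 + d)."""
    n1, n2 = len(x), len(y)
    d = n1 + n2 - 2  # residual df within the current experiment
    # Pooled within-experiment residual variance for this gene
    s2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / d
    # Shrink toward the (assumed) gene-specific prior from historical data
    s2_tilde = moderated_variance(s2, d, s0_2, d0)
    se = math.sqrt(s2_tilde * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, d0 + d


if __name__ == "__main__":
    # Made-up log-expression values for one gene, 3 vs 3 samples
    x = [8.1, 8.3, 8.2]
    y = [7.6, 7.8, 7.7]
    # Made-up variances of the same gene from three earlier experiments
    s0_2 = pooled_prior([0.02, 0.03, 0.025], [10, 20, 10])
    t_mod, df_mod = moderated_t(x, y, s0_2, d0=4.0)  # informative prior
    t_ord, df_ord = moderated_t(x, y, s0_2, d0=0.0)  # ordinary t (no prior)
    print(f"moderated t = {t_mod:.3f} on {df_mod:g} df")
    print(f"ordinary  t = {t_ord:.3f} on {df_ord:g} df")
```

In this toy case the historical variance for the gene is larger than what three replicates happen to show, so the moderated statistic is pulled down (roughly 4.55 versus 6.12), and its degrees of freedom grow from 4 to 8. Setting `d0` to 0 recovers the ordinary two-sample t, which makes the effect of the prior easy to isolate.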

Similar content being viewed by others

Confident difference criterion: a new Bayesian differentially expressed gene selection algorithm with applications

Article Open access 07 August 2015

Fang Yu, Ming-Hui Chen, … John S. Davis

The role and robustness of the Gini coefficient as an unbiased tool for the selection of Gini genes for normalising expression profiling data

Article Open access 29 November 2019

Marina Wright Muelas, Farah Mughal, … Douglas B. Kell

aFold – using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data

Article Open access 10 May 2019

Wentao Yang, Philip Rosenstiel & Hinrich Schulenburg

Acknowledgments

We thank the anonymous reviewers for their insightful and constructive comments. This research was supported by National Science Foundation (DBI-1054905).

Author information

Authors and Affiliations

Brown University, Box G-S121-7, Providence, RI, 02912, USA
Nan Li & Zhijin Wu
University of Rochester, 265 Crittenden Boulevard, CU 420630, Rochester, NY, USA
Matthew N. McCall

Corresponding author

Correspondence to Zhijin Wu.

Electronic supplementary material

Supplementary material 1 (pdf 3560 KB)

About this article

Cite this article

Li, N., McCall, M.N. & Wu, Z. Establishing Informative Prior for Gene Expression Variance from Public Databases. Stat Biosci 9, 160–177 (2017). https://doi.org/10.1007/s12561-016-9172-x

Received: 04 January 2016
Revised: 08 July 2016
Accepted: 23 September 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s12561-016-9172-x
