Establishing Informative Prior for Gene Expression Variance from Public Databases

Statistics in Biosciences

Abstract

Identifying differentially expressed genes across conditions or genotypes is the most common approach to studying the regulation of gene expression. Most differential expression (DE) detection methods, including linear models (e.g., for transformed and normalized microarray data) and generalized linear models (e.g., for count data in RNA-seq), need an estimate of gene-specific variance to assess statistical significance. Because sample sizes are commonly limited, the variance estimate is often unstable in small experiments. Shrinkage estimates using empirical Bayes methods have proven useful in improving the variance estimate, and hence in improving the detection of DE. The most widely used empirical Bayes methods borrow information across genes within the same experiment. In these methods, genes are considered exchangeable, or exchangeable conditional on expression level. We propose that, with the increasing accumulation of expression data, borrowing information from historical data on the same gene can provide a better estimate of gene-specific variance and thus further improve DE detection. Specifically, we show that the variation of gene expression is truly gene-specific and reproducible between different experiments. We present a new method to establish an informative gene-specific prior on the variance of expression using existing public data, and illustrate how to use it to shrink the variance estimate and detect DE. We demonstrate improvement in DE detection under our strategy compared to leading DE detection methods.
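The shrinkage idea described above can be illustrated with a minimal sketch. It assumes the conventional scaled inverse chi-square prior on the variance used in moderated-variance methods, with one prior variance per gene standing in for a value derived from historical data. The function names, the prior degrees of freedom, and all numbers are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def shrink_variance(s2, df, prior_s2, prior_df):
    """Shrink per-gene sample variances toward per-gene prior variances.

    Assumes a scaled inverse chi-square prior with `prior_df` degrees of
    freedom and scale `prior_s2` (hypothetical, e.g. from historical data).
    The result is a weighted average of prior and sample variance, with
    weights given by the respective degrees of freedom.
    """
    s2 = np.asarray(s2, dtype=float)
    prior_s2 = np.asarray(prior_s2, dtype=float)
    return (prior_df * prior_s2 + df * s2) / (prior_df + df)


def moderated_t(diff, s2_shrunk, n1, n2):
    """Two-group t-like statistic that uses the shrunken variance."""
    diff = np.asarray(diff, dtype=float)
    se = np.sqrt(s2_shrunk * (1.0 / n1 + 1.0 / n2))
    return diff / se


# Two genes, two groups of three samples each (4 residual degrees of freedom).
sample_var = np.array([0.04, 1.00])   # unstable small-sample estimates
prior_var = np.array([0.10, 0.50])    # hypothetical gene-specific priors
shrunk = shrink_variance(sample_var, df=4, prior_s2=prior_var, prior_df=4)
t_stat = moderated_t(diff=[0.5, 0.5], s2_shrunk=shrunk, n1=3, n2=3)
print(shrunk)
print(t_stat)
```

A larger `prior_df` pulls each estimate more strongly toward its gene's prior, which is the sense in which reproducible historical variances can stabilize small experiments. Methods that borrow information across genes within one experiment would instead use a single prior variance shared by all genes.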




Acknowledgments

We thank the anonymous reviewers for their insightful and constructive comments. This research was supported by the National Science Foundation (DBI-1054905).

Author information

Corresponding author

Correspondence to Zhijin Wu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3560 KB)

About this article

Cite this article

Li, N., McCall, M.N. & Wu, Z. Establishing Informative Prior for Gene Expression Variance from Public Databases. Stat Biosci 9, 160–177 (2017). https://doi.org/10.1007/s12561-016-9172-x

  • DOI: https://doi.org/10.1007/s12561-016-9172-x
