Hierarchical Modeling and Differential Expression Analysis for RNA-seq Experiments with Inbred and Hybrid Genotypes

Journal of Agricultural, Biological, and Environmental Statistics

Abstract

The performance of inbred and hybrid genotypes is of interest in plant breeding and genetics. High-throughput sequencing of RNA (RNA-seq) has proven to be a useful tool in the study of the molecular genetic responses of inbreds and hybrids to environmental stresses. Commonly used experimental designs and sequencing methods lead to complex data structures that require careful attention in data analysis. We demonstrate an analysis of RNA-seq data from a split-plot design involving drought stress applied to two inbred genotypes and two hybrids formed by crosses between the inbreds. Our generalized linear modeling strategy incorporates random effects for whole-plot experimental units and uses negative binomial distributions to allow for overdispersion in count responses for split-plot experimental units. Variations in gene length and base content, as well as differences in sequencing intensity across experimental units, are also accounted for. Hierarchical modeling with thoughtful parameterization and prior specification allows for borrowing of information across genes to improve estimation of dispersion parameters, genotype effects, treatment effects, and interaction effects of primary interest.
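The modeling ingredients named in the abstract (a log-link count model, a random effect per whole plot, an offset for sequencing intensity, and negative binomial overdispersion) can be illustrated with a small simulation for a single gene. This is only a sketch under stated assumptions, not the authors' model: the layout (treatment assigned to whole plots, the four genotypes as split-plot units), every effect size, and the names `nb_params` and `simulate_gene_counts` are hypothetical and chosen purely for illustration.

```python
import numpy as np


def nb_params(mu, dispersion):
    """Convert a mean/dispersion pair to NumPy's (n, p) parameterization.

    With n = 1/dispersion and p = n / (n + mu), the negative binomial has
    mean mu and variance mu + dispersion * mu**2.
    """
    n = 1.0 / dispersion
    p = n / (n + mu)
    return n, p


def simulate_gene_counts(rng, n_whole_plots=8, dispersion=0.1, plot_sd=0.3):
    """Simulate read counts for one gene under a hypothetical split-plot layout."""
    # Assumption: whole plots alternate control (0) and drought (1), and each
    # whole plot contains one split-plot unit per genotype.
    treatment = np.arange(n_whole_plots) % 2

    # Hypothetical fixed effects on the log scale:
    # two inbreds followed by two hybrids.
    intercept = 3.0
    genotype_effect = np.array([0.0, 0.4, 0.2, 0.2])
    treatment_effect = -0.5

    # Random effect shared by all split-plot units in the same whole plot.
    plot_effect = rng.normal(0.0, plot_sd, size=n_whole_plots)

    # Offset standing in for differences in sequencing intensity
    # (log normalization factors, here drawn at random for illustration).
    log_offset = rng.normal(0.0, 0.2, size=(n_whole_plots, genotype_effect.size))

    # Linear predictor on the log scale, one row per whole plot.
    log_mu = (
        intercept
        + genotype_effect[None, :]
        + treatment_effect * treatment[:, None]
        + plot_effect[:, None]
        + log_offset
    )
    mu = np.exp(log_mu)

    # Overdispersed counts for each split-plot unit.
    n, p = nb_params(mu, dispersion)
    counts = rng.negative_binomial(n, p)
    return counts, mu


if __name__ == "__main__":
    rng = np.random.default_rng(2015)
    counts, mu = simulate_gene_counts(rng)
    print(counts)
```

In a hierarchical analysis of the kind the abstract describes, gene-specific parameters such as the dispersion and the effects would be given shared priors across genes rather than fixed as they are in this sketch.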

Acknowledgments

Research reported in this chapter was supported by the National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health and the joint National Science Foundation/NIGMS Mathematical Biology Program under award number R01GM109458. The content is solely the responsibility of the author and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.

Corresponding author

Correspondence to Andrew Lithio.

Cite this article

Lithio, A., Nettleton, D. Hierarchical Modeling and Differential Expression Analysis for RNA-seq Experiments with Inbred and Hybrid Genotypes. JABES 20, 598–613 (2015). https://doi.org/10.1007/s13253-015-0232-3

  • DOI: https://doi.org/10.1007/s13253-015-0232-3
