Abstract
Genomic studies of plants often seek to identify genetic factors associated with desirable traits. Evaluating genetic markers one by one (i.e., a marginal analysis) can miss important polygenic and environmental effects. Further, confounding due to growing conditions and genetic similarities among plant varieties may influence conclusions. When developing new plant varieties to optimize yield or thrive in future adverse conditions (e.g., flood, drought), scientists seek a complete understanding of how these factors jointly influence desirable traits. Motivated by a study design that measures rice yield across different seasons, fields, and plant varieties in Indonesia, we develop a regression method that identifies significant genomic factors while simultaneously controlling for field factors and genetic similarities among the plant varieties. Our approach uses a Bayesian maximum a posteriori (MAP) estimator under a generalized double Pareto shrinkage prior. Through a hierarchical representation of the proposed model, we derive a novel and computationally efficient expectation-maximization (EM) algorithm for variable selection and estimation. We demonstrate the performance of the proposed approach through simulation and use it to analyze rice yields from a pilot study conducted by the Indonesian Center for Rice Research.
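To make the EM idea concrete, the following is a minimal sketch in a deliberately simplified setting, not the algorithm developed in the article. It assumes an ordinary linear model with a known noise scale `sigma` and fixed hyperparameters `alpha` and `eta`, and it omits the field factors and genetic-similarity terms entirely. The function name `gdp_map_em` and all parameter names are hypothetical. Under these assumptions, writing the generalized double Pareto prior as a scale mixture of normals leads to an iteratively reweighted ridge update: the E-step computes a penalty weight for each coefficient from its current value, and the M-step (here a single coordinate-wise sweep) updates the coefficients under those weights.

```python
def gdp_map_em(X, y, alpha=1.0, eta=1.0, sigma=1.0, n_iter=200, eps=1e-8):
    """Approximate MAP estimate of beta in y = X beta + noise.

    Assumed (simplified) objective, with sigma, alpha, eta held fixed:
        ||y - X beta||^2 / (2 sigma^2)
            + (alpha + 1) * sum_j log(sigma * eta + |beta_j|)
    X is a list of rows, y a list of responses. Columns of X are
    assumed to be non-zero. eps guards against division by zero.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_ss = [sum(v * v for v in c) for c in cols]  # x_j' x_j

    beta = [0.0] * p
    resid = list(y)        # residual y - X beta, kept up to date
    weights = [0.0] * p    # first sweep is unpenalized

    for _ in range(n_iter):
        # M-step: one coordinate-wise sweep of the weighted ridge problem.
        for j in range(p):
            xj = cols[j]
            # x_j' (partial residual with coordinate j added back)
            rho = sum(xj[i] * resid[i] for i in range(n)) + col_ss[j] * beta[j]
            new = rho / (col_ss[j] + sigma ** 2 * weights[j])
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * xj[i]
                beta[j] = new
        # E-step: ridge weight implied by the current coefficients.
        weights = [
            (alpha + 1.0) / ((abs(b) + eps) * (sigma * eta + abs(b) + eps))
            for b in beta
        ]
    return beta


if __name__ == "__main__":
    # Tiny illustrative data set: an intercept-only design.
    X = [[1.0], [1.0], [1.0], [1.0]]
    y = [2.0, 2.0, 2.0, 2.0]
    print(gdp_map_em(X, y))
```

In this toy case the least-squares estimate is 2, and the returned value is pulled slightly toward zero. Coefficients that reach zero receive a very large weight and remain there, which is how a reweighting scheme of this kind can produce sparse estimates. Because the log penalty is non-convex, the result can depend on the starting point.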
Acknowledgement
We would like to acknowledge the Indonesian Center for Agricultural Biotechnology and Genetic Resources Research and Development (ICABIOGRAD) for supplying data, NVIDIA and Amazon Web Services for computational support through their grants programs, the International Treaty on Plant Genetic Resources for Food and Agriculture grant W3A-PR-07-Indonesia, NIH/NIAID grant R01 AI121351, and NIH/NIDA grant R43 DA041211-01A1.
Supplemental Material
The online version of this article offers supplementary material (https://doi.org/10.1515/sagmb-2017-0044).
©2017 Walter de Gruyter GmbH, Berlin/Boston