Abstract
Association mapping can be a powerful tool for detecting quantitative trait loci (QTLs) without requiring line-crossing experiments. We previously proposed a Bayesian approach for simultaneously mapping multiple QTLs by a regression method that directly incorporates estimates of the population structure. In the present study, we extended our method to analyze ordinal and censored traits, since both types of traits are common in the evaluation of germplasm collections. Ordinal-probit and tobit models were employed to analyze ordinal and censored traits, respectively. In both models, we postulated the existence of a latent continuous variable associated with the observable data, and we used a Markov-chain Monte Carlo algorithm to sample the latent variable and determine the model parameters. We evaluated the efficiency of our approach by using simulated- and real-trait analyses of a rice germplasm collection. Simulation analyses based on real marker data showed that our models could reduce both false-positive and false-negative rates in detecting QTLs to reasonable levels. Simulation analyses based on highly polymorphic marker data, which were generated by coalescent simulations, showed that our models could be applied to genotype data based on highly polymorphic marker systems, like simple sequence repeats. For the real traits, we analyzed heading date as a censored trait and amylose content and the shape of milled rice grains as ordinal traits. We found significant markers that may be linked to previously reported QTLs. Our approach will be useful for whole-genome association mapping of ordinal and censored traits in rice germplasm collections.
Similar content being viewed by others
References
Albert JH, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88:669–679
Agrama HA, Eizenga GC, Yan W (2007) Association mapping of yield and its components in rice cultivars. Mol Breed 19:341–356
Cowles MK (1996) Accelerating Monte Carlo Markov chain convergence for cumulative-link generalized linear models. Stat Comput 6:101–111
Cui KH, Peng SB, Xing YZ, Xu CG, Yu SB, Zhang Q (2002) Molecular dissection of seedling-vigor and associated physiological traits in rice. Theor Appl Genet 105:745–753
Donnelly PJ, Tavaré S (1995) Goalescents and genealogical structure under neutrality. Annu Rev Genet 29:401–421
George EI, McCulloch RE (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88:881–889
Godsill SJ (2001) On the relationship between Markov chain Monte Carlo methods for model uncertainty. J Comput Graph Stat 10:230–248
Green PJ (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711–732
Harushima Y, Yano M, Shomura A, Sato M, Shimano T, Kuboki Y, Yamamoto T, Lin SY, Antonio BA, Parco A, Kajiya H, Huang N, Yamamoto K, Nagamura Y, Kurata N, Khush GS, Sasaki T (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics 148:479–494
Hawks JG (1983) The diversity of crop plants. Harvard University Press, Cambridge
He P, Li SG, Qian Q, Ma YQ, Li JZ, Wang WM, Chen Y, Zhu LH (1999) Genetic analysis of rice grain quality. Theor Appl Genet 98:502–508
Hobert JP, Casella G (1996) The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. J Am Stat Assoc 91:1461–1473
Huang N, Parco A, Mew T, Magpantay G, McCouh S, Guiderdoni E, Xu JC, Subudhi P, Angeles ER, Khush GS (1997) RFLP mapping of isozymes, RAPD, and QTLs for grain shape, brown planthopper resistance in a doubled-haploid rice population. Mol Breed 3:105–113
Huang H, Eversley CD, Threadgill DW, Zou F (2007) Bayesian multiple quantitative trait loci mapping for complex traits using markers of the entire genome. Genetics 176:2529–2540
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800
Iwata H, Uga Y, Yoshioka Y, Ebana K, Hayashi T (2007) Bayesian association mapping of multiple quantitative trait loci and its application to the analysis of genetic variation among Oryza sativa L. germplasms. Theor Appl Genet 114:1437–1449
Jannink JL (2007) Identifying quantitative trait locus by genetic background interaction in association studies. Genetics 176:553–561
Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K et al (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301:376–379
Kilpikari R, Sillanpää MJ (2003) Bayesian analysis of multilocus association in quantitative and qualitative traits. Genet Epidemiol 25:122–135
Kingman JFC (1982) The coalescent. Stochastic Process Appl 13:235–248
Kojima S, Takahashi Y, Kobayashi Y, Monna L, Sasaki T, Araki T, Yano M (2002) Hd3a, a rice ortholog of the Arabidopsis FT gene, promotes transition to flowering downstream of Hd1 under short-day conditions. Plant Cell Physiol 43:1096–1105
Kojima Y, Ebana K, Fukuoka S, Nagamine T, Kawase M (2005) Development of an RFLP-based rice diversity research set of germplasm. Breed Sci 55:431–440
Kuo L, Mallick B (1998) Variable selection for regression models. Sankhya Ser B 60:65–81
Kurata N, Nagamura Y, Yamamoto K, Harushima Y, Sue N, Wu J, Antonio BA, Shomura A, Shimizu T, Lin SY, Inoue T, Fukuda A, Shimano T, Kuboki Y, Toyama T, Miyamoto Y, Kirihara T, Hayasaka K, Miyao A, Monna L, Zhong HS, Tamura Y, Wang ZX, Momma T, Umehara Y, Yano M, Sasaki T, Minobe Y (1994) A 300 kilobase interval genetic map of rice including 883 expressed sequences. Nature Genet 8:365–372
Lanceras JC, Huang HL, Naivikul O, Vanavichit A, Ruanjaichon V, Tragoonrung S (2000) Mapping of genes for cooking and eating qualities in Thai Jasmine rice (KDML105). DNA Res 7:93–101
Lander ES, Schork NJ (1994) Genetic dissection of complex traits. Science 265:2037–2049
Laval G, Excoffier L (2004) SIMCOAL 2.0: a program to simulate genomic diversity over large recombining regions in a subdivided population with a complex history. Bioinformatics 20:2485–2487
Li ZK, Yu SB, Lafitte HR, Huang N, Courtois B, Hittalmani S, Vijayakumar CH, Liu GF, Wang GC, Shashidhar HE, Zhuang JY, Zheng KL, Singh VP, Sidhu JS, Srivantaneeyakul S, Khush GS (2003) QTL × environment interactions in rice. I. Heading date and plant height. Theor Appl Genet 108:141–153
Li J, Xiao J, Grandillo S, Jiang L, Wan Y, Qiyun D, Yuan L, McCouch SR (2004) QTL detection for rice grain quality traits using an interspecific backcross population derived from cultivated Asian (O. sativa L.) and African (O. glaberrima S.) rice. Theor Appl Genet 47:697–704
Malosetti M, van der Linden CG, Vosman B, van Eeuwijk FA (2007) A mixed-model approach to association mapping using pedigree information with an illustration of resistance to Phytophthora infestans in potato. Genetics 175:879–889
Mei HW, Luo LJ, Ying CS, Wang YP, Yu XQ, Guo LB, Paterson AH, Li ZK (2003) Gene actions of QTLs affecting several agronomic traits resolved in a recombinant inbred rice population and two testcross populations. Theor Appl Genet 107:89–101
Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829
Nei M (1973) Analysis of gene diversity in subdivided populations. Proc Nat Acad Sci USA 70:3321–3323
Oraguzie NC, Rikkerink EHA, Gardiner SE, De Silva HN (2007) Association mapping in plants. Springer, New York
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
Satagopan JM, Yandell BS, Newton MA, Osborn TC (1996) A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics 144:805–816
Sillanpää MJ, Arjas E (1998) Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148:1373–1388
Sillanpää MJ, Arjas E (1999) Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data. Genetics 151:1605–1619
Sillanpää MJ, Bhattacharjee M (2005) Bayesian association-based fine mapping in small chromosomal segments. Genetics 169:427–439
Sillanpää MJ, Bhattacharjee M (2006) Association mapping of complex trait loci with context-dependent effects and unknown context variable. Genetics 174:1597–1611
Sillanpää MJ, Hoti F (2007) Mapping quantitative trait loci from a single-tail sample of the phenotype distribution including survival data. Genetics 177:2361–2377
Sillanpää MJ, Kilpikari R, Ripatti S, Onkamo P, Uimari P (2001) Bayesian association mapping for quantitative traits in a mixture of two populations. Genet Epidemiol 21(Suppl 1):S692–S699
Sorensen D, Gianola D (2002) Likelihood, Bayesian, and MCMC methods in quantitative genetics. Springer, Heidelberg
Sorensen DA, Gianola D, Korsgaard I (1998) Bayesian mixed-effect model analysis of censored normal distribution with animal breeding applications. Acta Agric Scand 48:222–229
Thornsberry JM, Goodman MM, Doebley J, Kresovich S, Nielsen D, Buckler ES (2001) Dwarf8 polymorphisms associate with variation with flowering time. Nature Genet 28:286–289
Uimari P, Hoeschele I (1997) Mapping linked quantitative trait loci using Bayesian analysis and Markov chain Monte Carlo algorithms. Genetics 146:735–743
Uimari P, Sillanpää MJ (2001) Bayesian oligogenic analysis of quantitative and qualitative traits in general pedigrees. Genet Epidemiol 21:224–242
Wan XY, Wan JM, Weng JF, Jiang L, Bi JC, Wang CM, Zhai HQ (2005) Stability of QTLs for rice grain dimension and endosperm chalkiness characteristics across eight environments. Theor Appl Genet 110:1334–1346
Weber A, Clark RM, Vaughn L, Sanchez-Gonzalez JD, Yu JM, Yandell BS, Bradbury P, Doebley J (2007) Major regulatory genes in maize contribute to standing variation in teosinte (Zea mays ssp parviglumis). Genetics 177:2349–2359
Yamamoto T, Lin H, Sasaki T, Yano M (2000) Identification of heading date quantitative trait locus Hd6 and characterization of its epistatic interaction with Hd2 in rice using advanced backcross progeny. Genetics 154:885–891
Yamanaka S, Nakamura I, Watanabe KN, Sato YI (2004) Identification of SNPs in the waxy gene among glutinous rice cultivars and their evolutionary significance during the domestication process in rice. Theor Appl Genet 108:1200–1204
Yano M, Katayose Y, Ashikari M, Yamanouchi U, Monna L, Fuse T, Baba T, Yamamoto K, Umehara Y, Nagamura Y, Sasaki T (2000) Hd1, a major photoperiod sensitivity quantitative trait locus in rice, is closely related to the Arabidopsis flowering time gene CONSTANS. Plant Cell 12:2473–2484
Yi N (2004) A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics 167:967–975
Yi N, Xu S (2000) Bayesian mapping of quantitative trait loci for complex binary traits. Genetics 155:1391–1403
Yi N, George V, Allison DB (2003) Stochastic search variable selection for identifying multiple quantitative trait loci. Genetics 164:1129–1138
Yi N, Xu S, George V, Allison DB (2004) Mapping multiple quantitative trait loci for complex ordinal traits. Behav Genet 34:3–15
Yi N, Yandell BS, Churchill GA, Allison DB, Eisen EJ, Pomp D (2005) Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics 170:1333–1344
Yi N, Banerjee S, Pomp D, Yandell BS (2007) Bayesian mapping of genome-wide interacting QTL for ordinal traits. Genetics 176:1855–1864
Yu J, Buckler ES (2006) Genetic association mapping and genome organization of maize. Curr Opin Biotech 17:155–160
Yu J, Pressoir G, Briggs W, Vroh Bi I, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, Kresovich S, Buckler ES (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genet 38:203–208
Zhao K, Aranzana MJ, Kim S, Lister C, Shindo C, Tang C, Toomajian C, Zheng H, Dean C, Marjoram P, Nordborg M (2007) An Arabidopsis example of association mapping in structured samples. PLoS Genet 3:e4
Zweig MH, Campbell G (1993) Receiver-operating characteristic (ROC) plots—a fundamental evaluation tool in clinical medicine. Clin Chem 39:561–577
Acknowledgments
This work was supported by a grant from the Ministry of Agriculture, Forestry and Fisheries of Japan (Green Technology Project, QT1001, and Genomics for Agricultural Innovation, DD-4050).
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by M. Sillanpää.
Appendices
Appendix A: Priors and posteriors
We considered the prior distributions of the parameters in the model in Eq. 4 except \( \varvec{\gamma} \) as follows:
and
where υ β , s 2 β , υ e , and s 2 e are hyperparameters for the distributions. That is, \( \sigma_{{\beta_{k} }}^{2} \) was sampled from a scaled inverted chi-square distribution with υ β degree of freedom and scale parameter s 2 β , and σ 2 e from the same distribution with different parameters (i.e., υ e and s 2 e ).
We considered that the prior distribution of the number of 1 s in \( \varvec{\gamma}, \) i.e., the number of QTLs included in the model (N Q ), follows a truncated Poisson distribution. That is,
and
where λ is a hyperparameter for the distribution and is construed as the expected number of QTLs included in the model, and Q max is a hyperparameter that determines the upper limit of the number of QTLs that can be included in the model. The Poisson prior on the number of 1 s in \( \varvec{\gamma}, \) has been proposed by Yi (2004).
The prior of the cut-points in the ordinal probit model in Eq. 2, i.e., κ m (m = 2, 3,…, M − 1), has a uniform distribution that respects the ordering constraints. That is,
Bayesian implementations of the tobit model are based on data augmentation, and they have been applied to statistical genetic analysis for censored traits (Sorensen et al. 1998; Sillanpää and Hoti 2007). In the tobit model in Eq. 1, y * i values are not observed if y * i is greater than or equal to the threshold y T, although their values are necessary for estimating parameters in the model in Eq. 3. The unobserved y * i values are augmented by values sampled from the fully conditional posterior distribution:
where TN [a,b)(μ, σ 2) is a normal distribution N(μ, σ 2) truncated to [a, b), q i is the ith row of matrix Q, and x i is the ith row of matrix X. The sampling of unobserved y * i values was conducted repeatedly in the MCMC sampling procedure, as described in the next section.
Bayesian implementations of the ordinal probit model based on data augmentation were presented by Albert and Chib (1993), and they have been applied to multi-locus QTL mapping for ordinal traits (Yi et al. 2004, 2007). In the model, y * i in Eq. 3 is not observed for all samples, and the variance of y * i is assumed to be 1 for consistency with the standard normal c.d.f. link function (Cowles 1996). Thus, the y * i values are augmented by values sampled from the fully conditional posterior distribution:
where \( \varvec{\kappa} \) is a vector for the bin cut-points, i.e., \( \left( {\kappa_{0} ,\kappa_{1} , \ldots ,\kappa_{M} } \right)^{T} . \) The fully conditional posterior distribution of κ m is uniform, as follows:
Here, let X* = [γ 1 X 1,γ 2 X 2,…, γ K X K ]. Then, Eq. 4 can be rewritten as:
where \( \varvec{\beta}= \left[ {\varvec{\beta}_{1}^{T} ,\varvec{\beta}_{2}^{T} , \ldots ,\varvec{\beta}_{K}^{T} } \right]^{T}.\) Here, let
where
and \( \varvec{\theta}= \left[ {\varvec{\alpha}^{T} ,\varvec{\beta}^{T} } \right]^{T}, \) and let
and
Then, the conditional posterior distribution of the ith element of \( \varvec{\theta} \) is:
where \( \varvec{\gamma} \) is a vector whose kth element is γ k , \( \tilde{\theta }_{i} = (r_{i} - {\mathbf{C}}_{i, - i}\varvec{\theta}_{ - i} )/c_{i,i,},\) is the ith diagonal element of the matrix C, r i is the ith element of vector r, C i,−i is a row vector obtained by deleting element i from the ith row of the matrix C, and \( \varvec{\theta}_{ - i} \) is a vector obtained by deleting element i from the vector \( \varvec{\theta} \) (Sorensen and Gianola 2002, p. 566).
The fully conditional posterior distribution of \( \sigma_{{\beta_{k} }}^{2} \) is given by
where \( \varvec{\beta}_{k} \) is a column vector for the genetic effects of marker k, i.e., \( \left( {\beta_{k2} , \ldots ,\beta_{{kL_{k} }} } \right)^{T} ,\quad \tilde{\upsilon }_{{\beta_{k} }} = L_{k} + \upsilon_{\beta } - 1, \) and \( \tilde{s}_{{\beta_{k} }}^{2} = \left[ {\varvec{\beta}_{k}^{T}\varvec{\beta}_{k} + \upsilon_{\beta } s_{\beta }^{2} } \right]/\tilde{\upsilon }_{{\beta_{k} }} . \)
For the tobit model, the fully conditional posterior distribution of σ 2 e is given by:
where \( \tilde{\upsilon }_{e} = n + \upsilon_{e} \) and \( \tilde{s}_{e}^{2} = [({\mathbf{y}}^{*} - {\mathbf{W}}\varvec{\theta})^{T} ({\mathbf{y}}^{*} - {\mathbf{W}}\varvec{\theta}) + \upsilon_{e} s_{e}^{2} ]/\tilde{\upsilon }_{e} . \) For the ordinal probit model, σ 2 e is fixed at 1 (see Eq. A2).
The fully conditional posterior distribution of γ k is given by:
where \( \varvec{\gamma}_{ - k} \) is a vector obtained by removing element k from the vector \( \varvec{\gamma}, \)
where
The vector \( \varvec{\eta}_{k}^{*} \) is the column vector of \( \varvec{\eta} \) with the entries corresponding to marker k replaced by \( \varvec{\beta}_{k} \). Similarly, \( \varvec{\eta}_{k}^{**} \) is obtained from \( \varvec{\eta} \) with the entries corresponding to marker k replaced by the null vector (0).
In the analyses, we set hyperparameters for the prior distributions as \( \upsilon_{\beta } = 2,s_{{_{\beta } }}^{2} = 1,\upsilon_{e} = - 2,s_{e}^{2} = 0,\lambda = 1, \) and \( Q_{\max } = 15. \) In this hyperparameter setting, the prior of \( \sigma_{e}^{2} \) becomes a flat prior (i.e., an improper uniform distribution). Thus, the parameters \( \varvec{\alpha}, \) \( \sigma_{e}^{2} \) and \( \varvec{\kappa} \) had improper distributions in this study. When improper priors are assigned, the posterior distributions may not always be proper (Hobert and Casella 1996). One way to avoid an improper posterior distribution caused by an improper prior distribution is to specify upper and/or lower limits of parameters. In this study, however, we did not specify such limits for the parameters \( \varvec{\alpha},\sigma_{e}^{2} \) and \( \varvec{\kappa}, \) because the improper priors of these parameters seemed to work well in the simulation studies.
Appendix B: MCMC sampling
On the basis of the above equations for prior and posterior distributions, we can use the Gibbs sampler to generate MCMC samples from the posterior distribution of the model parameters. In the sampling of y * i and κ m in the model in Eq. 2, however, we used the multivariate Hastings-within-Gibbs algorithm proposed by Cowles (1996), since the latter algorithm substantially improves the convergence of the MCMC estimations (Cowles 1996). Setting the initial values of the parameters as σ 2 e = 1, \( \sigma_{{\beta_{k} }}^{2} = 1, \) α j = 0, β kl = 0, and γ k = 0, the MCMC sampling proceeds as follows:
- 1.
-
2.
Sample \( \varvec{\kappa} \) from the full conditional posterior distribution described in Eq. A3. (This step is only necessary for ordinal data.)
-
3.
Sample y *. The full conditional posterior distribution of y * is described in Eq. A1 for censored data and Eq. A2 for ordinal data.
- 4.
-
5.
Sample \( \varvec{\alpha} \) and \( \varvec{\beta} \) (i.e., \( \varvec{\theta} \)) from the full conditional posterior distribution described in Eq. A8.
-
6.
Sample \( \sigma_{{\beta_{k} }}^{2} \) for each k from the full conditional posterior distribution described in Eq. A9.
-
7.
Sample \( \sigma_{e}^{2} \) from the full conditional posterior distribution described in Eq. A10. (This step is not necessary for ordinal data.)
-
8.
Sample \( \varvec{\gamma} \) from the full conditional posterior distribution described in Eq. A11.
The above process was repeated many times (see “Data analysis procedure” in “Materials and methods”) to obtain the MCMC samples.
Rights and permissions
About this article
Cite this article
Iwata, H., Ebana, K., Fukuoka, S. et al. Bayesian multilocus association mapping on ordinal and censored traits and its application to the analysis of genetic variation among Oryza sativa L. germplasms. Theor Appl Genet 118, 865–880 (2009). https://doi.org/10.1007/s00122-008-0945-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00122-008-0945-6