An Empirical Bayes risk prediction model using multiple traits for sequencing data

Gengxin Li; Yuehua Cui; Hongyu Zhao

doi:10.1515/sagmb-2015-0060

Published by De Gruyter December 7, 2015

From the journal Statistical Applications in Genetics and Molecular Biology

https://doi.org/10.1515/sagmb-2015-0060

Abstract

The rapidly developing sequencing technologies have led to improved disease risk prediction through the identification of many novel genes. Many prediction methods have been proposed that use rich genomic information to predict binary disease outcomes. It is intuitive that these methods can be further improved by making efficient use of the rich information in measured quantitative traits that are correlated with binary outcomes. In this article, we propose a novel Empirical Bayes prediction model that uses information from both quantitative traits and binary disease status to improve risk prediction. Our method is built on a new statistic that better infers the gene effect on multiple traits, and it also enjoys good theoretical properties. We then consider sequencing data, combining information from multiple rare variants in individual genes to strengthen the signals of causal genetic effects. In simulation studies, we find that our proposed Empirical Bayes approach is superior to existing methods in terms of feature selection and risk prediction. We further evaluate the effectiveness of our proposed method by applying it to the sequencing data provided by the Genetic Analysis Workshop 18.

Keywords: area under the ROC curve (AUC); cross validation (CV); Empirical Bayes (EB) estimate; multiple traits; receiver operating characteristic curve (ROC)

Corresponding author: Gengxin Li, Department of Mathematics and Statistics, Wright State University, 3640 Colonel Glenn Hwy, Dayton, OH 45435, USA, e-mail: gengxin.li@wright.edu

Acknowledgments

This work was partially supported by grants from National Institutes of Health (GM59507 to HZ) and National Science Foundation (DMS-1209112 to YC). The Genetic Analysis Workshops are supported by NIH grant R01 GM031575. The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. Andrew R. Wood is supported by European Research Council grant SZ-245 50371-GLUCOSEGENES-FP7-IDEAS-ERC.

Conflict of interest: None declared.

Appendix

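A: The best linear unbiased predictor

(i) q is a standardized vector, with E(q)=0 and Var(q)=1.
(ii) ε_i has mean zero and variance ψ_i. Normally, ψ_i=1.
(iii) Assuming a simple linear regression model (Equation 5), we have

$$E(C_i)=E(\beta_{i,v}q+\varepsilon_i)=0, \qquad \mathrm{Var}(C_i)=\mathrm{Var}(\beta_{i,v}q+\varepsilon_i)=\beta_{i,v}^2+\psi_i.$$

Extending C_i to C, we get

$$E(C)=0, \qquad \mathrm{Var}(C)=\beta_v\beta_v'+\Psi,$$

where Ψ is the variance-covariance matrix of ε.
The joint distribution of (q, C) is

$$\begin{pmatrix} q\\ C\end{pmatrix}\sim\left[\begin{pmatrix}0\\ 0\end{pmatrix},\ \begin{pmatrix}1 & \beta_v\\ \beta_v & \beta_v\beta_v'+\Psi\end{pmatrix}\right],$$

since

$$\mathrm{Cov}(q, C)=\mathrm{Cov}(q, \beta_vq+\varepsilon)=\mathrm{Cov}(q, \beta_vq)+\mathrm{Cov}(q, \varepsilon)=\beta_v\mathrm{Cov}(q, q)+0=\beta_v, \qquad \mathrm{Cor}(q, C)=\frac{\beta_v}{1\times(\beta_v\beta_v'+\Psi)^{1/2}},$$

where q and ε are uncorrelated.
The best linear predictor is given by

$$g(X)=\mu_y+\rho\frac{\sigma_y}{\sigma_x}(X-\mu_x).$$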

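The predicted value of q, denoted q^#, is based on g(X), where q and C correspond to y and x in the above equation. According to (i) and (iii),

$$\begin{aligned}
E(q)=0 &\Rightarrow \mu_y=0,\\
E(C)=0 &\Rightarrow \mu_x=0,\\
\mathrm{Var}(q)=1 &\Rightarrow \sigma_y=1,\\
\mathrm{Var}(C)=\beta_v\beta_v'+\Psi &\Rightarrow \sigma_x=(\beta_v\beta_v'+\Psi)^{1/2},\\
\mathrm{Cor}(q, C)=\frac{\beta_v}{1\times(\beta_v\beta_v'+\Psi)^{1/2}} &\Rightarrow \rho=\frac{\beta_v}{(\beta_v\beta_v'+\Psi)^{1/2}}.
\end{aligned}$$

Finally, the best predictor equation becomes

$$q^{\#}=\rho\frac{\sigma_y}{\sigma_x}X=\beta_v'(\beta_v\beta_v'+\Psi)^{-1}C=\frac{1}{1+\lambda_v^2}\beta_v'\Psi^{-1}C,$$

where Ψ is the variance-covariance matrix of ε, and $\lambda_v^2=\beta_v'\Psi^{-1}\beta_v$ is the squared multivariate distance of β_v from the origin.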
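The last equality in the predictor equation is stated without an intermediate step. One way to see it, assuming Ψ is invertible, is through the Sherman–Morrison formula for a rank-one update (a standard matrix identity that the original does not cite; it is used here only as an illustration):

```latex
\begin{aligned}
(\Psi+\beta_v\beta_v')^{-1}
  &= \Psi^{-1}-\frac{\Psi^{-1}\beta_v\beta_v'\Psi^{-1}}{1+\beta_v'\Psi^{-1}\beta_v},\\
\beta_v'(\Psi+\beta_v\beta_v')^{-1}
  &= \beta_v'\Psi^{-1}-\frac{\lambda_v^2}{1+\lambda_v^2}\,\beta_v'\Psi^{-1}
   = \frac{1}{1+\lambda_v^2}\,\beta_v'\Psi^{-1},
\qquad \lambda_v^2=\beta_v'\Psi^{-1}\beta_v .
\end{aligned}
```

Multiplying the second line by C on the right gives the final form of q^#.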

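B: The t statistic

If ε_i follows a normal distribution, the t-test statistic for the null hypothesis β_i,v=0 under model (5) follows a non-central t distribution,

$$T_{i,v}=\frac{\hat\beta_{i,v}-0}{se(\hat\beta_{i,v})}=2c_0\hat\beta_{i,v}\sim t_{n-2}(\gamma_{i,v}), \quad\text{where } c_0^2=\frac{\sum_{j=1}^{n}(q_j-\bar q)^2}{4}.$$

Proof: Given q, the ordinary least squares estimate of the coefficient β_i,v is $\hat\beta_{i,v}|q=(q'q)^{-1}q'C_i$ (Montgomery et al., 2006), so

$$\mathrm{Var}(\hat\beta_{i,v}|q)=(q'q)^{-1}q'q(q'q)^{-1}\mathrm{Var}(C_i|q)=(q'q)^{-1}, \qquad se(\hat\beta_{i,v}|q)=(q'q)^{-1/2}.$$

Because q is standardized with E(q)=0,

$$q'q=\sum_{j=1}^{n}(q_j-\bar q)^2.$$

If ε_i is normally distributed, the t-test statistic satisfies

$$T_{i,v}=\frac{\hat\beta_{i,v}-0}{se(\hat\beta_{i,v})}=(q'q)^{1/2}\hat\beta_{i,v}=2c_0\hat\beta_{i,v}\sim t_{n-2}(\gamma_{i,v}),$$

where $c_0^2=\sum_{j=1}^{n}(q_j-\bar q)^2/4$, so that $2c_0=\left(\sum_{j=1}^{n}(q_j-\bar q)^2\right)^{1/2}=(q'q)^{1/2}$.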
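The identity T = (q′q)^{1/2} β̂ = 2c₀β̂ can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `slope_and_t`, the simulated data, the true slope 0.3 and the unit residual variance are all assumptions made for illustration.

```python
import math
import random

def slope_and_t(q, c):
    """No-intercept OLS slope of c on q and the statistic (q'q)^{1/2} * beta_hat,
    assuming unit residual variance as in the proof above."""
    qq = sum(x * x for x in q)                         # q'q
    beta_hat = sum(x * y for x, y in zip(q, c)) / qq   # (q'q)^{-1} q'C_i
    return beta_hat, math.sqrt(qq) * beta_hat

random.seed(0)
n = 200
raw = [random.gauss(0.0, 1.0) for _ in range(n)]
mean = sum(raw) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in raw) / (n - 1))
q = [(x - mean) / sd for x in raw]                     # standardized trait (hypothetical data)
c = [0.3 * x + random.gauss(0.0, 1.0) for x in q]      # C_i = beta * q + eps, eps ~ N(0, 1)

beta_hat, t_stat = slope_and_t(q, c)
q_bar = sum(q) / n
c0 = math.sqrt(sum((x - q_bar) ** 2 for x in q) / 4.0)  # c_0 as defined in the text
print(beta_hat, t_stat, 2.0 * c0 * beta_hat)
```

The last two printed values agree up to floating-point error because the standardized q has sample mean zero, so q′q equals the centered sum of squares.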

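C: Proof of Proposition 1

The proof follows results in the literature (Ker, 2001; Nadarajah and Kotz, 2008). To simplify the notation, we use Z₁ and Z₂ to represent Z_i,d and Z_i,v, respectively. Assume that (Z₁, Z₂) follows a bivariate normal distribution,

$$\begin{pmatrix}Z_1\\ Z_2\end{pmatrix}\sim MVN(\gamma, \Sigma), \qquad \gamma=(\gamma_1, \gamma_2)^T, \qquad \Sigma=\begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix},$$

where ρ denotes the correlation between Z₁ and Z₂. Let Z=min(Z₁, Z₂) and I=r if Z=Z_r (r=1, 2).

Lemma 1: Let ϕ(.) and Φ(.) be the univariate standard normal density (pdf) and its corresponding cumulative distribution function (cdf), respectively. Then the conditional pdf of Z given I=r, for all z and for r=1 or 2, is

$$f_r(z)=\Pr(Z\le z\mid I=r)=\begin{cases}p_r^{-1}\,\phi\!\left(\dfrac{z-\gamma_r}{\sigma_r}\right)\left(1-\Phi\!\left(\dfrac{z-\gamma_r^{*}}{\sigma_r^{*}}\right)\right) & \text{if } \rho\neq 1,\\[2ex] \phi\!\left(\dfrac{z-\gamma_r}{\sigma_r}\right) & \text{otherwise,}\end{cases}$$

where $\gamma_r^{*}=\alpha_r\gamma_{3-r}+(1-\alpha_r)\gamma_r$, $\sigma_r^{*}=\alpha_r\sigma_{3-r}\sqrt{1-\rho^2}$, and $\alpha_r=\dfrac{1}{1-\rho\,\sigma_{3-r}/\sigma_r}$. Here, Z=min(Z₁, Z₂) applies for ρ≠1. If ρ=1, Z=Z₁=Z₂.

In addition,

$$p_r=P(I=r)=\Phi\!\left(\frac{\gamma_r-\gamma_{3-r}}{\sqrt{\sigma_1^2+\sigma_2^2-2\rho\sigma_1\sigma_2}}\right).$$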

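Proof: Let ℵ be the distribution of the pair (Z, I). Then

$$\begin{aligned}
\aleph(z, 1)&=\Pr(Z\le z, I=1)=\Pr(I=1)-\Pr(Z_2>Z_1>z)\\
&=p_1-\int_{z}^{\infty}\int_{z_1}^{\infty}\phi\!\left(\frac{(z_2-\gamma_2)-\rho\frac{\sigma_2}{\sigma_1}(z_1-\gamma_1)}{\sigma_2\sqrt{1-\rho^2}}\right)\phi\!\left(\frac{z_1-\gamma_1}{\sigma_1}\right)dz_2\,dz_1\\
&=p_1-\int_{z}^{\infty}\left(1-\Phi\!\left(\frac{(z-\gamma_2)-\rho\frac{\sigma_2}{\sigma_1}(z-\gamma_1)}{\sigma_2\sqrt{1-\rho^2}}\right)\right)\phi\!\left(\frac{z-\gamma_1}{\sigma_1}\right)dz.
\end{aligned}$$

Hence, for all z,

$$\begin{aligned}
\aleph(z, 1)&=\Pr(Z\le z, I=1)=\Pr(Z\le z\mid I=1)\Pr(I=1)=p_1f_1(z)\\
&=\frac{1}{\sigma_1}\phi\!\left(\frac{z-\gamma_1}{\sigma_1}\right)\left(1-\Phi\!\left(\frac{(z-\gamma_2)-\rho\sigma_2\frac{z-\gamma_1}{\sigma_1}}{\sigma_2\sqrt{1-\rho^2}}\right)\right)\\
&=\frac{1}{\sigma_1}\phi\!\left(\frac{z-\gamma_1}{\sigma_1}\right)\Phi\!\left(\frac{\rho(z-\gamma_1)}{\sigma_1\sqrt{1-\rho^2}}-\frac{z-\gamma_2}{\sigma_2\sqrt{1-\rho^2}}\right),
\end{aligned}$$

where Φ(−a)=1−Φ(a).

The equation for p₁ is derived from

$$p_1=\Pr(Z_1-Z_2<0)=\int_{-\infty}^{\frac{\gamma_1-\gamma_2}{\sqrt{\sigma_1^2+\sigma_2^2-2\rho\sigma_1\sigma_2}}}\phi(z)\,dz.$$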

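Proposition 1: The distribution of the pair (Z, I) uniquely determines the distribution of the pair (Z₁, Z₂).

Proof: Suppose (Z, I) has the distribution ℵ, and let the parameters (γ′₁, γ′₂) and σ′_r, for r=1 or 2, denote any bivariate normal pdf of (Z, I). According to the above results,

$$\lim_{z\to-\infty}\frac{p_rf_r(z)}{\phi\!\left(\frac{z-\gamma_r}{\sigma_r}\right)}=\lim_{z\to-\infty}\frac{p_rf_r(z)}{\phi\!\left(\frac{z-\gamma_r'}{\sigma_r'}\right)}=1.$$

Hence $\phi\!\left(\frac{z-\gamma_r}{\sigma_r}\right)\Big/\phi\!\left(\frac{z-\gamma_r'}{\sigma_r'}\right)\to 1$, which gives γ_r=γ′_r, σ_r=σ′_r and ρ=ρ′.

Similarly, for I=2, the pdf of (Z, I) is given by

$$\aleph(z, 2)=\Pr(Z\le z, I=2)=p_2f_2(z)=\frac{1}{\sigma_2}\phi\!\left(\frac{z-\gamma_2}{\sigma_2}\right)\Phi\!\left(\frac{\rho(z-\gamma_2)}{\sigma_2\sqrt{1-\rho^2}}-\frac{z-\gamma_1}{\sigma_1\sqrt{1-\rho^2}}\right).$$

Finally, the probability density function of Z=min(Z₁, Z₂), for I=1 or 2, is given by

$$\begin{aligned}
\aleph(z, 1)+\aleph(z, 2)&=p_1f_1(z)+p_2f_2(z)\\
&=\frac{1}{\sigma_1}\phi\!\left(\frac{z-\gamma_1}{\sigma_1}\right)\Phi\!\left(\frac{\rho(z-\gamma_1)}{\sigma_1\sqrt{1-\rho^2}}-\frac{z-\gamma_2}{\sigma_2\sqrt{1-\rho^2}}\right)\\
&\quad+\frac{1}{\sigma_2}\phi\!\left(\frac{z-\gamma_2}{\sigma_2}\right)\Phi\!\left(\frac{\rho(z-\gamma_2)}{\sigma_2\sqrt{1-\rho^2}}-\frac{z-\gamma_1}{\sigma_1\sqrt{1-\rho^2}}\right).
\end{aligned}$$

When Z₁ and Z₂ are replaced by Z_i,d and Z_i,v respectively, γ₁ and γ₂ denote γ_i,d and γ_i,v, and Z is replaced by Z_i,m, the pdf of Z_i,m=min(Z_i,d, Z_i,v) is obtained.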
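As a sanity check on the density p₁f₁(z)+p₂f₂(z) of the minimum, the following sketch evaluates it numerically. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`phi`, `Phi`, `min_pdf`, `integrate`) and the parameter values are hypothetical, and |ρ|<1 is required. It checks that the density integrates to one and that, for ρ=0, its integral matches the elementary cdf 1−(1−Φ₁)(1−Φ₂) of the minimum of two independent normals.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def min_pdf(z, g1, g2, s1, s2, rho):
    """p1*f1(z) + p2*f2(z) for Z = min(Z1, Z2); requires |rho| < 1."""
    k = math.sqrt(1.0 - rho * rho)
    a1, a2 = (z - g1) / s1, (z - g2) / s2
    term1 = phi(a1) / s1 * Phi((rho * a1 - a2) / k)   # Z1 is the minimum
    term2 = phi(a2) / s2 * Phi((rho * a2 - a1) / k)   # Z2 is the minimum
    return term1 + term2

def integrate(f, lo, hi, n=6000):
    """Composite trapezoid rule on n equal subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

# Hypothetical parameter values chosen only for the check.
g1, g2, s1, s2 = 0.5, -0.3, 1.0, 1.5
total_mass = integrate(lambda z: min_pdf(z, g1, g2, s1, s2, 0.4), -15.0, 15.0)

z0 = 0.2
cdf_numeric = integrate(lambda z: min_pdf(z, g1, g2, s1, s2, 0.0), -15.0, z0)
cdf_exact = 1.0 - (1.0 - Phi((z0 - g1) / s1)) * (1.0 - Phi((z0 - g2) / s2))
print(total_mass, cdf_numeric, cdf_exact)
```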

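In the same way, we can derive the pdf of Z=max(Z₃, Z₄). Assume that (Z₃, Z₄) follows a bivariate normal distribution,

$$\begin{pmatrix}Z_3\\ Z_4\end{pmatrix}\sim MVN(\gamma, \Sigma), \qquad \gamma=(\gamma_3, \gamma_4)^T, \qquad \Sigma=\begin{pmatrix}\sigma_3^2 & \rho\sigma_3\sigma_4\\ \rho\sigma_3\sigma_4 & \sigma_4^2\end{pmatrix},$$

where ρ denotes the correlation between Z₃ and Z₄. Let Z=max(Z₃, Z₄) and I=r if Z=Z_r (r=3 or 4).

Lemma 2: Let ϕ(.) and Φ(.) be the univariate standard normal density (pdf) and its corresponding cumulative distribution function (cdf), respectively. Then the pdf of Z given I=r, for all z and for r=3 or 4, is

$$f_r(z)=\Pr(Z\le z\mid I=r)=\begin{cases}p_r^{-1}\,\phi\!\left(\dfrac{\gamma_r-z}{\sigma_r}\right)\Phi\!\left(\dfrac{z-\gamma_r^{*}}{\sigma_r^{*}}\right) & \text{if } \rho\neq 1,\\[2ex] \phi\!\left(\dfrac{\gamma_r-z}{\sigma_r}\right) & \text{otherwise,}\end{cases}$$

where $\gamma_r^{*}=\alpha_r\gamma_{7-r}+(1-\alpha_r)\gamma_r$, $\sigma_r^{*}=\alpha_r\sigma_{7-r}\sqrt{1-\rho^2}$, and $\alpha_r=\dfrac{1}{1-\rho\,\sigma_{7-r}/\sigma_r}$.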

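Proof: Let ℵ be the distribution of the pair (Z, I). Then

$$\begin{aligned}
\aleph(z, 3)&=\Pr(Z\le z, I=3)=\Pr(Z_4<Z_3\le z)\\
&=\int_{-\infty}^{z}\int_{-\infty}^{z_3}\phi\!\left(\frac{(z_3-\gamma_3)-\rho\frac{\sigma_3}{\sigma_4}(z_3-\gamma_4)}{\sigma_3\sqrt{1-\rho^2}}\right)\phi\!\left(\frac{z_4-\gamma_4}{\sigma_4}\right)dz_4\,dz_3\\
&=\int_{-\infty}^{z}\Phi\!\left(\frac{\rho\frac{\sigma_3}{\sigma_4}(\gamma_4-z)-(\gamma_3-z)}{\sigma_3\sqrt{1-\rho^2}}\right)\phi\!\left(\frac{z-\gamma_4}{\sigma_4}\right)dz.
\end{aligned}$$

Then the pdf of (Z, 3) is given by

$$p_3f_3(z)=\frac{1}{\sigma_4}\phi\!\left(\frac{\gamma_4-z}{\sigma_4}\right)\Phi\!\left(\frac{\rho(\gamma_4-z)}{\sigma_4\sqrt{1-\rho^2}}-\frac{\gamma_3-z}{\sigma_3\sqrt{1-\rho^2}}\right),$$

where ϕ(a)=ϕ(−a).

Similarly, the pdf of (Z, 4) is given by

$$p_4f_4(z)=\frac{1}{\sigma_3}\phi\!\left(\frac{\gamma_3-z}{\sigma_3}\right)\Phi\!\left(\frac{\rho(\gamma_3-z)}{\sigma_3\sqrt{1-\rho^2}}-\frac{\gamma_4-z}{\sigma_4\sqrt{1-\rho^2}}\right).$$

The equation for p₃ is derived from

$$p_3=\Pr(Z_3-Z_4>0)=\int_{\frac{\gamma_4-\gamma_3}{\sqrt{\sigma_3^2+\sigma_4^2-2\rho\sigma_3\sigma_4}}}^{\infty}\phi(z)\,dz.$$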

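Proposition 2: The distribution of the pair (Z, I) uniquely determines the distribution of the pair (Z₃, Z₄).

Proof: The same as for Proposition 1 above.

Finally, the pdf of Z=max(Z₃, Z₄) is given by

$$\begin{aligned}
p_3f_3(z)+p_4f_4(z)&=\frac{1}{\sigma_4}\phi\!\left(\frac{\gamma_4-z}{\sigma_4}\right)\Phi\!\left(\frac{\rho(\gamma_4-z)}{\sigma_4\sqrt{1-\rho^2}}-\frac{\gamma_3-z}{\sigma_3\sqrt{1-\rho^2}}\right)\\
&\quad+\frac{1}{\sigma_3}\phi\!\left(\frac{\gamma_3-z}{\sigma_3}\right)\Phi\!\left(\frac{\rho(\gamma_3-z)}{\sigma_3\sqrt{1-\rho^2}}-\frac{\gamma_4-z}{\sigma_4\sqrt{1-\rho^2}}\right).
\end{aligned}$$

Again, when Z₃ and Z₄ are replaced by Z_i,v and Z_i,d respectively, γ₃ and γ₄ denote γ_i,v and γ_i,d, and Z is replaced by Z_i,m, the pdf of Z_i,m=max(Z_i,d, Z_i,v) is obtained.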
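The density of the maximum can be compared against simulation. The sketch below is illustrative only: the function names, the parameter values, the seed and the number of draws are assumptions, and |ρ|<1 is required. It draws correlated normal pairs, records how often max(Z₃, Z₄) falls below a threshold, and compares that frequency with the integral of p₃f₃(z)+p₄f₄(z) up to the same threshold.

```python
import math
import random

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_pdf(z, g3, g4, s3, s4, rho):
    """p3*f3(z) + p4*f4(z) for Z = max(Z3, Z4); requires |rho| < 1."""
    k = math.sqrt(1.0 - rho * rho)
    b3, b4 = (g3 - z) / s3, (g4 - z) / s4
    return (phi(b4) / s4 * Phi((rho * b4 - b3) / k)
            + phi(b3) / s3 * Phi((rho * b3 - b4) / k))

# Hypothetical parameter values chosen only for the check.
g3, g4, s3, s4, rho, z0 = 0.2, -0.4, 1.0, 0.8, -0.3, 0.5

# Trapezoid-rule integral of the density from far in the left tail up to z0.
lo, steps = -12.0, 5000
h = (z0 - lo) / steps
numeric_cdf = 0.5 * (max_pdf(lo, g3, g4, s3, s4, rho) + max_pdf(z0, g3, g4, s3, s4, rho))
for i in range(1, steps):
    numeric_cdf += max_pdf(lo + i * h, g3, g4, s3, s4, rho)
numeric_cdf *= h

# Monte Carlo estimate of Pr(max(Z3, Z4) <= z0) from correlated normal draws.
random.seed(2)
k = math.sqrt(1.0 - rho * rho)
n_draws = 100_000
hits = 0
for _ in range(n_draws):
    u, v = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    z3 = g3 + s3 * u
    z4 = g4 + s4 * (rho * u + k * v)    # Cor(z3, z4) = rho by construction
    if max(z3, z4) <= z0:
        hits += 1
empirical_cdf = hits / n_draws
print(numeric_cdf, empirical_cdf)
```

With this many draws the Monte Carlo standard error is below 0.002, so the two values should agree to about two decimal places.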

D: Effects of multiple quantitative traits

GAW 17 provides three quantitative traits: Q1, Q2 and Q4. The residual heritability of Q1 is 0.44, and the covariates age and smoking status act on Q1. Q2 has a heritability of 0.29 and is not affected by covariates. Q4 is simulated from various protective genes and has a heritability of 0.7. According to the disease risk simulation, Q1 and Q2 are positively correlated with the disease trait, whereas Q4 is negatively correlated with disease status (Appendix Table A1). When we add the three quantitative traits to the disease risk EB classifier one at a time (Appendix Table A2), five true genes are detected by adding Q1 to the EB model, four by adding Q2, and only two by adding Q4. All three quantitative traits detect the FLT1 gene, but Q1 gives it a higher rank (1) than Q2 and Q4 (both 4). Appendix Table A3 further evaluates the performance of the three quantitative traits in predicting disease risk. Adding Q1 to the EB model gives the smallest cross-validation error (0.26 with SD 0.0003) and the largest AUC (0.91 with SD 0.022), followed by Q2, then Q4. These results agree with the simulation setting for each quantitative trait.

Table A1

Characteristics of three quantitative traits (Q1, Q2 and Q4).

Trait	Heritability	Covariates	Disease trait
Q1	0.44	Related	+
Q2	0.29	Unrelated	+
Q4	0.70	Related	–

Covariates: age, sex and smoking status are used as the covariates; Disease trait: is a binary trait; +: denotes a positive correlation; –: indicates a negative correlation; related/unrelated: is used to check whether the quantitative trait is influenced by the covariates or not.

Table A2

The ranking position of true genes for three classifiers (D+Q1, D+Q2 and D+Q4).

No.	Predictor	D+Q1 rank position	D+Q2 rank position	D+Q4 rank position
1	FLT1	1	4	4
2	KDR	2	–	–
3	PTK2B	16	–	–
4	PIK3C2B	47	26	–
5	SHC1	208	–	–
6	FLT4	–	182	–
7	VEGFA	–	73	–
8	AKT3	–	–	123

The ranking positions of eight true genes FLT1, KDR, PTK2B, PIK3C2B, SHC1, FLT4, VEGFA and AKT3 detected by 3 models (D+Q1, D+Q2 and D+Q4); D+Q1: adding Q1 in the Empirical Bayes model; D+Q2: adding Q2 in the Empirical Bayes model; D+Q4: adding Q4 in the Empirical Bayes model; “–”: the position of one gene is outside of top 210 important genes.

Table A3

Cross validation error and AUC for three classifiers (D+Q1, D+Q2 and D+Q4).

Item	Scenario	Method	D+Q1	D+Q2	D+Q4
CV	Gene+Envi	Mean	0.26	0.27	0.31
		SD	0.0003	0.0002	0.0024
AUC	Gene+Envi	Mean	0.91	0.88	0.85
		SD	0.022	0.020	0.021

CV indicates the cross validation error; AUC is the area under the receiver operating characteristic (ROC) curve when minimizing CV; SD is the standard deviation of cross validation error and AUC; D+Q1: adding Q1 in the Empirical Bayes model; D+Q2: adding Q2 in the Empirical Bayes model; D+Q4: adding Q4 in the Empirical Bayes model.

E: Multiple replicates

The GAW 17 sequence data contain R replicated phenotypes, which yields an analogous Z statistic for every replicate. According to the Brown and Stein Bayesian model, each Z_i,a (a=1, …, R) is drawn from a normal distribution $N(\gamma_i, \sigma_i^2)$, so the expectation of the average of the statistics (Z̅_i) is still γ_i, that is,

$$E(\bar Z_i)=E\!\left(\frac{Z_{i,1}+\dots+Z_{i,R}}{R}\right)=\gamma_i.$$

However, the variance of this new estimator deviates from unit variance because of the correlations among the Z statistics, which may violate the assumption on the statistic that is required before computing the EB estimates. To characterize the variance of Z̅_i, we define a common correlation r between the statistics of any two replicates: Cor(Z_i,a, Z_i,b)=r for a, b=1, …, R and a≠b. Then

$$\mathrm{Var}(\bar Z_i)=\mathrm{Var}\!\left(\frac{Z_{i,1}+\dots+Z_{i,R}}{R}\right)=\frac{\mathrm{Var}(Z_{i,1})+\dots+\mathrm{Var}(Z_{i,R})+R(R-1)\,r\,\sigma_i^2}{R^2}=\frac{1+(R-1)r}{R}\sigma_i^2.$$

The variance of Z̅_i is therefore not constant, but the correlation parameter r can be estimated empirically under the model

$$Z_{i,a}\sim N(\gamma_i, \sigma_i^2), \qquad \bar Z_i\sim N\!\left(\gamma_i, \frac{1+(R-1)r}{R}\sigma_i^2\right), \qquad \mathrm{Cor}(Z_{i,a}, Z_{i,b})=r \ \text{ for any } a, b, i,$$

where a and b index different replicates, and i indicates the ith feature. Because $\sigma_i^2$ is close to 1, we set $\sigma_i^2=1$ to simplify the calculation without losing accuracy of the estimator.

A new statistic, $Z_i^{\ddagger}=\bar Z_i\Big/\sqrt{\frac{1+(R-1)r}{R}}$, defined from Z̅_i, has unit variance and follows a normal distribution with mean $\gamma_i^{\ddagger}=\gamma_i\Big/\sqrt{\frac{1+(R-1)r}{R}}$. The Bayesian estimate of the parameter $\gamma_i^{\ddagger}$ is obtained as $E(\gamma_i^{\ddagger}|Z_i^{\ddagger})$ by shrinking the estimator $Z_i^{\ddagger}$. The Empirical Bayes estimate of the true γ_i is then $E(\gamma_i|Z_i^{\ddagger})=E(\gamma_i^{\ddagger}|Z_i^{\ddagger})\sqrt{\frac{1+(R-1)r}{R}}$, because $E(\gamma_i^{\ddagger}|Z_i^{\ddagger})=E\!\left(\gamma_i\Big/\sqrt{\frac{1+(R-1)r}{R}}\,\Big|\,Z_i^{\ddagger}\right)=E(\gamma_i|Z_i^{\ddagger})\Big/\sqrt{\frac{1+(R-1)r}{R}}$. We then incorporate the estimates $E(\gamma_i|Z_i^{\ddagger})$ into the prediction model. In particular, r is estimated by

$$\frac{\sum_{i=1}^{N}\dfrac{\sum_{a=1}^{R}(Z_{i,a}-\bar Z_i)^2}{R-1}}{N}=1-r, \qquad \hat r=1-\frac{\sum_{i=1}^{N}s_{Z_i}^2}{N},$$

where $s_{Z_i}^2$ is the sample variance of the sequence of statistics {Z_i,1, …, Z_i,R} at the ith feature, and i=1, …, N.
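The replicate-pooling step can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `pooled_statistics`, the simulated data, and the choices N=2000, R=5, r=0.5 and γ_i=0 for every feature are hypothetical, and σ_i²=1 is assumed as in the text. The subsequent shrinkage step E(γ‡|Z‡) is not implemented here.

```python
import math
import random
import statistics

def pooled_statistics(Z):
    """Z is a list of N rows, each holding the R replicate statistics of one feature.
    Returns (r_hat, z_dagger) under the assumption sigma_i^2 = 1."""
    R = len(Z[0])
    # r_hat = 1 - average within-feature sample variance of the replicates
    r_hat = 1.0 - statistics.fmean(statistics.variance(row) for row in Z)
    # Standard deviation of Z-bar_i implied by the equicorrelation model
    scale = math.sqrt((1.0 + (R - 1) * r_hat) / R)
    z_dagger = [statistics.fmean(row) / scale for row in Z]
    return r_hat, z_dagger

# Simulated replicates with unit variance and pairwise correlation r_true:
# Z_{i,a} = sqrt(r) * u_i + sqrt(1 - r) * e_{i,a}, with gamma_i = 0 (hypothetical setup).
random.seed(1)
N, R, r_true = 2000, 5, 0.5
Z = []
for _ in range(N):
    shared = math.sqrt(r_true) * random.gauss(0.0, 1.0)
    Z.append([shared + math.sqrt(1.0 - r_true) * random.gauss(0.0, 1.0) for _ in range(R)])

r_hat, z_dagger = pooled_statistics(Z)
print(r_hat, statistics.variance(z_dagger))
```

In this setup the estimate of r should be close to 0.5 and the rescaled statistics should have sample variance close to one, which is the property the rescaling is meant to restore.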

References

Almasy, L. A., T. D. Dyer, J. M. Peralta, J. W. Kent, J. C. Charlesworth, J. E. Curran and J. Blangero (2011): “Genetic analysis workshop 17 mini-exome simulation,” BMC Proc., 5 (Suppl 9), S2. doi:10.1186/1753-6561-5-S9-S2

Almasy, L. A., T. D. Dyer, J. M. Peralta, G. Jun, A. R. Wood, C. Fuchsberger, M. A. Almeida, J. W. Kent, S. Fowler, T. W. Blackwell, S. Puppala, S. Kumar, J. E. Curran, D. Lehman, G. Abecasis, R. Duggirala, J. Blangero and The T2D-GENES Consortium (2014): “Data for genetic analysis workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees,” BMC Proc., 8 (Suppl 2), S2. doi:10.1186/1753-6561-8-S1-S2

Amasyali, M. F. and B. Diri (2006): “Automatic Turkish Text Categorization in terms of Author, genre and gender,” Proc. of the 11th International Conference on Applications of Natural Language to Information Systems, pp. 221–226.

Ball, M. P., J. V. Thakuria, A. W. Zaranek, T. Clegg, A. M. Rosenbaum, X. Wu, M. Angrist, J. Bhak, J. Bobe, M. J. Callow, C. Cano, M. F. Chou, W. K. Chung, S. M. Douglas, P. W. Estep, A. Gore, P. Hulick, A. Labarga, J. H. Lee, J. E. Lunshof, B. C. Kim, J. I. Kim, Z. Li, M. F. Murray, G. B. Nilsen, B. A. Peters, A. M. Raman, H. Y. Rienhoff, K. Robasky, M. T. Wheeler, W. Vandewege, D. B. Vorhaus, J. L. Yang, L. Yang, J. Aach, E. A. Ashley, R. Drmanac, S. J. Kim, J. B. Li, L. Peshkin, C. E. Seidman, J. S. Seo, K. Zhang, H. L. Rehm and G. M. Church (2012): “A public resource facilitating clinical use of genomes,” Proc. Natl. Acad. Sci. USA, 109, 11920–11927. doi:10.1073/pnas.1201904109

Bamshad, M. J., S. B. Ng, A. W. Bigham, H. K. Tabor, M. J. Emond, D. A. Nickerson and J. Shendure (2011): “Exome sequencing as a tool for Mendelian disease gene discovery,” Nat. Rev. Genet., 12, 745–755.

Breiman, L. (2001): “Random forests,” Mach. Learn., 45, 5–32.

Brown, L. D. (1971): “Admissible estimators, recurrent diffusions, and insoluble boundary value problems,” Ann. Math. Statist., 42, 855–903.

Cirulli, E. T. and D. B. Goldstein (2010): “Uncovering the roles of rare variants in common disease through whole genome sequencing,” Nat. Rev. Genet., 11, 415–425.

Dawid, A. P. (1994): Selection paradoxes of Bayesian inference, Multivariate analysis and its applications (Hong Kong, 1992). Hayward, CA: IMS, pp. 211–220.

Efron, B. (2009): “Empirical bayes estimates for large-scale prediction problems,” J. Am. Stat. Assoc., 104, 1015–1028.

Goldstein, D. B., A. Allen, J. Keebler, E. H. Margulies, S. Petrou, S. Petrovski and S. Sunyaev (2013): “Sequencing studies in human genetics: design and interpretation,” Nat. Rev. Genet., 14, 460–470.

Hindorff, L. A., P. Sethupathy, H. A. Junkins, E. M. Ramos, J. P. Mehta, F. S. Collins and T. A. Manolio (2009): “Potential etiologic and functional implications of genome-wide association loci for human diseases and traits,” Proc. Natl. Acad. Sci. USA, 106, 9362–9367. doi:10.1073/pnas.0903103106

Hoerl, A. E. and R. Kennard (1970): “Ridge regression: biased estimation for nonorthogonal problems,” Technometrics, 12, 55–67. doi:10.1080/00401706.1970.10488634

International Schizophrenia Consortium, Purcell, S. M., N. R. Wray, J. L. Stone, P. M. Visscher, M. C. O’Donovan, P. F. Sullivan, P. Sklar, Collaborators (121) (2009): “Common polygenic variation contributes to risk of schizophrenia and bipolar disorder,” Nature, 460, 748–752. doi:10.1038/nature08185

Ker, A. P. (2001): “On the maximum of bivariate normal random variables,” Extremes, 4, 185–190. doi:10.1023/A:1013977210907

Klema, J. and A. Almonayyes (2006): “Automatic categorization of fanatic texts using random forests,” Kuwait J. Sci. Eng., 33, 1–18.

Le Cun, Y. (1989): “Generalization and network design strategies,” Technical Report CRG-TR-89-4, Dept. of Comp. Sci., Univ. of Toronto.

Le Cun, Y., L. Bottou, Y. Bengio and P. Haffner (1998): “Gradient-based learning applied to document recognition,” Proc. IEEE, 86, 2278–2324. doi:10.1109/5.726791

Lee, S. H., N. R. Wray, M. E. Goddard and P. M. Visscher (2011): “Estimating missing heritability for disease from genome-wide association studies,” Am. J. Hum. Genet., 88, 294–305.

Li, B. and S. M. Leal (2008): “Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data,” Am. J. Hum. Genet., 83, 311–321.

Li, G. X., J. Ferguson, W. Zheng, J. S. Lee, X. H. Zhang, L. Li, J. Kang, X. T. Yan and H. Y. Zhao (2011): “Large-scale risk prediction applied to genetic analysis workshop 17 mini-exome sequence data,” BMC Proc., 5 (Suppl 9), S46.

Luo, L., E. Boerwinkle and M. Xiong (2011): “Association studies for next-generation sequencing,” Genome Res., 21, 1099–1108.

Madsen, B. E. and S. R. Browning (2009): “A groupwise association test for rare mutations using a weighted sum statistic,” PLoS Genet., 5, e1000384.

Montgomery, D. C., E. A. Peck and G. G. Vining (2006): Introduction to linear regression analysis, Hoboken, NJ, USA: Wiley-Interscience, John Wiley & Sons, fourth edition.

Nadarajah, S. and S. Kotz (2008): “Exact distribution of the Max/Min of two gaussian random variables,” IEEE Trans. VLSI Syst., 16, 2.

Need, A. C., V. Shashi, Y. Hitomi, K. Schoch, K. V. Shianna, M. T. McDonald, M. H. Meisler and D. B. Goldstein (2012): “Clinical application of exome sequencing in undiagnosed genetic conditions,” J. Med. Genet., 49, 353–361.

Ng, S. B., A. W. Bigham, K. J. Buckingham, M. C. Hannibal, M. J. McMillin, H. I. Gildersleeve, A. E. Beck, H. K. Tabor, G. M. Cooper, H. C. Mefford, C. Lee, E. H. Turner, J. D. Smith, M. J. Rieder, K. Yoshiura, N. Matsumoto, T. Ohta, N. Niikawa, D. A. Nickerson, M. J. Bamshad and J. Shendure (2010): “Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome,” Nat. Genet., 42, 790–793.

Price, A. L., G. V. Kryukov, P. I. de Bakker, S. M. Purcell, J. Staples, L. J. Wei and S. R. Sunyaev (2010): “Pooled association tests for rare variants in exon-resequencing studies,” Am. J. Hum. Genet., 86, 832–838.

Renkema, K. Y., M. F. Stokman, R. H. Giles and N. V. A. M. Knoers (2014): “Next-generation sequencing for research and diagnostics in kidney disease,” Nat. Rev. Nephrol., 10, 433–444.

Senn, S. (2008): “A note concerning a selection ‘Paradox’ of Dawid’s,” Am. Statist., 62, 206–210.

Singh, D., P. G. Febbo, K. Ross, D. G. Jackson, J. Manola, C. Ladd, P. Tamayo, A. A. Renshaw, A. V. D’Amico, J. P. Richie, E. S. Lander, M. Loda, P. W. Kantoff, T. R. Golub and W. R. Sellers (2002): “Gene expression correlates of clinical prostate cancer behavior,” Cancer Cell, 1, 203–209. doi:10.1016/S1535-6108(02)00030-2

Stahl, E. A., D. Wegmann, G. Trynka, J. Gutierrez-Achury, R. Do, B. F. Voight, P. Kraft, R. Chen, H. J. Kallberg, F. A. Kurreeman and others (2012): “Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis,” Nat. Genet., 44, 483–489.

Stein, C. M. (1981): “Estimation of the mean of a multivariate normal distribution,” Ann. Stat., 9, 1135–1151.

Tibshirani, R. (1996): “Regression shrinkage and selection via the Lasso,” J. Roy. Stat. Soc. B, 58, 267–288.

Tibshirani, R., T. Hastie, B. Narasimhan and G. Chu (2002): “Diagnosis of multiple cancer types by shrunken centroids of gene expression,” Proc. Natl. Acad. Sci. USA, 99, 6567–6572. doi:10.1073/pnas.082099299

Wray, N. R., M. E. Goddard and P. M. Visscher (2007): “Prediction of individual genetic risk to disease from genome-wide association studies,” Genome Res., 17, 1520–1528.

Wu, M. C., S. Lee, T. Cai, Y. Li, M. Boehnke and X. Lin (2011): “Rare-variant association testing for sequencing data with the sequence kernel association test,” Am. J. Hum. Genet., 89, 82–93.

Yang, Y., D. M. Muzny, J. G. Reid, M. N. Bainbridge, A. Willis, P. A. Ward, A. Braxton, J. Beuten, F. Xia, Z. Niu, M. Hardison, R. Person, M. R. Bekheirnia, M. S. Leduc, A. Kirby, P. Pham, J. Scull, M. Wang, Y. Ding, S. E. Plon, J. R. Lupski, A. L. Beaudet, R. A. Gibbs and C. M. Eng (2013): “Clinical whole-exome sequencing for the diagnosis of mendelian disorders,” N. Engl. J. Med., 369, 1502–1511.

Zhang, H. (2008): “Bias-reduced estimators and confidence intervals for odds ratios in genome-wide association studies,” Biostatistics, 9, 621–634. doi:10.1093/biostatistics/kxn001

Published Online: 2015-12-7

Published in Print: 2015-12-1
