Published by De Gruyter December 7, 2015

An Empirical Bayes risk prediction model using multiple traits for sequencing data

  • Gengxin Li, Yuehua Cui and Hongyu Zhao

Abstract

Rapidly developing sequencing technologies have improved disease risk prediction by identifying many novel genes. Many prediction methods have been proposed that use rich genomic information to predict binary disease outcomes. These methods can plausibly be improved further by making efficient use of measured quantitative traits that are correlated with the binary outcomes. In this article, we propose a novel Empirical Bayes prediction model that uses information from both quantitative traits and binary disease status to improve risk prediction. Our method is built on a new statistic that better infers the effect of a gene on multiple traits and has good theoretical properties. We then extend the method to sequencing data by combining information from multiple rare variants within each gene to strengthen the signals of causal genetic effects. In simulation studies, our proposed Empirical Bayes approach outperforms the existing methods we compared in both feature selection and risk prediction. We further evaluate the effectiveness of the proposed method by applying it to the sequencing data provided by the Genetic Analysis Workshop 18.


Corresponding author: Gengxin Li, Department of Mathematics and Statistics, Wright State University, 3640 Colonel Glenn Hwy, Dayton, OH 45435, USA, e-mail:

Acknowledgments

This work was partially supported by grants from National Institutes of Health (GM59507 to HZ) and National Science Foundation (DMS-1209112 to YC). The Genetic Analysis Workshops are supported by NIH grant R01 GM031575. The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. Andrew R. Wood is supported by European Research Council grant SZ-245 50371-GLUCOSEGENES-FP7-IDEAS-ERC.

Conflict of interest: None declared.

Appendix

A: The best linear unbiased predictor

  1. q is a standardized variable, with E(q) = 0 and Var(q) = 1.

  2. The errors εi have mean zero and variance ψi; typically ψi = 1.

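  3. Under the simple linear regression model (Equation 5), Ci = βi,v q + εi, so

    $$E(C_i) = E(\beta_{i,v} q + \varepsilon_i) = 0, \qquad \mathrm{Var}(C_i) = \mathrm{Var}(\beta_{i,v} q + \varepsilon_i) = \beta_{i,v}^2 + \psi_i.$$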

    Stacking the Ci into the vector C gives

    $$E(C) = 0, \qquad \mathrm{Var}(C) = \beta_v\beta_v^\top + \Psi,$$

    where Ψ is the variance-covariance matrix of ε.

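  4. The joint distribution of (q, C) is

    $$\begin{pmatrix} q \\ C \end{pmatrix} \sim \left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \beta_v^\top \\ \beta_v & \beta_v\beta_v^\top + \Psi \end{pmatrix}\right]$$

    because q and ε are uncorrelated, so that

    $$\mathrm{Cov}(q, C) = \mathrm{Cov}(q, \beta_v q + \varepsilon) = \mathrm{Cov}(q, \beta_v q) + \mathrm{Cov}(q, \varepsilon) = \beta_v\,\mathrm{Cov}(q, q) + 0 = \beta_v, \qquad \mathrm{Cor}(q, C) = \frac{\beta_v}{1 \times (\beta_v\beta_v^\top + \Psi)^{1/2}}.$$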

  5. The best linear predictor is given by

    $$g(X) = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}(X - \mu_x)$$

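The predicted value of q based on g(X) is denoted q#, where q and C play the roles of y and X in the above equation. From items 1, 3 and 4,

$$E(q) = 0 \Rightarrow \mu_y = 0, \qquad E(C) = 0 \Rightarrow \mu_x = 0, \qquad \mathrm{Var}(q) = 1 \Rightarrow \sigma_y = 1,$$
$$\mathrm{Var}(C) = \beta_v\beta_v^\top + \Psi \Rightarrow \sigma_x = (\beta_v\beta_v^\top + \Psi)^{1/2}, \qquad \mathrm{Cor}(q, C) = \frac{\beta_v}{1 \times (\beta_v\beta_v^\top + \Psi)^{1/2}} \Rightarrow \rho = \frac{\beta_v}{(\beta_v\beta_v^\top + \Psi)^{1/2}}.$$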

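Finally, the best linear predictor becomes

$$q^{\#} = \rho\,\frac{\sigma_y}{\sigma_x}\,C = \beta_v^\top(\beta_v\beta_v^\top + \Psi)^{-1}C = \frac{1}{1 + \lambda_v^2}\,\beta_v^\top\Psi^{-1}C,$$

where Ψ is the variance-covariance matrix of ε and λv² = βvᵀΨ⁻¹βv is the squared Mahalanobis distance of βv from the origin. The last equality follows from the Sherman–Morrison formula for (Ψ + βvβvᵀ)⁻¹.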
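A quick numerical check of the last equality makes the shrinkage factor 1/(1 + λv²) concrete. The sketch below is illustrative only: it assumes Python with NumPy and uses arbitrary, randomly generated values for βv, Ψ and one observed trait vector C (not GAW data); `beta_v`, `Psi`, `weights_direct` and `weights_shrunk` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4                                    # hypothetical number of traits
beta_v = rng.normal(size=m)              # hypothetical effect vector beta_v
A = rng.normal(size=(m, m))
Psi = A @ A.T + m * np.eye(m)            # hypothetical positive-definite error covariance

# Weights from beta_v^T (beta_v beta_v^T + Psi)^{-1}. The matrix is symmetric,
# so solving M w = beta_v gives the same vector as beta_v^T M^{-1}.
weights_direct = np.linalg.solve(np.outer(beta_v, beta_v) + Psi, beta_v)

# Weights from beta_v^T Psi^{-1} / (1 + lambda_v^2), the Sherman-Morrison form
psi_inv_beta = np.linalg.solve(Psi, beta_v)
lambda_sq = float(beta_v @ psi_inv_beta)  # squared Mahalanobis distance of beta_v from 0
weights_shrunk = psi_inv_beta / (1.0 + lambda_sq)

C = rng.normal(size=m)                   # one hypothetical observed trait vector
q_pred = float(weights_shrunk @ C)       # predicted latent value q#
print(np.allclose(weights_direct, weights_shrunk), lambda_sq, q_pred)
```

Solving linear systems instead of forming explicit inverses is a common numerical choice; both routes give the same predictor q#.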

B: The t statistic

If εi follows a normal distribution, the t-test statistic for the null hypothesis βi,v=0 under model (5) follows a non-central t distribution,

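$$T_{i,v} = \frac{\hat\beta_{i,v} - 0}{\mathrm{se}(\hat\beta_{i,v})} = 2c_0\,\hat\beta_{i,v} \sim t_{n-2}(\gamma_{i,v}), \qquad \text{where } c_0^2 = \frac{\sum_{j=1}^{n}(q_j - \bar q)^2}{4}.$$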

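Proof: Given q, the ordinary least squares estimate of βi,v is β̂i,v | q = (qᵀq)⁻¹qᵀCi (Montgomery et al., 2006). Taking Var(Ci | q) = ψi = 1,

$$\mathrm{Var}(\hat\beta_{i,v} \mid q) = (q^\top q)^{-1} q^\top q\,(q^\top q)^{-1}\,\mathrm{Var}(C_i \mid q) = (q^\top q)^{-1}, \qquad \mathrm{se}(\hat\beta_{i,v} \mid q) = (q^\top q)^{-1/2}.$$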

Since E(q) = 0 (q is centered),

$$q^\top q = \sum_{j=1}^{n}(q_j - \bar q)^2.$$

If εi is normally distributed, the t-test statistic follows,

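$$T_{i,v} = \frac{\hat\beta_{i,v} - 0}{\mathrm{se}(\hat\beta_{i,v})} = (q^\top q)^{1/2}\,\hat\beta_{i,v} = 2c_0\,\hat\beta_{i,v} \sim t_{n-2}(\gamma_{i,v})$$

where $c_0^2 = \frac{1}{4}\sum_{j=1}^{n}(q_j - \bar q)^2$, so that $2c_0 = \left(\sum_{j=1}^{n}(q_j - \bar q)^2\right)^{1/2} = (q^\top q)^{1/2}$.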
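The identity 2c0 = (qᵀq)^{1/2} can be illustrated with simulated data. This sketch assumes a randomly drawn, standardized q, a hypothetical effect size of 0.3, and a known error variance ψi = 1 as in Appendix A; it computes Ti,v both as β̂i,v / se(β̂i,v) and as 2c0 β̂i,v.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                                   # hypothetical sample size
q = rng.normal(size=n)
q = (q - q.mean()) / q.std()              # centred and scaled: sample mean 0, variance 1
beta_true = 0.3                           # hypothetical effect size beta_{i,v}
C_i = beta_true * q + rng.normal(size=n)  # model (5) with error variance psi_i = 1

qtq = float(q @ q)                        # q'q = sum_j (q_j - qbar)^2 because q is centred
beta_hat = float(q @ C_i) / qtq           # OLS estimate (q'q)^{-1} q' C_i
se = qtq ** -0.5                          # se(beta_hat | q) with known psi_i = 1

c0 = np.sqrt(np.sum((q - q.mean()) ** 2) / 4.0)
T_from_se = beta_hat / se
T_from_c0 = 2.0 * c0 * beta_hat
mean_T = beta_true * np.sqrt(qtq)         # E(T | q) implied by the true effect
print(T_from_se, T_from_c0, mean_T)
```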

C: Proof of Proposition 1

The proof follows results in the literature (Ker, 2001; Nadarajah and Kotz, 2008). To simplify notation, we write Z1 and Z2 for Zi,d and Zi,v, respectively. Assume that (Z1, Z2) follows a bivariate normal distribution, $(Z_1, Z_2)^\top \sim \mathrm{MVN}(\gamma, \Sigma)$ with $\gamma = (\gamma_1, \gamma_2)^\top$ and $\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$, where ρ denotes the correlation between Z1 and Z2. Let Z = min(Z1, Z2) and I = r if Z = Zr (r = 1, 2).

Lemma 1: Let ϕ(·) and Φ(·) denote the standard normal probability density function (pdf) and cumulative distribution function (cdf), respectively. Then, for all z and r ∈ {1, 2}, the conditional pdf of Z given I = r is

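$$f_r(z) = \frac{\partial}{\partial z}\Pr(Z \le z \mid I = r) = \begin{cases} p_r^{-1}\,\dfrac{1}{\sigma_r}\,\phi\!\left(\dfrac{z - \gamma_r}{\sigma_r}\right)\left[1 - \Phi\!\left(\dfrac{z - \tilde\gamma_r}{\tilde\sigma_r}\right)\right] & \text{if } \rho \ne 1, \\ \dfrac{1}{\sigma_r}\,\phi\!\left(\dfrac{z - \gamma_r}{\sigma_r}\right) & \text{otherwise,} \end{cases}$$

where $\tilde\gamma_r = \alpha_r\gamma_{3-r} + (1 - \alpha_r)\gamma_r$, $\tilde\sigma_r = \alpha_r\sigma_{3-r}\sqrt{1 - \rho^2}$, and $\alpha_r = 1/(1 - \rho\,\sigma_{3-r}/\sigma_r)$. The first case applies when ρ ≠ 1; if ρ = 1, Z = Z1 = Z2.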

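Moreover, $p_r = \Pr(I = r) = \Phi\!\left(\dfrac{\gamma_{3-r} - \gamma_r}{\sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}}\right)$.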

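Proof: Let ℵ be the distribution of the pair (Z, I); then

$$\aleph(z, 1) = \Pr(Z \le z, I = 1) = \Pr(Z_1 \le z,\ Z_2 > Z_1) = \int_{-\infty}^{z}\int_{z_1}^{\infty}\frac{1}{\sigma_2\sqrt{1-\rho^2}}\,\phi\!\left(\frac{(z_2-\gamma_2)-\rho\frac{\sigma_2}{\sigma_1}(z_1-\gamma_1)}{\sigma_2\sqrt{1-\rho^2}}\right)\frac{1}{\sigma_1}\,\phi\!\left(\frac{z_1-\gamma_1}{\sigma_1}\right)dz_2\,dz_1 = \int_{-\infty}^{z}\left[1-\Phi\!\left(\frac{(t-\gamma_2)-\rho\frac{\sigma_2}{\sigma_1}(t-\gamma_1)}{\sigma_2\sqrt{1-\rho^2}}\right)\right]\frac{1}{\sigma_1}\,\phi\!\left(\frac{t-\gamma_1}{\sigma_1}\right)dt$$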

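Hence, for all z,

$$\frac{\partial}{\partial z}\aleph(z, 1) = \frac{\partial}{\partial z}\Pr(Z \le z \mid I = 1)\,\Pr(I = 1) = p_1 f_1(z) = \frac{1}{\sigma_1}\phi\!\left(\frac{z-\gamma_1}{\sigma_1}\right)\left[1-\Phi\!\left(\frac{(z-\gamma_2)-\rho\frac{\sigma_2}{\sigma_1}(z-\gamma_1)}{\sigma_2\sqrt{1-\rho^2}}\right)\right] = \frac{1}{\sigma_1}\phi\!\left(\frac{z-\gamma_1}{\sigma_1}\right)\Phi\!\left(\frac{\rho(z-\gamma_1)}{\sigma_1\sqrt{1-\rho^2}}-\frac{z-\gamma_2}{\sigma_2\sqrt{1-\rho^2}}\right)$$

where Φ(−a) = 1 − Φ(a).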

The expression for p1 follows from

$$p_1 = \Pr(Z_1 - Z_2 < 0) = \int_{-\infty}^{(\gamma_2 - \gamma_1)/\sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}}\phi(z)\,dz.$$
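The closed form for p1 can be sanity-checked by simulation. The sketch below assumes hypothetical parameter values (γ1, γ2, σ1, σ2, ρ) = (0.5, 1.0, 1.0, 1.2, 0.4), computes p1 = Φ((γ2 − γ1)/√(σ1² + σ2² − 2ρσ1σ2)), and compares it with the fraction of bivariate normal draws in which Z1 < Z2.

```python
import math
import numpy as np

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p1_closed_form(g1, g2, s1, s2, rho):
    """p1 = Pr(Z1 - Z2 < 0) = Phi((gamma2 - gamma1) / sd(Z1 - Z2))."""
    sd = math.sqrt(s1**2 + s2**2 - 2.0 * rho * s1 * s2)
    return Phi((g2 - g1) / sd)

g1, g2, s1, s2, rho = 0.5, 1.0, 1.0, 1.2, 0.4      # hypothetical parameters
p1 = p1_closed_form(g1, g2, s1, s2, rho)

# Monte Carlo check: draw (Z1, Z2) from the bivariate normal and count Z1 < Z2
rng = np.random.default_rng(2)
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
Z = rng.multivariate_normal([g1, g2], cov, size=200_000)
p1_mc = float(np.mean(Z[:, 0] < Z[:, 1]))
print(p1, p1_mc)
```

Swapping the roles of the two variables gives p2 = 1 − p1, as the general formula for pr implies.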

Proposition 1: The distribution of the pair (Z, I) uniquely determines the distribution of the pair (Z1, Z2).

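Proof: Suppose (Z, I) has distribution ℵ, and let (γ′1, γ′2), σ′r (r = 1, 2) and ρ′ denote the parameters of any bivariate normal distribution of (Z1, Z2) that induces the same distribution ℵ of (Z, I). By the results above,

$$\lim_{z \to -\infty}\frac{p_r f_r(z)}{\frac{1}{\sigma_r}\phi\!\left(\frac{z-\gamma_r}{\sigma_r}\right)} = \lim_{z \to -\infty}\frac{p_r f_r(z)}{\frac{1}{\sigma'_r}\phi\!\left(\frac{z-\gamma'_r}{\sigma'_r}\right)} = 1.$$

Hence $\dfrac{\sigma'_r\,\phi\left((z-\gamma_r)/\sigma_r\right)}{\sigma_r\,\phi\left((z-\gamma'_r)/\sigma'_r\right)} \to 1$, which implies γr = γ′r, σr = σ′r and ρ = ρ′.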

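Similarly, for I = 2, the pdf of (Z, I) is

$$\frac{\partial}{\partial z}\aleph(z, 2) = \frac{\partial}{\partial z}\Pr(Z \le z, I = 2) = p_2 f_2(z) = \frac{1}{\sigma_2}\phi\!\left(\frac{z-\gamma_2}{\sigma_2}\right)\Phi\!\left(\frac{\rho(z-\gamma_2)}{\sigma_2\sqrt{1-\rho^2}}-\frac{z-\gamma_1}{\sigma_1\sqrt{1-\rho^2}}\right)$$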

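Finally, summing over I = 1 and 2, the probability density function of Z = min(Z1, Z2) is

$$p_1 f_1(z) + p_2 f_2(z) = \frac{1}{\sigma_1}\phi\!\left(\frac{z-\gamma_1}{\sigma_1}\right)\Phi\!\left(\frac{\rho(z-\gamma_1)}{\sigma_1\sqrt{1-\rho^2}}-\frac{z-\gamma_2}{\sigma_2\sqrt{1-\rho^2}}\right) + \frac{1}{\sigma_2}\phi\!\left(\frac{z-\gamma_2}{\sigma_2}\right)\Phi\!\left(\frac{\rho(z-\gamma_2)}{\sigma_2\sqrt{1-\rho^2}}-\frac{z-\gamma_1}{\sigma_1\sqrt{1-\rho^2}}\right)$$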

Replacing Z1 and Z2 by Zi,d and Zi,v, γ1 and γ2 by γi,d and γi,v, and Z by Zi,m yields the pdf of Zi,m = min(Zi,d, Zi,v).
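The density of the minimum can also be checked numerically. In this sketch (hypothetical parameter values; |ρ| < 1 assumed), `min_pdf_terms` returns the two summands p1 f1(z) and p2 f2(z) derived above; integrating their sum over a fine grid should give approximately 1, and integrating the first summand alone should recover p1. A hand-written trapezoidal rule is used so the sketch does not depend on a particular NumPy version.

```python
import math
import numpy as np

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def min_pdf_terms(z, g1, g2, s1, s2, rho):
    """Return (p1*f1(z), p2*f2(z)) for Z = min(Z1, Z2); assumes |rho| < 1."""
    k = math.sqrt(1.0 - rho**2)
    t1 = phi((z - g1) / s1) / s1 * Phi((rho * (z - g1) / s1 - (z - g2) / s2) / k)
    t2 = phi((z - g2) / s2) / s2 * Phi((rho * (z - g2) / s2 - (z - g1) / s1) / k)
    return t1, t2

def trapezoid(y, x):
    """Plain trapezoidal rule on a grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

g1, g2, s1, s2, rho = 0.5, 1.0, 1.0, 1.2, 0.4      # hypothetical parameters
zs = np.linspace(-12.0, 12.0, 24001)
terms = np.array([min_pdf_terms(z, g1, g2, s1, s2, rho) for z in zs])  # shape (len(zs), 2)

total_mass = trapezoid(terms.sum(axis=1), zs)      # should be close to 1
mass_I1 = trapezoid(terms[:, 0], zs)               # should be close to p1 = Pr(I = 1)
p1_closed = Phi((g2 - g1) / math.sqrt(s1**2 + s2**2 - 2.0 * rho * s1 * s2))
print(total_mass, mass_I1, p1_closed)
```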

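In the same way, we can derive the pdf of Z = max(Z3, Z4). Assume that (Z3, Z4) follows a bivariate normal distribution, $(Z_3, Z_4)^\top \sim \mathrm{MVN}(\gamma, \Sigma)$ with $\gamma = (\gamma_3, \gamma_4)^\top$ and $\Sigma = \begin{pmatrix} \sigma_3^2 & \rho\sigma_3\sigma_4 \\ \rho\sigma_3\sigma_4 & \sigma_4^2 \end{pmatrix}$, where ρ denotes the correlation between Z3 and Z4. Let Z = max(Z3, Z4) and I = r if Z = Zr (r = 3, 4).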

Lemma 2: Let ϕ(·) and Φ(·) denote the standard normal pdf and cdf, respectively. Then, for all z and r ∈ {3, 4}, the conditional pdf of Z given I = r is

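$$f_r(z) = \frac{\partial}{\partial z}\Pr(Z \le z \mid I = r) = \begin{cases} p_r^{-1}\,\dfrac{1}{\sigma_r}\,\phi\!\left(\dfrac{\gamma_r - z}{\sigma_r}\right)\Phi\!\left(\dfrac{z - \tilde\gamma_r}{\tilde\sigma_r}\right) & \text{if } \rho \ne 1, \\ \dfrac{1}{\sigma_r}\,\phi\!\left(\dfrac{\gamma_r - z}{\sigma_r}\right) & \text{otherwise,} \end{cases}$$

where $\tilde\gamma_r = \alpha_r\gamma_{7-r} + (1 - \alpha_r)\gamma_r$, $\tilde\sigma_r = \alpha_r\sigma_{7-r}\sqrt{1 - \rho^2}$, and $\alpha_r = 1/(1 - \rho\,\sigma_{7-r}/\sigma_r)$.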

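Proof: Let ℵ be the distribution of the pair (Z, I); then

$$\aleph(z, 3) = \Pr(Z \le z, I = 3) = \Pr(Z_4 < Z_3 \le z) = \int_{-\infty}^{z}\int_{-\infty}^{z_3}\frac{1}{\sigma_4\sqrt{1-\rho^2}}\,\phi\!\left(\frac{(z_4-\gamma_4)-\rho\frac{\sigma_4}{\sigma_3}(z_3-\gamma_3)}{\sigma_4\sqrt{1-\rho^2}}\right)\frac{1}{\sigma_3}\,\phi\!\left(\frac{z_3-\gamma_3}{\sigma_3}\right)dz_4\,dz_3 = \int_{-\infty}^{z}\Phi\!\left(\frac{\rho\frac{\sigma_4}{\sigma_3}(\gamma_3-t)-(\gamma_4-t)}{\sigma_4\sqrt{1-\rho^2}}\right)\frac{1}{\sigma_3}\,\phi\!\left(\frac{t-\gamma_3}{\sigma_3}\right)dt$$

Then the pdf of (Z, 3) is given by

$$p_3 f_3(z) = \frac{1}{\sigma_3}\phi\!\left(\frac{\gamma_3-z}{\sigma_3}\right)\Phi\!\left(\frac{\rho(\gamma_3-z)}{\sigma_3\sqrt{1-\rho^2}}-\frac{\gamma_4-z}{\sigma_4\sqrt{1-\rho^2}}\right)$$

where ϕ(a) = ϕ(−a).

Similarly, the pdf of (Z, 4) is given by

$$p_4 f_4(z) = \frac{1}{\sigma_4}\phi\!\left(\frac{\gamma_4-z}{\sigma_4}\right)\Phi\!\left(\frac{\rho(\gamma_4-z)}{\sigma_4\sqrt{1-\rho^2}}-\frac{\gamma_3-z}{\sigma_3\sqrt{1-\rho^2}}\right)$$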

The expression for p3 follows from

$$p_3 = \Pr(Z_3 - Z_4 > 0) = \int_{(\gamma_4 - \gamma_3)/\sqrt{\sigma_3^2 + \sigma_4^2 - 2\rho\sigma_3\sigma_4}}^{\infty}\phi(z)\,dz.$$

Proposition 2: The distribution of the pair (Z, I) uniquely determines the distribution of the pair (Z3, Z4).

Proof: Analogous to the proof of Proposition 1.

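Finally, the pdf of Z = max(Z3, Z4) is given by

$$p_3 f_3(z) + p_4 f_4(z) = \frac{1}{\sigma_3}\phi\!\left(\frac{\gamma_3-z}{\sigma_3}\right)\Phi\!\left(\frac{\rho(\gamma_3-z)}{\sigma_3\sqrt{1-\rho^2}}-\frac{\gamma_4-z}{\sigma_4\sqrt{1-\rho^2}}\right) + \frac{1}{\sigma_4}\phi\!\left(\frac{\gamma_4-z}{\sigma_4}\right)\Phi\!\left(\frac{\rho(\gamma_4-z)}{\sigma_4\sqrt{1-\rho^2}}-\frac{\gamma_3-z}{\sigma_3\sqrt{1-\rho^2}}\right)$$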

Again, replacing Z3 and Z4 by Zi,v and Zi,d, γ3 and γ4 by γi,v and γi,d, and Z by Zi,m yields the pdf of Zi,m = max(Zi,d, Zi,v).
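The same numerical check applies to the maximum. The sketch assumes hypothetical values (γ3, γ4, σ3, σ4, ρ) = (1.0, 0.2, 1.5, 0.8, −0.3) and |ρ| < 1; `max_pdf_terms` returns p3 f3(z) and p4 f4(z) as written above, their sum should integrate to about 1, and the first term alone should integrate to p3 = Φ((γ3 − γ4)/√(σ3² + σ4² − 2ρσ3σ4)).

```python
import math
import numpy as np

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_pdf_terms(z, g3, g4, s3, s4, rho):
    """Return (p3*f3(z), p4*f4(z)) for Z = max(Z3, Z4); assumes |rho| < 1."""
    k = math.sqrt(1.0 - rho**2)
    t3 = phi((g3 - z) / s3) / s3 * Phi((rho * (g3 - z) / s3 - (g4 - z) / s4) / k)
    t4 = phi((g4 - z) / s4) / s4 * Phi((rho * (g4 - z) / s4 - (g3 - z) / s3) / k)
    return t3, t4

def trapezoid(y, x):
    """Plain trapezoidal rule on a grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

g3, g4, s3, s4, rho = 1.0, 0.2, 1.5, 0.8, -0.3     # hypothetical parameters
zs = np.linspace(-15.0, 15.0, 30001)
terms = np.array([max_pdf_terms(z, g3, g4, s3, s4, rho) for z in zs])

total_mass = trapezoid(terms.sum(axis=1), zs)      # should be close to 1
mass_I3 = trapezoid(terms[:, 0], zs)               # should be close to p3 = Pr(I = 3)
p3_closed = Phi((g3 - g4) / math.sqrt(s3**2 + s4**2 - 2.0 * rho * s3 * s4))
print(total_mass, mass_I3, p3_closed)
```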

D: Effects of multiple quantitative traits

GAW 17 provides three quantitative traits: Q1, Q2 and Q4. The residual heritability of Q1 is 0.44, and the covariates age and smoking status act on Q1. Q2 has a heritability of 0.29 and is not affected by covariates. Q4 is simulated with several protective genes and has a heritability of 0.70. In the disease risk simulation, Q1 and Q2 are positively correlated with the disease trait, whereas Q4 is negatively correlated with disease status (Appendix Table A1). When the three quantitative traits are added to the disease risk EB classifier one at a time (Appendix Table A2), adding Q1 detects five true genes, adding Q2 detects four, and adding Q4 detects only two. All three quantitative traits detect the FLT1 gene, but Q1 gives it the highest ranking. Appendix Table A3 further evaluates how well each quantitative trait improves disease risk prediction. Adding Q1 to the EB model gives the smallest cross-validation error (0.26, SD 0.0003) and the largest AUC (0.91, SD 0.022), followed by Q2 and then Q4. These results agree with the simulation setting for each quantitative trait.

Table A1

Characteristics of three quantitative traits (Q1, Q2 and Q4).

Trait   Heritability   Covariates   Disease trait
Q1      0.44           Related      +
Q2      0.29           Unrelated    +
Q4      0.70           Related      –

Covariates: age, sex and smoking status; Disease trait: a binary trait; +: positive correlation; –: negative correlation; Related/Unrelated: whether the quantitative trait is influenced by the covariates.

Table A2

The ranking position of true genes for three classifiers (D+Q1, D+Q2 and D+Q4).

PredictorD+Q1D+Q2D+Q4
Rank positionRank positionRank position
1FLT1144
2KDR2
3PTK2B16
4PIK3C2B4726
5SHC1208
6FLT4182
7VEGFA73
8AKT3123

The ranking positions of the eight true genes FLT1, KDR, PTK2B, PIK3C2B, SHC1, FLT4, VEGFA and AKT3 under three models (D+Q1, D+Q2 and D+Q4). D+Q1: Q1 added to the Empirical Bayes model; D+Q2: Q2 added to the Empirical Bayes model; D+Q4: Q4 added to the Empirical Bayes model; “–”: the gene is not among the top 210 ranked genes.

Table A3

Cross validation error and AUC for three classifiers (D+Q1, D+Q2 and D+Q4).

Item   Scenario    Statistic   D+Q1     D+Q2     D+Q4
CV     Gene+Envi   Mean        0.26     0.27     0.31
                   SD          0.0003   0.0002   0.0024
AUC    Gene+Envi   Mean        0.91     0.88     0.85
                   SD          0.022    0.020    0.021

CV indicates the cross validation error; AUC is the area under the receiver operating characteristic (ROC) curve when minimizing CV; SD is the standard deviation of cross validation error and AUC; D+Q1: adding Q1 in the Empirical Bayes model; D+Q2: adding Q2 in the Empirical Bayes model; D+Q4: adding Q4 in the Empirical Bayes model.

E: Multiple replicates

The GAW 17 sequence data contain R replicates of the simulated phenotypes, each of which yields its own Z statistic. Under the Brown and Stein Bayesian model, each Zi,a (a = 1, …, R) follows a normal distribution N(γi, σi²), so the mean of the replicate statistics, Z̄i, still has expectation γi, that is, $E(\bar Z_i) = E\left(\frac{Z_{i,1} + \cdots + Z_{i,R}}{R}\right) = \gamma_i$. However, the variance of this new estimator is no longer one and depends on the correlations among the replicate Z statistics, which violates the unit-variance assumption made before computing the EB estimates. To express the variance of Z̄i, we assume a common correlation r between any two replicate statistics Zi,a and Zi,b (a, b = 1, …, R, a ≠ b), so that Cor(Zi,a, Zi,b) = r.

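$$\mathrm{Var}(\bar Z_i) = \mathrm{Var}\left(\frac{Z_{i,1} + \cdots + Z_{i,R}}{R}\right) = \frac{\mathrm{Var}(Z_{i,1}) + \cdots + \mathrm{Var}(Z_{i,R}) + R(R-1)\,r\,\sigma_i^2}{R^2} = \frac{1 + (R-1)r}{R}\,\sigma_i^2$$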

The variance of Z̄i therefore depends on the correlation r, which can be estimated empirically under the model

$$Z_{i,a} \sim N(\gamma_i, \sigma_i^2), \qquad \bar Z_i \sim N\!\left(\gamma_i, \frac{1 + (R-1)r}{R}\,\sigma_i^2\right), \qquad \mathrm{Cor}(Z_{i,a}, Z_{i,b}) = r \ \text{ for all } a \ne b \text{ and all } i,$$

where a and b index replicates and i indexes the ith feature. Because σi² is close to 1, we set σi² = 1 to simplify the calculation without materially affecting the accuracy of the estimator.
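The variance formula can be confirmed by writing out the equicorrelated covariance matrix of the R replicate statistics, Σ = σi²[(1 − r)I + r11ᵀ], and computing 1ᵀΣ1/R². The sketch below assumes that equicorrelation structure and hypothetical values of R and r; the function names are illustrative.

```python
import numpy as np

def var_mean_closed_form(R, r, sigma2=1.0):
    """Var(Zbar_i) = (1 + (R - 1) r) / R * sigma_i^2."""
    return (1.0 + (R - 1) * r) / R * sigma2

def var_mean_from_matrix(R, r, sigma2=1.0):
    """Same quantity computed as 1' Sigma 1 / R^2 with an equicorrelated Sigma."""
    Sigma = sigma2 * ((1.0 - r) * np.eye(R) + r * np.ones((R, R)))
    ones = np.ones(R)
    return float(ones @ Sigma @ ones) / R**2

R, r = 10, 0.3                       # hypothetical number of replicates and correlation
print(var_mean_closed_form(R, r), var_mean_from_matrix(R, r))
```

With r = 0 the formula reduces to σi²/R (independent replicates), and with r = 1 it stays at σi², so averaging perfectly correlated replicates gives no variance reduction.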

The new statistic $Z_i^* = \bar Z_i \big/ \sqrt{\frac{1 + (R-1)r}{R}}$ has unit variance and follows a normal distribution with mean $\gamma_i^* = \gamma_i \big/ \sqrt{\frac{1 + (R-1)r}{R}}$. The Bayesian estimate of γi* is E(γi* | Zi*), obtained by shrinking Zi*. The Empirical Bayes estimate of the true γi is then $E(\gamma_i \mid Z_i^*) = E(\gamma_i^* \mid Z_i^*)\sqrt{\frac{1 + (R-1)r}{R}}$, because $E(\gamma_i^* \mid Z_i^*) = E\!\left(\gamma_i \big/ \sqrt{\tfrac{1 + (R-1)r}{R}} \,\middle|\, Z_i^*\right) = E(\gamma_i \mid Z_i^*) \big/ \sqrt{\tfrac{1 + (R-1)r}{R}}$. We then incorporate the estimates E(γi | Zi*) into the prediction model. In particular, r is estimated by

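$$\frac{1}{N}\sum_{i=1}^{N}\frac{\sum_{a=1}^{R}(Z_{i,a} - \bar Z_i)^2}{R - 1} = 1 - r \quad\Longrightarrow\quad \hat r = 1 - \frac{1}{N}\sum_{i=1}^{N} s_{Z_i}^2$$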

where s²Zi is the sample variance of the replicate statistics {Zi,1, …, Zi,R} for the ith feature, i = 1, …, N.
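The estimator r̂ and the rescaled statistic Zi* can be sketched on simulated replicates. The simulation design here is an assumption for illustration only: each replicate Zi,a is built from the true γi, a shared component with variance r, and replicate-specific noise with variance 1 − r, so that σi² = 1 and Cor(Zi,a, Zi,b) = r. `estimate_r` and `rescale` are hypothetical helper names, and the EB shrinkage step applied to Zi* is not shown.

```python
import numpy as np

def estimate_r(Z):
    """r_hat = 1 - mean over features of the across-replicate sample variance.

    Z has shape (N, R): row i holds Z_{i,1}, ..., Z_{i,R}.
    """
    s2 = Z.var(axis=1, ddof=1)
    return 1.0 - float(s2.mean())

def rescale(Z, r_hat):
    """Z*_i = Zbar_i / sqrt((1 + (R - 1) r_hat) / R), unit variance when sigma_i^2 = 1."""
    R = Z.shape[1]
    return Z.mean(axis=1) / np.sqrt((1.0 + (R - 1) * r_hat) / R)

# Hypothetical simulation: shared component (variance r) + replicate noise (variance 1 - r)
rng = np.random.default_rng(3)
N, R, r_true = 2000, 20, 0.4
gamma = rng.normal(size=N)                 # hypothetical true effects gamma_i
shared = rng.normal(size=(N, 1))
Z = gamma[:, None] + np.sqrt(r_true) * shared + np.sqrt(1.0 - r_true) * rng.normal(size=(N, R))

r_hat = estimate_r(Z)
Z_star = rescale(Z, r_hat)
print(round(r_hat, 3))
```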

References

Almasy, L. A., T. D. Dyer, J. M. Peralta, J. W. Kent, J. C. Charlesworth, J. E. Curran and J. Blangero (2011): “Genetic analysis workshop 17 mini-exome simulation,” BMC Proc., 5 (Suppl 9), S2. doi:10.1186/1753-6561-5-S9-S2

Almasy, L. A., T. D. Dyer, J. M. Peralta, G. Jun, A. R. Wood, C. Fuchsberger, M. A. Almeida, J. W. Kent, S. Fowler, T. W. Blackwell, S. Puppala, S. Kumar, J. E. Curran, D. Lehman, G. Abecasis, R. Duggirala, J. Blangero and The T2D-GENES Consortium (2014): “Data for genetic analysis workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees,” BMC Proc., 8 (Suppl 2), S2. doi:10.1186/1753-6561-8-S1-S2

Amasyali, M. F. and B. Diri (2006): “Automatic Turkish Text Categorization in terms of Author, genre and gender,” Proc. of the 11th International Conference on Applications of Natural Language to Information Systems, pp. 221–226.

Ball, M. P., J. V. Thakuria, A. W. Zaranek, T. Clegg, A. M. Rosenbaum, X. Wu, M. Angrist, J. Bhak, J. Bobe, M. J. Callow, C. Cano, M. F. Chou, W. K. Chung, S. M. Douglas, P. W. Estep, A. Gore, P. Hulick, A. Labarga, J. H. Lee, J. E. Lunshof, B. C. Kim, J. I. Kim, Z. Li, M. F. Murray, G. B. Nilsen, B. A. Peters, A. M. Raman, H. Y. Rienhoff, K. Robasky, M. T. Wheeler, W. Vandewege, D. B. Vorhaus, J. L. Yang, L. Yang, J. Aach, E. A. Ashley, R. Drmanac, S. J. Kim, J. B. Li, L. Peshkin, C. E. Seidman, J. S. Seo, K. Zhang, H. L. Rehm and G. M. Church (2012): “A public resource facilitating clinical use of genomes,” Proc. Natl. Acad. Sci. USA, 109, 11920–11927. doi:10.1073/pnas.1201904109

Bamshad, M. J., S. B. Ng, A. W. Bigham, H. K. Tabor, M. J. Emond, D. A. Nickerson and J. Shendure (2011): “Exome sequencing as a tool for Mendelian disease gene discovery,” Nat. Rev. Genet., 12, 745–755.

Breiman, L. (2001): “Random forests,” Mach. Learn., 45, 5–32.

Brown, L. D. (1971): “Admissible estimators, recurrent diffusions, and insoluble boundary value problems,” Ann. Math. Statist., 42, 855–903.

Cirulli, E. T. and D. B. Goldstein (2010): “Uncovering the roles of rare variants in common disease through whole genome sequencing,” Nat. Rev. Genet., 11, 415–425.

Dawid, A. P. (1994): Selection paradoxes of Bayesian inference, Multivariate analysis and its applications (Hong Kong, 1992). Hayward, CA: IMS, pp. 211–220.

Efron, B. (2009): “Empirical Bayes estimates for large-scale prediction problems,” J. Am. Stat. Assoc., 104, 1015–1028.

Goldstein, D. B., A. Allen, J. Keebler, E. H. Margulies, S. Petrou, S. Petrovski and S. Sunyaev (2013): “Sequencing studies in human genetics: design and interpretation,” Nat. Rev. Genet., 14, 460–470.

Hindorff, L. A., P. Sethupathy, H. A. Junkins, E. M. Ramos, J. P. Mehta, F. S. Collins and T. A. Manolio (2009): “Potential etiologic and functional implications of genome-wide association loci for human diseases and traits,” Proc. Natl. Acad. Sci. USA, 106, 9362–9367. doi:10.1073/pnas.0903103106

Hoerl, A. E. and R. Kennard (1970): “Ridge regression: biased estimation for nonorthogonal problems,” Technometrics, 12, 55–67. doi:10.1080/00401706.1970.10488634

International Schizophrenia Consortium, Purcell, S. M., N. R. Wray, J. L. Stone, P. M. Visscher, M. C. O’Donovan, P. F. Sullivan, P. Sklar, Collaborators (121) (2009): “Common polygenic variation contributes to risk of schizophrenia and bipolar disorder,” Nature, 460, 748–752. doi:10.1038/nature08185

Ker, A. P. (2001): “On the maximum of bivariate normal random variables,” Extremes, 4, 185–190. doi:10.1023/A:1013977210907

Klema, J. and A. Almonayyes (2006): “Automatic categorization of fanatic texts using random forests,” Kuwait J. Sci. Eng., 33, 1–18.

Le Cun, Y. (1989): “Generalization and network design strategies,” Technical Report CRG-TR-89-4, Dept. of Comp. Sci., Univ. of Toronto.

Le Cun, Y., L. Bottou, Y. Bengio and P. Haffner (1998): “Gradient-based learning applied to document recognition,” Proc. IEEE, 86, 2278–2324. doi:10.1109/5.726791

Lee, S. H., N. R. Wray, M. E. Goddard and P. M. Visscher (2011): “Estimating missing heritability for disease from genome-wide association studies,” Am. J. Hum. Genet., 88, 294–305.

Li, B. and S. M. Leal (2008): “Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data,” Am. J. Hum. Genet., 83, 311–321.

Li, G. X., J. Ferguson, W. Zheng, J. S. Lee, X. H. Zhang, L. Li, J. Kang, X. T. Yan and H. Y. Zhao (2011): “Large-scale risk prediction applied to genetic analysis workshop 17 mini-exome sequence data,” BMC Proc., 5 (Suppl 9), S46.

Luo, L., E. Boerwinkle and M. Xiong (2011): “Association studies for next-generation sequencing,” Genome Res., 21, 1099–1108.

Madsen, B. E. and S. R. Browning (2009): “A groupwise association test for rare mutations using a weighted sum statistic,” PLoS Genet., 5, e1000384.

Montgomery, D. C., E. A. Peck and G. G. Vining (2006): Introduction to linear regression analysis, Hoboken, NJ, USA: Wiley-Interscience, John Wiley & Sons, fourth edition.

Nadarajah, S. and S. Kotz (2008): “Exact distribution of the Max/Min of two gaussian random variables,” IEEE Trans. VLSI Syst., 16, 2.

Need, A. C., V. Shashi, Y. Hitomi, K. Schoch, K. V. Shianna, M. T. McDonald, M. H. Meisler and D. B. Goldstein (2012): “Clinical application of exome sequencing in undiagnosed genetic conditions,” J. Med. Genet., 49, 353–361.

Ng, S. B., A. W. Bigham, K. J. Buckingham, M. C. Hannibal, M. J. McMillin, H. I. Gildersleeve, A. E. Beck, H. K. Tabor, G. M. Cooper, H. C. Mefford, C. Lee, E. H. Turner, J. D. Smith, M. J. Rieder, K. Yoshiura, N. Matsumoto, T. Ohta, N. Niikawa, D. A. Nickerson, M. J. Bamshad and J. Shendure (2010): “Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome,” Nat. Genet., 42, 790–793.

Price, A. L., G. V. Kryukov, P. I. de Bakker, S. M. Purcell, J. Staples, L. J. Wei and S. R. Sunyaev (2010): “Pooled association tests for rare variants in exon-resequencing studies,” Am. J. Hum. Genet., 86, 832–838.

Renkema, K. Y., M. F. Stokman, R. H. Giles and N. V. A. M. Knoers (2014): “Next-generation sequencing for research and diagnostics in kidney disease,” Nat. Rev. Nephrol., 10, 433–444.

Senn, S. (2008): “A note concerning a selection ‘Paradox’ of Dawid’s,” Am. Statist., 62, 206–210.

Singh, D., P. G. Febbo, K. Ross, D. G. Jackson, J. Manola, C. Ladd, P. Tamayo, A. A. Renshaw, A. V. D’Amico, J. P. Richie, E. S. Lander, M. Loda, P. W. Kantoff, T. R. Golub and W. R. Sellers (2002): “Gene expression correlates of clinical prostate cancer behavior,” Cancer Cell, 1, 203–209. doi:10.1016/S1535-6108(02)00030-2

Stahl, E. A., D. Wegmann, G. Trynka, J. Gutierrez-Achury, R. Do, B. F. Voight, P. Kraft, R. Chen, H. J. Kallberg, F. A. Kurreeman and others (2012): “Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis,” Nat. Genet., 44, 483–489.

Stein, C. M. (1981): “Estimation of the mean of a multivariate normal distribution,” Ann. Stat., 9, 1135–1151.

Tibshirani, R. (1996): “Regression shrinkage and selection via the Lasso,” J. Roy. Stat. Soc. B, 58, 267–288.

Tibshirani, R., T. Hastie, B. Narasimhan and G. Chu (2002): “Diagnosis of multiple cancer types by shrunken centroids of gene expression,” Proc. Natl. Acad. Sci. USA, 99, 6567–6572. doi:10.1073/pnas.082099299

Wray, N. R., M. E. Goddard and P. M. Visscher (2007): “Prediction of individual genetic risk to disease from genome-wide association studies,” Genome Res., 17, 1520–1528.

Wu, M. C., S. Lee, T. Cai, Y. Li, M. Boehnke and X. Lin (2011): “Rare-variant association testing for sequencing data with the sequence kernel association test,” Am. J. Hum. Genet., 89, 82–93.

Yang, Y., D. M. Muzny, J. G. Reid, M. N. Bainbridge, A. Willis, P. A. Ward, A. Braxton, J. Beuten, F. Xia, Z. Niu, M. Hardison, R. Person, M. R. Bekheirnia, M. S. Leduc, A. Kirby, P. Pham, J. Scull, M. Wang, Y. Ding, S. E. Plon, J. R. Lupski, A. L. Beaudet, R. A. Gibbs and C. M. Eng (2013): “Clinical whole-exome sequencing for the diagnosis of mendelian disorders,” N. Engl. J. Med., 369, 1502–1511.

Zhang, H. (2008): “Bias-reduced estimators and confidence intervals for odds ratios in genome-wide association studies,” Biostatistics, 9, 621–634. doi:10.1093/biostatistics/kxn001

Published Online: 2015-12-7
Published in Print: 2015-12-1

©2015 by De Gruyter

Downloaded on 30.4.2024 from https://www.degruyter.com/document/doi/10.1515/sagmb-2015-0060/html