Skip to main content
Log in

Regularized principal components of heritability

  • Original Paper
  • Published:
Computational Statistics Aims and scope Submit manuscript

Abstract

In family studies with multiple continuous phenotypes, heritability can be conveniently evaluated through the so-called principal-component of heredity (PCH, for short; Ott and Rabinowitz in Hum Hered 49:106–111, 1999). Estimation of the PCH, however, is notoriously difficult when entertaining a large collection of phenotypes which naturally arises in dealing with modern genomic data such as those from expression QTL studies. In this paper, we propose a regularized PCH method to specifically address such challenges. We show through both theoretical studies and data examples that the proposed method can accurately assess the heritability of a large collection of phenotypes.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1

Similar content being viewed by others

References

  • Bertsekas D (1995) Nonlinear programming. Athena Scientific, Belmont, MA

    MATH  Google Scholar 

  • Bickel P, Levina E (2004) Some theory for Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli 10:989–1010

    Article  MATH  MathSciNet  Google Scholar 

  • Cheung VG, Conlin LK, Weber TM, Arcaro M, Jen K, Morley M, Spielman RS (2003) Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet 33:422–425

    Article  Google Scholar 

  • Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, Burdick JT (2005) Mapping determinants of human gene expression by regional and genome-wide association. Nature 437:1365–1369

    Article  Google Scholar 

  • Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26

    Article  MATH  MathSciNet  Google Scholar 

  • Fan J, Feng Y, Tong X (2012) A road to classification in high dimensional space: the regularized optimal affine discriminant. J R Stat Soc Ser B 74:745–771

    Article  MathSciNet  Google Scholar 

  • Fan J, Lv J (2008) Sure independence screening for ultrahigh-dimensional feature space. J R Stat Soc Ser B 70:849–911

    Article  MathSciNet  Google Scholar 

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer, New York

    Book  MATH  Google Scholar 

  • Jin M, Fang Y (2011) Variable selection in canonical discriminant analysis for family studies. Biometrics 67:124–132

    Article  MATH  MathSciNet  Google Scholar 

  • Kavinoky R, Thoo JB (2008) The number of real roots of a cubic equation. AMATYC Rev 29:3–8

    Google Scholar 

  • Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS, Cheung VG (2004) Genetic analysis of genome-wide variation in human gene expression. Nature 430:743–747

    Article  Google Scholar 

  • Ott J, Rabinowitz D (1999) A principal-components approach based on heritability for combining phenotype information. Hum Hered 49:106–111

    Article  Google Scholar 

  • Reilly C, Miller MB, Liu Y, Oetting WS, King R, Blumenthal M (2007) Linkage analysis of a cluster-based quantitative phenotypes constructed from pulmonary function test data in 27 multigenerational families with multiple asthmatic members. Hum Hered 64:136–145

    Article  Google Scholar 

  • The Collaborative Study on the Genetics of Asthma (CSGA) (1997) A genome-wide search for asthma susceptibility loci in ethnically diverse populations. Nat Genet 15:389–392

    Google Scholar 

  • Wang Y, Fang Y, Jin M (2007) A ridge penalized principal-components approach based on heritability for high-dimensional data. Hum Hered 64:182–191

    Article  Google Scholar 

  • Winawer MR (2006) Phenotype definition in epilepsy. Epilepsy Behav 8:462–476

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ming Yuan.

Additional information

The research of Ming Yuan was supported in part by NSF Career Award DMS-0846234 and FRG Award DMS-1265202.

Appendix

Appendix

1.1 Proof of theorem

Assume the average family size \(N/n \rightarrow m_0\) and \(\min _{\Vert \beta \Vert =1}{\beta ^\mathsf{T}\Sigma _T\beta }\ge \delta >0\). Rewrite

$$\begin{aligned} h_{n, \lambda }(\beta )=\frac{\beta ^\mathsf{T}{\Sigma }_A\beta +\beta ^\mathsf{T}\Delta _A\beta }{\beta ^\mathsf{T}{\Sigma }_T\beta +\beta ^\mathsf{T}\Delta _T\beta +\lambda \Vert \beta \Vert _1^2}, \end{aligned}$$

where \(\Delta _A=\widehat{\Sigma }_A-\Sigma _A\) and \(\Delta _T=\widehat{\Sigma }_T-\Sigma _T\). Under some mild conditions, we have (e.g., Bickel and Levina 2004) \(\Vert \Delta _A\Vert _{\infty }=O_p(\sqrt{(\log d)/n})\) and \(\Vert \Delta _T\Vert _{\infty }=O_p(\sqrt{(\log d)/n})\), where \(\Vert \cdot \Vert _{\infty }\) is the element-wise super-norm. From \(\beta ^\mathsf{T}\Delta _A\beta \le \Vert \beta \Vert _{\ell _1} \Vert \Delta _A\beta \Vert _{\infty }\le \Vert \Delta _A\Vert _{\infty }\Vert \beta \Vert _{\ell _1}^2\), we have \(\beta ^\mathsf{T}\Delta _A\beta \le O_p(\sqrt{(\log d)/(n-1)}) \Vert \beta \Vert _{\ell _1}^2 \). Similarly, \(\beta ^\mathsf{T}\Delta _T\beta \le O_p(\sqrt{(\log d)/(N-n)}) \Vert \beta \Vert _{\ell _1}^2\).

Thus, if \(\lambda \gg O_p(\sqrt{(\log d)/n})\) and \(\min _{\Vert \beta \Vert =1}{\beta ^\mathsf{T}\Sigma _T\beta }\ge \delta >0\), we have \(\max _{\beta }|h_{n, \lambda }(\beta )-h_{0, \lambda }(\beta )|=o_p(1)\), where \(h_{0, \lambda }(\beta )=\beta ^\mathsf{T}{\Sigma }_A\beta /(\beta ^\mathsf{T}{\Sigma }_T\beta +\lambda \Vert \beta \Vert _{\ell _1}^2)\). Further, because \(\lambda \Vert \beta \Vert _{\ell _1}^2\le \lambda s_0\Vert \beta \Vert ^2\), if \(\lambda \ll 1/s_0\) and \(\min _{\Vert \beta \Vert =1}{\beta ^\mathsf{T}\Sigma _T\beta }\ge \delta >0\), we have \(\max _{\beta }|h_{0, \lambda }(\beta )-h(\beta )|=o(1)\). Together, we have

$$\begin{aligned} \max _{\beta }|h_{n, \lambda }(\beta )-h(\beta )|=o_p(1). \end{aligned}$$

Therefore, noting that \(0\le h(\beta _0)-h(\widehat{\beta }(\lambda ))\le h(\beta _0)-h_{n, \lambda }(\beta _0)+ h_{n, \lambda }(\widehat{\beta }(\lambda ))-h(\widehat{\beta }(\lambda ))\), we have \(h(\widehat{\beta }(\lambda ))\rightarrow _p h_{\max }\), and noting that \(h_{n, \lambda }(\beta _0)-h(\beta _0)\le h_{n, \lambda }(\widehat{\beta }(\lambda ))-h(\beta _0)\le h_{n, \lambda }(\widehat{\beta }(\lambda ))-h(\widehat{\beta }(\lambda ))\), we have \(h_{n,\lambda }(\widehat{\beta }(\lambda ))\rightarrow _p h_{\max }\).

1.2 The coordinate descent algorithm

We discuss the path solution to the optimization problem (4). Write two input matrices \(\widehat{\Sigma }_A\) and \(\widehat{\Sigma }_T\) as \((a_{ij})\) and \((t_{ij})\) respectively. Note that when \(\lambda \) is larger than some value say \(\lambda _{\max }, \widetilde{\beta }(\lambda )=\mathbf{0}_d\). Here we derive the formula for \(\lambda _{\max }\). If \(\beta _{(-1)}=\mathbf{0}_{d-1}\), then \(g(x)=t_{11}x^2+\lambda x^2+\gamma (a_{11}x^2-1)^2,\) which achieves minimum at \(x=\sqrt{[-\frac{t_{11}+\lambda -2a_{11}\gamma }{2\gamma a_{11}^2}]_{+}},\) where \([a]_{+}\) equals \(a\) if \(a\ge 0\) and zero otherwise. Therefore,

$$\begin{aligned} \lambda _{\max }=\max _{1\le j \le d}\{2a_{jj}\gamma -t_{jj}\}. \end{aligned}$$

On a fine grid of \(L\) values of \(\lambda \), say \(\exp \{\log (\lambda _{\max })(1:L)/L\}\), we computer the solution path starting with \(\lambda =\lambda _{\max }\) backwardly and \(\widetilde{\beta }(\lambda _{\max })=\mathbf{0}_d\). Then we calculate the solution associated with the consecutive \(\lambda \) on the grid, using the solution we obtain most recently as the initial for minimization.

Continuing the discussion in Sect. 2.2, the problem narrows down to find the minimum point of function (5). Now we examine the local minimum point(s) of \(g(x)\) when \(\beta _{(-1)}\ne \mathbf{0}_{d-1}\). Letting its derivative be zero, we have \(c_3x^3+c_2x^2+c_1x+c_0=0\), where \(c_3=2\gamma a_{11}, c_2=6\gamma a_{11}a_{12}^{*}, c_1=t_{11}+\lambda +4\gamma {a_{12}^{*}}^2+2\gamma a_{11}(a_{22}^{*}-1)\), and \(c_0=t_{12}^{*}+\lambda \hbox {sign}(x)r^{*}+2\gamma (a_{22}^{*}-1)a_{12}^{*}\). Here \(\hbox {sign}(x)\) is the sign of \(x\). Now the problem becomes finding solutions to this cubic equation. Let \(\widehat{\beta }_{(1)}\) be the updated \(\beta _{(1)}\) after one coordinate descent step.

The unique non-differentiable point of \(g(x), x=0\), needs special concerns. The set of all subgradients of \(g(x)\) at \(x=0\) is \(\{t_{12}^{*}+\lambda \xi r^{*}+2\gamma (a_{22}^{*}-1)a_{12}^{*}: |\xi |\le 1\}\) (for the definition of subgrandient, see e.g., Bertsekas 1995). At any \(x\ne 0, g(x)\) is differentiable. Letting \(P=c_1/c_3-c_2^2/(3c_3^2)\) and \(Q=2c_2^3/(27c_3^3)-c_1c_2/(3c_3)+c_0/c_3\), the equation becomes \(y^3+Py+Q=0\) with transformation \(y=x+c_2/(3c_3)\).

The problem of finding the solutions to \(y^3+Py+Q=0\) has been solved by many mathematicians. Here we follow Kavinoky and Thoo (2008). Define \(\Delta =Q^2/4+P^3/27, S=\root 3 \of {-Q/2+\sqrt{D}}\), and \(T=\root 3 \of {-Q/2-\sqrt{D}}\). Always, there are three roots (real or complex): \(y_1=S+T, y_2=-(S+T)/2+\sqrt{-3/4}(S-T), y_3=-(S+T)/2-\sqrt{-3/4}(S-T)\). If \(\Delta >0\), there is one real root, if \(D=0\), there are two real roots (one of them is minimum point), and if \(D<0\), there are three real roots (two of them are local minimum points). Let \(x_i=y_i-c_2/(3c_3), i=1, 2, 3\). Also let \(x_{10}=\min \{x_1, x_2, x_3\}\) and \(x_{20}=\max \{x_1, x_2, x_3\}\) (the middle one is a local maximum point).

Case (i). If \(|t_{12}^{*}+2\gamma (a_{22}^{*}-1)a_{12}^{*}|\le \lambda r^{*}\), zero is a local minimum point of \(g(x)\) and \(\widehat{\beta }_{(1)}=0\).

Case (ii). If \(t_{12}^{*}+2\gamma (a_{22}^{*}-1)a_{12}^{*} > \lambda r^{*}, \widehat{\beta }_{(1)}<0\). If \(D\ge 0\) (\(D\) depends on \(\hbox {sign}(\widehat{\beta }_{(1)})=-1\)), \(x_1\) is minimum point and \(\widehat{\beta }_{(1)}=x_1\). Otherwise, \(\widehat{\beta }_{(1)}\) equals the negative one if \(x_{10}\) and \(x_{20}\) are of different signs, and equals the one of smaller absolute value if both are negative.

Case (iii). If \(t_{12}^{*}+2\gamma (a_{22}^{*}-1)a_{12}^{*} < - \lambda r^{*}, \widehat{\beta }_{(1)}>0\). If \(D\ge 0\) (\(D\) depends on \(\hbox {sign}(\widehat{\beta }_{(1)})=1\)), \(x_1\) is minimum point and \(\widehat{\beta }_{(1)}=x_1\). Otherwise, \(\widehat{\beta }_{(1)}\) equals the positive one if \(x_{10}\) and \(x_{20}\) are of different signs, and equals the one of smaller absolute value if both are positive.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Fang, Y., Feng, Y. & Yuan, M. Regularized principal components of heritability. Comput Stat 29, 455–465 (2014). https://doi.org/10.1007/s00180-013-0444-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00180-013-0444-3

Keywords

Navigation