Abstract
The goal of this paper is to develop a double penalized hierarchical likelihood (DPHL) with a modified Cholesky decomposition for simultaneously selecting fixed and random effects in mixed effects models. DPHL avoids the use of the data likelihood, which usually involves a high-dimensional integral, to define an objective function for variable selection. The resulting DPHL-based estimator enjoys the oracle properties with no requirement on the convexity of the loss function. Moreover, a two-stage algorithm is proposed to implement this approach effectively. An H-likelihood-based Bayesian information criterion (BIC) is developed for tuning parameter selection. Simulated data and a real data set are examined to illustrate the efficiency of the proposed method.
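As a rough illustration of the ingredients described above, the sketch below builds adaptive-LASSO-type weights, the modified Cholesky covariance that the random-effects penalty acts on, and a toy double penalty. All function names, the weight exponent, and the penalty form are illustrative assumptions, not the paper's implementation; the decomposition is the Chen–Dunson-type form \(\Sigma = \Lambda\Gamma\Gamma^{\tau}\Lambda\), under which setting a scale \(d_k = 0\) removes the \(k\)-th random effect.

```python
import numpy as np

def adaptive_weights(est_init, upsilon=1.0, eps=1e-8):
    """Adaptive-LASSO-type weights w_j = 1/|initial estimate_j|^upsilon
    (illustrative; weights of this flavor appear in the proof of Theorem 3)."""
    return 1.0 / (np.abs(est_init) + eps) ** upsilon

def modified_cholesky_cov(d, gamma):
    """Random-effects covariance Sigma = Lambda * Gamma * Gamma' * Lambda,
    with Lambda = diag(d) and Gamma unit lower triangular holding the
    q(q-1)/2 free entries gamma_{kt}, t < k (Chen-Dunson-type form)."""
    q = len(d)
    G = np.eye(q)
    idx = 0
    for k in range(1, q):
        for t in range(k):
            G[k, t] = gamma[idx]
            idx += 1
    Lam = np.diag(d)
    return Lam @ G @ G.T @ Lam

def double_penalty(beta, d, lam1, lam2, w_beta, w_d):
    """Toy DPHL penalty term: weighted L1 penalties on the fixed effects beta
    and on the Cholesky scales d (so shrinking d_k to 0 drops random effect k)."""
    return lam1 * np.sum(w_beta * np.abs(beta)) + lam2 * np.sum(w_d * np.abs(d))

# Setting d[1] = 0 excludes the second random effect:
# the corresponding row and column of Sigma vanish.
Sigma = modified_cholesky_cov(np.array([1.0, 0.0, 0.5]),
                              np.array([0.3, 0.2, -0.4]))
```

The zero pattern of `Sigma` shows why penalizing the scales \(d_k\) (rather than the whole covariance matrix) suffices for random-effects selection.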
References
Ahn M, Zhang HH, Lu W (2012) Moment-based method for random effects selection in linear mixed models. Stat Sin 22:1539–1562
Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) Second international symposium on information theory. Akademiai Kiado, Budapest, pp 267–281
Andrews DWK (1992) Generic uniform convergence. Econom Theory 8:241–257
Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MH, White JS (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol Evol 24:127–135
Bondell HD, Krishna A, Ghosh SK (2010) Joint variable selection for fixed and random effects in linear mixed-effects models. Biometrics 66:1069–1077
Chen Z, Dunson DB (2003) Random effects selection in linear mixed models. Biometrics 59:762–769
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
Fan J, Li R (2004) New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc 99:710–723
Foster SD, Verbyla AP, Pitchford WS (2009) Estimation, prediction and inference for the LASSO random effect models. Aust N Z J Stat 51:43–61
Huang J, Wu C, Zhou L (2002) Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 89:111–128
Ibrahim JG, Zhu H, Garcia RI, Guo R (2011) Fixed and random effects selection in mixed effects models. Biometrics 67:495–503
Jiang J (2007) Linear and generalized linear mixed models and their applications. Springer, New York
Jiang J, Jia H, Chen H (2001) Maximum posterior estimation of random effects in generalized linear mixed models. Stat Sin 11:97–120
Jiang J, Rao J, Gu Z, Nguyen T (2008) Fence methods for mixed model selection. Ann Stat 36:1669–1692
Kaslow RA, Ostrow DG, Detels R, Phair JP, Polk BF, Rinaldo CR (1987) The multicenter AIDS cohort study: rationale, organization and selected characteristics of the participants. Am J Epidemiol 126:310–318
Lan L (2006) Variable selection in linear mixed model for longitudinal data. PhD Thesis, North Carolina State University
Lange N, Laird NM (1989) The effect of covariance structures on variance estimation in balanced growth-curve models with random parameters. J Am Stat Assoc 84:241–247
Lee Y, Nelder JA (1996) Hierarchical generalized linear models (with discussion). J R Stat Soc B 58:619–678
Lee Y, Nelder JA, Pawitan Y (2006) Generalized linear models with random effects: unified analysis via H-likelihood. Chapman and Hall, London
Li Y, Wang S, Song PX, Wang N, Zhu J (2011) Doubly regularized estimation and selection in linear mixed-effects models for high-dimensional longitudinal data. Manuscript
Liang H, Wu H, Zou G (2008) A note on conditional AIC for linear mixed-effects models. Biometrika 95:773–778
Meng X (2009) Decoding the H-likelihood. Stat Sci 24:280–293
Ni X, Zhang D, Zhang HH (2010) Variable selection for semiparametric mixed models in longitudinal studies. Biometrics 66:79–88
Peng H, Lu Y (2012) Model selection in linear mixed effect models. J Multivar Anal 109:109–129
Pu W, Niu X (2006) Selecting mixed-effects models based on a generalized information criterion. J Multivar Anal 97:733–758
Rao CR, Wu Y (1989) A strongly consistent procedure for model selection in a regression problem. Biometrika 76:369–374
Schelldorfer J, Buhlmann P, van de Geer S (2011) Estimation for high-dimensional linear mixed-effects models using ℓ1-penalization. Scand J Stat 38:197–214
Schelldorfer J, Buhlmann P (2011) GLMMLasso: an algorithm for high-dimensional generalized linear mixed models using ℓ1-penalization. Preprint, ETH Zurich http://stat.ethz.ch/people/schell
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Song PX (2007) Correlated data analysis: modeling, analytics, and applications. Springer, New York
Tierney L, Kadane JB (1986) Accurate approximations for posterior moments and marginal densities. J Am Stat Assoc 81:82–86
Vaida F, Blanchard S (2005) Conditional Akaike information for mixed-effects models. Biometrika 92:351–370
Wang H, Leng C (2007) Unified Lasso estimation via least square approximation. J Am Stat Assoc 102:1039–1048
Wang H, Li R, Tsai CL (2007) Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94:553–568
Wang S, Song PX, Zhu J (2010) Doubly regularized REML for estimation and selection of fixed and random effects in linear mixed-effects models. The University of Michigan Department of Biostatistics Working Paper Series. Working Paper 89. http://biostats.bepress.com/umichbiostat/paper89
White H (1994) Estimation, inference, and specification analysis. Cambridge University Press, New York
Yang H (2007) Variable selection procedures for generalized linear mixed models in longitudinal data analysis. PhD Thesis, North Carolina State University
Zipunnikov VV, Booth JG (2006) Monte Carlo EM for generalized linear mixed models using randomized spherical radial integration. Manuscript
Zou H (2006) The adaptive Lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgements
Xu’s research was supported by the Scientific Research Foundation of Southeast University; Zhu’s research was supported by a grant from the Research Grants Council of Hong Kong. The authors thank the Editor, the Associate Editor and the referees for their constructive suggestions and comments, which led to an improvement of the earlier manuscript. Special thanks go to a referee who pointed out a mistake in the original proof of Theorem 4, giving us the opportunity to correct it.
Appendix
Proof of Theorem 1
First, note that the marginal likelihood is approximated by function (2.5) through Laplace’s method. Hence, we know that
according to the results of Tierney and Kadane [31]. On the other hand, under conditions (C1)–(C5), it follows from White [36] that \(A_{11}(\varpi^{*})\) is positive definite,
and
Therefore, by Slutsky’s theorem, we have
and
where \(\tilde{\varpi}_{a} = (\tilde{\beta}_{a}^{\tau}, \tilde{d}_{a}^{\tau}, \tilde{\gamma}_{a}^{\tau})^{\tau}\) is the unpenalized maximum adjusted profile likelihood estimator of \(\varpi^{*}_{a}\).
Then we prove the estimation consistency of \(\widehat{\varpi}\). It is sufficient to show that for any given \(\epsilon > 0\), there exists a large constant \(C_{\epsilon}\) such that
where \(u = (u^{\tau}_{1}, u^{\tau}_{2}, u^{\tau}_{3})^{\tau}\), in which \(u_{1} = (u_{11}, \ldots, u_{1p})^{\tau}\) is a \(p\)-dimensional vector, \(u_{2} = (u_{21}, \ldots, u_{2q})^{\tau}\) is a \(q\)-dimensional vector, and \(u_{3}\) is a \((q(q-1)/2)\)-dimensional vector. This implies that there exists a local maximizer in the ball \(\{\varpi^{*} + n^{-1/2}u : \|u\|_{2} \leq C_{\epsilon}\}\) and thus \(\|\widehat{\varpi} - \varpi^{*}\|_{2} = O_{p}(n^{-1/2})\).
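The display referred to above did not survive extraction; it is presumably the standard local-maximizer inequality of the Fan–Li type, which in the paper's notation would read (a hedged reconstruction, not the original equation):

```latex
P\Bigl\{ \sup_{\|u\|_{2} = C_{\epsilon}}
  \mathit{DPHL}\bigl(\varpi^{*} + n^{-1/2}u\bigr)
  < \mathit{DPHL}\bigl(\varpi^{*}\bigr) \Bigr\} \;\geq\; 1 - \epsilon .
```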
Consider
By the concavity and monotonicity of the penalty functions, we know that
For \(S_{1}(\cdot)\), using a Taylor expansion around \(\varpi^{*}\), we have
Then, by the Cauchy–Schwarz inequality together with condition (C3) and the fact (7.1), we have
where \(\|u\|_{1}\) is the \(L_{1}\)-norm of the \((p+q(q+1)/2)\)-dimensional vector \(u\), and it can easily be checked that \(\|u\|_{1} \leq \sqrt{p+q(q+1)/2}\,\|u\|_{2}\). For \(S_{2}(u)\), an application of a Taylor expansion around the zero vector yields
Consequently, if \(a_{n} = O_{p}(n^{-1/2})\) and \(b_{n} = o_{p}(1)\), we know that
Similarly, for \(S_{3}(u)\), we have
if \(c_{n} = O_{p}(n^{-1/2})\) and \(d_{n} = o_{p}(1)\). We can see that, by choosing a sufficiently large \(C_{\epsilon}\), \(S_{12}(u)\) dominates the other terms uniformly in \(\|u\|_{2} = C_{\epsilon}\), which completes the proof. □
Proof of Theorem 2
First, we prove the sparsity. Note that if \(d_{k} = 0\), we have \(\gamma_{kt} = 0\) for all \(t = 1, \ldots, k-1\). Therefore, it is sufficient to show \(P(\widehat{\beta}_{j} = 0)\rightarrow1\) for all \(j = p_{1}+1, \ldots, p\) and \(P(\widehat{d}_{k} = 0)\rightarrow1\) for all \(k = q_{1}+1, \ldots, q\). Without loss of generality, we show in detail that \(P(\widehat{\beta}_{p} = 0) \rightarrow1\). Then, the same argument can be used to show that \(P(\widehat{\beta}_{j} = 0) \rightarrow1\) for \(j = p_{1}+1, \ldots, p-1\). Similarly, we can show that \(P(\widehat{d}_{k}=0) \rightarrow1\) for \(k = q_{1}+1, \ldots, q\). Applying a Taylor expansion around \(\beta^{*}\) yields
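The implication \(d_{k} = 0 \Rightarrow \gamma_{kt} = 0\) can be read off the modified Cholesky decomposition of the random-effects covariance (a Chen–Dunson-type parameterization; the exact form used in Sect. 2 is assumed here):

```latex
\Sigma \;=\; \Lambda \Gamma \Gamma^{\tau} \Lambda,
\qquad
\Lambda = \operatorname{diag}(d_{1},\ldots,d_{q}),
\qquad
\Gamma = (\gamma_{kt}) \ \text{unit lower triangular},
```

so \(d_{k} = 0\) zeroes the \(k\)-th row and column of \(\Sigma\), and the corresponding \(\gamma_{kt}\) drop out of the model.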
where \(\varpi_{0}\) lies between \(\varpi^{*}\) and \(\widehat{\varpi}\). From White [36], we know that \(\frac{\partial AP_{\alpha}(\varpi^{*})}{\partial\beta_{p}} = O_{p}(n^{1/2})\), and from Theorem 1 we have \(\|\widehat{\beta}-\beta^{*}\|_{2} = O_{p}(n^{-1/2})\). Hence, under conditions (C3) and (C5), we know
which implies that \(n\varphi'_{\lambda_{1n}}(|\widehat{\beta}_{p}|)\operatorname{sgn}(\widehat {\beta}_{p})\) dominates the first three terms with probability tending to one if \(n^{1/2} \varphi'_{\lambda_{1n}}(|\widehat{\beta}_{p}|) \rightarrow \infty\). Hence, if \(\widehat{\beta}_{p} \neq 0\), the first-order condition \(\partial \mathit{DPHL}(\widehat{\varpi}) / \partial\beta_{p}=0\) cannot hold once the sample size is sufficiently large. Consequently, \(\widehat{\beta}_{p}\) must be exactly 0 with probability tending to one.
Next, we prove the asymptotic normality. By Theorem 1 and part (1) of Theorem 2, there exists a root-\(n\) consistent estimator \(\widehat{\varpi} = (\widehat{\varpi}^{\tau}_{a}, \mathbf{0}^{\tau})^{\tau}\) that satisfies the equation \(\partial \mathit{DPHL}(\widehat{\varpi}) / \partial\varpi_{a} = 0\). Then, an application of a Taylor expansion around \(\varpi^{*}_{a}\) yields
where \(F_{1}(\varpi^{*}_{a}) = ( \varphi'_{\lambda_{1n}}(|\beta^{*}_{j}|)\operatorname{sgn}(\beta^{*}_{j}), j=1,\ldots, p_{1}, \psi'_{\lambda_{2n}}(|d^{*}_{k}|)\operatorname{sgn}(d^{*}_{k}), k=1,\ldots, q_{1}, \mathbf{0}^{\tau})^{\tau}\), and \(F_{2}(\varpi^{*}_{a})\) is the corresponding second-derivative matrix of the penalty function vector \((\varphi_{\lambda_{1n}}(|\beta_{j}|), j=1,\ldots,p_{1}, \psi_{\lambda_{2n}}(|d_{k}|), k=1,\ldots, q_{1}, \mathbf{0}^{\tau})^{\tau}\) evaluated at \(\varpi^{*}_{a}\). Consequently, under the conditions of Theorem 2, we have
Then, by Slutsky’s theorem,
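The displays above were lost in extraction; given the Taylor expansion and the definitions of \(A_{11}\), \(F_{1}\) and \(F_{2}\), the limiting statement is presumably of the usual sandwich form, with \(B_{11}(\varpi^{*}_{a})\) denoting the variance matrix of the corresponding score (notation assumed):

```latex
\sqrt{n}\,\bigl\{A_{11}(\varpi^{*}_{a}) + F_{2}(\varpi^{*}_{a})\bigr\}
\Bigl[\widehat{\varpi}_{a} - \varpi^{*}_{a}
  + \bigl\{A_{11}(\varpi^{*}_{a}) + F_{2}(\varpi^{*}_{a})\bigr\}^{-1}
    F_{1}(\varpi^{*}_{a})\Bigr]
\;\stackrel{d}{\longrightarrow}\;
N\bigl(0,\, B_{11}(\varpi^{*}_{a})\bigr).
```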
□
Proof of Theorem 3
To obtain the conclusion, by Theorems 1 and 2, we only need to verify that \(a_{n}, c_{n} = o_{p}(n^{-1/2})\), \(b_{n}, d_{n} = o_{p}(1)\), \(n^{1/2} \varphi'_{\lambda_{1n}}(|\widehat{\beta}_{j}|) \rightarrow \infty\) and \(n^{1/2} \psi'_{\lambda_{2n}}(|\widehat{d}_{k}|) \rightarrow\infty\).
For any \(j = 1, \ldots, p_{1}\), under conditions (C1)–(C5), we know that \(w_{\beta_{j}} = O_{p}(1)\). Consequently,
which implies that \(a_{n} = o_{p}(n^{-1/2})\) and \(b_{n} = o_{p}(1)\). Similarly, we can easily find that \(c_{n} = o_{p}(n^{-1/2})\) and \(d_{n} = o_{p}(1)\).
For any \(j = p_{1}+1, \ldots, p\), on the other hand, under conditions (C1)–(C5), we know that \(w_{\beta_{j}} = O_{p}(n^{\upsilon_{1}/2})\). Then,
which implies that \(n^{1/2} \varphi'_{\lambda_{1n}}(|\widehat{\beta}_{j}|) \rightarrow\infty\). Similarly, we know that \(n^{1/2} \psi'_{\lambda_{2n}}(|\widehat{d}_{k}|) \rightarrow\infty\) with probability tending to one, for all \(k = q_{1}+1, \ldots, q\). This completes the proof. □
Proof of Theorem 4
We first show that for \(\varpi_{n} \stackrel{P}{\longrightarrow} \varpi\),
First, according to the results in Tierney and Kadane [31] and conditions (C2) and (C3), we have
Then, following the proof of Theorem 2a in [11], it is sufficient to show that
Note that conditions (C1)–(C5) imply \([l(\varpi)-E\{l(\varpi)\}]/n \stackrel{P}{\longrightarrow} 0\) for all ϖ∈Θ. Further, since conditions (C2), (C3) and (C5) satisfy the W-LIP assumption of Lemma 2 of Andrews [3], we have the uniform continuity and stochastic continuity of E{l(ϖ)} and [l(ϖ)−E{l(ϖ)}]/n, respectively. Consequently, according to Theorem 3 of Andrews [3], (7.4) holds by the stochastic continuity and pointwise convergence properties. Therefore, together with (7.3), this implies (7.2).
Then, considering an arbitrary candidate model \(\mathcal{S}\), under condition (C6), it follows from White [36] that the unpenalized estimator \(\tilde{\varpi }_{\mathcal{S}}\) converges in probability to \(\varpi^{*}_{\mathcal{S}}\). Similarly, one can verify that \(\varpi^{*}_{\mathcal{S}} = \varpi^{*}\) for any overfitted model \(\mathcal{S} \supset \mathcal{S}_{T}\), and \(\varpi^{*}_{\mathcal{S}} \neq\varpi^{*}\) for any underfitted model \(\mathcal{S} \nsupseteq\mathcal{S}_{T}\), since the latter shrinks some nonzero parameters to zero. Consequently, we prove the conclusion of the theorem in two cases, corresponding to underfitted and overfitted models, respectively.
Case 1 (Underfitted Model). By the fact that \(\varpi^{*}_{\mathcal{S}} \neq\varpi^{*}_{\mathcal{S}_{T}}\) for any \(\mathcal{S} \nsupseteq\mathcal {S}_{T}\) and \(\varpi^{*}_{\mathcal{S}_{T}} = \varpi^{*}\), we have
where the first inequality follows because \(AP_{\alpha}(\tilde{\varpi }_{\mathcal{S}_{\lambda_{n}}}) \geq AP_{\alpha}(\widehat{\varpi}_{\lambda _{n}})\) for all λ n and the second equality follows from (7.2). Therefore, under condition (C7), we have \(P(\inf_{\lambda_{n} \in R_{-}} \mathit{BIC}(\lambda_{n}) > \mathit{BIC}(\lambda)) \longrightarrow1\).
Case 2 (Overfitted Model). For any \(\lambda_{n} \in R_{+}\), we have \(df_{\lambda_{n}} - (p_{1} + q_{1}(q_{1}+1)/2) \geq 1\). Then
By the fact that \(AP_{\alpha}(\varpi) = l(\varpi)(1+O_{p}(n^{-1}))\) and \(\varpi^{*}_{\mathcal{S}} = \varpi^{*}\) for any \(\mathcal{S} \supset \mathcal{S}_{T}\), an application of a Taylor expansion yields
Therefore, under conditions (C1)–(C5), it follows from White [36] that \(AP_{\alpha}(\widehat{\varpi}_{\lambda}) - AP_{\alpha}(\tilde{\varpi}_{\mathcal{S}}) = O_{p}(1)\), which implies that
As a result, we have \(P(\inf_{\lambda_{n} \in R_{+}} \mathit{BIC}(\lambda_{n}) > \mathit{BIC}(\lambda)) \longrightarrow1\), which completes the proof. □
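To make the tuning criterion concrete, here is a minimal sketch of BIC-based selection over a grid of \(\lambda\) values, with a hard-thresholded OLS fit standing in for the penalized H-likelihood maximizer. The fit, the grid, the toy data, and the degrees-of-freedom count are all illustrative assumptions; the paper's criterion uses the adjusted profile h-likelihood where the sketch uses a residual sum of squares.

```python
import numpy as np

def hard_threshold(z, lam):
    """Keep entries with |z_j| > lam, zero out the rest."""
    return np.where(np.abs(z) > lam, z, 0.0)

def bic_select(X, y, lambdas):
    """Choose lambda minimizing BIC(lambda) = n*log(RSS/n) + df*log(n),
    where df counts nonzero coefficients (a stand-in for the paper's
    effective degrees of freedom df_lambda)."""
    n = len(y)
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    best = None
    for lam in lambdas:
        beta = hard_threshold(beta_ols, lam)   # toy penalized fit
        rss = float(np.sum((y - X @ beta) ** 2))
        df = int(np.count_nonzero(beta))
        bic = n * np.log(rss / n) + df * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, lam, beta)
    return best

# Toy data: two active fixed effects among five candidates.
rng = np.random.default_rng(0)
n = 200
beta_true = np.array([2.0, 0.0, 0.0, 1.5, 0.0])
X = rng.standard_normal((n, 5))
y = X @ beta_true + rng.standard_normal(n)
bic, lam, beta = bic_select(X, y, [0.0, 0.05, 0.1, 0.2, 0.3])
# The two large effects survive thresholding; spurious ones are typically zeroed.
```

The `df * log(n)` term plays the role of the dimension penalty in the proof above: any overfitted model pays at least one extra `log(n)`, which dominates the \(O_{p}(1)\) gain in fit.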
Xu, P., Wang, T., Zhu, H. et al. Double Penalized H-Likelihood for Selection of Fixed and Random Effects in Mixed Effects Models. Stat Biosci 7, 108–128 (2015). https://doi.org/10.1007/s12561-013-9105-x