
Addressing cluster-constant covariates in mixed effects models via likelihood-based boosting techniques

  • Colin Griesbach ,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft

    colin.griesbach@fau.de

    Affiliation Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany

  • Andreas Groll,

    Roles Methodology, Writing – review & editing

    Affiliation Faculty of Statistics, TU Dortmund, Dortmund, Germany

  • Elisabeth Bergherr

    Roles Funding acquisition, Methodology, Supervision, Writing – review & editing

    Affiliation Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany

Abstract

Boosting techniques from the field of statistical learning have grown into a popular tool for estimating and selecting predictor effects in various regression models and can roughly be separated into two general approaches, namely gradient boosting and likelihood-based boosting. An extensive framework has been proposed for fitting generalized mixed models based on boosting; however, in the case of cluster-constant covariates, likelihood-based boosting approaches tend to mischoose variables in the selection step, leading to wrong estimates. We propose an improved boosting algorithm for linear mixed models in which the random effects are properly weighted, disentangled from the fixed effects updating scheme and corrected for correlations with cluster-constant covariates, in order to improve the quality of estimates and, in addition, reduce the computational effort. The method outperforms current state-of-the-art approaches from boosting and maximum likelihood inference, which is shown via simulations and various data examples.

1 Introduction

Linear mixed models [1] have proved to be a very popular tool for analysing data with repeated measurements, especially clustered longitudinal data from clinical surveys. Nevertheless, they are applicable to much broader fields and various overviews can be found in [2–4]. Fitting these models can be achieved with a variety of available R packages [5, 6], and classical methods for inference like tests [7] or selection criteria [8, 9] have been developed.

In order to use mixed models for prediction analysis, various approaches to regularized regression like the lasso [10, 11] and boosting techniques [12] have been proposed. Lasso-type approaches can be found in [13] for linear and in [14] for generalized linear mixed models. Boosting in general can be divided into gradient boosting [15, 16] and likelihood-based boosting [17, 18]. Both boosting methods are capable of fitting mixed models; for the latter, an extensive framework has been proposed towards this matter in [19–21] and is included in the R package GMMBoost [22] available on CRAN. Apart from improving prediction analysis, component-wise boosting methods are, due to their iterative and component-wise fitting process, suitable for high dimensional data and implicitly offer variable selection. Good insights into component-wise boosting can be found in [23] for gradient boosting and in [24] for gradient as well as likelihood-based boosting. Please note that when talking about boosting, we always refer to the component-wise variant.

However, the bGLMM algorithm from the GMMBoost package tends to struggle with cluster-constant covariates, e.g. baseline covariates like gender or treatment group in longitudinal studies. The specified selection and updating procedure of the bGLMM algorithm tends to favour cluster-varying covariates, while the simultaneously updated random intercepts partly account for effects actually evolving from cluster-constant covariates. As shown in Fig 1, this malfunction already occurs in a very basic data example with the popular Orthodont dataset, which is available in various R packages. The dataset depicts the evolution of an orthodontal measurement of 27 children and contains two covariates. A basic linear mixed model with random intercepts returns clearly differing coefficient estimates by lmer and by bGLMM for the effect of the cluster-constant covariate gender. The reason for this difference becomes clear when looking at the random intercepts, where bGLMM tends to compensate the missing effect for gender by assigning every female subject a random intercept lowered by 2.32. Although the structure of the Orthodont data set is very simple and does not require boosting, it is evident that the described weak spot of bGLMM is not confined to more complex datasets and thus can occur for any clustered data containing cluster-constant covariates.
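To make the example concrete, the following minimal R sketch (not taken from the paper) fits the baseline random intercept model described above with lmer; the covariate names are those of the Orthodont data as shipped with the nlme package.

```r
# Illustrative sketch only: baseline random intercept fit for Orthodont.
library(nlme)   # provides the Orthodont data
library(lme4)   # provides lmer()

data("Orthodont", package = "nlme")

# distance modelled by the cluster-varying covariate age and the
# cluster-constant covariate Sex, with a random intercept per child
fit <- lmer(distance ~ age + Sex + (1 | Subject), data = Orthodont)

fixef(fit)["SexFemale"]   # fixed effect of gender
head(ranef(fit)$Subject)  # estimated random intercepts
```

The fixed effect of Sex and the cluster-wise random intercepts from this fit are the quantities compared with the corresponding bGLMM output in Fig 1.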

Fig 1. Comparison between random intercept estimates by lmer and bGLMM for Orthodont.

https://doi.org/10.1371/journal.pone.0254178.g001

We therefore propose an updated algorithm with various changes in order to avoid the phenomenon of random intercepts growing too quickly. These changes include the usage of smaller starting values and weaker random-effects updates to prevent the random effects from growing too fast, as well as undocking the random effects update from the fixed effects boosting scheme, which guarantees a fair comparison between the single covariates for the fixed effects. Most importantly, we introduce a correction step for the random effects estimation to avoid possible correlations with observed covariates. The contribution of the present work is therefore a novel and better performing boosting algorithm for mixed models regarding both estimation accuracy and runtime, particularly in the presence of cluster-constant covariates. The algorithm not only solves the described identification issues but in addition constitutes the only regularization approach for mixed models which explicitly accounts for estimation bias arising from possible correlations between random and regularized effects. While existing approaches bypass these issues by excluding affected covariates from the regularization approach, the presented algorithm addresses the phenomenon directly by correcting falsely estimated random effects.

The remainder of the paper is structured as follows: Section 2 formulates the underlying model and the updated boosting algorithm and gives a detailed discussion of the changes. The algorithm is then evaluated and compared to other regularization approaches using an extensive simulation study described in Section 3. As an illustrative data example we have chosen the Primary Biliary Cirrhosis data, which further underlines the strengths and weaknesses of the compared methods and is discussed in Section 4. Finally, Section 5 discusses the results and possible extensions.

2 Methods

We propose a novel and improved boosting algorithm for linear mixed models in the following subsections.

2.1 Model specification

For clusters i = 1, …, n with observations j = 1, …, ni we consider the linear mixed model
yij = β0 + xij^T β + zij^T γi + εij
with covariate vectors xij and zij referring to the fixed and random effects β and γi, respectively. The random components are assumed to follow normal distributions, i.e. εij ∼ N(0, σ²) for the model error and γi ∼ N(0, Q) for the random effects. This leads to the cluster-wise notation
yi = β0 1 + Xi β + Zi γi + εi
with yi = (yi1, …, yini)^T, 1 = (1, …, 1)^T, Xi = (xi1, …, xini)^T and Zi = (zi1, …, zini)^T. Finally, we get the common matrix notation
y = β0 1 + X β + Z γ + ε    (1)
of the full model with observations y = (y1^T, …, yn^T)^T, design matrices X = (X1^T, …, Xn^T)^T and the block-diagonal Z = diag(Z1, …, Zn). The random components ε and γ = (γ1^T, …, γn^T)^T have corresponding covariance matrices σ² IN and diag(Q, …, Q), where IN is the N = ∑ni dimensional unit matrix.

In order to perform likelihood inference, let ϑ = (β0, β^T, γ^T)^T denote the effects and ϕ = (σ², τ) the information of the random components, where τ contains the values of Q. The marginal log-likelihood of the model can be obtained via
l(ϑ, ϕ) = ∑i log ∫ f(yi|ϑ, ϕ) p(γi|ϕ) dγi,
where f(⋅|ϑ, ϕ) and p(⋅|ϕ) denote the normal densities of the model error and the random effects. Laplace approximation following [25] results in the penalized log-likelihood
lpen(ϑ, ϕ) = ∑i log f(yi|ϑ, ϕ) − ½ γ^T diag(Q, …, Q)^(−1) γ,    (2)
which is going to be maximized simultaneously for ϑ and ϕ by the likelihood-based boosting techniques discussed in the following subsection.

2.2 Boosting algorithm

The lbbLMM (likelihood-based boosting for linear mixed models) algorithm iteratively fits the linear mixed model (1) via component-wise likelihood-based boosting. The fitting procedure in general is carried out by Fisher-scoring [26], a variant of Newton’s optimization method [27], which iteratively optimizes a given cost function based on quadratic approximations. It therefore obtains updates based on first order and second order derivatives, which are, in the context of Fisher scoring, represented by score vector and Fisher matrix of the underlying cost function. We first give a brief description of the algorithm and discuss the single steps in more detail in the following subsection.

Algorithm lbbLMM

  • Initialize the estimates with starting values as described in Section 2.3.1. Choose the total number of iterations mstop and the step length ν.
  • for m = 1 to mstop do
  • step1: Update fixed effects
    For r = 1, …, p define ϑr = (β0, βr, γ^T)^T with βr denoting the rth component of β. Compute the score vector sr and Fisher matrix Fr with respect to the current intercept β0 and the rth linear effect βr. Obtain p possible updates ur = Fr^(−1) sr and find the best performing component r* ∈ {1, …, p} minimizing AIC or BIC. This yields the update ur* = (u0, ur*) containing the update ur* for the effect βr* with corresponding intercept update u0. Receive the updated estimates by
    β0^[m] = β0^[m−1] + ν u0,   βr*^[m] = βr*^[m−1] + ν ur*.    (3)
  • step2: Update random effects
    Update the random effects using an additional Fisher scoring step based on the penalized log-likelihood (2) by calculating the corresponding score vector sγ and Fisher matrix Fγ and weakly updating
    γ^[m] = γ^[m−1] + ν C Fγ^(−1) sγ.
    The incorporation of the correction matrix C at this step is crucial and its derivation is discussed further below.
  • step3: Update variance-covariance-components
    Update variance-covariance-components using an approximate EM-algorithm.
  • end for
  • Stop the algorithm at the best performing m* with respect to the specified information criterion. Return the fixed effects, random effects and variance-covariance components at iteration m* as the final estimates. A schematic R sketch of the whole loop is given after this list.
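The following R sketch illustrates the structure of the loop for a pure random-intercept model. It is a strongly simplified illustration under several assumptions of ours, not the authors' implementation: residual-based least-squares updates stand in for the full Fisher scoring, selection is by residual sum of squares instead of AIC/BIC, cluster means of all covariates serve as correction covariates, and the variance-component update and the data-driven stopping rule are omitted.

```r
# Schematic sketch of a component-wise boosting loop with a weak, corrected
# random-intercept update (illustrative simplification, not lbbLMM itself).
lbb_sketch <- function(y, X, id, nu = 0.1, mstop = 100) {
  id    <- factor(id)
  n     <- nlevels(id)                     # number of clusters
  p     <- ncol(X)
  beta0 <- mean(y); beta <- rep(0, p); gamma <- rep(0, n)
  Z     <- model.matrix(~ 0 + id)          # cluster indicator matrix

  # correction covariates: a constant plus cluster means of the columns of X
  Xc <- cbind(1, apply(X, 2, function(x) tapply(x, id, mean)))
  C  <- diag(n) - Xc %*% solve(crossprod(Xc), t(Xc))

  for (m in seq_len(mstop)) {
    res <- as.vector(y - beta0 - X %*% beta - Z %*% gamma)

    # step 1: one least-squares refit of the residuals per candidate covariate,
    #         keep the component with the smallest residual sum of squares
    rss <- numeric(p); upd <- matrix(0, 2, p)
    for (r in seq_len(p)) {
      Xr       <- cbind(1, X[, r])
      upd[, r] <- solve(crossprod(Xr), crossprod(Xr, res))
      rss[r]   <- sum((res - Xr %*% upd[, r])^2)
    }
    r_star       <- which.min(rss)
    beta0        <- beta0 + nu * upd[1, r_star]
    beta[r_star] <- beta[r_star] + nu * upd[2, r_star]

    # step 2: weak, corrected random-intercept update via cluster-wise
    #         residual means, projected off the correction covariates
    res   <- as.vector(y - beta0 - X %*% beta - Z %*% gamma)
    gamma <- gamma + nu * as.vector(C %*% tapply(res, id, mean))
  }
  list(beta0 = beta0, beta = beta, gamma = gamma)
}
```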

2.3 Computational details of the algorithm

We give a stepwise description of the computational details of the lbbLMM algorithm. For simplicity, we omit iteration indices and hats indicating estimated values whenever appropriate.

2.3.1 Starting values.

The parameters actually underlying the selection process are necessarily set to zero, i.e. the fixed effects start at β^[0] = 0. The initial intercept and model error are set to data-based starting values, and the random effects are initialized at zero with a small covariance matrix. An alternative approach, which is also proposed in [19], would be to fit a standard linear mixed model for intercept and random effects, e.g. using the function lmer from the R package lme4, and to extract the starting values from this model fit.

2.3.2 Fixed effects boosting process.

The computation of the rth update is straightforward by calculating
ur = Fr^(−1) sr = (Xr^T Xr)^(−1) Xr^T (y − η),
where Xr is a N × 2 matrix containing a column of ones and the rth column of X associated with the rth covariate, and η denotes the current fit. This leads to p possible parameter vectors ϑr, where only the intercept and the rth component received an update according to ur. The best performing component is the one leading to minimal AICr or BICr [28, 29], given by
AICr = −2 l(ϑr, ϕ) + 2 df,   BICr = −2 l(ϑr, ϕ) + log(N) df.
Here, df = #ϕ + #{i ≤ p: βi ≠ 0} denotes the model complexity according to the marginal likelihood, where #ϕ is the total number of variance-covariance parameters in ϕ.
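As a small illustration of the selection criterion, the following hedged sketch computes one candidate update and its BIC using a conditional Gaussian log-likelihood given the current fit; the helper name and its simplifications (no marginal likelihood, no explicit random-effects contribution) are ours, not the authors'.

```r
# Sketch: candidate update for the r-th covariate and its BIC
# (illustrative simplification of the criterion described above).
candidate_bic <- function(y, X, r, eta, sigma2, n_varcomp, n_nonzero) {
  N   <- length(y)
  Xr  <- cbind(1, X[, r])                              # intercept + r-th column
  u_r <- solve(crossprod(Xr), crossprod(Xr, y - eta))  # scoring step on residuals
  eta_new <- as.vector(eta + Xr %*% u_r)
  loglik  <- sum(dnorm(y, mean = eta_new, sd = sqrt(sigma2), log = TRUE))
  df      <- n_varcomp + n_nonzero                     # complexity as in the text
  list(update = as.vector(u_r), BIC = -2 * loglik + log(N) * df)
}
```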

2.3.3 Random effects update.

A weak and corrected update for the random effects is obtained by calculating the score vector sγ and Fisher matrix Fγ of the penalized log-likelihood (2) with respect to γ and computing
γ^[m] = γ^[m−1] + ν C Fγ^(−1) sγ.
Note that this differs from the approach in [19], as the random effects are updated separately and in addition also receive an update scaled by the step length ν. The weak update ensures that the random effects don't grow too quickly compared to the fixed effects. The disentanglement of the random effects update from the fixed effects updating scheme, on the other hand, guarantees a fair comparison of the single fixed effects, in which the random effects do not play a crucial role. In addition, the Fisher matrix Fγ has block-diagonal form, making the inversion much easier and thus strongly reducing the computational effort.

2.3.4 Deriving the correction matrix C.

The single random intercepts or random slopes are corrected independently of each other using distinct sets of covariates. For the correction of the sth random effect consider the matrix of correction covariates Xcs with n rows and ps columns, where ps denotes the total number of correction covariates used for the sth random effect. Note that Xcs has n rows as it contains only one representative observation from each cluster. The correction matrix contains all cluster-constant covariates for random intercepts and just a column of ones for random slopes, which corresponds to centering the given random slope. The correction matrix Cs for the sth random effect is obtained by
Cs = In − Xcs (Xcs^T Xcs)^(−1) Xcs^T,
so that multiplying Cs with the vector of sth random effect estimates corrects the sth random effect for any covariates contained in the corresponding matrix Xcs by removing the orthogonal projection of those estimates onto the subspace generated by the covariates Xcs. This ensures that the coefficient estimates for the random effects are uncorrelated with any observed covariate. These separate corrections are summarised in one single correction matrix by defining the block-diagonal matrix C̃ = diag(C1, …, Cq), with q denoting the number of random effects per cluster, and computing C = P^T C̃ P, where P is a permutation matrix mapping γ to the vector ordered by random effect rather than by cluster. The product Cγ then corrects every random effect simultaneously. This concept also proved useful for an improved estimation of mixed models via model-based gradient boosting [30].
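In code, the correction for a single random effect amounts to one projection. The following sketch (function and argument names are illustrative, not taken from GMMBoost or the lbbLMM implementation) projects a vector of estimated random effects onto the orthogonal complement of its correction covariates.

```r
# Sketch: correct estimated random effects for a set of correction covariates.
# gamma_s: length-n vector (one value per cluster) of the s-th random effect
# Xc_s:    n x p_s matrix of correction covariates (one row per cluster)
correct_random_effect <- function(gamma_s, Xc_s) {
  Cs <- diag(nrow(Xc_s)) - Xc_s %*% solve(crossprod(Xc_s), t(Xc_s))
  as.vector(Cs %*% gamma_s)   # projection residual, uncorrelated with Xc_s
}

# random slopes: Xc_s is just a column of ones, i.e. the correction centers them
center_random_slope <- function(gamma_s)
  correct_random_effect(gamma_s, matrix(1, length(gamma_s), 1))
```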

2.3.5 Updating variance-covariance-components.

The covariance matrix Q of the random effects is updated with an approximate EM-algorithm using the posterior curvatures Fi of the random effects model [31]. An update is received in closed form from the posterior curvatures and the current random effects estimates, and each iteration's model error σ² is obtained from the current residuals.
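As an illustration, a sketch of the standard posterior-mode EM-type update from [31] is given below; whether lbbLMM uses exactly this form cannot be read off the text above, so the formula should be treated as an assumption.

```r
# Sketch (assumed form): approximate EM update of the random-effects covariance
# from cluster-wise posterior modes gamma_i and posterior curvatures F_i.
update_Q <- function(gamma_list, F_list) {
  n <- length(gamma_list)
  terms <- Map(function(g, Fi) solve(Fi) + tcrossprod(g), gamma_list, F_list)
  Reduce(`+`, terms) / n
}
```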

2.3.6 Choice of steplength.

The steplength 0 < ν ≤ 1 controls the weakness of each update and is essential in order to avoid overfitting and to give each candidate variable an equal opportunity to be selected. We stick to the choice of ν = 0.1 for both fixed and random effects updates, which is well established in the boosting community and thus allows for a fair comparison. This ensures that neither of the coefficient estimates grows too quickly.

2.3.7 Stopping iteration.

The algorithm is stopped based on AIC or BIC, i.e.
m* = argmin_m AIC^[m]   or   m* = argmin_m BIC^[m],
where AIC^[m] and BIC^[m] denote the information criteria after m iterations. An alternative and computationally more burdensome stopping rule would rely on cluster-wise cross-validation [32], which is however asymptotically equivalent to the marginal AIC as used above [33].

3 Simulation study

The algorithm is evaluated with a simulation study. The single simulation scenarios are described in the first subsection, while the results are discussed in the latter two. The primary focus is to show that the algorithm solves the identification problem of the random effects; it is therefore compared to the bGLMM function of the GMMBoost package available on CRAN. Furthermore, its performance is compared to the classical method implemented in the lmer function of the lme4 package as well as the glmmLasso function of the same-named package, which is another popular approach to regularized regression with potentially high numbers of candidate variables. Please note that we did not include mboost in the comparison, as its approach to random effects is not able to estimate variance components for random intercepts, let alone covariance matrices for multiple slopes. bGLMM was also compared to the glmmPQL function [4] in [19].

The comparison focuses on mean squared errors of the estimates for fixed effects and the random structure as an indicator for overall performance and to address the identification problem. Variable selection properties are evaluated via true and false positive rates as well as false discovery rates. As a side note, we compare the computational effort.

3.1 Setups

The first setups’ random structure consists of random intercepts only. Overall, the setup includes four informative covariates and in addition varying numbers of non-informatives. For i = 1, …, 50 and j = 1, …, 5 we consider the random intercepts setup (4) with values β0 = 1, β1 = 2, β2 = 4, β3 = 3, β4 = 5 and βr = 0, r > 5 for the fixed effects, for the cluster-constant and cluster-varying covariates and and for the random components with σ = 0.4 and τ ∈ {0.4, 0.8, 1.6}. The total amount of covariates is evaluated for the six different cases p ∈ {10, 25, 50, 100, 250, 500} ranging from low to high dimensional setups.

The second setup is a slightly altered scenario with two additional random slopes, one for a cluster-constant and one for a cluster-varying covariate, i.e. model (5), where τ ∈ {0.4, 0.8, 1.6} and the covariance parameter τ* is chosen so that cor(γki, γli) = 0.6 holds for all k, l = 1, 2, 3.

For β = (β0, …, βp)^T we consider mean squared errors mseβ, mseτ, mseσ and mseQ as indicators for estimation accuracy, with ‖⋅‖F denoting the Frobenius norm of a given matrix used for the error of Q. Variable selection properties are evaluated by calculating false positives (FP), true positives (TP) and false discovery rates (FDR), where Psel and Ptot denote the numbers of selected informative and of total informative candidate variables, with the corresponding quantities defined analogously for non-informative covariates. Finally, the elapsed time is measured in seconds, where each simulation run was carried out on a 2 × 2.66 GHz 6-Core Intel Xeon CPU (64GB RAM).
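The exact formulas for the selection measures are not reproduced above; the small helper below (hypothetical, using the standard definitions of these rates) shows how TP, FP and FDR can be computed from an estimated coefficient vector.

```r
# Sketch: standard definitions of TP, FP and FDR given the indices of the
# truly informative covariates (helper name and details are illustrative).
selection_measures <- function(beta_hat, informative) {
  selected <- which(beta_hat != 0)
  p_sel <- sum(selected %in% informative)        # selected informative
  n_sel <- sum(!(selected %in% informative))     # selected non-informative
  c(TP  = p_sel / length(informative),
    FP  = n_sel / (length(beta_hat) - length(informative)),
    FDR = if (length(selected) > 0) n_sel / length(selected) else 0)
}

# e.g. selection_measures(beta_hat = c(1.8, 3.7, 0, 4.6, 0, 0.2), informative = 1:4)
```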

Every single simulation setup was independently executed 100 times and, in order to account for skewness of the mean squared error distributions, median values are reported for estimation accuracy and average values for variable selection properties and computation time. The bGLMM boosting algorithm was initialized with mstop = 500, while lbbLMM was iterated up to mstop = 1500. To determine the optimal penalization parameter for glmmLasso, the grid {500, 495, 490, …, 0} was used. All of the included regularization approaches were tuned using the BIC.

3.2 Results: Random intercepts

3.2.1 Estimation accuracy.

Table 1 summarizes the results for estimation accuracy. In general, the lbbLMM algorithm produces very precise estimates, while the bGLMM function suffers from the described identification problem, yielding a minimum mean squared error of 2² + 4² = 20 as the cluster-constant covariates are not being selected. All methods get less precise as the values of τ and p increase; only lbbLMM has stable error rates regarding the number of candidate variables p. Overall, lbbLMM outperforms its competitors in every single scenario. Estimation accuracy of the random structure is described in Table 2. Estimates by lbbLMM behave similarly well as those by lmer, while the identification problem in bGLMM results in high error rates. While lying in the same range as lmer, lbbLMM clearly outperforms the remaining regularization approaches.

Table 1. Median mseβ of 100 independent simulation runs for each random intercepts setup with corresponding interquartile range.

https://doi.org/10.1371/journal.pone.0254178.t001

Table 2. Median mseτ and mseσ of 100 independent simulation runs for each random intercepts setup.

https://doi.org/10.1371/journal.pone.0254178.t002

3.2.2 Variable selection.

Table 3 depicts the variable selection properties. While the selection quality of glmmLasso improves as p increases, both boosting approaches yield perfect properties with respect to false positives. However, the identification problem of bGLMM leads to a low true positives rate, as informative effects of cluster-constant covariates are being captured by the random intercepts. lbbLMM, on the other hand, has perfect selection properties with respect to both true and false positives.

Table 3. Variable selection properties averaged over 300 runs for each p-dimensional random intercepts setup.

True positives (TP), false positives (FP) and false discovery rate (FDR). Since no noticeable variability regarding the choice of τ occurred, results are summarized.

https://doi.org/10.1371/journal.pone.0254178.t003

3.3 Results: Random slopes

3.3.1 Estimation accuracy.

Table 4 summarizes the results for estimation accuracy. Except for the generally increased error rates, the behaviour is similar to the random intercepts setup (4). lbbLMM again performs stably as p increases and clearly outperforms the other regularization approaches. However, lmer has slightly better error rates in low-dimensional scenarios with higher values for τ, which can also be seen in Table 5.

Table 4. Median mseβ of 100 independent simulation runs for each random slopes setup with corresponding interquartile range.

https://doi.org/10.1371/journal.pone.0254178.t004

Table 5. Median mseQ and mseσ of 100 independent simulation runs for each random slopes setup.

https://doi.org/10.1371/journal.pone.0254178.t005

3.3.2 Variable selection.

Table 6 depicts variable selection properties for the random slopes setup. Results are almost identical to the random intercepts setup described in Table 3.

Table 6. Variable selection properties averaged over 300 runs for each p-dimensional random slopes setup.

True positives (TP), false positives (FP) and false discovery rate (FDR). Since no noticeable variability regarding the choice of τ occurred, results are summarized.

https://doi.org/10.1371/journal.pone.0254178.t006

The presence of the identification problem in bGLMM is reflected in high errors for fixed and random effects and low true positives rates. Based on its good values for mseβ, mseτ, mseQ and TP it can be stated that lbbLMM not only solves the problems occurring in bGLMM but also offers a reliable and well performing regularization approach for linear mixed models in general.

3.4 Computation time

Table 7 depicts average computation times for the random intercepts (4) and random slopes (5) setup. All regularization approaches roughly show a linear scaling with an increasing number of candidate variables p. In most cases, glmmLasso runs noticeably faster than its two boosting competitors. However, a direct comparison is hard to interpret, as the computation time of glmmLasso strongly depends on the fineness of the grid used to determine the optimal penalization parameter. In addition, the flawed updating process of bGLMM leads to substantially faster convergence: due to its identification issue, the algorithm is capable of fitting multiple effects in one single iteration, which is also the reason why bGLMM runs faster in the random slopes setup.

Table 7. Averages of elapsed computation time of 100 independent simulation runs for each random intercepts (tint) and slopes (tslp) setup.

https://doi.org/10.1371/journal.pone.0254178.t007

However, the computational effort of bGLMM is very sensitive to increasing numbers of total observations N. Table 8 and Fig 2 depict averaged computation times for the random intercepts scenario (4) with τ = 0.4 and p = 10 fixed, but varying values ni ∈ {5, 10, 15, 20}, i.e. the number of observations per cluster. While glmmLasso and lbbLMM show a linear relationship between N and elapsed computation time, the computation time of bGLMM increases exponentially, making the method less applicable even to data sets with few candidate variables when the total number of observations is large.

Fig 2. Differing computational effort of the regularization routines bGLMM, lbbLMM and glmmLasso for varying cluster sizes.

https://doi.org/10.1371/journal.pone.0254178.g002

Table 8. Averages of elapsed computation time of 100 independent simulation runs regarding varying values for ni with τ = 0.4 and p = 10 fixed.

https://doi.org/10.1371/journal.pone.0254178.t008

4 Primary biliary cirrhosis

The primary biliary cirrhosis (PBC) dataset from 1994 [34] tracks the change of the serum bilirubin level for a total of 312 PBC patients randomized into a treatment and a placebo group and additionally contains baseline covariates as well as follow-up measurements of several biomarkers. The dataset is, among others, available in the JM package [35], and Table 9 gives an overview of the single covariates included in the data and how they are coded in the model formula. The serum bilirubin level, here modelled as the response variable, is considered a strong indicator for disease progression; hence an appropriate quantification of the impact of the given covariates on the serum bilirubin level will lead to an adequate prediction model for the health status of PBC patients. Using boosting to carry out this quantification additionally optimizes the prediction properties. For yij denoting the jth measurement of serum bilirubin for the ith patient, we formulate the random intercept model (6), which includes a squared time effect, since the effect of time might be nonlinear. Both boosting approaches were initialized with mstop = 500 and the grid {0, 0.1, 0.2, …, 30} was chosen for glmmLasso. Based on the BIC, bGLMM determined m* = 389 and lbbLMM m* = 161 as the best performing number of iterations, and the optimal tuning parameter for glmmLasso was λ* = 12.4. The coefficient estimates are compared to an unregularized model (lmer) displayed with corresponding p-values in Table 10, and the well-known coefficient paths for lbbLMM are depicted in Fig 3.
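For orientation, a hedged sketch of the unregularized reference fit is given below; it assumes the pbc2 data from the JM package with its usual column names (serBilir, year, drug, sex, age, …), which may not match the exact covariate coding of Table 9.

```r
# Sketch of the unregularized reference model for the PBC data (column names
# follow JM's pbc2 and are assumptions with respect to the coding in Table 9).
library(JM)     # provides the pbc2 data
library(lme4)

fit_pbc <- lmer(
  serBilir ~ year + I(year^2) + drug + sex + age + ascites + hepatomegaly +
    spiders + edema + albumin + alkaline + SGOT + platelets + prothrombin +
    (1 | id),
  data = pbc2
)
summary(fit_pbc)
```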

Fig 3. Coefficient progression for the PBC data obtained by lbbLMM with m* = 161.

https://doi.org/10.1371/journal.pone.0254178.g003

Table 9. Variables of the PBC data set.

drug and sex are dummies for treatment group and female gender. Ascites is the abnormal buildup of fluid in the abdomen and spiders are blood vessel malformations in the skin. SGOT is short for serum glutamic oxaloacetic transaminase.

https://doi.org/10.1371/journal.pone.0254178.t009

Table 10. Variable selection and shrinkage of various regularization approaches compared to lmer.

https://doi.org/10.1371/journal.pone.0254178.t010

In general, the results reflect what was already observed in the simulation study: glmmLasso struggles with proper variable selection in lower dimensional scenarios and bGLMM does not select any cluster-constant covariates due to the described misspecification. Although the effect of drug has a comparatively high p-value, the coefficient estimate by bGLMM stands out among all regularization approaches while the value for τ is simultaneously rather large, which indicates possible bias arising from wrongly identified random intercepts. The remaining variables selected by lbbLMM, on the other hand, tend to be of high impact depending on the chosen significance level. For lower choices, e.g. α ∈ {0.01, 0.005} [36], the selection process of lbbLMM matches the covariates having significant impact while at the same time applying shrinkage via the regularization mechanism. Regarding computational effort, bGLMM runs tremendously long with approximately 17 hours (60543 seconds) and needs around 600 times more computation time than its direct competitor lbbLMM.

5 Discussion

Due to its minor and major tweaks, the updated algorithm is capable of dealing with cluster-constant covariates in linear mixed models by preventing the random effects from taking up effects that actually belong to the fixed effects. In addition, it preserves the well-known advantages of boosting techniques in general by offering variable selection and good functionality even in high dimensional setups. As a very important side effect, the computational effort is reduced tremendously, making the algorithm more applicable to real world scenarios.

The primary hindrance of the lbbLMM algorithm is a missing approach for model choice, as the random effects structure has to be specified in advance and is not subject to any selection process. Although reasonable options regarding the random structure are limited in most real world applications and could also be evaluated afterwards using appropriate information criteria, it remains an interesting question how one could incorporate proper model selection into the updating process while simultaneously preserving the advantages gained by the lbbLMM algorithm.

Canonical extensions of the successful concept include incorporating non-linear predictor functions, i.e. the estimation of smooth effects based on P-splines, or extending the algorithm from linear mixed models to generalized mixed models in order to allow more flexible inference for a wider class of data structures. Both have been incorporated in [37] for classical likelihood-based boosting, and it can be assumed that the tweaks proposed in the present work would improve the performance of these more flexible approaches as well.

Acknowledgments

Colin Griesbach performed the present work in partial fulfilment of the requirements for obtaining the degree ‘Dr. rer. biol. hum.’ at the Friedrich-Alexander-Universität Erlangen-Nürnberg.

References

  1. Laird NM, Ware JH. Random-Effects Models for Longitudinal Data. Biometrics. 1982;38(4):963–974. pmid:7168798
  2. Anderssen R, Bloomfield P. A Time Series Approach To Numerical Differentiation. Technometrics. 1974;16:69–75.
  3. Wahba G. A Comparison of GCV and GML for Choosing the Smoothing Parameter in the Generalized Spline Smoothing Problem. Annals of Statistics. 1985; p. 1378–1402.
  4. Wood SN. Generalized Additive Models: An Introduction with R. 2nd ed. Chapman and Hall/CRC; 2017.
  5. Bates D, Mächler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software. 2015;67(1):1–48.
  6. Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team. nlme: Linear and Nonlinear Mixed Effects Models; 2020. Available from: https://CRAN.R-project.org/package=nlme.
  7. Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2004;66(1):165–185.
  8. Vaida F, Blanchard S. Conditional Akaike information for mixed-effects models. Biometrika. 2005;92(2):351–370.
  9. Greven S, Kneib T. On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika. 2010;97(4):773–789.
  10. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software. 2010;33(1):1–22. pmid:20808728
  11. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological). 1996;58(1):267–288.
  12. Freund Y, Schapire RE. Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning. San Francisco: Morgan Kaufmann; 1996. p. 148–156.
  13. Schelldorfer J, Bühlmann P, van de Geer S. Estimation for High-Dimensional Linear Mixed-Effects Models Using l1-Penalization. Scandinavian Journal of Statistics. 2011;38(2):197–214.
  14. Groll A, Tutz G. Variable selection for generalized linear mixed models by L1-penalized estimation. Statistics and Computing. 2014;24(2):137–154.
  15. Breiman L. Arcing classifiers (with discussion). Annals of Statistics. 1998;26:801–849.
  16. Breiman L. Prediction games and arcing algorithms. Neural Computation. 1999;11:1493–1517. pmid:10490934
  17. Tutz G, Binder H. Generalized Additive Models with Implicit Variable Selection by Likelihood-Based Boosting. Biometrics. 2006;62(4):961–971. pmid:17156269
  18. Tutz G, Reithinger F. A boosting approach to flexible semiparametric mixed models. Statistics in Medicine. 2007;26(14):2872–2900. pmid:17133647
  19. Tutz G, Groll A. Generalized Linear Mixed Models Based on Boosting. In: Kneib T, Tutz G, editors. Statistical Modelling and Regression Structures. Berlin Heidelberg: Springer-Verlag; 2010. p. 197–216.
  20. Groll A, Tutz G. Variable selection for generalized additive mixed models by likelihood-based boosting. Methods of Information in Medicine. 2012;51(2):168–177.
  21. Tutz G, Groll A. Likelihood-based boosting in binary and ordinal random effects models. Journal of Computational and Graphical Statistics. 2013;22(2):356–378.
  22. Groll A. GMMBoost: Likelihood-based boosting approaches to generalized mixed models; 2013. Available from: https://cran.r-project.org/package=GMMBoost.
  23. Bühlmann P, Hothorn T. Boosting algorithms: Regularization, prediction and model fitting. Statistical Science. 2007;22(4):477–505.
  24. Mayr A, Binder H, Gefeller O, Schmid M. The Evolution of Boosting Algorithms—From Machine Learning to Statistical Modelling. Methods of Information in Medicine. 2014;53(6):419–427. pmid:25112367
  25. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
  26. Longford NT. A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects. Biometrika. 1987;74(4):817–827.
  27. Nocedal J, Wright SJ. Numerical Optimization. 2nd ed. New York, NY, USA: Springer; 2006.
  28. Akaike H. Information theory and an extension of the maximum likelihood principle. Second International Symposium on Information Theory. 1973; p. 267–281.
  29. Schwarz G. Estimating the dimension of a model. Annals of Statistics. 1978;6:461–464.
  30. Griesbach C, Säfken B, Waldmann E. Gradient boosting for linear mixed models. The International Journal of Biostatistics. 2020; (forthcoming).
  31. Fahrmeir L, Tutz G. Multivariate Statistical Modelling Based on Generalized Linear Models. 2nd ed. New York: Springer-Verlag; 2001.
  32. Müller S, Scealy JL, Welsh AH. Model Selection in Linear Mixed Models. Statistical Science. 2013;28(2):135–167.
  33. Fang Y. Asymptotic Equivalence between Cross-Validations and Akaike Information Criteria in Mixed-Effects Models. Journal of Data Science. 2011;9:15–21.
  34. Murtaugh P, Dickson E, Van Dam G, Malinchoc M, Grambsch P, Langworthy A, et al. Primary biliary cirrhosis: Prediction of short-term survival based on repeated patient visits. Hepatology. 1994;20(1):126–134. pmid:8020881
  35. Rizopoulos D. JM: An R Package for the Joint Modelling of Longitudinal and Time-to-Event Data. Journal of Statistical Software. 2010;35(9):1–33.
  36. Benjamin DJ, Berger JO, Johannesson M, Nosek BA, Wagenmakers EJ, Berk R, et al. Redefine statistical significance. Nature Human Behaviour. 2017;2(1):6–10.
  37. Groll A. Variable Selection by Regularization Methods for Generalized Mixed Models [PhD thesis]. Ludwig-Maximilians-Universität München. Munich, Germany; 2011.