Restricted likelihood ratio testing in linear mixed models with general error covariance structure

We consider the problem of testing for zero variance components in linear mixed models with correlated or heteroscedastic errors. In the case of independent and identically distributed errors, a valid test exists, which is based on the exact finite sample distribution of the restricted likelihood ratio test statistic under the null hypothesis. We propose to make use of a transformation to derive the (approximate) test distribution for the restricted likelihood ratio test statistic in the case of a general error covariance structure. The proposed test proves its value in simulations and is finally applied to an interesting question in the field of well-being economics.


Introduction
We are interested in testing for zero variance components in the context of Linear Mixed Models (LMMs). LMMs constitute a flexible class of statistical models that can be applied to a wide range of problems. These models are suitable to analyze, for example, longitudinal or clustered data [e.g. 20] as well as spatial data [e.g . 8]. Moreover, LMMs are established as a useful framework for nonparametric models [e.g . 15].
Within the LMM framework it is possible to test whether individuals or clusters differ from other observation units in a way that cannot be predicted by the considered covariates, or whether an estimated smooth function is different from a polynomial of a given degree. Such test problems are equivalent to testing whether the variance of the corresponding random effect is different from zero. This is a nonstandard test situation because under the null hypothesis the tested parameter is on the boundary of the parameter space. A valid Re-stricted Likelihood Ratio Test (RLRT) for this situation has been established by Crainiceanu and Ruppert [3], who derived the exact finite sample distribution of the RLRT statistic in the case of an LMM with only one random effect's variance component. For LMMs with more than one additional variance components, there are good approximations of the test distribution based on the exact RLRT distribution [6,17].
The established test procedure has been derived based on the assumption that the errors are independent and identically distributed (i.i.d.) conditional on the random effects. In many practical settings the i.i.d. assumption is not fulfilled. For example, in a panel data set from a social survey, the observations of each respondent are likely to be autocorrelated conditional on the individual random effect, because observations closer in time are likely to be more similar.
The data situation we have in mind is motivated by the micro-econometric analysis in Wunder et al. [23]. Here, unbalanced panel data on subjective wellbeing were analyzed with a semiparametric mixed model containing a smooth function in age and an individual random intercept. The data used are from the German Socio-Economic Panel Study (SOEP), which are available as scientific use files at http://www.diw.de/en/soep [21]. The question of interest was whether the smooth function is different from a quadratic or cubic polynomial as is typically used in well-being research. In the LMM framework, this question can be answered in testing whether the corresponding random effect's variance is different from zero. Further analyses of the data set indicated that for each individual the errors are correlated over time.
Therefore, we look for a generalization of the established test allowing for more general covariance structures of the errors. Since the exact RLRT null distribution cannot simply be extended to correlated errors and a bootstrapbased test is computationally too demanding (given the large SOEP data set of more than 250 000 observations), we follow another approach: We use the RLRT statistic and propose to use a transformation to approximate the test distribution employing results on the exact distribution in the case of i.i.d. errors.
The paper is organized as follows: In Section 2 we present the methodological framework and the notation used in this paper. First, we introduce the considered class of models and discuss the problem of testing for zero variance components in the case of i.i.d. errors. After this, we present the idea of the transformation we adapt to take the correlation structure of the errors into account, before we propose the test procedure for the case of errors with general covariance structure. The accuracy of the proposed test is then investigated in a simulation study described in Section 3. Moreover, we apply the suggested test to the motivating question regarding the profile of subjective well-being over age in Section 4. Finally, Section 5 gives a short summary and discussion of our findings.

The linear mixed model
We consider LMMs of the following form: Here, X and Z 1 , . . . , Z S are design matrices with full ranks p and K 1 , . . . , K S , respectively. The vector β contains fixed covariate effects, whereas b 1 , . . . , b S are independent random effect vectors. The correlation matrices associated with the random effects vectors, Σ 1 , . . . , Σ S , are assumed known. The matrix R describes an error covariance structure. It is often assumed that the errors are uncorrelated, i.e. R = I n , where I n is the n-dimensional identity matrix and n is the total number of observations. Since there are many situations in practice in which this assumption is not appropriate, a more general class of models is considered here, allowing any covariance structure for the errors to be specified in R.
There are two important special cases of LMMs that are often used in applications: the simple random intercept model and mixed model smoothing. In both special cases, the model contains only one random effect vector b of length The random intercept model can be used to model longitudinal or clustered data where it is assumed that the individual or cluster means deviate randomly from the overall mean [e.g. 20], To express a penalized splines model as an LMM, the basis coefficients of the nonparametric model are split into two groups: those coefficients which are shrunk by the penalization imposed during the estimation and the unpenalized ones. The penalization of the one group of coefficients can equivalently be expressed by assuming a normal distribution with mean zero for those coefficients. Therefore, the penalized splines model can be formulated as an LMM where the random effects vector b is formed by the penalized basis coefficients. These coefficients represent deviations of the modeled function from a global polynomial of degree d.
For example, consider a penalized splines model with truncated power functions as spline basis functions where κ 1 < · · · < κ K are K knots and (u) d + = (max{0, u}) d . To impose smoothness of the estimated curve, high values of the coefficients b 1 , . . . , b K are penalized. Then, the estimator of θ = (β 0 , . . . , β d , b 1 , . . . , b K ) is given by: where B is the design matrix of the spline basis, λ is the inverse of the so-called smoothing parameter, which controls the amount of shrinkage, and This penalized splines model can alternatively be formulated as an LMM, where and the smoothing parameter is given by 1 . Estimating a smooth function within the LMM framework has the advantages of higher computational efficiency and of an integrated, automated smoothing parameter selection. Detailed information about penalized splines and mixed model smoothing can be found, for example, in Eilers and Marx [5] and Ruppert, Wand and Carroll [15].

Testing for zero variance components in LMMs
We are interested in testing whether a random effect is different from zero. In the case of a random intercept model for longitudinal data, this means determining whether there are significant individual-specific deviations from the population mean. In the penalized splines context, the tested null hypothesis (H 0 ) is that the smooth function of interest can adequately be described by a polynomial of a given degree d. Since the random effects have expectation zero, the considered testing situation is equivalent to testing if the variance of that random effect is different from zero. Hence, the test has the structure: The considered testing problem is not regular because first, under the null hypothesis the tested parameter lies on the boundary of the parameter space and second, the data are not i.i.d. because of the particular covariance structure of grouped data or due to correlations induced by penalized splines. Both characteristics violate the typical regularity conditions for likelihood ratio testing, therefore, the asymptotic distribution of the Likelihood Ratio Test statistic (LRT) is not the standard χ 2 distribution.
In the case of longitudinal data, however, the vector of observations y can be subdivided into independent subvectors. Considering an increasing number of subvectors, Stram and Lee [19] derived an asymptotic distribution of the LRT, applying results of Self and Liang [18]. This asymptotic distribution is an equal mixture of two χ 2 distributions. As shown by Morrell [9], their results also apply to the RLRT statistic, which is the test statistic based on Restricted Maximum Likelihood (REML) estimation of the variance components.
If we consider LMMs of a more general form like, for example, a penalized splines model, such a subdivision of the data is not possible. Therefore, the asymptotic χ 2 mixture distribution does not apply to this case. Crainiceanu and Ruppert [3] found the exact finite sample distribution of the LRT and of the RLRT in the case of an LMM with only one random effect variance component and i.i.d. errors. They also derived asymptotic distributions of both statistics in this situation. The distribution of the LRT under the null hypothesis can have a very high mass on zero; thus, testing on the basis of the LRT may be impracticable. That is why we will focus on the RLRT in the remainder of the paper.
Consider a test of the given structure for an LMM with only one random effect (vector) and i.i.d. errors. In this case, the exact RLRT distribution under H 0 can be expressed (with λ = σ 2 b/σ 2 ǫ ) as [3]: where N n (λ) = K l=1 λµ l,n 1 + λµ l,n ω 2 l , D n (λ) = s and ω l iid ∼ N(0, 1), l = 1, . . . , n − p. This distribution only depends on the design matrices of the fixed and random effects, X and Z s , and on the correlation structure within the random effects vector, Σ s .
Critical values or p-values of the distribution of RLRT n can be determined efficiently by simulation, which is implemented in the RLRsim package [16] for the statistical software environment R [13]. In case of nuisance variance components, the above RLRT distribution is a well-performing approximation of the exact test distribution [6,17].
The previously mentioned results are based on the assumption R = I n . We are interested in a more general setting where the covariance matrix of the errors can be of any possible structure, including heteroscedastic and correlated errors. Since the existing test cannot directly be extended to a general R, we suggest to follow the idea of the Generalized Least Squares (GLS) transformation to derive the null distribution of the RLRT in this situation.

Regression with correlated errors
The errors in a simple linear regression model are usually assumed to be i.i.d. with expectation zero and fixed variance σ 2 ǫ . In this situation, the Best Linear Unbiased Estimator (BLUE) of β is the Ordinary Least Squares (OLS) estimator, β OLS = (X ′ X) −1 X ′ y, which is equivalent to the ML estimator in the case of normality.
If the i.i.d. assumption is not fulfilled, the OLS estimator is still unbiased but no longer efficient. In case the covariance matrix of the errors Cov(ǫ) = V is known, one can obtain the BLUE as the GLS estimator: which is again the ML estimator, under normality. The GLS estimate of β can also be obtained as the OLS estimate of a transformed model, where the initial In the transformed model, the error covariance matrix is given by Cov( ǫ) = I n . Furthermore, if the covariance matrix of the errors is specified by V = σ 2 ǫ R, assuming the covariance structure is known up to a constant, it is possible to base the so-called GLS transformation only on the matrix R. Then, the variance of the transformed i.i.d. errors is σ 2 ǫ instead of one.
If the covariance structure is unknown, so-called Feasible GLS (FGLS) estimators can be used. Many of those are obtained from two-step estimation methods where the parameters of the covariance matrix are estimated in a first step. Then, the data are transformed by the inverse root of the estimated covariance matrix, before OLS estimation is finally applied to the modified data set. FGLS estimators are consistent and asymptotically normal, provided certain regularity conditions are fulfilled [10, 22, ch. 12]. Furthermore, comparative simulation studies indicate that consistent FGLS estimators are more efficient than the OLS estimator, in the situation of linear regressions with correlated errors as well as in the case of simple variance component models [14,1].
In the context of LMMs, parameter estimates are efficiently obtained via (RE)ML estimation. Thus, we are not interested in (F)GLS estimation here, but we adopt the idea of (F)GLS estimation to construct a test for zero variance components of an LMM in the case of a general error covariance structure.

Known covariance structure of the errors
We use the idea of the GLS transformation to derive the RLRT distribution in the case of a general covariance structure of the errors, i.e. V = σ 2 ǫ R for some R.
Consider first the LMM (1) with one random effect, S = 1, σ 2 S = σ 2 b , and assume that R and Σ are known. Model (1) then reduces to We can show that the RLRT for testing (2) in (4) is equivalent to the RLRT for hypothesis (2) in the GLS-transformed model where Lemma 1. The RLRT for hypothesis (2) in (4) is equivalent to the RLRT for hypothesis (2) in the GLS-transformed model (5).
A proof is given in the appendix. As model (5) has independent and identically distributed errors, the exact RLRT distribution for testing (2) in (4) is thus given by (3) [3], with the design matrixes X and Z replaced by X and Z.

Unknown covariance structure of the errors
Now consider the more general case where R = R(φ) may depend on an unknown parameter vector φ, which is estimated simultaneously with the other parameters using REML estimation.
We propose to use the RLRT distribution derived in 2.4.1 as an approximation in this case, substituting the estimate R = R( φ) from the null model to obtain the transformed design matrices X = R − 1 2 X and Z = R − 1 2 Z. Under the null hypothesis, φ is a consistent estimator of φ, and this substitution should thus be negligible asymptotically. We investigate the goodness of the approximation for finite sample sizes in the next section, focusing in particular on the observance of the type 1 error of the resulting test.
For testing a random effect b 1 in models with more than one random effect, S > 1, we employ the approximation proposed in Greven et al. [6]. In brief, this method treats the random effects in the model that are not tested as nuisance parameters. It uses an idea similar to pseudo-likelihood estimation to reduce the model to one with only the relevant random effect b 1 . Then, the distribution of the RLRT testing for this random effect's variance σ 2 1 in a reduced model containing all fixed effects, but only the random effect b 1 , is used as the distribution of the test statistic under the null hypothesis. The goodness of this approximation for i.i.d. errors was shown in Greven et al. [6]. We analyze it in the context of a general covariance structure using our transformation approach for the reduced model in the next section.
Additionally, we point out that our transformation approach actually provides an alternative way to approximate the null distribution of the test statistic in models with more than one random effect. This is achieved by writing the model as one with a single random effect and general covariance structure including the marginal covariance contribution from the additional random effects. Combining the nuisance random effects with the error terms in a new error term ε := s>1 Z s b s + ǫ, we can apply our transformation approach using the estimated correlation matrix of this new error term. That is, if Cov(b s ) = σ 2 s Σ s and λ s = σ 2 s/σ 2 ǫ , s = 1, . . . , S, the design matrices are transformed as X = This second approach explicitly takes the additional covariance component added by the nuisance random effects into account, and thus may further add in precision to the approximation of the null distribution over the approach in Greven et al. [6]. We also investigate this question in simulations in the next section.

Simulation study
We set up a simulation study to investigate the performance of the proposed approximation of the RLRT distribution in the case of correlated errors. Motivated by the economic analysis of subjective well-being mentioned in section 1, we consider a longitudinal data set where the errors of each individual follow a first order autoregressive process (AR (1)). The simulation setup covers three different settings. All simulations are conducted in the statistical software environment R [13]. The proposed test is then evaluated by means of simulation-based type-one error rates. Moreover, we consider a selection of additional scenarios to investigate the robustness of our results with respect to other specifications of the error covariance structure.

Simulation setup
For our simulation study we consider special cases of the following general model: The design matrix of the fixed effects, X, contains terms of a cubic polynomial in x, Z 1 is the design matrix of a random intercept, and Z 2 contains cubic truncated polynomial terms completing a penalized splines model for f (x). Finally, ǫ contains the residuals that are autocorrelated for each individual. We investigate three different test situations: We vary the number of individuals N ∈ {20, 100, 500}, the number of observations per individual n i ∈ {4, 10, 20} as well as the parameter of autocorrelation ρ ∈ {0, 0.4, 0.8}. We run 10 000 simulations for each of the 27 scenarios defined by combinations of the varied parameters in each of the three test situations. The data are generated according to the corresponding null model of the test situation. Motivated by our application we set β = (11, −0.7, 0.03, −0.0003) ′ and σ 2 1 = σ 2 ǫ = 1. For the penalized splines model, we choose a cubic truncated power function basis with 39 inner knots, which represent the 2.5 j% quantiles, j = 1, . . . , 39, of the empirical distribution of the simulated x-values.
In each simulation step, both null and alternative model are estimated using the functions lme() and gls() of the R package nlme [12] with option REML. After this, the RLRT statistic is computed. Then, the GLS transformation matrix is constructed using the estimate ρ 0 from the null model as correlation parameter ρ. In the considered AR(1) case, the transformation matrix Finally, the design matrices of the null model are transformed and used to obtain the 95% quantile of the RLRT distribution through the RLRTSim() function of the RLRsim package [16], and the test decision is taken.
To investigate the effect of an increase in the number of correlation parameters that need to be estimated, we also study two alternative settings with a larger number of parameters. The first alternative case is that of an ARMA(1,1) error process with two unknown parameters, the auto-regressive parameter φ = 0.4 and the moving average parameter θ = 0.2. We here consider all combinations of N and n i as done for the AR(1) case. The second alternative case is that of a general unstructured covariance structure of the errors. We simulate data using an AR(1) process with ρ = 0.4, but estimate each model without imposing a specific structure on the covariance. Here, we only consider n i = 4 for each N , as such models get prohibitive to estimate for large n i . This then gives n i (n i + 1)/2 = 10 unknown covariance parameters in Cov(ǫ). For both cases we study the most complex scenario 3 of testing for a cubic versus a smooth function in the presence of a nuisance random intercept. All other choices are as before.
Finally, for scenario 3 and an AR(1) error process with ρ = 0.4 and all combinations of N and n i , we also investigate the goodness of our approximation by writing the model as one with a single random effect and general covariance structure including the marginal covariance contribution from the additional random effects. Accordingly, we transform the design matrices as X = ( R + Table 1 lists the percentages of rejections of the null hypothesis in each scenario for the AR(1) error process. We observe values ranging from 4.57% to 5.91%, which are all reasonably close to 5%. Thus, the estimation uncertainty of the correlation parameter does not considerably affect the accuracy of the test. Furthermore, there is no systematic variation of the rejection rate across the simulated scenarios besides a general slight improvement of accuracy with larger sample sizes n. We conclude that the proposed approximation of the exact RLRT distribution performs sufficiently well. Results for scenario 3 when writing the model as one with a single random effect and general covariance structure indicate a slight improvement over using our transformation for a reduced model following Greven et al. [6]. This shows that the transformation approach is also a very promising approximation when testing for a random effect in linear mixed models with several random effects. Table 2 shows results for more general error covariance structures. Comparing results for ARMA(1,1) to those for AR(1), we find that estimating two Table 1 Percentages of rejected tests out of 10 000 simulated tests per scenario. 3(a) indicates testing in setting 3 using results by Greven et al. [6]. 3 (b) indicates testing in setting 3 by writing the model as one with a single random effect and general covariance structure Scenarios  correlation parameters in R compared to one does not noticeably deteriorate the quality of the approximation of the test statistic's null distribution. Even for the general unstructured covariance with ten unknown parameters, the approximation is not much worse. The exception is N = 20 and n i = 4, which apparently is a small number of observations to estimate 10 error correlation parameters in addition to 4 fixed effects and 2 random effects' variances. As expected, a larger number of observations is needed for the approximation to work well if the number of correlation parameters is very large. However, if a reasonable number of observations is provided, the approximation again is quite good.

Background
The work presented here was motivated by an econometric study about subjective well-being [23]. One crucial question in this field is how the level of overall life satisfaction evolves with age. Is there a midlife crisis? Does the level of life satisfaction decline when growing old? In the literature on well-being, it is often assumed that the relationship between life satisfaction and age can be described by a quadratic function. Yet empirical analyses using models that are quadratic in age find contradictory results: both a U-shaped profile [e.g. 2] and a reverse U-shaped profile [e.g. 4] have been reported. Since either shape lacks a convincing foundation in the economic theory, Wunder et al. [23] argue that the profiles found in these studies are an artefact of the quadratic modeling and suggest to follow a semiparametric approach instead.
Wunder et al. [23] base their analyses on data from the German SOEP and the British Household Panel Survey (BHPS), two suitable data sources for studies on well-being [21]. For each data set, they estimate an additive mixed effects model including a smooth term in age, several socio-economic control variables and a random intercept for each individual. The estimation of the smooth functions is based on the LMM representation of penalized splines. The resulting profiles appear to be clearly different from a quadratic curve. Moreover, likelihood-based tests and Bonferroni-adjusted confidence intervals indicate that, even though an LMM with a cubic function in age fits the data better than a quadratic one, this parametrization is not flexible enough to adequately describe the entire variability within the data sets.
In their models, Wunder et al. [23] assume that the errors are i.i.d. conditional on the random intercept. However, individual values of self-reported life satisfaction are likely to be correlated over time. Further investigations indeed confirmed a superior fit for a model with AR(1) errors for each respondent. Thus, the test based on the i.i.d. assumption for the errors is only a rough approximation. Here, we analyze the SOEP data set again, explicitely accounting for the covariance structure of the individual error terms. We fit models including errors that follow an AR(1) process for each respondent and then apply the appropriate test developed in Section 2.

Analysis of the SOEP data
For the SOEP Study, overall life satisfaction is measured on a discrete scale from 0 (not satisfied) to 10 (completely satisfied). The analyzed data set contains observations of 20 waves between 1986 and 2007. The years 1990 and 1993 are excluded, because one relevant control variable is not available for these years. Altogether, there are 253 044 data points obtained from 33 451 persons. Since the data set is sufficiently large, we can assume an approximate normal distribution for the life satisfaction and apply the LMM framework.
The aim of our analysis is to reveal the relationship between age and selfreported well-being, controlling for relevant socio-economic characteristics and at the same time accounting for the correlation structure of the observations. For this purpose, we test the null hypothesis that the profile of well-being over the life span is adequately described by a cubic polynomial against a smooth alternative with an RLRT. This test has the same structure as the test in the third situation of the simulation study in section 3.
We consider the following models under the null and under the alternative hypotheses: Here, X contains terms of a cubic polynomial in age and additional covariates, which have been chosen in accordance with Wunder et al. [23]. The matrix Z 1 is the design matrix of a random intercept and Z 2 is composed of cubic truncated polynomial terms of the penalized splines model under H 1 . Furthermore, the errors in ǫ are assumed to be autocorrelated for each individual. A detailed list of the covariates included in X can be found in appendix B, together with further estimation results.
To obtain the test decision, we transform the design matrices X and Z 2 with the inverse root of the estimated correlation matrix under the null hypothesis and determine the 95% and 99.9% quantiles of the corresponding RLRT statistic.
Since the data set is an unbalanced panel, the transformation matrix is block diagonal with N = 33 451 blocks of different sizes n i × n i , n i ∈ {1, . . . , 20}. Furthermore, for some respondents there are missing observations between two observations. We found that, in this case, the individual transformation matrices R − 1 2 i are given by: . . , n i }, is the time difference between two subsequent observations and f (∆ h ) = 1 + ∆ h −1 r=1 ρ 2r . Using the explicit form above to obtain the transformation matrix requires much less computational effort than other methods to compute R − 1 2 , which is particularly useful when analyzing a large data set as we do here. Figure 1 shows both estimated curves. There are considerable differences between the profiles estimated from the null and from the alternative model, in particular between 45 and 70 years of age. In contrast to the curve from the cubic model, the semiparametric curve does reveal a midlife crisis, which is located around the age of 50. Moreover, the following temporary recovery before the final decline of life satisfaction is not reflected by the curve estimated from the null model, which is due to the limited flexibility of a cubic polynomial.
In the corresponding RLRT, the null hypothesis is rejected at both levels, 95% and 99.9% (i.e. the p-value is less than 0.001). Hence, the test confirms that the profile of well-being over the life span has a complex shape that cannot be captured by a polynomial of a small degree. Neither a U-shaped or reverse U-shaped profile nor a cubic function are suitable to describe the course of life satisfaction along the aging process.

Discussion
In the present paper, we established a test for zero variance components in LMMs where the errors are not i.i.d. We used the idea of the GLS transformation to derive the RLRT distribution in the case of a general covariance structure of the errors. If the covariance structure of the errors is unknown and has to be estimated, we proposed to approximate the RLRT distribution by transforming the model according to the covariance parameters estimated from the considered null model. We set up a simulation study to investigate the performance of the proposed approximation of the RLRT distribution. Since the type one error rates of the simulated tests were reasonably close to 5%, we concluded that the approximation performs sufficiently well. Finally, we applied our test methodology to a micro-econometric analysis on subjective well-being. The test provides an answer to one crucial question of well-being economics: The profile of overall life satisfaction over the life span has a complex shape, that cannot be captured by a polynomial of a small degree. First, there is a steady decline up to a midlife crisis around the age of 50, then, well-being temporarily rises again, before finally, it declines again from the early sixties on.
The proposed method can be applied to a wide range of practical settings where correlated or heteroscedastic errors occur. Our approach also provides a way to test for a random effect in linear mixed models with several random effects. This is achieved by writing the model as one with a single random effect and general covariance structure including the marginal covariance contribution from the additional random effects. Our transformation approach can then be applied with the estimated covariance parameters.
to a constant -independent of the precise error contrast chosen [7]. A suitable matrix A 1 is, for example, The restricted likelihood for model (5) is the pdf for A 2 y, where A 2 y again define suitable error contrasts, for example, A 2 = I n − X( X ′ X) −1 X ′ . It is Thus, A 2 y is a simple variable transformation of A 1 y, and the pdf of A 2 y is equal to the pdf of A 1 y up to a multiplicative constant, the Jacobian |R − 1 2 |. For the two restricted likelihood ratio test statistics RLRT 1 , based on the profile restricted log-likelihood ℓ 1 (λ) for model (4), and RLRT 2 , based on the profile restricted log-likelihood ℓ 2 (λ) for model (5), where λ = σ 2 b/σ 2 ǫ , we have Appendix B: List of covariates and further estimation results indicator for attrition in t + 1, reference category: not retired from the panel in t + 1 Attrition in t + 2 indicator for attrition in t + 2, reference category: not retired from the panel in t + 2 Attrition in t + 3 indicator for attrition in t + 3, reference category: not retired from the panel in t + 3  SOEP 1986SOEP -2007SOEP (excl. 1990SOEP , 1993