MIXREGLS : A Program for Mixed-Eﬀects Location Scale Analysis

MIXREGLS is a program which provides estimates for a mixed-eﬀects location scale model assuming a (conditionally) normally-distributed dependent variable. This model can be used for analysis of data in which subjects may be measured at many observations and interest is in modeling the mean and variance structure. In terms of the variance structure, covariates can by speciﬁed to have eﬀects on both the between-subject and within-subject variances. Another use is for clustered data in which subjects are nested within clusters (e.g., clinics, hospitals, schools, etc.) and interest is in modeling the between-cluster and within-cluster variances in terms of covariates. MIXREGLS was written in Fortran and uses maximum likelihood estimation, utilizing both the EM algo-rithm and a Newton-Raphson solution. Estimation of the random eﬀects is accomplished using empirical Bayes methods. Examples illustrating stand-alone usage and features of MIXREGLS are provided, as well as use via the SAS and R software packages.


Introduction
Mixed-effects regression models (aka hierarchical linear models, multilevel models) have become a primary method for analysis of longitudinal (Hedeker and Gibbons 2006;Verbeke and Molenberghs 2000) and clustered (Goldstein 2011;Raudenbush and Bryk 2002) data.A basic characteristic of these models is the inclusion of random effects into regression models in order to account for the influence of subjects or clusters on their nested observations.Here, we will focus on longitudinal data, but it should be understood that the methods and program can also be applied to clustered data.For longitudinal data, these random effects reflect each person's growth or development across time, and the variance of these random effects indicate the degree of variation that exists in the population of subjects.Typically, the error variance, or the within-subjects (WS) variance, and the variance of the random effects, or the between-subjects (BS) variance, are treated as being homogeneous across subject groups or levels of covariates.However, these homogeneity of variance assumptions can be relaxed by modeling differences in variances, both between and within, across subject groups.The study of intra-individual variability has received increasing attention (Fleeson 2004;Hertzog and Nesselroade 2003;Martin and Hofer 2004;Nesselroade 2004); these articles describe many of the conceptual issues and some traditional statistical approaches for examining such variation.
Modern data collection procedures, such as ecological momentary assessments (EMA) and/or real-time data captures, have been developed to record the momentary events and experiences of subjects in daily life (Bolger, Davis, and Rafaeli 2003).These procedures yield relatively large numbers of subjects and observations per subject, and data from such designs are sometimes referred to as intensive longitudinal data (Walls and Schafer 2006).Such designs are in keeping with the "bursts of measurement" approach described by Nesselroade and McCollam (2000), who called for such an approach in order to assess intra-individual variability.Studies of this type are particularly amenable to examining intra-individual variation and to explain why subjects differ in variability rather than solely in mean level.In this article, one of our examples will be from a natural history study of adolescent smoking, using EMA, where interest was on determinants of the variation in the adolescents' moods.Hedeker, Mermelstein, and Demirtas (2008) described the mixed-effects location scale model as an extended random-intercept model in which log-linear (sub)models are included for both the WS and BS variance, allowing covariates to influence both sources of variation.Additionally, the model also includes a random subject effect to the WS variance specification.This permits the WS variance to vary at the subject level, above and beyond the influence of covariates on this variance.The correlation of the random scale and location effects is included in the model, yielding a more general and realistic specification for the random effects.In Hedeker et al. (2008), SAS PROC NLMIXED was used to estimate the model parameters.However, NLMIXED is rather slow to run and often requires excellent starting values to converge to a solution, particularly for the mixed-effects location scale model.Additionally, some data analysts find NLMIXED difficult to use, as it requires a higher degree of statistical programming knowledge than the average procedure in SAS.These considerations led to the development of the MIXREGLS (mixed-effects regression with location scale) program.This article describes MIXREGLS for the analysis of repeated or clustered (conditionally) normally-distributed response variables using maximum likelihood and empirical Bayes estimation procedures.The full model is estimated in three sequential stages, each using Newton-Raphson, and the results of each stage are provided in the output.Prior to Stage 1, once the data are read in, 20 iterations are performed of the EM algorithm to estimate the parameters of a random intercept model (regression coefficients, BS variance, WS variance, and random location effects).These estimates are then used as starting values at Stage 1 to estimate the same parameters plus the BS variance effects.Then at Stage 2, WS variance effects are added, and finally at Stage 3 the random scale (and association between the random location and scale effects) is included.Estimates at each stage are used as starting values at the next stage, which improves the convergence of the final model.This also provides a way of assessing the statistical significance of the additional parameters in each stage via likelihood-ratio tests.
The organization of the article is as follows: Section 2 describes the mixed-effects location scale model, Section 3 presents details of MIXREGLS use, Section 4 illustrates application of MIXREGLS using two examples, and Section 5 discusses and summarizes the program.
Estimation details are provided in Appendix A. The datasets that comprise the two examples are distributed with the program.The first is a traditional longitudinal study in which subjects are measured at six timepoints.The second example is an intensive longitudinal dataset incorporating EMA, in which subjects were measured, on average, 34 times during a weekly measurement period.Section 4 provides detailed instructions on MIXREGLS usage for these two examples, including explanation of the output, additional results files, how the program can be run via SAS, and running the program's dynamic-link library (DLL) from R. Though the examples are both longitudinal in nature, in which observations are clustered within subjects, these methods and the program can also be applied to multilevel or hierarchical data in which subjects are nested within clusters (e.g., classrooms, schools, clinics, hospitals, etc.).

Mixed-effects location scale model
Consider the following mixed-effects regression model for the measurement y of subject i (i = 1, 2, . . ., N subjects) on occasion j (j = 1, 2, . . ., n i occasions): where x ij is the p × 1 vector of regressors (typically including a "1" for the intercept as the first element) and β is the corresponding p × 1 vector of regression coefficients.The regressors can either be at the subject level, vary across occasions, or be interactions of subject-level and occasion-level variables.In the multilevel terminology, subjects are at level 2, while the repeated observations are at level 1.Thus, the level-2 random subject effect υ i indicates the influence of subject i on his/her repeated level-1 measurements.The population distribution of these random effects is assumed to be a normal distribution with zero mean and variance σ 2 υ .The errors ij are also assumed to be normally distributed in the population with zero mean and variance σ 2 , and independent of the random effects.Here, σ 2 υ represents the BS variance and σ 2 is the WS variance.
To allow covariates (i.e., regressors) to influence the BS and WS variances, we can utilize a log-linear representation, as has been described in the context of heteroscedastic (fixed-effects) regression models (Harvey 1976;Aitkin 1987), namely, The variances are subscripted by i and j to indicate that their values change depending on the values of the covariates u ij and w ij (and their coefficients).The number of parameters associated with these variances does not vary with i or j.Both u ij and w ij would usually include a (first) column of ones for the reference BS and WS variances (α 0 and τ 0 ), respectively.Thus, the BS variance equals exp α 0 when all covariates u ij equal 0, and is increased or decreased as a function of these covariates and their coefficients α.Specifically, for a particular covariate u * , if α * > 0, then the BS variance increases as u * increases (and vice versa if α * < 0).The exponential function ensures a positive multiplicative factor for any finite value of α, and so the resulting variance is positive.It should be noted that an occasion-level variable, say u ij , can alter the BS variance σ 2 υ ij by making subjects more/less heterogenous across occasions.For example, in our modeling of mood in Section 4.3 the occasion-level variable "being alone (vs. with others)" increases subject-level heterogeneity in mood.Thus, subjects are more alike in their mood responses when they are with others than when they are alone.The WS variance is modeled in the same way, allowing both subject-and occasion-level variables to influence how consistent/erratic a subject's responses are.The coefficients in α and τ indicate the degree of influence on the BS and WS variances, respectively, and the ordinary random intercept model is obtained if α = τ = 0 for all covariates in u ij and w ij (i.e., excluding the reference variances α 0 and τ 0 ).
The WS variance can vary across subjects, above and beyond the effect of covariates, namely, where the random subject (scale) effects ω i are distributed in the population of subjects with mean 0 and variance σ 2 ω .The idea for this is akin to the inclusion of the random (location) effects in Equation 1, namely, covariates do not account for all of the reasons that subjects differ from each other.The υ i parameters in Equation 1 indicate how subjects differ in terms of their means and the ω i parameters in Equation 4 indicate how subjects differ in variation, beyond the effect of covariates.Taking logs in Equation 4 yields log(σ 2 ij ) = w ij τ + ω i , which indicates that if the distribution of ω i is specified as normal, then the random effects serve as log normal subject-specific perturbations of the WS variance.In other words, the WS variances follow a log normal distribution at the subject level.The skewed, nonnegative nature of the log normal distribution makes it a reasonable choice for representing variances, and it has been used in many diverse research areas for this purpose (Fowler and Whitlock 1999;Leonard 1975;Renò and Rizza 2003;White, Shenk, and Burnhamb 1998;Vasseur 1999).
In this model, υ i is a random effect which influences a subject's mean, or location, and ω i is a random effect which influences a subject's variance, or (square of the) scale.Thus, the model with both types of random effects is dubbed a mixed-effects location scale model.These two random effects are correlated with covariance parameter σ υω , which indicates the degree to which the random location and scale effects are associated with each other.
Visually, Figure 1 presents the pertinent details of the model.The average across all subjects is depicted by the solid horizontal line, and the lines of two subjects are presented as dotted lines.In a given dataset, there will be as many dotted lines as there are subjects, but for simplicity here we only plot two subjects.Also, for simplicity, consider all covariates as subject-varying only.The effect of covariates x on the mean response is represented by β; these effects either raise or lower the solid line as a function of the covariates.Each dotted line represents a person's random location effect υ i , which indicate how a subject deviates from the mean response.In Figure 1, one subject is above and another subject is below the mean line.The heterogeneity in these dotted lines is indicative of BS variance: if the dotted lines are close together then there is not much subject heterogeneity, conversely if the dotted lines are spread out then more heterogeneity is indicated.The effect of covariates u on this heterogeneity is represented in the model as α.The degree of variation of a person's data points around their line is the WS variance.If the points are quite close to a subject's line, then that subject has low WS variance (e.g., the lower subject in Figure 1).Conversely, if a subject's data points vary greatly around that person's line then there is more WS variation (e.g., the upper subject in Figure 1).Covariates w influence the WS variation through the coefficients τ .Finally, the model posits that covariates do not explain all of a subject's WS variance by including the random scale effect ω i .

Standardization of random effects
It is convenient to represent the random effects in standardized form (i.e., as standard normals).For this, we can use the Cholesky factorization (Bock 1975).
Here, we include the subscripts i and j on the Cholesky elements because the BS variance σ 2 υ ij is allowed to vary by subjects and/or occasions.The model can now be written as where s 1ij = σ υ ij = exp(u ij α) , and the errors ij have variance given by

Alternative formulation for association of location and scale
Suppose that instead of allowing the location and scale random effects to be correlated, we assume that they are independent (i.e., σ υω = 0, and therefore s 2ij = 0), but that the location random effect θ 1i explicitly influences the WS variance.In this case, the WS variance could be expressed as where the regression coefficient τ l represents the (linear) influence of the location random effect θ 1i on the (log of the) WS variance.These two models of the WS variance are essentially the same, although in Equation 7 the parameter s 2ij is indicative of the covariance between the random location and scale effects, whereas in Equation 8 the parameter τ l represents the effect of the random location effect on the WS variance.Also, the Cholesky element s 3ij in Equation 7 is replaced by the (simpler) square root of the random scale σ ω in Equation 8. We have merely shifted from a correlation-like association between the mean and variance to a regression setting in which the mean influences the variance.Although equivalent in the present case, the latter representation can be more easily generalized to represent various forms of the relationship between the random location effect and the WS variance.For example, one can easily extend the model to allow for a quadratic relationship, namely, A quadratic relationship would seem to be useful for rating scale data with ceiling and/or floor effects, where subjects that have mean levels at either the maximum or minimum value of the rating scale also have near-zero variance.For example, if the rating scale goes from 1 to 10, then any subject with a mean level near either 1 or 10 would almost certainly have a very small variance, giving rise to the potential for a quadratic relationship between the mean and variance.In this regard, MIXREGLS allows for three possibilities: (1) no association (τ l = τ q = 0); (2) linear association only (τ l = 0, τ q = 0); and (3) linear and quadratic association (τ l = 0, τ q = 0).For a given program run, the user can select one of these three possibilities using the NCOV option, described in Section 3.

Intraclass correlation
The standardized random effects θ 1i and θ 2i are both normally distributed with mean 0 and variance 1, and are independent of each other.The expectation of y ij , E(y ij ), is simply x ij β.
For the situation in which the random location θ 1i has a quadratic effect on the WS variance (i.e., τ q = 0), the variance of y ij varies as a function of the random location effect.However, for the simpler situation in which τ q = 0, the variance of y ij is given by The covariance for any two observations nested within the same subject i equals Expressed as a correlation, this yields the intraclass correlation (ICC), denoted as r ij , Note that the ICC, which represents the proportion of total unexplained variation that is at the subject level, can be obtained for specific values of the covariates u ij and w ij .Thus, based on the current model, the ICC is allowed to vary as a function of both time-varying and time-invariant covariates.

Program description and usage
MIXREGLS uses maximum likelihood to estimate the model parameters β, α, τ , τ l , τ q , and σ ω .Specifically, it uses the Newton-Raphson algorithm to iteratively achieve the likelihood solution, with integration over the random effects done using numerical quadrature.The quadrature points and weights can be adapted to each person's data.Upon convergence, estimates of the random effects (θ 1i and θ 2i ) are obtained using empirical Bayes methods.Details on estimation are provided in Appendix A.
MIXREGLS was written in standard Fortran 95 for Windows-based computers.It can be run in batch mode using the executable file mixreglsb.exe,or accessed via the DLL file mixregls.dll;an example of this latter type of usage within R will be given in Section 4.2.
Here, we will first describe batch processing use.Program instructions must be stored in the file mixregls.def,the contents of which will be described below.MIXREGLS makes use of the following files: mixregls.def(definition file), filename.dat(input data file), filename.out(main output file), and filename.def(definition file to be saved).

Structure of filename.dat
This file contains all data read by the program.It is read in free format and must be a standard text (ASCII) file with no hidden characters or word processing format codes.Variable fields must be separated by one or more blanks.The data are assumed to consist of multiple level-1 observations within a level-2 unit, for example, in the longitudinal setting there are repeated observations (level 1) within subjects (level 2).There must be a level-2 id variable for each record, and the data must be sorted by this variable.The repeated measurements of a subject often take up as many records in this file as there are measurements for that subject.An exception to this is when the variables for a subject on a given occasion comprise more than one physical record, say two records.In this case, the repeated measurements of a subject take up twice as many records in the file as there are measurements for that subject.Also, if missing value codes are utilized, each subject may have data on the same number of records, but some records will contain missing value codes for some (or all) of the variables.
The fields of variables that are read in, separated by one or more blanks, usually consist of the id variable, the dependent variable, and covariates.(the order of variables does not matter ).The covariates include the regressors for the mean submodel (x), BS variance submodel (u), and WS variance submodel (w); the covariates in these submodels (i.e., x, u, and w) can be the same or different.All variables are read as real numbers with the exception of the id variable which is read as integer.All missing data must have a numeric missing value code, in particular, blank fields or periods are not acceptable as missing value codes.There is no need to include a column of ones in the dataset for the intercept(s), as the program can include intercepts for all submodels (though this can be changed using the PNINT, RNINT, and SNINT options, see below).

Structure of mixregls.def
This file contains specifications that determine the statistical model to be fit to the data in filename.dat.Except where noted, this file is read in free format.The filename and extension (mixregls.def)must be used, and should be in the same directory as the program mixreglsb.exe.This file needs to be created with an editor and saved in text format; one then double-clicks on the mixreglsb.exefile (or types mixreglsb at the command prompt) to run the program.Below are the specifications, and their optional values, that are necessary.
Line 5filename.def(definition file; the contents of mixregls.defare saved to this file).CONV = convergence criterion (usually 0.0001 or 0.00001); when all of the parameter first derivatives are smaller than this level, the algorithm has converged to a solution.
NQ = number of quadrature points.
MAXIT = maximum number of iterations; if the program does not converge by this number, the program stops and prints out the current estimates.

MISS = 0 (no missing values) or 1 (missing values).
STD = 0 (covariates will not be standardized) or 1 (all covariates will be standardized with mean 0 and variance 1).
NCOV = 0 (no association between the random location and WS variance), or 1 (only linear association between the random location and WS variance), or 2 (linear and quadratic association between the random location and WS variance).
Line 7 -two parameters: fields of the id variable and the dependent variable in filename.dat.
Line 8 -P parameters: field(s) of mean covariates in filename.dat.
Line 9 -R parameters: field(s) of BS variance covariates in filename.dat.
Line 10 -S parameters: field(s) of WS variance covariates in filename.dat.
Line 11 -8 character label for the dependent variable.
next line(s) -P labels for mean covariates in 8 character width fields (maximum of 10 labels per line).
next line(s) -R labels for BS variance covariates in 8 character width fields (maximum of 10 labels per line).
next line(s) -S labels for WS variance covariates in 8 character width fields (maximum of 10 labels per line).
If MISS=1 then the next lines are required and list the numeric missing value codes.
next line -missing value code for the dependent variable.
next line -P missing value codes, separated by blanks, for the mean covariates.
next line -R missing value codes, separated by blanks, for the BS variance covariates.
next line -S missing value codes, separated by blanks, for the WS variance covariates.
Upon program completion, the file specified on Line 4 (filename.out)will contain the results of the analysis.The contents of this file will be described for the two examples in Section 4.
Additionally, for a given run, four additional files are created by MIXREGLS: mixregls.its,mixregls.re1,mixregls.re2,and mixregls.est.The file mixregls.itscontains iterationspecific details including the likelihood value and the largest correction and derivative of the parameter estimates.The information that is included in this file is also displayed on the screen while the program is running.The file mixregls.re1contains standardized (level-1) residuals, while mixregls.re2contains the empirical Bayes estimates of the random effects (these are sometimes referred to as the level-2 residuals).As mentioned, MIXREGLS fits the mixed-effects location scale model in three stages.At the first stage, only the mean and BS variance covariate effects are estimated.At the second stage, the WS variance covariate effects are added.Finally, at stage three, the random scale effects are included.mixregls.re1and mixregls.re2contain results for all three stages.For the first two stages, four pieces of information are given in mixregls.re2:level-2 id, the number of level-1 observations for the level-2 unit, empirical Bayes estimate of the random location effect, and the posterior variance associated with the location effect.For stage three, which includes random location and scale effects, both empirical Bayes estimates of the location and scale effects are listed in mixregls.re2,followed by the posterior variance-covariance matrix associated with the random effects (in order: location variance, location scale covariance, scale variance).The file mixregls.estcontains a listing of the deviance (−2 ln L), number of iterations, MAXIT (i.e., maximum number of iterations), estimates (in the order P, R, S), and standard errors for all three model stages, respectively.

Some MIXREGLS errors
There are a few errors which can prevent MIXREGLS from running correctly, or even running at all.All information in the filename.datfile needs to be numeric, so any alphanumeric variables in this file will cause the program to fail.Also, missing values left as blank fields, and not given a specified numeric missing value code, may cause the program to fail or to estimate a model which is incorrect from the user's perspective.To see if this is causing a problem, the user can check the correctness of each variable's descriptive statistics (mean, minimum, maximum, and standard deviation) listed in the output file filename.out.If these descriptive statistics are incorrect, the data are not being read into the program correctly and a common reason is that missing values are being left as blank fields in the data file.
If the program "blows up," or does not converge to the solution by the maximum number of iterations (MAXIT), one thing to try is to standardize the covariates by setting STD = 1.This will ensure that all of the covariates, for all submodels, are on the same numerical scale with mean 0 and variance 1.Specifically, the standardized covariate z x is obtained as z x = (x − x)/s x , where x and s x are the mean and standard deviation of x based on all of the observations.
For models that do not converge properly, another thing to try is to increase the number of quadrature points (NQ) and/or switch to adaptive quadrature (i.e., set AQUAD = 1).It may be, for example, that specifying 10 points will not lead to convergence, while specifying 15 points does.Estimation of the final model stage, where the random scale effect is included, can be sensitive to the number of quadrature points.If after increasing the number of quadrature points to a very large number (say NQ = 31 or 41), problems still exist, it may be that the random-effect scale variance cannot be reliably estimated.In this case, a model without random scale may be warranted (i.e., stage 1 or 2).

Examples of mixed-effects location scale regression
MIXREGLS can estimate a variety of models for clustered and longitudinal data.It is especially useful for intensive longitudinal data, where subjects might have 30 or 40 (or more) observations, and interest is on examining the heterogeneity associated with the many responses.Here, we will present two examples that highlight MIXREGLS use for a typical longitudinal dataset and also for an EMA dataset.These two examples illustrate some of the ways in which MIXREGLS can be used, and the results that are obtained from the program.

Longitudinal analysis with random location and scale effects
A psychiatric study described in Reisby, Gram, Bech, Nagy, Petersen, Ortmann, Ibsen, Dencker, Jacobsen, Krautwald, Sondergaard, and Christiansen (1977) focused on the longitudinal responses of 66 depressed inpatients.In this study, subjects were diagnosed with either endogenous (N = 37) or non-endogenous (N = 29) depression, and were rated with the Hamilton depression rating scale (hamdep) weekly for a total of six weeks.Although the total number of subjects in this study was 66, the number of subjects with all measures at each of the weeks fluctuated: 61 at week 0 (baseline), 63 at week 1, 65 at week 2, 65 at week 3, 63 at week 4, and 58 at week 5.For this illustration, we focus on the following aspects of the study: is there evidence of differential improvement across time between endogenous and non-endogenous patients, and do the groups exhibit differential BS and WS heterogeneity.Additional analyses of this dataset can be found in Hedeker and Gibbons (1996).
The model fit to the hamdep scores includes a random subject intercept (i.e., random location effect), as well as a time effect, group effect (endogenous or non-endogenous) and a group by time interaction to examine whether these two groups of patients differed in terms of their initial severity and improvement across time.Additionally, we will investigate whether the two groups have different WS and BS variances, and allow the WS variance to change across time.A matrix representation of the model for subject i is given by Here, the grouping variable (endog) is a dummy-coded variable indicating whether a subject is endogenous (endog=1) or non-endogenous (endog=0).For the trend in hamdep means across time, the model includes a linear effect of week (coded sequentially from 0 to 5) and a group by (linear) week interaction.In terms of the variance modeling, we allow for the BS variance, and for the WS variance.Thus, the parameters α 1 and τ 2 represent differences between the groups in terms of the BS and WS variances, respectively.Additionally, τ 1 characterizes the change in the error (WS) variance across time, τ l is the effect of a subject's random location effect on their WS variance, and σ ω is the standard deviation of the random subject scale effect.
Although the above matrix representation is for a subject with data at all six timepoints, this is not a requirement.If a subject's data are missing or incomplete, then they would simply contribute less than six observations in the dataset, or their missing data would be coded with numeric missing value codes, which would be identified in the mixregls.deffile prior to the statistical analysis.For this first example, the following is the order and names of the variables in the the dataset riesby_example.dat:(subject's) id, hamdep, week, endog, and endweek (the product of endog and week).There are missing value codes (−9) for some subjects at specific timepoints; the data from these timepoints are not used in the analysis, however data from these subjects at other timepoints, where there are no missing data, are used in the analysis.Thus, for inclusion into the analysis, a subject's data (both the dependent variable and all model covariates being used in a particular analysis) at a specific timepoint must be complete.The number of repeated observations per subject then depends on the number of timepoints for which there are non-missing data for that subject.As MIXREGLS uses full likelihood estimation, described in Appendix A, it provides valid inference for incomplete data under missing at random (MAR) assumptions (Little and Rubin 2002).
Covariates are specified for three submodels: the mean, the BS variance, and the WS variance submodels.If the options PNINT, RNINT, and SNINT are all set to zero, then the program will include intercepts for all submodels.This is the usual case and what will be specified here.Based on the model described above the following covariates will be specified; mean: week, endog, endweek; BS variance: endog; WS variance: week, endog.The dataset to be read contains five variable fields, with id (field 1), hamdep (field 2), week (field 3), endog (field 4), and endweek (field 5).The field number is used to identify these variables in the mixregls.deffile for this model, which is listed below.
Riesby Data -adaptive 11 pt WS and BS variance models with random scale riesby_example.datriesby.outriesby.def 5 3 1 2 0 0 0 0.00001 11 1 200 1 0 1 1 2 3 4 5 4 3 4 hamdep week endog endweek endog week endog -9 -9 -9 -9 -9 -9 -9 Note that even though missing values are coded only for the dependent variable in the input data file, numeric missing value codes must be specified in the mixregls.deffile for all model variables (if MISS=1).In this case, the value −9 was specified for all variables since for the dependent variable this value is the correct missing value code, while for all other variables (week, endog, endweek) this value was never observed.Other options that are specified in this mixregls.deffile: a convergence criterion of 0.00001, 11-point adaptive quadrature, a maximum number of iterations set at 200, no standardization of model covariates, and a linear effect of a subject's location effect on the (log of their) WS variance.
The results for the model are written to the file riesby.outand are listed below.Here, for space, we have removed the parameter estimates associated with stages 1 and 2. The first part of the output is a list of some of the information given in the mixregls.deffile.Then, descriptive information about the dataset is provided: the number of level-1 observations, number of level-2 observations, and the number of level-1 observation for each level-2 cluster.Means, minimums, maximums, and standard deviations for all variables used in the analysis are listed.These descriptive statistics should be checked to confirm that the program has read the data correctly, otherwise the results of the subsequent analyses are presumably incorrect.Following the descriptives, starting values for all parameters are listed.Using code from the MIXREG program (Hedeker and Gibbons 1996), the program performs 20 iterations of the EM algorithm for a random-intercepts model.This provides excellent starting values for the regression coefficients, BS and WS variances, and random location effects.
Following the starting values, results for each of the three model stages are printed out.For each, the output includes the number of iterations required for convergence, the final ridge value, the log-likelihood value at convergence, and penalized versions of the log-likelihood value (i.e., Akaike's information criterion and Schwarz's Bayesian criterion).These likelihood values are also listed multiplied by −2 to aid in performing likelihood-ratio tests and/or model selection.The ridge is an adjustment made to the diagonal of the Hessian matrix (matrix of second derivatives) if the program encounters a non-increasing likelihood or some other indication of numerical difficulty.This adjustment often improves the chances of convergence.
Specifically, the diagonal of the Hessian matrix is multiplied by one plus the ridge value at each iteration.Thus, if the ridge value equals zero, there is no adjustment, however if the ridge value is greater than zero, then the diagonal elements of the Hessian matrix are increased.Increasing the diagonal elements of the Hessian matrix decreases the corrections or step sizes for each parameter in the iterative Newton-Raphson algorithm, thereby tending to stabilize the estimation procedure.At present, the ridge starts at zero (stage 1), 0.1 (first few iterations of stage 2), and 0.2 (first few iterations of stage 3), and is increased by 0.1 each time that difficulties are encountered.The ridge is then set back to zero if, after several iterations, the computations seem to be going smoothly.Also, at convergence, the ridge is set back to zero in order to obtain the correct standard errors for the model parameters, however the listing of the final ridge value indicates its value prior to being reset to zero.As such, the listed ridge value is indicative of the degree of computational difficulty that the program encountered.

Empirical Bayes estimatesmixregls.re2
MIXREGLS uses empirical Bayes (EB) estimation for the random effects, and upon program termination, the estimates are automatically saved in the file mixregls.re2.This file contains estimates for all three model stages, and they are listed in order.In the file, there are lines with labels that precede the numeric results of each stage.For example, for the first two stages, four pieces of information are given on each line: the subject's id, the number of repeated observations associated with the subject, the empirical Bayes estimate of the random subject (location) effect, and the posterior variance associated with the random subject effect.The line of labels in the file that precedes these results is "id, nobs, EB mean, EB var."For stage three, which includes random location and scale effects, there are two lines per subject.The first contains the subject id, the number of repeated observations associated with the subject, and empirical Bayes estimates of the random subject location and scale effects.The second line lists the posterior variance-covariance matrix associated with the random subject effects (in order: location variance, location scale covariance, scale variance).The line of labels that precedes these results is "id, nobs, EB mean vector, EB variance-covariance." For the Riesby data, the EB estimates of scale (denoted θ2i ) were sorted, and Table 1 lists the two subjects with the highest and lowest scale estimates.Also listed are the observed hamdep values for these subjects across time, abbreviated as hd0, ... hd5.As can be seen, the two subjects with the highest scale estimates do not follow a linear pattern across time.In particular, the hamdep values at week 1 are quite aberrant relative to the other values across time.One wonders if these are correct values, or perhaps if they are coding errors.Given that this particular study was conducted well in the past, it is impossible to say.However, for more current studies, it seems that these scale estimates can be used to detect unusual patterns of responses and perhaps coding errors.Turning to the subjects with the lowest scale estimates, these are both subjects with near-perfect linear downward trajectories.Thus, subjects with the lowest scale estimates are those with responses that mimic the pattern of the estimated population mean model estimates, which in the present case is a downward linear trend across time.These scale estimates can then be used to identify "ideal" cases, or those subjects that behave most like the average in the population.

Running MIXREGLS from R
MIXREGLS can also be run from R via a dynamic-link library (DLL).Lemmon and Schafer (2005) describe creation of a DLL from Fortran code, and how the DLL can be accessed by R and other software programs.For MIXREGLS, the DLL file mixregls.dll,an R function file mixregls_function.R, and the R syntax file mixregls_run.rare distributed with the program.Additionally, the R package Formula (Zeileis and Croissant 2010) must be installed.Together, these files and package can be used to yield the same analysis of the Riesby data as in the previous section.
The R syntax file mixregls_run.rincludes the following.First, the DLL file is loaded and checked to ensure that it is loaded.

R> source("mixregls_function.R")
The data are read in, variables with missing value codes of -9 are specified as not available, and a summary of the data are listed.

R> print(mixregls.results)
Results of each stage can also be accessed: the deviance (dev), mean submodel estimates (beta), BS variance submodel estimates (alpha), WS variance submodel estimates (tau), and all standard errors (se).A numerical suffix of 1, 2, or 3 indicates the stage.For example, to list the mean submodel estimates for stage 1, the following statement would be issued.

R> mixregls.results$beta1
For stage 3, estimates of the parameters associated with the inclusion of the random scale effect (τ l , τ q , and σ ω ) are returned (spar); to list these the following statement would be used.

R> mixregls.results$spar3
For all three stages, standardized residuals (zerr) are also returned from the DLL.For stage 3, these are plotted in a histogram and presented in Figure 2.

R> hist(mixregls.results$zerr3)
The histogram of the standardized model residuals presented in Figure 2 gives the general impression that the normality assumption is reasonable for these data.
For all three stages, estimates of the random effects (thet) and posterior variances (pvar) are returned.For the first two model stages, which only include a random location effect, thet is a one-dimensional vector (of size N , or the number of level-2 observations).However, for the final model stage with random location and scale effects, thet is a two-dimensional matrix of size N × 2. One might be interested in a scatterplot of these two random effects.
R> plot(mixregls.results$thet3[,1], mixregls.results$thet3[,2], pch = 20) The bivariate scatterplot of the estimated random location ( θi1 on x-axis) and scale effects ( θi2 on y-axis) in Figure 3 shows the heterogeneity in both of these subject effects.Note that the scale of both of these are standard normals, and each dot represents the model estimates for a given subject.The non-significance of the location random effect on the WS variance (τ l = 0.213, p = 0.143) that was obtained in the analysis is borne out in the plot, as there is a minimal positive relationship between these two subject-level estimates.
A few subjects are of note in Figure 3, and the data for these subjects are listed in Table 2. First, note the two subjects in the bottom left corner of the plot; these are subjects (ids 117 and 347) with small location and scale effects.These subjects seem to be good responders (i.e., follow a decreasing linear trend across time) with moderate to mild depression values.Conversely, subject 345, who is towards the bottom right corner of the plot, has a large location effect with a relatively small scale effect.This subject does follow a decreasing trend, These subjects do not follow a decreasing linear trend across time, and the last three subjects even display increasing depression values across time.
q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q −1 0 In the R function, default values have been set to PNINT = 0, RNINT = 0, SNINT = 0, CONV = 0.0005, NQ = 11, AQUAD = 1, MAXIT = 200, NCENT = 0, and NCOV = 1.These can be overwritten in the call to the function.For example, to use 15 quadrature points and to standardize all of the covariates, then the following would be used.

Intensive longitudinal data with random location and scale
Modern data collection procedures, such as ecological momentary assessments (EMA) (Stone and Shiffman 1994;Smyth and Stone 2003), experience sampling (de Vries 1992; Scollon, Kim-Prieto, and Diener 2003), and diary methods (Bolger et al. 2003), have been developed to record the momentary events and experiences of subjects in daily life.These procedures yield relatively large numbers of subjects and observations per subject, and data from these designs are sometimes referred to as intensive longitudinal data (Walls and Schafer 2006).Such designs are in keeping with the "bursts of measurement" approach described by Nesselroade and McCollam (2000), who called for such an approach in order to assess intra-individual variability.As noted by Nesselroade and McCollam (2000), such bursts of measurement increase the research burden in several ways; however, they are necessary for studying intraindividual variation and to explain why subjects differ in variability rather than solely in mean level (Bolger et al. 2003).As we have described in a series of articles (Hedeker et al. 2008;Hedeker, Demirtas, and Mermelstein 2009;Hedeker, Mermelstein, and Demirtas 2012), mixed-effects location scale analysis of such EMA data is particularly useful.
Data for the analyses reported here come from a longitudinal, natural history study of adolescent smoking (Mermelstein, Hedeker, Flay, and Shiffman 2002).Students included in the study were either in 8th or 10th grade at baseline, and self-reported on a screening questionnaire 6-8 weeks prior to baseline that they either had never smoked, but indicated a probability of future smoking, or had smoked in the past 90 days, but had not smoked more than 100 cigarettes in their lifetime.Adolescents carried hand held computers with them at all times during a seven consecutive day data collection period and were trained to both respond to random prompts from the computers and to event record (initiate a data collection interview) smoking episodes.For the analyses reported, we treated the responses obtained from the random prompts.In all, there were 17,514 random prompts obtained from 515 students with an approximate average of 34 prompts per student (range = 3 to 58).
The dependent variable considered is a measure of the subject's positive mood (posmood) at each random prompt.This measure consists of the average of several individual mood items that were identified via factor analysis.Each item was rated from 1 to 10 with higher values indicating higher levels of positive mood.Over all prompts, and ignoring the clustering of the data, the marginal mean of posmood was 6.733 (sd=2.117).Of interest is the degree of heterogeneity in this mood measure in terms of both WS and BS variation.As covariates, we will examine genderf, which is a subject (level-2) covariate coded 0 for males and 1 for females, and alone, which is a time-varying (level-1) covariate coded 0 if the subject was alone or 1 if with others.Both covariates will be included in all submodels.
The dataset, named posmood_example.dat,contains four variable fields: id (field 1), posmood (field 2), alone (field 3), and genderf (field 4).The field number is used to identify these variables in the mixregls.deffile for the model, which is listed below.
Analysis of Positive Mood -11 quad pts BS and WS variance model with random scale posmood_example.datposmood.outposmood.def 4 2 2 2 0 0 0 0.00001 11 1 200 0 0 1 1 2 3 4 3 4 3 4 posmood alone genderf alone genderf alone genderf Unlike the Riesby dataset, there are no missing values coded in this dataset, so MISS is set to zero.Other options that are specified in this mixregls.deffile: a convergence criterion of 0.00001, 11-point adaptive quadrature, a maximum number of iterations set at 200, and a linear effect of the random location on (log of) the WS variance.
The results for the model are written to the file posmood.out; a partial listing (descriptives and results from the final model stage) is below.Likelihood ratio tests yield: χ 2 2 = 70416.708−70335.768= 80.94 comparing stages 1 and 2, and χ 2 2 = 70335.768− 68283.920= 2051.848comparing stages 2 and 3. Thus, Model 3 clearly fits better than Model 2, indicating a significant linear effect of the random location effect on the WS variance and subject differences of scale.Notice, that there were two instances of a "BAD NR ITERATION," which occurred because the likelihood did not increase for the second and eleventh Newton-Raphson (NR) iterations.At those points, the ridge was increased to aid in the computational solution.It is not unusual for there to be a few "bad" iterations, however more than a few would suggest computational problems with the solution.Interpreting the final model stage, we see a significant negative effect of being alone on mean levels of positive mood, namely, subjects have worse mood when alone.Additionally, being alone increases the BS variance, and so subjects are more heterogeneous in mood when they are alone.The gender effect on the mean level and the BS variance are not seen to be significant.However, females exhibit greater WS positive mood variance, as does being alone.The mood responses of a subject are more heterogeneous and less stable for females, relative to males.Also, for a given subject, their mood responses are more varied when they are alone, relative to when they are with others.------------------------------------------------------ ------------------------------------------------------   Expressing the variance estimates in terms of intraclass correlation
As can be seen, the data are more correlated (i.e., less heterogeneous) within males, than females, and not too different when alone versus not alone.The latter result occurs because being alone increases both the WS and BS variances.However, in terms of gender, the X 'cd c:\mixregls & del mixregls.def& copy pm2.def mixregls.def& mixreglsb'; RUN; Below is a listing of the results from the final model stage for this analysis that includes the interaction of alone by genderf.The likelihood ratio test comparing this model to the previous one yields χ 2 3 = 68283.920− 68274.566= 9.354 which is statistically significant at the 0.025 level.Thus, there is evidence of significant interaction.Looking at the interaction effects in the three submodels, we see that the interaction is significant only in the mean submodel.The nature of this interaction, in terms of the mean, is that there is no significant gender effect when subjects are not alone, but a highly significant gender difference, with females being lower than males, when subjects are alone.Or, another way of expressing it is that while both males and females are lower on positive mood when they are alone, relative to when they are not alone, this difference is significantly greater for females than males.
- ----------------------Model WITH RANDOM Scale - ---------------------  As previously described, a quadratic relationship between a subject's mean and the log of their WS variance can be specifed through use of the NCOV option on line 6 (last option on this line) of the mixregls.deffile.Specifically, this option allows either: Thus far, all of our examples have considered NCOV=1 to allow for a linear association.By changing this value from 1 to 2, we can investigate whether the linear association is sufficient, relative to a quadratic relationship.Using the current example, this one change to the mixregls.deffile produces the results listed below.
-----------------------Model WITH RANDOM Scale -----------------------  A likelihood ratio test yields χ 2 1 = 68274.566− 68236.251= 38.32 for the null hypothesis H 0 : τ q = 0, clearly rejecting the null.Thus, the current model with the linear and quadratic effects fits the data significantly better than the previous model with only the linear effect.As can be seen, both linear and quadratic location effects are negative and significant.This suggests that subjects with higher location effects (more positive affect) exhibit less WS variation in more than a linear manner (on the log scale).As previously mentioned, this could be due to a ceiling effect of measurement in that subjects with high mean levels of positive affect (towards the maximum of 10) must exhibit low variation in their scores.In terms of the other model estimates, most are very similar to those obtained previously in the model with only the linear location effect, and none of the conclusions change appreciably.

Discussion
This article has described the program MIXREGLS which can be used to estimate the parameters of a mixed-effects location scale model (Hedeker et al. 2008).This extended mixed model augments the usual mean model with sub-models of the BS and WS variances.Covariates can be included in each of these sub-models.Relative to the standard mixed model, this augmented approach can be useful to identify predictors of both WS and BS variation, and to test hypotheses about these variances.Also, by including a random subject effect on the WS variance, this model can examine the degree to which subjects are heterogeneous in terms of their variation on the outcome variable, over and above the covariate effects.The random scale effects are further allowed to be associated with the usual random location effects, and this association can either be of a linear or quadratic form (in the log-linear model of the WS variance).Modern data collection procedures, such as EMA and/or real-time data captures, are increasingly used in many research areas.These approaches often provide a fair amount of both WS and BS data, and so allow the opportunity for modeling of both WS and BS variances as a function of covariates.
MIXREGLS is appropriate for continuous outcomes that are (conditionally) normally distributed.For ordinal outcomes, we have also described a mixed-effects location scale model (Hedeker et al. 2009).In the ordinal approach, we extend the mixed-effects logistic regression model of Hedeker and Gibbons (1994) by including the log-linear modeling of BS and WS variance, as well as the random scale effect.SAS NLMIXED can be used to estimate the parameters of the ordinal location-scale model, and Hedeker et al. (2009) includes sample syntax, however, as in the continuous case, it is rather slow to converge and can be sensitive to the specification of starting values.Thus, in future work we hope to develop an ordinal version of MIXREGLS as well.
Another limitation is that MIXREGLS only includes a random intercept as the location effect.For longitudinal data, a more general model with multiple random location effects is often used (Hedeker and Gibbons 2006), for example, to allow subjects to vary in terms of the model intercepts and time-trends.Also, MIXREGLS only allows estimation of a two-level model (i.e., repeated observations within subjects, or subjects within clusters), however in some cases a more extended hierarchical model is necessary.For example, for the EMA dataset one might consider a three-level model in which observations (level 1) are nested within days (level 2) within subjects (level 3).In a recent paper, Li and Hedeker (2012) proposed such a three-level model for EMA data, including development of a recursive conditional likelihood approach using SAS NLMIXED to estimate the model parameters.Future work on MIXREGLS will hopefully include some of these extensions of the model.

A. Estimation details
In terms of estimation, the model can be written as: and which allows the model to be expressed as: where the error variance is given by: with s = [τ l τ q σ ω ] and θ Under the conditional independence assumption, in which the responses within a subject are independent conditional on the random subject effects, θ i = [θ 1i θ 2i ] , the conditional log-likelihood for a given subject is with The marginal likelihood is obtained by integrating over the standardized random effects: where g(θ) is a standard bivariate normal density.The marginal log-likelihood from the sample of N subjects is then obtained as ln L = N i ln h( i ).Maximizing this log-likelihood yields maximum likelihood (ML) estimates, which are sometimes referred to as maximum marginal likelihood estimates (Bock 1989) because integrating the joint likelihood of random effects and responses over the distribution of random effects translates to marginalization of the data distribution.
MIXREGLS maximizes this log-likelihood using a Newton-Raphson algorithm.For this, both the first and second derivatives of the log-likelihood, with respect to the model parameters, must be derived.To simplify the notation, denote h i = h( i ) and l i = l( i | θ).Also, let η represent an arbitrary parameter vector (i.e., β, α, τ , or s).Then, and ∂ ln l i ∂η = − 1 2 For the first derivatives, we obtain: (33) To derive the matrix of second derivatives, denote where Notice that d i is equal to ∂h i /∂η, so that where Thus, we need to derive the second derivatives with respect to ln l i .These are obtained as: ∂ 2 ln l i ∂α ∂α = − 1 4 ∂ 2 ln l i ∂τ ∂τ = − 1 2 ∂ 2 ln l i ∂s ∂s = − 1 2 The Newton-Raphson iterative procedure can now be implemented.Specifically, estimates for the entire vector of parameters ζ, on iteration ι are improved by At each model stage, MIXREGLS continues to iterate until all of the corrections (ζ ι+1 − ζ ι ) are less, in absolute value, than the convergence criterion CONV specified in the mixregls.defdefinition file.At convergence, the ML estimates and their accompanying standard errors can be used to construct asymptotic z-statistics by dividing the parameter estimate by its standard error (Wald 1943).The computed z-statistic can then be compared with the standard normal table to test whether the parameter is significantly different from zero.MIXREGLS lists these z-and p-values, in addition to the parameter estimates and their standard errors.Also, the likelihood-ratio (or difference in log-likelihood) χ 2 test can be used for comparison of nested models, for example, the three sequential models that MIXREGLS fits.For this, the significance of the additional terms in model A over model B is determined by comparing −2(ln L B − ln L A ) to a table of the χ 2 distribution with degrees of freedom equal to the number of additional parameters in model A. To facillitate this, MIXREGLS lists the deviance (−2 ln L) of each of the sequential models that it estimates.

A.1. Numerical quadrature
In order to solve the above likelihood equations, numerical integration on the random effect θ space must be performed.For this, Gauss-Hermite quadrature can be used to approximate the above integrals to any practical degree of accuracy (Stroud and Sechrest 1966).In Gauss-Hermite quadrature, the integration is approximated by a summation on a specified number of quadrature points NQ for each dimension of the integration; thus, for the bivariate θ space, the summation goes over NQ 2 points.MIXREGLS allows for both ordinary quadrature, in which the same points are used for all subjects, or adaptive quadrature (Bock and Shilling 1997;Rabe-Hesketh, Skrondal, and Pickles 2002), in which the points are adapted to the location and dispersion for each subject at each iteration.The benefit of adaptive quadrature is that it usually requires a smaller number of quadrature points to achieve convergence.
At each iteration, and for each subject, the solution goes over the NQ 2 quadrature points, with summation replacing the integration over the random-effect distribution.The conditional likelihoods l i are obtained substituting the random-effect vector θ by the current 2-dimensional vector of quadrature points B q .The marginal density for each level-2 unit is then approximated as At each iteration, computation of the first derivatives and second derivative matrix then proceeds summing over subjects and quadrature points.In the summation over the NQ 2 quadrature points, the θ random-effect vector is substituted by the current vector of quadrature points B q , and the evaluation of the density g(θ) is substituted by the current quadrature weight A(B q ), which is obtained as the product of the quadrature weights from the two dimensions of the integral.Following the summation over subjects and quadrature points, parameters are corrected according to the Newton-Raphson algorithm, and the entire procedure is repeated until convergence.

A.2. Estimation of random effects
As was illustrated in the first example, it can be of interest to examine the estimates of the random effects θ i within the sample.A reasonable choice for this is the expected "a posteriori" (EAP) or empirical Bayes estimator θi (Bock and Aitken 1981).For the univariate case, this estimator θi is given by: The variance of this estimator is obtained similarly as: Upon convergence of each stage, MIXREGLS estimates these quantities using one additional round of quadrature.As mentioned in the first example, these estimates are written out to the file mixregls.re2.They may then be used, for example, to evaluate the location and scale estimates for particular subjects.

A.3. Standardized residuals
For each of the three model stages, the standardized residuals are obtained as:

Figure 1 :
Figure 1: A visual representation of the mixed-effects location scale model.

Line 6 -
NVAR P R S PNINT RNINT SNINT CONV NQ AQUAD MAXIT MISS STD NCOV.NVAR = number of variables to read from filename.dat.P = number of covariates for the mean submodel.R = number of covariates for the BS variance submodel.S = number of covariates for the WS variance submodel.PNINT = 0 (the mean submodel should include an intercept) or 1 (the mean submodel should not include an intercept).RNINT = 0 (the BS variance submodel should include an intercept) or 1 (the BS variance submodel should not include an intercept).SNINT = 0 (the WS variance submodel should include an intercept) or 1 (the WS variance submodel should not include an intercept).

Figure 3 :
Figure 3: Bivariate scatter plot of estimated random location and scale effects.
Positive Mood -11 quad pts BS and WS variance model with random scale data and output files:

Table 1 :
The data of the two highest and lowest scale estimates ( θ2i ) from analysis of the Riesby data.

Table 2 :
Interesting subjects in terms of location and scale estimates (i.e., θ1i and θ2i ) from analysis of the Riesby data (−9 represents missing data).but is somewhat less responsive and with rather high depression values across time.Subject 505 in the upper left corner, has generally moderate to low depression scores, but a rather erratic pattern across time.Finally, the four subjects(ids 607, 322, 328, and 360)in the upper right corner have generally high depression values which are fairly erratic across time. .......................