runmixregls - A Program to run the MIXREGLS mixed-effects location scale software from within Stata. Statistical

Hedeker and Nordgren (2013) present the stand-alone MIXREGLS program for ﬁtting the mixed-eﬀects location scale model to continuous longitudinal and other clustered data. This model can be used when interest lies in joint modeling the mean and dispersion of subjects’ responses over time. The model extends the standard two-level random-intercept mixed model by allowing both the within- and between-subject variances to be inﬂuenced by the covariates and for the within-subject variance to additionally depend on a subject random-scale eﬀect. In this article we present the runmixregls command to run MIXREGLS seamlessly from within Stata . We illustrate the notable advantages of using runmixregls by replicating and extending the two example analyses presented in Hedeker and Nordgren (2013). We then use runmixregls to demonstrate a new and important research ﬁnding. Namely, that ignoring the random-scale eﬀect in the within-subject variance function will lead to the regression coeﬃcients in this function to be estimated with spurious precision, especially the regression coeﬃcients of subject-level covariates.


Introduction
Mixed-effects models (Fitzmaurice, Laird, and Ware 2011;Hedeker and Gibbons 2006;Verbeke and Molenberghs 2000) -also known as multilevel models (Goldstein 2011;Snijders and Bosker 2012), hierarchical linear models (Raudenbush and Bryk 2002), or random effects models -are widely used in the analysis of longitudinal and other clustered data in the social, behavioral and medical sciences. The simplest mixed-effects model for continuous repeated measures data, where observations are nested within subjects, is the two-level random-intercept model. This model decomposes the variation in the response variable, having adjusted for the covariates, into a homogeneous between-subject (BS) variance and a homogeneous within-subject (WS) variance. Hedeker and Nordgren (2013) propose the mixed-effects location scale model to extend the above model by modeling both the WS and BS variances as log-linear functions of the covariates. A subject random-scale effect is then entered into the WS variance function to account for any remaining unexplained heterogeneity in the residual error variance between subjects. The model is argued to be especially useful when interest lies in joint modeling the mean and dispersion of subjects' responses over time. However, it can equally be employed to analyze clustered data when interest lies in explaining why both the mean and dispersion of subjects' outcomes vary across clusters (e.g., hospitals, schools, neighborhoods, etc.). Hedeker, Mermelstein, and Demirtas (2008b) show how mixed-effects location scale models can be fitted using maximum likelihood estimation in SAS (SAS Institute Inc 2012) via SAS PROC NLMIXED. However, Hedeker and Nordgren (2013) note that this procedure can be slow to run, often requires excellent starting values to converge, and is difficult for less statistical literate data analysts to use. It is these difficulties which motivated the development of their stand-alone MIXREGLS program (Hedeker and Nordgren 2013). MIXREGLS is Windows-based and is run in batch mode; there is no graphical user interface. A limitation of MIXREGLS is that it has no facilities beyond this single estimation capability. Furthermore, the process of writing MIXREGLS data and model definition files is fiddly and error prone. Hedeker and Nordgren (2013) recognize these difficulties and show how the process of fitting models in MIXREGLS can be partially automated using SAS and to a fuller extent using R. However, no parallel functionality is provided for the Stata software (StataCorp. 2013).
Other software can also fit mixed-effects location scale models. Rast, Hofer, and Sparks (2012) present syntax to fit these models using Markov chain Monte Carlo (MCMC) methods in WinBUGS (Lunn, Thomas, Best, and Spiegelhalter 2000). We note that WinBUGS can itself be called from Stata using the wb (Thompson, Palmer, and Moreno 2006) and wbs (Thompson 2014) suites of commands. However, WinBUGS is also difficult for less statistical literate data analysts to use. Leckie, French, Charlton, and Browne (2015) discuss fitting similar models using MCMC methods in the Stat-JR statistics package (Charlton, Michaelides, Parker, Cameron, Szmaragd, Yang, Zhang, Frazer, Goldstein, Jones, Leckie, Moreau, and Browne 2013), however Stat-JR is not currently accessible within Stata. Certain mixed-effects location scale models can be formulated using the double hierarchical generalized linear model (DHGLM) framework proposed by Lee and Nelder (2006) and are implemented in ASReml (Gilmour, Gogel, Cullis, and Thompson 2009), GenStat (Payne, Murray, Harding, Baird, and Soutar 2009) and R (R Core Team 2014) (via the dhglm package; Noh and Lee 2013), but none of these packages are accessible from within Stata. It is worth noting that it is also not currently possible to fit these models using inbuilt Stata commands. Nor is it possible to fit these models in SPSS (IBM Corporation 2012) or in specialized mixed-effects modeling software such as HLM (Raudenbush, Bryk, and Congdon 2012), MLwiN (Rasbash, Charlton, Browne, Healy, and Cameron 2009) and SuperMix (Hedeker, Gibbons, du Toit, and Cheng 2008a).
In this article, we present the runmixregls command to run the MIXREGLS software seamlessly from within Stata. There are notable advantages to running MIXREGLS in this way.
First, runmixregls provides a simple and intuitive command syntax which allows users to specify the full range of modeling and estimation options implemented in MIXREGLS. Second, prior to fitting a particular model, users can take advantage of Stata's excellent statistics, graphics and data management commands to prepare and descriptively analyze their data. Third, after fitting the model, users can apply Stata's comprehensive hypothesis testing, model comparison, prediction and plotting facilities to interpret their models and to communicate their results. Last, all analyses can be reproduced and documented for publication and review by typing all these commands into a file and running them all at once.
The remainder of the article is structured as follows. Section 2 briefly reviews the mixedeffects location scale model implemented in MIXREGLS. Section 3 describes how to install runmixregls from within Stata. Section 4 presents the runmixregls command syntax and describes the various modeling and estimation options. Sections 5 and 6 illustrate runmixregls by replicating and extending the two example analyses presented in Hedeker and Nordgren (2013). Section 7 performs a simulation study using runmixregls to demonstrate a new and important research finding. Namely, that ignoring random-scale effects in the WS variance function will lead the WS variance function regression coefficients to be estimated with spurious precision, especially the regression coefficients of subject-level covariates. Section 8 concludes.

Review of the mixed-effects location scale model 2.1. Mixed-effects model
Let y ij denote the continuous response measurement for subject i (i = 1, 2, ..., N ) at occasion j (j = 1, 2, ..., n i ). The standard two-level random-intercept mixed-effects model can then be written as where x ij is a vector of covariates, β is the associated vector of coefficients, υ i is the randomintercept effect, and ij is the residual error. The covariates may be time varying or time invariant. The random-intercept effect and residual error are assumed normally distributed with zero means and constant variances. The homogeneous BS variance, σ 2 υ , measures the variability in subjects' mean responses, having adjusted for the covariates. The homogeneous WS variance, σ 2 , measures the variability in subjects' measurements about their adjusted mean responses. This model can be fitted in Stata using the xtreg command with the mle option, or alternatively by using the mixed command. 1

Mixed-effects location scale model
The mixed-effects location scale model implemented in MIXREGLS may be viewed as an 1 Prior to Stata 13 (released June 2013), the mixed command was named xtmixed. extended reparameterized version of the above model and can be written as where we refer to Equation 4, Equation 5 and Equation 6 as the mean function, the BS variance function, and the WS variance function, respectively. Hedeker and Nordgren (2013, p. 3) describe this model in detail and so we provide only a brief summary here.
The mean function (Equation 4) is the same as that in the standard model (Equation 1), except that the random-intercept effect, now referred to as the random-location effect, is parameterized in standardized form, σ υ ij θ 1i . The first term, σ υ ij , denotes the square root of the BS variance, while θ 1i denotes the standardized random-location effect, θ 1i ∼ N (0, 1). Note that σ υ ij is subscripted by i and j to indicate that its value may change across subjects and occasions. The influence of θ 1i on y ij may therefore be amplified or dampened by the magnitude of σ υ ij .
The BS variance function (Equation 5), models the BS variance, σ 2 υ ij , as a log-linear function of a second vector of subject-or occasion-level covariates, u ij , where α denotes the associated vector of coefficients.
The WS variance function (Equation 6), models the WS variance, σ 2 ij , as a log-linear function of a third vector of subject-or occasion-level covariates, w ij , where τ is the associated vector of coefficients. A quadratic subject-level association is allowed between the unexplained location and scale variability by entering θ 1i and its square, θ 2 1i , into the WS variance function as latent covariates with regression coefficients, τ l and τ q , to be estimated. This additional flexibility is useful when the response exhibits floor or ceiling effects, as we then expect a concave relationship between subjects' variances and means whereby subjects with very low or very high means have near-zero WS variances, while subjects with means closer to the middle of the response scale have higher WS variances. A quadratic association is better able to capture such concavity. Finally, a new random effect, denoted θ 2i and referred to as a random-scale effect, is included to account for unexplained variation in the WS variance above and beyond the contribution of the covariates. This random effect is assumed normally distributed with zero mean and constant variance, σ 2 ω . When u ij and w ij each include only a constant and when τ l = τ q = σ ω = 0, the above mixedeffects location scale model simplifies to a reparameterized version of the standard two-level random-intercept mixed-effects model with homogeneous variances presented in Section 2.1.
The use of log-link functions ensures positive variances. However, it makes parameter interpretation less straightforward. In particular, the covariates have multiplicative rather than additive effects on the variances. In the examples we plot the predicted level-1 variance functions to aid their substantive interpretation. This proves especially helpful in interpreting quadratic random-location effects on the within-subject variance.
As in standard mixed-effects models, the random effects normality assumptions may not necessarily hold and it will be prudent to check their plausibility, for example, by inspecting quantile-quantile (Q-Q; normal scores plots) or other graphical plots post-estimation.
3. Installing runmixregls runmixregls requires Stata 12 or later and MIXREGLS. MIXREGLS can be freely downloaded from: http://www.uic.edu/sph/downloads/cms-downloads/cbd/zipfiles/MIXREGLS.zip. runmixregls can be installed from the Statistical Software Components (SSC) archive by typing the following command within a net-aware version of Stata

. ssc install runmixregls
Two files will be installed on your computer: runmixregls.ado, a Stata ado file which defines the command; and runmixregls.sthlp, a Stata help file which documents the command. Note that these files will be installed onto your adopath, the path where Stata searches for the files it needs.
If you have already installed runmixregls from the SSC, you can check that you are using the latest version by typing the following command . adoupdate runmixregls Next, you must declare the fully qualified filename for the MIXREGLS executable (the MIXREGLS.exe file) so that runmixregls knows where to find the software. You can do this by specifying a global macro called mixreglspath. Users must substitute the fully qualified filename that is correct for them, e.g., . global mixreglspath "C:\MIXREGLS\mixreglsb.exe" Advanced users may wish to set the MIXREGLS fully qualified filename every time Stata is started by simply inserting the following line into their profile do-file profile.do (see help profile).

Syntax diagram
The runmixregls command follows standard Stata syntax for estimation commands. We restrict our discussion here to the most common options. A complete description is provided in the runmixregls help file. 2 You may type the following at any point to view this help file . help runmixregls 2 The runmixregls help file is also available as a PDF document (Leckie 2013). Note that the terminology used in the help file follows Stata conventions and therefore differs slightly from that presented here where we have aimed to be consistent with Hedeker and Nordgren (2013).
Before using runmixregls, you must declare the subject identifier (the variable in the data which uniquely identifies the subjects) using the xtset command (see help xtset). xtset varname where varname is the name of the subject identifier.

Syntax
The runmixregls command has the following syntax where runmixregls is the name of the command, depvar denotes the dependent or response variable, varlist denotes the list of covariates appearing in the mean function (the square brackets indicate optional arguments), the if exp qualifier allows one to restrict the estimation sample to those observations for which the expression exp is true; the in range qualifier allows one to restrict the estimation sample to the range of observations given by range, the comma separates the specification of the mean function from the specification of any modeling or estimation options listed in options .

Options
The options are defined as follows association(atype ), where atype none|linear|quadratic specifies the subject-level association between the (log of the) WS variance and the random-location effect. The default is association(linear).

Random effects/Residuals
reffects(newvar1 newvar2 ) retrieves the empirical Bayes estimates of the standardized random effects. The standardized random-location effects are placed in newvar1 while the standardized random-scale effects are placed in newvar2 . The associated standard errors are placed in newvar1_se and newvar2_se .
residuals(newvar ) retrieves the standardized residual errors and places them in newvar .
Integration noadapt prevents MIXREGLS from using adaptive Gaussian quadrature. MIXREGLS will use ordinary Gaussian quadrature instead.
intpoints(#) sets the number of integration points for (adaptive) Gaussian quadrature. The default is intpoints(11). The more points, the more accurate the approximation to the log-likelihood. However, computation time increases with the number of quadrature points. When models do not converge properly, increasing the number of quadrature points can sometimes lead to convergence. Reporting level(# ) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95).

Maximization
display_options controls column formats, row spacing, line width, and display of omitted variables and base and empty cells; see help estimation options##display_options for more details.
noheader suppresses the display of the summary statistics at the top of the output; only the coefficient table is displayed.
notable suppresses display of the coefficient table.
coeflegend specifies the legend of the coefficients and how to specify them in an expression to be displayed rather than displaying the statistics for the coefficients.

Example 1: Reisby depression study
The first example analyses presented by Hedeker and Nordgren (2013) relate to data drawn from a six-week longitudinal psychiatric study of 66 depressed inpatients (Reisby, Gram, Bech, Nagy, Petersen, Ortmann, Ibsen, Dencker, Jacobsen, Krautwald, Sondergaard, and Christiansen 1977). Patients were diagnosed at baseline with either endogenous (N = 37) or non-endogenous (N = 29) depression and were then rated weekly using the Hamilton depression rating scale. The data are two-level with observations (level-1) nested within patients (level-2). The total number of observation is 375, 21 less than the expected 66 × 6 = 396 as some subjects were missing on some weeks.

Data
We start by loading the data. We do this using the use command where we specify the clear option to replace any data should they currently exist in memory.

. use reisby, clear
We then use the codebook command with the compact option to compactly describe the data contents.
. codebook, compact The output shows that the data includes five variables: the subject identifier, id; the response variable, hamdep (range = 0 to 39, where -9 is the missing value code); the week number, week (coded sequentially from 0 to 5); the grouping variable, endog (coded 0 for non-endogenous subjects, 1 for endogenous subjects); and the interaction between endog and week, endweek. We use the recode command to recode the 29 missing values described above from -9 to Stata's system missing value . so that these observations will not be used in the analysis.

(hamdep: 21 changes made)
To get a better sense of the data, we plot depression scores against time separately for nonendogenous and endogenous subjects. We use the twoway command with the line plottype to give a twoway line plot of hamdep against week. The connect(ascending) option requests that we connect the points in ascending order of the x variable, here week. This ensures that we plot a separate line for each subject. The xlabel(0(1)5) option specifies that major ticks and labels appear on the x axis from 0 to 5 in steps of 1. The by(endog) option requests we draw separate plots within the graph for the two subject groups defined by endog. . twoway (line hamdep week, connect(ascending)), xlabel(0(1)5) by(endog) The plot shows depression scores tend to improve over time for all subjects. Endogenous subjects appear to have higher depression scores at baseline but it is unclear as to whether these subjects tend to recover more or less rapidly than non-endogenous subjects. There is substantial variability between subjects and this increases over time; the observed trajectories fan outwards as the study progresses. There is a slight suggestion that endogenous subjects are more heterogeneous than non-endogenous subjects at each point in time, but this is far from certain. Some subjects certainly appear to recover more erratically than others, but it is less clear as to whether the erraticness of individual subjects' sequences of depression scores is related to time or depression group.

Model
Hedeker and Nordgren (2013, p. 11) address these questions by specifying the following mixedeffects location scale model where, in the mean function, hamdep is modeled in terms of week, endog and endweek, to allow endogenous and non-endogenous subjects to differ in baseline depression severity and to recover at different rates. The random-location effect, σ υ i θ 1i , allows for subject intercept heterogeneity above and beyond that explained by the covariates. The log of the BS variance, σ 2 υ i , is modeled as a function of endog to allow endogenous and non-endogenous subjects to be differentially variable in terms of their mean depression scores, having adjusted for time, group and group by time interaction effects. The log of the WS variance, σ 2 ij , is modeled as a function of week and endog to allow the variability of a subject's depression scores about their individual trajectories to differ across the two groups and to change over time. The location effect is entered into the WS variance function to allow for a linear subject-level association between the log of the WS variance and the random-location effect. The random-scale effect, σ ω θ 2i , allows for any remaining unexplained variation in WS response heterogeneity across subjects.
An important feature of the data shown in Figure 1, but not captured by the model is that subjects vary in terms of their time trends as well as their intercepts. Ideally we would introduce a random-slope on week into the model to account for this. However, this modeling extension is not currently implemented in MIXREGLS. Restricting subjects to have parallel trajectories will force the WS variance to increase as a function of time.
Before we can fit this model, we must first use the xtset command to declare the grouping or panel variable, which for these data is id The above model can then be fitted using the following runmixregls command.

runmixregls -Run MIXREGLS from within Stata
The first term after the name of the command is the response variable, hamdep, and this is then followed by the list of mean function covariates, week endog endweek. The first option, between(endog), specifies the BS variance covariates, the second option, within(week endog), specifies the WS variance covariates. The formatting of the runmixregls output follows that of other Stata panel data estimation commands. A range of ancillary model information is presented first, followed by the table of parameter estimates and standard errors.
The first line of output simply states that MIXREGLS was run from within Stata using the runmixregls command. Immediately below that, the output states that the model is a mixed-effects location scale model where the grouping or panel variable is id. To the right, the output states that the model is fitted to an estimation sample of 375 observations belonging to 66 subjects where the number of observations per subject ranges from 4 to 6 with an average of 5.7 observations. The next block of output, immediately above the table, reports that MIXREGLS took 5.715 seconds to fit the model using 11 integration points. The log-likelihood statistic is also presented.
The table of parameter estimates and standard errors is presented in five blocks, labeled Mean (mean function regression coefficients), Between (BS variance function regression coefficients), Within (WS variance function regression coefficients), Association (subject-level locationscale association parameters) and Scale (random-scale effect standard deviation). The table also reports z-ratios, p values and 95% confidence intervals.
The substantive interpretation of these results is given by Hedeker and Nordgren (2013, p. 13), and so we provide only a brief summary here. The mean model shows a significant negative effect of time (β 1 = −2.29, p < 0.05) indicating depression scores decrease over the six-week study in the non-endogenous group. The endogenous group does not differ significantly from this trend in either their mean baseline depression score (β 2 = 1.88, p = 0.08) or in their average rate of improvement over time (β 3 = −0.03, p = 0.92). There are also no group differences in terms of the BS variance (α 1 = 0.51, p = 0.27) or the WS variance (τ 2 = 0.29, p = 0.24). Thus, endogenous subjects are found to be no more or less heterogeneous than non-endogenous subjects in either their mean depression scores or in the variability of their depression scores about their individual trajectories means. The WS variance significantly increases with time (τ 1 = 0.19, p < 0.05) suggesting that a subject's depression scores become more variable with time. However, as discussed above, the magnitude of this effect would be expected to reduce were we able to introduce random time trends into the model. The linear random-location effect does not have a significant effect on the log of the WS variance (τ l = 0.21, p = 0.14). Finally, the scale standard deviation is significant (σ ω = 0.66, p < 0.05) indicating that some subjects' depression scores are significantly more dispersed than other subjects even after adjusting for group and time effects.

Random effects and residuals
Next we retrieve the empirical Bayes estimates of the standardized random-location and random-scale effects by specifying the reffects(theta1 theta2) option. The standardized random-location effects are placed in a new variable named theta1 while the standardized random-scale effects are placed in theta2. The associated standard errors are placed in theta1_se and theta2_se. We retrieve the standardized residual errors by specifying rstandard(estd).
. runmixregls hamdep week endog endweek, between(endog) within(week endog) > reffects(theta1 theta2) residuals(estd) (output omitted ) The runmixregls output is identical to before and so we do not redisplay it here. . histogram estd, width(0.5) start(-3) frequency (bin=11, start=-3, width=.5) where we specify the width(0.5) and start(-3) options to set the width of the bins to 0.5 and to set the theoretical minimum value to -3. The frequency option requests the histogram to be drawn as frequencies. . scatter theta2 theta1, mlabel(id) where we have added the mlabel(id) option to add marker labels to the figure showing subjects' unique identifier values, id. The plot shows no obvious association between the estimated location and scale effects corroborating the non-significance of the random-location effects on the WS variance (τ l = 0.21, p = 0.14).  Tables 1 and 2 of Hedeker and Nordgren (2013, p. 18-20) present subjects who appear interesting in terms of their location and scale estimates. We can easily replicate these tables, but we must first carry out a small number of data manipulation steps.
We start by only keeping those variables needed to construct the tables and by renaming the response from hamdep to hd.
. keep id hamdep week theta1 theta2 . rename hamdep hd Next we use the reshape command to convert the data from "long form" (one record per observation) to "wide form" (one record per subject).
. format %4.3f theta1 theta2 Lastly we use the gsort command to sort the data in descending order by theta2.
. gsort -theta2 Table 1 in Hedeker and Nordgren (2013, p. 18) lists the two subjects with the highest scale estimates and the two subjects with the lowest scale estimates. We can replicate the first half of this table by using the list command where we specify if inlist(_n, 1, 2) to list the data for only the first two records (subjects) in the data (i.e., the two subjects with the highest scale estimates). These two subjects (Subjects 606 and 505) show unusual trajectories whereby their baseline depression scores appear somewhat anomalous relative to their subsequent scores.
. list id theta2 hd? if inlist(_n, 1, 2) We can replicate the second half of this table by changing the if statement to only list the last two records (subjects) in the data (i.e., the two subjects with the lowest scale estimates). These two subjects (Subjects 335 and 308) have linear rates of decline which closely match those predicted by the mean time trend.
. list id theta2 hd? if inlist(_n, 65, 66)  Table 2 in Hedeker and Nordgren (2013, p. 20) presents the data for subjects located in the four corners of Figure 2. To replicate this table, we first sort the data in ascending order by theta2.
. sort theta2 Subjects 117 and 347 are located in the bottom left corner of Figure 2. These subjects have some of the mildest depression scores and show continual smooth improvement in their scores over the course of the study.
. list id theta1 theta2 hd? if inlist(id, 117, 347) Subject 345 is located in the bottom right corner of Figure 2. This subject also shows continual smooth improvement in their scores over the course of the study, but these scores are always substantially above average. . list id theta1 theta2 hd? if inlist(id, 505) . list id theta1 theta2 hd? if inlist(id, 607, 322, 328, 360)

Additional analyses extending those of Hedeker and Nordgren (2013)
In Section 7 we shall use runmixregls in a simulation study which presents a new and important research finding. Namely, that ignoring random-scale effects in the WS variance function will lead the WS variance function regression coefficients to be estimated with spurious precision, especially the regression coefficients of subject-level covariates. In the remainder of this section we shall motivate this simulation study by examining the impact of ignoring random-scale effects in the Reisby data.
We must first reload the data, recode the missing values of hamdep, and declare the data to be panel data.
. use reisby, clear . recode hamdep (-9 = .) (hamdep: 21 changes made) . xtset id panel variable: id (balanced) Next we refit the model, removing the non-significant association between the WS variance and the random-location effects. We increase the number of integration points to 15 simply to demonstrate how one would do this.   The only subject-level covariate in the WS variance function is the depression group variable. The results show that there is no significant difference in WS variance between the endogenous and non-endogenous subjects, even at the 20% level (τ 2 = 0.30, p = 0.22). We shall compare this finding with that obtained from a model which ignores the significant random-scale effects.
At this point it is helpful to understand that MIXREGLS has fitted the full model shown above in three sequential and increasingly complex stages. The results of each model stage are provided as the starting values for the next stage. This sequential approach improves the convergence of the full model. The stage 1 model assumes the WS variance to be homogeneous. The stage 2 model adds in the WS variance covariates. The stage 3 model adds in the randomscale effects. The linear or quadratic subject-level association between the random-location effects and the WS variance is also added at this stage if it has been specified. In the current model, we specify there to be no such association and so the only difference between the stage 2 and stage 3 models is the inclusion of the random-scale effects. We can therefore simply compare the stage 2 and stage 3 model results for WS variance depression group regression coefficient.
The stage 1, 2 and 3 estimation results are saved by runmixregls and can be listed using the ereturn list command. Full explanations are given in the runmixregls help file. The names assigned to the different stage specific model results are suffixed by _1, _2 and _3. For example, the parameter estimates from each stage are stored in the matrices e(b_1), e(b_2) and e(b_3), while their corresponding variance-covariance matrices are stored in the matrices e(V_1), e(V_2) and e(V_3).
We can examine the stage 2 parameter estimates by using the matrix list command.
. matrix list e(b_2) More usefully, we can construct a complete estimates table for the stage 2 results by first making copies of e(b_2) and e(V_2).

. ereturn post b_2 V_2
Finally we display the complete estimates table using the ereturn display command.

Example 2: Positive mood study
The second example analyses presented by Hedeker and Nordgren (2013) relate to data drawn from a week-long intensive longitudinal study of 515 adolescent school students (Mermelstein, Hedeker, Flay, and Shiman 2002). Ecological momentary assessments (EMA) were collected by randomly prompting students to respond to questions about their mood, place, activity and companionship. The data are two-level with observations (level-1) nested within students (level-2). The response is a positive mood score and interest lies in explaining why subjects differ in their positive mood variability from prompt-to-prompt as well as why some subjects have higher average positive mood than others. The main covariates of interest are student gender and whether the student was alone when prompted.

Data
First we load the data.

. use posmood, clear
Next we describe the data's content.
To get a better sense of the data, Figure 4 plots a trellis plot of positive mood scores against observation number for the first 20 subjects in the data. To plot this figure, we must first generate a unique observation identifier within each subject and then xtset the data. The xtset output informs us that at least one subject contributes as many as 58 observations. Next we use the xtline command to draw the plot.
. xtline posmood if subject<=20, byopts(style(compact)) where we specify if subject<=20 to restrict the plot to the first 20 subjects and we specify the byopts(style(compact)) option to draw a more compact version of the plot than is plotted by default.
The plot shows some subjects contribute more observations than others. We also see that both mean positivity and mood variability varies substantially across subjects. For example, Subjects 17 and 19 show similar mean positivity, but differ substantially in their mood variability with Subject 19 exhibiting a more variable mood than Subject 17. In contrast, Subject 18 exhibits very high mean positivity, close to the maximum score of 10, and very low mood variability. This pattern is consistent with a ceiling effect in the positive mood scale.

Model 1
The first model Hedeker and Nordgren (2013, p. 22) specify for these data is written as where in the mean function posmood is modeled in terms of alone and genderf, to allow for mean mood differences by gender and whether subjects are alone. The random-location effect, σ υ i θ 1i , allows for subject intercept heterogeneity above and beyond that explained by the covariates. The log of the BS variance, σ 2 υ i , is modeled as a function of genderf to allow male and female subjects to be differentially variable in terms of their mean positive mood scores, having adjusted for the mean differences in gender and whether subjects are alone. It is also modeled as a function of alone and so the magnitude of each subject's randomlocation effect is allowed to be modified on occasions when they are alone. The log of the WS variance, σ 2 ij , is modeled as a function of alone and genderf to allow the variability of a subject's positive mood about their individual mean level to differ by gender and whether students are alone. The location effect is entered into the WS variance function to allow for a linear subject-level association between the log of the WS variance and the random-location effect. The random-scale effect, σ ω θ 2i , allows for any remaining unexplained variation in WS response heterogeneity across subjects.
. runmixregls posmood alone genderf, between(alone genderf) > within(alone genderf)     The summary information at the top of the output states that the model is fitted to an estimation sample of 17514 observations belonging to 515 subjects. The number of observations per subject ranges from 3 to 58 with an average of 34 observations. The model took 317 seconds to converge.
The substantive interpretation of these results is given by Hedeker and Nordgren (2013, p. 23), and so we provide only a brief summary here. The mean function shows subjects experience significantly lower moods when alone (β 1 = −0.37, p < 0.05), but there is no mean mood difference by gender (β 2 = −0.15, p = 0.17). The BS variance function also shows females to be no more variable than males in their mean levels of positivity (α 2 = 0.004, p = 0.97). However, unexplained differences in mean moods between subjects become more pronounced when subjects are alone relative to when they are with others (α 1 = 0.11, p < 0.05). The WS variance function shows females have significantly more variable moods from one observation to the next than males (τ 2 = 0.22, p < 0.05). Subjects are also shown to have more variable observation-to-observation moods when they are alone relative to when they are with others (τ 1 = 0.08, p < 0.05). The linear random-location effects are negatively associated with the WS variance and so, having adjusted for the covariates, subjects with higher mean moods tend to have lower mood variability (τ l = −0.22, p < 0.05). Finally, the random-scale standard deviation is significant suggesting that there is an unexplained component to subjects' mood variability (σ ω = 0.60, p < 0.05).

runmixregls: Run MIXREGLS from within Stata
We store these estimation results, giving them the name model1, so that we can retrieve them at a later point.
. estimates store model1 To aid the interpretation of this model Hedeker and Nordgren (2013, p. 25) predict the BS variance and the WS variance separately for the four subject groups formed by crossing gender by alone status: (1) Male, not alone; (2) Male, alone; (3) Female, not alone; (4) Female, alone. Below we replicate these predictions. First we use the generate and replace commands with appropriate if statements to generate a new categorical variable named group which indicates these four subject groups . generate group = .
(17514 missing values generated) . Then we define a value label named grouplabel and attach this to this variable.
We use the predictnl command to store these predictions in a new variable called BSvariance. We make use of the function xb(Between) which replicates the calculation of the linear predictor, u ijα , here,α 0 +α 1 alone ij +α 2 genderf i .

. predictnl BSvariance = exp(xb(Between))
Next we use the tabstat command to tabulate the predicted BS variance by subject group. Taking the exponential of both sides of the log-linear WS variance function (Equation 18) gives We wish to obtain the population-averaged WS variance for each of our four subject groups. This is obtained by integrating out the random-location and random-scale effects. It can be shown that the resulting expression is given by 3 We will use the predictnl command to also store these predictions. However, in order to do so we first need to know how to specify the parameters τ l and σ ω in the above expression. We can find their internal Stata parameter names by reissuing the runmixregls command with only the coeflegend option. This will simply redisplay the current (active) estimation results, but will present the parameter names rather than displaying the statistics for these coefficients. We see that the internal Stata parameter names for τ l and σ ω are _b[Association:linear] and _b[Scale:sigma], respectively. We can now use predictnl to predict the populationaveraged WS variances, placing them in a new variable called WSvariance. We make use of the function xb(Within) to replicate the calculation of the linear predictor, w ijτ , here, τ 0 +τ 1 alone ij +τ 2 genderf i .
. predictnl WSvariance = exp(xb(Within) > + 0.5 * (_b[Association:linear]^2 + _b[Scale:sigma]^2)) Next we tabulate the predicted population-averaged WS variance by subject group These predicted values show that females have higher occasion-to-occasion mood variability than males. Similarly subjects have more variable moods when alone, relative to when they are with others.
Having predicted the BS and WS variances for each subject group, Hedeker and Nordgren (2013, p. 25) go on to calculate the intraclass correlation coefficient (ICC) for each subject group. The ICC is defined as the expected residual correlation between two observations from the same subject and is calculated as r = σ 2 υ ij / σ 2 υ ij + σ 2 ij . We can once again use the predictnl command to calculate these predictions . predictnl ICC = exp(xb(Between))/(exp(xb(Between)) + exp(xb(Within) > + 0.5 * (_b[Association:linear]^2 + _b[Scale:sigma]^2))) We then tabulate the ICC by subject group These predicted values show that there is substantial clustering in the data for all four groups, but that observations are more correlated within males than females. In contrast, subjects' observations appear no more correlated when they are alone, than when they are with others.

Model 2
The second model Hedeker and Nordgren (2013, p. 26) specify explores the interaction between gender and whether subjects are alone on their mean moods, BS heterogeneity in mean moods, and in their WS mood variability. Specifically, they include algenf, the interaction between alone and genderf in the mean function, the BS variance function, and the WS variance function. Their model can be written as First we generate the interaction term.
. generate algenf = alone * genderf Then we fit the model. . estimates store model2 We can replicate the likelihood ratio test comparing this model to the previous one (Hedeker and Nordgren 2013, p. 27) by using Stata's lrtest command.

. lrtest model1 model2
Likelihood-ratio test LR chi2(3) = 9.36 (Assumption: model1 nested in model2) Prob > chi2 = 0.0249 We see that there is evidence of a significant interaction between gender and whether subjects are alone on positive mood (χ 2 3 = 9.36, p = 0.03). However, examining the individual parameters shows that only the mean function interaction is individually significant (β 3 = −0.14, p < 0.05). The nature of this interaction is that there is no gender difference in positivity when subjects are with others (β 2 = −0.12, p = 0.26) but a gender difference becomes apparent when subjects are alone with females exhibiting significantly lower mood than males (β 2 +β 3 = −0.27, p = 0.02). We calculated the p value for this result using the lincom command which computes point estimates, standard errors, z statistics, p values, and confidence intervals for linear combinations of model parameters. We refer to the estimated parameterŝ β 2 andβ 3 by their internal Stata parameter names.

Model 3
The third and final model Hedeker and Nordgren (2013, p. 28) specify allows for a quadratic relationship between the log of a subject's WS variance and their random-location effect.
Such functionality will better capture the suspected ceiling effect in these data. Their model can be written as To fit the model we simply add the association(quadratic) option to the previous call of runmixregls.
. runmixregls posmood alone genderf algenf, between(alone genderf algenf) > within(alone genderf algenf) association (    . estimates store model3 We then perform a likelihood ratio test to compare this model to the previous model.

. lrtest model2 model3
Likelihood-ratio test LR chi2(1) = 38.31 (Assumption: model2 nested in model3) Prob > chi2 = 0.0000 We see that the current model, which includes the quadratic effect of the random-location effects on the WS variance, fits the data significantly better than the previous model which only allows for a linear effect (χ 2 1 = 38.31, p < 0.05).
6.5. Additional analyses extending that presented in Hedeker and Nordgren (2013) The Association section of the above estimates table shows that both the linear and quadratic effects of the random-location effects are negative and individually significant. We can aid our interpretation of these coefficients by plotting the WS variance function for each of the four subject groups as a function of these random-location effects where we integrate out the random-scale effects to simplify the plot. The resulting WS variance function is given by To plot this function separately for each of the four subject groups, we use the following twoway command with four separate function plots. The plottype function plots mathematical functions of the form y = f (x), where f (x) is some function of x. Here x represents the standardized random-location effect, θ 1i , and so we use the range() option to restrict its range to be from -3 to 3. We refer to all parameters by their internal Stata parameter names. We use the ytitle() and xtitle() options to specify the titles to appear next to the axes. We use the legend() option with the order() suboption to specify the text for the legend key.
. 3 "Female, not alone" > 4 "Female, alone")) Figure 5 shows females exhibit higher mood variability than males as do subjects of both genders when they are alone, relative to when they are with others. The negative linear and quadratic influences of the random-location effects give the variance functions their curved shapes and suggest subjects with more extreme location effects (particularly very high location effects) exhibit less within-subject variation compared to those with less extreme location effects. This pattern reflects the floor and especially ceiling effects in the positive mood scale.

Simulation study
Most standard software can fit a limited version of the mixed-effects location scale model that allows heterogeneity in the WS variance across a limited number of sub groups. More specialized mixed-effects software such as HLM, MLwiN and SAS PROC MIXED can model the WS variance as a function of covariates. However, none of these packages allow users to additionally include random-scale effects in these functions. Ignoring unexplained subject differences in response dispersion will lead the resulting WS regression coefficients to be estimated with spurious precision, especially regression coefficients relating to subject-level predictors. This issue is analogous to the well known problem of ignoring clustering in standard mixed-effects models where omitting the random-location effects leads the mean function regression coefficients to be estimated with spurious precision. Despite this, we are not aware of this issue being discussed in the context of WS variance functions. Researchers ignoring unexplained subject variation in the residual error variance therefore run the risk of making type I errors of inference and of drawing misleading research conclusions regarding the factors which influence response dispersion. In Section 5 we showed using the Reisby data that ignoring the random-scale effects increased the apparent precision of the group difference in WS variance. In this section we present a simple parameter recovery simulation study to show that such increases in precision are entirely spurious.
We analyze 1000 replications of 250 subjects, with 10 observations per subject. On each replication, we generate subjects' responses, y ij , according to the following "true" mixedeffects location scale model where the predictors, x ij and z i , are standard normal variates defined at the observation and subject levels, respectively. For simplicity we assume a homogeneous BS variance and no subject-level association between the unobserved location and scale heterogeneity. The true values for the parameters are: β 0 = 0, β 1 = 0.5, β 2 = 0.2, α 0 = −2.3, τ 0 = −0.61, τ 1 = 0.1, τ 2 = 0.1, and σ ω = 0.45. We then fit a mixed-effects location scale model to each simulated dataset which matches the true model specification presented above. We save the results of each stage 2 model (which ignores the random-scale effects) and each stage 3 model (which includes the random-scale effects). Lastly we compare the two sets of model results, averaging over the 1000 replications.
We start by clearing any existing data from memory and by specifying an initial value for the random-number seed should we wish to replicate our simulation study at a later date.

. clear . set seed 21561561
We then use the forvalues command to execute the commands enclosed within its braces 1000 times.
save "rep_`r'.dta", replace 36. } (output omitted ) Line 2 of the loop displays the current replication number so that the user can see how far the simulation study has progressed. Line 3 clears any existing data from memory. Line 4 specifies that the new dataset will have 250 records, one for each subject. Line 5 generates a new variable to index the subjects. Line 6 generates the single subject-level covariate. Lines 7 and 8 generate the standardized random-location and random-scale effects, respectively. Line 9 expands the data from one record per subject to ten records per subject, one for each observation. Line 10 generates a new variable to index these observations within each subject. Line 11 generates the single observation-level covariate. Lines 12 and 13 generate the BS standard deviations of the random-location and random-scale effects, respectively. Line 14 generates the WS variance. Line 15 generates the residual errors. Line 16 generates the response variable. Line 17 declares the data to be a panel. Line 18 fits the mixed-effects location scale model to the simulated data. Line 19 clears the data from memory. Line 20 changes the number of records in the dataset to eight, one for each parameter. Line 21 generates a new variable to index these parameters. Lines 22 through 25 create copies of the stage 2 and stage 3 coefficient vectors and their associated variance-covariance matrices. Lines 26 and 27 create column vectors of the stage 2 and stage 3 parameter standard errors as the square roots of the leading diagonals of their variance-covariance matrices. Lines 28 through 31 take the stage 2 and stage 3 column vectors of parameter estimates and standard errors and stores their values as four new variables. Line 32 removes the suffix 1 which is automatically added to the names of the four new variables. Line 33 sets the display precision of the four newly created variables to three decimal places. Line 34 lists the dataset of stage 2 and 3 model results. Line 35 saves these model results to disk, saving the replication number in the filename. Line 36 closes the loop.
Next we clear the data and generate a new variable called rep which will store the replication number.
We then append the 1000 datasets of model results stored on disk together.
. forvalues r = 1/1000 { 2. append using "rep_`r'" 3. replace rep =`r' if rep==. 4. } (output omitted ) Line 2 appends the dataset corresponding to the current replication number. Line 3 replaces missing values of rep with the current replication number in order to index the appended results. Line 4 closes the loop.
Next we generate a new variable called truevalue which holds the true value associated with each parameter . generate truevalue = .
(8000 missing values generated) . replace truevalue = 0.5 if parameter==1 Then we generate the percentage bias for each parameter making sure its display precision is to the nearest percentage point.
. list parameter truevalue b_2 b_3 b_2_bias b_3_bias, noobs abbreviate (9)    The stage 3 averaged parameter estimates (stored in b_3) lie very close to their true values (truevalue) and this is confirmed by their small percentage biases (b_3_bias). The stage 2 averaged parameter estimates (b_2) also lie very close to their true values, except for the intercept of the WS variance function, τ 0 , which is biased downwards by 17%. Recall that τ 0 is expected to be lower in the stage 2 model as in that model τ 0 corresponds to the population-averaged intercept while in the stage 3 model τ 0 corresponds to the populationmedian intercept.
Second, we list the averaged standard errors.

Conclusions
We have presented runmixregls, a command to run the MIXREGLS software for fitting mixed-effects location scale models seamlessly from within Stata. There are notable advantages of running MIXREGLS in this way. First, runmixregls provides a simple and intuitive command syntax to allow users to specify the full range of modeling and estimation options implemented in MIXREGLS. Second, prior to fitting a particular model, users can take advantage of Stata's excellent statistics, graphics and data management commands to prepare and descriptively analyze their data. Third, after fitting the model, users can apply Stata's comprehensive hypothesis testing, model comparison, prediction and plotting facilities to interpret their models and to communicate their results. Last, all analyses can be reproduced and documented for publication and review by typing all these commands into a file and running them all at once. We illustrated all this functionality by replicating and extending the two example analyses presented by Hedeker and Nordgren (2013). We also performed a simulation study, using runmixregls, to demonstrate a new and important research finding. Namely, that ignoring the random-scale effect in the WS variance function will lead to the regression coefficients in this function to be estimated with spurious precision, especially the regression coefficients of subject-level covariates.
The annotated Stata do-file which accompanies this article allows readers to replicate the presented analyses. A complete description of the command syntax and all modeling and estimation options is provided in the help file (Leckie 2013). Further information and resources can be found on the runmixregls web site (http://www.bristol.ac.uk/cmm/software/ runmixregls/).