2013). runmlwin: A Program to Run the MLwiN Multilevel Modeling Software from within Stata.

We illustrate how to ﬁt multilevel models in the MLwiN package seamlessly from within Stata using the Stata program runmlwin . We argue that using MLwiN and Stata in combination allows researchers to capitalize on the best features of both packages. We provide examples of how to use runmlwin to ﬁt continuous, binary, ordinal, nominal and mixed response multilevel models by both maximum likelihood and Markov chain Monte Carlo estimation.


Introduction
Multilevel models are becoming increasingly popular in the social, behavioral and medical sciences.These models are described variously as multilevel models, random effects models, mixed models, and hierarchical linear models.A range of text books provide introductory (Hox 2010), intermediate (Raudenbush and Bryk 2002;Snijders and Bosker 2012) and advanced (Goldstein 2011) treatment of these models while two recent handbooks describe the latest developments (de Leeuw and Meijer 2008;Hox and Roberts 2010).

MLwiN software
MLwiN is a specialized software package for fitting multilevel models (Rasbash, Charlton, Browne, Healy, and Cameron 2009).At the time of going to press, the most recent version of MLwiN is version 2.26 and our discussion relates to this version.MLwiN can estimate multilevel models for continuous, binary (logistic, probit, complementary log-log), count in the comprehensive Stata manual series (http://www.stata-press.com/manuals/),while dedicated books on statistical modeling, graphics, data management and programming in Stata are published in the Stata Press series (http://www.stata-press.com/books/).
Stata provides the xtmixed, xtmelogit and xtmepoisson commands for fitting multilevel continuous, binary (logistic only) and count (Poisson only) response models, respectively.However, we if all that is required are two-level random intercept models, then the simpler and computationally faster xtreg, xtlogit and xtpoisson commands can be employed, respectively.An excellent introduction to all these commands is provided in Rabe-Hesketh and Skrondal (2012a) and Rabe-Hesketh and Skrondal (2012b).These commands, however, cannot fit multilevel binary probit models or multilevel negative binomial count models.There is also no provision for fitting multilevel ordinal and nominal response models.Estimation is by maximum likelihood; there is no provision for Bayesian estimation.While these three commands offer some provision for fitting cross-classified models, the implementation of these models as constrained hierarchical models is computationally inefficient and only proves feasible in models with few random effects fitted to small datasets.Stata also cannot fit many of the more advanced multilevel models available in MLwiN including multilevel multivariate mixed response type models, multilevel multiple membership models, multilevel spatial models, multilevel measurement error models and multilevel multiple imputation models.
Stata is also a programming language, and this has led many users to write their own add-on packages.In terms of multilevel analysis, the most prominent of these is the gllamm command (Rabe-Hesketh, Skrondal, and Pickles 2004).This command allows one to fit a much wider range of multilevel models than that provided by Stata's own commands, including some models which can also not be fitted in MLwiN.This command can also fit many latent variable models including structural equation and latent class models.An important disadvantage of gllamm is that it is computationally slow, even for simple models such as those that can be fitted by Stata's own commands.Using gllamm to fit realistically complex multilevel models with many random effects to large datasets can prove prohibitively time consuming.
There are several other user-written commands for running external software packages to perform multilevel modeling within Stata.These include: sabrestata for running the Sabre package for fast maximum likelihood estimation of multiprocess event/response sequences (Crouchley, Stott, and Pritchard 2009); hlm for running the hierarchical linear modeling (HLM) software (Raudenbush, Bryk, and Congdon 2012); realcomimpute for running the REALCOM-IMPUTE software for multilevel multiple imputation (Carpenter, Goldstein, and Kenward 2011); runmplus for running the MPlus software for multilevel structural equation modeling (Muthén and Muthén 2012); and winbugs (Thompson, Palmer, and Moreno 2006) for Bayesian model fitting in the WinBUGS package (Spiegelhalter, Thomas, Best, and Lunn 2003).We shall not review these commands further here.

The runmlwin command
We have developed the runmlwin command to be able to fit the full range of multilevel models available in MLwiN seamlessly from within Stata.The command allows the user to fit models by both the IGLS and MCMC algorithms and provides full control over all aspects of model specification and estimation.The runmlwin command works by writing, sending and then running an MLwiN macro file for the specified model in MLwiN and then returns, stores and displays the model results in Stata at which point the user can take full advantage of Stata's many hypothesis testing, model comparison, graphics and other post-estimation commands to aid their interpretation and analysis.The runmlwin command requires Stata 9 or later and can be downloaded and installed from within Stata by issuing the following command .ssc install runmlwin When using runmlwin for the first time, users must specify the full path name of the MLwiN executable using a global macro called MLwiN_path.Users must substitute the path name that is correct for them .global MLwiN_path "C:\Program Files\MLwiN v2.26\i386\mlwin.exe" The rest of this article takes a practical approach to describing runmlwin rather than a formal description of the command's syntax.Readers seeking the latter should consult the runmlwin help file (Leckie and Charlton 2013).

. help runmlwin
We shall present worked examples based on two datasets analyzed in the MLwiN User and MCMC Manuals.Section 2 considers multilevel continuous response models while Section 3 considers multilevel binary response models.In these two sections we focus on relatively simple models which can be fitted by both runmlwin and Stata's own multilevel modeling commands xtmixed and xtmelogit.We do this to make our description of runmlwin as accessible as possible to both existing MLwiN and Stata users but also to readers who currently use other software.A Stata do-file to replicate all analyzes is provided in the Supplementary materials.Section 4 demonstrates a selection of more advanced models that can only be fitted by runmlwin.Section 5 concludes.

Continuous response models
In this section we consider how to fit two-level random intercept and random slope models to continuous response variables.We use the London schools' exam scores data analyzed in Chapters 2-7 of the MLwiN User Manual.
The data are a sub-sample of examination results from six inner London Education Authorities (school districts).The examination is the General Certificate of Secondary Education (GCSE) taken at the end of compulsory schooling, normally when students are 16 years of age.The continuous response variable is student's exam score.A key aim of the original analysis was to establish whether some schools were more effective than others in promoting students' learning and development, taking account of variations in the characteristics of students when they started secondary school (Goldstein, Rasbash, Yang, Woodhouse, Pan, Nuttall, and Thomas 1993).The data have a two-level hierarchical structure, with 4,059 students (level 1) nested within 65 schools (level 2).The data can be read into Stata by running the following command .use "http://www.bristol.ac.uk/cmm/media/runmlwin/tutorial.dta",clear The data include the following variables school: school identifier; student: student identifier; normexam: student's exam score at age 16, normalized to have a standard normal distribution; cons: equal to the number 1 for each student and is used as the intercept term in the models; standlrt: student's score at age 11 on the London reading test (LRT), standardized; girl: student's gender (0: boy; 1: girl); schgend: school's gender (1: mixed school; 2: boys' school; 3: girls' school).

Variance components model
The simplest multilevel model we might consider for these data is a two-level students-withinschools random intercept model where we make no adjustments for predictor variables.This model is referred to as a variance components model as it decomposes the response variation into separate level-specific variance components.The model is written as where normexam ij is the age 16 exam score for student i in school j, β 0 is the intercept measuring the mean score across all schools, u j is a school level random effect, and e ij is a student level residual error.The u j and e ij are assumed independent of one another and normally distributed with zero means and constant variances σ 2 u and σ 2 e .Multilevel models are typically described in terms of their fixed part (that part involving parameters fixed across the whole sample) and their random part (that part involving the random effects and residual error).In this very simple model, the fixed part is simply β 0 , while the random part is u j +e ij .
The degree of clustering (dependence) in the data can be summarized by the intraclass correlation coefficient (ICC) and the variance partition coefficient (VPC).In this simple model, the formulas for these two coefficients coincide σ 2 u / σ 2 u + σ 2 e .The ICC measures the expected correlation between two students from the same school, while the VPC measures the proportion of total variance which lies at the school level.
When specifying the runmlwin syntax to fit this model, it is helpful to think of each right hand side term in the above model equation as being associated with an explanatory variable which is equal to the number 1 for all observations in the data.In our data the variable cons plays this role and so we can rewrite the model equation as where cons appears once in the fixed part of the model, once in the random part of the model at level 2 and once in the random part of the model at level 1.The runmlwin command for fitting this model is then .runmlwin normexam cons, level2(school: cons) level1(student: cons) While the structure of this syntax will be transparent to Stata users, it will be less clear for readers unaccustomed to Stata and so we shall discuss it in some detail.The syntax begins with the command runmlwin and then defines the response variable normexam followed by the list of predictors included in the fixed part of the model, here only cons.Next appears a comma which denotes the end of the fixed part of the model and the start of the random part of the model.The first term after the comma, level2(school: cons), specifies the random part of the model at level 2. Within the parentheses, the level identifier, school, is specified before the colon while all variables made random at level 2, here only cons, are specified after the colon.The second term after the comma, level1(student: cons), specifies the random part of the model at level 1. Again the level identifier, here student, is specified before the colon while all variables made random at this level, here only cons, are specified after the colon.
Running the above command fits the specified model by writing, sending and then running an MLwiN macro file in MLwiN.The MLwiN software automatically opens and displays the specified model in the Equations window using standard mathematical notation (Figure 1).At this point the macro pauses and the user is presented with a choice to either click the Resume macro button to fit the model or to click the Abort macro button to terminate the macro.Users should click the Abort macro button if they wish to use MLwiN interactively prior to manually fitting their model or if the model displayed in the Equations window does not match the desired model.At this point we again emphasize the pedagogic value of the MLwiN Equations window as an aid to better understanding and interpreting the statistical models being specified and fitted.
The first table of output summarizes the multilevel structure of the data and informs us that the model is fitted to 65 schools where the number of students per school ranges from 2 to 198 with an average of 62.4 students.(This table plays the role of the MLwiN Hierarchy Viewer window.)The next block of output reports that runmlwin took 1.28 seconds and 3 iterations of the IGLS algorithm to fit the model.The log-likelihood and deviance statistics are also presented.
The second table of output reports the fixed part parameters.The intercept β 0 is estimated to be −0.013 with standard error 0.054.Recall that the response is approximately normalized and so an estimated intercept of effectively zero is expected.The z ratio, p value and 95% confidence intervals are also reported, but in this example are of little substantive interest.The final table of output reports the random part parameters.The between-school variance σ 2 u is estimated to be 0.169 while the within-school variance σ 2 e is estimated to be 0.848.We can calculate the ICC using the display command, where we refer to the parameter estimates by their internal runmlwin names: σ 2 u is referred to as [RP2]var(cons) while σ 2 e is referred to as [RP1]var(cons).(The display command corresponds to the CALC macro in MLwiN.) .

display [RP2]var(cons)/([RP2]var(cons) + [RP1]var(cons))
.16590652 Interpreted as an ICC, the correlation in exam scores between schoolmates is 0.166.Interpreted as a VPC, 16.6% of the variation in exam scores lies between schools.There are clearly substantial differences in students' scores between schools.
Following Stata convention, standard errors and 95% confidence intervals are presented for the two variance components, however we caution against using them to test whether the between school differences are statistically significant.This is because z -tests and 95% confidence intervals assume asymptotic normal sampling distributions while variances are known to have positively skewed sampling distributions.The preferred way to test the between-school variance is to use Stata's lrtest command to perform a likelihood ratio test to compare the above model to a single-level model with no school effects (not shown).(The lrtest command corresponds to the MLwiN Tail Areas window.)Doing this confirms that there are indeed significant differences between schools (χ 2 1 = 498.72,p < 0.001); students from the same school are therefore significantly more alike than students from different schools.A multilevel approach to analyze the data is clearly favored over a single-level approach.When we ran the above runmlwin command, we clicked the Resume macro button twice in order to return the model results to Stata.We can add the nopause option to prevent runmlwin from pausing in this way.When we do this the results are automatically returned to Stata.This option proves essential when performing simulation studies using runmlwin.

Random intercept model with predictor variables
In our next model, we adjust for students' age 11 scores, student gender and school gender.We adjust for school gender by including a boys' school dummy variable and a girls' school dummy variable (the reference category is mixed sex schools).We create these variables in Stata with the generate command.(The generate command corresponds to the CALC macro in MLwiN.) . generate boysch = (schgend==2) .generate girlsch = (schgend==3) The model is written as To fit this model using runmlwin we simply add the four new predictors to the list of variables that appear in the runmlwin syntax for fixed part of the model.
When we fit multilevel models, it is often of interest to examine empirical Bayes estimates of the random effects to assess whether the random effects are (approximately) normally distributed and to make inferences about specific groups.We can retrieve the empirical Bayes estimates of the school random effects by specifying the level 2 random part of the model as level2(school: cons, residuals(u)) where the term residuals(u) is an option which only applies to this part of the model.The option requests that the random effects' estimates and standard errors are returned to Stata as two new variables u0 and u0se where the user can change the variable stub u by specifying a different string within residuals().All the functionality of the MLwiN Residuals window (Setting tab) is contained within this residuals() option.We spread the runmlwin command for this model over three lines using three slashes /// to denote line continuation.We specify the nogroup option to suppress the table summarizing the multilevel structure of the estimation sample as this is the same as before.
. test boysch = girlsch These results suggest that we might constrain the boys' school and girls' school effects to be the same and we could do this by specifying the constraints() option.(It is not possible to specify constraints when using Stata's xtmixed, xtmelogit and xtmepoisson commands.)Equivalently, we can remove the boys' and girls' school dummy variables and replace them with a single-sex school dummy variable.Either way, the single-sex school effect is estimated (results not shown) to be 0.165 and significant (z = 2.17, p = 0.030).Thus, students in single-sex schools score 0.165 standard deviations higher than students of the same sex in mixed sex-schools.When we ran the above runmlwin command, two new variables, u0 and u0se, appeared in the Stata Variables window.These store the empirical Bayes estimates and standard errors of the school random effects, respectively.While there are only 65 schools, these two variables have 4,059 observations, one for each student in the data.The estimate and standard error for each school is simply replicated across all the students within each school.In what follows, we want to base our analysis on just one observation per school and so we create a dummy variable to pick one observation per school at random.

. egen pickone_school = tag(school)
Next we use the qnorm graph command to produce quantile-quantile plots to check whether the random effects are normally distributed.If the random effects are normally distributed, all the data should be plotted along the 45 degree line.We use the aspectratio(1) option to specify a one-to-one relationship between the height and the width of the graph's plot region.If desired, further options can be added to alter the axes titles and the spacing of the ticks on the axes.All the graphical functionality of the MLwiN Residuals window can be replicated using Stata's many graph commands.
. qnorm u0 if pickone_school==1, aspectratio(1) The resulting graph is shown in Figure 3.While the 65 schools do not all lie on the line, they all lie close to the line suggesting that the predicted effects are approximately normally distributed.Next we shall produce a 'caterpillar plot' of the school effects.The school effects are plotted in ascending rank order and presented with 95% confidence intervals so we can examine those schools which differ significantly from the average school.First we use the egen command and the rank() function to rank the school effects by the magnitude of their predicted effects.

. egen u0rank = rank(u0) if pickone_school==1
We then use the serrbar graph command to produce the caterpillar plot.The first variable after the command must contain the predicted effects, the second the associated standard errors and the third the rank of the predicted effects.We use the scale(1.96)option to obtain 95% confidence limits and the yline(0) option to plot a horizontal line at zero which represents the average school in the data.Again, many further options can be added to improve the look and feel of this graph.
. serrbar u0 u0se u0rank if pickone_school==1, scale(1.96)yline(0) The resulting graph is shown in Figure 4.The plot shows that approximately two-thirds of schools cannot be distinguished from the overall average.Using Stata's count command we see that 12 schools are significantly less effective than average, while 11 schools are significantly more effective than average.
We finish our analysis of this model by dropping u0 and u0se from the data and then storing the model results, naming them as model2, as we shall refer back to them later.

Random slope model
Model 3 is a two-level random slope model where we allow the coefficient of standlrt ij to vary randomly across schools.Thus, we allow the relationship between age 11 and age 16 scores to be potentially stronger in some schools than others.

. lrtest model2 model3
Likelihood-ratio test LR chi2(2) = 44.31(Assumption: model2 nested in model3) Prob > chi2 = 0.0000 To get a better sense of the variability in the school intercepts and slopes we can use the generate command, referring once again to the parameter estimates by their internal runmlwin names, to calculate the predicted age 16 score for each student based only on their age 11 scores.All the functionality of the MLwiN Predictions window can be replicated in this way.
We then plot these predicted age 16 scores against students' age 11 scores to graphically illustrate the extent to which schools differ from one another.All the functionality of the MLwiN Customized Graphs window can be replicated using Stata's many graph commands.
. sort school standlrt .line ypred standlrt, connect(a) The resulting graph is shown in Figure 5.The school lines fan outwards and so the differences between schools appear more pronounced for students with high age 11 scores than they do for students with low age 11 scores.
The exact relationship for how the between school variance changes as a function of age 11 scores is given by the level 2 variance function.
We can use the generate command to calculate the predicted between school variance for each student.All the functionality of the MLwiN Variance window can be replicated in this way.
.  In this section we have focused on using runmlwin to fit two-level models for continuous responses.As with xtmixed, runmlwin can also fit continuous response multilevel models with three or more levels, cross-classifications and sampling weights.However, unlike xtmixed, runmlwin can additionally fit continuous response multilevel models with multivariate responses, multiple membership structures, complex level 1 variance functions (heteroskedasticity), and linear constraints.

Binary response models
In this section we consider how to fit two-level random intercept and random slope models to binary response variables.We use the Bangladeshi contraceptive use data analyzed in Chapter 9 of the MLwiN User Manual and Chapter 10 of the MLwiN MCMC Manual.Before we describe these data, we first discuss estimation of discrete response multilevel models.The likelihood of the observed data in discrete response multilevel models does not generally have a closed-form expression and so approximate methods of estimation must be used.MLwiN offers two approximate methods: quasi-likelihood and MCMC methods.
The idea behind quasi-likelihood methods is to approximate discrete response multilevel models as continuous response multilevel models so that the standard IGLS algorithm can be applied.MLwiN provides four quasi-likelihood methods: first and second order marginal quasi-likelihood (MQL1, MQL2) and first and second order penalized quasi-likelihood (PQL1, PQL2).All four methods have been shown to be biased, particularly if sample sizes within level 2 units are small or the response proportion is extreme.These methods should therefore only be used for model exploration; any final model should be fitted by MCMC.PQL2 is the most accurate of the four quasi-likelihood methods, but the least stable and the slowest to converge while MQL1 is the least accurate, but the most stable and fastest to converge.Thus, when fitting exploratory models, MLwiN users will often first fit their models by MQL1 and then use these estimates as starting values for running additional iterations by PQL2.The MLwiN User Manual provides an accessible introduction to quasi-likelihood methods, explained in the context of the MLwiN software.Technical details are given in Goldstein (2011).
MCMC is a simulation approach.After assigning starting values (e.g., the quasi-likelihood estimates) and prior distributions for the model parameters, a Markov chain is used to sequentially sample subsets of parameters from their conditional posterior distributions given current values of the other parameters.(We do not simulate directly from the joint posterior as it is generally intractable.)After an initial burn-in period, when the chain converges to its stationary distribution, the chain is run for a further monitoring period.The means and standard deviations of the sampled parameters from the monitoring period are used as parameter estimates and standard errors while the 2.5th and 97.5th quantiles of these chains provide Bayesian 95% credible intervals, analogous to 95% confidence intervals.When vague or diffuse prior distributions are specified (the default in MLwiN), the parameter estimates are essentially maximum likelihood estimates.The MLwiN MCMC Manual provides an accessible introduction to MCMC methods, in the context of the MLwiN software.Technical details are given in Goldstein (2011).The data are a sub-sample from the 1989 Bangladesh Fertility Survey (Huq and Cleland 1990).The binary response variable that we consider refers to whether a woman was using contraception at the time of the survey.The full sample was analyzed in Amin, Diamond, and Steele (1997), but with a four-category response that distinguishes between different types of contraceptive method.We shall present ordinal and nominal models for this response in Section 4. The aim of this analysis is to identify factors associated with use of contraception and to examine the extent of between-district variation in contraceptive use.The data have a two-level hierarchical structure, with 2,867 women (level 1) nested within 60 districts (level 2).

Variance components model
The two-level women-within-districts variance components model for these data is written as where use ij is the binary response for whether woman i in district j uses contraception, β 0 is the intercept measuring the log-odds of using contraception in the average district and u j is a district level random effect assumed normally distributed with zero mean and a constant variance σ 2 u .No woman-level residual error appears in the linear predictor.We can fit this model using runmlwin and the discrete() option where we specify the distribution(binomial), link(logit) and denominator(cons) sub options to request a binomial response model with a logit link function and a denominator (number of binomial trials) equal to one for each observation.We specify the pql2 sub option to fit the model by PQL2 as opposed to default estimation by MQL1.In contrast to continuous response models, we do not make cons random at level 1 as the model does not include a woman-level residual error.
In the fixed part parameters table, the first and second columns of numbers present the means (parameter estimates) and standard deviations (standard errors) of the parameter posterior distributions.In contrast to model output after fitting by IGLS, the third and fourth columns do not present the usual z ratios and p values as MCMC does not assume that the parameters follow asymptotic normal sampling distributions.Instead, column three presents the effective sample size (ESS) for each parameter, an estimate of the equivalent number of independent iterations that the chain represents.The ESS will typically be less than the number of actual iterations because the chains are positively autocorrelated (they are Markov chains).Column four presents one-tailed p values based on the posterior distributions.For a positive estimate, the p value is the proportion of that parameter's posterior distribution that is below zero.For a negative estimate, the p value is the proportion of that parameter's posterior distribution that is above zero.The fifth and sixth columns give the 2.5th and 97.5th quantiles of the posterior distribution, resulting in 95% Bayesian credible intervals.
The intercept β 0 is estimated to be -0.508 and is interpreted as the log-odds of using contraception in the average district.The corresponding odds and probability of using contraception are derived as exp (β 0 ) and exp (β 0 )/ {1 + exp (β 0 )} and are estimated to be 0.602 and 0.375, respectively.Thus, almost 40% of the women use some method of contraception.

. display exp([FP1]cons)/(1 + exp([FP1]cons))
.375707 The ESS is 133, implying that the sample of 5,000 values is only equivalent to only 133 independent iterations.We might therefore choose to refit the model specifying a longer burn-in and monitoring period.In the random part parameters table, the between-district variance σ 2 u is estimated to be 0.285.The ESS for this parameter is 704 and so its parameter chain is less autocorrelated than the intercept's parameter chain.In Section 1, we discussed the ICC and VPC statistics for measuring clustering or dependence in continuous response data.In the case of binary and other discrete response models, there is no single ICC or VPC value as the level 1 variance is a function of the mean.A popular solution is to formulate the model in terms of a continuous latent response variable which underlies the observed binary response.It can then be shown that the ICC and VPC statistics, in terms of the underlying latent response, are calculated as σ 2 u / σ 2 u + π 2 /3 .In our case, the latent variable underlying women's observed behavior would be interpreted as their propensity to use contraception. .

display [RP2]var(cons)/([RP2]var(cons) + (_pi^2)/3)
.0798183 Thus, the expected correlation in the propensity to use contraception between two women from the same district is just 0.080.Interpreted as a VPC we would say that 8% of the variation in womens' propensity to use contraception lies between districts.In addition to runmlwin, we have also developed a second command mcmcsum (distributed with runmlwin) to calculate more detailed graphical and statistical MCMC diagnostics for the parameter chains.The mcmcsum command replicates all the functionality of the MLwiN Trajectories window.For example, specifying the mcmcsum command's trajectories option produces trajectory plots (trace plots) of the deviance statistic and each model parameter.The resulting graph is shown in Figure 7.

. mcmcsum, trajectories
We can examine a specific parameter chain in detail by listing it after the mcmcsum command and then specifying the fiveway option to produce a five-way MCMC graphical diagnostic plot.We do this below for the between-district variance σ 2 u . .

mcmcsum [RP2]var(cons), fiveway
The resulting graph is shown in Figure 8.On the first row, the left panel plots the trace of the chain while the right panel plots a smoothed histogram of the chain.The smoothed histogram shows the posterior distribution of the between-district variance to be positively skewed.On the second row, the left panel is an auto-correlation function (ACF) plot showing the autocorrelation between iteration t and t − k while the right panel is a partial autocorrelation function (PACF) showing the autocorrelation between iteration t and t−k, having accounted for iterations t − 1, . . ., t − (k − 1).The less correlated the chain the better.Here the chain looks moderately correlated and we may wish to run the chain for longer.The  single graph on the third row plots the Monte Carlo standard error (MCSE), an indication of how much error is in the mean estimate due to the fact that MCMC is used.As the number of iterations increases the MCSE tends to 0. The MCSE can be used to calculate how long to run the chain to achieve a mean estimate with a particular desired MCSE.Full details are provided in the MLwiN MCMC Manual.Specifying the mcmcsum command for a specific parameter with no options gives an expanded set of summary statistics for that parameter.These include the mode, median and other quantiles of the posterior distribution along with the Brooks-Draper and Raftery-Lewis statistics for helping to judge burn-in length.Full details are provided in the MLwiN MCMC Manual. .

Random intercept model with predictor variables
In our next model we adjust for woman's age, number of living children, whether the woman lives in an urban or rural area of residence, education status and religion.
The number of living children and women's education levels are both entered into the model as series of dummy variables.We start by generating these dummy variables .generate lc1 = (lc==1) .generate lc2 = (lc==2) .generate lc3plus = (lc==3) .generate edlprim = (educ==2) .generate eduprim = (educ==3) .generate edsecprim = (educ==4) Next, we fit the model by MCMC, where we first fit the model by PQL2 to obtain sensible starting values for the model parameters.We prefix the runmlwin command with the quietly command to suppress the PQL2 model output.In the MCMC model, we specify the burnin(1000) and chain(10000) sub options within mcmc() to illustrate how to manually specify the burn-in and monitoring period.
In the random part of the model the between-district variance is now estimated to be 0.258 compared to 0.285 in the variance components model.The ICC and VPC are therefore now lower than they were previously suggesting that even though all the predictors are level 1 variables, they have explained away relatively more variation between districts than within districts.There is clearly substantial district variation in women's background characteristics across the sample.

Random slope models
In our next model we introduce a random coefficient to allow the magnitude of the urban-rural differential to vary across districts.
To fit this model using runmlwin, we simply add urban to the list of variables made random at level 2. We specify orthogonal within mcmc() to reparameterize the model using orthogonal fixed effects.This reparameterization tends to make the MCMC chains less autocorrelated and so we can typically run the model for fewer iterations to achieve the same ESS for each parameter.Indeed the ESS for the fixed part parameters are mostly higher than they were in the previous model where we ran the monitoring period for twice as many iterations.
. These results show that there is greater district-level variation in the probability of using contraception in rural areas than in urban areas.

display [RP2]var(cons) + 2*[RP2]cov(cons\urban) + [RP2]var(urban)
.24103108 In this section we have focused on using the runmlwin command to fit two-level logistic models for binary responses.As with xtmelogit, we can extend these models to include three or more levels, cross-classifications, and binomial (proportion) responses.However, unlike xtmelogit, runmlwin can additionally fit models with different link functions (probit and complementary log-log) and can also fit multivariate binary and binomial response models.

More advanced models
In this section we briefly consider how to fit three more advanced multilevel models using runmlwin: a multilevel multivariate mixed response type model, a multilevel ordinal response model and a multilevel nominal response model.These models cannot be fitted using Stata's multilevel modeling commands (xtmixed, xtmelogit and xtmepoisson).For simplicity we continue to analyze the education and demography example datasets introduced in the last two sections.

Multivariate mixed response type model
In this example we fit a multivariate mixed response type model to the London exam scores data.Specifically, we fit a two-level students-within-schools bivariate response model for students' age 11 scores standlrt and a dichotomized version of students' age 16 scores pass which equals 1 if normexam is positive and 0 otherwise.The model is a random intercept model where we only adjust for student gender.We formulate the binary response in terms of an underlying latent response pass * ij which can be interpreted as the propensity with which student i in school j passes their exam.
Figure 9 illustrates how the predicted probability of using each method of contraception changes by woman's age and whether that woman lives in a rural or urban area.(Predictions are calculated for a woman with two living children living in the average state).Figure 10 illustrates how the predicted probability of using each method of contraception changes by the number of living children a woman has, whether that woman lives in a rural or urban area and by which district the women lives in.(Predictions are calculated for a 30 year-old woman).
Figure 9 suggests that the probability of not using contraception increases with age.Among the three alternative methods of contraception, the probability of using traditional methods appears to decrease with age while the probability of using modern reversible method and particularly sterilization increases with age.The figure also suggests that the probability of using no contraception is much lower in urban areas than it is in rural areas while the probability of using traditional method is much higher in urban areas than it is in rural areas.The probabilities of using modern reversible methods and sterilization appear similar for women living in rural and urban areas.
Figure 10 suggests that the probability of not using contraception is highest for childless women, reduces for women with one child and is slightly lower again for women with two or more children.The probability of using each of the three different methods of contraception are all substantially higher for women with living children, particularly for those women with two or three living children, than they are for childless women.As in Figure 9, we see that women living in urban areas are much more likely to use contraception than women living in rural areas.Plotting the district specific probabilities of using each method of contraception, illustrates the substantial heterogeneity in woman's contraceptive use between districts; in some districts women are far more likely to use contraception than in other districts.
Finally, in the random part parameters table of the model output, the random effects correlations are all positive suggesting that districts with high (low) use of one type of method relative to using no contraception also tend to have high (low) use of other methods relative to using no contraception.The highest correlation is between the use of sterilization and the use of modern reversible methods which is expected as both of these types of method are promoted by family planning programs.

Conclusions
We have illustrated how to run MLwiN seamlessly from within Stata using the runmlwin command.The runmlwin command allows all models fitted in MLwiN to now be fitted within Stata and therefore greatly expands the range of multilevel modeling options available to Stata users.We note that where there is overlap with Stata's existing commands, runmlwin will often fit these models considerably more quickly.A very important advantage of operating MLwiN via runmlwin as opposed to through its point-and-click interface is that this makes carrying out reproducible and fully documented analyzes extremely easy.The annotated dofile which accompanies this article and replicates all of the presented Stata output is a case in point.More broadly, running MLwiN from within Stata allows users to take advantage of Stata's excellent statistics, graphics and data management commands to prepare and descriptively analyze their multilevel datasets.After fitting a model by runmlwin, users can apply Stata's comprehensive hypothesis testing, model comparison, and multilevel graphics facilities to aid model interpretation and to communicate their results.
We have created a dedicated runmlwin web site to provide users with further information and resources (http://www.bristol.ac.uk/cmm/software/runmlwin/).Readers can learn more about the wide range of models which can be fitted by downloading the different conference presentations we have given.Readers seeking more detail can then download Stata do-files and log-files which allow them to replicate every example presented in the two MLwiN manuals.
A complete description of the command syntax and every modeling and estimation option is provided in the 28-page help file (Leckie and Charlton 2013).Finally, the website provides an active user forum where we encourage users to post their queries.

Figure 1 :
Figure 1: MLwiN software: The Equations window shows the specified model.

Figure 2 :
Figure 2: MLwiN software: The Equations window shows the parameter estimates, standard errors and deviance statistics for the converged model.

Figure 3 :Figure 4 :
Figure 3: Quantile-quantile plot of the school effects plotted in Stata.

Figure 9 :
Figure 9: Category probabilities against age plotted in Stata.

Figure 10 :
Figure 10: Category probabilities against number of living children plotted in Stata.
Fitting this model first by IGLS MQL1 and then by MCMC produces