Hierarchical Generalized Linear Models: The R Package HGLMMM

The R package HGLMMM has been developed to ﬁt generalized linear models with random eﬀects using the h-likelihood approach. The response variable is allowed to follow a binomial, Poisson, Gaussian or gamma distribution. The distribution of random eﬀects can be speciﬁed as Gaussian, gamma, inverse-gamma or beta. Complex structures as multi-membership design or multilevel designs can be handled. Further, dispersion parameters of random components and the residual dispersion (overdispersion) can be modeled as a function of covariates. Overdispersion parameter can be ﬁxed or estimated. Fixed eﬀects in the mean structure can be estimated using extended likelihood or a ﬁrst order Laplace approximation to the marginal likelihood. Dispersion parameters are estimated using ﬁrst order adjusted proﬁle likelihood.


Introduction
In Nelder and Wedderburn (1972) the class of generalized linear models (GLM) was developed. This class of models allows for the response to follow a distribution from the exponential family, extending modeling capabilities beyond the Gaussian response. In Henderson, Kempthorne, Searle, and Krosigk (1959) the linear mixed model was suggested, which enabled to model correlation in the data. Further, it was extended to the generalized linear mixed model (see e.g., Molenberghs and Verbeke 2005), where the response from an exponential family is combined with normal random effects. In Lee and Nelder (1996) hierarchical generalized linear models were described, which allows random effects to be not normally distributed. Further Lee and Nelder (1996) proposed the extended likelihood rather than the marginal likelihood to estimate the parameters. Later Lee and Nelder (2001) focused on the joint modelling of the mean and dispersion structure. This estimation technique relies on the iterative weighted least squares (IWLS) algorithm, where fixed effects and random effects are estimated using the extended likelihood, and dispersion parameters are obtained by maximizing the adjusted profile likelihood. A subsequent adjustment of the algorithm was proposed in Noh and Lee (2007) replacing the extended likelihood by the first order adjusted profile likelihood as a criterion to estimate fixed effects in the mean structure.
The objective of this paper is to present the R (R Development Core Team 2011) package HGLMMM available from the Comprehensive R Archive Network at http://CRAN. R-project.org/package=HGLMMM. The package runs under R version 2.9.0 or higher. This package fits the class of generalized linear models with random effects. In the remainder of the paper we first outline the h-likelihood approach to the estimation and statistical inference. Next, we present the capabilities of the HGLMMM package on real-life datasets, described in Lee, Nelder, and Pawitan (2006).

H-likelihood estimation and inference framework
Standard maximum likelihood estimation for models with random effects is based on the marginal likelihood as objective function. The parameters are estimated by a marginal likelihood procedure (MML) and their standard errors are computed from the inverse of the negative hessian matrix of the marginal likelihood. In the marginal likelihood approach random effects v are integrated out and only fixed effects in the mean structure β and dispersion parameters λ are retained in the maximized function. For a mixed effects model the conditional likelihood of the j-th (j = 1, . . . , n i ) repeated observation on the i-th subject (i = 1, . . . , N ), i.e., y ij , is given by f β,λ (y ij |v i ). The likelihood of the i-th random effect is denoted as f λ (v i ). Note that λ contains dispersion parameters of the random components v i as well as the parameters describing the residual dispersion (overdispersion) of the response y ij . The marginal likelihood maximized in the MML procedure is given by Maximizing L M or equivalently the log-likelihood M = log(L M ) yields consistent estimates of the fixed effects parameters. However, the problem lies in computing the integrated likelihood. This may be a time-consuming task especially for complex models since it needs to be done for each subject and each iteration. Further, if the MLE is determined with a Newton-Raphson procedure then integrals need to be computed also for the first and second derivatives. It is also important to fine tune the likelihood calculations, see e.g., Lesaffre and Spiessens (2001).
In Lee and Nelder (1996) another approach to estimating the parameters was proposed. These authors argued to use the joint likelihood L E for the maximization, which is directly available from the definition of the model. The joint likelihood, called also extended likelihood or hlikelihood, is then maximized jointly with respect to v and β given dispersion parameters λ. At the maximum, standard errors are obtained in the classical way. In the notation of above, the extended likelihood is given by: The logarithm of (2) is called the extended log-likelihood by Lee et al. (2006) and they denoted its logarithm as h = log [L E (β, λ, v|y, v)]. We could say that this extended likelihood reflects the hierarchical character of the data.
In the h-likelihood approach the estimates of dispersion parameters are determined by maximizing the adjusted profile likelihood introduced by Cox and Reid (1987). However, for some models the approach of Lee and Nelder (1996) is not appropriate, therefore Noh and Lee (2007) proposed to replace the joint likelihood as the estimation criterion for β with another adjusted profile likelihood. The outline of this procedure is given in the next section.

Computing marginal MLEs using the h-likelihood approach
In some special cases, i.e., when the random effects are on the canonical scale (see e.g., Lee et al. 2006 pp. 112-114), joint maximization of the extended log-likelihood h with respect to all parameters (β, λ, v 1 , . . . , v N ) is equivalent to maximizing the marginal likelihood with respect to β, λ and taking the empirical Bayes (EB) estimates for v 1 , . . . , v N . But, most often the two maximization procedures are not equivalent. Noh and Lee (2007) suggest in the general case to work with a Laplace approximation to the marginal likelihood (1). Namely, the integral of the function k(x, y) exp [−ng(x, y)] with respect to x can be approximated as follows: withx the value of x that maximizes −g(x, y). This is called the Laplace approximation of the above integral (atx). More information on the Laplace approximation can be found in e.g., Severini (2000, Section 2.11).
Taking in expression (3) k(x, y) = 1; exp [ng(x, y)] = exp [−h(θ, v)] whereby θ = (β, λ); v representing the stacked vector of N random effects and finally n ∂ 2 g(x,y) ∂x ∂x = − ∂ 2 h(θ,v) ∂v ∂v , leads to withv maximizing the extended likelihood for a given (starting) value of β and λ, i.e., v(β, λ). Note that the approximation improves when the number of observations per subject, n i increases, which can be inferred from Severini (2000). Taking the logarithm of the previous expression leads to the adjusted profile (log)-likelihood with D(h, v) = − ∂ 2 h(β,λ,v) ∂v ∂v . The term 'adjusted profile likelihood' is chosen since h(β, λ, v)| v=v is a profile (log)-likelihood of β and λ and the second term in (5) is a correction term to approximate the marginal log-likelihood. The next step in the iterative procedure is to maximize the adjusted profile (log-)likelihood (5) with respect to β. Note that maximizing the profile log-likelihood h(β, λ, v)| v=v to find the MLE of β is not appropriate since this is equivalent to joint maximization of h over β and v which is most often invalid as seen above.
After obtainingβ λ from maximization of (5) for a given dispersion component λ, the estimation algorithm proceeds with estimation of λ. Another adjusted profile likelihood is used as an objective function to findλ.
After substitution of expression (7) into (6) one obtains: with D( M , β) = − ∂ 2 M ∂β ∂β . In the next step, one replaces everywhere the marginal log likelihood M by the adjusted profile log-likelihood p v (h) evaluated in β =β λ . This results in: (9) Finally, in Appendix 4 of Lee and Nelder (2001) it is shown that the sum of the last two terms in the above expression is equal to with dimensions equal to the sum of the dimensions of two adjustment terms in (9). As a result one obtains the following adjusted profile likelihood: The latter adjusted profile likelihood is maximized with respect to λ to obtainλ. Note that this objective function is "focussed" solely on the dispersion parameters. This offers an extension of restricted maximum likelihood (REML) estimation and provides inference for the class of generalized linear mixed models, see Noh and Lee (2007). We show below that in the case of linear mixed models this function is exactly the restricted maximum likelihood.

The linear mixed model case
We illustrate the above estimation framework for the linear mixed model case. While in general the above calculations are approximate, in this case they are exact. Some of the expressions are based on the results shown in Harville (1977). The classical linear mixed effects model assumes: The design matrix X i contains fixed effects for the i-th subject, while Z i is a design matrix for the random effects for the i-th subject. Matrices Σ i and Λ i determine the residual variance of y i and the variance of random effects v i respectively. Note that we have split up λ into λ e and λ v , the dispersion parameters pertaining to the residual variability Σ i and random effects variability Λ i , thereby showing the role of each of the dispersion components. Denote by V the total (over all subjects) marginal variance-covariance matrix V = ZΛZ T + Σ, where X is the design matrix for fixed effects obtained by stacking is the variance covariance matrix of the random effects and Σ = diag(Σ 1 , . . . , Σ N ) is the residual variance covariance matrix.
In this case the adjusted profile likelihood p v (h) becomes: which is the expression for the marginal likelihood of a linear mixed model. Further, the adjusted profile likelihood p v,β (h) is equal to: which is exactly equal to the classical REML likelihood (see e.g., Verbeke and Molenberghs 2000) for the linear mixed model.
Note that maximization of (5) with respect to β and λ is actually the maximization of the marginal likelihood computed by a classical Laplace approximation. In the above h-likelihood procedure, estimation of β given λ is the same as with a classical Laplace approximation, but the estimation of λ is essentially different. The h-likelihood procedure gives an elegant set of IWLS equations for the estimation of the dispersion parameters, which possibly depend on covariates.

Application to the hierarchical generalized linear models
The above theory is applied to generalized linear models with random effects. In this class of models the assumed distribution of the response y ij (conditional on random effects) belongs to the exponential family: This distribution is combined with the distribution of the random component, which distribution belongs to the family of conjugate Bayesian distributions for an exponential family, i.e., which can be expressed as the distribution of a pseudo-response ψ i as follows: In this class of models the response is allowed to follow a Gaussian, binomial, Poisson or gamma distribution. Their corresponding conjugate Bayesian distributions are Gaussian, beta, gamma and inverse-gamma, respectively. Note that both λ v and λ e are allowed to depend on covariates.

Implementation in the HGLMMM package
In this section we document the use of the function HGLMfit, together with accompanying functions HGLMLRTest, HGLMLikeDeriv and BootstrapEnvelopeHGLM. First we will describe the use of the fitting function HGLMfit and explain the parameters used in the invocation of the function. The following structure of the routine is used: DistResp ("Normal"): The distribution of the response as defined in (14) is set by the option DistResp. The user can set it to: Binomial, Normal, Poisson or Gamma. Note that the name of the distribution must start with a capital letter.
DistRand (no default): The next option DistRand specifies the distribution of the random components and should be set as a vector of distribution names from the set: "Beta", "Gamma", "IGamma" (inverse-gamma) and "Normal". For each random component one entry in the vector must be specified. Therefore for a model with two random effects, whereby the first random effect has a normal distribution and the second random effect has a gamma distribution, you should specify the vector c("Normal", "Gamma").
Link (canonical link): The link option is available for a gamma distribution of the response. The choice is either "Log" or "Inverse". For the random variables of a "Binomial", "Normal" or "Poisson" distribution only the default links are currently available, that is "Logit", "Identity" and "Log", respectively.
LapFix (FALSE): Having defined the structure of the model, i.e., (1) the distribution of the response, (2) random effects and (3) the link, one has to specify in the option LapFix whether the joint likelihood (2) will be used to estimate fixed effects in the mean structure of the response model, or the effects will be estimated by the adjusted profile likelihood (5), which is an approximation to the marginal likelihood. Set LapFix = TRUE for the adjusted profile likelihood and LapFix = FALSE for the joint likelihood.
ODEst (likelihood-based analysis), ODEstVal (0): Next the option ODEst determines whether the dispersion parameter λ e in (14) will be estimated or held fixed. This is set to NULL by default which implies for the "Poisson" and "Binomial" distribution for the response that it is fixed. For the "Normal" and "Gamma" response, the default option is that it is estimated from the data. Specifying ODEst = TRUE implies that λ e is estimated, while it is fixed for ODEst = FALSE. Further, the parameters ODEstVal specify either the starting values when λ e is estimated, or the values used in the estimation when λ e is set fixed.
formulaMain: The option formulaMain requires a two-sided formula to be specified to determine the structure of the linear predictor of the response, e.g., Outcome~Fixed.Efffect.1 + Fixed.Effect.2 + (1 | Subject.1). This specification sets Outcome as the response. Further, in the above example it is specified that there are two fixed effects and a random intercept with subject index Subject.1. Correlated random components are currently not allowed, therefore a structure (1 + time | Subject.1) is not valid, instead (1 | Subject.1) + (time | Subject.1) needs to be entered.
formulaOD, formulaRand: Similarly formulaOD specifies the covariates in the residual dispersion (overdispersion) structure by a one sided formula e.g.,~1 + time. Further, formulaRand requires a list of formulas determining the dependence of the dispersion parameters of the random components on covariates. For a model with two random effects the following code list(one =~1 + mean.time, two =~1) means that the dispersion parameter of the first random component depends on the average time, while the dispersion parameter of the second random component is constant (intercept only model).
DataMain, DataRand: The option DataMain determines which data frame to use for the estimation, i.e., where the data for the mean model and residual dispersion is referred to. Likewise DataRand is a list of data frames. They correspond to design matrices for the dispersion parameters of random effects. Therefore for each random effect specified in formulaMain, a corresponding dataframe needs to be included.
Offset (1): In Poisson regression, an offset variable t is specified in the form log(µ/t) = η, where η is a linear predictor and µ a mean modeled in the Poisson regression. Therefore it is not necessary to log-transform t before entering in the model.
BinomialDen (1): In binomial regression one has to use option BinomialDen to specify the denominator for the binomial distribution. Suppose you toss a coin ten times, then you have 10 independent trials, and the denominator should then be 10. It is allowed that each observation has a different denominator.
StartBeta (NULL), StartVs (NULL), StartRGamma (NULL): The three following options allow to specify the starting values for fixed effects in the mean structure StartBeta, random effects in the mean structure StartVs (which is a vector of values for all random components together) and dispersion parameters StartRGamma. Note that starting values for the residual dispersion (overdispersion) are supplied in ODEstVal. Recall that the overdispersion parameter may be fixed or estimated by setting the option ODEst. When starting values are not supplied, for the intercept of the mean structure an appropriate sample mean of the response is used, and zeros are used for the other parameters.
CONV (1e-04): Setting option CONV determines the criterion for convergence, which is computed as the absolute difference between values of all estimated parameters in the previous iteration and in the current iteration.
The function HGLMfit returns an object of class HGLM. We refer to the help file of the package for the documentation of all elements of this list. The function HGLMLikeDeriv takes an object of the class HGLM and returns gradient values for fixed effects in the mean structure and dispersion parameters. The function HGLMLRTest compares two HGLM objects with respect to h-likelihood values, marginal likelihood p v (h) and REML likelihood p β,v (h). Likelihood ratio tests are produced. Finally, the function BootstrapEnvelopeHGLM creates a qq-plot of the standardized deviance residuals for the response together with bootstrap envelopes under the assumption that the given model is correct. If the qq-plot falls within the envelopes we can claim that the assumed distribution of the response is reasonable.
There are many packages/routines/programs for analysis of the hierarchical models. The algorithm based on the h-likelihood approach offers a wider choice for the random effects distributions. Further, dispersion components of random effects as well as an overdispersion parameter can be modeled as a function of covariates. This is combined with relatively modest computational requirements.

Analysis of examples
In this section we will present the analysis of the three data sets, also analyzed in Lee et al. (2006), using the package HGLMMM.

Salamander data
In McCullagh and Nelder (1989) a dataset on salamander mating was presented. The dependent variable represents the success of salamanders mating. There were 20 males and 20 females in each of the 3 experiments coming from the two populations called whiteside (denoted by W) and roughbutt (denoted as R). Salamanders were paired for mating six times with individuals of their own kind and from the other population. In total 360 observations were generated. Denote as µ ijk the probability of successful mating. We will consider a model with crossed-random effects (also known as a multi-membership model) to analyze this dataset. The mean structure has the following expression: where TypeF is the type of female (R or W), TypeM is the type of male (R or W), while v i are the random effects corresponding to males and v j are the random effects corresponding to females. The following analyses were also performed in Lee et al. (2006, pp. 194-195). The program below is used to fit this model. The design matrices for the dispersion components of male and female random effects contain only an intercept and are created by the command below: R> RSal <-data.frame(int = rep(1, 60)) The following program is invoked to perform the analysis: R> modBin <-HGLMfit(DistResp = "Binomial", DistRand = c("Normal", "Normal"), + Link = "Logit", LapFix = TRUE, ODEst = FALSE, ODEstVal = 0, + formulaMain = Mate~TypeF + TypeM + TypeF * The output gives information about the response distribution, the number of random effects, their distribution and the subject index. Further, the link function is reported. Next, information is contained whether overdispersion (residual dispersion) is fixed or estimated. The method of estimating the fixed effects in the regression equation is reported, as well as the model equation, the overdispersion equation and the dispersion equations.
The detailed results of the fit can be obtained by the command:

R> summary(modBin, V = TRUE)
If option V = TRUE is omitted, the results for the random effects are not printed (as in our example). Output for our object modBin is as follows: The above output gives summary of fixed effects in the mean structure.

===== Overdispersion Parameters Fixed ===== Fixed Value (Intercept) 0
The overdispersion parameter is held fixed at 1, which is exp(0). Note that the parameters reported for overdispersion and dispersion pertain to the logarithm of the parameters. The output shown below reports values for the parameters in the dispersion structure of random effects. To obtain the value of λ v one again needs to exponentiate the reported estimate. The h-likelihood should be used to select the random effects, the marginal likelihood is the choice for the mean structure simplification, while REML likelihood can be used for inference on the variance components. We obtain similar estimates as in Lee et al. (2006).

Cake data
The chocolate cakes preparation experiment was conducted at Iowa State College. Three recipes for preparing the batter were compared. Cakes were prepared at 6 different baking temperatures, which ranged from 175 up to 225 degrees Celsius. For each recipe 6 cakes were prepared baked at different temperatures, therefore 18 cakes were baked all together and they are referred to as replication. There were 15 replications, therefore in total 270 cakes were baked. Further, in each replication there are 3 different recipes. To cope with such a design we include two random effects, v i representing the replication, and v ij for the j-th recipe within the replication. Finally e ijk stands for the error term associated with each cake. The linear predictor of the model is defined as follows: whereby i = 1, . . . , 15 refers to the replicate, j = 1, 2, 3 is the recipe and k =, 1 . . . , 6 to the temperature. The analyzes presented can also be found in Lee et al. (2006, pp. 163-166). We will consider two models. In the first model we consider the breaking angle as a normally distributed response, while in the second model we assume that it has a gamma distribution. Further, in both models we assume a normal distribution for both random effects v i and v ij . The following syntax is used for a gamma response model: R> cake$repbatch <-100 * cake$Replicate + cake$Batch R> R1Cake <-data.frame(int = rep(1, 15)) R> R2Cake <-data.frame(int = rep(1, 45)) R> modCake2 <-HGLMfit(DistResp = "Gamma", DistRand = c("Normal", "Normal"), + Link = "Log", LapFix = FALSE, ODEst = TRUE, ODEstVal = 0, + formulaMain = Angle~as.factor(Recipe) + as.factor(Temperature) + + as.factor(Recipe) * as.factor(Temperature) + (1 | Replicate) + + (1 | repbatch), formulaOD =~1, formulaRand = list(one =~1, + two =~1), DataMain = cake, DataRand = list(R1Cake, R2Cake), The normal model yields a (marginal) likelihood value of −819.54. This value is lower than for a gamma model, and thus preferable according to the AIC criterion. Following Lee and Nelder (1998) we create the model checking plots for both models using the following code. First some graphics manipulation parameters are set to control the layout of the graphs: R> op <-par(mfrow = c(2, 2), oma = c(1, 1, 2, 1), mar = c(3, 3, 4, 1) + 1.2) Next we copy the standardized deviance residuals as suggested by Lee et al. (2006, p. 52) from the model object modCake2, and compute the linear predictor of the model, i.e., η together with mean and transformed mean as indicated in Nelder (1990). This is done as follows: R> res <-modCake2$Details$StdDevianceResidualY R> xmat <-model.matrix(~as.factor(Recipe) + as.factor(Temperature) + + as.factor(Recipe) * as.factor(Temperature), data = cake) R> eta <-xmat %*% modCake2$Results$Beta R> mu <-exp(eta) R> mu <-log(mu) After a few steps, standardized deviance residuals and absolute standardized deviance residuals can be plotted against scaled fitted values, together with a loess curve. In addition a qq-plot of these residuals and histogram are created.
Further we fitted a gamma model without interaction recipe j * temp k and compared it with the original model using the likelihood ratio test: R> HGLMLRTest(modCake3, modCake2) The following output is obtained: The chi-squared test statistics with 10 degrees of freedom indicate that there is not enough evidence for the inclusion of the interaction term into the model (p value: 0.5). Therefore the simplified model is our choice. According to the diagnostics plot on Figure 2 the model seems to be fitting well. Lee et al. (2006) analyzed this data using linear mixed model for the original, as well as the transformed response. They also indicated that the gamma model is preferable by the AIC criterion.

Rat data
Thirty rats were treated with one of three chemotherapy drugs. White blood cell counts (W) and red blood cell counts (R) were taken at each of four different time points. We perform the same analysis as in Lee et al. (2006, pp. 224-229). We will focus here on quasi-Poisson model with normal random effects. First we create the design matrix for the dispersion parameter of the random component: R> Rrat <-data.frame(WBC = tapply(rat$WhiteBloodCells, rat$Subject, mean), + RBC = tapply(rat$RedBloodCells, rat$Subject, mean)) The model can be fitted by: The crucial option here is ODEst = TRUE, which requests a quasi-Poisson to be fitted instead of a likelihood-based Poisson model. Note that a summary of the likelihood values is not valid now, as we are not using a likelihood-based technique but extended quasi-likelihood is invoked. By a similar manipulation as in Section 3.2, we ask for the diagnostic plots of the model. Figure 3 presents diagnostic plots for the quasi-Poisson model for the residual error term (y|v), while Figure 4 gives diagnostics for the choice of the model for the dispersion of the random term (v). The corresponding diagnostic plots were also presented by Lee et al. (2006).

Alternative software for hierarchical data
The methods described in this paper are applied to model hierarchy in the outcomes (resulting in unobserved heterogeneity). There are numerous statistical programs and packages to address this type of modeling. Here we would like to mention a few. Note that that is not our intention to be exhaustive, but rather to focus on most popular programs and which we are acquainted with.
First of all, to our knowledge prior to this R package the h-likelihood algorithms were implemented only in GenStat (Payne, Murray, Harding, Baird, and Soutar 2009) software. The capabilities of HGLMMM and GenStat h-likelihood methods are similar. Extra flexibility is provided by HGLMMM by the ability to specify several random components, allowing for a different distribution of each component. This is not available in the GenStat program.
On the other hand GenStat offers a second order Laplace approximation to estimate variance components, which is not implemented in HGLMMM.
To analyze the salamander example the software needs to allow for a crossed random effects design. Gaussian quadrature methods are not applicable in this case as the dimension of integration is often too large. The R package lme4 (Bates, Mächler, and Bolker 2011) with the function lmer allows for the analysis of this example with an ordinary Laplace approximation. The advantage of using HGLMMM is the ability to include covariates in the dispersion part of the model. This is not allowed in lmer. Packages lme4 and HGLMMM differ in the way the dispersion parameters are estimated. The HGLMMM package uses the objective function which is equivalent to the REML approach in linear mixed models. The package lme4 is however faster and more efficient. Note that lme4 is a quite popular R package for the analysis of longitudinal data. The salamander data can also be analyzed by the PQL method of Breslow and Clayton (1993) as implemented e.g., in SAS (SAS Institute Inc. 2008) PROC GLIMMIX, which allows the dispersion parameters to depend on categorical covariates. In the cake example we analyzed the breaking angle of cakes assuming a normal distribution. This analysis can be reproduced in the R package nlme (Pinheiro, Bates, DebRoy, Sarkar, and The R Development Core Team 2011) or lme4 using a random effects model or a linear model with correlated errors. Also the SAS procedures MIXED, NLMIXED or GLIMMIX can be used. When the response follows a gamma distribution SAS procedures GLIMMIX and NLMIXED can be used, but both of them assume the random effects follow a normal distribution. On the other hand, in HGLMMM we easily combine non-normal responses with a non-normal random effects distribution. Such a combination can be done in SAS PROC NLMIXED only by using the trick of Liu and Yu (2008) and requires some additional programming. Finally, note that SAS PROC NLMIXED is a likelihood-based procedure, while HGLMMM allows the utilization of the extended quasi likelihood, with a overdispersion parameter varying with covariates.
In the rat example an extended quasi likelihood analysis is presented on the number of cancer cell colonies, with the variance of the random effect distribution depending on covariates. Now it is difficult to find standard software which could repeat such an analysis. SAS PROC GLIMMIX comes closest, allowing for overdispersion in Poisson distribution and the variance of random effect to depend on a categorical covariate.
In summary, the package HGLMMM blends the distribution of the response from an exponential family with a random effects distribution from the conjugate Bayesian distributions. Further, difficult designs can be handled such as multi-membership designs or more than 2-levels in the hierarchical models. On top of that one can put covariates in mean and (over)dispersion structure. All of these features are not available in one package/software to our knowledge. The limitation of HGLMMM is the necessity to assume independent random components, this assumption can be relaxed in R packages lme4, nlme and SAS PROC MIXED, NLMIXED, GLIMMIX.

Future improvements
In future work we would like to adapt the codes to use Matrix (Bates and Mächler 2011) package to save memory and to avoid problems with large datasets. Different links for the binomial response such as "Probit" and "CLogLog" need to be implemented as well. In addition, future work will focus on the introduction of the correlation between random effects. In Molas and Lesaffre (2010) the routines were extended to handle hurdle models for count data.