On the Connections Between Bridge Distributions, Marginalized Multilevel Models, and Generalized Linear Mixed Models

Generalized linear mixed models (GLMM) are commonly used to analyze hierarchical data. Unlike linear mixed models, they do not automatically provide parametric marginal regression functions, while such functions are needed for population-averaged inferences.  This issue has received considerable attention and here three approaches to address it are reviewed, expanded, and compared: (1) the closed-form expressions of the marginal moments and distributions for a variety of GLMMs, derived by Molenberghs et al. (2010), as well as an extension that accommodates overdispersion; (2) the marginalized multilevel models  of Heagerty (1999); (3) the bridge distribution of Wang and Louis (2003), a form for the random-effects distribution that allows the conditional and hierarchical mean to be described by the same link function. Our derivations are for the identity link function, the log link, and a collection of links for binary data. We highlight a number of useful connections: (a) it is shown that the bridge distribution for data with a mean on the unit interval is unique; (b) the three approaches are different for unit-interval data with the logit link, but are connected for the probit link; for the latter, there exist closed forms; (c) further results are derived for the bridge distribution in the case of unit-interval data and a Student's $t$ link; (d) in contrast to the unit-interval case, it is shown how large classes of distributions act as bridge distributions when an identity or a logarithmic link is adopted; (e) for these links, the three approaches are either identical or closely connected; (f) it is underscored for a random-intercepts model and logarithmic link, that the data contain no information about the particular distribution for the random intercept, given that the same fit to the data can be ascribed to an entire class of random-intercept distribution; (g) the implications of the difference between the unit-interval case on the one hand and the identity and logarithmic cases on the other, regarding sensitivity to model assumptions, are discussed.


Introduction
The statistical modeler often makes use of random effects to take into account overdispersion, subjectlevel heterogeneity, or both. Classical examples of generalized linear models with overdispersion random effects (McCullagh andNelder 1989, Hinde andDemétrio 1998ab) include the Poisson model for counts with gamma random effects, producing the negative-binomial model, and the Bernoulli model for binary data with beta random effects, leading to the beta-binomial model.
When data are hierarchically organized, it is common practice to consider mixed-effects models (Laird and Ware 1982, Breslow and Clayton 1993, Wolfinger and O'Connell 1993, Engel and Keen 1994, Goldstein 2002, Verbeke and Molenberghs 2000, Molenberghs and Verbeke 2005. Molenberghs et al (2010) formulated models for non-Gaussian overdispersed and hierarchical data with two sets of random effects simultaneously, which they termed the combined model (CM). little or even no information about the parametric form of the random-effects distribution. (g) We briefly consider the implications of this last result in terms of the sensitivity of these models and their resulting inferences to unverifiable assumptions about unobservables.
In summary, the results established here clarify relationships, similarities, and differences between seemingly disparate approaches to formulating hierarchical models for non-Gaussian data. In so doing, we gain insight into the fundamental differences between data types for which the mean function has support over a finite interval on the one hand, and those with support on the half line (logarithmic case) and entire real line (identity case) on the other. Even though remarks are made regarding estimation, our results are not primarily computational in nature. Rather, they help the modeler choose an appropriate model formulation in view of the estimands about which inferences are to be made. For the same reason, small-sample and other practical issues are mostly beyond the remit of this paper.
The rest of the paper is organized as follows. Motivating data examples are introduced in Section 2. In Section 3, each of the three specifications is reviewed in as much detail as is needed for the remainder of the paper. In particular, the three integral equations that come with these specifications are juxtaposed and scrutinized. In subsequent sections, the important situations arising for binary and proportion data (Section 4), the identity link (Section 5), and the log link (Section 6) are studied in detail. Implications for the identification of the random-intercepts distribution are given in Section 7. In Section 8, some remarks are made regarding parameter estimation. An illustration is described in Section 9. Technical background, details regarding estimation, and example SAS code are provided in electronically available Supplementary Materials.

Data Examples
Two sets of data are introduced, with binary and count outcomes, respectively. This will permit us to highlight the differences between the binary case and cases where the log link is used. The data will be analyzed in Section 9.

Binary Data: A Clinical Trial in Onychomycosis
These data, previously analyzed in several publications, come from a randomized, double-blind, parallel group, multicenter study for the comparison of two oral treatments, A and B, for toenail dermatophyte onychomycosis (TDO), described in full detail by Debacker et al. (1996). TDO is a common toenail infection, which is difficult to treat and affects more than 2% of western populations (Roberts 1992).
Anti-fungal compounds, classically used for treatment of TDO, need to be taken until the whole nail has grown out healthily. The development of new-generation compounds has reduced the treatment duration to 3 months. The aim of the present study was to compare the efficacy and safety of 12 weeks of continuous therapy with treatment A or with treatment B. In total, 2×189 patients were randomized.
Subjects were followed during 12 weeks of treatment and followed further, up to a total of 48 weeks.
Measurements were taken at baseline, every month during treatment, and every 3 months from then on, leading to a maximum of 7 measurements per subject. The outcome of interest was the severity of the infection, coded as 0 (not severe) or 1 (severe). The question of interest was whether the percentage of severe infections decreased over time, and whether that evolution was different for the two treatment groups.
In all cases, the fixed-effects predictor will take the form: where T i is an indicator for treatment (1 for the experimental compound and 0 for standard treatment) and t j is time in months at the jth visit.

Count Data: A Clinical Trial in Epileptic Patients
These data are from a randomized, double-blind, parallel group multi-center study for the comparison of placebo with a new anti-epileptic drug (AED), in combination with one or two other AED's (Faught et al. 1996) Let the number of epileptic seizures that patient i experienced during follow-up period j be Y ij and t j the time point of the jth measurement. Further, T i = 1 for the active group and T i = 0 for placebo. All models make use of the following predictor: 3 Three Related Specifications

Generalized Linear Mixed Models (GLMM) and Combined Models (CM)
We begin by introducing the combined model (CM). Suppose a longitudinal outcome Y ij is measured for each independent subject i = 1, . . . , N at occasion j = 1, 2, . . ., n i . The response for the ith subject, . . , Y ini ) T is assumed to follow an exponential family distribution. In a conventional GLMM, the outcome Y ij is modeled conditionally on a normal random effect b i that captures correlation between repeated measures and some overdispersion. Molenberghs et al. (2010) introduced a second set of random-effects θ ij to represent overdispersion in a more flexible way. Hence, the density of an individual outcome Y ij , conditional on both sets of random effects, takes the generic form: The conditional mean is modeled as the product: where the overdispersion random effect follows a distribution θ ij ∼ Θ ij υ ij , σ 2 ij , with mean υ ij and variance σ 2 ij . Given that θ ij enters the mean function directly, it has to preserve the range of the mean.
Molenberghs et al. argued that a conjugate distribution is a convenient choice for reasons of computation and interpretation. For example, in many cases the derivation of marginal moments and distributions then simplifies. The other mean component is written where X i and Z i are n i × p fixed-effects and n i × q random-effects design matrices, respectively. Their jth rows are denoted by x ij and z ij , respectively. g(·) is a link function and β and b i ∼ N (0, D) are fixed and random effects, respectively. The two sets of random effects θ ij and b i are conveniently assumed to be independent of each other, although this constraint can be relaxed. All parameters can be estimated, for example, using maximum likelihood .
Depending on the type of outcome under investigation, the distribution of θ ij can be chosen appropriately, such as a beta distribution for binary data or a gamma distribution for count and time-to-event data. Molenberghs et al. (2010) considered various such examples, some of which will return in Sections 4-6.
They paid particular attention to the case where the distribution for θ ij is conjugate for exponential family (3). For the case where also normal random effects are also present, Molenberghs et al. (2010) defined strong conjugacy, which essentially means that a simple, closed-form marginalization over the strongly conjugate random effect, but conditional on the normal random effect, applies.
When the CM does not contain random effects θ ij , a conventional GLMM (Breslow andClayton 1993, Wolfinger andO'Connell 1993) arises. In reverse, when no normal random effects are present, standard overdispersion models follow, such as the beta-binomial model for binary data, the negative-binomial model for count data, etc. (Hinde and Demétrio 1998ab). Marginalization of the model's mean structure is discussed in Section 3.4.

Marginalized Multilevel Models (MMM) and Combined Marginalized
Multilevel Models (COMMM) The general marginalized multilevel model due to Heagerty (1999) can be written as: Here, F b (0, D) is a distribution with mean zero and variance-covariance matrix D, while F Y c µ c ij , υ is a distribution parameterized by, say, a location and scale parameter, which could but do not have to be the mean and variance. The marginal mean µ m ij = E(Y ij ) depends on an n i × p matrix of p linear predictors X i through a link function g(·). Further, the conditional mean µ c ij = E(Y ij |b i ) relates to the random variable b i with distribution (8) and the function ∆ ij connects the marginal and conditional means through the same link function; the latter aspect could be relaxed if desired (Griswold and Zeger 2004). The outcome Y ij , given the random effects, is assumed to follow an exponential family model like (3) but now obviously with conditional expectation modeled as in (7). The function ∆ ij is obtained from solving an integral equation that will be spelled out in Section 3.4. The model parameters β have a marginal interpretation, so the use of MMMs or COMMS is more appropriate when population-averaged inferences are to be made.
By analogy with the CM described in the previous section, Iddi and Molenberghs (2011) extended the MMM by the inclusion of overdispersion random effects, i.e., by writing the conditional mean as θ ij κ ij , as in (4), but with parametric form (7) rather than (5). This slightly changes the integral equation leading to the connector ∆ ij , as will be shown in Section 3.4. Iddi and Molenberghs (2011) show that the resulting COMMM can be estimated easily from the data.

Bridge Distributions
Wang and Louis (2003) specified a model through the constraint that the marginal mean and conditional mean are specified by identical link functions, with predictors that are the same up to a multiplicative factor φ and an offset k: Specification (9) is similar to (6)- (7), but now the distribution of b i is unknown, rather than the connector ∆ ij in (7). From (9) it follows that β has a marginal as well as a subject-specific interpretation, up to the factor φ. Wang and Louis (2003) focused on the binary case, where g(·) is, for example, the logit, probit, or complementary log-log function. They restricted the random-effects structure to a random intercept only. Wang and Louis (2004) allowed for covariate-dependent random effects by modeling the variance of the random intercept. In sections to follow, we will see that in a number of settings, vector random effects also admit bridge constructions. This is true for the probit link for binary data (Section 4.3), the identity link (Section 5), and the log link (Section 6). Other link functions for the binary case are examined in Section 4. Note that here a combined-model version can be considered as well. This would follow from multiplying g(µ c ij ) by θ ij . The resulting marginal mean would get the extra factor E(θ ij ).
We do not consider this further.

Three Connected Integral Equations
As is clear from the discussion above, the three approaches are different marginalization operations but they have much in common. The connections between the approaches can be important when considering appropriate modeling frameworks in the light of the type of inferences to be drawn. To bring this out more clearly, we now juxtapose them. In all three approaches, the fixed-effects predictor x ij β, random-effects predictor z ij b i , and link function g(·).
Marginalization of GLMM or CM Integral: The random-effects density f(b i ) is given, while the marginal mean is unknown. This leads to an explicit integral: In the case that the combined model is considered rather than the GLMM, the overdispersion random effect distribution is also given and (10) becomes: Apart from special cases, µ m ij will not be a simple function of the model parameters. Thus, while marginalization is possible, sometimes even analytically, this does not imply that marginal effects of scientific interest (e.g., treatment effect) are always available in terms of simple parametric functions.
This motivates the use of the following two approaches.

Marginalized Multilevel Model Integral Equation:
The random-effects density f(b i ) is given, with the connector function ∆ ij unknown and identified through the relationships: When a combined model is considered, the distribution of the overdispersion random effect is also assumed to be known and (12) changes to: In contrast to (10) and (11), (12) and (13) are integral equations, which implicitly define the unknown quantity, here the connector function ∆ ij .

Bridge Integral Equation:
The random-effects density f(b i ) and the constants k and φ are unknown but identified through: As before, the identifying relationship is the solution to an integral equation.
In what follows, it will be shown that the three operations are different from each other for most link functions on the unit interval. The probit link is a notable exception in this respect. For a variety of other data types, either the linear link or the log link is routinely considered. In these cases, the three operations either coincide or are intimately connected. Furthermore, while for binary data there is a unique bridge distribution for a given link function, for continuous or non-negative data with linear and log links, the class of bridge distributions is very large.

The Binary Case
We will first review the specific theory for the binary case, as derived by Wang and Louis (2003), and present some further results and reflections. For ease of exposition, the focus will be on the unit interval, but transformation of the results to other finite intervals is straightforward.

Bridge Distribution Functions for the Binary Case
For the binary case, bridge integral equation (14) becomes a map between cumulative distribution functions, because g −1 (·) is a monotonically increasing map from the real line onto the unit interval. To emphasize this, we will write H(·) ≡ g −1 (·) for this case. Thus, (14) becomes (Wang and Louis 2003): Here, η ≡ x ij β for ease of notation, and the derivation is focused on a random intercept only. We will briefly sketch the arguments of Wang and Louis. Details can be found in the original paper. The authors took derivatives on both sides of (15) with respect to η, leading to with h = H , the first derivative. This is a convolution: h * f −b (η) = φh(k + φη). The subscript −b refers to sign reversal. They then transformed this equation to the Fourier domain and applied properties of the Fourier transform to yield: the Fourier transform. Clearly, (17) maps the density corresponding to the inverse link function to its characteristic function, as also noted by Wang and Louis (2003). Applying the inverse Fourier transform yields the generic solution to (15): They also showed that, for symmetric h(·), k = 0, and that 0 ≤ φ ≤ 1.
In addition, note that the existence and uniqueness of the bridge density is guaranteed. Existence holds whenever h(·) is integrable and non-degenerate, which is satisfied for all conventional links, precisely stemming from the connection with density functions. Uniqueness follows by construction and from the uniqueness of the Fourier transform.
It is very important to realize that the above results apply strictly to the case where the (inverse) link function is in a 1-1 relationship with a density function. It is tempting to conclude that this holds for binary data only, but the same arguments apply to proportions. That said, binary data and proportions are the two data types for which the above results follow in a natural way. Thus, these results apply to a wide class of link functions for two particular data types and, by linear transformation, to all situations where the mean ranges over a finite interval. An example of this is when a correlation coefficient would be modeled to make it depend on covariates. Whereas the results in the binary case are very specific, they are general for the identity (Section 5) and log (Section 6) links.

Logit Link
The combination of the logit link for binary hierarchical data and normally distributed random effects is arguably the most commonly encountered mixed-model setting for non-Gaussian data. At the same time, it is the most problematic one in numerical terms.
When marginalizing the GLMM or CM, the integral is: Here, ϕ(b i |0, D) is the zero-mean normal density with variance-covariance matrix D. In what follows, the parameter arguments will be suppressed from notation. As has been well documented (Breslow and Clayton 1993, Wolfinger and O'Connnell 1993, Zeger, Liang, and Albert 1988, Molenberghs and Verbeke 2005, this integral has no closed-form solution. A consequence of this is that the MMM (or COMMM) integral equation has no closed-form solution for the connector ∆ ij . In neither (19) nor (20) does θ ij pose a problem, given that its integral is explicit and can be replaced by E(θ ij ). In the more conventional GLMM situation where there are no overdispersion random effects, θ ij is removed from (19) and (20).
Should it be possible to solve (19) and (20) analytically, perhaps up to linear transformations, then the marginal mean on the one hand and the connector on the other would take simple forms in x ij β, and hence the normal density would act as a bridge density, a contradiction given the uniqueness stated in Section 4.1 and the fact that Wang and Louis (2003) derived a quite different form. In particular, these authors focused on a random intercept only and solved: Their solution, derived through (18) and Fourier transform operations, reads k = 0 and and d the random-intercept variance. Wang and Louis (2003) studied the properties of (21). Griswold and Zeger (2004) noted that, once the bridge distribution is given, it can be cast in MMM form: with mean 0 and variance d.

Probit Link
In spite of the close connection between logistic and probit regression for univariate binary data, the numerical aspects in the probit case are much simpler than in the logistic situation when data are hierarchically organized and random effects are present. It follows from several sources (Zeger, Liang, and Albert 1988, Griswold and Zeger 2004) that: with Expression (24) is the marginal mean for the probit-normal GLMM. In addition, the connector follows immediately: Finally, from (24) we see that the normal density is the bridge with k = 0 and φ ij as in (25). Note that φ depends on subject i and occasion j. However, if the random-effects structure consists of an intercept only, then φ ij in (25) reduces to The resemblance to (23) is immediately clear. Wang and Louis (2003) showed that the normal density is the bridge for the random-intercept probit model from straightforward application of (18).
In conclusion, in contrast to the logit case, all three operations lead to closed forms and can be said to coincide, up to perhaps a multiplicative factor of the form (25).
When the CM is used to extend the probit-normal GLMM, then (24) Similarly, for the corresponding COMMM, connector (26) becomes Expression (28) does not generally simplify and therefore is considerably more complex than (26). This results from the fact that the probit, in contrast to the identity and logarithmic links in Sections 5 and 6, respectively, does not allow for easy absorption of constants. Note that the closed-form solutions for the probit link can also be exploited when using the logit link, given that one function, when appropriately scaled, can approximate the other (Demidenko 2004, p. 338). This feature is alluded to for the combined model in Molenberghs et al. (2010).
For the MMM case, Griswold and Zeger (2004) constructed a connector integral equation that combines both probit and logit links. Details on this are given in the Supplementary Materials (Section A).

Other Link Functions
Apart from the logit and probit link functions, Wang and Louis (2003) studied the complementary loglog and Cauchy links. Some detail on the complementary log-log link is given in the Supplementary Materials (Section B.1). An overview of the derivations for the Cauchy links can be found there as well (Section B.2). For the latter case, when a Cauchy link H(η) = 1/π (π/2 + arctan η) is used, the bridge is again Cauchy, with parameter c 2 = (φ −1 − 1) 2 , leading to φ = (c + 1) −1 . Note that while the random-effects distribution has no finite moments, the conditional and marginal outcome distributions are Bernoulli and hence well defined.
A natural accompaniment to the probit and Cauchy links, is Student's t distribution, which has not been studied before in this context. The density is: Hurst (1995) shows that the characteristic function is where K α (ξ) is the modified Bessel function of the second kind, with index α (Abramowitz and Stegun 1964, p. 375). Using generic expression (18) and (29), together with the fact that symmetry implies k = 0, the bridge distribution for the t link can be shown to be: The connection to the two limiting cases is immediately clear. If ν = 1 then the t distribution reduces to the Cauchy distribution and the characteristic function becomes: Plugging the expressions K 1/2 (x) = π/2e −x x −1/2 and Γ(1/2) = √ π into (31) immediately produces (F h 1/2 )(ξ) = exp(−|ξ|), as mentioned for the Cauchy case, which led to a Cauchy bridge. Likewise, it can be derived (Hurst 1995) that (29) converges to (F h ∞ )(ξ) = exp −0.5ξ 2 , the characteristic function of the normal distribution, a fact that also follows from straightforward application of (17). Therefore, we can conclude that (30)

The Identity Link
The most prominent instance of the identity link is undoubtedly the linear mixed model (Verbeke and Molenberghs 2000), where the conditional mean function is linear in the fixed and random effects, the latter normally distributed. The three operations of Section 3.4 are now almost trivial.
First, marginalizing the linear mixed model is: Second, the connector function of the MMM is found through: i.e., ∆ ij = x ij β.
Third, we turn to the bridge distribution. Note that, in contrast to (16), there now is no relationship with a density function, because the mean ranges over the entire real line. Formally, the bridge integral equation becomes: Clearly, (32) is satisfied by any integrable density has mean zero, then further k = 0; this is a more desirable result, because then k is constant. In Comparing these three results, we can safely assert that the triple of operations can be made to coincide for linear links with zero-mean normally distributed random effects. This was not true for unit-interval links, except for the probit link and normally distributed random effects, where a close connection exists.
In sharp contrast with the uniqueness occurring in the binary case, the solution in the identity-link case is not unique. To see this, assume that b i in (32) follows a distribution with finite mean and density f(b i ), then the equation reduces to: producing φ = 1 and k = z ij E(b i ). Even more generally, if the conditional predictor is changed to a generalized additive structure E(Y ij |b i ) = γ 1 (x ij , β) + γ 2 (z ij , b i ), with obvious notation, and the function γ 2 (·, ·) has finite expectation over the random effects, then we find φ = 1 and k = E[γ 2 (z ij , b i )]. In conclusion, every random-effects distribution that satisfies mild regularity conditions is a bridge distribution for a model with identity link. The predictor can be linear or even generalized additive in fixed and random effects. Note that we did not specify the conditional outcome distribution, but restricted specification to the mean. A counterexample of a bridge that does not satisfy the regularity conditions is the Cauchy distribution. Owing to its lack of finite moments, the integral in (32) is not well defined.
One might argue that the identity link leads to an almost trivial situation, owing to linearity, and therefore the above results are not surprising. In the next section, it will be shown that the log link shares many, though not all, of these attractive properties.

The Ubiquitous Log Link
Switching to counts, it is natural to consider a Poisson GLM for the model specification conditional upon the random effects. Similarly, for time to event outcomes, one often assumes an exponential or Weibull GLM. This typically implies the use of a logarithmic link function. Marginalizing the corresponding Poisson-normal GLM is then also straightforward Albert 1988, Molenberghs, Verbeke, andDemétrio 2007): When the CM is considered instead, the conditional mean function becomes E(Y ij |b i , θ ij ) = θ ij e x ij β+z ij bi and then, slightly modifying (34), the marginal mean takes the form Note that we are able to absorb the overdispersion random effect into the argument to the inverse link function, a consequence of strong conjugacy . In the specific case that θ ij follows a Gamma(α 1j , α 2j ) distribution, the first term in (35) is ln(α 1j α 2j ).
Turning to the second operation, the connector function follows from (Griswold and Zeger 2004): and hence For the corresponding COMMM, embedding θ ij into (36) produces the connector function For the bridge distribution, it is immediately clear that the bridge integral equation is satisfied for normally distributed random effects, φ = 1, and k ij = 1 2 z ij Dz ij . It is even possible to incorporate the overdispersion random effect θ ij into the model and then absorb the logarithm of its mean into the predictor and hence into k ij .
As in the case of the identity link, the three operations are strongly related, though not identical. Indeed, for the marginalized GLMM, the marginalized CM, and for the bridge, the constant k ij appears in the marginal mean, whereas the same constant appears in the connector when considering the MMM or COMMM.
Also in line with the identity link, a wide variety of random-effects distributions satisfy the bridge integral equation (39). Slightly generalizing this requirement: it is clear that every random-effects distribution satisfies (40), with φ = 1 and provided the expectation exists.
In the case of the identity link, the result follows from additivity of the integration operator. Here, a similarly general result follows from the fact that integration and multiplication with a constant can be interchanged; the fixed-effects portion of the mean acts as the multiplier.
At the beginning of this section, we referred to count data, but nowhere in the development was this feature used explicitly. Only the use of the log link is essential. This implies that the results are valid for data other than counts, for example they also apply to the Weibull, the gamma, and the inverse Gaussian models for continuous data, to name but a few. Some detail on these is given in the Supplementary Materials (Section C).
The relative ease with which marginal and conditional means have the same or similar parametric forms for the log link, as well as for the identity link, has been noted also by Ma and Jørgensen (2007), who propose the Tweedie exponential dispersion distribution to build generalized mixed models. For the log-link case, their family encompasses, for example, the Poisson, gamma, and inverse-Gaussian model.
It also encompasses the normal model for the identity link. This implies, for example, that the bridge approach and the Tweedie approach in the Poisson case coincide.

Random-intercepts Distributions In Models With Log Link
In this section, we use the log link and a random-intercepts model to provide a simple illustration of the fact that only the first moment of the random-effects distribution is identifiable from the data. Assume that the random intercept follows a distribution with density f(b i ), then it follows that: Further if the random intercept were normally distributed, (41) becomes: Clearly, the two random-intercepts related constants on the right hand sides of (41) and (42) If f(b i ) is a density with known parametric form but unknown parameters, then the left hand side of (43) can be equated to the right hand side with d replaced by d. In conclusion, the fit from a GLMM can be changed into that of a collection of models with alternative random-intercept distributions, (a) without re-fitting a model to the data and (b) without altering the marginal mean function. This result is connected to that of Verbeke and Molenberghs (2011), who showed that the posterior distribution of the random effects given the data in hierarchical models is unverifiable from the data. The difference between our setting and theirs is that they considered the fit of the entire (multivariate) marginal model, whereas here we are concerned with the marginal mean function, an important component of the marginal model that nevertheless does not uniquely identify it. With our particular choice for the log link and a random intercept only, one can go further and show that, in the same way, the prior is unverifiable, because φ = 1 and the random intercepts merely contribute a constant to the marginal linear predictor; this constant can stem from a variety of distributions, not just from a standard normal.

Estimation
All models considered can be fitted easily with available software, for example, the SAS procedure NLMIXED. In the Supplementary Materials (Section D), various approaches are reviewed and example code is given.
Every data type and link function has its own subtleties. It is clear from Section 4.2 that the three operations will lead to different models, in spite of the fact that each one can be implemented in standard software. By contrast, because the probit link corresponds to a normal bridge distribution, the models resulting from the three approaches (GLMM, MMM, bridge) all allow for closed forms. Nevertheless, the models themselves are not entirely equivalent. In a conventional GLMM, it is clear from (24) that the regression function φ ij x ij β changes with levels of z ij , owing to the presence of φ ij . In contrast, in a MMM, the connector function is introduced to cancel out such changes and hence the marginal regression function simply becomes x ij β. In other words, the choice between GLMM and MMM is between which of the two regression functions, either marginal or conditional, will depend on the random-effects model.
Note that this is not an issue when only a random intercept is present, because then φ is constant.
Thus, only for a random-intercept model are the three approaches exactly the same. For general randomeffects structures, the GLMM and bridge models are identical, while these two are closely related but not identical to the MMM.
The log link setting is almost a mirror image to the probit link. Whereas for the probit link k = 0 and φ = 1, showing that the GLMM and MMM are of the same functional form but with different parameterization, for the log link k = 0 and φ = 1, as shown in Section 6. One can then fit a classical GLMM, for example a Poisson-normal model for count data. Evidently, the GLMM and the bridge are identical because the normal is one of the many possible bridge distributions. Furthermore, in contrast to the probit case, the GLMM and the MMM have the same marginal covariate effects, except for the intercept, as is clear from (37). The MMM absorbs the changes in intercept into the connector function.
However, whenever the intercept is of no scientific interest, it is sufficient to fit a GLMM. This fact has been known for a long time and was reported in Zeger, Liang and Albert (1988). Moreover, the same holds between the CM and COMMM when also incorporating overdispersion random effects, as can be seen from (38). Given the many situations where a log link applies (see Section 6), this useful result is very broadly applicable.

Binary Data: A Clinical Trial in Onychomycosis
Eight models are considered: standard GLMM with logit (1a) and probit (1b) link, as well as (1c) (2c); the remainder are added here. The code for (3a), using the SAS procedure NLMIXED, follows that of Wang and Louis (2003). For (2a), the marginal logit link is combined with a probit link for the conditional mean, for ease of implementation, as is clear from Section 4.3. Griswold and Zeger (2004) stated that the model where both links are of a logit type remains computationally challenging.
The fixed-effects predictor (1) is combined with a random intercept with b i ∼ N (0, d). The overdispersion random effect is assumed to follow a Beta(α 1 , α 2 ) distribution, in agreement with earlier analyses (Molenberghs et al. 2010, Iddi andMolenberghs 2011).
Parameter estimates and standard errors are displayed in Table 1. Comparing panels (a) and (b) clearly shows the impact of changing the logit to the probit link. In panel (a), all three models are different, in agreement with our derivations in Section 4.2. In contrast, when a probit link is chosen, the GLMM and bridge models are identical, while the parameters in the MMM all follow from the same multiplicative transformation, by a factor √ 1 + d 2 , a fact known from Section 4.3. The combined model extensions, displayed in panel (c) are given for generality. Comparing panels (b) and (c) shows that correcting for overdispersion reduces the strength of the treatment effect.
We have not considered a combined version of the bridge models for the binary case, because the mean of the overdispersion random effect cannot be absorbed into the predictor within the inverse link function, and hence it is not even clear how to define the bridge integral equation in an unambiguous fashion. This is different for the log link case, as shown in Section 6.

Count Data: A Clinical Trial in Epileptic Patients
All six models fitted make use of the log link and are based upon the linear predictor (2). These are: (1a) the standard GLMM, (2a) the MMM, and (3a) the bridge model; (1b)-(3b) are the combinedmodel extensions. For the combined models, the overdispersion random effects are assumed to follow a Gamma(α, 1/α) distribution.
Parameter estimates are presented in Table 2. As shown in Section 6, the bridge and GLMM models coincide, because the normal distribution is one of the many possible bridges and the GLMM happens to be based, therefore, on one particular member of the bridge family. In the binary case though, the logistic-normal GLMM is not based on a bridge. This allows us to also consider a combined-bridge model (3b), in contrast to the binary case. Obviously, models (1a) and (3a) on the one hand, and (1b) and (3b) on the other, are identical. Furthermore, as shown in the same section, the marginalized versions of these coincide with the original model except for the intercept. This implies that all treatment-effect parameters and functions of these are independent of the choice between GLMM, MMM, or bridge (with the same holding true for the combined versions). Note that, while both intercepts change under marginalization, they do so to the same degree: 0.5 * d, as follows from (37), but applied in the random-intercepts case.
This implies that the difference between the two intercepts, interpretable as the treatment difference at baseline, does not change when marginalizing. In conclusion, for the log link, only the intercepts transform and do so in an additive way, whereas for the probit link the transformation is multiplicative and applies to all fixed-effects parameters.

Concluding Remarks
In this paper, we have brought together three strands of research: (1) directly marginalizing the GLMM or its CM extension; (2) the MMM and the COMMM extension ; and (3) the bridge distribution. The following observations wer made.
First, the special status of data on the unit interval, i.e., binary data and proportions, was demonstrated.
In this case, the bridge distribution is unique, owing to the fact that a distribution function is used as inverse link function.
Second, within this class, the logit link has a special status, because the GLMM, MMM, and bridge approaches are fundamentally different from each other, and the first two do not allow for a closed-form solution.
Third, in contrast to this, the probit link does allow for closed-form solutions in all cases, given that the normal distribution is the bridge distribution. The bridge and GLMM approaches are identical, and closely connected to the MMM.
Fourth, additional link functions for the unit-interval case were studied, including the Cauchy and complementary log-log link, already studied by Wang and Louis (2003), and Student's t link, studied here for the first time. An appealing feature of this link is that the Cauchy link follows as a special case and the probit link follows by taking a limit.
Fifth, fundamental differences between the unit-interval case and all cases where a log link is used have been establishes. With this link, many random-effects distribution can be used as bridge distribution.
Furthermore, the three approaches are identical, apart from a difference in effect on the intercept. It follows from this that the researcher has to reflect on which approach to choose, in view of the inferences to be drawn. For purely subject-specific inferences, the GLMM is a sensible choice, whereas the MMM is an obvious candidate for marginal inferences. The bridge models might be preferred when both marginal and conditional inferences are of interest. For the log and identity link cases, the choice is less critical, given the close connection between the model families.
Sixth, for the log link and in particular for the random-intercept model, it is shown that the marginal regression function contains no information about a particular random-intercept distribution, provided the latter has a finite mean.
We conclude by a note on generality. The results for link functions on the unit interval can be generalized, using appropriate linear transformations, to any finite interval. Likewise, the results for the logarithmic link can be made to apply to any half line, not just the non-negative real numbers. It was shown through the gamma and inverse Gaussian models that seemingly different links, such as the inverse and the squared inverse, turn into a version of the logarithmic link when applying the range-preserving logarithmic transformation to the linear predictor. The generality of the identity link is self-evident.
leading to When there are no overdispersion random effects, (44) simplifies to We note the following three points.
First, this MMM operation does not correspond to marginalization of a GLMM or CM, because integrating the right hand side of (46) with prespecified ∆ ij = x ij β produces a probit, which would lead us back to (24). This makes it genuinely a different operation, useful in its own right.
Second, this operation does not correspond to an easy bridge construction, as can be shown by contradiction as follows. Assuming that the normal density would act as a bridge, we would have a solution for k and φ in where the notation β is self-evident. While k = 0 follows immediately, the solution for φ would be which depends on β and hence is not allowable. Of course, as a partial answer to this, it is possible to extend the general bridge distribution theory by replacing (15) Further, moving to links not on the unit interval, the generic bridge integral equation (14) can be replaced by The fact that there is no corresponding bridge construction implies that this MMM operation stands on its own.
Third, maintaining the probit link on the right hand side of (46) but replacing the expit on the left hand side by a generic c.d.f. H(·), yields an explicit connector, in the spirit of (45): Thus, the closed form is entirely due to the use of the probit link in the conditional model, and is independent of the link for the marginal model.

B.1 The Complementary Log-log Link
For the complementary log-log H(η) = 1 − exp[− exp(η)], Wang and Louis (2003) derived the bridge: In other words, the log-positive stable distribution is the bridge for the complementary log-log link. Γ(·) is the gamma function.

B.3 The t Link
We can also consider Student's t distribution as an inverse link function, with corresponding density . Hurst (1995) shows that the characteristic function is where K α (ξ) is the modified Bessel function of the second kind, with index α (Abramowitz and Stegun 1964, p. 375). which implies that The ratio of (47) and (48) is Using generic expression (18) and (49), together with the fact that symmetry implies k = 0, the bridge distribution for the t link can be shown to be: The link with the two limiting cases is immediately clear. If ν = 1 then the t distribution reduces to the Cauchy distribution and the characteristic function becomes: Plugging the expressions K 1/2 (x) = π/2e −x x −1/2 and Γ(1/2) = √ π into (51) immediately produces (F h 1/2 )(ξ) = exp(−|ξ|), as mentioned for the Cauchy case, which led to a Cauchy bridge. Likewise, it can be derived (Hurst 1995) that (47) converges to (F h ∞ )(ξ) = exp −0.5ξ 2 , the characteristic function of the normal distribution, a fact that also follows from straightforward application of (17).
Therefore, we can conclude that (50) offers a continuum of bridge random-effects distributions, with Cauchy and normal distributions as limiting cases.

C Further Uses of the Log Link
When the log link is applied to the eclectic collection of models made up of the Poisson, Weibull, gamma, inverse Gaussian model, etc. then a large class of distributions acts as bridge distribution on the one hand, and the three operations can be made to coincide for normal random effects on the other, as was stated in Section 6. Given that the Poisson model was studied in detail in the main paper, we provide some further material on the Weibull, gamma, and inverse Gaussian cases.
In the Weibull case, the conditional mean of the CM can be written as : Here, ρ is the Weibull shape parameter, and λ is an additional mean parameter, that could be absorbed in either the linear predictor or the overdispersion random effect θ ij . Given normality of b i , the marginal mean follows easily: From this marginalization, and by analogy with the Poisson derivations above, the connector is: The last term on the right hand side of (54) applies only when the COMMM is considered. Likewise, also here the normal distribution is one of the many bridge distributions. If adopted, it corresponds to the values φ = 1 and k = z ij Dz ij /(2ρ 2 ) + ln E(θ ij ).
The natural parameter can be written as based upon which the mean becomes and hence, like in the Weibull case, the log link results apply.

D Details on Estimation and Use of the SAS Procedure NLMIXED D.1 Overview of Estimation Strategies
The connections drawn out between GLMM, CM, MMM, COMMM, and bridge models have implications for their respective estimation strategies. In this section, we argue that all of the approaches considered can be fitted rather easily with standard software tools. Even in the binary case with conditional logit specification, all of the models can be fitted using a flexible parameter estimation device for non-linear mixed-effects models such as, for example, the SAS procedure NLMIXED. Parameter estimation and corresponding software tools have received considerable attention and Griswold and Zeger (2004) offer an extensive review. We mention, in particular, Pinheiro and Bates (2000), Diggle et al. (2002), Goldstein (2002, Skrondal and Rabe-Hesketh (2004), and Molenberghs and Verbeke (2005).
The most straightforward case is the standard GLMM, for which a variety of numerical methods to evaluate the marginal function have been proposed, such as first-and second-order Taylor series expansions its corresponding cumulative distribution function. Wang and Louis (2004) extended their estimation method to allow for covariate-dependent random-intercept models.

D.2 Example Software Code
In this section, code is provided for the various analysis done on the toenail dataset. All are based on the SAS procedure NLMIXED with a few instances of the SAS procedure GLIMMIX as well.