Some New Random Effect Models for Correlated Binary Responses

Abstract Exchangeable copulas are used to model an extra-binomial variation in Bernoulli experiments with a variable number of trials. Maximum likelihood inference procedures for the intra-cluster correlation are constructed for several copula families. The selection of a particular model is carried out using the Akaike information criterion (AIC). Profile likelihood confidence intervals for the intra-cluster correlation are constructed and their performance are assessed in a simulation experiment. The sensitivity of the inference to the specification of the copula family is also investigated through simulations. Numerical examples are presented.


Introduction
Data in the form of clustered binary responses arise in many elds of study. For example, in ophthalmology the two eyes of a patient are a cluster. In toxicological studies [8], a litter is a cluster of newborn animals. In education, the school and the classroom are clusters of students. Units within a cluster are more likely to be similar than units from di erent clusters and the study variables often exhibit an intra-cluster correlation. Standard statistical methods that ignore such a correlation provide a poor t of data and can lead to erroneous inferences. Such an extra-binomial variation can be accommodated by specifying a random cluster e ect: the responses in a cluster are conditionally independent given the cluster e ect. A common model is the beta-binomial, see [23], where the conditional probability of success in a cluster has a beta distribution. This model has been criticized for its lack of exibility when the cluster sizes are variable [6]. Other random e ects models for clustered data include the logistic-normal-binomial model described by [24], and the probit-normal-binomial, see [15].
Several measures of association for binary traits have been proposed in the literature. These include the Kappa coe cient [5,Chap. 8], the odds ratio [3,11]. This work focusses on the estimation of the "uncorrected" intra-cluster correlation (ICC). The ICC is a measure of the similitude of observations taken in the same cluster. It can be interpreted as an index of familial aggregation and as measure of inter-rater agreement, depending on the situation. When planning a cluster sampling or a cluster randomization trial, the clusters are the statistical units and the precision of the results depend on the ICC. Nowadays cluster randomized trials are widely used to assess various modes of community interventions. They involve assigning randomly clusters of per-sons, often belonging to the same neighborhoodf or group, to the arms of a trial. In such situations the ICC plays a crucial role in the sample size determination, as it measures the homogeneity within clusters [5]. ICC estimates are now published in the epidemiological literature for various types of intervention, see [16] for a recent example. It is therefore important to have precise ICC estimators with reliable methods for constructing con dence intervals.
The estimation of the ICC for binary data has been considered by many authors. [17] compare more than 20 estimators of the ICC, derived using either maximum likelihood or a modi ed method of moments, in a Monte Carlo study. They conclude that the estimator of [7] can be used as an omnibus estimator as it generally has good sampling properties. Recently, the construction of con dence intervals for the ICC has been investigated by [4,18,25]. The rst modeled the extra binomial variation using [12] generalized binomial distribution, [4] use a z-transform and a distribution-free large sample variance to construct an ICC con dence interval while [18] assumes a beta-binomial distribution. [18] also compared six methods for constructing con dence intervals for the ICC and found that, for beta-binomial data, the best procedure was a pro le likelihood con dence interval (PLCI). Note that inference about the ICC using a Bayesian framework have also been investigated, see [1,21,22].
This work investigates the construction of PLCI for the ICC using a new class of probability models for exchangeable clustered binary data, associated with exchangeable Archimedean copulas [13, chap. 2]. This family contains several speci cations for the extra binomial variation. The procedure investigated in this work consists in (i) selecting a model for the random cluster e ect and (ii) constructing a pro le con dence interval for the ICC using the model selected at step 1.
Section 2 introduces a class of model for an extra binomial variation associated with multivariate Archimedean copulas. Graphical models' comparisons are provided. Section 3 discusses model selection, the calculation of maximum likelihood estimates for the ICC, and the construction of pro le likelihood con dence intervals. These new statistical methods are investigated through simulations in Section 4. Two numerical examples are used to illustrate the proposed approach in Section 5.

Within cluster dependency modeling for binary exchangeable data . Model formulation
Consider a random sample of K clusters of size n i , i = , , . . . , K. Let Y ij , j = , , . . . , n i be a set of n i exchangeable Bernoulli random variables in cluster i, where Y ij = is a success and 0 is a failure. These random variables are identically distributed and possibly dependent. The total number of successes in the i th cluster is Y i = ni j= Y ij . A standard model for the within cluster dependency assumes that the probability of success p i is random and varies from one cluster to the next. The models considered here associate to cluster i a positive random variable θ i whose Laplace transform, ψα(t) = E(e −tθi ), depends on a positive dependency parameter α is a such way that P(θ i = ) = when α = . Let π ∈ ( , ) stands for the marginal probability of success; the proposed model is where ψ − α (π) denotes the function inverse of the Laplace transform ψα(·) evaluated at π. Under model (2.1), p i ∈ ( , ) and the marginal probability of success is E(p i ) = π. This model has a clear separation between π and the random variable for the dependency, θ i . Indeed, ψ − α (π) is a mere scale parameter for the distribution of − log p i . Model (2.1) assumes exchangeability within each cluster. This assumption could be questionable for familial data where the mother-child association could be stronger than that between two siblings. Table 1 gives the inverse of the Laplace transforms ψ − α (t) and the densities fα(θ) of the random e ects θ i for four models commonly used in the copula literature. For Gumbel (G) and Clayton (C) families, fα is a density de ned on R + while, for the models of Frank (F) and Joe (J), fα is a probability mass function de ned on the positive Table 1: Inverse of Laplace transforms, the densities of the random e ects θ i , and moments for four copula families, Clayton (C), Frank (F), Gumbel (G), Joe (J).
For the models in (2.1), the k-moments E(p k i ) have the following form, λ k = ψα{kψ − α (π)}; explicit expressions for these moments are given in Table 1. It follows that, given π and the association parameter α, the moment based de nition of the intra-cluster correlation (ICC) ρ under models (2.1) is given by . (2. 2) The ICC is the correlation between di erent Bernoulli variables, Y is and Y i , of cluster i. It is an increasing function of α, with ρ = when α = for all models in Table 1. Further interpretations of this coe cient can be found in [5,Chap.8].
For Gumbel's model, the random e ect θ i has a positive stable distribution with parameter /(α + ) and fα(θ) does not have a closed form expression. For this model, the moments λ k are equal to those for the q model of [8]. This provides an additional motivation for Kuk's proposal as it can be derived using stable random e ects in (2.1). Consider now Clayton's family. When α = , θ i has a negative exponential distribution with parameter 1, and p i has a beta distribution with parameters ( ψ − α (π) , ). Therefore Clayton's model with α = gives the beta-binomial distribution with parameters ( ψ − α (π) , ). For Frank's and Joe's model, θ i is an integer valued random variable de ned on N + . For Frank's copula, θ i follows a logarithmic distribution, while for Joe's family it is the Sibuya distribution. The later distribution is heavy tailed as no moment exists. See [13] for more information about these distributions. It might seem unrealistic to have latent cluster e ects distributed according to a positive stable law or to a discrete distribution. These assumptions are made to get, within (2.1), tractable models with probability mass function (pmf) having closed form expressions.
Swapping the successes and the failures, (2.1) gives new models for an extra binomial variation. The probability of success is then expressed as This duality was noted by [8] whose p model can be expressed as (2.3) where θ i has a positive stable distribution. In the sequel, models obtained with (2.3) will be called dmodel since, as will be seen in the next section, they arise when the joint distribution of the Bernoulli variables Y ij is expressed using a copula. In this paper, Kuk's p model is labeled by the dGumbel model. For (2.3), the intra cluster correlation is Two approaches are available to calculate the pmf of Y i . The rst method evaluates alternating sums directly using the moments λ k given in Table 1: For large clusters, say larger than 30, the evaluation of the alternating sums (2.5) is numerically unstable and yields negative probabilities. The second method to calculate the pmf of Y i is to evaluate (2.4) directly by integrating over the random e ects distribution. For the random e ects having gamma distributions of Clayton's family, we tried using the Gauss-Laguerre quadrature method. It gave poor approximations for large values of the dependence parameter α. Integrating over the random e ect is possible for the models of Joe and Frank as they have discrete distributions. For these models, where N is a positive integer large enough for these approximations to be accurate. The values N = and N = yielded nearly identical results in the small cluster simulations that are presented in Section 4. We used N = in the large cluster simulations presented in the next section. Note that binomial mixture models constructed in terms of Laplace transforms can be found in [2,8]. The innovative aspect of our paper is to parameterize these models in terms of a marginal probability π and the ICC ρ and to investigate this large class of models in a systematic way.

. A copula formulation of the model
This section derives an alternative formulation for (2.1), in terms of the joint survival distribution of the Bernoulli random variables Y ij , j = , , . . . , n i for the units within a cluster. Dropping subscript i, let {y j } be a sequence of n zeros and ones. Under (2.1) the joint survival distribution of the Y j 's within a cluster is . . , n, an n dimensional Archimedean copula for the dependency between the variables, see [13,Chap. 2] for more discussion. Note that the inverse ψ − α (·) of ψα is a decreasing function de ned on ( , ) such that is proved by noticing that, assuming (2.1), since, by construction, y j ψ − α (π) = ψ − α {F(y j )}. The value α = gives the independence copula, C ,n (u , . . . , un) = u × . . . × un. As α goes to ∞ the copulas for all the models of Table 1 converge to what is known as the Fréchet upper bound, This corresponds to the perfect correlation where all the Bernoulli variables within a cluster takes the same value, e. g. P(Y = . . . = Yn) = . An attractive feature of all models in Table 1 is that (π, ρ) has the full range ] , [×] , [, as π ∈ ( , ) and α ∈ ( , ∞). Moreover, ρ is a monotone function of α satisfying the following properties There is a copula behind most models for clustered binary data. Consider for instance the generalized binomial distribution, see [12]. Under this model the joint survival distribution within a cluster can be expressed as (2.7) with This expression highlights that the generalized binomial distribution is a discrete mixture model featuring two latent classes with respective probabilities − α and α. The clusters in the rst class are made of independent units while in the second class all the units of a cluster have the same Y values. Such a model is rather unlikely in practice and one would expect (2.1) to represent better the within cluster association. Despite its unrealistic description of the dependency, the generalized binomial has been widely used to investigate statistical procedures for cluster binary data, see [17,25].
Finally note that the estimation of the intra cluster correlation for discrete data using copulas was investigated by [19] in a di erent context. They were concerned with split-cluster designs, where individuals within a cluster are randomized to one of the two arms of a trial. They use bivariate copulas to model the association between the mean responses for the treated and for the untreated individuals in a cluster.

. Graphical comparisons
A total of 8 Archimedean copulas models are available to account for an extra binomial variation. They were compared using exploratory analysis techniques such as correspondence analysis. The ndings of these comparisons are now brie y presented. First the beta-binomial, the Clayton and the d-Clayton models are, for all practical purposes, equal. This is illustrated in (a), (b), and (c) of Figure 1 where the pmf for these three models obtained when n = , π = . , . , . and ρ = . are presented. Similar results are found for ρ = . ; they are presented in the supplementary material section. This means that when the beta-binomial is a candidate model for a data set, there is no need to include the Clayton and dClayton model as they provide ts that are very similar.
Graphs (d), (e) and (f) of Figure 1 compare the pmf for Joe (J), dJoe (dJ), Gumbel (G) and dGumbel (dG) models. The dJoe and dGumbel pmfs have nearly identical unimodal shapes while of the Gumbel and the Joe pmfs are similar with a bimodal structure. The pmfs of Figure 1 have a wide range of values for Pr(Y i = ), when π = . it goes from 0.05 to 0.25. Additional comparisons, with ρ = . are presented in the supplementary material section. In all comparisons, the Joe and the dJoe pmf presented the most extreme shapes, while the beta-binomial pmf has an average shape, between these two extremes. Therefore, in the small cluster simulations presented in the next section, only these three models are considered as being representative of all the models of Table 1. Figure 2 presents some pmf comparisons for clusters of size n = , with ρ = . and π = . , . , . . Such small values of ρ are typical in community intervention studies, see (Légaré et al., 2011). The models presented are the beta-binomial (BB), Frank (F), dFrank (dF), Joe (J) and dJoe (dJ). Figures 2 shows that the pmf for all the models are di erent, except possibly for the dFrank and dJoe models, whose shape are very similar. The pmf shows a unimodal structure for the models of dFrank and dJoe, and a bimodal structure under F's and J's models. Joe's pmf assigns a large probability to zero as compared with the others models and the beta-binomial pmf has an average shape between that of the Joe and the dJoe models. Graphs for ρ = . are presented in the supplementary material section.

Maximum likelihood inference for ICC(ρ)
This section shows how to calculate maximum likelihood estimates for ρ and π using data {(n i , y i ) : i = , . . . , K}. It also discusses the construction of pro le likelihood con dence intervals for ρ. Inference on ρ is carried out in a frequentist rather than a Bayesian set-up as this is commonly used in research on clustered randomized trials (Eldridge & Kerry, 2012, p.193).

. Maximun Likelihood Estimates
The parameters of interest are ρ and π, thus α needs to be expressed in terms of these two parameters. The only model for which equation (2.2) leads to an explicit expression for α is Gumbel's families for which For all the other models of It can be evaluated using one of the two methods presented in Section 2.1. The maximum likelihood estimates are the valuesπ andρ that maximize the log-likelihood de ned in (3.1). They were calculated using R. First an R function evaluating (3.1) in terms of ρ and π was written where α was evaluated in terms of ρ and π by solving (2.2) using uniroot. The search interval for α was set to ( − , ); for some copulas (2.2) cannot be evaluated for α values as large as and this generates a uniroot warning. This had no impact on the result since uniroot automatically lowers the upper bound of the search interval when this happens. Then the values of π and ρ that maximized the likelihood were obtained using the function optim of R. We used as starting values, the sample meanπ = y i / n i and ρ FC , the Fleiss Cuzick moment based estimate of ρ de ned in the next section. The log-likelihood is maximized in the square [ e − , .
]. The Fisher information matrix was then estimated as minus the Hessian of the likelihood function; this yields standard errors, s.e., for the two estimates. Negative estimates of ρ are not possible. Negative estimates of ρ have little practical interest, indeed [5, Chap. 8] recommend setting them to zero and proceeding as the data with cluster was independent.

. Model selection criteria
A model's t can be assessed using a deviance, equal to the likelihood ratio statistics comparing the copula model to that of a saturated model where each cluster has its own xed probability of success p i , Note however that this deviance does not have an asymptotic χ K− distribution when the copula model ts well as the null copula model is not a special case of the saturated model with cluster speci c p i when K is xed.
To select a speci c model to work with, we used the Akaike Information Criterion (AIC), de ned by where m, the number of parameters, is equal to 2 for all the copula models considered in this work. The preferred model is the one with the minimum AIC value. We did not use the model averaging approach of [14] as this does not permit the calculation of pro le con dence intervals.

. Pro le Likelihood Con dence Interval (PLCI)
Wald con dence intervals for ρ, with a ( − τ) con dence level, are given byρ ± z −τ/ s.e.(ρ), where z −τ/ is a normal quantile. This procedure may perform poorly for small to moderate sample sizes when estimating ρ, see Donner and Eliasziw (1992). This section reviews the construction of PLCI for ρ when using copula models associated to (2.1). Such intervals are generally considered to be more accurate than Wald con dence intervals. The pro le likelihood for ρ is given by L P (ρ) = maxπ L(π, ρ), ρ ∈ ( e − , . ). We used the R function optimize for its evaluation; it involved a uniroot function to evaluate α as a function of ρ and π. It may happen that L P (ρ) cannot be numerically evaluated for large values of ρ; to calculate the pro le con dence interval it is important to identify a value ρ M such that L P (ρ) can be evaluated for ρ ∈ ( e − , ρ M ). The ( − τ)% PLCI for ρ is de ned as the set CI = ρ : log L P (ρ) > log L(ρ,π) − χ ( − τ) , (3.4) where χ ( − τ) is the − τ quantile of a chi-squared distribution with one degree of freedom. The set CI is typically an interval and is denoted (ρ L ,ρ U ). We computed the two limits by nding two solutions to the equation log L P (ρ) = log L(ρ,π) − χ ( − τ), one in the interval ( ,ρ) and the other in the interval (ρ, ρ M ), using the R function uniroot. When log L(ρ,π) − χ ( − τ) < log L P ( e − ), the interval lower bound was set atρ L = . If ρ M is not speci ed correctly then log L P (ρ M ) cannot be evaluated; in such cases uniroot returnŝ ρ as the value of the con dence interval upper bound. This is not a legitimate upper bound and samples for whichρ U =ρ were rejected in the simulations; such unacceptable upper bounds occurred only in the large cluster simulations. For a quick evaluation of (ρ L ,ρ U ) the R function plkhci could also be used. When the true value of ρ is close to , the lower boundρ L is likely to be 0 and in this case the true coverage of the con dence might be slightly above its nominal level. PLCI for ρ, constructed using the beta-binomial distribution, were investigated by [18]. He concluded through Monte Carlo simulations that the pro le con dence interval performs well over a wide range of parameter values. A goal of this work is to investigate whether the beta-binomial pro le con dence interval is an omnibus procedure applicable in all circumstances or whether the t of the beta-binomial model should rst be ascertained.

Simulation Study
The purpose of this simulation study is to assess whether the AIC criterion (3.3) of Section 3.2 provides a reliable model selection criterion. Its goal is also to assess the performance of the pro le likelihood con dence intervals for ρ constructed for the models of Table 1. Two sets of simulations were carried out. The rst featured small clusters, with sizes ranging between 1 and 15 with a mean of 4. The second one had large clusters with n i between 27 to 65 with a mean of 38. Data was generated from the beta-binomial distribution, the Joe, dJoe, frank and dFrank models. For J's and dJ's models, cluster speci c θ i were generated from the Sibuya distribution while for the Frank and dFrank model, the logarithmic distribution was used. The accuracy of the AIC model selection criterion was measured using the proportions of times that model M is selected when the data was simulated from model M : where the indicator I i equal to 1 if model M provides the smallest AIC for sample i and 0 otherwise. The simulations also investigated % two-sided con dence intervals for ρ. Their performance was measured using the coverage (cov) and the average length (CIL) de ned as where A i equal to 1 if the true value ρ is in (ρ Li ,ρ Ui ), and 0 otherwise. All programming was done in R. Samples with y i = or y i = n i for all clusters were rejected.

. Small cluster simulations
Three models are investigated in the small cluster simulations, beta-binomial, Joe and dJoe. Two values of ρ and π, namely 0.1 and 0.3, and three values of K, 100, 250, and 500, were considered. Table 2 summarizes the proportion of times that a model is selected as a function of the model from which the data is simulated. The relatively high values on the diagonals of each quadrant of Table 2 show that the models are identi able. This is more pronounced, as either K, π or ρ increases. When a data set is generated from Joe's model, dJoe is nearly never selected and vice-versa. These results are in line with the ndings of the graphical comparisons of Figures 1 where these two models were very di erent. Table 2 also con rms the middle ground nature of the beta-binomial since this model has the smallest number of correct identi cations. Table 3 reports results on PLCI that assume that the model is selected correctly. It evaluates PLCI for ρ under the beta-binomial, the Joe and the dJoe models. The PLCI has generally good statistical properties; its coverage is close to nominal 95% level and its average length is relatively small. As expected, the expected con dence length decreases as the number of clusters increases. The PLCI for Joe's model is always shorter than the other two, with a very good empirical coverage. Thus PLCI constructed using the three models investigated in this section are reliable statistical techniques. Additional simulation results are presented in the Supplementary Material Section. First the coverage and the average length of [25] ICC con dence interval constructed using the moment estimator of [7], are reported together with the performance of the beta-binomial PLCI when the model is misspeci ed, that is when the data comes from the Joe and the dJoe model (see Web Table 1 in the "Web Appendix B"). The Zou & Donner con dence interval is very conservative as it is always much wider than the PLCI. These simulations also show that the PLCI for the beta-binomial is sensitive to a model misidenti cation. For instance for dJ data, K = , π = . , and ρ = . , the estimated coverage of the beta-binomial PLCI, 76.70%, is much smaller than the nominal value of 95%. However, when data are from J's model this method in general has a tendency to provide the observed coverage probabilities that are slightly larger than the nominal level. Table 3: Small cluster simulations: Con dence interval length (CIL) and empirical coverage (COV) of two-sided PLCI for ρ with a nominal % con dence level under beta-binomial, Joe, and dJoe models

. Large cluster simulations
We ran large cluster simulations with K =10, 25, and 50 clusters. Two values of ρ, 0.05 and 0.1, and of π, 0.1 and 0.3, were investigated. Besides the beta-binomial model, the models of Frank, dFrank, Joe, and dJoe were considered. Table 4 shows that the AIC has a hard time identifying the models when K = . The proximity, in Figure 2, of the pmf for Joe's and Frank's model shows up in Table 4 where the AIC has di culties distinguishing one from the other. This improves as K increases and at K = most models are identi able. The identi ability increases with π. Results in Tables 5 show that, when the model is correctly identi ed, the PLCI constructed with the beta-binomial, the Joe, the dJoe, the Frank and the dFrank models have good coverage properties, even with a relatively low (K = ) number of clusters. As expected, the average con dence interval length decreases with the number of clusters. In general, the Joe and the Frank PLCI provide smaller expected lengths and an empirical coverage close to the nominal values In Web Table 2 of "Web Appendix B", we reported con dence interval length (CIL) and empirical coverage (COV) of Zou and Donner's interval and of the beta-binomial PLCI, when the data comes from J's, dJ's F's and dF's distributions. We can see that Zou and Donner's method is very conservative. On the other hand, the PLCI based on beta-binomial model provides coverage below the nominal level when the model is misspeci ed. Thus using an omnibus PLCI based on the beta-binomial distribution is not a robust statistical procedure.

. A pilot trial about shared decision making
This example is concerned with the impact of training physicians in shared decision making on their patients' involvement in the decision making process, see [9] for more details. The data in Table 7 presents the number of patients y i reporting an active role in the decision about taking an antibiotics treatment for an acute respiratory infection, and the total patient enrollment n i in K = community health services of the Quebec region. The ICC ρ enters the sample size calculation in these trials. Getting a reliable estimate is needed for planning purposes. Its value is typically small in community intervention trials and the upper limit of a con dence interval is useful to calculate a conservative sample size.
The deviance of the independence model is . for degrees of freedom; the p-value of the independence test is Pr(χ > . ) = . %. This suggests that ρ > . This is a large cluster data set and only six sets of estimates are available. The values ofρ reported in Table 7 di er little, except for those obtained with the dJ and dF models which are null. The corresponding AIC are larger than that for the independence model, suggesting a poor t for these models. The best tting model is that of Frank . Thus Frank's model is plausible for this data and the 95% pro le con dence interval for ρ of (0.001,0.097) appears to be reliable.
As an application of this result, suppose that one were to design a cluster randomized trial to compare the e ciency of a method for training physicians in share decision making, with respect to no training, on the

. Chronic Obstructive Pulmonary Disease (COPD) data
This data is presented in Example 3 of [10], and is considered by [20,25]. In this data the familial aggregation of the COPD is used as a measure of how genetic and environmental factors may contribute to disease etiology. It involves 203 siblings from 100 families with sizes ranging from 1 to 6. The binary response here indicates whether a given sibling has impaired pulmonary functions.
In the Table 8 , we report estimates of π and ρ, pro le con dence intervals for ρ, and deviances obtained for all the models considered in this work. Results derived with the nonparametric (NP) model of [20], with six parameters λ k , k = , . . . , , are also provided. As expected from graphical comparisons, the dG and dJ models have very similar ts. These models provide the best ts to the COPD data in terms of the AIC value. Moreover, the likelihood ratio test for comparing the t of dGumbel model with that of the nonparametric model has χ = . p-value=0.78; this is not signi cant and the dGumbel ts well. The dGumbel 95% pro le con dence interval for ρ is ( . , . ); it is very close to the dJoe interval that has been shown to reliable in the small cluster simulations. Their length are 10% smaller than that of the Zou & Donner interval. As expected, the beta-binomial distribution, the Clayton and dClayton models give similar ts. Joe's and Frank's model do not t. This agrees with graphical comparisons of Figure 1 where these two pmfs give much larger probabilities to 0 than the other model. Globally all pro le likelihood con dence intervals are similar in this example, with di erences less than 10% in con dence interval length. Given the small family size, the fact that the selection of a particular model has a limited impact on the analysis does not come as a surprise. The statistical methods proposed in this work are useful when dealing with larger clusters.

Discussion
This paper has suggested to model the intra cluster association for binary data using multivariate Archimedean copulas. This family contains a wide range of distributional shape for the extra binomial variation. It has been demonstrated through a simulation study that the AIC is a useful criterion for selecting a particular model for the extra binomial variation. An important conclusion of our study is that pro le likelihood con dence intervals for Archimedean copulas models have good coverage properties, even when the number of clusters is small. The model selection step is important. An omnibus method, such as using the beta-binomial pro le Table 8: MLEs of π and ρ, and two-sided % con dence interval for ρ under 11 models for the (COPD) data. Degrees of freedom (DF), Deviance (Dev) and , AIC criterion of models are also reported. likelihood con dence interval for the ICC regardless of whether this model ts well, may lead to con dence interval with poor coverage properties.