A note on conditional Akaike information for Poisson regression with random effects

A popular model selection approach for generalized linear mixed-effects models is the Akaike information criterion, or AIC. Among others, \cite{vaida05} pointed out the distinction between marginal and conditional inference, depending on the focus of the research. The conditional AIC was derived for the linear mixed-effects model and later generalized by \cite{liang08}. We show that a similar strategy extends to Poisson regression with random effects, for which a conditional AIC can be obtained based on our observations. Simulation studies demonstrate the use of the criterion.


Introduction
Generalized linear models (GLM) are powerful modelling tools that have gained popularity in statistics, with wide applications in medical studies, pattern classification, sample surveys, etc. The scope of GLM can be greatly expanded by incorporating random effects. For example, in typical longitudinal studies, a model with random effects not only captures individual characteristics but also extrapolates to the entire population, taking into account both within-cluster and between-cluster variation in the study.

Model selection in GLM is typically achieved using AIC or BIC combined with step-wise procedures. For fixed-effects models, the definition of AIC is straightforward: the likelihood is penalized by a term that depends on the number of parameters. When random effects come into play, it is not entirely clear how the number of parameters in the model should be defined. Building on previous work such as [1] and [2], [5] made a distinction between marginal and conditional inference and provided a formal definition of the conditional Akaike information, cAI, which gives a theoretical justification for some earlier approaches. They derived an unbiased estimator of cAI, called the conditional AIC or cAIC, when the covariance matrix of the random effects is known. [4] derived a more general cAIC that dispenses with such strong assumptions.

In [5], the definition of cAI was given for general mixed-effects models, but the unbiased estimator was derived only for linear mixed-effects models. A general approach to obtaining an unbiased estimator of cAI for generalized linear mixed-effects models (GLMM) seems to be out of reach. In this note, we propose an unbiased estimator of cAI for Poisson regression with random effects. The nature of Poisson regression is very different from that of the linear model, since the responses are discrete. It turns out, however, that an unbiased cAIC exists, although it is derived in a different way.

Conditional AIC for count data
Suppose we have count responses $\{y_i\}$, $i = 1, \ldots, m$, from $m$ clusters that we want to model in relation to covariates $X_i$ and $Z_i$, with $y_i$ an $n_i \times 1$ vector from cluster $i$, and $X_i$, $Z_i$ the $n_i \times p$ and $n_i \times q$ matrices associated with the fixed and random effects respectively. We use a Poisson GLMM with the canonical link,
$$\log \mu_i = X_i \beta + Z_i b_i, \quad (1)$$
where $\beta$ is a $p \times 1$ vector of fixed effects and $b_i$ is a $q \times 1$ vector of random effects following a mean-zero Gaussian distribution with unknown covariance matrix $G$. The total number of observations is thus $N = \sum_{i=1}^m n_i$. Let $\theta$ denote the population parameters in the model, including $\beta$ and the parameters in $G$. The marginal likelihood is
$$g(y|\theta) = \int g(y|b, \theta)\, g(b|G)\, db,$$
where $g(y|b, \theta)$ is the Poisson likelihood conditional on the random effects and $g(b|G)$ is the density of the random effects. Sometimes it is more convenient to represent (1) in the condensed form $\log \mu = X\beta + Zb$.

In marginal inference, the focus is on the population parameters, and the random effects are just a mechanism for modelling the correlations within clusters. The standard AIC refers to this case and is called the marginal AIC, mAIC, by [5], defined by $-2 \log g(y|\hat\theta(y)) + 2K$, where $K$ is the dimension of $\theta$. The penalty term corrects the bias caused by using the same data both to estimate $\theta$ and to evaluate the marginal likelihood $g(y|\theta)$. The AIC is designed to approximate the Akaike information,
$$\mathrm{AI} = -2 E_{f(y)} E_{f(y^*)} \log g(y^*|\hat\theta(y)),$$
where $y^*$ is an independent replicate of $y$ from the same true distribution $f(y)$, which need not be contained in the family defined by (1).
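As a concrete illustration of the marginal likelihood, the integral over the random effects can be evaluated numerically in the random-intercept case ($q = 1$) with Gauss-Hermite quadrature. The sketch below is illustrative and not part of the original note; it assumes the data are held as per-cluster lists of arrays.

```python
import numpy as np
from scipy.special import gammaln

def marginal_loglik(y, X, beta, sigma_b, n_quad=30):
    """Marginal log-likelihood of a random-intercept Poisson GLMM,
    log mu_ij = x_ij' beta + b_i with b_i ~ N(0, sigma_b^2),
    integrating each b_i out by Gauss-Hermite quadrature.

    y, X: lists with one (n_i,) response array and one (n_i, p)
    design matrix per cluster (an assumed layout)."""
    # Nodes/weights for the weight function exp(-t^2); transform so that
    # b = sqrt(2) * sigma_b * t has the N(0, sigma_b^2) distribution.
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    b = np.sqrt(2.0) * sigma_b * nodes
    w = weights / np.sqrt(np.pi)
    total = 0.0
    for yi, Xi in zip(y, X):
        eta = Xi @ beta                      # fixed-effects linear predictor
        lin = eta[:, None] + b[None, :]      # (n_i, n_quad) grid of log-means
        # Poisson log-likelihood of cluster i at each quadrature node
        ll = (yi[:, None] * lin - np.exp(lin)
              - gammaln(yi + 1.0)[:, None]).sum(axis=0)
        total += np.log(np.sum(w * np.exp(ll)))
    return total
```

With `sigma_b` near zero the quadrature collapses to the ordinary Poisson log-likelihood at the fixed-effects means, which provides a simple sanity check.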
In conditional inference, the focus is on the clusters and the estimation of the random effects is of interest. Prediction in this case refers to new responses from the same clusters. Suppose the true distribution of $y$ is $f(y, u) = f(y|u)p(u)$, where $u$ is the true vector of random effects with density $p(u)$. Following [5], the conditional AI is naturally defined as
$$\mathrm{cAI} = -2 E_{f(y,u)} E_{f(y^*|u)} \log g(y^*|\hat\theta(y), \hat b(y)),$$
where $y^*$ is independent of $y$, generated from the same conditional distribution $f(\cdot|u)$. As with AI, cAI cannot be directly calculated since the true distribution $f$ is unknown. For linear mixed-effects models, unbiased estimators were derived in [5] and [4]. To the best of our knowledge, no unbiased estimator has been proposed for other GLMMs. The following theorem gives an unbiased estimator of cAI for Poisson regression; the proof is given in the Appendix.
Theorem 1. Define
$$\mathrm{cAIC} = -2 \log g(y|\hat\theta(y), \hat b(y)) + 2K, \quad K = \sum_{i=1}^N y_i \left\{ \log \hat y_i(y) - \log \hat y_i(y^{(y_i-1)}) \right\},$$
where $\hat y_i(y)$ denotes the fitted mean of the $i$-th observation based on data $y$, $y^{(y_i-1)}$ is the same as $y$ except that its $i$-th component is replaced by $y_i - 1$, and $y_i \log \hat y_i(y^{(y_i-1)}) = 0$ when $y_i = 0$ by convention. Then cAIC is an unbiased estimator of cAI.

Remark 1. Although the derivation of the unbiased estimator of cAI is different from that for the linear model, with the latter derived by integration by parts [4], the results bear some resemblance to each other. For linear models, $K$ is given by $\sum_i \partial \hat y_i / \partial y_i$, and the partial derivatives can be estimated by finite differences. Our $K$ for Poisson regression is similar in that it depends on the difference between $\hat y_i(y)$ and $\hat y_i(y^{(y_i-1)})$, the fitted responses after perturbing the original observations.

Remark 2. In Theorem 1, we only need to assume that the true model is in the Poisson family with means depending on some random effects $u$, which may differ from the modelled random effects $b$. Thus the true model does not have to be included in the candidate model family. Moreover, we assume nothing about the estimators $\hat\theta$ and $\hat b$; they can be any reasonable estimators used in the literature.
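To make the perturbation scheme behind $K$ concrete, the sketch below computes $K$ with a plain fixed-effects Poisson regression as a stand-in for the fitting procedure that produces $\hat y$ (the theorem allows any reasonable estimator). The Newton-Raphson fitter and all names here are illustrative, not from the note.

```python
import numpy as np

def fit_poisson(X, y, n_iter=50):
    """Poisson regression with log link, fitted by Newton-Raphson.
    A stand-in for whatever estimator produces the fitted means."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)                 # score
        H = X.T @ (X * mu[:, None])           # Fisher information
        beta = beta + np.linalg.solve(H, grad)
    return beta

def penalty_K(X, y):
    """K = sum_i y_i * [log yhat_i(y) - log yhat_i(y^{(y_i-1)})],
    with the i-th term taken as 0 when y_i = 0 (the stated convention)."""
    log_mu_full = X @ fit_poisson(X, y)       # log yhat_i(y)
    K = 0.0
    for i in np.flatnonzero(y > 0):
        y_pert = y.copy()
        y_pert[i] -= 1.0                      # replace y_i by y_i - 1
        beta_i = fit_poisson(X, y_pert)       # refit on perturbed data
        K += y[i] * (log_mu_full[i] - X[i] @ beta_i)
    return K
```

For an intercept-only model the fitted mean is the sample mean, so each perturbed refit simply lowers the mean by $1/N$, which makes the value of $K$ easy to verify by hand.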

Simulation study
We conducted a simulation study to investigate the properties of our unbiased cAIC estimator and to demonstrate the difference between marginal and conditional inference. We simulate data from model (1) with a random intercept:
$$\log \mu_{ij} = \beta_0 + \beta_1 x_j + b_i, \quad i = 1, \ldots, m = 10, \quad j = 0, \ldots, n_i.$$
In our simulation, we consider $n_i = 5$ and $n_i = 15$ with $\sigma_b = 0.25, 0.5$ and $1$. For each of the six specifications, 500 data sets are generated. We compare the cAIC with the true bias correction, BC, defined by
$$\mathrm{BC} = \mathrm{cAI} - \left( -2 E_{f(y,u)} \log g(y|\hat\theta(y), \hat b(y)) \right),$$
which is estimated by simulation with another 500 independent sets of $y^*$'s generated from the true conditional distribution $f(\cdot|u)$ that shares the random effects $u$ with the current responses.
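The key ingredient in the Monte Carlo estimate of BC is a replicate $y^*$ drawn from the same conditional distribution as $y$, i.e. sharing the true random effects $u$. A minimal sketch of this data-generating step is given below; the covariate values $x_j = j$ are an assumption, since the note does not spell them out.

```python
import numpy as np

def simulate_shared_u(beta0, beta1, sigma_b, m=10, n_i=5, seed=0):
    """Draw one data set y and a replicate y* from the random-intercept
    Poisson model log mu_ij = beta0 + beta1 * x_j + u_i, with both
    responses conditioned on the same true random effects u.
    x_j = j is an illustrative choice of covariate values."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_b, size=m)          # true random effects
    x = np.arange(1, n_i + 1)
    mu = np.exp(beta0 + beta1 * x[None, :] + u[:, None])  # (m, n_i) means
    y = rng.poisson(mu)                           # current responses
    y_star = rng.poisson(mu)                      # replicate from f(.|u)
    return y, y_star, u
```

Averaging the evaluated conditional log-likelihoods over many such pairs gives the simulation estimate of BC described above.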
The results are shown in Table 1. The estimated biases are close to the true values. In general, the 'effective number of parameters' increases with the variance of the random effects. The same comparison can be made for mAIC and also for fixed-effects models, but we found in our simulations that the estimator $K$ in those cases is very close to the number of population parameters, so there appears to be no advantage in using our estimator, which only increases the computational burden.
To illustrate the differences between marginal and conditional inference, we use the same setup as before with $(\beta_0, \beta_1) = (1, 0.2)$, $n_i = 5$, and $\sigma_b = 0.125, 0.25$ and $0.5$. The Laplace approximation is used to approximate the marginal likelihood in the calculation of mAIC, for which the bias is simply estimated by the number of population parameters, 3 in this case. In addition, a fixed-effects model $\log \mu_{ij} = \beta_0 + \beta_1 x_j$ is fitted to the data and the standard AIC computed. The values of AIC, mAIC and cAIC, averaged over 500 data sets simulated from the model, are shown in Figure 1 for the different random-effect variances. The difference between mAIC and cAIC is most obvious when $\sigma_b = 0.125$: comparing the information criteria, the fixed-effects model is selected for 395 of the 500 data sets when comparing AIC with mAIC, but for only 3 data sets when comparing AIC with cAIC. When $\sigma_b = 0.25$, the fixed-effects model is selected for 165 of the 500 data sets using mAIC, but only once using cAIC.
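The Laplace approximation used above for mAIC can be sketched for a single cluster with a scalar random intercept: maximize the joint log-density in $b$, then correct with the curvature at the mode. This is an illustrative implementation, not the one used in the note.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

def laplace_cluster_loglik(y, eta, sigma_b):
    """Laplace approximation to log ∫ g(y|b) φ(b; σ_b²) db for one cluster
    with log μ_j = η_j + b and a scalar random intercept b ~ N(0, σ_b²)."""
    def h(b):  # log of the joint density of (y, b)
        lin = eta + b
        return (np.sum(y * lin - np.exp(lin) - gammaln(y + 1.0))
                - 0.5 * b ** 2 / sigma_b ** 2
                - 0.5 * np.log(2.0 * np.pi * sigma_b ** 2))
    b_hat = minimize_scalar(lambda b: -h(b)).x          # mode of h
    curv = np.sum(np.exp(eta + b_hat)) + 1.0 / sigma_b ** 2  # -h''(b_hat)
    # log ∫ exp(h(b)) db ≈ h(b_hat) + 0.5 * log(2π / curv)
    return h(b_hat) + 0.5 * np.log(2.0 * np.pi / curv)
```

As `sigma_b` shrinks, the approximation tends to the fixed-effects Poisson log-likelihood, which again gives a simple check against hand computation.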