Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R

In this paper we present R functions for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models. An Expectation Maximization (EM) algorithm is implemented. Inflated models allow for modelling both over-dispersion (or under-dispersion) and negative correlation and thus they are appropriate for a wide range of applications. Extensions of the algorithms for several other models are also discussed. Detailed guidance and implementation on simulated and real data sets using R functions is provided.


Introduction
Bivariate Poisson models are appropriate for modelling paired count data exhibiting correlation.Paired count data arise in a wide context including marketing (number of purchases of different products), epidemiology (incidents of different diseases in a series of districts), accident analysis (number of accidents in a site before and after infrastructure changes), medical research (the number of seizures before and after treatment), sports (the number of goals scored by each one of the two opponent teams in soccer), econometrics (number of voluntary and involuntary job changes), just to name a few.Unfortunately the literature on such models is sparse due to computational problems involved in their implementation.
Bivariate Poisson models can be expanded to allow for covariates, extending naturally the univariate Poisson regression setting.Due to the complicated nature of the probability function of the bivariate Poisson distribution, applications are limited.The aim of this paper is to introduce and construct efficient Expectation-Maximization (EM) algorithms for such models including easy-to-use R functions for their implementation.We further extend our methodology to construct inflated versions of the bivariate Poisson model.We propose a model that allows inflation in the diagonal elements of the probability table.
Such models are quite useful when, for some reasons, we expect diagonal combinations with higher probabilities than the fitted under a bivariate Poisson model.For example, in pre and post treatment studies, the treatment may not have an effect on some specific patients for unknown reasons.Another example arises in sports where, for specific cases, it has been found that the number of draws in a game is larger than those predicted by a simple bivariate Poisson model (Karlis and Ntzoufras, 2003).
In addition, an interesting property of inflated models is their ability to allow for modelling both correlation between two variables and over-dispersion (or alternatively underdispersion) of the corresponding marginal distributions.Given their simplicity, such models are quite interesting for practical purposes.
The remaining of the paper proceeds as follows: in Section 2 we introduce briefly the bivariate Poisson and the diagonal inflated bivariate Poisson regression models.In Section 3 we provide a detailed description of the R functions.Several illustrative examples (simulated and real) including guidance concerning the fitting of the models can be found in section 4. Finally, we end up with some concluding remarks in Section 5. Detailed description and presentation of the EM algorithms for maximum likelihood (ML) estimation is provided at the Appendix.
2 Models for Bivariate Poisson Data

Bivariate Poisson Regression models
Consider random variables X κ , κ = 1, 2, 3 which follow independent Poisson distributions with parameters λ κ , respectively.Then the random variables X = X 1 +X 3 and Y = X 2 +X 3 follow jointly a bivariate Poisson distribution, BP (λ 1 , λ 2 , λ 3 ), with joint probability function The above bivariate distribution allows for positive dependence between the two random variables.Marginally each random variable follows a Poisson distribution with E(X) = λ 1 + λ 3 and E(Y ) = λ 2 + λ 3 .Moreover, Cov(X, Y) = λ 3 , and hence λ 3 is a measure of dependence between the two random variables.If λ 3 = 0 then the two variables are independent and the bivariate Poisson distribution reduces to the product of two independent Poisson distributions (referred as double Poisson distribution).For a comprehensive treatment of the bivariate Poisson distribution and its multivariate extensions the reader can refer to Kocherlakota and Kocherlakota (1992) and Johnson et al. (1997).
More realistic models can be considered if we model λ 1 , λ 2 and λ 3 using covariates as regressors.In such case, the Bivariate Poisson regression model takes the form where i = 1, . . ., n, denotes the observation number, w κi denotes a vector of explanatory variables for the i-th observation used to model λ κi and β κ denotes the corresponding vector of regression coefficients, κ = 1, 2, 3.The explanatory variables used to model each parameter λ κi may not be the same.Usually, we consider models with constant λ 3 (no covariates on λ 3 ) because such models are easier to interpret.Although, assuming constant covariance term results to models that are easy to interpret, using covariates on λ 3 helps us to have more insight regarding the type of influence that a covariate has on each pair of variables.To make this understood recall that the marginal mean for X i is equal (from (2)) to E(X i ) = exp(w T 1i β 1 ) + exp(w T 3i β 3 ).If a covariate is present in both w 1 and w 3 , then a considerable part of the influence of this covariate is through the covariance parameter that is common for both X and Y variables.Moreover, such an effect is no longer multiplicative on the marginal mean (additive on the logarithm) but much more complicated (multiplicative on λ 1 and λ 3 and additive on the marginal mean).Jung and Winkelmann (1993) introduced and implemented bivariate Poisson regression model using a Newton-Raphson procedure.Ho and Singer (2001) and Kocherlakota and Kocherlakota (2001) proposed a generalized least squares and Newton Raphson algorithm for maximizing the log-likelihood respectively.Here we construct an EM algorithm to remedy convergence problems encountered with the Newton Raphson procedure.The algorithm is easily coded to any statistical package offering algorithms fitting generalized linear models (GLM).Here, we provide R functions for implementing the algorithm.Standard errors for the parameters can be calculated using the information matrix provided in Jung and Winkelmann (1993) or using standard bootstrap methods.The latter is quite easy since good initial values are available and the algorithm converges fairly quickly.Finally, Bayesian inference has been implemented by Tsionas (2001) and Karlis and Meligkotsidou (2005).

Diagonal Inflated Bivariate Poisson regression models
A major drawback of the bivariate Poisson model is its property to model data with positive correlation only.Moreover, since its marginal distributions are Poisson they cannot model over-dispersion/under-dispersion.As a remedy to the above problems, we may consider mixtures of bivariate Poisson models like those of Munkin and Trivedi (1999) and Chib and Winkelmann (2001).However, such models involve difficult computations regarding estimation and can not handle under-dispersion.In this section, we propose diagonal inflated models that are computationally tractable and allow for over-dispersion, (under-dispersion) and negative correlation.The results reported are new apart from a quick comment in Karlis and Ntzoufras (2003) and some special cases treated in Li et al. (1999), Dixon and Coles (1997) and Wang et al. (2003).
In the univariate setting, inflated models can be constructed by inflating the probabilities of certain values of variable under consideration, X.Among them, zero-inflated models are very popular (see, for example, Lambert, 1992, Bohning et al. , 1999).In the multivariate setting, there are few papers discussing inflated model in bivariate discrete distributions.
Such models have been proposed by Dixon and Coles (1997)  We propose a more general model formulation which inflates the probabilities in the diagonal of the probability table.This model is an extension of the simple zero-inflated model which allows only for an excess in (0, 0) cell.We consider, for generality, that the starting model is the bivariate Poisson model.Under this approach a diagonal inflated model is specified by where f D (x | θ) is the probability function of a discrete distribution D(x; θ) defined on the set {0, 1, 2, . ..} with parameter vector θ.Note that for p = 0 we have the simple bivariate Poisson model defined in the previous section.Diagonal inflated models (3) can be fitted using the EM algorithm provided at the Appendix.
Useful choices for D(x; θ) can be the Poisson, the Geometric or simple discrete distributions denoted by Discrete(J).The Geometric distribution might be of great interest since it has mode at zero and decays quickly as one moves away from zero.As Discrete(J) we consider the distribution with probability function where J x=0 θ x = 1.If J = 0 then we end up with the zero-inflated model.Two are the most important and distinctive properties of such models.Firstly, the marginal distributions of a diagonal inflated model are not Poisson distributions, but mixtures of distributions with one Poisson component.Namely the marginal for X is given by where f P o (x | λ) is the probability function of the Poisson distribution with parameter λ.
This can be easily recognized as a 2-finite mixture distribution with two components, the one having a Poisson distribution and the other a f D distribution.For example, if we consider a Geometric inflation then the resulting marginal distribution is a 2-finite mixture distribution with one Poisson and one geometric component.Thus the marginal mean is given by where E D (X) denotes the expectation of the distribution D(x; θ).The variance is much more complicated and is given by Since the marginals are not Poisson distributions, they can be either under-dispersed or over-dispersed depending on the choices of D(x; θ).For example, if D(x; θ) is a degenerate at one (that is, Discrete(1) with θ T = (0, 1)) implying inflation only on the (1, 1) cell, then, for λ 1 + λ 3 = 1 and p = 0.5, the resulting distribution is under-dispersed (variance equal to 0.5 and mean equal to 1).On the other hand, if the inflation distribution has positive probability on more points, for example a geometric or a Poisson distribution, the resulting marginal distribution will be over-dispersed.In the simplest case of zero-inflated models, the marginal distributions are also over-dispersed relative to the simple Poisson distribution.
Another important characteristic is that, even if λ 3 = 0 (double Poisson distribution), the resulting inflated distribution introduces a degree of dependence between the two variables under consideration.In general, the simple bivariate Poisson models has Thus for an inflated model we obtain The above formulas are a generalization of simpler case where inflation is imposed only on the (0, 0) cell given by Wang et al. (2003).If the data before introducing inflation are independent, that is if λ 3 = 0, the covariance is given by which implies non-zero correlation between X and Y .Note that for certain combinations of D(x; θ), the covariance can be negative as well.For example, if p = 0.5, λ 1 = 0.5, λ 2 = 2 and the inflation is a degenerate at one distribution then the covariance equals −0.125.
When inflation is added only on the cell (0, 0), we obtain that E D (X) = E D (X 2 ) = 0 and All the above datasets can be loaded (after attaching the bivpois package) using the command data; for example data(ex1.sim)loads the first dataset included in the package.

The Function pbivpois
Function pbivpois evaluates the probability function (or its logarithm) of BP (λ 1 , λ 2 , λ 3 ) for x and y values.The function can be called using the following syntax: pbivpois(x, y=NULL, lambda = c(1, 1, 1), log = FALSE) • REQUIRED ARGUMENTS: x: Matrix or vector containing the data.If this argument is a matrix then pbivpois evaluates the distribution function of the bivariate Poisson for all the pairs provided by the first two columns of x.Additional columns of x and y are ignored.
y: Vector containing the data for the second value of each pair (x i , y i ) for which we calculate the distribution function of Bivariate Poisson.It is used only if x is also a vector.Vectors x and y should be of equal length.
• OPTIONAL ARGUMENS: lambda: Vector of length three containing values of the parameters λ 1 , λ 2 and λ 3 of the bivariate Poisson distribution.
log: Logical argument controlling the calculation of the logarithm of the probability or the probability function itself.If the argument is not used, the function returns the probability value of (x i , y i ) since the default value is FALSE.

The Function simple.bp
Function simple.bpimplements the EM algorithm for fitting the simple bivariate Poisson model of the form (x i , y i ) ∼ BP (λ 1 , λ 2 , λ 3 ) for i = 1, . . ., n.It produces a 'list' object which gives various details regarding the fit of such a model.The function can be called using the following syntax: simple.bp(x, y, ini3=1.0,maxit=300, pres=1e-8) • REQUIRED ARGUMENTS: x, y : vectors containing the data.
maxit: Maximum number of EM steps.EM algorithm stops when the number of iterations exceeds maxit and returns as result the values obtained by the last iteration.
pres: Precision used in log-likelihood improvement.If the relative log-likelihood difference between two subsequent EM steps is lower than pres then the algorithm stops.Note that the algorithm stops if one of the arguments maxit or pres is satisfied.
A list object is returned with the following output components: lambda: Parameters λ 1 , λ 2 , λ 3 of the model.
loglikelihood: Log-likelihood of the fitted model.This argument is given in a vector form of length equal to iterations with one value per iteration.This vector can be used to monitor the log-likelihood improvement and the convergence of the algorithm.
parameters: Number of estimated parameters of the fitted model.
-AIC, BIC: AIC and BIC values of the fitted model.Values of AIC and BIC are also given for the double Poisson model and the saturated model.
iterations: Number of iterations of the EM algorithm.
During the execution of the algorithm the following details are printed: the iteration number, λ 1 , λ 2 , λ 3 , the log-likelihood and the relative difference of the log-likelihood.For an illustration of using this function see example 1 in section 4.1.

The Function lm.bp
Function lm.bp implements the EM algorithm for fitting the bivariate Poisson regression model ( 2) with for κ = 1, 2, 3; where l κ is a n × 1 vector containing the log λ κ for each observation, that is l κ = (log λ κ1 , . . ., log λ κn ) T ; n is the sample size; p 1 , p 2 and p 3 are the number of explanatory variables used for each parameter λ κ ; w 1 , w 2 and w 3 are the design/data matrices used for each parameter λ κ ; w κ,j are n × 1 vectors corresponding to j column of w κ matrix; β κ is the vector of length p κ + 1 with the model coefficients for λ κ and β κ,j is the coefficient which corresponds to j explanatory variable used in the linear predictor of λ κ (or j column of w κ design matrix).
data: data frame containing the variables in the model.
• OPTIONAL ARGUMENTS -l1l2=∼1: Formula of the form "∼ X 1 +. . .+X p " for the common parameters of log λ 1 and log λ 2 .If the explanatory variable is also found on l1 and/or l2 then a model using interaction type parameters is fitted (one parameter common for both predictors [main effect] and differences from this for the other predictor [interaction type effect] ).Special terms of the form "c(X1,X2)" can be also used here.These terms imply common parameters on λ 1 and λ 2 for different variables.
For example if c(x1,x2) is used then use the same beta for the effect of x 1 on log λ 1 and the effect of x 2 on log λ 2 .For details see example 4.
common.intercept=FALSE:Logical function specifying whether a common intercept on log λ 1 and log λ 2 should be used.
-zeroL3=FALSE Logical argument controlling whether λ 3 should be set equal to zero (therefore fits a double Poisson model).
-pres=1e-8: Precision used to terminate the EM algorithm.The algorithm stops if the relative log-likelihood difference is lower than the value of pres.
- A list object is returned with the following output components: coefficients: Vector β containing the estimates of the model parameters.When a factor is used then its default set of contrasts is used.
fitted.values:Data frame of size n × 2 containing the fitted values for the two responses x and y.For the bivariate Poisson model are simply given as λ 1 + λ 3 and λ 2 + λ 3 respectively.
residuals: Data frame of size n×2 containing the residual values for the two responses x and y given by x-fitted.value$xand y-fitted.value$y.
loglikelihood: Log-likelihood of the fitted model given in a vector form of length equal to iterations(one value per iteration).
parameters: Number of estimated parameters of the fitted model.
-AIC, BIC: AIC and BIC values of the model.Values are also given for the saturated model.
iterations: Number of iterations of the EM algorithm.
call: Argument providing the exact calling details of the lm.bp function.
The resulted object of the lm.bp function is a list of class lm.bp, lm and hence the functions coef, residuals and fitted can be also implemented.

The Function lm.dibp
Function lm.dibp implements the EM algorithm for fitting the simple diagonal inflated bivariate Poisson model of the form where λ κi are specified as in (6) proportion p evaluated at (x, y); see also equation ( 3).
-distribution='discrete': Specifies the type of inflated distribution; 'discrete' -jmax=2: Number of parameters used in Discrete distribution.This argument is not used when the diagonal is inflated with the geometric or the Poisson distribution.
A list object is returned with the following output components: coefficients: Vector containing the estimates of the model parameters (β 1 , β 2 , β 3 , p and θ).
fitted.values:Data frame with n lines and 2 columns containing the fitted values for x and y.For the diagonal inflated bivariate Poisson model are given by where E D (X) is the mean of the distribution used to inflate the diagonal.
diagonal.distribution:Label stating which distribution was used for the inflation of the diagonal.
theta: Estimated parameters of the diagonal distribution.If distribution='discrete' then the variable is a vector of length jmax with θ j =theta[j] for j = 1, . . ., jmax and then θ is the mean of the Poisson; if distribution='geometric' then θ is the success probability of the geometric distribution.
call: Argument providing the exact calling details of the lm.dibp function (same as in lm.bp).
The resulted object of the lm.dibp function is a list of class lm.dibp, lm and hence the functions coef, residuals and fitted can be also implemented.For an illustration of using this function see examples in sections 4.2, 4.4 and 4.5.

More Details on Formula Arguments.
In this subsection we provide details regarding the formula arguments used in functions lm.bp and lm.dibp.The formulas l1 and l2 are required in both functions.They are of the form x∼X1+X2+...+Xp and y∼X1+X2+...+Xp respectively; where x and y are the two response variables.Variables used as explanatory variables in l1 and l2 specify unique and not equal terms on the linear predictors of λ 1 and λ 2 parameters respectively.Therefore, if X1 is used as independent variable in l1 but not in l2 then X 1 will be used in the linear predictor of λ 1 but not on the linear predictor of λ 2 .
Formula arguments ll12 and l3 are not required.The first is used to specify variables that have common effect on the linear predictors of both λ 1 and λ 2 .The second argument, l3, is used to model the linear predictor of λ 3 which controls the covariance between the two response variables.Both of these formulas have the form ∼X1+...+Xp, hence no response variable should be defined.Note that if a term is present as an explanatory variable in l1 and/or l2 and l1l2 then non-common effect will be fitted.In more detail, the following combinations can be used as terms: 1. Effect only on λ 1 : lm.bp(x∼z1, y∼1, l1l2=NULL,...) .

5.
Different effect on λ 1 and λ 2 will be also fitted using the combination lm.bp(x∼z1, y∼1, l1l2=z1,...) with the difference that two parameters will be provided for effect of z1 on λ 1 .The first one will be equal to the effect of z1 on λ 2 (common effect) and the second will be the difference effect of z1 on λ 1 .Similarly the combination lm.bp(x∼1, y∼z1, l1l2=z1,...) will provide a common effect for both λ 1 and λ 2 and an additional difference effect of z1 for λ 2 .
A special term can be also used in both lm.bp and lm.dibp when we wish to specify a common effect on λ 1 and λ 2 for different variables.This can be achieved using terms of type c(z1,z2).Such as a term results to a common parameter for both λ 1 and λ 2 for the variable z 1 and z 2 respectively.For example, the combination of the following formulas l1=x∼Z4, l2=y∼Z4, l1l2=∼c(z1,z2)+ z5 will result to the following model Term c(z1,z2) will be denoted with the label z1..z2 .Such kind of terms are useful for fitting the models of Karlis and Ntzoufras (2003) for sports outcomes; see for an illustration in section 4.5.1.Note that β 12 is common to both regressions, this is why we did not use the comma separator to show that this is a common coefficients for regressors z 1 and z 2 in the two lines.Some commonly used models can be specified by the following syntax: • lm.bp(x∼1, y∼1, common.intercept=TRUE,...) : Common constant for λ 1 and λ 2 that is log λ 1i = β 0 and log λ 2i = β 0 .
Operators +, :, * can be used as in normal formula objects to define additional main effects, interaction terms, or both.

Simulated Example 1
In order to illustrate our algorithm, we have simulated 100 data points (x i , y i ) from a bivariate Poisson regression model of type (2) with λ 1i , λ 2i , λ 3i given by for i = 1, . . ., 100; where Z ki (k = 1, . . ., 5 and i = 1, . . ., 100) have been generated from a normal distribution with mean zero and standard deviation equal to 0.1 .The sample means were found equal to 11.8 and 7.9 for X and Y respectively.The correlation and the covariance were found equal to 0.623 and 6.75 respectively indicating that a bivariate Poisson model should be fitted.statistics based on the log-likelihood, we may also test the significance of specific parameters and identify which model should be selected.

Fitting the Simple (no regressors) Bivariate Poisson Model in R
In order to fit the simple bivariate Poisson model and store the results in an object called ex1.simple we firstly load the bivpois package and the data of the first example using the following commands: library(bivpois) data(ex1.sim)and then fit the model by: ex1.simple<-simple.bp(ex1.sim$x,ex1.sim$y)where x, y are vectors of length 100 containing our data included in ex1.sim data frame.
If we wish to monitor the calculated arguments then we type names(ex1.simple)resulting to: [1] "lambda" "loglikelihood" "parameters" "AIC" "BIC" [6] "iterations" We may further monitor any of the above values by typing the name of the stored object (here ex1.simple followed by the dollar character ($) and the name of the output component we wish to monitor.For example ex1.simple$lambda will print the values of λ 1 , λ 2 and λ 3 : Finally, all variables of ex1.simple can be printed by simple typing its name: > ex1.simple $lambda [1]  Parameter of Z 3 is common for both λ 1 and λ 2 ; Blank cells correspond to zero coefficients (the corresponding covariate was not used); ( * ): Corresponds to λ 3 = 0).
variables were stored in vectors x and y while the explanatory variables in vector z1, z2, z3, z4 and z5 within the above data frame.
Firstly, the simple model fitted in the previous section using the simple.bpfunction can be also fitted using the lm.bp function (see model ex1.m3 in the syntax which follows).The major difference is that the second function provides a wider variety of output components.
The syntax for fitting models 2 -11 presented in Table 1  In the above results, within brackets the parameter λ i is indicated for which each parameter is referred.Separate estimates for the effects on λ 1 , λ 2 and λ 3 can be obtained using the commands ex1.m11$beta1, ex1.m11$beta2 and ex1.m11$beta3, respectively.
From the above results, the model can be summarized by the following equation To illustrate our method we have implemented diagonal inflated models on both data of simulated example 1 and 2. For the data of the previous section no improvement was evident (estimated mixing proportion for all models was found equal to zero).For the data of example 2, both BIC and AIC values indicate the Poisson distribution is the most suitable for the diagonal inflation.Moreover, for the discrete distribution, we need to set at least J = 4 in order to get values of BIC lower than the corresponding values of the bivariate Poisson model with no inflation (see Table 3).Estimated parameters for the diagonal inflated model with the best discrete distribution, Poisson and geometric distributions are provided in Table 4.In all models we have used the actual underlying covariate set-up as given for model 11 in section 4.1.The above results can be summarized by the following model (θ 1 , θ 2 , θ 3 , θ 4 , θ 5 ) = (0.302, 0.149, 0.224, 0.147, 0.103) Similarly model 9 produces the following results

Example 3: Real Application
In this example, we have fitted bivariate Poisson models on data concerning the demand for Health Care in Australia, reported by Cameron and Trivedi (1986).The data refer to the Australian Health survey for 1977-1978.The sample size is quite large (n = 5190) although they are only a subsample of the collected data.We will use two variables, namely the number of consultations with a doctor or a specialist (X) and the total number of prescribed medications used in past 2 days (Y ) as the responses; see Table 5 for a cross-tabulation of the data.The data are correlated (Pearson correlation equal to 0.308) indicating that a bivariate Poisson model should be used.It is also interesting to examine the effect of the correlation in the estimates.
Three variables have been used as covariates: namely the gender (1 female, 0 male), the age in years divided by 100 (measured as midpoints of age groups) and the annual income in Australian dollars divided by 1000 (measured as midpoint of coded ranges).More details on the data and the study can be found in Cameron and Trivedi (1986).Standard errors for the bivariate regression models have been calculated using 200 bootstrap replications.This is easily implemented since, in this case, the convergence of the algorithm is fast due to the use of good initial values.
Comparing models (a) and (c) we can conclude that the covariance term is significant (p-value< 0.01).The effects of all covariates are statistically significant using asymptotic ttests.Furthermore, the effect of gender on the covariance term is significant.Similar results can be obtained using AIC and BIC values.
Let us now examine the estimated parameters.Concerning models (a) and (c), we observe that covariate effects for the two models are quite different.This can be attributed to the covariance λ 3 which is present.Using a bivariate Poisson model we take into account the covariance between the two variables and hence the effect of each variable on the other including the effect of the covariates.This may indicates that a Double Poisson would estimate incorrectly the true effect of each covariate on the marginal mean.
When comparing models (a) and (b), the covariate 'gender' in the covariance parameter is significant indicating that males and females have different covariance term.Note that, the gender effect on λ 1 (mean of variable X: number of doctor consultations) has changed dramatically, while this is not true for the rest parameters.This is due to the fact that the marginal mean for X is now λ 1 + λ 3 (instead of λ 1 ) and, since they share the same variate (gender), we observe different estimates concerning the gender effect on the number of doctor consultations.A plausible explanation might be that gender influences the number of doctor consultations mainly through the covariance term.A final important comment is that the gender effect in the bivariate Poisson model is no longer multiplicative on the mean (additive on the logarithm) since the marginal mean is equal to λ 1 + λ 3 .

Concluding Remarks
In this article we have presented R functions implementing maximum likelihood estimation for bivariate Poisson regression models and their diagonal inflated variations.Diagonal inflated models, also presented here, are useful in cases where excess of combinations of pairs with equal x and y values appear (for example in sports data; see Karlis and Ntzoufras, 2003).All functions are based on EM algorithms constructed for such models; see Appendix for details.
The software presented in this paper implements methodology which can be easily extended and implemented in several variations of models discussed in this article.For example, the extension of the EM algorithm to the multivariate Poisson models is straightforward since such models are obtained through similar multivariate reduction techniques and the same data augmentation approach can be easily applied.Similarly, an EM algorithm can be easily modified to cover the case of finite mixtures of bivariate Poisson regressions.Such a model is a generalization of the approach presented by Wang et al. (1996).The inflated models can be seen as a special case of finite mixtures of bivariate Poisson distributions.
The algorithms can be even extended to cover the case of models with random effects.For example, assuming gamma random effects, we obtain a bivariate negative binomial regression model, as in Munkin and Trivedi (1999).
Finally, the trivariate reduction technique (used in the data augmentation approach here) is useful for constructing multivariate models from simpler ones; see for example the bivariate generalized Poisson model of Vernic (1997).Clearly, EM algorithms, identical to the ones presented here, can be used to cover several other models arising by similar trivariate technique.
3i , p (k) and θ (k) , for i = 1, . . ., n calculate where f D (x | θ) is the probability function of the inflation distribution with parameter vector θ evaluated at the value of x.
For specific choices of the inflation distribution we obtain the following estimates: • Geometric Distribution: For the geometric distribution, with probability function x θ, 0 ≤ θ ≤ 1, x = 0, 1, . .., θ is updated by Note that, if θ = 1 the zero-inflated model is deduced.
• Poisson distribution: For the Poisson distribution with probability function f (x|θ) = • Discrete distribution: For any discrete distribution, Discrete(J), with probability function (4) then the model parameters are given by θ j = ( n i=1 v i ) −1 n i=1 I(X i = Y i = j)v i for j = 1, . . ., J and θ 0 = 1 − J j=1 θ j ; where I(x) is the indicator function taking value equal to one if x is true and zero otherwise.
• Zero-Inflated model: The zero inflated model is a special case of Discrete(J) with J = 0 and θ 0 = 1 which results to the inflation of cell (0, 0).Hence, there is no need to estimate additional parameters except p which is the mixing proportion of the inflation component.Further note that the zero-inflated model is a limiting case when either the Poisson (with θ → 0) or the geometric (with θ → 1) inflation is used.
In fact the M-step consists of several iterations of the iterated reweighted algorithm used for GLM.Hence the algorithm is an Expectation Conditional Maximization (ECM) algorithm.Usually the number of iterations needed to fit the GLM within each M-step can be considerably reduced if we use as starting values the values obtained by the previous EM step.Alternatively, we may constrain the number of iterations for fitting the GLM to a small number.This will be still sufficient to improve the log-likelihood, despite the fact that the fitted model may not be the best within each iteration of the EM algorithm.
Further complexity can be added to the model by imposing a additional covariate structure on parameters θ or p. EM-algorithms need to be slightly modified in order to incorporate such extensions.Similarly, the model of Dixon and Coles (1997) can be fitted using an EM algorithm identical to the one proposed here.The algorithm presented here, can be considered as a generalization of the algorithm described in Wang et al. (2003).Finally, generalizations of the models for multivariate versions can easily be derived.
verbose=getOption('verbose'): Logical argument for controlling the printing details during the iterations of the EM algorithm.Default value is taken equal to the value of options()$verbose.If verbose=FALSE then only the iteration number, the log-likelihood and its relative difference from the previous iteration are printed.If print.details=TRUEthen the model parameters for λ 1 , λ 2 and λ 3 are additionally printed.
574695 2.894695 5.125305 Similarly the command ex1.simple$BIC returns: > ex1.simple$BICSaturated DblPois BivPois 1869.7591091.8421049.359From the above output, the BIC of our fitted model is equal to 1049.36 .For comparison, the values of the simple double Poisson model and the saturated model are also given (1091.84ad 1869.76respectively).As saturated model we consider the double Poisson model with perfect fit, that is the expected values are equal to the data.Here BIC indicates that the bivariate Poisson model is better than both the simple double Poisson model and the saturated one.
Three competing models were fitted to the data: a) a model with constant covariance term (no covariates on λ 3 ), b) a model with covariates on the covariance term λ 3 (only gender was used which induces different covariance for each gender) and c) a model without any covariance (double Poisson model); detailed results are given at y, s, v and ṽ are n × 1 vectors with elements x i , y i , s i , v i and ṽi = 1 − v i for i = 1, . . ., n, βv (y, W ) is the weighted maximum likelihood estimates of β of a Poisson regression model with response y, data matrix W and weight vector v, and θv,D is the weighted maximum likelihood estimates of θ for the distribution D(x; θ) and weights given by vector v.

Table 2 .
Table 1 with their BIC and AIC values.Both AIC and BIC indicate that the best fitted model (among the ones we have tried) is model 11 which is the actual model we have used to generate our data.Using asymptotic χ 2

Table 3 :
.2 Simulated Example 2: Diagonal Inflated models Details for Fitted Models for Simulated Example 2 (Par.: Number of Parameters; vector γ (of length 100) with success probability 0.30 and a Poisson vector d (of length 100) with mean equal to 2. The new data (x i , y i ) were constructed by settingx i = x i (1 − γ i ) + γ i d iLog-like: Log-likelihood; Mix.Prop.: Mixing Proportion).andyi= y i (1 − γ i ) + γ i d i for i = 1, . .., n.Finally, 27 observations were contaminated with sample mean equal to 2.4.This means that 27 observations of the data generated in example 1 were randomly substituted by equal X and Y values generated from a Poisson distribution with mean equal to 2. Data are available using the command data(ex2.sim)after loading the bivpois package.

Table 4 :
Estimated Parameters for Fitted Models of Simulated Example 2. Parameter vector for model 7 is given by θ = (θ 0 , θ 1 , θ 2 , θ 3 , θ 4 , θ 5 ) .Here we illustrate how we can fit diagonal inflated models presented in Table3using lm.dibp function.The following commands have been used to fit each model

Table 5 :
Cross Tabulation of Data from the Australian Health Survey(Cameron and Trivedi,

Table 6 :
Results from Fitting Bivariate Poisson Models for the Data of Example 3.

Table 7 :
Karlis and Ntzoufras (2003)al Inflated Bivariate Poisson Models for the Data ofThe above changes do not affect the selection of the best model which is the diagonal inflated Here we illustrate we can use our R functions to fit models implemented byKarlis and Ntzoufras (2003).Data concerning the football data of Italian Serie A league for season 1991-92 are stored in a data frame object called ex4.ita91 and can loaded using the command data(ex4.ita91).Four variables are included in the dataset: g1, g2, team1, team2 corresponding to the goals scored by the home and away team and the coded level of the home and the away team respectively.Sample of the data frame is given below: In this example team1 indicates the attacking teams for variable g1 while team2 indicates the attacking teams for variable g2.Similarly team1 and team2 indicate the defensive teams for variables g2 and g1 respectively.Since we want to estimate common attacking and defensive effects we should use a term of type c(team1,team2) for estimating the common attacking parameters and a term of type c(team2,team1) for estimating the common defensive parameters.Therefore we fit the models presented in Karlis and Ntzoufras using the following

Table 1 of
Karlis and Ntzoufras, 2003)using the lm.bp and lm.dibp functions (with presision 10 −8 and maximum number iterations equal to 300) Finally, in Table8we present the parameters of models 1, 2 and 8 in connection with the R output (see also Table3ofKarlis and Ntzoufras, 2003).