A multivariate Poisson model based on comonotonic shocks

Multivariate count data arise naturally in practice. In analysing such data, it is critical to define a model that can accurately capture the underlying dependence structure between the counts. To this end, this paper develops a multivariate model wherein correlated Poisson margins are generated by a comonotonic shock vector. The proposed model allows for greater flexibility in the dependence structure than that of the classical construction, which relies on the convolution of vectors of common Poisson shock variables. Several probabilistic properties of the multivariate comonotonic shock Poisson model are established, and various estimation strategies are discussed in detail. The model is further studied through simulations, and its utility is highlighted using a real data application.


Introduction
Multivariate count data are ubiquitous in statistical applications to medicine, risk management and many other fields. An accurate analysis of such data relies critically on the specification of a probabilistic representation of the dependence structure between the counts. A typical approach is to assume Poisson margins and use an appropriate modelling technique to define their joint behaviour. Numerous approaches can be taken to accomplish this, and as a result, there is no unique definition of a multivariate Poisson distribution. Some of the most common solutions are briefly reviewed.

Standard Approach
A well-known method for constructing correlated Poisson random variables is the multivariate reduction technique. In the most general setting, this approach is based on the convolution of independent Poisson random variables, which can be viewed as a set of multiple common shocks to the margins. This framework was first described by Mahamunulu (1967) and revisited by several authors, including Kawamura (1976Kawamura ( ,1979, Karlis & Meligkotsidou (2007) and Inouye et al. (2017).
In the trivariate case, for example, the multivariate reduction formulation amounts to expressing the components of a vector (X 1 , X 2 , X 3 ) of Poisson random variables as sums of mutually independent Poisson random shocks Y 1 , Y 2 , Y 3 , Y 12 , Y 13 , Y 23 and Y 123 , namely In arbitrary dimension d 2, correlated Poisson margins can be generated by convoluting overlapping subsets of a (2 d 1)-variate shock vector of mutually independent Poisson random variables.
While the above-mentioned model construction is intuitive and provides some flexibility in the implied correlation structure, deriving the probability distribution is tedious, thereby leading to intractable estimation, as pointed out by Karlis (2003), Karlis & Meligkotsidou (2007) and Inouye et al. (2017). A simpler version of this approach is thus typically considered wherein a single common shock variable is used to induce dependence. In arbitrary dimension d, the model can be written as in terms of mutually independent Poisson random variables Y 1 , : : : , Y d , Z. This formulation is considered by several authors, such as Loukas & Papageorgiou (1991), Loukas (1993), Tsionas (1999) and Karlis (2003). We will hereafter refer to this construction as the classical, or common shock, multivariate Poisson model. Note that when only two variables are involved, the model reduces to the classical bivariate Poisson model of Campbell (1934). This family of bivariate Poisson distributions has been extensively studied in the literature and applied in numerous contexts; for reviews, refer to, for example, Johnson et al. (1997), Inouye et al. (2017) and Schulz (2018).

Limitations
In the classical multivariate Poisson model, the common shock construction in Eqn (1) inherently imposes restrictions on the implied dependence structure: pairwise covariances must be positive and identically equal to the common shock rate E(Z), which itself cannot exceed the corresponding marginal Poisson rates. Specifically, the correlation of the pair (X j , X k ), with marginal rates j and k , respectively, is restricted to fall within the interval 0; min. j ; k /= p j k for any distinct j, k 2 {1, : : : , d}. Griffiths et al. (1979) show, however, that for an arbitrary pair of Poisson random variables with respective marginal means j and k , the correlation can fall anywhere in the interval OE min . j ; k /; max . j ; k / with where G denotes the cumulative distribution function of a Poisson random variable with rate , N G is the corresponding survival function and N D f0; 1; : : : g.
It can be checked easily that the implied pairwise correlations in the classical multivariate Poisson model based on a common shock will fall short of max , except in the case where the marginal rates are identical. Moreover, the classical model cannot account for negative dependence. Further note that while the use of multiple shocks in the more general model of Mahamunulu (1967) alleviates the restriction that all pairwise covariances coincide, the implied pairwise correlations are still restricted to the interval 0; min. j ; k /= p j k .

Alternative Solutions
There are many alternative methods that can be used to establish a multivariate Poisson model. For example, copulas provide a flexible framework for constructing multivariate distributions with arbitrary margins. As such, a multivariate Poisson model can be defined by combining Poisson margins with a particular copula family. This approach was considered by several authors, such as Van Ophem (1999), Pfeifer & Nešlehová (2004), Nikoloulopoulos & Karlis (2009), Smith & Khaled (2012, Panagiotelis et al. (2012) and Inouye et al. (2017).
As another option, a multivariate Poisson distribution can be derived in terms of mixture models. Examples are given by Karlis & Meligkotsidou (2007), Sarabia & Gómez-Déniz (2011), Gómez-Déniz et al. (2012 and Sellers et al. (2016), to name a few. For an extensive review of these alternative multivariate Poisson model formulations, refer to, for example, Inouye et al. (2017) and Schulz (2018). Despite the flexibility that ensues from such model constructions, the resulting distributions are often very complex and difficult to interpret.

Purpose of the Present Paper
The goal of this work is to introduce a multivariate Poisson model allowing for a flexible dependence structure, as is the case for copula and mixture models, while retaining the simplicity and interpretability that ensues from the multivariate reduction technique. To this end, we propose a construction analogous to the latter, but which relies on the notion of a comonotonic shock vector rather than independent common shocks.
By definition, a random vector (Z 1 , : : : , Z d ) is comonotonic if and only if each component can be represented in terms of a common underlying uniform random variable on the unit interval. That is, for a uniform random variable U U.0; 1/, the comonotonic vector can be written as .Z 1 ; : : : ; Z d / D .F 1 1 .U /; : : : ; F 1 d .U //; where for each j 2 {1, : : : , d}, F j denotes the distribution function of Z j . It can be shown easily that a comonotonic vector follows the upper Fréchet-Hoeffding boundary distribution, that is, for all´1; : : : ;´d 2 R, Pr.Z 1 Ä´1; : : : ; Z d Ä´d / D min fF 1 .´1/; : : : ; F d .´d /g : Analogously to the method of Mahamunulu (1967), correlated Poisson random variables can be generated by convoluting a random vector (Y 1 , : : : , Y d ) of mutually independent Poisson components with varying overlaps of a set of comonotonic shocks. In the trivariate case, for example, this construction can be expressed as where the vectors (Z 12 , Z 21 ), (Z 13 , Z 31 ), (Z 23 , Z 32 ) and (Z 1 , Z 2 , Z 3 ) are mutually independent and each of them is comonotonic.
More generally, for given k 2 {2, : : : , d}, there are d k Á possible k-variate comonotonic shocks. By letting k vary from 2 to d, one then has a collection of 2 d d 1 comonotonic shock vectors of varying lengths. The use of comonotonic rather than common shocks as building blocks in the model has the advantage that the pairwise correlations can attain the upper bound given by Griffiths et al. (1979). Indeed, it can be shown that a pair of comonotonic Poisson random variables will have correlation max as given in (2).
Although the proposed multiple comonotonic shock construction allows for considerable flexibility in the inherent dependence structure, its complexity is rather daunting, as was the case in the common shock setting. In this paper, therefore, we limit ourselves to a thorough examination of the simpler model based on a single comonotonic shock vector. Specifically, let (Y 1 , : : : , Y d ) be a vector of mutually independent Poisson random variables, which is independent of the vector (Z 1 , : : : , Z d ) of comonotonic random variables with Poisson margins. A vector (X 1 , : : : , X d ) of correlated Poisson random variables can then be generated by setting We will refer to this model as the comonotonic shock multivariate Poisson model.

Plan of the Paper
The purpose of this paper is to explore the properties of the comonotonic shock multivariate Poisson model (3), which generalises to arbitrary dimension the construction first considered for d D 2 by Genest et al. (2018). The basic properties of the model, which are derived in Section 2, are relatively straightforward extensions of the results therein. However, estimation in higher dimension is considerably more complex than in the bivariate case. This is the topic of Section 3, which considers various estimation strategies and compares their performance through several simulations reported in Section 4. The merits of the proposed model are highlighted through a data illustration presented in Section 5. Finally, concluding remarks and future work are outlined in Section 6.

The Proposed Model
The multivariate Poisson model described in the following definition is based on the notion of a comonotonic Poisson random vector. Recall that a vector (Z 1 , : : : , Z d ) is said to be comonotonic if it can be written in the form .Z 1 ; : : : ; Z d / D .F 1 1 .U /; : : : ; F 1 d .U //, where U U.0; 1/ is a standard uniform random variable, and for all j 2 {1, : : : , d} and u 2 (0, 1), In what follows, P. / denotes a univariate Poisson distribution with rate . In this representation, the components of the parameter vector ƒ D . 1 ; : : : ; d / 2 .0; 1/ d are the marginal Poisson means, and the parameter Â 2 [0, 1] represents the portion of the overall mean which is due to the comonotonic shock, that is, Â D E.Z 1 C : : : C Z d /=E.X 1 C : : : C X d /: (4) It is also easily checked that Â D E.Z j /=E.X j / for all j 2 {1, : : : , d}.

Connections with Previous Proposals
When Â D 0 in the MP d .ƒ; Â / model, the margins are independent while Â D 1 results in perfectly positively dependent components, that is, the variables X 1 , : : : , X d are comonotonic. When the marginal Poisson rates are equal, that is, 1 D : : : D d D , the comonotonic shock given in Definition 1 reduces to the common shock defined in (1), given that Z 1 D : : : D Z d D Z with Z P.Â /. In dimension d D 2, the MP d .ƒ; Â / model amounts to the bivariate comonotonic shock Poisson model of Genest et al. (2018), denoted by BP.ƒ; Â /. Moreover, it is clear from the formulation given in (3) that any subset (X j , X k ) of .X 1 ; : : : ; X d / MP d .ƒ; Â / follows the BP. j ; k ; Â/ distribution.

Interpretation of Â as a Dependence Parameter
The following lemma, which is proved in Appendix A, states that the MP d .ƒ; Â / family of distributions is ordered by Â in the positive quadrant dependence (PQD) ordering introduced by Joe (1990). In what follows, F ƒ, Â denotes the cumulative distribution function of a random vector X D .X 1 ; : : : ; X d / MP d .ƒ; Â /, and N F ƒ;Â denotes the corresponding survival function.
In view of this result, the parameter Â regulates the strength of the dependence in the model. Indeed, it was shown by Joe (1990) that multivariate versions of standard measures of dependence such as Kendall's tau or Spearman's rho are monotone with respect to the PQD order. Furthermore, Theorem 9.A.5 of Shaked & Shanthikumar (2007) states that the PQD order is closed under marginalisation. Therefore, if X MP d .ƒ; Â / and X 0 MP d .ƒ; Â 0 /, where Â , Â 0 2 (0, 1) with Â < Â 0 , then for any distinct j, k 2 {1, : : : , d}, one has .X j ; X k / PQD .X 0 j ; X 0 k / and thus cov.X j ; X k / Ä cov.X 0 j ; X 0 k / and corr.X j ; X k / Ä corr.X 0 j ; X 0 k /: Accordingly, for fixed marginal parameters ƒ 2 (0, 1) d , the pairwise correlation between any pair (X j , X k ) from X MP d .ƒ; Â /, denoted as j ; k .Â /, is an increasing function of Â . This result also follows immediately from the fact that .X j ; X k / BP.ƒ; Â / and .X 0 j ; X 0 k / BP.ƒ; Â 0 /, as the PQD ordering in the BP family was established by Genest et al. (2018).

Correlation
It was already shown by Genest et al. (2018) and hence corr.
For X MP d .ƒ; Â /, it is clear that Â D 0 results in j ; k .Â / D 0 for each pair (X j , X k ) from X with distinct j, k 2 {1, : : : , d}. At the other end of the spectrum, it is easily verified that Â D 1 yields j ; k .Â / D max for all distinct j, k 2 {1, : : : , d}, that is, all pairwise correlations reach the upper bound given in (2). In this case, X is a comonotonic random vector, and its distribution reaches the upper Fréchet-Hoeffding bound. Thus, the proposed multivariate comonotonic shock Poisson model yields pairwise correlations spanning the full range OE0; max , as regulated by the dependence parameter Â 2 [0, 1]. By construction, the proposed multivariate Poisson comonotonic shock model allows for greater flexibility in the covariance structure than its classical analogue. Indeed, the proposed model results in pairwise covariances given by m j ; k .Â /, for any distinct j, k 2 {1, : : : , d}. Thus, although there is a single dependence parameter Â that determines the strength of the dependence, the magnitude of its effect is regulated by the marginal means j and k . In contrast, the classical model restricts all pairwise covariances to be identical.

Multivariate Distribution and Probability Mass Function
As detailed in Schulz (2018), the comonotonic random vector (Z 1 , : : : , Z d ) introduced in Definition 1 has probability mass function given, for all´1; : : : ;´d 2 N, by c ƒ;Â .´1; : : : ;´d / D Ä min j 2f1; ::: ;d g where OEx C D x 1.x > 0/. Using this fact, one can show readily that the cumulative distribution function of the MP d .ƒ; Â / model is given, for all x 1 ; : : : ; x d 2 N, by F ƒ;Â .x 1 ; : : : : : : : G .1 Â/ j .x j ´j /c ƒ;Â .´1; : : : ;´d /; where g denotes the univariate Poisson probability mass function given, for all x 2 N, by g .x/ D e x =xŠ. The corresponding probability mass function is given, for all x 1 ; : : : ; x d 2 N, by f ƒ;Â .x 1 ; : : : ; x d / D x 1 X 1 D0 : : : g .1 Â/ j .x j ´j /c ƒ;Â .´1; : : : ;´d /: (5) This result will be used in the estimation procedures detailed in Section 3. In comparison, the classical multivariate Poisson model in (1) generates correlated margins by setting .X 1 ; : : : ; X d / D .Y 1 C Z; : : : ; Y d C Z/ for mutually independent Poisson random variables Y 1 , : : : , Y d , Z with respective marginal rates 1 , : : : , d , . In this formulation, probabilities can be computed, for any x 1 ; : : : ; x d 2 N, according to the formula h ƒ; .x 1 ; : : : ; x d / D min.x 1 ; : As previously noted, when the marginal rates coincide, the comonotonic multivariate Poisson model reduces to the classical common shock model and the probability mass functions (5) and (6) will coincide. When the degree of dependence is low (i.e., Â and are near zero), there is very little difference between the classical and comonotonic models given that in both constructions, the vector (X 1 , : : : , X d ) is predominantly generated from the mutually independent components Y 1 , : : : , Y d . For strong degrees of dependence, however, the difference between the two formulations becomes more pronounced. When Â is large, in particular such that m j ; k .Â / > min. 1 ; : : : ; d / for any distinct j, k 2 {1, : : : , d}, the classical multivariate Poisson model is in fact completely inappropriate as the covariance must necessarily be smaller than the marginal parameters.
In order to readily contrast the classical common shock construction of (1) with the proposed comonotonic shock approach of (3), consider the bivariate setting. Comparisons can be made between the BP.ƒ; Â / model and an analogue form in the classical setup by choosing the same marginal rates ƒ D . 1 ; 2 / and covariance parameter D m 1 ; 2 .Â /. Figure 1 illustrates the difference between the probability mass functions (5) and (6) with ( 1 , 2 ) fixed at (1, 2) and Â 2 {0.10, 0.75}, or, correspondingly, 2 {0.11, 0.97}. Note that for any Â > 0.77 in the BP model with ƒ D .1; 2/, there is no equivalent form in the classical model as this results in a covariance m 1, 2 (Â ) > 1. As evidenced by the plots, the difference between the two models becomes more obvious as the degree of dependence increases.

Estimation
Let X 1 , : : : , X n be a random sample from the proposed MP d .ƒ; Â / model. There are several estimation strategies that will yield consistent estimators of the model parameters 1 , : : : , d and Â , provided that they are all strictly positive and that Â 2 (0, 1). In particular, this section will detail three techniques: the method of moments (MM), maximum likelihood (ML) estimation (MLE), and the inference function for margins (IFM) approach.

Method of Moments
The MM approach establishes parameter estimates by matching theoretical moments to sample moments. In the MP d model, the marginal parameters are simply the marginal means, and thus, the MM estimator Q ƒ D . Q 1 ; : : : ; Q d / is given by the vector . N X 1 ; : : : ; N X d / of sample means. The central limit theorem ensures that for each j 2 {1, : : : , d}, the estimator Q j is consistent and asymptotically Gaussian, that is, as n ! 1, p where denotes convergence in distribution. Note that the MM estimators are also the ML estimators in the independence model where Â D 0.
Estimation of the dependence parameter Â can be based on mixed moments. Suppose X MP d .ƒ; Â /, then for any distinct j, k 2 {1, : : : , d}, the pair (X j , X k ) has covariance m j ; k .Â /. The PQD ordering in the proposed MP d family ensures that m j ; k .Â / is a strictly increasing function of Â . For arbitrary j, k 2 {1, : : : , d}, let S jk denote the sample covariance of (X j , X k ), namely For any j, k 2 {1, : : In small samples, S jk may occasionally fall beyond the permissible bounds; in such instances, one could set Q Â jk D 0 or 1 as the case may be. However, if S jk is significantly smaller than 0, one should consider an alternative model specification to account for this negative dependence.
As stated in, for example, Theorem 8 of Ferguson (1996), the sample covariance is asymptotically Gaussian with asymptotic variance Then, an application of the delta method leads to where jk D m j ; k .Â / and 0 jk denotes the first derivative of jk . It is clear that in dimensions d > 2, there are in fact d(d 1)/2 distinct possible estimators of the form Q Â jk for all distinct j, k 2 {1, : : : , d}. Each of these estimators is consistent and asymptotically Gaussian. A moment-based estimator Q Â can then be defined as the average of these estimators, namely As the dimension increases, the aforementioned estimation procedure may become rather long to carry out. As an alternative, one could consider averaging in (7) over a large subset of estimates Q Â jk . The following theorem establishes the asymptotic behaviour of the moment-based estimator Q Â . Its proof is given in Appendix B.
Proposition 2. Let X 1 , : : : , X n be a random sample with distribution MP d .ƒ; Â /. For each choice of j, k 2 {1, : : : , d} with j < k, let Q Â jk denote the unique solution to the equation Â be the average of the set of estimators Q Â jk with j, k 2 {1, : : : , d} and j < k. Then, as n ! 1, p where the asymptotic variance is given by In the aforementioned expression, ( † ) denotes the vectorisation of the inverse covariance function jk ( jk ) with d D d.d 1/=2 elements and 0 denotes the corresponding componentwise derivative, that is, In general, the asymptotic variance of the moment-based estimator Q Â does not result in a compact expression. Nevertheless, simplifications occur in some special cases. For each 2 {1, : : : , d}, write X` `DY`CZ`, whereY`andZ`represent centred variables, respectively, given byY`D Y` .1 Â/ `, andZ`D Z` Â `. One then has This expression can then be further decomposed by considering the overlaps between the indices j, k, r, s. Additional simplifications ensue when 1 D : : : D d D . For example, when j < k, r < s and the four indices are distinct, it can be shown that

Maximum Likelihood Estimation
Likelihood-based estimation is also possible in the proposed MP d family. Based on the formulation of the joint probability mass function given in (5), the log-likelihood can be expressed as`.

Inference Function for Margins
Although ML estimation is valid, its actual implementation may become numerically infeasible in high dimensions. The IFM approach is an appealing alternative likelihood-based estimation technique wherein marginal parameters are estimated by their respective marginal ML estimators, say L 1 ; : : : ; L d , and the dependence parameter estimator is determined by L Â D arg max Â`. L 1 ; : : : ; L d ; Â/. This method is often used, e.g., in inference for copula models, where there is a natural separation between the marginal and dependence parameters; refer to Joe (2005Joe ( , 2014. In adapting the IFM approach to the MP d .ƒ; Â / model, first consider rewriting the probability mass function given in (5) as f ƒ;Â .x 1 ; : : : ; : : : ! ƒ;Â .zI x/c ƒ;Â .´1; : : : ;´d /; where ! ƒ, Â (z; x) can be regarded as a weight function defined by The expression given in (9) leads to an alternative formulation of the log-likelihood, namelỳ .ƒ; Â / D`1. 1 / C : : : C`d . d / C`D.ƒ; Â /. Here,`j( j ) is the marginal log-likelihood for the j-th component X j P. j / and the term`D(ƒ, Â ) encompasses the dependence between the margins and is given bỳ : : : The implementation of the IFM method results in a two-stage estimation procedure. In the first step, the marginal Poisson rates are simultaneously and independently estimated by their respective marginal ML estimates, that is, L j D N X j for all j 2 {1, : : : , d}. The second step involves solving @ @Â`.
L ƒ; Â / D @ @Â`D . L ƒ; Â / D 0: As outlined in, for example, Joe (2005), the estimators that result from the system given in (10) are consistent and asymptotically Gaussian. In particular, as n ! 1, p n. L ‰ ‰/ N.0; L V /, where the asymptotic variance is given by where M L s D cov fL s.XI ‰/g and D L s D E˚@ L s.XI ‰/=@‰ > « . Because the elements of the estimating equation (10) are composed of marginal and joint log-likelihoods, the asymptotic variance can be further simplified. Set q D d C 1 and, as previously introduced, let I denote the Fisher information matrix based on the full log-likelihood (ƒ, Â ). Using the convention that M jk denotes the (j, k)-th entry of a matrix M, the following simplifications, as shown in Schulz (2018), follow for the asymptotic variance matrix: I qj I qk m j ; k .Â /:

Comparison Between the Three Methods
Both ML estimation and the IFM method derive estimators according to a set of estimating equations, as given in (8) and (10), respectively. In comparing these two sets of estimating equations, it is clear that the difference between these two likelihood-based approaches is that the IFM method systematically ignores the information pertaining to the marginal parameters that is contained in the dependence portion of the log-likelihood`D(ƒ, Â ). As a result, there is a loss in efficiency in the IFM framework. However, when Â D 0, independence ensues and thus D .ƒ; 0/ D 0 so that the IFM estimation procedure coincides with a full ML approach. In the case of independence, as previously noted, the MM will also yield identical estimators as the ML approach.

Including Covariate Effects
In applications, it is often of interest to fit models that take into account covariate effects. In the univariate setting, this could be accomplished via generalised linear models (GLMs) wherein a transformation of the Poisson mean is expressed in terms of a linear predictor. This is typically performed by considering the log link function and setting ln. / D Tˇ, where T is a 1 p vector of covariates andˇis a p 1 vector of regression parameters.
Model (3) can be adapted to allow for the inclusion of explanatory variables. To this end, the parameters can be re-expressed in terms of covariates T, namely 1 D g 1 1 .Tˇ1/; : : : where for k 2 {1, : : : , d C 1}, g 1 k denotes the inverse link function for the k-th component. For univariate Poisson margins, an appropriate choice of link function is the natural logarithm. Hence, the exponential function is a suitable choice for g 1 1 ; : : : ; g 1 d . While it is natural to express the marginal means in terms of linear predictors, it is less clear whether the dependence parameter Â should depend on covariates and, moreover, what functional form the relation should assume, that is, the form of the function g d C 1 . As previously stated in (4), the dependence parameter can be viewed as the portion of the overall mean that is due to the comonotonic shock. Accordingly, the logit link function could be a suitable choice, wherein ln fÂ=.1 Â/g D T˛. where T again represents a 1 p vector of covariates and˛is a p 1 vector of regression parameters.
Both likelihood-based methods could be used to estimate the regression coefficients in the model that accounts for covariate effects. In considering the aforementioned link functions, the parameters can be expressed, for each j 2 {1, : : : , d}, as j D j .ˇj / D e Tˇj and Â D Â.˛/ D e T˛= .1 C e T˛/ . By design, the constraints 1 , : : : , d 2 (0, 1) and Â 2 (0, 1) will be satisfied. In a full ML approach, estimators are then given by .

Simulations
Simulations were conducted in order to assess the estimation methods outlined in Section 3. Several scenarios were considered wherein the values of the parameters, dimension, and sample size were varied. Specifically, 30 distinct scenarios were examined, each of which resulted from the combination of dimension d 2 {2, 3, 4}, dependence parameter Â 2 {0.10, 0.25, 0.50, 0.75, 0.90} and sample size n 2 {50, 500}. Throughout, the marginal parameters were held fixed at ƒ D .1; : : : ; d / as the primary interest was to assess the impact of dependence in the model. For each scenario, a random sample of size n was generated from the specified MP d .ƒ; Â / model according to Algorithm 1.
In each of the 30 cases, the three estimation methods discussed in Section 3 were implemented and replicated 500 times. Sample code for the data generation process and the implementation of the three estimation procedures is provided in the supporting information.
Recall that under the MM, estimation of the dependence parameter involves solving for the unique root of S jk D m N X j ; N X k .Â / for each j, k 2 {1, : : : , d} with j < k. This can readily be implemented in R using the uniroot function. This, however, is only feasible for values of S jk falling within the permissible range OE0; m N X j ; N X k .1/ implied by the MP d . Q ƒ; Â / family. Whenever S jk fell above the threshold m N X j ; N X k .1/, the convention was to set Q Â jk D 1. Whenever S jk < 0, however, the estimate Q Â jk was assigned a value of NA and was excluded from the average (7) yielding the MM estimate Q Â . In over 90% of iterations, there were no instances of negative pairwise correlations. Moreover, in only 1.5% of cases were all pairwise correlations negative, rendering estimation of the dependence parameter impossible. Such issues occurred mostly when .d; Â; n/ D .2;50; 0:10/. Both likelihood-based methods (i.e., ML and IFM) rely on maximising some form of a score function and were implemented using the optim function in R, where the MM estimates were used as starting values for the procedure. Note that whenever the MM estimate Q Â was found to be NA, the starting value was set to Â .0/ D 0:01, while when Q Â D 1 a starting value of Â .0/ D 0:99 was used instead. This was done to avoid issues at the boundary of the parameter space of Â . In order to simplify the optimisation procedures, the reparametrisation discussed in Section 3.2 was used. Recall that the latter involves the transformed parameters 1 D ln. 1 /; : : : ; d D ln. d /, Á D lnfÂ=.1 Â/g. In what follows, the reported results are transformed back to the original (ƒ, Â ) scale for convenience. Figures 2 and 3 display the dependence parameter estimates obtained from the various estimation methods for the specific set-ups (d, Â , n), where Â D 0:25 and 0.90, respectively. The IFM and ML methods have similar performance across all scenarios. The results suggest that the MM performs better for weaker levels of dependence, for all dimensions and sample sizes. In particular, the variability in the resulting MM estimates is similar to that under the IFM and ML methods for lower values of Â , while for stronger dependence levels, the variability in the MM estimates is noticeably greater than that for the two likelihood approaches.
The results also show a decrease in variability for the two likelihood-based approaches as the strength of the dependence increases. This phenomenon is in accordance with intuition: for larger values of Â , the random vector .X 1 ; : : : ; X d / MP d .ƒ; Â / will be predominately generated from the underlying comonotonic shock (Z 1 , : : : , Z d ), as in (3). This induces less uncertainty as the components of the comonotonic shock vector are generated from a common

Figure 2. Estimation results for Â from the method of moments (MM) (left), inference function for margins (IFM) method (middle) and maximum likelihood estimation (MLE) (right) in indicated scenario
uniform random variable. Moreover, the estimation results improve as n increases, as expected from standard theory.
Note that there were some instances where seemingly outlier estimates were obtained under the likelihood-based estimation methods. For example, this can be seen in the results for the scenarios with Â D 0:90. These outlier occurrences tended to happen when large starting values were used in the optimisation procedure. As initialising the algorithm at a value Â (0) close to the boundary of the parameter space can sometimes lead to poor estimates, caution is advised. Figure 4 displays the results for the marginal parameter estimates in dimension d D 3 for Â D 0:50. Recall that the MM and IFM methods both result in estimators given by the marginal sample mean, that is, Q j D L j D N X j for all j 2 {1, : : : , d}. Across all scenarios, the MM/IFM and ML results are very similar, suggesting that there is little information about the marginal parameters ƒ in the dependence portion of the log-likelihood`D(ƒ, Â ). This further reinforces the utility of the IFM approach as a likelihood-based estimation method. Figure 5 displays the run times in R for the three estimation procedures considered, respectively shown for each d 2 {2, 3, 4}. The MM is the fastest across all dimensions, followed by the IFM approach and lastly ML estimation. The execution time for the full ML estimation became increasingly burdensome as the dimension grew. The IFM approach, however, remained relatively quick to implement in higher dimensions. Notably, in dimension 4, the average run time of the ML method is more than six times longer than the average IFM execution time and almost 17 times than that of the MM. Conceivably, a full ML approach could eventually become infeasible as the dimension increases. This highlights the appeal of the IFM method as an alternative likelihood-based estimation method in high-dimensional settings.
An additional set of simulations was carried out to investigate the performance of the estimation methods in higher dimensions, specifically the case where d D 10. As the previous

the method of moments (MM)/inference function for margins (IFM) method (left) and maximum likelihood estimation (MLE) (right) in indicated scenario
simulations suggested that both likelihood-based approaches yielded similar results, we focused here on the IFM approach as well as the MM, with similar implementations are previously discussed. Throughout the simulations, the sample size was held fixed at n D 250, the marginal parameters were set to ƒ D .0:1; 0:2; 0:3; 0:4; 0:5; 0:6; 0:7; 0:8; 0:9; 1:0/ and Â was varied in {0.1, 0.5, 0.9}. Note that in order to speed up the simulations, the convergence tolerance levels were lowered in this second set of simulations to 10 4 . Each scenario was replicated 500 times, resulting in a total of 1500 iterations.
The results, which are summarised in Figure 6, are consistent with what was previously observed. In particular, for weaker levels of dependence there is seemingly little difference between the MM and IFM methods, while for larger values of Â the likelihood-based IFM approach shows considerably less variability. Note that estimation results for the marginal parameters ƒ are not shown as both the MM and IFM approaches yield identical estimates equal to the marginal sample means.
The implementation times (which are not shown) were substantially longer for IFM estimation in comparison with the MM in dimension 10. Indeed, the IFM estimation procedure took, on average, between 1.4 and 68.2 times longer than that of the MM. Not surprisingly, executing the estimation procedures for large d becomes increasingly burdensome. In particular, the MM involves computing the average of the solutions to d(d 1)/2 equations, while likelihood-based estimation involves evaluating a summation across d variables. The latter stems from the form of the probability mass function, as given in (5) or equivalently in (9).
Based on the simulations considered here, the implementation of the MM seemingly remains feasible even as the dimension becomes quite large. Nonetheless, a viable alternative for large d would be to consider taking the average of only a subset of the d(d 1)/2 estimators Q Â jk with j, k 2 {1, : : : , d} and j < k. As each Q Â jk is consistent and asymptotically Gaussian, an

Figure 6. Estimation results for Â from the method of moments (MM) (left) and inference functions for margins (IFM) method (right) in indicated scenario
average of any number of these estimators will also be consistent and asymptotically Gaussian. Note that a practical advantage of the MM estimation approach in higher dimensions is that in averaging over d(d 1)/2 estimators, there are less problematic occurrences wherein each S jk < 0, rendering estimation of Â impossible. Indeed, in this second set of simulations, there were no instances wherein all pairwise covariances were negative leading to Q Â D NA, as had been previously observed in the first set of simulations.
The implementation of the IFM method is more challenging for larger d. Note that choosing lower values for the marginal rates, as was performed here with ƒ D .0:1; 0:2; 0:3; 0:4; 0:5; 0:6; 0:7; 0:8; 0:9; 1:0/, rendered the computation of the IFM estimates more feasible. Indeed, larger values of ƒ quickly increase the number of summands involved in the dependence portion of the log-likelihood`D(ƒ, Â ) thereby increasing the computational difficulty. Despite the choice for ƒ used in this second set of simulations, there were some instances (specifically, 18 out of the 1500 iterations) in which the IFM estimation method failed because of size issues. Recall that the implementation of the IFM approach consists of a twostep procedure. As a first step, the marginal parameter estimates are obtained as the marginal ML estimates L 1 D N X 1 ; : : : ; L d D N X d . At the second step, the dependence parameter is estimated according to the score equation @`D. L ƒ; Â /=@Â D 0. Notice that the optimisation problem for IFM estimation consists only of univariate score equations. Nonetheless, the implementation of the IFM approach proved to be difficult in higher dimensions. Further explorations of more efficient implementation strategies are necessary for large d.

Data Illustration
To illustrate the usefulness of the proposed model, an application of the MP d class was carried out using open data provided by BIXI, Montréal's bicycle sharing system (https://montreal. bixi.com/en/open&hyphen;data). The raw data detail each bike rental occurrence and provide information on the exact date and time of departure and arrival, along with the corresponding departure and arrival stations, the total bike usage time as well as a binary variable indicating whether the user was a BIXI member.
We considered data from the 2017 season, extending from 15 April to 15 November, inclusively, and focused on bike usage information collected at three nearby stations located on Montréal's south shore town of Longueuil. In particular, we applied the trivariate model to the number of daily bike rentals taken from three specific departure stations, located at the intersection of St Charles and Charlotte streets (station 1), the intersection of St Charles and St Sylvestre streets (station 2) and at Collège Édouard-Montpetit (Station 3), denoted by the random vector (X 1 , X 2 , X 3 ). We used a subset of the data consisting only of BIXI members, resulting in a total of 215 observations. The stations exhibit moderate dependence, with sample correlations given by R 12 D 0:33, R 13 D 0:29 and R 23 D 0:37, where R jk D S jk =. N X j N X k / 1=2 for distinct j, k 2 {1, 2, 3}. The data showed evidence of Poisson margins, as supported by a standard 2 goodness-of-fit test (using the gofstat function in R), which yielded p-values of 0.92, 0.14 and 0.12 for Stations 1, 2 and 3, respectively.

A First Model
The three estimation methods discussed in Section 3 were applied to the BIXI data, namely the MM, the IFM approach and ML estimation. The results are summarised in Table 1. For comparisons, estimation results under the classical model (1) are also given; refer to Table 2. Recall that the latter model defines correlated Poisson margins in terms of a common shock. By letting Z denote the common shock variable, the classical model sets X 1 D Y 1 C Z, X 2 D Y 2 C Z and X 3 D Y 3 C Z, for mutually independent Poisson random variables Y 1 , Y 2 , Y 3 and Z with respective rates 1 , 2 , 3 and . In implementing the likelihoodbased estimation techniques, in both the common and the comonotonic shock models, the MM estimates were used as starting values for the optimisation procedures.
Tables 1 and 2 display the model parameter estimates and the implied pairwise correlations along with 99% bootstrap confidence intervals. The latter intervals are based on 999 bootstrap replications, as there was one iteration out of 1 000 in which the IFM optimisation failed to converge within the maximum number of iterations (set to 100 by default).  In model (3), the MM and IFM methods lead to marginal parameter estimates that are simply the marginal sample means. This is also the case for both estimation techniques (MM and ML) in the classical common shock model. Only ML estimation in the proposed MP d model leads to different estimators O ƒ. However, as shown in Table 1, this difference is small. This is consistent with what was observed in the simulations, further demonstrating that there is very little information concerning the marginal parameters in the dependence portion of the log-likelihood.
In both the classical and proposed models, the MM seemingly implies a stronger dependence level than the likelihood-based methods. Note that for both models, the bootstrap confidence intervals for the dependence parameter, respectively Â and , are wider for the MM than for ML estimation. In the MP 3 model, the IFM approach leads to the widest bootstrap confidence intervals for Â . This is the result of outlier values for L Â that are sometimes obtained in the optimisation procedure. This was also observed in the simulations.
Recall that the sample correlations were found to be R 12 D 0:33, R 13 D 0:29 and R 23 D 0:37. The results for the proposed MP 3 model (Table 1) show that each of these values falls within their respective interval obtained under the MM. In comparison, the bootstrap intervals obtained from the likelihood-based methods contain both R 12 and R 13 but fail to encompass R 23 . The classical model (Table 2) performs slightly worse: while the MM intervals contain each of the sample correlations, the intervals produced by the ML approach only contain R 12 . It is not surprising that the intervals based on the MM include all the pairwise sample correlations as this estimation technique is based on the sample covariances. The seemingly poor performance of the likelihood-based methods suggests that the underlying probabilistic model is not entirely satisfactory. It may thus be that a single dependence parameter is insufficient to regulate the strength of the dependence between all components of (X 1 , X 2 , X 3 ).
Interestingly, there is more evidence in favour of the proposed model if the sample correlation between variables X j and X k is computed using variance, rather than mean, estimates for the marginal Poisson rates. Specifically, suppose that for distinct j, k 2 {1, 2, 3}, the correlation between variables X j and X k is given by R jk D S jk =.S jj S kk / 1=2 , where S jj denotes the sample variance. One then finds R 12 D 0:26, R 13 D 0:25 and R 23 D 0:29. In the proposed MP 3 model, each of these sample correlations is contained within their respective bootstrap confidence intervals, whatever the estimation technique. This is not the case, however, in the classical model. While the MM intervals still contain the three pairwise sample correlations, the bootstrap intervals obtained from ML estimation fail to include R 23 .

Discussion
It is not surprising that the MP 3 model as well as the classical model (1) both present difficulties in quantifying the dependence structure, particularly the association between X 2 and X 3 . There are notable similarities in the implied pairwise correlations between stations 1 and 3 and between stations 2 and 3 in the proposed comonotonic shock model; refer to Table 1. Indeed, the estimates for 1 and 2 are comparable across all methods, thus leading to similar implied correlations 13 and 23 . Specifically, under the MM and IFM methods, the estimated marginal rates at stations 1 and 2 are respectively given by N X 1 D 1:39 and N X 2 D 1:53, which, by construction of the MP 3 family, leads to m N X 1 ; N X 3 .Â / m N X 2 ; N X 3 .Â / for any Â 2 (0, 1). A similar phenomenon is seen in the classical common shock model; refer to Table 2. This feature is a drawback of both the classical and proposed models. Indeed, whether a common or comonotonic shock is used in the construction, the multivariate reduction technique implies that if j D k , then j`D k`f or any`2 {1, : : : , d} n {j, k}. This follows from the fact that in both model formulations, there is a single dependence parameter simultaneously regulating the correlation structure between all components. However, the data suggest that there is a stronger association between X 2 and X 3 in comparison with that between X 1 and X 3 . The proposed comonotonic model, as well as the classical common shock model, is seemingly unable to capture this particular dependence structure. Nonetheless, the MM is able to better measure this dependence structure as this approach is based on the pairwise sample covariances.

Regression Model
In order to improve model fit, covariate effects were introduced into the proposed MP 3 model. As mentioned in Section 3.5, there are numerous ways in which this can be performed. Here, we considered the simple case wherein the marginal Poisson rates are expressed in terms of explanatory variables through GLMs and the dependence parameter is held fixed. In the context of the BIXI data application, two categorical variables were included in the marginal GLMs: one for the day of the week (taking on seven levels) and the other for the month (taking on eight levels).
For computational ease, estimation was carried out using the IFM approach. The implementation of the latter consisted of first fitting marginal Poisson GLMs (with log link functions) to the data at each individual station to yield daily level estimated means L ij D exp.t i Ľ j /, where t i is used to denote the explanatory variables corresponding to observation x i , for each i 2 {1, : : : , 215} and j 2 {1, 2, 3}. The next step in the estimation procedure then consisted of finding L Â , which maximised`. L ƒ; Â /. This amounts to solving @`D. L ƒ; Â /=@Â D 0. The fitted marginal GLMs showed similar patterns across stations. In particular, the estimated regression coefficients suggested that the months of July through September tended to have higher daily bike rental rates and that Thursdays were the busiest day of the week. An analysis of deviance was used to assess the marginal model fits. For each of the three respective stations, comparisons with the null model confirmed that the inclusion of the covariate effects via GLMs provided significantly better fits. Furthermore, comparisons with the full (saturated) model indicated that, with the exception of station 2, the fitted models were inadequate simplifications of the full model. Indeed, the GLMs were rather simplistic, and it is likely that important covariates are missing from the marginal models.
The regression-based IFM approach resulted in L Â D 0:18, with 99% bootstrap confidence interval given by (0.00, 0.35), based on 1 000 bootstrap replications. Recall that in the covariate-free model considered in Section 5.1, the dependence parameter estimate was given by L Â D 0:23 with a similar 99% bootstrap confidence interval of (0.00, 0.35). Adjusting for covariate effects in the model allows for varying daily level marginal Poisson rates for each site, which correspondingly allows for varying daily pairwise correlations i;j k D m i;j ; i;k .Â /= p i;j i;k , where i, j is the i-th daily rate at Station j. For the BIXI data application, the implied n D 215 daily pairwise correlations are summarised in Figure 7. The regression-based approach seems to imply slightly weaker levels of dependence across all stations. This suggests that perhaps the association between (X 1 , X 2 and X 3 ) can be partly explained by marginal covariate effects, thereby reducing the residual dependence.
The Akaike Information Criterion (AIC) can be used to compare the regression MP 3 model with the null MP 3 model considered in Section 5.1. In the regression-based MP 3 model discussed here, there are 14 coefficients for the marginal GLMs at each of the three stations, in addition to the dependence parameter, leading to a total of 43 model parameters. In comparison, the null model only involves a total of four parameters. Despite the added complexity induced by the covariate effects, the AIC for the regression model was found to be 2 083.208 in comparison to 2 180.517 for the null model, suggesting that the inclusion of covariate effects does indeed lead to a better model. In contrast, when the Bayesian Information Criterion (BIC) is used, the regression model yields a value of 2 228.145, while the null model results in a value of 2 194.000, indicating that the simpler model is more appropriate. The differing conclusions based on the AIC and BIC measures are not surprising, as BIC tends to penalise more severely for model complexity.
Note that while these measures are convenient for contrasting the regression based model with the covariate-free model, it is important to bear in mind that the IFM estimation method  does not optimise the full log-likelihood, but rather the marginal log-likelihoods and the modified dependence portion`D. L ƒ; Â /, as detailed in Section 3.3. However, as demonstrated through the simulations, there is seemingly very little difference between the IFM and MLE approaches. For further comparisons, Table 3 provides the log-likelihood, AIC and BIC values for both the regression and null models, respectively, according to the various estimation methods considered.

Conclusion
This paper introduces a multivariate Poisson model based on a comonotonic shock. In considering a multivariate reduction technique with a comonotonic rather than common shock vector, the proposed model is able to capture strong degrees of (positive) dependence that the classical model fails to attain. Indeed, the proposed MP d family allows for greater flexibility in the implied correlation structure while retaining an intuitive and interpretable stochastic representation akin to the classical setting. In particular, the proposed multivariate Poisson model was shown to be PQD ordered in terms of the dependence parameter Â and, moreover, can attain the upper Fréchet-Hoeffding bound. Several estimation techniques were discussed for the multivariate comonotonic shock model, each of which allows for consistent estimation of the model parameters. Through simulations as well as a real data application, each of the estimation methods was shown to perform well. While the proposed model provides an improvement to the classical common shock construction, the MP d family does not allow for a fully flexible representation, and indeed, the correlation structure suffers from certain inherent restrictions, as highlighted in the data application. First, the proposed model only allows for positive dependence, and there is no immediately obvious adaptation to the case of negatively correlated margins, except in dimension d D 2.
In the bivariate setting, the BP model can be adapted to allow for negative dependence by considering a counter-monotonic rather than comonotonic shock vector. In this setting, the shock vector has the form .Z 1 ; Z 2 / D .G 1 Â 1 .U /; G 1 Â 2 .1 U //; where U U.0; 1/, thus yielding negatively correlated Poisson pairs. This model, denoted by BP , was explored in Genest et al. (2018). It was shown there that the BP family is also PQD ordered in terms of Â , with pairwise correlations spanning from min in the case of perfect negative dependence to 0 in the case of independence. The difficulty in extending the BP model to higher dimensions stems from the fact that there is no convenient notion of countermonotonicity in dimension d > 2. Although this problem may be circumvented by considering multiple comonotonic and counter-monotonic shocks, similar to what was outlined in Section 1, the benefit of the added flexibility will likely be outweighed by the resulting model complexities and difficulties in estimation.
Another shortcoming of the proposed comonotonic shock model is that the pairwise correlations are regulated by a single dependence parameter, thus imposing certain restrictions on the covariance structure. For example, suppose that the margins X j and X k have a common marginal parameter j D k D , for some distinct j, k 2 {1, : : : , d}. Then for any`2 {1, : : : , d}, it follows that cov.X j ; X`/ D cov.X k ; X`/ as both covariances are equal to m ; `. Â /. This limitation of the model was highlighted in the data illustration. In the extremal cases, if there is a single pair (X j , X k ) within the vector X MP d .ƒ; Â / exhibiting independence, then Â must necessarily be equal to 0, and the vector X must be composed of mutually independent components. At the other end of the spectrum, if the pair (X j , X k ) exhibits perfect positive dependence, then Â D 1 and the random vector X must be comonotonic, that is, its distribution attains the upper Fréchet-Hoeffding bound. Clearly, there is a need to further develop the comonotonic shock construction to allow for more flexibility in the implied correlation structure. However, when d D 2 the resulting BP model allows for a fully flexible (positive) dependence structure as it is based on a single pair to begin with.
Despite some shortcomings, the MP d model is a significant improvement to the classical common shock representation and provides a novel modelling framework for multivariate count data. The intuitive form of the comonotonic shock construction allows for a straightforward understanding of the implied correlation structure with an interpretable parametrisation. The MP d model will certainly prove to be a helpful tool for characterising dependence patterns exhibited by multivariate count data.
By construction, both the random vectors S C Z and Z 0 have Poisson distributed margins with respective rates Â 0 1 , : : : , Â 0 d . Given that Z 0 is comonotonic, it follows immediately from the Fréchet-Hoeffding bound inequalities that S C Z PQD Z 0 . Theorem 9.A.4 of Shaked & Shanthikumar (2007) ensures that PQD ordering is closed under convolutions and thus T C S C Z PQD Y 0 C Z 0 . This concludes the proof, because X D T C S C Z and X 0 D Y 0 C Z 0 .