A New Bivariate INAR(1) Model with Time-Dependent Innovation Vectors

: Recently, there has been a growing interest in integer-valued time series models, especially in multivariate models. Motivated by the diversity of the inﬁnite-patch metapopulation models, we propose an extension to the popular bivariate INAR(1) model, whose innovation vector is assumed to be time-dependent in the sense that the mean of the innovation vector is linearly increased by the previous population size. We discuss the stationarity and ergodicity of the observed process and its subprocesses. We consider the conditional maximum likelihood estimate of the parameters of interest, and establish their large-sample properties. The ﬁnite sample performance of the estimator is assessed via simulations. Applications on crime data illustrate the model.


Introduction
Bivariate count data occur in many contexts, often as the counts of two events, objects or individuals during a certain period of time. For example, such counts occur in epidemiology when two kinds of related diseases are examined, in criminology when two kinds of crimes are committed, in business when the volume of sales of two correlated products are observed or in manufacturing when two similar products are produced.
In real application, the observed time series data are often discrete, over-dispersed (the empirical variance is greater than the empirical mean) and have other features such as time dependence. Many univariate models have been proposed to deal with integer-valued time series data based on the univariate binomial thinning operator " • ", which is proposed by Steutel and van Harn [1]: where X is a non-negative integer-valued random variable and P(W i = 1) = 1 − P(W i = 0) = α. The INAR(1) model [2], The BAR(1) model [3], the INAR(p) model [4], The PDINAR(1) model [5] and the BARIO model [6] are very popular in analyzing non-negative integer-valued time series; see Weiß [7], Scotto et al. [8] and Davis et al. [9] for recent reviews on this topic. Motivated by infinite-patch metapopulation models discussed in Buckley and Pollett [10], Weiß [11] proposed an extension to the popular Poisson INAR(1) model, which is characterized by time-dependent innovations, i.e., the mean of the innovation is linearly increased by the previous population size. An important advantage of this model is that it gives a reasonable interpretation for immigration, which becomes more attractive if the current population is large; see Weiß [11] for an application to iceberg order data. Univariate models are extensively investigated in the literature, but relatively few multivariate models, especially for bivariate versions, have been studied in detail. Franke and Rao [12] proposed a multivariate INAR(1) model, which is generalized to the p-order case by Latour [13]. Pedeli and Karlis [14] discussed a tractable bivariate INAR(1) model, which can be used to deal with bivariate count data with equi-dispersion and over-dispersion, but with small flexibility. See Pedeli and Karlis [15] for the estimation of the BINAR model and Pedeli and Karlis [16] for a further discussion of the properties of the multivariate INAR(1) model. Based on hierarchical dynamic models, Ravishanker et al. [17] described a Bayesian framework for estimation and prediction for multivariate times series of counts. Popović [18] proposed a bivariate INAR(1) model with random coefficients based on different binomial thinning operators. The above models assumed the innovations of their marginal models are independent and identically distributed. Based on a finite range of counts, Scotto et al. [19] considered the density-dependent bivariate binomial autoregressive models by using their state-dependent thinning concept. Li et al. [20] proposed a bivariate random coefficient INAR(1) model with asymmetric Hermite innovations. Inspired by Weiß [11], we aim at providing a bivariate INAR model to analyze bivariate time series with time dependence and cross-correlation. The first contribution is that this paper gives an available method to capture the time-dependence trend by imposing the past information in the distribution of the innovational vector, which in turn makes the crosscorrelation between the two entries into an innovation vector. The second contribution is that the new model not only allows autocorrelation but also allows cross-correlation. The third contribution is that this paper illustrates the stationarity and ergodicity of the extended bivariate INAR process and its two subprocesses, which is important to derive the consistence and asymptotic normality of the CML estimation.
The remainder of the paper is organized as follows. In Section 2, we first give brief reviews of the bivariate Poisson distribution and the bivariate binomial thinning operator, based on which we give the definition of the new bivariate INAR(1) model. Conditional maximum likelihood (CML) estimates and the asymptotic properties of unknown parameters are discussed in Section 3. A simulation and two real data examples that show the effectiveness of the new model are given in Sections 4 and 5, respectively. Conclusions are made in Section 6.
From Kocherlakota and Kocherlakota [21], we obtain the fact that if (X, Y) follows BP(λ 1 , λ 2 , φ), there must exist three mutually independent random variables Z 1 , Z 2 , Z 3 such that X = Z 1 + Z 3 and Y = Z 2 + Z 3 , where Z 1 , Z 2 and Z 3 follow Poisson(λ 1 − φ), Poisson(λ 2 − φ) and Poisson(φ), respectively. Then, we have the conclusion that Cov(X, Y) = φ. In addition, P(X = x, Y = y), given in (2), is continuous and differentiable. For convenience, we denote f (x, y, λ 1 , λ 2 , φ) = P(X = x, Y = y). By using Lemma A3 in Li et al. [20], we obtain that Applying the univariate binomial thinning operator " • " given in (1) to the bivariate case with X = (X 1 , X 2 ) leads to the bivariate binomial thinning operator: where α ij ∈ (0, 1), i, j = 1, 2, X 1 and X 2 are non-negative integer-valued random variables, and all the thinnings are performed independent of each other. By calculation, E(A • X) = AE(X). Denoting V as the 2 × 2 variance matrix of the Bernoulli random variables In the following, we give the definition of the new bivariate INAR(1) model, which not only includes the property of the models defined by Pedeli and Karlis [14,16] but also allows the innovation vectors { t } to be time-dependent.
For simplicity, we denote the new model as the EBINAR(1) model. It is easy to see that the ith equation of model (6) is presented by: Notice that the model given by (7) is similar to the one discussed in Weiß [11], the main difference is that X it involves two paralleled survivors X 1,t−1 and X 2,t−1 . It is known that the EBINAR(1) process {X t , t ∈ Z} has two parts: the first part consists of the survivors of the elements of the system at the preceding time t − 1, denoted by X t−1 ; the other part is comprised by the time-dependent innovation vector t , which implies that the mean of the innovation vector is linearly increased by the previous population size.

Remark 1.
(1). If both A and B are diagonal matrices, the component equation given in (7) becomes to the one discussed by Weiß [11].
(2). If A is diagonal and B = 0, model (6) becomes the one discussed in Pedeli and Karlis [14], but it is worth mentioning that the autoregression matrix in Pedeli and Karlis [14] is diagonal, which means that it causes no cross-correlation in the counts.
(3). If A is non-diagonal and B = 0, model (6) becomes the one discussed in Pedeli and Karlis [16], which accounts for cross-correlation in the counts, but they still keep the innovations of their marginal models independent and identically distributed such that the time dependence can not to be captured.
To derive the pmf the EBINAR(1) process, we first denote h(k, m 1 , m 2 , α 1 , α 2 ) := P(X + Y = k) is the convolution of X + Y, ∀k ≥ 0 with X ∼ Bin(m 1 , α 1 ) and Y ∼ Bin(m 2 , α 2 ). By calculation, we obtain that Furthermore, by using Lemma A3 in Li et al. [20], Second, we denote ς = (ς 1 , ς 2 ) , ϑ = (ϑ 1 , ϑ 2 ) , k = (k 1 , k 2 ) and let x = ς 1 − k 1 and y = ς 2 − k 2 . Then, the conditional probability distribution of the EBINAR(1) process takes the following form: where If the largest eigenvalue of non-negative matrix A is less than 1, then the bivariate marginal distribution of model (6) can be expressed in terms of the bivariate innovation vectors: where A 0 is an identity matrix, and X 0 is the initial value of the process.
In what follows, we first discuss the stationarity and ergodicity of processes (6) and (7), respectively. Second, we obtain the first two-moment of {X t } and { t }, respectively. Third, we give a necessary and sufficient condition for the existence of E(X 1t ) k and E(X 2t ) k for any fixed positive integer k. These properties are necessary to derive the asymptotic properties of the estimators.
If the largest eigenvalue of Γ is less than 1, there exists a strictly stationary and ergodic process satisfying (7).
To prove the stationaity of the EBINAR(1) process, we first introduce a sequence of where the largest eigenvalues of the non-negative matrices A, B and Γ := A + B are less than 1, all of the non-negative matrices A, B and Theorem 2. If the conditions of Theorem 1 hold, there exists a strictly stationary process satisfying (6). and Because the joint distributions of the variables involved in the right-hand sides of (14) and in the left-hand side of the two equations are identically distributed. Hence, the process {X (n) t } is strictly stationary. Therefore, {X t } is a strictly stationary process.
According to Theorem 2, there exists an M > 0 such that where P(ϑ, ς) denotes the transition probability from state ϑ to state ς.
Denote n-state transition probability from state ς to state ϑ with P n ςϑ . For a given X t−1 , the conditional probability function of the random vector X t is derived by: then P 1 τυ > 0 for all τ, υ ∈ N 2 0 . According to (12), every state can be reached from any other state with positive probability in a finite number of steps, analogously. Hence, {X t } is irreducible. By (12), k steps of conditional probability distribution P k 0,0 are obtained with: .
Note that the first multiplier (V) is positive, which can be obtained by a similar method to (11). Denoting A j = (p ij ) i,j=1,2 , then we have: thus, the second part, (VI), is also positive. Therefore, P k 0,0 > 0, with probability one, i.e., {X t }, is aperiodic. Hence, {X t } is an ergodic Markov chain.
where Ψ involves the first moments of X (n) t and R t . Hence, the first two moments of X Proof. (1) and (2) are easy to prove by the moment property of A•, and we omit them. Here, we only give the proof of (3): ). Let λ, λ 1 and λ 2 be the largest eigenvalues of AA + 2AB , A and B, respectively. If A and B are diagonal matrices, Theorem 5. If the conditions of Theorem 1 hold, the first two moments and covariance matrices of { t } exist and: Proof. (1) is easy to prove by the distribution of t . We only need to prove (2) and (3). (2).
by the definition of t . E( t ) can be obtained directly by (6).

Theorem 6.
For any fixed positive integer k, it is a necessary and sufficient condition that Proof. For convenience, let A and B be diagonal matrices.

Parameter Estimation
In this section, we consider the conditional maximum likelihood estimation for model (6). Let θ = α ij , b ij , c i , φ , i, j = 1, 2. Suppose that X 0 , X 1 , · · · , X T are generated by the EBINAR(1) model with the true parameter value θ 0 .
To derive the identification of the EBINAR(1) model, we give the following two Lemmas.
Using the martingale central limit theorem and the Cramér-Wold device, it is direct to show that Third, there exists an H(X 1t , X 2t ) such that E[H(X 1t , X 2t )] < ∞ by Theorem 4. By the Taylor expansion, we have where θ T lies in betweenθ T and θ 0 . We observe that the ∂ (θ 0 ) ∂θ = 0 in (27) by (25), then Hence, the asymptotic normality ofθ T follows from (28).

Simulation
In this section, we conduct a simulation study to illustrate the finite sample property of the CML estimate. The simulation is carried out in R by using the optim function for the optimization of the conditional log-likelihood function.
In the simulation, we generate data from the non-diagonal EBINAR(1) model and the diagonal EBINAR(1) model. The sizes of samples are chosen to be 50, 100, 200, 500 and 1000 to reflect relatively small, small, moderate, large and relatively large sample sizes, and we use 500 replications. For the simulated sample, performances of the estimators are evaluated by mean squared error (MSE) and mean absolute deviation error (MADE), Tables 1-5 show that the MSE and MADE decrease with the increase in T for diagonal and non-diagonal models, which implies that the estimators are consistent.
To illustrate the location and dispersion of the estimates, we present the boxplots of the estimates for the non-diagonal and I of diagonal parameter combinations in Figures 1  and 2; the others are similar.

Illustrative Examples
In this section, we apply the proposed model to two crime datasets coming from different number of car beats, which is the unique ID for the observation unit's geography in Pittsburgh Police Department. The crime data is available online at "The Forecasting Principles" site (http://www.forecastingprinciples.com/index.php/crimedata) in the section about Crime data and download on 23 September 2016.
According to Cohen and Gorr [27], the occurrence of criminal mischief may be accompanied by burglary behavior, so does for the robbery. Hence, the monthly counts of burglary and CMIS (or those of burglary and robbery) may exhibit dependence. In this section, we take the monthly counts of burglary and CMIS in beat 11 and the monthly counts of burglary and robbery in beat 26 as examples.

Monthly Counts of Burglary and CMIS in Beat 11
In this part, we consider the monthly number of burglary and criminal mischief (CMIS) from January 1990 to December 2001 in the geographic ID = 11. Table 6 gives the statistics of the counts of burglaries and CMIS.  Table 6 shows that both the counts of burglaries and CMIS are over-dispersed because their variances are greater than their means. In contrast, this relationship can also be illustrated by the cross-correlation graph of the samples, which are given in Figure 3.  From Figure 3, the counts of burglaries are weakly dependent with those of CMIS. Their plots of sample path, autocorrelation function (ACF) and partial autocorrelation function (PACF) are given in Figure 4, which show that the analyzed data sets are bivariate integer-valued time series with some characteristics of mutual influence.
To give quantitative results about cross-correlation, we compare our model with the following models: • Full BINAR-BP with t following BP(λ 1 , λ 2 , φ) [16]; • Full BINAR-NB with t following bivariate negative binomial distribution with parameters (λ 1 , λ 2 , β); see [14,16] for detail.  As the goodness-of-fit criteria, we use the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the mean square error of the Pearson residuals (PRMS), which is equal to ∑ n t=1 Z 2 t /(n − p * ), where p * denotes the number of estimated parameters and Z t denotes standardized Pearson residuals.
The CML estimate and approximated standard error (SE) of the parameter, including the fitted values of PRMS, AIC, BIC and log-likelihood function (Log Lik), are summarized in Table 7, where the approximated standard error is computed by using the estimated version of the robust sandwich matrix (J(θ 0 )) −1 I(θ 0 )(J(θ 0 )) −1 ; see Theorem 8 for details.  Table 7 shows that the EBINAR(1) model takes the highest Log Lik value and the lowest AIC, BIC and PRMS for the monthly number of burglaries and CMIS. Hence, the EBINAR(1) model is more suitable for the data sets.

Monthly Counts of Burglaries and Robberies in Beat 26
In this part, we consider the monthly number of burglaries and robberies from January 1990 to December 2001 in the geographic ID = 26; see Table 8 for some of their statistics.  Table 8 shows the monthly number of burglary and robbery are over-dispersed. In contrast, this relationship can also be illustrated by their cross-correlation graph given in Figure 5, which shows that the counts of burglary are significantly dependent on those of robbery. To further illustrate the the monthly number of burglaries and robberies in beat 26, we present their sample path, ACF and PACF plots in Figure 6, from which we can conclude that the analyzed data set exhibits some characteristics of mutual influence. To give quantitative result about cross-correlation, we compare our model with the Full BINAR-BP and Full BINAR-NB models. The CML estimate and SE, including the fitted PRMS, AIC, BIC and Log Lik, are summarized in Table 9. Table 9. Estimates for the monthly number of burglaries and robberies in beat 26.  Table 9 shows that the EBINAR(1) model takes the highest Log Lik value and the lowest AIC, BIC and PRMS for burglaries and robberies in beat 26. Hence, EBINAR(1) model is more suitable.

EBINAR(1) Full BINAR(1)-NB
To sum up, our findings reveal that there are some connections for the burglary and CMIS in beat 11 and those for the burglary and robbery in beat 26, which agrees with the conclusion of Cohen and Gorr [27]. Of course, the counts of burglary may be affected by other crime activities, such as simple assault, vagrancy and trespassing, which will be studied in a further study.

Remark 2.
For the two real datasets, our EBINAR(1) model performs best, but it is not clear enough regarding predicting unseen data. To further illustrate the better performance of the new model in prediction, one available solution is to conduct a further experiment when dividing the considered data into a training set and test set. In addition such experiment will be considered in future study of the crime data.

Concluding Remarks
This paper proposes a more flexible model for bivariate integer-valued time series data, i.e., the EBINAR(1) model, whose innovation vector is time-dependent. It is a generalization of the EINAR(1) model [11] to the two-dimensional case as well as a generalization of the BINAR(1) model [14,16], but with more flexibility. We discuss some necessary properties of the model, the CML estimators of parameters and their large-sample properties. Simulation was conducted to examine the finite sample performance of estimators. Real data examples are provided to illustrate our model to be effective relative to existing models.
To make the bivariate INAR-type models more flexible with respect to real-data applications in some cases, it may be interesting to include explanatory covariates or periodicity in the model to account for dependence through thinning operations on several factors, which will be considered in another project: see Aknouche et al. [28] and Chen and Khamthong [29].