Estimation of the variance of the quasi-maximum likelihood estimator of weak VARMA models

: This paper considers the problems of computing and estimating the asymptotic variance matrix of the least squares (LS) and/or the quasi-maximum likelihood (QML) estimators of vector autoregressive moving-average (VARMA) models under the assumption that the errors are uncorrelated but not necessarily independent. We first give expressions for the derivatives of the VARMA residuals in terms of the parameters of the models. Secondly, we give an explicit expression of the asymptotic variance matrix of the QML/LS estimator, in terms of the VAR and MA polynomials and of the second and fourth-order structure of the noise. We then deduce a consistent estimator of this asymptotic variance matrix. Modified versions of the Wald, Lagrange Multiplier and Likelihood Ratio tests are proposed for testing linear restrictions on the parameters. The theoretical results are illustrated by means of Monte Carlo experiments.


Introduction
The class of vector autoregressive moving-average (VARMA) models and the sub-class of vector autoregressive (VAR) models are used in time series analysis and econometrics to describe not only the properties of the individual time series but also the possible cross-relationships between the time series (see [36,41]). This paper is devoted to the problems of computing and estimating the asymptotic variance matrix of the least squares (LS) and/or the quasi-maximum likelihood (QML) estimators of VARMA models under the assumption that the errors are uncorrelated but not necessarily independent. These models are called weak VARMA in contrast to the standard VARMA models, also called strong VARMA models, in which the error terms are supposed to be independent and identically distributed (iid). This independence assumption is often considered too restrictive by practitioners. It precludes conditional heteroscedasticity and/or other forms of nonlinearity (see [25] for a review on weak univariate ARMA models).
A process (X t ) t∈Z is said to be nonlinear when the innovation process in the Wold decomposition (see e.g. [12], for the univariate case, and Reinsel [41] in the multivariate framework) is uncorrelated but not necessarily independent, and is said to be linear in the opposite case (i.e. when the innovation process in the Wold decomposition is iid). Relaxing the independence assumption considerably extends the range of applications of the VARMA models, and allows one to cover linear representations of general nonlinear processes. Indeed, such nonlinearities may arise for instance when the error process follows an autoregressive conditional heteroscedasticity (ARCH) model, introduced by Engle [18] and extended to the generalized ARCH (GARCH) by [5], an all-pass model (see [3]) or other models displaying a second-order dependence (see [1]). Other situations where the errors are dependent can be found in [25]; see also [42]. Leading examples of multivariate linear processes are the VARMA and VAR models with iid error terms. Nonlinear models are becoming more and more employed because numerous real time series exhibit nonlinear dynamics, for instance conditional heteroscedasticity, which cannot be generated by autoregressive moving-average (ARMA) models with iid error terms. Work on asymptotic results usually focuses on univariate models (see [25] for a review of this topic).
For multivariate models, important advances have been obtained by Dufour and Pelletier [16], who study the asymptotic properties of a generalization of the regression-based estimation method proposed by Hannan and Rissanen [29] under weak assumptions on the innovation process; Francq and Raïssi [20], who study portmanteau tests for weak VAR models; Boubacar Maïnassara and Francq [9], who study the estimation of weak VARMA models; Boubacar Maïnassara [6,7], who studies portmanteau tests and the problem of order selection for weak VARMA models; and Katayama [33], who also proposes a new portmanteau test statistic for weak VARMA models. In [9], it is shown that the asymptotic variance matrix of the usual estimators has the "sandwich" form Ω := J^{-1} I J^{-1}, where the two Fisher information matrices J and I depend respectively on second and fourth-order moments of the errors and on the true parameter (denoted, hereafter, θ_0). This asymptotic variance reduces to the standard form Ω = 2J^{-1} in the linear case.
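As a small numerical illustration of the sandwich form, the sketch below builds Ω = J^{-1} I J^{-1} from placeholder matrices (not taken from the paper) and checks that it collapses to the standard form 2J^{-1} when I = 2J, i.e. in the linear case:

```python
import numpy as np

# Illustrative sketch of the "sandwich" asymptotic variance Omega = J^{-1} I J^{-1}.
# The matrices J0 and I0_linear below are arbitrary placeholders, not taken from the paper.
def sandwich_variance(J, I):
    """Return Omega = J^{-1} I J^{-1} for an invertible matrix J."""
    J_inv = np.linalg.inv(J)
    return J_inv @ I @ J_inv

J0 = np.array([[2.0, 0.3], [0.3, 1.5]])
I0_linear = 2.0 * J0              # in the linear (strong) case I = 2J
Omega = sandwich_variance(J0, I0_linear)

# In the linear case the sandwich form collapses to the standard form 2 J^{-1}.
assert np.allclose(Omega, 2.0 * np.linalg.inv(J0))
```

In the general weak case, I differs from 2J and the two formulas no longer coincide, which is precisely why the modified tests of Section 7 are needed.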
In the framework of (Gaussian) linear processes, the problem of computing the Fisher information matrices and their inverses has been widely studied. Various expressions for these matrices have been given by Whittle [46,47], Siddiqui [43], Durbin [17] and Box and Jenkins [11]. McLeod [39], Klein and Mélard [34,35] and Godolphin and Bane [27] have given algorithms for their computation. For a few particular cases of weak ARMA models, the matrices I and J have been computed by Boubacar Maïnassara, Carbon and Francq [8], Francq and Zakoïan [24,26] and Francq, Roy and Zakoïan [22]. The main goal of the present paper is to complete the available results concerning the statistical analysis of weak VARMA models, by proposing another estimator of Ω which allows one to separate the effects due to the VARMA parameters from those due to the nonlinear structure of the noise.
The paper is organized as follows. Section 2 presents the models that we consider here, and recalls the results on the asymptotic distribution of the QML/LS estimator obtained by Boubacar Maïnassara and Francq [9]. In Section 3 we give expressions for the derivatives of the VARMA residuals in terms of the parameters of the models. Section 4 is devoted to finding an explicit expression of the asymptotic variance of the QML/LS estimator, in terms of the VAR and MA polynomials and of the second and fourth-order structure of the noise. In Section 5 we deduce a consistent estimator of this asymptotic variance. We describe, in Section 6, how to obtain numerical evaluations of the information matrices J and I up to some tolerance. In Section 7 it is shown how the standard Wald, LM (Lagrange Multiplier) and LR (Likelihood Ratio) tests must be adapted in the weak VARMA case in order to test general linear constraints. This section is also of interest in the univariate framework because, to our knowledge, these tests have not been studied for weak ARMA models. Numerical experiments are presented in Section 8. The proofs of the main results are collected in the appendix. We denote by A ⊗ B the Kronecker product of two matrices A and B, and by vec(A) the vector obtained by stacking the columns of A. The reader is referred to Magnus and Neudecker [37] for the properties of these operators. Let 0_r be the null vector of R^r, and let I_r be the r × r identity matrix.
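Since the vec operator and the Kronecker product are used throughout the paper, the following snippet verifies numerically the standard identity vec(ABC) = (C′ ⊗ A) vec(B) on arbitrary example matrices:

```python
import numpy as np

# Numerical check of the identity vec(ABC) = (C' ⊗ A) vec(B), used repeatedly
# in the paper; the matrices are arbitrary examples.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# ravel(order="F") stacks the columns of a matrix, matching the vec operator.
vec = lambda M: M.ravel(order="F")

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
assert np.allclose(lhs, rhs)
```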

Model and assumptions
Consider a d-dimensional stationary process (X t ) satisfying a structural VARMA(p, q) representation of the form

A_00 X_t − A_01 X_{t−1} − · · · − A_0p X_{t−p} = B_00 ǫ_t − B_01 ǫ_{t−1} − · · · − B_0q ǫ_{t−q},   (2.1)

where (ǫ_t) is a white noise, namely a stationary sequence of centered and uncorrelated random variables with a non-singular variance Σ_0. The structural forms are mainly used in econometrics to introduce instantaneous relationships between economic variables. Of course, constraints are necessary for the identifiability of these representations. Let [A_00 . . . A_0p B_00 . . . B_0q] be the d × (p + q + 2)d matrix of all the coefficients, without any constraint. The matrix Σ_0 is considered as a nuisance parameter. The parameter of interest, θ_0, belongs to the parameter space Θ ⊂ R^{k_0}, where k_0 is the number of unknown parameters, which is typically much smaller than (p + q + 2)d². The matrices A_00, . . . , A_0p, B_00, . . . , B_0q involved in (2.1) are specified by θ_0. More precisely, we write A_0i = A_i(θ_0) and B_0j = B_j(θ_0) for i = 0, . . . , p and j = 0, . . . , q, and Σ_0 = Σ(θ_0). We need the following assumptions, used by Boubacar Maïnassara and Francq [9], to ensure the consistency and the asymptotic normality of the quasi-maximum likelihood estimator (QMLE).
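As a concrete illustration, the following sketch simulates a bivariate VARMA(1,1) in reduced form, X_t = A1 X_{t−1} + e_t − B1 e_{t−1}; the coefficient matrices A1 and B1 are hypothetical examples chosen so that the stationarity and invertibility conditions hold (all eigenvalues inside the unit circle), not values from the paper:

```python
import numpy as np

# Minimal simulation sketch of a bivariate reduced VARMA(1,1):
#     X_t = A1 X_{t-1} + e_t - B1 e_{t-1}
# A1 and B1 are illustrative (hypothetical) coefficient matrices.
rng = np.random.default_rng(1)
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
B1 = np.array([[0.4, 0.0], [0.2, 0.2]])

n, d = 2000, 2
e = rng.standard_normal((n + 1, d))   # iid innovations as a stand-in for weak noise
X = np.zeros((n + 1, d))
for t in range(1, n + 1):
    X[t] = A1 @ X[t - 1] + e[t] - B1 @ e[t - 1]
X = X[1:]
assert X.shape == (2000, 2) and np.isfinite(X).all()
```

Replacing the iid draws by a dependent white noise (e.g. an ARCH process, as in Section 8) turns this into a weak VARMA simulation.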
For simplicity we now write A_i, B_j and Σ instead of A_i(θ), B_j(θ) and Σ(θ). Let A_θ(z) := A_0 − A_1 z − · · · − A_p z^p and B_θ(z) := B_0 − B_1 z − · · · − B_q z^q. A2: For all θ ∈ Θ, we have det A_θ(z) det B_θ(z) ≠ 0 for all |z| ≤ 1.
A5: For all θ ∈ Θ such that θ ≠ θ_0, the transfer functions differ. We now introduce, as in [23], the strong mixing coefficients of a stationary process Z = (Z_t), denoted by α_Z(h), measuring the temporal dependence of the process Z. We denote by ∥Z∥ the Euclidean norm of Z.
A8: The matrix M θ0 is of full rank k 0 .
Under Assumptions A1-A8, Boubacar Maïnassara and Francq [9] have shown the consistency (θ̂_n → θ_0 a.s. as n → ∞) and the asymptotic normality of the QMLE:

√n (θ̂_n − θ_0) →_d N(0, Ω := J^{-1} I J^{-1}),

where J = J(θ_0, Σ_e0) and I = I(θ_0, Σ_e0), with

J(θ, Σ_e) = lim_{n→∞} (2/n) ∂² log L̃_n(θ, Σ_e)/∂θ∂θ′ a.s.


Derivatives of the VARMA residuals

In the univariate case, expansions of the residuals e_t(θ) = ϕ_θ^{-1}(L) φ_θ(L) X_t and of their derivatives in terms of the ARMA parameters are available (see [38]). The subject of this section is to generalize these expansions to the multivariate ARMA case. The reduced VARMA representation can be rewritten in a compact form. We denote respectively by a := (a_1′, . . . , a_p′)′ and b := (b_1′, . . . , b_q′)′ the coefficients of the multivariate AR and MA parts. Thus we can rewrite the derivatives in terms of the matrices E_ij, where E_ij is the d × d matrix with 1 at position (i, j) and 0 elsewhere. We denote by A*_{ij,h} and B*_{ij,h} the (d × d) matrices defined below. The matrix λ_h(θ) is well defined because the coefficients of the series expansions of A_θ^{-1} and B_θ^{-1} decrease exponentially fast to zero. We are now able to state the following proposition, which is a generalization of a result given in [38]. Moreover, at θ = θ_0 we have a simpler expression, where the λ_i's are defined by (3.1).

Explicit expression of I and J
The subject of this section is to give expressions for the information matrices I and J involved in the asymptotic variance Ω of the QMLE. In these expressions, we isolate what is a function of the VARMA parameter θ 0 from what is a function of the distribution of the weak noise e t .
McLeod [38] gave a nice expression for J, for the univariate ARMA model, as the variance of a VAR model involving only the ARMA parameter θ_0 (see (8.8.3) in [12]). Francq, Roy and Zakoïan [22] obtained an expression of I involving the ARMA parameter θ_0 and the fourth-order moments of the weak noise (ǫ_t) (with their notations, J = Λ′_∞ Λ_∞ and I = Λ′_∞ Γ_{∞,∞} Λ_∞, where Λ_∞ depends on θ_0 and Γ_{∞,∞} depends on moments of (ǫ_t)). Recently, for certain statistical applications, Boubacar Maïnassara, Carbon and Francq [8] gave similar expressions for I(θ) and J(θ) when θ ≠ θ_0. Let us define the matrix M involving the second-order moments of (e_t). We are now able to state the following proposition, which provides a form for J = J(θ_0, Σ_e0) in which the terms depending on θ_0 (through the matrices λ_i) are distinguished from the terms depending on the second-order moments of (e_t) (through the matrix M) and from the terms depending on the noise variance of the multivariate innovation process (through the matrix Σ_e0).
where the λ i 's are defined by (3.1).
We now search for similar tractable expressions for I. In view of (2.3), we have (4.1). Note that the existence of the sum on the right-hand side of (4.1) is a consequence of A7 and of Davydov's inequality [13] (see e.g. Lemma 11 in [9]). The terms depending on the VARMA parameter are the matrices λ_i defined in (3.1); we also introduce matrices involving the fourth-order moments of the innovations e_t. The terms depending on the noise variance of the multivariate innovation process are in Σ_e0. We now state an analog of Proposition 4.1 for I = I(θ_0, Σ_e0).
where the λ i 's are defined by (3.1).
where γ(i, j) = Σ_{h=−∞}^{+∞} E(e_t e_{t−i} e_{t−h} e_{t−j−h}) and the λ_i′ ∈ R^{p+q} are defined by (3.1). Remark 4.2. Francq, Roy and Zakoïan [22] considered the univariate case d = 1. In their paper, they used the LS estimator and obtained an expression in which σ² is the variance of the univariate process e_t. In [9], it is shown that their result, using the vec operator and the elementary relation vec(aa′) = a ⊗ a, can be written as the expressions given in Remark 4.1.
The following example illustrates how the matrices I and J depend on θ_0 and on terms involving the distribution of the innovations ǫ_t. Example 1. Consider for instance a weak univariate ARMA(1, 1) model, whose noise has variance σ². Then, with our notations, we have the expressions below, where σ_0 is the true value of σ. We then deduce that, in the standard strong ARMA case, i.e. when A4 is replaced by the assumption that (ǫ_t) is iid, it is easily seen that γ(i, j) = [E(ǫ_t²)]² = σ_0⁴ when i = j and γ(i, j) = 0 when i ≠ j, so that I = 2J. In the general case we have I ≠ 2J.
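The claim that γ(i, j) = σ_0⁴ when i = j and 0 otherwise for iid errors can be checked by simulation; in the sketch below, the sample size and the truncation H of the sum over h are illustrative choices:

```python
import numpy as np

# Empirical illustration, for iid noise, that gamma(i, i) ≈ sigma^4 and
# gamma(i, j) ≈ 0 for i ≠ j, where
#     gamma(i, j) = sum_h E(e_t e_{t-i} e_{t-h} e_{t-j-h}).
rng = np.random.default_rng(2)
sigma = 1.0
e = sigma * rng.standard_normal(200_000)

def gamma_hat(e, i, j, H=5):
    """Truncated empirical counterpart of gamma(i, j); H is an illustrative cutoff."""
    n, total = len(e), 0.0
    for h in range(-H, H + 1):
        ks = [0, i, h, j + h]
        t = np.arange(max(ks), n + min(ks))   # indices t for which all four lags exist
        total += np.mean(e[t] * e[t - i] * e[t - h] * e[t - j - h])
    return total

assert abs(gamma_hat(e, 1, 1) - sigma**4) < 0.1   # gamma(1,1) ≈ sigma^4
assert abs(gamma_hat(e, 1, 2)) < 0.1              # gamma(1,2) ≈ 0
```

With a dependent (e.g. ARCH) noise in place of the iid draws, γ(i, i) generally differs from σ⁴, which is the source of the discrepancy I ≠ 2J.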

Estimating the asymptotic variance matrix
In Section 4, we obtained explicit expressions for I and J. We now turn to the estimation of these matrices. Let ê_t = ẽ_t(θ̂_n) be the QMLE residuals when p > 0 or q > 0, and let ê_t = e_t = X_t when p = q = 0. When p + q ≠ 0, we have ê_t = 0 for t ≤ 0 and t > n. Let Σ̂_e be an estimator of Σ_e0. The matrix M involved in the expression of J can easily be estimated by its empirical counterpart. In view of Proposition 4.1, we define an estimator Ĵ_n of J. We are now able to state the following theorem, which shows the strong consistency of Ĵ_n.
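A minimal sketch of the plug-in idea of this section is to estimate the innovation variance Σ_e0 by its empirical counterpart; here simulated iid draws stand in for the QMLE residuals, and the target covariance is an arbitrary example:

```python
import numpy as np

# Sketch of the empirical second-moment estimator Sigma_hat = n^{-1} sum_t e_t e_t',
# the building block for the plug-in estimators of this section.  The "residuals"
# are simulated iid draws standing in for the QMLE residuals.
rng = np.random.default_rng(3)
n, d = 100_000, 2
Sigma_e0 = np.array([[1.0, 0.3], [0.3, 2.0]])   # illustrative target covariance
L = np.linalg.cholesky(Sigma_e0)
e = rng.standard_normal((n, d)) @ L.T           # rows ~ N(0, Sigma_e0)

Sigma_hat = e.T @ e / n
assert np.allclose(Sigma_hat, Sigma_e0, atol=0.05)
```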
In the standard strong VARMA case, Ω̂ = 2Ĵ^{-1} is a strongly consistent estimator of Ω. In the general weak VARMA case this estimator is not consistent when I ≠ 2J, so we need a consistent estimator of I. The estimation of the long-run variance I is more complicated. In the literature, two types of estimators are generally employed: HAC estimators (see [2,40] for general references, and [26] for an application to testing strong linearity in weak ARMA models) and spectral density estimators (see [4,14] for general references; see also [9] for an application to a weak VARMA model). Consider a sequence of real numbers (b_n) and a weight function f : R → R which is bounded, with compact support [−a, a], and continuous at the origin with f(0) = 1. Here [x] denotes the integer part of x. In view of Proposition 4.2, we define an estimator Î_n of I. We are now able to state the following theorem, which shows the weak consistency of Î_n.
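The kernel (HAC-type) estimator of I can be sketched as follows on a scalar example; the triangular (Bartlett-like) weight and the AR(1) target below are illustrative choices, not necessarily the kernels of the paper's Table 1:

```python
import numpy as np

# Sketch of a kernel estimator of a long-run variance:
#     I_hat = sum_{h} f(h * b_n) * Gamma_hat(h),
# with a Bartlett-type weight f (f(0) = 1, compact support) and bandwidth
# b_n = 1/ln(n), as in the paper's experiments.  The scalar AR(1) process is
# an illustrative stand-in for the multivariate score process.
rng = np.random.default_rng(4)
n, phi = 100_000, 0.3
y = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]
yc = y - y.mean()

def gamma_hat(h):
    h = abs(h)
    return np.dot(yc[h:], yc[:n - h]) / n

b_n = 1.0 / np.log(n)
T = int(np.log(n))                        # the weight vanishes beyond |h| ≈ T
f = lambda x: max(0.0, 1.0 - abs(x))      # Bartlett-type weight, f(0) = 1
I_hat = sum(f(h * b_n) * gamma_hat(h) for h in range(-T, T + 1))

# Long-run variance of this AR(1): 1 / (1 - phi)^2 ≈ 2.04.
assert abs(I_hat - 1.0 / (1.0 - phi) ** 2) < 0.4
```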
Therefore, Theorems 5.1 and 5.2 show that Ω̂ = Ĵ^{-1} Î Ĵ^{-1} is a weakly consistent estimator of the asymptotic covariance matrix Ω = J^{-1} I J^{-1}, which allows one to separate the effects due to the VARMA parameters from those due to the nonlinear structure of the noise.

Approximation of the information matrices by finite sums
In practice the infinite sums involved in J and I are truncated. This section concentrates on the choice of the truncation parameter for J and I. The matrix J is approximated by the truncated matrix J_M defined below. The following proposition gives a value of M such that J_M is equal to J up to an arbitrarily small tolerance ε. Let ∥·∥ be the matrix norm defined by ∥A∥ = Σ_{i,j} |A(i, j)|, with obvious notations. Proposition 6.1. Let ρ be the inverse of the largest modulus of the zeroes of the polynomials det A_θ(z) and det B_θ(z), and consider the bound below. For all ε > 0, we can therefore choose an integer M such that ∥J − J_M∥ < ε. Similarly to J, the matrix I is approximated by the truncated matrix I_M, involving M² terms. We now state an analog of Proposition 6.1 for I.
For all ε > 0, we can therefore choose an integer M such that ∥I − I_M∥ < ε.
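The choice of the truncation order M from a geometric tail bound can be sketched as follows; the constant C and the rate ρ are illustrative stand-ins for the quantities appearing in Propositions 6.1 and 6.2:

```python
import numpy as np

# Sketch: choose the smallest truncation order M such that the geometric tail
#     sum_{i > M} C * rho^i = C * rho^{M+1} / (1 - rho)
# falls below the tolerance eps.  C and rho are illustrative values.
def truncation_order(C, rho, eps):
    M = 0
    while C * rho ** (M + 1) / (1.0 - rho) >= eps:
        M += 1
    return M

M = truncation_order(C=10.0, rho=0.8, eps=1e-6)
# the returned M meets the tolerance, while M - 1 (when M > 0) does not
assert 10.0 * 0.8 ** (M + 1) / 0.2 < 1e-6
assert M == 0 or 10.0 * 0.8 ** M / 0.2 >= 1e-6
```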

Testing linear restrictions on the parameter
It may be of interest to test s_0 linear constraints on the elements of θ_0 (in particular A_0p = 0 or B_0q = 0). We thus consider a null hypothesis of the form H_0: R_0 θ_0 = r_0, where R_0 is a known s_0 × k_0 matrix of rank s_0 and r_0 is a known s_0-dimensional vector. The Wald, LM and LR principles are employed frequently for testing H_0. The LM test is also called the score or Rao-score test. We now examine whether these principles remain valid in the non-standard framework of weak VARMA models. Let Ω̂ = Ĵ^{-1} Î Ĵ^{-1}, where Ĵ and Î are consistent estimators of J and I, as defined in Section 5. Under Assumptions A1-A8, and the assumption that I is invertible, the Wald statistic asymptotically follows a χ²_{s_0} distribution under H_0. Therefore, the standard formulation of the Wald test remains valid. More precisely, at the asymptotic level α, the Wald test consists in rejecting H_0 when W_n > χ²_{s_0}(1 − α). It is however important to note that a consistent estimator of the form Ω̂ = Ĵ^{-1} Î Ĵ^{-1} is required. The estimator Ω̂ = 2Ĵ^{-1}, which is routinely used in time series software, is only valid in the strong VARMA case.
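A minimal sketch of the modified Wald test with a plug-in Ω̂ follows; all numerical inputs (estimates, Ω̂, the restriction tested) are illustrative placeholders:

```python
import numpy as np

# Sketch of the modified Wald test
#     W_n = n (R theta_hat - r)' (R Omega_hat R')^{-1} (R theta_hat - r),
# compared with the chi-square critical value.  All inputs are illustrative.
def wald_statistic(n, theta_hat, Omega_hat, R, r):
    delta = R @ theta_hat - r
    middle = np.linalg.inv(R @ Omega_hat @ R.T)
    return n * delta @ middle @ delta

n = 1000
theta_hat = np.array([0.52, -0.29, 0.11])
Omega_hat = np.diag([0.9, 1.1, 1.3])      # placeholder for J^{-1} I J^{-1}
R = np.array([[0.0, 0.0, 1.0]])           # tests H0: third parameter = 0
r = np.array([0.0])

W = wald_statistic(n, theta_hat, Omega_hat, R, r)
CHI2_1_95 = 3.8415                         # 95% quantile of chi^2 with 1 df
assert W > CHI2_1_95                       # H0 rejected at the 5% level here
```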
We now turn to the LM test. Let θ̂ⁿ_c be the restricted QMLE of the parameter under H_0. Define the Lagrangean, where λ denotes an s_0-dimensional vector of Lagrange multipliers. The first-order conditions yield the expressions below. We deduce that, under H_0 and the previous assumptions, the LM statistic can be defined accordingly. Note that in the strong VARMA case, Ω̂ = 2Ĵ^{-1} and the LM statistic takes the more conventional form. In the general case, strong and weak as well, the convergence (7.1) implies that the LM_n statistic is asymptotically χ²_{s_0} distributed under H_0; the conventional form of the statistic, however, is not asymptotically valid for general weak VARMA models. Standard Taylor expansions show that the LR statistic satisfies a quadratic expansion. Using the previous computations and standard results on quadratic forms of normal vectors (see e.g. Lemma 17.1 in [45]), we find that the LR_n statistic is asymptotically distributed as Σ_{i=1}^{s_0} λ_i Z_i², where the Z_i's are iid N(0, 1) and λ_1, . . . , λ_{s_0} are the eigenvalues of the limit matrix. In the strong VARMA case its eigenvalues are equal to 0 and 1, and the number of eigenvalues equal to 1 is s_0. Therefore we retrieve the well-known result that LR_n ∼ χ²_{s_0} under H_0 in the strong VARMA case. In the weak VARMA case, the asymptotic null distribution of LR_n is complicated. It is possible to evaluate the distribution of a quadratic form of a Gaussian vector by means of the Imhof algorithm (see [31]). An alternative is to use the transformed statistic (7.2), which follows a χ²_{s_0} under H_0 when Ĵ and Ŝ⁻_LR are weakly consistent estimators of J and of a generalized inverse of S_LR. The estimator Ŝ⁻_LR can be obtained from the singular value decomposition of any weakly consistent estimator Ŝ_LR of S_LR. More precisely, defining the diagonal matrix Λ̂ = diag(λ̂_1, . . . , λ̂_{k_0}), where λ̂_1 ≥ λ̂_2 ≥ · · · ≥ λ̂_{k_0} denote the eigenvalues of the symmetric matrix Ŝ_LR, and denoting by P̂ an orthonormal matrix such that Ŝ_LR = P̂ Λ̂ P̂′, one can set Ŝ⁻_LR = P̂ diag(λ̂_1^{-1}, . . . , λ̂_{s_0}^{-1}, 0, . . . , 0) P̂′. The matrix Ŝ⁻_LR then converges weakly to a matrix S⁻_LR satisfying S_LR S⁻_LR S_LR = S_LR, because S_LR has rank s_0.
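The construction of a generalized inverse from the eigendecomposition, inverting only the s_0 largest eigenvalues, can be sketched as follows on an illustrative rank-one symmetric matrix:

```python
import numpy as np

# Sketch of the generalized inverse S^- built from the eigendecomposition
# S = P Lambda P', inverting only the s0 largest eigenvalues.  The matrix S
# below is an illustrative rank-1 (s0 = 1) symmetric example.
def generalized_inverse(S, s0):
    lam, P = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]          # eigenvalues in decreasing order
    lam, P = lam[order], P[:, order]
    lam_inv = np.zeros_like(lam)
    lam_inv[:s0] = 1.0 / lam[:s0]
    return P @ np.diag(lam_inv) @ P.T

v = np.array([1.0, 2.0, 2.0])
S = np.outer(v, v)                          # rank 1, so s0 = 1
S_minus = generalized_inverse(S, s0=1)

# Defining property of a generalized inverse: S S^- S = S.
assert np.allclose(S @ S_minus @ S, S)
```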
The obvious problem with this modified version of the LR statistic is that S_LR is not invertible in the general situation where s_0 < k_0, which invalidates the asymptotic χ²_{s_0} distribution and also entails numerical problems in the computation of (7.2).
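As a simple alternative to the Imhof algorithm mentioned above, the tail probability of the weighted chi-square limit Σ_i λ_i Z_i² can be evaluated by Monte Carlo; the weights below are hypothetical eigenvalues, not values from the paper:

```python
import numpy as np

# Monte Carlo evaluation of P(sum_i lambda_i Z_i^2 > x), the asymptotic null
# distribution of LR_n in the weak case; the weights are hypothetical examples.
rng = np.random.default_rng(5)
lambdas = np.array([1.0, 0.5])      # hypothetical eigenvalues of the limit matrix
N = 200_000
Z = rng.standard_normal((N, lambdas.size))
draws = (lambdas * Z**2).sum(axis=1)

x = 5.0
p_value = (draws > x).mean()

# Sanity check: the mean of the limit law is sum(lambdas) = 1.5.
assert abs(draws.mean() - lambdas.sum()) < 0.05
assert 0.0 < p_value < 0.2
```

In practice the λ_i would be replaced by the eigenvalues of a consistent estimator of the limit matrix; the Imhof algorithm (as implemented, e.g., in the CompQuadForm R package used in Section 8) computes the same tail probability exactly.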

Numerical illustrations
In this section, by means of Monte Carlo experiments, we illustrate the finite-sample behavior of our estimators of the information matrices I and J (hereafter denoted respectively I_c and J_c, where the subscript c stands for "computed") involved in the asymptotic variance matrix Ω, for strong and weak VARMA models. We used the three kernels described in Table 1 below in the calculation of the estimator of the matrix I_c. We compare our estimator of Ω with the standard estimator and with the spectral density estimator Ω̂_SP := Ĵ^{-1} Î_SP Ĵ^{-1} (Î_SP is defined in Theorem 3 of [9]) considered by Boubacar Maïnassara and Francq [9].

Simulating models
The structural VARMA(p, q) representation (2.1) can be rewritten in a standard reduced VARMA(p, q) form if the matrices A_00 and B_00 are non-singular. Indeed, premultiplying (2.1) by A_00^{-1} and introducing the innovation process e_t = A_00^{-1} B_00 ǫ_t, with non-singular variance Σ_e0 = A_00^{-1} B_00 Σ_0 B_00′ A_00^{-1}′, we obtain the reduced VARMA representation (8.1). The structural form (2.1) allows one to handle seasonal models, instantaneous economic relationships, VARMA models in the so-called echelon form, and many other constrained VARMA representations (see [36], Chap. 12). The reduced form (8.1) is more practical from a statistical viewpoint, because it gives the forecasts of each component of (X_t) according to the past values of the set of the components.
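The relation Σ_e0 = A_00^{-1} B_00 Σ_0 B_00′ A_00^{-1}′ can be checked numerically; the structural matrices below are arbitrary non-singular examples, not values from the paper:

```python
import numpy as np

# Numerical check of the reduced-form innovation e_t = A00^{-1} B00 eps_t and of
# its variance Sigma_e0 = A00^{-1} B00 Sigma_0 B00' A00^{-1}'.
rng = np.random.default_rng(6)
A00 = np.array([[1.0, 0.5], [0.0, 1.0]])
B00 = np.array([[1.0, 0.0], [0.3, 1.0]])
Sigma0 = np.array([[1.0, 0.2], [0.2, 1.0]])

C = np.linalg.inv(A00) @ B00           # e_t = C eps_t
Sigma_e0 = C @ Sigma0 @ C.T

# Monte Carlo confirmation: transform structural shocks and compare covariances.
n = 200_000
L = np.linalg.cholesky(Sigma0)
eps = rng.standard_normal((n, 2)) @ L.T
e = eps @ C.T
assert np.allclose(e.T @ e / n, Sigma_e0, atol=0.05)
```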
The above discussion shows that VARMA representations are not unique, that is, a given process (X_t) can be written in reduced form or in structural form by premultiplying by any non-singular (d × d) matrix. Of course, in order to ensure the uniqueness of a VARMA representation, constraints are necessary for the identifiability of the (p + q + 2)d² elements of the matrices involved in the VARMA equation (2.1). In contrast, the echelon form guarantees uniqueness of the VARMA representation (see also [36]); it is the identified VARMA representation most widely employed in the literature. The identifiability of VARMA processes has been studied in particular by Hannan [28], who gave several procedures ensuring identifiability.
For simplicity, we now write a_1, b_1 and b_2 instead of a_1(2, 2), b_1(2, 1) and b_1(2, 2). For estimating the asymptotic variance matrix Ω introduced in this paper, we implement the information matrices I_c and J_c involved in Ω using the following steps: 1. Compute the estimates Â, B̂ by QMLE. 2. Compute the QMLE residuals ê_t = ẽ_t(θ̂_n) when p > 0 or q > 0, and let ê_t = e_t = X_t when p = q = 0. When p + q ≠ 0, we have ê_t = 0 for t ≤ 0 and t > n, and ê_t = ẽ_t(θ̂_n) for t = 1, . . . , n, with X̃_t = 0 for t ≤ 0 and X̃_t = X_t for t ≥ 1.
The real numbers (b_n)_{n∈N*} satisfy (5.1), and the weight function f : R → R is supposed bounded, with compact support [−a, a], and continuous at the origin with f(0) = 1. With regard to the choice of (b_n), a few heuristic remarks can be made (see, for instance, [10,15,26,30]). When h is small relative to n, a weight f(hb_n) close to one is required; therefore (b_n) is assumed to decrease to zero as n tends to ∞ (see condition (5.1)). On the contrary, when h is large relative to n, one wants a weight f(hb_n) close to zero; therefore (b_n) should not decrease to zero too quickly. We used, in this paper, the three kernels described in Table 1 in the calculation of the estimators Γ̂_n(i, j). For each kernel, we have taken b_n equal to 1/ln(n), which corresponds to the truncation point T_n = [ln(n)]. 8. Compute the 9 × 1 matrix:

Empirical size
The numerical illustrations of this section are made with the free statistical software R (see http://cran.r-project.org/). We simulated N independent trajectories of different sizes n of Model (8.2), first with the strong Gaussian noise (8.3), second with the two weak noises (8.4) and (8.5).

Strong VARMA model case
We first consider the strong VARMA case. To generate this model, we assume that in (8.2) the innovation process (ǫ_t) is the strong Gaussian noise defined by (8.3). We simulated N independent trajectories of size n = 1,000 of Model (8.2) with this noise. For each of the N replications we estimated the coefficients (a_1(2, 2), b_1(2, 1), b_1(2, 2)) by the Gaussian maximum likelihood estimation method, and we compared the standard and modified (sandwich) estimates of the asymptotic variance Ω of the QMLE.

Weak VARMA model case
The GARCH(p, q) models constitute important examples of weak white noises in the univariate case. These models have numerous extensions to the multivariate framework. Jeantheau [32] has proposed a simple extension of the multivariate GARCH(p, q) with constant conditional correlation. For simplicity, we consider the bivariate ARCH(1) model, in which the process (ǫ_t) satisfies ǫ_t = H_t η_t, where {η_t = (η_{1,t}, η_{2,t})′}_t is an iid centered process with Var(η_{i,t}) = 1 for i = 1, 2, and H_t is a diagonal matrix whose elements h_{ii,t} satisfy an ARCH(1) recursion. The elements of the matrix A, as well as those of the vector (c_1, c_2)′, are supposed to be positive. In addition, suppose that the stationarity conditions hold. We now repeat the same experiment on two weak VARMA(1, 1) models. We first assume that in (8.2) the innovation process (ǫ_t) is the ARCH(1) noise (8.4). This noise is a direct extension of a weak noise defined by Romano and Thombs [42] in the univariate case. For the estimation of the coefficients, we used the quasi-maximum likelihood estimation method, and we compared the standard and modified (sandwich) estimates of the asymptotic variance Ω of the QMLE, as well as Ω̂_PAR = J_c^{-1} I_PAR J_c^{-1}.

Comments

In the strong VARMA case we know that the two estimators, standard and sandwich, are consistent. In view of the three top panels of Figure 1, the sandwich estimator seems less accurate in the strong case. This is not surprising, because the sandwich estimators are more robust, in the sense that they remain consistent in the weak VARMA case, contrary to the standard estimator (see Figures 2 and 3). It is clear that in the weak Model (8.2)-(8.4) case, nE{b̂_1(2, 1) − b_1(2, 1)}² and nE{b̂_1(2, 2) − b_1(2, 2)}² are better estimated by the sandwich estimators than by the standard ones (see Figure 2). We draw the same conclusion for the second weak Model (8.2)-(8.5) case for nE{b̂_1(2, 2) − b_1(2, 2)}² (see Figure 3). We conclude that the failure of the standard estimators of Ω in the weak VARMA framework may have important consequences in terms of identification (see [7]) or hypothesis testing. Table 2 displays the empirical sizes of the standard Wald, LM and LR tests, and those of the modified versions proposed in Section 7. We use the CompQuadForm R package to evaluate the p-values by the Imhof (1961) algorithm. For the nominal level α = 5%, the empirical size over the N = 1,000 independent replications should vary between the significance limits 3.6% and 6.4% with probability 95%. For the nominal level α = 1%, the significance limits are 0.3% and 1.7%, and for the nominal level α = 10%, they are 8.1% and 11.9%. When the relative rejection frequencies are outside these limits, they are displayed in bold type in Table 2. For the strong VARMA Model I, all the relative rejection frequencies are inside the significance limits. As expected, for the weak VARMA Models II and III, the relative rejection frequencies of the standard tests are definitely outside the significance limits. Thus the error of first kind is well controlled by all the tests in the strong case, but only by the modified versions of the tests in the weak case. Table 3 shows that the powers of all the tests are very similar for the strong VARMA Model IV. The same is true for the modified tests in the weak VARMA Models V and VI.
The empirical powers of the standard tests are hardly interpretable for the weak VARMA Models V and VI, because we have already seen in Table 2 that the standard versions of the tests do not control the error of first kind well in the weak VARMA framework. These simulation experiments demonstrate that the validity of the different steps of the traditional Box-Jenkins methodology (identification, estimation and validation) depends on the noise properties. This standard methodology, however, needs to be adapted to take into account a possible lack of independence of the error terms. Under non-independent errors, it appears that the standard tests are generally unreliable, while the proposed tests offer satisfactory levels in most cases. Moreover, the error of first kind is well controlled by the modified versions of the tests. We conclude that the modified versions are preferable to the standard ones. The modified tests of the present article can therefore be considered as complementary to the above-mentioned available results concerning the statistical analysis of weak VARMA models.
Hence, at θ = θ 0 we have It thus remains to prove the following two Lemmas.
Lemma A.1. Differentiating the two terms of the following equality with respect to the AR coefficients, we obtain the expressions below, where E_ij = ∂A_ℓ/∂a_{ij,ℓ} is the d × d matrix with 1 at position (i, j) and 0 elsewhere, and e_t(θ) = B_θ^{-1}(L) A_θ(L) X_t. Hence, for any a_ℓ, we can write the multivariate noise derivatives with A*_{k−ℓ} = 0 when k < ℓ. With these notations, we obtain the result. The proof of the next lemma is similar to that of Lemma A.1; we only give the derivatives that differ from those in the previous lemma. Differentiating the two terms of the equality A_θ(L)X_t = B_θ(L)e_t(θ) with respect to the MA coefficients, we obtain the expressions below, where E_ij = ∂B_{ℓ′}/∂b_{ij,ℓ′} is the d × d matrix with 1 at position (i, j) and 0 elsewhere. Similarly to Lemma A.2, we have the expansion in which N(L) = [N_11(L) : N_21(L) : · · · : N_dd(L)] and U_{t−ℓ′} are respectively d × d³ and d × d² matrices, and where B*_{k−ℓ′} = 0 when k < ℓ′. With these notations, we obtain the result. The conclusion follows.
The same equality holds for the second-order derivatives ofl n . We thus have Using well-known results on matrix derivatives, we have In view of (A.3), we have by the ergodic theorem. Using the orthogonality between e t and any linear combination of the past values of e t , we have 2E{∂ 2 e ′ t (θ 0 )/∂θ∂θ ′ }Σ −1 e0 e t = 0. We then have Using vec(ABC) = (C ′ ⊗A) vec(B) with C = I d 2 (p+q) and in view of Proposition 3.1, we obtain The conclusion is complete.
Proof of Proposition 4.2. In view of (4.2), let In view of (4.1), we have Using the elementary relation vec(ABC) = (C ′ ⊗ A) vec B, we have By Proposition 3.1, we obtain Using also AC ⊗ BD = (A ⊗ B)(C ⊗ D), we have The conclusion is complete.
Proof of Remark 4.1. For d = 1, we have where σ 2 is the variance of the univariate process. We also have In view of Proposition 4.1, we have Replacing M by σ 2 I (p+q) 2 in vec J, we have Using (A.4) and in view of Proposition 4.2, we have The conclusion is complete.
Proof of Theorem 5.1. For any multiplicative norm, we have .
The proof will thus follow from Lemmas A.3, A.4 and A.6 below.
Proof of Lemma A.3. Boubacar Maïnassara and Francq [9] have shown the strong consistency of θ̂_n (θ̂_n → θ_0 a.s. as n → ∞), which entails the convergence below. Using a Taylor expansion of vec λ̂_i about θ_0, we obtain the expansion below, where θ*_n is between θ̂_n and θ_0. For any multiplicative norm, the desired bound follows. In view of (A.5) and (A.6), the proof is complete. Proof of Lemma A.4. For any θ ∈ Θ, in view of (A.7), using A2 and the compactness of Θ, the bound holds by Assumption A7. We now consider the norm defined below, where Z is a random vector of dimension d_1. In view of Proposition 3.1, (A.8), and writing e_t = (e_{1t}, . . . , e_{dt})′, the non-zero components of the vector vec M_n(θ) are of the form n^{-1} Σ_{t=1}^n e_{it}(θ) e_{jt}(θ), for (i, j) ∈ {1, . . . , d}². We deduce that the elements of the matrix ∂ vec M_n(θ)/∂θ′ are linear combinations of (2/n) Σ_{t=1}^n e_{it}(θ) ∂e_{jt}(θ)/∂θ.
The conclusion is complete.
The conclusion is complete.
For k, k′, m, m′ = 1, . . . , ∞, consider the quantities below. In the univariate case, Francq, Roy and Zakoïan [21] have shown (see the proofs of their Lemmas A.1 and A.3) that sup_{ℓ,ℓ′>0} |Γ(ℓ, ℓ′)| < ∞. This extends directly to the multivariate case. The non-zero elements of Γ(i, j) are of the given form, and we deduce that sup_{i,j≥1} ∥Γ(i, j)∥ = O(1). The proof will thus follow from Lemma A.7 below, in which we show the consistency of Γ̂_n(i, j) uniformly in i and j.
We then obtain and the result follows.
Proof of Proposition 6.2. The proof is similar to that of Proposition 6.1; we only give the expressions that differ from those in the previous proposition. With the definitions below, the result follows.