Automatic Identification of General Vector Error Correction Models

Abstract There are a number of econometric tools to deal with the different types of situations in which cointegration can appear: I(1), I(2), seasonal, polynomial, etc. There are also different kinds of Vector Error Correction models related to these situations. We propose a unified theoretical and practical framework that deals with many of these situations. To this aim: (i) a general class of models is introduced in this paper and (ii) an automatic method to identify models, based on estimating the Smith form of an autoregressive model, is provided. Our simulations suggest the power of the proposed methodology, and an empirical example illustrates it.


Introduction
The basis of the theory of cointegration was laid out in a surge of articles in the late eighties and early nineties, preceded by the seminal paper by Granger (1981). By the mid-nineties, there was a relatively complete theoretical framework for I(1) cointegration (Engle and Granger, 1987; Johansen, 1988; Johansen and Juselius, 1990), I(2) cointegration (Paruolo, 1996; S., 1992), multicointegration (Granger and Lee, 1989) and seasonal cointegration (Hylleberg et al., 1990).
Since then, much of the attention in cointegration theory has focused on panel data (Levin and Lin, 1993; Levin et al., 2002) or on fractional cointegration (Engle and Granger, 1987; Granger and Joyeux, 1980). The latest developments focus on particular cases such as cointegration in dynamic panels (Yu and Lee, 2010) or panel data multicointegration (Worthington and Higgs, 2010).
Also, some contributions have been made to the foundations and algebraic theory of multivariate integrated processes (Franchi, 2006, 2007). A concern that may explain the persistence of interest in the algebra of cointegration is the perceived necessity of a unifying theoretical framework for all situations. In this article, we contribute in two ways to a more unified treatment of cointegration. First, in section 2 we provide a general theoretical framework that covers any process that can be represented by an autoregressive model with unit roots. This framework has the following elements: (i) a classification of all such models based on the Smith form (SF); (ii) a class of General Vector Error Correction models (GVEC) corresponding to every unstable autoregressive model and (iii) a consistency result for the least squares estimator of the GVEC models. The first two points are in fact straightforward generalizations of the kind of reasoning that Hylleberg et al. (1990) used to find a seasonal vector error correction model by assuming that the process had a Wold representation with a Smith form of a particular shape. By removing the restriction on the SF, we find a general family of VEC models (General VEC or GVEC, hereinafter). The GVEC family covers the following cases: I(1), I(2), multicointegration (with a caveat that will be explained later), polynomial and seasonal cointegration. On the other hand, this framework does not include panel data or fractional integration. At this point, we do not know how difficult it would be to generalize our results in those directions, but it seems that fractional integration poses more difficulties from the theoretical point of view, whereas the algorithms we use are not well suited to panel data.
The second contribution (section 3) is an automatic method to identify GVEC models based on an estimate of the SF of the autoregressive model.
We show by means of Monte Carlo simulations (section 4) that our method works better than the Johansen test in the restricted cases in which the latter applies. Of course, our method also has the advantage that it can detect other situations. This is the case in the practical example of section 5, where our algorithm detects both seasonal cointegration and a higher order of integration.
Therefore, for series with a marked seasonal behavior, our method makes it no longer necessary to carry out the cointegration analysis on the seasonally adjusted series, as is sometimes done.

General Vector Error Correction Models
Let us recall some fundamental results of the representation of cointegrated variables by means of VEC models, following the ideas of Engle and Granger (1987), although not their notation.
We will denote polynomials and power series using the indeterminate z. When we use them to define autoregressive or ARMA models, we substitute the backshift operator B for z, so for example φ(z) = φ_0 + · · · + φ_p z^p, but φ(B)y_t = φ_0 y_t + · · · + φ_p y_{t−p}. Sometimes, when the context makes clear that φ is a polynomial, we drop the indeterminate for ease of notation. The same applies to vectors and matrices whose entries are polynomials or power series.
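As a quick numerical illustration of the substitution of B for z (a toy sketch of ours, not part of the paper's package): applying a polynomial in B to a series is just a convolution of the series with the coefficient vector.

```python
import numpy as np

# Applying phi(B) with phi(z) = 1 - 0.5 z to a series y_t yields
# y_t - 0.5 y_{t-1}; in vector form this is the convolution of the
# coefficient vector with the series.
phi = np.array([1.0, -0.5])              # phi_0, phi_1
y = np.array([1.0, 2.0, 4.0, 8.0])
filtered = np.convolve(phi, y)[:len(y)]  # keep t = 0, ..., T-1
# filtered[t] = y[t] - 0.5 * y[t-1], with the presample y[-1] taken as 0
```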
Let us assume that y_t is an n × 1 vector whose components are I(1), meaning that (1 − B)y_t = y_t − y_{t−1} is weakly stationary. We additionally assume that it is purely nondeterministic, so it has a Wold representation (1 − B)y_t = Ψ(B)ε_t, where ε_t is white noise with zero mean and covariance matrix Σ.
We say that y_t is cointegrated with cointegration rank r if there exist r linearly independent n × 1 constant vectors a such that a′y_t is stationary. We partially reproduce below the well-known Granger Representation Theorem from Engle and Granger (1987).
Proposition 1 (Granger representation theorem). If y t has cointegration rank r, then: (a) Ψ(1) is of rank n − r.
(b) There is an ARMA representation A(B)y t = m(B)ε t such that A(1) has rank r, A(0) = I and m(z) is a polynomial with m(1) = 0.
We will generalize this result in three directions: greater orders of integration, unit roots other than unity (among other things, to include seasonal integration) and polynomial cointegration (including multicointegration).
Assumption 1. Let s be a positive integer and d(z) a real polynomial such that all its roots belong to {ω_k}_{k=0}^{s−1}, where ω_k = exp(2πik/s), and d(B)y_t is stationary and purely nondeterministic.
The fact that we limit the roots of d to that set is just for convenience, as the most common cause of unit roots other than unity is seasonal integration. Hence, s can be interpreted as the number of observations per year (or per week in the case of daily data). All the subsequent developments can easily be adapted to the general case, although most applications do not require it.
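For instance, with quarterly data (s = 4) the ω_k are the fourth roots of unity, the seasonal difference 1 − B^4 factors into the first-order filters attached to them, and pairing a complex root with its conjugate yields a real second-order filter. A small symbolic check of these identities (illustrative only):

```python
import sympy as sp

z = sp.symbols('z')
s = 4
omegas = [sp.exp(2 * sp.pi * sp.I * k / s) for k in range(s)]  # 1, I, -1, -I

# The seasonal difference 1 - z^s is the product of the first-order
# filters (1 - conj(omega_k) * z) over the s-th roots of unity.
prod = sp.expand(sp.prod([1 - sp.conjugate(w) * z for w in omegas]))
assert sp.simplify(prod - (1 - z**s)) == 0

# Pairing a complex root with its conjugate gives a real filter:
# for omega = I, (1 - conj(I) z)(1 - I z) = 1 + z^2.
w = sp.I
real_filter = sp.expand((1 - sp.conjugate(w) * z) * (1 - w * z))
assert real_filter == 1 + z**2
```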
Under assumption 1, we can say that y_t is integrated of order d_k with respect to ω_k and denote this property as y_t ∼ I_k(d_k). With this notation, I_0(j) is the usual notion of integration of order j at frequency zero. For a complex ω_k, we can remain in the realm of the real numbers by applying instead (1 − 2 Re(ω_k)B + B²)^{d_k} to y_t, which differentiates y_t simultaneously with respect to ω_k and its complex conjugate. We say that y_t is polynomially cointegrated of order j at ω_k when there exists a polynomial vector a(z) such that a(B)′y_t is I_k(d_k − j). There can be several cointegration relationships for y_t. For each k and j, let us say there is a set A of such vectors with exactly r_{k,j} elements such that {a(ω_k)}_{a∈A} is linearly independent. For the sake of generality, we do not require that all the components of y_t have the same order of integration, so there may be trivial cointegration relationships. For instance, if y_t = (y_{1,t}, y_{2,t})′, where y_{1,t} ∼ I(1) and y_{2,t} ∼ I(0), a trivial cointegrating vector would be (0, 1)′. For convenience, we say that y_t is cointegrated of order 0 with rank n, so r_{k,0} = n. Clearly r_{k,j} ≥ r_{k,j+1}, and there is some j from which onwards r_{k,j} = 0. We call the r_{k,j} cointegration ranks.
Assumption 2. Ψ(z) is rational, that is, its entries are polynomial fractions.
Now, we will present our generalization of the Granger representation theorem in two parts. First, we adapt statements (a)-(c) of proposition 1.
Proposition 2. If assumptions 1 and 2 hold and y_t has cointegration ranks r_{k,j}, then the SF of Ψ(z) has the diagonal structure determined by those ranks (the concept is fully developed in 3.1). Conversely, if the SF is as above, then y_t has cointegration ranks r_{k,j}.
has no unit roots. In this case, there is also the infinite VAR representation Φ(B)y_t = ε_t. (c′) There are full-rank n × r_{k,j} polynomial matrices α(z) and γ(z) such that α(z)′Ψ(z) and Ψ(z)γ(z) have jth-order zeros at ω_k.
Statements (a′) and (b′) are of paramount importance for our results, because they mean that the cointegration structure of y_t can be obtained entirely from the SF of Ψ(z) or Φ(z). In order to distinguish the two diagonal matrices, we use D_Ψ(z) and D_Φ(z). Let δ_1(z), . . . , δ_h(z) be the distinct diagonal elements of D_Φ(z).
Proposition 3. (d′) Under the conditions of proposition 2, y_t satisfies the model (1), where Γ(z) is a polynomial matrix. If D_Ψ(z) | d(z), then there is a representation with Γ(z) an infinite power series and m(B) = 1.
REMARK 1: the rank of the matrices Π^{(1)}_{kℓ} is given by the number r_1 of times δ_1 is repeated in {d_i}_{i=1}^n. The rank of the matrices Π^{(1)}_{kℓ} that correspond to real roots is r_1, whereas the rank of Π^{(1,−)}_{kℓ} is 2r_1. For greater j, if r_j is the number of times δ_j appears in {d_i}_{i=1}^n, then the ranks of Π^{(j)}_{kℓ} and Π^{(j,±)}_{kℓ} are bounded by r_1 + · · · + r_j and 2(r_1 + · · · + r_j) respectively.
Let us see now some examples of known models that are particular cases of GVEC.
Example 1 (VAR model in differences). If D_Φ(z) = diag(1 − z, . . . , 1 − z), the components of y_t are not cointegrated, and a stable VAR model can be fitted to their differences (∇y_{1,t}, . . . , ∇y_{n,t}). In a similar fashion, if D_Φ(z) = diag(1 − z^s, . . . , 1 − z^s), a stable VAR is appropriate for the seasonal differences.
Example 4 (Seasonally integrated processes). For the more general case in which the roots are among exp(2πki/s), with k = 0, . . . , s − 1, in some cases the model (1) boils down to the model in section 4 of Hylleberg et al. (1990).

Example 5 (Multicointegration). Following Granger and Lee (1989), consider the case of production (p_t) and sales (s_t). They satisfy a multicointegration relation, that is, they are cointegrated I(1) variables, so there is a vector (in particular (1, −1)) such that p_t − s_t is stationary and, in addition to that, its cumulation is cointegrated with s_t (again, with (1, −1)). This relation holds, for example, when ∇(p_t, s_t)′ = Ψ(B)ε_t and the SF of Ψ(B) has ∇² as its second diagonal element. Note that the alternative representation by Haldrup and Salmon (1998) lives in a larger ring, which contains ∇^{−1}. The condition D_Ψ | d is violated because the second element ∇² in the diagonal of the SF of Ψ(B) does not divide the operator ∇ that we use to make (p_t, s_t) stationary. Nevertheless, in order to get a representation without a moving average part, this problem can be circumvented by modeling the integrated vector (P_t, S_t)′ defined by ∇(P_t, S_t)′ = (p_t, s_t)′ (note that lcm(∇, ∇²)∇^{−1} = ∇).
From (2), we get that we are in a situation like that of example 3, with r_2 = 0. In fact, we can follow the constructive proof of proposition 3 and arrive at a model in which the two cointegration relations are visible: the second row means that sales and inventory are cointegrated, and the sum of the two rows gives the relation between sales and production.
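A small simulation makes the production-sales mechanism concrete. The DGP below is our own toy version of the Granger-Lee setting (the adjustment rule and its coefficients are illustrative assumptions, not taken from the paper): sales follow a random walk and production corrects the inventory-sales gap, so that p_t − s_t is stationary and the cumulated balance (the inventory) is itself cointegrated with sales.

```python
import numpy as np

rng = np.random.default_rng(0)
T, alpha = 20_000, 0.5
eps = rng.standard_normal(T)            # sales shocks
eta = 0.2 * rng.standard_normal(T)      # production shocks

s = np.cumsum(eps)                      # sales: a random walk, I(1)
p = np.zeros(T)                         # production
inv = np.zeros(T)                       # inventory: cumulated p - s
for t in range(1, T):
    # production tracks sales and corrects the inventory-sales gap
    p[t] = s[t] + alpha * (s[t] - inv[t - 1]) + eta[t]
    inv[t] = inv[t - 1] + p[t] - s[t]

# Both spreads have bounded variance while sales wander: p - s is
# stationary (first relation) and inv - s is stationary as well
# (multicointegration).
print(np.var(p - s), np.var(inv - s), np.var(s))
```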

Estimation
We will obtain asymptotic properties of the least squares estimation of models of the form (1) in the case that Γ(B) is of finite order, but first we need an additional assumption.
Assumption 3. Let F_t be the σ-field generated by {ε_s : s ≤ t}. Here L_T = diag(T^{q_{jkℓ}} I_n)_{jkℓ}, where q_{jkℓ} is the minimum multiplicity of the unit roots of ∆_{jkℓ}(z).
REMARK 1: it is likely that consistency could be proved in the case that the true Γ(B) has infinite order. That would require the order of the estimate Γ̂(B) to diverge to infinity at a certain rate, as in Lewis and Reinsel (1985).
REMARK 2: this result should be extended to allow the presence of moving average terms in the model. The results of Barrio Castro and Osborn (2011) suggest that significant improvements could be achieved with this generalization, which could be attempted along the lines of Yap and Reinsel (1995).

Identification
We have built an R package to automatically identify and estimate GVEC models. The package can be obtained from the corresponding author until it is uploaded to a public repository. The main steps of the procedure are: (i) estimate a VAR representation Φ(B)y_t = ε_t; (ii) estimate the SF of Φ(z); and (iii) apply proposition 3 to obtain the GVEC model.
This procedure resembles the unit root determination method of the program TRAMO, albeit with the additional complication that polynomial matrices cannot be factorized as simply as polynomials. The method of TRAMO is described in the introductory notes by Maravall (2008).
Steps (i) and (iii) are straightforward. What we need is a way to estimate the SF of a polynomial or rational matrix. In fact, we will describe a method to estimate the SF in any ring in which we can perform the Euclidean division (a Euclidean ring). In subsection 3.1 we describe the algorithm; in subsection 3.2 we adapt it to identify GVEC models.

Smith form
Let us review first the theory about the SF. We take the following proposition from Hungerford (1980).

Proposition 5. If A is an n × n matrix of rank r > 0 over a principal ideal domain R, then A is equivalent to a diagonal matrix D = diag(d_1, . . . , d_r, 0, . . . , 0), where the d_i are nonzero and d_1 | d_2 | · · · | d_r.
The d_i are unique up to multiplication by units. In the ring of polynomials over the reals, we may set the coefficient of the highest-order term equal to unity, that is, force the d_i to be monic.
In general, we will denote by R_1 a subset of R such that for any a ∈ R there is a unique a_1 ∈ R_1 with a = ua_1 and u invertible (R_1 always exists because, by the axiom of choice, we can pick one element from each equivalence class of associates). In the polynomial case, R_1 is the set of the monic polynomials. We define a function a → u(a) such that u(a) is invertible and u(a)a ∈ R_1.
The proof of proposition 5 boils down to showing that there is an algorithm that transforms A into D by means of elementary operations. We will call this the 'exact algorithm', for reasons that will become obvious later. In the next subsection, we present a stylized description of the algorithm.
We say that R is a Euclidean ring with degree function ϕ : R − {0} → ℕ when: (i) pq ≠ 0 implies ϕ(pq) ≥ ϕ(p) and (ii) for any p, q ∈ R with q ≠ 0, there are some m, r ∈ R such that p = mq + r and either r = 0 or ϕ(r) < ϕ(q).
Let M n (R) be the ring of the n × n matrices with elements in R and A ∈ M n (R). We will call admissible operations the following elementary operations: (a) Exchange rows i and j.
(b) Exchange columns i and j.
(c) Add c times row i to row j, where c is the quotient of the Euclidean division of a ij by a ii and a ii ∈ R 1 .
(d) Add c times column i to column j, where c is the quotient of the Euclidean division of a ji by a ii and a ii ∈ R 1 .
(e) Multiply column i or row j by u(a ii ).
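These operations are enough to diagonalize a polynomial matrix. The sketch below is our own minimal implementation over the rationals with sympy (not the paper's GVEC code); it applies the admissible operations in the usual pivoting order and, for simplicity, does not enforce the divisibility d_i | d_{i+1} of the diagonal entries, in the spirit of step (i) of the procedure of subsection 3.2.

```python
import sympy as sp

z = sp.symbols('z')

def smith_form(M):
    """Diagonalize a polynomial matrix over Q[z] with the admissible
    operations (a)-(e).  Pivots are made monic; divisibility of the
    successive diagonal entries is NOT enforced."""
    A = sp.Matrix(M)
    n = A.rows
    for i in range(n):
        while True:
            piv = None                       # nonzero entry of minimal degree
            for r in range(i, n):
                for c in range(i, n):
                    e = sp.expand(A[r, c])
                    if e != 0:
                        d = sp.degree(e, z)
                        if piv is None or d < piv[2]:
                            piv = (r, c, d)
            if piv is None:
                break                        # trailing block is zero
            A.row_swap(i, piv[0])            # operation (a)
            A.col_swap(i, piv[1])            # operation (b)
            exact = True
            for r in range(i + 1, n):        # operation (c): clear column i
                q, rem = sp.div(A[r, i], A[i, i], z)
                A[r, :] = (A[r, :] - q * A[i, :]).expand()
                exact = exact and rem == 0
            for c in range(i + 1, n):        # operation (d): clear row i
                q, rem = sp.div(A[i, c], A[i, i], z)
                A[:, c] = (A[:, c] - q * A[:, i]).expand()
                exact = exact and rem == 0
            if exact:
                break                        # the pivot divides its row/column
        if A[i, i] != 0:                     # operation (e): make pivot monic
            A[i, i] = sp.expand(A[i, i] / sp.LC(A[i, i], z))
    return A

# the Smith diagonal of this matrix is (z - 1, z - 1)
print(smith_form([[1 - z, (1 - z)**2], [0, 1 - z]]))
```

Replacing the zero tests `e != 0` and `rem == 0` by a threshold on a modulus of the entries turns this exact algorithm into the approximate one discussed below.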
Proposition 6. There exist algorithms to obtain the SF of a matrix over a Euclidean ring that have the following form: starting from an initial state e_0 and A^{(0)} = A, iterate e_{k+1} = f_0(e_k, A^{(k)}), where A^{(k+1)} is obtained from A^{(k)} by performing an admissible operation that depends on e_{k+1}, and (a) f_0(e, A) = f_0(e, B) whenever A and B have the same pattern of zero entries.
Condition (a) means that f_0 depends only on (1) its first argument and (2) which elements of the second argument are zero.
We can identify the algorithm with the functions f_0 and g. The interest of all this is not to prove proposition 5, which is proved in Hungerford (1980); the reason to go into this detail is that we will later need a modification of this scheme. Let us now see, with an example, why a modification is necessary.
Assume that the ring R is endowed with a topology (making the sum and product continuous) so that we can speak of random elements in R. Then, for a certain matrix A, we may have an estimate obtained with a sample of size T, say Â_T. Furthermore, suppose Â_T is consistent in probability. We are interested in the SF D of A, so we could in principle proceed by applying the exact algorithm to Â_T, obtaining a D̂_T with the hope that D̂_T converges in probability to D when Â_T does to A. Unfortunately, it is easy to see with an example that this does not work; consider the case of a 2 × 2 matrix with elements in the ring of the polynomials over the reals, where, in order to achieve uniqueness, we set the highest-order coefficients equal to one. Consequently, we have to take a less direct approach to estimate the SF.
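The discontinuity is easy to exhibit. For a nonsingular 2 × 2 polynomial matrix the SF is diag(d_1, d_2), with d_1 the monic gcd of the entries and d_2 = det/d_1; the script below (our own illustration, with a symbolic perturbation t) shows the jump.

```python
import sympy as sp
from functools import reduce

z, t = sp.symbols('z t')

def smith_2x2(A):
    """Smith form diag(d1, d2) of a nonsingular 2x2 polynomial matrix:
    d1 is the gcd of the four entries, d2 = det(A)/d1, both monic in z."""
    A = sp.Matrix(A)
    d1 = reduce(sp.gcd, [A[0, 0], A[0, 1], A[1, 0], A[1, 1]])
    d2 = sp.cancel(A.det() / d1)
    return sp.monic(d1, z), sp.monic(d2, z)

# Exact matrix: the unit root appears in BOTH diagonal entries of the SF.
print(smith_2x2([[1 - z, 0], [0, 1 - z]]))           # (z - 1, z - 1)

# Any perturbation t != 0 makes the gcd of the entries trivial, so the SF
# jumps to diag(1, det): the exact algorithm is not continuous in A, and
# plugging a consistent estimate into it does not give a consistent SF.
print(smith_2x2([[1 - z, t], [t, 1 - z]]))
```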
We assume now that R is endowed with a modulus function a ∈ R → |a| ∈ ℝ such that for any a, b ∈ R: (i) |a + b| ≤ |a| + |b| and (ii) |ab| ≤ |a| · |b|.
To simplify matters, we may also assume that |a| = |−a|; in that case, |a − b| defines a metric.
Assumption 4 (Continuity of the Euclidean division). For any (p, q) ∈ R × R_1, if (p′, q′) → (p, q), then r′ → r, where r and r′ are the remainders of the divisions of p by q and of p′ by q′ respectively.
Let M_n(R) be the ring (and R-module) of the n × n matrices with entries in R. In analogy with the usual notation in linear algebra, we will write ‖A‖ = sup_{‖a‖=1} ‖Aa‖, where for a = (a_1, . . . , a_n), ‖a‖ = (∑_j |a_j|²)^{1/2}. The finiteness of the supremum in (7) can be proved in a similar fashion as in the vector space case. From now onwards, convergence of elements of R and of matrices will be understood with respect to the topologies generated by | · | and ‖ · ‖ respectively. Now, we can introduce the 'approximate algorithm'.
Definition 1. For any f_0 and g as in proposition 6, and for a certain ε, the ε-approximate algorithm is the one obtained when we replace f_0 by the f_ε satisfying f_ε(e, A) = f_0(e, B), where B_{ij} = 0 when |A_{ij}| < ε and B_{ij} = A_{ij} otherwise.
In other words, when the exact algorithm depends on whether A_{ij} = 0, the approximate algorithm depends on whether |A_{ij}| < ε. Let us introduce some notation. For a matrix A, S(A, ε) is the output of the ε-approximate algorithm; when there is ambiguity, we will specify the ring in which we are operating as S(A, ε; R). Consequently, S(A, 0) is the output of the exact algorithm, that is, the SF of A. In particular, if we use the algorithm described in the appendix, we get the unique version in which the elements in the main diagonal of A belong to R_1 (in the polynomial case, they are monic).
Let nowÂ T be an estimate of A.
Theorem 1. If assumption 4 holds, the mapping u is continuous,

Automatic procedure
The central tool of the procedure is the approximate algorithm of subsection 3.1, but how to use it is not completely straightforward. We have to make some considerations before presenting it in full.
(1) To use proposition 5 we need to specify in which ring we consider the entries of the matrices. For the theoretical results of section 2, the ring is R = {f/g : g(z) ≠ 0 ∀z with |z| ≤ 1}, that is, the rational fractions without poles at any ω_k (or anywhere else on or inside the unit circle). That this is a Euclidean ring and the continuity of its Euclidean division are both proved in the mathematical appendix (lemmas 1 and 3).
(2) To estimate D_Φ when Φ(z) is a polynomial matrix, we can take advantage of the fact that the SF of Φ(z) in the ring of polynomials is also its SF in R_1(z), because the units of the polynomial ring are also invertible in R_1(z) and the divisibility relationship between polynomials is preserved in R_1(z).
(3) When we compute the SF, we can factorize its elements and remove all the factors without unit roots (which are invertible in R), since only unit roots are relevant. In this way, we find the precise form of the SF that appears in proposition 2.
(4) When we estimate the SF via the approximate algorithm of the previous subsection, we get a diagonal matrix D̂_Φ(z) whose entries' roots are neither exactly unitary nor exactly the same as those of the determinant of the estimated VAR matrix Φ̂(z). Since we are assuming that all unit roots belong to {ω_k}_k, we can orthogonally project the near-unit roots of det Φ̂(z) onto {ω_k}_k and put them in the right row and column according to D̂_Φ(z). More specifically, for each root ω̂ of det Φ̂(z), we get its projection ω ∈ {ω_k}_k and assign it to the diagonal entry of D̂_Φ(z) containing the root closest to it. Once all the roots are assigned, we get a new diagonal matrix D̃_Φ(z), similar to D̂_Φ(z) but with all roots exactly unitary.
(5) The algorithms used to calculate the SF guarantee at each step that the elements of the main diagonal of D̂_Φ(z) divide each other. This entails many more matrix operations and thus a greater estimation error. Hence, we found that the algorithm is more precise if we proceed in the following way: (i) apply the approximate algorithm without forcing the divisibility; (ii) apply the procedure described just above to make the roots exactly unitary; and (iii) apply the full algorithm to get the SF with the diagonal elements dividing each other. Step (iii) is equivalent to performing operations with integers, so it is exact, whereas step (i) is simpler this way and thus more accurate.
Considering all this, the algorithm of our package GVEC consists of the following steps: (a) Fit a VAR(p), Φ̂(z), to the data y_1, . . . , y_T. The order p is provided by the user or determined by BIC, AIC or HQ.
(c) Separate the unit roots of the elements in the main diagonal of D̂^{(Φ,0)}. For this, we use the criterion |u| < 1 + ε_T. We obtain, for each j = 1, . . . , n, the approximate unit roots u_{jℓ}. The parameter ε_T can be chosen by the user of the package; by default, we set ε_T = log log T · T^{−1/2}. By the assumptions of theorem 1, ε_T should decrease more slowly than T^{−1/2}, so log log T · T^{−1/2} is very near the boundary.
Since the asymptotic efficiency of AIC is proved for the nonstationary case in Ing et al. (2007), one may think that AIC would be the right choice, but there are reasons to support the use of a more parsimonious criterion such as BIC. One is that, in fact, it is not necessary that the estimated VAR be consistent, as long as it captures the unstable part of the model consistently; this argument is developed in proposition 7, and it is strengthened by the fact that the unstable part of the model is superconsistent. Hence, in the example of section 5 we use BIC.
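As a toy illustration of the order-selection step (our own univariate sketch; the package applies the same criteria to the VAR): fit AR(p) by least squares for p = 1, . . . , p_max and keep the order minimizing BIC.

```python
import numpy as np

rng = np.random.default_rng(2)

def bic_order(y, pmax=6):
    """Select the AR order minimizing BIC = T*log(sigma2_hat) + p*log(T)."""
    best = None
    for p in range(1, pmax + 1):
        # lagged regressor matrix: column j holds y_{t-j-1}
        X = np.column_stack([y[p - j - 1:len(y) - j - 1] for j in range(p)])
        Y = y[p:]
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        Teff = len(Y)
        bic = Teff * np.log(resid @ resid / Teff) + p * np.log(Teff)
        if best is None or bic < best[0]:
            best = (bic, p)
    return best[1]

# AR(2) data: y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + e_t
y = np.zeros(2000)
e = rng.standard_normal(2000)
for t in range(2, 2000):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + e[t]
print(bic_order(y))
```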
Even when the model Φ(B)y_t = ε_t is an infinite VAR, it can be approximated by a VAR(p). We would like to prove that it is possible to consistently estimate the SF of the infinite model using finite approximations. Unfortunately, while for stable infinite VARs it is known that, under some assumptions, the finite VAR approximation is consistent, to our knowledge there is no such result for the unstable case.
However, we can show that for the rational case, i.e., when Φ = M^{−1}A, the approximation by a finite VAR works asymptotically well for our purposes even if the order of the VAR does not diverge to infinity, provided that the least squares estimator of Φ(B)y_t = ε_t has order greater than that of A. On the other hand, Φ may be approximated by a VARMA model. We lack a precise analysis of the properties of the estimators of unstable VARMA models, but the properties of the univariate case (Ling and Li, 1998) probably carry over. This is a possible line of investigation for the future.

Monte Carlo
Since the automatic identification method described in subsection 3.2 is not an absolutely straightforward application of the algorithm of subsection 3.1, but includes some small heuristic modifications, it is convenient to have an empirical confirmation that the method actually works. We also want to compare its performance to more traditional tools. Thus, our exercise has two parts. First, we compare our method to one based on the Johansen cointegration test in cases in which the latter can be applied. Second, we turn to a variety of cases and show the performance of our method alone, since there is nothing to compare against in such a general framework.
For this purpose, we will consider n × 1 (with n = 2, 3) processes y_t of the form (10), where Q is a random matrix with [0, 1]-uniform entries, D_Ψ(B) is a certain Smith matrix whose choice is described below and ε_t is n × 1 Gaussian white noise. For each model (10), we simulate M = 500 time series of lengths T = 25, 50, . . . , 500. For the simulations, we have translated the identification programs into MATLAB code, which runs faster than our R package.

Comparison with Johansen test in the I(1) case
Our first exercise with simulated series is to compare our method to the Johansen test. The way we use the test is as follows: (i) for each r = 0, . . . , n − 1, we perform the Johansen test of the null that the cointegration rank is less than or equal to r, with significance level α = 0.05; (ii) we estimate the cointegration rank as r* + 1, where r* is the greatest value of r for which the test rejects.
This procedure is only valid for I(1) processes. Therefore, we limit the range of the polynomial d(z) and of the matrices D_Ψ in (10) to d(z) = 1 − z and to entries with unit roots at unity only. Within this boundary, determining the SF is equivalent to determining the cointegration rank, so we can actually compare the probability of correctly identifying the cointegration rank with the Johansen test and with our procedure.
We estimate both probabilities by counting, for each series length T, how many times each method gets the rank right. In figure 1, we represent the length-dependent curves for dimension 2.
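The counting exercise can be sketched as follows (entirely our own toy stand-in: the "identification rule" below is a naive lag-one autocorrelation threshold, not the GVEC procedure, and serves only to show the scaffolding of the experiment):

```python
import numpy as np

rng = np.random.default_rng(1)

def has_unit_root(y):
    """Naive rule: flag a unit root when the lag-1 autocorrelation > 0.9."""
    d = y - y.mean()
    return (d[:-1] @ d[1:]) / (d[:-1] @ d[:-1]) > 0.9

def correct_rate(M=500, T=200):
    """Fraction of correct calls over M random walks and M white noises."""
    hits = 0
    for _ in range(M):
        hits += has_unit_root(np.cumsum(rng.standard_normal(T)))  # true I(1)
        hits += not has_unit_root(rng.standard_normal(T))         # true I(0)
    return hits / (2 * M)

print(correct_rate())
```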
Both in dimension 2 and in dimension 3, our method seems to do very well in the nontrivial cases 0 < r < n, clearly outperforming the Johansen test, particularly for n = 3 (the cases considered are r = n, where the series are actually stationary; r = 2, cointegrated with rank 2; r = 1, cointegrated with rank 1; and r = 0, no cointegration). In the case r = n there are no very large differences, since all methods detect quite easily that the series are not actually integrated. The only case in which the Johansen test works better is r = 0, for series of short to moderate lengths, although for d = 2 this only happens for α = 0.01. Also, even in the cases where the probability of correct identification with our method grows at a slower pace, it always starts at fairly decent levels for very short series, unlike the Johansen test, which yields very small probabilities in some cases.
We would like to point out that, at least in this limited framework, our method performs at least as well as the Johansen test, even though ours is not restricted to the I(1) case, unlike the Johansen test, which only has to choose among the n + 1 possible values of the cointegration rank. In the next subsection, we show what happens when the true model is not among those that can be identified using the Johansen test.

Results in the I(2) case
Now, we consider I(2) cases. Additionally, we also try a case with negative unit roots, which can be interpreted as biannual seasonality. In figure 1, we see that the convergence is somewhat slower in some cases, in particular when r_0 = r_1 = 1, but we get fairly good probabilities for long series, above 150 observations. For n = 3 we have some cases that are really tough; since the convergence is slower, we represent the curves with the length parameter going up to 500 observations. In particular, the convergence in the case r_0 = 2, r_1 = 0 seems to be very slow. It is not surprising that for n = 3 it is more difficult to identify the case correctly, since there are many more possible cases to distinguish: in fact, considering only roots equal to unity, the number of combinations for n-dimensional I(d) series grows as (d + 1)^{n−1}. However, we see that the seasonal case works pretty well even for relatively short series.

Real data example
To illustrate the use of the method and the package GVEC, we will identify a model for a data set similar to the one used in Hylleberg et al. (1990). In particular, y_{1,t} is the logarithm of the net disposable income of the UK and y_{2,t} is the logarithm of the expenditure in final consumption of the UK. The AIC-selected order for the VAR is 5. We apply the automatic method with ε_T = T^{−1/3} to the estimated VAR Φ̂(z) and obtain the estimated Smith matrix; the resulting GVEC shows that there is cointegration with respect to frequency zero, and the cointegrating vector is quite close to (1, −1).

A Mathematical Appendix
Proof of proposition 2. First, the ranks r_{k,j} are uniquely determined because r_{k,j} = dim span{a(ω_k) ∈ ℂ^n : a(B)′y_t ∼ I_k(d_k − j)}.
By the uniqueness of the ranks, it suffices to prove that for every process y_t, Ψ can be represented as in (a′) and that then there are exactly n − ∑_{ℓ≥j} s_{k,ℓ} cointegration vectors whose values at ω_k are linearly independent.
The first step is a consequence of the theorem from Hungerford (1980) that is reproduced here as proposition 5, applied to the ring R = {f/g : g(ω_k) ≠ 0 ∀k}.
We have then a representation Ψ(z) = U(z)D_Ψ(z)V(z), where the elements of the diagonal of D_Ψ(z) divide each other: d_{i,i}(z) | d_{i+1,i+1}(z). This allows us to represent y_t accordingly. It is clear that we can factorize D_Ψ(z) as D_Ψ(z) = D_0(z) · . . . · D_g(z). If we focus on a certain k, all the other unit roots can be moved to the left or the right, so we can write Ψ accordingly. If we choose G(z) as the last r_{k,j} rows of U_k(z)^{−1}, then, since the last r_{k,j} elements of the diagonal of D_Ψ(z) are divisible by (1 − ω_k^{−1}z)^j, we get that G(B)y_t ∼ I_k(d_k − j). On the other hand, G(ω_k) is of full rank, for otherwise U_k(z) would have a root at ω_k, which contradicts the conditions of the SF.
Thus, there are at least r′_{k,j} := n − ∑_{ℓ≥j} s_{k,ℓ} cointegrating vectors a(z) ∈ A such that the {a(ω_k)}_{a∈A} are linearly independent. Hence r_{k,j} ≥ r′_{k,j}. To see that this is actually an identity, let us assume that r_{k,j} > r′_{k,j}, arrange the cointegration relationships in an r_{k,j} × n matrix A_1(z) and consider A_1(B)y_t.

Lemma 1. R is a Euclidean ring.

Proof. Every Euclidean ring is a PID (see Hungerford, 1980, theorem 3.9). The fact that ℂ[z] is a Euclidean ring is elementary; we will prove the claim for R. We can write any f/g ∈ R as f/g = hk/g, where h has roots only among {ω_ℓ}_{ℓ=0}^{s−1} and k has none there. Then, if we denote by ∂p the degree of a polynomial p, we can define ϕ(f/g) = ∂h. To divide f_1 = h_1k_1/g_1 by f_2 = h_2k_2/g_2, we first divide h_1 by h_2, so h_1 = qh_2 + r. Then f_1 = (qh_2 + r)k_1/g_1 = qh_2k_1/g_1 + rk_1/g_1 = q̃f_2 + rk_1/g_1, where q̃ = (g_2k_1g_1^{−1}k_2^{−1})q and ϕ(rk_1/g_1) = ∂r < ∂h_2 = ϕ(f_2).
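The division rule in this proof can be checked symbolically (our own verification script; the particular f_1, f_2 are arbitrary choices with a single unit root at 1):

```python
import sympy as sp

z = sp.symbols('z')

def divide_in_R(h1, k1, g1, h2, k2, g2):
    """Euclidean division in R = {f/g : g without unit roots}, following
    lemma 1: divide the unit-root parts h1, h2 (h1 = q*h2 + r), then
    q_tilde = (g2*k1/(g1*k2)) * q and the remainder is r*k1/g1."""
    q, r = sp.div(h1, h2, z)
    return sp.cancel(g2 * k1 / (g1 * k2) * q), sp.cancel(r * k1 / g1)

# f1 = (1-z)^2 (2+z)/(3+z) divided by f2 = (1-z^2)(5+z)/(7+z):
qt, rem = divide_in_R((1 - z)**2, 2 + z, 3 + z, 1 - z**2, 5 + z, 7 + z)
f1 = sp.cancel((1 - z)**2 * (2 + z) / (3 + z))
f2 = sp.cancel((1 - z**2) * (5 + z) / (7 + z))
# the division identity f1 = q_tilde * f2 + remainder holds exactly,
# with a nonzero remainder whose unit-root part has degree < deg(h2)
assert sp.simplify(f1 - (qt * f2 + rem)) == 0
assert rem != 0
```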
Lemma 2. Let f(z) be a holomorphic function in Ω ⊂ ℂ such that the conjugate of f(z̄) equals f(z), and let p(z) be a polynomial with real coefficients and nonzero roots. Let us assume that the roots of p with nonnegative imaginary part are {θ_k}_k, with multiplicities m_k. Then f(z) = h(z)p(z) + r(z), where h(z) is a holomorphic function in Ω and r(z) is a polynomial determined by constants c_{k,ℓ}, c_{k,ℓ,0}, c_{k,ℓ,1} ∈ ℝ.
Proof. Let q_{k,ℓ}(z) = p(z)/(z − θ_k)^ℓ. It is easy to see that there are c_{k,ℓ} ∈ ℂ such that, at each θ_k, ∑_k ∑_{ℓ=1}^{m_k} c_{k,ℓ} q_{k,ℓ}(z) and its derivatives up to order m_k − 1 coincide with f(z). It suffices to write down the identities and see that they form a triangular linear system.

Then, we can integrate the last identity along a closed path encircling only θ_k.

Proof of proposition 3. From the identity Φ(z) = U_Φ(z)D_Φ(z)V_Φ(z), we get Φ(z) = ∑_i d_i(z)u_i(z)v_i(z)′, where the vectors u_i(z) and v_i(z)′ are respectively the columns of U_Φ(z) and the rows of V_Φ(z), and d_i(z) is the ith element in the main diagonal of D(z). We can group the terms with a common d_i. On the other hand, let us write δ_{j+1} = c_jδ_j; we can divide c_1 by z.
If we replace δ_1 in (16), we can repeat this device to obtain the required form. The last step consists of rewriting the terms of the last sum in (17). We divide the elements of B_1^{(1)}(z) by c_1(z) using lemma 2, so we get B_1^{(1)}(z) = c_1(z)Q_1(z) + R_1(z), where R_1(z) is a polynomial matrix whose elements have the form given by lemma 2. Then, if c_1 has roots θ_k with multiplicities m_k, we can substitute this expression, with B_1^{(2)} = R_1, and proceed by dividing each B in turn.

Proof of proposition 4. The VAR representation of the process, Φ(B)y_t = ε_t, can be written as U(B)D(B)V(B)y_t = ε_t, where for ease of notation we omit the superscript Φ. We can write each element of the main diagonal of D(z) as the product of a factor carrying all the unit roots of d_i(z) and a stable factor. Then, we can move the non-unit roots to the right and get the representation U(B)D̃(B)Ṽ(B)y_t = ε_t. We will first prove the case when all unit roots are equal to 1. Then there exist r, s such that δ_1 = (1 − z)^s and δ_h = (1 − z)^r. We will write ∆_i(z) = (1 − z)^i.
Let us define y_t^{(i)} = ∆_i(B)y_t. We will write down the form of the least squares estimator. By stacking the regressand into Y and the regressors into X, we can write the model as Y = βX + ε, and the least squares estimator is β̂ = YX′(XX′)^{−1}, so that β̂ = β + εX′(XX′)^{−1}. We need to analyze the asymptotic behavior of εX′ and XX′.

By means of the Beveridge-Nelson decomposition we can represent Y_t in a similar way as in section 2 of TT. In particular, we write Y_t = Z_t + M_t, where Z_t = DZ_{t−1} + B_t, B_t = (0, . . . , 0, C(1)ε_t)′ and M_{ℓ,t} = O_p(t^{r−ℓ−1}). We can also write the components of Z_t as Z_{r−1,t} = C(1)∑_{j=1}^t ε_j and Z_{ℓ,t} = Z_{ℓ,t−1} + Z_{ℓ+1,t}. Let us simplify the notation by setting ξ_t = C(1)ε_t, Z_{r−1,t} = ∑_{τ=1}^t ξ_τ and Z_{ℓ,t} = ∑_{τ=1}^t Z_{ℓ+1,τ}. Now, for u ∈ [0, 1], we define W_{ℓ,t}(u) = t^{ℓ−r+1/2}Z_{ℓ,[tu]} for ℓ < r. Since M_{ℓ,t} = O_p(t^{r−ℓ−1}), we get the corresponding limits for i, j < r. Now, let L*_T = diag(T^{−1/2}I_{pn}, L_T). Then (β̂ − β)L*_T = U*A*^{−1}, where the stationary block involves the covariance matrix of the lagged stationary regressors.

We will now sketch the proof for the general case. In order to make the notation less cumbersome, we denote the multi-index (j, k, ℓ) as α and consider its values ordered lexicographically. It is easy to prove that B_2 = {∆^{(h)}} ∪ {L_{ν,τ}}_{ν,τ} is also a basis, by the Residue Theorem as in lemma 2. Then there exists an invertible matrix Q such that Q · (∆^{(α)})_α = (L(0)′, . . . , L(s − 1)′)′, where L(τ) = (L_{1,τ}(z), . . . , L_{m_τ,τ}(z))′ contains the elements of B_2 associated to the τth root. Let us denote z_{ν,τ,t} = L_{ν,τ}(B)y_t = (1 − θ_τ^{−1}B)^{−ν}C(B)ε_t. When 0 < τ < b, z_{ν,τ,t} and z_{ν,s−τ,t} are conjugate. We can also make a transformation similar to that in section 3 of TT to transform the conjugate pairs into pairs of real and imaginary parts w_{ν,τ,t} = (u_{ν,τ,t}, v_{ν,τ,t})′. Then there is an invertible matrix P such that VB^{−1} = WC^{−1}P, W = VP and C = PBP′.
Consequently, we can now deal with the $W$'s and $C$'s in a similar fashion to the case of roots at unity, along the lines of section 4 of TT.
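The algebraic identity $\hat{\beta} = \beta + \varepsilon X'(XX')^{-1}$ manipulated in the proof above can be checked numerically. A minimal sketch with simulated data (the VAR coefficients and sample size are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stationary bivariate VAR(1): y_t = A y_{t-1} + e_t (A illustrative)
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])
T = 500
y = np.zeros((2, T))
for t in range(1, T):
    y[:, t] = A @ y[:, t - 1] + rng.standard_normal(2)

# Stack as in the proof: Y = beta X + eps, with X the lagged values
Y = y[:, 1:]
X = y[:, :-1]
beta_hat = Y @ X.T @ np.linalg.inv(X @ X.T)

# The decomposition beta_hat = beta + eps X'(XX')^{-1} holds exactly
eps = Y - A @ X
assert np.allclose(beta_hat, A + eps @ X.T @ np.linalg.inv(X @ X.T))
```

The consistency argument in the proof amounts to showing that the correction term $\varepsilon X'(XX')^{-1}$, suitably normalized, vanishes asymptotically.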

In order to see that the algorithm always stops and that, when it stops, we get the Smith form, we can easily adapt the proof in Hungerford (1980), page 340, replacing the arguments based on the finiteness of the divisors of an element in the ring by the fact that the degree function takes values in $\mathbb{N}$ and thus can decrease only a finite number of times.
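As an illustration of this degree-based termination argument, the elementary-operations loop can be sketched for $2\times 2$ polynomial matrices over $\mathbb{Q}[z]$. This is a simplified sketch, not the paper's $\epsilon$-approximate algorithm; SymPy's `div` performs the Euclidean division:

```python
import sympy as sp

z = sp.symbols('z')

def smith_2x2(M):
    """Reduce a 2x2 polynomial matrix to its Smith form by elementary
    operations.  Each pass either zeroes an entry or strictly decreases
    the degree of the pivot; degrees live in N, so the loop terminates."""
    M = sp.Matrix(M).applyfunc(sp.expand)
    while True:
        entries = [(int(sp.degree(M[i, j], z)), i, j)
                   for i in range(2) for j in range(2) if M[i, j] != 0]
        if not entries:
            break
        _, i, j = min(entries)          # nonzero entry of minimal degree
        M.row_swap(0, i)                # move it to the pivot position
        M.col_swap(0, j)
        p = M[0, 0]
        if M[1, 0] != 0:                # clear the first column
            q, _ = sp.div(M[1, 0], p, z)
            M[1, :] = (M[1, :] - q * M[0, :]).applyfunc(sp.expand)
            continue
        if M[0, 1] != 0:                # clear the first row
            q, _ = sp.div(M[0, 1], p, z)
            M[:, 1] = (M[:, 1] - q * M[:, 0]).applyfunc(sp.expand)
            continue
        _, r = sp.div(M[1, 1], p, z)
        if r != 0:                      # force d1 | d2
            M[0, :] = (M[0, :] + M[1, :]).applyfunc(sp.expand)
            continue
        break
    for k in range(2):                  # normalize the diagonal to monic
        if M[k, k] != 0:
            M[k, k] = sp.expand(M[k, k] / sp.LC(M[k, k], z))
    return M

# Illustrative matrix with determinant 1 - z: Smith form is diag(1, z - 1)
S = smith_2x2(sp.Matrix([[1, z], [z, z**2 - z + 1]]))
assert S == sp.diag(1, z - 1)
```

The hypothetical `smith_2x2` name and the example matrix are ours; the paper's algorithm additionally tracks the elementary operations and handles near-zero remainders via the $\epsilon$-threshold discussed above.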
Proof of Theorem 1. To simplify the proof, we assume that the algorithm does not involve the condition $r = 0$, but only conditions of the form $a_{ij} = 0$. The proof can then be easily adapted by considering an augmented matrix $\tilde{A} = [A : R]$, where $R = (r_{ij})_{ij}$ and $r_{ij}$ is the remainder of the Euclidean division of $a_{ii}$ by $a_{ij}$.
We will denote by $(e_k, A^{(k)})$ the pair obtained by iterating (5)-(6) starting from $A$, whereas $(\hat{e}_k, \hat{A}^{(k)})$ is obtained by iterating the $\epsilon$-approximate version of (5)-(6) starting from $\hat{A}$. We will see that (18) and (19) hold for all $k$, where $\forall \delta > 0$ there exist $M > 0$ and $T_0$ such that $\forall T \geq T_0$, $P[\xi_T^{-1}w^k_T > M \mid \hat{e}_\ell = e_\ell, \forall \ell \leq k] < \delta$. For any random variable that satisfies this property of $w^k_T$, we write $O_{cp}(\xi_T)$, that is, conditional order $\xi_T$ in probability. It is easy to see that this property behaves in a similar fashion to the usual order in probability; in particular, the product of an $O_{cp}(\xi_T)$ sequence and an $O_{cp}(\eta_T)$ sequence is $O_{cp}(\xi_T\eta_T)$.
We will prove (18) and (19) by induction on $k$. For $k = 0$ the claim is trivial. Now, assume that it holds for $k-1$; we will prove it for $k$.
The cases of the admissible operations (c) and (e) are similar, while (a) and (b) are isometries and thus satisfy the condition trivially.

Proof. If $\delta < 1$, then $|q - q'| < \delta$ entails $\varphi(q') = \varphi(q)$. Thus, the degrees of the polynomials, and consequently the number of operations involved in the division, are bounded. Since the algorithm only requires addition, multiplication and division by the leading coefficient of the divisor, continuity is granted.
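The continuity argument can be seen in a small numerical check: if the perturbation is small enough that the degree pattern (the map $\varphi$) is unchanged, the same finite sequence of division steps runs, and quotient and remainder move continuously with the coefficients. A sketch with SymPy (the polynomials are illustrative):

```python
import sympy as sp

z = sp.symbols('z')

# Euclidean division b = c*q + r with deg(r) < deg(c)
b = z**3 + 2*z + 1
c = z - 1
q, r = sp.div(b, c, z)             # q = z**2 + z + 3, r = 4

# Perturb one coefficient of b slightly; degrees, and hence the sequence
# of division steps, are unchanged
b_eps = z**3 + sp.Rational(2001, 1000)*z + 1
q_eps, r_eps = sp.div(b_eps, c, z)

# The quotient and remainder are perturbed by an equally small amount
assert sp.expand(q_eps - q) == sp.Rational(1, 1000)
assert r_eps - r == sp.Rational(1, 1000)
```

If instead the perturbation changed a leading coefficient from zero to nonzero, $\varphi$ would change and the division would follow a different sequence of steps, which is exactly the discontinuity the condition $\delta < 1$ rules out.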
The mapping $u$ in this case boils down to calculating the inverse of the leading coefficient of the argument, and by definition the leading coefficient is always nonzero.
Before proving proposition 7, we need some preliminary results. In TT, it is proved that for purely nonstationary processes, that is, processes that satisfy $A(B)y_t = M(B)\varepsilon_t$ with $\det A(z)$ having only unit roots, the autoregressive least squares estimates are consistent. For processes that are nonstationary but not purely nonstationary, that is, when $\det A(z)$ has roots both on and outside the unit circle, the purely nonstationary part of the estimates (in a sense that is specified in the proof of proposition 7) is consistent.

Lemma 4. Let $\Phi_s(z)$ be stable (i.e., without unit roots). Then, for any $\Phi_n(z)$, possibly with unit roots, $S((\Phi_n^{-1} + \Phi_s^{-1})^{-1}, 0; R_1(z)) = S(\Phi_n, 0; R_1(z))$.
Proof of proposition 7. We will use the representation $Y_t = FY_{t-1} + a_t$, where $Y_t = (y_t', \ldots, y_{t-p+1}')'$, $a_t = L\Theta(B)\varepsilon_t$, $F$ is the companion matrix of $\Phi$ and $L$ comprises the first $n$ columns of $I_{np}$. Let $J = PFP^{-1}$ be the Jordan form of $F$.
We can decompose it into stable and non-stable parts as $J = \mathrm{diag}(J_s, J_n)$.
We can recover the infinite MA representation of $y_t$ as $y_t = L'P^{-1}(I - JB)^{-1}Pa_t$, which splits according to $J = \mathrm{diag}(J_s, J_n)$ into two terms. Since the first part is stable and the second unstable, we conclude by applying lemma 4.
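The companion-form construction and the stable/non-stable split of the spectrum used in this proof can be sketched numerically; the VAR(2) coefficients below are illustrative (the first variable carries a unit root), not taken from the paper:

```python
import numpy as np

def companion(Phi):
    """Companion matrix F of a VAR(p) with coefficients Phi = [Phi_1, ..., Phi_p],
    so that the stacked vector Y_t = (y_t', ..., y_{t-p+1}')' satisfies
    Y_t = F Y_{t-1} + a_t."""
    n, p = Phi[0].shape[0], len(Phi)
    F = np.zeros((n * p, n * p))
    F[:n, :] = np.hstack(Phi)            # first block row: [Phi_1 ... Phi_p]
    F[n:, :-n] = np.eye(n * (p - 1))     # shift identities below
    return F

# Illustrative VAR(2): x_t = 1.5 x_{t-1} - 0.5 x_{t-2} has a unit root,
# while the second variable is stable
Phi1 = np.array([[1.5, 0.0], [0.0, 0.7]])
Phi2 = np.array([[-0.5, 0.0], [0.0, 0.0]])
F = companion([Phi1, Phi2])

# Split the spectrum of F into stable and non-stable parts, mirroring
# the decomposition J = diag(J_s, J_n)
eig = np.linalg.eigvals(F)
stable = eig[np.abs(eig) < 1 - 1e-6]
unstable = eig[np.abs(eig) >= 1 - 1e-6]
```

Here `unstable` contains the single unit eigenvalue and `stable` the eigenvalues $0$, $0.5$ and $0.7$; in the proof, the corresponding Jordan blocks $J_n$ and $J_s$ generate the unstable and stable parts of the MA representation, respectively.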