Quasi-maximum likelihood estimation for cointegrated continuous-time state space models observed at low frequencies

In this paper, we investigate quasi-maximum likelihood (QML) estimation for the parameters of a cointegrated solution of a continuous-time linear state space model observed at discrete time points. The class of cointegrated solutions of continuous-time linear state space models coincides with the class of cointegrated continuous-time ARMA (MCARMA) processes. First, pseudo-innovations are constructed in order to define a QML function. Moreover, the parameter vector is divided appropriately into long-run and short-run parameters, using a representation of cointegrated solutions of continuous-time linear state space models as the sum of a L\'evy process and a stationary solution of a linear state space model. We then establish the consistency of our estimator in three steps. First, we show the consistency of the QML estimator of the long-run parameters. In the next step, we calculate its consistency rate. Finally, we use these results to prove the consistency of the QML estimator of the short-run parameters. Having established consistency, we derive the limiting distributions of the estimators. The long-run parameter estimator is asymptotically mixed normally distributed, whereas the short-run parameter estimator is asymptotically normally distributed. The performance of the QML estimator is demonstrated in a simulation study.


Introduction
This paper deals with quasi-maximum likelihood (QML) estimation for the parameters of a cointegrated solution of a continuous-time linear state space model. The source of randomness in our model is a Lévy process, i.e., an R^m-valued stochastic process L = (L(t))_{t≥0} with L(0) = 0_m P-a.s., stationary and independent increments, and càdlàg sample paths. A typical example of a Lévy process is a Brownian motion. More details on Lévy processes can be found, e.g., in the monograph of Sato [48]. For deterministic matrices A ∈ R^{N×N}, B ∈ R^{N×m}, C ∈ R^{d×N} and an R^m-valued Lévy process L, an R^d-valued continuous-time linear state space model (A, B, C, L) is defined by the state and observation equations

dX(t) = AX(t)dt + B dL(t),    Y(t) = CX(t).    (1.1)

The state vector process X = (X(t))_{t≥0} is an R^N-valued process and the output process Y = (Y(t))_{t≥0} is an R^d-valued process. Since the driving noise in this model is a Lévy process, the model allows for flexible margins. In particular, the margins can be Gaussian if we use a Brownian motion as the Lévy process. The topic of this paper is cointegrated solutions Y of linear state space models. Cointegrated means that Y is non-stationary but has stationary increments, and that there exist linear combinations of Y which are stationary. The cointegration space is the space spanned by all vectors β such that β^T Y is stationary. Without any transformation of the state space model (1.1) it is impossible to see clearly whether there exists a cointegrated solution, not to mention the form of the cointegration space. In the case of a minimal state space model (see Bernstein [9] for a definition), the eigenvalues of A determine whether a solution Y may be stationary or cointegrated. If the eigenvalue 0 of A has the same geometric and algebraic multiplicity 0 < c < min(d, m), and all other eigenvalues of A have negative real parts, then there exists a cointegrated solution Y.
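To make the eigenvalue condition concrete, the following minimal sketch (all matrices hypothetical, not from the paper) builds a state matrix A with a simple zero eigenvalue and one stable eigenvalue, checks the condition, and simulates the model (1.1) with a Brownian motion as driving Lévy process via a simple Euler scheme.

```python
import numpy as np

# Hypothetical 2x2 instance of the state space model (1.1): A has a simple
# eigenvalue 0 (so c = 1) and one eigenvalue with negative real part, which
# is the condition above for a cointegrated solution Y to exist.
A = np.array([[0.0, 0.0],
              [0.0, -1.0]])
B = np.array([[1.0],
              [0.5]])
C = np.array([[1.0, 1.0],
              [1.0, -1.0]])

eigs = np.linalg.eigvals(A)
zero = np.isclose(eigs, 0.0)
cointegrated = zero.sum() == 1 and np.all(np.real(eigs[~zero]) < 0)

# Euler scheme for dX(t) = A X(t) dt + B dL(t) with L a Brownian motion.
rng = np.random.default_rng(0)
dt, n_steps = 0.01, 2000
X = np.zeros((n_steps + 1, 2))
for k in range(n_steps):
    dL = rng.normal(scale=np.sqrt(dt), size=1)
    X[k + 1] = X[k] + (A @ X[k]) * dt + B @ dL
Y = X @ C.T   # observation equation Y(t) = C X(t)
```

In this toy example the first state coordinate is a random walk and the second is a stationary Ornstein-Uhlenbeck process, so Y(1) − Y(2) = 2 X(2) is a stationary linear combination; the sketch only illustrates the structure, not the estimation procedure of the paper.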
In that case Y has the form (1.2), where B_1 ∈ R^{c×m} and C_1 ∈ R^{d×c} have rank c (see Fasen-Hartmann and Scholz [21, Theorem 3.2]). The starting vector Z is a c-dimensional random vector. The process Y_st = (Y_st(t))_{t≥0} is a stationary solution of the state space model

dX_st(t) = A_2 X_st(t)dt + B_2 dL(t),    Y_st(t) = C_2 X_st(t),    (1.3)

driven by the Lévy process L with A_2 ∈ R^{(N−c)×(N−c)}, B_2 ∈ R^{(N−c)×m} and C_2 ∈ R^{d×(N−c)}. The matrices A, A_1, A_2, B, B_1, B_2, C, C_1, C_2 and C_3 are related through an invertible transformation matrix T ∈ R^{N×N}; here B_i^T denotes the transpose of B_i (i = 1, 2) and 0_{(N−c)×c} ∈ R^{(N−c)×c} denotes the zero matrix. The process Y in (1.2) is obviously cointegrated, with cointegration space spanned by the orthogonal complement of the column space of C_1, if the covariance matrix Cov(L(1)) is non-singular. The probabilistic properties of Y are analyzed in detail in Fasen-Hartmann and Scholz [21] and lay the groundwork for the present paper. Remarkably, Y is a solution of the state space model (A′, B′, C′, L) as well.
The class of cointegrated solutions of linear state space models is huge. It equals the class of cointegrated multivariate continuous-time ARMA (MCARMA) processes (see Fasen-Hartmann and Scholz [21]). As the name suggests, MCARMA processes are the continuous-time versions of the popular and well-established discrete-time ARMA processes. In finance and economics, continuous-time models provide the basis for option pricing, asset allocation and term structure theory. The underlying observations of asset prices, exchange rates, and interest rates are often irregularly spaced, in particular in the context of high-frequency data. Consequently, one often works with continuous-time models, which allow one to infer the implied dynamics and properties of the estimated model at different frequencies (see Chen et al. [17]). Fitting discrete-time models to such data has the drawback that the model parameters are not time-invariant: if the sampling frequency changes, then the parameters of the discrete-time model change as well. The advantages of continuous-time modelling over discrete-time modelling in economics and finance are described in detail, i.a., in the distinguished papers of Bergstrom [7], Phillips [43] and Chambers, McCrorie and Thornton [15]; in signal processing, systems and control they are described in Sinha and Rao [54]. In particular, MCARMA models are applied in fields as diverse as signal processing, systems and control (see Garnier and Wang [22], Sinha and Rao [53]), high-frequency financial econometrics (see Todorov [58]) and financial mathematics (see Benth et al. [6], Andresen et al. [1]). Thornton and Chambers [16] use them as well for modelling sunspot data. The empirical relevance of non-stationary MCARMA processes in economics and finance is shown, i.a., in Thornton and Chambers [16,56,57].
Not much is known about statistical inference for cointegrated Lévy-driven MCARMA models. In the context of non-stationary MCARMA processes, most attention has been paid to Gaussian MCAR(p) (multivariate continuous-time AR) processes: maximum likelihood estimation of the structural parameters of a Gaussian MCAR(p) model goes back to Harvey and Stock [27,26,28] and was further explored in the well-known paper of Bergstrom [8]. Zadrozny [60] investigates continuous-time Brownian-motion-driven ARMAX models allowing stocks and flows at different frequencies and higher-order integration. These papers use the state space representation of the MCARMA process and Kalman filtering techniques to compute the Gaussian likelihood function. In a recent paper, Thornton and Chambers [57] extend the results to MCARMA processes with mixed stock-flow data using an exact discrete-time ARMA representation of the low-frequency observed MCARMA process. However, these papers have in common, on the one hand, that they do not analyze the asymptotic properties of the estimators; on the other hand, they are not able to estimate the cointegration space directly or to relate their results to cointegrated models.
Besides, statistical inference and identification for continuously and discretely observed cointegrated Gaussian MCAR(1) processes, which are homogeneous Gaussian diffusions, are considered in Kessler and Rahbek [32,33] and Stockmarr and Jacobsen [55], and frequency domain estimators for cointegrated Gaussian MCAR(p) models are the topic of Chambers and McCrorie [14]. There are only a few papers investigating non-Gaussian cointegrated MCARMA processes. For example, Fasen [18] treats a multiple regression model in continuous time, where the stationary part is a multivariate Ornstein-Uhlenbeck process and the process is observed on an equidistant time grid. The model in Fasen [19] is similar, but the stationary part is an MCARMA process and the process is observed on a high-frequency time grid.
The aim of this paper is to investigate QML estimators for C_1, B_1 and the parameters of the stationary process Y_st from the discrete-time observations Y(h), . . . , Y(nh), where h > 0 is fixed. The parameters of C_1 are the long-run parameters, whereas the other parameters are the short-run parameters. Although there exist results on QML estimation for discrete-time processes, they unfortunately cannot be applied directly to the sampled process, for the following reasons.
MCARMA processes sampled equidistantly belong to the class of ARMA processes (see Thornton and Chambers [57] and Chambers, McCrorie and Thornton [15]). But identification problems arise when the ARMA structure is employed for the estimation of MCARMA parameters; this is still an unsolved problem (see as well the overview article of Chambers, McCrorie and Thornton [15]). Moreover, in this representation the innovations are only uncorrelated and not iid (independent and identically distributed). However, statistical inference for cointegrated ARMA models has been carried out only for iid noise, or even only for Gaussian white noise (see, e.g., the monographs of Johansen [30], Lütkepohl [35] and Reinsel [44]), and hence cannot be used for the estimation of Lévy-driven MCARMA processes.
Another approach is to use the representation of the sampled continuous-time state space model as a discrete-time state space model (see Zadrozny [60]). That is what we do in this paper. Sampling Y with distance h > 0 results in Y^{(h)} := (Y(kh))_{k∈N_0} =: (Y_k^{(h)})_{k∈N_0}, a cointegrated solution of the discrete-time state space model (1.4), where

ξ_k^{(h)} := ∫_{(k−1)h}^{kh} e^{A(kh−t)} B dL(t),   k ∈ N_0,

is an iid sequence. For cointegrated solutions of discrete-time state space models of the form (1.5), where (ε_k)_{k∈N_0} is a white noise, asymptotic properties of the QML estimator were investigated in the unpublished work of Bauer and Wagner [4]. An essential difference between the state space models (1.4) and (1.5) is that in (1.4) the noise enters only the state equation, whereas in (1.5) it enters both the state and the observation equation. This difference matters because model (1.5), in contrast to our state space model, is already in innovation form, i.e., the white noise (ε_k)_{k∈N_0} can be represented by finitely many past values of (Y_k)_{k∈N_0}. Therefore, we are not able to apply the asymptotic results of Bauer and Wagner [4] to the setting of this paper.
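Under the sampling scheme just described, the transition matrix of the sampled state equation is e^{Ah} and the covariance of the noise ξ_k^{(h)} is the integral ∫_0^h e^{Au} B Σ_L B^T e^{A^Tu} du, with Σ_L = Cov(L(1)). A hedged numerical sketch (all matrices hypothetical), computing both exactly via Van Loan's block-matrix-exponential trick:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical stable 2x2 example. Phi = e^{Ah} is the transition matrix of
# the sampled model and S = Cov(xi_k^{(h)}) the covariance of the iid noise
# xi_k^{(h)} = int_{(k-1)h}^{kh} e^{A(kh-t)} B dL(t).
h = 0.5
A = np.array([[-1.0, 0.3], [0.0, -2.0]])
B = np.array([[1.0], [0.5]])
Sigma_L = np.array([[1.0]])          # covariance of L(1)
Q = B @ Sigma_L @ B.T

# Van Loan: exponentiate the block matrix [[-A, Q], [0, A^T]] * h; then the
# lower-right block is e^{A^T h} and Phi @ (upper-right block) is the integral.
M = np.block([[-A, Q], [np.zeros((2, 2)), A.T]]) * h
F = expm(M)
Phi = F[2:, 2:].T                    # e^{Ah}
S = Phi @ F[:2, 2:]                  # int_0^h e^{Au} Q e^{A^T u} du
```

This avoids numerical quadrature and is the standard way to discretize continuous-time process noise in Kalman filtering.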
We use the Kalman filter to calculate the linear innovations and to construct an error correction form (see Fasen-Hartmann and Scholz [21, Proposition 5.4 and Theorem 5.7]). However, the linear innovations and the error correction form use infinitely many past values, in contrast to the usual finite-order form for VARMA models and discrete-time state space models as, e.g., in Lütkepohl and Claessen [36], Saikkonen [45], Yap and Reinsel [59] and, respectively, Aoki [2] and Bauer and Wagner [4] (see (1.6)). The linear innovations are indeed stationary, but in general it is not possible to say anything about their mixing properties. Hence, standard limit results for stationary mixing processes cannot be applied. For more details in the case of stationary MCARMA models we refer to Schlemm and Stelzer [50].
The representation of the innovations motivates the definition of the pseudo-innovations and hence of the pseudo-Gaussian likelihood function. The term pseudo reflects, in the first case, that we do not use the true innovations and, in the second case, that we do not have a Gaussian model. This approach is standard for stationary models (see Schlemm and Stelzer [51]) but not so well investigated for non-stationary models. In our model, the pseudo-innovations are non-stationary as well, and hence classical methods for QML estimation in stationary models, e.g., establishing the convergence of the quasi-maximum likelihood function by a law of large numbers or an ergodic theorem, do not work.
Well-known achievements on ML estimation for integrated and cointegrated processes in discrete time are Saikkonen [46,47]. Under the constraints that the ML estimator is consistent and the long-run parameter estimator satisfies an appropriate order-of-consistency condition, these papers present stochastic equicontinuity criteria for the standardized score vector and the standardized Hessian matrix under which the asymptotic distribution of the ML estimator can be calculated. The main contributions of these papers are the derivation of stochastic equicontinuity and weak convergence results for various first and second order sample moments of integrated processes. The concepts are applied to an ML estimator in a simple regression model with integrated and stationary regressors.
In this paper, we follow the ideas of Saikkonen [47] to derive the asymptotic distribution of the QML estimator by verifying that these criteria are satisfied. However, our model does not satisfy the stochastic equicontinuity conditions of Saikkonen [46,47], so the weak convergence results of these papers cannot be applied directly; but we use a similar approach. In the derivation of the consistency of the QML estimator we even require local Lipschitz continuity for some parts of the likelihood function, which is stronger than local stochastic equicontinuity. For this reason, we focus in this paper on local Lipschitz continuity instead of stochastic equicontinuity.
Although Saikkonen [46,47] presents no general conditions for the analysis of the consistency and the order of consistency of an ML estimator in an integrated or cointegrated model, the verification of the consistency of the ML estimator in the regression example of Saikkonen [47] suggests how to proceed in more general models. This is done by a stepwise approach: in the first step, we prove the consistency of the long-run parameter estimator and in the second step its consistency rate; the long-run parameter estimator is super-consistent. In the third step, we are able to prove the consistency of the short-run parameter estimator. Important for the proofs is, as in Saikkonen [47], an appropriate division of the likelihood function such that one part of the likelihood function depends only on the short-run parameters and is based on stationary processes. This decomposition is not obvious and presumes as well a splitting of the pseudo-innovations into a non-stationary part and a stationary part depending only on the short-run parameters.
The paper is structured in the following way. An introduction to QML estimation for cointegrated continuous-time linear state space models is given in Section 2. First, we state in Section 2.1 the assumptions on our parametric family of cointegrated output processes Y. Then, we define the pseudo-innovations for the QML estimation via the Kalman filter in Section 2.2. Based on the pseudo-innovations, we calculate the pseudo-Gaussian log-likelihood function in Section 2.3. In Section 2.4 we introduce some identifiability conditions to obtain a unique minimum of the likelihood function. The main results of this paper are given in Section 3 and Section 4. First, we show the consistency of the QML estimator in Section 3. Next, we calculate the asymptotic distribution of the QML estimator in Section 4. The short-run QML estimator is asymptotically normally distributed and mimics the properties of QML estimators for stationary models. In contrast, the long-run QML estimator is asymptotically mixed normally distributed with a convergence rate of n instead of the √n occurring in stationary models. Finally, in Section 5 we show the performance of our estimator in a simulation study, and in Section 6 we give some conclusions. Appendix A collects some asymptotic results and local Lipschitz continuity conditions which we use throughout the paper; because of their technicality, and to keep the paper readable, they are moved to the appendix.

Notation
We use as norms the Euclidean norm ‖·‖ in R^d and the Frobenius norm ‖·‖ for matrices, which is submultiplicative. 0_{d×s} denotes the zero matrix in R^{d×s} and I_d is the identity matrix in R^{d×d}. For a matrix A ∈ R^{d×d} we denote by A^T its transpose, tr(A) its trace, det(A) its determinant, rank(A) its rank, λ_min(A) its smallest eigenvalue and σ_min(A) its smallest singular value. If A is symmetric and positive semi-definite, we write A^{1/2} for the principal square root, i.e., A^{1/2} is the symmetric, positive semi-definite matrix satisfying A^{1/2} A^{1/2} = A. For two matrices A ∈ R^{d×s} and B ∈ R^{r×n}, we denote by A ⊗ B the Kronecker product, which is an element of R^{dr×sn}, by vec(A) the operator which converts the matrix A into a column vector, and by vech(A) the operator which converts a symmetric matrix A into a column vector by vectorizing only the lower triangular part of A. We write ∂_i for the partial derivative operator with respect to the i-th coordinate and ∂_{i,j} for the second partial derivative operator with respect to the i-th and j-th coordinates. Further, for a matrix function f(ϑ) in R^{d×m} with ϑ ∈ R^s, the gradient with respect to the parameter vector ϑ is denoted by ∇_ϑ f(ϑ) ∈ R^{dm×s}. Let ξ = (ξ_k)_{k∈N} and η = (η_k)_{k∈N} be d-dimensional stochastic processes; then Γ_{ξ,η}(l) = Cov(ξ_1, η_{1+l}) and Γ_ξ(l) = Cov(ξ_1, ξ_{1+l}), l ∈ N_0, are the covariance functions. Finally, we denote weak convergence by →_w and convergence in probability by →_p. In general, C denotes a constant which may change from line to line.

Parametric model
Let Θ ⊂ R^s, s ∈ N, be a parameter space. We assume that we have a parametric family (Y_ϑ)_{ϑ∈Θ} of solutions of continuous-time cointegrated linear state space models of the form (2.1), where Z is a random starting vector, L_ϑ = (L_ϑ(t))_{t≥0} is a Lévy process and Y_{st,ϑ} = (Y_{st,ϑ}(t))_{t≥0} is a stationary solution of the state space model (2.2). In the parameterization of the Lévy process L_ϑ, only the covariance matrix Σ_{L_ϑ} of L_ϑ is parameterized.
The parameter vector of the underlying process Y is denoted by ϑ_0. Throughout the paper, we shortly write (A_{2,ϑ}, B_{1,ϑ}, B_{2,ϑ}, C_{1,ϑ}, C_{2,ϑ}, L_ϑ) for the cointegrated state space model with solution Y_ϑ as defined in (2.1). To be more precise, we have the following assumptions on our model. Assumption A. For any ϑ ∈ Θ the cointegrated state space model (A_{2,ϑ}, B_{1,ϑ}, B_{2,ϑ}, C_{1,ϑ}, C_{2,ϑ}, L_ϑ) satisfies the following conditions: (A1) The parameter space Θ is a compact subset of R^s.
(A2) The true parameter vector ϑ 0 lies in the interior of the parameter space Θ.
(A4) The eigenvalues of A 2,ϑ have strictly negative real parts.
(A6) The matrices B_{1,ϑ} ∈ R^{c×m} and C_{1,ϑ} ∈ R^{d×c} have full rank c ≤ min(d, m).
(A7) The c-dimensional starting random vector Z does not depend on ϑ, E‖Z‖^2 < ∞, and Z is independent of L_ϑ.
(i) (A1) and (A2) are standard assumptions for QML estimation. (iv) We require that c, respectively the cointegration rank r = d − c, is known in advance to be able to estimate the model adequately. In practice, it is necessary to first estimate the cointegration rank r and obtain from this c = d − r; one possibility to do this is via information criteria.
(v) Using the notation in (A9), it is possible to show that Y_ϑ is the solution of the state space model (A_ϑ, B_ϑ, C_ϑ, L_ϑ). Furthermore, on account of (A5) and (A6), the state space model (A_ϑ, B_ϑ, C_ϑ) is minimal with McMillan degree N (see Fasen-Hartmann and Scholz [21, Lemma 2.4]) and hence also unique up to a change of basis. This, in combination with (A10), is sufficient for the uniqueness of the parameterization.

Furthermore, we assume that the parameter space Θ is a product space of the form Θ = Θ_1 × Θ_2 with Θ_1 ⊂ R^{s_1} and Θ_2 ⊂ R^{s_2}, s = s_1 + s_2. The vector ϑ = (ϑ_1^T, ϑ_2^T)^T ∈ Θ is an s-dimensional parameter vector with ϑ_1 ∈ Θ_1 and ϑ_2 ∈ Θ_2. The idea is that ϑ_1 is the s_1-dimensional vector of long-run parameters modelling the cointegration space and hence responsible for the cointegration of Y_ϑ, whereas ϑ_2 is the s_2-dimensional vector of short-run parameters, which has no influence on the cointegration of the model. Since the matrix C_{1,ϑ} is responsible for the cointegration property (see Fasen-Hartmann and Scholz [21, Theorem 3.2]), we parameterize C_{1,ϑ} with the sub-vector ϑ_1 and use ϑ_2 for all other matrices. In summary, we parameterize the matrices with the following sub-vectors: (A_{2,ϑ_2}, B_{1,ϑ_2}, B_{2,ϑ_2}, C_{1,ϑ_1}, C_{2,ϑ_2}, L_{ϑ_2}) for (ϑ_1, ϑ_2) ∈ Θ_1 × Θ_2 = Θ.

Linear and pseudo-innovations
In this section, we define the pseudo-innovations, which are essential for defining the QML function. Sampling at distance h > 0 maps the class of continuous-time state space models to discrete-time state space models. That class of state space models is not in innovation form, and hence we use a result from Fasen-Hartmann and Scholz [21] to calculate the linear innovations by the Kalman filter. The Kalman filter constructs the linear innovations as ε*_ϑ(k) = Y_ϑ(kh) − P_{k−1} Y_ϑ(kh), where P_k is the orthogonal projection onto span{Y_ϑ(lh) : −∞ < l ≤ k} and the closure is taken in the Hilbert space of square-integrable random variables with inner product (Z_1, Z_2) → E(Z_1^T Z_2). Thus, ε*_ϑ(k) is orthogonal to the Hilbert space generated by span{Y_ϑ(lh) : −∞ < l < k}. In our setting, the linear innovations are as follows.
Let K_ϑ^{(h)} be the steady-state Kalman gain matrix. Then, the linear innovations (ε*_ϑ(k))_{k∈N} of (Y_ϑ^{(h)}(k))_{k∈N} := (Y_ϑ(kh))_{k∈N} are the unique stationary solution of the state space equation (2.3), where V_ϑ = C_ϑ Ω_ϑ^{(h)} C_ϑ^T is the prediction covariance matrix of the Kalman filter.
We obtain ε*_ϑ(k) recursively from (2.3). However, the question arises which choice of X*_ϑ(1) in the Kalman recursion results in the stationary sequence (ε*_ϑ(k))_{k∈N}. This we elaborate in the following. Since all eigenvalues of (e^{A_ϑ h} − K_ϑ^{(h)} C_ϑ) lie inside the unit circle, the linear filter l(z, ϑ) is well-defined for z ∈ C and, due to Fasen-Hartmann and Scholz [21, Lemma 5.6], has a representation in terms of the linear filter k(z, ϑ) ∈ R^{d×d} and a matrix α(ϑ) ∈ R^{d×(d−c)} with full rank d − c. This representation of l(z, ϑ) helps us to choose the initial condition X*_ϑ(1) in the Kalman recursion appropriately, so that the linear innovations (ε*_ϑ(k))_{k∈N} are really stationary. Therefore, it is important to know that the stationary process Y_{st,ϑ} can be defined on R as

Y_{st,ϑ}(t) = ∫_{−∞}^t f_{st,ϑ}(t − s) dL_ϑ(s),   t ∈ R,

with f_{st,ϑ}(u) = C_{2,ϑ} e^{A_{2,ϑ} u} B_{2,ϑ} 1_{[0,∞)}(u), where the Lévy process (L_ϑ(t))_{t∈R} is defined on the negative real line as L_ϑ(t) = −L̃_ϑ(−t−) for t < 0 with an independent copy (L̃_ϑ(t))_{t≥0} of (L_ϑ(t))_{t≥0}. Then we have an adequate definition of ∆Y. As notation, we use B for the backshift operator satisfying BY_k^{(h)} = Y_{k−1}^{(h)}.

Lemma 2.3. The matrix sequence (k_j(ϑ))_{j∈N} is uniformly exponentially bounded, i.e., there exist constants C > 0 and 0 < ρ < 1 such that sup_{ϑ∈Θ} ‖k_j(ϑ)‖ ≤ Cρ^j, j ∈ N.
Proof. It remains to show that (k_j(ϑ))_{j∈N} is uniformly exponentially bounded. The proof follows along the same lines as Schlemm and Stelzer [51, Lemma 2.6], using that all eigenvalues of (e^{A_ϑ h} − K_ϑ^{(h)} C_ϑ) lie inside the unit circle (see Scholz [52, Lemma 4.6.7]).
From this representation we see nicely that (ε*_ϑ(k))_{k∈N} is indeed a stationary process. We define Y_ϑ^{(h)} on the negative integers analogously. The main difference between the linear innovations and the pseudo-innovations is that the linear innovations are driven by Y_ϑ^{(h)}, whereas the pseudo-innovations are driven by Y^{(h)}. For ϑ = ϑ_0 the pseudo-innovations (ε_k^{(h)}(ϑ_0))_{k∈N} are the linear innovations (ε*_{ϑ_0}(k))_{k∈N}. In Appendix B we present some probabilistic properties of the pseudo-innovations which we use in the paper. In particular, we see that the pseudo-innovations are three times differentiable.
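The mechanics of the Kalman recursion described above can be sketched numerically: iterate the discrete-time Riccati equation to a steady state, form the gain, and run the innovation recursion on data. All matrices below are hypothetical stand-ins (Q plays the role of the sampled noise covariance); this illustrates the recursion, not the paper's estimator.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical sampled model: Phi = e^{Ah} is stable, Q stands in for the
# covariance of the sampled state noise, C observes the full state.
h = 0.5
A = np.array([[-1.0, 0.2], [0.0, -0.5]])
C = np.array([[1.0, 0.0], [0.0, 1.0]])
Q = np.array([[0.6, 0.2], [0.2, 0.5]])
Phi = expm(A * h)

# Fixed-point iteration of the Riccati equation for the state prediction
# covariance Omega; V = C Omega C^T is the prediction error covariance and
# K the steady-state Kalman gain.
Omega = Q.copy()
for _ in range(500):
    V = C @ Omega @ C.T
    K = Phi @ Omega @ C.T @ np.linalg.inv(V)
    Omega = Phi @ Omega @ Phi.T + Q - K @ V @ K.T
V = C @ Omega @ C.T
K = Phi @ Omega @ C.T @ np.linalg.inv(V)

def innovations(Y, Phi=Phi, K=K, C=C):
    """Innovation recursion: eps_k = Y_k - C Xhat_k, Xhat_{k+1} = Phi Xhat_k + K eps_k."""
    Xhat = np.zeros(C.shape[1])
    eps = np.empty_like(Y, dtype=float)
    for k, y in enumerate(Y):
        eps[k] = y - C @ Xhat
        Xhat = Phi @ Xhat + K @ eps[k]
    return eps
```

Feeding the observed sampled process into this recursion (with the model matrices evaluated at a candidate ϑ) is exactly how the pseudo-innovations are produced for the likelihood function of the next section.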

Quasi-maximum likelihood estimation
We estimate the model parameters via an adapted quasi-maximum likelihood method. Minus two over n times the logarithm of the pseudo-Gaussian likelihood function is given by L_n^{(h)}(ϑ). Since the pseudo-innovations are based on the infinite past, we have to approximate the pseudo-innovations and the likelihood function. For a starting value X_1^{(h)}(ϑ), which is usually a deterministic constant, we define recursively, based on (2.3), the approximate pseudo-innovations and the approximate likelihood function L̂_n^{(h)}(ϑ). Then, the QML estimator is defined as the minimizer of the pseudo-Gaussian log-likelihood function L̂_n^{(h)}(ϑ). The estimator ϑ̂_{n,1} estimates the long-run parameter ϑ_1 and the estimator ϑ̂_{n,2} estimates the short-run parameter ϑ_2. However, for our asymptotic results it does not matter whether we use L_n^{(h)}(ϑ) or its approximation, as a conclusion of the next proposition shows. For that proposition to hold, we require the following Assumption B, which assumes a uniform bound on the second moment of the starting value X_1^{(h)}(ϑ) of the Kalman recursion and of its partial derivatives.
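The criterion just described, minus two over n times the log pseudo-Gaussian likelihood, can be sketched as a function of the pseudo-innovations and the prediction error covariance (function and argument names hypothetical; in the paper both quantities depend on ϑ through the Kalman recursion):

```python
import numpy as np

def qml_objective(eps, V):
    """-2/n times the log of the pseudo-Gaussian likelihood:
    d*log(2*pi) + log det V + (1/n) * sum_k eps_k^T V^{-1} eps_k,
    for pseudo-innovations eps (n x d) and prediction covariance V (d x d)."""
    n, d = eps.shape
    Vinv = np.linalg.inv(V)
    quad = np.einsum('ki,ij,kj->', eps, Vinv, eps) / n
    return d * np.log(2.0 * np.pi) + np.log(np.linalg.det(V)) + quad
```

The QML estimator then minimizes this criterion over ϑ with a numerical optimizer, recomputing the pseudo-innovations and V at each candidate parameter vector.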
This assumption is not very restrictive: e.g., if X_1^{(h)}(ϑ) is a deterministic vector, which is the usual choice in practice, Assumption B is automatically satisfied.
Proposition 2.5. Let Assumptions A and B hold, and let γ < 1 and u, v ∈ {1, . . . , s}. The proof of this proposition is similar to the proofs of Schlemm and Stelzer [50, Lemma 2.7 and Lemma 2.15]. However, there are some essential differences, since in their paper the sequences (Y_k(ϑ))_{k∈N} are stationary, whereas in our setup they are non-stationary. Furthermore, we require different convergence rates. A detailed proof can be found in Appendix C.
We now split the pseudo-innovation sequence based on the decomposition ϑ = (ϑ_1^T, ϑ_2^T)^T so that one part is stationary and depends only on ϑ_2. Due to similar calculations as in (B.1), we receive that L_{n,2}^{(h)}(ϑ_2) depends only on the short-run parameters, whereas L_{n,1}^{(h)}(ϑ) depends on all parameters. In the remainder of the paper, we will see that the asymptotic properties of ϑ̂_{n,1} are determined by L_{n,1}^{(h)}(ϑ), whereas the asymptotic properties of ϑ̂_{n,2} are completely determined by L_{n,2}^{(h)}(ϑ_2). Since L_{n,2}^{(h)}(ϑ_2) is based only on stationary processes, it is not surprising that ϑ̂_{n,2} exhibits the same asymptotic properties as QML estimators for stationary processes.

Identifiability
In order to properly estimate our model, we need a unique minimum of the likelihood function, and therefore we need some identifiability criteria for the family of stochastic processes (Y_ϑ, ϑ ∈ Θ). The first assumption guarantees the uniqueness of the long-run parameter ϑ_1^0.
(ii) Due to the Lipschitz continuity of C_{1,ϑ_1}^{⊥T}. (iii) C_1^{⊥T} and B_1 have full rank, and thus the process (ε_{k,1}^{(h)}(ϑ))_{k∈N} is indeed non-stationary for all long-run parameters ϑ_1 ≠ ϑ_1^0. (iv) The matrix function α(ϑ) is continuous and has full column rank d − c, so that necessarily inf_{ϑ∈Θ} σ_min(α(ϑ)) > 0. Applying Bernstein [9, Corollary 9.6.7] then gives a corresponding lower bound for some constant C > 0.

The next assumption guarantees the uniqueness of the short-run parameter ϑ_2^0. Assumption D. For any ϑ_2 ∈ Θ_2 with ϑ_2 ≠ ϑ_2^0 there exists a z ∈ C such that either the linear filters or the prediction covariance matrices of the two models differ. Lemma 2.7. Let Assumptions A and D hold. Then the function L_2^{(h)}(ϑ_2) has a unique global minimum at ϑ_2^0.
Proof. The proof is analogous to the proof of Lemma 2.10 in Schlemm and Stelzer [51].
Without the additional Assumption D, we obtain only that L_2^{(h)}(ϑ_2) has a minimum at ϑ_2^0, but not that the minimum is unique.
Due to Fasen-Hartmann and Scholz [21, Theorem 3.1] a canonical form for cointegrated state space processes already exists and can be used to construct a model class satisfying Assumption C and Assumption D. Further details are presented in Fasen-Hartmann and Scholz [20]. Moreover, criteria to overcome the aliasing effect (see Blevins [10], Hansen and Sargent [25], McCrorie [40,41], Phillips [42,43], Schlemm and Stelzer [51]) are given there.

Consistency of the QML estimator
In order to show the consistency of the QML estimator, we follow the ideas of Saikkonen [47] for the simple regression model. Thus, we prove the consistency in three steps. In the first step, we prove the consistency of the long-run QML estimator ϑ̂_{n,1}, and next we determine its consistency rate. Thirdly, we prove the consistency of the short-run QML estimator ϑ̂_{n,2} by making use of the consistency rate of the long-run QML estimator. Throughout the rest of this paper, we assume that Assumptions A-D hold.

Proof of Theorem 3.1
The following lemmata are important for the proof of the theorem; here U_n = O_p(1), and V_n and Q_n are defined as in Proposition A.3.
In the following, we use Bernstein [9, (2.2.27) and Corollary 9.3.9] to obtain the upper bound; similarly, we find upper bounds for the other terms. Moreover, Lemma B.1(b) applies. Since V_ϑ^{-1} and log det V_ϑ are Lipschitz continuous by Lemma B.1(a), we obtain the corresponding bound. Moreover, Π(ϑ) is Lipschitz continuous as well (see Lemma B.1(a)), and the sequences of matrix functions (k_j(ϑ))_{j∈N} and (∇_ϑ k_j(ϑ))_{j∈N} are exponentially bounded (see Lemma 2.3 and Lemma B.2). Due to (A.4), the convergence holds in the space of continuous functions on Θ with the supremum norm.

Proof.
(a) is a consequence of Proposition A.1(a) and the continuous mapping theorem.
Second, a conclusion of Proposition A.1(b) and the continuous mapping theorem is the corresponding convergence. On the other hand, due to Lemma 2.7 and Lemma 3.4(a), the infimum is bounded from below. Using (2.6) and the above results, we receive the stated bound. Hence, it suffices to show (3.7) for any τ > 0. An application of Lemma 3.4(b) and the continuous mapping theorem yields the limit, where we used Bernstein [9, (2.2.27)] to permute the matrices in the trace (cf. Bernstein [9, Corollary 9.6.7]). The random limit matrix is P-a.s. positive definite since B_1 and the covariance matrix of W_3 have full rank. Hence, there exists an m × m-dimensional symmetric positive definite random matrix W*. Again, an application of Bernstein [9, Corollary 9.6.7] and (3.9) yields L_{n,1}^{(h)}(ϑ) →_p ∞, and thus (3.7) is proven.

Super-consistency of the long-run QML estimator
From the previous section we already know that the QML estimator ϑ̂_{n,1} for the long-run parameter is consistent. In the following, we calculate its consistency rate. For 0 ≤ γ < 1, define the set N_{n,γ}(ϑ_1^0, δ) as in (3.11), and let N̄_{n,γ}(ϑ_1^0, δ) := Θ_1 \ N_{n,γ}(ϑ_1^0, δ) be its complement. As in Saikkonen [47, eq. (26)], we receive the consistency rate from the next statement.

Proof of Theorem 3.5
The proof uses the next lemma.
Lemma 3.7. Let the notation of Lemma 3.3 hold. Proof.
(a) Several applications of Bernstein [9, Corollary 9.6.7] give, similarly to (3.10), an upper bound. An application of Assumption C (see Remark 2.6) then yields (a).
Proof of Theorem 3.5. Due to Proposition 2.5, the lower bound holds. We now investigate the second term. Note that L_{n,2}^{(h)}(ϑ_2) depends only on the short-run parameters. Therefore, we take the infeasible estimator ϑ̂_{n,2}^{st} := arg min_{ϑ_2∈Θ_2} L_{n,2}^{(h)}(ϑ_2) for the short-run parameter ϑ_2^0 minimizing L_{n,2}^{(h)}(ϑ_2). For this reason, we can interpret this as a "classical" stationary estimation problem. Applying a Taylor expansion of nL_{n,2}^{(h)} around an appropriate intermediate value ϑ̄_{n,2} ∈ Θ_2, and using that ∇_{ϑ_2} L_{n,2}^{(h)}(ϑ̄_{n,2}) and √n(ϑ̂_{n,2}^{st} − ϑ_2^0) are asymptotically normally distributed (these are special and easier calculations than those in Section 4.2), we can conclude the corresponding expansion of n · L^{(h)}. Thus, if we can show (3.12), then the claim follows for any τ > 0. Before we prove (3.12), we first note that, due to (3.7), for n large enough we only have to consider the set M_{n,γ}(ϑ_1^0, δ) in the infimum instead of the whole set N_{n,γ}(ϑ_1^0, δ_1). Note that inf_{ϑ∈Θ} σ_min(V_ϑ) > 0. Due to Proposition A.1(b) and Lemma 3.3, we receive the required convergence, and thus finally the stated bound for sup_{ϑ∈M_{n,γ}(ϑ_1^0,δ)×Θ_2} nL^{(h)}.

Proof of Theorem 3.8
Again we prove some auxiliary results before we state the proof of the theorem. Lemma 3.10 corresponds to Saikkonen [47, eq. (32)] and Lemma 3.11 to Saikkonen [47, eq. (33)] for the regression model.
Proof. Due to Lemma 3.3 and Lemma 3.7 we have the upper bound Then, using Lemma B.1(b) results in Since U n = O p (1) by Lemma 3.3 and tr 1 by Proposition A.1(b) and the continuous mapping theorem, the right hand side of (3.13) converges to 0 in probability if 1 2 < γ < 1. This proves the lemma.
Lemma 3.11. For any δ > 0 and τ > 0 we have On the one hand, the first two terms converge to zero in probability due to Lemma 3.4(a) and the continuous mapping theorem. On the other hand, L^{(h)}_2(ϑ_2) has a unique minimum at ϑ^0_2 by Lemma 2.7.

Asymptotic distributions of the QML estimator
The aim of this section is to derive the asymptotic distributions of the long-run parameter estimator ϑ_{n,1} and the short-run parameter estimator ϑ_{n,2}. These two estimators exhibit different asymptotic behavior and different convergence rates. We prove the asymptotic normality of the short-run QML estimator, whereas we show that the long-run QML estimator is asymptotically mixed normally distributed.

Asymptotic distribution of the long-run parameter estimator
In this section we derive the asymptotic distribution of the long-run QML estimator ϑ_{n,1}. From Corollary 3.6 we already know that ϑ_{n,1} − ϑ^0_1 = o_p(n^{−γ}) for 0 ≤ γ < 1. Since the true parameter ϑ^0 = ((ϑ^0_1)^T, (ϑ^0_2)^T)^T is an element of the interior of the compact parameter space Θ = Θ_1 × Θ_2 due to Assumption A, the estimator ϑ_{n,1} is eventually an element of the interior of Θ_1 with probability one. Because the parametrization is assumed to be three times continuously differentiable, we can find the minimizer ϑ_n = (ϑ^T_{n,1}, ϑ^T_{n,2})^T via the first-order condition ∇_{ϑ_1} L^{(h)}_n(ϑ_{n,1}, ϑ_{n,2}) = 0_{s_1}. We apply a Taylor expansion of the score vector around the point (ϑ^0_1, ϑ_{n,2}), where ∇²_{ϑ_1} L^{(h)}_n(ϑ̄_{n,1}, ϑ_{n,2}) denotes the matrix whose i-th row, i = 1, . . . , s_1, is equal to the i-th row of the Hessian evaluated at an appropriate intermediate value. If ∇²_{ϑ_1} L^{(h)}_n(ϑ̄_{n,1}, ϑ_{n,2}) is non-singular, we obtain an explicit expression for n(ϑ_{n,1} − ϑ^0_1). Thus, we have to consider the asymptotic behavior of the score vector ∇_{ϑ_1} L^{(h)}_n.
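In schematic form (a sketch consistent with the argument above, with ϑ̄_{n,1} denoting the row-wise intermediate values; the displayed equations themselves are not reproduced here), the first-order condition and the Taylor expansion read:

```latex
0_{s_1} = \nabla_{\vartheta_1} L^{(h)}_n(\widehat{\vartheta}_{n,1}, \widehat{\vartheta}_{n,2})
        = \nabla_{\vartheta_1} L^{(h)}_n(\vartheta^0_1, \widehat{\vartheta}_{n,2})
        + \nabla^2_{\vartheta_1} L^{(h)}_n(\overline{\vartheta}_{n,1}, \widehat{\vartheta}_{n,2})
          \,(\widehat{\vartheta}_{n,1} - \vartheta^0_1),
\qquad\text{hence}\qquad
n(\widehat{\vartheta}_{n,1} - \vartheta^0_1)
  = -\Big(\nabla^2_{\vartheta_1} L^{(h)}_n(\overline{\vartheta}_{n,1}, \widehat{\vartheta}_{n,2})\Big)^{-1}
     n\,\nabla_{\vartheta_1} L^{(h)}_n(\vartheta^0_1, \widehat{\vartheta}_{n,2}).
```

The second identity holds on the event that the Hessian-type matrix is non-singular, which is why its asymptotic non-singularity is established below.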

Asymptotic behavior of the score vector
First, we show the convergence of the gradient with respect to the long-run parameter ϑ_1. For this, we consider the partial derivatives of the log-likelihood function with respect to the i-th component of the parameter vector ϑ, i = 1, . . . , s_1. These partial derivatives are given by differentiation rules for matrix functions (see, e.g., Lütkepohl [ From Section B we already know that the pseudo-innovations are indeed three times differentiable. For brevity, we write ∂^1_i := ∂/∂ϑ_{1i} for the partial derivative with respect to the i-th component of the long-run parameter vector ϑ_1 ∈ Θ_1, i ∈ {1, . . . , s_1}, and similarly ∂^st_j := ∂/∂ϑ_{2j} for the partial derivative with respect to the j-th component of the short-run parameter vector ϑ_2 ∈ Θ_2, j ∈ {1, . . . , s_2}. Analogously we define ∂^1_{i,j} and ∂^st_{i,j}, respectively, for the second partial derivatives.
Note that the second term I_{n,2} converges to zero due to the ergodicity of the pseudo-innovations. Thus, it only remains to show the convergence of the last term I_{n,3}. We obtain with Proposition A.1(a,c) and the continuous mapping theorem

Asymptotic behavior of the Hessian matrix
Next, we consider the second partial derivatives of the log-likelihood function L^{(h)}_n. Since the Hessian matrix should be asymptotically positive definite, we need an additional assumption.
The asymptotic distribution of the Hessian matrix is given in the next proposition.

Proposition 4.2.
Let Assumption E additionally hold. Define the (s_1 × s_1)-dimensional random matrix Z_1(ϑ^0) entrywise for i, j = 1, . . . , s_1. Then, Z_1(ϑ^0) is almost surely positive definite and Proof. First, we prove the asymptotic behavior of the Hessian matrix and then, in the next step, that the limit is almost surely positive definite.
Step 1: The first term (1/n) I_{n,1} in (4.4) converges to zero due to the additional normalizing rate n^{−1}. Due to Proposition A.1(a,c) we have I_{n,j} = O_p(1) for j = 2, . . . , 7 (see (4.3) for I_{n,5} as an example) and hence, (1/n) ∑_{j=2}^{7} I_{n,j} converges in probability to zero. To summarize, Due to Lemma B.2 and Proposition A.1(a,c) we obtain

Then, Proposition A.1(b) and the continuous mapping theorem result in
In particular, we also have the joint convergence of the partial derivatives.
Step 2: Note that we can take W_1 = C_1 B_1 W_3 and define M := B_1 ∫_0^1 W_3(r)W_3(r)^T dr B_1^T, which is a P-a.s. positive definite c × c matrix. We apply the Cholesky decomposition M = M_* M_*^T. By using properties of the vec operator and the Kronecker product (see Bernstein [9, Chapter 7.1]) we have Now, if we consider the Hessian matrix Z_1(ϑ^0), we have Due to Assumption E the ((d − c)c × s_1)-dimensional matrix ∇_{ϑ_1} C^{⊥T}_{1,ϑ^0_1} C_1 is of full column rank and hence, the product has full rank s_1. Therefore, we have the positive definiteness almost surely.
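The two linear-algebra tools used in this step, the vec/Kronecker identity vec(AXB) = (B^T ⊗ A) vec(X) from Bernstein [9, Chapter 7.1] and the Cholesky factorization M = M_* M_*^T of a positive definite matrix, can be checked numerically; the matrices below are arbitrary illustrations, not the ones from the proof:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 2))

def vec(M):
    # column-stacking vec operator
    return M.reshape(-1, order="F")

# vec(A X B) = (B^T kron A) vec(X)
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
assert np.allclose(lhs, rhs)

# Cholesky factorization M = M_* M_*^T of a positive definite matrix,
# as applied to the limit matrix M in Step 2
G = rng.standard_normal((4, 4))
M = G @ G.T + 4 * np.eye(4)          # positive definite by construction
M_star = np.linalg.cholesky(M)       # lower triangular factor
assert np.allclose(M_star @ M_star.T, M)
```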

Asymptotic mixed normality of the long-run QML estimator
We are now able to show the weak convergence of the long-run QML estimator and thus obtain one of the main results.
Proof of Lemma 4.4.
(a) Note that on the one hand, n,1,1 (ϑ 0 1 , ϑ 2 ) = 0 and on the other hand, We can conclude with similar calculations as in Lemma 3.3 applying (A.4) and (A.6) that Since U n = O p (1) due to Lemma 3.3 we obtain the statement.
Then, the first term is bounded by (A.2) and the second term by (A.4) and (A.6), respectively. Hence, since U_n = O_p(1) due to Lemma 3.3, the weak convergence of ∇_{ϑ_1} L^{(h)}_n(ϑ^0_1, ϑ_{n,2}) to J_1(ϑ^0) follows by Proposition 2.5, Proposition 4.1 and Lemma 4.4(a). Due to Proposition 2.5, Proposition 4.2 and Lemma 4.4(b), ∇²_{ϑ_1} L^{(h)}_n(ϑ̄_{n,1}, ϑ_{n,2}) converges weakly to the random matrix Z_1(ϑ^0). In particular, Proposition A.1 also guarantees the joint convergence of both terms. Finally, the almost sure positive definiteness of Z_1(ϑ^0) allows us to take the inverse and rearrange (4.1) so that

Asymptotic distribution of the short-run parameter estimator
Lastly, we derive the asymptotic normality of the short-run QML estimator ϑ_{n,2}, which we prove by a Taylor expansion of the QML function similar to Section 4.1. Before we start the proof, we derive a mixing property of the process (Y_k)_{k∈Z}, which will be used throughout this section.

Asymptotic behavior of the score vector
First, we prove that the partial derivatives have finite variance.
Lemma 4.6. For any ϑ_2 ∈ Θ_2, n ∈ N, and i = 1, . . . , s_2 the random variable has finite variance. Proof. Due to Lemma B.3(b) and the Cauchy–Schwarz inequality we have so that the statement follows with (4.2).
Now we can prove the convergence of the covariance matrix of the score vector where we plug in the true parameter.
Lemma 4.7. We have for all ϑ_2 ∈ Θ_2 and ℓ Proof. We can derive the result in a similar way as Schlemm and Stelzer [51, Lemma 2.14]. Hence, we only sketch the proof to point out the differences; a detailed proof can be found in Scholz [52, Section 5.9]. It is sufficient to show that for all ϑ_2 ∈ Θ_2 and all i, j = 1, . . . , s_2 the sequence converges as n → ∞. By the representation of the partial derivatives in (4.2) and (B.1), the sequence is stationary and the covariance in (4.8) depends only on the difference l = k_1 − k_2. If we can show that then the Dominated Convergence Theorem implies Due to Lemma 4.5 and the uniform exponential bound of (k_j(ϑ)) and (∂^st_i k_j(ϑ)), finding the dominating function proceeds in the same vein as in the proof of Schlemm and Stelzer [51, Lemma 2.14].
In the following, we derive the convergence of the score vector with respect to the short-run parameters by a truncation argument. Proposition 4.8. For the gradient with respect to the short-run parameters the following asymptotic behavior holds, where I(ϑ^0_2) is the asymptotic covariance matrix given in (4.7).
Proof. First, note that by representation (4.2) and Lemma B. We Using the Cramér–Wold device and the univariate central limit theorem of Ibragimov [29, Theorem 1.7] for strongly mixing random variables we obtain ). Next, we need to show that Therefore, we first prove that Cov Y_{1+l,2}(ϑ^0) as M → ∞. Note that the bilinearity of the covariance operator implies We obtain with the Cauchy–Schwarz inequality, the exponentially decreasing coefficients (k_j(ϑ^0))_{j∈N} and the finite fourth moment of Y^{(h)}_st and ∆Y^{(h)} due to Assumption A that for some 0 < ρ < 1, Moreover, by the proof of Lemma 4.6 we have Var(∂^st_j ℓ_{1,2}(ϑ^0)) < ∞ as well. Thus, (4.12) converges uniformly in l at an exponential rate to zero as M → ∞ and 1+l,2 (ϑ^0). Step 2: In this step, we show that 1 where U_{M,k} and V_{M,k} are defined in the same vein as Z_{M,k}. Since both terms can be treated similarly, we consider only the first one M,1+l)|. (4.14) With the same arguments as in Schlemm and Stelzer [51, Lemma 2.16] we obtain, independently of i and j, the upper bound, which implies tr Var(n^{−1/2} ∑_{k=1}^n U_{M,k}) ≤ Cρ^M(M + C), due to (4.14), for some constant C > 0. With the same ideas one obtains an analogous bound for tr Var 1 Step 3: With the multivariate Chebyshev inequality (see Schlemm [49, Lemma 3.19]) and (4.15) from Step 2 we obtain for every τ > 0 that All in all, the results of Step 1 and Step 3 combined with Brockwell and Davis [13, Proposition 6.3.9] yield the asymptotic normality in Proposition 4.8.
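The central limit theorem for strongly mixing sequences that drives Step 1 can be illustrated numerically with a toy stand-in for the score sequence: a Gaussian AR(1) is exponentially strongly mixing, and √n times its sample mean is asymptotically normal with long-run variance ∑_l γ(l) = σ²/(1 − φ)². All parameters below are arbitrary:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
phi, sigma, n, reps = 0.5, 1.0, 2000, 2000

# reps independent AR(1) paths x_k = phi * x_{k-1} + e_k, filtered along axis 1
e = sigma * rng.standard_normal((reps, n))
x = lfilter([1.0], [1.0, -phi], e, axis=1)

# sqrt(n) * sample mean per path; its empirical variance should be close to
# the long-run variance sum_l gamma(l) = sigma^2 / (1 - phi)^2 = 4
stats = np.sqrt(n) * x.mean(axis=1)
long_run_var = sigma**2 / (1 - phi)**2
print(stats.var(), long_run_var)   # empirical variance close to 4.0
```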

Asymptotic behavior of the Hessian matrix
We require an additional assumption for the Hessian matrix (with respect to the short-run parameters) to be positive definite. Therefore, we need some notation. We write F_ϑ := e^{A_ϑ h} − K^{(h)}_ϑ C_ϑ for short. The function is similar to the one in Schlemm and Stelzer [51, Assumption C11]. However, we define F_ϑ differently, since we do not have a moving average representation of Y^{(h)} with respect to the innovations ε^{(h)}. Hence, we have to adapt the criterion in Schlemm and Stelzer [51] and define the function (4.16) Assumption F. Let there exist a positive index j_0 such that the [(j_0 + 2)d² × s_2]-dimensional matrix ∇_{ϑ_2} ψ_{ϑ^0, j_0} has rank s_2.
Proposition 4.9. Let Assumption F additionally hold. Then, where the (s 2 × s 2 )-dimensional matrix Z st (ϑ 0 ) is given by Moreover, the limiting matrix Z st (ϑ 0 ) is almost surely a non-singular deterministic matrix.
Proof. We proceed as in the proof of Proposition 4.2.
Step 1: k (ϑ 0 )) k∈N is a stationary and an ergodic sequence with finite absolute moment (see Lemma B.3(a)) we obtain with Birkhoff's Ergodic Theorem Combining this with Lemma B.3 (c,d) results in Step 2: Next we check that Z st (ϑ 0 ) is positive definite with probability one. That we show by contradiction similarly to the corresponding proofs in Schlemm and Stelzer [51,Lemma 3.22] or Boubacar and Francq [11,Lemma 4], respectively. From Step 2 we know that We can factorize Z st,2 (ϑ 0 ) in the following way: Thus, Z st,2 (ϑ 0 ) is obviously positive semi-definite. Similarly, Z st,1 (ϑ 0 ) is positive semi-definite. It remains to check that for any We assume for the sake of contradiction that there exists a vector b ∈ R s 2 \{0 s 2 } such that In order to be zero, each summand b T Z st,1 (ϑ 0 )b and b T Z st,2 (ϑ 0 )b must be zero, since Z st,1 (ϑ 0 ) as well as Z st,2 (ϑ 0 ) are positive semi-definite. But b T Z st,1 (ϑ 0 )b = 0 is only possible if

Asymptotic normality of the short-run QML estimator
We conclude this section with the last main result of this paper, namely the asymptotic distribution of the short-run QML estimator: as n → ∞, Again, we need the following auxiliary result for the proof.

Simulation study
In this section we demonstrate the validity of the proposed QML method by a simulation study. The simulated state space processes are driven either by a standard Brownian motion or by a NIG (normal inverse Gaussian) Lévy process with mean value 0_m. The increment L(t) − L(t − 1) of an m-dimensional NIG Lévy process has a density in which μ ∈ R^m is a location parameter, α ≥ 0 is a shape parameter, β ∈ R^m is a symmetry parameter, δ ≥ 0 is a scale parameter and ∆ ∈ R^{m×m} is positive semi-definite with det ∆ = 1, determining the dependence between the components of (L(t))_{t≥0}. The covariance matrix of the process is then given accordingly. For more details on NIG Lévy processes see, e.g., Barndorff-Nielsen [3]. In all simulation studies we simulated 350 independent replicates of a cointegrated state space process on the equidistant time grid 0, 0.01, . . . , 2000 by applying an Euler scheme to the stochastic differential equation (1.1) with initial value X(0) = 0_N; the observation distance h is chosen as 1. Moreover, we use canonical representations of the state space models. On the one hand, the C_{1,ϑ_1} are chosen as lower triangular matrices with C^T_{1,ϑ_1} C_{1,ϑ_1} = I_c, and similarly the C^⊥_{1,ϑ_1} are lower triangular matrices with C^{⊥T}_{1,ϑ_1} C^⊥_{1,ϑ_1} = I_{d−c}, satisfying Assumption A, Assumption C, and Assumption E for a properly chosen parameter space Θ. On the other hand, the parametrization of the stationary part Y_{st,ϑ} is based on the echelon canonical form as given in Schlemm and Stelzer [51], such that Assumption A and Assumption D are satisfied as well for the properly chosen parameter space Θ. The echelon canonical form is widely used in the VARMA context, see, e.g., Lütkepohl and Poskitt [37] and the textbooks of Lütkepohl [35] or Hannan and Deistler [24]. In the context of linear state space models, canonical representations can also be found in Guidorzi [23].
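The simulation design above can be sketched as follows for a Brownian-motion-driven model: an Euler scheme on a fine grid with step dt = 0.01, with the output Y recorded only on the coarse observation grid with distance h. The matrices A, B, C below are toy placeholders (with one zero eigenvalue of A, mimicking a cointegrated system), not the ones used in the study, and a short horizon is used for speed:

```python
import numpy as np

def simulate_state_space(A, B, C, T=2000.0, dt=0.01, h=1.0, seed=0):
    """Euler scheme for dX(t) = A X(t) dt + B dL(t), Y(t) = C X(t),
    driven by a standard Brownian motion, observed on the grid 0, h, 2h, ..."""
    rng = np.random.default_rng(seed)
    N, m = B.shape
    n_steps = int(round(T / dt))
    step = int(round(h / dt))
    X = np.zeros(N)                       # initial value X(0) = 0_N
    Y = [C @ X]
    for k in range(1, n_steps + 1):
        dL = np.sqrt(dt) * rng.standard_normal(m)   # Brownian increment
        X = X + A @ X * dt + B @ dL
        if k % step == 0:                 # record only on the observation grid
            Y.append(C @ X)
    return np.array(Y)

# toy system: eigenvalue 0 of A produces a common stochastic trend
A = np.array([[0.0, 0.0], [0.0, -1.0]])
B = np.eye(2)
C = np.array([[1.0, 1.0], [1.0, 0.5]])
Y = simulate_state_space(A, B, C, T=100.0)
print(Y.shape)   # (101, 2): observations at t = 0, 1, ..., 100
```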
For the asymptotic normality of the short-run parameters we additionally require Assumption F. However, this condition cannot be checked analytically; this is only possible numerically.
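Such a numerical rank check can be sketched as follows, with a hypothetical smooth map `psi` standing in for ψ_{ϑ, j_0} from (4.16): approximate the Jacobian ∇_{ϑ_2} ψ by central finite differences and compute its rank via SVD. The map and the evaluation point below are illustrations only:

```python
import numpy as np

def numerical_jacobian(f, theta, eps=1e-6):
    """Central finite-difference Jacobian of f: R^s -> R^p at theta."""
    theta = np.asarray(theta, dtype=float)
    f0 = np.asarray(f(theta))
    J = np.zeros((f0.size, theta.size))
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        J[:, i] = (np.asarray(f(theta + e)) - np.asarray(f(theta - e))) / (2 * eps)
    return J

# hypothetical stand-in for psi_{theta, j0}, with s_2 = 3 parameters
def psi(theta):
    a, b, c = theta
    return np.array([a + b, a * c, b - c**2, np.sin(a), b * c])

J = numerical_jacobian(psi, np.array([0.3, -0.2, 0.7]))
print(np.linalg.matrix_rank(J))   # 3: full column rank, so the condition holds here
```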

Bivariate state space model
As canonical parametrization of the family of cointegrated state space models we take the representation above. This implies that we have one cointegration relation, i.e., the cointegration rank is equal to 1. In total we have 13 parameters. To match the stated covariance matrix of the NIG Lévy process, we set its parameters accordingly. In this way, the parameters of the stationary process Y_{st,ϑ} are chosen as in Schlemm and Stelzer [51, Section 4.2], who performed a simulation study for QML estimation of stationary state space processes.
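Increments of a NIG Lévy process (univariate here, for simplicity) can be sampled by the standard normal variance-mean mixture with an inverse Gaussian subordinator; the parameter values below are illustrative only, not the ones used in the study:

```python
import numpy as np
from scipy import stats

def nig_increments(alpha, beta, delta, mu, size, seed=0):
    """Sample NIG(alpha, beta, delta, mu) increments via the mixture
    X | Z ~ N(mu + beta * Z, Z), Z ~ IG(mean = delta/gamma, shape = delta^2),
    where gamma = sqrt(alpha^2 - beta^2)."""
    rng = np.random.default_rng(seed)
    gamma = np.sqrt(alpha**2 - beta**2)
    # scipy's invgauss(mu=m, scale=lam) has mean m * lam, shape lam
    Z = stats.invgauss.rvs(mu=1.0 / (delta * gamma), scale=delta**2,
                           size=size, random_state=rng)
    return mu + beta * Z + np.sqrt(Z) * rng.standard_normal(size)

x = nig_increments(alpha=3.0, beta=1.0, delta=1.0, mu=0.0, size=200_000)
# theoretical mean of a NIG increment is mu + delta * beta / gamma
print(x.mean())   # close to 1 / sqrt(8) ≈ 0.354
```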
In Table 1 the sample mean, bias and sample standard deviation of the estimated parameters over the 350 replicates are summarized. From this we see that in both the NIG-driven and the Brownian-motion-driven model the bias and the sample standard deviation are quite low, which reflects the consistency of our estimator. Moreover, for the Brownian-motion-driven model the sample standard deviation is lower for all parameters than in the NIG-driven model, which is not surprising since the Kalman filter as well as the quasi-maximum likelihood function are motivated by the Gaussian case. In contrast, the bias in the NIG-driven model is often lower than in the Gaussian model. It is notable that in both models the cointegration parameter ϑ_13 has the lowest bias and sample standard deviation of all estimated parameters. This is in accordance with the fact that the consistency rate for the long-run parameters is faster than that for the short-run parameters.
Next, we investigate what happens if the underlying parameter space in the QML method does not contain the true model. In the first parameter space Θ_I, we set B_{2,ϑ} = 0_{3×2} and all other matrices as above. Hence, Y_ϑ for ϑ ∈ Θ_I is integrated but not cointegrated. In the second parameter space Θ_W, we set C_{1,ϑ} = (0, 1)^T and all other matrices as above, such that Y_ϑ for ϑ ∈ Θ_W is cointegrated but its cointegration space does not match the true cointegration space. Finally, in the last parameter space Θ_S, we set C_{1,ϑ} = (0, 0)^T and all other matrices as above, such that Y_ϑ for ϑ ∈ Θ_S is stationary and coincides with Y_{ϑ,st}. The sample mean, sample standard deviation, minimal value and maximal value of the minimum of the likelihood function for 100 replications of the Brownian-motion-driven model in the four different spaces are presented in Table 2. Of course, in the space Θ, which contains the true model, the sample mean of the minimum of the likelihood function is lowest. However, the sample mean for the space Θ_I is not too far away because there at least the long-run parameters can be estimated more or less appropriately, so that due to Proposition 2.5 and Lemma 3.4 we get inf_{ϑ∈Θ_I} L^{(h)}_2(ϑ_2). Nevertheless, the standard deviation is much lower in Θ than in Θ_I. This contrasts with the spaces Θ_W and Θ_S, where the likelihood function seems to diverge. This is in accordance with the results of this paper because, due to Proposition 2.5, Lemma 3.4 and (3.8), we have inf

Three-dimensional state space model
The canonical parametrization of the cointegrated state space model has the form given above, with entries such as ϑ_13, ϑ_14, ϑ_4 + ϑ_6 ϑ_12, ϑ_5 + ϑ_6 ϑ_13, ϑ_7 + ϑ_6 ϑ_14, ϑ_8 + ϑ_10 ϑ_12, ϑ_9 + ϑ_10 ϑ_13, ϑ_11 + ϑ_10 ϑ_14. The state space model has two common stochastic trends and the cointegration space is a one-dimensional subspace of R^3. In total we have 28 parameters. In Table 3 the sample mean, bias and sample standard deviation of the estimated parameters over 350 replicates are summarized for both the NIG-driven and the Brownian-motion-driven model. In order to obtain the covariance matrix of the NIG Lévy process, we had to set its parameters accordingly. The results are very similar to the bivariate example. In most cases the sample standard deviation in the Brownian-motion-driven model is lower than in the NIG-driven model. Moreover, the bias and the standard deviation of the long-run parameters (ϑ_27, ϑ_28) are lower than those of the other parameters.

Conclusion
The main contribution of the present paper is the development of a QML estimation procedure for the parameters of cointegrated solutions of continuous-time linear state space models sampled equidistantly, allowing flexible margins. We showed that the QML estimator of the long-run parameters is super-consistent and that of the short-run parameters is consistent. Moreover, the QML estimator of the long-run parameters converges at rate n to a mixed normal distribution, whereas that of the short-run parameters converges at rate √n to a normal distribution. In the simulation study, we saw that the estimator works quite well in practice.
In this paper, we lay the mathematical basis for QML estimation of cointegrated solutions of state space models. In a separate paper, Fasen-Hartmann and Scholz [20], we present an algorithm to construct canonical forms for the state space model satisfying the assumptions of this paper, which is necessary to apply the method to data. We decided to split the papers because the introduction of a canonical form is quite lengthy and would considerably lengthen the present paper. Moreover, a drawback of our estimation procedure is the assumption that the cointegration rank is known in advance, which is usually not the case in practice; the cointegration rank first has to be estimated and tested. For this, it is possible to incorporate well-known results on estimating and testing the cointegration rank of cointegrated VARMA processes as presented, e.g., in Bauer and Wagner [5], Lütkepohl and Claessen [36], Saikkonen [45], and Yap and Reinsel [59]. This will also be considered in Fasen-Hartmann and Scholz [20]. Some parts of Fasen-Hartmann and Scholz [20] can already be found in Scholz [52].
The stated weak convergence results also hold jointly.
Before we state the proof of Proposition A.1 we need some auxiliary results.
Lemma A.2. Let ψ be defined as in (3.2). Then, the following statements hold.
Brownian motion with covariance matrix du and ψ is defined as in (3.2).
Proof. We briefly sketch the proof. The sequence (ξ Proof of Proposition A.1. (a) The proof follows directly from Theorem 4.1 of Saikkonen [46] and the comment in Saikkonen [46, p. 163]. Note that we have two different coefficient matrices, whereas the results in Saikkonen [46] are proved for the same coefficient matrix. However, Saikkonen [46, Theorem 4.1] also holds if the coefficient matrices are different, as long as each sequence of matrix coefficients satisfies the necessary conditions mentioned in Saikkonen [46, p. 163]. In particular, V_n + Q_n = O_p(1).

C. Proof of Proposition 2.5
First, we present some auxiliary results for the proof of Proposition 2.5.
Lemma C.1. Let Assumptions A and B hold. Define X^{(h)} Proof. We prove (a); the proofs of (b) and (c) are analogous. First, remark that E Y^{(h)}_j Since all eigenvalues of (e^{A_ϑ h} − K^{(h)}_ϑ C_ϑ) lie inside the unit circle (see Scholz [52, Lemma 4.6.7]) and all matrix functions are continuous on the compact set Θ and, hence, bounded, we obtain for some 0 < ρ < 1 that sup_{ϑ∈Θ} ‖e^{A_ϑ h} − K^{(h)}_ϑ C_ϑ‖ ≤ ρ and sup_{ϑ∈Θ} ‖X^{(h)} Similarly, Finally, due to Assumption B Lemma C.2. Let Assumptions A and B hold. Furthermore, let u, v ∈ {1, . . . , s}.
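The key spectral condition used here, namely that all eigenvalues of e^{A_ϑ h} − K^{(h)}_ϑ C_ϑ lie inside the unit circle, can be verified numerically for a given model. Below is a sketch for a toy, hypothetical stationary system, computing a steady-state Kalman gain from the discrete algebraic Riccati equation (the toy noise covariances Q and R are placeholders, not the exact sampled-model covariances):

```python
import numpy as np
from scipy.linalg import expm, solve_discrete_are

A = np.array([[-1.0, 0.3], [0.0, -2.0]])   # eigenvalues with negative real parts
C = np.array([[1.0, 0.5]])
h = 1.0

Ad = expm(A * h)                 # discrete-time transition matrix e^{A h}
Q = 0.5 * np.eye(2)              # toy state noise covariance of the sampled model
R = 0.1 * np.eye(1)              # toy observation/pseudo-noise covariance

# steady-state error covariance via the DARE (transpose trick for filtering)
P = solve_discrete_are(Ad.T, C.T, Q, R)
K = Ad @ P @ C.T @ np.linalg.inv(C @ P @ C.T + R)   # steady-state Kalman gain
F = Ad - K @ C
print(np.max(np.abs(np.linalg.eigvals(F))))   # spectral radius, strictly below 1
```

The design choice mirrors the lemma: stability of F is what makes the pseudo-innovation coefficients decay exponentially, uniformly over a compact parameter set.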