Asymptotic Properties of an Estimator of the Drift Coefficients of Multidimensional Ornstein-Uhlenbeck Processes that are not Necessarily Stable

In this paper, we investigate the consistency and asymptotic efficiency of an estimator of the drift matrix, $F$, of Ornstein-Uhlenbeck processes that are not necessarily stable. We consider all the cases. (1) The eigenvalues of $F$ are in the right half space (i.e., eigenvalues with positive real parts). In this case the process grows exponentially fast. (2) The eigenvalues of $F$ are on the left half space (i.e., eigenvalues with negative or zero real parts). A process for which all eigenvalues of $F$ have negative real parts is called a stable process and has a unique invariant (i.e., stationary) distribution. In this case the process does not grow. When the eigenvalues of $F$ have zero real parts (i.e., the case of zero eigenvalues and purely imaginary eigenvalues) the process grows polynomially fast. Considering (1) and (2) separately, we first show that an estimator, $\hat{F}$, of $F$ is consistent. We then combine them to present results for general Ornstein-Uhlenbeck processes. We adopt a similar procedure to show the asymptotic efficiency of the estimator.


Introduction
Multidimensional processes with a linear drift parameter have been used for modelling various physical phenomena. Among recent papers, the works of Jankunas and Khasminskii ([12]) and Khasminskii, Krylov and Moshchuk ([15]) on the estimation of the drift parameters of linear stochastic differential equations (of the form $dX_t = AX_t\,dt + \sum_{i=1}^{n}\sigma_i X_t\,dw_i(t)$ and $dX_t = A_\theta X_t\,dt + \sum_{i=1}^{m}\sigma_i X_t\,dw_i(t)$) can be mentioned. It should be noted that our work on Ornstein-Uhlenbeck (OU) processes does not follow from theirs and that the methodology used in our paper is also quite different from theirs.
The motivation for this work comes from Lai and Wei's paper [20], in which the authors have shown the strong consistency of the least squares estimators of the coefficients of discrete univariate general AR($p$) processes. In this paper, we not only show that an estimator (which is the maximum likelihood estimator in the special case when $A$ is nonsingular) of the drift parameter of the general multidimensional OU process is consistent but also show that it is asymptotically efficient. We consider the following SDE representation of the OU process:
$$dY_t = FY_t\,dt + A\,dW_t, \qquad (1.1)$$
with any starting point $Y_0$ independent of the Brownian motion $\{W_t, t \ge 0\}$.
Here $Y$ is a $p$-dimensional process, $A$ is a constant matrix of dimension $p \times r$ and $W_t$ is an $r$-dimensional standard Brownian motion. Notice that it is always easier to estimate $A$ through the quadratic variation of the process by using Itô's rule. Estimating $F$, however, is usually the more difficult task. It is generally believed that one needs stationarity of the process to estimate $F$.
However, one may observe that the estimator
$$\hat{F}_T = \Big(\int_0^T dY_t\,Y_t'\Big)\Big(\int_0^T Y_tY_t'\,dt\Big)^{-1} = F + A\Big(\int_0^T dW_t\,Y_t'\Big)\Big(\int_0^T Y_tY_t'\,dt\Big)^{-1}$$
is well defined whenever $\int_0^T Y_tY_t'\,dt$ is invertible and, in this case, the estimator is unbiased (as the expectation of the second term is zero). We show here that $\hat{F}_T$ is a consistent and asymptotically efficient estimator of $F$, irrespective of the stationarity (or stability) of the process, provided $F$ and $A$ together satisfy a RANK condition (a), given in Section 2. This RANK condition is essential to prove that $\int_0^T Y_tY_t'\,dt$ is invertible. We note here that, if $A$ is a nonsingular matrix, the RANK condition automatically holds. In fact, it is also easy to see that for a continuous autoregressive process (i.e., CAR($p$)), the RANK condition holds.
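The behaviour of such an estimator can be explored numerically. The following is a minimal sketch, not the authors' procedure: it assumes the least-squares form $(\int_0^T dY_t\,Y_t')(\int_0^T Y_tY_t'\,dt)^{-1}$ for the estimator, replaces the continuous-time integrals by Euler-Maruyama sums, and uses hypothetical names (`simulate_and_estimate`, `F_true`, `A_true`) and an arbitrary stable example with $Y_0 = 0$.

```python
import numpy as np

def simulate_and_estimate(F, A, T=400.0, dt=0.01, seed=0):
    """Simulate dY = F Y dt + A dW by Euler-Maruyama and return the
    discretized estimator (sum dY Y') (sum Y Y' dt)^{-1}."""
    rng = np.random.default_rng(seed)
    p, r = A.shape
    n_steps = int(round(T / dt))
    y = np.zeros(p)                  # Y_0 = 0 (an assumption of this sketch)
    num = np.zeros((p, p))           # approximates int_0^T dY_t Y_t'
    den = np.zeros((p, p))           # approximates int_0^T Y_t Y_t' dt
    for _ in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt), size=r)
        dy = (F @ y) * dt + A @ dW   # one Euler-Maruyama increment
        num += np.outer(dy, y)
        den += np.outer(y, y) * dt
        y = y + dy
    return num @ np.linalg.inv(den)

# Illustrative (hypothetical) stable drift matrix and nonsingular A.
F_true = np.array([[-1.0, 0.5], [0.0, -2.0]])
A_true = np.eye(2)
F_hat = simulate_and_estimate(F_true, A_true)
print(np.round(F_hat, 2))
```

With a nonsingular `A_true` the RANK condition holds automatically, so the matrix `den` is expected to be invertible; the step size `dt` and horizon `T` are arbitrary choices for illustration.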
We also make another assumption, condition (b): the distinctness of the eigenvalues with positive real parts. However, we point out that this condition can be relaxed to a condition (b'), and that if neither condition (b) nor (b') holds it is still possible to proceed with the estimation (see the discussion after Remark 3.2). Notice that condition (b') holds for the drift $F$ in CAR($p$) processes.
Our paper is organized as follows. In Section 2, we present the basic assumptions and the main theorems. In Section 3, we describe the case in which the eigenvalues of $F$ have positive real parts. The methodology used there is similar to that of Lai and Wei's paper [20]. The case in which the eigenvalues of $F$ have negative or zero real parts requires a quite different approach and is discussed in Section 4. This case, in fact, combines three cases: zero eigenvalues, purely imaginary eigenvalues and eigenvalues with negative real parts. Details on the rates of growth and so forth for zero eigenvalues and imaginary eigenvalues are given in the Appendix. Section 5 examines the mixed case for consistency. Section 6 presents the results on asymptotic efficiency and some concluding remarks.

Basic Assumptions and the Main Theorem
We can decompose any $p \times p$ matrix $F$ into a rational canonical form with diagonal blocks $G_0$ and $G_1$, where $G_i$ are $p_i \times p_i$ matrices and $M_i$ are $p_i \times p$ matrices for $i = 0, 1$ and $p_0 + p_1 = p$. Rows of $M_i$ and rows of $M_j$ are orthogonal for $i \neq j$.
All roots of $G_0$ lie in the right half space; all roots of $G_1$ lie on the left half space.

EXAMPLE Let
Then the characteristic polynomial of $A$ is $f(t)$. Thus $\varphi_1(t) = t - 2$ and $\varphi_2(t) = t^2 + 1$ are the distinct irreducible monic divisors of $f(t)$. After computation, we find the minimal polynomial of $A$ and thus the corresponding companion matrix. Similarly, we find the companion matrix for $\varphi_2(t) = t^2 + 1$. The rational canonical form of $A$ is thus assembled from these companion matrices; in the example above, it is formed by 3 blocks.

Observe that, from (1.1), $Y_t = e^{Ft}Y_0 + \int_0^t e^{F(t-s)}A\,dW_s$ and thus, given $Y_0$, $Y_t$ has a multivariate Gaussian distribution with mean $e^{Ft}Y_0$ and covariance $\int_0^t e^{F(t-s)}AA'e^{F'(t-s)}\,ds$. Since $Y_t$ is Gaussian, it has a positive density if and only if the covariance matrix is nonsingular. The RANK assumption, which is a special case of Hörmander's hypoellipticity condition, ensures the positive density of $Y_t$ (for details, see [11]), and hence the nonsingularity of the covariance matrix.
Following Basawa and Rao ([5] Let $F_A = [A : FA : \cdots : F^{p-1}A]$. Then $\mathrm{RANK}(F_A) = p$ by the RANK assumption. Consider for $i = 0, 1$, From (2.2) and the argument given above, we conclude that $\int_0^T U_{it}U_{it}'\,dt$ is positive definite a.s. for $i = 0, 1$.
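The RANK assumption on $F_A = [A : FA : \cdots : F^{p-1}A]$ is straightforward to check numerically for given matrices. The sketch below uses NumPy; the function name `rank_condition_holds` and the two example pairs are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rank_condition_holds(F, A):
    """Return True when RANK([A : FA : ... : F^{p-1} A]) equals p."""
    p = F.shape[0]
    blocks = [A]
    for _ in range(p - 1):
        blocks.append(F @ blocks[-1])   # next block F^k A
    F_A = np.hstack(blocks)             # p x (p * r) matrix
    return bool(np.linalg.matrix_rank(F_A) == p)

# Hypothetical example: nilpotent drift, noise entering the second coordinate.
F_ex = np.array([[0.0, 1.0], [0.0, 0.0]])
A_good = np.array([[0.0], [1.0]])   # noise reaches both coordinates through F
A_bad = np.array([[1.0], [0.0]])    # F A = 0, so the second coordinate gets no noise

print(rank_condition_holds(F_ex, A_good), rank_condition_holds(F_ex, A_bad))
```

The second pair shows how the condition can fail when $A$ is singular: the noise never reaches the second coordinate, so the covariance of $Y_t$ stays degenerate.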
We now present our main theorems, whose proofs are given in Section 5 and in Section 6, respectively. Throughout the paper, we use $\lambda_{\min}(C)$ and $\lambda_{\max}(C)$ to denote the minimum and maximum eigenvalues of a matrix $C$.

Eigenvalues in the Right Half Space
We consider the case where all the eigenvalues of F have positive real parts.
In this case, it can be seen that $\|Y_t\| \to \infty$ exponentially fast as $t \to \infty$.
To introduce the main result of this section we define a Gaussian random variable Since all the eigenvalues of F have positive real parts, it is clear that, We now derive the following results. Then, Moreover, B is positive definite with probability 1. Consequently, Here and throughout the paper, log x means the natural logarithm of x.
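The role of the normalisation by $e^{-FT}$ can be sketched as follows. This is a heuristic outline under the assumption (made here for illustration, and chosen to agree with the expression $B = \int_0^\infty e^{-Ft}ZZ'e^{-F't}\,dt$ used in the proof) that $Z_t$ denotes $e^{-Ft}Y_t$ and $Z$ its almost sure limit. From the solution of (1.1),

$$Z_t = e^{-Ft}Y_t = Y_0 + \int_0^t e^{-Fs}A\,dW_s,$$

so that $Z_t - Y_0$ is a martingale which is bounded in $L^2$ when all eigenvalues of $F$ have positive real parts, because $\int_0^\infty \|e^{-Fs}A\|^2\,ds$ is then finite. Writing $Y_t = e^{Ft}Z_t$ and substituting $u = T - t$ gives

$$e^{-FT}\Big(\int_0^T Y_tY_t'\,dt\Big)e^{-F'T} = \int_0^T e^{-Fu}Z_{T-u}Z_{T-u}'e^{-F'u}\,du.$$

Replacing $Z_{T-u}$ by its limit $Z$ suggests the limit $\int_0^\infty e^{-Fu}ZZ'e^{-F'u}\,du = B$; making this replacement rigorous is the content of the proof.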
Thus, $\|A_0\|^2$ is equal to the maximum eigenvalue of $A_0'A_0$. Moreover, if $A_0$ is symmetric and non-negative definite, then $\|A_0\| = \lambda_{\max}(A_0)$. In particular, for the companion matrix $e^{-FT}$ in Theorem 3.1, we have the following Lemma.
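Both norm identities can be checked numerically. The snippet below is a small sketch with NumPy; the matrix `A0` is an arbitrary illustrative choice, not a quantity from the paper.

```python
import numpy as np

# Arbitrary illustrative matrix (an assumption; not from the paper).
A0 = np.array([[2.0, 1.0], [0.0, 3.0]])

# Spectral norm ||A0||: the largest singular value of A0.
spec_norm = np.linalg.norm(A0, 2)
# Largest eigenvalue of the symmetric matrix A0' A0.
lam_max_gram = np.linalg.eigvalsh(A0.T @ A0).max()

# For a symmetric non-negative definite matrix S, ||S|| = lambda_max(S).
S = A0.T @ A0
norm_S = np.linalg.norm(S, 2)
lam_max_S = np.linalg.eigvalsh(S).max()

print(np.isclose(spec_norm**2, lam_max_gram), np.isclose(norm_S, lam_max_S))
```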
Since $Z_t$ converges almost surely to a finite random variable $Z$, $\sup_{t \ge 0}\|Z_t\|$ is finite almost surely and, for each $t \le T/2$, $\|Z_T - Z_{T-t}\|$, being formed from a Cauchy sequence, converges to zero almost surely as $T \to \infty$. Also, by the Lemma above, the first integral of (3.3) is less than $\epsilon$, and the second integral goes to zero since $\sup_{t \ge 0}\|Z_t(\omega)\|$ is finite and $\int_{T/2}^{T}\|e^{-Ft}\|^2\,dt \to 0$ as $T \to \infty$. Therefore, (3.5) follows.

To show that $B = \int_0^\infty e^{-Ft}ZZ'e^{-F't}\,dt$ is positive definite with probability 1, observe that $Z$ has a positive Gaussian density. Hence $P(Z \neq 0) = 1$. Fix an $\omega$ such that $Z(\omega) \neq 0$. Suppose, if possible, that $x'Bx = 0$ for some nonzero $x \in \mathbb{R}^p$. Then, for almost all $t \in (0, T)$, $x'e^{-Ft}Z(\omega) = 0$, i.e., for almost all $t \in (0, T)$, $\sum_{k=0}^{\infty}\frac{(-t)^k}{k!}\,x'F^kZ(\omega) = 0$. This implies $x'F^kZ(\omega) = 0$ for $k = 0, 1, \ldots, p-1$. By assumption (b), $\sum_{k=0}^{p-1}a_kF^k$ is nonsingular for any real numbers $a_k$, not all of them zero. Hence, for any nonzero vector in $\mathbb{R}^p$, in particular for $x$, $x'\sum_{k=0}^{p-1}a_kF^k$ is a nonzero vector. In other words, for a nonzero vector $x$, $\sum_{k=0}^{p-1}a_k(x'F^k)$ is nonzero for any nonzero vector $(a_0, \ldots, a_{p-1})$. Thus the vectors $x'F^k$, $k = 0, \ldots, p-1$, are linearly independent, which forces $Z(\omega) = 0$, a contradiction.
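The positive definiteness argument can be illustrated numerically for a fixed realisation $z$ of $Z$. The sketch below relies on a standard fact not stated in the text: when all eigenvalues of $F$ have positive real parts, $B = \int_0^\infty e^{-Ft}zz'e^{-F't}\,dt$ is the unique solution of the Lyapunov equation $FB + BF' = zz'$. The helper name `limit_matrix_B` and the example matrices are hypothetical.

```python
import numpy as np

def limit_matrix_B(F, z):
    """Solve F B + B F' = z z' for B via a Kronecker-product linear system."""
    p = F.shape[0]
    I = np.eye(p)
    # With row-major vectorisation, vec(F B + B F') = (F kron I + I kron F) vec(B).
    K = np.kron(F, I) + np.kron(I, F)
    rhs = np.outer(z, z).reshape(-1)
    return np.linalg.solve(K, rhs).reshape(p, p)

# Hypothetical drift with distinct positive eigenvalues 1 and 2.
F_pos = np.array([[1.0, 1.0], [0.0, 2.0]])

z_generic = np.array([1.0, 2.0])   # z and F z are linearly independent
z_eigvec = np.array([1.0, 1.0])    # eigenvector of F_pos, so F z = 2 z

min_eig_generic = np.linalg.eigvalsh(limit_matrix_B(F_pos, z_generic)).min()
min_eig_eigvec = np.linalg.eigvalsh(limit_matrix_B(F_pos, z_eigvec)).min()
print(min_eig_generic, min_eig_eigvec)
```

In this example $B$ is positive definite for the generic vector and singular when $z$ is an eigenvector of $F$. Since $Z$ has a positive Gaussian density, such exceptional directions carry probability zero.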
We continue the proof of (3.1) of Theorem 3.1. From Lemma 3.2 we get, On the other hand, Hence, we have the proof of Theorem 3.1.
Proof. (i) Given $\epsilon > 0$, for all $\omega$ outside a null set, there exists $T_0(\omega)$ such that As $T \to \infty$, the first term tends to 0 since which is finite almost surely, by Lemma 3.1.
This completes the proof of Corollary 3.1.
If all the eigenvalues of $F$ have positive real parts, we can relax the condition of $\sum_k a_kF^k$ being nonsingular for any reals $a_1, \ldots, a_n$ with at least one of them being nonzero. Let the characteristic polynomial of $F$ be given as
$$a_0\prod_i (x - \lambda_i)^{p_i}\prod_j (x^2 + b_jx + c_j)^{q_j},$$
where $\lambda_i$ are the real roots of multiplicity $p_i$, $x^2 + b_jx + c_j$ are the irreducible polynomials giving the complex roots with multiplicity $q_j$, and $a_0$ is a constant. Let the minimal polynomial of $F$ be
$$\prod_i (x - \lambda_i)^{r_i}\prod_j (x^2 + b_jx + c_j)^{s_j}.$$
If $r_i = p_i$ and $s_j = q_j$ for all $i, j$, then the degree of the minimal polynomial of $F$ and the degree of the characteristic polynomial of $F$ are the same, the assumption (b') holds, and our results follow.

If some of the $r_i$ are less than $p_i$ and/or some of the $s_j$ are less than $q_j$, then (b') does not hold for $F$. However, in that case, one can transform $F$ into a rational canonical form with diagonal blocks $D$, $B_i$ and $C_j$ (with $LY_t$, $J_iY_t$ and $K_jY_t$ denoting the correspondingly transformed processes, respectively), where $D$ is a square matrix whose dimension is the degree of the minimal polynomial of $F$ (i.e., $\sum_i r_i + 2\sum_j s_j$). For each $j$, $C_j$ is a partitioned diagonal matrix (i.e., only the diagonal blocks are nonzero blocks), each block is of dimension $2 \times 2$, and its diagonal blocks are identical, repeat exactly $(q_j - s_j)$ times and have the characteristic polynomial $x^2 + b_jx + c_j$. For each $i$, $B_i$ is a diagonal matrix whose diagonal entries consist of the real characteristic root $\lambda_i$ repeated exactly $(p_i - r_i)$ times. Thus, we can work with $D$ instead of $F$. For $D$ the assumption (b') holds, since the degree of the minimal polynomial of $D$ is the same as that of $F$ and, consequently, the degree of the minimal polynomial of $D$ is the same as the degree of the characteristic polynomial of $D$. Estimation of $D$ can be done using the SDE of $LY_t$.
For $B_i$ and $C_j$, one can consider each one separately, transform $Y_t$ to $J_iY_t$ and $K_jY_t$, and use the SDE of any component of $J_iY_t$ (as it has the Markov property) to estimate $\lambda_i$, and the SDE of the first two (or any $(2m-1)$th and $2m$th) components of $K_jY_t$ together, as they have the Markov property, to estimate a diagonal block of $C_j$. Hence the assertion in the last remark.

Eigenvalues on the Left Half Space
In this Section, we study the asymptotic behavior of OU processes where the real parts of all the eigenvalues of $F$ are either zero or negative. Unlike the exponential rate of growth of the quantities in Theorem 3.1 and Corollary 3.1 for the process where all the eigenvalues of $F$ have positive real parts, the following theorem shows that these quantities grow at most polynomially fast in $t$ for these processes.
For stable processes $Y_t$ (i.e., eigenvalues of $F$ with negative real parts), we know from Basak and Bhattacharya [4] that Therefore, the property of $Y_t$ starting at $x$ is the same as that starting from 0. Hence, without loss of generality, we can assume that $Y_0 = 0$.
Proof. To prove (4.1) and (4.2), consider each component which follows, a fortiori, by the Law of the Iterated Logarithm of Basak [3]. Therefore, which is positive definite a.s. Therefore, Hence, the proof.
is bounded uniformly over k. Hence, it would follow, for any δ > 0,

COROLLARY 4.1 With the same notations and assumptions as in Theorem
(ii) By the previous remark 4.1 (i), we note that, Hence, the proof.
To prove Theorem 4.2, we need the following lemma: Since all eigenvalues of is positive definite (since the RANK condition holds here as well) and it converges almost surely to some positive definite constant matrix as $T \to \infty$. Therefore By Corollary 4.1, Therefore, which is bounded below (by a negative number possibly depending on $\epsilon$) uniformly for large values of $T$ by (4.3) and using the fact that both $(\dot{Y}_T^\epsilon)'(C_T^\epsilon)^{-1}(\dot{Y}_T^\epsilon)$ and $(\dot{Y}_T^\epsilon)'(\dot{C}_T^\epsilon)^{-1}(\dot{Y}_T^\epsilon)$ have the same order and the latter has the same order as $(Y_T^\epsilon)'(C_T^\epsilon)^{-1}(Y_T^\epsilon)$. Hence the proof of Lemma 4.1.
Proof of Theorem 4.2. Let $F_\epsilon = F - \epsilon I$, $\epsilon > 0$. Since all eigenvalues of $F$ are on the left half space, the real parts of all eigenvalues of $F_\epsilon$ are negative, i.e., $Y_t^\epsilon$ is a stable process. By Corollary 4.1, $f$ is a continuous function on $[0, \epsilon_1]$ and is differentiable in $(0, \epsilon_1)$. Then by the Mean Value Theorem, there exists an $\epsilon_0 \in (0, \epsilon_1)$ such that That is, which is uniformly positive (i.e., bounded away from zero) for large values of $T$ by Lemma 4.1. Since Hence the proof of Theorem 4.2.
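The effect of the shift $F_\epsilon = F - \epsilon I$ on the spectrum can be seen in a small numerical sketch: every eigenvalue moves left by exactly $\epsilon$, so a drift with eigenvalues on the imaginary axis becomes stable. The matrix `F` and the value of `eps` below are arbitrary illustrative choices.

```python
import numpy as np

# Hypothetical drift with eigenvalues +i, -i and -0.5 (none in the right half space).
F = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, -0.5]])
eps = 0.1
F_eps = F - eps * np.eye(3)

# Largest real part of the spectrum before and after the shift.
max_real_F = np.linalg.eigvals(F).real.max()
max_real_F_eps = np.linalg.eigvals(F_eps).real.max()
print(max_real_F, max_real_F_eps)
```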

General Ornstein-Uhlenbeck Processes
For the Ornstein-Uhlenbeck process defined in (1.1) with RANK condition (2.1), we have considered the case in which all the eigenvalues of $F$ have positive real parts and the case in which all the eigenvalues of $F$ have zero or negative real parts (i.e., zero eigenvalues, purely imaginary eigenvalues and eigenvalues with negative real parts). Now we combine these cases to discuss the mixed model in which $F$ can be decomposed into rational canonical form as follows: where all the characteristic roots of $G_0$ lie in the right half space and all the characteristic roots of $G_1$ lie on the left half space. Let We now derive the following result.
where $B$ is defined in Section 3 (before (3.4)), $I_{p_1}$ is a $p_1$-dimensional identity matrix and Proof. Observing (5.1), we obtain, by Theorem 3.1, that Therefore, for all $\omega$ outside a null set, and for any given $\epsilon > 0$, there exists As $T \to \infty$, the first term goes to 0 since $T_0(\omega)$ is fixed. The second term is less than $\epsilon$ by the choice of $T_0(\omega)$ since $C_{1t}$ is increasing in $t$ (in the sense that $C_{1t_2} - C_{1t_1}$ is positive definite whenever $t_2 > t_1$) and ||C As $\epsilon$ is arbitrary, the proof is complete.
We now observe that, and To show that the remaining terms converge to 0, we prove the following Theorem. This theorem is in the spirit of Theorem 2.2 of Wei [25], which is presented for the discrete case.
To prove Theorem 5.1, we need the following lemmas.
Proof. Notice that, where $|C_{1t}|$ is the determinant of $C_{1t}$. Observe that $G_1$ can be further decomposed into a rational canonical form as follows: where all the characteristic roots of $G_{11}$ have negative real parts, those of $G_{12}$ are purely imaginary and those of $G_{13}$ are zero. For $i, j = 1, 2, 3$, define Hence, the proof.
We observe that, from Lemma 5.2, if we let Then, under the hypothesis of Theorem 5.1, Proof. Notice that $M_{1t}$ is a martingale with respect to the filtration /T → 0, almost surely, as $T \to \infty$, it is enough to show that $\tilde{V}_T \to 0$, in probability, as $T \to \infty$, and this would be immediate once one shows $E(\tilde{V}_T) \to 0$ as $T \to \infty$.

Now use Itô's Lemma to get
Therefore, by (5.3) and applying Itô's Lemma again, one obtains Since $V_{T \wedge \tau_n}$ and $U_{1t}'C_{1t}^{-1}U_{1t}$ are non-negative, by Fatou's Lemma and the Monotone Convergence Theorem, Now, by the argument in (5.2), one has $\limsup_{T \to \infty} E\tilde{V}_T \le \alpha C t_1^{-\alpha}$. As $t_1$ can be taken to be arbitrarily large, we have the result.

. Then, with the same assumptions and notations as in Lemma 5.3,
Proof. Applying Itô's Lemma to $V_t$, Therefore, Thus, Hence, the proof.
Proof of Theorem 5.1.
where $\lim_{T \to \infty}\Sigma_T$ is a.s. positive definite. Thus, by Lemma 3.2, Therefore, the Theorem follows.

Asymptotic Efficiency
In this section we show that our estimator for the drift matrix $F$ is asymptotically efficient even if the underlying process is not necessarily stationary (stable). For a matrix-valued estimator there are several ways to define asymptotic efficiency (see Barndorff-Nielsen and Sørensen [2], for details).
The result is already known in the one-dimensional case and for vector-valued parameters (e.g., [5,7,18,23] and references therein) when the processes are not necessarily stationary. For the multi-dimensional matrix-valued case, similar results can be proved once asymptotic efficiency is properly defined for the matrix-valued estimator.
Observe that, when $AA'$ is nonsingular, the log-likelihood of $F$, (see [5], When $AA'$ is singular, the log-likelihood of $F$ cannot be written explicitly. Therefore, the M.L.E. of $F$ cannot be obtained. However, we show that the above estimator is asymptotically efficient under the assumptions of Section 2.
We show that E(Tr ] to prove the following result.

Proof of Theorem 2.2
Case 1: Eigenvalues of F are in the positive half space.
exists and has finite expectation. Also, (by symmetry) $m_Z = \min_{t \in [0, \infty)}(Z_t - Y_0)$ exists and has finite second moment. For symmetric matrices $D_1$ and $D_2$, define, for all $T \ge T_0$, for some $T_0 > 0$ ($T_0$ may be taken to be 1). Thus, Since the right-hand side has finite expectation, the result follows using dominated convergence.
Case 2: Eigenvalues of F are on the left half space.
When all the eigenvalues have negative real parts, by the ergodic theorem,

Zero and purely imaginary eigenvalues.
When the eigenvalues are either all purely imaginary or all zero, replace $F$ by $F - \epsilon I = F_\epsilon$, as is done in Section 4, and get the result as above by the ergodic theorem. Now, as in Lemma 4.1, consider Therefore, which is bounded below (by a negative number possibly depending on $\epsilon$) uniformly for large values of $T$ by (4.3) and using the fact that both $\mathrm{Tr}\,E((\dot{S}_T^\epsilon)'(C_T^\epsilon)^{-1}(\dot{S}_T^\epsilon))$ and $\mathrm{Tr}\,E((\dot{S}_T^\epsilon)'(\dot{C}_T^\epsilon)^{-1}(\dot{S}_T^\epsilon))$ have the same order and the latter has the same order as $\mathrm{Tr}\,E((S_T^\epsilon)'(C_T^\epsilon)^{-1}(S_T^\epsilon))$.
Now, as in the argument in the consistency part, since all eigenvalues of $F$ are on the left half space, the real parts of all eigenvalues of $F_\epsilon$ are negative, i.e., $Y_t^\epsilon$ is a stable process and Similarly, to get an upper bound, consider Therefore, which is bounded above (by a positive number possibly depending on $\epsilon$) uniformly for large values of $T$ by (4.3).
Then by the Mean Value Theorem, there exists an $\epsilon_0 \in (0, \epsilon_1)$ such that That is, which is uniformly bounded and positive (i.e., bounded away from zero and infinity) for large values of $T$ as argued above. Since Mimicking the above argument, find which is bounded below (by a negative number possibly depending on $\epsilon$) uniformly for large values of $T$ by (4.3) and using the fact that both $\mathrm{Tr}(E((C_T^\epsilon)^{-1})E(\dot{C}_T^\epsilon))$ and $\mathrm{Tr}(E((C_T^\epsilon)^{-1})E(C_T^\epsilon))$ have the same order.
Similarly, to get an upper bound, consider which is bounded above (by a positive number possibly depending on $\epsilon$) uniformly for large values of $T$ by (4.3).
Thus, using a similar argument as in (6.1) we show, since lim T →∞ Tr(E((C ǫ 1
Case 3: Mixed model.
In this case, use the decomposition of $F$ as in Section 5, to decompose ). Then, one gets, Since for a symmetric invertible partition matrix, and $H = I$, i.e., the identity matrix of order $p_1$. Since $F$ converges to zero almost surely by the proof of Lemma 5.1 and, by the same lemma, $E$ converges to $B$ almost surely, one obtains tr( by Case 1 and Case 2. Similarly, ) expectation of which is finite by Case 1 and Case 2. Therefore one proves, for

Concluding remarks and discussion
It is easy to see that the state space equation of the general continuous autoregressive process (CAR($p$)) of the form $dX^{(p-1)}$ with $\alpha_i$ real numbers, $\sigma > 0$ and $W_t$ a one-dimensional Brownian motion.
Clearly, $A$ is not nonsingular. However, the RANK condition (a) holds for this $F$ and $A$, and the condition (b') holds for this $F$. Hence, from our result, the consistency and the asymptotic efficiency of $\hat{F}$ for the general CAR($p$) process follow.
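These two claims can be checked numerically for a given CAR($p$) model. The sketch below assumes the usual companion-form state space representation, with the coefficients $\alpha_1, \ldots, \alpha_p$ in the last row of $F$ and $A = (0, \ldots, 0, \sigma)'$. It also reads condition (b') as the requirement that $I, F, \ldots, F^{p-1}$ be linearly independent, i.e., that the minimal polynomial of $F$ has degree $p$. Both are assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def car_state_space(alpha, sigma):
    """Companion-form (F, A) for a CAR(p) model (assumed convention)."""
    alpha = np.asarray(alpha, dtype=float)
    p = alpha.size
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)   # ones on the superdiagonal
    F[-1, :] = alpha             # autoregressive coefficients in the last row
    A = np.zeros((p, 1))
    A[-1, 0] = sigma             # noise drives only the last state
    return F, A

def controllability_rank(F, A):
    """Rank of [A : FA : ... : F^{p-1} A]."""
    p = F.shape[0]
    blocks = [A]
    for _ in range(p - 1):
        blocks.append(F @ blocks[-1])
    return int(np.linalg.matrix_rank(np.hstack(blocks)))

def powers_rank(F):
    """Rank of the family I, F, ..., F^{p-1} viewed as vectors."""
    p = F.shape[0]
    powers = [np.eye(p)]
    for _ in range(p - 1):
        powers.append(powers[-1] @ F)
    return int(np.linalg.matrix_rank(np.stack([P.reshape(-1) for P in powers])))

F_car, A_car = car_state_space([-1.0, 0.5, 2.0], sigma=1.5)
print(controllability_rank(F_car, A_car), powers_rank(F_car))
```

Both ranks equal $p = 3$ in this example even though $A$ has rank one, which matches the statement that the RANK condition does not require $A$ to be nonsingular.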
It is important to observe that this estimation procedure may be the first step in developing a test of zero roots of some F , which is necessary to de- follows asymptotically Normal, so that we could compute approximate confidence interval for the above testing procedures for the necessary parameters in F . As far as we know, these results are still unknown. Investigating the Asymptotically Mixed Normality property may be an important future direction to consider. One can look into LAMN property as well.
Besides, when the drift coefficient matrix depends on an unknown discrete parameter $\theta$ which follows a Markov chain (which allows the process to switch regimes), finding a consistent and asymptotically efficient estimator becomes important. The above questions can be asked in that setup as well.
In applications, we almost always use discretely sampled data. Similar questions can be asked for this model when the data are sampled at deterministic (equal or unequal) time intervals or at random intervals. That can also be a focus of future work.

Purely Imaginary Eigenvalues
In this Section, we study the asymptotic behavior of OU processes when the drift matrix $F$ has only purely imaginary eigenvalues. The main results are summarized in the following: Moreover, To prove Theorem 7.1, we need the following Lemmas. Proof.
Hence, the proof of the theorem.

Zero Eigenvalues
In this Section, we study the asymptotic behavior of the OU processes when the drift matrix $F$ has only zero eigenvalues (i.e., $F$ is a nilpotent matrix). The main results are summarized in the following: Proof. Since $F$ is a $k \times k$ nilpotent matrix of order $\gamma$ ($1 \le \gamma \le k$), then To prove (7.5) Hence, the proof.
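The polynomial growth in the nilpotent case can be traced to the fact that the exponential series terminates: if $F^\gamma = 0$, then $e^{Ft} = \sum_{j=0}^{\gamma-1} F^jt^j/j!$, whose entries are polynomials in $t$ of degree at most $\gamma - 1$. The sketch below illustrates this standard fact with NumPy; the function name `expm_nilpotent` and the shift matrix `N` are illustrative assumptions.

```python
import numpy as np

def expm_nilpotent(F, t):
    """e^{Ft} for a nilpotent k x k matrix F via the terminating power series."""
    k = F.shape[0]
    term = np.eye(k)      # current term F^j t^j / j!, starting at j = 0
    total = np.eye(k)
    for j in range(1, k):             # F^k = 0, so terms with j >= k vanish
        term = (term @ F) * (t / j)
        total = total + term
    return total

# Hypothetical example: 3 x 3 shift matrix, nilpotent of order 3.
N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
E2 = expm_nilpotent(N, 2.0)
print(E2)
```

For this matrix the top-right entry of $e^{Nt}$ equals $t^2/2$, the highest power allowed by the order of nilpotency.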