Maximum likelihood estimation for Gaussian process with nonlinear drift

Abstract. We investigate the regression model Xt = θG(t) + Bt, where θ is an unknown parameter, G is a known nonrandom function, and B is a centered Gaussian process. We construct the maximum likelihood estimators of the drift parameter θ based on discrete and continuous observations of the process X and prove their strong consistency. The results obtained generalize those of [13] in two directions: the drift may be nonlinear, and the noise may have nonstationary increments. As an example, the model with subfractional Brownian motion is considered.


Introduction
Let B = {B_t, t ≥ 0} be a centered Gaussian process with known covariance function, B_0 = 0. We assume that all finite-dimensional distributions of the process {B_t, t > 0} are multivariate normal distributions with nonsingular covariance matrices. Now, let the process X_t have a drift θG(t), that is,

X_t = θG(t) + B_t,   (1)

where G(t) = ∫_0^t g(s) ds, and g ∈ L_1[0, t] for any t > 0. The paper is devoted to the estimation of the parameter θ by observations of the process X. We construct the maximum likelihood estimators (MLEs) for discrete and continuous schemes of observations and establish the strong consistency of both estimators. Moreover, we prove the a.s. convergence of the discrete estimator to the continuous one. This paper generalizes the results of [13], where model (1) with G(t) = t was considered. Moreover, contrary to [13], we do not assume the stationarity of increments of the driving process B. This substantially extends the class of possible models. As an example, we consider the model where B is the subfractional Brownian motion.

© Vilnius University, 2018

Note that the problem of drift estimation for Gaussian processes is important for many applied areas, where an observed process can be decomposed as the sum of a useful signal and a random noise, the latter usually modeled by a centered Gaussian process; see, e.g., [10, Chap. VII]. In particular, such processes arise in telecommunication and on financial markets. For example, Samuelson's model (see [18]), which is popular in finance, is of the form (1).
We also mention that similar problems for the model with linear drift driven by fractional Brownian motion were studied in [3, 9, 11, 15]. The mixed Brownian–fractional Brownian model was treated in [7, 13]. Another approach to drift parameter estimation in the model with two fractional Brownian motions was proposed in [12, 14]. In [2, 16], the nonparametric functional estimation of the drift of a Gaussian process was considered (such estimators for fractional and subfractional Brownian motions were studied in [8] and [19], respectively).
The paper is organized as follows. In Section 2, we study the case of discrete observations and prove the strong consistency of the MLE. In Section 3, we consider the estimator constructed by continuous observations and establish the relations between discrete and continuous estimators. Then we prove the strong consistency of the estimator in the continuous scheme. In Section 4, these results are applied to the models with fractional and subfractional Brownian motions. Auxiliary results for nonrandom functions and integral equations are collected in the Appendix.

The case of discrete-time observations
Let the process X_t be observed at the points 0 < t_1 < t_2 < ... < t_N. Then the vector of increments ∆X^(N) = (X_{t_1}, X_{t_2} − X_{t_1}, ..., X_{t_N} − X_{t_{N−1}}) is a one-to-one function of the observations. We assume in this section that the inequality G(t_k) ≠ 0 holds for at least one k.

The likelihood function and MLE
Nonlinear Anal. Model. Control, 23(1):120–140

Evidently, the vector ∆X^(N) has the Gaussian distribution N(θ∆G^(N), Γ^(N)), where

∆G^(N) = (G(t_1), G(t_2) − G(t_1), ..., G(t_N) − G(t_{N−1}))ᵀ

and Γ^(N) is the covariance matrix of the vector of increments ∆B^(N) = (B_{t_1}, B_{t_2} − B_{t_1}, ..., B_{t_N} − B_{t_{N−1}})ᵀ. Then one can take the density of the distribution of the vector ∆X^(N) for a given θ w.r.t. the density for θ = 0 as a likelihood function. The corresponding MLE equals

θ̂(N) = (∆G^(N))ᵀ(Γ^(N))⁻¹∆X^(N) / ((∆G^(N))ᵀ(Γ^(N))⁻¹∆G^(N)).   (3)

Since the observed process X_t is Gaussian and the MLE θ̂(N) is a linear functional of the values of the process, this estimator has a normal distribution. Taking into account that ∆X^(N) = ∆B^(N) + θ∆G^(N), the estimator θ̂(N) can be represented in the following form:

θ̂(N) = θ + (∆G^(N))ᵀ(Γ^(N))⁻¹∆B^(N) / ((∆G^(N))ᵀ(Γ^(N))⁻¹∆G^(N)).   (4)

Hence, the estimator is unbiased and var θ̂(N) = 1/((∆G^(N))ᵀ(Γ^(N))⁻¹∆G^(N)).
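Numerically, the discrete MLE is a few lines of linear algebra. The sketch below is illustrative only (the grid, the drift G(t) = t², and the choice of a standard Brownian noise, which makes Γ^(N) diagonal, are all assumptions for the example, and the function name is hypothetical); the estimator is the ratio (∆G^(N))ᵀ(Γ^(N))⁻¹∆X^(N) over (∆G^(N))ᵀ(Γ^(N))⁻¹∆G^(N), whose reciprocal denominator matches the stated variance 1/((∆G^(N))ᵀ(Γ^(N))⁻¹∆G^(N)).

```python
import numpy as np

def mle_discrete(dX, dG, Gamma):
    """MLE of theta from increments dX ~ N(theta * dG, Gamma)."""
    w = np.linalg.solve(Gamma, dG)              # (Gamma^-1) dG
    return float(w @ dX) / float(w @ dG)

# Illustrative setup: standard Brownian noise (so Gamma is diagonal with the
# time-step lengths) and drift G(t) = t^2, i.e. g(s) = 2s.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 201)[1:]             # observation times t_1 < ... < t_N
dt = np.diff(np.concatenate(([0.0], t)))
Gamma = np.diag(dt)                             # covariance of Brownian increments
G = t**2
dG = np.diff(np.concatenate(([0.0], G)))

theta = 2.0
dB = rng.normal(0.0, np.sqrt(dt))               # independent Gaussian increments
dX = theta * dG + dB

theta_hat = mle_discrete(dX, dG, Gamma)
var_theta = 1.0 / float(dG @ np.linalg.solve(Gamma, dG))
```

With this grid the theoretical standard deviation of the estimator is below 0.03, so a single realization already lands close to the true θ = 2.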

The behaviour of the MLE for the increasing number of points
Let N_1 ≤ N_2, and let the set of points {t_k^{(1)}, 1 ≤ k ≤ N_1} be a subset of {t_k^{(2)}, 1 ≤ k ≤ N_2}. Then there exists a matrix M relating the increments w.r.t. these two sets of points: the vector of increments over the coarser set equals M times the vector of increments over the finer set.

If t^{(1)}_{N_1} = t^{(2)}_{N_2}, then each column of the matrix M contains exactly one 1. If t^{(1)}_{N_1} = t^{(2)}_k for some k < N_2, then each of the first k columns of the matrix M contains exactly one 1, and the other N_2 − k columns consist of zeros.

Lemma 1. If N_1 ≤ N_2 and the time-points of the process X_t used for the estimator θ̂(N_1) make a subset of the time-points used for the estimator θ̂(N_2), then the increment θ̂(N_2) − θ̂(N_1) is independent of the value θ̂(N_2).
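For nested partitions the matrix M can be built explicitly: a coarse increment is the sum of the fine increments of the subintervals it covers, so M is a 0/1 matrix in which each fine interval contributes to at most one coarse interval. A small sketch (the function name and the grids are illustrative assumptions):

```python
import numpy as np

def increment_matrix(t_coarse, t_fine):
    """0/1 matrix M with dX_coarse = M @ dX_fine for nested partitions.

    t_coarse must be a subset of t_fine (both listed without the origin 0).
    M[i, j] = 1 iff the j-th fine interval lies inside the i-th coarse one.
    """
    edges_c = np.concatenate(([0.0], t_coarse))
    edges_f = np.concatenate(([0.0], t_fine))
    M = np.zeros((len(t_coarse), len(t_fine)))
    for j in range(len(t_fine)):
        # index of the coarse interval containing (edges_f[j], edges_f[j+1]]
        i = np.searchsorted(edges_c, edges_f[j + 1]) - 1
        if i < len(t_coarse):
            M[i, j] = 1.0
    return M

t_c = np.array([1.0, 2.0])
t_f = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
M = increment_matrix(t_c, t_f)

X_f = np.array([0.3, 0.1, 0.7, 0.2, 0.9])             # X at the fine points
dX_f = np.diff(np.concatenate(([0.0], X_f)))
dX_c = np.diff(np.concatenate(([0.0], X_f[[1, 3]])))  # X at the coarse points
```

Fine intervals beyond the last coarse point produce zero columns, in line with the description of M above.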

Consistency of MLE
Theorem 1. Let the following assumption hold:

G(t_N)² / E[B_{t_N}²] → ∞ as N → ∞.   (5)

Then the estimator θ̂(N) is L_2-consistent, that is, E(θ̂(N) − θ)² → 0 as N → ∞.

Proof. The estimator is unbiased: E θ̂(N) = θ. The estimator constructed by the single observation X_t (for t with G(t) ≠ 0) equals

θ̂(1)_t = X_t / G(t)   (6)

and has the variance

var θ̂(1)_t = E[B_t²] / G(t)².   (7)

The estimator constructed by N observations has smaller variance according to Corollary 1. Therefore,

var θ̂(N) ≤ E[B_{t_N}²] / G(t_N)² → 0, N → ∞.

The following statement follows from the proof of [13, Thm. 2.7]. (Note that it can be generalized: the mean-square convergence condition can be replaced with convergence in probability, see [5, Thm. 3].)

Lemma 2. Suppose {ξ_k, k ≥ 1} is a sequence of random variables such that its elements ξ_2, ξ_3, ... (not including ξ_1) are mutually independent. If the series Σ_{k=1}^∞ ξ_k converges in the mean-square sense to a random variable ζ, then it converges a.s. to the same limit as well.
Theorem 2. Under the assumptions of Theorem 1, the estimator θ̂(N) is strongly consistent.

Proof. Let us show that the increments of the process {θ̂(N), N ∈ N} are uncorrelated.

The case of continuous-time observations
In this section, we suppose that the process X_t is observed on the whole interval [0, T]. We investigate the MLE for the parameter θ based on these observations.

Assumptions on function G and process B
Evidently, B and X are Gaussian processes with the same covariance function but, generally speaking, with different means, since G is not zero identically. Our additional assumptions are:

(A) There exists a linear self-adjoint operator Γ : L_2[0, T] → L_2[0, T] such that, for all f, h ∈ L_2[0, T],

E[∫_0^T f(t) dB_t ∫_0^T h(t) dB_t] = ⟨Γf, h⟩,   (8)

where ⟨f, h⟩ denotes the inner product in L_2[0, T].

(B) The drift function G is not zero identically, and in the representation G(t) = ∫_0^t g(s) ds, the restriction of g to [0, T] belongs to L_2[0, T].

https://www.mii.vu.lt/NA

Note that, under assumption (A), the covariance between integrals of deterministic functions w.r.t. the process B is expressed through the operator Γ.

Likelihood function
Now we establish the form of the likelihood function. To this end, introduce the notation F_{t_1,...,t_N} for the σ-field generated by the observations X_{t_1}, ..., X_{t_N}.

Theorem 3. Let T be fixed, let assumptions (A) and (B) hold, and additionally assume that there exists a function h_T ∈ L_2[0, T] such that Γh_T = g on [0, T]. Then one can take

L(θ) = exp{θ ∫_0^T h_T(t) dX_t − (θ²/2)⟨g, h_T⟩}   (9)

as a likelihood function.
Proof. Let us show that the function L(θ) defined in (9) is a density function for the distribution of the process {X_t, t ∈ (0, T]} for a given θ w.r.t. the distribution of the process {B_t, t ∈ (0, T]}, which coincides with {X_t, t ∈ (0, T]} when θ = 0. In other words, we need to prove that P_θ(A) = E[1_A L(θ)] for every observable event A, where P_θ is the probability measure that corresponds to the value θ of the parameter. For that reason, let ϑ ∈ R be fixed and prove this identity for θ = ϑ. It is enough to verify that, for all N and all t_1, ..., t_N,

P_ϑ(A) = E[1_A L(ϑ)] for all A ∈ F_{t_1,...,t_N}.   (10)

Under assumption (B), there exists t_nz ∈ (0, T] such that G(t_nz) ≠ 0. We can always assume that the inequality G(t_k) ≠ 0 holds for at least one of the observations X_{t_k} (otherwise, due to the fact that A ∈ F_{t_1,...,t_nz,...,t_N}, we can insert t_nz into the set t_1, ..., t_N and repeat what follows). For such A, let us evaluate E[1_A L(ϑ)]. The vector

v = (∫_0^T h_T(t) dB_t, ∆B^(N))ᵀ

has a multivariate Gaussian distribution because all its elements are linear functionals of B_t. Its mean is Ev = 0; let us evaluate its covariance matrix. The (k + 1, 1) element of the matrix E[vvᵀ] is equal to

E[(B_{t_k} − B_{t_{k−1}}) ∫_0^T h_T(t) dB_t] = G(t_k) − G(t_{k−1}),

which is the kth element of the vector ∆G^(N) (here t_0 = 0); thus, the lower-left block of the matrix E[vvᵀ] is equal to E[∆B^(N) ∫_0^T h_T(t) dB_t] = ∆G^(N). The other blocks are var[∫_0^T h_T(t) dB_t] = ⟨Γh_T, h_T⟩ = ⟨g, h_T⟩ and var(∆B^(N)) = Γ^(N). Thus, the covariance matrix of the vector v is equal to

E[vvᵀ] = [ ⟨g, h_T⟩    (∆G^(N))ᵀ
           ∆G^(N)      Γ^(N)    ].

By [1, Thm. 2.5.1], the conditional distribution of ∫_0^T h_T(t) dB_t given ∆B^(N) is Gaussian with mean (∆G^(N))ᵀ(Γ^(N))⁻¹∆B^(N) and variance ⟨g, h_T⟩ − (∆G^(N))ᵀ(Γ^(N))⁻¹∆G^(N).   (11)

Since this conditional variance is nonnegative,

⟨g, h_T⟩ ≥ (∆G^(N))ᵀ(Γ^(N))⁻¹∆G^(N).   (12)

Finally, computing E[1_A L(ϑ)] via this conditional distribution yields P_ϑ(A). Thus, (10) is proved. From the fact that (10) holds true for all sets of t_1, ..., t_N and for all ϑ ∈ R, it follows that L(θ) is a likelihood function.

MLE and its properties
The MLE maximizes the function L(θ). It equals

θ̂_T = ∫_0^T h_T(t) dX_t / ⟨g, h_T⟩.   (13)

Since X_t = B_t + θ ∫_0^t g(s) ds, we have the following representation:

θ̂_T = θ + ∫_0^T h_T(t) dB_t / ⟨g, h_T⟩.   (14)

We see that the estimator is normally distributed and unbiased. Its variance equals var θ̂_T = 1/⟨g, h_T⟩. Under the assumptions of Theorem 3, the denominator in the definition of the estimator (13) is positive. Indeed, by (12),

⟨g, h_T⟩ ≥ (∆G^(N))ᵀ(Γ^(N))⁻¹∆G^(N) > 0.

The last inequality holds if G(t_k) ≠ 0 for at least one k because, in this case, ∆G^(N) ≠ 0, and the matrix (Γ^(N))⁻¹ is positive definite.
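When B is a standard Brownian motion, assumption (A) holds with Γ equal to the identity operator (the Itô isometry), so h_T = g and the estimator from (13) reduces to the ratio of ∫_0^T g(t) dX_t over ∫_0^T g(t)² dt. The sketch below discretizes this special case; the drift g(s) = 2s, the horizon, and the grid are illustrative assumptions.

```python
import numpy as np

# Sketch for B = standard Brownian motion, where Gamma = identity, h_T = g,
# and  theta_T = (int_0^T g dX) / (int_0^T g^2 ds),  var theta_T = 1/<g, g>.
rng = np.random.default_rng(1)
T, N = 5.0, 4000
dt = T / N
s = np.linspace(dt, T, N)
g = 2.0 * s                                  # illustrative g, so G(t) = t^2

theta = -1.5
dB = rng.normal(0.0, np.sqrt(dt), size=N)
dX = theta * g * dt + dB                     # increments of X_t = theta*G(t) + B_t

num = np.sum(g * dX)                         # Riemann-Ito sum for int g dX
den = np.sum(g**2) * dt                      # int_0^T g(s)^2 ds
theta_T = num / den
var_theta_T = 1.0 / den                      # var theta_T = 1/<g, h_T>
```

Here ⟨g, h_T⟩ = ∫_0^T 4s² ds ≈ 167, so the standard deviation of θ̂_T is below 0.08 and one realization already sits near the true value.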

Relations between discrete and continuous estimators
Let t_k = Tk/N, k = 1, ..., N. Consider the discrete estimator θ̂(N) and the continuous estimator θ̂_T. Using (4) and (14), we can express the difference θ̂(N) − θ̂_T through the vector v defined in the proof of Theorem 3. By (11), we get the conditional distribution of this difference given ∆B^(N), whence

var θ̂_T ≤ var θ̂(N).   (15)

Note that the function G(t) is continuous. Therefore, under the assumptions of Theorem 3, there exists N_0 such that, for all N > N_0, G(Tk/N) ≠ 0 for some 1 ≤ k ≤ N.
It follows from assumption (8) that

(∆G^(N))ᵀ(Γ^(N))⁻¹∆G^(N) → ⟨g, h_T⟩, N → ∞.   (16)

Recall also that ⟨Γh_T, h_T⟩ = ⟨g, h_T⟩ and var θ̂_T = 1/⟨g, h_T⟩. Hence, convergence (16) can be written in the form

var θ̂(N) → var θ̂_T, N → ∞.   (17)

For sufficiently large N, we have G(Tk/N) ≠ 0 for some 1 ≤ k ≤ N. Taking into account Lemma A.3 and convergence (17), we get cov(θ̂(N), θ̂_T) → var θ̂_T. By (15),

E(θ̂(N) − θ̂_T)² = var θ̂(N) − 2 cov(θ̂(N), θ̂_T) + var θ̂_T → 0, N → ∞,

whence the proof follows.
Theorem 5. Under the assumptions of Theorem 3, the estimator θ̂(N) constructed by the observations X_{Tk/N}, k = 1, ..., N, converges to θ̂_T a.s. as N → ∞.

The proof repeats that of Theorem 2, where the reference to Theorem 1 is replaced by the reference to Theorem 4.

Consistency of the estimator
Theorem 6. Assume that for all T > 0, there exists a function h_T ∈ L_2[0, T] such that g|_{[0,T]} = Γ_T h_T (Γ_T denotes the dependence of the operator Γ on T). If

lim sup_{t→+∞} G(t)² / E[B_t²] = +∞,   (18)

then the estimator θ̂_T is consistent in mean square, that is, E(θ̂_T − θ)² → 0 as T → ∞.

Proof. By (18), there exists an increasing sequence of positive numbers {t_k, k ∈ N} such that lim_{k→∞} t_k = +∞, the inequality G(t_k) ≠ 0 holds for all k, and G(t_k)²/E[B_{t_k}²] → +∞ as k → ∞. Denote by t(T) the largest t_k that does not exceed T. The estimator θ̂_T is unbiased. Compare its variance with the variance of the estimator θ̂(1)_{t(T)} constructed by the single observation X_{t(T)} (see (6) and (7) for the estimator and its variance). According to inequality (15),

var θ̂_T ≤ var θ̂(1)_{t(T)} = E[B_{t(T)}²] / G(t(T))² → 0, T → ∞.

Lemma 3. The stochastic process θ̂_T (defined for all T such that ∫_0^T h_T(s) ds ≠ 0) is a process with independent increments.

Proof. Let us calculate the covariance between the estimators θ̂_{T_1} and θ̂_{T_2} with T_1 ≤ T_2. The values of θ̂_T are linear functionals of the Gaussian process X_t. Hence they have a joint Gaussian distribution, and the uncorrelatedness of the increments θ̂_{T_2} − θ̂_{T_1} and θ̂_{T_4} − θ̂_{T_3} implies their independence.

Theorem 7. Under the assumptions of Theorem 6, the estimator θ̂_T is strongly consistent.
The proof repeats the proof of [13, Thm. 3].

Model with fractional Brownian motion and linear drift

If H ∈ (1/2, 1), then the operator Γ = Γ_T^H that satisfies (8) for B^H equals

(Γ_T^H f)(t) = H(2H − 1) ∫_0^T f(s)|t − s|^{2H−2} ds,   t ∈ [0, T].

Consider model (1) for G(t) = t and B = B^H:

X_t = θt + B_t^H.   (20)

Let us construct the estimators θ̂(N) and θ̂_T from (3) and (13), respectively, and establish their properties. In particular, Proposition 1 allows us to define the finite-sample estimator θ̂(N).
Proposition 1. The linear equation Γ_T^H f = 0 has only the trivial solution in L_2[0, T]. As a consequence, the finite slice (B^H_{t_1}, ..., B^H_{t_N}) with 0 < t_1 < ... < t_N has a multivariate normal distribution with a nonsingular covariance matrix.

Proof. Let f ∈ L_2[0, T] satisfy Γ_T^H f = 0. Extending f by a suitable reflection to a function f_1 on (0, 2T), one obtains that f_1 solves ∫_0^{2T} f_1(s)|t − s|^{2H−2} ds = 0 for almost all t ∈ (0, 2T). By statement (ii) of Lemma A.4, f_1(s) = 0 almost everywhere on (0, 2T), whence f(s) = 0 almost everywhere on (0, T). Thus, the operator Γ_T^H is self-adjoint, compact, and positive definite. It admits the spectral representation

Γ_T^H f = Σ_k λ_k ⟨f, e_k⟩ e_k,

where λ_k > 0 are the eigenvalues and {e_k} is an orthonormal system of eigenfunctions. Now suppose that Σ_k v_k B^H_{t_k} has zero variance for some vector v = (v_1, ..., v_N). Writing this linear combination as ∫_0^T f_v(s) dB^H_s with a step function f_v, we get ⟨Γ_T^H f_v, f_v⟩ = 0. That is only possible if f_v = 0 almost everywhere on [0, T] and v = 0.
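The nonsingularity claim of Proposition 1 can be probed numerically by assembling the covariance matrix of a finite slice from the standard fBm covariance R(s, u) = ½(s^{2H} + u^{2H} − |s − u|^{2H}) and confirming that its smallest eigenvalue is positive. A sketch (the grid and the H values are illustrative choices, and the function name is hypothetical):

```python
import numpy as np

def fbm_cov(t, H):
    """Covariance matrix of the slice (B^H_{t_1}, ..., B^H_{t_N}) of fBm,
    with R(s, u) = 0.5 * (s^(2H) + u^(2H) - |s - u|^(2H))."""
    s, u = np.meshgrid(t, t, indexing="ij")
    return 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))

t = np.linspace(0.1, 2.0, 20)
min_eigs = {}
for H in (0.6, 0.75, 0.9):
    C = fbm_cov(t, H)
    min_eigs[H] = float(np.linalg.eigvalsh(C).min())   # should be > 0
```

The matrices become ill-conditioned as H grows, but on a moderate grid the smallest eigenvalue stays well above machine precision, consistent with the proposition.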
By Proposition 1, the random process B^H satisfies the assumptions of Theorems 1 and 2. Hence, under the condition t_N → +∞ as N → ∞, the estimator θ̂(N) is L_2-consistent and strongly consistent.
In order to define the maximum likelihood estimator (13), we have to solve the integral equation Γ_T^H h_T = g. The following statement guarantees the existence of the solution.

Proof. By Lemma A.5, the corresponding integral equation on (0, 2T) has a solution y such that the required identity holds for almost all t ∈ (0, T), and h(s) = (y(T + s) − y(T − s))/(H(2H − 1)) is a solution to integral equation (22). Note that h ∈ L_2[0, T], and this finishes the proof.

As a result, L(θ) defined in (9) is the likelihood function in model (20), and θ̂_T defined in (13) is the maximum likelihood estimator. The estimator is L_2-consistent and strongly consistent.

Model with fractional Brownian motion and power drift
Let 1/2 < H < 1 and α > −1/2. Consider the process

X_t = θ t^{α+1} + B^H_t,   (23)

where X_t is a stochastic process observed on the interval [0, T], B^H_t is an unobserved fractional Brownian motion with Hurst index H, and θ is the parameter of interest. This is a particular case of model (1) with g(t) = (α + 1)t^α. Now let us verify the conditions of the theorems. Due to Proposition 1, any finite slice of the stochastic process {B^H_t, t > 0} has a multivariate normal distribution with a nonsingular covariance matrix; the process satisfies condition (A) with an injective operator Γ. Condition (5) holds true if and only if α > H − 1.
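To see how the discrete estimator behaves in this model, the following Monte Carlo sketch simulates X_t = θ t^{α+1} + B^H_t on a grid via the Cholesky factor of the fBm covariance and averages the estimates. All parameter values, the grid, and the replication count are illustrative assumptions, not the paper's simulation setup.

```python
import numpy as np

# Monte Carlo sketch of the discrete MLE in  X_t = theta * t^(alpha+1) + B^H_t
# (g(t) = (alpha+1) t^alpha, so G(t) = t^(alpha+1)); parameters illustrative.
rng = np.random.default_rng(2)
H, alpha, theta = 0.7, 1.0, 2.0
t = np.linspace(0.05, 5.0, 100)

s, u = np.meshgrid(t, t, indexing="ij")
C = 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))  # fBm slice cov

# Pass to increments: dZ = D @ Z, with D the first-difference matrix.
D = np.eye(len(t)) - np.eye(len(t), k=-1)
Gamma = D @ C @ D.T                          # covariance of the increments of B^H
G = t**(alpha + 1)
dG = D @ G

L = np.linalg.cholesky(C)
est = []
for _ in range(200):
    B = L @ rng.normal(size=len(t))          # fBm sample on the grid
    dX = D @ (theta * G + B)
    w = np.linalg.solve(Gamma, dG)
    est.append(float(w @ dX) / float(w @ dG))
est = np.array(est)
```

The sample mean of the estimates should land close to θ = 2, and their spread is bounded by the single-observation variance E[B_T²]/G(T)² from (7).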
Corollary 5. If α > 2H − 3/2, then the conditions of Theorems 3–7 are satisfied. The estimator θ̂_T is consistent, L_2-consistent, and strongly consistent. For fixed T, it can be approximated by the discrete-sample estimator in the mean-square sense.

Simulations
Tables 1 and 2 contain the results of numerical simulations for model (23) with α = 1 and α = 2, respectively. For T = 1 and T = 10 and various values of H, we find h_T directly by (24). For θ = 2, we simulate 1000 realizations of the process for each H and compute the values of θ̂_T by (13). The means and standard deviations of these estimates are reported. We see that these simulation studies confirm the consistency of θ̂_T. The results are quite similar for different values of H. Moreover, the increase of α increases the rate of convergence.

Appendix

In what follows, we assume that g(t) is not zero everywhere. In particular, V_n > 0 for sufficiently large n. Therefore, the numerator in (A.3) tends to 0. It follows from the uniform boundedness of s_n(t) and (A.2) that the same convergence holds for all n such that V_n > 0.

Proof. The statement of the lemma holds if b = 0. Otherwise, M⁻¹ is also positive definite; therefore, bᵀM⁻¹b > 0.

Lemma A.4. Let 0 < p < 1 and b > 0.
If y(x) satisfies the integral equation for almost all t ∈ (0, b), then y(x) satisfies the corresponding relation almost everywhere on [0, b], where D^α_{a+} and D^α_{b−} are fractional derivatives; conversely, if the relation holds for almost all t ∈ (0, b) and for almost all x ∈ (0, b), then y(s) is a solution to integral equation (A.4).
Sketch of proof. The integral equation is solved in [6, 11]. In [6, Sect. 2.3], the equation is rewritten in an equivalent form; for the new equation, the statement of the theorem can readily be obtained.

Theorem 4. Let the assumptions of Theorem 3 hold. Construct the estimator θ̂(N) from (3) by the observations X_{Tk/N}, k = 1, ..., N. Then θ̂(N) converges to θ̂_T in mean square.

Proof. By Lemma A.2, there exists a sequence of piecewise constant functions f_N : [0, T] → R (constant on the intervals ((k − 1)T/N, kT/N)) such that f_N → h_T in L_2[0, T], and ∫_0^T f_N(s)g(s) ds = ∫_0^T h_T(s)g(s) ds for sufficiently large N. The function f_N(t) can always be chosen in the form of a linear combination of the indicators of these intervals.

Lemma A.3. Let M be a positive definite n × n matrix, and let a and b be n-dimensional vectors. Then

aᵀMa · bᵀM⁻¹b ≥ (aᵀb)².
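Lemma A.3 is the Cauchy–Schwarz inequality applied to the vectors M^{1/2}a and M^{−1/2}b, since aᵀb = ⟨M^{1/2}a, M^{−1/2}b⟩. A quick randomized check (illustrative sketch):

```python
import numpy as np

# Randomized check of Lemma A.3:  (a' M a)(b' M^-1 b) >= (a' b)^2
# for positive definite M; Cauchy-Schwarz for M^(1/2) a and M^(-1/2) b.
rng = np.random.default_rng(3)
for _ in range(100):
    n = int(rng.integers(2, 6))
    A = rng.normal(size=(n, n))
    M = A @ A.T + n * np.eye(n)              # positive definite by construction
    a, b = rng.normal(size=n), rng.normal(size=n)
    lhs = float(a @ M @ a) * float(b @ np.linalg.solve(M, b))
    rhs = float(a @ b) ** 2
    assert lhs >= rhs - 1e-9 * max(1.0, rhs)
```

Equality holds exactly when M^{1/2}a and M^{−1/2}b are proportional, i.e. when b is proportional to Ma.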