Drift estimation with non-Gaussian noise using Malliavin calculus

Abstract: The aim of this paper is to show the existence of drift estimators dominating the standard one in continuous-time models of the form $X_t = u_t + Z_t$, where $u_t$ is the drift and $Z_t$ is either a Brownian martingale or a non-martingale noise living in the second Wiener chaos. Our results are based on Malliavin calculus techniques and extend previous findings of Privault and Réveillac (2008).


Overview
In this paper we consider the problem of estimating a (possibly random) drift $(u_t)_{t\in[0,T]}$ in continuous-time statistical models of the type
$$X_t = u_t + Z_t, \quad t \in [0,T], \tag{1.1}$$
where $(X_t)_{t\in[0,T]}$ is the observation process and $(Z_t)_{t\in[0,T]}$ is a noise living in a fixed Wiener chaos of a Brownian motion. Our results allow in particular to deal in detail with the following cases: (i) $(Z_t)_{t\in[0,T]}$ is a chaotic Brownian martingale, (ii) $(Z_t)_{t\in[0,T]}$ is a Rosenblatt process (living in the second Wiener chaos).
Our main finding (see Theorem 4.1) is that, under fairly general circumstances, it is possible to find an estimator of the drift whose risk is smaller than that of the standard estimator $X_t$. Our results generalize the recent findings by Privault and Réveillac [20], which only dealt with the case of a Gaussian noise. As such, our work can be regarded as an infinite-dimensional extension of the seminal works by Stein [23], [24] and James and Stein [12], which first described analogous phenomena in a finite-dimensional setting.

History and motivation
In his famous paper [23], Stein has shown that for a normally distributed $d$-dimensional random vector $X$ with unknown mean $\mu$ and covariance matrix $I$, the standard (unbiased) estimator $X$ for the mean $\mu$ is inadmissible with respect to the quadratic loss function if and only if $d \ge 3$. We recall that an estimator $\delta_1$ is said to dominate an estimator $\delta_2$ if for every $\mu \in \mathbb{R}^d$ we have $E\big[\|\delta_1 - \mu\|^2\big] \le E\big[\|\delta_2 - \mu\|^2\big]$, with a strict inequality for at least one $\mu$. In other words, there are estimators that dominate the standard estimator $X$ for the mean if and only if $d \ge 3$; for $d = 1$ and $d = 2$, the standard unbiased estimator $X$ is admissible, see [23]. More precisely, Stein has shown that for $d \ge 3$, for sufficiently large $a > 0$ and sufficiently small $b > 0$, the estimator
$$\delta(X) = \Big(1 - \frac{b}{a + \|X\|^2}\Big)X = X - \frac{bX}{a + \|X\|^2} \tag{1.2}$$
is possibly biased, but has smaller risk than the standard estimator, that is,
$$E\big[\|\delta(X) - \mu\|^2\big] \le E\big[\|X - \mu\|^2\big], \quad \mu \in \mathbb{R}^d, \tag{1.3}$$
with a strict inequality for at least one $\mu$. In 1962, James and Stein [12] proved that estimators of the form
$$\delta(X) = \Big(1 - \frac{b}{\|X\|^2}\Big)X = X - \frac{bX}{\|X\|^2} \tag{1.4}$$
dominate $X$ for every $0 < b < 2(d-2)$, where $d \ge 3$ is again the dimension of the random vector $X$. In 1981, Stein published an important article (see [24]) in which he gave a much simpler proof of the earlier result using the technique of integration by parts. He established a link to superharmonic functions and gave a criterion for an estimator to have smaller risk than the standard estimator.
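The domination result in Eq. (1.4) is easy to check by simulation. The following Monte Carlo sketch compares the empirical risks of the standard estimator and of a James-Stein estimator; the choices $d = 5$, $b = d - 2$, $\mu = 0$ and the sample size are illustrative and not taken from the paper.

```python
import numpy as np

# Monte Carlo comparison of the James-Stein estimator (1 - b/||X||^2) X
# with the standard estimator X, for X ~ N(mu, I_d) and d >= 3.
# All concrete choices (d, b, mu, sample size) are illustrative.
rng = np.random.default_rng(0)
d, n = 5, 200_000
b = d - 2           # any 0 < b < 2(d - 2) yields domination
mu = np.zeros(d)    # the risk improvement is largest at mu = 0

X = rng.standard_normal((n, d)) + mu
shrink = 1.0 - b / np.sum(X**2, axis=1)       # factor (1 - b/||X||^2)
delta = shrink[:, None] * X                   # James-Stein estimate

risk_std = np.mean(np.sum((X - mu) ** 2, axis=1))     # exact value: d
risk_js = np.mean(np.sum((delta - mu) ** 2, axis=1))  # ~ 2 at mu = 0, b = d - 2
print(risk_std, risk_js)
```

At $\mu = 0$ the exact risks are $d = 5$ and $d - b\,(2(d-2) - b)\,E[\|X\|^{-2}] = 2$, so the simulated gap is substantial.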
In parallel with these developments, which concern normally distributed random vectors, research has concentrated mostly on two aspects: (i) considering spherically symmetric distributions more general than the Gaussian distribution, (ii) finding more general estimators that dominate the standard unbiased estimator (in the normal case and in the more general spherically symmetric case).
An overview of the theory can be found in [3], [4] and [5]. We give only a few examples of these extensions. For instance, Brandwein proved (see [2]) that, for the quadratic loss function and a spherically symmetric $d$-dimensional random vector $X$ such that $E[\|X\|^2] < \infty$ and $E[\|X\|^{-2}] < \infty$, a Stein-type estimator (see [2] for its precise form) dominates $X$ if $d \ge 4$ and $0 < a < [2(d-2)/d]\,/\,E[\|X\|^{-2}]$. Brandwein and Strawderman generalized the Stein-type estimator in 1991 by considering again the quadratic loss and more general estimators of the form $\delta_a(X) = X + a\,g(X)$, see [4]. They show that $\delta_a(X)$ has smaller risk than $X$ if $d \ge 4$, $0 < a < 1/\big(d\,E[\|X\|^{-2}]\big)$ and some conditions on $g$ hold (see [4] for these conditions).
It is worth noticing that superharmonic functions and the divergence theorem play a crucial role here, and that the divergence theorem is closely connected to integration by parts. The technique of integration by parts was also used by Shinozaki [22] to prove the existence of estimators that dominate the standard estimator for location parameters. More precisely, the existence of estimators dominating the standard estimator for the location parameters of $Z$ is proved under the assumptions that the $Z_i$ are independent and identically distributed with $E[Z_i] = E[Z_i^3] = 0$, $E[Z_i^2] = 1$ and $E[Z_i^4] = k$.
Apart from these results, which concern classical probability theory, integration by parts has found applications in a paper by Evans and Stark (see [10]). In connection with stochastic processes, Girsanov's theorem is used there to prove the following general result concerning the existence of an estimator of the form (1.2) satisfying relation (1.3): if $X = Z + \theta$ is a $d$-dimensional random vector with $d \ge 3$, and if $Z$ is not almost surely $0$, $E[Z] = 0$, $E[\|Z\|^2] < \infty$ and $E[\|Z + \theta\|^{2-d}] \le \|\theta\|^{2-d}$, $\theta \in \mathbb{R}^d$, then $\delta(X) = \big(1 - a/(1 + \|X\|^2)\big)X$ dominates $X$ for every sufficiently small $a > 0$. The techniques used in this last paper are non-standard and very different from those used in previous works. In recent years, research has turned to stochastic processes, and Stein's approach has proved effective in this field as well. Privault and Réveillac have considered the problem of estimating the drift of a Gaussian process. We sketch below the main aspects of the setting considered by these authors (for details, see [20]). For $T > 0$, they consider a real-valued Gaussian process $(X_t)_{t\in[0,T]}$ with covariance function $\gamma(s,t) = E[X_s X_t]$, $s,t \in [0,T]$, on a probability space $(\Omega, \mathcal{F}, P)$, where $\mathcal{F}$ is the $\sigma$-algebra generated by $X$. The process $(X_t)_{t\in[0,T]}$ is represented as an isonormal Gaussian process on the real separable Hilbert space $H$ generated by the functions $\chi_t : s \mapsto \min\{s,t\}$ for $s,t \in [0,T]$, with the scalar product $\langle \cdot,\cdot\rangle_H$ and the norm $\|\cdot\|_H$ defined through $\langle \chi_t, \chi_s\rangle_H := \gamma(s,t)$.

Then $X(\chi_t) := X_t$, $t \in [0,T]$, and $\{X(h) : h \in H\}$ is a family of centered Gaussian random variables satisfying $E[X(h)X(g)] = \langle h, g\rangle_H$, $h, g \in H$.
The authors consider a one-dimensional Gaussian process $(X_t)_{t\in[0,T]}$ with $X_t = u_t + X_t^u$, where $(u_t)_{t\in[0,T]}$ is an adapted process and $(X_t^u)_{t\in[0,T]}$ is a centered Gaussian process under a probability $P_u$ which is the translation of $P$ on $\Omega$ by $u$. Malliavin calculus and the integration by parts formula (see formula (3.2)) are used to construct an estimator with lower risk than the standard estimator $X$ for the drift. The key to this approach is an infinite-dimensional version of Stein's lemma. Their estimator $\delta(X)$ for a deterministic drift is biased and anticipating, but has smaller risk than the standard estimator $X_t$ for the drift (inequality (1.5)). For ease of notation, we use the shorthand $\lambda_T := \lambda|_{[0,T]}$. The estimator $\delta(X)$ involves the orthogonal projection $\Pi_d$ onto the space generated by $X(h_1), \ldots, X(h_d)$ and auxiliary functions defined through the $h_i$ and the covariance structure of the underlying Gaussian process (see [20] for details). This estimator has the same form as the James-Stein estimator in Eq. (1.4). In fact, writing $X = (X_1, \ldots, X_d)^\top$, the classical James-Stein estimator in Eq. (1.4) can be written as
$$\delta(X) = \sum_{i=1}^d \Big(X_i - \frac{b\,X_i}{\|X\|^2}\Big)e_i,$$
where $e_i$ denotes the $i$-th vector of the canonical basis of $\mathbb{R}^d$. Privault and Réveillac also prove an interesting Cramer-Rao type bound, inequality (1.6), that allows one to compare any unbiased and adapted drift estimator $\xi$ with the standard drift estimator $X$ in their setting.

Inequality (1.6) shows, for the Gaussian process $(X_t)_{t\in[0,T]}$ with $X_t = u_t + X_t^u$, that $X_t$ is the best unbiased estimator for the drift $u_t$ and realizes the minimal risk among all adapted drift estimators. On the other hand, inequality (1.5) shows that there are biased and anticipating estimators that improve upon this estimator. The fact that the estimators verifying inequality (1.5) are biased has its analogue in the classical situation considered by Stein. The methods used in the Gaussian setting have been applied to Poisson processes and to the fractional Brownian motion as well (see [21] and [9]).

Main results and plan
In the framework of the continuous-time model (see Eq. (1.1)), our main findings are the following: (1) a Cramer-Rao type bound under the assumption that $(Z_t)_{t\in[0,T]}$ is a Brownian martingale, a situation in which $X_t$ is the best unbiased adapted drift estimator, see Theorem 2.1 and Theorem 2.2; (2) the existence of biased estimators with smaller risk than that of the standard estimator in the case of a noise living in the second Wiener chaos, under fairly general circumstances, see Theorem 4.1; (3) applications and examples that illustrate the previous result, see Section 6.
The paper is organized as follows:
- In Section 2, we introduce the necessary notations. In this section we consider processes of the form $X_t = u_t + Z_t$, where the noise $(Z_t)_{t\in[0,T]}$ is a martingale with respect to the filtration generated by the Brownian motion $(W_t^u)_{t\in[0,T]}$. For the case of a deterministic drift $(u_t)_{t\in[0,T]}$ with $\dot u \in L^2([0,T], \lambda_T)$, we show that the risk of an unbiased adapted drift estimator $(\xi_t)_{t\in[0,T]}$ cannot be lower than the risk of the standard estimator.
- In Section 3, we give a brief introduction to Malliavin calculus. We need the basic elements of this theory for the forthcoming proofs.
- In Section 4, we consider (for a deterministic drift) processes of the form $X_t = u_t + Z_t$ where the noise $(Z_t)_{t\in[0,T]}$ is not necessarily a martingale. We define an estimator whose risk is smaller than the risk of the standard estimator. Combining this result with the Cramer-Rao type bound of Section 2, we conclude that in the martingale case our estimator is superefficient and that the standard estimator is inadmissible.
- In Section 5, we point out extensions of the main result.
- In Section 6, we apply our main result to the case where the noise $(Z_t)_{t\in[0,T]}$ is a Gaussian process, a Rosenblatt process or a chaotic Brownian martingale.
- In Section 7, we estimate a constant $a$ that is needed to define our James-Stein estimator, for the case of a Rosenblatt process and for the case of a chaotic Brownian martingale. In Eq. (1.7), $a$ is a positive constant, $g(t)$ is a vector $(g_1(t), \ldots, g_d(t))^\top$ where every $g_i : [0,T] \to \mathbb{R}$ is a continuous function, $B$ is a positive definite matrix and $X(h)$ is a vector of $d$ observations.
- In Section 8, we give without proof a discrete version of the main theorem for the case of a particular non-Gaussian noise.

Notations
In this section we study Cramer-Rao bounds for the model in Eq. (1.1), in the case where $(Z_t)_{t\in[0,T]}$ is a square-integrable Brownian martingale. This result covers, in particular, the case of chaotic martingales; these are special cases of the noise processes studied in Section 6.1. More specifically, we consider $T > 0$, a measurable space $(\Omega, \mathcal{F})$ and a measurable mapping defining the observation process. The stochastic process $(u_t)_{t\in[0,T]}$ is called the drift; we suppose that it admits a representation of the form given in Eq. (2.1). Suppose that there is a probability measure $P_u$ on $(\Omega, \mathcal{F})$, a filtration $(\mathcal{F}_t)_{t\in[0,T]}$ and stochastic processes $(b_t)_{t\in[0,T]}$, $(W_t^u)_{t\in[0,T]}$ and $(X_t)_{t\in[0,T]}$ such that $(W_t^u)_{t\in[0,T]}$ is a standard Brownian motion with respect to $P_u$ and $(\mathcal{F}_t)_{t\in[0,T]}$, and the noise $(X_t - u_t)_{t\in[0,T]}$ is an $(\mathcal{F}_t)_{t\in[0,T]}$-martingale with respect to $P_u$.

We suppose moreover some additional conditions on these processes. From now on, we suppose that the process $(u_t)_{t\in[0,T]}$ has a representation as in Eq. (2.1) and is square-integrable with respect to $P_u \otimes \lambda_T$. The process $\big(\int_0^t b_s\,dW_s^u\big)_{t\in[0,T]}$ is an $(\mathcal{F}_t)_{t\in[0,T]}$-martingale with respect to $P_u$. Conversely, every $(\mathcal{F}_t)_{t\in[0,T]}$-martingale $(M_t)_{t\in[0,T]}$ (with respect to $P_u$) has such a representation if $E_u[M_0] = 0$.
A process $(\xi_t)_{t\in[0,T]}$ is called an unbiased drift estimator if $E_u[\xi_t] = E_u[u_t]$, $t \in [0,T]$, for all square-integrable adapted processes $(u_t)_{t\in[0,T]}$ as defined above. It is called adapted if the process $(\xi_t)_{t\in[0,T]}$ is $\mathcal{F}_t$-adapted (see [20]).

Cramer-Rao type bound
We are now going to state the two main findings of the present section, namely: (i) Theorem 2.1, containing a Cramer-Rao bound for possibly random drifts, under some restrictive assumptions on $(b_t)_{t\in[0,T]}$; (ii) Theorem 2.2, dealing with Cramer-Rao bounds for deterministic drifts, but with no technical assumptions on $(b_t)_{t\in[0,T]}$.

Theorem 2.1. Consider a drift $(u_t)_{t\in[0,T]}$ with $u \in L^2(\Omega \times [0,T], P_u \otimes \lambda_T)$ and the situation of Section 2.1. Suppose that an exponential integrability (Novikov-type) condition on $(u_s/b_s)_{s\in[0,T]}$ holds for every parameter in an arbitrarily small neighbourhood $U$ of $0$, and that $\int_0^\cdot b_s^2\,ds \in L^2(\Omega \times [0,T], P_u \otimes \lambda_T)$. Then a Cramer-Rao type lower bound holds for every square-integrable, unbiased adapted drift estimator $(\xi_t)_{t\in[0,T]}$. In particular, $X$ realizes the minimal risk for this class of estimators.

C. Krein
Proof. See Appendix 9.1. The proof is based on the proof given in [20]. The conditions are used to verify the assumptions of Girsanov's theorem and of Lebesgue's dominated convergence theorem (to interchange derivatives and integrals).

Theorem 2.2. Consider the situation of Section 2.1 and a square-integrable, unbiased adapted drift estimator $(\xi_t)_{t\in[0,T]}$, i.e. $\xi_t$ is $\mathcal{F}_t$-adapted and $E_u[\xi_t] = E_u[u_t]$, $t \in [0,T]$, for all square-integrable $\mathcal{F}_t$-adapted processes $(u_t)_{t\in[0,T]}$. If $(u_t)_{t\in[0,T]}$ is deterministic with $\int_0^T \dot u_t^2\,dt < \infty$, then the risk of $(\xi_t)_{t\in[0,T]}$ is bounded below by the risk of the standard estimator.

Proof. See Appendix 9.2. The proof is based on an approximation argument and elementary adapted processes.
Remark 2.3. The techniques used in the proofs of Theorems 2.1 and 2.2 make use of the martingale property of the noise $\big(\int_0^t b_s\,dW_s^u\big)_{t\in[0,T]}$. For the general case, it is unknown to the author whether a Cramer-Rao type bound continues to hold if the martingale assumption is dropped. However, there are special cases outside the martingale setting for which a bound of this type holds. In [9], the authors prove the existence of a Cramer-Rao type bound if the noise $(X_t - u_t)_{t\in[0,T]}$ is a fractional Brownian motion with Hurst parameter $0 < H < 1/2$. The proof is based on a version of Girsanov's theorem for the fractional Brownian motion (see [18] or [6]). Under the martingale assumptions, the theorems above stress the optimality of the standard estimator in the class of all unbiased adapted drift estimators. We now provide some comparisons (essentially concerning the techniques of proof) with other related bounds in the literature.
(a) In the framework of stochastic calculus and stochastic processes, Girsanov's theorem is used to prove a Cramer-Rao type inequality (see [20], [21] and [9]). The underlying idea is to use Girsanov's theorem to interchange expectation and differentiation. An inequality is found that allows the direct comparison of adapted and unbiased drift estimators with the standard estimator.
(b) Evans and Stark also use Girsanov's theorem but make no statement about the optimality of their estimator. They only affirm that, for the considered class of processes, their estimator dominates the standard estimator (see [10]).
(c) James and Stein considered the special case of a random variable $X$ that is normally distributed. For this particular case, it is known that the standard estimator is the best unbiased estimator for the expectation of $X$.
(d) In the framework of classical statistics, Brandwein, Strawderman and Shinozaki do not affirm that their estimators are uniformly minimum-variance unbiased estimators. Instead they consider the class of invariant estimators. An overview of the theory of (location) invariant estimators can be found in [13]. The authors suppose that the random variable $X$ has a density of the form $f(x - \mu)$ for $x, \mu \in \mathbb{R}^d$; the location parameter $\mu$ is the mean of $X$. Then for every $a \in \mathbb{R}^d$, with $x' = x + a$ and $\mu' = \mu + a$, we have $f(x' - \mu') = f(x - \mu)$. The squared loss function shares this invariance property, and the problem of estimating $\mu$ is thus called location invariant. Estimators $\delta$ that verify $\delta(X + a) = \delta(X) + a$ are called location invariant estimators. In [3], [2] and [22], Brandwein, Strawderman and Shinozaki affirm that the standard estimator $X$ is the best invariant estimator for the location parameter, without affirming that these estimators are uniformly minimum-variance unbiased estimators. The main reason for considering location invariant estimators seems to be that it is easier to make statements about their existence.
Both approaches have disadvantages in our (continuous-time) setting: a version of Girsanov's theorem is not always available (for instance in the case of the Rosenblatt process), and it is not clear how the (finite-dimensional) discrete concept of (location) invariant estimators can be adapted to the (infinite-dimensional) continuous-time setting that we consider.

Malliavin Calculus
Consider the Cameron-Martin space, that is, the real separable Hilbert space $H$ of absolutely continuous functions $h : [0,T] \to \mathbb{R}$ with $h(0) = 0$ and $\dot h \in L^2([0,T], \lambda_T)$, endowed with the scalar product
$$\langle h, g\rangle_H := \int_0^T \dot h(t)\,\dot g(t)\,dt = \langle \dot h, \dot g\rangle_{L^2([0,T],\lambda_T)}$$
for all $h, g \in H$. The Hilbert space $H$ is generated by the functions $\chi_t : s \mapsto \min\{s,t\} = s \wedge t$, $s, t \in [0,T]$.
Consider a standard Wiener process $(W_t^u)_{t\in[0,T]}$ (see Section 2) and a random variable $F = f_n(W^u(h_1), \ldots, W^u(h_n))$, where $n \ge 1$ and:
- $f_n$ is an infinitely differentiable, rapidly decreasing function on $\mathbb{R}^n$,
- $h_1, \ldots, h_n \in H$.
The Malliavin derivative of $F$ is given by
$$D_t F = \sum_{i=1}^n \partial_i f_n(W^u(h_1), \ldots, W^u(h_n))\,\dot h_i(t), \quad t \in [0,T]. \tag{3.1}$$
An equivalent definition as a directional derivative along Cameron-Martin shifts exists; both definitions coincide. We use the definition given in Eq. (3.1) and write $D_t F$ for the Malliavin derivative of $F$. The operator $D$ is closable from $L^2(\Omega, P_u)$ to $L^2(\Omega \times [0,T], P_u \otimes \lambda_T)$ (see [7]). Moreover, the Malliavin derivative has a closable adjoint $\delta$ (under $P_u$). The operator $\delta$ is called the divergence operator or, in the white noise case, the Skorohod integral. The domain of $\delta$ is denoted by $\operatorname{Dom}\delta$; it is the set of square-integrable processes $v \in L^2(\Omega \times [0,T], P_u \otimes \lambda_T)$ with
$$\big|E_u\big[\langle DF, v\rangle_{L^2([0,T],\lambda_T)}\big]\big| \le c\,\|F\|_{L^2(\Omega,P_u)}$$
for a constant $c$ depending on $v$ and all $F \in \mathbb{D}^{1,2}$, where $\mathbb{D}^{1,2}$ is the closure of the class of smooth random variables with respect to the norm $\|F\|_{1,2} = \big(E_u[F^2] + E_u[\|DF\|_{L^2([0,T],\lambda_T)}^2]\big)^{1/2}$. For $v \in \operatorname{Dom}\delta$ and $F \in \mathbb{D}^{1,2}$ we have
$$E_u[F\,\delta(v)] = E_u\big[\langle DF, v\rangle_{L^2([0,T],\lambda_T)}\big]. \tag{3.2}$$
This relation is often called the integration by parts formula; a more general rule can be found in [17]. For multiple Wiener integrals we have, for symmetric and square-integrable functions $f_n$ (see for instance [17, p. 35]), $D_t I_n(f_n) = n\,I_{n-1}(f_n(\cdot, t))$. Formula (3.2) can be generalized by considering the multiple divergence (see [16, p. 33]): if $v \in \operatorname{Dom}\delta^n$ and $F \in \mathbb{D}^{n,2}$, then
$$E_u[F\,\delta^n(v)] = E_u\big[\langle D^n F, v\rangle_{L^2([0,T]^n,\lambda_T^n)}\big];$$
see [16] or [17] for details. For ease of notation, we have used the shorthand $\lambda_T^n := \lambda^n|_{[0,T]^n}$.
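The integration by parts formula (3.2) can be illustrated by simulation in the simplest Gaussian case. Below, $F = W_T$ (so that $D_t F = 1$) and $v(t) = t$ is deterministic, so that $\delta(v)$ reduces to an Itô integral and the right-hand side of (3.2) equals $\int_0^1 t\,dt = 1/2$; the discretization and sample size are illustrative.

```python
import numpy as np

# Monte Carlo check of the integration by parts formula
# E[F delta(v)] = E[<DF, v>_{L^2([0,T])}] in the simplest case:
# F = W_T (so D_t F = 1) and deterministic v(t) = t on [0, 1],
# for which delta(v) = int_0^1 t dW_t and E[<DF, v>] = int_0^1 t dt = 1/2.
rng = np.random.default_rng(1)
n_paths, n_steps, T = 100_000, 200, 1.0
dt = T / n_steps
t = np.linspace(0.0, T, n_steps, endpoint=False)   # left endpoints

dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
F = dW.sum(axis=1)               # F = W_T
delta_v = (t * dW).sum(axis=1)   # Ito (= Skorohod) integral of v(t) = t
lhs = np.mean(F * delta_v)       # E[F delta(v)]
print(lhs)                       # close to 1/2
```

For deterministic $v$ the Skorohod integral coincides with the Itô integral, which is why a plain Euler discretization suffices here.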

Preliminary remarks
In this section we consider primarily noises $(Z_t)_{t\in[0,T]}$ that live in the second Wiener-Itô chaos and processes $(X_t)_{t\in[0,T]}$ with $X_0 = 0$ and $X_t = u_t + Z_t$. We consider stochastic integrals that do not necessarily define martingales; the kernels of the stochastic integrals depend on the time parameter $t \in [0,T]$. This setting includes the well-known Rosenblatt process as well as a class of Brownian martingales living in the second Wiener-Itô chaos. As we already stressed in Remark 2.3, Theorem 2.1 and Theorem 2.2 cannot be applied in this section, where we are outside the martingale setting. Therefore we shall compare our estimators only to the standard estimator $X$. We construct a stochastic integral with respect to our noise. Since our construction aims to be generally applicable, we consider a special class of observations; these restrictions can be relaxed when considering particular cases of our setting (see Section 6). In Paragraph 4.1.1, we recall the basic facts of the construction of the Lebesgue-Stieltjes integral. In Paragraph 4.1.2, we construct a stochastic integral with respect to the noise $(Z_t)_{t\in[0,T]}$, following the ideas of Tudor (see [26]). In Paragraph 4.1.3, we calculate the first two Malliavin derivatives of the observations $X(h)$ and define functions $g$ which play a crucial role in the proof of the main result. In Paragraph 4.1.4, we give a brief overview of our setting and definitions.

Lebesgue-Stieltjes integration
Consider a function $\bar f : [0,T] \to \mathbb{R}$ that is right-continuous and of bounded variation. We define the Lebesgue-Stieltjes integral with respect to $\bar f$ and give a brief summary of the well-known construction of this integral. Since $\bar f$ is of bounded variation and right-continuous, we can find right-continuous non-decreasing functions $\bar f_1$ and $\bar f_2$ with $\bar f = \bar f_1 - \bar f_2$. For every $\bar f_i$ there is a unique measure $\mu_i$ on the Borel sets of $[0,T]$ defined by $\mu_i((\alpha,\beta]) := \bar f_i(\beta) - \bar f_i(\alpha)$ for every $0 \le \alpha < \beta \le T$, and $\mu_i(\emptyset) = 0$. We consider the signed measure $\mu := \mu_1 - \mu_2$. For a measurable function $g : [0,T] \to \mathbb{R}$, the Lebesgue-Stieltjes integral of $g$ with respect to $\bar f$ is defined as the Lebesgue integral of $g$ with respect to $\mu$:
$$\int_0^T g(t)\,d\bar f(t) := \int_0^T g(t)\,d\mu(t).$$
The Lebesgue-Stieltjes integral above exists if $\int_0^T |g(t)|\,d(\mu_1 + \mu_2)(t) < \infty$.
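The construction above can be illustrated numerically by Riemann-Stieltjes sums over a fine partition, which converge to the Lebesgue-Stieltjes integral for continuous integrands. The concrete functions $\bar f(t) = t^2$ and $g(t) = t$ below are illustrative; here $\int_0^1 t\,d(t^2) = \int_0^1 2t^2\,dt = 2/3$.

```python
import numpy as np

# Riemann-Stieltjes sums approximating the Lebesgue-Stieltjes integral
# int_0^T g(t) dfbar(t) for a right-continuous integrator fbar of bounded
# variation. The choices fbar(t) = t^2 and g(t) = t are illustrative.
def stieltjes(g, fbar, T=1.0, n=100_000):
    t = np.linspace(0.0, T, n + 1)
    # sum of g(t_i) * (fbar(t_{i+1}) - fbar(t_i)) over the partition
    return float(np.sum(g(t[:-1]) * np.diff(fbar(t))))

approx = stieltjes(lambda t: t, lambda t: t ** 2)
print(approx)   # close to 2/3
```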

Stochastic integrals with respect to pZ t q tPr0,T s
We consider a process of the form $X_t = u_t + Z_t$ with $X_0 = 0$, where the noise is a double Wiener-Itô integral, $Z_t = I_2(f(\cdot,\cdot;t))$ (Eq. (4.1)). We make the following assumptions about the drift and the kernel of the stochastic integral:
(i) the drift $(u_t)_{t\in[0,T]}$ is supposed to be deterministic and in the Cameron-Martin space;
(ii) the kernel $f(\cdot,\cdot;t)$ is supposed to be symmetric in the first two variables for every $t \in (0,T]$;
(iii) the kernel $f(\cdot,\cdot;t)$ is supposed to be square-integrable for every $t \in [0,T]$;
(iv) for almost every $(x_1,x_2) \in [0,T]^2$, the function $t \mapsto f(x_1,x_2;t)$ is right-continuous and of bounded variation;
(v) the total variation $V_0^T(f(x_1,x_2;\cdot))$ is square-integrable.
We construct a stochastic integral with respect to $(Z_t)_{t\in[0,T]}$ by following the ideas of Tudor (see [26]). We first define the stochastic integral with respect to the noise $(Z_t)_{t\in[0,T]}$ for step functions. We define $\int_0^T 1_{(0,t]}(s)\,dZ_s = Z_t - Z_0 = Z_t$ and, more generally, $\int_0^T 1_{(\alpha,\beta]}(s)\,dZ_s = Z_\beta - Z_\alpha$ for $0 \le \alpha < \beta \le T$, which can be written as a double Wiener-Itô integral of $f(\cdot,\cdot;\beta) - f(\cdot,\cdot;\alpha)$ (Eq. (4.2)). By linearity, we can extend Eq. (4.2) to step functions $\varphi : t \mapsto \sum_i \gamma_i 1_{(\alpha_i,\beta_i]}(t)$ (Eq. (4.3)). We extend Eq. (4.3) to a larger class of functions. In our general setting, we limit ourselves to regulated functions $\varphi : [0,T] \to \mathbb{R}$; this means that the left and right limits $\varphi(x-)$ and $\varphi(x+)$, as well as $\varphi(0+)$ and $\varphi(T-)$, exist for every $x \in (0,T)$. Dieudonné [8] proved that $\varphi$ is a regulated function if and only if $\varphi$ is the limit in $L^\infty([0,T], d\lambda_T)$ of a sequence of step functions $(\varphi_n)_{n\in\mathbb{N}}$. We use the following inequality (see [1, p. 177]):
$$\Big|\int_0^T g(t)\,d\bar f(t)\Big| \le \|g\|_\infty\, V_0^T(\bar f), \tag{4.4}$$
where $\bar f$ is a function of bounded variation and $g$ is a regulated function.

We notice that the measurability of $f(\cdot,\cdot;t)$ implies the measurability of $(x_1,x_2) \mapsto \int_0^T \varphi_n(t)\,df(x_1,x_2;t)$. Moreover, Lebesgue's dominated convergence theorem proves that for almost every $(x_1,x_2)$:
$$\int_0^T \varphi_n(t)\,df(x_1,x_2;t) \longrightarrow \int_0^T \varphi(t)\,df(x_1,x_2;t). \tag{4.5}$$
We now prove that the convergence in Eq. (4.5) also holds in $L^2([0,T]^2, \lambda_T^2)$; this follows from inequality (4.4) and assumption (v) above. We conclude that $(x_1,x_2) \mapsto \int_0^T \varphi(s)\,df(x_1,x_2;s)$ is square-integrable and the convergence in Eq. (4.5) holds in $L^2([0,T]^2, \lambda_T^2)$. The stochastic integral is thus well defined and, by the Itô isometry, we can define for any regulated function $\varphi$:
$$\int_0^T \varphi(s)\,dZ_s := I_2\Big(\int_0^T \varphi(s)\,df(\cdot,\cdot;s)\Big),$$
with the corresponding isometry for its second moment as a direct consequence.

Malliavin derivatives
We consider an absolutely continuous function $h$ with $h(t) = \int_0^t \dot h(s)\,ds$, $t \in [0,T]$, such that $\dot h$ can be chosen as a regulated function. We define $u(h) := \int_0^T \dot h(s)\,\dot u_s\,ds$ and $Z(h) := \int_0^T \dot h(s)\,dZ_s$, and similarly for the observation: $X(h) := u(h) + Z(h)$. The first two Malliavin derivatives of $X(h)$ exist. We define a function $g$ by
$$g(t) := \operatorname{Cov}_u(X(h), X_t), \quad t \in [0,T],$$
and drop the dependence on $h$, which is clear from the context. The expression of $g$ as an integral against the kernels is a consequence of Eq. (4.7) and the Itô isometry. We prove that $g$ is continuous on $(0,T)$, left-continuous at $T$ and right-continuous at $0$ using assumption (vi), by estimating $|g(t) - g(s)|$ for $s, t \in [0,T]$ with $|t - s| \to 0$.

Setting and notations
We summarize the setting and introduce some notations. We consider a stochastic process $(X_t)_{t\in[0,T]}$ with $X_0 := 0$ and $X_t = u_t + Z_t$ for $t \in (0,T]$. We suppose moreover that:
(i) the drift $(u_t)_{t\in[0,T]}$ is deterministic and in the Cameron-Martin space;
(ii) the kernel $f(\cdot,\cdot;t)$ is symmetric in the first two variables for every $t \in [0,T]$;
(iii) the kernel $f(\cdot,\cdot;t)$ is square-integrable for every $t \in (0,T]$.
We consider absolutely continuous functions $h_i$, $i = 1, \ldots, d$ with $d \ge 3$, satisfying $h_i(t) = \int_0^t \dot h_i(s)\,ds$, $t \in [0,T]$, such that every $\dot h_i$ is a regulated function. We define for $i = 1, \ldots, d$ the observations $X(h_i)$, and for every $i = 1, \ldots, d$ continuous functions $g_i$ with $g_i(t) = \operatorname{Cov}_u(X_t, X(h_i))$. We finally introduce some vector notations: $g(t) := (g_1(t), \ldots, g_d(t))^\top$ and $X(h) := (X(h_1), \ldots, X(h_d))^\top$. We suppose that the matrix $\int_0^T g(t)g(t)^\top\,dt$ is invertible; since this matrix is clearly symmetric and positive semi-definite, it is then positive definite. We write $B$ for its inverse; $B$ is symmetric and positive definite as the inverse of a symmetric and positive definite matrix. We define for $a > 0$:
$$\xi_t := \frac{g(t)^\top B\,X(h)}{a + X(h)^\top B\,X(h)}, \quad t \in [0,T].$$
In Theorem 4.1, we consider an estimator of the form $X_t - \xi_t$, where $a$ is a positive constant. Since $a > 0$ and $X(h)^\top B\,X(h) \ge 0$, we have $a + X(h)^\top B\,X(h) \ge a > 0$.
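The objects just introduced can be sketched numerically. The script below builds the Gram matrix $G = \int_0^T g(t)g(t)^\top\,dt$ on a grid, inverts it to obtain $B$, and evaluates $\xi_t$; the covariance functions $g_i(t) = \min(t, t_i)$ (of Brownian type) and the values of $a$, $T$ and $X(h)$ are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

# Numerical sketch of the setting: Gram matrix G = int_0^T g(t) g(t)^T dt,
# its inverse B, and the shrinkage process
# xi_t = g(t)^T B X(h) / (a + X(h)^T B X(h)).
# The functions g_i(t) = min(t, t_i) and the values of a, T, X(h) are
# illustrative assumptions.
T, a = 1.0, 5.0
ti = np.array([0.3, 0.6, 0.9])
t = np.linspace(0.0, T, 10_001)
dt = T / (len(t) - 1)
g = np.minimum(t[None, :], ti[:, None])    # row i holds g_i on the grid

G = (g @ g.T) * dt                         # G_ij ~ int_0^T g_i g_j dt
B = np.linalg.inv(G)                       # symmetric positive definite

Xh = np.array([0.2, -0.4, 0.7])            # a vector of d = 3 observations
xi = (g.T @ (B @ Xh)) / (a + Xh @ B @ Xh)  # xi_t on the grid
print(np.linalg.eigvalsh(G).min() > 0)     # G is positive definite
```

The denominator is bounded below by $a > 0$, so $\xi_t$ is well defined pathwise.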

Construction of an estimator for a second chaos noise
We now formulate the main result of this section.
Theorem 4.1. For the model discussed above with a deterministic drift and $d \ge 3$, consider the drift estimator $X_t - \xi_t$ with
$$\xi_t = \frac{g(t)^\top B\,X(h)}{a + X(h)^\top B\,X(h)}, \quad t \in [0,T].$$
The drift estimator $X_t - \xi_t$ has smaller risk than the standard estimator $X_t$:
$$E_u\Big[\int_0^T (X_t - \xi_t - u_t)^2\,dt\Big] < E_u\Big[\int_0^T (X_t - u_t)^2\,dt\Big] \tag{4.9}$$
for every value of $a$ that is greater than a positive constant $A$.
(1) In Theorem 4.1, it is essential to find positive constants $a$ such that inequality (4.9) holds. The proof of Theorem 4.1 shows that every $a$ that is greater than some positive constant $A$, depending on $f$, $h_1, \ldots, h_d$ and $T$, satisfies inequality (4.9). The problem of finding $A$ is nontrivial and is discussed in Section 7 for two special cases.

(2) Another essential assumption in Theorem 4.1 is that the number of observations $X(h_i)$ used to construct the estimator is at least 3, that is, $d \ge 3$. In the context of stochastic processes and drift estimation, this assumption can also be found in [20] and in [9]. In the context of classical statistics and for $d$-dimensional spherically symmetric distributions with a Lebesgue density, the assumption $d \ge 3$ is also very common, see for instance [24], [10] or [22]. For some results it is even necessary that the dimension satisfies $d \ge 4$, see for instance [4]. It is worth noticing that the usual condition $d \ge 3$ may not be needed in some discrete settings: in [21], the authors estimate the intensities of a Poisson process and the estimators constructed dominate the standard estimator for every dimension $d \ge 1$.
Proof of Theorem 4.1. We notice that $\xi_t$ is square-integrable with respect to $P_u \otimes \lambda_T$. We show below that the risk difference between $X_t$ and $X_t - \xi_t$ can be written as the expectation of an expression in $\xi_t$ and the noise. Before we prove the theorem, we transform both terms in this expression; the proof is complete if we show that the resulting expression is positive for some $a > A > 0$. For the second term we use that $B$ is symmetric. We transform the first term using Malliavin calculus, in particular integration by parts, see Eq. (3.5). Notice also that, for deterministic functions, the iterated divergence operator $\delta^2$ coincides with the double Wiener-Itô integral, so that the classical Fubini theorem applies. This result can also be found using the Wiener-Itô chaos decomposition of $\xi_t$ and the fact that $X_t - u_t$ lives in the second Wiener-Itô chaos. We notice that $B$ is defined to be symmetric positive semi-definite; since $B$ is moreover supposed to be invertible, $B$ is symmetric positive definite. Thus $B$ has a matrix square root $C$ that is again symmetric positive definite, and $B = C^2 = C^\top C$. We use $X(h) = u(h) + Z(h)$ together with inequalities, valid for $a > 0$ and $k \in (0,1)$ (see Theorem 9.1 for the proofs), involving
$$Q = \frac{1}{a + X(h)^\top B\,X(h)},$$
where $\|\cdot\|$ is the standard Euclidean norm. For the first and second Malliavin derivative of $\xi_t = [g(t)^\top B\,X(h)]/[a + X(h)^\top B\,X(h)]$, we obtain expressions by means of the chain rule. We estimate the resulting terms as follows.
(i) The sum of the entries of $A * A^{-1}$ equals $d$ for any invertible symmetric matrix $A \in \mathbb{R}^{d\times d}$, where $*$ denotes the Hadamard product of matrices (see Theorem 9.2). In this step, the first equality follows from the symmetry in $x_1, x_2$; the second equality follows since $X(h_i)$ is a random variable in the second Wiener-Itô chaos and has therefore a deterministic second Malliavin derivative; the third equality follows from the definition of $B$, and the last equality follows with Eq. (4.10).
(ii) For ease of notation, we write $\|v_1\|$ for the standard Euclidean norm of a vector $v_1 \in \mathbb{R}^d$. We have $B = C^2 = C^\top C$, and the Cauchy-Schwarz inequality yields, for arbitrary vectors $v_1, v_2 \in \mathbb{R}^d$, $|v_1^\top B\,v_2| \le \|Cv_1\|\,\|Cv_2\|$; this gives the required bound for $a > 0$.
(iii) The next two terms can be estimated similarly for $a > 0$.
(iv) The next term is transformed using the definition of $B$.
(v) Estimations analogous to the ones above apply for $a > 0$.
(vi) The last term is bounded in the same way for $a > 0$.
We now combine the estimations found above for $a > 0$, $d \ge 3$ and $k \in (0,1)$. Together with the inequalities of Theorem 9.1, we obtain the required bound; in the last step we have used that $X_t = u_t + Z_t$, and thus $X_t = Z_t$ if $u = 0$. The expression in brackets is positive if $a$ is chosen large enough, more precisely for $a > A > 0$ (see Section 7). The expectations in the last inequality can be calculated without knowing the drift $u$. The results of Theorem 9.1 imply that the remaining expectation is (for a fixed $k \in (0,1)$) finite and bounded as a function of $a \ge \varepsilon > 0$ (for any $\varepsilon > 0$). This completes the proof.

Consider for instance $\dot h_i = 1_{(0,t_i]}$ for $0 < t_1 < t_2 < \cdots < t_d \le T$. If $\big(\operatorname{Cov}_u(Z_{t_i}, Z_{t_j})\big)_{i,j}$ is an invertible matrix, then the $g_i$ are continuous, linearly independent functions. The Gram-Schmidt algorithm with the standard scalar product on $L^2([0,T], \lambda_T)$ can be used to find an invertible lower triangular matrix $L$ such that $(Lg)_1, \ldots, (Lg)_d$ are orthonormalized. This shows that $\int_0^T g(t)g(t)^\top\,dt$ is invertible and equal to $L^{-1}L^{-\top} = (L^\top L)^{-1}$. We conclude that $\int_0^T g(t)g(t)^\top\,dt$ and its inverse $B$ are symmetric positive definite.
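The Gram-Schmidt argument above can be illustrated numerically: after discretizing the $g_i$, a Cholesky factorization of the Gram matrix yields a lower triangular $L$ with $LGL^\top = I$, so that the components of $Lg$ are orthonormal in $L^2([0,T], \lambda_T)$ and $G = L^{-1}L^{-\top}$ is invertible. The functions $g_i(t) = \min(t, t_i)$ below are illustrative choices.

```python
import numpy as np

# Numerical illustration of the Gram-Schmidt argument: for linearly
# independent continuous functions g_1, ..., g_d, the Gram matrix
# G = int_0^T g(t) g(t)^T dt admits a lower triangular L with L G L^T = I,
# hence G = L^{-1} L^{-T} is symmetric positive definite.
# The functions g_i(t) = min(t, t_i) are illustrative choices.
T = 1.0
ti = np.array([0.25, 0.5, 0.75, 1.0])
t = np.linspace(0.0, T, 20_001)
dt = T / (len(t) - 1)
g = np.minimum(t[None, :], ti[:, None])

G = (g @ g.T) * dt                   # Gram matrix of the g_i
C = np.linalg.cholesky(G)            # G = C C^T with C lower triangular
L = np.linalg.inv(C)                 # L lower triangular, L G L^T = I
ortho = ((L @ g) @ (L @ g).T) * dt   # Gram matrix of the rows of L g
print(np.max(np.abs(ortho - np.eye(len(ti)))))   # numerical round-off only
```

Cholesky factorization is exactly the matrix form of Gram-Schmidt orthonormalization with respect to the discretized $L^2$ inner product.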

Extensions
In this section, we point out extensions of the previous results. Since the methods of Section 4 are useful in more general settings but the calculations are lengthy, we do not provide complete proofs for the results of this section.

A more general setting for the noise
An analogous version of Theorem 4.1 holds for a process $(X_t)_{t\in[0,T]}$ with $X_0 = 0$ whose noise lives in a Wiener chaos of order $n$ for $t \in (0,T]$, if the analogous $n$-dimensional versions of conditions (i)-(vii) hold. Moreover, a version of Theorem 4.1 holds if $(X_t)_{t\in[0,T]}$ is a process with $X_0 = 0$ of the more general form of Eq. (5.2), where $g(t) := (g_1(t), \ldots, g_d(t))^\top$ with $g_i : t \mapsto \operatorname{Cov}_u(X_t, X(h_i))$ and $d \ge 3$. The drift estimator $X_t - \xi_t$ has smaller risk than the standard estimator $X_t$ for every value of $a$ greater than a positive constant $A$. For the proof, the $n$-th Malliavin derivative of
$$\xi_t = \frac{g(t)^\top B\,X(h)}{a + X(h)^\top B\,X(h)}$$
is needed. Calculating these derivatives becomes increasingly complicated as $n$ grows. We show below that only two of the terms appearing in $D_{x_1}\cdots D_{x_n}\xi_t$ are relevant for the proof, and that all the terms can be estimated as in the proof of Theorem 4.1. We follow an idea that goes back to Meyer [15]. The Malliavin derivative satisfies the product rule: for smooth random variables $F$ and $G$ and $\mathbf{n} := \{1, \ldots, n\}$, writing $D_x^S F = D_{x_{i_1}} \cdots D_{x_{i_l}} F$ for any subset $S = \{i_1, \ldots, i_l\}$ of $\mathbf{n}$, we have
$$D_{x_1}\cdots D_{x_n}(FG) = \sum_{S \subseteq \mathbf{n}} \big(D_x^S F\big)\big(D_x^{\mathbf{n}\setminus S} G\big).$$
It can be seen by induction over $n$ that the sum of all but two of these terms can be bounded by a remainder $R$ that is square-integrable with respect to $P_u \otimes \lambda_T^n$ and does not depend on $a$ or $u$. If we choose again $g_i(t) := \operatorname{Cov}_u(X_t, X(h_i))$ and $B = \big(\int_0^T g(t)g(t)^\top\,dt\big)^{-1}$, we can proceed with the terms in lines (5.3) and (5.4) as in Theorem 4.1 and prove the domination for $a > 0$ large enough and $d \ge 3$. We conclude that the estimator given by
$$X_t - \xi_t := X_t - \frac{g(t)^\top B\,X(h)}{a + X(h)^\top B\,X(h)}, \quad t \in [0,T],$$
has smaller risk than $X_t$ if $a$ is large enough, $d \ge 3$, $X_t = u_t + Z_t$ and $(Z_t)_{t\in[0,T]}$ has the form given by Eq. (4.1) but lives in a Wiener chaos of higher order. (2) For the setting of Eq. (5.2), the method used in Theorem 4.1 is applicable as well. Notice however that in this situation extra terms appear, which can be estimated by applying once again the integration by parts formula.

Absolutely continuous kernels
We consider a stochastic process $(X_t)_{t\in[0,T]}$ with $X_0 := 0$ and:

C. Krein
where $t \mapsto f(x_1, x_2; t)$ is absolutely continuous. We can replace the assumptions of Section 4 by the following assumptions, which are better suited to the case of absolutely continuous kernels. Notice that assumptions (iv)-(vii) are used to define stochastic integrals, to guarantee the existence of expectations, and to show the continuity of the functions $g_i$. These properties can be proved more efficiently with conditions (i)-(iii) and the following conditions (iv') and (v'):

(iv') the function $f$ is absolutely continuous with respect to $t$;

(v') the function $\gamma$ is in $L^{q_1}([0,T]^2, \lambda_T^2)$ for some $q_1 > 1$ (then $(a,b) \mapsto \mathrm{E}_u[Z_a Z_b]$ has mixed second-order derivatives a.e. and they are equal to $\gamma$ a.e.); we write $q_2$ to indicate the real number such that $1/q_1 + 1/q_2 = 1$.
where $g(t) := (g_1(t), \ldots, g_d(t))^\top$ with $g_i \colon t \mapsto \mathrm{Cov}_u(X_t, X(h_i))$ and $d \ge 3$. The drift estimator $X_t - \xi_t$ has smaller risk than the standard estimator $X_t$ for every value of $a$ greater than a positive constant $A$.
In the following paragraphs, we review the construction of Section 4 and adapt it to the setting of absolutely continuous kernels.

The covariance function
We calculate $\mathrm{E}_u[Z_a Z_b]$ for $a, b \in [0,T]$. The map $(a,b) \mapsto \mathrm{E}_u[Z_a Z_b]$ is a two-dimensional analogue of an absolutely continuous function: it has mixed second-order derivatives almost everywhere (see [27]), equal to $\gamma$. Moreover, since $\gamma \in L^1([0,T]^2, \lambda_T^2)$:
$$\int_0^T \int_0^t \int_0^t |\gamma(a,b)| \, da \, db \, dt < \infty.$$

Hölder continuity
We notice that $(Z_t)_{t\in[0,T]}$ has a version that is $k$-Hölder continuous for every $k \in (0, 1/q_2)$. This result is proved in Theorem 9.3, using that $\gamma \in L^{q_1}([0,T]^2, \lambda_T^2)$.

Stochastic integrals with respect to pZ t q tPr0,T s
We follow an approach similar to the one of Section 4.1 and extend the definition of $\int_0^T \varphi(s) \, dZ_s$ from regulated functions to sufficiently integrable functions $\varphi$. For real numbers $q_1 > 1$ and $q_2 > 1$ with $1/q_1 + 1/q_2 = 1$, we suppose that: (a) the function $\varphi$ is in $L^{q_2}([0,T], \lambda_T)$; (b) the function $\gamma$ is in $L^{q_1}([0,T]^2, \lambda_T^2)$ (then the mixed second-order derivatives of $(a,b) \mapsto \mathrm{E}_u[Z_a Z_b]$ exist and are equal to $\gamma$ almost everywhere).
It is known that the set of step functions is dense in $L^{q_2}([0,T], \lambda_T)$ (see for instance [25, Theorem 3.4 and Theorem 4.3]). For $\varphi \in L^{q_2}([0,T], \lambda_T)$, we choose a sequence $(\varphi_n)$ of step functions such that $\|\varphi_n - \varphi\|_{L^{q_2}([0,T], \lambda_T)} \to 0$ for $n \to \infty$.
We prove below that, as $n \to \infty$, the limit $\lim_n \int_0^T \varphi_n(s) \, dZ_s$ exists in $L^2(\Omega, P_u)$ and is square-integrable. Thus we can extend the definition of $\int_0^T \varphi(s) \, dZ_s$ to functions $\varphi \in L^{q_2}([0,T], \lambda_T)$ by setting $\int_0^T \varphi(s) \, dZ_s := \lim_n \int_0^T \varphi_n(s) \, dZ_s$. The calculations rest on the covariance identity and Hölder's inequality:
$$\mathrm{E}_u\!\left[\left( \int_0^T \varphi(s) \, dZ_s \right)^2\right] = \int_0^T\!\!\int_0^T \varphi(a) \varphi(b) \gamma(a,b) \, da \, db \le \|\varphi\|_{L^{q_2}}^2 \, \|\gamma\|_{L^{q_1}}, \tag{5.9}$$
which shows in particular that the sequence $\big( \int_0^T \varphi_n(s) \, dZ_s \big)_n$ is Cauchy in $L^2(\Omega, P_u)$.
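For Brownian noise, where the $\gamma$-based computations reduce to the Itô isometry, the step-function extension can be illustrated numerically. The sketch below is a Monte Carlo check under these simplifying assumptions (Brownian $Z$, $\varphi(s) = \sin(2\pi s)$, illustrative grid and sample sizes), not an implementation of the general $L^{q_2}$ construction.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, n_paths = 1.0, 500, 20000

ts = np.linspace(0.0, T, n + 1)
left = ts[:-1]                       # left endpoints of the grid intervals
phi = np.sin(2 * np.pi * left)       # step-function approximation of phi

# step-function integral: sum_k phi(s_k) (Z_{s_{k+1}} - Z_{s_k}) for Brownian Z
dZ = rng.normal(0.0, np.sqrt(T / n), size=(n_paths, n))
integrals = dZ @ phi

# Ito isometry: E[(int_0^T phi dZ)^2] = int_0^T phi(s)^2 ds = 1/2 here
mc_second_moment = (integrals ** 2).mean()
exact = 0.5
```

As the mesh is refined and the number of paths grows, the empirical second moment approaches $\int_0^T \varphi(s)^2 \, ds$.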

Observations and functions g i .
We define $H^q := \{ h \colon h(t) = \int_0^t \dot h(s) \, ds, \ \dot h \in L^q([0,T], \lambda_T) \}$. As above, we consider a deterministic drift $u \in H^2$ and define $X(h)$. We have $\mathrm{E}_u[Z(h)^2] < \infty$ for $h \in H^{q_2}$, as exposed in the previous section, and $u(h) := \int_0^T \dot u_s \dot h(s) \, ds < \infty$ if $h \in H^2$. This shows the necessity to choose $h \in H^{q_2} \cap H^2 = H^{\max\{2, q_2\}}$. For $h \in H^{\max\{2, q_2\}}$, we use the following notation in agreement with (5.8): $X(h) := u(h) + Z(h)$. We notice that the right side of the last equation can be approximated in $L^2(\Omega, P_u)$ by $X(h_n)$, where $h_n \in H^{\max\{2, q_2\}}$ is such that $(\dot h_n)$ can be chosen as a sequence of step functions that converge to $\dot h$ in $L^{\max\{2, q_2\}}([0,T], \lambda_T)$. For the functions $g_i$ we have
$$g_i(t) = \mathrm{Cov}_u(X_t, X(h_i)) = \int_0^t \int_0^T \gamma(a,b) \, \dot h_i(b) \, db \, da.$$

For $0 \le s \le t \le T$, $\gamma \in L^{q_1}([0,T]^2, \lambda_T^2)$, $h \in H^{\max\{2, q_2\}}$, $q_1 > 1$ and $1/q_1 + 1/q_2 = 1$, we have
$$|g_i(t) - g_i(s)| = \big| \mathrm{Cov}_u(X_t - X_s, X(h_i)) \big| = \left| \int_s^t \int_0^T \gamma(a,b) \, \dot h_i(b) \, db \, da \right|.$$
We have used Eq. (5.9) and find:
$$|g_i(t) - g_i(s)| \le (t - s)^{1/q_2} \, \|\dot h_i\|_{L^{q_2}} \, \|\gamma\|_{L^{q_1}}.$$
The functions $g_i$ are thus bounded on $[0,T]$, continuous on $(0,T)$, right-continuous in $0$ and left-continuous in $T$.

Applications
In this section we apply the results of Section 4 and Section 5 to concrete situations. We suppose that conditions (i)-(vii) hold, unless other conditions are specified. In Paragraph 6.1 we consider a noise that has a finite Wiener-Itô chaos decomposition and that defines a martingale. In Paragraph 6.2 we consider the Gaussian case and show moreover that, for this particular situation, we can choose $a = 0$ in Eq. (4.8). In Paragraph 5.2 we considered the case of an absolutely continuous kernel and adapted the construction of Section 4 to this situation. In Paragraph 6.3 we consider in particular the case of the Rosenblatt process. In all these cases, Eq. (4.8) gives an estimator whose risk is smaller than the risk of the standard drift estimator.

The martingale case
In the martingale case, the Cramér-Rao bounds of Section 2 hold and our estimator is superefficient.
(1) Theorem 4.1 and Proposition 5.1 can be used to handle the case where the noise $(Z_t)_{t\in[0,T]}$ defines a martingale with respect to the filtration of the Brownian motion, with $Z_0 = 0$ and
$$Z_t = I_1(f_1(\cdot \, ; t)) + I_2(f_2(\cdot \, ; t))$$
for square-integrable functions $f_1$ and $f_2$, where $f_2$ is symmetric. The functions $t \mapsto f_1(x_1; t)$ and $t \mapsto f_2(x_1, x_2; t)$ are monotone and right-continuous; furthermore, $f_2(x_1, x_2; t)$ is symmetric in the first two variables.
We consider $h_i(t) = \int_0^t \dot h_i(s) \, ds$ for a regulated function $\dot h_i$. With a result from [11], we have:

Since $\dot h_i$ is bounded as a regulated function, the Lebesgue-Stieltjes integrals above are square-integrable. In particular, for $h_i \colon s \mapsto \min\{s, t_i\}$ we have $\dot h_i = 1_{(0, t_i]}$. The analogue of assumption (v) holds, and it is easy to check that $\mathrm{E}_u[(Z_t - Z_s)^2] \to 0$ for $|t - s| \to 0$, since $f_1$ and $f_2$ are square-integrable. We find similarly that $\int_0^T \mathrm{E}_u[Z_t^2] \, dt < \infty$. We conclude that Theorem 4.1 can be used in this situation to find estimators with smaller risk than the standard drift estimator $X_t$.
(2) As a direct consequence, we can choose $f_2 = 0$ and recover the case of Gaussian processes of the form $Z_t = I_1(f_1(\cdot \, ; t))$ for a square-integrable function $f_1$.

The Gaussian case
Choosing $f_2 = 0$ in Proposition 5.1, we have $Z_t = I_1(f_1(\cdot \, ; t))$. We prove for this class of Gaussian processes that we can choose $a = 0$. As in Eq. (4.8), we define $\xi_t := (g(t)^\top B X(h)) / (a + X(h)^\top B X(h))$ and $g_i(t) = \mathrm{Cov}_u(X(h_i), X_t)$. We have for the model given in Eq. (6.1) and $d \ge 3$:

The right-hand expectation is positive for every $d \ge 3$ and $a > 0$. We can choose $a = 0$ for this special case if $d \ge 3$ and $\mathrm{Cov}_u(X(h))$ is invertible. This follows from the well-known fact that $\mathrm{E}[\|N\|^{-2}]$, for a $d$-dimensional standard Gaussian vector $N$, is finite if and only if $d \ge 3$.
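The finite-dimensional phenomenon behind this fact can be checked by simulation. The sketch below compares, for $d = 3$ and $\mu = 0$ (the sample size and the classical choice $b = d - 2$ are illustrative), the Monte Carlo risk of a James-Stein-type estimator with that of the standard estimator $X$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 200000
mu = np.zeros(d)                      # true mean (the risk gap is largest at mu = 0)
X = rng.normal(mu, 1.0, size=(n, d))  # X ~ N(mu, I_d)

b = d - 2                             # James-Stein: any 0 < b < 2(d - 2) dominates for d >= 3
shrunk = X * (1.0 - b / (X ** 2).sum(axis=1, keepdims=True))

risk_std = ((X - mu) ** 2).sum(axis=1).mean()       # approx. d = 3
risk_js = ((shrunk - mu) ** 2).sum(axis=1).mean()   # strictly smaller at mu = 0
```

At $\mu = 0$ the exact risks are $d = 3$ and $d - (d-2)^2 \,\mathrm{E}[\|X\|^{-2}] = 2$, respectively, using $\mathrm{E}[1/\chi_3^2] = 1$.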

The Rosenblatt process
An important process $(Z_t)_{t\in[0,T]}$ verifying the conditions of Section 4 and of Section 5.2, respectively, is the Rosenblatt process on a compact interval $[0,T]$.
(1) Tudor [26] gives a representation of the Rosenblatt process on $[0,T]$ as a double Wiener-Itô integral. The covariance function is given by
$$\mathrm{E}_u[Z_s Z_t] = \frac{1}{2}\left( s^{2H} + t^{2H} - |t - s|^{2H} \right),$$
and we have almost everywhere the mixed second-order derivative:
$$\gamma(s, t) = H(2H - 1) \, |t - s|^{2H - 2}.$$
This function is in $L^{q_1}([0,T]^2, \lambda_T^2)$ for every $1 < q_1 < 1/(2 - 2H)$. The function $t \mapsto \mathrm{E}_u[Z_t^2] = t^{2H}$ is integrable over $[0,T]$. The conditions of Section 5.2 are easy to check (see also [26]). It is thus possible to apply Theorem 4.1 to construct an estimator with smaller risk than the standard estimator in the case of a Rosenblatt process with Hurst parameter $1/2 < H < 1$.
(2) If $(Z_t)_{t\in[0,T]}$ is a Rosenblatt process with $1/2 < H < 1$, the conditions on $\gamma$ can be made explicit. For this particular process, we have with Eq. (5.7) that $\gamma(s,t) = H(2H-1)|t-s|^{2H-2}$ for almost every $(s,t)$. It is easy to see that $\gamma \in L^1([0,T]^2, \lambda_T^2)$, and this is enough for the results above to hold in the case of the Rosenblatt process. Consider $\varphi \in L^2([0,T], \lambda_T) \subset L^{1/H}([0,T], \lambda_T)$. With formula (5.9) and [19, p. 19, (4.8)] we have
$$\mathrm{E}_u\!\left[\left( \int_0^T \varphi(s) \, dZ_s \right)^2\right] = H(2H-1) \int_0^T\!\!\int_0^T \varphi(a) \varphi(b) \, |a - b|^{2H-2} \, da \, db \le c(H) \, \|\varphi\|_{L^{1/H}}^2,$$
where the constant $c(H)$ depends only on $H$. This relation is sufficient to prove, for instance, that the functions $g_i$ are continuous if $\dot h_i \in L^2([0,T], \lambda_T)$. We can also make a more specific statement about Hölder continuity: adapting the proof of Theorem 9.3, we have for $0 \le s \le t \le T$ that $\mathrm{E}_u[(Z_t - Z_s)^2] = (t - s)^{2H}$, and we find as in Theorem 9.3 that the Rosenblatt process has a $k$-Hölder continuous version for every $k \in (0, H)$. This result is also stated in [14].
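The claimed form of $\gamma$ can be checked numerically against the covariance $\mathrm{E}[Z_s Z_t] = (s^{2H} + t^{2H} - |t-s|^{2H})/2$ by a finite-difference approximation of the mixed second-order derivative; the values of $H$, $(s, t)$ and the step $h$ below are illustrative.

```python
H = 0.75                     # Hurst parameter, 1/2 < H < 1 (illustrative value)

def R(s, t):
    # covariance function of the Rosenblatt process
    return 0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))

def gamma(s, t):
    # claimed mixed second-order derivative, a.e. off the diagonal
    return H * (2 * H - 1) * abs(t - s) ** (2 * H - 2)

# central finite difference for the mixed partial d^2 R / ds dt at (s, t)
s, t, h = 0.3, 0.7, 1e-3
mixed_fd = (R(s + h, t + h) - R(s + h, t - h)
            - R(s - h, t + h) + R(s - h, t - h)) / (4 * h * h)
```

Away from the diagonal $s = t$ the covariance is smooth, so the finite difference converges to $\gamma(s,t)$ as $h \to 0$.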

Estimation of the constant A
In this section we discuss the problem of finding an optimal value for the constant $A$ used to define the estimator of Eq. (4.8). For the case $d \ge 3$, Theorem 4.1 states that the risk of $X_t - \xi_t$ is smaller than the risk of the standard estimator for every $a$ that is large enough, more precisely $a > A > 0$, where $A$ is a constant depending on $h_1, \ldots, h_d$, $f$ and $T$.
In this section we discuss the problem of estimating A. In Proposition 7.3 and Proposition 7.4, we give estimates for two special settings, one being related to the Rosenblatt process, the other related to the martingale case.

Estimating A in the Rosenblatt case
We estimate $A$ in the case of a Rosenblatt process $(Z_t)_{t\in[0,T]}$.

Remark 7.1. For the proof of the main result, it is essential that $d \ge 3$ and that inequality (7.1) holds. We have already proved that this inequality holds whenever $a$ is large enough, more precisely for $a > A > 0$. Finding the smallest possible value of $A$ is a nontrivial problem. We show how a value for $A$ can be found using numerical calculations; notice that this approach does not provide the smallest possible value for $A$. We consider $T > 0$, $d \ge 3$ and $k \in (0, 1)$, and use the notations of Theorem 4.1, for instance $h(t) = (h_1(t), \ldots, h_d(t))^\top$. We notice that $X_t = u_t + Z_t$, thus $(X_t)_{t\in[0,T]} = (Z_t)_{t\in[0,T]}$ if $u = 0$. We estimate the terms of inequality (7.1).
(a) We give a lower bound for the first term of inequality (7.1). We notice that $Z(h)^\top B Z(h) \ge 0$ and that $\phi \colon [0, \infty) \to (0, 1]$, $x \mapsto (1 + x)^{-2}$, is a convex function, so that Jensen's inequality yields a lower bound for every $a > 0$. In the last step we have used that $\mathrm{E}_0\big[\sqrt{Z(h)^\top B Z(h)}\,\big] \le \sqrt{\mathrm{E}_0\big[Z(h)^\top B Z(h)\big]}$.

(b) We next estimate the second term. We have $C^\top C = C^2 = B$, and we calculate the relevant expectation with the integration by parts formula, which gives an upper bound.

(c) We now estimate the remaining term. We define $\tilde h(t) := C h(t)$, so that $C Z(h) = Z(\tilde h)$; to simplify notations we write $Z(\tilde h_i) = I_2(q_i)$. Writing $\tilde\otimes$ for the symmetrization of the contraction of two functions, we use the product formula for multiple integrals together with the inequality
$$\mathrm{E}_0\big[I_1(l_1)^2 \, I_1(l_2)^2\big] \le 3 \, \mathrm{E}_0\big[I_1(l_1)^2\big] \, \mathrm{E}_0\big[I_1(l_2)^2\big],$$
valid for any square-integrable functions $l_1, l_2$ and random variables $I_1(l_1), I_1(l_2)$ in the first Wiener-Itô chaos. Together with inequality (7.2), this yields an upper bound for this term as well.

(d) Collecting the bounds of (a)-(c), we obtain a bound for the left-hand side of inequality (7.1) for $a > 0$ and $k \in (0, 1)$. For concrete situations, this last expression can be useful to find possible values for $a$. It is however obvious that the calculations above do not provide the optimal value for $a$.
Example 7.2. (a) In practice it may be useful to consider a noise that has a constant variance. We consider thus the model given by the equation
$$\tilde X(t, \varepsilon, T) = u_t + \varepsilon \, t^{-H} Z_t = u_t + \tilde Z(t, \varepsilon, T), \qquad t \in (0, T], \tag{7.5}$$
where $\varepsilon > 0$ is the noise amplitude and $(Z_t)_{t\in[0,T]}$ is a standard Rosenblatt process with Hurst parameter $H \in (1/2, 1)$; we define $\tilde X(0, \varepsilon, T) = 0$. We then have $\mathrm{E}_u[(\tilde X(t, \varepsilon, T) - u_t)^2] = \varepsilon^2 t^{-2H} t^{2H} = \varepsilon^2$ for every $t \in (0, T]$. We use only the simple random variables $\tilde X(t_i, \varepsilon, T)$, where $t_i = i T / d$ for $i = 1, \ldots, d$ and $d \ge 3$, to construct an estimator of the form (7.6); clearly $t_i$ depends on $T$, but we do not insist on this obvious dependence in the calculations below. We use the notation $\tilde g(t, \varepsilon, T) = (\tilde g_1(t, \varepsilon, T), \ldots, \tilde g_d(t, \varepsilon, T))^\top$ for $t \in (0, T]$, with $\tilde g_i(0, \varepsilon, T) = 0$; the functions $\tilde g_i(\cdot, \varepsilon, T)$ are right-continuous in $t = 0$. We have seen in Theorem 4.1 that the estimator defined in Eq. (7.6) has smaller risk than the standard estimator if $a > A(\varepsilon, T)$, where $A(\varepsilon, T)$ does not depend on the drift and is supposed to be chosen as the infimum of all possible (positive) values. We prove the relation
$$A(\varepsilon, T) \, \varepsilon^2 \, T = A(1, 1) \tag{7.7}$$
in Section 9.5. We have obviously:
- $A(\varepsilon, T)$ decreases if $\varepsilon > 0$ is fixed and $T$ increases,
- $A(\varepsilon, T)$ decreases if $\varepsilon > 0$ increases and $T$ is fixed.
This observation reflects our intuition: if $\varepsilon > 0$ is small, $\tilde X(t, \varepsilon, T) \approx u_t$.
Since it is not possible to improve upon the "estimator" $(u_t)_{t\in[0,T]}$, the term $\tilde\xi_a(t, \varepsilon, T)$ in our estimator should be small; this is realized in particular if $A(\varepsilon, T) \to \infty$ for $\varepsilon \downarrow 0$. On the other hand, a large value of $\varepsilon$ reflects an important noise; the standard estimator for the drift is then not very good, and the term $\tilde\xi_a(t, \varepsilon, T)$ in $\tilde X(t, \varepsilon, T) - \tilde\xi_a(t, \varepsilon, T)$ should allow an improvement upon $\tilde X(t, \varepsilon, T)$. This is realized in particular if $A(\varepsilon, T) \downarrow 0$ for $\varepsilon \to \infty$. If $A(\varepsilon, T)$ or $A(1, 1)$ cannot be chosen as the infimum of all possible values, Eq. (7.7) becomes an inequality. For instance, if $A(1, 1)$ is estimated by a non-optimal value $A'(1, 1)$, we do not have an optimal value for $A(\varepsilon, T)$ either, but we can state:
$$A(\varepsilon, T) \le \frac{A'(1, 1)}{\varepsilon^2 T}.$$
For $d = 3$ and $H = 0.55$, numerical calculations and the estimations of Remark 7.1 give for instance $A'(1, 1) \approx 4.727 \times 10^7$; with $T = 1000$ and $\varepsilon = 100$, we find $A(\varepsilon, T) \le 4.727$. We notice that $\varepsilon^2 T$ is the variance of the noise integrated over the interval $[0, T]$: $\int_0^T \mathrm{E}_u[\tilde Z(t, \varepsilon, T)^2] \, dt = \varepsilon^2 T$.

(b) It should be noticed that similar calculations can be made for the process given by
$$\tilde X(t, \varepsilon, T) = u_t + \varepsilon Z_t, \qquad t \in [0, T], \tag{7.8}$$
where $(Z_t)_{t\in[0,T]}$ is a standard Rosenblatt process as above. For the process defined by Eq. (7.8) we find an analogous scaling relation, in which the factor $\varepsilon^2 T^{2H+1}$ is again a multiple of the variance of the process integrated over $[0, T]$. We conclude:

Proposition 7.3. Consider a Rosenblatt process $(Z_t)_{t\in[0,T]}$ with Hurst parameter $1/2 < H < 1$ and the situation defined in Eq. (7.4), respectively in Eq. (7.5), by
$$\tilde X(0, \varepsilon, T) = 0; \qquad \tilde X(t, \varepsilon, T) = u_t + \varepsilon \, t^{-H} Z_t = u_t + \tilde Z(t, \varepsilon, T), \qquad t \in (0, T],$$
with $\varepsilon = 100$ and $T = 1000$. Consider the functions $\tilde g_i(\cdot, \varepsilon, T) \colon t \mapsto \mathrm{E}_u[\tilde Z(t, \varepsilon, T) \tilde Z(t_i, \varepsilon, T)]$ and $d = 3$. The estimator defined in Eq. (7.6) by
$$\tilde X(t, \varepsilon, T) - \tilde\xi_a(t, \varepsilon, T) = \tilde X(t, \varepsilon, T) - \frac{\tilde g(t, \varepsilon, T)^\top \tilde B(\varepsilon, T) \big( \tilde X(t_i, \varepsilon, T) \big)_i}{a + \big( \tilde X(t_i, \varepsilon, T) \big)_i^\top \tilde B(\varepsilon, T) \big( \tilde X(t_i, \varepsilon, T) \big)_i}$$
has smaller risk than the standard estimator for $a > 4.727$.
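The numerical bound above follows from the scaling relation (7.7), with $\varepsilon$ denoting the noise amplitude; the arithmetic can be checked directly.

```python
# scaling relation (7.7): A(eps, T) * eps**2 * T = A(1, 1)
A11 = 4.727e7            # numerical estimate A'(1, 1) from Remark 7.1 (d = 3, H = 0.55)
eps, T = 100.0, 1000.0   # parameters of Proposition 7.3
A_eps_T = A11 / (eps ** 2 * T)   # the bound quoted in Proposition 7.3
```

Since $\varepsilon^2 T = 10^7$, the bound reduces exactly to $4.727$.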

Estimating A in the martingale case
As we did in Section 7.1, we can estimate the constant $A$ in the martingale case. We notice that inequality (7.3) holds generally for a noise living in the second chaos and is not specific to the case of the Rosenblatt process. We choose in Eq. (4.1) $f(x_1, x_2; t) := \frac{1}{2} \mathbf{1}_{\{0 \le \min\{x_1, x_2\} \le \max\{x_1, x_2\} \le t\}}$. This leads to the process $(X_t)_{t\in[0,T]}$ defined by:
$$X_t = u_t + \frac{1}{2}\left[ (W_t^u)^2 - t \right], \qquad t \in [0, T].$$
The noise $Z_t = \frac{1}{2}\big[ (W_t^u)^2 - t \big]$ is a self-similar process. This follows from the self-similarity of the Brownian motion; we prove the equality of all finite-dimensional distributions. Consider $s > 0$ and $t_1, \ldots, t_n$ with $s t_i \in [0, T]$; then:
$$P_u(Z_{s t_1} \le a_1, \ldots, Z_{s t_n} \le a_n) = P_u\big( (W^u_{s t_i})^2 \le 2 a_i + t_i s, \ i = 1, \ldots, n \big) = P_u\big( s (W^u_{t_i})^2 \le 2 a_i + t_i s, \ i = 1, \ldots, n \big) = P_u\Big( \tfrac{s}{2} \big[ (W^u_{t_i})^2 - t_i \big] \le a_i, \ i = 1, \ldots, n \Big) = P_u(s Z_{t_1} \le a_1, \ldots, s Z_{t_n} \le a_n).$$
Thus $(Z_{ct}) \stackrel{(d)}{=} (c Z_t)$, where $\stackrel{(d)}{=}$ means equality of all finite-dimensional distributions.
We consider the analogue of Eq. (7.4) in the present context:
$$\tilde X(0, \varepsilon, T) := 0; \qquad \tilde X(t, \varepsilon, T) = u_t + \frac{\sqrt{2}\,\varepsilon}{t} \cdot \frac{(W_t^u)^2 - t}{2}, \qquad t \in (0, T]. \tag{7.10}$$
The estimation of the constant $A$ for the model of Eq. (7.10) is analogous to the estimation in Section 7.1. We consider again a drift estimator of the form
$$\tilde X(t, \varepsilon, T) - \tilde\xi_a(t, \varepsilon, T) = \tilde X(t, \varepsilon, T) - \frac{\tilde g(t, \varepsilon, T)^\top \tilde B(\varepsilon, T) \big( \tilde X(t_i, \varepsilon, T) \big)_i}{a + \big( \tilde X(t_i, \varepsilon, T) \big)_i^\top \tilde B(\varepsilon, T) \big( \tilde X(t_i, \varepsilon, T) \big)_i} \tag{7.11}$$
and $t_i = i T / d$ for $d \ge 3$. Using the relation $\mathrm{E}_u\big[ \big( (W_a^u)^2 - a \big) \big( (W_b^u)^2 - b \big) \big] = 2 \min\{a, b\}^2$, we find that
$$\mathrm{Cov}_u\big( \tilde X(t, \varepsilon, T), \tilde X(t_i, \varepsilon, T) \big) = \frac{\varepsilon^2 \min\{t, t_i\}^2}{t \, t_i}.$$
With the self-similarity of the noise $(Z_t)_{t\in[0,T]}$ we can prove an analogue of Eq. (7.7), and find numerically, as in Section 7.1, for $d = 3$, $T = 1000$ and $\varepsilon = 100$, that $A(\varepsilon, T) \le 1.057$. We conclude:

Proposition 7.4. Consider the situation defined in Eq. (7.10) by
$$\tilde X(0, \varepsilon, T) := 0; \qquad \tilde X(t, \varepsilon, T) = u_t + \frac{\sqrt{2}\,\varepsilon}{t} \cdot \frac{(W_t^u)^2 - t}{2} = u_t + \tilde Z(t, \varepsilon, T), \qquad t \in (0, T],$$
with $\varepsilon = 100$ and $T = 1000$. Consider the functions $\tilde g_i(\cdot, \varepsilon, T) \colon t \mapsto \mathrm{E}_u[\tilde Z(t, \varepsilon, T) \tilde Z(t_i, \varepsilon, T)]$ and $d = 3$. The estimator defined in Eq. (7.11) has smaller risk than the standard estimator for $a > 1.057$.
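The covariance identity used above, $\mathrm{E}\big[((W_a)^2 - a)((W_b)^2 - b)\big] = 2\min\{a, b\}^2$, can be checked by simulation; the values of $a$, $b$ and the sample size below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, n = 1.0, 2.0, 2_000_000      # 0 < a < b

# W_a and W_b via independent Brownian increments
Wa = rng.normal(0.0, np.sqrt(a), size=n)
Wb = Wa + rng.normal(0.0, np.sqrt(b - a), size=n)

mc = ((Wa ** 2 - a) * (Wb ** 2 - b)).mean()
exact = 2 * min(a, b) ** 2         # = 2 here
```

The identity also follows analytically from Isserlis' theorem, since $\mathrm{Cov}(W_a, W_b) = \min\{a, b\}$.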

Discrete-time results
In this section, we give, without proof, a discrete version of the continuous-time models considered so far.

Discrete-time version of the estimator
Consider the martingale case of Section 6.1. In Proposition 5.1 we have noticed that the results of Theorem 4.1 hold in a Wiener-Itô chaos of higher order. We consider a stochastic process $(X_t)_{t\in[0,T]}$ whose noise involves the Hermite polynomial $H_n$ of order $n$. Using the properties of the Hermite polynomials and the classical version of the integration by parts formula, we can show the following discrete version of Theorem 4.1: the corresponding shrinkage estimator has smaller risk than $X$ as an estimator for $\mu$ with respect to the quadratic loss function, provided that $a$ is large enough and $d \ge 3$.
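To illustrate the discrete version, the sketch below takes second-chaos noise, $H_2(y) = y^2 - 1$ in the probabilists' convention applied componentwise, with $\mu = 0$, $B = I_d$ and illustrative constants $a = b = 1$. At $\mu = 0$ the shrinkage factor lies in $(0, 1]$, so the risk reduction here is immediate; the sketch only illustrates the estimator's form, not the full domination statement (which requires $a$ large enough, uniformly in $\mu$).

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 100000
mu = np.zeros(d)                        # illustration at mu = 0 only

# second-chaos noise: H_2(Y) = Y**2 - 1 componentwise, Y ~ N(0, I_d)
Y = rng.normal(size=(n, d))
X = mu + Y ** 2 - 1                     # observation X = mu + H_2(Y)

a, b = 1.0, 1.0                         # illustrative constants (B = I_d for simplicity)
norm2 = (X ** 2).sum(axis=1, keepdims=True)
delta = X * (1.0 - b / (a + norm2))     # shrinkage estimator X - b X / (a + |X|^2)

risk_std = ((X - mu) ** 2).sum(axis=1).mean()       # approx. d * Var(H_2(Y)) = 6
risk_shrunk = ((delta - mu) ** 2).sum(axis=1).mean()
```

Since each component of $H_2(Y)$ has variance $2$, the standard risk concentrates near $2d = 6$.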

Approximation of the discrete-time estimator
Consider the discrete situation with $Y \sim N(0, \Sigma)$; the resulting expression corresponds to the $j$-th component of the estimator for the discrete version, up to a factor $d$ (which can be eliminated by replacing $a$ by $a d$ in the estimator for the continuous version).

Proof of Theorem 2.1
Let $(\xi_t)_{t\in[0,T]}$ be an unbiased adapted drift estimator: $\mathrm{E}_u[\xi_t] = \mathrm{E}_u[u_t]$, $t \in [0,T]$, where $u_t = \int_0^t \dot u_s \, ds$. Consider $v_t := \int_0^t b_s^2 \, ds$. Condition 4 implies that $u + \varepsilon v \in L^2(\Omega \times [0,T], P_{u + \varepsilon v} \otimes \lambda_T)$ and $u_t + \varepsilon v_t = \int_0^t (\dot u_s + \varepsilon b_s^2) \, ds$. We can suppose without loss of generality that $U \subset (-1, 1)$. For the unbiased adapted drift estimator we have, for every $|\varepsilon| < 1$ and $t \in [0,T]$: (a) we use that the corresponding density process is a $(\mathcal{F}_t)_{t\in[0,T]}$-martingale; (c) we apply the same procedure to calculate $\frac{d}{d\varepsilon} \mathrm{E}_{u + \varepsilon v}[v_t] \big|_{\varepsilon = 0}$, and we give a justification for the existence of this derivative in (d).
Comparing Eq. (9.1), Eq. (9.3) and Eq. (9.4), we obtain the desired identity. We then apply the Cauchy-Schwarz inequality and the Itô isometry to obtain the lower bound, and calculate $\mathrm{Var}_u(X_t) = \mathrm{E}_u[(X_t - u_t)^2]$ by applying the Itô isometry.