Parameter Estimation of Gaussian Stationary Processes using the Generalized Method of Moments

We consider the class of all stationary Gaussian process with explicit parametric spectral density. Under some conditions on the autocovariance function, we defined a GMM estimator that satisfies consistency and asymptotic normality, using the Breuer-Major theorem and previous results on ergodicity. This result is applied to the joint estimation of the three parameters of a stationary Ornstein-Uhlenbeck (fOU) process driven by a fractional Brownian motion. The asymptotic normality of its GMM estimator applies for any H in (0,1) and under some restrictions on the remaining parameters. A numerical study is performed in the fOU case, to illustrate the estimator's practical performance when the number of datapoints is moderate.


Introduction
In this article we are interested in the parameter estimation of a stationary Gaussian process with explicit spectral density, which applies in particular to the stationary solution of the Ornstein-Uhlenbeck equation driven by the fractional Brownian motion, also known as the fOU process. The main assumption on the data is that a single trajectory of the process is observed at predetermined discrete times, with a fixed length of observation time, and an increasing horizon. Thus this study, which applies to continuous-time Gaussian stochastic processes, also falls within the realm of Gaussian time series with complex parametric autocovariances.
It is well known that the Maximum Likelihood estimation (MLE) is the preferred method of estimation because of its asymptotic optimality among densities with regular properties and independent samples (see for example, [? ]). If the samples are generated from a single path of a stationary process, they can be far from independent, and the asymptotic properties of the MLE can be deduced, but this would depend on their distributional assumptions. For example, one of the most studied classes of stochastic processes is the Gaussian one, mainly because of its wide use in modeling events which have some degree of temporal dependence, particularly when joint-distributional information need only rely on the first two moments, since no further information is needed in the Gaussian case. In a particular instance, the MLE estimator of a stationary ARMA process is strongly consistent and asymptotically efficient (see section 10.8 in [? ]). As an interesting generalization of the above fact, [? ] studies the parameter estimation of a Gaussian process with rational spectral density, using a modification of the likelihood function.
Another interesting example in which the MLE uses spectral information is the estimation of self-similar processes. The best known example of such processes in a Gaussian setting is the fractional Brownian motion (fBm). Its applications have been widely studied in many areas including hydrology, finance, climatology, and others. Jan Beran explains in more detail its definitions and properties in [? ] as well as another example of interest in applications: the fARIMA process. The estimation of the fractal dimension H of the fBm process, also known as Hurst parameter, which turns out to be the same parameter determining the slow power speed of decay of the auto-correlation function for fBm's stationary increments, namely n 2H−2 , has been researched from different points of view: exact and approximate Maximum Likelihood using spectral in- A natural generalization of the above example is to consider processes whose out to be strongly consistent and satisfies a CLT for H ∈ 1 2 , 3 4 . The proofs of strong consistency and the CLT for both estimates rely on Malliavin calculus techniques, in particular, the Nualart-Pecatti-Ortiz characterization of normal convergence for multiple stochastic integrals (see [? ]). In a more recent study, Brouste and Iacus (see [? ]) proved that the variogram of a non-stationary fOU process satisfies the regularity assumptions of Istas and Lang (see [? ]) and hence they were able to give a joint estimator of (H, σ 2 ) when λ is fixed. In fact, they proved that it is strongly consistent and asymptotically normal. When H ∈ ( 1 2 , 3 4 ) they also gave an estimator of λ based on Hu and Nualart's results in [? ], under a certain combination of increasing-domain and infill asymptotics. For H and σ fixed, their estimate is strongly consistent and asymptotically normal as well. Recently [? ] generalized the previous results to the estimation of drift and scaling parameters of essentially any long-memory stationary Gaussian process. That paper was written in response to their own paper jointly with El Onsy [? ] and several other recent papers cited in [? ] which are other special cases of their framework. However, their method does not appear to handle any memory parameter, nor do any of the methods cited in their work.
As we mentioned from the outset, our interest is to estimate the parameters in stochastic differential equations using discrete observations, which can be particularly useful in the areas of econometrics and finance; indeed these fields' long-memory stochastic modeling are typically limited in how frequently data can be observed. Section section 4 contains a brief and specific analysis of such a situation in the case of financial data. Unlike the cases described in the references above, the asymptotics are taken when the number of observations increases, but the time length among observations remains fixed. In the case of a fOU process this problem has been studied by [? ] for the estimation of λ when the Hurst parameter is fixed and is greater than 1 2 . The joint estimation of the pair (λ, σ) has been solved for discrete observations in [? ] when H is kept fixed in 1 2 , 3 4 . It is important to note that [? ] solves the parameter estimation of the entire set of parameters (H, λ, σ) for a fixed interval and infill asymptotics, i.e. drawing on availability of data with arbitrarily high sampling frequency.
In the present article, using the Generalized Method of Moments, we show how to achieve the joint estimation of any finite vector of parameters for stationary Gaussian sequences at the highest level of generality, using discrete observations, without requiring infill asymptotics, and we show that our setting applies directly to the joint estimation of H, λ and σ for the fOU process. Normal asymptotics are established thanks to the Breuer-Major theorem, and comments on efficiency are included. Our work is the first attempt to solve the joint estimation of the fOU process for its three parameters using a classical technique, without using infill asymptotics at all. We show via numerics for the fOU process that our technique is easily implementable, with good empirical precision and robustness properties even when operating with a moderate and realistic dataset size.
An interesting phenomenon occurs in the case of long-memory sequences such as discretely-observed fOU: if using only a Method of Moments based directly on the terms of the sequence itself, not its finite-differences, any estimator based imsart-ejs ver. 2011/11/15 file: BarbozaViens_2016.tex date: January 18, 2017 on quadratic variations fails to be asymptotically normal when H > 3/4. This is a relatively well-known phenomenon, since that range of H falls beyond the Breuer-Major theorem's scope, though typically authors do not seek results in that range. Arguably, such results would be unnecessary since one only needs to consider one order filtered observations to avoid the non-normal asymptotics. However, it is instructive to investigate this case a bit further. Here we provide some calculations for the fOU case showing that the quadratic variation without any filtering cannot converge to a normal law, or even, which is more surprising, to a second-chaos law, but that it remains the basis for a strongly consistent estimator of the variance nonetheless. This requires the use of the Malliavin calculus.
This article is structured as follows: section 2 defines our GMM estimator for any stationary Gaussian process and states its general properties: consistency, asymptotic normality and efficiency. Section 3 is an application of the above results to the joint parameter estimation of the stationary solution of the fractional Ornstein-Uhlenbeck SDE. Finally, section 4 analyzes the numerical performance of the GMM estimator under a simulation study. Appendix A contain additional numerics. Appendix B contains mathematical proofs of technical lemmas and other results in the paper. Appendix C contains the convergence results, with proofs, for the non-finite-differenced estimator for fOU when H > 3/4.

Joint Estimation of Gaussian Stationary processes
Let X t be a real-valued centered gaussian stationary process with spectral density f θ0 (x), where f θ (x) is a continuous function with respect to x, continuously differentiable with respect to θ (θ represents a vector of paremeters and it belongs to a compact set Θ ⊂ R p ) and θ 0 ∈ Interior(Θ). Hereafter, θ 0 will be the true parameter of X t . By Bochner's theorem, the autocovariance function of X t can be written as: for s ≥ 0 and θ ∈ Θ. Note that if we assume that ρ θ (s) is a continuous function of s, then by [? ] (page 257), the process X t is ergodic. Take a positive integer L ≥ p, and define: (2.2) Now we have the condition that let us to identify the parameter θ.
Here we are using the notation ∇ θ · to mean the gradient of its argument with respect to θ. The concept of filter will be used along this article. Its definition is as follows: Definition 2.1. A filter a = (a 0 , . . . , a L ) of length L + 1 and order l is a sequence of L + 1 real numbers such that: a q q l = 0 when l > 0. When l = 0, we will assume that a 0 = 1 and a q = 0 for 0 < q ≤ L.
In this article we will employ a family of discrete filters of length L and orders in {0, . . . , L}. To simplify notation, we define for a filter a with order l: It is straightforward to prove that given two different orders l i , l j , the corresponding vectors b i and b j are linearly independent. For these reason we can assume the following: Assumption 2.2. Assume that we can choose L filters a 1 , . . . , a L with respective orders l 1 , . . . , l L such that the L × L + 1 matrix: Using this last assumption, we can define the function V (θ), for any θ ∈ Θ, as: and the filtered process with step size α > 0 (fixed) at t ≥ 0 as: imsart-ejs ver. 2011/11/15 file: BarbozaViens_2016.tex date: January 18, 2017 Barboza, L. and Viens, F./Parameter Estimation of Gaussian Stationary Processes 7 for a fixed filter a. This filtered process was already considered in previous estimation studies, for example [? ], [? ], [? ] and [? ]. Due to the stationarity of X t , we can note that the variable ϕ(t) is centered gaussian, hence the expected value of ϕ(t) is zero and its variance is E[ϕ(t) 2 ] = V (θ 0 ), as follows: Following the notation of [? ] we define the vector of differences among the squared filtered observations for each of the L filters and its corresponding V (θ), for any θ ∈ Θ as: for 1 ≤ ℓ ≤ L and the subscript ℓ means that we use the filter a ℓ in the computation of the filtered process and its variance entrywise. Clearly, in the jargon of Method of Moments' estimation, g(t, θ) satisfies a population moment condition (see [? ] and [? ]): for all t ≥ 0. Also note that the vector g is a second-Wiener chaos vector (see [? ]). Let us assume that we have observed the process X t at times 0 = t 0 < t 1 < · · · < t N −1 < t N = T and fix α = t i − t i−1 .
Let A be a symmetric positive-definite matrix. We will denote A as its matrix norm induced by the Euclidean norm in R L+1 . It is important to note that A can be chosen to ensure that the GMM estimate is efficient (see section 2.3). Denote, for each θ ∈ Θ, and for an arbitrary time t ≥ 0: Note that g 0 (θ) does not depend on time t, due to the stationarity assumption on X t . Finally, we define the GMM (Generalized Method of Moments) estimator of θ 0 as (see [? ], [? ]):θ The main idea behind the GMM estimation is that since the function Q 0 (θ) attains a unique zero at θ 0 under Assumptions 2.1 and 2.2 (see Lemma 2.2 below), we would want to find a valueθ N (that depends on the current observations) such that our weighted empirical approximation to Q 0 , namelyQ N , should be as small as possible.
It is easy to prove that all the remaining results are still valid if we substitute the fixed matrix A with a sequence of random matrices (perhaps depending on the data) with a deterministically bounded eigenstructure. In particular, we can choose such sequence in order to attain convergence in probability to the efficient alternative of A. For more details see section 2.3.

Consistency
In this section we will prove the strong consistency of the estimator defined in (2.4). Our first lemma shows that the sequenceQ N (θ) is uniformly convergent for any θ ∈ Θ.
Lemma 2.1. It holds that: Under the injectivity assumptions on ρ θ (α), we can prove: And these two lemmas allow us to prove:

Asymptotic Normality
In this section we use the Breuer-Major theorem (see [? ]) to prove the asymptotic normality ofθ N , under some specific assumptions on ρ θ (α) and f θ (x). First denote for any θ ∈ Θ:Ĝ Note that G(θ) does not depend on t because of the stationarity of X t . Now we can state the following lemma: The following is true: Proof. See Appendix B.
The fact that X t is a stationary process allows us to write this gaussian process as a Wiener-Itô integral with respect to a complex centered Gaussian random measure W over [− π α , π α ] as follows (see Section 3.2 in [? ], Theorem 2.7.7 in [? ] and Section 2.1 in [? ]), (2.6) Assume that we choose n and m such that t n < t m . The covariance between X tm and X tn can be computed as: For the next lemma, we need to generalize the definition of b k in (2.3) in the following way: where i, j = 1, . . . , L and a i k is the k-th entry of the filter a i (recall the notation in Assumption 2.2). The main condition in the Breuer-Major theorem is the summability of the second-power of the autocovariance function at integer times (see [? ]). In order to validate this argument we can make the following assumption, which is equivalent to the above fact: Assumption 2.3. Assume that for any i, j ∈ {1, . . . , L} we have that: where l i is the order of the filter a i .
The assumption above and the fact that the vectorĝ N (θ 0 ) can be rescaled to obtain a sum of Hermite processes of second order, are sufficient conditions to prove the asymptotic normality of the sample moment equations, as follows: where Ω is a L × L matrix whose components are: (2.8) The next natural step in this analysis is to study the asymptotic behavior of the Mean Squared error (MSE) ofθ N . Note that from equation (2.5) the filtered process can be represented for any ℓ ∈ {1, . . . , L} and i ∈ {0, 1, . . . , N } as: Hence, using the multiplication rule of Wiener integrals (see [? ]), the componentwise average of the sample moment equations is: Then we can conclude thatĝ N (θ) is a vector of L integrals belonging to the second Wienerchaos of W . Because of Lemma 2.4, we already know that: imsart-ejs ver. 2011/11/15 file: BarbozaViens_2016.tex date: January 18, 2017

Barboza, L. and Viens, F./Parameter Estimation of Gaussian Stationary Processes 11
so, we can use the fact that (ĝ N (θ 0 )) ℓ belongs to the second-Wiener chaos generated by the process W (x) and furthermore we can use the equivalence of all the L s -norms of (ĝ N (θ 0 )) ℓ to get: (see Theorem 3.50 in [? ]) where c s > 0. Using Jensen's inequality: All these calculations allow us to study the behavior of the MSE ofθ N . Let ε N :=θ N − θ 0 . First note that because of the multivariate mean value theorem (see Proposition F.6.1 in [? ] and more details in [? ]): and θ ∈ Θ (see Lemma 2.3). The next lemma states that all moments of ψ N can be uniformly bounded: Lemma 2.5. Let ψ N as in equation (2.10). Under the remark after assumption 2.1 there exists R s,L > 0 such that: Proof. See Appendix B.
Note also that because of the continuity of G: and due to Lemma 2.5, for any s > 0: If we take γ > 0, and under the same reasoning as before, we can use Chebyshev's inequality to get: and hence for any γ < 1 2 there exists s such that s > 1 1−2γ and, by Borel-Cantelli lemma there exists N 0 (ω) < ∞ such that for all N > N 0 (ω): where c is an arbitrary positive constant. Hence, we conclude the following theorem: Theorem 2.2. Under Assumptions 2.1 and 2.2 we have: Theorem 2.3 (Asymptotic Normality). Let X t be a Gaussian stationary process with parameter θ 0 . Ifθ N is the GMM estimator given in (2.4), then under Assumptions 2.1-2.3, it holds that: and Ω is defined as in Lemma 2.4.
Proof. Lemmas 2.3 and 2.4 gives sufficient conditions to apply Theorem 3.4 in [? ], and conclude the asymptotic normality ofθ N .

Practical Considerations
If the number of moment conditions L is greater than the number of parameters p, the asymptotic variance of √ N (θ N − θ 0 ) is minimized when A = Ω −1 (see Theorem 3.4 of [? ] and [? ] for a detailed proof of this fact). In the case of a number of moment equations L equal to the number of parameters p, the estimation problem is equivalent to the Method of Moments and hence A = I.
There are some numerical methods to compute a sequence of positive-definite matrices such thatÂ N d −→ A in such a way thatÂ N depends on the data available. This type of procedure gives numerical advantages, specifically when A = Ω −1 , since the previous matrix depends on the unknown parameter θ, while the sequenceÂ N do not depend on this parameter. Three methods that are commonly used to achieve the above are: 3. Continuous-Updating estimator (CUE) of Hansen, Heaton and Yaron (see [? ]).
For ease in our calculations in sections 3 and 4, we decided to use a constant matrix A = Ω −1 under the case where L > p and A = I otherwise.

Parameter Estimation of the fractional Ornstein-Uhlenbeck process
The main objective of this section is to apply the GMM method in the parameter estimation of a particular gaussian stationary process: the fractional Ornstein-Uhlenbeck process (fOU), under an increasing domain approach. First we will give a brief introduction of the fOU process.

Introduction
Cheridito [? ] defines the fractional Ornstein-Uhlenbeck process with initial condition ξ ∈ L 0 (Ω), as the unique almost surely solution of the Langevin equation: where λ, σ > 0, H ∈ (0, 1] and we use the notation θ = (H, λ, σ). The stationary solution of the above equation is written as: where the integral is understood in the sense of Riemann-Stieltjes. Since this process is centered-gaussian and stationary, we can describe its distributional properties by using only its autocovariance function: where H ∈ (0, 1) and . Note that when H ≥ 1 2 we can obtain an alternative expression for ρ θ (t) (see Lemma B.1 and formula (B.4) in the Appendix B). Note that using equation (3.2), we can write an explicit expression of the spectral density of X t in continuous time: Throughout this section we will assume that there exists a closed rectangle Θ ⊂ R 2 such that θ ∈ Θ.

Consistency
For ease of computation we will study the joint estimation of the three parameters in θ when H ≥ 1 2 , and we will employ the autocovariance function obtained in formula (B.4) only. Note that f θ (x) is continuous with respect to x, H, λ and σ and its partial derivatives are also continuous functions of its parameters.
Assumption 2.1 is verified locally because of Lemma B.2 in Appendix B. Note that the sufficient conditions in order to the above assumption to hold are (1) α < 1, (2) σ is of known sign and (3) 0 < λ < exp(Ψ(3)). The condition (2) does not change the magnitude of the variance of the process X t nor its scale, since the process is centered. Finally, the joint GMM estimator defined in (2.4) is consistent under the previous limitations.
The ergodicity of X t is deduced from the expansion of the function ρ θ (x) when H = 1 2 (see [? ]):

Asymptotic Normality
Assume that we have a collection of at least 3 filters satisfying Assumption 2.2. Hence, we can study under which cases Assumption 2.3 is satisfied: Lemma 3.1. Let X t be the stationary fOU process with parameters θ = (H, λ, σ). The Assumption 2.3 holds under the following two cases: Case 1 If l + l ′ ≥ 1 then it holds for all H ∈ (0, 1), Proof. See Appendix B.
Since the case 1 in the above lemma guarantees the asymptotic normality of the moment equations for any H ∈ (0, 1), we decided to use filters with order greater or equal to 1. If this is the case we can use the previous lemma together with Lemma 2.4 to obtain the asymptotics ofĝ N (θ) as follows: where the covariance matrix Ω(θ) has entries according to equation (2.8).
The behavior of deserves interest by itself due to its asymptotic behavior when H ≥ 3 4 and whose proofs rely heavily on Malliavin calculus techniques. We summarized these results in Appendix C.
We finish this section by applying the general results of the previous section to obtain consistency and asymptotic normality for the fOU's GMM estimator. Assumption 2.1 on injectivity of the estimator is satsified, as shown in Lemma B.2 in Appendix B; this lemma establishes this assumption for the fOU observations under an upper bound on λ and for α sufficiently small, where these conditions do not depend on N . First, we can use the general result in Theorem 2.2 to conclude the following rates of convergence of the GMM estimator: Proposition 3.1. Let X t be a fOU process with parameters θ = (H, λ, σ). Then, for any positive-definite matrix A and under the conditions of Lemma B.2, the GMM estimatorθ N satisfies: for all γ < 1 2 . Now we are ready to state the asymptotic distribution of our joint estimator, using the limiting behavior ofĝ N : Proposition 3.2. Let X t be the stationary fOU process with parameters θ = (H, λ, σ). Then, for any positive-definite matrix A and under the conditions of Lemma B.2, the GMM estimatorθ N of θ in equation (2.4) is consistent for any H ∈ (0, 1) and: and Ω according to equation (2.8).
Proof. The consistency is immediate from Theorem 2.1, since X t satisfies Assumption 2.1 locally. The asymptotic normality is attained since assumption 2.3 holds for any pair of filter lengths whose sum is greater or equal to 1, which is the case in this example. Therefore, we can apply Theorem 2.3 to get the desired result.
As we noted in section 2.3, we chose A = Ω −1 because this ensures that the asymptotic variance is minimal. If this is the case the asymptotic variance can be simplified to: imsart-ejs ver. 2011/11/15 file: BarbozaViens_2016.tex date: January 18, 2017

Numerical Performance of the GMM estimators for the fractional OU process
In this section we test empirically the properties of the GMM estimator of (H, λ, σ) for the fOU case. We are especially interested in checking the performance of the joint estimator for a fixed value of N , particularly when N is not sufficiently large that the constants coming from the asymptotic variances become secondary concerns. As noted below, we run tests for N = 1000 and N = 600 observation times of a single path of the stochastic process, which represents a realistic dataset size for such questions as the estmation of memory length in financial markets under stochastic volatility (see [? ]) or in late-Holocene paleoclimatology (See [? ]), where single observation paths at discrete times are all that is available. These are examples of situations which are not data-rich, but where the amount of data should be sufficient to provide evidence of long memory. Specifically in the case of financial markets with stochastic volatility memory, evidence exists that such memory does not persist for longer than a few weeks at a given level. Assuming for instance that one has access to 20 working days of stock-, option-, and/or volatility-market data, which corresponds to calibrating models to one-month options, in order to stay in the realm of continuous-time models, i.e. to avoid having to consider jump models to account for market microstructure, the best one typically hopes for (liberal estimate) is to observe prices every 5 to 15 minutes depending on option liquidity. This would correspond to roughly 600 to 2000 datapoints before one would need to recalibrate or change the model, hence our order of magnitudes for N . This section provides empirical estimates of the bias and variance of our joint estimator on all three parameters, and a comparison to the the joint estimation of the pair (H, σ) when λ is fixed. First we choose a fixed step size among observations (α). In order to simulate the fOU process we employ the formula (B.4) together with the Choleski decomposition of the autocovariance matrix of this gaussian process. This procedure gives us m independently-generated paths of the fOU process, each such path, sampled at the N = 1000 observation times, representing one replication of the single-trajectory dataset available in practice. In our simulations we use m = 1000 replications. We decided to use finite-difference filters since they are the most typical p-vanishing-moments filters in the literature (see [? ? ? ] for example). The filters were selected sequentially, i.e. we choose l i = i for i = 1, . . . , L, for the purpose of ensuring that the respective matrix B satisfies Assumption 2.2 in page 6.
In order to solve numerically forθ N = (Ĥ N ,λ N ,σ N ) in (2.4), we use the quasi-Newton L-BFGS-R algorithm contained in the R-routine optim (see [? ] and [? ]). This optimization algorithm is designed for solving non-linear problems with simple bounds. In order to check that there was an acceptable level of variability produced by the optimization method, we compared the theoretical bounds induced in formula (3.5) with the empirical variance e(V ar) defined the largest eigenvalue of the empirical covariance matrix of the estimator vector. 99% of the confidence regions constructed with the empirical variance contained imsart-ejs ver. 2011/11/15 file: BarbozaViens_2016.tex date: January 18, 2017 the true parameter for each scenario.
Finally, we ran different scenarios for the purpose of verification of our GMM procedure. We chose four different values of the parameter H (0.55, 0.65, 0.75 and 0.85) so that it is possible to assess the effect of increasing levels of longmemory in our estimates' performance. In addition, for each of the above cases we worked with subscenarios with an increase of 10% in λ and a decrease of 10% in the value of σ, as a quick check on the robustness the empirical statistics. We noted no significant aberrant behavior in these various scenarios; though all results are reported below, readers may safely concentrate on the case λ = σ = 1. We chose a maximum number of filters equal to 7, and since there are three parameters to estimate, we selected an initial order equal to 3. The scenarios were chosen so that we could analyze the impact of small changes on the actual values of the parameters in a given fOU process.
Tables 1 and 2 comprise three empirical meatrics of comparison among scenarios: whereθ N,i is the GMM estimate of θ using a sample size of N which was obtained using the i-th fOU replication. V ar(θ N ) is the empirical covariance matrix of the GMM estimator based on the m replications. These metrics are computed as the number of filters increases (i.e. as L increases). The calculations of Table  1 use a sample size N = 1000 and table 2 use N = 600. The second table also uses an increased time-step of α = 0.5 which in principle might affect estimator accuracy, though this seems to also depend on the value of H. We have also computed the empirical statistics when λ is assumed to be known, and when using another class of filters, with results shown in Appendix A. Below are some conclusions: 1. The empirical MSE ( M SE) seems to decrease for a fixed scenario when the number of filters (number of moment conditions L) increases. For α = 0.1, we observe an increase in the empirical MSE as the Hurst parameter increases; this fact does not persist when α = 0.5, where the empirical MSEs are larger by up to a factor of 3, particularly when H is small. The case of only 600 datapoints with a larger time interval appears to reach the limit of our joint estimation method's reliability when trying to estimate all three paramters together, since the MSE there exceeds 10% of the value of the parameter routinely in this case. See however, the comments below when λ is assumed known. 2. As the drift and diffusion parameters λ and σ are perturbed, the empirical MSE seems to change mainly due to the change in the magnitude of the and e( V ar) to λ and σ are noted as moderate, and for each case the change persists as the number of filters increases. Dependence of e( V ar) on σ is easily explained in theory since the asymptotic covariance in Proposition 3.2 is proportional to σ 2 . The two empirical statistics appear to be largely insensitive to the number of filters, however. 4. If we compare Tables 1 and 2 with Tables 3 and 4 in Appendix A, where the estimation is performed keeping the parameter λ fixed on each scenario, we notice that most of the empirical MSE is due to the variation in λ estimation. The level of precision in essentially all scenarios is higher than 100 to 1, and the estimators can be considered as empirically unbiased. This holds even in the unfavorable case of 600 observation times and larger time step (Table 4). We also note that sensitivity of the estimators' performances to the number of filters is very low, except for large values of H where it seems preferably to choose at least L = 3 even though there are only two parameters to estimate. 5. Finally Tables 5 and 6 in Appendix A contain the joint estimation of the three parameters under the same scenarios when the filters are Daubechies wavelets. We note that if we compare those tables with Tables 1 and 2, performance is essentially unchanged from one filter-class to the other. The slight improvement from Daubechies filters compared to finite-difference filters when the Hurst parameter is larger than 1 2 does not seem to be significant. Table 3 Comparison of Scenarios (α = 0.1). Estimation of (H, σ) with λ fixed.    First recall that E[ϕ(t) 2 ] = V (θ 0 ), where stationarity allows the above expression does not depend on t, and the data's true parameter is θ 0 . Note that, for any θ ∈ Θ and using the ergodicity of X t : where ϕ(t) 2 := (ϕ 1 (t) 2 , . . . , ϕ L (t) 2 ) ′ and V(θ) := (V 1 (θ), . . . , V L (θ)) ′ . Then we have: Furthermore, based on the proof of Theorem 2.6 in [? ]: for any θ ∈ Θ. Here we used the following facts: (1) the matrix A are deterministic, (2) g 0 (θ) is uniformly bounded (because ρ θ (t) is continuous over the compact set Θ) and (3) equation (B.1). Finally if we take the supremum over all θ ∈ Θ, the lemma holds.

Proof of Lemma 2.2
Note that Q 0 (θ) can be written as (for any θ ∈ Θ): imsart-ejs ver. 2011/11/15 file: BarbozaViens_2016.tex date: January 18, 2017 Since A > 0 and B has column-rank between p and L (see assumption 2.2) we can diagonalize B ′ AB to get: where {λ i } p i=1 are non-zero eigenvalues of B ′ AB and J(θ) is a non-injective function of θ such that J(θ 0 ) = 0. Because of Assumption 2.1 the lemma holds.
Proof of Lemma 2.3 (i) : Let 1 ≤ j ≤ p and θ ∈ Θ. Then the j-th column ofĜ N (θ) is: and the continuity is inherited from the continuous differentiability of ρ θ (·). (ii) : Using (i), we have: and this matrix is a full rank-matrix due to the remark after Assumption 2.1 and the fact that B is full-rank too. Since D is positive definite, the result holds.

Proof of Lemma 2.4
First note that by rescaling the vectorĝ N (θ 0 ) with the variance of each component of the vector g(t, θ 0 ), we can write the sample moment equations in the following way: where V D (θ 0 ) := Diag (V ℓ (θ 0 )) ℓ∈{1,...,L} , H 2 (x) = x 2 − 1 is the second Hermite polynomial (see [? ]) and So the main idea of the proof is to use the vector-valued version of the Breuer-Major theorem with spectral-information conditions (see Theorem 3.1, [? ]) to study the asymptotic behavior of (B.2). In order to achieve this, we need to analyze the spectral behavior of Z ℓ,ti . First recall that for n, m ∈ {0, . . . , N } we have (see equation (2.7)): Cov(X tm , X tn ) = 1 2π π −π e i|m−n|uf θ0 u α du.

Proof of Lemma 2.5
For any matrix norm · and N ∈ N, we have that: where · op is the operator norm. Because of the remark after Assumption 2.1 and Lemma 2.3, the operator norm in the above equation is bounded for all N . Also note that if F N := G ′ (θ N )AG(θ N ) and if λ max (A) and λ min (A) are the maximum and minimum eigenvalue of A: where u = G(θ N )x. Moreover, and hence there exists R s,L > 0 such that ψ N 4s < R s,L uniformly for all N . The lemma holds.

Proof of Lemma 3.1
Take l, l ′ ∈ {l 0 , . . . , l L }, hence: First note that for any 0 < H < 1 and p = 0: and this implies that the two summands in equation (B.3) are finite for all p if π < |u| < 1. The same applies if |u| < 1 and p = 0. If p = 0 and |u| < 1 then the situation is slightly more complicated. Note that: if and only if 2(l + l ′ ) > 4H − 3. Then the lemma holds.
Lemma B.1. Assume that X t is the stationary solution of the Ornstein-Uhlenbeck SDE. Then, for t ≥ 0, the autocovariance function ρ θ (t) of θ = (H, λ, σ) is 2 : where: Var Proof. Note that: Using the proof of Lemma 5.2 in [? ] and Lemma 2.1 of [? ]: λv 0 e −s s 2H−2 dsdv and hence: and (B.4) holds. Note that [? ] proves formula (B.5) in the case H ≥ 1 2 , but using the analytical continuation of the gamma function its formula holds also for 0 < H < 1 2 . Lemma B.2. Let X t be the fOU process and θ = (H, λ, σ). Consider the mapping: in a closed rectangle Υ ⊂ Θ where it holds that λ < exp(Ψ(3)) and σ is of known sign. Then Assumption 2.1 is satisfied for α sufficiently small 3 .

Barboza, L. and Viens, F./Parameter Estimation of Gaussian Stationary Processes 34
First note that by using the multiplication rule of Weiner integrals: and hence using the Lemma C.1 (i) and (ii) we have that: and therefore, using Lemma C.2, we conclude that: For H > 3 4 , Lemma C.2 allows us to conclude that the limit of does not belong to the first Weiner chaos, and hence the GMM estimator is not normally distributed. Define the following function: for any (r, s) ∈ R 2 . By Lemma C.3, this function is not a Cauchy sequence in H 2 . Hence we cannot find a second-Wiener integral I 2 (g(·, ·)) such that: E[|F N − I 2 (g(·, ·))| 2 ] = 2 I N (·, ·) − g(·, ·) 2 H 2 −→ 0 when H > 3 Lemma C.1. If A k (u) := A k (u|θ) is defined as (2.5) for u > 0, then: for any H ∈ (0, 1).
Proof. (i) It is trivial to deduce that for u, v > 0: Suppose that A k is defined as (2.5). Define for H ≥ 3 4 : Proof. First note that the inner product of the symmetric tensor products can be written as: then, by symmetry of the indices, we can write: and we can study the limiting behavior of P N through the folllowing cases: Case I: i = i ′ = j = j ′ In this case we analyzed the behavior of the diagonal, which is convergent for any H ≥ 3 4 as follows: Case II: i = i ′ , j = j ′ and i = j In this particular case we observe: Note that if H = 3 4 : and if H > 3 4 : Case III: i = i ′ , j = j ′ and i = j Without loss of generality, we can assume that |i − j| ≥ 2 and |j − j ′ |. If |i−j| = 1 or |j −j ′ | = 1 then it can be proved that the rates of convergence are the same as Case I and II. We can write P N as: BN the first factor behaves as: |x − y| 2H−2 |x − y ′ | 2H−2 |y − y ′ | 2H−2 dxdydy ′ and the second: |x − y| 2H−2 |x − y ′ | 2H−2 |y − y ′ | 2H−2 dxdydy ′ and the previous integral is finite for any H > 1 2 . Hence, P N is bounded asymptotically, for H = 3 4 , by: and if H > 3 4 , it is bounded by: Case IV: i = i ′ , j = j ′ and i = j As we did in the previous case, the rates of convergence when at least two indices are far apart by less than two units are the same as the ones imsart-ejs ver. 2011/11/15 file: BarbozaViens_2016.tex date: January 18, 2017 calculated in Cases I-II-III. So let us assume that |i − i ′ | ≥ 2, |j − j ′ | ≥ 2 and |i − j| ≥ 2.
and note that the first summand behaves asymptotically as: and the second summand can be asymptotically bounded by: and this quadruple integral is bounded for any H > 1 2 . Hence if H = 3 4 , P N is asymptotically bounded by: However, if H > 3 4 , the first summand of P N is asymptotically equal to: then the limit of P N is not 0. Moreover, since D N is asymptotically bounded by N 8H−4 , there exists k = 0 such that: Lemma C.3. If I N is defined as (C.2) and H > 3 Proof. We will prove the statement by contradiction. Assume that I N is Cauchy. First of all, note that I N is bounded in H 2 = L 2 ((−π, π) 2 ): and M 2 := ∞ k=1 k 4H−6 . This implies that I N is dominated in L 2 ((−π/α, π/α) 2 ). Also note that the pointwise limit of I N is (if H > 1 2 ): Then we can apply the dominated convergence theorem to conclude that: = E |I 1 (I N (·, ·))| 2 = 2 I N (·, ·) 2 H 2 −→ 0 but this contradicts the fact that: imsart-ejs ver. 2011/11/15 file: BarbozaViens_2016.tex date: January 18, 2017 which was already evidenced in the proof of Theorem 3.2. Hence, by contradiction, I N is not Cauchy.