Nonparametric estimation of the volatility function in a high-frequency model corrupted by noise

We consider the models Y_{i,n}=\int_0^{i/n} \sigma(s)\,dW_s+\tau(i/n)\epsilon_{i,n} and \tilde Y_{i,n}=\sigma(i/n)W_{i/n}+\tau(i/n)\epsilon_{i,n}, i=1,\ldots,n, where W_t denotes a standard Brownian motion and the \epsilon_{i,n} are centered i.i.d. random variables with E(\epsilon_{i,n}^2)=1 and finite fourth moment. Furthermore, \sigma and \tau are unknown deterministic functions, and W_t and (\epsilon_{1,n},\ldots,\epsilon_{n,n}) are assumed to be independent processes. Based on a spectral decomposition of the covariance structures we derive series estimators for \sigma^2 and \tau^2 and investigate the rate of convergence of their MISE as a function of their smoothness. To this end specific basis functions and their corresponding Sobolev ellipsoids are introduced, and we show that our estimators are optimal in the minimax sense. Our work is motivated by microstructure noise models. Our major finding is that the microstructure noise \epsilon_{i,n} introduces an additional degree of ill-posedness of 1/2, irrespective of the tail behavior of \epsilon_{i,n}. The method is illustrated by a small numerical study.


Introduction
Consider the models
Y_{i,n} = \int_0^{i/n} \sigma(s)\,dW_s + \tau(i/n)\epsilon_{i,n}, (1.1)
\tilde Y_{i,n} = \sigma(i/n)W_{i/n} + \tau(i/n)\epsilon_{i,n}, i = 1, \ldots, n, (1.2)
where (W_t)_{t∈[0,1]} and (ε_{1,n}, \ldots, ε_{n,n}) are assumed to be independent, and σ and τ are unknown, positive and deterministic functions.
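For concreteness, both observation schemes can be simulated directly. The following sketch is our own illustration (not part of the paper): it uses a left-point Euler approximation of the Itô integral and Gaussian ε_{i,n}; all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, sigma, tau, model=1):
    """Simulate observations from model (1.1) or (1.2).

    sigma, tau: callables on [0, 1] (the unknown deterministic functions).
    Model (1.1): Y_i = int_0^{i/n} sigma(s) dW_s + tau(i/n) eps_i  (Euler approximation).
    Model (1.2): Y_i = sigma(i/n) W_{i/n}       + tau(i/n) eps_i.
    """
    t = np.arange(1, n + 1) / n
    dW = rng.standard_normal(n) / np.sqrt(n)      # Brownian increments over [0, 1]
    if model == 1:
        signal = np.cumsum(sigma(t) * dW)         # left-point Euler scheme for the Ito integral
    else:
        signal = sigma(t) * np.cumsum(dW)         # space-transformed Brownian motion
    eps = rng.standard_normal(n)                  # centered, unit variance, finite 4th moment
    return signal + tau(t) * eps

sigma = lambda t: np.sqrt(2 + np.cos(2 * np.pi * t))   # the sigma used in Section 6
tau = lambda t: 0.01 * np.ones_like(t)
y1 = simulate(25_000, sigma, tau, model=1)
y2 = simulate(25_000, sigma, tau, model=2)
```

The sample size n = 25,000 matches the one used in the numerical study of Section 6.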
Our models (1.1) and (1.2) are natural extensions of the situation where σ and τ are constant, which has been, in a slightly broader setting, previously considered by [8], [13], [14] and [24], among others. In the latter papers sharp minimax estimators were derived for σ² and τ². The minimax rate for σ² is n^{−1/4} and for τ² it is n^{−1/2}, the corresponding constants for quadratic loss (MSE) being 8τσ³ and 2τ⁴, respectively. To estimate σ and τ, maximum likelihood is feasible (see [24]) and achieves these bounds. Other efficient estimators were given by [8], [13] and [14]. In our case, i.e. when σ and τ are functions, these methods fail and techniques from nonparametric regression become necessary. We postpone a more careful discussion of models (1.1) and (1.2) to Section 2.
Both models incorporate, as is usual in high-frequency financial models, an additional noise term, denoted microstructure noise (cf. [1] and [16]), in order to model market frictions such as bid-ask spreads and rounding errors. Microstructure noise is typically modeled as a white noise process with bounded fourth moment. Therefore, we may interpret both models as obtaining data from transformed Brownian motions under additional measurement errors. In particular, our assumptions cover the important case ε_{i,n} ~ N(0, 1), i.i.d.
In this paper we investigate how the functions σ² and τ² in (1.1) and (1.2) themselves, i.e. the time derivative of the integrated volatility, can be estimated. To our knowledge, this issue has never been addressed before; a remarkable exception is [3], where a harmonic analysis technique is introduced in order to recover σ² from noiseless data. A naive estimator of σ² would be the derivative of an estimator of \int_0^s σ²(x)dx with respect to s. However, (numerical) differentiation of \int_0^s σ²(x)dx with respect to s yields an additional degree of ill-posedness, and to the best of our knowledge there are no estimators and no theoretical results available for estimating σ² in our situation. Instead, we propose regularized estimators for σ² and τ² that attain the minimax rate of convergence. Our estimator is a Fourier series estimator, where we estimate the single cosine Fourier coefficients \int_0^1 σ²(x) cos(kπx)dx, k = 0, 1, \ldots, by a particular spectral estimator which is specifically tailored to this problem. The difficulty of estimating σ² can be explained generically from the point of view of statistical inverse problems: microstructure noise induces an additional degree of ill-posedness (similar to a deconvolution problem), which in our case leads to a reduction of the rate of convergence by a factor 1/2. Surprisingly, and in contrast to deconvolution, this is reflected only in the behavior of the eigenvalues of the covariance operators of the processes in (1.1) and (1.2), and not in the tail behavior of the Fourier transform of the error ε_{i,n}.
We stress that we are aware that our model assumes deterministic functions σ and τ, which depend only on time t; the generalization to σ(t, X_t) is not obvious and a challenge for further research. However, the purely deterministic case already helps to reveal the daily pattern of the volatility, and we believe that our analysis is an important step towards understanding these models from the viewpoint of statistical inverse problems. Results: All results are obtained with respect to the MISE risk. Let α and β denote the smoothness of σ² and τ², respectively. Roughly speaking, these numbers correspond to the usual Sobolev indices, although in our situation a particular choice of basis is required, leading us to the definition of Sobolev s-ellipsoids (see Definition 1). We show that τ² can be estimated at rate n^{−β/(2β+1)} for β > 1, α > 1/2 in model (1.1) and β > 1, α > 3/4 in model (1.2). This corresponds to the classical minimax rates for the usual Sobolev ellipsoids without the Brownian motion term in (1.1) and (1.2). More interestingly, for estimation of σ² we obtain the rate of convergence n^{−α/(4α+2)} for α > 3/4, β > 5/4 in model (1.1) and α > 3/2, β > 5/4 in model (1.2). We will show that these rates are uniform over Sobolev s-ellipsoids. Lower bounds with respect to Hölder classes for estimation of σ² have been obtained in [17]. Here we extend this result to Sobolev s-ellipsoids. It follows that the obtained rates are indeed minimax.
To summarize, our major finding is that, in contrast to ordinary deconvolution, the difficulty of estimating σ² when corrupted by additional (microstructure) noise is generically increased by a factor of 1/2 within the s-ellipsoids. This is quite surprising, because one might have expected that, for instance, Gaussian errors lead to logarithmic convergence rates due to the exponential decay of their Fourier transform (see e.g. [4], [6], [7] and [11] for some results in this direction). We stress that our method requires a minimal smoothness of σ of α > 1/2 in (1.1) and of α > 3/2 in (1.2). Although the convergence rates are half of those in usual nonparametric regression, it turns out that for large sample sizes we obtain reasonable estimates for smooth functions σ². Roughly speaking, the results imply that n data points for estimation of σ² are comparable to √n observations in usual nonparametric regression.
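The "√n observations" heuristic is just a rewriting of the exponents: the rate for σ² at sample size n coincides with the classical nonparametric rate evaluated at sample size √n,

```latex
n^{-\alpha/(4\alpha+2)} \;=\; \bigl(n^{1/2}\bigr)^{-\alpha/(2\alpha+1)},
```

so, in terms of rates, estimating σ² from n noisy high-frequency observations is as hard as a standard nonparametric regression problem with √n observations.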
The work is organized as follows. In Sections 2 and 3 we discuss models (1.1) and (1.2) in more detail, introduce notation and define the required smoothness classes, the Sobolev s-ellipsoids (details can be found in Appendix B). Sections 4.1 and 4.2 are devoted to the estimation of σ² and τ², respectively, and to the rates of convergence of the estimators (for proofs see Appendix A). Section 5 provides the minimax result. In Section 6 we briefly discuss some numerical results and illustrate the robustness of the estimators against non-normality and violations of the required smoothness assumptions on σ² and τ². Some further results and technicalities for Sections 4.1 and 4.2 are given in the supplementary material.
Discussion of Models (1.1) and (1.2)
In this section we briefly discuss the background of model (1.1) in financial economics and explore the differences between models (1.1) and (1.2). We may consider the processes (σ(t)W_t)_{t∈[0,1]} and (W_{H(t)})_{t∈[0,1]}, H(t) := \int_0^t σ²(s)ds, as (inhomogeneously) scaled Brownian motions, where scaling takes place in space and in time, respectively. Hence we refer to (σ(t)W_t)_{t∈[0,1]} and (\int_0^t σ(s)dW_s)_{t∈[0,1]} in what follows as space-transformed (sBM) and time-transformed (tBM) Brownian motion.
Model (1.1): In the financial econometrics literature, variations of model (1.1) are often called high-frequency models, since (W_t)_{t∈[0,1]} is sampled at time points t = i/n, and nowadays there is a vast literature on volatility estimation in high-frequency models with an additional microstructure noise term (see [2], [15], [26] and [27]). These kinds of models have attracted a lot of attention recently, since the usual quadratic variation techniques for estimation of \int_0^1 σ²(x)dx lead to inconsistent estimators (cf. [26]). We are aware that, in contrast to our model, volatility is generally modelled not only as time-dependent but also as depending on the process itself, i.e. Y_{i,n} = X_{i/n} + τ(i/n)ε_{i,n}, i = 1, \ldots, n, dX_t = σ(t, X_t)dW_t. An overview of commonly used parametric forms of σ(t, X_t), and a nonparametric treatment in the absence of microstructure noise, can be found in [12]. It is known that the same rates as in the case of constant σ and τ hold true if we consider model (1.1) and estimate the so-called integrated volatility or realized volatility \int_0^s σ²(x)dx (s ∈ [0, 1]) and \int_0^s τ²(x)dx instead of σ² and τ², respectively (see [20] and [22] for a discussion of estimation of integrated volatility and related quantities). Recently, model (1.1) has been proven to be asymptotically equivalent to a Gaussian shift experiment (see [21]). As a function of time, σ² corresponds in model (1.1) to the instantaneous volatility or spot volatility.
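The inconsistency of quadratic variation under microstructure noise (cf. [26]) is easy to reproduce numerically. The sketch below is our own illustration (names and the Euler discretization are ours): the realized volatility Σ_i (Y_{i+1} − Y_i)² is close to \int_0^1 σ²(x)dx = 2 without noise, but is inflated by roughly 2nτ² once noise of size τ is added.

```python
import numpy as np

rng = np.random.default_rng(1)

def realized_vol(n, tau):
    """Quadratic variation sum of observations from model (1.1) with constant tau."""
    t = np.arange(1, n + 1) / n
    sigma = np.sqrt(2 + np.cos(2 * np.pi * t))            # int_0^1 sigma^2(x) dx = 2
    X = np.cumsum(sigma * rng.standard_normal(n) / np.sqrt(n))
    Y = X + tau * rng.standard_normal(n)
    return np.sum(np.diff(Y) ** 2)

rv_clean = realized_vol(10_000, tau=0.0)    # close to the integrated volatility 2
rv_noisy = realized_vol(10_000, tau=0.01)   # inflated by about 2 * n * tau^2 = 2
```

As n grows with τ fixed, the noise contribution 2nτ² dominates, which is why the naive estimator is inconsistent.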
Model (1.2): Model (1.2) can be regarded as a nonparametric extension of the model with constant σ, τ as discussed for variogram estimation by [24]. To motivate the usefulness of sBM we give the following lemma.
Lemma 1. (i) sBM, i.e. (σ(t)W_t)_{t∈[0,1]}, is the unique solution of the SDE dX_t = X_t d(log(σ(t))) + σ(t)dW_t, X_0 = 0, 0 ≤ t ≤ T.
(ii) The variogram of sBM is given by
Proof. (i) It is easy to check that sBM is indeed a solution. To establish uniqueness, we apply Theorem 9.1 in [23]. (ii) This follows by straightforward calculation.
Comparison of the models: We remark that tBM can be related to sBM by partial integration. To see the differences, we compare in Figure 1 sBM and tBM in two typical situations: the case where σ(t) = 0 for t > T, and the case where σ is discontinuous. If σ(t) = 0 for t > T, sBM tends to zero, whereas tBM tends to a constant, namely the random variable \int_0^T σ(s)dW_s. Furthermore, if σ is a jump function, sBM has a jump too, whereas tBM does not.
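The qualitative difference at a jump of σ is easy to see in simulation. The following sketch (our own illustration) builds both processes from the same Brownian path, for the jump function σ(t) = 1 + I_{(1/2,1]}(t) that is also used in Section 6.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 10_000
t = np.arange(1, n + 1) / n
sigma = 1.0 + (t > 0.5)                        # jump from 1 to 2 at t = 1/2
dW = rng.standard_normal(n) / np.sqrt(n)       # shared Brownian increments
W = np.cumsum(dW)
tbm = np.cumsum(sigma * dW)                    # tBM: int_0^t sigma(s) dW_s
sbm = sigma * W                                # sBM: sigma(t) W_t

# At the jump, sBM moves by about W_{1/2} (the factor sigma doubles there),
# while tBM only makes an ordinary increment of order n^{-1/2}.
i = np.searchsorted(t, 0.5, side="right")      # first index with t > 1/2
jump_sbm = sbm[i] - sbm[i - 1]                 # equals W_{t_{i-1}} + 2 dW_i
jump_tbm = tbm[i] - tbm[i - 1]                 # equals 2 dW_i, i.e. O(n^{-1/2})
```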
Unlike model (1.1), which can be viewed as a price process, model (1.2) has no direct application in financial mathematics. However, from the viewpoint of nonparametric statistics it is a natural extension of the situation where σ and τ are constant.

Introduction to Sobolev s-Ellipsoids and Technical Preliminaries
In this section we briefly introduce the setup needed to define the estimators. First we define suitable smoothness classes, which are different from, but related to, the well-known Sobolev ellipsoids (see Definition B.1).
Definition 1. For α > 0, C > 0, we call the function space
Θ_s(α, C) := { f ∈ L²[0, 1] : f = \sum_{k=0}^∞ θ_k ψ_k, \sum_{k=1}^∞ θ_k² k^{2α} ≤ C }
a Sobolev s-ellipsoid; if f ∈ Θ_s(α, C) we say f has smoothness α. For 0 < l < u < ∞, we further introduce the uniformly bounded Sobolev s-ellipsoid Θ_s^b(α, C, [l, u]) := { f ∈ Θ_s(α, C) : l ≤ f(t) ≤ u for all t ∈ [0, 1] }. Here the "s" refers to "symmetry", since the L²[0, 1] basis
{ψ_k, k = 0, 1, \ldots} := {1, \sqrt{2} cos(kπt), k = 1, 2, \ldots} (3.1)
can also be viewed as a basis of the symmetric functions in L²[−1, 1]. Usually, Sobolev ellipsoids are introduced with respect to the Fourier basis on L²[0, 1] (see Definition B.1). As will turn out later, Sobolev s-ellipsoids are more natural for our approach. If a function has a certain smoothness with respect to one basis, it might have a completely different smoothness with respect to the other. For instance, the function cos((2l + 1)πx), l ∈ N, has smoothness α for all α < ∞ with respect to basis (3.1) but, as can be seen by direct calculation, only smoothness α < 1/2 with respect to the Fourier basis. A more precise discussion can be found in Part B of the Appendix. Instead of (3.1) it is convenient to introduce the functions f_k, where, for k ≥ 1, f_k² can be expanded in basis (3.1) as f_k² = ψ_0 + 2^{−1/2}ψ_k. For any function g we introduce the forward difference operator ∆_i g := g((i + 1)/n) − g(i/n), and further the transformed variables ∆Y_{i,n}^{k,1} := (Y_{i+1,n} − Y_{i,n}) f_k(i/n) and ∆Y_{i,n}^{k,2} := (\tilde Y_{i+1,n} − \tilde Y_{i,n}) f_k(i/n), i = 1, \ldots, n − 1, for models (1.1) and (1.2), respectively. In order to treat the models simultaneously, we write ∆Y_{i,n}^k = ∆Y_{i,n}^{k,l}, l = 1, 2. Throughout the paper we abbreviate first-order differences of observations by ∆Y^k := (∆Y_{1,n}^k, \ldots, ∆Y_{n−1,n}^k)^t.
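The smoothness notion behind Definition 1 is driven by the decay of the coefficients in basis (3.1). The small numerical check below is our own illustration (the midpoint quadrature is an assumption of the sketch): it computes θ_k = \int_0^1 f(t)ψ_k(t)dt and confirms that cos(3πt) has a single nonzero coefficient in this basis, in line with the remark that cos((2l + 1)πx) has smoothness α for every α with respect to (3.1).

```python
import numpy as np

def cosine_coeffs(f_vals, K):
    """theta_k = int_0^1 f psi_k dt for the basis psi_0 = 1, psi_k = sqrt(2) cos(k pi t),
    approximated by a midpoint Riemann sum on the grid carried by f_vals."""
    n = len(f_vals)
    t = (np.arange(n) + 0.5) / n
    theta = [f_vals.mean()]                               # theta_0
    for k in range(1, K + 1):
        theta.append(np.mean(f_vals * np.sqrt(2) * np.cos(k * np.pi * t)))
    return np.array(theta)

t = (np.arange(4096) + 0.5) / 4096
theta = cosine_coeffs(np.cos(3 * np.pi * t), K=8)
# cos(3 pi t) = 2^{-1/2} psi_3, so only theta_3 = 2^{-1/2} survives.
```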
We will suppress the index n − 1 and write K, D, Λ, λ_i instead of K_{n−1}, D_{n−1}, Λ_{n−1} and λ_{i,n}, respectively. We write [x] := max_{z∈Z}{z ≤ x}, x ∈ R, for the integer part of x. Throughout, log() denotes the binary logarithm, and in order to define the estimators properly we additionally assume throughout the paper that n > 16.

Estimation of τ²
Before turning to the estimation of the volatility σ², we first discuss estimation of the noise variance τ². Let J_n^τ ∈ D_{n−1} be given by (4.1), where λ_i is defined by (3.2) and δ_{i,j} denotes the Kronecker delta. We consider models (1.1) and (1.2) simultaneously, and let t̂_{k,0} be defined by (4.1). In Lemma C.1 it will be shown that t̂_{k,0} is a √n-consistent estimator of t_{k,0} = \int_0^1 τ²(x)f_k²(x)dx. Hence t̂_{k,0} can also be seen as a spectral filter in the Fourier domain, where we cut off the first n/log n frequencies, cf. (4.2). Note that for i ≥ 1, \int_0^1 τ²(x)ψ_i(x)dx is the i-th series coefficient with respect to basis (3.1). This observation suggests constructing the cosine series estimator τ̂²_N in (4.3). The next result provides the rate of convergence of τ̂²_N uniformly within Sobolev s-ellipsoids. To this end a version of the continuous Sobolev embedding theorem is required for non-integer indices α, β (see Lemma D.8). A proof of the following theorem can be found in the supplementary material.
Theorem 1. Assume β > 1 and Q, Q̃ > 0. Further suppose that N = N_n = o(n^{1/2}/log n). Assume either model (1.1) and α > 1/2, or model (1.2) and α > 3/4. Then the stated MISE bound holds; minimizing the r.h.s. yields N* = O(n^{1/(2β+1)}) and consequently the rate n^{−β/(2β+1)}.
Remark 1. Note that for model (1.1), Theorem 1 holds whenever α > 1/2. Hence the Brownian motion part of the model can be viewed as a nuisance parameter, not affecting the rates for estimation of τ². However, for model (1.2), α > 3/4 is required here. This more restrictive assumption is essentially a consequence of the fact that the process σ(i/n)W_{i/n} is in general not a martingale.
Remark 2. The result of Theorem 1 can be extended to 1/2 < β ≤ 1 in model (1.1) and in model (1.2). Suppose further that N = O(n^{1/(2β+1)}). Then we obtain, by slight modifications of the proof of Theorem 1, for β > 1/2, α > 1/2 and Q, Q̃ > 0: (i) Assume model (1.1). Then it holds (ii) Assume model (1.2). Then we have the corresponding expansion and choice of N.
Remark 3. It is also possible, although more technical, to compute the asymptotic constant of the estimator τ̂²_{N*}. Suppose that the microstructure noise is Gaussian, and assume model (1.1) with β > 1, or model (1.2) with β > 1 and α > 3/4; then we have, more explicitly,

Remark 4.
There are, of course, simpler estimators for t_{k,0}. For instance, if we replace J_n^τ in (4.1) by (2n)^{−1}I_{n−1}, where I_{n−1} ∈ D_{n−1} denotes the identity matrix, we obtain the quadratic variation estimator of t_{k,0} (cf. [1]), and it is not difficult to show that this estimator attains the optimal rate of convergence. This approach could even be extended to a nonparametric estimator of the form (4.3). However, the single Fourier coefficients are not estimated efficiently: in the case of Gaussian microstructure noise the asymptotic constant is 3n^{−1}\int τ_k⁴(x)dx (a straightforward extension of Theorem A.1 in [27]), whereas for our estimator it is 2n^{−1}\int τ_k⁴(x)dx (see Lemma C.1). If τ is constant, it is easily seen that the estimators in (4.1) are efficient for k = 0, whereas quadratic variation is not.
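The quadratic variation estimator of this remark is simple enough to sketch in code. The sketch below is our own illustration: it takes t_{k,0} = \int_0^1 τ²(x)f_k²(x)dx as the target and assumes the concrete form f_0 = 1, f_k(t) = √2 cos(kπt/2), which is one choice consistent with f_k² = ψ_0 + 2^{−1/2}ψ_k (any such choice behaves identically here).

```python
import numpy as np

rng = np.random.default_rng(3)

def f_k(k, t):
    # assumed concrete choice satisfying f_k^2 = psi_0 + 2^{-1/2} psi_k
    return np.ones_like(t) if k == 0 else np.sqrt(2) * np.cos(k * np.pi * t / 2)

def t_hat_qv(Y, k):
    """(2n)^{-1} sum_i (Delta Y_i f_k(i/n))^2, the quadratic variation estimator
    of t_{k,0} = int_0^1 tau^2(x) f_k^2(x) dx discussed in Remark 4."""
    n = len(Y)
    fk = f_k(k, np.arange(1, n) / n)
    return np.sum((np.diff(Y) * fk) ** 2) / (2 * n)

# For constant tau, t_{0,0} = tau^2 and also t_{k,0} = tau^2 for k >= 1,
# since int_0^1 cos(k pi x) dx = 0.
n, tau = 100_000, 0.1
W = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)
Y = W + tau * rng.standard_normal(n)          # model (1.1) with sigma = 1
t00 = t_hat_qv(Y, 0)                          # approximately tau^2 = 0.01
t10 = t_hat_qv(Y, 1)                          # approximately tau^2 = 0.01
```

The Brownian part contributes only a bias of order n^{−1}, consistent with the noise term dominating the squared increments.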

Remark 5.
In practical applications it would be more natural to use, instead of n/log n in (4.2), other cut-off frequencies, e.g. n^γ/log n or qn, where 1/2 < γ ≤ 1, 0 < q < 1. A smaller γ decreases the variance but increases the bias of the estimator.

Estimation of σ²
Define J_n ∈ D_{n−1} by (4.4). Similarly to the estimation of τ², we first introduce an estimator ŝ_{k,0} of the appropriate Fourier coefficients. The second part, i.e. −7π²t̂_{k,0}/3, is a bias-correcting term, where the constant 7π²/3 is due to the choice of the cut-off points [n^{1/2}] + 1 and 2[n^{1/2}] in (4.4). As we will see, the estimator t̂_{k,0} has better convergence properties than the first term in ŝ_{k,0}, and hence does not affect the asymptotic variance. Similarly to (4.3), we put (4.6); minimizing the r.h.s. yields N* = O(n^{1/(4α+2)}). The proof of Theorem 2 is given in Section A.2.
Remark 6. It is also possible to extend this result to less smooth functions σ² and τ².
(i) Assume model (1.1) and α > 1/2, β > 1. Then it holds (ii) Assume model (1.2) and α > 3/2, β > 1. Then it holds
Remark 7. In analogy to (4.2), the estimator ŝ_{k,0} can also be viewed as a spectral filter in the Fourier domain, where essentially only the frequencies [n^{1/2}], \ldots, 2[n^{1/2}] play a role. For practical purposes one can generalize this to estimators where the frequencies k, \ldots, cn^{1/2}, c > 0, are used. If σ is assumed to be very smooth, one may even set k = 1. In this more general setting, the constant −7π²/3 in the definition of the estimator has to be replaced accordingly.
Remark 8. Since the matrix D in the definition of ŝ_{k,0} is a discrete sine transform (for a definition see [5]), the estimator σ̂²_N can be calculated explicitly in O(Nn log n) steps.
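The complexity claim of Remark 8 rests on computing a discrete sine transform via the FFT. A minimal sketch follows (our own illustration; the DST-I convention X_k = Σ_j x_j sin(πjk/(N+1)) is an assumption on our part, since the paper's exact definition of D is in [5]).

```python
import numpy as np

def dst1(x):
    """DST-I, X_k = sum_{j=1}^N x_j sin(pi j k / (N+1)), k = 1, ..., N,
    computed in O(N log N) via the FFT of an odd extension of length 2(N+1)."""
    N = len(x)
    ext = np.concatenate(([0.0], x, [0.0], -x[::-1]))   # odd extension
    # F_k = -2i X_k for the extended sequence, so take -imag/2:
    return -np.fft.fft(ext).imag[1:N + 1] / 2

# check against the O(N^2) definition on a small example
x = np.random.default_rng(4).standard_normal(256)
j = np.arange(1, 257)
direct = np.array([np.sum(x * np.sin(np.pi * j * k / 257.0)) for k in j])
```

Applying such a transform once per retained coefficient gives the O(Nn log n) total cost quoted in Remark 8.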

Minimax
In this section we discuss the optimality of the proposed estimators. To this end we establish lower bounds with respect to Sobolev s-ellipsoids. There exists a C > 0 (depending only on α, Q, l, u) such that the corresponding lower bound holds.
Proof. The proof relies on a multiple hypothesis testing argument and is close to the proof given in [17], Theorem 2.1. However, the lower bounds there are established with respect to the space C_b(α, L) of Hölder continuous functions of index α on the interval [0, 1], bounded below and above by l < u. Therefore, the statement above does not follow immediately from [17], Theorem 2.1, because C_b(α, L) ⊄ Θ_s^b(α, Q) due to boundary effects. Here we only point out the differences to the proof of [17], Theorem 2.1. We write σ_min, σ_max for the lower and upper bounds of σ², respectively, i.e. σ² ∈ Θ_s^b(α, Q, [σ_min, σ_max]). Without loss of generality we may assume that σ_min = 1. For the multiple hypothesis testing argument (cf. [25]) a specific choice of functions σ²_{i,n} is required; for a construction see [17], proof of Theorem 2.1.

Simulations
In this section we briefly illustrate the performance of our estimators. Our aim is not to give a comprehensive simulation study; rather, we would like to illustrate the behaviour of the estimators when the assumptions of Theorems 1 and 2 are violated. In the following we apply our estimators to simulated data, where we always set n = 25,000. From the point of view of financial statistics this is approximately the sample size obtained over a trading day (6.5 hours) if log-returns are sampled every second. For simplicity, we choose N in (4.3) and (4.6) as the minimizer of ‖τ̂² − τ²‖²_n and ‖σ̂² − σ²‖²_n, respectively, which is unknown in practice. Of course, proper selection of the threshold N* is of major importance for the performance of the estimator. To this end various methods are available; among others, cross-validation techniques, balancing principles, and variants thereof could be employed (see e.g. [9], [10], [18] and [19]). A thorough investigation is postponed to a separate paper. Throughout our simulations we set τ = 0.01 and concentrated mainly on estimation of σ², as it is the more challenging task.
In Figure 2 we display the estimator for σ(t) = (2 + cos(2πt))^{1/2}. Note that by Definition 1, σ² has "infinite" smoothness, i.e. for any α > 0 we can find a Q < ∞ such that σ² ∈ Θ_s(α, Q). The reconstruction shows that estimation of τ² is much easier than estimation of σ², although τ² is of smaller magnitude. In Figure 3 we are interested in the behavior of the estimators when heavy-tailed microstructure noise is present. This was simulated by generating ε_{i,n} from a normalized Student's t-distribution with 3 degrees of freedom. We can see from Plot 1 in Figure 3 that the resulting microstructure noise has some severe outliers, in accordance with the tail x^{−4} of the t(3) density. Nevertheless, estimation of τ² and σ² is not visibly affected by the distribution of the noise.
In the subsequent figures we illustrate the behaviour of the estimator when the required smoothness assumptions on σ² and τ² are violated. To this end, we investigate in Figure 4 the situation when σ is itself random, namely a realization of a Brownian motion, modelled as independent of the Brownian motion in (1.1) and of the microstructure noise process. It is of course not possible to reconstruct the complete path of σ², but as Figure 4 indicates, the estimator at least detects the smoothed shape of the path. Our estimator might thus already reveal parts of the volatility pattern also when σ is non-deterministic, which is certainly more realistic in most applications.
Finally, in Figure 5 we investigate the case of σ being a jump function. We put σ(t) = 1 + I_{(1/2,1]}(t), a function with a jump at t = 1/2. Fourier series typically exhibit the Gibbs phenomenon, i.e. an oscillating behavior at discontinuities. This behavior is also clearly visible in the graph of σ̂²_N. For reconstructing jumps in the volatility, other methods will certainly be more suitable; this is postponed to a separate paper.
Computational tasks: We implemented the estimators in Matlab, using the routine fft() for the discrete sine transform (see Remark 8). Calculation of the estimators for a sample size of n = 25,000 took around 2-3 seconds on an Intel Celeron 1.7 GHz processor. As mentioned in Remark 8, the estimator can be calculated in O(Nn log n) steps. If we choose N on the optimal scale, i.e. N ∼ n^{1/(4α+2)}, the complexity is O(Nn log n) = o(n^{5/4} log n) whenever α > 1/2.

Appendix A Convergence rate of σ̂²_N
In this section we give a proof of Theorem 2. To this end we first introduce some notation and then prove a lemma yielding uniform estimates of the bias and variance of the single estimators ŝ_{k,0}.

A.1 Preliminary Results and Notation
Proofs of the upper bounds are based on a decomposition of ∆Y^k. In this subsection we present some further notation. Let σ_k(t) := σ(t)f_k(t) and τ_k(t) := τ(t)f_k(t), t ∈ [0, 1]. Throughout the following, for the Sobolev s-ellipsoids of Definition 1, the constants are l = σ_min and u = σ_max for σ², and l = τ_min, u = τ_max for τ². In order to carry out the proofs for models (1.1) and (1.2) simultaneously, we first define the more general process V^{k,l} := X^{1,k} + X^{2,k} + Z^{1,k,l} + Z^{2,k}, l = 1, 2, where X^{1,k}, X^{2,k}, Z^{1,k,l} and Z^{2,k} are (n − 1)-dimensional random vectors. Obviously, ∆Y^k = V^{k,1} if model (1.1) holds and ∆Y^k = V^{k,2} if model (1.2) holds. Define the generalized estimators t̂_{k,0,l} := (V^{k,l})^t D J_n^τ D^t V^{k,l} and ŝ_{k,0,l} := (V^{k,l})^t D J_n D^t V^{k,l} − 7π² t̂_{k,0,l}/3. Further, there exists a decomposition with C_{1,k,l}, C_{2,k} ∈ M_{n−1,n} such that C_{1,k,l}ξ = X^{1,k} + Z^{1,k,l} and C_{2,k}ε = X^{2,k} + Z^{2,k}, where ε = (ε_{1,n}, \ldots, ε_{n,n})^t and ξ = ξ_n is standard n-variate normal, with ε and ξ independent. Now let s_{k,p} and t_{k,p} be the scaled p-th Fourier coefficients of the cosine series of σ_k² and τ_k², respectively. Define the sums A(σ_k², r) for r ≡ 0 mod 2n, by \sum_{m=0}^∞ s_{k,2nm+n} for r ≡ n mod 2n, and by \sum_{q ≡ ±r mod 2n, q ≥ 0} s_{k,q} for r ≢ 0 mod n, and analogously A(τ_k², r) with s_{k,p} replaced by t_{k,p}. Some properties of these quantities are given in Lemmas D.1 and D.2.
Further, we put Cum_4(ε) := Cum_4(ε_{1,n}) for the fourth cumulant of ε_{1,n}. If X, Y are independent random vectors, we write X ⊥ Y.
(i) Assume model (1.1), α > 1/2. Then it holds (ii) Assume model (1.2), α > 5/4. Then it holds
Proof. The proof mainly uses the generalized estimators introduced in Section A.1. It is clear that for two centered random vectors P and Q, ⟨P, Q⟩_σ := E(P^t D J_n D Q) defines a semi-inner product, and by Lemma D.5, P ⊥ Q implies ⟨P, Q⟩_σ = 0. Hence the mixed terms vanish. Clearly, with (iii) in Lemma D.1 and r_n := n^{−1/2}[n^{1/2}], and since r_n ≤ 1 and |r_n − 1| ≤ n^{−1/2}, the first term can be bounded. Next we bound ⟨X^{2,k}, X^{2,k}⟩_σ. To this end, let T_k ∈ D_{n−1} with entries (T_k)_{i,j} = τ_k(i/n)δ_{i,j}, and define T̃_k ∈ M_{n−1}. Using Lemma D.3 yields a bound on tr(ΛJ_nDT_k²D), and therefore (A.14) can be rewritten. Recall that tr(J_n) = O(n). We then bound the remaining terms of (A.10). In order to bound ⟨Z^{1,k,2}, Z^{1,k,2}⟩_σ, define L := (((i ∧ j) + 1)/n)_{i,j=1,\ldots,n−1} (A.16) and ∆Σ_k ∈ D_{n−1}. By Proposition D.1 we obtain the required estimate, and applying the Cauchy-Schwarz inequality together with Proposition C.1 yields (A.6) and (A.8). In order to give an upper bound for the variance of ŝ_{k,0,l}, note that Var(ŝ_{k,0,l}) ≤ 2 Var((V^{k,l})^t D J_n D V^{k,l}) + 2 Var(t̂_{k,0,l}).

Furthermore, using (A.3) and Lemma D.3 (vi), we have
(V^{k,l})^t D J_n D V^{k,l} = ξ^t C_{1,k,l}^t D J_n D C_{1,k,l} ξ + 2 ξ^t C_{1,k,l}^t D J_n D C_{2,k} ε + ε^t C_{2,k}^t D J_n D C_{2,k} ε ≤ 2 ξ^t C_{1,k,l}^t D J_n D C_{1,k,l} ξ + 2 ε^t C_{2,k}^t D J_n D C_{2,k} ε.

Hence
Var((V^{k,l})^t D J_n D V^{k,l}) ≤ 8 Var(ξ^t C_{1,k,l}^t D J_n D C_{1,k,l} ξ) + 8 Var(ε^t C_{2,k}^t D J_n D C_{2,k} ε).
Finally, we bound Var(ξ^t C_{1,k,l}^t D J_n D C_{1,k,l} ξ) and Var(ε^t C_{2,k}^t D J_n D C_{2,k} ε) in two steps, denoted by (a) and (b).
(b) Next, with the same arguments as in (A.20), we obtain
Using Lemma A.1 yields the result.

Appendix B Sobolev s-ellipsoids
In this appendix we briefly discuss the function space introduced in Section 3 and provide a theorem needed for the lower bound. First recall the classical definition of Sobolev ellipsoids (cf. Proposition 1.14 in [25]).
Interesting characterizations arise if we put Sobolev s-ellipsoids in relation to Sobolev ellipsoids.
Proof. First we show that if a function f ∈ W(α, C), then also f ∈ Θ_s(α, C). Let f̃ be defined on [−1, 1] as the symmetric extension of f. Note that f̃ is an even function. For j ≥ 1 one computes the Fourier coefficients of f̃, and hence we have a Parseval-type equality. Further, for k ≥ 1 and j even, the claim follows by partial integration, and similarly for k ≥ 1 and j odd; this proves the first part of the theorem. The other direction follows in a straightforward way by differentiation and is thus omitted.

Supplementary Material
Supplement: Proofs for the upper bound of τ̂²_N and further technicalities (http://www.stochastik.math.uni-goettingen.de/munk). In the supplementary material we provide a proof of Theorem 1 and summarize results from linear algebra and matrix theory needed for the proofs.

Figure 2: ε_{i,n} ~ N(0, 1), i.i.d., τ = 0.1, σ(t) = (2 + cos(2πt))^{1/2}. Plot 1 shows the data. In addition to the data, we plot the path of the tBM in Plot 2. The reconstructions of τ² and σ² (dashed lines) as well as the true functions (solid lines) are given in Plots 3 and 4, respectively. The threshold parameters were selected as N* = 1 for estimation of τ² and N* = 3 for estimation of σ².

Figure 3: (Heavy-tailed microstructure noise) As Figure 2, but instead of Gaussian errors we assume that the noise follows a normalized Student's t-distribution with 3 degrees of freedom. We observe that the performance of τ̂² and σ̂² is quite robust to heavy-tailed noise. The threshold parameters N* were selected as 1 and 3 for estimation of τ² and σ², respectively.

Proof. We only prove the third equality; the other two can be deduced similarly. Note that for τ² ∈ Θ^b(β, Q), taking the supremum and applying Lemma D.8 gives the result.
Proof of Theorem 1. The proof is close to that of Theorem 2. We obtain the corresponding bound on Var(t̂_{k,0,l}).

Appendix D Technical Results
Proposition D.1. Let A ∈ M_{n−1}. Then tr(J_n D A D) ≤ (n + 5n^{3/2} + 8n^{3/2}(1 + log n)) max
Proof. We obtain with (A.12) Cov(X^{2,k} + Z^{2,k}) = (1/2)T_k²K + (1/2)KT_k² + S_k, where S_k := (1/2)T_k + Cov(X^{2,k}, Z^{2,k}) + Cov(Z^{2,k}, X^{2,k}) + Cov(Z^{2,k}). Application of the triangle inequality gives the first bound. Note that because of Lemma D.4 (iii) it holds that the corresponding term is controlled. Now we bound from below; we obtain this with Lemma D.3. Denote by τ_{k,(i)} the i-th largest component of the corresponding vector. Next we derive an upper bound for the r.h.s. of (D.3). Let, analogously to definition (A.11), T̃_k be a tridiagonal matrix with the corresponding entries. Note that max_i ∆_iτ_k² ≤ 2τ_max^{1/2}φ_{n,1/2}. It is easy to check that T_k²KT_k² = (1/2)T_k⁴K + (1/2)KT_k⁴ + (1/2)T_k holds. Clearly, J_n^τ ≤ (n − n/log n)^{−1}Λ^{−1}, and therefore we obtain the upper bound in (D.3), where we used in the last inequality an argument as for (A.15). Combining this with (D.4) and Proposition C.1 yields the claim. Now we bound the remainder term in (D.2). Using Lemma D.6 gives a bound on Cov(X^{2,k}, Z^{2,k})_{i,j}. (iii) Let Σ_k be as defined in (A.5). Then
Remark D.1. In (iii), for |i − j| ≪ i + j, the r.h.s. behaves like s_{k,i−j}. In the same way we obtain the equivalent result if we replace σ² by τ².
Proof. (ii) Note that we can write the quantity in terms of the coefficients of the Sobolev s-ellipsoid. In particular, the result shows that the Fourier series is absolutely summable.
Proof. Consider the case γ > 0, α > 1/2. Using Lemma D.1 (i), we see that the bound holds for n large enough, where we used the definition of a Sobolev s-ellipsoid in the last step. If k = 0, γ = 0 and α > 1/2, we can argue similarly.
In the next lemma we collect some important facts about positive semidefinite matrices and trace calculations. (v) (Cauchy-Schwarz inequality for the trace operator) Let A and B be matrices of the same size. Then tr(AB^t) ≤ tr^{1/2}(AA^t) tr^{1/2}(BB^t).
(vi) Let A, B be matrices of the same size. Then
Corollary D.1. Let A and B be matrices of the same size. Then
Proof. By Lemma D.3 (vi), A^tB + AB^t ≤ A^tA + B^tB. Applying Lemma D.3 (iv) with r = s = 0 yields the result.
In the following lemma we summarize some facts on Frobenius norms. (ii) It holds (iii) Let A, B be positive semidefinite matrices of the same size with 0 ≤ A ≤ B, and let X be another matrix of the same size. Then
Proof. (i) and (ii) are well known and thus omitted. (iii) By assumption, 0 ≤ X^tAX ≤ X^tBX. Hence λ_i²(X^tAX) ≤ λ_i²(X^tBX), and the result follows.
We set TrSq(A) := \sum_{i=1}^n a_{i,i}². Then
Proof. We only prove the first and the last statement in (ii). Note that the variance expands as \sum a_{ij}a_{kl} Cov(V_iV_j, V_kV_l).
If i = j = k = l, then Cov(V_iV_j, V_kV_l) = 2 + Cum_4(V); if i = k, j = l, i ≠ j, or i = l, j = k, i ≠ j, then Cov(V_iV_j, V_kV_l) = 1. Otherwise Cov(V_iV_j, V_kV_l) = 0, and this gives (D.6).
In order to see (D.7), note that by Lemma D.3 (v), Var(V^tABW) = ‖B^tA‖²_F = tr(BB^tAA^t) ≤ tr^{1/2}((BB^t)²) tr^{1/2}((AA^t)²) = ‖BB^t‖_F ‖AA^t‖_F.