Common price and volatility jumps in noisy high-frequency data

We introduce a statistical test for simultaneous jumps in the price of a financial asset and its volatility process. The proposed test is based on high-frequency data and is robust to market microstructure frictions. For the test, local estimators of volatility jumps at price jump arrival times are designed using a nonparametric spectral estimator of the spot volatility process. A simulation study and an empirical example with NASDAQ order book data demonstrate the practicability of the proposed methods and highlight the important role played by price volatility co-jumps.


Introduction
In recent years, the broad availability of high-frequency intra-day financial data has spurred a considerable body of work dedicated to statistical modeling and inference for such data. Semimartingales are a general class of time-continuous stochastic processes for modeling the dynamics of intra-day log-prices in accordance with standard no-arbitrage conditions. We consider a general Itô semimartingale log-price model allowing for stochastic volatility, price and volatility jumps as well as leverage. Uncertainty and risk in these models are usually ascribed to two distinct sources: first, the volatility process of the continuous semimartingale part that permanently influences observed returns and, second, occasional jumps in prices. Modeling and inference on the two components constitute a core research topic in statistics, finance and econometrics, bringing forth the seminal contributions by [6], [7], [10], [4] and a large literature devoted to this aspect. For asset pricing ([20], [42]), macro and monetary economics ([43]) and risk management ([34]), information about jumps is of key importance. While the literature on price jumps is well developed from both a statistical and an empirical point of view, methods and evidence about volatility jumps are lagging behind. Empirical evidence about volatility jumps is usually based on methods for price jumps applied to an observable volatility measure like the index of implied volatility of S&P 500 index options (VIX), see [17] and [41]. Such modeling strategies inevitably restrict the number of target variables and the overall scope of empirical insights. Since price jumps have often been associated with macro announcements or firm-specific news, a natural empirical question is whether prices and their volatilities jump at common times, triggered by the same events. Such common jumps of price and volatility are often excluded in the statistics literature to avoid technical difficulties.
Beyond the question of whether one should include simultaneous jump times in price and volatility in a model, testing locally for volatility jumps opens up new ways to study effects of information processing and volatility persistence. This is also reflected in an increasing interest in the current literature in separating the leverage effect into a continuous and a jump part, see [1] and [30]. The asset pricing model of [39] illustrates economic forces behind contemporaneous price and volatility jumps. In their model, agents learn about the profitability of a firm in a changing political environment. A change in government policy not only affects the expected profitability of a firm (price jump) but also triggers a simultaneous volatility jump induced by the uncertainty about the impact of the new policy.
This article presents a statistical test to decide whether intra-day log-prices exhibit common price and volatility jumps. Our main contribution is to extend the pioneering works by Jacod and Todorov [29] and Bandi and Renò [8] and to provide an approach for an observation model that accounts for market microstructure. It is widely acknowledged that, owing to market microstructure effects such as transaction costs and the bid-ask bounce, log-prices recorded at high frequencies are not adequately modeled by semimartingales directly. Taking microstructure frictions into account substantially changes the statistical properties of estimators and the mathematical concepts involved. We introduce a spectral spot volatility estimator for noisy observations. The test generalizes the theory by [29] for non-noisy observations. We obtain a statistical test by combining a stable central limit theorem at (almost) optimal rate for the spectral spot volatility estimator with a suitable test function. In analogy to [29], the new test is self-scaling in the volatility and rate-optimal. Those two properties are crucial to obtain an efficient method. The development of a test that can cope with noise is highly relevant, as Jacod and Todorov [29] already remark in their empirical application: "presence of microstructure noise in the prices is nonnegligible". We show in simulations that, compared to an application of the method by [29] based on skip-sampled returns, we can significantly improve the power of the test.
Jumps in prices and in the volatility are of a very different nature. Large price jumps become visible through large returns. More precisely, in a high-frequency context, truncation techniques as suggested by [35], [32] and [24] can be used to identify returns that involve jumps. Up to some subtle changes due to dilution by microstructure, this remains valid also in the noisy setup. However, the localization of jump times becomes less precise and more difficult under noise. A first localization method for price jumps in the noisy semimartingale model has been introduced by [21] using wavelets. Other localization approaches are included in [33] and in [16]. We adopt the methods from [16] to estimate the spot volatility in the presence of price jumps and also to locate price-jump times by thresholding. In contrast to price jumps, volatility jumps are latent and less obvious, because we cannot observe the volatility path. The key element to determine volatility jumps will be efficient estimates of the instantaneous volatility from observed prices.
Our spectral spot volatility estimator relies on the Fourier method promoted by [40] and [12] for estimating quadratic (co-)variation, combined with truncation techniques of [16] to deal with price jumps. These methods attain lower variance bounds for integrated volatility estimation from noisy observations and are more efficient than simple smoothing methods and, especially, skip-sampling to lower observation frequencies. While we are the first to address the testing problem under noise, consistent spot volatility estimators under noise are available. [45] and [36] present local two-scales estimators and prove stable central limit theorems. The construction of a rate-optimal pre-average estimator is sketched in Section 8.7 of [3]. An alternative approach considering deterministic volatility is presented in [38]. For our estimator, we establish rate-optimality and a stable central limit theorem with smaller asymptotic variance compared to the pre-average approach. The asymptotic theory allows for general heteroscedastic, serially correlated and endogenous noise. With this estimation approach at hand, we design a test comparing estimated local volatilities and their left limits at the estimated price-jump times. As a special case, this includes a local test for volatility jumps at some fixed time or stopping time. A test with fast convergence rate based on second order asymptotics of the estimator is suggested. While the overarching strategy follows [29], the specific test function and construction in the noisy observation case are different and profit from the spectral estimation methodology. Compared to previous estimation techniques to smooth noise, the asymptotic variance structure of the spectral volatility estimates in Theorem 1 admits a simpler form. This facilitates a test statistic which is self-scaling in the local volatility and thus furnishes an asymptotically distribution-free test with the best possible rate.
The Monte Carlo study corroborates the high precision of the methods in finite samples. Our data study shows that price volatility co-jumps occur and are practically relevant.
The paper is organized as follows. Section 2 introduces the model and the statistical problem. We discuss the main ideas for the construction of the test including a short review of the approach for non-noisy data. Section 2.2 describes the spectral spot volatility estimation. We state and discuss the assumptions imposed on the model for the asymptotic theory in Section 3.1 before presenting the main results in Section 3.2. Practical guidance for the implementation and a Monte Carlo study are given in Section 4. In Section 5 the methods are used to analyze price and volatility jumps in NASDAQ high-frequency intra-day trading data, reconstructed from the order book. Section 6 concludes. All proofs are gathered in Section 7.

Model, testing problem and statistical approach
Let $(\Omega^X, \mathcal{F}^X, (\mathcal{F}^X_t), \mathbb{P}^X)$ be a filtered probability space satisfying the usual conditions. The latent log-price process $X$ follows an Itô semimartingale with $W$ an $(\mathcal{F}^X_t)$-adapted standard Brownian motion, $\mu$ a Poisson random measure on $\mathbb{R}_+ \times \mathbb{R}$ with $\mathbb{R}_+ = [0, \infty)$ and an intensity measure (predictable compensator of $\mu$) $\nu(ds, dx) = \lambda(dx) \otimes ds$ for a given $\sigma$-finite measure $\lambda$. We consider discrete observation times $i/n$, $i = 0, \dots, n$, on the time span $[0, 1]$. The prevalent model, capturing market microstructure effects which interfere with the evolution of an underlying semimartingale log-price process at high frequencies, is an indirect observation model with noise:
$$Y_{i/n} = X_{i/n} + \epsilon_i, \quad i = 0, \dots, n, \qquad (2)$$
where $(\epsilon_i)_{0 \le i \le n}$ is a discretization of the continuous-time noise process $(U_t)_{t\in[0,1]}$.
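To make the observation scheme concrete, the following Python sketch simulates a latent log-price with one simultaneous price and volatility jump and contaminates it with noise. Everything specific in it is an illustrative assumption and not taken from the paper: the function name `simulate_noisy_log_prices`, the zero drift, the piecewise constant volatility, the single jump at a fixed time, and the i.i.d. Gaussian noise.

```python
import math
import random


def simulate_noisy_log_prices(n, sigma_before, sigma_after, jump_time,
                              price_jump, noise_sd, seed=0):
    """Simulate X on the grid i/n and noisy observations Y = X + eps.

    Hypothetical toy model: zero drift, volatility sigma_before up to
    jump_time and sigma_after afterwards, one price jump of size
    price_jump at jump_time, i.i.d. N(0, noise_sd^2) noise.
    """
    rng = random.Random(seed)
    jump_index = int(math.ceil(jump_time * n))  # first grid point at or after the jump
    x = [0.0]
    for i in range(1, n + 1):
        # the return ending at jump_index still uses the pre-jump volatility
        sigma = sigma_before if i <= jump_index else sigma_after
        increment = sigma * rng.gauss(0.0, math.sqrt(1.0 / n))
        if i == jump_index:
            increment += price_jump  # the price jump enters this return
        x.append(x[-1] + increment)
    y = [xi + rng.gauss(0.0, noise_sd) for xi in x]
    return x, y


x, y = simulate_noisy_log_prices(n=1000, sigma_before=1.0, sigma_after=2.0,
                                 jump_time=0.5, price_jump=0.5, noise_sd=0.01)
```

In such a toy path the price jump is visible as one large return of the latent price, whereas the doubling of the volatility can only be inferred from the size of the returns before and after the jump, which is the situation the test is designed for.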
We consider $X$ and $U$ on a common probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$. Here, for two $\sigma$-algebras $\mathcal{F}$ and $\mathcal{H}$, we denote by $\mathcal{F} \vee \mathcal{H}$ the smallest $\sigma$-algebra which contains $\mathcal{F} \cup \mathcal{H}$. $X$ has the same form (1) on this space, see Section 16.1 of [28] for a formal construction of embedding $X$ and $U$ in a joint probability space. Regularity conditions on the characteristics of the efficient price $X$ and the noise, under which we establish asymptotic results, are given in Section 3.1. In particular, we work with a general smoothness assumption on the volatility $(\sigma_t)_{t\in[0,1]}$. Similar to [29], the resulting convergence rates of the spot volatility estimator and the asymptotic test hinge on this smoothness. As a first orientation, readers may think of the typical case that $(\sigma_t)_{t\in[0,1]}$ is an Itô semimartingale with a representation as $X$ in (1) and with locally bounded characteristics.

Test for common price and volatility jumps
In the presence of price jumps, we design a statistical test to decide whether contemporaneous price and volatility jumps occur on the considered time interval $[0, 1]$. Let $(S_p)_{p\ge 1}$ be a sequence of stopping times exhausting the jumps of $X$. We denote the process of left limits of the volatility by $\sigma_{t-} = \lim_{u\to t,\, u<t} \sigma_u$. We address the null hypothesis of no common jump of volatility and price on $[0, 1]$: against the alternative hypothesis that there is at least one jump in the volatility at a jump time of $X$.
Our test for (3) relies on two main ingredients: first, localization of price jumps using thresholding and, second, a local test for volatility jumps. Suppose we want to test $H_0^*: |\sigma_s^2 - \sigma_{s-}^2| = 0$ at a specific time $s \in (0, 1)$, against the alternative hypothesis that the volatility exhibits a jump $|\sigma_s^2 - \sigma_{s-}^2| > 0$. For such a test we require estimates of the squared volatility at time $s$, $\hat\sigma_s^2$, and before time $s$, $\hat\sigma_{s-}^2$. An intuitive test statistic is the difference $\hat\sigma_s^2 - \hat\sigma_{s-}^2$. It turns out that a more general class of statistics $T^*(s) = g(\hat\sigma_s^2, \hat\sigma_{s-}^2)$ with a test function $g$ facilitates improved asymptotic properties.
If discrete observations of the efficient log-price $X_{i/n}$, $i = 0, \dots, n$, were directly available, and if we assume for this motivation that there are no jumps in $X$, $\sigma_s^2$ and $\sigma_{s-}^2$ could be estimated by local versions of realized volatility: For an Itô semimartingale $(\sigma_t)_{t\in[0,1]}$ and $k_n = c\sqrt{n}$ with some constant $c$, $\hat\sigma_s^2$ yields a rate-optimal spot volatility estimator, that is, $(\hat\sigma_s^2 - \sigma_s^2) = O_P(n^{-1/4})$. Further, on the null hypothesis that $\sigma_{s-} = \sigma_s$, for $k_n = c\,n^b$ with $b = 1/2 - \delta$ and $\delta > 0$ arbitrarily small, a stable central limit theorem can be proved: For stochastic volatility the limit is mixed normal and it is important that the convergence holds stably in law to allow for confidence intervals. This is a stronger mode of weak convergence which is equivalent to joint weak convergence with every $\mathcal{F}^X$-measurable bounded random variable, see [28] for an overview on stable limit theorems. This limit theorem readily supplies an asymptotic test for a volatility jump at time $s$ with a rate of convergence $n^{b/2}$. However, the convergence rate is rather slow and not optimal for this testing problem. For the test statistic $\longrightarrow \chi_1^2$ with a $\chi_1^2$ limit distribution and a much faster rate. This improves the (asymptotic) power significantly. A key property is that the test is pivotal, since $T(s)$ is self-scaling in the volatility. This means that it does not require an estimated asymptotic variance, since the limit does not depend on any unknown parameter. Such a local test is not separately highlighted in [29], but is contained as one ingredient of their general method. The final test statistic of [29] is a sum of these local test statistics over all estimated jump times.
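The local test for directly observed prices can be sketched in a few lines of Python. The sketch assumes that the self-scaling statistic has the form of the two-sample Gaussian likelihood ratio for equal variances described in Remark 3, that is, $k_n\,[2\log((x+y)/2) - \log x - \log y]$ for the two local realized variances $x$ and $y$; the display (5) itself is not reproduced here, so this form, the function names and the exact window indexing (the return at the inspected index is left out of both windows) are assumptions of the sketch.

```python
import math


def local_realized_variances(returns, n, index, k_n):
    """Local realized variances after and before a given return index.

    returns[i] is the i-th return (0-based) and n the number of returns
    on [0, 1]. The return at `index` itself is left out of both windows
    (an assumption of this sketch).
    """
    if index - k_n < 0 or index + 1 + k_n > len(returns):
        raise ValueError("windows of k_n returns do not fit around index")
    after = returns[index + 1: index + 1 + k_n]
    before = returns[index - k_n: index]
    scale = n / k_n
    return scale * sum(r * r for r in after), scale * sum(r * r for r in before)


def lr_volatility_jump_statistic(var_after, var_before, k_n):
    """-2 log likelihood ratio for equal variances of two Gaussian samples
    of size k_n each; nonnegative and zero if both estimates coincide."""
    pooled = 0.5 * (var_after + var_before)
    return k_n * (2.0 * math.log(pooled)
                  - math.log(var_after) - math.log(var_before))
```

A second-order Taylor expansion shows why no variance estimate is needed: writing the two estimates as $\sigma^2(1+u)$ and $\sigma^2(1+v)$, the statistic is approximately $k_n (u-v)^2/4$, in which the unknown $\sigma^2$ has cancelled. In a usage example one would compare the value with a $\chi^2_1$ quantile, for instance 3.84 at the 5% level.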
It is not obvious how to construct a generalization of the local test for a volatility jump to the noisy observations setup (2). Spot volatility estimators, which are local versions of integrated volatility estimators under noise, are available, see for instance [45] and [36]. For an Itô semimartingale $(\sigma_t)_{t\in[0,1]}$ and i.i.d. noise with some moment assumption, stable central limit theorems with optimal $\beta = 1/4 - \delta$, $\delta > 0$, can be proved. Based on $\hat\sigma_s^2 - \hat\sigma_{s-}^2$, a test with rate $n^{\beta/2}$ could be constructed. Asymptotic variances $\mathrm{AVAR}_s$ of such estimators are usually sums of at least three addends: one depending on the noise variance, one including the quarticity $\sigma_s^4$ and a cross term depending on both. This applies to the asymptotic variances of the spot volatility estimators in [45] and [36], which, however, have slower, sub-optimal convergence rates because they localize a sub-optimal two-scales integrated volatility estimator. The construction of a rate-optimal pre-average spot volatility estimator with an asymptotic variance of the type above is sketched in Section 8.7 of [3]. Due to this structure of the asymptotic variance, it appears difficult to find a suitable test function that facilitates an asymptotically distribution-free test with improved convergence rate.
Apart from attaining asymptotic efficiency, our main motivation to construct a method based on spectral spot volatility estimation is that we will be able to prove a stable central limit theorem under mild assumptions for semimartingale volatility. Here, $\eta = \mathbb{E}[\epsilon_i^2]$ is the variance of i.i.d. noise, while we consider more general heteroscedastic and serially correlated noise in Section 3. This enables us to find a suitable test function $g(\hat\sigma_s^2, \hat\sigma_{s-}^2)$, such that for a test statistic $T^0(s)$ which is self-scaling in the volatility. The self-scaling property and the much faster convergence rate are key features to derive a reliable testing procedure.
To test the null hypothesis (3), local tests are performed at the estimated price-jump times which can be detected by truncation methods. Our asymptotic analysis provides results for the local test at some time s as a special case.
The tests for common price and volatility jumps of [29] for direct observations and our generalization for noisy observations both restrict to finitely many large price adjustments at whose arrival times local tests are performed. Testing for volatility jumps over an interval instead would require a sequence of tests for volatility jumps at infinitely many points and is rather connected to a high-dimensional testing problem. A theory without noise has recently been presented in [14], and a generalization of the techniques, which are quite different from [29], to the model with noise is a challenging topic for future research. It is clear that detecting volatility jumps from noisy observations of the price is especially difficult if we do not specify where to look for potential volatility jumps, and the finite-sample performance of a global test is limited, see Section 6 of [14]. Restricting to local tests for volatility jumps as in this work facilitates a larger power in finite-sample applications.

Spectral spot volatility estimators
Consider a sequence of equispaced partitions of the considered time span $[0, 1]$ into bins $[kh_n, (k+1)h_n)$, $k = 0, \dots, h_n^{-1} - 1$. For simple notation, suppose $nh_n \in \mathbb{N}$, such that each bin contains $nh_n$ noisy observations. A main idea of spectral volatility estimation, constructed in [15], is to perform optimal parametric estimation procedures localized on the bins. Based on these local estimates, one can build estimators for the spot and the integrated squared volatility. We utilize $L^2$-orthogonal functions $(\Phi_{jk})_{1 \le j \le J_n}$ for spectral frequencies $1 \le j \le J_n$ in the Fourier domain up to a spectral cut-off $J_n \le nh_n$. The indicator functions localize the sine functions to the bins. For the spectral volatility estimation, local linear combinations of the noisy data are used with local weights obtained by evaluating the functions (7) on the discrete grid of observation times $i/n$, $i = 0, \dots, n$. We use the notion of empirical scalar products and norms for functions $f, g$ as follows: $\langle f, g \rangle_n := \frac{1}{n} \sum_{l=1}^{n} f\big(\tfrac{l}{n}\big)\, g\big(\tfrac{l}{n}\big)$ and $\|f\|_n^2 := \langle f, f \rangle_n$. The empirical norms of the sine functions above give for all bins $k = 0, \dots, h_n^{-1} - 1$: and we have the discrete orthogonality relations where $\delta_{jr} = \mathbb{1}_{\{j = r\}}$ is Kronecker's delta. The latter rely on basic discrete Fourier analysis; a detailed proof is given in [15]. The central building blocks of spectral volatility estimation are the spectral statistics in which observed returns $\Delta_i^n Y = Y_{i/n} - Y_{(i-1)/n}$, $i = 1, \dots, n$, are smoothed by bin-wise linear combinations. Since the weight functions $\Phi_{jk}(t)$ are non-zero only on the $k$th bin, the spectral statistics $(S_{jk})$ include returns $(\Delta_i^n Y)$, $i = knh_n + 1, \dots, (k+1)nh_n$, only over the bin under consideration. In the absence of price jumps, bin-wise estimates for the squared volatility $\sigma_{kh_n}^2$, $k = 0, \dots, h_n^{-1} - 1$, are provided by weighted sums of bias-corrected squared spectral statistics: For the moment, readers can interpret $(\eta_t)_{t\in[0,1]}$ as the time-varying variance function of the observation errors in (2) and $\hat\eta_{kh_n}$ as some consistent estimator. In Section 3.1, this is further generalized. The oracle optimal weights with n η khn /n) −2 , follow from minimization of the variance under the constraint of unbiasedness. For a fully adaptive approach we apply a two-stage method and obtain adaptive local estimates $\zeta_k^{ad}(Y)$ by plugging in estimated optimal weights $\hat w_{jk}$ in (12).
Remark 1. Spectral statistics are related to pre-averages used by [26], but the two estimators cannot be transformed into one another, see Remark 5.2 in [27] for a discussion of their connection. One difference is that for the spectral method we start with a histogram structure and not a rolling kernel and then smooth bin-wise noisy observations in the Fourier domain. The statistics (11) de-correlate the data for different frequencies and form their local principal components. This is key to the asymptotic efficiency attained by the spectral estimators as shown in [40] and [12]. The latter shows that the estimator's asymptotic variance coincides with the minimum asymptotic variance among all asymptotically unbiased estimators. We refer to Remark 3.1 of [27] for a recent discussion about efficient volatility estimation under noise.
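The de-correlation idea behind the spectral statistics can be illustrated with a minimal Python sketch for one bin. It deliberately uses its own normalization, which is an assumption and differs from the paper's displays (7)-(13): orthonormal discrete sine vectors $v_j(i) = \sqrt{2/(m+1)}\,\sin(j\pi i/(m+1))$, $i = 1, \dots, m$, which are eigenvectors of the covariance matrix $\eta \cdot \mathrm{tridiag}(-1, 2, -1)$ of i.i.d. noise returns with eigenvalues $4\eta \sin^2(j\pi/(2(m+1)))$. With constant volatility on the bin, $\mathbb{E}[S_j^2] = \sigma^2/n + \eta\lambda_j$, so $n(S_j^2 - \eta\lambda_j)$ is unbiased for $\sigma^2$. The function names, the known noise variance `eta` and the pilot value `sigma2_pilot` are hypothetical.

```python
import math


def sine_basis(m, j):
    """Orthonormal discrete sine vector of frequency j on a bin of m returns."""
    c = math.sqrt(2.0 / (m + 1))
    return [c * math.sin(j * math.pi * i / (m + 1)) for i in range(1, m + 1)]


def noise_eigenvalue(m, j):
    """Eigenvalue of tridiag(-1, 2, -1) belonging to sine_basis(m, j)."""
    return 4.0 * math.sin(j * math.pi / (2 * (m + 1))) ** 2


def bin_volatility_estimate(bin_returns, n, eta, sigma2_pilot, n_freq):
    """Weighted sum of bias-corrected squared spectral statistics on one bin.

    Weights are proportional to the inverse (Gaussian) variance of each
    bias-corrected statistic, evaluated at the pilot value sigma2_pilot.
    """
    m = len(bin_returns)
    stats, weights = [], []
    for j in range(1, n_freq + 1):
        v = sine_basis(m, j)
        s_j = sum(vi * r for vi, r in zip(v, bin_returns))   # spectral statistic
        lam = noise_eigenvalue(m, j)
        stats.append(n * (s_j * s_j - eta * lam))             # bias correction
        weights.append((sigma2_pilot + n * eta * lam) ** -2)  # inverse variance
    total = sum(weights)
    return sum(w * z for w, z in zip(weights, stats)) / total
```

Without noise (`eta = 0`) and with all `m` frequencies, the weights are equal and Parseval's identity reduces the estimate to the realized variance on the bin, which gives a simple sanity check. With noise, high frequencies carry large noise eigenvalues and are automatically down-weighted.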
The spectral volatility estimation provides local estimates (12) for the squared volatility $\sigma_{kh_n}^2$, $k = 0, \dots, h_n^{-1} - 1$. In order to derive an estimate $\hat\sigma_s^2$ at some time $s$, we average the statistics $\zeta_k(Y)$ over a local window around $s$ of length In the presence of jumps in (1), truncation disentangles bin-wise statistics (12) which include jumps from all others. We use the methods from [16] to cope with price jumps for volatility estimation. If $h_n |\zeta_k(Y)| > u_n$ for a threshold sequence $u_n = c\,h_n^\tau$, $\tau \in (0, 1)$, with some constant $c$, the statistic is too large to be driven by the continuous part and is attributed to a jump of $X$. In order to estimate the volatility, we thus truncate $\zeta_k(Y)$ for these $k$. For estimating the squared volatility and its left limit at a certain time $s$, we use two disjoint windows after and before $s$, respectively.
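The truncation rule and the two disjoint windows can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the estimators (17a) and (17b) themselves: `zeta` is a hypothetical list of bin-wise estimates, the bin containing $s$ is excluded from both windows, and truncated statistics are set to zero while the average still divides by the number of bins in the window.

```python
def truncated_window_estimates(zeta, h_n, s, n_window_bins, c, tau):
    """Average bin-wise estimates over windows after and before time s,
    discarding bins with h_n * |zeta_k| above the threshold u_n = c * h_n**tau."""
    u_n = c * h_n ** tau
    k_s = int(s / h_n)  # index of the bin containing s

    after = range(k_s + 1, k_s + 1 + n_window_bins)
    before = range(k_s - n_window_bins, k_s)
    if before.start < 0 or after.stop > len(zeta):
        raise ValueError("windows do not fit into [0, 1]")

    def truncated_mean(indices):
        # statistics above the threshold are attributed to price jumps and dropped
        kept = [zeta[k] for k in indices if h_n * abs(zeta[k]) <= u_n]
        return sum(kept) / n_window_bins

    return truncated_mean(after), truncated_mean(before)


# Eight bins of length 1/8; bin 3 is inflated by a (hypothetical) price jump.
zeta = [1.0, 1.0, 1.0, 100.0, 50.0, 4.0, 4.0, 4.0]
sigma2_after, sigma2_before = truncated_window_estimates(
    zeta, h_n=0.125, s=0.5, n_window_bins=2, c=1.0, tau=0.25)
```

Setting truncated bins to zero rather than renormalizing is harmless asymptotically when only finitely many bins are affected, but in small windows it biases the estimate downwards, as the usage example shows for the window before $s$.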
When the optimal weights (13) are known, an oracle spot volatility estimator and the estimator for $\hat\sigma_{s-,or}^2$: , we shrink one window length accordingly. Since the optimal weights (13) hinge on the unknown squared volatility and the noise level $(\eta_t)_{t\in[0,1]}$, we proceed with a two-step estimation approach. First, select a pilot spectral cut-off J pi n nh n , and build pilot estimators for the squared volatility , and $\hat\sigma_{s-,pil}^2$ analogously. The pilot estimators are hence averages of squared, bias-corrected spectral statistics over $r_n^{-1}$ bins and $J_n^{pi}$ spectral frequencies. In the second step, these pilot estimators are plugged into (13) to determine adaptive weights $\hat w_{jk}$ for the final estimators. We write The spectral estimators of the squared spot volatility at time $s$ and its left limit are: Estimates (17a) and (17b) are truncated local averages of the statistics (16). Our approach entails several tuning parameters whose practical choice is discussed in Section 4.2.

Assumptions with discussion
We start with the assumptions on the characteristics of X in (1) which are similar to the ones in [29].

Assumption 1.
For the adapted and locally bounded drift process $(b_s)_{s\ge 0}$, we require a minimal smoothness condition that for $0 \le t < s \le 1$, some constant $C$ and some $\iota > 0$: The volatility process $\sigma_t$ is càdlàg and neither $\sigma_t$ nor $\sigma_{t-} = \lim_{u\to t,\, u<t} \sigma_u$ vanish.

Assumption (H-r). We assume that sup
We index the assumption in $r \in [0, 2]$ to highlight the role of the jump activity index $r$. The larger $r$, the more general jump components are included in our model. In particular, for $r = 0$ we consider jumps of finite activity. Imposing $r < 1$ instead allows for infinite activity jumps which are absolutely summable. We state the assumptions on characteristics of $X$ with respect to (Ω, F, (1) is also a standard Brownian motion on this space. For the volatility process, our target of inference, we work with the following general smoothness condition determined by a smoothness parameter $\alpha \in (0, 1]$. with some function $f_\sigma: \mathbb{R}^2 \to \mathbb{R}$, continuously differentiable in both coordinates, and two The smaller $\alpha$, the less restrictive is Assumption ($\sigma$-$\alpha$). It is natural to develop results for general $\alpha \in (0, 1]$ to cover a broad framework and preserve some freedom in the model. This is particularly important, since the precision of nonparametrically estimating a process (or function) foremost hinges on its smoothness $\alpha$. Therefore, convergence rates in Section 3.2 hinge on $\alpha$.
In the composition of the volatility in Assumption ($\sigma$-$\alpha$), σ can contain a non-Lipschitz seasonality component (Lipschitz continuous seasonalities can as well be modeled by the drift of $\sigma_t^{(A)}$). As pointed out by [29], σ can also be a long-memory volatility component such as the prominent exponential fractional Ornstein-Uhlenbeck model by [19].
While an i.i.d. assumption on the noise is standard in most works, empirical findings, for instance by [22], motivate to allow for serial correlation and endogeneity in the noise. We develop our theory under the following general assumption.
for $t \in [0, 1]$ uniformly on compacts in probability and we have the mixing behavior for some > 0, which is specified in the discussion below Theorem 1. The process $(\eta_t)_{t\in[0,1]}$ is locally bounded and satisfies for all $t, (t + s) \in [0, 1]$ the mild smoothness condition: for some continuous bounded function
The case that $\mathrm{Cov}(\epsilon_i, \epsilon_{i+l}) = 0$ for all $l \ne 0$ and $\eta = \mathrm{Var}(\epsilon_i)$ constant for all $i$ is tantamount to the classical setup with i.i.d. noise. In general the noise is serially correlated, endogenous and heteroscedastic. In contrast to Assumption (GN) in Section 7.2 of [3], we do not assume that the noise is conditionally centered, in order to include the correlation to the increments of $X$ in (23). The endogeneity condition (23) includes linear models of the form $\epsilon_i = \sum_{l=i-Q}^{i} c_l \sqrt{n}\, \Delta_l^n X + U_i$, with $U_i$ exogenous errors and constants $c_l$, similar to Equation (6) of [31] or as considered by [9]. If we knew the process $(\eta_t)_{t\in[0,1]}$, Assumption ($\eta$-p) with a mild lower bound for would be sufficient for our asymptotic results. For an adaptive method, however, we need to estimate the process $(\eta_t)_{t\in[0,1]}$. Consistent estimation of the noise long-run variance (20) requires stronger structural assumptions. For a $Q$-dependent noise process, that is, $\sup_{i=0,\dots,n} |\mathrm{Cov}(\epsilon_i, \epsilon_{i+q})| = 0$ for $q > Q$ and some given $Q < \infty$, and if $\eta$ in (20) is time-invariant, consistent estimation of $\eta$ with $\sqrt{n}$-convergence rate has been established by [23]. In [13] it is shown how $Q$ can be found adaptively if it is unknown. Consistent estimation of the noise variance process under heteroscedasticity, but without serial correlations, is discussed in [27]. For the fully adaptive method, we tighten the assumptions on the noise as follows.
Assumption 2 is satisfied by a $Q$-dependent noise process. Then, consistent estimation of the long-run noise variance process (20) is possible.
n −1, the locally constant approximated noise long-run variance process can be estimated with accuracŷ Our estimator is given in (43) in the appendix. It is somewhat related to the methods from [23] and [13], but localized to bins.
The assumptions on the noise are more general than in other works on spectral volatility estimation as in [5] and in [13]. In particular, to the best of our knowledge, we consider for the first time heteroscedastic and serially correlated, endogenous noise.

Remark 2 (Non-equidistant observations). For a coherent and simple exposition of the construction of the spectral estimator in (7)-(11), we discuss equidistant observations, which allows us to rely on discrete-time Fourier identities in (10). Considering a heteroscedastic noise level, our analysis and results are at the same time informative about non-equidistant observations. For general observation schemes $t_i^n$, $i = 0, \dots, n$, we impose the condition that a differentiable cdf $F$ exists such that observation times $t_i^n = F^{-1}(i/n)$ are obtained by a quantile transformation from the equidistant setting. Moreover, we require that the derivative $F'$ is strictly positive and satisfies the same smoothness as $(\eta_t)$ in (22). These assumptions are the same as in Assumption (Obs-d) of [5]. Then, all our asymptotic results transfer from the equidistant to this general setting when we replace $\eta_s$ by $\eta_s (F^{-1})'(s)$. This follows directly from the asymptotic equivalence of the respective experiments established in [12]. In particular, having locally less frequent observations is equivalent to having locally an increased noise level. Therefore, under the imposed conditions, 1], may be pooled. Note that adding the factor $(F^{-1})'(s)$ to the noise level $\eta_s$ is the same as generalizing the frequently occurring factor Φ jk gives the local sample size. In the equidistant case this is $nh_n$ and we have that $F'(s) = 1$ is constant.

Asymptotic results
Our first main result is on the spot squared volatility estimator and its asymptotic distribution.

Theorem 1. Suppose Assumptions 1, 2 and (H-r) with some r < 2 and smoothness Assumption
and $\tau < 1 - \beta/(p - 2)$ when $p < \infty$ moments of the noise exist, with $\tau$ the truncation exponent in the sequence $u_n$ in (15), (17a) and (17b), the estimators satisfy the $\mathcal{F}$-stable central limit theorem: For the oracle estimators (14a) and (14b), the same limit theorem applies under the less restrictive Assumption ($\eta$-p) with $p = 8$, > β, and if $\tau < 1 - \beta/(p - 2)$.
In fact, we can get arbitrarily close to the optimal rate for estimation, which is known to be $n^{\alpha/(4\alpha+2)}$ in this case, see [37]. Balancing the squared bias and the variance guarantees that the estimators (17a) and (17b) attain the optimal rate. For a central limit theorem we avoid an asymptotic bias by slightly undersmoothing. Most interesting is the case when $\alpha \approx 1/2$, e.g. when the volatility is a semimartingale. Then the convergence rate is $n^{1/8}$. If $\alpha > 1/2$, we obtain faster convergence rates. If $\alpha = 1/2$ and all moments of the noise process exist, for any $r < 3/2$ in Assumption (H-r), we can choose $\beta = 1/4 - \varepsilon$ with any $\varepsilon > 0$. Under the standard assumption that we only have Assumption ($\eta$-p) with $p = 8$, the condition $\tau < 23/24$ results in $r < 34/23 \approx 1.478$. Hence, restricting to the condition that up to 8th moments of the noise exist leads only to a slightly less general condition on the jump activity. We point out that the restriction $r < 3/2$ on the jump activity, to come close to the optimal convergence rate, is less restrictive than the one obtained for integrated squared volatility estimation, $r < 1$, in [16]. The reason is that for spot volatility estimation we can only obtain slower convergence rates by local smoothing compared to integrated volatility estimation. This, however, works also under more active jumps.
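The rate arithmetic in this discussion can be checked with exact fractions. The short Python sketch below only evaluates the two expressions quoted in the text, the rate exponent $\alpha/(4\alpha+2)$ and the bound $1 - \beta/(p-2)$ on the truncation exponent; the function names are hypothetical, and $\beta = 1/4$ is used as the boundary value of $\beta = 1/4 - \varepsilon$.

```python
from fractions import Fraction


def rate_exponent(alpha):
    """Exponent of n in the optimal rate n^(alpha / (4 alpha + 2))."""
    return alpha / (4 * alpha + 2)


def truncation_exponent_bound(beta, p):
    """Upper bound 1 - beta / (p - 2) on the truncation exponent tau."""
    return 1 - beta / (p - 2)


print(rate_exponent(Fraction(1, 2)))                 # 1/8
print(rate_exponent(Fraction(1)))                    # 1/6
print(truncation_exponent_bound(Fraction(1, 4), 8))  # 23/24
```

For semimartingale volatility ($\alpha = 1/2$) this reproduces the rate $n^{1/8}$, and with eight noise moments the bound $\tau < 23/24$ stated above. Smoother volatility with $\alpha = 1$ would give $n^{1/6}$.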
The limit variable in (26) is mixed normal, which we denote by MN, and is defined on a product space of the original probability space (on which $Y$ is defined) and an orthogonal space independent of $\mathcal{F}$. The convergence is $\mathcal{F}$-stable in law, marked (st). Stability of weak convergence then allows for a so-called feasible version of the limit theorem (26) that facilitates confidence sets.

Corollary 3.2.
Under the conditions of Theorem 1, and also for any $J_n$ fixed as $n \to \infty$: as defined in the weights (13), obtained by inserting the pilot estimates.
The results proved for the spot volatility estimator provide a main building block for our asymptotic test, but are moreover of interest in their own right. They show that the spectral method renders effective spot squared volatility estimators under general noise and in the presence of jumps.
In the sequel, let (S p ) p≥1 be a sequence of stopping times exhausting the jumps of X. We address the null hypothesis (3) that no common jumps of volatility and price occur on [0, 1]. Under the alternative hypothesis, there is at least one contemporaneous jump in volatility and price.
Analogously to [29], we specify test hypotheses more precisely by focusing on jumps of $X$ with absolute values $|\Delta X_{S_p}| > a$ for $a \ge 0$ and write $H(a)_{[0,1]}$. The reason for this is that a suitable test statistic and associated limit theory for $H(a)_{[0,1]}$ with $a > 0$ work under a much more general setup with jumps of infinite variation, while testing $H(0)_{[0,1]}$ requires Assumption (H-0) to hold. In both cases, we concentrate on a finite number of (large) price jumps under the null hypothesis. From an applied point of view this is reasonable, since we are interested in volatility movements at finitely many relevant price adjustments on a fixed time interval.
Denote by g : Let us now state the general form of our test statistics: Under mild regularity assumptions on $g$ in terms of differentiability in both coordinates, limit theorems for (28) the following asymptotic distribution of the test statistic applies under $H(a)_{[0,1]}$: Under the alternative hypothesis, $n^\beta T^0(h_n, r_n, g) \to \infty$ in probability. Therefore, we obtain an asymptotically distribution-free test by the asymptotic $\chi^2$-distribution with $N_1$ degrees of freedom. The test with critical regions where $q_\alpha(\chi^2_{\hat N_1})$ denotes the $\alpha$-quantile of the $\chi^2_{\hat N_1}$-distribution, has asymptotic level $\alpha$ and asymptotic power 1.
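The decision rule can be sketched in Python without external libraries. The sketch assumes that the null hypothesis is rejected when the scaled statistic exceeds the $(1-\alpha)$-quantile of the $\chi^2$-distribution whose degrees of freedom equal the estimated number of price jumps; the critical region (31) is not reproduced here, so this reading, the function names and the numerical method (power series for the regularized incomplete gamma function plus bisection) are assumptions of the sketch.

```python
import math


def chi2_cdf(x, df):
    """Chi-square cdf via the series of the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * df, 0.5 * x
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-16 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(prob, df):
    """Quantile of the chi-square distribution by bisection on chi2_cdf."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reject_no_cojump(scaled_statistic, n_price_jumps, level=0.05):
    """Reject the null of no common jumps if the scaled statistic exceeds the
    (1 - level)-quantile of chi-square with n_price_jumps degrees of freedom."""
    if n_price_jumps == 0:
        return False  # no detected price jump, nothing to test
    return scaled_statistic > chi2_quantile(1.0 - level, n_price_jumps)
```

For example, with two detected price jumps the 5% critical value is about 5.99, so a scaled statistic of 10 would lead to rejection while a value of 1 would not. Because the limit is pivotal, no estimate of the volatility or the noise level enters this last step.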
In fact, (31) contains the estimated number of price jumps $\hat N_1$. Since $P(|\hat N_1 - N_1| > 0) \to 0$, (30) applies with $\hat N_1$ also. A naive approach based on the asymptotic normality result (27) with the linear test function $(x_1, x_2) \mapsto x_1 - x_2$ yields an asymptotic test as well. It holds that on the null hypothesis $H(a)_{[0,1]}$. Apparently, the rate $r_n^{-1/2} n^{\beta/2}$, close to $n^{1/8}$ for α ≤ 1/2, is slower and thus the test in Theorem 2 is preferable.

Remark 3. As mentioned by [29], their test based on (5) corresponds to a two-sample likelihood ratio test for equal variances in a Gaussian parametric model with observations
In this simpler model, closely related to our model in case of no noise, the likelihood ratio is where the estimators (4) are the maximum likelihood estimators for this model, and we derive the convergence of $-2\log(\Lambda)$ to a $\chi^2_1$-distribution from the standard asymptotic theory for likelihood ratio tests.
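To make the likelihood ratio interpretation concrete, here is a small sketch of the textbook statistic for equal variances. It assumes two independent centered Gaussian samples of equal size `m` whose variances are estimated by maximum likelihood; these assumptions and the function name `lr_equal_variance` are illustrative and not taken from the paper.

```python
import math


def lr_equal_variance(var_left, var_right, m):
    """-2 log(Lambda) for H0: equal variances, given two ML variance estimates.

    Assumes two independent zero-mean Gaussian samples with m observations each.
    Under H0 the common variance is estimated by the average of both estimates.
    """
    if var_left <= 0 or var_right <= 0 or m < 1:
        raise ValueError("variance estimates and sample size must be positive")
    pooled = 0.5 * (var_left + var_right)
    return m * (2.0 * math.log(pooled) - math.log(var_left) - math.log(var_right))
```

The statistic has the form 2f((x + y)/2) − f(x) − f(y) with f = m log. It vanishes when both variance estimates coincide and is non-negative by concavity of the logarithm.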
The model with noise is more complicated. Our test from Theorem 2 does not directly correspond to a parametric likelihood ratio test and our estimators (17a) and (17b) do also not agree with the non-explicit maximum likelihood estimators in this model. The choice of g in (29) is motivated by studying which properties in (5) are important for the asymptotic pivotal distribution under the null. Any function of the form g(x, y) = 2f ( x+y , with some twice differentiable function f , is suitable for the construction of tests (in both models) with the fast convergence rate based on second order asymptotics of the On the other hand, that the statistic (5) is self-scaling in the volatility leading to the pivotal limit distribution is due to the identity

which guarantees the above identity in the model without noise. In light of the efficient asymptotic variance under noise in Theorem
.
Since the noise level $\eta_s$ can be estimated with a much faster rate of convergence than $\sigma_s^2$, even under our general assumptions for the noise, this choice of (29) facilitates (30).
The particular choice of the spectral estimators (17a) and (17b) is not crucial for the construction of the test. Any rate-optimal spot volatility estimator may be used when it is possible to find a function f satisfying the above identities. However, with a more complex asymptotic variance structure, for instance for pre-average or realized kernel estimators, this appears to be difficult. Estimators attaining the same efficient variance as in (26) may be used with the same function g in (29), to derive a test with the same asymptotic properties. A localized QMLE as discussed by [18], for instance, could allow for analogous results.

Setup of Monte Carlo simulation study
The simulation study examines the finite-sample performance of the proposed methods. We implement a model where observed log-prices are given by with jump intensity measure $\nu(dt, dx, dy) = \lambda\,dt\,\Pi(dx)\,\Pi(dy)$ and with Gaussian jump sizes $\Pi \sim N(H, H/100)$ whose magnitude depends on a parameter H. The efficient semimartingale log-price process is recorded with additive microstructure noise In line with empirical evidence, this model generates serially correlated noise. We further consider two different noise models (34) and (35) below. We set values of η according to realistic noise-to-signal ratios. We use the median value of the estimated measure $n\eta\,\big(\int_0^1 \varphi_t^4\sigma_t^4\,dt\big)^{-1/2}$ found in a comprehensive data study in [13]. Sample sizes n = 30,000 and n = 5,000 in our simulations suggest $\eta^{1/2} \approx 0.005$ and $\eta^{1/2} \approx 0.015$, which we use in the following as two realistic noise levels. According to the data summary in Table 5, 30,000 is a sample size that approximately matches the average daily number of observations in our empirical data. We additionally analyze the methods' performance for the smaller sample size n = 5,000, which is realistic for less frequently traded assets. We set θ = 0.6 equal to the empirically motivated value in [13].
$\varphi_t = 1 - \frac{3}{5}\sqrt{t} + \frac{1}{10}t^2$ mimics a deterministic intra-day volatility seasonality pattern and $\sigma_t^2$ is a random stochastic volatility component with leverage: The jump measure above has a second real argument to incorporate instantaneous arrivals of volatility jumps. The volatility jump component is of the form with γ ∈ R and intensity measure $\nu(dt, dz) = dt\,\Pi(dz)$. Setting γ = 0 results in no common price and volatility jumps, which means the null hypothesis is valid. To simulate the model under the alternative hypothesis, we set γ = 1 instead.
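As a quick illustration of the deterministic seasonality component, the following sketch evaluates the function on the unit interval. The function name `seasonality` is hypothetical; only the formula itself is taken from the text.

```python
import math


def seasonality(t):
    """Deterministic intra-day pattern 1 - (3/5) sqrt(t) + (1/10) t^2 for t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in the unit interval")
    return 1.0 - 0.6 * math.sqrt(t) + 0.1 * t * t


# Values on a coarse grid over the trading day, rescaled to [0, 1].
grid = [i / 4 for i in range(5)]
pattern = [seasonality(t) for t in grid]
```

The pattern starts at 1 for t = 0 and reaches 0.5 at t = 1.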

Choice of tuning parameters
In the sequel, we provide advice on how to specify the tuning parameters that are involved in the nonparametric procedures. We also conduct a sensitivity analysis for the Monte Carlo study to find suitable values. First, the bin-width $h_n \propto n^{-1/2}\log n$ balances the number of observations per bin, $nh_n$, which should be large enough to smooth out noise, against the discretization error from approximating the volatility as bin-wise constant. The sensitivity analysis will show that the final test is very robust to modifications of $h_n$. We advise selecting $h_n$ such that the number of observations per bin is at least 50 and up to about 250 for typical high-frequency financial data. This results in a time resolution of 50-150 bins per trading day. For the spot volatility estimators (17a) and (17b) and the pilot estimator (15), we fix spectral cut-offs $J_n$ and $J_n^{pi}$, respectively. The values of the spectral cut-offs do not influence the methods when set sufficiently large. Since the weights (13) decay exponentially for $j \gtrsim \sqrt{n}\,h_n \log n$, the addends with large j become negligible, such that it suffices to choose $J_n \propto \log n$. The proportionality constant should be larger than 1; we take values between 3 and 12. The pilot estimators (15) instead use averages over frequencies $j = 1, \ldots, J_n^{pi}$, such that we fix $J_n^{pi}$ to be smaller. We thus use $J_n^{pi} \propto \log n$ with a proportionality factor smaller than for $J_n$. The threshold sequence $u_n$ determines the bins on which large returns are ascribed to jumps. We use the practical selection presented in [16].
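The rules of thumb above translate into a few lines of arithmetic. The sketch below is only an illustration: the helper names `bin_layout` and `spectral_cutoff`, the integer division and the rounding up of the cut-off are assumptions made here, not prescriptions from the paper.

```python
import math


def bin_layout(n, obs_per_bin):
    """Number of bins and bin-width on [0, 1] for a target number of observations per bin."""
    if obs_per_bin <= 0 or obs_per_bin > n:
        raise ValueError("obs_per_bin must be between 1 and n")
    n_bins = n // obs_per_bin
    return n_bins, 1.0 / n_bins


def spectral_cutoff(n, constant=3.0):
    """Cut-off proportional to log(n); rounding up is an assumption of this sketch."""
    return math.ceil(constant * math.log(n))


bins_large, h_large = bin_layout(30_000, 200)   # 150 bins per day
bins_small, h_small = bin_layout(5_000, 100)    # 50 bins per day
```

With a proportionality constant of 3, `spectral_cutoff(30_000)` returns a value close to the cut-off of 30 used in the simulations.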
The most influential tuning parameter for our test is the size of the smoothing window $r_n \propto n^{-\beta}\log n$. If we choose $r_n$ larger, the spot volatility estimates have smaller variance but the bias for rapidly varying volatilities increases. For α = 1/2, we know the exact order of $r_n$ depending on n. There is, however, no simple rule of thumb to fix the constant $\kappa_2$, and we conduct an extensive sensitivity analysis to find the most suitable values. The sensitivity analysis reveals that in order to detect volatility jumps and separate them from a rough continuous semimartingale volatility component, we should use rather small smoothing window sizes.
We investigate the performance of the test for common price and volatility jumps depending on the tuning parameters $h_n$ and $r_n$ in the Monte Carlo simulation. We implement the setup from Section 4.1 with λ = 2, $\eta^{1/2} = 0.005$ and H = 0.25 for both sample sizes n = 30,000 and n = 5,000. We set $J_n = 30$ in all configurations, which is large enough to guarantee high efficiency but smaller than $nh_n$ in any configuration. $J_n^{pi}$ is set equal to 25. Figure 1 shows the empirical power and a global testing error including misspecification of the size for a typical testing level α = 0.05 and for n = 30,000. The power of all configurations is quite high. Starting with values $r_{30,000}^{-1} = 2$, which means the smoothing window is two bins in each direction, the power increases significantly when choosing larger values of $r_{30,000}^{-1}$. However, larger values of $r_{30,000}^{-1}$ lead to a misspecification of the size. The global testing error, which adds the misspecification of size with equal weight to the power, is minimal for $r_{30,000}^{-1} = 4$. On the other hand, the performance is remarkably robust across all considered values of $h_{30,000}$.
Table 1. Empirical power of the α = 0.05-test for n = 5,000 depending on tuning parameters $h_{5,000}$ and $r_{5,000}$.
The precise values of empirical power and size for n = 5,000, depending on $r_{5,000}$ and $h_{5,000}$, are given in Table 1 and Table 2. Again, the global error measure becomes minimal when $r_{5,000}^{-1} = 4$, does not change much for $r_{5,000}^{-1} = 3$ or 5, and is very robust with respect to $h_{5,000}$.

Simulation results for spot volatility estimation with a comparison to a multi-scale approach
We analyze the accuracy of the spectral spot volatility estimator. First, we illustrate its performance in the model from Section 4.1, with only a non-random but time-varying volatility component $\varphi_t = 1 - \frac{3}{5}\sqrt{t} + \frac{1}{10}t^2$ without volatility jumps. This allows a convenient visualization of the estimation uncertainty. Next, we compare the performance of our spectral spot volatility estimator to that of a noise-robust multi-scale spot volatility estimator. The multi-scale estimator for integrated volatility is adopted from [44]. Applied to all data it estimates $\int_0^1 \sigma_t^2\,dt$ and we denote it by $\widehat{\langle X, X\rangle}_1$. In order to obtain an estimator of $\sigma_t^2$ at some t ∈ (0, 1), we use a local difference $\widehat{\langle X, X\rangle}_t - \widehat{\langle X, X\rangle}_{t-\delta}$ with suitably small δ. This extends the methods by [36] and [45] from two-scale to multi-scale versions. Though no theoretical results are established for this estimator, it is clear that for optimal δ the approach renders a rate-optimal multi-scale spot volatility estimator. A tuning parameter, the multi-scale frequency, is chosen in a data-driven, optimal way, for which a formula is provided in Section 6 of [11]. The multi-scale estimator becomes biased under autocorrelated noise as in (33). Thus, we focus on noise models without serial correlation to draw a meaningful comparison. First, consider In this model, the bias-correction of the spectral estimator uses a standard noise variance estimator for i.i.d. noise. Further, we examine the estimators in the following noise model with time-varying and endogenous noise: and (34) for i = 0, . . . , 5. Here, we use locally bin-wise estimated noise levels for the bias-correction terms.
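The local-difference idea can be sketched in a simplified setting. The example below deliberately replaces the multi-scale estimator by plain realized variance, which is only appropriate without noise, and it rescales the difference by the window length δ; both choices, as well as the function names, are assumptions of this illustration and not the estimator used in the comparison.

```python
def realized_variance(log_prices, start, stop):
    """Sum of squared increments of log_prices between indices start and stop."""
    return sum(
        (log_prices[i] - log_prices[i - 1]) ** 2 for i in range(start + 1, stop + 1)
    )


def spot_from_local_difference(log_prices, t_index, window):
    """Local difference of integrated variance estimates, rescaled by delta = window / n.

    Noise-free stand-in: the difference of realized variances up to t and up to
    t - delta equals the realized variance over the last `window` increments.
    """
    n = len(log_prices) - 1
    if window <= 0 or t_index - window < 0 or t_index > n:
        raise ValueError("window does not fit into the observation range")
    delta = window / n
    return realized_variance(log_prices, t_index - window, t_index) / delta


# Toy path on [0, 1] with constant squared volatility 4: increments of size sqrt(4 / n).
n = 1000
step = (4.0 / n) ** 0.5
path = [step * (i % 2) for i in range(n + 1)]
estimate = spot_from_local_difference(path, t_index=500, window=100)
```

On this toy path the estimate recovers the constant squared volatility. With noisy data, the realized variance would have to be replaced by a noise-robust integrated volatility estimator.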
Since generated volatility paths in our simulation model are random and thus different in each run, we measure the discrepancy for each path. A suitable global quantity to assess the estimators' quality from M Monte Carlo iterations is an average normalized mean integrated squared error The integrals are approximated by sums. For the multi-scale estimator, we set $\delta = K_{MS}^{-1}$ and compute spot volatility estimates on a grid of $K_{MS}$ equidistant time points. An optimization of the MISE led us to fix $K_{MS} = 30$ for n = 30,000, and $K_{MS} = 10$ for n = 5,000. For the spectral estimator the discretization is given by the $h_n^{-1}$ bins of length $h_n$. An overview of the results for different noise levels, each quantity based on M = 3,000 Monte Carlo runs, is given in Table 3. The spectral estimation outperforms the ad hoc multi-scale approach in each model specification. The efficiency gains are most relevant for larger noise and more frequent observations. Figure 3 visualizes spectral and multi-scale spot volatility estimates with their standard deviations when the true volatility is deterministic and given by $\varphi_t = 1 - \frac{3}{5}\sqrt{t} + \frac{1}{10}t^2$. The confidence regions sketched by the point-wise standard deviations are wider for the multi-scale than for the spectral estimator. We further see a small positive bias of the multi-scale estimates. The discretization, chosen to optimize MISE, is also coarser than the bins of the spectral method, which we expect to be the main reason for this bias.
Fig 3. Bin-wise averages of spectral (points) and multi-scale (crosses) spot squared volatility estimates with bin-wise standard deviations (dashed lines) in comparison to the true spot squared volatility (solid line), for n = 30,000. The area around the spectral estimates determined by their standard deviations is gray colored such that the other dashed lines depict the standard deviations of the multi-scale estimates.
Overall, the estimation results for the spectral spot volatility estimator are promising. They confirm that it provides a useful statistical device which is of interest beyond its use as one ingredient for the statistical test for common price and volatility jumps.

Simulation results for the test with a comparison to a skip-sampling approach
In the sequel, we first study the empirical size and power of our test with respect to different calibrations of volatility jump sizes, noise level and number of observations. To evaluate the improved performance in comparison to the test by [29], we also implement the latter based on appropriately down sampled discretized simulated paths. The parameter configurations used in the Monte Carlo study for different scenarios are summarized in Table 4 together with the chosen tuning parameters according to the values found to be optimal in the sensitivity analysis. In scenario II (I) the average price jump is approx. 20 (60) times larger than the average absolute return. The identification of price jumps by truncation thus works with only very few errors. Hence, we can use the results from all Monte Carlo iterations to analyze our methods' performance. Examining the ability of thresholding to locate price jumps in different situations has been addressed in [16]. Here, the focus is on the test for common price and volatility jumps. The volatility jumps in scenarios I, II and IV are a bit smaller than half the size of the average range of the simulated continuous part of the intra-day volatility path. Figure 6 illustrates that in empirical applications much larger volatility jumps occur. In scenario III the jump in the volatility is less than 20% of the range of the continuous intra-day volatility motion. In scenarios I, II and IV we thus have a volatility jump size where the test should attain reasonable power, while scenario III investigates the behavior for rather small volatility jumps.
Fig 4. Empirical size and power of the tests in scenario I under the null hypothesis (left) and alternative hypothesis (right). Empirical proportion of realizations smaller than or equal to percentiles of the theoretical asymptotic distribution under the null (y-axis) against those percentiles (x-axis). The dotted line shows results for our test and the solid and dashed lines two versions of the Jacod-Todorov test using two different tuning parameters. The skip-sampling frequency is optimized to allow for the highest power.
We compare the performance of our test based on the statistic (28) in scenario I for our simulated model with the method by [29]. We cannot apply the latter to the simulated n = 30,000 high-frequency observations, since the simulated data contains noise. If we apply the test for direct observations to noisy data, the statistics are heavily biased and the performance is very poor. Instead, we skip-sample simulations at a coarser frequency. A heuristic optimization leads us in scenario I of our simulation study to an optimal skip-sample frequency resulting in ca. 500 "de-noised" observations on [0, 1]. For intra-day NASDAQ data this translates into using one observation per 46.8 seconds. In [29] a one minute frequency for different, but also very liquid, data is used in the application part. Moderate changes of the skip-sampling frequency do not affect the results substantially. Figure 4 demonstrates a very good performance of our test in scenario I. The power is 97.7% for the α = 0.05-test and above 90% even for level α = 0.01. Similar to our test, the performance of the Jacod-Todorov test applied to the 500 coarse returns is crucially influenced by the length of the smoothing window of local realized volatilities. We visualize two configurations with $k_n = 50, 100$ in the spot volatility estimators given in (4). The choice $k_n = 100$ is in favor of higher power, but the accuracy of the asymptotic quantiles on the null hypothesis is not good. Setting $k_n = 50$, we obtain less power but the empirical quantiles on the null hypothesis track the asymptotic ones more closely. In all configurations, the performance of the Jacod-Todorov test applied to skip-sampled data is inferior to that of our noise-robust approach. This is not surprising, since for our approach we rely on an efficient smoothing technique while skip-sampling can be seen as the simplest method to smooth out noise. The performance of the Jacod-Todorov test is reasonably good as well, but in a situation with large available sample sizes and significant noise it is worthwhile to apply the more efficient, noise-robust procedure. If sample sizes are smaller (and the noise not larger), the difference between the two methods becomes smaller. Figure 5 shows the performance in the other scenarios II, III and IV. Decreasing the sample size to n = 5,000 observations in scenario II, while all parameters are equal to those in scenario I, leads to a slightly smaller power and larger misspecification of the size. The power is still higher than for the skip-sample approach, but the difference is less relevant. With the tuning parameters which minimize the global empirical testing error, the misspecification of the size is still acceptable. Larger noise levels result in smaller power as shown for scenario IV in Figure 5, while the fit of the size remains good. In this situation, the Jacod-Todorov method would only work for less frequent skip-sampling, resulting in smaller power as well.
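The skip-sampling step used for the comparison can be sketched as follows. The helper names are hypothetical, and the session length of 23,400 seconds assumes the regular 6.5-hour NASDAQ trading day, which is consistent with the figure of 46.8 seconds quoted above.

```python
def skip_sample(observations, target_count):
    """Keep every step-th observation so that roughly target_count observations remain."""
    if target_count < 1:
        raise ValueError("target_count must be positive")
    step = max(1, len(observations) // target_count)
    return observations[::step], step


def seconds_per_observation(session_seconds, target_count):
    """Average spacing in seconds between retained observations."""
    return session_seconds / target_count


# Placeholder data: 30,000 observation indices standing in for noisy log-prices.
full_sample = list(range(30_000))
coarse_sample, step = skip_sample(full_sample, 500)
spacing = seconds_per_observation(23_400, 500)   # assumes a 6.5-hour session
```

Every 60th observation is retained here, which leaves 500 observations for the test designed for direct observations.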
For the alternative hypothesis with a small volatility jump in scenario III, a sensitivity analysis as in Section 4.2 led us to slightly different tuning parameters, $h_{30,000}^{-1} = 200$ and $r_{30,000}^{-1} = 5$. Since smaller bins give a higher time resolution, it is not surprising that detecting small volatility jumps in a rapidly time-varying spot volatility works better for a finer time resolution. On the other hand, choosing $r_n^{-1}$ slightly larger leads to almost the same window length $r_n^{-1}h_n$ for spot volatility estimation as before. The power for such small volatility jumps is lower, but still ca. 60% for α = 0.05.

Data study
To provide evidence about the practical relevance of price-volatility co-jumps and to study the usefulness of our estimators and test in a real-world data environment, we apply our methodology to stocks traded at the exchange platform NASDAQ. The data study is based on limit order book data taken from the online data tool LOBSTER 2 . The example refers to stocks of the online and technology companies Amazon One benefit of our estimator and test is that we can directly plug-in traded log-prices, reconstructed from the order book, without considering any skip-sampling or synchronization procedures. Since the method is robust against market microstructure noise, we efficiently take into account all information stored in the data.
Guided by our theoretical results and the simulations, estimates and tests are based on spectral statistics calculated for bins $k = 0, 1, \ldots, h_n^{-1} - 1$ over a trading day, with $h_n^{-1} = 3\sqrt{n}/\log(n)$, rounded to an integer. We set J = 30 and $J^{pi} = 15$. Jumps in prices are detected with the locally adaptive threshold $\hat u_k = 2\log(h_n^{-1})\,h_n\,\hat\sigma^2_{kh_n,pil}$, with $\hat\sigma^2_{kh_n,pil}$ the pilot estimator (15) of the spot squared volatility. We fix constant window lengths $r_n^{-1} = 4$. Certainly, $r_n^{-1}$ is a crucial parameter which can be varied to learn about the persistence or lifetime of a break in spot volatility. We apply the test to each day separately. Table 5 reports the rejection rates for the 5% and 10% significance levels. Results indicate that at the 10% significance level between 36% (INTC) and 73% (AAPL) of jumps in prices are accompanied by jumps in volatility. It appears that the rejection rate decreases in the number of detected price jumps. This leads to relatively stable frequencies of price-volatility co-jumps over time across the considered stocks. At the 5% significance level, the Amazon.com stock displays the lowest frequency of common price and volatility jumps, with around 4.4% of the trading days. With around 6.7% of trading days, Facebook Inc. has the largest number of common jumps. Absolute jump sizes of the log squared volatility processes reported in Table 5 are considerable. Figure 6 illustrates the mechanisms behind the test for common price and volatility jumps. Left hand plots show an upward jump in prices on bin k = 58, whereas right hand plots show a downward jump in prices on bin k = 39. Both price jumps are associated with a significant contemporaneous upward jump in spot volatility. The p-value in both examples is 0.00. On the first example date, August 13th, 2013, the investor Carl Icahn took a large stake in AAPL stock. On May 14th, the downward jump example date, figures of mobile phone sales were reported.
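The bin count and the locally adaptive threshold from this paragraph can be computed as in the sketch below. The function names are hypothetical, rounding to the nearest integer is an assumption, and the pilot estimate of the spot squared volatility is taken as a given input rather than computed.

```python
import math


def number_of_bins(n):
    """Bins per trading day, 3 * sqrt(n) / log(n); rounding is an assumption here."""
    return round(3.0 * math.sqrt(n) / math.log(n))


def jump_threshold(n_bins, pilot_spot_var):
    """Threshold 2 * log(1 / h) * h * (pilot spot squared volatility) with h = 1 / n_bins."""
    if n_bins < 2 or pilot_spot_var < 0:
        raise ValueError("need at least two bins and a non-negative pilot estimate")
    h = 1.0 / n_bins
    return 2.0 * math.log(n_bins) * h * pilot_spot_var


bins = number_of_bins(30_000)
threshold = jump_threshold(bins, pilot_spot_var=1.0)
```

Because the threshold scales with the pilot estimate on each bin, it adapts to the local volatility level, so that a price move of a given size is flagged more readily in calm periods than in volatile ones.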
We find evidence for frequent occurrences of simultaneous jumps in price and volatility and quite large volatility jump sizes. Yet, by far not all detected price jumps are accompanied by volatility jumps. Understanding the economic sources of different jump events and their consequences for price-volatility co-jumps is of interest for future research.

Conclusion
We present a new test for the presence of contemporaneous jumps of price and volatility based on high-frequency data. The test transfers the methodology of [29] to a setup accounting for microstructure noise by employing a spectral estimation of the spot volatility and an accurate test function. The nonparametric spot volatility estimator shows appealing asymptotic and finite-sample qualities and is of interest beyond the scope of this article. It opens up several new ways for inference in models for high-frequency financial data with noise. Simulations demonstrate that the proposed noise-robust test increases the finite-sample performance considerably compared to an application of the test by [29] to skip-sampled data. Our data study reveals clear significance of price and volatility co-jumps in NASDAQ high-frequency data. The presented methods can be generalized in various directions. For instance, our methods indicate how a test for correlation of price and volatility jumps, as presented by [25] for a non-noisy observation design, can be constructed. A general global test for volatility jumps under noise generalizing the methods from [14] could be addressed with a related high-dimensional testing procedure.

Preliminaries
On the finite time horizon [0, 1], we may augment local boundedness to uniform boundedness in Assumption (H-r) and Assumption 1, such that we can assume that there exists a constant Λ with for all (ω, s, x) ∈ (Ω, R + , R). This standard procedure can be found in Section 4.4.1 of [28]. Throughout the proofs K is a generic constant and K p a constant emphasizing dependence on p. We decompose the semimartingale X in its continuous part and the jump component The processesC serve as an approximation of C t by simplified processes without drift and with locally constant volatility. We separate jumps with absolute value bounded from above by some ε < 1 and larger jumps: with A ε = {z ∈ R|γ(z) ≤ ε} and later let ε → 0. Let us recall some usual estimates on Assumptions 1, (H-r) and (σ-α) which are crucial for the following proofs. For the continuous semimartingale part, we have For given 0 < ε < 1, for J(ε) the estimate The continuous semimartingale increments satisfy local Gaussianity in the sense that on Assumption (σ-α). The probability of a frequent occurrence of large jumps is small. Precisely, the expectation of jumps with absolute value larger than ε is bounded: Under Assumption (H-r) with r ≥ 1, the jumps moreover satisfy Under Assumption (σ-α) for 0 ≤ s < t ≤ 1, the squared volatility satisfies: Proofs of these bounds can be found, for instance, in Chapter 13 of [28]. (37b) follows from Equation (54) in [2].
In the sequel, we gather more properties of the basis functions (7). We define $(\Phi_{jk})$ in (7) in the same way as [15] in their Equation (4b) to exploit discrete-time Fourier identities under equidistant sampling. The asymptotic properties of the estimator remain the same when we instead use the definition from Equation (2.2) in [12]. We heavily exploit the summation by parts identity for spectral statistics with $\varphi_{jk}(t) = \sqrt{2}\,h_n^{-1/2}\cos\big(j\pi h_n^{-1}(t - kh_n)\big)\,1_{[kh_n,(k+1)h_n]}(t)$, see Lemma 6.1 of [5]. For all $(\varphi_{jk})$, it holds that For the asymptotic theory, we shall further use the following identities The latter gives $4h_n/(\pi^2(j^2 - u^2))$ whenever j is odd and u even, or the other way round, and vanishes in all other cases. Recall the definition of the weights (13). The magnitude of these weights is In the proofs, we use the notation $\zeta_k^{ad}(Z)$ and $\zeta_k(Z)$ from (12) analogously also for different processes Z. This means that we insert in (12) spectral statistics $S_{jk}(Z)$, analogous to (11), computed from the sequence $Z_{i/n}$, i = 0, . . . , n, especially $\zeta_k(X)$ for the statistics based on the unobserved efficient price.
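A standard property of the cosine functions given above is that, for frequencies j, u ≥ 1, they are orthonormal on each bin. The sketch below checks this numerically with a midpoint rule; the function names and the quadrature are illustrative assumptions, and the check is not claimed to reproduce any particular identity of the proofs.

```python
import math


def phi(j, k, t, h):
    """Cosine basis function sqrt(2 / h) * cos(j * pi * (t - k h) / h) on bin [k h, (k + 1) h]."""
    if t < k * h or t > (k + 1) * h:
        return 0.0
    return math.sqrt(2.0 / h) * math.cos(j * math.pi * (t - k * h) / h)


def inner_product(j, u, k, h, m=2000):
    """Midpoint-rule approximation of the integral of phi_jk * phi_uk over bin k."""
    dt = h / m
    total = 0.0
    for i in range(m):
        t = k * h + (i + 0.5) * dt
        total += phi(j, k, t, h) * phi(u, k, t, h)
    return total * dt


same_frequency = inner_product(1, 1, k=3, h=0.02)    # close to 1
different_frequency = inner_product(1, 2, k=3, h=0.02)  # close to 0
```

The inner product does not depend on the bin index k, because the functions on different bins are translates of each other.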

Estimation of the noise long-run variance
First, consider the standard case where α ≤ 1/2 in Assumption (σ-α), such that β < 1/4. To estimate (η khn ) under (22), we use nh n observations on the bin [kh n , (k + 1)h n ]. For k = 0, . . . , h −1 n − 1, and u = 1, . . . , Q, define the cumulative empirical autocorrelation statistics For u = 0, the rescaled local realized volatilities in the first addend define Z (0) khn . We estimate η khn byη We assume that Q is known. However, the same result applies if the process isQ-dependent withQ < Q. It thus suffices to take Q sufficiently large. A statistical method to infer Q is provided by [13]. We consider separately the case α > 1/2 with possible values 1/4 ≤ β < 1/3. Then, we exploit the increased smoothness of the noise by (22) to estimate (η khn ) with an improved convergence rate. We partition [0, 1] in n/M n windows of lengths M n /n, each with M n observations, where M n = c M n 1−(2α+1) −1 . For a simple exposition we may suppose M n , n/M n ∈ N again. Completely analogously as before, we compute the cumulative empirical autocorrelation statistics Z Proof of Proposition 3. 1 We begin with the case α ≤ 1/2 in Assumption (σ-α), such that β < 1/4. We prove thatη Considering the expectation of the cumulative empirical autocorrelation statistics, all terms involving increments Δ n i X are of order O P (n −1/2 ), and even smaller under exogenous noise. Thus, we have that where we use (nh n ) −1 = O(n −1/2 ) for the first and the telescoping sum for the second addend. We obtain that for all 0 ≤ u ≤ Q. Summing over u ∈ {0, . . . , Q}, we exploit another telescoping sum: since Cov i−1 , i+Q |F khn = 0. There are at mostQ < ∞ addends i = knh n + 1, . . . , knh n +Q, for that E[ i |F khn ] = 0 is possible by endogeneity, which are asymptotically negligible in the above sum. A similar computation forZ (u) khn gives: since Cov i , i−Q−1 |F khn = 0 for all, except finitely many, i. 
This yields that for the estimator (43) /4 and (20) give that The following bound for the conditional variance of the estimator (43) completes the proof of (45). It holds uniformly in k that since the covariances vanish whenever the difference of two indices exceeds Q < ∞. Analogously, we derive that Var Z (u) khn |F khn = O P n −1/2 for all k. This readily implies that Var(η khn ) = O n −1/2 , and with Chebyshev's inequality and (47) we conclude thatη khn = η khn + O P (n −1/4 ).

Stable convergence of the spot squared volatility estimators
We first prove two lemmas, one on moments of the noise terms in the spectral statistics and one on moments of the statistics (12).
Considering fourth moments yields given that E[ 4 i |F X ] < ∞ almost surely for all i. That no fourth moments of the noise appear in the leading term is natural, as in standard proofs of central limit theorems using a moment method, since there are only n addends with i = l = u = v. That the remainder termR n is asymptotically negligible follows with the Taylor expansion from above.
Analogously, given that 2p th moments of the noise process exist for some p > 2, an analogous computation yields that

Lemma 2. On Assumptions 1, (σ-α), (H-r) and (η-p), we obtain the moment bounds
Proof. First, (12) is a convex combination and applying Jensen's inequality (for convex combinations) and Young's inequality, we derive that For the second addends, we obtain with (42) and J n = O(log(n)) that With Proposition 3.1 this bound applies to the conditional expectation witĥ η khn also. For the term with spectral statistics S jk (C + ), depending on the process (C t ) t∈ [0,1] and the noise, we infer with Young's inequality and since Applying Jensen's inequality again yields for the first addends . For the noise term, Lemma 1 implies that for all j = 1, . . . , J n = O(log(n)). Inserting the bounds above yields (49).

Proof of Theorem 1
The proof is structured in five steps. We establish the marginal stable central limit theorem for the estimator (17a). Since we may consider the continuous martingale part of X time-reversed, the mathematical analysis for the second component follows the same arguments and we restrict ourselves to the rightlimit case explicitly. Then, we address the joint convergence in the fifth step of the proof. The Steps 1-4 are structured according to the following decomposition: In Step 1, we establish the stable limit theorem for the oracle spectral estimator (14a) built from observations of the processC n in the simplified model with noise. Working more generally than under Assumption 2 with Proposition 3.1, just suppose that we have some estimator as well as Then, on Assumptions 1, (η-p) with p = 8, > β, (H-r) with r < 2 and (σ-α) and if 0 < β < α/(2α + 1), as n → ∞: Proof of Step 1: In order to prove a point-wise central limit theorem we verify three conditions: one addressing the conditional bias, one the variance and one Lindeberg-type criterion. Additionally we have to show that the convergence holds stably in law. First, we establish asymptotic unbiasedness of the local estimates (12): Using the summation by parts identity (39), we decompose and consider the three terms separately. For the first term we obtain with the martingale property that For the noise and bias-correction term, we obtain with the bound for the remainder from Lemma 1 and with (50) that n j −2 , whereas (ϕ jk ) integrate to zero. To put it simply, that the integrals in (41b) vanish for j = u guarantees that the endogenous noise does not induce any non-negligible bias term. This completes the proof of (53).
For the expectation of the left-hand side in (52), we deduce that because α > 0 and β < α(2α + 1) −1 . By (51) and using that Φ jk −2 n n −1 is uniformly bounded for all j, we obtain that Thus, the estimation of η khn in the bias-correction is negligible in the variance of σ 2 s . In case of exogenous noise, with Lemma 1, we can readily adopt the identity from Section 6.2.2 of [5] with I k , I jk from (13). We consider additionally the conditional variance terms due to endogenous noise under condition (23). With similar estimates for the remainders as in the bias term above, we obtain that In the first identity the terms for i = p and i not close to l, q cancel. We used the smoothness of (Φ jk ) and (ϕ jk ) again. Analogously, we obtain that With similar computations, we obtain that   Using the same approximations as in the previous terms and subtracting the term already contained in I −1 jk from the exogenous setup, we obtain the overall additional conditional variance However, by (41b) the integrals sum up to zero. Since the remainder is O P (log(n)) 3 n −1/4 , the effect of the endogenous noise becomes negligible at first asymptotic order. We conclude (54).
In the sequel we write w jk , I jk , I k as functions of the squared volatility and η: I j (σ 2 , η) = 1 2 σ 2 + Φ jk −2 n η n −2 , I(σ 2 , η) = Jn j=1 I j (σ 2 , η) and w j (σ 2 , η) = (I(σ 2 , η)) −1 I j (σ 2 , η). Note that Φ jk −2 n is equal for all k such that the timedependence of I, I j , w j is only in the squared volatility σ 2 and η. For the sum of conditional variances of the left-hand side of (52), we obtain that We exploit bounds on the derivative of the weights with respect to σ 2 and η ∂w j (σ 2 , η) here and several times below. The bound is proved as Equation (77) in [5]. ∂w j (σ 2 , η)/(∂η) can be bounded analogously. Observe that by the chain and product differentiation rule ∂ ∂σ 2 w 2 j (σ 2 , η)(I j (σ 2 , η)) −1 = 2w j (σ 2 , η) ∂w j ∂σ 2 (σ 2 , η)(I j (σ 2 , η)) −1 + w 2 j (σ 2 , η)4 σ 2 + Φ jk  (42), which tends to zero as n → ∞ because α > 0. By (22), the locally constant approximation of the long-run noise variance induces an error of smaller or at most equal order. The Lindeberg condition is proved by the stronger Lyapunov criterion considering fourth moments: and the reciprocal of the right-hand side thus constitutes the asymptotic variance ofσ 2 s . Finally, stability of the weak convergence is proved similarly as in Proposition 8.2 of [29]. For later use, let us directly consider a collection of times where we consider estimates of the spot volatilities instead of only one fixed time. In particular, for our test, we shall focus on finitely many jumps of X with absolute value larger than some constant. Consider a finite set (S p ) 1≤p≤P with fix P < ∞ of ordered stopping times exhausting those jump arrivals of X on [0, 1]. The restriction of Ω to Ω n = ω ∈ Ω|S 1 > r −1 n h n , S P < 1 − r −1 n h n , ∀p : (S p − S p−1 ) > 2r −1 n h n (57) satisfies P(Ω n ) → 1 as n → ∞. Thus, we work on Ω n . 
We aim at establishing for Sp− U p 1≤p≤P for any F-measurable bounded random variable Z and continuous bounded function g and for (U p , U p ) a sequence of standard normals defined on an exogenous space being independent of F. This is the definition of the claimed F-stable convergence.
The strategy is to exclude intervals on which the spot estimators are built and conditioning. Thereto, define G n 0 -measurable random variables and denote i p integer-valuedG n 0 -measurable random variables such that i p h n < S p < (i p + 1)h n . The stable limit theorem in Theorem 1 is valid when replacing the fixed time s by stopping times S p , p = 1, . . . , N 1 . Analogously as in Lemma 8.1 of [29], this readily follows with the points above by the asymptotic independence of the statistics in Step 1 of the proof of Theorem 1 with s = S p forσ 2 Sp , or s = i p h n forσ 2 iphn respectively, from F Sp . Here, we exploit that the noise is under Assumption 2 only weakly serially dependent over asymptotically decreasing intervals and only dependent on finitely many preceding increments of X, and the strong Markov property of Brownian motion.
Moreover, on $\Omega_n$ all spot squared volatility estimates are computed from disjoint data subsets. Therefore, by (67), covariations between all estimates converge to zero in probability, which implies joint weak convergence. Stability of the convergence of the vector has been established above in Step 1 of the proof of Theorem 1.