Local SIML estimation of some Brownian and jump functionals under market micro-structure noise

This paper is a contribution to a special issue on Data Science: Present and Future, because the main topic has been and will be in an active area of contemporary data science. High-frequency financial data are commonly available by now. To estimate Brownian and jump functionals from high-frequency financial data under market micro-structure noise, we introduce a new local estimation method of the integrated volatility and higher order variation of Ito’s semi-martingale processes. Although extending the realized volatility (RV) estimation to the general diffusion-jump processes without micro-market noise is straightforward, estimating Brownian and jump functionals in the presence of micro-market noise may not be easy. In this study, we develop the local SIML (LSIML) method, which is an extension of the separating information maximum likelihood (SIML) method proposed by Kunitomo et al. (Separating information maximum likelihood method for high-frequency financial data, 2018) and Kunitomo and Kurisu (Jpn J Stat Data Sci (JJSD) 4(1):601–641, 2021). The new LSIML method is simple, and the LSIML estimator has some desirable asymptotic properties and reasonable finite sample properties.


Introduction
This paper is a contribution to a special issue on Data Science: Present and Future, because the main topic has been and will be in an active area of contemporary data science. High-frequency financial data are commonly available by now and there is a tremendous impact on data science. We are proposing a new approach, which is simple as a statistical method, to solve the difficult problem involved in this study.
In statistics and financial econometrics, several statistical methods for estimating integrated volatility and co-volatility from high-frequency data have been proposed. Integrated volatility is a type of Brownian functional, and the realized volatility (RV) estimate has often been used when micro-market noise does not exist and the underlying diffusion process is directly observed. The asymptotic distribution of the RV estimator depends on the fourth-order integrated Brownian functional as the asymptotic variance in the stable-convergence sense. Thus, we need to estimate the fourth-order integrated moments to make statistical inference on integrated volatility when the number of observations increases in a fixed interval. However, the RV estimator is known to be quite sensitive to the presence of micro-market noise in high-frequency financial data. Several statistical methods have been proposed to estimate the integrated volatility and co-volatility (for the details of some developments in financial econometrics, see Ait-Sahalia & Jacod, 2014; Barndorff-Nielsen et al., 2008; Jacod et al., 2009; Zhang & Per, 2005). In particular, Malliavin and Mancino (2009) have developed the Fourier series method, while Kunitomo et al. (2018), referred to as KSK (2018), independently developed the separating information maximum likelihood (SIML) estimation. We use the latter formulation in this study, which is closely related to the former method; see Mancino et al. (2017).
When market micro-structure noise cannot be ignored in high-frequency financial data, KSK (2018) have developed the SIML method for estimating the volatility and co-volatilities of security prices when the underlying processes belong to the class of diffusion processes. In this study, we extend the SIML method and develop the local SIML (LSIML) estimation method for estimating higher order Brownian and jump functionals, such as the fourth-order integrated moments and the jump part of quadratic variation. The LSIML method was originally suggested in Chapter 8 of KSK (2018), but they did not provide a detailed exposition. (To avoid possible duplication of explanations of the SIML method, we sometimes refer to the corresponding parts of KSK (2018) and Kunitomo and Kurisu (2021), referred to as KK (2021).) Our main motivation for developing the LSIML method is to improve the SIML method and to estimate some Brownian and jump functionals, which are more general than the volatility and co-volatility. For instance, the fourth-order integrated moments appear as the asymptotic variance of the limiting distribution of several estimation methods, including the SIML estimation. As the main purpose of this study is to propose the use of the LSIML method, we do not attempt to treat the most general case, but concentrate on simple cases to make the results easy to understand.
In this paper, we show that the LSIML method has some desirable asymptotic properties, such as consistency and asymptotic normality. More importantly, there could be some applications to the jump-diffusion case. The LSIML method has reasonable finite sample properties, as demonstrated through several simulations. As the LSIML method is a straightforward extension of the SIML estimation, it is quite simple and useful for practical applications. Although other methods could be used for estimating higher order Brownian and jump functionals, the LSIML method has some merits, such as its simplicity and desirable asymptotic properties.
The remainder of this paper is organized as follows. In Sect. 2, we discuss the framework of the estimation problem of some Brownian and jump functionals when market micro-structure noise exists in high-frequency financial data. In Sect. 3, we generalize the estimation problem of RV and explain the method of local estimation in our study. In Sect. 4, we propose the LSIML method under market micro-structure noise, which is a generalization of the SIML method originally developed by KSK (2018). In Sect. 5, we investigate the asymptotic properties of the local SIML method, such as consistency and asymptotic normality. In Sect. 6, we discuss the problem of selecting key parameters required in the LSIML estimation method. In Sect. 7, we discuss possible generalizations of our results to more general settings, including the jump-diffusion and multivariate models. In Sect. 8, we provide some finite sample properties of the LSIML estimation based on a set of Monte Carlo simulations and provide the empirical result of a high-frequency data analysis as an illustration. In Sect. 9, we provide some concluding remarks, and mathematical details are given in the Appendix.

Estimation of Brownian and jump functionals
To identify the essential feature of the local estimation method in this study, we first consider the basic and simple time-varying case when p = 1 (where p is the dimension). Let $Y(t_i^{(n)})$ be the (one-dimensional) observed (log-)price at $t_i^{(n)}$ ($i = 1, \ldots, n$). We consider the case when

$$Y(t_i^{(n)}) = X(t_i^{(n)}) + \epsilon_n\, v(t_i^{(n)}), \qquad (1)$$

where $v(t_i^{(n)})$ is the market micro-structure noise and the noise coefficient $\epsilon_n$ is determined in (2) by a constant δ (≥ 0). When δ = 0, (1) is the market micro-structure noise model, while it is the high-frequency financial model without micro-market noise when δ = +∞. When 0 < δ < +∞, (1) corresponds to the small-noise high-frequency model. The underlying continuous-time Brownian martingale is given by

$$X(t) = X(0) + \int_0^t \sigma_s\, dB_s \quad (0 \le t \le 1), \qquad (3)$$

and we assume that it is independent of $v(t_i^{(n)})$. Here $\sigma_s$ (= $\sigma_s(x)$) is the (instantaneous) volatility function, which is bounded and Lipschitz-continuous, and $B_s$ is the standard Brownian motion.
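As a rough illustration of the observation model described above, the following sketch simulates noisy prices on the equidistant grid $t_i = i/n$. The function name `simulate_noisy_prices`, the Gaussian noise, the Euler discretization, and the treatment of the noise scale as a free argument `eps_n` are all assumptions made for this example, not part of the original formulation.

```python
import numpy as np

def simulate_noisy_prices(n, sigma2_fn, noise_var, eps_n=1.0, seed=0):
    """Simulate Y(t_i) = X(t_i) + eps_n * v(t_i) on the grid t_i = i/n.

    sigma2_fn : callable returning the instantaneous variance sigma_s^2
                (deterministic volatility; hypothetical helper argument).
    noise_var : variance of the noise v (i.i.d. Gaussian noise is assumed).
    eps_n     : noise scale; eps_n = 0 corresponds to the no-noise case.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1) / n
    # Euler step: X(t_i) - X(t_{i-1}) ~ N(0, sigma^2(t_{i-1}) / n)
    sig2 = np.broadcast_to(sigma2_fn(t - 1.0 / n), t.shape)
    dx = rng.normal(0.0, np.sqrt(sig2 / n))
    x = np.cumsum(dx)                                  # X(0) = 0
    v = rng.normal(0.0, np.sqrt(noise_var), size=n)    # noise term
    y = x + eps_n * v                                  # observed (log-)price
    return t, x, y
```

Setting `eps_n=0.0` returns observations that coincide with the latent process, which matches the no-noise case discussed in the text.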
Although the LSIML method can be applied to more general Itô semi-martingales under weaker conditions, such as the Hölder condition, we first consider the simple situation, because it shows the essential feature of the LSIML method in a simple manner. (See Sect. 7 for its possible extensions.) We assume that, when the volatility process is stochastic, it has the representation (4) as an Itô Brownian semi-martingale, where $B_s^{\sigma}$ is another Brownian motion, which may be correlated with $B_s$, and $\mu_s^{\sigma}$ (= $\mu_s^{\sigma}(\sigma^2)$) and $\omega_s^{\sigma}$ (= $\omega_s^{\sigma}(\sigma^2)$) are the drift and diffusion coefficients, which are assumed to be deterministic, bounded and Lipschitz-continuous. These conditions can be relaxed to some extent, but the generalization of the underlying process is not within the scope of this paper, except for some cases in Sect. 7.
The first problem of our interest is how to estimate Brownian functionals of the form

$$V(g, 2r) = \int_0^1 g(s)\, \sigma_s^{2r}\, ds$$

for any positive integer r and a known function g(s) from a set of observations $Y(t_i^{n})$ ($i = 1, \ldots, n$). We denote V(2r) = V(g, 2r) when g(s) = 1 (0 ≤ s ≤ 1) for convenience.
This type of Brownian functionals has important examples. A clear example is the integrated volatility that corresponds to the case when r = 1.
Example 1 When r = 1, we have the integrated volatility, which is given by $V(2) = \int_0^1 \sigma_s^2\, ds$.

Example 2 The asymptotic variance of the SIML estimator of integrated volatility V(2) is given by $2V(4) = 2\int_0^1 \sigma_s^4\, ds$.

Note that the estimation of V(4) with r = 2 under market micro-structure noise is a non-trivial task. Zhang and Per (2005), Barndorff-Nielsen et al. (2008), Jacod et al. (2009), and Ait-Sahalia and Jacod (2014) discussed different estimation methods of the integrated quarticity ($\int_0^1 \sigma_u^4\, du$), a higher order Brownian functional, with different g(s) functions. However, these methods seem to be more complicated than the method developed in this study.
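One important class of continuous-time processes is Ito's jump-diffusion process. A simple process may be expressed as

$$X(t) = X(0) + \int_0^t \sigma_s\, dB_s + \sum_{0 \le s \le t} \Delta X_s \quad (0 \le t \le 1), \qquad (8)$$

where the jump term $\sum_{0 \le s \le t} \Delta X_s$ collects the jumps $\Delta X_s = X_s - X_{s-} \neq 0$ and is independent of $B_s$ (i.e., the Brownian motion). The term $\sum_{0 \le s \le t} \Delta X_s$ is formally defined in terms of measurable functions $f_i$ (i = 1, 2), the Poisson random measure $N_p(dt\,dx)$, and the compensator $\hat{N}_p(dt\,dx)$.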
(See Chapter II of Ikeda and Watanabe (1989).) In this study, we use the simple cases when the number of jumps is finite in [0, 1], and the sizes of jumps $f_i$ (i = 1, 2) are bounded with $E[\Delta X_s] = 0$. It is certainly possible to extend our analysis to more general jump processes with additional conditions.
As market micro-structure noise exists, which could be regarded as jump component at each observation, distinguishing the jump term in the underlying Ito's semi-martingales may be difficult from the market micro-structure noise, or measurement error in the statistical terminology. It is because, in the general theory of stochastic processes, there are small jumps and also large jumps. (See Ikeda and Watanabe (1989), Jacod and Protter (2012), and Kunitomo and Kurisu (2017) for details.) Our interpretation of jumps in the present study would be to detect large jumps of Ito's semi-martingales from noisy high-frequency observations.
In this situation, the fundamental quantity of the stochastic process is the quadratic variation (QV), which is an extension of the integrated volatility, given by

$$\int_0^1 \sigma_s^2\, ds + \sum_{0 \le s \le 1} (\Delta X_s)^2 .$$

Example 3 When we have jumps under market micro-structure noise, we may be interested in the continuous part of QV, $V_C(2) = \int_0^1 \sigma_s^2\, ds$, and the jump part of QV, $V_J(2) = \sum_{0 \le s \le 1} (\Delta X_s)^2$.

When market micro-structure noise exists, the random jump process may be difficult to distinguish from the noise. However, for many applications, the roles of stochastic (large) jumps and market micro-structure noise (or measurement error) differ, and they can be distinguished in high-frequency financial data.

Local estimation for the no-market micro-structure-noise case
For simplicity, we use $t_j^{(n)} - t_{j-1}^{(n)} = 1/n$ ($j = 1, \ldots, n$) and $t_0^{n} = 0$. We divide (0, 1] into b(n) sub-intervals, and in every interval, we allocate c(n) observations. We consider the sequence $c^*(n)$, such that $c^*(n) \to \infty$, and we can take $b(n) \to \infty$ and $c(n) \sim n/b(n)$ as $n \to \infty$. A typical choice of observations in each interval would be $c(n) = [n^{\gamma}]$ and $c^*(n) = n^{\gamma}$ ($0 < \gamma < 1$), whereupon $b(n) \sim [n^{1-\gamma}]$. Because some extra observations exist (n may not be equal to b(n)c(n), and b(n) is a positive integer), we must adjust the number of terms in each interval as $c(n) = c^*(n)$ + (several terms). Although finite sample effects can occur, we ignore the effects of the extra terms in the following development, because they are asymptotically negligible, and hence, we take b(n)c(n) = n.
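The blocking scheme above can be sketched as follows. The helper name `block_sizes` and the small tolerance added before taking the integer part are choices made for this illustration; the tolerance only guards against floating-point error in $n^{\gamma}$.

```python
import math

def block_sizes(n, gamma):
    """Return (b, c, leftover) with c = [n^gamma] and b complete blocks.

    leftover = n - b * c counts the extra observations that the text
    treats as asymptotically negligible.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    # Small tolerance so that, e.g., 10000 ** 0.75 is not floored to 999.
    c = int(math.floor(n ** gamma + 1e-9))
    b = n // c
    leftover = n - b * c
    return b, c, leftover
```

For example, n = 10,000 with γ = 0.75 gives b(n) = 10 and c(n) = 1,000, the configuration quoted in the caption of Table 3.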
When market micro-structure noise does not exist, we simply use the log-return process $r_j = y(t_j^{(n)}) - y(t_{j-1}^{(n)})$ ($j = 1, \ldots, n$). We order the data $r_j$ in each sub-interval and denote them by $r_{k,(i)}$ ($k = 1, \ldots, c(n)$; $i = 1, \ldots, b(n)$).
There can be two types of local estimation methods. (We explain the first type in this section and the second type in the next section.) When p = 1, let the 2rth moment of $r_{k,(i)}$ (r ≥ 1) in the ith interval be defined as in (12). Then, we define the first type of the local realized moment (LRM) estimator $\hat{V}^*(2r)$ of V(2r) as in (13), where $a_r = (2r)!/(r!\,2^r)$.
When r = 1, it is the RV, because $a_1 = 1$. In this construction of the first type LRM estimation, we normalize the sample moment by the scale factor $n^{r-1}/a_r$ and use the local Gaussianity of the underlying continuous martingales. (Note that $a_r$ is the 2rth (r ≥ 1) moment of the standard Gaussian distribution.) For a constant volatility, the variance of $r_{k,(i)}$ is proportional to $\sigma^2 (1/n)$, and we need to normalize the higher order moment $r_{k,(i)}^{2r}$ (r ≥ 1) by $a_r$ because of the local Gaussianity as the interval decreases.
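The normalization described above can be illustrated with a short sketch. It assumes that the block moments are simply summed over all sub-intervals, so that the estimator reduces to $(n^{r-1}/a_r)\sum_j r_j^{2r}$; the function names are hypothetical and the input `y` is assumed to hold $y_0$ followed by the n observations.

```python
import math
import numpy as np

def gaussian_moment(r):
    """a_r = (2r)! / (r! 2^r), the 2r-th moment of N(0, 1)."""
    return math.factorial(2 * r) // (math.factorial(r) * 2 ** r)

def local_realized_moment(y, r=1):
    """First-type realized moment sketch: (n^{r-1} / a_r) * sum_j r_j^{2r}.

    y holds the initial value y_0 followed by n noise-free observations,
    so np.diff(y) gives the n log-returns r_j.
    """
    ret = np.diff(np.asarray(y, dtype=float))
    n = ret.size
    return n ** (r - 1) / gaussian_moment(r) * np.sum(ret ** (2 * r))
```

With r = 1 the function returns the usual RV. With r = 2 and a constant volatility it targets $\sigma^4$, since $a_2 = 3$ removes the Gaussian kurtosis factor.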
For the first type LRM estimator when no jump term exists, we have the next result on the asymptotic properties, which can be obtained straightforwardly by extending the standard arguments developed in the existing literature to the present case. (See, for example, Section 4.1.2 and Eq. (4.6) of Ait-Sahalia and Jacod (2014) on the standard argument in financial econometrics.)

Proposition 1 Assume that market micro-structure noise does not exist; that is, $\epsilon_n = 0$ with p = 1 and r ≥ 1 in (1), (3) and (4). Assume also that $Y(t_i^{(n)}) = X(t_i^{(n)})$ and $\sigma_s$ (0 ≤ s ≤ 1) is bounded and Lipschitz-continuous. Then, as $n \to \infty$, the normalized estimation error of $\hat{V}^*(2r)$ converges to a normal limit with asymptotic variance W, where $\mathcal{L}\text{-}s$ denotes the stable convergence (SC), and where $c_r^*$ (= $a_{2r}/a_r^2 - 1$) is a positive constant appearing in W.

We notice that we have used the SC in Proposition 1, because W is a random variable when the volatility function is stochastic in general. We use the SC in the following analysis, and we provide a brief discussion on the central limit theorem (CLT) and SC at the end of the Appendix.
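For orientation, the limit in Proposition 1 can be written in the standard form known for realized power variations. The display below is a sketch under that assumption (normalization by $\sqrt{n}$ and the stated form of W); it is consistent with the constant $c_r^* = a_{2r}/a_r^2 - 1$ and with the RV case, but it is not a verbatim statement of the proposition.

```latex
\sqrt{n}\,\bigl[\hat{V}^{*}(2r) - V(2r)\bigr]
  \;\xrightarrow{\;\mathcal{L}\text{-}s\;}\; N(0, W),
\qquad
W = c_r^{*}\int_0^1 \sigma_s^{4r}\,ds,
\qquad
c_r^{*} = \frac{a_{2r}}{a_r^{2}} - 1 .
```

As a check, r = 1 gives $a_1 = 1$, $a_2 = 3$ and $c_1^* = 2$, so that $W = 2\int_0^1 \sigma_s^4\,ds = 2V(4)$, the familiar asymptotic variance of RV. For r = 2, $a_4 = 105$ gives $c_2^* = 105/9 - 1 = 32/3$.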

Local SIML estimation
In this section, we introduce two types of local estimation methods. We consider the estimation problem of some Brownian and jump functionals when market micro-structure noise exists as in (1), (2) with δ ≥ 0, and (3) or (8). We utilize the same localization of the estimation method as in Sect. 3, and then divide (0, 1] into b(n) sub-intervals, and in every interval, we allocate c(n) observations. We consider the sequences $c^*(n)$ and c(n), such that $c(n), c^*(n) \to \infty$ ($c(n) = c^*(n)$ + (some extra terms)), and we take $b(n), c(n) \to \infty$ and $b(n) \sim n/c^*(n)$ as $n \to \infty$. We choose the number of observations in each interval as $c^*(n) = n^{\gamma}$ ($0 < \gamma < 1$), whereupon $b(n) \sim n^{1-\gamma}$, but we assume n = b(n)c(n) for simplicity.
Then, we apply the SIML method developed by KSK (2018) to each sub-interval. To use the SIML transformation in each local interval, we set $m_c = [c(n)^{\alpha}]$ ($0 < \alpha < 0.5$) in the ith interval ($i = 1, \ldots, b(n)$), and the transformed data are denoted by $z_{k,(i)}$, the kth data point in the ith interval $I_c(i)$ ($k = 1, \ldots, c(n)$; $i = 1, \ldots, b(n)$). Here, we explain the procedure for the general case when p ≥ 1 by following the notation in Chapter 3 of KSK (2018) for the p-dimensional stochastic process. The transformation uses the scale $h_{c(n)} = 1/c(n)$ and $c(n) \times c(n)$ matrices. The initial conditions are given by the p × 1 vector $y_{0,(i)}$ (which is the initial vector in the ith interval) and $\mathbf{1}_{c(n)} = (1, 1, \ldots, 1)'$. The construction rests on the spectral decomposition of these matrices.

When p = 1 and for any positive integer r, let the second moment in the ith sub-interval be

$$M_{2,(i)} = \frac{1}{m_c} \sum_{k=1}^{m_c} [z_{k,(i)}]^2 . \qquad (23)$$

Then, we define the second type of the LSIML estimator $\hat{V}(2r)$ of V(2r) by (24). If we take c(n) = n, b(n) = 1 and r = 1, then we have the SIML estimator for integrated volatility as a special case. In this construction of the LSIML estimator, we have c(n) observations in each interval. Thus, we must normalize (24), because the scale factor is $c(n)/n = b(n)^{-1}$. We note that the second type $\hat{V}(2r)$ in (23) and (24) is slightly different from the first type $\hat{V}^*(2r)$ in (12) and (13) when r ≥ 2. In this paper, we mainly investigate the asymptotic properties of $\hat{V}(2r)$ in (23) and (24), because the results and derivations in the Appendix are considerably simpler than those for $\hat{V}^*(2r)$ in (12) and (13). (See Kunitomo and Sato (2018, unpublished) for some details of the first type.) However, we shall report some simulation results in Sect. 8, and it may be interesting to compare the two types of estimation methods.
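A minimal sketch of the second-type LSIML computation for p = 1 follows. Two things are assumed rather than taken from the text: the transformation matrix is the cosine matrix with entries $\sqrt{2/(c+1/2)}\cos[\tfrac{2\pi}{2c+1}(j-\tfrac12)(k-\tfrac12)]$ applied to block-wise first differences, which is the usual form of the SIML transformation, and the final normalization is $b(n)^{r-1}\sum_i [M_{2,(i)}]^r$, which follows from the scale factor $b(n)^{-1}$. The function names are hypothetical.

```python
import numpy as np

def siml_matrix(c):
    """c x c orthogonal cosine matrix assumed for the SIML transformation."""
    j = np.arange(1, c + 1) - 0.5
    return np.sqrt(2.0 / (c + 0.5)) * np.cos(
        2.0 * np.pi / (2 * c + 1) * np.outer(j, j))

def lsiml(y, b, c, alpha=0.4, r=1):
    """Second-type local SIML sketch: b^{r-1} * sum_i (M_{2,(i)})^r.

    y holds y_0 followed by n = b * c (possibly noisy) observations.
    """
    y = np.asarray(y, dtype=float)
    if y.size != b * c + 1:
        raise ValueError("y must contain y_0 and b * c observations")
    m_c = int(c ** alpha)                  # m_c = [c^alpha]
    P = siml_matrix(c)[:m_c]               # only the first m_c rows are used
    ret = np.diff(y).reshape(b, c)         # first differences, block by block
    z = np.sqrt(c) * ret @ P.T             # z_{k,(i)} with h_c^{-1/2} = sqrt(c)
    m2 = np.mean(z ** 2, axis=1)           # local second moments M_{2,(i)}
    return b ** (r - 1) * np.sum(m2 ** r)
```

Taking `b=1`, `c=n` and `r=1` reduces the sketch to a global SIML-type estimate, mirroring the special case mentioned in the text. Because only the first $m_c$ low-frequency components enter, the contribution of the noise to each $M_{2,(i)}$ is small compared with its contribution to RV.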

Asymptotic properties of the local SIML
We consider the case when σ s is a time-varying continuous and bounded function when p = 1. First, we consider the asymptotic properties of the LSIML estimation for the case of r = 1. Second, we discuss the case when r ≥ 2. The SIML estimation method was originally developed for the case of constant volatility. However, it has some desirable asymptotic properties when the instantaneous volatility is time-dependent and also stochastic in the form of (4). The LSIML estimation shares these asymptotic properties of the SIML method. Given that we need some arguments based on the SC and the martingale CLT (MCLT) in the stochastic case, we explain the asymptotic properties in this section as if the time-varying volatility was a deterministic function. (This paper mainly reports simulation results in this case to save the space.) A discussion on the SC is given at the end of the Appendix.
(i) When r = 1 First, we consider the asymptotic behavior of the quantity and m c → ∞ as n → ∞. We summarize the result for the case of r = 1, which corresponds to Proposition 1 without any market micro-structure noise. This presentation may be useful to understand the results in more general cases with the presence of market micro-structure noise. The derivation is given in the Appendix.
(ii) When r ≥ 2 We investigate the asymptotic properties of the LSIML estimator when p = 1 and r ≥ 2. A generalization of Theorem 2 when r ≥ 2 and p = 1 is as follows, which is the summary of the asymptotic properties of the LSIML estimation. The derivation is given in the Appendix.
The convergence holds in the SC sense. When r ≥ 2, we need an asymptotic bias term in (29) to express the limiting distribution of the estimator. As shown in the Appendix, we have some complications in the evaluation of stochastic orders in this case. When r = 1, however, no bias term exists, and we obtain the result in Theorem 2.
It may be interesting to find that the form of the asymptotic variance for the LSIML estimation is the same as that for RV as in Proposition 1 when market micro-structure noise does not exist except that n (= b(n)c(n)) is replaced by b(n)m c .

Optimal choice of α and γ
Because the properties of the LSIML estimation method depend crucially on the choice of c(n) and m c , which are dependent on n, we should investigate the asymptotic and the small-sample effects of their choice.
As explained in the derivation of Theorem 2 in the Appendix [(A.81), (A.89), and (A.90)], the asymptotic bias of the LSIML estimator is proportional to and the asymptotic variance is proportional to Hence, when n is large, we may approximate the mean squared error of the LSIML estimator as where c 1g and c 2g are some constants. By setting c(n) = n γ and b(n) = n 1−γ ; (0 < γ < 1), we can re-write Then, by ignoring the difference of c(n) = [n γ ] and c(n) * = n γ as similar terms, and differentiating MSE with respect to α, we obtain the condition, such that . By rearranging the related terms, we obtain the next result.
When r ≥ 2, the result of Theorem 4 holds if the volatility function is constant in [0, 1]. However, in the general case, we need slightly different conditions, and further complication may occur. This is because we have an additional bias term due to V (2r ) − V * (2r ) of (29) in Part (ii) of Theorem 3 in the general case.

Possible extensions
Our results in the previous sections have possible generalization. We discuss two cases of the jump-diffusion process and the multivariate diffusion models in this section.

Continuous part and jump part of quadratic variation
We consider the estimation problem in Example 3 in Sect. 2. When market micro-structure noise does not exist in the continuous-time Ito process $X(t) = X(0) + \int_0^t \sigma_s\, dB_s + \sum_{0 \le s \le t} \Delta X_s$ ($0 \le s \le t \le 1$), the method of estimating the continuous and jump parts of quadratic variation is known. For instance, in Chapters 9 and 13 of Jacod and Protter (2012), the truncation functionals were developed, and many theoretical results in high-frequency asymptotics were reported. However, when some market micro-structure noise exists, no unified estimation method seems to be available. The LSIML method provides a useful solution for this purpose. In this subsection, we investigate the simple case of the diffusion-jump model, and then assume that p = 1, the jump size is bounded, and a finite number of jumps can occur in [0, 1]. The discussion is based on Sect. 2 of KK (2021), and the general discussion of diffusion-jump processes has been given in Jacod and Protter (2012).
We consider the truncated functionals of the LSIML estimation. From Sect. 4, when p = 1 and r = 1, let the second moment in the ith sub-interval be $M_{2,(i)} = \frac{1}{m_c} \sum_{k=1}^{m_c} [z_{k,(i)}]^2$. We define the truncated LSIML functionals $\hat{V}_J(2)$ and $\hat{V}_C(2)$ by truncating the local second moments $M_{2,(i)}$ with the indicator function $I(\cdot)$.
Here, we take the truncation parameter u n (a sequence of positive constants), such that (See Lemma A-3 in the Appendix.) Then, we can estimate the continuous part and jump part of the quadratic variation in a simple manner. We summarize the asymptotic properties of the truncated LSIML estimator as the next result. The proof is given in the Appendix.
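The truncation idea can be sketched as follows, assuming that the jump part sums the local second moments $M_{2,(i)}$ exceeding the threshold $u_n$ and that the continuous part sums the remaining ones. This split, the function name `truncated_lsiml`, and the treatment of the threshold `u` as a user-supplied number are assumptions for illustration; the precise definitions and the conditions on $u_n$ are those referred to in the text.

```python
import numpy as np

def truncated_lsiml(m2, u):
    """Split sum_i M_{2,(i)} into a continuous part and a jump part.

    m2 : array of local second moments M_{2,(i)}, one per sub-interval.
    u  : truncation level u_n (assumed to be given).
    Returns (v_cont, v_jump).
    """
    m2 = np.asarray(m2, dtype=float)
    jump_blocks = m2 > u                      # I(M_{2,(i)} > u_n)
    v_jump = float(np.sum(m2[jump_blocks]))   # blocks flagged as containing jumps
    v_cont = float(np.sum(m2[~jump_blocks]))  # remaining blocks
    return v_cont, v_jump
```

By construction the two parts add up to the untruncated sum $\sum_i M_{2,(i)}$, so the split can be read as a decomposition of the estimated quadratic variation.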
KK (2021) have derived the CLT for the SIML estimation when the underlying process is the class of Ito's jump-diffusion process in the multivariate case. When p = 1, in their Corollary 2.1, the asymptotic variance of the limiting distribution is given by As W = W J +W C , it can be regarded as a decomposition of the variance, and Theorem 5 is an extension of Theorem 2.1 of KK (2021).

Multivariate processes
For multivariate processes, there are possible generalizations when p ≥ 1. Let $X_t$ be the p-dimensional underlying process. As the underlying continuous-time process, we consider the class of multi-dimensional diffusion processes. As in the theory of continuous-time stochastic processes behind (3), a general form of the stochastic differential equation (SDE) for the p-dimensional continuous-time stochastic process is given by

$$dX_t = \mu(t)\, dt + \sigma(t)\, dB_t , \qquad (50)$$

which has been called the diffusion-type continuous process, where $\mu(s)$ is the p × 1 drift vector, $\sigma(s)$ is the $p \times q_1$ diffusion matrix, and $B_t$ is the $q_1 \times 1$ vector of Brownian motions.
(50) also has the representation

$$X_t = X_0 + \int_0^t \mu(s)\, ds + \int_0^t \sigma(s)\, dB_s ,$$

where the first term is an integral in the sense of Riemann, while the second term is an Itô stochastic integral with respect to the Brownian motion $B_t$ ($q_1 \times 1$ vector). Thus, we need some regularity conditions on $\mu(\cdot)$ and $\sigma(\cdot)$. A detailed theory of the SDE and stochastic integration has been explained by Ikeda and Watanabe (1989). When the volatility process $\sigma(t) = (\sigma_{ij}(t))$ is stochastic, we take a diffusion-type process as

$$\sigma_{ij}(t) = \sigma_{ij}(0) + \int_0^t \mu_{ij}^{\sigma}(s)\, ds + \int_0^t \omega_{ij}^{\sigma}(s)\, dB_s^{\sigma} ,$$

where $\mu_{ij}^{\sigma}(s)$ is the drift coefficient, $\omega_{ij}^{\sigma}(s)$ is the $1 \times q_2$ vector of diffusion coefficients, and $B_s^{\sigma}$ is another $q_2 \times 1$ Brownian motion vector, which may be correlated with $B_s$.
As an example of the estimation problem, we may consider the p × p variance-covariance (or integrated volatility) matrix $\Sigma_x = \int_0^1 \sigma_s \sigma_s'\, ds$, which is the same as $V(2) = (V_{gh}(2))$ in our notation. In this case, the terms $(1/m_c) \sum_{k=1}^{m_c} [z_{k,(i)}]^2$ and the asymptotic variance $2\int_0^1 [\sigma_x(s)]^4\, ds$ in Sect. 5 are replaced by their multivariate counterparts, where we set p = 2. The most important fact is that both the SIML and LSIML methods are simple, and that using them and interpreting the results remain straightforward even when the dimension p of the underlying processes is large. This aspect is quite different from other estimation methods previously proposed. Recently, KK (2021) have considered a statistical procedure to detect factors of the hidden covariation with the rank $r_x$ (which is the number of hidden factors) when it is substantially less than the observed dimension p. We expect that, under a set of regularity conditions, we have similar results on the asymptotic properties of the local SIML estimator in more general settings.

Simulations
We conducted simulations when r = 1 and r = 2 on the estimation of the true parameters V(2), V(4), $V_C(2)$, and $V_J(2)$. We note that the estimated variance of the SIML estimator of integrated volatility corresponds to 2V(4) in the univariate case. In our simulations, we set $b(n) = [n^{1-\gamma}]$ and $c(n) = [n^{\gamma}]$ such that b(n)c(n) = n, and the number of replications is 3000. We set $\sigma_v^2$ as the variance of the noise. Furthermore, we have investigated several cases in which the instantaneous volatility function $\sigma_s^2$ is specified through constants $a_i$ (i = 0, 1, 2), with some restrictions such that $\sigma_s > 0$ for s ∈ [0, 1]. This is a typical time-varying (but deterministic) case, and the integrated volatility V(2) can be computed explicitly. In this case, we used several intra-day volatility patterns, including the flat (or constant) volatility, monotone (decreasing or increasing) movements, and U-shaped movements.
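To make the volatility patterns concrete, the sketch below assumes the quadratic form $\sigma_s^2 = a_0 + a_1 s + a_2 s^2$ with unit scale. This functional form is an assumption suggested by the three constants $a_0, a_1, a_2$ and the U-shaped pattern; the helper names are hypothetical.

```python
def sigma2(s, a0, a1, a2):
    """Assumed instantaneous variance sigma_s^2 = a0 + a1*s + a2*s^2."""
    return a0 + a1 * s + a2 * s ** 2

def integrated_volatility(a0, a1, a2):
    """V(2) = int_0^1 (a0 + a1 s + a2 s^2) ds."""
    return a0 + a1 / 2.0 + a2 / 3.0

def integrated_quarticity(a0, a1, a2):
    """V(4) = int_0^1 (a0 + a1 s + a2 s^2)^2 ds, integrated term by term."""
    return (a0 ** 2 + a0 * a1 + (a1 ** 2 + 2.0 * a0 * a2) / 3.0
            + a1 * a2 / 2.0 + a2 ** 2 / 5.0)
```

Under this assumed form, the U-shaped configuration $a_0 = 6.0$, $a_1 = -24.0$, $a_2 = 24.0$ used in the tables gives V(2) = 2 and V(4) = 7.2, and the flat case $a_1 = a_2 = 0$ with $a_0 = 2$ gives V(2) = 2 and V(4) = 4.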
If we take c(n) = n, b(n) = 1 and r = 1, then we have the SIML estimator for integrated volatility as a special case. In the LSIML estimation, we have c(n) observations in each interval and there are b(n) intervals. Then, we need to normalize (57) and (59), because the scale factor is $c(n)/n = b(n)^{-1}$ and we impose the local Gaussianity for the underlying continuous martingales.
Tables 1 and 2 correspond to the case of flat volatility, while the other tables correspond to the case of time-varying, but non-stochastic volatility.
From Tables 1, 2, 3, 4 and 5, we confirm that the LSIML method performs well for the estimation of integrated volatility. Although some loss of estimation accuracy may occur when the underlying true stochastic process is known, the LSIML method provides desirable finite and asymptotic properties. One important result in our simulation is the estimation of 2V(4), which is the asymptotic variance of the SIML estimator of integrated volatility. The tables report the mean and variance (Var) of the estimates. (AV stands for the asymptotic variance.) When the volatility function is time-varying, there is some bias in $\hat{V}(4)$, which may be expected from Part (ii) of Theorem 3. In this case, the bias of $\hat{V}^*(4)$ is smaller than that of $\hat{V}(4)$. This suggests that the asymptotic distribution of $\hat{V}^*(4)$ is simple, as has been suggested. To investigate the asymptotic distribution of the LSIML estimator in the form of V(4), we give some typical empirical distributions of a set of simulated data in Fig. 1 (r = 1, b(n) = 14, c(n) = 3371, α = 0.4, $a_0 = 6.0$, $a_1 = -24.0$, $a_2 = 24.0$) and Fig. 2 (r = 2, b(n) = 76, c(n) = 677, α = 0.4, $a_0 = 6.0$, $a_1 = -24.0$, $a_2 = 24.0$).

Table 3 Estimation of integrated volatility and fourth-order functional ($a_0 = 6.0$, $a_1 = -24.0$, $a_2 = 24.0$; $\sigma_v^2 = 0.0005$, b(n) = 10, c(n) = 1,000; α = 0.33, γ = 0.75; n = 10,000)

Fig. 2 Normalized histogram and normalized distribution (V (4))
We confirm that we have the asymptotic normality of the SIML estimator and the limiting normal distribution provides a reasonable approximation of the finite sample distribution. Also we found that, when r = 2, we have a small bias with the limiting normal distribution.
In Fig. 3, we show typical empirical histograms of $\hat{V}(4)$ and $\hat{V}^*(4)$ by setting n = 50,440, c(n) = 1261, b(n) = 40, α = 0.45, $\sigma_x^2 = 1.0$, $\sigma_v^2 = 0.0005$, and $a_0 = 6.0$, $a_1 = -24.0$, $a_2 = 24.0$ as an illustration. The former (the underlaid blue histogram for the second type) has some bias, but less variance than the latter (the overlaid orange histogram for the first type). These observations are consistent with the results reported in Theorem 3. (We have the bias term $V^{**}(4)$ when r = 2 in the second type estimation method.) More details of the first and second type estimation methods are currently under investigation.
(ii) As a second example, we present Tables 6, 7, 8, 9 and 10 for the jump-diffusion case under market micro-structure noise. We set the true parameter values of V C (2) = 2.0 and λ = 3/n for the diffusion-Poisson-jump model with the intensity 3/n. The jump size is 0.7 and 0.0, where 0.0 means no-jump. (We report only the results for the flat-volatility case, that is a 1 = a 2 = 0 to save the space.) In the tables, mean and Var are the mean and variance of the estimated values of V C (2) and V J (2) based on 3000 replications, respectively.
When there is no jump (jump size = 0.0), the means of the estimated $V_J(2)$ are close to zero, while the estimated $V_C(2)$'s are close to 2 when $V_C(2) = 2.0$ and $V_J(2) = 0.0$. When the true jump part of QV is not zero, there can be some bias in the estimation of the continuous part of QV. In Tables 6, 7, 8, 9 and 10, we basically confirm that the LSIML estimation method of the continuous and jump parts of QV performs well. In our experiment, after some trials, we have set the threshold value using SD(·), the standard deviation, and Q995, the 0.995 quantile. We also show the empirical distribution of the continuous and jump parts of the LSIML estimator under market micro-structure noise in Fig. 4 (c(n) = 1000, b(n) = 100, α = 0.4, jump size = 0.8, λ = 6/n). Since the jump size and intensity are different from those in Tables 6, 7, 8, 9 and 10, the mean of the histogram of $\hat{V}_J(2)$ is around $0.8^2 \times 6 = 3.84$. We confirm that the limiting normal distribution in Theorem 5 provides a reasonable approximation to the finite sample distributions of the estimators of the continuous and jump parts of QV.

(iii) As the summary of our simulations, we found that the LSIML estimators of integrated volatility V(2) and of V(4) perform quite well, as we expected. We also confirmed that the continuous and jump parts of the QV in the presence of market micro-structure noise can be estimated. The behaviors of the LSIML estimator for higher order Brownian and jump functionals with r = 1 and r = 2 are reasonable despite the difficulties of the problem involved because of the presence of market micro-structure noise.

Empirical data analysis
To demonstrate the use of the LSIML method introduced in previous sections, we provide an empirical result of high-frequency data analysis in Table 11. We reported the LSIML estimates of V (2) and V (4) for each set of parameters with different frequencies to see their effects. We used the same dataset in Sect. 4 of KSK (2018), which is the high-frequency tick-data of Nikkei-225 Futures at April 16, 2007, traded at Osaka Stock Exchange (OSE). The data are 1s, 5s, 10s, and other frequencies [see KSK (2018) for more details], and we have taken α = 0.4, 1.0 and several values of γ . We have calculated the estimates of V (2) and V (4) and shown only the results of high frequency with 1s, 5s and 10s, because we wanted to show stability of estimates  with respect to the data-frequency and parameters taken for an illustration. We have chosen a particular day, which had typical normal movements in Japanese financial markets.
There are several interesting findings. The estimated values of the LSIML estimator are fairly stable, and they do not depend on the choice of observation lengths (1s, 5s, and 10s) except for the case when α = 1.0. (The last case with α = γ = 1 does not satisfy the conditions in Sect. 5.) The estimated standard deviation of $\hat{V}(2)$ is $\sqrt{2\hat{V}(4)/(m_c b(n))}$, where $\hat{V}(4)$ is an estimated value of V(4), and the estimates are highly significant in all cases. The estimated values of V(2) are quite similar to the estimated values of the SIML estimator reported in Sect. 4 of KSK (2018).
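The standard deviation used above follows from the asymptotic variance 2V(4) with n replaced by $b(n) m_c$, as noted at the end of Sect. 5. A minimal sketch, with the hypothetical helper name `lsiml_std_error`:

```python
import math

def lsiml_std_error(v4_hat, m_c, b):
    """Plug-in standard deviation sqrt(2 * V4_hat / (m_c * b)) of the V(2) estimate."""
    if v4_hat < 0 or m_c <= 0 or b <= 0:
        raise ValueError("need v4_hat >= 0 and positive m_c, b")
    return math.sqrt(2.0 * v4_hat / (m_c * b))
```

Dividing an estimate of V(2) by this quantity gives the t-type ratio behind the significance statements in the text.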
The estimated values of RV correspond to the case when α = 1.0, and the RV estimates of V(2) and V(4) are significantly different from the LSIML estimates. For instance, the estimate of V(2) for 1s with α = 1 is ten times larger than the corresponding estimates with other α's. The values of V(2) with α = 0.4 are 5.21E−04, 4.80E−05, and 4.78E−05, while the values of V(2) with α = 1 are 4.95E−04, 2.60E−04, and 1.76E−04. The differences between the estimates of V(2) for 5s and 10s and those with α = 1 become smaller, but they are still significant. We have a similar observation on the estimates of V(4).
From these observations, the bias due to the existence of micro-market noise is significant in this empirical example. (Note that $\hat{V}(4)$ is asymptotically the same as $V^{*}(4)$ despite the small differences in finite samples.) As an empirical conclusion from this example, the estimated values of RV for V(4) as well as V(2) have significant biases due to market micro-structure noise, and the use of RV is not recommended in practice. Furthermore, the use of RV may cause problems in applications such as risk management. This may also be true for the estimates of other Brownian and jump functionals in empirical studies. On the other hand, we have found that the LSIML estimation gives a solution for practical situations.
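The direction of the RV bias described above can be reproduced in a small simulation. The following is only a minimal sketch under assumptions that are not taken from the paper's data: constant volatility σ = 1 on [0, 1], i.i.d. Gaussian noise with variance 1e-3, n = 2000 observations, and a global SIML-type estimator with m = [n^0.4] terms rather than the local (LSIML) version. The function names and all parameter values are hypothetical and chosen for illustration.

```python
import math
import random


def simulate_prices(n, sigma, noise_sd, rng):
    """Noisy log-prices y_0..y_n on [0, 1]: Brownian path plus i.i.d. Gaussian noise."""
    x = 0.0
    y = [x + rng.gauss(0.0, noise_sd)]
    step_sd = sigma / math.sqrt(n)
    for _ in range(n):
        x += rng.gauss(0.0, step_sd)
        y.append(x + rng.gauss(0.0, noise_sd))
    return y


def realized_variance(y):
    """RV: sum of squared observed returns."""
    return sum((b - a) ** 2 for a, b in zip(y, y[1:]))


def siml_variance(y, m):
    """Global SIML-type estimate: average of the first m squared transformed returns."""
    n = len(y) - 1
    returns = [b - a for a, b in zip(y, y[1:])]
    # sqrt(n) rescales returns; sqrt(2 / (n + 1/2)) normalizes the cosine transform.
    scale = math.sqrt(n) * math.sqrt(2.0 / (n + 0.5))
    total = 0.0
    for k in range(1, m + 1):
        # theta_{kj} = [2*pi / (2n + 1)] * (k - 1/2) * (j - 1/2)
        freq = 2.0 * math.pi / (2 * n + 1) * (k - 0.5)
        z_k = scale * sum(math.cos(freq * (j - 0.5)) * r
                          for j, r in enumerate(returns, start=1))
        total += z_k * z_k
    return total / m


def compare(n=2000, sigma=1.0, noise_var=1e-3, alpha=0.4, reps=20, seed=0):
    """Average RV and SIML-type estimates over `reps` simulated paths (assumed setup)."""
    rng = random.Random(seed)
    m = int(n ** alpha)  # number of low-frequency terms kept
    rv_sum = siml_sum = 0.0
    for _ in range(reps):
        y = simulate_prices(n, sigma, math.sqrt(noise_var), rng)
        rv_sum += realized_variance(y)
        siml_sum += siml_variance(y, m)
    return rv_sum / reps, siml_sum / reps


mean_rv, mean_siml = compare()
print(f"true value: 1.0, RV: {mean_rv:.3f}, SIML-type: {mean_siml:.3f}")
```

Under these assumptions the expectation of RV is roughly σ² + 2n × (noise variance) = 5, far from the true value 1, while the low-frequency terms used by the SIML-type estimate are almost unaffected by the noise.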

Concluding remarks
In this study, we have developed the LSIML method, a new statistical method for estimating higher order Brownian and second-order jump functionals. It extends the SIML method proposed by KSK (2018). The main motivation of the LSIML method is to estimate higher order Brownian and jump functionals, including integrated volatility and co-volatility, when market micro-structure noise exists in the high-frequency financial data. We have shown that the LSIML method has desirable asymptotic properties, such as consistency and asymptotic normality in the SC sense. Moreover, the LSIML method has reasonable finite sample properties, which are illustrated through several simulations and an empirical data analysis. Although other methods for estimating higher order Brownian and jump functionals could be available, the LSIML method is simple and it has desirable asymptotic properties. Hence, it should be useful for empirical applications, including the measurement of financial market beta with possible jumps under market micro-structure noise, which are currently under investigation.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix: Mathematical derivations
In this Appendix, we give some details of the derivations of the results in Sects. 5 and 7. Since we use stable convergence (SC) in Theorems 2, 3, and 5, we discuss at the end of this Appendix how the basic arguments of the martingale CLT (MCLT) and the SC apply to our situation. We use some of the notation of KSK (2018) and KK (2021). First, we prepare several lemmas.

Some Lemmas
Lemma A-1 Let r be any positive integer and let $a_{k,c(n)}$ be given by (22). Then, for any positive integers $k_1$, $k_2$, there exists a constant $K_1$ such that

Proof of Lemma A-2
We use the decomposition as where $\theta_k = [2\pi/(2c(n)+1)](k - 1/2)$ $(k = 1, \ldots, c(n))$, and $\theta_{kj} = [2\pi/(2c(n)+1)](k - 1/2)(j - 1/2)$. We use the relation Then, we use the relations $|1 - e^{-i\theta_k}|^2 = 2 - 2\cos\theta_k = 4\sin^2(\theta_k/2)$ and after some calculations. As $[(2c(n)+1)/c(n)]\, b^2_{k,c(n)} = 4\cos^2\theta_{k,c(n)} = 4\sin^2\theta_k$, it is straightforward to obtain and we have the result. (The important point is that, as the dominant term, we have the second term in the last expression.) Using similar but tedious arguments for the fourth powers, after some calculations (we only need to evaluate the dominant term), we find that The last statement of Lemma A-2 follows by applying the Cauchy-Schwarz inequality to , and using the above relation.
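The trigonometric relation $2 - 2\cos\theta = 4\sin^2(\theta/2)$ used in the proof can be checked in one line. The display below is a generic verification for an arbitrary real angle $\theta$ (a symbol introduced here for illustration); it is not a reconstruction of the displays of the original proof.

```latex
\begin{align*}
|1 - e^{-i\theta}|^2
  &= (1 - e^{-i\theta})(1 - e^{i\theta})
   = 2 - \left(e^{i\theta} + e^{-i\theta}\right) \\
  &= 2 - 2\cos\theta
   = 2 - 2\left(1 - 2\sin^2\frac{\theta}{2}\right)
   = 4\sin^2\frac{\theta}{2}.
\end{align*}
```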
k,(i) = $z_{k,(i)}$ when n = 0. Then, using (A.71) and (A.72) below, we decompose (A.68) Since the first term is of the order $O_p(c(n)/n)$ and the second term is of the order Then, using Lemma A-2, we can find a constant $K_1$, such that (A.69) Using Lemma A-1, we have the first result under the conditions in Theorem 2.
(ii) Using the Markov inequality and $b(n) = [n^{1-\gamma}]$, we have the result.
The CLT in the stable-convergence sense will be discussed at the end of the Appendix.

Derivation of Theorem 3:
The derivation of Theorem 3 is an extension of the proof of Theorem 2, and we show the additional steps. (Step 1) For r ≥ 2, we decompose  (2021). Using Lemma A-3 in the Appendix, we have the asymptotic normality of (A.99) and (A.101) in the SC sense. (We have omitted some details, but we give a discussion on the CLT and the SC in the next subsection.)

On the SC and the MCLT:
We give an outline of the underlying arguments of the CLT and the SC in Theorems 2, 3, and 5. We consider the simple diffusion model of (1)-(4) when $\mu^{\sigma}_s$ and $\omega^{\sigma}_s$ in (3) and (4)  where $\mu^{\sigma *}_s$ and $\omega^{\sigma *}_s$ are the drift and diffusion coefficients and $B^{\sigma}_s$ is a Brownian motion, which may be correlated with $B_s$. For $0 = t^n_0 < t^n_1 < \cdots < t^n_n = 1$, we write $V(4) = \sigma_0^4 +$ as $n \to \infty$.
Using the convergence of each term and applying Theorem 2.2.15 of Jacod and Protter (2012) to the martingale parts, we have the SC for a sequence of random variables. (The derivation of the CLT for the main term in the normalized SIML estimator $U_n$ has been given in Chapter 5 of KSK (2018).) We write the normalized SIML estimator in the form of $U_n = \sum_{j=2}^{n} U_{nj}$ and it is asymptotically uncorrelated with $V(4)$ $(= \int_0^1 \sigma_s^4 \, ds)$ (and higher order Brownian functionals). Then, we have the SC of the martingale $U_n$ to the limiting normal random variable given It is tedious, but straightforward to extend the above arguments to more general cases.
(See Jacod and Protter (2012), and Häusler and Luschgy (2015) for details of the SC.) Finally, in Theorem 5 for the jump-diffusion model, we need to show that the CLT is applicable to (A.99) and (A.101) in the SC sense. As with (A.102)-(A.104) in the diffusion model, it is possible to rewrite them as sums of martingale difference sequences, and we can use similar arguments. This is because both (A.99) and (A.101) are linear combinations of the underlying Brownian motions.