Estimation of the quadratic variation of log prices based on the Itô semi-martingale

As high-frequency data become more widely available, it has become very popular to model random fluctuations of econometric variables over time using Itô semi-martingales. An emblematic problem is to estimate the quadratic variation, i.e., the integrated volatility, of log prices from noisy high-frequency data with endogenous sampling times and jumps. We propose a methodology that combines multiple sub-grids and thresholding. First, sub-sampling is used to reduce the effect of the noise. Then, the threshold method is used to remove the effect of jumps. Finally, the multiple sub-grids method is used to increase the convergence rate. The asymptotic properties, namely consistency and asymptotic normality, are investigated. A simulation study illustrates the performance of the proposed procedure.


Introduction
As high-frequency data become more widely available, it has become very popular to model random fluctuations of econometric variables over time using Itô semi-martingales. Specifically, in financial mathematics, log asset prices or interest rates are commonly modeled by a stochastic process $X = (X_t)_{t \in [0,1]}$ [1], such as
$$X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s.$$
An emblematic problem in econometrics is how to estimate the quadratic variation (the integrated volatility) of log prices, i.e.,
$$\langle X, X\rangle_t = \int_0^t \sigma_s^2\,ds.$$
A classical estimator of the integrated volatility is the realized volatility (cf. [2]), which is based on discrete-time observations and defined as
$$[X, X]^n_t = \sum_{t_i \le t} (\Delta X_{t_i})^2, \qquad \Delta X_{t_i} = X_{t_i} - X_{t_{i-1}},\quad i \ge 1.$$
It is well known that $[X, X]^n_t \xrightarrow{P} \langle X, X\rangle_t$ [3]. In practice, however, observed high-frequency data often exhibit complex features and complicated structures due to the following issues:
• jumps;
• market microstructure noise;
• endogeneity of the price sampling times.
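To make the consistency $[X, X]^n_t \xrightarrow{P} \langle X, X\rangle_t$ concrete, here is a minimal Python sketch. It assumes a driftless continuous Itô process on [0, 1] with an illustrative deterministic volatility $\sigma_s = 0.2 + 0.1\sin(2\pi s)$ (our choice, not taken from the paper), observed on an equally spaced grid; the helper names are hypothetical. For this σ, $\int_0^1 \sigma_s^2\,ds = 0.045$, and the realized volatility should approach that value as n grows.

```python
import numpy as np


def realized_volatility(x):
    """[X, X]^n_t: sum of squared increments of the observed path x_0, ..., x_n."""
    return float(np.sum(np.diff(x) ** 2))


def simulate_diffusion(n, sigma_fn, rng):
    """Euler scheme for the driftless process dX_t = sigma(t) dW_t on t_i = i / n, i = 0, ..., n."""
    t = np.linspace(0.0, 1.0, n + 1)
    dw = rng.normal(0.0, np.sqrt(1.0 / n), size=n)  # Brownian increments over each step
    dx = sigma_fn(t[:-1]) * dw                      # volatility evaluated at the left endpoint
    return t, np.concatenate(([0.0], np.cumsum(dx)))


def sigma_fn(s):
    """Illustrative deterministic volatility path (an assumption for this sketch)."""
    return 0.2 + 0.1 * np.sin(2.0 * np.pi * s)


IV = 0.045  # int_0^1 (0.2 + 0.1 sin(2 pi s))^2 ds = 0.04 + 0.01 / 2

rng = np.random.default_rng(0)
for n in (100, 1_000, 100_000):
    _, x = simulate_diffusion(n, sigma_fn, rng)
    print(f"n = {n:>7}: RV = {realized_volatility(x):.5f}  (target {IV})")
```

The spread of the realized volatility around 0.045 shrinks roughly like $n^{-1/2}$, which is the benchmark rate in the noise-free, jump-free case.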
For the first issue, two well-behaved estimators are the multipower variation estimator [4,5] and the realized threshold quadratic variation [6,7]. One commonly used assumption is that $X_t$ is a jump-diffusion Itô process:
$$X_t = X^c_t + X^d_t, \qquad t \in [0, T],$$
where $X^c$ and $X^d$ are the continuous and jump parts, whose forms are given in (2.1) and (2.2) below. Under this setting, the quadratic variation of X becomes
$$[X, X]_t = \int_0^t \sigma_s^2\,ds + \sum_{s \le t} (\Delta X_s)^2,$$
where $\Delta X_s = X_s - X_{s-}$ denotes the jump at time s. For the second issue, the commonly used model is the discretely observed process with microstructure noise:
$$Y_{t_i} = X_{t_i} + \varepsilon_{t_i}, \qquad i \ge 0,$$
where $\{\varepsilon_{t_i}, i \ge 0\}$ are i.i.d. random variables satisfying $E(\varepsilon_{t_i}) = 0$ and $E(\varepsilon_{t_i}^2) = \sigma_\varepsilon^2$, independent of the process $X^c_t$, and the sampling times $\{t_i, i \ge 0\}$ are independent of $X^c$. For estimating a univariate integrated volatility in the presence of microstructure noise, researchers have proposed various estimators, such as the two-scale realized volatility [8], multi-scale realized volatility [9], wavelet realized volatility [10], pre-averaging realized volatility [11], kernel realized volatility [12], and a quasi-maximum likelihood estimator [13]. For estimating a multivariate integrated co-volatility, available methods include a quasi-maximum likelihood estimator based on generalized sampling times [14], pre-averaging realized volatility [15], a realized kernel estimator based on a refresh time scheme [16], and multi-scale realized co-volatility based on previous-tick data synchronization [17]. For estimating large integrated volatility matrices, methods include universal thresholding [18][19][20][21] and adaptive thresholding [22].
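The effect of the first two issues on plain realized variance can be seen numerically. The following minimal Python sketch is an illustration under assumed parameters, not the paper's procedure: it simulates a constant-volatility diffusion (integrated volatility 1) with three jumps of size ±1 and i.i.d. Gaussian noise, then compares plain realized variance, a threshold realized variance in the spirit of [6,7] with the illustrative threshold $u = 4\,\Delta^{0.47}$, and the realized variance of the noisy data, whose bias is roughly $2n\sigma_\varepsilon^2$. All names and parameter values are hypothetical.

```python
import numpy as np


def realized_variance(y):
    """Plain realized variance: sum of all squared increments."""
    return float(np.sum(np.diff(y) ** 2))


def threshold_realized_variance(y, u):
    """Threshold realized variance: keep only squared increments with |dY| <= u."""
    dy = np.diff(y)
    return float(np.sum(dy[np.abs(dy) <= u] ** 2))


rng = np.random.default_rng(42)
n, sigma, sigma_eps = 23_400, 1.0, 0.01
dt = 1.0 / n

# Continuous part X^c with constant volatility: integrated volatility = sigma**2 = 1.
xc = np.concatenate(([0.0], np.cumsum(sigma * rng.normal(0.0, np.sqrt(dt), n))))

# Illustrative finite-activity jump part X^d: three jumps of size +/-1 at random steps.
dxd = np.zeros(n)
dxd[rng.choice(n, size=3, replace=False)] = rng.choice([-1.0, 1.0], size=3)
x = xc + np.concatenate(([0.0], np.cumsum(dxd)))

# Noisy observations Y_{t_i} = X_{t_i} + eps_{t_i} with i.i.d. Gaussian noise.
y = x + rng.normal(0.0, sigma_eps, n + 1)

u = 4.0 * dt ** 0.47  # threshold alpha * dt**varpi with alpha = 4, varpi = 0.47 (illustrative)

rv_x = realized_variance(x)                # about 1 + 3 * 1**2 = 4: jumps are counted
trv_x = threshold_realized_variance(x, u)  # about 1: jump increments are discarded
rv_y = realized_variance(y)                # about rv_x + 2 * n * sigma_eps**2 = rv_x + 4.68
print(f"RV(X) = {rv_x:.3f}, TRV(X) = {trv_x:.3f}, RV(Y) = {rv_y:.3f}")
```

Thresholding handles the jumps but not the noise: with $\sigma_\varepsilon = 0.01$ the noise alone contributes more to $\mathrm{RV}(Y)$ than the integrated volatility itself, which is why the noise must be reduced before a threshold can be applied meaningfully.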
For the last issue, most existing work assumes that the sampling times are irregular or random but (conditionally) independent of the price process. Volatility estimation in this setting has been studied both in some special situations and in a general situation [23][24][25]. A detailed discussion of possible endogenous effects has been provided in a semi-parametric context [26], and the effect of time endogeneity on volatility estimation has been investigated in a non-parametric setting [27]. When X contains both $X^c$ and $X^d$ and the sampling times are endogenous, Li et al. [28] developed a procedure that yields a consistent estimator of the integrated volatility. When $X^c$ is observed with microstructure noise at endogenous times, Li, Zhang and Zheng [29] considered volatility estimators and their asymptotic properties. Li and Guo [30] proposed a new estimator of the integrated volatility in the presence of both market microstructure noise and jumps when the sampling times are endogenous: they average the p observations preceding each observation in the sub-sample S to remove the effect of ε, and cut off the "big" increments to remove the effect of the jump part. Because this local averaging is performed on a single sub-grid to reduce the effect of microstructure noise, they obtained only the asymptotic rate $n^{1/6-\delta}$ for any δ > 0.
Although the methods of the two articles may seem similar, we must point out the differences between this paper and [31]. In [31], a nonparametric procedure based on a combination of the pre-averaging method and a threshold technique is proposed to estimate the integrated volatility of an Itô semimartingale in the presence of jumps and microstructure noise. In contrast, we propose a methodology that combines thresholding with multiple sub-grids to estimate the quadratic variation of an Itô semimartingale in the presence of endogenous sampling times, jumps, and microstructure noise. First, the sub-sample is used to reduce the effect of the noise. Then, the threshold method is used to remove the effect of jumps. Finally, the multiple sub-grids method is used to increase the convergence rate. Thus, both the model settings and the estimation methods differ.
In this paper, we use the sub-sample to reduce the effect of the noise and the multiple sub-grids method to increase the convergence rate, while the threshold method removes the effect of jumps. We develop an estimator that is consistent for the integrated volatility in the presence of jumps, microstructure noise and endogenous sampling times in a general setting, and we also establish its asymptotic normality.
The remainder of the paper is organized as follows. The model assumptions and the methodology are introduced in Section 2. The consistency and asymptotic normality results are given in Section 3. Simulation results are presented in Section 4. Some discussions are given in Section 5, and all technical proofs are given in the Appendix.

Model assumptions
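Let $X = (X_t)$ be the log price of a single asset in continuous time t ≥ 0, defined on a stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$. Then, the model (1.3) is called an Itô semi-martingale if it has the form
$$X^c_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s, \tag{2.1}$$
$$X^d_t = \int_0^t \int_{|x| \le 1} x\,(\mu - \nu)(ds, dx) + \int_0^t \int_{|x| > 1} x\,\mu(ds, dx), \tag{2.2}$$
where b and σ are locally bounded optional processes, W is a standard Brownian motion, μ is a jump measure with compensator ν, and ν(dt, dx) has the form $dt\,F_t(dx)$, where $F_t(dx)$ is a transition measure from $\Omega^{(0)} \times \mathbb{R}_+$, endowed with the predictable σ-field, into $\mathbb{R} \setminus \{0\}$. We define
$$\beta := \inf\Big\{s : \int_{|x| \le 1} |x|^s\,F_t(dx) < \infty\Big\},$$
which is called the jump activity index in the literature. If 0 ≤ β < 1, we also say that $X^d$ has finite variation. In practice, we do not observe $X_t$ itself; because of bid-ask bounces, differences in trade sizes, etc., we observe
$$Y_{t_i} = X_{t_i} + \varepsilon_{t_i},$$
where the noise terms $\varepsilon_{t_i}$, i ≥ 0, are independent random variables with mean 0 and variance $\sigma_\varepsilon^2$, are independent of X, and have common fourth moments.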
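As a worked illustration of the jump activity index (the Lévy measures below are examples chosen here, not taken from the paper), consider a symmetric stable-like measure with parameter 0 < α < 2:
$$
F_t(dx) = |x|^{-1-\alpha}\,dx:\qquad
\int_{|x|\le 1} |x|^{s}\,F_t(dx) = 2\int_0^1 x^{s-1-\alpha}\,dx =
\begin{cases} \dfrac{2}{s-\alpha}, & s > \alpha,\\[4pt] \infty, & s \le \alpha, \end{cases}
\qquad\Longrightarrow\qquad \beta = \alpha .
$$
If instead $F_t$ is a finite measure (compound Poisson jumps), then $|x|^s \le 1$ on $\{|x| \le 1\}$ gives $\int_{|x|\le 1}|x|^s F_t(dx) \le F_t(\mathbb{R}\setminus\{0\}) < \infty$ for every s ≥ 0, so β = 0 and $X^d$ has finite variation.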
Define the quadratic variation of X, i.e., the integrated volatility, as
$$\langle X^c, X^c\rangle_t = \int_0^t \sigma_s^2\,ds. \tag{2.4}$$
Here, we aim to develop a new estimator for (2.4) and to investigate its asymptotic properties in the presence of jumps, microstructure noise, and endogenous sampling times.

Methodology
To estimate the quadratic variation (2.4), in this section we construct a new estimator $\widehat{\langle X^c, X^c\rangle}_t$. First, we introduce the locally averaged observations $\bar{Y}_{t^k_{i,0}}$ on the k-th sub-grid to reduce the effect of the noise. Then, we construct $[Y, Y]^{S_k}_t$ to remove the effect of jumps on the k-th sub-grid. Finally, we use the moving-average estimator $\widehat{\langle X^c, X^c\rangle}_t$ based on the multiple sub-grids, which, under suitable conditions on the sampling times, attains the rate $n^{1/4-\delta}$, arbitrarily close to the optimal rate $n^{1/4}$. We now describe the estimator in detail.
Denote $N_t = \max\{i : t_i \le t\}$. We assume that $\max_i \Delta t_i \xrightarrow{P} 0$ as some underlying force drives the sampling, for instance as $n \to \infty$, where the non-random n measures the sampling frequency over the time interval [0, t]. In constructing the local averages, we denote by p the number of observations in each average and by q the block size; like n, both are non-random. Define l, the number of blocks, which satisfies $lq \le n$; since p is taken to be o(n), $lq/n \to 1$ as $n \to \infty$. Moreover, for $k = 0, 1, \dots, q-1$, we define the k-th sub-grid as follows; that is, we consider the time endogeneity at the sub-grid level.
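The sub-sample $S = S_k := \{t_{p+k}, t_{q+p+k}, \dots, t_{iq+p+k}, \dots\}$ is constructed by choosing every q-th observation of the complete grid, starting from the (p+k)-th observation. Then, we define the local average $\bar{Y}_{t^k_{i,0}}$ of the p observations preceding each sub-sample time, where $t^k_{i,j} = t_{iq+p-j+k}$; recall that $t^k_{i,0} = t_{iq+p+k}$ denotes the i-th observation time on the k-th sub-grid. To remove the effect of jumps on the k-th sub-grid, the realized volatility $[Y, Y]^{S_k}_t$ of the locally averaged Y process is defined as the sum of the squared increments of $\bar{Y}$ along $S_k$ up to time t whose absolute values do not exceed the threshold $u^k_{i,0}$. After correcting the bias due to noise, the threshold estimator $\widehat{\langle X^c, X^c\rangle}_t$ of $\langle X^c, X^c\rangle_t$ is obtained by averaging over the q sub-grids $k = 0, \dots, q-1$, where $L_t := \max\{i : \dots\}$ and $u_i$ is defined similarly to $u^k_{i,0}$.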
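The construction above can be sketched in Python as follows. This is a hypothetical illustration, not the authors' implementation, and several ingredients are assumptions because the exact formulas are not reproduced here: the local average uses the p observations $t^k_{i,j}$, j = 0, …, p−1; the threshold is a constant $u = 4\,(q/n)^{0.47}$; the noise bias of each averaged increment is taken as $2\sigma_\varepsilon^2/p$ (valid when p ≤ q, so consecutive averaging windows do not overlap), with $\sigma_\varepsilon^2$ estimated as $\mathrm{RV}(Y)/(2n)$ as in two-scale realized volatility; and the sampling times are equally spaced, so time endogeneity is not simulated.

```python
import numpy as np


def subgrid_indices(n, p, q, k):
    """Grid indices of S_k = {t_{p+k}, t_{q+p+k}, ..., t_{iq+p+k}, ...} inside t_0, ..., t_n."""
    return np.arange(p + k, n + 1, q)


def local_averages(y, idx, p):
    """Mean of Y at t_{m-j}, j = 0, ..., p-1, for each sub-sample index m (window is an assumption)."""
    csum = np.concatenate(([0.0], np.cumsum(y)))
    return (csum[idx + 1] - csum[idx + 1 - p]) / p


def multi_subgrid_threshold_iv(y, p, q, u, sigma_eps2):
    """Hypothetical sketch of a multi-sub-grid, thresholded, noise-corrected integrated volatility estimate."""
    if not 1 <= p <= q:
        raise ValueError("need 1 <= p <= q so that the averaging windows do not overlap")
    n = len(y) - 1
    per_grid = []
    for k in range(q):
        ybar = local_averages(y, subgrid_indices(n, p, q, k), p)
        d = np.diff(ybar)                      # increments of the locally averaged process on S_k
        rv_k = np.sum(d[np.abs(d) <= u] ** 2)  # thresholding removes increments carrying big jumps
        # Assumed noise correction: each averaged increment carries noise variance 2 * sigma_eps2 / p.
        per_grid.append(rv_k - 2.0 * d.size * sigma_eps2 / p)
    return float(np.mean(per_grid))            # average over the q sub-grids k = 0, ..., q-1


# Illustrative data: sigma = 1 (integrated volatility 1), three jumps of size +/-1,
# Gaussian noise with standard deviation 0.01, equally spaced (exogenous) sampling times.
rng = np.random.default_rng(7)
n, p, q = 23_400, 10, 100
x = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), n))))
jumps = np.zeros(n + 1)
jumps[rng.choice(np.arange(1, n + 1), size=3, replace=False)] = rng.choice([-1.0, 1.0], size=3)
x += np.cumsum(jumps)
y = x + rng.normal(0.0, 0.01, n + 1)

naive_rv = float(np.sum(np.diff(y) ** 2))  # badly inflated by noise and jumps
sigma_eps2_hat = naive_rv / (2 * n)        # noise-variance estimate as in two-scale realized volatility
u = 4.0 * (q / n) ** 0.47                  # threshold alpha * (q / n)**varpi, chosen for illustration
iv_hat = multi_subgrid_threshold_iv(y, p, q, u, sigma_eps2_hat)
print(f"naive RV = {naive_rv:.3f}, multi-sub-grid estimate = {iv_hat:.3f}, target = 1")
```

In this sketch the local averaging shrinks the noise variance by a factor of p, which is what makes a threshold on the averaged increments effective; averaging over all q offsets k reuses every observation instead of discarding all but one sub-grid. A small downward bias of order $p/(3q)$ remains from averaging the efficient price within each window.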

Results
In this section, the limiting behavior of the estimator will be established.To provide the asymptotic results on multiple sub-grids, the following assumptions are needed.
• (1) There is a filtration $(\mathcal{F}_t)_{t \ge 0}$ such that $(t_i)_{i \ge 1}$ are $(\mathcal{F}_t)$-stopping times. Furthermore, the filtration $(\mathcal{F}_t)$ is generated by finitely many continuous martingales.
• (2) $W_t$, $b_t$ and $\sigma_t^2 \ge c > 0$ are adapted to the filtration $(\mathcal{F}_t)$, integrable and locally bounded, where c is non-random;
• … where $r_s$ is an adapted integrable process;
• (5) the microstructure noise sequence $(\varepsilon_{t_i})_{i \ge 0}$ consists of independent random variables with mean 0, variance $\sigma_\varepsilon^2$, and common finite third and fourth moments, and is independent of $\mathcal{F}_1$.
Proof. By a standard localization procedure, we may replace the local boundedness conditions in the assumptions by boundedness, and we may also assume that the process $X_t$ itself, and hence the jump process $X^d_t$, is bounded. That is, for all results that require assumptions on the volatility and the Lévy measure, we may further assume that the corresponding bounds hold globally. We divide the error into three parts, $\xi_{11}$, $\xi_{12}$ and $\xi_{13}$.
(1) For $\xi_{11}$, we treat separately the cases $|\Delta Z_{t^k_{i,0}}| \ge u^k_{i,0}/2$ and $|\Delta Z_{t^k_{i,0}}| < u^k_{i,0}/2$, with an appropriate constant C, where l and r are arbitrary positive numbers that may change from place to place. Using the boundedness of the parameters and repeatedly applying Hölder's and Burkholder's inequalities, we deduce from these estimates, taking m = r = 1 and using the assumption on $u^k_{i,0}$, that
$$\frac{1}{q}\sum_{k=0}^{q-1}\sum_{t^k_{i,0} \le t} E|\xi_{11}| \to 0 \quad \text{uniformly}.$$
(2) For $\xi_{12}$, arguing as for $\xi_{11}$, we have $\frac{1}{q}\sum_{k=0}^{q-1}\sum_{t^k_{i,0} \le t} E|\xi_{12}| \to 0$ uniformly.
(3) For $\xi_{13}$, arguing as in the proof of Theorem 1 of [30] or Theorem 2 of [29], we have $\frac{1}{q}\sum_{k=0}^{q-1}\sum_{t^k_{i,0} \le t} E|\xi_{13}| \to 0$ uniformly.
Combining (1), (2) and (3) completes the proof of the theorem. □
We will use the concept of stable convergence in the central limit theorem below. A sequence of random variables (r.v.s) $X_n$ converges stably in law to a r.v. X defined on an appropriate extension of the original probability space if and only if, for any set $A \in \mathcal{F}$ and real number x,
$$\lim_{n\to\infty} P(X_n \le x, A) = P(X \le x, A). \tag{3.15}$$
We write this as $X_n \xrightarrow{\mathcal{S}} X$. An immediate consequence is that, for any $\mathcal{F}$-measurable random variable σ, we have the joint weak convergence $(X_n, \sigma) \Rightarrow (X, \sigma)$. Hence, stable convergence is slightly stronger than convergence in law.
Theorem 2. Under the assumptions of Theorem 1 together with assumptions (6) and (7), the convergence results hold stably in law, where $B_t$ is a standard Brownian motion independent of $\mathcal{F}_1$.
Remark 3. The limiting process in (3.18) depends on the underlying X because the sampling times are endogenous. The endogeneity induces a bias term, which is nonzero if and only if the limit of the sub-grid average $\frac{1}{q}\sum_{k=0}^{q-1}\sum_{t^k_{i,0} \le t}(\cdots)\,\mathbf{1}\{|\cdots| \le u^k_{i,0}\}$ is nonzero. The remaining term determines the variance of the limiting normal distribution.
Proof. Since the jump part of $X_t$ has finite variation when β < 1, X admits a decomposition into its continuous part and its finite-variation jump part. Through the decomposition of (3.18), it suffices to prove the convergence of each term. Similar to Theorem 1, we have the following estimates.
Table 1. Simulation results for β = 0.25 and β = 0.5 on three samples.

Conclusions
In this work, based on high-frequency transaction data, we provide a new estimator of the quadratic variation, i.e., the integrated volatility, of log prices in the presence of endogenous sampling times, microstructure noise, and jumps. First, we use the sub-sample method to reduce the effect of the noise. Second, we adopt the threshold method to remove the effect of jumps. Finally, the multiple sub-grids method is used to increase the rate of convergence. Both the consistency and the asymptotic normality of the estimator are established. In Theorem 2, if one assumes that $\Delta_n = O_p(1/n)$, then η = 0, and the convergence rate can be arbitrarily close to $n^{1/4}$, which is recognized as the optimal convergence rate in the presence of microstructure noise. However, with advances in high-frequency trading technology, financial applications often involve dozens or even hundreds of assets, so estimating the corresponding integrated volatility matrix becomes a high-dimensional problem. This motivates us to develop a new estimator for that setting when the observed data have endogenous sampling times, microstructure noise, jumps, etc.

Use of AI tools declaration
The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.