Weighted Least Squares Realized Covariation Estimation

We introduce a novel weighted least squares approach to estimate daily realized covariation and microstructure noise variance using high-frequency data. We provide an asymptotic theory and conduct a comprehensive Monte Carlo simulation to demonstrate the desirable statistical properties of the new estimator compared with existing estimators in the literature. Using high-frequency data on 27 DJIA constituent stocks from 2014 to 2020, we confirm that the new estimator performs well relative to existing estimators. We also show that the noise variance extracted by our method can be used to improve volatility forecasting and asset allocation performance.

JEL classification: C13, C22, G10


Introduction
Asset pricing, portfolio allocation, and risk management all require precise estimates of assets' return moments, such as volatilities and covariances. Starting from Andersen et al. (2001) and Barndorff-Nielsen and Shephard (2002), a large body of literature over the last twenty years uses high-frequency data and intraday information to construct more precise financial risk measures, including realized volatilities and realized covariances. However, using high-frequency data involves two main challenges: market microstructure (MMS) noise and non-synchronous trading. The upward bias in realized volatilities due to MMS noise and the downward bias in realized covariances due to asynchronicity (the Epps effect; Epps (1979)) can partially or even fully offset the incremental benefits of using intraday information, and hence may render the use of high-frequency data practically unattractive. Several existing studies have proposed different approaches to mitigate these issues.¹ In this paper, we introduce a new approach to estimate integrated covariation based on a weighted least squares (WLS) method, which extends the least squares-based estimators of Curci and Corsi (2012) and Nolte and Voev (2012). We show that when the regression weights are chosen appropriately, the WLS estimator is asymptotically equivalent to the multiscale estimator of Zhang (2006) and Bibinger and Mykland (2016), attains the optimal convergence rate, and is simple to implement in practice. Interestingly, the WLS approach also simultaneously produces a consistent estimator of the MMS noise (co)variance.
Our theoretical analysis suggests that the WLS estimator of integrated covariation is consistent in the presence of observation asynchronicity and endogenous noise, but is in general biased when the noise is serially correlated or heteroscedastic. We quantify the asymptotic biases due to correlated or heteroscedastic noise and propose corrections for the WLS estimator. To understand the finite-sample properties of the new estimator, we conduct a comprehensive Monte Carlo simulation analysis. We demonstrate the impact of tuning parameters and provide guidance on how to choose them adaptively in practice. We then compare our estimator, with adaptive choices of tuning parameters, against a few well-known estimators² in the literature. We show that our estimator is frequently ranked among the least biased of all estimators and as a top-three performer in terms of root mean squared error (RMSE) under different assumptions on the dependence of the MMS noise. We also demonstrate the validity of the WLS method for the estimation of noise variance. Our noise variance estimate achieves the smallest bias and RMSE in almost all cases compared to other methods, such as the ReMeDI estimator of Li and Linton (2021).

¹ For example, Aït-Sahalia et al. (2010); Barndorff-Nielsen et al. (2011a); Bibinger and Mykland (2016); Christensen et al. (2010); Clinet and Potiron (2019); Hautsch et al. (2015); Lunde et al. (2016); Shephard and Xiu (2017); Varneskov (2016); Zhang (2011), among others.

² The competing estimators include the composite realized kernel of Lunde et al. (2016), the flat-top realized kernel of Varneskov (2016), the pre-averaged Hayashi-Yoshida estimator of Christensen et al. (2013), and the multi-scale realized covariance of Bibinger (2011, 2012).
In the empirical analysis, we estimate the daily integrated covariance matrices of 27 Dow Jones Industrial Average (DJIA) stocks based on high-frequency intraday data from 2014 to 2020, using our WLS estimator alongside the competing estimators from the simulation. Based on the HAR-DRD model of Oh and Patton (2016) and Bollerslev et al. (2018), we compare the predictive power of the integrated covariance estimators using a five-minute subsampled realized covariance as the forecasting target. We find that the WLS estimator has very similar in-sample and out-of-sample forecasting performance in comparison with its competitors. More importantly, we show that including the WLS estimator of daily noise variances in the HAR-DRD model leads to more accurate forecasts regardless of which integrated covariance estimator is used. The forecasting improvement is concentrated in periods with high noise variance. The statistical improvement can also be translated into economic value through asset allocation, as noise-augmented portfolio strategies consistently obtain positive performance fees relative to benchmark portfolios that do not use the noise variance.
The above result suggests that the integrated covariance of asset returns may depend on past MMS noise variance, which provides new insights into the relation between the MMS noise and volatility. To better understand this result, we further explore the potential information content of the noise variance. By examining the correlation between the noise variance and several microstructure variables, we confirm the existing literature (see e.g. Roll (1984); Glosten and Harris (1988); Harris (1991); Hu et al. (2013)) that noise can to some extent be interpreted as a measure of market friction or illiquidity. However, including these variables along with noise in the forecasting regression does not completely subsume the predictive power of the noise variance. Instead, we show that an interaction between noise variance and realized variance plays an important role, similarly to the realized quarticity (RQ) in the HARQ model by Bollerslev et al. (2016). We further show that the predictive power of noise variance remains after controlling for realized quarticity and jump variations, highlighting its distinctive information content. Overall, our findings suggest that both illiquidity-based and error-based interpretations may jointly explain the predictive power of the noise variance.
The overall contribution of this paper is two-fold. First, we enrich the vast literature on the estimation of the integrated covariation matrix by introducing a novel WLS-based estimator.
We establish the asymptotic properties of our WLS estimator under a general setup for the observation scheme and the MMS noise structure, and prove a one-to-one mapping between the WLS estimator and the multi-scale realized covariance estimator of Bibinger (2011, 2012) and Bibinger and Mykland (2016). These results to a large extent strengthen the theoretical results in Curci and Corsi (2012) and Nolte and Voev (2012) and provide new insights into the properties of multiscale estimators under more general settings.
Second, we also add to the growing literature on the intersection of MMS noise and volatility modelling. The problem of noise variance estimation is considered in e.g. Hansen and Lunde (2006); Nolte and Voev (2012); Ikeda (2015); Jacod et al. (2017); Li and Linton (2021), and several papers attempt to understand the interaction between return and MMS noise (for example, Diebold and Strasser (2013); Clinet and Potiron (2019); Andersen et al. (2021)). We provide a new WLS-based method which extracts noise variance in a multivariate setup. We not only provide evidence that microstructure noise is linked to a set of microstructure variables reflecting market illiquidity but also offer both statistical and economic evidence that incorporating the extracted noise into the forecasting model can improve volatility forecasting and asset allocation performance.
The remainder of the paper is structured as follows. Section 2 defines the WLS estimator and summarizes its theoretical properties. Rigorous econometric analyses of these properties are provided in Section 3. Section 4 presents a comprehensive Monte Carlo simulation analysis of the finite-sample properties of the estimator. Section 5 conducts an empirical analysis using high-frequency data of DJIA stocks. Section 6 concludes.

The Weighted Least Squares Estimator

This section introduces our novel weighted least squares (WLS) estimators for integrated covariation and noise covariances and provides a succinct summary of their theoretical properties.
Interested readers are referred to Section 3 for a rigorous econometric analysis of the estimators; that section can be skipped without much loss of continuity. Our notation largely follows the general notation in Aït-Sahalia and Jacod (2014).
We model the latent efficient (log-)price processes X and Y as continuous Itô semimartingales in Eq. (1), where the interval [0, 1] is normalized to represent a trading day.
We do not observe X and Y directly on [0, 1]. Instead, we observe contaminated versions of X and Y at strictly increasing and possibly asynchronous observation times $\{t^X_n\}_{n=0:N^X_1}$ and $\{t^Y_n\}_{n=0:N^Y_1}$, understood as transaction times of X and Y, where $N^X_t := \sum_{n=1}^{\infty} \mathbf{1}_{\{t^X_n \le t\}}$ and, analogously, $N^Y_t$ count the number of observations of X and Y up to time t. The observed price processes $\widetilde X$ and $\widetilde Y$ are realized as
$$\widetilde X_{t^X_n} = X_{t^X_n} + \epsilon^X_{t^X_n}, \qquad \widetilde Y_{t^Y_n} = Y_{t^Y_n} + \epsilon^Y_{t^Y_n},$$
where $\epsilon^X_{t^X_n}$ and $\epsilon^Y_{t^Y_n}$ are the corresponding measurement errors of $\widetilde X$ and $\widetilde Y$ at times $t^X_n$ and $t^Y_n$, commonly known as the market microstructure (MMS) noise. They are assumed to be zero-mean processes with finite eighth moments, but are allowed to be autocorrelated, endogenous and heteroscedastic (see Sections 3.1 and 3.2).
To deal with asynchronous observations, we adopt the pseudo-aggregation algorithm of Bibinger (2011, 2012) to synchronize the prices. We introduce the notion of interpolated sampling times and the refresh time synchronization:

Definition 1. For any $s \in [0, 1]$, denote the next-tick and previous-tick interpolations of sampling times for X as
$$t^X_+(s) := \min_{n=1:N^X_1} \{t^X_n : t^X_n \ge s\}, \qquad t^X_-(s) := \max_{n=1:N^X_1} \{t^X_n : t^X_n \le s\},$$
and $t^Y_+(s)$ and $t^Y_-(s)$ are defined analogously. The refresh time synchronization of Barndorff-Nielsen et al. (2011b) for $\{t^X_n\}_{n=0:N^X_1}$ and $\{t^Y_n\}_{n=0:N^Y_1}$ can be expressed as
$$T_0 = \max\{t^X_+(0), t^Y_+(0)\}, \qquad T_n = \max\{t^X_+(T_{n-1}), t^Y_+(T_{n-1})\}, \quad n \in \{1, \ldots, N\}.$$
Intuitively, the nth refresh time $T_n$ is the first time at which we observe at least one transaction from both assets since $T_{n-1}$. Based on the refresh times, we obtain a synchronized dataset $\{\widetilde X_{T_n}, \widetilde Y_{T_n}\}_{n=1:N}$; this is the synchronization scheme with minimal data loss (Aït-Sahalia et al., 2010). At the refresh times, we define the next-tick and previous-tick interpolations for X as $\widetilde X^{\pm}_{T_n} := \widetilde X_{t^X_{\pm}(T_n)}$ and analogously $\widetilde Y^{\pm}_{T_n} := \widetilde Y_{t^Y_{\pm}(T_n)}$. For example, whenever $T_n$ belongs to $\{t^X_n\}_{n=0:N^X_1}$, we must have $\widetilde X^{\pm}_{T_n} = \widetilde X_{T_n}$; otherwise $\widetilde X^{+}_{T_n}$ (resp. $\widetilde X^{-}_{T_n}$) returns the transaction price right after (resp. before) time $T_n$ in the local sampling grid $\{t^X_n\}_{n=0:N^X_1}$. In a univariate setting where $\widetilde X = \widetilde Y$, we clearly have $T_n = t^X_n$ for all n, and thus $\widetilde X^{\pm}_{T_n} = \widetilde X_{t^X_n}$ for all n.
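The refresh-time scheme of Definition 1 can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `refresh_times` is ours, and we use the conventional strictly-after next tick so that the sequence of refresh times is strictly increasing.

```python
import numpy as np

def refresh_times(tx, ty):
    """Refresh-time synchronization: T_n is the first time at which both
    assets have traded at least once since T_{n-1}.
    tx, ty: sorted 1-D arrays of transaction times on [0, 1]."""
    tx, ty = np.asarray(tx, float), np.asarray(ty, float)
    T = [max(tx[0], ty[0])]          # T_0: both assets have traded once
    while True:
        # next-tick interpolation: first observation strictly after T[-1]
        ix = np.searchsorted(tx, T[-1], side="right")
        iy = np.searchsorted(ty, T[-1], side="right")
        if ix == len(tx) or iy == len(ty):
            return np.array(T)       # one asset has no further ticks
        T.append(max(tx[ix], ty[iy]))
```

For instance, with X trading at {0.1, 0.2, 0.5, 0.9} and Y at {0.15, 0.4, 0.6, 0.95}, the refresh times are {0.15, 0.4, 0.6, 0.95}.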
Based on the pseudo-aggregation algorithm, Bibinger (2011, 2012) proposes the (generalized) subsampled RC estimator $[\widetilde X, \widetilde Y]^{(m)}$ with a sampling interval of m refresh-time ticks (Eq. (5)). Our WLS estimator is obtained by regressing $[\widetilde X, \widetilde Y]^{(m)}$, for m = Q, ..., M, on a constant³ and the number of observations $N^{(m)}$ at scale m, with regression weights $w(m/M)$ (Eq. (6)). The solution to this weighted least squares problem is available in closed form, which we present in Propositions 1 and 2.
³ In our simulation and empirical analysis, we replace the constant regressor in Eq. (6) by $mN^{(m)}/N$, which corrects for a small finite-sample bias of $[\widetilde X,\widetilde Y]^{(m)}$ in the spirit of Zhang et al. (2005) but does not affect the asymptotic properties of the WLS estimator discussed in Section 3.
The WLS regression is motivated by the fact that, assuming independent and identically distributed (i.i.d.) MMS noise and synchronous observations, as N → ∞ we have (see Eq. (35) of Zhang et al. (2005) in a univariate setting):
$$E\big[\,[\widetilde X, \widetilde Y]^{(m)} \,\big|\, X, Y\big] = \langle X, Y\rangle + 2N^{(m)}\,E[\epsilon^X_n \epsilon^Y_n] + o(1),$$
where $E[\cdot\,|X, Y]$ is the expectation operator conditional on the X and Y processes. As each $[\widetilde X,\widetilde Y]^{(m)}$ has an asymptotic bias proportional to $N^{(m)}$ times the (unknown) covariance of the MMS noise, it is natural to regress $[\widetilde X,\widetilde Y]^{(m)}$ on $N^{(m)}$ using Eq. (6); the estimates of $\beta_0$ and $\beta_1$ then correspond to estimators of $\langle X, Y\rangle$ and $2E[\epsilon^X_n \epsilon^Y_n]$, respectively. However, due to the heteroscedasticity in $[\widetilde X,\widetilde Y]^{(m)}$, the OLS regressions used in Curci and Corsi (2012) and Nolte and Voev (2012) do not provide an estimator with the optimal convergence rate, which motivates the use of the WLS approach in this paper. To emphasize the estimated quantities, we re-label the estimated regression coefficients as $\widehat{\langle X, Y\rangle}^{(WLS)} := \hat\beta_0$ and $2\widehat{E[\epsilon^X_n \epsilon^Y_n]}^{(WLS)} := \hat\beta_1$.

The choices of w(x), M and Q play a crucial role in shaping the theoretical properties of $\widehat{\langle X, Y\rangle}^{(WLS)}$, which we summarize in what follows. First, Q, M and w(x) jointly determine the convergence rate and the asymptotic properties of $\widehat{\langle X, Y\rangle}^{(WLS)}$ as N → ∞. To achieve the optimal convergence rate $N^{1/4}$ (see Varneskov (2016)), one should set $M \propto N^{1/2}$ and choose a weight function with dominating exponent $d_w = 1$ (see Theorem 2). Second, when $\widehat{\langle X, Y\rangle}^{(WLS)}$ achieves the optimal convergence rate, we show that it is asymptotically equivalent to the multiscale realized covariance (MSRC) of Bibinger and Mykland (2016), which is in turn asymptotically equivalent to the non-flattop realized kernel (RK) estimator. We establish the link between the choice of w(x) and the corresponding MSRC weight function in Theorem 1. In particular, $\widehat{\langle X, Y\rangle}^{(WLS)}$ with $w_{cubic}(x) = x$ and with $w_{parzen}(x)$ are asymptotically equivalent to a non-flattop realized kernel with a cubic and a Parzen kernel, respectively. We use the subscripts in $\widehat{\langle X, Y\rangle}^{(WLS)}_{cubic}$ and $\widehat{\langle X, Y\rangle}^{(WLS)}_{parzen}$ to distinguish between the two choices of w(x).
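The regression described above can be sketched numerically. This is a minimal illustration, not the paper's closed-form solution: the names `wls_estimates`, `rc` and `nm` are ours, we use a plain constant regressor rather than the finite-sample correction of footnote 3, and we apply w(m/M) directly as the row weight, matching the interpretation in the text that w scales the residuals.

```python
import numpy as np

def wls_estimates(rc, nm, w, Q, M):
    """Sketch of the WLS regression: regress the subsampled RC at scales
    m = Q..M on a constant and N^(m), weighting the m-th equation by w(m/M).
    rc[m-Q], nm[m-Q]: subsampled RC value and observation count at scale m.
    Returns (beta0, beta1), interpreted as estimators of <X,Y> and
    2*E[eps^X eps^Y] respectively."""
    m = np.arange(Q, M + 1)
    X = np.column_stack([np.ones(len(m)), np.asarray(nm, float)])
    sw = w(m / M)                        # row weights applied to each equation
    beta, *_ = np.linalg.lstsq(X * sw[:, None],
                               np.asarray(rc, float) * sw, rcond=None)
    return beta[0], beta[1]
```

On data generated exactly as $\beta_0 + \beta_1 N^{(m)}$, the routine recovers the coefficients regardless of the weight function.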
For these two cases, we also explain the optimal choice of M at the end of Online Appendix A based on the results of realized kernels.
Third, the choice of Q determines the asymptotic bias of $\widehat{\langle X,Y\rangle}^{(WLS)}$ introduced by autocorrelated MMS noise. Specifically, $\widehat{\langle X,Y\rangle}^{(WLS)}$ is asymptotically unbiased only if the MMS noise is autocorrelated up to at most Q − 1 lags. As the MMS noise is believed to be autocorrelated with a fast-decaying autocorrelation structure (Jacod et al., 2017; Li and Linton, 2021), we propose to choose Q large enough that the estimated MMS noise autocorrelation is approximately zero. The detailed procedure is explained in Online Appendix B.
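The detailed procedure is in Online Appendix B; as a stand-in, the following hypothetical rule illustrates the idea (the function name `choose_Q` and the tolerance are our own choices, not the paper's):

```python
def choose_Q(rho_hat, tol=0.05):
    """Hypothetical rule: given estimated noise autocorrelations rho_hat[j-1]
    at lags j = 1, 2, ..., return the smallest Q such that |rho_hat[j]| <= tol
    for all lags j >= Q, so the noise is treated as (Q-1)-dependent."""
    Q = 1
    for j, r in enumerate(rho_hat, start=1):
        if abs(r) > tol:
            Q = j + 1        # dependence detected at lag j: need Q > j
    return Q
```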
Fourth, we show that $\widehat{\langle X,Y\rangle}^{(WLS)}$ is robust to endogenous noise, but it is asymptotically biased when the MMS noise is heteroscedastic, a scenario studied in Kalnina and Linton (2008). We derive a simple heteroscedasticity-corrected WLS estimator in Section 3.2, which we denote by $\widehat{\langle X,Y\rangle}^{(WLS,*)}$. However, our simulations show that $\widehat{\langle X,Y\rangle}^{(WLS,*)}$ and $\widehat{\langle X,Y\rangle}^{(WLS)}$ perform largely similarly; the correction is thus immaterial from a practical perspective, which is consistent with the findings in Kalnina and Linton (2008).
Assumption 1. The processes X and Y are observed synchronously at times $\{t_n\}_{n=0:N}$, where $0 \le t_0 < t_1 < \ldots < t_N \le 1$ satisfies $\sup_n (t_n - t_{n-1}) = O(N^{-1})$. The bivariate MMS noise process $\{\epsilon^X_n, \epsilon^Y_n\}'_{n=1:N}$ is assumed to be i.i.d. with zero mean and finite eighth moments, independent of X and Y.
Note that throughout this paper, we deal with infill asymptotics by letting N → ∞ while fixing the time span [0, 1]. Assumption 1 is a simplified version of Assumption 1 in Bibinger and Mykland (2016) which guarantees the consistency and rate-optimality of the MSRC estimator.
As it is closely related to our WLS estimator, we provide a succinct review of some properties of the MSRC estimator.
For a choice of sampling interval m, we construct the subsampled RC estimator $[\widetilde X,\widetilde Y]^{(m)}$ based on Eq. (5); in the synchronized setting it has the simpler form
$$[\widetilde X, \widetilde Y]^{(m)} = \frac{1}{m} \sum_{n=m}^{N} (\widetilde X_{T_n} - \widetilde X_{T_{n-m}})(\widetilde Y_{T_n} - \widetilde Y_{T_{n-m}}).$$
The MSRC estimator is defined as a weighted sum of $[\widetilde X,\widetilde Y]^{(m)}$ with sampling intervals ranging from Q to M:
$$\widehat{\langle X, Y\rangle}^{(MS)} = \sum_{m=Q}^{M} \alpha^{(m)} [\widetilde X, \widetilde Y]^{(m)},$$
where the weights $\alpha^{(m)}$ satisfy Conditions 1 and 2. Under Assumption 1, $\widehat{\langle X, Y\rangle}^{(MS)}$ is a consistent rate-optimal estimator of ⟨X, Y⟩ (Bibinger, 2011); to be specific, as N → ∞, $\widehat{\langle X, Y\rangle}^{(MS)} - \langle X, Y\rangle = O_p(N^{-1/4})$ (Eq. (11)). To understand the asymptotic distribution of $\widehat{\langle X, Y\rangle}^{(MS)}$, it is convenient to work with the decomposition in Eq. (12), in which $a^{(m)} = \alpha^{(m)} + O(N^{-1})$ is a slightly altered⁴ version of $\alpha^{(m)}$ that satisfies:

Condition 2′. $\sum_{m=Q}^{M} (a^{(m)}/m) = 0$.

One can verify that $\sum_{m=Q}^{M} a^{(m)} [\widetilde X,\widetilde Y]^{(m)}$ has an asymptotic bias of $-2E[\epsilon^X_n \epsilon^Y_n]$, which is corrected by $E^{(m)}$. In the literature, $E^{(m)}$ is known as the end-effect correction of the MSRC estimator. The standard choice of $a^{(m)}$ is to set $a^{(m)} = \frac{m}{M^2}\big(h(\frac{m}{M}) + O(M^{-1})\big)$, where the $O(M^{-1})$ term can be found in Eq. (24) of Zhang (2006). The function h(x) is a twice continuously differentiable function (called the MSRC weight function) satisfying $\int_0^1 x\,h(x)\,dx = 1$ and $\int_0^1 h(x)\,dx = 0$. Importantly, the weights $\alpha^{(m)}$ affect the asymptotic distribution of $\widehat{\langle X, Y\rangle}^{(MS)}$ only through the choice of h(x).
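In the synchronized setting, the subsampled RC and the multiscale combination can be sketched as follows. This is a minimal illustration under our own naming (`subsampled_rc`, `msrc`); the end-effect correction $E^{(m)}$ is omitted.

```python
import numpy as np

def subsampled_rc(x, y, m):
    """Subsampled realized covariance at scale m on a synchronized grid:
    the average of m-tick cross increments (the simpler form of Eq. (5))."""
    dx, dy = x[m:] - x[:-m], y[m:] - y[:-m]
    return float(np.sum(dx * dy)) / m

def msrc(x, y, alpha, Q, M):
    """Multiscale RC: weighted sum of subsampled RCs over scales m = Q..M
    with weights alpha[m-Q] (end-effect correction omitted)."""
    return sum(alpha[m - Q] * subsampled_rc(x, y, m) for m in range(Q, M + 1))
```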
As to the choice of h(x), Zhang (2006) provides the optimal MSRC weight function h(x) = 12(x − 1/2), which minimizes the asymptotic variance of $\widehat{\langle X, Y\rangle}^{(MS)}$ attributable to MMS noise. Furthermore, Bibinger and Mykland (2016) show that $\widehat{\langle X, Y\rangle}^{(MS)}$ with the weight function h(x) is asymptotically equivalent to a non-flattop realized kernel⁵ (RK) with kernel function k(x) satisfying k″(x) = h(x). This implies that h(x) = 12(x − 1/2) corresponds to an RK with a cubic kernel, and motivates the use of a Parzen kernel-implied MSRC weight function, which is overall more efficient than the cubic kernel.
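One can check directly that Zhang's optimal weight function satisfies the two MSRC weight-function conditions stated above:

```latex
\int_0^1 x\,h(x)\,dx = 12\int_0^1\!\Big(x^2-\frac{x}{2}\Big)dx
= 12\Big(\frac{1}{3}-\frac{1}{4}\Big) = 1,
\qquad
\int_0^1 h(x)\,dx = 12\Big(\frac{1}{2}-\frac{1}{2}\Big) = 0.
```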
We return to the discussion of the WLS estimator and its relation to the MSRC estimator.
To this end, we define a class of WLS weight functions characterized by their behaviour near the origin: loosely speaking, $w(x) \asymp x^{d_w}$ as $x \to 0^+$. We call $d_w$ the dominating exponent of w(x).
Intuitively, $d_w$ controls the rate of increase or decrease of w(x) near x = 0. When $d_w \ge 0$, w(0) is well defined and w(x) increases locally around the origin, while when $d_w < 0$, $w(0) = \infty$ and w(x) explodes at the origin. For example, w(x) = sin(x) has $d_w = 1$ since $w(x)/x = 1 + o(x)$ as $x \to 0^+$. The choice of $d_w$ plays a crucial role in determining the convergence rate of the WLS estimator, as it determines the weights applied to the RC estimators sampled at the highest frequencies. A complete analysis of the impact of $d_w$ on the convergence rate is presented in Theorem 2.
Our first result concerns an MSRC representation of the WLS estimator:

Proposition 1. The WLS estimator $\widehat{\langle X,Y\rangle}^{(WLS)}$ admits an MSRC representation with weights $\phi^{(m)}$, in which $C_1$, $C_0$ and $C_{-1}$ are constants that depend on Q, w(x), M and N, whose expressions can be found in Eq. (C.5). The weights $\phi^{(m)}$ satisfy Conditions 1 and 2.
The above result implies that $\widehat{\langle X,Y\rangle}^{(WLS)}$ is by construction an MSRC estimator with weights $\phi^{(m)}$ that satisfy Conditions 1 and 2 regardless of the choices of Q, M and w(x); we can therefore conclude that $\widehat{\langle X,Y\rangle}^{(WLS)}$ is asymptotically unbiased. Furthermore, $\phi^{(m)}$ admits an asymptotic decomposition in which $\varphi^{(m)}$ satisfies Conditions 1′ and 2′. Suppose also that $d_w \ge 1$; then $\varphi^{(m)}$ has an asymptotic structure governed by a function h(x) that satisfies $\int_0^1 x\,h(x)\,dx = 1$ and $\int_0^1 h(x)\,dx = 0$.
Notice that h(x) is a valid MSRC weight function, which means that $\varphi^{(m)}$ has the same asymptotic structure as $a^{(m)}$ in Eq. (12) if we choose w(x) and h(x) that satisfy Eq. (18). As a result, $\sum_{m=Q}^{M} \varphi^{(m)}[\widetilde X,\widetilde Y]^{(m)}$ and $\sum_{m=Q}^{M} a^{(m)}[\widetilde X,\widetilde Y]^{(m)}$ have the same asymptotic distribution and convergence rate by the properties of the MSRC estimator, which is a key step towards the asymptotic equivalence between $\widehat{\langle X,Y\rangle}^{(WLS)}$ and $\widehat{\langle X,Y\rangle}^{(MS)}$ that we establish in Corollary 1.
The following choices of h(x) are commonly used in the literature, as they correspond to rate-optimal realized kernel estimators. Here we derive⁶ the associated w(x) that satisfies Eq. (18) and denote them by their corresponding kernel names:

⁶ Computing h(x) given w(x) is straightforward, but solving for w(x) based on a general h(x) may not be analytically tractable (for example, the Tukey-Hanning kernel-implied MSRC weights). In this case, one needs to solve a system of integral equations numerically for $W_{-1}$, $W_0$, and $W_1$ and plug them back into Eq. (18) to obtain w(x).

(Displayed expressions: the cubic and Parzen kernel-implied weight functions $w_{cubic}(x)$ and $w_{parzen}(x)$.)
It is worth noting that both $w_{cubic}(x)$ and $w_{parzen}(x)$ have $d_w = 1$. Here we provide some intuition on why this leads to an optimal convergence rate for the WLS estimator. Under Assumption 1, the dependent variable in Eq. (6) has an asymptotic expansion (see Zhang et al. (2005), Eq. (52)) in which the term due to MMS noise, of order $O_p(\sqrt{N}/m)$, dominates the conditional variance of $[\widetilde X,\widetilde Y]^{(m)}$ (given X and Y) for m small relative to N. Therefore, the linear weight function $w_{cubic}(x) = x$ is optimal in the WLS sense, as it completely offsets the heteroscedasticity in the regression residuals due to noise, which leads to the optimal MSRC weight $h_{cubic}(x)$ as in Zhang (2006).
Ignoring other sources of heteroscedasticity, this leads to a $\sqrt{M}$-consistent $\hat\beta$ by the properties of the least squares estimator, which translates into the optimal $N^{1/4}$ convergence rate with $M \propto N^{1/2}$. As $w_{cubic}(x)$ does not account for heteroscedasticity due to discretization or for the correlation structure of the regressand, the linear weight can be inferior to other WLS weights in terms of overall efficiency, such as $w_{parzen}(x)$, which is linear on [0, 0.5] but decreases on [0.5, 1].
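To make the offsetting explicit: weighting the scale-m equation by $w_{cubic}(m/M) = m/M$ rescales the noise-dominated residual standard deviation as

```latex
w_{\mathrm{cubic}}\!\Big(\frac{m}{M}\Big)\cdot
O_p\!\Big(\frac{\sqrt{N}}{m}\Big)
= \frac{m}{M}\cdot O_p\!\Big(\frac{\sqrt{N}}{m}\Big)
= O_p\!\Big(\frac{\sqrt{N}}{M}\Big),
```

which no longer depends on m, so the weighted residuals are (to first order) homoscedastic across scales.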
From the above discussion, we see that the WLS weight function w(x) should have $d_w = 1$ to counterbalance the noise-induced variance at small m. The following result, Theorem 2, provides more insight into the impact of $d_w$ on the convergence rate of the WLS estimator: the relation between the convergence rate of $\widehat{\langle X,Y\rangle}^{(WLS)}$ and the dominating exponent $d_w$ of its WLS weight function w(x) is summarized in Table 1.

Theorem 2 shows that the convergence rate is completely characterized by $d_w$. In general, the convergence rate becomes slower as $d_w$ decreases, since more weight is put on the fastest scales of the subsampled RC estimator, which have larger variances. The results in Table 1 can be reconciled with several findings in the literature. First, an $N^{1/5}$-consistent estimator can be constructed with $d_w = 1/6$, which implies δ = 3/5, in line with the optimal bandwidth of a positive-definite non-flattop realized kernel (Barndorff-Nielsen et al., 2011b). Second, the convergence rate of the estimator in Nolte and Voev (2012) is recovered with $d_w = 0$ and w(x) = 1. The $N^{1/6}$ case arises for $d_w \in (-1, 0)$, which shares the same convergence rate and choice of M as the estimators in Zhang et al. (2005) and Barndorff-Nielsen et al. (2008).

Interestingly, our result implies that the estimator in Curci and Corsi (2012) has the slowest possible convergence rate, with $w(x) = x^{-1}$ and $d_w = -1$. For any $d_w < -1$, the WLS estimator diverges, as the variance of the measurement errors explodes in the limit.
Moving on to the WLS estimator of the noise covariance, we have the following result:

Proposition 2. The WLS noise covariance estimator $\hat\beta_1/2$ has an explicit form as a linear combination of the subsampled RC estimators with weights $\theta^{(m)} = \phi^{(m)} - \varphi^{(m)}$, and it is consistent for $E[\epsilon^X_n \epsilon^Y_n]$ as N → ∞.

Therefore, under the i.i.d. exogenous setting with synchronous observations, the WLS noise estimator is a consistent and rate-optimal estimator of the noise covariance $E[\epsilon^X_n \epsilon^Y_n]$. Adding to the findings in Nolte and Voev (2012), we show that a different linear combination of $[\widetilde X,\widetilde Y]^{(m)}$ can be used to extract the short-run covariance of the MMS noise process. Interestingly, Proposition 2 suggests a decomposition of $\widehat{\langle X,Y\rangle}^{(WLS)}$ in the spirit of Eq. (12), with its own end-effect correction that differs from $E^{(m)}$. We conclude this section by proving the asymptotic equivalence between $\widehat{\langle X,Y\rangle}^{(WLS)}$ and $\widehat{\langle X,Y\rangle}^{(MS)}$ in Corollary 1, which is a direct result of Theorem 1 and Proposition 2.
Corollary 1 implies that under Assumption 1, $\widehat{\langle X,Y\rangle}^{(WLS)}$ is also a consistent and rate-optimal estimator of ⟨X, Y⟩ in view of Eq. (11), and its asymptotic distribution is readily available from $\widehat{\langle X,Y\rangle}^{(MS)}$, or equivalently from the corresponding non-flattop RK with kernel function k″(x) = h(x). The asymptotic equivalence in fact holds in a much more general setting, as the two estimators differ asymptotically only in how they handle the end-effects, which converge at a much faster rate than the estimator of ⟨X, Y⟩ itself. This is an intriguing result: the MSRC estimator also emerges from a WLS-based estimator of ⟨X, Y⟩ built on a quite different principle. The result also suggests a simple approach to choosing h(x) by exploiting Eq. (18), in addition to the method proposed in Zhang (2006). As an interesting future research question, it may be possible to design an optimal w(x) in the regression setting that minimizes the asymptotic variance of the WLS estimator based on the theory of generalized least squares, similar to the optimal kernel design problem studied in Barndorff-Nielsen et al. (2008, 2011b).

Observation Asynchronicity and Dependent Noise
It is well known that trades arrive asynchronously in the market, which generates the Epps effect (Epps, 1979) that biases the realized covariance estimator downwards. It is also well documented that the MMS noise of tick-by-tick transaction prices is autocorrelated (Aït-Sahalia et al., 2011; Ikeda, 2015; Jacod et al., 2017; Li and Linton, 2021). Empirical findings in Voev and Lunde (2007) and Griffin and Oomen (2011) point to possible lead-lag relations between pairs of MMS noises. Varneskov (2016) shows that synchronization algorithms can produce artificial noise persistence in a multivariate setting. These results all suggest that Assumption 1 is unlikely to hold in practice.
In this section, we relax Assumption 1 to allow for observation asynchronicity and dependent MMS noise and discuss their implications for the consistency⁷ of the WLS estimator. To this end, we replace Assumption 1 by Assumptions 2 and 3 below:

Assumption 2. The efficient prices X and Y as defined in Eq. (1) are observed at deterministic and strictly increasing asynchronous observation times $\{t^X_n\}_{n=0:N^X_1}$ and $\{t^Y_n\}_{n=0:N^Y_1}$. The sampling times are assumed to be regular.

Assumption 3. Let $\{\epsilon^X_{T_n}, \epsilon^Y_{T_n}\}_{n=0:N}$ denote the MMS noise processes associated with $\widetilde X$ and $\widetilde Y$ synchronized at the refresh times $\{T_n\}_{n=0:N}$, constructed from $\{t^X_n\}_{n=0:N^X_1}$ and $\{t^Y_n\}_{n=0:N^Y_1}$ according to Definition 1. We assume that:

1. $\{\epsilon^X_{T_n}, \epsilon^Y_{T_n}\}_{n=0:N}$ is a strictly stationary bivariate process with zero mean and finite eighth moments.
2. For some non-negative integer q, $\{\epsilon^X_{T_n}, \epsilon^Y_{T_n}\}_{n=0:N}$ is q-dependent; that is, $\epsilon^X_{T_n}$ and $\epsilon^Y_{T_n}$ are independent of $\epsilon^X_{T_{n-j}}$ and $\epsilon^Y_{T_{n-j}}$ for every n and j > q.
3. It holds that $\epsilon^X_{T_n} = \epsilon^X_{t^X_\pm(T_n)}$ and $\epsilon^Y_{T_n} = \epsilon^Y_{t^Y_\pm(T_n)}$ for every 0 ≤ n ≤ N, so that the MMS noise remains unchanged after interpolation at the local observation times.
Following Barndorff-Nielsen et al. (2011b), we specify the noise dynamics in refresh time to simplify our analysis. Intuitively, the above assumption states that the MMS noise is dependent only up to q observations in refresh time. As q can be chosen arbitrarily large, this assumption provides a flexible dynamic structure to capture the empirical dependence of the bivariate noise processes. It is worth noting that $\epsilon^X_{T_n} = \epsilon^X_{t^X_\pm(T_n)}$ holds true whenever $T_n$ belongs to $\{t^X_n\}_{n=0:N^X_1}$, and likewise for Y, so condition 3 already holds true for half of the cases. We expect that it still approximately holds when $T_n$ does not belong to the local observation times, given that the tick-by-tick MMS noise is known to be highly positively autocorrelated at the first lag (Jacod et al., 2017; Li and Linton, 2021).
⁷ As we are mostly interested in the first-order behaviour of the WLS estimator under more general settings, we do not discuss its asymptotic variance in these settings, which is left for future research.
We first discuss the impact of observation asynchronicity in the context of Assumption 2, adopting the pseudo-aggregation algorithm of Bibinger (2011) to deal with asynchronicity. In detail, when the prices are asynchronous, we use the generalized subsampled RC defined in Eq. (5) in the construction of $\widehat{\langle X,Y\rangle}^{(WLS)}$. As changing the definition of $[\widetilde X,\widetilde Y]^{(m)}$ does not affect the results in Corollary 1, $\widehat{\langle X,Y\rangle}^{(WLS)}$ is still asymptotically equivalent to $\widehat{\langle X,Y\rangle}^{(MS)}$ coupled with the pseudo-aggregation algorithm, which is a consistent and rate-optimal estimator in the presence of i.i.d. and exogenous MMS noise (Bibinger and Mykland, 2016). We also note that the assumption of deterministic observation times can be further weakened to allow stochastic and endogenous sampling times in view of Corollary 3.5 of Bibinger and Mykland (2016), which we omit for brevity.
The effects of dependent MMS noise as specified in Assumption 3 on the consistency of the WLS estimator are summarized in Proposition 3:

Proposition 3. Suppose Assumptions 2 and 3 hold true, and pick w(x) with $d_w \ge 1$ and $M = cN^\delta$ for some δ ∈ (1/3, 1). Then the asymptotic bias of $\widehat{\langle X,Y\rangle}^{(WLS)}$ is determined by the noise cross-autocovariances $\Gamma_m$ at lags m ≥ Q.

Several important conclusions can be drawn from Proposition 3. First, when δ = 1/2 and Q ≤ q, $\widehat{\langle X,Y\rangle}^{(WLS)}$ has a bias of order O(1) which does not vanish in the limit. This result is unexpected in the literature on multiscale estimators, as Section 6.2 of Aït-Sahalia et al. (2011) shows that the multiscale realized variance with Q = 1 is robust to exponentially mixing noise with a bias of order $O(M^{-1})$, which contradicts our result. After careful examination, we find that Aït-Sahalia et al. (2011) missed a factor of N in the first equation of their Section 6.2, which makes the bias appear to be of a much smaller asymptotic order. Therefore, one can only achieve consistency and the optimal convergence rate under Assumption 3 by choosing Q > q; otherwise $\widehat{\langle X,Y\rangle}^{(WLS)}$, and hence $\widehat{\langle X,Y\rangle}^{(MS)}$, are inconsistent.
We conjecture that in the presence of a general noise dependence structure, a consistent version of $\widehat{\langle X,Y\rangle}^{(WLS)}$ can be constructed by letting Q diverge at a suitable rate, following the flat-top realized kernel of Varneskov (2016, 2017), while still maintaining the optimal $N^{1/4}$ convergence rate. Theoretical analysis along this line is beyond the scope of this paper. Nevertheless, from a practical perspective, one should choose Q large enough that $\Gamma_m \approx 0$ to avoid the bias introduced by cross-correlated (or, in the univariate case, autocorrelated) noise.
Second, the asymptotic order of the bias for the general δ ∈ (1/3, 1) case is in line with the leading bias of the realized kernel in the presence of dependent noise (see, e.g., Lemma 2 of Varneskov (2016)). This also rationalizes sub-optimal choices of δ to deal with dependent noise (such as δ = 2/3 in Barndorff-Nielsen et al. (2011b)), which ensure that the asymptotic bias due to $\Gamma_m$ vanishes in the limit.
Third, the WLS noise covariance estimator remains consistent for $E[\epsilon^X_{T_n}\epsilon^Y_{T_n}]$ for any Q = O(1) and δ ∈ (1/3, 1) under Assumption 3. This result can be strengthened to cover general dependent noise with absolutely summable $\Gamma_m$, and the asymptotic order of the bias remains $O(N^{-\delta})$. In a univariate setting, Proposition 3 implies a consistent estimator of the MMS noise variance that is robust to autocorrelation in the noise dynamics.

Endogenous and Heteroscedastic Noise

Kalnina and Linton (2008) emphasize the importance of endogenous and heteroscedastic noise in the estimation of integrated variance, which is further studied by Varneskov (2016, 2017) in the construction of flat-top realized kernel estimators. As the impact of these features has not yet been analyzed for the class of multiscale estimators, we provide some primitive theoretical considerations. To simplify the exposition, we focus on the univariate process X and omit the superscript for X in the observation times whenever no confusion is caused. The following assumption replaces Assumption 3:

Assumption 4. Use the setting in Assumption 2. The noise process $\epsilon^X_n$ is assumed to have the structure
$$\epsilon^X_n = \Delta t_n^{-1/2}\,\omega(t_n)\,\Delta W_n,$$
where $\Delta t_n = t_n - t_{n-1}$, $\Delta W_n = W_{t_n} - W_{t_{n-1}}$, $W = (W_t)_{t\in[0,1]}$ is a standard Brownian motion satisfying $d[B^X, W]_t = \xi\,dt$, and $\omega: [0,1] \to \mathbb{R}_+$ is a bounded Lipschitz function. The deterministic sampling times $\{t_n\}_{n=0:N}$ are assumed to be generated from $t_n = \tau(n/N)$, where $\tau(t) = \int_0^t \lambda^2(u)\,du$ for some strictly positive and right-continuous function $\lambda(t)$.
The factor Δt_n^(−1/2) ensures that ϵ^X_n = O_p(1), similar to the design in Varneskov (2017). The heteroscedasticity of ϵ^X_n is captured by the function ω(t), and ξ ∈ [−1, 1] measures the degree of endogeneity. One can further generalize this specification by adding a fully exogenous component, but the influence on the WLS estimator remains qualitatively unchanged. We do not consider dependent noise in this case, as it can be dealt with by choosing a large Q, as discussed in the previous section. The construction of τ(t) is adapted from Barndorff-Nielsen et al. (2011b) to simplify the expression of the limiting quantities under irregular sampling times, but plays no substantial role otherwise. Alternatively, one can define the asymptotic quadratic variation of time (e.g. Mykland and Zhang (2006)), which serves the same purpose.
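To make the time-change construction concrete, the sampling times t_n = τ(n/N) can be generated numerically from any positive intensity λ(t). The sketch below uses trapezoidal quadrature; the specific λ (a U-shaped intraday pattern), the grid resolution, and the normalisation τ(1) = 1 are our illustrative choices, not the paper's.

```python
import numpy as np

def sampling_times(N, lam):
    """Irregular sampling times t_n = tau(n/N), where
    tau(t) = int_0^t lam(u)^2 du is computed by trapezoidal quadrature
    on a fine grid and normalised so that the times span [0, 1]."""
    grid = np.linspace(0.0, 1.0, 100 * N + 1)
    lam2 = lam(grid) ** 2
    step = grid[1] - grid[0]
    tau = np.concatenate(([0.0], np.cumsum((lam2[1:] + lam2[:-1]) / 2) * step))
    tau /= tau[-1]                      # normalise so tau(1) = 1
    return np.interp(np.arange(N + 1) / N, grid, tau)

# denser observations near the open and close of the trading day
times = sampling_times(1000, lambda u: 1.0 + 0.5 * np.cos(2.0 * np.pi * u))
```

By construction the times are strictly increasing with t_0 = 0 and t_N = 1, and a constant λ recovers equidistant sampling.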
We deduce the following result under Assumptions 2 and 4:

Proposition 4. Suppose Assumptions 2 and 4 hold. Pick w(x) with d_w ≥ 1 and construct ⟨X, X⟩^(WLS) with Q = O(1). As N → ∞, the following result holds:

Three important conclusions can be drawn from Proposition 4. First, the endogeneity of the noise itself does not introduce an asymptotic bias to ⟨X, X⟩^(WLS), as the bias is not a function of ξ. Intuitively, this is because the cross term between the efficient price and the noise vanishes at the order O(N^(−1/2)) after averaging across all scales. An in-depth analysis can be found in the proof of the proposition.
Third, the proof of the proposition shows that Ê is asymptotically equivalent to (N+1)^(−1) Σ_{n=0}^{N} (ϵ^X_n)², which consistently estimates ∫_0^1 ω(τ(u))² du and is also robust to serially dependent noise in view of Proposition 3. However, we are unable to decompose the total noise variance into endogenous and exogenous components, as ξ is not identified in this setup.
A simple heteroscedasticity correction for ⟨X, X⟩^(WLS) can be formulated in the spirit of Kalnina and Linton (2008):

Corollary 2. Under the setup of Proposition 4, the following estimator of ⟨X, X⟩ is consistent:

where φ^(m) is defined in Eq. (16) and:

In view of the decomposition in Eq. (22), (X̃, X̃)^{Q} can be seen as a different end-effect correction term which consistently estimates w(0)² + w(1)². The parameter Q in (X̃, X̃)^{Q} plays the same role as in ⟨X, X⟩^(WLS) in mitigating the impact of dependent noise, and thus does not introduce any additional tuning parameters. 8 For the bivariate case, the impact of endogenous and heteroscedastic noise on ⟨X, Y⟩^(WLS) is qualitatively unchanged under a suitable bivariate extension of Assumption 4. This can be analysed analogously following the proof of Proposition 4, which we omit for conciseness. The corresponding heteroscedasticity-corrected WLS estimator is constructed as:

in which:
Our assumptions imply continuous paths of the efficient prices, which precludes jumps in the model. In the presence of jumps, ⟨X, Y⟩^(WLS) delivers a consistent estimator of the total quadratic covariation between X and Y, which includes covariation from both the continuous and the jump components; as discussed in Bibinger (2012), this robustness to jumps follows from the WLS design. From a portfolio allocation point of view, it is reasonable to consider the total quadratic covariation as a measure of covariance between assets, which is what we pursue in our empirical analysis.

8 Alternatively, one can follow Barndorff-Nielsen et al. (2008) and jitter the endpoints of X̃, which diminishes the impact of ω(0)² and ω(1)². Consequently, the correction term (X̃, X̃)^{Q} is no longer needed. However, this introduces an additional tuning parameter for the jittering rate, which we do not pursue here.
However, one may be interested in disentangling the jump covariation, or co-jumps, from the total quadratic covariation, as it is an important risk factor of asset prices (Todorov and Bollerslev, 2010; Gilder et al., 2014; Caporin et al., 2017). Existing approaches include the truncated RC estimator of Mancini and Gobbi (2012) and the truncated pre-averaged Hayashi-Yoshida estimator of Koike (2016), both based on the truncation principle of Mancini (2009).
Following this lead, a natural jump-robust extension of the WLS estimator is to replace each RC estimator [X̃, Ỹ]^(m) by its truncated counterpart.
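As a concrete illustration of this extension, the following sketch applies Mancini-type truncation to a single RC term computed from synchronised returns. The threshold constants `alpha` and `varpi` and the crude scale estimates from mean absolute returns are our illustrative choices, not the paper's.

```python
import numpy as np

def truncated_rc(dx, dy, alpha=4.0, varpi=0.49):
    """Jump-robust (truncated) realized covariance for synchronised
    return series dx, dy over [0, 1] with n observations each.
    A product is kept only if both returns are below the Mancini-type
    threshold u_n = alpha * sigma_hat * (1/n)^varpi, so large (jump)
    returns are discarded; alpha and varpi are illustrative choices."""
    n = len(dx)
    # crude scale estimates: for Gaussian returns, sigma ~ sqrt(pi/2) *
    # mean|dx| * sqrt(n) recovers the per-unit-time volatility
    sx = np.sqrt(np.pi / 2) * np.mean(np.abs(dx)) * np.sqrt(n)
    sy = np.sqrt(np.pi / 2) * np.mean(np.abs(dy)) * np.sqrt(n)
    keep = (np.abs(dx) <= alpha * sx * n ** -varpi) & \
           (np.abs(dy) <= alpha * sy * n ** -varpi)
    return float(np.sum(dx[keep] * dy[keep]))
```

Because the threshold shrinks slower than the typical diffusive return (n^{−0.49} versus n^{−1/2}), almost all Brownian returns are kept in the limit while jump returns are truncated.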

Simulation
In this section, we conduct a Monte Carlo simulation study to verify the theoretical results derived in the previous sections and provide guidance on the choice of Q and M . We also compare the finite sample performance of the WLS estimators against various commonly used estimators in the literature. Finally, we verify that the WLS estimators can provide valid inference for the short-run variance of the MMS noise.

Simulation Design
The simulation setting closely follows Barndorff-Nielsen et al. (2011b), Christensen et al. (2010) and Varneskov (2016). The data generating process of (X, Y) is assumed to be a bivariate one-factor stochastic volatility model:

Here B_t, W^X_t and W^Y_t are standard Brownian motions capturing the systematic and idiosyncratic risks in X and Y, respectively. The parameters of our simulation are: µ = 0.03, b_1 = 0.125, … We simulate 1000 realizations of (X, Y) on the interval [0, 1], representing a typical 6.5-hour trading day in the US stock market. The values of X and Y are first computed using an Euler approximation with a step size of 0.1 seconds, or 1/234000. We then generate asynchronous observation times {t^X_n} and {t^Y_n} as independent Poisson processes with intensity parameters λ_X and λ_Y capturing the average number of observations per second. In the simulation we consider (λ_X, λ_Y) ∈ {(1, 1/2), (1/5, 1/10)}, 9 so that X on average has twice as many observations as Y, with 23400 and 4680 observations of X per day in the two settings, respectively.
The observed prices {X̃_{t^X_n}} and {Ỹ_{t^Y_n}} are generated according to Eq. (2), with the following MMS noise process:

where IQ^X_t := ∫_0^t (σ^X_s)⁴ ds is the integrated quarticity of X and IQ^Y_t is defined analogously. The noise-to-signal ratio parameter ψ determines the overall size of the noise variance. We consider ψ² ∈ {0.005, 0.015}, representing the low and high noise cases. The function ω(x) is strictly positive, bounded and continuous, satisfies ∫_0^1 ω(x)² dx = 1, and controls the degree of heteroscedasticity. We use the simple specification ω(x)² = 1 + cos(2πx)/2, which generates a U-shaped pattern as documented in Kalnina and Linton (2008). The processes ε^X_n and ε^Y_n are independent of X, Y and of each other, and satisfy E[ε^X_n] = E[ε^Y_n] = 0 and Var[ε^X_n] = Var[ε^Y_n] = 1. Specifically, ε^X_n is assumed to follow an ARMA(1,1) process with unit variance:

and ε^Y_n is defined analogously with parameters ϕ_Y and θ_Y. The following parameters are chosen for the simulation of the MMS noise:

We focus on the WLS estimators ⟨·,·⟩^(WLS)_parzen and ⟨·,·⟩^(WLS)_cubic of ⟨X, Y⟩, ⟨X, X⟩ or ⟨Y, Y⟩ with optimal convergence rates. For these estimators, we need to choose two tuning parameters: Q and M. We discuss their influence and provide guidance on their choice in the next section. In particular, we choose Q first and set M = ⌈c√N⌉ + Q − 1 for some constant c, where the Q − 1 term ensures a constant regression size. Clearly, any fixed Q ensures M = O(√N), which gives the optimal convergence rate of the above estimators.
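A stripped-down version of this design can be sketched as follows. We replace the stochastic volatility factors with constant volatilities and use a crude noise scaling, so all constants (`rho`, `sx`, `sy`, the ARMA parameters, the noise scale) are illustrative rather than the paper's calibration.

```python
import numpy as np

rng = np.random.default_rng(0)

# -- Efficient prices: simplified one-factor model with CONSTANT
#    volatilities (the paper uses stochastic volatility); rho is the
#    common-factor loading.
K = 234000                       # 0.1-second Euler grid over one day
dt = 1.0 / K
rho, sx, sy = 0.5, 0.2, 0.3
dB = rng.normal(0, np.sqrt(dt), K)
dWx = rng.normal(0, np.sqrt(dt), K)
dWy = rng.normal(0, np.sqrt(dt), K)
X = np.cumsum(sx * (rho * dB + np.sqrt(1 - rho**2) * dWx))
Y = np.cumsum(sy * (rho * dB + np.sqrt(1 - rho**2) * dWy))

# -- Asynchronous observation times: Poisson sampling on the fine grid
lam_x, lam_y = 1.0, 0.5                              # trades per second
idx_x = np.flatnonzero(rng.random(K) < lam_x / 10)   # grid step = 0.1s
idx_y = np.flatnonzero(rng.random(K) < lam_y / 10)

# -- ARMA(1,1) noise, rescaled to unit unconditional variance
def arma11(n, phi, theta, rng):
    e = rng.normal(0, 1, n)
    out, x_prev, u_prev = np.empty(n), 0.0, 0.0
    for i in range(n):
        out[i] = phi * x_prev + e[i] + theta * u_prev
        x_prev, u_prev = out[i], e[i]
    return out / np.sqrt((1 + 2 * phi * theta + theta**2) / (1 - phi**2))

# crude stand-in for the paper's quarticity-based noise scaling
psi = np.sqrt(0.005)
Xobs = X[idx_x] + psi * sx * arma11(len(idx_x), 0.5, 0.3, rng)
Yobs = Y[idx_y] + psi * sy * arma11(len(idx_y), 0.5, 0.3, rng)
```

With λ_X = 1 per second this yields roughly 23400 observations of X per day, matching the design above.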

Simulation Results
We first examine the impact of Q and c on the performance of the WLS estimators for ⟨X, Y⟩, ⟨X, X⟩ and ⟨Y, Y⟩. For illustration purposes, we present the results for ⟨·,·⟩^(WLS)_parzen and its univariate versions with ψ² = 0.005 and (λ_X, λ_Y) = (1/5, 1/10) in the main text. The impact of Q and c on the heteroscedasticity-corrected WLS estimators is highly similar to that on the original versions, because the correction term is numerically very small, and is therefore omitted from the paper.
For each simulated path of (X, Y), we estimate ⟨·,·⟩^(WLS)_parzen based on 5 × 20 different combinations of Q and c. As the MSRC is asymptotically equivalent to a realized kernel, whose MSE-optimal bandwidth choice c* is well studied in the literature (Barndorff-Nielsen et al., 2008, 2011b; Ikeda, 2015; Varneskov, 2017), for each path we also estimate this optimal bandwidth, denoted by ĉ*. The detailed construction of ĉ* is documented in Online Appendix A.
We first present the bias and the root mean squared error (RMSE) of ⟨X, Y⟩^(WLS)_parzen in Panel 1 of Figure 1. The estimator has a small (<1%) finite sample bias which increases slightly in magnitude as c and Q increase. A larger Q inflates the bias and variance of ⟨X, Y⟩^(WLS)_parzen, and in this case the optimal choice appears to be Q = 1. An RMSE-minimizing choice of c is visible, and we see that the estimated optimal bandwidth ĉ* for the realized kernel works well in approximating the optimal c of the WLS estimator.
The impact of Q on the bias is much more pronounced in the univariate setting, shown in Panels 2 and 3 of Figure 1. First, it is clear that with the MA(1) noise, the univariate estimators ⟨X, X⟩^(WLS)_parzen are largely biased downwards due to the negative first-order autocorrelation in the noise; the bias is eliminated by taking any Q > 1, which results in a large improvement in MSE for small choices of c. In the presence of AR(1) noise, a bias is present for any choice of Q, its sign depending on the sign of the noise autocorrelation, and the noise-induced bias diminishes as Q and c increase. In this setting, globally optimal choices of Q and c are less obvious, but ĉ* remains a reasonable estimate of the unknown optimal c for larger choices of Q.
The findings in Figure 1 are qualitatively unchanged in the other simulation settings. The bias induced by the dependent noise does not decay (in fact it becomes more pronounced) when we sample more frequently, which confirms our discussion of Proposition 3. Also, autocorrelation in the noise alone does not bias the estimators of ⟨X, Y⟩, due to the independence assumption between the two noise processes. The average estimated optimal bandwidths E[ĉ*] are all close to the choices of c that provide the minimum RMSEs.
The above results suggest that ĉ* provides a reliable choice of c for the WLS estimator, and we adopt it in our empirical analysis. As for the choice of Q, an RMSE-optimal choice appears difficult to derive analytically. However, since the WLS estimator is robust to the first Q lags of noise autocorrelation by Proposition 3, we can set Q large enough to avoid most of the noise autocorrelation and thereby reduce the finite sample bias of the WLS estimator.
Based on this principle, we design a simple algorithm to choose the value of Q using Jacod et al.'s (2017) estimator of the noise autocorrelation, which is presented in Online Appendix B. Intuitively, the algorithm chooses Q̂ as the smallest Q such that the estimated noise autocorrelation is close enough to zero at lag Q. Descriptive statistics of Q̂ for the simulation are presented in Table B.1.
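The selection rule can be sketched as follows. The tolerance, the maximum lag, and the sample-ACF stand-in for Jacod et al.'s (2017) noise autocorrelation estimator are all illustrative assumptions; the paper's actual algorithm is in Online Appendix B.

```python
import numpy as np

def choose_Q(rho_hat, tol=0.1, q_max=20):
    """Smallest Q such that the estimated noise autocorrelation at
    lag Q is within `tol` of zero. `rho_hat` maps a lag to an
    autocorrelation estimate; any estimator can be plugged in."""
    for q in range(1, q_max + 1):
        if abs(rho_hat(q)) < tol:
            return q
    return q_max

def tick_acf(returns):
    """Crude stand-in: sample autocorrelation of tick-by-tick returns,
    which are noise-dominated at the highest frequency."""
    r = returns - returns.mean()
    return lambda q: float(np.dot(r[:-q], r[q:]) / np.dot(r, r))
```

For pure MA(1)-type noise the lag-1 autocorrelation is large and negative while higher lags vanish, so the rule returns Q̂ = 2, in line with the Q > 1 recommendation above.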
Using the adaptively chosen ĉ* and Q̂, we proceed to compare the performance of the WLS estimators with that of competing estimators. The results for the case (λ_X, λ_Y) = (1/5, 1/10) are presented in Table 2, while Table D.1 presents the case (λ_X, λ_Y) = (1, 1/2), with qualitatively similar findings. For each estimator, DGP and inference target, we compute the bias and root mean squared error (RMSE) of the 9 estimators based on 1000 simulated paths. The WLS estimators are constructed with the adaptively chosen ĉ* and Q̂. For each DGP and inference target, the three estimators with the smallest absolute bias are in bold, and the estimator with the smallest MSE is underlined.
From Table 2, we first see that the WLS estimators are frequently the least biased among all estimators across the three DGPs and inference targets. This is due to the adaptive choice of Q̂, which avoids most of the bias induced by the noise autocorrelation. As for the competing estimators, the biases of ⟨·,·⟩^(PHY) and ⟨·,·⟩^(FTRK)_parzen are largely comparable to those of the WLS estimators in most cases, which demonstrates their consistency under dependent and heteroscedastic noise. The bias of ⟨·,·⟩^(CRK)_parzen is slightly larger than that of ⟨·,·⟩^(PHY) and ⟨·,·⟩^(FTRK)_parzen when the noise is dependent, but is considerably smaller than that of ⟨·,·⟩^(sub)_15 and ⟨·,·⟩^(MS), which are not robust to dependent noise. This further confirms Proposition 3: the original MSRC estimator using the fastest scales is equivalent to the WLS estimator with Q = 1, which is biased in the presence of dependent noise.
The biases of the WLS estimators relative to their heteroscedasticity-corrected versions require further attention. Taking ⟨X, X⟩^(WLS)_cubic as an example and assuming independent noise, its bias due to heteroscedasticity in our simulation is approximately 11 :

We therefore see that the heteroscedasticity correction only adds about ψ² to the estimates when they are not otherwise biased by the noise autocorrelation (e.g. DGP 1), but it can also exaggerate the positive bias caused by the noise correlation in DGP 2. In general, the heteroscedasticity correction is immaterial in our simulation setting for a normal level of noise (ψ² = 0.005), which is the more practically relevant case. This is in line with the empirical finding of Kalnina and Linton (2008) that the heteroscedasticity-correction terms are typically very small in comparison to the volatility itself.

In terms of RMSE, we also find that the WLS estimators are frequently among the top three best performing estimators in both panels of Table 2. The Parzen kernel-based WLS estimators appear to have smaller RMSEs than the cubic kernel-based ones. ⟨·,·⟩^(FTRK)_parzen is overall the best estimator in terms of RMSE among all the competing estimators, followed closely by ⟨·,·⟩^(PHY) and the WLS estimators using the adaptive choices of c and Q and the Parzen kernel implied weights 12 . Proposition 4 ensures that the corresponding noise variance estimators are consistent for ψ² IQ^X_1 and ψ² IQ^Y_1, respectively. Since the noises are assumed to be independent between X and Y, we do not examine the noise covariance in the simulation.

11 Here we take τ(t) = t to compute the bias of the WLS estimator as in Proposition 4, which is implied by the Poisson sampling scheme. For a more rigorous discussion, see Example 2.3 of Jacod et al. (2017). 12 In unreported simulation results, we find that the adaptive choices of c and Q optimized for the estimation of the integrated covariation also work well for the noise variance estimator.
As competitors to the WLS estimator of E[(ϵ^X_n)²], we first consider the noise variance estimator based on subsampled RV estimators in the spirit of Zhang et al. (2005), defined as:

To avoid the impact of dependence in the noise dynamics, we choose m = 20. Second, we construct the ReMeDI estimator of Li and Linton (2021) for the MMS noise variance, which is consistent under very general noise dynamics. The ReMeDI estimator with tuning parameter k is defined as:

We consider two choices of k in our simulation. First, we consider a fixed choice of k = 10, following the simulation setting in Li and Linton (2021). Second, we compute the adaptive choice of k recommended by Li and Linton (2021), denoted by k̂. 13 The full simulation results for the four noise variance estimators are presented in Table 3.
The left panel of Figure 2 clearly shows that [X̃, X̃]^(m) drifts upward as m increases, leading to a negative noise variance estimate β̂/2 < 0. When we increase ψ² from 10⁻⁵ to 10⁻³, as in the right panel of Figure 2, the noise term dominates [X̃, X̃]^(m), resulting in a significantly improved fit of the WLS regression and a positive noise variance estimate. It is worth noting that the ReMeDI estimator is also negative using the data from the left panel, but the signs of the two estimators are not always consistent. Also, β̂ does not seem to be much affected by the size of the noise.
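The regression behind this can be sketched for a single asset with iid noise: the subsampled RV at scale m has mean approximately ⟨X, X⟩ + 2·((n − m + 1)/m)·ω², so a least squares fit across scales returns the variance estimate as the intercept and the noise variance as half the slope. The scale grid and the plain-OLS simplification (the paper weights this regression) are our illustrative choices.

```python
import numpy as np

def rv_scale(p, m):
    """Subsampled m-step realized variance of the log-price path p:
    sum of squared m-step returns divided by m (averaging over the
    m offset grids, edge effects ignored)."""
    r = p[m:] - p[:-m]
    return float(np.sum(r * r)) / m

def wls_noise_regression(p, scales):
    """With iid noise, E[RV^(m)] ~ IV + 2*((n-m+1)/m)*omega^2, so a
    least squares fit of RV^(m) on x_m = (n-m+1)/m gives the IV as
    the intercept and the noise variance as slope/2 (= beta_hat/2)."""
    n = len(p) - 1
    y = np.array([rv_scale(p, m) for m in scales])
    x = np.array([(n - m + 1) / m for m in scales])   # noise loading
    A = np.column_stack([np.ones_like(x), x])
    alpha, beta = np.linalg.lstsq(A, y, rcond=None)[0]
    return alpha, beta / 2        # (IV estimate, noise variance)
```

When the noise is tiny relative to the sample size, the slope is estimated with error of the same order as the noise variance itself, which is exactly how negative estimates arise.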
The possible negative noise variance estimates pose an important empirical question: is the noise variance significantly different from zero in the observed tick-by-tick data? 14 If one cannot reject the null hypothesis that the noise variance is zero, then a simple realized variance estimator for ⟨X, X⟩, or a Hayashi-Yoshida estimator for ⟨X, Y⟩, based on the tick-by-tick observations may be more accurate than a noise-robust method due to its faster convergence rate.

14 Note that when the noise is integrated, it is not identifiable from the efficient price process, which effectively produces a noise process with zero variance.
The WLS estimator of the noise variance provides a promising statistic for constructing such a test within a classic WLS regression framework, adding to the existing approaches of Jacod et al. (2017), Aït-Sahalia and Xiu (2019) and Li and Linton (2021). As this requires a careful study of the asymptotic variance of β̂, we leave it for future research.

Data, Summary Statistics, and Empirical Estimates
In this section, we conduct our main empirical analysis to examine the performance of our new estimator on real data. We use tick-by-tick high-frequency intraday data on 27 Dow Jones Industrial Average (DJIA) constituent stocks, 15 including, among others, Walmart (WMT) and Exxon Mobil (XOM), which are DJIA constituents throughout the sampling period. 16 We collect trade and quote data for these stocks from 2014 to 2020. Therefore, it is an appropriate sample with which to assess realized covariance estimators in the presence of heterogeneous trading. 18

We also report the noise variance estimates (henceforth abbreviated as NV) as defined in our simulation. 19 One observation is that our raw noise variance estimate is negative on average (−1.51 × 10⁻⁹). As explained in Figure 2, this is because the noise-to-signal ratio is in general very small relative to the sample size. This finding is confirmed by the ReMeDI estimator of the noise variance (NV^(LL) in the table), which also has a negative sample average of the same magnitude. Therefore, to ensure that Ê[·]^(WLS) estimates the MMS noise variance rather than estimation error, we truncate the NV estimates at zero by replacing all negative values with zero in the remainder of the paper. We find that both the NV and the truncated NV estimates have positive autocorrelation on average.

16 We do not include the MSRC estimator in our empirical analysis due to its asymptotic equivalence to the WLS estimator. 17 Order imbalance (resp. order flow) is defined as the daily log difference of the numbers of buyer-initiated trades (resp. volume) and seller-initiated trades (resp. volume). 18 Varneskov and Voev (2013) consider a heterogeneous trading portfolio by replacing two DJIA stocks with two 5-10 times less liquid non-DJIA stocks. In our sample, the most liquid DJIA stock is around 10 times more liquid than the most illiquid stock. Hence our sample is comparable to theirs in terms of the degree of heterogeneous trading. 19 To simplify our empirical analysis, we do not consider the noise covariance estimates here.

In-Sample Volatility and Correlation Modelling
We proceed to forecasting future realized variances and covariances. Since each estimator tends to predict itself better, and since the open-to-close covariance estimator is not positive definite (despite its unbiasedness in the long-run average sense), we use the five-minute realized covariance estimator (subsampled every second) as the forecasting target, following the literature (Hautsch et al., 2015). Besides using each estimator to forecast the target, we also check whether adding noise moments improves the forecasting performance. Since the differences among estimators are small, we stick to two estimators (CRK and WLS) throughout the main empirical analysis and report results for two additional estimators (FTRK and PHY) in Online Appendix Tables D.3 to D.6.
We use a HAR-DRD specification to forecast realized variances and correlations separately, as in Oh and Patton (2016) and Bollerslev et al. (2018). Specifically, we forecast the one day ahead realized variance using daily, weekly and monthly averages of lagged realized variances constructed from the CRK and WLS estimators. We also consider whether adding the noise variance (NV) to the HAR model improves the forecasting performance for the variance, as well as a specification including the interaction of NV and RV. For the correlation forecast, we use the scalar HAR model without noise moments, for simplicity. To facilitate a more meaningful interpretation of the average effect of the noise variance on future volatility, we resort to a panel data regression specification similar to Patton and Sheppard (2015). 20 The general specification for the variance prediction is as follows:

where σ²_{i,t} is the forecasting target, i.e. the five-minute subsampled realized variance of stock i on day t, and the lagged RV terms (resp. NV terms) capture the contribution of the past daily, weekly and monthly quadratic variation estimates (resp. noise variance estimates) to the prediction of the forecasting target. For the variance forecasts, we also include stock fixed effects (StockFE_i) to account for the heterogeneity in the levels of variances across stocks, and we compute robust standard errors clustered by stock. 21 For the correlation forecasts, we add individual element fixed effects (for the diagonal (1s) and off-diagonal (pairwise stock correlation) elements) and use robust standard errors clustered by element.

20 While the panel regression is more restrictive than asset-by-asset regressions, it allows an easy interpretation of the effect of lagged volatility and noise on future volatility.
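A single-asset sketch of this specification is given below; the panel structure, fixed effects, and clustered standard errors are stripped out for brevity, so this is an illustration of the regressor construction rather than the paper's estimator.

```python
import numpy as np

def har_nv_design(rv, nv):
    """Build the HAR(+NV) design from daily series rv, nv: regress
    rv[t+1] on a constant, the daily lag of rv, its weekly (5-day)
    and monthly (22-day) averages, and the daily lag of nv."""
    def avg(x, h, t):
        return x[t - h + 1 : t + 1].mean()
    rows, y = [], []
    for t in range(21, len(rv) - 1):      # need 22 days of history
        rows.append([1.0, rv[t], avg(rv, 5, t), avg(rv, 22, t), nv[t]])
        y.append(rv[t + 1])
    return np.array(rows), np.array(y)

# illustrative fit: beta = np.linalg.lstsq(X, y, rcond=None)[0]
```

The coefficient on the `nv[t]` column corresponds to the daily lagged noise variance term NV^(1) in the specification above.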
Notes to Table 5: Columns report the HAR model (with daily, weekly, and monthly averaged lagged realized variances RV^(1), RV^(5), and RV^(22)), the HAR model plus the daily lagged noise variance NV^(1), the HAR model plus daily, weekly, and monthly lagged noise variances NV^(1), NV^(5), and NV^(22), and the specification including the interaction between RV^(1) and NV^(1). We add stock fixed effects and use robust standard errors clustered by stock. The table also reports in-sample realized correlation forecasting results, where the dependent variable is the one day ahead subsampled 5-min realized correlation; here we only consider the scalar HAR model using daily, weekly, and monthly averaged lagged realized correlations (RCorr^(1), RCorr^(5), and RCorr^(22)) based on each estimator, with individual element (diagonal (1s) and off-diagonal (pairwise stock correlation) elements) fixed effects and robust standard errors clustered by element. Numbers in brackets are t-statistics. ***, **, and * refer to statistical significance at the 1%, 5%, and 10% levels.
Columns (1) to (4) and (6) to (9) of Table 5 report the in-sample variance forecasting results. The HAR models based on the different variance estimators have very similar performance, with R²s around 54%. As for the effect of the noise variance, the coefficient on the daily lagged noise variance is consistently negative and significant across estimators. The effect is not only statistically significant but also economically meaningful: a one standard deviation increase in the noise variance is followed by, on average, around 11% of a one standard deviation decline in the realized variance estimate on the next day. The coefficient remains negative and significant for the weekly lagged noise and turns positive but insignificant for the monthly lagged noise; the effect of noise thus seems to concentrate at short to mid horizons. Moreover, including the daily noise variance leads to an increase in model fit of about 1%, supporting the predictive power of the noise variance. We also find that the interaction terms between the noise variance and the realized variance are negative and significant across specifications, while the noise variance itself turns positive or insignificant once the interaction term is included. This indicates that the noise variance may also serve as a conditioning variable: high noise periods tend to be those with higher estimation errors, so future volatility depends less on past volatility. We refer to Section 5.4 for further discussion of the potential sources of this predictability.
Columns (5) and (10) report the in-sample correlation forecasting results. Daily, weekly, and monthly lagged correlation estimates all positively and significantly forecast the one day ahead realized correlations, and the results hold across estimators. The WLS estimator performs comparably to CRK, with an R² of about 68% in forecasting correlations.

Out-of-Sample Forecasting and Asset Allocation
We then check whether our findings are robust in an out-of-sample analysis. We first conduct the out-of-sample realized covariance forecasts. We use a rolling window regression estimation with a window size of 500 days (approximately one-third of the whole sample length) to estimate our models. We then move the window forward and re-estimate model parameters every day.
We forecast the one day ahead realized variance using the (panel) HAR model and, for parsimony, the HAR model plus the daily lagged noise variance (HARNV). For the correlation specification, we stick to the scalar HAR model. We use the predicted variances and correlations to reconstruct the conditional variance-covariance matrix and compare it with the target subsampled 5-min realized covariance. 23 An important question is when the noise variance improves the forecasting performance.
We consider a simple sub-sample analysis. We first calculate the daily cross-sectional average of the noise variances to obtain a measure of the aggregate noise level, and define high (low) noise variance periods as those when the aggregate noise variance is above (below) its time-series median over the out-of-sample period. We then calculate the RMSE separately for the high and low noise variance periods. As expected, RMSEs are larger in high noise periods than in low noise periods. Most importantly, however, the forecasting improvement from the noise variance is concentrated in high noise periods; including the noise variance does not reduce forecasting errors when the noise is small. Our findings suggest that when the noise is high, forecasting volatility is more difficult, so if the noise variance contains some missing information about the efficient price, including it in the model improves the forecasting performance. In contrast, when the noise variance is small, i.e. the observed price is close to the latent efficient price, using the lagged realized variance to forecast the future variance already performs well, and including the noise does not improve the forecasting performance.
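The sub-sample split can be sketched as follows; the array shapes and the equal-weight cross-sectional average are our assumptions.

```python
import numpy as np

def rmse_by_noise_regime(errors, noise):
    """Split days into high/low noise regimes by the median of the
    cross-sectional average noise variance, and report the forecast
    RMSE in each regime. `errors`: (T,) daily forecast errors;
    `noise`: (T, n_stocks) daily noise variance estimates."""
    agg = noise.mean(axis=1)            # aggregate daily noise level
    high = agg > np.median(agg)
    rmse = lambda e: float(np.sqrt(np.mean(e ** 2)))
    return rmse(errors[high]), rmse(errors[~high])
```

A forecasting improvement concentrated in high noise periods shows up as a larger RMSE reduction in the first element of the returned pair.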
Our results so far focus on statistical improvements. Do these forecasting improvements translate into economic gains? We consider a global minimum variance portfolio allocation following the literature (Varneskov and Voev, 2013; Hautsch et al., 2015; Bollerslev et al., 2018). In the global minimum variance problem, the only input is the day t forecast of the day t+1 variance-covariance matrix Σ_{t+1|t}. Therefore, based on our model forecasts of the conditional variance-covariance matrix, the optimal portfolio weight vector w_t can be obtained as w_t = Σ_{t+1|t}^(−1) ι / (ι′ Σ_{t+1|t}^(−1) ι), where ι is a vector of ones. We then multiply the weights by the individual stock return vector to obtain the portfolio return (r_{p,t} = w′_t r_t). We report the portfolio mean return, return standard deviation, Sharpe ratio, and maximum drawdown. To assess the economic benefit of the forecasting improvement, we use the annualized performance fee ∆ (or utility gain/certainty equivalence) based on the change of average utility relative to the benchmark portfolio (the HAR model in our case, ∆_HARNV = Ū_HARNV − Ū_HAR), where the average utility is defined as Ū = r̄_p − (γ/2) σ̂²_p. 24 We report the annualized performance fee for each noise-augmented portfolio relative to the HAR benchmark portfolio based on a moderate level of the risk aversion parameter (γ = 6). 25 HARNV portfolios generate positive performance fees relative to the HAR portfolios of 5 to 8 basis points per annum. These economic improvements are positive and small, in line with the small magnitude of the forecasting improvements, albeit not statistically significant. 26 The small magnitudes of the economic improvements are comparable to those in prior studies considering different covariance estimates (e.g. Varneskov and Voev (2013)). To assess the feasibility of the strategy, we also consider performance fees using transaction cost adjusted portfolio returns. 27 We find that our main results remain valid, despite a drop in economic magnitudes. We also find that the positive performance fees are mainly generated in high noise variance periods, consistent with our findings for the forecasting errors. In summary, our results support the out-of-sample predictive power of the noise variance.

Overall, our findings show that including the noise variance leads not only to more accurate covariance forecasts but also to stronger portfolio performance.

24 We recognize that portfolio volatility is slightly larger for HARNV in the case of CRK, and the portfolio performance improvements largely stem from the mean effect. One plausible explanation is as follows. Noise variance matters more in a high noise environment. Since high noise variance is followed by lower predicted realized variance, the global minimum variance portfolio tends to assign higher weights to stocks with lower predicted variances, ceteris paribus. Due to the leverage effect (i.e. the negative volatility-return relation), these stocks tend to have a higher average return. Therefore, the portfolio earns a higher average return compared to the portfolio without noise variance information. In an unreported analysis, we also check short-selling constrained portfolios and find results similar to our main findings. Hence our results are not due to the alternative explanation that extremely negative portfolio weights on stocks with high volatility and negative returns may artificially boost portfolio performance. Prior studies, such as Voev (2009) and Cenesizoglu and Timmermann (2012), discuss in more detail the potentially different performance under statistical and economic criteria. 25 Our results hold when we use alternative risk aversion levels of γ = 1 or γ = 10.
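The allocation step can be sketched as follows; the closed-form global minimum variance weight is standard, while the per-period utility here omits the annualisation used in the paper.

```python
import numpy as np

def gmv_weights(sigma):
    """Global minimum variance weights: w = S^{-1} i / (i' S^{-1} i),
    where i is a vector of ones and S the forecast covariance matrix."""
    iota = np.ones(sigma.shape[0])
    x = np.linalg.solve(sigma, iota)
    return x / (iota @ x)

def avg_utility(rp, gamma=6.0):
    """Mean-variance average utility U = mean(r_p) - (gamma/2) var(r_p);
    the performance fee of strategy A over benchmark B is U_A - U_B."""
    return float(rp.mean() - 0.5 * gamma * rp.var())
```

The weights sum to one by construction and tilt toward assets with lower forecast variance, which is exactly the channel discussed in the footnote above.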

Understanding Microstructure Noise
We now explore what might drive the predictive power of the noise variance. One interesting observation from the forecasting analysis is that the noise variance is negatively correlated with future volatility. Conventional wisdom suggests that microstructure noise mainly reflects market illiquidity, which tends to be positively associated with contemporaneous volatility. Due to the persistence of volatility, the noise would then be expected to be positively correlated with future volatility; the negative correlation we find contradicts this illiquidity-based explanation. We therefore conduct further empirical analyses to understand the microstructure noise.
First, we regress the noise variance on the set of microstructure variables considered in Section 5.1, using panel regression specifications similar to those above, in both contemporaneous and predictive versions. Table 7 reports the relationship between the noise variance and the microstructure variables. The noise variance is positively correlated with bid-ask spreads and negatively correlated with market depth measures; hence when trading costs are high or market depth is low, the noise variance is high, consistent with the illiquidity interpretation. However, we also find that the noise variance is positively correlated with the number of trades and the trading volume per day. This indicates that while the noise variance captures some dimensions of illiquidity (trading costs, depth), it may not reflect others. These features may also drive the observed price away from the fundamental efficient price and hence enlarge the noise variance. Furthermore, the noise variance is negatively correlated with variables related to net buying pressure, such as order flow and order imbalance; the noise may therefore partially reflect signed information not fully captured by the current stock price. Despite these associations, the models explain only about 0.2% to 1.2% of the variation in the noise variance, so a large proportion of the noise variance remains unexplained.

26 We use the Diebold and Mariano (1995) (DM) test for the inference on the performance fee and the Ledoit and Wolf (2008) robust Sharpe ratio test for the inference on the Sharpe ratio improvement. 27 The transaction cost adjusted portfolio return is calculated by subtracting the transaction cost, scaled by the turnover, from the raw portfolio return.
Second, we directly include these microstructure variables along with the noise variance into our baseline HAR specifications. The intuition is as follows: if the predictive power of the noise variance mainly stems from market illiquidity proxies or other microstructure variables, we expect to see the predictive power of the noise variance declining when these microstructure variables are explicitly incorporated. Table 8 reports these regression findings. To save space, we only consider results based on two volatility estimators: CRK and WLS. 28 We find that number of trades, spread, volume, and depths are all significantly correlated with the future volatility. However, the noise variance remains negative and significant after controlling for these variables, despite the slight drop in economic magnitude. Therefore, our findings indicate that the market illiquidity is unlikely to be the sole or a major driver for the predictive power of the noise variance.
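The baseline HAR-plus-noise-variance conditional-mean specification used throughout this section can be sketched as below. This is our illustrative reconstruction with synthetic inputs; the paper's version targets the sub-sampled 5-min RV and adds stock fixed effects with cluster-robust standard errors, which we omit here.

```python
import numpy as np

def har_nv_design(rv, nv):
    """Build the HAR-NV design matrix: daily, weekly (5-day), and
    monthly (22-day) averages of lagged RV, plus lagged noise variance.
    rv, nv are 1-d arrays of daily realized and noise variances."""
    T = len(rv)
    rows, y = [], []
    for t in range(22, T - 1):
        rv_d = rv[t]                         # RV(1): daily lag
        rv_w = rv[t - 4:t + 1].mean()        # RV(5): weekly average
        rv_m = rv[t - 21:t + 1].mean()       # RV(22): monthly average
        rows.append([1.0, rv_d, rv_w, rv_m, nv[t]])
        y.append(rv[t + 1])                  # one-day-ahead target
    return np.array(rows), np.array(y)

def fit_ols(X, y):
    """Plain OLS; the paper uses panel fixed effects and clustering."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

A negative coefficient on the last column (the lagged noise variance) corresponds to the paper's finding that high noise predicts lower future volatility.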
Third, we discuss the interaction between noise variance and RV, as well as the relation between noise variance, realized quarticity (RQ), and jumps. Our interaction specification in Table 5 suggests that the noise variance may serve as a conditioning variable: besides forecasting future volatility directly, noise may influence future volatility through its contemporaneous relation with current volatility. One possibility is that, besides market illiquidity, noise also reflects the measurement error of the variance estimates.[29] When noise is high, volatility is measured less accurately due to noise contamination. This explanation is analogous to the idea of the HARQ model in Bollerslev et al. (2016) and Bollerslev et al. (2018), in which future volatility depends less on current volatility when the realized quarticity (measurement error) is high. An implication is that the positive predictive power of the lagged volatility is weakened when the noise variance is high. Our results in Table 5 show that the interaction term is negative and significant. In contrast, the noise variance coefficient becomes largely insignificant or turns positive.[30] Our findings are therefore consistent with the idea that future volatility depends less on current volatility when the noise variance is large. The results in Table 6 further confirm that the noise variance plays a stronger role in high-noise environments. These results are consistent with the error-based explanation.

[28] Results also hold for FTRK and PHY, as shown in Table D.5 in the Online Appendix. [29] It is well known that the asymptotic variance of noise-robust volatility estimators depends on the variance of the noise; see, e.g., Zhang et al. (2005); Zhang (2006); Barndorff-Nielsen et al. (2008). Therefore, when the noise variance is large, volatility estimates tend to have larger estimation errors, and hence past volatility is less informative about future volatility. Accounting for this feature in the interaction specification therefore contributes to the forecast improvement.

Notes to Table 8: The dependent variable is the sub-sampled 5-min realized variance one day ahead. We add seven microstructure variables: number of trades (NoT), bid-ask spread (Spread), trading volume (Volume), bid and offer depth, order imbalance (OI), and order flow (OF) to the HAR model (the model with daily, weekly, and monthly averaged lagged realized variances RV(1), RV(5), and RV(22), respectively) along with the daily lagged noise variance (NV(1)). HAR-RV refers to controlling for RV(1), RV(5), and RV(22). We add stock fixed effects and use robust standard errors clustered by stock. Numbers in brackets are t-statistics. ***, **, and * refer to statistical significance at the 1%, 5%, and 10% levels. Panel A reports results using the CRK realized variance, while Panel B reports results using the WLS realized variance.
One may wonder whether noise variance simply reflects RQ. We therefore explicitly control for the interaction between realized variance and the square root of RQ as in the HARQ model.
In addition, since neither noise nor jumps, which capture large and discontinuous price movements, are directly observable, it is also interesting to check whether the predictive power of the noise variance survives controlling for jumps. Andersen et al. (2007) examine the role of jumps in volatility forecasting. We construct two popular jump variation measures, following the bipower variation approach of Barndorff-Nielsen and Shephard (2006) and the median RV approach of Andersen et al. (2012), and control for jumps in our baseline specifications. Table 9 reports results on the relation between noise variance, RQ, and jumps.[31] Controlling for the interaction between RV and the square root of RQ, the noise variance and the interaction between RV and NV remain negative and significant. Therefore, while the noise-RV interaction seems to share the same error-based intuition, the noise variance does not simply reflect RQ.
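The two jump-variation measures referred to above can be sketched as follows. The scaling constants are the standard ones from the cited bipower-variation and MedRV papers, but this is an illustrative implementation, not the authors' code.

```python
import numpy as np

def bipower_variation(r):
    """Barndorff-Nielsen & Shephard bipower variation: robust to jumps."""
    return (np.pi / 2) * np.sum(np.abs(r[1:]) * np.abs(r[:-1]))

def med_rv(r):
    """Andersen, Dobrev & Schaumburg MedRV estimator: the median over
    rolling triples of absolute returns discards isolated jumps."""
    n = len(r)
    med = np.median(np.abs(np.column_stack([r[:-2], r[1:-1], r[2:]])), axis=1)
    return (np.pi / (6 - 4 * np.sqrt(3) + np.pi)) * (n / (n - 2)) * np.sum(med ** 2)

def jump_variation(r):
    """JV = max(RV - BV, 0): the jump component of total variation."""
    rv = np.sum(r ** 2)
    return max(rv - bipower_variation(r), 0.0)
```

On a return series with a single large jump, RV is inflated while BV and MedRV are barely affected, so JV picks up the jump.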
We also show that our results continue to hold after controlling for the two types of jump variation.
Therefore, the information content of the noise variance cannot be attributed to jumps. In short, our results suggest that the illiquidity- and error-based explanations may jointly account for the predictive power of the noise variance. We leave it to future studies to explore other plausible explanations.

[30] In the case of PHY, as shown in Table D

Notes to Table 9: We control for two measures of jump variation based on the bipower (JV(1)bp) and median value (JV(1)med) methods. We add stock fixed effects and use robust standard errors clustered by stock. Numbers in brackets are t-statistics. ***, **, and * refer to statistical significance at the 1%, 5%, and 10% levels.

Conclusion
In this paper, we propose a novel weighted least squares estimator to measure realized covariation. We show that it is asymptotically equivalent to the rate-optimal multi-scale estimator of Bibinger (2011, 2012) and derive its asymptotic properties under general settings for the observation scheme and the MMS noise dynamics. We further conduct a comprehensive Monte Carlo simulation study: our new estimator has comparable, and often stronger, finite-sample performance relative to a set of well-known estimators in the literature. We also show that our method provides a reliable estimate of the MMS noise variance.
We then conduct an empirical analysis using high-frequency intraday data on 27 DJIA constituent stocks over the period from 2014 to 2020. Consistent with the simulation findings, our empirical analysis confirms the accuracy of our estimator in measuring realized covariation. Our estimator performs well when used to forecast future realized variance and correlation. While models based on different estimators generally perform similarly, including the noise variance extracted from our approach leads to consistent improvements in forecasting performance both in-sample and out-of-sample, and these improvements are concentrated in high-noise periods. The statistical forecasting improvement translates into a tangible, if economically modest, benefit. We also investigate the noise variance further. While the noise variance is correlated with several microstructure variables, its predictive power remains after these variables are controlled for. Instead, the interaction between noise and volatility plays an important role in the predictive power of the noise variance. Neither realized quarticity nor jump variation fully subsumes the predictive power of the noise variance.
Overall, this paper introduces a new econometric method to jointly estimate realized covariation and noise variance using high-frequency data. Future studies may consider potential applications of the new approach in other important economic contexts and further explore the information content of microstructure noise.
For the pre-averaging step, the bandwidth is chosen to be $k_n = \lceil 0.15\sqrt{N^X_1 + N^Y_1}\rceil$, as suggested by Christensen et al. (2013). Note that the univariate version of this estimator takes a simpler form, as described in Remark 3.2 of the paper.
The composite realized kernel combines univariate realized kernel estimators: $\hat\psi_X$ is a univariate realized kernel estimator of $\langle X, X\rangle$ based on the refresh-time synchronized observations of $X$, with a Parzen kernel and bandwidth $\sqrt[3]{N}$; $\hat\psi_Y$ is constructed analogously. For the univariate case, we work with the full series of $X$ and $Y$ without synchronization; we re-compute the optimal bandwidth for each series, and the construction of the estimator is otherwise identical.
Note that the same $\hat c^*$ is also used for $\langle\widehat{X, X}\rangle^{(\mathrm{parzen})}$. Specifically, letting $\hat c^*_X$ and $\hat c^*_Y$ denote the tuning parameters estimated from $X$ and $Y$ using Eq. (A.4), we simply use them to construct the corresponding Parzen-kernel estimators (and their heteroscedasticity-corrected versions). For the tuning parameter of $\langle\widehat{X, Y}\rangle^{(\mathrm{cubic})}$, one simply scales the estimated tuning parameters of the corresponding Parzen kernel by $3.68/4.78 \approx 0.77$.
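A minimal sketch of a univariate realized kernel with Parzen weights follows. This is an illustrative implementation only: it omits the refresh-time synchronization, end-point treatment, and the estimated tuning parameter $\hat c^*$ used in the paper.

```python
import numpy as np

def parzen(x):
    """Parzen kernel weight for x in [0, 1]; zero outside."""
    if x < 0.5:
        return 1 - 6 * x ** 2 + 6 * x ** 3
    if x <= 1.0:
        return 2 * (1 - x) ** 3
    return 0.0

def realized_kernel(r, H):
    """Univariate realized kernel with Parzen weights and bandwidth H.
    r is a 1-d array of high-frequency returns."""
    rk = np.sum(r * r)                        # gamma_0: plain realized variance
    for h in range(1, H + 1):
        gamma_h = np.sum(r[h:] * r[:-h])      # h-th realized autocovariance
        rk += 2 * parzen(h / (H + 1)) * gamma_h
    return rk
```

With i.i.d. noise, the large negative first-order autocovariance of returns offsets the noise inflation in $\gamma_0$, which is why the kernel sum is far smaller than the raw realized variance on noise-dominated data.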

B Adaptive Choice of Q for the WLS Estimator
In this section we describe the algorithm for choosing $Q$, the number of fast scales we skip, for the WLS estimators. We first introduce an estimator of the noise autocovariance due to Jacod, Li, and Zheng (JLZ). Let $k$ denote a positive integer, understood as the window size for local averaging, and construct the sequence of pre-averaged prices $\{\bar X_n\}_{n=k,\dots,N-k}$. Suppose that the MMS noise $\epsilon^X_n$ is stationary with finite second moments, and let $\Gamma^X_m := E[\epsilon^X_n \epsilon^X_{n-m}]$ denote the $m$-th lag noise autocovariance. The (upwardly biased) JLZ estimator $\hat\Gamma^{(JLZ)}_m$ of $\Gamma^X_m$ is defined according to their Eq. (3.10), where a factor $\sqrt{2}$ accounts for the fact that $\hat\Gamma^{(JLZ)}_m$ jointly estimates two noise cross-covariances.
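The paper's rule is based on the JLZ autocovariance estimator above. As a simplified stand-in for the same idea — skip fast scales until the estimated noise dependence falls inside a $\sqrt{N}$-type band — one might write the following; the thresholding constant `c` and the run length are our illustrative choices, not the paper's.

```python
import numpy as np

def choose_q(returns, max_lag=30, c=2.0, run=5):
    """Pick Q-hat as the first lag m such that `run` consecutive return
    autocorrelations starting at m all lie inside a +/- c/sqrt(N) band.
    With noise-dominated high-frequency returns, serial correlation in
    returns mirrors serial dependence in the MMS noise."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()
    n = len(r)
    band = c / np.sqrt(n)
    var = np.sum(r * r)
    rho = np.array([np.sum(r[m:] * r[:-m]) / var for m in range(1, max_lag + 1)])
    for q in range(1, max_lag - run + 2):
        if np.all(np.abs(rho[q - 1:q - 1 + run]) < band):
            return q
    return max_lag + 1   # noise dependence extends beyond max_lag
```

For i.i.d. noise, differenced prices have a large negative lag-1 autocorrelation and nothing beyond, so the rule skips only the first scale, mirroring the small $\hat Q$ reported in Table B.1 for weakly dependent noise.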
In Table B.1, we present the descriptive statistics of the adaptively chosen $\hat Q$ for both univariate and bivariate WLS estimators under all simulation settings considered in our paper.
First, the choice of $\hat Q$ to a good extent reflects the dependence structure of the noise. For example, the average $\hat Q$ is close to 1 in the bivariate case, where the noise processes are independent.
For the univariate case, we find that $\hat Q$ is close to 2 for DGP 1 and is much larger for DGPs 2 and 3, reflecting the dependence structure of the noise. The range and standard deviation of $\hat Q$ generally increase as we observe more data, which is due to the $\sqrt{N}$ scaling in the bounds for the autocorrelations. Autocorrelation in the noise does not appear to have a large impact on $\hat Q$ for $\langle\widehat{X, Y}\rangle^{(WLS)}$, but it has a much larger impact on the univariate $\hat Q$s. In general, $\hat Q$ for $\langle\widehat{X, X}\rangle^{(WLS)}$ is much larger than that for $\langle\widehat{Y, Y}\rangle^{(WLS)}$, due to the more persistent noise dynamics. Overall, the results in Table B.1 show that our proposed algorithm succeeds in picking up the dependence structure of the MMS noise. This allows us to choose a $\hat Q$ that avoids lags with large noise autocorrelation, which is crucial for reducing the bias of the WLS estimator.

C Proofs
Proof of Proposition 1. We start by defining the vectors and matrices used in the WLS regression of Eq. (6), together with an auxiliary matrix, so that the WLS estimator of $\langle X, Y\rangle$ follows by construction. To derive the explicit expression of $\phi^{(m)}$, note that $\mathbf{X}' D \mathbf{X}$ can be computed by direct matrix multiplication in terms of three constants $C_1$, $C_0$ and $C_{-1}$. This matrix cannot be singular whenever $D$ is positive definite, which yields the stated expression.

Proof of Theorem 1. We start with a useful lemma which characterizes the asymptotic behaviour of Riemann sums involving $w(x)$ as $M \to \infty$.

Lemma 1. For $w(x)$ with dominating exponent $d_w$ and any $Q = O(1)$, define the Riemann sum $W_{p,q}$. With $w(x) = x^{d_w}$, we can rewrite $W_{p,q}$ as in Eq. (C.10). The sum on the RHS of that equation is known as a $p$-series, which is closely linked to Riemann's $\zeta$ function. Specifically, when $p + qd_w > -1$, the $p$-series diverges, and the stated result follows from an Euler-Maclaurin expansion of the $\zeta$ function, where $\gamma$ is the Euler-Mascheroni constant. Finally, when $p + qd_w < -1$, the $p$-series in Eq. (C.10) converges, which implies the last case of the lemma. This completes the proof.
In view of Lemma 1, all Riemann sums in $C_1$, $C_0$ and $C_{-1}$ converge to proper Riemann integrals when $d_w \ge 1$ for any $\delta \in (0,1)$, where $W_d$ is defined in Eq. (18). We now evaluate the difference $\phi^{(m)} - \varphi^{(m)}$ explicitly; the last estimate is obtained by plugging Eq. (C.14) into this expression and simplifying. As the result is independent of $m$, Eq. (15) is proved. To show that $\varphi^{(m)}$ satisfies conditions 1$'$ and 2$'$, we consider an alternative representation in terms of $\tilde C_{-1} = C_{-1} + 2C_0 + C_1$ and $\tilde C_0 = C_0 + C_1$, which admit simple representations. Note also that $C_1 \tilde C_{-1} - \tilde C_0^2 = C_1 C_{-1} - C_0^2$, and the asymptotic orders of $\tilde C_0$ and $\tilde C_{-1}$ remain unchanged. From here, the two required conditions $\sum_{m=Q}^{M} \varphi^{(m)} = 1$ and $\sum_{m=Q}^{M} \varphi^{(m)}/m = 0$ follow by direct computation. Eq. (17) follows by plugging the asymptotic expressions of $C_1$, $\tilde C_0$ and $\tilde C_{-1}$ into the expression for $\varphi^{(m)}$ and simplifying. Finally, the stated properties of $h(x)$ can be verified by straightforward calculus. This completes the proof.
Under Assumption 1, the discussion in Zhang (2006) and Bibinger (2012) suggests the stated asymptotic orders for the two leading terms in Eq. (C.18), where $\gamma^2 \propto \sum_{m=1}^{M} (\varphi^{(m)}/m)^2$ determines the asymptotic order of the noise term as a function of $\varphi^{(m)}$ (Zhang, 2006, Proposition 2). Therefore, $\langle\widehat{X, Y}\rangle^{(WLS)}$ converges at the fastest rate when the order of the discretization error equals that of the main noise term, while the remainder term satisfies $R_n + 2E[\epsilon^X_n \epsilon^Y_n] = O_p(M^{-1/2})$, which is dominated by the two leading terms whenever $\delta \ge 1/2$.
We now check the asymptotic order of $\gamma$ under different choices of $d_w$. Recall the definition of $\varphi^{(m)}$ in terms of $\tilde C_{-1}$, $\tilde C_0$ and $C_1$, defined in Eq. (C.14) and Eq. (C.17), whose asymptotic orders are identical to those of $C_{-1}$, $C_0$ and $C_1$ and depend on $d_w$.
We now study the asymptotic order of $\gamma^2$ as $M \to \infty$ by cases. Start with the first case $d_w > 0$. Eq. (C.14) holds for $C_1$ and $C_0$ whenever $d_w \ge 0$, while for $\tilde C_{-1}$ the Riemann sum still converges but at a possibly slower rate by Lemma 1; thus $\tilde C_{-1} = \frac{N^2}{M^2}\big(W_{-1} + O(M^{-(d_w \wedge 1)})\big) = O(N^2 M^{-2})$. Plugging the asymptotic orders of $C_{-1}$, $C_0$ and $C_1$ into $\varphi^{(m)}$, we can determine the asymptotic order of $\gamma^2$, which is governed by $O(M^{-3}) W_{-2,2}$, with $W_{p,q}$ defined in Lemma 1. Based on Lemma 1, we have three sub-cases; the first is similar to the case in Zhang (2006), with the optimal bandwidth $M = O(\sqrt{N})$ and the optimal convergence rate $N^{1/4}$. For the last case $d_w < 0$, all three Riemann sums in $C_1$, $C_0$ and $C_{-1}$ can diverge. Consider first the scenario $d_w \in (-1, 0)$, where only the Riemann sum in $C_{-1}$ diverges; the asymptotic order of $\gamma^2$ then follows. The optimal bandwidth is $M = O(N^{\frac{1}{2+d_w}})$ with convergence rate $N^{\frac{1+d_w}{4+2d_w}}$. The rate exponent is negative for any $d_w < -1$, which means that $\langle\widehat{X, Y}\rangle^{(WLS)}$ is inconsistent, and the proof is complete.
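As a quick sanity check of the rate expressions in the case $d_w \in (-1, 0)$, evaluating $M = O(N^{1/(2+d_w)})$ and the rate $N^{(1+d_w)/(4+2d_w)}$ at representative values gives:

```latex
% For d_w in (-1, 0): M = O(N^{1/(2+d_w)}), rate = N^{(1+d_w)/(4+2d_w)}.
\begin{aligned}
d_w \to 0^-:&\quad M = O(N^{1/2}), & \text{rate} &\to N^{1/4}
  \quad \text{(the optimal rate of Zhang, 2006)},\\
d_w = -\tfrac{1}{2}:&\quad M = O(N^{2/3}), & \text{rate} &= N^{1/6},\\
d_w \to -1^+:&\quad M = O(N), & \text{rate} &\to N^{0}
  \quad \text{(convergence breaks down)},\\
d_w < -1:&\quad & &\text{rate exponent negative: inconsistency.}
\end{aligned}
```

The smooth limit into the $d_w = 0$ benchmark and the breakdown at $d_w = -1$ match the cases enumerated in the proof.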
Proof of Proposition 2. Recall from Eq. (C.2) the expression for $\hat\beta$, which by direct matrix calculation decomposes into four terms, (I)-(IV), plus the remainder term in Eq. (C.33); the four terms in Eq. (C.32) are asymptotically independent. Consider first the convergence rate of (I). From a bivariate version of Corollary 2 in Zhang (2006), the stated rate follows from the fact that $\frac{M}{N}\theta^{(m)} = \frac{m}{M^2}\big(g(\frac{m}{M}) + O(M^{-1})\big)$ has the same asymptotic structure as $a^{(m)}$ for some MSRC weight $g(x)$; (I) is therefore asymptotically negligible. For (II), a direct computation of its variance yields the same order as (I), so it is also asymptotically negligible. (III) is a standard sample mean, which is $O_p(N^{-1/2})$ by a classical CLT. Finally, the remainder term satisfies $\tilde R_N = O_p(N^{-3/4})$, which follows in the same spirit as Theorem 4 of Zhang (2006), and is again asymptotically negligible. The stated result for $\hat\beta$ follows, where $\tilde R_N$ is defined in Eq. (C.33) with $\theta^{(m)}$ replaced by $\Delta r^{(m)}$. Following the same proof as Proposition 2, we obtain the stated simplification due to Assumption 3.3; a similar simplification also applies to $\Gamma_m$. Using the properties of the WLS estimator, we obtain the desired expression for the expectations. For the asymptotic order of the bias, note that $\frac{\phi^{(m)}}{m}$ is of order $O(M^{-2})$ and $\frac{\theta^{(m)}}{m}$ is of order $O(M^{-1}N^{-1})$ when $d_w \ge 1$. Under the $q$-dependence assumption, the bias terms converge to zero in the limit for any $\delta \in (\frac{1}{3}, 1)$, and the convergence in probability follows. This completes the proof.
Proof of Proposition 4. Since the choice of $Q$ does not affect the asymptotic order of the WLS weights $\phi^{(m)}$ and $\theta^{(m)}$, we take $Q = 1$ for simplicity. We first consider the result for $\langle\widehat{X, X}\rangle^{(WLS)}$, starting with the usual decomposition, where $U^{(m)} = -\frac{2}{m}\sum_{n=m}^{N} e^X_n e^X_{n-m}$. The result $\sum_{m=1}^{M}\phi^{(m)}[X,X]^{(m)} \xrightarrow{p} \langle X, X\rangle$ is standard. As each $U^{(m)}$ is the end-point of a zero-mean martingale and independent across $m$, the weighted sum has variance converging to zero, so that $\sum_{m=1}^{M}\phi^{(m)} U^{(m)} \xrightarrow{p} 0$. Therefore the asymptotic limit of $\langle\widehat{X, X}\rangle^{(WLS)}$ is determined by the last two terms in Eq. (C.45). We consider the cross-term first. Clearly, the first term in Eq. (C.47) is also the end-point of a zero-mean martingale, and by an argument similar to Eq. (57) of Zhang (2006), its variance converges to zero in the limit. For the second term in Eq. (C.47), we use a Riemann sum approximation based on $\Delta X_n \approx \sigma^X_{t_n}\Delta B^X_n + \mu^X_{t_n}\Delta t_n$, which holds by the continuity of $\sigma^X_t$ and $\mu^X_t$, noting that $\Delta t_n = O(N^{-1})$ by Assumption 4. Its variance then satisfies the stated bound, and therefore $\sum_{m=1}^{M}\phi^{(m)}[X, e^X]^{(m)} \xrightarrow{p} 0$. Now for $R_n$: the first term is a straightforward Riemann sum. For the second term, by the mean value theorem we find $\omega(\tau(m/N))^2 = \omega(0)^2 + O(mN^{-1})$ and, similarly, $\omega(\tau(1 - m/N))^2 = \omega(1)^2 + O(mN^{-1})$, which follows from $\tau(0) \to 0$ and $\tau(1) \to 1$ by Assumption 3. Moreover, the asymptotic order of the variance of $R_n$ can be derived using the same argument as Eq. (53) of Zhang (2006), and is $O(M^{-1})$. We thus have $R_n \xrightarrow{p} 2\int_0^1 \omega(\tau(u))^2\,du - (\omega(0)^2 + \omega(1)^2)$, which proves the required consistency of $\langle\widehat{X, X}\rangle^{(WLS)}$.
We proceed to derive the result concerning $E[(\epsilon^X_n)^2]$. Write $\hat\beta_1^{(WLS)}$ as before. As $\sum_{m=1}^{M}\theta^{(m)} = 0$ and $\theta^{(m)} = \frac{M}{N}\phi^{(m)}$, the first three terms in Eq. (C.55), and also the second term in $\tilde R_n$, converge to zero in probability. The first term in $\tilde R_n$ is identical to Eq. (C.52), which implies that $\hat\beta_1^{(WLS)} \xrightarrow{p} 2\int_0^1 \omega(\tau(u))^2\,du$ as desired, and the proof is complete.
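The moment structure used in the proofs — subsampled realized variances whose noise bias scales like $2N/m$, so that a regression across scales $m$ identifies both the integrated variance (intercept) and the noise variance (slope) — can be sketched numerically. The $m$-proportional weights below are an illustrative choice, not the optimal WLS weights $\phi^{(m)}$ derived in the paper.

```python
import numpy as np

def subsampled_rv(x, m):
    """[X, X]^(m): sum of squared m-step price increments divided by m,
    i.e. the realized variance at scale m averaged over the m subgrids."""
    d = x[m:] - x[:-m]
    return np.sum(d * d) / m

def wls_fit(y, scales, n_obs):
    """Weighted least squares of y_m = IV + (2 * n_obs / m) * omega2 + error.
    Returns (IV-hat, omega2-hat). Weights w_m = m (illustrative only)."""
    x = 2.0 * n_obs / scales
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(scales.astype(float))
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0], beta[1]

def wls_realized_variance(x, M, Q=1):
    """Joint estimate of integrated variance and noise variance from prices x."""
    scales = np.arange(Q, M + 1)
    y = np.array([subsampled_rv(x, m) for m in scales])
    return wls_fit(y, scales, len(x) - 1)
```

On simulated noisy prices, the intercept recovers the integrated variance while the slope recovers the noise variance, mirroring the two consistency statements proved above.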
Proof of Corollary 1. We first note that, under the assumptions of Proposition 4, the stated convergence holds, which is proved following the same steps as in the proof of Proposition 4; the only difference is that the leading term in Eq. (C.46) vanishes because $\sum_{m=Q}^{M}\varphi^{(m)} m = 0$ by construction. Therefore, we only need to show that $(X, X)^{\{Q\}} \xrightarrow{p} \omega(0)^2 + \omega(1)^2$. Notice that $(X, X)^{\{Q\}}$ has a similar structure to $[X, X]^{(Q)}$ and can be decomposed accordingly. The first term is straightforward; for the second term, the remaining steps follow analogously.

Notes to Online Appendix tables (in-sample forecasting): The dependent variable is the sub-sampled 5-min realized variance one day ahead. We focus on two estimators, FTRK (Panel A) and PHY (Panel B). For each estimator, we consider four volatility model specifications: HAR (the model with daily, weekly, and monthly averaged lagged realized variances RV(1), RV(5), and RV(22), respectively); HAR plus the daily lagged noise variance NV(1); HAR plus daily, weekly, and monthly lagged noise variances NV(1), NV(5), and NV(22); and the specification including the interaction between RV(1) and NV(1). We add stock fixed effects and use robust standard errors clustered by stock. The table also reports in-sample realized correlation forecasting results. The dependent variable there is the one-day-ahead sub-sampled 5-min realized correlation, and we only consider the scalar HAR model using daily, weekly, and monthly averaged lagged realized correlations (RCorr(1), RCorr(5), and RCorr(22)) based on each estimator. We add individual element (diagonal and off-diagonal, i.e., pairwise stock correlation, elements) fixed effects and use robust standard errors clustered by element. Numbers in brackets are t-statistics. ***, **, and * refer to statistical significance at the 1%, 5%, and 10% levels.

Notes to Online Appendix tables (out-of-sample forecasting and asset allocation): Panel A reports the out-of-sample average RMSE, in units of $10^{-4}$. We consider two estimators, FTRK and PHY.
We consider the HAR model and the HAR with daily lagged noise variance (HAR-NV) model to forecast the sub-sampled 5-min realized variance-covariance matrix based on a HAR-DRD specification (forecast the variances and correlations separately and then composite the covariance matrix), using rolling-window estimation with a window size of 500 days. Numbers in parentheses are Diebold-Mariano (DM) test t-statistics. Panel B reports out-of-sample asset allocation results: annualized mean, standard deviation, Sharpe ratio, maximum drawdown (MDD), the performance fee or utility gain ($\Delta$, in basis points, $10^{-4}$, per annum), and the performance fee accounting for transaction costs ($\Delta$tc). Numbers in brackets are Ledoit and Wolf robust Sharpe ratio test p-values; numbers in parentheses are DM test t-statistics. The risk aversion parameter is set at a moderate level of 6, and the portfolio return is adjusted for a moderate transaction cost of 0.1% when calculating the transaction-cost-adjusted performance fee. For each analysis, we also report results for high-noise and low-noise periods: high-noise periods are those in which the aggregate noise variance (cross-sectional average) is above its time-series median, and low-noise periods are those below the median.

(Table D.5, Panel A, FTRK: the NV(1) coefficients range from -0.1161*** to -0.1442*** across specifications.) Notes to Table D.5: The dependent variable is the sub-sampled 5-min realized variance one day ahead. We add seven microstructure variables: number of trades (NoT), bid-ask spread (Spread), trading volume (Volume), bid and offer depth, order imbalance (OI), and order flow (OF) to the HAR model (daily, weekly, and monthly averaged lagged realized variances RV(1), RV(5), and RV(22), respectively) along with the daily lagged noise variance (NV(1)). HAR-RV refers to controlling for RV(1), RV(5), and RV(22).
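The HAR-DRD compositing step described above can be sketched as follows. The global minimum-variance rule is an illustrative allocation only, not necessarily the mean-variance rule with risk aversion 6 used in the paper.

```python
import numpy as np

def compose_covariance(var_forecast, corr_forecast):
    """HAR-DRD composition: Sigma = D R D, where D is the diagonal matrix
    of forecast volatilities and R the forecast correlation matrix."""
    d = np.sqrt(np.asarray(var_forecast, dtype=float))
    return np.outer(d, d) * corr_forecast

def gmv_weights(sigma):
    """Global minimum-variance portfolio weights from a forecast covariance."""
    ones = np.ones(sigma.shape[0])
    w = np.linalg.solve(sigma, ones)   # direction of Sigma^{-1} * 1
    return w / w.sum()                 # normalize to sum to one
```

Forecasting the variances and correlations with separate HAR (or HAR-NV) models and recombining them guarantees a valid covariance matrix whenever the forecast correlation matrix is positive definite.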
We add stock fixed effects and use robust standard errors clustered by stock. Numbers in brackets are t-statistics. ***, **, and * refer to statistical significance at the 1%, 5%, and 10% levels. Panel A reports results using the FTRK realized variance, while Panel B reports results using the PHY realized variance.

Table D.6: Noise, realized quarticity, and jumps. This table reports in-sample volatility modelling with noise, including the interaction of noise variance and realized variance, controlling for realized quarticity (RQ) and jumps, using panel fixed-effects regressions.
All variables are standardized to facilitate interpretation. The dependent variable is the sub-sampled 5-min realized variance one day ahead. We consider two estimators: FTRK (Panel A) and PHY (Panel B). We add the interaction between the daily lagged noise variance and the daily lagged realized variance (NV(1) x RV(1)) to the HAR model (daily, weekly, and monthly averaged lagged realized variances RV(1), RV(5), and RV(22), respectively) plus the daily lagged noise variance (NV(1)). We also control for the interaction between RV and the square root of RQ (RV(1) x RQ^(1/2)) as in the HARQ model, and for two measures of jump variation based on the bipower (JV(1)bp) and median value (JV(1)med) methods. We add stock fixed effects and use robust standard errors clustered by stock. Numbers in brackets are t-statistics. ***, **, and * refer to statistical significance at the 1%, 5%, and 10% levels.