Mixed-normal limit theorems for multiple Skorohod integrals in high-dimensions, with application to realized covariance

This paper develops mixed-normal approximations for probabilities that vectors of multiple Skorohod integrals belong to random convex polytopes when the dimensions of the vectors possibly diverge to infinity. We apply the developed theory to establish the asymptotic mixed normality of the realized covariance matrix of a high-dimensional continuous semimartingale observed at a high frequency, where the dimension can be much larger than the sample size. We also present an application of this result to testing the residual sparsity of a high-dimensional continuous-time factor model.


Introduction
Covariance matrix estimation for multiple assets is one of the most active research areas in high-frequency financial econometrics. Recently, many authors have attacked the problem of high-dimensionality in covariance matrix estimation from high-frequency data. A pioneering work on this topic is the paper by Wang & Zou [68], where the regularization methods (banding and thresholding) proposed in Bickel & Levina [6,7] were applied to estimating high-dimensional quadratic covariation matrices from noisy and non-synchronous high-frequency data. Subsequently, their approach has been enhanced by several papers such as [42,44,65].
Meanwhile, such methods require a kind of sparsity of the target quadratic covariation matrix itself, which seems unrealistic for financial data in view of the celebrated factor structure such as the Fama-French three-factor model of [27]. To overcome this issue, Fan et al. [28] have proposed a covariance estimation method based on a continuous-time (approximate) factor model with observable factors, which can be seen as a continuous-time counterpart of the method introduced in Fan et al. [31]. The method has been further extended in various directions such as situations with unobservable factors, noisy and non-synchronous observations, heavy-tailed errors and so on; see [1,23,29,43,59] for details. As an alternative approach to avoid assuming the sparsity of the target matrix itself, Brownlees et al. [9] have proposed applying the graphical Lasso, which imposes sparsity on the inverse of the target matrix rather than on the target matrix itself.
On the empirical side, high-dimensional covariance matrix estimation from high-frequency financial data is particularly interesting in portfolio allocation. We refer to [30,49,67] for illustrations of relevant empirical work on this topic, in addition to the empirical results reported in the papers cited above.
To the best of the author's knowledge, however, there is no work establishing a statistical inference theory that validates simultaneous hypothesis testing and the construction of uniformly valid confidence regions for high-dimensional quadratic covariation estimation from high-frequency data. Such a theory is important in statistical applications, as illustrated by the following example: Let Y = (Y_t)_{t∈[0,1]} be a d-dimensional continuous semimartingale. We denote by Y^i the i-th component of Y for every i = 1, . . . , d. If one attempts to apply a regularization procedure to estimating the quadratic covariation matrix [Y, Y]_1 of Y, it is important to understand whether the target matrix is really sparse or not, and if so, how sparse it is. This amounts to evaluating the following series of statistical hypotheses simultaneously:

H^{ij}_0 : [Y^i, Y^j]_1 = 0 vs. H^{ij}_1 : [Y^i, Y^j]_1 ≠ 0, (i, j) ∈ Λ, (1.1)

where Λ := {(i, j) ∈ {1, . . . , d}^2 : i < j}. If one wants to test the null hypothesis that all the hypotheses H^{ij}_0 in (1.1) are true, it is natural to consider the maximum type statistic max_{(i,j)∈Λ} |[Y^i, Y^j]^n_1|, where [Y, Y]^n_1 denotes the realized covariance matrix of Y. More generally, if one wants to control the family-wise error rate in multiple testing for the hypotheses (1.1), it is enough to approximate the distribution of max_{(i,j)∈L} |[Y^i, Y^j]^n_1| for any L ⊂ Λ, with the help of the stepdown procedure illustrated in Romano & Wolf [61]. Hence the problem amounts to approximating the distributions of such maximum type statistics in an appropriate sense. Using the test statistics considered in Bibinger & Mykland [5], this type of testing problem can be extended to a sparsity test for the residual processes of a continuous-time factor model with an observable factor, and is thus promising in applications to high-frequency financial data.
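For concreteness, the maximum-type statistic over a subset L ⊂ Λ can be computed directly from any estimate of the quadratic covariation matrix. The following minimal Python sketch is illustrative only; the function names are ours, not the paper's:

```python
def max_statistic(m, pairs):
    """Maximum-type statistic max_{(i,j) in L} |M^{ij}| over a set of
    index pairs L, applied to an estimated covariation matrix m
    (a nested list of floats)."""
    return max(abs(m[i][j]) for (i, j) in pairs)

def upper_offdiagonal_pairs(d):
    """The index set Lambda = {(i, j) : i < j} (0-indexed here)."""
    return [(i, j) for i in range(d) for j in range(i + 1, d)]
```

For a 3 × 3 estimated matrix whose largest off-diagonal entry is −0.4 in absolute value, `max_statistic` over `upper_offdiagonal_pairs(3)` returns 0.4.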
In addition, such a problem will also be useful for covariance matrix modeling in a low-frequency setting, because that setting often suffers from the curse of dimensionality due to the increase in the number of unknown parameters to be estimated; it is thus common practice to impose a certain structure on covariance matrices to reduce the number of unknown parameters in models. For example, Tao et al. [64] have proposed fitting a matrix factor model to daily covariance matrices which are estimated from high-frequency data using the methodology of [68], while Kurose & Omori [46,47] have introduced a dynamic (multiple-block) equicorrelation structure to multivariate stochastic volatility models. The aforementioned testing will be useful for examining the validity of such specifications. If the dimension d is fixed, the desired approximation can be obtained as a simple consequence of a multivariate mixed-normal limit theorem for √n([Y, Y]^n_1 − [Y, Y]_1), which is well-studied in the literature and holds true under quite mild assumptions; see e.g. Theorem 5.4.2 of [35]. The problem here is how to establish an analogous result when the dimension d possibly diverges as n tends to infinity.
Indeed, even for sums of independent random vectors, it is far from trivial to establish such a result in a situation where the dimension is possibly (much) larger than the sample size. This is not surprising because the random vectors of interest are typically not tight in the usual sense in such a high-dimensional setting, so standard methods to establish central limit theorems no longer work. A significant breakthrough in this subject was achieved by the seminal work of Chernozhukov, Chetverikov & Kato [14], where a Gaussian approximation of the maxima of sums of independent random vectors in terms of the Kolmogorov distance was established under quite mild assumptions which allow the dimension to be (possibly exponentially) larger than the sample size. With the help of the Gaussian comparison theorem of Chernozhukov et al. [17], this enables us to construct feasible statistical inference procedures based on maximum type statistics.
Their theory, which we call the Chernozhukov-Chetverikov-Kato theory, or the CCK theory for short, has been developed in the subsequent work by Chernozhukov et al. [15,18] and Chernozhukov et al. [19]: the first two papers have developed Gaussian approximations of the suprema of empirical processes, while the latter has extended the results of [14] to a central limit theorem for hyperrectangles or, more generally, sparsely convex sets. Extension of the CCK theory to statistics other than sums of independent random vectors has also been studied in many articles: weakening the independence assumption has been studied in e.g. [10,16,70,71], while Chen [11] and Chen & Kato [12,13] have developed theories for U-statistics. Moreover, some authors have applied the CCK theory to statistical problems regarding high-frequency data; see Kato & Kurisu [40] and Koike [45]. Nevertheless, none of the above studies is applicable to our problem due to its non-ergodic nature. That is, the asymptotic covariance matrix is random and depends on the σ-field of the original probability space, so the asymptotic distribution is essentially non-Gaussian.
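The phenomenon exploited by the CCK theory can be checked numerically. The following self-contained sketch (the Rademacher design and all sample sizes are our arbitrary choices, not from the paper) compares the Monte Carlo law of the maximum of a normalized sum of independent high-dimensional vectors with the law of the maximum of a standard Gaussian vector of the same dimension:

```python
import bisect
import math
import random

def max_of_sum(n, d, rng):
    """max_j S_n^j for S_n = n^{-1/2} * (sum of n i.i.d. vectors with
    independent Rademacher (+/-1) coordinates)."""
    s = [0.0] * d
    for _ in range(n):
        for j in range(d):
            s[j] += 1.0 if rng.random() < 0.5 else -1.0
    return max(v / math.sqrt(n) for v in s)

def kolmogorov_distance(a, b):
    """Two-sample Kolmogorov distance between empirical cdfs."""
    a, b = sorted(a), sorted(b)
    def ecdf(xs, t):
        return bisect.bisect_right(xs, t) / len(xs)
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in a + b)

rng = random.Random(20240401)
sums = [max_of_sum(100, 50, rng) for _ in range(300)]
gauss = [max(rng.gauss(0.0, 1.0) for _ in range(50)) for _ in range(300)]
dist = kolmogorov_distance(sums, gauss)
```

Even with the dimension (50) of the same order as the sample size (100), the two empirical distributions of the maxima are close; this is the kind of Gaussian approximation that the CCK theory quantifies.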
Meanwhile, inspection of the proofs of the CCK theory reveals that most parts do not rely on any structure of the underlying statistics. To be precise, let S_n be the random vector corresponding to the objective statistic and suppose that we aim at approximating the distribution of S_n by its Gaussian analog S†_n, which has the same mean and covariance matrix as those of S_n. In the proofs of the CCK theory, the fact that S_n is a sum of independent random vectors is crucial only for obtaining a good quantitative estimate for the quantities |E[f(S_n)] − E[f(S†_n)]| for sufficiently smooth functions f. In the original CCK theory [14,19], such an estimate was established by the so-called Stein's method, especially Slepian's interpolation (also known as the smart path method) and Stein's leave-one-out method. Although their approach is not directly applicable to our problem, it suggests that we might alternatively use Malliavin's integration by parts formula, because it can be viewed as an infinite-dimensional version of Stein's identity (cf. Sakamoto & Yoshida [63]). In fact, recent active research in the probabilistic literature shows a beautiful harmony between Malliavin calculus and Stein's method, nowadays called the Malliavin-Stein method; we refer to the monograph [54] for an introduction to this subject. Indeed, this idea has already been applied in [45] to a situation where S_n is a vector of smooth Wiener functionals (especially multiple Wiener-Itô integrals) and S†_n is Gaussian, which has produced several impressive results. Our plan here is to apply this idea to a situation where S_n is a vector of multiple Skorohod integrals and S†_n is conditionally Gaussian. In this regard, a relevant result has been given in Theorem 5.1 of Nourdin et al. [53].
However, this result is not directly applicable to the current situation because it assumes that the components of S†_n are conditionally independent, which is restrictive in statistical applications (and in particular does not hold in the problem illustrated above). To remove this restriction from the result of [53], we employ the novel interpolation method introduced in Nualart & Yoshida [57], instead of the Slepian interpolation used in [53] and the original CCK theory.
Another problem in the present context is the validation of standardizing statistics by random variables. In a low-dimensional setting, this is typically achieved by proving so-called stable convergence in law (see e.g. [60] for details). However, in a high-dimensional setting, the meaning of stable convergence is unclear and its naïve extension is not useful because of the lack of the continuous mapping theorem and the delta method (see Section 3 for a relevant discussion). So we also aim at developing a formulation appropriate for validating such an operation.
The remainder of the paper is organized as follows. Section 2 is devoted to some preliminaries on notation and concepts used in the paper. Section 3 presents the main results obtained in this paper. In Section 4 we apply the developed theory to establish the asymptotic mixed normality of realized covariance matrices in a high-dimensional setting and illustrate its application to testing the residual sparsity of a continuous-time factor model. Section 5 provides a small simulation study as well as an empirical illustration using real data.
All the proofs are collected in the Appendix.

Preliminaries
In this section we present some notation and concepts used throughout the paper.

Basic notation
We begin by introducing some basic notation which is more or less common in the literature. For a vector x ∈ R^d, we write the i-th component of x as x^i for i = 1, . . . , d. Also, we set min x := min_{1≤i≤d} x^i. For two vectors x, y ∈ R^d, the statement x ≤ y means x^i ≤ y^i for all i = 1, . . . , d. For a vector x ∈ R^d and a scalar a ∈ R, we set x + a := (x^1 + a, . . . , x^d + a)^⊤. Here, ⊤ stands for the transpose of a matrix.
For a matrix A, we write its (i, j)-th entry as A_{ij}. Also, A_{i·} and A_{·j} denote the i-th row vector and the j-th column vector, respectively. Here, we regard both the vectors A_{i·} and A_{·j} as column vectors. If A is an m × d matrix, we denote by |||A|||_∞ the ℓ∞-operator norm of A: |||A|||_∞ := max_{1≤i≤m} Σ_{j=1}^d |A_{ij}|. If B is another m × d matrix, we denote by A · B the Frobenius inner product of A and B. That is, A · B := Σ_{i=1}^m Σ_{j=1}^d A_{ij} B_{ij}. For a d × d matrix A, we denote by diag(A) the d-dimensional vector consisting of the diagonal entries of A: diag(A) := (A_{11}, . . . , A_{dd})^⊤. For a random variable ξ and a number p > 0, we write ‖ξ‖_p = (E[|ξ|^p])^{1/p}. We also use the notation ‖ξ‖_∞ to denote the essential supremum of ξ. We will denote by L^{∞−} the space of all random variables ξ such that ‖ξ‖_p < ∞ for every p ∈ [1, ∞). The notation →^p stands for convergence in probability.
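The two matrix operations recalled above can be sketched in a few lines; this is a plain illustration in Python (our own helpers, not the paper's notation):

```python
def linf_operator_norm(a):
    """l_infinity operator norm |||A|||_inf = max_i sum_j |A_ij|,
    i.e. the maximum absolute row sum of the matrix a."""
    return max(sum(abs(x) for x in row) for row in a)

def frobenius_inner_product(a, b):
    """Frobenius inner product A . B = sum_{i,j} A_ij * B_ij of two
    equally-sized matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```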
If V is a real Hilbert space, we denote by ⟨·, ·⟩_V and ‖·‖_V the inner product and norm of V, respectively. Also, we denote by L^p(Ω; V) the set of all V-valued random variables ξ such that E[‖ξ‖_V^p] < ∞. Given real Hilbert spaces V_1, . . . , V_k, we write their Hilbert space tensor product as V_1 ⊗ · · · ⊗ V_k. For a real Hilbert space V, we write the kth tensor power of V as V^{⊗k}, i.e. V^{⊗k} := V ⊗ · · · ⊗ V (k factors).
Note that the Hilbert space tensor product is uniquely determined up to isomorphism, and we often select a convenient realization case by case. For example, we identify the tensor product V ⊗ R^d with the Hilbert space V^d equipped with the entrywise inner product. This is possible because the latter is the Hilbert space tensor product of V and R^d in the sense of Definition E.8 in [37]. Namely, there is a bilinear map T : V × R^d → V^d such that the range of T is total in V^d and ⟨T(f, a), T(g, b)⟩_{V^d} = ⟨f, g⟩_V ⟨a, b⟩_{R^d} for all f, g ∈ V and a, b ∈ R^d; indeed, the map T(f, a) := (a^1 f, . . . , a^d f)^⊤ has these properties. For an element f ∈ V^{⊗k}, we write the (canonical) symmetrization of f as Sym(f). Namely, the map V^{⊗k} ∋ f ↦ Sym(f) ∈ V^{⊗k} is characterized as the unique continuous linear operator on V^{⊗k} such that Sym(f_1 ⊗ · · · ⊗ f_k) = (k!)^{−1} Σ_{σ∈S_k} f_{σ(1)} ⊗ · · · ⊗ f_{σ(k)} for all f_1, . . . , f_k ∈ V, where S_k denotes the set of all permutations of {1, . . . , k}, i.e. the symmetric group of degree k. An element f ∈ V^{⊗k} is said to be symmetric if Sym(f) = f. We refer to Appendix E of [37] for details on Hilbert space tensor products.
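In coordinates, the symmetrization simply averages the entries of a tensor over all permutations of its indices. A small sketch (our own helper, assuming tensors stored as sparse dictionaries mapping index tuples to coefficients):

```python
import math
from itertools import permutations

def symmetrize(entries, k):
    """Coordinates of Sym(f): (Sym f)_{i_1...i_k} =
    (1/k!) * sum over sigma in S_k of f_{i_{sigma(1)}...i_{sigma(k)}}.
    entries: dict mapping k-tuples of indices to coefficients; absent
    keys are treated as zero."""
    keys = {p for idx in entries for p in permutations(idx)}
    out = {}
    for idx in keys:
        total = sum(entries.get(tuple(idx[sigma[m]] for m in range(k)), 0.0)
                    for sigma in permutations(range(k)))
        out[idx] = total / math.factorial(k)
    return out
```

Applying `symmetrize` twice gives the same result as applying it once, reflecting the fact that Sym is a projection.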

Multi-way arrays
In this subsection we introduce some notation related to multi-way arrays (or tensors) which are necessary to state our main results.
Given a positive integer N, we set [N] := {1, . . . , N} for short. We denote by K the real field R or the complex field C and consider a vector space V over K. Given q positive integers N_1, . . . , N_q, we denote by V^{N_1×···×N_q} the set of all q-way arrays with entries in V indexed by [N_1] × · · · × [N_q]. For an array T ∈ V^{N_1×···×N_q} and indices i_1 ∈ [N_1], . . . , i_q ∈ [N_q], we write the (i_1, . . . , i_q)-th entry of T as T_{i_1...i_q}. For two K-valued arrays S, T ∈ K^{N_1×···×N_q}, we define their Hadamard-type product (i.e. entrywise product) S ∘ T by (S ∘ T)_{i_1...i_q} := S_{i_1...i_q} T_{i_1...i_q}. Let m be a positive integer. For each j = 1, . . . , m, let V_j be a real Hilbert space, p_j ∈ N, N^{(j)} . . . Then we define . . . In particular, we write . . .
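The Hadamard-type product is just the entrywise product; for multi-way arrays represented as nested lists it can be sketched recursively (an illustrative helper of ours, not the paper's notation):

```python
def hadamard(s, t):
    """Entrywise (Hadamard-type) product of two equally-shaped
    multi-way arrays represented as nested lists:
    (S o T)_{i_1...i_q} = S_{i_1...i_q} * T_{i_1...i_q}."""
    if isinstance(s, list):
        return [hadamard(a, b) for a, b in zip(s, t)]
    return s * t
```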

Malliavin calculus
This subsection introduces some notation and concepts from Malliavin calculus used throughout the paper. We refer to Nualart [55], Chapter 2 of Nourdin & Peccati [54] and Chapter 15 of Janson [37] for further details on this subject.
Given a probability space (Ω, F , P), let W = (W(h)) h∈H be an isonormal Gaussian process over a real separable Hilbert space H.
Let V be another real separable Hilbert space. For any real number p ≥ 1 and any integer k ≥ 1, D^{k,p}(V) denotes the stochastic Sobolev space of V-valued random variables which are k times differentiable in the Malliavin sense and whose derivatives up to order k have finite moments of order p. If F ∈ D^{k,p}(V), we denote by D^k F the kth Malliavin derivative of F, which is a random variable taking its values in the space H^{⊗k} ⊗ V. For a positive integer q, we denote by δ^q the q-th multiple Skorohod integral, which is the adjoint operator of the densely defined operator L^2(Ω) ⊃ D^{q,2} ∋ F ↦ D^q F ∈ L^2(Ω; H^{⊗q}). That is, the domain Dom(δ^q) of δ^q is defined as the set of all H^{⊗q}-valued random variables u such that there is a constant C > 0 satisfying |E[⟨D^q F, u⟩_{H^{⊗q}}]| ≤ C‖F‖_2 for all F ∈ D^{q,2}, and the following duality formula holds for any u ∈ Dom(δ^q) and F ∈ D^{q,2}: E[Fδ^q(u)] = E[⟨D^q F, u⟩_{H^{⊗q}}].

Multi-indices
This subsection collects some notation related to multi-indices.

Main results
Throughout the paper, we consider an asymptotic theory in which the parameter n ∈ N tends to infinity.
For each n ∈ N, we consider a probability space (Ω_n, F_n, P_n), and we suppose that all the random variables at stage n are defined on (Ω_n, F_n, P_n). We also suppose that an isonormal Gaussian process W_n = (W_n(h))_{h∈H_n} over a real separable Hilbert space H_n is defined on (Ω_n, F_n, P_n). To keep the notation simple, we suppress the index n from (Ω_n, F_n, P_n), W_n and H_n, and write them simply as (Ω, F, P), W and H, respectively. In particular, note that the spaces and the operators associated with W (which are introduced in Section 2.3) implicitly depend on n, although we do not attach the index n to them.
For each n ∈ N, let M_n be a d-dimensional random vector consisting of multiple Skorohod integrals: M_n = (δ^{q_1}(u^1_n), . . . , δ^{q_d}(u^d_n))^⊤, where q_j is a positive integer and u^j_n ∈ Dom(δ^{q_j}) for every j. Here, we assume that the dimension d possibly depends on n as d = d_n, while the q_j's do not depend on n. We also assume d_n ≥ 3 for every n and q := sup_j q_j < ∞. Our aim is to study mixed-normal limit theorems for the functionals Z_n := M_n + W_n, where the W_n's are d-dimensional random vectors which represent the uncentered part of the functionals.
Let us introduce mixed-normal random vectors approximating the functionals Z_n in law as follows: Z̃_n := W_n + C_n^{1/2} ζ_n. Here, C_n is a d × d symmetric positive semidefinite random matrix and ζ_n is a d-dimensional standard Gaussian vector independent of F, which is defined on an extension of the probability space (Ω, F, P) if necessary.
The main aim of this paper is to investigate reasonable regularity conditions under which the distribution of Z_n is well-approximated by that of Z̃_n; to be precise, we are interested in results of the type sup_{x∈R^d} |P(Z_n ≤ x) − P(Z̃_n ≤ x)| → 0 as n → ∞. It is well-recognized in the statistics literature, however, that this type of result is usually insufficient for statistical applications because it does not ensure the validity of standardization by a random vector which is still random in the limit; such an operation is crucial for Studentization in the present context. In a low-dimensional setting, this issue is usually resolved by proving the stability of the convergence, so that (Z_n, X) →^L (Z̃_n, X) as n → ∞ for any m-dimensional (F-measurable) random variable X, where →^L denotes convergence in law. This statement is no longer meaningful in a high-dimensional setting such that d → ∞ as n → ∞, so we need to reformulate it appropriately. A naïve idea is to consider the statement

sup_{x∈R^d, t∈R^m} |P(Z_n ≤ x, X ≤ t) − P(Z̃_n ≤ x, X ≤ t)| → 0 as n → ∞. (3.1)

However, if m also depends on n, this type of statement is attractive from neither a theoretical nor a practical point of view, for the following reasons: From a theoretical point of view, we need to assume a so-called anti-concentration inequality for X to prove this type of result by the CCK approach, but it is usually hard to check such an inequality for general random variables, especially when m → ∞ as n → ∞. Besides, from a practical point of view, it is still unclear whether the convergence (3.1) ensures the validity of standardization of Z_n, because no analog of the continuous mapping theorem has been established yet for high-dimensional central limit theorems of the form (3.1). For these reasons we choose to directly prove convergence results for normalized statistics of Z_n. More formally, let Ξ_n be an m × d random matrix, where m = m_n ≥ 3 possibly depends on n. Our aim is to establish

sup_{y∈R^m} |P(Ξ_n Z_n ≤ y) − P(Ξ_n Z̃_n ≤ y)| → 0 (3.2)

as n → ∞ under reasonable regularity conditions on Z_n and Ξ_n.
Mathematically speaking, given a vector y ∈ R^m, the set {z ∈ R^d : Ξ_n z ≤ y} is a finite intersection of half-spaces in R^d, i.e. a convex polytope in R^d, so the convergence (3.2) can be considered as a high-dimensional central limit theorem for random convex polytopes. If we take Ξ_n to be the d × d diagonal matrix whose diagonal entries are the inverses of the "standard errors" of Z_n, the convergence (3.2) does ensure the validity of (marginal) standardization of Z_n. Now, our main theorem is stated as follows: Theorem 3.1. Suppose that M_n, W_n ∈ D^{q,∞}(R^d) and C_n ∈ D^{q,∞}(R^{d×d}) and that u^j_n is symmetric for all n and j. Suppose also that Ξ_n can be written as Ξ_n = Υ_n ∘ X_n with Υ_n an m × d (deterministic) matrix such that |||Υ_n|||_∞ ≥ 1 and X_n ∈ D^{q,∞}(R^{m×d}). Assume that the following convergences hold true as n → ∞ for every ν ∈ N*_4(q), where ∆_{n,j}(ν) is defined as displayed if ν ∈ ⋃_{α∈A(q_j)} N*_4(α) and ∆_{n,j}(ν) = 0 otherwise. Assume also that the following condition is satisfied. Then we have (3.2) as n → ∞.
where the last identity follows from the relation Σ_{k=1}^{q_j} k(ν_{k1} + ν_{k2} + ν_{k3} + ν_{k4}) = q_j. Hence, according to the notation defined by (2.1), we obtain the desired expression. Let us write down conditions (3.3)-(3.4) in the special case that q_j ∈ {1, 2} for all j. In this case, setting J_q = {j ∈ {1, . . . , d} : q_j = q} for q = 1, 2, we can rewrite these conditions as follows, where F_n, G_n ∈ {M_n, W_n}. In particular, when q_j = 1 for all j, they consist of the following convergences. When q_j = 2 for all j, they consist of the following convergences, where F_n, G_n ∈ {M_n, W_n}.
As a special case of Theorem 3.1, we can deduce a high-dimensional central limit theorem for multiple Skorohod integrals in hyperrectangles as follows. Let A^{re}(d) be the set of all hyperrectangles in R^d, i.e. A^{re}(d) consists of all sets A of the form A = {x ∈ R^d : a ≤ x ≤ b} for some a, b ∈ [−∞, ∞]^d. Taking m = d and Ξ_n = E_d in Theorem 3.1, where E_d denotes the identity matrix of size d, we obtain the following result (note that (3.2) continues to hold true when R^d is replaced by (−∞, ∞]^d): Corollary 3.1. Suppose that M_n, W_n ∈ D^{q,∞}(R^d) and C_n ∈ D^{q,∞}(R^{d×d}) and that u^j_n is symmetric for all n and j. Assume that the following convergences hold true. Assume also that the following condition is satisfied:

Some related results for statistical applications
In many applications, the objective variables are only approximately multiple Skorohod integrals. The following lemma is useful for such a situation.
In terms of statistical applications, the mixed-normal approximation given by Theorem 3.1 is often infeasible because the "asymptotic" covariance matrix C_n usually contains unobservable quantities. In the following we give two auxiliary results bridging this gap. The first result ensures the validity of estimating the F-conditional distribution of Ξ_n Z̃_n when we replace C_n, W_n and Ξ_n by their estimators.
We remark that the above proposition only gives a way to estimate the F-conditional distribution of Ξ_n Z̃_n when we have appropriate estimators for the relevant variables: it says nothing about how to estimate the unconditional distribution of Ξ_n Z̃_n. Because of the non-ergodic nature of the problem, in general there seems to be no hope of consistently estimating the latter quantity, even if we can consistently estimate the unknown variables contained in Ξ_n Z̃_n. In a low-dimensional setting this issue is usually resolved by standardizing the objective statistic by a consistent estimator for its asymptotic covariance matrix, which is validated via the stability of the convergence in law. In a high-dimensional setting, however, standardizing the (joint) distribution of the objective statistic is often difficult: estimators for the conditional covariance matrix of the objective statistic are usually singular because the sample size is smaller than the dimension, and even if an estimator is regular, computation of its inverse is typically time-consuming. Nevertheless, we can fortunately show that, in order to estimate quantiles of the unconditional distribution of Ξ_n Z̃_n, it is sufficient to estimate only its F-conditional distribution. We remark that this fact is already known in high-frequency financial econometrics and has typically been used to construct jump-related testing procedures; see [36,48] for example. Formally, we can prove the following result: Proposition 3.2. For each n ∈ N, let T_n, T†_n, T*_n be random variables defined on an extension of the probability space (Ω, F, P). Suppose that the distribution of T†_n has a density on E_n for every n and lim_{n→∞} P(E_n) = 1. For each n ∈ N, let q*_n be the F-conditional quantile function of T*_n: q*_n(α) := inf{x ∈ R : P(T*_n ≤ x | F) ≥ α}, α ∈ (0, 1). Then we have P(T_n ≤ q*_n(α)) → α as n → ∞ for all α ∈ (0, 1).
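In practice, the F-conditional quantile function q*_n is evaluated by Monte Carlo: conditionally on the data, one simulates many draws of T*_n and takes an empirical quantile. A minimal sketch (the generalized-inverse convention matches the definition of q*_n above; the function name is ours):

```python
import math

def conditional_quantile(draws, alpha):
    """Empirical version of q*_n(alpha) = inf{x : P(T*_n <= x | F) >= alpha},
    computed from Monte Carlo draws of T*_n generated conditionally on
    the observed data."""
    s = sorted(draws)
    k = math.ceil(alpha * len(s))  # smallest k with k/len(s) >= alpha
    return s[max(k - 1, 0)]
```

One then rejects at level α when the observed statistic T_n exceeds `conditional_quantile(draws, 1 - alpha)`, in line with Proposition 3.2.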
In this section we assume that the probability space (Ω, F, P) admits the structure Ω = Ω′ × W, F = F′ ⊗ B and P = P′ × P for some probability space (Ω′, F′, P′) and the r-dimensional Wiener space (W, B, P) over the time interval [0, 1], and consider the partial Malliavin calculus with respect to the r-dimensional Wiener process (cf. Section 6.1 of [69]). In this setting the Hilbert space H coincides with the space L^2([0, 1]; R^r). We here allow the dimension r = r_n to possibly depend on n ∈ N, so (Ω, F, P) and B may depend on n, but we suppress the index n from the notation. Let (B_t)_{t∈[0,1]} denote the filtration generated by the canonical process on W, and define the filtration (F_t)_{t∈[0,1]} by F_t := F′ ⊗ B_t. We consider a d-dimensional process Y = (Y_t)_{t∈[0,1]} given by the following: Y_t = Y_0 + ∫_0^t µ_s ds + ∫_0^t σ_s dB_s, where µ is a d-dimensional process and σ is an R^{d×r}-valued process. We remark that the processes µ and σ generally depend on n because d and r may depend on n. However, following the custom of high-dimensional statistics, we suppress the index n from the notation as above.
We observe the process Y at the discrete time points t_h = t^n_h = h/n, h = 0, 1, . . . , n. In such a setting, the discretized quadratic covariation matrix [Y, Y]^n_1 := Σ_{h=1}^n (Y_{t_h} − Y_{t_{h−1}})(Y_{t_h} − Y_{t_{h−1}})^⊤, which is known as the realized covariance matrix in high-frequency financial econometrics, is a natural estimator for the quadratic covariation matrix [Y, Y]_1 of Y. The aim of this section is to establish the asymptotic mixed normality of the estimator [Y, Y]^n_1 in a high-dimensional setting such that the dimension d is possibly (much) larger than the sample size n.
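The estimator is straightforward to compute from the sampled path; the following minimal Python sketch (illustrative only, with observations given on the grid t_h = h/n as in the text) implements the sum of outer products of increments:

```python
def realized_covariance(obs):
    """Realized covariance matrix [Y, Y]^n_1 = sum_{h=1}^n
    (Y_{t_h} - Y_{t_{h-1}})(Y_{t_h} - Y_{t_{h-1}})^T, where obs is the
    list of the n+1 observed d-dimensional vectors Y_{t_0}, ..., Y_{t_n}."""
    d = len(obs[0])
    rc = [[0.0] * d for _ in range(d)]
    for h in range(1, len(obs)):
        inc = [obs[h][i] - obs[h - 1][i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                rc[i][j] += inc[i] * inc[j]
    return rc
```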
Before stating the results, we introduce some notation. First, for a random variable F taking values in R^{N_1×···×N_q} for some N_1, . . . , N_q ∈ N, we set ‖F‖_{p,ℓ_2} := ‖ ‖F‖_{ℓ_2} ‖_p for every p ∈ (0, ∞]. Next, for a positive integer k, we identify the space H^{⊗k} with L^2([0, 1]^k; (R^r)^{⊗k}) in the canonical way (cf. Example E.10 in [37]). Therefore, if a univariate random variable F is k times differentiable in the Malliavin sense, the kth Malliavin derivative D^k F can be regarded as a function on [0, 1]^k taking its values in (R^r)^{⊗k}; we denote by D_{t_1,...,t_k}F its value at (t_1, . . . , t_k) ∈ [0, 1]^k. Moreover, for an index (a_1, . . . , a_k) ∈ {1, . . . , r}^k, we write the (a_1, . . . , a_k)-th entry of D_{t_1,...,t_k}F as D^{(a_1,...,a_k)}_{t_1,...,t_k}F (note that we identify (R^r)^{⊗k} with R^{r×···×r}). We remark that the variable D_{t_1,...,t_k}F is defined only a.e. on [0, 1]^k × Ω with respect to the measure Leb_k × P, where Leb_k denotes the Lebesgue measure on [0, 1]^k. Therefore, if D_{t_1,...,t_k}F satisfies some property a.e. on [0, 1]^k × Ω with respect to the measure Leb_k × P, by convention we will always take a version of D_{t_1,...,t_k}F satisfying this property and work with it. We define the d^2 × d^2 random matrix C_n as a sample analog of the matrix C̄ appearing in Remark 4.1 below; it plays the role of the conditional covariance matrix of the approximating mixed-normal distribution in our setting.
Remark 4.1. In the fixed-dimensional setting, C_n converges in probability as n → ∞, under mild regularity assumptions, to the random matrix C̄ whose ((i, j), (k, l))-th entry is ∫_0^1 (Σ^{ik}_t Σ^{jl}_t + Σ^{il}_t Σ^{jk}_t) dt, where Σ_t := σ_t σ_t^⊤, so C̄ plays the role of the asymptotic covariance matrix in such a setting.
However, in the high-dimensional setting the convergence rate of C_n to C̄ does matter, and we usually need an additional condition like (4.7) to derive it. To avoid such an extra assumption, we use the "intermediate version" C_n of C̄ to state Theorem 4.1 below.
Let X_n ∈ D^{2,∞}(R^{m×d^2}) and let Υ_n be an m × d^2 (deterministic) matrix such that |||Υ_n|||_∞ ≥ 1, where m = m_n possibly depends on n ∈ N. Define Ξ_n := Υ_n ∘ X_n and assume the conditions stated below. Then the following statements hold true. The assumptions require, in particular, that certain quantities involving the Malliavin derivatives of Y and σ do not diverge as n → ∞. This is a restriction because r typically diverges as n → ∞ in a high-dimensional setting. Such a condition is satisfied e.g. when Y^i and (σ^{i·}_t)_{t∈[0,1]} depend on only finitely many components of B for each i (the relevant components may vary with i, though). Therefore, it is satisfied if the price and volatility processes have a certain factor structure, which seems realistic in financial applications.
(c) The Malliavin differentiability condition on µ in Theorem 4.1 can be replaced by a continuity condition on µ analogous to (4.7). In fact, it is used only to prove Lemma B.6, where it is crucial only that µ is well-approximated by a "strongly predictable" process.
(d) Assumptions on the second Malliavin derivatives of the volatility process σ_t sometimes appear in high-frequency financial econometrics even in the fixed-dimensional case; see [21,22] for example. By a discussion analogous to the one before Corollary 3.1, we can deduce a high-dimensional central limit theorem for realized covariance in hyperrectangles from Theorem 4.1. In some situations, it is more convenient to consider a localized version of the assumptions of Theorem 4.1, and suppose that the following conditions are satisfied: (iv) For all ν ∈ N, (4.1) holds true with X_n and σ replaced by X_n(ν) and σ(ν), respectively.
To make Theorems 4.1-4.2 statistically feasible, we need to estimate the "asymptotic" covariance matrix C_n. We can construct a "consistent" estimator for C_n in the same way as in the low-dimensional setting of Barndorff-Nielsen & Shephard [4]: define the d^2-dimensional random vectors χ_h from the observed increments and build the estimator from them. Suppose that the following conditions are satisfied: (i) lim_{ν→∞} lim sup_{n→∞} P(Ω_n(ν)^c) = 0.
(iii) There is a constant γ ∈ (0, 1/2] such that the condition displayed below holds. Then the following statements hold true for all p ∈ [1, ∞) and ν ∈ N. Suppose also that d = O(n^c) as n → ∞ for some c > 0; then the corresponding approximation result holds. When the dimension d is very large, computation of C_n^{1/2} is practically challenging, so it is better to employ a (wild) bootstrap to generate random vectors having the same distribution as that of C_n^{1/2} ζ_n, as follows. Let (e_h)_{h=1}^∞ be a centered Gaussian process independent of F, defined on an extension of (Ω, F, P) if necessary, and define the bootstrapped statistic S*_n. The Gaussian process (e_h)_{h=1}^∞ must have an appropriate covariance structure so that the F-conditional covariance matrix of S*_n mimics C_n. As is well known in the literature (see e.g. [34]), the standard i.i.d. wild bootstrap fails to approximate the joint distributions of statistics in the present context. Alternatively, we assume that (e_h)_{h=1}^∞ is stationary with auto-covariance function E[e_h e_{h+k}] equal to 1 for k = 0, 1/2 for k = 1, and 0 for k ≥ 2. Then we can easily check that the F-conditional covariance matrix of S*_n is equal to C_n, so S*_n has the same distribution as that of C_n^{1/2} ζ_n. We remark that such a sequence (e_h)_{h=1}^∞ can be generated by a Gaussian MA(1) process, where (η*_h)_{h=0}^n is a sequence of i.i.d. centered Gaussian variables with variance 1/2. Therefore, we can rewrite S*_n accordingly; the second term on the right side of the resulting expression is usually asymptotically negligible, so the bootstrap procedure considered here is essentially the same as the wild blocks of blocks bootstrap proposed in Hounyo [34].
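The weight sequence described above can be simulated directly. In the sketch below we take e_h = η*_h + η*_{h−1}, which is one natural MA(1) form consistent with the stated variance 1/2 of the innovations (the exact displayed form is not given above, so this choice is our assumption); the empirical autocovariances then match the required values 1, 1/2 and 0:

```python
import random

def ma1_weights(n, rng):
    """Stationary Gaussian MA(1) weights e_h = eta*_h + eta*_{h-1},
    h = 1, ..., n, with (eta*_h) i.i.d. N(0, 1/2). Then Var(e_h) = 1,
    Cov(e_h, e_{h+1}) = 1/2 and Cov(e_h, e_{h+k}) = 0 for k >= 2."""
    eta = [rng.gauss(0.0, 0.5 ** 0.5) for _ in range(n + 1)]
    return [eta[h] + eta[h - 1] for h in range(1, n + 1)]
```

Multiplying these weights into the summands of the realized-covariance statistic yields the bootstrap draw S*_n whose F-conditional covariance matrix mimics C_n, as explained in the text.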

Testing the residual sparsity of a continuous-time factor model
As an application of the theory developed above, we consider the problem of testing the correlation structure of the residual process of a continuous-time factor model. This problem was investigated in Section 4 of Bibinger & Mykland [5] for the case of two assets, and we aim at extending their analysis to a situation with multiple assets. Specifically, we suppose that the d-th asset Y^d is regarded as an observable factor and consider the following continuous-time factor model: Y^j_t = β^j Y^d_t + R^j_t, j = 1, . . . , d − 1. Here, β^j is a constant and R^j is a semimartingale such that [R^j, Y^d] ≡ 0. For each (i, j) ∈ Λ_n, we consider the following hypothesis testing problem: H^{ij}_0 : [R^i, R^j]_1 = 0 vs. H^{ij}_1 : [R^i, R^j]_1 ≠ 0. (4.10) Our aim is to test the hypothesis (4.10) simultaneously for (i, j) ∈ Λ_n, but we start by constructing a test statistic for a fixed (i, j) ∈ Λ_n. For notational convenience, we construct the test statistic for every pair (i, j) in {1, . . . , d}^2.
Remark 4.4 (Sparsity test of the quadratic covariation matrix itself). Considering the case Y^d ≡ 0, we have R^j = Y^j for all j, hence the problem turns into multiple testing for the hypotheses (1.1).
We follow [5] and consider a statistic T^{ij} targeting [R^i, R^j]_1, which involves quantities that are not directly observable; it is therefore natural to consider an estimated version T̂^{ij}_n. In order to make the test statistic scale invariant, we consider the Studentized version of T̂^{ij}_n. According to [5], the "asymptotic variance" of T̂^{ij}_n is given by a statistic V̂^{ij}_n, which we define in the general situation as follows. Note that we can rewrite T̂^{ij}_n in a form to which the theory of Section 3 applies; therefore, a bootstrapped version of T̂^{ij}_n is defined accordingly, and we set T̂_{n,*} := (T̂^{ij}_{n,*})_{1≤i,j≤d}. We derive mixed-normal approximations for vec(T̂_n) and vec(T̂_{n,*}) by applying the theory developed above. For this purpose we define the d^2 × d^2 random matrix X_n by its entries for i, j = 1, . . . , d and k, l = 1, . . . , d. Note that the statistics vec(T̂_n) and vec(T̂_{n,*}) can be approximated by X_n S_n and X_n S*_n, respectively. We then obtain the following result.
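The displayed definition of T̂^{ij}_n does not appear above, so the following sketch uses one plausible realized-covariance form of the statistic, T̂^{ij}_n = [Y^i, Y^j]^n_1 − [Y^i, Y^d]^n_1 [Y^j, Y^d]^n_1 / [Y^d, Y^d]^n_1, which targets [R^i, R^j]_1 under the factor model; this form is our assumption, not necessarily the exact definition in [5]:

```python
def residual_cov_statistic(rc, i, j, f):
    """Hypothetical statistic for testing [R^i, R^j]_1 = 0 in the factor
    model: the realized covariance of assets i and j after removing the
    component explained by the observable factor asset f (0-indexed).
    rc is the realized covariance matrix as a nested list."""
    return rc[i][j] - rc[i][f] * rc[j][f] / rc[f][f]
```

Under the null, the statistic should be close to zero; its Studentization by the corresponding variance estimator then makes it scale invariant.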
where $V_n(\nu)$ is defined analogously to $V_n$ with $\Sigma$ replaced by $\Sigma(\nu)$.

Then we have
$$\sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)} \left| P(\mathrm{vec}(T_n) \in A) - P(X_n C_n^{1/2} \zeta_n \in A) \right| \to 0$$
and
$$\sup_{A \in \mathcal{A}^{\mathrm{re}}(d^2)} \left| P(\mathrm{vec}(T_{n,*}) \in A \mid \mathcal{F}) - P(X_n C_n^{1/2} \zeta_n \in A \mid \mathcal{F}) \right| \to^{p} 0$$
as $n \to \infty$, provided that $d = O(n^c)$ as $n \to \infty$ for some $c > 0$.
Now we return to the problem of testing (4.10) simultaneously for $(i, j) \in \Lambda_n$. Here, we consider a more general setting, described in the following, for the purposes of application (cf. Section 5.2). We suppose that the set $\Lambda_n$ is decomposed into non-empty disjoint sets $\Lambda_n^1, \dots, \Lambda_n^L$ as $\Lambda_n = \bigcup_{\ell=1}^{L} \Lambda_n^{\ell}$. We consider the problem of testing simultaneously for $\ell = 1, \dots, L$. Here, for a subset $\mathcal{L}$ of $\Lambda_n$, $\bigcap_{\lambda \in \mathcal{L}} H_0^{\lambda}$ (resp. $\bigcup_{\lambda \in \mathcal{L}} H_1^{\lambda}$) denotes the hypothesis that $H_0^{\lambda}$ is true for all $\lambda \in \mathcal{L}$ (resp. $H_1^{\lambda}$ is true for some $\lambda \in \mathcal{L}$). For simplicity of notation, we set $H_0^{\ell} := \bigcap_{\lambda \in \Lambda_n^{\ell}} H_0^{\lambda}$ and $H_1^{\ell} := \bigcup_{\lambda \in \Lambda_n^{\ell}} H_1^{\lambda}$. If we let $L$ be the number of elements in $\Lambda_n$, write $\Lambda_n = \{\lambda_1, \dots, \lambda_L\}$ and set $\Lambda_n^{\ell} = \{\lambda_{\ell}\}$ for $\ell = 1, \dots, L$, we recover the original problem of testing (4.10) simultaneously for $(i, j) \in \Lambda_n$.
Our aim is the strong control of the family-wise error rate (FWER) in this problem. More formally, let $\Theta_n$ be a set of pairs $(\mu, \sigma)$ of coefficient processes, which is considered as the set of all data generating processes we are interested in (note that the data generating process may vary with $n$, mainly because the dimensions $d$ and $r$ may depend on $n$). For each $\theta \in \Theta_n$, we denote by $\mathcal{L}_n(\theta)$ the set of all indices $\ell \in \{1, \dots, L\}$ for which the hypothesis $H_0^{\ell}$ holds true when $\theta$ is the true data generating process. Then, the FWER for $\theta \in \Theta_n$, denoted by $\mathrm{FWER}(\theta)$, is defined as the probability that $H_0^{\ell}$ is rejected for some $\ell \in \mathcal{L}_n(\theta)$ when $\theta$ is the true data generating process. Given the significance level $\alpha \in (0, 1)$, we aim at constructing multiple testing procedures such that
$$\limsup_{n \to \infty} \mathrm{FWER}(\theta_n) \le \alpha \qquad (4.13)$$
for any sequence $\theta_n \in \Theta_n$ ($n = 1, 2, \dots$) of data generating processes. To accomplish this, we employ the stepdown procedure of Romano & Wolf [61], which we describe in the following. First, for a given index $\ell$, we use the test statistic $T_n^{\ell} := \max_{\lambda \in \Lambda_n^{\ell}} |T_n^{\lambda}|$ for the problem (4.12). Next, we sort the observed test statistics in descending order and denote them by $T_n^{\ell_1} \ge \cdots \ge T_n^{\ell_L}$. Also, for every subset $\mathcal{L} \subset \{1, \dots, L\}$, suppose that we have a critical value $c_n^{\mathcal{L}}(1 - \alpha)$ to test the null $\bigcap_{\ell \in \mathcal{L}} H_0^{\ell}$ against the alternative $\bigcup_{\ell \in \mathcal{L}} H_1^{\ell}$. These critical values can be random variables and will be specified later. Then the stepdown procedure reads as follows: if $T_n^{\ell_1} \le c_n^{\{1, \dots, L\}}(1 - \alpha)$, then accept all the hypotheses and stop; otherwise, reject $H_0^{\ell_1}$ and continue.
At step $k$: if $T_n^{\ell_k} \le c_n^{\mathcal{L}_k}(1 - \alpha)$, where $\mathcal{L}_k$ denotes the set of indices of the hypotheses not yet rejected, then accept all the hypotheses $H_0^{\ell}$ for $\ell \in \mathcal{L}_k$ and stop; otherwise, reject $H_0^{\ell_k}$ and continue, and so on.
According to Theorem 3 of [61], the above stepdown procedure satisfies (4.13) if the critical values $c_n^{\mathcal{L}}(1 - \alpha)$, $\mathcal{L} \subset \{1, \dots, L\}$, satisfy the following conditions: whenever $\theta_n$ is the true data generating process for every $n$.
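The stepdown procedure described above can be sketched generically as follows; `stats` and the critical-value rule `crit` are placeholders to be supplied by either of the two methods for constructing critical values.

```python
def stepdown(stats, crit):
    """Generic stepdown multiple testing in the spirit of Romano & Wolf [61].

    stats : dict mapping hypothesis label -> observed test statistic
    crit  : function mapping a set of labels L -> critical value
            c_L(1 - alpha) for the joint null over L
    Returns the set of rejected labels.
    """
    remaining = set(stats)
    rejected = set()
    # sort observed statistics in descending order
    order = sorted(stats, key=lambda k: stats[k], reverse=True)
    for label in order:
        if stats[label] <= crit(frozenset(remaining)):
            break          # accept all remaining hypotheses and stop
        rejected.add(label)
        remaining.remove(label)
    return rejected

# toy run with a (hypothetical) monotone critical-value rule
stats = {"a": 4.0, "b": 2.5, "c": 1.0}
crit = lambda L: 2.0 if len(L) > 1 else 1.5
print(sorted(stepdown(stats, crit)))  # prints ['a', 'b']
```

Note that the critical value is recomputed over the shrinking set of surviving hypotheses at each step, which is what gives stepdown its power gain over single-step procedures.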
The first method to construct the desired critical values is the well-known Bonferroni-Holm method. Namely, we set $c_n^{\mathcal{L}}(1 - \alpha) := q_{N(0,1)}(1 - \alpha/(2\#[\bigcup_{\ell \in \mathcal{L}} \Lambda_n^{\ell}]))$ for every $\mathcal{L} \subset \{1, \dots, L\}$, where $q_{N(0,1)}$ denotes the quantile function of the standard normal distribution and $\#[\bigcup_{\ell \in \mathcal{L}} \Lambda_n^{\ell}]$ is the number of elements in $\bigcup_{\ell \in \mathcal{L}} \Lambda_n^{\ell}$. The second method is to use the $(1 - \alpha)$-quantile of $\max_{\ell \in \mathcal{L}} T_n^{\ell}$. Of course, we cannot analytically compute the quantiles of $\max_{\ell \in \mathcal{L}} T_n^{\ell}$ in general, so we approximate them by resampling as in [14,61]. Formally, setting $T_{n,*}^{\ell} := \max_{\lambda \in \Lambda_n^{\ell}} |T_{n,*}^{\lambda}|$, we use the $\mathcal{F}$-conditional $(1 - \alpha)$-quantile of $\max_{\ell \in \mathcal{L}} T_{n,*}^{\ell}$ as $c_n^{\mathcal{L}}(1 - \alpha)$, which can be evaluated by simulation. We refer to this method as the Romano-Wolf method in the following. See Chen [11] for details on such an application in the case of i.i.d. observations.
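Both choices of critical values admit short implementations. The sketch below assumes a matrix of bootstrapped statistics is already available; it uses only the Bonferroni quantile formula and the bootstrap max-quantile stated above, and the function names are ours.

```python
from statistics import NormalDist
import numpy as np

def bonferroni_holm_crit(n_hyp, alpha=0.05):
    """Bonferroni-Holm critical value:
    c(1 - alpha) = q_{N(0,1)}(1 - alpha / (2 * n_hyp)),
    the two-sided critical value for n_hyp simultaneous hypotheses."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n_hyp))

def romano_wolf_crit(boot_stats, alpha=0.05):
    """Romano-Wolf bootstrap critical value: the (1 - alpha)-quantile of
    max_l |T*_l| over bootstrap resamples.

    boot_stats : array of shape (B, n_hyp) of bootstrapped |T*| statistics
    """
    return np.quantile(boot_stats.max(axis=1), 1 - alpha)

# a single hypothesis reduces to the usual two-sided 5% critical value
c1 = bonferroni_holm_crit(1)   # approximately 1.96
# degenerate bootstrap example where the quantile is known exactly
boot = np.column_stack([np.linspace(0.0, 1.0, 1001), np.zeros(1001)])
c2 = romano_wolf_crit(boot)    # 0.95-quantile of the columnwise max
```

Passing either function (restricted to the surviving hypothesis set) as `crit` in a stepdown loop reproduces the two procedures compared in the simulation study.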

Simulation study and an empirical illustration
In this section we present a small Monte Carlo study to assess the finite sample performance of the multiple testing procedures proposed in Section 4.1. We also demonstrate how the proposed methodology works in a real-world setting using high-frequency data from the components of the S&P 100 index.

Simulations
We focus on the problem of testing the hypotheses (4.10) simultaneously for $(i, j) \in \Lambda_n$. The simulation design is basically adopted from [28], but we include only the first factor, representing the market factor, in our model. Specifically, we simulate model (4.9) with the following specification: and We set $\mu = 0.05$, $\kappa = 3$, $\theta = 0.09$, $\eta = 0.3$ and $\rho = -0.6$. The initial value $v_0$ is drawn from the stationary distribution of the process $(v_t)_{t \in [0,1]}$, i.e. the gamma distribution with shape $2\kappa\theta/\eta^2$ and rate $2\kappa/\eta^2$. We assume that $\Gamma := (\gamma_i^{\top} \gamma_j)_{1 \le i, j \le d}$ is a block diagonal matrix with 10 blocks of size $(d/10) \times (d/10)$, whose diagonal entries are uniformly generated from $[0.2, 0.5]$ and whose corresponding correlation matrices have the constant correlation $\rho_{\gamma}$. We set $d = 100$ and vary $\rho_{\gamma}$ over $\{0.25, 0.5, 0.75\}$.
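Since the variance process above is of CIR (Heston) type, the simulation ingredients can be sketched as follows. The Euler scheme with full truncation and the helper names are our choices, and the leverage correlation $\rho$ between price and variance shocks is omitted for brevity.

```python
import numpy as np

def simulate_cir(n, kappa=3.0, theta=0.09, eta=0.3, rng=None):
    """Euler scheme (full truncation) for the CIR variance process
    dv_t = kappa (theta - v_t) dt + eta sqrt(v_t) dW_t on [0, 1],
    with v_0 drawn from the stationary Gamma distribution with
    shape 2*kappa*theta/eta**2 and rate 2*kappa/eta**2."""
    rng = np.random.default_rng(rng)
    dt = 1.0 / n
    shape, rate = 2 * kappa * theta / eta ** 2, 2 * kappa / eta ** 2
    v = np.empty(n + 1)
    v[0] = rng.gamma(shape, 1.0 / rate)
    for i in range(n):
        vp = max(v[i], 0.0)  # full truncation keeps the scheme well-defined
        v[i + 1] = (v[i] + kappa * (theta - vp) * dt
                    + eta * np.sqrt(vp * dt) * rng.standard_normal())
    return v

def block_residual_cov(d=100, n_blocks=10, rho=0.5, rng=None):
    """Block diagonal Gamma = (gamma_i' gamma_j): within each block the
    diagonals are U[0.2, 0.5] and the correlation is the constant rho."""
    rng = np.random.default_rng(rng)
    b = d // n_blocks
    G = np.zeros((d, d))
    for k in range(n_blocks):
        var = rng.uniform(0.2, 0.5, size=b)
        s = np.sqrt(var)
        C = np.full((b, b), rho) + (1 - rho) * np.eye(b)
        G[k*b:(k+1)*b, k*b:(k+1)*b] = np.outer(s, s) * C
    return G

v = simulate_cir(1000, rng=0)
G = block_residual_cov(100, 10, 0.5, rng=0)
```

With the parameter values above, the stationary gamma distribution has shape 6 and rate 200/3, so its mean equals $\theta = 0.09$ as it should.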
For each scenario, we compute the FWERs and the average powers (i.e. the average probabilities of rejecting the false null hypotheses) of the Bonferroni-Holm and Romano-Wolf methods at the 5% level, based on 10,000 Monte Carlo iterations. For the Romano-Wolf method we generate 999 bootstrap resamples.
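Given a matrix of rejection indicators across Monte Carlo runs, the FWER and average power reported below can be computed as:

```python
import numpy as np

def fwer_and_power(reject, is_null):
    """reject  : (n_mc, n_hyp) boolean array; reject[m, j] is True if the
                 j-th hypothesis was rejected in Monte Carlo run m
       is_null : (n_hyp,) boolean array, True where H_0 is actually true

    FWER = P(at least one true null rejected);
    average power = mean rejection frequency over the false nulls."""
    reject = np.asarray(reject, bool)
    is_null = np.asarray(is_null, bool)
    fwer = reject[:, is_null].any(axis=1).mean() if is_null.any() else 0.0
    power = reject[:, ~is_null].mean() if (~is_null).any() else float("nan")
    return fwer, power

# tiny example: 4 runs, hypotheses [true null, true null, false null]
rej = np.array([[0, 0, 1], [1, 0, 1], [0, 0, 0], [0, 1, 1]], bool)
f, p = fwer_and_power(rej, np.array([1, 1, 0], bool))
```

In the example, runs 2 and 4 each reject a true null, so the FWER estimate is 0.5, while the single false null is rejected in 3 of 4 runs, giving average power 0.75.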
Tables 1 and 2 report the results. We see from Table 1 that both methods succeed in controlling the FWERs under the nominal level of 5%, although both are rather conservative. Table 2 shows that the average powers of both methods tend to 1 as $n$ and $\rho_{\gamma}$ increase. The table also reveals that the Romano-Wolf method is more powerful than the Bonferroni-Holm method. As expected, the difference in average power between the two methods becomes pronounced as the correlation $\rho_{\gamma}$ of the residual processes increases.

Empirical illustration
We apply our methodology to high-frequency returns of the components of the S&P 100 index, taking the SPDR S&P 500 ETF (SPY) as the observable factor process. The sample period is one month, March 2018, and we regard this period as the interval [0, 1] (over-night returns are ignored). The data are provided by Bloomberg. Following Fan et al. [28], we use 15-minute returns to avoid notable market microstructure effects. To illuminate the block diagonal structure reported in [28], we sort the assets by their Global Industry Classification Standard (GICS) sectors when constructing the log-price processes $Y^j$. We begin by examining the sparsity of the quadratic covariation matrix of the assets without taking account of the factor process. The top panel of Figure 1 shows the corresponding realized correlation matrix.
Here, we perform multiple testing for the hypotheses (1.1) using the Romano-Wolf method with 999 bootstrap resamples and change the entries for which the null hypotheses are not rejected at the 5% level to blanks. The violet squares indicate GICS sector classifications; that is, all assets within the same square belong to the same sector. We clearly find that the raw realized correlation matrix is far from sparse, i.e. most of the entries are not blank. In fact, our test suggests that about 90.9% of the pairs have significant correlations at the 5% level. Meanwhile, the bottom panel of Figure 1 shows the realized correlation matrix of the residual processes of the assets regressed on SPY. Again, we perform multiple testing for the hypotheses (4.10) as above and change the entries with insignificant correlations to blanks. The violet squares have the same meaning as above. In contrast to the first case, the realized correlation matrix exhibits a remarkable block diagonal structure inherited from the assets' sectors. In this case only about 4.3% of the pairs are significantly correlated at the 5% level.
To investigate this block diagonal structure more deeply, we conduct another multiple test for the absence of covariations within and between sectors after regressing assets on SPY. The results are reported in Table 3. As expected, the p-values for the absence of within-sector covariations are very small across all the sectors, which suggests that within-sector covariations exist for all the sectors. In contrast, we find that between-sector covariations can be insignificant for several pairs. For example, assets belonging to Materials (M) are not significantly correlated with assets belonging to the other sectors at the 5% level.
The table also reveals a similar between-sector covariation pattern to the one observed in [28]. Namely, they report that the correlation between Energy (E) and Financials (F) disappears but Consumer Staples (CS) and Utilities (U) remain strongly correlated after 2010, which is consistent with the p-values reported in Table 3.
Overall, our methodology provides partial formal statistical support for the findings of [28], although the scope of our analysis is quite limited and more comprehensive empirical studies will be necessary.

A.1 Additional notation
This subsection introduces some additional notation related to multi-way arrays and derivatives, which are necessary for the subsequent proofs.
As in Section 2.2, $\mathbb{K}$ denotes the real field $\mathbb{R}$ or the complex field $\mathbb{C}$. We consider a vector space $V$ over $\mathbb{K}$. Let $N_1, \dots, N_q$ be positive integers. For $T \in V^{N_1 \times \cdots \times N_q}$ and $x \in \mathbb{K}^{N_1 \times \cdots \times N_q}$, we set In particular, for Here, note that we identify $\mathbb{K}^{N_1} \otimes \cdots \otimes \mathbb{K}^{N_q}$ with $\mathbb{K}^{N_1 \times \cdots \times N_q}$ in the canonical way (see Section 2.2). Moreover, Now suppose that $\mathbb{K} = \mathbb{R}$ and $V$ is a real Hilbert space. Then we have for any $v \in V$ (recall (2.1)). Let $V_0$ be another real Hilbert space and $N_1', \dots, N_p' \in \mathbb{N}$. Then, for any In fact, we have Let $\varphi = (\varphi(y))_{y \in \mathbb{R}^N}$ be a real-valued function. If $\varphi$ is a $C^{\infty}$ function, we define the $\mathbb{R}$-valued $N$-dimensional $q$-way array $\partial_y^{\otimes q} \varphi(y)$ by
$$\partial_y^{\otimes q} \varphi(y) = (\partial_{y_{i_1} \cdots y_{i_q}} \varphi(y))_{1 \le i_1, \dots, i_q \le N} \in \mathbb{R}^{N \times \cdots \times N}$$
for any $y \in \mathbb{R}^N$ and $q \in \mathbb{N}$, where $\partial_{y_{i_1} \cdots y_{i_q}} := \partial^q / \partial y_{i_1} \cdots \partial y_{i_q}$. We set $\partial_y^{\otimes 0} \varphi(y) := \varphi(y)$ by convention. In general, we say that $\varphi$ is rapidly decreasing if $\varphi$ is a $C^{\infty}$ function and for any $A > 0$ and $q \in \mathbb{Z}_+$. When $\varphi$ is rapidly decreasing, we define its Fourier transform $\hat{\varphi}$: Here, $\mathrm{i}$ denotes the imaginary unit. By Theorem 7.4(c) of [62], one has for any $y \in \mathbb{R}^N$, $q \in \mathbb{N}$ and $\mathbb{C}$-valued $N$-dimensional $q$-way array $T \in \mathbb{C}^{N \times \cdots \times N}$.

A.2 Proof of Theorem 3.1
We begin by noting that it is enough to prove the theorem in the special case where all the rows of the matrix $X_n$ are identical: Lemma A.1. Suppose that the claim of Theorem 3.1 holds true whenever $X_n^{1\cdot} = \cdots = X_n^{m\cdot}$ for every $n \in \mathbb{N}$. Then the claim of Theorem 3.1 holds true in the general case as well.
Proof. Define the $m \times md$ matrix $\bar{\Upsilon}_n$ by We also define the $m \times md$ random matrix $\bar{X}_n$ so that all the rows are identical to the $md$-dimensional random vector given by In addition, we define the $md$-dimensional random vector $\bar{Z}_n$ so that By assumption we can apply Theorem 3.1 with $\bar{\Upsilon}_n$, $\bar{X}_n$ and $\bar{Z}_n$ in place of $\Upsilon_n$, $X_n$ and $Z_n$, respectively, which yields the desired result.
Taking account of Lemma A.1, we focus only on the case where $X_n^{1\cdot} = \cdots = X_n^{m\cdot} =: X_n$ for every $n \in \mathbb{N}$. Next we recall the following anti-concentration inequality, called Nazarov's inequality in [19]: Proposition A.1 (Nazarov's inequality). Let $\xi$ be an $m$-dimensional centered Gaussian vector such that $\|\xi_j\|_2 \ge a$ for all $j = 1, \dots, m$ and some constant $a > 0$. Then for any $y \in \mathbb{R}^m$ and $\varepsilon > 0$, The above form of Nazarov's inequality can be found in [20]. An application of this result immediately yields the following anti-concentration inequality for a mixed-normal random vector: Lemma A.2. Let $\xi$ be an $m$-dimensional standard Gaussian vector. Also, let $\Gamma$ be an $m \times m$ symmetric positive semidefinite random matrix independent of $\xi$. Then for any $y \in \mathbb{R}^m$ and $b, \varepsilon > 0$, Now we turn to the main body of the proof. As pointed out in the Introduction, the key part of the proof is to derive reasonable estimates for the quantities (A.5) for smooth functions $f : \mathbb{R}^{2d} \to \mathbb{R}$. In fact, the remaining part of the proof is essentially the same as that for the high-dimensional central limit theorem of [19]. To get a reasonable estimate for (A.5), we derive an interpolation formula for it, borrowing an idea from [57]. Namely, we use the duality between iterated Malliavin derivatives and multiple Skorohod integrals, combined with the interpolation method in the frequency domain introduced in [57] (see also [66]).
Following [57], we set and $\varphi_n(\theta; z, x) = E[e^{\lambda_n(\theta; z, x)}]$ for $\theta \in [0, 1]$ and $z, x \in \mathbb{R}^d$. We first derive a representation formula for the derivative of $\varphi_n(\theta; z, x)$ with respect to $\theta$. For this purpose we need the following Malliavin-derivative version of the (generalized) Faà di Bruno formula for the iterated derivative of a composition of functions: Lemma A.3. Let $q, r$ be positive integers and $g = (g(x))_{x \in \mathbb{R}^r}$ be a real-valued $C^q$ function all of whose partial derivatives up to order $q$ are of polynomial growth. Then, for any $F \in \mathbb{D}^{q,\infty}(\mathbb{R}^r)$, we have $g(F) \in \mathbb{D}^{q,\infty}$ and Noting that Malliavin derivatives can be characterized by directional derivatives along Cameron-Martin shifts (cf. Chapter 15 of [37]), we can derive Lemma A.3 from the usual Faà di Bruno formula (found e.g. in [51]). Alternatively, we can prove Lemma A.3 in a way parallel to the usual Faà di Bruno formula, using the chain rule for Malliavin derivatives (see e.g. Theorem 15.78 of [37]) instead of the standard one.
Lemma A.4. Under the assumptions of Theorem 3.1, the partial derivative $\partial_\theta \varphi_n(\theta; z, x)$ exists and is given by Proof. By assumption the function $\theta \mapsto \varphi_n(\theta; z, x)$ is evidently differentiable, and we have where we also use the identity holding for any $f, g \in H^{\otimes q}$ such that $g$ is symmetric. Now, by (A.3) we have Combining this identity with (A.6), we obtain the desired result.
The following lemma is presumably a standard result. We prove it for the sake of completeness.
Lemma A.5. Let $f = (f(y))_{y \in \mathbb{R}^N}$ be a real-valued $C^{\infty}$ function all of whose partial derivatives are of polynomial growth. Then there is a sequence $(f_j)_{j=1}^{\infty}$ of compactly supported real-valued $C^{\infty}$ functions on $\mathbb{R}^N$ such that as $j \to \infty$ for any $\xi_0, \xi_1, \dots, \xi_N \in L^{\infty-}$ and $\alpha \in \mathbb{Z}_+^N$.
Proof. Take a $C^{\infty}$ function $\varphi : \mathbb{R}^N \to [0, \infty)$ having compact support and satisfying $\varphi(0) = 1$. For every $j = 1, 2, \dots$, we define the function $\varphi_j : \mathbb{R}^N \to [0, \infty)$ by $\varphi_j(y) := \varphi(y/j)$. Then we define $f_j := f \varphi_j$ for $j = 1, 2, \dots$. Each $f_j$ is evidently a $C^{\infty}$ function with compact support. Moreover, we have $\partial_y^{\alpha} f_j(y) \to \partial_y^{\alpha} f(y)$ as $j \to \infty$ for any $y \in \mathbb{R}^N$ and $\alpha \in \mathbb{Z}_+^N$. In addition, for any $s \in \mathbb{N}$, there is a constant $C > 0$, depending only on $\varphi$ and $s$, such that $|f_j(y)| \le C(|f(y)| + \sum_{k=1}^{s} \|\partial_y^{\otimes k} f(y)\|_{\ell_1})$ for any $y \in \mathbb{R}^N$; these facts can easily be proved by directly differentiating $f_j$ with the help of the Leibniz formula and the chain rule. Consequently, we have $\sup_{j \in \mathbb{N}} \|\xi_0 \partial_y^{\alpha} f_j(\xi_1, \dots, \xi_N)\|_2 < \infty$ for any $\alpha \in \mathbb{Z}_+^N$, because $\xi_0, \xi_1, \dots, \xi_N \in L^{\infty-}$ and all the partial derivatives of $f$ have polynomial growth. Therefore, $(\xi_0 \partial_y^{\alpha} f_j(\xi_1, \dots, \xi_N))_{j \in \mathbb{N}}$ is uniformly integrable, so the Vitali convergence theorem yields (A.8). This completes the proof. Now we get the following interpolation formula for (A.5): Lemma A.6. Let $f : \mathbb{R}^{2d} \to \mathbb{R}$ be a $C^{\infty}$ function all of whose partial derivatives are of polynomial growth.
Under the assumptions of Theorem 3.1, we have Proof. Thanks to Lemma A.5, it is enough to prove the lemma when f is rapidly decreasing. In this case the Fourier inversion formula and the Fubini theorem yield Hence the desired result follows from Lemma A.4, (A.4) and the Fourier inversion formula.
We will use the following elementary result in the proof: Lemma A.7. Let k, l be two positive integers. Then we have for any i 1 , . . . , i k , j 1 , . . . , j l ∈ {1, . . . , d}.
One can easily prove the above lemma by induction on $k$ together with an application of the Leibniz rule, so we omit the proof.
Finally, as in the original CCK theory, a special approximation of the maximum function (the so-called "smooth max function") plays a crucial role in our proof. The following lemma summarizes the key properties of this smooth max function used in the proof: Lemma A.8. Let $\varepsilon > 0$ and set $\beta = \varepsilon^{-1} \log m$. Define the function $\Phi_\beta : \mathbb{R}^m \to \mathbb{R}$ by
$$\Phi_\beta(w) = \beta^{-1} \log\Big(\sum_{j=1}^{m} \exp(\beta w_j)\Big). \qquad (A.9)$$
Then we have
$$0 \le \Phi_\beta(w) - \max_{1 \le j \le m} w_j \le \beta^{-1} \log m = \varepsilon \qquad (A.10)$$
for every $w \in \mathbb{R}^m$. Moreover, for any $C^{\infty}$ function $g : \mathbb{R} \to \mathbb{R}$, $s \in \mathbb{N}$, $\varepsilon > 0$ and $w \in \mathbb{R}^m$, it holds that where $C_{g,s} > 0$ depends only on $g$ and $s$.
Proof. First, note that Φ β is usually denoted by F β in the literature on the CCK theory. Now, (A.10) is stated in e.g. Eq.
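For a numerical illustration, the smooth max function and the approximation bound of the lemma can be checked directly; the implementation below uses the standard log-sum-exp form from the CCK literature, with a max-subtraction for numerical stability.

```python
import numpy as np

def smooth_max(w, beta):
    """Phi_beta(w) = beta^{-1} log(sum_j exp(beta * w_j)),
    the 'smooth max function' of the CCK theory. It satisfies
    0 <= Phi_beta(w) - max_j w_j <= beta^{-1} log m."""
    w = np.asarray(w, float)
    m = w.max()
    # subtract the max before exponentiating to avoid overflow
    return m + np.log(np.exp(beta * (w - m)).sum()) / beta

# check the approximation bound with epsilon = 0.1 and m = 1000
rng = np.random.default_rng(0)
w = rng.normal(size=1000)
eps = 0.1
beta = np.log(len(w)) / eps      # beta = eps^{-1} log m
gap = smooth_max(w, beta) - w.max()
```

The gap is always strictly positive (the other summands contribute) and below $\varepsilon$, matching the stated two-sided bound.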
Proof of Theorem 3.1. First, as already noted above, it is enough to focus on the case where $X_n^{1\cdot} = \cdots = X_n^{m\cdot} =: X_n$ for every $n \in \mathbb{N}$, due to Lemma A.1. Note that in this case we have $\Xi_n z = \Upsilon_n(z \bullet X_n)$ for every $z \in \mathbb{R}^d$.
We turn to the main body of the proof. Take a number $\varepsilon > 0$ arbitrarily, and set $\beta = \varepsilon^{-1} \log m$. We define the function $\Phi_\beta : \mathbb{R}^m \to \mathbb{R}$ by (A.9). We also take a $C^{\infty}$ function $g : \mathbb{R} \to [0, 1]$ such that all the derivatives of $g$ are bounded, $g(t) = 1$ for $t \le 0$ and $g(t) = 0$ for $t \ge 1$. Now let us fix a vector $y \in \mathbb{R}^m$ arbitrarily, and define the functions $\varphi$: For any $k, l \in \mathbb{Z}_+$ and any $z, x \in \mathbb{R}^d$, we have Applying the Leibniz rule repeatedly (cf. Proposition 5 of [33]), we deduce Note that the above inequality evidently holds true if $A = \emptyset$. Moreover, we obviously have $\prod_{a \in A} \partial_{z_{i_a}} (z_{j_1} \cdots z_{j_l}) = 0$ if $r > l$. Consequently, we infer that Meanwhile, we can easily verify that

Now, by (A.11) it holds that
$$\|\partial_w^{\otimes s} \varphi(w)\|_{\ell_1} \le C_{g,s}\, \varepsilon^{-s} (\log m)^{s-1}$$
for all $w \in \mathbb{R}^m$, where $C_{g,s} > 0$ is a constant depending only on $g$ and $s$. Therefore, we obtain where $\varepsilon_1 := \varepsilon \wedge 1$ and $c_{g,k,l}, c'_{g,k,l} > 0$ are constants depending only on $g$ and $k, l$ (recall that $|||\Upsilon_n|||_{\infty} \ge 1$ by assumption). In particular, all the partial derivatives of $f$ are of polynomial growth. Therefore, noting that $\Delta_{n,j}(\nu) = 0$ when $\nu \ge \max_{\alpha \in A(q_j)} N_4^{*}(\alpha)$, by (A.1) and Lemma A.6 we obtain where $K_q > 0$ depends only on $q$ and $c''_{g,q} > 0$ depends only on $g$ and $q$. Now we have $\le P(\Xi_n Z_n \le y + 2\varepsilon) + \eta_n(\varepsilon)$ (by Eq. (A.10)).
Set $\Gamma_n := \Xi_n C_n \Xi_n^{\top}$. Then, by Lemma A.2 we obtain for every $b > 0$. By an analogous argument we also obtain Therefore, we conclude that by assumption. Letting $b \to 0$, we complete the proof.

A.3 Proof of Lemma 3.1
Take a number $\varepsilon > 0$ arbitrarily. For any $y \in \mathbb{R}^m$, we have
$$P(Y_n \le y) \le P\big(\sqrt{\log m}\, \|Y_n - \Xi_n Z_n\|_{\ell^{\infty}} > \varepsilon\big) + P\big(\Xi_n Z_n \le y + \varepsilon/\sqrt{\log m}\big).$$
Now, by assumption we obtain We first let $\varepsilon \to 0$ and then let $b \to 0$. Then we conclude that This completes the proof.

A.4 Proof of Proposition 3.1
The proof is analogous to that of Theorem 2 from [17] (see also the proof of Theorem 4.1 from [19]).
Thanks to (A.12), it suffices to prove as $n \to \infty$ for any $\eta > 0$. However, since we have this amounts to proving as $n \to \infty$ for any $\eta > 0$. To prove this claim, we take a number $\varepsilon > 0$ arbitrarily and set $\beta = \varepsilon^{-1} \log m$ as in the proof of Theorem 3.1. Then we define the function $\Phi_\beta : \mathbb{R}^m \to \mathbb{R}$ by (A.9). We also take a $C^{\infty}$ function $g : \mathbb{R} \to [0, 1]$ such that all the derivatives of $g$ are bounded, $g(t) = 1$ for $t \le 0$ and $g(t) = 0$ for $t \ge 1$.
Fix a vector $y \in \mathbb{R}^m$ arbitrarily and define the function $\varphi : \mathbb{R}^m \to \mathbb{R}$ by Then we define the stochastic process $\Psi = (\Psi(t))_{t \in [0,1]}$ by We evidently have with probability one. Then, Stein's identity yields with probability one. Consequently, we obtain by Lemmas 3-4 of [17], where $C > 0$ is a constant depending only on $g$. Now we have (by Eq. (A.10)).
Since we have on the set $\Omega_b$, the Nazarov inequality yields a.s. on the set $\Omega_b$. By an analogous argument we also obtain a.s. on the set $\Omega_b$. Therefore, we conclude that a.s. on the set $\Omega_b$. Since the function $y \mapsto P(\Gamma_n^{1/2} \xi_n' \le y \mid \mathcal{F}) - P(\Gamma_n^{1/2} \xi_n \le y \mid \mathcal{F})$ is a.s. right-continuous, the above result implies that for all $n \in \mathbb{N}$. Now, take a number $a > 0$ such that $(2a/b)(\sqrt{2} + 2/\log 2) \le \eta/2$ and set $\varepsilon = a/\log m$. Then the above inequality yields Therefore, (A.13) follows from assumption (3.18), which yields the desired result.

A.5 Proof of Proposition 3.2
We follow Step 3 in the proof of Theorem 2 of [41]. First, by assumption and Theorem 9.2.2 of [26], there is a sequence $\varepsilon_n$ of positive numbers tending to 0 such that Next, let us denote by $q_n^{\dagger}$ the $\mathcal{F}$-conditional quantile function of $T_n^{\dagger}$. Then, on the set $E_n \cap \tilde{E}_n$ we have Hence, on $E_n \cap \tilde{E}_n$ it holds that $q_n^{*}(\alpha) \le q_n^{\dagger}(\alpha + \varepsilon_n)$. Therefore, we obtain Meanwhile, for any $\omega \in E_n \cap \tilde{E}_n$ and any $z \in \mathbb{R}$ such that $P(T_n^{*} \le z \mid \mathcal{F})(\omega) \ge \alpha$, we have Hence it holds that $q_n^{\dagger}(\alpha - \varepsilon_n)(\omega) \le z$. This implies that $q_n^{*}(\alpha) \ge q_n^{\dagger}(\alpha - \varepsilon_n)$ on $E_n \cap \tilde{E}_n$. Therefore, we obtain Consequently, we obtain $P(T_n \le q_n^{*}(\alpha)) \to \alpha$ as $n \to \infty$.

B.1 Proof of Theorem 4.1
We first introduce some notation. For two sequences $(x_n)$, $(y_n)$ of numbers, the notation $x_n \lesssim y_n$ means that there is a universal constant $C > 0$ such that $x_n \le C y_n$ for all $n$; the value of the constant $C$ may change from line to line. We define the $d$- by Proposition 1.3.11 of [55]. Moreover, from the above expression the process $(\delta(\xi \eta_t 1_{(I_h \times I_h) \cap S}(\cdot, t)))_{t \in [0,1]}$ is evidently $\mathbf{F}$-predictable and $H$-valued. Therefore, Proposition 1.3.11 of [55] and Proposition 2.6 of [58] imply that $\xi \otimes \eta 1_{(I_h \times I_h) \cap S}$ belongs to $\mathrm{Dom}(\delta^2)$ and $\delta^2(\xi \otimes \eta 1_{(I_h \times I_h) \cap S}) =$ Similarly, we can show that $\xi \otimes \eta 1_{(I_h \times I_h) \cap S^c} \in \mathrm{Dom}(\delta^2)$ and This completes the proof.
Thanks to Lemma B.1, we have $u_n^{ij} \in \mathrm{Dom}(\delta^2)$, so we can define the variable $M_n^{ij}$ by $M_n^{ij} = \delta^2(u_n^{ij})$.
Next we prove some auxiliary results. We begin by noting some elementary facts which are used frequently throughout the proof. First, for any random variable $\xi$ and any $p, q \in (0, \infty)$, it holds that $\| |\xi|^q \|_p = \|\xi\|_{pq}^q$.
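This identity can be verified numerically on a Monte Carlo sample, since both sides reduce to $(\mathbb{E}|\xi|^{pq})^{1/p}$:

```python
import numpy as np

def lp_norm(x, p):
    """Empirical L^p norm: (mean |x|^p)^(1/p) over a sample."""
    return np.mean(np.abs(x) ** p) ** (1.0 / p)

# the identity || |xi|^q ||_p = || xi ||_{pq}^q: both sides equal
# (mean |xi|^{pq})^{1/p}, so they agree up to floating-point rounding
rng = np.random.default_rng(0)
xi = rng.normal(size=10_000)
p, q = 3.0, 2.0
lhs = lp_norm(np.abs(xi) ** q, p)
rhs = lp_norm(xi, p * q) ** q
```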
Next we establish some properties of $M_n^{ij}$ which are necessary for the application of our main theorem. The first result gives moment bounds. Proof. (a) We prove the claim by induction on $k$. When $k = 1$, the claim follows from Proposition 1.3.8 of [55]. Next, supposing that the claim holds true for $k = K \in \mathbb{N}$, we prove it for $k = K + 1$. From (B.5) we have
$$D_{t_1, \dots, t_K, t}\, \delta(u) = \delta(D_{t_1, \dots, t_K, t}\, u) + (K + 1)\, \mathrm{Sym}(D^K u)(t_1, \dots, t_K, t)$$
for all $t \in [0, 1]$. This implies that the claim also holds true for $k = K + 1$ and thus completes the proof.
(b) This claim is an immediate consequence of claim (a) and Propositions 1.2.8 and 1.3.11 of [55].
We then obtain the following result. for any p ∈ [2, ∞) by the Hölder inequality and Proposition B.1.
We turn to the main body. First we prove claim (a). Note that we have $\|\, \|\xi\|_{\ell^{\infty}} \,\|_p \le k^{1/p} \max_{1 \le i \le k} \|\xi_i\|_p$ for any $p \ge 1$, $k \in \mathbb{N}$ and $k$-dimensional random vector $\xi$. Then, thanks to Lemmas 3.1, B.6 and B.7, it suffices to prove
$$\lim_{n \to \infty} \sup_{y \in \mathbb{R}^m} |P(\Xi_n(M_n + W_n) \le y) - P(\Xi_n(S_n + W_n) \le y)| = 0.$$
Letting $\nu \to \infty$, we complete the proof.

By construction, both the Bonferroni-Holm and Romano-Wolf methods evidently satisfy condition (i), so it remains to check that they also satisfy (ii). Since it holds that $\max_{\ell \in \mathcal{L}_n(\theta_n)} T_n^{\ell} = \max_{\ell \in \mathcal{L}_n(\theta_n)} \max_{\lambda \in \Lambda_n^{\ell}} |T_n^{\lambda}|$,