Improved bounds for the bracketing number of orthants or revisiting an algorithm of Thiémard to compute bounds for the star discrepancy



Introduction
Entropy numbers, such as the logarithm of bracketing or covering numbers, measure the size and complexity of a given class of functions $\mathcal{F}$ and are frequently used in empirical process theory, density estimation, high-dimensional probability, machine learning, uniform distribution theory, and Banach space theory; see, e.g., [4,17,21,23,24].
In this paper we focus on the bracketing number for characteristic functions of axis-parallel rectangles in the $d$-dimensional unit cube. Nevertheless, we start with the general definition of the bracketing number and relate it to other entropy concepts such as covering and packing numbers or delta covers.
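Let us assume that $\mathcal{F}$ is a subset of a normed space $(F, \|\cdot\|)$ of real-valued functions. For two functions $\ell, u \in F$ with $\ell \le u$ we define the bracket $[\ell, u]$ by
\[ [\ell, u] := \{ f \in F \mid \ell \le f \le u \}, \tag{1} \]
where inequalities between functions are meant pointwise. The bracket is an $\varepsilon$-bracket if its weight $W([\ell, u])$ satisfies
\[ W([\ell, u]) := \|u - \ell\| \le \varepsilon. \]
A set of $\varepsilon$-brackets $\mathcal{B}$ is an $\varepsilon$-bracketing cover of $\mathcal{F}$ if $\mathcal{F} \subseteq \bigcup_{B \in \mathcal{B}} B$. The bracketing number $N_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|)$ is the minimum number of $\varepsilon$-brackets needed to cover $\mathcal{F}$. Let us state a related definition: A finite subset $\Gamma$ of $F$ is called a (one-sided) $\delta$-cover of $\mathcal{F}$ if for every $h \in \mathcal{F}$ there exists a $\delta$-bracket $[\ell, u]$ with $\ell, u \in \Gamma$ and $h \in [\ell, u]$. Denote by $N(\delta, \mathcal{F}, \|\cdot\|)$ the minimum cardinality of all $\delta$-covers of $\mathcal{F}$.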
The main difference between an $\varepsilon$-bracketing cover $\mathcal{B}$ and a $\delta$-cover $\Gamma$ (for $\delta = \varepsilon$) is that $\mathcal{B}$ consists of $\varepsilon$-brackets, i.e., subsets of $F$, while $\Gamma$ consists of functions, i.e., points of $F$. It is easy to verify the following relation:
\[ N(\varepsilon, \mathcal{F}, \|\cdot\|) \le 2\, N_{[\,]}(\varepsilon, \mathcal{F}, \|\cdot\|) \le \big( N(\varepsilon, \mathcal{F}, \|\cdot\|) + 1 \big)\, N(\varepsilon, \mathcal{F}, \|\cdot\|). \]
The first inequality shows that upper bounds for the bracketing number give us reasonable upper bounds for $N(\varepsilon, \mathcal{F}, \|\cdot\|)$, while the second inequality is, in general, not very helpful for deriving good upper bounds for the bracketing number from upper bounds for $N(\varepsilon, \mathcal{F}, \|\cdot\|)$. Techniques based on bracketing were introduced in the context of empirical process theory by R. M. Dudley in 1978 [8]. The notion of a (one-sided) $\delta$-cover was introduced in [18] in the context of approximation theory and has been used frequently in uniform distribution theory and quasi-Monte Carlo methods; see, e.g., the survey article [12] and the more recent research papers [1,3,2,5,6,20,16,25,13,14,9].
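Similarly, we can define the covering number $C(\varepsilon, \mathcal{F}, \|\cdot\|)$ as the minimum number of closed balls $\{f \mid \|g - f\| \le \varepsilon\}$ in $\mathcal{F}$, with some center $g \in \mathcal{F}$ and radius $\varepsilon$, needed to cover $\mathcal{F}$. Furthermore, we call a subset $S$ of $\mathcal{F}$ $\varepsilon$-separated if $\|f - g\| > \varepsilon$ for all $f, g \in S$ with $f \ne g$. The packing number $P(\varepsilon, \mathcal{F}, \|\cdot\|)$ is the maximum number of $\varepsilon$-separated functions in $\mathcal{F}$.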
Let us assume that $(F, \|\cdot\|)$ satisfies the Riesz property, i.e., for all $f, g \in F$ the pointwise inequality $|f| \le |g|$ implies $\|f\| \le \|g\|$. Then it is straightforward to show that where the Riesz property is used to establish the final inequality; cf., e.g., [23, Chapter 2]. The inequalities in (4) show that upper bounds for the bracketing number imply upper bounds for the covering and the packing number, but the converse does not necessarily hold.
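To illustrate where the Riesz property enters, the following short derivation sketches why the packing number cannot exceed the bracketing number. It is our own wording of a standard argument and is not meant to reproduce the exact chain of inequalities in (4).

```latex
Let $[\ell,u]$ be an $\varepsilon$-bracket and let $f,g \in \mathcal{F}\cap[\ell,u]$.
Since $\ell \le f \le u$ and $\ell \le g \le u$ pointwise, we have
\[
  |f-g| \;\le\; u-\ell \;=\; |u-\ell| \quad\text{pointwise},
\]
and the Riesz property yields
\[
  \|f-g\| \;\le\; \|u-\ell\| \;=\; W([\ell,u]) \;\le\; \varepsilon .
\]
Hence no two elements of an $\varepsilon$-separated set can lie in the same
$\varepsilon$-bracket. If $\mathcal{B}$ is an $\varepsilon$-bracketing cover of
$\mathcal{F}$, every element of an $\varepsilon$-separated set $S\subseteq\mathcal{F}$
lies in some bracket of $\mathcal{B}$, so $|S|\le|\mathcal{B}|$ and therefore
\[
  P(\varepsilon,\mathcal{F},\|\cdot\|) \;\le\; N_{[\,]}(\varepsilon,\mathcal{F},\|\cdot\|).
\]
```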
Let us denote the $d$-dimensional Lebesgue measure by $\lambda^d$. We confine ourselves to the setting where $(F, \|\cdot\|)$ is the space $L^1(I^d, \lambda^d)$ of $\lambda^d$-integrable real-valued functions on $I^d := [0,1)^d$ and $\mathcal{F}$ is a set of characteristic functions of half-open boxes in the $d$-dimensional unit cube $I^d$. Note that $L^1(I^d, \lambda^d)$ satisfies the Riesz property.
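More precisely, for $x = (x_1, \dots, x_d)$ and $y = (y_1, \dots, y_d)$ in $[0,1]^d$ we put $[x, y) := \prod_{i=1}^d [x_i, y_i)$ and let
\[ \mathcal{C}_d := \{ [0, x) \mid x \in [0,1]^d \} \quad\text{and}\quad \mathcal{R}_d := \{ [x, y) \mid x, y \in [0,1]^d \} \]
be the set systems of all anchored half-open $d$-dimensional intervals ("corners") and all half-open intervals ("rectangles") in $I^d$, respectively. We identify these set systems with the sets of characteristic functions
\[ \{ 1_{[0,x)} \mid x \in [0,1]^d \} \quad\text{and}\quad \{ 1_{[x,y)} \mid x, y \in [0,1]^d \}, \]
respectively; as a general convention, we denote the characteristic function of a set $A$ by $1_A$.
For the set systems $\mathcal{C}_d$ and $\mathcal{R}_d$ we would like to have small $\varepsilon$-bracketing covers that can be constructed efficiently. Furthermore, we want good upper bounds for the bracketing numbers $N_{[\,]}(\varepsilon, \mathcal{C}_d, \|\cdot\|_{L^1})$ and $N_{[\,]}(\varepsilon, \mathcal{R}_d, \|\cdot\|_{L^1})$ that reveal the dependence of these quantities on $\varepsilon$ and on $d$.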
The best upper bound with explicitly given constants known so far for the bracketing number of $\mathcal{C}_d$ is It was proved in [14, Theorem 2.5] and relies on a construction from [10]; owing to a new Faulhaber-type inequality and a refined analysis, it improves on the bound presented in [10, Theorem 1.15]. Furthermore, [10, Lemma 1.18] provides a constructive way to obtain an $\varepsilon$-bracketing cover for $\mathcal{R}_d$ with the help of an $\frac{\varepsilon}{2}$-bracketing cover for $\mathcal{C}_d$. In particular, it establishes the relation Combining the estimates (7) and (8) results in the best upper bound known so far for the bracketing number of $\mathcal{R}_d$.
In [22] Eric Thiémard gave a simple procedure to partition the unit cube $I^d$, which results in an $\varepsilon$-bracketing cover $\mathcal{P}^d_\varepsilon$ for $\mathcal{C}_d$. This procedure is the key ingredient in his algorithm to compute bounds for the star discrepancy of arbitrary point sets in $I^d$ (for more information about approaches to calculating discrepancy measures we refer to the book chapter [7]). He also provided an upper bound for the cardinality of his bracketing cover. Here in the introduction, let us focus on the most interesting case, $d \ge 3$. (In Theorem 2.9 and Remark 2.10 we also provide results for $d = 2$, and we discuss the previously known results in two dimensions in Remark 2.11.) Thiémard's bound reads (after some slight simplification to make it more easily comparable to the other bounds presented in this note) In this note we carry out a more refined analysis of Thiémard's partitioning procedure to derive a substantially better upper bound than (9). In particular, we show in Theorem 2.9 that and provide in Theorem 2.9 and Remark 2.10 some further (moderate) improvements.
Note that these bounds improve reasonably on the bound (9) and additionally yield better upper bounds for the bracketing numbers of $\mathcal{C}_d$ and, via (8), of $\mathcal{R}_d$. This is good news, since the construction of Thiémard is relatively simple and can be implemented easily, cf. [22, Algorithm 3]. In contrast, the construction from [10] that leads to the bound (7) from [14] is rather complicated. The key ingredients for the improvements in Theorem 2.9 and Remark 2.10 are the new Lemmas 2.4 and 2.8. The newly derived bounds on the bracketing numbers can be used to improve other bounds, e.g., the tractability bounds for the usual as well as the weighted star discrepancy and extreme discrepancy in [10,14,9]. We confine ourselves here to comparing our newly derived upper bound on the $L^1$-packing number $P(\varepsilon, \mathcal{C}_d, \|\cdot\|_{L^1})$ for the set system $\mathcal{C}_d$ with the celebrated upper bound of David Haussler [15] (which is actually applicable to general set systems with finite Vapnik-Chervonenkis (VC) dimension): Since the set system $\mathcal{C}_d$ has VC dimension $d$, Haussler's bound reads see [15, Corollary 1], while (10) and (4) yield for $d \ge 3$ for the second estimate cf. [19].

Revisiting an Algorithm of Eric Thiémard
We now describe Thiémard's algorithmic partitioning process of the $d$-dimensional unit cube, which, for given $\varepsilon \in (0,1)$ and $d \in \mathbb{N}$, results after a finite number of steps in an $\varepsilon$-bracketing cover $\mathcal{P}^d_\varepsilon$ of $\mathcal{C}_d$. To simplify matters, we identify each anchored half-open $d$-dimensional interval $[0, x)$ in $\mathcal{C}_d$ with its upper right corner point $x \in [0,1]^d$. Since we already agreed to identify the sets $[0, x)$ in $\mathcal{C}_d$ with their corresponding characteristic functions $1_{[0,x)}$, cf. (5) and (6), this finally results in identifying a point $x \in [0,1]^d$ with the corresponding function $1_{[0,x)}$.
In this sense, for two given points $x, y \in [0,1]^d$ satisfying $x \le y$ (meant component-wise), we can identify the intersection of the bracket $[1_{[0,x)}, 1_{[0,y)}]$ (cf. (1)) and $1_{\mathcal{C}_d}$ with the closed $d$-dimensional interval $[x, y]$, which we also call a bracket. Consequently, we define its weight to be
\[ W([x, y]) := W\big([1_{[0,x)}, 1_{[0,y)}]\big). \]
Recall that the latter weight is given by
\[ \big\| 1_{[0,y)} - 1_{[0,x)} \big\|_{L^1} = \prod_{i=1}^d y_i - \prod_{i=1}^d x_i. \]
During the partitioning process the $d$-dimensional unit cube where the vector $\gamma = \gamma_P \in [0,1]^d$ is given as in Theorem 2.1. In particular, if $j > 1$ then we have $Q^P_1 = \dots = Q^P_{j-1} = \emptyset$, and all these empty subintervals will not be considered further (and will, in particular, not be elements of the final bracketing cover). The remaining subintervals $Q^P_k$, $k = j, j+1, \dots, d$, are subintervals of type $k$, and $Q^P_{d+1}$ is an $\varepsilon$-bracket with $W(Q^P_{d+1}) = \varepsilon$, which will be added to $\mathcal{P}^d_\varepsilon$. Thiémard called the subroutine of his algorithm that corresponds to this recursion step decompose$(P, j)$; we will use the same name to refer to the recursion step above.
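As a small illustration of the weight of a bracket $[x, y]$ between two corners, the following sketch evaluates $\prod_i y_i - \prod_i x_i$ and tests the $\varepsilon$-bracket condition. The function names `bracket_weight` and `is_eps_bracket` are hypothetical and chosen only for this example; they do not come from [22].

```python
from math import prod


def bracket_weight(x, y):
    """L1 weight of the bracket [x, y], i.e. vol([0, y)) - vol([0, x)).

    Assumes 0 <= x_i <= y_i <= 1 for all i (component-wise x <= y).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    if any(not (0.0 <= a <= b <= 1.0) for a, b in zip(x, y)):
        raise ValueError("need 0 <= x_i <= y_i <= 1 for every coordinate")
    return prod(y) - prod(x)


def is_eps_bracket(x, y, eps):
    """True if the bracket [x, y] has weight at most eps."""
    return bracket_weight(x, y) <= eps


if __name__ == "__main__":
    # The whole unit cube, viewed as the bracket [0, 1], has weight 1.
    print(bracket_weight((0.0, 0.0), (1.0, 1.0)))
    # A bracket in dimension 2 with weight 1 - 0.25 = 0.75.
    print(bracket_weight((0.5, 0.5), (1.0, 1.0)))
```

The unit cube itself has weight 1, which is why the partitioning process has to subdivide it whenever $\varepsilon < 1$.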
If for $k \in \{j, j+1, \dots, d\}$ the subinterval $Q^P_k$ has weight larger than $\varepsilon$, then it will be decomposed further; otherwise it will be added to $\mathcal{P}^d_\varepsilon$. For the convenience of the reader we restate two major results from [22] concerning the partitioning process of the unit cube $I^d$, namely [22, Theorem 3.2] and [22, Corollary 3.2]. Note that formula (9) in [22, Theorem 3.2] contains a typo: the power appearing there is incorrect. We corrected it in the corresponding formula (13) below.

Theorem 2.1 ([22]). For each call of decompose$(P, j)$ during the decomposition process of $I^d$, where $P = [\alpha_P, \beta_P)$, we have where
In the following remark we collect some helpful observations.
Remark 2.3. During the decomposition process the $\alpha$-component of index $d$ is preserved for all subintervals that are not of type $d+1$: cf. [22, Proof of Lemma 3.2]. Hence we can infer for all subintervals $P$ appearing in the decomposition process and being of type $j < d+1$ that Furthermore, we have see [22, identity (4)]. Let $P$ be a subinterval of type $j$. Then we get from (15) and Corollary 2.2 for each $i \in \{j, \dots, d\}$ the identity We always have $\delta_P \in (0,1)$, cf. [22, Section 3.2].
To be able to improve Thiémard's upper bound for the cardinality of $\mathcal{P}^d_\varepsilon$, we need a suitable recursion formula that shows how the parameter $\delta = \delta_P$ evolves in the course of the partitioning process. The desired formula is proved in the next lemma.
Lemma 2.4. For each call of decompose$(P, j)$ during the decomposition process of $I^d$ where $\delta_P W(P) > \varepsilon$, we obtain for every $i \in \{j, \dots, d\}$ in decompose$(Q^P_i, i)$ The quantity $\delta_{Q^P_i}$ is strictly monotone increasing with respect to the parameters $\delta_P$ and $W(P)$ and strictly monotone decreasing with respect to the parameter $i$.
Let us introduce some additional notation: We put For $r \in \mathbb{N}$ and $j \in S_r$, where we define for an element $Q^{(r)}_j$ of the decomposition process with W(Q for $j_{r+1} = j_r, \dots, d$. By convention, we put $S_0 := \{0\}$. Furthermore, we denote for $r \in \mathbb{N}$ the vector $(1, \dots, 1) \in \mathbb{N}^r$ by $1_r$.
Definition 2.5. For given $d \in \mathbb{N}$ and $\varepsilon \in (0,1)$ let the height of the partition $\mathcal{P}^d_\varepsilon$, denoted by $h = h(d, \varepsilon)$, be the largest number in $\mathbb{N}$ such that in the partitioning process there exists a $j \in S_{h-1}$ with $W(Q^{(h-1)}_j) > \varepsilon$.
To prove his upper bound for the cardinality of $\mathcal{P}^d_\varepsilon$, Thiémard represented the elements of $\mathcal{P}^d_\varepsilon$ as vertices in a certain type of planar graph, namely a triangular tree. We restate the definition and the (for us) most relevant result for triangular trees from [22, Section 3.4]:
Definition 2.6. Let $w \in \mathbb{N}$ and $h \in \mathbb{N}_0$. The triangular tree $T^w_h$ of height $h$ and width $w$ is recursively defined:
• $T^w_0$ is a leaf.
• If $h \in \mathbb{N}$, then $T^w_h$ is a tree made up of a root with $w$ attached triangular trees
Theorem 2.7 ([22]). Let $w \in \mathbb{N}$ and $h \in \mathbb{N}_0$. The triangular tree $T^w_h$ has $N^w_h := \binom{h+w-1}{w-1}$ leaves.
A proof of the theorem can be found in [22, Section 3.4]. Now fix $d \in \mathbb{N}$ and $\varepsilon \in (0,1)$, and let $h = h(d, \varepsilon)$ be the height of the partition $\mathcal{P}^d_\varepsilon$. In the following we explain how Thiémard related the bracketing cover $\mathcal{P}^d_\varepsilon$ to a subtree $S^{d+1}_h$ of $T^{d+1}_h$ having the same root such that there is a one-to-one correspondence between the elements of $\mathcal{P}^d_\varepsilon$ and the leaves of $S^{d+1}_h$. Indeed, we can represent all $d$-dimensional intervals that appear in the decomposition process of $I^d$ as vertices in a tree. Its root is the $d$-dimensional unit cube $I^d$ itself, and if $P$ is an interval of type $j$ that is generated in the course of the decomposition process, then it is a vertex of the tree and two things may happen: If $W(P) \le \varepsilon$ (which is, in particular, the case if $j = d+1$), then $P$ is a leaf (which corresponds to the fact that $P$ will not be decomposed further); otherwise $P$ has $d-j+2$ children (which corresponds to the fact that $P$ will be decomposed via decompose$(P, j)$ into $d-j+2$ subintervals). Since the decomposition process terminates after finitely many steps, the resulting tree $S^{d+1}_h$ is obviously a subtree of a triangular tree $T^{d+1}_h$ of height $h = h(d, \varepsilon)$.
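The leaf count of Theorem 2.7 can be checked numerically. The sketch below assumes that the $w$ subtrees attached to the root of $T^w_h$ are $T^1_{h-1}, \dots, T^w_{h-1}$; this is our reading of the recursive definition (it matches the fact that an interval of type $j$ has $d-j+2$ children of types $j, \dots, d+1$), and the function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_leaves(w: int, h: int) -> int:
    """Number of leaves of the triangular tree T^w_h.

    Assumption: for h >= 1 the root of T^w_h carries the subtrees
    T^1_{h-1}, T^2_{h-1}, ..., T^w_{h-1}; T^w_0 is a single leaf.
    """
    if w < 1 or h < 0:
        raise ValueError("need w >= 1 and h >= 0")
    if h == 0:
        return 1
    return sum(count_leaves(k, h - 1) for k in range(1, w + 1))


def closed_form(w: int, h: int) -> int:
    """Binomial coefficient from Theorem 2.7."""
    return comb(h + w - 1, w - 1)


if __name__ == "__main__":
    for w in range(1, 6):
        for h in range(0, 6):
            assert count_leaves(w, h) == closed_form(w, h)
    # Width d + 1 = 3 (i.e. d = 2) and height 2.
    print(count_leaves(3, 2), closed_form(3, 2))
```

Under this assumption the recursion reproduces the binomial coefficient via the identity $\sum_{k=1}^{w} \binom{h+k-2}{k-1} = \binom{h+w-1}{w-1}$.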
Clearly, the number of leaves in $S^{d+1}_h$ is at most the number of leaves in $T^{d+1}_h$. Hence, due to Theorem 2.7, this would result in
\[ |\mathcal{P}^d_\varepsilon| \le \binom{d+h}{d}, \tag{19} \]
see [22, Theorem 3.4]. The next lemma is crucial for being able to prove a good upper bound on $h$.
Assume that $Q^{(r)}_i$ and $Q^{(r)}_j$ appear in the decomposition process of $I^d$. Then we have and, if Inequality (20) is strict if and only if for some $\nu \in \{1, \dots, r-1\}$ we have $i_\nu < j_\nu$, while inequality (21) is strict if and only if there exists some $\nu \in \{1, \dots, r\}$ such that $i_\nu < j_\nu$.
Proof. We prove the inequalities via induction on $r$.
For $r = 1$ Corollary 2.2 implies W(Q d . Now let $r \in \{2, \dots, h\}$. We assume that the statement of Lemma 2.8 holds for $r-1$. Note that $Q^{(r-1)}_{(i_1,\dots,i_{r-1})}$ and $Q^{(r-1)}_{(j_1,\dots,j_{r-1})}$ appear in the decomposition process and have weights strictly larger than $\varepsilon$. In case that $(i_1,\dots,i_{r-1}) = (j_1,\dots,j_{r-1})$ the statement follows again from Corollary 2.2 and Lemma 2.4. Otherwise our induction hypothesis yields (j_1, ..., j_{r−1})) =: w′ and δ := δ (j_1, ..., j_{r−1}) =: δ′. Now identity (17) implies j), and Lemma 2.4 yields

Theorem 2.9. Let $d \in \mathbb{N}$ and $\varepsilon \in (0,1]$. The height $h = h(d, \varepsilon)$ of the partition $\mathcal{P}^d_\varepsilon$ is given by $h = \min\{r \in \mathbb{N}_0 : W(Q^{(r)}_{1_r}) \le \varepsilon\}$, and it satisfies the upper bound The cardinality of the $\varepsilon$-bracketing cover $\mathcal{P}^d_\varepsilon$ can be estimated as if $d = 2$, and as

Proof. For $r \in \mathbb{N}$ we put $w_r := W(Q^{(r)}_{1_r})$ and $\delta_r := \delta_{Q^{(r)}_{1_r}}$, and additionally we set From (15) and (13) we obtain $\delta_r = \left( \frac{w_r - \varepsilon}{w_r} \right)^{1/d}$ for all $r \in \mathbb{N}_0$, and due to (17) we have $w_r = \delta_{r-1} w_{r-1}$ for all $r \in \mathbb{N}$. This yields where in the last step we used the inequality $(1-x)^{1/d} \le 1 - x/d$, which holds true for all $x \in [0,1]$. Due to Definition 2.5 and Lemma 2.8 the height $h$ of the partition $\mathcal{P}^d_\varepsilon$ is given by $h = \min\{r \in \mathbb{N}_0 : w_r \le \varepsilon\}$, and note that $w_r \le \varepsilon$ if and only if 1 Due to (19) we obtain This yields in the case $d = 2$ In the case where $d \ge 3$, we employ the fact that a product consisting of $d$ positive factors is always at most as large as the $d$-th power of the arithmetic mean of the factors, which gives us

Remark 2.10 (Some Further Improvements). If we apply in (25) a less coarse estimate, we may obtain a slightly better result than (22). Indeed, consider the function $f$ : Using the Taylor series expansion of the function $(1-x)^{1/d}$, we get for $w_{\ell-1} > \varepsilon$ the identity Since $f$ is a monotonic increasing function and $w_{\ell-1} \le 1$, we have Analogously as in the proof of Theorem 2.9, we obtain that the height $h$ satisfies For instance, employing the first-order truncation $1 + \frac{d-1}{2d}\varepsilon \le f(\varepsilon)$, (27) implies the estimate , which for moderately small $\varepsilon$ clearly improves on (22). By using the same arguments as in the proof of Theorem 2.9, we now may improve (23) and (24) by if $d = 2$, and by

Remark 2.11 (The Case d = 2). The upper bounds for the bracketing number presented in Theorem 2.9 and Remark 2.10 improve on the best bounds known so far in the case $d \ge 3$. In the case where $d = 2$, better bounds are actually known. Indeed, in [11, Proposition 5.1] it was shown constructively that the asymptotic behavior of the bracketing number in dimension 2 is
\[ N_{[\,]}(\varepsilon, \mathcal{C}_2, \|\cdot\|_{L^1}) \le \varepsilon^{-2} + o(\varepsilon^{-2}); \tag{30} \]
a matching lower bound was proved in [10, Theorem 1.5]. An upper bound with fully explicit constants was established constructively in [14, Theorem 2.2] and reads
\[ N_{[\,]}(\varepsilon, \mathcal{C}_2, \|\cdot\|_{L^1}) \le 2\ln(2)\,\varepsilon^{-2} + 3(\ln(2) + 1)\,\varepsilon^{-1} - \frac{13}{9}\ln(2) - 1; \tag{31} \]
note that this bound has a smaller coefficient in front of the most important power $\varepsilon^{-2}$ than the bound (23).
The constructions that yield (30) and (31), respectively, are different from Thiémard's construction. But in [11, Proposition 3.1] the following asymptotic bound for Thiémard's bracketing cover was proved: Moreover, numerical results in [11] indicate that Thiémard's bracketing cover $\mathcal{P}^2_\varepsilon$ has a comparable or even slightly smaller cardinality than the one that yielded the bound (31), but a clearly larger one than the one that implied the bound (30). The problem with the latter construction is that it is not obvious how to generalize it to arbitrary dimensions in such a way that its cardinality can be upper bounded with reasonable effort.

open) subintervals until all those subintervals are $\varepsilon$-brackets, i.e., have a weight at most $\varepsilon$. More precisely, it starts with $I^d$, which is a subinterval of type 1, and partitions it into $d$ subintervals $Q^{I^d}_1, \dots, Q^{I^d}_d$, where $Q^{I^d}_j$ is a subinterval of type $j$, and a subinterval $Q^{I^d}_{d+1}$ of type $d+1$, which has weight exactly $\varepsilon$; $Q^{I^d}_{d+1}$ is added to the bracketing cover $\mathcal{P}^d_\varepsilon$. If for $j \in \{1, \dots, d\}$ the subinterval $Q^{I^d}_j$ has weight at most $\varepsilon$, it will be added to $\mathcal{P}^d_\varepsilon$; otherwise it will later be partitioned into smaller subintervals. More generally, in the recursion step, a subinterval $P = [\alpha, \beta)$ of type $j \in \{1, \dots, d\}$ with $\alpha = \alpha_P$, $\beta = \beta_P \in [0,1]^d$ and $W(P) > \varepsilon$ is partitioned into subintervals

j appear in the decomposition process of $I^d$ if and only if

The partitioning process should result in a partition of the $d$-dimensional unit cube; to achieve this it is necessary to work with half-open intervals instead of closed ones. To avoid picky distinctions, we will identify half-open $d$-dimensional intervals $[x, y)$ with their corresponding bracket $[x, y]$ and, consequently, call $[x, y)$ itself a bracket. Accordingly, the weight of such a bracket is given by $W([x, y)) := W([x, y])$.
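The recursion used in the proof of Theorem 2.9, namely $\delta_r = ((w_r - \varepsilon)/w_r)^{1/d}$ and $w_r = \delta_{r-1} w_{r-1}$ together with $h = \min\{r \in \mathbb{N}_0 : w_r \le \varepsilon\}$, can be iterated numerically. The following is a minimal sketch under the assumption that the starting value is $w_0 = W(I^d) = 1$; the names `partition_height` and `leaf_bound` are hypothetical, and floating-point rounding may shift the result by one step when some $w_r$ is extremely close to $\varepsilon$.

```python
from math import ceil, comb


def partition_height(d: int, eps: float) -> int:
    """Iterate w_r = delta_{r-1} * w_{r-1} and return min{r : w_r <= eps}.

    Assumption: w_0 = 1, the weight of the unit cube viewed as a bracket.
    """
    if d < 1 or not (0.0 < eps <= 1.0):
        raise ValueError("need d >= 1 and 0 < eps <= 1")
    w = 1.0
    r = 0
    while w > eps:
        delta = ((w - eps) / w) ** (1.0 / d)  # delta_r from the proof
        w = delta * w                          # w_{r+1} = delta_r * w_r
        r += 1
    return r


def leaf_bound(d: int, h: int) -> int:
    """Number of leaves of T^{d+1}_h, an upper bound for the cover size."""
    return comb(d + h, d)


if __name__ == "__main__":
    d, eps = 3, 0.25
    h = partition_height(d, eps)
    # Elementary consequence of (1 - x)^(1/d) <= 1 - x/d:
    # w_r <= w_{r-1} - eps/d, so at most ceil(d * (1 - eps) / eps) steps.
    print(h, ceil(d * (1 - eps) / eps), leaf_bound(d, h))
```

The comparison value $\lceil d(1-\varepsilon)/\varepsilon \rceil$ follows from $w_r \le w_{r-1} - \varepsilon/d$; it is our own elementary estimate and is not meant to reproduce the precise form of the bound (22).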