Adaptive estimation of convex polytopes and convex sets from noisy data

Abstract: We estimate convex polytopes and general convex sets in R^d, d ≥ 2, in the regression framework. We measure the risk of our estimators using an L¹-type loss function and prove upper bounds on these risks. We show, in the case of convex polytopes, that these estimators achieve the minimax rate. For convex polytopes, this minimax rate is (ln n)/n, which differs from the parametric rate for non-regular families by a logarithmic factor, and we show that this extra factor is essential. Using polytopal approximations, we extend our results to general convex sets, for which we achieve the minimax rate up to a logarithmic factor. In addition, we provide an estimator that is adaptive with respect to the number of vertices of the unknown polytope, and we prove that this estimator is optimal on all classes of convex polytopes with a given number of vertices.


Definitions and notations
Let d ≥ 2 be an integer. Assume that we observe a sample of n i.i.d. pairs (X_i, Y_i), i = 1, …, n, such that X_1, …, X_n have the uniform distribution on [0, 1]^d and

Y_i = I(X_i ∈ G) + ξ_i,  i = 1, …, n.  (1)
The collection X_1, …, X_n is called the design. The errors ξ_i, i = 1, …, n, are i.i.d. zero-mean random variables independent of the design, G is a subset of [0, 1]^d, and I(· ∈ G) stands for the indicator function of the set G. Our aim is to estimate the set G in model (1). A subset Ĝ_n of [0, 1]^d is called a set estimator (or simply, in our framework, an estimator) if it is a Borel set and if there exists a real measurable function f defined on [0, 1]^d × ([0, 1]^d × R)^n such that I(· ∈ Ĝ_n) = f(·, X_1, Y_1, …, X_n, Y_n).
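To fix ideas, here is a minimal simulation sketch of model (1) in dimension d = 2. It is an added illustration, not part of the paper's analysis: the true set G (a triangle), the sample size and the noise level below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 1000, 0.5  # illustrative sample size and noise level

# Hypothetical true set G: a triangle in [0, 1]^2 with arbitrarily chosen vertices.
A, B, C = np.array([0.2, 0.2]), np.array([0.8, 0.3]), np.array([0.5, 0.9])

def in_triangle(X, A, B, C):
    """Indicator I(x in G) for the triangle ABC, via same-side sign tests."""
    def cross(P, Q, X):
        return (Q[0] - P[0]) * (X[:, 1] - P[1]) - (Q[1] - P[1]) * (X[:, 0] - P[0])
    s1, s2, s3 = cross(A, B, X), cross(B, C, X), cross(C, A, X)
    return ((s1 >= 0) & (s2 >= 0) & (s3 >= 0)) | ((s1 <= 0) & (s2 <= 0) & (s3 <= 0))

X = rng.uniform(0.0, 1.0, size=(n, 2))           # uniform design on [0, 1]^2
xi = rng.normal(0.0, sigma, size=n)              # i.i.d. N(0, sigma^2) errors
Y = in_triangle(X, A, B, C).astype(float) + xi   # Y_i = I(X_i in G) + xi_i
```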
If G is a measurable (with respect to the Lebesgue measure on R^d) subset of [0, 1]^d, we denote by |G|_d or, when there is no possible confusion, simply by |G|, its Lebesgue measure, and by P_G the probability measure corresponding to the distribution of the collection of n pairs (X_i, Y_i), i = 1, …, n. Where it is necessary to indicate the dependence on n, we use the notation P_G^{⊗n}. If G_1 and G_2 are two measurable subsets of R^d, their Nikodym pseudo-distance d_1(G_1, G_2) is defined as

d_1(G_1, G_2) = |G_1 △ G_2|.  (2)

Note that if Ĝ_n is a set estimator and G is a measurable subset of [0, 1]^d, then the quantity

|G △ Ĝ_n| = ∫_{[0,1]^d} |I(x ∈ Ĝ_n) − I(x ∈ G)| dx

is well defined and, by Fubini's theorem, measurable with respect to the probability measure P_G. Therefore one can measure the accuracy of the set estimator Ĝ_n on a given class of sets in the minimax framework; the risk of Ĝ_n on a class C is defined as

R_n(Ĝ_n; C) = sup_{G ∈ C} E_G[ |G △ Ĝ_n| ],

where E_G denotes the expectation with respect to P_G. For all the estimators that we define in the sequel we will be interested in upper bounds on their risks, which give information about the rate at which these risks tend to zero as the number n of available observations tends to infinity. For a given class C of subsets, the minimax risk on this class when n observations are available is defined as

R_n(C) = inf_{Ĝ_n} R_n(Ĝ_n; C),

where the infimum is taken over all set estimators depending on n observations. If R_n(C) converges to zero, we call the speed at which it tends to zero the minimax rate of convergence on the class C.
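Since |G △ Ĝ_n| is an integral over [0, 1]^d, it can be approximated in practice by uniform Monte Carlo sampling. The following sketch is an illustration added here (not a procedure from the paper); it evaluates the Nikodym distance between two sets given by indicator functions.

```python
import numpy as np

def nikodym_mc(ind_G, ind_H, d, m=100_000, seed=1):
    """Monte Carlo approximation of |G symmetric-difference H| on [0, 1]^d.
    ind_G, ind_H map an (m, d) array of points to boolean indicator values."""
    U = np.random.default_rng(seed).uniform(size=(m, d))
    return float(np.mean(ind_G(U) != ind_H(U)))

# Example: two axis-aligned rectangles whose symmetric difference is
# [0, 0.5] x [0.5, 0.6], of Lebesgue measure 0.05.
g = lambda X: (X[:, 0] < 0.5) & (X[:, 1] < 0.5)
h = lambda X: (X[:, 0] < 0.5) & (X[:, 1] < 0.6)
print(nikodym_mc(g, h, d=2))  # approximately 0.05
```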
In this paper, we study minimax rates of convergence on two classes of subsets of [0, 1]^d: the class of all compact and convex sets, and the class of all convex polytopes with at most r vertices, where r is a given positive integer. Let C be a given class of subsets of [0, 1]^d. One of our aims is to provide a lower bound on the minimax risk on the class C. Such a lower bound tells us how close the risk of a given estimator is to the minimax risk on the class under consideration. If the rate (a sequence depending on n) of the upper bound on the risk of an estimator matches the rate of the lower bound on the minimax risk on the class C, then the estimator is said to have the minimax rate of convergence on this class.
We denote by ρ the Euclidean distance in R^d, by B_d(y, r) the d-dimensional closed Euclidean ball centered at y ∈ R^d with radius r, and by β_d the volume of the unit Euclidean ball in R^d. For any positive real number x, we denote by ⌊x⌋ the greatest integer less than or equal to x. Every convex set considered in the following is assumed to be compact with nonempty interior in the ambient topological space.

Former results and contributions
Estimation of convex sets and, more generally, of sets has been extensively studied in the past decades (see the nice surveys in Cuevas [6] and Cuevas and Fraiman [7] and the references therein, and related topics in [15]). The first works, in the 1960's, due to Rényi and Sulanke [26, 27] and Efron [10], were motivated by questions of stochastic geometry, discussed, for instance, in the book by Kendall and Moran [16] and in [1]. Most of the works on estimation of convex sets dealt with models different from ours. Rényi and Sulanke [26, 27] were the first to study the convex hull of a sample of n i.i.d. random points in the plane. They obtained exact asymptotic formulas for the expected area and the expected number of vertices when the points are uniformly distributed over a convex set, and when they have a Gaussian distribution. They showed that if the points are uniformly distributed over a convex set K in the plane R², then the expected missing area E[|K \ K̂_n|] of the convex hull K̂_n of the collection of these points is of the order (ln n)/n when K is a polygon, and of the order n^{−2/3} when the boundary of K is smooth. This result was generalized to any dimension, and we refer to [2] for an overview.
Estimation of convex sets in a multiplicative regression model has been investigated by Mammen and Tsybakov [22] and Korostelev and Tsybakov [19]. The design (X_1, …, X_n) may be either random or deterministic, in [0, 1]^d. In [22], Mammen and Tsybakov proposed an estimator of a convex set G based on likelihood maximization over an ε-net whose cardinality is bounded in terms of the metric entropy [9]. They showed, with no assumption on the design, that the rate of their estimator cannot be improved.
The additive model (1) has been studied in [18] and [19] in the case where G belongs to a smooth class of boundary fragments and the errors are i.i.d. Gaussian variables with known variance. If γ is the smoothness parameter of the class under study, it is shown there that the rate of the minimax risk on the class is n^{−γ/(γ+d−1)}. The case of convex boundary fragments is covered by the case γ = 2, which leads to the expected rate n^{−2/(d+1)} for the minimax risk, as we will discuss later (Section 5). It is important to note that in these works the authors always assumed that the fragment, which is included in [0, 1]^d, has a boundary that is uniformly separated from 0 and 1; we will not make such an assumption in our work. Korostelev and Tsybakov [18, 19] also considered some non-Gaussian noise, under more general assumptions. Cuevas and Rodríguez-Casal [8] and Pateiro-López [24] studied the properties of set estimators of the support of a density under several geometric assumptions on the boundary of the unknown set.
One problem has not been investigated yet: what is the minimax rate of convergence if one assumes that the unknown set G in model (1) is a convex polytope with a bounded number of vertices? This question can be reformulated in the framework of boundary fragments: what is the minimax rate of convergence if G is a fragment that belongs to a parametric family? In the method used in [18] and [19], the true fragment is first approximated by an element of a parametric family of fragments, whose dimension is then chosen according to the optimal bias-variance tradeoff. Thus, a parametric approximation of the fragment G, and not G itself, is estimated. This idea is exploited in the present work when we estimate convex sets using polytopal approximations. It is easy to show that the rate of convergence of the estimator, when G belongs to a parametric family of boundary fragments of dimension M, is of the order M/n. But this is true under the assumption of uniform separation from 0 and 1. We will see below that if this assumption is dropped in a special case of a parametric family, namely convex polytopes with a bounded number of vertices, an extra logarithmic factor appears in the rate of the minimax risk.
In order to estimate convex sets, we first approximate a convex set by a convex polytope and then estimate that polytope. There is an extensive literature on polytopal approximation of convex sets (cf. [23, 11] and the references cited therein), which is of essential use in this paper. This method provides an explicit estimator, but it will be shown to be suboptimal. This is why we also propose another, rather classical, method based on the metric entropy, which yields a rate-minimax estimator.
For an integer r ≥ d + 1, we denote by P_r the class of all convex polytopes in [0, 1]^d with at most r vertices. This class can be embedded into the finite-dimensional space R^{dr}, since any such polytope is completely determined by the coordinates of its vertices. Hence one might expect the problem of estimating G ∈ P_r, for a given r, to be parametric, with a minimax risk R_n(P_r) of order 1/n, cf. [13]. However, this is not the case. In Section 2.1, we propose an estimator that almost achieves this rate, up to a logarithmic factor. Moreover, we prove an exponential deviation inequality for the Nikodym distance between the estimator and the true polytope. Such an exponential inequality is of interest because it is much stronger than an upper bound on the risk of the estimator, and it is the key to adaptive estimation, as we will see later. In Section 2.2, we show that this estimator has the minimax rate of convergence, so that the logarithmic factor in the rate is unavoidable. In Section 3, we extend the exponential deviation inequality of Section 2 and cover minimax estimation of arbitrary convex sets. In Section 4, we propose an estimator that is adaptive to the number of vertices of the estimated polytope, with the convention that a non-polytopal convex set is a convex polytope with infinitely many vertices. In Section 5, we discuss our results, and Section 6 is devoted to the proofs. We try as much as possible to use geometric and explicit methods, and elementary arguments, in the proofs.

Upper bound
We denote by P_0 the true polytope, i.e., G = P_0 in (1), and we assume that P_0 ∈ P_r. Denote by P_r^{(n)} the class of all convex polytopes in [0, 1]^d with at most r vertices whose coordinates are integer multiples of 1/n. It is clear that the set P_r^{(n)} is finite and that its cardinality is at most (n + 1)^{dr}. We estimate P_0 by a polytope in P_r^{(n)} that minimizes the sum of squared errors

A(P) = A(P, {(X_i, Y_i)}_{i=1,…,n}) = Σ_{i=1}^n ( Y_i − I(X_i ∈ P) )².  (3)

In what follows, we write A(P) instead of A(P, {(X_i, Y_i)}_{i=1,…,n}) in order to simplify the notation. Note that if the noise variables ξ_i are Gaussian, then minimization of A(P) is equivalent to maximization of the likelihood. Consider the set estimator of P_0 defined as

P̂_n^{(r)} ∈ argmin_{P ∈ P_r^{(n)}} A(P).  (4)

Note that since P_r^{(n)} is finite, the estimator P̂_n^{(r)} exists, but it is not necessarily unique.
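As a concrete, deliberately tiny illustration of (3) and (4), the sketch below minimizes the least squares criterion by brute force over triangles (d = 2, r = 3) whose vertices lie on a coarse k × k grid standing in for P_r^{(n)}. The helper in_triangle is as in the simulation sketch above; exhaustive search over the full grid class would of course be combinatorially expensive, so this is only meant to make the definition concrete.

```python
import itertools
import numpy as np

def in_triangle(X, A, B, C):
    def cross(P, Q, X):
        return (Q[0] - P[0]) * (X[:, 1] - P[1]) - (Q[1] - P[1]) * (X[:, 0] - P[0])
    s1, s2, s3 = cross(A, B, X), cross(B, C, X), cross(C, A, X)
    return ((s1 >= 0) & (s2 >= 0) & (s3 >= 0)) | ((s1 <= 0) & (s2 <= 0) & (s3 <= 0))

def least_squares_triangle(X, Y, k=6):
    """Brute-force minimizer of A(P) = sum_i (Y_i - I(X_i in P))^2 over all
    triangles with vertices on the grid {0, 1/k, ..., 1}^2."""
    grid = [np.array([i / k, j / k]) for i in range(k + 1) for j in range(k + 1)]
    best, best_crit = None, np.inf
    for A_, B_, C_ in itertools.combinations(grid, 3):
        crit = np.sum((Y - in_triangle(X, A_, B_, C_).astype(float)) ** 2)
        if crit < best_crit:
            best, best_crit = (A_, B_, C_), crit
    return best  # a minimizer; as noted above, it need not be unique
```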
Let us introduce the following assumption on the distribution of the ξ_i.

Assumption 1. The random variables ξ_i, i = 1, …, n, are i.i.d., zero-mean and subgaussian, i.e., they satisfy the exponential inequality

E[ exp(u ξ_i) ] ≤ exp( σ²u²/2 ), for all u ∈ R,

where σ is a positive number.
Note that if the errors ξ_i, i = 1, …, n, are i.i.d. zero-mean Gaussian random variables with variance σ², then Assumption 1 is satisfied (with equality).
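A quick numeric sanity check, assuming (as reconstructed above) that the subgaussian bound takes the standard form E[exp(uξ_i)] ≤ exp(σ²u²/2): for Gaussian errors the two sides coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, u = 0.7, 1.3  # arbitrary illustrative values
xi = rng.normal(0.0, sigma, size=2_000_000)
print(np.mean(np.exp(u * xi)))       # Monte Carlo estimate of E[exp(u*xi)]
print(np.exp(sigma**2 * u**2 / 2))   # Gaussian MGF: exp(sigma^2 u^2 / 2) ~ 1.51
```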
The next theorem establishes an exponential deviation inequality for the estimator P̂_n^{(r)}.

Theorem 1. Let r ≥ d + 1 be an integer and n ≥ 2. Consider the model (1) with G = P, where P ∈ P_r, and let Assumption 1 be satisfied. Then, for the estimator P̂_n^{(r)}, there exist two positive constants C_1 and C_2, which depend on d and σ only, such that, for all x > 0,

sup_{P ∈ P_r} P_P( |P̂_n^{(r)} △ P| ≥ (2dr ln n)/(C_2 n) + x/n ) ≤ C_1 e^{−C_2 x}.

The explicit expressions of the constants C_1 and C_2 are given in the proof. From the deviation inequality of Theorem 1 one can easily derive that the risk of the estimator P̂_n^{(r)} on the class P_r is of the order (ln n)/n. Indeed, we have the following result.

Corollary 1. Let n ≥ 2 and let the assumptions of Theorem 1 be satisfied. Then, for any positive number q, there exists a constant A_q, which depends on σ, d and q only, such that

sup_{P ∈ P_r} E_P[ |P̂_n^{(r)} △ P|^q ] ≤ A_q ( (dr ln n)/n )^q.

The explicit form of the constant A_q can be derived from the proof. Note that the construction of our estimator does not require the knowledge of σ.

Lower bound
Corollary 1 gives an upper bound of the order (ln n)/n for the risk of our estimator P̂_n^{(r)}. The next result shows that (ln n)/n is the minimax rate of convergence on the class P_r.

Theorem 2. Let r ≥ d + 1 be an integer. Consider the model (1) and assume that the errors ξ_i are i.i.d. zero-mean Gaussian random variables with variance σ² > 0. Then, for any large enough n,

inf_{Ĝ_n} sup_{P ∈ P_r} E_P[ |Ĝ_n △ P| ] ≥ C (ln n)/n,

where the infimum is taken over all set estimators and C is a positive constant that depends on σ only; its expression involves the number α = 1/2 − (ln 2)/(2 ln 3) ≈ 0.29 chosen in the proof.

Corollary 1 together with Theorem 2 gives the following two-sided bound on the class P_r, in the case of Gaussian noise with variance σ²:

0 < c ≤ (n/ln n) R_n(P_r) ≤ c′ < ∞,

for n large enough and r ≥ d + 1, where c and c′ are positive constants. Note that the lower bound does not depend on the number of vertices r. This is because we prove our lower bound for the class P_{d+1} and use the inclusion P_r ⊇ P_{d+1} for r ≥ d + 1. The minimax rate of convergence on any of the classes P_r, r ≥ d + 1, is therefore of the order (ln n)/n.
An inspection of the proofs shows that these results still hold for d = 1 and r = 2; namely, in model (1), the minimax risk for the estimation of segments in [0, 1] is of the order (ln n)/n.

A first estimator
Denote by C_d the class of all convex sets included in [0, 1]^d. We now aim to estimate convex sets in the same model, without any assumption on the form of the unknown set. If G is a convex set in model (1), an idea is to approximate G by a convex polytope. For example, one can select r points on the boundary of G and take their convex hull. This gives a convex polytope P_r with at most r vertices, inscribed in G. In Section 2 we showed how to estimate such an r-vertex convex polytope. Thus, if P_r approximates G well, an estimator of P_r is a good candidate for estimating G. The larger r is, the better P_r approximates G with respect to the Nikodym distance defined in (2). At the same time, as r increases, the upper bound of Corollary 1 increases as well. Therefore r should be chosen according to a bias-variance tradeoff.
For any integer r ≥ d + 1, consider again the estimator P̂_n^{(r)} defined in (4). However, we now choose a value of r that depends on n, in order to achieve the bias-variance tradeoff.
Theorem 3. Let r = ⌊ (n/ln n)^{(d−1)/(d+1)} ⌋ and let P̂_n^{(r)} be the estimator defined in (4). Let Assumption 1 be satisfied. Then there exist positive constants C_1, C_2 and C_3, which depend on d and σ only, such that, for all x > 0,

sup_{G ∈ C_d} P_G( |P̂_n^{(r)} △ G| ≥ C_3 (ln n/n)^{2/(d+1)} + x/n ) ≤ C_1 e^{−C_2 x}.

The constants C_1 and C_2 are the same as in Theorem 1, and C_3 is given explicitly in the proof of the theorem. From Theorem 3 we get the next corollary.
Corollary 2. Let the assumptions of Theorem 3 be satisfied. Then, for any positive number q, there exists a positive constant A′_q, which depends on σ, d and q only, such that

sup_{G ∈ C_d} E_G[ |P̂_n^{(r)} △ G|^q ] ≤ A′_q (ln n/n)^{2q/(d+1)}.
The explicit expression for A′_q can be derived in the same way as that of the constant A_q in Corollary 1. Note again that the construction of our estimator does not require the knowledge of σ.
Corollary 2 shows that the estimator of Theorem 3 achieves the rate (ln n/n)^{2/(d+1)}. This estimator has an advantage: it is explicit and rests on an intuitive geometric idea, the polytopal approximation of convex sets. However, as we show next, there exists an estimator that achieves the same rate without the logarithmic factor. That estimator is based on the metric entropy of the class C_d and is mainly of theoretical interest. We develop this in the next subsection.
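The cost of the logarithmic factor is easy to quantify numerically; the short computation below is an added illustration (not from the paper) comparing the two rates for d = 2.

```python
import numpy as np

d = 2
for n in [10**3, 10**5, 10**7]:
    poly_rate = (np.log(n) / n) ** (2 / (d + 1))  # rate of Corollary 2
    net_rate = n ** (-2 / (d + 1))                # rate without the log factor
    print(n, poly_rate, net_rate, poly_rate / net_rate)  # ratio = (ln n)^{2/3}
```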

Improvement of the upper bound
We propose an estimator whose construction is similar to that of [22], where the multiplicative model was considered. Bronshtein [4] proved the following upper bound on the metric entropy of C_d: if d ≥ 2 and δ is a positive number, then there exists a δ-net of C_d containing at most τ_1 e^{τ_2 δ^{−(d−1)/2}} sets, where τ_1 and τ_2 are positive numbers that depend on d only. Another result on the metric entropy of C_d was obtained by Dudley [9], but in a weaker form than Bronshtein's upper bound, and it could not be used in our analysis.
Let δ = n^{−2/(d+1)}, let N = ⌊ τ_1 e^{τ_2 δ^{−(d−1)/2}} ⌋, and let G_1, …, G_N be a δ-net of C_d. Let G ∈ C_d be the true set in model (1). We define the estimator G̃_n = G_ĵ, where ĵ is the index of a set of this δ-net that minimizes the sum of squared errors, as in Section 2.1:

ĵ ∈ argmin_{j=1,…,N} A(G_j),

where A is defined in (3). Note again that ĵ may not be unique. We have the following result.
Theorem 4. Let δ = n^{−2/(d+1)} and let G̃_n be the estimator defined above. Let Assumption 1 be satisfied. Then there exist a positive integer n_0(d), which depends on d only, and positive constants C_0 and C_2, which depend on d and σ only, such that, for all n ≥ n_0(d) and all x > 0,

sup_{G ∈ C_d} P_G( |G̃_n △ G| ≥ C_0 n^{−2/(d+1)} + x/n ) ≤ τ_1 e^{−C_2 x}.

Here C_0 = (C̃_1 + τ_2)/C_2, and the constants C̃_1, C_1 and C_2 are given in the proof of Theorem 1. Note again that the construction of the estimator G̃_n does not require the knowledge of the noise level σ.
As for the estimator of the previous section, we derive from Theorem 4 an upper bound on the risk of the estimator G̃_n.

Corollary 3. Let the assumptions of Theorem 4 be satisfied. Then, for any positive number q, there exists a positive constant A″_q, which depends on σ, d and q only, such that, for n ≥ n_0(d),

sup_{G ∈ C_d} E_G[ |G̃_n △ G|^q ] ≤ A″_q n^{−2q/(d+1)}.
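Abstractly, the selection step behind G̃_n is one line; the sketch below assumes the δ-net G_1, …, G_N is available as a list of indicator functions (Bronshtein's theorem guarantees its existence, but its construction is not made explicit here, so this is only a schematic illustration).

```python
import numpy as np

def select_from_net(net_indicators, X, Y):
    """Return the index j-hat minimizing A(G_j) over a given delta-net,
    each net element being encoded by its indicator function."""
    crits = [np.sum((Y - ind(X).astype(float)) ** 2) for ind in net_indicators]
    return int(np.argmin(crits))  # a minimizing index; possibly non-unique
```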

Lower bound
In this section we give a lower bound on the minimax risk on the class C d of all convex sets in [0, 1] d .
Theorem 5. Let n ≥ 125. Consider the model (1) and assume that the errors ξ_i are i.i.d. zero-mean Gaussian random variables with variance σ² > 0. Then there exists a positive constant C_4, which depends only on the dimension d and on σ, such that, for any set estimator Ĉ,

sup_{G ∈ C_d} E_G[ |Ĉ △ G| ] ≥ C_4 n^{−2/(d+1)}.

The explicit form of the constant C_4 can be found in the proof of the theorem. From Theorem 5 and Corollary 3 one gets, for n ≥ 125 and in the case of Gaussian noise,

0 < C_4 ≤ n^{2/(d+1)} R_n(C_d) ≤ A″_1 < ∞,

and therefore the minimax risk on the class C_d is of the order n^{−2/(d+1)}.

Adaptive estimation
In Section 2, we proposed an estimator that depends on the parameter r. A natural question is to find an estimator that is adaptive to r, i.e., that does not depend on r but achieves the optimal rate on each class P_r. The idea of what follows comes from Lepski's method for adaptation (see [21], or [5], Section 1.5, for a nice overview). Assume that the true number of vertices, denoted by r*, is unknown but bounded from above by a given integer R_n ≥ d + 1, which may depend on n and be arbitrarily large. Theorem 1 provides the estimator P̂_n^{(R_n)}, but it is clearly suboptimal if r* is small and R_n is large: the rate of convergence of P̂_n^{(R_n)} is (R_n ln n)/n, although the rate (r* ln n)/n can be achieved, according to Theorem 1, when r* is known. The procedure that we propose selects an integer r̂ based on the observations, and the resulting estimator is Q̂_n^{(r̂)}, where the candidate estimators Q̂_n^{(r)} are defined below. Note that R_n should not be of order larger than (ln n)^{−1} n^{(d−1)/(d+1)}, since for larger values of r, Corollaries 1 and 3 show that the estimation rate is better when one considers the class C_d instead of P_r. Let us denote, for r = d + 1, …, R_n − 1, Q̂_n^{(r)} = P̂_n^{(r)}, and set Q̂_n^{(R_n)} = G̃_n, the estimator of Theorem 4. Let C_a = max( (2d + 1)/C_2 , C_0 + 1/C_2 ), where the constants C_2 and C_0 are given in Theorems 1 and 4 respectively, and define

r̂ = min{ r ∈ {d + 1, …, R_n} : |Q̂_n^{(r)} △ Q̂_n^{(r′)}| ≤ 2C_a (r′ ln n)/n for all r′ = r, …, R_n }.
The integer r̂ is well defined; indeed, the set in the brackets in the last display is not empty, since the defining inequality is satisfied for r = R_n. The adaptive estimator is defined as P̂_n^{adapt} = Q̂_n^{(r̂)}. Note that the construction of P̂_n^{adapt} requires the knowledge of σ through the definition of r̂: it depends on the constant C_2 of Theorem 1, which itself depends on σ. We then have the following theorem.
Theorem 6. Let the assumptions of Theorem 1 be satisfied and let d + 1 ≤ R_n ≤ (ln n)^{−1} n^{(d−1)/(d+1)}. Then there exists a positive constant C_5, which depends on d and σ only, such that

sup_{P ∈ P_{r*}} E_P[ |P̂_n^{adapt} △ P| ] ≤ C_5 (r* ln n)/n, for every integer r* with d + 1 ≤ r* ≤ R_n − 1,

and

sup_{P ∈ P_{r*}} E_P[ |P̂_n^{adapt} △ P| ] ≤ C_5 n^{−2/(d+1)}, for every r* ≥ R_n, possibly infinite, with the convention P_∞ = C_d.

Thus, one and the same estimator P̂_n^{adapt} attains the optimal rate simultaneously on all the classes P_r, d + 1 ≤ r, and on the class C_d of all convex subsets of [0, 1]^d. The explicit form of the constant C_5 can easily be derived from the proof of the theorem.
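A sketch of the Lepski-type selection rule of Section 4 follows. The helper name d1_hat is hypothetical: d1_hat[r][rp] stands for a computed value of |Q̂_n^{(r)} △ Q̂_n^{(rp)}|, which could for instance be approximated by the Monte Carlo routine given earlier.

```python
import numpy as np

def select_r(d1_hat, d, R_n, C_a, n):
    """Return the smallest r such that the event A_r holds, i.e.
    d1_hat[r][rp] <= 2*C_a*rp*ln(n)/n for every rp = r, ..., R_n."""
    for r in range(d + 1, R_n + 1):
        if all(d1_hat[r][rp] <= 2 * C_a * rp * np.log(n) / n
               for rp in range(r, R_n + 1)):
            return r
    return R_n  # the defining inequality is trivially satisfied at r = R_n
```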

Discussion
Theorem 4 shows that the logarithmic factor in Corollary 2 can be dropped, so that the minimax rate of convergence on the class C_d is n^{−2/(d+1)}. However, Theorems 1 and 2 show that the logarithmic factor is essential in the case of convex polytopes. Let us try to understand what brings in this logarithmic factor in one case and not in the other.
Let us first answer the following question: what makes the estimation of sets on a given class C ⊆ C_d difficult in the studied model? First, there is the complexity of the class, expressed in terms of its metric entropy; we worked with this notion of complexity in Theorem 5, using δ-nets. The second issue is how detectable the individual sets of the given class are in our model. If the unknown subset G is too small then, with high probability, it contains no points of the design. Conditionally on this event, all the observations have the same distribution and no information in the sample can be used to detect G. A subset G therefore has to be large enough in order to be detectable by any procedure, and the threshold on the volume below which a subset cannot be detected gives a lower bound on the rate of the minimax risk. In [14], Janson studied asymptotic properties of the maximal volume of holes with a given shape, a hole being a subset of [0, 1]^d that contains no point of the design (X_1, …, X_n). Janson showed that, with high probability, there are convex and polytopal holes of volume of order (ln n)/n. This result suggests that the lower bound on the minimax risk in Theorem 2 should be of the order (ln n)/n. Our lower bound is attained on polytopes with very small volume; we do not use the specific structure of these polytopes to derive the lower bound, only the fact that some of them cannot be distinguished from the empty set, no matter what the shape of their boundary is, once their volume is chosen of order no larger than (ln n)/n. This shows that the rate 1/n, which would come from the complexity of the parametric class P_r, is not the right minimax rate of convergence: the order (ln n)/n dominates. On the other hand, the proof of our lower bound of the order n^{−2/(d+1)} for general convex sets uses only the structure and regularity of the boundaries; we do not make special use of hypotheses with small volume. The order n^{−2/(d+1)} is much larger than (ln n)/n and therefore determines the best lower bound achievable on the minimax risk on the class C_d.
Let us add two remarks to this discussion. First, if d = 2, it is easy to prove a better lower bound on the minimax risk on the class P_r, for any integer r ≥ 3, using the scheme of the proof of Theorem 5 in the case d = 2:

R_n(P_r) ≥ max( λ_1 (ln n)/n , λ_2 r/n ),

for some positive constants λ_1 and λ_2. It seems to us that this lower bound should remain true for any value of d, with constants λ_1 and λ_2 depending on d and σ. If ln n is larger than r, then the minimax risk is bounded from below by the rate (ln n)/n. If not, i.e., if the number of vertices of the unknown convex polytope can be arbitrarily large, the minimax risk is bounded from below by the rate r/n. Our second remark is the following. Let μ_0 be a fixed positive number. If one considers the subclass P′_r(μ_0) = {P ∈ P_r : |P| ≥ μ_0}, then subsets of [0, 1]^d with too small a volume are excluded. Therefore the construction used in the proof of Theorem 2 is no longer valid, and we expect the minimax rate of convergence on this class to be of the order r/n, i.e., without a logarithmic factor.

Proofs
Proof of Theorem 1. Let P_0 ∈ P_r be the true polytope. We have the following lemma, proven in Section 7.

Lemma 1. Let r ≥ d + 1 and n ≥ 2. For any convex polytope P ∈ P_r there exists a convex polytope P* ∈ P_r^{(n)} such that

|P △ P*| ≤ 2d^{d+1} (3/2)^d β_d / n.  (6)

Let P* be a convex polytope chosen in P_r^{(n)} which satisfies (6) with P = P_0, so that |P* △ P_0| ≤ 2d^{d+1}(3/2)^d β_d / n. Note that, for all ε > 0,

P_{P_0}( |P̂_n^{(r)} △ P_0| ≥ ε ) ≤ Σ_{P ∈ P_r^{(n)} : |P △ P_0| ≥ ε} P_{P_0}( A(P) ≤ A(P*) ).  (7)

For any P we have, by simple algebra,

A(P) − A(P*) = Σ_{i=1}^n Z_i,  (8)

where

Z_i = ( Y_i − I(X_i ∈ P) )² − ( Y_i − I(X_i ∈ P*) )²,  i = 1, …, n.  (9)

The random variables Z_i depend on P, but we omit this dependence in the notation. Therefore (7) implies that, for all positive numbers u,

P_{P_0}( A(P) ≤ A(P*) ) = P_{P_0}( Σ_{i=1}^n Z_i ≤ 0 ) ≤ E_{P_0}[ exp(−u Σ_{i=1}^n Z_i) ],

by Markov's inequality. Since the Z_i are mutually independent, we obtain

P_{P_0}( A(P) ≤ A(P*) ) ≤ ( E_{P_0}[ exp(−uZ_1) ] )^n.  (10)

By conditioning on X_1, denoting W = I(X_1 ∈ P) − I(X_1 ∈ P*) and using Assumption 1, we have

E_{P_0}[ exp(−uZ_1) ] ≤ E_{P_0}[ exp( 2σ²u² I(X_1 ∈ P △ P*) − uW + 2u I(X_1 ∈ P_0) W ) ].  (11)

We now develop the last expression in (11). It is convenient to use Table 1: the first three columns give the values that can be taken by the binary variables I(X_1 ∈ P), I(X_1 ∈ P*) and I(X_1 ∈ P_0) respectively, and the last column gives the resulting value of the term under the expectation in (11), that is, of exp( 2σ²u² I(X_1 ∈ P△P*) − uW + 2u I(X_1 ∈ P_0)W ). Hence one can write

E_{P_0}[ exp(−uZ_1) ] ≤ 1 − |P △ P*| + e^{2σ²u² + u} ( |(P ∩ P_0)\P*| + |P*\(P ∪ P_0)| ) + e^{2σ²u² − u} ( |(P* ∩ P_0)\P| + |P\(P* ∪ P_0)| ).  (12)

Table 1. Values of exp( 2σ²u² I(X_1 ∈ P△P*) − uW + 2u I(X_1 ∈ P_0)W ).

I(X_1 ∈ P)   I(X_1 ∈ P*)   I(X_1 ∈ P_0)   Value
1            1             1              1
1            1             0              1
1            0             1              e^{2σ²u² + u}
1            0             0              e^{2σ²u² − u}
0            1             1              e^{2σ²u² − u}
0            1             0              e^{2σ²u² + u}
0            0             1              1
0            0             0              1

Besides, by the triangle inequality, |P △ P*| ≥ |P △ P_0| − |P_0 △ P*|. Choose u = 1/(4σ²). Then the quantity 1 − e^{2σ²u² − u} is positive and, if |P △ P_0| ≥ ε, then

E_{P_0}[ exp(−uZ_1) ] ≤ 1 + C̃_1 |P_0 △ P*| − C_2 ε ≤ exp( C̃_1 |P_0 △ P*| − C_2 ε ).  (13)

We set C̃_1 = 1 + e^{3/(8σ²)} and C_2 = 1 − e^{−1/(4σ²)}; these are positive constants that do not depend on n or P_0. From (10) and (13), by the independence of the Z_i and by (6), we have

P_{P_0}( |P̂_n^{(r)} △ P_0| ≥ ε ) ≤ (n + 1)^{dr} C_1 e^{−C_2 n ε} ≤ n^{2dr} C_1 e^{−C_2 n ε},  (14)

where C_1 = exp( 2d^{d+1}(3/2)^d β_d C̃_1 ), noting that n + 1 ≤ n². Therefore, if we set ε = (2dr ln n)/(C_2 n) + x/n for a positive number x, we get the deviation inequality

P_{P_0}( |P̂_n^{(r)} △ P_0| ≥ (2dr ln n)/(C_2 n) + x/n ) ≤ C_1 e^{−C_2 x}.  (15)

Proof of Corollary 1. Corollary 1 follows directly from Theorem 1 and Fubini's theorem. Indeed, if we denote Z := |P̂_n^{(r)} △ P_0| and by P_Z its distribution, then Z is a nonnegative random variable and, by Fubini's theorem,

E_{P_0}[ Z^q ] = q ∫_0^∞ t^{q−1} P_{P_0}( Z ≥ t ) dt ≤ A_q ( (dr ln n)/n )^q,

for some positive constant A_q which depends on σ, d and q only. Two elementary facts are used in this computation: first, for any positive numbers a and b, (a + b)^{q−1} ≤ 2^{q−1}( a^{q−1} + b^{q−1} ) if q − 1 > 0 and (a + b)^{q−1} ≤ a^{q−1} + b^{q−1} if q − 1 ≤ 0; second, the equality ∫_0^∞ x^{q−1} e^{−C_2 x} dx = Γ(q)/C_2^q.

Proof of Theorem 2. This proof is a simple application of Fano's method; see Corollary 2.6 in [28] or, for a more general setting, [12]. For k = 1, …, M, we denote by P_k the probability distribution of the observations (X_i, Y_i), i = 1, …, n, when G = T_k in (1), and by E_k the expectation with respect to this distribution. A simple computation shows that the Kullback-Leibler divergence K(P_k, P_l) between P_k and P_l, for k ≠ l, is equal to nh/(4σ²), while the Nikodym distance between T_k and T_l, for k ≠ l, is of order h. Let α ∈ (0, 1) and γ = 1/(2σ²α). Then, if M = γ n/ln n, supposed without loss of generality to be an integer, we have

4σ²α M ln M = 2n − 2n (ln ln n)/(ln n) + 2n (ln γ)/(ln n) ≥ n,

for n large enough, so that the condition K(P_k, P_l) ≤ α ln M required by Fano's method is satisfied. Therefore, applying Corollary 2.6 in [28] with the pseudo-distance defined in (2), we get, for r ≥ d + 1,

inf_{Ĝ_n} sup_{P ∈ P_r} E_P[ |Ĝ_n △ P| ] ≥ (h/2) ( (ln(M + 1) − ln 2)/ln M − α ).

For n large enough, we have M ≥ 3 and (ln(M + 1) − ln 2)/ln M ≥ 1 − (ln 2)/(ln 3). We choose α = 1/2 − (ln 2)/(2 ln 3) ∈ (0, 1), so that the right-hand side of the last display is at least αh/2, a quantity of order (ln n)/n. This immediately implies Theorem 2.
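The tail-integration identity behind the proof of Corollary 1, E[Z^q] = q ∫_0^∞ t^{q−1} P(Z ≥ t) dt, can be checked numerically on a toy distribution (a standard exponential Z, for which E[Z^q] = Γ(q + 1)). This snippet is an added illustration, not part of the original proofs.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

q = 2.5
# For Z ~ Exp(1), P(Z >= t) = exp(-t), so the identity gives Gamma(q + 1).
val, _ = quad(lambda t: q * t ** (q - 1) * np.exp(-t), 0, np.inf)
print(val, gamma(q + 1))  # both approximately 3.3234
```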

Proof of Theorem 3
The idea of the proof is very similar to that of Theorem 1. Here we need to control an extra bias term, due to the approximation of G by an r-vertex convex polytope. We give the following lemma (cf. [11]).
Lemma 2. Let r ≥ d + 1 be an integer. For any convex set G ⊆ [0, 1]^d there exists a convex polytope P_r with at most r vertices such that

|G △ P_r| ≤ A d r^{−2/(d−1)},

where A is a positive constant that depends neither on r, nor on d, nor on G.
Let P_r be a polytope given by Lemma 2 for the true convex set G, and let P* be a polytope chosen in P_r^{(n)} such that |P* △ P_r| ≤ (4d)^{d+1} β_d / n, as in the proof of Theorem 1. Thus, by the triangle inequality,

|G △ P*| ≤ |G △ P_r| + |P_r △ P*| ≤ A d r^{−2/(d−1)} + (4d)^{d+1} β_d / n.

We now bound from above the probability P_G( |P̂_n^{(r)} △ G| ≥ ε ) for any ε > 0. As in (7) and (9), we have

P_G( |P̂_n^{(r)} △ G| ≥ ε ) ≤ Σ_{P ∈ P_r^{(n)} : |P △ G| ≥ ε} P_G( A(P) ≤ A(P*) ),

and we repeat the argument of (8) with G instead of P_0. The rest of the proof is very similar to that of Theorem 1: replacing P_0 by G in that proof, between (7) and (12) and in (13) to (15), one gets, setting ε = (2dr ln n)/(C_2 n) + C̃_1 A d/(C_2 r^{2/(d−1)}) + x/n for a positive number x, the deviation inequality

P_G( |P̂_n^{(r)} △ G| ≥ (2dr ln n)/(C_2 n) + C̃_1 A d/(C_2 r^{2/(d−1)}) + x/n ) ≤ C_1 e^{−C_2 x},

where the constants are defined as in the previous section. This ends the proof of Theorem 3, by choosing r = ⌊ (n/ln n)^{(d−1)/(d+1)} ⌋: for this choice, the two deterministic terms (2dr ln n)/(C_2 n) and C̃_1 A d/(C_2 r^{2/(d−1)}) are both of order (ln n/n)^{2/(d+1)}, which yields the constant C_3.

Proof of Theorem 4. The proof is similar to the proof of Theorem 1, the difference being that we now use a δ-net instead of a grid. If G is the true set, let i*, 1 ≤ i* ≤ N, be the index of a set of the δ-net whose distance to G is not greater than δ: |G_{i*} △ G| ≤ δ. It follows from the definition of the estimator that

P_G( |G̃_n △ G| ≥ ε ) ≤ Σ_{i : |G_i △ G| ≥ ε} P_G( A(G_i) ≤ A(G_{i*}) ).

This leads to the same inequality as (14), where the sum is now over the indices i = 1, …, N such that |G_i △ G| ≥ ε, and where the term 2d^{d+1}(3/2)^d β_d C̃_1 / n should be replaced by C̃_1 δ.

Proof of Theorem 5
We first prove the theorem in the case d = 2 and then generalize the proof to d ≥ 3. We more or less follow the lines of the proof of the lower bound in [20] (which is similar to the proof of Assouad's lemma; see [28]). Let G be the disk centered at (1/2, 1/2) with radius 1/2, and let P be a regular convex polygon with M vertices, all of them lying on the boundary of G. Each edge of P cuts a cap off G, of area h, with π³/(12M³) ≤ h ≤ π³/M³ as soon as M ≥ 6, which we assume in the sequel. We denote these caps by D_1, …, D_M and, for any ω = (ω_1, …, ω_M) ∈ {0, 1}^M, we denote by G_ω the set obtained from G by removing all the caps D_j for which ω_j = 0, j = 1, …, M.
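The cap-area bounds can be verified directly: each edge of a regular M-gon inscribed in a circle of radius R = 1/2 subtends the central angle θ = 2π/M and cuts off a circular cap of area (R²/2)(θ − sin θ). A quick check (an added illustration):

```python
import numpy as np

for M in [6, 12, 100]:
    theta = 2 * np.pi / M
    h = (0.5 ** 2 / 2) * (theta - np.sin(theta))  # area of one cap, R = 1/2
    ok = np.pi ** 3 / (12 * M ** 3) <= h <= np.pi ** 3 / M ** 3
    print(M, h, ok)  # the bounds hold for every M >= 6
```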
For two probability measures P and Q defined on the same space and having densities p and q with respect to a common measure ν (so that dP = p dν and dQ = q dν), we denote by H(P, Q) the Hellinger distance between P and Q, defined by

H(P, Q)² = ∫ ( √p − √q )² dν.

Some useful properties of the Hellinger distance can be found in [28], Section 2.4. Now, let us consider any estimator Ĝ. For j = 1, …, M, we denote by A_j the smallest convex cone with origin at (1/2, 1/2) that contains the cap D_j. Note that the cones A_j, j = 1, …, M, have pairwise intersections of null Lebesgue measure. Then we have the following inequalities.
Then, denoting C_9 = 1 − e^{−1/(8σ²)}, it follows from (16) and (17) that the risk of Ĝ is bounded from below by a quantity of order Mh, up to a factor depending on C_9 only. Besides, since we assumed that M ≥ 6, we have h ≥ π³/(12M³), and if we take M = ⌊n^{1/3}⌋ we get, by concavity of the logarithm,

sup_{G ∈ C_2} E_G[ |Ĝ △ G| ] ≥ C_4 n^{−2/3},

where C_4 is a positive constant that depends only on σ. This inequality holds for n ≥ 216, so that M ≥ 6.
We now deal with the case d ≥ 3. Let us first recall some definitions and the resulting properties, which can also be found in [17].

Definition 1. Let (S, ρ) be a metric space and η a positive number. A family Y ⊆ S is called an η-packing family if and only if ρ(y, y′) ≥ η for all y, y′ ∈ Y with y ≠ y′. An η-packing family is called maximal if and only if it is not strictly included in any other η-packing family. A family Z ⊆ S is called an η-net if and only if, for every x ∈ S, there is an element z ∈ Z that satisfies ρ(x, z) ≤ η.
We now give a lemma.

Lemma 4. Let S be the sphere with center a_0 = (1/2, …, 1/2) ∈ R^d and radius 1/2, and let ρ be the Euclidean distance in R^d; we still denote by ρ its restriction to S. Let η ∈ (0, 1). Then any η-packing family of (S, ρ) is finite, and any maximal η-packing family has a cardinality M_η that satisfies

c_d η^{−(d−1)} ≤ M_η ≤ c′_d η^{−(d−1)},  (18)

for two positive constants c_d and c′_d that depend on d only. The construction of the hypotheses used for the lower bound in the case d = 2 requires a little more work in general dimension, since it is not always possible to construct a regular convex polytope with a given number of vertices inscribed in a given ball. For the following geometric construction, we refer to Figure 1.
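A maximal η-packing as in Lemma 4 can be produced greedily: any point farther than η from the current family could be added, so the greedy output is maximal over the candidate set, and a maximal packing is automatically an η-net. A small randomized sketch, with arbitrary illustrative parameters:

```python
import numpy as np

def greedy_packing(d, eta, n_candidates=20_000, seed=3):
    """Greedy eta-packing of the sphere S of center (1/2,...,1/2), radius 1/2."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n_candidates, d))
    S = 0.5 + 0.5 * Z / np.linalg.norm(Z, axis=1, keepdims=True)  # points on S
    packing = []
    for y in S:
        if all(np.linalg.norm(y - z) >= eta for z in packing):
            packing.append(y)
    return np.array(packing)

print(len(greedy_packing(d=3, eta=0.3)))  # the cardinality M_eta of the family
```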
Let G_0 be the closed ball in R^d with center a_0 = (1/2, …, 1/2) and radius 1/2, so that G_0 ⊆ [0, 1]^d. Let η ∈ (0, 1), to be chosen precisely later, and let {y_1, …, y_{M_η}} be a maximal η-packing family of S = ∂G_0; the integer M_η satisfies (18) by Lemma 4. For j ∈ {1, …, M_η}, we set U_j = S ∩ B_d(y_j, η/2) and denote by W_j the (d−2)-dimensional sphere S ∩ ∂B_d(y_j, η/2). Let H_j be the affine hull of W_j, i.e., its supporting hyperplane. This hyperplane dissects the space R^d into two halfspaces; let H_j^− be the one that contains the point y_j. For ω = (ω_1, …, ω_{M_η}) ∈ {0, 1}^{M_η}, we set

G_ω = G_0 \ ⋃_{j : ω_j = 0} ( G_0 ∩ H_j^− ).

The set G_ω is obtained from G_0 by removing all the caps cut off by the hyperplanes H_j, for the indices j such that ω_j = 0.
We are now all set to reproduce the proof written in the case d = 2. Note that, for all ω ∈ {0, 1}^{M_η} and j ∈ {1, …, M_η}, the volume of the cap G_0 ∩ H_j^− does not depend on j: indeed, η²/4 is the height of the cap cut off by H_j, in other words the distance between y_j and the hyperplane H_j, which is independent of the index j. Since 0 < η²/4 < 1/4, this common volume is of order η^{d+1}; namely, there are positive constants κ_d and κ′_d, depending on d only, such that

κ_d η^{d+1} ≤ |G_0 ∩ H_j^−| ≤ κ′_d η^{d+1}.  (19)
Now, continuing as in (16) and (17), replacing M by M_η and h by the lower bound in (19), and using Lemmas 3 and 4, we get

sup_{G ∈ C_d} E_G[ |Ĝ △ G| ] ≥ c M_η η^{d+1},  (20)

where c is a positive constant that depends on d and σ only.
Note that, since the ball B_{d−1}(0, 1/2) is included in the (d−1)-dimensional hypercube centered at the origin with sides of length 1, we have C_9 < 1. Therefore, since η < 1 as well, the concavity of the logarithm applies to (20). Let us choose η = n^{−1/(d+1)}; since M_η is of order η^{−(d−1)} by (18), the bound (20) becomes

sup_{G ∈ C_d} E_G[ |Ĝ △ G| ] ≥ C_4 n^{−2/(d+1)},

which proves Theorem 5.

Proof of Theorem 6. Let r* be a given finite integer such that d + 1 ≤ r* ≤ R_n − 1. Recall that, by definition, Q̂_n^{(r)} = P̂_n^{(r)} for r = d + 1, …, R_n − 1, and Q̂_n^{(R_n)} = G̃_n. Note that if r* ≤ r ≤ r′, then P_{r*} ⊆ P_r ⊆ P_{r′}. Therefore, if P ∈ P_{r*} and G = P in model (1), then by Theorem 1, with high probability, we have, using the triangle inequality, for any r* ≤ r ≤ r′ ≤ R_n − 1,

|Q̂_n^{(r)} △ Q̂_n^{(r′)}| ≤ |Q̂_n^{(r)} △ P| + |P △ Q̂_n^{(r′)}| ≤ 2C (r′ ln n)/n,  (21)

where C is a constant. Therefore it is reasonable to select r̂ as the minimal integer that satisfies (21). Let r̂ be chosen as in Section 4. For r = d + 1, …, R_n, let us denote by A_r the following event:
A_r = { for all r′ = r, …, R_n : |Q̂_n^{(r)} △ Q̂_n^{(r′)}| ≤ 2C_a (r′ ln n)/n },

where C_a is defined in Section 4 from the constant C_2 of Theorem 1. Then r̂ is the smallest integer r ≤ R_n such that A_r holds. Let P ∈ P_{r*}. We write

E_P[ |P̂_n^{adapt} △ P| ] = E_P[ |P̂_n^{adapt} △ P| I(r̂ ≤ r*) ] + E_P[ |P̂_n^{adapt} △ P| I(r̂ > r*) ],  (22)

and we bound the two terms on the right side separately. Note that if r̂ ≤ r*, then, since the event A_{r̂} holds by definition,

|Q̂_n^{(r*)} △ Q̂_n^{(r̂)}| ≤ 2C_a (r* ln n)/n.
Therefore, using the triangle inequality,

E_P[ |P̂_n^{adapt} △ P| I(r̂ ≤ r*) ] ≤ E_P[ |Q̂_n^{(r̂)} △ Q̂_n^{(r*)}| ] + E_P[ |Q̂_n^{(r*)} △ P| ] ≤ 2C_a (r* ln n)/n + A_1 (d r* ln n)/n ≤ C_{11} (r* ln n)/n,  (23)

by Corollary 1, since Q̂_n^{(r*)} = P̂_n^{(r*)}; here C_{11} depends only on d and σ. The second term of (22) is bounded as follows. Since |P̂_n^{adapt} △ P| ≤ 1 and since r̂ > r* means that the event A_{r*} does not hold,

E_P[ |P̂_n^{adapt} △ P| I(r̂ > r*) ] ≤ Σ_{r=r*}^{R_n} P_P( |Q̂_n^{(r*)} △ Q̂_n^{(r)}| > 2C_a (r ln n)/n ) ≤ Σ_{r=r*}^{R_n} [ P_P( |Q̂_n^{(r*)} △ P| > C_a (r ln n)/n ) + P_P( |Q̂_n^{(r)} △ P| > C_a (r ln n)/n ) ] ≤ Σ_{r=r*}^{R_n−1} [ P_P( |P̂_n^{(r*)} △ P| > C_a (r* ln n)/n ) + P_P( |P̂_n^{(r)} △ P| > C_a (r ln n)/n ) ] + P_P( |P̂_n^{(r*)} △ P| > C_a (r* ln n)/n ) + P_P( |G̃_n △ P| > C_a (R_n ln n)/n ).  (24)

Note that since P ∈ P_{r*}, it is also true that P ∈ P_r for all r ≥ r*. Therefore, if r* ≤ r ≤ R_n − 1, we have, using Theorem 1 with x = (C_a − 2d/C_2) r ln n ≥ r ln n / C_2,

P_P( |P̂_n^{(r)} △ P| > C_a (r ln n)/n ) ≤ C_1 e^{−r ln n} ≤ C_1 n^{−(d+1)}.
In addition, by Theorem 4, with x = (C_a − C_0) R_n ln n ≥ R_n ln n / C_2,

P_P( |G̃_n △ P| > C_a (R_n ln n)/n ) ≤ τ_1 e^{−R_n ln n} ≤ τ_1 n^{−(d+1)}.

It follows from (24) and the last two displays that

E_P[ |P̂_n^{adapt} △ P| I(r̂ > r*) ] ≤ ( (2(R_n − r*) + 1) C_1 + τ_1 ) n^{−(d+1)},  (25)

a quantity of smaller order than (r* ln n)/n.
Finally, using (23) and (25), we get

sup_{P ∈ P_{r*}} E_P[ |P̂_n^{adapt} △ P| ] ≤ C_{12} (r* ln n)/n,

where C_{12} is a positive constant that depends on d and σ only. Let us now assume that r* is a given integer larger than or equal to R_n, possibly infinite, and that P ∈ P_{r*}; as in Theorem 6, if r* = ∞, we denote by P_∞ the class C_d. First of all, note that, since by definition r̂ ≤ R_n and the event A_{r̂} holds,

|Q̂_n^{(R_n)} △ Q̂_n^{(r̂)}| ≤ 2C_a (R_n ln n)/n ≤ 2C_a n^{−2/(d+1)}

with probability one. Then, by the triangle inequality,

E_P[ |P̂_n^{adapt} △ P| ] ≤ E_P[ |Q̂_n^{(r̂)} △ Q̂_n^{(R_n)}| ] + E_P[ |Q̂_n^{(R_n)} △ P| ] ≤ 2C_a n^{−2/(d+1)} + A″_1 n^{−2/(d+1)},

by Corollary 3, since P ∈ P_{r*} ⊆ P_∞ and Q̂_n^{(R_n)} is the estimator of Theorem 4. Theorem 6 is then proven.

Appendix: Proof of the lemmas
Proof of Lemma 1. Let us first recall the Steiner formula in the case of convex polytopes; it can also be found in [3]. If R ⊆ R^d and λ > 0, we denote by R^λ the set of all x ∈ R^d whose Euclidean distance to R is less than or equal to λ:

R^λ = { x ∈ R^d : ρ(x, R) ≤ λ } = R + λ B_d(0, 1).

For a convex polytope R, the Steiner formula states that there exist nonnegative coefficients L_0(R), …, L_d(R), with L_0(R) = |R|, such that

|R^λ| = Σ_{i=0}^d L_i(R) λ^i, for all λ > 0.  (26)

Note that if R is included in B_d(a, u) for some a ∈ R^d and u > 0, then, for all positive λ, R^λ ⊆ B_d(a, u)^λ = B_d(a, u + λ), and, denoting β_d = |B_d(0, 1)|,

|R^λ| ≤ β_d (u + λ)^d.

Therefore, since all the L_i(R) are nonnegative, one gets L_i(R) ≤ (u + 1)^d β_d, i = 1, …, d, by taking λ = 1 in (26).
Acknowledgements

… useful comments and the interesting discussions about set estimation and related topics. I would also like to thank the referees for indicating the useful reference [4], which I had not found myself before submitting this article, and the Caesarea Rothschild Institute at the University of Haifa for hosting me when I began working on this article.