LIMIT THEOREMS FOR MULTI-DIMENSIONAL RANDOM QUANTIZERS

We consider the $r$th power quantization error arising in the optimal approximation of a $d$-dimensional probability measure $P$ by a discrete measure supported by the realization of $n$ i.i.d. random variables $X_1, \ldots, X_n$. For all $d \ge 1$ and $r \in (0, \infty)$ we establish mean and variance asymptotics as well as central limit theorems for the $r$th power quantization error. Limiting means and variances are expressed in terms of the densities of $P$ and $X_1$. Similar convergence results hold for the random point measures arising by placing at each $X_i$, $1 \le i \le n$, a mass equal to the local distortion.


Introduction and main results
Quantization for probability measures is a classic partitioning problem arising in information theory, cluster analysis, and mathematical models in economics [7]. It concerns the best approximation of a $d$-dimensional probability measure $P$ by a discrete measure supported by a set having $n$ atoms. An $n$-point set $\mathcal{X} \subset \mathbb{R}^d$ partitions $\mathbb{R}^d$ into Voronoi cells $C(x, \mathcal{X})$, $x \in \mathcal{X}$, and a $d$-dimensional random vector $U$ with distribution $P$ is quantized by mapping its realization, here denoted by $u$, to the point $x \in \mathcal{X}$ whose Voronoi cell $C(x, \mathcal{X})$ contains $u$. Given $r \in (0, \infty)$, the goal is to select a set $\mathcal{X}$ of 'codes' or 'Voronoi quantizers' in a way that minimizes the $r$th power quantization error ('distortion error') given by
$$I_r(\mathcal{X}) := \int_{\mathbb{R}^d} \min_{x \in \mathcal{X}} |u - x|^r \, P(du). \tag{1.1}$$
Letting the quantizing set be $\mathcal{X}_n := \{X_1, \ldots, X_n\}$, where the $X_i$ are i.i.d. with density $\kappa$ on $\mathbb{R}^d$, gives rise to a distortion error $I_r(\mathcal{X}_n)$. The mean asymptotics of $I_r(\mathcal{X}_n)$ were first investigated by Zador [14,15], Gersho [6], and later by Graf and Luschgy [7] and Cohort [5]. If $\kappa$ is strictly positive on $\mathbb{R}^d$, if $(n^{r/d} \min_{1 \le i \le n} |U - X_i|^r)_{n \ge 1}$ are uniformly integrable for a fixed $r \in (0, \infty)$, and if $\omega_d := \pi^{d/2} [\Gamma(1 + d/2)]^{-1}$ denotes the volume of the unit radius $d$-dimensional ball, then the mean distortion satisfies (Theorem 9.1 of [7])
$$\lim_{n \to \infty} n^{r/d} \, \mathbb{E}[I_r(\mathcal{X}_n)] = \omega_d^{-r/d} \, \Gamma(1 + r/d) \int_{\mathbb{R}^d} h(x) \, \kappa(x)^{-r/d} \, dx, \tag{1.2}$$
where $h$ denotes the density of $P$.
One might expect that $I_r(\mathcal{X}_n) \sim \mathbb{E}[I_r(\mathcal{X}_n)] + \text{error}$, where after suitable normalization the error tends to a Gaussian as $n \to \infty$. Theorem 1.1, the main result of this note, confirms this, providing variance asymptotics and central limit theorems which quantify the deviation of the random distortion from its mean. The results, obtained via stabilization techniques for point processes, hold for all $d \ge 1$ and capture the second order dependency of the distortion on $\kappa$ and $h$.
We also investigate the limit theory of the point measures induced by the distortion, namely the random measures
$$\nu_n := \nu_{n,r} := \sum_{i=1}^{n} \int_{C(X_i, \mathcal{X}_n)} |u - X_i|^r \, P(du) \, \delta_{X_i}, \tag{1.3}$$
where $\delta_x$ signifies a unit point mass at $x$. For any $A \subset \mathbb{R}^d$, let $B(A)$ denote the bounded functions on $A$.
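The mean asymptotics can be checked numerically in a toy case. The sketch below is an illustration only, under assumptions that are not part of the text: $d = 1$, both $P$ and the codes uniform on $[0, 1]$ (so $h = \kappa = 1$ there), and the limit in (1.2) taken in its standard form $\omega_d^{-r/d} \Gamma(1 + r/d) \int h \kappa^{-r/d}$ from Theorem 9.1 of [7]. The function names are hypothetical. In one dimension the distortion of a finite code set can be integrated exactly, so only the codes are random.

```python
import math
import random


def distortion_1d(points, r):
    """Exact I_r for codes `points` in [0, 1] when P is uniform on [0, 1]."""
    xs = sorted(points)
    # End cells: every u left of the first code (right of the last) maps to it.
    total = (xs[0] ** (r + 1) + (1.0 - xs[-1]) ** (r + 1)) / (r + 1)
    # An interior gap of length g is split at its midpoint between two codes,
    # so it contributes 2 * int_0^{g/2} t^r dt.
    for left, right in zip(xs, xs[1:]):
        half = (right - left) / 2.0
        total += 2.0 * half ** (r + 1) / (r + 1)
    return total


def zador_constant(d, r):
    """omega_d^{-r/d} * Gamma(1 + r/d): assumed limit when h = kappa = 1."""
    omega_d = math.pi ** (d / 2) / math.gamma(1 + d / 2)
    return omega_d ** (-r / d) * math.gamma(1 + r / d)


def mean_scaled_distortion(n, r, trials, seed=0):
    """Monte Carlo average of n^{r/d} I_r(X_n) with d = 1 and uniform codes."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        codes = [rng.random() for _ in range(n)]
        acc += n ** r * distortion_1d(codes, r)
    return acc / trials
```

Calling `mean_scaled_distortion(500, 2, 200)` and comparing with `zador_constant(1, 2)` gives a rough sanity check; the two should agree only up to Monte Carlo noise and boundary effects of order $1/n$.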
We seek the asymptotic behavior of the integrals $n^{r/d} \langle f, \nu_n \rangle$, where for any measure $\rho$ on $\mathbb{R}^d$ and $f \in B(\mathbb{R}^d)$, $\langle f, \rho \rangle$ denotes the integral of $f$ with respect to $\rho$. Clearly, when $f \equiv 1$ we have $\langle f, \nu_{n,r} \rangle = I_r(\mathcal{X}_n)$ whereas, for example, if $f = \mathbf{1}_B$ and $B$ is a Borel set, then $\langle f, \nu_n \rangle$ measures the random local distortion on $B$.
We require a bit more notation. Let $\mathcal{P}$ denote a homogeneous rate one Poisson point process on $\mathbb{R}^d$ and let $\mathbf{0}$ be the point at the origin of $\mathbb{R}^d$. Given a point set $\mathcal{X}$ and $x \notin \mathcal{X}$ we write $C(x, \mathcal{X})$ for $C(x, \mathcal{X} \cup \{x\})$. For all $r \in (0, \infty)$ let $M(r) := \int_{C(\mathbf{0}, \mathcal{P})} |w|^r \, dw$. Put for all $r \in (0, \infty)$ (1.5)
Assume henceforth that $h$ is a bounded Lebesgue almost everywhere continuous function with compact convex support $A \subset \mathbb{R}^d$ and that $\kappa$ is a Lebesgue almost everywhere continuous probability density function which is bounded away from zero on its support, which is also assumed to be $A$. The next result quantifies the asymptotic $L^2$ deviation of the distortion $I_r(\mathcal{X}_n)$ about its mean and it also gives a distributional result for the centered distortion $I_r(\mathcal{X}_n) - \mathbb{E}[I_r(\mathcal{X}_n)]$. For any random point measure $\rho$, let $\bar{\rho}$ denote its centered version, that is $\bar{\rho} := \rho - \mathbb{E}[\rho]$. For all $r \in (0, \infty)$ and $f \in B(A)$ we put
Let $N(0, \sigma^2)$ denote a mean zero normal random variable with variance $\sigma^2$ and let $\xrightarrow{\mathcal{D}}$ denote convergence in distribution.
whereas as $n \to \infty$
Remarks.
(i) Related work. When $d = 1$ and $f \equiv 1$, Cohort [5] employs spacing techniques to establish the asymptotic normality of $n^{1/2 + r/d} \langle f, \bar{\nu}_n \rangle$ for continuously differentiable $\kappa$ and $h$. He does not consider the convergence of $n^{1/2 + r/d} \langle f, \bar{\nu}_n \rangle$ for arbitrary $d \in \mathbb{N}$ and general $f \in B(A)$.
(ii) Convergence of finite-dimensional distributions. As in [2,8], an application of the Cramér-Wold device shows that the finite-dimensional distributions of $n^{1/2 + r/d} \bar{\nu}_n$ converge as $n \to \infty$ to those of a mean zero Gaussian field with covariance kernel
(iii) Poisson central limit theorem. For all $\lambda > 0$ let $\mathcal{P}_\lambda := \mathcal{P}_{\lambda\kappa}$ be a Poisson point process with intensity $\lambda\kappa$ on $\mathbb{R}^d$, and consider the random point measure
Then for all $f \in B(A)$ and $r \in (0, \infty)$ we have
The analog of (1.8) holds with $\Delta(r) \equiv 0$, and hence, taking $f_1, \ldots, f_k$ to be indicators over disjoint sets, the resulting vector of local distortions converges to a $k$-variate Gaussian with independent components.
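The variance scaling behind the central limit theorem can be probed empirically. The following sketch is a rough numerical illustration under assumptions of our own ($d = 1$, $f \equiv 1$, and $h = \kappa = 1$ on $[0, 1]$); it only checks that the sample variance of $I_r(\mathcal{X}_n)$, multiplied by $n^{1 + 2r/d}$, is of the same order for two values of $n$, and it makes no claim about the value of the limit. The function names are hypothetical.

```python
import random
import statistics


def distortion_1d(points, r):
    """Exact I_r for codes `points` in [0, 1] when P is uniform on [0, 1]."""
    xs = sorted(points)
    total = (xs[0] ** (r + 1) + (1.0 - xs[-1]) ** (r + 1)) / (r + 1)
    for left, right in zip(xs, xs[1:]):
        half = (right - left) / 2.0
        total += 2.0 * half ** (r + 1) / (r + 1)
    return total


def scaled_variance(n, r, trials, seed=0):
    """Sample variance of I_r(X_n), multiplied by n^{1 + 2r/d} with d = 1."""
    rng = random.Random(seed)
    values = [
        distortion_1d([rng.random() for _ in range(n)], r)
        for _ in range(trials)
    ]
    return n ** (1 + 2 * r) * statistics.variance(values)
```

For instance, `scaled_variance(200, 1, 400)` and `scaled_variance(800, 1, 400)` should be comparable in size if the normalization $n^{1 + 2r/d}$ is the right one; a histogram of the centered values would be the natural next diagnostic for asymptotic normality.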
As a by-product of the approach taken to prove Theorem 1.1 we obtain the following weak law of large numbers for $\langle f, \nu_n \rangle$, $f \in B(A)$.

Remarks.
(i) Related work. Assuming regularity of $\kappa$ and $h$, Cohort [5] establishes the $L^p$ and a.s. convergence of $n^{r/d} I_r(\mathcal{X}_n)$, but does not consider the quantities , $X$ an independent copy of $X_1$. If $\kappa$ is not bounded away from zero on $A$, then straightforward modifications of the proof of Theorem
(iv) $L^\infty$ distortion error. Theorem 1.2 may be extended to treat the $L^\infty$ distortion error given by
with the random measures $\nu_{n,\infty}$ similarly defined by
Under the stated assumptions on $h$ and $\kappa$ we have for all $f \in B(A)$

Lemmas
The random summands in (1.3) share neither the same representation nor the same scaling properties as those considered in previous general work on stabilizing functionals [2,8], but we may nonetheless express $\langle f, \nu_n \rangle$ as a sum of terms having an exponentially stabilizing spatial dependency structure. In other words, we may show that the behavior of the summands in (1.3) depends on the surrounding environment only within a certain finite but random distance, and, when $\kappa$ is bounded away from zero, one having an exponentially decaying tail (Lemma 2.2). Sums of random terms based on nearest neighbor distances, such as those figuring in general spacings statistics [1] and nearest neighbor graphs [10], enjoy similar properties. Local dependencies of the random distortion may be exploited to rigorously demonstrate the intuitive observation that the local behavior of $\lambda^{1/d} \mu_\lambda$ at $\lambda^{1/d} x$ is closely approximated by that of a homogeneous Poisson point process with intensity $\kappa(x)$ (Lemma 2.4). Similarly, stabilization techniques [2,8] may be used to establish convergence of the pair correlation function (Lemma 2.5), the key to establishing the variance asymptotics and central limit theory of random distortions. We refer to [13] for an accessible survey on stabilizing functionals.
This section collects some lemmas. Throughout we continue to assume that $h$ and $\kappa$ satisfy the assumptions set forth in section 1. For ease of notation we suppress dependency on $r$ of $\langle f, \nu_n \rangle$ and related quantities. For all locally finite $\mathcal{X} \subset \mathbb{R}^d$ and $x \in \mathcal{X}$ the distortion $I_r(\mathcal{X})$ is a sum of terms of the form
We will make frequent use of the representation
where we recall that $A$ is the common support of $h$ and $\kappa$. Abusing notation we henceforth assume for all $\lambda \ge 1$ that
Letting $\mathrm{Po}(\alpha)$ denote an independent Poisson random variable with parameter $\alpha$, Fubini's theorem and a change of variable give

Making the substitution
as desired.
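As a consistency check on the constants, the mean of $M(r) = \int_{C(\mathbf{0}, \mathcal{P})} |w|^r \, dw$ from section 1 can be computed directly. The derivation below is a sketch for the homogeneous rate one process $\mathcal{P}$ only, and it is not claimed to be the computation used in the proof above. It relies on the fact that $w$ lies in the Voronoi cell of the origin exactly when the ball of radius $|w|$ centered at $w$ contains no point of $\mathcal{P}$, and on the surface area $d\,\omega_d$ of the unit sphere.

```latex
\begin{align*}
\mathbb{E}[M(r)]
  &= \int_{\mathbb{R}^d} |w|^r \, \mathbb{P}[w \in C(\mathbf{0}, \mathcal{P})] \, dw
     && \text{(Fubini)} \\
  &= \int_{\mathbb{R}^d} |w|^r \exp(-\omega_d |w|^d) \, dw
     && \text{(no point of $\mathcal{P}$ in $B_{|w|}(w)$)} \\
  &= d\,\omega_d \int_0^\infty \rho^{r+d-1} \exp(-\omega_d \rho^d) \, d\rho
     && \text{(polar coordinates)} \\
  &= \omega_d^{-r/d} \int_0^\infty t^{r/d} e^{-t} \, dt
     && (t = \omega_d \rho^d) \\
  &= \omega_d^{-r/d} \, \Gamma(1 + r/d).
\end{align*}
```

For $d = 1$ and $r = 2$ this gives $2^{-2} \, \Gamma(3) = 1/2$, the constant familiar from Zador-type mean asymptotics for random quantizers.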
Next, we consider the spatial dependencies of the terms $\Phi_\lambda(x, \mathcal{P}_\lambda)$ contributing to the distortion arising from $\mathcal{P}_\lambda$. For all $x \in \mathbb{R}^d$ and $r > 0$ we let $B_r(x)$ denote the Euclidean ball with radius $r$ centered at $x$. For all $x \in \mathbb{R}^d$ and all locally finite . With this definition, $R_\lambda(x, \mathcal{P}_\lambda)$ is a 'radius of stabilization', or a range of spatial dependency, for $\Phi_\lambda$ at the point $x$ with respect to $\mathcal{P}_\lambda$, that is to say $\Phi_\lambda(x, \mathcal{P}_\lambda)$ does not depend on points in $\lambda^{1/d} \mathcal{P}_\lambda$ distant more than $R_\lambda(x, \mathcal{P}_\lambda)$ from $\lambda^{1/d} x$. See [2,8,13] for further discussion of stabilizing functionals. We refer to section 6.3 of [8] for a proof of the following central result, showing that spatial dependencies uniformly fall off exponentially fast, that is to say the radius of stabilization of $\Phi_\lambda$ has an exponentially decaying tail and thus $\Phi_\lambda$ is an exponentially stabilizing functional. This stabilization property is a consequence of the assumption that $\kappa$ is bounded away from zero on its support $A$. Let $\mathcal{S}_3$ denote the class of all finite subsets of $A$ having at most three elements.
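The idea of a radius of stabilization is easiest to see on the real line, where the Voronoi cell of a point is fixed by its nearest neighbor on each side. The sketch below is a one-dimensional illustration under our own simplified definition (the general definition of $R_\lambda$ is not reproduced here), with hypothetical function names: points farther from $x$ than the returned radius can be added or deleted without changing the cell of $x$.

```python
def voronoi_cell_1d(x, others):
    """Voronoi cell of x within others + [x] on the real line, as (lo, hi)."""
    left = [p for p in others if p < x]
    right = [p for p in others if p > x]
    lo = (x + max(left)) / 2 if left else float("-inf")
    hi = (x + min(right)) / 2 if right else float("inf")
    return lo, hi


def stabilization_radius_1d(x, others):
    """A radius beyond which points cannot affect the cell of x (d = 1)."""
    left = [x - p for p in others if p < x]
    right = [p - x for p in others if p > x]
    if not left or not right:
        # The cell is unbounded on one side, so a far point can still cut it.
        return float("inf")
    # The cell only depends on the nearest neighbor on each side.
    return max(min(left), min(right))
```

For a homogeneous Poisson process of intensity $\kappa$ on the line, the two nearest-neighbor distances are independent exponentials with rate $\kappa$, so this radius exceeds $t$ with probability at most $2e^{-\kappa t}$, a simple instance of an exponentially decaying tail.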

Lemma 2.2. It is the case that
The next lemma establishes moment conditions useful in proving central limit theorems.

Lemma 2.3. It is the case that
Proof. We will show (2.5) first. For all $x \in \mathcal{P}_\lambda$ and all $\lambda \ge 1$, put $D(x, \lambda) := \operatorname{diam}(C(\lambda^{1/d} x, \lambda^{1/d} \mathcal{P}_\lambda))$. Then
Thus to show (2.5) it suffices to show $\sup_{\lambda \ge 1, x \in A} \mathbb{E}[D(x, \lambda)^{4(r+d)}] < \infty$. However, this is a consequence of the exponential decay of $D(x, \lambda)$ uniformly in $x$ and $\lambda$, which follows from the assumption that $\kappa$ is bounded away from zero (see section 6.3 of [8]). The bound (2.6) follows similarly because the diameter of $C(\lambda^{1/d} x, \ldots)$ is bounded by $D(x, \lambda)$. Similar arguments involving exponential decay of Voronoi cell diameters on binomial point sets yield (2.7).
The next convergence lemma is the analog of Lemma 3.5 of [8] and is useful in establishing convergence of one-point correlation functions (cf. section 4.1 of [2]). This lemma shows that the contribution to the distortion for scaled quantizing cells arising from a non-homogeneous Poisson point process of increasing intensity converges to the contribution to the distortion arising from quantizing cells from a homogeneous Poisson point process on all of $\mathbb{R}^d$. This intuitive idea appears on p. 376 of [6], where it forms the basis for heuristic derivations. Recall that almost every $x \in \mathbb{R}^d$ is a Lebesgue point of $\kappa$, that is to say for almost all $x \in \mathbb{R}^d$ we have that $\varepsilon^{-d} \int_{B_\varepsilon(x)} |\kappa(y) - \kappa(x)| \, dy$ tends to zero as $\varepsilon$ tends to zero.

Lemma 2.4. Suppose x ∈ A is both a Lebesgue point of κ and a continuity point for h. Then for all
(2.8)
Proof. Fix $x \in A$ and $r \in (0, \infty)$. By definition of $\Phi_\lambda$ at (2.2) we have for all $z \in \mathbb{R}^d$ and all $\lambda \ge 1$
which, after substitution, becomes
We shorthand $C(\mathbf{0}, \lambda^{1/d}(\mathcal{P}_\lambda - x) - z)$ by $C_\lambda$ and $C(\mathbf{0}, \mathcal{P}_{\kappa(x)})$ by $C$. By Slutsky's theorem and the definition of $\xi_\infty$, it is enough to show that as $\lambda \to \infty$
For all $\varepsilon > 0$ the Cauchy-Schwarz inequality and the exponential decay of $D_\lambda$ uniformly in $\lambda$ show that the second integral in (2.11) can be made less than $\varepsilon$ provided that $L$ is large enough. On the other hand, $L^r |h(\lambda^{-1/d}(w + z) + x) - h(x)|$ is bounded by $2 L^r \|h\|_\infty$ on $\Omega \times B_L$ and moreover, by the continuity of $h$ at $x$, we have that $L^r |h(\lambda^{-1/d}(w + z) + x) - h(x)|$ goes to zero as $\lambda \to \infty$. Thus by the bounded convergence theorem the first integral in (2.11) goes to zero as $\lambda \to \infty$. Letting $\varepsilon$ tend to zero completes the proof of (2.9).
To show (2.10) we argue as follows [8,9]. Let $\mathcal{L}$ be the space of locally finite point sets in $\mathbb{R}^d$ equipped with the metric $D(\mathcal{X}, \mathcal{X}') := (\max\{K \in \mathbb{N} : \mathcal{X} \cap B_K = \mathcal{X}' \cap B_K\})^{-1}$. As in the proof of Lemma 2.2, $\xi_\infty$ is stabilizing with respect to $\mathcal{P}_{\kappa(x)}$ and moreover the radius of stabilization, say $R$, of $\xi_\infty$ with respect to $\mathcal{P}_{\kappa(x)}$ at a point at the origin, is finite almost surely. By radius of stabilization $R$, we mean the infimum of all $t > 0$ with the property that $C(\mathbf{0}, \mathcal{P}_{\kappa(x)} \cap B_t(\mathbf{0}))$ coincides with $C(\mathbf{0}, \mathcal{P}_{\kappa(x)} \cup \mathcal{A})$, where $\mathcal{A}$ is any finite point set in $B_t^c(\mathbf{0})$. Also, when $\varepsilon < R^{-1}$, then the inequality $D(\mathcal{P}_{\kappa(x)}, \mathcal{X}) < \varepsilon$ implies $\int_{C(\mathbf{0}, \mathcal{P}_{\kappa(x)})} |w|^r \, dw = \int_{C(\mathbf{0}, \mathcal{X})} |w|^r \, dw$, that is to say ) is a continuity point for $\xi_\infty$ where the topology on $\mathbb{R}^d \times \mathcal{L}$ is the product of the Euclidean topology on $\mathbb{R}^d$ and the topology induced by the metric
Apply Campbell's theorem to find the variance of $\lambda^{r/d} \langle f, \mu_\lambda \rangle$ and then multiply by $\lambda$ to obtain
We next show
We show (3.2) by appealing to the lemmas from section two and to arguments similar to those in section four of [1]. This goes as follows. Putting $y = x + \lambda^{-1/d} z$ in the right-hand side in (3.1) reduces the double integral to
is the pair correlation function for $\Phi_\lambda$. By Lemmas 2.4 and 2.5, it follows for almost all $x \in A$ and all $z \in \mathbb{R}^d$ that the pair correlation function for $\Phi_\lambda$ converges to the pair correlation function for $\xi_\infty$, i.e., the bracketed expression in (3.2) [8]. To complete the proof of (3.2) we only need to show convergence of
This is a simple consequence of the convergence (2.8), the moment bounds (2.5), and dominated convergence.
Having established the variance limit (3.2), we now show that (3.2) reduces to (1.10). For all $x \in A$ we define $V(x, 0) := 0$ and for all $a > 0$ we put
Recalling the definition of $V(r)$ at (1.4), this yields $V(x, a) = h^2(x) \, a^{-2 - 2r/d} \, V(r)$. Thus (3.2) reduces to $\sigma^2(r, f)$, showing (1.10) as desired. Now to prove (1.6) we use de-Poissonization arguments [2,8].
For all $x \in A$ and $a > 0$, define
Then as in [2,8] and using the analog of Lemma 3.6 of [8], we have (1.5). This, combined with (3.5), shows (1.6).
To prove asymptotic normality (1.7) we proceed as follows. First, we establish the analogous central limit theorem for Poisson input, namely the asymptotic normality of $\lambda^{1/2 + r/d} \langle f, \bar{\mu}_\lambda \rangle$, by combining Lemmas 2.2-2.4 with the dependency graph arguments of [8,12]. This approach has been previously used in [1] and we will not repeat it here. Having established the central limit theorem for Poisson input, we may de-Poissonize to obtain (1.7). The arguments used to prove Theorem 2.3 of [8] may be followed verbatim, where in particular we note that condition A4′ of [8] is satisfied by the moment bounds of Lemma 2.3 and the stabilization condition (2.4).
Proof of Theorem 1.2. As with the proof of asymptotic normality, the proof of (1.11) rests on stabilization techniques. We only sketch the main ideas, referring to [9,11] for details. By Campbell's theorem we have
(3.6)
The random variables $\Phi_\lambda(x, \mathcal{P}_\lambda)$, $\lambda \ge 1$, are uniformly integrable by Lemma 2.3 and thus by Lemma 2.4 with $z = \mathbf{0}$ we have for all $x \in \mathbb{R}^d$ the convergence of means $\mathbb{E}[\Phi_\lambda(x, \mathcal{P}_\lambda)] \to \mathbb{E}[\xi_\infty(x, \mathcal{P}_{\kappa(x)})]$ as $\lambda \to \infty$. By Lemma 2.3 and the dominated convergence theorem the integral on the right-hand side of (3.6) satisfies
which, by Lemma 2.1, establishes a mean version of the advertised limit (1.11) for Poisson input. By coupling the binomial point process $\mathcal{X}_n$ with the Poisson process $\mathcal{P}_\lambda$ as in [11] we may further show
To show convergence in $L^2$, we may appeal to the methods of Penrose [9], which rest on establishing the analogs of Lemmas 2.4 and 2.5 when $\mathcal{P}_\lambda$ is replaced by a binomial point process $\mathcal{X}_n$. The proofs of these lemmas are straightforward but tedious modifications of the existing proofs and so we omit the details.
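The coupling of binomial and Poisson input mentioned in the proof can be sketched concretely. The construction below is one standard way to do it and is not necessarily the exact coupling of [11]: both point sets are read off the same i.i.d. stream, the binomial set taking the first $n$ points and the Poisson set the first $N$ points, with $N$ an independent Poisson variable of mean $n$. The uniform density on $[0, 1]$ and all function names are assumptions made for illustration.

```python
import random


def poisson_sample(lam, rng):
    """Po(lam) as the number of unit-rate exponential arrivals in [0, lam]."""
    count, clock = 0, rng.expovariate(1.0)
    while clock <= lam:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def coupled_point_sets(n, seed=0):
    """Binomial set X_n and Poisson set P_n built from one i.i.d. stream."""
    rng = random.Random(seed)
    big_n = poisson_sample(n, rng)          # N ~ Po(n), drawn before the points
    stream = [rng.random() for _ in range(max(n, big_n))]
    binomial = stream[:n]                   # X_1, ..., X_n
    poisson = stream[:big_n]                # X_1, ..., X_N
    return binomial, poisson
```

The two sets differ in exactly $|N - n|$ points, which is typically of order $n^{1/2}$; combined with exponential stabilization, this is what makes it plausible that the distortions of the two inputs stay close after scaling.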