On the weak convergence of the kernel density estimator in the uniform topology

The pointwise asymptotic properties of the Parzen-Rosenblatt kernel estimator $\hat f_n$ of a probability density function $f$ on $\mathbb{R}^d$ have received great attention, and so have its integrated or uniform errors. It has been pointed out in a couple of recent works that the weak convergence of its centered and rescaled versions in a weighted Lebesgue $L^p$ space, $1 \le p < \infty$, considered to be a difficult problem, is in fact essentially uninteresting, in the sense that the only possible Borel measurable weak limit is 0 under very mild conditions. This paper examines the weak convergence of such processes in the uniform topology. Specifically, we show that if $\bar f_n(x) = \mathbb{E}(\hat f_n(x))$ and $(r_n)$ is any nonrandom sequence of positive real numbers such that $r_n/\sqrt{n} \to 0$ then, with probability 1, the sample paths of any tight Borel measurable weak limit in an $\ell^\infty$ space on $\mathbb{R}^d$ of the process $r_n(\hat f_n - \bar f_n)$ must be almost everywhere zero. The particular case when the estimator $\hat f_n$ has continuous sample paths is then considered, and simple conditions making it possible to examine the actual existence of a weak limit in this framework are provided.


Introduction
The Parzen-Rosenblatt estimator of a probability density function $f$ on $\mathbb{R}^d$, $d \ge 1$ (Parzen [21], Rosenblatt [24]) is defined as follows:
$$\hat f_n(x) = \frac{1}{n}\sum_{i=1}^n K_h(x - X_i), \quad x \in \mathbb{R}^d.$$
Here, $(X_n)$ is a sequence of independent random copies of a random variable $X$ such that $X$ has a (Borel measurable) probability density function $f$. In particular, we assume that the $X_n$, $n \ge 1$, are defined on a common probability space and induce Borel measurable maps. The parameter $h = h(n) \to 0$ as $n \to \infty$ is called the bandwidth, and we let $K_h(u) = h^{-d}K(u/h)$ for a kernel function $K: \mathbb{R}^d \to \mathbb{R}$, that is, a Borel measurable and integrable function on $\mathbb{R}^d$ with unit integral. The estimator $\hat f_n$ is essentially a (possibly modified) version of the histogram whose smoothness is tuned by $h$ and potentially enhanced by the regularity of the kernel function $K$. The random function $x \mapsto \hat f_n(x)$ is the empirical counterpart of the function
$$\bar f_n(x) = \mathbb{E}(\hat f_n(x)) = \int_{\mathbb{R}^d} K_h(x - y) f(y)\,dy.$$
Let $S$ be a subset of $\mathbb{R}^d$, let $\ell^\infty(S)$ be the space of bounded real-valued functions on $S$ endowed with the uniform norm $\|\cdot\|_{\infty,S}$, and let $(r_n)$ be a nonrandom sequence of positive real numbers. Assuming that $K$ is bounded on $\mathbb{R}^d$, it is straightforward that the random function $x \mapsto r_n(\hat f_n(x) - \bar f_n(x))$, $x \in S$, defines a random process belonging to $\ell^\infty(S)$. Clearly, for this random process to converge weakly in $\ell^\infty(S)$, its uniform norm $r_n\|\hat f_n - \bar f_n\|_{\infty,S}$ has to converge weakly in $\mathbb{R}$, but this is not a sufficient condition. The uniform norm of $r_n(\hat f_n - \bar f_n)$ on $S$ has been studied in many instances in the literature; see for example the early works of Bickel and Rosenblatt [1], Silverman [26] and Stute [28,29]. Talagrand's inequalities [30,31] and general distributional results on empirical processes (see the monographs by van der Vaart and Wellner [34] and van der Vaart [33]) then sparked renewed interest in this problem; see e.g. Einmahl and Mason [6], Giné and Guillou [8], Giné et al. [9], Einmahl and Mason [7] and Dony and Einmahl [3].
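As a purely illustrative sketch, the definitions of $\hat f_n$ and $\bar f_n$ can be checked numerically in dimension $d = 1$. The choices below are assumptions made only for this example: a Gaussian kernel $K$, data drawn from the standard normal density $f$, and the bandwidth $h = 0.3$. With these choices, $\bar f_n = K_h * f$ is the $N(0, 1 + h^2)$ density, which the code compares with a direct numerical convolution.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)

def gaussian_kernel(u):
    # Gaussian kernel: bounded, integrable, unit integral
    return np.exp(-0.5 * u**2) / SQRT_2PI

def kde(x, sample, h):
    """hat f_n(x) = (1/n) sum_i K_h(x - X_i) with K_h(u) = K(u / h) / h (d = 1)."""
    u = (np.asarray(x, dtype=float)[:, None] - sample[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h

def kde_mean(x, h):
    """bar f_n(x) = E hat f_n(x) = (K_h * f)(x) for f the N(0, 1) density:
    with a Gaussian kernel this is the N(0, 1 + h^2) density."""
    s2 = 1.0 + h**2
    return np.exp(-0.5 * np.asarray(x, dtype=float)**2 / s2) / np.sqrt(2.0 * np.pi * s2)

rng = np.random.default_rng(0)
n, h = 1000, 0.3                       # illustrative sample size and bandwidth
sample = rng.standard_normal(n)

xs = np.linspace(-12.0, 12.0, 4801)    # wide grid for Riemann sums
dx = xs[1] - xs[0]
mass = kde(xs, sample, h).sum() * dx   # ~ 1, since K has unit integral

# Compare the closed form of bar f_n with a direct numerical convolution at x = 0.5
f_vals = gaussian_kernel(xs)           # N(0, 1) density on the grid
conv = np.sum(gaussian_kernel((0.5 - xs) / h) / h * f_vals) * dx
print(mass, conv, kde_mean(np.array([0.5]), h)[0])
```

The deviation $\hat f_n - \bar f_n$ studied in the paper is then simply `kde(x, sample, h) - kde_mean(x, h)` evaluated on a set of points $x$.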
None of these works, though, considers the convergence of $r_n(\hat f_n - \bar f_n)$ as a random process taking values in an $\ell^\infty$ space on $\mathbb{R}^d$. More broadly, the problem of analyzing the convergence of this process in functional spaces such as $L^p$ spaces on $\mathbb{R}^d$ has long been considered to be difficult. When $K^2$ is integrable on $\mathbb{R}^d$, the recent work of Nishiyama [20] generalized a result of Ruymgaart [25] by disproving the existence of a nondegenerate Borel measurable weak limit for the process $r_n(\hat f_n - \bar f_n)$ in the space $L^2(\mathbb{R}^d)$ of square-integrable functions on $\mathbb{R}^d$, provided $r_n/\sqrt{n} \to 0$. The ideas of his paper paved the way for the work of Stupfler [27], which showed that the same negative conclusion holds in the weighted spaces $L^p(\mathbb{R}^d, \mu)$, $1 \le p < \infty$ […]. Nishiyama [20] and Stupfler [27] use the fact that for $p$ finite, the space $L^p(\mathbb{R}^d, \mu)$ is a separable metric space whose dual space is $L^q(\mathbb{R}^d, \mu)$ for $q = p/(p-1)$. It is well known that the space $\ell^\infty(S)$ fails to be separable in general and that its dual space is more difficult to work with, which causes measurability-related problems for the process $r_n(\hat f_n - \bar f_n)$ and makes it very hard to characterize weak convergence to an arbitrary Borel measurable random element in $\ell^\infty(S)$. This is why we start here by introducing a convenient subspace of the dual space of $\ell^\infty(S)$, and we then use it to identify the possible tight Borel measurable weak limits of the process $r_n(\hat f_n - \bar f_n)$ in the uniform topology. It is shown in what follows that such a limit must be 0 almost everywhere on $S$ with probability 1. Our conclusion about the possible limits of the process $r_n(\hat f_n - \bar f_n)$ appears to differ from what may be obtained when considering other types of convergence, such as the weak convergence of processes constructed using $\hat f_n$ and indexed by classes of functions (see the work by van der Vaart [32] and further developments in e.g. Radulović and Wegkamp [22] and Giné and Nickl [10]), even though these papers also focus on weak convergence to tight (Gaussian) limits. Moreover, we shall highlight that when $K$ is continuous, the process considered has continuous sample paths; by taking advantage of the particular topology of spaces of continuous functions over compact sets, one can then show as a corollary of our results that the limit must be 0 everywhere on $S$, and discard the requirement that the weak limit be tight under a further mild condition on $S$. We finally show how this makes it possible to classify the asymptotic behavior of the process of interest, depending on $(r_n)$, by using the sharp rates of uniform convergence of $\hat f_n$ to $\bar f_n$ obtained in Giné and Guillou [8]. Under a further regularity condition on $f$, it is then straightforward that our results carry over to the process $r_n(\hat f_n - f)$, which is the process of interest in practice, when a classical bias condition is satisfied.

ECP 21 (2016), paper 17.
The outline of the paper is as follows: our main results are stated in Section 2 and some concluding remarks, including on possible extensions of our results, are given in Section 3.

Main results
Throughout, we assume that $S$ is a Borel measurable set in $\mathbb{R}^d$ with positive Lebesgue measure and that $K: \mathbb{R}^d \to \mathbb{R}$ is an integrable function with unit integral which is bounded on $\mathbb{R}^d$. The guiding ideas are those of Nishiyama [20] and Stupfler [27]: our first result relates the problem of identifying the possible weak limits of the process $r_n(\hat f_n - \bar f_n)$ in $\ell^\infty(S)$ to the simpler problem of understanding the weak convergence of sequences of real-valued random variables constructed using this process and a suitably chosen class of continuous linear functionals on $\ell^\infty(S)$. To do so, we start by noting that the Borel measurability of the random function $r_n(\hat f_n - \bar f_n)$ in $\ell^\infty(S)$ is not clear, even though $K$ and the $X_i$ are Borel measurable, because the space $\ell^\infty(S)$ is not separable (see the discussion in Section 1.1 of van der Vaart and Wellner [34]). In this paper, "weak convergence" thus refers to the notion of weak convergence defined through outer expectations (see Definition 1.3.3 p.17 in van der Vaart and Wellner [34]).
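We continue by recalling a few facts about duality in $\ell^\infty(S)$. Since we will investigate the possible tight and Borel measurable weak limits of the process $r_n(\hat f_n - \bar f_n)$, it turns out that we need only work with the space of all bounded and Borel measurable functions on $S$. This space is itself a subspace of $L^\infty(S)$, the space of all Borel measurable functions which are essentially bounded on $S$. Let $\mu_S$ be the measure on $\mathbb{R}^d$ whose Radon-Nikodym density with respect to the Lebesgue measure is $\mathbb{1}_S$, the indicator function of the set $S$. Then $L^\infty(S)$ can naturally be viewed as a subspace of $L^\infty(\mathbb{R}^d, \mu_S)$, the space of the Borel measurable functions on $\mathbb{R}^d$ which are bounded $\mu_S$-almost everywhere. By Theorem 16 p.296 in Dunford and Schwartz [5], the space $\mathrm{ba}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$ of the additive, bounded, signed measures on the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^d)$ which are absolutely continuous with respect to $\mu_S$ is isometrically isomorphic to the dual space of $L^\infty(\mathbb{R}^d, \mu_S)$ and therefore defines a subspace of the dual […], with the topology on $\mathrm{ba}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$ being induced by the total variation distance between measures. Because an element of $\mathrm{ba}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$ may be additive but not countably additive, the whole dual space $\mathrm{ba}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$ is somewhat inconvenient to work with; in particular, the absolute continuity condition with respect to the (countably additive and $\sigma$-finite) measure $\mu_S$ is difficult to take advantage of, because it does not translate into the existence of a Radon-Nikodym derivative with respect to $\mu_S$. This is why we consider instead the subspace $\mathrm{bca}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$ of the countably additive elements of $\mathrm{ba}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$. Using a Hahn-Jordan decomposition, any element $\nu$ of $\mathrm{bca}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$, which is $\sigma$-finite because it is bounded, must have a Radon-Nikodym derivative with respect to $\mu_S$. The particular structure of $\mu_S$ then entails that $\nu$ must have a Radon-Nikodym derivative with respect to the Lebesgue measure as well, having value 0 everywhere outside $S$, and we denote it by $d\nu/dx$. For $\nu \in \mathrm{bca}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$ and $H \in L^\infty(S)$, let
$$T_\nu(H) = \int_S H\,d\nu = \int_S H(x)\,\frac{d\nu}{dx}(x)\,dx.$$
With these elements in mind, the following result can be stated.

Proposition 2.1. Let $G_1$ and $G_2$ be tight Borel measurable random elements of $L^\infty(S)$. Then $G_1$ and $G_2$ have the same distribution if and only if, for any $\nu \in \mathrm{bca}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$ such that $\|d\nu/dx\|_{L^\infty(S)} < \infty$, the real-valued random variables $T_\nu(G_1)$ and $T_\nu(G_2)$ have the same distribution.

Proof. For any $\nu \in \mathrm{bca}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$, the map $T_\nu$ is a continuous linear form on $L^\infty(S)$, so if $G_1$ and $G_2$ have equal distributions, then $T_\nu(G_1)$ and $T_\nu(G_2)$ must have equal distributions as well. Conversely, suppose that for any $\nu \in \mathrm{bca}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$ such that $\|d\nu/dx\|_{L^\infty(S)} < \infty$, the distributions of $T_\nu(G_1)$ and $T_\nu(G_2)$ are equal. We introduce the class $\mathcal{F}$ of functions $F: L^\infty(S) \to \mathbb{R}$ for which there exist a positive integer $J$, a continuous and bounded real-valued function $g$ on $\mathbb{R}^J$ and $\nu_1, \ldots, \nu_J \in \mathrm{bca}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$ having essentially bounded Radon-Nikodym derivatives on $S$ with respect to the Lebesgue measure, such that
$$F(\varphi) = g\big(T_{\nu_1}(\varphi), \ldots, T_{\nu_J}(\varphi)\big), \quad \varphi \in L^\infty(S).$$
Since, for any real numbers $a_1, \ldots, a_J$, we have $\sum_{j=1}^J a_j T_{\nu_j} = T_{\sum_j a_j \nu_j}$, the Cramér-Wold device entails that the random vectors $(T_{\nu_1}(G_1), \ldots, T_{\nu_J}(G_1))$ and $(T_{\nu_1}(G_2), \ldots, T_{\nu_J}(G_2))$ must have the same distribution.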
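If $\rho_1$ and $\rho_2$ are the pushforward probability measures on $L^\infty(S)$ induced by $G_1$ and $G_2$, it becomes clear that $\int F\,d\rho_1 = \int F\,d\rho_2$ for every $F \in \mathcal{F}$. In the sense of van der Vaart and Wellner [34], p.25, the class $\mathcal{F}$ is a vector lattice of continuous bounded functions on $L^\infty(S)$ containing the constant functions. By Lemma 1.3.12 (ii) p.25 in van der Vaart and Wellner [34], it suffices to prove that the class $\mathcal{F}$ separates the points of $L^\infty(S)$. Let then $\varphi, \psi \in L^\infty(S)$ be such that $\varphi \neq \psi$. In other words, the Borel measurable set $\{\varphi \neq \psi\}$ has positive $\mu_S$-measure; since $\mathbb{R}^d$ is a countable union of bounded sets, we may find a bounded Borel measurable set $F$ such that $\mu_S(F) > 0$ and $\varphi \neq \psi$ on $F$. In particular, $\varphi \neq \psi$ on $F \cap S$, which is a Borel measurable set having a positive and finite Lebesgue measure. Letting $\nu$ be the element of $\mathrm{bca}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$ whose Lebesgue density is the essentially bounded function $\operatorname{sign}(\varphi - \psi)\mathbb{1}_{F \cap S}$, we get
$$T_\nu(\varphi) - T_\nu(\psi) = \int_{F \cap S} |\varphi(x) - \psi(x)|\,dx > 0,$$
so that the function $\chi \mapsto \arctan(T_\nu(\chi))$, which belongs to $\mathcal{F}$ (take $J = 1$ and $g = \arctan$), takes different values at $\varphi$ and $\psi$. This is the desired separation property, and the proof is complete.

This result basically makes it possible to identify a tight Borel measurable random element of $\ell^\infty(S)$ up to null sets in $S$. It is the central tool needed to prove our first asymptotic result on the possible tight Borel measurable weak limits of the process $r_n(\hat f_n - \bar f_n)$.

Theorem 2.2. Let $(r_n)$ be a nonrandom sequence of positive real numbers. If $r_n/\sqrt{n} \to 0$ and the random process $r_n(\hat f_n - \bar f_n)$ converges weakly in $\ell^\infty(S)$ to a tight Borel measurable random process $G$, then $G$ is almost surely zero almost everywhere on $S$.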
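Proof. Let $\nu \in \mathrm{bca}(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_S)$ be such that $\|d\nu/dx\|_{L^\infty(S)} < \infty$. The map
$$y \mapsto T_\nu(K_h(\cdot - y)) = \int_S K_h(x - y)\,\frac{d\nu}{dx}(x)\,dx$$
is Borel measurable because $S$ is a Borel measurable set and $K$ and $d\nu/dx$ are Borel measurable as well. As a consequence, $T_\nu(K_h(\cdot - X))$ is a Borel measurable real-valued random variable and, by the continuous mapping theorem (see Theorem 1.3.6 p.20 in van der Vaart and Wellner [34]), the weak convergence of $r_n(\hat f_n - \bar f_n)$ to $G$ implies the following weak convergence of Borel measurable real-valued random variables:
$$\Delta_n(\nu) := T_\nu\big(r_n(\hat f_n - \bar f_n)\big) \longrightarrow T_\nu(G).$$
We start by showing that $T_\nu(G) = 0$ almost surely. Because $\nu$ is countably additive and $\sigma$-finite, Fubini's theorem yields
$$T_\nu(\hat f_n) = \frac{1}{n}\sum_{i=1}^n T_\nu(K_h(\cdot - X_i)) \quad\text{and}\quad T_\nu(\bar f_n) = \mathbb{E}\big(T_\nu(K_h(\cdot - X))\big).$$
We may then rewrite $\Delta_n(\nu)$ as a sum of independent and identically distributed centered random variables, as follows:
$$\Delta_n(\nu) = \frac{r_n}{n}\sum_{i=1}^n \Big[T_\nu(K_h(\cdot - X_i)) - \mathbb{E}\big(T_\nu(K_h(\cdot - X))\big)\Big].$$
A change of variables yields
$$T_\nu(K_h(\cdot - X)) = \int_{\mathbb{R}^d} K(u)\,\frac{d\nu}{dx}(X + hu)\,du$$
almost surely. Because $\|d\nu/dx\|_{L^\infty(S)} < \infty$ and $d\nu/dx$ vanishes outside $S$, we get with probability 1:
$$\big|T_\nu(K_h(\cdot - X))\big| \le \|K\|_{L^1(\mathbb{R}^d)}\,\|d\nu/dx\|_{L^\infty(S)}.$$
In other words, the random variable $T_\nu(K_h(\cdot - X))$ is almost surely bounded. The triangle inequality thus entails
$$\mathbb{E}\big(\Delta_n(\nu)^2\big) = \frac{r_n^2}{n}\operatorname{Var}\big(T_\nu(K_h(\cdot - X))\big) \le \frac{4r_n^2}{n}\,\|K\|_{L^1(\mathbb{R}^d)}^2\,\|d\nu/dx\|_{L^\infty(S)}^2 \longrightarrow 0,$$
since $r_n/\sqrt{n} \to 0$. Consequently, $\Delta_n(\nu) \to 0$ in probability as $n \to \infty$, and $T_\nu(G) = 0$ almost surely. Now, because inclusion preserves tightness (see Lemma 14.4 p.257 in Kallenberg [13]), $G$ also defines a tight Borel measurable random element of $L^\infty(S)$. By Proposition 2.1, $G = 0$ almost surely in $L^\infty(S)$, which means in particular that $G$ is almost surely zero almost everywhere on $S$: the proof is complete.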
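The two ingredients of this proof, the deterministic bound $|T_\nu(K_h(\cdot - y))| \le \|K\|_{L^1}\|d\nu/dx\|_{L^\infty(S)}$ and the decay of $\operatorname{Var}(\Delta_n(\nu)) = (r_n^2/n)\operatorname{Var}(T_\nu(K_h(\cdot - X)))$, where $\Delta_n(\nu) = T_\nu(r_n(\hat f_n - \bar f_n))$ is $r_n/n$ times a sum of $n$ i.i.d. centered terms, can be illustrated numerically. This is a discretized sketch under assumptions of our own choosing, not taken from the paper: $d = 1$, $S = (0, 1)$, the Epanechnikov kernel (so $\|K\|_{L^1} = 1$), $d\nu/dx(x) = \cos(3x)$ on $S$, $X \sim N(0.5, 0.3^2)$, $h = 0.1$ and $r_n = n^{1/4}$; all integrals are replaced by Riemann sums.

```python
import numpy as np

def epanechnikov(u):
    # Bounded, nonnegative kernel with unit integral, hence ||K||_1 = 1
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

h = 0.1
m = 2000
xs = (np.arange(m) + 0.5) / m          # midpoints of a uniform grid on S = (0, 1)
dx = 1.0 / m
g = np.cos(3.0 * xs)                   # assumed density dnu/dx on S (zero outside S)
M = 1.0 * np.abs(g).max()              # ||K||_1 * ||dnu/dx||_inf

def T_nu_kernel(y):
    """Midpoint-rule approximation of T_nu(K_h(. - y)) = int_S K_h(x - y) g(x) dx."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    Kh = epanechnikov((xs[None, :] - y[:, None]) / h) / h
    return Kh @ g * dx

ys = np.linspace(-1.0, 2.0, 1501)
Z = T_nu_kernel(ys)
bound_holds = bool(np.abs(Z).max() <= M)

# Mean and variance of Z = T_nu(K_h(. - X)) for X ~ N(0.5, 0.3^2), by quadrature in y
f = np.exp(-0.5 * ((ys - 0.5) / 0.3) ** 2) / (0.3 * np.sqrt(2.0 * np.pi))
dy = ys[1] - ys[0]
EZ = np.sum(Z * f) * dy
VarZ = np.sum((Z - EZ) ** 2 * f) * dy

results = []
for n in [10**2, 10**4, 10**6]:
    r_n = n ** 0.25                    # a rate with r_n / sqrt(n) -> 0
    var_delta = r_n**2 / n * VarZ      # Var(Delta_n(nu)) for i.i.d. summands
    bound = 4.0 * r_n**2 / n * M**2    # bound obtained from |Z - EZ| <= 2M
    results.append((n, var_delta, bound))
    print(n, var_delta, bound)
```

Both the variance and its bound decay like $r_n^2/n = n^{-1/2}$ here, which is the mechanism forcing $T_\nu(G) = 0$.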
Theorem 2.2 is an analogue of Theorem 2.1 in Nishiyama [20] and Theorem 2.2 in Stupfler [27], which tackled the case of weak convergence in weighted $L^p$ spaces on $\mathbb{R}^d$, $1 \le p < \infty$. This result says that the process $r_n(\hat f_n - \bar f_n)$ either converges weakly to an essentially degenerate limit or does not converge weakly to a tight Borel measurable limit. […] (see van der Vaart and Wellner [34], p.24). As a consequence, tightness of the weak limit does not appear to be a very restrictive requirement in practice, and this condition makes it possible to use, in the proof of Proposition 2.1, the very convenient characterization of the distribution of a random process on a metric space contained in Lemma 1.3.12 (ii) p.25 in van der Vaart and Wellner [34], while getting information about a non-tight distribution appears to be difficult (see (i) in the same lemma). Tightness is consequently a desirable property, encountered in many instances when considering weak convergence in a metric space endowed with a sup-norm; see for example the necessary and sufficient conditions for weak convergence in the space $\ell^\infty(S)$ in Section 18 of van der Vaart [33], and particularly Theorem 18.14 p.261 therein. Other instances where this condition is used include recent works on weak convergence in the uniform topology over a class of functions; see e.g. van der Vaart [32], Radulović and Wegkamp [22], Mendelson and Zinn [17], Nickl [18], Giné and Nickl [10], Nickl [19] and Radulović and Wegkamp [23]. It is remarkable that, although Theorem 2.2 implies that the process $x \mapsto r_n(\hat f_n(x) - \bar f_n(x))$ cannot have a tight Borel measurable weak limit in $\ell^\infty(S)$ other than an essentially trivial one, the centered and rescaled process obtained by integrating $\hat f_n$ against the functions $g$ of a suitable class $\mathcal{G}$ may actually converge in $\ell^\infty(\mathcal{G})$ to a tight Brownian bridge limit; see for instance Radulović and Wegkamp [22] and Giné and Nickl [10].

A case, though, in which we can strengthen our conclusion about the possible limits of $r_n(\hat f_n - \bar f_n)$ is when this process takes its values in the space $C(S)$ of continuous functions on $S$. In this case, if no point in $S$ is isolated from the point of view of the Lebesgue measure, then we should expect the potential weak limit in Theorem 2.2 to be 0 everywhere on $S$ instead of almost everywhere. Meanwhile, the tightness hypothesis about the weak limit, although fairly mild as mentioned above, can for instance be dropped when $S$ is compact, because the space $C(S)$ is then a separable and complete subspace of $\ell^\infty(S)$, making any Borel measurable random element tight in this space. Somewhat surprisingly, the tightness requirement can actually also be dropped in the much more general case when $S$ is $\sigma$-compact. These two observations lead us to introduce our next assumption on $S$:

(H1) $S$ can be written as the union of countably many compact subsets of $\mathbb{R}^d$ and, for every $x \in S$ and $\varepsilon > 0$, the intersection of $S$ and the Euclidean open ball with center $x$ and radius $\varepsilon$ has positive Lebesgue measure.
Combining condition (H1), which holds true in most if not all practical applications (for instance if $S$ is an open cube, the closure of an open set, or $\mathbb{R}^d$ itself), with a continuity assumption on $K$, we get the following result.

Theorem 2.4. Let $(r_n)$ be a nonrandom sequence of positive real numbers. Assume that $S$ satisfies (H1) and that $K$ is a continuous function on $\mathbb{R}^d$. If $r_n/\sqrt{n} \to 0$ and the random process $r_n(\hat f_n - \bar f_n)$ converges weakly in $\ell^\infty(S)$ to a Borel measurable random process $G$, then $G = 0$ almost surely.
Proof. The regularity requirement on $K$ makes it clear that the sample paths of the process $\hat f_n - \bar f_n$ are in fact almost surely continuous on $S$, in the sense that the event $A_n := \{\hat f_n - \bar f_n \text{ is continuous on } S\}$, although not necessarily measurable, contains a measurable set having probability 1; in particular, $A_n$ has outer probability 1 for every $n$. Furthermore, because $\ell^\infty(S)$ and $C(S)$ are complete metric spaces and a uniform limit of continuous functions is continuous, the space $C(S)$ is a closed subspace of $\ell^\infty(S)$: it follows from the Portmanteau theorem (see Theorem 1.3.4 p.18 in van der Vaart and Wellner [34]) that the probability that $G$ belongs to $C(S)$ is equal to 1. As a first conclusion, the distribution of $G$ thus defines a Borel probability measure on $C(S)$.
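We first deal with the case when $S$ is compact. The space $C(S)$ is then a separable and complete metric space, so that any Borel probability measure on $C(S)$ is tight. It follows that $G$ defines a tight element of $C(S)$, and thus of $\ell^\infty(S)$ because inclusion preserves tightness (see Lemma 14.4 p.257 in Kallenberg [13]). By Theorem 2.2, $G = 0$ almost everywhere on $S$ with probability 1. Finally, because $G$ is continuous on $S$ and (H1) holds, one concludes that, almost surely, $G = 0$ everywhere on $S$: indeed, if $G(x) \neq 0$ for some $x \in S$, then by continuity $G$ does not vanish on the intersection of $S$ with a small open ball centered at $x$, which has positive Lebesgue measure by (H1).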
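If now $S$ is not compact, notice that for every compact set $T$ contained in $S$, the restriction map $H \mapsto H_{|T}$ from $\ell^\infty(S)$ to $\ell^\infty(T)$ is continuous. By the continuous mapping theorem, $r_n(\hat f_n - \bar f_n)$ then converges weakly in $\ell^\infty(T)$ to the restriction $G_{|T}$ of $G$ to the set $T$. As in the compact case, $G_{|T}$ is tight, so that Theorem 2.2 (or a trivial argument if $T$ has Lebesgue measure zero) yields $G_{|T} = 0$ almost everywhere on $T$ with probability 1. Since $S$ is a countable union of compact subsets of $\mathbb{R}^d$, we obtain that $G = 0$ almost everywhere on $S$ with probability 1, and the continuity of $G$ together with (H1) yields $G = 0$ almost surely, exactly as in the compact case.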
In particular, when $S$ is an open cube in $\mathbb{R}^d$, we can infer that any weak limit of $r_n(\hat f_n - \bar f_n)$ in $\ell^\infty(S)$ should be degenerate, although it has been known since Stute [28,29] that, under additional conditions, the sup-norm of $\hat f_n - \bar f_n$ over $S$ converges almost surely to 0 at the exact rate $v_n^{-1}$, where $v_n := \sqrt{nh^d/|\log h|}$. This last observation suggests that the sequence $(v_n)$ should play a crucial role in the description of the actual asymptotic behavior of $\hat f_n - \bar f_n$ in $\ell^\infty(S)$. Another consequence of Theorem 2.4 is that centered and rescaled kernel density estimators cannot converge weakly to a nondegenerate Gaussian process in spaces of continuous functions; see also the introduction of Ruymgaart [25].
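The role of the rate $v_n = \sqrt{nh^d/|\log h|}$ can be visualized with a small Monte Carlo experiment. This sketch rests on assumptions of our own: $d = 1$, a Gaussian kernel, standard normal data, $S = (-1, 1)$ approximated by a grid on $[-1, 1]$, and the bandwidth $h = n^{-1/5}$, chosen for illustration rather than taken from the paper. If $v_n$ is the right normalization, the product $v_n\|\hat f_n - \bar f_n\|_{\infty,S}$ should stay of the same order as $n$ grows while the sup-norm itself shrinks; a single simulated path cannot, of course, establish an almost sure limit.

```python
import numpy as np

def sup_deviation(n, rng, grid):
    """Grid approximation of ||hat f_n - bar f_n||_{inf,S}, Gaussian kernel, N(0,1) data."""
    h = n ** (-1 / 5)                        # illustrative bandwidth (assumption)
    sample = rng.standard_normal(n)
    u = (grid[:, None] - sample[None, :]) / h
    f_hat = np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))
    s2 = 1.0 + h**2                          # bar f_n = K_h * f is the N(0, 1 + h^2) density
    f_bar = np.exp(-0.5 * grid**2 / s2) / np.sqrt(2.0 * np.pi * s2)
    return h, float(np.abs(f_hat - f_bar).max())

rng = np.random.default_rng(1)
grid = np.linspace(-1.0, 1.0, 401)           # grid approximating S = (-1, 1)
devs, scaled = [], []
for n in [500, 2000, 8000]:
    h, dev = sup_deviation(n, rng, grid)
    v_n = np.sqrt(n * h / abs(np.log(h)))    # v_n = sqrt(n h^d / |log h|) with d = 1
    devs.append(dev)
    scaled.append(v_n * dev)
    print(n, round(dev, 4), round(v_n * dev, 3))
```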
We thus examine in the second part of this work what happens depending on the behavior of $(r_n)$ relative to $(v_n)$. The arguments presented in what follows make heavy use of the exact rates of convergence for the sup-norm of $\hat f_n - \bar f_n$ which were investigated in Giné and Guillou [8]. We introduce the following hypotheses:

(H2) The set $S$ is $\sigma$-compact and its interior $S^\circ$ is dense in $S$.
(M) The kernel $K$ is a nonnegative, bounded, compactly supported function belonging to the linear span of the nonnegative functions $k$ satisfying the following property: the subgraph $\{(s, u) \in \mathbb{R}^d \times \mathbb{R} \mid k(s) \ge u\}$ of $k$ can be represented as a finite number of Boolean operations among sets of the form $\{(s, u) \in \mathbb{R}^d \times \mathbb{R} \mid p(s, u) \ge \varphi(u)\}$, where $p$ is a polynomial on $\mathbb{R}^{d+1}$ and $\varphi$ is an arbitrary real function.

(R) The function $f$ is uniformly continuous and the kernel $K$ is continuous on $\mathbb{R}^d$.
Assumption (H2) implies condition (H1) and entails that the supremum of a continuous function over $S$ is also its supremum over the interior $S^\circ$. Hypothesis (M) is taken from Giné and Guillou [8] and Giné et al. [9]; it is basically a measurability condition ensuring that the class of functions $\{K((x - \cdot)/h),\ x \in \mathbb{R}^d,\ h > 0\}$ is a bounded measurable Vapnik-Červonenkis class, which ensures in particular that $\|\hat f_n - \bar f_n\|_{\infty,S}$ is a Borel measurable random variable; it is thus a key ingredient in the search for uniform rates of convergence for $\hat f_n - \bar f_n$. Many kernels, such as the naive kernel or the pyramid kernel, satisfy this assumption; see the discussion p.911 in Giné and Guillou [8]. Regularity condition (R) on $f$ and $K$ in particular ensures that the sample paths of $\hat f_n - \bar f_n$ are almost surely continuous; notice that a similar condition is also required in Theorem 3.1 of Stute [29]. Condition (W) was already partly introduced on p.87 of Stute [28] and p.367 of Stute [29] and, in its present form, it is one of the hypotheses necessary for the results of Giné and Guillou [8] to hold. Related but stronger assumptions are those of Giné et al. [9], p.2574. The following result then holds.

Theorem 2.5. Let $(r_n)$ be a nonrandom sequence of positive real numbers. Assume that $K$ satisfies condition (M), that the density function $f$ is bounded on $\mathbb{R}^d$ and that condition (W) holds.
Assume further that conditions (H2) and (R) hold, that the set $S$ is either $\mathbb{R}^d$ or bounded, and that the set $\{f > 0\} \cap S^\circ$ is not empty.
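(i) If $r_n/v_n \to 0$, then $r_n(\hat f_n - \bar f_n)$ converges weakly to 0 in $\ell^\infty(S)$.

(ii) If $r_n/v_n \to c \in (0, \infty]$, then $r_n(\hat f_n - \bar f_n)$ does not converge weakly to any Borel measurable random element in $\ell^\infty(S)$.

Proof. Applying Proposition 3.1 in Giné and Guillou [8] when $S^\circ$ is bounded, or Theorem 3.3 in Giné and Guillou [8] when it is equal to $\mathbb{R}^d$, we obtain that $v_n\|\hat f_n - \bar f_n\|_{\infty,S}$ converges almost surely to a constant $C$, which belongs to $(0, \infty)$ because $\{f > 0\} \cap S^\circ$ is not empty. If $r_n/v_n \to 0$, it follows that $r_n\|\hat f_n - \bar f_n\|_{\infty,S} \to 0$ almost surely, hence in probability, which proves (i). If now $r_n/v_n \to c \in (0, \infty]$, it follows that $r_n\|\hat f_n - \bar f_n\|_{\infty,S}$ has a positive (possibly infinite) almost sure limit; if $r_n(\hat f_n - \bar f_n)$ converged weakly to a Borel measurable random element in $\ell^\infty(S)$, then this limit would be almost surely 0 by Theorem 2.4, and therefore $r_n\|\hat f_n - \bar f_n\|_{\infty,S}$ would converge to 0 in probability, which is a contradiction. The proof is complete.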
Theorem 2.5, which offers a full classification of the asymptotic behavior of $r_n(\hat f_n - \bar f_n)$ in $\ell^\infty(S)$ depending on the rate $(r_n)$, is the counterpart, for the uniform topology, of Theorem 2.2 in Nishiyama [20] and Theorems 2.3 and 2.4 in Stupfler [27]. We conclude Section 2 with several remarks about this last result.

Remark 2.6. In Theorem 2.5, the uniform continuity condition on $f$ contained in (R) can be relaxed, as mentioned in Giné and Guillou [8], by assuming that the set $\{f > 0\}$ is open, that $f$ is continuous and bounded on this set, and that $f(x)$ converges to 0 as $\|x\| \to \infty$. This makes it possible to apply Theorem 2.5 to the uniform distribution or the exponential distribution.

Remark 2.7.
It is interesting to note that the rate $r_n$ for which the weak behavior of $r_n(\hat f_n - \bar f_n)$ becomes nontrivial in $\ell^\infty(S)$ is $v_n = \sqrt{nh^d/|\log h|}$, which is asymptotically smaller than the rate $\sqrt{nh^d}$ playing an analogous role in $L^p(\mathbb{R}^d, \mu_S)$ when $p$ is finite (Nishiyama [20] and Stupfler [27]). While it is well known that uniform rates of convergence usually feature a logarithmic penalty term, this also suggests that one cannot easily deduce the limiting behavior of $r_n(\hat f_n - \bar f_n)$ in $\ell^\infty(S)$ from its behavior in an $L^p(\mathbb{R}^d, \mu_S)$ space for $p$ finite. Indeed, write tentatively […] $\sqrt{nh^d}(\hat f_n - \bar f_n)$ cannot converge weakly to a Borel measurable element in $\ell^\infty(S)$, which is not informative enough, since we know from Stute [28,29] that the rate of convergence of $\|\hat f_n - \bar f_n\|_{\infty,S}$ in $\mathbb{R}$ is strictly smaller than $\sqrt{nh^d}$.
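Since $v_n/\sqrt{nh^d} = 1/\sqrt{|\log h|}$, the gap between the uniform rate $v_n = \sqrt{nh^d/|\log h|}$ and the $L^p$-type rate $\sqrt{nh^d}$ is only logarithmic in the bandwidth. The short computation below makes this concrete for the illustrative choice $d = 1$ and $h = n^{-1/5}$, an assumption made for this example only.

```python
import math

ratios = []
for n in [10**3, 10**6, 10**9]:
    h = n ** (-1 / 5)                            # illustrative bandwidth, d = 1
    v_n = math.sqrt(n * h / abs(math.log(h)))    # uniform-topology rate
    root = math.sqrt(n * h)                      # L^p-type rate sqrt(n h^d)
    ratio = v_n / root                           # equals 1 / sqrt(|log h|)
    ratios.append(ratio)
    print(n, round(ratio, 4), round(1 / math.sqrt(abs(math.log(h))), 4))
```

The ratio tends to 0, but only very slowly, which is consistent with the logarithmic penalty mentioned above.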
[…] converges weakly to a nondegenerate weak limit has a solution; when $d = 1$, the result of Bickel and Rosenblatt [1] actually gives $A$ and $B$ such that […] converges weakly to a nondegenerate, explicit limit. In other words, it appears that the correct way to obtain asymptotic uniform confidence bands on $f$ is to look directly at the (possibly weighted) supremum of $v_n|\hat f_n - \bar f_n|$ over $S$, instead of working on the weak behavior of $v_n(\hat f_n - \bar f_n)$ in $\ell^\infty(S)$.

Concluding remarks
In this paper, we examined the weak behavior of centered and rescaled versions $r_n(\hat f_n - \bar f_n)$ of the Parzen-Rosenblatt density estimator $\hat f_n$ in $\ell^\infty$ spaces on $\mathbb{R}^d$. In particular, we showed that, under mild conditions, any Borel measurable weak limit of this process is equal to 0, although the exact almost sure asymptotics for uniform norms of $\hat f_n - \bar f_n$ are known to be nontrivial. Interestingly, our results are similar to the negative results of Stupfler [27] regarding the weak behavior of $r_n(\hat f_n - \bar f_n)$ in $L^p$ spaces on $\mathbb{R}^d$ for $p$ finite; besides, the basic idea in the case of $L^p$ spaces, which was to understand the weak behavior of this process through the weak behavior of a suitable collection of its integrals, can also be used successfully in the $\ell^\infty$ space, because of the particular structure of its dual space. In other words, although an $L^p$ space for $1 \le p < \infty$ and an $\ell^\infty$ space are structurally very different from each other, their dual spaces have enough common characteristics to ensure that the weak convergence properties of $\hat f_n - \bar f_n$ can be examined in the same way.
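[…] Here $j_n \uparrow \infty$ is a sequence of integers; the function $\Phi$ is a square-integrable function such that $\{\Phi(\cdot - k),\ k \in \mathbb{Z}\}$ is an orthonormal system in $L^2(\mathbb{R})$ and, moreover, the (functional) linear spaces defined by induction by
$$V_0 = \overline{\operatorname{span}}\{\Phi(\cdot - k),\ k \in \mathbb{Z}\}, \qquad V_{j+1} = \{g(2\,\cdot) : g \in V_j\}$$
are nested and such that their union is dense in $L^2(\mathbb{R})$. Then clearly […]. As a consequence, if $\nu \in \mathrm{bca}(\mathbb{R}, \mathcal{B}(\mathbb{R}), dx)$, we may write
$$V(r_n, \nu) := \operatorname{Var}\Big(T_\nu\big(r_n(\hat f_n - \mathbb{E}(\hat f_n))\big)\Big) = r_n^2 \operatorname{Var}\big(T_\nu(\hat f_n)\big).$$
After straightforward computations, we get
$$V(r_n, \nu) = \frac{r_n^2}{n}\,\mathbb{E}\left[\left(\int_{\mathbb{R}} \Big(S(2^{j_n}X_1, y) - \mathbb{E}\big(S(2^{j_n}X_1, y)\big)\Big)\,\frac{d\nu}{dx}\Big(\frac{y}{2^{j_n}}\Big)\,dy\right)^2\right]$$
with $S(x, y) = \sum_{k \in \mathbb{Z}} \Phi(x - k)\Phi(y - k)$.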
If moreover $\Phi$ is bounded and compactly supported, then $|S(x, y)| \le Q(y - x)$, where $Q: \mathbb{R} \to \mathbb{R}_+$ is bounded and compactly supported; see Lemma 8.6 in Härdle et al. [12]. Because $Q$ is then integrable, this entails
$$V(r_n, \nu) \le \frac{r_n^2}{n}\,\|Q\|_{L^1(\mathbb{R})}^2\,\Big\|\frac{d\nu}{dx}\Big\|_{L^\infty(\mathbb{R})}^2,$$
which is a bound similar to the one we found in the proof of Theorem 2.2 for the kernel density estimator. It thus appears that Theorem 2.2 holds for wavelet density estimators on $\mathbb{R}$ as well; in other words, any centered and rescaled version of the wavelet density estimator (with $r_n/\sqrt{n} \to 0$), if it converges to a tight Borel measurable weak limit in $\ell^\infty(\mathbb{R})$, must in fact converge essentially to 0. It is known, though, that the exact rate of almost sure convergence of the uniform norm $\|\hat f_n - \mathbb{E}(\hat f_n)\|_{\infty,\mathbb{R}}$ is $\sqrt{n/(j_n 2^{j_n})}$ under certain regularity conditions; see Theorem 2 in Giné and Nickl [11]. Classifying the weak behavior of $r_n(\hat f_n - \mathbb{E}(\hat f_n))$ in $\ell^\infty(\mathbb{R})$ can then likely be done as in Theorem 2.5 of the present paper for the kernel estimator. The method of proof presented here therefore seems flexible enough to apply to, and yield the same results for, density estimators other than the Parzen-Rosenblatt estimator.
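To make the wavelet discussion concrete, here is a minimal sketch with the Haar scaling function $\Phi = \mathbb{1}_{[0,1)}$, which is bounded, compactly supported and generates an orthonormal system $\{\Phi(\cdot - k),\ k \in \mathbb{Z}\}$. We assume, as an illustration rather than as the paper's definition, that the linear wavelet estimator takes the standard projection form $\hat f_n(x) = \frac{2^{j_n}}{n}\sum_{i=1}^n S(2^{j_n}X_i, 2^{j_n}x)$ with $S$ as above. For the Haar function, $S(x, y) = \mathbb{1}\{\lfloor x \rfloor = \lfloor y \rfloor\}$, so the estimator reduces to a histogram with bins of width $2^{-j_n}$, and the envelope $|S(x, y)| \le Q(y - x)$ holds with $Q = \mathbb{1}_{(-1,1)}$. The uniform sample, $j = 3$ and the evaluation grid are arbitrary choices for the example.

```python
import numpy as np

def haar_S(x, y):
    """S(x, y) = sum_k Phi(x - k) Phi(y - k) for the Haar function Phi = 1_[0,1):
    the only possible nonzero term is k = floor(x) = floor(y)."""
    return (np.floor(x) == np.floor(y)).astype(float)

def wavelet_estimate(x, sample, j):
    """(2^j / n) sum_i S(2^j X_i, 2^j x) with the Haar Phi (assumed projection form)."""
    x = np.asarray(x, dtype=float)
    return 2.0**j * haar_S(2.0**j * sample[None, :], 2.0**j * x[:, None]).mean(axis=1)

rng = np.random.default_rng(2)
sample = rng.uniform(0.0, 1.0, size=500)
j = 3
x = np.linspace(0.0, 1.0, 64, endpoint=False) + 1.0 / 128   # midpoints inside (0, 1)
est = wavelet_estimate(x, sample, j)

# With the Haar function the estimator is a histogram with bins of width 2^-j
counts, _ = np.histogram(sample, bins=2**j, range=(0.0, 1.0))
hist = np.repeat(counts / (sample.size * 2.0**-j), 64 // 2**j)

# Envelope |S(x, y)| <= Q(y - x) with Q = 1_{(-1, 1)}
u = rng.uniform(-5.0, 5.0, size=(2, 10_000))
envelope_ok = bool(np.all(haar_S(u[0], u[1]) <= (np.abs(u[1] - u[0]) < 1)))
print(np.allclose(est, hist), envelope_ok)
```

Here $Q$ is integrable with $\|Q\|_{L^1(\mathbb{R})} = 2$, which is the property used in the variance bound above.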