Weak convergence of the least concave majorant of estimators for a concave distribution function

We study the asymptotic behavior of the least concave majorant of an estimator of a concave distribution function under general conditions. The true concave distribution function is permitted to violate strict concavity, so that the empirical distribution function and its least concave majorant are not asymptotically equivalent. Our results are proved by demonstrating the Hadamard directional differentiability of the least concave majorant operator. Standard approaches to bootstrapping fail to deliver valid inference when the true distribution function is not strictly concave. While the rescaled bootstrap of Dümbgen delivers asymptotically valid inference, its performance in small samples can be poor, and depends upon the selection of a tuning parameter. We show that two alternative bootstrap procedures—one obtained by approximating a conservative upper bound, the other by resampling from the Grenander estimator—can be used to construct reliable confidence bands for the true distribution. Some related results on isotonic regression are provided. MSC 2010 subject classifications: Primary 62G09, 62G20.


Introduction
Nonparametric estimation under shape constraints such as monotonicity and concavity has received increasing attention in recent years. Groeneboom and Jongbloed (2014) provide a helpful introduction to the current state of the field. As pointed out by Walther (2009), nonparametric estimation under shape constraints is attractive for two main reasons: (1) shape constraints are often implied by theoretical models or are at least plausible assumptions, and (2) nonparametric estimation under shape constraints is often feasible without the use of tuning parameters, as opposed to classical kernel or series estimators. Perhaps the best known shape constrained estimator is the Grenander estimator of a nonincreasing density function. Grenander (1956) showed that, given a random sample drawn from a nonincreasing probability density, the left-derivative of the least concave majorant (LCM) of the empirical distribution function achieves the maximum likelihood among all nonincreasing densities.
In this paper we provide some new results concerning the asymptotic behavior of the least concave majorant of an empirical distribution function. Quite a lot is already known about its left-derivative, the Grenander estimator. Denote by f̂_n the Grenander estimator of a nonincreasing density f based on a sample of size n with empirical distribution function F_n, and by F̂_n the LCM of F_n. The pointwise asymptotic distribution of f̂_n was obtained by Prakasa Rao (1969) at points where f is strictly decreasing, and by Carolan and Dykstra (1999) and Jankowski (2014) at points where f is flat or misspecified. The rate of convergence is n^{1/3} in the former case and n^{1/2} in the latter, and the limit distribution is non-Gaussian. Results on the global asymptotic behavior of f̂_n (specifically, of its L_p-risk) have been provided by Groeneboom (1985), Groeneboom et al. (1999), Kulikov and Lopuhaä (2005), Durot (2007) and Durot et al. (2012). These global results require f to be strictly decreasing on its support. Kulikov and Lopuhaä (2006a) studied the behavior of f̂_n near the boundary of its support, while Woodroofe and Sun (1993) and Balabdaoui et al. (2011) studied its behavior at zero.
Turning to the asymptotic behavior of F̂_n, it is natural to consider weak convergence of the process Ĝ_n = √n(F̂_n − F). A result of Kiefer and Wolfowitz (1976) implies that F̂_n and F_n are asymptotically equivalent when F is strictly concave on its support, so that Ĝ_n converges weakly to G = B ∘ F, a Brownian bridge B composed with F. Other results on the behavior of F̂_n − F_n when F is strictly concave on its support have been provided by Wang (1994) and Kulikov and Lopuhaä (2006b, 2008). On the other hand, when F is the uniform distribution on the unit interval it is known that F̂_n and F_n are not asymptotically equivalent, and Ĝ_n converges weakly to the LCM of B, a process studied in detail by Carolan and Dykstra (2001). Carolan (2002) considered the more general case where F is affine over some maximal subinterval [a, b] of its support, and showed that the restriction of Ĝ_n to [a, b] converges weakly to the LCM of the restriction of G to [a, b].
Our main result, Theorem 2.1 below, establishes the weak convergence of Ĝ_n in the intermediate cases where F is concave but not necessarily strictly concave or uniform. As might be guessed from the results of Carolan (2002), the weak limit Ĝ can be obtained by taking LCMs of G over the distinct intervals on which F is affine. Our proof exploits the fact that the LCM operator is Hadamard directionally differentiable (see Definition 2.2 and Proposition 2.1 below) despite not being fully Hadamard differentiable. This provides enough structure to invoke the Delta method (Shapiro, 1991; Dümbgen, 1993) and in this way derive the weak limit of Ĝ_n. Our result applies not only to the LCM of an empirical distribution function, but to any estimator of F obtained by taking the LCM of an estimator F_n of F for which √n(F_n − F) converges weakly to a continuous process G vanishing at infinity. Thus, for instance, we may take F_n to be a smoothed estimate of F, as in Eggermont and LaRiccia (2000), and we may allow the data used to construct F_n to exhibit limited serial dependence. We allow F to have bounded or unbounded support on the nonnegative half-line.
Our weak convergence result for Ĝ_n can be useful, for example, when constructing uniform confidence bands for F. However, doing so requires consistent estimation of the law of the weak limit Ĝ, which turns out to be nontrivial. The fact that the LCM operator fails to be fully Hadamard differentiable implies that the bootstrap does not produce consistent estimates of the law of Ĝ (Dümbgen, 1993; Fang and Santos, 2015). In other words, under Hadamard directional differentiability the Delta method generalizes to yield the weak limit of Ĝ_n, but not to deliver bootstrap consistency. We show in Theorem 3.1 that the bootstrap instead approximates a process that coincides with Ĝ only when F is strictly concave. The rescaled bootstrap of Dümbgen (1993) does produce consistent estimates of the law of Ĝ (Theorem 3.2) but, consistent with Dümbgen's warning of nonrobustness, we find that it performs poorly in numerical simulations. We suggest two alternative bootstrap procedures that can be used to construct reliable confidence bands for a concave distribution function. One is obtained by approximating the distribution of the uniform norm of G, which stochastically dominates the uniform norm of Ĝ and can be consistently bootstrapped. The other is obtained by resampling from the Grenander estimator. This latter procedure is interesting in that, while it fails to provide consistent estimates of the law of Ĝ, it does provide unconditionally consistent estimates of upper quantiles of the law of the uniform norm of Ĝ. Theorem 4.1 identifies the process approximated by bootstrapping from the Grenander estimator, and numerical simulations show that the upper quantiles of its uniform norm coincide with those of the uniform norm of Ĝ.
In Appendix A we briefly discuss how our approach to studying the asymptotic behavior of estimators of concave distribution functions may be extended to the study of isotonic regression. Specifically, we develop limit theory for the greatest convex minorant of the cumulative sum diagram, whose left-derivative is the isotonic regression estimator. Weak convergence of a suitably normalized version of the cumulative sum diagram is established in Proposition A.1, and of its greatest convex minorant in Proposition A.2.
Before proceeding further we introduce some additional notation. We denote by R_+ the set {x ∈ R : x ≥ 0}. The underlying probability space on which all random elements are defined is (Ω, F, P). For a set T, we let ℓ∞(T) denote the set of uniformly bounded, real valued functions on T. Of particular importance is the space ℓ∞ = ℓ∞(R_+), which we equip with the uniform metric d_∞ and ball σ-field A. Random elements of ℓ∞ are F/A-measurable maps from Ω to ℓ∞. We denote by ⇝ weak convergence in (ℓ∞, d_∞) in the sense of Hoffmann-Jørgensen. We denote by C_0 = C_0(R_+) the collection of continuous real valued functions on R_+ vanishing at infinity, also equipped with d_∞. We denote by ‖·‖ the uniform norm on ℓ∞(T), where T should be clear from context.

Weak convergence of Ĝ_n to Ĝ
The distribution function F to be estimated is taken to be that of a nonnegative real valued random variable. We treat it as an element of ℓ∞ and maintain throughout that it satisfies the following condition.
Assumption 2.1. F : R_+ → R is a concave distribution function.
No further technical conditions will be imposed directly on F. To maintain generality about the underlying sampling scheme and method of estimation, we suppose the existence of a sequence F_1, F_2, . . . of random elements of ℓ∞ satisfying the following high level condition, in which G_n denotes √n(F_n − F).
Assumption 2.2. G_n ⇝ G for some random element G of ℓ∞ with sample paths in C_0.
If F_n is the empirical distribution function of an independent and identically distributed (iid) sample of size n drawn from F, then G_n is the usual empirical process and Assumption 2.2 is clearly satisfied with G = B ∘ F, where B is a Brownian bridge. More generally, we may allow the sample drawn from F to satisfy a mixing condition or related property (Dehling and Philipp, 2002). The ball σ-field A on ℓ∞ is coarse enough to accommodate empirical distribution functions as random elements of ℓ∞, unlike the Borel σ-field on ℓ∞ (cf. Pollard, 1984, pp. 65-66). We might also take F_n to be a smoothed empirical distribution function (van der Vaart, 1994) or allow F_n to be some other estimator satisfying Assumption 2.2 under suitable regularity conditions.
To exploit the concavity of F we propose using the estimator F̂_n = MF_n, where M is the LCM operator. If F_n is the empirical distribution function of an iid sample drawn from F, then the left-derivative of F̂_n is the classical Grenander estimator of the probability density of F. The following definition is adapted from Beare and Moon (2015, Def. 2.1).
Definition 2.1. Given a nonempty convex set T ⊆ R_+, the LCM over T is the operator M_T : ℓ∞ → ℓ∞(T) that maps each θ ∈ ℓ∞ to the function

M_T θ(x) = sup{λθ(u) + (1 − λ)θ(v) : λ ∈ [0, 1], u, v ∈ T, λu + (1 − λ)v = x}, x ∈ T.

We write M as shorthand for M_{R_+} and refer to M as the LCM operator.
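On a finite grid the LCM can be computed with a standard upper-convex-hull scan. The following sketch is our own illustration (the helper name lcm_grid and the example values are ours, not from the paper): it evaluates the least concave majorant of grid data at the grid points.

```python
import numpy as np

def lcm_grid(x, y):
    """Least concave majorant of the points (x[i], y[i]), evaluated at each x[i].

    x must be strictly increasing. Runs a monotone-chain upper-hull scan,
    then linearly interpolates the hull at every grid point.
    """
    hull = []  # indices of the vertices of the upper (concave) hull
    for i in range(len(x)):
        # pop the last vertex while it lies on or below the chord joining
        # its predecessor to the incoming point (hull slopes must decrease)
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (y[i2] - y[i1]) * (x[i] - x[i1]) <= (y[i] - y[i1]) * (x[i2] - x[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])

# Example: the LCM bridges the non-concave stretch with a chord.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 2.0, 2.0])
print(lcm_grid(x, y))  # -> [0. 1. 2. 2.]
```

The scan is the same upper-hull step used in convex-hull algorithms, so the cost is linear in the grid size once the grid is sorted.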
The definition of M_T given here differs from that of Beare and Moon (2015) only in that those authors took the domain of M_T to be ℓ∞([0, 1]) and required T to be a closed subinterval of the unit interval. It is easy to show that M_T θ is a concave majorant of θ by arguing as in Eggermont and LaRiccia (2001, pp. 225-226). Other well known or easily established properties of M_T include monotonicity, convexity, positive homogeneity of degree one, and weak contractivity (Durot and Tocquet, 2003, Lem. 2). Beare and Moon (2015) investigated the differential properties of M_{[0,1]} in order to study the asymptotic behavior of a test of the monotone density ratio property proposed by Carolan and Tebbs (2005). For an application of this test to an empirical puzzle in the financial literature, see Beare and Schmidt (2016). Related tests of shape constraints based on the LCM operator have been developed by Delgado and Escanciano (2012). As pointed out by Beare and Moon (2015), M_{[0,1]} fails to be Hadamard differentiable and hence violates the assumptions made in standard treatments of the Delta method (see e.g. van der Vaart and Wellner, 1996, Thm. 3.9.4); however, M_{[0,1]} does satisfy a certain form of directional differentiability introduced by Shapiro (1990). Quite remarkably, the Delta method is valid under this weaker notion of differentiability (Shapiro, 1991), although the Delta method for the bootstrap typically fails (Dümbgen, 1993). Fang and Santos (2015) discuss the contributions of Shapiro and Dümbgen at length, providing a range of extensions, and illustrating their applicability to a variety of problems in econometrics.

Definition 2.2. Let D and E be Banach spaces. A map φ : D_φ ⊆ D → E is said to be Hadamard directionally differentiable at θ ∈ D_φ if there is a map φ′_θ : D → E such that

lim_{n→∞} ‖(φ(θ + t_n h) − φ(θ))/t_n − φ′_θ(h)‖_E = 0

for all h ∈ D and all t_1, t_2, . . . ∈ R_+ such that t_n ↓ 0. It is said to be Hadamard directionally differentiable at θ tangentially to a set D_0 ⊂ D if there is a map φ′_θ : D_0 → E such that

lim_{n→∞} ‖(φ(θ + t_n h_n) − φ(θ))/t_n − φ′_θ(h)‖_E = 0

for all h ∈ D_0 and all h_1, h_2, . . . ∈ D and t_1, t_2, . . . ∈ R_+ such that t_n ↓ 0 and ‖h_n − h‖_D → 0.
As with various notions of differentiability in the literature, Hadamard directional differentiability can be understood by looking at the restrictions imposed on the approximating map (i.e. the derivative) and the way the approximation error is controlled. Specifically, define the remainder term

Rem_θ(h) = φ(θ + h) − φ(θ) − φ′_θ(h),

so that φ(θ) + φ′_θ(h) can be viewed as the first order approximation to φ(θ + h). Hadamard directional differentiability of φ implies that Rem_θ(th)/t tends to zero as t ↓ 0 uniformly in h ∈ K for any compact set K; i.e.,

lim_{t↓0} sup_{h∈K} ‖Rem_θ(th)‖_E/t = 0.

Compact differentiability in this sense, together with continuity of φ′_θ, is equivalent to Hadamard directional differentiability (Shapiro, 1990, Prop. 3.3). However, while continuity of φ′_θ is assured under Hadamard directional differentiability, linearity is often lost. Gâteaux directional differentiability is of course weaker than Hadamard directional differentiability, as no robustness with respect to perturbations of the direction h is required. Our first result, similar to Lemma 3.2 of Beare and Moon (2015), establishes Hadamard directional differentiability of the LCM operator. Its proof, along with the proofs of other results to be stated, may be found in Section 5 below.
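The loss of linearity is easy to see numerically. The following sketch is our own informal illustration (the grid, the piecewise affine θ, and all names are ours): we difference the LCM operator along the directions h and −h at a piecewise affine θ. For a linear derivative the two finite differences would cancel; here their sum is far from zero.

```python
import numpy as np

def lcm_grid(x, y):
    # least concave majorant of (x[i], y[i]) via an upper-hull scan
    hull = []
    for i in range(len(x)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (y[i2] - y[i1]) * (x[i] - x[i1]) <= (y[i] - y[i1]) * (x[i2] - x[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])

x = np.linspace(0.0, 2.0, 201)
theta = np.minimum(x, 1.0)   # concave, affine on [0, 1] and on [1, 2]
h = np.sin(np.pi * x)        # a direction of perturbation

def directional_difference(t, h):
    # finite-difference directional derivative (M(theta + t h) - M(theta)) / t
    return (lcm_grid(x, theta + t * h) - lcm_grid(x, theta)) / t

d_pos = directional_difference(1e-3, h)
d_neg = directional_difference(1e-3, -h)
# For a linear derivative d_pos + d_neg would vanish identically.
print(round(np.max(np.abs(d_pos + d_neg)), 6))  # -> 1.0 (nonlinear)
```

The perturbation along h is already concave over the affine pieces where it is retained, while the perturbation along −h gets replaced by chords, which is exactly the hybrid LCM/identity behavior of the directional derivative described above.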
The directional derivative M′_θ therefore behaves like a hybrid of the LCM and identity operators: for any direction h ∈ C_0, it majorizes h by concave functions on regions over which θ is affine, but acts like an identity map elsewhere.
Our next result summarizes some useful properties of M′_θ.

Proposition 2.2. For each θ ∈ ℓ∞, the Hadamard directional derivative M′_θ is weakly contractive and convex. Further, M′_θ is linear if and only if θ is strictly concave on its support.
With Hadamard directional differentiability of M in hand, we obtain the weak limit of Ĝ_n by employing the Delta method.

Theorem 2.1. If Assumptions 2.1 and 2.2 hold, then Ĝ_n ⇝ Ĝ = M′_F G.

Our result is related to that of Carolan (2002), who showed in the proof of his Theorem 5 that when F is affine over a maximal interval [a, b] ⊆ R_+, the restriction of Ĝ_n to [a, b] converges weakly to the LCM of the restriction of G to [a, b]. We extend his result to obtain weak convergence of the entire process Ĝ_n even when F may have multiple affine segments separated by kinks, or by intervals over which it is strictly concave. Further, since our proof is an application of the Delta method, it is simple for us to consider general estimators F_n satisfying Assumption 2.2, whereas Carolan (2002) requires F_n to be the empirical distribution function based on iid draws from F.
Remark 2.4. For typical choices of F_n satisfying Assumption 2.2 we will have Ĝ(x) = 0 for all x such that F(x) = 1.
Remark 2.5. If F is strictly concave then we know from Proposition 2.2 that M′_F is the identity map on C_0. In this case Ĝ = G, and so Theorem 2.1 implies the asymptotic equivalence of F̂_n and F_n. More generally, asymptotic equivalence of F̂_n and F_n holds when F is strictly concave at all x such that F(x) < 1, and G(x) = 0 for all x such that F(x) = 1.

Bootstrap approximation of the law of Ĝ_n
Parallel to the level of generality adopted in our treatment of the estimator F_n in the previous section, we here maintain a high degree of generality with respect to the method used to obtain a bootstrap version of F_n. Let F*_1, F*_2, . . . be random elements of ℓ∞. For each n we assume the existence of independent σ-fields X_n, W*_n ⊂ F such that F_n is X_n/A-measurable and F*_n is (X_n ⊗ W*_n)/A-measurable. X_n should be interpreted as the σ-field generated by the data used to construct F_n, and W*_n should be interpreted as the σ-field generated by the random weights associated with the bootstrap procedure used to generate a realization of F*_n. For instance, if F_n is the empirical distribution of an iid sample (X_1, . . . , X_n) drawn from F, then the standard nonparametric bootstrap involves computing

F*_n(x) = n^{−1} Σ_{i=1}^n W*_{i,n} 1{X_i ≤ x},

where (W*_{1,n}, . . . , W*_{n,n}) is a multinomial random vector with n categories and probabilities (n^{−1}, . . . , n^{−1}), distributed independently of (X_1, . . . , X_n). In this case X_n is the σ-field generated by (X_1, . . . , X_n) and W*_n is the σ-field generated by (W*_{1,n}, . . . , W*_{n,n}). More general resampling schemes such as block bootstraps or Markovian procedures for serially dependent data may also be accommodated within our framework; see Radulović (2002) for a survey and discussion.

We require some additional notation. Let P denote the collection of all probability measures on the measurable space (ℓ∞, A). Denote by d_P a metric on P that metrizes weak convergence in ℓ∞ under d_∞; for instance, we could take d_P to be the bounded Lipschitz metric, as in Pollard (1984, pp. 74-75). Given an arbitrary random element H of ℓ∞ and σ-field C ⊆ F, denote by L(H) the law of H and by L(H | C) the law of H conditional on C, the latter uniquely defined up to almost sure equivalence.
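A minimal sketch of the multinomial-weight construction of F*_n (our own illustration; the array names, the exponential sampling distribution, and the evaluation grid are ours). Generating the weights by counting uniformly resampled indices is equivalent to drawing a multinomial vector with equal cell probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
sample = rng.exponential(size=n)  # iid draws playing the role of (X_1, ..., X_n)

# Multinomial(n; 1/n, ..., 1/n) weights, obtained by counting uniform resamples.
idx = rng.integers(0, n, size=n)
weights = np.bincount(idx, minlength=n)

def bootstrap_ecdf(x):
    # F*_n(x) = n^{-1} sum_i W*_{i,n} 1{X_i <= x}
    return (weights[:, None] * (sample[:, None] <= x)).sum(axis=0) / n

grid = np.linspace(0.0, 5.0, 101)
F_star = bootstrap_ecdf(grid)
# F*_n is itself a distribution function: the weights sum to n and
# the weighted indicator sum is nondecreasing in x.
print(weights.sum() == n, np.all(np.diff(F_star) >= 0))  # -> True True
```

The σ-field W*_n of the text corresponds here to the randomness in `idx`, which is drawn independently of the data in `sample`.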
Let G*_n = √n(F*_n − F_n), the bootstrap version of G_n. We assume that the bootstrap law of G*_n delivers a valid approximation to the law of G_n, in the following sense.

Assumption 3.1. d_P(L(G*_n | X_n), L(G)) → 0 in outer probability as n → ∞.
Assumption 3.1 is automatically satisfied when F * n is the standard nonparametric bootstrap version of the empirical distribution of iid draws from F . It is also satisfied by various dependent resampling schemes under suitable technical conditions, as discussed by Radulović (2002).
We would like to obtain a valid bootstrap approximation to the law of Ĝ_n. An obvious approach is to use as an approximation the law of √n(MF*_n − MF_n) conditional on X_n. If we could apply the Delta method for the bootstrap (see e.g. van der Vaart and Wellner, 1996, Thm. 3.9.11) it would be possible to conclude that

d_P(L(√n(MF*_n − MF_n) | X_n), L(Ĝ)) → 0 (3.1)

in outer probability, justifying the use of the law of √n(MF*_n − MF_n) conditional on X_n as an approximation to the law of Ĝ_n. Unfortunately it is not possible to apply the Delta method for the bootstrap in this way when the directional derivative M′_F is nonlinear, and the above statement of convergence in outer probability is typically false unless F is strictly concave on its support. This is a consequence of Proposition 1 of Dümbgen (1993) and Theorem A.1 of Fang and Santos (2015); see also Volgushev and Shao (2014, p. 417). We instead have the following result characterizing the unconditional weak limit of √n(MF*_n − MF_n).
Theorem 3.1. If Assumptions 2.1, 2.2 and 3.1 hold, then

√n(MF*_n − MF_n) ⇝ M′_F(G + G′) − M′_F(G′),

where G′ is an independent copy of G.
The unconditional weak limit appearing in Theorem 3.1 is equal to Ĝ if F is strictly concave on its support, but otherwise typically differs from Ĝ. In the latter case we cannot have weak convergence of √n(MF*_n − MF_n) conditional on X_n to Ĝ, so the bootstrap is not well-behaved.
In view of the failure of the standard bootstrap approximation to the law of Ĝ_n, we may instead consider an alternative route based on a suitable estimator M̂_n of the operator M′_F. The simple plug-in estimator M′_{F_n} is not effective because, as is clear from Proposition 2.1, M′_F h is typically not continuous in F for fixed h. We may instead construct M̂_n using a version of numerical differentiation. Specifically, let M̂_n be given by

M̂_n h = (M(F_n + t_n h) − MF_n)/t_n, (3.2)

with t_1, t_2, . . . a sequence of positive reals chosen to satisfy the following condition.

Assumption 3.2. t_n ↓ 0 and √n t_n → ∞ as n → ∞.
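A small sketch of the numerical derivative M̂_n h = (M(F_n + t_n h) − MF_n)/t_n (our own illustration; the grid discretization, the affine choice of F_n, and the helper lcm_grid are ours). When F_n is affine, M(F_n + t h) = F_n + t Mh, so the finite difference returns the LCM of h up to rounding, illustrating how M′_F acts as the LCM operator over affine stretches of F:

```python
import numpy as np

def lcm_grid(x, y):
    # least concave majorant of (x[i], y[i]) via an upper-hull scan
    hull = []
    for i in range(len(x)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (y[i2] - y[i1]) * (x[i] - x[i1]) <= (y[i] - y[i1]) * (x[i2] - x[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])

def M_hat(x, F_n, h, t_n):
    # numerical-derivative estimator: (M(F_n + t_n h) - M(F_n)) / t_n
    return (lcm_grid(x, F_n + t_n * h) - lcm_grid(x, F_n)) / t_n

x = np.linspace(0.0, 1.0, 101)
F_n = x.copy()                # an affine (uniform) distribution function
h = np.sin(2 * np.pi * x)     # a non-concave direction

out = M_hat(x, F_n, h, t_n=0.05)
# For affine F_n the finite difference equals Mh up to rounding error.
print(np.max(np.abs(out - lcm_grid(x, h))))  # ~ 0 (floating-point level)
```

In practice F_n is a step function rather than affine, and t_n must shrink slowly enough (Assumption-3.2 style) for the finite difference to stabilize.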
A bootstrap approximation to the law of Ĝ_n is provided by the law of M̂_n G*_n conditional on X_n. Note that, in view of the X_n/A-measurability of F_n and (X_n ⊗ W*_n)/A-measurability of G*_n, we may in practice simulate the law of M̂_n G*_n conditional on X_n by evaluating M̂_n at repeated bootstrap realizations of G*_n. The next result indicates that this modified bootstrap procedure leads to asymptotically valid approximation of the law of Ĝ_n.
Theorem 3.2. If Assumptions 2.1, 2.2, 3.1 and 3.2 hold, then d_P(L(M̂_n G*_n | X_n), L(Ĝ)) → 0 in outer probability.

The bootstrap procedure just described is precisely the rescaled bootstrap studied by Dümbgen (1993). It can also be viewed as a special case of the more general bootstrap procedure studied by Fang and Santos (2015). Though Theorem 3.2 is immediate from Proposition 2 of Dümbgen (1993) and Proposition 2.1 above, to provide additional clarity we give a proof based on verifying the conditions of Theorem 3.2 of Fang and Santos (2015). Despite the asymptotic approximation established in Theorem 3.2, we will see in the following section that confidence bands constructed using the rescaled bootstrap perform poorly in numerical simulations.
We close this section by pointing out a connection between the rescaled bootstrap and the more familiar m-out-of-n bootstrap. This connection was identified by Dümbgen (1993), who pointed to earlier uses of the m-out-of-n bootstrap by Bretagnolle (1983) and Srivastava (1985, 1987); see also Bickel et al. (1997). Suppose we let F*_n be the empirical distribution of an iid sample of m = m(n) draws from F_n, and set G*_n = √m(F*_n − F_n) and t_n = m^{−1/2}. Assumption 3.1 will be satisfied if, for instance, F_n is the empirical distribution of an iid sample from F and m tends to infinity with n (see e.g. Shorack and Wellner, 1986, pp. 763-764), while Assumption 3.2 will be satisfied if m increases to infinity more slowly than n. For this choice of G*_n and t_n we have

M̂_n G*_n = √m(MF*_n − MF_n),

and so our rescaled bootstrap approximation to the law of Ĝ_n is precisely the usual m-out-of-n bootstrap approximation. It is thus apparent that, when the m-out-of-n bootstrap works, it does so because it implicitly provides an estimate of M′_F G by numerical differentiation, as in (3.2). Our more general framework allows us to retain this aspect of the m-out-of-n bootstrap procedure, while using bootstrap samples of size n rather than m to approximate the law of Ĝ_n.
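The algebra behind this identity is short: with t_n = m^{−1/2} and G*_n = √m(F*_n − F_n) we have F_n + t_n G*_n = F*_n, so the finite difference in the definition of M̂_n collapses to √m(MF*_n − MF_n). A quick numerical confirmation on a grid (our own sketch; the sample sizes, grid, and helper lcm_grid are ours):

```python
import numpy as np

def lcm_grid(x, y):
    # least concave majorant of (x[i], y[i]) via an upper-hull scan
    hull = []
    for i in range(len(x)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (y[i2] - y[i1]) * (x[i] - x[i1]) <= (y[i] - y[i1]) * (x[i2] - x[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])

rng = np.random.default_rng(1)
n, m = 400, 100
data = rng.uniform(size=n)
sub = rng.choice(data, size=m, replace=True)   # m-out-of-n resample

grid = np.linspace(0.0, 1.0, 201)
F_n = np.searchsorted(np.sort(data), grid, side="right") / n
F_star = np.searchsorted(np.sort(sub), grid, side="right") / m

t_n = m ** -0.5
G_star = np.sqrt(m) * (F_star - F_n)
lhs = (lcm_grid(grid, F_n + t_n * G_star) - lcm_grid(grid, F_n)) / t_n
rhs = np.sqrt(m) * (lcm_grid(grid, F_star) - lcm_grid(grid, F_n))
print(np.max(np.abs(lhs - rhs)))  # ~ 0: the two bootstraps coincide
```

The agreement holds replication by replication, not merely in distribution, which is what makes the m-out-of-n bootstrap a special case of the rescaled bootstrap.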

Uniform confidence bands for F
In this section we consider a variety of methods for constructing uniform confidence bands for F and investigate their performance using numerical simulations. It is apparent from Theorem 2.1 that (1 − α)-level uniform confidence bands for F may be constructed from F̂_n and a suitable estimate of the (1 − α)-quantile of ‖Ĝ‖. Denote this quantile by

q^{1−α} = inf{c ∈ R : P(‖Ĝ‖ ≤ c) ≥ 1 − α}.

Different estimates of q^{1−α} will yield different confidence bands for F. The rescaled bootstrap of Dümbgen (1993) discussed in Section 3 provides one way to estimate q^{1−α}. The rescaled bootstrap estimate is

q̂^{1−α}_n = inf{c ∈ R : P(‖M̂_n G*_n‖ ≤ c | X_n) ≥ 1 − α},

which may be computed by evaluating M̂_n G*_n at repeated bootstrap realizations of G*_n. We will call F̂_n ± n^{−1/2} q̂^{1−α}_n the rescaled bootstrap confidence bands. With Theorem 3.2 in hand it is straightforward to establish the following result.
Proposition 4.1. Let Assumptions 2.1, 2.2, 3.1 and 3.2 hold, and suppose that the distribution function of ‖Ĝ‖ is continuous and strictly increasing at q^{1−α}. Then q̂^{1−α}_n → q^{1−α} in probability, and

lim_{n→∞} P(‖F̂_n − F‖ ≤ n^{−1/2} q̂^{1−α}_n) = 1 − α.

A limitation of the rescaled bootstrap (aside from poor small sample performance, which we shall come to shortly) is the need to choose the tuning parameter t_n. There is an alternative procedure we can use to construct conservative confidence bands for F that does not require the use of a tuning parameter. A consequence of the weak contractivity of the LCM operator is that we must have ‖Ĝ_n‖ ≤ ‖G_n‖ when F is concave. If F_n is the empirical distribution function of an iid sample drawn from F, and we are willing to assume further that F(0) = 0, then the law of ‖G_n‖ is distribution free and its (1 − α)-quantile, which we denote by q̄^{1−α}_n, may be obtained by Monte Carlo simulation or from tables of critical values for the Kolmogorov-Smirnov test. This quantile provides an upper bound on the (1 − α)-quantile of ‖Ĝ_n‖, and so we have

P(‖F̂_n − F‖ ≤ n^{−1/2} q̄^{1−α}_n) ≥ 1 − α

for any fixed n, without resorting to asymptotics. We will call F̂_n ± n^{−1/2} q̄^{1−α}_n the conservative fixed n confidence bands.
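The distribution-free quantile q̄^{1−α}_n is easy to simulate: for uniform samples, ‖G_n‖ = √n sup_x |F_n(x) − x| has a closed form in the order statistics (the usual Kolmogorov-Smirnov computation). A sketch of the Monte Carlo step (our own; the replication count and names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, reps = 100, 0.10, 2000

def sup_norm_Gn(u):
    # sqrt(n) * sup_x |F_n(x) - x| for a uniform sample u, via order statistics
    u = np.sort(u)
    i = np.arange(1, len(u) + 1)
    d_plus = np.max(i / len(u) - u)
    d_minus = np.max(u - (i - 1) / len(u))
    return np.sqrt(len(u)) * max(d_plus, d_minus)

norms = np.array([sup_norm_Gn(rng.uniform(size=n)) for _ in range(reps)])
q_bar = np.quantile(norms, 1 - alpha)
print(round(q_bar, 2))  # roughly 1.2 for n = 100, alpha = 0.1

# The conservative fixed-n band is then F_hat_n(x) +/- q_bar / sqrt(n).
```

Because the sup is attained at sample points, no grid approximation is needed here, and the same quantile serves for every continuous F with F(0) = 0.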
In more general settings where F_n is not the empirical distribution function of an iid sample drawn from F, the law of ‖G_n‖ may need to be estimated in order to obtain conservative confidence bands for F. Under Assumption 3.1, this may be achieved using a suitable bootstrap technique. Define the quantile

q̌^{1−α}_n = inf{c ∈ R : P(‖G*_n‖ ≤ c | X_n) ≥ 1 − α}.

We will call F̂_n ± n^{−1/2} q̌^{1−α}_n the conservative bootstrap confidence bands. The next result confirms that they are asymptotically valid, though potentially conservative.
Proposition 4.2. Let Assumptions 2.1, 2.2 and 3.1 hold, and suppose that the distribution function of ‖G‖ is continuous and strictly increasing at its (1 − α)-quantile q̌^{1−α}. Then q̌^{1−α}_n → q̌^{1−α} in probability, and

lim inf_{n→∞} P(‖F̂_n − F‖ ≤ n^{−1/2} q̌^{1−α}_n) ≥ 1 − α.

Our next procedure for constructing confidence bands is motivated by the Delta method approximation (3.1), which suggests the use of bands of the form F̂_n ± n^{−1/2} q̃^{1−α}_n, where

q̃^{1−α}_n = inf{c ∈ R : P(√n ‖MF*_n − MF_n‖ ≤ c | X_n) ≥ 1 − α}.

We will call F̂_n ± n^{−1/2} q̃^{1−α}_n the naive bootstrap confidence bands. As discussed earlier, the Delta method approximation (3.1) is only valid for those F that are strictly concave on their support, because linearity of M′_F is required to apply the Delta method for the bootstrap. When F is not strictly concave on its support we do not expect the naive bootstrap confidence bands to achieve their nominal coverage rate asymptotically. Their limiting behavior is instead governed by Theorem 3.1.
The final method of constructing confidence bands we shall consider was suggested by a referee, and has also been studied by Sen et al. (2010). Assume that F_n is the empirical distribution function of an iid sample (X_1, . . . , X_n) drawn from F. Let F†_n be the empirical distribution function of n iid draws from F̂_n; such draws may be constructed by applying the quantile function corresponding to F̂_n to each of n draws from the uniform distribution on the unit interval, independent of one another and of X_n, the σ-field generated by (X_1, . . . , X_n). Define

q†^{1−α}_n = inf{c ∈ R : P(√n ‖MF†_n − F̂_n‖ ≤ c | X_n) ≥ 1 − α}.

We will call F̂_n ± n^{−1/2} q†^{1−α}_n the constrained bootstrap confidence bands, because the distribution from which our bootstrap sample is drawn is constrained to satisfy concavity. Their unconditional limiting behavior is governed by the following result.
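A sketch of one run of this constrained bootstrap (our own illustration; the grid discretization, replication counts, and helper lcm_grid are ours). Draws from F̂_n are generated by inverse-transform sampling against the LCM of the empirical distribution function:

```python
import numpy as np

def lcm_grid(x, y):
    # least concave majorant of (x[i], y[i]) via an upper-hull scan
    hull = []
    for i in range(len(x)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (y[i2] - y[i1]) * (x[i] - x[i1]) <= (y[i] - y[i1]) * (x[i2] - x[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])

rng = np.random.default_rng(2)
n = 100
data = rng.uniform(size=n)  # sample from F_1(x) = x, which is concave

grid = np.linspace(0.0, 1.0, 401)
F_n = np.searchsorted(np.sort(data), grid, side="right") / n
F_hat = lcm_grid(grid, F_n)  # the LCM estimator of F

stats = []
for _ in range(200):
    # inverse-transform draws from F_hat, then their empirical distribution
    u = rng.uniform(size=n)
    draws = np.interp(u, F_hat, grid)  # quantile transform (F_hat nondecreasing)
    F_dag = np.searchsorted(np.sort(draws), grid, side="right") / n
    stats.append(np.sqrt(n) * np.max(np.abs(lcm_grid(grid, F_dag) - F_hat)))

q_dag = np.quantile(stats, 0.9)
print(q_dag)  # a positive quantile of moderate size for this setup
# Constrained bootstrap band: F_hat(x) +/- q_dag / sqrt(n).
```

Every bootstrap sample here is drawn from a concave distribution function by construction, which is the defining feature of the procedure.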

Theorem 4.1. Under Assumption 2.1 we have

√n(MF†_n − F̂_n) ⇝ M′_F(G + M′_F(G′)) − M′_F(G′),

where G′ is an independent copy of G.
Based on the fact that the weak limit in Theorem 4.1 is generally not equal to the weak limit Ĝ of Ĝ_n unless F is strictly concave, we might expect the constrained confidence bands to perform poorly, but in fact they perform well in our numerical simulations. It turns out that the upper quantiles of the uniform norms of these two weak limits are, as far as we can tell from numerical computations, identical. We will come back to this fortuitous property shortly. Sen et al. (2010) have shown that confidence intervals for the left-derivative of F at a point obtained by bootstrapping from F̂_n are asymptotically invalid, and have coverage rates well below the nominal level at relevant sample sizes. Moreover, their results indicate that resampling from the Grenander estimator can yield bootstrap processes that converge weakly unconditionally but not conditionally on the data; note that Theorem 4.1 says nothing about conditional convergence. The good performance of the constrained confidence bands for F in our numerical simulations is therefore quite surprising.
To assess the small sample behavior of the various confidence bands just described we ran two sets of numerical simulations, the first involving iid samples and the second serially dependent samples. We describe the first set now. In each of 10000 experimental replications we randomly generated an iid sample of size n = 100 from one of three distribution functions to be defined shortly, and estimated that distribution using F̂_n, the LCM of the empirical distribution function. Setting α = 0.1, we computed the rescaled, naive and constrained bootstrap confidence bands. For the rescaled bootstrap we used twenty values of the tuning parameter t_n running from 0.05 to 1 in increments of 0.05. Note that at t_n = 0.1 = n^{−1/2} the rescaled bootstrap confidence bands are identical to the naive bootstrap confidence bands. All confidence bands were computed using 1000 bootstrap samples. We also computed the conservative fixed n confidence bands, with q̄^{1−α}_n = 1.2 obtained by Monte Carlo simulation.

The three distribution functions used in our numerical simulations are plotted in Figure 1. All three are supported on the unit interval. The first is the uniform distribution on the unit interval, F_1(x) = x. The second, F_2, is a piecewise linear concave function. The third, F_3, whose graph forms a quarter circle, is F_3(x) = √(2x − x^2). Note that F_1 is linear on its support, that F_3 is strictly concave on its support, and that F_2 is concave but not strictly concave on its support.

The results of our first set of numerical simulations are displayed in Figure 2. The three panels correspond to the three distribution functions plotted in Figure 1. The vertical axes measure coverage rates and the horizontal axes indicate the tuning parameter t_n used for the rescaled bootstrap. Coverage rates for the rescaled, naive and constrained bootstrap confidence bands are plotted with solid, dotted and dashed lines respectively.
Coverage rates for the conservative fixed n confidence bands are plotted with dash-dotted lines. We enumerate several observations on the displayed results.
1. Theorem 3.1 implies that the naive bootstrap confidence bands should have limiting coverage rate equal to the nominal level for the strictly concave distribution function F_3, and indeed the computed coverage rate with n = 100 is 0.86, fairly close to the nominal level of 0.9. The coverage rates are 0.73 and 0.80 for F_1 and F_2 respectively, well below the nominal level.

2. The conservative fixed n confidence bands are, as expected, conservative, with coverage rates of 0.95 for F_1, F_2 and F_3.

3. The rescaled bootstrap confidence bands do not perform well, despite the asymptotic justification for their use provided by Proposition 4.1. At t_n = n^{−1/2} = 0.1 they are identical to the naive bootstrap confidence bands by construction. For F_1 and F_2, as we increase t_n the coverage rate rises, but never attains the nominal rate. For F_3 the coverage rate is insensitive to the choice of t_n. Additional unreported simulations showed the coverage rate for all three distributions effectively flat for t_n > 1.

4. The constrained bootstrap confidence bands perform well, with coverage rates of 0.87, 0.88 and 0.88 for F_1, F_2 and F_3 respectively. They dominate the rescaled bootstrap confidence bands across all values of t_n.
The good performance of the constrained bootstrap confidence bands is surprising because Theorem 4.1 indicates that the weak limit of √n(MF†_n − F̂_n) differs from Ĝ except when F is strictly concave. To investigate further, we computed the quantile functions of the uniform norms of the weak limits Ĝ, M′_F(G + G′) − M′_F(G′) and M′_F(G + M′_F(G′)) − M′_F(G′) appearing in Theorems 2.1, 3.1 and 4.1, for each of the three distributions F_1, F_2 and F_3. This was achieved by ordering the uniform norms of 10000 independent realizations of the three weak limits; each such realization was constructed from two independent Brownian bridges B and B′ computed using Karhunen-Loève series truncated at 10000 terms, evaluated over a grid of 1000 points spread evenly over the unit interval. The computed quantile functions are displayed in the top three panels of Figure 3. The three panels correspond to F_1, F_2 and F_3 respectively. Solid, dotted and dashed lines are used to plot the quantile functions of the uniform norms of the weak limits Ĝ, M′_F(G + G′) − M′_F(G′) and M′_F(G + M′_F(G′)) − M′_F(G′) respectively. As expected, the three quantile functions coincide in the third panel, because the weak limits coincide when F is strictly concave. The quantile function for ‖M′_F(G + G′) − M′_F(G′)‖ lies below the quantile function for ‖Ĝ‖ in the first two panels, consistent with the undercoverage of the naive bootstrap confidence bands shown in the first two panels of Figure 2. What is more interesting is that the quantile functions for ‖Ĝ‖ and ‖M′_F(G + M′_F(G′)) − M′_F(G′)‖ are nearly identical in the first two panels. Visible discrepancies occur only at quantiles of around 0.4 and below. The near-equality of these two quantile functions explains the good behavior of the constrained bootstrap confidence bands in Figure 2. Note that while the laws of the uniform norms of Ĝ and M′_F(G + M′_F(G′)) − M′_F(G′) are approximately equal, the same is not true more generally for other choices of norm.
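The Karhunen-Loève construction used here expands a Brownian bridge as B(t) = Σ_{k≥1} Z_k √2 sin(kπt)/(kπ) with iid standard normal Z_k. A truncated version is straightforward to vectorize; the sketch below is our own (the truncation level, grid, and function name are ours):

```python
import numpy as np

def brownian_bridge_kl(rng, n_terms=1000, grid_size=1000):
    """One Brownian bridge path on [0, 1] via a truncated Karhunen-Loeve series:
    B(t) = sum_k Z_k * sqrt(2) * sin(k*pi*t) / (k*pi), with Z_k iid N(0, 1)."""
    t = np.linspace(0.0, 1.0, grid_size)
    k = np.arange(1, n_terms + 1)
    z = rng.standard_normal(n_terms)
    # basis[i, j] = sqrt(2) * sin(k_j * pi * t_i) / (k_j * pi)
    basis = np.sqrt(2.0) * np.sin(np.outer(t, k * np.pi)) / (k * np.pi)
    return t, basis @ z

rng = np.random.default_rng(0)
t, B = brownian_bridge_kl(rng)
print(B[0], B[-1])  # both ~0: the bridge is pinned at its endpoints

# Sanity check on the variance: Var B(1/2) = 1/4.
paths = np.array([brownian_bridge_kl(rng, grid_size=3)[1][1] for _ in range(4000)])
print(round(paths.var(), 2))  # close to 0.25
```

The truncation error in the variance at t = 1/2 is of order 1/n_terms, so a few thousand terms already reproduce the bridge law to well within Monte Carlo error.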
In the second row of panels in Figure 3 we plot quantile functions for the L_1-norms of the three weak limits. The three quantile functions are more clearly distinct from one another in the first two of these panels.
To provide further insight into the relationship between ‖Ĝ‖ and ‖M_F(G + M_F(G′)) − M_F(G′)‖, in Figure 4 we display scatterplots of the two quantities based on 1000 independent draws of (G, G′). The horizontal axes measure ‖Ĝ‖ and the vertical axes measure ‖M_F(G + M_F(G′)) − M_F(G′)‖. In panel (c), all realizations lie on the 45-degree line, because M_F is linear (and idempotent). In panels (a) and (b) this is not the case; however, deviations from the 45-degree line occur overwhelmingly toward the lower quantiles of the distributions of the two uniform norms. In panel (a) we do not see a single deviation from the 45-degree line beyond 0.5 on the horizontal axis, which corresponds roughly to the 0.4 quantile of ‖Ĝ‖.

The poor performance of the rescaled bootstrap confidence bands apparent in Figure 2 was anticipated by Dümbgen (1993, p. 126), who warned that the rescaled bootstrap is "very nonrobust and shouldn't be used". He attributed this nonrobustness to the fact that the rescaled bootstrap typically fails to be consistent under a sequence of parameters indexed by the sample size (in our case, under a sequence of distributions F_n) converging at a rate no faster than n^{-1/2}. Note that the consistency property we established in Theorem 3.2 requires F to be fixed. The results in Figure 2 confirm the pertinence of Dümbgen's warning, especially in view of the dominant performance of the constrained bootstrap confidence bands. In additional unreported simulations, we investigated whether the performance of the rescaled bootstrap improved with larger sample sizes. We found that its performance continued to be disappointing even with sample sizes as large as n = 1000, exhibiting substantial undercoverage over all tuning parameter values. By comparison, the constrained bootstrap confidence bands attained a coverage rate of nearly exactly 0.9 with n = 1000.

We ran a second set of simulations to investigate the case of dependent sampling.
The samples generated were stationary Markov chains of length n = 100. The invariant distribution of each chain was chosen to be F_1, F_2 or F_3, as displayed in Figure 1. Dependence in each chain was induced using a Gaussian copula function with correlation parameter ρ = 0.75. This means that the bivariate distribution of any two consecutive observations in a chain has Gaussian copula with ρ = 0.75 and marginal distributions both equal to one of F_1, F_2 and F_3. See Beare (2010, 2012) for more on copula-based Markov models and their dependence properties. For each Markov chain of length n we computed its empirical distribution function F_n. Bootstrap versions F*_n of F_n were generated by applying the so-called local bootstrap of Paparoditis and Politis (2002), which resamples observations in such a way as to preserve Markovian dependence. A tuning parameter used for the local bootstrap was chosen using a plug-in procedure based on an auxiliary first-order autoregression, as described by Paparoditis and Politis (2002, p. 315). See Beare and Seo (2014) for additional discussion of the local bootstrap and its application to copula-based Markov models.
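A stationary chain with a Gaussian copula between consecutive observations can be generated by passing a latent Gaussian AR(1) process through the standard normal CDF, which delivers uniform marginals, and then applying the desired quantile function. The sketch below is our own construction, not the paper's code; the standard exponential quantile function is a placeholder for the inverses of F_1, F_2 and F_3.

```python
import numpy as np
from statistics import NormalDist

def gaussian_copula_chain(n, rho, marginal_ppf, seed=0):
    """Stationary Markov chain: a latent Gaussian AR(1) with unit variance
    gives consecutive pairs a Gaussian copula with parameter rho;
    marginal_ppf maps Uniform(0, 1) to the desired invariant distribution."""
    rng = np.random.default_rng(seed)
    z = np.empty(n)
    z[0] = rng.standard_normal()
    for t in range(1, n):
        # AR(1) recursion keeps the N(0, 1) marginal and corr(z_t, z_{t-1}) = rho
        z[t] = rho * z[t - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal()
    u = np.array([NormalDist().cdf(v) for v in z])   # Uniform(0, 1) marginals
    return marginal_ppf(u)

# placeholder marginal: standard exponential quantile function
x = gaussian_copula_chain(100, 0.75, lambda u: -np.log1p(-u))
```

The resulting series is positively autocorrelated, which is what the local bootstrap must preserve when resampling.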
The results of our second set of simulations are displayed in Figure 5. As with the first set of simulations, coverage rates were computed over 10000 experimental replications, and confidence bands were computed using 1000 bootstrap samples. Coverage rates were computed for the rescaled, conservative and naive bootstrap confidence bands, all based on the local bootstrap. We did not compute coverage rates for the constrained bootstrap confidence bands or conservative fixed n confidence bands because their applicability is confined to the iid setting. We enumerate some brief comments on the displayed results.
1. The naive bootstrap confidence bands have coverage rates of 0.80 and 0.81 for F_1 and F_2 respectively, which is an improvement on the iid case, but nevertheless well below the nominal level of 0.9. For the strictly concave distribution F_3, where Theorem 3.1 indicates that we might expect the naive confidence bands to perform well, the coverage rate is only 0.83.
2. Despite the asymptotic conservatism established in Proposition 4.2, the conservative bootstrap confidence bands are in fact only conservative for F_1. For F_2 and F_3 their coverage rate is 0.87, a little below the nominal level.
3. The coverage rates for the rescaled bootstrap confidence bands are qualitatively similar to the iid case, being equal to the coverage rate of the naive bootstrap confidence bands at t_n = n^{-1/2} = 0.1 and then either rising (for F_1 and F_2) or staying relatively flat (for F_3) as t_n increases. They are dominated by the conservative bootstrap confidence bands for all values of t_n.
We conclude from the results displayed in Figure 5 that the conservative bootstrap confidence bands are the most reliable of the three confidence bands considered. These results are of course specific to the structure of dependence and method of resampling considered, but they at least point to the possibility of effective inference in a dependent setting using the conservative bootstrap confidence bands, and provide further evidence of poor performance of the rescaled and naive bootstrap confidence bands.

Proofs
Proof of Proposition 2.1. Weak contractivity of M yields, for any sequence h_n → h in C_0,

t_n^{-1} ‖M(θ + t_n h_n) − M(θ + t_n h)‖ ≤ ‖h_n − h‖ → 0.

It therefore suffices for us to demonstrate Gâteaux, rather than Hadamard, directional differentiability of M. That is, we wish to show that for each concave θ ∈ ℓ^∞, the maps M_{θ,n} : C_0 → ℓ^∞ and ζ_θ : C_0 → ℓ^∞ satisfy M_{θ,n}h → ζ_θ h in ℓ^∞ for each h ∈ C_0. Our strategy is to verify that M_{θ,n}h and ζ_θ h extend continuously to the compactified half-line R̄_+, that M_{θ,n}h(x) is nonincreasing in n, and that M_{θ,n}h → ζ_θ h pointwise; uniform convergence then follows from Dini's theorem. To show that M_{θ,n}h is continuous on R_+, we first observe that continuity of M_{θ,n}h on (0, ∞) follows from the fact that the LCMs M(θ) and M(θ + t_n h) are both concave on R_+, hence continuous on (0, ∞). It remains to establish continuity at zero. Since M(θ) is bounded from below and nondecreasing on (0, ∞) (it is bounded from below by −‖θ‖ and must therefore be nondecreasing due to its concavity), its right-limit at zero exists, and we denote it by M(θ)(0+).
To show that ζ_θ h is continuous on R_+, observe that for any x ∈ R_+ either the set T_{θ,x} is an open interval containing x, or it is the singleton {x}. In the former case ζ_θ h coincides with the concave function M_{T_{θ,x}} h over T_{θ,x}, so ζ_θ h is continuous at x. Suppose instead that T_{θ,x} = {x}, so that ζ_θ h(x) = h(x) (this is the relevant case when x = 0). Then for any x′ > x it must be the case that T_{θ,x′} ⊆ (x, ∞), and consequently

h(x′) ≤ ζ_θ h(x′) ≤ M_{(x,∞)} h(x′).    (5.4)

The function h is continuous and therefore has right-limit h(x) at x. We can show that the function M_{(x,∞)} h has right-limit h(x) at x in the same way that we showed (5.2). Letting x′ decrease to x, we deduce from (5.4) that ζ_θ h has right-limit h(x) at x, and is therefore right-continuous at x. A symmetric argument shows that ζ_θ h is left-continuous at x when x > 0. We conclude that ζ_θ h is continuous everywhere on R_+, as claimed.
Next we obtain continuous extensions of M_{θ,n}h and ζ_θ h to R̄_+. A bounded, nondecreasing real-valued function on R_+ (in particular, any concave member of ℓ^∞) is convergent to its supremum at infinity. Since M_{θ,n}h is t_n^{-1} times the difference of two such functions, M(θ + t_n h) and θ, we may extend it continuously to R̄_+ by defining

M_{θ,n}h(∞) = t_n^{-1} ( sup_{x∈R_+} (θ(x) + t_n h(x)) − sup_{x∈R_+} θ(x) ).    (5.5)

To obtain a continuous extension of ζ_θ h to R̄_+, first consider the case where a := inf{x ∈ R_+ : θ(x) = sup_{y∈R_+} θ(y)} < ∞, so that θ achieves its supremum on [a, ∞), or possibly (0, ∞) if a = 0. In this case we have

ζ_θ h(x) = M_{(a,∞)} h(x) for all x > a.

The function M_{(a,∞)} h is concave and bounded from below by h, which vanishes at infinity, so M_{(a,∞)} h must be nondecreasing. Also, M_{(a,∞)} h is bounded from above by ‖h‖. Since M_{(a,∞)} h is nondecreasing and bounded it must be convergent at infinity, and so we may continuously extend ζ_θ h to R̄_+ by defining

ζ_θ h(∞) = lim_{x→∞} M_{(a,∞)} h(x).

Next suppose that we instead have a = ∞. In this case θ does not achieve its supremum on R_+, and consequently T_{θ,x} is of finite length for each x ∈ R_+. Fix ε > 0. Since h ∈ C_0, there exists N < ∞ such that, for all x ≥ N, we have |h(x)| < ε. But T_{θ,N} is of finite length, so there must exist a finite N′ > N such that N is less than all elements of T_{θ,N′}. For all x ≥ N′ we have T_{θ,x} ⊆ (N, ∞), and therefore

|ζ_θ h(x)| = |M_{T_{θ,x}} h(x)| ≤ sup_{y>N} |h(y)| ≤ ε.

Since ε was arbitrary it follows that ζ_θ h vanishes at infinity. We thus obtain a continuous extension of ζ_θ h to R̄_+ in the case a = ∞ by defining ζ_θ h(∞) = 0. Before continuing further it will be useful to extend the domain and codomain of the LCM operator to accommodate sums of bounded functions and affine functions. Letting S denote the collection of maps from R_+ to R formed by summing a function in ℓ^∞ and an affine function, we define

Mf(x) = inf{g(x) : g ∈ S, g is concave, and f ≤ g},  f ∈ S, x ∈ R_+,

which evidently reduces to our earlier definition of Mf when f ∈ ℓ^∞. Consider functions f, g ∈ S with g affine, and write the latter as g(x) = σ + τx with σ, τ ∈ R.
The second part of the first sentence of Lemma 2.1 of Durot and Tocquet (2003) implies that M(f + g)(x) = M(f + σ)(x) + τx, and the first part of the same sentence implies that M(f + σ) = Mf + σ. This proves a useful property of M: we have M(f + g) = Mf + g for any f, g ∈ S with g affine.
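The identity M(f + g) = Mf + g for affine g can be checked numerically with a simple grid-based LCM. The sketch below is illustrative: `lcm_on_grid` is our own upper-hull construction on a finite grid, and the particular functions f and g are arbitrary choices.

```python
import numpy as np

def lcm_on_grid(x, y):
    """Least concave majorant of (x_i, y_i), evaluated at the grid x:
    retain the upper hull (decreasing slopes), then interpolate linearly."""
    y = np.asarray(y, dtype=float)
    hull = [0]
    for i in range(1, len(x)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            # drop i2 if it lies on or below the chord joining i1 and i
            if (y[i2] - y[i1]) * (x[i] - x[i1]) <= (y[i] - y[i1]) * (x[i2] - x[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])

x = np.linspace(0.0, 1.0, 201)
f = np.sin(3.0 * x) * np.minimum(x, 0.4)   # an arbitrary bounded function
g = 0.7 + 0.3 * x                          # affine: sigma + tau * x
lhs = lcm_on_grid(x, f + g)                # M(f + g)
rhs = lcm_on_grid(x, f) + g                # M(f) + g
# the two evaluations agree, reflecting M(f + g) = Mf + g for affine g
```

The agreement is exact up to floating point because adding an affine function leaves the upper-hull comparisons unchanged: the affine terms cancel from both sides of the chord test.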
We next show that M_{θ,n}h(x) is nonincreasing in n for each x ∈ R̄_+. First consider the case x ∈ (0, ∞); our arguments for this case are similar to those in the proof of Lemma 3.2 of Beare and Moon (2015). Since θ is concave, for each x ∈ (0, ∞) the supporting hyperplane theorem ensures the existence of an affine function ξ_x : R_+ → R such that ξ_x(x) = θ(x) and ξ_x ≥ θ. We now have

M_{θ,n}h(x) = t_n^{-1}(M(θ + t_n h)(x) − θ(x)) = t_n^{-1} M(t_n h + θ − ξ_x)(x) = M(h + t_n^{-1}(θ − ξ_x))(x),

where the second equality uses the property M(f + g) = Mf + g with g = ξ_x and the fact that ξ_x(x) = θ(x). Since M is monotone and ξ_x ≥ θ, this shows that M_{θ,n}h(x) is nonincreasing in n for each x ∈ (0, ∞). Next consider the case x = 0. Since Mf(0) = f(0) for every f ∈ S, we have M_{θ,n}h(0) = h(0), which is constant and hence trivially nonincreasing in n. Finally consider the case x = ∞. Letting ξ_∞ = sup_{x∈R_+} θ(x), we see from (5.5) that M_{θ,n}h(∞) = sup_{x∈R_+}[h(x) + t_n^{-1}(θ(x) − ξ_∞)], which is nonincreasing in n.
It remains to demonstrate that M_{θ,n}h(x) → ζ_θ h(x) for each x ∈ R̄_+. It is clear that ζ_θ h(0) = h(0), and we observed already that M_{θ,n}h(0) = h(0) for every n, so the desired convergence holds for x = 0. From (5.5), we also have

M_{θ,n}h(∞) = sup_{x∈R_+}[h(x) + t_n^{-1}(θ(x) − ξ_∞)] → sup_{x∈arg max θ} h(x),

with the stated convergence following from the Hadamard directional differentiability of the supremum operator; see e.g. Lemma B.1 of Fang and Santos (2015), which applies since R̄_+ is homeomorphic to [0, 1]. Note that we should interpret h(∞) = 0 and arg max θ = {∞} if θ does not attain its supremum on R_+. We now have M_{θ,n}h(∞) → sup_{x≥a} h(x) if θ attains its supremum on R_+, and M_{θ,n}h(∞) → 0 otherwise, so that in either case we have the desired convergence M_{θ,n}h(∞) → ζ_θ h(∞). It remains to establish convergence at points x ∈ (0, ∞). Our arguments here are again similar to those in the proof of Lemma 3.2 of Beare and Moon (2015). Let h_{θ,n,x} = h + t_n^{-1}(θ − ξ_x), and recall that M_{θ,n}h(x) = Mh_{θ,n,x}(x). A representation of the LCM in terms of a supremum of secant segments (cf. Carolan, 2002, Lem. 1) allows us to write
Mh_{θ,n,x}(x) = sup_{x′ ≤ x ≤ x″} [ (x″ − x)/(x″ − x′) · h_{θ,n,x}(x′) + (x − x′)/(x″ − x′) · h_{θ,n,x}(x″) ].    (5.6)

Since ξ_x is affine and ξ_x(x) = θ(x), the term in square brackets in (5.6) equals

(x″ − x)/(x″ − x′) · h(x′) + (x − x′)/(x″ − x′) · h(x″) + t_n^{-1} [ (x″ − x)/(x″ − x′) · (θ(x′) − ξ_x(x′)) + (x − x′)/(x″ − x′) · (θ(x″) − ξ_x(x″)) ].    (5.7)

The term in square brackets in (5.7) is equal to zero whenever x′, x″ ∈ T_{θ,x} and is less than zero otherwise. Moreover, letting T^δ_{θ,x} = {y ∈ R_+ : |y − z| < δ for some z ∈ T_{θ,x}}, for any δ > 0 there exists ε > 0 such that the term in square brackets in (5.7) is less than −ε whenever we do not have x′, x″ ∈ T^δ_{θ,x}. For n large enough that sup_{y∈R_+} h(y) − t_n^{-1}ε < inf_{y∈R_+} h(y), we may therefore exclude points outside of T^δ_{θ,x} from the suprema in (5.6), yielding

Mh_{θ,n,x}(x) = sup_{x′,x″∈T^δ_{θ,x}, x′ ≤ x ≤ x″} [ (x″ − x)/(x″ − x′) · h_{θ,n,x}(x′) + (x − x′)/(x″ − x′) · h_{θ,n,x}(x″) ] ≤ M_{T^δ_{θ,x}} h(x),

with the inequality following from the nonpositivity of the term in square brackets in (5.7). Since h_{θ,n,x} = h on T_{θ,x}, we also have M_{T_{θ,x}} h(x) ≤ Mh_{θ,n,x}(x). We have now shown that M_{T_{θ,x}} h(x) ≤ M_{θ,n}h(x) ≤ M_{T^δ_{θ,x}} h(x) for all n sufficiently large, implying that

M_{T_{θ,x}} h(x) ≤ lim inf_{n→∞} M_{θ,n}h(x) ≤ lim sup_{n→∞} M_{θ,n}h(x) ≤ M_{T^δ_{θ,x}} h(x).

Letting δ decrease to zero, so that M_{T^δ_{θ,x}} h(x) → M_{T_{θ,x}} h(x), we obtain M_{θ,n}h(x) → M_{T_{θ,x}} h(x) = ζ_θ h(x), as required.

Suppose instead that λ = 0, so that m = 0 on the support of F. In this case the weak convergence established in Proposition A.1 may be rewritten as

√n(S_n − S) ⇝ σW,    (A.5)

with S = L_0. Since L_0 is affine, Lemma 2.1 of Durot and Tocquet (2003) implies that the LCM of √n(L_0 − S_n) is √n(L_0 − Ŝ_n). From (A.5) and the continuous mapping theorem we therefore have

√n(L_0 − Ŝ_n) ⇝ σ MW.

Our results on the limiting behavior of S_n and Ŝ_n suggest a simple test of the null hypothesis that the nondecreasing function m is in fact flat. Under this null, it may be shown using the weak convergence (A.5), Lemma 2.1 of Durot and Tocquet (2003) and the continuous mapping theorem that

√n(Ŝ_n − S_n) ⇝ σ(MW − W).
On the other hand, if m is strictly increasing then S_n and Ŝ_n are asymptotically equivalent, and we have √n‖Ŝ_n − S_n‖ → 0 in probability. This suggests using a test statistic of the form T_n = √n‖Ŝ_n − S_n‖/σ̂_n, where σ̂_n is a consistent estimator of σ, and rejecting the null of flatness when T_n is smaller than the α-quantile of ‖MW − W‖. Such a procedure will reject with probability approaching α when m is flat, and with probability approaching one when m is strictly increasing.
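The testing procedure can be sketched as follows. The concrete choices here are our own illustrative assumptions about objects defined earlier in the appendix: S_n is taken to be the cumulative sum diagram of the responses, Ŝ_n its greatest convex minorant (computed as minus the LCM of the negated diagram), and σ̂_n the sample standard deviation; the critical value is simulated from the limit law ‖MW − W‖ using discretized Brownian motion.

```python
import numpy as np

def lcm_on_grid(x, y):
    """Least concave majorant of (x_i, y_i) via an upper-hull pass."""
    y = np.asarray(y, dtype=float)
    hull = [0]
    for i in range(1, len(x)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (y[i2] - y[i1]) * (x[i] - x[i1]) <= (y[i] - y[i1]) * (x[i2] - x[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])

def flatness_stat(y):
    """T_n = sqrt(n) * sup |Shat_n - S_n| / sigma_hat, with Shat_n the
    greatest convex minorant of the cumulative sum diagram S_n."""
    n = len(y)
    t = np.concatenate(([0.0], np.arange(1, n + 1) / n))
    S = np.concatenate(([0.0], np.cumsum(y) / n))
    S_hat = -lcm_on_grid(t, -S)            # greatest convex minorant
    return np.sqrt(n) * np.max(np.abs(S_hat - S)) / np.std(y, ddof=1)

def critical_value(alpha=0.05, grid=500, reps=500, seed=0):
    """alpha-quantile of ||MW - W|| simulated from discretized Brownian motion."""
    rng = np.random.default_rng(seed)
    t = np.concatenate(([0.0], np.arange(1, grid + 1) / grid))
    vals = np.empty(reps)
    for r in range(reps):
        W = np.concatenate(([0.0], np.cumsum(rng.standard_normal(grid)) / np.sqrt(grid)))
        vals[r] = np.max(lcm_on_grid(t, W) - W)   # MW - W is nonnegative
    return np.quantile(vals, alpha)

flat = flatness_stat(np.random.default_rng(2).standard_normal(200))  # flat m
steep = flatness_stat(np.arange(200.0))   # exactly convex diagram, so T_n = 0
c = critical_value()
# reject flatness when T_n < c; the convex (increasing m) case falls below c
```

For an exactly convex sum diagram the minorant coincides with the diagram, so the statistic vanishes and the test rejects; under a flat m the statistic follows the ‖MW − W‖ law in the limit and rejects with probability near α.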