Refinements of the Kiefer-Wolfowitz Theorem and a Test of Concavity

This paper studies estimation of and inference on a distribution function $F$ that is concave on the nonnegative half line and admits a density function $f$ with potentially unbounded support. When $F$ is strictly concave, we show that the supremum distance between the Grenander distribution estimator and the empirical distribution may still be of order $O(n^{-2/3}(\log n)^{2/3})$ almost surely, which reduces to an existing result of Kiefer and Wolfowitz when $f$ has bounded support. We further refine this result by allowing $F$ to be not strictly concave or even non-concave and instead requiring that it be "asymptotically" strictly concave. Building on these results, we then develop a test of concavity of $F$, or equivalently monotonicity of $f$, which is shown to have asymptotically pointwise level control under the entire null as well as consistency under any fixed alternative. In fact, we show that our test has local size control and nontrivial local power against any local alternatives that do not approach the null too fast, which may be of interest given the irregularity of the problem. Extensions to settings involving testing concavity/convexity/monotonicity are discussed.


Introduction
Let $\{X_i\}_{i=1}^n$ be a sample of i.i.d. nonnegative random variables with common distribution function $F$ that is known to be concave. The seminal work by Grenander (1956) establishes that the nonparametric maximum likelihood estimator of $F$ is given by the least concave majorant $\hat F_n$ of the empirical distribution function $F_n$. In this context, the present paper provides new results on two aspects of the problem.
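As a computational aside, the Grenander estimator can be obtained from the upper convex hull of the points $(X_{(i)}, i/n)$ together with the origin. Below is a minimal sketch in Python (our own illustration, not code from the paper; the function name and the grid-based evaluation are hypothetical):

```python
import numpy as np

def grenander_cdf(sample, grid):
    """Least concave majorant (LCM) of the empirical cdf, evaluated on `grid`.

    The LCM of the empirical cdf on [0, max(sample)] is the piecewise-linear
    function through the upper convex hull of the points
    (0, 0), (X_(1), 1/n), ..., (X_(n), 1).
    """
    n = len(sample)
    x = np.concatenate(([0.0], np.sort(sample)))
    y = np.arange(n + 1) / n
    # Monotone-chain scan for the upper hull: drop the last vertex whenever
    # it lies on or below the chord from its predecessor to the new point.
    hull = [(x[0], y[0])]
    for xi, yi in zip(x[1:], y[1:]):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (yi - y1) >= (xi - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append((xi, yi))
    hx, hy = zip(*hull)
    # linear interpolation along the hull vertices gives the concave majorant
    return np.interp(grid, hx, hy)
```

For instance, a sample $\{1, 1, 4\}$ yields a majorant with a single kink at $1$, and the majorant strictly exceeds the empirical cdf between the pooled jump points.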
The first set of results concerns the closeness between $\hat F_n$ and $F_n$ relative to the uniform norm $\|\cdot\|_\infty$. The celebrated Marshall's lemma indicates that $\hat F_n$ is $\sqrt n$-consistent for $F$ relative to $\|\cdot\|_\infty$: specifically, for each $n \in \mathbb N$, $\|\hat F_n - F\|_\infty \le \|F_n - F\|_\infty$ whenever $F$ is concave. Kiefer and Wolfowitz (1976) show that $\hat F_n$ in fact is asymptotically equivalent to $F_n$ by essentially establishing the following result:

Theorem 1.1. Let $\alpha_F \equiv \inf\{x \in \mathbb R : F(x) = 1\}$ and let $F$ be twice continuously differentiable on $[0, \alpha_F]$ with $\sup\{x \in \mathbb R : F(x) = 0\} = 0$. If $\alpha_F < \infty$ and the conditions (1.1) $\beta_F \equiv \inf_{0<x<\alpha_F}\{-f'(x)/f^2(x)\} > 0$ and (1.2) $\gamma_F < \infty$ hold, then $\|\hat F_n - F_n\|_\infty = O_{a.s.}(n^{-2/3}(\log n)^{2/3})$ as $n \to \infty$.
Since $F(0) = 0$ by assumption, it follows from $\alpha_F < \infty$ that the distribution associated with $F$ has bounded support. In turn, the condition (1.1) implies that $f'(x) < 0$ for all $x \in (0, \alpha_F)$ and hence that $F$ is strictly concave on the support $[0, \alpha_F]$. Finally, the condition (1.2) demands that the derivative $f'$ of the density be small relative to the minimum of $f$ on its support. This paper provides refinements of Theorem 1.1 along several dimensions. First, we consider situations where $f$ has potentially unbounded support, i.e., $\alpha_F = \infty$, an extension which appears to be nontrivial; see Remark 5 in Kiefer and Wolfowitz (1976, p.82) and also Kiefer and Wolfowitz (1977). In doing so, we necessarily have $\gamma_F = \infty$ (so that (1.2) is violated) because the density $f$ must approach zero along the right tail. Our second refinement, which may or may not be surprising, allows $F$ to be non-strictly concave or even non-concave but requires that it be "asymptotically" strictly concave as the sample size tends to infinity. In any case, the above asymptotic order result is obtained under weaker regularity conditions.
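To fix ideas, consider a worked check of condition (1.1) for a bounded-support example of our own (not taken from Kiefer and Wolfowitz): the distribution function $F(x) = (4x - x^2)/3$ on $[0, 1]$.

```latex
% A concrete check of condition (1.1) for F(x) = (4x - x^2)/3 on [0, 1]:
F(x) = \frac{4x - x^2}{3}, \qquad
f(x) = \frac{4 - 2x}{3} \in \Big[\tfrac{2}{3}, \tfrac{4}{3}\Big], \qquad
f'(x) = -\frac{2}{3}.
% Hence F(0) = 0, F(1) = 1, so alpha_F = 1 < infinity, and f' < 0 with
-\frac{f'(x)}{f^2(x)} = \frac{6}{(4 - 2x)^2} \;\ge\; \frac{6}{16} = \frac{3}{8} > 0
\quad \text{for all } x \in [0, 1],
% so the infimum in (1.1) is bounded away from zero. Since f is bounded away
% from zero and f' is bounded, a condition of type (1.2) poses no difficulty here.
```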
The second set of results concerns the inference aspect of the problem. In particular, we develop a test for the hypothesis that $F$ is concave or, equivalently, that $f$ is nonincreasing. The insight we exploit here is that the hypothesis can be equivalently formulated as $\|\bar F - F\|_p = 0$, where $\bar F$ is the least concave majorant of $F$, and $\|\cdot\|_p$ is an $L^p$ norm (with appropriate weighting) for $p \in [1, \infty]$. Thus, it is natural for us to employ the test statistic $\sqrt n\,\|\hat F_n - F_n\|_p$. There are several technical challenges, however, in establishing statistical properties of our test. First, as demonstrated in Beare and Moon (2015) and Beare and Fang (2017), the least concave majorant operator $F \mapsto \bar F$ is not fully differentiable if $F$ is concave but not strictly concave, and hence the conventional Delta method is inapplicable in establishing the asymptotic distribution of $\sqrt n\,\|\hat F_n - F_n\|_p$. However, the same authors have also shown that it is Hadamard directionally differentiable, a weaker notion of differentiability under which the Delta method is in fact preserved (Shapiro, 1991; Dümbgen, 1993). Second, as shown by Dümbgen (1993) and Fang and Santos (2019), even though the Delta method generalizes to deriving asymptotic distributions under Hadamard directional differentiability, it does not generalize to bootstrap consistency. This can be remedied by appealing to the rescaled bootstrap of Dümbgen (1993). Lastly, the weak limit of $\sqrt n\,\|\hat F_n - F_n\|_p$ is degenerate at zero when $F$ is strictly concave, a consequence of the Kiefer-Wolfowitz theorem. To construct a level $\alpha$ test, we leverage the asymptotic order results on $\|\hat F_n - F_n\|_\infty$ and propose a selection procedure we call the KW-selection that determines whether $F$ is strictly concave or not under the null. The idea is that the convergence rates of $\hat F_n$ are different in these two cases, and hence we may introduce a suitable tuning parameter to identify the truth.
Our test is in fact shown to have local size control which may be of interest since the weak limit of the statistic exhibits "discontinuity" with respect to the true distribution. Unfortunately, the local power will be poor under local (contiguous) alternatives that approach a strictly concave distribution.
The literature on testing concavity of the cumulative distribution function (or equivalently monotonicity of the density function) is surprisingly limited. In Section 3.3, we review related studies in this regard and compare them to our test. Overall, there are roughly two types of tests in the density context. The first type is concerned with special classes of alternatives. This includes the work of Woodroofe and Sun (1999), who test uniformity against monotonically increasing (but not uniform) densities, or equivalently uniformity against convex cdfs. Therefore, these tests are powerful in detecting these special classes by design, but may perform poorly for other alternatives, which is confirmed in our simulation studies. The second type remains agnostic about the particular nature of the alternative hypothesis, but relies on critical values from the asymptotically least favorable distributions. This includes the tests of Durot (2003) and Kulikov and Lopuhaä (2004). While the use of least favorable distributions is common practice in some other settings (Carolan and Tebbs, 2005; Delgado and Escanciano, 2012, 2016), the resulting tests may be too conservative and cause substantial power loss. Finally, in similar contexts, there are recent studies such as Beare and Shi (2019) and Seo (2018) that aim to improve power by bootstrap. However, due to the lack of Kiefer-Wolfowitz type results, it is unclear what happens to their tests when the limiting distributions of the test statistics are degenerate. We shall remove the dependence on least favorable distributions by bootstrap, and control the size of our test (both pointwise and locally) even when degeneracy occurs by utilizing the Kiefer-Wolfowitz theorem.

We now introduce some notation. We denote by $\mathbb R_+$ the set of nonnegative real numbers. For a sequence $\{Z_n\}_{n=1}^\infty$ of random variables, we write $Z_n = O_{a.s.}(a_n)$ for a deterministic sequence $\{a_n\}_{n=1}^\infty$ if $Z_n/a_n = O(1)$ as $n \to \infty$ almost surely.
Here and elsewhere in this paper, for a sequence $\{b_n\}$ of real numbers, $b_n = O(1)$ as $n \to \infty$ means that $\{b_n\}$ is bounded. For two sequences of real numbers $\{a_n\}$ and $\{b_n\}$, we write $a_n \lesssim b_n$ if $a_n/b_n = O(1)$ as $n \to \infty$, and write $a_n \gtrsim b_n$ if $b_n \lesssim a_n$. Moreover, the notation $a_n \asymp b_n$ means that $a_n/b_n = O(1)$ and $b_n/a_n = O(1)$ as $n \to \infty$. Analogously, for two functions $f, g: \mathbb R \to \mathbb R_+$, we write $f(\epsilon) \asymp g(\epsilon)$ as $\epsilon \downarrow 0$ if $f(\epsilon)/g(\epsilon) = O(1)$ and $g(\epsilon)/f(\epsilon) = O(1)$ as $\epsilon \downarrow 0$. For an arbitrary nonempty set $T$, $\ell^\infty(T)$ is the space of bounded real-valued functions on $T$ equipped with the uniform norm $\|\cdot\|_\infty$, i.e., $\|f\|_\infty \equiv \sup_{x \in T}|f(x)|$. The space $C_0(\mathbb R_+)$ is the family of all real-valued continuous functions on $\mathbb R_+$ that vanish at infinity. We denote by $\xrightarrow{L}$ weak convergence in the sense of Hoffmann-Jørgensen (van der Vaart and Wellner, 1996).
The remainder of the paper is structured as follows. Section 2 presents refinements of Theorem 1.1. Section 3 formally develops our test and then compares it with some existing monotonicity tests from the literature. Section 4 conducts simulation studies, with some of the results relegated to Appendix C. Section 5 concludes. All proofs are collected in Appendix A, while extensions of the test results to a general setup are discussed in Appendix B.

The Kiefer-Wolfowitz Theorems
This section presents refinements of the Kiefer-Wolfowitz theorem that allow the support of the density function f to be unbounded. Throughout, we think of the cdf F as a function on R + (rather than on R). We proceed with the following assumption.
Assumption 2.1. (i) $\{X_i\}_{i=1}^n$ is an i.i.d. sample with common distribution function $F$, and (ii) $F$ is strictly concave on $\mathbb R_+$.

Assumption 2.1 simply formalizes the i.i.d. setup and strict concavity of the distribution function $F$. To establish the refinements, we need to impose further regularity conditions on $F$. In particular, we need to introduce an analog of $\beta_F$ in (1.1). Specifically, for any $\epsilon \in [0, 1]$, we define
$$\bar\beta(\epsilon) \equiv \inf_{0 \le x \le F^{-1}(1-\epsilon)} \Big\{-\frac{f'(x)}{f^2(x)}\Big\},$$
where we suppress the dependence of $\bar\beta(\epsilon)$ on the underlying distribution for simplicity. Thus, $\bar\beta(\epsilon)$ may be viewed as a truncated version of $\beta_F$ in (1.1). As it turns out, while $\bar\beta(\epsilon)$ is allowed to approach zero as $\epsilon \to 0$, the speed at which it approaches zero determines the asymptotic order of $\|\hat F_n - F_n\|_\infty$. To formalize our discussions, we now impose:

Assumption 2.2. (i) $F$ is twice continuously differentiable on $\mathbb R_+$, (ii) $\bar\beta(\epsilon) \asymp \epsilon^\tau$ as $\epsilon \downarrow 0$ for some constant $\tau > -1$, and (iii) a condition ensuring that the interpolation error for $F$ can be controlled (see the discussion below).

Assumption 2.2(i) is the same smoothness condition required by Theorem 1.1. Assumption 2.2(ii) characterizes the exact rate at which $\bar\beta(\epsilon)$ is allowed to approach zero as $\epsilon \to 0$. The special case $\tau = 0$ implies $\inf_{0 \le x < \infty}\{-f'(x)/f^2(x)\} > 0$, which is exactly the condition (1.1) in the case of unbounded support. If $\tau > 0$, then $\bar\beta(\epsilon) \to 0$ as $\epsilon \to 0$; if $\tau \in (-1, 0)$, then $\bar\beta(\epsilon) \to \infty$ as $\epsilon \to 0$, which still fulfills (1.1). Assumption 2.2(iii) is a technical condition that serves to control the interpolation error for the cdf $F$. When the support of the density function $f$ is bounded, Assumption 2.2(iii) is automatically fulfilled because $F$ as a continuous function is bounded. Thus, our assumptions are strictly weaker than those imposed in Kiefer and Wolfowitz (1976).
Given Assumptions 2.1 and 2.2, we now present the first refinement.
Theorem 2.1. If Assumptions 2.1 and 2.2 hold, then, as $n \to \infty$,
$$\|\hat F_n - F_n\|_\infty = O_{a.s.}\big((n^{-1}\log n)^{2/(2\tau+3)}\big).$$
Theorem 2.1 delivers the asymptotic order of $\|\hat F_n - F_n\|_\infty$, which crucially depends on the parameter $\tau$. The slower $\bar\beta(\epsilon)$ approaches zero (as $\epsilon \to 0$), or the smaller $\tau$ is, the closer $\hat F_n$ and $F_n$ are (asymptotically). If $\bar\beta(\epsilon)$ approaches zero too fast, or $\tau$ is too large, then the asymptotic equivalence of $\hat F_n$ and $F_n$ is no longer implied. This happens precisely when $\tau \ge 1/2$, in which case the exponent $2/(2\tau+3)$ is no larger than $1/2$. The special case $\tau = 0$ leads to $\|\hat F_n - F_n\|_\infty = O_{a.s.}(((\log n)/n)^{2/3})$, which is exactly the result in Theorem 1.1. There are two major new ingredients in establishing the theorem, compared to the proof of Theorem 1.1. One is controlling the variation of $F_n$ over a small upper quantile region; see the treatment of the $T_{n,k_n+1}$ term and the $B_n$ term in the proof. The other is bounding the interpolation error for $F$ by the $L^{1/2}$-integral of $F''$ based on a result from approximation theory (Burchard, 1974). The latter allows us to dispense with the regularity condition $\gamma_F < \infty$ (Kiefer and Wolfowitz, 1976, p.82).
Our second refinement allows the common distribution function $F$ that the finite sample $X_1, \dots, X_n$ share to be not strictly concave or even non-concave. However, we do require that it vary with the sample size $n$ in such a way that it approaches (in a suitable sense to be specified) a strictly concave distribution function as $n \to \infty$, a notion we call "local to concavity." In order to formalize our local analysis, we denote by $\mathbf P$ the set of probability measures with support contained in $\mathbb R_+$ that possibly govern the data:
$$\mathbf P \equiv \{P \in \mathbf M : P(\mathbb R_+) = 1\},$$
where $\mathbf M$ is the set of all Borel probability measures on $\mathbb R$ that are dominated by the Lebesgue measure on $\mathbb R$. Further, we think of the distribution function $F \equiv F(P)$ as a map $F: \mathbf P \to \ell^\infty(\mathbb R_+)$ defined by $F(P)(x) \equiv P([0, x])$ for each $x \in \mathbb R_+$. We may now formalize the precise meaning of "local" as follows.
Definition 2.1. A function $t \mapsto P_t$ mapping a neighborhood $(-\epsilon, \epsilon)$ of zero into $\mathbf P$ is called a differentiable path passing through $P$ if, for $P_0 = P$ and some function $h: \mathbb R \to \mathbb R$,
$$\int \Big[\frac{dP_t^{1/2} - dP^{1/2}}{t} - \frac{1}{2}h\,dP^{1/2}\Big]^2 \to 0 \quad \text{as } t \to 0. \tag{2.3}$$
The notation $dP_t$ and $dP$ may be understood as the densities of $P_t$ and $P$ with respect to some dominating measure $\mu_t$ (for each $t$), though the integral in (2.3) does not depend on the choice of $\mu_t$ (van der Vaart, 1998, p.362). Loosely speaking, (2.3) implies that $P_t$ gets closer and closer to $P_0$ "on average" as $t \to 0$. The function $h$ is referred to as the score function of $P$ and satisfies $\int h\,dP = 0$ and $h \in L^2(P)$; see, for example, Lemma 25.14 in van der Vaart (1998). The term "score function" makes sense in view of the relation $h(x) = \frac{d}{dt}\log dP_t(x)\big|_{t=0}$ (which is the usual definition of a score function) under regularity conditions; see, for example, Lemma 7.6 in van der Vaart (1998).
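A standard example of such a path (our own illustration, following the familiar construction in van der Vaart (1998, Ch. 25)) perturbs $P$ multiplicatively by a bounded score:

```latex
% For bounded h with \int h \, dP = 0, define the path
dP_t(x) = \big(1 + t\,h(x)\big)\, dP(x), \qquad |t| < \|h\|_\infty^{-1}.
% Each P_t is a probability measure, since 1 + t h > 0 and
\int (1 + t h)\, dP = 1 + t \int h \, dP = 1,
% and the quadratic-mean condition (2.3) holds with score h, because
\frac{\sqrt{1 + t h} - 1}{t} \;\longrightarrow\; \frac{h}{2}
\quad \text{pointwise and in } L^2(P) \text{ as } t \to 0.
```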
If $\{P_t : |t| < \epsilon\}$ is a differentiable path, then we obtain by Example 5.3.1 in Bickel et al. (1998) that, as $t \to 0$,
$$\sup_{x \in \mathbb R_+}\Big|\frac{F_t(x) - F(x)}{t} - \dot F(h)(x)\Big| \to 0, \quad \text{where } \dot F(h)(x) \equiv \int_{[0,x]} h\,dP. \tag{2.4}$$
Intuitively, (2.4) means that the distribution function $F_t \equiv F(P_t)$ smoothly passes through $F \equiv F(P_0)$ at the same speed as the underlying probability measure $P_t$ passes through $P_0$. For a differentiable path $\{P_t\}$ satisfying (2.3) and $F_t \equiv F(P_t)$, we say $\{F_t\}$ is local to concavity if $F$ is concave, and is local to strict concavity if $F$ is strictly concave.
We next show that the conclusion in Theorem 1.1 is preserved under local to strict concavity. To this end, we impose the following assumption.
Assumption 2.3. (i) $\{X_i\}_{i=1}^n$ is an i.i.d. sample with common probability measure $P_{1/\sqrt n}$, and (ii) $\{P_{1/\sqrt n}\} \subset \mathbf P$ corresponds to a differentiable path $\{P_t\}$ passing through $P_0 \equiv P$ in the sense of Definition 2.1.
Theorem 2.2. If Assumptions 2.1(ii), 2.2 and 2.3 hold, then the conclusion of Theorem 2.1 continues to hold as $n \to \infty$.

Theorem 2.2 is not much deeper than Theorem 1.1, but it has implications for the problem of testing concavity, as we shall elaborate in Section 3.

Testing Concavity
In this section, we develop a test of concavity that controls size regardless of whether the concavity is strict or not. This is accomplished by building on the asymptotic order results established previously. We shall also compare our concavity test with existing monotonicity tests. For simplicity, in what follows we focus on the canonical case $\tau = 0$ in Assumption 2.2(ii). The general case makes no essential difference; see Appendix B for extensions.

The Test Statistic
First, we introduce our test statistic and derive its asymptotic distribution. To this end, we introduce the least concave majorant (LCM) operator following Beare and Fang (2017)-see also Beare and Moon (2015).
Definition 3.1. Given a convex set $T \subseteq \mathbb R_+$, the LCM over $T$ is the operator $M_T: \ell^\infty(T) \to \ell^\infty(T)$ that maps each $f \in \ell^\infty(T)$ to the least concave function $M_T f$ on $T$ satisfying $M_T f \ge f$; we write $M \equiv M_{\mathbb R_+}$.

The hypothesis of our interest can now be formulated as
$$H_0: \phi(F) = 0 \quad \text{versus} \quad H_1: \phi(F) > 0, \qquad \phi(F) \equiv \|MF - F\|_p,$$
with $g \in L^1(\mathbb R_+)$ some known positive weighting function. Here and throughout, we work with an arbitrarily fixed $p \in [1, \infty]$, and thus suppress the dependence of $\phi$ on $p$ for notational simplicity. Given the above formulation, we employ the statistic $\sqrt n\,\phi(F_n)$. To derive the weak limit of $\sqrt n\,\phi(F_n)$, note that under the null hypothesis, $MF = F$ and hence we may rewrite
$$\sqrt n\,\phi(F_n) = \|\sqrt n\,\{D F_n - D F\}\|_p,$$
where $D \equiv M - I$ with $I$ the identity operator on $\ell^\infty(\mathbb R_+)$. Thus, the asymptotic distribution of $\sqrt n\,\phi(F_n)$ would be an implication of the continuous mapping theorem and the Delta method, if we could show that $M$ (and hence $D$) is Hadamard differentiable (van der Vaart and Wellner, 1996, p.372-374). Unfortunately, as demonstrated by Beare and Moon (2015) and Beare and Fang (2017), the LCM operator fails to be fully differentiable and is only Hadamard directionally differentiable in general (Shapiro, 1990). Nonetheless, this type of directional differentiability suffices for applying a generalized version of the Delta method (Shapiro, 1991; Dümbgen, 1993).
Definition 3.2. Let $\mathbb D$ and $\mathbb E$ be normed spaces equipped with norms $\|\cdot\|_{\mathbb D}$ and $\|\cdot\|_{\mathbb E}$ respectively, and let $\phi: \mathbb D_\phi \subseteq \mathbb D \to \mathbb E$. The map $\phi$ is said to be Hadamard directionally differentiable at $\theta \in \mathbb D_\phi$ tangentially to a set $\mathbb D_0 \subseteq \mathbb D$ if there is a map $\phi'_\theta: \mathbb D_0 \to \mathbb E$ such that
$$\Big\|\frac{\phi(\theta + t_n h_n) - \phi(\theta)}{t_n} - \phi'_\theta(h)\Big\|_{\mathbb E} \to 0$$
for all sequences $\{h_n\} \subset \mathbb D$ and $\{t_n\} \subset \mathbb R_+$ such that $t_n \downarrow 0$, $h_n \to h \in \mathbb D_0$ as $n \to \infty$ and $\theta + t_n h_n \in \mathbb D_\phi$ for all $n$.
The defining feature of Hadamard directional differentiability is that, unlike Hadamard (full) differentiability, the directional derivative is in general nonlinear, though necessarily continuous and positively homogeneous of degree one (Shapiro, 1990). We refer the reader to Shapiro (1990) and a more recent review by Fang and Santos (2019) for additional discussions. Proposition 2.1 in Beare and Fang (2017) implies that $M$ is Hadamard directionally differentiable at any concave $F \in \ell^\infty(\mathbb R_+)$ tangentially to the set $C_0(\mathbb R_+)$, with the derivative $M'_F: C_0(\mathbb R_+) \to \ell^\infty(\mathbb R_+)$ given by: for any $h \in C_0(\mathbb R_+)$ and $x \in \mathbb R_+$,
$$M'_F h(x) = M_{T_{F,x}} h(x),$$
where $T_{F,x}$ denotes the maximal interval containing $x$ on which $F$ is affine. We emphasize that $M'_F$ is equal to the identity operator on $C_0(\mathbb R_+)$ if and only if $F$ is strictly concave, in which case $M$ is Hadamard differentiable at $F$ tangentially to $C_0(\mathbb R_+)$; see Proposition 2.2 in Beare and Fang (2017) for more details.
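As an illustration of the derivative characterized in Beare and Fang (2017) (the example is our own): take $F$ to be the uniform cdf on $[0, 1]$, which is affine on its entire support.

```latex
% For uniform F on [0,1] (affine on all of [0,1]), the directional derivative
% of the LCM operator applies the LCM to the perturbation itself:
M'_F h = M_{[0,1]} h \quad \text{on } [0,1],
% which is nonlinear in h. By contrast, when F is strictly concave every
% maximal affine piece is a singleton, so
M'_F h = h,
% i.e. M'_F reduces to the identity and M is fully Hadamard differentiable at F.
```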
We are now in a position to state the asymptotic distribution of our statistic √ nφ(F n ) by invoking the generalized Delta method.
Lemma 3.1. If Assumption 2.1(i) holds and $F$ is concave, then $\sqrt n\,\phi(F_n) \xrightarrow{L} \|M'_F(G) - G\|_p$, where $G(t) \equiv B(F(t))$ for a standard Brownian bridge $B$ on $[0, 1]$.

Lemma 3.1 establishes the weak limit of the test statistic $\sqrt n\,\phi(F_n)$ under the null. The process $t \mapsto G(t) \equiv B(F(t))$ is a zero mean Gaussian process with covariance function
$$\mathrm{Cov}\big(G(s), G(t)\big) = F(s \wedge t) - F(s)F(t) \quad \text{for all } s, t \in \mathbb R_+. \tag{3.3}$$
The limiting distribution in Lemma 3.1 is not pivotal because it depends on $F$ through the process $G$ and, critically, the derivative $M'_F$.
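For the case $p = \infty$ (the sup statistic used in the simulations of Section 4), the statistic can be computed exactly by scanning the left limits at the jump points of $F_n$, since the concave majorant lies above the step function and their gap is maximized just before a jump. A minimal sketch (our own code; the function name is hypothetical):

```python
import numpy as np

def sup_statistic(sample):
    """Compute sqrt(n) * ||hat F_n - F_n||_inf for the Grenander majorant.

    The gap between the piecewise-linear majorant hat F_n and the step
    function F_n is maximized at a left limit of some jump point X_(i),
    where F_n(X_(i)-) = (i - 1)/n.
    """
    n = len(sample)
    xs = np.concatenate(([0.0], np.sort(sample)))
    ys = np.arange(n + 1) / n
    # upper convex hull of the empirical-cdf graph (monotone-chain scan)
    hull = [(xs[0], ys[0])]
    for xi, yi in zip(xs[1:], ys[1:]):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (yi - y1) >= (xi - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append((xi, yi))
    hx, hy = zip(*hull)
    lcm_at_jumps = np.interp(xs[1:], hx, hy)   # hat F_n evaluated at each X_(i)
    gaps = lcm_at_jumps - np.arange(n) / n     # minus F_n(X_(i)-) = (i-1)/n
    return np.sqrt(n) * gaps.max()
```

For a sample lying exactly on a straight line the gap at every jump is $1/n$, so the statistic equals $n^{-1/2}$; pooling (and a larger statistic) occurs wherever the empirical cdf is locally convex.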

The Critical Values
Towards constructing critical values for our test, we next aim to estimate the law of $\|M'_F(G) - G\|_p$ through the bootstrap. There are, however, two complications involved, as we now elaborate.
First, when $F$ is non-strictly concave, in which case $M$ is only Hadamard directionally differentiable, the standard bootstrap (compare to (3.1)) is necessarily inconsistent. This is a consequence of Proposition 1 in Dümbgen (1993), which has been formalized by Theorem A.1 in Fang and Santos (2019). Second, when $F$ is strictly concave, the weak limit of $\sqrt n\,\phi(F_n)$ is degenerate at zero since $M'_F = I$, which is not surprising in view of Theorem 2.1. The first issue can be resolved by appealing to the rescaled bootstrap in Dümbgen (1993). As shall be seen shortly, the rescaled bootstrap is connected in a subtle way to the modified bootstrap in Fang and Santos (2019); namely, it amounts to composing a suitable derivative estimator (see (3.7)) with some bootstrap process. The second issue is more challenging, and we shall fix it by leveraging Theorems 2.1 and 2.2 as we describe now.
The key to the level control of our test is a selection procedure that determines whether the concavity of $F$ is strict or not, by exploiting the fact that the convergence rates in these two cases are different (see Theorem 2.1 and Lemma 3.1). Specifically, let $\{\kappa_n\}$ be a sequence of positive scalars such that $\kappa_n = o(1)$ and $(\log n)^{2/3} n^{-1/6}/\kappa_n = o(1)$ as $n \to \infty$. For example, we may take $\kappa_n = n^{-1/7}$ or $(\log n)^{-1}$. Define for each $n \in \mathbb N$,
$$\xi_n \equiv \frac{\sqrt n\,\phi(F_n)}{\kappa_n}. \tag{3.5}$$
Then $\xi_n \xrightarrow{p} 0$ if $F$ is strictly concave by Theorem 2.1, and $\xi_n \xrightarrow{p} \infty$ if $F$ is non-strictly concave by Lemma 3.1. Thus, if $\sqrt n\,\phi(F_n) \le \kappa_n$, we may take $F$ to be strictly concave, and non-strictly concave otherwise. We refer to such a selection procedure as the KW-selection. As a result, the asymptotic level of our test can be controlled in this case by choosing the critical value to be $\kappa_n$.
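In implementation, the KW-selection reduces to a simple branch on the magnitude of the statistic. A hedged sketch (the function name is ours, and the particular choice $\kappa_n = n^{-1/7}$ is one of the examples above, not a prescription):

```python
import numpy as np

def kw_critical_value(stat, boot_stats, n, alpha=0.05):
    """KW-selection for the critical value (a sketch).

    `stat` is the observed sqrt(n) * phi(F_n); `boot_stats` are bootstrap
    draws approximating the non-degenerate weak limit.
    """
    kappa_n = n ** (-1.0 / 7.0)  # tuning sequence; decays more slowly than
                                 # (log n)^(2/3) * n^(-1/6)
    if stat <= kappa_n:
        # the statistic vanishes at the Kiefer-Wolfowitz rate, so treat F as
        # strictly concave and use kappa_n itself as the critical value
        return kappa_n
    # otherwise use the bootstrap (1 - alpha) quantile
    return float(np.quantile(boot_stats, 1.0 - alpha))
```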
On the other hand, when $F$ is non-strictly concave, the nondegenerate limit $\|M'_F(G) - G\|_p$ in Lemma 3.1 can be estimated by composing a suitable estimator $\hat M_n$ of $M'_F$ with the nonparametric bootstrap estimator $G^*_n \equiv \sqrt n\,\{F^*_n - F_n\}$ of the Gaussian process $G$; i.e., we employ the bootstrap estimator
$$\|\hat M_n(G^*_n) - G^*_n\|_p,$$
where $F^*_n(x) \equiv n^{-1}\sum_{i=1}^n W_{ni}\,1\{X_i \le x\}$ with $(W_{n1}, \dots, W_{nn})$ a multinomial random vector with $n$ categories and probabilities $(1/n, \dots, 1/n)$, and $\hat M_n: \ell^\infty(\mathbb R_+) \to \ell^\infty(\mathbb R_+)$ is some appropriate derivative estimator. Such a bootstrap procedure is justified by Theorem 3.2 in Fang and Santos (2019). It is in fact precisely the rescaled bootstrap proposed by Dümbgen (1993), which amounts to estimating $M'_F$ by
$$\hat M_n(h) \equiv \frac{M(F_n + t_n h) - M(F_n)}{t_n} \tag{3.7}$$
for any $h \in \ell^\infty(\mathbb R_+)$, where $t_n \downarrow 0$ such that $t_n\sqrt n \to \infty$; see Fang and Santos (2019, p.390-1). In turn, we may choose the critical value in this case as
$$\hat c^*_{1-\alpha} \equiv \inf\{c \in \mathbb R : P_W(\|\hat M_n(G^*_n) - G^*_n\|_p \le c) \ge 1-\alpha\}$$
for $\alpha \in (0, 1)$, where $P_W$ denotes the probability with respect to the bootstrap weights $\{W_{ni}\}_{i=1}^n$ holding the data $\{X_i\}_{i=1}^n$ fixed. In practice, $\hat c^*_{1-\alpha}$ can be estimated by Monte Carlo simulation, drawing a large number $B$ of bootstrap samples from $F_n$ (with the data $\{X_i\}_{i=1}^n$ fixed). Now, for each fixed $\alpha \in (0, 1)$, we set the critical value of our test as
$$\hat c_{1-\alpha} \equiv \begin{cases} \kappa_n & \text{if } \sqrt n\,\phi(F_n) \le \kappa_n, \\ \hat c^*_{1-\alpha} & \text{otherwise,} \end{cases}$$
which provides pointwise asymptotic level control as confirmed by the following theorem. Before stating the theorem, we formalize requirements on $t_n$, $\kappa_n$, and the $(1-\alpha)$th quantile $c^*_{1-\alpha}$ of the cdf of the weak limit $\|M'_F(G) - G\|_p$.

Assumption 3.1. (i) $\{t_n\}$ is a sequence of positive scalars such that $t_n \downarrow 0$ and $t_n\sqrt n \to \infty$ as $n \to \infty$; (ii) $\{\kappa_n\}$ is a sequence of positive scalars such that $\kappa_n \downarrow 0$ and $(\log n)^{2/3} n^{-1/6}/\kappa_n \to 0$ as $n \to \infty$.
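Numerically, the derivative estimator in (3.7) only requires two LCM computations per bootstrap draw. A minimal sketch on a fixed evaluation grid (our own illustration; the grid-based evaluation is an approximation, and the names are hypothetical):

```python
import numpy as np

def lcm_on_grid(grid, values):
    """Least concave majorant of a function given on an increasing grid."""
    hull = [(grid[0], values[0])]
    for xi, yi in zip(grid[1:], values[1:]):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (yi - y1) >= (xi - x1) * (y2 - y1):
                hull.pop()  # last vertex lies on or below the new chord
            else:
                break
        hull.append((xi, yi))
    hx, hy = zip(*hull)
    return np.interp(grid, hx, hy)

def derivative_estimate(grid, F_n, h, t_n):
    """Rescaled-bootstrap derivative estimator of (3.7):
    hat M_n(h) = (M(F_n + t_n * h) - M(F_n)) / t_n, on a fixed grid."""
    return (lcm_on_grid(grid, F_n + t_n * h) - lcm_on_grid(grid, F_n)) / t_n
```

For a strictly concave `F_n` and smooth `h` the estimator returns `h` itself (the identity), while on an affine stretch of `F_n` it returns approximately the concave majorant of `h` over that stretch, in line with the form of $M'_F$.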
Assumption 3.2. The cdf of M F (G)−G p is continuous and strictly increasing at its (1 − α)th quantile c * 1−α when F is non-strictly concave.
Assumption 3.1 imposes the convergence rates on the tuning parameters $\kappa_n$ and $t_n$. We do not touch the challenging issue of optimal choices in this paper. Assumption 3.2 is a standard technical condition (often implicitly imposed) to ensure that a consistent bootstrap can produce consistent critical values; see Lemma 11.2.1 in Lehmann and Romano (2005). In verifying Assumption 3.2, we note that, by the proof of Theorem 3.2, the map $g \mapsto \phi'_F(g) \equiv \|M'_F g - g\|_p$ is subadditive and hence convex, since it is obviously positively homogeneous of degree one. Hence, by Theorem 11.1 in Davydov, Lifshits and Smorodina (1998), the distribution function $H$ of $\|M'_F(G) - G\|_p$ is absolutely continuous and strictly increasing on $(r_0, \infty)$, with $r_0 \equiv \inf\{r \in \mathbb R : H(r) > 0\}$. Therefore, Assumption 3.2 holds whenever $c^*_{1-\alpha} > r_0$. We are now in a position to state the first main result of this paper.

Theorem 3.1. Let Assumptions 2.1(i), 2.2 (with $\tau = 0$), 3.1 and 3.2 hold. Then, under $H_0$,
$$\limsup_{n\to\infty} P\big(\sqrt n\,\phi(F_n) > \hat c_{1-\alpha}\big) \le \alpha, \tag{3.8}$$
and under $H_1$, $P(\sqrt n\,\phi(F_n) > \hat c_{1-\alpha}) \to 1$ as $n \to \infty$.

Theorem 3.1 shows that our test is pointwise (in $P$) asymptotically level $\alpha$ and consistent under any fixed alternative. However, in view of the irregularity of the problem and as argued in the literature (Lehmann and Romano, 2005; Andrews and Guggenberger, 2010; Romano and Shaikh, 2012), pointwise asymptotics can be unreliable in "nonstandard" settings. It is therefore of interest to investigate the uniform or at least local properties of our test.
To study both local size control and local power of our proposed test, we follow van der Vaart (1998, p.384-6) and consider differentiable paths in $\mathbf P$ passing through $P_0 \equiv P$ that also belong to the set
$$\mathbf H \equiv \big\{\{P_t\} : \phi(F(P_t)) = 0 \text{ whenever } t \le 0, \text{ and } \phi(F(P_t)) > 0 \text{ for all } t > 0\big\}.$$
Thus, a differentiable path $\{P_t\}$ in $\mathbf H$ satisfies the null hypothesis whenever $t \le 0$ and the alternative for all $t > 0$. For a differentiable path $\{P_{\eta/\sqrt n}\}$ where $\eta \in \mathbb R$, we set $P^n \equiv \prod_{i=1}^n P$, $P^n_n \equiv \prod_{i=1}^n P_{\eta/\sqrt n}$ and define the power function of our test for sample size $n$ as
$$\pi_n(P_{\eta/\sqrt n}) \equiv P^n_n\big(\sqrt n\,\phi(F_n) > \hat c_{1-\alpha}\big).$$
Our next theorem establishes local properties of our test.
Theorem 3.2. Let $\{P_t\}$ be a differentiable path in $\mathbf H$ and let Assumptions 2.2, 2.3, 3.1, and 3.2 hold with $\tau = 0$ and $F \equiv F(P_0)$. Then it follows that:
1. For any $\eta \in \mathbb R$, (i) if $F$ is strictly concave, then $\liminf_{n\to\infty} \pi_n(P_{\eta/\sqrt n}) = 0$, and (ii) if $F$ is non-strictly concave, then $\pi_n(P_{\eta/\sqrt n})$ obeys the bounds in (3.11) and (3.12).

The first part of Theorem 3.2 delivers a lower bound of the local limiting power function. If the true distribution function $F_n \equiv F(P_{\eta/\sqrt n})$ is local to a strictly concave function $F$, then the limiting local power along the path $\{P_{1/\sqrt n}\}$ is zero. This is unfortunate and in fact is an implication of Theorem 2.2. The second part shows that the asymptotic null rejection rate along $\{P_{1/\sqrt n}\}$ is no larger than $\alpha$, establishing asymptotic local size control.
The first part of Theorem 3.2 might leave one with the impression that the test has poor local power against any sequence of local alternatives $\{P_{\eta/\sqrt n}\}$ (with $\eta > 0$). The third part is intended to correct such a misconception. Heuristically, it says that our test has nontrivial local power if the sequence $\{P_{\eta/\sqrt n}\}$ does not approach the null too fast. To appreciate the condition $\|M'_F(\dot F(h)) - \dot F(h)\|_p > 0$, we first note that it prevents $F$ from being strictly concave, in which case $M'_F$ is the identity operator and the limiting local power is zero by Part 1-(i). Second, by (2.4), we have that, as $n \to \infty$,
$$\sqrt n\,\{F(P_{\eta/\sqrt n}) - F\} \to \eta\,\dot F(h) \quad \text{in } \ell^\infty(\mathbb R_+).$$
In turn, this implies by Proposition 2.1 in Beare and Fang (2017) and $\phi(F) = 0$ (by the definition of $\mathbf H$) that, as $n \to \infty$,
$$\sqrt n\,\phi\big(F(P_{\eta/\sqrt n})\big) \to \|M'_F(\eta\,\dot F(h)) - \eta\,\dot F(h)\|_p. \tag{3.14}$$
By the positive homogeneity of degree one of $M'_F$ (as a Hadamard directional derivative), we thus have $\|M'_F(\eta\,\dot F(h)) - \eta\,\dot F(h)\|_p = \eta\,\|M'_F(\dot F(h)) - \dot F(h)\|_p$ for all $\eta > 0$. Therefore, $\Delta \equiv \|M'_F(\dot F(h)) - \dot F(h)\|_p > 0$ implies that the third part is concerned with local power along a nontrivial Pitman drift, a canonical device for local power analysis. When $\Delta = 0$ (as when $F$ is strictly concave), the speed at which $\{P_{\eta/\sqrt n}\}$ approaches the null is too fast in the sense that $\sqrt n\,\phi(F_n) \to 0$, thereby making the test hard to reject.

Comparisons with Existing Tests
We now compare our concavity test with some existing tests. While there is a rich literature on testing monotonicity in settings such as nonparametric regression (see Chetverikov (2019) for a recent study with a brief survey), it is somewhat surprising that results for the same problem in the density context seem rather limited. This is just one piece of evidence consistent with Jon Wellner's view that "the (shape-constrained) community also needs to do more work to provide inferential methods beyond estimation" (Banerjee and Samworth, 2018). Banerjee and Wellner (2001) propose likelihood ratio tests for equality at a fixed point, rather than the global shape as we consider, in the setting of nonparametric estimation of a monotone function; see also Banerjee (2005). More related to our setting is the work by Woodroofe and Sun (1999), who are concerned with testing uniformity versus a monotone (but not uniform) density. Translated to our setup, they study the simple null that $F$ is the uniform distribution against the composite alternative that $F$ is convex (but is not a uniform distribution). Therefore, the parameter spaces under the null and the alternative are smaller than the corresponding spaces in our setup. Woodroofe and Sun (1999) propose two tests, namely, the P-test and the D-test, based on a penalized nonparametric maximum likelihood estimator of the density. The P-test statistic is simply the penalized likelihood ratio, while the D-test statistic is $\sqrt n$ (with $n$ being the sample size) times the uniform distance between the penalized cdf estimator and the uniform cdf.
The simple null hypothesis in Woodroofe and Sun (1999), though it greatly simplifies the asymptotic analysis, may be restrictive from a practical point of view. This is a reflection of the fact that the pointwise asymptotic distributions of the test statistic under a composite null such as ours are highly nonstandard. In a nonparametric regression setting, Durot (2003) studies the composite null that the regression function is nonincreasing against the alternative that it is not. The test statistic proposed by Durot is a rescaled version of the uniform distance between a cumulative regression estimator and its least concave majorant. Durot (2003) shows that constant regression functions are asymptotically least favorable, based on which critical values are constructed. Kulikov and Lopuhaä (2004) adapt Durot (2003)'s test to the density setup based on the same critical values; see their Theorem 3.1, and also Proposition 3.1 in Kulikov and Lopuhaä (2008). Kulikov and Lopuhaä (2004) also propose a test based on the $L^p$-norm of the difference between the empirical cdf and its least concave majorant, also relying on critical values constructed from the least favorable distribution. We note that the test based on the statistic $T_n$ in Kulikov and Lopuhaä (2004) is invalid in the sense that it may over-reject under the null, as pointed out by the same authors and confirmed by their simulations.
Tests based on least favorable distributions, though they control the size in a very simple way, may be too conservative and result in power loss. This includes the tests of Durot (2003) and Kulikov and Lopuhaä (2004). On the other hand, the P-test and the D-test explicitly take into account the alternative hypothesis of nondecreasing densities. Therefore, they are powerful in detecting nondecreasing densities, but may perform poorly when the underlying distribution is neither concave nor convex. Our simulations confirm these predictions. Finally, we reiterate that the need for bootstrap in testing concavity, as well as for the tuning parameters $\kappa_n$ and $t_n$, is in line with the nonstandard nature of the problem, rather than a special attribute of our inferential framework. The implementation of our bootstrap is as simple as calculating the test statistic, or as simple as computing the least concave majorant.

Simulation Experiments
We next evaluate the finite sample performance of our test through Monte Carlo simulations. Special attention shall be paid to the tuning parameters $\kappa_n$ (for the KW-selection) and $t_n$ (for the rescaled bootstrap). For this, we consider $\kappa_n \in \{n^{-1/7}, n^{-1/8}, (\log n)^{-1}\}$ and $t_n \in \{n^{-1/\ell} : \ell = 3, 4, \dots, 7\}$. We run two sets of simulations in order to compare with the standard bootstrap and some existing monotonicity tests. Throughout, we let the significance level be 5%, and all results are based on 1000 Monte Carlo simulations and 500 bootstrap repetitions for each simulation replication (to implement our test).
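To fix ideas, a skeleton of one simulation replication of the sup-norm test might look as follows (a sketch under our reading of the procedure; the tuning choices $t_n = n^{-1/5}$ and $\kappa_n = n^{-1/7}$ are taken from the grids above, and all function names are ours):

```python
import numpy as np

def lcm_vals(x, y):
    """Least concave majorant of the points (x, y), evaluated back on x."""
    hull = [(x[0], y[0])]
    for xi, yi in zip(x[1:], y[1:]):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (yi - y1) >= (xi - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append((xi, yi))
    hx, hy = zip(*hull)
    return np.interp(x, hx, hy)

def one_replication(sample, n_boot=500, alpha=0.05, rng=None):
    """One replication of the sup-norm concavity test (illustrative sketch)."""
    rng = np.random.default_rng(0) if rng is None else rng
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    order = np.argsort(sample)
    xs = np.concatenate(([0.0], sample[order]))
    Fn = np.arange(n + 1) / n                      # empirical cdf at jump points
    t_n, kappa_n = n ** (-0.2), n ** (-1.0 / 7.0)  # illustrative tuning choices
    lcm = lcm_vals(xs, Fn)
    stat = np.sqrt(n) * np.max(lcm[1:] - Fn[:-1])  # gaps at left limits of jumps
    if stat <= kappa_n:
        return False          # KW-selection: treat F as strictly concave
    boot = np.empty(n_boot)
    for b in range(n_boot):
        W = rng.multinomial(n, np.full(n, 1.0 / n))      # bootstrap weights
        F_star = np.concatenate(([0.0], np.cumsum(W[order]) / n))
        G_star = np.sqrt(n) * (F_star - Fn)              # bootstrap process
        # rescaled bootstrap: derivative estimate composed with G_star
        M_hat = (lcm_vals(xs, Fn + t_n * G_star) - lcm) / t_n
        boot[b] = np.max(np.abs(M_hat - G_star))
    return bool(stat > np.quantile(boot, 1.0 - alpha))
```

A full study would repeat this over 1000 samples per design and tabulate rejection rates across the $(t_n, \kappa_n)$ grid.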
First, we consider the distributions defined by densities $f_1$ and $f_2$ supported on $\mathbb R_+$, given in (4.1) and (4.2). The function $f_1$ is strictly decreasing, while $f_2$ is weakly decreasing though strictly decreasing on the region $[9/10, \infty)$. Hence, the corresponding distribution functions, denoted $F_1$ and $F_2$, are strictly concave and non-strictly concave respectively. To examine local behaviors of our test, we construct differentiable paths passing through $F_1$ and $F_2$, given in (4.3) and (4.4) for $t \in \mathbb R_+$ and $x \in \mathbb R_+$. By Example 3.2.1 and Proposition 2.1.1 in Bickel et al. (1998), these two paths are indeed differentiable (under the null). We then generate i.i.d. samples $\{X_i\}_{i=1}^n$ from (4.1), (4.2), (4.3) and (4.4), with $n \in \{100, 200, 300, 400, 1000\}$. For simplicity, we only consider the sup statistic $\sqrt n\,\|\hat F_n - F_n\|_\infty$, and report results based on the rescaled and the standard bootstrap. Tables 1, 2, 3 and 4 present the numerical results. Inconsistency of the standard bootstrap is prominently evidenced in the last columns of Tables 2 and 4, which are based on non-strictly concave functions (so that the operator $M$ is only Hadamard directionally differentiable at these distribution functions). On the contrary, our test alleviates the size distortion both pointwise and locally, though the results vary with the choices of the tuning parameters. Overall, the choices $t_n \in \{n^{-1/3}, n^{-1/4}\}$ tend to be too small, while $t_n \in \{n^{-1/5}, n^{-1/6}, n^{-1/7}\}$ are adequate in that they control the size well. Tables 1 and 3 also show that, when the distribution function is strictly concave, our test by design controls the size both pointwise and locally, though the rejection rates are often close to zero, which is consistent with Theorem 3.2.
Moreover, as in Tables 1 and 3, the standard bootstrap also exhibits size control, due to the facts that the LCM operator is fully (Hadamard) differentiable at strictly concave distribution functions and that the distance between the Grenander distribution estimator and the empirical cdf converges faster than $\sqrt n$ by the Kiefer-Wolfowitz theorems. Finally, we note that the results are quite insensitive to the choice of $\kappa_n$.

Next, we compare our test with some existing tests. As before, the null hypothesis is that the true cdf is concave (or, equivalently, the true pdf is nonincreasing), while the alternative is that it is not. We shall consider the D and the P tests in Woodroofe and Sun (1999), the Kolmogorov-Smirnov type test in Durot (2003) and Kulikov and Lopuhaä (2004), labelled the KS test, and the test based on the $R_n$ statistic with $k = 1$ in Kulikov and Lopuhaä (2004), labelled the KL test. We design two families of distributions. The first family consists of distributions $F_\lambda$ defined in (4.5) for $\lambda \in \mathbb R$, where $F_\lambda$ is understood to be the uniform cdf if $\lambda = 0$. Clearly, $F_\lambda$ is concave and hence in the null whenever $\lambda \le 0$, but is convex and hence in the alternative if $\lambda > 0$; see Figure 1-(a). Thus, this is a setup well suited for the D test and the P test proposed by Woodroofe and Sun (1999), because the alternative consists of only convex cdfs (or nondecreasing pdfs). As mentioned in Section 3.3, the D and the P tests may perform poorly in detecting distributions that are neither convex nor concave. To verify this point numerically, we consider a second family of distributions (belonging to the alternative) defined in (4.6) for $\lambda \in (0, 1)$. As shown in Figure 1-(b), $F_\lambda$ defined in (4.6) is neither concave nor convex on $[0, 1]$ for any $\lambda \in (0, 1)$, though it is concave on $[0, \lambda]$ and on $[\lambda, 1]$.
Tables 5 and 6 record the results for $n \in \{100, 400\}$ based on the design (4.5). Across the choices of $(t_n, \kappa_n)$, when $F_\lambda$ is close to being uniform, our test tends to over-reject (under the null) with smaller $t_n$, especially with $t_n = n^{-1/3}$ and in small samples. All other tests control size well in large samples, but tend to under-reject in small samples. In terms of power, the KS test appears to be the least powerful against local alternatives (i.e., distributions with positive but small $\lambda$). The D test overall appears to be the most powerful one, though its power is comparable to ours and to that of the P and the KL tests. For a fair comparison, we also compute the size-adjusted power by subtracting the empirical rejection rates for $\lambda = 0$ (the least favorable case) from the corresponding rejection rates for $\lambda > 0$. The results are recorded in Table 7, from which we see that the power patterns more or less remain. The results for $n \in \{200, 300, 600, 800, 1000\}$ share similar patterns; see Appendix C.
Tables 8 and 9 record the rejection rates for $n \in \{100, 400\}$ based on the design (4.6). As expected, the D and the P tests, which are designed to be powerful against convex alternatives, now perform poorly and in most cases have power less than 5%; this is the case even with large sample sizes. The KS and the KL tests appear more powerful than the D and the P tests, but their performance is strictly dominated by that of our test across all sample sizes and pairs $(t_n, \kappa_n)$, with substantial power discrepancies in many cases. We remind the reader that, in practice, one can rarely rule out a priori distributions such as those in (4.6). Thus, the simulations suggest that our test is more robust in the sense that it is powerful against a larger class of alternatives. The results for $n \in \{200, 300, 600, 800, 1000\}$ share similar patterns; see Appendix C.
Admittedly, the simulation results reinforce the importance of the choice of $t_n$ for our test; the choice of $\kappa_n$ does not appear as important. Overall, they provide comforting numerical evidence that our test is a useful addition to the literature. The associated computational cost is reasonable: with an Intel Xeon CPU E5-1650 v4 3.60GHz, a single replication based on 1000 samples from the design in (4.5) and 500 bootstrap repetitions is completed in 15 seconds.

Conclusion
This paper studies estimation of and inference on a cumulative distribution function under a concavity constraint. The estimation results generalize the asymptotic order results in Kiefer and Wolfowitz (1976) to settings with unbounded support and contiguous distributions. These results are not only of interest in their own right, but are also useful for conducting inference on concavity. In particular, in conjunction with the rescaled bootstrap of Dümbgen (1993) and the recent work of Fang and Santos (2019), they allow us to construct a test that controls size, pointwise and locally, even when the distribution function is strictly concave (in which case the test statistic is asymptotically degenerate). Through simulation studies, we find that our test is powerful against a larger class of alternatives than some existing tests, such as those in Woodroofe and Sun (1999) that are designed to work against a specific class of alternatives, or those in Durot (2003) and Kulikov and Lopuhaä (2004) that rely on critical values from the least favorable asymptotic distributions. In Appendix B, we show how the testing results may be extended to a general setup that includes regression and hazard rate problems as special cases.
In our limited simulation studies, we also find that the choice of $\kappa_n$ for the KW selection does not appear as important as the choice of $t_n$ for implementing the rescaled bootstrap. A formal investigation of both choices is, however, beyond the scope of this paper and is therefore left for future study.

Appendix A: Proofs of Main Results
Proof of Theorem 2.1: Our proof is a mixture of the original one in Kiefer and Wolfowitz (1976) and the "modernized" one given in Balabdaoui and Wellner (2007), but takes into account that the support of the density function $f$ is potentially unbounded. Following Kiefer and Wolfowitz (1976), we first define interpolating processes for $F$ and $F_n$. Let $k_n \uparrow \infty$ be a sequence of positive integers (to be chosen). Define $a_j \equiv a_j^{(k_n)} \equiv F^{-1}(j/k_n)$ for $j = 1, \ldots, k_n - 1$, and $a_{k_n} \equiv a_{k_n}^{(k_n)} \equiv F^{-1}(1 - 1/(2k_n))$. Moreover, set $a_0 = 0$ and $a_{k_n+1} \equiv a_{k_n+1}^{(k_n)} = \infty$. Let $L^{(k_n)}$ be the function that is piecewise linear on $[a_{j-1}, a_j]$ for $j = 1, \ldots, k_n$, satisfying $L^{(k_n)}(a_j) = F(a_j)$, with $L^{(k_n)}(x) = F(x)$ for $x \in [a_{k_n}, a_{k_n+1}]$. Thus, in Figure 2, $L^{(k_n)}$ is the function that connects the points $(a_j, F(a_j))$, $j = 0, \ldots, k_n$ (where $F(a_j) = j/k_n$ for $j \le k_n - 1$), on the interval $[0, a_{k_n}]$ in a piecewise linear way, and is identical to $F$ on the tail $[a_{k_n}, a_{k_n+1}]$. Clearly, $L^{(k_n)}$ inherits concavity from $F$. Using the notation in (de Boor, 2001, p.31), we may write $L^{(k_n)} = I_2 F$, even though $L^{(k_n)}$ is nonlinear on $[a_{k_n}, a_{k_n+1}]$.
Next, we define $L^{(k_n)}_n$ by interpolating $F_n$ over the grid: for $x \in [a_j, a_{j+1}]$ and $j = 0, \ldots, k_n$, $L^{(k_n)}_n$ interpolates the values of $F_n$ at the grid points. Similarly, write $L^{(k_n)}_n = I_2 F_n$ and note that $L^{(k_n)}_n(a_j) = F_n(a_j)$ for $j = 0, \ldots, k_n + 1$. Thus, $L^{(k_n)}_n$ and $F_n$ intersect at the grid points $\{a_j\}$, just as $L^{(k_n)}$ and $F$ do. Moreover, since $L^{(k_n)}_n$ is an affine transformation of $L^{(k_n)}$ on each of the intervals $[a_j, a_{j+1}]$, the former inherits the piecewise linearity of the latter on $[a_0, a_{k_n}]$. Heuristically, one may thus think of $L^{(k_n)}_n$ as a finite sample analog of $L^{(k_n)}$. We next show that $L^{(k_n)}_n$ also inherits concavity from $L^{(k_n)}$ for a suitable choice of $k_n$.
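As a concrete numerical illustration of the two interpolants just described, the following sketch builds the grid $a_j = F^{-1}(j/k)$ with $a_k = F^{-1}(1 - 1/(2k))$ and interpolates $F$ and $F_n$ over it. The function name is ours, $F$ is assumed known only for illustration (standard exponential in the usage below), and the tail piece on $[a_k, \infty)$, where $L^{(k)} = F$, is omitted:

```python
import numpy as np

def kw_interpolants(x, k, F, Finv):
    """Piecewise-linear interpolants L^(k) (of F) and L_n^(k) (of the
    empirical cdf F_n) over the grid a_j = F^{-1}(j/k), j = 1, ..., k-1,
    with a_0 = 0 and a_k = F^{-1}(1 - 1/(2k))."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    grid = np.concatenate(
        ([0.0], Finv(np.arange(1, k) / k), [Finv(1.0 - 1.0 / (2 * k))]))
    Fn_at_grid = np.searchsorted(x, grid, side="right") / n  # F_n(a_j)
    L = lambda t: np.interp(t, grid, F(grid))       # connects (a_j, F(a_j))
    Ln = lambda t: np.interp(t, grid, Fn_at_grid)   # agrees with F_n at the a_j
    return grid, L, Ln
```

By construction $L^{(k)}(a_j) = F(a_j) = j/k$ for $j \le k-1$, and $L^{(k)}_n$ matches $F_n$ at every grid point.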
Step 1: For $A_n \equiv \{L^{(k_n)}_n$ is concave on $\mathbb{R}_+\}$, show that (A.1) holds. In fact, we shall show that $A_n$ occurs for all $n$ large with probability one. We follow the arguments given by Kiefer and Wolfowitz (1976, p.81-2). For $j = 1, \ldots, k_n$, define the increments $\Delta a_j \equiv a_j - a_{j-1}$ and $T_{n,j} \equiv F_n(a_j) - F_n(a_{j-1})$. Note that the definition of $\Delta a_j$ is in line with Balabdaoui and Wellner (2007) but differs slightly from Kiefer and Wolfowitz (1976). Since $L^{(k_n)}_n$ is piecewise linear, its slope on each such $[a_{j-1}, a_j]$ is precisely $T_{n,j}/\Delta a_j$. In turn, it follows that establishing "asymptotic concavity" amounts to showing a string of inequalities comparing consecutive slopes; we remind the reader that $L^{(k_n)}_n$ is concave on the interval $[a_{k_n}, a_{k_n+1}]$. To formalize the idea, write, for $j = 1, \ldots, k_n - 1$, the events $B_{n,j}$ comparing the slopes of $L^{(k_n)}_n$ on $[a_{j-1}, a_j]$ and on $[a_j, a_{j+1}]$, with $B_{n,k_n}$ defined analogously; here $\{F(a_{k_n+1}) - F(a_{k_n})\}/f(a_{k_n})$ enters the slope of $L^{(k_n)}_n$ at $a_{k_n}$. For all $j = 1, \ldots, k_n$, if $B_{n,j}$ holds, then $L^{(k_n)}_n$ stays concave as it moves from $[a_{j-1}, a_j]$ into $[a_j, a_{j+1}]$. Therefore, we obtain the representation $A_n = \cap_{j=1}^{k_n} B_{n,j}$ in (A.2). We next consider the sets $\{B_{n,j}\}$ one by one. The goal is to establish analytically tractable sufficient conditions for each $B_{n,j}$ to occur. Then we shall bound $P(A_n)$ from below by computing $\sum_{j=1}^{k_n} P(B_{n,j}^c)$ (in view of (A.2)), which in turn may be controlled by utilizing those sufficient conditions.
For $j = 1, \ldots, k_n - 2$: it is simple to verify that $B_{n,j}$ occurs if the two conditions in (A.3) hold, where the second inequality holds whenever $0 \le \delta_n \le 1/3$. Next, we show that $\Delta a_{j+1}/\Delta a_j \ge 1 + 3\delta_n$ holds for all $n$ large under an additional rate restriction on $\delta_n$ (to be specified); we shall control the probability of $|T_{n,i} - \frac{1}{k_n}| \le \frac{\delta_n}{k_n}$ shortly. By Assumption 2.2(i), we may employ Taylor's theorem to conclude that the expansion (A.5) holds for some $\xi_{j+1} \in [a_j, a_{j+1}]$. Since $F^{-1}$ is convex (as $F$ is concave and strictly increasing) and $\Delta a_j \equiv F^{-1}(\frac{j}{k_n}) - F^{-1}(\frac{j-1}{k_n})$ by definition, the ratio $\Delta a_j / k_n^{-1}$ is no larger than the slope of $F^{-1}$ at the right end point of the interval $[(j-1)/k_n, j/k_n]$ (i.e., at $j/k_n$). Mathematically, this means that (A.6) holds. Combining the previous results (A.5) and (A.6) yields (A.7), where the second inequality holds since $f(\xi_{j+1}) \le f(a_j)$, because $f$ is decreasing and $\xi_{j+1} \ge a_j$. By setting $\delta_n \le \frac{1}{6 k_n \bar\beta(1/k_n)}$, the second part of display (A.3) thus holds. Note that $\bar\beta(\epsilon)$ is clearly nondecreasing in $\epsilon$ (because the domain of the infimum in (2.1) shrinks as $\epsilon$ increases), so $\delta_n \le 1/3$ for all large $n$. Moreover, by Assumption 2.2(ii), $\bar\beta(\epsilon) > 0$ for all small $\epsilon > 0$, and so $\delta_n > 0$ for large $n$.
The above three facts together imply that there is a unique $\nu = \nu_n$ satisfying equation (A.21) for all $n$ large, and that $\nu_n$ diverges to infinity as $n \to \infty$. To evaluate the order of $\nu_n$, we note that, by Assumption 2.2(ii), (A.21) implies $\nu_n \asymp (n/\log n)^{1/(2\tau+3)}$. Since $k_n$ is the integer part of $\nu_n$, we have $k_n \asymp (n/\log n)^{1/(2\tau+3)}$, which diverges to infinity as $n \uparrow \infty$ since $2\tau + 3 > 0$ by Assumption 2.2(ii). Since $\sum_n P(A_n^c) \lesssim \sum_n n^{-2} < \infty$, the first Borel-Cantelli lemma implies that, with probability one, $L^{(k_n)}_n$ is concave (i.e., $A_n$ occurs) for all $n$ large.
Step 2: Relate $\|\hat F_n - F_n\|_\infty$ to the interpolating processes $L^{(k_n)}$ and $L^{(k_n)}_n$. For notational simplicity, define the error terms $B_n$, $D_n$ and $E_n$. By the triangle inequality and Lemma 2.2 in Durot and Tocquet (2003), we obtain the bound (A.24). Invoking the triangle inequality once again, we in turn have from (A.24) the bound (A.25), where the second inequality exploits $F(x) = L^{(k_n)}(x)$ for all $x \in [a_{k_n}, a_{k_n+1}]$.
Step 3: Conclude by controlling the orders of $B_n$, $D_n$ and $E_n$ separately. To this end, we introduce the modulus of continuity, following de Boor (2001, p.25): for a generic function $g$, $\omega(g; \delta) \equiv \sup\{|g(x) - g(y)| : |x - y| \le \delta\}$. The treatment of $D_n$ follows closely Balabdaoui and Wellner (2007), which we include here for completeness. Note that $D_n$ can be bounded as displayed, where the inequality is due to result (18) in de Boor (2001, p.36) with $|a| \equiv \max_{1 \le j \le k}(a_j - a_{j-1})$, and $\overset{d}{=}$ denotes equality in distribution and is due to Theorem 1.1.2 in Shorack and Wellner (1986), with $\mathbb{U}_n \equiv \sqrt{n}\{\mathbb{G}_n - I\}$ the empirical process of $n$ i.i.d. Uniform(0,1) random variables as defined in Shorack and Wellner (1986). By Theorem 14.2.1 in Shorack and Wellner (1986), we have $\lim_{n\to\infty} \omega(\mathbb{U}_n; p_n)/\sqrt{2 p_n \log(1/p_n)} = 1$ a.s. (A.27), provided $p_n \to 0$, $np_n \to \infty$, $\log(1/p_n)/\log(\log n) \to \infty$, and $\log(1/p_n)/(np_n) \to 0$. To see that these rate conditions on $p_n \equiv k_n^{-1}$ are indeed met, we first note that $p_n \asymp ((\log n)/n)^{1/(2\tau+3)}$. Thus, trivially, $p_n \to 0$, and $np_n \to \infty$ as $n \to \infty$ since $\frac{2\tau+2}{2\tau+3} > 0$ by Assumption 2.2(ii). The third rate condition is also trivial because $\log(1/p_n)$ is of order $\log(n) - \log(\log n)$, which diverges to infinity faster than $\log(\log n)$. For the fourth condition, $\log(1/p_n)$ grows at a logarithmic rate while $np_n$ grows at a polynomial rate; note $\tau + 1 > 0$ and so $\frac{2\tau+2}{2\tau+3} > 0$. Now, by simple algebra, we may conclude from result (A.27) the order of $D_n$. We now turn to $B_n$, which is important in taking care of the unbounded support. Heuristically, $B_n$ measures the interpolation error for $F_n$ on the (unbounded) right tail $[a_{k_n}, a_{k_n+1}]$ of the support.
Proof of Theorem 2.2: Let {k n } be the same sequence that is determined by (A.22). Define the grid points {a j } kn+1 j=0 and the interpolation processes L (kn) and L (kn) n in the same fashion as before (based on F ). The proof analogously consists of three steps as in the proof of Theorem 2.1. One of the key observations is that Theorem 1.1.2 in Shorack and Wellner (1986) holds for any i.i.d. {X i } n i=1 whose common distribution function is allowed to depend on n.
Step 1: For A n ≡ {L (kn) n is concave on R + } (as before), show that (A.1) holds. This is accomplished by the same arguments as before. In particular, the arguments preceding (A.20) are purely algebraic, Theorem 1.1.2 in Shorack and Wellner (1986) (for result (A.20)) holds for any n, and Lemma 5.2 in Balabdaoui and Wellner (2007) and Inequality 10.3.2 in Shorack and Wellner (1986) are concerned with the uniform empirical process whose relations to F n only depend on the i.i.d. assumption (for each n) as characterized by Theorem 1.1.2 in Shorack and Wellner (1986). This last observation means that, again, the common cdf shared by X 1 , . . . , X n can depend on n, as in Assumption 2.3.
Step 2: For B n , D n and E n defined as before, show that (A.25) holds. This is a consequence of the triangle inequality and the fact F (x) = L (kn) (x) for all x ∈ [a kn , a kn+1 ]. The arguments are thus the same as before.
Step 3: Control B n , D n and E n . The treatment of B n is the same as before, because Theorem 1.1.2 in Shorack and Wellner (1986) is still applicable which then allows one to invoke Inequality 10.3.2 in Shorack and Wellner (1986). The treatment of E n is merely a consequence of the same approximation result, namely, Theorem 1 in Burchard (1974). It thus remains to consider D n .
and hence, in view of (A.35), the claimed bound follows. This completes the proof of the theorem.
Proof of Lemma 3.1: By Assumption 2.1(i), we obtain from Example 19.6 in van der Vaart (1998) that $\sqrt{n}\{F_n - F\}$ converges weakly in $\ell^\infty(\mathbb{R}_+)$ to a limit $G$; see van der Vaart (1998, p.266) for a brief discussion of the limit $G$. Define a map $\psi: \ell^\infty(\mathbb{R}_+) \to \ell^\infty(\mathbb{R}_+)$ by $\psi(g) = Mg - g$ for any $g \in \ell^\infty(\mathbb{R}_+)$. Then by definition $\phi(g) = \|\psi(g)\|_p$. Since $F$ is concave under $H_0$, Proposition 2.1 in Beare and Fang (2017) implies that $\psi$ is Hadamard directionally differentiable tangentially to $C_0(\mathbb{R}_+)$, with derivative $\psi_F'(h) = M_F'h - h$ for all $h \in C_0(\mathbb{R}_+)$. Moreover, $\|\cdot\|_p$ is Hadamard directionally differentiable at $0$, since $(\|0 + t_n h_n\|_p - \|0\|_p)/t_n = \|h_n\|_p \to \|h\|_p$ whenever $t_n \downarrow 0$ and $h_n \to h$ in $\ell^\infty(\mathbb{R}_+)$. By the chain rule as in Proposition 3.6 in Shapiro (1990), we may therefore conclude that $\phi: \ell^\infty(\mathbb{R}_+) \to \mathbb{R}$ is Hadamard directionally differentiable tangentially to $C_0(\mathbb{R}_+)$, with derivative $\phi_F': C_0(\mathbb{R}_+) \to \mathbb{R}$ given by $\phi_F'(h) = \|M_F'h - h\|_p$. This, in view of Theorem 2.1 in Fang and Santos (2019), implies that, under $H_0$ (and so $\phi(F) = 0$), $\sqrt{n}\phi(F_n)$ converges weakly to $\|M_F'G - G\|_p$. This completes the proof of the lemma.
Proof of Theorem 3.1: Consider first the case where $F$ is strictly concave on $\mathbb{R}_+$. By Assumption 3.1(ii), $n^{-2/3}(\log n)^{2/3}/(n^{-1/2}\kappa_n) \to 0$ as $n \to \infty$. In turn, we may thus conclude by Theorem 2.1 that the rejection probability vanishes in this case.

Now suppose that $F$ is non-strictly concave. In this case, consistency of the rescaled bootstrap is justified by Proposition 2 in Dümbgen (1993). Alternatively, the bootstrap consistency follows by Lemma S.3.8 (for consistency of $\hat M_n'$ as a derivative estimator of $M_F'$) and Theorem 3.2 in Fang and Santos (2019). In any case, $\hat c^*_{1-\alpha} \overset{p}{\to} c^*_{1-\alpha}$ by Assumption 3.2 and Corollary 3.2 in Fang and Santos (2015). Therefore, it follows that the asymptotic rejection probability is bounded by $\alpha$, where the second inequality follows from Slutsky's lemma and the portmanteau theorem, and the last step is due to Assumption 3.2. This proves the first claim of the theorem.

We now turn to the second part of the theorem and suppose that $F$ is not concave. First, we show that $\hat c^*_{1-\alpha} = O_p(1)$ regardless of whether $F$ is concave or not, where $O_p(1)$ means "bounded in probability." To this end, note that, by Lemma 2.2 in Durot and Tocquet (2003), the bound (A.47) holds for any $h \in \ell^\infty(\mathbb{R}_+)$, where $\lesssim$ means "smaller than or equal to up to a universal constant." Here, the constant depends only on the (known) weighting function $g$ if $p \in [1, \infty)$. In turn, result (A.47) implies (A.48). Moreover, Theorem 3.1 in Giné and Zinn (1990) and Proposition 10.7 in Kosorok (2008) yield $\|\sqrt{n}\{F_n^* - F_n\}\|_\infty = O_p(1)$ outer almost surely. This, together with result (A.48) and Lemma 3 in Cheng and Huang (2010), implies (A.50) unconditionally. Fix $\epsilon \in (0,1)$. Then we may choose some $M > 0$ such that (A.51) holds. By the definition of $\hat c^*_{1-\alpha}$, if $\hat c^*_{1-\alpha} > M$, then the conditional bootstrap law must place mass at least $\alpha$ above $M$. We may then conclude from the implication of (A.51) that $\hat c^*_{1-\alpha} > M$ occurs with probability at most $\epsilon$ for all $n$ large, where the second inequality follows by Markov's inequality, the third inequality by Lemma 1.2.6 in van der Vaart and Wellner (1996), and the last step by result (A.50).
This shows that $\hat c^*_{1-\alpha} = O_p(1)$ and hence $\hat c_{1-\alpha} = \max\{\kappa_n, \hat c^*_{1-\alpha}\} = O_p(1)$, in view of $\kappa_n = o(1)$ by Assumption 3.1(ii).
Next, by the triangle inequality and Lemma 2.2 in Durot and Tocquet (2003), we obtain (A.53), where $C > 0$ is a constant depending on the weighting function $g$. Note that if $F$ is non-concave, then $MF$ and $F$ must differ from each other on a set of positive Lebesgue measure, implying that $\|MF - F\|_p > 0$. Together with $\|\sqrt{n}\{F_n - F\}\|_\infty = O_p(1)$, we may then conclude from result (A.53) that $\sqrt{n}\phi(F_n) \to \infty$ in probability. Combining this with $\hat c_{1-\alpha} = O_p(1)$, it follows that the rejection probability tends to one. This completes the proof of the second claim of the theorem.
Proof of Theorem 3.2: First, for the convenience of the reader, we introduce the concept of contiguity. For each $n \in \mathbb{N}$, let $P_n$ and $Q_n$ be two generic probability measures on some measurable space $(\Omega_n, \mathcal{A}_n)$. Then the sequence $\{Q_n\}$ is said to be contiguous with respect to $\{P_n\}$ if, for any statistic $T_n: \Omega_n \to \mathbb{R}$, one has $T_n \overset{p}{\to} 0$ under $Q_n$ whenever $T_n \overset{p}{\to} 0$ under $P_n$. Heuristically, this means that any statistic $T_n$ that is asymptotically negligible under $P_n$ remains so under $Q_n$. We say that $\{Q_n\}$ and $\{P_n\}$ are mutually contiguous if $\{Q_n\}$ is contiguous with respect to $\{P_n\}$ and vice versa. We refer the reader to Chapter 6 in van der Vaart (1998), in particular Lemma 6.4 there, for more details.
Since $\{P_t\}$ is a differentiable path, it follows by Theorem 12.2.3 and Corollary 12.3.1 in Lehmann and Romano (2005) that $P_n^n$ and $P^n$ are mutually contiguous. By Theorem 2.1 in Fang and Santos (2019) and $\phi(F(P)) = 0$, we then have the convergence (A.55). In turn, we obtain by (A.55) and mutual contiguity of $P_n^n$ and $P^n$ that (A.56) holds under $P_n^n$. As is well known in the literature (see, for example, Bickel et al. (1998, p.192)), $F_n$ is a regular estimator of $F$, so that (A.57) holds under $P_n^n$. By result (2.4), we also have, as a deterministic result, (A.58). By Slutsky's theorem, we obtain from results (A.57) and (A.58) that (A.59) holds under $P_n^n$. Combining (A.56) and (A.59), together with the continuous mapping theorem and Slutsky's theorem, we may thus conclude that (A.60) holds under $P_n^n$. For the first part of the theorem, we consider the two cases separately.
Case I: $F$ is strictly concave. By Assumption 3.1(ii), $n^{-2/3}(\log n)^{2/3}/(n^{-1/2}\kappa_n) \to 0$ as $n \to \infty$. By Assumption 3.1(ii) and Theorem 2.2, we thus have the bound (A.61).

Case II: $F$ is non-strictly concave. Mutual contiguity of $P_n^n$ and $P^n$, together with $\hat c^*_{1-\alpha} \overset{p}{\to} c^*_{1-\alpha}$ under $P^n$ from the proof of Theorem 3.1, implies that, whenever $F$ is non-strictly concave, (A.62) holds. Since $c^*_{1-\alpha}$ is nonnegative as a quantile of $\|M_F'(G) - G\|_p$ and $\kappa_n = o(1)$ by Assumption 3.1(ii), we obtain from (A.62) and the continuous mapping theorem that $\hat c_{1-\alpha} \overset{p}{\to} c^*_{1-\alpha}$ under $P_n^n$. This, together with result (A.60), Slutsky's theorem and the portmanteau theorem, allows us to conclude that $\liminf_{n\to\infty} \pi_n(P_{\eta/\sqrt{n}})$ satisfies (A.63). If, in addition, the cdf of $\|M_F'(G + \eta\dot F(h)) - \{G + \eta\dot F(h)\}\|_p$ is continuous at $c^*_{1-\alpha}$, then (A.63) holds with equality, again by the portmanteau theorem. This completes the proof of the first part of the theorem.
As for the second part, we again consider the following two cases separately.
Case I: $F$ is strictly concave. By Assumption 3.1(ii) and Theorem 2.2, we have $\limsup_{n\to\infty} \pi_n(P_{\eta/\sqrt{n}}) \le \alpha$. This is in fact also implied by result (A.61), which holds for any $\eta \in \mathbb{R}$.
Case II: $F$ is non-strictly concave. We follow the arguments of Theorem 3.3 in Fang and Santos (2019). To begin with, note that result (A.60), $\hat c_{1-\alpha} \overset{p}{\to} c^*_{1-\alpha}$ under $P_n^n$ (as argued previously), Slutsky's theorem and the portmanteau theorem together imply an upper bound on $\limsup_{n\to\infty} \pi_n(P_{\eta/\sqrt{n}})$. Next, we show that the map $g \mapsto \phi_F'(g) = \|M_F'(g) - g\|_p$ (defined on $\ell^\infty(\mathbb{R}_+)$) is subadditive. Note that (i) $M_F'(g) - g \ge 0$ for all $g \in \ell^\infty(\mathbb{R}_+)$, because $M_F'(g)$ majorizes $g$, and (ii) $\ell^\infty(\mathbb{R}_+)$ can be identified with a subspace of $L^p(\mathbb{R}_+)$ equipped with the norm $\|\cdot\|_p$ if $p \in [1, \infty)$; of course, if $p = \infty$, then $\ell^\infty(\mathbb{R}_+)$ is a subspace of itself. Thus, we may view $\phi_F'$ as a real-valued map defined on some normed Riesz space.
Since $\phi_F'$ is defined in terms of the $\|\cdot\|_p$ norm, such an embedding (of the domain $\ell^\infty(\mathbb{R}_+)$ of $\phi_F'$ into $L^p(\mathbb{R}_+)$) allows us to rewrite $\phi_F'$ according to Lemma A.1. Specifically, let $\ell^\infty(\mathbb{R}_+)^*_p$ be the topological dual space of $\ell^\infty(\mathbb{R}_+)$ viewed as a subspace of $L^p(\mathbb{R}_+)$ if $p \in [1, \infty)$, and be the topological dual space of $\ell^\infty(\mathbb{R}_+)$ itself if $p = \infty$, and let $S_p^+$ consist of those $\varphi \in \ell^\infty(\mathbb{R}_+)^*_p$ with $\|\varphi\|_{\mathrm{op}} \le 1$ and $\varphi \ge 0$, where $\|\cdot\|_{\mathrm{op}}$ is the operator norm, and $\varphi \ge 0$ means that $\varphi(g) \ge 0$ whenever $g(x) \ge 0$ for all $x \in \mathbb{R}_+$. By Lemma A.1 we now have, for any $g_1, g_2 \in \ell^\infty(\mathbb{R}_+)$, the identity (A.66), where the second equality follows by linearity of each $\varphi \in S_p^+$. Since $M_F'$ is convex by Proposition 2.2 in Beare and Fang (2017) and positively homogeneous of degree one as a Hadamard directional derivative (Shapiro, 1990), it follows that $M_F'$ must be subadditive, i.e., $M_F'(g_1 + g_2) \le M_F'(g_1) + M_F'(g_2)$. (A.67) Since each $\varphi \in S_p^+$ is linear and satisfies $\varphi(g) \ge 0$ whenever $g(x) \ge 0$ for all $x \in \mathbb{R}_+$, we obtain from (A.67) that (A.68) holds for any $\varphi \in S_p^+$. Combining results (A.66) and (A.68) then leads to (A.69), where the equality again follows by linearity of each $\varphi \in S_p^+$. By Lemma A.1, we in turn have from (A.69) that $\phi_F'$ is subadditive: $\phi_F'(g_1 + g_2) \le \phi_F'(g_1) + \phi_F'(g_2)$ for any $g_1, g_2 \in \ell^\infty(\mathbb{R}_+)$. In light of result (A.66), we immediately obtain the desired comparison. Moreover, since $\{P_{\eta/\sqrt{n}}\}$ is a differentiable path under the null when $\eta \le 0$, it follows that $\phi(F(P_{\eta/\sqrt{n}})) = \phi(F(P)) = 0$ for all $n \in \mathbb{N}$ and $\eta \le 0$. Hence, the claimed size control follows, where we also exploited that $c^*_{1-\alpha}$ is a continuity point of the cdf of $\|M_F'(G) - G\|_p$ by Assumption 3.2. This completes the proof of the second part.
Finally, for the third part of the theorem, we note by Lemma 2.2 in Durot and Tocquet (2003) that (A.73) holds, where the second inequality follows from the definition of $\|\cdot\|_p$. Fix $M > 0$. By the Dvoretzky-Kiefer-Wolfowitz inequality (see, for example, Theorem 11.2.18 in Lehmann and Romano (2005)), we have the bound (A.74), which can be made arbitrarily small by choosing a sufficiently large $M$. Combining results (A.73) and (A.74), we thus obtain (A.75). Moreover, by Proposition 2.1 in Beare and Fang (2017) and $\phi(F) = 0$ (by the definition of $H$), we obtain (A.76) as $n \to \infty$. By the positive homogeneity of degree one of $M_F'$ (as a Hadamard directional derivative), we thus have from (A.76) that (A.77) holds for all $\eta > 0$. It follows from (A.75) and (A.77) that, under $P_n^n$, the test statistic diverges as in (A.78). Having derived the order of the test statistic as in (A.78), we next evaluate the order of the critical value. For this, note that, by result (A.48), the bootstrap statistic is bounded in probability; by the Dvoretzky-Kiefer-Wolfowitz inequality, the corresponding tail bound holds for any $M > 0$.

Our final lemma entails some knowledge of Riesz spaces. For the convenience of the reader, we introduce some relevant concepts in this regard. Just as normed spaces generalize to abstract spaces the operations of vector addition and scalar multiplication in Euclidean spaces, Riesz spaces generalize the binary relations $\ge$ and $\le$. Let $E$ be a set. We say that $E$ is partially ordered if there exists a binary relation $\ge$ that is transitive (i.e., $x \ge z$ whenever $x \ge y$ and $y \ge z$), reflexive (i.e., $x \ge x$) and antisymmetric (i.e., $x \ge y$ and $y \ge x$ imply $x = y$). The notation $y \le x$ is equivalent to $x \ge y$. If $x \ge y$ but $x \ne y$, we also write $x > y$ or, equivalently, $y < x$. Let $E$ be partially ordered by $\ge$. An element $z$ is the supremum of a pair $x, y \in E$, denoted $x \vee y$, if $z \ge x$, $z \ge y$, and $z \le u$ whenever $u \ge x$ and $u \ge y$. The infimum of $x, y \in E$, denoted $x \wedge y$, is defined similarly. Not every pair $x, y$ in $E$ admits a supremum or infimum.
But when every pair in $E$ does admit both a supremum and an infimum, we call $E$ a lattice.
If $E$ is a vector space, then one may hope that the partial order $\ge$ is "compatible" with the algebraic structure of $E$. Concretely, whenever $x \ge y$, one should have (i) $x + z \ge y + z$ for all $z \in E$, and (ii) $\alpha x \ge \alpha y$ for all $\alpha \in \mathbb{R}_+$. In this case, we call $E$ a (partially) ordered vector space. A partially ordered vector space that is also a lattice is called a Riesz space. For example, for $L^p(\mathbb{R}_+)$ with $p \in [1, \infty)$, we may define the partial order $\ge$ by: $f \ge g$ if and only if $f(x) \ge g(x)$ for (almost) all $x \in \mathbb{R}_+$. Then both $L^p(\mathbb{R}_+)$ and $\ell^\infty(\mathbb{R}_+)$ are Riesz spaces. Given the notion of supremum, we may define the absolute value of $x \in E$ by $|x| \equiv x^+ + x^-$, where $x^+ \equiv x \vee 0$ and $x^- \equiv (-x) \vee 0$. If $E$ is a Riesz space and is equipped with a norm $\|\cdot\|$ satisfying the property that $\|x\| \le \|y\|$ whenever $|x| \le |y|$, then $E$ is called a normed Riesz space, and the norm $\|\cdot\|$ is called a lattice norm. We refer the reader to Aliprantis and Border (2006) for more discussion. With these concepts in hand, we now have:

Lemma A.1. Let $E$ be a normed Riesz space with partial order $\ge$ and lattice norm $\|\cdot\|_E$. Then for any $x \ge 0$, we have $\|x\|_E = \sup\{\varphi(x) : \varphi \in E^*, \|\varphi\|_{\mathrm{op}} \le 1, \varphi \ge 0\}$, where $E^*$ is the topological dual space of $E$, $\|\cdot\|_{\mathrm{op}}$ is the operator norm, and $\varphi \ge 0$ means that $\varphi(y) \ge 0$ whenever $y \ge 0$.
Proof: The conclusion trivially holds if $x = 0$. Suppose that $x > 0$. Then Theorem 39.3 in Zaanen (1997) implies that there exists some $\varphi^* \in E^*$ with $\|\varphi^*\|_{\mathrm{op}} = 1$ and $\varphi^* \ge 0$ such that $\varphi^*(x) = \|x\|_E$. This in turn implies that the supremum is no smaller than $\|x\|_E$. On the other hand, for any $\varphi \in E^*$ with $\|\varphi\|_{\mathrm{op}} \le 1$ and $\varphi \ge 0$, we have $\varphi(x) \le \|\varphi\|_{\mathrm{op}}\|x\|_E \le \|x\|_E$. This completes the proof of the lemma.

Appendix B: Some Extensions
In this section, we discuss extensions of our test results to a general setup that includes inference on density, regression and hazard functions as special cases.
To this end, we shall assume that the data $\{X_i\}_{i=1}^n$ are i.i.d. with common distribution $P_{\eta/\sqrt{n}} \in \mathbf{P}$, which we also denote by $P_n$ with some abuse of notation. Here, $\mathbf{P}$ denotes the model as before. This allows us to consider the standard i.i.d. setup (i.e., $\eta = 0$) as well as the setup for local analysis in a unified way. In turn, wherever appropriate, we identify the parameter of interest $\theta$ as a map $\theta: \mathbf{P} \to \ell^\infty([a,b])$, where $a, b \in \mathbb{R}$ are known with $a < b$ throughout. The dependence of $\theta(P)$ on $P$ is sometimes suppressed for simplicity. While it may be possible to consider settings with unbounded support such as $[0,\infty)$, we confine our attention to the bounded case for simplicity; it is not essential to what we intend to convey. Our objective is to test the null that $\theta(P_{\eta/\sqrt{n}})$ is concave versus otherwise. In the main text, $g$ is the density function and $\theta$ is the cdf. Below we formalize the hazard and the regression examples.
Example B.1 (Monotone Hazard Rate). Let $X \in \mathbb{R}_+$ be a random variable with cdf $F$ that admits a pdf $f$ with respect to the Lebesgue measure. Then the hazard rate $\lambda$ at $u$ is defined by $\lambda(u) \equiv f(u)/(1 - F(u))$. Heuristically, $\lambda(u)$ measures the instantaneous failure rate at time $u$, given that the subject has functioned until $u$ (Groeneboom and Jongbloed, 2014). A leading example of shape restrictions is that $\lambda$ be monotonically increasing on an interval $[0,b]$ for some $b \in (0,\infty)$. This can be inferred from convexity of the cumulative hazard function $\Lambda$ defined by $\Lambda(t) \equiv \int_0^t \{1 - F(u-)\}^{-1}\,dF(u)$ for any $t \in [0,b]$, where $F(u-)$ is the left-continuous version of $F$. Here, $F(u-) = F(u)$ for all $u \in \mathbb{R}_+$ since $F$ is absolutely continuous, so $\Lambda(t) = \int_0^t \lambda(u)\,du$. The definition above, however, allows us to construct estimators of $\Lambda$ in a straightforward way. In this example, $\theta = -\Lambda$.
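Replacing $F$ in the definition of $\Lambda$ by the empirical cdf yields the straightforward plug-in estimator alluded to above. A minimal sketch (the function name is ours; censoring is ignored):

```python
import numpy as np

def cumulative_hazard(x, t):
    """Plug-in estimator of Lambda(t) = int_0^t dF(u) / (1 - F(u-)),
    with F replaced by the empirical cdf F_n.  The jump of size 1/n at the
    i-th order statistic contributes (1/n)/(1 - (i-1)/n) = 1/(n - i + 1)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    increments = 1.0 / (n - np.arange(n))   # 1/n, 1/(n-1), ..., 1
    return float(increments[x <= t].sum())
```

The increments grow as the risk set shrinks, which is why the estimator jumps by $1$ at the largest observation.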
Example B.2 (Monotone Regression). Let $\{X_i \equiv (Y_i, Z_i)\}_{i=1}^n$ be i.i.d. bivariate random vectors generated according to the regression model $Y_i = m(Z_i) + \epsilon_i$. One may also view $\Lambda$ as a population analog of the cumulative sum diagram; see, for example, Beare and Fang (2017, p.3865). Here, $\theta = -\Lambda$.
Since a function $\theta: [a,b] \to \mathbb{R}$ is convex if and only if $-\theta$ is concave, both Examples B.1 and B.2 fall within the scope of our framework. For a coherent treatment, we shall thus maintain the null hypothesis that $\theta$ is concave versus otherwise, and revisit Examples B.1 and B.2 after the general theory is presented. Towards this end, let $\hat\theta_n$ be an unconstrained estimator of $\theta$. Note that $X_i$ is a generic notation for the $i$-th observation; e.g., $X_i = (Y_i, Z_i)$ in Example B.2. In turn, we then employ the test statistic $\sqrt{n}\phi(\hat\theta_n)$, where the norm defining $\phi$ involves some positive weighting function $g \in L^1([a,b])$. Throughout, we study the test functional $\phi$ with an arbitrarily fixed $p \in [1,\infty]$.
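A numerical sketch of the test functional $\phi(\theta) = \|M\theta - \theta\|_p$ on a grid, with the LCM computed as an upper convex hull and the weighted $L^p$ norm approximated by the trapezoid rule (the function name and discretization are ours, not the paper's implementation):

```python
import numpy as np

def phi(grid, theta, p, g):
    """phi(theta) = weighted L^p distance between theta and its least
    concave majorant M theta, both discretized over `grid` (numpy arrays)."""
    hull = [0]                                # upper convex hull of (grid, theta)
    for i in range(1, len(grid)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (theta[b] - theta[a]) * (grid[i] - grid[b]) <= \
               (theta[i] - theta[b]) * (grid[b] - grid[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    gap = np.interp(grid, grid[hull], theta[hull]) - theta   # M theta - theta >= 0
    if np.isinf(p):
        return float(np.max(gap))
    vals = gap ** p * g                        # integrand |M theta - theta|^p g
    return float(np.sum((vals[1:] + vals[:-1]) / 2 * np.diff(grid))) ** (1.0 / p)
```

For a concave $\theta$ the gap vanishes and $\phi(\theta) = 0$; for a convex $\theta$ the majorant is the chord over $[a,b]$.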
We now proceed by imposing the following assumptions, where C([a, b]) is the space of continuous functions on [a, b] equipped with the uniform norm.
Assumption B.1. (i) $\{X_i\}_{i=1}^n$ are i.i.d. with a common distribution $P_{\eta/\sqrt{n}}$ for some $\eta \in \mathbb{R}$; (ii) $\{P_t : |t| < \epsilon\} \subset \mathbf{P}$ is a differentiable path with score $h$ in the sense of Definition 2.1; (iii) $\sqrt{n}\{\theta(P_{\eta/\sqrt{n}}) - \theta(P)\} \to \eta\dot\theta(h)$ in $\ell^\infty([a,b])$; (iv) $\theta_0 \equiv \theta(P)$ is concave.

Assumption B.2. (i) $\sqrt{n}\{\hat\theta_n - \theta_n\}$ converges weakly under $P_n^n$, where $\theta_n \equiv \theta(P_{\eta/\sqrt{n}})$ and the limit $G \in C([a,b])$; (ii) $\|M\hat\theta_n - \hat\theta_n\|_\infty = O_p(s_n)$ under $P_n^n$ if $\theta_0$ is strictly concave, for some sequence $\{s_n\}$ of strictly positive scalars satisfying $\sqrt{n}s_n = o(1)$.
Assumption B.1(i)-(ii) formalize the data generating process. Assumption B.1(iii) is a generalization of property (2.4) and is fulfilled whenever $\theta$ is a regular parameter in the sense of Definition 5.1 in Bickel et al. (1998). If $\theta_n$ and $\theta_0$ are in $C([a,b])$ for all $n$, then so is $\eta\dot\theta(h)$, as a uniform limit of continuous functions. Assumption B.1(iv) implies that $\{P_{\eta/\sqrt{n}}\}$ is a sequence of local perturbations such that $\theta_n$ tends to a concave function $\theta_0$. We stress that the parameter of interest is in truth $\theta_n$, while $\theta_0$ is the limit of $\{\theta_n\}$; if $\eta = 0$ (e.g., the i.i.d. setup with a fixed distribution $P_0$), then $\theta_n = \theta_0$ (and $\eta\dot\theta(h) = 0$). Assumption B.2(i) requires an estimator $\hat\theta_n$ of $\theta_n$ that admits a weak limit $G$ at rate $\sqrt{n}$. In turn, Assumption B.2(ii) is a high level condition describing the consequence of degeneracy when $\theta_0$ is strictly concave. For the density problem in the main text, we have $s_n = (n^{-1}\log n)^{2/3}$, as verified by Theorem 2.2. Note that the order in probability (i.e., $O_p$) instead of almost surely (i.e., $O_{a.s.}$) suffices for our inferential purpose.
By a simple generalization of Lemma 3.2 in Beare and Moon (2015), $M: \ell^\infty([a,b]) \to \ell^\infty([a,b])$ is Hadamard directionally differentiable at the concave $\theta_0$ tangentially to $C([a,b])$. Like $M_F'$ in the main text, the derivative $M_{\theta_0}'(h)$ majorizes $h$ by concave functions on regions over which $\theta_0$ is affine, but acts like an identity map elsewhere. We omit the explicit expression of the derivative $M_{\theta_0}'$, as it is not important to the implementation of the bootstrap, but refer the reader to Beare and Moon (2015). By Assumptions B.1 and B.2(i), we in turn obtain, by arguments analogous to those in the proof of Theorem 3.2, the weak convergence of $\sqrt{n}\phi(\hat\theta_n)$ under $P_n^n$. Therefore, letting $\eta = 0$ yields the pointwise weak limit (B.7). If $\theta_0$ is strictly concave, then $M_{\theta_0}'$ is the identity map, so that (B.7) is degenerate at $0$, consistent with Assumption B.2(ii). Following the main text, we differentiate strict concavity from non-strict concavity by introducing a tuning parameter $\kappa_n > 0$ such that $\kappa_n \to 0$ and $\sqrt{n}s_n/\kappa_n \to 0$ as $n \to \infty$.
Towards construction of the critical values, we estimate the law of (B.7) by the rescaled bootstrap. First, we estimate $M_{\theta_0}'$ by $\hat M_n'(h) \equiv \{M(\hat\theta_n + t_n h) - M(\hat\theta_n)\}/t_n$ for any $h \in \ell^\infty([a,b])$, where $t_n \downarrow 0$ is such that $t_n\sqrt{n} \to \infty$. Next, we need to bootstrap the law of $G$. In order to accommodate flexible bootstrap schemes, we consider a general bootstrap quantity $\hat{\mathbb{G}}_n: \{X_i, W_{ni}\}_{i=1}^n \to \ell^\infty([a,b])$, where $\{W_{ni}\}$ are bootstrap weights. For example, for the standard empirical bootstrap, $W_n \equiv (W_{n1}, \ldots, W_{nn})$ is a multinomial random vector over $n$ categories with probabilities $(1/n, \ldots, 1/n)$. More generally, one may consider multiplier or exchangeable bootstrap, corresponding respectively to i.i.d. or exchangeable weights (Praestgaard and Wellner, 1993; Kosorok, 2008). To formalize the notion of bootstrap consistency, we employ the bounded Lipschitz metric (van der Vaart and Wellner, 1996). For a generic normed space $\mathbb{D}$ equipped with norm $\|\cdot\|_{\mathbb{D}}$ (e.g., $\mathbb{D} = \ell^\infty([a,b])$), let $\mathrm{BL}_1(\mathbb{D})$ denote the class of real-valued functions on $\mathbb{D}$ that are bounded by one and Lipschitz with constant at most one. If $\hat{\mathbb{G}}_n: \{X_i, W_{ni}\}_{i=1}^n \to \mathbb{D}$ is a bootstrap for the law of a random element $G$ in $\mathbb{D}$, then $\hat{\mathbb{G}}_n$ is said to be a consistent bootstrap for $G$ if the conditional laws converge to that of $G$ in the bounded Lipschitz metric. Finally, for $\alpha \in (0,1)$, we set our critical value $\hat c_{1-\alpha} = \max\{\kappa_n, \hat c^*_{1-\alpha}\}$, where $\hat c^*_{1-\alpha}$ is defined in (B.10). We then reject the null that $\theta_n$ is concave if $\sqrt{n}\phi(\hat\theta_n) > \hat c_{1-\alpha}$.
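The rescaled bootstrap critical value can be sketched numerically as follows, assuming each bootstrap draw approximates a realization of the limit process $G$ on a grid and specializing to the sup norm. The function names and discretization are ours; this is a sketch, not the paper's implementation:

```python
import numpy as np

def lcm(grid, vals):
    """Least concave majorant of the points (grid[i], vals[i]) via the
    upper convex hull (slopes must be decreasing along the hull)."""
    hull = [0]
    for i in range(1, len(grid)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (vals[b] - vals[a]) * (grid[i] - grid[b]) <= \
               (vals[i] - vals[b]) * (grid[b] - grid[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(grid, grid[hull], vals[hull])

def rescaled_critical_value(grid, theta_hat, boot_draws, t_n, kappa_n, alpha):
    """max(kappa_n, c*_{1-alpha}), where c*_{1-alpha} is the (1-alpha)-quantile
    over bootstrap draws g of ||Mhat'_n(g) - g||_inf, with the rescaled
    derivative estimator Mhat'_n(g) = [M(theta_hat + t_n g) - M(theta_hat)]/t_n."""
    M_theta = lcm(grid, theta_hat)
    stats = []
    for g in boot_draws:                      # each g: a draw approximating G
        deriv = (lcm(grid, theta_hat + t_n * g) - M_theta) / t_n
        stats.append(np.max(np.abs(deriv - g)))
    return max(kappa_n, float(np.quantile(stats, 1.0 - alpha)))
```

The max with $\kappa_n$ implements the KW selection: when $\theta_0$ is strictly concave, the bootstrap statistics degenerate and $\kappa_n$ takes over.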
To ensure validity of $\hat c_{1-\alpha}$ as our critical value, we impose the following conditions, stated for any continuous and uniformly bounded $f: \ell^\infty([a,b]) \to \mathbb{R}$.
Assumption B.4. (i) {t n } is a sequence of positive scalars such that t n → 0 and √ nt n → ∞ as n → ∞; (ii) {κ n } is a sequence of positive scalars such that κ n → 0 and √ ns n /κ n → 0 as n → ∞.
Assumption B.3(i)-(ii) formalize the bootstrap consistency of $\hat{\mathbb{G}}_n$ per our discussions above. Assumption B.3(iii)-(iv) are mild technical conditions. The precise meaning of Assumption B.3(iii) is that, for any $f \in \mathrm{BL}_1(\mathbb{D})$, the outer and inner conditional expectations of $f(\hat{\mathbb{G}}_n)$ agree asymptotically, where $E^*$ and $E_*$ are respectively outer and inner expectations with respect to $\{X_i, W_{ni}\}_{i=1}^n$ (jointly); see Chapter 1.2 in van der Vaart and Wellner (1996) for more discussion. Assumption B.3(iii) can be verified by appealing to existing results directly; see Theorems 2.6 and 10.4 in Kosorok (2008), or Theorem 3.6.13 in van der Vaart and Wellner (1996) in conjunction with Lemma S.3.9 in Fang and Santos (2019). Assumption B.3(iv) is met whenever $f(\hat{\mathbb{G}}_n)$ is continuous in $\{W_{ni}\}_{i=1}^n$, as in common bootstrap schemes. In the main text, Assumption B.3 is automatically satisfied for $\hat{\mathbb{G}}_n$ constructed by the classical empirical bootstrap; see Theorem 3.6.1 in van der Vaart and Wellner (1996). Assumption B.4 imposes suitable rate conditions on the tuning parameters $t_n$ (for the rescaled bootstrap) and $\kappa_n$ (for the KW selection). Finally, Assumption B.5 is an analog of Assumption 3.2, which ensures that bootstrap consistency delivers consistency of the critical value $\hat c^*_{1-\alpha}$. Given the above assumptions, we may obtain the following analog of Theorem 3.2, which subsumes the pointwise results in Theorem 3.1. As before, let $\pi_n(P)$ denote the rejection probability under $P$.

Theorem B.1. Let Assumptions B.1, B.2, B.3, B.4 and B.5 hold, and set $\hat c_{1-\alpha} = \max\{\kappa_n, \hat c^*_{1-\alpha}\}$ with $\hat c^*_{1-\alpha}$ defined as in (B.10). Then it follows that:

1. For any $\eta \in \mathbb{R}$, (i) if $\theta_0$ is strictly concave, then $\liminf_{n\to\infty} \pi_n(P_{\eta/\sqrt{n}}) = 0$, and (ii) if $\theta_0$ is non-strictly concave, then (B.11) holds, where (B.11) holds with equality if, in addition, the cdf of $\|M_{\theta_0}'(G + \eta\dot\theta(h)) - \{G + \eta\dot\theta(h)\}\|_p$ is continuous at $c^*_{1-\alpha}$.

2. For any $\eta \le 0$, we have $\limsup_{n\to\infty} \pi_n(P_{\eta/\sqrt{n}}) \le \alpha$. (B.12)

The proof of Theorem B.1 is in complete accord with that of Theorem 3.2 given our high level assumptions, and is thus omitted.
In what follows, we therefore focus on verifying some of the main assumptions, and in particular Assumption B.2(ii), for Examples B.1 and B.2. Assumptions B.1(i)(ii)(iv) and B.5 may be thought of as regulating the data generating process, while Assumption B.4 is simply concerned with choices of tuning parameters.

Example B.1 (cont.). The cumulative hazard function $\Lambda_F$ may be estimated by
$$ \hat\Lambda_n(t) \;\equiv\; \int_{[0,t]} \frac{dF_n(s)}{1 - F_{n,-}(s)} , $$
where $F_n$ is the empirical cdf and $F_{n,-}$ is the left-continuous version of $F_n$. Since the map $F \mapsto \Lambda_F$ is Hadamard differentiable (under regularity conditions)—see, for example, Lemma 20.14 in van der Vaart (1998)—the Delta method in conjunction with weak convergence of $F_n$ shows that Assumption B.2(i) is satisfied with $G = W \circ \chi \in C([a,b])$, where $W$ is a standard Brownian motion and $\chi(t) \equiv F(t)/(1 - F(t))$ for any $t \in [0,b]$. We refer the reader also to Chapter 6 in Shorack and Wellner (1986) for a detailed treatment of the weak convergence of $\hat\Lambda_n$. If $F_n^*$ is the bootstrap empirical distribution described below (3.6), then the aforementioned Hadamard differentiability implies that $\hat G_n = \sqrt{n}\{\Lambda_{F_n^*} - \Lambda_{F_n}\}$ satisfies Assumption B.3, by a combination of Theorems 3.6.1 and 3.9.11 in van der Vaart and Wellner (1996) and Lemma S.3.9 in Fang and Santos (2019).

The main condition to verify is now Assumption B.2(ii). In this regard, we believe that, as in Theorem 2.2, it is possible to obtain the same asymptotic order under $\prod_{i=1}^n P_{\eta/\sqrt{n}}$ as under $\prod_{i=1}^n P_0$. Since such a development is beyond the scope of this paper, we instead provide a shortcut building upon existing results under $\prod_{i=1}^n P_0$, at the cost of only slightly slowing the rate. To illustrate, we first note that existing results—see, for example, MacGibbon, Lu and Younes (2002), Groeneboom and Jongbloed (2012) and Durot and Lopuhaä (2014)—imply that
$$ \| \mathcal{M}(\hat\Lambda_n) - \hat\Lambda_n \|_\infty = O_p\bigl( (n^{-1}\log n)^{2/3} \bigr) \quad \text{under } \textstyle\prod_{i=1}^n P_0 . $$
Thus, Assumption B.2(ii) is satisfied with $s_n = (n^{-1}\log n)^{2/3}\ell_n$ for a suitable $\ell_n \uparrow \infty$. For example, if $\ell_n = (\log n)^{1/3}$, then $s_n = n^{-2/3}\log n$.

Example B.2 (cont.).
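As a numerical illustration of the quantity controlled by Assumption B.2(ii), the sketch below computes the cumulative hazard estimator $\hat\Lambda_n$ described above (for uncensored data, its jump at the $i$-th order statistic is $1/(n-i+1)$) and the sup-distance between a concave-majorant fit and the raw estimator. The function names `nelson_aalen` and `lcm_distance` are ours, introduced only for this sketch.

```python
import numpy as np

def nelson_aalen(x):
    """Cumulative-hazard estimator Lambda_n(t) = int_[0,t] dF_n / (1 - F_{n,-}),
    evaluated at the order statistics; for uncensored data the jump at the
    i-th order statistic is 1/(n - i + 1)."""
    n = len(x)
    xs = np.sort(x)
    jumps = 1.0 / (n - np.arange(1, n + 1) + 1)
    return xs, np.cumsum(jumps)

def lcm_distance(x, y):
    """Sup distance between the points (x_i, y_i) and their least concave
    majorant, computed via an upper-convex-hull (monotone chain) scan."""
    hull = [0]  # indices of upper-hull vertices, left to right
    for i in range(1, len(x)):
        while len(hull) > 1:
            j, k = hull[-2], hull[-1]
            # drop k if it lies on or below the chord from j to i
            if (y[k] - y[j]) * (x[i] - x[j]) <= (y[i] - y[j]) * (x[k] - x[j]):
                hull.pop()
            else:
                break
        hull.append(i)
    # evaluate the majorant at every x by interpolating between hull vertices
    maj = np.interp(x, x[hull], y[hull])
    return np.max(maj - y)
```

For example, `lcm_distance` returns 0 when the input points are already concave, and a positive sup-gap otherwise; applying it to the points of `nelson_aalen` (with the origin prepended) gives a finite-sample analog of $\|\mathcal{M}(\hat\Lambda_n) - \hat\Lambda_n\|_\infty$.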
In verifying Assumptions B.1(iii) and B.2(i), we focus on the pointwise asymptotics, as a thorough investigation of regular/efficient estimation of $\Lambda$ in the spirit of Chapter 5 in Bickel et al. (1998) is beyond the scope of this paper—and we could not find existing results in this regard. Nonetheless, we provide a sketch here. Let $p_z$ and $p_\epsilon$ be the densities of $Z_1$ and $\epsilon_1$ with respect to the Lebesgue measure. Then the joint density $p_x$ of $X_1 \equiv (Y_1, Z_1)$ at $x \equiv (y, z)$ is $p_\epsilon(y - m(z))\, p_z(z)$. Thus, the differentiable path $\{P_{\eta/\sqrt{n}}\}$ may be constructed by perturbing the density $p_\epsilon$, which causes no changes in $\Lambda$ in view of (B.5), or by perturbing $m$ and/or $p_z$. Here, by "perturbing" we mean a pathwise differentiable sequence as in Definition 2.1 in the case of $p_z$, or, in the case of $m$, a scalar parametrization $\eta \mapsto m_\eta \in \mathbf{H}$ for some suitable Hilbert space $\mathbf{H}$ such that $m_\eta = m + \eta g + o(\eta)$ as $\eta \to 0$ for some $g \in \mathbf{H}$. A thorough treatment in this regard may be found in van der Vaart (1991). Since pathwise differentiability is Hadamard differentiability—see, for example, Bickel et al. (1998, p. 456)—it follows from Lemmas 3.9.23 and 3.9.27 in van der Vaart and Wellner (1996), in conjunction with Lemma 3.9.3 there, that $\eta \mapsto \Lambda_\eta$ is pathwise differentiable, where, for any $t \in [0,1]$, $\Lambda_\eta(t)$ is defined as in (B.5) with the quantile function $Q_\eta$ arising from perturbing $p_z$. This completes the sketch of the verification of Assumption B.1(ii).
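For readers unfamiliar with such perturbations, one textbook construction of a differentiable path through $p_z$ (not spelled out in the text, and offered here only as an illustration) is the bounded-score perturbation:

```latex
% A standard bounded-score perturbation of p_z (illustrative):
p_{z,\eta}(z) \;=\; p_z(z)\,\bigl(1 + \eta\, g(z)\bigr),
\qquad \sup_z |g(z)| < \infty, \quad \int g(z)\, p_z(z)\, dz = 0 ,
% which is a valid density for |\eta| small and is differentiable in
% quadratic mean with score g:
\int \Bigl[\, \eta^{-1}\bigl(\sqrt{p_{z,\eta}(z)} - \sqrt{p_z(z)}\bigr)
      - \tfrac{1}{2}\, g(z)\sqrt{p_z(z)} \,\Bigr]^2 dz
\;\longrightarrow\; 0 \quad (\eta \to 0) .
```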
Next, we verify Assumption B.2(i) under the pointwise asymptotics (i.e., $\eta = 0$); the general case may be handled with the help of Theorem 3.10.7 and Lemma 3.10.11 in van der Vaart and Wellner (1996), in a manner similar to Theorem 3.10.12 there. Let $\{Z_i\}_{i=1}^n$ be arranged in ascending order as $\{Z_{(i)}\}_{i=1}^n$, and let the correspondingly ordered $\{Y_i\}_{i=1}^n$ and $\{\epsilon_i\}_{i=1}^n$ be denoted $\{Y_{(i)}\}_{i=1}^n$ and $\{\epsilon_{(i)}\}_{i=1}^n$. Then the sample cumulative sum diagram is given by $\{(k, S_n(k)) : k = 0, \ldots, n\}$, where $S_n(0) = 0$ and, for $k = 1, \ldots, n$,
$$ S_n(k) = \sum_{i=1}^{k} Y_{(i)} . $$
The cumulative sum diagram plays an important role in isotonic regression because the isotonic regression estimator of $m$ evaluated at $Z_{(i)}$ is precisely the left derivative at $i$ of the greatest convex minorant over $[0, n]$ of the cumulative sum diagram—see Brunk (1958) and Mukerjee (1988). Following Beare and Fang (2017), we identify $S_n$ with a random element $\hat\Lambda_n \in \ell^\infty([0,1])$ defined by
$$ \hat\Lambda_n(t) = \frac{1}{n} \sum_{i=1}^{[nt]} Y_{(i)} + \Bigl( t - \frac{[nt]}{n} \Bigr) Y_{([nt]+1)} $$
for all $t \in [0,1]$, where we define $Y_{(0)} = Y_{(n+1)} = 0$ and, for $x \in \mathbf{R}$, we denote by $[x]$ the largest integer in $[x-1, x]$. Proposition A.1 in Beare and Fang (2017) then shows that, under regularity conditions,
$$ \sqrt{n}\{\hat\Lambda_n - \Lambda\} \xrightarrow{\;L\;} G \quad \text{in } \ell^\infty([0,1]) $$
for some Gaussian process $G \in C([0,1])$.
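The characterization above—the isotonic fit as left derivatives of the greatest convex minorant of the cumulative sum diagram—is equivalent to the classical pool-adjacent-violators algorithm (PAVA). The following minimal sketch (our own code, not the paper's) computes the least-squares nondecreasing fit this way:

```python
import numpy as np

def isotonic_regression(y):
    """Least-squares nondecreasing fit of y via pool-adjacent-violators,
    which returns the left derivatives of the greatest convex minorant of
    the cumulative sum diagram {(k, S_n(k)) : k = 0, ..., n}."""
    blocks = []  # each block is [sum, count]; block mean = sum / count
    for v in y:
        blocks.append([float(v), 1])
        # merge while the last two block means violate monotonicity
        while len(blocks) > 1 and \
                blocks[-2][0] * blocks[-1][1] >= blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)  # constant on each pooled block
    return np.array(out)
```

For instance, `isotonic_regression([1, 3, 2])` pools the violating pair $(3, 2)$ into its mean, yielding the fit $(1, 2.5, 2.5)$.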
For Assumption B.3, we propose the paired bootstrap: resample from $\{(Y_i, Z_i)\}_{i=1}^n$ with replacement, and construct $\hat\Lambda_n^*$ in the same fashion as $\hat\Lambda_n$. The main arguments underlying Proposition A.1 in Beare and Fang (2017) are the Delta method and weak convergence of the partial sum process defined by the errors. Thus, the bootstrap consistency of $\hat G_n \equiv \sqrt{n}\{\hat\Lambda_n^* - \hat\Lambda_n\}$ follows from Theorem 3.9.4 in van der Vaart and Wellner (1996) together with consistency of bootstrapping the partial sum process; we refer the reader to Kinateder (1992), Holmes, Kojadinovic and Quessy (2013) and Calhoun (2018) for related results.
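The resampling scheme itself is straightforward; the sketch below (our own illustration) shows the paired bootstrap loop, with a user-supplied callable standing in for the full construction of $\hat\Lambda_n^*$, which we do not reproduce here:

```python
import numpy as np

def paired_bootstrap(y, z, stat, n_boot=200, seed=0):
    """Paired bootstrap: resample (Y_i, Z_i) jointly with replacement,
    preserving the pairing, and recompute stat(y*, z*) on each draw.
    `stat` is any callable mapping resampled arrays to a scalar; this is
    only a sketch of the resampling scheme, not the paper's full
    construction of Lambda*_n."""
    rng = np.random.default_rng(seed)
    n = len(y)
    out = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices drawn with replacement
        out[b] = stat(y[idx], z[idx])
    return out
```

The key design point is that each index draw selects a whole pair $(Y_i, Z_i)$, so the dependence between the response and the regressor is preserved in every bootstrap sample.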