On Convex Least Squares Estimation when the Truth is Linear

We prove that the convex least squares estimator (LSE) attains an $n^{-1/2}$ pointwise rate of convergence in any region where the truth is linear. In addition, the asymptotic distribution can be characterized by a modified invelope process. Analogous results hold when the derivative of the convex LSE is used for derivative estimation. These asymptotic results facilitate a new consistent procedure for testing linearity against a convex alternative. Moreover, we show that the convex LSE adapts to the optimal rate at the boundary points of the region where the truth is linear, up to a log-log factor. These conclusions are valid for both density estimation and regression function estimation.


Introduction
Shape-constrained estimation has received much attention recently. The attraction is the prospect of obtaining automatic nonparametric estimators with no smoothing parameters to choose. Convexity is among the popular shape constraints that are of both mathematical and practical interest. Groeneboom, Jongbloed and Wellner (2001b) show that under the convexity constraint, the least squares estimator (LSE) can be used to estimate both a density and a regression function. For density estimation, they showed that the LSE $\hat f_n$ of the true convex density $f_0$ converges pointwise at an $n^{-2/5}$ rate under certain assumptions. The corresponding asymptotic distribution can be characterized via a so-called "invelope" function investigated by Groeneboom, Jongbloed and Wellner (2001a). In the regression setting, similar results hold for the LSE $\hat r_n$ of the true regression function $r_0$.
However, the development of their pointwise asymptotic theory requires that $f_0$ (or $r_0$) has a positive second derivative in a neighborhood of the point to be estimated. This assumption excludes certain convex functions that may be of practical value. Two further scenarios of interest are given below:
1. At the point $x_0$, the $k$-th derivative $f_0^{(k)}(x_0) = 0$ (or $r_0^{(k)}(x_0) = 0$) for $k = 2, 3, \ldots, 2s-1$ and $f_0^{(2s)}(x_0) > 0$ (or $r_0^{(2s)}(x_0) > 0$), where $s$ is an integer greater than one;
2. There exists some region $[a, b]$ on which $f_0$ (or $r_0$) is linear.
Scenario 1 can be handled using techniques developed in Balabdaoui, Rufibach and Wellner (2009). The aim of this manuscript is to provide theory in the setting of Scenario 2.
We prove that for estimation of a convex density when Scenario 2 holds, at any fixed point $x_0 \in (a, b)$, the LSE $\hat f_n(x_0)$ converges pointwise to $f_0(x_0)$ at an $n^{-1/2}$ rate. Its (left or right) derivative $\hat f_n'(x_0)$ converges to $f_0'(x_0)$ at the same rate. The corresponding asymptotic distributions are characterized using a modified invelope process. More generally, for any $\delta > 0$, weak convergence of the corresponding rescaled processes over $x \in [a + \delta, b - \delta]$ is established. We remark that, unlike the case of Groeneboom, Jongbloed and Wellner (2001b), there does not exist a "universal" distribution of $\hat f_n$ on $(a, b)$, i.e. the pointwise limit distributions at different points are in general different.
In addition, we study the adaptation of the LSE $\hat f_n$ at the boundary points of the linear region (e.g. $a$ and $b$). Note that the difficulty of estimating $f_0(a)$ and $f_0(b)$ depends on the behavior of $f_0$ outside $[a, b]$. Nevertheless, we show that $\hat f_n(a)$ (or $\hat f_n(b)$) converges to $f_0(a)$ (or $f_0(b)$) at the minimax optimal rate up to a negligible factor of $\sqrt{\log\log n}$. Last but not least, we show the analogous rate and asymptotic distribution results for the LSE $\hat r_n$ under the regression setting.
Our study yields a better understanding of the adaptation of the LSE in terms of pointwise convergence under the convexity constraint. It is also one of the first attempts to quantify the behavior of the convex LSE at non-smooth points. When the truth is linear, the minimax optimal $n^{-1/2}$ pointwise rate is indeed achieved by the LSE on $(a, b)$. The optimal rate at the boundary points $a$ and $b$ is also achievable by the LSE up to a log-log factor. In addition, our results can be viewed as an intermediate stage for the development of theory under misspecification. Note that linearity is regarded as the boundary case of convexity: if a function is non-convex, then its projection onto the class of convex functions $K$ will have linear components. We conjecture that the LSE in these misspecified regions converges at an $n^{-1/2}$ rate, with the asymptotic distribution characterized by a more restricted version of the invelope process. More broadly, we expect that this type of behavior will be seen in situations of other shape restrictions, such as log-concavity for $d = 1$ (Balabdaoui, Rufibach and Wellner, 2009) and $k$-monotonicity (Balabdaoui and Wellner, 2007).
The LSE of a convex density function was first studied by Groeneboom, Jongbloed and Wellner (2001b), where its consistency and some asymptotic distributional theory were provided. On the other hand, the idea of using the LSE for convex regression function estimation dates back to Hildreth (1954). Its consistency was proved by Hanson and Pledger (1976), with some rate results given in Mammen (1991). In this manuscript, for the sake of mathematical convenience, we shall focus on the non-discrete version discussed by Balabdaoui and Rufibach (2008). See Groeneboom, Jongbloed and Wellner (2008) for the computational aspects of all the above-mentioned LSEs.
There are studies similar to ours regarding other shape restrictions. See Remark 2.2 of Groeneboom (1985) and Carolan and Dykstra (1999) in the context of decreasing density function estimation when the truth is flat, and Balabdaoui (2014) with regard to discrete log-concave distribution estimation when the true distribution is geometric. For estimation under misspecification of various shape constraints, we point the readers to Jankowski (2014), Cule and Samworth (2010), Dümbgen, Samworth and Schuhmacher (2011), Chen and Samworth (2013), and Balabdaoui et al. (2013). More recent developments on global rates of the shape-constrained methods can be found in Guntuboyina and Sen (2013b), Doss and Wellner (2013), and Kim and Samworth (2014). See also Meyer (2013) and Chen and Samworth (2014), where an additive structure is imposed in shape-constrained estimation in the multidimensional setting.
The rest of the paper is organized as follows: in Section 2, we study the behavior of the LSE for density estimation. In particular, we first focus on a special case where the true density function $f_0$ is taken to be triangular. The convergence rate and asymptotic distribution are given in Section 2.1. More general cases are handled later in Section 2.2. Section 2.3 discusses the adaptation of the LSE at the boundary points. Analogous results with regard to regression function estimation are presented in Section 3. Some proofs, mainly on the existence and uniqueness of a limit process and the adaptation of the LSE, are deferred to the appendices.

Estimation of a density function
Let $X_1, \ldots, X_n$ be independent and identically distributed (IID) observations from a density function $f_0 : [0, \infty) \to [0, \infty)$. Let $F_0$ be its distribution function (DF). In this section, we denote the convex cone of all continuous convex and integrable functions on $[0, \infty)$ by $K$. The LSE of $f_0$ is given by
$$\hat f_n = \operatorname*{argmin}_{g \in K} \left( \frac{1}{2} \int_0^\infty g(t)^2 \, dt - \int_0^\infty g(t) \, dF_n(t) \right),$$
where $F_n$ is the empirical distribution function of the observations. Furthermore, we denote the DF of $\hat f_n$ by $\hat F_n$.
Throughout the manuscript, without specifying otherwise, the derivative of a convex function can be interpreted as either its left derivative or its right derivative.
Consistency of $\hat f_n$ over $(0, \infty)$ in this setting can be found in Groeneboom, Jongbloed and Wellner (2001b). Balabdaoui (2007) has shown how $\hat f_n$ can be used to provide consistent estimators of $f_0(0)$ and $f_0'(0)$.
Here we concentrate on the LSE's rate of convergence and asymptotic distribution.

Rate of convergence
The following theorem shows that the convergence rate at any interior point where $f_0$ is linear is $n^{-1/2}$, thus achieving the optimal rate given in Example 1 of Cai and Low (2014).
Proof of Theorem 2.1. A key ingredient of this proof is the version of Marshall's lemma in this setting (Dümbgen, Rufibach and Wellner, 2007, Theorem 1), which states that
$$\|\hat F_n - F_0\|_\infty \le 2 \|F_n - F_0\|_\infty, \qquad (2.1)$$
where $\|\cdot\|_\infty$ is the uniform norm. Let $c = \hat f_n(x_0) - f_0(x_0)$. Two cases are considered here: (a) $c > 0$ and (b) $c < 0$.
In the first case, because fn is convex, one can find a supporting hyperplane of fn passing through (x 0 , f 0 where In the second case, find 0 ≤ t < x 0 such that fn and f 0 intersect at the point (t, 2 − t).If such value does not exist, we take t = 0. Note that Figure 2 illustrates the above inequalities graphically.Therefore, where the last inequality uses the fact that for any t ∈ [0, x 0 ), t 2 /(x 0 − t) is an increasing function of t while x 0 − t is a decreasing function of t.By (2.1), we have that It then follows that c = O p (n −1/2 ), as desired.
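The proof above leans on Marshall's lemma together with the $n^{-1/2}$ rate of the empirical distribution function. The following is a minimal simulation sketch of that second ingredient, assuming the triangular density $f_0(t) = 2(1 - t)$ on $[0, 1]$ used in Theorem 2.6, whose DF is $F_0(t) = 2t - t^2$. The function names are illustrative and do not come from the paper; the sketch does not compute the convex LSE itself.

```python
import math
import random


def triangular_cdf(t):
    """DF F_0(t) = 2t - t^2 of the density f_0(t) = 2(1 - t) on [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return 2.0 * t - t * t


def sample_triangular(n, rng):
    """Inverse-transform sampling, using F_0^{-1}(u) = 1 - sqrt(1 - u)."""
    return [1.0 - math.sqrt(1.0 - rng.random()) for _ in range(n)]


def sup_distance(sample):
    """Uniform distance sup_t |F_n(t) - F_0(t)|, checked at the jumps of F_n."""
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        f0 = triangular_cdf(x)
        # F_n jumps from i/n to (i+1)/n at the (i+1)-th order statistic.
        worst = max(worst, abs((i + 1) / n - f0), abs(i / n - f0))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1000, 10000):
        d = sup_distance(sample_triangular(n, rng))
        # The rescaled distance sqrt(n) * d should stay of order one.
        print(n, d, math.sqrt(n) * d)
```

If the rescaled distance stays bounded as $n$ grows, as it should, then any bound on the estimator in terms of the uniform distance between the empirical DF and $F_0$ inherits the $n^{-1/2}$ rate.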
Corollary 2.2 (Uniform rate of convergence).For any Let f − n and f + n denote respectively the left and right derivatives of fn .The same convergence rate also applies to these derivative estimators.

Asymptotic distribution
To study the asymptotic distribution of fn , we start by characterizing the limit distribution.
A detailed construction of the above limit process can be found in Appendix I. Note that our process is defined on a compact interval, so it is technically different from the process presented in Groeneboom, Jongbloed and Wellner (2001a), which is defined on the whole real line. As a result, extra conditions regarding the behavior of $H$ (and $H'$) at the boundary points are imposed here to ensure its uniqueness. Other characterizations of the limit process are also possible. A slight variant is given below.
Corollary 2.5 (Different characterization).Conditions (3) and (4) in the statement of Theorem 2.4 can be replaced respectively by Now we are in the position to state our main result of this section.
Theorem 2.6 (Asymptotic distribution).Suppose f 0 (t) = 2(1 − t)1 {t∈[0,1]} .Then for any δ > 0, the process where C is the space of continuous functions equipped with the uniform norm, D is the Skorokhod space, and H is the "invelope process" defined in the first part of Theorem 2.4.
In particular, for any x 0 ∈ (0, 1), Proof of Theorem 2.6 Before proceeding to the proof, we first define the following processes on [0, ∞): (2.3) Furthermore, define the set of "knots" of a convex function f on (0, 1) as (2.4) We remark that the above definition of knots can be easily extended to convex functions with a different domain.By Lemma 2.2 of Groeneboom, Jongbloed and Wellner (2001b), Ĥn (t) ≥ Y n (t) for t ∈ [0, 1], with equality if t ∈ S( fn ).In addition, 3) n (t) = 0. Now define the space E m of vector-valued functions (m ≥ 3) as and endow E m with the product topology induced by the uniform norm on C (i.e. the space of continuous functions) and Skorokhod metric on D (i.e. the Skorokhod space).Let E m be supported by the stochastic process In addition, both Ĥn and Ĥ n are tight in C[0, 1] (via an easy application of Marshall's lemma).Since X n converges to a Brownian bridge, it is tight in D[0, 1] as well.Finally, the same is also true for Y n .
Note that E m is separable.For any subsequence of Z n , by Prohorov's theorem, we can construct a further subsequence Z n j such that {Z n j } j converges weakly in E m to some Using Skorokhod's representation theorem, we can assume without loss of generality that for almost every ω in the sample space, Z n j (ω) → Z 0 (ω).Moreover, since X has a continuous path a.s., the convergence of X n j can be strengthened to X n j (ω)−X(ω) ∞ → 0, where • ∞ is the uniform norm.In the rest of the proof, ω is suppressed for notational convenience, so depending on the context, Z n j (or Z 0 ) can either mean a random variable or a particular realization.
In the following, we shall verify that H 0 satisfies all the conditions listed in the statement of Theorem 2.4.
(1) The fulfillment of this condition follows from the fact that inf t∈ (3) This condition always holds in view of our construction of Ĥn in (2.3).
(4) We consider two cases: (a) if H 0 (1 − 1/m) → ∞ as m → ∞, then the conditions are satisfied in view of Corollary 2.5; (b) otherwise, it must be the case that H 0 (1 − ) is bounded from above.Note that H 0 (1 − ) is also bounded from below a.s., which can be proved by using Corollary 2.3 and the fact that Ĥ n is convex.Denote by τ n j the knot of fn j closest to 1. Then by Lemma 2.2 of Groeneboom, Jongbloed and Wellner (2001b), Ĥn j (τ n j ) = Y n j (τ n j ) and Ĥ n j (τ n j ) = X n j (τ n j ).Consistency of fn j allows us to see that lim j→∞ τ n j = 1.Because H 0 (1 − ) is finite and both Y (t) and X(t) are sample continuous processes, taking τ n j → 1 − yields H 0 (1) = Y (1) and H 0 (1) = X(1).Note that this argument remains valid even if τ n j > 1, because in this scenario, Ĥ n j is linear and bounded on [2 − τ n j , τ n j ].
(5) It follows from Since this holds for any m, one necessarily has that (3) 0 (t) = 0. Consequently, in view of Theorem 2.4, the limit Z 0 is the same for any subsequences of Z n in E m .Fix any m > 1/δ.It follows that the full sequence {Z n } n converges weakly in E m and has the limit (H, H , H , H (3) , Y, X) T .This, together with the fact that H (3) is continuous at any fixed x 0 ∈ (0, 1) with probability one (which can be proved using Conditions (1) and ( 5) of H), yields (2.2).
It can be inferred from Corollary 2.5 and Theorem 2.6 that neither $\hat f_n(0)$ nor $\hat f_n(1)$ converges to the truth at an $n^{-1/2}$ rate. In fact, Balabdaoui (2007) proved that $\hat f_n(0)$ is an inconsistent estimator of $f_0(0)$. Nevertheless, the following proposition shows that $\hat f_n(0)$ is at most $O_p(1)$. For the case of the maximum likelihood estimator of a $k$-monotone density, we refer the readers to Gao and Wellner (2009) for a similar result.

More general settings
The aim of this subsection is to extend the conclusions presented in Section 2.1 to more general convex densities. We assume that $f_0$ is positive and linear on $(a, b)$ for some $0 \le a < b$, where the open interval $(a, b)$ is picked as the "largest" interval on which $f_0$ remains linear. More precisely, there does not exist a bigger open interval $(a', b')$ on which $f_0$ remains linear.
For the sake of notational convenience, we suppress the dependence of $H^*$ on $a$, $b$ and $F_0$ in the following two theorems.
Theorem 2.8 (Characterization of the limit process). Let $X(t) = U(F_0(t))$ and $Y(t) = \int_0^t X(s)\,ds$ for any $t > 0$. Then a.s., there exists a uniquely defined random continuously differentiable function $H^*$ on $[a, b]$ satisfying the following conditions: (1) $H^*(t) \ge Y(t)$ for every $t \in [a, b]$; (2) $H^*$ has convex second derivative on $(a, b)$;
Theorem 2.9 (Rate and asymptotic distribution). For any
Moreover, where $H^*$ is the invelope process defined in Theorem 2.8.
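The driving processes $X(t) = U(F_0(t))$ and $Y(t) = \int_0^t X(s)\,ds$ of Theorem 2.8 are easy to approximate on a grid. The sketch below assumes that $U$ is a standard Brownian bridge on $[0, 1]$ (an assumption, since the definition of $U$ is not restated in this subsection) and uses linear interpolation and the trapezoid rule, both of which are illustrative discretization choices. The names `brownian_bridge`, `interp` and `simulate_x_y` are hypothetical helpers. The invelope $H^*$ itself is not computed.

```python
import math
import random


def brownian_bridge(m, rng):
    """Brownian bridge U on the grid j/m, j = 0..m, built as W(u) - u W(1)."""
    w = [0.0]
    for _ in range(m):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(1.0 / m)))
    return [w[j] - (j / m) * w[m] for j in range(m + 1)]


def interp(values, u):
    """Linear interpolation at u of values given on the grid j/m of [0, 1]."""
    m = len(values) - 1
    pos = min(max(u, 0.0), 1.0) * m
    j = min(int(pos), m - 1)
    return values[j] + (pos - j) * (values[j + 1] - values[j])


def simulate_x_y(cdf, grid, m, rng):
    """Return X(t) = U(F_0(t)) and Y(t) = int_0^t X(s) ds on `grid`.

    `grid` must be increasing and start at 0 so that Y(grid[0]) = 0.
    The integral is approximated by the trapezoid rule.
    """
    bridge = brownian_bridge(m, rng)
    x = [interp(bridge, cdf(t)) for t in grid]
    y = [0.0]
    for i in range(1, len(grid)):
        y.append(y[-1] + 0.5 * (x[i] + x[i - 1]) * (grid[i] - grid[i - 1]))
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [i / 200 for i in range(201)]
    # Example DF: the triangular case F_0(t) = 2t - t^2 on [0, 1].
    x, y = simulate_x_y(lambda t: 2.0 * t - t * t, grid, 2000, rng)
    print(x[100], y[100])
```

Because $X$ depends on $F_0$ through the time change, different densities give differently distributed paths, which is consistent with the remark that the limit in density estimation depends on $F_0$.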

Adaptation at the boundary points
In this subsection, we study the pointwise convergence rate of the convex LSE $\hat f_n$ at the boundary points of the region where $f_0$ is linear. Examples of such points include $a$ and $b$ given in Section 2.2. To begin our discussion, we assume that $x_0 \in (0, \infty)$ is such a boundary point in the interior of the support (i.e. $f_0(x_0) > 0$). Here again $f_0$ is a convex (and decreasing) density function on $[0, \infty)$. Three cases are under investigation as below:
As pointed out in Example 2 of Cai and Low (2014), the minimax optimal convergence rate at $x_0$ is $n^{-1/3}$ in (A). Furthermore, in (B) and (C), Example 4 of Cai and Low (2014) suggests that the optimal rate at $x_0$ is $n^{-\alpha/(2\alpha+1)}$. In the following, we prove that the convex LSE automatically adapts to the optimal rates, up to a factor of $\sqrt{\log\log n}$.
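To get a feel for the rates quoted above, the following small sketch evaluates the exponent $\alpha/(2\alpha+1)$ and the rate inflated by the $\sqrt{\log\log n}$ factor. The function names are illustrative. Note that $\alpha = 1$ reproduces the exponent $1/3$ quoted for (A), that $\alpha = 2$ gives the exponent $2/5$ from the introduction, and that the exponent approaches $1/2$ as $\alpha$ grows.

```python
import math


def optimal_rate_exponent(alpha):
    """Exponent alpha / (2 alpha + 1) in the rate n^{-alpha / (2 alpha + 1)}."""
    return alpha / (2.0 * alpha + 1.0)


def lse_rate_bound(n, alpha):
    """Optimal rate times the log-log factor: n^{-alpha/(2 alpha+1)} sqrt(log log n).

    Requires n > e so that log log n is positive.
    """
    return n ** (-optimal_rate_exponent(alpha)) * math.sqrt(math.log(math.log(n)))


if __name__ == "__main__":
    for alpha in (1.0, 2.0, 5.0):
        for n in (10**3, 10**6):
            plain = n ** (-optimal_rate_exponent(alpha))
            print(alpha, n, plain, lse_rate_bound(n, alpha))
```

Even at $n = 10^6$ the factor $\sqrt{\log\log n}$ is below 2, which illustrates why it is described as negligible.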
Theorem 2.10 (Adaptation at the boundary points: I).In the case of (A), Proof of Theorem 2.10 Suppose that for some fixed δ > 0, (A) holds for every t ∈ [x 0 − 2δ, x 0 + 2δ].Using essentially the same argument as illustrated in the proof of Theorem 2.1, we can obtain inf Therefore, in the rest of the proof, it suffices to only consider the situation of fn Then fn is linear on either [x 0 −δ, x 0 ] or [x 0 , x 0 +δ].It follows from the line of reasoning as in the proof of Theorem 2.1 that max fn ( By the law of the iterated logarithm for local empirical processes (cf.Lemma 4.3.5 of Csörgő and Horvath (1993)), (2.6) is at most τ In view of (2.5), rearranging the terms in the above inequality yields fn . Furthermore, we note that in our definitions τ + n and τ ++ n might not be distinct.Within this setting, three further scenarios are to be dealt with.
Then one can apply a strategy similar to that used in the proof of Theorem 2.
where we have used the facts that |τ Here the second fact can be derived by invoking which follows easily from consistency of f n in estimating f 0 at the points We then apply the argument presented in (c1) to derive . By proceeding as in the proof of Lemma 4.3 of Groeneboom, Jongbloed and Wellner (2001b), we are able to verify that inf Here we invoked Lemma A.1 of Balabdaoui and Wellner (2007) with k = 1 and d = 2 to verify the above claim.See also Kim and Pollard (1990).Finally, we can argue as in (c1) to show that fn The proof is complete by taking into account all the above cases.
Theorem 2.11 (Adaptation at the boundary points: II).In the case of (B) or (C),

Estimation of a regression function
Changing notation slightly from the previous section, we now assume that we are given pairs where r 0 : [0, 1] → R is a convex function, and where { n,i : i = 1, . . ., n} is a triangular array of IID random variables satisfying To simplify our analysis, the following fixed design is considered: (ii) X n,i = i/(n + 1) for i = 1, . . ., n.
The LSE of $r_0$ proposed by Balabdaoui and Rufibach (2008) is where, in this section, $K$ denotes the set of all continuous convex functions on $[0, 1]$. We note that the above estimator is slightly different from the more "classical" LSE that minimizes the sum of squared residuals at the design points. Since the criterion function of the classical LSE has a completely discrete nature, different techniques are needed to prove analogous results. We will not pursue this direction in the manuscript.
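For readers who want to experiment, the "classical" discrete convex LSE mentioned above (not the Balabdaoui and Rufibach version studied in this manuscript) is straightforward to compute on the equally spaced design $X_{n,i} = i/(n+1)$, where convexity of the fitted values amounts to nonnegative second differences. The sketch below uses a cyclic dual coordinate-ascent iteration of the kind usually attributed to Hildreth; this choice of algorithm, the function name `convex_lse_hildreth` and the fixed number of sweeps are assumptions made for illustration only.

```python
def convex_lse_hildreth(y, sweeps=5000):
    """Classical discrete convex LSE on an equally spaced design.

    Minimizes sum_i (y_i - theta_i)^2 subject to the second differences
    theta_i - 2 theta_{i+1} + theta_{i+2} >= 0, by cyclic coordinate
    ascent on the dual multipliers lam >= 0.
    """
    n = len(y)
    theta = [float(v) for v in y]
    lam = [0.0] * max(n - 2, 0)
    for _ in range(sweeps):
        for i in range(n - 2):
            second_diff = theta[i] - 2.0 * theta[i + 1] + theta[i + 2]
            # Exact coordinate update (squared norm of (1, -2, 1) is 6),
            # followed by clipping at zero to keep lam[i] >= 0.
            new_lam = max(0.0, lam[i] - second_diff / 6.0)
            step = new_lam - lam[i]
            lam[i] = new_lam
            # Maintain theta = y + A^T lam, where row i of A is (1, -2, 1).
            theta[i] += step
            theta[i + 1] -= 2.0 * step
            theta[i + 2] += step
    return theta


if __name__ == "__main__":
    noisy = [0.3, 1.2, -0.5, 0.9, 0.1, 2.0]
    print(convex_lse_hildreth(noisy))
```

The iteration is simple but can converge slowly for large $n$; dedicated active-set methods, such as those discussed in Groeneboom, Jongbloed and Wellner (2008), are preferable in practice.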

Basic properties
In this subsection, we list some basic properties of rn given as (3.1).
Proposition 3.2. Let $S(\hat r_n)$ denote the set of knots of $\hat r_n$. The following properties hold: (a)
This version of Marshall's lemma in the regression setting serves as an important tool to establish consistency and the rate. In particular, (3.2) easily yields consistency of $\hat R_n$, and consistency of $\hat r_n$ follows from this together with convexity of $\hat r_n$.

Rate of convergence and asymptotic distribution
In the following, we assume that $r_0$ is linear on $(a, b) \subseteq (0, 1)$. Moreover, $(a, b)$ is "largest" in the sense that one cannot find a bigger open interval $(a', b')$ on which $r_0$ remains linear.
Theorem 3.5 (Rate and asymptotic distribution).Under Assumptions (i) -(ii), for any Moreover, where H is the invelope process defined in the second part of Theorem 2.4 using X(t) = W (t) (i.e. a standard Brownian motion).
In the presence of the linearity of $r_0$ on $(a, b)$, the limit distribution of the process $\sqrt{n}(\hat r_n - r_0)$ on $(a, b)$ does not depend on $r_0$. In comparison, we do not observe this feature in density estimation, as illustrated by Theorem 2.9. In addition, the above theorem continues to hold if we weaken Assumption (ii) to:
Theoretical results in the random design are also possible, where for instance, we can assume that $\{X_{n,i}, i = 1, \ldots, n\}$ are IID uniform random variables on $[0, 1]$. In this case, Theorem 3.5 is still valid, while a different invelope process $H$ is required to characterize the limit distribution. This follows from the fact that in the random design $\sqrt{n}(R_n - R_0)$ can converge to a Gaussian process that is not a Brownian motion.
Appendices: proofs

Appendix I: existence of the limit process
Lemmas 4.1-4.8 are needed to prove Theorem 2.4, where $f_0(t) = 2(1 - t)1_{\{t \in [0,1]\}}$, where $X(t) = U(F_0(t))$, and where $Y(t) = \int_0^t X(s)\,ds$ for $t \ge 0$.
Lemma 4.1. Let the functional $\phi(g)$ be defined as
for functions in the set
Then with probability one, the problem of minimizing $\phi(g)$ over $G_k$ has a unique solution.
Proof of Lemma 4.1 We consider this optimization problem in the metric space L 2 .First, we show that if it exists, the minimizer must be in the subset for some 0 < M < ∞.To verify this, we need the following result sup Let W (t) be a standard Brownian motion.We note that 1 0 g(t) dX(t) has the same distribution as Using the entropy bound of G 1,1 in L 2 (Theorem 2.7.1 of Guntuboyina and Sen (2013a)) and the Dudley's theorem (cf.Theorem 2.6.1 of Dudley (1999)), we can establish that t) is an isonormal Gaussian process indexed by H, we have that a.s.sup g∈G 1,1 1 0 g(t)f 0 (t)dW (t) < ∞.Furthermore, it is easy to check that W (1) < ∞ a.s. and sup g∈G 1,1 1 0 g(t)f 0 (t)dt ≤ 2. So our claim of (4.1) holds.Now for sufficiently large M (with M > k),

Thus, for any
).Since φ at the minimizer could at most be as large as φ(0) = 0, we conclude that it suffices to only consider functions in G k,M for some sufficiently large M .
Note that the functional φ is continuous (cf.Dudley's theorem) and strictly convex.Moreover, for g 1 , g 2 ∈ G k,M , if , the existence and uniqueness follow from a standard convex analysis argument in the Hilbert space.
As a remark, it can be seen from the proof of Lemma 4.1 that for a given ω ∈ Ω from the sample space (which determines the value of X(t)), if the function φ has a unique minimizer over G 1 (which happens a.s.), it also admits a unique minimizer over G k for any k > 1.
Proof of Lemma 4.2 First, consider the case of t = 0. Theorem 1 of Lachal (1997) says that lim sup where W is a standard Brownian motion.From this, it follows that lim sup This implies that Y (t) does not have a parabolic tangent at t = 0. Second, consider the case of t = 1.After some elementary calculations, we see that it suffices to show that lim sup Denote by Z(t) = t 0 W (s 2 )ds.For any 0 < t 1 < t 2 < 1, we argue that the random variable and thus, Now setting t 1 = 1/2 and t i+1 = t 2 i for every i ∈ N. It is easy to check that the collection of random variables where we made use of the fact that lim i→∞ Assume that there exists some K > 0 such that |Z(t)| ≤ Kt 2 for all sufficiently small t > 0. But it follows from (4.2) that a.s.one can find a subsequence of N (denoted by as j → ∞.The last step is due to the facts of t i j → 0 + and lim sup s→0 + |W (s)|/s 1/4 = 0 a.s.
(which is a direct application of the law of the iterated logarithm).The proof is completed by contradiction.Now denote by f k the unique function which minimizes φ(g) over G k .Let H k be the second order integral satisfying Lemma 4.3.Almost surely, for every k ∈ N, f k and H k has the following properties: , where the derivative can be interpreted as either the left or the right derivative; (iii) H k (t) = Y (t) and H k (t) = X(t) for any t ∈ S(f k ), where S is the set of knots; (iv) t 0 f k (s)ds ≤ X(t) − X(0) and

Proof of Lemma 4.3
To show (i), (ii), (iii) and (iv), one may refer to Lemma 2.2 and Corollary 2.1 of Groeneboom, Jongbloed and Wellner (2001a) and use a similar functional derivative argument.
For (v), we note that since f k is convex, discontinuity can only happen at t = 0 or t = 1.In the following, we show that it is impossible at t = 0. Suppose that f k is discontinuous at zero.Consider the class of functions g δ (t) = max(1 − t/δ, 0).Then . By considering the functional derivative of φ(g) and using integration by parts, we obtain that for any δ > 0, which implies that kδ 2 /2 ≥ Y (δ) for every δ > 0. But this contradicts Lemma 4.2, which says that a.s.Y does not have a parabolic tangent at t = 0. Consequently, f k is continuous at t = 0.The same argument can also be applied to show the continuity of f k at t = 1.Lemma 4.4.Fix any t ∈ (0, 1).Denote by τ − k the right-most knot of f k on (0,t].If such knot does not exist, then set τ − k = 0. Similarly, denote by τ + k the left-most knot of f k on [t,1), and set τ + k = 1 if such knot does not exist.Then, for almost every ω ∈ Ω, there exists K > 0 such that for every k ≥ K, τ − k = 0 and τ + k = 1.Here we suppressed the dependence of f k (as well as τ − k and τ + k ) on ω (via X) in the notation.
Proof of Lemma 4.4 First, we show the existence of at least one knot on (0, 1).Note that the cubic polynomial Therefore, take for instance s = 0.5 and consider the event P k (0.5) ≥ Y (0.5).This event can be reexpressed as which will eventually become false as k → ∞.This is due to the fact that Y (t) is sample bounded.In view of (i) and (v) of Lemma 4.3, we conclude that f k has at least one knot in the open interval (0, 1) for sufficiently large k.
Next, take k large enough so that f k has one knot in (0, 1), which we denote by τ k .By (iii) of Lemma 4.3, H k (τ k ) = Y (τ k ) and H k (τ k ) = X(τ k ).Without loss of generality, we may assume that τ k > t.Now the cubic polynomial P k with P k (0) = Y (0) = 0, P k (τ k ) = Y (τ k ), P k (0) = k and P k (τ k ) = X(τ k ) can be expressed as By taking, say for example, s = t/2, it can then be verified that the event P k (t/2) ≥ Y (t/2), which is equivalent to will eventually stop happening as k → ∞.This is due to the sample boundedness of both X(t) and Y (t).Consequently, τ − k = 0 for sufficiently large k.Furthermore, using essentially the same argument, one can also show that τ + k = 1 for large k, which completes the proof of this lemma.
Lemma 4.5.For almost every ω ∈ Ω (which determines X(t) and f k ), we can find an Proof of Lemma 4.5 Fix any δ ∈ (0, 1/2].In view of Lemma 4.4, we may assume that there exist knots τ − k and τ + k on (0, δ] and [1 − δ, 1) respectively for sufficiently large k.If inf [0,1] f k ≥ 0 for every k > K, then we are done.Otherwise, we focus on those inf [0,1] f k < 0 and find 0 Note that the existence of t k,1 and t k,2 are guaranteed by Lemma 4.3 (v).In the following, we take δ = 1/12 and consider two scenarios.
), where ∂ is the subgradient operator.Then Lemma 4.3(v) implies that a k,1 < 0 and a k,2 > 0. Since both a k,1 (s−t k,1 ) and a k,2 (s − t k,2 ) can be regarded as supporting hyperplanes of f k by convexity, it follows that By the convexity of f k , we see that the first term is no smaller than 2δM 2 k /3.For the second term, we can again use Dudley's theorem and the fact that the following class has entropy of order η −1/2 in L 2 to argue that sup Therefore, the second term is at most O(M k ).Then we can use the argument in the proof of Lemma 4.1 to establish that a.s.lim sup k→∞ M k < ∞.
Lemma 4.6.For any fixed t ∈ (0, 1) and almost every ω Proof of Lemma 4.6 Let ∆ = min(t, 1 − t)/2.In view of Lemma 4.4, for sufficiently large k, we can assume that f k has at least one knot in (0, t − ∆], and one knot in [t + ∆, 1).Denote these two points by τ − k and τ + k respectively.By the convexity of f k , there exists some c ∈ R such that where Proof of Lemma 4.7 By Lemma 4.3(iv), for any t ∈ [0, 1], min 0, inf Consequently, it follows from Lemma 4.5 that sup t∈ .4 says that one can always find a knot τ ∈ (0, 1) with X(τ ) = H k (τ ) for all sufficiently large k.Thus, the boundedness of sup t∈ Finally, one can derive the sample boundedness of sup t∈[0,1] |H k (t)| k by using the equality In fact, they are uniformly Hölder continuous with exponent less than 1/4.

Proof of Lemma 4.8
Here we only show that the family {H k } k is uniformly equicontinuous.Fix any 0 < δ < 1.For any 0 ≤ t 1 ≤ t 2 ≤ 1 with t 2 − t 1 < δ, by the convexity of f k , In the following, we shall focus on δ 0 f k (s)ds.The term 1 1−δ f k (s)ds can be handled in exactly the same fashion.By Lemma 4.4, S(f k ) is non-empty for every k > K, where K is a sufficiently large positive integer.Furthermore, in view of Lemma 4.5, we can assume that inf k∈N inf [0,1] f k ≥ −M for some M > 0.
(a) There exists at least one knot for some M > 0 (which is the α-Hölder constant of this particular realization of X(t)).
The last line is due to Lemma 4.3(iv) and the fact that X(t) is α-Hölder-continuous.
We remark that since f k (0) = k, the above conclusion also implies that S(f k )∩(0, δ α ] = ∅ for all sufficiently large k.
, so we can use essentially the same argument as above to see that To finish the proof, we shall apply Lemma 4.5 to verify that

Proof of Theorem 2.4
For every m ∈ N with m ≥ 3, define the following norms First, we show the existence of such a function for almost all ω ∈ Ω by construction.Fix ω (thus we focus on a particular realization of X(t) but suppress its dependence on ω in the notation).Let H k be the function satisfying H k = f k , H k (0) = 0 and H k (1) = Y (1).We claim that the sequence H k admits a convergent subsequence in the topology induced by the norm • m .
By Lemma 4.6, we may assume that for t = 1/m and t In the following, we show that H has the properties listed in the statement of the theorem.
(3) -( 4) Since H k (0) = Y (0) and H k (1) = Y (1) for all k ∈ N, we have H(0) = Y (0) and In light of Lemma 4.4, one can assume that lim k→∞ τ − k = 0 and lim k→∞ τ + k = 1.The sample continuity of X(t), together with the property of knots (see (iii) of Lemma 4.3), entails that H k (τ − k ) → X(0) and H k (τ + k ) → X(1) as k → ∞.Finally, we use the uniform convergence and the continuity of H to establish H (0) = X(0) and H (1) = X(1). (5 This completes the proof of existence.It remains to show the uniqueness of H. Suppose that there are H 1 and H 2 satisfying Conditions (1) -( 5) listed in the statement of Theorem 2.4.For notational convenience, we write h 1 = H 1 and h 2 = H 2 .Then, 1 2 where we used Conditions (1) -( 5) of H 2 to derive the last inequality.By swapping H 1 and H 2 , we further obtain the following inequality 1 2 Adding together the above two inequalities yields 0 2 dt, which implies the uniqueness of H on (0, 1).The uniqueness of H then follows from its third condition.

Proof of Corollary 2.5
We can easily verify the existence of such a function by using the same construction as in the proof of Theorem 2.4. In particular, if Y (t) does not have parabolic tangents at both t = 0 and t = 1 (which happens a.s. according to Lemma 4.2), then On the other hand, if H (0 + ) → ∞, there must be a sequence of knots τ 1 , τ 2 , . . . of H with lim j→∞ τ j = 0. In view of Conditions (1), (2) and (5), one necessarily has H(τ j ) = Y (τ j ) and H (τ j ) = X(τ j ) for every j. The fact that H, H , Y and X are all continuous entails that H(0) = Y (0) and H (0) = X(0). Consequently, Condition (3') implies Condition (3). We now apply the same argument to H (1 − ) to conclude that Condition (4') implies Condition (4). Hence, in view of Theorem 2.4, H is unique.
Appendix II: pointwise adaptation for cases (B) and (C)
The following three lemmas are required to prove Theorem 2.11.
Lemma 4.9. For any α > 1, inf k∈[0,1] Proof of Lemma 4.9 It suffices to show that for any k ∈ [0, 1], First, it is easy to check that the above inequality holds true when k = 0. In the case of k > 0, we can restate the inequality to be proved as Next, we define m = k/(1 + k) ∈ [0, 1/2], so that (4.3) can be rewritten as .
Proof of Lemma 4.10 First, it is easy to check that (4.5) If τ ≤ x 0 , then (4.5) can be expressed as On the other hand, if τ > x 0 , then after some elementary calculations, we can show that (4.5) is equal to Denote by k = (τ − x 0 )/(τ + − τ ), so that (4.6) can be rewritten as where C f 0 > 0 is a constant that only depends on f 0 , and where we applied Lemma 4.9 with the fact that k ∈ [0, 1] to derive the above displayed equation.Consequently, by setting K f 0 = C f 0 /2 α+2 , it is straightforward to check that (4.6) is greater than or equal to Lemma 4.11.Let F be a collection of functions defined on [x 0 − δ, x 0 + δ], with δ > 0 small.Suppose that for a fixed x ∈ [x 0 − δ, x 0 + δ] and every 0 for some d ≥ 1/2 fixed and K > 0 depending only on x 0 and δ.Moreover, suppose that Let ξ n be an arbitrary sequence of numbers converging to x 0 .Define η − n = max{t ∈ S( fn ) : t < ξ n } and η Here one can apply Lemma A.1 of Balabdaoui and Wellner (2007) with k = α and d = 2 to verify the above extension.By proceeding as in the proof of Lemma 4.3 of Groeneboom, Jongbloed and Wellner (2001b), one can verify that there exists some In fact, it is easy to see that the above conclusion still holds if we change our assumption from τ 2α+1) ).Consider the behavior of t 0 Fn (s)ds (as a function of t) at the middle point τ It was shown in Lemma 4.2 of Groeneboom, Jongbloed and Wellner (2001b) that In view of Lemma A.1 of Balabdaoui and Wellner (2007) with k = α and d = 2, for any > 0. On the other hand, by Lemma 4.10, we have for some constant K f 0 > 0 that only depends on the density function f 0 .Combining (4.8), (4.9) and (4.10) together yields τ for any small > 0. 
On the other hand, in view of Lemma 4.10 and identity (4.12), some elementary calculations yield Plugging the above two equations into (4.The rest of the proof is similar to (b) in the proof of Theorem 2.10.By the law of the iterated logarithm for local empirical processes, where the last equality follows from the linearity of fn (t) and f 0 (t) − K 2 (t − x 0 ) α 1 {t>x 0 } on [τ − n , τ + n ].Since our assumption in (iv) guarantees that 1 ≤ (τ rearranging the terms in the above displayed equation leads to max fn (x 0 ) − f 0 (x 0 ), 0 (4.13) Finally, as τ + n − τ − n ≥ n −1/(2α+1) , we can plug (4.7) and (4.11) into (4.13) to verify max fn (x 0 ) − f 0 (x 0 ), 0 ≤ O p n −α/(2α+1) log log n .

Appendix III: technical details of other theoretical results
Proof of Corollary 2.2 A closer look at the proof of Theorem 2.1 reveals that we have where the final equation follows from Corollary 2.2.
We prove this by contradiction. Suppose that lim inf k→∞ inf t∈[0,1] {G k (t) − G 0 (t)} = −M for some M > 0. By extracting subsequences if necessary, we can assume that inf t∈[0,1] {G k (t) − G 0 (t)} → −M as k → ∞. In view of (4.14), it follows from the convexity of G k and G 0 that one can find an interval I k of positive length δ (which can depend on M ) such that inf t∈I k {G k (t) − G 0 (t)} ≤ −M/2 for every k > K, where K is a sufficiently large integer. This implies that From this perspective, it is easy to check that Proposition 3.1, Proposition 3.2 and Proposition 3.3 follow from, respectively, slight modifications of Lemma 2.1 and Lemma 2.2 of Groeneboom, Jongbloed and Wellner (2001b) and Theorem 1 of Dümbgen, Rufibach and Wellner (2007).
Proofs of Theorem 2.8, Theorem 2.9 and Theorem 3.5 are also omitted for the sake of brevity, since the arguments are very similar to those shown previously in Section 2.1.
} k are bounded.By the convexity, {f k } k have uniformly bounded derivatives on [1/m, 1 − 1/m] so are uniformly bounded and equicontinuous.Therefore, Arzelà-Ascoli theorem guarantees that the sequence f k has a convergent subsequence f k l in the supremum metric on [1/m, 1 − 1/m].Extracting further subsequences if necessary, one can get f k l converging in the topology induced by the L ∞ norm on [1/m, 1 − 1/m] for m = 3, 4, . ... Now by Lemma 4.7 and Lemma 4.8, we can assume that {H k } k and {H k } k are bounded and equicontinuous on [0, 1].By Arzelà-Ascoli theorem again, we are able to extract further subsequences if necessary to make H k l converge in the topology induced by the norms • m for m = 3, 4, . ... We denote the function that H k l converges to by H.
By Lemma 4.5, we see that for almost every ω ∈ Ω, lim sup k→∞ f k (t) is bounded.Combining this with the lower bound we established previously entails the boundedness of {f k 1 for large n.This implies that s n ≥ α for sufficiently large n.It now follows from Lemma 4.11 (with d = 2 and s 0