A two stage $k$-monotone B-spline regression estimator: Uniform Lipschitz property and optimal convergence rate

This paper considers k-monotone estimation and the related asymptotic performance analysis over a suitable Hölder class for general k. A novel two stage k-monotone B-spline estimator is proposed: in the first stage, an unconstrained estimator with optimal asymptotic performance is considered; in the second stage, a k-monotone B-spline estimator is constructed (roughly) by projecting the unconstrained estimator onto a cone of k-monotone splines. To study the asymptotic performance of the second stage estimator under the sup-norm and other risks, a critical uniform Lipschitz property for the k-monotone B-spline estimator is established under the ∞-norm. This property uniformly bounds the Lipschitz constants associated with the mapping from a (weighted) first stage input vector to the B-spline coefficients of the second stage k-monotone estimator, independent of the sample size and the number of knots. This result is then exploited to analyze the second stage estimator performance and develop convergence rates under the sup-norm, pointwise, and Lp-norm (with p ∈ [1,∞)) risks. By employing recent results in k-monotone estimation minimax lower bound theory, we show that these convergence rates are optimal. MSC 2010 subject classifications: Primary 62G08; secondary 62G20.


Introduction
For fixed k ∈ N, a real-valued univariate function is said to be k-monotone if its (k − 1)-th derivative is increasing. Examples of k-monotone functions include monotone functions (with k = 1) and convex functions (with k = 2). The study of k-monotone regression concerns the nonparametric estimation of an underlying k-monotone function such that a constructed estimator preserves the k-monotone shape. Belonging to the general framework of shape constrained estimation in nonparametric statistics, this problem has garnered substantial attention in statistics and approximation theory, due to the vast number of applications; see [10,25,32,35,36] and references therein. The surging interest in shape constrained estimation is driven by the observation that incorporating shape constraints into estimator construction often improves estimator accuracy and efficiency [31].
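As a concrete illustration (not from the paper), k-monotonicity of a smooth function can be probed numerically: on a uniform grid, the (k − 1)-th derivative being increasing corresponds to nonnegative k-th order forward differences of the samples. A minimal Python sketch:

```python
import numpy as np

def is_k_monotone(f, k, grid):
    """Numerically probe k-monotonicity of f on a uniform grid:
    for smooth f, the (k-1)-th derivative is increasing iff the
    k-th order forward differences of the samples are nonnegative."""
    vals = f(np.asarray(grid))
    return bool(np.all(np.diff(vals, n=k) >= -1e-12))

x = np.linspace(0.0, 1.0, 200)
print(is_k_monotone(np.exp, 3, x))            # -> True: exp is k-monotone for every k
print(is_k_monotone(lambda t: -t**3, 3, x))   # -> False: third derivative is -6 < 0
```

With k = 1 this check reduces to monotonicity and with k = 2 to convexity, matching the examples named above.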
Despite extensive research on monotone and convex regression, no asymptotic performance analysis results are available for k-monotone regression estimators when k > 2, especially under the sup-norm risk, even though the (shape preserving) spline approximation of k-monotone functions for larger k has been studied in approximation theory [19,20,21,30] and numerous applications are available. Such applications include insurance (with k = 3, 4) [12], qualitative simulation (with k = 3) [22], and attitude control (with k = 3) [32]. In particular, a prudent utility function u(·), in the context of risk management [12], has a nonnegative third derivative (and is thus 3-monotone), whereas a temperate utility function u(·) has a nonpositive fourth derivative (and thus −u(·) is 4-monotone). Additionally, when solving systems of qualitative differential equations [22], nonnegative third order derivative constraints are used to reduce spurious distinctions between qualitative behaviors. Finally, in the attitude control system treated in [32], the control force F is a nonlinear function of the input voltage V, and experimental studies show that F^(3) ≥ 0, i.e., F is 3-monotone.
Motivated by the higher-order k-monotone constraints arising in applications and the lack of statistical performance analysis, we consider a two stage k-monotone B-spline regression estimator for arbitrary k ∈ N and study its asymptotic performance under the sup-norm, pointwise, and L_p-norm (with p ∈ [1, ∞)) risks. B-splines are popular in approximation and estimation theory due to their numerical advantages [8,10]. One can easily impose k-monotone constraints on B-spline coefficients, which can be efficiently computed via quadratic programs in polynomial time. Moreover, recent results show that when k = 1, 2, k-monotone B-spline estimators outperform many other constrained estimators [37,38,41].
For instance, it is known that the popular monotone and convex least squares estimators lack smoothness and cannot achieve uniform convergence since they are inconsistent on the boundary with non-negligible asymptotic bias [28,42]; these deficiencies are overcome by the monotone and convex B-spline estimators [37,41]. In spite of numerical efficiency and performance advantages, the asymptotic analysis of k-monotone B-spline estimators is nontrivial, particularly when the sup-norm risk is considered.

Challenges in k-monotone estimation
A major challenge in k-monotone spline estimation arises from the difficulty of accurately approximating certain k-monotone functions by splines that preserve the k-monotone shape. It is known that when k > 3, the accuracy of a k-monotone spline approximation associated with a given knot sequence is hampered by the k-monotone constraint [21]. Consequently, the bias of a k-monotone B-spline estimator associated with a given knot sequence is unsatisfactorily large if the knot sequence is coarse. In particular, knot sequences typically utilized to produce unconstrained B-spline estimators with optimal (asymptotic) performance are too coarse to be directly used in the production of k-monotone B-spline estimators. Further, the noisy data prevents us from choosing (asymptotically) finer knot sequences to obtain a more accurate k-monotone B-spline estimator. Hence, a single stage k-monotone B-spline estimator cannot achieve the optimal asymptotic performance under the sup-norm risk when k > 3.
Another major challenge in the asymptotic analysis of k-monotone B-spline estimators emerges from the need for a deep understanding of the mapping from a (weighted) input vector to the corresponding B-spline coefficient vector associated with the k-monotone constraint, where the size of the input vector is related to that of the knot sequence. For fixed knot and design point sequences, it is known in optimization theory that this mapping is a Lipschitz piecewise linear function due to the k-monotone inequality constraint. As the sample and knot sequence sizes increase and tend to infinity, an infinite family of size-varying piecewise linear functions arises. A fundamental question in the asymptotic analysis asks how the Lipschitz constants of these piecewise linear functions behave as the knot sequence size tends to infinity. A critical uniform Lipschitz property has been established for monotone P-splines (corresponding to k = 1) [37] and convex B-splines (corresponding to k = 2) [41]. This property states that the size-varying piecewise linear mappings from the (weighted) input vector to the constrained B-spline coefficient vector attain a uniform Lipschitz constant under the ∞-norm, independent of the number of linear pieces, design points, and knots. It leads to many important results in the asymptotic performance analysis, e.g., uniform convergence, optimal convergence rates, and the sup-norm and pointwise mean squared risks [41]. It has been conjectured that this property can be extended to k > 2 [41]. However, this extension encounters several difficulties: the proof of this property for the monotone and convex cases (k = 1 or 2) relies on the diagonal dominance of certain matrices, which no longer holds for larger k. In addition, the results in [37,41] are based on a restrictive assumption of evenly spaced design points and knots, and the extension to the unevenly spaced case is nontrivial.

Contributions
This paper addresses the aforementioned challenges and presents new developments in k-monotone estimation. We summarize these results and contributions as follows.
(1) As one of the key contributions of this paper, we establish the uniform Lipschitz property for general k ∈ N via a new technique (cf. Theorem 3.1). This technique relies on a deep result in B-spline theory (dubbed de Boor's conjecture) first proved by A. Shadrin [34]; see [16] for a recent, simpler proof. Informally speaking, this result says that the ∞-norm of the inverse of the Gramian formed by the normalized B-splines of order k is uniformly bounded, independent of the knot sequence and the number of B-splines (cf. Theorem 7.1). Inspired by this result, we construct (nontrivial) coefficient matrices for the piecewise linear functions corresponding to the k-monotone constraint and relate these matrices to suitable B-splines. This yields the uniform bounds on the Lipschitz constants in the ∞-norm for arbitrary k and possibly unevenly spaced design points and knots; see Section 7.1 for an overview of the proof.
(2) We propose a novel two stage k-monotone B-spline estimator. In the first stage, an unconstrained estimator with optimal performance is used; in the second stage, a k-monotone B-spline estimator is constructed by projecting the first stage estimator onto the cone of k-monotone B-splines with a suitable knot sequence. The second stage estimator can be quickly produced by solving a strictly convex quadratic program with a small number of variables; see Section 3.2. The two stage estimator has several notable advantages: it is simple, flexible, and can be efficiently computed. More importantly, the optimal first stage estimator allows the second stage constrained B-spline estimator to obtain optimal bias and stochastic error, a key motivation for the two stage estimator.
(3) By leveraging the uniform Lipschitz property, k-monotone approximation theory, and statistical techniques, we derive (asymptotically) optimal uniform bounds on the estimator risk with respect to the sup-norm, pointwise, and L_p-norm (with p ∈ [1, ∞)) errors over a Hölder class. These bounds lead to the uniform convergence and convergence rates under the sup-norm, pointwise, and L_p-norm (with p ∈ [1, ∞)) risks, including boundary consistency; note that many constrained estimators, such as the monotone least squares estimator, fail to achieve boundary consistency [28,42].
(4) Finally, we show that the proposed k-monotone estimator obtains an optimal convergence rate, and thus the optimal minimax risk, over a suitable Hölder class in the sup-norm, by exploiting the recent development in minimax lower bound theory for k-monotone estimators [23,Chapter V]. This result is the first of its kind to the best of our knowledge.
The present paper not only recovers the monotone and convex estimation results and leads to new asymptotic analysis results for arbitrary k ∈ N, but also treats k-monotone estimation in a unified framework and paves the way for the study of constrained estimation under a broader class of shape constraints.

Organization and notation
The paper is organized as follows. In Section 2, we introduce the k-monotone estimation problem and k-monotone splines. In Section 3, we present the two stage k-monotone B-spline estimator, characterize the second stage estimator, and state the uniform Lipschitz property. With the help of the uniform Lipschitz property, we establish optimal convergence rates for the two stage estimator in Section 4. Two numerical examples illustrate the performance of the second stage estimator in Section 5. Concluding remarks and the proof of the uniform Lipschitz property are given in Sections 6 and 7 respectively.
Notation. We introduce some notation used in this paper. Define the function δ_ij on N × N so that δ_ij = 1 if i = j, and δ_ij = 0 otherwise. Let I_S be the indicator function of a set S, and let 1_n ∈ R^n and 1_{n1×n2} ∈ R^{n1×n2} be the column vector and the matrix of all ones, respectively. For an index set α, let ᾱ be its complement and |α| be its cardinality. For j ∈ N, let α + j := {i + j : i ∈ α}. For a column vector v ∈ R^n, let v_i be its ith component. For a matrix A, let (A)_ij or (A)_{i,j} be its (i, j)-entry, (A)_{i•} its ith row, and (A)_{•j} its jth column. Given an index set α, let v_α ∈ R^{|α|} be the vector formed by the components of v indexed by the elements of α, and (A)_{α•} the matrix formed by the rows of A indexed by the elements of α. Let (n choose p) denote the binomial coefficient indexed by n, p ∈ N with n ≥ p. For sequences of positive numbers (a_n) and (b_n), we write a_n ≍ b_n if there exist positive constants c, C ∈ R such that c ≤ lim inf_{n→∞} a_n/b_n ≤ lim sup_{n→∞} a_n/b_n ≤ C. For p ∈ [1, ∞), let ‖·‖_{L_p} denote the L_p-norm on [0, 1] (or on some other interval I ⊆ R when the context is clear), so that ‖g‖_{L_p} = (∫ |g(x)|^p dx)^{1/p}.

k-monotone estimation and k-monotone splines
In this section, we introduce the k-monotone estimation problem, and collect some key properties of k-monotone splines to be used in the sequel.

k-monotone estimation
Fix k ∈ N. The class of k-monotone univariate functions on the interval [0, 1] is denoted by S_k (cf. (2)). Additionally, given n ∈ N, consider the family P_n of design point sequences (cf. (3)) for fixed constants c_{ω,1} and c_{ω,2} such that 0 < c_{ω,1} ≤ 1 ≤ c_{ω,2}, where c_{ω,1} is used for the minimax lower bound analysis. This paper focuses on the estimation of functions in S_{k,H}(r, L) := S_k ∩ H^r_L. Specifically, given a sequence of design points (x_i)_{i=0}^n ∈ P_n on the interval [0, 1], consider the regression problem
y_i = f(x_i) + σ ε_i, i = 0, 1, . . . , n,   (4)
where f is an unknown true function in S_{k,H}(r, L), the ε_i's are independent standard normal errors, σ is a positive constant, and the y_i's are samples. The goal of k-monotone estimation is to construct an estimator f̂ that preserves the k-monotonicity of the true function characterized by S_k. With regard to the asymptotic analysis, we are particularly interested in the estimator's uniform convergence on the entire interval [0, 1], as well as its convergence rates when the sample size n is sufficiently large. Note that it is possible to relax the assumption of normality of the errors ε_i in the model (4); however, we assume normal errors to simplify the technical developments in this paper. A two stage k-monotone B-spline regression estimator will be developed for k-monotone estimation problems (cf. Section 3). Since this estimator is given by a k-monotone spline, we discuss such splines in detail in the following subsections.
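To make the sampling model concrete, the following sketch generates data from the regression model (4) at evenly spaced design points; the truth f(t) = t^4, the sample size, and the noise level are all illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200, 0.1                  # illustrative sample size and noise level
x = np.arange(n + 1) / n             # evenly spaced design points on [0, 1]
f = lambda t: t**4                   # f'' = 12 t^2 is increasing, so f is 3-monotone
y = f(x) + sigma * rng.standard_normal(n + 1)   # model (4): y_i = f(x_i) + sigma * eps_i
print(y.shape)                       # -> (201,)
```

Any estimator built below consumes exactly such a pair (x, y) together with the knowledge that the truth lies in S_{k,H}(r, L).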

B-spline properties
We first provide a brief review of B-splines; see the monograph [8] for more details. For a given K ∈ N, let t := {0 = t_0 < t_1 < · · · < t_K = 1} be a sequence of (K + 1) knots in R.
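For readers who want to experiment, the B-spline basis attached to such a knot sequence (with the boundary knots repeated, which is the "usual extension" referred to below) can be generated with SciPy; the partition-of-unity normalization is built in. A sketch, assuming an illustrative order k = 4 (cubic splines) and K = 10:

```python
import numpy as np
from scipy.interpolate import BSpline

k, K = 4, 10                                   # spline order k (degree k - 1), K + 1 knots
interior = np.linspace(0.0, 1.0, K + 1)        # t_0 = 0 < t_1 < ... < t_K = 1
# usual extension: repeat the boundary knots so there are N = K + k - 1 B-splines
t = np.concatenate([np.zeros(k - 1), interior, np.ones(k - 1)])
x = np.linspace(0.0, 1.0, 50)
X = BSpline.design_matrix(x, t, k - 1).toarray()   # row i holds B_{k,1}(x_i), ..., B_{k,N}(x_i)
print(X.shape)                                 # -> (50, 13), i.e., N = K + k - 1 basis splines
print(bool(np.allclose(X.sum(axis=1), 1.0)))   # -> True: partition of unity
```

The matrix X is exactly the design matrix that reappears in the weighted least squares problems of Section 3.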

k-monotone B-splines
In this subsection, we characterize the class of k-monotone splines via B-spline coefficients. Let t := {0 = t_0 < t_1 < · · · < t_{K_n} = 1} be a given sequence of (K_n + 1) knots in [0, 1], and let g_{b,t} : [0, 1] → R be the order k spline g_{b,t} := Σ_{j=1}^{K_n+k−1} b_j B^t_{k,j}, where the b_j's are real coefficients of the B-splines, b := (b_1, . . . , b_{K_n+k−1})^T is the spline coefficient vector, and the subscript n in K_n corresponds to the number of design points to be used in the subsequent sections.
To derive a necessary and sufficient condition for g_{b,t} ∈ S_k, we introduce some matrices. Let D^(j) ∈ R^{j×(j+1)} denote the first order difference matrix, i.e., (D^(j))_{i,i} = −1, (D^(j))_{i,i+1} = 1, and all other entries are zero. If k = 1, let D_{k,t} := D^{(K_n−1)}. In what follows, consider k > 1. For the given knot sequence t with the usual extension t_j = 0 for any j < 0 and t_j = 1 for any j > K_n, define the diagonal matrices Δ_{0,t} := I_{K_n−1} and Δ_{p,t} for each p = 1, . . . , k. To simplify notation, we often drop the subscript t and write D_{p,t} as D_p when the context is clear. Roughly speaking, D_{p,t} denotes the pth order difference matrix weighted by the knots of t. When the knots are evenly spaced, it is almost identical to a standard difference matrix (except on the boundary). Since Δ_{k−p,t} is invertible and D^{(K_n+k−1−p)} has full row rank, it can be shown via induction that D_{p,t} is of full row rank for any p and t.
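The first order matrices D^(j) and their composition are easy to realize in code. The knot-dependent diagonal weights Δ_{p,t} are elided in the extracted text, so the sketch below (an assumption, not the paper's exact definition) composes unweighted first order differences, which is what D_{k,t} reduces to, up to scaling, for evenly spaced knots:

```python
import numpy as np

def first_diff(j):
    """First order difference matrix D^(j) in R^{j x (j+1)}: (D b)_i = b_{i+1} - b_i."""
    D = np.zeros((j, j + 1))
    idx = np.arange(j)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

def kth_diff(k, N):
    """Compose first order differences into a k-th order difference matrix in
    R^{(N-k) x N}: an evenly spaced sketch of D_{k,t}, without the Delta weights."""
    D = np.eye(N)
    for p in range(1, k + 1):
        D = first_diff(N - p) @ D
    return D

D2 = kth_diff(2, 5)
print(D2.shape)                                   # -> (3, 5)
print(bool(np.allclose(D2 @ np.arange(5.0), 0)))  # -> True: 2nd differences kill affine b
```

The full row rank noted above is visible here as well: each composition multiplies a full-row-rank rectangular matrix on the left.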
In what follows, define N := K n + k − 1 for a fixed spline order k ∈ N. Note that N depends on K n and thus on n.
Let B^t_{p,j} denote the B-splines of order p defined by t, with the usual extension. Then the following hold (Lemma 2.1): g_{b,t} ∈ S_k if and only if D_{k,t} b ≥ 0. Proof. We write g_{b,t} as g in the following proof for notational simplicity.

Two stage k-monotone B-spline estimator and the uniform Lipschitz property
In this section, we present the two stage k-monotone B-spline estimator, study the second stage k-monotone B-spline estimator, and state the uniform Lipschitz property. This property is critical to the asymptotic performance analysis.

Two stage k-monotone B-spline estimator
The proposed k-monotone B-spline estimator consists of two stages: in the first stage, we choose any unconstrained estimator that converges to the true function at the optimal rate over the Hölder class H^r_L in the minimax sense; the second stage B-spline estimator is constructed, roughly, by projecting the first stage estimator onto the cone of order k splines associated with S_k and a suitable knot sequence. In what follows, we describe these two stages in detail.

First stage unconstrained estimator
Given a design point sequence P = (x_i)_{i=0}^n ∈ P_n, where c_{ω,1}, c_{ω,2} are fixed (cf. (3)), we choose the first stage estimator f^[1]_P to be any unconstrained estimator that converges to the true function at the optimal asymptotic rate uniformly over H^r_L for a given performance metric, independent of P ∈ P_n. Specifically, when the sup-norm risk is considered, f^[1]_P achieves the optimal rate in (11). Equivalently, there exists a positive constant c_* such that f^[1]_P satisfies (11) for all large n, where c_* depends only on k, r, L, σ, c_{ω,1}, and c_{ω,2}. Alternatively, when the pointwise p-risk or L_p-norm risk with p ∈ [1, ∞) is considered, the first stage estimator achieves (12), or equivalently, there exists a positive constant c_* such that, for any fixed x ∈ [0, 1], (13) holds for all large n, where c_* depends only on p, k, r, L, σ, c_{ω,1}, and c_{ω,2}. The asymptotic error bound on the second stage estimator's L_p-norm risk, established in Corollary 4.1, does not follow from an asymptotic error bound on the first stage estimator's L_p-norm risk. This is because the second stage estimator depends on pointwise evaluations of the first stage estimator at the design points (cf. (15) and (16)). Hence, we require (12) and (13) to hold when the L_p-norm risk is the performance metric under consideration, rather than analogous bounds on the first stage estimator's L_p-norm risk. It is known from nonparametric statistical theory [27,39] that many (unconstrained) estimators meet the above conditions. One example is the local polynomial estimator of order (k − 1), with bandwidth h_n ≍ (log n / n)^{1/(2r+1)} for the sup-norm risk (resp. h_n ≍ n^{−1/(2r+1)} for the pointwise and L_p-norm (with p ∈ [1, ∞)) risks); see [39, Section 1.6]. Another example is the (unconstrained) B-spline estimator. This gives rise to great flexibility in the construction of the two stage k-monotone estimator.
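As a minimal sketch of one admissible first stage estimator, a local polynomial fit of order k − 1 with a boxcar kernel can be written in a few lines; the kernel choice and the bandwidth value are illustrative assumptions, the paper only requires the rate conditions above:

```python
import numpy as np

def local_poly(x_eval, x, y, degree, h):
    """Local polynomial estimator: at each evaluation point, fit a polynomial
    of the given degree to the data within a window of half-width h (boxcar
    kernel) and return the fitted value at the window center."""
    out = []
    for x0 in np.atleast_1d(x_eval):
        win = np.abs(x - x0) <= h
        coef = np.polyfit(x[win] - x0, y[win], degree)
        out.append(coef[-1])          # constant term = fitted value at x0
    return np.array(out)

n = 200
x = np.arange(n + 1) / n
h = (np.log(n) / n) ** (1.0 / 7.0)    # h_n ~ (log n / n)^{1/(2r+1)} with, e.g., r = 3
fhat = local_poly(np.linspace(0, 1, 11), x, x**2, degree=2, h=h)
print(bool(np.allclose(fhat, np.linspace(0, 1, 11) ** 2)))  # exact on a noiseless quadratic
```

On noiseless polynomial data of matching degree the fit is exact, which is a convenient sanity check; on noisy data the bandwidth trades bias against variance as usual.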

Second stage k-monotone B-spline estimator
In this subsection, we describe how the first stage unconstrained estimator f [1] P is utilized to produce the second stage k-monotone B-spline estimator.

T. M. Lebair and J. Shen
Given a sample data vector y = (y_0, y_1, . . . , y_n)^T ∈ R^{n+1} corresponding to the design point sequence P = (x_i)_{i=0}^n ∈ P_n, compute f^[1]_P. Let (K_n) be a sequence of integers depending on the sample size n to be specified later (cf. Theorems 4.1, 4.2, and Corollary 4.1). Additionally, fix constants c_{t,1} and c_{t,2} with 0 < c_{t,1} ≤ 1 ≤ c_{t,2}. For each K_n ∈ N, define the set T_{K_n} of sequences of (K_n + 1) knots on [0, 1] (with the usual extension on the left and right boundary) whose consecutive gaps lie between c_{t,1}/K_n and c_{t,2}/K_n. Let {B^t_{k,j}}_{j=1}^N denote the set of order k B-splines associated with t and the usual extension, scaled such that Σ_{j=1}^N B^t_{k,j}(x) = 1 for all x ∈ [0, 1] (see Section 2.2 for a relevant discussion of the properties of B-splines). The second stage constrained B-spline estimator characterized by S_k is given by f̂_{P,t} := Σ_{j=1}^N (b_{P,t})_j B^t_{k,j} in (15), where the coefficient vector b_{P,t} solves the constrained least squares problem (16), in which x_{n+1} := 1 and the matrix D_{k,t} is defined in (9). It follows from Lemma 2.1 that the constraint D_{k,t} b_{P,t} ≥ 0 is equivalent to the k-monotonicity of the spline f̂_{P,t} ∈ S_k. Hence, the second stage B-spline estimator f̂_{P,t} preserves the k-monotone constraint characterized by S_k. Algorithm 1 below summarizes the construction of the two stage k-monotone B-spline estimator. Note that the second stage estimator f̂_{P,t} can be computed in polynomial time as a function of N by solving a strictly convex quadratic program; see Section 3.2 for more details.

Characterization of the second stage k-monotone B-spline estimator
In this subsection, we further characterize the second stage constrained B-spline estimator f P,t introduced in (15). Toward this end, we summarize all of the assumptions that will be used throughout the rest of the paper for a fixed k ∈ N as follows:
Consider the second stage estimator f̂_{P,t} given by (15) and (16). Let f̂ and b denote f̂_{P,t} and b_{P,t}, respectively, noting that f̂ and b depend on P and t. It follows from Lemma 2.1 that f̂ ∈ S_k.
Define the diagonal matrix Θ := diag(x_1 − x_0, . . . , x_{n+1} − x_n) ∈ R^{(n+1)×(n+1)} and let f^[1] := (f^[1]_P(x_0), . . . , f^[1]_P(x_n))^T (cf. (17)), where f^[1]_P is the first stage estimator (cf. Section 3.1.1); define the weighted input vector y := K_n · X^T Θ f^[1] ∈ R^N, where (X)_{ij} := B^t_{k,j}(x_i). The quadratic optimization problem in (16) for b_{P,t}, or simply b, can be written as
b := arg min_{D_{k,t} b ≥ 0} (1/2) b^T Λ b − b^T y,   (18)
where Λ := K_n · X^T Θ X ∈ R^{N×N}. Note that Λ is positive definite for any given K_n, P, and t. Hence, problem (18) (or equivalently (16)) is a strictly convex quadratic program, which can be solved in polynomial time by many effective numerical solvers. Define Λ̄ ∈ R^{N×N} and ȳ ∈ R^N by (19) for i, j = 1, . . . , N. We note that for n sufficiently large, Λ̄ approximates Λ and ȳ approximates y. Hence, (18) is closely related to b̄ := arg min_{D_{k,t} b ≥ 0} (1/2) b^T Λ̄ b − b^T ȳ. The spline ḡ := Σ_{j=1}^N b̄_j B^t_{k,j}, with b̄_j given by (19), is the L_2-projection of the first stage estimator onto the cone of k-monotone splines with knot sequence t. Hence, the second stage estimator given by (15) and (16) roughly corresponds to the L_2-projection of the first stage estimator onto the cone of k-monotone splines with knot sequence t.
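A problem of the form (18) can be handed to a general-purpose solver; the sketch below assumes the standard QP form min (1/2) bᵀΛb − bᵀy subject to D_{k,t} b ≥ 0. For k = 1 with Λ = I the problem reduces to isotonic regression, which provides an easy correctness check:

```python
import numpy as np
from scipy.optimize import minimize

def second_stage_coeffs(Lam, y, D):
    """Solve the strictly convex QP  min_b 0.5 b'Lam b - y'b  s.t.  D b >= 0."""
    N = Lam.shape[0]
    res = minimize(
        lambda b: 0.5 * b @ Lam @ b - y @ b,
        np.zeros(N),
        jac=lambda b: Lam @ b - y,
        constraints=[{"type": "ineq", "fun": lambda b: D @ b, "jac": lambda b: D}],
        method="SLSQP",
    )
    return res.x

# sanity check: k = 1 and Lam = I project y onto the monotone cone (isotonic regression)
D1 = np.array([[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]])
b = second_stage_coeffs(np.eye(3), np.array([2.0, 1.0, 3.0]), D1)
print(b)   # approximately [1.5, 1.5, 3.0]
```

For production use a dedicated QP solver exploiting the banded structure of Λ and D would be preferable, but the problem size N = K_n + k − 1 is small by design, as noted above.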
We call a function f : R^n → R^n piecewise linear if there are finitely many linear functions {h_s}_{s=1}^r on R^n such that for any x ∈ R^n, f(x) = h_s(x) for some h_s, where each linear function h_s is called a linear piece of f. It is well known that a (continuous) piecewise linear function is globally Lipschitz continuous [33]. Note that b in (18) can be treated as a function of the weighted input vector y. In fact, since the matrix Λ is positive definite for any given K_n, P, and t, it follows from the standard theory of quadratic programming that the function b : R^N → R^N from y to b(y) is piecewise linear and globally Lipschitz continuous [33]. To establish the piecewise linear formulation of b, or equivalently the linear pieces of b, we consider the KKT conditions for (18), given in (20), where χ ∈ R^{K_n−1} is the Lagrange multiplier and u ⊥ v means that the vectors u and v are orthogonal. Similar to [37,38,41], define the index set α as in (21) for any given y ∈ R^N, where α may be the empty set. For each α associated with y, the KKT conditions in (20) become (22). For a given index set α, let F^T_ᾱ ∈ R^{N×(|ᾱ|+k)} be a matrix whose columns form a basis for the null space of (D_{k,t})_{α•}, or simply (D_k)_{α•}. This implies that F_ᾱ depends on both α and t. In particular, if α is the empty set, then N = K_n + k − 1 = |ᾱ| + k, and F^T_ᾱ is the identity matrix of order N. To express b(y) in terms of F_ᾱ for an arbitrary α, we deduce from (22) that
b(y) = F^T_ᾱ (F_ᾱ Λ F^T_ᾱ)^{−1} F_ᾱ y,
where we use the fact that F_ᾱ Λ F^T_ᾱ is invertible, since F_ᾱ has full row rank and Λ is positive definite.
For any index set α given in (21), define the linear function b^α_{P,t} : R^N → R^N by
b^α_{P,t}(y) := F^T_ᾱ (F_ᾱ Λ F^T_ᾱ)^{−1} F_ᾱ y =: W_α y,
where F_ᾱ depends on t and α, and Λ depends on K_n, P, t. Clearly, b^α_{P,t} is a linear piece of the piecewise linear function b : R^N → R^N, and b has 2^{K_n−1} linear pieces for a given K_n. Finally, observe that for any invertible matrix R ∈ R^{(|ᾱ|+k)×(|ᾱ|+k)},
(R F_ᾱ)^T ((R F_ᾱ) Λ (R F_ᾱ)^T)^{−1} (R F_ᾱ) = F^T_ᾱ (F_ᾱ Λ F^T_ᾱ)^{−1} F_ᾱ.
Thus any choice of F_ᾱ leads to the same b^α_{P,t}, provided that the columns of F^T_ᾱ form a basis of the null space of (D_k)_{α•}.
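The closed form of a linear piece, and its invariance under the choice of null-space basis, can be checked numerically. In the sketch below (with a random positive definite Λ and a first order difference matrix standing in for the constraint rows), `null_space` supplies one particular basis for the active rows' null space:

```python
import numpy as np
from scipy.linalg import null_space

def linear_piece(Lam, D, active):
    """W = F^T (F Lam F^T)^{-1} F, where the columns of F^T span the null space
    of the active rows D[active] (the unconstrained piece Lam^{-1} if none are active)."""
    if len(active) == 0:
        return np.linalg.inv(Lam)
    F = null_space(D[active]).T
    return F.T @ np.linalg.solve(F @ Lam @ F.T, F)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
Lam = A @ A.T + 4 * np.eye(4)                 # a positive definite Lam
D = np.diff(np.eye(4), axis=0)                # first order difference constraint rows

W1 = linear_piece(Lam, D, [0])
F = null_space(D[[0]]).T
R = rng.normal(size=(3, 3)) + 3 * np.eye(3)   # an invertible change of basis
F2 = R @ F
W2 = F2.T @ np.linalg.solve(F2 @ Lam @ F2.T, F2)
print(bool(np.allclose(W1, W2)))              # -> True: the piece is basis independent
```

This mirrors the displayed identity: replacing F by RF leaves the coefficient matrix of the linear piece unchanged.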

Uniform Lipschitz property
As indicated in the previous subsection, the piecewise linear function b_{P,t}(·) (or simply b(·)) is Lipschitz continuous for fixed K_n, P, t. An important question asks whether the Lipschitz constants of the size-varying b with respect to the ∞-norm are uniformly bounded, independent of K_n, P, and t, as long as the numbers of design points and knots are sufficiently large. If so, we say that b satisfies the uniform Lipschitz property. Originally introduced and studied in [37,38,40,41] for monotone P-splines (k = 1) and convex B-splines (k = 2) with equally spaced design points and knots, this property is shown to play a crucial role in the uniform convergence and asymptotic analysis of k-monotone B-spline estimators. In this paper, we establish this property for k-monotone B-spline estimators with general k, under relaxed conditions on the design points and knots. For any p, K_n ∈ N, and t ∈ T_{K_n}, note that for any t_j ∈ t, we have c_{t,1}/K_n ≤ t_j − t_{j−1} ≤ c_{t,2}/K_n. Moreover, in view of t_{j−p} = 0 for any j ≤ p and t_j = 1 for any j ≥ K_n, analogous two-sided bounds of order p/K_n (capped at 1) can be shown for the knot gaps t_{j+p−1} − t_{j−1}, for each j = 1, . . . , K_n + p − 1. Using the above notation, the following theorem states one of the main results of the paper, i.e., the uniform Lipschitz property of b_{P,t}(·). Since the proof of this property is technical, we postpone it to Section 7 to maintain a smooth paper flow.
Theorem 3.1. Let k ∈ N and constants c_{ω,1}, c_{ω,2}, c_{t,1}, c_{t,2} be fixed, where 0 < c_{ω,1} ≤ 1 ≤ c_{ω,2} and 0 < c_{t,1} ≤ 1 ≤ c_{t,2}. For any n, K_n ∈ N, let b_{P,t} : R^{K_n+k−1} → R^{K_n+k−1} be the piecewise linear function in (18) corresponding to the kth order B-splines defined by the design point sequence P ∈ P_n and the knot sequence t ∈ T_{K_n}. Then there exists a positive constant c_∞, depending on k and c_{t,1} only, such that for any increasing sequence (K_n) with K_n → ∞ and K_n/n → 0 as n → ∞, there exists n* ∈ N, depending on (K_n) (and the fixed constants k, c_{ω,2}, c_{t,1}, c_{t,2}) only, such that for any P ∈ P_n and t ∈ T_{K_n} with n ≥ n*,
‖b_{P,t}(y) − b_{P,t}(y′)‖_∞ ≤ c_∞ ‖y − y′‖_∞ for all y, y′ ∈ R^{K_n+k−1}.
The above result can be refined for a particular sequence of P and t.
Corollary 3.1. Let (K_n) be an increasing sequence with K_n → ∞ and K_n/n → 0 as n → ∞, and let (P_n, t_n) be a sequence in P_n × T_{K_n}. Then there exists a positive constant c_∞, independent of n, such that for each n,
‖b_{P_n,t_n}(y) − b_{P_n,t_n}(y′)‖_∞ ≤ c_∞ ‖y − y′‖_∞ for all y, y′ ∈ R^{K_n+k−1}.
This corollary recovers the past results on the uniform Lipschitz property for k = 1, 2 (e.g., [37,41]) for equally spaced design points and knots.
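For k = 1 with identity weights, b(·) is isotonic regression (computable by pool adjacent violators), and the uniform Lipschitz property can be observed directly: the empirical ∞-norm Lipschitz ratio never exceeds 1, whatever the input size. A self-contained sketch; the constant 1 is specific to this toy case, while the paper's c_∞ covers general k, Λ, and knots:

```python
import numpy as np

def pava(y):
    """Isotonic regression of y via pool adjacent violators: the projection of y
    onto the monotone cone, i.e., b(y) in the k = 1, identity-weight case."""
    merged = []
    for v in map(float, y):
        merged.append([v, 1])
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            v2, w2 = merged.pop()
            m = merged[-1]
            m[0] = (m[0] * m[1] + v2 * w2) / (m[1] + w2)
            m[1] += w2
    return np.array([m[0] for m in merged for _ in range(m[1])])

rng = np.random.default_rng(1)
worst = 0.0
for _ in range(200):
    y1, y2 = rng.normal(size=20), rng.normal(size=20)
    ratio = np.abs(pava(y1) - pava(y2)).max() / np.abs(y1 - y2).max()
    worst = max(worst, ratio)
print(worst <= 1 + 1e-12)   # -> True: empirical sup-norm Lipschitz constant <= 1
```

This nonexpansiveness of isotonic regression in the sup-norm is the simplest instance of the phenomenon that Theorem 3.1 extends to general k and unevenly spaced design points and knots.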

Two stage k-monotone estimator risk and optimal convergence rate
In this section, we exploit the uniform Lipschitz property to establish the sup-norm, pointwise, and L_p-norm (with p ∈ [1, ∞)) risks of the second stage k-monotone B-spline estimator. In particular, we show that the second stage k-monotone B-spline estimator achieves the same asymptotic convergence rate as the first stage (unconstrained) estimator over S_{k,H}(r, L) under the sup-norm risk, pointwise risk, and L_p-norm risk. This result leads to the optimal convergence rate of the two stage k-monotone B-spline estimator f̂_{P,t} over S_{k,H}(r, L) in the sup-norm (cf. Theorem 4.4).

Second stage k-monotone B-spline estimator risks
To develop the risks of the second stage k-monotone B-spline estimator f̂_{P,t}, we first establish the following lemma, which concerns the approximation of k-monotone functions f ∈ S_{k,H}(r, L) by k-monotone splines associated with a given knot sequence. It will be used to bound the second stage estimator risks in the subsequent development. For a given knot sequence t ∈ T_{K_n}, let S^t_{+,k} denote the cone of k-monotone splines of order k with knot sequence t. Recall that r ∈ (k − 1, k] for a fixed k ∈ N, and define q := min{r − k + 3, r}.
Proof. Consider the following two cases:
Case 1: q = r. In this case, we have r ≤ r − k + 3, so that k ≤ 3, i.e., k = 1, 2, or 3. It is known from [4,9] that the lemma holds for k = 1, by taking f_B to be the piecewise constant least squares approximation or interpolant of f. Similarly, when k = 2, the lemma holds by taking f_B to be the piecewise linear interpolant of f at the knots in t [3,4,41]. Finally, when k = 3, it is shown in [20] that the lemma holds for evenly spaced knots, and in [30] that a similar result holds (where c_B is taken to be a slightly larger constant) for unevenly spaced knots. An alternative construction of the k-monotone constrained f_B is given in [23, Appendix A], demonstrating that when k = 3, the lemma holds for unevenly spaced knots with c_B = (3/2) L c_{t,2}^3. Hence, the lemma holds in Case 1.
Case 2: q < r. In this case, we have r > r − k + 3 = q, so that k > 3. Define g := f^{(k−3)} ∈ S_{3,H}(r + 3 − k, L). By the results of Case 1, there exists a 3-monotone spline g_B satisfying the conclusion of the lemma for g. Let f_B be a suitable (k − 3)-fold antiderivative of g_B. Then f_B ∈ S_{k,H}(r, L) and the desired bound holds for all x ∈ [0, 1], where, if k > 4, iterated integrals of any integrable function h are defined in the usual manner.
The above lemma establishes a uniform bound on the error that arises from approximating a k-monotone function f ∈ S_{k,H}(r, L) by a k-monotone spline for a fixed knot sequence. With the help of this result, we obtain the following theorem, which bounds the risk and establishes a convergence rate for the second stage k-monotone B-spline estimator under the sup-norm over this Hölder class.
Theorem 4.1. Suppose the assumptions of Section 3.2 hold and the first stage estimator achieves (11). Then there exists a constant C_k > 0, depending only on k, r, L, c_{t,1}, c_{t,2}, c_{ω,1}, c_{ω,2}, and σ, such that for all n sufficiently large,
sup_{f ∈ S_{k,H}(r,L)} E[ ‖f̂_{P,t} − f‖_∞ ] ≤ C_k (log n / n)^{r/(2r+1)}.
Proof. Fix f ∈ S_{k,H}(r, L). For notational simplicity, let f̂ and f^[1] denote f̂_{P,t} and f^[1]_P, respectively, for any given P ∈ P_n and t ∈ T_{K_n}. By Lemma 4.1, for each K_n ∈ N and t ∈ T_{K_n}, there exists a k-monotone spline f_B satisfying the bound of Lemma 4.1, and we bound ‖f̂ − f_B‖_∞ as follows. In view of f_B ∈ S^t_{+,k} and Lemma 2.1, there exists a coefficient vector b_B with D_{k,t} b_B ≥ 0 representing f_B; let f_B := (f_B(x_0), . . . , f_B(x_n))^T ∈ R^{n+1}. In light of the results in Section 3, we have (25). By the uniform Lipschitz property stated in Theorem 3.1, there exist a uniform positive constant c_∞ and a large n* ∈ N such that (26) holds for all n ≥ n*, where f^[1] is defined in (17). Additionally, it follows from the definition of X^T and the non-negativity, upper bound, and support of the B^t_{k,j}'s shown in Section 2.2 that (27) holds for each j = 1, . . . , N, whenever n is sufficiently large such that 2c_{ω,2}K_n/n ≤ c_{t,2}. Combining (25)-(27), we obtain (28). In view of (11) and taking the expectation of both sides of the above inequality, we obtain the desired bound for all n sufficiently large. This yields the desired result with C_k := (c_B + c_*) c_∞ c_{t,2}(k + 1) + c_B.
The above theorem establishes the uniform convergence of the estimator f P,t on the entire interval [0, 1], as well as this estimator's consistency, including the consistency at the two boundary points, over a Hölder class. Note that the monotone and convex least-squares estimators are inconsistent at the boundary points due to a non-negligible asymptotic bias [17,28,29,42], which is known as the spiking problem. Furthermore, the asymptotic convergence rate established in Theorem 4.1 will be shown to be optimal in Section 4.2.
An analogous result can be established under the pointwise risk by using the uniform Lipschitz property and piecewise linear function theory [33].
Theorem 4.2. Suppose the first stage estimator achieves (13). Then there exists a constant C_{p,k} > 0, depending only on p, k, r, L, c_{t,1}, c_{t,2}, c_{ω,1}, c_{ω,2}, and σ, such that for all n sufficiently large,

Proof. Fix arbitrary f ∈ S_{k,H}(r, L) and x* ∈ [0, 1]. For any P ∈ P_n and t ∈ T_{K_n}, we again let f̂ and f^[1] denote f̂_{P,t} and f^[1]_P, respectively. By an argument similar to that in the proof of Theorem 4.1, there exists a k-monotone spline f_B = Σ_{j=1}^N b̃_j B^t_{k,j} approximating f on [0, 1], for some b̃ := (b̃_1, . . . , b̃_N)^T ∈ R^N. In what follows, we establish a uniform bound on the pointwise error |f̂(x*) − f(x*)|; for any given P and t, recall the index sets α defined in (21). Since b_{P,t}(·) : R^N → R^N is a Lipschitz piecewise linear function on R^N, it follows from piecewise linear function theory [33] that b_{P,t}(·) admits a conic subdivision of R^N, which consists of q_N polyhedral cones that partition R^N. On the sth polyhedral cone, where s = 1, . . . , q_N, the function b_{P,t}(·) coincides with a linear function whose coefficient matrix is given by W_{α_s} for some index set α_s defined in (21). By an argument similar to that in [41, Proposition 4.3], we see that for any z, z′ ∈ R^N, there exist real numbers μ_1, . . . , μ_{q_N} ∈ [0, 1] (depending on z and z′) whose sum is one such that b_{P,t}(z) − b_{P,t}(z′) = Σ_s μ_s W_{α_s}(z − z′). Thus for fixed f^[1] and f_B, there exist μ_1, . . . , μ_{q_N} ∈ [0, 1], whose sum is one, such that the corresponding pointwise bound holds, where the inequality follows from an argument similar to that in (28), and f := (f(x_0), f(x_1), . . . , f(x_n))^T ∈ R^{n+1} denotes the noiseless data vector. Furthermore, by (27), the analogous bound holds for all n sufficiently large. To simplify notation, define the random vectors v and w as in (30), together with the bound on their moments. By (13), (30), Jensen's inequality, and the law of total expectation, we obtain the required estimate, noting that the positive constant c_* given in (13) is independent of x ∈ [0, 1]. Combining the above results and (29), and then invoking (28), we obtain the conclusion for all n sufficiently large.

We finally bound the L_p-norm risk of the second stage k-monotone B-spline estimator over a Hölder class in the following corollary.

Corollary 4.1. Let the assumptions of Theorem 4.2 hold (cf. (13)). Then there exists a positive constant C_{p,k}, depending only on k, r, L, c_{t,1}, c_{t,2}, c_{ω,1}, c_{ω,2}, σ, and p, such that for all n sufficiently large,

    sup_{f ∈ S_{k,H}(r,L)} E ‖ f̂_{P,t} − f ‖_{L_p} ≤ C_{p,k} ( (log n)/n )^{r/(2r+1)}.

Proof. By the concavity of the function y ↦ y^{1/p} and the Fubini–Tonelli theorem,

    E ‖ f̂_{P,t} − f ‖_{L_p} ≤ ( ∫_0^1 E | f̂_{P,t}(x) − f(x) |^p dx )^{1/p} ≤ C_{p,k} ( (log n)/n )^{r/(2r+1)}.

The last inequality follows from Theorem 4.2, as the constant C_{p,k} in Theorem 4.2 does not depend on x* ∈ [0, 1].
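The inequality driving this proof, E‖Z‖_{L_p} ≤ (∫ E|Z(x)|^p dx)^{1/p}, is Jensen's inequality for the concave map y ↦ y^{1/p} combined with Fubini–Tonelli. A quick Monte Carlo check on a hypothetical random function sampled on a grid (an illustration of the inequality only, not of the estimator):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
# Z: 10000 realizations of a random function sampled on a 50-point grid of [0, 1].
Z = rng.standard_normal((10000, 50)) + np.linspace(0.0, 1.0, 50)
Y = np.mean(np.abs(Z) ** p, axis=1)   # approximates the integral of |Z(x)|^p per realization
lhs = np.mean(Y ** (1.0 / p))         # approximates E ||Z||_{L_p}
rhs = np.mean(Y) ** (1.0 / p)         # approximates ( integral of E|Z(x)|^p dx )^{1/p}
```

By Jensen's inequality for the concave p-th root, `lhs` can never exceed `rhs`, which is exactly how the pointwise bound of Theorem 4.2 is converted into an L_p-norm bound.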

Optimal convergence rate in the sup-norm
In this subsection, we show that the attained convergence rate indicates the optimal asymptotic performance in the sup-norm. Consider the k-monotone regression problem given by (4). For simplicity, we assume that for each n ∈ N, the design points (x i ) n i=0 are evenly spaced, i.e., x i = i/n, although this assumption is not crucial and can be relaxed. In what follows, we denote an estimator as f n to emphasize the dependence on the sample size n. Theorem 4.3 below establishes a lower bound under the sup-norm on the minimax risk associated with k-monotone estimators over S k,H (r, L). Its proof relies on minimax lower bound theory from nonparametric estimation [39, Section 2] and a careful construction of a suitable collection of functions (called hypotheses) that satisfy the k-monotone constraint. In particular, such a collection of hypothesis functions from S k,H (r, L) satisfies a certain sup-norm separation order and a small total L 2 -distance order, based upon information theoretical results on distances between probability measures in minimax lower bound theory [27,39]. See [23,Chapter V] for the construction and complete proof; a simpler proof using a similar idea for convex estimation is given in [24].
Theorem 4.3. Fix k ∈ N, r ∈ (k − 1, k], and L > 0. Then there exists a positive constant c_k, depending only on k, r, L, and σ, such that for all n sufficiently large,

    inf_{f̂_n} sup_{f ∈ S_{k,H}(r,L)} E ‖ f̂_n − f ‖_∞ ≥ c_k ( (log n)/n )^{r/(2r+1)},

where inf_{f̂_n} denotes the infimum over all k-monotone estimators f̂_n ∈ S_k on the interval [0, 1].
It follows from Theorems 4.1 and 4.3 that the two stage k-monotone B-spline estimator f̂_{P,t} satisfies, for all n sufficiently large,

    c_k ( (log n)/n )^{r/(2r+1)} ≤ inf_{f̂_n} sup_{f ∈ S_{k,H}(r,L)} E ‖ f̂_n − f ‖_∞ ≤ sup_{f ∈ S_{k,H}(r,L)} E ‖ f̂_{P,t} − f ‖_∞ ≤ C ( (log n)/n )^{r/(2r+1)}.

In light of the above result, we obtain the following theorem demonstrating that f̂_{P,t} is an (asymptotically) optimally performing estimator over the constrained function class S_{k,H}(r, L), and the optimal rate of convergence is given by ((log n)/n)^{r/(2r+1)}. This result not only unifies the optimal minimax risk theory for monotone and convex estimation in the sup-norm, but also extends such developments to the general k-monotone case with unevenly spaced design points and/or knots; to the best of our knowledge, it is the first result of its kind.
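For concreteness, the optimal rate ((log n)/n)^{r/(2r+1)} specializes to the familiar monotone (r = 1, exponent 1/3) and convex (r = 2, exponent 2/5) orders. A small sketch tabulating the exponent and the resulting rate at a fixed sample size:

```python
import math

def sup_norm_rate(n, r):
    """Order of the optimal sup-norm risk: (log n / n)^(r / (2r + 1))."""
    return (math.log(n) / n) ** (r / (2.0 * r + 1.0))

# Smoother Holder classes (larger r) yield faster rates at a fixed n.
exponents = [r / (2.0 * r + 1.0) for r in (1, 2, 3)]
rates = [sup_norm_rate(10_000, r) for r in (1, 2, 3)]
```

The exponent r/(2r + 1) increases toward 1/2 as r grows, so the rate improves with smoothness but never beats the parametric-type order, up to the logarithmic factor intrinsic to sup-norm risk.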

Theorem 4.4.
Fix k ∈ N, r ∈ (k − 1, k], and L > 0, and consider the regression problem given by (4). Then

    inf_{f̂_n} sup_{f ∈ S_{k,H}(r,L)} E ‖ f̂_n − f ‖_∞ ≍ ( (log n)/n )^{r/(2r+1)},

where inf_{f̂_n} denotes the infimum over all k-monotone estimators f̂_n ∈ S_k on the interval [0, 1].

Numerical examples
In this section, two numerical examples are presented in order to illustrate the performance of the second stage k-monotone B-spline estimator under the sup-norm risk, the pointwise root mean squared error, and the L_2-norm risk.
In both examples, the first stage unconstrained estimator is given by a minimax optimal unconstrained B-spline estimator with evenly spaced knots; the second stage k-monotone/constrained B-spline estimator utilizes evenly spaced knots as well. Both examples report the sup-norm risk, the pointwise root mean squared error, and the L_2-norm risk, with q = min(r − k + 3, r) = 3 (cf. Section 4.1) in both cases. Plots of the second stage k-monotone/constrained B-spline estimator, the unconstrained first stage estimator, and the k-monotone least squares estimator are given on the left in Figure 1 for Example 1 and Example 2, with n = 100. The corresponding absolute errors for these three estimators are plotted on the right in the same figure. To further compare the performance of the three estimators, simulations were run 200 times, and the average performance of each estimator over these simulations was recorded in each case. Three performance metrics were considered: the sup-norm risk, the pointwise root mean squared error (at x = 0), and the L_2-norm risk. Table 1 summarizes the performance of the estimators for different sample sizes, where f̂ denotes the computed estimator. It is seen in these examples that the second stage constrained estimator either performs approximately the same as or noticeably outperforms the other two estimators. In particular, the second stage estimator outperforms the other two estimators with respect to the sup-norm risk in all cases. It is also observed that the k-monotone least squares estimator performs poorly with respect to both the sup-norm risk and the root mean squared error at x = 0. This is expected, as constrained least squares estimators are inconsistent on the boundary [17,42].
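The two stage construction used in these examples can be mimicked in a few lines for the monotone case k = 1. The sketch below is an illustration under simplifying assumptions, not the paper's exact estimator: the first stage is an unconstrained least-squares B-spline fit, and the second stage replaces the paper's weighted projection with a plain Euclidean projection of the B-spline coefficients onto the cone of nondecreasing vectors, computed by the pool-adjacent-violators algorithm. Nondecreasing coefficients are a sufficient condition for a nondecreasing spline.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)

# Hypothetical data: n noisy observations of an increasing (1-monotone) function.
n = 200
x = np.linspace(0.0, 1.0, n)
y = np.sqrt(x) + 0.1 * rng.standard_normal(n)

# Cubic B-spline basis (order k = 4) on evenly spaced knots of [0, 1].
order = 4                       # spline order (degree 3)
K = 8                           # number of knot intervals
t = np.concatenate([np.zeros(order - 1), np.linspace(0.0, 1.0, K + 1),
                    np.ones(order - 1)])
N = K + order - 1               # number of basis functions

# Design matrix: column j evaluates the j-th B-spline basis function at x.
X = np.column_stack([BSpline(t, np.eye(N)[j], order - 1)(x) for j in range(N)])

# Stage 1: unconstrained least-squares B-spline coefficients.
b_unc, *_ = np.linalg.lstsq(X, y, rcond=None)

def pava(v):
    """Pool-adjacent-violators: Euclidean projection onto {b : b_1 <= ... <= b_N}."""
    out = []  # list of [block mean, block size]
    for val in v:
        out.append([val, 1])
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, s2 = out.pop()
            m1, s1 = out.pop()
            out.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return np.concatenate([np.full(s, m) for m, s in out])

# Stage 2: project the coefficients onto the monotone cone (k = 1).
b_con = pava(b_unc)
fhat = X @ b_con
```

Because the derivative of a B-spline is a nonnegative combination of consecutive coefficient differences, the projected coefficients yield a genuinely nondecreasing fitted function.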

Conclusions
Table 1: Performance of the 2nd Stage Estimator (con.), the 1st Stage Estimator (unc.), and the k-Monotone Least Squares Estimator (LSQR).

This paper establishes a critical uniform Lipschitz property for a two stage k-monotone B-spline estimator with possibly unevenly spaced design points and/or knots. This property is exploited, in combination with statistical techniques and B-spline approximation theory, to study the asymptotic performance of the two stage k-monotone B-spline estimator under the sup-norm, pointwise, and L_p-norm (with p ∈ [1, ∞)) risks. The optimal convergence rate under the sup-norm risk is established. Future research topics include estimation under more general constraints, e.g., multiple derivative constraints or constraints on a linear combination of derivatives of different orders. Additional extensions to Sobolev classes and multivariate regression will also be considered.

Appendix: Proof of the uniform Lipschitz property
In this section, we prove the uniform Lipschitz property stated in Theorem 3.1.

Overview of the proof
The proof of Theorem 3.1 is somewhat technical. To facilitate the reading, we outline its key ideas and provide a road map of the proof as follows. In view of the piecewise linear formulation of b̂_{P,t}(·) in (23), we see that for any given K_n, P, and t, the Lipschitz constant of b̂_{P,t}(·) with respect to the ∞-norm is max_α ‖W_α‖_∞, where max_α denotes the maximum over all the index sets α defined in (21). Therefore it suffices to establish a uniform bound on max_α ‖W_α‖_∞ for all large n, independent of K_n, α, P ∈ P_n, and t ∈ T_{K_n}. Motivated by [41], we observe that for any diagonal matrix Ξ_α (possibly depending on α) with positive diagonal entries, ‖W_α‖_∞ can be bounded in terms of ‖(Ξ_α F_α Λ F_α^T)^{−1}‖_∞ together with bounds on F_α^T and Ξ_α F_α. Hence, a natural idea is to choose suitable matrices F_α and Ξ_α so that the quantity ‖(Ξ_α F_α Λ F_α^T)^{−1}‖_∞ admits a uniform bound. Establishing this bound relies on a deep result in B-spline theory (dubbed de Boor's conjecture) proved by Shadrin [34]. Roughly speaking, this result says that the ∞-norm of the inverse of the Gramian of the normalized B-splines of order k is uniformly bounded, independent of the knot sequence and the number of B-splines.

Theorem 7.1. Let B_1, …, B_{K+k−1} be the kth order B-splines on [a, b] defined by a knot sequence t′ := {a = t′_0 < t′_1 < ⋯ < t′_K = b} for some K ∈ N, and let G ∈ R^{(K+k−1)×(K+k−1)} be the Gramian matrix of the normalized B-splines. Then there exists a positive constant ρ_k, independent of a, b, t′, and K, such that ‖G^{−1}‖_∞ ≤ ρ_k.

Inspired by this theorem, we intend to approximate Ξ_α F_α Λ F_α^T by an appropriate B-spline Gramian matrix with a uniform approximation error bound. To achieve this goal, we construct suitable matrices F_α and Ξ_α in Section 7.2; see F_α := F_{α,k,t} defined in (39) and Ξ_α := (k/K_n) · Ξ_{k,V_α,t} defined in (37). In view of F_α Λ F_α^T = K_n · (F_α X^T) Θ (F_α X^T)^T, we compute the entries of F_α X^T in Section 7.3 and show that they are related to suitable B-splines (cf. Proposition 7.2). It is shown in Section 7.4 that F_α^T and Ξ_α F_α attain uniform bounds (cf. Proposition 7.4) and that Ξ_α F_α Λ F_α^T attains a uniform approximation error bound (cf. Proposition 7.5), all with respect to the ∞-norm. We then apply Theorem 7.1 to establish a uniform bound on ‖(Ξ_α F_α Λ F_α^T)^{−1}‖_∞ (cf. Corollary 7.1).
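The uniform bound in Theorem 7.1 can be probed numerically. The sketch below forms the Gramian G_ij = ∫ B_i B_j dx / ∫ B_j dx built from scipy's partition-of-unity B-splines (one common normalization in statements of de Boor's conjecture; the paper's exact normalization may differ) and evaluates ‖G^{−1}‖_∞ on random knot sequences of growing size:

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(t_interior, order):
    """Clamped knot vector and the list of B-spline basis functions on it."""
    t = np.concatenate([np.full(order - 1, t_interior[0]),
                        t_interior,
                        np.full(order - 1, t_interior[-1])])
    N = len(t_interior) - 1 + order - 1   # K + k - 1 basis functions
    return t, [BSpline(t, np.eye(N)[j], order - 1) for j in range(N)]

def gramian_inf_norm(t_interior, order, quad_pts=8):
    """inf-norm of G^{-1}, where G_ij = (integral of B_i B_j) / (integral of B_j)."""
    t, basis = bspline_basis(t_interior, order)
    # Piecewise Gauss-Legendre quadrature, exact for the polynomial pieces.
    xg, wg = np.polynomial.legendre.leggauss(quad_pts)
    xs, ws = [], []
    for a, b in zip(t_interior[:-1], t_interior[1:]):
        xs.append(0.5 * (b - a) * xg + 0.5 * (a + b))
        ws.append(0.5 * (b - a) * wg)
    xs, ws = np.concatenate(xs), np.concatenate(ws)
    B = np.column_stack([f(xs) for f in basis])   # basis values at quad nodes
    G = (B * ws[:, None]).T @ B                   # raw Gramian: integrals of B_i B_j
    G /= (ws @ B)[None, :]                        # divide column j by integral of B_j
    return np.linalg.norm(np.linalg.inv(G), ord=np.inf)

rng = np.random.default_rng(1)
order = 4  # cubic B-splines
norms = []
for K in (5, 20, 80):
    interior = np.concatenate([[0.0], np.sort(rng.uniform(0.0, 1.0, K - 1)), [1.0]])
    norms.append(gramian_inf_norm(interior, order))
```

In line with Shadrin's theorem, the computed values stay of the same modest magnitude as K grows, even though the random knot spacings become highly uneven.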
Using these bounds, the uniform Lipschitz property is proven in Section 7.5.
For any fixed K ∈ N and any knot sequence t := (t_j)_{j=0}^K on [0, 1] with (K + 1) distinct knots and the usual extension, consider the diagonal matrices of order (K + k − 1) given by Δ̃_{0,t} := I_{K+k−1} and, for each p = 1, …, k, Δ̃_{p,t} defined in terms of the matrix Δ_{p,t} from (8). Related to these, we further introduce the following matrices of order (K + k − 1): for p = 0, let Ξ_{0,t} := (Δ̃_{0,t})^{−1} = I_{K+k−1}, and define Ξ_{p,t} accordingly for each p = 1, …, k. In addition, define the square matrices S^{(r)} and D^{(r)} of order r ∈ N. Note that S^{(r)} acts as a summation operator, while D^{(r)} is similar to a difference matrix.
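The summation/difference interplay behind S^{(r)} and D^{(r)} can be illustrated with the standard cumulative-sum operator and its inverse (an assumption on our part for illustration: the paper's D^{(r)} is only said to be similar to a difference matrix, so this is not its exact definition):

```python
import numpy as np

r = 5
# Lower-triangular matrix of ones: (S @ v)_i = v_1 + ... + v_i (summation operator).
S = np.tril(np.ones((r, r)))
# Its inverse is the first-difference matrix: 1 on the diagonal, -1 on the subdiagonal.
D = np.linalg.inv(S)

v = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
cumsums = S @ v            # partial sums of v
recovered = D @ cumsums    # differencing undoes summation
```

This inverse pair is the matrix analogue of integrating and differentiating one order at a time, which is how the construction climbs from F_{α,0,t} up to F_{α,k,t}.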
With the above notation, we now define F_{α,p,t} inductively: F_{α,0,t} is as defined in (33), and F_{α,p,t} is obtained from F_{α,p−1,t} by the displayed recursion for each p = 1, …, k. For notational simplicity, we also write F_{α,p,t} as F_{α,p} for each p = 0, 1, …, k. Note that F_{α,p} depends on both α and t for p ≥ 1, since Ξ_{p−1,V} := Ξ_{p−1,V_α,t} depends on both α and t, and Δ_{p−1,t} depends on t. Additionally, since D^{(q_α)}, Ξ_{p−1,V}, Δ_{p−1,t}, and S^{(N)} are all invertible, each F_{α,p} has full row rank by induction. Furthermore, it is easy to see that if α is the empty set, then, in view of F_{α,0} = I_N and using induction, F_{α,p} = I_N for each p = 1, …, k, regardless of the choice of t.
A useful observation is that F_{α,k,t} can be expressed in terms of S_{k,t}: indeed, it is easy to verify that F_{α,k,t} = Q · F_{α,0,t} · S_{k,t} for a suitable matrix Q. It is shown below that F_{α,k,t} (or simply F_{α,k}) constructed above is a suitable choice for F_α in the piecewise linear formulation of the function b̂_{P,t} in (23).

Proposition 7.1. Fix t ∈ T_{K_n}, and let α be any index set defined in (21). If α is nonempty, then the columns of F_{α,k,t}^T form a basis of the null space of (D_{k,t})_{α•}; otherwise, F_{α,k,t}^T = I_N.
Proof. When α is the empty set, it has been shown that F_{α,k,t}^T = I_N; see the discussion below (39). Therefore, in what follows, consider the case when α is nonempty. To simplify notation, we drop the subscript t in Δ_{p,t}, D_{p,t}, Δ̃_{p,t}, and S_{p,t} for p = 0, 1, …, k; see (8), (9), (36), and (40) for the definitions of Δ_{p,t}, D_{p,t}, Δ̃_{p,t}, and S_{p,t}, respectively.
Recalling that N = K_n + k − 1, it is easy to see via (9) and (33) how D_k acts on F_{α,0}. Moreover, it follows from the discussion below (40) that F_{α,k} = Q · F_{α,0} · S_k for a suitable matrix Q. Combining the above results, we have (D_k)_{α•} F_{α,k}^T = 0. Since F_{α,k} has full row rank, the q_α columns of F_{α,k}^T are linearly independent. Additionally, since D_k has full row rank, as indicated after (9), so does (D_k)_{α•}. Therefore, rank[(D_k)_{α•}] = |α|, and the null space of (D_k)_{α•} has dimension (K_n + k − 1 − |α|), which is equal to q_α in light of |α| + |ᾱ| = K_n − 1. Therefore the columns of F_{α,k}^T form a basis for the null space of (D_k)_{α•}.

The above proposition shows that for any knot sequence t ∈ T_{K_n} and any index set α, the matrix F_{α,k,t} defined in (39) is a suitable choice for the matrix F_α that determines the linear piece b̂^α_{P,t} associated with α shown in (23). Hence we conclude this subsection by setting

Properties of the matrix F α X T
As discussed in Section 7.1, the proof of the uniform Lipschitz property boils down to establishing certain uniform bounds in the ∞-norm, including a bound on ‖(Ξ_α F_α Λ F_α^T)^{−1}‖_∞, where Ξ_α is a diagonal matrix with positive diagonal entries to be specified later. Recall that Λ := Λ_{K_n,P,t} = K_n · X^T Θ X ∈ R^{N×N} for a design point sequence P and a knot sequence t ∈ T_{K_n}; see the definition above (18). Note that the design matrix X depends on P and t, and F_α Λ F_α^T = K_n · (F_α X^T) Θ (F_α X^T)^T. Therefore, in order to study the matrix F_α Λ F_α^T, it is essential to understand F_α X^T. In Proposition 7.2, we determine the values of most of the entries of F_α X^T in terms of certain B-splines. For this purpose, we first establish a technical lemma, which relates the B-spline design matrix X to certain truncated power functions.
Let P = (x_i)_{i=0}^n be a sequence of design points on the interval [0, 1]. For each i = 0, 1, …, n, consider the truncated power function ϕ_i : [0, 1] → R given by (43).

Lemma 7.1. Let t := (t_j)_{j=0}^K be a sequence of (K + 1) distinct knots on [0, 1] with t_0 = 0, t_K = 1, and the usual extension. Let P := (x_i)_{i=0}^n be a sequence of distinct design points with x_0 = 0 and x_n = 1, so that the design matrix X is given by (X^T)_{j(i+1)} = B^t_{j,k}(x_i) for each i and j. Then for any i = 1, …, n − 1, j = 1, …, N := K + k − 1, and p = 0, 1, …, k, identity (44) holds, where the matrix S_{p,t} is defined in (40).

Proof. In what follows, we drop the subscript t in S_{p,t} and Δ̃_{p,t} to simplify notation. Fix i = 1, …, n − 1 and j = 1, …, N. We proceed by induction on p.
Consider p = 0 first. By the definition of X and [8, page 87], the claimed identity holds directly; hence (44) holds for p = 0.
Fix p = 1, …, k, and assume that the result holds for (p − 1). First, note that since x_i > 0 for all i = 1, …, n, we have, in view of (43), that for each q ∈ Z_+, the q-th derivative ϕ_i^{(q)} is well defined. We consider the following two cases.

Case 1: j = 1, …, N − p. In this case, by the definition of Δ̃_{k−p+1} in (36), the structure of S^{(N)} in (38) and of S_{p−1} in (40), the induction hypothesis, and (45), we obtain (44).

Case 2: j = N − p + 1, …, N. Consider j = N − p + 1 first; here the last equality follows from [8, (viii) on page 6] and the fact that for any i = 1, …, n − 1, x_i < 1, so that ϕ_i(·) is smooth at x = 1. Hence (44) holds for p in Case 2 when j = N − p + 1. Now fix j = N − p + 2, …, N, and assume that (44) holds for (j − 1). Since (Δ̃_{k−p})_{jj} = 1 for j = N − p + 1, …, N, we obtain the desired identity via (40) and the respective induction hypotheses on j and p. Hence (44) holds for p in Case 2 by induction. This completes the proof.
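The link between the design matrix and truncated powers in Lemma 7.1 reflects the classical identity B_{j,k}(x) = (t_{j+k} − t_j)[t_j, …, t_{j+k}](· − x)_+^{k−1}, expressing a normalized B-spline as a divided difference of truncated power functions. (This is the standard textbook fact, not the lemma's exact statement.) A numerical check of the identity against scipy:

```python
import numpy as np
from scipy.interpolate import BSpline

def divided_difference(pts, f):
    """Divided difference f[pts[0], ..., pts[-1]] for distinct points."""
    vals = [f(p) for p in pts]
    for level in range(1, len(pts)):
        vals = [(vals[i + 1] - vals[i]) / (pts[i + level] - pts[i])
                for i in range(len(vals) - 1)]
    return vals[0]

def bspline_via_truncated_powers(knots, k, x):
    """Order-k B-spline via divided differences of t -> (t - x)_+^(k-1)."""
    g = lambda t: max(t - x, 0.0) ** (k - 1)
    return (knots[-1] - knots[0]) * divided_difference(knots, g)

# Cubic case: order k = 4 needs k + 1 = 5 distinct knots.
knots = [0.0, 0.3, 0.55, 0.8, 1.0]
reference = BSpline.basis_element(np.array(knots), extrapolate=False)
xs = np.linspace(0.01, 0.99, 25)
ours = np.array([bspline_via_truncated_powers(knots, 4, x) for x in xs])
```

The divided-difference values agree with scipy's `basis_element` to rounding error, which is the identity the lemma's induction on p repeatedly exploits.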
Proof. Let x* be an arbitrary but fixed real number in the open interval (0, 1). Consider a sequence of design points P = (x_i)_{i=0}^n with x_0 = 0 and x_n = 1