Convergence rate and Bahadur type representation of general smoothing spline M-estimates

This work was motivated by Cox and O'Sullivan (1990), who derived the optimal convergence rates for smoothing spline estimates when the loss function is sufficiently smooth. However, the study of statistical estimates resulting from nonsmooth criterion functions has become popular in recent years. In this paper, we study the asymptotic properties of smoothing spline estimates when the criterion function is insufficiently smooth. Here, the smoothing spline estimate is defined as an approximate solution to an M-estimating equation. We prove that if the derivative of the loss function is Lipschitz, then the convergence rate and a Bahadur type representation of the estimate can be derived simultaneously. For a specific class of loss functions with discontinuous derivatives, a Bahadur type representation is also presented, provided that the convergence rate is known. Examples are given for Huber's robust loss and the median loss.


Introduction
Consider the bivariate data (X_1, Y_1), ..., (X_n, Y_n), which form an independent and identically distributed sample from the population (X, Y). We assume that the data are linked by the following nonparametric regression model:

Y_i = θ_0(X_i) + e_i, i = 1, ..., n, (1.1)

where the X_i's take values in I = [0, 1] and the e_i's are iid noise terms independent of the X_i's. It is of particular interest to estimate the unknown functional parameter θ_0, a sufficiently smooth element of some Sobolev space. Usually, the estimation problem under model (1.1) is ill-posed, since we allow the parameter space to be infinite-dimensional. However, we can overcome this ill-posedness by using a penalty functional, i.e., we estimate θ_0 by optimizing a penalized criterion function. This estimation procedure is called the method of penalization (see Wahba (1990) for a detailed review and related references). When the criterion function is sufficiently smooth, the convergence rate of penalized estimates has been studied by many authors under various settings. The most important literature includes Chen (1991); Cox (1983, 1988); Cox and O'Sullivan (1990); Gu and Qiu (1993); Gu and Ma (2005); O'Sullivan (1993, 1995); Silverman (1982), and the references therein. For non-penalized optimization, Wong and Severini (1991) obtained optimal convergence rates for the ε-MLE.
However, in practice, estimates resulting from a nonsmooth loss function l are also important. Examples include (1) l(s) = s^2 1(|s| ≤ c)/2 + (c|s| − c^2/2) 1(|s| > c) for some c > 0; (2) l(s) = |s|. The function ϕ(s) = (d/ds) l(s) is called a moment function. The corresponding estimates have unique features (such as robustness), and hence their theoretical properties need to be explored. Shen (1998) used a penalization method to obtain the optimal convergence rate of the quantile smoothing spline estimate. As far as we know, there is little theory treating infinite-dimensional smoothing spline estimates resulting from a general nonsmooth ϕ. Related references include Shen and Wong (1994), who studied asymptotic properties of a sieve MLE, and Chen and Pouzo (2008), who obtained optimal convergence rates for sieve estimates identified by a class of moment estimating equations with general nonsmooth moment functions.
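To make the two examples concrete, the following Python sketch implements both loss functions and their derivatives ϕ (the tuning value c = 1.345 is purely an illustrative choice, not one taken from this paper). Note that the Huber moment function is Lipschitz but not differentiable at |s| = c, while the sign function is discontinuous at 0 — the two regimes studied in this paper.

```python
import numpy as np

def huber_loss(s, c=1.345):
    """Example (1): quadratic for |s| <= c, linear beyond."""
    return np.where(np.abs(s) <= c, s**2 / 2, c * np.abs(s) - c**2 / 2)

def huber_psi(s, c=1.345):
    """Its derivative (moment function): Lipschitz, kinked at |s| = c."""
    return np.clip(s, -c, c)

def abs_loss(s):
    """Example (2): absolute (median) loss."""
    return np.abs(s)

def sign_psi(s):
    """Its derivative sgn(s): discontinuous at 0."""
    return np.sign(s)
```

Both ϕ's are bounded, which is what drives the robustness of the resulting estimates.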
In this paper, we study the asymptotic behavior of penalized estimates in two general situations. In the first, by assuming that ϕ is Lipschitz, we simultaneously obtain the optimal convergence rate and a Bahadur type representation for the estimate, provided that the estimate is known to be consistent. In the second, we allow ϕ to be discontinuous and obtain a Bahadur type representation for the estimate, provided that the convergence rate is known. On the one hand, our results concern not only the convergence rate but also the Bahadur type representation, which differs from previous contributions that mainly address the convergence rate. On the other hand, the penalized estimate considered in this paper is obtained directly over the infinite-dimensional parameter space, in contrast to a sieve estimate, which is obtained over a sequence of finite-dimensional subspaces approaching the entire parameter space.
Since the Bahadur type representation is part of our work, it is worth reviewing the relevant literature on this issue. Most existing results have been established in finite-dimensional settings, i.e., where θ_0 is a finite-dimensional parameter. Relevant references include He and Shao (1996), Wu (2005, 2007), and the references therein. However, when the parameter space is infinite-dimensional, e.g., a Sobolev space, few results are available. Portnoy (1997) derived a Bahadur type representation for quantile smoothing spline estimates which holds pointwise at certain selected knots in I (Portnoy, 1997, Theorem 2.2). It remains open whether a Bahadur type representation holds when general insufficiently smooth loss functions and the usual Sobolev norms are employed; this question is another motivation for the present work.

Notation and assumptions
With W well defined, we can define the estimate of θ_0. Let X_1, ..., X_n be iid samples drawn from the density f. The responses Y_i and covariates X_i are linked by the nonparametric model (1.1). We estimate θ_0 by finding the solution θ̂_{n,λ_n} to the approximate penalized M-estimating equation (2.1). Assumption A.1 was originally introduced by Gu and Qiu (1993) to simultaneously diagonalize V and J. This assumption means that for any ǫ > 0, there are linear functionals l_1, ..., l_k such that if l_j(θ) = 0, j = 1, ..., k, then V(θ, θ) ≤ ǫ(V + J)(θ, θ) for any θ ∈ Θ_1. Assumption A.1 is not restrictive and holds under general settings (see, e.g., (Weinberger, 1974, Theorem 2.9)). A direct consequence of Assumption A.1 is that the bilinear functionals V and J can be simultaneously diagonalized, as stated in the following result.
Let {h_μ} be the sequence satisfying (2.2). By Proposition 2.1, we may rewrite the inner product ⟨·, ·⟩_1 in terms of this sequence. By Proposition 2.2, the sequence {h_μ} forms a basis of Θ_1. To facilitate the proofs of our main results, we impose the following assumption on {h_μ} throughout this work.
To define Sobolev spaces of different orders, for b ≥ 0, we define the norm ‖·‖_b and the corresponding space Θ_b. Let C(I) be the Banach space of continuous functions on I endowed with the supremum norm ‖·‖_sup. Define S(θ) = E{ϕ(Y − θ(X)) K_X}, where the expectation is taken with respect to (Y, X), and define S_λ(θ) = nS(θ) + λW(θ). Let ζ(u) = E{ϕ(e − u)} for u ∈ R, where e denotes the error in model (1.1).

Assumption A.3. There are a γ ∈ (0, 1], a positive number α, and a constant C_ϕ > 0 such that …

Assumption A.3′. There are a neighborhood A of θ_0 in C(I) and constants C_A and δ_0 such that for any 0 < δ ≤ δ_0, …

Assumption A.5. The model errors e_i are independent of the X_i's.
Assumption A.3′ is the so-called stochastic equicontinuity condition (see Pollard (1982)) and has been adopted by a number of authors (see Chen et al. (2003); Chen and Pouzo (2008); He and Shao (1996)). Assumption A.3′ is satisfied by the quantile loss. Assumption A.3 is satisfied by several commonly used robust loss functions, such as Huber's loss. Assumption A.4 essentially requires two things. First, θ_0 is smoother than the elements of Θ_1. As demonstrated later, the estimate θ̂_{n,λ} is obtained in the space Θ_1, so we need this assumption to guarantee that the true parameter θ_0 is smoother than θ̂_{n,λ}. Second, θ_0 is identifiable, since it is assumed to be the unique root of S(θ) = 0. Assumption A.5 requires that the random design points X_i are independent of the model errors e_i. This is only a technical assumption that facilitates the proofs. Using Assumption A.5, we may rewrite the functional S(θ) as S(θ) = E{ζ((θ − θ_0)(X)) K_X}.

Assumption A.6. ζ is twice continuously differentiable and both ζ′ and ζ″ are bounded. Furthermore, there is a neighborhood I of zero such that inf_{u∈I} ζ′(u) > 0 and sup_{u∈I} E{|ϕ(e − u)|^2} < ∞.
Assumption A.6 is satisfied by several commonly adopted functions ϕ (see the examples in Section 4). Under Assumption A.6, we have the following result, which demonstrates that the zero of S_λ is sufficiently close to θ_0.
This establishes the local uniqueness of the solution to S_λ(θ) = 0.
The proof of Proposition 2.2 is given in Appendix B. Proposition 2.2 is similar to Theorem 3.1 of Cox and O'Sullivan (1990). However, unlike the latter, Proposition 2.2 guarantees the uniqueness of θ_λ, i.e., θ_λ is locally fixed as b changes. If we view θ_λ as the "target" of θ̂_{n,λ}, then changing the parameter space Θ_b for 0 ≤ b ≤ 1 does not move this target; in this sense, θ_λ is "identifiable".
Neither Assumption A.7 nor A.7′ requires θ̂_{n,λ} to be a root of S_{n,λ}. However, we do need the assumption that θ̂_{n,λ} is as smooth as θ_λ. In some special cases, θ̂_{n,λ} can be taken as an approximate MLE, e.g., the ε-MLE introduced by Wong and Severini (1991), and the consistency of θ̂_{n,λ} can be proved under the uniform continuity of the likelihood and the relative compactness of the parameter space (see (Wong and Severini, 1991, Theorem 1)). ‖θ̂_{n,λ} − θ_λ‖_1 = o_p(1) then follows from the consistency of θ̂_{n,λ} and the fact that ‖θ_λ − θ_0‖_1 = o(1).

Main results
Our main results consist of two parts. Section 3.1 establishes the convergence rate of θ̂_{n,λ}, while Section 3.2 presents Bahadur type representations for θ̂_{n,λ}.

Convergence rate
The following result demonstrates when θ̂_{n,λ} attains the optimal convergence rate.
Remark 3.1. The quantities m, b, d have a clear statistical interpretation: m and d respectively represent the degrees of smoothness of the estimate θ̂_{n,λ} and of the true parameter θ_0, and b indicates the norm under which the bias of θ̂_{n,λ} is measured. In practice, it is possible to choose d, m, b and λ so that condition (iii) in Theorem 3.1 is satisfied. One way is to let d be relatively large; for instance, let d, m, b and λ satisfy

max{ (κ + m(1 + b)) / (2md + 1), (κ + 4mb + 1) / (2md + 1) } < 1 and λ/n = n^{−2m/(2md+1)}. (3.1)

Under (3.1), it can be verified from Theorem 3.1 that the convergence rate of ‖θ̂_{n,λ} − θ_0‖_b is n^{−m(d−b)/(2md+1)}. We claim that this convergence rate is optimal, based on the following considerations. Since θ_0 is md-times differentiable and θ̂^{(mb)}_{n,λ} is clearly an estimate of θ^{(mb)}_0, where θ^{(mb)} denotes the mb-th derivative of θ, it follows from Stone (1982) that the optimal convergence rate for ‖θ̂^{(mb)}_{n,λ} − θ^{(mb)}_0‖_{L_2(I)} is n^{−m(d−b)/(2md+1)}. On the other hand, we notice that ‖θ̂_{n,λ} − θ_0‖_b ≥ ‖θ̂^{(mb)}_{n,λ} − θ^{(mb)}_0‖_{L_2(I)}, which means that ‖θ̂_{n,λ} − θ_0‖_b cannot converge faster than ‖θ̂^{(mb)}_{n,λ} − θ^{(mb)}_0‖_{L_2(I)}. So n^{−m(d−b)/(2md+1)} is the optimal convergence rate for ‖θ̂_{n,λ} − θ_0‖_b, and this rate is achieved when λ/n = n^{−2m/(2md+1)}.
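As a quick numerical aid for Remark 3.1, the following sketch evaluates the rate exponent m(d − b)/(2md + 1) and the smoothing level implied by λ/n = n^{−2m/(2md+1)}; the sample values of m, d, b in the comment are illustrative only.

```python
def rate_exponent(m, d, b):
    """Exponent r in the optimal rate n^{-r} for the b-norm error,
    r = m(d - b) / (2md + 1)."""
    return m * (d - b) / (2 * m * d + 1)

def smoothing_level(n, m, d):
    """lambda satisfying lambda / n = n^{-2m/(2md+1)}."""
    return n * n ** (-2 * m / (2 * m * d + 1))

# With m = 2 (cubic smoothing splines), d = 1, b = 0, the exponent is
# 2/5, i.e. the familiar n^{-2/5} rate in the L2 norm.
```

The exponent shrinks as b grows, reflecting that derivatives of θ_0 are harder to estimate than θ_0 itself.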
The technical arguments in the proofs are valid only for suitably large d. When d is small, i.e., θ_0 is not smooth enough, our approach does not yield the optimal convergence rate. In such situations, we leave the achievability of the optimal convergence rate as an open problem.
Remark 3.2. Wong and Severini (1991) obtained an optimal convergence rate for the infinite-dimensional non-penalized MLE under the assumption that the loss function is twice uniformly continuously differentiable. We briefly describe their method of proof. They first established a key result, which they call the "Basic Lemma", stating that the bias of the estimate is controlled by two terms; they then obtained the optimal convergence rates by balancing these two terms. However, the second-order derivative of the likelihood is needed to prove their "Basic Lemma".
In the proof of Theorem 3.1, we cannot establish such a "Basic Lemma", as we do not assume that the second-order derivative of the likelihood exists. Instead, we first establish Lemma A.1, which states that the variation of S_{n,λ}(θ) is stochastically controlled by the variation of θ. Using Lemma A.1, one can obtain the desired results without computing the Fréchet derivative of S_{n,λ}. He and Shao (1996) used the same idea to establish the Bahadur representation for finite-dimensional estimates. Here, we have in effect generalized their result to the infinite-dimensional setting using techniques introduced by Kosorok (2008). We then use Lemma A.1 to establish a quadratic inequality for ‖θ̂_{n,λ} − θ_0‖_b, and use this inequality to obtain the convergence rate of θ̂_{n,λ}.
Remark 3.3. Cox and O'Sullivan (1990) obtained the convergence rate for penalized estimates under the assumption that the likelihood is three times Fréchet differentiable. They controlled the bias of the estimate by two terms, which they called the systematic error and the stochastic error, and obtained the optimal convergence rates by balancing these two error terms. Their proof also relies on the sufficient smoothness of the penalized likelihood.
One of the most commonly used parameter spaces is H^2(I), which corresponds to the case m = 2. For instance, Gu and Qiu (1993) considered this parameter space for the purpose of estimating spline densities, a problem different from ours. For this specific situation, Theorem 3.1 yields the following result.

Bahadur type representation
The purpose of establishing a Bahadur representation is to approximate an estimate by a sum of independent random variables. Generally speaking, the remainder in a Bahadur representation should be of higher order than the usual statistical bias, which may explain why this sort of result has attracted so many authors. In this section, we establish a Bahadur type representation for θ̂_{n,λ}. In the following result, we consider the case where ϕ is Lipschitz, i.e., satisfies Assumption A.3.

Theorem 3.3. Let the assumptions in Theorem 3.1 be satisfied. Assume further that δ_n = O(a_n n^{−m(d−b)/(2md+1)}). If d, m, b and λ satisfy (3.1), then the representation (3.2) holds.

The proof of Theorem 3.3 is based on arguments similar to those for Theorem 3.1 and can also be found in Appendix A. We should mention, however, that the convergence rate in (3.2) might be suboptimal. By Theorem 3.3, if we fix b, then the convergence rate of the remainder term θ̂_{n,λ} − θ_0 + DS_λ(θ_0)^{−1} S_{n,λ}(θ_0) under the ‖·‖_b-norm can be arbitrarily close to n^{−1}(log log n)^{1/2} when m and d are large enough, which can even be faster than the optimal convergence rate of θ̂_{n,λ} discussed in Remark 3.1. By Theorems 3.1 and 3.3, when ϕ is Lipschitz, it is possible to derive the optimal convergence rate and a (suboptimal) Bahadur type representation simultaneously from the assumption that θ̂_{n,λ} is consistent. Unfortunately, when ϕ is not Lipschitz, or even not continuous, the derivation of the optimal convergence rate becomes complicated. However, if the convergence rate of θ̂_{n,λ} is known a priori, we may still derive a suboptimal Bahadur type representation.
In particular, if γ = 1, then (3.7) holds. In particular, if (3.5) holds and δ_n = O(n^{1/2} s…), then the corresponding representation follows. The proof of Theorem 3.4, which relies on Lemma A.3, is given in Appendix A.

Remark 3.4. When S_{n,λ} is Fréchet differentiable, we might still be able to obtain a result similar to Theorem 3.4 without using Lemma A.3. However, when S_{n,λ} is not Fréchet differentiable, Lemma A.3 plays an important role in the proof of Theorem 3.4; in effect, it overcomes the difficulty caused by the nonsmoothness of S_{n,λ}(θ). Stone (1982) proved that the optimal convergence rate for a nonparametric estimate under the supremum norm is (n/log n)^{−md/(2md+1)}, and that this rate is achievable under certain conditions. Portnoy (1997) obtained a Bahadur type representation for quantile smoothing spline estimates (see (Portnoy, 1997, Theorem 2.2)); that representation holds at the breakpoints (a discrete set of points) in I, whereas the representations (3.3) and (3.7) hold under Sobolev norms. Moreover, the proof of Portnoy (1997) relies strongly on properties of the quantile loss; indeed, the quantile smoothing spline estimate has to be piecewise linear. In contrast, the result in Theorem 3.4 is valid for a general class of ϕ.

This section contains several illustrative examples. In these examples, we let {h_μ} be an orthonormal basis of L_2(I) under the L_2(I)-norm. Suppose the X_i's are independent and uniformly distributed on I. Then Assumptions A.1 and A.2 follow straightforwardly. Assume that the true parameter θ_0 ∈ Θ_d with d > 2 + 1/(2m). In the following examples, we assume that the density function f_e of the noise e satisfies the following assumption.
Assumption A.8. f_e is symmetric, strictly positive around zero, and has a bounded derivative.
It can be verified that in these examples, Assumption A.6 follows from Assumption A.8.
Example 4.1. Consider Huber's loss, which corresponds to the moment function ϕ_1(s) = s·1(|s| ≤ c) + c·sgn(s)·1(|s| > c), where c > 0 is a constant. It is easy to verify that Assumption A.3 holds with C_{ϕ_1} = γ = 1.
Let N_0 be a subset of C(I) such that any θ ∈ N_0 satisfies ‖θ − θ_0‖_sup ∈ I, where I is the neighborhood in Assumption A.6. Suppose θ ∈ N_0 satisfies S(θ) = 0. By Fubini's theorem, for any μ ∈ Z, E{ζ((θ − θ_0)(X)) h_μ(X)} = 0, which implies ζ((θ − θ_0)(x)) = 0 for any x ∈ I. Therefore, θ = θ_0 by the monotonicity of ζ over I. This verifies the identifiability of θ_0, i.e., Assumption A.4. By Corollary 3.2, when m = 2 and for suitable b, d and λ satisfying the assumptions of Corollary 3.2, θ̂_{n,λ} achieves the optimal convergence rate under ‖·‖_b, and a Bahadur type representation of the form (3.2) holds.

Example 4.2. Consider the negative sign function ϕ_2(e) := −sgn(e); then θ̂_{n,λ} corresponds to the median loss. By an argument similar to the proof of Theorem 3.2 in He and Shao (1996), Assumption A.3′ holds. As in Example 4.1, it can be shown that both Assumptions A.4 and A.6 hold. Consequently, when the convergence rate of θ̂_{n,λ} is available, a representation of type (3.3) holds, with the convergence rate of R_n given by (3.7).
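As a numerical check of Assumption A.6 in Example 4.2, the sketch below computes ζ(u) = E{ϕ_2(e − u)} by quadrature for standard normal errors (an illustrative density satisfying Assumption A.8; the grid limits are arbitrary choices). In this case ζ(u) = 2Φ(u) − 1, which vanishes at u = 0 and is strictly increasing near 0, as required.

```python
import numpy as np

def zeta(u, phi, lo=-10.0, hi=10.0, n=200_001):
    """zeta(u) = E{phi(e - u)} for standard normal errors e
    (an illustrative choice of f_e), via trapezoidal quadrature."""
    s = np.linspace(lo, hi, n)
    dens = np.exp(-s**2 / 2) / np.sqrt(2 * np.pi)
    vals = phi(s - u) * dens
    return np.sum((vals[:-1] + vals[1:]) / 2) * (s[1] - s[0])

phi2 = lambda e: -np.sign(e)   # Example 4.2: median loss
# Here zeta(u) = 2*Phi(u) - 1: zero at u = 0 and strictly increasing
# near 0, consistent with Assumption A.6.
```

The same routine applied to the Huber ϕ_1 of Example 4.1 gives a ζ that is linear near the origin with positive slope.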

Theoretical framework
In the previous sections, we introduced our main results and the sufficient conditions used to derive them. In this section, we present some additional framework that is useful for the theoretical derivations. The proofs of all propositions in this section can be found in Appendix B.

About the kernel function K and operator W
To prove our main results, the function K, which is used to define equation (2.1), has to satisfy certain properties. The following result guarantees the existence of a K satisfying all of them.
Proposition 5.1. For any β ∈ (1/(2m), 1], there is a bivariate kernel function K(·, ·) defined on I × I satisfying properties (i)-(iii).

Throughout this paper, we assume that, for some fixed β ∈ (1/(2m), 1], K : I × I → R satisfies the properties (i)-(iii) in Proposition 5.1. This requirement on β is related to the dimension of I, which is 1 in the present situation.
The original domain of W is Θ_1. To facilitate the technical proofs, it is useful to extend this domain to a larger space, say Θ_b for 0 ≤ b ≤ 1. There are two equivalent ways to achieve this extension. One way is based on the expansion of elements of Θ_1 in the basis {h_μ}: one can define W(Σ_μ θ_μ h_μ) = Σ_μ [γ_μ/(1 + γ_μ)] θ_μ h_μ, which is a bounded linear operator from Θ_b to Θ_b for any 0 ≤ b ≤ 1. The other way is through Lemma 2.1 in Cox and O'Sullivan (1990). By either extension, W is a well-defined bounded linear operator from Θ_b to Θ_b for any 0 ≤ b ≤ 1.
To conclude this subsection, we note that, by the above properties of K and W, S_{n,λ}(θ) (defined in Section 2) is exactly the Fréchet derivative of the penalized criterion l_n(θ), where J(θ) is defined at the beginning of Section 2 and ρ is the loss function satisfying ρ′ = ϕ. We leave the details of the verification to the interested reader. Thus, finding the exact solution to S_{n,λ}(θ) = 0 is equivalent to minimizing l_n(θ). This justifies estimating θ_0 by finding an approximate solution to S_{n,λ}(θ) = 0.
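To illustrate the equivalence between solving the estimating equation and minimizing the penalized criterion, the following sketch minimizes a discretized analogue of l_n(θ): Huber's ρ as data term plus a squared-second-difference penalty standing in for J (the m = 2 case). The gradient of the data term is exactly −ϕ(Y_i − θ(X_i)), mirroring S_{n,λ}. All numerical choices here (c, step size, penalty form) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def penalized_huber_fit(x, y, lam=0.5, c=1.345, steps=3000, lr=0.02):
    """Gradient descent on sum_i rho_c(y_i - theta_i) + lam * theta' P theta,
    where P = D2' D2 is a discrete roughness penalty (second differences)."""
    order = np.argsort(x)
    y_sorted = y[order]
    theta = y_sorted.copy()                    # start at the raw data
    n = len(theta)
    D2 = np.diff(np.eye(n), n=2, axis=0)       # (n-2) x n second-difference matrix
    P = D2.T @ D2
    for _ in range(steps):
        psi = np.clip(y_sorted - theta, -c, c) # Huber moment function
        grad = -psi + 2 * lam * (P @ theta)    # "S_{n,lambda}" of the discrete problem
        theta -= lr * grad
    out = np.empty(n)
    out[order] = theta                         # restore original x-ordering
    return out
```

At the minimizer the gradient vanishes, i.e., the discrete analogue of S_{n,λ}(θ̂) = 0 holds, which is the point of the equivalence above.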

Fréchet derivatives and their applications
Our technical proofs rely on exact calculations of Fréchet derivatives. The following results summarize these calculations.
where ‖·‖_sup denotes the supremum norm.
By Proposition 5.3, when θ is sufficiently close to θ_0, there is a certain similarity between the norms defined by V and by ⟨DS(θ)·, ·⟩_1. However, the basis {h_μ} might no longer be orthogonal under the latter norm, so it is desirable to seek a new basis that is orthonormal under it. For this purpose, we define V*(θ, θ̃) = ⟨DS(θ*)θ, θ̃⟩_1 for any θ* ∈ K and θ, θ̃ ∈ Θ_1. Thanks to Proposition 5.3 and Assumption A.1, if θ* ∈ K, then V* is completely continuous with respect to V* + J. Hence there are eigenvalues {γ*_μ} and eigenvectors {h*_μ} ⊆ Θ_1 satisfying the corresponding eigen-relations. Furthermore, an application of the Courant-Weyl principle (Weinberger, 1974, Theorem 5.2) shows that there are positive constants c_1 and c_2 (independent of θ* ∈ K) bounding the eigenvalues. Define the inner product ⟨·, ·⟩*_b accordingly and let ‖θ‖*_b = ⟨θ, θ⟩*_b^{1/2}. Let Θ*_b be the completion of {θ ∈ Θ_1 | ‖θ‖*_b < ∞} under ‖·‖*_b. By Proposition 5.3, it can be verified that Θ*_0 = Θ_0 and Θ*_1 = Θ_1. Applying the interpolation approach introduced in Cox (1988), Θ*_b = H^{mb}(I), where the equality means set equality and norm equivalence. It can further be shown that this equivalence is uniform over θ* ∈ K, i.e., the following result holds.

Conclusion and future work
This paper has studied theoretical properties of smoothing spline estimates. Specifically, we have established both optimal convergence rates and Bahadur type representations for a smoothing spline estimate defined as an approximate root of an M-estimating equation. In particular, the moment function ϕ, which characterizes the M-estimating equation, is allowed to be either Lipschitz or discontinuous.
We have considered only unidimensional splines. Both the associate editor and an anonymous reviewer suggested considering the multidimensional situation. This is a valuable but complicated problem. We conjecture that the techniques used in the unidimensional case can also be applied to multidimensional situations, in which the penalty functional becomes a multivariate roughness penalty defined via multi-indices over N*, where N* is the set of nonnegative integers and θ ∈ H^2([0, 1]^p). However, the generalization to multidimensional situations might involve considerably more complicated notation and derivations, and we leave this problem for future work. As suggested by an anonymous reviewer, another issue we intend to explore in the future is a more general model framework. In this paper, the samples are drawn from the classical nonparametric model y = θ(x) + e. This framework restricts the applications and needs to be extended. One extension is to assume that the samples (X_i, Y_i) are iid draws from an unknown distribution P(X, Y). This setting does not require a regression model linking X and Y, and thus allows more flexibility. One particular example is the support vector machine, in which Y takes values 1 or −1, indicating the positive and negative classes respectively. A classifier θ, belonging to a Sobolev space, can be found by minimizing Σ_{i=1}^n (1 − Y_i θ(X_i))_+ + λJ(θ), where (a)_+ = a if a > 0 and 0 otherwise, and J is the penalty functional. The results in this paper cannot be directly applied to this situation, and we intend to explore such extensions in future work.
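For the support vector machine extension mentioned above, the hinge loss (1 − Yθ(X))_+ can be sketched as follows. It is another example of a convex criterion whose derivative is discontinuous (at the margin Yθ(X) = 1), placing it in the same nonsmooth family as the losses studied in this paper; the subgradient shown is one standard choice.

```python
import numpy as np

def hinge_loss(y, score):
    """Hinge loss (1 - y*score)_+ with (a)_+ = max(a, 0), labels y in {-1, +1}."""
    return np.maximum(1.0 - y * score, 0.0)

def hinge_subgradient(y, score):
    """A subgradient in score: -y when the margin is violated (y*score < 1), else 0."""
    return np.where(y * score < 1.0, -y, 0.0)
```

The kink at y·score = 1 is exactly why the Fréchet-derivative arguments used for smooth criteria do not carry over directly.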

Appendix A: Proofs of the results in Section 3
In this section, we give the proofs of the results in Section 3. Entropy theory will be used in the proofs. Let B_b denote the unit ball in Θ_b, i.e., B_b = {θ ∈ Θ_b | ‖θ‖_b ≤ 1}. Define D(ǫ, ‖·‖_sup) to be the packing number of B_1 under the supremum norm, i.e., the maximal number of elements that can fit in B_1 while maintaining pairwise distances greater than ǫ. Then, from Cucker and Smale (2002), there is a constant c > 0 such that log_2 D(ǫ, ‖·‖_sup) ≤ c ǫ^{−1/m}. We refer to Zhou (2002) for entropy theory in general reproducing kernel Hilbert spaces. To facilitate the technical proofs, we assume without loss of generality that ζ′(0) = 1; all the arguments apply with little revision to the case ζ′(0) ≠ 1.
Before proving Theorem 3.1, we need several technical lemmas.
Proof of Lemma A.1. The proofs of parts (i) and (ii) are similar, so we focus primarily on part (i) and briefly discuss the proof of part (ii).
If we preselect t such that const · t^2 > K^2 · 2^{2α/γ}, then the above sum converges to zero as n → ∞. This completes the proof of part (ii).
The following lemma is used to prove Theorem 3.4 part (ii).
. Therefore, by Freedman's inequality (Freedman (1975)), where C_A is the constant defined in Assumption A.3′. Thus, by choosing a large C and letting n → ∞, Σ_{ξ_k ∈ T_k} P_{ξ_k} → 0. Before proceeding further, we introduce the following variant of the Bernstein type inequality, which was proved by Yurinskiĭ (1976).
To prove Claim I, it suffices to show ‖S_{n,λ}(θ_λ) − S_λ(θ_λ)‖_1 = O_p(n^{1/2}). For any μ ∈ Z, we apply Cauchy's inequality. Let E{·} and Var{·} denote the expectation and variance with respect to the (X_i, Y_i)'s; then it follows from Fubini's theorem and Assumption A.5 that the variance is bounded with some constant C_0 independent of μ. Therefore, by Fubini's theorem, Claim I follows. Using an argument similar to the proof of Proposition 5.4 (see Appendix B), we can show that for any θ* = θ_λ ∈ K, DS_λ(θ*)^{−1} is a well-defined element of B(Θ_b, Θ_b) for any 1/(2m) < b < 1 − 1/(2m).
By the expansion in (A.9), and using the independence between X_i and e_i, Assumption A.3, and the fact that ‖θ_λ − θ_0‖_sup is bounded uniformly in λ, we obtain the stated approximation. Therefore, Claim IV holds.
Next, we use Claims I-IV and the relevant assumptions to establish an inequality for ‖θ̂_{n,λ} − θ_0‖_b and derive the optimal convergence rate.
Proof of Proposition 5.2. We prove the results only under Assumption A.3; the proof under Assumption A.3′ is similar.
(iii) The proof is similar to that of part (ii).
Proof of Proposition 5.3. We only show the lower bound; the proof of the upper bound is similar. Suppose ξ ∈ Θ_1. For any μ ∈ Z, we apply Fubini's theorem. On the other hand, by Cauchy's inequality and Proposition 5.1(iii), the dominated convergence theorem applies, so the summation and expectation in (B.1) can be interchanged. The result then follows from Proposition 5.1(ii). This completes the proof of Proposition 5.3.