Derivatives diagnostics and robustness for smoothing splines
Introduction
Consider first the model

$$y_k = f(t_k) + \varepsilon_k, \quad k = 1, \dots, n,$$

where $y_k$ are observations at design points $t_k$, $f$ is a smooth curve and $\varepsilon_k$ are zero mean, uncorrelated random variables. Many competing methods for estimating the curve $f(t)$ are available, for example, kernel-based methods and smoothing splines. Nonparametric regression using splines has been a rapidly growing branch of statistics in recent years. Suppose the estimate $\hat{f}$ is the minimizer of

$$\sum_{k=1}^{n} \{y_k - f(t_k)\}^2 + \lambda \int \{f''(t)\}^2 \, dt$$

over the class of all twice differentiable functions $f$, where $\lambda$ is referred to as the smoothing parameter. $\lambda$ plays a key role in controlling the trade-off between the goodness of fit, represented by the residual sum of squares, and the "smoothness" of the estimate, measured by the integral of the squared second derivative.
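As a rough illustration of the penalized criterion above, the following Python sketch computes the minimizer at the design points for a fixed λ. It relies on the standard representation of the natural cubic spline roughness penalty as a quadratic form in the fitted values, so that the fitted vector is (I + λK)⁻¹y with K = QR⁻¹Qᵀ. The function names, the test curve and the value of λ are illustrative assumptions rather than anything taken from the paper, and the design points are assumed to be distinct and sorted.

```python
import numpy as np

def penalty_matrices(t):
    """Band matrices Q (n x (n-2)) and R ((n-2) x (n-2)) of the natural
    cubic spline with knots t; the roughness penalty is g' Q R^{-1} Q' g."""
    h = np.diff(t)                      # spacings between design points
    n = len(t)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):              # column j <-> interior knot j + 1
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q, R

def smoothing_spline(t, y, lam):
    """Fitted values at the design points and the hat matrix A(lam)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    Q, R = penalty_matrices(t)
    K = Q @ np.linalg.solve(R, Q.T)     # penalty matrix K = Q R^{-1} Q'
    n = len(t)
    hat = np.linalg.solve(np.eye(n) + lam * K, np.eye(n))
    return hat @ y, hat

# Hypothetical data: a noisy sine curve on an equally spaced grid.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * t) + 0.2 * rng.standard_normal(t.size)
fitted, hat = smoothing_spline(t, y, lam=1e-3)
print("effective degrees of freedom:", np.trace(hat))
```

Because K annihilates constant and linear vectors, straight-line data are reproduced exactly for any λ, and the fit approaches the least-squares line as λ grows; λ = 0 returns the data themselves.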
For smoothing splines, both deletion diagnostics and local influence diagnostics have been developed. Thomas (1991) proposed local influence diagnostics for the smoothing parameter in spline smoothing based on Cook's (1986) approach. More recently, local influence diagnostics in partially linear models and nonlinear mixed-effects models were given by Zhu et al. (2003) and Lee and Xu (2003), respectively. When the value of the smoothing parameter is fixed, the existing deletion diagnostics for smoothing splines are mostly based on the change in the fitted values, for example, Eubank (1985), Eubank and Gunst (1986) and Fung et al. (2002). In addition to the fitted values, the slope and the second derivative also carry important information for characterizing the behavior of plane curves. However, no diagnostic has yet been developed that incorporates the information provided by the slope and the second derivative in nonparametric regression using splines. Such diagnostics are developed here and their properties are investigated. Secondly, as indicated by Carroll and Ruppert (1988, p. 175), neither diagnostics nor robust methods alone are as useful as an appropriate combination of both. The more that is learned about the influential observations, the more likely it is that a sensible robust method can be developed. Therefore, a sensible robust nonparametric regression method is proposed that appropriately downweights these influential observations. The influence measures alone can be used to detect observations inconsistent with the model, while the associated robust method provides an alternative fit.
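To make the idea of a deletion diagnostic on the fitted values concrete, the sketch below computes, for a fixed λ, the squared change in the fitted vector when each case is deleted in turn. It is a generic Cook-type measure written for illustration only, not the specific statistic of any of the papers cited above. It uses the leave-one-out identity for smoothing splines, y_k − f̂⁽⁻ᵏ⁾(t_k) = r_k / (1 − a_kk), where r_k is the ordinary residual and a_kk the k-th diagonal element of the hat matrix, so no refitting is needed. All function names and the test data are hypothetical.

```python
import numpy as np

def penalty_matrix(t):
    """Natural cubic spline penalty matrix K = Q R^{-1} Q' for sorted knots t."""
    h = np.diff(t)
    n = len(t)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)

def hat_matrix(t, lam):
    """Hat matrix A(lam) = (I + lam K)^{-1} of the smoothing spline."""
    n = len(t)
    return np.linalg.solve(np.eye(n) + lam * penalty_matrix(t), np.eye(n))

def fitted_value_influence(t, y, lam):
    """Squared change in the fitted vector when case k is deleted, k = 1..n."""
    A = hat_matrix(np.asarray(t, dtype=float), lam)
    y = np.asarray(y, dtype=float)
    resid = y - A @ y
    deleted_resid = resid / (1.0 - np.diag(A))   # y_k - fhat^{(-k)}(t_k)
    # Deleting case k shifts the fitted vector by column k of A times the
    # deleted residual of case k (broadcasting scales column k by entry k).
    change = A * deleted_resid
    return np.sum(change ** 2, axis=0)

# Hypothetical data: a sine curve with one gross outlier at index 20.
t = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * t)
y[20] += 5.0
influence = fitted_value_influence(t, y, lam=1e-3)
print("most influential case:", int(np.argmax(influence)))
```

The closed-form shortcut agrees with a brute-force refit in which the deleted case is given weight zero, which is a convenient check when adapting the sketch.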
One motivating example is given in the next section to illustrate these issues. Some influence measures are developed in Section 3, while the associated robust nonparametric regression methods are described in Section 4. In Section 5, the proposed diagnostics and robust methods are applied to the data used in Section 2. In addition, a numerical simulation is set up under different situations, including influential observations of different types and different sample sizes, noise levels, and mean functions. Several extensions to a variety of nonparametric regression models are given in Section 6. Finally, a concluding discussion is given in Section 7.
Motivating examples
The data are taken from a study by Brinkman (1981) (see also Simonoff, 1996). Eighty-eight measurements of three variables, NOx, C, and E, were recorded from an experiment in which ethanol was burned in a single-cylinder automobile test engine, where NOx is the concentration of nitric oxide (NO) and nitrogen dioxide (NO2) in engine exhaust, normalized by the work done by the engine, C is the compression ratio of the engine, and E is the equivalence ratio at which the engine was run. Different
Diagnostics
In this section, the influence measures for assessing the case influence on the slope and the second derivative of the fitted curve are proposed.
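One simple way to picture case influence on derivatives is to refit the curve with one case deleted and sum, over the design points, the squared changes in the slope and in the second derivative. The measures actually proposed in this section are not reproduced in this excerpt, so the two quantities below are stand-ins chosen for illustration only. The sketch assumes a natural cubic smoothing spline with fixed λ and sorted, distinct design points; deletion is implemented by giving the case weight zero, and all function names are hypothetical.

```python
import numpy as np

def spline_matrices(t):
    """Spacings h and band matrices Q, R of the natural cubic spline."""
    h = np.diff(t)
    n = len(t)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return h, Q, R

def knot_derivatives(g, h, Q, R):
    """Slope and second derivative, at the knots, of the natural cubic
    spline taking the values g at the knots."""
    n = len(g)
    curv = np.zeros(n)                        # zero at both end knots
    curv[1:-1] = np.linalg.solve(R, Q.T @ g)  # interior second derivatives
    secant = np.diff(g) / h
    slope = np.empty(n)
    slope[:-1] = secant - h * (2.0 * curv[:-1] + curv[1:]) / 6.0
    slope[-1] = secant[-1] + h[-1] * (curv[-2] + 2.0 * curv[-1]) / 6.0
    return slope, curv

def derivative_influence(t, y, lam):
    """Summed squared change in slope and in second derivative at the
    design points when each case is deleted in turn (lam held fixed)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    h, Q, R = spline_matrices(t)
    K = Q @ np.linalg.solve(R, Q.T)
    n = len(t)

    def fit(w):
        # Weighted smoothing spline: minimizes sum w_k (y_k - g_k)^2 + lam g'Kg.
        return np.linalg.solve(np.diag(w) + lam * K, w * y)

    slope_full, curv_full = knot_derivatives(fit(np.ones(n)), h, Q, R)
    slope_infl = np.empty(n)
    curv_infl = np.empty(n)
    for k in range(n):
        w = np.ones(n)
        w[k] = 0.0                            # delete case k
        slope_k, curv_k = knot_derivatives(fit(w), h, Q, R)
        slope_infl[k] = np.sum((slope_full - slope_k) ** 2)
        curv_infl[k] = np.sum((curv_full - curv_k) ** 2)
    return slope_infl, curv_infl

# Hypothetical data: a sine curve with one gross outlier at index 20.
t = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * t)
y[20] += 5.0
slope_infl, curv_infl = derivative_influence(t, y, lam=1e-3)
print("largest slope influence at case", int(np.argmax(slope_infl)))
print("largest second-derivative influence at case", int(np.argmax(curv_infl)))
```

A case can change the slope or curvature of the fit noticeably while moving the fitted values only a little, which is why these quantities are tracked separately from a fitted-value diagnostic.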
For ease of exposition and consistency with later development in Section 6, let $f(t)=\sum_{j=1}^{p} a_j B_j(t)$, where $p$ is the number of suitably chosen basis functions, usually large enough to ensure the accuracy of the approximation (for example, $p=n+2$ when all knots are included), and $B_j(t)$ are basis functions, for example, the commonly used B-splines. Thus,
What next? Robust nonparametric regression
A variety of robust methods have been developed for nonparametric regression (see Eubank, 1988, pp. 173–176; Simonoff, 1996, pp. 200–203; He et al., 2002). Most of them are M-type estimators whose goal is to reduce the influence of outliers in the estimation process rather than to identify them. In contrast, the robust method proposed in this section first identifies the influential observations and then downweights them.
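The identify-then-downweight idea can be sketched in one step as follows: compute a deletion influence measure for every case, flag the cases that exceed a cutoff, and refit a weighted smoothing spline in which the flagged cases receive a reduced weight. The paper's own influence measures and warning limit are not reproduced in this excerpt, so the Cook-type measure, the median-plus-MAD cutoff, and the parameters `cutoff_mult` and `down_weight` below are all hypothetical placeholders, and λ is held fixed.

```python
import numpy as np

def penalty_matrix(t):
    """Natural cubic spline penalty matrix K = Q R^{-1} Q' for sorted knots t."""
    h = np.diff(t)
    n = len(t)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)

def weighted_fit(y, w, lam, K):
    """Minimizer of sum w_k (y_k - g_k)^2 + lam g'Kg at the design points."""
    return np.linalg.solve(np.diag(w) + lam * K, w * y)

def deletion_influence(y, lam, K):
    """Norm of the change in the fitted vector when each case is deleted."""
    n = len(y)
    A = np.linalg.solve(np.eye(n) + lam * K, np.eye(n))
    deleted_resid = (y - A @ y) / (1.0 - np.diag(A))
    return np.sqrt(np.sum((A * deleted_resid) ** 2, axis=0))

def robust_fit(t, y, lam, cutoff_mult=3.0, down_weight=0.1):
    """One-step robust fit: flag influential cases, then downweight them."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    K = penalty_matrix(t)
    infl = deletion_influence(y, lam, K)
    med = np.median(infl)
    mad = 1.4826 * np.median(np.abs(infl - med))   # robust scale (placeholder)
    flagged = infl > med + cutoff_mult * mad
    w = np.where(flagged, down_weight, 1.0)
    return weighted_fit(y, w, lam, K), w

# Hypothetical data: a noisy sine curve with one gross outlier at index 20.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * t) + 0.2 * rng.standard_normal(t.size)
y[20] += 5.0
fit, weights = robust_fit(t, y, lam=1e-3)
print("downweighted cases:", np.flatnonzero(weights < 1.0))
```

With a one-step rule of this kind, close neighbours of a gross outlier may be flagged along with it, because the outlier distorts the initial fit around them; the cutoff therefore deserves care when the sketch is adapted.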
When the smoothing parameter is
Numerical illustrations
In this section, one simulated example and one real data example are presented. In these examples, the influential observations discussed in Section 2 will be identified. In addition, a thorough simulation experiment was set up for a range of scenarios, including different sample sizes, replications and noise levels. The simulation results attest to the validity of the warning limit proposed in the previous section. The simulation results also illustrate the effectiveness of the
Spline smoothing in generalized linear models
Consider the standard generalized linear model in which each component of the response vector has a distribution taking the form

$$\exp\left\{\frac{y_k\theta_k - b(\theta_k)}{u(\phi)} + c(y_k, \phi)\right\},$$

where $\theta_k$ and $\phi$ are scalar parameters, and $b(\cdot)$, $u(\cdot)$ and $c(\cdot)$ are specific functions. The dependence of the response $y_k$ on the associated explanatory variable $t_k$ can be modeled through the link function $d(\cdot)$, where $\theta_k=d(\alpha+\beta t_k)$. Hereafter, $u(\phi)=1$ and the natural link are assumed. Also, let $\theta_k=f(t_k)$, where $f$ is estimated by the penalized
Discussion
As illustrated in the examples given in Section 5, the identified influential observations may have a great impact on the behavior of the fitted curve and may in turn lead to different interpretations of the data. The robust method thus provides a sensible alternative fit. The influence measures in Section 3 might still underestimate the case influence, since only the differences at the design points are used. It is also sensible to use integration rather than summation in developing these
Acknowledgements
I would like to thank a referee, an associate editor, and the editor, Professor Kontoghiorghes, for helpful suggestions that led to a substantial improvement in the paper. I would also like to thank Professor Kosorok at Madison, USA, for useful comments on an earlier version of this manuscript. The author is partly supported by Taiwan NSC Grant (Project: NSC92-2118-M-029-004).
References
- et al. Diagnostics for penalized least-squares estimators. Statist. Probab. Lett. (1986)
- Choosing among two-dimensional smoothers in practice. Comput. Statist. Data Anal. (1994)
- Smoothing parameter selection for smoothing splines: a simulation study. Comput. Statist. Data Anal. (2003)
- Ethanol fuel-A single-cylinder engine study of efficiency and exhaust emissions. SAE Trans. (1981)
- et al. Transformations and Weighting in Regression (1988)
- Assessment of local influence (with discussion). J. Roy. Statist. Soc. Ser. B (1986)
- et al. Smoothing noisy data with spline functions. Numer. Math. (1979)
- Diagnostics for smoothing splines. J. Roy. Statist. Soc. Ser. B (1985)
- Spline Smoothing and Nonparametric Regression (1988)
- et al. A note on local influence based on normal curvature. J. Roy. Statist. Soc. Ser. B (1997)