Summary
Regression methods are vital in environmental chemistry: they allow analysts to calibrate instruments and to model large environmental systems.
Ordinary least-squares regression, however, is poorly suited to data that contain outliers or strong nonlinearities. Robust regression methods are useful when outliers are present, while non-parametric regression models are useful when the data are nonlinear or very noisy.
Robust techniques can detect outliers and dampen their effect on the fitted model. Several robust regression methods have been proposed; this article focuses on the least median of squares (LMS) method and reweighted least squares (RLS) regression.
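As a rough illustration of how least median of squares and reweighted least squares can work together, the Python sketch below approximates the LMS fit by random subset sampling and then refits ordinary least squares on the points that LMS does not flag as outliers. It is a minimal sketch, not the chapter's own implementation: the function names `lms_fit` and `rls_fit`, the number of random trials, the synthetic data, and the conventional 2.5 cutoff and 1.4826 scale factor are all assumptions made for the example.

```python
import numpy as np

def lms_fit(X, y, n_trials=2000, seed=0):
    """Approximate LMS: try exact fits to random p-point subsets and keep
    the one with the smallest median squared residual over all points."""
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])      # design matrix with intercept
    p = A.shape[1]
    best_beta, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(A[idx], y[idx])
        except np.linalg.LinAlgError:
            continue                          # skip degenerate subsets
        med = np.median((y - A @ beta) ** 2)
        if med < best_med:
            best_beta, best_med = beta, med
    return best_beta, best_med

def rls_fit(X, y, beta_lms, med_lms, cutoff=2.5):
    """Reweighted LS: weight 1 for points whose standardized LMS residual
    is within the cutoff, weight 0 otherwise, then ordinary least squares."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]
    # Assumed preliminary scale estimate with a finite-sample correction.
    scale = 1.4826 * (1 + 5 / (n - p)) * np.sqrt(med_lms)
    w = np.abs(y - A @ beta_lms) / scale <= cutoff
    beta, *_ = np.linalg.lstsq(A[w], y[w], rcond=None)
    return beta, w

# Synthetic calibration line y = 2 + 3x with 20 % gross outliers.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2 + 3 * x + rng.normal(0, 0.1, size=50)
y[-10:] = 0.0                                 # corrupt the last ten points

A_full = np.column_stack([np.ones_like(x), x])
beta_ols, *_ = np.linalg.lstsq(A_full, y, rcond=None)
beta_lms, med_lms = lms_fit(x, y)
beta_rls, weights = rls_fit(x, y, beta_lms, med_lms)
print("OLS:", beta_ols, "RLS:", beta_rls)
```

The random-subset search only approximates the exact LMS estimate; more trials reduce the chance of missing a good subset at the cost of run time.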
The non-parametric models discussed are the ACE, PI and MARS models. Unlike ordinary least squares, these methods were developed only recently, so the documentation available on them is limited.
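To give a feel for the building block behind spline-based models such as MARS, the sketch below evaluates a truncated power function of degree q at a knot t and uses a reflected pair of such functions in an ordinary least-squares fit. It is only an illustration under stated assumptions: the helper name `truncated_power`, the hand-picked knot, and the noise-free test function are hypothetical, and the adaptive knot selection that MARS performs is not shown.

```python
import numpy as np

def truncated_power(x, t, q=1):
    """(x - t)_+^q for q >= 1: zero at and left of the knot t,
    a degree-q power of the distance to its right."""
    return np.maximum(x - t, 0.0) ** q

# Noise-free target with a kink at 0.5 (assumed for illustration).
x = np.linspace(0.0, 1.0, 101)
y = np.abs(x - 0.5)

t = 0.5                                       # knot fixed by hand
basis = np.column_stack([
    np.ones_like(x),                          # constant term
    truncated_power(x, t),                    # (x - t)_+
    truncated_power(-x, -t),                  # reflected partner (t - x)_+
])
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
print(coef)
```

With q = 1 each basis function is a hinge, so a sum of hinges gives a continuous piecewise-linear fit; larger q gives smoother pieces.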
Abbreviations
- LS: ordinary least squares
- ACE: alternating conditional expectations
- PI: pi implementation
- MARS: multivariate adaptive regression splines
- LMS: least median of squares
- RLS: reweighted least squares
- \(PE_{gcv}\): generalized cross-validation estimate of the prediction error
- lof: lack-of-fit
- mi: maximum level of interaction
- RSS: residual sum of squares
- mad: median absolute deviation
- CV: cross-validating
- NP: number of estimated parameters
- GCV: generalized cross-validation
- MSE: mean square error
- AIC: Akaike’s information criterion
- y: response variable
- \(X_i\): i-th predictor variable; i = 1, ..., m
- ε: residual component
- \(\beta_i\): i-th regression coefficient
- m: number of predictor variables
- N(·): normal distribution
- n: number of observational units
- Z: sample of data vectors
- Z′: corrupted sample of data vectors
- T: regression estimator
- \(\hat{(\cdot)}\): estimate of (·)
- ‖(·)‖: magnitude of (·)
- гn: finite sample breakdown point
- \(\sigma^2\): population variance
- σ: population standard deviation
- \(\hat{\sigma}_{LS}\): population standard deviation estimated by LS
- \(\hat{\sigma}_{LMS}\): population standard deviation estimated by LMS
- z: standardized residual component
- |(·)|: modulus of (·)
- \(w_i\): weight assigned to the i-th observational unit
- \(R^2\): coefficient of determination
- \(R^2_{cv}\): cross-validated coefficient of determination
- \(R^2_{adj}\): adjusted coefficient of determination
- s: scatterplot smoother
- \(c_i\): i-th cutpoint
- \(R_k\): k-th set of observations
- η(·): neighborhood of (·)
- a: constant
- d(·): an even function that decreases with |(·)|
- \(S_{0j}\): weight given to \(y_j\) in producing a smoothing estimate of the observation \(x_0\)
- k: number of observations in a symmetric neighborhood
- \(r_{-i}\): cross-validated residual component
- \(s_{-i}\): fitted smooth with the point \(x_i\) removed
- \(\hat{\xi}\): local error estimate
- K*: initial span
- ϱ(·): truncated power function
- q: spline degree
- t: knot
- θ(·): transformed response
- g(·): transformed predictor
- \(e^2\): unexplained variance
- θ*(·): optimal transformed response
- g*(·): optimal transformed predictor
- \(\Pi_j\): j-th product in a PI model
- ø(·): cubic spline basis function
- J: number of products in a PI model
- K: number of knots in each cubic spline basis function
- J*: optimal number of products in a PI model
- U(·): uniform distribution
- \(B_p\): p-th multivariate spline basis function
- P: number of multivariate spline basis functions
- V: number of groups the data is divided into for cross-validation
- \(\bar{(\cdot)}\): mean value of (·)
- \(f_{-v}\): estimated function when the v-th group has been removed
© 1995 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Mallet, Y.L., Coomans, D.H., de Vel, O.Y. (1995). Robust and Non-parametric Methods in Multiple Regressions of Environmental Data. In: Einax, J. (eds) Chemometrics in Environmental Chemistry - Statistical Methods. The Handbook of Environmental Chemistry, vol 2 / 2G. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49148-4_6
DOI: https://doi.org/10.1007/978-3-540-49148-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-14885-3
Online ISBN: 978-3-540-49148-4
eBook Packages: Springer Book Archive