Robust and Non-parametric Methods in Multiple Regressions of Environmental Data

Mallet, Yvette L.; Coomans, Danny H.; de Vel, Olivier Y.

doi:10.1007/978-3-540-49148-4_6


Part of the book series: The Handbook of Environmental Chemistry (HEC2, volume 2 / 2G)


Summary

Statistical regression methods in environmental chemistry are of vital importance. Regression techniques provide environmental chemical analysts with the ability to calibrate instruments and model large environmental systems.

It has become apparent that ordinary least-squares regression is not well suited to modeling data that contain outliers or strong nonlinearities. When outliers are present, robust regression methods are a useful tool, while non-parametric regression models are useful when the data possess nonlinearities or high levels of noise.

Robust techniques have the ability to detect outliers and dampen their effect on the modeling procedure. Several robust regression methods have been proposed but this article focuses on the least median of squares method and reweighted least squares regression.
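As a rough illustration of how least median of squares (LMS) followed by reweighted least squares (RLS) can dampen outliers, the sketch below approximates the LMS fit by random subsampling and then refits by least squares with zero weight on observations whose standardized residual is large. The function names, the synthetic data, the 2.5 cutoff and the scale factor are assumptions made for this example (they follow a common textbook convention) and are not taken from the chapter itself.

```python
import numpy as np

def lms_fit(X, y, n_trials=2000, seed=0):
    """Approximate LMS: try exact fits to random subsets of p points and
    keep the coefficients with the smallest median squared residual."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = np.column_stack([np.ones(n), X])      # intercept + predictors
    p = m + 1
    best_beta, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(A[idx], y[idx])
        except np.linalg.LinAlgError:
            continue                          # skip degenerate subsets
        med = np.median((y - A @ beta) ** 2)
        if med < best_med:
            best_beta, best_med = beta, med
    return best_beta, best_med

def rls_fit(X, y, beta_lms, med_lms, cutoff=2.5):
    """RLS: weight 1 if |standardized LMS residual| <= cutoff, else 0,
    then ordinary least squares on the retained observations."""
    n, m = X.shape
    A = np.column_stack([np.ones(n), X])
    p = m + 1
    # Robust scale estimate (assumed form, with finite-sample correction).
    sigma = 1.4826 * (1 + 5 / (n - p)) * np.sqrt(med_lms)
    z = (y - A @ beta_lms) / sigma            # standardized residuals
    w = (np.abs(z) <= cutoff).astype(float)   # hard 0/1 weights
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta, w

# Hypothetical data: y = 1 + 2x + noise, with the 10 largest-x points shifted.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 50)
y = 1 + 2 * x + rng.normal(0, 0.5, 50)
out = np.argsort(x)[-10:]
y[out] += 30                                  # gross outliers
X = x[:, None]

A = np.column_stack([np.ones(len(x)), x])
beta_ols = np.linalg.lstsq(A, y, rcond=None)[0]
beta_lms, med_lms = lms_fit(X, y)
beta_rls, w = rls_fit(X, y, beta_lms, med_lms)
print("LS slope:", beta_ols[1], "RLS slope:", beta_rls[1])
```

In this sketch the shifted points pull the ordinary least-squares slope away from 2, while the LMS residuals flag them so that RLS can ignore them. An exact LMS fit would require an exhaustive search; random subsampling is a common approximation.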

The non-parametric models discussed are the ACE model, the PI model and the MARS model. Unlike ordinary least squares, these methods have evolved only recently, so documentation on them is limited.
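To give a feel for the spline building blocks behind a method such as MARS, the sketch below fits a single pair of truncated power functions of degree q = 1 at a knot t chosen by minimizing the residual sum of squares. This is a deliberately simplified, one-predictor illustration and not the full MARS procedure; the function names and the synthetic data are assumptions made for this example.

```python
import numpy as np

def hinge_pair(x, t, q=1):
    """Truncated power functions of degree q on either side of knot t."""
    left = np.maximum(t - x, 0.0) ** q
    right = np.maximum(x - t, 0.0) ** q
    return left, right

def fit_one_knot(x, y, q=1):
    """Search interior data points as candidate knots and keep the knot
    whose least-squares fit has the smallest residual sum of squares."""
    best_rss, best_t, best_coef = np.inf, None, None
    for t in np.unique(x)[1:-1]:
        left, right = hinge_pair(x, t, q)
        B = np.column_stack([np.ones_like(x), left, right])
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        rss = np.sum((y - B @ coef) ** 2)
        if rss < best_rss:
            best_rss, best_t, best_coef = rss, t, coef
    return best_rss, best_t, best_coef

# Hypothetical nonlinear data with a kink at x = 4.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
y = np.abs(x - 4.0) + rng.normal(0, 0.2, 200)

rss_knot, t_hat, coef = fit_one_knot(x, y)

# Straight-line least-squares fit for comparison.
L = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(L, y, rcond=None)
rss_line = np.sum((y - L @ b) ** 2)
print("estimated knot:", t_hat, "RSS (knot):", rss_knot, "RSS (line):", rss_line)
```

A single knot is enough here because the synthetic response has one kink. Adaptive spline methods add and prune such basis functions over several predictors, typically guided by a criterion such as generalized cross-validation.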


Abbreviations

LS :: ordinary least squares
ACE :: alternating conditional expectations
PI :: pi implementation
MARS :: multivariate adaptive regression splines
LMS :: least median of squares
RLS :: reweighted least squares
PE_gcv :: generalized cross-validation estimate of the prediction error
lof :: lack-of-fit
mi :: maximum level of interaction
RSS :: residual sum of squares
mad :: median absolute deviation
CV :: cross-validating
NP :: number of estimated parameters
GCV :: generalized cross-validation
MSE :: mean square error
AIC :: Akaike’s information criterion
y :: response variable
X_i :: i-th predictor variable; i = 1, ..., m
ε :: residual component
β_i :: i-th regression coefficient
m :: number of predictor variables
N(·) :: normal distribution
n :: number of observational units
Z :: sample of data vectors
Z′ :: corrupted sample of data vectors
T :: regression estimator
\(\hat{(\cdot)}\) :: estimate of (·)
‖(·)‖ :: magnitude of (·)
г_n :: finite sample breakdown point
σ² :: population variance
σ :: population standard deviation
\(\hat{\sigma}_{LS}\) :: population standard deviation estimated by LS
\(\hat{\sigma}_{LMS}\) :: population standard deviation estimated by LMS
z :: standardized residual component
|(·)| :: modulus of (·)
w_i :: weight assigned to the i-th observational unit
R² :: coefficient of determination
R²_cv :: cross-validated coefficient of determination
R²_adj :: adjusted coefficient of determination
s :: scatterplot smoother
c_i :: i-th cutpoint
R_k :: k-th set of observations
η(·) :: neighborhood of (·)
a :: constant
d(·) :: an even function that decreases with |(·)|
S_0j :: weight given to y_j in producing a smoothing estimate of the observation x₀
k :: number of observations in a symmetric neighborhood
r⁻ⁱ :: cross-validated residual component
s⁻ⁱ :: fitted smooth with the point x_i removed
\(\hat{\xi}\) :: local error estimate
K* :: initial span
ϱ(·) :: truncated power function
q :: spline degree
t :: knot
θ(·) :: transformed response
g(·) :: transformed predictor
e² :: unexplained variance
θ*(·) :: optimal transformed response
g*(·) :: optimal transformed predictor
Π_j :: j-th product in a PI model
ø(·) :: cubic spline basis function
J :: number of products in a PI model
K :: number of knots in each cubic spline basis function
J* :: optimal number of products in a PI model
U(·) :: uniform distribution
B_p :: p-th multivariate spline basis function
P :: number of multivariate spline basis functions
V :: number of groups the data is divided into for cross-validation
\(\bar{(\cdot)}\) :: mean value of (·)
f^−v :: estimated function when the v-th group has been removed.


Author information

Authors and Affiliations

Department of Mathematics and Statistics, James Cook University, Townsville, QLD, 4811, Australia
Yvette L. Mallet & Danny H. Coomans
Department of Computer Science, James Cook University, Townsville, QLD, 4811, Australia
Olivier Y. de Vel


Editor information

Editors and Affiliations

Institute of Inorganic and Analytical Chemistry, Friedrich Schiller University, Lessingstraße 8, D-07743, Jena, Germany
Jürgen Einax


About this chapter

Cite this chapter

Mallet, Y.L., Coomans, D.H., de Vel, O.Y. (1995). Robust and Non-parametric Methods in Multiple Regressions of Environmental Data. In: Einax, J. (ed.) Chemometrics in Environmental Chemistry - Statistical Methods. The Handbook of Environmental Chemistry, vol 2 / 2G. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49148-4_6


DOI: https://doi.org/10.1007/978-3-540-49148-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-14885-3
Online ISBN: 978-3-540-49148-4
eBook Packages: Springer Book Archive
