Robust and Non-parametric Methods in Multiple Regressions of Environmental Data

Chapter in: Chemometrics in Environmental Chemistry - Statistical Methods

Part of the book series: The Handbook of Environmental Chemistry (HEC2, volume 2 / 2G)

Summary

Statistical regression methods are of vital importance in environmental chemistry. Regression techniques allow environmental chemical analysts to calibrate instruments and to model large environmental systems.

It has become apparent that ordinary least-squares (LS) regression is poorly suited to data that contain outliers or strong nonlinearities. When outliers are present, robust regression methods are a useful tool; when the data exhibit nonlinearities or high levels of noise, various non-parametric regression models are useful.
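To make the outlier sensitivity of LS concrete, the following Python sketch fits a straight line by closed-form least squares before and after one response is replaced by a gross outlier. It assumes a single predictor and uses made-up illustrative data; the function name `ols` and the data are hypothetical and do not come from the chapter.

```python
def ols(xs, ys):
    """Closed-form ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


xs = list(range(1, 11))
clean = [2 * x + 1 for x in xs]        # points lying exactly on y = 1 + 2x
corrupt = clean[:-1] + [100.0]         # one gross outlier replaces the response at x = 10

b0_clean, b1_clean = ols(xs, clean)
b0_bad, b1_bad = ols(xs, corrupt)
print(f"clean fit:    y = {b0_clean:.2f} + {b1_clean:.2f} x")
print(f"with outlier: y = {b0_bad:.2f} + {b1_bad:.2f} x")
```

A single corrupted observation out of ten moves the slope from 2 to about 6.31 and the intercept from 1 to about -14.8, because every squared residual, including the outlier's, enters the LS criterion with equal weight.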

Robust techniques can detect outliers and dampen their effect on the fitted model. Several robust regression methods have been proposed, but this article focuses on the least median of squares (LMS) method and reweighted least squares (RLS) regression.
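The LMS-then-RLS idea can be sketched in Python for a single predictor. The simplifications are assumptions made for this example, not the chapter's procedure: LMS is found by exhaustive search over lines through pairs of observations, the plain median of squared residuals is the criterion, the preliminary scale \(\hat\sigma_{LMS} = 1.4826\,(1 + 5/(n-p))\sqrt{\operatorname{med}_i r_i^2}\) follows the form commonly given by Rousseeuw and Leroy, and observations with \(|r_i/\hat\sigma_{LMS}| > 2.5\) receive weight \(w_i = 0\) before an ordinary LS refit. Function names and data are hypothetical.

```python
from itertools import combinations
from statistics import median


def ols(xs, ys):
    """Closed-form ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def lms(xs, ys):
    """Least median of squares: try the line through every pair of points and
    keep the one with the smallest median squared residual."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, not of the form y = b0 + b1*x
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        crit = median((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
        if best is None or crit < best[0]:
            best = (crit, b0, b1)
    return best  # (median squared residual, intercept, slope)


def rls(xs, ys, cutoff=2.5):
    """Reweighted LS: 0/1 weights from standardized LMS residuals, then an LS refit."""
    crit, b0, b1 = lms(xs, ys)
    n, p = len(xs), 2
    sigma_lms = 1.4826 * (1 + 5 / (n - p)) * crit ** 0.5  # preliminary robust scale
    sigma_lms = max(sigma_lms, 1e-12)  # guard: an exact fit to half the data gives zero scale
    weights = [1 if abs((y - b0 - b1 * x) / sigma_lms) <= cutoff else 0
               for x, y in zip(xs, ys)]
    kept = [(x, y) for x, y, w in zip(xs, ys, weights) if w == 1]
    fit = ols([x for x, _ in kept], [y for _, y in kept])
    return fit, weights


xs = list(range(1, 11))
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2, 0.1, -0.1]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]
ys[-1] = 100.0  # gross outlier at x = 10

(b0, b1), w = rls(xs, ys)
print("weights:", w)
print(f"RLS fit: y = {b0:.2f} + {b1:.2f} x")
```

The pairwise search is only practical for small \(n\) and one predictor; for multiple regression, LMS is usually approximated by random elemental subsets of \(p\) observations.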

The non-parametric models discussed are the alternating conditional expectations (ACE) model, the PI model, and the multivariate adaptive regression splines (MARS) model. Unlike ordinary least squares, these methods were developed only recently, so little documentation on them is available.
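To show how ACE alternates between a transformed response \(\theta(y)\) and a transformed predictor \(g(x)\), here is a simplified single-predictor sketch in Python. It follows the Breiman and Friedman scheme of alternately setting \(g(x) = E[\theta(y) \mid x]\) and \(\theta(y) = E[g(x) \mid y]\), rescaled to zero mean and unit variance, and reports the unexplained variance \(e^2 = E[(\theta(y) - g(x))^2]\). As an assumption for the example, their variable-span smoother is replaced by a fixed-width running mean over ranks; the window \(k\), the iteration count, the function names, and the test data are all hypothetical.

```python
def standardize(v):
    """Rescale to zero mean and unit variance (population convention)."""
    n = len(v)
    mean = sum(v) / n
    sd = (sum((a - mean) ** 2 for a in v) / n) ** 0.5
    return [(a - mean) / sd for a in v]


def running_mean(u, v, k=5):
    """Estimate E[v | u] at each observation by averaging v over a symmetric
    window of k neighbours in the rank order of u (truncated at the ends)."""
    n = len(u)
    order = sorted(range(n), key=lambda i: u[i])
    half = k // 2
    out = [0.0] * n
    for rank, i in enumerate(order):
        window = [v[order[j]] for j in range(max(0, rank - half), min(n, rank + half + 1))]
        out[i] = sum(window) / len(window)
    return out


def ace(x, y, iterations=20, k=5):
    """Single-predictor ACE: returns theta(y), g(x) and e2 = mean((theta - g)^2)."""
    theta = standardize(y)
    for _ in range(iterations):
        g = running_mean(x, theta, k)               # g(x) = E[theta(y) | x]
        theta = standardize(running_mean(y, g, k))  # theta(y) = E[g(x) | y], renormalized
    g = running_mean(x, theta, k)
    e2 = sum((t - gi) ** 2 for t, gi in zip(theta, g)) / len(x)
    return theta, g, e2


x = [i / 40 for i in range(41)]
y = [xi ** 3 for xi in x]  # monotone but nonlinear response
theta, g, e2 = ace(x, y)
print(f"unexplained variance e^2 = {e2:.4f}")
```

With a real scatterplot smoother the same loop applies unchanged; only `running_mean` would be swapped out.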

Abbreviations

LS: ordinary least squares
ACE: alternating conditional expectations
PI: pi implementation
MARS: multivariate adaptive regression splines
LMS: least median of squares
RLS: reweighted least squares
\(PE_{gcv}\): generalized cross-validation estimate of the prediction error
lof: lack-of-fit
mi: maximum level of interaction
RSS: residual sum of squares
mad: median absolute deviation
CV: cross-validation
NP: number of estimated parameters
GCV: generalized cross-validation
MSE: mean square error
AIC: Akaike’s information criterion
\(y\): response variable
\(X_i\): i-th predictor variable; \(i = 1, \ldots, m\)
\(\varepsilon\): residual component
\(\beta_i\): i-th regression coefficient
\(m\): number of predictor variables
\(N(\cdot)\): normal distribution
\(n\): number of observational units
\(Z\): sample of data vectors
\(Z'\): corrupted sample of data vectors
\(T\): regression estimator
\(\hat{(\cdot)}\): estimate of (·)
\(\lVert\cdot\rVert\): magnitude of (·)
гn: finite sample breakdown point
\(\sigma^2\): population variance
\(\sigma\): population standard deviation
\(\hat\sigma_{LS}\): population standard deviation estimated by LS
\(\hat\sigma_{LMS}\): population standard deviation estimated by LMS
\(z\): standardized residual component
\(|\cdot|\): modulus (absolute value) of (·)
\(w_i\): weight assigned to the i-th observational unit
\(R^2\): coefficient of determination
\(R^2_{cv}\): cross-validated coefficient of determination
\(R^2_{adj}\): adjusted coefficient of determination
\(s\): scatterplot smoother
\(c_i\): i-th cutpoint
\(R_k\): k-th set of observations
\(\eta(\cdot)\): neighborhood of (·)
\(a\): constant
\(d(\cdot)\): an even function that decreases with |(·)|
\(S_{0j}\): weight given to \(y_j\) in producing a smoothed estimate at the observation \(x_0\)
\(k\): number of observations in a symmetric neighborhood
\(r_{-i}\): cross-validated residual component
\(s_{-i}\): fitted smooth with the point \(x_i\) removed
\(\hat\xi\): local error estimate
\(K^*\): initial span
\(\varrho(\cdot)\): truncated power function
\(q\): spline degree
\(t\): knot
\(\theta(\cdot)\): transformed response
\(g(\cdot)\): transformed predictor
\(e^2\): unexplained variance
\(\theta^*(\cdot)\): optimal transformed response
\(g^*(\cdot)\): optimal transformed predictor
\(\Pi_j\): j-th product in a PI model
\(\phi(\cdot)\): cubic spline basis function
\(J\): number of products in a PI model
\(K\): number of knots in each cubic spline basis function
\(J^*\): optimal number of products in a PI model
\(U(\cdot)\): uniform distribution
\(B_p\): p-th multivariate spline basis function
\(P\): number of multivariate spline basis functions
\(V\): number of groups the data are divided into for cross-validation
\(\bar{(\cdot)}\): mean value of (·)
\(f_{-v}\): estimated function when the v-th group has been removed
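Several of the symbols above, namely the truncated power function \(\varrho(\cdot)\), spline degree \(q\), knot \(t\), RSS, NP and GCV, work together in spline models such as MARS, whose basis functions \(B_p\) are products of truncated power terms. The Python sketch below shows one common textbook form of each, \(\varrho_q(x - t) = [(x - t)_+]^q\) and \(\mathrm{GCV} = (\mathrm{RSS}/n)/(1 - \mathrm{NP}/n)^2\). These forms are assumptions made for illustration: MARS itself replaces NP with an effective parameter count that also charges for knot selection, and the chapter's exact penalty is not reproduced. Function names and data are hypothetical.

```python
def truncated_power(x, t, q=1):
    """Truncated power function [(x - t)_+]^q: zero at or left of the knot t."""
    return (x - t) ** q if x > t else 0.0


def gcv(residuals, n_params):
    """Generalized cross-validation score: (RSS / n) / (1 - NP / n)^2."""
    n = len(residuals)
    if n_params >= n:
        raise ValueError("GCV is undefined when NP >= n")
    rss = sum(r * r for r in residuals)
    return (rss / n) / (1.0 - n_params / n) ** 2


xs = [i / 10 for i in range(11)]
basis = [truncated_power(x, 0.5) for x in xs]  # linear hinge (q = 1) with knot t = 0.5
print(basis)

res = [0.1, -0.1] * 5 + [0.0]  # hypothetical residual vector, n = 11
print(gcv(res, 2), gcv(res, 4))  # same RSS, larger NP gives a larger (worse) score
```

Because the denominator shrinks as NP grows, GCV penalizes extra basis functions even when RSS is unchanged, which is what lets it act as a lack-of-fit criterion for choosing model size.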

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mallet, Y.L., Coomans, D.H., de Vel, O.Y. (1995). Robust and Non-parametric Methods in Multiple Regressions of Environmental Data. In: Einax, J. (ed.) Chemometrics in Environmental Chemistry - Statistical Methods. The Handbook of Environmental Chemistry, vol 2 / 2G. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49148-4_6

  • DOI: https://doi.org/10.1007/978-3-540-49148-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-14885-3

  • Online ISBN: 978-3-540-49148-4

  • eBook Packages: Springer Book Archive
