Nonsingular subsampling for regression S estimators with categorical predictors

  • Original Paper
  • Computational Statistics

Abstract

Simple random subsampling is an integral part of S estimation algorithms for linear regression. Subsamples are required to be nonsingular. Usually, discarding a singular subsample and drawing a new one leads to a sufficient number of nonsingular subsamples with a reasonable computational effort. However, this procedure can require so many subsamples that it becomes infeasible, especially if levels of categorical variables have low frequency. A subsampling algorithm called nonsingular subsampling is presented, which generates only nonsingular subsamples. When no singular subsamples occur, nonsingular subsampling is as fast as the simple algorithm, and if singular subsamples do occur, it maintains the same computational order. The algorithm works consistently, unless the full design matrix is singular. The method is based on a modified LU decomposition algorithm that combines sample generation with solving the least squares problem. The algorithm may also be useful for ordinary bootstrapping. Since the method allows for S estimation in designs with factors and interactions between factors and continuous regressors, we study properties of the resulting estimators, both in the sense of their dependence on the randomness of the sampling and of their statistical performance.
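The core idea in the abstract, keeping a singular draw and replacing only the offending observation instead of discarding the whole subsample, can be illustrated with a small elimination routine. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it uses plain Gaussian elimination with pivoting rather than their modified LU decomposition, it does not solve the least squares problem, and the names `nonsingular_subsample` and `tol` (an absolute tolerance that presumes roughly unit-scaled columns) are hypothetical.

```python
import random


def nonsingular_subsample(X, rng=None, tol=1e-9):
    """Return indices of p linearly independent rows of the n x p matrix X.

    Illustrative sketch only: rows are visited in random order, and a row is
    skipped (not the whole subsample) when it is linearly dependent on the
    rows accepted so far.
    """
    rng = rng or random.Random()
    n, p = len(X), len(X[0])
    order = list(range(n))
    rng.shuffle(order)  # random visiting order = draws without replacement

    basis = []   # (pivot column, reduced row) for each accepted observation
    chosen = []  # indices of accepted observations
    for i in order:
        row = [float(v) for v in X[i]]
        # Eliminate the components along the rows accepted so far.
        for pivot, b in basis:
            factor = row[pivot] / b[pivot]
            if factor != 0.0:
                row = [r - factor * bv for r, bv in zip(row, b)]
                row[pivot] = 0.0
        # Pivot on the largest remaining entry.
        pivot = max(range(p), key=lambda j: abs(row[j]))
        if abs(row[pivot]) <= tol:
            continue  # dependent row: skip this observation only
        basis.append((pivot, row))
        chosen.append(i)
        if len(chosen) == p:
            return chosen
    raise ValueError("full design matrix is singular")


# Intercept, dummy for a level that occurs only once, continuous regressor.
X = [[1.0, 0.0, float(i)] for i in range(9)] + [[1.0, 1.0, 4.0]]
subsample = nonsingular_subsample(X, random.Random(1))
```

In this toy design, any subsample of three rows that misses the last observation has an all-zero dummy column and is therefore singular, so simple random subsampling would reject most draws. The sketch instead skips dependent rows until the rare observation is reached, and it fails only when the full design matrix is singular, as in the second case tested by a matrix with proportional rows.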

Notes

  1. We used observations 311, 313, 318:319, 323, 326, 332, 502, 505:506, 508:509, 511:514, 516, 520:522, 669:692, 717, 719, 721:725, 727:730, 732, 734:740, 765:788, 1631, 1633:1636, 1639, 1641:1643, 1645:1646, 1648:1649, 1823, 1825, 1828:1829, 1831:1832, 1836, 2920, 2922, 2924, 2927, 2929:2930, 2932, 2934, 2937:2941, 3184:3186, 3188:3189, 3191, 3193:3194, 3196:3199, 3201:3207, 5573:5575, 5577:5581, 5583:5584, 5586, 5588:5591, 5593:5596, 7177:7200 of the NOxEmissions dataset from the R package robustbase (Rousseeuw et al. 2015).

References

  • Davies PL, Gather U (2005) Breakdown and groups. Ann Stat 33(3):977–1035

  • Demmel J (1997) Applied numerical linear algebra. Society for Industrial and Applied Mathematics

  • Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. The Johns Hopkins University Press, Baltimore

  • Hampel FR (1975) Beyond location parameters: robust concepts and methods. Bull Int Stat Inst 46:375–382

  • Koller M, Stahel WA (2011) Sharpening Wald-type inference in robust regression for small samples. Comput Stat Data Anal 55(8):2504–2515. doi:10.1016/j.csda.2011.02.014

  • Maronna RA, Yohai VJ (2000) Robust regression with both continuous and categorical predictors. J Stat Plan Inference 89(1–2):197–214. doi:10.1016/S0378-3758(99)00208-6

  • Maronna RA, Martin RD, Yohai VJ (2006) Robust statistics, theory and methods. Wiley, NY

  • Mili L, Coakley CW (1996) Robust estimation in structured linear regression. Ann Stat 24(6):2593–2607

  • Mili L, Phaniraj V, Rousseeuw P (1991) Least median of squares estimation in power systems. IEEE Trans Power Syst 6(2):511–523

  • Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer series in statistics. Springer, NY

  • R Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/, ISBN 3-900051-07-0

  • Rousseeuw P, Croux C, Todorov V, Ruckstuhl A, Salibian-Barrera M, Verbeke T, Koller M, Maechler M (2015) robustbase: basic robust statistics. http://CRAN.R-project.org/package=robustbase, R package version 0.92-5

  • Ruckstuhl AF (1995) Analysis of the t2 emission spectrum by robust estimation techniques. Ph.D. thesis, Swiss Federal Institute of Technology Zurich

  • Salibian-Barrera M, Yohai V (2006) A fast algorithm for S-regression estimates. J Comput Graph Stat 15(2):414–427

  • Stahel WA, Ruckstuhl AF, Senn P, Dressler K (1994) Robust estimation in the analysis of complex molecular spectra. J Am Stat Assoc 89(427):788–795

Acknowledgments

The authors would like to thank Kali Tal for providing editorial help with the manuscript. A reviewer has provided very helpful suggestions to improve earlier versions of the manuscript.

Author information

Corresponding author

Correspondence to Manuel Koller.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Koller, M., Stahel, W.A. Nonsingular subsampling for regression S estimators with categorical predictors. Comput Stat 32, 631–646 (2017). https://doi.org/10.1007/s00180-016-0679-x

  • DOI: https://doi.org/10.1007/s00180-016-0679-x
