Abstract
Simple random subsampling is an integral part of S estimation algorithms for linear regression. Subsamples are required to be nonsingular. Usually, discarding a singular subsample and drawing a new one yields a sufficient number of nonsingular subsamples with reasonable computational effort. However, this procedure can require so many subsamples that it becomes infeasible, especially if levels of categorical variables have low frequency. A subsampling algorithm called nonsingular subsampling is presented, which generates only nonsingular subsamples. When no singular subsamples occur, nonsingular subsampling is as fast as the simple algorithm, and if singular subsamples do occur, it maintains the same computational order. The algorithm works consistently, unless the full design matrix is singular. The method is based on a modified LU decomposition algorithm that combines sample generation with solving the least squares problem. The algorithm may also be useful for ordinary bootstrapping. Since the method allows for S estimation in designs with factors and interactions between factors and continuous regressors, we study properties of the resulting estimators, in terms of both their dependence on the randomness of the sampling and their statistical performance.
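The idea of generating only nonsingular subsamples can be illustrated with a minimal sketch. It is not the modified LU decomposition described in the paper: it is a simplified stand-in that visits the observations in random order and keeps a row only if Gaussian elimination shows it to be linearly independent of the rows already kept. The function name `draw_nonsingular_subsample`, the absolute tolerance `tol`, and the toy design matrix are assumptions made for this illustration.

```python
import random


def draw_nonsingular_subsample(X, rng, tol=1e-10):
    """Return p row indices of X whose rows form a nonsingular p x p matrix.

    Illustrative sketch only: rows are visited in random order and a row is
    kept only if it is linearly independent of the rows kept so far.
    Returns None if X has rank smaller than p (singular full design).
    """
    n, p = len(X), len(X[0])
    order = list(range(n))
    rng.shuffle(order)

    basis = []   # (pivot column, reduced row) for each accepted observation
    chosen = []  # indices of the accepted observations
    for i in order:
        row = [float(v) for v in X[i]]
        # Eliminate the components along the rows accepted so far.
        for pivot_col, b in basis:
            factor = row[pivot_col] / b[pivot_col]
            if factor != 0.0:
                row = [r - factor * bj for r, bj in zip(row, b)]
                row[pivot_col] = 0.0  # exact zero at the pivot position
        # Use the largest remaining entry as the pivot.
        pivot_col = max(range(p), key=lambda j: abs(row[j]))
        if abs(row[pivot_col]) <= tol:
            continue  # this observation would make the subsample singular
        basis.append((pivot_col, row))
        chosen.append(i)
        if len(chosen) == p:
            return chosen
    return None


# Hypothetical design: intercept plus a dummy for a rare factor level
# (only observations 8 and 9 have the level).
X = [[1.0, 0.0]] * 8 + [[1.0, 1.0]] * 2
print(draw_nonsingular_subsample(X, random.Random(1)))
```

In this toy design a simple random subsample of size two is singular whenever both rows share the same factor level, so many draws would have to be discarded. The sketch instead skips only the offending observation, which is the behaviour the abstract attributes to nonsingular subsampling. It returns `None` only when the full design matrix itself is rank deficient.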
Notes
We used observations 311, 313, 318:319, 323, 326, 332, 502, 505:506, 508:509, 511:514, 516, 520:522, 669:692, 717, 719, 721:725, 727:730, 732, 734:740, 765:788, 1631, 1633:1636, 1639, 1641:1643, 1645:1646, 1648:1649, 1823, 1825, 1828:1829, 1831:1832, 1836, 2920, 2922, 2924, 2927, 2929:2930, 2932, 2934, 2937:2941, 3184:3186, 3188:3189, 3191, 3193:3194, 3196:3199, 3201:3207, 5573:5575, 5577:5581, 5583:5584, 5586, 5588:5591, 5593:5596, 7177:7200 of the NOxEmissions dataset from the \({\textsf {R}}\) package robustbase (Rousseeuw et al. 2015).
References
Davies PL, Gather U (2005) Breakdown and groups. Ann Stat 33(3):977–1035
Demmel J (1997) Applied numerical linear algebra. Society for Industrial and Applied Mathematics
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. The Johns Hopkins University Press, Baltimore
Hampel FR (1975) Beyond location parameters: robust concepts and methods. Bull Int Stat Inst 46:375–382
Koller M, Stahel WA (2011) Sharpening Wald-type inference in robust regression for small samples. Comput Stat Data Anal 55(8):2504–2515. doi:10.1016/j.csda.2011.02.014
Maronna RA, Yohai VJ (2000) Robust regression with both continuous and categorical predictors. J Stat Plan Inference 89(1–2):197–214. doi:10.1016/S0378-3758(99)00208-6
Maronna RA, Martin RD, Yohai VJ (2006) Robust statistics, theory and methods. Wiley, NY
Mili L, Coakley CW (1996) Robust estimation in structured linear regression. Ann Stat 24(6):2593–2607
Mili L, Phaniraj V, Rousseeuw P (1991) Least median of squares estimation in power systems. IEEE Trans Power Syst 6(2):511–523
Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer series in statistics. Springer, NY
R Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/, ISBN 3-900051-07-0
Rousseeuw P, Croux C, Todorov V, Ruckstuhl A, Salibian-Barrera M, Verbeke T, Koller M, Maechler M (2015) robustbase: basic robust statistics. http://CRAN.R-project.org/package=robustbase, R package version 0.92-5
Ruckstuhl AF (1995) Analysis of the t2 emission spectrum by robust estimation techniques. Ph.D. thesis, Swiss Federal Institute of Technology Zurich
Salibian-Barrera M, Yohai V (2006) A fast algorithm for S-regression estimates. J Comput Graph Stat 15(2):414–427
Stahel WA, Ruckstuhl AF, Senn P, Dressler K (1994) Robust estimation in the analysis of complex molecular spectra. J Am Stat Assoc 89(427):788–795
Acknowledgments
The authors would like to thank Kali Tal for providing editorial help with the manuscript. A reviewer has provided very helpful suggestions to improve earlier versions of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Koller, M., Stahel, W.A. Nonsingular subsampling for regression S estimators with categorical predictors. Comput Stat 32, 631–646 (2017). https://doi.org/10.1007/s00180-016-0679-x
DOI: https://doi.org/10.1007/s00180-016-0679-x