Abstract
Estimator selection has become a crucial issue in nonparametric estimation. Two widely used methods are penalized empirical risk minimization (such as penalized log-likelihood estimation) and pairwise comparison (such as Lepski's method). Our aim in this paper is twofold. First, we explain some general ideas about the calibration issue of estimator selection methods. We review some known results, putting the emphasis on the concept of minimal penalty, which is helpful to design data-driven selection criteria. Secondly, we present a new method for bandwidth selection within the framework of kernel density estimation which is in some sense intermediate between the two main methods mentioned above. We provide theoretical results which lead to a fully data-driven selection strategy.
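To fix ideas on the bandwidth selection problem the abstract refers to, here is a minimal sketch of a classical data-driven baseline: least-squares cross-validation for a Gaussian-kernel density estimator (in the spirit of Silverman, 1986). This is not the method introduced in the paper; the function names `lscv_score` and `select_bandwidth`, the bandwidth grid, and the simulated data are illustrative assumptions.

```python
import numpy as np

def gauss(x, s):
    # Density of N(0, s^2) evaluated at x.
    return np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def lscv_score(data, h):
    """Least-squares cross-validation criterion for bandwidth h:
    an unbiased estimate (up to a constant) of the integrated
    squared error of the Gaussian-kernel density estimator."""
    n = len(data)
    d = data[:, None] - data[None, :]              # pairwise differences
    # Integral of f_h^2: convolution of two Gaussian kernels has scale h*sqrt(2).
    term1 = gauss(d, h * np.sqrt(2)).sum() / n**2
    # Leave-one-out term: drop the diagonal (i = j) contributions.
    off = gauss(d, h)
    np.fill_diagonal(off, 0.0)
    term2 = 2.0 * off.sum() / (n * (n - 1))
    return term1 - term2

def select_bandwidth(data, grid):
    # Minimize the criterion over a finite bandwidth grid.
    scores = [lscv_score(data, h) for h in grid]
    return grid[int(np.argmin(scores))]

rng = np.random.default_rng(0)
x = rng.normal(size=200)                  # toy sample from N(0, 1)
grid = np.linspace(0.05, 1.5, 30)
h_star = select_bandwidth(x, grid)
```

The paper's proposal replaces this pairwise-sum criterion with a comparison-based criterion equipped with a calibrated penalty; the sketch above only illustrates what "fully data-driven bandwidth selection" means in practice.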
References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proceedings of the 2nd International Symposium on Information Theory (B. N. Petrov and F. Csáki, eds.). Akadémiai Kiadó, Budapest, pp. 267–281.
Arlot, S. and Bach, F. (2009). Data-driven calibration of linear estimators with minimal penalties. In Advances in Neural Information Processing Systems. (Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams and A. Culotta, eds.). Vol. 22, pp. 46–54.
Arlot, S. and Massart, P. (2009). Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res. 10, 245–279 (electronic).
Bahadur, R.R. (1958). Examples of inconsistency of maximum likelihood estimates. Sankhya Ser. A 20, 207–210.
Barron, A.R. and Cover, T.M. (1991). Minimum complexity density estimation. IEEE Trans. Inform. Theory 37, 1034–1054.
Barron, A.R., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Th. Rel. Fields 113, 301–415.
Baudry, J.-P., Maugis, C. and Michel, B. (2011). Slope heuristics: overview and implementation. Stat. Comput., 1–16.
Bertin, K., Lacour, C. and Rivoirard, V. (2016). Adaptive pointwise estimation of conditional density function. Ann. Inst. Henri Poincaré Probab. Stat. 52, 939–980.
Bertin, K., Le Pennec, E. and Rivoirard, V. (2011). Adaptive Dantzig density estimation. Ann. Inst. Henri Poincaré Probab. Stat. 47, 43–74.
Bickel, P.J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 37, 1705–1732.
Birgé, L. and Massart, P. (1993). Rates of convergence for minimum contrast estimators. Probab. Th. Relat. Fields 97, 113–150.
Birgé, L. and Massart, P. (1998). Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4, 329–375.
Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3, 203–268.
Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probab. Th. Rel. Fields 138, 33–73.
Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration Inequalities. Oxford University Press.
Daniel, C. and Wood, F.S. (1971). Fitting Equations to Data. Wiley, New York.
Devroye, L. and Lugosi, G. (2001). Combinatorial methods in density estimation, Springer Series in Statistics. Springer, New York.
Donoho, D.L. and Johnstone, I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455.
Donoho, D.L. and Johnstone, I.M. (1994). Ideal denoising in an orthonormal basis chosen from a library of bases. C. R. Acad. Sc. Paris Sér. I Math. 319, 1317–1322.
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: Asymptopia? J. R. Statist. Soc. B 57, 301–369.
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G. and Picard, D. (1996). Density estimation by wavelet thresholding. Ann. Statist. 24, 508–539.
Doumic, M., Hoffmann, M., Reynaud-Bouret, P. and Rivoirard, V. (2012). Nonparametric estimation of the division rate of a size-structured population. SIAM J. Numer. Anal. 50, 925–950.
Efroimovitch, S.Yu. and Pinsker, M.S. (1984). Learning algorithm for nonparametric filtering. Automat. Remote Control 11, 1434–1440. Translated from Avtomatika i Telemekhanika 11, 58–65.
Goldenshluger, A. and Lepski, O. (2008). Universal pointwise selection rule in multivariate function estimation. Bernoulli 14, 1150–1190.
Goldenshluger, A. and Lepski, O. (2009). Structural adaptation via \(L_p\)-norm oracle inequalities. Probab. Theory Related Fields 143, 41–71.
Goldenshluger, A. and Lepski, O. (2011). Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality. Ann. Statist. 39, 1608–1632.
Goldenshluger, A. and Lepski, O. (2013). General selection rule from a family of linear estimators. Theory Probab. Appl. 57, 209–226.
Goldenshluger, A. and Lepski, O. (2014). On adaptive minimax density estimation on \(\mathbb {R}^{d}\). Probab. Theory Related Fields 159, 479–543.
Kerkyacharian, G., Lepski, O. and Picard, D. (2008). Nonlinear estimation in anisotropic multiindex denoising. Sparse case. Theory Probab. Appl. 52, 58–77.
Lacour, C. and Massart, P. (2016). Minimal penalty for Goldenshluger-Lepski method. hal-01121989v2. To appear in Stoch. Proc. Appl.
Lebarbier, E. (2005). Detecting multiple change points in the mean of Gaussian process by model selection. Signal Process. 85, 717–736.
Lepskii, O.V. (1990). On a problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl. 35, 454–466.
Lepskii, O.V. (1991). Asymptotically minimax adaptive estimation I: Upper bounds. Optimally adaptive estimates. Theory Probab. Appl. 36, 682–697.
Lepskii, O.V. (2013). Upper functions for positive random functionals. II. Application to the empirical processes theory, Part 1. Math. Methods Statist. 22, 83–99.
Lerasle, M. (2012). Optimal model selection in density estimation. Ann. Inst. Henri Poincaré Probab. Stat. 48, 884–908.
Lerasle, M., Magalhães, N. and Reynaud-Bouret, P. (2015). Optimal kernel selection for density estimation. To appear in High Dimensional Probability VII: The Cargèse Volume.
Lerasle, M. and Takahashi, D.Y. (2016). Sharp oracle inequalities and slope heuristic for specification probabilities estimation in general random fields. Bernoulli 22, 1.
Mallows, C.L. (1973). Some comments on \(C_p\). Technometrics 15, 661–675.
Massart, P. (2007). Concentration inequalities and model selection. Ecole d’été de Probabilités de Saint-Flour 2003. Lecture Notes in Mathematics 1896. Springer, Berlin/Heidelberg.
Nikol’skii, S. M. (1977) Priblizhenie funktsii mnogikh peremennykh i teoremy vlozheniya. (Russian) [Approximation of functions of several variables and imbedding theorems] Second edition, revised and supplemented. “Nauka”, Moscow.
Pinsker, M.S. (1980). Optimal filtration of square-integrable signals in Gaussian noise. Probl. Inf. Transm. 16, 120–133.
Reynaud-Bouret, P., Rivoirard, V. and Tuleau-Malot, C. (2011). Adaptive density estimation: a curse of support? J. Statist. Plann. Inference 141, 115–139.
Rigollet, P. (2006). Adaptive density estimation using the blockwise Stein method. Bernoulli 12, 351–370.
Saumard, A. (2013). Optimal model selection in heteroscedastic regression using piecewise polynomial functions. Electron. J. Stat. 7, 1184–1223.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461–464.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, London.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267–288.
Lacour, C., Massart, P. & Rivoirard, V. Estimator Selection: a New Method with Applications to Kernel Density Estimation. Sankhya A 79, 298–335 (2017). https://doi.org/10.1007/s13171-017-0107-5
Keywords and phrases
- Concentration inequalities
- Kernel density estimation
- Penalization methods
- Estimator selection
- Oracle inequality