Abstract
Estimator selection has become a crucial issue in nonparametric estimation. Two widely used methods are penalized empirical risk minimization (such as penalized log-likelihood estimation) and pairwise comparison (such as Lepski's method). Our aim in this paper is twofold. First, we explain some general ideas about the calibration issue of estimator selection methods. We review some known results, putting the emphasis on the concept of minimal penalty, which is helpful to design data-driven selection criteria. Secondly, we present a new method for bandwidth selection within the framework of kernel density estimation which is in some sense intermediate between the two main methods mentioned above. We provide theoretical results which lead to a fully data-driven selection strategy.
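To fix ideas on the bandwidth selection problem the abstract refers to, here is a minimal sketch of a classical data-driven baseline: least-squares cross-validation for a Gaussian-kernel density estimator (in the spirit of Silverman, 1986). This is not the method introduced in the paper; the function names `lscv_score` and `select_bandwidth`, the bandwidth grid, and the simulated data are illustrative assumptions.

```python
import numpy as np

def gauss(x, s):
    # Density of N(0, s^2) evaluated at x.
    return np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def lscv_score(data, h):
    """Least-squares cross-validation criterion for bandwidth h:
    an unbiased estimate (up to a constant) of the integrated
    squared error of the Gaussian-kernel density estimator."""
    n = len(data)
    d = data[:, None] - data[None, :]              # pairwise differences
    # Integral of f_h^2: convolution of two Gaussian kernels has scale h*sqrt(2).
    term1 = gauss(d, h * np.sqrt(2)).sum() / n**2
    # Leave-one-out term: drop the diagonal (i = j) contributions.
    off = gauss(d, h)
    np.fill_diagonal(off, 0.0)
    term2 = 2.0 * off.sum() / (n * (n - 1))
    return term1 - term2

def select_bandwidth(data, grid):
    # Minimize the criterion over a finite bandwidth grid.
    scores = [lscv_score(data, h) for h in grid]
    return grid[int(np.argmin(scores))]

rng = np.random.default_rng(0)
x = rng.normal(size=200)                  # toy sample from N(0, 1)
grid = np.linspace(0.05, 1.5, 30)
h_star = select_bandwidth(x, grid)
```

The paper's proposal replaces this pairwise-sum criterion with a comparison-based criterion equipped with a calibrated penalty; the sketch above only illustrates what "fully data-driven bandwidth selection" means in practice.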
References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proceedings of the 2nd International Symposium on Information Theory (B. N. Petrov and F. Csáki, eds.). Akadémiai Kiadó, Budapest, pp. 267–281.
Arlot, S. and Bach, F. (2009). Data-driven calibration of linear estimators with minimal penalties. In Advances in Neural Information Processing Systems. (Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams and A. Culotta, eds.). Vol. 22, pp. 46–54.
Arlot, S. and Massart, P. (2009). Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res. 10, 245–279 (electronic).
Bahadur, R.R. (1958). Examples of inconsistency of maximum likelihood estimates. Sankhya Ser. A 20, 207–210.
Barron, A.R. and Cover, T.M. (1991). Minimum complexity density estimation. IEEE Trans. Inform. Theory 37, 1034–1054.
Barron, A.R., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Th. Rel. Fields 113, 301–415.
Baudry, J.-P., Maugis, C. and Michel, B. (2011). Slope heuristics: overview and implementation. Stat. Comput., 1–16.
Bertin, K., Lacour, C. and Rivoirard, V. (2016). Adaptive pointwise estimation of conditional density function. Ann. Inst. Henri Poincaré Probab. Stat. 52, 939–980.
Bertin, K., Le Pennec, E. and Rivoirard, V. (2011). Adaptive Dantzig density estimation. Ann. Inst. Henri Poincaré Probab. Stat. 47, 43–74.
Bickel, P.J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 37, 1705–1732.
Birgé, L. and Massart, P. (1993). Rates of convergence for minimum contrast estimators. Probab. Th. Relat. Fields 97, 113–150.
Birgé, L. and Massart, P. (1998). Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4, 329–375.
Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3, 203–268.
Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probab. Th. Rel. Fields 138, 33–73.
Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration Inequalities. Oxford University Press.
Daniel, C. and Wood, F.S. (1971). Fitting Equations to Data. Wiley, New York.
Devroye, L. and Lugosi, G. (2001). Combinatorial methods in density estimation, Springer Series in Statistics. Springer, New York.
Donoho, D.L. and Johnstone, I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455.
Donoho, D.L. and Johnstone, I.M. (1994). Ideal denoising in an orthonormal basis chosen from a library of bases. C. R. Acad. Sc. Paris Sér. I Math. 319, 1317–1322.
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: Asymptopia? J. R. Statist. Soc. B 57, 301–369.
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G. and Picard, D. (1996). Density estimation by wavelet thresholding. Ann. Statist. 24, 508–539.
Doumic, M., Hoffmann, M., Reynaud-Bouret, P. and Rivoirard, V. (2012). Nonparametric estimation of the division rate of a size-structured population. SIAM J. Numer. Anal. 50, 925–950.
Efroimovitch, S.Yu. and Pinsker, M.S. (1984). Learning algorithm for nonparametric filtering. Automat. Remote Control 11, 1434–1440. Translated from Avtomatika i Telemekhanika 11, 58–65.
Goldenshluger, A. and Lepski, O. (2008). Universal pointwise selection rule in multivariate function estimation. Bernoulli 14, 1150–1190.
Goldenshluger, A. and Lepski, O. (2009). Structural adaptation via \(L_p\)-norm oracle inequalities. Probab. Theory Related Fields 143, 41–71.
Goldenshluger, A. and Lepski, O. (2011). Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality. Ann. Statist. 39, 1608–1632.
Goldenshluger, A. and Lepski, O. (2013). General selection rule from a family of linear estimators. Theory Probab. Appl. 57, 209–226.
Goldenshluger, A. and Lepski, O. (2014). On adaptive minimax density estimation on \(\mathbb {R}^{d}\). Probab. Theory Related Fields 159, 479–543.
Kerkyacharian, G., Lepski, O. and Picard, D. (2008). Nonlinear estimation in anisotropic multiindex denoising. Sparse case. Theory Probab. Appl. 52, 58–77.
Lacour, C. and Massart, P. (2016). Minimal penalty for Goldenshluger-Lepski method. hal-01121989v2. To appear in Stoch. Proc. Appl.
Lebarbier, E. (2005). Detecting multiple change points in the mean of Gaussian process by model selection. Signal Process. 85, 717–736.
Lepskii, O.V. (1990). On a problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl. 35, 454–466.
Lepskii, O.V. (1991). Asymptotically minimax adaptive estimation I: Upper bounds. Optimally adaptive estimates. Theory Probab. Appl. 36, 682–697.
Lepskii, O.V. (2013). Upper functions for positive random functionals. II. Application to the empirical processes theory, Part 1. Math. Methods Statist. 22, 83–99.
Lerasle, M. (2012). Optimal model selection in density estimation. Ann. Inst. Henri Poincaré Probab. Stat. 48, 884–908.
Lerasle, M., Magalhães, N. and Reynaud-Bouret, P. (2015). Optimal kernel selection for density estimation. To appear in High Dimensional Probability VII: The Cargèse Volume.
Lerasle, M. and Takahashi, D.Y. (2016). Sharp oracle inequalities and slope heuristic for specification probabilities estimation in general random fields. Bernoulli 22, 1.
Mallows, C.L. (1973). Some comments on \(C_p\). Technometrics 15, 661–675.
Massart, P. (2007). Concentration inequalities and model selection. Ecole d’été de Probabilités de Saint-Flour 2003. Lecture Notes in Mathematics 1896. Springer, Berlin/Heidelberg.
Nikol’skii, S. M. (1977) Priblizhenie funktsii mnogikh peremennykh i teoremy vlozheniya. (Russian) [Approximation of functions of several variables and imbedding theorems] Second edition, revised and supplemented. “Nauka”, Moscow.
Pinsker, M.S. (1980). Optimal filtration of square-integrable signals in Gaussian noise. Probl. Inf. Transm. 16, 120–133.
Reynaud-Bouret, P., Rivoirard, V. and Tuleau-Malot, C. (2011). Adaptive density estimation: a curse of support? J. Statist. Plann. Inference 141, 115–139.
Rigollet, P. (2006). Adaptive density estimation using the blockwise Stein method. Bernoulli 12, 351–370.
Saumard, A. (2013). Optimal model selection in heteroscedastic regression using piecewise polynomial functions. Electron. J. Stat. 7, 1184–1223.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461–464.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. Monographs on Statistics and Applied Probability. Chapman & Hall, London.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267–288.
Lacour, C., Massart, P. & Rivoirard, V. Estimator Selection: a New Method with Applications to Kernel Density Estimation. Sankhya A 79, 298–335 (2017). https://doi.org/10.1007/s13171-017-0107-5
Keywords and phrases
- Concentration inequalities
- Kernel density estimation
- Penalization methods
- Estimator selection
- Oracle inequality