A comparison study of nonparametric imputation methods

Ning, Jianhui; Cheng, Philip E.

doi:10.1007/s11222-010-9223-y

Published: 29 December 2010

Statistics and Computing, volume 22, pages 273–285 (2012)

Abstract

Consider estimating the population mean of a response variable when observations are missing at random with respect to a covariate. Two common approaches to imputing the missing values are nonparametric regression weighting and Horvitz-Thompson (HT) inverse weighting. The regression approach includes kernel regression imputation and nearest neighbor imputation. The HT approach, which employs inverse kernel-estimated weights, includes the basic estimator, the ratio estimator, and the estimator using inverse kernel-weighted residuals. Asymptotic normality of the nearest neighbor imputation estimators is derived and compared with that of the kernel regression imputation estimator under standard regularity conditions on the regression function and the missing pattern function. A comprehensive simulation study shows that the basic HT estimator is the most sensitive to discontinuity in the missing data patterns, and that the nearest neighbor estimators can be insensitive to missing data patterns that are unbalanced with respect to the distribution of the covariate. Empirical studies show that nearest neighbor imputation is the most effective of these methods for estimating a finite population mean and for classifying the species in the iris flower data.
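The five estimators named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below uses common textbook forms that may differ in detail from the paper's definitions: a Gaussian kernel, a scalar covariate, Nadaraya-Watson smoothing for both the regression function and the response probability, and an augmented form (fitted value plus inverse-weighted residual) for the residual-based estimator. All function names, the default bandwidth, and the default `k` are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_kernel_weights(x_query, x_data, bandwidth):
    """Matrix of Gaussian kernel weights K((x_query_i - x_data_j) / h)."""
    diff = (x_query[:, None] - x_data[None, :]) / bandwidth
    return np.exp(-0.5 * diff ** 2)

def kernel_regression(x_query, x_obs, y_obs, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X = x] from the observed pairs."""
    w = gaussian_kernel_weights(x_query, x_obs, bandwidth)
    return (w @ y_obs) / w.sum(axis=1)

def knn_regression(x_query, x_obs, y_obs, k):
    """Average response of the k observed points nearest to each query."""
    dist = np.abs(x_query[:, None] - x_obs[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_obs[nearest].mean(axis=1)

def kernel_propensity(x, delta, bandwidth):
    """Kernel estimate of P(response observed | X = x) at each sample point."""
    w = gaussian_kernel_weights(x, x, bandwidth)
    return (w @ delta) / w.sum(axis=1)

def estimate_mean(x, y, delta, bandwidth=0.1, k=5):
    """Return the five mean estimates; y may hold NaN where delta == 0."""
    obs = delta == 1
    m_kernel = kernel_regression(x, x[obs], y[obs], bandwidth)
    m_knn = knn_regression(x, x[obs], y[obs], k)
    p_hat = kernel_propensity(x, delta.astype(float), bandwidth)
    y_filled = np.where(obs, y, 0.0)   # missing responses never contribute
    inv_w = delta / p_hat              # zero weight for missing responses
    return {
        # Regression approach: keep observed y, impute the missing ones.
        "kernel_imputation": np.mean(np.where(obs, y, m_kernel)),
        "nn_imputation": np.mean(np.where(obs, y, m_knn)),
        # HT approach: inverse kernel-estimated weights.
        "ht_basic": np.mean(inv_w * y_filled),
        "ht_ratio": np.sum(inv_w * y_filled) / np.sum(inv_w),
        # Assumed augmented form: fitted value plus weighted residual.
        "ht_residual": np.mean(m_kernel + inv_w * (y_filled - m_kernel)),
    }
```

Calling `estimate_mean(x, y, delta)` with one-dimensional NumPy arrays, where `delta` is 1 for an observed response and 0 for a missing one, returns a dictionary with one estimate per method. When the estimated response probability is small in some region of the covariate, the inverse weights in the basic HT estimate become large there, while the ratio estimate renormalizes them to sum to one.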

Author information

Authors and Affiliations

Department of Mathematics and Statistics, Central China Normal University, Wuhan, China
Jianhui Ning
Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
Philip E. Cheng

Corresponding author

Correspondence to Philip E. Cheng.

About this article

Cite this article

Ning, J., Cheng, P.E. A comparison study of nonparametric imputation methods. Stat Comput 22, 273–285 (2012). https://doi.org/10.1007/s11222-010-9223-y

Received: 24 November 2009
Accepted: 14 December 2010
Published: 29 December 2010
Issue Date: January 2012
DOI: https://doi.org/10.1007/s11222-010-9223-y
