A comparison study of nonparametric imputation methods

Statistics and Computing

Abstract

Consider estimating the population mean of a response variable when the observations are missing at random with respect to the covariate. Two common approaches to imputing the missing values are the nonparametric regression weighting method and the Horvitz–Thompson (HT) inverse weighting approach. The regression approach includes kernel regression imputation and nearest neighbor imputation. The HT approach, which employs inverse kernel-estimated weights, includes the basic estimator, the ratio estimator, and the estimator using inverse kernel-weighted residuals. Asymptotic normality of the nearest neighbor imputation estimators is derived and compared with that of the kernel regression imputation estimator under standard regularity conditions on the regression function and the missing pattern function. A comprehensive simulation study shows that the basic HT estimator is the most sensitive to discontinuity in the missing data patterns, and that the nearest neighbor estimators can be insensitive to missing data patterns that are unbalanced with respect to the distribution of the covariate. Empirical studies show that the nearest neighbor imputation method is the most effective among these imputation methods for estimating a finite population mean and for classifying the species in the iris flower data.
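The estimators named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption for illustration: a scalar covariate, a Gaussian kernel with a user-chosen bandwidth `h`, a single nearest neighbor as donor, and the function names themselves. The residual-based HT estimator is left out because its exact form is not stated in the abstract.

```python
import numpy as np

def kernel_smooth(x_eval, x_obs, v_obs, h):
    """Nadaraya-Watson smoother of v_obs on x_obs, evaluated at x_eval.

    Assumes a Gaussian kernel and a bandwidth h large enough that every
    evaluation point receives non-negligible kernel weight.
    """
    u = (x_eval[:, None] - x_obs[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w @ v_obs) / w.sum(axis=1)

def kernel_regression_imputation_mean(x, y, observed, h):
    """Replace each missing response by a kernel regression fit, then average."""
    m_hat = kernel_smooth(x, x[observed], y[observed], h)
    return np.mean(np.where(observed, y, m_hat))

def nearest_neighbor_imputation_mean(x, y, observed):
    """Replace each missing response by that of the closest observed unit (1-NN)."""
    x_obs, y_obs = x[observed], y[observed]
    nearest = np.abs(x[:, None] - x_obs[None, :]).argmin(axis=1)
    return np.mean(np.where(observed, y, y_obs[nearest]))

def ht_means(x, y, observed, h):
    """Basic and ratio HT-type estimators with kernel-estimated response probabilities."""
    # Kernel estimate of P(observed | x), smoothing the 0/1 response indicator.
    p_hat = kernel_smooth(x, x, observed.astype(float), h)
    # Inverse-probability weights; missing units get weight zero.
    w = np.zeros(len(x))
    w[observed] = 1.0 / p_hat[observed]
    y0 = np.where(observed, y, 0.0)   # placeholder so NaNs never enter the sums
    basic = np.mean(w * y0)           # weights are not normalized
    ratio = np.sum(w * y0) / np.sum(w)  # weights normalized to sum to one
    return basic, ratio
```

Each function takes the covariate `x`, the response `y` (entries for missing units may be `NaN`), and a boolean array `observed`. The ratio form normalizes the weights, which is one reason it tends to be more stable than the basic form when some estimated response probabilities are small.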

Author information

Corresponding author

Correspondence to Philip E. Cheng.

About this article

Cite this article

Ning, J., Cheng, P.E. A comparison study of nonparametric imputation methods. Stat Comput 22, 273–285 (2012). https://doi.org/10.1007/s11222-010-9223-y

  • DOI: https://doi.org/10.1007/s11222-010-9223-y
