
Optimized Parameters for Missing Data Imputation

  • Conference paper
PRICAI 2006: Trends in Artificial Intelligence (PRICAI 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4099)


Abstract

One way to fill in missing values is to exploit correlations between attributes in the data. However, such relations are difficult to identify when the data themselves contain missing values. Accordingly, in this paper we develop a kernel-based missing data imputation method. The approach aims to obtain optimal values of two statistical parameters, the mean and the distribution function, after the missing data are imputed. We refer to this approach as the parameter optimization method (POP algorithm), a form of random regression imputation. We evaluate our approach experimentally and demonstrate that the POP algorithm is much more efficient than deterministic regression imputation at generating inference on these two parameters. The results also show that our algorithm is computationally efficient, robust and stable for missing data imputation.
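The abstract contrasts deterministic regression imputation with a random, kernel-based regression imputation, judged by inference on the mean and the distribution function. The sketch below illustrates only that general distinction. It is not the paper's POP algorithm, whose details are not given here. It assumes a single fully observed covariate `x`, a response `y` with `None` marking missing entries, a Gaussian kernel with a hand-picked bandwidth `h`, and residuals resampled from the complete cases. All function names (`nw_predict`, `impute`, `ecdf`) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_predict(x0: float, xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    """Nadaraya-Watson kernel regression estimate of E[Y | X = x0]."""
    weights = [gaussian_kernel((x0 - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the mean of the observed responses.
        return sum(ys) / len(ys)
    return sum(w * yi for w, yi in zip(weights, ys)) / total


def impute(x: Sequence[float], y: Sequence[Optional[float]], h: float = 1.0,
           random_residual: bool = False, seed: int = 0) -> List[float]:
    """Fill the None entries of y using the fully observed covariate x.

    random_residual=False: deterministic imputation (kernel prediction only).
    random_residual=True: random imputation. This adds an in-sample residual,
    drawn with replacement from the complete cases, to the kernel prediction.
    (Hypothetical helper, not the POP algorithm itself.)
    """
    xs = [xi for xi, yi in zip(x, y) if yi is not None]
    ys = [yi for yi in y if yi is not None]
    if not ys:
        raise ValueError("at least one observed response is required")
    rng = random.Random(seed)
    residuals = [yi - nw_predict(xi, xs, ys, h) for xi, yi in zip(xs, ys)]
    filled: List[float] = []
    for xi, yi in zip(x, y):
        if yi is not None:
            filled.append(yi)  # observed values are kept unchanged
            continue
        value = nw_predict(xi, xs, ys, h)
        if random_residual:
            value += rng.choice(residuals)
        filled.append(value)
    return filled


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def ecdf(values: Sequence[float], t: float) -> float:
    """Empirical distribution function F_n(t) = #{v <= t} / n."""
    return sum(1 for v in values if v <= t) / len(values)


if __name__ == "__main__":
    x = [0.5 * i for i in range(20)]
    # Linear trend plus alternating noise; every third response is made missing.
    y_full = [2.0 * xi + (0.3 if i % 2 else -0.3) for i, xi in enumerate(x)]
    y: List[Optional[float]] = [None if i % 3 == 0 else yi for i, yi in enumerate(y_full)]
    for label, flag in (("deterministic", False), ("random", True)):
        filled = impute(x, y, h=1.0, random_residual=flag, seed=42)
        print(label, round(mean(filled), 3), ecdf(filled, 10.0))
```

Under these assumptions, the deterministic variant puts every imputed value on the fitted curve. This tends to understate the spread of the data and distort the empirical distribution function. The random variant accepts extra noise in exchange for a distribution closer to that of the observed data. The bandwidth `h` is an illustrative choice. In practice it would be selected by a rule of thumb or by cross-validation.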

This work is partially supported by large Australian Research Council (ARC) grants (DP0449535, DP0559536 and DP0667060), a China NSFC major research program (60496327), a China NSFC grant (60463003) and a grant from the Overseas Outstanding Talent Research Program of the Chinese Academy of Sciences (06S3011S01).




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, S., Qin, Y., Zhu, X., Zhang, J., Zhang, C. (2006). Optimized Parameters for Missing Data Imputation. In: Yang, Q., Webb, G. (eds) PRICAI 2006: Trends in Artificial Intelligence. PRICAI 2006. Lecture Notes in Computer Science, vol 4099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36668-3_124


  • DOI: https://doi.org/10.1007/978-3-540-36668-3_124

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36667-6

  • Online ISBN: 978-3-540-36668-3

  • eBook Packages: Computer Science, Computer Science (R0)
