Optimized Parameters for Missing Data Imputation

Zhang, Shichao; Qin, Yongsong; Zhu, Xiaofeng; Zhang, Jilian; Zhang, Chengqi

doi:10.1007/978-3-540-36668-3_124

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4099)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

One way to complete missing values is to exploit attribute correlations within the data. However, such relations are difficult to identify when the data themselves contain missing values. Accordingly, we develop a kernel-based missing data imputation method in this paper. The approach aims at optimizing two statistical parameters, the mean and the distribution function, after the missing data are imputed. We refer to this approach as the parameter optimization method (the POP algorithm, a random regression imputation). We evaluate our approach experimentally and demonstrate that the POP algorithm is considerably more efficient than deterministic regression imputation when making inferences about these two parameters. The results also show that our algorithm is computationally efficient, robust and stable for missing data imputation.
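The contrast between deterministic and random regression imputation can be illustrated with a minimal sketch. It is not the POP algorithm itself, whose details are not given in this abstract. It assumes a single fully observed covariate x, a response y with missing entries marked as NaN, a Nadaraya-Watson estimator with a Gaussian kernel, and a fixed bandwidth. All function names and parameters are hypothetical choices made for illustration.

```python
import numpy as np


def kernel_regression(x_obs, y_obs, x_query, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] with a Gaussian kernel (assumed choice)."""
    u = (x_query[:, None] - x_obs[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)
    return (weights @ y_obs) / weights.sum(axis=1)


def impute(x, y, bandwidth=0.5, stochastic=True, rng=None):
    """Fill NaN entries of y using kernel regression on x.

    stochastic=False: deterministic regression imputation (fitted value only).
    stochastic=True:  random regression imputation (fitted value plus a
                      residual resampled from the observed cases).
    """
    rng = np.random.default_rng() if rng is None else rng
    missing = np.isnan(y)
    x_obs, y_obs = x[~missing], y[~missing]

    # Fitted values at the positions where y is missing.
    fitted = kernel_regression(x_obs, y_obs, x[missing], bandwidth)

    if stochastic:
        # Residuals of the observed cases around the kernel fit.
        residuals = y_obs - kernel_regression(x_obs, y_obs, x_obs, bandwidth)
        fitted = fitted + rng.choice(residuals, size=fitted.shape[0], replace=True)

    y_imputed = y.copy()
    y_imputed[missing] = fitted
    return y_imputed
```

Deterministic imputation places every imputed value exactly on the fitted curve, which tends to understate the spread of the completed data. Adding a resampled residual is one common way to keep the imputed values closer to the observed distribution, which matters when the target of inference is a distribution function rather than only a mean.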

This work is partially supported by Australian large ARC grants (DP0449535, DP0559536 and DP0667060), a China NSFC Major Research Program (60496327), a China NSFC grant (60463003) and a grant from the Overseas Outstanding Talent Research Program of the Chinese Academy of Sciences (06S3011S01).

Author information

Authors and Affiliations

Department of Computer Science, Guangxi Normal University, China
Shichao Zhang, Yongsong Qin, Xiaofeng Zhu & Jilian Zhang
Faculty of Information Technology, University of Technology Sydney, P.O. Box 123, Broadway, NSW, 2007, Australia
Shichao Zhang & Chengqi Zhang

Editor information

Editors and Affiliations

The Hong Kong University of Science and Technology, Hong Kong
Qiang Yang
Clayton School of Information Technology, Monash University, P.O. Box, Australia
Geoff Webb

Cite this paper

Zhang, S., Qin, Y., Zhu, X., Zhang, J., Zhang, C. (2006). Optimized Parameters for Missing Data Imputation. In: Yang, Q., Webb, G. (eds) PRICAI 2006: Trends in Artificial Intelligence. PRICAI 2006. Lecture Notes in Computer Science (LNAI), vol 4099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36668-3_124

DOI: https://doi.org/10.1007/978-3-540-36668-3_124
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36667-6
Online ISBN: 978-3-540-36668-3
eBook Packages: Computer Science, Computer Science (R0)
