Missing Value Imputation for the Analysis of Incomplete Traffic Accident Data

Deb, Rupam; Liew, Alan Wee-chung

doi:10.1007/978-3-662-45652-1_28

Part of the book series: Communications in Computer and Information Science (CCIS, volume 481)

Included in the following conference series:

International Conference on Machine Learning and Cybernetics

Abstract

Road traffic accidents are a major public health concern, resulting in an estimated 1.3 million deaths and 52 million injuries worldwide each year. Developed and developing countries alike suffer the consequences of growth in both human and vehicle populations. Methods to reduce accident severity are therefore of great interest to traffic agencies and the public at large. To analyze traffic accident factors effectively, we need a complete historical traffic accident database without missing data. The road accident fatality rate depends on many factors, and investigating the dependencies between attributes is very challenging because of the many environmental and road accident factors involved. Any missing data in the database could obscure the discovery of important factors and lead to invalid conclusions. To make traffic accident datasets useful for analysis, they should be preprocessed properly. In this paper, we present a novel method, based on a decision tree and on imputed value sampling based on a correlation measure, for imputing missing values to improve the quality of traffic accident data. We applied our algorithm to the large, publicly available traffic accident database of the United States (explore.data.gov), which is the largest open federal database in the United States. We compare our algorithm with three existing imputation methods using three evaluation criteria: mean absolute error, coefficient of determination, and root mean square error. Our results indicate that the proposed method performs significantly better than the three existing algorithms.
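The three evaluation criteria named in the abstract can be illustrated with a minimal sketch. It assumes the usual setup for imputation experiments, in which known values are artificially removed, imputed, and then compared with the originals. The function names and sample numbers below are hypothetical and are not taken from the paper, and the coefficient of determination is written in its common 1 - SS_res/SS_tot form, which may differ from the exact definition the authors use.

```python
import math


def mean_absolute_error(actual, imputed):
    # Average absolute difference between true and imputed values.
    return sum(abs(a - p) for a, p in zip(actual, imputed)) / len(actual)


def root_mean_square_error(actual, imputed):
    # Square root of the average squared difference; penalises large errors more.
    squared = sum((a - p) ** 2 for a, p in zip(actual, imputed))
    return math.sqrt(squared / len(actual))


def coefficient_of_determination(actual, imputed):
    # Assumed definition: 1 - (residual sum of squares / total sum of squares).
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, imputed))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values: the originals that were hidden, and what an imputer returned.
actual = [2.0, 4.0, 6.0, 8.0]
imputed = [3.0, 4.0, 5.0, 8.0]

mae = mean_absolute_error(actual, imputed)
rmse = root_mean_square_error(actual, imputed)
r2 = coefficient_of_determination(actual, imputed)
print(mae, rmse, r2)
```

Lower mean absolute error and root mean square error indicate imputed values closer to the originals, while a coefficient of determination nearer to 1 indicates a better fit.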

Author information

Authors and Affiliations

School of Information and Communication Technology, Griffith University, Logan, Australia
Rupam Deb & Alan Wee-chung Liew

Corresponding author

Correspondence to Rupam Deb.

Editor information

Editors and Affiliations

Hebei University, Baoding, China
Xizhao Wang
Department of Electrical and Computer En, University of Alberta, Edmonton, Alberta, Canada
Witold Pedrycz
South China University of Technology, Guangzhou, China
Patrick Chan
Hebei University, Baoding, China
Qiang He

About this paper

Cite this paper

Deb, R., Liew, A.Wc. (2014). Missing Value Imputation for the Analysis of Incomplete Traffic Accident Data. In: Wang, X., Pedrycz, W., Chan, P., He, Q. (eds) Machine Learning and Cybernetics. ICMLC 2014. Communications in Computer and Information Science, vol 481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45652-1_28

DOI: https://doi.org/10.1007/978-3-662-45652-1_28
Published: 05 December 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45651-4
Online ISBN: 978-3-662-45652-1
eBook Packages: Computer Science, Computer Science (R0)
