Abstract
Road traffic accidents are a major public health concern, resulting in an estimated 1.3 million deaths and 52 million injuries worldwide each year. Developed and developing countries alike suffer the consequences of growth in both human and vehicle populations. Methods to reduce accident severity are therefore of great interest to traffic agencies and the public at large. Effective analysis of traffic accident factors requires a complete historical accident database without missing data. Road accident fatality rates depend on many factors, and investigating the dependencies between attributes is challenging because of the large number of environmental and accident-related factors involved. Any missing data in the database could obscure the discovery of important factors and lead to invalid conclusions. To make traffic accident datasets useful for analysis, they must therefore be preprocessed properly. In this paper, we present a novel method for imputing missing values, based on a decision tree and on imputed-value sampling guided by a correlation measure, to improve the quality of traffic accident data. We applied our algorithm to the publicly available large traffic accident database of the United States (explore.data.gov), the largest open federal database in the United States. We compare our algorithm with three existing imputation methods using three evaluation criteria: mean absolute error, coefficient of determination and root mean square error. Our results indicate that the proposed method performs significantly better than the three existing algorithms.
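The three evaluation criteria named above are typically computed by hiding known values, imputing them, and comparing the imputed values with the true ones. The sketch below shows one way to compute them using only the Python standard library. It is an illustration, not the authors' evaluation code: the function name `imputation_errors` is hypothetical, and the coefficient of determination is assumed here to be 1 - SS_res/SS_tot; some imputation studies instead report the squared Pearson correlation between true and imputed values.

```python
import math
from typing import Dict, Sequence


def imputation_errors(actual: Sequence[float], imputed: Sequence[float]) -> Dict[str, float]:
    """Compare imputed values with the true values that were artificially hidden.

    Returns MAE, RMSE and R^2. R^2 is computed here as 1 - SS_res / SS_tot
    (an assumption; other studies may use the squared Pearson correlation).
    """
    if len(actual) != len(imputed) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    # Residual = true value minus imputed value, one per hidden cell.
    residuals = [a - p for a, p in zip(actual, imputed)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    # Total sum of squares of the true values around their mean.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R^2 is undefined when all true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    # Hypothetical values: true values hidden from the imputer vs. the values it filled in.
    true_vals = [2.0, 4.0, 6.0, 8.0]
    filled = [2.5, 3.5, 6.0, 9.0]
    print(imputation_errors(true_vals, filled))
```

Lower MAE and RMSE indicate a better imputation, while R^2 closer to 1 indicates that the imputed values track the true values more closely. For the hypothetical values above, MAE is 0.5 and R^2 is about 0.925.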
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Deb, R., Liew, A.Wc. (2014). Missing Value Imputation for the Analysis of Incomplete Traffic Accident Data. In: Wang, X., Pedrycz, W., Chan, P., He, Q. (eds) Machine Learning and Cybernetics. ICMLC 2014. Communications in Computer and Information Science, vol 481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45652-1_28
DOI: https://doi.org/10.1007/978-3-662-45652-1_28
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45651-4
Online ISBN: 978-3-662-45652-1
eBook Packages: Computer Science, Computer Science (R0)