
Missing Value Imputation for the Analysis of Incomplete Traffic Accident Data

  • Conference paper
Machine Learning and Cybernetics (ICMLC 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 481)


Abstract

Road traffic accidents are a major public health concern, resulting in an estimated 1.3 million deaths and 52 million injuries worldwide each year. Both developed and developing countries suffer from the consequences of growth in human and vehicle populations. Methods to reduce accident severity are therefore of great interest to traffic agencies and the public at large. To analyze traffic accident factors effectively, we need a complete historical traffic accident database without missing data. The road accident fatality rate depends on many environmental and road-related factors, which makes investigating the dependencies between attributes a very challenging task. Any missing data in the database could obscure the discovery of important factors and lead to invalid conclusions. To make traffic accident datasets useful for analysis, they must be preprocessed properly. In this paper, we present a novel method for imputing missing values, based on a decision tree and on sampling of imputed values guided by a correlation measure, to improve the quality of traffic accident data. We applied our algorithm to the large, publicly available traffic accident database of the United States (explore.data.gov), which is the largest open federal database in the United States. We compare our algorithm with three existing imputation methods using three evaluation criteria: mean absolute error, coefficient of determination, and root mean square error. Our results indicate that the proposed method performs significantly better than the three existing algorithms.
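The three evaluation criteria named in the abstract can be illustrated with a minimal sketch. It assumes the usual evaluation setup in which known values are hidden artificially, imputed, and then compared with the held-out originals. The function name `imputation_scores` is hypothetical, and the coefficient of determination is computed here as 1 - SS_res/SS_tot, which is an assumption: the paper may define it differently (for example, as the squared correlation between actual and imputed values).

```python
import math


def imputation_scores(actual, imputed):
    """Return (MAE, RMSE, R^2) for imputed values against held-out true values.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(actual) != len(imputed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, imputed)]

    # Mean absolute error: average magnitude of the imputation error.
    mae = sum(abs(e) for e in errors) / n

    # Root mean square error: penalizes large errors more heavily than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination, assumed here to be 1 - SS_res / SS_tot.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return mae, rmse, r2


if __name__ == "__main__":
    # Toy values only; these are not results from the paper.
    held_out = [2.0, 4.0, 6.0, 8.0]
    filled_in = [3.0, 4.0, 5.0, 8.0]
    print(imputation_scores(held_out, filled_in))
```

Lower MAE and RMSE and a higher coefficient of determination indicate that the imputed values lie closer to the true ones.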



Author information

Corresponding author

Correspondence to Rupam Deb.


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Deb, R., Liew, A.W.C. (2014). Missing Value Imputation for the Analysis of Incomplete Traffic Accident Data. In: Wang, X., Pedrycz, W., Chan, P., He, Q. (eds) Machine Learning and Cybernetics. ICMLC 2014. Communications in Computer and Information Science, vol 481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45652-1_28


  • DOI: https://doi.org/10.1007/978-3-662-45652-1_28


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45651-4

  • Online ISBN: 978-3-662-45652-1

  • eBook Packages: Computer Science, Computer Science (R0)
