Missing data imputation for fuzzy rule-based classification systems

  • Focus
Soft Computing

Abstract

Fuzzy rule-based classification systems (FRBCSs) are known for their ability to handle low-quality data and to obtain good results in such scenarios. However, they are rarely applied to problems with missing data, even though real-life data sets used in data mining are frequently incomplete because attribute values are missing. Several schemes have been studied to overcome the drawbacks that missing values produce in data mining tasks; one of the best known is based on preprocessing and is formally known as imputation. In this work, we focus on FRBCSs and present and analyze 14 different approaches to the treatment of missing attribute values. The analysis involves three different FRBCS methods, among which we distinguish between Mamdani and TSK models. The results show that using imputation methods is advisable for FRBCSs when missing values are present. The analysis also suggests that each type of FRBCS behaves differently, and that specific imputation methods can improve the accuracy obtained by each of them. Thus, the imputation method should be chosen according to the type of FRBCS.
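To make the idea of imputation as a preprocessing step concrete, the following is a minimal sketch in Python. It fills each missing numeric value with the mean of the observed values in the same attribute. This particular scheme is an assumption chosen for illustration: the abstract does not name the 14 approaches studied, and the function name `impute_column_means` and the use of `None` as the missing-value marker are hypothetical.

```python
from statistics import mean


def impute_column_means(rows):
    """Return a copy of `rows` where each None is replaced by its column mean.

    Illustrative only: a simple mean-substitution scheme, not necessarily
    one of the approaches analyzed in the article.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: fall back to 0.0 if an attribute has no observed values.
        col_means.append(mean(observed) if observed else 0.0)
    # Build new rows so the original data set is left untouched.
    return [
        [col_means[j] if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]


# Toy data set: two numeric attributes, None marks a missing value.
data = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
]
completed = impute_column_means(data)
```

The completed data set can then be passed to any classifier that does not accept missing values, which is the role imputation plays ahead of the FRBCS learning step.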

Notes

  1. http://keel.es.

  2. http://sci2s.ugr.es/keel/datasets.php.

Acknowledgments

This work was supported by the Spanish Ministry of Science and Technology under Project TIN2008-06681-C06-01. J. Luengo and J.A. Sáez hold FPU scholarships from the Spanish Ministry of Education and Science.

Author information

Corresponding author

Correspondence to Julián Luengo.

About this article

Cite this article

Luengo, J., Sáez, J.A. & Herrera, F. Missing data imputation for fuzzy rule-based classification systems. Soft Comput 16, 863–881 (2012). https://doi.org/10.1007/s00500-011-0774-4

  • DOI: https://doi.org/10.1007/s00500-011-0774-4
