FRCT: fuzzy-rough classification trees

Abstract

Using fuzzy-rough hybrids, we have proposed a measure to quantify the functional dependency of decision attribute(s) on condition attribute(s) within fuzzy data, and we have shown that this dependency degree generalizes the measure proposed by Pawlak for crisp data. In this paper, the new dependency degree is embedded in the decision tree generation mechanism to produce fuzzy-rough classification trees (FRCT): efficient, top-down, multi-class decision tree structures geared to solving classification problems from feature-based learning examples. The FRCT generation algorithm has been applied to 16 real-world benchmark datasets and compared experimentally with five fuzzy decision tree generation algorithms reported in the literature and with the rough decomposition tree algorithm, in terms of number of rules, average training time, and classification accuracy. Experimental results show that the proposed FRCT generation algorithm outperforms the existing fuzzy decision tree generation techniques and the rough decomposition tree induction algorithm.

Notes

  1. A fuzzy set whose support is a single point in U with μ_F(u) = 1 is called a fuzzy singleton.
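For instance, on a three-element universe U = {u_1, u_2, u_3}, the fuzzy set

$$ F = 0/u_{1} + 1/u_{2} + 0/u_{3} $$

has support {u_2} with μ_F(u_2) = 1, and is therefore a fuzzy singleton.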

References

  1. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman & Hall, New York

  2. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106

  3. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo

  4. Quinlan JR (1990) Decision trees and decision making. IEEE Trans SMC 20(2):339–346

  5. Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE Trans SMC 21(3):660–674

  6. Yuan Y, Shaw MJ (1995) Induction of fuzzy decision trees. Fuzzy Sets Syst 69:125–139

  7. Umano M et al. (1994) Fuzzy decision tree by fuzzy ID3 algorithm and its application to diagnosis systems. In: IEEE international conference on Fuzzy Systems, June 26–29, pp 2113–2118

  8. Chiang I-J, Hsu JY-J (2002) Fuzzy classification trees for data analysis. Fuzzy Sets Syst 130:87–99

  9. Jeng B, Jeng Y-M, Liang T-P (1997) FILM: a fuzzy inductive learning method for automated knowledge acquisition. Decis Support Syst 21:61–73

  10. Ichihashi H, Shirai T, Nagasaka K, Miyoshi T (1996) Neuro-fuzzy ID3. Fuzzy Sets Syst 81:157–167

  11. Sison LG, Chong EKP (1994) Fuzzy modeling by induction and pruning of decision trees. In: Proceedings of the IEEE international symposium on intelligent control, Columbus, OH, pp 166–171

  12. Tani T, Sakoda M (1992) Fuzzy modeling by ID3 algorithm and its application to prediction of outlet temperature. In: Proceedings of the IEEE international conference on Fuzzy Systems, San Diego, CA, pp 923–930

  13. Weber R (1992) Fuzzy ID3: A class of methods for automatic knowledge acquisition. In: Proceedings of the international conference on Fuzzy Logic Neural Networks, Iizuka, Japan, pp 265–268

  14. Mitra S, Konwar KM, Pal SK (2002) Fuzzy decision tree, linguistic rules and fuzzy knowledge-based network: generation and evaluation. IEEE Trans SMC-C: Appl Rev 32(4):328–339

  15. Janikow CZ (1998) Fuzzy decision trees: issues and methods. IEEE Trans SMC-B: Cybern 28(1):1–14

  16. Yeung DS, Wang XZ, Tsang ECC (1999) Learning weighted fuzzy rules from examples with mixed attributes by fuzzy decision trees. In: Proceedings of the IEEE international conference on SMC, Tokyo, Japan, October 12–15, pp 349–354

  17. Dong M, Kothari R (2001) Look-ahead based fuzzy decision tree induction. IEEE Trans Fuzzy Syst 9(3):461–468

  18. Liu X, Pedrycz W (2007) The development of fuzzy decision trees in the framework of axiomatic fuzzy set logic. Appl Soft Comput 7(1):325–342

  19. Pedrycz W, Sosnowski ZA (2005) C-fuzzy decision trees. IEEE Trans SMC-C 35(4):498–511

  20. Wang X-Z, Yeung DS, Tsang ECC (2001) A comparative study on heuristic algorithms for generating fuzzy decision trees. IEEE Trans SMC-B 31(2):215–226

  21. Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer, Norwell

  22. Nguyen HS (1997) Discretization of real value attributes: a Boolean reasoning approach. Department of Mathematics, Computer Science and Mechanics, Warsaw University

  23. Dubois D, Prade H (1992) Putting fuzzy sets and rough sets together. In: Slowinski R (ed) Intelligent decision support. Kluwer, Dordrecht, pp 203–232

  24. Zadeh LA (1988) Fuzzy logic. IEEE Comput 21(4):83–93

  25. Bhatt RB, Gopal M (2006) On the extension of functional dependency degree from crisp to fuzzy partitions. Pattern Recogn Lett 27(5):487–491

  26. Bhatt RB, Gopal M (2004) FRID: Fuzzy-Rough Interactive Dichotomizers. In: Proceedings of the IEEE international conference on Fuzzy Systems, IEEE-FUZZ’04, Budapest, Hungary, July 26–29, pp 1337–1342

  27. Blake CL, Merz CJ (1998) UCI repository of machine learning databases. University of California, Department of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/~mlearn/MLRepository.html

  28. Nozaki K, Ishibuchi H, Tanaka H (1997) A simple but powerful heuristic method for generating fuzzy rules from numerical data. Fuzzy Sets Syst 86:251–270

  29. http://lib.stat.cmu.edu/datasets/veteran

  30. Olaru C, Wehenkel L (2003) A complete fuzzy decision tree technique. Fuzzy Sets Syst 138:221–254

  31. Jayashree S, Bhatia M, Shweta S, Anand S (2007) Quantitative EEG analysis for assessment to ‘plan’ a task in Amyotrophic Lateral Sclerosis (ALS) patients: a study of executive functions in ALS patients. Cogn Brain Res 22(1):59–66

  32. http://lib.stat.cmu.edu/DASL/Datafiles/ICU.html

  33. Pal NR, Bezdek JC (1995) On cluster validity for fuzzy c-means model. IEEE Trans Fuzzy Syst 3(3):370–379

  34. Hogg RV, Tanis EA (1977) Probability and statistical inference. Macmillan, New York

  35. Hong T-P, Wang T-T, Wang S-L, Chien B-C (2000) Learning a coverage set of maximally general fuzzy rules by rough sets. Expert Syst Appl 19(2):97–103

Acknowledgement

The authors would like to thank the anonymous referees for their helpful and valuable comments. The authors are thankful to Prof. Zdzisław Pawlak, Polish Academy of Sciences, Warsaw, Poland, for sending them some of his research notes on rough set theory. The authors are also thankful to C. Olaru for providing the OMIB (One Machine Infinite Bus) dataset [30].

Author information

Correspondence to Rajen B. Bhatt.

Appendix

Consider the dataset [4] given in Table 7. Each pattern is described by four attributes and is classified as either Don’t play or Play. Two attributes, Temp and Humidity, are real-valued; the other two, Outlook and Windy?, are categorical. In the context of rough set theory, categorical attributes may also be called discrete attributes.

Table 7 Example dataset

Rough set theory requires a priori discretization of real-valued attributes. The real-valued attributes Temp and Humidity have been discretized using the partition intervals given in Table 8, producing the dataset shown in Table 9. These intervals are crisp: for example, any value of Temp lying in [68, 81], or of Humidity lying in [74, 87.5], is treated simply as Med, without regard to its degree of belongingness to that interval (a small code sketch of this crisp labelling follows Table 9).

Table 8 Discretization intervals
Table 9 Discretized dataset
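The crisp treatment described above can be sketched in a few lines of Python. This is only an illustration: it assumes three linguistic labels Low/Med/High and uses the Med intervals quoted above as the only cut points, which may differ from the exact boundaries of Table 8.

```python
def crisp_label(value, med_low, med_high):
    """Map a real value to a hard linguistic label: below the Med interval -> Low,
    inside it -> Med, above it -> High. Degrees of membership are ignored."""
    if value < med_low:
        return "Low"
    if value <= med_high:
        return "Med"
    return "High"

# Med intervals quoted in the text: Temp -> [68, 81], Humidity -> [74, 87.5].
print(crisp_label(70, 68, 81))      # Med: 70 lies inside [68, 81]
print(crisp_label(80.9, 68, 81))    # Med: treated exactly like 70; the degree of belongingness is lost
print(crisp_label(85.0, 74, 87.5))  # Med: a Humidity of 85 lies inside [74, 87.5]
```

A Temp of 68.1 and a Temp of 80.9 both come out as plain Med, which is precisely the information loss the fuzzy partitions used in the paper are meant to avoid.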

For the given dataset:

$$ U = \{1, 2, \ldots, 14\}; \quad y = \{\textit{Don't play}, \textit{Play}\}; \quad n = 14. $$

Let x_1 = Outlook, x_2 = Temp, x_3 = Humidity, and x_4 = Windy?.

A set of patterns i ∈ U with identical attribute values forms an equivalence class, so each attribute partitions U into a set of equivalence classes. The partition of U by attribute x_1 is

$$ \{[Rain], [Overcast], [Sunny]\} = \{\{1, 2, 5, 11, 12\}, \{3, 4, 6, 10\}, \{7, 8, 9, 13, 14\}\}. $$

Partitions of U by class labels are:

$$ [\textit{Don't play}] = \{1, 2, 8, 9, 12, 13\}; \quad [\textit{Play}] = \{3, 4, 5, 6, 7, 10, 11, 14\}. $$
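These partitions can be reproduced with a short Python sketch; the attribute values are read directly off the partition of U by x_1 given above, and the function name is illustrative.

```python
from collections import defaultdict

# Outlook value of each pattern i in U = {1, ..., 14}, as listed in the partition above.
outlook = {1: "Rain", 2: "Rain", 5: "Rain", 11: "Rain", 12: "Rain",
           3: "Overcast", 4: "Overcast", 6: "Overcast", 10: "Overcast",
           7: "Sunny", 8: "Sunny", 9: "Sunny", 13: "Sunny", 14: "Sunny"}

def equivalence_classes(attribute_values):
    """Group the pattern indices that share the same attribute value."""
    classes = defaultdict(set)
    for pattern, value in attribute_values.items():
        classes[value].add(pattern)
    return list(classes.values())

print(equivalence_classes(outlook))
# [{1, 2, 5, 11, 12}, {3, 4, 6, 10}, {7, 8, 9, 13, 14}]
```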

The concept introduced by Pawlak is to approximate each class by a pair of exact sets, called the ‘lower’ and ‘upper’ approximations. The lower approximation is the set of patterns that are certainly classified to a given class, while the upper approximation is the set of patterns that can possibly be classified to that class.

Formally, if each attribute x_j takes a value from a finite set of categorical values F_{jk} (1 ≤ k ≤ c_j), the sets \({\left[ {F_{jk}} \right]} = {\left\{{i \in U \mid x^{i}_{j} = F_{jk}} \right\}},\ 1 \leq k \leq c_{j},\) are called the equivalence classes of U with respect to the jth attribute x_j. In the same way, [l] = {i ∈ U | y^i = l} is the equivalence class of the lth classification label. Equivalence classes can also be generated by considering more than one attribute at a time. Given an arbitrary class l and the equivalence classes generated by attribute x_j, a rough set is the pair \({\left\langle {\underline{l}, \overline{l}} \right\rangle},\) where the lower and upper approximations \(\underline{l}\) and \(\overline{l}\), respectively, are defined by Pawlak [21] as

$$ \begin{aligned} &\underline{l} = \left\{\, i \in U \mid [F_{jk}] \subseteq [l] \ \text{for the } k \text{ such that } i \in [F_{jk}] \,\right\} \\ &\overline{l} = \left\{\, i \in U \mid [F_{jk}] \cap [l] \ne \emptyset \ \text{for the } k \text{ such that } i \in [F_{jk}] \,\right\} \end{aligned} $$
(10)

The lower and upper approximations of Don’t play and Play, using the equivalence classes induced by x_1, can be calculated from Eq. (10):

$$ \begin{aligned} &\underline{\textit{Don't play}} = \emptyset; \quad \underline{\textit{Play}} = \{3, 4, 6, 10\}; \\ &\overline{\textit{Don't play}} = \{1, 2, 5, 7, 8, 9, 11, 12, 13, 14\}; \quad \overline{\textit{Play}} = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14\}. \end{aligned} $$
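A minimal Python sketch of this step, continuing the previous one: the subset and intersection tests below implement the containment and overlap conditions of Eq. (10), and the helper name lower_upper is illustrative.

```python
def lower_upper(equiv_classes, decision_class):
    """Lower and upper approximation of a decision class, per Eq. (10):
    union of the equivalence classes contained in, resp. overlapping, the class."""
    lower, upper = set(), set()
    for eq in equiv_classes:
        if eq <= decision_class:    # [F_jk] is a subset of [l]: certainly classified
            lower |= eq
        if eq & decision_class:     # [F_jk] intersects [l]: possibly classified
            upper |= eq
    return lower, upper

outlook_classes = [{1, 2, 5, 11, 12}, {3, 4, 6, 10}, {7, 8, 9, 13, 14}]
dont_play = {1, 2, 8, 9, 12, 13}
play = {3, 4, 5, 6, 7, 10, 11, 14}

print(lower_upper(outlook_classes, dont_play))  # (set(), {1, 2, 5, 7, 8, 9, 11, 12, 13, 14})
print(lower_upper(outlook_classes, play))       # ({3, 4, 6, 10}, {1, 2, ..., 14})
```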

The positive region is the set of patterns which can be classified with certainty to a unique class; i.e.,

$$ \text{POS}_{{x_{j}}} {\left(y \right)} = {\bigcup\limits_{l = 1}^q {\underline{l}}} $$
(11)

For the considered example,

$$ \text{POS}_{x_{1}}(y) = \underline{\textit{Don't play}} \cup \underline{\textit{Play}} = \{3, 4, 6, 10\}. $$

It is clear that, using attribute x_1 alone, the patterns {1, 2, 5, 7, 8, 9, 11, 12, 13, 14} cannot be classified to a unique decision class with certainty. For example, patterns 1 and 5 are assigned to different classes even though they share the same value Rain of attribute x_1. There is, however, no classification ambiguity for the patterns belonging to \(\text{POS}_{x_{1}}(y)\). If \(\text{POS}_{x_{j}}(y) = U,\) then every pattern i ∈ U can be classified to a unique class without any ambiguity. On this basis, Pawlak [21] defined the dependency degree \(\gamma_{x_{j}}(y)\) of the decision attribute y on the condition attribute x_j:

$$ \gamma_{{x_{j}}} {\left(y \right)} = \frac{{{\left\| {\text{POS}_{{x_{j}}} {\left(y \right)}} \right\|}}}{n}, $$
(12)

where \({\left\| {\text{POS}_{{x_{j}}} {\left(y \right)}} \right\|}\) is the number of patterns belonging to \(\text{POS}_{{x_{j}}} {\left(y \right)}.\)

For the considered example, the dependency degree follows from Eq. (12): \(\gamma_{x_{1}}(y) = \frac{\left\| \text{POS}_{x_{1}}(y) \right\|}{14} = \frac{4}{14} = 0.2857.\) In the same way we obtain \(\gamma_{x_{2}}(y) = 0\), \(\gamma_{x_{3}}(y) = 0\), and \(\gamma_{x_{4}}(y) = 0\).
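Continuing the same sketch, Eqs. (11) and (12) amount to a union of lower approximations followed by a cardinality ratio; the fragment below reuses lower_upper, outlook_classes, dont_play and play from the previous sketch and reproduces the value 0.2857.

```python
def dependency_degree(equiv_classes, decision_classes, n):
    """gamma_{x_j}(y) of Eq. (12): size of the positive region (Eq. 11) divided by n."""
    positive_region = set()
    for decision_class in decision_classes:
        lower, _ = lower_upper(equiv_classes, decision_class)
        positive_region |= lower
    return len(positive_region) / n

gamma = dependency_degree(outlook_classes, [dont_play, play], n=14)
print(round(gamma, 4))  # 0.2857, since POS_{x1}(y) = {3, 4, 6, 10}
```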

Cite this article

Bhatt, R.B., Gopal, M. FRCT: fuzzy-rough classification trees. Pattern Anal Applic 11, 73–88 (2008). https://doi.org/10.1007/s10044-007-0080-z
