Abstract
Using fuzzy-rough hybrids, we have proposed a measure to quantify the functional dependency of decision attribute(s) on condition attribute(s) within fuzzy data. We have shown that the proposed measure of dependency degree is a generalization of the measure proposed by Pawlak for crisp data. In this paper, the new measure is built into the decision tree generation mechanism to produce fuzzy-rough classification trees (FRCT): efficient, top-down, multi-class decision tree structures for solving classification problems from feature-based learning examples. The FRCT generation algorithm has been applied to 16 real-world benchmark datasets and compared experimentally with five previously reported fuzzy decision tree generation algorithms and with the rough decomposition tree algorithm. The comparison covers the number of rules, average training time, and classification accuracy. Experimental results show that the proposed FRCT generation algorithm outperforms the existing fuzzy decision tree generation techniques and the rough decomposition tree induction algorithm.
Notes
A fuzzy set \(F\) whose support is a single point \(u\) in \(U\) with \(\mu_{F}(u) = 1\) is called a fuzzy singleton.
References
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman & Hall, New York
Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo
Quinlan JR (1990) Decision trees and decision making. IEEE Trans SMC 20(2):339–346
Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE Trans SMC 21(3):660–674
Yuan Y, Shaw MJ (1995) Induction of fuzzy decision trees. Fuzzy Sets Syst 69:125–139
Umano M et al. (1994) Fuzzy decision tree by fuzzy ID3 algorithm and its application to diagnosis systems. In: IEEE international conference on Fuzzy Systems, June 26–29, pp 2113–2118
Chiang I-J, Hsu JY-J (2002) Fuzzy classification trees for data analysis. Fuzzy Sets Syst 130:87–99
Jeng B, Jeng Y-M, Liang T-P (1997) FILM: A fuzzy inductive learning method for automated knowledge acquisition. Dec Support Syst 21:61–73
Ichihashi H, Shirai T, Nagasaka K, Miyoshi T (1996) Neuro-fuzzy ID3. Fuzzy Sets Syst 81:157–167
Sison LG, Chong EKP (1994) Fuzzy modeling by induction and pruning of decision trees. In: Proceedings of the IEEE international symposium on intelligent control, Columbus, OH, pp 166–171
Tani T, Sakoda M (1992) Fuzzy modeling by ID3 algorithm and its application to prediction of outlet temperature. In: Proceedings of the IEEE international conference on Fuzzy Systems, San Diego, CA, pp 923–930
Weber R (1992) Fuzzy ID3: A class of methods for automatic knowledge acquisition. In: Proceedings of the international conference on Fuzzy Logic Neural Networks, Iizuka, Japan, pp 265–268
Mitra S, Konwar KM, Pal SK (2002) Fuzzy decision tree, linguistic rules and fuzzy knowledge-based network: generation and evaluation. IEEE Trans SMC-C: Appl Rev 32(4):328–339
Janikow CZ (1998) Fuzzy decision trees: issues and methods. IEEE Trans SMC-B: Cybern 28(1):1–14
Yeung DS, Wang XZ, Tsang ECC (1999) Learning weighted fuzzy rules from examples with mixed attributes by fuzzy decision trees. In: Proceedings of the IEEE international conference on SMC, Tokyo, Japan, October 12–15, pp 349–354
Dong M, Kothari R (2001) Look-ahead based fuzzy decision tree induction. IEEE Trans Fuzzy Syst 9(3):461–468
Liu X, Pedrycz W (2007) The development of fuzzy decision trees in the framework of axiomatic fuzzy set logic. Appl Soft Comput 7(1):325–342
Pedrycz W, Sosnowski ZA (2005) C-fuzzy decision trees. IEEE Trans SMC-C 35(4):498–511
Wang X-Z, Yeung DS, Tsang ECC (2001) A comparative study on heuristic algorithms for generating fuzzy decision trees. IEEE Trans SMC-B 31(2):215–226
Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer, Norwell
Nguyen HS (1997) Discretization of real value attributes: a boolean reasoning approach. Department of Mathematics, Computer Science and Mechanics, Warsaw University
Dubois D, Prade H (1992) Putting fuzzy sets and rough sets together. In: Slowinski R (ed) Intelligent decision support. Kluwer, Dordrecht, pp 203–232
Zadeh LA (1988) Fuzzy logic. IEEE Comput 21(4):83–93
Bhatt RB, Gopal M (2006) On the extension of functional dependency degree from crisp to fuzzy partitions. Pattern Recogn Lett 27(5):487–491
Bhatt RB, Gopal M (2004) FRID: Fuzzy-Rough Interactive Dichotomizers. In: Proceedings of the IEEE international conference on Fuzzy Systems, IEEE-FUZZ’04, Budapest, Hungary, July 26–29, pp 1337–1342
Blake CL, Merz CJ (1998) UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science
Nozaki K, Ishibuchi H, Tanaka H (1997) A simple but powerful heuristic method for generating fuzzy rules from numerical data. Fuzzy Sets Syst 86:251–270
http://lib.stat.cmu.edu/datasets/veteran
Olaru C, Wehenkel L (2003) A complete fuzzy decision tree technique. Fuzzy Sets Syst 128:221–254
Jayashree S, Bhatia M, Shweta S, Anand S (2007) Quantitative EEG analysis for assessment to ‘plan’ a task in Amyotrophic Lateral Sclerosis (ALS) patients: a study of executive functions in ALS patients. Cogn Brain Res 22(1):59–66
http://lib.stat.cmu.edu/DASL/Datafiles/ICU.html
Pal NR, Bezdek JC (1995) On cluster validity for fuzzy c-means model. IEEE Trans Fuzzy Syst 3(3):370–379
Hogg RV, Tanis EA (1977) Probability and statistical inference. Macmillan, New York
Hong T-P, Wang T-T, Wang S-L, Chien B-C (2000) Learning a coverage set of maximally general fuzzy rules by rough sets. Expert Syst Appl 19(2):97–103
Acknowledgement
The authors would like to thank the anonymous referees for their helpful and valuable comments. The authors are thankful to Prof. Zdzisław Pawlak, Polish Academy of Sciences, Warsaw, Poland, for sending them some of his research notes on rough set theory. The authors are also thankful to C. Olaru for providing the OMIB (One Machine Infinite Bus) dataset [30].
Appendix
Consider the dataset [4] given in Table 7. Each pattern is described by four attributes and classified as either Don’t play or Play. Two attributes, Temp and Humidity, are real-valued, and the other two, Outlook and Windy?, are categorical. In the context of rough set theory, categorical attributes may be called discrete attributes.
Rough set theory requires a priori discretization of real-valued attributes. The real-valued attributes Temp and Humidity have been discretized using the partition intervals given in Table 8, producing the dataset in Table 9. These intervals are crisp. For example, any values of Temp and Humidity lying in the intervals [68–81] and [74–87.5], respectively, are treated simply as Med, without regard to their degree of membership in that interval.
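The crisp discretization described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the cut points are taken from the two Med intervals quoted in the text, while the label names Low and High, the inclusive treatment of the interval end points, and the helper `discretize` are assumptions made for the example. Table 8 remains the authority for the actual intervals.

```python
# Cut points taken from the Med intervals quoted in the text
# ([68-81] for Temp, [74-87.5] for Humidity). Everything else is an
# ASSUMPTION for illustration: the labels "Low" and "High", and the
# choice to treat both end points of the Med interval as inclusive.
CUTS = {
    "Temp": (68.0, 81.0),
    "Humidity": (74.0, 87.5),
}


def discretize(attribute: str, value: float) -> str:
    """Map a real value to a crisp interval label (hypothetical helper)."""
    low_cut, high_cut = CUTS[attribute]
    if value < low_cut:
        return "Low"
    if value <= high_cut:
        return "Med"
    return "High"


# Values near the edge and near the middle of the interval receive the same
# label: the degree of membership in the interval is lost.
print(discretize("Temp", 68.0), discretize("Temp", 75.0))
print(discretize("Humidity", 90.0))
```

Because the mapping is crisp, a Temp of 68 and a Temp of 75 become indistinguishable after discretization, which is the loss of information that the fuzzy treatment is meant to avoid.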
For the given dataset, let \(x_{1}\) = Outlook, \(x_{2}\) = Temp, \(x_{3}\) = Humidity, and \(x_{4}\) = Windy?.
Patterns \(i \in U\) that share the same attribute value are indiscernible from one another, so each attribute partitions \(U\) into a set of equivalence classes. The partition of \(U\) by attribute \(x_{1}\) is
The partitions of \(U\) by class labels are:
The concept introduced by Pawlak is to approximate each class by a pair of exact sets, called the ‘lower’ and ‘upper’ approximations. The lower approximation is the set of patterns that are certainly classified to a given class, while the upper approximation is the set of patterns that can possibly be classified to that class.
Formally, if each attribute \(x_{j}\) takes a value from a finite set of categorical values \(F_{jk}\) \((1 \leq k \leq c_{j})\), the classes \({\left[ {F_{jk}} \right]} = {\left\{{i \in U|x^{i}_{j} = F_{jk}} \right\}}; 1 \leq k \leq c_{j},\) are called the equivalence classes of \(U\) with respect to the \(j\)th attribute \(x_{j}\). In the same way, \([l] = \{i \in U|y^{i} = l\}\) is the equivalence class induced by the \(l\)th classification label. Equivalence classes can also be generated by considering more than one attribute at a time. Given an arbitrary class \(l\) and the equivalence classes generated by attribute \(x_{j}\), a rough set is a tuple \({\left\langle {\underline{l}, \overline{l}} \right\rangle},\) where the lower and upper approximations \(\underline{l}\) and \(\overline{l}\), respectively, are defined by Pawlak [21] as

\[\underline{l} = \bigcup {\left\{ {\left[ {F_{jk}} \right]} \,\middle|\, {\left[ {F_{jk}} \right]} \subseteq [l] \right\}}, \qquad \overline{l} = \bigcup {\left\{ {\left[ {F_{jk}} \right]} \,\middle|\, {\left[ {F_{jk}} \right]} \cap [l] \neq \emptyset \right\}}. \quad (10)\]
The lower and upper approximations for Don’t play and Play, through the equivalence classes generated by \(x_{1}\), can be calculated using Eq. (10).
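The computation of lower and upper approximations can be sketched in a few lines of Python. The six patterns below are hypothetical toy data and do not reproduce Table 7 or Table 9; the function names `equivalence_classes` and `approximations` are likewise illustrative. The sketch follows the standard Pawlak definitions: a block of the attribute partition joins the lower approximation if it lies wholly inside the class, and joins the upper approximation if it overlaps the class at all.

```python
from collections import defaultdict

# HYPOTHETICAL toy data (not Table 9): pattern id -> (Outlook value, class label)
patterns = {
    1: ("Rain", "Play"),
    2: ("Rain", "Don't play"),
    3: ("Overcast", "Play"),
    4: ("Overcast", "Play"),
    5: ("Sunny", "Don't play"),
    6: ("Sunny", "Play"),
}


def equivalence_classes(values):
    """Group pattern ids that share the same value into equivalence classes."""
    classes = defaultdict(set)
    for pattern_id, value in values.items():
        classes[value].add(pattern_id)
    return dict(classes)


def approximations(blocks, target):
    """Return (lower, upper) approximations of `target` by the given blocks."""
    lower, upper = set(), set()
    for block in blocks:
        if block <= target:   # block lies entirely inside the class
            lower |= block
        if block & target:    # block overlaps the class
            upper |= block
    return lower, upper


attr_blocks = equivalence_classes({i: a for i, (a, _) in patterns.items()})
label_blocks = equivalence_classes({i: y for i, (_, y) in patterns.items()})

for label, members in label_blocks.items():
    lower, upper = approximations(attr_blocks.values(), members)
    print(label, sorted(lower), sorted(upper))
```

In this toy example only the Overcast block is pure, so the lower approximation of Play is {3, 4}, while the lower approximation of Don't play is empty.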
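The positive region is the set of patterns that can be classified with certainty to a unique class, i.e., the union of the lower approximations of all classes:

\[\text{POS}_{{x_{j}}} {\left(y \right)} = \bigcup\limits_{l} \underline{l}. \quad (11)\]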
For the considered example,
It is clear that, through attribute \(x_{1}\), patterns {1,2,5,7,8,9,10,11,12,13,14} cannot be classified to a unique decision class with certainty. For example, patterns 1 and 5 are classified to different decisions even though they have the same value, Rain, of attribute \(x_{1}\). There is, however, no classification ambiguity for the patterns belonging to \(\text{POS}_{{x_{1}}} {\left(y \right)}.\) If \(\text{POS}_{{x_{j}}} {\left(y \right)} = U,\) then each pattern \(i \in U\) is classified to a unique class without any ambiguity. On this basis, Pawlak [21] defined the dependency degree \(\gamma_{{x_{j}}} {\left(y \right)}\) of decision attribute \(y\) on condition attribute \(x_{j}\):

\[\gamma_{{x_{j}}} {\left(y \right)} = \frac{{{\left\| {\text{POS}_{{x_{j}}} {\left(y \right)}} \right\|}}}{{{\left\| U \right\|}}}, \quad (12)\]
where \({\left\| {\text{POS}_{{x_{j}}} {\left(y \right)}} \right\|}\) is the number of patterns belonging to \(\text{POS}_{{x_{j}}} {\left(y \right)}.\)
For the considered example, the dependency degree can be calculated by Eq. (12), which gives \(\gamma_{{x_{1}}} {\left(y \right)} = \frac{{{\left\| {\text{POS}_{{x_{1}}} {\left(y \right)}} \right\|}}}{{14}} = 0.2857.\) In the same way, we obtain \(\gamma_{{x_{2}}} (y) = 0,\; \gamma_{{x_{3}}} (y) = 0,\; {\hbox{and}}\; \gamma_{{x_{4}}} (y) = 0.\)
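A dependency degree of this kind can be computed with a short Python sketch. The 14 patterns below are hypothetical: the pattern numbering does not correspond to Table 7, and the values are arranged only so that exactly one attribute value (Overcast) is pure, which gives a ratio of 4/14 for comparison with the figure quoted above. The names `partition` and `dependency_degree` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def partition(values):
    """Return the equivalence classes (sets of pattern ids) induced by a value map."""
    blocks = defaultdict(set)
    for pattern_id, value in values.items():
        blocks[value].add(pattern_id)
    return list(blocks.values())


def dependency_degree(condition, decision):
    """Return the positive region and gamma = |POS| / |U| for crisp data."""
    decision_blocks = partition(decision)
    positive = set()
    for block in partition(condition):
        # A condition block is in the positive region only if it fits
        # entirely inside a single decision class.
        if any(block <= d for d in decision_blocks):
            positive |= block
    return positive, Fraction(len(positive), len(condition))


# HYPOTHETICAL data: ids 1..14 are arbitrary and do not follow Table 7.
outlook_values = ["Sunny"] * 5 + ["Overcast"] * 4 + ["Rain"] * 5
class_labels = (
    ["Play"] * 2 + ["Don't play"] * 3   # Sunny block is mixed
    + ["Play"] * 4                      # Overcast block is pure
    + ["Play"] * 3 + ["Don't play"] * 2  # Rain block is mixed
)
outlook = dict(enumerate(outlook_values, start=1))
label = dict(enumerate(class_labels, start=1))

positive, gamma = dependency_degree(outlook, label)
print(sorted(positive), f"{float(gamma):.4f}")
```

With these assumed values the positive region contains the four Overcast patterns, and the ratio 4/14 rounds to 0.2857.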
Cite this article
Bhatt, R.B., Gopal, M. FRCT: fuzzy-rough classification trees. Pattern Anal Applic 11, 73–88 (2008). https://doi.org/10.1007/s10044-007-0080-z
DOI: https://doi.org/10.1007/s10044-007-0080-z