
Knowledge-Based Systems

Volume 29, May 2012, Pages 12-19

Jmax-pruning: A facility for the information theoretic pruning of modular classification rules

https://doi.org/10.1016/j.knosys.2011.06.016

Abstract

The Prism family of algorithms induces modular classification rules, in contrast to the Top Down Induction of Decision Trees (TDIDT) approach, which induces classification rules in the intermediate form of a tree structure. Both approaches achieve comparable classification accuracy; however, in some cases Prism outperforms TDIDT. For both approaches, pre-pruning facilities have been developed to prevent the induced classifiers from overfitting on noisy datasets, by cutting rule terms or whole rules, or by truncating decision trees according to certain metrics. Many pre-pruning mechanisms have been developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning, which works not only on Prism algorithms but also on TDIDT. Although J-pruning has been shown to produce good results, this work points out that it does not exploit its full potential. The original J-pruning facility is examined, and a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.

Highlights

► We improve a rule pruning method for modular classification rules.
► We examine the information theoretic shortcoming of the J-pruning approach.
► Our Jmax-pruning is based on the rule’s maximum theoretical information content.
► Empirical results show a significant improvement of Jmax-pruning over J-pruning.

Introduction

The growing interest in the area of data mining has led to various developments for the induction of classification rules from large data samples in order to classify previously unseen data instances. Classification rule induction algorithms can be categorised into two different approaches: the Top Down Induction of Decision Trees (TDIDT) [1], also known as the ‘divide and conquer’ approach, and the ‘separate and conquer’ approach. The ‘divide and conquer’ approach induces classification rules in the intermediate form of a decision tree, whereas the ‘separate and conquer’ approach directly induces ‘IF… THEN…’ rules. The ‘divide and conquer’ approach can be traced back to the 1960s [2] and has resulted in a wide selection of classification systems such as C4.5 and C5.0. The ‘separate and conquer’ approach [3] can also be traced back to the 1960s. Its most notable member is the Prism family of algorithms, a direct competitor to the induction of decision trees.

The original Prism algorithm described in [4] identified the tree structure induced by the ‘divide and conquer’ approach as the major handicap of decision trees, which makes them vulnerable to overfitting, especially on noisy datasets. Prism has been shown to produce a classification accuracy similar to that of decision trees and, in some cases, even to outperform them; this is particularly the case if the training data is noisy. There is also some recent interest in parallel versions of Prism algorithms, in order to make them scale better to large datasets. The framework described in [5], the Parallel Modular Classification Rule Inducer (PMCRI), allows any member of the Prism family of algorithms to be parallelised.

Nevertheless, Prism, like any classification rule induction algorithm, is prone to overfitting on the training data. For decision trees there is a large variety of pruning algorithms that modify the classifier during or after the induction of the tree in order to make the tree more general and reduce unwanted overfitting [6]. For the Prism family of algorithms only one method is described in the literature, namely J-pruning [7], [17]. J-pruning is based on the J-measure [8], an information theoretic measure for quantifying the information content of a rule. There is also an extension of the PMCRI framework mentioned above, the J-PMCRI framework [9], which incorporates J-pruning into PMCRI, not only for quality reasons but also because J-pruning reduces the number of rules and rule terms induced and thus the runtime of Prism and PMCRI.
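For readers unfamiliar with the measure, the J-measure of a rule of the form ‘IF Y = y THEN X = x’ is J(X; Y = y) = p(y) · j(X; Y = y), where p(y) is the probability that the rule antecedent fires and j(X; Y = y) compares the posterior probability p(x|y) of the consequent with its prior p(x). The following Python fragment is a minimal illustrative sketch of that computation written for this discussion; the function name and conventions are ours and do not come from the paper or from the Inducer software.

```python
import math

def j_measure(p_y, p_x, p_x_given_y):
    """Illustrative J-measure of a rule 'IF Y = y THEN X = x'.

    p_y         -- probability that the rule antecedent fires, p(y)
    p_x         -- prior probability of the rule consequent, p(x)
    p_x_given_y -- posterior probability of the consequent given the antecedent, p(x|y)
    Returns p(y) * j(X; Y = y) in bits, using the convention 0 * log(0) = 0.
    """
    def contribution(posterior, prior):
        # posterior * log2(posterior / prior), zero when the posterior is zero
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    j_inner = contribution(p_x_given_y, p_x) + contribution(1.0 - p_x_given_y, 1.0 - p_x)
    return p_y * j_inner

# Example: a rule that fires on 30% of the data and raises the probability of
# its target class from a prior of 0.5 to a posterior of 0.9.
print(round(j_measure(0.3, 0.5, 0.9), 4))   # about 0.1593
```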

This paper examines the J-pruning method described in [7], [17] and proposes a new pruning method, Jmax-pruning, which aims to exploit the J-measure further and thus to improve Prism’s classification accuracy. The basic Prism approach is described in Section 2 and compared with decision trees, and some particular members of the Prism family are introduced briefly. J-pruning is described and discussed in Section 2.3; Jmax-pruning, a new variation of J-pruning, is introduced and discussed in Section 3.2 and evaluated in Section 4. Ongoing work is described in Section 5, which discusses J-PrismTCS, a version of Prism based solely on the J-measure, and, more importantly, proposes the development of a version of Jmax-pruning for decision tree induction algorithms. Section 6 concludes the paper with a brief summary and discussion of the findings presented.

Section snippets

The Prism Family of Algorithms

As mentioned in Section 1, the representation of classification rules differs between the ‘divide and conquer’ and ‘separate and conquer’ approaches. The rule sets generated by the ‘divide and conquer’ approach are in the form of decision trees, whereas the rules generated by the ‘separate and conquer’ approach are modular. Modular rules do not necessarily fit into a decision tree and normally do not. The rule representation of decision trees is the main drawback of the ‘divide and conquer’
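To make the ‘separate and conquer’ strategy concrete, the sketch below outlines the basic Prism rule induction loop for a single target class. It is a simplified illustration written for this article only, assuming categorical attributes; the data structures and helper names are ours and do not correspond to the Inducer or PMCRI implementations. Each rule is grown by repeatedly appending the (attribute, value) term with the highest conditional probability of the target class on the instances still covered, until the covered subset is pure; the covered instances are then removed and the process repeats.

```python
def induce_rules_for_class(instances, target_class, attributes):
    """Simplified Prism-style 'separate and conquer' loop for one target class.

    instances  -- list of dicts mapping attribute names to categorical values,
                  with the class label stored under the key 'class'
    attributes -- names of the attributes that may appear in rule terms
    Returns a list of rules; each rule is a list of (attribute, value) terms.
    """
    rules = []
    pool = list(instances)  # instances not yet covered by any finished rule
    remaining = [x for x in pool if x['class'] == target_class]
    while remaining:
        rule, covered = [], list(pool)
        # Grow the rule until the covered subset contains only the target class.
        while any(x['class'] != target_class for x in covered):
            used = {a for a, _ in rule}
            best_term, best_prob = None, -1.0
            for attr in attributes:
                if attr in used:
                    continue
                for value in {x[attr] for x in covered}:
                    matching = [x for x in covered if x[attr] == value]
                    prob = sum(x['class'] == target_class for x in matching) / len(matching)
                    if prob > best_prob:
                        best_term, best_prob = (attr, value), prob
            if best_term is None:  # no attribute left to split on
                break
            rule.append(best_term)
            covered = [x for x in covered if x[best_term[0]] == best_term[1]]
        rules.append(rule)
        # Remove the instances covered by the finished rule and repeat.
        pool = [x for x in pool if not all(x[a] == v for a, v in rule)]
        remaining = [x for x in pool if x['class'] == target_class]
    return rules
```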

Variation of J-pruning

In general there is very little work on pruning methods for the Prism family of algorithms; Bramer’s J-pruning in the Inducer software seems to be the only pruning facility developed for them. This section critiques the original J-pruning facility and outlines Jmax-pruning, a variation that makes further use of the J-measure.
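Roughly speaking, J-pruning stops appending rule terms as soon as a further term would decrease the rule’s J-value, whereas Jmax-pruning induces the complete rule, records the J-value after each term is appended, and then truncates the rule back to the term at which the J-value peaked. The fragment below is a hedged sketch of that truncation step only; the J-values are assumed to have been computed during rule induction (for example with a function such as the j_measure sketch in the introduction), and all names are ours.

```python
def jmax_prune(rule_terms, j_values):
    """Illustrative Jmax-pruning truncation step.

    rule_terms -- (attribute, value) terms in the order in which they were induced
    j_values   -- j_values[i] is the J-value of the rule after term i was appended
    Returns the rule truncated after the term that achieved the highest J-value.
    """
    if not rule_terms:
        return rule_terms
    best_index = max(range(len(j_values)), key=lambda i: j_values[i])
    return rule_terms[:best_index + 1]

# Example: the J-value dips after the second term but recovers and peaks at the
# third; a greedy stop at the first decrease would keep only one term, whereas
# the maximum-based truncation keeps three.
terms = [('outlook', 'sunny'), ('humidity', 'high'), ('wind', 'strong'), ('temp', 'mild')]
print(jmax_prune(terms, [0.12, 0.08, 0.15, 0.10]))
```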

Evaluation of Jmax-pruning

The datasets used have been retrieved from the UCI repository [16]. Each dataset is divided into a test set holding 20% of the instances and a training set holding the remaining 80% of the instances.
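A holdout split of this kind can be reproduced along the following lines. This is a minimal sketch using scikit-learn’s train_test_split, with the iris data standing in for any of the UCI datasets; the paper does not state which tooling or random seed was used, so both are assumptions made here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# The iris data (one of the UCI datasets) stands in for any of the datasets
# used in the evaluation; X holds the attribute values, y the class labels.
X, y = load_iris(return_X_y=True)

# 80% of the instances form the training set and the remaining 20% the test
# set, mirroring the holdout split described above (seed chosen arbitrarily).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```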

Table 1 shows the number of rules induced per training set and the achieved accuracy on the test set using PrismTCS with J-pruning as described in Section 2.3 and Jmax-pruning as proposed in Section 3.2.

Also listed in Table 1, as ‘J-value recoveries’, is the number of times the J-value

J-PrismTCS

Another possible variation of PrismTCS that is currently being implemented is a version that is solely based on the J-measure. Rule terms would be induced by generating all possible categorical and continuous rule terms and selecting the one that results in the highest J-value for the current rule instead of selecting the one with the largest conditional probability. Again the same stopping criterion as for standard PrismTCS could be used, which is that all instances of the current subset of
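A term-selection step driven by the J-measure rather than by the conditional probability could look roughly like the sketch below, which reuses the illustrative j_measure function from the introduction. All names and conventions are ours and are given for illustration only; J-PrismTCS itself is still being implemented, so this is not its actual code.

```python
def best_term_by_j(covered, target_class, attributes, used, total_count, class_prior):
    """Pick the candidate rule term yielding the highest J-value on the covered subset.

    covered     -- training instances still covered by the rule under construction
                   (assumed to be drawn from the full training set)
    used        -- attribute names already present in the rule
    total_count -- size of the full training set, used to estimate p(y)
    class_prior -- prior probability p(x) of the target class on the training set
    Returns the (attribute, value) term whose extended rule has the highest J-value.
    """
    best_term, best_j = None, float('-inf')
    for attr in attributes:
        if attr in used:
            continue
        for value in {x[attr] for x in covered}:
            matching = [x for x in covered if x[attr] == value]
            p_y = len(matching) / total_count  # coverage p(y) of the extended rule
            p_x_given_y = sum(x['class'] == target_class
                              for x in matching) / len(matching)  # posterior p(x|y)
            j = j_measure(p_y, class_prior, p_x_given_y)  # illustrative J-measure from above
            if j > best_j:
                best_term, best_j = (attr, value), j
    return best_term, best_j
```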

Conclusions

Section 2 discussed the replicated subtree problem that arises from representing rules in the form of decision trees. The Prism family of algorithms has been introduced as an alternative approach to TDIDT that induces modular classification rules which do not necessarily fit into a tree structure. The Prism family of algorithms was highlighted and J-pruning, a pre-pruning facility for Prism algorithms based on the J-measure, which describes the information content of a rule, was

References (17)

  • J. Cendrowska, PRISM: an algorithm for inducing modular rules, International Journal of Man-Machine Studies (1987).
  • M.A. Bramer, Using J-Pruning to Reduce Overfitting in Classification Trees, Knowledge-Based Systems (2002).
  • J.R. Quinlan, C4.5: Programs for Machine Learning (1993).
  • E.B. Hunt et al., Experiments in Induction (1966).
  • R.S. Michalski, On the Quasi-Minimal solution of the general covering problem, in: Proceedings of the Fifth...
  • F.T. Stahl et al., PMCRI: A Parallel Modular Classification Rule Induction Framework.
  • F. Esposito et al., A comparative analysis of methods for pruning decision trees, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997).
  • M.A. Bramer, An information-theoretic approach to the pre-pruning of classification rules.
There are more references available in the full text version of this article.

Cited by (21)

  • Rule extraction from support vector machines based on consistent region covering reduction

    2013, Knowledge-Based Systems
    Citation Excerpt:

    In fact, most regions could be redundant. Similar to other rule learning method, rule induction is needed to reduce useless rules and time complexity [4,5]. Which regions should be kept or removed?

  • Hybrid models based on rough set classifiers for setting credit rating decision rules in the global banking industry

    2013, Knowledge-Based Systems
    Citation Excerpt:

    Artificial intelligence techniques, which have been extensively used when generating credit ratings, have outperformed statistical methods [9,24]. Particularly, intelligent hybrid systems integrate several models for processing classification problems [2,50,55]. In practice, an ensemble classifier outperforms stand-alone models [43,44].

  • Computationally efficient induction of classification rules with the PMCRI and J-PMCRI frameworks

    2012, Knowledge-Based Systems
    Citation Excerpt:

    Ongoing work comprises the extension of the framework to induce general rules that do not predict a certain class but describe important relationships in the dataset. Also the integration of a further, only recently developed, pre-pruning version for Prism algorithms, Jmax-pruning [30,31] is investigated for integration in the PMCRI framework. In general the development of PMCRI allows the application of Prism algorithms on a larger range of datasets that have previously simply been too large to be analysed using the Prism approach.

  • The construction of scalable decision tree based on fast splitting and j-max pre pruning on large datasets

    2021, International Journal of Engineering, Transactions B: Applications