Abstract
Distance is widely used in most lazy classification systems. Rather than using distance, we make use of the frequency of an instance's subsets of features, and the rate at which those frequencies change across training classes, to perform both knowledge discovery and classification. We name the system DeEPs. Whenever an instance is considered, DeEPs can efficiently discover those patterns contained in the instance which sharply differentiate the training classes from one another. DeEPs can also predict a class label for the instance by compactly summarizing the frequencies of the discovered patterns, with a view to collectively maximizing their discriminating power. Extensive experimental results are used to evaluate the system, showing that the patterns are comprehensible and that DeEPs is accurate and scalable.
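The core idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm (DeEPs uses border representations and data reduction for efficiency); it is a naive, hedged illustration of the same two steps: for a test instance, enumerate its feature subsets and keep those whose frequency "jumps" between classes (nonzero in one class, zero elsewhere), then score each class by the fraction of its training instances covered by at least one such pattern. All names (`deeps_predict`, `train_by_class`, `max_len`) are invented for this sketch.

```python
from itertools import combinations

def deeps_predict(instance, train_by_class, max_len=3):
    """Naive sketch of a DeEPs-style lazy classifier (illustrative only).

    instance:        an iterable of feature-value items, e.g. {"color=red", "size=big"}
    train_by_class:  {class_label: [frozenset of items, ...]}
    """
    instance = frozenset(instance)
    scores = {}
    for label, rows in train_by_class.items():
        covered = set()  # indices of training rows matched by a sharp pattern
        for k in range(1, max_len + 1):
            for pattern in combinations(sorted(instance), k):
                p = set(pattern)
                freq_here = sum(p <= row for row in rows)
                freq_other = sum(
                    p <= row
                    for other, other_rows in train_by_class.items() if other != label
                    for row in other_rows
                )
                # a "jumping" pattern: present in this class, absent elsewhere
                if freq_here > 0 and freq_other == 0:
                    covered.update(i for i, row in enumerate(rows) if p <= row)
        # compact frequency summary: fraction of this class's rows covered
        scores[label] = len(covered) / len(rows)
    return max(scores, key=scores.get)
```

A toy run: with class A containing `{x=1, y=1}` and `{x=1, y=2}`, and class B containing `{x=2, y=1}` and `{x=2, y=2}`, the pattern `{x=1}` in test instance `{x=1, y=1}` occurs only in A, so A wins. The real system avoids the exponential subset enumeration by maintaining only the borders of the pattern space.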
Li, J., Dong, G., Ramamohanarao, K. et al. DeEPs: A New Instance-Based Lazy Discovery and Classification System. Machine Learning 54, 99–124 (2004). https://doi.org/10.1023/B:MACH.0000011804.08528.7d