Abstract
Distance is widely used in most lazy classification systems. Rather than using distance, we make use of the frequency of an instance's subsets of features, and the rate at which those frequencies change across training classes, to perform both knowledge discovery and classification. We name the system DeEPs. Whenever an instance is considered, DeEPs can efficiently discover those patterns contained in the instance which sharply differentiate the training classes from one another. DeEPs can also predict a class label for the instance by compactly summarizing the frequencies of the discovered patterns, with a view to collectively maximizing their discriminating power. Extensive experimental results are used to evaluate the system, showing that the patterns are comprehensible and that DeEPs is accurate and scalable.
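The core idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm (DeEPs uses border representations and data reduction for efficiency); it is a naive, hedged illustration of the same two steps: for a test instance, enumerate its feature subsets and keep those whose frequency "jumps" between classes (nonzero in one class, zero elsewhere), then score each class by the fraction of its training instances covered by at least one such pattern. All names (`deeps_predict`, `train_by_class`, `max_len`) are invented for this sketch.

```python
from itertools import combinations

def deeps_predict(instance, train_by_class, max_len=3):
    """Naive sketch of a DeEPs-style lazy classifier (illustrative only).

    instance:        an iterable of feature-value items, e.g. {"color=red", "size=big"}
    train_by_class:  {class_label: [frozenset of items, ...]}
    """
    instance = frozenset(instance)
    scores = {}
    for label, rows in train_by_class.items():
        covered = set()  # indices of training rows matched by a sharp pattern
        for k in range(1, max_len + 1):
            for pattern in combinations(sorted(instance), k):
                p = set(pattern)
                freq_here = sum(p <= row for row in rows)
                freq_other = sum(
                    p <= row
                    for other, other_rows in train_by_class.items() if other != label
                    for row in other_rows
                )
                # a "jumping" pattern: present in this class, absent elsewhere
                if freq_here > 0 and freq_other == 0:
                    covered.update(i for i, row in enumerate(rows) if p <= row)
        # compact frequency summary: fraction of this class's rows covered
        scores[label] = len(covered) / len(rows)
    return max(scores, key=scores.get)
```

A toy run: with class A containing `{x=1, y=1}` and `{x=1, y=2}`, and class B containing `{x=2, y=1}` and `{x=2, y=2}`, the pattern `{x=1}` in test instance `{x=1, y=1}` occurs only in A, so A wins. The real system avoids the exponential subset enumeration by maintaining only the borders of the pattern space.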
Li, J., Dong, G., Ramamohanarao, K. et al. DeEPs: A New Instance-Based Lazy Discovery and Classification System. Machine Learning 54, 99–124 (2004). https://doi.org/10.1023/B:MACH.0000011804.08528.7d