
Advances in Instance Selection for Instance-Based Learning Algorithms

Data Mining and Knowledge Discovery

Abstract

The basic nearest neighbour classifier suffers from the indiscriminate storage of all presented training instances. With a large database of instances, classification response time can be slow. When noisy instances are present, classification accuracy can suffer. Drawing on the large body of relevant work carried out in the past 30 years, we review the principal approaches to solving these problems. By deleting instances, both problems can be alleviated, but the deletion criterion used is typically assumed to be all-encompassing and effective over many domains. We argue against this position and introduce an algorithm that rivals the most successful existing algorithm. When evaluated on 30 different problems, neither algorithm consistently outperforms the other: consistency is very hard to attain. To achieve the best results, we need to develop mechanisms that provide insights into the structure of class definitions. We discuss the possibility of these mechanisms and propose some initial measures that could be useful for the data miner.
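The two problems named above (every query must scan all stored instances, and noisy instances can mislead the nearest neighbour) and the general remedy of deleting instances can be illustrated with a small sketch. It is not the algorithm introduced in this article. Instead, it chains two classic deletion criteria: Wilson's editing rule, which removes instances that disagree with the majority of their k nearest neighbours (a noise filter), followed by Hart's condensed nearest neighbour rule, which keeps only the instances needed to classify the rest correctly (a storage reducer). The function names, the toy one-dimensional data, Euclidean distance and k = 3 are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Illustrative sketch only (not the article's algorithm).
# Each instance is a (feature_vector, class_label) pair; distances are Euclidean.


def nearest(store, x, k=1, exclude=None):
    """Return the k instances in `store` closest to point x, skipping index `exclude`."""
    candidates = [inst for i, inst in enumerate(store) if i != exclude]
    return sorted(candidates, key=lambda inst: math.dist(inst[0], x))[:k]


def classify_1nn(store, x):
    """Basic nearest neighbour rule: every query scans the whole store."""
    return nearest(store, x, k=1)[0][1]


def wilson_edit(train, k=3):
    """Wilson's editing: drop each instance whose label disagrees with the
    majority label of its k nearest neighbours in the full training set."""
    kept = []
    for i, (x, label) in enumerate(train):
        votes = Counter(lbl for _, lbl in nearest(train, x, k=k, exclude=i))
        if votes.most_common(1)[0][0] == label:
            kept.append((x, label))
    return kept


def hart_condense(train):
    """Hart's condensed NN rule: grow a store with only the instances the
    current store misclassifies, repeating passes until nothing is added.
    Assumes `train` is non-empty."""
    store = [train[0]]
    changed = True
    while changed:
        changed = False
        for x, label in train:
            if (x, label) not in store and classify_1nn(store, x) != label:
                store.append((x, label))
                changed = True
    return store


# Hypothetical toy data: class "a" near 0-3, class "b" near 10-13,
# plus one mislabelled "b" instance at 1.4 inside the "a" region.
train = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"),
         ((1.4,), "b"),
         ((10.0,), "b"), ((11.0,), "b"), ((12.0,), "b"), ((13.0,), "b")]

edited = wilson_edit(train)        # noise filter: removes the instance at 1.4
condensed = hart_condense(edited)  # storage reduction

print(len(train), len(edited), len(condensed))  # 9 8 2
print(classify_1nn(train, (1.5,)))      # b  (misled by the noisy instance)
print(classify_1nn(condensed, (1.5,)))  # a
```

Running the noise filter first matters in this example: applied directly to the unedited `train`, Hart's rule retains the mislabelled instance at 1.4 (the growing store misclassifies it, so it is added) and ends up storing five instances rather than two.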




Cite this article

Brighton, H., Mellish, C. Advances in Instance Selection for Instance-Based Learning Algorithms. Data Mining and Knowledge Discovery 6, 153–172 (2002). https://doi.org/10.1023/A:1014043630878
