Advances in Instance Selection for Instance-Based Learning Algorithms

Brighton, Henry; Mellish, Chris

doi:10.1023/A:1014043630878

Published: April 2002

Data Mining and Knowledge Discovery, Volume 6, pages 153–172 (2002)

Henry Brighton¹ &
Chris Mellish²

Abstract

The basic nearest neighbour classifier suffers from the indiscriminate storage of all presented training instances. With a large database of instances, classification response time can be slow. When noisy instances are present, classification accuracy can suffer. Drawing on the large body of relevant work carried out in the past 30 years, we review the principal approaches to solving these problems. By deleting instances, both problems can be alleviated, but the deletion criterion used is typically assumed to be all-encompassing and effective over many domains. We argue against this position and introduce an algorithm that rivals the most successful existing algorithm. When evaluated on 30 different problems, neither algorithm consistently outperforms the other: consistent performance across domains is very hard to achieve. To achieve the best results, we need to develop mechanisms that provide insights into the structure of class definitions. We discuss the possibility of these mechanisms and propose some initial measures that could be useful for the data miner.
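
To make the idea of instance deletion concrete, the following is a minimal sketch of a nearest neighbour classifier together with a simple noise-editing rule in the style of Wilson (1972): an instance is dropped when its label disagrees with the majority of its k nearest neighbours. This is not the algorithm introduced in the article. The function names `knn_predict` and `edit_noisy`, the choice k = 3, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train, query, k=1):
    """Classify `query` by majority vote among its k nearest stored instances."""
    neighbours = sorted(train, key=lambda inst: dist(inst[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def edit_noisy(train, k=3):
    """Keep only instances whose label matches the k-NN vote of the others.

    Each instance is classified by the remaining instances (leave-one-out);
    instances that are misclassified are treated as noise and deleted.
    """
    kept = []
    for i, (x, y) in enumerate(train):
        others = train[:i] + train[i + 1:]
        if knn_predict(others, x, k) == y:
            kept.append((x, y))
    return kept


# Toy data (hypothetical): two clusters, plus one mislabelled point in cluster "a".
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((0.15, 0.15), "b"),  # noisy instance
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]

edited = edit_noisy(train)
query = (0.16, 0.16)
before = knn_predict(train, query)   # decided by the noisy neighbour
after = knn_predict(edited, query)   # decided by the surrounding cluster
```

On this toy set the rule deletes only the mislabelled point, so the stored set shrinks and the query near it is no longer decided by the noisy instance. A single fixed criterion like this one is exactly the kind of rule whose effectiveness the abstract argues cannot be assumed across domains.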

Author information

Authors and Affiliations

Language Evolution and Computation Research Unit, Department of Theoretical and Applied Linguistics, The University of Edinburgh, Edinburgh, EH8 9LL, UK
Henry Brighton
Department of Artificial Intelligence, The University of Edinburgh, Edinburgh, EH1 1HN, UK
Chris Mellish

About this article

Cite this article

Brighton, H., Mellish, C. Advances in Instance Selection for Instance-Based Learning Algorithms. Data Mining and Knowledge Discovery 6, 153–172 (2002). https://doi.org/10.1023/A:1014043630878

Issue Date: April 2002
DOI: https://doi.org/10.1023/A:1014043630878
