
Using Weighted Nearest Neighbor to Benefit from Unlabeled Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where often the unlabeled examples greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unlabeled data. It applies a weighted nearest neighbor classification algorithm that uses the combined example sets as its knowledge base. The examples from the unlabeled set are “pre-labeled” by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier.
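
In code, the two-stage scheme reads roughly as follows. The sketch below is not the authors' implementation; the choice of base classifier (logistic regression), the value of k, the Euclidean distance metric, and the down-weight of 0.2 for pre-labeled examples are all illustrative assumptions, since the abstract does not fix these choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_knn_predict(X_store, y_store, w_store, X_query, k=5):
    """Weighted k-NN: each of the k nearest stored examples votes
    for its label with its own weight; the heaviest label wins."""
    preds = []
    for x in X_query:
        dists = np.linalg.norm(X_store - x, axis=1)  # Euclidean distance (assumed)
        neighbors = np.argsort(dists)[:k]
        votes = {}
        for i in neighbors:
            votes[y_store[i]] = votes.get(y_store[i], 0.0) + w_store[i]
        preds.append(max(votes, key=votes.get))
    return np.array(preds)

def two_stage_predict(X_lab, y_lab, X_unlab, X_test, k=5, unlab_weight=0.2):
    # Stage 1: an initial classifier built from the limited labeled data
    # "pre-labels" the unlabeled pool (logistic regression is an assumption).
    base = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    y_pre = base.predict(X_unlab)

    # Stage 2: weighted nearest neighbor over the combined example sets,
    # with truly labeled examples at weight 1.0 and the noisier
    # pre-labeled examples down-weighted (0.2 here is illustrative).
    X_store = np.vstack([X_lab, X_unlab])
    y_store = np.concatenate([y_lab, y_pre])
    w_store = np.concatenate([np.ones(len(y_lab)),
                              np.full(len(y_pre), unlab_weight)])
    return weighted_knn_predict(X_store, y_store, w_store, X_test, k=k)
```

Setting unlab_weight to 0 recovers a purely supervised k-NN on the labeled data, while 1 treats the pre-labels as ground truth (naive self-training); the regime the abstract points to lies in between.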

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Driessens, K., Reutemann, P., Pfahringer, B., Leschi, C. (2006). Using Weighted Nearest Neighbor to Benefit from Unlabeled Data. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science (LNAI), vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_10

  • DOI: https://doi.org/10.1007/11731139_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33206-0

  • Online ISBN: 978-3-540-33207-7

  • eBook Packages: Computer Science, Computer Science (R0)
