
Using Weighted Nearest Neighbor to Benefit from Unlabeled Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where often the unlabeled examples greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unlabeled data. It applies a weighted nearest neighbor classification algorithm that uses the combined example sets as its knowledge base. The examples from the unlabeled set are “pre-labeled” by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier.
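
In code, the two-stage scheme reads roughly as follows. The sketch below is not the authors' implementation; the choice of base classifier (logistic regression), the value of k, the Euclidean distance metric, and the down-weight of 0.2 for pre-labeled examples are all illustrative assumptions, since the abstract does not fix these choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_knn_predict(X_store, y_store, w_store, X_query, k=5):
    """Weighted k-NN: each of the k nearest stored examples votes
    for its label with its own weight; the heaviest label wins."""
    preds = []
    for x in X_query:
        dists = np.linalg.norm(X_store - x, axis=1)  # Euclidean distance (assumed)
        neighbors = np.argsort(dists)[:k]
        votes = {}
        for i in neighbors:
            votes[y_store[i]] = votes.get(y_store[i], 0.0) + w_store[i]
        preds.append(max(votes, key=votes.get))
    return np.array(preds)

def two_stage_predict(X_lab, y_lab, X_unlab, X_test, k=5, unlab_weight=0.2):
    # Stage 1: an initial classifier built from the limited labeled data
    # "pre-labels" the unlabeled pool (logistic regression is an assumption).
    base = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    y_pre = base.predict(X_unlab)

    # Stage 2: weighted nearest neighbor over the combined example sets,
    # with truly labeled examples at weight 1.0 and the noisier
    # pre-labeled examples down-weighted (0.2 here is illustrative).
    X_store = np.vstack([X_lab, X_unlab])
    y_store = np.concatenate([y_lab, y_pre])
    w_store = np.concatenate([np.ones(len(y_lab)),
                              np.full(len(y_pre), unlab_weight)])
    return weighted_knn_predict(X_store, y_store, w_store, X_test, k=k)
```

Setting unlab_weight to 0 recovers a purely supervised k-NN on the labeled data, while 1 treats the pre-labels as ground truth (naive self-training); the regime the abstract points to lies in between.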

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Driessens, K., Reutemann, P., Pfahringer, B., Leschi, C. (2006). Using Weighted Nearest Neighbor to Benefit from Unlabeled Data. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science (LNAI), vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_10

  • DOI: https://doi.org/10.1007/11731139_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33206-0

  • Online ISBN: 978-3-540-33207-7

  • eBook Packages: Computer Science, Computer Science (R0)
