
An efficient algorithm for large-scale quasi-supervised learning

  • Theoretical Advances
  • Published in: Pattern Analysis and Applications

Abstract

We present a novel formulation of quasi-supervised learning that extends the paradigm to large datasets. Quasi-supervised learning computes, at each sample, the posterior probabilities of the overlapping datasets and labels the samples that are highly specific to their respective datasets. The proposed formulation partitions the data into sample groups so that the dataset posterior probabilities can be computed at reduced computational complexity. In experiments on synthetic as well as real datasets, the proposed algorithm achieved significant reductions in computation time while maintaining recognition performance comparable to the original algorithm, effectively generalizing the quasi-supervised learning paradigm to applications characterized by very large datasets.
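To make the idea concrete, the following is a minimal illustrative sketch, not the paper's actual formulation: it estimates the posterior of one dataset at each sample as the fraction of its nearest neighbors drawn from that dataset, and approximates the full computation by first partitioning the pooled data into groups (here with a few crude k-means iterations, standing in for whatever partitioning scheme the paper uses) and computing neighbors within each group only. All function names and parameters here are hypothetical.

```python
import numpy as np

def knn_posterior(X0, X1, k=5):
    """Estimate P(C1 | x) at every pooled sample as the fraction of its k
    nearest neighbors drawn from X1 -- a simple nearest-neighbor posterior
    in the spirit of quasi-supervised learning.  Cost is O(n^2) distances."""
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(len(X0)), np.ones(len(X1))])
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # exclude each sample itself
    nn = np.argsort(d, axis=1)[:, :k]        # indices of k nearest neighbors
    return y[nn].mean(axis=1), y

def grouped_knn_posterior(X0, X1, k=5, n_groups=4, seed=0):
    """Partition the pooled data into groups (crude k-means), then estimate
    the posterior within each group only.  The pairwise-distance work drops
    from one n x n block to several smaller blocks, roughly n^2 / n_groups."""
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(len(X0)), np.ones(len(X1))])
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_groups, replace=False)]
    for _ in range(10):                      # a few k-means iterations
        g = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2),
                      axis=1)
        for j in range(n_groups):
            if np.any(g == j):
                centers[j] = X[g == j].mean(axis=0)
    post = np.empty(len(X))
    for j in range(n_groups):                # posteriors within each group
        idx = np.flatnonzero(g == j)
        if len(idx) == 0:
            continue
        d = np.linalg.norm(X[idx, None] - X[None, idx], axis=2)
        np.fill_diagonal(d, np.inf)
        kk = max(min(k, len(idx) - 1), 1)
        nn = np.argsort(d, axis=1)[:, :kk]
        post[idx] = y[idx][nn].mean(axis=1)
    return post, y
```

With well-separated datasets, samples deep inside either dataset receive posteriors near 0 or 1, and the grouped variant agrees closely with the full computation while touching far fewer distance pairs; how accuracy degrades when groups cut across the overlap region is exactly the trade-off a real partitioning scheme must manage.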

(Figures 1–8 appear in the full text of the article.)



Acknowledgments

This work was supported by the European Union Seventh Framework Programme Marie Curie Action grant PIRG03-GA-2008-230903. The MiniBooNE neutrino dataset was provided by Dr. Byron Roe, Emeritus Professor at the Department of Physics, University of Michigan, Ann Arbor.

Author information

Correspondence to Bilge Karaçalı.

Electronic supplementary material

Supplementary material 1 (PDF 83 KB)


Cite this article

Karaçalı, B. An efficient algorithm for large-scale quasi-supervised learning. Pattern Anal Applic 19, 311–323 (2016). https://doi.org/10.1007/s10044-014-0401-y

