Abstract
We present a novel formulation of quasi-supervised learning that extends the paradigm to large datasets. Quasi-supervised learning computes, at each sample, the posterior probabilities of membership in overlapping datasets and labels the samples that are highly specific to their respective datasets. The proposed formulation partitions the data into sample groups so that the dataset posterior probabilities can be computed at a reduced computational complexity. In experiments on synthetic as well as real datasets, the proposed algorithm achieved a significant reduction in computation time at recognition performance comparable to the original algorithm, effectively generalizing the quasi-supervised learning paradigm to applications characterized by very large datasets.
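The core idea summarized above — estimating, at each sample, the posterior probability of dataset membership and labeling only the samples highly specific to their own dataset — can be illustrated with a minimal nearest-neighbor sketch. This is an illustrative approximation, not the paper's algorithm: the function name, the neighborhood size `k`, and the specificity threshold `tau` are assumptions, and the paper's group-partitioning speed-up is omitted.

```python
import numpy as np

def knn_dataset_posteriors(X0, X1, k=7):
    """Estimate P(dataset 1 | x) at each pooled sample from its k nearest neighbors."""
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(len(X0)), np.ones(len(X1))])
    # pairwise distances; each sample is excluded from its own neighborhood
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    # fraction of the k neighbors drawn from dataset 1
    return y[nn].mean(axis=1), y

# toy data: two partially overlapping Gaussian clouds
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(100, 2))
X1 = rng.normal(3.0, 1.0, size=(100, 2))
post1, y = knn_dataset_posteriors(X0, X1)

# label only samples that are highly specific to their own dataset
tau = 0.9
specific = ((y == 1) & (post1 >= tau)) | ((y == 0) & (post1 <= 1 - tau))
print(f"{int(specific.sum())} of {len(y)} samples labeled as dataset-specific")
```

Samples in the overlap region receive mixed-neighborhood posteriors and remain unlabeled, which is the behavior the abstract describes; the paper's contribution is computing these posteriors without the full pairwise-distance cost sketched here.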
Acknowledgments
This work was supported by the European Union Seventh Framework Programme Marie Curie Action grant PIRG03-GA-2008-230903. The MiniBooNE neutrino dataset was provided by Dr. Byron Roe, Emeritus Professor at the Department of Physics, University of Michigan at Ann Arbor.
Cite this article
Karaçalı, B. An efficient algorithm for large-scale quasi-supervised learning. Pattern Anal Applic 19, 311–323 (2016). https://doi.org/10.1007/s10044-014-0401-y