Scaling Large Learning Problems with Hard Parallel Mixtures

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2388)

Abstract

A challenge for statistical learning is to deal with large data sets, e.g. in data mining. Popular learning algorithms such as Support Vector Machines have training time at least quadratic in the number of examples, making them impractical for problems with a million examples. We propose a “hard parallelizable mixture” methodology that yields significantly reduced training time through modularization and parallelization: the training data is iteratively partitioned by a “gater” model in such a way that it becomes easy to learn an “expert” model separately in each region of the partition. A probabilistic extension, together with a set of generative models representing the gater, allows all pieces of the model to be trained locally. For SVM experts, training time appears empirically to grow linearly with the number of examples, while generalization performance can be enhanced. For the probabilistic version of the algorithm, the iterative procedure provably decreases a cost function that is an upper bound on the negative log-likelihood.
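
The training loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: it uses scikit-learn's SVC for the experts, a logistic regression as a stand-in for the gater (the paper's probabilistic version uses a set of generative models), k-means for the initial partition, and a simple best-fitting-expert reassignment rule; the function names train_hard_mixture and mixture_predict are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def train_hard_mixture(X, y, n_experts=4, n_iters=5, seed=0):
    """Hard mixture of SVM experts with a trained gater (sketch)."""
    # Initial hard partition of the inputs (assumption: k-means; the
    # abstract does not specify how the first split is made).
    assign = KMeans(n_clusters=n_experts, n_init=10,
                    random_state=seed).fit_predict(X)
    experts = [None] * n_experts
    gater = None
    for _ in range(n_iters):
        # 1) Train one expert per region. The fits are independent, so
        #    they can run in parallel -- the source of the speed-up.
        for k in range(n_experts):
            mask = assign == k
            if mask.any() and len(np.unique(y[mask])) > 1:
                experts[k] = SVC(kernel="rbf").fit(X[mask], y[mask])
        # 2) Hard reassignment: send each example to an expert that
        #    classifies it correctly (illustrative rule; the paper
        #    optimizes a global cost).
        correct = np.stack(
            [e.predict(X) == y if e is not None
             else np.zeros(len(y), bool) for e in experts], axis=1)
        # 3) Refit the gater to predict the assignments from x alone,
        #    and let it define the next partition of the training set.
        gater = LogisticRegression(max_iter=1000).fit(
            X, correct.argmax(axis=1))
        assign = gater.predict(X)
    return gater, experts

def mixture_predict(gater, experts, X):
    """Route each input through the gater to its expert."""
    region = gater.predict(X)
    out = np.zeros(len(X), dtype=int)
    for k, expert in enumerate(experts):
        m = region == k
        if m.any() and expert is not None:
            out[m] = expert.predict(X[m])
    return out
```

Step 1 is where the gain comes from: with K experts each seeing roughly n/K examples, a solver that is quadratic per expert costs on the order of K·(n/K)² = n²/K in total, and the expert fits can run concurrently across machines; the abstract reports empirically near-linear growth in practice.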

Part of this work was done while Ronan Collobert was at IDIAP, CP 592, rue du Simplon 4, 1920 Martigny, Switzerland.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Collobert, R., Bengio, Y., Bengio, S. (2002). Scaling Large Learning Problems with Hard Parallel Mixtures. In: Lee, SW., Verri, A. (eds) Pattern Recognition with Support Vector Machines. SVM 2002. Lecture Notes in Computer Science, vol 2388. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45665-1_2

  • DOI: https://doi.org/10.1007/3-540-45665-1_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44016-1

  • Online ISBN: 978-3-540-45665-0
