Support vector machine active learning for music retrieval

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Searching and organizing growing digital music collections requires a computational model of music similarity. This paper describes a system for performing flexible music similarity queries using SVM active learning. We evaluated the success of our system by classifying 1210 pop songs according to mood and style (from an online music guide) and by the performing artist. In comparing a number of representations for songs, we found the statistics of mel-frequency cepstral coefficients to perform best in precision-at-20 comparisons. We also show that by choosing training examples intelligently, active learning requires half as many labeled examples to achieve the same accuracy as a standard scheme.
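
As a concrete illustration of the active-learning loop the abstract describes, the sketch below implements pool-based SVM active learning with the common "simple margin" selection rule: at each round of relevance feedback, the unlabeled songs whose feature vectors lie closest to the current SVM decision boundary are queried for labels. This is a minimal sketch, not the authors' code; it assumes scikit-learn and NumPy, and the random feature matrix, the hidden relevance labels, and all variable names are illustrative stand-ins for the per-song MFCC statistics and human judgments used in the paper.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Toy stand-in for per-song features, e.g. statistics (means, covariances)
    # of MFCCs; 1210 matches the number of pop songs in the paper's collection.
    n_songs, n_dims = 1210, 40
    X = rng.normal(size=(n_songs, n_dims))
    # Hidden "relevant vs. not relevant" concept playing the role of the user.
    relevant = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    # Seed with a few labeled songs of each class, as a user might mark a
    # handful of relevant and irrelevant examples to start a query.
    pos = np.flatnonzero(relevant == 1)
    neg = np.flatnonzero(relevant == 0)
    labeled = list(rng.choice(pos, 3, replace=False)) + list(rng.choice(neg, 3, replace=False))
    unlabeled = [i for i in range(n_songs) if i not in set(labeled)]

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")

    for _ in range(10):                      # 10 rounds of relevance feedback
        clf.fit(X[labeled], relevant[labeled])

        # "Simple margin" selection: query the unlabeled songs closest to the
        # current decision boundary, i.e. smallest |decision_function|.
        margins = np.abs(clf.decision_function(X[unlabeled]))
        queries = [unlabeled[i] for i in np.argsort(margins)[:5]]

        # The oracle (here, the hidden concept; in practice, the user)
        # labels the queried songs.
        labeled.extend(queries)
        unlabeled = [i for i in unlabeled if i not in set(queries)]

    # Retrieve: rank the remaining pool by decision value and report the
    # fraction of relevant songs among the top 20 (precision-at-20).
    scores = clf.decision_function(X[unlabeled])
    top20 = [unlabeled[i] for i in np.argsort(scores)[::-1][:20]]
    print("precision at 20:", relevant[top20].mean())

Querying the points nearest the boundary is what lets active learning reach a given accuracy with roughly half as many labeled examples as labeling randomly chosen songs, as reported in the abstract.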

Author information

Corresponding author

Correspondence to Michael I. Mandel.

Additional information

Michael Mandel is a PhD candidate at Columbia University. He received his BS degree in Computer Science from the Massachusetts Institute of Technology in 2004 and his MS degree in Electrical Engineering from Columbia University in 2006. In addition to music recommendation and music similarity, he is interested in computational models of sound and hearing, and in machine learning.

Graham Poliner received his BS degree in Electrical Engineering from the Georgia Institute of Technology in 2002 and his MS degree in Electrical Engineering from Columbia University in 2004 where he is currently a PhD candidate. His research interests include the application of signal processing and machine learning techniques toward music information retrieval.

Daniel Ellis is an associate professor in the Electrical Engineering Department at Columbia University in the City of New York. His Laboratory for Recognition and Organization of Speech and Audio (LabROSA) is concerned with all aspects of extracting high-level information from audio, including speech recognition, music description, and environmental sound processing. Ellis has a PhD in Electrical Engineering from MIT, where he was a research assistant at the Media Lab, and he spent several years as a research scientist at the International Computer Science Institute in Berkeley, CA. He also runs the AUDITORY email list of 1700 worldwide researchers in perception and cognition of sound.

About this article

Cite this article

Mandel, M.I., Poliner, G.E. & Ellis, D.P.W. Support vector machine active learning for music retrieval. Multimedia Systems 12, 3–13 (2006). https://doi.org/10.1007/s00530-006-0032-2
