Support vector machine active learning for music retrieval

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Searching and organizing growing digital music collections requires a computational model of music similarity. This paper describes a system for performing flexible music similarity queries using SVM active learning. We evaluated the success of our system by classifying 1210 pop songs according to mood and style (from an online music guide) and by the performing artist. In comparing a number of representations for songs, we found the statistics of mel-frequency cepstral coefficients to perform best in precision-at-20 comparisons. We also show that by choosing training examples intelligently, active learning requires half as many labeled examples to achieve the same accuracy as a standard scheme.
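
As a concrete illustration of the active-learning loop the abstract describes, the sketch below implements pool-based SVM active learning with the common "simple margin" selection rule: at each round of relevance feedback, the unlabeled songs whose feature vectors lie closest to the current SVM decision boundary are queried for labels. This is a minimal sketch, not the authors' code; it assumes scikit-learn and NumPy, and the random feature matrix, the hidden relevance labels, and all variable names are illustrative stand-ins for the per-song MFCC statistics and human judgments used in the paper.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Toy stand-in for per-song features, e.g. statistics (means, covariances)
    # of MFCCs; 1210 matches the number of pop songs in the paper's collection.
    n_songs, n_dims = 1210, 40
    X = rng.normal(size=(n_songs, n_dims))
    # Hidden "relevant vs. not relevant" concept playing the role of the user.
    relevant = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    # Seed with a few labeled songs of each class, as a user might mark a
    # handful of relevant and irrelevant examples to start a query.
    pos = np.flatnonzero(relevant == 1)
    neg = np.flatnonzero(relevant == 0)
    labeled = list(rng.choice(pos, 3, replace=False)) + list(rng.choice(neg, 3, replace=False))
    unlabeled = [i for i in range(n_songs) if i not in set(labeled)]

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")

    for _ in range(10):                      # 10 rounds of relevance feedback
        clf.fit(X[labeled], relevant[labeled])

        # "Simple margin" selection: query the unlabeled songs closest to the
        # current decision boundary, i.e. smallest |decision_function|.
        margins = np.abs(clf.decision_function(X[unlabeled]))
        queries = [unlabeled[i] for i in np.argsort(margins)[:5]]

        # The oracle (here, the hidden concept; in practice, the user)
        # labels the queried songs.
        labeled.extend(queries)
        unlabeled = [i for i in unlabeled if i not in set(queries)]

    # Retrieve: rank the remaining pool by decision value and report the
    # fraction of relevant songs among the top 20 (precision-at-20).
    scores = clf.decision_function(X[unlabeled])
    top20 = [unlabeled[i] for i in np.argsort(scores)[::-1][:20]]
    print("precision at 20:", relevant[top20].mean())

Querying the points nearest the boundary is what lets active learning reach a given accuracy with roughly half as many labeled examples as labeling randomly chosen songs, as reported in the abstract.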

Author information

Corresponding author

Correspondence to Michael I. Mandel.

Additional information

Michael Mandel is a PhD candidate at Columbia University. He received his BS degree in Computer Science from the Massachusetts Institute of Technology in 2004 and his MS degree in Electrical Engineering from Columbia University in 2006. In addition to music recommendation and music similarity, he is interested in computational models of sound and hearing, and in machine learning.

Graham Poliner received his BS degree in Electrical Engineering from the Georgia Institute of Technology in 2002 and his MS degree in Electrical Engineering from Columbia University in 2004 where he is currently a PhD candidate. His research interests include the application of signal processing and machine learning techniques toward music information retrieval.

Daniel Ellis is an associate professor in the Electrical Engineering Department at Columbia University in the City of New York. His Laboratory for Recognition and Organization of Speech and Audio (LabROSA) is concerned with all aspects of extracting high-level information from audio, including speech recognition, music description, and environmental sound processing. Ellis has a PhD in Electrical Engineering from MIT, where he was a research assistant at the Media Lab, and he spent several years as a research scientist at the International Computer Science Institute in Berkeley, CA. He also runs the AUDITORY email list of 1700 worldwide researchers in perception and cognition of sound.

About this article

Cite this article

Mandel, M.I., Poliner, G.E. & Ellis, D.P.W. Support vector machine active learning for music retrieval. Multimedia Systems 12, 3–13 (2006). https://doi.org/10.1007/s00530-006-0032-2
