Abstract
Categorical music emotion classification, which divides emotion into classes, has reached a performance limit when it relies on audio features alone, because of a semantic gap between the object feature level and the human cognitive level of emotion perception. Motivated by the fact that lyrics carry rich semantic information about a song, we propose a multi-modal approach to improve categorical music emotion classification. By exploiting both the audio features and the lyrics of a song, the proposed approach improves 4-class emotion classification accuracy from 46.6% to 57.1%. The results also show that incorporating lyrics significantly enhances the classification accuracy of valence.
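The abstract does not say how the audio and lyrics modalities are combined, so the following is only a minimal sketch of one common option, late fusion by weighted averaging of per-class probabilities. The class labels, the function names `late_fusion` and `predict`, the `audio_weight` parameter, and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical labels for a 4-class emotion taxonomy (an assumption,
# not the class names used in the paper).
CLASSES: List[str] = ["class_1", "class_2", "class_3", "class_4"]


def late_fusion(
    audio_probs: Dict[str, float],
    lyrics_probs: Dict[str, float],
    audio_weight: float = 0.5,
) -> Dict[str, float]:
    """Combine two per-class probability distributions by weighted average.

    audio_weight is the weight given to the audio-only model; the
    lyrics-only model receives 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    lyrics_weight = 1.0 - audio_weight
    return {
        c: audio_weight * audio_probs[c] + lyrics_weight * lyrics_probs[c]
        for c in CLASSES
    }


def predict(probs: Dict[str, float]) -> str:
    """Return the class with the highest probability."""
    return max(probs, key=probs.get)


# Made-up outputs of an audio-only and a lyrics-only classifier for one song.
audio = {"class_1": 0.40, "class_2": 0.35, "class_3": 0.15, "class_4": 0.10}
lyrics = {"class_1": 0.10, "class_2": 0.60, "class_3": 0.20, "class_4": 0.10}

fused = late_fusion(audio, lyrics)
label = predict(fused)
```

In this made-up example the audio-only model narrowly prefers `class_1`, while the fused distribution prefers `class_2` because the lyrics-only model is confident about it. An alternative design, early fusion, would instead concatenate audio and text feature vectors before training a single classifier.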
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, YH., Lin, YC., Cheng, HT., Liao, IB., Ho, YC., Chen, H.H. (2008). Toward Multi-modal Music Emotion Classification. In: Huang, YM.R., et al. Advances in Multimedia Information Processing - PCM 2008. PCM 2008. Lecture Notes in Computer Science, vol 5353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89796-5_8
DOI: https://doi.org/10.1007/978-3-540-89796-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89795-8
Online ISBN: 978-3-540-89796-5
eBook Packages: Computer Science, Computer Science (R0)