Toward Multi-modal Music Emotion Classification

Yang, Yi-Hsuan; Lin, Yu-Ching; Cheng, Heng-Tze; Liao, I-Bin; Ho, Yeh-Chin; Chen, Homer H.

doi:10.1007/978-3-540-89796-5_8

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5353)

Abstract

Categorical music emotion classification, which divides emotion into classes, has reached a performance limit when it relies on audio features alone, because of a semantic gap between the object feature level and the human cognitive level of emotion perception. Motivated by the fact that lyrics carry rich semantic information about a song, we propose a multi-modal approach to improve categorical music emotion classification. By exploiting both the audio features and the lyrics of a song, the proposed approach improves the 4-class emotion classification accuracy from 46.6% to 57.1%. The results also show that incorporating lyrics significantly enhances the classification accuracy of valence.
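The abstract does not say how the audio and lyrics modalities are combined, so the following is only a minimal sketch of one common option, late fusion, in which two separately trained classifiers each emit per-class probabilities that are then averaged. The class labels, the `audio_weight` parameter, and the example probabilities are all hypothetical and are not taken from the paper. The helper `accuracy_gain` simply works out the arithmetic behind the reported accuracies of 46.6% and 57.1%.

```python
# Hypothetical late-fusion sketch: combine per-class probabilities from an
# audio-based model and a lyrics-based model, then pick the most probable class.
# Class labels, weights, and probabilities are illustrative assumptions only.

CLASSES = ["class_1", "class_2", "class_3", "class_4"]


def late_fusion(audio_probs, lyrics_probs, audio_weight=0.5):
    """Return (predicted_class, fused_probs) from two probability vectors."""
    if len(audio_probs) != len(CLASSES) or len(lyrics_probs) != len(CLASSES):
        raise ValueError("expected one probability per class")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    # Weighted average of the two modalities, class by class.
    fused = [
        audio_weight * a + (1.0 - audio_weight) * l
        for a, l in zip(audio_probs, lyrics_probs)
    ]
    best = max(range(len(CLASSES)), key=lambda i: fused[i])
    return CLASSES[best], fused


def accuracy_gain(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    return after - before, 100.0 * (after - before) / before


if __name__ == "__main__":
    audio = [0.40, 0.35, 0.15, 0.10]   # made-up audio-model output
    lyrics = [0.10, 0.60, 0.20, 0.10]  # made-up lyrics-model output
    label, fused = late_fusion(audio, lyrics)
    print(label, [round(p, 3) for p in fused])

    abs_gain, rel_gain = accuracy_gain(46.6, 57.1)
    print(round(abs_gain, 1), round(rel_gain, 1))
```

With the reported figures, the improvement is 10.5 percentage points, or roughly 22.5% relative to the audio-only baseline. In the made-up example, the audio model alone would favour the first class, while the fused scores favour the second, which illustrates how a second modality can change a decision.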

Author information

Authors and Affiliations

National Taiwan University, Taiwan
Yi-Hsuan Yang, Yu-Ching Lin, Heng-Tze Cheng & Homer H. Chen
Telecommunication Laboratories, Chunghwa Telecom, Taiwan
I-Bin Liao & Yeh-Chin Ho

Editor information

Editors and Affiliations

Department of Engineering Science, National Cheng Kung University, No.1, University Road, 701, Tainan City, Taiwan
Yueh-Min Ray Huang
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95, Zhongguancun East Road, 100190, Beijing, China
Changsheng Xu
Institute of Biomedical Engineering, National Cheng Kung University, No. 1, University Road, 701, Tainan City, Taiwan
Kuo-Sheng Cheng
Department of Electrical Engineering, National Cheng Kung University, No. 1, University Road, 701, Tainan City, Taiwan
Jar-Ferr Kevin Yang
Department of Electrical and Computer Engineering, Concordia University, S-EV005.139, 1515 St. Catherine West, Montreal, H4G 2W1, Quebec, Canada
M. N. S. Swamy
Microsoft Research Asia, 5/F, Beijing Sigma Center, No. 49, Zhichun Road, Hai Dian District, 100080, Beijing, China
Shipeng Li
Department of Information Management, National Kaohsiung University of Applied Sciences, No. 415, Jiangong Road, Sanmin District, 80778, Kaohsiung, Taiwan
Jen-Wen Ding

Cite this paper

Yang, YH., Lin, YC., Cheng, HT., Liao, IB., Ho, YC., Chen, H.H. (2008). Toward Multi-modal Music Emotion Classification. In: Huang, YM.R., et al. Advances in Multimedia Information Processing - PCM 2008. PCM 2008. Lecture Notes in Computer Science, vol 5353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89796-5_8

DOI: https://doi.org/10.1007/978-3-540-89796-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89795-8
Online ISBN: 978-3-540-89796-5
eBook Packages: Computer Science, Computer Science (R0)
