Abstract
Music and songs are integral parts of Bollywood movies. Every movie, two to three hours long, contains three to ten songs, each 3–10 min long. Music lovers like to listen to the music and songs of a movie; however, manually searching for all the songs in a movie is time-consuming and error-prone. The task becomes much harder when songs are to be extracted from a large archive containing hundreds of movies. This paper presents an approach to automatically extract music and songs from archived musical movies. We used song grammar to construct a Markov chain model that differentiates song scenes from dialogue and action scenes in a movie. We tested our system on Bollywood, Hollywood, Pakistani, Bengali, and Tamil movies. A total of 20 movies from different industries were selected for the experiments. On Bollywood movies, we achieved 97.22% recall in song extraction, whereas the recall on Hollywood musical movies is 80%. On Pakistani, Tamil, and Bengali movies, the corresponding result is 87.09%.
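To make the idea of a song grammar driving a Markov chain more concrete, the following is a minimal sketch and not the paper's actual model. It assumes that a movie segment has already been turned into a sequence of structural labels. The state names (`intro`, `verse`, `chorus`, `bridge`, `outro`), the transition probabilities, the probability floor, and the decision threshold are all hypothetical values chosen for illustration. The sketch scores a label sequence under a first-order Markov chain and accepts it as song-like when its average log-probability is high enough.

```python
import math

# Hypothetical song-grammar transition probabilities (illustrative values only,
# not taken from the paper). Each row sums to 1, except the terminal "outro".
SONG_TRANSITIONS = {
    "start": {"intro": 0.8, "verse": 0.2},
    "intro": {"verse": 0.9, "chorus": 0.1},
    "verse": {"chorus": 0.7, "verse": 0.2, "bridge": 0.1},
    "chorus": {"verse": 0.5, "bridge": 0.2, "outro": 0.3},
    "bridge": {"chorus": 0.8, "verse": 0.2},
    "outro": {},
}


def sequence_log_likelihood(labels, transitions, floor=1e-6):
    """Log-probability of a label sequence under a first-order Markov chain.

    Transitions missing from the table get a small floor probability (an
    assumption of this sketch) so that unseen moves are penalised heavily
    instead of producing log(0).
    """
    log_p = 0.0
    previous = "start"
    for label in labels:
        p = transitions.get(previous, {}).get(label, floor)
        log_p += math.log(p)
        previous = label
    return log_p


def looks_like_song(labels, transitions=SONG_TRANSITIONS, min_avg_log_p=-2.0):
    """Return True when the average per-step log-probability clears the
    (hypothetical) threshold, i.e. the sequence follows the song grammar."""
    if not labels:
        return False
    avg = sequence_log_likelihood(labels, transitions) / len(labels)
    return avg >= min_avg_log_p


if __name__ == "__main__":
    song_like = ["intro", "verse", "chorus", "verse", "chorus", "outro"]
    not_song = ["dialogue", "dialogue", "action"]
    print(looks_like_song(song_like))
    print(looks_like_song(not_song))
```

Normalising by sequence length keeps long songs from being penalised merely for having more transitions. In a real system the transition table would be estimated from labelled data rather than set by hand.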
Additional information
The preliminary idea of this paper was published in the Proceedings of the International Conference on Database and Data Mining (ICDDM), July 2010, Manila, Philippines.
Cite this article
Doudpota, S.M. Mining movie archives for song sequences. Multimed Tools Appl 69, 359–382 (2014). https://doi.org/10.1007/s11042-012-1021-4
DOI: https://doi.org/10.1007/s11042-012-1021-4