
Ensemble audio segmentation for radio and television programmes

Multimedia Tools and Applications

Abstract

State-of-the-art audio segmentation strategies obtain good results on simple tasks, but their performance degrades when segmenting real-world material such as radio and television programmes; this issue can be partially addressed by fusing different audio segmentation strategies. Hence, this paper presents a framework to perform decision-level fusion in the audio segmentation task. First, the class-conditional probabilities of each audio segmentation strategy are estimated from a confusion matrix obtained by performing audio segmentation on a training dataset. Performance measures are then extracted from these class-conditional probabilities and used to compute different estimates of each classifier’s reliability; specifically, reliability estimates based on precision, recall, accuracy, F-score and mutual information are proposed. These reliability estimates are used as weights in a weighted majority voting fusion strategy. The validity of the proposed fusion scheme and reliability estimates was assessed in the framework of the Albayzin 2010, 2012 and 2014 audio segmentation evaluations, which consisted of segmenting collections of radio and television programmes. The experimental results showed that this simple fusion strategy improves on the performance achieved by the individual audio segmentation strategies and by other well-known decision-level fusion strategies.
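
The abstract describes the fusion pipeline without giving its formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each system outputs one class label per frame, that confusion matrices have true classes in rows and predicted classes in columns, and that per-class precision is the reliability estimate (one of the five options the abstract lists). The function names and the toy numbers are hypothetical.

```python
import numpy as np


def precision_per_class(confusion):
    """Per-class precision from a confusion matrix.

    Assumption: rows are true classes, columns are predicted classes.
    """
    confusion = np.asarray(confusion, dtype=float)
    predicted_totals = confusion.sum(axis=0)
    # Avoid division by zero for classes that were never predicted.
    safe_totals = np.where(predicted_totals > 0, predicted_totals, 1.0)
    return np.diag(confusion) / safe_totals


def weighted_majority_vote(frame_labels, weights):
    """Fuse the per-frame decisions of several systems.

    frame_labels: int array of shape (n_systems, n_frames).
    weights: float array of shape (n_systems, n_classes), where
        weights[s, c] is the reliability of system s when it outputs class c.
    Returns the fused label for each frame.
    """
    frame_labels = np.asarray(frame_labels)
    weights = np.asarray(weights, dtype=float)
    n_systems, n_frames = frame_labels.shape
    scores = np.zeros((n_frames, weights.shape[1]))
    frames = np.arange(n_frames)
    for s in range(n_systems):
        # Each system adds its reliability to the class it voted for.
        scores[frames, frame_labels[s]] += weights[s, frame_labels[s]]
    return scores.argmax(axis=1)


# Toy training-set confusion matrices for three hypothetical systems
# and two classes (0 and 1).
confusions = [
    [[95, 5], [5, 95]],    # reliable system
    [[50, 50], [50, 50]],  # chance-level system
    [[40, 60], [60, 40]],  # poor system
]
weights = np.stack([precision_per_class(c) for c in confusions])

# Decisions of the three systems on three frames.
frame_labels = [
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
]
fused = weighted_majority_vote(frame_labels, weights)
print(fused)  # [0 1 0]
```

In this toy case an unweighted majority vote would label the first frame as class 1, whereas the weighted vote follows the single reliable system because its weight (0.95) exceeds the combined weight of the other two (0.9).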

Acknowledgments

This work has been supported by the European Regional Development Fund, the Galician Regional Government (GRC2014/024, ‘Consolidation of Research Units: AtlantTIC Project’ CN2012/160) and the Spanish Government (‘SpeechTech4All Project’ TEC2012-38939-C03-01).

Author information

Corresponding author

Correspondence to Paula Lopez-Otero.

Cite this article

Lopez-Otero, P., Docio-Fernandez, L. & Garcia-Mateo, C. Ensemble audio segmentation for radio and television programmes. Multimed Tools Appl 76, 7421–7444 (2017). https://doi.org/10.1007/s11042-016-3386-2

  • DOI: https://doi.org/10.1007/s11042-016-3386-2
