Ensemble audio segmentation for radio and television programmes

Lopez-Otero, Paula; Docio-Fernandez, Laura; Garcia-Mateo, Carmen

doi:10.1007/s11042-016-3386-2

Published: 09 March 2016

Volume 76, pages 7421–7444, (2017)

Multimedia Tools and Applications

Paula Lopez-Otero ORCID: orcid.org/0000-0003-2859-099X¹,
Laura Docio-Fernandez¹ &
Carmen Garcia-Mateo¹

Abstract

State-of-the-art audio segmentation strategies obtain good results on simple tasks, but their performance degrades when segmenting real-world material such as radio and television programmes; this issue can be partially solved by fusing different audio segmentation strategies. Hence, this paper presents a framework for decision-level fusion in the audio segmentation task. First, the class-conditional probabilities of each audio segmentation strategy are estimated from a confusion matrix obtained by performing audio segmentation on a training dataset. Performance measures are then extracted from these class-conditional probabilities and used to compute different estimates of each classifier’s reliability; specifically, reliability estimates based on precision, recall, accuracy, F-score and mutual information are proposed. These reliability estimates are used as weights in a weighted majority voting fusion strategy. The validity of the proposed fusion scheme and reliability estimates was assessed in the framework of the Albayzin 2010, 2012 and 2014 audio segmentation evaluations, which consisted of segmenting collections of radio and television programmes. The experimental results showed that this simple fusion strategy improves on the performance achieved by the individual audio segmentation strategies and by other well-known decision-level fusion strategies.
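The pipeline summarised in the abstract (confusion matrix, then a reliability estimate, then weighted majority voting) can be illustrated with a minimal sketch. The abstract does not give the exact weight definitions, so the code below assumes one simple choice: each classifier's vote is weighted by its precision for the class it outputs, estimated from its training confusion matrix. The function names, the two-class setup and the confusion matrices are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def per_class_precision(confusion):
    """Estimate per-class precision from a confusion matrix.

    confusion[i][j] is the number of training frames whose true class is i
    and which the classifier labelled as class j (assumed layout).
    """
    n = len(confusion)
    precisions = []
    for j in range(n):
        predicted_as_j = sum(confusion[i][j] for i in range(n))
        precisions.append(confusion[j][j] / predicted_as_j if predicted_as_j else 0.0)
    return precisions


def weighted_majority_vote(decisions, weights):
    """Fuse one decision per classifier.

    decisions[k] is the class index output by classifier k;
    weights[k][c] is the reliability of classifier k when it outputs class c.
    """
    scores = defaultdict(float)
    for k, label in enumerate(decisions):
        scores[label] += weights[k][label]
    # The class with the largest accumulated weight wins.
    return max(scores, key=scores.get)


# Hypothetical training confusion matrices for three two-class segmenters.
confusions = [
    [[98, 2], [5, 95]],    # an accurate segmenter
    [[50, 50], [50, 50]],  # a segmenter at chance level
    [[45, 55], [55, 45]],  # a segmenter slightly below chance
]
weights = [per_class_precision(c) for c in confusions]

# The accurate segmenter outputs class 1; the two weak ones output class 0.
fused = weighted_majority_vote([1, 0, 0], weights)
print(fused)
```

With these illustrative numbers the single reliable segmenter outweighs the two unreliable ones, whereas an unweighted majority vote would pick class 0. Other reliability estimates named in the abstract (recall, accuracy, F-score, mutual information) could be substituted for `per_class_precision` without changing the voting step.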

Acknowledgments

This work has been supported by the European Regional Development Fund, the Galician Regional Government (GRC2014/024, ‘Consolidation of Research Units: AtlantTIC Project’ CN2012/160) and the Spanish Government (‘SpeechTech4All Project’ TEC2012-38939-C03-01).

Author information

Authors and Affiliations

AtlantTIC Research Center, Multimedia Technologies Group, University of Vigo, E.E. Telecomunicación, Campus Universitario de Vigo, S/N, C.P. 36310, Vigo, Spain
Paula Lopez-Otero, Laura Docio-Fernandez & Carmen Garcia-Mateo

Corresponding author

Correspondence to Paula Lopez-Otero.

About this article

Cite this article

Lopez-Otero, P., Docio-Fernandez, L. & Garcia-Mateo, C. Ensemble audio segmentation for radio and television programmes. Multimed Tools Appl 76, 7421–7444 (2017). https://doi.org/10.1007/s11042-016-3386-2

Received: 07 May 2015
Revised: 03 January 2016
Accepted: 23 February 2016
Published: 09 March 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s11042-016-3386-2
