A Novel Zero-Resource Spoken Term Detection Using Affinity Kernel Propagation with Acoustic Feature Map

Sudhakar, P.; Rao, K. Sreenivasa; Mitra, Pabitra

doi:10.1007/s42979-023-01754-9

Original Research
Published: 06 April 2023

SN Computer Science, Volume 4, article number 310 (2023)

Abstract

Spoken term detection (STD) without linguistic cues is a challenging retrieval task. Despite numerous studies addressing these challenges, there is still room for improvement. Dynamic time warping (DTW) based techniques have been widely employed to perform STD in the absence of linguistic resources. A drawback of this approach is its difficulty in handling the speaker, language, acoustic and spoken-query variability present in natural speech. Our approach introduces a novel acoustic feature representation combined with affinity kernel propagation to overcome these challenges. First, a Self-Organising Map (SOM) based feature vector representation was employed to address speaker variability. Next, affinity kernel propagation was introduced to capture the best alignment between the spoken query and the utterances during similarity matching, without constraining the nature of the query. Combining the acoustic feature map with similarity matching through affinity kernel propagation yielded a 6% gain in Maximum Term Weighted Value (MTWV) and a 5% reduction in the cross-entropy cost in evaluations on the multilingual QUESST-14 speech corpus.
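
The SOM-based representation described in the abstract can be illustrated with a small sketch. The abstract does not give the map size, the training schedule, or how SOM outputs become feature vectors, so all of these are assumptions: a rectangular map trained with the classic online Kohonen update (Gaussian neighbourhood, linearly decaying learning rate and radius), with each frame (for example an MFCC vector) re-represented as a softmax over negative squared distances to the map nodes. The names `train_som`, `best_matching_unit` and `som_soft_assignment` are hypothetical, and this is not the authors' implementation.

```python
import math
import random

# Hypothetical sketch of an online Kohonen SOM (assumed design, not the authors' code).


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(frames, rows=4, cols=4, epochs=10, lr0=0.5, seed=0):
    """Online SOM training with a Gaussian neighbourhood that shrinks over time."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise every node with a copy of a randomly chosen training frame.
    weights = [[list(rng.choice(frames)) for _ in range(cols)] for _ in range(rows)]
    total, step = epochs * len(frames), 0
    for _ in range(epochs):
        order = list(frames)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            bi, bj = best_matching_unit(weights, x)
            for i in range(rows):
                for j in range(cols):
                    h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * sigma ** 2))
                    w = weights[i][j]
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])  # pull node towards the frame
            step += 1
    return weights


def som_soft_assignment(weights, frames, temperature=1.0):
    """Map each frame to a probability vector over SOM nodes (softmax of -squared distance)."""
    nodes = [w for row in weights for w in row]
    feats = []
    for x in frames:
        logits = [-sum((a - b) ** 2 for a, b in zip(w, x)) / temperature for w in nodes]
        top = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        feats.append([e / total for e in exps])
    return feats


# Toy usage: two well-separated 2-D clusters stand in for acoustic frames.
rng = random.Random(1)
frames = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(50)]
frames += [[rng.gauss(3, 0.1), rng.gauss(3, 0.1)] for _ in range(50)]
som = train_som(frames, rows=3, cols=3)
feats = som_soft_assignment(som, frames[:2])
```

Because neighbouring nodes of a trained map respond to similar inputs, frames are described by where they land on the map rather than by their raw values; whether this matches the exact representation used by the authors cannot be determined from the abstract.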

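For comparison with the DTW-based techniques the abstract mentions, the sketch below runs subsequence DTW over a frame-level cosine affinity matrix between a query and an utterance: the match may begin and end at any utterance frame, and the local cost is one minus the cosine similarity. This is the conventional DTW baseline, not the authors' affinity kernel propagation, whose propagation rule is not given in the abstract; the function names and the normalisation by query length are assumptions made for illustration.

```python
import math

# Hypothetical baseline sketch: subsequence DTW on a cosine affinity matrix.


def cosine_affinity(query, utterance):
    """Cosine similarity between every query frame and every utterance frame."""
    def unit(v):
        norm = math.sqrt(sum(a * a for a in v)) or 1.0  # guard against all-zero frames
        return [a / norm for a in v]
    qs = [unit(q) for q in query]
    us = [unit(u) for u in utterance]
    return [[sum(a * b for a, b in zip(q, u)) for u in us] for q in qs]


def subsequence_dtw(query, utterance):
    """Return (normalised cost, start frame, end frame) of the best query match."""
    aff = cosine_affinity(query, utterance)
    n, m = len(query), len(utterance)
    cost = [[0.0] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        cost[0][j] = 1.0 - aff[0][j]  # free start: the match may begin at any frame
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            # Predecessors: vertical, diagonal and horizontal steps.
            options = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                options.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
                options.append((cost[i][j - 1], start[i][j - 1]))
            best, origin = min(options)
            cost[i][j] = (1.0 - aff[i][j]) + best
            start[i][j] = origin
    end = min(range(m), key=lambda j: cost[n - 1][j])  # free end
    return cost[n - 1][end] / n, start[n - 1][end], end


query = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
utterance = [[-1, 0, 0], [-1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, 0, 0]]
score, begin, finish = subsequence_dtw(query, utterance)
```

With the query embedded at frames 2 to 4 of the toy utterance, the best path starts at frame 2, ends at frame 4 and has zero cost. A fixed cost matrix like this one is exactly what the abstract argues struggles with speaker and query variability.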

Data availability

The dataset and evaluation scripts used in this study are available at https://speech.fit.vutbr.cz/software.
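
The MTWV reported in the abstract is the term-weighted value maximised over the decision threshold. A minimal sketch, assuming a score and a ground-truth label are available for every (term, trial) pair, where a trial is for example a (query, utterance) pair: TWV(θ) = 1 - mean over terms of [P_miss(θ) + β·P_FA(θ)], skipping terms with no true occurrences. The default β = 999.9 is the value from the NIST 2006 STD evaluation; the QUESST-14 scoring scripts at the address above may use different cost parameters, so `beta` is an assumption to check against them. The function names are hypothetical and this is not the official scorer.

```python
# Hypothetical sketch of TWV / MTWV; not the official QUESST-14 scorer.


def twv(trials_by_term, threshold, beta=999.9):
    """Term-weighted value at one threshold.

    trials_by_term maps each term to a list of (score, is_target) pairs.
    A trial counts as a detection when score >= threshold.
    """
    penalty, n_terms = 0.0, 0
    for trials in trials_by_term.values():
        n_true = sum(1 for _, is_target in trials if is_target)
        n_false = len(trials) - n_true
        if n_true == 0:
            continue  # P_miss is undefined for terms that never occur
        hits = sum(1 for s, t in trials if t and s >= threshold)
        false_alarms = sum(1 for s, t in trials if not t and s >= threshold)
        p_miss = 1.0 - hits / n_true
        p_fa = false_alarms / n_false if n_false else 0.0
        penalty += p_miss + beta * p_fa
        n_terms += 1
    return 1.0 - penalty / n_terms if n_terms else 0.0


def mtwv(trials_by_term, beta=999.9):
    """Maximum TWV over all observed scores, plus +inf (no detections at all)."""
    scores = {s for trials in trials_by_term.values() for s, _ in trials}
    candidates = sorted(scores) + [float("inf")]
    best = max(candidates, key=lambda th: twv(trials_by_term, th, beta))
    return twv(trials_by_term, best, beta), best


trials = {
    "term_a": [(0.9, True), (0.2, False), (0.1, False)],
    "term_b": [(0.8, True), (0.3, False)],
}
value, threshold = mtwv(trials)
```

In the toy data the threshold 0.8 accepts both true detections and no false alarms, so the MTWV is 1.0; a system that detects nothing scores 0.0.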

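The cross-entropy cost in the abstract is presumably the normalised cross-entropy cost C_nxe used as the primary measure in QUESST-style evaluations; that reading is an assumption. A minimal sketch, assuming each trial score is a calibrated log-likelihood ratio (LLR): the prior-weighted empirical cross-entropy of the resulting posteriors is divided by the cost of a system that always outputs LLR = 0, so 1.0 means no better than the prior and lower is better. The prior P_target is left as a parameter because the abstract does not state it, and the function names are hypothetical.

```python
import math

# Hypothetical sketch of the normalised cross-entropy cost (C_nxe).


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def normalized_cross_entropy(target_llrs, nontarget_llrs, p_target):
    """C_nxe for LLR scores; requires 0 < p_target < 1 and non-empty score lists."""
    prior_logit = math.log(p_target / (1.0 - p_target))
    # -ln P(target | score) for targets, -ln P(non-target | score) for non-targets.
    c_tar = sum(softplus(-(s + prior_logit)) for s in target_llrs) / len(target_llrs)
    c_non = sum(softplus(s + prior_logit) for s in nontarget_llrs) / len(nontarget_llrs)
    c_xe = (p_target * c_tar + (1.0 - p_target) * c_non) / math.log(2)  # in bits
    # Cost of a trivial system that always outputs LLR = 0 (binary entropy of the prior).
    c_prior = -(p_target * math.log2(p_target) + (1.0 - p_target) * math.log2(1.0 - p_target))
    return c_xe / c_prior


uninformative = normalized_cross_entropy([0.0, 0.0], [0.0, 0.0, 0.0], p_target=0.1)
confident = normalized_cross_entropy([20.0], [-20.0], p_target=0.5)
```

An all-zero LLR system scores 1.0 by construction, while well-separated, correctly signed scores drive C_nxe towards 0.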

Funding

Not applicable.

Author information

Authors and Affiliations

Advanced Technology Development Centre, Indian Institute of Technology, Kharagpur, West Bengal, 721302, India
P. Sudhakar
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal, 721302, India
K. Sreenivasa Rao & Pabitra Mitra

Contributions

All authors contributed to the conceptualization, methodology, implementation and writing of the article.

Corresponding author

Correspondence to P. Sudhakar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Code availability

The implementation is available at https://github.com/sudhakar-pandiarajan/KWS.

Ethics approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sudhakar, P., Rao, K.S. & Mitra, P. A Novel Zero-Resource Spoken Term Detection Using Affinity Kernel Propagation with Acoustic Feature Map. SN COMPUT. SCI. 4, 310 (2023). https://doi.org/10.1007/s42979-023-01754-9

Received: 26 July 2022
Accepted: 25 February 2023
Published: 06 April 2023
DOI: https://doi.org/10.1007/s42979-023-01754-9
