Abstract
Sound Event Detection (SED) is an active research topic with applications in Computer Science, Healthcare, Environmental Science, and Security and Surveillance, among other areas. As technology advances, SED systems can be deployed to mimic the human auditory system. In this paper, we present a Systematic Literature Review of sound event detection, with a structured analysis and in-depth discussion of the literature. The review draws on the authors' knowledge of the field and compares the algorithms employed for sound event detection. Its primary objective is to offer insight into the datasets, feature extraction techniques, and models commonly used in SED, and to examine their reported accuracy, challenges, and limitations. The paper also identifies potential trends in the field that can inform future research and development. By synthesizing existing knowledge and identifying emerging directions, this review aims to support the continued advancement of SED technologies and applications and to give researchers, practitioners, and stakeholders a foundation for making informed decisions and exploring new possibilities in this evolving domain.
Data availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Funding
Not applicable.
Author information
Contributions
All authors equally contributed and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts.
About this article
Cite this article
Mohmmad, S., Sanampudi, S.K. Exploring current research trends in sound event detection: a systematic literature review. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18740-9
DOI: https://doi.org/10.1007/s11042-024-18740-9