Exploring current research trends in sound event detection: a systematic literature review

Multimedia Tools and Applications

Abstract

Sound Event Detection (SED) plays a significant role in current research and is applied in several areas, such as Computer Science, Healthcare, Environmental Science, and Security and Surveillance. With advances in technology, SED can be deployed to mimic the human auditory system. In this paper, we present a Systematic Literature Review focused on sound event detection, with a comprehensive, well-structured analysis and in-depth discussion. The review draws on the authors' knowledge and expertise in the field and compares various algorithms employed for sound event detection. The primary objective of this study is to offer insights into the datasets, feature extraction techniques, and execution models commonly used in SED, along with an examination of their corresponding accuracy, challenges, and limitations. The paper also identifies potential trends within the field, offering forward-looking information for future research and development in sound event detection. By synthesizing existing knowledge and identifying emerging directions, this systematic review aims to contribute to the continued advancement of SED technologies and applications. It provides a foundation for researchers, practitioners, and stakeholders to make informed decisions and explore new possibilities within this evolving domain.

Data availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.

Funding

Not applicable.

Author information

Contributions

All authors contributed equally and approved the final manuscript.

Corresponding author

Correspondence to Sallauddin Mohmmad.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mohmmad, S., Sanampudi, S.K. Exploring current research trends in sound event detection: a systematic literature review. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18740-9

  • DOI: https://doi.org/10.1007/s11042-024-18740-9
