
Acoustic domain mismatch compensation in bird audio detection

Published in: International Journal of Speech Technology

Abstract

Detecting bird calls in audio is an important task for automatic wildlife monitoring, as well as for citizen science and audio library management. This paper presents front-end acoustic enhancement techniques to handle the acoustic domain mismatch problem in bird audio detection. First, a time-domain cross-condition data augmentation (TCDA) method is proposed to enhance the domain coverage of a fixed training dataset. Then, to suppress the distortion caused by stationary noise and to enhance transient events, we investigate per-channel energy normalization (PCEN), which automatically controls the gain of every subband of the mel-frequency spectrogram. Furthermore, harmonic-percussive source separation is investigated to extract robust percussive features (RPFs) of bird calls that alleviate the acoustic mismatch. Our experiments are performed on the Bird Audio Detection task of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018. Extensive results show that the proposed TCDA leads to a relative 5.02% AUC improvement under mismatched conditions. On the cross-domain test set, the proposed RPFs, and RPFs combined with PCEN, significantly improve the baseline with conventional log mel-spectrogram features from 81.79% AUC to 84.46% and 88.68%, respectively. Moreover, we find that combining different front-end features further improves system performance.
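Of the front-ends above, PCEN has a widely used standard formulation that can be sketched directly: a per-band first-order IIR filter tracks the smoothed energy, division by that smoothed energy acts as automatic gain control against stationary noise, and a root-compression step emphasizes transients. The minimal NumPy sketch below uses common default constants for illustration, not necessarily the configuration used in this paper.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization (illustrative sketch).

    E : nonnegative subband energies, shape (n_bands, n_frames).
    A first-order IIR filter M tracks the smoothed energy of each band;
    dividing by M**alpha cancels slowly varying (stationary) energy,
    and (x + delta)**r - delta**r applies root compression that makes
    short transient events, such as bird calls, stand out.
    """
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

# A constant (stationary) band is normalized to a small flat level,
# while a sudden energy burst remains prominent after normalization.
E = np.ones((1, 200))
E[0, 100] = 50.0  # a transient event in an otherwise stationary band
out = pcen(E)
```

Because the gain is controlled independently per subband, the same operation suppresses broadband stationary noise whose level differs across recording conditions, which is exactly the cross-domain robustness the abstract targets.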




Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 62071302).

Author information

Corresponding author

Correspondence to Yanhua Long.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Tang, T., Long, Y., Li, Y. et al. Acoustic domain mismatch compensation in bird audio detection. Int J Speech Technol 25, 251–260 (2022). https://doi.org/10.1007/s10772-022-09957-w

