
Bird Call Classification Using DNN-Based Acoustic Modelling

Circuits, Systems, and Signal Processing

Abstract

Bird call recognition using deep neural network-hidden Markov model (DNN-HMM)-based transcription is proposed. The work adapts the human speech recognition framework to bird call classification through a transcription approach. Initially, phone transcriptions are generated using CMU-Sphinx, and the lexicons are modified using group delay-based segmentation. Bird call transcription is then implemented in a hybrid DNN-HMM framework through DNN-based acoustic modelling. For the acoustic modelling, mel-frequency cepstral coefficients (MFCCs) are computed and used to train monophone and triphone models, followed by linear discriminant analysis (LDA) and maximum likelihood linear transform (MLLT). In the final phase, the transcribed phonemes are corrected using context-based rules. The proposed approach is evaluated on a dataset of 563 audio tracks covering ten species. The hybrid DNN-HMM approach, with an accuracy of 94.46%, outperforms the convolutional neural network and long short-term memory frameworks.
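
The abstract does not spell out how the DNN and the HMM interact, so the following is only a minimal sketch of the step that hybrid DNN-HMM systems commonly use: the DNN's per-frame state posteriors are divided by the state priors to give scaled likelihoods, and a Viterbi pass through the HMM then yields a state (phone) sequence. The function names, the two-state model, and all numbers are hypothetical illustrations, not the authors' configuration.

```python
import math

def scaled_log_likelihoods(posteriors, priors):
    """Turn DNN posteriors p(s|x) into scaled log-likelihoods.

    By Bayes' rule p(x|s) is proportional to p(s|x) / p(s), so in the
    log domain we subtract the log prior from the log posterior.
    """
    return [[math.log(p) - math.log(q) for p, q in zip(frame, priors)]
            for frame in posteriors]

def viterbi(log_obs, log_init, log_trans):
    """Return the best-scoring HMM state sequence for the given frames."""
    n_states = len(log_init)
    # delta[s]: best log score of any path ending in state s at this frame
    delta = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptrs = []
    for frame in log_obs[1:]:
        new_delta, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for landing in s
            prev = max(range(n_states), key=lambda r: delta[r] + log_trans[r][s])
            ptr.append(prev)
            new_delta.append(delta[prev] + log_trans[prev][s] + frame[s])
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy data: 4 frames, 2 states (say, two bird-call "phones").
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]  # assumed DNN outputs
priors = [0.6, 0.4]                                            # assumed state priors
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.7), math.log(0.3)],
             [math.log(0.3), math.log(0.7)]]

log_obs = scaled_log_likelihoods(posteriors, priors)
best_path = viterbi(log_obs, log_init, log_trans)
print(best_path)
```

In a full system the decoded state sequence would be mapped to phone labels through the lexicon, and that transcription is what the context-based rules mentioned in the abstract would then correct.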




Author information

Corresponding author

Correspondence to Rajeev Rajan.

Ethics declarations

Data Availability

The datasets generated and/or analysed during the current study are available in the Xeno-canto bird sound repository (www.xeno-canto.org).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rajan, R., Johnson, J. & Abdul Kareem, N. Bird Call Classification Using DNN-Based Acoustic Modelling. Circuits Syst Signal Process 41, 2669–2680 (2022). https://doi.org/10.1007/s00034-021-01896-2

  • DOI: https://doi.org/10.1007/s00034-021-01896-2
