Bird Call Classification Using DNN-Based Acoustic Modelling

Rajan, Rajeev; Johnson, Jisna; Abdul Kareem, Noumida

doi:10.1007/s00034-021-01896-2

Published: 03 January 2022

Volume 41, pages 2669–2680, (2022)

Circuits, Systems, and Signal Processing

Abstract

Bird call recognition using deep neural network-hidden Markov model (DNN-HMM)-based transcription is proposed. The work attempts to adapt the human speech recognition framework to bird call classification through a transcription approach. Initially, phone transcriptions are generated using CMU-Sphinx, and the lexicons are modified using group delay-based segmentation. Bird call transcription is then implemented in a hybrid DNN-HMM framework through DNN-based acoustic modelling. During acoustic modelling, mel-frequency cepstral coefficient (MFCC) features are computed and evaluated with monophone and triphone models, followed by linear discriminant analysis (LDA) and maximum likelihood linear transform (MLLT). In the final phase, the transcribed phonemes are corrected using context-based rules. The proposed approach is evaluated on a dataset of 563 audio tracks from ten species. The hybrid DNN-HMM approach achieves an accuracy of 94.46%, outperforming the convolutional neural network and long short-term memory frameworks.
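
As an illustration of the MFCC front end mentioned in the abstract, the following is a minimal, dependency-free sketch of the textbook MFCC computation (framing, Hamming window, power spectrum, mel filterbank, log, DCT). It is not the authors' implementation: the function names, the 25 ms frame and 10 ms hop, the 20 filters, the 13 coefficients, and the synthetic test tone are all assumptions chosen for the example, and the naive DFT is used only to avoid external libraries.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame, n_fft):
    """Naive zero-padded DFT power spectrum for bins 0..n_fft/2 (slow, illustrative)."""
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n_fft) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n_fft) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n_fft)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):    # rising edge
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):   # falling edge
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank

def mfcc(signal, sample_rate, frame_len, hop, n_fft=256, n_filters=20, n_ceps=13):
    """Return one list of n_ceps cepstral coefficients per frame."""
    assert frame_len <= n_fft, "frame must fit inside the DFT length"
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Hamming window applied to the current frame
        frame = [signal[start + t] * (0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1)))
                 for t in range(frame_len)]
        spec = power_spectrum(frame, n_fft)
        # Log mel-band energies, floored to avoid log(0)
        log_e = [math.log(max(sum(w * p for w, p in zip(filt, spec)), 1e-10)) for filt in bank]
        # Unnormalised DCT-II decorrelates the log energies into cepstral coefficients
        ceps = [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters) for j, e in enumerate(log_e))
                for c in range(n_ceps)]
        features.append(ceps)
    return features

# Hypothetical input: 0.1 s of a 2 kHz tone sampled at 8 kHz (not data from the study)
sample_rate = 8000
tone = [math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(800)]
feats = mfcc(tone, sample_rate, frame_len=200, hop=80)
print(len(feats), len(feats[0]))
```

In a speech-style pipeline, per-frame vectors like these would feed the monophone and triphone HMM training stages; a production system would normally rely on an FFT-based toolkit rather than the naive DFT shown here.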

Notes

https://en.wikipedia.org/wiki/Bird-vocalization.
www.gbif.org.
www.xeno-canto.org (Xeno-canto)

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, College of Engineering, Trivandrum, Thiruvananthapuram, India
Rajeev Rajan, Jisna Johnson & Noumida Abdul Kareem
APJ Abdul Kalam Technological University, Thiruvananthapuram, Kerala, India
Rajeev Rajan, Jisna Johnson & Noumida Abdul Kareem

Corresponding author

Correspondence to Rajeev Rajan.

Ethics declarations

Data Availability

The datasets analysed during the current study are available in the Xeno-canto bird sound repository (www.xeno-canto.org).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rajan, R., Johnson, J. & Abdul Kareem, N. Bird Call Classification Using DNN-Based Acoustic Modelling. Circuits Syst Signal Process 41, 2669–2680 (2022). https://doi.org/10.1007/s00034-021-01896-2

Received: 18 April 2021
Revised: 21 October 2021
Accepted: 23 October 2021
Published: 03 January 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s00034-021-01896-2
