Abstract
Bird call recognition using deep neural network-hidden Markov model (DNN-HMM)-based transcription is proposed. The work is an attempt to adapt the human speech recognition framework to bird call classification through a transcription approach. Initially, phone transcriptions are generated using CMU-Sphinx, and the lexicons are modified using group delay-based segmentation. Bird call transcription is then implemented in a hybrid DNN-HMM framework through DNN-based acoustic modelling. For the acoustic modelling, mel-frequency cepstral coefficients (MFCCs) are computed, and experiments are carried out with monophone and triphone models, followed by linear discriminant analysis (LDA) and a maximum likelihood linear transform (MLLT). In the final phase, the transcribed phonemes are corrected using context-based rules. The proposed approach is evaluated on a dataset of ten species with 563 audio tracks. The hybrid DNN-HMM approach achieves an accuracy of 94.46% and outperforms the convolutional neural network (CNN) and long short-term memory (LSTM) frameworks.
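In a hybrid DNN-HMM system, the network estimates HMM state posteriors p(s|x_t) for each frame, while the HMM decoder needs likelihoods p(x_t|s). The standard bridge is Bayes' rule: dividing the posteriors by the state priors p(s) gives likelihoods up to a per-frame constant. The following is a minimal NumPy sketch of that idea followed by Viterbi decoding, assuming a toy three-state left-to-right HMM with made-up posteriors, priors and transition probabilities. The function names `posteriors_to_scaled_loglik` and `viterbi` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def posteriors_to_scaled_loglik(posteriors, priors, eps=1e-10):
    """Convert DNN state posteriors p(s|x_t) into scaled log-likelihoods.

    By Bayes' rule p(x_t|s) = p(s|x_t) p(x_t) / p(s); p(x_t) is the same
    for every state, so log p(s|x_t) - log p(s) can replace log p(x_t|s).
    """
    return np.log(posteriors + eps) - np.log(priors + eps)


def viterbi(log_lik, log_trans, log_init):
    """Return the most likely HMM state sequence.

    log_lik:   (T, S) per-frame log-likelihoods
    log_trans: (S, S) log transition probabilities, rows = from-state
    log_init:  (S,)   log initial-state probabilities
    """
    T, S = log_lik.shape
    delta = log_init + log_lik[0]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # scores[from, to]
        backptr[t] = np.argmax(scores, axis=0)   # best predecessor per state
        delta = scores.max(axis=0) + log_lik[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):                # trace back from the last frame
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


# Toy example (all numbers are made up): 6 frames, 3-state left-to-right HMM.
posteriors = np.array([
    [0.80, 0.15, 0.05],
    [0.70, 0.20, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.20, 0.70],
    [0.05, 0.15, 0.80],
])
priors = np.array([0.5, 0.3, 0.2])   # e.g. state frequencies in training alignments
trans = np.array([
    [0.6, 0.4, 0.0],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
])
init = np.array([1.0, 0.0, 0.0])

with np.errstate(divide="ignore"):   # log(0) -> -inf marks forbidden moves
    log_trans, log_init = np.log(trans), np.log(init)

path = viterbi(posteriors_to_scaled_loglik(posteriors, priors), log_trans, log_init)
print(path)  # [0, 0, 1, 1, 2, 2]
```

Dividing by the priors keeps frequently seen states from dominating the decision simply because they were common in training. A full recognizer would decode over a search graph built from the trained HMMs and the lexicon rather than over this single toy HMM.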
Notes
www.xeno-canto.org (Xeno-canto)
Ethics declarations
Data Availability
The datasets generated and/or analysed during the current study are available in the Xeno-canto bird sound repository (www.xeno-canto.org).
About this article
Cite this article
Rajan, R., Johnson, J. & Abdul Kareem, N. Bird Call Classification Using DNN-Based Acoustic Modelling. Circuits Syst Signal Process 41, 2669–2680 (2022). https://doi.org/10.1007/s00034-021-01896-2
DOI: https://doi.org/10.1007/s00034-021-01896-2