Abstract
In soundscape ecology, acoustic features are well established and offer important baselines for ecological analyses. In many cases, however, the problem is difficult because classes overlap heavily in their time-frequency characteristics and because recordings contain noise. Deep neural networks have become the state of the art for feature learning in many multi-class applications, but they often over-fit or perform unevenly across classes, which can hamper the deployment of such models in realistic scenarios. The quantification task, which estimates how many observations belong to each class, is attracting attention and has been shown to be effective in other applications. This paper investigates the use of quantification combined with a classification loss to train a convolutional neural network to classify species of birds and anurans. Results indicate that quantification has advantages over both acoustic features alone and regular classification networks, in particular in terms of generalization and class recall, making it a suitable choice for segregation tasks related to soundscape ecology. Moreover, we show that a more compact network can outperform a deeper one in fine-grained scenarios involving bird and anuran species.
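The abstract does not state the exact form of the combined objective, so the following is only a minimal sketch of how a classification loss and a quantification term might be added together. It assumes cross-entropy for classification and, for quantification, an L1 distance between the batch-averaged predicted class probabilities and the true class prevalences in the batch. The function name `combined_loss` and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def combined_loss(logits, labels, n_classes, weight=1.0):
    """Illustrative objective: cross-entropy plus a prevalence penalty.

    Assumption: the quantification term is the L1 distance between the
    estimated and the true class prevalences of the batch.
    """
    probs = softmax(logits)
    n = logits.shape[0]
    # Classification term: mean negative log-likelihood of the true class.
    ce = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
    # Quantification term: estimated prevalence is the mean predicted
    # probability per class; true prevalence is the label frequency.
    est_prev = probs.mean(axis=0)
    true_prev = np.bincount(labels, minlength=n_classes) / n
    quant = np.abs(est_prev - true_prev).sum()
    return ce + weight * quant, ce, quant

# Toy batch: four observations, three classes, confident correct logits.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.0, 4.0, 0.0],
                   [0.0, 0.0, 4.0],
                   [4.0, 0.0, 0.0]])
labels = np.array([0, 1, 2, 0])
total, ce, quant = combined_loss(logits, labels, n_classes=3)

# Same predictions scored against labels with a different class distribution.
_, _, quant_shifted = combined_loss(logits, np.array([0, 0, 0, 0]), n_classes=3)
```

In this toy batch the prevalence penalty is small when the predicted class proportions match the label proportions and grows when they diverge, which is the behaviour a quantification term is meant to encourage alongside per-sample accuracy.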
Notes
Anurans are the order of the class Amphibia that comprises frogs and toads.
Spatial Ecology and Conservation Lab (LEEC). Website: https://github.com/LEEClab.
The complete area of the ecological corridor is located between northeastern São Paulo state and south Minas Gerais state, Brazil.
Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Finance Code 001; FAPESP (Grant #2019/07316-0); and the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPq), Grants #307411/2016-8 and #304266/2020-5. The authors would like to thank Professor Mílton C. Ribeiro from the São Paulo State University, Rio Claro, Brazil, for his data and useful feedback.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Dias, F.F., Ponti, M.A. & Minghim, R. A classification and quantification approach to generate features in soundscape ecology using neural networks. Neural Comput & Applic 34, 1923–1937 (2022). https://doi.org/10.1007/s00521-021-06501-w
DOI: https://doi.org/10.1007/s00521-021-06501-w