Abstract
Global change is predicted to induce shifts in anuran acoustic behavior, which can be studied through passive acoustic monitoring (PAM). Understanding changes in calling behavior requires the automatic identification of anuran species, which is challenging due to the particular characteristics of neotropical soundscapes. In this paper, we introduce a large-scale multi-species dataset of anuran amphibian calls recorded through PAM, comprising 27 hours of expert annotations for 42 different species from two Brazilian biomes. We provide open access to the dataset, including the raw recordings, experimental setup code, and a benchmark with a baseline model for the fine-grained categorization problem. Additionally, we highlight the challenges of the dataset to encourage machine learning researchers to tackle the problem of anuran call identification in support of conservation policy. All our experiments and resources have been made available at https://soundclim.github.io/anuraweb/.
Background & Summary
Global anthropogenic biodiversity loss is a major challenge of contemporary society1. With severe wildlife population declines and extinctions across the planet, monitoring and predicting species responses to global change has become an urgent task for conservation. Novel technologies now offer remote, non-invasive, and automated methods to survey and monitor biodiversity at unprecedented spatial and temporal scales2. For instance, passive acoustic monitoring (PAM) has been widely adopted in ecological research and is increasingly used in conservation applications3. Based on acoustic sensor networks, PAM enables us to remotely and automatically record the vocal activity of wild animals, increasing our ability to study biological communities. However, a critical bottleneck for the widespread use of this method is the need for automated techniques to retrieve biologically meaningful information from the huge time-series audio datasets collected by PAM. Manual inspection of these recordings becomes unattainable once the collected audio reaches the big-data scale, given the specialist workload involved4.
In the last decade, three fundamental reasons for the success of machine learning (ML) techniques have been advances in high-performance computing hardware, novel algorithms, and the curation of high-quality datasets for standardized benchmarks5. As a consequence, ML has emerged as a key solution and a general accelerator across domains, of which biodiversity monitoring programs, animal ecology, and global change research are no exception6,7. In particular, the growth of ML for ecological applications now depends on the variety, quality, and availability of public datasets that define ML tasks for specific contexts and problems7,8. Despite recent efforts to curate datasets for ecological research, available data remain taxonomically and geographically biased9. ML has opened up exciting possibilities for research in this area10,11,12,13,14, but limitations in the diversity of existing datasets must be acknowledged. In the field of bioacoustics and PAM, datasets aimed at supporting acoustic identification have been developed for a limited number of taxonomic groups, mainly birds15,16, mosquitoes17, and mammals8,18,19. These datasets have also served as general benchmarks for the detection and classification of the recorded individuals into species20. Altogether, the increasing number of curated datasets coming from bioacoustics research creates a unique opportunity to foster a culture of open data, open models, and benchmarks in conservation research21. PAM has special importance in applied conservation, where datasets may impact the robustness of biodiversity monitoring programs that support ecological22 and policy-related23 decision-making.
Amphibians are one of the most endangered vertebrate groups in the world, with more than 40% of species threatened with extinction24. In the tropics, amphibian communities exhibit high diversity25 and are more prone to extinction26 than in other regions. To monitor these communities, researchers can take advantage of PAM, a non-invasive data collection technique that captures information from rare and cryptic species as well as from common and abundant ones. Acoustic communication plays a central role in the reproductive behavior of anurans27. During the breeding season, males call, for example, to attract females, defend territories, and deter competitors28. Thus, a wide range of research relies on the identification and quantification of these sounds, with an increasing number of applications. However, there is a lack of open datasets for this highly vocal group that can support the development of ML models for PAM research.
This study introduces a large-scale annotated dataset of Neotropical anuran calls: AnuraSet. This dataset was compiled through a country-wide collaborative PAM program across Brazil between 2019 and 2021, and it is composed of 1612 1-minute annotated audio recordings, equivalent to 26.87 hours of audio. We collected data from four strategically selected sites in the Neotropics and generated precise annotations on the recordings. Subsequently, we preprocessed the data to train deep learning models, enabling us to conduct a baseline experiment and launch a benchmarking initiative for the automated identification of anuran calls (Fig. 1). The preprocessing and baseline code is released under the MIT License and all the data is under the CC0 license to support reproducible research. AnuraSet will potentially provide a common and realistic-scale evaluation task for species identification in Neotropical soundscapes. In addition, AnuraSet is a solid starting point for a comprehensive and accessible dataset of anuran calls and choruses. Since tropical acoustic environments are highly complex and manually annotated datasets are scarce, AnuraSet has the capacity to accelerate the development of robust machine listening models for wildlife monitoring in biodiversity hotspots. Furthermore, we summarize the main challenges and propose a roadmap to foster a culture of collaboration, experimentation, research, and exploration in ML for applied ecology. In our view, this culture is essential for advancing ML techniques and ecological inferences for conservation policies. In addition, the challenges posed by biodiversity acoustic monitoring provide a unique opportunity for exploring new avenues in the field of ML.
In summary, our contributions are (i) a collection of manually annotated PAM recordings of Neotropical anuran calling activity, with information on species composition (presence-absence data) and the audio quality of the recordings; (ii) a curated, preprocessed, in-the-wild acoustic dataset, with a detailed description of the data challenges; and (iii) baseline models for benchmarking the species identification problem, towards the creation of robust classifiers and the fast development of new models. Overall, our goal is to support a community of ML researchers and conservationists who can work together to develop innovative solutions for biodiversity monitoring. All our experiments and resources have been made available at https://soundclim.github.io/anuraweb/. By providing open-access resources and encouraging the exploration of new techniques, we aim to contribute to developing powerful tools for conservation and ecological research.
Methods
Data Collection
Calling activity of Neotropical anuran communities was monitored from 2019 to 2021 at four sites located in the Cerrado (INCT17, INCT41) and Atlantic Forest (INCT20955, INCT4) biomes, known for their critical role as global biodiversity hotspots (Fig. 2). INCT refers to Institutos Nacionais de Ciência, Tecnologia e Inovação (National Institutes of Science and Technology). At the edge of the water bodies of each site, we installed an acoustic sensor equipped with omnidirectional microphones (SM4, Wildlife Acoustics, Inc., Concord, MA, USA), fixed on trees or wooden bases at about 1.5 m above the ground. Each recorder was configured to register one min every 15 min over 24 h a day (a total of 1.6 hours per day), with a sampling rate of 22050 Hz and 16-bit depth resolution. Audio was recorded in stereo, with 10 dB and 16 dB of gain on the two channels. We considered aspects of anuran calling behavior to choose this recording schedule: a) detectability of pond-breeding anurans is often high, as individuals engage in calling activity from aggregations on the margins of the ponds where the recorders are installed, and b) 1 minute of recording every 15 minutes was the best compromise between obtaining data at high daily temporal resolution and enabling sampling over longer periods (e.g. 3 to 4 months)29.
Audio Annotation
We developed an annotation protocol in order to build automated tools for determining the species recorded with PAM. We combined weak labels (temporal precision limited to the 60-s duration of a recording) with strong labels (providing the exact temporal segments of the audio recording where an anuran call was active). The weak labels, which yield a presence-absence dataset at the scale of the audio recording, were annotated by local herpetologists and bioacoustics experts over a selected subsample of all raw recordings, while the strong labels were annotated by a single herpetologist. All annotators had previous experience detecting anuran calls in recordings. Since the list of species at each study site was initially unknown, we first screened each 1-min recording for species using local expert knowledge, in the form of weak labels. After that, we generated strong labels, as they are better suited to solving the audio event classification problem30. The protocol consists of three steps tailored to the identification of anuran calls; however, it can easily be adapted and customized for any taxon.
Step 1. Audio sampling
To annotate audio files and to train and validate ML models, we first obtained a stratified sample of audio recordings from each site that was representative of both seasonal periods and the daily periods of highest calling activity. Samples were drawn from months covering the extent of the breeding season, as informed by the principal investigators at each site (3–6 months), at night time (from 1 h before sunset until 1 h before sunrise). From these strata, we randomly selected a total of 300 to 600 files per site, depending on the number of months reported by the researchers. These files were then processed in two sequential steps: inspection to generate weak labels (step 2), followed by strong-label annotation (step 3). In total, we selected 1612 1-min audio files (26.87 hours) over the four study sites.
Step 2. Weak labeling
To identify anuran species recorded in the selected samples (420, 354, 472 and 366 files for INCT04, INCT17, INCT20955, INCT41, respectively), local herpetologists and bioacoustics experts (JVB, SD, JLMMS, AdaR) performed a visual and auditory analysis of spectrograms using Audacity ® 3.2.5 software (https://audacityteam.org/). Local annotators were asked to report the level of calling activity of each recorded anuran species based on the Amphibian Calling Index31 (Table 1), according to the species-specific calling activity level in each 1-minute audio file (weak labeling).
Step 3. Strong labeling
To provide precise annotations within the 1-min files, we identified bouts of advertisement calls and generated strong labels for the files selected in step 1. Using Audacity 3.2.5, we conducted a detailed visual and aural inspection of the spectrogram to identify the temporal limits (beginning and end) of segments containing species-specific calls with an inter-call interval of less than 1 second. These annotations ensured fine-scale specificity (Fig. 3). For longer inter-call intervals, we boxed calls separately and labeled them independently. The labels assigned to time boxes were composed of (i) the species ID, tagged with a unique 6-letter code built from the scientific name of each identified species (Supplementary Table 1), and (ii) the perceived quality of the recorded signal, included as a single letter indicating Low (L), Medium (M), or High (H) quality (Fig. 4). To ensure consistency among the perceptual quality labels, we set the following criteria. A high-quality call has a high signal-to-noise ratio, does not overlap with other sounds, has a well-identifiable structure on the spectrogram, and can be easily visualized on the oscillogram. A medium-quality call can be visually identified on the spectrogram but may overlap with other sounds, making it difficult to identify in the oscillogram. A low-quality call shows a low signal-to-noise ratio, is partially masked by other sounds, appears with low intensity on the spectrogram, and cannot be easily identified on the oscillogram. This information was included to promote the usability of the data and improve the error analysis of the learning models.
We followed a consistent annotation procedure for all the data, performed by a single trained herpetologist (MPTG). We used Audacity ® 3.2.5 software to visualize the spectrograms and create the labels in steps 2 and 3. We optimized the visualization of the acoustic signals by setting the spectrogram configuration parameters as follows: linear frequency scale, maximum frequency of 10 kHz, gain of 20 dB, range of 80 dB, FFT with a window size of 1024 and a Hann window, and the standard color range to represent sound energy.
Data Preprocessing
We framed the species identification problem as a multi-label classification task, considering the common occurrence of call overlap in PAM. We applied a set of transformations to the raw audio files and annotations to obtain a dataset suitable for ML algorithms. First, reading the metadata of the 1-minute raw audio files, we extracted fixed-length 3-second samples using a 1-second sliding window, which produced a two-thirds overlap between consecutive samples10,32. Second, we assigned a multi-label species annotation to each sample whenever a portion of a species call appeared within one of these windows. This procedure was applied to all calls, regardless of their quality. Third, we preprocessed each 1-minute annotated audio file using the scikit-maad python package33 and applied the sliding-window approach described above. We trimmed each sample to 3 seconds in time and band-limited it between 1 Hz and 10,000 Hz with a bandpass filter, implemented as an infinite impulse response (IIR) Butterworth filter of order 5. After that, we normalized the audio signal to a maximum amplitude of 0.7 of the full-scale value and saved it in uncompressed WAV format. Finally, we selected each 1-minute recording containing weak labels to split the dataset into training and test subsets. We summed the occurrences of all species and applied iterative stratification for the multi-label setting34,35 to preserve the unbalanced label proportions across subsets, with 70% in training and 30% in testing. In this step, we split at the 1-minute recording level to avoid data leakage (i.e., samples from the same 1-minute audio appearing in both training and test subsets).
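The windowing, filtering, and normalization steps above can be sketched as follows. This is a minimal illustration using NumPy and SciPy rather than the scikit-maad implementation used in the actual pipeline, and it takes the 0.7 normalization target as a linear fraction of full scale.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

SR = 22050           # AnuraSet sampling rate (Hz)
WIN_S, HOP_S = 3, 1  # 3-second windows with a 1-second hop (two-thirds overlap)

def bandpass(audio, low=1.0, high=10_000.0, sr=SR, order=5):
    """Order-5 IIR Butterworth bandpass between 1 Hz and 10 kHz."""
    sos = butter(order, [low, high], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, audio)

def slide_windows(audio, sr=SR, win_s=WIN_S, hop_s=HOP_S):
    """Yield (start_second, segment) pairs over a recording."""
    win, hop = win_s * sr, hop_s * sr
    for start in range(0, len(audio) - win + 1, hop):
        yield start // sr, audio[start:start + win]

def normalize(segment, peak=0.7):
    """Scale so the maximum absolute amplitude equals `peak` (full scale = 1)."""
    top = np.max(np.abs(segment))
    return segment if top == 0 else segment * (peak / top)
```

Applied to a 1-minute recording, the sliding window yields 58 overlapping 3-second samples (starts at seconds 0 through 57).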
Data Records
The dataset and the raw data are provided under the Public Domain Dedication license (CC0) and are deposited in Zenodo36. We collected data for 42 neotropical anuran amphibian species from 12 genera and 5 families (Supplementary Table 1). Taxonomic nomenclature followed Frost37. A total of 16,000 time boxes, equivalent to approximately 31 hours of cumulative duration and 27 hours of human-generated annotations, were created, considering all individual or serial calls from these species. Note that, because time boxes of different species overlap substantially, the cumulative duration exceeds the total recording time. Approximately 20% of the 1-minute raw audio files did not contain anuran calls but contained soundscapes with geophonic sources such as rain and wind, as well as biophonic sources such as other vocalizing species like insects and birds. The strongly labeled data were unevenly distributed across the sites INCT17, INCT20955, INCT41, and INCT4 at 42.5%, 33%, 13.5%, and 11%, respectively. The distribution of samples per species in the final dataset exhibits a long-tailed pattern, which coincides with the typical species diversity pattern in tropical environments (Fig. 5). This reflects the local number of registered species and their vocal activity levels, which depend on the regional species pool and other aspects of species ecology. Additionally, we observed high variability in species composition between sites; only five species were detected in more than one site.
Here we provide two main data resources: (i) the raw audio files with an associated table containing annotations, and (ii) the preprocessed input dataset for ML with 93,378 3-second audio samples; both share a similar folder structure. The raw data are divided into separate folders per site. Inside each folder is a collection of 1-minute recordings in WAV format with self-explanatory filenames that include the site name, the date, and the time, as follows: {site}_{date}_{time}.wav. For example, the file INCT20955_20190830_231500.wav is located in the folder of site INCT20955 and was obtained on 30 August 2019 at 23:15 (BRT time zone). The preprocessed dataset follows the same folder and naming structure but also includes the start and final second of the audio segment: {site}_{date}_{time}_{start second}_{final second}.wav. Following the previous example, INCT20955_20190830_231500_30_33.wav indicates that the sample starts at second 30 and ends at second 33. The dataset folder contains two files and one audio folder with separate subfolders per site. The samples are WAV audio files of fixed 3-second length, obtained with a 22.05 kHz sampling frequency and 16-bit depth. The two other files are a README file describing the structure and construction of the dataset and a metadata CSV file containing the labels for each sample as follows:
- sample_name: the unique identifier of each sample, corresponding to a unique audio file in the audio folder and following the structure {site}_{date}_{time}_{start second}_{final second}.wav. The next five columns were constructed from this column.
- fname: raw audio filename extracted from a site and used by annotators to create weak labels.
- min_t: second where the annotation starts within the fixed-length window.
- max_t: second where the annotation ends within the fixed-length window.
- site: identifier of the recording site.
- date: datetime of the recording.
- subset: training or test subset.
- species_number: total number of species in each sample, equal to the sum of the next 42 columns in each row.
- {species}: 42 binary columns, one per species, set to 1 if some portion of that species' call occurs in the sample and 0 otherwise. The 42 species column names are the codes shown in Supplementary Table 1.
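As a usage illustration, the metadata table can be split and summarized with pandas. This is a sketch assuming the column layout described above; the CSV filename inside the Zenodo archive may differ.

```python
import pandas as pd

# Columns of the metadata CSV that are not species-presence columns.
DESCRIPTIVE = ["sample_name", "fname", "min_t", "max_t",
               "site", "date", "subset", "species_number"]

def split_and_count(meta: pd.DataFrame):
    """Split the metadata by subset and count training samples per species.

    Assumes the column layout described above; everything not in
    DESCRIPTIVE is treated as one of the 42 binary species columns.
    """
    species_cols = [c for c in meta.columns if c not in DESCRIPTIVE]
    train = meta[meta["subset"] == "train"]
    test = meta[meta["subset"] == "test"]
    counts = train[species_cols].sum().sort_values(ascending=False)
    return train, test, counts

# Typical usage (the CSV filename inside the archive may differ):
# train, test, counts = split_and_count(pd.read_csv("metadata.csv"))
```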
Technical Validation
Experimental setup
The main goal of AnuraSet is to support solutions to the species identification problem and to boost ecological inferences in PAM-based anuran monitoring programs. We framed species identification as a multi-label classification problem using the data from all four sites, without temporal or site distinction. Following a previous large-scale bioacoustics analysis32, we chose the F1-score as the classification performance metric, using the usual 0.5 threshold. For multi-label classification, we selected the macro-averaged F1-score to give the same importance to all species. To better understand the dependency between the number of samples and performance, we grouped species into "common", "frequent", and "rare" categories based on sample frequency, similar to the Auto Arborist Dataset12. The grouping reflected the label frequency within each anuran assemblage using two cut-offs: species with more than 10,000 samples were classified as common, species with fewer than 5,000 samples as rare, and those with between 5,000 and 10,000 samples as frequent.
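This evaluation protocol can be sketched as follows; this is an illustrative implementation with scikit-learn, not the benchmark code itself.

```python
import numpy as np
from sklearn.metrics import f1_score

def evaluate(y_true, y_prob, train_counts, threshold=0.5):
    """Macro F1 at a fixed threshold, overall and per frequency group.

    `train_counts` holds the number of training samples per species; the
    group cut-offs follow the text (>10,000 = common, <5,000 = rare,
    in between = frequent).
    """
    counts = np.asarray(train_counts)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    per_species = f1_score(y_true, y_pred, average=None, zero_division=0)
    groups = {
        "common": counts > 10_000,
        "frequent": (counts >= 5_000) & (counts <= 10_000),
        "rare": counts < 5_000,
    }
    report = {name: float(per_species[mask].mean())
              for name, mask in groups.items() if mask.any()}
    report["macro"] = float(per_species.mean())  # overall macro F1
    return report
```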
Baseline Models
Following the pipelines of previous studies10,32, we applied a Mel spectrogram transformation to the audio recordings using a window size of 512, a hop length of 28, and 128 mel filter banks. We then applied SpecAugment-style time and frequency masking38 as spectrogram augmentation, followed by resizing. The transformations and augmentations described above generated the inputs to ResNet39 family models. Specifically, we tested ResNet18, ResNet50, and ResNet152. All our baseline experiments were implemented using the PyTorch40 framework and the torchaudio41 library and are publicly available in the repository https://github.com/soundclim/anuraset.
Benchmark Results
After testing the ResNet family of models, we grouped performance by species according to their sample-frequency class (Table 2). The best model in all cases was ResNet152, with an F1-score (%) of 68.4 for the frequent group, 56.8 for the common group, and 15.7 for the rare group. The overall macro F1-score was 37.8. This result suggests that the number of samples strongly influences the general performance of the models. The F1-score of each species at each site is reported in Fig. 6. This figure confirms the challenge of learning from small samples, i.e., building machine learning models with only a few training examples: for species with fewer than 1,000 samples, all models achieved F1-scores below 20%. This problem remains an open research area in deep learning for computational bioacoustics42.
Usage Notes
Data Challenges and Open Problems
During the annotation and dataset-building process, we faced challenges inherent to Neotropical, real-world PAM datasets. We encourage researchers to experiment with AnuraSet, from heuristics for finding optimal preprocessing parameters and augmentation strategies, to novel techniques for advancing the anuran call identification problem and other tasks yet to be defined. With the goal of paving the way for new directions and advancements in ML research for bioacoustics and ecoacoustics, we summarize these challenges under the following topics.
The devil is in the tails
As expected, the number of audio samples per species is highly imbalanced (Fig. 5), forming a long-tailed distribution43. The combination of a large number of categories and few training examples per class makes it challenging to obtain good classifiers for all species. As the benchmark results show, there is a clear dependency between the number of samples and performance. This situation is especially relevant when rare species are of interest for ecological and conservation applications. AnuraSet is a suitable dataset for testing methods proposed to overcome the long-tailed problem, such as algorithmic solutions44,45,46 or augmentation strategies. Furthermore, this problem can be formulated as a learning-from-small-samples problem to explore state-of-the-art approaches42 such as few-shot learners47,48 or self-supervised learning49,50.
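For example, one common re-weighting strategy for long-tailed multi-label training is the "effective number of samples" class-balanced loss (Cui et al., CVPR 2019), which the sketch below applies through the pos_weight argument of PyTorch's BCEWithLogitsLoss. This is one illustrative option, not part of the AnuraSet baseline.

```python
import torch

def balanced_bce_loss(train_counts, beta=0.999):
    """Class-balanced BCE-with-logits loss using 'effective number of
    samples' weights (Cui et al. 2019); weights are normalized to a mean
    of 1 and passed as per-class positive weights, so rare species
    contribute more to the loss than common ones."""
    counts = torch.as_tensor(train_counts, dtype=torch.float)
    effective_num = 1.0 - torch.pow(beta, counts)
    weights = (1.0 - beta) / effective_num
    weights = weights / weights.sum() * len(counts)  # normalize to mean 1
    return torch.nn.BCEWithLogitsLoss(pos_weight=weights)
```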
Human Intensive Annotation
Another manifestation of the learning-from-small-samples challenge arises at the very beginning of the annotation process. As we showed in the annotation protocol section, this is a labor-intensive process. Scaling to rich and large datasets requires new ways of annotating data points, as shown by clever approaches such as the Auto Arborist Dataset12. For example, one possible path is to explore a hybrid human-machine collaborative labeling approach using an active labeling and learning scheme, where each step of the labeling procedure is actively assisted by a learning algorithm51. Recent work52 shows that weak labels combined with unsupervised learning approaches can improve the performance of classifiers. Evaluating such methods on AnuraSet can facilitate advances in efficient and scalable annotation techniques.
Fine-grained audio in natural environments
Unlike classic datasets such as ImageNet53, where the classes can easily be identified by a human, the classes annotated in AnuraSet rely on the expert knowledge of local herpetologists in sound-based species identification. Additionally, the recordings were collected in complex environments, generating variability in the signal-to-noise ratio of the data due to the diversity of neotropical soundscapes across the different biomes (Fig. 7). We confirm that the presence of calls in noisy conditions is a typical situation in tropical environments investigated with PAM. This kind of fine-grained problem, which involves distinguishing subtle differences, may call for approaches54,55 distinct from those used in generic object recognition.
Multi-label dataset
Tropical anuran assemblages recorded via PAM exhibit a distinctive feature: dense choruses with high call overlap, comprising different call types. This characteristic often leads to sound masking and makes the identification of individual calls challenging. Calls in AnuraSet's PAM recordings overlap heavily, not only between conspecifics but also between heterospecifics. As Fig. 7a shows, 8 different anuran calls were recorded in less than 8 seconds. This characteristic distinguishes PAM data from other wildlife monitoring sensors, such as camera trap images, and poses a different kind of challenge. These overlaps relate to the classic cocktail-party problem: searching for an audio signal of interest, such as an anuran call, while sounds from other species, geophony, and biophony co-occur or overlap with it. Recent studies56,57,58 show promising progress in this direction in the context of bioacoustics.
Towards abundance and behavior classification
In the weak labeling process, we went beyond binary presence-absence annotation and used four categories to capture calling activity, following the Amphibian Calling Index31 (Table 1). By combining these weak labels with the strong labels in AnuraSet, it is possible to work toward calling-activity classifiers that can measure anuran abundance and behavior in an ecologically meaningful way. Such classifiers could help us understand species co-occurrence, temporal patterns of vocal activity, and chorus formation. Measuring abundance in bioacoustics is not straightforward, as it depends on factors such as the variability of animal vocalization behavior and the overlap and interference of sounds from different sources. However, AnuraSet provides a dataset with these properties in a natural and complex environment, which will allow the development of new classification techniques that account for sources of error and bias.
Code availability
The dataset and the raw data are hosted in Zenodo at https://doi.org/10.5281/zenodo.8342596 under the CC0 license36. All the code for reproducing the experimental protocol, building and preprocessing the dataset, and using the baseline models is available in the repository https://github.com/soundclim/anuraset under the MIT license. We release the Python code to enable fast development of new deep learning models and experiments in PyTorch.
References
Urban, M. C. et al. Improving the forecast for biodiversity under climate change. Science 353, aad8466 (2016).
Sugai, L. S. M., Silva, T. S. F., Ribeiro, J. W. Jr & Llusia, D. Terrestrial passive acoustic monitoring: review and perspectives. BioScience 69, 15–25 (2019).
Gibb, R., Browning, E., Glover-Kapfer, P. & Jones, K. E. Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring. Methods Ecol. Evol. 10, 169–185 (2019).
Beery, S. Scaling Biodiversity Monitoring for the Data Age. XRDS Crossroads ACM Mag. Stud. 27, 14–18 (2021).
Hardt, M. & Recht, B. Patterns, predictions, and actions: Foundations of machine learning. (Princeton University Press, 2022).
Rolnick, D. et al. Tackling climate change with machine learning. ACM Comput. Surv. 55, 1–96 (2022).
Tuia, D. et al. Perspectives in machine learning for wildlife conservation. Nat. Commun. 13, 792 (2022).
Dufourq, E. et al. Automated detection of Hainan gibbon calls for passive acoustic monitoring. Remote Sens. Ecol. Conserv. 7, 475–487 (2021).
Luccioni, A. S. & Rolnick, D. Bugs in the Data: How ImageNet Misrepresents Biodiversity. Proc. AAAI Conference on Artificial Intelligence. 37, 14382–14390, https://doi.org/10.1609/aaai.v37i12.26682 (2023).
Van Horn, G. et al. Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset. in Proc. ECCV European Conference on Computer Vision. 271–289, https://doi.org/10.1007/978-3-031-20074-8_16 (2022).
Van Horn, G. et al. The inaturalist species classification and detection dataset. in Proc. IEEE conference on computer vision and pattern recognition. 8769–8778, https://doi.org/10.1109/CVPR.2018.00914 (2018).
Beery, S. et al. The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition. 21294–21307, https://doi.org/10.1109/CVPR52688.2022.02061 (2022).
Kay, J. et al. The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting. in Proc. ECCV European Conference on Computer Vision. 290–311, https://doi.org/10.1007/978-3-031-20074-8_17 (2022).
Beery, S., Van Horn, G. & Perona, P. Recognition in terra incognita. in Proc. ECCV European conference on computer vision. 456–473, https://doi.org/10.1007/978-3-030-01270-0_28 (2018).
Lostanlen, V., Salamon, J., Farnsworth, A., Kelling, S. & Bello, J. P. Birdvox-full-night: A dataset and benchmark for avian flight call detection. in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 266–270, https://doi.org/10.1109/ICASSP.2018.8461410 (2018).
Chronister, L. M., Rhinehart, T. A., Place, A. & Kitzes, J. An annotated set of audio recordings of Eastern North American birds containing frequency, time, and species information. Ecology. 102, e03329 (2021).
Kiskin, I. et al. HumBugDB: a large-scale acoustic mosquito dataset. in Conference on Neural Information Processing Systems 35th (NeurIPS) Track on Datasets and Benchmarks. https://doi.org/10.48550/arXiv.2110.07607 (2021).
Aodha, O. M. et al. Towards a General Approach for Bat Echolocation Detection and Classification. Preprint at bioRxiv, https://doi.org/10.1101/2022.12.14.520490 (2022).
Prat, Y., Taub, M., Pratt, E. & Yovel, Y. An annotated dataset of Egyptian fruit bat vocalizations across varying contexts and during vocal ontogeny. Sci. Data. 4, 170143 (2017).
Hagiwara, M. et al. BEANS: The Benchmark of Animal Sounds. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 1–5, https://doi.org/10.1109/ICASSP49357.2023.10096686 (2023).
Baker, E. & Vincent, S. A deafening silence: a lack of data and reproducibility in published bioacoustics research? Biodivers. Data J. 7, e36783 (2019).
Ross, S. R.-J. et al. Passive acoustic monitoring provides a fresh perspective on fundamental ecological questions. Funct. Ecol. 37, 959–975 (2023).
August, T. et al. Realising the potential for acoustic monitoring to address environmental policy needs. JNCC Rep. N° 707 (2022).
Stuart S.N. et al. Threatened amphibians of the world (Lynx Edicions, 2008).
Pyron, R. A. & Wiens, J. J. Large-scale phylogenetic analyses reveal the causes of high tropical amphibian diversity. Proc. R. Soc. B Biol. Sci. 280, 20131622 (2013).
Duarte, H. et al. Can amphibians take the heat? Vulnerability to climate warming in subtropical and temperate larval amphibian communities. Glob. Change Biol. 18, 412–421 (2012).
Narins, P. M. & Feng, A. S. Hearing and sound communication in amphibians: prologue and prognostication. (Springer, 2006).
Köhler, J. et al. The use of bioacoustics in anuran taxonomy: theory, terminology, methods and recommendations for best practice. Zootaxa. 4251, 1–124 (2017).
Sugai, L. S. M., Desjonquères, C., Silva, T. S. F. & Llusia, D. A roadmap for survey designs in terrestrial acoustic monitoring. Remote Sens Ecol Conserv 6, 220–235 (2020).
Hershey, S. et al. The Benefit of Temporally-Strong Labels in Audio Event Classification. in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 366–370, https://doi.org/10.1109/ICASSP39728.2021.9414579 (2021).
Mossman, M. J. & Weir, L. A. North American amphibian monitoring program (NAAMP). in Amphibian declines. 307–313, (University of California Press, 2005).
Kahl, S. Identifying birds by sound: large-scale acoustic event recognition for avian activity monitoring. Wissenschaftliche Schriftenreihe Dissertationen der Medieninformatik, Chemnitz University of Technology. 10, 2195–2574 (2019).
Ulloa, J. S., Haupert, S., Latorre, J. F., Aubin, T. & Sueur, J. scikit-maad: An open-source and modular toolbox for quantitative soundscape analysis in Python. Methods Ecol. Evol. 12, 2334–2340 (2021).
Szymański, P. & Kajdanowicz, T. A Network Perspective on Stratification of Multi-Label Data. Proc. Mach. Learn. Res. 74, 22–35 (2017).
Sechidis, K., Tsoumakas, G. & Vlahavas, I. On the Stratification of Multi-label Data. Machine Learning and Knowledge Discovery in Databases ECML. 6913, 145–158, https://doi.org/10.1007/978-3-642-23808-6_10 (2011).
Cañas, J. S. et al. AnuraSet: A dataset for benchmarking neotropical anuran calls identification in passive acoustic monitoring. Zenodo, https://doi.org/10.5281/zenodo.8342596 (2023).
Frost, D. R. Amphibian Species of the World: an Online Reference. Version 6.2. https://amphibiansoftheworld.amnh.org/index.php, https://doi.org/10.5531/db.vz.0001 (2023).
Park, D. S. et al. SpecAugment: A simple data augmentation method for automatic speech recognition. in Proc. Interspeech. 2613–2617, https://doi.org/10.21437/Interspeech.2019-2680 (2019).
Targ, S., Almeida, D. & Lyman, K. Resnet in resnet: Generalizing residual architectures. Preprint at ArXiv, https://doi.org/10.48550/arXiv.1603.08029 (2016).
Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019).
Yang, Y. Y. et al. Torchaudio: Building blocks for audio and speech processing. in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 6982–6986, https://doi.org/10.1109/ICASSP43922.2022.9747236 (2022).
Stowell, D. Computational bioacoustics with deep learning: a review and roadmap. PeerJ. 10, e13152 (2022).
Van Horn, G. & Perona, P. The devil is in the tails: Fine-grained classification in the wild. Preprint at ArXiv, https://doi.org/10.48550/arXiv.1709.01450 (2017).
Menon, A. K. et al. Long-tail learning via logit adjustment. in Proc. ICLR International Conference on Learning Representations. https://doi.org/10.48550/arXiv.2007.07314 (2021).
Cui, Y., Jia, M., Lin, T. Y., Song, Y. & Belongie, S. Class-Balanced Loss Based on Effective Number of Samples. in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 9260–9269, https://doi.org/10.1109/CVPR.2019.00949 (2019).
Cao, K., Wei, C., Gaidon, A., Arechiga, N. & Ma, T. Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss. Adv. Neural Inf. Process. Syst. 32, 1567–1578 (2019).
Nolasco, I. et al. Learning to detect an animal sound from five examples. Ecological Informatics. 77, 102258, https://doi.org/10.1016/j.ecoinf.2023.102258 (2023).
Wang, Y., Bryan, N. J., Cartwright, M., Pablo Bello, J. & Salamon, J. Few-Shot Continual Learning for Audio Classification. in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 321–325, https://doi.org/10.1109/ICASSP39728.2021.9413584 (2021).
Hagiwara, M. AVES: Animal Vocalization Encoder based on Self-Supervision. in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 1–5, https://doi.org/10.1109/ICASSP49357.2023.10095642 (2023).
Gontier, F. et al. Polyphonic training set synthesis improves self-supervised urban sound classification. J. Acoust. Soc. Am. 149, 4309–4326 (2021).
Papadopoulos, D. P., Uijlings, J. R. R., Keller, F. & Ferrari, V. We Don't Need No Bounding-Boxes: Training Object Class Detectors Using Only Human Verification. in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 854–863, https://doi.org/10.1109/CVPR.2016.99 (2016).
Michaud, F., Sueur, J., Le Cesne, M. & Haupert, S. Unsupervised classification to improve the quality of a bird song recording dataset. Ecol. Inform. 74, 101952 (2023).
Deng, J. et al. ImageNet: A large-scale hierarchical image database. in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 248–255, https://doi.org/10.1109/CVPR.2009.5206848 (2009).
Cui, Y., Song, Y., Sun, C., Howard, A. & Belongie, S. Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 4109–4118, https://doi.org/10.1109/CVPR.2018.00432 (2018).
Yang, Z. et al. Learning to navigate for fine-grained classification. in Proc. ECCV European Conference on Computer Vision. 420–435, https://doi.org/10.1007/978-3-030-01264-9_26 (2018).
Bermant, P. C. BioCPPNet: automatic bioacoustic source separation with deep neural networks. Sci. Rep. 11, 23502 (2021).
Denton, T., Wisdom, S. & Hershey, J. R. Improving bird classification with unsupervised sound separation. in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing. 636–640, https://doi.org/10.1109/ICASSP43922.2022.9747202 (2022).
Wisdom, S. et al. Unsupervised Sound Separation Using Mixture Invariant Training. Adv. Neural Inf. Process. Syst. 33, 3846–3857 (2020).
Acknowledgements
The authors acknowledge financial support from the intergovernmental Group on Earth Observations (GEO) and Microsoft, under the GEO-Microsoft Planetary Computer Programme (October 2021); São Paulo Research Foundation (FAPESP #2016/25358–3; #2019/18335–5); the National Council for Scientific and Technological Development (CNPq #302834/2020–6; #312338/2021–0, #307599/2021–3); National Institutes for Science and Technology (INCT) in Ecology, Evolution, and Biodiversity Conservation, supported by MCTIC/CNPq (proc. 465610/2014–5), FAPEG (proc. 201810267000023); CNPq/MCTI/CONFAP-FAPS/PELD No 21/2020 (FAPESC 2021TR386); Comunidad de Madrid (2020-T1/AMB-20636, Atracción de Talento Investigador, Spain) and research projects funded by the European Commission (EAVESTROP–661408, Global Marie S. Curie fellowship, program H2020, EU); and the Ministerio de Economía, Industria y Competitividad (CGL2017–88764-R, MINECO/AEI/FEDER, Spain). We also thank Tom Denton for machine learning evaluation suggestions, dataset revision, and comments on the manuscript.
Author information
Contributions
J.S.C. and J.S.U. conceived and designed the experiments; J.S.U., L.S.M.S., and D.L. directed the project; J.S.U., L.S.M.S., D.L., H.B., R.P.B., and L.F.T. obtained funding sources; R.P.B., J.V.B., L.F.T., S.D., F.L.S., S.N.O., A.R., V.C.R., C.E.S., and A.H.R.D. participated in the data collection; M.P.T., J.S.U., L.S.M.S., and D.L. designed the annotation protocol; J.V.B., S.D., J.L.M.M.S., and A.R. made the weak labels; M.P.T. made the strong labels; J.S.C., J.S.U., H.B., J.R., and B.P. participated in the benchmark and dataset design; J.S.C. developed the code for dataset building, preprocessing, benchmark, and analysis tools; J.S.C. wrote the original draft; J.S.C., M.P.T., J.S.U., L.S.M.S., D.L., and H.B. wrote and edited the draft; all authors reviewed the final draft.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cañas, J.S., Toro-Gómez, M.P., Sugai, L.S.M. et al. A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring. Sci Data 10, 771 (2023). https://doi.org/10.1038/s41597-023-02666-2