Abstract
Monitoring animals by the sounds they produce is an important and challenging task, whether the application is outdoors in a natural habitat, or in the controlled environment of a laboratory setting. In the former case, the density and diversity of animal sounds can act as a measure of biodiversity. In the latter case, researchers often create control and treatment groups of animals, expose them to different interventions, and test for different outcomes. One possible manifestation of different outcomes may be changes in the bioacoustics of the animals. With such a plethora of important applications, there have been significant efforts to build bioacoustic classification tools. However, we argue that most current tools are severely limited: they often require the careful tuning of many parameters (and thus huge amounts of training data), are too computationally expensive for deployment in resource-limited sensors, are specialized for a very small group of species, or are simply not accurate enough to be useful. In this work we introduce a novel bioacoustic recognition/classification framework that mitigates or solves all of the above problems. We propose to classify animal sounds in the visual space, by treating the texture of their sonograms as an acoustic fingerprint and using a recently introduced parameter-free texture measure as a distance measure. We further show that by searching for the most representative acoustic fingerprint, we can significantly outperform other techniques in terms of speed and accuracy.
References
Bardeli R (2009) Similarity search in animal sound databases. IEEE Trans Multimed 11(1):68–76
Beiderman Y, Azani Y, Cohen Y, Nisankoren C, Teicher M, Mico V, Garcia J, Zalevsky Z (2010) Cleaning and quality classification of optically recorded voice signals. Recent Patents on Signal Processing, pp 6–11
Bianconi F, Fernandez A (2007) Evaluation of the effects of Gabor filter parameters on texture classification. Pattern Recogn 40(12):3325–35
Blumstein DT (2001) Acoustic monitoring in terrestrial environments using microphone arrays: applications, technological considerations, and prospectus. J Appl Ecol 48:758–767
Brown JC, Smaragdis P (2009) Hidden Markov and Gaussian mixture models for automatic call classification. J Acoust Soc Am 6:221–22
Campana BJL, Keogh EJ (2010) A compression-based distance measure for texture. Stat Anal Data Min 3(6):381–398
Dang T, Bulusu N, Feng WC, Hu W (2010) RHA: A Robust Hybrid Architecture for Information Processing in Wireless Sensor Networks. In 6th ISSNIP
Dietrich C, Schwenker F, Palm G (2001) Classification of time series utilizing temporal and decision fusion. Proceedings of Multiple Classifier Systems (MCS), LNCS 2096, pp 378–387, Cambridge
Desutter-Grandcolas L (1998) First Analysis of a Disturbance Stridulation in Crickets, Brachytrupes tropicus (Orthoptera: Grylloidea: Gryllidae). J Insect Behav 11
Elliott L, Hershberger W (2007) The songs of insects. Houghton-Mifflin Company
Fu A, Keogh EJ, Lau L, Ratanamahatana CA, Wong RCW (2008) Scaling and time warping in time series querying. VLDB J 17(4):899–921
Han NC, Muniandy SV, Dayou J (2011) Acoustic classification of Australian anurans based on hybrid spectral-entropy approach. Appl Acoust 72(9):639–645
Hao Y (2011) Animal sound fingerprint Webpage. www.cs.ucr.edu/~yhao/animalsoundfingerprint.html
Holy TE, Guo Z (2005) Ultrasonic songs of male mice. PLoS Biol 3:e386
Jang Y, Gerhardt HC (2007) Temperature effects on the temporal properties of calling songs in the crickets Gryllus fultoni and G.vernalis: implications for reproductive isolation in sympatric populations. J Insect Behav 20(1)
Keogh EJ, Lonardi S, Ratanamahatana CA, Wei L, Lee S, Handley J (2007) Compression-based data mining of sequential data. DMKD 14(1):99–129
Kogan JA, Margoliash D (1998) Automated recognition of bird song elements from continuous recordings using dynamic time warping and hidden Markov models: a comparative study. J Acoust Soc Am 103(4):2185–219
Li M, Chen X, Li X, Ma B, Vitanyi P (2003) The similarity metric. Proc of the 14th Symposium on Discrete Algorithms, pp 863–72
MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press. pp 281–297. MR0214227. Zbl 0214.46201
Mankin RW, Hagstrum DW, Smith MT, Roda AL, Kairo MTK (2011) Perspective and promise: a century of insect acoustic detection and monitoring. Amer Entomol 57:30–44
Marcarini M, Williamson GA, de Garcia LS (2008) Comparison of methods for automated recognition of avian nocturnal flight calls. ICASSP 2008:2029–32
Mellinger DK, Clark CW (2000) Recognizing transient low-frequency whale sounds by sonogram correlation. J Acoust Soc Am 107(6):3518–29
Mitrovic D, Zeppelzauer M, Breiteneder C (2006) Discrimination and retrieval of animal sounds. In Proc. of IEEE Multimedia Modelling Conference, Beijing, China, 339–343
Celis-Murillo A, Deppe JL, Allen MF (2009) Using soundscape recordings to estimate bird species abundance, richness, and composition. J Field Ornithol 80:64–78
Nowak DJ, Pasek JE, Sequeira RA, Crane DE, Mastro VC (2001) Potential effect of Anoplophora glabripennis on urban trees in the United States. J Entomol 94(1):116–122
Panksepp JB, Jochman KA, Kim JU, Koy JJ, Wilson ED, Chen Q, Wilson CR, Lahvis GP (2007) Affiliative behavior, ultrasonic communication and social reward are influenced by genetic variation in adolescent mice. PLoS One 4:e351
Riede K, Nischk F, Thiel C, Schwenker F (2006) Automated annotation of Orthoptera songs: first results from analysing the DORSA sound repository. J Orthoptera Res 15(1):105–113
Roach J (2006) Cricket, Katydid Songs Are Best Clues to Species’ Identities. National Geographic News. news.nationalgeographic.com/news/2006/09/060905-crickets.html
Schmidt AKD, Riede K, Römer H (2011) High background noise shapes selective auditory filters in a tropical cricket. J Exp Biol 214:1754–1762
Wells MM, Henry CS (1998) Songs, reproductive isolation, and speciation in cryptic species of insect: a case study using green lacewings. In Endless Forms: species and speciation, Oxford Univ. Press, NY
Xi X, Ueno K, Keogh EJ, Lee DJ (2008) Converting non-parametric distance-based classification to anytime algorithms. Pattern Anal Appl 11(3–4):321–36
Yu G, Slotine JJ (2009) Audio classification from time-frequency texture. IEEE ICASSP, pp 1677–80
Acknowledgments
We thank the Cornell Lab of Ornithology for donating much of the data used in this work (Macaulay Library, www.macaulaylibrary.org/index.do), Jesin Zakaria for help projecting snippets of sounds into 2D space, and Dr. Agenor Mafra-Neto for entomological advice.
Appendix A: Speeding Up the Search for Sound Fingerprints
In the main text, we defined sound fingerprints concretely, but did not discuss how we find them. We have relegated that discussion to this appendix to enhance the flow of the paper.
As we noted in the main text, there may be a huge number of candidate sound fingerprints to examine and score; thus, we must provide an efficient mechanism for searching for the best one for a given species. For ease of exposition, we begin by describing the brute-force algorithm for finding the sound fingerprint for a given species, and later consider some techniques to speed this algorithm up.
A Brute-Force Algorithm
For concreteness, let us consider the following small dataset, which we will also use as a running example to explain our search algorithms in the following sections. We created a small dataset with P containing ten two-second sound files from Atlanticus dorsalis (Gray shieldback), and U containing ten two-second sound files from other random insects. If we just consider fingerprints of length 16 (i.e. L min = L max = 16), then even in this tiny dataset there are 830 candidate fingerprints to be tested, requiring 1,377,800 calls to the CK distance function.
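As a back-of-the-envelope check, the two figures quoted above are mutually consistent if each two-second sonogram is 98 columns wide. The width W = 98 is our inference, not a value stated in the text; the sketch below simply reproduces the quoted counts under that assumption:

```python
# Hypothetical sonogram width W = 98 columns; this value is our inference,
# chosen because it reproduces the counts quoted in the text.
W, L = 98, 16            # sonogram width and fingerprint length, in columns
n_P, n_U = 10, 10        # two-second sound files in P and in U
positions = W - L + 1    # sliding-window placements per sonogram (83)

# Candidate fingerprints are extracted only from the sonograms in P.
candidates = n_P * positions                        # 830 candidates

# Scoring one candidate slides it across every sonogram in both P and U.
ck_calls = candidates * (n_P + n_U) * positions     # 1,377,800 CK calls
```

This also makes plain why the CK calls dominate: the number of calls grows with the product of the candidate count and the total sliding-window positions in the dataset.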
The brute force algorithm is described in Table 4. We are given a dataset D, in which each sound sequence is labeled either class P or class U, and a user defined length L min to L max (optional: we default to the range 16 to infinity).
The algorithm begins by initializing bsf_Gain, a variable to track the best candidate encountered thus far, to zero in line 1. Then all possible sound fingerprint candidates Sk,l for all legal subsequence lengths are generated in the nested loops in lines 2, 4, and 5 of the algorithm.
As each candidate Sk,l is generated, the algorithm checks how well it can be used to separate objects into class P and class U (lines 2 to 9), as illustrated in Fig. 2. To achieve this, in line 6 the algorithm calls the subroutine CheckCandidates() to compute the information gain for the candidate. If the information gain is larger than the current value of bsf_Gain, the algorithm updates bsf_Gain and the corresponding sound fingerprint in lines 7 to 9. The candidate-checking subroutine is outlined in the algorithm shown in Table 5.
In the subroutine CheckCandidates(), shown in Table 5, we compute the order line L according to the distance from the sound sequence to the candidate computed in minCKdist() procedure, which is shown in Table 6. In essence, this is the procedure illustrated in Fig. 2. Given L, we can find the optimal split point (definition 8) in lines 10 to 15 by calculating all possible splitting points and recording the best.
While the splitting point can be any point on the positive real number line, we note that the information gain cannot change in the region between any two adjacent points. Thus, we can exploit this fact to produce a finite set of possible split positions. In particular, we need only test |D|-1 locations.
In the subroutine CheckCandidates(), this is achieved by only checking the mean value (the “halfway point”) of each pair of adjacent points in the distance ordering as the possible positions for the split point. In CheckCandidates(), we call the subroutine minCKdist() to find the best matching subsequence for a given candidate under consideration.
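The split-point search over the |D|−1 halfway points can be sketched as follows. This is a minimal Python illustration, not the authors' code; the representation of the order line as (distance, label) pairs and the function names are our own:

```python
import math

def entropy(p_count, u_count):
    """Entropy of a set with p_count objects of class P and u_count of class U."""
    n = p_count + u_count
    if n == 0 or p_count == 0 or u_count == 0:
        return 0.0
    fp, fu = p_count / n, u_count / n
    return -fp * math.log2(fp) - fu * math.log2(fu)

def best_split(order_line):
    """order_line: list of (distance, label) pairs, label being 'P' or 'U'.
    Tests only the |D|-1 halfway points between adjacent distances."""
    pts = sorted(order_line)
    n = len(pts)
    total_p = sum(1 for _, lab in pts if lab == 'P')
    total_u = n - total_p
    base = entropy(total_p, total_u)          # entropy before splitting
    best_gain, best_pos = -1.0, None
    left_p = left_u = 0
    for i in range(n - 1):
        # Move point i to the left side of the split.
        if pts[i][1] == 'P':
            left_p += 1
        else:
            left_u += 1
        left_n, right_n = i + 1, n - i - 1
        gain = base - (left_n / n) * entropy(left_p, left_u) \
                    - (right_n / n) * entropy(total_p - left_p, total_u - left_u)
        if gain > best_gain:
            best_gain = gain
            best_pos = (pts[i][0] + pts[i + 1][0]) / 2   # the "halfway point"
    return best_gain, best_pos
```

For example, a perfectly separated order line such as [(0.0,'P'), (0.1,'P'), (0.9,'U'), (1.0,'U')] yields an information gain of 1.0 with the split at 0.5.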
We do this for every sonogram in D, including the one from which the candidate was culled. This explains why in each order line at least one subsequence is at zero (c.f. Figs. 2 and 12). In minCKdist() (Table 6), we use the CK measure (Campana and Keogh 2010) as the distance measure between a candidate fingerprint and a generally much longer sonogram.
In Fig. 11 we show a trace of the brute force algorithm on the Atlanticus dorsalis problem.
Note that the search continues even after an information gain of one is achieved in order to break ties. The 1,377,800 calls to the CK function dominate the overall cost of the search algorithm (99 % of the CPU time is spent on this) and require approximately 8 h. This is not an unreasonable amount of time, considering the several days of effort needed for an entomologist to collect the data in the field. However, this is a tiny dataset. We wish to examine datasets that are orders of magnitude larger. Thus, in the next section we consider speedup techniques.
Admissible Entropy Pruning
The most expensive computation in the brute force search algorithm is obtaining the distances between the candidates and their nearest matching subsequences in each of the objects in the dataset. The information gain computations (including the tie-breaking computations) are inconsequential in comparison. Therefore, our intuition in speeding up the brute force algorithm is to eliminate as many distance computations as possible.
Recall that in our algorithm, we have to obtain the annotated linear ordering of all the candidates in P. As we are incrementally doing this, we may notice that a particular candidate looks unpromising. Perhaps when we are measuring the distance from the current candidate to the first object in U we find that it is a small number (recall that we want the distances to P to be small and to U large), and when we measure the distance to the next object in U we again find it to be small. Must we continue to test this unpromising candidate? Fortunately, the answer may be “no”: under some circumstances we can admissibly prune unpromising fingerprints without having to check all the objects in the universe U.
The key observation is that we can cheaply compute the upper bound of the current partially computed linear ordering at any time. If the upper bound we obtain is less than the best-so-far information gain (i.e., the bsf_Gain of Table 4), we can simply eliminate the remaining distance computations in U and prune this particular fingerprint candidate from consideration.
To illustrate this pruning policy, we consider a concrete example. Suppose that during a search the best-so-far information gain is currently 0.590, and we are incrementally beginning to compute the sound fingerprint shown in Fig. 2. Assume that the partially computed linear ordering is shown in Fig. 12. We have computed the distances to all five objects in P, and to the first two objects in U.
Is it possible that this candidate will yield a score better than our best-so-far? It is easy to see that the most optimistic case (i.e., the upper bound) occurs if all of the remaining objects in U map to the far right, as we illustrate in Fig. 13.
Note that all of the objects on the left side of the split point are from P, while the objects on the right side are a mixture of objects from P and U. From these counts we can compute the entropy of the hypothetical order line shown in Fig. 13.
Therefore, the best possible information gain we could obtain from the example shown in Fig. 12 is just 0.612, which is lower than the best-so-far information gain. In this case, we do not have to consider the ordering of the remaining objects in U. In this toy example, we have only pruned two invocations of the CheckCandidates() subroutine shown in Table 5. However, as we shall see, this simple idea can prune more than 95 % of the calculations for more realistic problems.
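The upper-bound test can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the paper's EntropyUBPrune() implementation: the order line is a list of (distance, label) pairs, and the remaining unevaluated objects of U are optimistically placed at infinity, i.e., to the right of any split point:

```python
import math

def _entropy(p, u):
    """Entropy of a set with p objects of class P and u objects of class U."""
    n = p + u
    if p == 0 or u == 0:
        return 0.0
    return -(p / n) * math.log2(p / n) - (u / n) * math.log2(u / n)

def gain_upper_bound(partial_line, remaining_u):
    """partial_line: (distance, label) pairs computed so far;
    remaining_u: count of objects in U not yet measured.
    Returns the best information gain this candidate could still achieve."""
    # Optimistic case: every remaining U object maps to the far right.
    pts = sorted(partial_line) + [(math.inf, 'U')] * remaining_u
    n = len(pts)
    tp = sum(1 for _, lab in pts if lab == 'P')
    tu = n - tp
    base = _entropy(tp, tu)
    lp = lu = 0
    best = 0.0
    for i in range(n - 1):           # test every split between adjacent points
        lp += pts[i][1] == 'P'
        lu += pts[i][1] == 'U'
        gain = base - ((i + 1) / n) * _entropy(lp, lu) \
                    - ((n - i - 1) / n) * _entropy(tp - lp, tu - lu)
        best = max(best, gain)
    return best

# A candidate is pruned when gain_upper_bound(...) <= bsf_Gain.
```

Because the bound is an optimistic best case, pruning on it can never discard a candidate that would have beaten the best-so-far; the search remains exact.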
The formal algorithm of admissible entropy pruning is shown in Table 7. After the very first sound fingerprint candidate check, for all the remaining candidates, we can simply insert EntropyUBPrune() in line 4 of Table 5, and eliminate the remaining CK distance and information gain computations if the current candidate satisfies the pruning condition, as discussed in this section. EntropyUBPrune() takes the best-so-far information gain, the current distance ordering of objects from class P and class U, and the remaining objects in U; it returns the fraction of the distance measurements actually computed, which tells us how much computation was eliminated.
We can get a hint as to the utility of this optimization by revisiting the Atlanticus dorsalis problem we considered above. Figure 14 shows the difference entropy pruning makes in this problem.
Note that not only does the algorithm terminate earlier (with the exact same answer), but it converges faster, a useful property if we wish to consider the algorithm in an anytime framework (Xi et al. 2008).
Euclidean Distance Ordering Heuristic
In both the brute force algorithm and the entropy-based pruning extension introduced in the last section, we generate and test candidates from left to right and top to bottom, based on the given lexical order of the objects’ labels (i.e., the file names used by the entomologist).
There are clearly other possible orders we could use to search, and it is equally clear that for entropy-based pruning, some orders are better than others. In particular, if we find a candidate which has a relatively high information gain early in the search, our pruning strategy can prune much more effectively.
However, this idea appears to open a “chicken and egg” paradox: how can we know the best order until we have finished the search? Clearly, we cannot. However, we do not need to find the optimal ordering; we just need to encounter a relatively good candidate relatively early in the search. Table 8 outlines our idea to achieve this. We simply run the entire brute force search using the Euclidean distance as a proxy for the CK distance, and sort the candidates based on the information gain achieved using the Euclidean distance.
Concretely, we can insert EuclideanOrder() between lines 4 and 5 in Table 4 to obtain a better ordering to check all the candidates.
Running this preprocessing step adds some overhead; however, it is inconsequential because the Euclidean distance is at least two orders of magnitude faster than the CK distance calculation. For this idea to work well, the Euclidean distance must be a good proxy for the CK distance calculation. To see if this is the case, we randomly extracted 1,225 pairs of insect sounds and measured the distance between them under both measures, using the two values to plot points in a 2-D scatter plot, as shown in Fig. 15. The results suggest that Euclidean distance is a good surrogate for CK distance.
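The proxy idea can be sketched as follows. This is a minimal Python illustration under our own assumptions (equal-sized sonogram patches represented as 2-D lists of magnitudes; the names euclidean_patch_dist and proxy_order are ours, not the paper's):

```python
def euclidean_patch_dist(a, b):
    """Cheap proxy for the CK distance between two equal-sized sonogram
    patches, each given as a 2-D list of spectrogram magnitudes."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b)) ** 0.5

def proxy_order(candidates, proxy_gain):
    """Sort candidates by their Euclidean-proxy information gain, descending,
    so that promising candidates are scored under the expensive CK measure
    first and the best-so-far gain rises quickly, enabling more pruning."""
    return sorted(candidates, key=proxy_gain, reverse=True)
```

The reordering changes only the order in which candidates are examined, never which candidates are examined, so the final answer is identical to that of the brute force search.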
To measure the effect of this reordering heuristic we revisited our running example shown in Figs. 11 and 14.
The Euclidean distance reordering heuristic is shown in Fig. 16.
As we can see, our heuristic has two positive effects. First, the absolute time to finish (with the identical answer as the brute force search) is significantly decreased. Second, we converge on a high-quality solution faster. This is a significant advantage if we want to cast the search problem as an anytime algorithm (Xi et al. 2008).
Scalability of Fingerprint Discovery
In the experiments shown above, where our toy example had only ten objects in each of P and U, we showed a speedup of about a factor of five, although we claimed this is pessimistic because we expect to be able to prune more aggressively with larger datasets. To test this, we reran these experiments with a more realistically sized U, containing 200 objects from other insects, birds, trains, helicopters, etc. As shown in Fig. 17, the speedup achieved by our reordering optimization algorithm is a factor of 93 in this case.
In essence, we believe that our algorithm is fast enough for most practical applications. In particular, the time taken by our algorithm will typically be dwarfed by the time taken to collect the data in the field or to type up field notes.
Cite this article
Hao, Y., Campana, B. & Keogh, E. Monitoring and Mining Animal Sounds in Visual Space. J Insect Behav 26, 466–493 (2013). https://doi.org/10.1007/s10905-012-9361-5