Monitoring and Mining Animal Sounds in Visual Space

Journal of Insect Behavior

Abstract

Monitoring animals by the sounds they produce is an important and challenging task, whether the application is outdoors in a natural habitat, or in the controlled environment of a laboratory setting. In the former case, the density and diversity of animal sounds can act as a measure of biodiversity. In the latter case, researchers often create control and treatment groups of animals, expose them to different interventions, and test for different outcomes. One possible manifestation of different outcomes may be changes in the bioacoustics of the animals. With such a plethora of important applications, there have been significant efforts to build bioacoustic classification tools. However, we argue that most current tools are severely limited. They often require the careful tuning of many parameters (and thus huge amounts of training data), are either too computationally expensive for deployment in resource-limited sensors, specialized for a very small group of species, or are simply not accurate enough to be useful. In this work we introduce a novel bioacoustic recognition/classification framework that mitigates or solves all of the above problems. We propose to classify animal sounds in the visual space, by treating the texture of their sonograms as an acoustic fingerprint using a recently introduced parameter-free texture measure as a distance measure. We further show that by searching for the most representative acoustic fingerprint, we can significantly outperform other techniques in terms of speed and accuracy.

References

  • Bardeli R (2009) Similarity search in animal sound databases. IEEE Trans Multimed 11(1):68–76

  • Beiderman Y, Azani Y, Cohen Y, Nisankoren C, Teicher M, Mico V, Garcia J, Zalevsky Z (2010) Cleaning and quality classification of optically recorded voice signals. Recent Patents on Signal Processing, pp 6–11

  • Bianconi F, Fernandez A (2007) Evaluation of the effects of Gabor filter parameters on texture classification. Pattern Recogn 40(12):3325–35

  • Blumstein DT (2011) Acoustic monitoring in terrestrial environments using microphone arrays: applications, technological considerations, and prospectus. J Appl Ecol 48:758–767

  • Brown JC, Smaragdis P (2009) Hidden Markov and Gaussian mixture models for automatic call classification. J Acoust Soc Am 125(4):EL221–EL224

  • Campana BJL, Keogh EJ (2010) A compression-based distance measure for texture. Stat Anal Data Min 3(6):381–398

  • Dang T, Bulusu N, Feng WC, Hu W (2010) RHA: A Robust Hybrid Architecture for Information Processing in Wireless Sensor Networks. In 6th ISSNIP

  • Dietrich C, Schwenker F, Palm G (2001) Classification of time series utilizing temporal and decision fusion. Proceedings of Multiple Classifier Systems (MCS), LNCS 2096, pp 378–387, Cambridge

  • Desutter-Grandcolas L (1998) First Analysis of a Disturbance Stridulation in Crickets, Brachytrupes tropicus (Orthoptera: Grylloidea: Gryllidae). J Insect Behav 11

  • Elliott L, Hershberger W (2007) The songs of insects. Houghton-Mifflin Company

  • Fu A, Keogh EJ, Lau L, Ratanamahatana CA, Wong RCW (2008) Scaling and time warping in time series querying. VLDB J 17(4):899–921

  • Han NC, Muniandy SV, Dayou J (2011) Acoustic classification of Australian anurans based on hybrid spectral-entropy approach. Appl Acoust 72(9):639–645

  • Hao Y (2011) Animal sound fingerprint Webpage. www.cs.ucr.edu/~yhao/animalsoundfingerprint.html

  • Holy TE, Guo Z (2005) Ultrasonic songs of male mice. PLoS Biol 3:e386

  • Jang Y, Gerhardt HC (2007) Temperature effects on the temporal properties of calling songs in the crickets Gryllus fultoni and G.vernalis: implications for reproductive isolation in sympatric populations. J Insect Behav 20(1)

  • Keogh EJ, Lonardi S, Ratanamahatana CA, Wei L, Lee S, Handley J (2007) Compression-based data mining of sequential data. DMKD 14(1):99–129

  • Kogan JA, Margoliash D (1998) Automated recognition of bird song elements from continuous recordings using dynamic time warping and hidden Markov models: a comparative study. J Acoust Soc Am 103(4):2185–2196

  • Li M, Chen X, Li X, Ma B, Vitanyi P (2003) The similarity metric. Proc. of the 14th Symposium on Discrete Algorithms, pp 863–872

  • MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press. pp 281–297. MR0214227. Zbl 0214.46201

  • Mankin RW, Hagstrum DW, Smith MT, Roda AL, Kairo MTK (2011) Perspective and promise: a century of insect acoustic detection and monitoring. Amer Entomol 57:30–44

  • Marcarini M, Williamson GA, de Garcia LS (2008) Comparison of methods for automated recognition of avian nocturnal flight calls. ICASSP 2008:2029–32

  • Mellinger DK, Clark CW (2000) Recognizing transient low-frequency whale sounds by sonogram correlation. J Acoust Soc Am 107(6):3518–29

  • Mitrovic D, Zeppelzauer M, Breiteneder C (2006) Discrimination and retrieval of animal sounds. In Proc. of IEEE Multimedia Modelling Conference, Beijing, China, 339–343

  • Celis-Murillo A, Deppe JL, Allen MF (2009) Using soundscape recordings to estimate bird species abundance, richness, and composition. J Field Ornithol 80:64–78

  • Nowak DJ, Pasek JE, Sequeira RA, Crane DE, Mastro VC (2001) Potential effect of Anoplophora glabripennis on urban trees in the United States. J Econ Entomol 94(1):116–122

  • Panksepp JB, Jochman KA, Kim JU, Koy JJ, Wilson ED, Chen Q, Wilson CR, Lahvis GP (2007) Affiliative behavior, ultrasonic communication and social reward are influenced by genetic variation in adolescent mice. PLoS ONE 2(4):e351

  • Riede K, Nischk F, Thiel C, Schwenker F (2006) Automated annotation of Orthoptera songs: first results from analysing the DORSA sound repository. J Orthoptera Res 15(1):105–113

  • Roach J (2006) Cricket, katydid songs are best clues to species’ identities. National Geographic News. news.nationalgeographic.com/news/2006/09/060905-crickets.html

  • Schmidt AKD, Riede K, Römer H (2011) High background noise shapes selective auditory filters in a tropical cricket. J Exp Biol 214:1754–1762

  • Wells MM, Henry CS (1998) Songs, reproductive isolation, and speciation in cryptic species of insect: a case study using green lacewings. In Endless Forms: species and speciation, Oxford Univ. Press, NY

  • Xi X, Ueno K, Keogh EJ, Lee DJ (2008) Converting non-parametric distance-based classification to anytime algorithms. Pattern Anal Appl 11(3–4):321–36

  • Yu G, Slotine JJ (2009) Audio classification from time-frequency texture. IEEE ICASSP, pp 1677–80

Acknowledgments

We thank the Cornell Lab of Ornithology for donating much of the data used in this work (Macaulay Library, www.macaulaylibrary.org/index.do), Jesin Zakaria for help projecting snippets of sounds into 2D space, and Dr. Agenor Mafra-Neto for entomological advice.

Correspondence to Yuan Hao.

Appendix A: Speeding Up the Search for Sound Fingerprints

In the main text, we defined sound fingerprints concretely, but did not discuss how we find them. We have relegated that discussion to this appendix to enhance the flow of the paper.

As we noted in the main text, there may be a huge number of candidate sound fingerprints to examine and score; thus, we must provide an efficient mechanism for finding the best one for a given species. For ease of exposition, we begin by describing the brute force algorithm for finding the sound fingerprint for a given species and later consider some techniques to speed this algorithm up.

A Brute-Force Algorithm

For concreteness, let us consider the following small dataset, which we will also use as a running example to explain our search algorithms in the following sections. We created a small dataset with P containing ten two-second sound files from Atlanticus dorsalis (Gray shieldback), and U containing ten two-second sound files from other random insects. If we just consider fingerprints of length 16 (i.e., Lmin = Lmax = 16), then even in this tiny dataset there are 830 candidate fingerprints to be tested, requiring 1,377,800 calls to the CK distance function.

The brute force algorithm is described in Table 4. We are given a dataset D, in which each sound sequence is labeled either class P or class U, and a user-defined length range Lmin to Lmax (optional: we default to the range 16 to infinity).

Table 4 Brute-force sound fingerprint discovery

The algorithm begins by initializing bsf_Gain, a variable to track the best candidate encountered thus far, to zero in line 1. Then all possible sound fingerprint candidates Sk,l for all legal subsequence lengths are generated in the nested loops in lines 2, 4, and 5 of the algorithm.

As each candidate Sk,l is generated, the algorithm checks how well it can be used to separate objects into class P and class U (lines 2 to 9), as illustrated in Fig. 2. To achieve this, in line 6, the algorithm calls the subroutine CheckCandidates() to compute the information gain for each possible candidate. If the information gain is larger than the current value of bsf_Gain, the algorithm updates bsf_Gain and the corresponding sound fingerprint in lines 7 to 9. The candidate checking subroutine is outlined in the algorithm shown in Table 5.
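The overall search of Tables 4 and 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (check_candidate, brute_force_fingerprint) and the caller-supplied distance function are our own, and a real deployment would compare a candidate fingerprint against full sonograms with the CK measure rather than compare the toy objects used here.

```python
import math

def entropy(n_p, n_u):
    # Shannon entropy of a group containing n_p objects of class P and n_u of class U
    total = n_p + n_u
    if total == 0 or n_p == 0 or n_u == 0:
        return 0.0
    p, u = n_p / total, n_u / total
    return -(p * math.log2(p) + u * math.log2(u))

def check_candidate(candidate, dataset, dist):
    # Build the order line: distance from the candidate to every labeled object,
    # then test the |D|-1 split positions between adjacent points (cf. Table 5)
    order = sorted((dist(candidate, obj), label) for obj, label in dataset)
    n = len(order)
    parent = entropy(sum(l == 'P' for _, l in order),
                     sum(l == 'U' for _, l in order))
    best = 0.0
    for i in range(n - 1):
        left = [l for _, l in order[:i + 1]]
        right = [l for _, l in order[i + 1:]]
        gain = parent \
            - (len(left) / n) * entropy(left.count('P'), left.count('U')) \
            - (len(right) / n) * entropy(right.count('P'), right.count('U'))
        best = max(best, gain)
    return best

def brute_force_fingerprint(dataset, candidates, dist):
    # Table 4: track the best-so-far information gain over all candidates
    bsf_gain, fingerprint = 0.0, None
    for cand in candidates:
        gain = check_candidate(cand, dataset, dist)
        if gain > bsf_gain:
            bsf_gain, fingerprint = gain, cand
    return fingerprint, bsf_gain
```

For instance, with a toy dataset whose objects are scalars and whose distance is the absolute difference, a candidate drawn from class P that is far from every object in U perfectly separates the two classes and achieves an information gain of one.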

Table 5 Check the utility of single candidate

In the subroutine CheckCandidates(), shown in Table 5, we compute the order line L according to the distance from the sound sequence to the candidate computed in minCKdist() procedure, which is shown in Table 6. In essence, this is the procedure illustrated in Fig. 2. Given L, we can find the optimal split point (definition 8) in lines 10 to 15 by calculating all possible splitting points and recording the best.

Table 6 Compute minimum subsequence CK distance

While the splitting point can be any point on the positive real number line, we note that the information gain cannot change in the region between any two adjacent points. Thus, we can exploit this fact to produce a finite set of possible split positions. In particular, we need only test |D|-1 locations.

In the subroutine CheckCandidates(), this is achieved by only checking the mean value (the “halfway point”) of each pair of adjacent points in the distance ordering as the possible positions for the split point. In CheckCandidates(), we call the subroutine minCKdist() to find the best matching subsequence for a given candidate under consideration.

We do this for every sonogram in D, including the one from which the candidate was culled. This explains why in each order line at least one subsequence is at zero (c.f. Figs. 2 and 12). In minCKdist() (Table 6), we use the CK measure (Campana and Keogh 2010) as the distance measure between a candidate fingerprint and a generally much longer sonogram.
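The behavior of minCKdist() can be sketched as a sliding-window minimum. Here a caller-supplied dist stands in for the CK measure (which is compression-based and operates on sonogram images); the function name and signature are illustrative only.

```python
def min_subsequence_dist(candidate, sonogram, dist):
    # Slide the fixed-width candidate across the (generally much longer)
    # sonogram and return the distance of the best-matching window (cf. Table 6).
    # A candidate culled from this very sonogram matches itself at distance zero.
    w = len(candidate)
    assert len(sonogram) >= w, "sonogram must be at least as long as the candidate"
    return min(dist(candidate, sonogram[i:i + w])
               for i in range(len(sonogram) - w + 1))
```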

In Fig. 11 we show a trace of the brute force algorithm on the Atlanticus dorsalis problem.

Fig. 11

A trace of the value of the bsf_Gain variable during brute force search on the Atlanticus dorsalis dataset. Only sound fingerprints of length 16 are considered here for simplicity

Note that the search continues even after an information gain of one is achieved in order to break ties. The 1,377,800 calls to the CK function dominate the overall cost of the search algorithm (99 % of the CPU time is spent on this) and require approximately 8 h. This is not an unreasonable amount of time, considering the several days of effort needed for an entomologist to collect the data in the field. However, this is a tiny dataset. We wish to examine datasets that are orders of magnitude larger. Thus, in the next section we consider speedup techniques.

Admissible Entropy Pruning

The most expensive computation in the brute force search algorithm is obtaining the distances between the candidates and their nearest matching subsequences in each of the objects in the dataset. The information gain computations (including the tie-breaking computations) are inconsequential in comparison. Therefore, our intuition in speeding up the brute force algorithm is to eliminate as many distance computations as possible.

Recall that in our algorithm, we have to obtain the annotated linear ordering of all the candidates in P. As we are incrementally doing this, we may notice that a particular candidate looks unpromising. Perhaps when we are measuring the distance from the current candidate to the first object in U we find that it is a small number (recall that we want the distances to P to be small and to U large), and when we measure the distance to the next object in U we again find it to be small. Must we continue testing this unpromising candidate? Fortunately, the answer may be “no”. Under some circumstances we can admissibly prune unpromising fingerprints without having to check all the objects in the universe U.

The key observation is that we can cheaply compute an upper bound on the information gain of the current partially computed linear ordering at any time. If the upper bound we obtain is less than the best-so-far information gain (i.e., the bsf_Gain of Table 4), we can simply eliminate the remaining distance computations in U and prune this particular fingerprint candidate from consideration.

To illustrate this pruning policy, we consider a concrete example. Suppose that during a search the best-so-far information gain is currently 0.590, and we are incrementally beginning to compute the sound fingerprint shown in Fig. 2. Assume that the partially computed linear ordering is shown in Fig. 12. We have computed the distances to all five objects in P, and to the first two objects in U.

Fig. 12

The order line of all the objects in P and just the first two objects in U

Is it possible that this candidate will yield a score better than our best-so-far? It is easy to see that the most optimistic case (i.e., the upper bound) occurs if all of the remaining objects in U map to the far right, as we illustrate in Fig. 13.

Fig. 13

The logically best possible order line based on the distances that have been calculated in Fig. 12. The best split point is shown by the yellow/heavy line

Note that all of the objects on the left side of the split point are from P, while the right side contains a mixture of objects from P and U. Given this, the entropy of the hypothetical order line shown in Fig. 13 can be computed directly.

Therefore, the best possible information gain we could obtain from the example shown in Fig. 12 is just 0.612, which is lower than the best-so-far information gain. In this case, we do not have to consider the ordering of the remaining objects in U. In this toy example, we have only pruned two invocations of the CheckCandidates() subroutine shown in Table 5. However, as we shall see, this simple idea can prune more than 95 % of the calculations for more realistic problems.
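The upper-bound computation just illustrated can be sketched as follows. This is our own illustrative code, not the paper's EntropyUBPrune(): it optimistically appends the unseen objects of U to the far right of the partial order line (as in Fig. 13) and returns the best information gain that arrangement could achieve; the entropy helper is repeated here so the sketch is self-contained.

```python
import math

def entropy(n_p, n_u):
    # Shannon entropy of a group containing n_p objects of class P and n_u of class U
    total = n_p + n_u
    if total == 0 or n_p == 0 or n_u == 0:
        return 0.0
    p, u = n_p / total, n_u / total
    return -(p * math.log2(p) + u * math.log2(u))

def gain_upper_bound(partial_order, n_remaining_u):
    # Upper-bound the information gain of a partially computed order line
    # (a list of (distance, label) pairs) by optimistically placing the
    # n_remaining_u unseen objects of U beyond the largest observed distance
    far = max(d for d, _ in partial_order) + 1.0
    order = sorted(partial_order) + [(far, 'U')] * n_remaining_u
    n = len(order)
    parent = entropy(sum(l == 'P' for _, l in order),
                     sum(l == 'U' for _, l in order))
    best = 0.0
    for i in range(n - 1):
        left = [l for _, l in order[:i + 1]]
        right = [l for _, l in order[i + 1:]]
        gain = parent \
            - (len(left) / n) * entropy(left.count('P'), left.count('U')) \
            - (len(right) / n) * entropy(right.count('P'), right.count('U'))
        best = max(best, gain)
    return best
```

If the returned bound does not exceed bsf_Gain, the remaining CK distance computations for this candidate can be skipped with no risk of missing the optimal fingerprint.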

The formal algorithm for admissible entropy pruning is shown in Table 7. After the very first sound fingerprint candidate check, for all remaining candidates we can simply insert EntropyUBPrune() at line 4 of Table 5, eliminating the remaining CK distance and information gain computations whenever the current candidate satisfies the pruning condition discussed in this section. EntropyUBPrune() takes the best-so-far information gain, the current distance ordering from classes P and U, and the remaining objects in U, and returns the fraction of distance measurements actually computed, so that we can see how much elimination was achieved.

Table 7 Entropy upper bound pruning

We can get a hint as to the utility of this optimization by revisiting the Atlanticus dorsalis problem we considered above. Figure 14 shows the difference entropy pruning makes in this problem.

Fig. 14

A trace of the bsf_Gain variable during brute force and entropy pruning search on the Atlanticus dorsalis dataset

Note that not only does the algorithm terminate earlier (with the exact same answer), but it converges faster, a useful property if we wish to consider the algorithm in an anytime framework (Xi et al. 2008).

Euclidean Distance Ordering Heuristic

In both the brute force algorithm and the entropy-based pruning extension introduced in the last section, we generate and test candidates from left to right and top to bottom, based on the given lexical order of the objects’ labels (i.e., the file names used by the entomologist).

There are clearly other possible orders we could use to search, and it is equally clear that for entropy-based pruning, some orders are better than others. In particular, if we find a candidate which has a relatively high information gain early in the search, our pruning strategy can prune much more effectively.

However, this idea appears to open a “chicken and egg” paradox: how can we know the best order until we have finished the search? Clearly, we cannot. However, we do not need to find the optimal ordering; we just need to encounter a relatively good candidate relatively early in the search. Table 8 outlines our idea to achieve this. We simply run the entire brute force search using the Euclidean distance as a proxy for the CK distance, and sort the candidates based on the information gain achieved under the Euclidean distance.

Table 8 Euclidean distance measure order pruning

Concretely, we can insert EuclideanOrder() between lines 4 and 5 in Table 4 to obtain a better ordering to check all the candidates.

Running this preprocessing step adds some overhead; however, it is inconsequential because the Euclidean distance is at least two orders of magnitude faster than the CK distance calculation. For this idea to work well, the Euclidean distance must be a good proxy for the CK distance calculation. To see if this is the case, we randomly extracted 1,225 pairs of insect sounds and measured the distance between them under both measures, using the two values to plot points in a 2-D scatter plot, as shown in Fig. 15. The results suggest that Euclidean distance is a good surrogate for CK distance.
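The reordering heuristic can be sketched as a cheap preprocessing pass. The names euclidean and euclidean_order are ours, not the paper's; score_with stands in for the information-gain check of Table 5, evaluated with the fast proxy distance instead of the CK measure.

```python
import math

def euclidean(a, b):
    # cheap proxy distance between two equal-length feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def euclidean_order(candidates, score_with):
    # Rank candidates by the information gain they achieve under the Euclidean
    # proxy, best first, so that entropy pruning sees a strong bsf_Gain early
    # (cf. Table 8); the expensive CK-based search then visits them in this order
    scored = [(score_with(c, euclidean), c) for c in candidates]
    scored.sort(key=lambda t: -t[0])
    return [c for _, c in scored]
```

Because the proxy only determines the visiting order, a poor Euclidean score cannot cause a good candidate to be missed; it can only delay the moment at which that candidate is scored exactly.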

Fig. 15

The relationship between Euclidean and CK distance for 1,225 pairs of sonograms

To measure the effect of this reordering heuristic we revisited our running example shown in Figs. 11 and 14.

The effect of the Euclidean distance reordering heuristic is shown in Fig. 16.

Fig. 16

A trace of the value of the bsf_Gain variable during brute force, entropy pruning, and reordering-optimized search on the Atlanticus dorsalis dataset

As we can see, our heuristic has two positive effects. First, the absolute time to finish (with an answer identical to that of the brute force search) has significantly decreased. Secondly, we converge on a high-quality solution faster. This is a significant advantage if we wanted to cast the search problem as an anytime algorithm (Xi et al. 2008).

Scalability of Fingerprint Discovery

In the experiments shown above, where our toy example had only ten objects in each of P and U, we achieved a speedup of about a factor of five, although we claimed this is pessimistic because we expect to be able to prune more aggressively with larger datasets. To test this, we reran these experiments with a more realistically-sized U, containing 200 objects from other insects, birds, trains, helicopters, etc. As shown in Fig. 17, the speedup achieved by our reordering optimization algorithm is a factor of 93 in this case.

Fig. 17

A trace of the value of the bsf_Gain variable during brute force, entropy pruning, and reordering-optimized search on the Atlanticus dorsalis dataset with the 200-object universe

In essence, we believe that our algorithm is fast enough for most practical applications. In particular, the time taken by our algorithm will typically be dwarfed by the time taken to collect the data in the field or to type up field notes.

Cite this article

Hao, Y., Campana, B. & Keogh, E. Monitoring and Mining Animal Sounds in Visual Space. J Insect Behav 26, 466–493 (2013). https://doi.org/10.1007/s10905-012-9361-5
