Background & Summary

Advances in information and communication technologies have enabled the development of precision livestock farming (PLF) systems with the potential to improve farm operational efficiency and animal welfare1,2. Over the last three decades, PLF has grown substantially, attracting farmers, operators and industries around the world3,4. New PLF developments include methodologies for the individual monitoring of livestock feeding behavior, which can be used to detect changes in animal welfare and provide direct insights into animal nutrition, health and performance5,6,7. Wearable sensors are the most common data acquisition method for monitoring feeding behavior8,9. Accelerometers and inertial measurement units measure head and neck movements and have been used mainly in confined environments10,11. Acoustic sensors are typically preferred over motion sensors in free-ranging conditions12 to classify different animal jaw movements (JMs)13,14,15,16,17 and feeding behaviors18,19. Furthermore, distinguishing different types of JMs is useful for delimiting grazing and rumination bouts20, estimating dry matter intake, and discriminating different feedstuffs and plants21,22.

The acoustic monitoring of foraging behavior is an engineering task that requires robust solutions capable of tolerating noise, interference and disturbance12. The opportunities to use acoustic methods for practical farm-level management and animal research are ample23, but the limited availability of public, open acoustic datasets could hinder new and relevant research24. To the best of our knowledge, there are only two open datasets of cattle sounds. The first contains 52 audio recordings of JMs of dairy cows grazing two contrasting forage species at two sward heights25. The other provides 270 samples of cattle calls, also called cattle vocalizations26,27.

This work presents a dataset of audio recordings of chewing and biting sounds of dairy cows, along with their corresponding event identification labels. The dataset is organized into three groups. (i) It includes 24-h audio recordings of continuously monitored dairy cows grazing on pasture or visiting the dairy milking barn. A total of 708.1 h were recorded, of which 462.0 h correspond to sounds registered in a free-range pasture environment. Annotations of the grazing and rumination bouts are provided for each cow. Periods during which the dairy cows were inside the dairy barn are also indicated. (ii) It contains two audio files of 54.6 min of grazing and 30.0 min of rumination, with the corresponding JM labels. Experts identified and labeled 4,221 ingestive JMs and 2,006 rumination JMs produced during grazing and rumination, respectively. (iii) It provides a comprehensive description of the different types of JMs and animal behaviors, and specific information about the audio recordings. The dataset presented here has previously been used to create automatic machine-learning algorithms for detecting and classifying different JMs28,29,30 and for classifying grazing and rumination activities25,31,32,33. This dataset could be used to improve the recognition rate, generalization ability, and noise robustness of existing algorithms34, as well as to develop novel algorithms that combine acoustic signals with other sources of information35.

Methods

The field study took place from July 31 to August 19, 2014, and was conducted at the W. K. Kellogg Biological Station’s Pasture Dairy Research Center of Michigan State University, located in Hickory Corners, Michigan, US (GPS coordinates 42°24′21.8″ N 85°24′08.4″ W). The procedures for animal handling and care were reviewed and approved by the Institutional Animal Care and Use Committee of Michigan State University (#02/17–020–00) before the start of the experiment. As described by Watt et al.36, animals were managed on a grazing-based platform with free access to the robotic milking system. Voluntary milking (3.0 ± 1.0 daily milkings) was conducted using two Lely A3-Robotic milking units (Lely Industries NV, Maassluis, The Netherlands). Milking permissions were set by a minimum expected milk yield of 9.1 kg or a 6-h milking interval; milking frequency therefore varied across cows according to milk yield. Dairy cows were fed a grain-based concentrate at 1 to 6 kg per delivery, according to the amount of extracted milk (daily maximum of 12 kg/cow), during milking and through automatic feeders located inside the dairy milking barn. The net energy for lactation (NEL), neutral detergent fiber (NDF), and average crude protein (CP) content of the grain-based concentrate pellet supplied (Cargill Inc., Big Lake, MN) were 2.05 Mcal/kg dry matter (DM), 99.4 g/kg DM, and 193.0 g/kg DM, respectively. Cows were allowed 24-h access to grazing paddocks with a predominance of orchardgrass (Dactylis glomerata), tall fescue (Lolium arundinaceum) and white clover (Trifolium repens), or perennial ryegrass (Lolium perenne) and white clover. Two allocations of ~15 kg DM/cow of fresh pasture were offered daily, from 10:00 to 22:00 and from 22:00 to 10:00 (GMT-5), resulting in an average daily offer of ~30 kg DM/cow. Allocations of fresh ungrazed pasture were made available at opposite sides of the farm (south and north) to entice cow traffic through the milking shed. Thirty plate-meter readings of sward height (SH, x, in cm) were taken along each paddock to estimate pre-grazing and post-grazing herbage biomass to ground level (Y, in kg DM/ha) using the calibration Y = 125x (r² = 0.96), an equation developed and verified for similar swards. Across the 16 paddocks used in this study, the average pre-grazing herbage biomass was 2387 ± 302 kg DM/ha (19.2 ± 2.5 cm SH) and the average post-grazing herbage biomass was 1396 ± 281 kg DM/ha (11.2 ± 2.2 cm SH). Composite hand-plucked samples from the 16 paddocks were used to determine the 48-h in vitro digestibility of DM (IVDMD) (Daisy II, Ankom Technology Corp.), the acid (ADF) and neutral (NDF) detergent fiber (Fiber Analyzer, Ankom Technology Corp., Fairport, NY), the crude protein (CP) (4010 CN Combustion, Costech Analytical Technologies Inc., Valencia, CA), and the acid detergent lignin (ADL) content of the consumed forages. The forage values of IVDMD, CP, NDF, ADF and ADL, expressed in g/kg DM, were 781 ± 30, 257 ± 20, 493 ± 45, 187 ± 25 and 33 ± 8, respectively.
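As a quick consistency check, applying this calibration to the average sward heights reported above reproduces the biomass means to within one standard deviation:

$${Y}_{pre}=125\times 19.2=2400\ \mathrm{kg\ DM/ha},\qquad {Y}_{post}=125\times 11.2=1400\ \mathrm{kg\ DM/ha}$$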

For this study, 5 lactating, high-producing, multiparous Holstein cows were selected from a herd of 146 Holstein cows and used to non-invasively acquire and record acoustic signals over 24-h periods. Specific characteristics of the individual cows are provided in Table 1. Individual 24-h audio recordings were conducted on July 31 and August 4, 6, 11, 13 and 18, 2014. Recordings were obtained following a 5 × 5 Latin-square design (Table 2) using 5 independent monitoring systems (halters, microphones and recorders) that were rotated daily across the 5 cows over 6 non-consecutive recording days. This design was chosen to control for variation in the sound data associated with a particular cow, recording system or experiment day. On the first day, each recording system was randomly assigned to a cow. On the sixth day, the recording systems were reassigned to the cows in the same order used on the first day. No training in the use of the recording systems was deemed necessary before study onset. Recording problems were encountered with recording system number 2. On the first day, the recording trial had to be stopped a few hours before completion because the recording system became unfastened from the cow; this trial was considered valid and was not repeated. On the sixth day, the recording system failed to register any sound because the microphone connector had become disconnected from the recorder; this trial was repeated on the next day (August 19) to complete the recordings of the sixth day. These changes in the order and completion of recording trials should be considered when treating trial day as a random variable in the experimental design. The weather conditions during the study were registered by the National Weather Service Station located at the Kellogg Biological Station (Table 3).

Table 1 Specific traits and description of the dairy cows used to acquire the audio recordings.
Table 2 Latin-square design for recording systems, cows and days.
Table 3 Weather conditions during audio recording trials.

Each recording system consisted of two directional electret microphones connected to the stereo input channels of a digital recorder (Sony Digital ICD-PX312, Sony, San Diego, CA, USA). The recorder was powered by a 1.5 V AAA alkaline battery and saved the data to a 4 GB micro secure digital (SD) card (SanDisk SDSDB-004G-B35 SDHC, Western Digital, Milpitas, CA, USA). This instrumentation was enclosed in a weatherproof protective case (1015 Micron Case Series, Pelican Products, Torrance, CA, USA) mounted on the top side of a halter neck strap (Fig. 1). One microphone was positioned facing inwards, pressed against the forehead of the animal to capture bone-transmitted vibrations, while the other microphone faced outwards to capture the sounds produced by the animal. To achieve better microphone contact, hair in the central forehead area was removed with a sharp clipper. The microphones were held in the desired position by rubber foam and an elastic headband attached to the halter. This design prevented microphone movement and insulated the microphones from environmental noise caused by wind, friction and scratching37,38.

Fig. 1
figure 1

Recording system used to acquire the acoustic signals, composed of inward- and outward-facing microphones (a). Wired microphones were covered by an elastic headband (b) and plugged (c) into a recorder housed inside a weatherproof case attached to the top side of a halter neck strap (d).

After the morning milking session, the study cows were automatically separated into a holding pen. They were then restrained using head lockers to install the recording systems, equipped with new batteries and empty SD cards. As each cow completed its 24 h of continuous recording, it was manually guided to the head lockers to remove the recording system. The date and relevant information on the recording systems and cows were kept in a logbook. A similar process was repeated on every trial day following the Latin-square design. At the beginning of each trial, the two microphones of each recording system were connected to the stereo input channels of the recorder in no particular order, and this assignment was not logged. Animal handlers with extensive experience in animal behavior, data collection and analysis directly observed the focal animals in blocks of ~5 min each hour. Foraging behavior observations, the time the equipment was turned on and other relevant parameters were documented in the logbook. The handlers also checked the correct placement and location of the recording systems on the cows. Observations were conducted at a distance from the animals to minimize behavioral disruption.

The label files were generated by two experts with extensive experience in animal behavior and the digital analysis of audio signals25,28,37,38,39,40. One expert performed the labeling and the other reviewed the results. The experts were guided by the logbook and used Audacity software (www.audacityteam.org) to inspect the sound waveforms and listen to the sounds in order to identify, classify and label the data into animal behavior categories. Annotations of interest that the experts could not identify acoustically, such as the installation and removal of the recording systems, were labeled using the logbook registers. Although the experts agreed on all label assignments, there were small differences in the start and/or end times (timestamps) of some labels. In those cases, both experts revised the labels together until they reached agreement. Additionally, as previously mentioned, the two microphones of each recording system were connected to the stereo input channels of the recorder in no particular order across the trials; as a consequence, the stereo channels are swapped across the audio recordings. To address this, the experts listened to segments of grazing activity and barn occupancy in all audio recordings and marked the one-to-one correspondence between the stereo channels and the two microphones (facing inwards and outwards of the forehead of the animal). However, establishing the proper microphone correspondence was not straightforward for some audio recordings, because the two channels were either too similar or varied too widely; in those cases, the experts decided the assignment by mutual agreement.

The 24-h recordings were registered in two settings: indoors, while cows visited the dairy milking barn, and outdoors, while cows had free access to grazing pasture. During the continuous acoustic monitoring, the animal handler annotated the rumination and feeding activities inside the milking barn in the logbook. However, the experts did not label these activities, because the acoustic noise in the audio recordings made it difficult to ensure their proper delimitation. The main focus of the experiment was to collect acoustic signals of foraging behavior while cows grazed in free-range conditions.

A total of 6,227 ingestive and rumination JMs were individualized, delimited and labeled by the experts, following the same approach and criteria used for labeling the animal behavior categories. This is a complex task that requires significant processing time and expertise in audio signal processing and inspection. Therefore, the start and end timestamps of the JMs are somewhat subjective and may deviate from the true bounds of the JMs in the audio files. To address this potential bias, an additional set of JM timestamps was generated using a Python script. The script automatically adjusts the start and end boundaries of the JMs defined by the experts, without changing the JM labels. Adjusted timestamps are determined from the sound intensity during the JMs when it exceeds a threshold level, which is defined from the sound intensity during the pauses between consecutive JMs (a sketch of this idea is given below).
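The adjustment script itself is not reproduced here; the following is a minimal sketch of the thresholding idea, assuming a mono signal x (NumPy array), sampling rate fs, and one expert-labeled (start, end) pair in seconds. The function name, padding and threshold rule are illustrative choices, not those of the original script:

```python
import numpy as np

def adjust_jm_bounds(x, fs, start_s, end_s, pad_s=0.1):
    """Refine an expert-labeled JM to the region where the short-term
    intensity exceeds a threshold estimated from the surrounding pause."""
    # Short-term intensity: squared signal smoothed over ~10 ms.
    win = max(1, int(0.01 * fs))
    intensity = np.convolve(x ** 2, np.ones(win) / win, mode="same")

    a, b = int(start_s * fs), int(end_s * fs)
    pad = int(pad_s * fs)
    lo, hi = max(0, a - pad), min(len(x), b + pad)

    # Threshold from the pause samples just outside the expert bounds.
    pause = np.concatenate([intensity[lo:a], intensity[b:hi]])
    if pause.size == 0:
        return start_s, end_s                    # nothing to estimate from
    threshold = pause.mean() + 2 * pause.std()   # illustrative rule

    # New bounds: first/last sample in the window above the threshold.
    above = np.flatnonzero(intensity[lo:hi] > threshold)
    if above.size == 0:
        return start_s, end_s                    # keep the expert bounds
    return (lo + above[0]) / fs, (lo + above[-1]) / fs
```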

Moreover, a pattern-recognition JM classifier algorithm39 was used to automatically create JM timestamps and labels for all grazing and rumination bouts of the daily recordings. The algorithm input was the channel corresponding to the inward-facing microphone of each audio recording, and the outputs were a series of label files. The algorithm labels three types of JMs: chews, bites and chew-bites. A post-processing step was then applied to obtain four types of JMs by dividing the chews into chews during grazing and chews during rumination (a sketch of this step follows). The JM label files were not verified by the experts; therefore, these files may contain detections of JMs that did not occur, missed JMs that did occur, and/or incorrect JM labels.
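Conceptually, this post-processing only re-labels chews according to the activity bout that contains them. A hedged sketch, assuming JM events and activity bouts are given as (start, end, label) tuples in seconds (the actual implementation may differ):

```python
def split_chews(jm_events, activity_bouts):
    """Relabel 'chew' events as grazing or rumination chews according to
    the activity bout containing the event midpoint (sketch only)."""
    out = []
    for start, end, label in jm_events:
        if label == "chew":
            mid = 0.5 * (start + end)
            for a_start, a_end, activity in activity_bouts:
                if a_start <= mid <= a_end and activity in ("grazing", "rumination"):
                    label = f"chew-{activity}"
                    break
        out.append((start, end, label))
    return out
```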

Data Records

The data are available at Figshare41. The audio recordings were saved in MPEG-1 Audio Layer III (MP3) format42 with a sampling rate of 44.1 kHz, providing a nominal recording bandwidth of 22 kHz and a dynamic range of 96 dB. The recordings were made in stereo, using one microphone per channel, with a resolution of 16 bits at 192 kbps. This configuration made it possible to store up to 48 h of audio on the SD card, with a battery autonomy of 55 h, ensuring the desired 24-h recording with a good margin. The digital recorder automatically splits the recording and starts a new MP3 file whenever the current file exceeds 6 h; thus, each 24-h audio recording is partitioned into 4 parts of approximately 6 h each.
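These capacity figures can be sanity-checked from the encoding parameters; a back-of-the-envelope computation (assuming decimal gigabytes and ignoring MP3 container overhead):

```python
BITRATE_BPS = 192_000                          # constant MP3 bitrate (192 kbps)
CARD_BYTES = 4 * 10**9                         # nominal 4 GB SD card

bytes_per_hour = BITRATE_BPS / 8 * 3600        # ~86.4 MB per recorded hour
hours_on_card = CARD_BYTES / bytes_per_hour    # ~46 h, close to the stated 48 h
size_24h_gb = 24 * bytes_per_hour / 10**9      # ~2.07 GB per 24-h trial

print(f"{hours_on_card:.1f} h per card, {size_24h_gb:.2f} GB per 24-h trial")
```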

The dataset is organized into three distinct groups (Fig. 2) as follows:

  1.

    Daily recordings: this group contains 30 ZIP files corresponding to the different recording trials of the study (6 days and 5 cows). Each ZIP file comprises ~24 h of audio recordings together with the corresponding activity label files and automatically generated JM label files (Fig. 2). A total of 708.1 h are included in the 133 audio recordings, consisting of 246.1 h registered indoors while cows visited the dairy milking barn and 462.0 h registered outdoors while cows remained at pasture. Each of the 133 activity label files lists timestamps indicating the start and end of the identified animal behaviors and other annotation remarks. Labels of animal behavior categories include grazing and rumination in standing and lying-down positions, among others. Other annotation labels indicate that the animal is in the barn and the times of installation and removal of the recording systems. The JM label files specify two types of information: (i) a list of timestamps indicating the start, end and type of each JM; (ii) a list of timestamps with the middle location and type of each JM (a minimal parsing sketch for these label files is given after this list).

  2.

    JMs: this group consists of a ZIP file containing 2 WAV audio files and the corresponding JM label files. The WAV files correspond to a grazing bout and a rumination bout extracted from channel 1 of the ‘D3RS4ID2909P3.mp3’ file, lasting 54.6 and 30.0 min, respectively. Each WAV file has three associated label files, each provided in two formats (TXT and CSV):

    • A file generated by the experts listing the timestamps (start and end) and the type of each JM.

    • A label file listing the middle location (single mark) and the type of each JM. This file was created using a Python script that computes the middle location as the average of the start and end specified by the experts.

    • A label file generated with a Python script listing the automatically adjusted timestamps (start and end) and the type of each JM as labeled by the experts.

    These label files are also provided for direct use with the “D3RS4ID2909P3.mp3” file.

  3.

    Additional information:

    • The ‘BehaviorLabelsDescription.pdf’ file provides a comprehensive description of animal behavior categories, including the registered annotations and the criteria used by the experts to determine the start and end of each behavior.

    • The ‘JMDescription.pdf’ file explains the marks and characteristics used to distinguish the different ingestive and rumination JMs produced during grazing and rumination activities, respectively.

    • The ‘MP3AudioInformation.xlsx’ file provides three worksheets with detailed information on the audio recordings: the corresponding trials of the Latin-square design (day, cow and recording system), date, audio duration, sound quality, registered animal behaviors, audio channels, and accompanying comments.
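Across these groups, the label files share a simple layout. A minimal reader, assuming the TXT files follow the tab-separated Audacity label-track format (start, end, label per line) and that the single-mark JM files carry the midpoint in both time fields (an assumption; check against the files themselves):

```python
import csv

def read_labels(path, delimiter="\t"):
    """Read an Audacity-style label file: start<TAB>end<TAB>label per line.
    For the CSV variants, call with delimiter=",". Single-mark JM files are
    assumed to repeat the midpoint in both time fields."""
    events = []
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter=delimiter):
            if len(row) >= 3:              # skip empty or malformed lines
                events.append((float(row[0]), float(row[1]), row[2].strip()))
    return events

# Hypothetical usage; the file name is illustrative:
# jms = read_labels("grazing_expert_labels.txt")
```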

Fig. 2
figure 2

Internal organization of the dataset into bundled files and their naming convention.

Technical Validation

Interruptions of the regular JMs performed rhythmically during grazing and rumination can be used to delimit their bouts12. In this study, interruptions longer than 90 s between consecutive JMs were used to delimit the grazing and rumination bouts (a minimal sketch of this rule is given below). The durations of the grazing and rumination bouts are shown in Fig. 3. Short interruptions between two consecutive grazing bouts may be associated with an animal being distracted or walking to a distant feeding patch. This high sensitivity to interruptions of the regular JMs generates multiple short grazing bouts that can be aggregated into longer grazing meals, which is useful for estimating minute-to-hourly grazing time budgets. Thus, about 40% of the grazing bouts last less than 25 min (see Fig. 3), while a typical grazing meal lasts more than 1 h12. About 85% of the rumination bouts last less than 75 min (Fig. 3). The waveform and spectrogram of audio signals during grazing and rumination are shown in Fig. 4a,b, respectively. The bottom panel of Fig. 4 shows a zoom-in on the waveform region produced during the pause required for swallowing and regurgitating the feed cud between two consecutive chewing periods6,40. A more detailed explanation of the grazing and rumination activities is provided in the file ‘BehaviorLabelsDescription.pdf’.
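The 90-s rule translates directly into a simple grouping procedure. A minimal sketch, assuming time-sorted JM events as (start, end, label) tuples in seconds:

```python
def delimit_bouts(jm_events, max_gap_s=90.0):
    """Group time-sorted JM events into bouts, closing a bout whenever the
    silent gap to the next event exceeds max_gap_s (90 s in this study)."""
    bouts, current = [], []
    for event in jm_events:
        if current and event[0] - current[-1][1] > max_gap_s:
            bouts.append((current[0][0], current[-1][1], current))
            current = []
        current.append(event)
    if current:
        bouts.append((current[0][0], current[-1][1], current))
    return bouts
```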

Fig. 3
figure 3

Histogram showing the frequency distribution of the duration of grazing and rumination bouts grouped in 25 min intervals. A total of 257 grazing bouts and 206 rumination bouts are present in the dataset.

Fig. 4
figure 4

Spectrogram and waveform (with zoom) of foraging audio signals associated with (a) grazing and (b) rumination activities.

The 6,227 JMs labeled by the experts correspond to 2,006 chews during rumination (32.2%), 1,136 chews during grazing (18.2%), 578 bites (9.3%), 2,507 chew-bites (40.3%) and 6 possible, unlabeled JMs (<0.1%). This indicates a ratio of chew actions to bite actions performed during grazing of 1.18 (see Eq. 1), supporting previously reported results43. The numbers of chews (NC), bites (NB) and chew-bites (NCB) produced in a grazing bout determine the chew-per-bite ratio (RC:B) as:

$${R}_{C:B}=\frac{{N}_{C}+{N}_{CB}}{{N}_{B}+{N}_{CB}}$$
(1)
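Substituting the expert counts reported above confirms the stated value:

$${R}_{C:B}=\frac{1136+2507}{578+2507}=\frac{3643}{3085}\approx 1.18$$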

Examples of the waveforms and the average spectral characteristics of the different types of JMs are shown in Fig. 5. A more detailed explanation of the JMs is provided in the ‘JMDescription.pdf’.

Fig. 5
figure 5

Typical waveform (a) and average spectrum (b) for the different types of JMs: chew produced during rumination and chew, bite and chew-bite produced during grazing. Energy spectra were averaged over all JMs and normalized to the maximum value.

To evaluate the sound quality of the audio recordings obtained from the continuous monitoring of dairy cows, only the active grazing and rumination bouts were examined. Initially, the experts conducted a subjective analysis by listening to random segments of each grazing and rumination bout and confirmed that the corresponding activities could be aurally discriminated from the background noise. This assessment was further confirmed through a quantitative analysis of these bouts using the JM timestamps automatically generated by the JM classifier algorithm. For each audio recording, two JM quality indicators were calculated separately for grazing and rumination, using previously established parameters30.

The first parameter, the JM modulation index (MI), is useful for locating the JMs. The MI is a measure based on the difference between the audio signal intensity produced during the JMs and that of the background noise. Given that JMs are performed rhythmically, roughly every second, during grazing and rumination, the MI was computed as:

$$M{I}_{JM}=\left(\overline{J{M}_{intra}}-\overline{J{M}_{inter}}\right)/\left(\overline{J{M}_{intra}}+\overline{J{M}_{inter}}\right)\in \left[0;1\right]$$
(2)

where \(\overline{J{M}_{intra}}\) and \(\overline{J{M}_{inter}}\) are the mean audio signal intensity produced during the JMs and the mean audio signal intensity produced in the short pauses between consecutive JMs, respectively, defined as:

$$\overline{J{M}_{intra}}=\frac{1}{{l}_{intra}}\mathop{\sum }\limits_{k=1}^{l}{x}^{2}[k]w[k]$$
(3)
$$\overline{J{M}_{inter}}=\frac{1}{{l}_{inter}}\mathop{\sum }\limits_{k=1}^{l}{x}^{2}[k]\left(1-w[k]\right)$$
(4)

where x[k] is the audio signal, l is the length of the audio signal in samples, lintra and linter are the total numbers of samples with and without JMs, respectively, and w[k] is a logical function indicating the presence of a JM at the k-th sample.
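Equations (2)–(4) translate directly into code. A minimal sketch, assuming x is a NumPy array of audio samples and w the logical JM-presence mask defined above:

```python
import numpy as np

def modulation_index(x, w):
    """MI_JM per Eqs. (2)-(4): x is the audio signal, w a boolean mask that
    is True for samples inside JMs and False in the pauses between them."""
    w = np.asarray(w, dtype=bool)
    jm_intra = np.mean(x[w] ** 2)    # mean intensity during JMs
    jm_inter = np.mean(x[~w] ** 2)   # mean intensity between JMs
    return (jm_intra - jm_inter) / (jm_intra + jm_inter)
```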

The second parameter is the signal-to-noise ratio (SNR). This parameter indicates the extent to which the background noise affects the sound produced during the JMs, thus helping to differentiate between JMs associated with chews, bites and chew-bites. To compute the SNR, the sound produced during the JMs must be isolated from the background noise. A multiband spectral subtraction algorithm, assuming uncorrelated additive noise in the audio recordings, was used to estimate a noise-free signal \(\widehat{s}[k]\) and a noise estimate \(\widehat{n}[k]\)44. The SNR is computed as follows:

$$SNR(dB)=10{\log }_{10}\left(\mathop{\sum }\limits_{k=1}^{l}{\widehat{s}}^{2}[k]\right)-10{\log }_{10}\left(\mathop{\sum }\limits_{k=1}^{l}{\widehat{n}}^{2}[k]\right)\in {\mathbb{R}}$$
(5)
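The multiband spectral subtraction front end of ref. 44 is beyond the scope of a short sketch; given its two outputs, Eq. (5) reduces to a few lines:

```python
import numpy as np

def snr_db(s_hat, n_hat):
    """SNR per Eq. (5), from the noise-free signal and noise estimates
    produced by the multiband spectral subtraction stage (not shown)."""
    return 10 * np.log10(np.sum(s_hat ** 2)) - 10 * np.log10(np.sum(n_hat ** 2))
```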

Examples of audio recordings with high- and low-quality sound are available at Gitlab (https://gitlab.com/luciano.mrau/acoustic_dairy_cow_dataset/-/tree/master/data/sound_quality); their waveforms are presented in Fig. 6. The higher the MIJM and SNR values, the better the audio recording quality. The frequency distributions of the MIJM and SNR values estimated for rumination and grazing over the 133 audio recordings of continuous monitoring are shown in Fig. 7. Figure 7 shows considerable variation in the MIJM values of rumination and grazing. The MIJM values of rumination tend to be smaller than those of grazing, indicating that the JMs produced during rumination (exclusively chews) are more difficult to distinguish from the background noise. This is partly due to the lower intensity of the rumination JMs compared to the ingestive JMs, as shown in Figs. 4, 5. We hypothesize that the lower intensity during rumination is due to the high moisture content of the ingested matter30,40. Figure 7 also shows that the ingestive JMs produced during active grazing are less affected by background noise than the JMs produced during rumination. This could be due to the difference in the energy spectral density of the JMs produced in grazing and rumination relative to that of the background noise45.

Fig. 6
figure 6

Waveforms of segments of audio recordings with (a) high- and (b) low-quality sound.

Fig. 7
figure 7

Frequency distribution of the audio recording quality in terms of (a) the modulation index and (b) the signal-to-noise ratio.

Quantitative differences between the two channels of the audio recordings were measured in terms of the MIJM and SNR values. Table 4 presents the MIJM values computed for grazing and rumination in each daily recording; the slash-separated values represent the MIJM for grazing and rumination, respectively. The smaller the difference between the MIJM values of channels 1 and 2 of a given daily recording, the greater the similarity of the signals. Table 5 presents the SNR values in a manner analogous to Table 4. In particular, the small MIJM and SNR values of the channel corresponding to the inward-facing microphone in the recordings of day 1 - cow 5, day 3 - cow 3 and day 3 - cow 5 indicate poor sound quality.

Table 4 MIJM values computed in the trials for the two channels of the MP3 files.
Table 5 SNR values in dB computed in the trials for the two channels of the MP3 files.

Usage Notes

Audio editing software, such as Audacity or Sonic Visualiser46 (www.sonicvisualiser.org), can be used to work with this dataset. The MP3 and WAV files, along with their corresponding label files, can be imported. The multiple label files associated with each audio file (delimited by the experts, delimited automatically, or with one central mark) can also be imported simultaneously for comparison or other specific user interests.

The ‘MP3AudioInformation.xlsx’ spreadsheet provides specific information on the audio recordings obtained from the continuous monitoring of dairy cows. The sheet called “Audiofile properties” describes the Latin-square design of the experiment, which could be useful for analyzing variation related to animals, experimental days or recording systems. It also indicates the correspondence, established by the experts, between the direction of the microphones (inwards/outwards) and the channels of each audio recording. It should be noted that some errors may have occurred in the channel assignment due to the diverse sound quality detected across the audio recordings. Any observations or particularities of the audio recordings are also mentioned. The sheet named “Cattle activities” specifies the animal behavior categories and annotations present in each audio recording, enabling users to filter activities of interest.

Audio recording quality varies greatly across microphones and recording channels. We hypothesize that these variations were caused by differences in microphone response, microphone setup at the onset of recordings, and microphone movement during recordings. The sheet named “Audio quality” shows the values of the quality parameters for the audio recordings, using a background color scale from green to red to indicate high- and low-quality sound, respectively. This enables users to choose the optimal audio recordings or apply signal enhancement techniques, among other options. We recommend listening to the audio recordings in stereo or mono according to each user’s comfort and needs, as the preferred setting can vary with hearing capacity and audio signal intensity. We suggest listening in stereo for audio recordings with high-quality sound and, for those with low-quality sound, listening only to the channel corresponding to the inward-facing microphone, as indicated in the ‘MP3AudioInformation.xlsx’ file.

The information on the JMs labeled by the experts can be used as a standalone dataset for JM analysis and for developing new automatic algorithms for detecting and classifying JMs. We encourage users to utilize the expert-generated JM labels as an audiovisual guide and reference for verifying and correcting the automatically generated JM labels in all audio recordings.