A Machine Vision-Based Method for Monitoring Scene-Interactive Behaviors of Dairy Calf

Simple Summary: Demand for dairy products is rising steadily in emerging economies such as China, so it is critical to monitor and maintain the health and welfare of the growing population of dairy cattle, especially dairy calves (over 20% mortality). In this study, a new method was built by combining background-subtraction and inter-frame difference methods to monitor the behaviors of dairy calves. Using the new model and the motion characteristics of the calf in different areas of the enclosure, the scene-interactive behaviors of entering or leaving the resting area, turning around, and remaining stationary (no movement) were identified automatically with a 93–97% success rate. This newly developed method provides a basis for developing evaluation tools to monitor calves' health and welfare on dairy farms.

Abstract: Demand for animal and dairy products is rising steadily in emerging economies. However, it is critical and challenging to maintain the health and welfare of the growing population of dairy cattle, especially dairy calves (up to 20% mortality in China). Animal behaviors carry considerable information and are used to estimate animal health and welfare. In recent years, machine vision-based methods have been applied worldwide to monitor animal behaviors. Collected images or videos of animal behaviors can be analyzed computationally to estimate animal welfare or health indicators. In this study, a new method (i.e., an integration of background-subtraction and inter-frame difference) was developed for automatically recognizing dairy calf scene-interactive behaviors (e.g., entering or leaving the resting area, and stationary and turning behaviors in the inlet and outlet area of the resting area) based on computer vision technology.
Results show that the recognition success rates for the calf's scene-interactive behaviors of pen entering, pen leaving, staying (standing or lying still), and turning were 94.38%, 92.86%, 96.85%, and 93.51%, respectively. The recognition success rates for feeding and drinking were 79.69% and 81.73%, respectively. This newly developed method provides a basis for developing evaluation tools to monitor calves' health and welfare on dairy farms.


Introduction
The global population is predicted to reach 9.5 billion in 2050, then the requirement for the animal protein (e.g., eggs, meat, and milk) is expected to increase by over 70% in 2050 as compared to

Experimental Setup and Image Collection
Two cameras (DS-2CD4012, Hikvision, Hangzhou, China) were set up for video/image collection on a commercial dairy farm (Keyuan Clone Ltd., Yangling, China) to monitor a two-month-old Holstein dairy calf in a rectangular fenced enclosure (4 × 2 × 1.5 m). The experimental setup is shown in Figure 1. Camera A, on the long side of the fence, monitored the calf's activity from the side with a wide angle of view. Camera A was mounted at half the height of the fence (i.e., 0.75 m), and its horizontal distance to the fence was set so that the view covered the calf's whole activity area. Camera B was positioned at a height of 1.8 m on the short side of the fence, inclined slightly downwards. The calf's eating and drinking behaviors were monitored, as shown in Figures 1 and 2. Image/video data were collected from 07:00 to 18:00 h each day in July 2013. A single video file was generated per day. The video was captured at 25 frames/s and 2000 kb/s, with a resolution of 704 × 576 pixels (horizontal × vertical; PAL format). The data-processing computer had an Intel Core i5-2400 CPU (3.2 GHz), 8 GB of memory, and a 500 GB hard disk. Sample data were read and processed using MATLAB 2014b.
Animals 2020, 10, 190

Calf-Target Detection Method
Common target-detection methods include the inter-frame difference [27], background subtraction [28], the Gaussian Mixture Model [29], and ViBe [30]. The Gaussian Mixture Model and ViBe methods have been used to detect moving targets, but were not efficient enough for monitoring animals that remain stationary. The background-subtraction method can detect stationary targets, but is susceptible to background interference. The inter-frame difference method is more robust to interference but cannot detect stationary targets. In this study, the background-subtraction and inter-frame difference methods were integrated to rebuild the background model for individual calf detection. The image-processing steps were as follows:
(1) Median filtering was performed on the video frames; RGB images were converted to grayscale images; then the background frame was selected, background subtraction was performed, and small areas were removed.
(2) Otsu's method was used to segment the image. A square 4 × 4 pixel element was selected for the closing operation and hole filling.
(3) The top, bottom, left, and right borders of the non-zero region were expanded outward by five pixels to obtain the borders of a search box that contained as much of the target region as possible. If a border of the search box reached or passed the image border, the image border was taken as the border of the search box:
$$U_{end} = U_{test} - 5, \quad D_{end} = D_{test} + 5, \quad L_{end} = L_{test} - 5, \quad R_{end} = R_{test} + 5 \qquad (1)$$
with each value clipped to the image border, where $U_{end}$, $D_{end}$, $L_{end}$, and $R_{end}$ are the top, bottom, left, and right borders of the target area, and $U_{test}$, $D_{test}$, $L_{test}$, and $R_{test}$ are the top, bottom, left, and right borders of the non-zero area.
(4) Using the above steps, the target area was detected, as shown in Figure 3b, and the parts outside the target area were extracted (Figure 3c). The region corresponding to the target region in the previously synthesized background frame (Figure 3d) …
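The segmentation and search-box steps can be sketched in a few lines. The following is a minimal illustration in plain Python (the authors report using MATLAB), not the authors' implementation: the function names, the 0-based inclusive box convention, and the use of Otsu's threshold directly on the absolute difference image are assumptions made for this example, and median filtering, small-area removal, closing, and hole filling are omitted for brevity.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # grayscale image as rows of 0-255 values
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), 0-based, inclusive


def otsu_threshold(pixels: List[int]) -> int:
    """Return the grey level that maximises the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of grey levels at or below the candidate threshold
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def foreground_mask(frame: Image, background: Image) -> Image:
    """Background subtraction followed by Otsu segmentation (1 = foreground)."""
    diff = [[abs(f - b) for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]
    t = otsu_threshold([p for row in diff for p in row])
    return [[1 if p > t else 0 for p in row] for row in diff]


def nonzero_bounds(mask: Image) -> Optional[Box]:
    """Borders of the non-zero region, or None if the mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], rows[-1], cols[0], cols[-1]


def expand_search_box(box: Box, height: int, width: int, margin: int = 5) -> Box:
    """Grow the box by `margin` pixels on every side, clipped to the image."""
    top, bottom, left, right = box
    return (max(top - margin, 0), min(bottom + margin, height - 1),
            max(left - margin, 0), min(right + margin, width - 1))
```

Clipping with `max` and `min` covers the case in which the expanded box would cross the image border, so the image border becomes the search-box border.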

Features Extraction Method of Calf Scene-Interactive Behaviors
The calf entering or leaving the resting area was recorded on the left of the side-video view. When the right border of the target was in Area A (yellow box in Figure 4), the behavior was defined as entering or leaving the resting area. Feeding and drinking behaviors occurred on the right of the side-video view. When the target's right border reached Area B (blue box in Figure 4), the front video was acquired, the feeding-basin and drinking-basin areas were extracted, and feeding and drinking behaviors were tested.
To extract entering or leaving behaviors in the resting area, the motion characteristics of the individual calf were combined to establish a behavior recognition model. As animal behavior is continuous, the average of each characteristic over 10 consecutive frames was taken as the final feature, as shown in Equations (2)–(6), where $B_R(i)$ is the right border of the target area in the i-th frame, $B_D(i)$ is the distance between the left and right borders of the target area in the i-th frame, and $B_L(i)$ is the left border of the target area in the i-th frame.
As the calf's resting area was dark and the calf was black and white, the black parts of the calf that overlapped with the resting area could be lost during target detection when the calf entered or left the resting area. To account for this bias, the calf was considered to have started entering or leaving the resting area when $B_L(i)$ < 30 pixels. In addition, we determined experimentally that the moving border changed by more than 10 pixels when the calf entered or left the resting area or turned around, whereas the border fluctuated by less than three pixels when the calf was stationary.
The calf was considered to be entering the resting area if the three features of the target area satisfied Equation (2), stationary if they satisfied Equation (3), leaving the resting area if they satisfied Equation (4), and turning around if they satisfied Equation (5) or (6).
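To make the feature averaging and the stated thresholds concrete, the sketch below averages the border positions over 10-frame windows and applies a small rule set. Only the window length and the 30, 10, and 3 pixel thresholds come from the text; the non-overlapping windows, the function names, and the rules themselves are hypothetical stand-ins, not Equations (2)–(6). The turning-around rule is left out because it cannot be inferred from the description alone.

```python
from typing import List, Sequence, Tuple

WINDOW = 10    # frames averaged per feature value (from the text)
EDGE_PX = 30   # B_L below this: calf is at the resting-area edge (from the text)
MOVE_PX = 10   # border change above this: entering, leaving, or turning (from the text)
STILL_PX = 3   # border fluctuation below this: stationary (from the text)


def window_means(values: Sequence[float], window: int = WINDOW) -> List[float]:
    """Average consecutive windows of `window` frames.

    Non-overlapping windows are an assumption; a sliding window would also
    fit the description.
    """
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]


def classify(prev: Tuple[float, float], curr: Tuple[float, float]) -> str:
    """Label the change between two successive windows.

    `prev` and `curr` are (B_L, B_R) window means. The rules below are
    HYPOTHETICAL: they only encode the qualitative behaviour described in
    the text (borders nearly constant when stationary; right border and
    width shrinking on entry and growing on exit).
    """
    d_left = curr[0] - prev[0]
    d_right = curr[1] - prev[1]
    d_width = d_right - d_left  # change in B_D = B_R - B_L
    if max(abs(d_left), abs(d_right), abs(d_width)) < STILL_PX:
        return "stationary"
    if curr[0] < EDGE_PX and d_right < -MOVE_PX:
        return "entering"
    if curr[0] < EDGE_PX and d_right > MOVE_PX:
        return "leaving"
    return "undetermined"


# Example: the left border stays near the resting area while the right
# border moves left by 20 pixels between two windows.
b_left = window_means([20] * 20)
b_right = window_means([200] * 10 + [180] * 10)
label = classify((b_left[0], b_right[0]), (b_left[1], b_right[1]))
```

Averaging over a window smooths frame-to-frame jitter in the detected borders, at the cost of a short delay when the calf starts to move from a stationary state.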

Feeding and Drinking Behaviors Monitoring and Analysis
Background subtraction on the grayscale image was used to detect whether a calf was present in the feeding/drinking area. If no calf was present, the current frame was taken as the new background frame, and detection continued until the calf appeared. During this period, median filtering was used to pre-process the data, and Otsu's method was used for segmentation [31]. When the calf was eating, its head extended into the feeding basin, so the bottom border of the detected target area corresponded to the bottom border of the basin mouth and the target had a larger area. The bottom border of the basin was denoted as $D_f$, the bottom border of the target area as $D_t$, and the area of the target as $S$. To allow for variability in the border of the target area, a threshold value of $D_f - 5$ was used in the test. Based on the experiments, the area threshold was set to 1950 pixels.
When the detected area satisfied Equation (7), the calf was considered to be feeding; otherwise, it was considered not to be feeding.
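A minimal sketch of how such a feeding test might look is given below. The thresholds ($D_f - 5$ and 1950 pixels) are taken from the text, but the direction and strictness of the comparisons are assumptions, as are the function names and the convention that row indices increase downwards; this is an illustration rather than Equation (7) itself.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # binary mask of the feeding-basin region, 1 = target


def target_features(mask: Mask) -> Optional[Tuple[int, int]]:
    """Return (D_t, S): the lowest foreground row index and the pixel count."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    area = sum(1 for row in mask for p in row if p)
    return rows[-1], area


def is_feeding(d_t: int, s: int, d_f: int,
               margin: int = 5, area_threshold: int = 1950) -> bool:
    """Assumed rule: the target reaches to within `margin` rows of the basin
    bottom D_f AND covers at least `area_threshold` pixels."""
    return d_t >= d_f - margin and s >= area_threshold
```

For a 91 × 91 pixel basin region whose bottom border is taken as row 90 (an illustrative value), a head mask spanning rows 40 to 87 and 50 columns has S = 2400 and D_t = 87, which passes both conditions under these assumptions.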

Target Detection Results
The selected videos that included the calf in the resting and activity areas totaled 20,640 frames. The experiments were performed using the inter-frame difference method, the background-subtraction method, the Gaussian Mixture Model, ViBe, and the new integrated background model developed in this study. The first column in Figure 5 shows detection of a calf in motion, and the second column shows detection of a stationary calf.
As shown in Figure 5b, the inter-frame difference method had strong noise rejection but it could not detect static and slow-moving targets. The conventional background subtraction method was able to detect most areas of dynamic and static targets but exhibited noise and poor adaptability. The Gaussian mixture model and ViBe had better noise immunity and detected dynamic targets, but were still unable to detect targets with continuous or small-amplitude motions. The new method of the integrated background model included the advantages of the inter-frame difference and conventional background subtraction methods, i.e., strong noise resistance and adaptability, and clearly detected most areas of dynamic and static targets.
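The contrast between the two basic methods can be reproduced on a toy example. The sketch below counts the pixels flagged by inter-frame differencing and by background subtraction, and adds one possible reading of how a background frame might be rebuilt (current frame outside the search box, previous background inside it). The fixed threshold of 25 grey levels, the function names, and the `update_background` rule are assumptions for illustration, not the authors' integrated model.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), 0-based, inclusive


def count_changed(a: Image, b: Image, threshold: int = 25) -> int:
    """Number of pixels whose absolute difference exceeds `threshold`.

    With b = previous frame this is inter-frame differencing; with
    b = background frame it is background subtraction.
    """
    return sum(1 for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b) if abs(pa - pb) > threshold)


def update_background(prev_bg: Image, frame: Image, box: Box) -> Image:
    """ASSUMED update rule: take the current frame outside the search box
    and keep the previous background inside it."""
    top, bottom, left, right = box
    new_bg = [row[:] for row in frame]
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            new_bg[r][c] = prev_bg[r][c]
    return new_bg
```

If a 2 × 2 target stays in the same place for two frames, `count_changed(frame2, frame1)` is zero while `count_changed(frame2, background)` still reports the four target pixels, which is the stationary-target limitation of inter-frame differencing described above. Refreshing the background outside the search box lets slow illumination changes be absorbed without erasing the target from the model.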

Recognition of Entering/Leaving Behaviors in Resting Area
When the right border of the target was in Area A in Figure 6, identification of the calf entering or leaving the resting area was performed. Monitored Area A was the inlet and outlet of the calf's resting area. Monitored behaviors in Area A included entering the resting area, leaving the resting area, remaining stationary (not moving), and turning around. The right border, the left border, and the distance between the two borders were used as classification features. Figure 6 shows four behavioral examples. The extracted characteristic curves are shown in Figure 7.
As shown in Figure 7a, when the calf was approaching the resting area, the right border and the distance between the target's right and left borders started to decrease, as did the left border. When the calf was entering the resting area, the left border was essentially unchanged. In Figure 7b, the calf was static in the first 102 frames, during which the three features were more or less unchanged. When the calf was leaving the resting area, the right border and the distance between the target's right and left borders started to increase. In Figure 6c, the head of the calf was facing the resting area. In the first 480 frames, the target's right border was unchanged. The left border and the distance between the left and right borders changed suddenly because of a slight twisting of the front half of the calf. After the first 480 frames, the right border started to decrease, then became stable, and finally increased again. The distance between the left and right borders gradually decreased, then increased, and finally the left border gradually increased as the calf turned around.
The video segments containing the behaviors of entering the resting area, leaving the resting area, static behavior, and turning around totaled 42,950 frames. The recognition rates (compared with manual video review) are shown in Table 1.
Note: The recognition rate is the ratio of the number of correctly identified frames to the total number of frames in a behavior sample; the misclassification rate is the ratio of the number of misclassified frames to the total number of frames in the behavior.
The recognition rates for entering and leaving the resting area through its inlet and outlet, remaining stationary, and turning around were 94.38%, 92.86%, 96.85%, and 93.51%, respectively (Table 1). Failures in recognizing the entering or leaving behaviors arose because the resting area was dark in color and the calf was black and white, so the part of the calf that overlapped with the resting area could be missed during target detection. Because the characteristic value used for recognition was the average over 10 consecutive frames, static behavior was occasionally misjudged as entering or leaving the resting area when the calf started to enter or leave from a stationary state. In addition, head swinging led to static behavior being misjudged as turning around. During turning, the detected left and right borders sometimes remained essentially unchanged while both the forelimbs and hindlimbs were moving, resulting in misjudgment or missed detection.

Feeding and Drinking Behaviors Identification
When the target's right border reached the feeding/drinking area, the front video could be acquired. Based on the front video, the feeding-basin area (91 × 91 pixels) and drinking-basin area (251 × 192 pixels) were extracted (Figure 8). A square 4 × 4 pixel element was selected for the closing operation, extraction of the maximum area, and hole filling (Figure 8c,f).
During drinking, the calf's head accounted for a large proportion of the field of view. When the calf's head had just entered the basin, the calf was not yet drinking and the head accounted for a smaller proportion of the view (Figure 9a). In addition, 'looking' behavior occurred in the drinking area (Figure 9b) but occupied a small area. In this study, drinking and non-drinking behaviors were distinguished by a detected-area threshold of $S_t$ = 2900 pixels. When the detected area was greater than $S_t$, the behavior was considered to be drinking; otherwise, it was considered to be non-drinking.
In total, 1080 frames were sampled in the feeding area and 2045 frames in the drinking area. When the calf's head was detected in these areas, the characteristics of the target area were extracted and used to identify whether the calf was feeding or drinking. The recognition accuracy was estimated as TP/(TP + TN), where TP is the number of samples that were correctly identified and TN is the number of samples that were erroneously identified; for feeding and drinking behaviors, the accuracies were 79.69% and 81.73%, respectively. Recognition of feeding behavior could fail if the calf's head was stationary in the feeding basin before or after eating, or if the shadow of the calf's head was mistaken for feeding. Feeding is a continuous process, and it was difficult to separate pre- and post-feeding behavior from feeding behavior during the study. Licking the basin edge and smelling the basin caused failures in identifying drinking behavior. In addition, only one calf was used to test the newly developed method, because commercial dairy farms usually keep only one calf in a pen. As some farms may keep several calves in a large pen, future studies will be required to optimize the method for tracking the behavior of individual calves and groups of calves on commercial dairy farms.
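The drinking threshold and the accuracy measure described above can be written out as a short sketch. The 2900-pixel threshold and the ratio of correctly identified samples to all samples follow the text (which calls the two counts TP and TN); the function names and the example values are hypothetical and are not the study's data.

```python
from typing import List, Sequence

S_T = 2900  # detected-area threshold for drinking, in pixels (from the text)


def is_drinking(area_px: int, threshold: int = S_T) -> bool:
    """Drinking if the detected head area exceeds the threshold."""
    return area_px > threshold


def recognition_accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Correctly identified samples divided by all samples.

    In the text's notation this is TP / (TP + TN), with TP the correctly
    identified samples and TN the erroneously identified ones.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    wrong = len(actual) - correct
    return correct / (correct + wrong)


# Hypothetical detected areas and manual labels for five frames.
areas: List[int] = [3100, 2500, 3000, 1200, 2950]
labels: List[bool] = [True, False, True, False, False]
predictions = [is_drinking(a) for a in areas]
accuracy = recognition_accuracy(predictions, labels)
```

In this made-up example the last frame (area 2950, labelled as non-drinking) is misclassified, which mirrors the kind of error reported for licking or smelling the basin: the head occupies a large area without the calf actually drinking.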
In this study, only daytime video was recorded. In the future, we will further develop the algorithm to recognize the behavior of calves at night. Besides animal behavior monitoring with 2D cameras, other non-invasive/remote monitoring technologies (e.g., heart rate monitors and infrared thermal imaging) could also be added to the existing system to expand the functions or increase the accuracy of the dairy calf behavior monitoring system.

Conclusions
In this study, a new method (i.e., the Integrated Background Model) was built by combining background-subtraction and inter-frame difference methods to monitor the behaviors of the dairy calf. Using the new model and the motion characteristics of the calf in different areas of the enclosure, we identified the behaviors of entering the resting area (94.38%), leaving the resting area (92.86%), remaining stationary (96.85%), turning around (93.51%), feeding (79.69%), and drinking (81.73%).
The new method showed satisfactory detection performance, including resistance to interference for both dynamic and static targets, compared with the inter-frame difference method, the background-subtraction method, the Gaussian Mixture Model, and ViBe. This newly developed method provides a basis for developing evaluation tools to monitor calves' health and welfare on dairy farms.
Author Contributions: D.H. was the project PI. Y.G. and D.H. designed the experiment and conducted the field study. Y.G. and D.H. tested the method. Y.G. and L.C. analyzed the data and wrote the manuscript. L.C. submitted the manuscript to the journal for review. All authors have read and agreed to the published version of the manuscript.