On Developing a Machine Learning-Based Approach for the Automatic Characterization of Behavioral Phenotypes for Dairy Cows Relevant to Thermotolerance

: The United States is predicted to experience an annual decline in milk production due to heat stress of 1.4 and 1.9 kg/day by the 2050s and 2080s, with economic losses of USD 1.7 billion and USD 2.2 billion, respectively, despite current cooling efforts implemented by the dairy industry. The ability of cattle to withstand heat (i


Introduction
The dairy industry is increasingly facing grand challenges due to climatic changes [1,2], with heat stress being one of the most significant environmental factors affecting dairy cattle [3]. Projected climatic trends indicate a troubling forecast for dairy production in the United States, with milk production expected to decline substantially by 2080 because of heat stress [4]. These losses are projected despite current cooling efforts and must be weighed against the increasing need to identify judicious uses of natural resources, including water [5].
The adverse effects of heat stress on cattle include diminished milk production, decreased reproductive capabilities, heightened susceptibility to diseases, and potentially increased mortality rates [6]. These consequences affect productivity and translate into considerable annual economic losses, estimated at billions of dollars. The ability of dairy cows to withstand heat, termed thermotolerance, is affected by a combination of physiological and behavioral aspects. These traits are significantly heritable and exhibit considerable variation across individual cows, complicating the challenge of effectively managing heat stress within dairy herds [4].
The complexity of interpreting cow behavior in relation to heat stress is heightened by genetic diversity influencing their capacity to manage thermal stress [7]: some cows are genetically better suited to cope with heat stress, whereas others are behaviorally flexible in dealing with thermal challenges [8]. In particular, drinking and environmental enrichment use are challenging behaviors to quantify but may be the most informative behaviors for characterizing how individuals cope with heat stress. The intricate nature of thermal adaptation necessitates integrating sophisticated, non-invasive measures into the genetic selection process to enhance the thermotolerance of dairy herds.
Genetic selection decisions therefore need automated, non-invasive phenotypic indicators of thermotolerance [4]. However, the existing methods for monitoring heat stress are labor-intensive and often fail to provide timely data [9]. Furthermore, dairies are growing in size while individual animals must be monitored by fewer employees. Consequently, there is a persistent need to develop new strategies and technologies for monitoring individual animals in large groups [1].
Cattle have a variety of inherent traits that can be used to identify unique individuals, including coat patterns, iris patterns, retinal patterns, facial features, and muzzle patterns [8]. Holstein cattle, a common breed of dairy cattle in the US, are easily recognizable by their distinctive black-and-white patterns. Each cow's pattern of spots is unique, making this morphological feature a useful biometric tool for individual identification [10].
This paper presents an approach to address these challenges by developing a system that provides real-time and automated monitoring of dairy cows milked in a robotic milking system using artificial intelligence (AI) and computer vision technologies [11]. First, we present an imagery data collection and processing approach that automatically detects and quantifies the drinking and brush use behavior of Holstein dairy cows, identifying individuals by their coat patterns. Second, the presented approach performs the fundamental research needed to enable the characterization and development of non-invasive behavioral phenotypes indicative of a cow's ability to withstand heat stress. These behaviors (i.e., brush use and drinking behavior) are integral to maintaining homeostasis, particularly during heat stress [4]. Monitoring an animal's use of these resources provides insight into its inherent water efficiency (e.g., drinking behavior), temperament (e.g., resource use frequency, circadian pattern, and plasticity to environmental conditions), and motivation to engage in pleasurable behaviors (e.g., brush use) that ultimately promote animal welfare [3].
To validate the applicability of the proposed approach, we captured a video dataset consisting of 3421 videos with a total duration of 24 h of continuous recording of dairy cows housed at the T&K Dairy, a commercial dairy partner in Snyder, Texas. Figure 1 shows examples from the collected video dataset. As shown in the video snapshots, the cows are housed in a single free-stall barn that is divided into six pens (n = 180 cows/pen). Each pen provides cattle with access to four water troughs evenly placed throughout the barn. Near three of the water troughs within the pen, cattle have access to an automatic rotating cattle brush that is mounted to the barn. The individual cows that appear in this video dataset were identified using clustering algorithms (e.g., K-means [12]) to assign unique identifiers to individual cows, converting raw visual video streams into structured and analyzable formats stored in a relational database. Then, we utilized ML-based object detection models (e.g., YoloV8 [13]) to accurately recognize individual cows using their coat patterns within the complex farm environments. A Convolutional Neural Network (CNN) model [14] is trained using the extracted cow objects to classify each cow to a particular cluster in our database; it is used in conjunction with the DeepSORT algorithm [15] to track cow activities and provide accurate quantification of watering and brush use behaviors. Finally, a user-friendly GUI is developed to enable system users to utilize the developed system conveniently.
This paper makes the following contributions. First, we present a machine learning approach that can automatically capture, process, and visualize massive video datasets to characterize behavioral phenotypes for dairy cows relevant to thermotolerance. Second, a novel object-tracking module is proposed to detect moving cows' behavior in real-time CCTV footage. Third, this paper presents a web-based GUI on top of a pipeline of ML models and computer vision algorithms (i.e., K-means, YoloV8, CNN, and DeepSORT) that allows ranchers to interact with the developed system conveniently.

Related Work
Machine learning (ML) coupled with computer vision [14,16-19] has already enabled game-changing capabilities in robotic milking systems, enhancing dairy cow health management by automating the detection and analysis of heat stress behaviors in CCTV footage [7]. ML and computer vision have been used in the literature for a wide variety of functionalities in the dairy cattle domain, including the identification of individual animals [18], analysis of cow behaviors such as feeding [20] and standing and lying [7], and the detection of health indicators such as lameness [21] and body condition score [17].
Fuentes et al. [11] studied the use of ML and computer vision to identify the age of cattle based on their facial features. The face location was detected in still frames isolated from recorded video using YOLOv5. The authors used MobileNetV2 to extract a 128-feature vector of the face and aligned it with ResNet18. The extracted feature vector is then fed into an ML model to predict the animal's age. Despite the similarity in scope, our approach uses a different methodology: a pipeline of ML and clustering algorithms that identifies individual cows based on side-angle images of their coat patterns.
In [7], the authors used computer vision techniques to detect the lying behavior of dairy cows in a freestall barn. Similar to our work, the authors used a combination of YOLOv5x and DeepSORT to identify and track cows using individual bounding boxes for each cow. Changes in the properties of the bounding boxes were used to identify the start and end of positional change events (i.e., lying down and standing from a lying position). However, no attempt was made to identify cattle based on their biometrics. Moreover, the behaviors are detected using the changing properties of a single bounding box, unlike our presented approach, which relies on the overlap of two bounding boxes.
Gupta et al. [22] used the YOLOv4 model to identify cattle by breed. The YOLOv4 model was trained using a custom dataset of eight cattle breeds. The authors evaluated the model using an intersection over union metric, precision-recall curves, a confusion matrix, an overall accuracy equation, and Cohen's kappa. The model was experimentally shown to be more effective with smaller and high-resolution images. When compared with other models used for breed detection (i.e., Faster R-CNN, SSD, and YOLOv3), YOLOv4 outperformed all three.
Another work presented in [18] attempted to develop a cattle identification method based on coat patterns. Videos were captured from a top angle, resulting in a top-down image of the cow's back. A Mask R-CNN model was used to identify the patterned region of the cow and extract pattern features from the frames of the video, after which a Support Vector Machine (SVM) was used to identify cows based on their pattern features. The resulting system had an accuracy of 98.67%. While this project and ours focus on identifying cattle by coat pattern, the methodologies used vary significantly. The previous work uses a top-down view of the cow in contrast to ours, which uses a side view.
Wang et al. [14] used a 3D-based CNN (E3D) algorithm to classify five cow behaviors in a video clip: standing, walking, drinking, feeding, and lying down. Videos captured from cattle pens were split into short segments, each containing one of the behaviors of interest. The E3D algorithm comprised several modular parts: a 3D convolution module, a SandGlass-3D module, and an ECA module. The 3D convolution module extracted features from the still video frames, which were then put through the SandGlass-3D module to identify spatial and temporal properties. Background information from the videos was screened and removed by the ECA module. A 3D pooling layer and the Softmax function were used as the final processing steps to compress the behavioral features and perform the behavioral classification, respectively. The proposed model achieved high accuracy in detecting and classifying cow behaviors. This project adopted a different approach from ours, though both were based on a CNN model. Both projects also identify multiple behaviors with the same algorithm. However, this work focuses solely on behavior detection and does not attempt to identify cattle as individuals.
Another study presented in [23] achieved acceptable results in detecting the behaviors of dairy cows. The authors developed a deep learning model called Res-DenseYOLO for the automatic recognition of dairy cow behaviors, specifically standing, lying, eating, and drinking. The model improves on YOLOv5 by incorporating DenseNet and residual network structures to enhance feature extraction. However, this work neither identifies individual cows nor continuously tracks the duration of behaviors.
Despite the previous success of cattle identification using computer vision to identify coat patterns, the existing work has notable limitations [17,24]. The imagery datasets used typically have a small field of view, often set up where cattle walk through narrow passages with limited ability to turn [8,10,25]. Additionally, the lighting is constant, and there may be only one or a few cattle in the frame at a time, all of which simplify the task of identifying cattle by computer vision but limit the potential applications in a busy barn. In contrast, our approach is designed to identify cattle at a distance and in an open space within a broad view frame.
In summary, the existing work focusing on the automatic characterization of behavioral phenotypes for dairy cows used different approaches for cattle identification [11,17,18] and behavioral monitoring [7,11,22] via computer vision and ML. However, none of them has combined these two objectives into a single platform or provided a user-friendly GUI to the system. To the best of our knowledge, our approach represents the first step to building a system that automatically identifies dairy cattle based on biometric features and monitors their behaviors of interest based on interactions with other objects in the barn (i.e., water troughs and brush stations).

Dataset Collection
Figure 2 illustrates the camera placement in the barn at a low angle to capture the side of the cows. As shown in the schematic figure, each barn at the T&K Dairy is fitted with Safevant, Safesky, and 1080P Isotect wireless security cameras that continuously capture individual cow behavior at the waterers and the brushes throughout the 45-day observation period. Cows are milked twice daily using a Lely Robotic Milking System equipped with 18 robots (3 robots per pen). Cows are cooled using multiple strategies: the barn is equipped with fans, sprinklers, and foggers. When the air temperature in the barn exceeds 74 °F, the sprinklers begin running for one minute at a time in a round-robin system across all pens. Thus, each pen will have the sprinklers turned on for one minute at least ten times per hour until the temperature falls below 74 °F. When the temperature in the barn exceeds 80 °F, the fogger system will begin operating and will continue until the temperature drops below 80 °F.
Several variables are collected using the robotic system and recorded in the Lely management software Time for Cows (T4C). Of specific interest to this project are milk production, yield, maximum milk speed, dead milking time, and robot behavior (i.e., visit, rejection, and fetch frequency). A subset of focal cows (n = 96; 16 cows/pen) that were 45-90 days in milk (DIM) was monitored for a 45 d period.
We captured a video dataset on 12-13 March 2023, consisting of 3421 videos with a total duration of 24 h of continuous recording of the cows. These videos were recorded in DAT format. We converted them to MP4 format using the FFmpeg conversion tool [26]. This preprocessing step was necessary because the MP4 format offers high compression and compatibility with numerous multimedia applications, making it the preferred choice for ensuring seamless playback and processing.
These video recordings were used to quantify drinking behavior and brush use behavior. Where individual identification is required on the video recordings, each dairy cow's unique coat color spotting pattern can be used. During the time that individuals were fitted with pedometers, their drinking and brush use behavior (frequency, duration, circadian pattern, displacements) was decoded from the video recordings. While this is possible using manual decoding methods, the development of automatic ML-based methods can expedite data collection, knowledge creation, and results implementation.

Methodology
The practical side of the proposed approach is to build machine vision and ML methods to support the automatic acquisition and processing of imagery data needed to develop behavioral phenotypes for dairy cows relevant to thermotolerance. The foundational work aims to understand the principles underlying such systems and inform the design and implementation decisions about them.
Figure 3 shows the system architecture of the proposed approach, which is divided into four layers. Layer 1 shows the video preprocessing phase, which involves slicing the collected video dataset (i.e., 3421 videos) into individual frame images at predefined intervals using a Python 3.12.4 script leveraging the FFmpeg framework [26]. We then extracted 1961 cow objects to train the cow clustering algorithm and CNN model. The Roboflow tool [27] was utilized to annotate the cow, brushing tool, and waterer objects to train the detection and segmentation models. Layer 2 shows the cow detection and segmentation module using the YOLOv8 model, the cow clustering module using the K-means model, and the cow identification module using the CNN and SENet models [28]. Layer 3 describes the application layer, implemented using the Python Flask Framework [29] and the SQLite version 3.46 database engine [30], which provides a web-based GUI app that allows system users, shown in layer 4, to use the system conveniently.
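As an illustration of the frame-slicing step in Layer 1, the following is a minimal sketch of how a Python script might invoke FFmpeg to save one frame per fixed interval. The function names, the output file pattern, and the 5 s default interval are assumptions made for illustration; the source does not state the actual interval or script structure.

```python
import subprocess
from pathlib import Path


def build_frame_extraction_cmd(video_path, out_dir, every_n_seconds=5):
    """Build an FFmpeg command that saves one frame every `every_n_seconds` seconds.

    The interval and the output naming pattern are illustrative assumptions.
    """
    out_pattern = str(Path(out_dir) / "frame_%06d.jpg")
    return [
        "ffmpeg",
        "-i", str(video_path),              # input video (e.g., a converted MP4)
        "-vf", f"fps=1/{every_n_seconds}",  # keep one frame per interval
        "-q:v", "2",                        # high JPEG quality
        out_pattern,
    ]


def extract_frames(video_path, out_dir, every_n_seconds=5):
    """Run the command; requires the `ffmpeg` binary to be available on PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_frame_extraction_cmd(video_path, out_dir, every_n_seconds)
    subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the interval easy to inspect or change without running FFmpeg.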
Using unsupervised and supervised ML models alongside algorithmic tracking and behavior analysis [24], we utilized a pipeline approach for processing the video dataset where data flows from one layer to the next. Figure 4 shows the different phases of cow detection, clustering, identification, and tracking behaviors of interest.

Cow Detection and Segmentation Using YoloV8
The YoloV8 model is trained using a custom imagery dataset to accurately detect and segment the cow objects in the video frames. YoloV8 is a deep learning model known for high-performance, real-time object detection in video streams. Upon receiving video input, YoloV8 processes the frames to identify and locate the cow, water tank, and brushing tool objects, assigning bounding boxes around them.
After detecting the objects of interest (i.e., cow, water tank, and brushing tool objects), we used a cropping tool to extract the bounding boxes generated by YoloV8 containing these objects from the frames. This extraction process is vital to isolate objects of interest from their background, allowing for cleaner data input into the next clustering phase. This process is summarized in Algorithm 1.

We used the TaskAlignedAssigner class to improve the model's performance by effectively matching the predicted bounding boxes with ground truth boxes. In particular, it calculates a score $s$ for each predicted box, as follows:

$$s = \gamma^{m} \times \eta^{n}$$

where $\gamma$ is the prediction score corresponding to the ground truth category, $\eta$ is the IoU of the prediction bounding box and the ground truth bounding box, and $m$ and $n$ are hyperparameters that weight the importance of the classification score and the IoU score, respectively. TaskAlignedAssigner ensures that only those predictions that are both confident in their class prediction and accurate in their localization are selected as positive samples. This dual consideration helps the model learn more effectively from classification and localization tasks, leading to improved overall performance in object detection.
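To make the alignment score concrete, the sketch below computes the IoU of two boxes and combines it with a classification score as the product of the two terms raised to the exponents m and n. The function names, the corner box format, and the default exponent values are illustrative assumptions, not the training configuration used in this work.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alignment_score(cls_score, pred_box, gt_box, m=0.5, n=6.0):
    """Score s = cls_score**m * IoU**n; the default exponents are assumed values."""
    return (cls_score ** m) * (iou(pred_box, gt_box) ** n)
```

A larger n penalizes poorly localized boxes more strongly, so a confident prediction with a low IoU still receives a small score.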

Cow Clustering Using K-Means
The extracted cow objects are then fed into a K-means clustering algorithm, an unsupervised learning algorithm that groups the cow objects into clusters based on their visual similarities. The K-means algorithm iteratively assigns each cow object to one of K predefined clusters based on feature similarities, minimizing variance within the clusters and maximizing variance between them.
Algorithm 2 shows the steps of the cow clustering phase, which is divided into the following processes: (i) texture feature extraction using the Local Binary Pattern (LBP) method and (ii) creating color histograms by capturing and analyzing the color distribution in the cow images. We also used the Principal Component Analysis (PCA) method to reduce the feature dimensionality and focus on the most important features of the input images.
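The two feature types named above can be sketched in a few lines. The example below computes a basic 8-neighbour LBP histogram for a grayscale image and a per-channel color histogram for RGB pixels, both on plain Python lists. The neighbour ordering, the bin counts, and the function names are assumptions for illustration; the PCA step is omitted here and would be applied to the concatenated feature vectors.

```python
def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes.

    `gray` is a 2D list of intensities; border pixels are skipped.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(gray), len(gray[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if gray[r + dr][c + dc] >= gray[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def color_histogram(pixels, bins=4):
    """Concatenated, normalized per-channel histograms of (r, g, b) pixels in 0..255."""
    hist = [0] * (3 * bins)
    count = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + min(value * bins // 256, bins - 1)] += 1
        count += 1
    count = count or 1
    return [h / count for h in hist]
```

Concatenating the two histograms gives one feature vector per cropped cow image, which can then be reduced and clustered.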
As shown in the algorithm, we first select K initial cluster centroids at random from the data points. We then assign each data point to the nearest cluster centroid. For a given data point $x_i$ and centroid $\mu_j$, the assignment is performed as follows:

$$c_i = \arg\min_{j} \lVert x_i - \mu_j \rVert^2$$

where $c_i$ is the cluster assignment for data point $x_i$, and $\lVert x_i - \mu_j \rVert^2$ is the squared Euclidean distance between $x_i$ and $\mu_j$. The K-means algorithm updates the centroids after each iteration by calculating the mean of the data points assigned to each cluster, as follows:

$$\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$$

where $\mu_j$ is the new centroid of cluster $j$, and $C_j$ is the set of data points assigned to cluster $j$. K-means tries to minimize the Within-Cluster Sum of Squares (WCSS), or inertia, objective function, which is defined as:

$$\mathrm{WCSS} = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$$

The algorithm keeps iterating between the assignment and update steps until convergence, typically when the cluster assignments no longer change or the change in the objective function is below a certain threshold.
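The assignment and update steps can be illustrated with a compact pure-Python version of the iteration. This is a sketch for exposition only, assuming points are tuples of floats and that convergence means unchanged assignments; the system described here uses the scikit-learn implementation rather than this code.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Alternate assignment and centroid updates; returns (assignments, centroids, wcss)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    assignments = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    wcss = sum(squared_distance(p, centroids[c]) for p, c in zip(points, assignments))
    return assignments, centroids, wcss
```

The returned WCSS value is the objective that the two alternating steps reduce at every iteration.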

Cow Identification Using a CNN and SENet Model
We trained a Convolutional Neural Network (CNN) model enhanced with Squeeze-and-Excitation Network (SENet) layers [28] using the cow clusters generated from the clustering phase to identify the cow objects based on their features.
The training process allows the CNN to learn the nuanced differences between clusters by calculating a similarity score for each cow against the cluster centroids. If the score exceeds a predefined threshold, the cow is assigned the ID of that cluster; otherwise, the cow is flagged as potentially new or not belonging to any existing cluster. Algorithm 3 shows the steps of the cow identification phase using the CNN and SENet model.

Algorithm 3 Cow identification using CNN and SENet Models

The SENet block enhances the feature extraction and representation of the trained cow images by dynamically recalibrating channel-wise feature responses. First, SENet applies a convolution operation $\delta$ to the input feature map $I$, as follows:

$$X = f_{\delta}(I)$$

where $f_{\delta}(I)$ represents the convolution operation and $X$ is the output feature map with dimensions $H \times W \times C$.

Then, the squeeze operation performs a global average pooling on $X$ to generate a channel descriptor $\theta$, which is defined as:

$$\theta_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{i,j,c}$$

where $\theta_c$ is the $c$th element of the descriptor $\theta \in \mathbb{R}^{C}$.

The excitation operation models the channel-wise dependencies using two fully connected layers with ReLU and sigmoid activations, as follows:

$$s = \sigma\left(W_2 \, \mathrm{ReLU}(W_1 \theta)\right)$$

where $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the weight matrices, $r$ is the reduction ratio, and $\sigma$ is the sigmoid function.

Finally, the recalibration step scales the original feature map $X$ by the channel-wise weights $s$, as follows:

$$Y_{i,j,c} = s_c \cdot X_{i,j,c}$$

where $Y_{i,j,c}$ is the recalibrated feature map.
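The squeeze, excitation, and recalibration steps can be traced on a tiny feature map. The sketch below operates on nested Python lists so that each step is visible; the function name and the explicit weight arguments are assumptions for illustration, bias terms are omitted, and a real model would use a deep learning framework with learned weights.

```python
import math


def se_block(feature_map, w1, w2):
    """Apply squeeze, excitation, and recalibration to an H x W x C nested list.

    w1 has shape (C/r) x C and w2 has shape C x (C/r); biases are omitted.
    Returns the recalibrated feature map and the channel weights.
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    # Squeeze: global average pooling gives one descriptor value per channel.
    theta = [
        sum(feature_map[i][j][c] for i in range(height) for j in range(width))
        / (height * width)
        for c in range(channels)
    ]
    # Excitation: fully connected layer + ReLU, then fully connected layer + sigmoid.
    hidden = [max(0.0, sum(w * t for w, t in zip(row, theta))) for row in w1]
    scale = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Recalibration: scale every spatial position of channel c by its weight.
    out = [
        [[feature_map[i][j][c] * scale[c] for c in range(channels)] for j in range(width)]
        for i in range(height)
    ]
    return out, scale
```

Because the sigmoid output lies between 0 and 1, the block can only attenuate channels relative to one another, which is how it emphasizes the more informative ones.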

Tracking Cow Behaviors of Interest Using DeepSORT
The DeepSORT algorithm is used to track the cow's behaviors of interest (i.e., drinking and brushing). DeepSORT extends the SORT (Simple Online and Real-Time Tracking) algorithm by incorporating deep learning features for more accurate tracking in crowded and complex environments. It can track multiple objects in a video stream while handling challenges such as occlusion and reappearance.
DeepSORT uses the assigned cluster IDs generated from the previous CNN phase across the video frames and associates the recognized behaviors of interest with the individual cows throughout the recorded videos. Algorithm 4 summarizes this process.

Algorithm 4 Tracking cow behaviors of interest using DeepSORT

We used the Kalman filter to enhance the motion prediction. The Kalman filter predicts the current state $x_t$ of an object based on its previous state $x_{t-1}$, as follows:

$$x_t = F x_{t-1} + B u_t + w_t$$

where $F$ is the state transition matrix, $B$ is the control input matrix, $u_t$ is the control vector, and $w_t$ is the process noise.

The observation model updates the state with new measurements $z_t$, as follows:

$$z_t = H x_t + v_t$$

where $H$ is the observation matrix and $v_t$ is the measurement noise.

The cost matrix $\psi$ combines the motion and appearance information to match detections to tracks for better data association:

$$\psi_{i,j} = \lambda \, d(y_i, y_j) + (1 - \lambda) \, \mathrm{cosine}(f_i, f_j)$$

where $d(y_i, y_j)$ is the Mahalanobis distance between the predicted state $y_i$ and the actual detection value $y_j$, $\mathrm{cosine}(f_i, f_j)$ is the cosine distance between the deep feature vectors $f_i$ and $f_j$, and $\lambda$ is a weight parameter to balance motion and appearance costs.
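The following sketch shows how a motion cost and an appearance cost might be combined for one track and one detection. It makes several simplifying assumptions that are not taken from the source: the state is a four-element constant-velocity vector (cx, cy, vx, vy) with no control input or noise, the covariance used in the Mahalanobis distance is diagonal, and all function names are hypothetical.

```python
import math


def predict_state(state, dt=1.0):
    """Constant-velocity prediction for (cx, cy, vx, vy); control and noise omitted."""
    cx, cy, vx, vy = state
    return (cx + dt * vx, cy + dt * vy, vx, vy)


def mahalanobis(predicted_xy, detected_xy, variances):
    """Mahalanobis distance under a diagonal covariance (assumed for brevity)."""
    return math.sqrt(
        sum((p - d) ** 2 / v for p, d, v in zip(predicted_xy, detected_xy, variances))
    )


def cosine_distance(f_i, f_j):
    """One minus the cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(f_i, f_j))
    norm = math.sqrt(sum(a * a for a in f_i)) * math.sqrt(sum(b * b for b in f_j))
    return 1.0 - dot / norm


def association_cost(motion_cost, appearance_cost, lam=0.5):
    """Weighted sum of motion and appearance costs; lam balances the two terms."""
    return lam * motion_cost + (1.0 - lam) * appearance_cost
```

Evaluating `association_cost` for every track and detection pair yields the cost matrix that the assignment step then minimizes.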

Cow Behavior Analysis with Overlap Detection Algorithm
The final phase in our pipeline system is quantifying the cow behaviors of interest using an overlap detection algorithm. In particular, we developed an algorithm that calculates the duration each cow spends drinking water or using the brush tool by measuring the overlap area between the bounding boxes of the cow and the water tank or brushing tool. This duration is then logged into the database so that the cows' health indicators can be monitored over time.
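As a rough illustration of the logging step, the sketch below stores behavior durations in SQLite and sums them per cow. The table name, column names, and helper functions are hypothetical; the actual database schema is not described in the source.

```python
import sqlite3


def open_behavior_log(path=":memory:"):
    """Open (or create) a database with a hypothetical behavior_events table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS behavior_events ("
        "cow_id INTEGER NOT NULL, "
        "behavior TEXT NOT NULL, "
        "start_s REAL NOT NULL, "
        "duration_s REAL NOT NULL)"
    )
    return conn


def log_event(conn, cow_id, behavior, start_s, duration_s):
    """Record one drinking or brushing bout for a cow."""
    conn.execute(
        "INSERT INTO behavior_events VALUES (?, ?, ?, ?)",
        (cow_id, behavior, start_s, duration_s),
    )
    conn.commit()


def total_duration(conn, cow_id, behavior):
    """Total seconds a cow spent on a behavior; 0 when nothing was logged."""
    row = conn.execute(
        "SELECT COALESCE(SUM(duration_s), 0) FROM behavior_events "
        "WHERE cow_id = ? AND behavior = ?",
        (cow_id, behavior),
    ).fetchone()
    return row[0]
```

Storing one row per bout, rather than a running total, keeps frequency and circadian pattern recoverable from the same table.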
In Algorithm 5, the Calculate the Coordinates of the Bounding Box procedure describes the step of calculating the coordinates of the bounding box of an object of interest using the input parameters x_center, y_center, width, and height, and converting the result to a corner format (i.e., left, top, right, bottom).
Procedure Check the Existence of Overlap between the Bounding Boxes presents the functionality of checking the existence of overlap between the bounding box of a cow object and that of a water tank or brushing tool object. The coordinates of the two boxes, bbox1 and bbox2, are fed into the calculate_overlap_area function, which returns True if there is an overlap between bbox1 and bbox2; otherwise, it returns False.
In Procedure Calculate the Overlap Area between Two Bounding Boxes, we calculate the overlap area between the input bounding boxes (bbox1, bbox2) as x_overlap × y_overlap, where x_overlap ← max(0, min(right1, right2) − max(left1, left2)) and y_overlap is computed analogously from the top and bottom coordinates. We then calculate the Euclidean distance between the centroids of the bounding boxes (i.e., the center (x, y) coordinates of each box), as described in Procedure Calculate the Euclidean Distance between the Centroids of Bounding Boxes.
We then check the proximity of a target bounding box to other boxes in a video frame by comparing the Euclidean distance between the target bounding box and each identified box in the scene against a predefined threshold.
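The procedures described above can be sketched as a handful of small functions. The code assumes image coordinates with the origin at the top left, boxes stored as (left, top, right, bottom), and a frame counted toward a behavior when the cow box both overlaps the resource box and lies within the distance threshold. That combination rule, the function names, and the threshold are assumptions for illustration.

```python
import math


def to_corners(x_center, y_center, width, height):
    """Convert a centre-format box to (left, top, right, bottom)."""
    left = x_center - width / 2
    top = y_center - height / 2
    return (left, top, left + width, top + height)


def overlap_area(bbox1, bbox2):
    """Area of the intersection rectangle; 0 when the boxes do not overlap."""
    x_overlap = max(0.0, min(bbox1[2], bbox2[2]) - max(bbox1[0], bbox2[0]))
    y_overlap = max(0.0, min(bbox1[3], bbox2[3]) - max(bbox1[1], bbox2[1]))
    return x_overlap * y_overlap


def centroid_distance(bbox1, bbox2):
    """Euclidean distance between the centres of two corner-format boxes."""
    c1 = ((bbox1[0] + bbox1[2]) / 2, (bbox1[1] + bbox1[3]) / 2)
    c2 = ((bbox2[0] + bbox2[2]) / 2, (bbox2[1] + bbox2[3]) / 2)
    return math.hypot(c1[0] - c2[0], c1[1] - c2[1])


def behavior_duration(cow_boxes, resource_box, fps, max_distance):
    """Seconds in which the cow box overlaps the resource and is close enough to it."""
    frames = sum(
        1
        for box in cow_boxes
        if overlap_area(box, resource_box) > 0
        and centroid_distance(box, resource_box) <= max_distance
    )
    return frames / fps
```

For example, with `cow_boxes` holding one tracked cow's box per frame and `resource_box` the water trough, `behavior_duration` gives that cow's drinking time in seconds for the clip.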

Implementation

Dataset Preprocessing
We applied various image preprocessing techniques to the training dataset to improve the training accuracy and decrease the training loss of the ML models. First, we changed the color contrast of the images and applied Gaussian noise. We also desaturated the images, making pixel colors more muted by adding more black and white. These transformations aim to reduce the influence of the background during the training process.
Before training the ML models, we normalized the pixel intensity values of the cow images so that they are distributed as a Gaussian curve centered at zero. Image normalization was calculated by subtracting the mean value of the cow image $\delta$ from the value of each pixel $C(i, j)$, then dividing the output by the cow image's standard deviation $\alpha$, as follows:

$$X(i, j) = \frac{C(i, j) - \delta}{\alpha}$$

where $C$ is the input cow image, $X$ is the output image, and $i$ and $j$ are the current pixel indices to be normalized.

We augmented the number of images for a few cow clusters that lacked sufficient training images to avoid overfitting of the ML models. Figure 5 illustrates a sample of the implemented geometric transformations applied to these images. In particular, we implemented horizontal flipping, −45° to 45° rotation, 1.5× scaling, zoom with a range of 0.2, width and height shifts with a relative scale of 0.3, and manual cropping of some images.

The Roboflow tool was used to annotate the objects of interest (i.e., cows, water banks, and brushing objects) for our detection and segmentation modules, as shown in Figure 6. Roboflow is a versatile platform for annotating imagery datasets; it utilizes the Segment Anything Model (SAM) for instance segmentation, which makes the annotation task about 10 times faster than traditional annotation methods. However, it fails to annotate some objects where the edges of two objects are mixed up. In these scenarios, we had to define and describe the spatial regions of the target objects manually.
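The per-image normalization described earlier in this subsection (subtract the image mean, divide by the image standard deviation) can be sketched on a small single-channel image as follows. The function name, the use of the population standard deviation, and the handling of constant images are assumptions for illustration.

```python
import statistics


def normalize_image(image):
    """Return (pixel - mean) / std for a 2D list of pixel intensities.

    A constant image has zero standard deviation, so it maps to all zeros.
    """
    pixels = [value for row in image for value in row]
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    if std == 0:
        return [[0.0 for _ in row] for row in image]
    return [[(value - mean) / std for value in row] for row in image]
```

After this step every image has zero mean and unit variance, which removes brightness differences between frames recorded at different times of day.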

ML Models
We implemented the ML models in a Jupyter development environment [31] using PyTorch [32], an open-source deep learning library for Python. PyTorch provides comprehensive tools and pre-built functions that facilitate the construction of deep learning models. The ML model training was conducted using an Alienware server computer (Dell, Irvine, CA, USA) equipped with a 5 GHz Intel Core™ i9-16 MB CPU, a 2 TB SSD hard drive, 32 GB of RAM, and an NVIDIA RTX GPU.

Cow Detection and Segmentation Using YoloV8
We developed a cropping tool using the Python programming language to extract the objects of interest from the video frame images using the bounding boxes generated by YoloV8. Figure 7 shows examples of the extracted cows, water bank, and brushing objects. This process isolates the objects of interest from their background, allowing cleaner data input into the clustering phase. The YOLOv8 model was trained with the annotated cow images in various poses, interactions, and lighting conditions typical of a farm setting. This diversity helps the model to recognize the cow objects reliably under different real-world conditions. We set the batch size and number of epochs to 50 images and 100 epochs, respectively.

Cow Clustering Using K-Means
The K-means clustering algorithm was implemented using scikit-learn [33], a free and open-source ML library for the Python programming language. The developed K-means model generated 300 different cow clusters, each with multiple images of the same cow from various angles and poses. Figure 10 shows an example of the cow images from the same cluster.

Cow Identification Using the CNN and SENet Model
Before training the CNN model, all cow images must be the same size. We trained the model with colored (RGB) images resized to 200 × 200 pixels. We set the batch size to 100 images and the number of epochs to 15; a snapshot of the trained weights is taken every 5 epochs to monitor the progress.
The CNN model is structured with over 59 million trainable parameters. The model comprises four convolutional layers together with an input layer, a classification layer, and an SENet block. We adjusted several model parameters, including the learning rate, number of layers, and number of neurons per layer, to find the optimal configuration that maximizes precision and meets the confidence threshold requirements.

Tracking Cow Behaviors of Interest Using DeepSORT
The DeepSORT algorithm was implemented using YOLO libraries and various Python libraries, including OpenCV 4.9 (cv2), pandas, and shutil. Figure 12 shows successful examples of quantifying the drinking and brushing behaviors of multiple cows in the same scene. As shown in the figure, our system shows the duration of each behavior of interest in seconds, along with the identified cow IDs.

GUI User Interface
We built the web app using the Python Flask Framework, ReactJS, HTML5, CSS3, JavaScript, and JSON. To run the web application on top of the ML models, we had to wrap both models, implemented in PyTorch, as a REST API using the Flask web framework. In other words, the communication between PyTorch and Flask is coordinated through that REST API. When the user captures an image using the camera, the browser sends the image to the Flask REST API in an HTTP POST request, and Flask passes it to the PyTorch models.
The GUI is designed to be intuitive, allowing users to easily upload videos, view analysis results, and receive notifications about the behaviors of interest related to heat stress. Figure 13a shows the videos page that allows users to upload videos and preview them to confirm correctness.
Once a video has been uploaded, the user can start the cow detection pipeline by clicking the Start Inference button. Figure 13b shows the cows page that displays all identified cows along with their assigned identification number and sample photo. Once the inference process has been completed, the user is redirected to the dashboard page to view the behavioral analysis of the uploaded video (see Figure 13c). Users can also view the duration of each behavior of interest and preview the inference video, which shows the identified cows and the time spent on each behavior.

Evaluation
We experimentally evaluated our prototype implementation in terms of classification accuracy and performance. For classification accuracy, we observed that our system delivers good results in natural conditions even when the images are captured at different distances from the camera, orientations, and illumination conditions. Figure 14 shows an example of successful inference of cow identifiers and their behaviors of interest, along with the duration spent in each activity. The precision vs. recall curve, shown in Figure 15, summarizes the trade-off between the true positive rate (recall) and the positive predictive value (precision) for our YoloV8 model at different probability thresholds. In other words, it indicates the model's ability to accurately identify the cow objects while maintaining a balance between false positives and false negatives. The curve demonstrates that the model achieves high precision and recall across a wide range of thresholds. It also attests to the model's effectiveness in detecting cows regardless of the sensitivity level, which suggests that our system can be reliably deployed in real-world scenarios.
Precision represents the positive predictive value of our model, while recall measures the proportion of actual positives that the model identifies correctly. As shown in the figure, the precision vs. recall curve tilts towards 1.0, which means that our YoloV8 model achieves high precision while minimizing the number of false negatives.
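As an illustration of how the points on a precision vs. recall curve arise, the sketch below sweeps a confidence threshold over a handful of scored predictions and computes precision and recall at each threshold. The scores, labels, and thresholds are made-up values, not results from our model, and the function name `precision_recall_points` is hypothetical. For object detectors, deciding whether a detection counts as a true positive additionally involves matching it to a ground-truth box, which this simplified example leaves out.

```python
def precision_recall_points(scores, labels, thresholds):
    """Return (threshold, precision, recall) for each confidence threshold."""
    points = []
    total_positives = sum(labels)
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        # With no positive predictions, precision is undefined; report 1.0 by convention.
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / total_positives if total_positives else 0.0
        points.append((t, precision, recall))
    return points


# Made-up detection confidences and ground-truth labels (1 = cow, 0 = not a cow).
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0, 0]
curve = precision_recall_points(scores, labels, thresholds=[0.3, 0.5, 0.85])
```

In this toy data, raising the threshold increases precision and lowers recall, which is the trade-off that the curve summarizes.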
The precision ratio describes the performance of our model at predicting the positive class. It is calculated by dividing the number of true positives (TPs) by the sum of TPs and false positives (FPs), as follows:

$$\text{Precision} = \frac{\text{TPs}}{\text{TPs} + \text{FPs}} \tag{13}$$

The recall ratio is calculated as the number of TPs divided by the sum of TPs and false negatives (FNs), as follows:

$$\text{Recall} = \frac{\text{TPs}}{\text{TPs} + \text{FNs}} \tag{14}$$

The overall classification accuracy of our model is calculated as the ratio of correctly predicted observations (i.e., the sum of TPs and true negatives (TNs)) to the total observations (i.e., the sum of TPs, FPs, FNs, and TNs) using this equation:

$$\text{Accuracy} = \frac{\text{TPs} + \text{TNs}}{\text{TPs} + \text{FPs} + \text{FNs} + \text{TNs}} \tag{15}$$
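The three ratios defined above can be computed directly from confusion-matrix counts, as in the following minimal sketch. The function name `classification_metrics` is hypothetical, and the counts are made-up numbers for illustration only; they are not results from our evaluation.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    # Each ratio falls back to 0.0 when its denominator is zero (undefined case).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Made-up counts for illustration only; they are not results from the paper.
precision, recall, accuracy = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```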

Conclusions and Future Work
As intensive dairies grow, the need for automatic cattle monitoring becomes more pressing. Manual observation can be practical on a small scale but quickly becomes infeasible when dairies host hundreds or thousands of cows. Further, there is an increasing need to use modern technologies, including computer vision and AI, to track behavioral changes and alert the farmer to the herd's health status. This paper presented the design and implementation of an ML-powered approach for automatically characterizing behavioral phenotypes for dairy cows relevant to thermotolerance.
We collected a dataset consisting of 3584 videos of 24 h of continuous recording of hundreds of cows captured from the T&K Dairy in Snyder, Texas. The developed system used computer vision and ML models to monitor two cow behaviors of interest: the drinking and brush use of dairy cows in a robotic milking system. In particular, we utilized the YoloV8 model to detect and segment cow, water tub, and brushing tool objects. The K-means algorithm was used to group the cows into clusters, which were used as input to a CNN model to identify the cows in the videos. We used the DeepSORT model to track the cow activities in the barn. Finally, we quantified the behaviors of interest using the developed overlap detection algorithm. A user-friendly interface was created on top of the ML models, allowing ranchers to interact with the system conveniently. We tested our system with a dataset of various cow videos, where crowded backgrounds, low contrast, and images of diverse illumination conditions were considered. Our system achieved high precision in object detection and behavior recognition, which was corroborated by the system's ability to accurately track and analyze the cow behaviors of interest within a dynamic farm environment. Most notably, the YoloV8 and CNN models achieved accuracies of 93% and 96% in detecting the objects of interest and identifying the cow IDs, respectively.
In ongoing work, we are looking into opportunities for generalizing our approach to detect a broader range of changes in behaviors or health indicators in various farm conditions [6], such as increased mounting or standing behavior that can indicate that a cow is going into estrus. Similarly, changes in walking and lying behavior can indicate lameness before it is evident enough to be noticed by manual inspection [21]. Another avenue of further improvement is incorporating IoT sensors into the barn that could automate data collection and action initiation, such as adjusting environmental conditions in response to detected behaviors, thereby enhancing the system's responsiveness. We expect the developed system to inform genetic selection decisions and to impact dairy cow welfare and water use efficiency.

Figure 1. Sample examples from our video dataset.

Figure 2. Camera placement in the barn.

Figure 4. Phases of cow detection, clustering, identification and tracking behavior.

1: Input: video_path, output_video_path, model_path
2: Output: Annotated Video, Activity Videos, Cow Images
3: procedure INFERENCE(video_path, output_video_path)
    ...
    ... cow image to predict cluster ID
17:     Save cow image and update the database with the cluster ID
18:     if cow is performing an activity then
    ...
    ... and release resources
27: end procedure
28: procedure PREDICT_CLUSTER_ID(image)
29:     Transform image to tensor
30:     Predict cluster-ID using cow identification model
31:     ...

Figure 6. Annotating the cows, water bank and brushing objects using Roboflow.

Figure 7. The extracted cows, water trough and brush objects.

Figure 8 illustrates the calculated training loss of the cow detection model for four different loss functions: box loss, segmentation loss, classification loss, and total loss. The accuracy increases while the Mean Squared Error loss decreases consistently over the 100 training epochs. As shown in the figure, our model converged after the 75th epoch, which suggests that our image dataset and the fine-tuned parameters fit the model well.

Figure 8. The training loss of the YoloV8 model.

Figure 9 illustrates an example inference result of the YOLOv8 model, which detects the cows, water bank, and brushing objects with high accuracy. As shown in the figure, the developed YOLOv8 model performed various computer vision and ML functionalities, including object detection, segmentation, pose estimation, tracking, and classification.

Figure 11 illustrates an example of the inference result of the CNN model for identifying the correct cow ID from different clusters. As shown in the figure, the CNN model classified the input cow objects to the correct cluster ID based on their coat patterns with an average accuracy of 96%.

Figure 11. Examples of successful inference results using the CNN model.

Figure 12. Cow behavior analysis using the overlap detection algorithm.

Figure 13. Screenshots of the web-based GUI. (a) The video dashboard page that allows users to upload the videos for inference. (b) The cows page that shows the identified cow IDs that appeared in the videos. (c) The recognized cow behaviors page showing the drinking and brushing activities along with their timestamps and durations.

Figure 14. An example of successful inference of cow identifiers and their behaviors of interest.

Figure 15. The precision-recall curve of the YoloV8 model.

1: Define image transformations for data augmentation
2: Load the training, validation, and testing datasets
3: function IMSHOW(input, title)

1: procedure CALCULATE THE COORDINATES OF THE BOUNDING BOX
2:     Require x_center, y_center, width, height
3:     left ← int(x_center − width/2)
    ...
    if left1 ≥ right2 OR right1 ≤ left2 OR top1 ≥ bottom2 OR bottom1 ≤ top2 then
    ...
35: procedure CHECK THE CLOSE PROXIMITY OF A TARGET BOUNDING BOX TO OTHER BOXES IN A VIDEO FRAME
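The corner conversion and the disjointness test that appear in the listing above can be sketched in Python as follows. Only the computation of the left coordinate and the four-way comparison are taken from the listing; the remaining corner formulas, the assumption that the comparison signals non-overlapping boxes, the function names `to_corners` and `boxes_overlap`, and the example boxes are our illustrative assumptions.

```python
def to_corners(x_center, y_center, width, height):
    """Convert a center-format box to (left, top, right, bottom) pixel corners."""
    left = int(x_center - width / 2)
    # The three lines below are assumed by analogy with the left coordinate.
    top = int(y_center - height / 2)
    right = int(x_center + width / 2)
    bottom = int(y_center + height / 2)
    return left, top, right, bottom


def boxes_overlap(box1, box2):
    """Return True when two (left, top, right, bottom) boxes share any area."""
    left1, top1, right1, bottom1 = box1
    left2, top2, right2, bottom2 = box2
    # Same comparison as in the listing: the boxes are disjoint if one lies
    # entirely to the side of, above, or below the other.
    if left1 >= right2 or right1 <= left2 or top1 >= bottom2 or bottom1 <= top2:
        return False
    return True


# Hypothetical boxes in pixel units (image origin at the top-left corner).
cow = to_corners(100, 100, 80, 60)
trough = to_corners(150, 110, 40, 40)
far_brush = to_corners(400, 300, 50, 50)
cow_at_trough = boxes_overlap(cow, trough)
cow_at_brush = boxes_overlap(cow, far_brush)
```

In this example the cow box overlaps the trough box but not the distant brush box, so only the drinking behavior would be counted for that frame.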