Novel method for real-time detection and tracking of pig body and its different parts

Detecting and tracking all major parts of the pig body can make the analysis of pig behavior more productive. To achieve this goal, a real-time algorithm based on You Only Look At CoefficienTs (YOLACT) was proposed. The pig body was divided into ten parts: one head, one trunk, four thighs and four shanks. The key points of each part were calculated by a novel algorithm based mainly on a combination of the Zhang-Suen thinning algorithm and a center-of-gravity algorithm. The experimental results showed that these parts of the pig body could be detected and tracked, and that their contributions to overall pig activity could be quantified. The detection accuracy of the algorithm on the data set reached up to 90%, and the processing speed reached 30.5 fps. Furthermore, the algorithm was robust and adaptive.


Introduction 
China has the largest number of pigs bred of any country [1]. With the continuous expansion of factory farming, the risk in pig farming is also increasing [2]. Detection and tracking of pigs can help to find abnormal situations in advance and protect pigs from diseases [3][4][5][6].
To replace manual recording with automatic recording, many new technologies have been applied in recent decades. Some studies adopted radio frequency identification and sensor technology for pig monitoring [7,8]. Compared with these, machine vision has the advantages of simple operation, low cost, fast detection, a wide application range, and great potential and benefit in agricultural production [9]. Shao et al. [10] designed a system that could detect the movement state of pigs and classify their comfort state into three categories by crowding degree. Nasirahmadi et al. [11] adopted Delaunay triangulation to analyze the relationship between ambient temperature and the crowding degree of pigs. Kashiha et al. [12] used pattern recognition to identify pigs, but the patterns could not be stored for a long time. To achieve long-term monitoring, Ahrendt et al. [13] built support maps to estimate the location and identity of pigs. Xiao et al. [14] adopted detection and tracking based on a set of association rules with constraint items (DT-ACR) to track pigs. However, this method could not completely remove the influence of light, adhesion and occlusion on monitoring in a natural environment. Other researchers have made great efforts to solve the problem with convolutional neural networks (CNN) [15]. Among them, Hua et al. [16] combined occlusion and motion reasoning with a tracking-by-detection approach to handle occlusion, and Yang et al. [17] presented a method to predict the position of a target through geometric transformation of an object. To improve detection accuracy, R-CNN [18] combined region proposals with a CNN to detect targets. The recognition performance was clearly improved, but the processing speed of this algorithm was a bottleneck.
In response to the slow detection speed of Region-CNN (R-CNN), Faster R-CNN [19] with the Region Proposal Network (RPN) was proposed to improve target detection speed. Although the processing speed is improved, it is still difficult to achieve real-time detection, which requires a speed above 30 fps.
YOLACT is the first one-stage instance segmentation model whose detection speed is above 30 fps [20], while other deep learning models, such as the Single Shot MultiBox Detector (SSD) [21] and You Only Look Once (YOLO) [22,23], are mainly for object detection. YOLACT forgoes an explicit feature localization step and divides an instance segmentation task into two parallel tasks: (1) generating a series of prototype masks covering the whole image; (2) predicting a series of linear combination coefficients for each instance. Moreover, Fast Non-Maximum Suppression (Fast NMS) is adopted to process the predicted masks. Its input is an RGB image, and its output is a set of instance masks that differ in location, color and semantic class.
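The way the two parallel outputs are combined can be illustrated with a small example: each instance mask is a linear combination of the prototype masks, weighted by that instance's coefficients and passed through a sigmoid, as in the YOLACT formulation. The following is a minimal pure-Python sketch; the function name `assemble_mask`, the toy prototypes and the 0.5 threshold are illustrative assumptions and not part of the network code used in this study.

```python
import math


def assemble_mask(prototypes, coefficients, threshold=0.5):
    """Combine k prototype masks (each H x W) with k coefficients for one instance.

    Returns a binary H x W mask: sigmoid(sum_k c_k * P_k) > threshold.
    The threshold value is an assumption made for this sketch.
    """
    if len(prototypes) != len(coefficients):
        raise ValueError("need exactly one coefficient per prototype")
    height, width = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for r in range(height):
        row = []
        for c in range(width):
            # Linear combination of all prototypes at this pixel.
            activation = sum(
                coef * proto[r][c] for proto, coef in zip(prototypes, coefficients)
            )
            probability = 1.0 / (1.0 + math.exp(-activation))
            row.append(1 if probability > threshold else 0)
        mask.append(row)
    return mask


# Toy example: two 2 x 2 prototypes. The instance keeps what the first
# prototype responds to and suppresses what the second responds to.
prototypes = [
    [[4.0, 4.0], [0.0, 0.0]],  # responds to the top row
    [[0.0, 4.0], [0.0, 4.0]],  # responds to the right column
]
instance_mask = assemble_mask(prototypes, [1.0, -1.0])
```

Because the prototypes are shared by all instances and only the short coefficient vector is predicted per instance, the per-instance cost is a single weighted sum, which is what keeps the approach fast.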
In this paper, a real-time algorithm based on YOLACT is proposed to improve detection speed and accuracy. Detection and Tracking of Multiple Parts of the pig body (DTMP for short) is achieved to improve the ability to analyze pig behavior.
Some progress has been made in research on animal activity detection [24,25]. Oczak et al. [26] adopted an activity index to classify aggressive behavior. Ojukwu et al. [27] used a computer vision system to detect pig inactivity. Currently, there is no clear way to measure pig activity by detecting and tracking the movement of pig body parts. In this paper, the activity accumulation method [14] is used to measure pig activity, which reflects the activity state well and describes activity characteristics over a long time.

Animal model template
Based on the parameterized 3D model SCAPE (shape completion and animation of people) [28] and a 2D human pose model [29], a pig pose model was proposed. It was divided into 10 parts (separated by black dotted lines), and 15 key points (marked with red points) were determined, as shown in Figure 1.

Data acquisition and marking
This study was carried out in pens at Siping Hongzui Agricultural High-tech Development Co., Ltd. The data set was taken from ten pig pens that housed Landrace×Yorkshire crossbred pigs aged 1-2 years. The data were collected from June 10 to July 5, 2019, with a time span of 3 weeks, and three videos were collected for each pen, for a total of 30 videos. The proposed algorithm was intended to detect pig activity, so the selected data were mainly concentrated in the 11:00-13:00 and 15:00-17:00 time periods, which were the most active periods of the day for feeding pigs [30]. The duration of each video was about 130 min, the frame rate was 30 fps, and the image resolution was 640 × 480 pixels. For each pen, one randomly selected video was taken as the test sample, and the rest as the training sample. 10 000 images were randomly selected from the training sample, and 5000 images from the test sample. To achieve good generalization of the training results, 1022 pig images selected from the ImageNet dataset were added to the original training sample. Therefore, there were 11 022 images in the training sample and 5000 images in the test sample in all. No image preprocessing was applied before training, and the final result was taken as the average over 10 repeated training runs. Another set of data for the activity accumulation experiment was collected with the camera fixed in a corner of the pen at a height of 1.5 m and tilted 45° downward.
Labelme [31], an open source image labeling tool, was adopted to annotate masks on the training sample and test sample. A sample labeled image is shown in Figure 2.

Figure 2 Tag image of a pig from Labelme
The deep learning framework TensorFlow [32] and a Lenovo computer with 16 GB memory, the Windows 10 operating system, an Intel(R) Core(TM) i5-9400 CPU and an Nvidia GeForce RTX 2060 SUPER (8 GB) GPU were used to train the network and test the performance of the algorithm. In addition, because pigs are dormant at night, the night-time situation is not considered in this paper.

Detection and tracking of multiple parts of pig body (DTMP) algorithm
The operation procedure of the DTMP algorithm is shown in Figure 3. The DTMP algorithm starts by reading an image. The image is then processed by YOLACT, which yields a mask for each pig body and masks for the 10 parts of each pig body. A mask is the area information of a target in the image.

Figure 3 Flowchart of DTMP algorithm
Step 1: Assign each part to the corresponding body by its mask. Each pig body mask is fetched, and then the 10 parts of a pig are assigned to the corresponding pig body according to Equation (1), where (x_j, y_j) is a mask point of the j-th pig part mask and (x_i, y_i) is an edge point of the i-th pig body mask. When more than 95% of the mask points of a part lie inside a pig body mask, the part belongs to that body.
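The 95% rule in Step 1 can be sketched as follows. This is a minimal illustration that assumes each mask is available as a set of (x, y) pixel coordinates, rather than testing points against the body mask edge as in Equation (1); the function name `assign_parts_to_bodies` and the toy masks are hypothetical.

```python
def assign_parts_to_bodies(part_masks, body_masks, min_inside=0.95):
    """Map each part index to the index of the body that contains it, or None.

    part_masks, body_masks: lists of sets of (x, y) pixel coordinates
    (an assumed representation for this sketch).
    A part is assigned when more than `min_inside` of its points lie in a body mask.
    """
    assignment = {}
    for j, part in enumerate(part_masks):
        assignment[j] = None
        if not part:
            continue  # an empty mask cannot be assigned
        for i, body in enumerate(body_masks):
            inside_fraction = len(part & body) / len(part)
            if inside_fraction > min_inside:
                assignment[j] = i
                break
    return assignment


# Toy example: two bodies and two candidate parts.
body_a = {(x, y) for x in range(0, 10) for y in range(0, 10)}
body_b = {(x, y) for x in range(20, 30) for y in range(0, 10)}
head = {(x, y) for x in range(0, 4) for y in range(0, 5)}        # fully inside body_a
straddling = {(x, y) for x in range(8, 12) for y in range(0, 5)}  # only half inside body_a
result = assign_parts_to_bodies([head, straddling], [body_a, body_b])
```

In this toy case the head is assigned to the first body, while the straddling part is left unassigned because only half of its points fall inside any body mask.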
Step 2: Calculate the key point of the pig head (No.0). The key point of the pig head is calculated as the center of gravity [33], as in Equation (2), where (x, y) is key point No.0 and (x_i, y_i) is a boundary point of the head mask, i = 1, …, n, with x_{n+1} = x_1 and y_{n+1} = y_1.
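A center of gravity computed from an ordered, closed boundary is commonly obtained with the polygon centroid formula, which uses exactly the closure convention x_{n+1} = x_1, y_{n+1} = y_1. The sketch below assumes that this is the intended form of Equation (2); the function name `polygon_centroid` is hypothetical.

```python
def polygon_centroid(boundary):
    """Centroid of a simple polygon from ordered boundary points [(x1, y1), ..., (xn, yn)].

    Assumes the boundary points are listed in order around the contour
    (clockwise or counter-clockwise) and do not self-intersect.
    """
    n = len(boundary)
    if n < 3:
        raise ValueError("at least three boundary points are required")
    twice_area = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for i in range(n):
        x0, y0 = boundary[i]
        x1, y1 = boundary[(i + 1) % n]  # closes the contour: point n+1 is point 1
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate boundary with zero area")
    # Centroid = sum / (6 * A) with A = twice_area / 2.
    return sum_x / (3.0 * twice_area), sum_y / (3.0 * twice_area)


# A 4 x 2 rectangle has its centroid at (2.0, 1.0).
center = polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)])
```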
Step 3: Calculate three key points of the right front thigh and shank (No.3, 4 and 5).
The two key points of the right front thigh, No.3 and 4, are calculated by the Zhang-Suen thinning algorithm [34], with which a thigh skeleton is obtained. Key points No.4 and 5 of the right front shank are calculated in the same way as those of the thigh. The pig knee is the splice point between shank and thigh, and its key point No.4 is determined by Equation (3).
Step 4: Calculate two key points of the trunk (No.1 and 2).
With respect to the front thighs, there are two ways to calculate key point No.1 of the pig trunk. One way is used when only one front thigh mask is detected. The key point is calculated by Equation (4), where (x_i, y_i) represents a trunk mask boundary point, (x, y) represents key point No.1, and d represents the distance between thigh root key point No.3 (or No.6) and the trunk mask edge.
The other way is used only when both front thigh masks are detected. Equation (3) is adopted, and the average of the two shoulder key points No.3 and No.6 is taken as key point No.1 of the trunk. Key point No.2 is calculated from the two rear thighs in the same way as key point No.1 is calculated from the two front thighs.
When no thigh mask is detected, the Zhang-Suen thinning algorithm [34] is adopted to obtain a trunk skeleton, and the two ends of the skeleton are taken as its two key points.
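The skeleton-based key points used in Steps 3 and 4 can be illustrated with a compact version of Zhang-Suen thinning followed by an end-point search. This is a minimal pure-Python sketch on a binary image stored as lists of 0/1 values; the function names and the rule "an end point has exactly one 8-connected neighbour" are assumptions made for illustration, not the authors' implementation.

```python
def zhang_suen_thinning(image):
    """Thin a binary image (list of rows of 0/1) to a one-pixel-wide skeleton."""
    height, width = len(image), len(image[0])
    # Pad with a zero border so every pixel has eight neighbours.
    img = [[0] * (width + 2)]
    img += [[0] + list(row) + [0] for row in image]
    img += [[0] * (width + 2)]

    def neighbours(r, c):
        # P2..P9, clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, height + 1):
                for c in range(1, width + 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    b = sum(p)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the neighbourhood.
                    a = sum(1 for k in range(8) if p[k] == 0 and p[(k + 1) % 8] == 1)
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_delete.append((r, c))
            # Pixels are removed only after the whole pass (parallel deletion).
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return [row[1:-1] for row in img[1:-1]]


def skeleton_end_points(skeleton):
    """Return (row, col) of skeleton pixels with exactly one 8-connected neighbour."""
    height, width = len(skeleton), len(skeleton[0])
    ends = []
    for r in range(height):
        for c in range(width):
            if skeleton[r][c] != 1:
                continue
            count = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr or dc) and 0 <= r + dr < height and 0 <= c + dc < width:
                        count += skeleton[r + dr][c + dc]
            if count == 1:
                ends.append((r, c))
    return ends


# Toy "limb" mask: a 3-pixel-thick horizontal bar.
bar = [[1] * 7 for _ in range(3)]
skeleton = zhang_suen_thinning(bar)
key_points = skeleton_end_points(skeleton)
```

For an elongated mask such as a thigh, shank or trunk, the two end points of the skeleton are the natural candidates for the key points at either end of the part.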
After each key point is output, the algorithm will wait for 10 ms, during which it will detect whether the ESC key is pressed. The algorithm will exit when the ESC key is pressed, otherwise, it will continue to read the next image.
To measure pig activity, the position of each pig has to be tracked. The Oriented FAST and Rotated BRIEF (ORB) algorithm [35], which runs in real time, is chosen to extract and describe key points, and these feature points are matched to a specific pig by the Hamming distance.
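ORB produces binary descriptors, so two descriptors are compared by counting the bits in which they differ. The following minimal sketch shows nearest-neighbour matching by Hamming distance on descriptors stored as `bytes`; descriptor extraction itself is omitted, and the function names and the `max_distance` cut-off are assumptions made for illustration.

```python
def hamming_distance(a, b):
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def match_descriptors(query, train, max_distance=64):
    """Match each query descriptor to its nearest train descriptor.

    Returns a list of (query_index, train_index, distance) tuples.
    Matches farther than `max_distance` bits are dropped; the cut-off
    value is an assumption for this sketch.
    """
    matches = []
    for qi, q in enumerate(query):
        best_index, best_distance = None, None
        for ti, t in enumerate(train):
            d = hamming_distance(q, t)
            if best_distance is None or d < best_distance:
                best_index, best_distance = ti, d
        if best_index is not None and best_distance <= max_distance:
            matches.append((qi, best_index, best_distance))
    return matches


# Toy 2-byte descriptors (real ORB descriptors are 32 bytes long).
previous_frame = [b"\xff\x00", b"\x00\x01"]
current_frame = [b"\x00\x00", b"\xff\x01"]
matches = match_descriptors(previous_frame, current_frame)
```

Each match links a feature point in the previous frame to its most similar feature point in the current frame, which is what allows a key point to be followed from frame to frame.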
The displacement of the same key point between two adjacent frames is taken as the movement distance of the target.
The distance accumulated over a period of time is the sum of the displacements between adjacent frames, as expressed in Equations (5)-(7), which give the total x-direction and y-direction moving distances of the h-th tracked pig in the i-th frame; ax_{i,j,h} and ay_{i,j,h} are the x-direction and y-direction activity of the j-th part of the h-th pig in the i-th frame. Because different parts have different effects on activity, p_j is the weight of the j-th part, and the weights p_j sum to 1. To analyze the activity of different pigs, different weights were assigned to different parts based on their relative average area in the image: the trunk weight was 0.6, the head 0.3, the thighs 0.06 and the shanks 0.04. The weights are not fixed and can be modified manually according to the specific requirements of a test.
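The weighted accumulation can be sketched as follows. The sketch assumes that the per-part activity is the absolute coordinate change of the part's key point between adjacent frames, that the weighted x and y totals of a frame are combined as a Euclidean length, and that the thigh and shank weights (0.06 and 0.04) are group totals shared equally by the four thighs and four shanks so that all weights sum to 1. These choices, the part names and the function name are assumptions for illustration and may differ from Equations (5)-(7).

```python
import math

# Assumed per-part weights: trunk 0.6, head 0.3, four thighs sharing 0.06,
# four shanks sharing 0.04 (part names are hypothetical).
PART_WEIGHTS = {"trunk": 0.6, "head": 0.3}
for leg in ("right_front", "left_front", "right_rear", "left_rear"):
    PART_WEIGHTS["thigh_" + leg] = 0.06 / 4
    PART_WEIGHTS["shank_" + leg] = 0.04 / 4


def activity_accumulation(track, weights=PART_WEIGHTS):
    """Accumulate the weighted activity of one pig over a sequence of frames.

    track: list of frames; each frame maps a part name to its (x, y) key point.
    Returns the accumulated distance in the units of the coordinates (pixels
    here; converting to meters would need a camera calibration not shown).
    """
    total = 0.0
    for previous, current in zip(track, track[1:]):
        ax = 0.0  # weighted x-direction activity in this frame
        ay = 0.0  # weighted y-direction activity in this frame
        for part, weight in weights.items():
            if part in previous and part in current:
                ax += weight * abs(current[part][0] - previous[part][0])
                ay += weight * abs(current[part][1] - previous[part][1])
        total += math.hypot(ax, ay)
    return total


# Toy track: every part moves by (3, 4) pixels per frame over three frames.
track = [
    {part: (3.0 * i, 4.0 * i) for part in PART_WEIGHTS}
    for i in range(3)
]
accumulated = activity_accumulation(track)
```

With all parts moving by (3, 4) pixels per frame, each of the two frame transitions contributes a distance of about 5 pixels, so the accumulated activity is about 10 pixels.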
Based on Equation (7), the activity accumulation of each pig can be calculated. The activity experiments were carried out with 2, 3 and 4 pigs in a pen, respectively. Two experiments were conducted, one during a quiet period and one during a feeding period, on two consecutive days.
To test and verify the consistency between the target masks and the real (ground-truth) masks [36] of all body parts of each pig, the accuracy and robustness (A-R pair) indicator [37] was introduced as Equation (8),
where AR denotes the A-R pair, A_0 denotes the average overlap and F_0 denotes the failure rate. The robustness of a tracker is defined through an exponential failure distribution, as in Equations (9) and (10), where R_S is the robustness of the tracker, M denotes the mean-time-between-failures and N is the length of the sequence. The robustness of a tracker can be interpreted as the probability that the tracker successfully tracks an object up to S frames since the last failure. The choice of S does not affect the performance of the tracker, but it can be adjusted as a scale factor for better visualization. In this paper, S is taken as 30.
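A small numeric example may help to interpret the A-R pair. The sketch below assumes the form commonly used in the A-R literature, R_S = exp(-S · M) with M = F_0 / N, where F_0 is taken as the number of failures in a sequence of N frames, and computes the overlap of two masks as intersection over union. The function names and this exact form are assumptions, since they are not confirmed by the text.

```python
import math


def mask_overlap(predicted, truth):
    """Intersection over union of two masks given as sets of (x, y) pixels."""
    union = predicted | truth
    if not union:
        return 0.0
    return len(predicted & truth) / len(union)


def ar_pair(overlaps, failures, s=30):
    """Return (A_0, R_S) for one tracked sequence.

    overlaps: per-frame overlap values between target mask and real mask.
    failures: number of tracking failures in the sequence (assumed meaning of F_0).
    s: number of frames since the last failure (30 in this paper).
    """
    n = len(overlaps)
    if n == 0:
        raise ValueError("the sequence must contain at least one frame")
    average_overlap = sum(overlaps) / n
    m = failures / n                   # assumed form: M = F_0 / N
    robustness = math.exp(-s * m)      # assumed form: R_S = exp(-S * M)
    return average_overlap, robustness


# Toy sequence: 60 frames with overlap 0.8 and two failures.
a0, rs = ar_pair([0.8] * 60, failures=2, s=30)
```

Under these assumptions, a tracker that never fails has R_S = 1, and the two failures in 60 frames above give R_S = exp(-1), roughly 0.37, for S = 30.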

Multi-target detection
To test the proposed algorithm, a target detection experiment with multiple pigs was designed. It covered ten pens, and each pig had 10 parts. As mentioned in Part 2.2, 5000 test images from the ten pens were selected for detection, and the results are shown in Figure 4. Each colored mask represents a recognized part of a pig body (colors were assigned randomly and do not represent specific parts), and each black spot on the image represents a calculated key point. As shown in Figure 4a, one pig body and its 9 key points were obtained. However, in the upper left corner, two pigs were judged as one because of heavy occlusion. In Figure 4b, six pigs in a pen, their parts and 25 key points were detected. In Figure 4c-f, two pig bodies and 13 key points, two bodies and 12 key points, five bodies and 30 key points, and three bodies and 16 key points were detected, respectively, in four pens. These results show that the algorithm could detect each pig body and all body parts of each pig in most cases. 5000 images at 550 × 550 pixels and at 700 × 700 pixels were used to test the detection of the body and each part of each pig.
As shown in Table 1, when the image size was 550×550, the detection speed was above 30 fps, which is sufficient for real-time detection. When the image size was 700×700, the average precision (AP) was higher, but the detection speed was lower, only 20.6 fps; the detection speed of Mask R-CNN was only 8.6 fps, so both were difficult to use for real-time detection. In terms of AP, the pig body has the highest value, because it occupies the most space and has the most features available for detection. Among the parts, the pig head has more features than the others, followed by the trunk, thigh and shank. The trunk has only some edge features, so its AP is lower than that of the head. Because the edges of the thigh and shank are not obvious, their detection accuracy is relatively lower [39]. In DTMP, the body AP was the highest at up to 90.4%, followed by the head at up to 87.2%, the trunk at up to 84.5%, and the thigh and shank at up to 76.8% and 71.2%, respectively. The experimental results show that this method performs well on the problems of real-time processing and insufficient information in multi-target detection.
Figure 4 Six random processed images (panels a-f)

Multi-target tracking
The accuracy-robustness plot in Figure 5 shows the results of multi-target tracking, averaged over the entire data set. There was not much difference in robustness between DTMP and Mask R-CNN, but the mean Average Precision (mAP) of DTMP was slightly lower, by about 3.1%. The order of accuracy from high to low was body, head, trunk, thigh and shank. The algorithm could realize real-time tracking of each part of multiple pigs with good robustness and accuracy.

The experiment of pig activity
Pig behavior includes gregarious, fighting, sexual, exploratory, abnormal, aftereffect, eating, excretion and sleeping behavior, among others, all of which are reflected in pig activity. Usually, abnormal activity data can predict some pig health problems in advance or reveal environmental stimuli.

Figure 5 Accuracy-robustness data visualization for pig body and all parts
The activity accumulation of all parts was measured to detect pig activity. Two time periods of one hour each were chosen: a quiet period (13:00-14:00) and a feeding period (15:30-16:30) [14]. On this basis, two experiments were designed to detect the activity of each part during the two periods. The video was edited manually into one-hour segments for the experiments.
As shown in Figure 6, the activity of each part of the pig varied greatly. In the quiet period, thigh and shank movement together accounted for more than 80% of the activity, because walking made up a large proportion of pig movement. The activity of the pig head was higher than that of the trunk, which is consistent with reality. During the feeding period, although the activity of the head and trunk increased, the activity of the thigh and shank increased more, accounting for 91.43%. This means that the majority of the activity came from the thighs and shanks, and less from the head and trunk.
As mentioned in sections 2.2 and 2.3, the pig activity accumulation can be calculated, and the experimental results are shown in Figure 7. The results showed that the pig activity accumulation during the feeding period was much larger than that during the quiet period in all three pens. During the quiet period, the activity accumulation differed only slightly among pens, ranging from 98.72 m to 164.38 m. During the feeding period, it fluctuated greatly, ranging from 336.12 m to 1050.42 m. As shown in Figure 7c, the activity accumulation of one pig was obviously lower than that of the other pigs during the feeding period, which may be due to the hierarchical relationship among pigs or a crowded feeding space. When pigs are mixed, they have to establish a new social hierarchy by competing with each other, so they are more active in this environment. During feeding, pigs of high social status get priority [40][41][42]. Although the cumulative activity of a single pig may fluctuate, the sum of the cumulative activity of all pigs clearly indicates whether the pigs are in a quiet period or a feeding period.

Conclusions
A real-time algorithm based on YOLACT was proposed and verified. The algorithm can effectively detect the 10 body parts of each pig and obtain their key points. On the data set, the detection accuracy of body, head, trunk, thigh and shank reached 90.4%, 87.2%, 84.5%, 76.8% and 71.2%, respectively. Furthermore, the algorithm has good robustness and detection accuracy for multi-target detection, and its detection speed can reach 30.5 fps. Two activity tests based on the algorithm were carried out. The results showed that the contribution of each part of the pig body to the overall activity of the pig could be calculated. This algorithm provides a new way to solve the problem of target detection and is of great significance for further study of pig behavior.