Dog behaviour classification with movement sensors placed on the harness and the collar

Dog owners' understanding of the daily behaviour of their dogs may be enhanced by movement measurements that can detect repeatable dog behaviour, such as levels of daily activity and rest as well as their changes. The aim of this study was to evaluate the performance of supervised machine learning methods utilising accelerometer and gyroscope data provided by wearable movement sensors in classification of seven typical dog activities in a semi-controlled test situation. Forty-five medium to large sized dogs participated in the study. Two sensor devices were attached to each dog, one on the back of the dog in a harness and one on the neck collar. Altogether 54 features were extracted from the acceleration and gyroscope signals divided into two-second segments. The performance of four classifiers was compared using features derived from both sensor modalities and from the acceleration data only. The results were promising; the movement sensor at the back yielded up to 91 % accuracy in classifying the dog activities and the sensor placed at the collar yielded 75 % accuracy at best. Including the gyroscope features improved the classification accuracy by 0.7-2.6 %, depending on the classifier and the sensor location. The most distinct activity was sniffing, whereas the static postures (lying on chest, sitting and standing) were the most challenging behaviours to classify, especially from the data of the neck collar sensor. The data used in this article as well as the signal processing scripts are openly available in Mendeley Data, https://doi.org/10.17632/vxhx934tbn.1.

The number of consumer-targeted activity trackers available for dogs has increased in recent years and the market is expected to grow rapidly in the forthcoming years. One of the key factors for the growth is dog owners' increased interest in and awareness of dog wellbeing (Grand View Research, 2018). Combined with a smartphone application, solutions typically visualize the data as daily total activity, type of activity (light or heavy), and collective behaviours, such as the amount of time the dog has spent moving or resting during the day. There is evidence that existing activity trackers are feasible for evaluating simple canine behaviours, for example, differentiating between a sedentary activity and two intensities of physical activity (Yam et al., 2011). Depending on the sensitivity of the measurement unit, spontaneous activities of a dog, such as locomotion, postural change and movement of body in each posture, can be differentiated from accelerometer data (Yamada and Tokuriki, 2000). However, dog owners and dogs could benefit from even more accurate and detailed analysis of motion and body postures of the dog. Detailed detection of a dog's everyday activities would improve dog owners' understanding of particular dog behaviours and reactions, such as suffering from separation anxiety while alone at home or in a kennel. Extracting more detailed behaviours from accelerometer data also has the potential to be used as an index of wellbeing and health status of the animal, for example, by detecting stress and pain-related behaviours (Morrison et al., 2014a; Brown et al., 2010). Automatic behaviour distinction would also benefit behavioural research, where behaviours are traditionally measured by manual annotation of video recordings, which is labour intensive and time consuming, and it would enable more detailed behavioural research for free-roaming wild animals (Rast et al., 2020).
Some activity loggers designed for humans have been commonly used for canine activity monitoring. These include the "ActiGraph GT3X / GTX3+", used by Yam et al. (2011) and Morrison et al. (2014b), as well as the "Actical", used by Hansen et al. (2007) and Olsen et al. (2016). These devices are primarily intended for data logging (recording raw accelerometer data) and they do not classify behaviour or motions of dogs automatically. In this study, we developed behaviour classification from ActiGraph data using conventional machine learning classifiers and evaluated its accuracy. With respect to the accuracy of classification algorithms developed for dogs, Ladha et al. (2013) achieved a global accuracy of 68.6 % for differentiating 16 canine behaviours in naturalistic environments. den Uijl et al. (2017) showed that walk, trot, canter/gallop, eat, drink, and headshake behaviours could be classified with 95 % accuracy using a hierarchical "one vs. the rest" classifier. Additionally, seven canine activities, such as sitting and trotting, were distinguished with an accuracy of more than 80 % using a 3-axis accelerometer and a 3-axis gyroscope by Gerencsér et al. (2013). Ferdinandy et al. (2020) obtained up to 60 % or 80 % accuracy, depending on the cross-validation strategy, in classifying eight behaviours, and Ladha and Hoffman (2018a) achieved 86 % accuracy in detecting resting of the dog.
The placement of the movement sensor (e.g. neck or back) is an important factor in the movement analysis. Regarding versatility and practicality, the best placement for an activity sensor has been concluded to be ventral attachment to the neck collar, because this placement also makes it possible to detect behaviours that do not involve movement of the whole body, such as scratching and eating (Hansen et al., 2007). On the other hand, accelerometers attached at the back may be able to differentiate behaviours that devices attached to the collar cannot detect, such as elevated walking velocity (Preston et al., 2012).
The aim of this study was to evaluate four commonly used classification algorithms for distinguishing seven dog behaviours using inertial sensor data recorded in semi-controlled test situations. The behaviours were galloping, lying on chest, sitting, sniffing, standing, trotting, and walking. Furthermore, as some behavioural phenomena may be better detectable from the dog's neck or back locations, we systematically examined how the device placement, either on the harness at the back or on the collar around the neck, affected the classification accuracy. Initial results of the study, with a smaller number of dogs and with only one sensor location (neck), have earlier been published at the ACI (Animal Computer Interaction) conference (Kumpulainen et al., 2018).

Test setup
The experiments were conducted at the University of Helsinki, Faculty of Veterinary Medicine. The study protocol was reviewed and accepted by the Ethical Committee for the Use of Animals in Experiments at the University of Helsinki (minutes 5/2017). All dog owners signed an informed consent before participating in the study. The attendants were free to cancel their participation at any time without giving a reason.
A total of 45 healthy, medium to large sized pet dogs from 27 breeds participated in the study. The average age of the dogs was 4.9 years (range 1-9 years) and the average weight was 24.5 kg (range 13-41 kg). Table 1 shows detailed breed, weight and age statistics of the participating dogs. The dataset is described in more detail in a separate dataset article (Vehkaoja et al., 2021a) and is freely available in Mendeley Data (Vehkaoja et al., 2021b).

Test protocol
The tests were conducted in a dog sporting hall in a testing arena of 10 m × 18 m covered with artificial turf. The test sequence consisted of seven tasks where the owner was instructed to guide the dog accordingly. Three of the tasks were static tasks (i.e. sitting, standing, lying down) and four were dynamic tasks (i.e. trotting, walking, playing, and treat searching). Each task lasted for three minutes. The whole procedure was repeated after a short break while changing the order of the tasks. Dogs performed tasks sequentially, alternating between static and dynamic tasks. Treat search was always performed as the final task of the sequence and it consisted of searching by sniffing for small pieces of dry dog food spread on the ground (area of 4 m × 4 m).
Dogs wore two ActiGraph GT9X Link (ActiGraph LLC, Florida, USA) activity sensors including 3-axis accelerometer and 3-axis gyroscope sensors (sampling rate 100 Hz). One sensor was placed inside a tight pocket made of neoprene on the back belt of the dog's harness, referred to as the back sensor in this paper. The other sensor was attached tightly with adhesive tape on the ventral side of the neck collar and is referred to as the collar sensor in this paper. Dogs were on a leash (1.5 m) and were led by their owners or the experimenter. The leash was connected to a separate collar that was placed closer to the dog's body than the collar to which the sensor was attached. The owners were allowed to give food rewards and command their dogs throughout the entire test.

Behaviour annotation
The actual behaviour of the dogs during the assigned tasks was annotated using video recordings. The test procedure was recorded with Panasonic HDC-SD600 and Sony HDR-CX450 video cameras positioned on the opposite lateral walls and facing towards the testing arena. The post hoc annotation of the video recordings was done using the Observer XT 10.5 software (Noldus, The Netherlands). Only segments longer than one second were included in the annotation. Dynamic behaviours (i.e. Walking, Trotting, Galloping, Sniffing; Table 2) were only encoded if unambiguous, i.e. if there was only one obvious, continuous dynamic behaviour without the dog leaning towards the handler or pulling the leash, thus affecting the gait pattern or the body position. Galloping was annotated only during the play task and sniffing during the treat search task (see Table 2 for the ethogram). Static behaviours consisted of still postures (i.e. Lying on chest, Sitting, Standing) and were annotated when the limbs did not move and there was no physical contact between the handler and the dog, except if a treat was given.

Feature extraction and labelling
The raw time series data produced by the movement sensors were saved with ActiLife software (ActiGraph LLC, Florida, USA) and analysed offline with MATLAB R2018b (The MathWorks, Inc., Natick, MA USA).
The time series signals were segmented into two-second time windows with 50 % overlap. A total of 27 features per sensor type were calculated for each segment and used for classification of the behaviours. The same features were used for both accelerometer (A1-A27) and gyroscope (G1-G27) data. The descriptions of the resulting 54 features are given in Table 3.
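The segmentation and some of the simpler features can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function names are hypothetical, the use of the sample standard deviation is an assumption, and only the total activity, mean crossing and axis mean features are shown (the position offset and ecdf features are omitted).

```python
import statistics


def segment_windows(n_samples, fs=100, window_s=2.0, overlap=0.5):
    """Return (start, stop) sample indices of overlapping windows."""
    win = int(round(window_s * fs))          # 200 samples at 100 Hz
    step = int(round(win * (1 - overlap)))   # 100 samples for 50 % overlap
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]


def mean_crossings(x):
    """Count how many times a signal crosses its own mean."""
    m = statistics.fmean(x)
    above = [v > m for v in x]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)


def basic_features(window):
    """window: dict mapping 'x', 'y', 'z' to equally long lists of samples."""
    axes = ("x", "y", "z")
    return {
        # Total activity: sum of the per-axis standard deviations.
        # Assumption: sample standard deviation (n - 1 denominator).
        "total_activity": sum(statistics.stdev(window[a]) for a in axes),
        # Mean crossings, summed over the three axes.
        "mean_crossings": sum(mean_crossings(window[a]) for a in axes),
        # Mean value of each axis.
        "mean_x": statistics.fmean(window["x"]),
        "mean_y": statistics.fmean(window["y"]),
        "mean_z": statistics.fmean(window["z"]),
    }
```

With these assumptions, a five-second recording at 100 Hz (500 samples) yields four two-second windows starting at samples 0, 100, 200 and 300.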
The interpolated inverse empirical cumulative distribution function (ecdf) has been presented by Cox and Oakes (1984) and it has earlier been used for computing movement related features by Hammerla et al. (2013). The ecdf features are based on the cumulative distribution function P_c(x) = P(X ≤ x) in the following way. Seven values of p_i, evenly distributed between 0 and 1, were selected for each of the x, y and z axes. For each p_i, the value x_i for which P(X ≤ x_i) = p_i was estimated by shape-preserving piecewise cubic interpolation.
The true behaviour classes (see Table 2) of the data segments were assigned according to the video annotations, with the segments synchronized to the timestamps of the annotations. A behaviour class was assigned as a label to a segment if a single annotated behaviour occurred for at least 75 % of the segment. This was done in order to increase the amount of included data, especially for the behaviours that typically occur in short durations, namely sniffing and galloping. The data of both test sequences were included for 17 dogs and only one test sequence for the remaining 28 dogs. The reason for including only one of the test sequences for the 28 dogs was the challenges faced in reliable synchronization of the data. The 62 tests from the 45 dogs provided 54,594 instances of labelled data. Table 4 shows how the segments were distributed between the behaviours.
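The two steps described above, the inverse ecdf features and the 75 % labelling rule, could be sketched roughly as follows. Several points are assumptions rather than details from the study: the seven probabilities are taken as i/8 for i = 1..7, linear interpolation is used as a simpler stand-in for the shape-preserving piecewise cubic interpolation, and the function names and the annotation format (start, end, behaviour) are hypothetical.

```python
# Assumed choice of seven evenly spaced probabilities strictly between 0 and 1.
ECDF_PROBS = [i / 8 for i in range(1, 8)]


def inverse_ecdf(samples, probs=ECDF_PROBS):
    """Estimate x_i with P(X <= x_i) = p_i for each p_i in probs.

    The k-th sorted sample (1-based) is paired with the ecdf value k / n.
    Linear interpolation is a stand-in for the cubic method of the study.
    """
    xs = sorted(samples)
    n = len(xs)
    values = []
    for p in probs:
        pos = p * n - 1                      # fractional 0-based index
        if pos <= 0:
            values.append(xs[0])
        elif pos >= n - 1:
            values.append(xs[-1])
        else:
            lo = int(pos)
            frac = pos - lo
            values.append(xs[lo] + frac * (xs[lo + 1] - xs[lo]))
    return values


def label_segment(seg_start, seg_end, annotations, min_fraction=0.75):
    """Return the behaviour covering at least min_fraction of the segment.

    annotations: list of (start, end, behaviour) tuples in seconds.
    Returns None when no single behaviour reaches the threshold.
    """
    covered = {}
    for a_start, a_end, behaviour in annotations:
        shared = min(seg_end, a_end) - max(seg_start, a_start)
        if shared > 0:
            covered[behaviour] = covered.get(behaviour, 0.0) + shared
    if not covered:
        return None
    best = max(covered, key=covered.get)
    if covered[best] >= min_fraction * (seg_end - seg_start):
        return best
    return None
```

Applying `inverse_ecdf` to each of the three axes of a window gives 21 values per sensor type, matching the count of ecdf features stated in the text.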

Feature selection and classification
All 54 features were Z-score normalised to zero mean and unit variance. Due to the high number of features, the feature selection was performed in two parts. First, weights of importance were calculated for each feature by the ReliefF algorithm (Robnik-Sikonja and Kononenko, 2003) based on the k-nearest neighbour approach. In order to reduce computational costs, only the most important features were used in the subsequent forward feature selection. Here, features are added one at a time in the order in which they best improve the classification accuracy. This is continued until additional features provide no improvement. The forward selection was performed for each of the four classifiers. Both the forward selection and the final classification results were computed using Leave-One-Dog-Out cross validation. Thus, all the data of one dog at a time were left out as a test set and the classifiers were trained using the rest of the data. Leave-one-subject-out cross validation has been recommended over random splitting of the data, in which data from the same individual easily end up in both the training and validation sets, producing overly good classification results. This was recently shown by Ferdinandy et al. (2020) in the context of animal behaviour classification.
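The combination of Z-score normalisation, Leave-One-Dog-Out cross validation and forward feature selection described above can be sketched as below. This is an illustration under stated assumptions, not the study's implementation: a nearest-centroid classifier is swapped in as a simple stand-in for the LDA, QDA, SVM and tree classifiers, the normalisation is computed over the whole data set, and all function names are hypothetical.

```python
from statistics import fmean, pstdev


def zscore_columns(X):
    """Scale every feature column to zero mean and unit variance."""
    cols = list(zip(*X))
    mus = [fmean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]   # constant columns become all zeros
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def predict_nearest_centroid(X_train, y_train, X_test, feats):
    """Stand-in classifier: assign each test row to the closest class mean."""
    by_class = {}
    for row, label in zip(X_train, y_train):
        by_class.setdefault(label, []).append([row[f] for f in feats])
    centroids = {c: [fmean(col) for col in zip(*rows)] for c, rows in by_class.items()}

    def closest(row):
        v = [row[f] for f in feats]
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))

    return [closest(row) for row in X_test]


def lodo_accuracy(X, y, dogs, feats):
    """Mean accuracy over folds, each fold holding out all data of one dog."""
    fold_acc = []
    for dog in sorted(set(dogs)):
        train = [i for i, d in enumerate(dogs) if d != dog]
        test = [i for i, d in enumerate(dogs) if d == dog]
        preds = predict_nearest_centroid(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test], feats
        )
        fold_acc.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return fmean(fold_acc)


def forward_selection(X, y, dogs, candidates):
    """Greedily add the feature that most improves the LODO accuracy."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        scores = {f: lodo_accuracy(X, y, dogs, selected + [f]) for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best:
            break                            # no remaining feature helps
        selected.append(f_best)
        remaining.remove(f_best)
        best = scores[f_best]
    return selected, best
```

Because every fold holds out one complete dog, no segment from the test dog can influence the trained classifier, which is the property that distinguishes this scheme from random splitting.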
The whole feature selection procedure was repeated using only the accelerometer features to verify the benefit of the information provided by the gyroscopes.
The four classifiers in this study were linear and quadratic discriminant analysis classifiers (LDA and QDA, respectively), a support vector machine (SVM) classifier with a Gaussian kernel, and a classification tree (Duda et al., 2000). The regularisation of the classification tree was controlled by the number of cuts allowed in the tree. The number was optimised with the cross validation using the most significant features given by the ReliefF weights, and that value was used in the forward feature selection thereafter. These four classifiers were chosen based on their popularity in basic machine learning studies and on their simple structure and low computational cost, which would enable their integration also into a power constrained embedded measurement platform in the future. The predicted class was the class to which the classifier assigned the highest probability; the accuracy obtained in this way is referred to as overall accuracy by Ferdinandy et al. (2020).

Table 2
Ethogram of the behaviours included in the statistical analyses.

Behaviour
Description

Galloping
3- or 4-beat gait where the dog lifts and puts down both front and rear extremities in a coordinated manner, in a 1-2-3-beat gait (canter) or in a 1-2-3-4-beat gait (gallop). All four extremities are simultaneously in the air at some point in every stride. Galloping occurred only during the playing task.

Lying on chest
The dog's torso is touching the ground and hips are at the same level as shoulders. The dog can change balance point without using limbs.

Sitting
The dog has four extremities and rump on the ground. The dog can change balance point from central to hip or vice versa.

Sniffing
The dog has its head below its back line and moves its muzzle close to the ground. The dog walks, stands or performs another slow movement, but its chest and bottom do not touch the ground. Taking food from the ground and eating it can be included (eating was not coded separately).

Standing
The dog has the four extremities on the ground, without the dog's torso touching the ground.

Trotting
2-beat gait where the dog lifts and puts down extremities in diagonal pairs at a speed faster than walking.

Walking
4-beat gait where the dog moves extremities at slow speed, legs are moved one by one in the order: left hind leg, left front leg, right hind leg, and right front leg. The dog moves straight forward or at an angle of at most 45 degrees.

Table 3
Description of the features calculated for 2-second segments of time series movement sensor data. "A" refers to accelerometer and "G" to gyroscope.

Feature code
Feature description

A1, G1
Total activity: sum of the standard deviations of all three axes

A2, G2
Position offset: Euclidean distance from the robust mean obtained while the dogs were standing still

A3, G3
The number of mean crossings, summed over the x, y and z axes

A4-A6, G4-G6
The mean value of each axis: x, y, z

A7-A27, G7-G27
Interpolated inverse empirical cumulative distribution function (ecdf): seven values for each axis, a total of 21 features for each sensor type

Statistical analysis
The classification accuracies reported in the results were calculated as the averages of the percentages of correctly predicted behaviour classes over all folds of the cross validation. Differences between the classifiers were tested at the p = 0.05 level using the t-test, which assumes a normal distribution. The normality of the classification rate distribution of each classifier was tested with the Kolmogorov-Smirnov test. The statistical analyses were conducted with the MATLAB Statistics and Machine Learning Toolbox R2018b.

Results
The ReliefF feature weights were calculated with six values of k: {3, 5, 9, 13, 17, 21}. For the feature set with both accelerometer and gyroscope data, the features included in the top 20 weights by any of the k values (23 features for the back and 22 for the collar sensor) were used in the forward selection phase. When evaluating the accelerometer data alone, the features were selected in the same way but choosing the features included in the top 15 weights by any of the k values (17 features for the back and 15 for the collar sensor).
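The pre-selection rule described above, keeping every feature that ranks in the top N for at least one value of k, amounts to a union of ranked lists. A minimal sketch follows; the function name and the input layout (a dictionary of weight lists keyed by k) are hypothetical, and the ReliefF weights themselves are assumed to have been computed elsewhere.

```python
def top_features_any_k(weights_by_k, top_n):
    """Return indices of features in the top_n weights for at least one k.

    weights_by_k: dict mapping each k to a list of weights, one per feature.
    """
    selected = set()
    for weights in weights_by_k.values():
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        selected.update(ranked[:top_n])
    return sorted(selected)
```

Because the rankings differ slightly between the k values, the union can contain more than N features, which is consistent with 23 and 22 features resulting from the top 20 lists.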
The optimal numbers of cuts acquired for the classification trees were 124 for the back and 121 for the collar sensor for all features, and 168 for the back and 54 for the collar sensor for the accelerometer features only. The features selected by the forward selection and the final cross-validated classification accuracies are presented in Table 5.
In all cases, more accelerometer features and fewer gyroscope features were selected. Considering also the gyroscope features in the classification provided better accuracy with all classifiers and both sensor locations. Therefore, all the detailed results presented below are for the cases with the selected accelerometer and gyroscope features.
Confusion matrices of the classification results are presented in Fig. 2. The most challenging behaviours to classify with the data of the collar sensor were the static postures: lying on chest, sitting, and standing. Lying on chest was most often mixed with the other static postures. The back sensor provided similar results, but there the difference in the accuracies between the classes was not as clear. The activities that involved movement were generally classified very accurately, mostly with accuracies higher than 90 %, sniffing being the most distinct behaviour in both sensor locations and with almost all classifiers. Walking was classified with slightly worse accuracy with the neck sensor and was mixed with static behaviours.
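For readers who want to reproduce this kind of analysis, a confusion matrix with true classes in the rows and predicted classes in the columns (the layout of Fig. 2) and the resulting per-class accuracies can be computed as in the following sketch. The function names and the toy labels in the usage note are illustrative assumptions, not data from the study.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix, classes):
    """Fraction of each true class that was predicted correctly."""
    result = {}
    for i, c in enumerate(classes):
        total = sum(matrix[i])
        result[c] = matrix[i][i] / total if total else None
    return result
```

For example, if two of four true "Sitting" segments are predicted as "Standing", the "Sitting" row holds two counts on the diagonal and two in the "Standing" column, giving a per-class accuracy of 0.5.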
The classification results were also calculated separately for each dog to study the differences in the accuracy between individual dogs. The results are shown as boxplots in Fig. 3. The box contains the interquartile range between the 25th and 75th percentiles. The notch around the median covers 95 % confidence limits. The whiskers extend to the extreme data point up to 1.5 times the interquartile range from the box. Individual points outside the maximum length of the whiskers are marked with red crosses. Marking the confidence interval of the median value allows visual inspection of the statistically significant differences between the results obtained with different classifiers. If the notches of the results of two classifiers do not overlap, the medians are statistically different.
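The boxplot quantities described above can be computed with a short sketch such as the one below. Two points are assumptions: the notch half-width is taken as 1.57 × IQR / √n, which is the convention of MATLAB's boxplot function and is assumed here to be what Fig. 3 uses, and the quartiles come from Python's "inclusive" method, which may differ slightly from MATLAB's percentile definition. The function names are hypothetical.

```python
import math
import statistics


def boxplot_stats(values, whisker=1.5):
    """Quartiles, whisker ends, outliers and median notch of a sample."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - whisker * iqr
    high_limit = q3 + whisker * iqr
    inside = [v for v in data if low_limit <= v <= high_limit]
    half_notch = 1.57 * iqr / math.sqrt(len(data))   # assumed notch convention
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),          # most extreme non-outliers
        "outliers": [v for v in data if v < low_limit or v > high_limit],
        "notch": (median - half_notch, median + half_notch),
    }


def notches_overlap(a, b):
    """True when the notch intervals of two boxplot_stats results overlap."""
    return a["notch"][0] <= b["notch"][1] and b["notch"][0] <= a["notch"][1]
```

Calling `boxplot_stats` on the per-dog accuracies of two classifiers and passing the results to `notches_overlap` mirrors the visual comparison of medians described in the text.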
Table 4
The number of segments assigned to each behaviour.

As seen in Fig. 3, the results vary considerably between the dogs. The highest classification accuracies for some individual dogs are above 99 % for all classifiers with the sensor attached on the harness. The collar sensor reaches 90 % accuracy for some dogs. However, the lowest accuracies are between 47 % and 66 %.
For the back sensor, differences in the distributions of the results of individual dogs obtained with different classifiers were not statistically significant. For the collar sensor, SVM gave significantly better results than LDA and QDA. LDA and QDA were not significantly different from each other. The results obtained with the classification tree were not significantly different from any of the other classifiers. Table 6 provides all pairwise p-values of the dog-wise accuracy distributions of different classifiers.

Discussion
Consumer-targeted dog activity meters are widely available on the market, but the information they give to dog owners is rather limited. The aim of this study was to evaluate the performance of activity classification with two movement sensors located in the collar and the harness. The sensors provided both accelerometer and gyroscope data for classifying seven activities of dogs in a semi-controlled test situation. The results were promising, yielding up to 91 % classification accuracy using the data of both sensor types from the sensor at the back. Including the gyroscope data in addition to the accelerometer data provided 0.7 %-2.6 % better accuracy with all four classifiers and both sensor locations.
Sniffing was the most distinct behaviour, resulting in 99.2 % accuracy with the collar sensor and 98.0 % with the back sensor. This is in contrast to an earlier finding by Ladha et al. (2013), who found that walking and running (of the behaviours shared with this work) had better classification performance than sniffing. However, all classes included in a classification task affect the performance of each individual class, which may explain the difference. The most challenging task for the classifiers was differentiation between the static postures with the collar sensor, namely lying down, sitting, and standing. Of those time segments where the dog was lying down, only 28 %-45 % were classified correctly, and the rest were classified mainly as either sitting or standing. However, considering the minor orientation change in the dogs' neck and back during these tasks, the mixing of these postures is rather logical. den Uijl et al. (2017) report similar results, as they had the lowest specificity for sleep behaviour. However, they did not classify standing, sitting, and lying down separately, but as a combined static/inactive class, which makes the classification task significantly easier. In general, the results showed that an activity monitor on the back yielded better classification results than an activity monitor attached to the neck collar. The results of the back sensor are in line with those of Gerencsér et al. (2013), who also used a sensor at the back.
The placement of the sensor has been shown to affect the amount of measured activity in previous studies (Hansen et al., 2007; Preston et al., 2012). In addition, tightness of the attachment may affect the accuracy. For example, Preston et al. (2012) found that accelerometers attached tightly to the back detected behaviours more accurately than devices attached loosely to the back. In our study, the attachment technique of the sensors was dependent on the placement: the collar sensor was attached to the collar with adhesive tape, but the back sensor was inserted in a neoprene pocket. This might be one reason for the difference between the results for the collar and the back sensors. Although the pocket was tight, the movements of the sensor could be affected by it, as has been earlier concluded by Martin et al. (2016). In our study, a likely reason for the difference in the classification accuracy between the collar and the back sensors is that the distinct static postures result in more significant changes in the orientation of a sensor attached to the back than to the collar. Another likely reason is that, because the orientation of the neck sensor may change slightly due to turning of the collar around the neck, the orientations recorded in the static tasks may overlap. Following this, the potential rotation of the collar needs to be considered and compensated for, as has also been concluded by Ladha et al. (2018b).
Walking was classified with high accuracy by the back sensor but mixed with static postures with the neck sensor. It should be noted that walking in the controlled test situation was rather different from real life, where a dog rarely purely walks slowly on a leash, but rather mixes walk and pace gaits.
The classification accuracies vary considerably between individual dogs, and the manually annotated video data from each dog do not necessarily contain the same amount of all behaviours. Thus, the question arises whether the proportions of the behaviours have an effect on the individual classification results. However, no significant correlation was found between the class proportions and the classification rate, except for a negative correlation for walking behaviour with the QDA and SVM classifiers on the back sensor. As walking was one of the best classified behaviours in this study, this correlation most likely arose by chance.
We tested only medium to large sized dog breeds to get a homogeneous participant group with smaller variability. For example, Ladha et al. (2013) found better global accuracy for differentiating behaviours in small and medium sized dogs than in large ones. In the future, the accuracy of the current behaviour classifications should be tested in a larger dog population that also includes smaller dogs as well as dogs with deviant body structure (e.g. short legs). As our test setup was also relatively controlled, future studies should also be done with natural behaviours of dogs moving freely in their familiar environment. Although the sensor was attached to a different collar than the leash, the handler's behaviour may have affected the sensor, especially in those situations when the dog moved slower than the handler did. It would be ideal if the leash could be attached to the harness and the sensor to the collar (or vice versa), but as we aimed to test both sensor placements simultaneously, this was not an option. In future studies, for reliable behaviour detection, the device should always be positioned at exactly the same orientation, or the orientation should be recalibrated after each time it is attached (as was done in the present study) as well as each time it may have been shifted.
The SVM classifier provides the best results for both sensors, but the difference in performance is statistically significant only for the collar sensor. In all cases, the classification rates have high deviation between individual dogs. Thus, tuning the classifier for each individual would be beneficial from the accuracy point of view but may not necessarily be feasible in practice. Including the gyroscope data in the classification lowers the misclassification rates. However, for practical embedded products, it is a trade-off whether the improvement is worth the added complexity and decreased battery lifetime.

Conclusion
Our current results suggest that behaviour classification was more successful with the movement sensor attached to the harness at the back of the dog than with the sensor on the neck collar. In particular, static behaviours of sitting, standing, and lying down were hard to differentiate with the sensor attached to the collar. Positioning may pose a challenge for the usability of activity monitors for differentiating behaviours in real life. Attaching the sensor to the collar is convenient for the dog and the owner, but if it compromises differentiation of resting from other sedentary behaviours, as concluded by den Uijl et al. (2017), it can lead to misleading conclusions in cases where rest behaviour is used, for example, as an indicator for a dog's pain or stress level.
Our current results are promising in terms of development of practical methods for automatically gaining information on dog behaviour. This type of more accurate information can be useful in supporting the owner in gaining overall understanding of a dog's daily life, assessment of health or sickness, and functioning of medication, in particular for dogs suffering from chronic illness. In the future, the technique could be developed further to identify behaviour problems, their causes and their treatment, as well as the effectiveness of the treatment, and to assess the issues and changes in overall welfare based on the data. Furthermore, our present results pave the way for developing solutions to associate the activity with the affective state of the dog, to support a more comprehensive assessment of dog welfare.
The left panel of Fig. 1 shows an example of the cumulative distribution function of a normal distribution. The right panel shows two examples derived from two-second windows of the accelerometer signal in the x direction during walking and trotting.

Fig. 1. Illustration of ecdf feature calculation. Left panel: one point from normal distribution at P(X ≤ x) = p_i. Right panel: example of actual data evaluated at seven p_i values between 0 and 1.

Fig. 2. Confusion matrices of the four classifiers. The true classes are in the rows and the predicted classes in the columns.

Fig. 3. Boxplots of the accuracies of the four classifiers (LDA, QDA, SVM, Tree). Each box contains the classification accuracies of the 45 dogs.

Table 1
Characteristics of the 45 dogs that participated in the study.

Table 5
Classification accuracies and the selected features for each classifier and both feature set scenarios separately for sensors located on the back and the neck.