Evaluation of Neural Networks to Identify Types of Activity Among Children Using Accelerometers, Global Positioning Systems and Heart Rate Monitors

Introduction
There is a growing awareness of the health benefits of physical activity during childhood and adolescence. However, there is still much to be learned about the nature of children's physical activity patterns and the mechanisms underlying these benefits (Rowland, 2007; Twisk, 2001). The physical activity pattern of children is very different from that of adults. It is characterized by frequent spasmodic bursts of short duration (Baquet et al., 2007). Children participate in intermittent and unstructured activities, and the type of activity they engage in changes as they develop, going from informal active play during early childhood to activities that begin to mirror those of adults during adolescence (Salmon & Timperio, 2007).
The understanding of children's physical activity pattern has been hampered by the lack of satisfactory instruments for measuring it. Children's physical activity has traditionally been measured with self-reports. Self-reports are easily administered, low-cost measurements. However, they do not capture the sporadic short-burst nature of children's physical activity very well (Baquet et al., 2007). Furthermore, self-reports are influenced by recall bias and social desirability. Accelerometers have therefore become the method of choice in physical activity research. These lightweight, unobtrusive devices provide objective information about the frequency, intensity, and duration of physical activity. In most studies, the raw acceleration signal is converted into activity counts, and total or mean activity counts per day and minutes per day spent above a certain intensity threshold are reported. Activity counts are linearly related to energy expenditure. However, this approach does not exploit the richness of accelerometer data (Esliger et al., 2005), because it cannot correctly distinguish between types of activity that differ in energy expenditure but produce similar mean activity counts over time.
A solution for this problem is to use not only the mean counts over a certain time period, but also more information about the distribution of these counts (e.g., standard deviation, percentiles) over time. Recently, statistical models have been developed to identify specific types of physical activities based on a new methodology for processing accelerometer data. The most common classification algorithms have been developed by using the pattern-recognition or

Subjects and data collection
Children between 9 and 12 years of age were recruited from three elementary schools in the Netherlands by sending written information about the purpose and nature of the study to their parents. In total, 58 healthy children (31 boys, 27 girls) were permitted by one of their parents to participate in the study. Data from 52 children (27 boys, 25 girls) included measurements from all devices (accelerometer, GPS and heart rate monitor) and could be used for the analyses. The characteristics of these children are shown in Table 1.
Each child was observed by a research assistant while performing a fixed 20-minute sequence comprising the following activities: sitting during a writing task, standing, walking, running, rope skipping, playing soccer (i.e., kicking the ball back and forth to the research assistant), regular cycling and brisk cycling. The research assistant recorded the starting and finishing time of each activity with a stopwatch. In order to imitate real life, all activities were performed at a self-paced speed. With the exception of sitting, all activities were conducted outdoors in the direct vicinity of the subject's school in similar weather conditions (i.e., no rain, mild wind). For cycling, the subjects used their own bicycle.
All subjects wore several measurement instruments: a heart rate receiver unit (Polar Electro S610i, Finland) on the wrist with the transmitter (Polar T61 Coded Transmitter, Finland) worn on the chest, and a three-axial ActiGraph accelerometer (ActiGraph GT3X, Pensacola, FL) and a GPS (QSTARZ travel recorder V4.3, Taipei, Taiwan) on the right hip. The ActiGraph is the most validated and widely used accelerometer. It has good reproducibility, validity, and feasibility when used to assess physical activity in children (De Vries et al., 2009). The three sensing axes of the accelerometer are the vertical, medio-lateral and anterior-posterior directions. Accelerometer data (counts) were collected for each axis in one-second epochs. The default sampling frequency of the GPS (speed in kilometers per hour) is also 1 Hz, and the Polar heart rate monitor (beats per minute) sampled at 0.2 Hz (once every 5 seconds).

Data processing
When the data collection was complete, the accelerometer data were downloaded to a personal computer and processed using the ActiLifeGT3X software program. GPS data were also downloaded to the computer and processed using Qstarz Travel Recorder PC Utility V4 software. Polar Precision Performance software was used to read the Polar Electro S610i receiver. Next, each data point was labeled with one of the eight physical activities according to the recorded starting and finishing time of each activity. Data from the activity sitting were excluded because this activity was performed inside the school and GPS monitors cannot receive an accurate signal inside buildings. For each of the seven remaining activities, the first and the last four seconds of the signals were deleted to eliminate noise from the transition period between activities. This time buffer was determined by visually inspecting the data set. If an activity (e.g., standing) was carried out several times, the signal was cleaned for each period separately. The cleaned data for the physical activities were then used for further analyses.
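The labeling and trimming step can be illustrated with a minimal Python sketch. It is an illustration only, not the authors' procedure, which relied on the vendor software named above. The function names, the `(activity, start_s, end_s)` record layout, and the toy signal are assumptions introduced here; only the 4-second buffer comes from the text.

```python
from typing import List, Sequence, Tuple

BUFFER_S = 4.0  # seconds removed at each end of a bout (value from the text)


def trim_bout(samples: Sequence[float], rate_hz: float = 1.0,
              buffer_s: float = BUFFER_S) -> List[float]:
    """Drop the first and last `buffer_s` seconds of one activity bout."""
    n = int(round(buffer_s * rate_hz))  # buffer length in samples
    if len(samples) <= 2 * n:
        return []  # bout too short: nothing survives the trimming
    return list(samples[n:len(samples) - n])


def label_and_trim(signal: Sequence[float],
                   bouts: Sequence[Tuple[str, float, float]],
                   rate_hz: float = 1.0) -> List[Tuple[str, List[float]]]:
    """Cut `signal` (first sample at t = 0 s) into labeled, trimmed bouts.

    `bouts` holds hypothetical stopwatch records: (activity, start_s, end_s).
    A repeated activity (e.g. standing) is trimmed separately for each bout.
    """
    cleaned = []
    for activity, start_s, end_s in bouts:
        first = int(round(start_s * rate_hz))
        last = int(round(end_s * rate_hz))
        cleaned.append((activity, trim_bout(signal[first:last], rate_hz)))
    return cleaned


# Toy 1 Hz signal: 30 s of walking followed by 30 s of standing.
signal = list(range(60))
cleaned = label_and_trim(signal, [("walking", 0, 30), ("standing", 30, 60)])
```

In this toy example each 30-second bout keeps 22 samples after the two 4-second buffers are removed.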

Statistical analysis
First, descriptive statistics were used to characterize the sample and to study signal differences between activity types. Second, correlations between accelerometer counts (3 axes) and GPS data and between accelerometer counts (3 axes) and heart rate data were computed to study the relationship between these variables. Third, differences between activity types in the mean accelerometer counts per axis, GPS speed, and heart rate per child were tested with an ANOVA. Finally, post hoc tests comparing all pairs of physical activities were performed. Values were considered statistically significant when the two-sided P value was lower than 0.05.
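As a reminder of what an ANOVA F statistic measures, the following Python sketch computes it for a plain one-way design. This is an assumption made for illustration: the text does not specify the exact ANOVA design the authors used, and the toy numbers are not study data. The p-value is omitted because it requires the F distribution, which the Python standard library does not provide.

```python
from statistics import fmean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if ss_within == 0:
        raise ValueError("no within-group variation: F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: three hypothetical activities, three children each.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

For the toy data the between-group sum of squares is 26 and the within-group sum of squares is 6, so F = (26/2)/(6/6) = 13 with (2, 6) degrees of freedom.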
To classify the activity type, four artificial neural network (ANN) models were developed, based on: three-axial accelerometer data (Model 1), three-axial accelerometer data and GPS data (Model 2), three-axial accelerometer data and heart rate data (Model 3), and three-axial accelerometer data, GPS data and heart rate data (Model 4). ANNs provide a flexible non-linear extension of multiple regression. Feed-forward neural network models were used for the analyses (Ripley, 1996). They consist of a function with a set of predictors or input variables that represent characteristics or statistical summaries describing the signals, a single hidden layer with several hidden units, and one discrete dependent or output variable with several categories that represent physical activity types. Figure 1 presents an illustration of a feed-forward ANN model with five hidden units. The mathematical equations of the model are also provided. If $x_i$ denotes an input variable, $y_k$ an output variable with k categories, and $f_j$ and $f_k$ denote the transformation functions, then this model can be written as

$$y_k = f_k\Big(\alpha_k + \sum_j w_{jk}\, f_j\Big(\alpha_j + \sum_i w_{ij}\, x_i\Big)\Big) \qquad (1)$$

The transformation functions $f_j$ and $f_k$ are taken to be the logistic function

$$f(x) = \frac{\exp(x)}{1 + \exp(x)} \qquad (2)$$

since this transformation performed better than other alternative functions. Alternative transformation functions are described in Ripley (1996). The parameters $w_{ij}$ and $w_{jk}$ are known as weights; they are the coefficients of the linear combinations of the inputs and of the hidden units, respectively. Finally, the intercepts $\alpha_j$ and $\alpha_k$ are known as biases.
Before estimating the ANN models, all signals were segmented into non-overlapping intervals of 10 seconds. For heart rate, the intervals contained only two data points because the sampling interval was 5 seconds. Next, several signal characteristics or statistical summaries were computed for each 10-second segment. The statistical summaries used in this study were selected from the set of characteristics used by Rothney et al. (2007), Bonomi et al. (2009), and Staudenmayer et al. (2009). For the accelerometer data, we used the following signal characteristics: 10th, 25th, 75th, and 90th percentiles, absolute deviation (i.e., the sum of the absolute differences between each element of the interval and the mean), coefficient of variability (i.e., the ratio of the standard deviation to the mean), and lag-one autocorrelation (i.e., the correlation between consecutive elements within an interval). These statistics were computed for each axis independently. For the GPS signal, the mean and the absolute deviation were included as input variables for the two models with GPS data. For the heart rate signal, only the mean was computed because of the reduced number of data points per interval.
The accuracy of the four models was evaluated by leave-one-subject-out cross-validation (Venables & Ripley, 2002). In this method, a set of n-1 subjects was used as a training set and the subject left out was used as a test set. This process was repeated for all n subjects. Feed-forward ANN models with a single hidden layer, five hidden units, and a weight decay equal to 0.006 showed the highest classification accuracy. Next, contingency tables were built to evaluate the classification errors of the models in more detail.
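The interval features and the leave-one-subject-out scheme described above can be sketched in Python as follows. This is a hedged illustration, not the authors' R code. Several details are assumptions because the text does not state them: linear interpolation for percentiles, the sample standard deviation in the coefficient of variability, a value of 0.0 when the mean or the variance is zero, and dropping an incomplete trailing interval. All function names are hypothetical.

```python
import math
from statistics import fmean, stdev
from typing import Dict, Iterator, List, Sequence, Tuple


def segment(signal: Sequence[float], length: int = 10) -> List[Sequence[float]]:
    """Non-overlapping intervals; an incomplete tail is dropped (assumption)."""
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, length)]


def percentile(values: Sequence[float], p: float) -> float:
    """Percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def lag_one_autocorrelation(x: Sequence[float]) -> float:
    """Pearson correlation between consecutive elements of the interval."""
    a, b = x[:-1], x[1:]
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0  # constant signal: correlation undefined (assumption)
    return cov / math.sqrt(va * vb)


def accelerometer_features(interval: Sequence[float]) -> Dict[str, float]:
    """The seven per-axis summaries listed in the text."""
    mean = fmean(interval)
    return {
        "p10": percentile(interval, 10),
        "p25": percentile(interval, 25),
        "p75": percentile(interval, 75),
        "p90": percentile(interval, 90),
        "abs_dev": sum(abs(v - mean) for v in interval),
        "cv": stdev(interval) / mean if mean != 0 else 0.0,
        "lag1": lag_one_autocorrelation(interval),
    }


def leave_one_subject_out(
        subject_ids: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out subject, training row indices, test row indices)."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Toy data: 25 s of counts for one axis, and five intervals from three children.
intervals = segment(list(range(1, 26)))
features = accelerometer_features(intervals[0])
folds = list(leave_one_subject_out(["a", "a", "b", "c", "c"]))
```

Splitting by subject rather than by interval keeps all intervals of one child in the same fold, so the model is always tested on a child it has not seen during training.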
All statistical analyses were performed using the software package R version 2.8.0 (R Development Core Team, 2008). The classification models were developed with the function nnet (Venables & Ripley, 2002). Both R and nnet are freely available.
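To make the model structure concrete, the following Python sketch computes the output of a feed-forward network with one hidden layer and logistic transformation functions, as described above. It is not the nnet implementation and does not train anything; the weights are placeholders chosen for the example, and the function names and the order of the activity labels are assumptions.

```python
import math
from typing import List, Sequence

ACTIVITIES = ["standing", "walking", "running", "rope skipping",
              "playing soccer", "regular cycling", "brisk cycling"]


def logistic(z: float) -> float:
    """exp(z) / (1 + exp(z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x: Sequence[float],
            w_ih: Sequence[Sequence[float]], b_h: Sequence[float],
            w_ho: Sequence[Sequence[float]], b_o: Sequence[float]) -> List[float]:
    """Inputs -> hidden units -> outputs, logistic transformation at both layers.

    w_ih[i][j] is the weight from input i to hidden unit j,
    w_ho[j][k] the weight from hidden unit j to output k,
    b_h and b_o are the biases of the hidden and output units.
    """
    hidden = [logistic(b_h[j] + sum(w_ih[i][j] * x[i] for i in range(len(x))))
              for j in range(len(b_h))]
    return [logistic(b_o[k] + sum(w_ho[j][k] * hidden[j]
                                  for j in range(len(hidden))))
            for k in range(len(b_o))]


def predict(outputs: Sequence[float]) -> str:
    """Assign the activity whose output unit has the largest value."""
    return ACTIVITIES[max(range(len(outputs)), key=lambda k: outputs[k])]


# Placeholder network: 2 inputs, 5 hidden units, 7 outputs, all weights zero.
w_ih = [[0.0] * 5 for _ in range(2)]
w_ho = [[0.0] * 7 for _ in range(5)]
b_h = [0.0] * 5
b_o = [0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]  # only the fourth output is favored
outputs = forward([1.0, -1.0], w_ih, b_h, w_ho, b_o)
label = predict(outputs)
```

With all weights at zero the outputs depend only on the output biases, which makes the example easy to check by hand. In a fitted model the weights and biases would be estimated from the training data.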

Descriptive results
Figure 2a-c shows the main output per measurement instrument. Figure 2a reports accelerometer mean counts per second and standard deviations for the x-, y- and z-axes. The figure shows that the differences in mean counts between physical activities are larger for the x-axis than for the other two axes. Furthermore, the standard deviations are larger for rope skipping than for the other activities. The differences in mean counts per second between standing and all other activities are significant for each axis (x-axis: F(6,330) = 401.03, p < .001; y-axis: F(6,330) = 207.7, p < .001; z-axis: F(6,330) = 72.01, p < .001).
Figure 2b shows bar charts of the mean heart rate and standard deviations in beats per minute (bpm) across children per physical activity. The mean heart rate is higher for standing (129.5 bpm) than for walking (114.9 bpm). This unexpected result most likely occurred because the time the heart rate needs to return to resting level between physical activities is longer than the 4-second time buffer that was removed from the data. Standing was performed for short intervals of 1 minute after each activity. Because heart rate values for standing are very high, the differences in heart rate between standing and walking and between standing and regular cycling (136.7 bpm) are not significant. The global test was significant (F(6,248) = 54.02, p < .001).
Figure 2c reports the mean speed in kilometers per hour (km/h) and standard deviation across children for each activity (F(6,323) = 742.4, p < .001). Mean speed is higher for cycling than for all other activities, and there is a significant difference between regular cycling (10.3 km/h) and brisk cycling (17.7 km/h). There is also a significant difference between walking (3.4 km/h) and running (7.3 km/h). The mean speed for playing soccer (2.4 km/h) is lower than the mean speed for walking. This unexpected result may be due to the variable intensity with which children play soccer (i.e., long periods of standing still and short bursts of movement). The standard deviation for playing soccer (2.3 km/h) is also larger than the standard deviation for walking (1.3 km/h).
From Figure 2a-c it can be seen that the mean counts for regular cycling and brisk cycling are very similar, while the differences in mean speed and mean heart rate suggest that the intensity of brisk cycling is higher than that of regular cycling. This illustrates the additional value of GPS and heart rate monitors for discriminating between two activities with similar mean counts.

Activity classification
Table 2 reports the percentage of correctly classified activities in the cross-validated results for the four ANN models. In general, all models performed well (>80%) in classifying the activities walking, standing still, rope skipping, running and playing soccer. Cycling was best classified by the models that included GPS data. Overall, the model based on accelerometer data (Model 1) correctly classified 82% of the activity types. When GPS data were added (Model 2), the overall percentage of correctly classified activities improved to 89%. The improvement was smaller (from 82% to 84%) when heart rate data were added to the model (Model 3). Finally, the overall percentage of correctly classified activities with the complete model including accelerometer, GPS and heart rate data (Model 4) was 90%. This is 1 percentage point higher than the percentage achieved by Model 2.
In order to evaluate the classification errors of the four models in more detail, a contingency table was built for each model representing the relationship between the observed and the predicted physical activities. Table 3 shows that the highest percentage of misclassification errors occurred for allocation to the activities brisk and regular cycling. The percentages of misclassification were higher for brisk cycling than for regular cycling. Model 1 showed the highest misclassification error for both cycling activities (53.6%), followed by Model 3 (31.2%). Furthermore, Model 1 and Model 3 could not discriminate very well between the activities regular cycling and standing. Model 2 showed its highest misclassification errors between the activities standing and rope skipping.
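The relation between a contingency table of observed versus predicted activities and the percentages of correctly classified activities can be illustrated with a short Python sketch. The function name and the table layout are assumptions, and the counts are invented for the example; they are not the values of Table 2 or Table 3.

```python
from typing import Dict, Tuple


def percent_correct(
        table: Dict[str, Dict[str, int]]) -> Tuple[Dict[str, float], float]:
    """table[observed][predicted] holds the number of intervals.

    Returns the percentage correctly classified per observed activity
    and the overall percentage over all intervals.
    """
    per_activity = {}
    total = 0
    correct = 0
    for observed, row in table.items():
        row_total = sum(row.values())
        hits = row.get(observed, 0)  # diagonal cell: predicted == observed
        per_activity[observed] = (100.0 * hits / row_total
                                  if row_total else float("nan"))
        total += row_total
        correct += hits
    overall = 100.0 * correct / total if total else float("nan")
    return per_activity, overall


# Hypothetical counts for two activities only (not study results).
toy_table = {
    "regular cycling": {"regular cycling": 8, "brisk cycling": 2},
    "brisk cycling": {"regular cycling": 5, "brisk cycling": 5},
}
per_activity, overall = percent_correct(toy_table)
```

The off-diagonal cells of the same table give the misclassification percentages, for example the share of brisk cycling intervals that were allocated to regular cycling.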

Discussion
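The purpose of this study was to investigate whether the accuracy of the previously developed ANN models from De Vries et al. (2011b) could be improved by the inclusion of data on the intensity of activities by means of heart rate data and/or inclusion of data on the speed of activities from GPS. The results have shown that the performance of the previously developed accelerometer-based ANN model, which classified 82% of the activities correctly, improves by 2-8% when other sensor data are included. The largest improvement was found when adding GPS data. Including GPS data seemed especially valuable for distinguishing between regular and brisk cycling. The percentages improved from 67% to 87% for regular cycling, and from 33% to 80% for brisk cycling. Though the model based on data from all sensors (Model 4) produced the best overall classification, its percentage of correctly classified activities was only 1% higher than that of the model based on accelerometer and GPS data (Model 2), and the differences in performance per activity were very small (at most 3%). Compared to Model 4, Model 2 could discriminate better between regular and brisk cycling, and it was simpler because it used fewer input variables. Therefore, the addition of GPS data to the model based on three-axial accelerometer data is sufficient to discriminate between activities performed with different intensity.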
It is difficult to compare our results with those of previous studies because the studies differ in the age groups studied, the type of accelerometer the participants wore, the type of pattern-recognition model used, the type of signal analyzed (raw data or counts), and the type of physical activities classified (see Bonomi et al., 2009a; Khan et al., 2008; Liu & Chang, 2009; Staudenmayer et al., 2009). Staudenmayer et al. (2009) achieved a high percentage of correctly classified activities (89%) with an ANN model in a sample of adults. Bonomi et al. (2009) also achieved a high percentage of correctly classified activities (93%) with a decision tree model in a sample of adults. An advantage of the models proposed by Khan et al. (2008) and Liu & Chang (2009) could be that they used combined characteristics of the three-axial accelerometer signal as input variables. However, our classification results did not improve when we performed additional analyses including combined characteristics of the three-axial accelerometer data.
To our knowledge, this is the first study that used ANN models with data from multiple sensors to classify children's physical activity type. We can only compare our results to the single-sensor model proposed in De Vries et al. (2011b). Though the performance of the model based on three-axial accelerometer data presented in this paper (82%) is better than the performance of the equivalent model proposed in De Vries et al. (2011b) for children (77%), the difference can be explained mainly by the differences in the type of activities classified. De Vries et al. (2011b) did not distinguish between regular and brisk cycling, and they included data from the activity sitting. In their model, a high misclassification error was found between the activities sitting and standing. In this study, data from the activity sitting were not included because this activity was performed inside the schools and accurate GPS data cannot be registered inside buildings.
In previous studies, the output of multiple sensors has been combined to improve the prediction of energy expenditure. There are several monitors available that combine multiple sensors. For example, the Actiheart is an integrated accelerometer and heart rate unit, and it provides a more accurate prediction of children's energy expenditure than either heart rate or accelerometry alone (Rowlands and Eston, 2007). Another device is the Intelligent Device for Energy Expenditure and Activity (IDEEA), which consists of five mini-accelerometers attached to the chest, to both thighs and under both feet (Rothney et al., 2007). However, these monitors cannot be used in large epidemiological studies because they are either very expensive or difficult to place. Heart rate monitors perform better in combination with other sensors than alone when classifying physical activity because their assessment of physical activity is based on the linear relationship between oxygen uptake and heart rate. If the intensity of an activity increases, the heart rate increases. Moreover, there are large inter-individual differences in heart rate recovery time and in resting heart rate levels. Therefore, heart rate data must be calibrated before they are included in a pattern-recognition model.
Besides heart rate units, GPS monitors have already been combined with accelerometers in previous studies to assess physical activity. Mostly, GPS location data were combined with accelerometer data (Maddison & Mhurchu, 2009). Our study has shown that GPS speed data may help to discriminate between physical activities performed with different intensity, such as regular cycling and brisk cycling, better than heart rate data because GPS data do not need to be calibrated. Furthermore, other factors such as emotional stress can change the heart rate. A drawback of GPS monitors is that the GPS signal strength is not always sufficient, for example, when the monitors are worn inside buildings.
This study had some weaknesses. First, it is known that children's activity pattern is different from that of adults. In children's physical activity, there is more variance in intensity within and between activities. In addition, children change activity type more often. In this study, all signals were segmented into non-overlapping intervals of 10 seconds, and signal features were calculated for each interval. However, 10 seconds may be too long a period when there are several transitions between activities within a few seconds. Therefore, it must be studied whether the classification performance of the models increases when shorter time intervals are used. Furthermore, future studies should determine whether the accuracy of the ANN models based on accelerometer and heart rate data could be further improved by increasing the measurement frequency of heart rate, by calibrating the heart rate signal, or by computing other signal features. Alternative features could be the maximum peak per interval or the maximum deviation from the baseline heart rate.

Acknowledgment
This study was funded by the Dutch Ministry of Health, Welfare and Sport. We are grateful to Hannah Hofman and Marjolein Engels for giving assistance during data collection.

Fig. 1. Feed-forward neural network model for k=7 activities

Fig. 2b .
Fig. 2a. Three-axial accelerometer data in counts per second for seven physical activities (mean and standard deviation)

Table 3 .
Cross-validation results for the classification of seven physical activities of Model 1 (accelerometer), Model 2 (accelerometer + GPS), Model 3 (accelerometer + heart rate), and Model 4 (accelerometer + GPS + heart rate) in percentages.