Recognition of Daily Human Activity Using an Artificial Neural Network and Smartwatch

Human activity recognition using wearable devices has been actively investigated in a wide range of applications. Most of them, however, either focus on simple activities wherein whole body movement is involved or require a variety of sensors to identify daily activities. In this study, we propose a human activity recognition system that collects data from an off-the-shelf smartwatch and uses an artificial neural network for classification. The proposed system is further enhanced using location information. We consider 11 activities, including both simple and daily activities. Experimental results show that various activities can be classified with an accuracy of 95%.


Introduction
As the Internet of Things (IoT) technology advances, various devices have been developed for smart life. The wireless sensor network technology has been used in industrial systems and smart homes [1][2][3]. Several wearable devices that can collect a large amount of physical activity data from sensors attached to a human body have been developed [4][5][6][7].
Human activity recognition (HAR) using wearable devices has been actively investigated for a wide range of applications, including healthcare, sports training, and abnormal behavior detection. Machine learning algorithms have been used to detect various human activities such as walking, running, and sitting using a smartphone as a sensing device [8]. Data regarding exercise motions such as standing triceps extension with a dumbbell and wide-grip bench press with a barbell were obtained using sensors worn on the forearm [9]. Another study classified the daily activities that are beneficial to the bones of premenopausal women [10]. A previous study also classified posture and motion with four accelerometer sensors at sternum, wrist, thigh, and lower leg [11].
Most existing studies focus on simple activities such as walking, running, and sitting wherein whole body movement is required. It is, however, necessary to classify daily activities such as cooking, eating, and working to realize various applications. Daily human activities were identified in an IoT-enabled smart home equipped with a variety of sensors in a previous study [12]. An activity is characterized by a combination of sensing data obtained from multiple sensors.
In this study, we propose an HAR system that collects data from an off-the-shelf smartwatch and uses an artificial neural network for classification. Smartwatches are effective and readily available wearable devices for use in HAR systems. Wrist-worn smartwatches can provide sensitive information on human activity as well as the information on whole body movement. We consider 11 activities such as walking, cooking, and working in this work. If we accurately predict the activity of the user, we can improve energy efficiency and enhance user convenience.
Furthermore, we propose an enhanced HAR system that uses location information in addition to movement information. We expect this enhancement to improve classification performance since certain human activities can be done only in certain locations. For example, if the individual is at a public transportation location, cooking can be ruled out as a possible interpretation of the sensor data.
The rest of the paper is organized as follows. In Section 2, we describe the motivation of our work. In Section 3, we give an overview of the system, and in Section 4, we propose a novel classification scheme for classifying various human activities. In Section 5, we evaluate the performance of the proposed classification scheme. The conclusions are given in Section 6.

Approach
Smartwatches are among the most familiar and most widely used wearable devices. According to Strategy Analytics [13], global smartwatch shipments in the first quarter of 2016 amounted to 4.2 million units and accounted for a 62.4% share of the wearable device market in 2016. Smartwatches are well suited for gathering data that can classify user activities in real time, as they are used by many people and are constantly worn on the body. A wrist-worn smartwatch provides more sensitive information on user behavior than a smartphone in a pocket. Since it is likely that a typical user wears the smartwatch on the wrist of the dominant hand, it can sense the movements of that hand.
We use the accelerometer in smartwatches for classifying human activities. In HAR research, both gyroscopes and accelerometers are usually used to classify activities. However, using both sensors yields little performance improvement over using only an accelerometer [14]. Moreover, the accelerometer is embedded in most wearable devices and smartphones. Thus, the proposed system uses acceleration data.
We also propose the use of location information. Certain human activities can be done only in certain locations. For example, office work takes place in the office, whereas cooking takes place in the kitchen. By using the location information as a feature, we can set up more detailed classifiers for each location and thus improve performance.
In this study, we consider three locations and 11 activities, as shown in Table 1. First, we consider five activities in offices: office work, reading, writing, taking a rest, and playing a computer game. Office work includes writing e-mail, coding, and writing a document on a computer. Second, we consider three activities in kitchens: eating, cooking, and washing dishes. Finally, we consider three outdoor activities: walking, running, and taking a transport. Taking a transport includes taking a bus or riding in a subway.
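The grouping of activities by location can be written down as a small lookup structure. The following is a minimal sketch, not the authors' code: the dictionary and function names are hypothetical, and the label strings simply repeat the activity names given in the text.

```python
# Activity labels grouped by location, following the three locations and
# 11 activities listed in the text. All identifiers here are illustrative
# assumptions, not names taken from the authors' implementation.
ACTIVITIES_BY_LOCATION = {
    "office": ["office work", "reading", "writing", "taking a rest",
               "playing a computer game"],
    "kitchen": ["eating", "cooking", "washing dishes"],
    "outdoors": ["walking", "running", "taking a transport"],
}

# Inverse lookup: the location implied by an activity label.
LOCATION_OF = {
    activity: location
    for location, activities in ACTIVITIES_BY_LOCATION.items()
    for activity in activities
}


def candidate_activities(location):
    """Return the activities a location-specific classifier must separate."""
    return ACTIVITIES_BY_LOCATION[location]
```

With such a structure, a classifier for the kitchen only needs to separate three labels instead of eleven, which is the intuition behind the expected performance gain.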

System
Structure. The proposed system comprises a smartwatch, a smartphone, and a server, as shown in Figure 1. A user wears a smartwatch on the dominant hand, and the smartwatch collects data from its acceleration sensor. A smartphone gathers the sensor data from the smartwatch via Bluetooth communication and then transmits the data to a server. The server processes the collected data and classifies the activities using machine learning algorithms.
A smartwatch application was developed to collect the acceleration values on the x, y, and z axes. The user chooses an activity label from a list of available activities in the application before starting the activity, as shown in Figure 2. This ensures that the sensing data are labeled correctly, and these labels are used for training the classifiers. Data capture is controlled by the start/stop button on the smartwatch application interface. We assume that users do not perform multiple activities at the same time. During data capture, the acceleration data and the activity label are collected on the smartwatch and sent to the smartphone via Bluetooth communication.
The smartphone provides a bridge for the data between the smartwatch and the server. The server has three roles: data storage, feature extraction, and classification. The data from the smartwatch and phone are saved on the server. These data are used to extract features for classification. In the feature extraction stage, the server builds the datasets for HAR, which are used for training and testing the classifier.
To evaluate the performance of the proposed system, we used an Apple Watch Series 2 and an Apple iPhone 6. The server, running CentOS, is equipped with two Intel Xeon E5-2630 v 2.2 GHz CPUs, 256 GB of RAM, and four GTX 1080 Ti GPUs. We designed the proposed classifier based on TensorFlow [15], which is an open-source software library for machine learning.

HAR System
In this section, we explain the proposed HAR system in detail, as shown in Figure 3. We designed two models.
Figure 3(a) shows a basic model that uses only acceleration sensor data. Figure 3(b) shows a model that uses location information in addition to the acceleration sensor data. Use of the location information allows for more appropriate, specific, and detailed classifiers.

Temporal Segmentation.
Acceleration data measured by the smartwatch are divided into time segments before feature extraction [14]. The sliding window technique is widely used and has been proven effective for handling streaming data [16,17]. Figure 4 shows two schemes with an example of segmenting the accelerometer signal, where X, Y, and Z represent the three components of a triaxial acceleration sensor. All time intervals have the same length Δt, which is defined as the window size. $\vec{D}_i$ refers to the readings of X, Y, and Z in the period of time [t, t + Δt]. In the nonoverlapping case, $\vec{D}_i$ and $\vec{D}_{i+1}$ come from different periods of time, as shown in Figure 4(a). In the overlapping case, $\vec{D}_i$ and $\vec{D}_{i+1}$ share part of the sensor readings. We use the overlapping window method because it generally gives smoother results than the nonoverlapping window method when handling continuous data. Two adjacent time windows overlap by 50%.
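The segmentation step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `sliding_windows` is hypothetical, the window size is expressed as a number of samples, and trailing samples that do not fill a whole window are dropped.

```python
def sliding_windows(samples, window_size, overlap=0.5):
    """Split a sequence of (x, y, z) samples into fixed-size windows.

    window_size is a number of samples (sampling rate times the window
    length in seconds); overlap is the fraction shared by two adjacent
    windows. Trailing samples that do not fill a whole window are dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between the starts of two adjacent windows.
    step = max(1, int(window_size * (1 - overlap)))
    return [
        samples[start:start + window_size]
        for start in range(0, len(samples) - window_size + 1, step)
    ]


# Ten synthetic samples, windows of four samples, 50% overlap:
readings = [(i, i, i) for i in range(10)]
windows = sliding_windows(readings, window_size=4)
```

Setting `overlap=0` gives the nonoverlapping scheme, so both variants discussed above come from the same function.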

Feature Extraction.
In this section, we explain the feature extraction. The features are very important in our machine learning model because their configuration changes the output based on six types of grouped streaming data.

Wireless Communications and Mobile Computing
We extract informative features from each time window. For each time window, we extract a single feature vector f as follows:

$$f = \left[\mu(\vec{D}(X)),\ \mu(\vec{D}(Y)),\ \mu(\vec{D}(Z)),\ \sigma(\vec{D}(X)),\ \sigma(\vec{D}(Y)),\ \sigma(\vec{D}(Z))\right] \quad (1)$$

where $\vec{D}(X)$, $\vec{D}(Y)$, and $\vec{D}(Z)$ represent the acceleration vectors of the X, Y, and Z axes, respectively, and $\mu$ and $\sigma$ denote the average and the standard deviation. We calculate the average and standard deviation of each axis of the triaxial accelerometer as features for the machine learning algorithm. The average value is a good indicator of the level at which each axis of the acceleration sensor is measured for each activity. The standard deviation is a useful measure to quantify the amount of variation or dispersion of a set of activity data. The intensity of an activity can be predicted from the standard deviations.
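The per-window features can be computed with the Python standard library alone. The sketch below assumes that a window is a sequence of (x, y, z) tuples and that the population standard deviation is used; the text does not state which variant of the standard deviation or which element order the authors chose, so both are assumptions, as is the function name.

```python
from statistics import fmean, pstdev


def extract_features(window):
    """Return [mean_x, mean_y, mean_z, std_x, std_y, std_z] for one window.

    window is a non-empty sequence of (x, y, z) acceleration samples.
    The population standard deviation is an assumption of this sketch.
    """
    axes = list(zip(*window))  # three tuples: all x, all y, all z values
    means = [fmean(axis) for axis in axes]
    stds = [pstdev(axis) for axis in axes]
    return means + stds
```

Each window therefore contributes one six-element vector, regardless of how many samples the window contains.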

Classifier.
The feature vector of each time window is used as the input to the classifier. We consider two models: one uses only acceleration sensor data, as shown in Figure 3(a), and the other uses the location information as well, as shown in Figure 3(b). Using location information, a more specific and detailed location-based classifier can be applied. There are well-known ways to obtain location information indoors [18,19] and outdoors [20,21]. In this study, however, the location information is not collected by the system but is derived from the activity labels. For example, if the activity label of a time window is cooking, the location is set to kitchen. We consider three locations: office, kitchen, and outdoors. In real life, the location information can be collected from GPS and an indoor positioning system.
The classifier is a multilayer perceptron, which is a class of feedforward artificial neural network (ANN). Weights are initialized using the Xavier algorithm [22], and biases are initialized randomly. We use ReLU as the activation function. Xavier initialization and ReLU are commonly used to reduce the training time of ANNs. The mean value of the cross entropy is used as the cost function. The learning rate is set to 0.01, and the Adam optimizer [23] is used since it is known to achieve good results quickly.
Table 2 gives an overview of the dataset used in this study. The dataset, which has been deposited on the website http://ncl.kookmin.ac.kr/HAR/, was collected over four weeks from two volunteers who performed the activities while wearing a smartwatch on the wrist of their dominant hand. The accelerometer was sampled at 10 Hz. The task was to distinguish the eleven activities considered in this study. The dataset was preprocessed and segmented with a sliding window whose length was varied in one experiment and fixed at 10 s in the others, with 50% overlap between adjacent segments.
We extract a range of features associated with the accelerometer. In all experiments, we used 5-fold cross-validation for reliability.
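The classifier configuration described in this section (Xavier-initialized weights, random biases, ReLU activations, and a mean cross-entropy cost) can be illustrated with a dependency-free forward pass. This is only a sketch: the authors' model is built with TensorFlow, whereas the code below uses plain Python, omits training with the Adam optimizer, and assumes hypothetical hidden-layer widths and a small Gaussian bias initialization that the text does not specify.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier (Glorot) uniform initialization for a fan_in x fan_out matrix."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def init_mlp(layer_sizes, seed=0):
    """Create one (weights, biases) pair per pair of consecutive layer sizes."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        weights = xavier_uniform(fan_in, fan_out, rng)
        # Random bias; the distribution is an assumption of this sketch.
        biases = [rng.gauss(0.0, 0.01) for _ in range(fan_out)]
        layers.append((weights, biases))
    return layers


def dense(inputs, weights, biases):
    """Affine transform: inputs times weights plus biases."""
    return [sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
            for j in range(len(biases))]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, features):
    """ReLU on the hidden layers, softmax on the output layer."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = [max(0.0, v) for v in dense(activations, weights, biases)]
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


def mean_cross_entropy(layers, batch, labels):
    """Mean cross entropy over a batch; labels are class indices."""
    losses = [-math.log(forward(layers, f)[y] + 1e-12)
              for f, y in zip(batch, labels)]
    return sum(losses) / len(losses)


# Six input features and eleven output classes follow the text;
# the hidden widths (32) are hypothetical.
model = init_mlp([6, 32, 32, 32, 11])
probabilities = forward(model, [0.1, -0.2, 0.9, 0.05, 0.03, 0.2])
```

For the location-aware model, one such network per location would have three or five outputs instead of eleven.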

Performance Measures.
To evaluate the performance of the proposed system, we used the following four indicators: accuracy, precision, recall, and F1-score. Table 3 and Equations (2) to (5) show how accuracy, precision, recall, and F1-score are derived, respectively. These four indicators are the most frequently used performance measures for machine learning models.
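The four indicators can be computed from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below uses the standard textbook definitions; the function names are hypothetical, and how the per-class values are averaged over the eleven activities is not stated in the text, so no aggregation is shown.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def per_class_counts(confusion, k):
    """TP, FP, FN, TN for class k.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(row[k] for row in confusion) - tp
    tn = sum(map(sum, confusion)) - tp - fn - fp
    return tp, fp, fn, tn
```

Applying `per_class_counts` to each row of a confusion matrix and passing the result to `binary_metrics` yields one set of indicators per activity.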

Effect of Location Information.
The experiment evaluates two models: one using the dataset with location information and the other using the dataset without it. Figure 5 shows the accuracy of the two models for each activity. We observe that the model that does not use location information is less accurate than the model that uses it. On average, the model with location information achieves an accuracy of 95%, whereas the model without location information achieves 90%. The proposed activity classification model is designed as a 5-level layer ANN. The window size is set to 10 s. We used a 5-fold validation algorithm to increase confidence in the results.
The following is a more detailed analysis of the experimental results of the two models. Tables 4 and 5 show the confusion matrices of the two models. We observe that the model using location information confuses an activity only with other activities possible at the same location, whereas the misattributions of the model without location information extend to activities in different locations. First, we evaluate the result of the model that does not use location information. In Table 4, the A11 activity is identified with the least accuracy. A11 is particularly confused with A4 and A5. A7 and A8 are confused with each other. Moreover, A1 and A3 are frequently confused with A2 and A4. In Table 5, the result of the model using location information shows that the prediction accuracy of A11, which was confused with A4 and A5, is improved. However, because A7 and A8 have the same location information, they are still confused. All of the other activities are less confused with activities in other locations, and the accuracy improves.
We have confirmed through this experiment that our proposed model can classify daily activities as well as simple activities. In addition, it is confirmed that location information is very helpful to classify activities. We classify activities in daily life with an accuracy of about 95%.
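The 5-fold validation used in these experiments can be sketched as an index split. The text does not say whether the windows were shuffled before splitting or whether a fixed seed was used, so the shuffling and the `seed` parameter below are assumptions, as is the function name.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each window appears in the test set of exactly one fold, and the reported accuracy would then be the average over the five folds.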

Effect of the Number of Layers.
The following is the performance analysis according to the number of layers. Figure 6 shows the performance of each model as a function of the number of layers. The performance increases up to the 5-level layer model and decreases again as the model becomes deeper. This pattern is not related to the location data; it seems to be a characteristic of the ANN. Models that are too shallow do not learn effectively, and models that are too deep are not effective either because the number of levels to be trained is too great.
The 5-level layer model works best in both cases, with and without location information, yielding accuracies of 96% and 91%, respectively. When the model uses location information, the accuracies of the 3-, 4-, 5-, 6-, and 7-level layer models are all above about 95%, which is sufficiently high. Without location information, the accuracies of the 4-, 5-, and 6-level layer models are all above about 90%.

Effect of the Window Size.
In this experiment, we set the window size to 1, 5, 10, 30, 60 (1 min), 120 (2 min), 180 (3 min), and 240 (4 min) s, and the prediction rate of each window size was examined. Figure 7 shows the performance according to the window size for the 5-level layer model, which yielded the best performance in the previous results. Figure 7(a) shows the result based on the dataset without location information, and Figure 7(b) shows the result using the location information.
The window size is the amount of data on which the prediction model bases each classification. In other words, data corresponding to one window are needed to determine the activity. A smaller window size increases the prediction speed but decreases the prediction accuracy. A larger window size increases the accuracy at the cost of speed, but only up to a limit: beyond a certain window size, the accuracy can be negatively affected because a window overlaps with other behaviors, which confounds the results. Herein, we quantify effective window sizes. When the window size is less than 3 s (1 s, 0.5 s), the prediction speed is very fast but the accuracy is very low. When the window size is more than one minute (2, 3, 4, and 5 min), the prediction speed is very slow and the accuracy does not increase any further. Therefore, we judge 10 s to be the optimal window size, as it balances two competing quantities: prediction speed and accuracy.
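The speed side of this trade-off follows from simple arithmetic. As a small sketch, assuming the 10 Hz sampling rate and the 50% overlap stated for the dataset (the function name is hypothetical), the number of samples per window and the time between two consecutive predictions are:

```python
def window_timing(window_seconds, sampling_rate_hz=10, overlap=0.5):
    """Samples per window and seconds between consecutive predictions."""
    samples_per_window = int(round(window_seconds * sampling_rate_hz))
    hop_seconds = window_seconds * (1 - overlap)
    return samples_per_window, hop_seconds


print(window_timing(10))   # (100, 5.0)
print(window_timing(240))  # (2400, 120.0)
```

Under these assumptions, a 10 s window yields a new prediction every 5 s, whereas a 4 min window yields one only every 2 min.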

Comparison with Other Machine Learning Algorithms.
Herein, we compare the performance of the proposed ANN with that of the most commonly used supervised learning algorithms: decision tree (DT), random forest (RF), and support vector machine (SVM). This experiment uses scikit-learn, a free machine learning library for the Python programming language [24]. The optimal parameters of each model are found through a grid-search function. Classification models are then built with these parameters to classify activities in everyday life using the same dataset. Figure 8 shows the results of the machine learning algorithms. Figure 8(a) shows the result obtained using the dataset without location information. In this case, RF shows the best performance, with the ANN lagging imperceptibly behind; the gap between the two models was less than 0.1%. However, the other models showed poor performance. The performances of both DT and SVM were less than 75%. From these results alone, it cannot be determined whether the classification model works well. Figure 8(b) shows the result obtained using the dataset with location information. Overall, the performance of all the models improved relative to Figure 8(a). The performances of DT and SVM were still inferior to those of RF and ANN, although both improved by 15%. The performances of RF and ANN also improved by 5%. When using location information, the best model is the ANN. All performance indicators differ by more than 1%.

Real-Time Evaluation.
Figure 9 shows the results of the real-time activity recognition evaluation. We let one participant perform seven activities consecutively for 2 min each, and we predicted the activity of the subject using the predictive model previously trained on the total dataset. Figure 9(a) shows the results of the predictive model trained without the location information, and Figure 9(b) shows the results of the predictive model trained with the location information.
Both models were observed to confuse completely different activities in the transition sections. We interpret this as a result of the fact that, when a person performs activities continuously, the data in a transition interval may contain two or more patterns. Sections away from the transitions are classified well because they clearly carry the characteristics of either the previous or the next activity. When collecting data for this experiment, the subject's A2 activity pattern tended to be very similar to the A3 activity pattern of the entire dataset on which the model was trained. Both classification models confuse A2 with A3. It appears that the subject's natural A2 activity contains many features that can be confused with A3. Next, we evaluate each model separately. Figure 9(a) shows the results of the model that does not use location information, which strongly confuses A2 and A3. In particular, A3 is confused with activities at other locations such as A6 and A11. This problem is not observed when the model uses location information, as shown in Figure 9(b). In Figure 9(a), the prediction rate of A2 is less than 50%, but it is more than 70% in the model using location information. Even when there is a lot of confusing data, as with A2, the model using the location information achieves a higher accuracy by using a more detailed classifier.
When the model is designed using the epoch, window size, and layer level determined in the previous experiments, a considerably high prediction rate can be confirmed in an experiment that recognizes human activities in real time. Consecutive activities of a person are difficult to define in terms of a single activity. Therefore, the problem of low accuracy in the transition sections is difficult to solve because the activity itself is not precisely defined there. In contrast, the problem of confusing A2 with A3 could be addressed by reducing overfitting and by adding new features or designing a model that takes various activity patterns into account.

Conclusion
In this study, we proposed an HAR system using an off-the-shelf smartwatch and an ANN. We also showed that the location information can enhance the performance of the system. We considered 11 activities, including both simple activities and daily activities, and our experimental results showed that various activities can be classified with an accuracy of 95%.
Energy efficiency can be improved by using the proposed system as it can accurately predict the user's activity, consume only the energy required for the activity, and decrease wastage of energy. Moreover, the proposed system can enhance convenience. For example, after the system predicts the user's activity, it turns off the light when the user is lying in bed.

Data Availability
The raw data are available at "https://www.dropbox.com/s/x4n1za8zo3oe8eg/rawData.csv?dl=0" or from the corresponding author upon request.