A Smartphone-Based Model for Human Activity Recognition

Activity recognition (AR) is an emerging and challenging research area with many applications (e.g. healthcare, security, and event detection). Activity recognition (e.g. identifying a user's physical activity) is usually treated as a classification problem. In this paper, 7 classification methods are applied to accelerometer data collected via smartphones and compared to identify the best-performing one. The dataset was collected from 59 individuals who performed 6 different activities (i.e. walk, jog, sit, stand, upstairs, and downstairs). It contains 5418 instances with 46 labeled features. The results show that the proposed boosting-based ensemble classifier outperforms the other classifiers examined in this paper.


Introduction
Recently, the need for monitoring and recognizing human activities has been increasing. This task can be accomplished using machine learning techniques [1], [2]. Human activity recognition (AR) plays a part in many applications, such as context-aware behavior, smart environments, health care, and security [2], [3], [4], [5]. The main aims of this research are to (i) employ and evaluate the performance of various standalone machine learning techniques for the AR task, and (ii) propose an AR classification model that is more robust and accurate. To achieve these goals, the results of related works are reviewed, and the performance of the proposed method is presented and compared with previous results using standard accuracy metrics. The rest of this paper is organized as follows: related work is discussed in section 2, and the dataset is described in subsection 2.2. In section 3, a number of classifiers are employed, tested, and compared, and the results of our model are presented and discussed. Finally, conclusions and recommendations for future work are presented in section 4.

Related Works
The recognition of physical human activity has been studied previously, using either accelerometer data collected from smart mobile devices [6], [7], [8] or other wearable sensors [2], [9], [10]. The most recent related works are summarized in this section. Kwapisz et al. [11], in the Wireless Sensor Data Mining (WISDM) project, collected a mobile phone-based dataset from 29 individuals who carried Android smartphones in their pockets while performing activities of daily living (ADL) such as sitting, walking, climbing stairs, jogging, and standing. The resulting dataset has 4526 instances with 46 features. It was used to train 4 different classifiers (Decision Tree (DT), Logistic Regression (LR), Multi-Layer Perceptron (MLP), and Straw Man (SM)) for human activity recognition. The results showed that MLP was the best method, with 90% accuracy, while SM had the lowest accuracy, 37.2%. The other two methods, DT and LR, achieved 85.1% and 78.1%, respectively. Trabelsi et al. [12] used inertial wearable sensors to collect acceleration data and then used that data to train an unsupervised model for the AR task. The proposed model used a Hidden Markov Model (HMM) to segment the data and the Expectation Maximization (EM) algorithm for learning; in other words, they proposed a Multiple Hidden Markov Model Regression (MHMMR). The proposed model achieved 91.4% accuracy, better than k-means (60.2%) and a standard HMM (84.1%). They also evaluated several well-known supervised learning methods (Naïve Bayes, MLP, Support Vector Machine, k-Nearest Neighbor, and Random Forest), which achieved 80.6%, 83.1%, 88.1%, 95.8%, and 93.5%, respectively. In general, the supervised methods achieved higher accuracy than the unsupervised ones. However, the dataset used was small (6 individuals), so further work on larger and more diverse data is needed. In this direction, Micucci et al. [3] proposed a richer dataset, considered adequate in both subjective and objective terms. This dataset (7,013 instances) was collected from 30 individuals (6 men and 24 women) with a wide range of ages (from 18 to 60 years old). To benchmark the dataset, two classifiers were used, kNN and SVM, and each was evaluated with 5-fold and 30-fold cross-validation. The kNN (k=1) classifier achieved 86.89%, compared with 86.47% for the SVM classifier. Bayat et al. [4] proposed an AR system with a low-pass filter that isolates the gravity component from raw accelerometer data. Five classifiers were then evaluated and compared with the suggested model, which combines several classifiers in one tier (average of probabilities). A combination of the MLP, LR, and SVM classifiers performed best, with 91.15% accuracy. Gupta and Dallas [7] proposed a feature-selection-based AR system to classify ADLs and falls.
During the feature selection process, two methods were employed: Sequential Forward Floating Search (SFFS) and Relief-F. Two classification methods were then applied, NB and kNN, and both achieved 98% accuracy. Although this work presents promising results that outperform filter-based systems in accuracy, it tends to be more computationally expensive and generalizes poorly to other machine learning methods. The approach also needs to be investigated on richer data. Catal et al. [13] proposed a multiple classifier system that combines MLP, DT, and LR using the average-of-probabilities rule. The combination was more accurate than the MLP classifier alone.
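The average-of-probabilities rule used by Bayat et al. [4] and Catal et al. [13] can be sketched as below. Each member classifier outputs a class-probability vector, the vectors are averaged, and the class with the largest mean probability wins. The classifier interface (a plain function from a feature vector to a probability list) and the fixed toy outputs are hypothetical placeholders, not the trained models from those studies.

```python
from typing import Callable, Sequence

# Hypothetical interface: a classifier maps a feature vector to a list of
# class probabilities (one entry per class, summing to 1).
Classifier = Callable[[Sequence[float]], list]

def average_of_probabilities(classifiers: Sequence[Classifier],
                             x: Sequence[float]) -> list:
    """Combine classifiers by averaging their class-probability vectors."""
    outputs = [clf(x) for clf in classifiers]
    n_classes = len(outputs[0])
    return [sum(p[k] for p in outputs) / len(outputs) for k in range(n_classes)]

def predict(classifiers: Sequence[Classifier], x: Sequence[float],
            labels: Sequence[str]) -> str:
    """Return the label whose averaged probability is largest."""
    avg = average_of_probabilities(classifiers, x)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]

# Toy stand-ins with fixed outputs over the classes (walk, jog, sit).
labels = ["walk", "jog", "sit"]
mlp = lambda x: [0.5, 0.4, 0.1]
lr = lambda x: [0.2, 0.7, 0.1]
svm = lambda x: [0.4, 0.5, 0.1]
print(predict([mlp, lr, svm], [0.0], labels))
```

In the toy call, the MLP stand-in alone would choose "walk". Averaging it with the other two members yields "jog", which shows how the combination rule can overrule an individual member.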

Experimental Results
In this section, experiments with 5 machine learning classifiers (RF, NB, kNN, JRip, and CvR) are presented and discussed, in addition to our proposed ensemble multi-classifier method. The proposed method boosts classification performance by repeatedly applying a specific learning algorithm and combining the learned hypotheses by voting [18], [19]. Furthermore, 10-fold cross-validation is used in all experiments, and the results are compared in terms of accuracy, F-measure, and root mean square error. The confusion matrices of the 7 classifiers are presented below.

1-Confusion matrix of Random Forest (Table 1)

The results illustrated in Table 3 (JRip) show that 4658 instances were classified correctly.
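The 10-fold cross-validation protocol used in all experiments can be sketched as follows. The instances are shuffled and dealt into 10 folds, and each fold is predicted once by a model trained on the other nine. This is plain (unstratified) k-fold; many toolkits also stratify the folds by class. `train_and_predict` and `majority_learner` are hypothetical names standing in for any of the classifiers above.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds whose sizes
    differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_accuracy(X, y, train_and_predict, k=10, seed=0):
    """Every instance is predicted exactly once, by a model trained
    on the other k-1 folds."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = train_and_predict([X[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)

def majority_learner(X_train, y_train):
    """Hypothetical baseline: always predict the most frequent training label."""
    most_common = max(set(y_train), key=y_train.count)
    return lambda x: most_common

X = [[float(i)] for i in range(20)]
y = ["sit"] * 15 + ["walk"] * 5
print(cross_validated_accuracy(X, y, majority_learner))
```

On this toy set of 15 "sit" and 5 "walk" instances, every training split still has "sit" as its majority, so the cross-validated accuracy is 15/20 = 0.75. Applied to the 5418 instances, the 10 folds hold 541 or 542 instances each.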

4-Confusion matrix of Naïve Bayes
From Table 4 above, the results show that 4099 instances were classified correctly.

5-Confusion matrix of Classification via Regression (CvR)
The results of the CvR method listed in Table 5 show that 4877 instances were classified correctly.

6-Confusion matrix of Adaboost (J48 as classifier)*
From the confusion matrix listed in Table 6, 5106 instances were classified correctly.
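The boosting scheme described at the start of this section can be sketched with AdaBoost.M1, the multi-class AdaBoost variant implemented, for example, by Weka's `AdaBoostM1`. The experiments here boost J48 and Random Forest. To keep the sketch short and self-contained, it substitutes a one-level decision stump as the weak learner. The function names (`fit_stump`, `adaboost_m1`, `predict`) and the toy data are hypothetical.

```python
import math

def fit_stump(X, y, w):
    """Weighted decision stump: split one feature at one threshold and
    predict the class with the largest total weight on each side."""
    classes = sorted(set(y))
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            # sides[0]: x[f] <= t, sides[1]: x[f] > t; maps class -> summed weight
            sides = ({c: 0.0 for c in classes}, {c: 0.0 for c in classes})
            for xi, yi, wi in zip(X, y, w):
                sides[1 if xi[f] > t else 0][yi] += wi
            left = max(sides[0], key=sides[0].get)
            right = max(sides[1], key=sides[1].get)
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if yi != (right if xi[f] > t else left))
            if best is None or err < best[0]:
                best = (err, f, t, left, right)
    _, f, t, left, right = best
    return lambda x: right if x[f] > t else left

def adaboost_m1(X, y, fit_weak=fit_stump, rounds=10):
    """AdaBoost.M1: returns a list of (vote weight, hypothesis) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        h = fit_weak(X, y, w)
        miss = [h(xi) != yi for xi, yi in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m)
        if err == 0:
            # Fits every training instance; log(1/beta) would be infinite.
            return [(1.0, h)]
        if err >= 0.5:
            # M1 requires weighted error below 1/2; stop adding hypotheses.
            if not ensemble:
                ensemble.append((1.0, h))
            break
        beta = err / (1.0 - err)
        ensemble.append((math.log(1.0 / beta), h))
        # Down-weight correctly classified instances, then renormalise, so the
        # next weak hypothesis concentrates on the instances missed so far.
        w = [wi * (1.0 if m else beta) for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote: each hypothesis adds log(1/beta) to its predicted class."""
    votes = {}
    for alpha, h in ensemble:
        label = h(x)
        votes[label] = votes.get(label, 0.0) + alpha
    return max(votes, key=votes.get)

# Hypothetical 1-D toy data: three activities occupying three intervals.
X = [[v] for v in range(1, 10)]
y = ["sit"] * 3 + ["walk"] * 3 + ["jog"] * 3
stump = fit_stump(X, y, [1.0 / len(X)] * len(X))
model = adaboost_m1(X, y, rounds=3)
print(sum(stump(x) == t for x, t in zip(X, y)), "of 9 correct with one stump")
print(sum(predict(model, x) == t for x, t in zip(X, y)), "of 9 correct after boosting")
```

A single stump can separate at most two of the three intervals, so it classifies 6 of the 9 toy instances correctly. After three rounds of reweighting, the weighted vote of the three stumps classifies all 9 correctly. This is the effect boosting aims for: weak hypotheses that each cover part of the problem combine into a stronger one.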

7-Confusion matrix of Adaboost (Random Forest as classifier)*
The results listed in Table 7 show that 5110 instances were classified correctly. To examine the accuracy of the 7 classifiers in more detail, the overall and per-class accuracies are given in Table 8.
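Overall accuracy is the number of correctly classified instances divided by the 5418 instances, i.e. the trace of a confusion matrix divided by its total. The sketch below applies this to the correct-instance counts reported for the RF, kNN, JRip, CvR, and AdaBoost confusion matrices; the resulting percentages agree with the accuracies reported in Table 8 up to rounding. It also assumes that "per-class accuracy" means the fraction of a class's true instances that are classified correctly (its recall), and computes both quantities for a small hypothetical confusion matrix.

```python
TOTAL = 5418  # dataset instances, as stated in the abstract

# Correctly classified instance counts reported in the text.
correct = {
    "AdaBoost (Random Forest)": 5110,
    "AdaBoost (J48)": 5106,
    "Random Forest": 5033,
    "CvR": 4877,
    "JRip": 4658,
    "kNN (k=3)": 4656,
}

def overall_accuracy(cm):
    """Correct predictions (the diagonal) over all predictions."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)

def per_class_accuracy(cm):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]

for name, n in correct.items():
    print(f"{name}: {100 * n / TOTAL:.2f}%")

# Hypothetical 2-class confusion matrix (rows: true class, columns: predicted).
toy = [[8, 2],
       [1, 9]]
print(overall_accuracy(toy), per_class_accuracy(toy))
```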
From Table 8, the combination of AdaBoost and RF in one classifier model achieves the highest accuracy, 94.31%. AdaBoost with a decision tree base classifier (specifically J48) also performs very well, with 94.24% accuracy. The Random Forest classifier comes third with 92.89% accuracy, and the CvR classifier fourth with 90%, ahead of the JRip and kNN classifiers with 85.97% and 85.93%, respectively. The Naïve Bayes classifier performs worst in this experiment, with 79.99% accuracy. Across activities, jogging has the highest rate of correct classification (97.54%), and walking comes second with 96.43% overall class accuracy. In contrast, the downstairs and upstairs activities have the lowest class accuracies, 56.84% and 64.59%, respectively.

In addition, the root mean square error (RMSE) of each classifier is calculated and compared with those of the other classifiers. Figure 1 shows the RMSE values of the 7 employed classifiers.
Figure 1 shows that the NB classifier has the highest error, 0.2345, while the AdaJ48 (0.1323) and AdaRF (0.1338) classifiers have the lowest.
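The text does not say how RMSE is defined for a classifier. A common definition for classifiers that output class probabilities compares each predicted probability vector with a one-hot encoding of the true class and averages the squared differences over all instances and classes. To our understanding, this is also what Weka reports for nominal classes. The sketch below assumes that definition, and the toy probabilities are hypothetical.

```python
import math

def probabilistic_rmse(probs, true_idx):
    """RMSE between predicted class-probability vectors and one-hot targets:
    sqrt( sum_i sum_k (p_ik - y_ik)^2 / (N * K) ), where y_ik is 1 if
    instance i belongs to class k and 0 otherwise."""
    n_classes = len(probs[0])
    sq_err = 0.0
    for p, t in zip(probs, true_idx):
        sq_err += sum((p_k - (1.0 if k == t else 0.0)) ** 2
                      for k, p_k in enumerate(p))
    return math.sqrt(sq_err / (len(probs) * n_classes))

# Hypothetical predictions for two instances over three activity classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
print(probabilistic_rmse(probs, [0, 1]))
```

Under this definition, a confidently wrong prediction costs more than a hesitant one, even when both count as one misclassification. This is why RMSE can separate classifiers whose accuracies are almost equal, such as AdaJ48 and AdaRF.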
The third metric used to evaluate the classifiers is the F-measure, as shown in Figure 2.
Figure 2 shows that the AdaBoost classifiers have the highest average F-measure: 0.943 for AdaRF and 0.942 for AdaJ48. The lowest average F-measure among the classifiers is that of NB, 0.781.
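The F-measure of a class is the harmonic mean of its precision P and recall R, F = 2PR / (P + R). The sketch computes it per class from a confusion matrix. It then averages the per-class values weighted by each class's number of true instances. That weighting is an assumption: it matches the "weighted average" row reported by tools such as Weka, whereas a macro average would weight all classes equally. The 2-class matrix is hypothetical.

```python
def f_measures(cm):
    """Per-class F-measure from a confusion matrix (rows: true, columns: predicted)."""
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column total
        actual = sum(cm[c])                          # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores

def weighted_average_f(cm):
    """Average per-class F-measure weighted by each class's true-instance count."""
    supports = [sum(row) for row in cm]
    return sum(f * s for f, s in zip(f_measures(cm), supports)) / sum(supports)

# Hypothetical 2-class confusion matrix.
toy = [[8, 2],
       [1, 9]]
print(f_measures(toy), weighted_average_f(toy))
```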

Discussion
The results show that AdaBoost with RF as the base classifier gives the highest classification accuracy, and its error is among the lowest of the examined classifiers (only AdaJ48 is slightly lower). This result also outperforms those reported in [12], [4], [13], and [14]. However, the main limitations of the AR task may be the position and orientation of the mobile device, the characteristics of the device and its sensors, and the nature of the sensed data [9], [20], [21].

Conclusions and Future Work
In this research, a number of classification methods were employed and compared in terms of accuracy and error. The results show that the random forest classifier performed better than the other standalone classifiers. Furthermore, the proposed multi-classifier ensemble system improved performance and reduced classification errors in the activity recognition task. Specifically, combining the random forest classifier with boosting gave the best classification results. Nevertheless, a challenge may arise in real-life applications (e.g. the phone-context problem): the mobile phone may be in an inappropriate position or orientation for the activity being sensed. Future work might focus on the preprocessing phase, to obtain better-quality data with more informative features. The suggested model could also be examined on other mobile data to further improve human activity recognition. In addition, the suggested model and techniques need further work to support, and improve the performance of, online activity recognition.

Figure (2): Average F-measure of the classifiers
Al-Taei [14] employed 5 classification methods (MLP, NB, DT, SMO, and BN) on the WISDM (WIreless Sensor Data Mining) dataset of 6 different activities (walking, jogging, climbing up and down stairs, sitting, and standing). MLP outperformed the other classifiers, with an overall accuracy of 92.65%. Lockhart and Weiss [15] analyzed and compared the performance of different AR models: universal, personal, and hybrid. Several classification algorithms were used (LR, RF, NN, IBk, NB, J48, and JRip). Personal models outperformed the other models, and the best classification methods were RF and NN, in that order.
Figure (1): Comparison of classifiers' error values

From Table 1 (Random Forest), 5033 instances were classified correctly.

2-Confusion matrix of Instance Based (kNN, k=3)
From the IB confusion matrix (Table 2), the total number of correctly classified instances is 4656.

3-Confusion matrix of Rule Induction (JRip)