Personalization in Mobile Activity Recognition System Using 𝐾 -Medoids Clustering Algorithm

. Nowadays mobile activity recognition (AR) has been creating great potentials in many applications including mobile healthcare and context-aware systems. Human activities could be detected based on sensory data that are available on today’s smart phone. In this study, we consider mobile phones as an independent device since sending the data to central server can generate privacy issues. Furthermore, applying AR on mobile phone does not only require an effective accuracy rate but also the lowest power consumption. Normally, an AR model learnt from acceleration data of a specific person is distributed to other people to recognize the same activities instead of generating different models individually. This work often cannot create accurate results on the prediction in broad range of participants. Moreover, such AR model also has to allow each user to update his new activities independently. Therefore, we propose an algorithm that integrates Support Vector Machine classifier and 𝐾 -medoids clustering method to resolve completely the demand.


Introduction
Currently, mobile-based AR has been applied widely in various technological aspects to enhance the quality of life. In healthcare applications, it has been used to assess physical activities and aid cardiac rehabilitation, detect a fall event as in our achievement [1], predict user's energy consumption based on monitoring activity of daily living (ADL), and generate daily, weekly, and monthly activity reports in order to promote health and fitness. In contextaware pervasive computing systems, mobile accelerometer has gained significant achievements. In term of user's device security improvement, gait recognition is studied as a potential protection mechanism [2,3]. Moreover, human activity information can also be used to adapt automatically the behavior of using mobile phone. It can include sending calls directly to voicemail if a user is bicycling or jogging, turning on music when jogging is taken place, and so forth. In order to gain these benefits, mobile phone accelerometer data must be processed at device or central server via communication channels (Wi-Fi, Bluetooth, etc.). In this study, we consider mobile phones as an independent device since sending the data to central server from mobile device can generate privacy issues [4]. Normally, there are 3 steps in AR. First, data windows from segmentation of accelerometer signals are taken. Second, some features that describe the clearest properties of studying activity are extracted. These preprocessing steps are the most important parts of AR system since the last step is classification that can be studied by any existing machine learning algorithms.
Some recent mobile achievements attempted to recognize ADL as in [5,6]. However, these achievements also remain restrictions including the instability of accuracy especially in cross-people prediction which measures the sustainability of classification features in predicting activities of new people based on a trained model from a specific person and lacking of evidence about energy consumption since mobile devices are powered by limited energy resource and memory storage.
To resolve these problems, in our latest studies [7,8], we proposed (1) an effective classification feature extraction to balance accuracy and energy consumption and (2) an adaptive strategy for energy saving by selecting appropriately the combination of feature classification (CF) and sampling frequency (SF) for each activity.
From our previous achievements, classification models were created offline from the training data and ported to different mobile users to measure the tolerance of the classification. However, in fact it is hard to generate a unique model that can classify all users' motion activities in a large scale of participants.
This issue was denoted in our previous achievement as in Figure 1. The overall accuracy descended significantly when we increase the number of subjects. Moreover, an effective mobile AR system should allow users to update their own new activities instead of only predicting predefined activities. Addressing these needs, we propose an algorithm to achieve the following goals: (i) enhancing the personalization in mobile AR system for predefined activities based on the actual data of its owner; (ii) allowing new activities that are updated on each mobile user.
Applying AR on mobile phone is not like applying on power machine. It not only requires effective accuracy but also guarantees the lowest power consumption. Therefore, we propose the following methods.
(i) Firstly, we propose an effective real-time mobile AR system. Its preprocessing phase includes motion analysis on -and -axes from mobile triaxial accelerometer and segmenting on main -axis based peak detection algorithm. Proposed classification features in each window are extracted in time domain. Section 3 presents this core system.
(ii) A personalization algorithm is presented through the method that selects confident samples to update the existing model. The proposed algorithm is integration of SVM classifier and clustering algorithm. In this study, clustering algorithm is aimed to explore two common partitioning methods--means [9] andmedoids [10]. Section 4 presents our personalization method.
ADLs in this study include walking, jogging, bicycling, going upstairs and downstairs, and running while the phone is attached at front pant pocket of mobile user. Experiments of our two main contributions are presented in Sections 5 and 6, respectively. Finally, Section 7 gives our conclusion.

Related Works
A few recent AR achievements, like ours, did use a commercial mobile accelerometer to measure real-time running capability of mobile phone without sending data to central server. The authors in [11] achieved 75.3% and 73.95% in walking, jogging, upstair, and downstair activities by using J48 and Multilayer Perceptron. By using iPhone accelerometer, the authors in [12] predicted walking, running, biking, and driving activities with average accuracy to reach 93.88% by using SVM. However, driving is similar with relaxing activity which does not have significant changes in acceleration. Therefore, prediction becomes easier. Other authors Multiple-subject AR accuracy Figure 1: Accuracies of mobile AR system in multiple-subject prediction type on SCUT-NAA dataset in our achievement [8].
in [5] had restrictions in detecting upstairs and downstairs activities.
According to on-board training on mobile phone as the classification model is created individually on every user [11,13], this approach can allow users update their own new activities. However, a sufficient training sample quantity must be collected on each phone user that could cause unexpected inconvenience. Thus, an implicit method that allows personalization on predefined activities and updating new activities is necessary in user's mobile AR system.
Based on transfer learning technique [14], personalization issue was also recently studied on mobile [15]. In this approach, -means is used to select confident samples, and the convergence of personalized model comes from comparing parameters inside the model at two continuous iteration steps. In our solution with another approach, the equality of confident samples at the steps is considered as a stopping condition in the personalization phase. With the core SVM classification, we measure the performance of clustering methods among the -means and -medoids through the presented targets.
In our study, the targets are denoted as in Figure 2. Model A is generated from user A's data on specific activities. The model is then distributed to other users with essential demands including personalizing these activities and updating new activities for each user independently.

Mobile AR System Based on Gait Cycle
In this study, we paid particular attention to front pant pocket because of the popularity of attaching phone at this position. The mobile phone was vertically placed at the pocket location as shown in Figure 3. From three axes of accelerometer, the -axis captures horizontal movement of the user's leg. Theaxis captures the upward and downward motion; the -axis captures the forward movement.

System Overview.
A flowchart of the proposed mobile AR method is depicted in Figure 4. Only effective axes of mobile accelerometer that presents the most clearly gait cycles in human motion are used. A set of features is extracted on time domain after passing preprocessing steps. SVM classification is used in our core system.

Effective Axis Selection.
Gait cycle notion is used to explain the combinative force that is recorded by mobile accelerometer. The notion is defined as the time interval between two successive occurrences of one of the repetitive events when walking. In other words, two consecutive steps form a gait cycle. As shown in Figure 5, the cycle starts with initial contact of the right toe, then it will continue until the right toe contacts the ground again. The left goes through exactly the same series of events as the right, but displaced in time by half of a cycle. When the toe touches the ground in phase "a" or phase "g" as in the figure, the association between ground reaction force and forward inertial force together makes the -axis signal strongly changes and forms peaks with the high magnitude. In fact, in order to make changes of -axis we need the change of -axis beforehand since heel touches the ground. It explains why we always have peaks ofand -axes like negative peak, and positive peak respectively, as illustrated in Figure 6.
From our observation, we realized that ground reaction force is expressed most clearly especially in jogging and upstair, downstair activities by negative peaks. Therefore we consider -axis signal as the main axis. The first window segmentation is started by the first peak on -axis instead of choosing first point of the window. This reflects clearly properties of activities since it shows how many peaks existed in one window so that it enhances accuracy in matching method. In overlapping of windows, a next window is still started at a peak which occurred in previous window.
Other values on -axis are just selected at the same time with values. A gait cycle is defined between two consecutive peaks on -axis.
In previous studies [5,6,11,16], windows can be segmented from arbitrary points which only guarantee the length of the window. This might cause an incorrect reflection of human activities on each gait cycle. Thus, the segmentation method based on peaks is used in our study to extract clearly gait cycle's features.

Linear Interpolation.
Due to power saving function and the built-in nature, an accelerometer on mobile phone is simpler than standalone one. The sampling rate is rather low. Time intervals between two consecutive acceleration values are also not equal. A sensor only generates value when the forces acting on each axis have a significant change. Therefore, we interpolated the acquired signal to 32 Hz ( Figure 7) using linear interpolation method to ensure that the time interval between two sample points will be fixed.

Noise Elimination.
When accelerometer samples movement data by user walking, some noises will inevitably be collected. These additional noises could have come from various sources (e.g., idle orientation shifts or bumps on the road while walking). A digital filter needs to be designed to eliminate noises. Multilevel wavelet decomposition and reconstruction method are adopted to filter the signal. According to Figure 8, original signal is denoted by ( ). High-pass filter and low-pass filter are denoted by HF and LF. On each level, the outputs from high-pass filter are known as detail coefficients. On the other hand, low-pass filter outputs contain most of the information of the input signal. They are known as coarse coefficients. The signal sample is down by 2 at each level. Coefficients obtained from the low-pass filter are used as the original signal for the next level and continue until the desired level is achieved. In contrast, reconstruction is the reverse of decomposition process. To eliminate noises, we assign the detail coefficients to 0. The reconstruction of the signal is computed by concatenating the coefficients   of high frequency with low frequency. During experiment, the Daubechies orthogonal 6 wavelet (Db6) is adopted for signal decomposition and noise reduction. Figure 9 showed the signal after noise reduction using Db6 at level 2.

Peak Detection Algorithm.
In order to segment window based on axis peaks, we designed an algorithm to detect these peaks as follows.
The original signal is denoted as ( ). First, we extract a set of peaks from ( ). A data point is called peak if its value is less than its previous and next one. Let where is the ith value in ( ). Threshold is estimated to determine true peaks using (2). The peaks which have magnitudes lower than are identified as set of true peaks : where , are mean and standard deviation of all peaks in , respectively, and is the user-defined constant. Figure 10 shows the threshold with different values. In our experiment, choosing = 1/3 gave the best partition rate.  The following time features (TFS) on sliding windows are chosen in order to record the most clearly gait cycles on each window. They are extracted on effective -and -axes, and this is different from [6,11,16] since these achievements used features on all axes. in [5,17]. Since sampling frequency rate was stable by using linear interpolation, this energy value shows amount of the change on a physical activity. Its value has a significant difference among activities like changes in jogging occurred in both of -and axes but the concentration only focuses on axis in bicycling. Equation dedicating it in a window size is presented as where is acceleration at time on -or -axis.
(iv) Hjorth Mobility and Complexity. In electroencephalography (EEG) signal analysis, Bo Hjorth [18] derived certain features that describe the EEG signal by means of simple time domain since this signal cannot be associated with the sine function used in frequency domain. These parameters, namely, Activity, Mobility, and Complexity, were used to characterize the EEG pattern in terms of amplitude, time scale, and complexity. These values were applied in [5,19] for emotion assessment and also in accelerometer, respectively. Mobility is a measure of the signal mean frequency. Complexity measures the deviation of the signal from the sine shape. Both values are scalar features performed as follows with var( ) being variance function of signal and standing for the derivate of signal : From analyzed TF features, our classification is acted by using SVM classifier which is used widely in AR [5,16]. SVM algorithm is set of support vectors which separate training samples to a corresponding class by maximizing margin of hyperplanes among classes. In this work, we use Radial Basis Function (RBF) kernel in order to map support vectors to multiple dimensions since there are eleven TF attributes.

Personalization in Mobile AR System
An AR model is firstly trained offline on power machine by actual mobile acceleration data on known activities of a specific person through the proposed feature extraction. Personalization on mobile AR system in our method is studied on two phases. Firstly, the system has to allow users to update their own new activities. Secondly, in the context of sharing the trained model, the system has to update implicitly and suitably activities trained on that model. This is to predict accurately activities of new users since each person has different ranges in data distribution.

On-Board Training Undefined
Activities. Generating individual model for each person causes a large data acquisition and inconvenience for mobile users in reality. Thus, as the simulation shown in Figure 2, the phone on person B uses the model of user A to predict his own activities. A new activity with its label which has not yet been defined inside the model can be updated directly on mobile based on labeled samples of the activity. Consider = update ( , tar , ) , where tar is extracted feature set of a sample of . This function simply increases predicting new activities of person B on the model through on-board updating on his phone.

6
International Journal of Distributed Sensor Networks

Improving Personalization in Predefined Activities.
In the scope of a small user group, a user's trained model could recognize the data of other people with acceptable accuracy rates. However, in fact, it is hard to generate an effective model that can classify all users' motion activities in a large quantity of participants; this means that the trained model is not sustainable for a really new person because of the different data distribution among participants. Thus, the solution of taking an effective amount of unlabeled data of person B for personalizing not only makes more convenience but also enhances prediction ability for those who use this model instead of keeping fixed accuracy.
With activities trained on from user A, updating by samples of these predefined activities from user B can increase the probability of false-positive rate; therefore personalization means reconstructing parameters of the model to improve the true-positive probability in prediction without expanding data distribution of these activities.
Our solution for this issue is combining an effective clustering algorithm and SVM classifier together. Firstly, we build model A with labeled samples of person A. The model is then transferred to person B's phone. Secondly, we classify the unlabeled samples of person B to model A. These data are then used to adjust model B based on choosing confident samples for updating model A. These steps are represented as in Figure 11.
In the first step, user B's samples tar = { 1 , 2 , . . . , } with their corresponding label could be recognized inaccurately by the original SVM model of person A, where is th attribute of tar . Consider where function represents data distribution of activity on each person. The merit of SVM algorithm is performed by maximizing margin of hyper-planes among classes. When samples of activity of person A and B are similar and exclusive from other activities, their hyper-plane can be separated easily in from others. In fact with randomly selecting participants A and B, ( ) covers ( ) and closes with ( ), where ̸ = . It involves increasing of false-positive rate as the result from (7).
In second step, clustering algorithm is a division of samples into group of similar objects. Objects possessed by the same cluster tend to be similar from their feature set { 1 , 2 , . . . , }, while dissimilar objects are covered by different clusters. In fact, activity 's samples on person B are similar and they perform a unique motion style which is different from other activities [20]. Clustering into groups based on feature space { 1 , 2 , . . . , } performs actual activities of person B. Among clustering methods, we study on two common methods--means and -medoids-for measuring real-time running capability on mobile device. A clustering method creates partitions, called clusters, from given set of data objects. In our study, the dissimilarity between two objects is measured by using Euclidean distance as Each partition is represented by either a centroid or medoid. A centroid is an average of all data objects in a partition as in (9), while medoid is the most representative point of a cluster [21]. An iterative relocation technique is used to improve the partitioning by moving objects from one group to another. Consider where and denote th class and its samples, respectively. For both of the two methods, clustering quality and iteration number are related to the initial cluster centers relied on the fact that the new user B has known the number of output classes from before personalizing. In each cluster, samples which are nearest to their center point are selected since they perform the convergence based on the inner relation of samples in the same cluster. Labels of these samples are then updated by activity label of the center point. In -means method, the center point is assigned by the sample defined as = min 2 ( tar , ) with ∀ tar ∈ .
These confident samples are used to update model . Our proposed algorithm is summarized as the following in case of = 5, which means that has five activities on person B; see Algorithm 1.
At step (1), model that was trained based on time domain features of user A is transferred to new user B. From step (3) to step (5), unlabeled samples of user B are classified and clustered based on the parameters including , Δ Time , and number defined before. At step (6), confident samples from each class are selected to update with label of their center point. The personalized model for user B is returned when there exists the convergence in Confidence Sample or the excess Δ Time .

Experiment Results in Mobile AR System Based Gait Cycle
SCUTT-NAA dataset [16] and our self-constructed data are used in our experiment to measure the efficiency of our realtime AR mobile system and personalization algorithm.  removed because these points almost could not present clearly a specific activity's change. Moreover, since our noise elimination step requires a 2 sample length, we firstly shorten the length of each sample to satisfy the conditions in Table 1.

With SCUT-NAA
After that, a Daubechies level 3 filter is applied in this step to gain certain information in the change of each activity. Next, peaks on axis are detected to segment windows. In this dataset, we found that positive peaks could describe more clearly than negative peaks on the same axis. As our best analysis, this phenomenon can be explained such that the author of this dataset placed accelerometer's direction different from our built-in mobile accelerometer. Therefore, our peak detection algorithm is changed a little bit including replacing (1), (2) and (3), respectively, as follows: Moreover, the interval between true two consecutive peaks has to be greater than 29 data points. We did not choose overlapping method on sliding windows since there was a large data length in each activity of subjects. We separate 512 samples per window. Each window is started from a axis peak which is the nearest peak to the last one of previous window. LIBSVM was used to train and predict activities in the dataset and our data.    Figure 12 shows overall accuracy from our four methods and Xue and Jin [16]. By using SVM with RBF kernel, our TF method shows stable accuracy in predicting the activities. Our contribution in this work is expressed by removing last points in data, selecting effective -and -axes without using -axis, selecting different filter, and segmenting windows from peaks on main -axis. To the best of our knowledge,   T  P  T  P  T  P  T  P  T  P  1  3  this is a novel approach to predict accurately ADL activities based on mobile accelerometer.

With Our Self-Constructed Data.
In order to ensure running successfully in real-time environment on mobile accelerometer, we also develop an application for collecting data from triaxial BMA 150 accelerometer on Google Android HTC Nexus One from six volunteers. The mobile device measures acceleration force up to ±2G. The sampling rate is approximately 30 Hz on SENSOR DEPLAY FASTEST mode in Android SDK. The data were collected at normal speed of each subject in natural environment. Table 2 shows the numbers of training and predicting data of each activity. In our data, we used Daubechies at level 2 as noise filtering. Distance between consecutive peaks has to be greater than 23 data points. From the proposed method, our overall accuracies in single and multiple-subject types are shown as in Table 3. The overall accuracies express our contribution in preprocessing phase of mobile AR system based on gait cycles. They guarantee an effective prediction and improve the precision of achievements presented in Section 2.

Efficiency in Real-Time AR on Mobile Users.
To measure the performance of our system in real-time running on mobile, we developed two prototypes for this experiment. First prototype uses the proposed method which extracts features in time domain, and the remaining is implemented by using classification features in frequency domain. Public libraries including Android 2.3 OS, Jtransforms, LibSVM, and Weka are used inside these prototypes. Since mobile accelerometer does not collect data when the screen is turned off, our prototypes are developed as a foreground service. Other functions including wireless, light screen were turned off during the experiment. We also developed a power measurement module based on battery event API from Android SDK to record battery's changes. Trained SVM model of each volunteer approximately taking 42KB is ported to SD card of the device to predict new incoming signals. Each prototype takes about 11MB on phone's memory. Its snapshot is shown as in Figure 13. Figure 14 shows average consumptions of two prototypes within 2.5 hours of volunteers. The method using SVM classifier and features on frequency domain consumes much energy than the remaining method in average.  Figure 13: The implementation on HTC Nexus and its volume.
where is total number of data points in signal, and is true peaks. Data analysis in time domain can retrieve useful information which describes characteristics of signal better than amplitude of coefficients in frequency domain. Therefore, the features on time domain are used to personalize SVM models among mobile users in next steps.

Performance of Personalization on Mobile AR System
To illustrate the performance of our algorithm, we verify the personalization algorithm with two subjects who yielded the lowest accuracy from multiple-subject prediction type as shown in Table 2. Experiment is formed by two phases including personalizing predefined activities and measuring the efficiency of the personalized model in predicting new activities.

Personalization in Predefined
Activities. Firstly, we measure the impact of training samples in multiple-prediction type of these two subjects. SVM model A was created by person A from total 148 samples of five activities including bicycling, downstair, upstair, jogging, and walking. In 148 training samples of person A, we divided them into five groups with different sizes corresponding to 30, 58, 86, and 148 samples. Each group is used to train a model and predict all testing samples of person A and person B, respectively. The results are shown in Figure 15.
The accuracy of the SVM model from person A increases significantly when the number of training samples varies from 30 to 86 samples. Otherwise, from 86 to 148 samples, the accuracy increases less than the previous range. In other words, with 86 training samples, we can build well a sufficient model to predict testing of person A. Accuracy of predicting all testing data of person B also increases gradually when we increase user A's training samples because the probability for successful prediction is improved when new data of person B are added more.
In order to observe the effect of confident samples used for personalizing algorithm in cases of -means and -medoids clustering algorithms, we used all 148 training  samples of person A to predict different groups of testing data of person B, respectively. We divided testing data of person B into five groups with quantities as in 30, 60, 77, 89, and 106 samples. Figure 16 shows the accuracies when we vary and the number of testing samples. For each case, its accuracy can converge regardless of whether is small or large when we increase testing samples. It can be explained that new testing samples of a new user are very diverse, and the probability of matching correctly to personalized model can be increased. However, the samples must perform a unique motion style of individual people in each activity. They cannot have quite difference to each other. Moreover, -means algorithm of personalization process divided separately activities into different groups. These reasons made a convergence in accuracy of a person when we increase testing samples to a sufficient quantity.
At each quantity of testing samples, changing confident samples can add more noises. Thus, its accuracy can be descended. We found that a value of = 25 yielded the optimal results. It can be explained that, when the value of is too small, the useful information of the new user is comparatively little.
= 25 is sufficient to increase 8% of accuracy compared with pure multiple-prediction type as in Figure 15. Figure 17 represents the performance of -medoids method. In contrast to the previous method, this clustering method performs stability when we vary number of testing samples. Higher accuracies compared to -means method in small testing sample groups are achieved. However, when testing sample number is over a 77-sample size, it involves In fact, -medoids method is more robust than -means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean, while -means perform well in processing large data sets with a discrete distribution. In cases of small testing sample groups, the model is updated by crucially correct samples when we select = 10 and 15. In large testing data groups including = 25, 30 and 35, the precision descended since more noises were enhanced inside. At the optimal value of = 20, the accuracy of personalization process is improved up to 11 percent compared with pure multiple-prediction type as in Figure 15. Table 4 summarizes the performance of both methods. Normally, ≪ and ≪ , and -medoids consume more time for personalizing although an effective accuracy and an optimal quantity of confident samples are achieved. During the test, the user-defined threshold for iteration Δ Time was set as 85. However, -means algorithm stopped at round = 25 while this number was identified at averagely 36 for -medoids algorithm. These results show an effective computational time for running real time on mobile.

Updating New Activity in Personalized Model.
In this experiment, we use two personalized models based onmeans at = 25 and -medoids at = 20, respectively. The models were personalized from all 106 testing samples on user B. To measure the effectiveness in predicting new activities, we collected new samples from new running activity on the user. Since running is different from jogging activity on velocity, and the false-positive rate of them is high.
The models are firstly trained by 40 running samples. In terms of prediction phase, a quantity of 60 samples on the activity is collected and then divided into four groups with different sizes as in 20, 40, 50, and 60 samples. From Figure 18, we could observe that (i) the model based on -medoids method performs the effective tolerance while sample size is varied; (ii) a convergence appeared in both methods regardless of increasing testing data quantity from the size of 40 samples. Since there are many activities that need to be updated on the model, more training samples should be on-board trained to maintain an effective accuracy rate.

Conclusions
In this paper, an effective personalization algorithm that integrates SVM classification and -medoids clustering method is proposed to select confident samples for updating a given AR model. The algorithm's performance is verified through recognizing predefined and new activities. An increasing accuracy of 11% is compared to nonpersonalization approach. The personalization algorithm is developed based on a basic mobile AR system that extracts time domain features from windows segmented by peaks on axis. The effectiveness of system is compared to previous achievements in accuracy standard and another system that uses frequency domain features for energy consumption standard.