CAPHAR: context-aware personalized human activity recognition using associative learning in smart environments

The existing action recognition systems mainly focus on generalized methods to categorize human actions. However, generalized systems cannot attain the same level of recognition performance for new users, mainly due to the high variance in human behavior and in the way actions are performed, i.e. activity handling. The use of personalized models based on similarity was introduced to overcome the activity handling problem, but the improvement was found to be limited as the similarity was based on physiognomies rather than behavior. Moreover, human interaction with contextual information has not been studied extensively in the domain of action recognition. Such interactions can provide an edge both for recognizing high-level activities and for improving the personalization effect. In this paper, we propose the context-aware personalized human activity recognition (CAPHAR) framework, which computes class association rules between low-level actions/sensor activations and contextual information to recognize high-level activities. The personalization in CAPHAR leverages the individual behavior process using a similarity metric to reduce the effect of the activity handling problem. The experimental results on the "daily lifelog" dataset show that CAPHAR can achieve up to 23.73% better accuracy for new users in comparison to the existing classification methods.

provide the inertial measurements such as accelerometer and gyroscope sensors, and social interaction data through smartphones and mobility patterns, respectively. Another way is to use sensor activations, which require a specific infrastructure for humans to interact directly with sensors such as smart switches, infrared sensors, and pressure sensors. Existing studies also use the interaction between the environment and people, i.e. contextual information, to model human behavior and recognize complex physical activities [3,4]. In the broad sense, contextual information can be viewed from two perspectives: (1) location, identity, activity, and time; and (2) user & role, process & task [5]. In this study, contextual information refers to the data obtained by secondary sensors associated with the performed action, such as the time of the action, the location where the action is performed, or the interrelationship of the action and the object.
Existing studies use context information as one of the features for training the action classifier to improve the accuracy of the recognition system, but they are limited to low-level actions. In our study, a low-level action refers to a basic human action performed in the daily routine, such as standing, sitting, walking, running, lying, and ascending/descending stairs. These are fundamental actions and can therefore be recognized with reasonable accuracy using only inertial sensors [6]. Contextual information has also been used to derive high-level activities as well as to model human behavior, with the assumption that prior knowledge is available for the interaction between the human action and the specified context [7]. A high-level activity refers to a higher level of abstraction for the performed action, such as eating, personal grooming, and desk work. High-level activities can be inferred by combining low-level actions and contextual information. Single or multiple contexts can be used depending on the available infrastructure and the sensor characteristics. For instance, the time/duration of an activity can be obtained using sensor characteristics such as those of an accelerometer, whereas indoor location and object sensors need a fixed infrastructure. High-level activity recognition can be performed either by direct inferencing with inference rules [8] and ontological reasoning [9], or by integrating the contextual information into the machine learning framework, i.e. adding the context as one of the feature vectors [10]. Human behavior in this study refers to the way or routine in which actions are performed while interacting with the available contexts. Existing studies have used human behavior modeling extensively for determining individual behavior, health anomalies, and human identification [2,11]. Human behavior modeling can be categorized into two schools of thought, i.e. social and applied sciences.
The social sciences lean more towards the impact a certain behavior causes on society, while the applied sciences leverage the behavior to attain automation or derive recommendations. Previous studies have also combined physical activity recognition and human behavior models to provide recommendations for healthy lifestyles, social interactions, and human identification [2,11].
Physical activity recognition systems rely heavily on machine learning techniques or statistical inferences such as class association rules (CARs). The integration of inference rules (CARs) into a machine learning framework is referred to as associative learning. The reason for using CARs is that they provide a model that is simple and proven to be effective in terms of accuracy and interpretability [12]. CARs fall into the category of association rules that consider the rule consequent as a fixed class attribute [4]. The left side of a CAR can be multiple itemsets based on the conjunction of attribute-value pairs, whereas the right side corresponds to the target concept, which refers to the activity class label in this study. These rules allow us to extract interesting patterns based on the attribute-value pairs concerning the target concept. The CARs in existing studies have been used with sensor activations, i.e. attribute-value pairs, and without considering contextual information [4,12-15]. Furthermore, CARs have not been exploited for personalized recognition services. The goal of human activity recognition (HAR) systems is to learn from annotations and acquired sensor readings to infer the performed activity. Most machine learning algorithms take into consideration feature vectors provided by either physical or contextual sensors. The problem is that the learning algorithm does not distinguish between the base and the contextual information; it treats all features uniformly while training the model. Therefore, if contextual information appears out of order, it will simply look up the weights assigned to a feature vector instead of considering the relationship between the sensor and the secondary information. Moreover, shallow learning algorithms are either too complex, such as random forests and support vector machines, or considered "black boxes", like neural networks.
Considering that CARs are highly interpretable, they have competitive strength in the field of personalized activity recognition by providing inferential insights regarding human behavior. Hence, it is necessary to integrate the contextual information with the attribute-value pairs of low-level actions to model the CARs for recognizing personalized activities.
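To make the structure of a CAR concrete, the sketch below scores a rule whose antecedent is a conjunction of (attribute, value) pairs against a set of labeled transactions. The attribute names, activity labels, and data are illustrative, not taken from the paper:

```python
def car_support_confidence(antecedent, label, transactions):
    """Score a class association rule `antecedent => label`.

    antecedent: set of (attribute, value) pairs;
    transactions: list of (item_set, activity_label) tuples.
    """
    matches = sum(1 for items, _ in transactions if antecedent <= items)
    hits = sum(1 for items, lbl in transactions
               if antecedent <= items and lbl == label)
    support = hits / len(transactions)             # rule frequency overall
    confidence = hits / matches if matches else 0.0  # rule reliability
    return support, confidence

# Hypothetical transactions: low-level action + contexts -> high-level activity.
transactions = [
    ({("action", "sitting"), ("location", "office"), ("time", "morning")}, "desk work"),
    ({("action", "sitting"), ("location", "home"), ("time", "evening")}, "eating"),
    ({("action", "sitting"), ("location", "office"), ("time", "afternoon")}, "desk work"),
]
rule = {("action", "sitting"), ("location", "office")}
sup, conf = car_support_confidence(rule, "desk work", transactions)
```

Here the rule fires in two of three transactions and always predicts the correct label, so its support is 2/3 and its confidence is 1.0.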
There are two issues that we need to address in developing personalized models: the activity handling problem and the human behavior process. First, the activity handling problem, as aforementioned, is a phenomenon in which subjects may perform actions differently with respect to multiple contexts (e.g., location, time, user, and identity information). Existing studies mainly focus on developing subject-independent models for activity recognition, regarded as one-fits-all activity models. However, subject-independent models do not exhibit the same performance when applied to data from a new test subject due to high variation in human behavior and in the mannerisms for performing a specific action. The study [16] implies that subjects are idiosyncratic; they interact differently with the context as per their style, behavior, and ability to perform different actions. It has also been shown that subjects may perform actions differently with respect to the location and the time of day [2,11,17]. For instance, the "meal preparation" activity at home can be executed differently when performed at the office, and the "housework" activity may vary in execution when performed in the morning or at night. In this regard, researchers often move towards building personalized or subject-dependent activity models. Very few studies apply the personalization aspect due to the challenge of collecting and annotating large amounts of user-specific data [1].
Human behavior processes have been modeled in existing studies using Kalman filters, crowd simulation, and soft computing techniques [18,19]. Researchers in the field of process mining have also tried to model human behavior as a process for analyzing interesting behavior patterns [20]. Process mining techniques represent the sensor measurements or samples from an indoor location system as process workflows for modeling human behavior [21]. Most works in process mining regarding human behavior focus on improving the process workflows. Assuming that users may have similarities in their behavior patterns, this study uses these patterns to compute the similarity against the existing pool of models in order to select one and test it against the data of a new user. In other words, assuming that some data from a new user are available, a semi-population calibration approach can be used to test one of the trained models against the new user's data. This approach maps a new individual to a pool of existing users to solve the "cold-start" problem. The cold-start problem concerns personalized activity recognition for new users having little or no annotated data. The mapping of personalized models onto new users in existing studies was performed using gender and demographic details such as age, color, and region. However, gender and demographic details carry limited knowledge to be leveraged and do not provide sufficient information to solve the activity handling problem. We assume that the behavioral characteristics of human activities can best be modeled using their respective behavior processes. Although many studies use physical activities to derive or identify behavior process models, the reverse approach has not been explored for solving the cold-start problem in physical activity recognition studies.
Intuitively, this makes sense, as similarity in behavior is more likely to map the action characteristics of one individual to another than age and gender are. An implication of such a study is evident from the recent report by Reuters Graphics [22], which provided insight into the trajectory of patient 31, as shown in Fig. 1. Provided with the limited information of physical activity, time, and location, the system could have chosen a similar individual based on behavioral patterns and predicted the location of the next activity; or, as a precautionary measure, if an individual appears likely to be affected, the system could generate a notification to the locations where the subject tends to socialize or hold meetings. This shows the importance of using behavior patterns for recognizing human activities and the relevance of this study to the current situation.
In this study, we address two issues for personalized activity recognition systems. The first is the integration of contextual information from multiple sensor modalities to improve high-level activity recognition. The second is the use of human behavior processes to solve the cold-start problem by mapping the new user onto the existing pool of subjects. In this regard, we propose the context-aware personalized human activity recognition (CAPHAR) framework using associative learning and human behavior process models. The CARs in associative learning take into account the contextual information from different sensors, such as location, object, and time, along with the low-level action from IMUs or sensor activations to find the frequent patterns. These frequent patterns are then used to classify the high-level activities. The method also provides a generalized way of recognizing high-level activities from either low-level actions or sensor activations, which has not been directly addressed in the existing studies. We demonstrate the effectiveness of associative learning using two real-world human activity datasets, i.e., "daily life-logging" [17] and "Activity Recognition in the Home" [23]. The former consists of low-level actions from IMUs, location, and time information, whereas the latter consists of sensor activations and time information. We expect that the evaluation on these two datasets demonstrates the real-world applicability of our proposed method.
Furthermore, we generate the individual behavior process models of each user using process mining techniques from the recognized high-level activities. We propose a way to measure the similarity between the behavior model of a new user and the existing pool of behavior process models to perform personalized high-level activity recognition. The proposed framework is shown to work in an environment where we have a limited amount of annotated data from a new user. The contributions of our study are summarized as follows:

• We propose the CAPHAR framework for personalized HAR based on human behavior and contextual information.
• The CAPHAR framework can handle various sensing inputs such as low-level actions or sensor activations in a single framework.
• We propose the use of a semi-population calibration approach by computing the similarity between the human behavior process models for solving the cold-start problem.
• The classification accuracy is improved by leveraging human behavior using associative learning.
The remainder of the paper is structured as follows: Sect. "Related works" presents a summary of the related works. Section "Proposed method" explains the proposed methodology: recognizing low-level actions using traditional machine learning methods, recognizing high-level activities using associative learning, generating process models from the recognized high-level activities, and computing the similarity for mapping the personalized models onto new users. Section "Experiments and results" briefly describes the daily life-logging and activity recognition in the home datasets. The results are also presented in Sect. "Experiments and results", which first presents the activity recognition results and then depicts the usefulness of the proposed work for personalized activity recognition. We conclude our study in Sect. "Conclusion" along with a discussion and some future directions.

Association rule mining and Class association rules for activity recognition
An association rule is formally defined as X ⇒ Y, where X and Y are disjoint itemsets referred to as the antecedent and the consequent, respectively. One of the most used association rule mining algorithms is Apriori [24], which finds the frequent itemsets based on user-specified minimum support and confidence. Another is the FP-growth algorithm [25], which uses a tree-like structure to mine the frequent patterns. Association rule mining has been widely used on historical transaction data for predictive analysis [8].
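A minimal, illustrative Apriori pass over set-valued transactions might look as follows; real implementations prune candidates far more aggressively, and FP-growth avoids candidate generation entirely via its prefix tree:

```python
def apriori(transactions, min_support):
    """Toy Apriori: return {frequent itemset (frozenset): support}.

    transactions: list of sets of items; min_support in [0, 1].
    Illustrative sketch only, not an optimized implementation.
    """
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items if support(s) >= min_support}
    freq, k = {}, 1
    while level:
        for s in level:
            freq[s] = support(s)
        # Generate (k+1)-item candidates by joining frequent k-itemsets;
        # by the Apriori property, any superset of an infrequent set is infrequent.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates if support(c) >= min_support}
    return freq
```

With transactions `[{"a","b"}, {"a","c"}, {"a","b","c"}]` and a minimum support of 2/3, the frequent itemsets are {a}, {b}, {c}, {a,b}, and {a,c}; {b,c} appears only once and is pruned.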
Association rule mining has also been used as a feature extraction mechanism for many classification models. Chien and Chen [24] used the association rules as features to train the classification model for stock trading. Pach et al. [25] introduced the concept of using association rules with fuzzy logic to classify numerical data. Qodmanan et al. [26] and Yan et al. [27] used association rules in conjunction with genetic algorithms to optimize the value for minimum support and confidence.
Very few studies have applied association rules for mining and matching frequent patterns to recognize activities. Gu et al. [28] and Palmes et al. [15] extracted repetitive patterns from sensor activations which were frequent for one activity and infrequent for the others. They transformed the sensor activation data into different window sequences and computed a score for each individual activity. These scores were used as weights for categorizing activities from sensor activations. Rashidi et al. [29] proposed an activity recognition and tracking algorithm based on unsupervised learning. They mined the frequent patterns and clustered these patterns for all the available activities. However, they ignored the fact that the same sensors can be activated by different activities. Luhr et al. [30] used frequent pattern mining for activity recognition on sensor activation sequences, which often change in realistic environments. Yassine et al. [31] used association rule mining for analyzing the energy consumption patterns of electronic appliances, which are directly related to human actions. Sfar et al. [32] proposed causal association rule mining to detect and recognize anomalous activities in smart home environments. Atzmueller et al. [12] proposed a class association rule-based mining algorithm (CARMA) for analyzing different rule sets to select the final classification model. Their work used the inertial measurement units of mobile phones for recognizing human activities. Liu et al. [13,14] used association rules to mine frequent patterns and used these patterns as discriminative features. Multi-task learning was employed to improve recognition performance. Liu et al. [33] focused on extracting mid-level features by mining frequent patterns using association rules. They suggested that the extracted temporal patterns can help improve activity recognition performance. Marimuthu et al.
[34] proposed an activity recognition method using an adaptive neuro-fuzzy inference system with frequent pattern mining. The frequent pattern mining in their study was used to reduce the number of membership functions and rules. Our method is inspired by the works of [4,14,33], i.e. proposing a modified Apriori algorithm to mine the frequent patterns. The main focus of the works mentioned above was to recognize overlapping activities efficiently. Most of the existing studies use the Opportunity dataset, in which the activities were performed in a controlled environment and have uniform labels. The existing methods mine the frequent patterns between the activity (ground truth) and the sensors involved while performing the activity (sensor activations). The proposed work is unique in the sense that it integrates both the sensor activations and the low-level actions from IMUs to mine the patterns, rather than limiting the method to a single modality. With the advances in IoT and micro-electromechanical systems (MEMS), the use of contextual sensors has also increased drastically. Therefore, a framework that can incorporate information from more than one context is needed, which is ignored by many existing studies. Moreover, none of the above-mentioned studies leverages human behavior for personalized human activity recognition. We extend their work to accommodate contextual information along with the generalization over data acquired from sensor activations or inertial measurement units to mine the frequent patterns for activity recognition.

High-level activity recognition using contextual information
Very few studies focus on recognizing high-level activities, as most of the existing works focus on classifying low-level actions. Recently, a survey [35] highlighted how characteristics such as time, location, and objects are used as contexts for recognizing such high-level activities. Khowaja et al. [9] exploited ontological reasoning combined with data-driven methods to derive high-level activities. Gong et al. [36] introduced the idea of using pattern mining with location as a context for high-level activity recognition. They used a single context, i.e. location, for recognizing high-level activities. Villalonga et al. [7] proposed a knowledge-driven approach for recognizing high-level activities using multiple contexts such as location and emotion. Cao et al. [37] proposed group-based context-aware human action recognition (GCHAR), which uses transitional logic and the previous state of the human to predict the action. The transitional logic was referred to as the context-aware component in their study. They also focused on improving action recognition accuracy by leveraging contextual information in the form of transitional logic; however, they only recognized four low-level actions (grouping similar actions into one) using IMUs. Other studies try to leverage contextual information to construct a better classifier model by embedding the contextual information in the feature space. Filippoupolitis et al. [10] presented location-enhanced activity recognition, where the location is considered one of the features for training the classification model. Zhang et al. [38] used subtasks as contextual information to recognize high-level activities. A sequence chunking strategy was proposed by segmenting and labeling different chunks of high-level activities. Lee et al.
[39] conducted an exploratory study on the context affecting human activity recognition; in their study, the context referred to the housing environments. Aminikhanghahi and Cook [40] proposed the use of the time context to segment a high-level activity into its corresponding subtasks to improve recognition performance. Zhang et al. [41] proposed an ontological knowledge-based system for human activity recognition in smart homes. The main focus of the work was to deal with heterogeneous data sources and interoperability; therefore, multiple contexts were used in collaboration with the sensor activations deployed in smart homes. Civitarese et al. [42] also proposed an ontological framework for human activity recognition for classifying interleaved and concurrent activities. The method was tested on the CASAS dataset, which has homogeneous activity labels. Moreover, the personalized characteristics have been somewhat reduced by removing the occurrences of noisy motion sensors, as they may refer to a habit or a behavioral trait.
The above-mentioned studies focus on integrating the contextual information within the learning framework without considering the interrelationship of multiple contexts. These studies are also restricted to a single context and a single source of physical sensor readings (either sensor activations or inertial measurements) for recognizing high-level activities. Additionally, such studies do not consider personalization and the activity handling problem arising from varying behavior, which can hinder generalization of the recognition performance. For instance, one subject performs the "personal grooming" activity at the location "home" while another subject performs it at the location "office." Similarly, the activity "desk work" can be performed at both locations, "home" and "office." Using locations merely for constructing inference rules, by assuming that certain activities are always performed at specific locations, is another way of avoiding the activity handling problem.

Personalized activity recognition
In realistic scenarios, the activities performed by one subject can differ from those of others in terms of activity labels and behavior. For instance, while walking on the street, one subject may label the instance as a socializing activity whereas another subject may label it as a transportation activity. The activity handling problem arises due to diverse interactions with the contexts, and it can be solved to an extent by introducing the aspect of personalization, suggesting that the classification model should be trained separately for each subject. Many researchers agree upon having a personalized rather than a one-fits-all model [1].
Zhang et al. [43] proposed a probabilistic learning method for training activity models from both incomplete and complete data. Stikic et al. [44] tried to reduce the number of annotations using graph-based and multi-instance learning-based label propagation. Maekawa et al. [45] proposed a method where the model is trained on the new user's as well as other users' data, termed supportive users. Bianchi et al. [46] proposed a personalized activity recognition method by collecting data from individuals and training a lightweight model for each individual from a few days of data. The personalization effect was added irrespective of any physiognomies or behavior, as the model is trained separately for each individual. Siirtola et al. [47] performed personalization with incremental learning approaches. The undertaken dataset had 9 subjects with the same action labels. As the data were collected in a controlled environment, i.e. the activity handling problem was avoided, there was no improvement when the personalization effect was applied. Burns and Whyne [48] proposed the use of personalized deep features, personalized engineered features, and personalized triplet networks to embed the personalization effect in their system. However, the study neither used any contextual representation nor accounted for human behavior. Other researchers focused on mapping the training data from the pool of existing users onto new ones. The mapping is generally based on demographic details such as age, height, weight, and gender [1]. The activity labels are assumed to be the same for new and old users, which is again an unrealistic scenario. For instance, one user can perform an activity labeled as taking medication or playing soccer, whereas another user may not perform such activities at all. Thus, the activity labels may not be the same for all users.
In this case, the performance of personalized models also decreases, as users are grouped based on their physicality rather than the similarity in their routines and the way they handle their activities.
Vaizman et al. [49] surveyed the contextual information which can be used for user behavior modeling to develop a specific healthcare application. They also suggested that for large-scale applications, the context-recognition component should be made unassertive, without making the users alter their behavior. Shen et al. [50] used the behavioral characteristics of smartphone sensors to recognize human activities. They experimented on both general and personalized models using smartphone usage behavior for recognizing low-level actions and suggested that the personalized models perform better than their general counterparts. Mafrur et al. [11] modeled human behavior using life-logging data from smartphones to identify the human instead of recognizing the activity. This study supports our assumption that general activity recognition would fail in real-world applications, as human behavior is unique and differs from user to user. Jalali et al. [51] proposed the use of human behavior for recognizing multiple events. Furthermore, the sequential and parallel relations of these events were recognized using frequent co-occurrence patterns. Soleimani and Nazerfard [52] proposed the use of subject adaptor generative adversarial networks (SA-GAN) for cross-subject transfer learning. The method was tested on the Opportunity dataset, which has homogeneous labels, and the behavior of the subjects is the same as all the activities were performed in a controlled environment. It is apparent that human behavior analysis has not previously been used for personalized high-level activity recognition. To the best of our knowledge, this study is the first to use behavior process models for computing the similarity between a new user and the existing pool of subjects.
The benefit of using associative learning is that it automatically maps the relationship of actions or sensor activations with the contextual information for each user, hence reducing the problem of activity handling to some extent. Unlike the existing methods which use demographics or physiognomies of the users, we employ a process modeling approach to measure the similarity between different users to solve the activity handling problem. Furthermore, existing studies only consider datasets which exhibit a uniform set of labels for all subjects, which is not the case in real-world scenarios. For instance, the "Taking medication" activity is not common to all subjects; therefore, only a limited number of users might perform this activity. In such cases, the system fails to recognize the activity due to the variations in activity labels. In this study, we propose a similarity metric based on the behavior process models to map the subjects in such a way that the variation in activity labels can be reduced and the activity recognition performance can be enhanced.
Some of the existing works are consolidated and summarized in Table 1 with respect to personalization, activity handling, data source, classification approach, low-level actions, high-level activities, and use of contextual information. As can be noticed, none of the existing works focuses on activity handling, which represents a real-world problem, as each individual performs an action in their own way. Similarly, there are very few works proposing a generalized framework that can be applied to both IMUs and sensor activations. With respect to the consolidated review of existing works, we focus on the maximum number of activities, i.e. combined low-level actions and high-level activities, to evaluate the performance of the CAPHAR framework. Class association rules have previously been used for 4-8 activities using only sensor activations, without the consideration of contextual information. This work not only extends the number of activities to recognize but also adds multiple contexts along with the generalization over both sensor modalities, i.e. IMUs and sensor activations. Moreover, the existing studies use demographics and physiognomies for personalization, whereas in this study we use the human behavior process to model the personalization effect, which helps to reduce the activity handling problem.

Proposed method

Figure 2 illustrates the CAPHAR framework, specifying the inter-dependencies among the building blocks. The method for high-level activity recognition is based on two modules: low-level action recognition (in the case of data from IMUs) and high-level activity recognition. We consider the data from IMUs, i.e., accelerometer (Acc) and gyroscope (Gyr) sensors, for the low-level action recognition (LLAR) module. The method extracts features from the raw sensor data and applies machine learning methods to build a classification model. The output of this module (the classified action) is the input to the high-level activity recognition (HLAR) module.
The pipeline of the LLAR module is quite similar to conventional learning; therefore, we adopt the feature extraction and classification methods proposed in existing works. As suggested in Sect. "Association rule mining and Class association rules for activity recognition", the associative learning method in HLAR is inspired by the works proposed in [4,14,33]; however, we have modified the algorithm and method to work with multiple contexts as well as to integrate the use of IMUs and sensor activations. Furthermore, the use of behavior models for computing the similarity among the pool of subjects for the semi-population calibration approach is the main highlight of the CAPHAR framework. Table 2 presents the nomenclature of the variables used in this study. Each of the variables is briefly explained in the text at the point of its usage. The HLAR module takes into account either the classified action or the sensor activations, along with the available contextual information, to perform activity recognition using the associative learning method. To perform the personalization, we first generate the behavior process models from the activities predicted using associative learning. These models are evaluated against the reference activity models generated from the annotated data. For a new test user, the behavior process model is mapped onto the existing pool of models based on the similarity metric. The most similar model is elected to recognize personalized high-level activities from the new test user's data. In the subsections below, we explain the particulars of each building block for the LLAR and HLAR modules, with details of the process mining and similarity approaches.
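As a rough illustration of the model-election step, the sketch below reduces each user's behavior to a bag of directed activity transitions and elects the pooled user whose profile is most cosine-similar to the new user's. The actual CAPHAR similarity is computed over mined process models, so this is a simplified stand-in with invented activity labels:

```python
from math import sqrt

def transition_profile(activity_sequence):
    """Count directed transitions between consecutive high-level activities."""
    profile = {}
    for a, b in zip(activity_sequence, activity_sequence[1:]):
        profile[(a, b)] = profile.get((a, b), 0) + 1
    return profile

def cosine_similarity(p, q):
    """Cosine similarity between two sparse transition-count profiles."""
    dot = sum(p.get(k, 0) * q.get(k, 0) for k in set(p) | set(q))
    norm = (sqrt(sum(v * v for v in p.values()))
            * sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def elect_model(new_user_seq, pool):
    """pool: {user_id: activity_sequence}; return the most similar user's id,
    whose trained personalized model would then be applied to the new user."""
    target = transition_profile(new_user_seq)
    return max(pool, key=lambda u: cosine_similarity(target,
                                                     transition_profile(pool[u])))
```

A new user whose day follows a sleep-eat-work routine would, for example, be matched to a pooled user with a similar routine rather than to one who alternates between sleep and gym sessions.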

Low-level action recognition module
The LLAR module follows a typical machine learning pipeline to recognize low-level actions. We present the details of the employed dataset in a later section. The data from inertial sensors are acquired with a sampling rate of 50 Hz. The data is first pre-processed to reduce noise in the data stream: we apply a low-pass Butterworth filter and a median filter with a 10 Hz cutoff frequency to the raw data. The signals are then segmented into sliding windows of 1.5 s with 30% overlap between consecutive windows. Once the data is pre-processed, we extract features from each sliding window.
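As a concrete illustration of the segmentation step, the sketch below (pure Python, with the filtering stage omitted and a synthetic stream standing in for one filtered sensor axis) shows how 1.5 s windows with 30% overlap are obtained at 50 Hz:

```python
def sliding_windows(samples, rate_hz=50, win_s=1.5, overlap=0.3):
    """Segment a 1-D sample stream into fixed-length overlapping windows.

    Window length and stride follow the paper's settings (1.5 s windows,
    30% overlap at 50 Hz); the Butterworth/median filtering is assumed
    to have been applied beforehand.
    """
    win = int(rate_hz * win_s)               # 75 samples per window
    step = max(1, int(win * (1 - overlap)))  # 52-sample stride (30% overlap)
    return [samples[i:i + win]
            for i in range(0, len(samples) - win + 1, step)]

stream = list(range(200))                    # stand-in for one filtered Acc axis
windows = sliding_windows(stream)
print(len(windows), len(windows[0]))
```

Each 200-sample stream yields three complete 75-sample windows here; trailing samples that do not fill a whole window are discarded.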

Feature extraction
We extract 34 statistical features from the Acc and Gyr sensor readings of each sliding window, i.e. 17 features from each sensor. The fast Fourier transform (FFT) coefficients, based on the frequency components from 1 to 50 Hz, are extracted from each sensor and added to the feature space. Thus, the statistical features (34) and the FFT features (100) result in a total of 134 features. We present the list of extracted features in Table 3; these features are commonly used in low-level action recognition studies [55].
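A minimal sketch of per-window feature computation follows. The paper's exact 17 statistics per sensor are listed in Table 3, so the features below are only representative time-domain examples, not the full set; FFT coefficients would be appended to the same feature vector in the same way.

```python
import statistics as st

def window_features(window):
    """Compute a few common statistical features for one sliding window.

    Illustrative only: the paper extracts 17 statistics per sensor
    (Table 3); this sketch shows a representative subset.
    """
    return {
        "mean": st.mean(window),
        "std": st.pstdev(window),
        "min": min(window),
        "max": max(window),
        "median": st.median(window),
        "range": max(window) - min(window),
        "energy": sum(x * x for x in window) / len(window),
    }

feats = window_features([1.0, 2.0, 3.0, 4.0])
print(feats["mean"], feats["range"], feats["energy"])
```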

Classification model
The classification task in our approach is to recognize the discrete category of new sensor readings by learning from the training samples. Many generic classification methods have been proposed and used for LLAR, such as SVM, neural networks (NN), decision trees, and gradient boosting. Recently, deep learning algorithms have had a positive impact on the field of action recognition, especially the convolutional neural network (CNN) [3] and the recurrent neural network (RNN) [56]. These methods have shown promising results, but they require a large dataset for training, which may not be available in real-life scenarios. As the LLAR module is of vital importance to the HLAR pipeline, we evaluate different kinds of classification methods to attain high accuracy. In this regard, we mainly adopt two methods that have shown promising results in the literature, i.e. error-correcting output codes (ECOC) [6] and long short-term memory networks (LSTM) [56]. We use ECOC with only a single base classifier, without any mini-batch, to perform the training; the base classifier is chosen empirically, as shown in Sect. "Low-level action recognition results". We also evaluate the LLAR module using LSTM networks, a variant of RNNs. The difference from existing studies is that we use a loss function based on the F1-score instead of the cross-entropy. The reason for employing the F1-score loss is that most publicly available datasets are highly imbalanced in terms of the samples for particular action labels; using cross-entropy, L1, or L2 loss may therefore result in overfitting to specific action classes with a large number of observations. The F1-loss for a set of N samples is defined in Eq. 1.
where pr_i refers to the probability vector for each instance, the symbol • denotes element-wise multiplication, ℓ represents the number of action classes, and q_i ∈ ℝ^ℓ is the binary vector signifying the correct action class l, i.e. q_{il} = 1 and q_{ik} = 0 for k ≠ l. For example, consider the probability vector of an instance over the action classes sitting, standing, walking, and running: [0.3 0.3 0.25 0.15]. If the true label for the given instance is standing, the binary vector q_i is [0 1 0 0] over the respective action classes.
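The F1-based loss can be sketched as follows. This uses the common soft-count (macro soft-F1) surrogate, where true/false positives and false negatives are accumulated from probabilities; it is a sketch under that assumption and may differ in detail from the authors' exact Eq. 1.

```python
def f1_loss(probs, onehot):
    """Macro soft-F1 loss: 1 minus the mean per-class soft F1.

    probs:  list of probability vectors (one per instance)
    onehot: list of one-hot label vectors q_i
    A small epsilon guards against empty classes. This is a common
    differentiable surrogate, not necessarily the paper's exact form.
    """
    n_cls = len(probs[0])
    f1s = []
    for c in range(n_cls):
        tp = sum(p[c] * q[c] for p, q in zip(probs, onehot))
        fp = sum(p[c] * (1 - q[c]) for p, q in zip(probs, onehot))
        fn = sum((1 - p[c]) * q[c] for p, q in zip(probs, onehot))
        f1s.append(2 * tp / (2 * tp + fp + fn + 1e-9))
    return 1 - sum(f1s) / n_cls

# perfect predictions give a loss near 0; a wrong one-hot gives a loss near 1
loss = f1_loss([[0.0, 1.0], [1.0, 0.0]], [[0, 1], [1, 0]])
print(loss)
```

Unlike cross-entropy, the per-class averaging weights rare and frequent classes equally, which is the motivation given above for imbalanced action labels.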

High-level activity recognition module
Our HLAR module considers either low-level actions or sensor activations, along with the contextual information. HLAR has one dependency: the availability of data from either modality. As shown in Fig. 2, if data from IMUs is available, we use the classified action; otherwise, the sensor activations are considered to recognize high-level activities. In this section, we define the concept of building an activity classifier using CARs.
To proceed, we first define the activity trace, which is the set of low-level actions or sensor activations and the contextual information during an activity. For instance, the high-level activity "socializing" may comprise multiple low-level actions such as walking, climbing (down), and standing, at location "office" at a specific time, which can be regarded as the activity trace of "socializing". An example of an activity trace is shown in Fig. 3. The first example shows the transformation of the raw data from IMUs into the activity trace format, in accordance with Fig. 2 (HLAR module). The second example considers the sensor activations as input. The datasets we chose for the evaluation of associative learning provide time and location as their contexts; therefore, in Fig. 3, context1 and context2 refer to time code and location, respectively. The goal of associative learning in our study is to construct a classifier based on CARs from the traces of each activity.

Definition 1 (Activation)
Let AS be a set of available activations. Then, AS is defined in Eq. (2) as

AS = A ⊕ S,    (2)

where A is the set of low-level actions defined as A = {a_1, a_2, ..., a_g}, and S refers to the set of sensor activations defined as S = {s_1, s_2, ..., s_h}. The indexes g and h represent the number of low-level actions and sensor activations, respectively. The operator ⊕ refers to an exclusive-OR (XOR) operation, indicating that the activation simply returns either the set of low-level actions or the sensor activations, based on the availability of the data. We define the set of available contexts as C = {c_m | m = 1, ..., M}, where M is the number of available contexts. We constitute the following definition for the itemset, which combines the available activations with the different contexts.
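A minimal sketch of Definition 1, treating the XOR as "exactly one modality present"; the function name and error handling are illustrative, not from the paper:

```python
def available_activations(actions, sensors):
    """Eq. (2) sketch: AS = A XOR S -- exactly one modality is used.

    Returns the low-level actions when IMU data produced them, otherwise
    the sensor activations; having both or neither is treated as invalid.
    """
    if bool(actions) == bool(sensors):
        raise ValueError("exactly one of actions/sensors must be present")
    return actions or sensors

print(available_activations(["sitting", "standing"], []))
print(available_activations([], ["s1", "s4"]))
```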

Definition 2 (Itemset)
Let I be a set of itemsets. Then, an itemset i_k ∈ I is defined in Eq. (3) as

i_k = (AS_k, C_k),    (3)

where AS_k ⊆ AS and C_k are the k-th action and context sets, respectively. For example, AS may comprise four low-level actions, i.e. "standing, walking, sitting, and lying", whereas AS_k may include only "sitting and standing" among the four. As per the example shown in Fig. 3, the available contexts are time code and location; in this regard, we can replace C_k by T_k and L_k in Definition 2. We define the set of high-level activities (HLA) in Eq. (4) as

HLA = {ha_r | r = 1, ..., R},    (4)

where R refers to the number of HLAs. The activity trace, in the form of a transaction, is defined in the following definition.

Definition 3 (Activity trace)
Let Tr be a set of activity traces. Then, an activity trace tr_k ∈ Tr is defined in Eq. (5) as

tr_k = ⟨ha_k, i_k⟩,    (5)

where tr_k is the k-th trace, and ha_k and i_k are the activity and the itemset assigned to the k-th trace, respectively. An example trace tr_1 = ⟨PersonalGrooming, ⟨[sitting, standing], Home, Mo⟩⟩ represents ha_1 = PersonalGrooming and i_1 = ⟨[sitting, standing], Home, Mo⟩. We can then define the support and confidence of Tr, given an antecedent I* ⊆ I, in Eqs. (6) and (7):

Supp(I* ⇒ ha_r) = Σ_k f_{I*,ha_r}(tr_k) / |Tr|,    (6)

Conf(I* ⇒ ha_r) = Σ_k f_{I*,ha_r}(tr_k) / Σ_k f_{I*}(tr_k),    (7)

where f_{I*,ha_r}(tr_k) = 1 if I* ⊂ tr_k and ha_r ∈ tr_k, and 0 otherwise; and f_{I*}(tr_k) = 1 if I* ⊂ tr_k, and 0 otherwise. In the above equations, |Tr| is the total number of activity traces in the training set, Σ_k f_{I*}(tr_k) is the number of activity traces containing I*, and Σ_k f_{I*,ha_r}(tr_k) is the number of activity traces containing both I* and ha_r. We provide examples of activity traces in Table 4 using low-level actions with the contextual information. As Table 4 shows, an activity trace may include itemsets from I that are unique to different HLAs, but real-life scenarios are much more complicated, which gives rise to the activity handling problem, since different people perform activities differently [57,58]. For example, in the case of sensor activations, activity ha_1 triggers the sensors {s_2, s_4, s_4, s_4} and activity ha_5 triggers the sensors {s_2, s_2, s_2, s_4} with the same location and time code. Both HLAs trigger the same set of sensors {s_2, s_4}; therefore they are quite challenging to differentiate. However, the two activities have different frequencies of sensor activations, and this information can be used to improve the recognition process. In this regard, we use the frequencies of the actions or sensors triggered during a certain activity. This allows us to find temporal patterns in the sensor activations or actions, which can intuitively improve the performance of associative learning. By doing so, we incorporate the frequency of actions or sensors into the confidence calculation for an association rule.
We modify the sets of actions and sensors as A = {(a_1, |a_1|), (a_2, |a_2|), ..., (a_g, |a_g|)} and S = {(s_1, |s_1|), (s_2, |s_2|), ..., (s_h, |s_h|)}, respectively, where the notation |·| refers to the frequency of the corresponding action or sensor activation. The earlier example thus results in {(s_2, 1), (s_4, 3)} and {(s_2, 3), (s_4, 1)} for activities ha_1 and ha_5, respectively. The embedded frequency information provides better discriminative capability compared to the previous scenario, where we considered only the sensor activations. For further analysis and computation, we use the modified sets of actions and sensor activations, which embed the frequency information. Another advantage of using the frequency information is that it helps to overcome intra-activity variations in the activity traces. Consider the two activities "grooming" and "toileting" shown in Fig. 4. The "toileting" activity triggers the sensor placed at location "bathroom", while the "grooming" activity is performed at different locations including "bathroom". The "toileting" activity triggers the sensor at location "bathroom" more often than the "grooming" activity does. This suggests that ignoring the frequency information may misclassify these activities due to the intra-activity variation. We extend the method of mining CARs from [4].
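The frequency embedding and the resulting support/confidence computation can be sketched as follows. The representation of a trace as an (activity, frozenset) pair is an illustrative simplification of Definition 3, using the paper's own ha_1/ha_5 example:

```python
from collections import Counter

def embed_frequencies(activations):
    """Turn a raw activation sequence into the frequency-embedded set,
    e.g. [s2, s4, s4, s4] -> {(s2, 1), (s4, 3)}."""
    return set(Counter(activations).items())

def support_confidence(traces, itemset, activity):
    """Supp/Conf of the rule itemset => activity over activity traces.

    traces: list of (activity_label, frozenset itemset) pairs.
    Follows Eqs. (6)-(7): support over all traces, confidence over the
    traces containing the antecedent.
    """
    has_item = [t for t in traces if itemset <= t[1]]
    both = [t for t in has_item if t[0] == activity]
    supp = len(both) / len(traces)
    conf = len(both) / len(has_item) if has_item else 0.0
    return supp, conf

t1 = ("ha1", frozenset(embed_frequencies(["s2", "s4", "s4", "s4"])))
t2 = ("ha5", frozenset(embed_frequencies(["s2", "s2", "s2", "s4"])))
supp, conf = support_confidence([t1, t2], frozenset({("s2", 3)}), "ha5")
print(supp, conf)
```

With plain sensor sets, {s_2, s_4} would match both traces; with frequencies embedded, the antecedent (s_2, 3) matches only the ha_5 trace.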

Classification of activities using CARs
As shown in the previous section, each HLA has its own activity traces. In this regard, we create CARs in which the activity trace itself is the antecedent and the HLA is the consequent, as suggested in [4,59]. A classifier using associative learning can be built from all class association rules that meet the user-specified minimum Supp and Conf. There are mainly two steps for building such a classifier [4]: (1) eliminating the rules that are subordinate to other rules, and (2) removing the rules from the first step that do not contribute to the improvement of classification accuracy. For instance, given two rules with the same consequent, if rule 2 is subordinate to rule 1, then rule 2 is considered redundant with rule 1; that is, the frequent pattern (antecedent) of rule 2 is a subset of that of rule 1. Some of the elimination rules proposed in [4] are also applied, as follows:

• Rule 1 has higher Confidence than rule 2.
• The Confidence of both rules is the same, but rule 1 has higher Support.
• The Confidence and Support of both rules are the same, but rule 1 was generated before rule 2.
In such cases, rule 2 is eliminated. When classifying a new instance, the rules whose activity traces satisfy the instance are considered for the classification. We create a group of CARs and represent it as car(ha_r). For the classification, we apply the group of CARs to aggregate the gain for each HLA when classifying a test instance; using multiple association rules has been shown to achieve better classification performance than a single association rule [24]. To create the list of CARs, we define car(ha_r), which stores the activity traces associated with the different activities, in Eq. (8):

car(ha_r) = {z | z ∈ Z, Conf(z ⇒ ha_r) ≥ Conf_min, Supp(z ⇒ ha_r) ≥ Supp_min}    (8)

Fig. 4 Example of intra-activity variation
where z is the unlabeled itemset, and Conf_min and Supp_min refer to the user-specified minimum confidence and support, respectively. We arrange the itemsets by length in descending order. For example, if rule 1 is a superset of rule 2, then rule 2 is not considered for the computation of gain(z, ha_r), which is the exact purpose of line 4 in Algorithm 1. The gain(z, ha_r) is aggregated over the CARs (line 5, Algorithm 1), and the HLA having the maximum gain is returned as the classification result (line 10, Algorithm 1). Algorithm 1 only considers the association rules for a certain high-level activity; therefore, its complexity is O(n). Furthermore, the association rules that do not meet the minimum confidence threshold are pruned, which results in a smaller search space.
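The gain aggregation of Algorithm 1 can be sketched as below, under the assumptions that rules are pre-sorted by antecedent length (longest first), that a rule subsumed by an already-counted superset rule is skipped, and that gain is the sum of matching rule confidences; the aggregation details are simplified relative to the paper's Algorithm 1, and the rule values are illustrative:

```python
def classify_with_cars(z, cars):
    """Algorithm 1 sketch: aggregate gain over matching CARs per activity.

    z:    unlabeled itemset (frozenset of items, e.g. actions + contexts)
    cars: dict activity -> list of (antecedent frozenset, confidence),
          assumed pre-sorted by antecedent length, longest first.
    """
    gains = {}
    for activity, rules in cars.items():
        gain, used = 0.0, []
        for antecedent, conf in rules:
            if not antecedent <= z:
                continue
            if any(antecedent < u for u in used):  # subsumed by a counted rule
                continue
            used.append(antecedent)
            gain += conf
        gains[activity] = gain
    return max(gains, key=gains.get)

cars = {
    "Socializing": [(frozenset({"walking", "Office"}), 0.9),
                    (frozenset({"walking"}), 0.6)],
    "DeskWork":    [(frozenset({"sitting", "Office"}), 0.95)],
}
result = classify_with_cars(frozenset({"walking", "standing", "Office"}), cars)
print(result)
```

Here the shorter {walking} rule is skipped because its superset {walking, Office} already matched, mirroring the subset-elimination step.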

Human behavior process modeling
In past years, human behavior process modeling from activities of daily living (ADL) has been an active research topic. Various studies mine patterns from annotated activity logs; some focus on anomalous behavior detection, whereas others emphasize predicting activities [60]. There are various methods and techniques for behavior process modeling. We provide notations and a brief description of the elements considered for constructing a behavior process model. Generally, a process model consists of events, traces, and logs. Note that the trace considered in the process model is quite different from the activity trace. Let E, PTr, and L represent the set of events, traces of events, and logs, respectively. The set E comprises a finite set of events, and a trace PTr is a sequence of events in E such that PTr = ⟨e_1, e_2, ..., e_N⟩, where N = |PTr| is the size of the trace. The log L comprises traces such that L = {PTr_1, PTr_2, ..., PTr_M}. In our study, an event is an activity (HLA), and PTr is the sequence of the HLAs performed. An example of L is {⟨Housework, Eating/Drinking, PersonalGrooming⟩², ⟨PersonalGrooming, DeskWork, Socializing, Transportation⟩}, where each HLA is an event and each sequence of events enclosed by ⟨·⟩ is a trace. The given example consists of three traces: two instances of ⟨Housework, Eating/Drinking, PersonalGrooming⟩, whose frequency is denoted by the superscript, and one instance of ⟨PersonalGrooming, DeskWork, Socializing, Transportation⟩. Using the events, traces of events, and logs, the process model can be created using a Petri net.
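The log representation can be illustrated directly with the example above; counting trace variants recovers the superscript frequencies:

```python
from collections import Counter

# Each trace is a tuple of HLA events; the log L is a multiset of traces,
# matching the paper's example where <Housework, Eating/Drinking,
# PersonalGrooming> occurs twice.
log = [
    ("Housework", "Eating/Drinking", "PersonalGrooming"),
    ("Housework", "Eating/Drinking", "PersonalGrooming"),
    ("PersonalGrooming", "DeskWork", "Socializing", "Transportation"),
]

variants = Counter(log)
for trace, freq in variants.items():
    print(freq, "x", " -> ".join(trace))
```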
For more details on process mining building blocks and the creation of a process model using Petri nets, refer to [61]. The reasons for using such process modeling methods are twofold: first, to use the activity process models in conjunction with associative learning to improve recognition, efficiency, and reliability; and second, to use the activity process models as a means of computing the similarity between two users. The similarity based on behavior process models not only helps to map a new user onto the pool of existing models but also reduces the effect of the activity handling problem. We used the inductive visual miner (IVM) [62] to construct the process model from the activities classified using the proposed associative learning. The reason for choosing IVM is its guarantee of soundness in comparison to other methods, where soundness refers to the absence of deadlock in the process model [62]. We evaluated the effectiveness of this implication by generating a process model from the classified activities using a part of each individual's data and comparing it with the model generated from the annotated data. The process model is first discovered with Petri nets, which represent the flow of a process using modeling formalisms. A Petri net can be represented by a triplet (Pι, T r, F), where Pι and T r refer to the sets of places and transitions, respectively, such that Pι ∩ T r = ∅. The term F is a set of directed arcs defined as F = (Pι × T r) ∪ (T r × Pι). Once the Petri net model is created, we apply the IVM to generate the final process model, which is represented as a process tree. For the Petri net, we assume the standard semantics as proposed in [61].
The evaluation was based on two parameters, i.e. trace fitness and generalization [63]. Trace fitness measures the extent to which the model can reproduce the traces in the event log. It tries to align as many events as possible from the traces (the alignment measure); if the alignment is not perfect, events may be skipped or inserted without being present in the log, which adds a penalty to the fitness score. The computation of the trace fitness score is presented in Eq. (9), where Mod refers to the process model created using a Petri net followed by IVM, and L is an arbitrary event log such that L ⊆ L. Trace fitness depends on the ratio of the cost of aligning L with Mod to the minimal cost of aligning the model on L; the minimal cost also normalizes the value between 0 and 1. Generalization refers to how frequently each event in the model is visited when reproducing the event log. It indicates incorrect behavior if an event is visited more often than anticipated, and the generalization is considered poor if some events of the model are visited very infrequently. The computation of the generalization is given in Eq. (10).

User similarity based on behavior process models for personalization
Studies on personalized activity recognition have always strived to solve the cold start problem. It is common to all activity recognition systems that performance on data acquired from a new user does not generalize to the same extent, due to the small amount of labeled data. To cope with this problem, researchers use a calibration approach, which maps the characteristics of the new user's data onto the pool of existing subjects; this has been shown to increase recognition performance. Existing studies map the characteristics of a new user based on gender, age, and physiognomies such as height and weight. These physiognomies fail to capture the variations introduced by differences in activity handling. In this regard, we measure the similarity of a new subject's annotated activity log with each of the existing behavior process models to find the most relevant mapping for personalized activity recognition. The similarity is computed using the formulation shown in Eq. (11), where α, β, and γ are user-defined weights for trace fitness, generalization, and the normalized number of activities, respectively, such that α + β + γ = 1. For computing the similarity between two subjects U_1 and U_2, we compute the trace fitness for the model generated from the activities of U_1 and the log of U_2. We normalize the number of activities for the two subjects with respect to the maximum number of activities available in either the log or the model. Once we find the process model having the maximum similarity from the existing pool of subjects, we apply the association rules to perform activity recognition.
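Assuming Eq. (11) is a convex combination of the three terms (and with an assumed min/max form for the normalized activity count, which the text only describes informally), the similarity can be sketched as:

```python
def similarity(tf, ge, na, alpha=0.4, beta=0.4, gamma=0.2):
    """Eq. (11) sketch: weighted similarity between a new user's log and an
    existing behavior process model.

    tf, ge, na are trace fitness, generalization, and the normalized number
    of activities (all in [0, 1]); the weights sum to 1 and default to the
    values used in the experiments. The convex-sum form is an assumption.
    """
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    return alpha * tf + beta * ge + gamma * na

def normalized_activities(n_log, n_model):
    """Assumed normalization: shared scale over the larger activity count."""
    return min(n_log, n_model) / max(n_log, n_model)

# e.g. 6 activities in the new user's 2-day log vs. 13 in the pooled model
sim = similarity(0.9, 0.8, normalized_activities(6, 13))
print(round(sim, 4))
```

Because trace fitness is computed from the model of one user against the log of the other, similarity(U1, U2) need not equal similarity(U2, U1), consistent with the asymmetric similarity matrix reported later.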

Experiments and results
In this section, we present the details of the datasets employed in our study and the challenges associated with them. The experimental results obtained for LLAR and HLAR are discussed in detail. Additionally, we present results on the effectiveness of the behavior process model generation and the personalization based on the proposed similarity metric discussed in the former section. For the experiments and analysis, we use two datasets: the daily lifelog dataset, which we refer to as dataset1, and the activity recognition in the home dataset, which we refer to as dataset2. One of the most widely used activity recognition datasets is Opportunity [64]; however, we do not consider it for the following reasons:

• The recognition accuracy has already surpassed 98% [13].
• The dataset was acquired in a controlled environment; therefore, personalization would have little to no effect.
• This study focuses on the activity handling problem, which cannot be addressed through this dataset due to the limited variations in performing activities.
Dataset1 has been made publicly available by the University of Mannheim [17]. It consists of data from 7 subjects (age 23.1 ± 1.81), each recorded for 12 h/day on average over 2 weeks. There are seven low-level actions, i.e. climbing (up), climbing (down), sitting, standing, walking, lying, and running. Additionally, this dataset offers 13 high-level activities, namely deskwork, eating/drinking, housework, meal preparation, movement, personal grooming, relaxing, shopping, socializing, sport, transportation, sleeping, and taking medication. The subjects self-recorded their current location, low-level action, and high-level activity. They were only given an initial guideline and information on the pre-defined labels, but they were not supervised while performing activities. Due to this lack of supervision, subjects performed the activities differently from one another, which introduces the activity handling problem. Existing studies that use physiognomies might fail to perform well on this dataset, as all subjects are male with a similar age range and physical appearance.
Dataset2 has been made publicly available by the Massachusetts Institute of Technology [23]. It consists of data from 2 subjects (ages 30 and 80), each recorded for 16 days. This dataset comprises sensor activations from sensors fitted to different appliances, furniture, and containers used in everyday life activities. The first subject performs 13 activities, i.e. cleaning, doing laundry, preparing a beverage, grooming, dressing, going out to work, bathing, preparing a snack, washing dishes, preparing breakfast, toileting, preparing lunch, and preparing dinner. The second subject performs 9 activities, which include watching TV, preparing a snack, washing dishes, preparing breakfast, toileting, taking medication, listening to music, preparing lunch, and preparing dinner. As this dataset contrasts strongly with the daily lifelog dataset, it demonstrates the ability of our proposed method to generalize across datasets with different characteristics.

Low-level action recognition results
We perform the analysis of LLAR on dataset1 only, as dataset2 does not provide information on low-level actions. As explained in the former section, we adopted an existing method [6] with a single base classifier for recognizing low-level actions, along with an LSTM network for a fair comparison. We used the following model parameters for the LSTM: the network comprises 2 layers with 256 units each, with a drop-out value of 0.5 for both hidden layers. We used the ADAM optimizer [65], with a learning rate of 0.001 and a decay of 0.005. As the number of instances for the low-level actions is highly imbalanced, we compute not only the accuracy but also the F1-scores, which are commonly used for evaluating action recognition methods. We performed a leave-one-subject-out (LOSO) analysis, since the low-level actions do not exhibit the activity handling problem in this dataset. Table 5 presents the accuracies for each action using the different base classifiers and the LSTM network, and Fig. 5 shows the F1-scores for each action obtained using the respective methods. It is quite apparent that the adopted method using only an adaptive boosting (AdaBoost) classifier performs better than the LSTM network. The graph shows that the variations in accuracy are uniform; for instance, the actions "lying" and "sitting" perform comparatively better than the other actions, which is expected due to their non-confusing characteristics. We observe that the "standing" action is sometimes confused with the "walking" action; similarly, "running" is also confused with "walking". As the self-annotation was not supervised, we assume that the users stopped moving or stood still at times without annotating it as "standing"; in action recognition, these are called transitional actions, such as stand-walk-stand and stand-sit-stand. Another reason for the lower accuracy of the "climbing (up)" and "running" actions is that they have fewer observations than the other actions. In real life, people do not run or climb stairs very often, which makes this dataset more challenging in terms of low-level action recognition as well.
Although the accuracies of all the methods are relatively close, it is necessary to select the classifier with the best recognition performance, as our HLAR depends on accurate classification of low-level actions. A good classifier must also recognize not just a certain subset but all the actions accurately. In comparison to ECOC (AB), the LSTM's performance for the walking and running actions is weaker; its performance for the running action is even lower than that of the other two classifiers. We consider the average accuracy of 92.70% promising given the above observations. Moreover, the testing times for ECOC using AdaBoost and for the LSTM network are approximately 0.2 s and 0.48 s, respectively, showing that ECOC with AdaBoost is better in terms of execution time as well as recognition performance for LLAR.

High-level activity recognition results
To evaluate the personalized HLAR results using our proposed method, we use leave-one-day-out (LODO) validation: a single day of each subject is used for testing, while the remaining days are used for the computation of CARs. We repeat this experiment as many times as the number of days for each subject and report the average results. For computing car(ha_r), we set the values of Conf_min and Supp_min to 60% and 20%, respectively. In order to make a fair comparison with existing methods, we tested several discriminative classifiers such as decision trees, support vector machines, and Naïve Bayes. We also compared our method with hidden Markov models (HMM) [66], as they take into account the temporal relationships between activities and have been used extensively in activity recognition. We combined the features extracted from the Acc and Gyr sensor measurements with the low-level action label, location, and time code to train the discriminative and generative classifiers for dataset1. Similarly, we used the sensor activations along with the time code to train the classifiers for dataset2.
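The LODO protocol itself is straightforward to sketch (the day labels below are placeholders, not from the dataset):

```python
def leave_one_day_out(days):
    """LODO splits: each day in turn is the test set; the remaining days
    are used to mine the CARs. Yields (train_days, test_day) pairs."""
    for i, test_day in enumerate(days):
        yield days[:i] + days[i + 1:], test_day

days = ["day1", "day2", "day3"]
splits = list(leave_one_day_out(days))
print(len(splits), splits[0])
```

The reported numbers are then the average over all such splits per subject.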
Although the location information is available in dataset2, it is redundant with the sensor activations, as the location indicates the placement of the sensors rather than the presence of the user. We report the accuracies in Table 6 using HMM, random forest (RF), ECOC with AdaBoost as the base classifier, and an LSTM network with the same configuration used for LLAR, except that we added one more hidden layer with a drop-out ratio of 0.2. We report the accuracies of these methods because they achieved comparatively better accuracy than the other discriminative classification methods. We also present the average precision of each activity for each subject using the proposed AL method in Table 7.
The precision values reflect the false-positive rate (FPR): the FPR is low when the precision is high, and vice versa. Overall, dataset1 has many instances of the activities 'DeskWork' and 'Movement', which is reflected by the precision values for these two activities, whereas 'TakingMedication', 'Sleeping', and 'Shopping' occur less frequently than the other activities. The advantage of the proposed AL is that even for activities such as 'TakingMedication', 'Sports', and 'Sleeping', which occur only two or three times over 12 days of recording, it can generate the corresponding rules and classify the activities with considerable precision. The only instance where the proposed AL failed to recognize the 'Sleeping' activity was for subject 1, as the activity occurred in only two instances and was performed on a train during transportation; the frequency of the posture for the 'Sleeping' activity was the same as for the 'Transportation' activity, and therefore the rule was eliminated. The 'HouseWork' activity was often confused with 'MealPreparation' and 'PersonalGrooming'; the 'Eating/Drinking' activity was confused with 'Relaxing' and 'DeskWork' when performed at location 'Home'; and the 'Socializing' activity was confused with 'Movement', 'Sports', and 'Shopping' when performed at location 'Street/Road/Pasture'. The precision values often resulted in "0" for some of the less frequent activities when using the other classifiers.
Precision is a useful metric but not a complete one, as it does not reflect the ratio of correctly predicted positives to the actual positives, i.e. it ignores false negatives. In this regard, we show the F1-scores for each subject in Fig. 6, which account for both false positives and false negatives.
We present the day-wise accuracies for both subjects from dataset2 in Table 8. Unlike the results from dataset1, ECOC (AB) achieves the second-best results, outperforming the LSTM network; one disadvantage of using LSTM on dataset2 is the smaller number of observations compared to dataset1. Although ECOC (AB) yields better results on some days, the results from associative learning are the best among all classifiers for most days and for the overall average accuracy. Similar to dataset1, we provide the precision values for each activity performed by each subject on dataset2 in Table 9. Most of the instances for both subjects belong to the activity 'Toileting', which is reflected by its high precision. For subject 1, the activities 'Preparing Dinner', 'Washing Dishes', and 'Cleaning' have the fewest instances, which is apparent from their low precision; however, the proposed AL can still classify these activities, whereas other classifiers such as RF, HMM, and LSTM yield "0" precision for them. The three activities 'Preparing Breakfast', 'Preparing Lunch', and 'Preparing Dinner' are mostly confused with one another due to similar sensor activations and frequencies. Subject 2 has more uniformly distributed instances across all activities except 'Toileting', which results in high precision values for most activities. In addition to the precision values, we show the F1-scores for each subject in Fig. 7. It is evident from the results that the proposed method achieves the best results not only for the data from inertial measurement units but also for the sensor activations, which shows the generalizing ability of CARs across different data modalities. The results also show that classifiers assuming the data to be independent and identically distributed (IID) do not perform well, as they do not take into account the relationship between the activities being performed and a certain context.
The proposed AL classifier models the activity-context relationship well, which results in better recognition performance. For dataset1, the accuracies are not as high, and one of the major reasons is the inhomogeneity of the activity labels: since we performed a LODO analysis, an activity label that occurs on the testing day may not be present in the training days, which results in misclassifications. However, the proposed AL is better at recognizing activities whose variations stem from the interaction with the contexts, i.e. time and location, as supported by the better accuracy, precision, and F1-scores.

Behavior process modeling of human activities
To evaluate the associative learning method for the behavior process modeling of human activities, we divide the data for each subject into two parts. We generate a reference model from the first 10 days and 14 days of annotated activities for each subject from dataset1 and dataset2, respectively. We use the last 2 days of data from both datasets to predict the activities using RF, HMM, ECOC with AdaBoost, LSTM, and AL. We then compute the trace fitness and generalization parameters for each subject. To compute these parameters and generate process models from the activity logs, we used RapidMiner with the RapidProM extension [67]. An example of the process, generated using 80% activities and 80% paths, is provided in Additional file 1. We present the results in Tables 10 and 11, respectively. The results indicate that our proposed associative learning generates the best alignment of event logs, as reflected by the trace fitness values. We also observed that the predicted activity log generalizes quite well to the reference model, as indicated by the generalization parameter.
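Trace fitness here is computed by RapidProM via alignment of the predicted event log against the reference process model. As a rough, self-contained proxy for the idea, one can score a predicted activity trace against a reference trace with a normalized edit distance; the sketch below is only illustrative and is not the alignment-based measure the toolchain uses.

```python
def trace_fitness(reference, predicted):
    """Simplified proxy for trace fitness: 1 minus the normalized
    Levenshtein distance between reference and predicted activity traces.
    RapidProM computes alignment-based fitness against a process model;
    this edit-distance proxy only illustrates the concept.
    """
    m, n = len(reference), len(predicted)
    # classic dynamic-programming edit distance
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == predicted[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + cost) # substitution
    return 1.0 - dist[m][n] / max(m, n, 1)

ref  = ["Sleep", "Toileting", "Preparing Breakfast", "Working"]
pred = ["Sleep", "Toileting", "Preparing Lunch", "Working"]
print(round(trace_fitness(ref, pred), 2))  # -> 0.75
```

A perfectly predicted trace yields a fitness of 1.0; each misclassified activity in the sequence lowers the score proportionally.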

Personalized human activity recognition using CAPHAR
The experiments for personalization have been performed on dataset1 only, due to the limited number of subjects in dataset2. For the cross-subject analysis, we use a semi-population calibration approach as suggested in [1]. We leave one subject out as the test user and take 2 days of the test user's data to compute the similarity metric against the pool of the remaining subjects. We then conduct activity recognition using RF, HMM, ECOC with AdaBoost, LSTM, and AL, and compute the accuracies and F1-scores on the remaining days of the test user's data. We build the similarity matrix shown in Table 12 to select the model closest to the test user. For the similarity computation, we set the values of α, β, and γ to 0.4, 0.4, and 0.2, respectively. We highlight the best similarity of each test user with the others in boldface. It should be noted that the similarity matrix is not symmetric, i.e., Sim(U1, U2) ≠ Sim(U2, U1). The reason for the asymmetry is mostly the variation in the number of activities. For instance, subject 1 might perform only 6 annotated activities in his first 2 days, while subject 2 has a total of 13 activities; conversely, subject 2 may perform 6 activities in his first 2 days while subject 1 has a total of 12 activities. This also affects the TF and Ge values, resulting in an asymmetric similarity matrix. We observe that subject 3 handles the activities differently from the other users, which leads to low similarity values. We assume that the reason for subject 5 having low similarity values is the smaller amount (days) of data and the smaller number of activities compared to the other subjects. Based on the similarity values, we perform activity recognition on the remaining days of data for each test user to evaluate the HLAR performance. The accuracies and F1-scores for each subject are shown in Table 13 and Fig. 8, respectively.
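With α = 0.4, β = 0.4, and γ = 0.2, the similarity can be sketched as a weighted sum of trace fitness (TF), generalization (Ge), and an activity-count term. The exact form of the third term below (a min/max count ratio) is our assumption for illustration; the asymmetry discussed above arises because TF and Ge depend on whose 2-day log is replayed against whose model.

```python
def user_similarity(tf, ge, n_test_acts, n_cand_acts,
                    alpha=0.4, beta=0.4, gamma=0.2):
    """Weighted similarity between a test user and a candidate model.

    tf, ge: trace fitness and generalization of the candidate's process
    model replayed on the test user's 2-day log (both in [0, 1]).
    The activity-count term (min/max ratio) is an illustrative assumption.
    """
    act_ratio = min(n_test_acts, n_cand_acts) / max(n_test_acts, n_cand_acts)
    return alpha * tf + beta * ge + gamma * act_ratio

# Asymmetry: swapping the roles of the two users changes tf and ge,
# so Sim(U1, U2) generally differs from Sim(U2, U1).
s12 = user_similarity(tf=0.82, ge=0.75, n_test_acts=6, n_cand_acts=13)
s21 = user_similarity(tf=0.70, ge=0.78, n_test_acts=6, n_cand_acts=12)
```

The candidate with the highest similarity to the test user supplies the personalized model, which is how the cold start problem for a new user is mitigated.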
The results show the weakness of conventional discriminative and generative learning algorithms. Although performance degrades when a model is fitted to a new subject, which is common in the activity recognition field, associative learning still achieves better results by a good margin. The highest accuracy was achieved for subject 7, whereas the lowest accuracy was recorded for subject 5, which we attribute to the same cause as its low similarity with all other subjects in the similarity matrix. The graph in Fig. 8 demonstrates the effectiveness of associative learning in comparison to the other classification algorithms. Due to its capability to model the relationship between the actions/activations and the interaction with contexts, AL can cope with the activity handling problem. The strength of AL can be judged by comparing its performance with HMM and LSTM, as both techniques use sequential modeling of the activities. However, the sequence of actions alone is not enough to enhance the performance; the frequency with which actions are performed or interacted with in the corresponding context should also be taken into account, as is done in AL.

Comparison with State-of-the-Art methods
As stated, most existing works have used the Opportunity dataset to prove their effectiveness; however, the Opportunity dataset has fewer subjects and high-level activities, does not exhibit the activity handling problem, and is homogeneous in terms of activity labels. The proposed work deals with all of the above-mentioned problems. To prove its effectiveness, we implement several state-of-the-art methods from existing studies and apply them to dataset1 to perform a fair comparison. We used the implementations of DeepSense [68], Deep Residual Bidirectional LSTM (DRBLSTM) [69], DeepConvLSTM [3], and Associative Learning [4], respectively. We evaluated the methods with the leave-one-day-out (LODO) protocol as in the earlier experiments. We used the raw accelerometer and gyroscope readings along with the location and time-encoded feature vectors, as suggested in DeepSense, DRBLSTM, and DeepConvLSTM, instead of directly providing the low-level action label. The associative learning proposed in [4] used only a single context, so we apply the method first using the location context and then using the time-encoded values. The comparative results for each subject are shown in Table 14. The results show that the associative learning method in [4] works better with the location context than with the time-encoded values. The proposed method outperforms the existing ones on all subjects except 6 and 7, where the differences are quite small, i.e. 0.45 and 0.13, respectively. The proposed method also achieves better accuracy on average over all subjects in comparison to the existing works. A further advantage of the proposed method over the existing ones is the reduced number of parameters to optimize during training compared to the deep learning architectures.
We assume the reason for the better performance is the constrained search space and the pruning policy that our proposed method applies when generating association rules. This helps avoid overfitting to a certain activity label, which is important as the labels are not homogeneous and the activities are not always performed in a similar way. It is also worth noting that the execution time of the proposed method is 0.32 s, which is considerably faster than DeepSense, DRBLSTM, and DeepConvLSTM at 2.04, 1.45, and 1.73 s, respectively.
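The leave-one-day-out protocol used throughout these comparisons can be sketched as a simple split generator, assuming the data is indexed by day (names below are illustrative):

```python
def lodo_splits(days):
    """Leave-one-day-out: each day becomes the test set exactly once,
    with all remaining days used for training."""
    for i, test_day in enumerate(days):
        train_days = days[:i] + days[i + 1:]
        yield train_days, test_day

days = ["day1", "day2", "day3"]
splits = list(lodo_splits(days))
# 3 splits; the first trains on day2 and day3 and tests on day1
```

Because whole days are held out, an activity label that occurs only on the test day never appears in training, which is the label-inhomogeneity issue noted for dataset1.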

Conclusion
Recognizing low-level actions has been addressed by many existing studies, but detecting only basic actions such as standing, sitting, walking, running, and lying down is of limited benefit, as daily life activities are much more complex than basic actions. In this paper, we proposed CAPHAR, which uses an associative learning method based on class association rules with contextual information for personalized human activity recognition. The effectiveness of associative learning in dealing with the activity handling problem is evident from the results shown in Table 14. The proposed method not only achieves the best results but also shows the best trade-off between accuracy and execution time. The personalization effect in CAPHAR has been achieved by computing the similarity between existing and new users based on their behavior process models. We showed that leveraging association rules for classifying activities performs better than the conventional discriminative and generative classification methods. The results show that associative learning can achieve 18.16%, 16.28%, 8.93%, and 6.00% better accuracy than RF, HMM, ECOC with AdaBoost, and LSTM on dataset1, respectively. This shows that CAPHAR can not only achieve better personalization results but also cope with the activity handling problem, as users do not perform activities in the same manner. We also showed that associative learning performs better on both datasets despite their different characteristics and modalities, i.e., low-level actions and sensor activations. We also provided results for the behavior process modeling of individual user activities and proposed a solution to the cold start problem using the proposed similarity metric for personalization.
The results for the behavior process modeling of individual users show that associative learning can predict the activities quite well, resulting in better trace fitness and generalization values in comparison to the other classification approaches. This paves the way for applying CAPHAR in the more complex field of human behavior modeling.
Furthermore, we trained activity models on different subjects and used these models to predict activities for a new user with only a small amount of annotated activity data. The mapping of a new user was performed using the similarity metric based on trace fitness, generalization, and the average number of activities. Experimental results show that the associative learning method achieves 23.73%, 21.86%, 16.76%, and 16.13% better accuracy than RF, HMM, ECOC with AdaBoost, and LSTM on dataset1, respectively. The results show the effectiveness of our personalized human activity recognition when provided with a new user having only a small amount of annotated activity data. We have also shown the qualitative F1-score results for LLAR, HLAR, and HLAR with the semi-population calibration approach.
In this paper, we have shown a different way of recognizing activities by leveraging association rules over LLA/SA with contextual information. Although the reported results compare favorably and show promise in coping with the activity handling problem, there is still room for improving activity recognition performance. A limitation of this study is that it uses at least 2 days of the test user's data to generate a process model for mapping. In real-life situations, an online learning approach may be required to deal with this issue, so that the model is updated regularly by asking the user for activity labels. Currently, we have applied the associative learning method only on two datasets that provide low-level action/sensor activation data along with location and time information. We also intend to record a large database with more fine-grained activities to leap forward in terms of realization. The extension of the dataset will also allow us to integrate our similarity measure with demographic features to understand the various aspects of human activity and gain useful insights into variations in activity handling. Furthermore, the associative learning method can be used for large-scale activity recognition from videos by considering the associations of contexts such as objects and scene (background) rather than the co-occurrences of spatio-temporal features. It will be interesting to see the association of the action with respect to a certain context and to compare it with the