Sensor Feature Selection and Combination for Stress Identification Using Combinatorial Fusion

The identification of stress under certain driving conditions is an important issue for safety, security and health. Sensors and systems have been deployed as wearable devices for drivers. Features are extracted from the collected data and combined to predict symptoms. The challenge is to select the feature set most relevant to stress. In this paper, we propose a feature selection method based on the performance of individual features and the diversity between pairs of features. The selected feature sets are then combined using combinatorial fusion. We also compare our results with other combination methods such as naïve Bayes, support vector machine, C4.5, linear discriminant function (LDF), and k-nearest neighbour (kNN). Our experimental results demonstrate that combinatorial fusion is an efficient approach for feature selection and feature combination, and that it can improve the stress recognition rate.


Introduction
In a recent survey by the American Psychological Association [2], more than half of the Americans surveyed indicated that stress is a major cause of personal health problems. In addition, more than 94% of these adults believe that stress is an essential factor in the development of illnesses such as depression, heart disease and obesity. Stress can also trigger heart attacks, arrhythmias and sudden death. It is therefore important to track and understand an individual's stress patterns by detecting his or her stress level continuously and efficiently. Doing so provides physicians with more reliable data and information with which to intervene and help reduce stress when necessary.
In recent years, identifying human stress using multiple physiological sensors has received much attention as a research topic. Existing studies ([7], [8], [12], [3], [19], [21], [4]) have shown that psychosocial stress can be recognized from a person's physiological information. This information can be acquired by biological or physiological sensors, which usually include ECG (electrocardiogram), GSR (galvanic skin response), EMG (electromyogram) and RESP (respiration) sensors.
In the process of analysing and interpreting the data, features are first extracted from the raw physiological sensor data using feature extraction methods. The most sensitive and relevant features are then selected using feature selection heuristics. Next, a feature fusion procedure is applied to the selected features to identify the stress level; many feature fusion heuristics may be used in this procedure, and each may lead to different decisions. Finally, the predicted stress level is obtained from the feature fusion results (see Figure 1). In this paper, we continue our previous work [5,6] and focus on three main issues: (1) how to select the most relevant features based on individual performance and pairwise diversity, (2) how to combine features in order to detect the stress level accurately, and (3) how the combinatorial fusion method compares with conventional feature combination methods.
The paper is organized as follows. Section 2 briefly reviews related work. Section 3 introduces feature selection using performance and cognitive diversity. Section 4 presents the use of combinatorial fusion analysis (CFA) to fuse the features. Section 5 summarizes and discusses our experimental results. Finally, Section 6 concludes the paper and suggests future work.

Related work
In the experiment conducted by Healey and Picard [8], five wearable sensors were used: an ECG sensor, an EMG sensor, a hand GSR sensor, a foot GSR sensor and a RESP sensor. Drivers wearing these sensors drove in and around downtown Boston along a predetermined route and went through three kinds of driving conditions, classified as "rest", "highway" and "city". The sensor signals were recorded and analysed. A total of 22 features were extracted, and feature combination was performed with a linear discriminant function (LDF) to predict the driver's stress level [7].
The data acquired in Healey and Picard's experiment have been partially published on the PHYSIONET website [14], which enables other researchers to explore and study stress detection. Akbas [1] presented an evaluation based on the PHYSIONET driver dataset [14]. In his work, only 10 of the 16 recordings were used; the remaining six were not evaluated because their sensor information was incomplete. Zhang et al. [24] proposed a systematic approach that uses a Bayesian network to combine sensor features.
Combinatorial fusion ([9], [10], [11]) provides a useful information fusion method and metric for analysing the combination and fusion of multiple scoring systems (MSS). It has been used in many application domains, such as video target tracking [14], the virtual screening of molecular compound libraries [23], protein structure prediction [13] and online learning algorithms [15].
Let D = {d1, d2, …, dn} be a set of n candidates, such as sensor data, genes, documents, images, locations or classes. Let A be a scoring system with a score function from D to the set of real numbers R. In the framework of combinatorial fusion, each scoring system A consists of two functions: a score function sA: D → R and a rank function rA: D → N, where N = {1, 2, …, n}, derived from sA by sorting the values of sA in descending order. The Rank-Score Characteristic (RSC) function fA: N → R is defined as [11]
$$f_A(i) = (s_A \circ r_A^{-1})(i) = s_A\big(r_A^{-1}(i)\big), \qquad i \in N.$$
For a set of p scoring systems A1, A2, …, Ap on D, at least two different approaches can be used to combine them [10,11]: Score Combination (SC) and Rank Combination (RC), defined as
$$s_{SC}(d_i) = \frac{1}{p}\sum_{j=1}^{p} s_{A_j}(d_i) \qquad (1)$$
$$s_{RC}(d_i) = \frac{1}{p}\sum_{j=1}^{p} r_{A_j}(d_i) \qquad (2)$$
where di is in D, and sAj and rAj are the score function and rank function of Aj, from D to R and N respectively.
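To make these definitions concrete, the following Python sketch computes a rank function, an RSC function and the score and rank combinations on a toy example. It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, ties in the rank function are broken by candidate name, and both combinations are taken as averages (sums would give the same ordering).

```python
from typing import Dict, List


def rank_function(scores: Dict[str, float]) -> Dict[str, int]:
    """r_A: rank 1 goes to the highest score (ties broken by candidate name, an assumption)."""
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return {d: i + 1 for i, d in enumerate(ordered)}


def rsc_function(scores: Dict[str, float]) -> List[float]:
    """f_A(i) = s_A(r_A^{-1}(i)) for i = 1..n, returned as a list indexed by i - 1."""
    ranks = rank_function(scores)
    inverse = {r: d for d, r in ranks.items()}
    return [scores[inverse[i]] for i in range(1, len(scores) + 1)]


def score_combination(systems: List[Dict[str, float]]) -> Dict[str, float]:
    """Score combination: mean score over the p systems (higher is better)."""
    p = len(systems)
    return {d: sum(s[d] for s in systems) / p for d in systems[0]}


def rank_combination(systems: List[Dict[str, float]]) -> Dict[str, float]:
    """Rank combination: mean rank over the p systems (lower is better)."""
    p = len(systems)
    ranks = [rank_function(s) for s in systems]
    return {d: sum(r[d] for r in ranks) / p for d in systems[0]}


# Toy example: two scoring systems over four candidates.
A = {"d1": 0.9, "d2": 0.4, "d3": 0.7, "d4": 0.1}
B = {"d1": 0.2, "d2": 0.8, "d3": 0.6, "d4": 0.3}
sc = score_combination([A, B])
rc = rank_combination([A, B])
sc_order = sorted(sc, key=lambda d: -sc[d])  # SC is sorted in decreasing order
rc_order = sorted(rc, key=lambda d: rc[d])   # RC is sorted in increasing order
```

Note that the SC and RC orderings can disagree, which is why the paper evaluates both combinations for every feature subset.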
For a pair of scoring systems A and B, the diversity between A and B, d(A, B), can be defined as d(sA, sB), d(rA, rB) or d(fA, fB), using the score functions, rank functions or rank-score characteristic (RSC) functions, respectively. In this paper we use the last of these and call it cognitive diversity.
3. Sensor feature selection using combinatorial fusion

Feature extraction
We use the same feature extraction method as Healey [7,8] and extract 22 features from the driver stress dataset; Table 1 gives a detailed description. Of the 17 driver datasets in PHYSIONET [16], ten can be used. Seven of them (drivers 6, 7, 8, 10, 11, 12 and 15) are complete: they include all the sensor information and have clearly marked driving periods. Three (drivers 5, 9 and 16) are partially complete but can still be used in the experiment: Driver05's first highway period lacks heart rate information; Driver09's second city period is shorter than five minutes and its last rest period is not clearly marked; and Driver16's second city period and last rest period are both shorter than five minutes. The remaining seven datasets (drivers 1, 2, 3, 4, 13, 14 and 17) do not contain all the sensor information, and the boundaries between their driving periods are not clearly marked. From the complete portions of the sensor information, we obtained 65 segments, each with 22 features.
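The segment count can be checked with a short tally. The sketch assumes, as an inference rather than a fact stated in the dataset description, that each complete drive contributes two rest, two highway and three city periods, and that rest, highway and city correspond to low, medium and high stress. Under these assumptions the ten usable drivers give exactly the 65 segments reported, split 18 / 19 / 28, which also matches the rank bands used in the leave-one-out evaluation.

```python
# Assumed period structure of one complete drive (inferred, not stated in the dataset):
# two rest periods (low stress), two highway periods (medium) and three city periods (high).
PERIODS_PER_DRIVE = {"rest": 2, "highway": 2, "city": 3}
USABLE_DRIVERS = [5, 6, 7, 8, 9, 10, 11, 12, 15, 16]

# Periods dropped from the partially complete drives, as described in the text.
DROPPED = {
    5: ["highway"],        # first highway period lacks heart rate data
    9: ["city", "rest"],   # second city period too short; last rest period unmarked
    16: ["city", "rest"],  # second city and last rest periods too short
}

counts = {kind: n * len(USABLE_DRIVERS) for kind, n in PERIODS_PER_DRIVE.items()}
for periods in DROPPED.values():
    for kind in periods:
        counts[kind] -= 1

total = sum(counts.values())
print(counts, total)  # {'rest': 18, 'highway': 19, 'city': 28} 65
```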

Feature selection using performance and cognitive diversity
Each feature extraction method is effectively a score assignment metric, so each feature generation system can be regarded as a scoring system. We can therefore use both the performance and the diversity of multiple scoring systems to select the most important features, namely those that yield better performance when combined. The general principles are: (a) features with relatively good individual performance often yield better combination performance, and (b) features with higher mutual diversity often yield better combination performance. The aim of our feature selection is therefore to find features with relatively good performance as well as relatively high diversity.

Sorted performance of single features
We assume that each feature either increases (accordance) or decreases (discordance) with the stress level. Figure 4 shows the final performance of each feature, sorted in decreasing order: feature F has the highest correct rate, 76.92%, and feature U has the lowest, 33.85%.
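The procedure for scoring a single feature is not spelled out, so the sketch below shows one plausible reading, offered as an assumption. The segments are ranked by feature value, the top 28 are labelled high stress, the next 19 medium and the last 18 low (the same bands used in the leave-one-out evaluation), and both orientations, accordance and discordance, are tried with the better one kept. Because there are 65 segments, every correct rate is a multiple of 1/65; for example, 76.92% corresponds to 50 of 65 and 33.85% to 22 of 65. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

LEVELS = ("high", "medium", "low")
CUTS = (28, 19, 18)  # assumed band sizes: 28 high, 19 medium, 18 low (65 in total)


def predict_by_rank(values: Sequence[float], cuts: Tuple[int, ...] = CUTS) -> List[str]:
    """Sort segments by value (descending) and assign stress levels by rank band."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    labels = [""] * len(values)
    pos = 0
    for level, size in zip(LEVELS, cuts):
        for i in order[pos:pos + size]:
            labels[i] = level
        pos += size
    return labels


def feature_performance(values: Sequence[float], truth: Sequence[str],
                        cuts: Tuple[int, ...] = CUTS) -> float:
    """Correct rate of one feature, trying both accordance (value rises with stress)
    and discordance (value falls with stress) and keeping the better of the two."""
    best = 0.0
    for sign in (1, -1):
        pred = predict_by_rank([sign * v for v in values], cuts)
        rate = sum(p == t for p, t in zip(pred, truth)) / len(truth)
        best = max(best, rate)
    return best
```

For instance, a discordant feature such as one whose value falls as stress rises still scores well, because the negated orientation is also evaluated.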

Cognitive diversity between features
The diversity of a feature pair {F1, F2} is calculated as in Equation 3, where f1 and f2 are the rank-score characteristic functions of F1 and F2, respectively. Both f1 and f2 are defined on the ranks 1 to n, so each has n score values.
The diversity of a feature set S = {F1, F2, …, Fk} is calculated as in Equation 4, and the diversity between two feature sets S1 and S2 is calculated as in Equation 5, where |S1| and |S2| are the cardinalities of S1 and S2.
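Because Equations 3 to 5 are not reproduced here, the sketch below uses assumed forms, and the exact forms in the paper may differ. The diversity of a pair is taken as the root-mean-square difference between the two RSC functions; the diversity of a set is the mean pairwise diversity within it; and the diversity between two sets is the mean over all cross pairs, normalised by |S1||S2| as the text suggests. RSC functions should be computed on normalised scores so that different features are comparable. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cognitive_diversity(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Assumed form of Equation 3: RMS gap between two RSC functions on ranks 1..n."""
    n = len(f1)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)) / n)


def set_diversity(rsc: Dict[str, List[float]], S: Sequence[str]) -> float:
    """Assumed form of Equation 4: mean diversity over all pairs within S."""
    pairs = list(combinations(S, 2))
    return sum(cognitive_diversity(rsc[a], rsc[b]) for a, b in pairs) / len(pairs)


def between_set_diversity(rsc: Dict[str, List[float]],
                          S1: Sequence[str], S2: Sequence[str]) -> float:
    """Assumed form of Equation 5: mean diversity over the |S1| * |S2| cross pairs."""
    total = sum(cognitive_diversity(rsc[a], rsc[b]) for a in S1 for b in S2)
    return total / (len(S1) * len(S2))


# Toy RSC functions (already normalised) for three features on n = 3 ranks.
rsc = {"X": [1.0, 0.5, 0.0], "Y": [1.0, 1.0, 0.0], "Z": [1.0, 0.0, 0.0]}
```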

Feature selection algorithm (FSA)
Our algorithm for feature selection based on combinatorial fusion is as follows:
1. Performance analysis:
   a) calculate the performance of each feature;
   b) sort the features by performance in decreasing order.
2. Diversity analysis:
   a) divide the features into groups according to sensor type;
   b) calculate the average performance and average diversity of each feature group;
   c) calculate the inter-group diversity between each pair of features drawn from different groups.
3. Selection based on performance and diversity:
   a) select a feature set of m features (m = 5, 7, 9, 11 in our experiment) with both high performance and high diversity;
   b) repeat step 3(a) to generate p different feature sets (p = 4 for each value of m in our experiment) for further feature combination and evaluation.
Figure 6 presents the modalities, performance and diversities of the features. We divided the 22 features into four modalities according to the corresponding sensor types: (1) EMG features, (2) RESP features, (3) GSR features and (4) HR features. Within each modality, the features are listed in decreasing order of performance, and bold font denotes the top 13 features. The number on the link between two nodes is the diversity between the two corresponding feature sets, and the number on the curve within a node is the average diversity of that feature set. We assume that the greater the diversity between two modalities, the more likely their combination is to perform well; likewise, features with better individual performance are more likely to yield better combination performance.
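Step 3(a) is described only qualitatively. A minimal greedy reading of it, which is purely hypothetical and not necessarily how the sets in Table 2 were chosen, restricts candidates to the top-k performers (k = 13 in Figure 6), starts from a seed feature, and repeatedly adds the candidate with the highest mean diversity to the features already chosen, breaking ties by individual performance. Running it from different seeds is one way to obtain the p different sets of step 3(b).

```python
from typing import Callable, Dict, List


def select_feature_set(performance: Dict[str, float], m: int, top_k: int,
                       diversity: Callable[[str, str], float], seed: str) -> List[str]:
    """Hypothetical greedy reading of step 3(a).

    Candidates are restricted to the top_k features by individual performance.
    Starting from `seed`, the candidate with the highest mean diversity to the
    features already chosen is added until m features are selected; ties are
    broken by individual performance.
    """
    pool = sorted(performance, key=lambda f: -performance[f])[:top_k]
    chosen = [seed]
    while len(chosen) < m:
        remaining = [f for f in pool if f not in chosen]
        if not remaining:
            raise ValueError("not enough candidate features for the requested m")
        best = max(remaining, key=lambda f: (
            sum(diversity(f, g) for g in chosen) / len(chosen), performance[f]))
        chosen.append(best)
    return chosen
```

The restriction to the top-k pool encodes principle (a), good individual performance, while the diversity-driven choice encodes principle (b).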

Feature selection results
Based on Figure 6, we selected features from among the top-13 performing features so that the diversity of each selected feature set is as high as possible. Using this feature selection method, we selected a total of four 5-feature sets (A), four 7-feature sets (C), four 9-feature sets (D) and four 11-feature sets (E). For comparison, we also randomly selected four 5-feature sets (B). The selected sets are shown in Table 2.

Leave-one-out based on combinatorial fusion
We use a leave-one-out procedure based on combinatorial fusion to evaluate the feature selection results. There are 65 groups of data in total, and for a 5-feature set each group contains five feature values. In each round of evaluation, one group is selected as the test case and the other 64 groups are used as training cases; the test case is then predicted from the training data. This is repeated 65 times so that each group serves as the test case exactly once, yielding 65 predictions, which are compared with the ground-truth stress levels to calculate the correct rate. First, each feature is regarded as a scoring system with 65 scores. For features N, C, P and S and V, we negate the scores so that, for every feature, a higher score indicates a higher stress level. The scores of each feature are then normalized to lie between 0 and 1, and sorting them in decreasing order gives every feature 65 score values and 65 rank values. Next, score combination and rank combination are performed separately on the features of the selected feature set, according to Equation 1 and Equation 2. Finally, the score combination results are sorted in decreasing order and the rank combination results in increasing order.
In the testing procedure, the score combination and rank combination of the test case are calculated in the same way and compared with the combination results of the training cases. For either score combination or rank combination, if the test case falls within the top 28 positions of the sorted sequence, its stress level is predicted as high; if it falls between the 29th and 47th positions, the stress level is medium; and if it falls within the last 18 positions, the stress level is low.
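The leave-one-out procedure can be sketched as follows, under assumptions the text leaves open: min-max normalisation over all 65 segments, averaging in Equations 1 and 2, ranks for rank combination computed over all segments, and ties resolved in the test case's favour. The function names and the `negate` flags are hypothetical. With 65 segments, the bands are positions 1 to 28 (high), 29 to 47 (medium) and 48 to 65 (low).

```python
import operator
from typing import List, Sequence, Tuple

CUTS = (28, 47)  # positions 1-28: high, 29-47: medium, beyond 47: low


def minmax(values: Sequence[float]) -> List[float]:
    """Min-max normalisation to [0, 1] (assumed normalisation method)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def ranks_desc(values: Sequence[float]) -> List[int]:
    """Rank 1 for the largest value (ties broken by index)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def level(pos: int, cuts: Tuple[int, int] = CUTS) -> str:
    """Map a 1-based position in the sorted sequence to a stress level."""
    return "high" if pos <= cuts[0] else "medium" if pos <= cuts[1] else "low"


def loo_correct_rate(features: List[Sequence[float]], negate: Sequence[bool],
                     truth: Sequence[str], mode: str = "score",
                     cuts: Tuple[int, int] = CUTS) -> float:
    """Leave-one-out correct rate of score ('score') or rank ('rank') combination."""
    # Orient each feature so that larger means more stress, then normalise.
    cols = [minmax([-v for v in f] if neg else list(f)) for f, neg in zip(features, negate)]
    n, p = len(truth), len(cols)
    if mode == "score":  # Equation 1: a higher combined score means more stress
        combined = [sum(c[i] for c in cols) / p for i in range(n)]
        better = operator.gt
    else:                # Equation 2: a lower combined rank means more stress
        ranked = [ranks_desc(c) for c in cols]
        combined = [sum(r[i] for r in ranked) / p for i in range(n)]
        better = operator.lt
    correct = 0
    for t in range(n):
        # Position of the held-out case among the n - 1 training cases.
        pos = 1 + sum(1 for i in range(n) if i != t and better(combined[i], combined[t]))
        correct += level(pos, cuts) == truth[t]
    return correct / n
```

In the paper's setting, `features` would hold the 65 values of each selected feature and `truth` the 65 known stress levels.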

Feature fusion results
4.2.1 Fusion results of 5-feature sets
Figures 7 and 8 present the correct rates of both score combination and rank combination for the 5-feature sets in (A) and (B) of Table 2: Figures 7(a) to 7(d) correspond to the four sets in (A), and Figures 8(a) to 8(d) to the four sets in (B). In Figure 7(a), the maximum correct rate is 83.08%, obtained by the score combination of features F and T. In Figure 7(b), the maximum correct rate is 86.15%, obtained by the rank combination of features E, D, O and P. In Figure 7(c), the maximum correct rate is 87.69%, obtained by the rank combination of features Q, D, L, T and E. In Figure 7(d), both the rank combination and the score combination of features E, T, L, A and P achieve the highest correct rate, 86.15%. In Figure 8(a), the maximum correct rate is 83.08%, obtained by the rank combination of features F and T. In Figure 8(b), the maximum correct rate is the individual performance of feature F (76.92%). In Figure 8(c), the maximum correct rate is 80%, obtained by the score combination of features T and D. In Figure 8(d), the maximum correct rate is 58.46%, obtained by the rank combination of features R and I.
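The overall performance of the 5-feature sets in (A) is much better than that of the 5-feature sets in (B): the maximum correct rates range from 83.08% to 87.69% for (A), compared with 58.46% to 83.08% for (B).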
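Figure 7 already mixes combinations of two, four and five features, so each feature set yields many candidate combinations. Assuming every subset of two or more features is evaluated with both score and rank combination (an assumption, since the text does not list the subsets), the count grows quickly with the set size t, as this short calculation shows. The helper name is hypothetical.

```python
from math import comb


def num_subsets(t: int, min_size: int = 2) -> int:
    """Number of feature subsets with at least `min_size` features drawn from t features."""
    return sum(comb(t, k) for k in range(min_size, t + 1))


# Each subset is assumed to be evaluated twice: once by SC and once by RC.
counts = {t: 2 * num_subsets(t) for t in (5, 7, 9, 11)}
print(counts)  # {5: 52, 7: 240, 9: 1004, 11: 4072}
```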

Fusion results of t-feature sets, t=7, 9, 11
Tables 3(a) to 3(c) show the combination results of the 7-feature sets, 9-feature sets and 11-feature sets, respectively. Since the number of possible combinations is large, we list only the best-performing combination for each feature set, together with its features.
Table 3. Combination results of t-feature sets (t = 7, 9, 11)

5. Fusion results comparison

Other feature fusion methods
To evaluate our feature fusion method, we use five other feature fusion algorithms for comparison: LDF (Linear Discriminant Function), Decision Tree C4.5, SVM (Support Vector Machine), NB (Naïve Bayes) and KNN (K-Nearest Neighbours).
In Healey's work [7], a linear discriminant function (Equation 6) was used to classify the stress levels. The C4.5 algorithm is one of the most popular and practical methods for inductive inference [17,18]; it builds a decision tree, using an information-entropy criterion to choose the attribute tested at each node. Pioneered by Vapnik [22], the Support Vector Machine (SVM) is a statistical learning algorithm whose basic idea is to find the optimal hyperplane that maximizes the margin between two groups of samples; the vectors nearest to this hyperplane are called support vectors. A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions [25]. KNN is an instance-based learning method in which the function is only approximated locally and all computation is deferred until classification [20]. Table 4 compares the correct rates obtained with the different feature fusion methods on the feature sets produced by the different feature selection methods. For combinatorial fusion, the correct rate reported for a feature set in this table is the maximum correct rate over all combinations of that feature set.
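To illustrate the instance-based character of KNN, the sketch below evaluates a plain k-nearest-neighbour classifier with a leave-one-out protocol like the one used for combinatorial fusion. The Euclidean distance, the value of k, the function names and the assumption that the baselines were evaluated by leave-one-out are all ours; the paper does not state the settings used for Table 4.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                x: Sequence[float], k: int = 3) -> str:
    """Classify x by majority vote among its k nearest training samples.
    No model is built in advance: all work happens at query time."""
    neighbours = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X: List[Sequence[float]], y: List[str], k: int = 3) -> float:
    """Leave-one-out correct rate: each sample is classified from all the others."""
    correct = 0
    for t in range(len(X)):
        train_X = X[:t] + X[t + 1:]
        train_y = y[:t] + y[t + 1:]
        correct += knn_predict(train_X, train_y, X[t], k) == y[t]
    return correct / len(X)
```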

Comparison of results
From Table 4, we can see that, for the feature sets selected by our method, combinatorial fusion performs better than the other five fusion algorithms: CFA achieves both a higher maximum correct rate and a higher average correct rate for the 5-feature sets (A), 7-feature sets (C), 9-feature sets (D) and 11-feature sets (E). Overall, CFA outperforms the other five fusion methods on these feature sets.
Table 4 also shows that, with combinatorial fusion, the maximum correct rate is 87.69% for the 5-feature, 7-feature, 9-feature and 11-feature sets alike. The correct rate therefore does not increase with the number of features: the 5-feature set selected by our feature selection method achieves the same highest performance as the 11-feature set. For the randomly selected 5-feature sets (B), the best performance is 86.15%, obtained by the C4.5 method on feature set B(5,2), while the highest performance of combinatorial fusion is 83.08%, on feature set B(5,1). However, averaged over the four (B) sets, combinatorial fusion achieves 74.62%, which is higher than 68.46%, the highest average on (B) among the other five fusion methods. So, on average, combinatorial fusion outperforms the other five fusion methods on the four randomly selected 5-feature sets (B).
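The 74.62% average can be reproduced from the maxima reported for Figures 8(a) to 8(d), where the Figure 8(b) value is feature F's individual rate of 76.92%. Because each correct rate is a count of correct predictions out of 65, the sketch works with counts (54, 50, 52 and 38, which are our back-calculation from the reported percentages) to avoid rounding error.

```python
# Correct predictions out of 65 behind the maximum CFA rates in Figures 8(a)-8(d)
# (back-calculated from the reported percentages).
correct_counts = [54, 50, 52, 38]
rates = [round(100 * c / 65, 2) for c in correct_counts]
average = round(100 * sum(correct_counts) / (65 * len(correct_counts)), 2)
print(rates, average)  # [83.08, 76.92, 80.0, 58.46] 74.62
```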

Conclusion and future work
In this paper, we demonstrated how to use combinatorial fusion to select features and to fuse physiological sensor information in order to determine drivers' stress levels. Our results show that combinatorial fusion is a good method for fusing sensor information, and that the correct rate is higher still when the features are selected by our feature selection method based on both performance and diversity.
The main contributions of this paper are: (1) we proposed a feature selection method based on both individual performance and pairwise diversity, (2) we used the combinatorial fusion method to fuse physiological sensor information to detect drivers' stress, and (3) we compared the combinatorial fusion method with five machine learning methods. Our work showed that combinatorial fusion can achieve better correct rates in several cases.
In the future, we will study decision level fusion based on combinatorial fusion. In addition, other kinds of feature selection and feature fusion methods will be investigated.

Acknowledgments
The work of Yong Deng was performed during his visiting year 2011-2012 at the School of Information Science and Technology, Pennsylvania State University. This research is supported by the State Key Program of National Natural Science Foundation of China (grant no. 61232005) and the National Key Technology R&D Program (no: 2012BAH06B01).