Cross-Task Mental Workload Recognition Based on EEG Tensor Representation and Transfer Learning

The accurate evaluation of the mental workload of operators in human-machine systems is of great significance for ensuring operator safety and correct task execution. However, the effectiveness of EEG-based cross-task mental workload evaluation is still unsatisfactory because EEG response patterns differ across tasks, which severely hinders its generalization to real scenarios. To solve this problem, this paper proposes a feature construction method based on EEG tensor representation and transfer learning, which was verified under various task conditions. Specifically, four working memory load tasks with different types of information were first designed, and the EEG signals of participants were collected synchronously during task execution. Then, the wavelet transform was used to perform time-frequency analysis of the multi-channel EEG signals, and three-way EEG tensor (time-frequency-channel) features were constructed. EEG tensor features from different tasks were transferred based on the criteria of feature distribution alignment and class-wise discrimination. Finally, a support vector machine was used to construct a 3-class mental workload recognition model. Results showed that, compared with classical feature extraction methods, the proposed method achieves higher accuracy in both within-task and cross-task mental workload evaluation (91.1% for within-task and 81.3% for cross-task). These results demonstrate that the EEG tensor representation and transfer learning method is feasible and effective for cross-task mental workload evaluation, and can provide a theoretical basis and application reference for future research.


I. INTRODUCTION
THE mental workload (MWL) state of operators directly influences task performance in human-machine systems. Excessive workload may degrade the operators' decision-making ability and therefore threaten their safety [1], [2]. Accurate evaluation of mental workload is thus of great significance for ensuring correct task execution and operator safety. At present, the commonly used mental workload evaluation methods include subjective measurement (e.g., NASA-TLX, SWAT [3]), task performance measurement (e.g., response time, accuracy) and physiological measurement (e.g., EEG, ECG) [2], [4]. Of these methods, physiological measurement has attracted the most attention from researchers because it can assess mental workload objectively and can be applied in real-time scenarios.
Among all the physiological signals, EEG is considered the most sensitive indicator of mental workload since it directly reflects the cognitive processes of the human brain. Various studies have confirmed that the linear features (e.g., Power Spectral Density (PSD) [5], [6]), nonlinear features (e.g., Sample Entropy [7]) and functional connectivity features (nodal and global properties) [8], [9], [10] of EEG signals can effectively distinguish the mental workload level of operators in specific tasks. However, when the same features are applied to evaluate mental workload in different tasks (i.e., cross-task mental workload evaluation), the evaluation results tend to be corrupted and the recognition accuracy usually decreases to chance level [11], [12], [13]. Many studies attribute the cross-task problem of mental workload assessment to differences in the brain's information processing mechanisms when operators perform different tasks. Specifically, different brain regions are involved in processing different information, and the response regularities of these regions also differ from each other, so the way EEG characteristics change with mental workload varies across tasks [12], [14], [15]. From the perspective of the statistical characteristics of the data, EEG features under different tasks usually have different probability distributions. However, classical machine learning methods assume that all data are independent and identically distributed, so these methods are unsuitable for cross-task scenarios.
Research on EEG-based cross-task mental workload evaluation can be divided into two categories, namely common feature selection-based methods [8], [9], [16] and transfer learning-based methods [17]. The common feature selection method obtains "common" indexes suitable for various tasks by selecting physiological indexes that change significantly with mental workload in all tasks, which can be realized by statistical test analysis [18], feature ranking strategies [9], [10], and so on. Ke et al. [18] explored the consistency of the EEG power spectrum and the task-independent auditory event-related potential with mental workload in two types of cognitive tasks (verbal N-Back and MATB II tasks) using a nonparametric statistical test. Kakko [9] and Dimitrakopoulos [10] proposed EEG feature selection methods based on a sequential forward search algorithm and a model-driven strategy, respectively. With the selected features, satisfactory recognition accuracy was achieved in cross-task mental workload recognition (N-Back and mental arithmetic tasks).
Transfer learning-based methods aim to find a feature transformation function or matrix such that EEG features under different tasks have the same or a similar probability distribution after transformation. By using the transferred features, the recognition accuracy of cross-task mental workload evaluation is expected to improve [19], [20]. Feature transfer learning has been widely used in EEG-based motor imagery brain-computer interfaces [19], emotion recognition [21], cognitive state detection [22], [23], [24], and other tasks. For cross-task mental workload evaluation, Zhou et al. [17] were the first to use classical transfer learning models such as Transfer Component Analysis (TCA) and Joint Distribution Adaptation (JDA) to evaluate cross-task mental workload in working memory and mathematical addition tasks, achieving higher recognition accuracy than non-transfer learning methods.
In addition, it should be noted that commonly used feature analysis methods usually consider only single-dimension information of EEG (such as the time or frequency dimension), and these features are processed as vectors in subsequent steps. Since multiple brain regions are involved and work collaboratively in a dynamic way when the brain processes information, the structural characteristics of the EEG signals across different dimensions (such as time, frequency, and channel) are ignored, which results in the loss of information useful for distinguishing mental workload levels [25], [26], [27]. Various studies have demonstrated that the multi-dimensional information of EEG signals is sensitive to the level of mental workload [6], [16], [28]. Therefore, it is necessary to explore a feature construction method that can simultaneously characterize the multi-mode information of EEG for accurate mental workload evaluation. A tensor is a representation of high-dimensional data that has been widely used in gait recognition [29], image recognition [30], and other fields. The intrinsic attributes of EEG data can be effectively retained by a tensorized representation [25], [26], [27]. In the field of brain-computer interfaces, Li [31] and Zhang [32] proposed EEG tensor feature construction methods for decoding subjects' motion intentions in motor imagery tasks. Dao [33] detected epileptic spikes based on EEG tensor features and tensor decomposition methods. These studies imply that an EEG feature construction method based on tensor representation might be feasible for mental workload evaluation.
Considering all the above, the tensor form of EEG features is expected to represent the dynamic response characteristics of the brain in the task state more comprehensively. Meanwhile, by applying a transfer learning procedure to the EEG tensor data, the distance between feature distributions under different tasks can be reduced, so the accuracy of cross-task mental workload evaluation might be improved. Therefore, this study proposes a cross-task mental workload evaluation method based on EEG tensor representation and feature transfer learning. Specifically, the time-frequency characteristics of multi-channel EEG under different information types and load levels were extracted to construct EEG feature tensors. A tensorized feature transfer learning method and the corresponding optimization method were then introduced and applied. Finally, based on the classical support vector machine (SVM), mental workload recognition models for both within-task and cross-task scenarios were constructed to verify the effectiveness of the proposed method.

A. Subjects and Experiment Protocol
Sixteen participants were recruited for the experiment (8 males and 8 females, with an average age of 25.6±2.4 years). All subjects signed informed consent before the experiment, and this study was approved by the Ethics Committee of Beihang University.
The N-Back paradigm was used to carry out the experiment. Four different information types (verbal, object, space (verbal), space (object)) were used in this study (Fig. 1). Three task loads were set by manipulating the number of items to be remembered in the N-Back tasks (N from 1 to 3). That is, each subject needed to complete a total of 12 mental workload tasks (4 task types × 3 difficulty levels), as shown in Fig. 2. Specifically, under each task condition, the subject needed to judge within a limited time (2 s in this study) whether the currently presented information was consistent with the information presented N trials earlier. Each task condition consisted of 43 trials (of which the first three were not used for subsequent analysis). During the tasks, the response time was recorded. After the tasks, the response accuracy was calculated and the participants were asked to complete the NASA-TLX scale. The detailed experimental procedure can also be found in our previous work [8].
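The matching rule of the N-Back task described above can be sketched in a few lines. This is only an illustration of the rule, not the stimulus software used in the study; the function name `nback_targets` and the letter sequence are hypothetical.

```python
def nback_targets(stimuli, n):
    """Return per-trial match labels for an N-Back sequence.

    The first n trials have no item n steps back, so they are labelled None.
    """
    labels = []
    for i, item in enumerate(stimuli):
        if i < n:
            labels.append(None)          # nothing to compare against yet
        else:
            labels.append(item == stimuli[i - n])
    return labels


# Hypothetical letter sequence for a verbal 2-Back block.
sequence = ["A", "B", "A", "C", "A", "C"]
print(nback_targets(sequence, 2))  # [None, None, True, False, True, True]
```

The unlabelled leading trials are one reason the first trials of a block are typically excluded from analysis, as is done with the first three trials here.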

B. Data Acquisition and Pre-Processing
The entire experiment was carried out in a sound-shielded room. The NeuroScan system (SynAmps2, USA) was used to record 60-channel EEG signals of the participants in real time at a sampling rate of 1000 Hz. During acquisition, the electrode impedance was kept below 10 kΩ.
The effective components in the original EEG signal are often submerged in noise and artifacts, which would seriously affect the analysis results. Therefore, band-pass filtering (0.5-40 Hz) was used to eliminate the DC component and high-frequency noise of the signal, and artifacts such as blinks, vertical/horizontal electrooculogram, and electromyography were removed by using the SASICA algorithm [34]. The signal from 0.5 s before stimulation to 2 s after stimulation was taken as a trial for subsequent feature analysis.
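The trial segmentation described above (0.5 s before to 2 s after each stimulus at 1000 Hz, i.e., 2500 samples per channel) can be sketched as follows. This is a minimal NumPy illustration; the function name `extract_epochs`, the random data, and the onset indices are assumptions, and filtering and artifact removal are not shown.

```python
import numpy as np

FS = 1000                   # sampling rate in Hz, as stated in the text
PRE_S, POST_S = 0.5, 2.0    # epoch window relative to stimulus onset, in s


def extract_epochs(eeg, onsets, fs=FS, pre=PRE_S, post=POST_S):
    """Cut stimulus-locked epochs from continuous EEG.

    eeg    : array of shape (n_channels, n_samples)
    onsets : stimulus onset positions as sample indices
    returns: array of shape (n_trials, n_channels, n_epoch_samples)
    """
    n_pre, n_post = int(round(pre * fs)), int(round(post * fs))
    epochs = []
    for onset in onsets:
        start, stop = onset - n_pre, onset + n_post
        if start < 0 or stop > eeg.shape[1]:
            continue  # skip epochs that would run past the recording
        epochs.append(eeg[:, start:stop])
    return np.stack(epochs)


# Placeholder data: 60 channels, 10 s of random "EEG", three made-up onsets.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((60, 10_000))
epochs = extract_epochs(eeg, onsets=[1000, 4000, 7000])
print(epochs.shape)  # (3, 60, 2500)
```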

C. EEG Tensor Representation
Considering the non-stationary and dynamic characteristics of EEG signals, and the close relationship of their temporal, frequency, and spatial information to mental workload, the time-frequency-space dimensions were selected to construct the EEG tensor feature.
The whole process of EEG feature tensor representation and mental workload recognition is shown in Fig. 3. Since the EEG components most sensitive to mental workload are mainly concentrated in the low-frequency band, a band-pass filter (1-20 Hz) was first applied, and time-frequency analysis was then carried out for each EEG channel of every trial based on the wavelet transform. The time-frequency data of all channels were combined to construct 3-way tensors (time-frequency-channel) as the task-state EEG features.
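A minimal sketch of how such a time-frequency-channel tensor could be assembled is given below. It assumes a complex Morlet wavelet with a fixed number of cycles, which is one common choice; the text does not state which wavelet or parameters were used, so `n_cycles`, the frequency grid, and the function names are assumptions.

```python
import numpy as np


def morlet_power(signal, fs, freqs, n_cycles=5.0):
    """Time-frequency power of one channel via complex Morlet convolution.

    returns: array of shape (n_times, n_freqs)
    """
    n = signal.size
    power = np.empty((n, len(freqs)))
    for j, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)            # temporal width in s
        t = np.arange(-3.0 * sigma_t, 3.0 * sigma_t, 1.0 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
        # Full convolution, then crop the centre so the output has n samples
        # even when the wavelet is longer than the signal.
        full = np.convolve(signal, wavelet, mode="full")
        start = (wavelet.size - 1) // 2
        power[:, j] = np.abs(full[start:start + n]) ** 2
    return power


def build_tensor(epoch, fs, freqs):
    """Stack per-channel maps into a (time, frequency, channel) tensor.

    epoch: array of shape (n_channels, n_times)
    """
    return np.stack([morlet_power(ch, fs, freqs) for ch in epoch], axis=-1)


# Placeholder epoch: 4 channels, 2.5 s at 200 Hz, an 8 Hz rhythm plus noise.
fs = 200
t = np.arange(500) / fs
rng = np.random.default_rng(0)
epoch = np.sin(2 * np.pi * 8 * t) + 0.1 * rng.standard_normal((4, 500))
freqs = np.arange(4, 21, 4)                  # 4, 8, 12, 16, 20 Hz
tensor = build_tensor(epoch, fs, freqs)
print(tensor.shape)  # (500, 5, 4)
```

Each trial then yields one 3-way array, and a data set is a stack of such arrays with one leading sample axis.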

D. Tensorized Transfer Learning
Classical transfer learning methods mainly consider the distribution difference of features. Although these methods can align the source domain and target domain samples in terms of probability distribution, the effectiveness of the subsequent classification task still needs to be considered. That is, both the consistency of the feature probability distributions and the classification discrimination ability must be taken into account during feature learning.
To fill this gap, the proposed tensorized feature transfer learning method considers distribution alignment and class-wise discrimination learning at the same time, and the cost function can be expressed as $L = L_{distribution} + \theta L_{discrimination}$, in which $L_{distribution}$ represents the distribution alignment loss and $L_{discrimination}$ denotes the class-wise discrimination loss. $\theta$ is the tuning factor that adjusts the weight of the two parts; when $\theta = 0$, the cost function degenerates to the tensorized joint domain distribution alignment method [35].
1) Distribution Alignment: Distribution alignment assumes that the features from the source and target domains have the same or a similar distribution in the subspace obtained by projection with the matrices $U = \{U^{(k)}\}_{k=1,\dots,K}$. Considering the large discrepancy in feature distribution among different tasks, a source domain transfer matrix $P = \{P^{(k)}\}_{k=1,\dots,K}$ was also introduced in this study [36]. Following the classical joint domain distribution alignment method, the proposed distribution alignment is composed of marginal and conditional distribution alignment parts, which can be represented as $L_{distribution} = (1 - \mu) L_{marginal} + \mu L_{conditional}$, in which $\mu \in [0, 1]$ is a tuning factor that regulates the weight of the two parts. The distance between feature distributions was measured by the Maximum Mean Discrepancy (MMD) criterion, and the distribution alignment loss is given by (1). In (1), $X_S$ and $X_T$ represent the tensor samples from the source and target domains, respectively, with $N_S$ and $N_T$ samples; in this study they correspond to the tensor features from different task conditions. $P$ and $U$ denote the source domain feature transfer matrix and the subspace projection matrix, respectively, and $[\![X_S; P]\!]$ denotes the mode product of the tensor $X_S$ with the matrices $P$ ($[\![X_S; P]\!] = X_S \times_1 P^{(1)} \times_2 P^{(2)} \cdots \times_K P^{(K)}$). $C$ is the number of classes (workload levels), and $N_S^{c}$ and $N_T^{c}$ are the numbers of samples in the source and target domains that belong to class $c$. $\mathrm{tr}(\cdot)$ is the trace of a matrix, and $k$ indexes the modes of the feature tensor.
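The mode product and the weighted MMD loss described above can be illustrated with a short NumPy sketch. It assumes a linear-kernel MMD, i.e., the squared distance between domain mean tensors, and assumes that target labels are available (in unsupervised transfer they are usually pseudo-labels). The exact form of Eq. (1) may differ, and all function names here are hypothetical.

```python
import numpy as np


def mode_product(tensor, matrix, mode):
    """Mode-k product: apply `matrix` (J x I_k) along axis `mode` of `tensor`."""
    moved = np.tensordot(matrix, tensor, axes=(1, mode))  # new axis comes first
    return np.moveaxis(moved, 0, mode)


def multi_mode_product(tensor, matrices):
    """[[X; P]] = X x_1 P1 x_2 P2 ... x_K PK for one sample tensor."""
    for k, m in enumerate(matrices):
        tensor = mode_product(tensor, m, k)
    return tensor


def marginal_mmd(source, target):
    """Squared distance between domain means (linear-kernel MMD).

    source, target: arrays of shape (n_samples, d1, ..., dK)
    """
    diff = source.mean(axis=0) - target.mean(axis=0)
    return float(np.sum(diff ** 2))


def conditional_mmd(source, ys, target, yt):
    """Sum of class-wise mean distances over classes present in both domains."""
    ys, yt = np.asarray(ys), np.asarray(yt)
    classes = np.intersect1d(ys, yt)
    return float(sum(marginal_mmd(source[ys == c], target[yt == c]) for c in classes))


def alignment_loss(source, ys, target, yt, mu):
    """(1 - mu) * L_marginal + mu * L_conditional."""
    return (1.0 - mu) * marginal_mmd(source, target) + mu * conditional_mmd(source, ys, target, yt)


# Placeholder data: 6 samples per domain, tensors of size 3 x 4 x 2, 3 classes.
rng = np.random.default_rng(0)
Xs = rng.standard_normal((6, 3, 4, 2))
Xt = rng.standard_normal((6, 3, 4, 2)) + 1.0   # shifted target domain
ys = yt = np.array([0, 0, 1, 1, 2, 2])
print(alignment_loss(Xs, ys, Xt, yt, mu=0.5))
```

Setting `mu=0` leaves only the marginal term and `mu=1` only the conditional term, matching the role of $\mu$ in the text.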
By unfolding (1) along mode $k$, $L_{distribution}$ can be expressed as in (2), in which $Q_0$ and $Q_1$ are the MMD matrices of the marginal and conditional distribution alignment, respectively.
2) Class-Wise Discrimination Learning: Although minimizing the distribution distance of tensor features can reduce the distribution discrepancy, it cannot guarantee that the learned features are able to distinguish workload levels. Therefore, the General Tensor Discriminant Analysis (GTDA) method [29], [31] was introduced to construct the objective function of the class-wise discrimination learning procedure after the distribution alignment. In this objective, $L_{between}$ and $L_{within}$ represent the inter- and intra-class margins, and $M_S$ and $M_T$ are the mean values of the data from the source and target domains. From (5), the class-wise discrimination learning cost function can be represented as (6), in which $B^{(k)}$ and $W^{(k)}$ are the inter- and intra-class scatter matrices of the mode-$k$ unfolded matrix and $\mathrm{mat}_{(k)}(\cdot)$ denotes the mode-$k$ matricization. The factor $\zeta$ is set as the maximal eigenvalue of $(W^{(k)})^{-1} B^{(k)}$.
3) Optimization: The objective of the proposed method is to learn the transfer matrices $U$ and $P$. With these two matrices, the EEG features from different tasks have similar distributions and class-wise discrimination properties. According to the cost functions of distribution alignment and class-wise discrimination learning in Eq. (2) and Eq. (6), the whole optimization objective can be arranged as (8). Since there are several parameters to be solved ($U$ and $P$), the Alternating Iteration Strategy was applied. In detail, the problem was decomposed into subproblems in which one parameter is solved while the other is fixed. These subproblems were optimized iteratively until convergence to obtain an approximate solution [37].
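For the class-wise discrimination term, the mode-$k$ scatter matrices and the factor $\zeta$ can be sketched as below. This follows the usual between-class and within-class scatter definitions applied to mode-$k$ unfolded tensors; the paper's exact definitions may differ in weighting, and the small ridge added before inversion is an assumption made here for numerical stability. Function names are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-k matricization: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_scatter(samples, labels, mode):
    """Between-class (B) and within-class (W) scatter of mode-k unfolded samples.

    samples: array of shape (n_samples, d1, ..., dK); labels: (n_samples,)
    """
    labels = np.asarray(labels)
    overall_mean = samples.mean(axis=0)
    dim = samples.shape[mode + 1]           # +1 skips the sample axis
    B = np.zeros((dim, dim))
    W = np.zeros((dim, dim))
    for c in np.unique(labels):
        class_samples = samples[labels == c]
        class_mean = class_samples.mean(axis=0)
        d = unfold(class_mean - overall_mean, mode)
        B += len(class_samples) * (d @ d.T)
        for x in class_samples:
            e = unfold(x - class_mean, mode)
            W += e @ e.T
    return B, W


def zeta(B, W, ridge=1e-8):
    """Largest eigenvalue of W^{-1} B; the ridge keeps W invertible."""
    W_reg = W + ridge * np.eye(W.shape[0])
    return float(np.max(np.linalg.eigvals(np.linalg.solve(W_reg, B)).real))


# Placeholder data: 12 samples of size 3 x 4 x 2 in 3 classes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 4)
samples = rng.standard_normal((12, 3, 4, 2)) + labels[:, None, None, None]
B0, W0 = mode_scatter(samples, labels, mode=0)
print(B0.shape, zeta(B0, W0))
```

With these definitions, `B + W` equals the total scatter of the unfolded samples around the overall mean, which is a convenient sanity check.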
The detailed calculation procedure is as follows:
Step 1 (Optimize U Given P): The subspace matrix $U$ can be solved by the High Order Singular Value Decomposition (HOSVD) of tensors. To solve this problem, an auxiliary variable $Z_S = [\![X_S; P]\!]$ was introduced, and the optimization problem (8) can be converted to the SVD of $(1 - \mu) Q \ldots$
Step 2 (Optimize P Given U): The optimization objective degenerates to $L_{distribution}$ when $U$ is fixed, as given in (9). By unfolding (9), the problem can be converted to (10), which can be solved by using a classical optimization procedure [38].
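As background for Step 1, the sketch below shows a plain truncated HOSVD: for each mode, the leading left singular vectors of the mode-$k$ unfolding give a projection matrix. Note the assumption: it decomposes the data tensor itself, whereas the method above applies the SVD to a matrix built from the MMD and scatter terms, so this illustrates the per-mode mechanism only. Function names and ranks are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-k matricization: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd_projections(tensor, ranks):
    """Truncated HOSVD: leading left singular vectors of each mode unfolding."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    return factors


def project(tensor, factors):
    """Core tensor X x_1 U1^T x_2 U2^T ... x_K UK^T."""
    for mode, u in enumerate(factors):
        tensor = np.moveaxis(np.tensordot(u.T, tensor, axes=(1, mode)), 0, mode)
    return tensor


# Placeholder tensor of size 4 x 5 x 6 reduced to 2 x 3 x 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
factors = hosvd_projections(X, ranks=(2, 3, 2))
core = project(X, factors)
print(core.shape)  # (2, 3, 2)
```

Passing the transposed factors to `project` maps a core tensor back to the original space, which is exact when the full ranks are kept.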
Finally, the tensor features from the source and target domains can be represented as $X_S^*$ and $X_T^*$ by using the projection matrices $U$ and $P$, which can be calculated by (11).
In total, the whole procedure of the proposed algorithm is as follows:
Algorithm: Discriminative Tensor Feature Transfer Learning
Input: Source and target domain data sets $X_S$, $X_T$; tuning parameters $\mu$, $\theta$.
Output: Subspace projection matrix $U$; source domain transformation matrix $P$; transferred data $X_S^*$, $X_T^*$.
1: Initialize U, P
2: FOR iter = 1 to T
3:   FOR k = 1 to nDim
4:     Update U by solving Eq. (8)
5:     Update P by solving Eq. (10)
6:   END FOR
7:   Check for convergence
8: END FOR
9: Calculate X*_S, X*_T by Eq. (11)
10: Return X*_S, X*_T, U, P

E. Feature Selection and Classification
In order to remove redundant information in the features and reduce the complexity of the classification model, the Fisher Score was used to select the features that contribute most to distinguishing mental workload for the subsequent classification steps [39]. The central idea of the Fisher Score is to find a subset of features with the largest inter-class margin and the smallest intra-class margin. Specifically, for a certain feature $X^i$, let $\mu_c^i$ and $\sigma_c^i$ be the mean and standard deviation of the feature in category $c$, and let $\mu^i$ and $\sigma^i$ be the mean and standard deviation over all samples, respectively. The Fisher Score of this feature can be calculated by Equation (12). After sorting the Fisher Scores of all features in descending order, the first $N$ features were taken as the input of the classification model.
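The selection step can be sketched as follows, assuming the commonly used form of the Fisher Score (class-size-weighted squared distance of class means from the overall mean, divided by the class-size-weighted within-class variance). Equation (12) may be written differently, and the function names and data here are hypothetical.

```python
import numpy as np


def fisher_score(X, y):
    """Fisher Score per feature: between-class over within-class variance.

    X: array of shape (n_samples, n_features); y: array of shape (n_samples,)
    """
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)   # small constant avoids division by zero


def select_top_features(X, y, n_keep):
    """Indices of the n_keep features with the highest Fisher Score."""
    order = np.argsort(fisher_score(X, y))[::-1]
    return order[:n_keep]


# Placeholder data: feature 0 tracks the class label, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 20)
X = np.column_stack([y + 0.1 * rng.standard_normal(60), rng.standard_normal(60)])
print(select_top_features(X, y, n_keep=1))  # [0]
```

For tensor features, each sample would first be flattened to a vector so that every tensor entry is scored as one feature.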
In order to verify the feasibility of the proposed feature construction method for mental workload recognition, an SVM classifier was used to construct a 3-class recognition model for both within-task and cross-task conditions. For within-task classification, a 5-fold cross-validation procedure was applied to each of the four tasks. For cross-task mental workload identification, one of the four tasks was selected as the training task and the other three were used as the validation set, which can also be called a leave-one-task-out approach. To evaluate the recognition performance, the accuracy, sensitivity, and specificity were calculated. Table I shows the classification results of the proposed method in within-task and cross-task workload evaluation.
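The cross-task split and the three reported metrics can be sketched as below. Sensitivity and specificity are computed one-vs-rest for each of the three workload levels, which is an assumption about how they were obtained for a 3-class problem; the task identifiers, labels, and predictions are made up for illustration, and no classifier is trained here.

```python
import numpy as np

TASKS = ["verbal", "object", "space_verbal", "space_object"]  # hypothetical IDs


def cross_task_splits(tasks):
    """Yield (training task, test tasks): train on one task, test on the rest."""
    for train in tasks:
        yield train, [t for t in tasks if t != train]


def per_class_metrics(y_true, y_pred, n_classes=3):
    """Accuracy plus one-vs-rest sensitivity and specificity per class.

    Assumes every class occurs in y_true, otherwise a ratio is undefined.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    sensitivity, specificity = [], []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        tn = np.sum((y_true != c) & (y_pred != c))
        fp = np.sum((y_true != c) & (y_pred == c))
        sensitivity.append(float(tp / (tp + fn)))
        specificity.append(float(tn / (tn + fp)))
    return accuracy, sensitivity, specificity


for train_task, test_tasks in cross_task_splits(TASKS):
    print(train_task, "->", test_tasks)

# Made-up labels and predictions for six trials.
acc, sens, spec = per_class_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(acc, sens, spec)
```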

A. Classification Performance
An average recognition accuracy of 91.1% for within-task and 81.3% for cross-task mental workload recognition was achieved by using the EEG tensor feature and transfer learning procedure.
Previous research has demonstrated that various EEG features can be used to distinguish mental workload levels. In order to compare the effectiveness of the proposed tensor transfer learning feature with that of classical feature extraction methods for mental workload recognition, classical single-mode features and tensor features without transfer learning were also tested in both within-task and cross-task workload classification under the same N-Back workload task conditions. Specifically, the single-mode features include PSD and multi-scale sample entropy. Fig. 4 illustrates the comparison between these classical feature extraction methods and the proposed method. It can be seen that although the feature extraction methods based on the time domain (multi-scale sample entropy) and the frequency domain (PSD) achieve satisfactory results in within-task mental workload recognition (Sample Entropy: 0.662 ± 0.016, PSD: 0.685 ± 0.024, Sample Entropy + PSD: 0.736 ± 0.025), their recognition performance declines severely in the cross-task scenario (Sample Entropy: 0.394 ± 0.018, PSD: 0.399 ± 0.020, Sample Entropy + PSD: 0.408 ± 0.025). Compared with these single-dimensional features, the recognition accuracy of tensor features without transfer learning was significantly improved in both within-task and cross-task scenarios (within-task: 0.871 ± 0.079, cross-task: 0.625 ± 0.029). The feature extraction method based on tensor representation and transfer learning proposed in this paper achieved the highest recognition accuracy. Furthermore, Table II lists typical state-of-the-art studies on cross-task mental workload evaluation. It should be noted that there is no unified task paradigm in this field, which means that the tasks used for workload evaluation vary between studies. Therefore, the comparison is not rigorous. For consistency, Table II mainly focuses on studies that involve a similar cognitive process (i.e., working memory).
It can be seen from the table that studies using only PSD features could hardly obtain satisfactory classification results, while the accuracy in distinguishing high and low workload (binary classification) improved when the connectivity relationships between different EEG channels (or different brain regions) were considered, which is consistent with the results shown in Fig. 4. The proposed method achieved comparable classification accuracy in a 3-class situation when the multi-mode information of EEG was taken into account.

B. Convergence and Parameter Sensitivity
In each iteration, the projection matrix $U$ of each mode is determined by the trace term $\mathrm{tr}(U \ldots (1 - \mu) Q_0^{(k)} \ldots)$. Given $U_k$ and $U_{k+1}$ as the projection matrices in the $k$th and $(k+1)$th iterations, the total scatter difference matrices of the two iterations are $\psi_{y(k)}$ and $\psi_{y(k+1)}$, respectively, according to the calculation formula of the intra-class and inter-class scatter matrices. Since $U$ is taken from the eigenvectors of $\psi_{y(k)}$ with the largest scatter difference, it follows that $\psi_{y(k)} \le \psi_{y(k+1)}$. That is, the total scatter difference changes monotonically and is bounded by an upper limit $a$ and a lower limit $b$. Fig. 5 shows the changes of the total scatter difference when $U$ is initialized from the eigenvalue matrix of $(1 - \mu) Q \ldots$ and when it is initialized randomly. It can be seen that the scatter differences obtained with the two initializations differ considerably at first, but they converge quickly as the iteration proceeds. After 3 to 4 iterations, both initialization methods reach the convergence state.
In the tensor feature representation and transfer learning procedure, the initialization parameters directly affect the recognition accuracy of mental workload. In order to assess the influence of the parameters on classification performance, different initialization parameter settings were compared and analyzed. Among these parameters, $\mu$ determines the relative importance of marginal and conditional probability distribution alignment. When it tends to 1, the cost function mainly considers the conditional distribution, and when it tends to 0, the marginal distribution is mainly considered. $\theta$ determines the weight of distribution alignment relative to class-wise discrimination. Fig. 6 shows how the cross-task recognition accuracy changes with $\theta$ and $\mu$. It can be seen that the recognition accuracy of mental workload is highest when $\theta$ is set to 0.5, and that setting $\mu$ to 0.5 gives slightly higher accuracy than other values.
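A parameter sweep of this kind can be organized as a simple grid search, sketched below. The scoring function `toy_score` is a made-up surface used only so the example runs; in practice it would be replaced by a routine that trains the transfer model with the given $\theta$ and $\mu$ and returns the cross-task accuracy. The grids are also assumptions.

```python
import itertools


def grid_search(evaluate, thetas, mus):
    """Return (best_score, theta, mu) over all combinations of the two grids."""
    best = None
    for theta, mu in itertools.product(thetas, mus):
        score = evaluate(theta=theta, mu=mu)
        if best is None or score > best[0]:
            best = (score, theta, mu)
    return best


def toy_score(theta, mu):
    """Placeholder objective, NOT a result from the study."""
    return 1.0 - (theta - 0.5) ** 2 - (mu - 0.5) ** 2


thetas = [0.0, 0.01, 0.1, 0.5, 1.0, 10.0, 100.0, 1000.0]   # assumed grid
mus = [0.0, 0.25, 0.5, 0.75, 1.0]                           # assumed grid
print(grid_search(toy_score, thetas, mus))  # (1.0, 0.5, 0.5)
```

Because the search is exhaustive, its cost grows with the product of the two grid sizes, so coarse grids are a reasonable starting point.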

C. Computational Complexity
The computation process of the proposed method includes the tensor representation of EEG data and feature transfer learning. The tensor representation procedure, based on time-frequency analysis of the $N$ EEG channels with signal length $L$, takes $O(L \cdot N)$. For the feature transfer learning, the projection and transformation matrices $U$ and $P$ of each mode $k$ are calculated in each iteration, so the number of calculations is $O(T \cdot K)$. Furthermore, the SVD for $U$ and the optimization of $P$ (Eq. (10)) both cost $O(N^3)$, which leads to a total time complexity of $O(L \cdot N) + O(2 \cdot T \cdot K \cdot N^3)$ [38].

IV. DISCUSSION
The effectiveness of cross-task mental workload evaluation based on physiological signals such as EEG remains unsatisfactory. From the perspective of mining the multi-dimensional characteristics of EEG signals, an EEG feature construction method based on tensor representation and transfer learning was proposed. A recognition model based on the classical SVM classifier was then built for both within-task and cross-task scenarios. Results showed that, by using the tensorized and transferred EEG features, an average accuracy of 91.1% for within-task and 81.3% for cross-task mental workload evaluation was achieved, which is significantly higher than the results obtained with classical features such as PSD and sample entropy. These results demonstrate that it is feasible to use EEG tensor features for mental workload evaluation, and that the tensorized transfer learning method is effective for cross-task mental workload evaluation.
Many studies have demonstrated that the temporal, frequency, and spatial information of EEG signals can reflect the mental workload state of operators [6], [16], [28]. Classical methods based on linear and nonlinear characteristics of EEG usually ignore the temporal information of EEG signals as well as the interaction between different brain areas (or EEG channels), so they might not be able to characterize the workload state comprehensively, which leads to poor mental workload evaluation [28]. The tensor representation of EEG data can compensate for the information lost in classical feature construction methods by considering the multi-dimensional information of EEG, so its recognition accuracy is significantly higher than that of classical methods. This result is consistent with previous research suggesting that the multi-dimensional information of EEG contributes to mental workload recognition. It also confirms the necessity of applying a multi-dimensional feature representation (i.e., tensor representation) to evaluate mental workload. Furthermore, compared with the EEG time-frequency-spatial tensor features without transfer learning, the EEG tensor features after transfer learning have higher recognition accuracy in both within-task and cross-task conditions, indicating that the feature distribution alignment and class discriminative learning procedures both help to distinguish mental workload levels, which also supports the validity of the proposed method.
Regarding the influence of parameter settings on recognition accuracy, it was found that optimal classification results cannot be achieved in cross-task mental workload evaluation when only the distribution alignment of features (with $\theta$ tending to 0) or only the class discrimination information (with $\theta$ tending to 1000) is considered. Although the parameter sensitivity analysis in this study relies on experience and trial, these results still demonstrate that it is necessary to consider both the distribution of features and their discriminative ability in the transfer learning process. Optimization criteria-based methods for parameter selection can be explored in future research [40]. In addition, tensorized classifiers such as the support tensor machine can be applied in the future in place of the classical SVM model to further verify the effectiveness of the proposed method [41].
In conclusion, this paper proposed a cross-task mental workload evaluation method based on EEG tensor representation and transfer learning. Satisfactory results were achieved in both within-task and cross-task conditions, which can provide a theoretical basis and application reference for future research. However, this work is still limited to mental workload recognition in a pre-set task environment (i.e., N-Back tasks), and the effectiveness of the proposed method in real human-machine systems with more complex information and operations needs to be further verified.