Optimized electroencephalogram and functional near-infrared spectroscopy-based mental workload detection method for practical applications

Mental workload is a critical consideration in complex man–machine systems design. Among various mental workload detection techniques, multimodal detection techniques integrating electroencephalogram (EEG) and functional near-infrared spectroscopy (fNIRS) signals have attracted considerable attention. However, existing EEG–fNIRS-based mental workload detection methods have certain defects, such as complex signal acquisition channels and low detection accuracy, which restrict their practical application. The signal acquisition configuration was optimized by analyzing the feature importance in mental workload recognition model and a more accurate and convenient EEG–fNIRS-based mental workload detection method was constructed. A classical Multi-Task Attribute Battery (MATB) task was conducted with 20 participating volunteers. Subjective scale data, 64-channel EEG data, and two-channel fNIRS data were collected. A higher number of EEG channels correspond to higher detection accuracy. However, there is no obvious improvement in accuracy once the number of EEG channels reaches 26, with a four-level mental workload detection accuracy of 76.25 ± 5.21%. Partial results of physiological analysis verify the results of previous studies, such as that the θ power of EEG and concentration of O2Hb in the prefrontal region increase while the concentration of HHb decreases with task difficulty. It was further observed, for the first time, that the energy of each band of EEG signals was significantly different in the occipital lobe region, and the power of β1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta_{1}$$\end{document} and β2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta_{2}$$\end{document} bands in the occipital region increased significantly with task difficulty. The changing range and the mean amplitude of O2Hb in high-difficulty tasks were significantly higher compared with those in low-difficulty tasks. The channel configuration of EEG–fNIRS-based mental workload detection was optimized to 26 EEG channels and two frontal fNIRS channels. A four-level mental workload detection accuracy of 76.25 ± 5.21% was obtained, which is higher than previously reported results. The proposed configuration can promote the application of mental workload detection technology in military, driving, and other complex human–computer interaction systems.


Introduction
Mental workload has long been a factor of immense interest in designing and applying complex human-machine systems [1], and is increasingly recognized as a serious, worldwide public health concern. Only when the mental workload is in the appropriate range can high performance and operational reliability be maintained. An irregular mental workload state will impair a person's work performance, leading to task failure [2,3], and might endanger people's health or safety in severe cases. In human-machine systems with high safety requirements, such as in the military, aviation, aerospace, driving, and other domains, human errors such as information acquisition and judgment decision error caused by excessive mental workload are often responsible for accidents [4,5]. For example, soldiers in a battlefield must participate in warfare for a long time without interruption and must stay alert to respond to various systems. The harsh environment poses extraordinary physical and psychological challenges to them [6,7]. Goodman et al. [8] reported that of 1094 USAF Unmanned Aerial Vehicle (UAV) operators stationed in the US, approximately 20% reported high fatigue, 11% reported high cynicism, and 3% reported reduced performance. This is because the long-term unsuitable mental workload state seriously endangers the operator's mental and physiological health. Therefore, it is crucial to measure the mental workload of the operator accurately.
In the past several decades, neurophysiological signal have played an important role in mental workload detection because of its objectivity and stability [9]. One of the major topics to be investigated in this field is electroencephalogram (EEG)-based mental workload detection method. For instance, Georgios et al. [10] carried out a study of EEG based mental workload detection, and the experiment was implemented with N-back and mental arithmetic (the most commonly used single cognitive resource tasks for studies of mental workload [11][12][13]). The binary classification accuracies of mental workload are 88% and 86% in the N-back task and mental arithmetic task, respectively. Besides, there are still some researches on EEG for mental workload detection in complex tasks. Hongquan Qu et al. [14] carried out a three-level Multi-Task Attribute Battery (MATB) [15] task with 32-channel electroencephalogram (EEG) acquisition. Power spectrum density (PSD) was analyzed with independent components analysis (ICA) algorithm, and the average recognition accuracy reached 79.8%. 64-channel EEG data were recorded in a simulated flight experiment, and PSD, phase lag index (PLI) connection features were analyzed and extracted, giving a recognition accuracy 82% of threelevel mental workload [16]. For application of mental workload detection, the significant advantage of EEG is that it contains abundant information, but it also has the disadvantages of low spatial resolution and complex operation.
As a new neurophysiological signal acquisition technology, recently, functional nearinfrared spectroscopy (fNIRS) has become a research hotspot in this field with the advantages of high spatial resolution and portability [17]. Reported studies proposed that fNIRS performs well in mental workload detection of both single task and complex task. For example, Asgher et al. [18] observed the brain activities of the prefrontal cortex (PFC) region with fNIRS technology in a four-level mental arithmetic task, and obtained a recognition accuracy 89.31% with classification algorithm of long short-term memory (LSTM). Siddiquee et al. [19] explored the response difference of brain activity measured by fNIRS in different areas of the PFC. N-back task was conducted in the experiment, and the results show that the blood oxygen of middle prefrontal position can significantly improve the recognition precision, with a highest binary classification accuracy 90%. In a study of actual driving environment [20], four-channel fNIRS system was adopted to monitor the real-time change of blood oxygen in PFC region. The recognition accuracy of three-levels driving tasks reached 82.71%. Besides, in a study of air traffic control instructions tasks in flight simulators, Gateau et al. [21] collect fNIRS signals to detect the mental load state of pilots in two different group, and the accuracy reached 80% by SVM.
Aforementioned studies of mental workload detection are based on single physiological signal (either EEG, or fNIRS). Either the number of grades or the recognition accuracy of mental workload detection was not ideal; i.e., in Ref. [10] the authors reached an high accuracy but over a low number of classes, while in Ref. [16] the authors reported a classification performance on a high number of classes but with a low discrimination accuracy. This limitation is probably due to the limited information of single physiological signal. Therefore, researchers began to focus on multi-physiological signal fusion detection methods. Liu, Y. et al. [22] carried out a study on mental workload detection of simple cognitive resource tasks by fusing 28-channel EEG and 16-channel fNIRS. 3-level workload was induced by N-back task in the experiment, and the result indicates recognition accuracy based on fusion of EEG and fNIRS was significantly greater compared with single signal of EEG or fNIRS. Similar result was also observed in mental workload detection of complex tasks, Sangtae et al. [23] proposed a multi-physiological signal based mental workload detection method, which collected 64-channel EEG signals, and eight-channel fNIRS signals of drivers simultaneously. The result indicates that the recognition accuracy based on multi-physiological signal was significantly greater than that of single physiological signal. What's more, reported literatures [11,24] support the above viewpoint as well. In conclusion, existing studies suggest that the combination of multiple physiological signals can obtain better performance in mental workload detection compared with signal physiological signal. Nevertheless, there are still some limitations of the reported studies in the following aspects: the multiple physiological signals acquisition configuration was based on an excessive number of channels (i.e., 64 channels in the 10-20 system) [11,23], only three or even less different levels of mental workload were considered [25][26][27], and the recognition accuracy was not sufficiently high (i.e., accuracy value higher than 70%).
In a related study to practical applications, Chi et al. [28] designed an experiment to collect task completion times and subjective mental load for five driving tasks. The results showed that task completion times for truck driving could be predicted using a learning curve. Practice significantly reduced mental workload ratings. Midha et al. [29] used fNIRS in an office environment to measure changes in mental workload for daily reading and writing tasks. In actual flight scenarios, Hankins et al. [30] collected ECG, EEG, EEG and subjective data to assess the level of mental load in different driving states. From these studies related to practical applications it is clear that the following issues must be paid attention to in real-time monitoring of mental workload in complex man-machine system. First, monitoring equipment should be portable and easy to operate, with a simple channel configuration of signal acquisition as far as possible. In addition, the response speed of the monitoring model should be as high as possible; consequently, algorithms with a large amount of computation are not suitable for this task. Finally, the gradation of the mental workload should be more detailed and the detection accuracy should be higher. This study focuses on these issues, and adopted complex simulation tasks to mimic the actual task scenarios. First, optimized data acquisition system integrates a few channels of EEG acquisition devices and a portable near-infrared devices was adopted. In addition, this study considers the modeling performance based on three different feature sets: derived from EEG signal, derived from fNIRS signal, derived from both EEG and fNIRS signals. Various classifiers were used to obtain an optimization modeling method. Last but not least, the difficulty of experiment task was divided into four levels, both of EEG and fNIRS features were took into account in the detection model of mental workload for a better modeling performance. The results of this study can promote the application of mental workload detection technology in military, driving, and other complex human-computer interaction systems. Fig. 1, the influence of the task's difficulty level on the comprehensive scale and comprehensive performance of 20 subjects was analyzed. Figure 1 indicates that subjective scale scores increase while the task performance decrease with the increase of task difficulty. After the outliers were removed, one-way ANOVA analysis showed that the overall score of the scale had a major effect on the mental workload level [F (3, 76) = 31.633, P < 0.001]. Post hoc analysis showed significant differences between load levels 1 and 5 (P_FDR < 0.01), 1 and 7 (P_FDR < 0.01), 3 and 5 (P_FDR < 0.05), and 3 and 7 (P_FDR < 0.01) (P_FDR < 0.05). One-way ANOVA analysis of performance scores showed that the main effect of combined performance scores was also observed on the mental workload level, F (3, 76) = 17.16, P < 0.001. Post hoc analysis showed that there were significant differences in the overall performance between difficulty 1 and 5 (p_ FDR = 0.001), 1 and 7 (p_FDR < 0.001), and 3 and 7 (p_FDR = 0.001) (p_FDR < 0.05). Feature changes analysis results

EEG channels selection
In order to obtain the optimal EEG channel configuration for measuring mental workload, the importance of each channel was calculated by the algorithm mentioned in 5.5.2 above, and the results were sorted from high to low, as shown in Fig. 2. It can be seen in Fig. 3 that almost all channels have good performance in β 1 and β 2 , which indicates that power of β 1 and β 2 are sensitive physiological characteristics of mental workload. In top-ranked channels (P4, PO6, O1, P6, PO5, P8, PO8, P7, PO7, PO4), the power of θ and α has a greater influence on the model. In addition, Fig. 3 reveals that the sensitive channels on mental workload of EEG mainly comprised occipital lobe (P4, PO6, O1, P6, PO5, P8, and OZ), the frontal (AF3) and sports area (C1, C2). Furthermore, classification model based on SVM classifier was performed to explore the influence of channel number on classification accuracy. The channel was added to the model one by one based on the ranking order in Fig. 3, and the modeling performance varying with the number of channels is shown in Fig. 3. It can be seen that the classification accuracy of the model increases with the increase of the number of Fig. 2 Channels importance ordering. The horizontal axis is channel, and the vertical axis is importance value of each channel, and the results were sorted from high to low. Different colors represent different frequency bands of EEG, and the height of the color block represents the features importance for a given channel channels, but the rising trend slows down when the number of channels reaches 26. Therefore, we selected 26 channels (P4, PO6, O1, P6, PO5, P8, PO8, P7, PO7, PO4, P5,  P1, O2, CP2, P2, AF3, C1, P3, Pz, C2, PO3, TP8, Oz, CP4, TP7, CP6) for subsequent research.

EEG feature changes analysis results
ANOVA was performed to analyze the difference of power response in θ, α, β 1 and β 2 band under various mental workload levels. The statistical analysis results of 62 scalp electrodes was demonstrated as scalp maps in Fig. 4, and the result was corrected by FDR, since hundreds of comparisons were implemented simultaneously. After removing the outliers, ANOVA results showed differences in PSD that were mainly concentrated in all regions of θ band, prefrontal and occipital regions of α band, and occipital regions of β 1 and β 2 band. Further, in order to find out the change trend of PSD with task difficulty, the difference of power in θ, α, β 1 and β 2 band between highest load (Level 7) and lowest load (Level 1) were calculated. The average results among 20 subjects of 62 scalp electrodes were demonstrated as scalp maps are shown as Fig. 5 a. Figure 5b indicates that power of θ band in prefrontal increased with the increase of task load, while power of α band in the right hemisphere and occipital region decreased with the increase of task difficulty. In addition, the power of β 1 and β 2 bands in the occipital region increased with the increase of task difficulty.
Specifically, the change trend of PSD in θ, α, β 1 and β 2 band of three typical channels (O1, P4 and PO5) were analyzed, as shown in Fig. 5a. According to the data shown in Fig. 5b, three major conclusions can be obtained: power of θ band in O1, P4 and PO5, showed a good correlation with task difficulty and increased with the increase of load level, which was consistent with the previous research [34]; power of β-band in O1 and PO5 also increased with the increase of task load; the α-band energy of O1 channel is negatively correlated with the load, which is consistent with the study [12].

fNIRS feature changes analysis results
Next, the changes of O 2 Hb and HHb during the change of mental workload were analyzed. The average of 20 subjects under the same load level obtained the results as shown in Fig. 6, which shows the variation trend of O 2 Hb and HHb amplitude in the left and right prefrontal regions with a time window of 3 s. After the outliers were removed, oneway ANOVA analysis showed that the main effect of O 2 Hb on the left and right side After the brain enters the working state, local neuronal activations increase metabolic rate, leading to increased blood flow and volume [36]. At the beginning of the task, PFC oxygenation locally increases, and the higher the load of the task, the more active the brain becomes, which also leads to the increase of O 2 Hb in the PFC region [37]. As can be seen from the pattern shown in the figure, during the period of Level 1 to Level 7 tasks, the concentration of O 2 Hb increases with the difficulty of tasks, and the content of O 2 Hb at Level 7 is significantly higher than that of other tasks with load levels. We also note that the change of O 2 Hb is more dramatic under high workload levels. Especially during a Level 7 task, the variation range and average amplitude of O 2 Hb are significantly improved compared with Level 1. The change of HHb and O 2 Hb showed an opposite trend. The content of HHb decreased with the difficulty of the task, and the change range of HHb was significantly lower than that of O 2 Hb.

Classification results
In order to study the classification effect of EEG, fNIRS, and EEG-fNIRS feature sets, three classifiers, namely, SVM, RF, and DT, were performed to establish the recognition model. After processing, 960 samples per subject were obtained for each difficulty level, on a total of 3840 samples for the four difficulty levels. Only-EEG feature set is the dataset using only EEG features, and the dimensionality of the dataset is 3840 samples × 132 features. only-fNIRS feature set is the dataset using only fNIRS signal extraction features, and the dimensionality of the dataset is 3840 samples × 44 features. The EEG-fNIRS feature set uses both fNIRS features and EEG features, and the dimension of the dataset is 3840 samples × 180 features. All classification results were obtained using the 20% number of retained feature sets tested. Table 1 presents the minimum, maximum, average and standard deviation value of classification accuracies of different feature sets among 20 subjects. For only-EEG feature set, the mean of classification accuracies of SVM, RF and DT are 52.29%, 54.24% and 45.89%, respectively, with a highest recognition accuracy 54.24% in RF classifiers. For only-fNIRS feature set, the mean of classification accuracies of SVM, RF, and DT are 67.13%, 70.30% and 62.163%, respectively, with a highest recognition accuracy 70.30% in RF classifiers. For EEG-fNIRS feature set, the mean of classification accuracies of SVM, RF, and DT are 70.23%, 76.25% and 64.46%, respectively, with a highest recognition accuracy 76.25% in RF classifiers.
To verify that our classifier is not overfitting, in Table 2, we give the results of the corresponding training set, and we can see that the test set results are very close to the training set results, and we can assume that our model is adequately fitted. After removing outliers, a two-factor method was used to analyze the effects of various classifiers and datasets on the accuracy. The results showed that the main effect of accuracy was observed on the feature set and the classifier, F (2, 171) = 92.539, P < 0.01, F (2,171) = 15.253, P < 0.001. No interaction effect F (4, 171) = 0.317, P > 0.05 was observed. To sum up, RF classifier performed better in three feature sets compared with SVM and DT, and EEG-fNIRS feature set provided better performance than both only-EEG feature set and only-fNIRS feature set, with a highest four-level recognition accuracy 76.25 ± 5.21%.
What's more, according to Table 1, we also observed that the standard deviations of recognition accuracy in EEG-fNIRS feature set was smaller than that of both only-EEG feature set and only-fNIRS feature set. In conclusion, EEG-fNIRS feature sets not only significantly improve the classification accuracy, but also make the model more stable and more robust, which is particularly important in practical applications. Table 1 Test classification results on three feature sets Note that "Min" represents the minimum value of classification accuracy for all subjects, "Max" represents the Maximum value of classification accuracy for all subjects, "Mean" represents the average of the classification accuracy of all subjects, "Std" represents the standard deviation value in classification accuracy for all subjects

Feature set Classifier Results
Min

Discussion
In order to promote the practical application of the mental workload status detection technology, this study conducted MATB to simulate the cognitive needs of operators in their daily work and used portable EEG acquisition equipment and fNIRS acquisition equipment to collect the physiological signals of subjects during the task. In this study, first, EEG signals from 64 channels were simplified to 26 channels, which significantly improved the convenience of operating the equipment. Second, variation of both EEG and fNIRS features with the task difficulty were analyzed, in order to provide physiological explanation for the variation of mental workload. Finally, the modeling performance of only-EEG feature sets, only-fNIRS feature sets, and EEG-fNIRS feature sets in four levels of mental workload monitoring was explored and compared.
In the analysis of operator behavioral data, we observed that job performance significantly decreased and subjective scale score increase with the increase of mental workload. This indicates that excessive mental workload will lead to insufficient cognitive resources for operators to maintain good job performance. We selected 26 channels with the highest degree of correlation with load for analysis from the results of channel screening. It can be seen that these 26 channels are mainly concentrated in the occipital lobe and parietal lobe, while a few channels are located in the frontal region. The occipital lobe region is generally considered as a visual region [38], which is highly correlated with basic cognitive functions such as visual search and visual attention [39]. The performance of the operation is highly correlated with the parietal lobe [40]. The prefrontal region is involved in various higher cognitive abilities, such as executive function and memory [41]. This indicates that the mental workload of operators during the task is mainly caused by vision and operation, especially as the operator needs to take into account all subsystems, so it evokes a significant response in the visual region.
Further analysis of the changes in features found that the θ band power in the frontal region increased significantly with the increase of task load, and all brain regions showed significant differences. It can be considered that θ band power is a sensitive physiological index of mental load. When the mental load was increased, α band power in the occipital lobe and other hemispheric regions decreased, which was consistent with the results in reported studies [11,42,43]. In general, the power of β 1 and β 2 bands increased obviously with the increase of the load in the occipital region. However, by analyzing the power spectrum energy changes in specific channels, it was found that the power of β 2 band in O1 channel increased with the increase of task difficulty, while the power of β 2 band in P4 channel showed a downward trend. This shows that channel location has a substantial effect on features.
From the perspective of modeling performance, three different machine learning models were used to analyze the classification results under different feature sets. The results point out that the multimodal dataset is better than the unimodal dataset, which validates the results of [44][45][46]; moreover, the detection method established in this study is better in terms of classification accuracy than studies such as [11,47] in the multimodal study. The comparison of the number of categories from load classification is higher than [18,26,48] and the robustness of the model is better. The reasons for the better results in this study are as follows: (1) the task selection, which is more relevant to the actual scenario, and the task difficulty setting is more reasonable, which effectively induces different levels of brain load; and (2) the channel screening was conducted before modeling, and some redundant EEG channels were eliminated before fusion with fNIRS features. This feature combination overcomes the defects of low EEG spatial resolution and low fNIRS temporal resolution on the one hand, and may generate other key interaction information to help improve the accuracy of EEG load recognition on the other hand.
The feature data can be further mined later and relevant experiments can be designed to verify the present conjecture. Finally, it should be noted that there are still some limitations in our study. For one thing, data acquisition configuration of EEG contains 26 electrodes, which means there is still an optimized space. For another, only an offline experiment was implemented in this study, and the experimental task is not a real application task scenario. For further exploration, we hope to reduce the number of EEG channels further and improve our measurement method by monitoring the mental workload status of pilots in real-time during flight missions. Furthermore, the absence of female subjects is a limitation and we will include female subjects in subsequent studies.

Conclusions
In conclusion, this study was to construct a more accurate and convenient EEG-fNIRSbased mental workload detection method by optimizing the signal acquisition configuration. The result suggested 26 EEG channels and two frontal fNIRS channels is enough for a four-level mental workload detection accuracy of 76.25 ± 5.21%, which is higher than previously reported studies. The results of this study can promote the application of mental workload detection technology in military, driving, and other complex humancomputer interaction systems.

Participants
Twenty volunteers participated in our experiment, all from China Astronaut Research and Training Center, with an average age of 25.6 ± 2.24, normal or corrected to normal vision, without any neurological disease or history of neurological disease, and in a stable mental state before the experiment. In order to avoid the influence of gender differences and hand dominance on the results, all the subjects were male and right-handed. After familiarizing themselves with all the experimental procedures and requirements, the subjects signed the informed consent form.
During the experiment, the subjects were asked to sit in front of a 23-in. LCD monitor with their eyes about 70-90 cm away from the screen. The volunteers were asked to carefully read the instructions given on the screen before each sub-task and complete the experimental tasks as required. They were also asked to focus on the process of the experiment task execution as far as possible to get the best results, to ensure that the experimental data were real and effective, and to cooperate with data collection work.

Experimental design
This experiment uses the MATB task to trigger the mental workload of subjects. This task is composed of three sub-tasks, namely, system monitoring, tracking, and resource management. It is a complex task involving three cognitive resources: attention, operation, and reasoning. The cognitive resources involved in the actual task are the same as those involved in the fundamental tasks performed by the operator, which can well simulate the real-world task environment of the subjects. During the task, three sub-tasks appear on the screen simultaneously. As shown in Fig. 7a, the upper left is the system monitoring task. When an abnormal state occurs, the subject will press the "F1-F6" button to respond, and the subject is required to press the corresponding button in the shortest possible time. On the upper right is the tracking task, in which subjects track a circular target by controlling a joystick and are asked to aim the crosshairs at the target as accurately as possible. The lower part is the oil management task. The subjects control the opening and closing of the oil circuit by pressing "1-8", and they are required to ensure the oil volume of tank A/B remains within a specific range.
Experimental procedure is shown in Fig. 7c. The subjects completed a total of four blocks, each of which was randomly assigned four difficulty levels. Each block takes 3 min, and the four 3-min tasks are rated on a scale of difficulty from low to high: 1, 3, 5, and 7. There was a 1-min rest period between two tasks and a 3-min rest period before the task began. The actual experiment scene is shown in Fig. 7b. At the end of each task, the subjects were required to fill out the NASA-TLX scale. The four sequential tasks were performed in a random order among all subjects.

Data acquisition
The EGGO device of Ant Neuro was used to record the EEG signals of the subjects at 64 electrode positions as per international standards 10-20. The reference was set at CPZ and the eye movement signals were recorded from below the left eye. The impedance of all electrodes is kept below 5 KΩ and the sampling rate is 500 Hz. The PORTALITE device of Artinis was used to record the changes in blood oxygen in the brain on both sides of the forehead. The sampling rate is set to 50 Hz.

Preprocessing and feature extraction
After the collected EEG data were obtained, the average values of the two mastoid channels (M1, M2) were used to re-reference the signals. Then, a 0.5-45-Hz Butterworth band-pass filter and a 50-Hz notch filter were conducted to remove the interference of the DC component, high-frequency component, and power frequency. The sampling rate was reduced to 200 Hz to reduce the data memory size. Independent component analysis (ICA) was adopted to remove eye movement, motion artifacts, channel noise, and other interference. Finally, the data were segmented every 3 s with the beginning of stimulus as the mark. The above processing uses the MATLAB open-source toolkit EEGLAB.
The collected fNIRS signal contains considerable noise, including motion artifacts, physiological interference, and instrument noise. Preprocessing was performed using the Homer2 open-source toolkit to remove the motion artifacts. A 0.5-Hz low-pass filter was also applied to reduce instrumental and physiological noise. The processing steps are consistent with Foy et al. [31,32]. Finally, the data were segmented every 3 s, also marked by the time the stimulus started.
For each epoch, 11 statistical features were extracted, i.e., mean value, standard deviation, mean square error, skewness, root mean square, peak value, peak factor, kurtosis, waveform factor, pulse factor, and margin factor. A neurovascular coupling feature of the frontal EEG and frontal oxygen signals was added to calculate the zero-lagged correlation between the amplitude of HHb or O 2 Hb and the EEG frequency band power (in the four independent bands above). These HHb-or O 2 Hb-based NVO features are represented as oxidative neurovascular coupling and deoxy-neurovascular coupling, respectively. A total of 48 (2 channels × 2 × 11 + 2 channels × 2) fNIRS features were extracted.
To avoid disturbances from outliers, for EEG and fNIRS samples, we calculated the z-score of each subject's samples to remove samples with greater than three times the standard deviation.

Behavioral data analysis methods
Behavioral data collected in the course of this experiment mainly include the subjective scale score data filled out by the subjects and the task performance data generated during the task execution of the experimental platform. The subjects were asked to use the NASA-TLX scale to describe their subjective feelings of task load from six aspects: mental demand, physical demand, time demand, subjective evaluation of mission performance, effort, and frustration. To facilitate the analysis, the six dimensions of the subjective scale were normalized based on the maximum and minimum values and then averaged to obtain a comprehensive scale score. The MATB task used in this experiment consists of three sub-tasks: monitoring task, tracking task, and oil management task. Therefore, there are six performance indicators: the response time and response accuracy of the monitoring task; X-and Y-axis deviation of the tracking task; and fuel deviations of tank A and tank B for the oil management task. The performance data are normalized based on the maximum and minimum values and then averaged to obtain the comprehensive performance.
In order to analyze the influence of load level on tasks, we calculated Bonferroni-corrected ANOVA for overall performance as well as for comprehensive scale scoring, taking mental load level as a factor, and conducted post hoc comparison among various load levels. False discovery rate (FDR) correction was performed for multiple comparisons. The significance value was set at P < 0.05. By analyzing task behavior data, we can preliminarily summarize the law of the influence of task difficulty on mental load.

EEG channels selection methods
Given that not all areas of the human brain are directly related to mental workload, the 64-channel whole-brain EEG is unnecessary, and the fewer the channels, the easier it is to use in practice. The goal of this study was to use as few EEG channels as possible without losing accuracy. In order to find the brain regions that have high correlation with mental workload, the recursive feature elimination (RFE) algorithm based on SVM was used to filter the channels. RFE is a feature selection method with good performance and strong generalization ability [33]. The main idea is to select the best features by repeatedly building models (such as SVM), eliminating the selected features, and then repeating the above process on the remaining features until all the features are trawled. Considered to be physiological features closely related to mental load in reported studies [12,34], PSD features of four frequency bands including θ (3-8 Hz), α (8-13 Hz), β 1 (13)(14)(15)(16)(17)(18)(19)(20), and β 2 (20-30 Hz) of 64-channel EEG signals were calculated for screening. The top 100 features with the largest RFE results were selected of each subject for further analysis. Feature importance weight was defined as the number of people with this feature in the above-mentioned top 100 feature sets among 20 volunteers. Channel importance weight was defined as the sum of feature weights of four frequency bands of the channel.

Classification methods
In this study, three classification models, namely, Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), were adopted to model the extracted features.