Air Traffic Controller Workload Detection Based on EEG Signals

The assessment of the cognitive workload experienced by air traffic controllers is a complex and prominent issue in the research community. This study introduces new indicators related to gamma waves to detect controllers’ workload and develops experimental protocols to capture their EEG data and NASA-TLX data. Statistical tests, including the Shapiro–Wilk test and ANOVA, were then used to verify whether the controllers’ workload data differed significantly between scenarios. A Support Vector Machine (SVM) classifier was employed to assess the detection accuracy of these indicators across four categorizations. Based on the outcomes, hypotheses suggesting a strong correlation between gamma waves and an air traffic controller’s workload were put forward and subsequently verified; compared with traditional indicators, the gamma-associated indicators proposed in this paper achieve higher accuracy. In addition, to explore the applicability of the indicator with the highest accuracy, β + θ + α + γ, sensitive channels were selected with the mRMR algorithm; the recognition rate obtained with a single channel exceeded 95% of that obtained with all channels, which meets the practical requirements of convenience and accuracy. In conclusion, this study demonstrates that EEG gamma wave-associated indicators can offer valuable insights into the workload levels of air traffic controllers.


Introduction
The effects of prolonged, high-intensity mental activity on brain fatigue are well documented, with ramifications for mood and physical function [1]. In the case of air traffic controllers in particular, mental fatigue can significantly impair cognitive abilities, reaction times, and overall alertness, degrading operational performance or even leading to safety incidents [2][3][4][5]. Because air traffic controllers play a crucial role in civil aviation, detecting and analyzing their workload is of the utmost importance, particularly given the increasing volume of flights [6].
Current approaches to workload detection for air traffic controllers can be broadly categorized into subjective and objective methods. Subjective methods, such as the KSS scale and the NASA-TLX scale, are relatively easy to implement and widely applicable, but completing these scales frequently may disrupt experimental subjects and add to their workload [7][8][9]. Objective methods, by contrast, analyze physiological signals, for example EEG and ECG signals [10], eye and facial features [11], and voice features [12]; they offer high reliability and validity and capture physiological changes in real time. In short, subjective methods are easier and faster to implement and more widely applicable, whereas objective methods are more reliable and track changes as they happen.
Notably, the high temporal resolution and portability of EEG have established it as a dependable tool for evaluating the workload of air traffic controllers, making it an essential technical resource in this field [12][13][14]. In addition, studies have underscored the sensitivity of EEG to vigilance fluctuations and Sensors 2024, 24, 5301 3 of 19 the workload of controllers is imperative. Such a system should be portable, affordable, simple, and user-friendly in order to effectively support the needs of air traffic controllers.
In view of the above, researchers have made significant strides in studying controller fatigue with EEG signals. However, the airspace structure, operational habits, and procedures of Chinese controllers differ substantially from those of controllers in other countries [38,39]. Consequently, existing findings cannot be applied directly and must be adapted to the conditions of frontline control units.
To advance the detection and analysis of controllers' workloads and to verify whether gamma waves can be used for workload detection, this study conducted experiments to collect EEG data from controllers. Based on the experimental results, the following hypothesis was formulated: gamma waves can effectively detect changes in controller workload. The study then validates this hypothesis by comparing how well gamma-containing and non-gamma-containing indicators distinguish workload levels under various categorization methods. The most effective indicator is then used for workload detection, and its efficacy is assessed by its detection accuracy across different channel combinations. Through this process, the study aims to identify an indicator and a small number of channels that enable precise, straightforward, and rapid detection of controllers' workloads.

Materials and Methods
2.1. Experimental Design
2.1.1. Subjects and Experimental Environment
The study involved sixteen experienced controllers aged 27 to 37, each with at least three years of air traffic control experience. A control simulator was used to manipulate workload: participants performed control tasks in simulated scenarios with varying traffic density. EEG data were recorded with a 64-electrode Borecon NeuSen W-series wireless EEG acquisition system.

Experimental Scene Selection
The mental workload of air traffic controllers is heavily influenced by traffic volume, measured as the number of aircraft, including both landings and take-offs, that must be handled within a specific time frame. As traffic increases, controllers must manage more aircraft simultaneously to ensure their safe and efficient operation in the air and on the ground. This heavier workload entails a wide range of responsibilities, including assigning aircraft to appropriate routes and altitudes, coordinating and monitoring communications, and overseeing traffic flow. These responsibilities demand continuous vigilance over multiple aircraft and timely, accurate decisions and instructions, all of which contribute to the mental workload and stress experienced by controllers.
In addition to traffic flow, various abnormal situations, such as equipment malfunctions, foreign object debris on runways, specific operational demands of airports, and incident management, can significantly increase the workload of air traffic controllers. These diverse scenarios, collectively referred to as abnormal situations, require information processing, decision-making, and tailored guidance to ensure aviation safety, placing additional strain and psychological stress on controllers.
Given these factors, a preliminary experiment was conducted before the main experiment to select four distinct and representative exercises on the control simulator. The workload of each exercise was categorized by traffic level and by the presence or absence of an abnormal situation, yielding lower-load, higher-load, overload, and abnormal-situation scenes. In addition, a resting-state scene (labeled Scene 0) was included to record the EEG data of controllers with their eyes closed. The specific scene settings are detailed in Table 1. For the purposes of this study, an abnormal situation refers to the presence of foreign objects on the runway, and the scene labels were used to classify the collected EEG data. To determine the optimal order of the scenes, preliminary experiments compared the distinguishability of the data across different sequences, including "Scene 0-Scene 1-Scene 2-Scene 3-Scene 4", "Scene 4-Scene 3-Scene 2-Scene 1-Scene 0", and "Scene 0-Scene 3-Scene 2-Scene 1-Scene 4". The subjects' EEG data were most distinguishable in the order "Scene 0-Scene 4-Scene 3-Scene 1-Scene 2", so the subsequent experiments were carried out in this order.

Experimental Procedure
During the experiments, EEG electrodes were positioned according to the International 10–20 System, and signals were sampled at 1000 Hz. First, the EEG data of each subject in the eyes-closed resting state (Scene 0) were collected before the simulated working state. After the subject completed the NASA-TLX scale [40] for this scene, EEG data during the working state were recorded. Each scene lasted half an hour, followed by an additional five minutes to complete the NASA-TLX scale. Subjects performed all exercises so that complete EEG data and NASA-TLX subjective ratings were collected. The detailed process is illustrated in Figure 1.

Feature Extraction and Classification
After the experiment, the NASA-TLX data and EEG data were processed for feature extraction and classification. Before the EEG data were analyzed, several preprocessing steps were taken to ensure data quality. First, non-EEG channels, such as ECG, HEOR, HEOL, and VEOU, were removed, leaving 59 channels for subsequent analysis, as shown in Figure 2. To reduce noise and remove artifacts, a band-pass filter with a passband of 0.5–100 Hz was then applied to each subject's EEG signal, retaining neural activity within the frequency bands of interest while attenuating noise and out-of-band frequencies.
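The band-pass step can be illustrated with a short Python sketch. The authors worked in MATLAB's EEGLAB, so this SciPy version is only an analogue; the zero-phase fourth-order Butterworth design and the function name `bandpass` are assumptions, since the text specifies only the 0.5–100 Hz passband and the 1000 Hz sampling rate.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(eeg, fs=1000.0, low=0.5, high=100.0, order=4):
    """Zero-phase band-pass filter applied along the last axis.

    eeg: array of shape (n_channels, n_samples) or (n_samples,).
    The Butterworth design and its order are assumptions; only the passband is given.
    """
    # Second-order sections stay numerically stable for a 0.5 Hz edge at fs = 1000 Hz
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering cancels the phase delay of the filter
    return sosfiltfilt(sos, eeg, axis=-1)


# Synthetic check: keep a 10 Hz rhythm, suppress a 0.1 Hz drift and 300 Hz noise
fs = 1000.0
t = np.arange(12000) / fs
rhythm = np.sin(2 * np.pi * 10 * t)
raw = rhythm + np.sin(2 * np.pi * 0.1 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
clean = bandpass(raw, fs=fs)
# Away from the edges, the filtered signal should closely match the 10 Hz component
print(np.max(np.abs(clean[5000:7000] - rhythm[5000:7000])))
```

Zero-phase filtering is chosen here so that band-limited features are not shifted in time; a causal filter would be the alternative for online use.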
For re-referencing to the common average, a zero-filled channel was added to represent the initial reference, the average potential across channels was computed and subtracted from every channel, and the added channel was then removed [41]. With this procedure, the smallest eigenvalue of the data (after removing the 'initial reference') was about 10^0, far above the level at which the data would become rank-deficient, so ICA could operate on full-rank data. Independent Component Analysis (ICA) was then used to decompose the EEG signal and to identify and remove potential artifact or noise components, using the Infomax algorithm (runica.m) in MATLAB's EEGLAB toolbox with default parameter settings.
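The rank argument behind the zero-filled reference channel can be checked numerically. The NumPy sketch below uses random numbers in place of EEG and hypothetical function names; it contrasts a plain average reference, after which the channels sum to zero at every sample and one dimension is lost, with the zero-filled variant described above.

```python
import numpy as np


def average_reference(data):
    """Plain average reference: subtract the across-channel mean at every sample."""
    return data - data.mean(axis=0, keepdims=True)


def average_reference_keep_rank(data):
    """Average reference computed with a zero-filled channel standing in for the
    initial (recording) reference; the added channel is dropped afterwards."""
    padded = np.vstack([data, np.zeros((1, data.shape[1]))])  # add the zero-filled reference
    padded = padded - padded.mean(axis=0, keepdims=True)      # subtract the mean over n + 1 channels
    return padded[:-1]                                        # remove the added channel


rng = np.random.default_rng(0)
X = rng.standard_normal((8, 2000))  # 8 synthetic channels x 2000 samples
print(np.linalg.matrix_rank(average_reference(X)))            # → 7
print(np.linalg.matrix_rank(average_reference_keep_rank(X)))  # → 8
```

In the zero-filled version the mean is taken over n + 1 channels, so the n retained channels no longer sum to zero and the data stay full rank for ICA.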
Following ICA, additional artifact rejection was performed based on visual inspection and statistical criteria. Segments of the EEG signal contaminated by excessive noise, movement artifacts, or other disturbances were manually identified and removed from the analysis.
After these preprocessing steps, the clean EEG data were ready for feature extraction and classification. The entire preprocessing pipeline was implemented in MATLAB's EEGLAB, and each processing step was monitored and validated to ensure data reliability and validity.
After preprocessing, the aim was to distinguish changes in the brain activity of controllers under different workloads using the EEG power spectrum [42,43]. To this end, EEG indicators were extracted from the power spectrum and then used for classification.
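Power-spectrum indicators of this kind can be illustrated with a Python sketch that derives absolute and relative band energies, and sums of them such as β + θ + α + γ, from a periodogram. The band edges in `BANDS`, the plain FFT periodogram, taking relative energy as each band's share of the five-band total, and the function names are all assumptions rather than the authors' settings; for a fixed window length, band power and band energy differ only by a constant factor.

```python
import numpy as np

# Conventional band edges in Hz (assumed; the exact limits are not stated here)
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0),
         "beta": (13.0, 30.0), "gamma": (30.0, 100.0)}


def band_powers(x, fs):
    """Absolute power per band of a 1-D signal, from its one-sided periodogram."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * n)
    psd[1:] *= 2.0              # fold negative frequencies into the one-sided spectrum
    if n % 2 == 0:
        psd[-1] /= 2.0          # the Nyquist bin has no negative-frequency twin
    df = fs / n
    return {name: float(psd[(freqs >= lo) & (freqs < hi)].sum() * df)
            for name, (lo, hi) in BANDS.items()}


def relative_powers(absolute):
    """Each band's share of the total power over all listed bands."""
    total = sum(absolute.values())
    return {name: p / total for name, p in absolute.items()}


fs = 1000.0
t = np.arange(2000) / fs
x = 2.0 * np.sin(2 * np.pi * 10 * t)            # pure alpha-band test signal
absolute = band_powers(x, fs)
relative = relative_powers(absolute)
combined = sum(absolute[b] for b in ("beta", "theta", "alpha", "gamma"))  # β + θ + α + γ
print(round(absolute["alpha"], 6), round(relative["alpha"], 6))  # → 2.0 1.0
```

A sine of amplitude 2 has mean power 2, so the alpha band recovers the full signal power; ratio indicators such as γ/β would be formed from the same `absolute` dictionary.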
After feature extraction, a Pearson correlation analysis was conducted to investigate the relationship between the indicators and workload. The Pearson product-moment correlation coefficient measures the linear correlation between two variables X and Y and takes values between −1 and 1. It is calculated as follows:
$$ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$
where $n$ is the number of paired observations and $\bar{X}$ and $\bar{Y}$ are the sample means. A value of −1 indicates a perfect negative correlation, and +1 indicates a perfect positive correlation. The strength of a correlation is typically assessed from the absolute value of the coefficient: 0.8–1.0 indicates a very strong correlation, 0.6–0.8 a strong correlation, 0.4–0.6 a moderate correlation, 0.2–0.4 a weak correlation, and 0.0–0.2 an extremely weak correlation or no correlation [45].
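The coefficient and the verbal strength bands quoted above can be expressed in a few lines of Python. This is an illustrative sketch with hypothetical function names; assigning a boundary value such as exactly 0.6 to the upper band is an assumption, since the quoted ranges share their endpoints.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Verbal label for |r|; endpoints go to the higher band (an assumption)."""
    a = abs(r)
    if a >= 0.8:
        return "very strong"
    if a >= 0.6:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "extremely weak or none"


r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(r, strength(r))  # → 1.0 very strong
```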
Following the Pearson correlation analysis, a Support Vector Machine (SVM) classifier was used to classify the calculated indicators. An SVM is a generalized linear classifier that performs binary classification through supervised learning, using the maximum-margin hyperplane learned from the training samples as the decision boundary [46,47]. The dataset was randomly divided into a training set (70% of the samples), a testing set (15%), and a validation set (15%) to evaluate generalizability.
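A random 70/15/15 partition like the one described can be sketched in plain Python. The function name, the fixed seed, and rounding the subset sizes are assumptions made for illustration; the text does not state whether the split was stratified by label.

```python
import random


def split_70_15_15(samples, seed=0):
    """Shuffle and split into training (70%), testing (15%) and validation (15%) subsets."""
    items = list(samples)
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    rng.shuffle(items)
    n = len(items)
    n_train = round(0.70 * n)
    n_test = round(0.15 * n)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]     # the remainder absorbs any rounding
    return train, test, val


train, test, val = split_70_15_15(range(100))
print(len(train), len(test), len(val))  # → 70 15 15
```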
To enhance the generalizability of the results, the data were categorized in several ways and classified with the SVM: resting/working, 0/high/medium-low load, 0/low/medium-high load, and 0/low/medium/high load. For the resting/working classification, the EEG data of a controller at rest were labeled "resting", and the EEG data of a controller at work were labeled "working". To balance the number of samples across the two labels, one-fourth of the data from each work scene (Scenes 1–4) was randomly selected as the input data for the "working" label.
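The four labeling schemes and the balanced resting/working sampling can be written down explicitly. In this sketch the scene groupings follow those reported with the classification results (low load = Scene 1, medium load = Scene 2, high load = Scenes 3 and 4); the dictionary layout, the per-scene lists of EEG segments, and the helper names are hypothetical, and the balance holds only if every scene yields the same number of segments.

```python
import random

# Scene -> class label under each categorization (Scene 0 = eyes-closed rest)
SCHEMES = {
    "resting/working":   {0: "resting", 1: "working", 2: "working", 3: "working", 4: "working"},
    "0/high/medium-low": {0: "0", 1: "medium-low", 2: "medium-low", 3: "high", 4: "high"},
    "0/low/medium-high": {0: "0", 1: "low", 2: "medium-high", 3: "medium-high", 4: "medium-high"},
    "0/low/medium/high": {0: "0", 1: "low", 2: "medium", 3: "high", 4: "high"},
}


def label_segments(segments_by_scene, scheme):
    """Attach the class label of the chosen scheme to every segment of every scene."""
    mapping = SCHEMES[scheme]
    return [(seg, mapping[scene]) for scene, segs in segments_by_scene.items() for seg in segs]


def balanced_rest_work(segments_by_scene, seed=0):
    """All resting segments plus a random quarter of each work scene (1-4) as 'working'."""
    rng = random.Random(seed)
    data = [(seg, "resting") for seg in segments_by_scene[0]]
    for scene in (1, 2, 3, 4):
        segs = list(segments_by_scene[scene])
        data += [(seg, "working") for seg in rng.sample(segs, len(segs) // 4)]
    return data
```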
The data corresponding to each feature were classified with the SVM, and a grid search was used to identify the best parameter combination for each feature under each classification method. The best-performing feature was then chosen for controller workload detection based on its classification accuracy across the different workload categorizations.
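A grid search of this kind, using the training and validation subsets to choose parameters and the testing subset for the final accuracy, might look as follows with scikit-learn. The RBF kernel and the candidate values of `C` and `gamma` are assumptions, since the section does not report the kernel or the grid, and the helper name is hypothetical.

```python
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC


def grid_search_svm(X_train, y_train, X_val, y_val, X_test, y_test):
    """Pick SVM parameters by validation accuracy, then report test accuracy."""
    # Candidate grid is illustrative only
    grid = ParameterGrid({"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1, 1]})
    best_params, best_val = None, -1.0
    for params in grid:
        model = SVC(kernel="rbf", **params).fit(X_train, y_train)
        val_acc = model.score(X_val, y_val)   # mean accuracy on the validation set
        if val_acc > best_val:
            best_params, best_val = params, val_acc
    # Refit with the selected parameters and evaluate once on the untouched test set
    final = SVC(kernel="rbf", **best_params).fit(X_train, y_train)
    return best_params, best_val, final.score(X_test, y_test)
```

scikit-learn's `SVC` handles the three- and four-class schemes internally with a one-vs-one strategy, so the same helper serves every categorization.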

Application
The preceding work identified the most suitable indicator for detecting controller workload. The suitability of this indicator was then validated by assessing its detection accuracy in single-channel, multichannel, and full-channel settings.
This study employs the mRMR algorithm to select a single channel or a combination of channels for the indicator, with the objective of enhancing its applicability. The mRMR algorithm combines the relevance of features to the target with the redundancy between features to identify the most relevant and least redundant features [48]. The maximal relevance criterion is:
$$ \max D(S, c), \quad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c) $$
Here, $S$ represents the set of channels, $c$ represents the target variable (mental workload), $x_i$ is one of the channels, and $I(\cdot\,;\cdot)$ denotes mutual information. The minimal redundancy criterion is:
$$ \min R(S), \quad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j) $$
This formula quantifies the redundancy between channels. The criterion that combines the two constraints is called "minimal-redundancy-maximal-relevance" (mRMR):
$$ \max \Phi(D, R), \quad \Phi = D - R $$
Using this criterion, the set of minimal-redundancy-maximal-relevance channels for the best-performing indicator can be selected for each subject. The applicability of the indicator can then be analyzed based on its detection accuracy with different channels or channel combinations.
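The incremental form of this criterion can be sketched in plain Python. The sketch assumes each channel's indicator values have already been binned into discrete levels, applies the difference form Φ = D − R with greedy forward selection, and uses hypothetical names and toy data; it is not the authors' implementation.

```python
import math
from collections import Counter


def mutual_info(a, b):
    """Mutual information (nats) between two equal-length sequences of discrete values."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())


def mrmr(channels, target, k):
    """Greedy mRMR: add the channel maximizing relevance minus mean redundancy (Phi = D - R).

    channels: dict mapping channel name -> discretized indicator values (one per sample)
    target:   workload label per sample
    """
    relevance = {name: mutual_info(vals, target) for name, vals in channels.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]       # first pick: pure relevance
        redundancy = sum(mutual_info(channels[name], channels[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while len(selected) < min(k, len(channels)):
        remaining = [name for name in channels if name not in selected]
        selected.append(max(remaining, key=score))   # ties go to the first channel listed
    return selected


# Toy data: "Fz" and "Fz_copy" carry the same information, "Pz" carries complementary information
u = [0, 0, 0, 0, 1, 1, 1, 1]
v = [0, 0, 1, 1, 0, 0, 1, 1]
workload = [2 * a + b for a, b in zip(u, v)]
print(mrmr({"Fz": u, "Fz_copy": u, "Pz": v}, workload, k=2))  # → ['Fz', 'Pz']
```

The duplicated channel is skipped in the second step because its redundancy with the first pick cancels its relevance, which is exactly the behavior the combined criterion is meant to produce.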

Data of NASA-TLX
The NASA Task Load Index is a multidimensional assessment of workload developed by NASA's Ames Research Center.It is assessed on six scales: Mental Demands, Physical Demands, Temporal Demands, Own Performance, Effort, and Frustration.
In this study, NASA-TLX scores were used to evaluate workload; they were collected from eight subjects who rated their workload in each of the four scenes. As illustrated in Figure 3, although individual perceptions varied slightly across scenes, a consistent overall trend was observed. Scene 3 consistently exhibited the highest workload, and the scene exhibited nearly the lowest workload, whereas the other scenes indicated lower overall workloads. Moreover, Scene 4 consistently showed a higher workload than Scene 2.
Because the sample size was far below 5000, the Shapiro–Wilk normality test was used to check the validity of the data. If the p-values for all four scenes exceed 0.05, the data are considered normally distributed and the scenarios are deemed valid; if any p-value is below 0.05, the data are considered invalid and the experiment must be re-evaluated.
The results of the normality test are presented in Table 3. The p-values for all four scenes were greater than 0.05, indicating that the subjects' NASA-TLX scores in each scene followed a normal distribution. A one-way ANOVA was then conducted to assess whether workload differed significantly among scenes. It yielded an F statistic of 13.244 and a p-value of 9.69 × 10⁻⁷ (p < 0.05), confirming a significant difference in workload across the scenarios and thereby validating the choice of experimental scenes.
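The F statistic reported above is the ratio of between-scene to within-scene mean squares. The plain-Python sketch below computes it for illustration; the function name is hypothetical, and in practice `scipy.stats.f_oneway` (with `scipy.stats.shapiro` for the normality check) returns both the statistic and its p-value.

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F statistic with its (between, within) degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-scene variation: spread of scene means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-scene variation: spread of individual ratings around their scene mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # → (27.0, 2, 6)
```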

Results of Pearson's Coefficient
Before Pearson's coefficient was computed, outliers were identified and the data were scaled. The results, depicted in Figure 4, show that Pearson's coefficient for each indicator exceeds 0.5. This preliminary finding suggests a correlation between all 25 selected indicators and workload, with 13 indicators showing a strong correlation. These outcomes validate the suitability of all 17 selected indicators for further analysis.

Results of Classification
The classification results are detailed in Figure 5.
When the data were categorized into resting and working states, all indicators achieved an accuracy above 0.8, affirming their validity for this classification.
Furthermore, when the EEG data of controllers were divided into three categories, 0 load, high load (Scenes 3 and 4), and medium-low load (Scenes 1 and 2), the accuracy of all indicators was above 0.3333. In another three-class categorization, 0 load, medium-high load (Scenes 2, 3, and 4), and low load (Scene 1), the accuracy of all indicators reached 0.6 or above.
These three-class results indicate that the data can be separated and that the indicators are valid. When the data were divided into four categories, 0 load, low load (Scene 1), medium load (Scene 2), and high load (Scenes 3 and 4), the accuracy of all indicators was much higher than 0.25, indicating that the data can be effectively classified and that the indicators are valid.
For the resting/working and 0/low/medium-high classifications, the differences in classification accuracy between indicators are minimal; many indicators even have identical accuracy, making it difficult to compare their performance. For the 0/low/high and 0/low/medium/high classifications, indicators differ significantly in accuracy, but their rankings vary greatly between the two methods: some indicators perform better in the 0/low/high classification, while others excel in the 0/low/medium/high classification, which complicates feature selection. To obtain a metric that assesses the performance of indicators under different classification methods and enables comparisons between them, the accuracies were normalized as follows:
$$ f(x_i) = \frac{x_i - \min_j x_j}{\max_j x_j - \min_j x_j} $$
where $x_i$ is the classification accuracy of indicator $i$ under a given classification method, and the minimum and maximum are taken over all indicators under that method. $f(x_i)$ is the normalized value of the indicator and lies between 0 and 1: $f(x_i) = 0$ indicates the lowest accuracy under the corresponding classification method, and $f(x_i) = 1$ the highest. The normalization results are presented in Figure 6.
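The normalization and the comparison across classification methods can be sketched as follows. Combining the per-method normalized values by summation, and scoring a method in which all indicators tie as 0, are assumptions made for illustration; the function names and accuracy values are hypothetical.

```python
def min_max(accuracies):
    """Map the lowest accuracy under one classification method to 0 and the highest to 1."""
    lo, hi = min(accuracies), max(accuracies)
    if hi == lo:
        return [0.0] * len(accuracies)   # assumption: a method where all indicators tie adds nothing
    return [(a - lo) / (hi - lo) for a in accuracies]


def overall_scores(acc_by_method):
    """acc_by_method maps a method name to per-indicator accuracies (same indicator order).
    Returns one summed normalized score per indicator."""
    per_method = [min_max(accs) for accs in acc_by_method.values()]
    return [sum(column) for column in zip(*per_method)]


scores = overall_scores({
    "resting/working": [0.90, 0.90, 0.90],      # hypothetical accuracies
    "0/low/medium/high": [0.55, 0.70, 0.85],
})
print(scores)
```

Normalizing within each method keeps a method with a wide accuracy spread from dominating one with a narrow spread before the scores are combined.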
It is evident that indicator 15 (δ + β + θ + α + γ) demonstrates a superior categorization ability compared with indicator 5 (δ + β + θ + α) across all categorizations. This suggests that indicators containing gamma waves are more effective at capturing the changes in EEG characteristics that result from workload variations among air traffic controllers.
Furthermore, among the top nine indicators, all except γ/β and γ/α are absolute energy indicators, whereas the lower-ranked indicators are relative energy indicators. This leads to the speculation that absolute energy may reflect the workload changes of controllers more accurately than relative energy.
Within the absolute energy indicators, δ and θ rank lower, whereas β, α, and γ rank higher. This finding implies that controllers remain alert during their work, corroborating previous studies [26], and further supports the reliability of the experimental data.

Hypothesis and Verification
Based on the results above, the following hypotheses are proposed.
H1: Gamma waves can be used to detect changes in the controller's workload.
H2: Compared with relative energy, absolute energy better reflects controller workload.
To test the hypotheses, new indicators were proposed and processed as described previously. Because the earlier results showed that absolute energy indicators performed better, the new indicators were absolute energy indicators combining various frequency rhythms. A total of 49 indicators, organized into 14 groups of gamma-containing and non-gamma-containing indicators, were analyzed, and their results were normalized and ranked in Table 4. The ranking again confirmed that absolute energy reflects the controllers' state under varying workloads more effectively than relative energy. Among the 49 indicators, indicator 49 (β + θ + α + γ) obtained the highest score across all categorizations, making it the most suitable for detecting the workload of controllers.
To test hypothesis H1, the indicators with and without gamma were compared; the results, displayed in Figure 7, support the hypothesis. In every group, the gamma-containing indicators scored higher than the non-gamma-containing indicators.
Furthermore, to assess whether the scores of gamma-containing and non-gamma-containing indicators differ significantly, a paired-samples t-test was applied to the two data sets. The test yielded a t-statistic of −2.721 with a p-value of 0.0175. Because the p-value is below 0.05, there is a significant difference between the two sets of data. This outcome validates hypothesis H1 and supports the conclusion that gamma waves can be used to detect the workload of controllers.
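The paired-samples t statistic is computed from the per-group differences between matched scores. A plain-Python sketch with a hypothetical function name follows; `scipy.stats.ttest_rel` would additionally supply the p-value. The sign of t depends on which set is passed first.

```python
import math


def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for matched sequences a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched sequences of length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))  # sample SD of the differences
    return mean / (sd / math.sqrt(n)), n - 1
```

With 14 matched groups, as in the comparison above, the test has 13 degrees of freedom.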
To evaluate hypothesis H2, the scores of the absolute energy indicators and the relative energy indicators are compared in Figure 8. All absolute energy indicators ranked higher than the relative energy indicators, except for indicators 1 and 2.
To ascertain whether the absolute and relative energy indicator scores differ significantly, statistical tests were conducted on the two data sets. First, Levene's test for homogeneity of variances, which is based on the deviation of each group's observations from its group mean, was applied; it gave p = 0.00325, indicating that the variances of the two data sets were not homogeneous. Given the unequal variances and sample sizes, Welch's t-test was then carried out, yielding t = 7.456 and p = 1.63 × 10⁻⁸. Since p < 0.05, the difference between the two data sets is significant. This result supports H2, indicating that absolute energy reflects the state of controllers under different workloads more effectively than relative energy.
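The two-step procedure for H2 (Levene's test on deviations from group means, then Welch's t-test with Welch–Satterthwaite degrees of freedom) can be sketched as below. This is an illustrative implementation, not the authors' code, and the helper names are hypothetical. It uses the mean-centered form of Levene's test, matching the description above; note that `scipy.stats.levene` centers on the median by default, so `center='mean'` would be needed to match, while `scipy.stats.ttest_ind` with `equal_var=False` performs the Welch test.

```python
import math
import statistics


def _betainc(a, b, x):
    """Regularized incomplete beta I_x(a, b) via Lentz's continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):  # symmetry relation converges faster
        return 1.0 - _betainc(b, a, 1.0 - x)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x)) / a
    tiny = 1e-300
    f, c, d = 1.0, 1.0, 0.0
    for i in range(300):
        m = i // 2
        if i == 0:
            num = 1.0
        elif i % 2 == 0:
            num = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        else:
            num = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        cd = c * d
        f *= cd
        if abs(1.0 - cd) < 1e-14:
            break
    return front * (f - 1.0)


def t_two_sided_p(t, df):
    """P(|T| > |t|) for Student's t with (possibly non-integer) df."""
    return _betainc(df / 2.0, 0.5, df / (df + t * t))


def f_sf(w, d1, d2):
    """Upper tail P(F > w) for F(d1, d2) = I_{d2/(d2+d1*w)}(d2/2, d1/2)."""
    if w <= 0:
        return 1.0
    return _betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * w))


def levene_mean(*groups):
    """Levene's test using absolute deviations from each group's mean."""
    k = len(groups)
    means = [statistics.mean(g) for g in groups]
    z = [[abs(v - m) for v in g] for g, m in zip(groups, means)]
    n_total = sum(len(g) for g in z)
    z_means = [statistics.mean(g) for g in z]
    z_grand = statistics.mean([v for g in z for v in g])
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    w = (n_total - k) / (k - 1) * between / within
    return w, f_sf(w, k - 1, n_total - k)


def welch_t_test(a, b):
    """Welch's t-test for unequal variances; returns (t, df, p)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, t_two_sided_p(t, df)
```

A small Levene p-value is what motivates switching from Student's t-test to Welch's, whose non-integer df absorbs the variance imbalance.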

Results of Application
To identify the channels most relevant to controller workload, the mRMR (minimum redundancy maximum relevance) algorithm was applied with workload as the target variable. Because β + θ + α + γ achieved the highest accuracy among the 49 indicators, mRMR was run on this indicator. Figure 9 shows the resulting channel screening, namely the top 10 ranked channels for each of the 16 subjects.
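The greedy mRMR selection can be illustrated with a minimal sketch. The variant (the difference form: relevance minus mean redundancy), the equal-frequency discretization into five bins, and the names `mrmr_rank` and `mutual_info` are assumptions, since the section does not specify them. `X` holds one indicator value per channel for each sample, and `y` holds the workload labels.

```python
import numpy as np


def _discretize(x, n_bins=5):
    """Equal-frequency binning (an assumption; the paper does not state its scheme)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")


def mutual_info(a, b):
    """Mutual information (nats) between two discrete vectors."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))


def mrmr_rank(X, y, k):
    """Greedy mRMR: pick the channel maximizing relevance - mean redundancy."""
    Xd = np.column_stack([_discretize(X[:, j]) for j in range(X.shape[1])])
    relevance = np.array([mutual_info(Xd[:, j], y) for j in range(Xd.shape[1])])
    selected = [int(np.argmax(relevance))]
    remaining = set(range(Xd.shape[1])) - set(selected)
    while remaining and len(selected) < k:
        def score(j):
            redundancy = np.mean([mutual_info(Xd[:, j], Xd[:, s]) for s in selected])
            return relevance[j] - redundancy
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Running `mrmr_rank` separately on each subject's data with `k=10` would yield one top-10 list per subject, which is the form of output summarized in Figure 9.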
The analysis showed that each subject had a different channel ranking, so the algorithm alone cannot determine a single best channel combination. However, some channels appeared frequently across subjects, suggesting that, although a universally optimal combination may not exist, channel combinations that perform well for most subjects can be identified. The occurrence of each channel was therefore counted, and the results are presented in Table 5. CP6 and TP5 each appeared 14 times, and CP5 and CP2 each appeared 13 times, i.e., for at least 75% of the 16 subjects.
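The occurrence count can be expressed in a few lines. This is a sketch assuming each subject's mRMR output is a list of channel names; `frequent_channels` is a hypothetical helper. With 16 subjects, the 75% threshold is 12 appearances, so channels found 13 times (81.25%) or 14 times (87.5%) qualify.

```python
from collections import Counter


def frequent_channels(top_lists, min_fraction=0.75):
    """Count channel appearances across subjects' top-k lists.

    Returns the full Counter and the channels that appear for at least
    `min_fraction` of the subjects, most frequent first.
    """
    # dict.fromkeys drops repeats within one subject while keeping order
    counts = Counter(ch for top in top_lists for ch in dict.fromkeys(top))
    threshold = min_fraction * len(top_lists)
    keep = [(ch, n) for ch, n in counts.most_common() if n >= threshold]
    return counts, keep
```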

Based on these findings, the frequently selected channels were organized into combinations, and the data were classified using the method described previously. In total, 15 combinations were evaluated, covering single-channel, dual-channel, triple-channel, and quad-channel configurations; their detection accuracies are shown in Figure 10. The combination of channels 40 and 41 achieved notably high classification accuracy, reaching up to 80% of the full-channel accuracy regardless of the classification mode.
Using Indicator 49, controller workload was then detected with individual channels and channel combinations, with labels categorized as described above. Across the combinations in Figure 10, the full-channel configuration achieved the highest detection accuracy under every categorization method. Among the remaining options, CP4 performed best: in the resting/working and low/medium-high load scenes, its classification accuracy reaches up to 95% of the full-channel accuracy. Even in low/medium-high load scenarios, the accuracy drops slightly but still reaches up to 85.93% of the full-channel accuracy. In quadruple classification, the single-channel accuracy of channel 38 (0.575) even exceeds the full-channel accuracy, showing that the number of channels does not directly determine classification accuracy. Moreover, under the different classification methods, CP4 alone surpasses all channel combinations except the full-channel configuration and the TP7 and CP6 combination. Overall, the combination of Indicator 49 and CP4 offers both convenience and accuracy in detecting the workload of controllers.
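The count of 15 combinations follows from taking every non-empty subset of four channels: C(4,1) + C(4,2) + C(4,3) + C(4,4) = 4 + 6 + 4 + 1 = 15 = 2⁴ − 1. The sketch below enumerates them, assuming the four frequently selected channels from Table 5 (CP6, TP5, CP5, CP2) are the ones combined, and expresses a subset's accuracy as a fraction of the full-channel accuracy, the measure used above; `channel_subsets` and `relative_accuracy` are hypothetical helpers.

```python
from itertools import combinations

# Assumption: the four frequently selected channels from Table 5 are combined.
FREQUENT = ["CP6", "TP5", "CP5", "CP2"]


def channel_subsets(channels):
    """All non-empty channel subsets: single, dual, triple, and quad."""
    return [combo
            for size in range(1, len(channels) + 1)
            for combo in combinations(channels, size)]


def relative_accuracy(subset_acc, full_acc):
    """Accuracy of a channel subset as a fraction of full-channel accuracy."""
    return subset_acc / full_acc


subsets = channel_subsets(FREQUENT)
print(len(subsets))  # 15 = 4 + 6 + 4 + 1
```

A relative accuracy above 1.0 corresponds to the case reported for channel 38, where a single channel outperforms the full montage.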

Comparison between Indicators
In exploring the application of EEG signals to assessing the workload of controllers, this study initially selects EEG power spectrum indicators associated with delta, theta, alpha, and beta waves. Subsequently, a new set of EEG power spectrum indicators related to gamma waves is proposed. A control simulation experiment is designed to collect experimental data, validating the correlation between these indicators and workload.
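Absolute band energies of this kind can be estimated from a periodogram. This minimal sketch assumes the band limits in `BANDS` (the section does not state them) and uses the mean-square amplitude within each band as the absolute energy; a combined indicator such as β + θ + α + γ is then the sum of the corresponding band values. `band_energies` and `combined_indicator` are hypothetical names.

```python
import numpy as np

# Assumed band limits in Hz (lower bound inclusive, upper bound exclusive).
BANDS = {"delta": (1.0, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0),
         "beta": (13.0, 30.0), "gamma": (30.0, 45.0)}


def band_energies(x, fs):
    """Absolute power of one EEG channel in each band, from a periodogram.

    By Parseval's relation, a positive-frequency bin k (excluding DC and
    Nyquist) carries a mean-square amplitude of 2|X_k|^2 / N^2.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x - x.mean())  # remove DC offset first
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = 2.0 * np.abs(spectrum) ** 2 / n ** 2
    return {name: float(power[(freqs >= lo) & (freqs < hi)].sum())
            for name, (lo, hi) in BANDS.items()}


def combined_indicator(energies, bands):
    """Sum of absolute band energies, e.g. ("beta", "theta", "alpha", "gamma")."""
    return sum(energies[b] for b in bands)
```

For example, a 10 Hz sine of amplitude 2 sampled at 256 Hz for 4 s places all of its power (2²/2 = 2) in the alpha band.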
The results show that the classification accuracy of indicator 15 (δ + β + θ + α + γ) exceeds that of indicator 5 (δ + β + θ + α), and that the accuracy of indicator 14 (γ absolute energy) exceeds those of the delta, theta, alpha, and beta waves. Moreover, absolute energy reflects controllers' workload better than relative energy. These observations led to the hypotheses that gamma waves can be used to detect controller workload and that absolute energy outperforms relative energy, and novel absolute energy indicators were proposed to test them. Among the 14 groups of indicators, those including gamma waves show higher classification accuracy than those without gamma waves.
As Figure 11 shows, the results for the absolute energy of delta, theta, alpha, and beta waves agree with existing research [40]: these energies effectively detect controllers' workload, achieving classification accuracies of 80% or higher in distinguishing between resting and working states, and they also give satisfactory results in triple and quadruple classification. However, under every classification method, the absolute energy of gamma waves introduced in this paper is more accurate than these indicators. In line with previous research [30], this study also confirms the effectiveness of the relative energy indicators θ/β, α/β, (α + θ)/β, and (α + θ)/(α + β) for evaluating the workload of controllers. These indicators exceeded 80% accuracy in distinguishing between resting and working states and yielded satisfactory results in triple and quadruple classification. Nonetheless, as Table 6 shows, their accuracies consistently lag behind that of the γ/β relative energy indicator introduced in this paper, regardless of the classification mode. In conclusion, these results demonstrate that EEG gamma wave-associated indicators can offer valuable insights into the workload levels of air traffic controllers.
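The relative energy indicators compared above are ratios of absolute band energies. A minimal sketch, assuming a dict of absolute band energies keyed by band name; the key names and the helper `relative_indicators` are hypothetical.

```python
def relative_indicators(e):
    """Ratio indicators named in the text, from a dict of absolute band energies."""
    return {
        "theta/beta": e["theta"] / e["beta"],
        "alpha/beta": e["alpha"] / e["beta"],
        "(alpha+theta)/beta": (e["alpha"] + e["theta"]) / e["beta"],
        "(alpha+theta)/(alpha+beta)": (e["alpha"] + e["theta"]) / (e["alpha"] + e["beta"]),
        "gamma/beta": e["gamma"] / e["beta"],
    }
```

Because each ratio divides out a common scale, these indicators are insensitive to overall signal amplitude, which is one plausible reason they behave differently from absolute energy.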

Application of Research Results
A thorough analysis of the 49 indicators under four classification methods showed that the ranking of indicator accuracy varies across methods, indicating that accuracy depends on the classification method. Careful selection of both the indicators and the classification method is therefore crucial when using EEG for workload assessment.
Notably, indicator 49 (β + θ + α + γ) was robust across categorization methods and consistently achieved the highest accuracy, indicating strong generalizability. This finding aligns with existing research [49] on the efficacy of combined frequency rhythms over individual rhythms for various purposes. In addition, the gamma-related EEG features introduced in this study are better suited to detecting controllers' mental workload than conventional features. In summary, the findings of this paper can contribute to more accurate and convenient detection of controller workload.

Contributions and Limitations
The study confirmed the effectiveness of the indicator β + θ + α + γ in assessing controller workload: with a single channel, its recognition rate exceeds 95% of the full-channel rate, which supports both convenience and accuracy in practical applications. Furthermore, the research shows the value of EEG gamma wave-associated indicators for gaining insight into air traffic controllers' workload levels.
Notably, during the verification process, an unexpected discovery was made regarding the higher accuracy of most absolute energy indicators compared to relative energy indicators.However, due to time constraints, this finding was not extensively investigated.Therefore, researchers are encouraged to further explore this conclusion by proposing additional absolute and relative energy indicators based on this hypothesis for experimental verification.

Conclusions
In this study, novel power spectrum indicators based on gamma waves were proposed for identifying controller workload in an EEG-based system. The results demonstrate that the gamma-related indicators presented in this paper achieve higher accuracies than traditional indicators. The main findings are as follows: (1) β + θ + α + γ achieves the highest accuracy regardless of the classification method. This indicator makes it feasible to detect controller workload with a single channel, with a recognition rate exceeding 95% of the full-channel rate, which meets practical requirements for both convenience and accuracy. (2) The best-performing channel or channel combination varies from subject to subject. (3) The classification method strongly affects classification accuracy. Of the two triple classification methods used in this study, low/medium-high/low load usually yields higher accuracy than low/high/medium-low load. (4) Different controllers perceive the workload of specific scenes differently, possibly because of differences in work experience.

Figure 1. Experimental procedure. Each scene lasted 30 min, and the NASA-TLX scale for the scene was filled out at the end of the scene.

Figure 2. Channel locations by name.

Figure 3. Workload of each subject. Scene 3 always had the highest workload, while Scene 1 always had the lowest. Moreover, Scene 4 consistently demonstrated a higher workload than Scene 2.

Figure 4. Pearson coefficient heat map. Darker colors represent greater Pearson correlation coefficients between the indicator and workload. The Pearson coefficient of each indicator is greater than 0.5.

Figure 5. Classification results of the indicators under different classification methods. Each column represents a classification method, and darker colors within a column indicate better classification results. In each column, red arrows indicate the worst classification accuracy for that method, yellow effects indicate moderate classification accuracy, and green arrows indicate the best classification accuracy.

3.3. Hypothesis and Verification Based on the Results of Classification
3.3.1. Results of Classification
Additionally, the results of the classification process are detailed in Figure 5.
Sensors 2024, 24, x FOR PEER REVIEW 9 of 19

accuracy under the classification method. The normalization results are presented in Figure 6.

Figure 6. The bar charts show the normalized data for the 25 indicators under different classifications, and the line graphs show the means of the normalized data under different classifications. Indicators are sorted from left to right in descending order of average value. Some indicators have fewer than 4 bars because f(x_i) = 0 under some classifications.

Figure 7. Comparison chart of control groups. The bar chart represents the scores of non-gamma-containing indicators, and the line chart represents the scores of gamma-containing indicators. The gamma-containing indicators always score higher than the non-gamma-containing indicators.

Figure 8. Scores for various types of indicators. The indicator score is the average of the normalized classification accuracies under different classification methods and represents the performance of the indicator. The closer the score is to 1, the better the indicator performs.

Figure 9. Output of the mRMR algorithm: the top 10 channels for each subject, with each subject's channels colored as shown.

Figure 10. Comparison of channel accuracy. Controller workload is detected with Indicator 49 using different channels and channel combinations. Labels follow the categorizations described above; the full-channel configuration gives the highest detection accuracy under every categorization method.

Figure 11. Comparison of absolute energy indicators across five bands. Regardless of the classification method, the absolute energy of gamma is more accurate than the absolute energies of delta, theta, alpha, and beta.

Table 1. Control scene selection in the experiment.

Table 2. Proposal of indicators.

Table 3. Shapiro–Wilk (S-W) test of NASA-TLX data.

Table 6. Comparison of relative energy indicators.
