EEG based Classification of Long-term Stress Using Psychological Labeling.

Stress research is a rapidly emerging area in the field of electroencephalography (EEG) signal processing. The use of EEG as an objective measure for cost-effective and personalized stress management becomes important in situations such as the unavailability of mental health facilities. In this study, long-term stress was classified with machine learning algorithms using resting-state EEG signal recordings. The labeling for the stress and control groups was performed using two currently accepted clinical practices: (i) the perceived stress scale score and (ii) expert evaluation. The frequency domain features were extracted from five-channel EEG recordings in addition to the frontal and temporal alpha and beta asymmetries. The alpha asymmetry was computed from four channels and used as a feature. Feature selection was also performed (via t-test) to identify features that differ significantly between the stress and control groups. We found that the support vector machine was best suited to classify long-term human stress when used with alpha asymmetry as a feature. It was observed that the expert evaluation-based labeling method improved the classification accuracy to as much as 85.20%. Based on these results, it is concluded that alpha asymmetry may be used as a potential bio-marker for stress classification when labels are assigned using expert evaluation.


Introduction
The response of the human body to a demand for change is considered as stress [1]. A balance exists between the sympathetic and parasympathetic arms of the autonomic nervous system in healthy people. A fight-or-flight response is invoked when there is an exposure to a threatening situation. Daily routine stress does not pose any danger to life, however, the fight-or-flight response may still be invoked. A persistence of this short-term stress for a longer duration can cause long lasting effects on the neurology of an individual and may give rise to depression [2]. Long-term stress is a better predictor of depressive symptoms as compared to short-term stress [3]. Long term stress is considered a risk factor for many health conditions such as cardiovascular diseases [4,5].
The prevention of the onset of depression requires a timely detection of long-term stress symptoms. Conventional psychological methods and analysis of hormones such as cortisol and alpha-amylase are widely used in long-term stress studies [6]. These methods are practical but they are affected by various factors, such as language and objectivity. For instance, the Perceived Stress Scale (PSS) is a widely used questionnaire to measure the level of chronic stress, validated extensively across diverse samples [7]. In general, however, a self-administered checklist cannot equal the precision of an interviewer trained to elicit the aspects of events that are critical for examining stress. Such interviews have been shown to provide substantially better information in comparison to relatively unassisted self-reporting mechanisms [8]. Respondents have been found to report minor or positive events in response to questions designed to elicit negative and undesirable events [9].
Psychological methods alone are not enough to assess stress-related conditions [10]. Stress can be quantified objectively from bio-markers like electroencephalography (EEG), galvanic skin response, and electrocardiography [11]. Recently, wearable systems were developed that can record electro-physiological signals (such as EEG and heart rate variability) to detect acute stress [12]. EEG is one of the most common sources of information for studying brain function [13][14][15][16][17]. The oscillations generated by the variation of electric potential in the brain are recorded using low resistance electrodes placed on the human scalp [18]. It is a widely used noninvasive method due to its excellent temporal resolution, ease of use, and low cost. EEG signals are categorized by their frequency bands including delta, theta, alpha, beta, and gamma. Each frequency band can be used as a discriminating feature for different brain states [19]. There are methods reported in the literature to quantify human acute stress in response to induced stressors using EEG signal recordings. In comparison, the classification of long-term or chronic stress using EEG has not been widely assessed.

Our Contributions
In this study, the problem of long-term human stress recognition is addressed by using PSS labels and expert evaluation, which has not been explored before. We have hypothesized that wearable sensors (such as those for recording brain activity using EEG electrodes) can be used for identifying chronic stress, without inducing stress using a stimulus. To this end, our experiments have shown that involving a psychology expert for labeling stressed and control subjects is beneficial for such a classification. It is important to note here that we did not use any stimulus in our study to induce stress so that this system can be administered for detecting stress in daily life routine. Two groups of participants were considered: the stressed group and the control group. A total of forty-five different features were extracted from EEG signals in the frequency domain to classify these two groups. Discriminating features were selected using a statistical significance test. Five different machine learning classifiers, namely support vector machine (SVM), Naive Bayes (NB), K-nearest neighbor (KNN), logistic regression (LR), and multi-layer perceptron (MLP), were used to classify human stress using the selected features. Due to limitations of the data size and the noisy nature of the signals, deep-learning-based systems were not suitable for the task at hand. Therefore, we concentrated on conventional machine learning classifiers that are better suited to this task. The summary of our findings in this study is as follows:
1. We used EEG signals acquired from 33 participants in the closed-eye condition using a five-channel EEG headset for long-term stress classification (no stimuli used to induce stress) and found that, among the different features, three frequency domain features differed significantly between the stress and control groups.
2. To the best of our knowledge, this is the first time that the stress level of participants was labeled by a psychology expert in an EEG-based study. We showed its feasibility with a validated set of experiments.
3. Conventional machine learning classifiers are well suited to long-term human stress classification and give better performance with psychological expert labeling.
The rest of the paper is organized as follows: Section 2 describes the related work. Section 3 presents the proposed methodology including data collection, feature extraction, and classification algorithms. Section 4 presents the results and a comparison with previously reported studies. Finally, the conclusion of the study is given in Section 5.

Related Work
Hemispheric specialization is a major concern in neuro-physiological research. Generally, a healthy brain at rest has a fairly balanced level of activity in both hemispheres [20]. The left hemisphere is associated with the processing of positive emotions, while the right hemisphere is associated with the processing of negative emotions [21]. The extent of asymmetry has been suggested to vary under conditions of chronic stress [22]. Frontal asymmetry is highly related to post-traumatic stress disorder (PTSD) [23]. The results in [24] have shown that the major depressive disorder (MDD) group is significantly right lateralized relative to controls, and that both the MDD and PTSD groups displayed more right- than left-frontal activity.
Recently, the feasibility of using EEG in classifying multilevel mental stress has been demonstrated [19], where alpha rhythm at the right pre-frontal cortex was suggested as a suitable bio-marker. A machine learning framework using EEG signals was proposed in [25], where stress was induced by using the Montreal imaging stress task (MIST), and SVM, NB, and LR classifiers were used to classify the stress level of participants. The EEG of participants in resting-state was recorded under negative, positive, and neutral stimulus using soundtracks from the international affective digitized sounds (IADS-2) dataset [26]. Stress detection based on frontal alpha asymmetry was performed using the DEAP dataset, and classification was performed using SVM, KNN, and fuzzy KNN [27]. In [28], a mobile EEG was used to assess stress in humans using EMOTIV EPOC headset in an out-of-lab environment. In an EEG based study, 11 participants were analyzed for the identification of long-term stress [10], including seven mothers of children with mental disability (stress group) and four mothers of healthy children (control group).
A variant of the trier social stress task (TSST) was used to assess stress in 49 participants [29]. Samples of the salivary cortisol and resting state EEG based alpha asymmetry were assessed before and after performing TSST. The frontal and parietal alpha asymmetry was used to classify depression in elderly people [30]. The correlation between frontal and parietal alpha asymmetry, the geriatric depression scale, and the mini mental state examination were analyzed. A high beta activity at the frontal and occipital lobes was observed on the visual input of negative images [31]. The frontal theta activity was shown to decrease due to a stressful mental arithmetic task [32]. In [33], low beta waves in closed eye condition were found to be a strong predictor of perceived stress, where PSS score was predicted by using multiple linear regression. The pre-frontal relative gamma power i.e., the ratio of gamma band and slow brain rhythms, was proposed as a bio marker for identification of stress [34,35].
The related studies presented here can be grouped as either short-term or long-term stress assessment. Short-term stress is measured using a stress eliciting task, while long-term stress is measured without performing any additional mental task. Different techniques have been adopted to measure stress, but most of these techniques require human intervention. Among different physiological measures, EEG has the potential to be used as a measure of stress in daily life. This is due to the fact that EEG headsets are becoming commercially available for observing brain activity in an easy to wear and cost effective manner. The proposed study uses EEG signals acquired with a commercially available EEG headset to identify baseline or long-term stress without relying on stress-inducing tasks.

Methodology
We devised a supervised machine learning model for the classification of human stress (Figure 1). A total of 33 volunteers participated in this study. The resting state EEG data for each participant were acquired using an EMOTIV Insight headset (https://www.emotiv.com/insight/) in a closed eye condition for three minutes. After EEG signal recording, participants were asked to fill in the PSS-10 questionnaire followed by an interview with the psychology expert. The average time for the interview was 25 min. Based on the PSS scores and interview, the psychology expert grouped each participant in either the stress or the control group. Noise was removed from the recorded EEG signals in the pre-processing stage. Neuro-physiological features including alpha (α), low beta (β_l), beta (β), gamma (γ), delta (δ), theta (θ), and relative gamma (RG) power were extracted from the signals at each electrode. Frontal and temporal alpha and beta asymmetries, as well as alpha asymmetry, were calculated from these features. Five supervised machine learning algorithms (SVM, NB, KNN, LR, and MLP) were used to classify human stress. Two different labeling methods were used: the perceived stress scale, and expert evaluation, where the PSS and interview scores were used together. A detailed description of these methods is presented in the following subsections. The flow of events during the data acquisition process is shown in Figure 2.
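The band-power and asymmetry features named above can be illustrated with a minimal sketch. The band edges, the naive DFT, and the log-ratio form of the asymmetry (ln of right-channel power minus ln of left-channel power, a common convention) are all assumptions made for illustration; they are not taken from the paper's own equations, and the signals are synthetic.

```python
import cmath
import math

FS = 128  # sampling rate in Hz, as stated for the headset

# Assumed band edges in Hz; the exact limits used in the study are not given here.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def power_spectrum(signal):
    """Return (frequencies, power) for the non-negative bins of a naive DFT."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * FS / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_power(signal, low, high):
    """Sum spectral power over bins with low <= f < high."""
    freqs, power = power_spectrum(signal)
    return sum(p for f, p in zip(freqs, power) if low <= f < high)


def log_asymmetry(power_left, power_right):
    """ln(right) - ln(left); an assumed, commonly used asymmetry convention."""
    return math.log(power_right) - math.log(power_left)


# Synthetic one-second signals: a 10 Hz sine, twice as large on the "right" channel.
left = [1.0 * math.sin(2 * math.pi * 10 * t / FS) for t in range(FS)]
right = [2.0 * math.sin(2 * math.pi * 10 * t / FS) for t in range(FS)]

alpha_left = band_power(left, *BANDS["alpha"])
alpha_right = band_power(right, *BANDS["alpha"])
beta_left = band_power(left, *BANDS["beta"])
asym = log_asymmetry(alpha_left, alpha_right)
```

For real recordings an FFT routine and a windowed spectral estimate would normally replace the naive DFT; the sketch only shows how a per-channel band power turns into a single asymmetry value.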

Data Acquisition
All EEG recordings were performed in a noise-free lab using the EMOTIV Insight headset, which records brainwaves and provides advanced electronics that are optimized to produce clean and robust signals. Its data transmission rate is 128 samples per second, which provides the ability to perform an in-depth analysis of the brain activity. It has a minimum voltage resolution of 0.51 µV per least significant bit (LSB) with 5 EEG electrodes at the AF3, AF4, T7, T8, and Pz locations and 2 reference electrodes. The headset is shown in Figure 3 with the five electrodes highlighted for reference. The device uses 14 bits for quantization, where 1 LSB = 0.51 µV. A 16-bit analog to digital conversion (ADC) is used, where 2 bits of instrumental noise floor are discarded. The reference electrodes CMS/DRL were located on the left mastoid bone. The participants were asked to close their eyes for a duration of three minutes and were instructed to keep their head still to reduce movement artifacts. This also helped to minimize muscular motion and the associated artifacts, since we recorded data at the frontal electrodes. A closed-eye condition was used, since correlates of long-term stress have been found in this condition in previous studies [10,33]. Another advantage of using the closed eye condition is the minimization of eye blink artifacts. EEG signal acquisition was performed using the EMOTIV Xavier TestBench v.3.1.21. EEG signals were recorded from the scalp of participants while they were seated in a comfortable chair. Our experiments were specifically carried out in the afternoon (between 3 and 5 pm), in line with similar studies in which the circadian rhythm was assumed to be similar across participants during this period.
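As a quick check of the acquisition figures above, the following sketch works out how many samples one three-minute recording yields per channel and the span covered by 14 bits at 0.51 µV per LSB. Treating the span as the number of quantization levels times the LSB size is an assumption made for illustration.

```python
SAMPLING_RATE_HZ = 128      # samples per second, from the headset description
RECORDING_SECONDS = 3 * 60  # three-minute closed-eye recording
LSB_MICROVOLTS = 0.51       # size of one quantization step
QUANTIZATION_BITS = 14      # effective bits after discarding the noise floor

# Samples stored per channel for one participant.
samples_per_channel = SAMPLING_RATE_HZ * RECORDING_SECONDS

# Assumed full-scale span: number of levels times the step size.
full_scale_microvolts = (2 ** QUANTIZATION_BITS) * LSB_MICROVOLTS
```

This gives 23,040 samples per channel and a span of roughly 8.4 mV.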

Pre-Processing
The EEG signals recorded from the scalp contained noise due to external interference. Before feature extraction, noise was removed from the signals for better classification results. The data recorded using the EEG electrodes provided by Emotiv have a DC offset in their value that should be removed before doing analysis based on fast Fourier transform. The average value of data from each channel was subtracted from the sample values to remove the DC offset. For reducing muscular artifacts, participants were instructed to minimize their head movements during the EEG acquisition. In a closed eye condition, blink artifacts were also found to be minimal. EMOTIV Insight has a frequency response of 1-43 Hz, which makes the signal noise free from AC line interference at 50 Hz.
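The DC offset removal described above amounts to subtracting each channel's mean from its samples. A minimal sketch follows; the channel values are made up for illustration and do not come from the recorded data.

```python
from statistics import fmean


def remove_dc_offset(channels):
    """Subtract each channel's mean value from every sample of that channel."""
    cleaned = {}
    for name, samples in channels.items():
        offset = fmean(samples)
        cleaned[name] = [s - offset for s in samples]
    return cleaned


# Hypothetical raw samples with a large constant offset.
raw = {
    "AF3": [4201.0, 4203.0, 4199.0, 4201.0],
    "AF4": [4100.0, 4102.0, 4098.0, 4100.0],
}
centered = remove_dc_offset(raw)
```

After this step each channel has zero mean, so the zero-frequency bin no longer dominates a subsequent Fourier analysis.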

Subject Labeling
The proposed method uses two types of labeling for supervised classification. PSS-10 was used for the questionnaire-based labeling method to subjectively evaluate the stress of participants. This questionnaire consists of ten questions. Each question asks the subject about the frequency of stressful events that have occurred during a period covering the last thirty days. The response for each question is on a scale of 0 to 4, where 0 represents that the event never occurred and 4 represents a frequent occurrence. The total PSS-10 score for each participant has a range between 0 and 40. The participants are divided into two groups, i.e., the control and stress groups, using the PSS score. A threshold was selected for this purpose, which was given by the following equation,

$$T_p = \mu \pm \frac{\sigma}{2} \tag{6}$$

where T_p is the threshold of the PSS score, µ is the mean, and σ is the standard deviation of the PSS scores.

The psychologist assigned labels for the stress and control groups after an expert evaluation based on the interview and PSS scores. During the interview, the expert investigated the physical, emotional, behavioral, and cognitive symptoms of stress. Physical symptoms included aches or pain, diarrhea or constipation, nausea, dizziness, chest pain, and rapid heart rate. Emotional symptoms of stress included depression, anxiety, moodiness, irritability, overwhelming feelings, and loneliness. Behavioral and cognitive symptoms included memory problems, inability to concentrate, poor judgment, negativity, racing thoughts, and constant worrying. The interviews were conducted by the psychologist, who was affiliated with a public sector hospital. The labels (control/stress) were assigned to participants by the expert based on the responses and the PSS score for each participant. The eighteen symptoms evaluated by the expert are presented in Table 1. The assigned labels were used as ground truth for training the system using the corresponding EEG recordings for each subject.
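The PSS-based grouping can be sketched as follows. The cut-offs reported in the results (17.33 and 23.47 for µ = 20.4 and σ = 6.14) are consistent with thresholds placed at µ ± σ/2, which this sketch assumes; the function names and the example scores are hypothetical.

```python
def pss_thresholds(mean, std):
    """Return (lower, upper) cut-offs placed at mean -/+ std / 2 (assumed form)."""
    half = std / 2
    return mean - half, mean + half


def label_from_pss(score, lower, upper):
    """Map a PSS-10 total score to a group; scores between the cut-offs are neutral."""
    if score < lower:
        return "control"
    if score > upper:
        return "stress"
    return "neutral"


# Mean and standard deviation of the PSS scores as reported in the paper.
lower, upper = pss_thresholds(20.4, 6.14)

# Hypothetical scores, one per outcome.
labels = [label_from_pss(s, lower, upper) for s in (12, 20, 30)]
```

Participants falling between the two cut-offs stay unlabeled under this scheme, which is why fewer than 33 participants enter the PSS-labeled groups.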

Stress Classification
In this study, five different types of classifiers were used for classification, which are described in the following subsections very briefly to make the manuscript self contained.

Support Vector Machine
A support vector machine uses statistical learning theory based on the principle of structural risk minimization. An SVM selects a hyper-plane that separates the feature space into control and stress groups according to the labels provided. The SVM is a highly efficient classifier and is widely used for stress classification in EEG-based studies [19,25]. The use of an SVM reduces the risk of over-fitting and provides good generalization performance.

The Naive Bayes
Naive Bayes is a probabilistic classifier based on Bayes theorem. It uses the maximum posterior hypothesis of statistics and works well for high dimensional input data. It is a nonlinear classifier and gives good results in real world problems. In addition, the Naive Bayes classifier requires a small amount of training data to approximate the statistical parameters [36].

K-Nearest Neighbors
KNN is an instance-based learning classifier, where training instances are stored in their original form. A distance function is used to determine the member of the training set that is nearest to a test example, and this member is used to predict the class. The distance function is easily determined if the attributes are numeric. Most instance-based classifiers use Euclidean distance for distance calculation. The distance between an instance with attribute values a_1, a_2, ..., a_n (where n is the number of attributes) and one with values b_1, b_2, ..., b_n is defined as

$$d = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_n - b_n)^2}$$
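A minimal sketch of the Euclidean distance and a nearest-neighbor majority vote is given below. The one-dimensional feature values and the choice k = 3 are hypothetical and serve only to illustrate the mechanism.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict the majority label among the k training instances nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical single-feature training set (for example, one asymmetry value each).
train = [
    ((0.1,), "control"), ((0.2,), "control"), ((0.3,), "control"),
    ((0.9,), "stress"), ((1.0,), "stress"), ((1.1,), "stress"),
]
prediction = knn_predict(train, (0.95,), k=3)
```

An odd k avoids tied votes in a two-class problem such as stress versus control.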

Logistic Regression
The logistic regression algorithm guards against over-fitting by penalizing large coefficients. The output is set to one for training instances belonging to the class and zero otherwise. Logistic regression builds a linear model based on a transformed target variable, where a transformation function converts a nonlinear function to a linear function.

Multi-Layer Perceptron
In a multi-layer perceptron structure, transfer functions are used for mapping inputs to the output. These functions include the sigmoid function, the rectified linear unit, and the hyperbolic tangent. The classifier is trained using back-propagation. Multi-layer perceptrons are trained by minimizing the squared error of the network output, essentially treating it as an estimate of the class probability, which is given by the following equation,

$$E = \frac{1}{2}\,(y - f(x))^2$$

where f(x) is the network prediction obtained from the output unit and y is the instance class label.

Dataset
A total of 33 participants related to the education field volunteered for this study. The participants reported no history of brain injury and they were not using any medications that could have affected their brain activity at the time of the experiment. Among these 33 healthy participants, 20 were male and 13 were female (60.6% male and 39.4% female). The participants' ages ranged from 18 to 40 years (µ = 23.85, SD = 5.48). In line with the Helsinki Declaration [37] and the departmental ethics guidelines, all participants of the study were briefed about the research goals. In addition, a signed informed consent was obtained from each participant. This study was approved by the Directorate of Advanced Studies and Research at the University of Engineering and Technology, Taxila.

Performance Parameters
The parameters used in this study include average accuracy rate, Kappa statistic, F-measure, mean absolute error (MAE), and root mean absolute error (RMAE). Accuracy is the ratio of correctly classified instances to the total number of instances in the recorded data. F-measure is calculated from the precision and recall values. The Kappa statistic is at most 1, where 0 represents chance-level classification and 1 means perfect classification; a value less than zero shows that the classification is worse than chance level. For stress classification, the generalization performance of the proposed system was tested using cross validation to avoid over- and under-fitting as well as to make sure that the proposed system adapts well to unseen data. A 10-fold cross validation technique was used in this study, where the data were randomly divided into ten equal parts (nine parts for training and one part for testing) and the process was repeated 10 times. During the process, every instance was used for testing exactly once, and the remaining instances were used for training the classifier.
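The accuracy, the Kappa statistic, and the fold construction described above can be sketched in a few lines. Cohen's kappa is computed here in its standard form, (p_o − p_e) / (1 − p_e); the label vectors, the random seed, and the helper names are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance; undefined when expected agreement is 1."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_observed = accuracy(y_true, y_pred)
    p_expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (p_observed - p_expected) / (1 - p_expected)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) so each instance is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical labels: one stressed participant is misclassified as control.
y_true = ["stress"] * 5 + ["control"] * 5
y_pred = ["stress"] * 4 + ["control"] * 6

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
splits = list(k_fold_indices(20, k=10))
```

With 20 labeled participants and ten folds, each fold tests on two instances and trains on the remaining eighteen.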

Stress and Control Group
The scores acquired from participants using the PSS questionnaire are shown in Figure 4. The green and red bars represent the PSS scores of participants belonging to the control and stress groups, respectively. The yellow bars indicate the PSS scores of participants not considered in either the stress or the control group. Overall, for the PSS scores we have (µ, σ) = (20.4, 6.14). A participant with a PSS score below 17.33 was considered to be in the control group, whereas a participant with a PSS score higher than 23.47 was categorized in the stress group. These values were calculated using the threshold criterion defined in Equation (6). Hence, 12 participants were put into the stress group (red bars) and 9 into the control group (green bars).
In expert (hybrid) evaluation, the psychology expert considered both PSS scores and the symptoms obtained from the interview method. The expert interviewed each participant for an average duration of 25 min. Out of the 33 participants, 10 were assigned to the stress group and 10 were assigned to the control group. The details about each participant regarding gender, age, PSS score, the label assigned using the PSS score, and the label assigned by the expert are given in Table 2. There were fifteen differences between the labels assigned using PSS scores and those assigned by the expert (hybrid) evaluation. The experimental results show that expert (hybrid) labeling helps in improving the classification of long-term stress. It is important to note here that in a majority of the cases regarding label mismatch (13 out of 15), the PSS score ranges between 17 and 25, which covers the neutral range. Since we hypothesize that the expert (hybrid) labeling is better suited for the classification task, we have used these labels as ground truth.

Table 2. Gender, age, PSS score, and labels for the participants according to PSS and expert-based (hybrid) labeling (A-control group, B-stress group, X-neutral).

Feature Selection Using t-Test
We used a two-sided Student's t-test with a significance level of 0.05, and the resulting p-values are shown in Table 3 for different EEG oscillations. For the t-test, the degree of freedom was 9 and the null hypothesis was tested for various features for the stress and control groups. It is evident that, at a significance level of 0.05, none of the extracted features were found to differ significantly between the stress and control groups when PSS-based labeling was used as the reference standard. It is also revealed that beta and gamma waves from AF3 are statistically significant features for the stress and control groups when labels assigned by expert evaluation were used as the reference standard. Five additional features, namely frontal (α_f) and temporal (α_t) alpha asymmetries, frontal (β_f) and temporal (β_t) beta asymmetries, and alpha asymmetry (α_a), were also used (see Equations (1)-(5)). Results of the t-test applied over these features in the stress and control groups are presented in Table 4. It can be seen that alpha asymmetry is statistically different between the stress group and the control group using expert-based labeling. Three significant features, namely beta (AF3), gamma (AF3), and alpha asymmetry, were selected for long-term stress classification based on the results of the t-test. The p-values of 0.04 and 0.03 for beta and gamma oscillations, respectively, indicated their statistical significance. The p-value of alpha asymmetry from frontal and temporal channels was 0.0005, indicating the statistical significance of alpha asymmetry from both temporal and frontal regions.
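The significance test can be illustrated with a small sketch of a two-sample Student's t statistic. The pooled-variance, independent-samples form used here is an assumption, since the exact variant is not spelled out above; the feature values are hypothetical, and the critical value is the standard two-sided 0.05 table entry for 8 degrees of freedom.

```python
import math
from statistics import fmean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (fmean(group_a) - fmean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical feature values for two small groups.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
stress = [3.0, 4.0, 5.0, 6.0, 7.0]

t_value, dof = student_t(control, stress)

# Two-sided critical value at the 0.05 level for 8 degrees of freedom (t-table).
T_CRITICAL_DF8 = 2.306
is_significant = abs(t_value) > T_CRITICAL_DF8
```

A feature would be retained for classification only when the test rejects the null hypothesis of equal group means; in this toy example the difference falls just short of the critical value.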
The box plots are presented in Figure 5, where the first row represents features acquired through PSS labeling, namely alpha asymmetry, beta, and gamma, respectively. The second row shows the same features acquired through expert evaluation. The + indicates an outlier, and the red line within the box represents the median value. A comparatively short box plot suggests that the feature values are in agreement with each other. A taller box plot suggests that the feature values are more widely spread. From the box plots (Figure 5b,c,e,f) it is observed that there is not much difference in the beta and relative gamma features to differentiate the stress and control groups for both expert- and PSS-based labels. However, the features in Figure 5a,d are good candidates, as they appear to differentiate the stress and control groups. In Figure 5a, the alpha asymmetry for the stressed group does not have a long lower whisker, which shows that alpha asymmetry does not vary much in the lower quartile, while in Figure 5d, the stressed group has varied alpha asymmetry as shown by the lower and upper whiskers. Also, in Figure 5d the median is comparatively at the center of the distribution. This suggests that alpha asymmetry is a good candidate to be used in the stress classification task.

Classification
We performed a comprehensive set of experiments to test and validate our proposed model using five classifiers, namely, KNN, NB, SVM, LR, and MLP. These classifiers were used with alpha asymmetry, beta, and gamma waves from channel AF3 as features to classify long-term stress. Each combination of the selected features was analyzed with each of the classifiers. The results of these classifiers in terms of average accuracy are shown in Table 5. We used 10-fold cross validation in these experiments since our dataset was limited. In each fold, 90% of the data were used for training and 10% for testing, and we report the average values of the parameters across all 10 folds. The hyper-parameters for the classifiers used in our experiments were chosen using a grid search.
We observed that the classifier accuracy was high whenever alpha asymmetry was used either as a single feature or in combination with other features. The SVM- and LR-based classifiers gave the highest accuracy when alpha asymmetry was used as a feature. The performance evaluation parameters for these classifiers are given in Table 6. We also observed that both SVM and LR show very similar values for the kappa statistic and F-measure. SVM has a slightly lower mean absolute error (0.15) than logistic regression (0.22), whereas LR has a lower RMAE (0.36) than SVM (0.38). The overall classification accuracy of both these classifiers is similar. Overall, we concluded that SVM may be a better choice for an assisting system for stress recognition.

Discussion
Numerous studies have analyzed brain activities under stressful conditions, which are induced by a task such as impromptu speech, examination, mental task, public speaking, and the cold pressor test [38][39][40][41][42][43]. These studies evaluate short-term induced stress, whereas the classification of long-term stress using EEG has not been widely investigated. In Table 7, studies involving EEG to classify human stress are presented for comparison. It is observed that different stress-inducing tasks were used, such as driving simulation, examination, and mental arithmetic tasks. Specialized instruments like MIST and Stroop tests were also used to induce stress. For chronic stress there could be several stressors that affect the physical, emotional, cognitive, or behavioral well-being of a human being. Therefore, it is proposed that recording resting state EEG, without involving stress induction, is a better choice for stress classification. The number of participants involved in such studies varies from 5 to 42. SVM and NB were used as classifiers in most of the studies. SVM was found to be the most efficient classifier, giving a maximum accuracy of 96% when stress was induced by a mental arithmetic test. In [10], the resting state EEG was recorded for two minutes and a nonlinear analysis was performed, but no classification algorithm was used. In [44], chronic stress was classified with an accuracy of 90%, using EEG recordings from eight electrodes and a stress-inducing condition.
Despite the difficulties of EEG in stress studies, there are cases where the use of EEG is vital and it has a clinical meaning in various conditions. For instance, ECG is not a direct stress measurement system, especially when mental stress originates in the brain. Furthermore, we studied long-term stress, and we did not have any stress inducer in our study (unlike other ECG- and HRV-based studies); hence, EEG can be a modality of choice for our experiments, and we show its effectiveness with our experimental results. Although EEG has not been widely used for long-term stress classification in clinical practice, our proposed method attempts to establish this approach. It has been shown that conditions such as anxiety, tension, and depression decrease as the frontal asymmetry shifts to the right hemisphere of the brain, giving significance to EEG laterality [22]. It was demonstrated that variations in the beta activity [31] and pre-frontal gamma [34] contribute towards stress assessment. Hence, there is evidence suggesting that these oscillations in the pre-frontal brain region can be used for assessment of stress using EEG recordings.
It is shown in this study that the alpha asymmetry of the brain can be considered as a potential marker for the recognition of chronic stress in humans. We observed (Table 5) that the classification accuracy using beta and gamma oscillations was lower when compared to alpha asymmetry. Whenever alpha asymmetry was combined with beta and gamma oscillations, the decision boundaries changed. Due to this, the classification accuracy was lower when compared to the case when alpha asymmetry was used individually as a feature. The labeling should be performed by using a hybrid method (psychology expert and PSS scores) for training the system in a supervised manner. Given the limited size of the data, MLP was the only neural-network-based classifier that we could fit to the task of stress classification. For deeper networks, we would need more instances of EEG recordings.

Conclusions
In this paper, two different labeling methods were used for the classification of long-term stress in humans using EEG signals. Forty-five signal features were analyzed for the classification of chronic stress, and alpha asymmetry was found to be a discriminating feature when using the expert's evaluation as ground truth. The PSS scores, when used solely for labeling, returned no significant features. Furthermore, it is evident from our experimental results that SVM and LR give the highest accuracy (85.20%) for classification. We also observed that the stress group was better classified when compared to the control group irrespective of the classifiers used. Finally, we established that alpha asymmetry can be used as a potential bio-marker for the classification of long-term stress with SVM. To the best of our knowledge, no previous EEG-based studies have involved a psychology expert for labeling of groups for long-term stress assessment. In the future, more features and participants will be considered for the analysis. With the availability of more data, deep-learning-based strategies can be applied for potentially improved methods.