Automatic Sleep Staging Based on EEG-EOG Signals for Depression Detection

In this paper, an automatic sleep scoring system based on electroencephalogram (EEG) and electrooculogram (EOG) signals was proposed for sleep stage classification and depression detection. Our automatic sleep stage classification method comprised preprocessing based on independent component analysis (ICA); extraction of spectral features, spectral edge frequency features, absolute spectral power, statistical features, Hjorth features, maximum-minimum distance and energy features; and feature selection with a modified ReliefF algorithm. Finally, a support vector machine (SVM) was employed to classify four states: awake, light sleep (LS), slow-wave sleep (SWS) and rapid eye movement (REM). On the Sleep-EDF database, the overall accuracy reached 90.10 ± 2.68%, with a kappa coefficient of 0.87 ± 0.04. Furthermore, a depression recognition method was developed to distinguish patients with depression from healthy subjects. Based on the differences in sleep patterns between the two groups, it used REM latency, sleep latency, LS proportion, SWS proportion, sleep maintenance and arousal times as features. Sleep data from 12 healthy individuals and 19 patients with depression were applied to the system, and the recognition accuracy reached 95.24%, supporting the feasibility of our approach.


Introduction
Sleep is a fundamental function of the brain and plays an essential role in an individual's performance, learning ability and physical movement. Sleep staging is the gold standard for analyzing human sleep; its aim is to identify the sleep stages, which are vital in diagnosing and treating sleep disorders. Traditionally, doctors evaluate the quality of a patient's sleep through manual sleep staging, but it is time-consuming, taking experts hours per recording. In recent years, a number of studies have attempted to automate sleep stage scoring using single channels from the public Sleep-EDF dataset. Supratak et al. proposed DeepSleepNet, a deep learning model that combines a convolutional neural network (CNN) with a bidirectional long short-term memory network, for automatic sleep stage scoring from raw single-channel EEG signals; its overall accuracy reached 82.00%, with a Cohen's kappa coefficient of 0.76 [9]. Mousavi et al. proposed SleepEEGNet, which is composed of deep convolutional neural networks and uses a single EEG channel; it achieved an overall accuracy of 84.26% and a kappa coefficient of 0.79 [10]. Ghimatgar et al. used relevance and redundancy analyses for sleep stage classification with a single EEG signal and obtained an overall accuracy of 83.52%, with a kappa value of 0.75 [11]. The EOG signal is also often used for sleep staging. Rahman et al. extracted moment-based and entropy-based features from the discrete wavelet transform (DWT) bands of EOG signals and reached an accuracy of 83.00% [12]. Phan et al. combined EEG and EOG signals for sleep stage classification and obtained an overall accuracy of 82.30% with a multitask 1-max CNN [13]. Overall, the classification accuracy of the automatic sleep stage classification methods reported in the literature varies considerably, from 70% to 84%, and their sensitivity and specificity remain below 90%.
Multimodal physiological data, which provide more information, may help pattern recognition [14]. Given the modest accuracies above, research on automatic sleep staging is still in its infancy, and more effective and accurate methods are needed.
In this study, we proposed an automatic sleep staging method based on EEG-EOG signals. First, the signals were preprocessed with independent component analysis (ICA), and features of different dimensions were extracted from the time domain, frequency domain and time-frequency domain. Then, a modified ReliefF method was proposed for feature selection. Finally, the sleep stages were classified with a support vector machine classifier. The proposed sleep staging method achieved an average accuracy of 90.10% on the Sleep-EDF dataset, improving sleep staging accuracy compared with the most recent benchmark approach. Furthermore, we used the results of sleep staging to distinguish patients with depression from healthy controls. The experiments yielded an average accuracy of 95.24%, indicating that sleep staging analysis can contribute to convenient, quick and accurate depression detection.

Sleep-EDF
Sleep-EDF is a public sleep research dataset containing recordings from two studies: sleep recordings of healthy subjects (SC) and sleep telemetry data of subjects with mild difficulty falling asleep (ST) [21]. Each PSG recording contains two EEG channels (Fpz-Cz and Pz-Cz), one horizontal EOG signal, one EMG signal, one oronasal respiration signal and one body temperature signal. The EEG and EOG signals are sampled at 100 Hz, and the other signals at 1 Hz. Sleep experts manually scored these recordings according to the R&K standard into W (wake), REM, stages 1, 2, 3 and 4, M (movement time) and ? (not scored). In this work, we selected 38 sleep recordings of 20 subjects (age 28.7 ± 3.0 years) from SC and used their labeled sleep-period epochs, ignoring movement times. In addition, we merged S1 and S2 into light sleep (LS) and S3 and S4 into slow-wave sleep (SWS), so the system distinguishes four stages: W, LS, SWS and REM.
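The stage merging described above amounts to a lookup table applied to each 30-s epoch. The following is a minimal Python sketch, assuming the hypnogram annotations have already been reduced to short codes ("W", "1" to "4", "R", "M", "?"); the table and the helper name `merge_stages` are hypothetical.

```python
# Hypothetical short codes for R&K stages; "M" (movement time) and "?" (not scored)
# have no entry and are therefore dropped.
RK_TO_4CLASS = {
    "W": "W",
    "1": "LS", "2": "LS",      # S1 + S2 -> light sleep
    "3": "SWS", "4": "SWS",    # S3 + S4 -> slow-wave sleep
    "R": "REM",
}

def merge_stages(rk_labels):
    """Map per-epoch R&K labels to the four classes W/LS/SWS/REM.

    Returns (kept_indices, merged_labels) so that the matching feature rows
    can be selected with the same indices.
    """
    kept, merged = [], []
    for i, label in enumerate(rk_labels):
        if label in RK_TO_4CLASS:          # skips "M" and "?"
            kept.append(i)
            merged.append(RK_TO_4CLASS[label])
    return kept, merged

idx, labels = merge_stages(["W", "1", "2", "M", "3", "4", "R", "?"])
print(idx)     # [0, 1, 2, 4, 5, 6]
print(labels)  # ['W', 'LS', 'LS', 'SWS', 'SWS', 'REM']
```

Returning the kept indices keeps labels and feature rows aligned when unscored epochs are discarded.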

Acquisition of Sleep Data for Healthy Subjects and Patients
The raw sleep data, provided by a hospital, come from 12 healthy subjects and 19 depressed patients. All depressed subjects were diagnosed by a physician with mild to severe depression based on a structured clinical interview, with severity defined by the Hamilton Depression Rating Scale (HAMD). For these patients, 32 PSG channels were recorded under the same conditions. In this study, three of them, two EEG channels (C3, O1) and one EOG channel (E1), were used as input to the automatic sleep stage classification system; each was sampled at 256 Hz. For the healthy subjects, 30 EEG channels and 2 EOG channels were recorded, of which five EEG channels (FC3, T3, C3, C2 and CP3) and one EOG channel were used in this work; each was sampled at 250 Hz. Experienced doctors manually scored the collected data into four stages (W, LS, SWS and REM).

Data Processing and Algorithm
In the proposed automatic sleep staging system, sleep staging consists of three successive steps: preprocessing, feature extraction, and feature selection and classification. The goal is to translate sleep activity into meaningful information through a series of well-designed components.
First, the user's brain activity is recorded as polysomnographic (PSG) signals, which are used to evaluate the automatic sleep stage classification scheme. As shown in Fig. 1, the recorded data are then preprocessed with the ICA method, including filtering and artifact rejection, to enhance the PSG signals. Next, feature extraction transforms the resulting signals into informative attributes for the classification stage, and feature selection reduces the number of extracted features before classification. Finally, the selected features are passed to the classifier to categorize the sleep stages. For depression detection, healthy subjects and depressed patients were distinguished with the method shown in Fig. 2: features were computed from sleep stage labels that experts had scored by visual inspection, and the resulting feature vector was input into the support vector machine (SVM) classifier to distinguish patients from healthy subjects.

Preprocessing
In this step, all EEG and EOG channels were filtered with a 0.3-35 Hz bandpass filter, and the sleep data were then segmented into 30-s epochs, because the experts scored the recordings in 30-s epochs, as in the AASM standard. ICA, developed from blind source separation, is a multichannel signal processing method that is popular among researchers in areas such as signal processing and speech recognition. The initial learning rate was set to $\eta_0 = 0.00065/\ln(n_{\mathrm{channels}})$. In the preprocessing stage, the input combines the EEG and EOG channels into a $3 \times \text{frames}$ matrix, so the initial learning rate is $0.00065/\ln 3 \approx 5.91655 \times 10^{-4}$. ICA training stops when the learning rate falls below $10^{-6}$ or the number of steps reaches 512. Finally, three independent, denoised signals are obtained. Because ICA is a widely used preprocessing technique, it is not discussed in detail here; Bell and Sejnowski's algorithm, as used for ICA-based EEG analysis, is described in [22].
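The filtering, epoching and learning-rate arithmetic above can be sketched in Python as follows. The fourth-order zero-phase Butterworth design is an assumption (the text gives only the 0.3-35 Hz passband), the helper names are hypothetical, and the ICA decomposition itself is not reproduced; the sketch only checks that 0.00065 / ln(3) gives the stated initial learning rate for three channels.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(x, fs, lo=0.3, hi=35.0, order=4):
    """Zero-phase Butterworth band-pass along the last axis (type and order assumed)."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=-1)

def to_epochs(x, fs, epoch_s=30):
    """Split a (channels, samples) array into (n_epochs, channels, samples_per_epoch).

    Trailing samples that do not fill a whole epoch are discarded.
    """
    n = int(epoch_s * fs)
    n_epochs = x.shape[-1] // n
    trimmed = x[..., : n_epochs * n]
    return trimmed.reshape(x.shape[0], n_epochs, n).transpose(1, 0, 2)

def initial_lrate(n_channels):
    """Initial ICA learning rate 0.00065 / ln(n_channels), as stated in the text."""
    return 0.00065 / np.log(n_channels)

fs = 100                                                      # Sleep-EDF EEG/EOG rate
x = np.random.default_rng(0).standard_normal((3, fs * 125))   # 3 channels, 125 s of noise
epochs = to_epochs(bandpass(x, fs), fs)
print(epochs.shape)                 # (4, 3, 3000)
print(f"{initial_lrate(3):.5e}")    # 5.91655e-04
```

The incomplete 5-s tail of the synthetic recording is dropped, which mirrors scoring only whole 30-s epochs.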

Feature Extraction
Sleep stage classification typically relies on features extracted from cerebral rhythms, including time-based, frequency-based and entropy-based features. For depression detection, many studies have used features extracted from sleep stage labels to distinguish depressed patients from healthy people.
Spectral features: Spectral features were originally described in [23] and were first used for EEG signal analysis in [24]. They capture classification-related information from the magnitude spectrum of the signal, computed with the fast Fourier transform. Each filtered epoch was divided into eight non-overlapping groups, and the average over the groups was used as the final spectral feature.
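The split-into-eight-and-average step can be sketched as below. The text does not list which spectral features from [23] were computed, so the spectral centroid used here is only a hypothetical stand-in; the point of the sketch is the non-overlapping segmentation and averaging.

```python
import numpy as np

def spectral_centroid(segment, fs):
    """Hypothetical example feature: magnitude-weighted mean frequency (Hz)."""
    mag = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    return float(np.sum(freqs * mag) / np.sum(mag))

def averaged_spectral_feature(epoch, fs, feature_fn=spectral_centroid, n_groups=8):
    """Split one filtered epoch into n_groups non-overlapping segments,
    compute a spectral feature on each, and return the mean over segments."""
    segments = np.array_split(np.asarray(epoch, dtype=float), n_groups)
    return float(np.mean([feature_fn(s, fs) for s in segments]))

fs = 100
t = np.arange(30 * fs) / fs
epoch = np.sin(2 * np.pi * 8 * t)     # pure 8 Hz tone, 30 s (falls exactly on an FFT bin)
print(round(averaged_spectral_feature(epoch, fs), 1))   # 8.0
```

Any other per-segment spectral feature can be passed as `feature_fn` without changing the averaging logic.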
Spectral edge frequency features: The spectral edge frequency (SEF) is the frequency below which a given fraction of the signal power is contained. It is generally denoted SEFxx, where xx is the percentage of signal power for which the edge frequency is calculated [25]. SEF50 (Eq. 1) is the frequency below which half of the signal power is present; it is equivalent to the median frequency of the signal. SEF95 (Eq. 2) is defined analogously for 95% of the signal power. In both equations, $a_i$ is the $i$-th Fourier transform coefficient.
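A minimal sketch of SEFxx, assuming the power of each bin is |a_i|^2 of the one-sided FFT (Eqs. 1 and 2 are not reproduced here, so this normalization is an assumption). The result is quantized to the FFT bin spacing, 1/30 Hz for a 30-s epoch.

```python
import numpy as np

def spectral_edge_frequency(epoch, fs, fraction=0.95):
    """Frequency (Hz) below which `fraction` of the spectral power lies."""
    a = np.fft.rfft(np.asarray(epoch, dtype=float))
    power = np.abs(a) ** 2                       # assumed power per bin: |a_i|^2
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)
    cumulative = np.cumsum(power)
    # first bin at which the cumulative power reaches the requested fraction
    k = int(np.searchsorted(cumulative, fraction * cumulative[-1]))
    return float(freqs[k])

fs = 100
t = np.arange(30 * fs) / fs
# 2 Hz component carries 4x the power of the 20 Hz component
x = 2 * np.sin(2 * np.pi * 2 * t) + np.sin(2 * np.pi * 20 * t)
sef50 = spectral_edge_frequency(x, fs, 0.50)
sef95 = spectral_edge_frequency(x, fs, 0.95)
print(round(sef50, 2), round(sef95, 2))   # 2.0 20.0
```

With 80% of the power at 2 Hz, half of the power is reached at 2 Hz, while 95% requires the 20 Hz component as well.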
Absolute spectral power: In sleep staging, the absolute spectral power (AP) is used to extract REM-related features. A repeated-measures analysis of variance showed a significant sleep stage effect for alpha band power, which was significantly elevated in phasic REM sleep compared with sleep stage 4 [26]. Therefore, we set $f_1 = 8$ Hz and $f_2 = 16$ Hz in the AP formula, where $n(f_i)$ is the index of the Fourier transform coefficient corresponding to the frequency $f_i$.
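The AP computation can be sketched as a sum of squared Fourier magnitudes between the bins n(f1) and n(f2). The exact normalization of the AP formula is not given in the text, so the plain unnormalized sum used here is an assumption.

```python
import numpy as np

def absolute_band_power(epoch, fs, f1=8.0, f2=16.0):
    """Sum of |a_i|^2 over FFT bins n(f1)..n(f2) inclusive (no normalization assumed)."""
    a = np.fft.rfft(np.asarray(epoch, dtype=float))
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)
    n1 = int(np.argmin(np.abs(freqs - f1)))   # n(f1): index of the bin closest to f1
    n2 = int(np.argmin(np.abs(freqs - f2)))   # n(f2): index of the bin closest to f2
    return float(np.sum(np.abs(a[n1:n2 + 1]) ** 2))

fs = 100
t = np.arange(30 * fs) / fs
p10 = absolute_band_power(np.sin(2 * np.pi * 10 * t), fs)   # inside 8-16 Hz
p30 = absolute_band_power(np.sin(2 * np.pi * 30 * t), fs)   # outside 8-16 Hz
print(round(p10), p30 < 1e-6 * p10)   # 2250000 True
```

For a unit-amplitude sine that falls exactly on a bin, |a_k| = N/2 = 1500, hence the in-band value of 1500^2.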
Statistical features: As physiological signals, sleep EEG, EOG, EMG, respiration and temperature signals can also be analyzed statistically [27,28]. The features include the mean, the standard deviation and several parameters of the first and second derivatives of the raw signals, giving 6 features per epoch in this study.
Hjorth features: The Hjorth parameters describe the statistical properties of a signal in the time domain and comprise three components: activity, mobility and complexity [29]. Activity is the variance of the signal and thus reflects its total power. Mobility is the square root of the ratio of the variance of the first derivative of the signal to the variance of the signal. Complexity indicates how similar the shape of a signal is to a pure sine wave, for which it equals 1. The parameters are expressed as follows [30]:
$$\mathrm{Activity} = \sigma_x^2, \qquad \mathrm{Mobility} = \frac{\sigma_{x'}}{\sigma_x}, \qquad \mathrm{Complexity} = \frac{\sigma_{x''}/\sigma_{x'}}{\sigma_{x'}/\sigma_x},$$
where $\sigma_x$, $\sigma_{x'}$ and $\sigma_{x''}$ are the standard deviations of $x(n)$, $x'(n)$ and $x''(n)$, respectively.
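The Hjorth parameters can be computed directly from standard deviations. This sketch approximates the derivatives x'(n) and x''(n) by first differences (an assumption, since the text does not specify the discrete derivative), and checks that a pure sine gives a complexity close to 1.

```python
import numpy as np

def hjorth(x):
    """Hjorth activity, mobility and complexity; derivatives via first differences."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)          # approximation of x'(n)
    ddx = np.diff(dx)        # approximation of x''(n)
    s0, s1, s2 = np.std(x), np.std(dx), np.std(ddx)
    activity = s0 ** 2                 # variance of the signal
    mobility = s1 / s0                 # sqrt(var(x') / var(x))
    complexity = (s2 / s1) / mobility  # mobility of x' relative to mobility of x
    return activity, mobility, complexity

fs = 100
x = np.sin(2 * np.pi * 1.0 * np.arange(30 * fs) / fs)   # 1 Hz sine, 30 s
act, mob, comp = hjorth(x)
print(round(act, 3), round(comp, 2))   # 0.5 1.0
```

With first differences, the mobility of a sampled sine of angular step w equals 2 sin(w/2) per sample, so it is expressed in per-sample units; multiply by fs for per-second units.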
Maximum-Minimum Distance (MMD): The MMD feature is based on the distance formula derived from the Pythagorean theorem [1]. The idea is to find the distance between the maximum and minimum points in each subwindow. The subwindow length $\lambda$ is set to 100 samples if an epoch contains fewer than 10,000 samples, to 1000 if it contains between 10,000 and 100,000 samples, and so on; the same rule is applied to all epochs. Eq. (8) gives this number of samples in a sliding window, i.e., the wavelength $\lambda$,
where $n$ is the number of samples in an epoch. To compute the MMD feature, the maximum and minimum values and their indices are first found in each sliding window. Then, the distance between the maximum and minimum points is calculated with the Pythagorean distance formula, $d = \sqrt{\Delta x^2 + \Delta y^2}$, where $\Delta x$ is the x-axis (time) difference and $\Delta y$ is the y-axis (amplitude) difference between the maximum and minimum points in each subwindow. Last, (10) sums the distances over all subwindows, $\mathrm{MMD} = \sum_{i=1}^{w} d_i$, where $w$ is the total number of subwindows in an epoch.
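A minimal MMD sketch under two stated assumptions: the "and so on" window rule is read as lambda = 10^(floor(log10 n) - 1) with a floor of 100, and the time difference is measured in samples rather than seconds (the unit is not specified in the text). Both helper names are hypothetical.

```python
import numpy as np

def mmd_window(n):
    """Subwindow length lambda: 100 if n < 10,000, 1000 if n < 100,000, and so on.
    (One reading of the rule; shorter epochs also get 100.)"""
    return 10 ** max(2, int(np.floor(np.log10(n))) - 1)

def mmd(epoch):
    """Sum over subwindows of the Euclidean distance between the max and min points."""
    x = np.asarray(epoch, dtype=float)
    lam = mmd_window(len(x))
    w = len(x) // lam                      # number of complete subwindows
    total = 0.0
    for i in range(w):
        seg = x[i * lam:(i + 1) * lam]
        i_max, i_min = int(np.argmax(seg)), int(np.argmin(seg))
        dx = i_max - i_min                 # time difference, in samples (assumption)
        dy = seg[i_max] - seg[i_min]       # amplitude difference
        total += np.hypot(dx, dy)          # d = sqrt(dx^2 + dy^2)
    return total

# Each 100-sample window has its minimum at index 0 (value 0) and its maximum at
# index 3 (value 4), so every window contributes a 3-4-5 distance of 5.
seg = np.zeros(100)
seg[3] = 4.0
x = np.tile(seg, 30)                       # one 30-s epoch at 100 Hz
print(mmd_window(len(x)), mmd(x))          # 100 150.0
```

Because the sample index and the amplitude have different units, the relative weight of the two terms depends on the sampling rate and signal scale.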

EnergySis (Esis): The basic idea of this feature is to treat the signal as having speed and energy [1]. According to (11), the speed (velocity) of the signal is obtained from the frequency and wavelength, $v = f\lambda$. The frequency $f$ is the midpoint of the pass-frequencies of each band, and the wavelength $\lambda$ of the EEG waveform is determined as described earlier in (8). The velocity $v$ is then multiplied by the squared amplitude modulus $|X_j|^2$ of each sample, and the total Esis is calculated as described in (12), $\mathrm{Esis} = \sum_{j=1}^{N} |X_j|^2\, v$, where $N$ is the length of the epoch.
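For one band-limited epoch, Esis can be sketched as below. The band edges passed in and the lambda rule (the same hypothetical reading used for MMD) are assumptions, and, as in the description, v mixes units (Hz times samples).

```python
import numpy as np

def esis(band_signal, band):
    """EnergySis of one band-limited epoch: sum_j |x_j|^2 * v, with v = f * lambda."""
    x = np.asarray(band_signal, dtype=float)
    n = len(x)
    lam = 10 ** max(2, int(np.floor(np.log10(n))) - 1)   # hypothetical reading of Eq. (8)
    f = (band[0] + band[1]) / 2.0                         # midpoint of the pass-frequencies
    v = f * lam                                           # "speed" = frequency x wavelength
    return float(np.sum(np.abs(x) ** 2) * v)

# Constant unit signal, 3000 samples, 8-12 Hz band: f = 10, lambda = 100, v = 1000
print(esis(np.ones(3000), (8, 12)))   # 3000000.0
```

Since v is constant within a band, Esis is simply the band energy scaled by the band's "speed".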
REM latency: REM latency is the time between lying down and reaching the REM stage. Investigations have shown that REM latency is an important feature of primary depression because it is significantly longer than that for healthy controls [31][32][33].
Sleep latency: Sleep latency is the time from lying down to sleep onset (the first epoch not scored as W); difficulty falling asleep is the sleep symptom that depressed patients complain about most [34].
LS proportion: The LS proportion is the fraction of total sleep time spent in LS. Stages N1 and N2 are relatively short in depressed patients compared with healthy subjects [35].
SWS proportion: The SWS proportion is the fraction of total sleep time spent in SWS. A reduction in SWS caused by suppression of the mechanisms that regulate NREM sleep can also reflect symptoms of depression [20].
Sleep maintenance: Sleep maintenance reflects the proportion of time required to fall asleep to the amount of time spent in bed, which may play an important role in the development of depressive disorders [35,36].
Arousal times: Arousal times, the number of awakenings throughout the night, could reflect the quality of sleep, and significant increases in awakenings for depressed patients have been reported [37].
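Most of the label-based features above can be computed from a per-epoch hypnogram. A minimal sketch, assuming 30-s epochs, that the first epoch marks lying down, and that an awakening is any sleep-to-W transition after sleep onset. REM latency follows the definition given here (measured from lying down); sleep maintenance is omitted because its exact formula is not stated. The function name is hypothetical.

```python
EPOCH_S = 30  # epoch length in seconds

def hypnogram_features(stages):
    """Sleep-architecture features from a list of 'W', 'LS', 'SWS', 'REM' epochs.

    Latencies are in minutes from the first epoch (taken as lying down).
    """
    sleep_idx = [i for i, s in enumerate(stages) if s != "W"]
    if not sleep_idx:
        raise ValueError("hypnogram contains no sleep")
    onset = sleep_idx[0]
    rem_idx = [i for i, s in enumerate(stages) if s == "REM"]
    total_sleep = len(sleep_idx)
    # awakenings: sleep-to-W transitions after sleep onset
    arousals = sum(1 for a, b in zip(stages[onset:], stages[onset + 1:])
                   if a != "W" and b == "W")
    return {
        "sleep_latency_min": onset * EPOCH_S / 60,
        "rem_latency_min": rem_idx[0] * EPOCH_S / 60 if rem_idx else None,
        "ls_proportion": stages.count("LS") / total_sleep,
        "sws_proportion": stages.count("SWS") / total_sleep,
        "arousal_times": arousals,
    }

night = (["W"] * 4 + ["LS"] * 6 + ["SWS"] * 4 + ["W"]
         + ["LS"] * 3 + ["REM"] * 2 + ["W"] * 2 + ["LS"])
print(hypnogram_features(night))
# {'sleep_latency_min': 2.0, 'rem_latency_min': 9.0, 'ls_proportion': 0.625,
#  'sws_proportion': 0.25, 'arousal_times': 2}
```

The resulting dictionary values, stacked per subject, form the feature vector passed to the depression classifier.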

Feature Selection
The feature selection process, an important part of pattern recognition and machine learning, reduces computation costs and increases classification performance. Not all original features are useful for classification or regression, since some may be irrelevant, redundant or noisy and may reduce classification performance. ReliefF is a supervised, filter-type feature weighting algorithm. It estimates the quality of each feature by how well its values discriminate between instances of different classes, and it can handle noisy and incomplete data [38,39]. For the sleep stage classification task, ReliefF outputs a weight that indicates the relevance of each sleep feature. In this study, the Chebyshev distance is used instead of the Manhattan distance to identify the nearest hit and nearest miss instances, with the aim of more accurate neighborhood selection and better prediction and of mitigating the curse of dimensionality. The weight of feature $i$ is calculated as
$$W_i = \frac{1}{m}\sum_{j=1}^{m}\left[\mathrm{diff}(x_{ij}, \mathrm{nearmiss}_{ij}) - \mathrm{diff}(x_{ij}, \mathrm{nearhit}_{ij})\right],$$
where $m$ is the number of samples randomly selected from the training set, $\mathrm{diff}(x_{ij}, \mathrm{nearhit}_{ij})$ is the difference between the value of feature $i$ for the $j$-th randomly selected sample and its value for the closest training sample of the same class (the nearest hit), and $\mathrm{diff}(x_{ij}, \mathrm{nearmiss}_{ij})$ is the corresponding difference for the closest training sample of a different class (the nearest miss). For a useful feature, $x_{ij}$ and $\mathrm{nearhit}_{ij}$ are expected to be very close to each other, whereas for a useless feature both differences are expected to follow almost the same distribution.
All features are first ranked by the modified ReliefF method and then added to the feature vector one at a time, in descending order of weight, for classification.
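A simplified ReliefF-style weighting with Chebyshev neighbor search can be sketched in NumPy as follows. This is not necessarily the authors' modified version: it uses one nearest hit and one nearest miss per other class (weighted by class priors, as in ReliefF with k = 1), scales features to [0, 1] so that diff values are comparable, and assumes every class has at least two samples. The function name is hypothetical.

```python
import numpy as np

def relieff_chebyshev(X, y, m=None, seed=0):
    """Feature weights W_i = mean over sampled j of [diff(miss) - diff(hit)],
    with nearest neighbors found by Chebyshev (L-infinity) distance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span                # scale each feature to [0, 1]
    classes, counts = np.unique(y, return_counts=True)
    priors = dict(zip(classes, counts / n))
    m = n if m is None else m
    sample = np.random.default_rng(seed).choice(n, size=m, replace=False)
    w = np.zeros(p)
    for j in sample:
        dist = np.max(np.abs(Xs - Xs[j]), axis=1)  # Chebyshev distance to all samples
        dist[j] = np.inf                           # exclude the sample itself
        hit = np.argmin(np.where(y == y[j], dist, np.inf))
        w -= np.abs(Xs[j] - Xs[hit]) / m
        for c in classes:
            if c == y[j]:
                continue
            miss = np.argmin(np.where(y == c, dist, np.inf))
            scale = priors[c] / (1.0 - priors[y[j]])   # prior-weighted nearest miss
            w += scale * np.abs(Xs[j] - Xs[miss]) / m
    return w

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = np.column_stack([y + 0.05 * rng.standard_normal(100),   # informative feature
                     rng.standard_normal(100)])             # pure noise
w = relieff_chebyshev(X, y)
ranking = np.argsort(w)[::-1]    # descending weight: order in which features are added
print(ranking)                   # [0 1]
```

The ranking can then drive the incremental step described above: evaluate the classifier on the top-1, top-2, ... features and keep the best subset.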

Classification
For small-sample classification problems in pattern recognition, K-means is a common unsupervised method [40,41], while the SVM is one of the most common supervised methods. SVMs have been successfully applied to classification in many domains of pattern recognition and are used for classification in this paper. The SVM is a linear discriminant that maximizes the margin between two classes, on the premise that a larger margin improves the classifier's generalization capability [42,43]. The SVM is used in two parts of this study: sleep staging on Sleep-EDF and on the acquired sleep data, and discrimination between healthy subjects and patients with depression. An SVM classifier with a linear kernel from the LIBSVM toolbox [44] was used to classify the data, with the remaining parameters set to their default values [45].
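As a Python stand-in for the MATLAB/LIBSVM setup, scikit-learn's `SVC` (which is built on LIBSVM and handles multiple classes one-vs-one) can be used with a linear kernel and otherwise default parameters. The data below are synthetic four-cluster features standing in for the selected epoch features; they are not from the study.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for selected epoch features: one well-separated cluster per stage.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [4, 0], [0, 4], [4, 4]], dtype=float)
X = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in centers])
y = np.repeat(["W", "LS", "SWS", "REM"], 50)

# Linear-kernel SVM; other parameters left at scikit-learn defaults (C=1.0).
clf = SVC(kernel="linear")
clf.fit(X, y)
print(clf.predict([[0.1, -0.2], [3.9, 4.2]]))   # ['W' 'REM']
```

The same classifier object serves both the four-stage problem and the two-class depression problem; only the feature matrix and labels change.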

Performance Evaluation Methods
K-fold cross-validation has been widely used in automatic sleep staging [8]. In a k-fold cross-validation test, the data set is first divided into k subsets. Then k − 1 subsets are used for training and the remaining subset is used for testing, and the process is repeated until every subset has been held out once for testing. The performance on the test subsets indicates the reliability and validity of the method, and the average over the k tests gives a single validity value.
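The k-fold procedure, together with Cohen's kappa (p_o - p_e) / (1 - p_e) used in the results, can be sketched as below. The random (non-stratified) fold assignment and the synthetic data are assumptions for illustration; kappa is undefined when p_e = 1 (a single class in both label sets).

```python
import numpy as np
from sklearn.svm import SVC

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    p_o = np.mean(y_true == y_pred)                                   # observed agreement
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in labels)  # chance
    return float((p_o - p_e) / (1 - p_e))

def k_fold_evaluate(X, y, k=4, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average over the k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    accs, kappas = [], []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        clf = SVC(kernel="linear").fit(X[train], y[train])
        pred = clf.predict(X[test])
        accs.append(np.mean(pred == y[test]))
        kappas.append(cohen_kappa(y[test], pred))
    return float(np.mean(accs)), float(np.mean(kappas))

print(cohen_kappa(["W", "W", "LS", "LS"], ["W", "LS", "LS", "LS"]))   # 0.5

rng = np.random.default_rng(0)
centers = np.array([[0, 0], [4, 0], [0, 4], [4, 4]], dtype=float)
X = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in centers])
y = np.repeat(["W", "LS", "SWS", "REM"], 50)
print(k_fold_evaluate(X, y, k=4))   # (1.0, 1.0)
```

In the kappa example, observed agreement is 0.75 and chance agreement is 0.5, giving (0.75 - 0.5) / (1 - 0.5) = 0.5.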

Experiments and Results
Three experiments were conducted in this study, including two experiments involving automatic sleep staging with the Sleep-EDF data set and acquired sleep data and one involving depression detection. For each experiment, we used the proposed machine learning method for feature extraction and classification with an SVM. In this study, the experiments were developed in a MATLAB environment (MATLAB and Signal Processing Toolbox Release 2019b) with an Intel(R) Core(TM) i5-4210M 2.60 GHz CPU and 12 GB of installed RAM on the Windows x64 platform.

Experiment I: Sleep Staging with Sleep-EDF
This experiment evaluated the performance of our automatic sleep staging system on the public Sleep-EDF data set. Each of the 38 Sleep-EDF recordings (from 20 subjects) was applied to the system independently. For each recording, an equal proportion of the epochs of each sleep stage was extracted as the training set, and the remaining data were used as the testing set.
As shown in Fig. 1, we first preprocessed the three EEG and EOG channel signals (Fpz-Cz, Pz-Cz and horizontal EOG) with ICA, with the initial learning rate set to $5.91655 \times 10^{-4}$. We then extracted features from all signals to form a feature vector, including spectral features, spectral edge frequency features, statistical features and Hjorth features. The features were ranked with the ReliefF method. Finally, we evaluated the system with 4-fold cross-validation using an SVM and obtained the overall accuracy. To compare single-channel and multimodal signals, we also ran a control experiment using Fpz-Cz only; because only one signal was used in this case, the ICA preprocessing step was omitted.
For Fpz-Cz, the average overall accuracy was 77.88 ± 6.86%, with a kappa coefficient of 0.70 ± 0.09, as shown in Tab. 1. The average accuracies of the W, LS, SWS and REM stages were 77.32%, 62.96%, 91.10% and 80.16%, respectively. The highest accuracy was 88.78%, and the lowest accuracy was 62.28%. For the multimodal signals, the overall accuracy and kappa coefficient reached 90.10 ± 2.68% and 0.87 ± 0.04, which can be seen in Tab. 2. The average accuracies of the W, LS, SWS and REM stages were 96.83%, 80.18%, 91.59% and 91.79%, respectively. The accuracies of the subjects for the W, LS and REM periods were significantly higher than those using a single channel (p < 0.01, paired t-test), especially in the W period, as shown in Fig. 3. The overall accuracy was higher than 90% for more than half of the subjects, and all the kappa coefficients were higher than 0.8. Additionally, the lowest accuracy was higher than 80%.
The accuracy for LS remains relatively low, possibly because LS is a transitional phase between wakefulness and the other sleep states and shows EEG patterns similar to those of REM; most sleep staging methods share this problem. Nevertheless, the accuracies for W and LS are much higher than with the single-channel method. The overall accuracy for multimodal signals is 12.22 percentage points higher than that for the single-channel signal (90.10% vs. 77.88%), which reflects the effectiveness and reliability of the proposed method with multimodal signals and ICA preprocessing.

Experiment II: Sleep Staging with Raw Sleep Data
To validate the effectiveness of our method, we performed sleep staging experiments on healthy subjects and patients with depression using the automatic sleep staging system optimized in Experiment I. Whole-night sleep data from 19 depressed patients and 12 healthy subjects were used. Feature vectors were extracted from each subject's sleep data through preprocessing and feature extraction, and sleep staging was then performed on these vectors with the SVM classifier.
For the patients with depression, we obtained an average overall accuracy of 78.62 ± 6.71% with a kappa coefficient of 0.71 ± 0.09, as shown in Tab. 3. The accuracies of W, LS, SWS and REM were 75.45%, 70.52%, 90.36% and 78.11%, respectively. Notably, the accuracy for elderly subjects was relatively low, and that for teenagers was higher than the average value.
For the healthy subjects, the overall accuracy reached 69.23 ± 5.55%, and the kappa coefficient was 0.58 ± 0.08, as shown in Tab. 4. In particular, the accuracy of W was higher than 80%, but the accuracies of LS and REM were relatively low. In addition, the accuracy in Experiment II was lower than that in Experiment I with Sleep-EDF.

Experiment III: Depression Detection
To explore the feasibility of depression detection, we designed an experiment to differentiate patients with depression from healthy subjects. The sleep stage labels of patients and healthy subjects, manually scored by a sleep expert, were used as input to the depression detection system. The labels of 10 subjects (4 healthy subjects and 6 depressed patients) were randomly selected as the training set, and those of the remaining subjects formed the test set. From the labels, features including REM latency, sleep latency, LS proportion, SWS proportion, sleep maintenance and arousal times were computed. Finally, the SVM classifier was used to classify the subjects. As reported in Tab. 5, 20 of the 21 subjects in the test set (13 depressed patients and 8 healthy subjects) were classified correctly, an accuracy of 95.24%. The true positive rate (TPR), false positive rate (FPR), true negative rate (TNR) and false negative rate (FNR) were 100%, 7.69%, 92.31% and 0%, respectively, and the F-score reached 94.12%, suggesting that the proposed method is effective.
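The reported rates can be checked from the test-set counts. The text does not state which group is the positive class; the arithmetic below shows that the reported values are reproduced when healthy subjects are taken as positive and the single error is a depressed patient classified as healthy (with depressed patients as positive and TPR = 100%, the single error would instead give FPR = 1/8 = 12.5%). The helper name is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Rates from a 2x2 confusion matrix (positive class = the group tp/fn refer to)."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "TPR": tpr, "FPR": fpr, "TNR": 1 - fpr, "FNR": 1 - tpr,
        "F": 2 * precision * tpr / (precision + tpr),
    }

# Test set: 8 healthy (positive) and 13 depressed; one depressed patient called healthy.
m = binary_metrics(tp=8, fn=0, fp=1, tn=12)
print({k: round(100 * v, 2) for k, v in m.items()})
# {'accuracy': 95.24, 'TPR': 100.0, 'FPR': 7.69, 'TNR': 92.31, 'FNR': 0.0, 'F': 94.12}
```

Here precision is 8/9, so the F-score is 2(8/9)(1) / (8/9 + 1) = 16/17, which matches the reported 94.12%.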
TPR and FPR are appropriate measures for evaluating the performance of depression detection, and the receiver operating characteristic (ROC) curve depicts the relation between them. We obtained ROC curves from the extracted features; as shown in Fig. 4, the areas under both ROC curves are close to 1. The conventional sleep stage labeling parameters are summarized in Tab. 6. The depressed patient group showed several of the well-known pathognomonic sleep changes, such as decreases in the LS stage percentage, sleep latency, sleep efficiency and arousal times and increases in the REM latency and SWS stage percentage.

Discussion
This work presents an automatic multimodal signal processing approach to sleep stage identification. The analysis procedure of the automatic sleep stage classification system was divided into four essential parts: preprocessing, feature extraction, feature selection and classification. The results provide evidence on the effectiveness of methods based on multimodal signals, and the performance and efficiency of various methods are discussed. In this study, we evaluated the proposed automatic sleep staging algorithm on the public Sleep-EDF dataset and on sleep data acquired from healthy subjects and patients with depression. For Sleep-EDF, we obtained a relatively high overall accuracy with multimodal signals, reaching 90.10 ± 2.68%, with a kappa coefficient of 0.87 ± 0.04. For the sleep data of patients with depression, the accuracy was 78.62 ± 6.71%, which indicates the effectiveness of the proposed system. In addition, an algorithm for depression detection was proposed, whose recognition accuracy was 95.24%, with an F-score of 94.12%. These findings suggest that the proposed method has certain advantages over other methods in terms of accuracy and feasibility; this research can therefore be considered an important step toward a fully automated, convenient and efficient sleep quality evaluation system.
The proposed automatic sleep staging system took approximately 30 minutes to complete the sleep staging of one subject, whereas experts need hours to perform manual sleep stage classification. The proposed method is therefore efficient compared with the manual scoring currently applied in practice. Although multimodal signals were used, only a few channels (EEG and EOG) were required, so the method can be readily applied in clinical practice. In Experiment I, the multimodal method improved accuracy by more than 10 percentage points over the single-channel method (77.88 ± 6.86%). The LS accuracy in particular improved markedly, from 62.96% to 80.18%, although LS has long been regarded as the most difficult stage to classify. These results indicate that the proposed multimodal method is reliable for use in automatic sleep staging systems. However, the sleep staging accuracy for healthy subjects in Experiment II was lower than that for patients. Most clinical studies report that, compared with healthy controls, depressed patients show less total sleep time and SWS and more W; this difference in sleep architecture may increase the difficulty of sleep staging for healthy subjects. Furthermore, physiological systems in a healthy state generate activity fluctuations on many time scales, which may also affect the accuracy of sleep staging [46].
All of the methods listed in Tab. 7 used the Sleep-EDF dataset, and only the best accuracy of each is presented. The proposed method reached an accuracy of 90.1%, nearly the highest reported. Several factors may contribute to this performance. First, the system uses multimodal signals, which provide more information about sleep than a single channel and also allow ICA preprocessing to reduce noise in the signals. Second, ReliefF feature selection is an important component of the sleep staging system. ReliefF evaluates the relevance of features to labels, and highly relevant features reflect both the differences between samples of different classes and the similarity of samples of the same class. Using ReliefF with the Chebyshev distance measure to build the feature vector helps the trained classifier capture more valid information, improving its pattern recognition ability. The differences in accuracy between patients and healthy subjects may arise because the sleep patterns of depressed patients differ from those of healthy subjects, as noted in many studies. The sleep efficiency of depressed individuals is markedly lower than that of healthy subjects, and patients displayed abnormally small autocorrelations during the LS and SWS stages compared with healthy controls [46]. Sleep latency and REM latency are important variables for distinguishing healthy from depressed people [15,52], and other parameters, such as sleep maintenance and REM activity, provide further evidence of differences between the two groups [53]. More generally, physiological systems in a healthy state generate activity fluctuations on many time scales, and disease states are associated with a breakdown of this normal temporal structure.
Although the numbers of healthy and depressed subjects studied were relatively small, an accuracy rate of up to 95.24% was achieved, indicating that the proposed method provides a viable way to automatically recognize depression with computers.
The limitations of this study relate primarily to data acquisition and the proposed methodology. The raw sleep data, especially those of healthy subjects, may contain too much noise to yield high-accuracy sleep staging results. In addition, the sleep stage labels used for depression detection were manually scored by experts rather than produced by the automatic system, because the accuracy of Experiment II was not high enough to support depression recognition.
Future work will explore artificial neural networks to further improve the performance of automatic sleep staging [54]. We will also further explore the differences between the sleep patterns of patients with depression and those of healthy people and attempt to detect these differences automatically.

Conflicts of Interest:
The authors declare that there are no conflicts of interest regarding the publication of this paper.