Automatic Sleep Staging Algorithm Based on Random Forest and Hidden Markov Model

: In the ﬁeld of medical informatics, sleep staging is a challenging and time-consuming task undertaken by sleep experts. According to the new standard of the American Academy of Sleep Medicine (AASM), the stages of sleep are divided into wakefulness (W), rapid eye movement (REM) and non-rapid eye movement (NREM) which includes three sleep stages (N1, N2 and N3) that describe the depth of sleep. This study aims to establish an automatic sleep staging algorithm based on the improved weighted random forest (WRF) and Hidden Markov Model (HMM) using only the features extracted from double-channel EEG signals. The WRF classiﬁcation model focuses on reducing the bias of the imbalance data, while the HMM model focuses on improving the detection rate of sleep staging through the relationship between adjacent sleep stages. In particular, the improved weighted RF classiﬁcation model can increase the recognition rate of the N1 stage. In addition, the method of removing features with low variance is used to select meaningful and contributing feature parameters for model training. This is an innovative content of this paper. The sleep EEG data are ﬁrst segmented into 30 s epochs, and the feature parameters of the epoch data are extracted from the double-channel by applying continuous wavelet packet transform (WPT). Each epoch is then segmented into 29 subepochs of 2 s long with 1 s overlap, and the frequency domain features and statistical features of each subepoch are extracted. The performance of the proposed method is tested by evaluating the accuracy (AC), Kappa coefﬁcient, Recall (R), Precision (P) and F1-score (F1). In the Sleep-EDF database, the overall AC and Kappa coefﬁcient obtained by WRF are 93.20% and 0.890, respectively using the subject-non-independent test. In the 10 sc* and 10 st* Sleep-EDF Expanded database, the overall AC and Kappa coefﬁcient obtained by proposed method are 91.97% and 0.874, respectively using the subject-independent test. The best AC and Kappa coefﬁcient of single subject can reach 96.3% and 0.912, respectively. Experimental results show that the performance of the proposed method is competitive with the most current methods and results, and the recognition rate of N1 stage is signiﬁcantly improved.


Introduction
Sleep is an important physiological activity of the human body, and sufficient sleep will guarantee the work of the next day [Liu, Hay and Faught (2013) ;Bertisch, Pollock, Mittleman et al. (2018)]. After continuous exploration, the researchers found that sleep contains specific physiological laws, and many disease-related information can be obtained from sleep electroencephalogram (EEG) [Hood, Wilson, Gorenz et al. (2018); Olsson, Ärlig, Hedner et al. (2018); Gonzalez, Small, Cases et al. (2018)]. Therefore, studying the information contained in sleep EEG signals and analyzing the changes of sleep cycle is of great significance for the study of sleep-related diseases. Overnight polysomnography (PSG) with EEG is the primary diagnostic test for evaluating patients with sleep problems [Malhotra, Kirsch, Kristo et al. (2018)]. Typically, sleep specialists use PSG for clinical diagnosis of sleep staging. PSG contains a set of signals, including EEG, electromyogram (EMG), electrocardiogram (ECG) and electrooculogram (EOG). These signals are recorded by sensors connected to different parts of the body [Tarek, Sahbi, Perrine et al. (2015)]. Sleep experts or doctors will classify these epochs into different sleep stages according to the guidelines developed by Rechtschaffen and Kales (R&K) or the classification manual proposed by the American Academy of Sleep Medicine (AASM) [da Silveira, Kozakevicius and Rodrigues (2017)]. This process is called sleep stage scoring or sleep staging. Basing on the R&K rules, sleep recordings can be classified into six sleep stages: wakefulness (W), non-rapid eye movement (NREM) sleep stage 1 (N1), NREM sleep stage 2 (N2), NREM sleep stage 3 (N3), NREM sleep stage 4 (N4), and rapid eye movement (REM) [Wolpert (1969)]. According to the standardization rules of AASM in recent years, N4 has been merged into the N3 sleep stage, so this paper will also study five sleep stages [Berry, Brooks, Gamaldo et al. (2017)]. EEG is a kind of bioelectrical signal with randomness, time variant, non-stationarity and non-linearity. Accurate sleep staging is a crucial part of the diagnosis and treatment of some disease [Bachmann, Päeske, Kalev et al. (2018)]. So far, doctors have been accustomed to visually observe brain waves for sleep stage marking, which takes a lot of time. Therefore, an effective automatic sleep stage classification algorithm will alleviate the doctor's scoring work. In order to classify the sleep stage of EEG signals, feature extraction from sleep EEG signals is one of the most critical technique in the process of sleep staging. Some researchers used feature extraction methods that combined time and frequency domain features, but most studies only extracted time-domain features or frequency-domain features [Li, Wang, Luo et al. (2017) ;Hernández, Trujillo, Z-Flores et al. (2018); Colominas, Jomaa, Jrad et al. (2017); TM and M (2018)]. Cabir et al. [Cabir and Murat (2010)] adopted principal component analysis (PCA) dimensional reduction techniques and compared features in time domain, frequency domain and the combination of both in perforemance evaluation of sleep staging. According to the experimental results, time-domain features alone provided the worst classification while the mixed features provided the best classification performance. Salih et al. [Salih, Kemal and Yosunkaya (2010)] proposed a new hybrid system comprising of Welch spectral analysis, k-means clustering based feature weighting and classifier algorithms (k-NN classifier and decision tree classifier) to automatically score the sleep stages. A total samples of 35-hours EEG signals collected from five male subjects were used to achieve 82% classification accuracy (AC) through this system. Fraiwan et al. [Fraiwan, Lweesy and Khasawneh (2012)] researched sleep staging algorithm using three time-frequency techniques (Choi-Williams distribution, continuous wavelet transform and Hilbert-Huang Transform) and entropy measures based on the sleep EEG, the experimental results showed that the continuous wavelet transform analysis method provided the best performance with accuracy of 83%. Although the state-of-the-art research has fully captured the time domain and frequency domain information of sleep EEG signals, it still has certain limitations in feature extraction. Memar et al. [Memar and Faradji (2017)] used the bandpass filter to decompose 8 subbands waves (δ, θ, α, σ, β1, β2, γ1 and γ2) from each EEG epoch. Baha et al. [Baha, Musa, Abdullah et al. (2014)] decomposed the rhythm bands of EEG data by 5-level discrete wavelet transform. Since the wavelet transform only decomposes the low-frequency bands of the signal and the high-frequency bands are ignored, the large amount of information about the sleep cycle contained in the EEG signals cannot be fully explored. Fortunately, wavelet packet transform (WPT) [Shahnaz, Sarker and Paul (2017); Reddy, Fahim, Nikhil et al. (2018)] can provide more detailed EEG signal decomposition including low frequency part and high frequency part, so it can perform better timefrequency localization analysis on EEG signals containing a large amount of medium and high frequency information. The choice of classification methods greatly affects the performance of sleep staging. In recent years, the application of deep learning methods in biomedical signal processing has become more and more common. For example, the convolutional neural networks (CNN), the one-dimensional convolutional neural networks (1D-CNN), recurrent neural networks (RNN) and the fast discriminative complex-valued convolutional neural network (FDCCNN) have been proposed recently to deal with the sleep stage classification tasks [Sokolovsky, Guerrero, Paisarnsrisomsuk et al. (2019); Malek, Melgani and Bazi (2018); Yulita, Fanany and Arymuthy (2017); Zhang and Wu (2017)]. In addition, the method of feature extraction using deep learning instead of manual feature extraction is also explored for automatic sleep staging. However, Ghimatgar et al. [Ghimatgar, Kazemi, Helfroush et al. (2019)] proposed that deep learning is limited in the study of automatic sleep staging. Firstly, deep learning needs a lot of data to train complex models, which is less meaningful for the analysis of a small number of special patients clinically. Secondly, compared to traditional methods, deep learning usually takes hours or even days for training which requires huge computing resources. Thirdly, no relevant research can prove the clinical significance of auto-learning characteristics from a physiological point of view, and it cannot explain the role of these features in each sleep stage. In this paper, an innovative method for automatic sleep EEG staging based on improved weighted random forest (WRF) and hidden markov model (HMM) is proposed. Compared to traditional shallow learning methods, deep learning utilizes multiple layers of linear and nonlinear processing units to learn hierarchical representations or features from input data. Compared with using the WRF classifier alone, the algorithm combining WRF with HMM can take the sleep stage state of neighboring epoch into account, which can smooth the classification result and improve the accuracy of the model. After transforming the preprocessed EEG signals by WPT, long-term features which are extracted by decomposing the EEG data into different 30 s epoch data and short-term features which are extracted by segmenting the 30 s epoch data into 2 s epoch data with 1 s overlap are used to train a sleep staging model by WRF. And then, HMM is adopted to smooth the sleep staging model and reduce misclassification. The rest of the paper is organized as follows. In Section 2, materials and methods are described. The experimental results are given in Section 3. Discussion is described in Section 4 and Section 5 is the conclusions.

Materials and methods
The paper proposes a sleep staging method based on WRF and HMM, following the step of preprocessing, feature extraction, WRF training, WRF testing and HMM smoothing. Fig.  1 shows the procedure of the method. The algorithm simulation in this paper is done on python 3.7 platform, and EDF data waveform can be viewed through EDFbrowser.

Sleep datasets
The experimental data in this study comes from the Sleep-EDF database (website is https://www.physionet.org/physiobank/database/sleep-edf/) and the Sleep-EDF Expanded database (website is https://www.physionet.org/ physiobank/database/sleep-edfx/) which are provided by Kemp [Kemp, Zwinderman, Tuk et al. (2000)]. The Sleep-EDF database includes 8 EEG files recording from 4 males and 4 females between the age of 21 and 35. These 8 EEG recordings are divided into healthy group (sc4002e0, sc4012e0, sc4102e0, and sc4112e0, marked as sc*) and mild sleep disorder group (st7022j0, st7052j0, st7121j0, and st7132j0, marked as st*) according to sleep status. The 4 sc* recordings were obtained in 1989 from ambulatory healthy volunteers during 24 hours in their normal daily life, using a modified cassette tape recorder. The 4 st* recordings were obtained in 1994 from subjects who had mild difficulty falling asleep but were healthy, during a night in the hospital. Each EEG recording contains 1 horizontal EOG and 2 EEG channels called Fpz-Cz and Pz-Oz with the sampling rate of 100 Hz. All of these sleep EEG data are annotated by experts with different stages according to R&K guideline, which are named as W, REM, N1, N2, N3 and N4 and unscored. However, sleep EEG is now commonly staged according to the AASM guideline in clinical studies, including W, N1, N2, N3 and REM. Therefore, this paper states that the class N3 includes two stages in this paper, N3 and N4. Sleep-EDF Expanded database is the extended version of the Sleep-EDF database, which was updated to version 1 in 2013 with a total of 61 subjects, including 39 sc* files and 22 st* files. The 39 SC* files were obtained from 10 healthy males and 10 healthy females aged from 25 to 34, without any sleep-related medication [Mourtazaev, Kemp, Zwinderman et al. (1995)]. The ST* files were obtained from a study of sleep effects in 7 males and 15 females who had a mild difficulty falling asleep. Same as Sleep-EDF database, the sampling rate of these signals is 100 Hz. In this paper, channel Fpz-Cz and channel Pz-Oz are analyzed.

EEG preprocessing
The frequency range of Sleep-EDF database and Sleep-EDF Expanded database EEG is generally 0.1-50 Hz. In this study, a 5-th order IIR Butterworth bandpass filter with the frequency range of 0.5-32 Hz is used to remove noise and artifacts from EEG signals. Then, the filtered EEG signals will be segmented into 30 s epochs without overlap. According to the above division method, 15172 sleep epochs are obtained from the 8 EEG recordings in which there are 8039, 1609, 604, 3621, 1299 epochs of W, REM, N1, N2 and N3 stage respectively. Tab. 1 shows the epoch numbers for different sleep stages in the 3 EEG groups (Sleep-EDF database, 20 sc* files in the Sleep-EDF Expanded database and 10 sc* + 10 st* files in the Sleep-EDF Expanded database) used in this study.

Wavelet packet transform
The EEG signal in W stage is characterized by low amplitude β wave in state of eyes closed and α wave in state of eyes opened. The EEG signal in N1 stage is characterized by the presence of vertex sharp and θ wave with high amplitude. Stage N2 is characterized by the appearance of spindles and K-complex. The deepest sleep stage N3 is characterized by δ wave with high amplitude. The REM stage is characterized by low amplitude sawtooth wave. Therefore, WPT is used to process EEG signals, and the time-frequency features of different spontaneous EEG rhythms(δ wave, θ wave, α wave, β wave, spindles and Kcomplex) are analyzed in this paper. Choosing db6 as wavelet basis function to transform each 30 s epoch in WPT. The paper decomposes the signal into 7 frequency bands: δ wave (1.6-4 Hz), θ wave (4-8 Hz), α wave (8-13 Hz), β1 wave (13-22 Hz), β2 wave (22-32 Hz), spindles (12-14 Hz) and K-complex wave (0.5-1.5 Hz). The wavelet transform only decomposes the low-frequency part of the signal step by step, and does not perform any processing on the high-frequency part of the signal. The WPT further decomposes the low-frequency component and the high-frequency component of the previous stage. Let ϕ(n) and Ψ(n) be scale function and wavelet function, respectively, so that: where i is the node number, j is the level of decomposition. h(m) and g(m) = (−1) 1−m h(1 − m) are a pair of quadrature mirror filters. Ψ j,k 2i and Ψ j,k 2i+1 are wavelet packet basis functions based on two-scale equations. The wavelet transform coefficients of the EEG signal x(n) at j-th level and k-th point are computed via the following recursive equations: For the M -layer wavelet packet decomposition, the EEG signal x(n) have 2 M sample points, then x(n) can be reconstructed by the Eq. (4).
For the 5-layer wavelet packet decomposition of the signal, there are 2 5 nodes in the 5-th layer. In the first layer decomposition, the EEG signal (0, 0) is decomposed into a low frequency part (1, 0) and a high frequency part (1, 1), in the second layer decomposition, the signal of (1, 0) is decomposed into a low frequency part (2, 0) and a high frequency part (2, 1), and the signal of (1, 1) is decomposed into a low frequency part (2, 3) and high frequency part (2, 2), the third to fifth layers are sequentially decomposed according to this law. In order to show the law of wavelet packet decomposition more intuitively, 0 is used to represent the low-frequency part obtained after each decomposition, and 1 is used to indicate the high-frequency part obtained by each decomposition, and then a binary number b n which contains n digits can be defined, which means it is decomposed into n layers. For example, (1, 0) can be expressed as 0, and (2, 2) can be expressed as 11. If the low-frequency portion of the upper layer is decomposed, the low-frequency portion is decomposed to the left side, and if the high-frequency portion of the upper layer is decomposed, the high-frequency portion is decomposed to the left side. So the frequency distribution of the nodes of the last layer is not in order. In this paper, the EEG signal is decomposed into five layers. The fifth layer contains 32 nodes. According to the above-mentioned wavelet packet decomposition law, the frequencies of these 32 nodes are arranged in order from low to high as: (5, 0)

Feature extraction
Sun et al. [Sun, Jia, Goparaju et al. (2017)] claimed that the duration of each epoch (30 s) is relatively long compared to transient events, including sleep spindle waves and K complex waves. So the spectral information of the epoch may not be fully extracted. Therefore, each 30 s epoch will be segmented into 29 subepochs of 2 s with 1 s overlap. The rhythms of the different frequency bands of the wavelet packet decomposition have been obtained, and the overall features of the 30 s epochs and the statistical features of the 29 subepochs are respectively calculated to represent the feature differences between the EEG signals from different sleep stages. The features extracted from rhythm waves are presented in Tab. 3. In this study, x(n) represents a kind of rhythm wave in 30 s epoch. Since the sampling rate of the EEG signal is 100 Hz, the length of x(n) is N = 3000.

Time-domain features
• Standard deviation(SD): SD is a time domain feature that measures the amount of variation of the EEG data which can be calculated by Eq. (5).
wherex is the mean of all samples in x(n), expressed as Eq. (6).
• Skewness (Skew): Skew is a time domain feature used to measure the degree of asymmetry of data distribution which can be calculated by Eq. (7).
wherex is the mean of x(n), and N = 3000 is the number of samples in x(n).
• Kurtosis (Kur): Kur is used to measure the level of the peak of the data frequency distribution curve.
wherex is the mean of x(n), and N = 3000 is the number of samples in x(n).
• Line length: The measurement of the amplitude and frequency of EEG oscillation can be realized by line length. The line length is represented by the variable L, and the formula is as follows.
where σ x , σ x and σ x" are the SD of the x(n) and its first and second derivatives, x(n) and x(n)", respectively.
• Katz fractal dimension (KFD): KFD is used to describe the geometric scale characteristics of EEG signals and is an indicator for measuring signal complexity.
where L refers to the sum of distances between two successive points, and d represents the maximum Euclidean distance between the first point and any other point on the waveform.
• Hurst exponent (HE): HE is used to study the scaling properties of a signal. The HE of x(n) is defined as follows: where R represents the biggest difference, that is, the difference between the maximum and the minimum of the deviation of x(n).

Frequency-domain features
The frequency-domain features extracted by the proposed method include two parts: the overall features of the 30 s epochs and the statistical features of the 2 s subepochs. the overall features of the 30 s epochs include the ratio of δ, θ, α, β1, β2, K-complex and spindles energy to total energy, the ratio of α energy to θ energy, and the ratio of δ energy to θ energy. The statistical features of the 2 s subepochs include the 95th percentile, min, mean and variance of E i /E all , where E i is the energy of i-th subepoch, i = 1, 2, ..., 29 and the E all is the overall energy of the 30 s epoch. By this feature extraction method, a 109-dimensional feature vector is obtained for one channel of each EEG signal epoch.
8 × 1 Ratio of 7 waves energy to total energy 7 × 1 Frequency Ratio of α energy to θ energy 1 Ratio of δ energy to θ energy 1 Mean, min, variance and 95th percentile of

Removing features with low variance
As can be seen from Tab. 3, each channel has 109 features, so the double-channel epoch contains 218 features. In order to eliminate the features with lower contribution, reduce the dimension of the feature vector, and reduce the difficulty of model training, the method of removing features with low variance is adopted in this study. The method of removing features with low variance is simple in principle and fast in operation. Assuming that the features vector of an epoch is F , each column in F is a feature, and the variance corresponding to each feature in F is calculated, and a threshold is set, if the variance is lower than this threshold, the feature is removed.

Weighted random forest algorithm
It can be seen from Tab. 1 that the number of epochs in W stage is much higher than that in other stages (N1, N2, N3 and REM) and the number of N1 epochs is the least. In view of the problem of data imbalance in sleep staging, the typical RF classification method is optimized in this paper, and a WRF model is established to improve the recognition rate of W, NREM and REM stage. It is assumed that the number of feature vectors for each class in the training set is N U M i , i = 1, 2, ..., 5, the number of feature vectors of all classes is N U M all , the number of class N = 5, and the weight of each class W i can be calculated by Eq. (14) RF classifier proposed by Leo Breiman contains multiple decision trees and its output category is determined by the highest number of votes among all trees' results [Breiman (2001)]. Through weighted bootstrap resampling technique, a new training sample set is generated by randomly selecting k samples repeatedly from the original training sample set M , and then a RF is generated from k individual decision tree classifier. The essence of RF classifier is an improvement of the decision tree algorithm. Multiple decision trees are merged together, and the establishment of each tree depends on an independently extracted sample. Each tree in the forest has the same distribution, and the classification error depends on each tree's classification ability and their correlation. In this paper, the subject-independent test is performed on the sleep-EDF database, and over-fitting is avoided by the 10-fold cross-validation method. The feature data extracted from each 30 s epoch EEG listed in Tab. 3 are trained through the WRF method after removing features with low variance.

HMM smoothing method
The sleep state and time series in the EEG data are closely related. The WRF classification result has no correlation from the time axis and may appear from the W stage to the N3 stage, which will lead to the wrong sleep stage switching. While HMM can consider the neighboring sleep stage handoff information and improve the classification accuracy of sequence sleep EEG signal. So this paper uses HMM to smooth the output of the WRF.
HMM is a probabilistic model of time series which describes the process of generating the sequence of unobservable states randomly from a hidden markov chain, and then generating the sequence of observations randomly from each state. There is an important property of HMM called non-aftereffect property, which means the current state of system is related only to the past state, not to the future state of system. Generally, HMM is mainly used to solve three kinds of problems: evaluation problem, decoding problem and learning problem. The HMM decoding problem refers to find the most likely path through a model given an observed sequence. Smoothing by HMM's Viterbi algorithm needs the stage transition matrix which describes the probability of transition from previous state to the next state and the emission matrix for the observations which describe the probability distribution of the stages assigned by WRF classifier as a function of the underlying true stage. In this work, the state transition matrix is obtained by calculating the probability of the handoff relation of neighboring epoch's sleep stage state. After obtaining the sleep state of a single subject in the test set after passing through the random forest, we get the emission matrix. The accuracy of the model is improved by using the Viterbi algorithm.

Evaluation criteria
In order to evaluate the effectiveness of the proposed method, confusion matrix (CM) will be calculated to show the difference between the result given by the proposed method and the experts' marks as shown in Eq. (15). S ij represents the number of epochs that are marked to be sleep stage i and are classified as sleep stage j.
AC, Recall (R), Precision (P) and F1-score(F1) can also be calculated from the CM to show the performance of the proposed method as shown in Eq. (19) to Eq. (22). Among them, AC represents the percentage of correctly classified number of epochs to the number of all testing epochs. In this paper, the AC, R and F1-score are used as indicators to evaluate the classification effect of each sleep stage. AC and Kappa coefficient are used to evaluate the general performance of all stages and measure the agreement between the algorithm prediction results and the expert score results.

Test scheme
According to the different ways of dividing training set and test set, the study uses two test methods to evaluate the performance of the model, which includs subject-non-independent test and subject-independent test.
(1) Subject-non-independent test The test scheme mixes the epochs of 8 subjects from the Sleep-EDF database for 10fold cross-validation and assesses performance based on the level of epochs rather than the subjects. In addition, the HMM smoothing method cannot be used for subject-nonindependent test because the temporal relationship between epochs has disappeared. In this type of testing, some of the test data and training data may come from the same EDF data, so the actual application is less meaningful than subject-independent test. However, in order to compare the results to some of the latest studies under similar conditions, we also provides test results of subject-non-independent test.
(2) Subject-independent test In the subject-independent test, the data in the test set and the training set are all independent of the subjects. The data of 19 people in the Sleep-EDF Expanded dataset are used to train the random forest classifier, and the data of one person is used for testing and the experiment is repeated 20 times. Clearly, subject-independent test avoids the selfcorrelation of EEG from the same subject to affect performance, and thus can provide more meaningful results for practical applications and clinical studies. This paper mainly conducts multi-research direction experimental verification of the proposed model based on subject-independent test.

Experiment results and discussion
In this study, experiments are performed according to different purposes using 2 subsets of Sleep-EDF database (4 sc* and 4 st*) and the Sleep-EDF Expanded database (20 sc*, 10 sc* + 10 st*). The data in Sleep-EDF database is often studied in the past work.
To conduct a fair and comprehensive comparison with previous studies on Sleep-EDF database, experiments are performed from multi-conditions which are set to be as similar as possible, including dataset size, EEG channel, and the number of sleep stages. At the same time, in order to verify the robustness and versatility of the proposed model, subjectindependent test is performed using 20 sc* files and 10 sc* + 10 st* files in Sleep-EDF Expanded database, respectively. If there is no special explanation, the experiments in this paper are for 5-class of sleep staging, of which the expected sleep stages are W, REM, N1, N2 and N3. In Section 3.1, this paper explores the performance of a kind of classifiers to verify that the proposed WRF model has advantages in sleep staging. In Section 3.2, E1, E2 and E3 indicate the advantages of the dual channel selected in this paper in automatic sleep staging. In this paper, the importance of HMM smoothing method in automatic sleep staging is shown in Section 3.3, which proves that the proposed method has strong generalization ability and competitiveness.

Comparison of different classifiers
It can be known from the no free lunch theorem that there is no universally applicable optimal classification model that is independent of the specific application. An algorithm (Algorithm A) performs better on a particular dataset than another algorithm (Algorithm B), and must be accompanied by the performance of Algorithm A on another particular dataset that is not as good as Algorithm B. Therefore, in order to compare the performance of different classification algorithms in sleep staging, this paper also lists the classification results of different classifiers to select a best classifier. As shown in Tab. 4, in addition to WRF, this paper also uses 2 classifiers that are widely used in various classification problems: Support Vector Machines (SVM) and eXtreme Gradient Boosting (XGBoost). The results show that the performance of the WRF classifier is the best.    In order to visually show the results of different channel EEG signals, Fig. 3(a) shows Recall, P and F1 of the N1 stage for subject-non-independent test in double-channel EEG, Fpz-Cz channel EEG and Pz-Oz channel EEG. The Recall and F1 of the Fpz-Cz channel are the lowest. It can be seen from Fig. 3(b) that the overall performance of the doublechannel EEG is optimal, and at the same time comparing Tabs. 5-7, it proves that the double-channel EEG selected in this study has strong superiority.

Experiments and results of sleep staging on Sleep-EDF Expanded database
This section contains a total of 8 experiments to explore the performance of the proposed model in different files and test the robustness and generality of the proposed model. E4 to E7 perform the subject-independent test of 20 sc* files in the Sleep-EDF Expanded database. E8 to E11 perform the subject-independent test of 10 sc* files and 10 st* files in the Sleep-EDF Expanded database. For Pz-Oz channel and double-channel signals from the sc* group, the confusion matrix and statistical parameters for each stage without and with HMM are listed in Tab. 8. For Pz-Oz channel and double-channel signals from the 10 sc* + 10 st* group, the statistical parameters for each stage without and with HMM are listed in Tab. 9. As shown in Tabs. 8 and 9, the classification results calculated by WRF have achieved high consistency with the standard labels by experts before HMM-based smoothing. In Sleep-EDF Expanded 1 database, this paper improves the AC of the double-channel from 92.02% to 94.34% by using HMM smoothing, and the total Kappa coefficient from 0.835 to 0.886. The overall AC of the Pz-Oz channel is improved from 90.88% to 94.04% and the Kappa is improved from 0.810 to 0.880. It is also worth noting that with the improvement of AC and Kappa, the R of the N1 stage has been significantly improved. In Tab. 9, The R for Pz-Oz channel signals and the R for double-channel signals increases by 22.27% and 21.32%, respectively. This provides a new way to increase the N1 stage recognition rate, which is a common challenge for automatic sleep staging systems.  In order to visually show the performance of HMM, Figs. 4(a) and 4(b) show the R, P and F1 of each stage for double-channel signals in Sleep-EDF Expanded 2 database without and with HMM smoothing, respectively. It can be clearly seen that R, P and F1 of each stage are improved, and the increase of R in the N1 stage is most obvious. The WRF algorithm based on HMM smoothing mainly improves the recognition rate of N1 stage and REM stage, which greatly improves the overall performance of the proposed method. After HMM smoothing, the overall AC for the double-channel EEG is optimized from 88.84% to 91.97%, and Kappa coefficient is optimized from 0.823 to 0.874. Similarly, the overall performance of Pz-Oz channel EEG with HMM smoothing has also been greatly improved, which fully demonstrates the competitiveness and robustness of the proposed model. The WRF without HMM classifier ignores contextual information between neighboring epoch during sleep staging, which assigns stages independent of other epochs. This method occasionally leads to the improvement of the false recognition rate of sleep staging classification, as seen in the example shown in the middle panel of Figs. 5(a) and 5(b). It  In terms of signal filtering research, traditional filtering algorithms such as Butterworth filters often do not fully understand the time-frequency characteristics of actual signals, and the EEG signal is an unsteady signal, which is more prominent. The wavelet packet transform method can identify and determine the frequency components contained in the signal, thereby filtering out noise or unwanted frequency components, and retaining the required rhythm waves for filtering purposes. As can be seen from Tab. 10, in Sleep-EDF Expanded database 1 and Sleep-EDF Expanded database 2, the sleep staging result after WPT is superior to the result of extracting the rhythm waves by the bandpass filter, regardless of whether or not the HMM smoothing method is employed.

Overall classification performance compared with state-of-the-art works
The performance of the proposed method is compared with the recent work available from the literature listed in Tab. 11. In comparison with state-of-the-art methods, the conditions are set as similar as possible, like dataset sizes, EEG channels and the test schemes. As shown in Tab. 11, the proposed method gains the best performance on both the Fpz-Cz channel and the Pz-Oz channel in the subject-non-independent test from 8 subjects of the Sleep-EDF database compared to the recent machine learning method. Seral [Seral (2013)] used the EEG, EMG and EOG recordings of 5 healthy subjects to establish an artificial neural network (ANN) classification model, and improved the sequential feature selection method to classify the sleep stages step by step. However, the AC is about 2.3% lower than the proposed method.
For subject-independent test works on the Sleep-EDF Expanded database 1, Jiang et al. [Jiang, Lu, MA et al. (2019)] achieved the best AC and Kappa of 91.4% and 0.862, which is much better than other studies, but is still lower than the proposed method by 3.9% and 0.018. Compared to the performance for Pz-Oz channel signals, the overall AC and Kappa for the double-channel signals of the proposed method reached 94.34% and 0.886. At the same time, even though the results of the proposed method for Sleep-EDF Expanded database 2 are better than those for other methods of the sc* group topic which indicates that the method has good generalization capabilities for different databases. Jiang et al. [Jiang, Lu, MA et al. (2019)] also used RF and HMM model for subject-independent test on 10sc* and 10st* files, but the overall AC and Kappa were only 87.80% and 0.810 which is far lower than the proposed method of 91.97% and 0.874. In general, the Kappa coefficient of less than 0.4 is considered to be bad, the value between 0.41 and 0.60 is normal, the value from 0.61 to 0.80 is good, and the value higher than 0.80 is excellent. The proposed method has higher Kappa coefficients for different channel signals of different datasets which indicates that our algorithm results are in good agreement with the expert labels.

Discussion
An automatic sleep staging algorithm, based on the AASM standard, using WRF classifier and HMM is implemented successfully. This paper uses double-channel (Fpz-Cz and Pz-Oz) EEG from the Sleep-EDF database and the Sleep-EDF Expanded database to study the classification performances of different sleep stages. The datasets used in the proposed method is public, which facilitates repeatability studies to facilitate comparison with other papers. The rhythm waves in different sleep stages are significantly different. For example, the δ wave plays a major role in the recognition of the N3 stage, while the N1 stage EEG signal is characterized by the appearance of vertex sharp waves and high amplitude θ waves. Therefore, rhythm wave extraction can effectively identify different sleep stages. The study of Sun et al. [Sun, Jia, Goparaju et al. (2017)] shows that the duration of the rhythm wave is less than 2 s, so this study divides each 30 s epoch into 29 subepochs of 2 s long with 1 s overlap. The contributions of this paper are WPT-based rhythm wave decomposition method and the method of feature extraction. Unlike most studies, this paper uses a 5-layer wavelet packet transform method to decompose each epoch into δ, θ, α, β, spindles and K-complex waves. As shown in Tab. 10, the AC of the proposed method increases by 2% to 3%, and the Kappa coefficient increases by about 0.02, compareing with the method of extracting the rhythm wave using a bandpass filter. In this paper, two types of features are extracted including long-term features and short-term features, and the features having higher correlation with sleep staging are selected by the method of removing features with low variance. Subepoch segmentation and feature selection are the main reasons for the good performance of the proposed model. As a validation, the method has been tested under different conditions, such as different datasets, EEG channels and training models. The results not only indicate the good applicability of the proposed method under various conditions, but also provide the most suitable conditions for potential applications. Another contribution of this method is the HMM smoothing method. This is a fully automatic correction algorithm that does not require pre-defined rules. There is a great correlation between adjacent sleep stages. For example, it is impossible to directly convert the sleep state from N1 to N3, as can be seen from the example of the sleep stage change in Fig. 5(a). Contextual information from neighboring epochs can be interpreted using HMM. The predicted hypnogram after smoothing is shown in the bottom panel of Fig. 5(c). This paper solves the problem that the sleep staging algorithm can only give the classification results of each stage separately without considering the contact between time series and adjacent sleep stages. The innovation of this paper is to propose a novel method of sleep staging, and establish a model combining RF and HMM. There are a lot of literatures about sleep staging algorithms either using RF classification model alone or applying HMM alone, the contribution of this paper lies in the combination of these two methods. The proposed method can take the sleep stage state of neighboring epoch into account, which can smooth the classification model and improve the accuracy of it. At present, the research on sleep staging has been very extensive, but all methods face the same problem, that is, the recognition rate of the N1 stage is much lower than other stages. Because N1 stage is a transitional stage between the W and the N2 stage, and has the characteristics of these two stages. The duration of the N1 stage is much less than other four stages, so it may be wrongly marked with these two stages. In addition, there is a strong similarity between the N1 stage and the REM stage, it is difficult for experts to distinguish the two stages. To solve this problem, this paper proposes an improved WRF model to balance the training set by setting different weights so that the number of epochs in each classes is equal. Fraiwan et al. [Fraiwan, Lweesy and Khasawneh (2012)] used the time-frequency analysis methods of CWD, CWT and HHT to study the sleep staging of single-channel EEG signals, The accuracy and Kappa coefficients for identifying sleep stages using CWD, CWT, and HHT were 78% and 70%, 83% and 76%, 75% and 65%, respectively. Compared with the proposed method, their research is based on single-channel EEG data, but the accuracy and Kappa coefficient have no obvious advantages. Another study was done by Masaaki et al. [Masaaki, Masaki and Haruaki (2002)] which obtained an accuracy of 80% according to AASM sleep staging rules. Their work was based on spectral analysis of the EEG waveforms and they used decision-tree learning for the classification procedure. The proposed method in this paper uses a classification model combining WRF and HMM , and the AC reaches 91.97% when using double-channel EEG data in Sleep-EDF Expanded 2 database.

Conclusions
As accurate sleep staging is helpful to diagnose some disease related to sleep disorder. Doctors need to manually observe the EEG data to mark different sleep stages, which will take a lot of time according to AASM standards. But the variability of sleep EEG waves, accurate sleep staging is hard work for most of the doctors. In order to solve the problem of intelligent sleep staging, the paper proposes an automatic sleep staging method based on RF and HMM with double-channel EEG data. The experimental data are coming from Sleep-EDF database and Sleep-EDF Expanded database. The sleep EEG data are divided into 5 classes (W, REM, N1, N2 and N3) and are segmented into 30 s epochs and 2 s epochs to extract the time domain and frequency domain features based on the WPT coefficient and some statistical features of different rhythm waves. The performance of the proposed method is evaluated by Kappa coefficient, AC, Recall, P and F1-score of the Sleep-EDF database and the Sleep-EDF Expanded database. The experimental results of using the Sleep-EDF Expanded 2 database show that the AC and Kappa coefficient reaches 91.97% and 0.874, respectively. The best AC and Kappa coefficient of single subject reaches 99.68% and 0.919, respectively. Further study on diagnosis of sleep-related disease like depression can be researched based on the automatic sleep staging algorithm proposed in this paper.