Collaborative Sleep Electroencephalogram Data Analysis Based on Improved Empirical Mode Decomposition and Clustering Algorithm

Sleep-related diseases seriously affect patients' quality of life. Sleep stage classification (or sleep staging), which studies the human sleep process and classifies its stages, is an important reference for the diagnosis and study of sleep disorders. Many scholars have conducted sleep staging studies, but the correlation between different sleep stages and the accuracy of classification still need to be improved. Therefore, this paper proposes an automatic sleep stage classification method based on EEG. By constructing an experimental model combining improved empirical mode decomposition and K-means, the concept of a "frequency-domain correlation coefficient" is defined. During feature extraction, the feature vector with the best correlation in the time-frequency domain is selected. Extraction and classification of EEG features are realized with the K-means clustering algorithm. Experimental results demonstrate that the classification accuracy is significantly improved and that the proposed algorithm has a positive impact on sleep staging compared with other algorithms.


Introduction
Sleep is of extraordinary significance to human beings and is closely related to daily life. It plays an important role in the maintenance of human body functions because it can enhance assimilation and reduce dissimilation [1,2]. Sleep is a rapidly reversible state characterized by loss of consciousness and diminished response to external stimuli [3][4][5]. For humans, one of the principal causes of medical problems is sleep-related disease, which seriously affects patients' quality of life. The purpose of sleep staging is to classify sleep stages, which is essential for sleep studies and the diagnosis of sleep disorders. Traditionally, experts manually analyze overnight polysomnography (PSG) records and perform visual scoring according to the Rechtschaffen and Kales (R&K) recommendations or the newer guidelines developed by the American Academy of Sleep Medicine (AASM). Later, under the improved AASM rules, the S3 and S4 phases were merged into slow-wave sleep (SS), and sleep was divided into five stages: the W, S1, S2, SS, and REM periods [6,7].
An electroencephalogram (EEG) is a record that reflects the rhythmic electrical activity of groups of brain cells, and it contains a large amount of physiological and pathological information. It helps clinicians improve the reliability and accuracy of diagnosing and detecting neurological injury in the brain [8][9][10]. At the same time, it provides an effective method for the diagnosis of brain diseases. An EEG signal is a waveform that contains a variety of frequency components; it is usually divided into β waves (13-30 Hz), α waves (8-13 Hz), θ waves (4-7 Hz), and δ waves (0-4 Hz).
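As a minimal numerical illustration of these bands (a sketch of our own, not from the paper; the function name and the toy signal are assumptions, and the 13-30 Hz upper edge for β is the common convention), the power of an epoch in each band can be estimated from its FFT amplitude spectrum:

```python
import numpy as np

def band_powers(x, fs=100.0):
    """Mean squared FFT amplitude of an EEG epoch within each classical band.

    Band edges follow the ranges quoted in the text; the 13-30 Hz upper
    limit for beta is a common convention, added here as an assumption.
    """
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    amp = np.abs(np.fft.rfft(x)) / len(x)
    bands = {"delta": (0, 4), "theta": (4, 7), "alpha": (8, 13), "beta": (13, 30)}
    return {name: float(np.mean(amp[(freqs >= lo) & (freqs < hi)] ** 2))
            for name, (lo, hi) in bands.items()}

# a pure 10 Hz oscillation should put almost all of its power in the alpha band
t = np.arange(0.0, 30.0, 1.0 / 100.0)
powers = band_powers(np.sin(2 * np.pi * 10 * t))
```

A real epoch would mix all four bands; the relative band powers are one simple way to see which rhythm dominates.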
Therefore, sleep stages can be classified based on the different brainwave frequencies and data characteristics in the EEG signal. The manual classification of 8-hour PSG recordings (whole records) takes approximately 2 to 4 hours. Moreover, manual marking carries strong subjective bias, which easily affects classification accuracy. Therefore, the study of automatic sleep stage classification is imperative [11]. Through the analysis of PSG records, automatic sleep stage classification (ASSC) can be achieved with computers, thus solving the time-consuming and laborious problem of manual marking [12].
Considering the nonlinear and nonstationary temporal complexity of EEG signals, we propose an automatic sleep stage classification method based on EEG. The improved complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) algorithm is used to extract features of the EEG data and calculate the intrinsic mode function (IMF) components. By calculating the frequency-domain correlation coefficients of the IMF components, an appropriate number of IMF components is selected, and a new feature vector is formed as the input to the next-stage classifier. The clustering of the extracted and selected features is performed by an improved K-means algorithm. ASSC is finally achieved based on the selection of correlation distances and cluster centers.
The main contributions of this study are as follows: (1) An improved CEEMDAN incorporating the frequency-domain correlation coefficient is proposed to extract EEG features and calculate the IMF components. By calculating the frequency-domain correlation coefficients of the IMF components, an appropriate number of IMF components is selected and the new feature vector is formed.
(2) An improved K-means clustering algorithm with a density measure based on the correlation coefficient is proposed. The correlation coefficient is first defined as the distance metric, based on the temporal and spatial correlation of the time series data. Then, density is used to select the initial clustering centers of K-means, and the clustering centers are iteratively updated by calculating the average of all points in each cluster.
(3) For the nonlinear and nonstationary temporal complexity of EEG signals, we propose an automatic sleep stage classification method based on the improved empirical mode decomposition and K-means clustering algorithms. Through these improvements to the feature extraction method and the classifier algorithm, the classification accuracy of ASSC is markedly improved, and better experimental results are obtained.
The remainder of this paper is organized as follows. Section 2 reviews the state of the art in automatic sleep stage classification, and Section 3 briefly introduces EMD and its variants. Section 4 presents the proposed ASSC based on improved CEEMDAN and K-means in detail, including the overall framework, the improved CEEMDAN incorporating the frequency-domain correlation coefficient, and the improved K-means clustering algorithm with a density measure based on the correlation coefficient. Section 5 describes the experimental setting and analyzes the results on sleep staging accuracy and clustering effectiveness, and also discusses the findings of this study. Finally, Section 6 concludes the paper and discusses future research opportunities.

Related Work
To further improve the efficiency and accuracy of ASSC, researchers have conducted a large number of experimental studies to achieve better ASSC results by improving the relevant algorithms [13][14][15]. Among them, feature selection is applied to enhance the ability to classify the training data and can improve the efficiency and accuracy of data classification [16][17][18]. Recently, feature analysis and extraction methods have been increasingly studied, and classical or modern signal processing methods have been adopted to analyze EEG data. For example, Anderson et al. [19] used an autoregressive (AR) model to extract features of EEG signals and used two- and three-layer feedforward neural networks to perform 10-fold cross-validation on 4 subjects with 5 cognition tasks. To achieve better results, Yang et al. [20] proposed an EEG feature extraction method based on wavelet packet decomposition to classify two different thinking activities. Fell et al. [21] conducted a comparative study of frequency-domain and nonlinear methods and divided the sleep process into four stages: S1, S2, SS, and REM; the extracted frequency-domain features included power, spectral edge, and D2. However, these studies ignore the changes in the timing characteristics of EEG and in local feature signals, along with self-adaptation. Fortunately, the empirical mode decomposition (EMD) algorithm addresses the nonstationarity of EEG data [22], and feature extraction can be performed adaptively by decomposing the data into intrinsic mode functions (IMFs). Therefore, scholars have conducted extensive studies applying EMD and its variants to signal filtering and detection, fault analysis, and physiological signal processing. These results are difficult to achieve with traditional methods such as the Fourier transform and the wavelet transform [23].
Due to the good self-adaptability of the EMD algorithm, researchers have studied it extensively. MuFeng and Yuyu [24] proposed an improved EMD filtering algorithm and used the fast Fourier transform (FFT) to perform simple spectrum analysis on the signal; if high-frequency noise is present, the first-order IMF component produced by EMD is processed, which can achieve a better filtering effect. Zhang et al. [25] proposed a fast wavelet transform (FWT), which achieves high computational speed while improving computational accuracy. Wu and Huang [26] proposed the ensemble empirical mode decomposition (EEMD) algorithm, an improved EMD algorithm that effectively mitigates the mode mixing phenomenon of EMD. Later, the CEEMDAN algorithm was proposed with added adaptive noise, further reducing the residual modal noise; compared with the former algorithms, it has better convergence. Hassan and Bhuiyan [27] applied it to the analysis of EEG data to achieve ASSC. At present, many classification methods are applied to EEG signals, including clustering, SVM, neural networks, and decision trees [28,29]. The K-means algorithm was proposed by MacQueen in 1967 [30]. It is a numerical clustering algorithm and requires the simultaneous extraction of N features. The original K-means is a distance-based iterative clustering algorithm; its advantages are that it is fast, simple, and efficient.
This paper focuses on clustering algorithms and applies an improved K-means algorithm to sleep staging. For example, Günes et al. [31] proposed a combined structure of feature weighting and a C4.5 decision tree based on K-means clustering for sleep stage classification. Applying clustering algorithms to ASSC not only avoids the time-consuming and laborious manual marking but can also effectively improve the efficiency of staging and the accuracy of ASSC [32].

EMD and Its Variants
EMD is a novel and adaptive time-frequency signal processing method proposed by Huang et al. [33] in 1998. It is especially suitable for the analysis of nonlinear, nonstationary signals. In 1999, Huang et al. [34] improved EMD and introduced Hilbert spectral analysis to enhance its data processing capability, and it came to be considered a breakthrough with respect to linear and steady-state spectral analysis based on the Fourier transform. EMD aims to generate a highly localized time-frequency estimate of a signal in a data-driven fashion by decomposing it into a finite sum of IMFs, or modes. Each mode must satisfy two conditions: (1) the number of extrema and the number of zero crossings must be equal or differ by at most one; (2) at any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero. For an input signal, EMD iteratively decomposes an N-point EEG epoch into amplitude- and frequency-modulated IMFs according to the following steps:
Step 1: initializing k = 1 and l = 1.
Step 2: identifying the local maxima and minima of input EEG data x.
Step 3: obtaining the envelope of local maxima v max and local minima v min using cubic spline interpolation.
Step 4: generating the local mean curve m from the upper and lower envelopes:
m = (v_max + v_min)/2.
Step 5: computing h_k by subtracting the local mean curve from x:
h_k = x − m.
Step 6: if h_k satisfies the two conditions of an IMF, then IMF_l is obtained; otherwise, setting x = h_k and k = k + 1 and going to step 2, repeating steps 2-5 until h_k satisfies the two conditions of an IMF, whereupon IMF_l is obtained as
IMF_l = h_k.
Step 7: setting c_l = IMF_l as the current mode.
Step 8: finding the residue r_l = x − c_l and setting x = r_l and l = l + 1. Steps 2-8 are known as sifting.
Step 9: repeating steps 2-8 to find the rest of the IMFs.
Thus, the input signal is decomposed into L IMFs until the residue becomes a monotonic function from which no further IMF can be extracted. The input x can be reconstructed from all the IMFs as follows:
x = Σ_{l=1}^{L} c_l + r_L,
where r_L is the residue of the L-th iteration.
EMD and its variants, such as bivariate EMD and multivariate EMD, are widely used for EEG and other physiological signal analysis [35]. However, EMD and its extensions suffer from a mode mixing problem. Later, to eliminate mode mixing, a noise-assisted adaptive extension of EMD named EEMD was proposed [36]. In EEMD, low-level random noise is added to the input of the EMD decomposition process; each run of EMD on a modified signal is called a trial, and the trial is repeated many times to obtain the final modes. The biggest improvement of EEMD is that it adds white Gaussian noise to the signal before EMD:
x_i(n) = x(n) + w_i(n),
where w_i(n), with i = 1, 2, . . ., I, represents different realizations of white Gaussian noise. Although EMD is data-driven, it is affected by mode mixing, which places different oscillations in the same mode or similar oscillations in different modes, and the structure of the algorithm is too simple to avoid this. Although EEMD effectively mitigates mode mixing, its decomposition leaves residual noise and is also computationally expensive.
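The sifting steps above can be sketched numerically. The following is a sketch of a single sifting pass (steps 2-5) using NumPy and SciPy; the function name, the toy two-tone signal, and the use of SciPy's spline routines are our own assumptions, not the paper's implementation:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    """One sifting pass of EMD: cubic-spline envelopes through the local
    maxima and minima, their mean curve m, and the candidate mode h = x - m."""
    t = np.arange(len(x))
    imax = argrelextrema(x, np.greater)[0]   # indices of local maxima
    imin = argrelextrema(x, np.less)[0]      # indices of local minima
    v_max = CubicSpline(imax, x[imax])(t)    # upper envelope
    v_min = CubicSpline(imin, x[imin])(t)    # lower envelope
    m = (v_max + v_min) / 2.0                # local mean curve
    return x - m                             # candidate IMF h_k

# toy signal: a fast 25 Hz oscillation riding on a slow 3 Hz trend
t = np.linspace(0.0, 1.0, 500)
x = np.sin(2 * np.pi * 25 * t) + 0.5 * np.sin(2 * np.pi * 3 * t)
h = sift_once(x)
```

After one pass, h is already much closer to the fast oscillation than x is (away from the epoch edges, where spline envelopes are less reliable); full EMD repeats this until the IMF conditions hold.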

Overall Framework.
We design an improved CEEMDAN and K-means combined algorithm to implement ASSC, as shown in Figure 1. On the basis of time-domain feature extraction, we further add frequency-domain feature calculations and select the most effective IMF components.

This approach eventually achieves ASSC through K-means clustering. The process of the proposed approach is as follows: (1) Input: data preprocessing removes artifacts and interference noise by wavelet denoising, and the resulting EEG data are used as experimental sample data. (2) Feature extraction: we use the improved CEEMDAN algorithm to extract features from the sample data. Through the calculation of the time-domain correlation of the EEG signal, the sample data are subjected to EMD decomposition to obtain the IMF components. Then, we apply an iterative calculation to extract features from the EEG signals.
(3) Feature selection: on the basis of feature extraction and IMF component acquisition, we transform each IMF component into its corresponding frequency-domain representation through the FFT, and its "frequency-domain correlation coefficient" is obtained through empirical selection and data analysis; then, IMF components with high frequency-domain correlation are selected, and the selected IMF components are reconstructed. (4) Classifier: the reconstructed EEG is used as input to achieve sleep staging based on the improved K-means clustering algorithm. (5) Output: the clustering result is output and then compared with the manual markers.

Improved CEEMDAN Algorithm.
The CEEMDAN algorithm is an improvement on the original EMD algorithm. The added white Gaussian noise is subjected to EMD decomposition to obtain the IMF components based on the bootstrap aggregation method, and then iterative calculations are performed to decompose the dataset. It is worth mentioning that the performance of the traditional time-frequency transform based on the wavelet transform depends on the choice of the best basis function. CEEMDAN is data-driven and requires no predefined basis functions, which makes it an attractive option for dealing with highly nonlinear and nonstationary signals such as sleep EEG signals [37][38][39].
In this study, an improved CEEMDAN incorporating the frequency-domain correlation coefficient is proposed, in which the IMF's frequency-domain correlation coefficient is redefined to capture the correlation of each IMF component with the original signal in the frequency domain. Then, an appropriate number of IMF components is selected, and the new feature vector is formed by calculating the frequency-domain correlation coefficients of the IMF components. The improved CEEMDAN algorithm can be described as follows, where the operator E_j(·) produces the j-th mode obtained by EMD.
Step 1: calculate
X_i(n) = x(n) + ε_0 w_i(n),
where X_i(n) is obtained by adaptively adding a white noise sequence to x(n); w_i(n), with i = 1, 2, . . ., I, represents different realizations of white Gaussian noise; and ε_0 is the standard deviation of the added white Gaussian noise.
Step 2: decompose the above signals by EMD to obtain their first modes.
Step 3: compute the first mode IMF_1 of CEEMDAN as
IMF_1(n) = (1/I) Σ_{i=1}^{I} E_1(X_i(n)).
Step 4: obtain the first residue:
r_1(n) = x(n) − IMF_1(n).
Step 5: decompose the realizations r_1(n) + ε_1 E_1(w_i(n)), with i = 1, 2, . . ., I, up to their first EMD mode, where ε_k (k = 1 at this stage) is the standard deviation of the white Gaussian noise at the k-th stage. IMF_2(n) is then calculated as
IMF_2(n) = (1/I) Σ_{i=1}^{I} E_1(r_1(n) + ε_1 E_1(w_i(n))).
Step 6: compute the k-th residue for k = 2, 3, . . ., K:
r_k(n) = r_{k−1}(n) − IMF_k(n).
Step 7: decompose the realizations r_k(n) + ε_k E_k(w_i(n)), with i = 1, 2, . . ., I, up to their first EMD mode and define the (k + 1)-th mode as
IMF_{k+1}(n) = (1/I) Σ_{i=1}^{I} E_1(r_k(n) + ε_k E_k(w_i(n))).
Step 8: go to step 6 for the next k.
Steps 6 to 8 are repeated until the residue becomes a monotonic function from which no further IMF can be extracted. K is the total number of modes and r_K(n) is the final residue. Then, k + 1 EMD modes are obtained through the iterative calculation shown in formula (11). At this point, we have multiple intrinsic mode functions (IMFs), that is, IMF_1, IMF_2, . . ., IMF_K.
Step 9: for each IMF component, the fast Fourier transform (FFT) is performed to transform the EEG signal from the time domain to the frequency domain. It is defined as
X(k) = Σ_{n=0}^{N−1} x(n) W_N^{nk}, k = 0, 1, . . ., N − 1,
where W_N = e^{−j(2π/N)}. The frequency spectrum of each IMF component is analyzed with the FFT, and the corresponding frequency-domain form of the time-domain feature is calculated to obtain the frequency and amplitude. In the experiment, we directly call the fft function in MATLAB, setting the sampling frequency f_s = 100 Hz and sample time t = N/f_s, where N is the data length. The amplitude and frequency are shown in Figure 2.
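The spectrum computation in step 9 can be reproduced outside MATLAB. Below is a NumPy equivalent (an assumption on our part; the paper uses MATLAB's fft) that recovers the amplitude and dominant frequency of a stand-in IMF sampled at f_s = 100 Hz:

```python
import numpy as np

fs = 100.0                                # sampling frequency, as in the paper
n = 3000                                  # one 30 s epoch: t = n / fs
t = np.arange(n) / fs
imf = np.sin(2 * np.pi * 10 * t)          # toy IMF oscillating at 10 Hz

spectrum = np.fft.rfft(imf)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
amplitude = 2.0 * np.abs(spectrum) / n    # single-sided amplitude spectrum

peak_hz = freqs[np.argmax(amplitude)]     # dominant frequency of the IMF
```

For a unit-amplitude 10 Hz sine this yields a single spectral peak of amplitude 1 at 10 Hz, which is the kind of (frequency, amplitude) pair shown in Figure 2.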
Step 10: to obtain effective feature components after CEEMDAN decomposition, we redefine the IMF's frequency-domain correlation coefficient index by improving the concept of the "frequency-domain correlation coefficient" proposed in [40]:
ρ_{x,IMF_i} = |Σ_f (IMF_i(f) − μ_{IMF_i})(x(f) − μ_x)| / (N σ_{IMF_i} σ_x). (13)
In formula (13), IMF_i(f) and x(f) represent the frequency-domain forms of IMF_k(n) and x(n) in formula (7), respectively; μ_{IMF_i} and μ_x are the frequency-domain means of IMF_k(n) and x(n); σ_{IMF_i} and σ_x are their frequency-domain standard deviations; and N is the number of frequency bins. The frequency-domain correlation coefficient of the IMF reflects the correlation of each IMF component with the original signal in the frequency domain: ρ_{x,IMF_i} ∈ [0, 1], with larger values indicating stronger correlation. Therefore, according to the value of ρ_{x,IMF_i}, the most effective IMF components can be selected from the multiple components obtained by the decomposition, and the IMFs can be recorded from high to low frequency band: IMF = IMF_1, IMF_2, . . ., IMF_K. The specific details of IMF component screening are described in the experimental section and in Figure 3.
Step 11: the input x(n) can be reconstructed from the selected IMFs as follows:
x(n) = Σ_k IMF_k(n) + r(n),
where the sum runs over the selected IMF components and r(n) is the residue. We then calculate the mean (μ), variance (σ²), skewness (γ), and kurtosis (κ) of the reconstructed x(n), as defined mathematically in Table 1.
The IMF time-frequency-domain feature vector is calculated for the N-band EEG data, and the feature vector is composed of these four statistics for each band. The input x(n) is thus reconstructed from an N-dimensional feature set into a new x(n) set used as input to the next-stage classifier.
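The four statistics can be computed directly from their moment definitions. The following sketch (our own; the function name is an assumption, and we assume Table 1 lists the textbook moment forms) evaluates them for a reconstructed component:

```python
import numpy as np

def time_features(x):
    """Mean, variance, skewness, and kurtosis of a (reconstructed) signal,
    computed with the textbook central-moment definitions; assumed to match
    the statistics listed in Table 1."""
    mu = x.mean()
    var = x.var()
    sd = np.sqrt(var)
    skew = np.mean((x - mu) ** 3) / sd ** 3
    kurt = np.mean((x - mu) ** 4) / sd ** 4
    return np.array([mu, var, skew, kurt])

# a symmetric two-level signal: mean 0, variance 0.5, skewness 0, kurtosis 2
f = time_features(np.array([0.0, 1.0, 0.0, -1.0] * 25))
```

Stacking these four values for each of the N bands gives the feature vector passed to the classifier.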

Improved K-Means Algorithm.
K-means is a basic clustering algorithm; however, it is limited in some practical applications by its own mechanisms. First, K must be given in advance, and the choice of the K value is very difficult to estimate. Second, an initial partition must be determined from the initial clustering centers. It can also be seen from the K-means framework that the time complexity of the algorithm is large, that its application to time series data is vulnerable to outliers, and that the uncertainty of the K value leads to a decline in clustering quality [41]. To overcome the tendency of K-means to converge to local optima and to reduce the influence of an inaccurate K value on clustering quality, we improve the K-means algorithm as follows.
In this paper, the initial clustering centers are selected by a density concept based on the correlation coefficient and the correlation distance. For EEG sample data, there is a certain correlation between the time series. The correlation coefficient is defined as follows:
P_{x_i,x_j} = cov(x_i, x_j) / sqrt(D(x_i) D(x_j)). (16)
In formula (16), cov(x_i, x_j) is the covariance of x_i and x_j, and D(x_i) and D(x_j) are the variances of x_i and x_j, respectively. P_{x_i,x_j} is called the correlation coefficient and measures the degree of correlation between the random variables; P_{x_i,x_j} ∈ [−1, 1], and a greater correlation coefficient is associated with greater correlation between x_i and x_j. When P_{x_i,x_j} is 1 or −1, there is an exact linear relationship between x_i and x_j. The correlation distance is calculated as
d_{x_i,x_j} = 1 − P_{x_i,x_j}. (17)
For these data relationships, density is defined over the data points randomly distributed within a certain range. Setting D = {x_1, x_2, . . ., x_n}, the density of x_i is defined as
ρ(x_i) = Σ_j d_{x_i,x_j}, (18)
where x_j ranges over the set of points closest to x_i. In the clustering process, the point with minimum ρ is taken as the first clustering center. When the next cluster center is to be determined, the cluster formed around the first ρ-minimum point is removed from the dataset D; among the remaining points, the one with the smallest ρ is chosen as a new clustering center, and so on until K clustering centers are selected. Most related algorithms use the Euclidean distance to compute cluster centers. However, the Euclidean distance neglects the correlation between time series data and is therefore not well suited to EEG signal analysis. Hence, the improved K-means uses the correlation coefficient as the distance metric, based on the temporal and spatial correlation of the time series data, taking full account of the correlations in the data.
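The density-based seeding can be sketched as follows (a sketch under our reading of formulas (16)-(18): the exact distance form, the neighbourhood size m, and the toy data are our assumptions):

```python
import numpy as np

def corr_distance(xi, xj):
    """Correlation distance between two series, d = 1 - P, where P is the
    Pearson correlation coefficient of formula (16). The exact distance
    form of formula (17) is our assumption."""
    return 1.0 - np.corrcoef(xi, xj)[0, 1]

def density(D, i, m=2):
    """Density of point i: sum of correlation distances to its m nearest
    neighbours (our reading of formula (18)). A small value marks a tightly
    correlated region, so the minimum-density point is the first centre."""
    d = sorted(corr_distance(D[i], D[j]) for j in range(len(D)) if j != i)
    return sum(d[:m])

# toy data: three noisy sine series and three noisy cosine series
t = np.linspace(0.0, 1.0, 200)
rng = np.random.RandomState(0)
D = np.array([np.sin(2 * np.pi * 5 * t) + 0.05 * rng.randn(200) for _ in range(3)]
             + [np.cos(2 * np.pi * 5 * t) + 0.05 * rng.randn(200) for _ in range(3)])
first_center = min(range(len(D)), key=lambda i: density(D, i))
```

Repeating the argmin after removing the chosen centre's neighbourhood yields the remaining K − 1 initial centres, as described above.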
After selecting the initial centers, the improved K-means iterates to update the prototypes, calculating the average of all the points in each class as the new clustering center. The mean vector of the new clustering center is defined as
μ_k = (1/|C_k|) Σ_{x∈C_k} x, (19)
where C_k is the set of points assigned to the k-th cluster. For a given dataset, the data distribution is assumed to conform to a normal distribution law.
Thus, the normally distributed sample data are examined in detail to determine an accurate threshold for each segment, used in conjunction with the idea of a piecewise function. The normal distribution probability and the data correlation coefficient are then treated as equivalent, where u is the mean of the distances from the interior points to the cluster center and σ is the standard deviation of those distances.
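One iteration of this assign-then-average scheme can be sketched as follows (our own sketch: the function names, the fallback for empty clusters, and the toy sine/cosine data are assumptions, and the normal-distribution threshold refinement is omitted):

```python
import numpy as np

def corr_dist(a, b):
    # distance derived from the correlation coefficient of formula (16)
    return 1.0 - np.corrcoef(a, b)[0, 1]

def kmeans_step(D, centers):
    """One iteration of the improved K-means: assign every series to the
    centre at the smallest correlation distance, then replace each centre
    with the mean vector of its members (formula (19))."""
    labels = np.array([int(np.argmin([corr_dist(x, c) for c in centers]))
                       for x in D])
    new_centers = np.array([D[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
    return labels, new_centers

# toy data: three noisy sine series and three noisy cosine series
t = np.linspace(0.0, 1.0, 200)
rng = np.random.RandomState(0)
D = np.array([np.sin(2 * np.pi * 5 * t) + 0.05 * rng.randn(200) for _ in range(3)]
             + [np.cos(2 * np.pi * 5 * t) + 0.05 * rng.randn(200) for _ in range(3)])
labels, new_centers = kmeans_step(D, D[[0, 3]])
```

Iterating kmeans_step until the labels stop changing reproduces the convergence criterion used in Algorithm 1.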

Algorithmic Flowchart and Description.
The flowchart of the proposed ASSC is depicted in Figure 4, and its algorithmic description is summarized in Algorithm 1.

Performance Metrics.
We use the accuracy rate (ACC) as an objective evaluation metric: the number of correctly classified samples divided by the total number of samples. It is defined as
ACC = (TP + TN) / (TP + TN + FP + FN),
where TP and TN represent, respectively, the positive and negative samples classified into the correct types, and FP and FN the misclassified ones. ACC, as one of the evaluation indexes of classification accuracy, can effectively measure the classification accuracy of the experimental results; therefore, we calculate the ACC to evaluate the experiments.
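In the multi-stage setting used here, this reduces to counting epochs whose predicted stage matches the manual marker. A minimal sketch (function name and toy labels are ours):

```python
def accuracy(predicted, manual):
    """ACC: correctly classified epochs divided by all epochs, with the
    manual markers taken as ground truth."""
    correct = sum(p == m for p, m in zip(predicted, manual))
    return correct / len(manual)

acc = accuracy(["W", "S1", "S2", "SS"], ["W", "S1", "SS", "SS"])  # 3 of 4 correct
```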
To evaluate the K-means clustering algorithm, we use the SSE as the performance metric of clustering quality. The SSE is the sum of squared errors between the fitted data and the corresponding original data points; the smaller the SSE, the smaller the error between the samples and the centers. It is defined as
SSE = Σ_{k=1}^{K} Σ_{x_i ∈ C_k} (x_i − u_k)²,
where N indicates the number of data points, K represents the number of cluster centers, C_k is the k-th cluster, and u_k indicates its cluster center. The aim is to make the data within each cluster as similar as possible and distinct between clusters.
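The SSE computation can be sketched in a few lines (our own sketch; the function name and the four-point toy data are assumptions):

```python
import numpy as np

def sse(D, labels, centers):
    """Sum of squared errors: squared distance of every sample to the
    centre of the cluster it was assigned to."""
    return float(sum(np.sum((D[labels == k] - c) ** 2)
                     for k, c in enumerate(centers)))

D = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[1.0], [11.0]])
err = sse(D, labels, centers)   # each point sits 1 away from its centre
```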

Datasets and Preprocessing.
Experimental data were obtained from the Sleep-EDF database of the PhysioNet Data Bank. The experimental samples were from Caucasian men and women (21-35 years) who did not take any medication. The first four records (marked sc*) were obtained in 1989 from volunteers with healthy respiration during 24 hours of normal daily life. The remaining records in the database (marked st*) were obtained in 1994 from subjects who were light sleepers but otherwise relatively healthy. One example is shown in Figure 5.
There were only 8 sample datasets in the original Sleep-EDF database, including 4 from healthy volunteers. For the ASSC experiments, we selected sleep data from three healthy volunteers, sc1, sc2, and sc3 (corresponding to the database records sc4002e0/sc4012e0/sc4112e0); each sample contained EOG, Fpz-Cz, and Pz-Oz EEG data, each sampled at 100 Hz. EEG signals from the Pz-Oz channel have produced better classification performance than the Fpz-Cz channel [42,43], so the Pz-Oz channel was chosen for our study, in line with the literature [44].
Experts scored the EEG data and generated the PSG based on the R&K recommendations, as shown in Figure 6. The interval for each period in this study was defined as 30 s, that is, 30 × 100 = 3000 data points. In addition, we calculated the effective sleep time for each sample, marked the 24-hour EEG signal, and selected 9 hours of valid sleep as the final test sample using the manual marker results (.hyp) in the Polyman statistics.
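The epoching arithmetic above is easy to verify. A small sketch (the variable names and the zero-filled placeholder signal are ours) slices a 9-hour recording into 30 s scoring epochs:

```python
import numpy as np

fs = 100                                   # sampling rate (Hz), as in Sleep-EDF
epoch_seconds = 30                         # one scoring interval
samples = np.zeros(9 * 3600 * fs)          # placeholder for 9 h of valid sleep
n_pts = epoch_seconds * fs                 # 30 x 100 = 3000 points per epoch
n_epochs = len(samples) // n_pts
epochs = samples[: n_epochs * n_pts].reshape(n_epochs, n_pts)
```

Nine hours at 100 Hz yields 1080 epochs of 3000 points each, which is the unit the classifier labels.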
For comparison with the manual marking results, we designed two classification criteria. One was defined according to the R&K sleep stages including AWA, S1, S2, S3, S4, and REM, and the other was to define sleep stages as five categories (S3 and S4 were combined into SS periods).
Data preprocessing was performed to reduce noise interference. In this experiment, the MATLAB function wthcoef was used to threshold the wavelet decomposition coefficients, and the thresholded coefficients were then used for reconstruction to denoise the signal. First, the wavelet function "db5" was used to perform a 3-level decomposition of the signal. Second, the scale vector n was set to [1, 2, 3] and the threshold vector p was set to [100, 90, 80]. Next, the modified wavelet decomposition structure was reconstructed. Finally, we obtained the processed data shown in Figure 7.

Results.
The FFT was calculated based on the frequency-domain correlation, following the principle that the larger the ρ value, the greater the correlation between an IMF component and the original signal in the frequency domain. After sorting, we found that the ρ values of IMF_1 to IMF_12 gradually decreased, as shown in Figure 8.
We applied a threshold (when ρ > 0.3, the frequency-domain correlation was considered significant; otherwise, for ρ ≤ 0.3, the component was discarded) and selected 7 IMF components, which were reconstructed by the improved CEEMDAN and used as the classifier input, as shown in Figure 9.
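This threshold-and-reconstruct step can be sketched numerically (our own sketch: the Pearson-style reading of formula (13), the function name, and the three toy "IMF" components are assumptions, not the paper's data):

```python
import numpy as np

def freq_corr(imf, x):
    """Our reading of the frequency-domain correlation coefficient of
    formula (13): absolute Pearson correlation between the FFT amplitude
    spectra of an IMF component and the original signal."""
    a = np.abs(np.fft.rfft(imf))
    b = np.abs(np.fft.rfft(x))
    a, b = a - a.mean(), b - b.mean()
    return float(abs(np.sum(a * b)) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

t = np.arange(3000) / 100.0                       # one 30 s epoch at 100 Hz
comp_alpha = np.sin(2 * np.pi * 10 * t)           # stand-ins for IMF components
comp_delta = 0.5 * np.sin(2 * np.pi * 2 * t)
comp_noise = 0.01 * np.random.RandomState(0).randn(3000)
x = comp_alpha + comp_delta                       # "original" signal

imfs = [comp_alpha, comp_delta, comp_noise]
rhos = [freq_corr(c, x) for c in imfs]
selected = [c for c, r in zip(imfs, rhos) if r > 0.3]   # threshold from the text
reconstructed = np.sum(selected, axis=0)
```

Components whose spectra resemble that of the original signal pass the ρ > 0.3 screen, while the noise component is dropped before reconstruction.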
According to the clinical requirements of sleep staging, the sleep process was divided into 30 s stages. If data from two phases appeared in the same data segment, the phase occupying more than half of the segment determined its label. We observed the changes in sleep staging accuracy by adjusting the number of cluster centers (K = 5 or K = 6), as shown in Figures 2, 9, and 10. The objective evaluation metric ACC was introduced to compare against the manual markers: the proportion of accurately classified samples among the total samples was calculated to obtain the classification accuracy of the experiment. The experimental results in Tables 2 and 3 show that different K values lead to different results. When K = 6, the R&K sleep staging criteria were used to define six sleep stages; S3 and S4 were counted independently, and the classification accuracy reached 81%. When K = 5, stages S3 and S4 were considered as a single stage; the number of cluster centers was reduced, lowering the experimental complexity, and the accuracy rate was 83%. Table 4 shows the SSE values of the two algorithms after the first iteration. The SSE of the improved algorithm was less than that of the original clustering algorithm, and the smaller the SSE, the better the clustering result.

Algorithm 1. Proposed ASSC based on improved CEEMDAN and K-means.
Require: the original EEG signal, processed with the wavelet denoising algorithm.
Ensure: clustering results that divide the EEG signal into different sleep stages.
(1) Define x(n), an N-point EEG epoch X.
(2) Set Nstd, the noise standard deviation; NR, the number of realizations; and MaxIter, the maximum number of sifting iterations allowed.
(3) By improved CEEMDAN decomposition, obtain the first mode IMF_1 and the first residual component r_1(n), as in formulas (6)-(9).
(4) for k = 2, . . ., K do
(5) Calculate the k-th IMF component and the residual component r_k(n) = r_{k−1}(n) − IMF_k(n).
(6) Decompose to obtain the new mode as in formula (11).
(7) end for
(8) Divide the initial cluster centers.
(9) According to formulas (16) and (17), calculate the correlation distance d_{x_i,x_j} between data points and the density ρ of each point; the point with the smallest ρ is used as the first cluster center to obtain the set D.
(10) Allocate the data remaining in set D to the nearest class according to the distance to the nearest cluster center.
(11) According to formula (19), calculate the distance from each point to the center in each class; u and σ are calculated segment by segment. The smallest ω is taken as the new clustering center.
(12) Recalculate and reassign the sample objects until the cluster centers no longer change.
Due to the variability across different EEG channels, the various ASSC methods are difficult to compare under a unified standard. To make the results more meaningful, results obtained from several different methods on the same dataset were used for comparison. The accuracy values in Table 5 are the best reported for each method.
Some studies do not report accuracy values for every case, so the missing cases are represented by "-" in Table 5. The proposed method was thus compared with the original K-means algorithm and other algorithms; the experimental results show an obvious increase in accuracy and efficiency.

Discussion
It is not difficult to see that the average accuracy improved greatly with the proposed method. We also found that there is a certain correlation between different stages, such as the W and S1 stages: in the blinking state, the alpha wave is weak, and its characteristics are similar to those of S1. This is hard to distinguish given the morphological diversity of EEG waveforms, but it does not mean that all stage accuracies are reduced. In particular, with the six-stage classification, the accuracy for the S3 and S4 phases is significantly improved compared with the results in the literature [44]. This improvement may be related to our choice of features and to the correlation-coefficient improvements in the clustering algorithm. Moreover, previous studies may have overlooked the correlations between different stages; these issues will be studied in the future.

Conclusion
This paper proposes an ASSC method based on improved CEEMDAN and K-means. First, the improved CEEMDAN algorithm is applied to process the time series data; appropriate time-frequency-domain features are selected based on frequency-domain correlation analysis and used as feature vectors, which are reconstructed by EMD methods to reduce the data dimensionality of the original EEG signals and improve computational efficiency. Second, we improve the classification accuracy of the clustering algorithm through a new density definition. Finally, we find that the correlation between different sleep stages is handled significantly better by the improved clustering algorithm.
Although the proposed method improves the classification accuracy of sleep staging, some limitations should still be taken into consideration [48][49][50]. One disadvantage is the still relatively low classification accuracy; we will employ deep learning to address this in future work [51,52]. In addition, we will further explore the correlation between different sleep stages and how to differentiate them, to further improve the classification accuracy [53,54].

Data Availability
The data used in preparation of this article were obtained from the PhysioNet Data Bank's Sleep-EDF database (https://www.physionet.org/content/sleep-edfx/1.0.0/). The investigators within the Sleep-EDF database contributed to the design and implementation of the ASSC method and/or provided data but did not participate in the analysis or the writing of this report. The dataset is described in Kemp et al. [55] and can be downloaded from https://ieeexplore.ieee.org/document/867928. This dataset has been supported by Goldberger et al. [56].

Conflicts of Interest
The authors declare that they have no conflicts of interest.