Deep convolutional architecture‐based hybrid learning for sleep arousal events detection through single‐lead EEG signals

Abstract
Introduction: Detecting arousal events during sleep is a challenging, time-consuming, and costly process that requires neurological expertise. Although most comparable automated systems detect sleep stages only, early detection of sleep events can assist in identifying the progression of neuropathology.
Methods: This paper presents an efficient hybrid deep learning method that identifies and evaluates arousal events using only single-lead electroencephalography (EEG) signals for the first time. The proposed architecture combines an Inception-ResNet-v2 transfer learning model with an optimized support vector machine (SVM) using the radial basis function (RBF) kernel, and it classifies with an error of less than 8%. While maintaining accuracy, the Inception module and ResNet significantly reduce the computational complexity of detecting arousal events in EEG signals. Moreover, to improve the classification performance of the SVM, its kernel parameters are optimized with the grey wolf optimizer (GWO).
Results: The method was validated on pre-processed samples from the PhysioNet 2018 Challenge sleep dataset. In addition to reducing computational complexity, the results show that the feature extraction and classification stages are each effective at identifying sleep disorders. The proposed model detects sleep arousal events with an average accuracy of 93.82%. Because only a single lead is required, the method makes EEG recording less invasive for subjects.
Conclusion: According to this study, the suggested strategy is effective in detecting arousals in sleep disorder clinical trials and may be used in sleep disorder detection clinics.


INTRODUCTION
Studies have shown that poor sleep negatively affects both work performance (Ting et al., 2014) and emotional wellbeing (Galvão et al., 2021; Vandekerckhove & Yu-lin, 2018). Arousals during sleep are a common indicator of poor sleep quality (Balakrishnan et al., 2006). The American Academy of Sleep Medicine (AASM; Bonnet et al., 2007) defines an electroencephalographic arousal as an abrupt shift in electroencephalographic frequency that follows at least 10 s of stable sleep. The shift may involve the theta and alpha subbands as well as frequencies above 16 Hz.
Sleep arousal is a brief interruption of sleep that can be caused by snoring or partial airway obstruction (Fernández-Varela et al., 2017). Such arousals typically last 3 to 15 s and shorten the sleep cycle. Most people do not notice these awakenings; however, they may notice them if they remain awake for more than 15 s. Frequent awakenings reduce the quality of sleep. The American Sleep Apnea Association reports that as few as five arousals per hour can cause chronic sleepiness.
Despite the widespread problems associated with sleep irregularities, relatively little attention has been paid to the automated detection and monitoring of apneas and arousal disorders (Zhang et al., 2022).
Sleep interruptions and daytime sleepiness are associated with non-restorative sleep (Ghassemi et al., 2018).
Arousals during sleep are harder to detect with traditional methods than apneas, which makes research expensive, and their automatic detection is also less reliable (Engleman & Douglas, 2004). Early recognition of sleep hyperarousal is crucial to the diagnosis and treatment of sleep disorders (Fonod, 2022). Early detection can reduce the likelihood of sequelae such as changes in blood pressure and cardiovascular disease.
Most state-of-the-art arousal detection methods use multi-channel polysomnography (PSG) recordings. Sleep experts visually score 30-s segments of PSG recordings according to standards established by the AASM (Berry et al., 2017). Because of the large amount of data that must be analyzed, the process takes considerable time and effort. For example, a recording sampled at 200 Hz over an 8-h period contains 5.76 million samples per channel, or about 75 million samples when all recorded channels are counted. Manually scoring such a recording can take hours. More efficient and consistent procedures are therefore needed, given that the inter-rater agreement under the AASM standard is only about 80% (Altevogt & Colten, 2006).
Over the course of the night, the recordings capture the patient's sleep and wakefulness phases, which experts can analyze later. Despite their usefulness for analysis, PSG signals are time-consuming and difficult to collect. In addition, they require many attached sensors, which can be uncomfortable for the subject and can alter the outcomes.
Previous research has attempted, with varied degrees of success, to accomplish automated arousal detection based on physiological markers.
The remainder of this article is organized as follows. Section 2 discusses relevant literature. Section 3 presents the suggested method, and Section 4 presents and discusses the experimental results that support the proposed sleep arousal detector. The conclusion of the research is presented in Section 5.

RELATED WORK
A number of recent studies have used EEG signals to recognize sleep arousal events in order to detect sleep disorders (Chien et al., 2021; Cho et al., 2007; Fonod, 2022; Liang et al., 2015; Olesen et al., 2020; Ugur & Erdamar, 2019). Given the large variety of methodologies employed across past studies, their results cannot be compared in an exact and objective way. As explained in Chien et al. (2021) and Liang et al. (2015), the analyzed datasets, as well as the physiological signals and performance indicators, are exclusive to individual researchers or are inaccessible to the general public. One of the fundamental problems in sleep arousal detection is the lack of a reliable threshold for recognizing sleep arousal.

PROPOSED MODEL
As shown in Figure 1, the algorithm consists of three steps: windowing, feature extraction from deep learning structures, and classification.

Signal decomposition
High- and low-frequency components of EEG data are analyzed using the discrete wavelet transform, which requires less processing time than the continuous wavelet transform. EEG background waveforms can be categorized into the gamma (γ), beta (β), alpha (α), theta (θ), and delta (δ) bands (Al-Kadi et al., 2014; Al-Qazzaz et al., 2015), and these background waves carry useful information. The wavelet transform (WT) is a multiresolution technique for time-frequency analysis that provides both temporal and frequency localization. Band-pass filters separate the signal into various frequency bands. The EEG patterns extracted from subjects with sleep problems may resemble those of non-arousal participants.
Because both types of signals produce similar patterns, it is very useful to segment the EEG data into subbands.
The EEG signal from each subject therefore needs to be decomposed into subbands, or rhythms, which are defined by frequency range. The gamma band lies above 30 Hz, the beta band between 12 and 30 Hz, the alpha band between 8 and 12 Hz, the theta band between 4 and 8 Hz, and the delta band below 4 Hz.
Using a classification algorithm and localized wavelet filter bank characteristics, we can detect anomalous arousal conditions in participant EEGs. For signal dimension reduction, the EEG signal is first converted into wavelet coefficients by a wavelet decomposition that employs the DB3 (Daubechies 3) wavelet at level 3.
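To make the link between decomposition level and EEG rhythm concrete, the sketch below lists the nominal frequency range covered by each coefficient set of a dyadic wavelet decomposition. It assumes the 200 Hz sampling rate mentioned in the Introduction and ideal half-band splitting; the function name `dwt_band_edges` is hypothetical, and real DB3 filters overlap somewhat at the band edges.

```python
def dwt_band_edges(fs_hz, levels):
    """Nominal (name, low_hz, high_hz) ranges of a dyadic wavelet decomposition.

    Detail level j nominally spans fs / 2**(j + 1) to fs / 2**j, and the final
    approximation spans 0 to fs / 2**(levels + 1). This assumes ideal half-band
    filters, which real wavelets such as DB3 only approximate.
    """
    bands = []
    for j in range(1, levels + 1):
        bands.append((f"D{j}", fs_hz / 2 ** (j + 1), fs_hz / 2 ** j))
    bands.append((f"A{levels}", 0.0, fs_hz / 2 ** (levels + 1)))
    return bands


FS_HZ = 200  # sampling rate quoted in the Introduction (assumed to apply here)

for name, low, high in dwt_band_edges(FS_HZ, levels=3):
    print(f"{name}: {low:g} to {high:g} Hz")
```

Under these assumptions, a level-3 decomposition leaves delta, theta, and alpha together in the A3 approximation (0 to 12.5 Hz), so isolating those rhythms individually would need deeper levels or additional band-pass filtering.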

Signal windowing
When processing non-stationary physiological signals, windowing is essential for reducing signal complexity. The EEG signal is divided into frames with a specific amount of overlap, and features are then extracted from each frame. Because the appropriate window duration varies from state to state, selecting the duration and the degree of overlap between successive frames may be challenging. For the EEG data to be split into equal time intervals, all frames must have the same length and the same overlap with the frames before and after them.
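The framing step can be sketched as follows. The frame length and overlap used in the example are illustrative assumptions, not values reported in this paper, and `frame_signal` is a hypothetical helper name.

```python
def frame_signal(samples, frame_len, overlap):
    """Split samples into equal-length frames with a fractional overlap in [0, 1).

    Trailing samples that do not fill a complete frame are dropped so that
    every frame has exactly frame_len samples.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Hop size: number of samples between the starts of consecutive frames.
    hop = max(1, int(round(frame_len * (1 - overlap))))
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


signal = list(range(10))  # stand-in for EEG samples
for frame in frame_signal(signal, frame_len=4, overlap=0.5):
    print(frame)
```

With a 50% overlap each sample (apart from those at the edges) appears in two frames, which trades extra computation for smoother tracking of short events.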

Spectrogram
The power spectrum of each frame can be obtained with the windowed Fourier transform, also called the short-time Fourier transform (STFT).
The moving window function g(t) is applied to the signal x(t) at time τ.
For each specific time τ, x(t) within the window is transformed using a finite-time Fourier transform. The window is then shifted by τ along the time axis and the Fourier transform is applied again. By repeating this process, the Fourier transform of the whole signal is calculated, while the signal segment inside the window is treated as stationary.
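In symbols, the windowed transform described above is usually written as shown below. This is the textbook continuous-time form and is offered as an assumption about the intended definition; f denotes frequency, and the squared magnitude S_x is the spectrogram.

```latex
\mathrm{STFT}_x(\tau, f) = \int_{-\infty}^{\infty} x(t)\, g(t - \tau)\, e^{-j 2 \pi f t}\, dt,
\qquad
S_x(\tau, f) = \left| \mathrm{STFT}_x(\tau, f) \right|^{2}
```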
The STFT maps a time-domain signal to a 2D time-frequency representation that shows how the frequency content of the signal changes from window to window. The STFT is defined in (1). The STFT has a limitation: once the window size is chosen, the time-frequency resolution remains constant throughout the time-frequency plane, wherever g(t) is positioned. Choosing an appropriate window size is therefore challenging when both high-frequency and low-frequency components are present. The power spectrum represents the portion of the signal power that falls within a given frequency unit.
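A discrete version of this procedure can be sketched in a few lines. The code below is a minimal illustration that uses a direct discrete Fourier transform and a Hann window; the window type, frame length, and hop size are assumptions rather than the settings used in this study, and a practical implementation would use an FFT library instead.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_len, hop):
    """Power spectrogram: one row per frame, one column per frequency bin.

    Bin k corresponds to k * fs / frame_len Hz for a sampling rate fs.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        segment = [samples[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            coeff = sum(
                segment[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len)
            )
            row.append(abs(coeff) ** 2)
        rows.append(row)
    return rows


FS_HZ = 200  # assumed sampling rate
tone = [math.sin(2 * math.pi * 10 * n / FS_HZ) for n in range(FS_HZ)]  # 10 Hz test tone
spec = spectrogram(tone, frame_len=100, hop=50)
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
print("peak frequency:", peak_bin * FS_HZ / 100, "Hz")
```

The fixed `frame_len` is what produces the constant time-frequency resolution discussed above: here every bin is 2 Hz wide regardless of which frequency is being examined.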

Feature extraction
However, as the sampling rate increases, the earlier layers may fail to preserve the appropriate features in some cases (especially when the feature maps are kept thin to reduce processing time). There is already evidence that this structure saves time and improves accuracy (Szegedy et al., 2017). Inception-ResNet-v2 contains a hybrid Inception module, which is inspired by ResNet's transfer learning model. Inception-ResNet has two minor versions, version 1 and version 2. Modules A, B, and C of both subversions, as well as the reduction blocks, share the same structure; the only difference is in the hyperparameter settings. Many methods have been proposed for setting hyperparameters precisely.
The residual connections are added to the output of the Inception module's convolution operation. In order to add a residual, the output and input after convolution must have the same dimensions. The interior modules of the network are illustrated in Figure 3. S indicates the use of "same" padding, while V indicates the use of "valid" padding.
The sizes to the side of each layer summarize the shape of its output.
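The dimension-matching requirement for residual connections can be illustrated with a toy example. The sketch below works on plain vectors rather than feature maps: a linear projection stands in for the 1×1 convolution that Szegedy et al. (2017) place after each Inception branch to restore the input depth, and the scaled residual is then added to the input. The helper names, the weights, and the 0.1 scaling factor are illustrative assumptions.

```python
def linear_project(vector, weights):
    """Apply a weight matrix (rows = output channels) to a channel vector.

    This plays the role of a 1x1 convolution at a single spatial position.
    """
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_add(block_input, branch_output, expand_weights, scale=0.1):
    """Add a scaled, dimension-matched branch output to the block input."""
    expanded = linear_project(branch_output, expand_weights)
    if len(expanded) != len(block_input):
        raise ValueError("residual and input must have the same dimension")
    return [x + scale * e for x, e in zip(block_input, expanded)]


block_input = [1.0, 2.0, 3.0, 4.0]      # 4 input channels
branch_output = [10.0, 20.0]            # branch reduced the depth to 2 channels
expand_weights = [[1, 0], [0, 1], [1, 1], [0, 0]]  # maps 2 channels back to 4
print(residual_add(block_input, branch_output, expand_weights))
```

Without the projection step the addition would be undefined, which is why the expansion layer is needed whenever a branch changes the channel depth.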

Classification
A classification framework builds models that predict output labels from input features. By incorporating an appropriate strategy within the machine learning framework, SVM improves data classification.
SVMs are designed to maximize the separation between classes by identifying the optimal hyperplane. Furthermore, the kernel mapping Φ handles highly complex inputs by projecting them into a higher-dimensional space, in which data that are not linearly separable in the original space can be separated linearly. Using D features and a training dataset, we can create a function f that translates input into output, as defined in Equation (2). The function f is obtained by minimizing an objective in w and b. We assume that C, the soft margin parameter, is a fixed value that can be adjusted by the user. Alongside the variable ε, this parameter is aimed at stabilizing and modifying the penalty weights while also boosting discrimination capacity. Both quantities enter the loss function L_ε.

FIGURE 2 The architecture of Inception-ResNet-v2; the details of its structure are presented in Figure 3.
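For reference, the textbook ε-insensitive support vector formulation that matches the symbols named in this section (w, b, C, ε, L_ε, Φ) can be written as follows. These forms, the index N for the number of training pairs (x_i, y_i), and the bandwidth convention in the RBF kernel discussed later in this section are assumptions based on the standard literature, not a reproduction of the paper's numbered equations.

```latex
% Decision function with feature map \Phi, weights w, and bias b
f(x) = w^{\top} \Phi(x) + b

% Regularized objective with soft-margin parameter C
\min_{w,\, b} \; \frac{1}{2} \lVert w \rVert^{2}
  + C \sum_{i=1}^{N} L_{\varepsilon}\bigl(y_i, f(x_i)\bigr)

% \varepsilon-insensitive loss: deviations smaller than \varepsilon are not penalized
L_{\varepsilon}\bigl(y, f(x)\bigr) = \max\bigl(0,\; \lvert y - f(x) \rvert - \varepsilon\bigr)

% RBF kernel with bandwidth \sigma
K(x_i, x_j) = \Phi(x_i)^{\top} \Phi(x_j)
  = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2 \sigma^{2}} \right)
```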

FIGURE 3 An outline of the design of the Inception-Reduction-Network A, B, C and Reduction-Network A, B.
The preceding problem can be rewritten as a maximization problem whose conditions are defined in Equation (5). By analyzing these equations, the SVM function, that is, f in Equation (2), can be expressed through the kernel function.
In the first half of the experiment, classification is applied to the training data. The SVM classifier with the RBF kernel initializes γ (the inverse of the standard deviation) and C randomly. These two parameters change at different rates. For the support vectors, the RBF kernel is described by Equation (8). Sigma (σ) specifies the level of non-linearity included in the model and is comparable to the bandwidth of the kernel function. When σ is small, the decision boundary is highly non-linear. By contrast, the decision boundary is closer to linear when σ is large.
Based on the training data, the GWO method (Mirjalili et al., 2014) selects the structure with the least amount of error. GWO considers alpha (α) the best solution, beta (β) the second-best solution, and delta (δ) the third-best solution. All other solutions are considered omegas (ω). In the GWO algorithm, hunting is driven by α, β, and δ, and the ω solutions follow these three wolves. When the prey stops moving and is surrounded by wolves, the α wolf leads the attack. This process is modeled by reducing the vector "x." The range of the coefficient vector "X" shrinks as "x" decreases, since "X" is a random vector in the interval −2x to 2x. If |X| < 1, the wolf approaches the prey (and the other wolves), and if |X| > 1, the wolf moves away from the prey (and the other wolves). According to the GWO, all wolves must update their positions based on the positions of the α, β, and δ wolves. Alphas usually lead hunting operations, although beta and delta wolves may occasionally take part.
The mathematical model of grey wolf hunting behavior assumes that alpha, beta, and delta have better knowledge of the possible prey location. According to the position-update equations, the other agents are required to update their positions based on the positions of the three best search agents. Until the end of the algorithm, the top three answers are selected as α, β, and δ.
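The position update follows the formulation of Mirjalili et al. (2014), which can be sketched as below using the "x" and "X" notation of this section. The remaining symbols are assumptions introduced here for illustration: P(t) is the position of a wolf at iteration t, P_α, P_β, and P_δ are the positions of the three leaders, Y is the second random coefficient vector, r_1 and r_2 are uniform random vectors in [0, 1], and x decreases linearly from 2 to 0 over the iterations.

```latex
% Coefficient vectors, drawn independently for each leader and each dimension
\vec{X} = 2 x\, \vec{r}_1 - x, \qquad \vec{Y} = 2\, \vec{r}_2

% Distance of the wolf from each leader
\vec{D}_{\alpha} = \lvert \vec{Y}_1 \odot \vec{P}_{\alpha} - \vec{P}(t) \rvert, \quad
\vec{D}_{\beta}  = \lvert \vec{Y}_2 \odot \vec{P}_{\beta}  - \vec{P}(t) \rvert, \quad
\vec{D}_{\delta} = \lvert \vec{Y}_3 \odot \vec{P}_{\delta} - \vec{P}(t) \rvert

% Prey position as estimated from each leader
\vec{P}_1 = \vec{P}_{\alpha} - \vec{X}_1 \odot \vec{D}_{\alpha}, \quad
\vec{P}_2 = \vec{P}_{\beta}  - \vec{X}_2 \odot \vec{D}_{\beta}, \quad
\vec{P}_3 = \vec{P}_{\delta} - \vec{X}_3 \odot \vec{D}_{\delta}

% Updated position: average of the three estimates
\vec{P}(t + 1) = \frac{\vec{P}_1 + \vec{P}_2 + \vec{P}_3}{3}
```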
• In each iteration, the three best answers (the α, β, and δ wolves) estimate the position of the prey.
• Each time the positions of the α, β, and δ wolves are determined, the rest of the answers are updated accordingly.
• Vector x (and consequently X) and other related vectors are updated with each iteration.
• At the end of the iterations, the position of the alpha wolf is taken as the optimal position.
By using this method, convergence toward the optimum response is accelerated and fine-tuned. To evaluate the classification structure, the average accuracy, and the most effective matching parameters, the optimization procedure is repeated five times. The accuracy of the network plays an influential role in determining an appropriate match. The global search of the GWO algorithm is thus used to find the optimal parameter values of the RBF kernel.
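The steps listed above can be sketched as a small optimizer. This is a minimal illustration and not the authors' implementation: the objective `toy_error` is a hypothetical stand-in for the cross-validated SVM error, the default numbers of wolves and iterations are assumptions, and only the search intervals for C (1 to 50) and γ (1 to 30) are taken from the Setting subsection.

```python
import random


def gwo_minimize(objective, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimize objective(position) inside box bounds with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # best-so-far alpha, beta, and delta positions
    for it in range(n_iter):
        # Re-select the three best answers among previous leaders and current wolves.
        leaders = [list(w) for w in sorted(leaders + wolves, key=objective)[:3]]
        x = 2.0 * (1.0 - it / n_iter)  # decreases linearly from 2 toward 0
        updated = []
        for wolf in wolves:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                guesses = []
                for leader in leaders:
                    big_x = 2.0 * x * rng.random() - x  # coefficient X
                    big_y = 2.0 * rng.random()          # second coefficient (assumed name)
                    distance = abs(big_y * leader[d] - wolf[d])
                    guesses.append(leader[d] - big_x * distance)
                # Average the leader-based estimates and keep the result in bounds.
                position.append(min(hi, max(lo, sum(guesses) / len(guesses))))
            updated.append(position)
        wolves = updated
    return min(leaders + wolves, key=objective)


def toy_error(params):
    """Hypothetical error surface with its minimum at C = 10, gamma = 5."""
    c, gamma = params
    return (c - 10.0) ** 2 + (gamma - 5.0) ** 2


SEARCH_BOUNDS = [(1.0, 50.0), (1.0, 30.0)]  # C and gamma intervals from the Setting subsection
best_c, best_gamma = gwo_minimize(toy_error, SEARCH_BOUNDS)
print(f"C = {best_c:.3f}, gamma = {best_gamma:.3f}, error = {toy_error([best_c, best_gamma]):.6f}")
```

In the setting of this paper, `toy_error` would be replaced by a function that trains the SVM with the candidate C and γ and returns its validation error. Keeping the leaders as a best-so-far memory ensures that the returned solution is never worse than the best wolf of the initial population.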

EXPERIMENTAL RESULTS
The results and interpretations are presented in this section. First, physiological signals are discussed.

Physiological signals
The suggested technique was validated on the PhysioNet Challenge 2018 dataset (Ghassemi et al., 2018). In the training set, arousal labels were selected by specialists and neurologists, whereas in the test set, 989 labels were concealed. The test and training sets were distinct.
Across the full dataset, arousal regions are labeled "1" in around 4% of instances, non-arousal regions are labeled "0" in about 80% of instances, and undefined regions are labeled "−1." Arousal zones have a minimum duration of around 30 s and a maximum duration of nearly 4 min, based on the data. Furthermore, most of these non-apnea arousal zones are in Stages 1 and 2, with a few in Stage 3.
The distribution of subject arousal is shown in

Setting
The experiments were run on an Intel machine with a 64-bit operating system and 4 GB of RAM. The parameters [...] were set to 20, 100, 10, 2.5, 0.25, and 0.5, respectively. These parameters were chosen for their faster convergence and follow the values suggested in the literature (Mirjalili et al., 2014). Furthermore, the search intervals for selecting the C and γ parameters were set to 1 to 50 and 1 to 30, respectively.
As shown in Figure [...] folds out of 10 folds and the final accuracy. The receiver operating characteristic (ROC) curve characterizes the uncertainty of the detector. Figure 6 shows that the average error of the suggested algorithm for detecting arousal based on EEG recordings is generally lower than that of other methods, and the area under the curve can be used to quantify this according to (12).
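As an illustration of how the area under the ROC curve can be computed from detector scores, the sketch below sweeps a threshold over the scores and integrates the resulting curve with the trapezoidal rule. The trapezoidal rule and the example labels and scores are assumptions for illustration; labels follow the dataset convention of 1 for arousal and 0 for non-arousal, and samples labeled −1 (undefined) are ignored.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points for all thresholds."""
    pos = sum(1 for y in labels if y == 1)
    neg = sum(1 for y in labels if y == 0)
    if pos == 0 or neg == 0:
        raise ValueError("need at least one arousal and one non-arousal label")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A sample is predicted as arousal when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


labels = [1, 1, 0, 1, 0, 0]               # hypothetical ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # hypothetical detector outputs
print("AUC =", auc_trapezoid(roc_points(labels, scores)))
```

An area of 1 corresponds to perfect separation of arousal and non-arousal samples, and an area of 0.5 corresponds to chance-level ranking.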
FIGURE 6 Applying the proposed model to different test signals received from the same channel and drawing receiver operating characteristic curves. There are separate curves in each of the three figures indicating the presence or absence of arousal. AR, arousal; N-AR, non-arousal.

FIGURE 7 According to the box plots, the O1-M2 channel is more effective than the other channels in detecting arousal events over several repetitions of the algorithm.
TABLE 2 Arousal detection for the training and test signals in the presence of motion noise from subjects; the output of the algorithm is examined in two situations, low complexity and high complexity.

Discussion
A five-repetition test assessing the accuracy of arousal detection through the O1-M2 channel and the other channels shows the differences produced by the proposed algorithm. Figure 7 depicts a box plot of the results for different signals, including arousal and non-arousal signals. In another experiment, the signals were subjected to noise similar to motion noise from subjects. The estimates were acceptable in both the low-complexity and the high-complexity situations. For the training and test data, Table 2 reports the proposed method under low complexity (first part) and high complexity (second part). Based on the ROC curve shown in Figure 8 [...]. Although arousal detection methods have relied on fewer physiological signals in recent years, some of them are computationally complex (Jabari et al., 2023).
Some other methods have either used PSG signals or analyzed EEG signals without considering the elements underlying their relationship.
To identify unique sleep events with greater than 90% accuracy using standard algorithms, precise data patterns must be extracted. According to our research, combining deep features with handcrafted features (Badiei et al., 2023) can help improve the accuracy of arousal event estimation. Table 3 compares arousal detection models based on metrics such as computational complexity and accuracy.

CONCLUSION
This study describes a method for efficiently extracting and classifying relevant features from single-lead EEG signals using optimal learning.
We created a hybrid deep learning strategy based on single-lead EEG sleep signals to identify arousal events. In addition, we aimed to identify discrete occurrences of awakening and arousal, an EEG signal anomaly that may be temporary but that can be associated with progressive brain disease. Such events give essential information regarding the likelihood of developing multiple system atrophy, Parkinson's disease, and Alzheimer's disease, and strong evidence links these sleep disturbances to these diseases. Based on GWO and SVM-RBF, an optimal learning model was developed for describing arousal episodes. We constructed a discriminative technique for analyzing physiological signals by using Inception-ResNet-v2 and extensively extracted features. As a means of preventing a range of sleep-related diseases, future studies could examine the relationship between ECG and EEG signals. In the future, we will focus on detecting arousals and estimating their intensity based on a small number of signal channels and subbands. We will also try to reduce the computational complexity and improve the accuracy.

ACKNOWLEDGMENTS
We gratefully acknowledge the generous support of Islamic Azad University of Central Tehran Branch.