Automatic Detection of Obstructive Sleep Apnea Events Using a Deep CNN-LSTM Model

Obstructive sleep apnea (OSA) is a common sleep-related respiratory disorder, and the number of people suffering from it worldwide is increasing. Because of the limitations of monitoring equipment, many people with OSA remain undiagnosed. Therefore, we propose a sleep-monitoring model based on a single-channel electrocardiogram (ECG) and a convolutional neural network (CNN), which can be used in portable OSA monitoring devices. To learn features at different scales, the first convolution layer comprises three types of filters. A long short-term memory (LSTM) layer is used to learn long-term dependencies such as the OSA transition rules. A softmax function is connected to the final fully connected layer to obtain the final decision. To detect complete OSA events, the raw ECG signals are segmented by a 10 s overlapping sliding window. The proposed model is trained with the segmented raw signals and is subsequently tested to evaluate its event detection performance. In our experiments on the Apnea-ECG dataset, the proposed model achieves a Cohen's kappa coefficient of 0.92, a sensitivity of 96.1%, a specificity of 96.2%, and an accuracy of 96.1%. These results are significantly higher than those of the baseline method. The results indicate that our approach could be a useful tool for detecting OSA on the basis of a single-lead ECG.


Introduction
Obstructive sleep apnea (OSA) is a major sleep-disordered breathing (SDB) syndrome that is an independent risk factor for coronary heart disease, hypertension, and arrhythmia [1]. According to the manual of the American Academy of Sleep Medicine (AASM) [2], OSA in adults is scored when the oral and nasal respiration amplitude drops by 90% or more from the baseline for 10 s or more during sleep.
This condition is associated with repetitive airflow limitation and sleep fragmentation, decreasing the sleep time and degrading the sleep quality of OSA patients [3]. OSA not only causes excessive daytime neurocognitive deficits, drowsiness, depression, fatigue, and heart stroke [4][5][6] but can also cause a brain stroke, high blood pressure, arrhythmias, myocardial infarction, and ischemia [7][8][9]. According to the AASM [2], polysomnography (PSG), which is based on a comprehensive evaluation of the sleep signals, is considered to be the gold standard for OSA detection [10]. PSG involves overnight recording of the patient and the measurement of many signals using sensors attached to the body, e.g., an electroencephalogram (EEG), electromyogram (EMG), electrocardiogram (ECG), and electrooculogram (EOG), to monitor the respiratory effort and other biophysiological signals [1].
After collecting the PSG data, physicians inspect them using statistical tools to score the OSA events.
However, PSG has several disadvantages. First, patients need to sleep in the hospital for at least one night, which is time-consuming and expensive. Furthermore, many patients cannot sleep well in hospitals. Second, many electrodes have to be connected to the body of a patient. These electrodes interrupt the patient's sleep, which biases the measurement results. Therefore, it is important to develop methods that can reliably diagnose OSA with a few signals and that can be used at home. According to Mietus and Peng [11], the heartbeat interval of patients fluctuates periodically during the occurrence and recovery of OSA. Zarei and Asl [12] indicated that significant changes in heart rate or abnormal activities of the heart may indicate OSA. Additionally, according to our clinical research, patients' compliance is very low when they wear the pressure transducer sensor used to measure the oral and nasal respiration. Patients often pull out the nasal cannula. Therefore, when compared with the ECG signal, nasal airflow data can be unstable because the sensor becomes detached. Hence, in this study, we use ECG signals to detect OSA events.
Traditional visual OSA scoring is a very tedious and time-consuming process for a physician to conduct. Therefore, many alternative OSA detection methods have been developed [13]. These methods were based on biosignals such as the respiratory [14], snoring [15][16][17], SpO2 [8,9,18], and ECG [12,[19][20][21][22][23][24] signals, and many authors have obtained a high performance level in terms of OSA detection. However, almost all these methods involved data preprocessing, feature extraction, feature selection, and classification. Although feature extraction is essential to ensure good performance, this process requires considerable domain expertise and is particularly limiting for high-dimensional data [25].
Deep learning is an attractive alternative because it can automatically learn and extract features from raw data and can be merged with a classification procedure. In particular, convolutional neural networks (CNNs), which are a popular deep-learning model, have gained considerable success owing to their excellent performance in various domains, including visual imagery [26], speech recognition [27], and text recognition [28]. CNNs have also been applied to biosignal classification problems. For example, in our previous study [29], a CNN was used to score the sleep stages. Banluesombatkul et al. [30] used metalearning to classify sleep stages. Piriyajitakonkij et al. [31] proposed SleepPoseNet to recognize sleep postures. An event-related potential (ERP) encoder network was applied to ERP-related tasks [32]. Wilaiprasitporn et al. [33] used a deep-learning approach to improve the performance of affective EEG-based person identification. Recently, some models based on CNNs have been employed to detect OSA. Urtnasan et al. [25] proposed a method for the automated detection of OSA from a single-lead ECG using a CNN. Ho et al. [10] developed an approach for OSA event detection using a CNN and a single-channel nasal pressure signal. Banluesombatkul et al. [34] used a CNN to extract ECG signal features and fully connected neural networks for OSA event detection. McCloskey et al. [35] used a CNN and wavelets to analyze the nasal airflow and detect the OSA events. However, most of these methods score OSA events by minute-by-minute analysis. According to the AASM rules [2], an OSA event lasts 10 s or more. Therefore, minute-by-minute analysis can miss some OSA events. At the same time, the duration of each OSA event is different. Multiple brief OSA events can occur within a single minute (i.e., one epoch), whereas a single OSA event can extend over multiple epochs. Therefore, it is difficult for these methods to detect complete OSA events.
According to Guilleminault et al. [36], there is a relation between the OSA events and heart rate variability. They indicated that the heart rate decelerates at the beginning of an OSA event and that it suddenly increases when normal breathing is resumed [36]. Because long short-term memory (LSTM) maintains internal memory and utilizes feedback connections to learn temporal information from sequences of inputs, in this study, we propose a new method for OSA detection using a CNN and an LSTM. The LSTM [37] is used to learn these dependencies, such as the transition rules employed by physicians, to identify future OSA events from previous ECG epochs. To detect complete OSA events, an overlapping-window method is used, which can identify the start and end positions of each event. Therefore, the proposed method can raise an alert for OSA events of long duration, which may help reduce the rate of sudden death caused by OSA events [38].
This study is organized as follows: the datasets are presented in Section 2, and the methods are presented in Section 3. The experimental results and discussion are presented in Section 4, and Section 5 concludes this study.

Dataset and Preprocessing
The Apnea-ECG dataset [39], downloaded from https://www.physionet.org/content/apnea-ecg/1.0.0, was used to evaluate the proposed approach. The dataset comprises 70 PSG recordings, among which 35 are used in the training set and 35 are used in the test set. The training set was used to update the parameters of the proposed model, and the test set was used to perform independent performance assessments. Each recording contains a continuous digitized ECG signal, a set of apnea annotations (derived by human experts on the basis of the simultaneously recorded respiration and related signals), and a set of machine-generated QRS annotations. The sampling rate for the ECG was 100 Hz with a 12 bit resolution. The recordings vary in length from 7 to 10 hours. The age of the subjects is between 27 and 63 years, and their weights are 35-135 kg.
First, following Urtnasan et al. [25], a Chebyshev type-II band-pass filter (5-11 Hz) was used to remove undesirable noise from the single-lead ECG data. Second, the data were segmented into epochs (10 s long) to train the proposed model. Table 1 presents the distribution of all the epochs in the training and test sets. An abnormal epoch (AE) denotes an epoch containing an OSA event; the remaining epochs are normal epochs (NEs).
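The segmentation step can be illustrated with a minimal sketch. The 100 Hz sampling rate and the 10 s epoch length come from the text; the 5 s step between consecutive windows and the function name `segment_epochs` are assumptions made only for illustration, since the overlap is not specified here. The band-pass filtering step is not shown.

```python
def segment_epochs(signal, fs=100, epoch_s=10, step_s=5):
    """Split a 1-D signal into fixed-length, possibly overlapping epochs.

    fs and epoch_s follow the text (100 Hz, 10 s epochs).
    step_s is an ASSUMED value; step_s < epoch_s gives overlapping windows.
    """
    size = fs * epoch_s   # samples per epoch (1000 for 10 s at 100 Hz)
    step = fs * step_s    # samples between the starts of consecutive epochs
    epochs = []
    for start in range(0, len(signal) - size + 1, step):
        epochs.append(signal[start:start + size])
    return epochs


# Placeholder signal: 30 s of zeros standing in for a filtered ECG trace.
ecg = [0.0] * 3000
epochs = segment_epochs(ecg)
print(len(epochs), len(epochs[0]))  # 5 epochs of 1000 samples each
```

Setting `step_s` equal to `epoch_s` yields non-overlapping epochs, so the same helper covers both cases.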

Convolutional Neural Network.
In this study, we used a one-dimensional (1D) CNN to classify the ECG signals. The CNN comprised convolutional, pooling, and fully connected layers. The net input $u_j^l$ of neuron j in layer l is defined as follows:

$$u_j^l = \sum_{i \in M_j} x_i^{l-1} * w_{j,i} + b_j^l, \tag{1}$$

where $M_j$ represents the selection of input maps, $w_{j,i}$ denotes the weight or the filter associated with the connection between neurons j and i, $x_i^{l-1}$ is the output signal from neuron i in layer l − 1, $b_j^l$ is the bias associated with neuron j in layer l, and (*) denotes vector convolution. To acquire an output map, an activation function f is required as follows:

$$x_j^l = f\left(u_j^l\right). \tag{2}$$

When compared with other activation functions, a rectified linear unit (ReLU) exhibits robust training performance. Hence, in this study, we used ReLU as the activation function for the output maps, which can be expressed as follows:

$$f(u) = \max(0, u). \tag{3}$$

After the convolutional layer, a pooling layer was placed. It reduces the dimensions of the feature maps, the number of network parameters, and the computational cost of successive layers by summarizing subregions with a specific function, such as the average value or the maximum value. Additionally, the pooling layer allowed the CNN to learn features that were invariant to scale or orientation changes [40]. The pooling operation consisted of sliding a window across the previous feature map. Herein, max pooling was used after the convolutional layer was activated. Finally, a dense layer, which is generally used in the final stages of a CNN, was fully connected to the outputs of the previous layers.
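As an illustration of the convolution, ReLU, and max-pooling operations described above, the following is a minimal single-channel sketch in plain Python. The function names, the toy input, and the filter values are hypothetical and are not taken from the proposed model.

```python
def conv1d(x, w, b=0.0, stride=1):
    """'Valid' 1-D convolution of signal x with filter w plus bias b.

    As is common in CNN libraries, the filter is applied without flipping
    (i.e., as a cross-correlation).
    """
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) + b
            for i in range(0, len(x) - k + 1, stride)]


def relu(v):
    """Rectified linear unit applied element-wise: max(0, u)."""
    return [max(0.0, a) for a in v]


def max_pool1d(v, size=2, stride=2):
    """Slide a window over v and keep the maximum of each window."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, stride)]


# Toy example with hypothetical values.
x = [1.0, 2.0, -1.0, 0.0, 3.0, 1.0]
w = [1.0, -1.0]
features = max_pool1d(relu(conv1d(x, w)))
print(features)  # [3.0, 0.0]
```

In a real layer, the sum also runs over all selected input maps, and each output map has its own filter and bias.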

Batch Normalization.
During the training of a CNN, a change in the distribution of the inputs of each layer will affect the outputs of all the succeeding layers. This can result in difficulty when attempting to train models with saturated nonlinearities [41]. Therefore, batch normalization (BN) was used to solve this problem.
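Suppose $X = \{x_1, x_2, \cdots, x_d\}$ is the input to a layer with dimension d, and let mb denote the corresponding minibatch. The mean of all the inputs in the same minibatch can be expressed as follows:

$$\mu_{mb} = \frac{1}{m} \sum_{i=1}^{m} x_i, \tag{4}$$

where m is the number of samples in mb and, following the notation of [41], $x_i$ here denotes the value of a given input dimension for the i-th sample. The variance of the input in a minibatch can be expressed as follows:

$$\sigma_{mb}^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x_i - \mu_{mb}\right)^2. \tag{5}$$

Therefore, BN can be expressed as follows:

$$\mathrm{BN}(x_i) = \gamma \, \frac{x_i - \mu_{mb}}{\sqrt{\sigma_{mb}^2 + \varepsilon}} + \beta, \tag{6}$$

where $\varepsilon$ is a small constant added for numerical stability, and $\gamma$ and $\beta$ are learnable parameters. The training efficiency of a CNN can be improved using BN. At the same time, BN helps the CNN to train faster and provides high accuracy [41].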
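The normalization described above can be sketched for a single input dimension as follows. This is a training-time illustration only; the function name and the default values of `gamma`, `beta`, and `eps` are assumptions, and the running statistics used at inference time are omitted.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one input dimension over a minibatch, then scale and shift.

    gamma and beta play the role of the learnable parameters; eps is a small
    constant for numerical stability. All default values are assumed.
    """
    m = len(batch)
    mean = sum(batch) / m                           # minibatch mean
    var = sum((x - mean) ** 2 for x in batch) / m   # minibatch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
# After normalization the minibatch has approximately zero mean and unit variance.
print(sum(out) / len(out))
```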

Long Short-Term Memory.
LSTM controls the cell state via three gates, i.e., a forget gate, an input gate, and an output gate. The output features obtained from the previous dense layer of the CNN are passed to the gate units. The memory cells constituting the LSTM update their states via the activation of each gate unit, which is constrained to a continuous value between 0 and 1. The hidden state of the LSTM cell, h_t, is updated at every time step t. The input gate, forget gate, and output gate can be written as shown in equations (7)-(9) [37], respectively.
Here, ∘ represents point-wise multiplication. The cell states and hidden states can be expressed using equations (10) and (11), respectively.
The parameters of the CNN and LSTM are updated during training using the backpropagation algorithm.
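For reference, the gates and states described in this subsection are commonly written in the following standard form. This is a sketch of the widely used LSTM formulation, not necessarily the exact notation of equations (7)-(11); the symbols $W$, $U$, and $b$ (input weights, recurrent weights, and biases), $x_t$ (input at step t), and $\sigma$ (logistic sigmoid) are assumptions introduced here, and the mapping to the equation numbers is likewise assumed.

```latex
\begin{align}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{7} \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{8} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{9} \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \tag{10} \\
h_t &= o_t \circ \tanh\left(c_t\right) \tag{11}
\end{align}
```

In this form, the sigmoid keeps each gate between 0 and 1, the forget gate $f_t$ scales how much of the previous cell state $c_{t-1}$ is retained, and the input gate $i_t$ scales how much new information is written.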

Statistical Evaluation Methods.
In this study, we use the kappa coefficient (KP) [42], which is a robust statistical measure of the inter-rater agreement, to evaluate the performance of our method. Additionally, the total accuracy (TAC), sensitivity (SE), specificity (SP), positive predictive value (PPV), and negative predictive value (NPV) were calculated as follows:

$$\mathrm{TAC} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{12}$$

$$\mathrm{SE} = \frac{TP}{TP + FN}, \tag{13}$$

$$\mathrm{SP} = \frac{TN}{TN + FP}, \tag{14}$$

$$\mathrm{PPV} = \frac{TP}{TP + FP}, \tag{15}$$

$$\mathrm{NPV} = \frac{TN}{TN + FN}, \tag{16}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
We implement our experiments on a workstation with a GeForce GTX2060 GPU in a Windows environment. The TensorFlow framework is used to train the proposed model.
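The evaluation measures above can be computed from the four confusion-matrix counts, as in the following minimal sketch. The function name and the example counts are hypothetical and are not results from this study; the kappa computation uses the usual two-class definition of Cohen's kappa.

```python
def evaluate(tp, tn, fp, fn):
    """Compute TAC, SE, SP, PPV, NPV, and Cohen's kappa from binary counts."""
    n = tp + tn + fp + fn
    tac = (tp + tn) / n
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = tac
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return {"TAC": tac, "SE": se, "SP": sp, "PPV": ppv, "NPV": npv, "KP": kappa}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
scores = evaluate(tp=90, tn=80, fp=20, fn=10)
print(scores)
```

With these counts the chance agreement is 0.5 and the observed agreement is 0.85, which gives a kappa of approximately 0.7.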

The Proposed Deep Model Architecture and Parameters.
To build an optimal OSA detection architecture, we need to understand the characteristics of the input data. The sampling rate of the ECG was 100 Hz, so each 10 s input has 1000 samples. To extract features at different scales, filters of different sizes are required. Therefore, experiments were performed while varying the filter sizes of the convolution layer to identify the optimal parameters for automated OSA detection. Following existing studies [25,29], we designed a network model that contains convolution, BN, pooling, dropout, and dense layers, as shown in Figure 1. N denotes the number of filters. The parameters and results are shown in Table 2. From Table 2, we can see that model_2 performs best and model_1 is second. However, model_2 has more parameters than model_1. For portable OSA devices or real-time OSA analysis systems, model_1 is more appropriate. Therefore, model_1 is used to learn the feature representation of the ECG. To learn the transition rules of OSA, an LSTM is used. The proposed model contains BN, convolutional, pooling, LSTM, and dense layers, as shown in Figure 2. The detailed parameters of the proposed model are presented in Table 3. This table shows the number, size, and stride of the filters in each convolution layer, the size and stride of the kernel in each pooling layer, and the output size of each layer, including the LSTM layer. The batch size is 30, the number of training epochs is 100, and the learning rate is 0.1. Figure 3 shows the learning results in terms of accuracy and loss as the number of epochs is varied. The results show that the accuracy and loss on the validation dataset reach stable values after several iterations of learning. Figure 4 shows the filter morphology and training time with each training epoch. From Figure 4(a), we can see that, after 90 training epochs, the morphology of the filter hardly changes. Figure 4(b) indicates that the speed of model training is fast.
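The output size of each layer follows from the input length, kernel size, and stride. The sketch below shows this arithmetic for a 1000-sample input, assuming layers without padding. The kernel sizes and strides in `layers` are hypothetical placeholders and are not the values of Table 3.

```python
def out_len(n, kernel, stride=1):
    """Output length of a 1-D convolution or pooling layer without padding."""
    return (n - kernel) // stride + 1


n = 1000  # 10 s at 100 Hz, as stated in the text
# (layer type, kernel size, stride): HYPOTHETICAL values for illustration only.
layers = [("conv", 50, 5), ("pool", 2, 2), ("conv", 10, 1), ("pool", 2, 2)]
trace = []
for name, kernel, stride in layers:
    n = out_len(n, kernel, stride)
    trace.append(n)
    print(name, n)
```

For these placeholder settings the sequence length shrinks from 1000 to 191, 95, 86, and finally 43 samples per feature map.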

Performance Results.
Table 4 presents the performance of the proposed model for the automated detection of OSA from a single-lead ECG signal. When applied to the test dataset, we obtained a KP of 0.92, an SE of 96.1%, an SP of 96.2%, a TAC of 96.1%, a PPV of 97.6%, and an NPV of 93.8%. As can be seen, the proposed model performed very well for the detection of OSA.
From Table 4, we can observe that 3.9% of the AEs were misclassified as NEs and that 3.8% of the NEs were misclassified as AEs. According to our research, these misclassifications could have two probable causes. One reason is that a transition epoch from NE to AE or AE to NE is difficult to classify. For example, Figure 5 shows a transition epoch from NE to AE, whereas Figure 6 shows a transition epoch from AE to NE. A skilled physician would be able to classify these epochs based on the contextual information. However, the proposed model does not use the contextual information to score OSA, making it unable to distinguish the transition epochs. The other reason may be that the proposed model finds it difficult to score the artifact epochs. The ECG signals can be polluted by unwanted noise signals, including those caused by body movement. Figure 7 shows a polluted ECG epoch. Because the artifact epochs are few and varied, the proposed model was unable to learn the distributions of all the artifact epochs. Therefore, it is difficult for the proposed model to detect OSA in artifact epochs. In this case, the usage of handcrafted features seems to be considerably more robust.
Computational Intelligence and Neuroscience

Benefits of Long Short-Term Memory.
The major advantage associated with the usage of LSTM is that it can be trained to learn long-term dependencies, including the transition rules that are used by physicians to identify the next possible OSA event(s) from a sequence of ECG epochs. To validate the usefulness of LSTM, we removed the LSTM layer from the model (Figure 2) and then repeated the experiment. This test was named CNN_1. Table 5 shows the comparison results, where we can see that the proposed model (CNN + LSTM) results in a gain of 1.3% over the TAC of CNN_1. In addition, KP increased by 0.03 when LSTM was added, verifying our assumption. Figure 8 shows an example of an NE ECG signal. When the proposed method (CNN + LSTM) is used, the epoch is classified as an NE. However, when CNN_1 is used, this epoch is scored as an OSA event.
The reason for this misclassification is that the heart rate is slow at the center of this epoch. According to a previously conducted study [11], the heart rate decelerates when OSA occurs, and CNN_1 learned this feature. However, from Figure 8, we can observe that the heart rate changes very little. At the same time, the heart rates of previous epochs are similar to those of this epoch. Because the LSTM learns long-term dependencies, the CNN + LSTM method classifies the epoch correctly, which is the benefit associated with the usage of LSTM.

OSA Detection.
As mentioned previously, long OSA is dangerous because it can lead to sudden death. To identify long OSA, the window overlapping method can be used to detect the start and end positions of an OSA event. In this way, long OSA can be detected. Figure 9 shows that the proposed model can detect complete OSA events from the ECG signals. Comparison with the nasal airflow signal shows that the OSA events detected by our model have been accurately identified.
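One way to turn per-window predictions into events with start and end positions is to merge consecutive abnormal windows, as in the minimal sketch below. The merging rule, the function name, and the 5 s step between windows are assumptions made for illustration; the text does not specify how the window outputs are combined.

```python
def merge_events(labels, epoch_s=10, step_s=5):
    """Merge overlapping or adjacent abnormal windows into (start_s, end_s) events.

    labels[i] is 1 if window i was classified as abnormal, else 0.
    epoch_s follows the text (10 s windows); step_s is an ASSUMED value.
    """
    events = []
    for i, label in enumerate(labels):
        if not label:
            continue
        start, end = i * step_s, i * step_s + epoch_s
        if events and start <= events[-1][1]:
            # Window touches or overlaps the previous event: extend it.
            events[-1] = (events[-1][0], max(events[-1][1], end))
        else:
            events.append((start, end))
    return events


# Hypothetical per-window predictions.
labels = [0, 1, 1, 1, 0, 0, 1, 0]
print(merge_events(labels))  # [(5, 25), (30, 40)]
```

An event that spans many windows appears as one long interval, which is the case of interest for alerting on long OSA.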

Comparison of the Proposed Method with Existing Studies.
The comparison of various methods of automatic OSA detection is difficult because different datasets, feature sets, and classifiers are used in different studies. To ensure a fair comparison with existing studies, Table 6 shows the classification performances of different methods based on single-lead ECG signals. From Table 6, we can observe that the proposed model achieved better performance than the previous studies.

Conclusions
In this study, we developed an automated OSA event detection method using a CNN, where the feature extraction and selection processes were not required. The proposed method detected the start and end positions of the OSA events based on the overlapping epochs in the ECG signal dataset. Our method automatically extracted the time-invariant features from raw ECG signals without utilizing any handcrafted features. The proposed approach is robust and completely automated, and the method can be easily adapted to other physiological signal analyses and prediction problems. The TAC and KP of the proposed model applied to the single-channel ECG reached 96.1% and 0.92, respectively. The experimental results showed that the proposed method could accurately score the OSA events and that it achieved comparable performance with other state-of-the-art studies. More importantly, our method can prevent sudden death from OSA, which is important for the patients who are severely affected by OSA.
There are some limitations associated with our CNN method. First, the proposed model can only detect OSA and normal events but not hypopnea events. Although hypopnea is not as serious as OSA, it is still prevalent in sleep-disordered breathing patients. Second, it is difficult to score transition epochs using our method. In the future, we will improve the discrimination ability of our method for AEs and NEs. In addition, the automated anomaly detection of ECG based on the CNN, which is important to rapidly assess the quality of the ECG data, will be studied.
Figure 9: The start and end positions of multiple OSA events.

Conflicts of Interest
The authors declare that they have no conflicts of interest.