Determination of Signs of Sleep Apnea Using Machine Learning Methods in Combination with Reducing the Dimensionality of Heart Rate Variability Features

Obstructive Sleep Apnea Syndrome (OSAS) is a clinically significant disorder characterized by recurrent episodes of upper airway obstruction, manifesting as either apnea or hypopnea and occurring predominantly at the pharyngeal level. Although respiratory muscle function is preserved during these episodes, OSAS poses considerable health risks, including cardiovascular complications and cognitive impairment. In recent years, a growing body of literature has explored methods for detecting and diagnosing OSAS, with a particular focus on cardiac activity analysis through Heart Rate Variability (HRV). This study contributes to that literature with an HRV analysis aimed at identifying patterns indicative of sleep apnea. The analysis incorporates parameters in both the time and frequency domains to characterize the interplay between cardiac dynamics and respiratory disruptions during sleep. To make the data easier to interpret, scaling and dimensionality reduction techniques were applied: Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). The dataset comprises records from 70 patients, sourced from the Apnea-ECG Database on the PhysioNet platform. To identify the best classification model, several machine learning algorithms were applied after dimensionality reduction: k-Nearest Neighbors (k-NN), logistic regression, Support Vector Machine (SVM), Decision Tree, Random Forest, and Gradient Boosting. All classifiers reached 100% accuracy when the UMAP dimensionality reduction method was used. The distinctive feature of the proposed methodology is the application of machine learning techniques to HRV parameters after dimensionality reduction. This approach makes the complex physiological data easier to interpret and suggests that the developed model could be applied to OSAS detection in real-world scenarios. The high accuracy obtained positions the approach as a promising tool for sleep medicine diagnostics. Future research may refine and validate this methodology with a view to its integration into clinical practice.


I. INTRODUCTION
Obstructive Sleep Apnea Syndrome (OSAS) is a condition caused by recurring episodes of upper airway obstruction (apnea) or narrowing (shallow breathing) at the level of the throat, with preserved respiratory muscle function [1], [2]. Consequences of apnea and shallow breathing include reduced blood oxygenation and awakening episodes (most of which the sleeper is not conscious of), leading to sleep fragmentation. This results in daytime complaints and, together with recurring episodes of hypoxemia and excessive sympathetic nervous system activity, can raise blood pressure, with subsequent complications.
The lack of oxygen activates a survival reflex that prompts the individual to wake up and restore breathing. While this reflex sustains life, it interrupts the patient's sleep cycle, hindering restful sleep and potentially leading to severe consequences, including cardiac strain with potentially fatal outcomes [1], [2].

Electronic Systems and Signals, 2024. Samsonenko A. S., Popov A. O. DOI: 10.20535/2523-4455.mea.297387
In recent years, increasing attention has been given to research on identifying apnea through the analysis of heart activity based on heart rate variability. Heart rate variability (HRV) is the fluctuation in the time intervals between consecutive heartbeats. HRV indexes neurocardiac function and is generated by the interaction between the heart and the brain, as well as by dynamic nonlinear processes in the autonomic nervous system. HRV is a property of interdependent regulatory systems that act on different time scales to help us adapt to environmental and psychological challenges. It reflects the regulation of autonomic balance, arterial pressure (AP), gas exchange, intestinal, cardiac, and vascular tone (the latter referring to the diameter of the blood vessels that regulate AP), and even facial muscles [3], [4].
In the study [5], HRV analysis was used to measure and assess autonomic nervous system (ANS) function during normal breathing and apnea in two groups of subjects. The results showed that, compared to normal breathing, both simulated apnea (voluntary apnea) and actual apnea (sleep disorder) led to a significant increase in the average R-R interval duration, the normalized power of the low-frequency (LF) components, and the LF/HF ratio (where HF stands for high frequency). Meanwhile, the root mean square of successive differences in RR intervals (RMSSD) and the normalized power of the HF components decreased significantly, indicating a substantial enhancement of sympatho-vagal modulation. The ANS balance underwent significant changes, and the fractal characteristics of the heart were strengthened [5]-[7].
HRV analysis for determining sleep apnea features can be conducted using various types of parameters. Time domain indices are easy to compute and intuitively understandable. Frequency domain indices are used to measure sympatho-vagal modulation. For instance, the normalized power of the HF components can reflect relative vagal modulation, while the normalized power of the LF components can reflect relative sympathetic modulation [5]-[7]. In the study [5], apnea was defined as a cessation of breathing for longer than 15 seconds, although studies of sleep-related breathing disorders typically use a threshold of 10 seconds.
The reasons why apnea leads to arrhythmia are diverse and complex. From an anatomical perspective, it has been established that the inspiratory muscles relax during apnea. The relaxed muscles then cause an increase in intrathoracic pressure, which hinders venous return to the right atrium and reduces absolute venous pressure. These low-level changes contribute to enhanced sympathetic modulation through low-pressure baroreceptors, unbalancing the ANS and ultimately leading to arrhythmia. The study [5] investigated the causes of arrhythmia through changes in ANS function. Experimental results showed that during apnea, an increase in the average R-R interval indicated increased vagal modulation, while a decrease in RMSSD and in the normalized power of HF, along with an increase in the normalized power of LF and in the LF/HF ratio, indicated relatively enhanced sympathetic modulation and a disturbance of the initial ANS balance. Heart modulation is regulated by the ANS, and when normal ANS function is disrupted, an abnormal heart rhythm forms, causing arrhythmia. Simultaneously heightened sympathetic and parasympathetic modulations are also the most common triggers of arrhythmias such as atrial fibrillation [5]-[7].
Most studies on detecting sleep apnea rely on supervised learning [8]. In these studies, oxygen saturation and ECG signals were used as biomedical markers for sleep apnea, as their correlation with apnea was observed: the research shows that heart rate and systolic blood pressure increase in response to apnea. Various decision tree classifier variants were employed to achieve an accuracy of 93%. PPG measurements were obtained from an SpO2 sensor and analyzed to calculate heart rate and respiratory effort. One of the best classification performances, reaching 87%, was obtained when linear discriminant analysis was used to combine SpO2 and PPG features. In other studies, accuracy reached 77.7% by combining statistical and temporal features of SpO2 and PPG, incorporating age as a feature, and using these data as input for an SVM algorithm. The studies discussed in [8] emphasize that age is also an explicit parameter, as it correlates with cardiovascular health, and that using age alone for detecting apnea can provide sufficient accuracy.
This work presents the results of the analysis of HRV parameters using machine learning methods to identify sleep apnea features. The distinctive aspect of the proposed approach is the application of machine learning to HRV parameters after reducing their dimensionality.

A. ECG measurement and pre-processing
The work utilized the Apnea-ECG Database from the PhysioNet platform [9]. This database consists of 70 recordings (35 for the training dataset and 35 for testing the algorithm), each representing a person's ECG during sleep with a duration of 7-10 hours and including annotated QRS complexes. The database also contains annotation files marking the occurrence of apnea in the training set. Examples of rhythmograms for cases with and without apnea are provided in Fig. 1.
During the preprocessing of RR intervals using the wfdb library [10], RR intervals longer than 3 seconds and shorter than 0.3 seconds were removed and replaced with their respective upper limits. Subsequently, spline interpolation of the RR intervals was performed to obtain a uniformly sampled time series with a sampling rate of 2 Hz.
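The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `preprocess_rr`, the use of `numpy.clip` to substitute limit values for out-of-range intervals, and the choice of SciPy's `CubicSpline` as the spline are all assumptions, and the sample RR values are invented.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def preprocess_rr(rr_s, lo=0.3, hi=3.0, fs=2.0):
    """Limit out-of-range RR intervals and resample them on a uniform grid.

    rr_s : RR intervals in seconds (one value per beat).
    lo, hi : plausibility limits in seconds (0.3 s and 3 s in the text).
    fs : target sampling rate in Hz (2 Hz in the text).
    """
    # Assumption: out-of-range values are replaced by the nearest limit.
    rr = np.clip(np.asarray(rr_s, dtype=float), lo, hi)
    # Beat times are the running sum of the intervals (strictly increasing).
    beat_times = np.cumsum(rr)
    spline = CubicSpline(beat_times, rr)
    # Uniform time grid that stays inside the observed range (no extrapolation).
    t_uniform = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    return t_uniform, spline(t_uniform)


# Hypothetical RR series containing one too-long and one too-short interval.
rr = np.array([0.80, 0.82, 0.79, 3.50, 0.81, 0.10, 0.80, 0.83])
t, rr_uniform = preprocess_rr(rr)
```

The resampled series has one value every 0.5 s, which is what the later spectral analysis needs; a different spline order or artifact rule would change the details but not the overall shape of the step.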

B. Heart rate variability parameters
For the interpolated RR intervals, HRV indices in the time domain were calculated. These indices quantify the degree of variability in the interbeat interval, that is, the period of time between consecutive heart contractions (see Table 1).
Measurements in the frequency domain allow the distribution of absolute and relative power across four frequency bands to be assessed. The Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996) divided HRV into ultra-low-frequency (ULF), very low-frequency (VLF), low-frequency (LF), and high-frequency (HF) ranges (see Table 2).

Parameter | Dimension | Description
HR Max − HR Min | bpm | The average difference between the highest and lowest heart rate within each respiratory cycle.
RMSSD | ms | The root mean square of consecutive differences in RR intervals.
HRV triangular index | - | The integral of the density histogram of RR intervals, divided by its height.
TINN | ms | The base width of the histogram of RR intervals.
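Several of the time domain indices can be computed in a few lines. The sketch below is illustrative only: the function name `time_domain_hrv` and the sample values are hypothetical, it works on beat-to-beat intervals in milliseconds rather than on the interpolated series used in the paper, and the sample standard deviation (`ddof=1`) is an assumed convention.

```python
import numpy as np


def time_domain_hrv(rr_ms):
    """Return a few time domain HRV indices for RR intervals given in ms."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)  # consecutive differences between RR intervals
    return {
        # Standard deviation of the RR intervals (sample estimate).
        "SD_RR": rr.std(ddof=1),
        # Root mean square of consecutive differences.
        "RMSSD": np.sqrt(np.mean(diff ** 2)),
        # Percentage of consecutive differences larger than 50 ms.
        "pNN50": 100.0 * np.mean(np.abs(diff) > 50.0),
    }


# Hypothetical RR intervals in milliseconds.
rr_ms = [800, 810, 790, 860, 800, 805]
indices = time_domain_hrv(rr_ms)
```

For this toy series, two of the five consecutive differences exceed 50 ms, so pNN50 is 40%.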

Parameter | Dimension | Description
ULF power | ms² | Absolute power in the ultra-low-frequency range (≤0.003 Hz).
VLF power | ms² | Absolute power in the very low-frequency range (0.0033-0.04 Hz).
LF peak | Hz | Peak frequency in the low-frequency band (0.04-0.15 Hz).
LF power | ms² | Absolute power in the low-frequency range (0.04-0.15 Hz).
LF power | - | Relative power in the low-frequency band (0.04-0.15 Hz) in normalized units.
LF power | % | Relative power in the low-frequency range (0.04-0.15 Hz).
HF peak | Hz | Peak frequency in the high-frequency band (0.15-0.4 Hz).
HF power | - | Relative power in the high-frequency band (0.15-0.4 Hz) in normalized units.
HF power | % | Relative power in the high-frequency band (0.15-0.4 Hz).
LF/HF | % | The ratio of power in the low-frequency (LF) to high-frequency (HF) band.
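As an illustration of how the LF and HF entries can be obtained from a uniformly sampled RR series, the sketch below estimates the power spectral density with Welch's method and sums it over each band. The source does not state which spectral estimator was used, so `scipy.signal.welch`, the segment length, the function name `band_powers`, and the normalization by LF + HF are all assumptions; the test signal is synthetic.

```python
import numpy as np
from scipy.signal import welch


def band_powers(rr_uniform_ms, fs=2.0):
    """Estimate LF/HF band powers of a uniformly sampled RR series (ms)."""
    x = np.asarray(rr_uniform_ms, dtype=float)
    freqs, psd = welch(x - x.mean(), fs=fs, nperseg=min(256, len(x)))
    df = freqs[1] - freqs[0]  # frequency resolution in Hz

    def power(lo, hi):
        # Rectangle-rule integral of the PSD over [lo, hi), in ms^2.
        mask = (freqs >= lo) & (freqs < hi)
        return psd[mask].sum() * df

    lf = power(0.04, 0.15)
    hf = power(0.15, 0.40)
    return {
        "LF": lf,
        "HF": hf,
        # Normalized units, here taken relative to LF + HF (an assumption).
        "LF_nu": 100.0 * lf / (lf + hf),
        "HF_nu": 100.0 * hf / (lf + hf),
        "LF/HF": lf / hf,
    }


# Synthetic 5-minute series sampled at 2 Hz: a strong 0.10 Hz (LF) component
# and a weaker 0.25 Hz (HF) component around a mean RR of 800 ms.
fs = 2.0
t = np.arange(0, 300, 1.0 / fs)
rr_demo = 800 + 40 * np.sin(2 * np.pi * 0.10 * t) + 20 * np.sin(2 * np.pi * 0.25 * t)
bp = band_powers(rr_demo, fs)
```

Because the LF component has twice the amplitude of the HF component, its power is roughly four times larger, which is reflected in the LF/HF ratio.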
In addition to the HRV parameters in the time and frequency domains, spectra and spectrograms of the rhythmograms were also calculated. For the array of HRV parameter features, scaling was applied, followed by dimensionality reduction to two components using the sklearn library [11]. The methods used were linear dimensionality reduction based on principal component analysis (PCA) and two non-linear methods: t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP).
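A minimal sketch of the scaling and two-component reduction is given below, assuming `StandardScaler` as the scaler (the source does not name one) and a randomly generated feature matrix in place of the real HRV features. UMAP is not part of scikit-learn itself; it would typically come from the separate umap-learn package (`umap.UMAP(n_components=2)`), which is mentioned here only as an assumption and is not run in this example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 200 segments x 12 HRV features,
# with the first 100 rows shifted to mimic a second class.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
X[:100] += 1.5

# Scale every feature to zero mean and unit variance (assumed scaler).
X_scaled = StandardScaler().fit_transform(X)

# Linear reduction to two principal components.
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# Non-linear reduction to two components with t-SNE.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_scaled)

# UMAP would follow the same fit_transform pattern via the umap-learn
# package (assumption, not executed here).
```

Note that t-SNE has no `transform` method for unseen samples, so the embedding is computed for all rows at once in this sketch.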

C. Machine learning models for HRV analysis
The obtained data were divided into a training set (80%) and a test set (20%). For the classification task, the k-nearest neighbors method from the sklearn library [11] was used, specifically the KNeighborsClassifier() class. This method is relatively fast and simple to apply to the reduced feature set. Additionally, the logistic regression method from the sklearn library [11] was chosen; despite its name, it is a linear classification model and can provide sufficient accuracy under certain data distributions. Furthermore, support vector machines, decision trees, and ensemble methods such as random forest and gradient boosting were used. The respective classes from the sklearn library [11] are SVC(), DecisionTreeClassifier(), RandomForestClassifier(), and GradientBoostingClassifier().
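The classification setup can be sketched as follows. The class names are those listed in the text; the synthetic two-component data, default hyperparameters, stratified split, and fixed random seeds are assumptions made only so that the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical two-component features: class 0 (normal) and class 1 (apnea).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(150, 2)), rng.normal(2, 1, size=(150, 2))])
y = np.array([0] * 150 + [1] * 150)

# 80% training / 20% test split, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "k-NN": KNeighborsClassifier(),
    "Logistic regression": LogisticRegression(),
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the training set and record its test accuracy.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test) for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Keeping all models in one dictionary makes it straightforward to repeat the same comparison for each dimensionality reduction method.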

II. RESULTS
Based on the accuracy comparison, the classifier based on gradient boosting of trees proved to be the best when PCA was applied. The scatter plot with the decision boundary is shown in Fig. 2. The average accuracy of 94% obtained on the test data is a satisfactory result. The confusion matrix (Fig. 3) shows that the model more often errs in favor of the normal state. In other words, the majority of errors correspond to situations where an apnea episode was classified as a normal state, which is a significant drawback of this classification model and of the resulting distribution of components.
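The kind of error described above (apnea classified as normal) is a false negative, and its effect is easier to see when sensitivity is reported alongside accuracy. The sketch below uses invented labels, not the paper's results, and the function name `binary_metrics` is hypothetical.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for labels 1 = apnea, 0 = normal."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # apnea detected
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # apnea missed
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # normal kept as normal
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false alarm
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels: the classifier misses two of four apnea segments.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

In this toy case the accuracy is 0.7 while the sensitivity is only 0.5, which illustrates why errors in favor of the normal state matter for a screening tool.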
The dataset calculated using the t-distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction method allowed classifiers to be trained with significantly higher decision-making accuracy. The scatter plot with the decision boundary is shown in Fig. 4, and the confusion matrix in Fig. 5. The highest accuracy of 100% was achieved with the k-nearest neighbors and random forest methods. However, such high accuracy may also indicate overfitting of the model and may lead to low decision-making accuracy on real-world data, although on the test data 100% accuracy with 0 errors was maintained. It is also noteworthy that the support vector machine, gradient boosting, and decision tree methods demonstrated high classification accuracy as well, at 99.9%.
The data preprocessed with the UMAP dimensionality reduction method resulted in the highest decision-making accuracy among the applied machine learning models. The scatter plot with the decision boundary is shown in Fig. 6, and the confusion matrix is presented in Fig. 7. All selected machine learning models achieved 100% decision-making accuracy, which was maintained on the test dataset as well. The obtained accuracy is due to sufficient separation of the classes in the feature space; consequently, the decision boundary is relatively simple and linear.
The dataset formed by calculating Fourier transform coefficients and spectrograms yielded accuracy comparable to that obtained through dimensionality reduction using UMAP. However, since this dataset differs in its distribution in space from the other feature arrays considered, such an apnea detection algorithm may be less effective on other real-world data. This assumption requires experimental confirmation.

CONCLUSIONS
The research presented in the paper investigates the classification efficiency of human rhythmograms for identifying sleep apnea features. The study evaluates the performance of classifiers (k-NN, logistic regression, support vector machine, decision tree, and the ensemble methods random forest and gradient boosting) in combination with dimensionality reduction methods (PCA, t-SNE, and UMAP). It is determined that the highest average accuracy (100%) can be achieved by applying non-linear dimensionality reduction using the UMAP method in combination with all the classifiers used. However, this comes at the cost of the longest computation time. Therefore, future work should focus on optimizing feature preprocessing without sacrificing accuracy.
-R (RR) peaks intervals standard deviation
SDANN | ms | The standard deviation of the mean NN intervals for each 5-minute segment of a 24-hour HRV recording.
SDNN index (SDNNI) | ms | The average value of the standard deviations of all NN intervals for each 5-minute segment of a 24-hour HRV recording.
pNN50 | % | The percentage of consecutive RR intervals that differ by more than 50 ms.

Fig. 1 Examples of a rhythmogram in a normal state (bottom plot) and during apnea (top plot)

Fig. 2 A scatter plot with the depicted decision boundary for the classification of apnea (brown color) and the normal state (blue color) for the data calculated using the PCA dimensionality reduction method

Fig. 4 Scatter plot with the depicted decision boundary of the classification of apnea (brown color) and the normal state (blue color) for the data calculated using the t-SNE dimensionality reduction method

Fig. 6 Scatter plot with the decision boundary of the apnea (brown color) and normal state (blue color) classification for the data calculated using the UMAP dimensionality reduction method

TABLE 1 TIME DOMAIN HRV PARAMETERS