Analysis of EEG Sleep Spindle Parameters from Apnea Patients Using Massive Computing and Decision Tree

In this study, the Matching Pursuit (MP) procedure is applied to the detection and analysis of EEG sleep spindles in patients evaluated for suspected OSAS. Elements having the frequency of EEG sleep spindles are selected using dictionaries of different sizes, with and without a frequency modulation function (chirp) for signal description. This procedure was carried out at high computational cost in order to find the best parameters for describing real EEG data. Finally, we used the atom parameters as input for a decision tree-based classifier, which made it possible to obtain a classification according to apnea-hypopnea index group and to see how atom parameters such as frequency and amplitude are affected by the presence of sleep apnea.


I. INTRODUCTION
The Obstructive Sleep Apnea Syndrome (OSAS) is characterized by functional airway obstruction, which fragments sleep and impairs blood oxygenation [1]. These features may be reflected in the EEG tracing, indicating changes in brain functioning. Sleep microstructure has been shown to be affected in OSAS [2]. When dealing with this type of problem, the irregularity and nonstationarity of the EEG signal emerge together with a wide range of variables, resulting in the need for massive computational analysis, regardless of the tools employed [1], [3].
In order to process and analyze the massive data obtained from human sleep studies in the context of OSAS investigation, we used the GRID/UNESP [4] system. The amount of results obtained in a week of GRID processing was comparable to more than a decade of processing on conventional PCs. Our dataset was processed during the winter of 2010 and has been analyzed since that time, advancing knowledge on the sleep EEG frequency-distribution problem in OSAS patients [5]-[7]. The aim of this work is to show how the GRID/UNESP computing system helps in understanding a real clinical EEG problem.
In this study, the Matching Pursuit (MP) procedure is applied to the detection and analysis of EEG sleep spindles in patients evaluated for suspected OSAS. MP dictionary size is explored in the first part of the study. Variability in the characteristics of atoms representing sleep spindles in OSAS patients is analyzed in the second part of the study, from the perspective of a decision tree-based classifier.

II. METHODS
A. Dataset
This study was approved by the local ethics committee and all subjects provided informed consent before entering the study. The series comprised single consecutive sleep studies from 49 patients (age 47.6 ± 13.6 years; 16 female) with suspected OSAS who underwent investigation at Hospital de Clinicas de Porto Alegre from April 2007 to July 2009 and met the study criteria (maximum age of 60 years, willingness to participate in the study, no previous treatment for OSAS, no alcohol or substance abuse).

B. Matching Pursuit
Matching Pursuit is not a transform but an adaptive approximation of the signal by a set of functions chosen from a dictionary. MP describes the signal through fundamental "atoms" (the dictionary functions play the role of words in a language). MP time-frequency dictionaries were introduced in [9], and a correction scheme for large data sizes, as in the EEG case, was added in [3]. In the MP approach, a signal S(t) is acquired and then approximated step by step in terms of functions drawn from a redundant dictionary.
The basic strategy is to decompose the signal iteratively, looking at each iteration for Gabor wave functions of arbitrary support, described by Eq. (1), in which α is the amplitude, t and ω are the central time position and frequency of the atom, respectively, β is a chirp factor and N is the size of the series window submitted to the MP procedure. This choice of function closely resembles the EEG signal [3], [5], [10]. The similarity measure used is the inner product between the signal and the Gabor function. The best-matching Gabor function is subtracted from the signal, and the resulting signal, commonly called the residue, is again submitted to the same procedure. Thus, the signal is decomposed into a sum of waveforms, prototypes or atoms with different weights or coefficients. For a further description of the methodology see [3], [5]-[7], [9]-[11].
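The iteration described above (largest inner product, subtraction, repetition on the residue) can be illustrated with a minimal sketch. It assumes a Gaussian-windowed cosine with a linear chirp as the atom, which is not necessarily the exact form of Eq. (1), and a tiny explicit dictionary instead of the 70k/100k dictionaries used in the study. The function names, the sampling rate `fs` and the parameter `sigma_s` are hypothetical.

```python
import numpy as np


def gabor_chirp_atom(n, center, freq_hz, sigma_s, chirp_hz_s, fs, phase=0.0):
    """Unit-norm Gaussian-windowed cosine with a linear chirp (assumed atom form)."""
    t = (np.arange(n) - center) / fs            # seconds relative to the atom centre
    envelope = np.exp(-0.5 * (t / sigma_s) ** 2)
    # Instantaneous frequency is freq_hz + chirp_hz_s * t.
    carrier = np.cos(2 * np.pi * (freq_hz * t + 0.5 * chirp_hz_s * t ** 2) + phase)
    atom = envelope * carrier
    return atom / np.linalg.norm(atom)


def matching_pursuit(signal, dictionary, n_iter):
    """Greedy MP over the rows of `dictionary` (each row a unit-norm atom).

    Returns a list of (atom_index, coefficient) pairs and the final residue.
    """
    residue = np.asarray(signal, dtype=float).copy()
    picks = []
    for _ in range(n_iter):
        products = dictionary @ residue          # inner product with every atom
        k = int(np.argmax(np.abs(products)))     # best-matching atom
        picks.append((k, float(products[k])))
        residue -= products[k] * dictionary[k]   # subtract it, keep the residue
    return picks, residue
```

Because the atoms have unit norm, each iteration removes exactly the squared coefficient from the residue energy, so the signal energy equals the sum of squared coefficients plus the energy of the final residue.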

C. Computational GRID
The MP analysis problem can be classified as a "bag of tasks", since it consists of a set of independent tasks with high computational cost. The computational cost of the subsequent integrative analysis of the results is negligible. We therefore used the Condor task scheduler to coordinate the submission of the series for analysis.
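The "bag of tasks" structure can be sketched as follows. This is only a local illustration of independent task enumeration and execution, using a Python thread pool as a stand-in for the Condor scheduler; it does not reproduce the GRID/UNESP submission scripts. The task granularity (one task per subject, channel and dictionary), the dictionary labels and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Labels for the four dictionaries (size, with/without chirp); naming is hypothetical.
DICTIONARIES = ("70k_nc", "70k_wc", "100k_nc", "100k_wc")


def build_tasks(subjects, channels, dictionaries=DICTIONARIES):
    """One independent task per (subject, channel, dictionary) combination."""
    return list(product(subjects, channels, dictionaries))


def run_bag(tasks, worker, max_workers=4):
    """Run every task independently; results are returned in task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda task: worker(*task), tasks))
```

Since no task depends on the output of another, the tasks can be dispatched in any order and on any number of workers, and the results only need to be gathered at the end for the inexpensive integrative analysis.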

D. Statistical and parameter analysis
Several variables are used to describe the signal in an MP approach, but they can be divided into two basic groups: parameters of the dictionary functions (amplitude, frequency, etc.) and parameters of the MP procedure itself (basically dictionary size and atom shape). Each atom, as described by Eq. (1), has amplitude, duration and central frequency, plus a term (β) measuring frequency modulation. These parameters are often called "descriptors of the signal"; however, they may vary subtly with the choice of dictionary size. Classically, the β term is not used because of its computational cost. Here we use four different dictionaries to describe the signal: 70000 (70k) and 100000 (100k) atoms, each with (wc) and without (nc) the β modulation term. This choice makes it possible to verify whether there is any difference between the dictionaries at this accuracy level.
In all cases, the MP approach requires a minimum amplitude threshold to be chosen, so that an atom is regarded as representative of events observed in the EEG. Very small atoms, with low amplitude or short duration, are part of the so-called signal background. Here we focus on sleep spindles, collecting atoms with a half-width (±σ on a Gaussian curve) duration of 0.5 s to 2 s, a central frequency ranging from 11 Hz to 16 Hz and β (chirp rate) between −2 Hz/s and 2 Hz/s. These elements are intrinsically nonstationary [1], [3], [10], [11], and many studies are attempting to address this internal variation of their frequency [1]. Moreover, these events are highly subject-dependent (individual variability), especially with regard to their amplitude. Since we have a large database, we can address the question of subject amplitude dependence and base the analysis on an amplitude threshold that varies per subject and channel. We chose to consider only the 10% of elements with the highest amplitude.
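The selection rule can be sketched as below. The numeric ranges are those stated in the text; the `Atom` record, its field names and the function names are hypothetical. The sketch also assumes that the 10% cut is applied after the duration, frequency and chirp filters, separately for each subject and channel, and that at least one atom is kept per group; the text does not specify these details.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    subject: str
    channel: str
    amplitude_uv: float
    freq_hz: float
    duration_s: float
    chirp_hz_s: float = 0.0


def in_spindle_range(atom):
    """Duration, frequency and chirp-rate ranges used for sleep spindles."""
    return (0.5 <= atom.duration_s <= 2.0
            and 11.0 <= atom.freq_hz <= 16.0
            and -2.0 <= atom.chirp_hz_s <= 2.0)


def select_spindle_atoms(atoms, top_fraction=0.10):
    """Keep the highest-amplitude fraction of in-range atoms per subject and channel."""
    groups = defaultdict(list)
    for atom in atoms:
        if in_spindle_range(atom):
            groups[(atom.subject, atom.channel)].append(atom)
    selected = []
    for group in groups.values():
        group.sort(key=lambda a: a.amplitude_uv, reverse=True)
        keep = max(1, math.ceil(top_fraction * len(group)))  # assumption: never drop a group entirely
        selected.extend(group[:keep])
    return selected
```

Unlike a fixed threshold in microvolts, this rule returns a comparable share of atoms from every subject and channel, whatever the overall amplitude of the recording.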

E. Decision tree-based classification
To determine which atom parameter is most affected by the presence of respiratory events, the set of 1491 atoms obtained using the 100k wc MP dictionary was subjected to the J48 decision tree procedure of the WEKA (Waikato Environment for Knowledge Analysis) software package, a collection of machine learning algorithms for data mining tasks [12]. This made it possible to build a decision tree classifying subjects into the three studied groups (C, M and S).
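J48 is WEKA's implementation of the C4.5 algorithm, which chooses splits using an entropy-based gain ratio. The sketch below illustrates that criterion for a single numeric attribute (for example, atom amplitude) and a binary split; it is a simplified stand-alone illustration, not WEKA's code, and it omits pruning and the other refinements of C4.5. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_threshold_split(values, labels):
    """Return (threshold, gain_ratio) of the best split `value <= threshold`.

    Returns (None, 0.0) when no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # no threshold between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        split_info = entropy(["L"] * len(left) + ["R"] * len(right))
        ratio = gain / split_info                 # gain ratio penalises unbalanced splits
        if ratio > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, ratio)
    return best
```

Applying such a search to each atom parameter and recursing on the resulting subsets yields a tree in which the parameters that best separate the groups appear closest to the root.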

III. RESULTS
Using the criteria detailed in the previous section, we can extract elements in the sleep spindle frequency range in two ways: with a fixed amplitude criterion and with a criterion depending on the subject/signal amplitude. Both are shown in Figure 1. Figure 1A shows the number of MP atoms found with a fixed 40 µV amplitude criterion, similar to that used in [3], [11] for healthy young adults. Here the variability between subjects is very evident, regardless of AHI. In Figure 1B this effect is attenuated by using an adaptive amplitude criterion (where only the 10% of elements with the highest amplitude for each subject and channel are used). As the use of a variable (individualized) threshold allows a more representative sample of atoms to be collected from each subject, this approach was used for atom collection throughout the rest of the study. Figure 2 shows the atom distribution by MP dictionary size and by presence or absence of the β (frequency modulation) term. No significant differences were found for dictionary size (70k or 100k). There is a decrease in the number of elements when using a more complete (wc) dictionary, which is expected since fewer (but more complex) atoms are needed to describe the signal.
To test how sensitive atom parameters are to differences across AHI (Apnea-Hypopnea Index) groups, we employed a decision tree approach using the J48 algorithm. The result is the decision tree shown in Figure 3. In this tree, the right-hand numbers count errors and the left-hand numbers count matches. Atom amplitude and frequency criteria, as well as subject gender, appear in the tree and are discussed in the last section.

IV. DISCUSSION AND CONCLUSIONS
The number of atoms in the MP dictionary did not affect the number of atoms found in the sleep spindle frequency and duration range. This is in agreement with the fact that spindles are among the easiest structures to separate from background EEG noise. This does not imply that visual identification of spindles by humans is an easy task when they are superimposed on slow waves. The use of a more complex dictionary yields a more complete description with fewer atoms, which is not surprising because these atoms are more representative of the signal. However, this also increases computational cost. When using a clinical EEG sample, inter-subject variability becomes clear and calls for a selection approach based on a spindle amplitude criterion adjustable per individual and channel [13].
Based on Figure 3 we can see that it is possible to separate spindles obtained from the three apnea groups with good accuracy. In the case of atoms with higher amplitudes, gender is no longer relevant for the classification, which suggests that these atoms are representative of the clinical problem, with the implication that higher-amplitude and faster spindles are easily found in the C group. Note that these atoms are the 10% highest-amplitude atoms for each subject. For smaller-amplitude atoms, gender seems to play some role in the differentiation, with frequency being the separating criterion for men and amplitude for women. This result, however, needs to be considered with caution since the female sample was relatively small and had less severe apnea. Moreover, atoms with low amplitude also tend to be less representative, possibly indicating that the main spindle generator mechanism is not active and that what is being observed in this case may be just background noise.
It should be kept in mind that spindles are considered to be markers of the integrity of thalamo-cortical circuits. Subjects with high AHI scores produced spindles with lower amplitude and frequency, and this can be best observed in male patients. Looking at the left part of the tree in Figure 3, it is also possible to see that females with low-amplitude spindles tend to corroborate this finding. In conclusion, for this sample, patients with sleep apnea produced spindles with lower amplitude and frequency. These results are consistent with theories of brain plasticity. Chronic problems, such as apnea, Parkinson's disease, epileptic encephalopathy, and others, eventually impact neural mechanisms through final common pathways that may translate into lower amplitudes for short-time EEG transients like spindles [7], [14]-[17]. This is a preliminary work in which it was possible to perform systematic MP decomposition of sleep EEG signals from a representative sample of apnea patients, thereby finding a dictionary size more appropriate for this type of study. The results suggest a possible relation between apnea correlates, such as blood oxygenation and sleep fragmentation, and spindle characteristics (represented here by MP atoms) and sleep spindle quality.