Extracting continuous sleep depth from EEG data without machine learning

The human sleep-cycle has been divided into discrete sleep stages that can be recognized in electroencephalographic (EEG) and other bio-signals by trained specialists or machine learning systems. It is however unclear whether these human-defined stages can be re-discovered with unsupervised methods of data analysis, using only a minimal amount of generic pre-processing. Based on EEG data, recorded overnight from sleeping human subjects, we investigate the degree of clustering of the sleep stages using the General Discrimination Value as a quantitative measure of class separability. Virtually no clustering is found in the raw data, even after transforming the EEG signals of each 30-s epoch from the time domain into the more informative frequency domain. However, a Principal Component Analysis (PCA) of these epoch-wise frequency spectra reveals that the sleep stages separate significantly better in the low-dimensional sub-space of certain PCA components. In particular the component C1(t) can serve as a robust, continuous ‘master variable’ that encodes the depth of sleep and therefore correlates strongly with the ‘hypnogram’, a common plot of the discrete sleep stages over time. Moreover, C1(t) shows persistent trends during extended time periods where the sleep stage is constant, suggesting that sleep may be better understood as a continuum. These intriguing properties of C1(t) are not only relevant for understanding brain dynamics during sleep, but might also be exploited in low-cost single-channel sleep tracking devices for private and clinical use.


Introduction
Sleep is an essential biological behavior (Baker, 1985; Geyer et al., 2009) and therefore highly conserved across animal evolution (Joiner, 2016). In mammals, healthy sleep involves a programmed series of characteristic changes in the activity of body and brain, which include alterations of brain wave and breathing patterns, variations of blood pressure, heart beat and body temperature, as well as modulated biochemical activity. Due to the quasi-periodic structure of these changes, they can be viewed as repetitions of a sleep cycle, subdivided into apparently distinct stages such as Wake, REM, N1, N2 and N3.
While the recognition of these sleep stages from electroencephalographic (EEG) recordings (Barlow, 1993) and other measured biosignals was reserved for trained specialists in the past, the invention of modern machine learning and data analysis tools has quickly led to systems for automatic sleep stage detection (Boostani et al., 2017;Faust et al., 2019;Chriskos et al., 2021;Krauss et al., 2021), with the promise of eventually freeing medical doctors in sleeping labs from the burden of manual sleep stage classification.
However, the typical automatic classifiers used in the field of machine learning are black boxes with huge numbers of parameters, which makes their classification decisions hard to interpret and difficult to reproduce (Castelvecchi, 2016;Marcus, 2018). Can these opaque, self-organized, hierarchical features of multi-layer machine learning models be replaced by more transparent, human-interpretable features?
In a previous paper , we have demonstrated how to use well-defined statistical operators, such as the mean, the variance, or the kurtosis, for aggregating the raw time series of EEG signals into a few human-interpretable, time-dependent statistical variables. As it turned out, the probability distributions of these variables depend to a certain degree on the sleep stage, and this dependence could be exploited for Bayesian sleep stage detection. In this 'flat' Bayesian approach, all likelihoods and prior probabilities have a simple mathematical meaning, in contrast to the learned weights and biases of a conventional deep multi-layer classifier. Beyond the practical application, our analysis of the time-dependent statistical properties of EEG signals also revealed that certain statistical variables follow continuous trends within and even across the distinct sleep stages, a finding that will be reconfirmed in the present work.
Although automatic sleep stage detection has already been proven possible using a variety of different approaches (Boostani et al., 2017; Faust et al., 2019; Chriskos et al., 2021; Krauss et al., 2021), the accuracies achieved by these systems are not as satisfactory as in other machine learning applications. Together with the fact that the agreement about sleep stages is relatively poor even among human specialists (Fiorillo et al., 2019), this raises the question whether the low accuracy merely reflects flaws of the automatic classification systems, or whether it is rooted in the data itself: Can human-defined sleep stages be considered as well-defined, 'natural kinds'? Could they be rediscovered by non-supervised data clustering methods without prior knowledge? Are sleep stages really distinct classes or strongly overlapping, fuzzy concepts? What would be the practical consequences of this fuzziness?
We have addressed some of these more theoretical questions in another recent publication (Metzner et al., 2022). We there argued that classes are in general subjective (user-based) and goal-oriented constructs that need not necessarily reflect the objective structure of the underlying data distributions. Human-defined classes can be useful for certain purposes, even when they do not correspond to well-separated clusters in data space but overlap significantly. This overlap, however, imposes an upper limit on the achievable classification accuracy (Metzner et al., 2022).
To test whether the sleep stages are natural kinds and can therefore be rediscovered by purely objective data analysis, we quantitatively determined the degree of sleep stage clustering in EEG data space, using the previously developed General Discrimination Value (GDV; Schilling et al., 2021a). Finding only an extremely small degree of clustering, even after converting the EEG time series of each epoch to the frequency domain, we next investigated whether the degree of sleep stage clustering can be increased by non-supervised dimensionality reduction with an autoencoder. In this experiment we indeed observed a progressive cluster enhancement over the increasingly narrow autoencoder layers, but quantitatively the effect remained extremely small (Metzner et al., 2022).
This work is a follow-up on our former publications regarding the analysis of human EEG signals during sleep (Krauss et al., 2018a; Traxdorf et al., 2019; Metzner et al., 2021, 2022), aiming to enhance the degree of sleep stage clustering by suitable ways of pre-processing and dimensionality reduction. As in (Metzner et al., 2022), we focus on a single EEG channel (the electrode at position F4), split the time series into 30-s epochs that are sleep-stage labeled by a human specialist, and then compute the epoch-wise frequency spectra by Fast Fourier Transformation. Improving upon (Metzner et al., 2022), we now additionally explore the effect of 'scaling' the Fourier amplitudes by taking their modulus to a power of γ. Each of the resulting epoch-specific 'spectral vectors' is considered as a point in a high-dimensional data space, and we are interested in any kind of cluster structure within this point distribution.
Learning from our past experiments, we now study in advance how the degree of clustering in a data distribution (again quantified by the GDV) depends on the dimensionality of the data space and on the relative fraction of 'separating' and 'non-separating' features/dimensions. This understanding will then inform the optimal selection of data dimensions.
Since the autoencoder architecture used in (Metzner et al., 2022) for dimensionality reduction of the spectral vectors produced relatively large reconstruction errors, we now turn to Principal Component Analysis (PCA) as an alternative method of 'unsupervised data compression'. This classical method actually has several advantages over neural-network-based autoencoders, such as the absence of any tunable model parameters, full mathematical transparency and interpretability, as well as the generation of output dimensions that are mutually uncorrelated.
We then carefully analyze which subset of PCA components is best included in the GDV-based cluster analysis, comparing different scaling exponents γ. It turns out that the best cluster separation (with a GDV of −0.211, rather than −0.047 for the uncompressed spectral vectors) is obtained by retaining only the single PCA component C1 with γ = 1/2. According to the 'Eigen-spectrum' of C1, this component basically measures the relative wave contents of the EEG signal in the frequency regimes below 3 Hz (roughly corresponding to the delta range) and above 3 Hz (covering the theta, alpha, beta and gamma range).
Interestingly, we find an unexpectedly large Pearson correlation coefficient of 0.59 between the time- (or epoch-) dependent PCA component C1(t) and the 'hypnogram', a common plot of the numerical sleep (or vigilance) label L(t) versus time (with Wake = 0, REM = −1, N1 = −2, N2 = −3 and N3 = −4). We verify this result by plotting C1(t) together with L(t) and conclude that C1(t) can be regarded as a continuous variable for 'sleep depth', as it indeed closely resembles the hypnogram. Strikingly, this sleep depth variable C1(t) rises sharply whenever the subject is switching to a more 'shallow' sleep stage, but it falls much more gradually whenever the subject switches to a 'deeper' sleep stage. As suggested already in , these results support the idea that sleep is better treated as a continuous process rather than a sequence of distinct stages.
A practical application of the sleep depth variable C1(t) is, of course, sleep stage prediction, i.e., the automatic sleep stage classification from single-channel EEG data. In contrast to other systems, ours is extremely simple and mathematically fully transparent, from the pre-processing up to the final dimensionality reduction.
We assess the performance of the method by computing the correlation between the sleep depth variable C1(t) and the hypnogram, separately for 68 independent full-night sleep recordings, obtaining Pearson correlation coefficients of up to 0.8.
Finally, we test the robustness of the method by recording the same human subject simultaneously with two very different EEG devices and then extracting the sleep depth variable C1(t) separately from both recordings. We find that the two instances of the variable match so closely that they can even be used for a post-hoc temporal synchronization of the EEG machines.
The overall architecture of the paper is as follows: In the Methods section, we provide a detailed description of the various numerical techniques used in our study, including Multi-Dimensional Scaling (MDS), the Generalized Discrimination Value (GDV) and Principal Component Analysis (PCA), as well as our methods of data pre-processing. In the Results section, we first explore the properties of the GDV, which are then used to systematically enhance sleep-stage clustering and to choose an optimal subset of PCA components. Subsequently, we investigate the eigenspectra of the dominating PCA components and their relation to the human-made hypnograms. Finally, we compare the sleep depth variables C1(t) extracted from different EEG devices. In the Discussion section, we justify our decision to use a Fourier-based data pre-processing and PCA for dimensionality reduction. In the Conclusion section, we summarize our main results and compare them with a similar study that was based on an autoencoder for dimensionality reduction.

Three-channel sleep EEG data
For the main part of this paper, we are using 68 three-channel EEG data sets from the sleep laboratory of University Hospital Erlangen, each corresponding to a full-night recording of brain signals from a different human subject. The data were recorded with a sampling rate of 256 Hz, using three separate channels F4-M1, C4-M1, O2-M1. In this work, however, we are only using the first channel (F4-M1).
The participants of the study included 46 males and 22 females, with an age range between 21 and 80 years. Exclusion criteria were a positive history of misuse of sedatives, alcohol or addictive drugs, as well as untreated sleep disorders. The study was conducted in the Department of Otorhinolaryngology, Head Neck Surgery, of the Friedrich-Alexander University Erlangen-Nürnberg (FAU), following approval by the local Ethics Committee (323-16 Bc). Written informed consent was obtained from the participants before the cardiorespiratory polysomnography (PSG).
After recording, the raw EEG data were analyzed by a sleep specialist accredited by the German Sleep Society (DGSM), who detected typical artifacts (Tatum et al., 2011) in the data and visually identified the five sleep stages (Wake, REM, N1, N2, N3) in subsequent 30-s epochs, according to the AASM criteria (Version 2.1, 2014) (Conrad, 2007; American Academy of Sleep Medicine et al., 2012).

Multi-dimensional scaling (MDS)
A frequently used method to generate low-dimensional embeddings of high-dimensional data is t-distributed stochastic neighbor embedding (t-SNE) (Van der Maaten and Hinton, 2008a). However, in t-SNE the resulting low-dimensional projections can be highly dependent on the detailed parameter settings (Martin et al., 2016), sensitive to noise, and may not preserve, but rather scramble, the global structure of the data (Vallejos, 2019; Moon et al., 2019). In contrast, Multi-Dimensional Scaling (MDS) (Torgerson, 1952; Kruskal, 1964, 1978; Cox and Cox, 2008) is an efficient embedding technique to visualize high-dimensional point clouds by projecting them onto a 2-dimensional plane. Furthermore, MDS has the decisive advantage that it is parameter-free and that all mutual distances of the points are preserved, thereby conserving both the global and local structure of the underlying data.
When interpreting patterns as points in high-dimensional space and dissimilarities between patterns as distances between corresponding points, MDS is an elegant method to visualize high-dimensional data. By color-coding each projected data point of a data set according to its label, the representation of the data can be visualized as a set of point clusters. For instance, MDS has already been applied to visualize word class distributions of different linguistic corpora (Schilling et al., 2021b), hidden layer representations (embeddings) of artificial neural networks (Schilling et al., 2021a; Krauss et al., 2021), structure and dynamics of recurrent neural networks (Krauss et al., 2019a, 2019b, 2019c), brain activity patterns assessed during pure tone or speech perception (Krauss et al., 2018b; Schilling et al., 2021b), and even brain activity during sleep (Krauss et al., 2018a; Traxdorf et al., 2019). In all these cases, the apparent compactness and mutual overlap of the point clusters permits a qualitative assessment of how well the different classes separate.
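For illustration, the parameter-free classical (Torgerson) variant of MDS can be written in a few lines of NumPy. This is a minimal sketch, not the implementation used in the study; the function name and the toy classes are illustrative.

```python
import numpy as np

def classical_mds(points, n_components=2):
    """Classical (Torgerson) MDS: distance-preserving low-dim embedding."""
    # squared Euclidean distance matrix between all point pairs
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    n = d2.shape[0]
    # double centering: B = -1/2 * J D^2 J with J = I - (1/n) 11^T
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    # the embedding follows from the top eigenvectors of B
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# two toy 'classes' of 50-dimensional points, shifted against each other
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 1, (100, 50)), rng.normal(2, 1, (100, 50))])
emb = classical_mds(pts)
print(emb.shape)  # (200, 2)
```

Because all pairwise distances enter the embedding, well-separated classes in the original space remain visibly separated in the 2-D projection.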

Generalized Discrimination Value (GDV)
We used the GDV to calculate cluster separability as published and explained in detail in (Schilling et al., 2021a). Briefly, we consider N points x_n = (x_{n,1}, …, x_{n,D}), n = 1…N, distributed within D-dimensional space. A label l_n assigns each point to one of L distinct classes C_l, l = 1…L. In order to become invariant against scaling and translation, each dimension is separately z-scored and, for later convenience, multiplied by 1/2:

$$ s_{n,d} = \frac{1}{2}\cdot\frac{x_{n,d}-\mu_d}{\sigma_d}, $$

where μ_d is the mean and σ_d the standard deviation of dimension d. Based on the re-scaled data points s_n = (s_{n,1}, …, s_{n,D}), we calculate the mean intra-class distance for each class C_l,

$$ \bar{d}(C_l) = \frac{2}{N_l(N_l-1)}\sum_{i=1}^{N_l-1}\sum_{j=i+1}^{N_l} d\!\left(s_i^{(l)}, s_j^{(l)}\right), $$

and the mean inter-class distance for each pair of classes C_l and C_m,

$$ \bar{d}(C_l, C_m) = \frac{1}{N_l N_m}\sum_{i=1}^{N_l}\sum_{j=1}^{N_m} d\!\left(s_i^{(l)}, s_j^{(m)}\right). $$

Here, N_k is the number of points in class k, and s_i^{(k)} is the i-th point of class k. The quantity d(a, b) is the Euclidean distance between a and b. Finally, the Generalized Discrimination Value (GDV) is calculated from the mean intra-class and inter-class distances as

$$ \mathrm{GDV} = \frac{1}{\sqrt{D}}\left[\frac{1}{L}\sum_{l=1}^{L}\bar{d}(C_l) \;-\; \frac{2}{L(L-1)}\sum_{l=1}^{L-1}\sum_{m=l+1}^{L}\bar{d}(C_l, C_m)\right], $$

where the factor 1/√D is introduced for dimensionality invariance of the GDV, with D as the number of dimensions.
Note that the GDV is invariant with respect to a global scaling or shifting of the data (due to the z-scoring), and also invariant with respect to a permutation of the components in the D-dimensional data vectors (because the Euclidean distance measure has this symmetry). The GDV is zero for completely overlapping, non-separated clusters, and it becomes more negative as the separation increases. A GDV of −1 already signifies a very strong separation.
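As a concrete illustration, the GDV definition above can be implemented in a few lines of NumPy. This is a minimal sketch, not the authors' reference implementation; the function name and the toy data are illustrative.

```python
import numpy as np

def gdv(points, labels):
    """Generalized Discrimination Value: ~0 for fully overlapping classes,
    increasingly negative for better-separated clusters."""
    # z-score each dimension and multiply by 1/2, as in the definition
    s = 0.5 * (points - points.mean(axis=0)) / points.std(axis=0)
    classes = np.unique(labels)
    intra, inter = [], []
    for i, cl in enumerate(classes):
        a = s[labels == cl]
        # mean intra-class distance over all point pairs within class cl
        d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
        intra.append(d.sum() / (len(a) * (len(a) - 1)))
        for cm in classes[i + 1:]:
            b = s[labels == cm]
            # mean inter-class distance between classes cl and cm
            inter.append(
                np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean())
    # GDV = (mean intra-class - mean inter-class distance) / sqrt(D)
    return (np.mean(intra) - np.mean(inter)) / np.sqrt(points.shape[1])

# two well-separated Gaussian classes give a clearly negative GDV
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0, 1, (200, 5)), rng.normal(3, 1, (200, 5))])
y = np.repeat([0, 1], 200)
g_sep = gdv(x, y)
print(round(g_sep, 2))
```

Note that averaging the inter-class list is equivalent to the factor 2/(L(L−1)) in the formula, since the list contains exactly L(L−1)/2 class pairs.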

Generic pre-processing of EEG epochs
Each recorded single-channel epoch consists of 30 × 256 = 7680 EEG signal values, and our database of 68 person-specific data sets contains a total of more than 70,000 of these labeled epochs. In the following, we denote the signal value U at time step t ∈ {1, …, 7680} within epoch e of the personal data set s ∈ {1, …, 68} by U^{(s)}_{e,t}. Within each of the 68 data sets, we first remove all epochs in which artifacts were detected by the sleep specialist. The remaining epochs are normalized by z-scoring,

$$ U^{(s)}_{e,t} \;\rightarrow\; \frac{U^{(s)}_{e,t}-\mu^{(s)}}{\sigma^{(s)}}, $$

where μ^{(s)} is the average signal value and σ^{(s)} is the standard deviation in data set s. In this way, variations of the signal amplitudes between different subjects s are suppressed, but variations between the sleep stages of each individual subject are retained.
Pooling now over all subjects s, we create a unified list of normalized signal vectors U_k(t). Each of them corresponds to one of the global epochs k ∈ {1, …, 70174}, contains time steps t ∈ {1, …, 7680}, and has a known sleep-stage label L_k.
Next, we convert the time-domain signal vectors U_k(t) to the frequency domain by Fast Fourier Transformation (FFT), yielding 3840 complex Fourier amplitudes Â_k(f) for each epoch k. Discarding the phase information, we retain only the real-valued modulus, which is taken to the power of γ (called the scaling exponent) in order to enhance cluster separation later on. This produces a total number of 70174 spectral vectors, defined as V_k(f) = |Â_k(f)|^γ, representing the scaled momentary frequency spectrum during the global epoch k. Since our EEG device strongly attenuates all frequency components above about 35 Hz, we keep only the entries below this cutoff, so that f ∈ {1, …, 1050}.
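The pre-processing chain above (per-subject z-scoring, FFT, scaling of the moduli, 35 Hz cutoff) can be sketched as follows. The constants and function names are illustrative, and synthetic noise stands in for real EEG.

```python
import numpy as np

FS = 256                 # sampling rate in Hz
EPOCH_LEN = 30 * FS      # 7680 samples per 30-s epoch
GAMMA = 0.5              # scaling exponent for the Fourier moduli
N_KEEP = 1050            # frequency bins kept (cutoff at ~35 Hz)

def spectral_vectors(epochs):
    """Scaled spectral vectors V_k(f) = |A_k(f)|**gamma, f = 1..N_KEEP."""
    # one-sided FFT; drop the DC bin (f = 0), keep bins 1..N_KEEP
    amps = np.fft.rfft(epochs, axis=1)[:, 1:N_KEEP + 1]
    return np.abs(amps) ** GAMMA

# toy stand-in for one subject's artifact-free epochs
rng = np.random.default_rng(0)
raw = rng.normal(50.0, 10.0, size=(4, EPOCH_LEN))
# per-subject z-scoring suppresses amplitude differences between subjects
z = (raw - raw.mean()) / raw.std()
V = spectral_vectors(z)
print(V.shape)  # (4, 1050)
```

With a 30-s epoch the frequency resolution is 1/30 Hz, so bin 1050 corresponds to 35 Hz, matching the cutoff in the text.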

Principal Component Analysis (PCA)
PCA (Jolliffe and Cadima, 2016) is a well-known technique for reducing the dimensionality of multi-variate datasets, based on replacing the original variables by new linear combinations, the principal components, which are mutually uncorrelated. Subsequently, the principal components can be ordered according to the fraction of data variance each of them captures, and by retaining only the first n components one obtains a useful low-dimensional approximation of the data. Since the PCA linear transformation corresponds to finding the eigenvectors of the data's covariance matrix, it is not only mathematically fully transparent and reproducible, but can also be computed efficiently using modern variants, so-called 'matrix-free methods'.
In our case, we perform a PCA on the list of spectral vectors V_k(f), of which we keep only the first three components C0, C1 and C2. By projecting all spectral vectors into this three-dimensional subspace, we obtain 70174 compressed vectors W_k(C), each containing the 'most essential information' about the momentary frequency spectrum in a given epoch.
Each of the 70174 data vectors (either the signal, spectral, or compressed vectors) can be considered as a point in a vector space. In the following, we are interested in the distribution of these points, and we analyze how well they cluster with respect to the five human-defined sleep stages.

Sleep depth variable C1(k)
It turns out that the best separation of sleep stages is obtained with the PCA component C1 and the scaling exponent γ = 0.5 (compare Results section). Since the temporal development of C1 over subsequent sleep epochs k closely resembles the hypnogram, it can be interpreted as a 'sleep depth' variable. Summing up, it is computed from the temporal EEG signal U_k(t) in epoch k in the following way: the z-scored signal is Fourier transformed, the moduli of the complex amplitudes are taken to the power γ = 1/2 and truncated at about 35 Hz, and the resulting spectral vector V_k(f) is projected onto the corresponding principal axis, yielding C1(k).
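Putting the pieces together, the whole computation of C1(k) from raw epochs can be sketched in plain NumPy, with the PCA done via the covariance eigen-decomposition. This is an illustrative reimplementation, not the original code; C1 is taken as the axis with the second-largest variance, matching the component numbering C0, C1, C2 used here.

```python
import numpy as np

def sleep_depth_c1(epochs, gamma=0.5, n_keep=1050):
    """Sketch: 30-s epochs (n_epochs x 7680) -> sleep depth variable C1(k)."""
    # 1) scaled spectral vectors V_k(f) = |FFT(U_k)|**gamma up to ~35 Hz
    v = np.abs(np.fft.rfft(epochs, axis=1)[:, 1:n_keep + 1]) ** gamma
    # 2) PCA: eigenvectors of the covariance matrix of the centered spectra
    centered = v - v.mean(axis=0)
    cov = centered.T @ centered / (len(v) - 1)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]          # sort by descending variance
    # 3) project onto the second principal axis (numbering C0, C1, C2)
    return centered @ vecs[:, order[1]]

# toy demo on synthetic epochs (z-scored white noise stands in for EEG)
rng = np.random.default_rng(0)
epochs = rng.normal(size=(20, 7680))
c1 = sleep_depth_c1(epochs)
print(c1.shape)  # (20,)
```

In practice the PCA would be fitted once on the pooled spectral vectors of many recordings (see the Results on global versus local fitting) and then applied to new epochs.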

Sleep depth C1(k) from different EEG devices
For testing the robustness of C1(k), we use a separate overnight data set from a different human subject, recorded simultaneously with two EEG devices in a sleeping lab of the Paracelsus Medical University, Nürnberg. The first ('clinical') device has 3 channels and was set to a sampling frequency of 128 Hz. The second ('research') device has 64 channels and was set to a higher sampling frequency, but its recordings were down-sampled to 128 Hz after the measurement. In the following, we use only the signals of channel F4 from both devices.
As before, the EEG signals are split into subsequent 30-s epochs k, yielding the signal vectors U_k(t). The latter are Fourier transformed, keeping only the lowest 1000 frequency components, and then scaled with the exponent γ = 1/2, yielding the spectral vectors

$$ V_k(f) = \left|\hat{A}_k(f)\right|^{1/2}. $$

By averaging these spectral vectors over all epochs k, we obtain two overall frequency spectra, V^{(cli)}(f) and V^{(res)}(f), one for each device (see Fig. 7(a)). As they are too different for a direct comparison, we compute a frequency-dependent filter function as the ratio between the clinical and research frequency spectra (see Fig. 7(b)),

$$ \Phi(f) = \frac{V^{(cli)}(f)}{V^{(res)}(f)}. $$

We now multiply each of the epoch-specific spectral vectors of the research device with this filter function,

$$ V^{(res)}_k(f) \;\rightarrow\; \Phi(f)\, V^{(res)}_k(f). $$

After this filtering, the overall (epoch-averaged) frequency spectra V^{(cli)}(f) and V^{(res)}(f) of the two devices become identical (see Fig. 7(c), where a small difference has been artificially introduced for better visibility).
Next, we fit a PCA model to the spectral vectors V^{(cli)}(f) of the clinical device. This single model is then used, for both devices, to compress the 1000-dimensional spectral vectors V_k(f) down to two-dimensional vectors W_k(C) = (C0(k), C1(k)). We further consider only the PCA component C1(k), because it can be interpreted as a variable for sleep depth (see Fig. 7(d,e)).
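The cross-device spectral matching step can be sketched as follows; the filter is simply the ratio of the epoch-averaged spectra, and all variable names are illustrative.

```python
import numpy as np

def match_devices(v_cli, v_res):
    """Rescale the research device's epoch-wise spectral vectors so that
    their epoch-averaged spectrum equals that of the clinical device."""
    # frequency-dependent filter = ratio of the epoch-averaged spectra
    filt = v_cli.mean(axis=0) / v_res.mean(axis=0)
    return v_res * filt  # broadcast the filter over all epochs

# toy spectral vectors: 50 epochs x 1000 frequency bins per device
rng = np.random.default_rng(0)
v_cli = rng.uniform(1.0, 2.0, size=(50, 1000))
v_res = rng.uniform(2.0, 5.0, size=(50, 1000))
v_res_filtered = match_devices(v_cli, v_res)
# after filtering, the epoch-averaged spectra of the two devices coincide
print(np.allclose(v_res_filtered.mean(axis=0), v_cli.mean(axis=0)))  # True
```

Only the epoch-averaged spectra are forced to agree; the epoch-to-epoch variations, which carry the sleep-stage information, are left intact.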

Properties of the Generalized Discrimination Value (GDV)
In this work, we use the GDV (Schilling et al., 2021a) as a quantitative measure for cluster separation. Specifically, we investigate to which degree the clustering of data vectors from different sleep stages can be improved by optimizing the parameters of data pre-processing and subsequent unsupervised dimensionality reduction. It is therefore important to know how the GDV depends on the dimensionality of the data space and on the degree of clustering in the individual dimensions of this space.
For this purpose, we generate an artificial ten-dimensional data set with two data classes. The ten components x_{i=0…9} of the data vectors are mutually independent (as they actually are after a PCA transformation) and normally distributed with unit variance. However, between the two classes, the mean values of the Gaussians differ by certain amounts d_i. In particular, we assume that d_i = 1 (significant separation) for the first five components, but d_i = 0 (no separation) for the remaining five. We then compute the GDV of the data set as more and more of the ten components (dimensions) are included (Fig. 1(b)). We find that the GDV is monotonically decreasing for the first five (class-separating) dimensions, indicating enhanced clustering of the data points, although it already shows a clear saturation in this example. Consequently, if several dimensions are available in which the data classes separate well, it can be beneficial to include more than one of these separating dimensions.
However, once the non-separating dimensions are subsequently included, the GDV is strongly increasing, indicating a progressive loss of the clustering that was already achieved before. As a general rule, it is therefore important to remove all non-separating dimensions from the data space, before the clustering structure of the data points is analyzed.
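This toy experiment can be reproduced with a compact two-class version of the GDV; all numbers and names below are illustrative, not the exact parameters of Fig. 1.

```python
import numpy as np

def gdv2(x, labels):
    """Compact two-class GDV: (mean intra - mean inter distance) / sqrt(D)."""
    s = 0.5 * (x - x.mean(axis=0)) / x.std(axis=0)   # z-score, factor 1/2
    a, b = s[labels == 0], s[labels == 1]
    da = np.linalg.norm(a[:, None] - a[None, :], axis=-1)
    db = np.linalg.norm(b[:, None] - b[None, :], axis=-1)
    intra = 0.5 * (da.sum() / (len(a) * (len(a) - 1))
                   + db.sum() / (len(b) * (len(b) - 1)))
    inter = np.linalg.norm(a[:, None] - b[None, :], axis=-1).mean()
    return (intra - inter) / np.sqrt(x.shape[1])

rng = np.random.default_rng(0)
n = 300
# first five dimensions separate the classes (mean shift 1), the rest do not
shift = np.array([1.0] * 5 + [0.0] * 5)
x = np.vstack([rng.normal(0, 1, (n, 10)), rng.normal(0, 1, (n, 10)) + shift])
labels = np.repeat([0, 1], n)

# GDV as more and more of the ten dimensions are included
gdvs = [gdv2(x[:, :d], labels) for d in range(1, 11)]
print([round(g, 3) for g in gdvs])
```

The printed sequence decreases (better clustering) while separating dimensions are added and rises again once the non-separating dimensions 5-9 are included, mirroring the behavior described above.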

Enhancing sleep-stage clustering
Next, we apply the GDV to quantify the degree of sleep stage clustering in 70174 30-s epochs of recorded one-channel EEG signals (for details about the data and pre-processing, see the Methods section; three example epochs are shown in Fig. 2(a)).
Each of these 'signal vectors' can be considered as a point in a 7680-dimensional space and has been labeled by a sleep specialist as belonging to one of the five sleep stages (Wake, REM, N1, N2, and N3). The distribution of these points in their data space can be visualized in only two dimensions by using Multi-Dimensional Scaling (MDS, cf. Methods). The MDS visualization of the time-domain signal vectors shows no hints of clustering (Fig. 2(d)), and this is confirmed by a Generalized Discrimination Value (GDV, cf. Methods) of only −0.005.
Next we transform the data vectors to Fourier space, keeping only the first 1050 of the real-valued frequency-dependent amplitudes (three example epochs are shown in Fig. 2(b)). The MDS visualization of the resulting spectral vectors now shows a small degree of clustering (Fig. 2(e)), corresponding to a GDV value of −0.018.
The small degree of clustering in Fourier space means that the spectral vectors differ significantly between the five sleep stages. This is confirmed by computing the average frequency spectra for each sleep stage (Fig. 2(c)). Note that the high-frequency components are progressively suppressed in the 'deeper' sleep stages. Only the spectrum of the REM phase (blue) does not follow this general ordering.
Next, we perform a PCA-based dimensionality reduction of the spectral vectors, keeping only the first five PCA components (for the MDS visualization, see Fig. 2(f)). This results in a significant enhancement of the sleep stage clustering, yielding a GDV value of −0.047.

Optimal subset of PCA components
We have shown above that, in order to enhance the degree of clustering in a multi-dimensional data space, only the class-separating dimensions should be retained. We therefore next investigate how the GDV changes when only one or two of the first three PCA components are selected. The results can be presented in the form of a symmetric matrix, where the diagonal elements correspond to a single selected component (see heat maps in Fig. 3(a-c)). In addition, we investigate the effect of changing the scaling exponent γ of the spectral vectors.
For γ = 1, the best cluster separation is achieved when only the PCA component C2 is kept (Fig. 3(a)). The GDV in this case is −0.122. For γ = 0.7, the best cluster separation is again achieved with only the PCA component C2 (Fig. 3(b)). The GDV is now −0.152. For γ = 0.5, however, the best cluster separation is achieved with only C1 (Fig. 3(c)). The GDV is then −0.211, indicating a quite significant degree of clustering. Note that values of γ smaller than 1/2 can lead to detrimental effects in the subsequent analysis.
The quantitative GDV values above can also be visually confirmed by plotting the distributions of the single best-separating PCA components in the five sleep stages (Fig. 4(a,c,e)). Note that the maxima of the peaks in the distributions are at different positions for the five sleep stages (with the exception that Wake (black) and N1 (green) are surprisingly similar), but nevertheless there is a significant overlap between all five peaks. This is the kind of 'fuzziness' mentioned in the Introduction, which eventually limits the achievable accuracy in automatic sleep stage classifiers.

Eigenspectra of the PCA components
The spectral vectors form a complex distribution of points in their data space. If this distribution, for the sake of clarity, is imagined as a spheroidal point cloud in a three-dimensional space, the center of mass of this point cloud corresponds to the mean EEG frequency spectrum V_mean(f), averaged over all recorded epochs. In this picture, the PCA finds the main axes of the spheroid (the orthogonal axes of maximum variation in the data distribution), each corresponding to one principal component i. It places into data space a new coordinate system that consists of these main axes, with the origin located at the center of the point cloud. Moving from the origin into the direction of one of the main axes i by an amount C_i means modifying the frequency spectrum away from the average in a well-defined way: the modification can be mathematically expressed by adding a perturbation to the mean spectrum, namely C_i times the 'eigenspectrum' ΔV_i(f) of PCA component i. In general, a point in data space with PCA coordinates (C_1, C_2, …) corresponds to the frequency spectrum

$$ V(f) = V_{\mathrm{mean}}(f) + \sum_i C_i\, \Delta V_i(f). $$

In Fig. 3(d,e,f), we plot the eigenspectra for the first three PCA components and for the three tested values of the scaling exponent γ. In the case of γ = 0.5, the eigenspectrum of the best-separating component C1 (orange curve in Fig. 3(f)) is negative for small frequencies between zero and about 3 Hz, and positive for larger frequencies. Consequently, the more negative the PCA component C1 becomes, the more the low-frequency brain waves dominate over the high-frequency waves in the EEG spectrum. We therefore expect C1 to become more negative in the 'deeper' sleep stages.
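To make the reconstruction formula concrete, here is a small NumPy sketch showing that the spectral vectors are exactly recovered from V_mean(f), the PCA coordinates C_i and the eigenspectra ΔV_i(f) when all components are kept. The toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy 'spectral vectors': 200 epochs x 64 frequency bins, with correlations
v = rng.normal(size=(200, 64)) @ rng.normal(size=(64, 64))

v_mean = v.mean(axis=0)
centered = v - v_mean
# PCA axes = eigenvectors of the covariance matrix, by descending variance
vals, vecs = np.linalg.eigh(centered.T @ centered / (len(v) - 1))
eigenspectra = vecs[:, np.argsort(vals)[::-1]].T   # row i = eigenspectrum dV_i(f)
coeffs = centered @ eigenspectra.T                 # PCA coordinates C_i(k)

# V_k(f) = V_mean(f) + sum_i C_i(k) * dV_i(f)
recon = v_mean + coeffs @ eigenspectra
print(np.allclose(recon, v))  # True
```

Truncating the sum after the first few components yields the low-dimensional approximation used in this study; the reconstruction is then no longer exact, but captures the dominant spectral variation.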

Correlation of PCA components and sleep labels
In order to test whether some PCA components indeed correlate with sleep depth, we first compute the mutual Pearson correlation coefficients between the components C0…C2 and the negative numerical sleep label (Wake = 0, REM = −1, N1 = −2, N2 = −3 and N3 = −4), which corresponds to a hypnogram when plotted over time (more precisely: over successive sleep epochs k). Since positive and negative correlations are equally interesting here, we only consider the modulus of the Pearson coefficients. They are again presented in the form of a symmetric matrix in Fig. 4(b,d,f).
The mutual correlations between C0, C1 and C2 are zero, as it should be in PCA. More interestingly, the correlation between the hypnogram and the three PCA components (first row of each matrix) is non-zero, and it is maximal for the same PCA component that also gives the best cluster separation. In particular, for a scaling exponent of γ = 0.5, the Pearson coefficient between C1 and the hypnogram reaches a surprisingly large value of 0.59. We therefore consider only the case γ = 0.5 for the rest of this work.

Sleep depth variable C1(t) versus hypnograms
To further test the relation between C1(t) and the hypnogram, we plot both quantities during the same time period. Fig. 5 shows three arbitrarily chosen time periods, each with a duration of 250 min. It is obvious that C1(t) (blue) resembles the hypnogram (black) quite closely, but there are also some characteristic differences: whenever the sleep label switches upwards, to a shallower sleep stage or to the Wake state, C1(t) also shows an abrupt increase. By contrast, when the label switches downwards, to a deeper sleep stage, C1(t) decreases as well, but in a very gradual way that can easily continue for more than 30 min. These results confirm that (for γ = 1/2) the PCA component C1(t) indeed encodes sleep depth in a continuous way.

Performance of C 1 (t) in individual data sets
For potential future applications of C 1 (t) in personal sleep tracking devices and other low-cost scenarios, it is important to analyze the degree of correlation between the sleep depth variable and the ground truth hypnogram for a larger group of full-night recordings. Moreover, it is unclear whether a better performance of C 1 (t) can be achieved when the PCA is fit 'globally' to all available EEG data sets, or 'locally' to each individual subject.
To answer these questions, we compute the Pearson correlation coefficient between C 1 (t) and the hypnogram separately for all 68 available EEG data sets. The results are shown as probability distributions in Fig. 6.
We find a surprisingly large fraction of Pearson correlations in the 'good' range [0.6, 0.8] and in the 'medium' range [0.4, 0.6], regardless of whether the PCA is fitted globally (top panel) or locally (bottom panel). However, using only person-specific information for the PCA fit results in a significantly larger fraction of 'bad' outcomes in the range [0, 0.4]. It is therefore more appropriate to fit the PCA model to a large pool of full-night EEG recordings before the sleep depth variable is evaluated for a new individual.

Sleep depth from different EEG devices
Since the degree of sleep stage clustering was shown in this work to depend strongly on the data pre-processing used (including parameters such as the spectral scaling exponent γ), it is likely that different EEG devices with distinct spectral responses and noise levels will also affect the clustering and might eventually lead to non-comparable time courses of the extracted sleep depth variable C 1 (t).
To address this potential problem, we measure the same human subject simultaneously with two different EEG setups. One of them is a clinical device with only three EEG channels; the other is a 64-channel 'research' device, of which we use only the channel 'F4' for our comparison. The two machines are not mutually synchronized or coupled in any way, and they run at different sampling rates.
After recording a whole night of sleep, the F4-signal of the research device is down-sampled to the sampling frequency of the clinical device (128 Hz). When computing the Fourier spectra (re-scaled with γ = 1/2), averaged over all sleep epochs, we find that the spectral responses of the two devices are quite different (Fig. 7(a)). To make the data as similar as possible, we compute a filter function in frequency space, defined as the ratio between the average spectral responses of the clinical and research device ( Fig. 7(b)). By multiplying each epoch-specific spectral vector of the research device with this filter function, the two responses become identical on average (Fig. 7(c)).
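The filter construction described above can be sketched as follows. This is a minimal version assuming the rescaled spectra are stored as NumPy arrays of shape (n_epochs, n_freqs); the function names and the small regularization constant are our own choices:

```python
import numpy as np

def spectral_filter(spectra_clinical, spectra_research, eps=1e-12):
    """Per-frequency filter function: the ratio between the epoch-averaged
    spectral responses of the clinical and the research device.
    Both inputs: arrays of shape (n_epochs, n_freqs)."""
    mean_clinical = spectra_clinical.mean(axis=0)
    mean_research = spectra_research.mean(axis=0)
    return mean_clinical / (mean_research + eps)

def apply_filter(spectra_research, filt):
    """Multiply each epoch-specific spectral vector of the research
    device with the filter function."""
    return spectra_research * filt
```

After filtering, the epoch-averaged spectra of the two devices agree by construction, while epoch-to-epoch variations are preserved.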
Next, we fit a PCA model to the spectral vectors of the clinical machine. Note that this pool of spectral vectors comes only from a single full-night recording. In future practical applications, it might be preferable to use a larger pool of recordings instead.
Based on the given PCA model, we now extract the sleep depth variable C 1 (t) from both machines (Fig. 7(d)). As expected, the time courses do not match, because the two devices are not synchronized and have actually been switched to recording mode at different times.
However, by artificially applying different time-shifts Δt (multiples of 30-s epochs) between the two C 1 (t) signals and computing the Pearson correlation coefficient for each Δt, we find that this cross-correlation has a clear global peak at 27 epochs ( Fig. 7(e)).
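The shift search can be sketched as follows. This is a minimal illustration; the brute-force loop over candidate shifts and the function name are our own, not the authors' implementation:

```python
import numpy as np

def best_epoch_shift(c1_a, c1_b, max_shift=100):
    """Find the shift dt (in 30-s epochs) of signal c1_b that maximizes
    its Pearson correlation with c1_a. Returns (best_dt, best_r)."""
    best_dt, best_r = 0, -np.inf
    for dt in range(max_shift + 1):
        n = min(len(c1_a), len(c1_b) - dt)
        if n < 2:
            break
        r = np.corrcoef(c1_a[:n], c1_b[dt:dt + n])[0, 1]
        if r > best_r:
            best_dt, best_r = dt, r
    return best_dt, best_r
```

In the measurement described in the text, this cross-correlation peaks at a shift of 27 epochs.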
Shifting the C 1 (t) time course of the research device by this optimum amount, we find a remarkably close match between the two extracted sleep variables (Fig. 7(f)). It is therefore possible to directly compare the C 1 (t) signals measured with different EEG devices by using the above technique.
Another interesting future application would be to compute C 1 (t) separately for the different electrodes of a given EEG device and then to investigate if different brain regions 'fall asleep' at slightly different times or to different degrees.

Discussion
In this work, we continue a line of investigation that focuses on the cluster structure of human EEG data during different sleep stages. We start from the assumption that if sleep stages were 'natural kinds', they should be discoverable by completely unsupervised methods of data analysis as distinct clusters in data space. However, using the General Discrimination Value as a quantitative measure of class separability, we only find a vanishingly small degree of clustering (GDV = −0.005) in the raw epoch-wise EEG signals. This is not surprising, because two signal vectors that have been shifted relative to each other by only a few time steps can have an arbitrarily large Euclidean distance, even if they belong to the same sleep stage. In general, finding any class-separating features in time-domain EEG data is extremely difficult for a machine learning system without prior information. Even if some method of unsupervised data analysis could find well-separated clusters directly in the space of time-domain data vectors, it is doubtful whether these clusters would correspond to the human-defined sleep stages.
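For illustration, a GDV-style separability measure can be implemented in a few lines. This sketch follows the general published construction (mean intra-class minus mean inter-class Euclidean distance of z-scored, halved data, normalized by the square root of the dimension); details of the original formulation may differ, so treat this as an approximation rather than the authors' exact code:

```python
import numpy as np
from itertools import combinations

def gdv(points, labels):
    """GDV-style class separability: negative values indicate clustering,
    values near zero indicate none. points: (n_samples, n_dims)."""
    X = np.asarray(points, dtype=float)
    # z-score each dimension, then scale by 1/2 (dimensionality invariance)
    X = 0.5 * (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    members = {c: X[labels == c] for c in classes}

    def mean_dist(A, B, same_class):
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        if same_class:  # exclude the zero self-distances on the diagonal
            n = len(A)
            return d.sum() / (n * (n - 1))
        return d.mean()

    intra = np.mean([mean_dist(members[c], members[c], True) for c in classes])
    inter = np.mean([mean_dist(members[a], members[b], False)
                     for a, b in combinations(classes, 2)])
    return (intra - inter) / np.sqrt(X.shape[1])
```

Two well-separated clusters yield a strongly negative value, while overlapping classes push the measure toward zero, matching the qualitative behavior described in the text.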
For this reason, we believe that a minimal amount of human-assisted pre-processing and feature selection is required for the analysis of EEG sleep data, an approach that might be called 'weakly supervised' data analysis. In the present study, Fourier-transforming the epoch-wise time-domain vectors to the frequency domain and subsequently ignoring all phase information was sufficient to improve sleep-stage clustering significantly (GDV = −0.018), and a further improvement was possible by scaling the resulting frequency spectra by a suitable exponent γ. It is, however, likely that sleep-stage clustering could be further enhanced in future studies by choosing different time windows (currently we have restricted our numerical experiments to 30-s epochs), or by replacing the Fourier transformation with a suitable wavelet transformation.
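This 'weakly supervised' pre-processing can be sketched as follows. It is a minimal version; the epoch segmentation via reshaping and the rfft-based magnitude spectrum are our assumptions about implementation details:

```python
import numpy as np

def epoch_spectra(eeg, fs, epoch_sec=30, gamma=0.5):
    """Split a single-channel EEG signal into epochs, Fourier-transform
    each epoch, discard all phase information, and rescale the magnitude
    spectrum by the exponent gamma. Returns (n_epochs, n_freqs)."""
    n = int(fs * epoch_sec)
    n_epochs = len(eeg) // n               # drop any incomplete tail epoch
    epochs = np.reshape(eeg[:n_epochs * n], (n_epochs, n))
    magnitudes = np.abs(np.fft.rfft(epochs, axis=1))  # phase ignored
    return magnitudes ** gamma
```

The corresponding frequency axis is `np.fft.rfftfreq(n, 1 / fs)`, with a resolution of 1/30 Hz for 30-s epochs.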
As we have demonstrated using the GDV and artificial data, clustering can be maximized by removing all 'non-separating' dimensions from the data vectors, that is, those dimensions with class-independent probability distributions. For this purpose, we could choose from a variety of techniques for dimensionality reduction, including linear methods such as Principal Component Analysis (PCA; Jolliffe, 2002) or Linear Discriminant Analysis (LDA; Shi and Malik, 2000), as well as non-linear ones such as Independent Component Analysis (ICA; Hyvärinen et al., 2000), t-distributed Stochastic Neighbor Embedding (t-SNE; van der Maaten and Hinton, 2008b), Isomap (Tenenbaum et al., 2000), Locally Linear Embedding (LLE; Roweis and Saul, 2000), Non-negative Matrix Factorization (NMF; Lee and Seung, 1999), Uniform Manifold Approximation and Projection (UMAP; McInnes et al., 2018), or the Autoencoder (Goodfellow et al., 2016). While the non-linear methods are generally considered more powerful, because their low-dimensional representations can preserve even the complex correlations of 'curved', high-dimensional manifolds, they all come with certain freely adjustable tuning parameters. Since the results often depend sensitively on these parameters, we do not consider those methods truly objective. Among the linear methods, LDA is not suitable for our purpose, as it needs to be trained in a supervised fashion. We therefore decided to use PCA, a well-known unsupervised method of dimensionality reduction that moreover turned out to be perfectly suited for our pre-processed data.
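A NumPy-only sketch of the chosen PCA pipeline follows: fit the model 'globally' on a pool of rescaled epoch spectra, then project a new recording onto the leading principal axes (scikit-learn's `PCA` would serve equally well; the function names are our own):

```python
import numpy as np

def fit_global_pca(pooled_spectra, n_components=3):
    """Fit a PCA (via SVD of the centered data) on pooled epoch spectra
    of shape (n_total_epochs, n_freqs). Returns the mean spectrum and
    the leading principal axes."""
    mean = pooled_spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(pooled_spectra - mean, full_matrices=False)
    return mean, vt[:n_components]

def extract_components(spectra, mean, axes):
    """Project the (centered) spectra of a recording onto the axes.
    Column 1 of the result is the sleep depth variable C1(t)."""
    return (spectra - mean) @ axes.T
```

Fitting on a large pool of recordings and reusing the model for new individuals matches the 'global' strategy that performed best in our comparison.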

Conclusion
In this work, we have achieved a substantial enhancement of sleep-stage-related data clustering, up to a GDV value of −0.211, by combining appropriate data pre-processing (epoch-wise Fourier transformation and rescaling) with PCA for dimensionality reduction, and by retaining only the best-separating dimension.
For comparison, in our former study (Metzner et al., 2022), we used an autoencoder for dimensionality reduction. The epoch-wise Fourier spectra presented to the input layer of the autoencoder had a GDV = −0.035. In the subsequent layers, the degree of clustering increased marginally to GDV = −0.037 in layer 1 and up to GDV = −0.041 in layer 2, but already in layer 3 it decreased again to GDV = −0.036. The degree of clustering in our present PCA-based approach is therefore more than five times larger than in the best-separating autoencoder layer 2. Moreover, while data compression with the PCA method always leads to identical results and guarantees perfect reconstruction in the limit of many retained components, the stochastic training of an autoencoder is less reproducible and showed considerable reconstruction errors in our former study.

(Figure caption: (a,b) For γ = 1.0, component C 2 is maximally separating, but the stages Wake, REM and N1 can hardly be distinguished. The correlation between C 2 and the hypnogram is 0.4. (c,d) For γ = 0.7, component C 2 is again separating best. Now all sleep stages, with the exception of N1 (green) and Wake (black), can be distinguished. The correlation between C 2 and the hypnogram is 0.52. (e,f) For γ = 0.5, the superior component is C 1 . Graphically, the separability of the sleep stages appears similar to (b), but the GDV is actually significantly smaller (compare Fig. 3). The correlation between C 1 and the hypnogram is 0.59. Note that, in all cases, the PCA component with the best cluster separation also correlates best with the hypnogram.)
We found that the single PCA component C 1 (t), in combination with a spectral scaling exponent of γ = 1/2, not only maximizes sleep-stage clustering, but also correlates surprisingly well (Pearson correlation of 0.59) with the hypnogram, a traditional plot of the numerical sleep stage label over time (epoch). We note that this correlation might be further enhanced by adjusting the arbitrary numerical values assigned to the five different sleep stages.
An analysis of the 'eigenspectrum' of C 1 (t) reveals that this PCA component basically measures the relative content of slow (below 3 Hz) and fast (above 3 Hz) brain waves in the momentary spectrum of the EEG signal. It is remarkable that this well-known discriminating feature emerged as being optimal for enhancing sleep-stage clustering in our weakly supervised approach.
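As a simple illustration of how this interpretation of the eigenspectrum could be quantified, the following hypothetical helper contrasts a loading vector's mean weight below and above a split frequency. The helper and its name are our own, not part of the paper's analysis; the 3 Hz split follows the text:

```python
import numpy as np

def slow_fast_contrast(loading, freqs, split_hz=3.0):
    """Difference between the mean weight of a PCA loading
    ('eigenspectrum') below and above split_hz. A large contrast means
    the component measures the slow/fast brain-wave balance.
    loading, freqs: 1-D arrays of equal length."""
    slow = loading[freqs < split_hz].mean()
    fast = loading[freqs >= split_hz].mean()
    return slow - fast
```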
When plotting the time-course of C 1 (t) together with the hypnogram, we find that the two curves are strikingly similar, so that C 1 (t) can actually be viewed as a sleep depth variable. However, the two quantities differ in their behavior at transitions from one discrete sleep stage to another: When the brain is 'switching' (according to the criteria of the human rater) to a more shallow sleep stage or to the wake stage, C 1 (t) is also abruptly rising. By contrast, switches to deeper sleep stages only mark the beginning of a continuous downward trend of C 1 (t) that can last for more than 30 min. During these downward trends, the EEG's frequency spectrum is gradually becoming dominated by slow brain waves, even though the sleep stage is rated as constant.
In any case, the gradual evolution of the brain wave spectrum within a fixed sleep stage, together with the strongly overlapping probability distributions of C 1 in the various sleep stages, strongly suggest that sleep is better understood as a continuum, rather than a succession of discrete phases. The sleep depth variable C 1 (t) offers an easy, transparent and reproducible way to track this continuous brain process based on single channel EEG data.

Funding
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): grant SCHU 1272/16-1 (project number 455908056) to HS, grant TR 1793/2-1 (project number 455908056) to MT, grant SCHI 1482/3-1 (project number 451810794).

(Fig. 5 caption: Temporal evolution of the PCA component C 1 (t) for γ = 0.5 (blue), and its relation to the human-generated hypnogram (black). Shown are three time intervals with a duration of 250 min each. Note that C 1 (t) tracks the general 'depth of sleep' very well. However, where the hypnogram jumps abruptly to one of the deeper sleep stages, C 1 (t) shows a gradual decrease.)

(Fig. 6 caption: Distribution of (absolute) Pearson correlation coefficients between the hypnogram and PCA component C 1 . Top: distribution when the PCA is fit to the aggregated spectra from all 68 data sets. Bottom: distribution when the PCA is fit to the personal spectrum only.)

Ethical approval and informed consent
The main part of the study (68 three-channel data sets) was conducted in the Department of Otorhinolaryngology, Head and Neck Surgery, of the Friedrich-Alexander University Erlangen-Nürnberg (FAU), following approval by the local Ethics Committee (323-16 Bc). Written informed consent was obtained from the participants before the cardiorespiratory polysomnography (PSG). The comparison between different EEG machines (64-channel and 3-channel device) was conducted in the Department of Otorhinolaryngology, Head and Neck Surgery, Paracelsus Medical University, Nürnberg, Germany, following approval by the local Ethics Committee (103-20 B). Written informed consent was obtained from the participants before the cardiorespiratory polysomnography (PSG).

Third party rights
All material used in this paper is the intellectual property of the authors.

(Fig. 7 caption: Comparing PCA component C 1 derived from different EEG devices. The same sleeping subject is simultaneously measured with a 3-channel 'clinical' device (blue) and a 64-channel 'research' device (orange). We focus on channel F4 in both cases. By averaging the spectral vectors over all 30-s epochs, we find that the two devices produce different frequency spectra (a). To compensate for these differences, we multiply the spectra of the research device with a suitable filter function (b). After filtering, the two devices produce the same average spectra (c). Since the two EEG devices are not synchronized and were activated at different times, the time dependence of component C 1 (t) during the first 250 epochs appears different (d). The Pearson correlation between C 1 (t) from the two devices, however, changes when a time shift is artificially introduced (e). The correlation has a clear maximum for a shift of Δt = 27 epochs. When this optimal time shift is applied, the time courses C 1 (t) from the two devices clearly match (f). The PCA component C 1 is therefore a robust measure that can be used and compared even across devices.)