Discriminating rapid eye movement sleep from wakefulness by analyzing high frequencies from single-channel EEG recordings in mice

Rapid eye movement sleep (REMS) is characterized by the appearance of fast, desynchronized rhythms in the cortical electroencephalogram (EEG), similar to wakefulness. The low electromyogram (EMG) amplitude during REMS distinguishes it from wakefulness; therefore, recording EMG signal seems to be imperative for discriminating between the two states. The present study evaluated the high frequency components of the EEG signal from mice (80–500 Hz) to support REMS detection during sleep scoring without an EMG signal and found a strong positive correlation between waking and the average power of 80–120 Hz, 120–200 Hz, 200–350 Hz and 350–500 Hz. A highly negative correlation was observed with REMS. Furthermore, our machine learning approach demonstrated that simple EEG time-series features are enough to discriminate REMS from wakefulness with sensitivity of roughly 98 percent and specificity of around 92 percent. Interestingly, assessing only the higher frequency bands (200–350 Hz as well as 350–500 Hz) gives significantly greater predictive power than assessing only the lower end of the EEG frequency spectrum. This paper proposes an approach that can detect subtle changes in REMS reliably, and future unsupervised sleep-scoring approaches could greatly benefit from it.


Results
The median percentage of wakefulness was 47 (39 to 54), whereas REMS had a value of five (4 to 6). As expected, manually scored vigilance states by a sleep expert revealed a significant difference between the duration of wakefulness and REMS (p < 0.0001, Fig. 1, Supplementary Fig. S1 and Supplementary Fig. S2). A similar proportion has been reported in other studies in mice with the same genetic background 13 .
The Pearson correlation coefficient was measured to estimate the linear relationship between the z-scoredaverage power of 80-120 Hz, 120-200 Hz, 200-350 Hz, and 350-500 Hz with the percentage of wakefulness as well as REMS for each hour ( Table 1). As is depicted in Fig. 2 (and Supplementary Fig. S3), during wakefulness, the z-scored-average power of 80-120 Hz, 120-200 Hz, 200-350 Hz and 350-500 Hz is significantly increased, while the value is decreased significantly for REMS (the power spectral density (PSD) was also evaluated, which shows the same trends, Supplementary Fig. S4). In mice, the waking is higher in the dark phase of the light-dark cycle while REMS is higher in the light phase of this nocturnal animal, which is interestingly correlated with the circadian rhythm of the z-scored-average power of 80-120 Hz, 120-200 Hz, 200-350 Hz, and 350-500 Hz (Fig. 2 a,b,c,d).
These findings led us to hypothesize that high frequencies from a single-channel EEG could be used to differentiate REMS from wakefulness; therefore, machine learning was applied to assess the possibility of distinguishing the two vigilance states. Furthermore, we sought to determine the relative contribution of each frequency band to REMS discrimination.
Classification system. Figure 3 shows the flow chart of the algorithm. Based on the features extracted from single-channel EEG recordings, NBC could distinguish REMS from wakefulness reliably and reached a mean ± SD accuracy, sensitivity, and specificity of 92.79 ± 2.75%, 98.20 ± 2.26% and 92.16 ± 3.18%, respectively. NBC was fed by features extracted from visual cortex recordings; however, it was also capable of discriminating REMS from wakefulness using the same features extracted from the sensory and motor cortex (Supplementary  Table S1); therefore, the electrode location did not affect the algorithm's performance. True positive ratio (TPR) and false positive ratio (FPR) in predicting wakefulness and REMS were estimated. As presented in Fig. 4 as a receiver operating characteristic plot (ROC plot), the algorithm reached very high prediction accuracy for wakefulness as well as REMS. The mean ± SD of TPR and FPR for wakefulness were 0.92 ± 0.02 and 0.02 ± 0.02, respectively, compared to 0.97 ± 0.02 and 0.07 ± 0.02 for REMS (Area under the curve was 0.98 ± 0.004). The confusion matrix in Fig. 5a. summarizes the performance of the classification algorithm. To distinguish REMS from wakefulness, our kernel-based NBC model achieved accuracy, sensitivity, and specificity of 92.92 ± 2.60%, 97.82 ± 2.43% and 92.50 ± 2.95%, respectively ( Fig. 5b and Table 2). To further validate the classifier, we compared it to the quadratic discriminant analysis (QDA), k-NN and SVM classifiers. While the kernel-based NBC model was superior to other models (Supplementary Table S2), three other models also predicted REMS very well, indicating that features are highly predictive in REMS discrimination.
The algorithm was then fed with the average power of different frequency bands one by one and assessed the false negative ratio (FNR) of the REMS detection. Hence, it will reveal the frequency band(s) with a higher prediction value. As the proportion of REMS versus wakefulness is imbalanced (Fig. 1), assessing the FNR would indicate the cases where the algorithm neglected REMS and falsely chose wakefulness. As demonstrated in Fig. 6, if the algorithm is fed with only the average power of 0.1 to 4 Hz, the FNR will increase to nearly 100% (Supplementary Table S3) which means all epochs were considered wakefulness. This is also true for the average power of 4-8 Hz, 8 Table S3 and Supplementary Table S4, Supplementary Fig. S4).  (15), while the value is 5 (2) for REMS according to manual scoring, showing significant difference (n = 9, p < 0.0001, according to Mann-Whitney unpaired test). www.nature.com/scientificreports/

Discussion
It is generally accepted that sleep-wake characteristics are found within the lower frequency end of the EEG spectrum 15 . However, we proposed the potential of higher frequency bands (above 80 Hz) in the sole EEG signal for REMS detection and found a strong positive correlation between waking and the average power of 80-120 Hz, 120-   www.nature.com/scientificreports/ assessing only the higher frequency bands gives significantly greater predictive power than assessing only the lower end of the EEG frequency spectrum. To further validate our findings, we also assessed the accuracy, sensitivity and specificity of REMS discrimination with different classification models and electrode locations. Finding high prediction values supported the robustness of the features in REMS discrimination from wakefulness. REM sleep is characterized by low EMG amplitude due to atonia in axial muscles, which is the main feature to distinguish REMS from wakefulness 4 ; therefore, recording EMG signals seems to be imperative for discriminating between the two states. However, we reached high sensitivity and specificity in distinguishing REMS from wakefulness based on the features extracted from single-channel EEG, which in the context of the 3Rs    www.nature.com/scientificreports/ (Replacement, Reduction, and Refinement) would mean considerable refinement. Eliminating EMG results in less pain for the animal and a lower risk of infection due to a simpler EEG montage. Furthermore, several research groups (particularly those in the field of epilepsy) already have extensive EEG databases, and with our EEG-only approach to REMS detection, they would all be able to utilize and assess the database from the point of change in sleep behavior, resulting in a reduction in the number of animals used for future experiments. To our knowledge, there is no other study available using single or multichannel EEG, with or without EMG, to examine the possibility of scoring REMS using higher frequency bands. At present, sleep scoring tends to focus on the lower end of the EEG frequency spectrum. However, two studies assessed an extended range (above 200 Hz) of EEG activity across different vigilance states. Moffett et al., measured EEG activity in healthy, spontaneously behaving rats in the range of 200 to 2000 Hz 14 . The authors defined two new high-frequency EEG bands: Psi (ψ) centered between 260-280 Hz and omega (ω) with a broad peak in the 400-500 Hz range. In accordance with our study, the authors also found the values of the power spectral densities sum in ψ as well as ω band increased during periods of wakefulness. Another study by González et al. analyzed broad-band epidural EEG of rats using Permutation Entropy (PeEn) analysis up to 512 Hz 19 , and found that PeEn was maximal during wakefulness and decreased during sleep (REMS and NREMS). Similar to our findings, the authors reported the largest difference between wakefulness and REMS at the higher range of frequencies (above 200 Hz), which may support the utility of high frequency components of EEG in distinguishing between wakefulness and REMS.
Although high-frequency oscillations (ripples defined between 80-250 Hz and fast ripples between 250-500 Hz) have been extensively studied in epileptic rodents 20,21 , their physiological role is not well understood. Jones et al. used simultaneous in vivo intracellular and epipial field-potential recording to investigate the cellular correlates of fast oscillations in somatosensory cortex of healthy rats, evoked by vibrissa stimulation, and observed "fast oscillations" (200-400 Hz) and "very fast oscillations" (400-600 Hz) in the barrel cortex 22 . Additionally, EEG activity in the range of 200 Hz or higher has been observed in healthy rats following high-frequency stimulation of the thalamus 23 . Rodents' neocortex shows oscillations of 200-600 Hz (but not the hippocampus and parahippocampus) 24 , suggesting rapid integration of tactile information 25 . It is suggested that these oscillations may represent very fast interactions of local circuits established by cortical pyramidal cells during sleep and wakefulness 26 . We also observed a significant increase in the average power of higher frequencies during wakefulness, when animals are exposed to more tactile stimuli, which aligns with the proposed function of higher frequencies in integrating tactile information. However, additional research is necessary to fully elucidate the functional role of high-frequency oscillations in relation to vigilance states.
We assessed epidural EEG from motor, sensory and visual cortex and found no difference in the accuracy, sensitivity and specificity of the classifier. The findings indicated no significant differences in these parameters based on the location of the electrode, suggesting that the micro-structure motif and the dynamical behavior of the cortical signal were independent of the origin of the signal. Our results were consistent with a previous study by González et al., which also found no differences in PeEn analysis of vigilance states based on electrode location 19 . Interestingly, when comparing amplitude entropy values during REMS and wakefulness, the maximum differences were reported in the visual cortex (right and left hemisphere) compared to the motor and sensory cortex (right and left hemisphere). This finding is compatible with our slightly better results when the algorithm was fed by the visual cortex compared to the motor and sensory cortex. Our findings were also consistent with another study by Mondino et al., which assessed the spectral power and coherence of epidural EEG during different vigilance states in various cortical regions, including the olfactory bulb (OB), motor, sensory, and visual cortex (up to 200 Hz) 27 . The authors found differences between wakefulness and REMS in the higher frequency range more prominent than the lower frequency range of the EEG in all recording sites. Moreover, the authors reported a narrow peak in high-frequency oscillation at approximately 130 Hz that could only be observed during REMS in the OB and sensory cortices. This finding might explain our slightly lower accuracy and specificity in the sensory cortex compared to the motor and visual cortex. Overall, our results suggest that changes in high frequency oscillations during REMS and wakefulness are not significantly influenced by the recording site. However, further investigations are needed to fully understand this phenomenon.
One of the main distinctions between wakefulness and sleep is the absence of muscle tone and movement during sleep, particularly in REMS 4 . Muscle activity generates high amplitude, broad-spectrum electrical signals that overlap with the spectral range of neural high-frequency oscillatory activity, and several studies (reviewed here 28 ) suggest that muscular activity could significantly contribute to surface EEG recordings in the highfrequency bands. To decrease contamination with the EMG artifact, skin was opened, holes were drilled and electrodes were inserted on the dura (see "Methods"). It was reported that the epidural recording is less vulnerable to EMG artifact [29][30][31] and can significantly increase the ratio of ongoing brain activity to artifact amplitude in comparison of the simultaneously non-invasive EEG 32 . Additionally, a study by Hansen et al. 33 found no significant differences in the trends of high-frequency oscillation (up to 200 Hz) changes between epidural recording and local field potential (LFP) recording. Since LFP is extracted from a comparatively small area of neuronal tissue located directly at the recording site, where the neuronal signal is generated, it is highly resistant to external muscle activity. This suggests that the signal-to-noise ratio of epidural recording is dependable even in high frequencies. Although we removed epochs with movement artifacts in our analysis, it is still possible that muscle activity contamination (mainly from muscle tone) reached the epidural electrodes through volume conduction, which could contribute to the recorded signal. Furthermore, we did not distinguish between immobile and mobile wakefulness, which have different EMG signature, adding to the complexity of the issue. More research is required to quantify the influence of muscle artifact during both active and inactive wakefulness, as well as sleep, on epidural recordings at high frequencies.
The application of machine learning to imbalanced databases is subject to difficulties, which is the main limitation of this study. The REMS periods are usually short and only around 5% of the entire day, which in the context of machine learning, can make the detection challenging. In this scenario, the classifier may have www.nature.com/scientificreports/ overclassified the majority class (wakefulness) and ignored the infrequent one (REMS), yet still reported a high level of accuracy (see "Methods"). To tackle the issue, instead of oversampling of REMS epochs, we assessed the raw values and reported the sensitivity and specificity of REMS discrimination, which are reliable representations of the classifier's performance. However, further research applying other techniques, in particular, the attention-based model 7 can focus on preventing class-imbalance problems on a highly class-imbalanced dataset.
In the present study, we did not seek to present the best-performing REMS detection algorithm, but to introduce and evaluate a novel EEG-based feature in sleep scoring, which would improve REMS detection. Our approach is capable of detecting subtle REMS alternations reliably in fields where REMS detection is crucial (for instance, in psychiatric 34 or degenerative diseases 35 ). Future algorithms, in particular unsupervised techniques, could benefit greatly from this approach.

Methods
Animals. Nine adult (12-18 weeks, BW: 24-27 g) male wild-type mice (C57BL/6N, Charles River Laboratories GmbH, Germany) were used for this study. The mice were housed individually inside a custom-made, sound-attenuated recording box Faraday Chamber (bench-top Faraday cage, TMC, Peabody, MA, USA) under a 12/12-h light/dark cycle (lights ON: 9 am/lights OFF: 9 pm) and a room temperature of 22 ± 1 °C with a humidity of 50 ± 5% (food and water ad libitum). All experimental procedures were approved by the Committee of Animal Health and Care of the State of Upper Bavaria, Germany (ROB-55.2-2532.Vet_02-19-121) and conducted in accordance with relevant guidelines and regulations. Reporting in the manuscript follows the recommendations in the ARRIVE guidelines.
Surgical procedure. The mice were anesthetized with 1.9-2.2% isoflurane (airflow rate 190-200 ml/min; Univentor 410 Anesthesia Unit, AgnTho's, Lidingö, Sweden) and were fixed in a stereotactic frame. The body temperature was maintained at 37 °C using a feedback-controlled heating pad (CMA 450, Harvard Apparatus, USA). For analgesia, Carprofen (0.004 mg/g of body weight) was subcutaneously injected in the scalp. The head was shaved, the scalp was opened medially and the periosteum was removed. Seven holes were drilled into the skull with a dental precision drill (Typ 4811, KaVo, Biberach, Germany). A PCP-Socket (PRECI-DIP SA Series 861, Switzerland) containing all electrodes was secured laterally using cyanoacrylate (UHU GmbH & Co. KG, Baden, Germany) and dental cement (Paladur, Heraeus-Kulzer, Hanau, Germany). Self-tapping jeweler's screws (Ø 1.2 × 2 mm; Paul Korth GmbH, Lüdenscheid, Germany) were inserted into the two medial holes to stabilize the implant. Electrodes were made of 24 K gold (diameter 150 µm; Haefner & Krullmann GmbH, Germany). The tip of the electrode is ball-shaped to avoid injury as well as to improve surface contact. Three epidural EEG electrodes were implanted unilaterally on the left hemisphere (coordinates in mm from the bregma): primary visual cortex (V1, posterior, − 2.80, lateral, − 2.41), the primary sensory cortex (S1, anterior 1.94, lateral − 2.78), and the primary motor cortex (M1, anterior 1.10, lateral − 1.66). The grounding (reference) electrode was implanted most lateral to S1. A monopolar EMG electrode was inserted into the left neck muscle (Supplementary Fig. S6). For further information regarding the surgical procedure and electrode design, please refer to Fulda et al. 36 and Fritz et al. 37 . The animals received analgesic treatment (Carprofen, 0.067 mg/ml) through drinking water from the pre-surgical day throughout post-surgical day 4. Data acquisition. After 10 days of recovery, the PCP-socket mounted on the animal's head was connected to a pre-amplifier (amplification factor: 1x, custom made, NPI electronics, Tamm, Germany) and a commutator (SL-10 slip-ring commutator, Dragonfly Research & Development, Ridgeley, WV, USA) through a flexible recording cable. The commutator was mounted on a swivel system (custom made, M. Streicher, Innsbruck, Austria) that neutralizes weight and allows free movement of the animal in all three dimensions. After 4 days of adaptation to the cable attachment, 23 h of chronic EEG/EMG recordings were acquired. All recordings began at 09:00 and ended at 08:00 the following morning. Each recording channel was individually amplified (DPA-2FL, NPI electronics, Tamm, Germany) with a gain of 1000 × and filtered with a high-pass filter at 0.1 Hz, low-pass filter at 1000 Hz, and a 50 Hz hardware notch filter. The EEG and EMG signals were digitized with an analogdigital converter (Power1401-3A, Cambridge Electronic Design Limited, Cambridge, England) at a sampling frequency of 1000 Hz. All data were recorded with Spike2 Software (Version 7, CED, Cambridge, Great Britain, https:// ced. co. uk/ produ cts/ spkov in) and stored offline for further data analyses.
Manual vigilance state classification. 23 h of EEG/EMG recordings from each animal were divided into a light phase (first 12 h) and a dark phase (following 11 h). A semi-automated sleep scoring software 5,6,18 was used to annotate vigilance states (Wake, NREMS, REMS) to every epoch (4 s long non-overlapping EEG episodes). The scoring software relies on manual thresholds to determine the sleep stages based on EMG-root mean squared (EMG RMS), delta and theta activities. NREMS is identified by low EMG RMS and high delta activity. A below threshold EMG RMS accompanied by below threshold delta activity and above threshold theta activity in the EEG signal was referred to REMS. Wakefulness was classified by high EMG RMS and low delta activity. The semi-automated sleep scores were manually reviewed and rescored considering a vigilant state that lasted at least 3 epochs as a behaviorally relevant state. Vigilance states lasting for less than 3 epochs were regarded as micro-states and were not considered as a vigilance state change. During the step for the manual confirmation, special care was taken to correctly classify epochs of quiet wakefulness based on the EEG features. Manually scored stages were based on the raw multichannel EEG and EMG; however, the EMG signal was dismissed for the further steps of the study and only the EEG was used. Artifact free EEG episodes were included in the study (the exclusion criteria were noisy EEG signals, e.g., more than 5% movement artifacts). www.nature.com/scientificreports/ Feature extraction. EEG data was segmented into consecutive 4-s, non-overlapping epochs that correspond to manually scored epochs. As sharp waveforms with asymmetric rising and falling phases can confound the spectral analysis 38 the movement artifacts were removed before the FFT calculation. This was accomplished by manually defining a threshold and removing the peak of the artifacts, plus 50 ms before and after it. Then, features were extracted from the raw signal for all epochs. Features consist of the average power in the frequency range of 0. Despite we recorded three EEG channels from the sensory cortex, motor cortex, and visual cortex, we analyzed each channel separately; therefore, the resulting feature vectors consisted of 9 elements (from 9 mentioned frequency bands). However, the study focused on less well-studied frequency bands above 80 Hz. The percentage of the mean wakefulness and REMS for each one-hour period over 23 h was calculated. In order to make sure the results were valid across animals, the standard score (z-score) of each feature was calculated.
Classification of the features. To distinguish REMS from wakefulness, kernel-based naïve Bayes classification (NBC) was employed. NBC is based on the hypothesis of conditionally independent predictors of the given classes. The dataset was randomly split into a training dataset containing 75% of the data points, which was used to train the classifiers based on the manual scoring, and a test dataset containing the remaining 25% of data points that were used to test the accuracy of the algorithms. In order to increase the reliability of the results, the dataset was randomly shuffled 1000 times and the algorithms were trained and tested on the respective datasets. The final classification result is reported based on the average of the 1000 times, however, each individual result was not qualitatively and quantitatively different from the final average. The figures are based on the analysis of features collected from the EEG of the visual cortex; however, the features from the sensory as well as motor cortex were also calculated and reported in the supplementary material. To determine if the features are sufficiently strong to produce similar predictive values in different classifiers, in addition to NBC, QDA, k-NN and SVM were assessed.
Evaluation measures. The REMS and wakefulness scored by the expert occur with widely differing frequencies ( Fig. 1), which can pose non-trivial challenges when creating data-driven systems due to the tendency of systems to over-classify majority class (wakefulness) and misclassify infrequent class (REMS) 39 . In this case, accuracy may not reflect actual performance, especially for a class with low frequency (i.e., REMS). Therefore, the results of the classification were reported by three indicators of accuracy, sensitivity, and specificity in REMS detection, which are defined as follows: In the equations above, true positives (TP) is the number of epochs correctly scored as REMS, false positives (FP) is the number of epochs incorrectly scored as REMS, true negatives (TN) is the number of epochs correctly rejected as REMS, and false negatives (FN) is the number of epochs incorrectly rejected as REMS. For each time bin of 4 s, measurements were individually computed and the outcomes were presented as mean ± SD.
A ROC plot demonstrates our binary classifier system's diagnostic ability for individual recordings based on the TPR and FPR. To summarize the prediction performance of all 9 recordings, we used confusion matrix. Each row of the matrix represents the values in an actual class while each column represents the instances in a predicted class. Statistical analysis. Statistical analysis was performed using GraphPad Prism version 9.1 for Windows (GraphPad Software, USA, https:// www. graph pad. com/). The Mann-Whitney U test was used to compare two datasets (wakefulness and REMS) that were not normally distributed. In order to assess the correlation between the average power of each frequency band and the percentage of wakefulness and REMS, despite applying a z-score to normalize data, normalization was further tested through the Shapiro-Wilk test. Afterward, the Pearson correlation coefficient was assessed. In order to determine which frequency band plays a pivotal role, a onesample Wilcoxon signed-rank test to compare FNR in REMS detection for the power of each frequency band was applied. The hypothetical median was set to 100. Type I error was set at 0.05 for all tests.

Data availability
The scored EEG as well as EMG data, which were implemented in this work can be made available upon request by contacting the first author (Dr. Sadegh Rahimi, Medical University of Innsbruck, Austria, sadegh. www.nature.com/scientificreports/