Feasibility of automated early postnatal sleep staging in extremely and very preterm neonates using dual-channel EEG

(cid:1) Dual-channel quantitative EEG metrics are useful to distinguish between sleep-wake states of preterm neonates in the ﬁrst postnatal hours. (cid:1) Combining different qEEG metrics helps develop a sleep-wake state classiﬁer robust against rapid maturational changes in the preterm period. (cid:1) The complexity features showed the best sleep staging capability and greatest robustness among several quantitative EEG metrics.


Introduction
Preterm birth, especially very early prematurity (<30 weeks of gestational age, wks GA), exposes the newborn brain to an extrauterine environment at a time when critical developmental processes emerge (Humberg et al., 2020;Pierrat et al., 2021;Tau and Peterson, 2010;Volpe, 2019). Evidence has shown that neonatal sleep-wake organization has long-term impacts on later life  (Anders et al., 1985;Bennet et al., 2018). For establishing a healthy sleep pattern, it is important to personalize care plans according to the infant's sleep/wake cycle, and to integrate caregiving activities that promote sleep into the routine clinical care in the neonatal intensive care unit (NICU) (Als and McAnulty, 2011;Altimier and Phillips, 2016). Hence, an appropriate sleep assessment tool is urgently needed to help caregivers to identify the infant's sleepwake states in the NICU.
There are primarily three distinct sleep-wake states during the preterm period: active sleep (AS), quiet sleep (QS), and wakefulness (W) (Brazelton and Nugent, 1995;Georgoulas et al., 2021;Prechtl, 1974). Between the three states, transitional periods or undifferentiated states, such as drowsiness and intermediate sleep (IS), are also commonly seen in premature infants. Traditionally, the assessment of preterm sleep-wake states in the NICU is performed by sleep specialists based on direct behavioral observation and/or visual evaluation of multiple-parametric polysomnography (PSG) recordings, among which electroencephalography (EEG) signals play a pivotal role (Grigg-Damberger et al., 2007;Werth et al., 2017). However, the visual sleep scoring techniques suffer from several drawbacks, such as the tedious nature of manual labeling work, proneness to subjective bias, prohibitive cost, and extensive training and experience requirements. It is, therefore, necessary to develop a new approach for automated sleep staging in preterm infants admitted to the NICU.
Quantitative EEG (qEEG) analysis has exhibited promising potential for developing a fully automated neonatal sleep staging system (Dereymaeker, Pillay, Vervisch, De Vos, et al., 2017). Using mathematical and statistical algorithms, the qEEG analysis extracts numerical parameters from raw EEG signals (Tong and Thankor, 2009). The qEEG parameters can be utilized not only to depict key clinical features of preterm EEG, such as discontinuity, but also to uncover hidden attributes from EEG signals, such as spectral properties (Hrachovy and Mizrahi, 2015;Watanabe et al., 1999). Compared to complicated black-box models, a qEEG-based automated sleep classification model is explainable and can assist clinicians in making informed decisions in the NICU (Petch et al., 2021).
So far, however, the role of qEEG in sleep staging is less wellstudied in premature infants born before 30 wks GA (i.e., extremely-to-very preterm infants) despite the improved survival rates of this population. Several recent studies have attempted to, at least partially, cover this age group (De Wel et al., 2017;Dereymaeker, Pillay, Vervisch, Van Huffel, et al., 2017;Koolen et al., 2017). Still, these findings might have been affected by using non-validated visual sleep labeling methods for this population . Moreover, these studies focused only on QS and non-QS state classification regardless of the important role of AS in early brain development (Del Rio-Bermudez and Blumberg, 2018;Knoop et al., 2021). Furthermore, extremely and very preterm infants are particularly susceptible to brain impairment during the first postnatal days (Benders et al., 2015;Rice and Jr, 2000). Given the different roles of alternating sleeping and waking states in early brain development (Knoop et al., 2021), it is necessary to start providing appropriate caregiving activities contingent on the infant's state as early as possible. However, it remains unknown whether the qEEG measures would still be a valuable tool for determining different sleep-wake states in newly born preterm infants. Existing preterm sleep-qEEG research has focused on multichannel EEG montages that are difficult to apply to preterm newborns due to their tiny heads and vulnerable skin (De Wel et al., 2017;Dereymaeker, Pillay, Vervisch, Van Huffel, et al., 2017;Koolen et al., 2017;Paul et al., 2003). In contrast, dual-channel amplitude-integrated EEG (aEEG) with access to raw EEG traces is a simplified technique readily available in many NICUs world-wide for bedside monitoring of preterm cerebral function (Tao and Mathur, 2010). Due to its less invasive nature, the (a)EEG monitoring can be initiated shortly after birth, providing a unique window of opportunity to identify the preterm newborn's earliest sleep-wake states.
In this light, the current study targets the least studied preterm age group (<30 wks GA) and aims to evaluate the ability of qEEG characteristics in distinguishing between sleep-wake states during their first days after birth. Sleep annotation was performed by using a recently published behavioral sleep scoring system from our group, which is, to our knowledge, the first system validated for extremely and very preterm infants . To provide easily interpretable results, we extracted several representative qEEG measures, including bursting, synchrony, spectral power, and complexity, from dual-channel EEG signals and provided a statistical description of how each separate qEEG feature is distributed across the states of AS, QS, and W. We also aim to identify which specific qEEG parameters are more powerful in discriminating between sleep-wake states.

Patients
The current study enrolled 17 infants born extremely or very preterm (10 males; mean wks GA = 27.69, standard deviation (SD) = 1.16, range 25.14 -29.43 wks GA) and admitted to the NICU of the Wilhelmina Children's Hospital (Utrecht, The Netherlands). All enrolled infants received continuous dual-channel EEG monitoring combined with three-hour visual sleep observation during their first three days after birth. Exclusion criteria were: congenital malformations, seizures, overt brain injury on cerebral ultrasound (e.g., intraventricular hemorrhage grade III-IV), and mother's use of recreational drugs during pregnancy. The study protocol (No. 21-066-C) was reviewed by the Medical Research Ethics Committee (MREC / METC) of the University Medical Center Utrecht, who confirmed that the Medical Research Involving Human Subjects Act (WMO) does not apply to this study. Written informed consent was obtained from parents before enrollment. All data were anonymized prior to analysis.

Visual sleep observation
For each patient, three consecutive hours of bedside sleep observation were carried out between 9 am and 7 pm at the NICU within the first three days after birth. Sleep-wake states were manually annotated by two independent observers (AB and EG) using an in-house behavioral sleep scoring system developed and validated for extremely and very preterm infants, called BeSSPI (de . The inter-rater reliability was assessed by Fleiss' Kappa, yielding a j value of 0.79. Four behavioral states, AS, IS, QS and W, were assigned to discrete one-minute epochs based on behavioral measurements, including eyes, vocalizations, facial and body movements, and vital physiologic parameters, including heart and respiratory rate. The vital signs were measured using a mobile bedside patient monitor (IntelliVue MP70, Philips Healthcare, Best, The Netherlands). A smoothing procedure was applied to the observational sleep annotations to increase scoring accuracy by accounting for temporal context information. Details of the sleep observation method can be found in de . The sleep observation could be interrupted by several caretaking events and these interruption epochs were treated as missing values. In the BeSSPI, the state of IS is defined as a temporary transitional period either between sleep states (i.e., AS and QS), or between sleep and wake (e.g., AS and W). As a heterogeneous state, it blends characteristics of different behavioral states together and lacks typical EEG features (Tsuchida et al., 2013). To avoid bias, the IS states were thus excluded from subsequent statistical analysis.

Dual-channel (a)EEG monitoring
Dual-channel (a)EEG monitoring during sleep observation was performed using the BrainZ monitor (Natus Medical Inc., Seattle, WA). Raw EEG signals were recorded at a sampling rate of 256 Hz. Four needle electrodes were placed subcutaneously over the frontal and parietal lobes (F3, F4, P3, P4) according to the International 10-20 system (Jasper, 1958;Shellhaas et al., 2011). In addition, a center electrode was used as reference. A dualchannel bipolar montage derived from the four active electrodes (left: F3-P3, right: F4-P4) was used for subsequent quantitative EEG analysis.

Quantitative EEG metrics
Four types of qEEG metrics were extracted from raw EEG data: 1) bursting, 2) synchrony, 3) spectral power, and 4) complexity. These metrics were selected because of their great potential to capture neonatal sleep cycling and their close associations with neonatal brain functional development (De Wel et al., 2021;. Moreover, different types of qEEG features can provide complementary information and combining them might help improve automated sleep classification performance. Intermittent bursting activity, also known as spontaneous activity transients (SATs), is a dominant EEG feature in early stages of preterm infancy (Vanhatalo and Kaila, 2006). We detected SATs using an automated detection algorithm as presented in Palmu et al. (2010). Three indices of bursting features were calculated: 1) number of SATs per minute (SAT rate), 2) interval duration of the inter-SAT interval (ISI), and 3) percentage of inter-SAT duration per minute (ISP). These features measure the degree of discontinuity of EEG activity. Lower SAT rate, higher ISP, and longer ISI are correlated with a larger amount of discontinuity in EEG tracing (Louis et al., 2016).
The degree of interhemispheric synchrony is estimated by activation synchrony index (ASI), which quantifies the temporal coincidence of bursting activity in left and right hemispheres (Koolen et al., 2014;Räsänen et al., 2013). A higher ASI value indicates profusion of coincidences, while a lack of coincidences results in a lower ASI value.
Spectral power analysis was used to quantify spectral characteristics of EEG. Absolute and relative spectral power of four different frequency bands were computed: delta (d): 0.5-3 Hz, theta (h): 3-8 Hz, alpha (a): 8-15 Hz, beta (b): 15-30 Hz (O'Toole et al., 2016;Tokariev et al., 2012). Absolute power describes the average power of a specific frequency band and relative power is expressed as the ratio of the absolute power to total power of all four bands.
Multiscale entropy (MSE), as a complementary measure of spectral power, characterizes the dynamic complexity of EEG signals by quantifying sample entropy over multiple temporal scales (Costa et al., 2002(Costa et al., , 2005. The range of the time scale factor s is set as 1-20. The MSE curve was generated by plotting the sample entropy as a function of the scale factor. Four features were extracted from the MSE curve: 1) the area under the curve, 2) the maximum value of the curve, 3) the average slope of fine scales (s: 1-5), and 4) the average slope of coarse scales (s: 6-20). The complexity index was defined as the area under the MSE curve. An increased complexity index suggests more unpredictable dynamics of the EEG time sequence, while a reduced value repre-sents more repetitive EEG patterns (Courtiol et al., 2016;De Wel et al., 2017;Kosciessa et al., 2020).
Bursting analyses were performed using an in-house developed software SignalBase (version 10.6.4.0; University Medical Center Utrecht, Utrecht, The Netherlands). The rest of the qEEG analyses were conducted with in-house developed Matlab scripts (Math-Works Inc., Natick, MA, USA). To improve data quality, artifact correction and band-pass filtering (0.5-10 Hz for bursting features, 0.5-30 Hz for the other features) procedures were performed before feature extraction. The qEEG features were extracted from a short-duration window of 60 seconds and summarized across channels using the median value.
2.5. Statistical analysis 2.5.1. Quantitative EEG feature distribution across sleep-wake states For each qEEG feature a generalized linear mixed model (GLMM) with Wald Chi-square test was employed to detect statistically significant differences among different sleep-wake states. Because each patient occupies multiple epochs of sleep observation, the dependencies in the data were taken into account. Specifically, the qEEG feature was treated as dependent variable, sleepwake state as a fixed effect, and patient identifier as a random effect in each GLMM. Considering that some of the features were not normally distributed, the distribution information of each qEEG feature was considered in the GLMM analysis. We used the Akaike information criterion (AIC) to choose an appropriate distribution to best fit each feature from normal, gamma, and lognormal distributions. Post-hoc multiple comparison procedures were conducted to identify differences of the qEEG features between each pair of the sleep states. The P-values were obtained by T-tests or Z-tests, as applicable.
Given the week-by-week developmental changes occurring during the newborn period Stevenson et al., 2020), we further examined if differences exist between extremely preterm (EP, < 28 wks GA) and very preterm (VP, 28-30 wks GA) subgroups. The interaction effect of age Â sleep-wake state was also examined by using GLMM analysis with post-hoc testing, in which age was added as a fixed-effect factor. For all statistical analyses, the significance threshold was set at P < 0.05. Holm's procedure was used to adjust for multiple comparisons in post-hoc analyses. The GLMM analyses were performed using R (version 4.0.5) within R studio (version 1.4.1106).

Automated sleep-wake state classification
Furthermore, we evaluated the role of the qEEG features in automated sleep-wake state classification. A multinomial logistic regression (MLR) model was built to automatically classify sleepwake states into AS, QS, or W, using all of the qEEG features as predictors. The classification performance was measured by metrics of accuracy and macro-averaged area under the receiver operating characteristic curve (AUC of ROC) through a stratified 10-fold cross-validation procedure repeated 20 times. The statistical significance of each performance metric was assessed with a permutation test by shuffling the class labels 1000 times. To further test the effect of age, we also modeled a MLR classifier for EP and VP infants, respectively. The MLR classification analysis was implemented by using scikit-learn (version 0.23.2) within Python 3.8.5.
As the focus of the current study was not on designing a bedside or online implementation architecture, the whole analysis procedure (including all quantitative EEG analyses and statistical analysis) was executed offline on a personal computer with Intel Xeon W-2133 (3.60 GHz) processors.

Demographics and sleep characteristics
Demographic, behavioral, and clinical characteristics of all patients are detailed in Table 1 and Supplementary Table S1. A total of 2803 minutes of annotated sleep-wake data were collected, of which 1435 minutes were labeled as AS (51.2 %), 311 minutes as IS (11.1 %), 961 minutes as QS (34.3 %), and 96 minutes as W (3.4 %).

Quantitative EEG features during different sleep states
As shown in Table 2, all qEEG features, except for ASI, differed significantly between sleep-wake states. Results of the post-hoc pairwise comparison analyses were presented in Supplementary  Table S2.
Significant differences between AS and QS were found for all the three bursting features (cf. Fig. 1). Rate of SAT was higher in AS than in QS (P < 0.001), while ISI and ISP were lower in AS than in QS (both P < 0.001). Rate of SAT was significantly higher in AS than in W (P = 0.011) (cf. Fig. 1A). ISI and ISP were significantly higher in QS than in W (P = 0.020, P < 0.001, respectively) (cf. Fig. 1B and 1C).
Out of the four frequency bands, the low-frequency delta band exhibited the highest absolute and relative power values during all sleep states, with theta being the second, and alpha and beta the lowest (Table 2). Higher delta power was observed in either of the sleep states compared to wakefulness, while higher power in other frequency bands was found in wakefulness compared to sleep states. Relative beta power and absolute power of the delta, alpha and beta bands were significantly different between each pair of the three behavioral states (all P < 0.001) (cf. Fig. 2A, 2C and 2D). Relative delta and alpha power differed significantly in the state pairs of AS-W and QS-W (all P < 0.001) (cf. Fig. 2E and 2G). Significant differences were also found in absolute theta power of AS-QS (P = 0.036) (cf. Fig. 2B) and in relative theta power of AS-W (P = 0.020) (cf. Fig. 2F).
All MSE features were significantly different between each state pair (all P < 0.001), with the area under the MSE curve, the maximum value of the curve, and the average slope of the curve in fine scales following a trend of W > AS > QS, while the average slope of the curve in coarse scales following a pattern of AS > QS > W (cf. Fig. 3).

Effects of age on sleep-qEEG measures
We also tested the influence of age on qEEG measures across different sleep-wake states (Table 3, Supplementary Table S3 and S4). The enrolled infants were divided into two subgroups, EP (<28 wks GA) and VP (28-30 wks GA). Significant age differences were found in SAT, ISP, ASI, relative alpha power, and absolute power of delta, theta and alpha bands. Most remarkably, ASI was significantly higher in AS than in QS for the VP group (P < 0.033). Means were reported with their standard deviation in parentheses. Median Apgar scores were presented with interquartile range (quartile 1, quartile 3) in parentheses. Minutes of each observed sleep-wake state were presented with percentages relative to total observation time in parentheses. M = male; F = female; GA = gestational age; EP = extremely preterm; VP = very preterm; PMA = postmen- .001 *** * P < 0.05, ** P < 0.01, *** P < 0.001. Means were reported with their standard deviation in parentheses. For each qEEG feature a generalized linear mixed model with Wald Chi-square (v 2 ) test was used to detect statistically significant differences among different sleep-wake states. The P-value tests the null hypothesis that all the three sleep-wake states have identical mean values for each qEEG feature. The significance threshold was set at P

Automated sleep stage classification using qEEG features
The MLR classifier based on the combination of all qEEG features reached a mean AUC of 0.748 (SD = 0.065, P < 0.001) and a mean accuracy of 0.658 (SD = 0.041, P < 0.001) (cf. Fig. 4). The most contributing features for recognizing AS, QS and W were the maximum value of the MSE curve, the area under the MSE curve, and the average slope of the MSE curve in fine scales, respectively.  and lower box bounds represent the third and first quartile respectively, the upper and lower whiskers represent 1.5 times the interquartile range from upper and lower quartile respectively, and the white diamonds denote the mean. Asterisks indicate statistically significant differences among sleep-wake state pairs (* P < 0.05, ** P < 0.01, *** P < 0.001, Holm corrected for multiple comparisons). EEG = electroencephalography; SP_ABS = absolute spectral power; SP_REL = relative spectral power; d = delta frequency band; h = theta frequency band; a = alpha frequency band; b = beta frequency band; AS = active sleep; QS = quiet sleep; W = wake.
We further compared the individual predictive power of each kind of feature by building four independent MLR models. The complexity (i.e., MSE) feature subset yielded the best AUC, which is close to the predictive ability of all kinds of feature collection (cf. Fig. 5). The classification performance was also significant for EP (accuracy = 0.664 ± 0.045, P < 0.001; AUC = 0.762 ± 0.077,  * P < 0.05, ** P < 0.01, *** P < 0.001. Means were reported with their standard deviation in parentheses. For each qEEG feature a generalized linear mixed model with Wald Chi-square (v 2 ) test was used to determine whether the interaction effect of age Â sleep-wake states is statistically significant. The significance threshold was set at P X. Wang, A. Bik, E.R. de Groot et al. Clinical Neurophysiology 146 (2023) 55-64 P < 0.001) and VP (accuracy = 0.665 ± 0.065, P < 0.001; AUC = 0.7 13 ± 0.105, P < 0.001) subgroups, separately.

Discussion
Using quantitative analysis of dual-channel EEG, the current study characterized the distributions of a set of representative qEEG measures across different behavioral sleep-wake states (AS, QS, and W) for extremely and very preterm infants (<30 wks GA) during their first three days after birth. Our results showed that most (15/16, 94 %) qEEG features differed significantly between sleep-wake states in the entire sample. The exception was the synchrony feature (i.e., ASI) which only differed significantly between AS and QS in the very preterm subgroup (28-30 wks GA). In addition to ASI, sleep-bursting and sleep-spectral power associations were also found to be influenced by age. Furthermore, the combination of all qEEG features achieved good automated sleep-wake state classification performance.
The results of bursting analyses showed that SAT rate was significantly higher in AS than in QS, while ISP and ISI were higher in QS than in AS. Similar results were also reported by Palmu et al. (2013), who found that the proportion of SAT events was dif-ferent between PSG-defined sleep stages, with AS > QS. Moreover, our findings agree well with existing visual EEG assessment findings that QS and AS are characterized by discontinuous and relatively continuous background EEG activity, respectively (André et al., 2010;Bourel-Ponchel et al., 2020;Dereymaeker, Pillay, Vervisch, De Vos, et al., 2017). A growing body of evidence has now demonstrated the critical importance of SAT events in early human brain development (Arichi et al., 2017;Benders et al., 2015;O'Toole et al., 2016;Tolonen et al., 2007;Vanhatalo and Kaila, 2006). The result of higher SAT in AS than in QS thus lend further support to the present notion that AS, as compared to QS, might contribute more to the early developmental processes (Graven and Browne, 2008;Mirmiran, 1995;Mirmiran et al., 2003). To promote optimal brain development, it is of particular importance for caregivers to safeguard AS in the NICU settings.
Despite the significant sleep-bursting associations, it appeared that the state of wakefulness could not be well characterized by the bursting features in the current sample. Both lowest SAT rate and inter-SAT (ISI and ISP) values were seen during wakefulness (cf. Fig. 1). This is paradoxical, as SAT and inter-SAT measures are negatively correlated. One possible explanation for this paradoxical finding is that the small number of waking states observed in the current study could not adequately represent the actual situation.
Regarding the degree of interhemispheric synchrony, we found lower ASI values during all three sleep-wake states in VP infants compared to EP infants (Supplementary Table S3). This result is in line with previous visual EEG studies in which high synchrony between the two hemispheres was consistently observed before 28 wks GA, followed by asynchronous bursting activity until near-term age (André et al., 2010;Hrachovy and Mizrahi, 2015). The early synchronizations occurring in EP infants are believed to reflect non-synaptic communication (Bourel-Ponchel et al., 2020;Wallois et al., 2021). So far, no evidence exists that transitions between sleep states drive changes in non-synaptic activities in EP infants, which might be the cause for non-significant sleep-ASI associations in this subgroup. The subsequent asynchrony in older preterm infants is considered to be caused by the fact that the generation of non-synaptic events is massively attenuated, and the developing callosal fibers are not yet mature enough to produce stable interhemispheric interactions (Wallois et al., 2021). The synaptic plasticity has long been postulated to be facilitated by AS during early brain development (Del Rio-Bermudez and Blumberg, 2018;Knoop et al., 2021;Kurth et al., 2015;Li et al., 2017;Liu et al., 2007). Hence, it is reasonably speculated that the significantly higher ASI value in AS compared to QS, as found in the subgroup of VP infants, is related to callosal synaptic development.
Additionally, Räsänen et al. (2013) found that ASI could not well represent continuous activity during AS in an older preterm cohort with average conceptional age of 30.4 weeks. This, however, appears to have little discernible impact on the present study, since the EEG itself is almost completely constituted by a discontinuous background in all behavioral states (relatively more continuous in AS than in QS) in preterm infants born before 30 wks GA (André et al., 2010). Future studies should consider the fact that the use of ASI might lead to controversial results in older infants with more continuous EEG background activity, especially during AS.
In accordance with existing preterm EEG evidence, the power spectral analyses revealed an imbalance between low-and highfrequency content. For all sleep-wake states, the overwhelming majority (above 85 %) of overall spectral power was concentrated in the low-frequency delta band. Slow delta waves with superimposed fast activity (i.e., 8-to 30-Hz alpha-beta spindles), also known as ''delta brushes", are the hallmark of preterm EEG and constitute the main component of the SATs (Milh et al., 2007;Fig. 4. The ROC curve of the multinomial logistic regression combined with a repeated 10-fold cross-validation procedure. Macro-averaged AUC value is reported with its standard deviation in parentheses. ROC = receiver operating characteristic; AUC = area under the curve. X. Wang, A. Bik, E.R. de Groot et al. Clinical Neurophysiology 146 (2023) 55-64 Pavlidis et al., 2017;Vecchierini et al., 2007). During the early preterm period, delta brushes occur more in AS than in QS (Vanhatalo and Kaila, 2006;Whitehead et al., 2016), which explains the significantly higher power of the delta, alpha and beta bands during AS compared to QS, as observed in the present work. Higher-frequency components, primarily alpha and beta bands, of neonatal EEG signals have been associated with the processing of exogenous or environmental stimuli during wakefulness (Norman et al., 2008;Saby and Marshall, 2012). Given that infants' brain shows less EEG activity while asleep than awake, it is not surprising to see higher alpha and beta power in wake than in either of the sleep states. Interestingly, there is evidence that neonates are capable of learning and processing external information even during sleep, particularly AS Tarullo et al., 2011). This could also be the reason for the higher alpha and beta power observed in AS than in QS.
The theta power was of limited value for differentiating between sleep-wake states in the entire group (cf. Fig. 2B and 2F). However, most pairwise comparison analyses of theta power reached statistical significance for the EP and VP subgroups, separately (Supplementary Table S4). Previous research suggested that the evolution of theta activities depends upon subsequent activation of endogenous generators (Wallois et al., 2021). The drivers of these generators change at around 28 wks GA (i.e., the boundary between EP and VP subgroups) from spontaneous activities to sensory information. Therefore, it would be attractive to speculate that different generator-based mechanisms underlie the association between theta power and sleep-wake states in extremely and very preterm infants.
The complexity index (i.e., the area under the MSE curve) was highest in W but decreased in AS and reached its lowest level during QS, reflecting the high-to-low level of brain activity associated with each sleep/wake state (Ma et al., 2018). The fine-scale slope of the MSE curve and the maximum value of the curve showed a similar distribution pattern across the three states. In contrast, at coarse time scale, the slope of the MSE curve followed a distinct pattern of AS > QS > W. These results indicate a time scalespecific relationship between entropy and sleep-wake states, which was also seen in healthy adults (Miskovic et al., 2019;Shi et al., 2017). Furthermore, fine-to-coarse time scales of MSE were assumed to be linked to power content in terms of high-to-low frequencies, respectively (Courtiol et al., 2016). This explains the same distribution pattern found for finer-scale MSE and higher frequency power (W > AS > QS), and the same pattern for coarserscale MSE and lower frequency power (AS > QS > W).
Despite the noticeable impact of maturational effects on some of the above-mentioned qEEG features, we found that automated sleep staging is possible in extremely and very preterm infants by combining these different features together. The classification performance (overall AUC = 0.748) is comparable to, or even better than existing automated sleep staging algorithms based on multichannel qEEG in the same population, which were implemented during a later postnatal period (i.e., a few weeks after birth) (De Wel et al., 2017;Dereymaeker, Pillay, Vervisch, Van Huffel, et al., 2017;Koolen et al., 2017). In addition, we found that MSE metrics are the most important features in automated sleep staging, irrespective of age, which implies their greater robustness and reliability compared to other qEEG measures used in the present study. Although the MSE analysis is still rarely applied to preterm EEG, our findings show it as a promising avenue for future studies on better characterizing sleep-wake states and as a useful index applied across different preterm age groups.
Several limitations of this study warrant mention. To achieve reliable estimates of sleep-qEEG patterns, we conducted successive behavioral observations over a period of three hours, covering several complete sleep cycles (André et al., 2010), in relatively healthy infants who were clinically stable at the time of observation. However, this resulted in a small sample size (n = 17), which may reduce the statistical power to draw firm inferences. Furthermore, as sleep pervades preterm newborn's life, only a small number of waking states were observed in the current sample. More awake data is needed for further confirmation of our findings on wakefulness in future work. Another point of concern is the restricted visibility of behaviors in several infants, due to respiratory support or phototherapy during sleep observation (Supplementary Table S1). These conditions led to fewer parameters available to score sleep stages confidently. This issue can be mitigated by the sleep annotation smoothing procedure used in the current study but cannot be entirely excluded. Moreover, a clear sleep state distinction is known to be quite difficult in preterm infants born at less than 30 wks GA, which might inherently limit the sleep classification performance (Barbeau and Weiss, 2017;Bennet et al., 2018;Kuhle et al., 2001).
It is worth highlighting several promising directions for future research. Firstly, despite the use of cross-validation in the current study, an independent external validation sample would help enhance the generalizability (i.e., external validity) of our findings. In the future, we will explore a web-accessible implementation of our analysis procedure, creating an opportunity for NICU clinicians and researchers to validate our qEEG-based automated sleep staging model in their own practice. Secondly, aiming to provide guideline values for clinicians, the present study only included a limited number of typical qEEG features. To create a fully automated sleep classification system in the NICU, future research should consider using a more extensive and comprehensive qEEG feature set. In particular, a basic assumption of the spectral analysis is the stationarity of data, which is not well fulfilled due to the nonstationary nature of neonatal EEG. Future research could apply more advanced time-frequency techniques to improve the timevarying EEG spectral estimation. Finally, state-of-the-art EEG techniques, such as dry electrodes and cable shielding, are being developed for early preterm infants, which would further increase the clinical utility of the present work in the future.

Conclusion
In sum, we demonstrated the feasibility of differentiating sleepwake states using quantitative analysis of dual-channel EEG in preterm infants born before 30 wks GA during their first three days after birth. The fusion of different qEEG features enabled automated sleep-wake state classification with achieving a good prediction performance in both EP and VP infants. Complexity characteristics showed the strongest predictive power among the four kinds of qEEG parameters and might be the most promising and valuable features for future preterm sleep-EEG research. Our findings may provide a promising solution for creating a completely automated sleep assessment tool that assists appropriate care planning in the NICU, thereby enhancing neurodevelopmental outcomes for extremely and very preterm infants.

Declaration of Competing Interest
The authors declare that they have no conflict of interest.