Night-to-Night Variability of Polysomnography-Derived Physiologic Endotypic Traits in Patients With Moderate to Severe OSA

Background: Emerging data suggest that determination of physiologic endotypic traits (eg, loop gain) may enable precision medicine in OSA.
Research Question: Does a single-night assessment of polysomnography-derived endotypic traits provide reliable estimates in moderate to severe OSA?
Study Design and Methods: Two consecutive in-lab polysomnography tests from a clinical trial (n = 67; male, 69%; mean ± SD age, 61 ± 10 years; apnea-hypopnea index [AHI] 53 ± 22 events/h) were used for the reliability analysis. Endotypic traits, reflecting upper airway collapsibility (ventilation at eupneic drive [Vpassive]), upper airway dilator muscle tone (ventilation at the arousal threshold [Vactive]), loop gain (stability of ventilatory control, LG1), and arousal threshold (ArTh) were determined. Reliability was expressed as an intraclass correlation coefficient (ICC). Minimal detectable differences (MDDs) were computed to provide an estimate of maximum spontaneous variability. Further assessment across four repeated polysomnography tests was performed in a subcohort (n = 22).
Results: Reliability of endotypic traits between the two consecutive nights was moderate to good (ICC: Vpassive = 0.82, Vactive = 0.76, LG1 = 0.72, ArTh = 0.83). Variability in AHI, but not in body position or in sleep stages, was associated with fluctuations in Vpassive and Vactive (r = –0.49 and r = –0.41, respectively; P < .001 for both). MDDs for single-night assessments were: Vpassive = 22, Vactive = 34, LG1 = 0.17, and ArTh = 21. Multiple assessments (mean of two nights, n = 22) further reduced MDDs by approximately 20% to 30%.
Interpretation: Endotypic trait analysis using a single standard polysomnography shows acceptable reliability and reproducibility in patients with moderate to severe OSA. The reported MDDs of endotypic traits may facilitate the quantification of relevant changes and may guide future evaluation of interventions in OSA.

OSA is characterized by repetitive upper airway (UA) collapse during sleep, concomitant intermittent hypoxia, arousals from sleep, increased risk for cardiovascular and metabolic comorbidities, and a compromised quality of life. 1 However, the heterogeneity of clinical presentations cannot be captured by the traditional classification of disease severity based on frequency of respiratory disturbances. 2 Pathophysiologic mechanisms, including high upper airway collapsibility, insufficient muscle compensation, ventilatory instability, and low arousal threshold have been described as potential underlying causes. 3 Yet, standard methods to quantify these pathways require complex sleep study protocols, limiting clinical accessibility.
To overcome these limitations, an advanced modelling technique has been developed to estimate the key endotypic traits from routinely collected signals in clinical polysomnography. [4][5][6] Several studies applied this approach to address responses to various OSA treatments, 7,8 the potential to predict treatment success, 9 or the effectiveness of therapeutic intervention in cohorts of mixed disease expressions. 10 Using this method to investigate the pathophysiologic features of OSA in clinical practice may lead to recognition of clinical phenotypes and eventually may enable steps toward precision medicine and personalized treatment in this disorder. 11 The use of endotypic traits for clinical classification of patients with OSA requires a high intraindividual stability and repeatability, which was investigated recently, 12 whereas test-retest variability has not been quantified systematically. In fact, it is unclear to what extent these endotypic traits are influenced by known night-to-night variability of OSA severity and physiologic conditions during sleep. 13 In the current study of moderate to severe sleep-disordered breathing, we explored night-to-night variability of endotypic traits and defined the thresholds for differences unlikely to result from spontaneous variability (minimal detectable difference [MDD]).

Study Participants and Data Collection
The data used for the present evaluation were obtained from a clinical trial described in detail elsewhere. 14 In brief, this was a randomized, placebo-controlled safety and tolerability study evaluating a potential drug treatment for sleep apnea. Inclusion criteria were age 18 to 75 years, BMI ≥ 20 kg/m² and ≤ 35 kg/m², apnea-hypopnea index [AHI] ≥ 15 events/h, Epworth Sleepiness Scale score ≥ 6, and previous experience with CPAP, terminated because of nonacceptance, nontolerability, or both at least 4 weeks before the study. Patients (n = 68) underwent two consecutive standardized in-laboratory polysomnographic sleep recordings at baseline and an additional two nights after 4 weeks. Evaluation of night-to-night variability and test-retest reliability was performed on the two consecutive baseline assessments for all participants (main analysis cohort). Additional evaluation was performed using only the placebo-treated cohort (n = 22) to assess potential long-term variability after 4 weeks (subgroup analysis cohort). One participant was excluded because of lack of a valid flow signal. A study flow chart is presented in Figure 1. The study was performed according to the tenets of the Declaration of Helsinki for clinical trials, and oral and written informed consent were obtained from each participant before entry into the study. The protocol was registered in the European Union Clinical Trials Register (Identifier: 2017-004767-13), and the current analysis was approved by the Swedish Ethical Review Authority (registration numbers 045-18 and 2020-06237).

Take-home Points
Study Question: Does a single-night assessment of polysomnography-derived endotypic traits provide reliable estimates in patients with OSA?
Results: Nine endotypic traits showed moderate to good reliability, both in the short term and in the long term. The analysis was extended further to provide minimal detectable differences (MDDs) as thresholds to evaluate changes beyond spontaneous variability.
Interpretation: Polysomnography-derived endotypic traits from a single-night assessment provide robust markers of the ventilatory control system and upper airway pathophysiologic features in OSA. MDDs of endotypic traits may be used to identify treatment responders and to guide clinical decisions in future clinical practice. The average of consecutive recordings can be used to reduce variance further.

Sleep Study
Full-night, attended, in-laboratory polysomnography recordings (Embla A10 system; Flaga) were obtained in accordance with standards set by the American Academy of Sleep Medicine. 15 The polysomnography recording montage included EEG, electrooculography, chin and left and right anterior tibialis electromyography, and ECG. In addition, abdominal and thoracic respiratory effort belts, nasal pressure, nasal-oral thermistor, body position, and finger pulse oximetry were recorded.
All recordings were scored manually by a single sleep technologist in accordance with the American Academy of Sleep Medicine criteria. 15 Arousal duration was determined carefully. Hypopneas were defined by the presence of a ≥ 3% desaturation, an arousal, or both. The frequency of apneic and hypopneic events was captured and calculated as the AHI. Desaturation events of ≥ 4% were used to determine the oxygen desaturation index (ODI). The derived sleep variables included total sleep time (TST), sleep efficiency, and sleep stages expressed as percentage of TST (eg, rapid eye movement [REM] percentage).

Endotypic Traits Assessment
Endotypic traits were derived according to previously described methodology by Sands and colleagues 4,5 and Terrill and colleagues 6 using an automated analysis tool (PUPBeta, Phenotyping Using Polysomnography, version 02/2022). In brief, the nasal pressure signal was transformed to respiratory flow and used as input, together with manually scored apneas, hypopneas, and arousal events. The recording was split into 7-min windows, which were used to derive the endotypic traits (Fig 2). Loop gain was computed as the response both to a 1 cycle/min disturbance (LG1) and at the natural frequency (LGn). Pharyngeal muscle activity was addressed by reporting the ventilation at minimal and eupneic ventilatory drive (Vmin and Vpassive), as well as at the ventilatory drive corresponding to the arousal threshold (Vactive).
Endotypic traits were evaluated separately during REM and nonrapid eye movement (NREM) sleep and in the supine and nonsupine positions. Although endotypic traits were evaluated in both REM and NREM sleep, emphasis was placed on measurements from NREM sleep, because original validation studies were performed in this sleep stage. Nine separate endotypic traits were computed and used in the final evaluation, describing the ventilatory control system (LG1, LGn, arousal threshold, ventilatory response to arousals, and time delay) and UA pathophysiologic features (Vmin, Vactive, Vpassive, and muscle compensation [Vcomp = Vactive − Vpassive]).

Statistical Analysis
Data handling, visualization, and statistical analysis were performed using R version 4.1.3 software (R Foundation for Statistical Computing). Preprocessing was applied on endotypic traits Vactive and Vpassive to correct previously described floor and ceiling effects. 4 In detail, a logit function, $\ln\left(\frac{x}{1-x}\right)$, was applied to both Vactive and Vpassive, and then linearly back transformed for presentation within the scale (0,1). Similar corrections were reported previously. 16 Additional information on the transformation of UA characteristics is described in e-Appendix 1 (e-Figs 1-4).
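The transformation step can be illustrated with a minimal Python sketch. It is not the PUPBeta or e-Appendix implementation: the function names, the clipping constant `eps`, the presentation bounds, and the use of a min-max mapping for the linear back-transformation are all assumptions made for illustration.

```python
import math

def logit(x, eps=1e-6):
    """Return ln(x / (1 - x)) for x expressed as a fraction of eupneic ventilation.

    eps is an assumed guard that keeps x strictly inside (0, 1).
    """
    x = min(max(x, eps), 1.0 - eps)
    return math.log(x / (1.0 - x))

def rescale_linear(values, lo, hi):
    """Linearly map values from [lo, hi] onto [0, 1] (assumed back-transformation)."""
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical trait values as fractions of eupneic ventilation.
fractions = [0.05, 0.50, 0.95]
transformed = [logit(f) for f in fractions]

# Assumed presentation bounds on the logit scale.
lower, upper = logit(0.01), logit(0.99)
presented = rescale_linear(transformed, lower, upper)
```

The logit stretches values near 0 and 1, which is the region where floor and ceiling effects compress differences between recordings.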
Statistical analysis was conducted independently on both analysis cohorts (n = 67 on consecutive nights and n = 22 within 4 weeks). Normal distribution was not assumed for parameter distribution on individual assessments, but was confirmed for the differences between measurements using the Shapiro-Wilk normality test, together with visual inspection of distribution patterns and normal Q-Q plots. Descriptive data are presented as median (interquartile range) for all individual assessments. As a measure of agreement, differences between assessments are presented as mean ± SD and were evaluated using a paired t test. Because of the multiple comparisons, the threshold for significant differences was set to 0.003 (Bonferroni adjustment). For the subgroup analysis cohort, long-term variability was determined by comparing the average of the two consecutive recordings at baseline with the average of the two consecutive recordings at 4 weeks to control for short-term, night-to-night variability. Correlations between the variation in phenotypic traits and standard polysomnography parameters were investigated. Visualization of variation between consecutive recordings was performed using Bland-Altman plots. Intraclass correlation coefficients (ICC; derived using the R package psych 17 ) were reported as a measure of reliability and were evaluated according to recently published guidelines 18 as a two-way mixed-effects model on absolute agreement. Reliability was interpreted as moderate (ICC > 0.5) or good (ICC > 0.75).
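For readers unfamiliar with the absolute-agreement ICC, the following Python sketch computes the single-measurement form from the mean squares of a two-way layout. It is an illustration only and does not reproduce the psych package; the choice of the single-measurement form, the function names, and the label returned below the moderate threshold are assumptions.

```python
def icc_absolute_agreement(data):
    """Single-measurement, absolute-agreement ICC for a two-way layout.

    data: one row per participant, each row holding k repeated measurements.
    """
    n = len(data)
    k = len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between nights
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def interpret_icc(icc):
    """Apply the thresholds stated in the text (moderate > 0.5, good > 0.75)."""
    if icc > 0.75:
        return "good"
    if icc > 0.5:
        return "moderate"
    return "below moderate"  # assumed label; not defined in the text

# Hypothetical data: a constant shift between nights lowers absolute agreement.
shifted = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
icc_shifted = icc_absolute_agreement(shifted)
```

Because the between-night mean square enters the denominator, a systematic shift between nights (for example, a first-night effect) reduces this ICC even when participants keep the same rank order.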
For evaluation of changes between consecutive recordings, a distribution-based method, the SEM, was applied. The SEM describes the statistical precision of a single measure, that is, with a confidence of approximately 68% that the true value lies within a range of ± SEM. The MDD is derived from the SEM. The MDD describes a threshold within which a repeated measure may fluctuate with high certainty, that is, 95%. Therefore, a difference between two measures that exceeds the MDD can be considered significantly higher than potential night-to-night variability.
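The step from SEM to MDD follows from the variance of a difference between two measurements. As a sketch, assuming that the measurement errors on the two nights, $e_1$ and $e_2$, are independent, approximately normally distributed, and share the same SD (the SEM), the difference between two recordings $x_1$ and $x_2$ of an unchanged participant satisfies:

```latex
\begin{aligned}
\operatorname{Var}(x_1 - x_2) &= \operatorname{Var}(e_1) + \operatorname{Var}(e_2) = 2\,\mathrm{SEM}^2,\\
\mathrm{SD}(x_1 - x_2) &= \sqrt{2}\,\mathrm{SEM},\\
\mathrm{MDD} &= 1.96 \times \sqrt{2} \times \mathrm{SEM} \approx 2.77 \times \mathrm{SEM}.
\end{aligned}
```

Under these assumptions, 95% of spontaneous differences between two nights fall within ± MDD.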
In the present analysis, MDD was computed based on SEM (derived using the R package rel 19 ) as $\mathrm{MDD} = 1.96 \times \sqrt{2} \times \mathrm{SEM}$, following the approach described by Fleiss and Kingman. 20 Both the main analysis cohort (67 × 2 polysomnography tests) and subgroup analysis cohort (22 × 2 polysomnography tests) were used to derive MDD for single-night assessments. The subgroup analysis cohort (22 × 4 polysomnography tests) additionally was used to evaluate MDD between average scores of two recordings. Bootstrapping with i = 500 iterations has been applied to randomly select two pairs of the four assessments for each participant to compute SEM.
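The computation can be sketched in Python as follows. This is a hedged illustration rather than the analysis code: the SEM estimator (SD of paired differences divided by √2) is an assumption and may differ from the one used by the rel package, and the pairing scheme, function names, and example values are hypothetical.

```python
import math
import random
import statistics

def sem_from_pairs(first, second):
    """Assumed SEM estimator: SD of the paired differences divided by sqrt(2)."""
    diffs = [a - b for a, b in zip(first, second)]
    return statistics.stdev(diffs) / math.sqrt(2)

def mdd(sem, z=1.96):
    """MDD = 1.96 * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem

def bootstrap_sem_of_pair_means(recordings, iterations=500, seed=0):
    """Average SEM between the means of two randomly formed pairs of nights.

    recordings: one list of four nightly values per participant.
    """
    rng = random.Random(seed)
    sems = []
    for _ in range(iterations):
        mean_a, mean_b = [], []
        for nights in recordings:
            order = rng.sample(nights, len(nights))  # random permutation of the four nights
            mean_a.append((order[0] + order[1]) / 2)
            mean_b.append((order[2] + order[3]) / 2)
        sems.append(sem_from_pairs(mean_a, mean_b))
    return statistics.mean(sems)

# Hypothetical loop gain values for three participants over four nights.
lg1 = [[0.60, 0.66, 0.58, 0.63], [0.45, 0.40, 0.48, 0.44], [0.75, 0.71, 0.80, 0.77]]
single_night_mdd = mdd(sem_from_pairs([p[0] for p in lg1], [p[1] for p in lg1]))
two_night_mdd = mdd(bootstrap_sem_of_pair_means(lg1))
```

Fixing the random seed keeps the bootstrap result reproducible between runs.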

Figure 1. Study flow chart: study cohort, n = 68; recording failure (missing flow signal), n = 1; main analysis cohort (2 consecutive polysomnography tests), n = 67; excluded from subgroup analysis, n = 45; subgroup analysis cohort (2 additional polysomnography tests after 4 wk), n = 22.

Because of previously reported differences in endotypic traits by sleep stage and position, 21-23 evaluation was performed in view of different combinations of these sleep characteristics. To be in line with previous publications, the main focus of this study was on the analysis of phenotypic traits during NREM sleep, including all body positions. Additional analyses of endotypes with respect to body position and sleep stages were performed to provide guidance on potential applications of automatically derived endotypic traits in the routine clinical setting. One analysis was performed to address the overall expression of phenotypic traits during the entire sleep period, whereas an additional sensitivity analysis was conducted including only NREM sleep with predominant supine body position. To ensure a reasonable level of data quality, a third analysis was performed in which recordings with ≤ 2 h of sleep in the supine body position were excluded. To provide additional context on the implications of imposing more liberal or stricter conditions, the number of evaluated 7-min windows in each subcohort was reported, together with the number of evaluated participants.

Results
Baseline characteristics of the main analysis cohort (n = 67 subjects), as well as the secondary analysis cohort (n = 22) are presented in Table 1. Patients were predominantly male, overweight, and had moderate to severe OSA. More than one-third had known but well-controlled cardiometabolic morbidities. To provide additional background information, baseline relationships between endotypic traits and selected sleep metrics are described in e-Appendix 2 (e-Figs 5, 6, e-Tables 1, 2).
Evaluation of conventional polysomnography parameters between consecutive nights is shown in Table 2. Approximately normally distributed differences were found for all parameters. AHI tended to be lower, whereas sleep efficiency, TST, and percentage of REM

Results for the comparison of endotypic traits, derived during NREM sleep in all body positions, are included in Table 2. Approximately normally distributed differences were found for all traits. Using the paired t test, only ventilatory response to arousals showed a statistical trend toward an increase on the second recording, but no significant differences were found after adjustment for multiple comparisons. Evaluating reliability by ICC showed moderate to good reliability for all endotypic traits, whereas Vcomp and time delay showed the lowest reliability (ICC, 0.59 and 0.66, respectively). A visualization of variations between two consecutive nights in AHI and LG1 is shown in Figure 3. An in-depth evaluation of the correlation analysis is presented in e-Appendix 3 (e-Table 3). In summary, relevant correlation (ie, |r| > 0.4) between traditional polysomnography and novel endotype parameters was found for night-to-night changes in OSA severity (AHI and ODI) and night-to-night changes in upper airway characteristics (Vactive, Vpassive, and Vmin). The number of valid 7-min windows for evaluation of endotypic traits was higher in patients with severe compared with moderate sleep apnea. However, fluctuations in either the number of evaluated windows or in polysomnography parameters (such as TST, REM percentage, or time spent in the supine position) did not correlate with fluctuations in endotypic traits. Change in REM percentage was correlated with change in AHI.
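The correlation of night-to-night changes can be illustrated with a short Python sketch. The values below are hypothetical and the function names are assumptions; only the |r| > 0.4 relevance threshold is taken from the text.

```python
import math

def night_to_night_change(night1, night2):
    """Per-participant change from the first to the second recording."""
    return [b - a for a, b in zip(night1, night2)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical AHI (events/h) and Vpassive (% eupnea) values on two nights.
ahi_night1 = [40.0, 55.0, 62.0, 30.0, 71.0]
ahi_night2 = [35.0, 60.0, 50.0, 33.0, 65.0]
vpassive_night1 = [70.0, 55.0, 40.0, 85.0, 35.0]
vpassive_night2 = [76.0, 50.0, 52.0, 83.0, 40.0]

r = pearson_r(
    night_to_night_change(ahi_night1, ahi_night2),
    night_to_night_change(vpassive_night1, vpassive_night2),
)
relevant = abs(r) > 0.4  # relevance threshold used in the text
```

A negative r in this setup means that a night with a lower AHI tends to coincide with a higher Vpassive, that is, with a less collapsible airway.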
An additional analysis of long-term variability was performed in the subgroup analysis cohort (n = 22 with four individual polysomnography tests over 4 weeks). None of the parameters showed significant differences between assessments and all parameters showed similar reliability compared with the previously reported night-to-night variability. However, the 95% CIs generally widened (Table 3).
Reliability of endotypic traits in different subsets of the data defined by sleep stage and body position is presented in Table 4. Analyses were performed using the main analysis cohort. The ICC showed moderate to good reliability between the two assessments for all variables. In general, a reduction in the number of participants or the number of 7-min windows was observed when stricter criteria were applied (Table 4), together with increases in variability and overall uncertainty. However, parameters describing UA characteristics (Vmin, Vactive, Vpassive, and Vcomp) showed improvements in the analysis only during supine positioning after assuring a minimum time spent in the supine position of > 2 h. Additional results on this sensitivity analysis are presented in e-Appendix 4 (e-Tables 4-7).
SEM was derived and used to compute MDD for endotypic traits derived during NREM sleep. Both measures were derived to address changes between individual recordings, as well as between mean values of two recordings (subgroup analysis cohort). SEM with 95% CI, together with MDD, are presented in Table 5 for AHI, ODI, and the nine automatically derived endotypic traits.

Discussion
Our study demonstrated moderate to good agreement for nine endotypic trait variables assessed on consecutive nights, agreement that was maintained for at least a 4-week period. Night-to-night variability of endotypic traits was not associated with the fluctuations seen for OSA frequency measures, confirming independent stability in view of first-night effects. Results were extended further by contrasting influences of sleep stage and body position. Our study evaluated, for the first time to our knowledge, MDDs based on two consecutive individual assessments, as well as on the average of two recordings. The study provided insights into the clinical usefulness and potential limitations for derivation of OSA endotypes.
The concept of night-to-night variability in OSA was addressed in several previous studies. 24,25 Our results, in general, confirmed previously described typical first-night effects, such as a reduction of TST and the percentage of REM sleep. 26 On a group level, AHI and ODI did not differ significantly on repeated assessments. However, we observed substantial variability in the AHI (SD, 13.8) in accordance with other studies on AHI variability in moderate to severe OSA based on clinical polysomnography (reported SDs of between 8 and 27). [27][28][29][30] Moreover, our estimate for reliability of the AHI (ICC, 0.82) is in line with previous analyses (ICC, 0.75-0.81). [31][32][33] A correlation between nightly differences in AHI and in time in the supine position has been reported, 30 which we were not able to reproduce, although we observed high variability in the supine position. Our study, investigating consecutive nights and variability over a span of up to 4 weeks, adds important information to previous evaluations of variability in polysomnography-derived endotypic trait measurements that evaluated variability within one night or changes over multiple years. 12 No significant difference in endotypic trait variability was observed between consecutive recordings. Known physiologic relationships 3 between OSA severity (AHI and ODI) and UA characteristics of collapsibility (Vactive, Vpassive, and Vmin) were confirmed (e-Fig 5, e-Table 1) and may explain the relationship between fluctuations in these characteristics. No other relevant correlations (ie, |r| > 0.4) were found, suggesting that the parameters mainly were independent of known physiologic fluctuations between nights. Importantly, we observed that both Vactive and Vpassive showed weaker agreement between nights at low numerical values (ie, high collapsibility). This, in turn, may influence the related measures of muscle compensation and may explain the increased uncertainty of this parameter.
However, Vmin, which recently was proposed as a superior marker of collapsibility in a population-based study, 34 generally was more stable and reliable for assessment of UA characteristics, as shown by higher values of ICC. Jointly, our findings suggest that pathophysiologic endotyping using a newly developed method indeed provides stable markers that may overcome between-night fluctuations and first-night phenomena.
In addition to the evaluation of night-to-night variability, this is the first study to address directly the consequences of restricting the computation of endotypic traits to NREM sleep or supine body positioning. Previous studies 21-23 demonstrated  Table 2). Despite this, the evaluation of the entire recording without subclassifying sleep stages and position led to lower variance, as well as increased ICC. In fact, imposing stricter criteria evaluating only supine position and NREM sleep reduced agreement and reliability. With the exclusion of participants with a low duration (≤ 2 h) of supine sleep, we again found strong reliability in the UA characteristics, whereas the remaining parameters still showed weaker agreement. Our findings suggest that an important trade-off exists between technical model precision and physiologic stability. As a result, position-dependent analyses should be conducted only if sufficient data quality can be assured to reduce the estimate's uncertainty. This could be realized by ensuring a certain data volume, which also was indicated in a related study by applying thresholds for a minimum number of evaluated 7-min windows. 12 If technical limitations cannot be overcome, for example in ambulatory sleep assessments, it may be advisable to evaluate the entire night without separation of sleep and position, because this would provide more stable markers to identify phenotypic characteristics.
Our study provided a first suggestion for how to define minimal detectable changes in the proposed endotypic traits. Thresholds were determined by a distribution-based method widely used to quantify minimal detectable differences. [35][36][37] We performed a traditional analysis using the two consecutive baseline assessments for deriving cutoffs for minimal detectable changes. Previous studies addressing MDD for OSA severity found generally lower AHI thresholds (MDD, 13-18 vs 28), but also addressed less severe cohorts (mean AHI, 10-15 vs 53). [38][39][40] A second analysis for calculation of MDD values was performed by evaluating the differences between the averages from two pairs of overnight polysomnography recordings, resulting in generally lower MDD thresholds. Future studies exploring these thresholds are warranted. Additional approaches, such as anchor-based methods, 41 should be considered not only to investigate statistically detectable changes, but also to define clinically important differences related to outcome.
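A rough expectation for the gain from averaging two recordings can be worked out under a simplifying assumption that was not tested in this study, namely that the measurement errors of the two averaged nights are independent and share the same SEM:

```latex
\mathrm{SEM}_{\text{mean of 2}} = \frac{\mathrm{SEM}}{\sqrt{2}},
\qquad
\frac{\mathrm{MDD}_{\text{mean of 2}}}{\mathrm{MDD}_{\text{single}}} = \frac{1}{\sqrt{2}} \approx 0.71
```

This corresponds to a reduction of about 29%, which sits at the upper end of the approximately 20% to 30% reduction observed here. Errors that are positively correlated between the averaged nights would yield a smaller gain.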
Several important study strengths should be recognized. Our study provided an in-depth investigation of the clinically important question on night-to-night variability and reproducibility of automatically derived endotypic traits using the PUPBeta software tool in a sizable cohort of patients with moderate to severe OSA. We demonstrated the stability of repetitive measures, gave a detailed evaluation on how various subanalyses affect the overall stability of the addressed measures and how these results can be placed into the context of clinical studies. The use of data collected from a clinical trial provides important insights into the application of a high-precision algorithm in the clinical setting. Our data strongly support the stability of derived endotypic traits for use in routine polysomnography assessments. Restricting analyses to specific sleep stage and position conditions may require stricter protocols to ensure reliable measurements.
The study also has important limitations. Our data were collected in a clinical trial and body posture was unrestricted. Consequently, some patients spent only a short time in the supine position, thereby reducing power and increasing variance in the statistical analysis of body position influences. After controlling for at least 2 h of supine positioning in the evaluated recordings, the cohort size was reduced by approximately 50%. Controlling for body position increased the reliability of the UA characteristics, although without improving reliability in the remaining parameters. An important consequence of the data collection process was the evaluation of a preselected cohort: the study population comprised predominantly male, elderly patients with moderate to severe OSA and intolerance of CPAP therapy. This is a relevant cohort for the application of this methodology in the context of evaluating alternative treatment options in OSA, but differences in endotypic expressions with varying demographics have been reported previously. [42][43][44] Furthermore, the initial study design allowed for an extensive evaluation of variability between consecutive recordings, but not for an in-depth evaluation of intraindividual variability or specific subtypes of OSA. Further studies are needed to determine if our findings can be generalized to other cohorts. This includes specifically addressing mild sleep apnea, strictly position-dependent OSA, REM-dependent OSA, OSA in women, and OSA in younger populations with predominantly UA abnormalities. Finally, it is important to bear in mind that this is a methodology requiring a full polysomnographic montage and good signal quality to enable the complex mathematical modelling. Although the validity has been studied extensively, the algorithmic realization and details, like transformation of UA characteristics, are updated and improved continuously. Currently, it is unclear to what extent physiologic and technical issues explain the observed variations. Additional evaluation in routine clinical settings is warranted to improve the methodology further.

Table footnote: Values are presented as No., mean ± SD, or intraclass correlation coefficient (95% CI). ArTh = arousal threshold (% eupnea); LGn = loop gain at natural frequency; LG1 = loop gain at 1 cycle/min; NREM = nonrapid eye movement; Vactive = ventilation at the arousal threshold; Vcomp = muscle compensation; Vmin = ventilation at minimal ventilatory drive; Vpassive = ventilation at eupneic drive; VRA = ventilatory response to arousals (% eupnea). a Sigmoidal transformation applied. b Derived using transformed Vactive and Vpassive.
Table footnote: Sixty-seven consecutive recordings were used to assess differences between individual recordings. Analysis was repeated with the subgroup analysis cohort to provide a context for comparison. Additionally, the 22 participants with four recordings each were evaluated to address differences in scores derived from averaging two nights to indicate the reduction of minimal detectable changes when applying a method to control for technical fluctuations. AHI = apnea-hypopnea index; ArTh = arousal threshold (% eupnea); LGn = loop gain at natural frequency; LG1 = loop gain at 1 cycle/min; MDD = minimal detectable difference; ODI = oxygen desaturation index; Vactive = ventilation at the arousal threshold; Vcomp = muscle compensation; Vmin = ventilation at minimal ventilatory drive; Vpassive = ventilation at eupneic drive; VRA = ventilatory response to arousals (% eupnea). a Sigmoidal transformation applied. b Derived using transformed Vactive and Vpassive.

Interpretation
Endotypic traits derived from polysomnography tests using the PUPBeta software tool provided robust biomarkers over a span of up to 4 weeks. No direct relationship to spontaneous fluctuations in respiratory events was found, suggesting that endotypic traits reflect OSA mechanisms beyond momentary disease expressions. No systematic differences between assessments were observed, together with reasonable reliability assessed by ICC, indicating that a single-night recording can be used for reliable endotypic characterization in clinical practice. Application of stricter criteria for sleep stages or body position not only may increase physiologic stability, but also may introduce statistical uncertainty. Detailed analyses on how to quantify the variation of novel endotypic traits with respect to different criteria on sleep and body position were performed. Moreover, this is the first study to propose thresholds for identifying statistically significant changes in endotypic traits beyond the natural course of night-to-night variation. Further studies applying this classification of endotypic traits in various clinical settings are warranted.

Funding/Support
The clinical study that provided the data for the current analysis was sponsored by