Obstructive sleep apnea screening in different age groups: performance of the Berlin, STOP-Bang questionnaires and Epworth Sleepiness Scale

Highlights
• Performance of screening instruments for obstructive sleep apnea.
• Obstructive sleep apnea screening in different age groups.
• The STOP-Bang questionnaire has high sensitivity for screening OSA.
• ROC curve and area under the curve of three sleep disorder screening instruments.
• Age has no significant influence on the screening for obstructive sleep apnea.


Introduction
Obstructive Sleep Apnea (OSA) is a very prevalent respiratory disorder, affecting 32.8% of the general population. 1 Its pathophysiology results from the interaction of several factors, such as sleep physiology, nasal airflow, 2 respiratory control and muscle control of the upper airway, 3 which are influenced by age, 4 making it difficult to identify just one site or cause of obstruction. 5 A trend of increased prevalence is observed in the general population, possibly due to factors such as population aging, reaching approximately 84% between the ages of 60 and 85. 6 Young and old people may have a different presentation of the disease, 7 and the latter tend to present mild or unusual symptoms of the disease, such as mood swings, insomnia, and cognitive impairment, 1 making it even more difficult to suspect the presence of the disease and, consequently, recommend type 1 Polysomnography (PSG). This test, which is the gold standard test for OSA diagnosis, has high costs and low availability, especially in middle and low-income countries. 9 Thus, given the difficult access to PSG, clinical instruments for OSA screening have gained increasing importance in clinical practice, helping in a standardized assessment and in the more accurate selection of individuals who should undergo PSG. 10 These instruments present different performances depending on the characteristics of the population studied, whether elderly, sleep laboratory patients, surgical patients or those who suffer from a comorbidity. 11,12 Among the instruments already validated and widely used, we have the Berlin (BQ) 13,14 and STOP-Bang (SBQ) Questionnaires, 11,15 as well as the Epworth Sleepiness Scale (ESS), 16,17 which, despite not having been developed specifically for OSA investigation, 16 ends up aiding in screening for OSA, since it assesses excessive daytime sleepiness, one of the symptoms presented by some OSA patients. 
In this context, the aim of this study was to evaluate the performance of the BQ, SBQ, and the ESS in screening for obstructive sleep apnea syndrome in adults of different age groups, comparing them to the results of the gold standard, type 1 polysomnography.

Methods
This is a cross-sectional study, with prospective allocation of patients, conducted between February 2020 and January 2022. The sample consisted of individuals from various clinical specialties seen at the Outpatient Sleep Medicine Clinic of a University Hospital. Individuals aged ≥18 years were eligible for inclusion. Exclusion criteria were: inability to adequately understand and answer the examiner's questions without the help of a third party; a previous history of OSA-related tests or a previous diagnosis of OSA; PSG of poor technical quality, with an examination time <4 h, or with a predominance of central events; and incomplete interview data. The study protocol complied with the Helsinki Declaration and was approved by the Research Ethics Committee, through Plataforma Brasil, substantiated opinion number 3.298.539. Informed consent was obtained from all participants.
An interview was conducted to collect sociodemographic and clinical-epidemiological data and to apply the BQ, SBQ and ESS. In a second stage, all participants underwent PSG, with recording time and technical quality considered adequate for analysis.
The sample was categorized into 3 age groups: 18–39 years, 40–59 years, and 60 years or older. Each group was analyzed regarding the presence or absence of OSA.

Screening instruments
For the application of the questionnaires, the criteria of the versions approved for use in Brazil were followed. For the BQ, patients with two or more positive categories were classified as high risk for OSA. 18 For the SBQ, those who presented scores ≥3 were classified as high risk for the disease. 19 For the ESS, individuals with scores >10 were considered to have probable excessive daytime sleepiness. 17
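The cut-offs above can be sketched in code. This is a minimal illustration, not the authors' scoring procedure: the function and field names are hypothetical, and the inputs are assumed to be already-computed totals (number of positive Berlin categories, STOP-Bang score, Epworth score).

```python
from dataclasses import dataclass

# Cut-offs as quoted in the text; all names below are hypothetical.
BQ_MIN_POSITIVE_CATEGORIES = 2  # two or more positive categories -> high risk
SBQ_MIN_SCORE = 3               # score >= 3 -> high risk
ESS_MAX_NORMAL_SCORE = 10       # score > 10 -> probable excessive daytime sleepiness


@dataclass
class ScreeningResult:
    bq_high_risk: bool
    sbq_high_risk: bool
    ess_positive: bool


def classify_screening(bq_positive_categories: int, sbq_score: int, ess_score: int) -> ScreeningResult:
    """Dichotomize the three instruments using the cut-offs quoted above."""
    if not 0 <= bq_positive_categories <= 3:
        raise ValueError("the Berlin questionnaire has three categories")
    if not 0 <= sbq_score <= 8:
        raise ValueError("STOP-Bang scores range from 0 to 8")
    if not 0 <= ess_score <= 24:
        raise ValueError("Epworth scores range from 0 to 24")
    return ScreeningResult(
        bq_high_risk=bq_positive_categories >= BQ_MIN_POSITIVE_CATEGORIES,
        sbq_high_risk=sbq_score >= SBQ_MIN_SCORE,
        ess_positive=ess_score > ESS_MAX_NORMAL_SCORE,
    )
```

Note that the ESS threshold is strict (a score of exactly 10 is negative), whereas the BQ and SBQ thresholds are inclusive.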

Gold-standard
All PSG examinations were performed at the Sleep Laboratory of the Federal University of the State of Rio de Janeiro (LABSONO-UNIRIO). The EMSA system, approved by the Brazilian National Health Surveillance Agency (ANVISA), was used, providing a complete recording (more than 7 channels). The electroencephalogram, electrooculogram, chin and leg electromyogram, nasal airflow, thermistor, thoracic and abdominal plethysmography, snoring sensor, body position sensor, pulse oximeter with oxyhemoglobin saturation recording, and electrocardiogram were monitored continuously and simultaneously. In order to maintain a standard of interpretation, the recordings were manually scored according to the American Academy of Sleep Medicine (AASM) 20 criteria by trained physicians, all belonging to the LABSONO clinical staff. They were blinded to the results obtained in the screening instruments. The investigator who performed the statistical analyses was also blinded. Tests with a duration of more than 4 h and without relevant interference in interpretation were considered suitable for analysis. Exams with a predominance of central events were not considered.

Diagnosis
The diagnostic criteria followed the International Classification of Sleep Disorders-third edition (ICSD-3), 21 without considering the degree of intensity/severity of the disease. To be classified as having OSA, a patient had to present either signs, symptoms, or comorbidities correlated with OSA together with an Apnea-Hypopnea Index (AHI) ≥5 events/h, or an AHI ≥15 events/h alone.
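The two-branch rule above can be written as a short predicate. This is a sketch under the stated criteria only; the function name and the boolean flag summarizing "signs, symptoms, or comorbidities" are hypothetical simplifications of the clinical assessment.

```python
def has_osa(ahi: float, has_related_findings: bool) -> bool:
    """Apply the diagnostic rule described above.

    ahi: Apnea-Hypopnea Index in events/h.
    has_related_findings: hypothetical flag, True when the patient presents
        signs, symptoms, or comorbidities correlated with OSA.
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi >= 15:
        # AHI >= 15 events/h is sufficient on its own.
        return True
    # Between 5 and 15 events/h, related clinical findings are also required.
    return ahi >= 5 and has_related_findings
```

For example, `has_osa(7.0, False)` is negative under this rule, while `has_osa(7.0, True)` and `has_osa(15.0, False)` are both positive.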

Statistical analysis
The screening instruments were analyzed for OSA identification in relation to PSG results, for each age group.
For the performance analysis of the BQ, SBQ, and ESS, 2 × 2 contingency tables were used, and sensitivity, specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), Positive Likelihood Ratio (PLR), Negative Likelihood Ratio (NLR), and accuracy, with 95% Confidence Intervals, were estimated in each age group.
Receiver Operating Characteristic (ROC) curves were constructed and Areas Under the Curve (AUC) were calculated for each screening instrument by age group. Descriptive measures of the score of each questionnaire according to the diagnosis of OSA were obtained for each age group. The Shapiro-Wilk test was applied to verify the distribution of the scores. For distributions that were not normal, the Wilcoxon test was used to assess differences in the medians between patients with and without the diagnosis of OSA. The AUCs obtained were compared pairwise using DeLong's test. 22 The curves and AUC comparison tests were performed using the pROC package of the R software. 23
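As an illustration of how these measures follow from a 2 × 2 table, the sketch below computes the point estimates from the four cell counts, taking PSG as the reference. The function name is hypothetical, the confidence intervals reported in the study are not computed here, and undefined ratios are returned as NaN by assumption.

```python
import math


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimates from a 2 x 2 table (screening result vs. reference)."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return NaN instead of raising when a measure is undefined.
        return numerator / denominator if denominator else math.nan

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "plr": ratio(sensitivity, 1 - specificity),  # positive likelihood ratio
        "nlr": ratio(1 - sensitivity, specificity),  # negative likelihood ratio
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Made-up counts for illustration only; they are not data from this study.
example = diagnostic_metrics(tp=80, fp=10, fn=20, tn=40)
```

With these made-up counts, sensitivity and specificity are both 0.8, so the positive likelihood ratio is 0.8 / 0.2 and the negative likelihood ratio is 0.2 / 0.8.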
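The AUC has a direct interpretation that a short sketch can make concrete: it is the probability that a randomly chosen patient with OSA scores higher on the questionnaire than a randomly chosen patient without OSA, with ties counted as one half. The Python function below is an illustrative re-implementation of that definition, not the pROC code used in the study; its name is hypothetical, and DeLong's test for comparing AUCs is not implemented.

```python
def auc_from_scores(scores_with_osa: list, scores_without_osa: list) -> float:
    """Empirical AUC via pairwise comparison (Mann-Whitney formulation).

    Equals the area under the empirical ROC curve obtained by sweeping
    the cut-off over all observed scores.
    """
    if not scores_with_osa or not scores_without_osa:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for positive in scores_with_osa:
        for negative in scores_without_osa:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5  # ties contribute one half
    return wins / (len(scores_with_osa) * len(scores_without_osa))
```

An AUC of 0.5 corresponds to an instrument that does not discriminate between the groups, and 1.0 to perfect separation. The pairwise loop is quadratic in the sample size, which is acceptable for a sample of a few hundred patients.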

Results
From a total of 494 individuals seen and interviewed, 173 were excluded according to the criteria. Thus, 321 patients were considered suitable for analysis, and the following distribution by age was observed: 83 patients (25.9%) aged 18–39 years, 148 (46.1%) aged 40–59 years, and 90 (28.0%) aged 60 years or more.
We found 254 (79.1%) individuals with OSA in the overall sample, and the mean age was slightly higher among those with OSA (51 years) compared to the mean age of those without the disease (48 years).
Although females predominated in the overall sample (56%), OSA was proportionally more frequent among males than among females. The difference by sex was significant (Table 1).
Using only the PSG results as the criterion (AHI ≥5 events/h), OSA was identified in 87% of the patients in the overall sample. Regarding the degree of severity of the disease, the highest percentage was concentrated in the severe degree. The mild degree had the second highest frequency of individuals, with little percentage difference in the overall sample or in the age groups (Fig. 1).
As for the performance of the instruments, in the overall sample, the SBQ stood out for presenting higher sensitivity, PPV, PLR and accuracy (Table 2). In the 18–39 age group, the BQ had the highest sensitivity, but the SBQ showed higher PPV, PLR and accuracy values. The ESS had the worst performance. When analyzing the ROC curves with their respective AUCs, the SBQ presented the highest AUC (Fig. 2). Its advantage over the BQ was small, though statistically significant (p = 0.04426). The ESS obtained the worst AUC and, when compared to the BQ, there was no significant difference between them (p = 0.3157).
Among those aged 40–59 years, the SBQ maintained the best values for sensitivity and accuracy, whereas the ESS presented the highest specificity, PPV and PLR (Table 2). As for AUC, the SBQ maintained the best performance among the three instruments, while the ESS presented the worst value (Fig. 3). As in the previous age group, the BQ and SBQ presented a significant difference in AUC (p = 0.01123), with the latter performing better. No significant difference was observed between the BQ and ESS (p = 0.8177), nor between the SBQ and ESS (p = 0.06509).
Among individuals ≥60 years, the SBQ also had good sensitivity, but the ESS stood out for the highest specificity, PPV and PLR values, with the latter more than double that of the other two instruments (Table 2). Regarding ROC curves and AUC, the results remained similar to those of the other age groups. The SBQ presented the best performance, with the highest AUC (Fig. 4), followed by the BQ and ESS; however, the difference in performance between these last two instruments was not significant (p = 0.6187).

Discussion
The motivation of this study was to analyze the three most commonly used clinical instruments in OSA screening in order to verify whether the individual's age range would influence their performance. This is justified by the fact that individuals with OSA may present different signs, symptoms and clinical implications of the disease depending on their age, 7,8 which is not predicted or adjusted in the BQ, SBQ or ESS.
Given the difficulty of access to PSG, 9 the use of these instruments takes on even greater importance, functioning as screening to be applied before a more costly and complex exam is requested.
Thus, we also tried to identify whether any of the three instruments would perform better in OSA screening, depending on the age group. An ideal screening instrument is expected to have high sensitivity and reasonable specificity. 24 As observed in the Results, the SBQ performed well in all age groups, with good sensitivity among those ≥40 years. However, in a clinical investigation, predictive values gain importance for estimating the probability of disease (or its absence) from the test result, 24 while the likelihood ratio indicates how much a given test result changes the probability of disease relative to the prevalence of the disease. 25 The SBQ also showed good values for PLR and PPV, but the latter was also high for the other two instruments, probably due to the high prevalence of OSA in the sample.
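The dependence of predictive values on prevalence follows from Bayes' theorem and can be checked numerically. The sketch below uses a hypothetical function name and a hypothetical, uninformative instrument (sensitivity and specificity both 0.5); only the prevalence of 79.1% is taken from this study's sample.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie between 0 and 1")
    # Expected proportions of each cell of the 2 x 2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive values are undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical uninformative test at the prevalence observed in this sample.
ppv, npv = predictive_values(sensitivity=0.5, specificity=0.5, prevalence=0.791)
```

Under these assumptions the PPV equals the prevalence itself (0.791) and the NPV equals its complement (0.209), which shows why a high PPV in a high-prevalence sample says little on its own about an instrument's discriminative ability.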
The construction of the ROC curves and calculation of the AUC provided a simpler and more objective analysis between the instruments, corroborated the tests cited and added information on the comparability between the instruments in the different age groups. It was evident, for example, that between 18 and 39 years of age, the SBQ was the instrument with the best results, but with little difference compared to the BQ. Thus, in this age group, both instruments could be used satisfactorily as screening tests, unlike the ESS, which showed a non-significant AUC value in this age group.
Among individuals aged 40–59 years, the SBQ also presented the highest AUC value, being the only instrument with a p value lower than 0.05. However, a significant difference was observed between the AUC values of BQ and SBQ. This may be justified by the fact that the ESS presented the best PPV and specificity values among the instruments in this age group.
Among individuals aged 60 years or older, once again the SBQ showed better performance. The comparison between BQ and ESS did not show a significant p-value, and as such, in this age group, we could not conclude which, between these two, had a better performance. It is worth noting that ESS presented its highest AUC value, with p < 0.05. This may be explained by its high specificity and PLR in this age group.
This better overall performance of the SBQ had already been reported by other researchers in samples with mean ages similar to that of the present study, 25,26 in which the SBQ showed the highest and most significant AUC in all age groups. On the other hand, the BQ only showed a significant AUC for individuals between the ages of 18 and 39. The good performance of these two questionnaires among younger people may be explained by the fact that they include sub-items that assess the presence of comorbidities and anthropometric data, which have a strong association with OSA in this age group. 25,27,28 The ESS, however, assesses symptoms connected to daytime sleepiness without considering comorbidities or anthropometric data. Among individuals aged 60 years and older, although the SBQ still maintained the best performance, the ESS showed its highest AUC value, significant only in this age group. This may be explained by the fact that many elderly people without sleep disorders nap due to the absence of socially imposed schedules, such as those related to work or education, which is not necessarily indicative of pathological daytime sleepiness. 29
Thus, even though this study does not have external validity, because it is a single-center study carried out with individuals seen in a sleep laboratory, there is the perception that the SBQ is the instrument with the best overall performance. Its application can bring benefits to the population, since it is a practical, short instrument with a simple and direct scoring system, and can be applied by any medical specialty or in family medicine teams. 19
The present study had some limitations: the patients analyzed were those referred to or who voluntarily sought care in the HUGG sleep laboratory, which increases the chance of selection bias. Being a single-center study limits the generalizability of the results to the general population.
On the other hand, among the strengths of this study, one can highlight the considerable sample of adults, all of whom underwent complete polysomnography, performed in a center that is a reference for the diagnosis and treatment of sleep-related disorders, with manual analysis of the results by physicians blinded to the scores of the instruments. There are few studies in the literature that evaluate and compare the performance of OSA screening instruments in different age groups.

Conclusions
In conclusion, the present study, carried out with adult individuals investigated in a Sleep Laboratory with a prevalence of OSA similar to that observed in the literature, showed that the Berlin and STOP-Bang questionnaires performed well in recognizing obstructive sleep apnea syndrome. The most noteworthy was the STOP-Bang, which performed well in every age group. The Epworth Sleepiness Scale did not prove to be a good option for screening for the disease in the sample studied, regardless of age range.
Although the present study has no external validation, it seems sensible to consider applying the STOP-Bang questionnaire, within a clinical setting, without the concern that age may significantly influence the screening method for obstructive sleep apnea. This consideration would be particularly valid in individuals with a similar profile to those investigated in the present study.