Abstract
Question of the study
The aim was to determine differences in visual sleep scoring between one clinical center (Marburg University, UMA) and two centers with a research background (German Aerospace Center, DLR, and Dortmund University, UDO). The new AASM rules for sleep stage classification were reviewed with regard to their potential to improve inter-rater agreement in sleep stage scoring.
Patients and methods
Each center contributed 20 nights. All 60 nights (37 subjects, 9 female, mean age ± standard deviation = 41.8 ± 16.1 years) were scored by each center according to Rechtschaffen and Kales. Twenty subjects were examined for sleep apnea; the remaining subjects took part in studies on the effects of traffic noise and were free of sleep disorders.
Results
According to kappa statistics, agreement between the centers was excellent in 38 % of cases, fair to good in 62 %, and never poor. Mean kappa values decreased in the order REM, wake, stage 2, slow wave sleep (stages 3 and 4 combined), stage 4, stage 1, and stage 3. Time spent in the different sleep stages was positively correlated with kappa values. Pairwise comparisons showed that UDO agreed significantly worse on stage 1; for the remaining sleep stages, however, no significant differences between the centers were found. Venn diagrams showed that UDO tended to score more wake alone and UMA tended to score more stage 4 alone.
Conclusions
Overall, the differences between the centers were minor. Both pairwise kappa comparisons between several centers/scorers and Venn diagrams can reveal systematic deviations of individual centers/scorers, which should then receive additional training. The new AASM rules for sleep stage classification will presumably increase inter-rater agreement in scoring, but future studies will have to confirm this.
Summary
Question of the study
To investigate inter-rater agreement between scorers from three centers with clinical (Marburg University, UMA) or research (German Aerospace Center, DLR, and Dortmund University, UDO) backgrounds. Additionally, the sleep scoring rules of the new AASM manual for the scoring of sleep and associated events were reviewed regarding possible implications for inter-rater agreement.
Patients and methods
Each of the three centers contributed 20 nights. All 60 nights (37 subjects, 9 female, mean age ± SD = 41.8 ± 16.1 years) were scored by each center according to the rules of Rechtschaffen and Kales. Twenty subjects underwent obstructive sleep apnea (OSA) diagnostics; the remaining subjects participated in studies on the effects of traffic noise on sleep and were free of intrinsic sleep disorders.
Results
According to kappa statistics, inter-rater agreement between the three centers was excellent in 38 % of cases, fair to good in 62 %, and never poor. Mean kappa values decreased in the order rapid eye movement (REM) sleep, wake, stage 2, slow wave sleep (stages 3 and 4 combined), stage 4, stage 1, and stage 3. Time spent in the different sleep stages was positively correlated with kappa values. Pairwise comparisons revealed that agreement on stage 1 was significantly worse for UDO; for all other stages, none of the centers deviated significantly from the other two. Analyses of Venn diagrams showed a tendency of UDO to score wake alone and of UMA to score stage 4 alone.
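Cohen's kappa quantifies epoch-by-epoch agreement after correcting for the agreement expected by chance from each scorer's stage frequencies. The following Python sketch illustrates an overall and per-stage kappa calculation for two scorers' hypnograms; the label set, function names, and the interpretation benchmarks (>0.75 excellent, 0.40–0.75 fair to good, <0.40 poor, following Fleiss) are assumptions for the example and are not taken from the study's analysis software.

```python
# Minimal sketch (not the study's own analysis code): epoch-by-epoch Cohen's
# kappa for two scorers' hypnograms. Epoch labels, helper names and the
# interpretation thresholds are illustrative assumptions.
from collections import Counter

def cohen_kappa(scorer_a, scorer_b):
    """Chance-corrected agreement between two equally long label sequences."""
    assert len(scorer_a) == len(scorer_b) and scorer_a
    n = len(scorer_a)
    observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    freq_a, freq_b = Counter(scorer_a), Counter(scorer_b)
    # Expected chance agreement from the two scorers' marginal stage frequencies.
    expected = sum(freq_a[s] * freq_b[s] for s in set(freq_a) | set(freq_b)) / n ** 2
    return (observed - expected) / (1 - expected)

def stage_kappa(scorer_a, scorer_b, stage):
    """Kappa for one stage, recoding every epoch as stage vs. not-stage."""
    return cohen_kappa([x == stage for x in scorer_a],
                       [x == stage for x in scorer_b])

def interpret(kappa):
    """Commonly used benchmarks: >0.75 excellent, 0.40-0.75 fair to good, <0.40 poor."""
    if kappa > 0.75:
        return "excellent"
    return "fair to good" if kappa >= 0.40 else "poor"

# Hypothetical usage with per-epoch labels such as "W", "1", "2", "3", "4", "REM":
# print(interpret(cohen_kappa(hypnogram_uma, hypnogram_dlr)))
# print(stage_kappa(hypnogram_uma, hypnogram_udo, stage="1"))
```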
Conclusions
Overall, differences between the clinical and research centers were minor. Pairwise kappa comparisons between several centers/scorers, as well as Venn diagrams, can detect systematic deviations of single centers/scorers, which should then receive additional training; a sketch of such a Venn-style tally follows below. The revised AASM rules for sleep scoring will most likely increase inter-rater agreement, but future studies will have to confirm this.
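The Venn-diagram analysis counts, for a given stage, how many epochs were assigned to that stage by every possible combination of the three centers, so that epochs scored by one center alone stand out. A minimal sketch of such a tally is given below; the center names, data layout, and function name are assumptions for illustration, not the study's actual tooling.

```python
# Minimal sketch (illustrative only): Venn-style tally of which centers scored
# a given stage for each epoch. A large count for a single-center subset
# (e.g. one center scoring wake alone) flags a systematic deviation.
def venn_counts(hypnograms, stage):
    """hypnograms: dict mapping center name to a list of per-epoch stage labels."""
    centers = sorted(hypnograms)
    n_epochs = len(next(iter(hypnograms.values())))
    counts = {}
    for epoch in range(n_epochs):
        subset = frozenset(c for c in centers if hypnograms[c][epoch] == stage)
        if subset:                      # skip epochs no center scored as `stage`
            counts[subset] = counts.get(subset, 0) + 1
    return counts

# Hypothetical usage:
# counts = venn_counts({"UMA": uma, "DLR": dlr, "UDO": udo}, stage="W")
# counts.get(frozenset({"UDO"}), 0)   # epochs scored as wake by UDO alone
```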
References
Basner M, Isermann U, Samel A (2006) Aircraft noise effects on sleep: Application of the results of a large polysomnographic field study. J Acoust Soc Am 119(5):2772–2784
Basner M, Samel A (2005) Effects of nocturnal aircraft noise on sleep structure. Somnologie 9(2):84–95
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46
Collop NA (2002) Scoring variability between polysomnography technologists in different sleep laboratories. Sleep Med 3(1):43–47
Danker-Hopfe H, Herrmann W (2001) Interrater-Reliabilität visueller Schlafstadienklassifikation nach Rechtschaffen- und Kales-Regeln: Review und methodische Erwägungen. Klin Neurophysiol 32(2):89–99
Danker-Hopfe H, Kunz D, Gruber G, Klösch G, Lorenzo JL, Himanen SL, Kemp B, Penzel T, Röschke J, Dorn H, Schlögl A, Trenker E, Dorffner G (2004) Interrater reliability between scorers from eight European sleep laboratories in subjects with different sleep disorders. J Sleep Res 13(1):63–69
Ferri R, Ferri P, Colognola RM, Petrella MA, Musumeci SA, Bergonzi P (1989) Comparison between the results of an automatic and a visual scoring of sleep EEG recordings. Sleep 12(4):354–362
Fleiss J (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76:378–392
Griefahn B, Marks A, Robens S (2006) Noise emitted from road, rail and air traffic and their effects on sleep. J Sound Vib 295(1–2):129–140
Iber C, Ancoli-Israel S, Chesson A, Quan SF (2007) The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications, 1st edn. American Academy of Sleep Medicine, Westchester, Illinois
Kim Y, Kurachi M, Horita M, Matsuura K, Kamikawa Y (1993) Agreement of visual scoring of sleep stages among many laboratories in Japan: effect of a supplementary definition of slow wave on scoring of slow wave sleep. Jpn J Psychiatry Neurol 47(1):91–97
Kubicki S, Höller L, Berg I, Pastelak-Price C, Dorow R (1989) Sleep EEG evaluation: a comparison of results obtained by visual scoring and automatic analysis with the Oxford sleep stager. Sleep 12(2):140–149
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174
Norman RG, Pal I, Stewart C, Walsleben JA, Rapoport DM (2000) Interobserver agreement among sleep scorers from different centers in a large dataset. Sleep 23(7):901–908
Penzel T, Behler PG, von Buttlar M, Conradt R, Meier M, Möller A, Danker-Hopfe H (2003) Reliability of visual evaluation of sleep stages according to Rechtschaffen and Kales from eight polysomnographs by nine sleep centers. Somnologie 7(2):49–58
Perneger TV (1998) What’s wrong with Bonferroni adjustments. BMJ 316 (7139):1236–1238
Rechtschaffen A, Kales A, Berger RJ, Dement WC, Jacobsen A, Johnson LC, Jouvet M, Monroe LJ, Oswald I, Roffwarg HP, Roth B, Walter RD (1968) A manual of standardized terminology, techniques and scoring system for sleep stages of human subjects. Public Health Service, U.S. Government Printing Office, Washington, D.C.
Schaltenbrand N, Lengelle R, Toussaint M, Luthringer R, Carelli G, Jacqmin A, Lainey E, Muzet A, Macher JP (1996) Sleep stage scoring using the neural network model: comparison between visual and automatic analysis in normal subjects and patients. Sleep 19(1):26–35
Silber MH, Ancoli-Israel S, Bonnet MH, Chokroverty S, Grigg-Damberger MM, Hirshkowitz M, Kapen S, Keenan SA, Kryger MH, Penzel T, Pressman MR, Iber C (2007) The visual scoring of sleep in adults. J Clin Sleep Med 3(2):121–131
Whitney CW, Gottlieb DJ, Redline S, Norman RG, Dodge RR, Shahar E, Surovec S, Nieto FJ (1998) Reliability of scoring respiratory disturbance indices and sleep staging. Sleep 21(7):749–757
Cite this article
Basner, M., Griefahn, B. & Penzel, T. Inter-rater agreement in sleep stage classification between centers with different backgrounds. Somnologie 12, 75–84 (2008). https://doi.org/10.1007/s11818-008-0327-y