Interpreting interrater reliability coefficients of the Braden scale: A discussion paper
What is already known about the topic?
- Pearson's product–moment correlation coefficient and the overall proportion of agreement are the coefficients most frequently used to indicate the degree of interrater reliability of the overall Braden score.
- Interrater reliability of the individual items was calculated using Cohen's kappa.
- Interrater reliability of the Braden scale is assumed to be high.
What this paper adds
- Pearson's product–moment correlation and Cohen's kappa are inappropriate measures of the interrater reliability of the Braden scale.
- Published interrater reliability coefficients are not comparable with one another, so interpretation of the degree of interrater reliability of the Braden scale is limited.
- The intraclass correlation coefficient, in combination with the overall proportion of agreement, is recommended for quantifying the interrater reliability of single items and of the overall Braden score.
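The recommended combination can be made concrete with a minimal sketch. The rater scores below are invented for illustration (they are not data from any of the reviewed studies); the ICC implemented is the two-way random-effects, absolute-agreement, single-measures form often labelled ICC(2,1):

```python
# Minimal sketch with hypothetical data: overall proportion of agreement
# plus a two-way random-effects ICC (absolute agreement, single measures)
# for total Braden scores assigned by two raters.

def proportion_of_agreement(a, b):
    """Fraction of subjects to whom both raters assigned identical totals."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def icc_2_1(a, b):
    """ICC(2,1): two-way random effects, absolute agreement, two raters."""
    n, k = len(a), 2
    grand = (sum(a) + sum(b)) / (n * k)
    row_means = [(x + y) / k for x, y in zip(a, b)]   # per-subject means
    col_means = [sum(a) / n, sum(b) / n]              # per-rater means
    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in a + b)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical total Braden scores (range 6-23) for six subjects:
rater_a = [12, 15, 18, 9, 20, 14]
rater_b = [12, 14, 18, 10, 20, 14]

agreement = proportion_of_agreement(rater_a, rater_b)   # 4/6, i.e. about 0.67
icc = icc_2_1(rater_a, rater_b)                          # close to 1
```

Reporting both numbers is the point of the recommendation: the ICC captures consistency of the total scores, while the proportion of agreement shows how often the raters assigned exactly the same value.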
Search strategy
The international databases MEDLINE, CINAHL and EMBASE and the German database CARELIT were searched using the terms "Braden scale" and/or "pressure sore" or "pressure ulcer" in combination with "interrater reliability" and/or "studies". All relevant literature was examined. Whenever further authors or studies were cited and appeared relevant, those sources were also obtained and analysed.
Findings
A total of 31 studies were identified. All findings were assigned to two categories: the interrater reliability coefficients…
Definition of interrater reliability
"Interrater (interobserver) reliability is the degree to which two raters or observers, operating independently, assign the same ratings or values for an attribute being measured or observed" (Polit and Beck, 2004, p. 721). Although the meaning of interrater reliability is well known, Stemler (2004) emphasised that it is not a single unitary concept. As early as 1975, Tinsley and Weiss differentiated between interrater agreement and interrater reliability.
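The distinction drawn by Tinsley and Weiss can be illustrated with a small hypothetical example (the scores are invented, not taken from the paper): when one rater systematically scores a few points higher than the other, the two sets of ratings are perfectly *consistent*, so Pearson's r equals 1.0, yet the raters never *agree* on a single value. This is precisely why a correlation coefficient overstates interrater agreement.

```python
# Hypothetical illustration: rater B always scores 3 points higher than
# rater A.  Pearson's r is 1.0 (perfect consistency / "reliability"),
# but the proportion of exact agreement is 0.

def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((y - mb) ** 2 for y in b)
    return cov / (ssa * ssb) ** 0.5

rater_a = [10, 13, 16, 19, 22]
rater_b = [x + 3 for x in rater_a]   # constant systematic bias of +3

r = pearson_r(rater_a, rater_b)                                   # exactly 1.0
agreement = sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)  # 0.0
```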
Conclusion
Numerous statistical approaches have been used to calculate the interrater reliability of the Braden scale. Each interrater reliability coefficient yields a different amount and type of information; the published coefficients therefore cannot be compared with each other, and most of them were shown to be inappropriate for measuring the interrater reliability of the overall Braden score or of single items. Consequently, evaluation of the degree of interrater reliability on the basis of published data remains limited.
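One reason the published coefficients cannot be compared is that some of them depend on more than the raters' behaviour. A hypothetical 2×2 example (invented counts, not data from the reviewed studies) shows the well-known "high agreement but low kappa" paradox: two rater pairs with identical observed agreement of 90% can yield very different Cohen's kappa values simply because category prevalence differs.

```python
# Hypothetical 2x2 contingency tables: same 90% observed agreement,
# very different Cohen's kappa, because chance agreement (pe) depends
# on the marginal category prevalences.

def cohens_kappa(table):
    """table[i][j] = number of subjects rated category i by A and j by B."""
    m = len(table)
    total = sum(sum(row) for row in table)
    po = sum(table[i][i] for i in range(m)) / total          # observed agreement
    row_marg = [sum(row) / total for row in table]           # rater A prevalences
    col_marg = [sum(table[i][j] for i in range(m)) / total for j in range(m)]
    pe = sum(r * c for r, c in zip(row_marg, col_marg))      # chance agreement
    return (po - pe) / (1 - pe)

balanced = [[45, 5], [5, 45]]   # "at risk" prevalence 50%: kappa = 0.80
skewed   = [[85, 5], [5, 5]]    # "at risk" prevalence 90%: kappa ~ 0.44
```

Both tables give the same overall proportion of agreement (0.90), so reporting kappa alone, across samples with different risk prevalences, makes the coefficients look discrepant even when the raters behave identically.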
References (68)
- Pressure ulcer prevalence, incidence, risk factors, and impact. Clinics in Geriatric Medicine (1997)
- et al. A clinical trial of the Braden scale for predicting pressure sore risk. Nursing Clinics of North America (1987)
- et al. Bias, prevalence and kappa. Journal of Clinical Epidemiology (1993)
- et al. The effect of various combinations of turning and pressure reducing devices on the incidence of pressure ulcers. International Journal of Nursing Studies (2005)
- et al. High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology (1990)
- et al. Validity and reliability of the Braden scale and the influence of other risk factors: a multi-centre prospective study. International Journal of Nursing Studies (2000)
- et al. Sensitivity and specificity of the Braden scale in the cardiac surgical population. Journal of Wound Ostomy and Continence Nursing (2000)
- et al. The Braden scale for pressure ulcer risk: evaluating the predictive validity in Black and Latino/Hispanic elders. Applied Nursing Research (1999)
- et al. Risk assessment scales for pressure ulcers: a methodological review. International Journal of Nursing Studies (2007)
- et al. Predictive validity of the Braden scale and nurse perception in identifying pressure ulcer risk. Applied Nursing Research (1996)
- Parametric statistics and ordinal data: a pervasive misconception. Nursing Research
- Weak measurements vs. strong statistics: an empirical critique of S.S. Stevens' proscriptions on statistics. Educational and Psychological Measurement
- Pressure ulcer risk following critical traumatic injury. Advances in Wound Care
- Interpreting kappa values for two-observer nursing diagnosis data. Research in Nursing and Health
- Clinical application of the Braden scale in the acute-care setting. Dermatological Nursing
- Standardized quality-assessment system to evaluate pressure ulcer care in the nursing home. Journal of the American Geriatrics Society
- The cost of pressure ulcers in the UK. Age and Ageing
- The Braden scale for predicting pressure sore risk. Nursing Research
- Predicting pressure ulcer risk. Nursing Research
- The development of a national registration form to measure the prevalence of pressure ulcers in the Netherlands. Ostomy Wound Management
- The kappa statistic for establishing interrater reliability in the secondary analysis of qualitative clinical data. Research in Nursing and Health
- The Braden scale: a review of the research evidence. Orthopaedic Nursing
- Pressure ulcer risk in long-term units: prevalence and associated factors. Journal of Advanced Nursing
- Predicting the risk of pressure ulcers in critically ill patients. American Journal of Critical Care
- A coefficient of agreement for nominal scales. Educational and Psychological Measurement
- Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin
- Pressure ulcers: validation of two risk assessment scales. Journal of Clinical Nursing
- Nursing-care dependency: development of an assessment scale for demented and mentally handicapped patients. Scandinavian Journal of Caring Sciences
- A note on estimating the reliability of categorical data. Educational and Psychological Measurement
- The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement
- Statistical Methods for Rates and Proportions
- Scales and statistics. Review of Educational Research
- Reliability in category coding systems. Nursing Research