Interpreting interrater reliability coefficients of the Braden scale: A discussion paper

https://doi.org/10.1016/j.ijnurstu.2007.08.001

Abstract

There are many studies investigating the psychometric properties of the Braden scale, a scale that predicts the risk of pressure ulcers. The main focus of these studies is validity rather than reliability. A literature review of how the degree of interrater reliability was estimated revealed that numerous statistical approaches and coefficients were used (Pearson's product-moment correlation, Cohen's kappa, overall percentage of agreement, intraclass correlation). These coefficients were calculated for the individual items and for the overall Braden score, and they were used inconsistently. The advantages and limitations of each coefficient are discussed, and it is concluded that most of them are inappropriate measures. Consequently, the degree of interrater reliability of the Braden scale can be estimated from published data only to a limited extent. It is shown that the intraclass correlation coefficient is an appropriate statistical approach for calculating the interrater reliability of the Braden scale. It is recommended that intraclass correlation coefficients be reported in combination with the overall percentage of agreement.
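One reason why unweighted Cohen's kappa is a questionable choice for the individual Braden items, which are ordinal, is that it counts only exact matches: a disagreement of one category and a disagreement of three categories are penalised identically. The following minimal Python sketch illustrates this point; the ratings are invented and the cohens_kappa helper is written here for illustration, not taken from the paper.

```python
import numpy as np

def cohens_kappa(a, b, categories=(1, 2, 3, 4)):
    """Unweighted Cohen's kappa (Cohen, 1960) for two raters."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)                          # observed proportion of agreement
    p_e = sum(np.mean(a == c) * np.mean(b == c)    # chance-expected agreement from
              for c in categories)                 # the raters' marginal proportions
    return (p_o - p_e) / (1 - p_e)

# Invented ratings of a single Braden item (ordinal categories 1-4) by two raters.
rater_a  = [1, 1, 2, 2, 3, 3, 4, 4]
off_by_1 = [1, 2, 2, 3, 3, 4, 4, 3]   # every disagreement is one category apart
far_off  = [1, 4, 2, 4, 3, 1, 4, 1]   # disagreements are 2-3 categories apart

print(f"{cohens_kappa(rater_a, off_by_1):.2f}")   # 0.33
print(f"{cohens_kappa(rater_a, far_off):.2f}")    # 0.33 -- ordering is ignored
```

Both comparisons yield kappa = 0.33, although the second pair of ratings is clinically far more discrepant, because the unweighted statistic treats the ordered item categories as purely nominal. Weighted kappa (Cohen, 1968) can take the ordering into account, at the price of weights that must be chosen and reported.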

Section snippets

What is already known about the topic?

  • Pearson's product-moment correlation coefficient and the overall proportion of agreement are the most frequently used coefficients for indicating the degree of interrater reliability of the overall Braden score.

  • Interrater reliability of the individual items was calculated using Cohen's kappa.

  • It is assumed that interrater reliability for the Braden scale is high.

What this paper adds

  • Pearson's product-moment correlation and Cohen's kappa are inappropriate measures of the interrater reliability of the Braden scale.

  • Published interrater reliability coefficients are not comparable, and the degree of interrater reliability of the Braden scale can therefore be interpreted only to a limited extent.

  • The intraclass correlation coefficient in combination with the overall proportion of agreement is recommended for calculating the degree of interrater reliability for single items and for the overall Braden score (a computational sketch follows this list).
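The excerpt above does not specify which form of the intraclass correlation coefficient is meant; as a hedged illustration, the sketch below uses ICC(2,1) in the terminology of Shrout and Fleiss (1979), i.e. two-way random effects, absolute agreement, single rater, a form that is sensitive to systematic differences between raters. The scores and helper functions are invented for illustration only.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1) after Shrout and Fleiss (1979): two-way random effects,
    absolute agreement, single rater. `scores` is (n_subjects, n_raters)."""
    n, k = scores.shape
    grand = scores.mean()
    rows = scores.mean(axis=1, keepdims=True)    # per-subject means
    cols = scores.mean(axis=0, keepdims=True)    # per-rater means

    ms_r = k * np.sum((rows - grand) ** 2) / (n - 1)    # between-subject mean square
    ms_c = n * np.sum((cols - grand) ** 2) / (k - 1)    # between-rater mean square
    ms_e = np.sum((scores - rows - cols + grand) ** 2) / ((n - 1) * (k - 1))

    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

def overall_agreement(scores):
    """Proportion of subjects on whom all raters assign the identical score."""
    return np.mean(np.all(scores == scores[:, :1], axis=1))

# Invented overall Braden scores (possible range 6-23), six patients, two raters.
scores = np.array([[12, 13],
                   [15, 15],
                   [18, 18],
                   [ 9, 10],
                   [21, 21],
                   [14, 14]], dtype=float)

print(f"ICC(2,1):          {icc_2_1(scores):.2f}")            # ~0.99
print(f"Overall agreement: {overall_agreement(scores):.2f}")  # ~0.67
```

The two numbers carry complementary information: the ICC shows how consistently the scores distinguish between patients while penalising systematic rater differences, whereas the overall proportion of agreement shows how often the raters actually assigned identical scores, which is why the paper recommends reporting both.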

Search strategy

The international databases MEDLINE, CINAHL, EMBASE and the German database CARELIT were searched using the terms Braden scale and/or pressure sore or ulcer in combination with interrater reliability and/or studies. All relevant literature was examined. Whenever further authors or studies were cited and appeared relevant, they were obtained and analysed as well.

Findings

A total of 31 studies were identified. All findings were assigned to two categories: the interrater reliability coefficients …

Definition of interrater reliability

“Interrater (interobserver) reliability is the degree to which two raters or observers, operating independently, assign the same ratings or values for an attribute being measured or observed” (Polit and Beck, 2004, p. 721). Although the meaning of interrater reliability appears well known, Stemler (2004) has emphasised that it is not a single unitary concept. As early as 1975, Tinsley and Weiss differentiated between interrater agreement and interrater reliability: agreement concerns the extent to which raters assign identical values, whereas reliability concerns the extent to which raters order the rated subjects consistently.
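To make the distinction concrete, the following minimal sketch (invented data) shows two raters whose overall Braden scores differ by exactly two points on every patient: a correlation-based coefficient indicates perfect reliability even though the raters never agree.

```python
import numpy as np
from scipy.stats import pearsonr

# Invented overall Braden scores; rater B is systematically 2 points higher.
rater_a = np.array([12, 15, 18, 9, 21, 14])
rater_b = rater_a + 2

r, _ = pearsonr(rater_a, rater_b)
print(f"Pearson's r:             {r:.2f}")                            # 1.00
print(f"Proportion of agreement: {np.mean(rater_a == rater_b):.2f}")  # 0.00
```

Because Pearson's r is blind to such systematic shifts, it can indicate perfect interrater reliability in the complete absence of interrater agreement, which is the core of the argument against using it for the Braden scale.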

Conclusion

Numerous statistical approaches have been used to calculate the interrater reliability of the Braden scale. Each interrater reliability coefficient yields a different amount and type of information; the published coefficients therefore cannot be compared with each other, and it was shown that most of them are not appropriate for measuring the interrater reliability of the overall Braden score or of single items. Consequently, the degree of interrater reliability can so far be evaluated from published data only to a limited extent.

Although …

References (68)

  • G.D. Armstrong

Parametric statistics and ordinal data: a pervasive misconception

    Nursing Research

    (1981)
  • B.O. Baker et al.

    Weak measurements vs. strong statistics: an empirical critique of S.S. Stevens’ proscriptions on statistics

    Educational and Psychological Measurement

    (1966)
  • K.M. Baldwin et al.

    Pressure ulcer risk following critical traumatic injury

    Advances in Wound Care

    (1998)
  • M. Banerjee et al.

    Interpreting kappa values for two-observer nursing diagnosis data

    Research in Nursing and Health

    (1997)
  • D. Barnes et al.

    Clinical application of the Braden scale in the acute-care setting

    Dermatological Nursing

    (1993)
  • B.M. Bates-Jensen et al.

    Standardized quality-assessment system to evaluate pressure ulcer care in the nursing home

    Journal of the American Geriatrics Society

    (2003)
  • G. Bennett et al.

    The cost of pressure ulcers in the UK

    Age and Ageing

    (2004)
  • N. Bergstrom et al.

    The Braden scale for predicting pressure sore risk

    Nursing Research

    (1987)
  • N. Bergstrom et al.

    Predicting pressure ulcer risk

    Nursing Research

    (1998)
  • G. Bours et al.

    The development of a national registration form to measure the prevalence of pressure ulcers in the Netherlands

    Ostomy Wound Management

    (1999)
  • P.F. Brennan et al.

    The kappa statistic for establishing interrater reliability in the secondary analysis of qualitative clinical data

    Research in Nursing and Health

    (1992)
  • S.J. Brown

    The Braden scale: a review of the research evidence

    Orthopaedic Nursing

    (2004)
  • A. Capon et al.

    Pressure ulcer risk in long-term units: prevalence and associated factors

    Journal of Advanced Nursing

    (2007)
  • E.V. Carlson et al.

    Predicting the risk of pressure ulcers in critically ill patients

    American Journal of Critical Care

    (1999)
  • J. Cohen

    A coefficient of agreement for nominal scales

    Educational and Psychological Measurement

    (1960)
  • J. Cohen

    Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit

    Psychological Bulletin

    (1968)
  • T. Defloor et al.

    Pressure ulcers: validation of two risk assessment scales

    Journal of Clinical Nursing

    (2005)
  • A. Dijkstra et al.

    Nursing-care dependency. Development of an assessment scale for demented and mentally handicapped patients

    Scandinavian Journal of Caring Sciences

    (1996)
  • European Pressure Ulcer Advisory Panel, 1998. Pressure ulcer prevention guidelines. Available at:...
  • R.H. Finn

    A note on estimating the reliability of categorical data

Educational and Psychological Measurement

    (1970)
  • J.L. Fleiss et al.

    The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability

    Educational and Psychological Measurement

    (1973)
  • J.L. Fleiss et al.

    Statistical Methods for Rates and Proportions

    (2003)
  • P.L. Gardner

    Scales and statistics

    Review of Educational Research

    (1975)
  • B.J. Garvin et al.

    Reliability in category coding systems

    Nursing Research

    (1988)
Cited by (33)

    • Validation and clinical impact of paediatric pressure ulcer risk assessment scales: A systematic review

      2013, International Journal of Nursing Studies
      Citation Excerpt:

      Willock et al. (2008) and Gordon (2008, 2009) calculated proportions of agreement, kappa and ICC coefficients that were appropriate statistical measures (Lucas et al., 2010). Huffines and Logsdon (1997) and Suddaby et al. (2005) used Pearson's r, which is inappropriate for indicating reliability (Kottner and Dassen, 2008; Lucas et al., 2010). Study characteristics and results are shown in Table 5.

    • Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed

      2011, International Journal of Nursing Studies
      Citation Excerpt:

      The treatment of sampling errors because of different raters is crucial for the appropriate selection of an ICC (Shrout and Fleiss, 1979; McGraw and Wong, 1996). Moreover, although the ICC is reported in many research reports, it is often not clear which ICC was used (Kottner and Dassen, 2008a; Bhat and Rockwood, 2005). When continuous measurements are split into distinct categories (see item 2), it is recommended that results be calculated for the continuous measurement as well, because the transformation of continuous values into categories may cause difficulties in interpretation and lead to a reduction in statistical power (Colle et al., 2002; Shoukri et al., 2004; Donner and Eliasziw, 1994).

    • Guidelines for reporting reliability and agreement studies (GRRAS) were proposed

      2011, Journal of Clinical Epidemiology
