Interpreting interrater reliability coefficients of the Braden scale: A discussion paper

https://doi.org/10.1016/j.ijnurstu.2007.08.001

Abstract

There are many studies investigating the psychometric properties of the Braden scale, a scale that predicts the risk of pressure ulcers. The main focus of these studies is validity rather than reliability. A literature review of how the degree of interrater reliability was estimated revealed that numerous statistical approaches and coefficients were used (Pearson's product-moment correlation, Cohen's kappa, overall percentage of agreement, intraclass correlation). These coefficients were calculated for the individual items and for the overall Braden score, and they were used inconsistently. The advantages and limitations of each coefficient are discussed, and it is concluded that most of them are inappropriate measures. Consequently, the degree of interrater reliability of the Braden scale can be estimated from published data only to a limited extent. It is shown that the intraclass correlation coefficient is an appropriate statistical approach for calculating the interrater reliability of the Braden scale. It is recommended that intraclass correlation coefficients be reported in combination with the overall percentage of agreement.
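One reason why unweighted Cohen's kappa is a questionable choice for the individual Braden items, which are ordinal, is that it counts only exact matches: a disagreement of one category and a disagreement of three categories are penalised identically. The following minimal Python sketch illustrates this point; the ratings are invented and the cohens_kappa helper is written here for illustration, not taken from the paper.

```python
import numpy as np

def cohens_kappa(a, b, categories=(1, 2, 3, 4)):
    """Unweighted Cohen's kappa (Cohen, 1960) for two raters."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)                          # observed proportion of agreement
    p_e = sum(np.mean(a == c) * np.mean(b == c)    # chance-expected agreement from
              for c in categories)                 # the raters' marginal proportions
    return (p_o - p_e) / (1 - p_e)

# Invented ratings of a single Braden item (ordinal categories 1-4) by two raters.
rater_a  = [1, 1, 2, 2, 3, 3, 4, 4]
off_by_1 = [1, 2, 2, 3, 3, 4, 4, 3]   # every disagreement is one category apart
far_off  = [1, 4, 2, 4, 3, 1, 4, 1]   # disagreements are 2-3 categories apart

print(f"{cohens_kappa(rater_a, off_by_1):.2f}")   # 0.33
print(f"{cohens_kappa(rater_a, far_off):.2f}")    # 0.33 -- ordering is ignored
```

Both comparisons yield kappa = 0.33, although the second pair of ratings is clinically far more discrepant, because the unweighted statistic treats the ordered item categories as purely nominal. Weighted kappa (Cohen, 1968) can take the ordering into account, at the price of weights that must be chosen and reported.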

Section snippets

What is already known about the topic?

  • Pearson's product-moment correlation coefficient and the overall proportion of agreement are the most frequently used coefficients for indicating the degree of interrater reliability of the overall Braden score.

  • Interrater reliability of the individual items was calculated using Cohen's kappa.

  • It is assumed that interrater reliability for the Braden scale is high.

What this paper adds

  • Pearson's product-moment correlation and Cohen's kappa are inappropriate measures of the interrater reliability of the Braden scale.

  • Published interrater reliability coefficients are not comparable, and the degree of interrater reliability of the Braden scale can therefore be interpreted only to a limited extent.

  • The intraclass correlation coefficient in combination with the overall proportion of agreement is recommended for calculating the degree of interrater reliability for single items and for the overall Braden score (a computational sketch follows this list).
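The excerpt above does not specify which form of the intraclass correlation coefficient is meant; as a hedged illustration, the sketch below uses ICC(2,1) in the terminology of Shrout and Fleiss (1979), i.e. two-way random effects, absolute agreement, single rater, a form that is sensitive to systematic differences between raters. The scores and helper functions are invented for illustration only.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1) after Shrout and Fleiss (1979): two-way random effects,
    absolute agreement, single rater. `scores` is (n_subjects, n_raters)."""
    n, k = scores.shape
    grand = scores.mean()
    rows = scores.mean(axis=1, keepdims=True)    # per-subject means
    cols = scores.mean(axis=0, keepdims=True)    # per-rater means

    ms_r = k * np.sum((rows - grand) ** 2) / (n - 1)    # between-subject mean square
    ms_c = n * np.sum((cols - grand) ** 2) / (k - 1)    # between-rater mean square
    ms_e = np.sum((scores - rows - cols + grand) ** 2) / ((n - 1) * (k - 1))

    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

def overall_agreement(scores):
    """Proportion of subjects on whom all raters assign the identical score."""
    return np.mean(np.all(scores == scores[:, :1], axis=1))

# Invented overall Braden scores (possible range 6-23), six patients, two raters.
scores = np.array([[12, 13],
                   [15, 15],
                   [18, 18],
                   [ 9, 10],
                   [21, 21],
                   [14, 14]], dtype=float)

print(f"ICC(2,1):          {icc_2_1(scores):.2f}")            # ~0.99
print(f"Overall agreement: {overall_agreement(scores):.2f}")  # ~0.67
```

The two numbers carry complementary information: the ICC shows how consistently the scores distinguish between patients while penalising systematic rater differences, whereas the overall proportion of agreement shows how often the raters actually assigned identical scores, which is why the paper recommends reporting both.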

Search strategy

The international databases MEDLINE, CINAHL, EMBASE and the German database CARELIT were searched using the terms Braden scale and/or pressure sore or ulcer in combination with interrater reliability and/or studies. All relevant literature was examined. Whenever further authors or studies were cited and appeared relevant, they were obtained and analysed as well.

Findings

A total of 31 studies were identified. All findings were assigned to two categories: the interrater reliability coefficients …

Definition of interrater reliability

“Interrater (interobserver) reliability is the degree to which two raters or observers, operating independently, assign the same ratings or values for an attribute being measured or observed” (Polit and Beck, 2004, p. 721). Although the meaning of interrater reliability appears well known, Stemler (2004) has emphasised that it is not a single unitary concept. As early as 1975, Tinsley and Weiss differentiated between interrater agreement and interrater reliability: agreement concerns the extent to which raters assign identical values, whereas reliability concerns the extent to which raters order the rated subjects consistently.
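To make the distinction concrete, the following minimal sketch (invented data) shows two raters whose overall Braden scores differ by exactly two points on every patient: a correlation-based coefficient indicates perfect reliability even though the raters never agree.

```python
import numpy as np
from scipy.stats import pearsonr

# Invented overall Braden scores; rater B is systematically 2 points higher.
rater_a = np.array([12, 15, 18, 9, 21, 14])
rater_b = rater_a + 2

r, _ = pearsonr(rater_a, rater_b)
print(f"Pearson's r:             {r:.2f}")                            # 1.00
print(f"Proportion of agreement: {np.mean(rater_a == rater_b):.2f}")  # 0.00
```

Because Pearson's r is blind to such systematic shifts, it can indicate perfect interrater reliability in the complete absence of interrater agreement, which is the core of the argument against using it for the Braden scale.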

Conclusion

Numerous statistical approaches have been used to calculate the interrater reliability of the Braden scale. Each interrater reliability coefficient yields a different amount and type of information; the published coefficients therefore cannot be compared with each other, and it was shown that most of them are not appropriate for measuring the interrater reliability of the overall Braden score or of single items. Consequently, the degree of interrater reliability can so far be evaluated from published data only to a limited extent.

Although …

References (68)

  • G.D. Armstrong

Parametric statistics and ordinal data: a pervasive misconception

    Nursing Research

    (1981)
  • B.O. Baker et al.

    Weak measurements vs. strong statistics: an empirical critique of S.S. Stevens’ proscriptions on statistics

    Educational and Psychological Measurement

    (1966)
  • K.M. Baldwin et al.

    Pressure ulcer risk following critical traumatic injury

    Advances in Wound Care

    (1998)
  • M. Banerjee et al.

    Interpreting kappa values for two-observer nursing diagnosis data

    Research in Nursing and Health

    (1997)
  • D. Barnes et al.

    Clinical application of the Braden scale in the acute-care setting

    Dermatological Nursing

    (1993)
  • B.M. Bates-Jensen et al.

    Standardized quality-assessment system to evaluate pressure ulcer care in the nursing home

    Journal of the American Geriatrics Society

    (2003)
  • G. Bennett et al.

    The cost of pressure ulcers in the UK

    Age and Ageing

    (2004)
  • N. Bergstrom et al.

    The Braden scale for predicting pressure sore risk

    Nursing Research

    (1987)
  • N. Bergstrom et al.

    Predicting pressure ulcer risk

    Nursing Research

    (1998)
  • G. Bours et al.

    The development of a national registration form to measure the prevalence of pressure ulcers in the Netherlands

    Ostomy Wound Management

    (1999)
  • P.F. Brennan et al.

    The kappa statistic for establishing interrater reliability in the secondary analysis of qualitative clinical data

    Research in Nursing and Health

    (1992)
  • S.J. Brown

    The Braden scale: a review of the research evidence

    Orthopaedic Nursing

    (2004)
  • A. Capon et al.

    Pressure ulcer risk in long-term units: prevalence and associated factors

    Journal of Advanced Nursing

    (2007)
  • E.V. Carlson et al.

    Predicting the risk of pressure ulcers in critically ill patients

    American Journal of Critical Care

    (1999)
  • J. Cohen

    A coefficient of agreement for nominal scales

    Educational and Psychological Measurement

    (1960)
  • J. Cohen

    Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit

    Psychological Bulletin

    (1968)
  • T. Defloor et al.

    Pressure ulcers: validation of two risk assessment scales

    Journal of Clinical Nursing

    (2005)
  • A. Dijkstra et al.

    Nursing-care dependency. Development of an assessment scale for demented and mentally handicapped patients

    Scandinavian Journal of Caring Sciences

    (1996)
  • European Pressure Ulcer Advisory Panel, 1998. Pressure ulcer prevention guidelines. Available at:...
  • R.H. Finn

    A note on estimating the reliability of categorical data

Educational and Psychological Measurement

    (1970)
  • J.L. Fleiss et al.

    The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability

    Educational and Psychological Measurement

    (1973)
  • J.L. Fleiss et al.

    Statistical Methods for Rates and Proportions

    (2003)
  • P.L. Gardner

    Scales and statistics

    Review of Educational Research

    (1975)
  • B.J. Garvin et al.

    Reliability in category coding systems

    Nursing Research

    (1988)
Cited by (33)

    • Validation and clinical impact of paediatric pressure ulcer risk assessment scales: A systematic review

      2013, International Journal of Nursing Studies
      Citation Excerpt:

      Willock et al. (2008) and Gordon (2008, 2009) calculated proportions of agreement, kappa and ICC coefficients that were appropriate statistical measures (Lucas et al., 2010). Huffines and Logsdon (1997) and Suddaby et al. (2005) used Pearson's r, which is inappropriate for indicating reliability (Kottner and Dassen, 2008; Lucas et al., 2010). Study characteristics and results are shown in Table 5.

    • Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed

      2011, International Journal of Nursing Studies
      Citation Excerpt:

      The treatment of sampling errors because of different raters is crucial for the appropriate selection of an ICC (Shrout and Fleiss, 1979; McGraw and Wong, 1996). Moreover, although the ICC is reported in many research reports, it is often not clear which ICC was used (Kottner and Dassen, 2008a; Bhat and Rockwood, 2005). When continuous measurements are split into distinct categories (see item 2), it is recommended that results be calculated for the continuous measurement as well, because the transformation of continuous values into categories may cause difficulties in interpretation and lead to a reduction in statistical power (Colle et al., 2002; Shoukri et al., 2004; Donner and Eliasziw, 1994).

    • Guidelines for reporting reliability and agreement studies (GRRAS) were proposed

      2011, Journal of Clinical Epidemiology
