Study of the role of different severity scores in respiratory ICU



Mona Mansourᵃ, Iman Galalᵃ, Enas Kassemᵇ

Background
Scoring systems are increasingly used in ICUs in an attempt to predict mortality accurately in critically ill patients.

Objective
The performance of the Acute Physiology and Chronic Health Evaluation II (APACHE II) score, the Sequential Organ Failure Assessment (SOFA) score, and the Simplified Acute Physiology Score (SAPS) II was compared in terms of calibration and discrimination in critically ill patients admitted to the respiratory ICU.

Materials and methods
Mean admission APACHE II, SAPS II, and SOFA scores were compared in 105 patients. The outcome measure was ICU mortality. The discriminatory ability of the scores was evaluated using the area under the receiver operating characteristic curve. Calibration was tested using the Hosmer-Lemeshow goodness-of-fit test.

Results
The mean admission APACHE II, SAPS II, and SOFA scores were higher in nonsurvivors than in survivors; yet, only the admission SOFA score differed significantly. There was a highly significant positive correlation between the three scores. The cutoffs obtained by the receiver operating characteristic curve were 11 for APACHE II, 7.5 for SOFA, and 40 for SAPS II. The discrimination power of the three scores was poor; in order of best discrimination, SOFA [area under the curve (AUC) = 0.63] was followed by APACHE II (AUC = 0.60) and then SAPS II (AUC = 0.59). In terms of calibration, SAPS II (χ² = 4.82; P = 0.78) had the best calibration and APACHE II (χ² = 7.34; P = 0.39) the worst. Logistic regression analysis showed that, of the three scores, only the SOFA score was an independent predictor of mortality among respiratory ICU patients; each 1-point increase in the SOFA score was associated with a 1.2-fold increase in the odds of mortality.

Introduction
The high complexity of ICU services and the clinical condition of the patients themselves make accurate prognosis fundamentally important, not only for patients, their families, and physicians, but also for hospital administrators, fund providers, and controllers [1].
Severity scoring systems for critically ill patients in ICUs were first introduced in 1980. They were developed to provide information on patient prognosis, the efficacy of therapeutic interventions, stratification for clinical studies, workload, and benchmarking of ICUs [2].
Over the last three decades, several scoring systems have been developed; the Acute Physiology and Chronic Health Evaluation (APACHE) [3,4] and the Simplified Acute Physiology Score (SAPS) [5] are the most widely used scoring systems in the ICU.
The APACHE II score comprises the patient's age, chronic health condition, and physiological variables. Although APACHE II was one of the first systems described, it is still the most widely used, because the data required for its calculation are simple, well defined, and reproducible, and can be collected routinely during intensive care [1].
The Sepsis-Related Organ Failure Assessment (SOFA) score was developed to describe objectively and quantitatively the degree of organ dysfunction over time and to evaluate morbidity in ICU patients with sepsis [6]. When it was later realized that it could be applied equally well to nonseptic patients, the acronym 'SOFA' was taken to stand for Sequential Organ Failure Assessment [7]. The SOFA scheme assigns 0–4 points daily to each of six organ systems according to the level of dysfunction: respiratory, circulatory, renal, hematologic, hepatic, and central nervous system.
SAPS II is a standardized, internationally accepted system for assessing severity and prognosis in patients hospitalized in the ICU. Twelve acute physiological variables are scored, in addition to age, type of admission, and the presence of chronic disease. The final score, the sum of the variable scores (higher scores corresponding to more severe patient conditions), is converted through a logistic regression equation into a probability of hospital mortality [5].
This study aimed to evaluate the performance of different scoring systems, in terms of calibration and discrimination, in predicting patient outcome in the respiratory ICU (RICU).

Materials and methods
In this prospective observational study, all consecutive patients admitted to the RICU of Abbassia Chest Hospital were enrolled; in all, 105 patients were studied. Demographic data, admission diagnosis, comorbidities, and outcome were recorded. For all patients, APACHE II, SOFA, and SAPS II scores were determined on the day of admission to the RICU. The study was approved by the institutional ethics committee.

Statistical analysis
Parametric data were expressed as minimum, maximum, and mean ± SD, and categorical data as number and percentage of the total. Student's t-test was used to compare the means of continuous measurements. Correlation between two parameters was determined using Pearson's correlation coefficient. The predictive capability of the three scores at the best cutoffs was assessed using the receiver operating characteristic (ROC) curve. Discrimination was tested using the ROC curves by evaluating the area under the curve (AUC). Observed and predicted mortality were compared using the Hosmer-Lemeshow goodness-of-fit test, in which lower χ² values and higher P values (>0.05) indicate good fit. Stepwise logistic regression analysis was used to estimate the ability of the APACHE II, SOFA, and SAPS II scoring systems to predict outcome, with mortality as the dependent variable and the three scores as the potential independent variables. Statistical significance was set at P < 0.05. Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS for Windows, version 16.0; SPSS Inc., Chicago, Illinois, USA).

Results
The study enrolled a total of 105 patients; the mean ± SD age was 54.59 ± 15.75 years (range 20–88 years), and 71 (68%) patients were male and 34 (32%) were female. On admission to the RICU, the mean ± SD APACHE II, SOFA, and SAPS II scores were 16.07 ± 7.31, 4.95 ± 2.49, and 41.17 ± 11.93, respectively. Descriptive data of the included patients are displayed in Table 1.
The mean ± SD admission SOFA score was the only score that differed significantly between survivors and nonsurvivors (4.95 ± 2.49 vs. 6.11 ± 2.76, respectively; P = 0.028). Although the mean ± SD admission APACHE II (16.07 ± 7.31 vs. 18.77 ± 7.55; P = 0.07) and SAPS II (41.17 ± 11.93 vs. 46.23 ± 15.37; P = 0.068) scores did not differ significantly between survivors and nonsurvivors, they were higher in nonsurvivors (Table 2). An ROC curve was constructed for each score with respect to outcome, and the accuracy and AUC were obtained. The ability of the various scores to discriminate between survivors and nonsurvivors, as assessed by the AUC, is given in Table 3 and Fig. 1.
There was a highly significant positive correlation between the scoring systems (P < 0.01), as assessed by linear regression analysis. The closest correlation was observed between APACHE II and SAPS II scores (r² = 0.78), followed by SOFA and SAPS II scores (r² = 0.68), whereas the weakest correlation was observed between APACHE II and SOFA scores (r² = 0.61) (Fig. 2).
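The Statistical analysis names Pearson's correlation coefficient, whereas the Results report r² from linear regression. For a simple linear regression with an intercept, r² is the square of Pearson's r, so the reported r² = 0.78 for APACHE II versus SAPS II corresponds to r ≈ 0.88 if the association is positive. The minimal Python sketch below computes both from paired scores. The function name `pearson_r` and the example scores are hypothetical, not the study's data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient between paired values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 pairs)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical paired admission scores for five patients (not study data)
    apache2 = [8, 12, 15, 21, 27]
    saps2 = [25, 33, 38, 50, 58]
    r = pearson_r(apache2, saps2)
    print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
    # Converting a reported r^2 back to r, assuming a positive association
    print(f"r for r^2 = 0.78: {math.sqrt(0.78):.2f}")
```

Note that a high r² between two scores shows that they rank patients similarly. It says nothing about whether either score predicts mortality well.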
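The ROC analysis can be sketched without a statistics package. Below, the AUC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen nonsurvivor has a higher score than a randomly chosen survivor, with ties counted as one half. An AUC of 0.5 means no discrimination and 1.0 means perfect discrimination, which puts the reported AUCs of 0.59 to 0.63 in context. The paper says only that the cutoffs "simultaneously considered the best sensitivity and specificity". Maximizing Youden's index J = sensitivity + specificity - 1 is one common way to do that, and it is an assumption here. Candidate cutoffs are taken as midpoints between consecutive observed values, which is how half-integer cutoffs such as 7.5 for SOFA can arise. The function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], died: Sequence[int]) -> float:
    """AUC in rank form: P(nonsurvivor score > survivor score), ties counted as 1/2."""
    pos = [s for s, d in zip(scores, died) if d == 1]
    neg = [s for s, d in zip(scores, died) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one survivor and one nonsurvivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float], died: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    Patients with score >= cutoff are classified as predicted deaths.
    Candidate cutoffs are midpoints between consecutive distinct scores.
    """
    n_pos = sum(died)
    n_neg = len(died) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both outcomes")
    values: List[float] = sorted(set(scores))
    if len(values) < 2:
        raise ValueError("need at least two distinct scores")
    best = None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        tp = sum(1 for s, d in zip(scores, died) if s >= cut and d == 1)
        tn = sum(1 for s, d in zip(scores, died) if s < cut and d == 0)
        # J scaled by n_pos * n_neg, kept in integers so ties compare exactly;
        # on a tie the lower cutoff (higher sensitivity) is kept
        j_scaled = tp * n_neg + tn * n_pos - n_pos * n_neg
        if best is None or j_scaled > best[0]:
            best = (j_scaled, cut, tp / n_pos, tn / n_neg)
    _, cut, sens, spec = best
    return cut, sens, spec


if __name__ == "__main__":
    # Hypothetical admission SOFA scores and outcomes (1 = died in ICU)
    sofa = [2, 3, 4, 4, 5, 6, 6, 7, 8, 9]
    died = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
    print("AUC:", roc_auc(sofa, died))
    print("cutoff, sensitivity, specificity:", youden_cutoff(sofa, died))
```

For these toy data the AUC is 22/25 = 0.88, and the chosen cutoff is 5.5, with a sensitivity and specificity of 0.8 each. A real analysis would also report a confidence interval for each AUC.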
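Calibration was assessed with the Hosmer-Lemeshow test. The minimal sketch below works as follows. Patients are sorted by predicted probability of death and split into groups (conventionally ten "deciles of risk"). In each group of n patients, the observed deaths O and the expected deaths E (the sum of predicted probabilities) contribute (O - E)² / (E(1 - E/n)). The sum is then referred to a chi-square distribution with (groups - 2) degrees of freedom. The equal-size split used here is an assumption: SPSS may group tied probabilities differently, so the sketch will not necessarily reproduce the published χ² values. As an arithmetic check, the chi-square tail probability for χ² = 4.82 with 8 degrees of freedom is about 0.78, matching the P value reported for SAPS II. The helper names `chi2_sf` and `hosmer_lemeshow` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Computed as 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def hosmer_lemeshow(prob: Sequence[float], died: Sequence[int],
                    groups: int = 10) -> Tuple[float, int, float]:
    """Return (chi-square statistic, degrees of freedom, P value)."""
    pairs = sorted(zip(prob, died))  # order patients by predicted risk
    n = len(pairs)
    stat = 0.0
    used = 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected deaths E
        observed = sum(d for _, d in chunk)  # observed deaths O
        mean_p = expected / size
        if not 0.0 < mean_p < 1.0:
            raise ValueError("group mean probability must lie strictly between 0 and 1")
        stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
        used += 1
    df = used - 2
    if df < 1:
        raise ValueError("need at least three non-empty groups")
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    print(round(chi2_sf(4.82, 8), 2))  # 0.78
    # Hypothetical predicted probabilities and outcomes for six patients, three groups
    p = [0.1, 0.2, 0.4, 0.5, 0.7, 0.8]
    y = [0, 0, 1, 0, 1, 1]
    print(hosmer_lemeshow(p, y, groups=3))
```

A non-significant P value means only that miscalibration was not detected. With 105 patients the test has limited power, which echoes the sample-size limitation noted in the Discussion.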
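The statement that each 1-point increase in the SOFA score carries a 1.2-fold higher odds of death is the exponentiated logistic regression coefficient, exp(β), for that score. The study used stepwise selection in SPSS over three candidate predictors. The sketch below is deliberately simpler: a single-predictor logistic model fitted by Newton-Raphson on hypothetical toy data, to show where exp(β) comes from. It is not the authors' procedure. Because the model is linear on the log-odds scale, per-unit odds ratios compound multiplicatively: a 3-point SOFA difference corresponds to 1.2³ ≈ 1.73 times the odds. The names `fit_logistic_1d` and `odds_ratio_for_increase` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_logistic_1d(x: Sequence[float], y: Sequence[int],
                    max_iter: int = 100, tol: float = 1e-10) -> Tuple[float, float]:
    """Maximum-likelihood fit of logit P(death) = b0 + b1 * x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Score (gradient of the log-likelihood)
            g0 += yi - p
            g1 += (yi - p) * xi
            # Fisher information matrix entries
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("information matrix is singular")
        # Newton step: beta += inverse(information) @ score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def odds_ratio_for_increase(or_per_unit: float, k: float) -> float:
    """Odds ratio for a k-unit increase, given the odds ratio per 1-unit increase."""
    return or_per_unit ** k


if __name__ == "__main__":
    # Hypothetical binary predictor: 2/10 deaths when x = 0, 6/10 deaths when x = 1
    x = [0] * 10 + [1] * 10
    y = [1, 1] + [0] * 8 + [1] * 6 + [0] * 4
    b0, b1 = fit_logistic_1d(x, y)
    print(f"odds ratio per unit of x: {math.exp(b1):.2f}")
    print(f"odds ratio for 3 SOFA points at 1.2 per point: {odds_ratio_for_increase(1.2, 3):.3f}")
```

With one binary predictor, the fitted exp(β) equals the 2 × 2 table odds ratio, (6/4)/(2/8) = 6, which makes the toy example easy to check by hand.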
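The Introduction states that the SAPS II total is converted into a probability of hospital mortality through a logistic regression equation [5], but the paper does not give the equation. The minimal sketch below assumes the coefficients commonly reported for the original SAPS II model: logit = -7.7631 + 0.0737 × score + 0.9971 × ln(score + 1). These constants are not stated in this paper and should be checked against [5] before any use. The function name `saps2_predicted_mortality` is hypothetical, and the example scores are illustrative, not study results.

```python
import math

# ASSUMED coefficients of the original SAPS II conversion equation (see [5]);
# they are NOT given in this paper and must be verified before use.
SAPS2_INTERCEPT = -7.7631
SAPS2_LINEAR = 0.0737
SAPS2_LOG = 0.9971


def saps2_predicted_mortality(score: float) -> float:
    """Convert a SAPS II total score into a predicted probability of hospital death."""
    if score < 0:
        raise ValueError("SAPS II score cannot be negative")
    logit = SAPS2_INTERCEPT + SAPS2_LINEAR * score + SAPS2_LOG * math.log(score + 1)
    # Logistic transform: probability = e^logit / (1 + e^logit)
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    for s in (20, 40, 60):
        print(f"SAPS II = {s}: predicted hospital mortality = {saps2_predicted_mortality(s):.3f}")
```

Under these assumed coefficients, a score of 40 (the SAPS II cutoff reported in this study) maps to a predicted hospital mortality of roughly 25%. Comparing such predictions with observed deaths across risk groups is what the Hosmer-Lemeshow calibration test does.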

Discussion
In our study, we determined the initial APACHE II, SOFA, and SAPS II scores during the first 24 h of admission to the RICU and compared the performance of these three scores with respect to calibration and discrimination. The outcome measure was ICU mortality. An advantage of this study is that it evaluated these scores in an RICU, a setting rarely tested in previous studies, which mostly examined general and surgical ICUs.
All three scores had poor discrimination power, with AUC less than 0.7; yet, the SOFA score performed better (AUC = 0.63) than the APACHE II (AUC = 0.60) and SAPS II (AUC = 0.59) scores. The cutoff points obtained by the ROC curve simultaneously considered the best sensitivity and specificity for each score (Table 3 and Fig. 1). On comparing actual and expected hospital mortality, APACHE II correctly predicted 83% of survivors and 30% of nonsurvivors (overall predictability 59%), SOFA correctly predicted 74% of survivors and 40.1% of nonsurvivors (overall 59%), and SAPS II correctly predicted 83% of survivors and 28% of nonsurvivors (overall 58%). Logistic regression analysis showed that, of the three scores, only the SOFA score was an independent predictor of mortality among RICU patients; each 1-point increase in the SOFA score was associated with a 1.2-fold increase in the odds of mortality. Using the Hosmer-Lemeshow goodness-of-fit test to evaluate calibration, SAPS II (χ² = 4.82; P = 0.78) had the best calibration and APACHE II (χ² = 7.34; P = 0.39) the worst (Table 4), suggesting that SAPS II had the smallest discrepancy between predicted and observed mortality. There was a highly significant correlation between the three scores.
APACHE II [8] and SAPS II [5] are certainly among the most commonly used and validated tools for predicting outcome in ICUs. An ideal scoring system should predict the mortality rate correctly, that is, the predicted mortality rate should be close to the actual mortality rate. It should be well calibrated, that is, it should provide risk estimates corresponding to the observed mortality; it should discriminate well, that is, it should identify the patients at higher risk of dying; it should be easy to compute; and it should be based on readily available patient parameters. It also has to be dynamic, reflecting changes in management and case mix over time [9].
In our study, the mean admission APACHE II, SAPS II, and SOFA scores were higher in nonsurvivors than in survivors; yet, only the admission SOFA score differed significantly between survivors and nonsurvivors. Similarly, several studies reported that admission APACHE II [10–12] and SAPS II [11] scores did not differ significantly between survivors and nonsurvivors and did not influence the risk of mortality in ICU patients. Furthermore, some studies [13–15] agreed with our finding that the admission SOFA score differed significantly between survivors and nonsurvivors. Other studies, however, reported that APACHE II [16,17] and SAPS II [18] scores did differ significantly between survivors and nonsurvivors, possibly because of differences in the study population (type of ICU and admission diagnoses) and because some of them used repeated daily scores instead of admission scores.
The performance of prognostic models encompasses two objective measures: calibration and discrimination [19]. Calibration refers to how closely the estimated probabilities of mortality agree with the observed mortality over the entire range of probabilities. Discrimination refers to how well the model distinguishes between individuals who will live and those who will die. In this study, the APACHE II score had the poorest calibration, the SAPS II score showed good calibration, and the calibration of the SOFA score was intermediate between the two. The overall discriminatory capability of all three scoring models, as measured by the AUC of the ROC curve, was generally poor; yet, it was better for the SOFA score than for the APACHE II and SAPS II scores. Similarly, in a study by Halim et al. [20], the discrimination of the SOFA score was better than that of the APACHE II score. In a recent study by Sakr et al. [21], calibration was worst for the APACHE II score compared with the SAPS II score, which showed good calibration. Although the APACHE II score had the highest sensitivity at the selected cutoff, its specificity was very low, whereas the specificity and overall accuracy of the SOFA score were the highest among the three scores.
Although the APACHE II index was not developed to assess individual prognoses, ICU physicians, and medicine as a whole, have long sought such predictive ability; thus, many studies have assessed the index with this purpose in mind [22]. Accordingly, if the APACHE II score must be used to assess patient outcome in the ICU, combining it with other scores would give more reliable results.
When the three scores were tested, only the SOFA score was an independent predictor of mortality among RICU patients; each 1-point increase in the SOFA score was associated with a 1.2-fold increase in the odds of mortality.
The correlation between the three scores was significantly positive. A similarly strong positive correlation between admission SOFA and APACHE II scores was found in other studies [15,23]. The significant positive correlation observed in our study might suggest that combining these scores could improve on the accuracy of the individual scores.
Our study tested these scores in an RICU; differences in ICU type, ethnicity, disease pattern, the critical care offered to patients, and admission criteria might lead to different results in other settings.
The study has some limitations. The small sample size is the most important, as it might influence the evaluation of the calibration and discrimination of the scores. Furthermore, repeated daily scores were not obtained in this study. Finally, no follow-up data were available for patients discharged from the RICU.
It can be concluded that the SOFA score has better discriminatory power, whereas the SAPS II score has better calibration. These findings are not surprising, given that it is impossible for any model to have perfect calibration and discrimination at the same time [24]. Nevertheless, studies on larger numbers of patients are needed to support our findings.