Comparison of SAPS 3 performance in patients with and without solid tumor admitted to an intensive care unit in Brazil: a retrospective cohort study

Objective: To compare the performance of the Simplified Acute Physiology Score 3 (SAPS 3) in patients with and without solid cancer who were admitted to the intensive care unit of a comprehensive oncological hospital in Brazil.
Methods: We performed a retrospective cohort analysis of our administrative database of the first admission of adult patients to the intensive care unit from 2012 to 2016. The patients were categorized according to the presence of solid cancer. We evaluated discrimination using the area under the Receiver Operating Characteristic curve (AUROC) and calibration using the calibration belt approach.
Results: We included 7,254 patients (41.5% had cancer, and 12.1% died during hospitalization). Oncological patients had higher hospital mortality than nononcological patients (14.1% versus 10.6%, respectively; p < 0.001). SAPS 3 discrimination was better for oncological patients (AUROC = 0.85) than for nononcological patients (AUROC = 0.79) (p < 0.001). After we applied the calibration belt in oncological patients, the SAPS 3 matched the average observed rates with a confidence level of 95%. In nononcological patients, the SAPS 3 overestimated mortality in those with a low-middle risk. Calibration was affected by the time period only for nononcological patients.
Conclusion: SAPS 3 performed differently between oncological and nononcological patients in our single-center cohort, and variation over time (mainly calibration) was observed. This finding should be taken into account when evaluating severity-of-illness score performance.


INTRODUCTION
Severity-of-illness scores are used in intensive care unit (ICU) settings worldwide for performance evaluation and monitoring, quality improvement and benchmarking. (1)(2)(3) Since their first description in the 1980s, many prognostic models have been developed. However, among them, the Simplified Acute Physiology Score 3 (SAPS 3) is the only one developed from a multinational cohort (16,784 patients from 35 countries). (4,5) In Brazil, Moralez et al. recently demonstrated that SAPS 3 (standard equation) remains the most accurate prognostic model. (6) However, the performance of severity-of-illness scores might be different in some institutions due to the case-mix and representativeness of subgroups, such as oncological patients.
Because intensivists are increasingly managing oncological patients, (7) studies evaluating the performance of these prognostic models in this subgroup of critically ill patients are welcome. Several studies have addressed this topic, (8)(9)(10) but they were published almost ten years ago, and the performance of prognostic models is known to deteriorate over time (mainly in calibration), as previously demonstrated. (8,11) Moreover, none of these studies compared model performance in oncological versus nononcological patients.
Therefore, our objectives in the present study were to evaluate the performance of SAPS 3 in patients with cancer admitted to a Brazilian ICU, to compare the performance of SAPS 3 in patients with and without cancer, and to study time trends in SAPS 3 performance.

METHODS
This was a retrospective cohort study of all consecutive patients admitted to our medical-surgical ICU (a 30-bed unit at Hospital Sírio-Libanês, a private tertiary hospital with a dedicated oncology unit, located in São Paulo, Brazil) and was approved by the local ethics committee. A detailed description of our unit was previously published, and the unit's characteristics did not change during the study period. (12) The exclusion criteria were age younger than 18 years and pregnancy. If a patient had more than one admission during the inclusion period, only the first admission was included. Some of the patients included in this study were also included in a previous analysis by our group regarding ICU readmission (1,702 patients). (12) Our analysis used administrative data that were prospectively collected in a cloud-based software database (Sistema Epimed™) by trained nurses. (13) The study period was from January 1st, 2012 to July 31st, 2016. Patients were classified as oncological if they were admitted with an active solid cancer (current curative or palliative chemotherapy, radiotherapy, immunotherapy or surgery) in the last 12 months. Hematological patients were excluded because they usually have distinct characteristics compared with patients with solid tumors (e.g., a higher burden of active disease; required oncological treatment during ICU stay; prolonged duration of neutropenia; a higher intensity of immunosuppression; concomitant bone marrow transplant; and a higher incidence of specific complications, such as invasive mold fungal infections and cytomegalovirus infection).
The data recorded included age, sex, date of ICU admission, SAPS 3, (4,5) referring facility, admission diagnosis, surgical procedures before admission, Charlson index for comorbidities, (14) resource utilization during ICU stay (mechanical ventilation, vasoactive drugs or renal replacement therapy), oncological status (locoregional or metastatic) and hospital mortality. The SAPS 3 was calculated using data from the ICU admission. As recommended, missing values were coded as "normal" for each variable. (6)

Data analysis
The primary outcome was hospital mortality. Quantitative parametric data were presented as means ± standard deviation (SD), nonparametric data were presented as medians with the 25% - 75% interquartile range (IQR), and categorical variables were presented as percentages. Categorical variables were compared using the chi-squared test.
The estimated mortality rate was calculated using the standard equation for the SAPS 3. SAPS 3 discrimination was evaluated using the area under the Receiver Operating Characteristic curve (AUROC). Comparisons between AUROCs were performed using the DeLong method. (15) Calibration was assessed using the calibration belt method as described by the GiViTI group. (16) This method fits a generalized polynomial logistic function relating the outcome to the logit transformation of the predicted probability, with the respective 95% and 80% confidence interval (CI) boundaries. A statistically significant deviation from the bisector (the line of perfect calibration) occurs when the 95%CI boundaries of the calibration belt do not include the bisector. (16) Standardized mortality ratios (SMRs) with 95%CIs were calculated by dividing the observed by the predicted mortality rates. The Brier score, an overall performance measure, was calculated using a standard formula. (17) To study time trends in the SAPS 3 performance, we split the cohort into two subgroups by ICU admission date, using October 1st, 2014 as the cutoff to create two subgroups of similar size.
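As an illustration of the quantities described above, the following minimal Python sketch computes the predicted hospital mortality from the SAPS 3 standard (global) equation, logit = -32.6659 + ln(SAPS 3 + 20.5958) × 7.3068, together with the SMR and the Brier score. The function names and the six example patients are hypothetical and are not taken from the study database; the analyses in this study were run in SPSS and R, not with this code.

```python
import math


def saps3_predicted_mortality(score):
    """Predicted probability of hospital death from the SAPS 3 standard equation."""
    logit = -32.6659 + math.log(score + 20.5958) * 7.3068
    return 1.0 / (1.0 + math.exp(-logit))


def standardized_mortality_ratio(outcomes, predicted):
    """Observed deaths divided by expected deaths (sum of predicted probabilities)."""
    return sum(outcomes) / sum(predicted)


def brier_score(outcomes, predicted):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, predicted)) / len(outcomes)


# Hypothetical example data (illustration only): SAPS 3 scores and
# hospital outcomes (1 = died, 0 = survived) for six patients.
scores = [30, 45, 50, 62, 75, 88]
died = [0, 0, 0, 1, 0, 1]

predicted = [saps3_predicted_mortality(s) for s in scores]
smr = standardized_mortality_ratio(died, predicted)
brier = brier_score(died, predicted)
print(f"SMR = {smr:.2f}, Brier score = {brier:.3f}")
```

An SMR above 1 indicates more deaths than the score predicts (underestimation), and an SMR below 1 indicates fewer (overestimation). Lower Brier scores indicate better overall performance.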
The data were analyzed using IBM SPSS Statistics, Version 21 and R (http://www.r-project.org). All the statistics were two-tailed, and a p value < 0.05 was considered statistically significant.
[Figure caption fragment: calibration belt plots, with panel (B) showing nononcological patients, described as bisector deviation intervals. The predicted mortality intervals at which the calibration belt significantly deviates from the bisector and the 80% and 95% confidence levels are described in the lower right region of the plots. SAPS 3 - Simplified Acute Physiology Score 3.]

Comparison of the SAPS 3 performance in oncological patients and nononcological patients
The discrimination of SAPS 3 was higher for oncological patients than for nononcological patients (p < 0.001; Table 2). Calibration belt analysis demonstrated that, in oncological patients, no miscalibration was observed. However, in nononcological patients, SAPS 3 overestimated mortality in those with low-middle predicted risk (Figure 1).
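For readers less familiar with discrimination, the AUROC can be read as the probability that a randomly chosen nonsurvivor received a higher predicted risk than a randomly chosen survivor. The sketch below computes it by direct pairwise comparison; the function name and the four example patients are hypothetical, and the DeLong comparison between two AUROCs used in this study is not reproduced here.

```python
def auroc(outcomes, predicted):
    """AUROC as the fraction of (nonsurvivor, survivor) pairs ranked correctly.

    Ties in predicted risk count as half a concordant pair.
    """
    nonsurvivors = [p for y, p in zip(outcomes, predicted) if y == 1]
    survivors = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not nonsurvivors or not survivors:
        raise ValueError("Both outcome classes are required to compute the AUROC")

    concordant = 0.0
    for risk_died in nonsurvivors:
        for risk_survived in survivors:
            if risk_died > risk_survived:
                concordant += 1.0
            elif risk_died == risk_survived:
                concordant += 0.5
    return concordant / (len(nonsurvivors) * len(survivors))


# Hypothetical example (illustration only): three of the four
# nonsurvivor-survivor pairs are ranked correctly.
example_outcomes = [0, 0, 1, 1]
example_predicted = [0.10, 0.40, 0.35, 0.80]
print(auroc(example_outcomes, example_predicted))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for illustration; a rank-based formulation would be preferable for a cohort of several thousand admissions.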

Time trend evaluation
The frequency of oncological patients did not change between the two time periods evaluated (1,498 of 3,542 patients [42.3%] versus 1,510 of 3,712 patients [40.7%], respectively; p = 0.16). The discrimination of SAPS 3 was not affected within subgroups by time period (Figure 2). Calibration belt analysis showed no miscalibration in the oncological subgroup of patients within either period. However, in the nononcological group, undercalibration was observed in the first period, and overcalibration was observed in the second period (Figure 3 and Table 3).
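The comparison of oncological patient frequency between periods can be checked from the counts reported above. The sketch below applies a Pearson chi-squared test to the 2 × 2 table; it assumes no continuity correction, which is our assumption because the exact variant used is not specified, and the function name is hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction).

    Table layout: first row (a, b), second row (c, d).
    Returns the test statistic and the two-sided p value.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-squared survival function
    # equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Counts reported in the text: oncological patients / all patients per period.
first_onc, first_total = 1498, 3542
second_onc, second_total = 1510, 3712

statistic, p_value = chi_squared_2x2(
    first_onc, first_total - first_onc,
    second_onc, second_total - second_onc,
)
print(f"chi-squared = {statistic:.2f}, p = {p_value:.2f}")
```

Under these assumptions the p value rounds to 0.16, which agrees with the reported value.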

DISCUSSION
In this retrospective Brazilian cohort, we evaluated SAPS 3 performance in both oncological and nononcological critically ill patients. We found that (1) the SAPS 3 discrimination and calibration were accurate in our cohort for oncological patients; (2) the discrimination was greater and calibration was more accurate for oncological patients than for nononcological patients; and (3) the calibration was affected by the time period only for nononcological patients.
Recently, Moralez et al. published the largest validation study of severity-of-illness scores in Brazil using contemporary data from a multicenter cohort. (6) They showed that the SAPS 3 standard equation was accurate in predicting outcomes in our country, supporting the national initiative from the Associação de Medicina Intensiva Brasileira (AMIB) for benchmarking units. (18) However, the case-mix variation between units may lead to performance deterioration. (19) A particular subgroup of interest is critically ill oncological patients because 15% of the patients admitted to European ICUs have cancer. (20) Some previous publications evaluated SAPS 3 performance in oncological patients. (8)(9)(10) Overall, these studies suggest that the measure is accurate for both discrimination and calibration, as demonstrated by our data. This finding is reassuring because oncological patients are usually poorly represented in development cohorts (8% in the original SAPS 3 cohort). (4,5) Nevertheless, the hospital mortality of critically ill patients with cancer depends on acuity rather than on the presence and characteristics of the malignant disease itself. (20,21) Therefore, the general severity-of-illness scores might capture most of the short-term prognosis in this population.
A novel approach of the present study was the comparison of SAPS 3 performance in oncological versus nononcological patients. We observed that this score was superior in oncological patients admitted to the ICU in terms of discrimination and calibration. To the best of our knowledge, this is the first study to compare these subgroups in the same cohort. The reasons for this observation are unclear but highlight the importance of case-mix differences between units and how this might affect model comparisons.
Another relevant observation was the effect of time on performance components. Calibration is particularly susceptible to time trends. Zimmerman et al. showed that the discrimination of Acute Physiology And Chronic Health Evaluation (APACHE) IV was robust and changed little throughout the evaluated time period but that calibration deteriorated in the general ICU population. (11) Soares et al. also demonstrated this deterioration in their temporal analysis of SAPS 3 calibration in oncological patients. (8) Again, we observed different effects of the time trend analysis in oncological compared with nononcological patients. No miscalibration was observed in oncological patients in either period. However, not only did miscalibration occur in nononcological patients but it also moved from underestimation to overestimation from the first to the latest period evaluated. We speculated that clinical practice changes affected mainly nononcological patients, but this suggestion may be oversimplistic. Selection bias at ICU preadmission and end-of-life practice may also affect calibration.
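To make the notion of under- and overestimation concrete, the sketch below fits the simplest member of the polynomial logistic family that underlies the calibration belt, a first-degree model logit(P(death)) = a + b × logit(predicted risk), by Newton-Raphson. This is a simplification under stated assumptions: it does not select the polynomial degree or compute the confidence boundaries of the GiViTI method, and the function names and example data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_calibration_line(predicted, outcomes, max_iter=50, tol=1e-10):
    """Fit logit(P(death)) = a + b * logit(predicted) by Newton-Raphson.

    Perfect calibration corresponds to a = 0 and b = 1.
    """
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0  # start from the line of perfect calibration
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu          # score for the intercept
            g1 += (yi - mu) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        a += step_a
        b += step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return a, b


# Hypothetical example (illustration only): predicted risks of 40% and 60%
# in two groups of ten patients whose observed mortality was 20% and 40%.
example_predicted = [0.4] * 10 + [0.6] * 10
example_outcomes = [1, 1] + [0] * 8 + [1] * 4 + [0] * 6

intercept, slope = fit_calibration_line(example_predicted, example_outcomes)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

In this example the fitted intercept is negative, which means that observed mortality lies below predicted mortality, that is, the score overestimates risk. A shift of the intercept from positive to negative between two periods would correspond to the move from underestimation to overestimation described above.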
Our study has some strengths, such as a large sample size, which allowed us to perform hypothesis-generating subgroup analyses (oncological versus nononcological patients) not previously performed. Our study also evaluated a score validated in the same settings as in previous studies in oncological patients as well as in one of the largest external validation cohorts. (6) However, this study also has limitations. First, it is a single-center cohort; our results reflect local practice and our case mix (which limits the generalizability of our study). Nevertheless, some of our results agree with previous study findings. Second, the study had a long period of data collection. Although this factor might affect the overall performance evaluation, it was required for the time period analysis. Finally, end-of-life decisions were not systematically annotated in our database.

CONCLUSION
We observed that SAPS 3 discrimination was better for oncological than for nononcological patients and that SAPS 3 showed no relevant deviations from optimal calibration in oncological patients. SAPS 3 performance (mainly calibration) varied over time differently according to the oncological status in our single-center cohort.

FUNDING
This work was supported by a local grant from Hospital Sírio-Libanês, São Paulo (Brazil).

AUTHORS' CONTRIBUTIONS
L.U. Taniguchi was responsible for the study conception, data interpretation and drafting of the manuscript. E.M.P. Siqueira acquired the data. L.U. Taniguchi and E.M.P. Siqueira analyzed the data. All the authors read and approved the final manuscript.