Validity of the ACS NSQIP surgical risk calculator as a tool to predict postoperative outcomes in subacute orthopedic trauma diagnoses

Purpose: This retrospective study aimed to validate the ACS NSQIP Surgical Risk Calculator (SRC) for predicting 30-day postoperative outcomes in patients with one of the following subacute orthopedic trauma diagnoses: multiple rib fractures, pelvic ring/acetabular fracture, or unilateral femoral fracture. Methods: Data of patients with these diagnoses treated between January 1, 2015 and September 19, 2020 were extracted from the patients' medical files. The diagnostic performance, discrimination, calibration, and accuracy of the ACS NSQIP SRC in predicting specific outcomes developing within 30 days after surgery were determined. Results: The total cohort of the three diagnoses consisted of 435 patients. The ACS NSQIP SRC underestimated the risk of serious complications, especially in patients with multiple rib fractures (8.3% predicted vs 17.2% observed) or pelvic ring/acetabular fracture (6.1% vs 19.8%). Underestimation was more pronounced for the composite outcome 'any complication'. Sensitivity ranged from 16.7% to 100% and specificity from 41.1% to 97.1%. Specificity exceeded sensitivity for pelvic ring/acetabular and femoral fractures. Discrimination was good for predicting death (femoral fracture); fair for readmission (femoral fracture), serious complication (multiple rib fractures), and any complication (multiple rib fractures); but poor for all other outcomes and diagnoses. Calibration and accuracy were adequate for all three diagnoses (p-value for Hosmer-Lemeshow test >0.05 and Brier scores <0.25). Conclusion: Performance of the ACS NSQIP SRC in the studied cohort was variable for all three diagnoses. Although it underestimated the risk of most outcomes, calibration and accuracy seemed generally adequate. For most outcomes, adequate diagnostic performance and discrimination could not be confirmed.


Introduction
Patients with acute surgical conditions can disrupt the elective surgical program. Especially for subacute diagnoses, deciding on the order of the surgical program is challenging. This is partly because the literature on outcomes related to the timing of surgery for patients with subacute diagnoses is inconclusive. Classification of urgency is critical, especially when treatment capacity is limited [1-11].
Current classification systems use only diagnosis-specific factors to define urgency [1,12-14]; patient-related factors are hardly taken into account. For subacute trauma diagnoses, it can be speculated that patients with a higher risk of postoperative complications might benefit from earlier surgery. To improve surgical care for vulnerable patients with a subacute orthopedic trauma diagnosis, a tool that considers patient-related risk factors should be used to triage patients. Accurately recognizing patients at higher risk of postoperative complications is vital for health care management.
The American College of Surgeons developed the National Surgical Quality Improvement Program Surgical Risk Calculator (ACS NSQIP SRC) to predict postoperative events [15,16]. The SRC accurately predicts postoperative outcomes for pooled diagnoses across the surgical subspecialties and for patients undergoing colon surgery [16,17]. However, apart from its accurate prediction of complications in elderly patients with a hip fracture [18], the validity of the ACS NSQIP SRC for specific subacute orthopedic trauma diagnoses still requires investigation. These include commonly treated subacute conditions such as multiple rib fractures, pelvic ring/acetabular fractures, and femoral fractures. Subacute diagnoses are defined as critical conditions in otherwise stable patients that are preferably operated on within 24 hours.
The aim of this study was to determine the diagnostic performance, discrimination, calibration, and accuracy of the ACS NSQIP SRC to predict 30-day postoperative outcomes in patients with a subacute diagnosis of multiple rib fractures, pelvic ring/acetabular fracture, or unilateral femoral fracture.

Methods
This retrospective cohort study was performed in an academic hospital. Patients were identified from the medical files using the registered local surgical codes for the diagnoses multiple (i.e., three or more) rib fractures, pelvic ring/acetabular fracture, or unilateral femoral fracture.
Patients aged 18 years or older who underwent urgent (definitive) surgical treatment for any of the three predefined diagnoses after trauma between January 1, 2015 and September 19, 2020 were included. All procedures were designated as emergency cases for which no appropriate alternative treatment options were available. Patients with insufficient data to compute the ACS NSQIP SRC score, patients who were pregnant at the time of surgery, patients who required immediate surgery because of hemodynamic instability, and patients with less than 30 days of postoperative follow-up documented in their medical files were excluded. Preoperative comorbidities and postoperative events were collected from the patients' medical files. Postoperative events were restricted to events that were directly related (or highly likely to be directly related) to the diagnosis studied.
The study was granted an exemption by the Medical Research Ethics Committee (ref. no. MEC-2020-0430), and the requirement for consent was waived. The study was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.

ACS NSQIP SRC
The ACS NSQIP SRC predicts postoperative (adverse) events based on Current Procedural Terminology (CPT®) codes and 20 preoperative factors, including demographic information and comorbidities (Fig. 1A). The SRC is based on the ACS NSQIP patient database, which, as evaluated in 2019, includes more than 6 million patients with various diagnoses from 719 hospitals worldwide [19]. The calculator provides patient-specific and average predicted risks of 13 postoperative (adverse) events within 30 days after surgery or hospital length of stay (HLOS; Fig. 1B) [20]. The definition of each comorbidity and outcome is given on the ACS NSQIP SRC website as part of the calculator. The outcomes are combined into two composite outcome measures: serious complication and any complication [19,20]. In 14% of patients, ASA class was determined using alternative information from the medical files. An overview of the CPT® codes used in this study, with their descriptions, is shown in Table 1.

Outcome measures
The primary outcome measure of this study was the predictive value of the ACS NSQIP SRC score for the different (adverse) events within 30 days or HLOS, defined by diagnostic performance, discrimination, calibration, and accuracy. All complications occurring within 30 days were collected, including those that were documented only after this period. The secondary outcome measure was the time to surgery and its correlation with the ACS NSQIP SRC score.

Statistical analysis
Data were analyzed using the Statistical Package for the Social Sciences (SPSS) version 25 (SPSS, Chicago, Ill., USA). Receiver operating characteristic (ROC) analysis and calculation of sensitivity and specificity were performed using MedCalc Statistical Software version 18.2.1 (MedCalc Software bvba, Ostend, Belgium; http://www.medcalc.org; 2018). Normality of continuous variables was tested with the Shapiro-Wilk test. A p-value lower than 0.05 was considered statistically significant. Missing data were not imputed.
First, demographics and patient characteristics were summarized using descriptive statistics. Data are shown as median with quartiles for continuous variables and as numbers with percentages for categorical variables.
Next, the correlation between the ACS NSQIP SRC predicted risk and the time to surgery was determined using Spearman's rank correlation test. Spearman's rho is shown separately for patients who did and did not develop the (adverse) events studied.
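For readers who want to reproduce this step outside SPSS, a minimal Python sketch of Spearman's rho is shown below. It assumes the conventional definition (the Pearson correlation of ranks, with tied values assigned their average rank); the function names and example values are hypothetical, and the sketch does not compute the p-value that SPSS reports.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical example: SRC predicted risk (fraction) and hours to surgery.
risk = [0.05, 0.10, 0.08, 0.20]
hours = [6.0, 14.5, 10.0, 30.0]
print(round(spearman_rho(risk, hours), 3))  # 1.0 (identical rank order)
```

Reporting rho separately for patients with and without the event, as described above, amounts to filtering both lists on the observed outcome before calling `spearman_rho`.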
Finally, the performance of the ACS NSQIP SRC in predicting the 30-day risk of postoperative outcomes was evaluated. The SRC predicted risk of (adverse) events (serious complication, any complication, and the specific (adverse) events mentioned above) was compared against the observed outcome (i.e., whether or not the complication occurred). The diagnostic performance, discrimination, calibration, and accuracy of the SRC were evaluated for the three studied diagnoses, analogous to the procedures used in the original validation of the SRC [16]. For diagnostic performance, sensitivity and specificity were computed. For this analysis, the predicted risk was dichotomized as 1) "above average risk" versus 2) "average or below average risk".
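The dichotomization described above can be sketched in Python as follows. This is an illustrative sketch, not the MedCalc procedure: it assumes that for each patient both the patient-specific risk and the calculator's average risk were recorded as fractions, and that a risk exactly equal to the average counts as "average risk" (test negative). The function name and example data are hypothetical.

```python
def sensitivity_specificity(patient_risk, average_risk, observed):
    """Sensitivity and specificity of an 'above average risk' classification.

    patient_risk: SRC patient-specific predicted risk per patient (0-1)
    average_risk: SRC average predicted risk shown for the same patient (0-1)
    observed:     1 if the outcome occurred within 30 days, else 0
    """
    tp = fn = tn = fp = 0
    for risk, avg, event in zip(patient_risk, average_risk, observed):
        above_average = risk > avg  # equal to average is treated as negative
        if event == 1:
            if above_average:
                tp += 1
            else:
                fn += 1
        else:
            if above_average:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical example with six patients.
patient = [0.12, 0.04, 0.09, 0.03, 0.15, 0.05]
average = [0.06, 0.06, 0.06, 0.06, 0.06, 0.06]
events = [1, 1, 0, 0, 0, 0]
print(sensitivity_specificity(patient, average, events))  # (0.5, 0.5)
```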
Discrimination was evaluated by the area under the receiver operating characteristic curve (AUC), or c-statistic. The ROC curve plots the true-positive rate (sensitivity) against the false-positive rate (1 - specificity) for all possible predicted risk cutoff points. The SRC predicted risk per patient was entered as a continuous variable. An AUC of 0.5 indicates no discrimination above chance, and an AUC of 1.0 indicates perfect discrimination. Generally, an AUC of 0.9-1.0 represents excellent, 0.8-0.9 good, 0.7-0.8 fair, and 0.6-0.7 poor discriminative ability. Discrimination is assumed to be useful if AUC ≥0.75 [21].
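The AUC reported by ROC software equals the probability that a randomly chosen patient with the event received a higher predicted risk than a randomly chosen patient without it, with ties counted as one half (the Mann-Whitney formulation). The Python sketch below computes it directly from that definition and maps the result onto the verbal categories above. The function names and example data are hypothetical; treating each category's lower bound as inclusive, and labeling AUC <0.6 as "very poor" as in the Results, are assumptions.

```python
def auc_mann_whitney(predicted_risk, observed):
    """AUC (c-statistic): P(risk of a case > risk of a non-case), ties count 0.5."""
    cases = [r for r, y in zip(predicted_risk, observed) if y == 1]
    controls = [r for r, y in zip(predicted_risk, observed) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one patient with and one without the event")
    score = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))


def discrimination_label(auc):
    """Verbal category for an AUC; lower bounds are assumed to be inclusive."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "fair"
    if auc >= 0.6:
        return "poor"
    return "very poor"


# Hypothetical example: two patients with the event, two without.
auc = auc_mann_whitney([0.30, 0.20, 0.10, 0.05], [1, 0, 1, 0])
print(auc, discrimination_label(auc))  # 0.75 fair
```

Three of the four case/non-case pairs are ranked correctly, so this hypothetical example sits exactly at the AUC ≥0.75 threshold for useful discrimination. The pairwise loop is quadratic in cohort size, which is unproblematic for a few hundred patients.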
Calibration was assessed with the Hosmer-Lemeshow goodness-of-fit test. The Hosmer-Lemeshow (HL) statistic evaluates differences between observed and predicted event rates across deciles of increasing predicted risk. The SRC predicted risk was entered as a continuous variable. The null hypothesis that the SRC model is well calibrated is rejected at a p-value <0.05 [22].
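A minimal Python sketch of the Hosmer-Lemeshow test follows, assuming the common textbook form. Patients are sorted by predicted risk and split into g groups of (near) equal size. The statistic is the sum over groups of (O - E)^2 / (n * p * (1 - p)), where O is the observed number of events, E the sum of predicted risks, n the group size and p = E / n. The p-value comes from a chi-square distribution with g - 2 degrees of freedom. Statistical packages differ in how they form groups when predicted risks are tied, so results may not match SPSS exactly. The function names are hypothetical, and the chi-square tail probability is computed with a series expansion of the regularized incomplete gamma function to avoid external dependencies.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df/2 and z = x/2, and returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def hosmer_lemeshow(predicted_risk, observed, groups=10):
    """HL statistic and p-value using equal-size groups of sorted predicted risk.

    Predicted risks must lie strictly between 0 and 1.
    """
    pairs = sorted(zip(predicted_risk, observed), key=lambda p: p[0])
    n_total = len(pairs)
    statistic, used = 0.0, 0
    for g in range(groups):
        chunk = pairs[g * n_total // groups:(g + 1) * n_total // groups]
        if not chunk:
            continue
        used += 1
        size = len(chunk)
        expected = sum(r for r, _ in chunk)  # expected events = sum of risks
        events = sum(y for _, y in chunk)    # observed events
        p_mean = expected / size
        statistic += (events - expected) ** 2 / (size * p_mean * (1 - p_mean))
    if used < 3:
        raise ValueError("need at least 3 non-empty groups")
    return statistic, chi2_sf(statistic, used - 2)


# Hypothetical, perfectly calibrated example with 5 groups of 10 patients.
risks = [0.1] * 10 + [0.3] * 10 + [0.5] * 10 + [0.7] * 10 + [0.9] * 10
events = ([1] + [0] * 9) + ([1] * 3 + [0] * 7) + ([1] * 5 + [0] * 5) \
    + ([1] * 7 + [0] * 3) + ([1] * 9 + [0])
stat, p = hosmer_lemeshow(risks, events, groups=5)
print(round(stat, 6), round(p, 3))  # 0.0 1.0 (observed equals expected in every group)
```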
The accuracy of the SRC was assessed using the Brier score, which reflects the deviation between the predicted and observed (adverse) events. The Brier score is computed as the mean squared difference between the predicted risk (continuous variable) and the actual outcome, with presence or absence of an event scored as 1 or 0, respectively. A risk prediction model that perfectly predicts the outcomes of all individuals has a Brier score of 0, whereas a Brier score of 1 indicates that every prediction was entirely wrong (a predicted risk of 0 for every patient with the event and of 1 for every patient without it). A score <0.01 indicates predictive precision >90% [15,23,24]. A risk prediction model with a Brier score of 0.25 or higher is considered non-informative [22].
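The Brier score is simple to compute. The short Python sketch below (function name and data hypothetical) also shows where the reference values quoted above come from. Predicting a risk of 0.5 for every patient yields a Brier score of exactly 0.25, regardless of the outcomes. Predicting the opposite of every observed outcome with full certainty yields 1.

```python
def brier_score(predicted_risk, observed):
    """Mean squared difference between predicted risk (0-1) and outcome (1 = event, 0 = none)."""
    if len(predicted_risk) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted_risk, observed)) / len(observed)


# Hypothetical examples.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1]))  # 0.25: a coin-flip model
print(brier_score([0.0, 1.0], [1, 0]))                  # 1.0: every prediction wrong
```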

Results
The overall number of patients included in the analysis for all three diagnoses was 435. More detailed results, stratified by the three studied diagnoses, are given below. Fig. 2 shows the study flow chart per diagnosis. Table 2 provides an overview of the patient characteristics and preoperative risk factors per diagnosis. Online Supplemental Table S1 shows the observed rates and risks predicted by the ACS NSQIP SRC for all 13 outcomes per diagnosis, in the order in which the calculator presents those outcomes. In addition, it shows the diagnostic performance, discrimination, calibration, and accuracy of the ACS NSQIP SRC for predicting these postoperative (adverse) events. The results section below focuses on the three main (adverse) events, which are presented in Table 3. Figs. 3 and 4 show the ROC curves and the correlations of ACS NSQIP SRC predicted risk with time to surgery, respectively, for serious and any complication in the three diagnoses studied.

Multiple rib fractures
The observed complication rates were up to 5 times higher than the risks predicted by the ACS NSQIP SRC for the outcomes listed in Table 3. As a measure of diagnostic performance, sensitivity ranged from 25.0% for readmission to 100% for death (Table 3). Specificity was lower, ranging from 41.1% for return to OR to 72.7% for any complication. The discriminatory ability was very poor for readmission and return to OR (AUC <0.6) and fair for serious complication and any complication (AUC 0.7-0.8; Table 3 and Fig. 3A and B). Calibration was adequate for the outcomes listed (HL p-value >0.05; Table 3). The ACS NSQIP SRC showed adequate predictive precision (i.e., Brier score <0.25) for the outcomes listed, except for any complication (Table 3). No statistically significant associations were found between predicted risk scores (for serious or any complication) and time to surgery (Fig. 4A and B).

Pelvic ring/acetabular fractures
In the group with a pelvic ring/acetabular fracture, 116 of 225 screened patients were included (Fig. 2B). They had a median age of 43 years (P25-P75 30-57), and 27 (23.3%) patients were female (Table 2). The median time to surgery was 73.7 h (P25-P75 45.2-124.1). The observed complication rates were higher than the risks predicted by the ACS NSQIP SRC for the outcomes listed in Table 3, with the largest difference for any complication (69.8% observed vs 6.5% predicted). Sensitivity ranged from 16.7% for readmission to 100% for death (Table 3). Specificity was higher, ranging from 67.0% for return to OR to 97.1% for any complication. The AUC ranged from 0.64 to 0.69, indicating poor discriminatory ability for the outcomes listed (Table 3 and Fig. 3C and D). The p-value of the HL test was consistently >0.05, indicating adequate calibration of the ACS NSQIP SRC for the outcomes listed (Table 3). The ACS NSQIP SRC showed adequate predictive precision (i.e., Brier score <0.25) for the postoperative (adverse) events, except for any complication (Table 3).
No statistically significant associations were found between predicted risk scores (for serious or any complication) and time to surgery (Fig. 4C and D).

Femoral fractures
In the group with a femoral fracture, 261 of 489 screened patients were included (Fig. 2C). They had a median age of 64 years (P25-P75 43-76), and 104 (39.8%) patients were female (Table 2). The median time to surgery was 14.1 h (P25-P75 5.9-19.8). The observed complication rates were higher than the risks predicted by the ACS NSQIP SRC for the outcomes listed in Table 3, except for readmission (3.4% observed vs 4.6% predicted). Sensitivity ranged from 37.9% for serious complication to 76.9% for death (Table 3). Specificity was higher, except for death, ranging from 61.6% for return to OR to 82.9% for any complication. The discriminatory ability was (very) poor for serious complication, any complication, and return to OR (AUC <0.7), fair for readmission (AUC 0.7-0.8), and good for death (AUC 0.8-0.9; Table 3 and Fig. 3E and F). Calibration was adequate for all outcomes listed (Table 3). The ACS NSQIP SRC showed adequate predictive precision (i.e., Brier score <0.25) for the outcomes listed, except for any complication (Table 3).
The predicted risk scores (for serious or any complication) showed a statistically significant positive correlation with time to surgery (Fig. 4E and F).

Discussion
The results of this study show that the ACS NSQIP SRC underestimates the observed rate of postoperative outcomes in all three diagnoses studied. Sensitivity and specificity varied widely across outcomes and diagnoses, with sensitivity ranging from 16.7% to 100% and specificity from 41.1% to 97.1%. The discriminatory ability was good for predicting death (femoral fracture) and fair for readmission (femoral fracture), serious complication (multiple rib fractures), and any complication (multiple rib fractures), but poor for all other outcomes and diagnoses. Calibration was adequate for all outcomes in all diagnoses. Finally, the accuracy was adequate for all outcomes studied except any complication (multiple rib fractures, pelvic ring/acetabular fracture, and femoral fracture). These results suggest that the ACS NSQIP SRC is not a suitable risk prediction tool for these three diagnoses in an academic hospital setting in The Netherlands.
A statistically significant positive association between the ACS NSQIP SRC predicted risk and time to surgery was found for patients with a femoral fracture (i.e., patients with a lower ACS NSQIP SRC score were operated on earlier). A possible explanation may be that patients with a lower complication risk have fewer comorbidities and therefore require less comprehensive preoperative preparation. There is a paucity of literature on the performance of the ACS NSQIP SRC for the subacute diagnoses investigated in the current study.
It is unclear whether the underestimation of the complication rates is, to some extent, due to cross-cultural differences or is attributable to a difference in case mix between the current cohort (i.e., an academic setting with a substantial proportion of polytrauma) and the population used to determine the weighting factors in the SRC (i.e., a general population with mostly monotrauma). Similar to the current data on femoral fractures, Wang et al. found that the ACS NSQIP SRC underestimated the risk of postoperative outcomes, with adequate calibration for all outcomes and adequate accuracy for all outcomes except any complication [18]. Their results on discrimination differed from ours for serious complication and any complication (poor in our study versus fair in Wang et al.), readmission (fair versus poor), return to OR (poor versus good), and death (good versus excellent). It is unclear whether this is attributable to differences in age (patients aged 65 years or older comprise only 50% of our cohort), their much larger sample size, or a potential difference in case mix between the two studies. To our knowledge, no research specifically focusing on the use of the ACS NSQIP SRC for multiple rib fractures and pelvic ring/acetabular fractures is available. However, compared with previous literature, this cohort had a higher overall complication rate for multiple rib fractures (62.1% versus 10.3%) and a higher pneumonia rate (27.6% versus 16.7% and 17.1%) [25,26]. The same holds for pelvic ring fractures and femoral fractures, for which the literature reports complication rates of 7.0-31.1% and 25.3-35.5%, respectively, versus 69.8% and 57.5% in this cohort [27-29]. This further supports the notion that the ACS NSQIP SRC is not a good tool to predict complications in an academic hospital in The Netherlands.
The results of the different performance measures indicate that implementation of the ACS NSQIP SRC in a subacute orthopedic trauma setting requires more extensive validation studies. In the current study, the ACS NSQIP SRC was not used to determine the timing of surgery, and the data generally show no significant association between time to surgery and the predicted risk of postoperative outcomes. The literature shows that early surgical treatment results in a lower risk of complications [2,4,8,11]. Whether implementing the ACS NSQIP SRC as a tool to optimize surgical timing will reduce the rates of postoperative outcomes (and consequently the difference between observed rates and predicted risks) requires further investigation.
As with most observational studies, this study has limitations, the most obvious being its retrospective design. Excluding patients because of missing ACS NSQIP SRC data or loss to follow-up may have affected the true rates of postoperative outcomes. In addition, because the participating site is an academic hospital, the enrolled patient group likely does not fully represent the general population with any of the three diagnoses included. A substantial proportion had additional injuries, making it hard to link complications to the studied diagnoses [30,31]. The ACS NSQIP SRC has added a "Surgeon adjustment of risk" variable; however, owing to the retrospective nature of this study, there was no standardized method to incorporate it. Moreover, patients may have presented with complications within 30 days at a different healthcare facility. If they sought medical attention elsewhere, this would have been registered during their subsequent follow-up appointment and included in the current analysis. Additional research is needed to draw conclusions on the validity of the ACS NSQIP SRC for the general population with these diagnoses. Another limitation lies in the ACS NSQIP SRC itself. The calculator incorporates not only native complications but also dependent sequelae, such as return to OR, discharge to a nursing or rehabilitation facility, and death, which are not complications but consequences thereof. In addition, the use of composite outcome measures such as 'any complication' and 'serious complication' has limitations, especially in the current population, in which participants may have multiple injuries. Furthermore, given the relatively small sample size of the multiple rib fracture group and the associated low number of some individual outcomes, this study could be underpowered. Larger, preferably prospective, studies are required to draw a final conclusion on the validity of the ACS NSQIP SRC for predicting outcomes in this specific set of injuries. There is no prior literature for these diagnoses to support external validity. Finally, literature reporting on the validity of the ACS NSQIP SRC for multiple rib fractures and pelvic

Fig. 1. Example of the ACS NSQIP SRC website, showing A) the patient and surgical information and B) the resulting risks of postoperative outcomes.

Fig. 3. Receiver-operating characteristic curves for serious or any postoperative complication according to the ACS NSQIP SRC for A and B) multiple rib fractures, C and D) pelvic ring/acetabular fracture, or E and F) femoral fracture.

Fig. 4. Correlation between ACS NSQIP SRC predicted risk for serious or any complication and time to surgery for A and B) multiple rib fractures, C and D) pelvic ring/acetabular fracture, or E and F) femoral fracture.

Table 1
Frequency of Current Procedural Terminology (CPT) codes per diagnosis in the study population.

Table 2
Preoperative risk factors per diagnosis.
Data are shown as median (P25-P75) or as n (%). ASA, American Society of Anesthesiologists; BMI, body mass index; COPD, chronic obstructive pulmonary disease; HLOS, hospital length of stay; ICU, intensive care unit; LOS, length of stay; SIRS, systemic inflammatory response syndrome. a Data are missing for 11 patients with a femoral fracture. b Data are shown for patients admitted to the ICU.
C.L.E. Laane et al.

Table 3
Observed rates and predicted risks of postoperative outcomes and diagnostic performance, discrimination, and calibration of the ACS NSQIP SRC.