Assessment of three risk evaluation systems for patients aged ≥70 in East China: performance of SinoSCORE, EuroSCORE II and the STS risk evaluation system

Objectives To assess and compare the predictive ability of three risk evaluation systems (SinoSCORE, EuroSCORE II and the STS risk evaluation system) in patients aged ≥70, and who underwent coronary artery bypass grafting (CABG) in East China. Methods Three risk evaluation systems were applied to 1,946 consecutive patients who underwent isolated CABG from January 2004 to September 2016 in two hospitals. Patients were divided into two subsets according to their age: elderly group (age ≥70) with a younger group (age <70) used for comparison. The outcome of interest in this study was in-hospital mortality. The entire cohort and subsets of patients were analyzed. The calibration and discrimination in total and in subsets were assessed by the Hosmer–Lemeshow and the C statistics respectively. Results Institutional overall mortality was 2.52%. The expected mortality rates of SinoSCORE, EuroSCORE II and the STS risk evaluation system were 0.78(0.64)%, 1.43(1.14)% and 0.78(0.77)%, respectively. SinoSCORE achieved the best discrimination (the area under the receiver operating characteristic curve (AUC) = 0.829), followed by the STS risk evaluation system (AUC = 0.790) and EuroSCORE II (AUC = 0.769) in the entire cohort. In the elderly group, the observed mortality rate was 4.82% while it was 1.38% in the younger group. SinoSCORE (AUC = .829) also achieved the best discrimination in the elderly group, followed by the STS risk evaluation system (AUC = .730) and EuroSCORE II (AUC = 0.640) while all three risk evaluation systems all had good performances in the younger group. SinoSCORE, EuroSCORE II and the STS risk evaluation system all achieved positive calibrations in the entire cohort and subsets. Conclusion The performance of the three risk evaluation systems was not ideal in the entire cohort. In the elderly group, SinoSCORE appeared to achieve better predictive efficiency than EuroSCORE II and the STS risk evaluation system.

the best discrimination in the elderly group, followed by the STS risk evaluation system (AUC = .730) and EuroSCORE II (AUC = 0.640) while all three risk evaluation systems all had good performances in the younger group. SinoSCORE, EuroSCORE II and the STS risk evaluation system all achieved positive calibrations in the entire cohort and subsets. Conclusion. The performance of the three risk evaluation systems was not ideal in the entire cohort. In the elderly group, SinoSCORE appeared to achieve better predictive efficiency than EuroSCORE II and the STS risk evaluation system.

INTRODUCTION
The number of cardiac surgeries in China is increasing steadily, especially coronary artery bypass grafting (CABG) . Cardiac surgeries are currently regarded as safe and effective owing to the development of surgical, anesthetic and perioperative management (Hu et al., 2016). Preoperative risk evaluation systems play an important role in current cardiac surgical practice. During the last two decades, various risk evaluation systems have been developed to predict mortality for cardiac surgery, such as the European System for Cardiac Operative Risk Evaluation (EuroSCORE) (Nashef et al., 1999;Chalmers et al., 2013) and the Society of Thoracic Surgeons (STS) score (Fortescue, Kahn & Bates, 2001). In China, Sino System for Coronary Operative Risk Evaluation (SinoSCORE) Zheng & Zhang, 2010), based on more than 9,000 Chinese patients was published in 2010.
Previous studies only evaluated the predictive capacity of these risk evaluation systems among different operations and had rarely evaluated the ability to predict mortality among different age groups (Collins & Manach, 2016;Allyn et al., 2017;Aggarwal et al., 2016;Ad et al., 2016;Garcia-Valentin et al., 2016). Therefore, our study attempts to analyze the predictive capacity of the three risk evaluation systems for patients aged ≥70 who were treated with isolated CABG operation in East China.

METHODS Patients
Form January 2004 to September 2016, 2,070 patients from two hospitals (the First Affiliated Hospital of Nanjing Medical University and the East Hospital affiliated to Tongji University) who underwent isolated CABG were invited to participate in the study, which was approved by ethic committees of the two hospitals (Ethical Application Ref: 2017-SR-053;[2017] research (018), respectively). All patients had been discharged when data was extracted. Inclusion criteria were isolated CABG while exclusion criteria were CABG combined with other cardiac surgeries and missing of information. All enrolled patients signed informed consent forms. There were 124 (5.99%) patients excluded from the analysis because of incomplete data (n = 84, 4.01%) and participation in developing SinoSCORE (n = 41,1.98%), and a total of 1,946 procedures constituted made up the database. Each patient's diagnosis was confirmed by coronary arteriography. According to the study database, the operative risk was predicted by the algorithms online: SinoSCORE is available at http://www.cvs-china.com/sino.asp, EuroSCORE II is available at http://www.euroscore.org/calc.html and the STS risk evaluation system is available at http://riskcalc.sts.org/STSWebRiskCalc273/de.aspx. The expected mortality rates of each patient were ascertained by every system above.
The average life expectancy of the Chinese people has reached 76.34 years old, while the mortality rate of CAD has increased significantly in people over 70 years old. To explore the predictive efficacy of the three evaluation systems, in each set, patients were divided into two subsets according to their age: elderly group (age ≥70, 33.04%, 643/1,946) and younger group (age<70, 66.96%, 1,303/1,946). The calibration and discrimination of the three systems in entire cohort and each subset were assessed and compared. Furthermore, in order to make a fair comparison among the three systems, we compared the expected and observed mortality rates in entire cohort and each subset, respectively.

Outcome endpoint
The outcome of interest in this study was in-hospital mortality, which was defined as postoperative in-hospital all-cause death.

Statistical analysis
Statistical analysis was performed with SPSS version 19.0 (SPSS Inc., Chicago, IL, USA). If continuous variables satisfy the normal distribution, then variables were expressed as mean ± standard deviation, else variables were expressed as median and interquartile range (IQR). Categorical variables were expressed as percentages. Statistical analysis comparing groups was performed using t -tests for continuous variables, Mann-Whitney-Wilcoxon tests for ordinal or continuous variables that do not satisfy the assumptions required for a t -test, and Fisher's exact or χ 2 (chi-square) tests for categorical variables. A P-value of less than 0.05 was considered significant.
Performance of the three risk evaluation systems was assessed by comparing the expected and observed in-hospital mortality rates. Calibration (statistical precision) was analyzed by Hosmer-Lemeshow (H-L) goodness-of-fit statistic (Lemeshow & Hosmer Jr, 1982). The H-L statistic measured the differences between expected and observed outcomes. P-value greater than 0.05 means there is no evidence that this risk evaluation system is poorly calibrated. The area under the receiver operating characteristic curve (AUC), which was used to assess how well the evaluated system could discriminate between survivors and non-survivors, describes an estimate of the model's discrimination ability (Hanley & Mcneil, 1982). The discriminative power of the risk evaluation system is considered excellent if the AUC >0.80, good if >0.75 and acceptable if >0.70. Calibration plots of observed versus expected mortality rates for 20 equally sized groups by ranked expected rates calculated of the three systems were constructed. The ideal calibrated predictions consist of a 45 • line, Where points below or above the diagonal line indicated overestimation or underestimation respectively. Plots of observed and expected mortality rates of the three systems by age distribution were also drawn, to illustrate the differences mortality rate in subsets.
Finally, the net benefit of three risk evaluation systems for predicting in-hospital mortality was performed by Decision Curve Analysis (DCA). DCA consisted in the subtraction of the proportion of all patients who are false-positive from the proportion who are true-positive, weighting by the relative harm of a false-positive and a false-negative result. DCA was performed using R software version 3.4.0 (R Core Team, 2017) with the package Decision curve.

Performance in elderly group
There were 31 observed deaths, with an observed mortality rate of 4.82% (Table 1). Expected mortality rate of SinoSCORE, EuroSCORE II and the STS risk evaluation system for the elderly group were 1.06(0.87)%, 2.21(1.29)% and 1.27(0.97)%, respectively. The expected ability of the three risk evaluation systems in the elderly group is shown in Table 2. SinoSCORE, EuroSCORE II and the STS risk evaluation system all showed positive calibration in predicting in-hospital mortality (H-L: P = 0.053, P = 0.389 and P = 0.061, respectively) ( Table 2). The discriminative power for the elderly group in SinoSCORE was the best (0.829), followed by the STS risk evaluation system (0.730) and EuroSCORE II (0.640) (Fig. 1). SinoSCORE showed excellent discrimination in elderly patients while EuroSCORE II had poor discrimination in elderly patients. The decision curve analysis (DCA) showed the clinical practicability of SinoSCORE, EuroSCORE II and the STS risk evaluation system when the clinician predicted the mortality of patients after cardiac surgery. The results were presented as a graph with the selected probability threshold (i.e., the degree of certitude of postoperative mortality over which patients refused operation) plotted on the abscissa and the net benefits of the risk evaluation system on the ordinate. The net benefit of the SinoSCORE was always greater than that of EuroSCORE II and the STS risk evaluation system regardless of the selected threshold, included between 0 and 20% ( Fig. 2).

Performance in younger group
There were 18 observed deaths, with an observed mortality rate of 1.38%. Expected mortality rates of SinoSCORE, EuroSCORE II and the STS risk evaluation system for the younger group were 0.67(0.50)%, 1.18(0.70)% and 0.60(0.50)%, respectively. The expected ability of the three risk evaluation systems in the younger group is shown in Table 2. SinoSCORE, EuroSCORE II and the STS risk evaluation system showed positive calibration (H-L: P = 0.643, P = 0.527 and P = 0.321, respectively) ( Table 2). The discriminative power for the younger group in EuroSCORE II was the highest (0.785), followed by the STS risk evaluation system (0.772) and SinoSCORE (0.769) (Table 2, Fig. 1). Three risk evaluation systems all showed good discrimination in the younger group. The decision curve of SinoSCORE, EuroSCORE II and the STS risk evaluation system remained very close regardless of the threshold selected. The benefit of EuroSCORE II was relatively greater than that of SinoSCORE and the STS risk evaluation system: between 0 and 20% (Fig. 2).
Calibration plots showed that the three risk evaluation systems deviated from the diagonal line, so these risk evaluation systems all underestimated mortality rates in the entire cohort (Fig. 4). When the age factor was considered, the mortality obviously increased with age. SinoSCORE, EuroSCORE II and the STS risk evaluation system overestimated in-hospital mortality rates in patients under 55 years, but underestimated in-hospital mortality rates in patients over 55 years (Fig. 5).

DISCUSSION
Preoperative risk assessment of patients undergoing CABG is critical for treatment decisions-making, prognostic judgments, preoperative patient education and qualityassurance measure (Patratdelon et al., 2016). The EuroSCORE II and the STS risk evaluation system, which were widely used around the world, have been well evaluated in different countries and regions (Ad et al., 2016;Aydın et al., 2015;Holinski et al., 2015;George et al., 2015). However, they may not accurately predict preoperative risk of Chinese patients and they require validation before clinical application (Bai et al., 2016;Wang et al., 2014). In China, based on more than 9,000 patients in a national multi-centre database known as the Chinese Coronary Artery Bypass Grafting Registry Study , Sino System for Coronary Operative Risk Evaluation (SinoSCORE) was published in 2010 (Zheng & Zhang, 2010). After its publication, SinoSCORE was accepted and achieved good assessments in several medical centres in China (Yu et al., 2015;Feng et al., 2014;Liu et al., 2013). Patients from the two regional central hospitals in East China could represent typical East China patients. Additionally, there was no difference in surgical techniques. Any risk evaluation system could show its best performance only when the patient's characteristics and treatment process were similar to those which the system was originated. Therefore, risk evaluation system should be tested in the local population before used reliably. So it is very meaningful to assess and compare the predictive ability of the three risk evaluation systems in East China patients, especially in the elderly. However, previous studies rarely evaluated the ability to predict mortality among subsets grouped by age.
In this study, SinoSCORE was superior to EuroSCORE II and the STS risk evaluation system in predicting operative mortality both in entire cohort and in the elderly group. There are some possible reasons to explain the results: (1) SinoSCORE was based on Chinese patients, the characteristics of patients were similar to the study database; (2) Removing 41 cases involved in SinoSCORE, we compared the differences between our local database and the SinoSCORE database. There were some similar risk factors between two databases, such as age, hypertension, diabetes, renal failure, peripheral vascular disease, active endocarditis, critical preoperative state, three-vessel coronary disease and in-hospital mortality Zheng & Zhang, 2010;, and which might be the reason contributed to SinoSCORE's a good expected power (Table 4); (3) SinoSCORE was designed for isolated CABG operation, while EuroSCORE II and the STS risk evaluation system were not only suitable for isolated CABG but also for other cardiac surgeries, such as valvular surgery and aortic surgery. Therefore, SinoSCORE was more suitable in the study and showed better expected performance.
Although the three systems all had positive calibration and discrimination, it was a pity that they substantially underestimated the mortality in the entire cohort and subsets. The discrepancy between observed and expected mortality was particularly high in the elderly group (4.82% vs.1.06%, 2.21%, 1.27%, respectively). One reason may be related to the predictive accuracy of the three systems; another is that the preoperative parameters of the elderly were different from those of normal population. Aging is a global problem; China, as a developing country, is also facing this problem. However, compared with some developed countries, China's aging is not so serious Li et al., 2015;Zhen et al., 2015). In this study, patients over 70 years old accounted for 33.04%. Previous literature has reported that the proportion of patients over 70 years was up to 60% (Churpek et al., 2015). Patients are now older, have more comorbidities and more coronary lesions with decreased cardiac function. Some risk factors such as hypertension, COPD, NYHA class IV, preoperative IABP and more diseased vessels are more common in elderly patients. It is generally known that the risk of cardiac surgery in elderly patients is higher than that of the general population. Thus, preoperative assessment of surgical risk is more important for the elderly.
The STS risk evaluation system was widely used and has been found to be effective for predicting mortality in short and long term. EuroSCORE II is another system which was widely accepted and implemented not only in Europe, North America but also in Asia. EuroSCORE II included fewer variables than the STS risk evaluation system (18 vs 26), thereby making it easier to use in clinical practices. However, SinoSCORE has only 11 variables, whose weights were obviously different from the previous two systems. Comparing the variables of the three systems, only a few were the same, such as age, diabetes, status of surgery and type of surgery, which could lead to the difference in predictive capabilities among them. Previous literatures reported the in-hospital mortality rate of patients undergoing CABG in developed countries was 2.18∼2.50% (Moazzami et al., 2017;Curtis et al., 2017;Swaminathan et al., 2016;Herlitz et al., 2015;Kuwaki et al., 2015). The mortality rate in this study was 2.52%, slightly higher than previous reported. It also explained that the three risk evaluation systems under estimated mortality rates. One possible reason was that although cardiac surgery and perioperative care in China had developed rapidly in the last decades, there were still some gaps compared with the developed countries. Another possible reason was that there were 124 patients excluded from the study. The quality control of the hospital required all in-hospital deaths should be archived. As a result, this study included all deaths (except two death developing SinoSCORE); to some extent, the mortality rate had increased.
SinoSCORE solved the problem that China did not have its own heart surgery risk evaluation system. Although it was published recently, SinoSCORE has achieved good assessments in this study, as it did in other reports (Yu et al., 2015;Feng et al., 2014;Liu et al., 2013). When compared with EuroSCORE II and the STS risk evaluation system, SinoSCORE shows no compromise.
SinoSCORE was designed to evaluate isolated CABG mortality, where as EuroSCORE II and the STS risk evaluation system can be used to predict other cardiac surgical mortality. Also, the STS risk evaluation system can also predict other outcomes, evaluating the predictive capacities of the STS risk evaluation system to predict only operative mortality may undermine its potency. This study was a double-centre retrospective and nonrandomized observational study. The sample size was still small compared with other studies that were sourced from a large number of cases. The three risk evaluation systems underestimated the mortality in the entire corhort and subsets in spite of their positive discrimination and calibration. These factors might result in bias. Therefore, the mortality statistics may be limited to some degree.

CONCLUSION
The performance of the three risk evaluation systems was not ideal, although the three risk evaluation systems showed positive discrimination and calibration in the entire cohort. SinoSCORE achieved slightly better predictive efficiency than STS risk evaluation system in elderly patients underwent CABG in East China.