Fully automated calcium scoring predicts all-cause mortality at 12 years in the MILD lung cancer screening trial

Coronary artery calcium (CAC) is a known risk factor for cardiovascular (CV) events and mortality but is not yet routinely evaluated in low-dose computed tomography (LDCT)-based lung cancer screening (LCS). The present analysis explored the capacity of fully automated CAC scoring to predict 12-year mortality in the Multicentric Italian Lung Detection (MILD) LCS trial. The study included 2239 volunteers of the MILD trial who underwent a baseline LDCT from September 2005 to January 2011, with a median follow-up of 190 months. The CAC score was measured by commercially available, fully automated artificial intelligence (AI) software and stratified into five strata: 0, 1–10, 11–100, 101–400, and > 400. Twelve-year all-cause mortality was 8.5% (191/2239) overall, 3.2% with CAC = 0, 4.9% with CAC = 1–10, 8.0% with CAC = 11–100, 11.5% with CAC = 101–400, and 17% with CAC > 400. In Cox proportional hazards regression analysis, CAC > 400 was associated with a higher 12-year all-cause mortality both in a univariate model (hazard ratio, HR, 5.75 [95% confidence interval, CI, 2.08–15.92] compared to CAC = 0) and after adjustment for baseline confounders (HR, 3.80 [95%CI, 1.35–10.74] compared to CAC = 0). All-cause mortality significantly increased with increasing CAC (7% in CAC ≤ 400 vs. 17% in CAC > 400, Log-Rank p-value < 0.001). Non-cancer mortality at 12 years was 3% (67/2239) overall, 0.8% with CAC = 0, 1.0% with CAC = 1–10, 2.9% with CAC = 11–100, 3.6% with CAC = 101–400, and 8.2% with CAC > 400 (Gray's test p < 0.001). In Fine and Gray's competing risk model, CAC > 400 predicted 12-year non-cancer mortality in a univariate model (sub-distribution hazard ratio, SHR, 10.62 [95%CI, 1.43–78.98] compared to CAC = 0), but the association was no longer significant after adjustment for baseline confounders. In conclusion, fully automated CAC scoring was effective in predicting all-cause mortality at 12 years in an LCS setting.


Introduction
Coronary artery calcium (CAC) is an independent predictor of cardiovascular (CV) events and mortality [1]. Previous studies in low-dose computed tomography (LDCT) lung cancer screening (LCS) participants demonstrated the predictive value of manual CAC scoring [2,3]. Nevertheless, the time-consuming process of manual CAC scoring currently hinders its routine evaluation in LCS. Artificial intelligence (AI) software has shown promise in providing fully automated CAC quantification, possibly refining post-test risk assessment in LCS while speeding up the process and minimizing variability [4]. However, the evidence in favor of such an approach is still limited, particularly in non-electrocardiogram (ECG)-gated LDCT.
The present study aimed to assess the predictive value of automated CAC quantification for 12-year all-cause and non-cancer mortality in the Multicentric Italian Lung Detection (MILD) LCS trial. The association between CAC and lung cancer mortality was also explored.

Study participants
Details about the MILD study (ClinicalTrials.gov Identifier: NCT02837809) were published elsewhere [5,6]. Briefly, MILD eligibility criteria were as follows: age 49-75 years, current or former smokers (having quit smoking within 10 years before recruitment) with at least 20 pack-years of smoking history, and no history of cancer within the previous 5 years. The MILD project was initially designed as a multicentric trial, with a planned sample size of 10 000 individuals, a screening period of 10 years, and a total follow-up of 100 000 person-years. Such a sample size would be adequate to detect a 30% reduction in lung cancer mortality in the LDCT arm. However, the national program faced many difficulties as a result of a lack of funding, limited support from local authorities, and cultural prejudice: only a few hospitals from the Lombardy region obtained permission to start the trial, and recruitment was limited. For these reasons, we included in the MILD trial only the individuals enrolled and screened at the Istituto Nazionale Tumori of Milan. A total of 4099 participants were enrolled from September 2005 to January 2011. Of them, 1723 were randomized to the control group and 2376 to the LDCT group. All the MILD trial volunteers randomized to the LDCT group were considered potentially eligible in our retrospective study. The original Institutional Review Board approval and written informed consent allowed the use of data for future research, including the present analyses.

Imaging acquisition and analysis
LDCTs were acquired on a 16-detector row CT scanner (Somatom Sensation 16; Siemens Medical Solutions, Forchheim, Germany). The whole chest volume was scanned during one deep inspiratory breath-hold without the use of a contrast medium and with the following scanning parameters: tube voltage, 120 kV; effective tube current, 30 mAs; individual detector collimation, 0.75 mm; gantry rotation time, 0.5 seconds; and pitch, 1.5. Neither ECG-triggering nor dose-modulation systems were used. Images were reconstructed as follows: 1-mm-thick sections were reconstructed with an increment of 1 mm (medium-sharp kernel, B50f), and 5-mm-thick sections were reconstructed with an increment of 5 mm (medium-smooth kernel, B30f).
For the automated CAC evaluation, 1-mm images were transferred to a dedicated graphic station (Alienware Area 51 R6 equipped with Dual NVIDIA GeForce RTX 2080 OC graphics) and analyzed using commercial AI software (AVIEW, Coreline Soft, Seoul, Korea) based on a 3-dimensional U-net architecture [7] (Fig 1). The rationale for using 1-mm images was supported by previous data demonstrating a more accurate CAC scoring with 1-mm than thicker slices in LCS LDCTs [8,9]. CAC was assessed using the Agatston score and stratified into the following strata: 0, 1-10, 11-100, 101-400, and > 400 [10].
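The five-stratum grouping described above can be illustrated with a short Python sketch. It is not part of the AI software or of the study code; the function name `cac_stratum` and the handling of fractional scores (each stratum's upper bound is treated as inclusive, so a score between 0 and 1 falls into the 1-10 stratum) are assumptions made here for illustration.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four strata; scores above 400 fall in the last one.
# Treating the bounds as inclusive for fractional scores is an assumption of this sketch.
_UPPER_BOUNDS = [0, 10, 100, 400]
_LABELS = ["0", "1-10", "11-100", "101-400", ">400"]


def cac_stratum(agatston_score: float) -> str:
    """Map an Agatston score to one of the five strata used in the text."""
    if agatston_score < 0:
        raise ValueError("Agatston score cannot be negative")
    # bisect_left returns the index of the first upper bound >= score,
    # or len(_UPPER_BOUNDS) when the score exceeds every bound.
    return _LABELS[bisect_left(_UPPER_BOUNDS, agatston_score)]


if __name__ == "__main__":
    for score in (0, 7, 100, 101, 400, 1250):
        print(score, "->", cac_stratum(score))
```

A lookup over sorted bounds keeps the cut-offs in one place, so alternative thresholds can be tried without touching the function body.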

Demographic, clinical, and follow-up data
Clinical and demographic information was retrieved by a medical doctor from the written questionnaire completed by each study participant during the baseline visit. The vital status and date of death were obtained through the Istituto Nazionale di Statistica (ISTAT, SIATEL 2.0 platform). Participants accumulated person-years of follow-up from the date of baseline until death or the date of the last follow-up as of August 2022. Causes of death were collected through direct contact with referring mortality and cancer registers located throughout Italy. A total of 17/191 causes of death were missing.
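As an illustration of how follow-up time could be derived for a 12-year endpoint, the following Python sketch computes time at risk and event status for one participant. It rests on assumptions not stated in the text: the function name `follow_up`, a 365.25-day year, censoring of participants still alive at 12 years, and the example cut-off date of 31 August 2022 (the text gives only "August 2022").

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def follow_up(
    baseline: date,
    death: Optional[date],
    last_follow_up: date,
    horizon_years: float = 12.0,
) -> Tuple[float, bool]:
    """Return (years at risk, death observed within the horizon).

    Participants who are alive at the last follow-up, or whose time at risk
    exceeds the horizon, are treated as censored (an assumption of this sketch).
    """
    died = death is not None and death <= last_follow_up
    end = death if died else last_follow_up
    years = (end - baseline).days / DAYS_PER_YEAR
    if years > horizon_years:
        return horizon_years, False
    return years, died


if __name__ == "__main__":
    cutoff = date(2022, 8, 31)  # hypothetical exact cut-off date
    print(follow_up(date(2006, 1, 1), date(2010, 1, 1), cutoff))
    print(follow_up(date(2006, 1, 1), None, cutoff))
```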

Statistical analysis
Categorical variables were reported as numbers and percentages, whereas continuous variables were reported as medians with interquartile ranges (IQRs); associations were evaluated by the Cochran-Mantel-Haenszel test for trend for categorical data and by the Jonckheere-Terpstra test for trend for continuous variables. Boxplots were reported to describe the distribution of CAC scores stratified by (i) age (<55, 55-59, 60-64, and ≥ 65) and sex and by (ii) pack-years (<30, 30-39, 40-49, and ≥ 50) and sex. Kaplan-Meier curves for 12-year all-cause mortality were reported in strata of CAC score in all participants, and comparisons were tested by the Log-Rank test for trend. Twelve-year non-cancer mortality was estimated by the cumulative incidence function for competing risks, and comparisons were tested by Gray's test. Mortality curves were also stratified by sex. Lung cancer-specific mortality was explored as a supplementary analysis by the cumulative incidence function for competing risks, and comparisons were tested by Gray's test. Univariate and multivariate Cox proportional hazards regression was applied to estimate the 12-year all-cause mortality hazard ratio (HR) and 95% confidence interval (CI). Univariate and multivariate Fine and Gray's competing risk models were used to estimate the 12-year non-cancer mortality sub-distribution hazard ratio (SHR) and 95% CI. Multivariate models were adjusted for age, sex, smoking status, pack-years, body mass index (BMI), and prior CV disease (i.e., angina, myocardial infarction, stroke, or thrombosis) to reduce the potential effect of different baseline characteristics. The analyses were performed using the Statistical Analysis System Software (Release SAS:9.04; SAS Institute, Cary, North Carolina, USA) and R Statistical Software (R Studio).
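The two descriptive estimators named above, the Kaplan-Meier curve and the cumulative incidence function for competing risks, can be sketched in plain Python. The analyses in this study were run in SAS and R; the code below is only a minimal illustration, assuming a cause coding of 0 for censored observations and positive integers for causes of death, with hypothetical function names. It does not cover the Log-Rank test, Gray's test, or the regression models.

```python
from typing import List, Sequence, Tuple


def cumulative_incidence(
    times: Sequence[float], causes: Sequence[int], cause: int
) -> List[Tuple[float, float]]:
    """Nonparametric cumulative incidence of one cause under competing risks.

    causes: 0 = censored, positive integers = cause of death (assumed coding).
    Returns (time, cumulative incidence) at each distinct time with any death.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    overall_surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        deaths_any = deaths_cause = 0
        while j < len(data) and data[j][0] == t:  # group tied times
            if data[j][1] != 0:
                deaths_any += 1
                if data[j][1] == cause:
                    deaths_cause += 1
            j += 1
        if deaths_any > 0:
            # Probability of surviving to t and then dying of the cause of interest.
            cif += overall_surv * deaths_cause / n_at_risk
            overall_surv *= 1.0 - deaths_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= j - i  # deaths and censored leave the risk set
        i = j
    return curve


def kaplan_meier(
    times: Sequence[float], events: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival; with a single cause it equals 1 minus the incidence."""
    collapsed = [1 if e else 0 for e in events]
    return [(t, 1.0 - c) for t, c in cumulative_incidence(times, collapsed, 1)]


if __name__ == "__main__":
    # Toy data: cause 1 = non-cancer death, cause 2 = cancer death, 0 = censored.
    times = [1, 2, 3, 4, 5]
    causes = [1, 0, 2, 1, 0]
    print(cumulative_incidence(times, causes, 1))
    print(kaplan_meier(times, [c != 0 for c in causes]))
```

At every time point the cause-specific incidences and the all-cause survival sum to one, which is why treating competing deaths as censored in a Kaplan-Meier estimate would overstate cause-specific mortality.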
In  predicted all-cause mortality (S1 Table in S1 File). Twelve-year survival decreased significantly with higher CAC (Log-Rank test, p < 0.001) (Fig 4). The same trend was confirmed by analyzing males and females separately (S1A and S1B Fig in S1 File, respectively). Volunteers with a CAC ≤ 100 had a lower risk of all-cause mortality than subjects with a CAC > 100 (6% vs. 14%, Log-Rank p < 0.001), as did those with a CAC ≤ 400 compared with those with a CAC > 400 (7% vs. 17%, Log-Rank p < 0.001). The risk of non-cancer mortality significantly increased with increasing CAC (Gray's test, p < 0.001) (Fig 5). In Fine and Gray's competing risk model for 12-year non-cancer mortality, the unadjusted SHR was statistically significant for CAC > 400 (SHR, 10.62 [95%CI, 1.43-78.98]) (Table 3). In the multivariate model, CAC > 400 was no longer statistically significant. The increased risk was shown in males and in females separately, despite the decrease in the number of events (S2A and S2B Fig in S1 File, respectively). A non-significant trend across CAC scores was found for lung cancer-specific mortality (Gray's test, p = 0.2652) (S3 Fig in S1 File).

Discussion
Fully automated AI-based CAC scoring was an independent predictor of 12-year all-cause mortality in the MILD LCS trial, with a 2.3-fold risk for CAC > 100 compared to ≤ 100 (p < 0.001) and a 2.5-fold risk for CAC > 400 compared to ≤ 400 (p < 0.001). Conversely, automated CAC scoring did not independently predict 12-year non-cancer mortality and was not significantly associated with lung cancer mortality. CAC values were significantly higher in males than in females, and CAC scores increased with age and pack-years, with the predictive value being similar in males and females despite the different frequencies in the two groups. Increasing CAC scores were significantly associated with a higher risk of coronary events and all-cause mortality in the NELSON trial [11]. A recent meta-analysis including data from six LCS trials found that subjects with CAC > 400 or > 1000 had a more than 2-fold increased relative risk of all-cause mortality compared to those with lower CAC scores [2]. In a previous study performed on a subset of MILD volunteers, manual CAC predicted all-cause mortality even when adjusted for potential confounders [3]. Our results are mostly in keeping with such evidence, notably regarding all-cause mortality risk stratification, while the estimates of non-cancer mortality were possibly affected by the small number of events. Overall, based on the extended follow-up period, the present analysis highlights the potential for systematic CAC scoring implementation in LCS trials to improve preventive strategies through individual mortality risk assessment.
There is evidence in the literature to suggest a link between CAC and cancer mortality [12,13]. In a retrospective analysis including 55,943 patients from the CAC consortium, CAC scores were significantly associated with an increased risk of long-term mortality from lung cancer, with the strongest associations for current and former smokers, especially in women [17]. Our preliminary findings on lung cancer mortality do not seem to corroborate previous evidence but need to be interpreted with caution due to the single-center study cohort and the setting of LCS. Further analyses are foreseen to explore the association between CAC and cancer mortality in the MILD trial, ideally providing information on possible links that go beyond lung cancer by including other types of neoplasms diagnosed in the present cohort. We observed that ex-smokers had significantly higher CAC scores than current smokers. Previous studies reported that smoking cessation is associated with a lower prevalence of CAC compared with current smoking [18], suggesting that it might play a role in decreasing the CAC burden, in apparent discrepancy with our findings. However, the MILD population included ex-smokers who had stopped smoking less than 10 years before the observation, enabling speculation that a longer duration of smoking cessation may be needed for the CAC burden to decrease in former smokers [19].
We build on previous methodologies through fully automated AI-based software, possibly increasing reproducibility while reducing the time-consuming manual evaluation process [20,21]. AI-based approaches have been increasingly employed in CAC evaluation, supported by data demonstrating the reliable performance of automated CAC scoring compared with the reference standard of manual assessment [4,22-24]. In the present study, we employed AI-based software validated on non-electrocardiogram (ECG)-gated LDCT using multi-institutional datasets with manual CAC scoring as the reference standard [25]. The same software yielded better diagnostic performance with 1-mm than 2.5-mm LDCT images [9], supporting our method of analyzing the thinnest reconstruction available. It is worth emphasizing that this approach may have prognostic implications for participants in LCS trials, that is, in a setting where neither the recommended image reconstruction [26] nor ECG-gating is routinely available.
The present study has some limitations. First, the single-center retrospective design is prone to confounding factors, such as patient selection, that affect the generalizability of the results. Furthermore, as the MILD study primarily targeted lung cancer, conventional CV risk factors were mostly unknown, thereby not allowing exhaustive cardiovascular risk profiling. However, measuring risk factors such as lipid levels is not part of current LCS protocols and would affect the logistics of screening practice. Last, data about previous therapies were not systematically collected, preventing a thorough analysis and adjustment of automated CAC prediction.
In conclusion, fully automated CAC scoring by means of commercially available AI-based software could be performed on chest LDCT for mortality risk stratification in LCS.