Prediction of clinically relevant Pancreatico-enteric Anastomotic Fistulas after Pancreatoduodenectomy using deep learning of Preoperative Computed Tomography

Rationale: Clinically relevant postoperative pancreatic fistula (CR-POPF) is among the most formidable complications after pancreatoduodenectomy (PD), heightening morbidity and mortality rates. The Fistula Risk Score (FRS) is a well-developed predictor, but it is an intraoperative predictor and classifies >50% of patients as intermediate risk. Therefore, an accurate and easy-to-use preoperative index is desired. Herein, we test the hypothesis that quantitative analysis of contrast-enhanced computed tomography (CE-CT) with deep learning could predict CR-POPFs. Methods: A group of 513 patients who underwent pancreatico-enteric anastomosis after PD at three institutions between 2006 and 2019 was retrospectively collected and randomly divided into a training (70%) and a validation (30%) dataset. A convolutional neural network was trained to generate a deep-learning score (DLS) from CE-CT images that identifies patients at higher risk of CR-POPF preoperatively; the DLS was further externally tested in a prospective cohort collected from August 2018 to June 2019 at a fourth institution. The biological underpinnings of the DLS were assessed using histomorphological data by multivariate linear regression analysis. Results: CR-POPFs developed in 95 patients (16.3%) in total. Compared to the FRS, the DLS offered significantly greater predictability in the training (AUC: 0.85 [95% CI, 0.80-0.90] vs. 0.78 [95% CI, 0.72-0.84]; P = 0.03), validation (0.81 [95% CI, 0.72-0.89] vs. 0.76 [95% CI, 0.66-0.84]; P = 0.05) and test (0.89 [95% CI, 0.79-0.96] vs. 0.73 [95% CI, 0.61-0.83]; P < 0.001) cohorts. Especially in the challenging patients of intermediate risk (FRS: 3-6), the DLS showed significantly higher accuracy (training: 79.9% vs. 61.5% [P = 0.005]; validation: 70.3% vs. 56.3% [P = 0.04]; test: 92.1% vs. 65.8% [P = 0.013]).
Additionally, the DLS was independently associated with pancreatic fibrosis (coefficient: -0.167), main pancreatic duct (coefficient: -0.445) and remnant volume (coefficient: 0.138) in multivariate linear regression analysis (r² = 0.512, P < 0.001). The user satisfaction score in the test cohort was 4 out of 5. Conclusions: A preoperative CT-based deep-learning model provides a promising novel method for predicting CR-POPF occurrences after PD, especially at the intermediate FRS risk level. It has the potential to be integrated into radiologic reporting systems or incorporated into surgical planning software to accommodate the preferences of surgeons and to optimize preoperative strategies, intraoperative decision-making, and even postoperative care.

The reported risk factors for CR-POPFs are broadly classifiable as local factors at the pancreatic remnant [9], systemic factors (e.g., high body mass index [BMI]), and operative factors (e.g., blood loss) [10][11][12][13]. Local risk factors at the pancreatic remnant, which are strongly linked to underlying local histopathologic changes, such as rich viable gland, absence of fibrosis [14], and fatty pancreas [15,16], represent the most likely determinants directly related to anatomic failure. The Fistula Risk Score (FRS), which incorporates four of the aforementioned parameters (small-sized main pancreatic duct [MPD], soft glandular texture by surgeon's palpation, high-risk pathology other than chronic pancreatitis [CP] or pancreatic adenocarcinoma [PDAC], and undue intraoperative blood loss), is a well-developed and validated 10-point scale used to intraoperatively predict CR-POPF development after PD [17][18][19]. Despite the simplicity and convenience of the FRS, it relies on subjective intraoperative findings of surgeons. Moreover, >50% of patients qualify as intermediate risk (FRS scores of 3-6), which is a grey zone warranting more objective and reliable predictors. Beyond the FRS, local risk factors derived by quantitative imaging from standard-of-care contrast-enhanced computed tomography (CE-CT) images, such as pancreatic thickness [20] or remnant volume [21,22], have been measured manually in dozens of studies conducted during the last decade, showing promising results in predicting CR-POPF events. Recently, artificial intelligence (AI) based medical image analysis has provided an objective and automatic way to capture all important local properties owing to its self-learning characteristics, and it has achieved success in different fields [23][24][25][26]. However, deep-learning models have not been investigated in CR-POPF prediction.
Thus, the purpose of this study was to (1) develop and validate an easy-to-use deep-learning model to predict CR-POPF after PD, and (2) determine its diagnostic performance and compare it with FRS, and finally (3) investigate the histomorphologic changes pertaining to deep-learning score (DLS) generated by the deep-learning model.

Study population
This was a diagnostic, multicenter, multi-cohort study involving four cohorts from four high-volume academic institutions. This study was approved by the Institutional Review Board of each local institution and adhered to the ethical standards of the 1964 Helsinki Declaration, including its later amendments. Informed consent was waived in all cohorts. From institutions A-C, 513 patients were obtained retrospectively from radiologic and pathologic archives and were randomly divided, at a ratio of 70/30, into a training cohort (n = 359) and a validation cohort (n = 154) to train and validate a deep-learning model based on preoperative CE-CT images. Another cohort (Institution D) was prospectively collected as the external test dataset (n = 70). All cohorts are summarized in Figure 1. Notably, for this prospective cohort, patients who were eligible for pancreaticoduodenectomy were consecutively collected during the period. The radiologists who interpreted the radiological images were blinded to the clinical and laboratory results. The surgeons were also blinded to the POPF risk estimated by the deep-learning model. STROBE guidelines [27] for reporting observational studies were applied during study design, training, validation, and reporting of the prediction model.
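The random 70/30 division described above can be illustrated with a minimal sketch. The function name `split_patients`, the integer patient identifiers, and the fixed seed are assumptions made for illustration only; the paper does not state how the randomization was implemented.

```python
import random


def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Randomly partition patient IDs into training and validation lists.

    The seed is an illustrative assumption so that the split is repeatable.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Truncate toward zero so that 513 patients give 359 training cases.
    n_train = int(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical IDs 0..512 stand in for the 513 retrospective patients.
train_ids, val_ids = split_patients(range(513))
print(len(train_ids), len(val_ids))
```

Splitting at the patient level, rather than at the slice level, keeps all CT slices of one patient in the same cohort.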

Clinical data collection and mitigation strategies
Five lead pancreatic surgeons (>20 years of collective pancreatic surgical experience) performed all PDs in conjunction with either pancreaticojejunostomy (PJ) or pancreaticogastrostomy (PG) [28] for a full array of indications. Medical records provided demographic and clinical data, including age, sex, BMI, diabetes mellitus, reported weight loss in the previous 6 months, jaundice, smoking, or alcohol abuse. POPFs were graded in accordance with ISGPF standards [29,30] as either biochemical or clinically relevant (see Supplemental Table S1).
The four risk factors required for FRS calculations (see Supplemental Table S2) were obtained from operative notes retrospectively or prospectively recorded during surgery by attending surgeons attuned to this study. These risk factors served to generate FRS scores (0-10) individually and thus categorized fistula risk as low (0-2 points), intermediate (3-6 points), or high (7-10 points). Other details are included as Supplemental material S1.
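The three FRS risk bands defined above translate directly into a small lookup. This is a sketch only: the function name `frs_risk_category` is hypothetical, and the point values assigned to each of the four risk factors (Supplemental Table S2) are not reproduced here.

```python
def frs_risk_category(frs):
    """Map a total Fistula Risk Score (0-10) to the risk band used in this study."""
    if not 0 <= frs <= 10:
        raise ValueError("FRS must lie between 0 and 10")
    if frs <= 2:
        return "low"
    if frs <= 6:
        return "intermediate"
    return "high"


for score in (0, 3, 6, 7):
    print(score, frs_risk_category(score))
```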

Histology of pancreatic stump
Specimens of the pancreatic stump were evaluated to quantify fibrous tissue, exocrine glandular atrophy (A) [31] and degrees of lipomatosis (L) [32,33], as detailed in Supplemental material S2.

Preoperative CT imaging and segmentation
CT scanning parameters and detailed descriptions are presented in Supplemental Table S3. All patients underwent preoperative multiphasic scans (64-channel multi-detector CT or better) within 4 weeks of surgery at <3-mm minimum slice thickness and in three standard-of-care phases, adhering to current National Comprehensive Cancer Network (NCCN) guidelines [34,35]. A nonionic contrast agent containing iodine (300 mg/mL) was injected at 2-2.5 mL/kg body weight. Median scan delays from injection of contrast to the starts of the pancreatic parenchymal and portal venous phases were 40-50 s and 65-70 s, respectively. In most patients, the estimated transection line was at the superior mesenteric vein, with modifications as needed for individual tumor locations and projected safety margin restrictions.
Volumetric regions of interest (ROIs) in CT images (Figure 2) were segmented separately by four experienced abdominal radiologists (all with over 10 years of experience in pancreatic imaging), using open-source software (3D Slicer version 4.10; www.slicer.org). Segmentations were undertaken in pancreatic phase of transverse sections. In addition, pancreatic thickness, width, and remnant pancreatic volume were measured as previously published fixed CT classifiers (Supplemental material S1).

Deep-learning (DL) model
The pipeline for the CR-POPF deep-learning model is depicted in Supplemental Figure S1. 2D-ROIs were input to the deep-learning model, which yielded DLS values as probabilities of CR-POPF. To make the prediction more robust, the DL model received all pancreatic slices of each patient and reported the average probability of CR-POPF. Further deep-learning training details can be found in Supplemental material S3. Intermediate activation layers were visualized to assess how the network carries information from input to output and thereby to understand the feature extraction. Gradient-weighted Class Activation Mapping (Grad-CAM) was used to produce a coarse localization map highlighting the regions of the image that are important for predicting the target concept (CR-POPF or non-CR-POPF). The reconstructed localization maps, later referred to as positive and negative filters, were also used to evaluate class discrimination [36]. The model was implemented with the Keras toolkit in Python 3.5.
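The slice-to-patient aggregation described above amounts to averaging per-slice probabilities. The sketch below assumes the per-slice CR-POPF probabilities have already been produced by the network; the function name `patient_dls` and the example values are hypothetical.

```python
def patient_dls(slice_probabilities):
    """Average per-slice CR-POPF probabilities into one patient-level DLS."""
    probs = list(slice_probabilities)
    if not probs:
        raise ValueError("at least one pancreatic slice is required")
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("slice outputs must be probabilities in [0, 1]")
    return sum(probs) / len(probs)


# Hypothetical outputs for three pancreatic slices of one patient.
print(patient_dls([0.2, 0.4, 0.6]))
```

Averaging over all slices reduces the influence of any single slice on the patient-level score.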

Statistical analysis
Candidate risk factors included radiological DLS, clinical FRS, demographic parameters (sex, BMI, etc.), anastomotic technique (PG or PJ), surgeons and hospital sites, and fistula mitigation strategies (e.g., prophylactic octreotide or transanastomotic stent usage). Risk factors for CR-POPF demonstrating significance (P < 0.05) in univariate logistic regression analysis were then entered into multivariate logistic regression modeling. Strengths of associations were presented as odds ratios (ORs) with 95% confidence intervals (CIs). We also explored the potential for the radiologic DLS to enhance the clinical FRS in predicting CR-POPFs, especially at intermediate levels of FRS risk, using logistic regression analysis in the predictive models. Comparative performance was expressed as the area under the receiver operating characteristic (ROC) curve (AUC), and AUCs were compared using the DeLong method [37]. Diagnostic indices, such as accuracy, sensitivity, and specificity, were obtained with ROC-derived cut-offs and compared using McNemar's test. To investigate a potential relation between DLS and histologic or morphologic changes, stepwise multiple linear regression tests were conducted (detailed in Supplemental materials S4).
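The performance measures named above can be illustrated with a short, dependency-free sketch. It computes the AUC as the probability that a CR-POPF case scores higher than a non-CR-POPF case, derives sensitivity, specificity, and accuracy at a given cut-off, and evaluates an exact (binomial) McNemar test on discordant pairs. All function names and example values are hypothetical; the DeLong comparison of two AUCs is not implemented here, and whether the authors used the exact or the chi-square form of McNemar's test is not stated.

```python
from math import comb


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def diagnostic_indices(scores, labels, cutoff):
    """Sensitivity, specificity, and accuracy when score >= cutoff predicts CR-POPF."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value from the two discordant-pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical scores and outcomes (1 = CR-POPF, 0 = no CR-POPF).
example_scores = [0.9, 0.8, 0.3, 0.2]
example_labels = [1, 0, 1, 0]
print(auc(example_scores, example_labels))
print(diagnostic_indices(example_scores, example_labels, cutoff=0.5))
print(mcnemar_exact(1, 6))
```

In the McNemar helper, `b` and `c` count the patients classified correctly by only one of the two predictors being compared (for example, DLS correct and FRS wrong, and the reverse).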

Usability testing
Double-blind usability testing addressed five functional aspects: predictive ability (e.g., AUC, accuracy, etc.), learnability, efficiency, satisfaction, and memorability. Following training, an open-source DLS model was released online (https://github.com/lungproject/Pancreas), offering dual tutor-assisted sample cases. The external test cohort was evaluated independently, with users calculating the DLS values on their own computers as directed by the tutor.

Patient characteristics
Comparisons of clinical characteristics, intraoperative data, and postoperative histology in patients with and without CR-POPFs are presented in Supplemental Table S4 and Supplemental Figure S2.

Performance of DLS in predicting CR-POPF
Previously published fixed CT classifiers (remnant pancreatic volume, MPD size, and pancreatic width or thickness) failed to surpass the DLS, and no further incremental gains in discriminatory capacity were achieved by adding these to the DLS (Supplemental Figure S3).

FRS risk stratification analysis
Clinical FRS values served to stratify the train cohort by FRS risk level as low ( Figure 4 for each FRS score. A more specific confusion matrix is included as Supplemental Table S6; and details of AUC, specificity, and sensitivity are provided in Supplemental Table S7.

Usability testing
In the test cohort, the user satisfaction score was 4 on a 5-point scale. The external test cohort was prepared within 10 minutes, and the cases were run and DLS values listed within 1-2 minutes (see Supplemental material S7 for usability testing results).

Discussion
Recently, deep learning has become a popular tool for imaging analysis, fueled by its hierarchical automated learning capacities and the optimal parametric sets delivered. In utilizing the hidden layers of deep learning, the DLS described herein may well serve as a new, highly accessible and approachable means of routine patient examination. Within 1-2 minutes, the DLS can be generated automatically, with excellent performance (AUC = 0.90 in the external cohort), from preoperative CT scans, which are mandatory for nearly all patients in this setting. Other traditional methods of quantitative imaging reported to date are burdened by complicated models and tedious manual measurements, whereas the DLS consolidates all local risk factors into one simplified model. Rather than assuming that deep learning always prevails, we directly compared the DLS with almost all CT classifiers [5][20][21][22][38][39][40][41] cited in the literature (remnant pancreatic volume, MPD diameter, and pancreatic width or thickness). None of these classifiers showed any superiority to the DLS or any incremental benefit in combination with the DLS, perhaps reflecting their innate correlations with the DLS. One recent study [42] has claimed good results in 80 training (AUC = 0.82) and 37 test (AUC = 0.76) subjects by harnessing conventional radiomics analysis. Similarly, Kambakamba et al. utilized machine learning-based texture analysis, which achieved an impressive AUC of 0.95 in predicting POPF in 110 patients from a single institution [43]. However, the present effort is the first to explore a deep-learning based method with significantly larger cohorts from different institutions, and the method is simpler to use and easier to interpret because it relies on a single biomarker (the DLS, or the probability).
Compared to classical textural analysis, the visualization of the developed deep-learning model could highlight the important regions in the image, such as the pancreatic parenchyma and stump areas, for predicting CR-POPF or non-CR-POPF, which is helpful in revealing the biological underpinnings of the DLS. Additionally, our study further emphasized the superior predictive ability in the sub-cohort with FRS-based intermediate risk, which is a grey zone warranting more objective and reliable predictors. The radiologic DLS was designed to add more objective local features to the FRS and fully utilize preoperative CT studies, beyond mere recognition of MPD dilatation. Moreover, pancreatic texture (another determinant of the FRS) is indirectly ascertained via the DLS, perhaps because the model captures visual signs of glandular fibrosis or atrophy on CT images. Our analysis also indicated that the DLS may outperform the FRS at certain FRS risk levels. In low-risk subjects, in whom CR-POPFs are unlikely, FRS performance was excellent, whereas FRS values >6 clearly equated with high risk. Unfortunately, ~50% of cases gravitate to intermediate FRS risk, according to our data and that of others. CR-POPF rates at this intermediate level (~15%) and overall (~16%) are thus indistinguishable, hindering POPF prevention decisions. In contrast, the DLS significantly outperforms the FRS in excluding those patients who lack CR-POPFs, significantly improving specificity in the intermediate-risk group without altering sensitivity. Hence, we see no need for DLS calculation at very low FRS values, except if used as an alternative preoperative biomarker. The DLS is otherwise highly recommended at intermediate FRS levels and may help exclude falsely high risk assessments at FRS values >6. In short, the DLS is intended for preoperative evaluation of CR-POPF risk in any manner deemed surgically appropriate.
As in prior studies, octreotide use for mitigation and nutritional status (weight loss or BMI status) were also associated herein with CR-POPF risk. A comprehensive analysis of all parameters is likely to improve the prediction of postoperative CR-POPF.
Visualization of the convolution filters helped us demystify what has been learned in devising the DLS. Hot-spotted regions, such as the remnant pancreatic volume and stump areas (Figure 2), were important contributors to the final concept, consistent with traditional CT [5] or MR [44] indices of these morphologic features. At a macroscopic level, the DLS (Table S8) was significantly associated with MPD diameter, remnant pancreatic volume, pancreatic thickness or width, and softer gland texture, while showing a negative relation with pancreatic fibrosis and acinar atrophy in histologic preparations. A higher DLS thus corresponds to soft, non-fibrotic, large-sized pancreatic remnants and a non-distended MPD, signifying full tissue viability and fluid productivity and presenting a greater challenge in suturing. Conversely, a lower DLS attests to a hardened, fibrotic, and atrophic pancreas, with a dilated MPD and limited secretory capacity, allowing easier anastomosis and carrying a lower risk of CR-POPF.
We acknowledge some limitations of this study. The first issue is that our data accrued from four separate and independent centers attended by various research associates. Although different CT scanners were used, they were standardized to 64-slice capability or better, with slice thickness <3 mm. The Hounsfield scale, an international standard, was also upheld in all CT systems to control the variability among scanners. The FRS, however, relies on subjective evaluations of individual surgeons, creating opportunities for data inconsistencies. Furthermore, DLS and FRS values were generated exclusively at high-volume institutions and may not carry their predictive weight in lower-volume facilities equipped with earlier generations of scanners or staffed by fewer surgeons.

Conclusion
In conclusion, quantitative preoperative CT assessment using AI is strongly predictive of CR-POPF in patients with pancreaticoenteric anastomoses following PD. The automated scores could reflect histomorphologic features pertaining to the pancreatic duct, remnant pancreatic tissue volume and parenchymal fibrosis. The DLS is particularly helpful in gauging the potential for CR-POPF in patients with intermediate FRS risk. Future efforts will focus on its integration into the picture archiving and communication systems used for radiologic reporting, or its incorporation into surgical planning software, to accommodate the preferences of surgeons and thus optimize preoperative strategies, intraoperative decision-making, and even postoperative care.