Identification of a serum proteomic biomarker panel using diagnosis specific ensemble learning and symptoms for early pancreatic cancer detection

Background The grim (<10% 5-year) survival rates for pancreatic ductal adenocarcinoma (PDAC) are attributed to its complex intrinsic biology and most often late-stage detection. The overlap of symptoms with benign gastrointestinal conditions in early stage further complicates timely detection. The suboptimal diagnostic performance of carbohydrate antigen (CA) 19–9 and elevation in benign hyperbilirubinaemia undermine its reliability, leaving a notable absence of accurate diagnostic biomarkers. Using a selected patient cohort with benign pancreatic and biliary tract conditions we aimed to develop a data analysis protocol leading to a biomarker signature capable of distinguishing patients with non-specific yet concerning clinical presentations, from those with PDAC. Methods 539 patient serum samples collected under the Accelerated Diagnosis of neuro Endocrine and Pancreatic TumourS (ADEPTS) study (benign disease controls and PDACs) and the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS, healthy controls) were screened using the Olink Oncology II panel, supplemented with five in-house markers. 16 specialized base-learner classifiers were stacked to select and enhance biomarker performances and robustness in blinded samples. Each base-learner was constructed through cross-validation and recursive feature elimination in a discovery set comprising approximately two thirds of the ADEPTS and UKCTOCS samples and contrasted specific diagnosis with PDAC. Results The signature which was developed using diagnosis-specific ensemble learning demonstrated predictive capabilities outperforming CA19-9, the only biomarker currently accepted by the FDA and the National Comprehensive Cancer Network guidelines for pancreatic cancer, and other individual biomarkers and combinations in both discovery and held-out validation sets. 
An AUC of 0.98 (95% CI 0.98–0.99) and sensitivity of 0.99 (95% CI 0.98–1) at 90% specificity was achieved with the ensemble method, which was significantly larger than the AUC of 0.79 (95% CI 0.66–0.91) and sensitivity 0.67 (95% CI 0.50–0.83), also at 90% specificity, for CA19-9, in the discovery set (p = 0.0016 and p = 0.00050, respectively). During ensemble signature validation in the held-out set, an AUC of 0.95 (95% CI 0.91–0.99), sensitivity 0.86 (95% CI 0.68–1), was attained compared to an AUC of 0.80 (95% CI 0.66–0.93), sensitivity 0.65 (95% CI 0.48–0.56) at 90% specificity for CA19-9 alone (p = 0.0082 and p = 0.024, respectively). When validated only on the benign disease controls and PDACs collected from ADEPTS, the diagnostic-specific signature achieved an AUC of 0.96 (95% CI 0.92–0.99), sensitivity 0.82 (95% CI 0.64–0.95) at 90% specificity, which was still significantly higher than the performance for CA19-9 taken as a single predictor, AUC of 0.79 (95% CI 0.64–0.93) and sensitivity of 0.18 (95% CI 0.03–0.69) (p = 0.013 and p = 0.0055, respectively). Conclusion Our ensemble modelling technique outperformed CA19-9, individual biomarkers and indices developed with prevailing algorithms in distinguishing patients with non-specific but concerning symptoms from those with PDAC, with implications for improving its early detection in individuals at risk.



Author summary
Pancreatic ductal adenocarcinoma (PDAC) has one of the lowest 5-year survival rates among cancers, primarily due to its complex biology and late-stage diagnosis. Early symptoms often mimic benign gastrointestinal conditions, complicating timely detection. The standard biomarker, carbohydrate antigen (CA) 19-9, is not reliable due to its suboptimal performance and elevation in benign conditions, highlighting the need for better diagnostic tools. In our study, we aimed to develop a biomarker signature to distinguish between
uk/studies/all-studies/u/ukctocs/. Once the project has been discussed with the team, a project proposal will need to be developed and submitted together with a completed Data Access Committee Application Form to the Data Access Committee. Requests should be addressed to ukctocs@ucl.ac.uk. All packages used in the pre-processing of data and subsequent analysis have been identified, together with their versions, to secure full reproducibility. We have identified the resampling strategies in detail and the base-learner stacking approach tested is also described in detail in the Materials and methods section and in supplementary figures. The application of each of the used algorithms followed the respective package recommendation and the optimal hyperparameters for the data sets we used are reported in Supplementary Information.

Introduction
Pancreatic ductal adenocarcinoma (PDAC) ranks as the seventh leading cause of cancer-related mortality [1,2]. Projections suggest that by 2030, mortality rates from PDAC will exceed those of other prevalent cancers, a shift which is attributed to an increasing incidence of obesity, diabetes mellitus, alcohol consumption in some regions (Europe, North America, and Oceania), advancements in detection and institution of screening initiatives that facilitate the timely identification of more common cancers [1-3].
The overall 5-year survival for pancreatic cancer (PC) patients is less than 10%. These figures improve in patients diagnosed with pre-invasive lesions (intraepithelial neoplasia, mucinous cystic lesions) or small tumours (< 2 cm) detected at a localised stage [4]. Resectable disease is identified in fewer than 20% of cases, and advances in early detection strategies hold potential for improving these dismal figures [5,6]. The relatively low incidence and lifetime risk of PC in the general population (1.3%) preclude screening of asymptomatic, average-risk adults (age > 50), and efforts are instead focused on high-risk populations [6-8]. Internationally, screening and surveillance are therefore recommended only in high-risk individuals (genetically predisposed, family history and high-risk pancreatic cysts), where a lifetime risk of at least 5% justifies their surveillance [6,7,9,10]. While there is consensus on surveillance in these high-risk cohorts, we have also reported on symptomatic cohorts in which the increased risk could justify investigations, as an additional risk group [6,11].
Existing evidence regarding the effect of timely diagnosis on outcomes in PDAC is limited, mostly due to the lack of randomisation, appropriate statistical considerations and homogenisation of study populations, and the topic remains an area of strong debate [12]. Yet, it is very likely that prompt identification of PC would improve its prognosis [12-14].
The reality of the situation, however, is that disease rarity, the presence of non-localising symptoms, and the relatively low positive predictive values (4-13%) even for cancer-specific 'red-flag' and advanced symptoms (e.g. weight loss, painless jaundice) challenge timely recognition in primary care settings, and a substantial number of PC patients are diagnosed following prolonged periods of clinical uncertainty [15,16]. Previous case-control primary care studies associated various abdominal symptoms and increased frequency of primary care consultations with PDAC, over the two years preceding its diagnosis [11,17,18]. These data suggest another potential window of opportunity for acceleration of PC detection.
In roughly 30% of patients, PC manifests in the form of jaundice indicating tumour-induced biliary obstruction, which is more evident in pancreatic head tumours [19]. Together with significant weight loss, these frequently represent an already advanced disease. Although most often explained by benign aetiologies, symptoms such as back or epigastric pain, dyspepsia, anorexia, bloating, changes in consistency of stool, weight loss and anxiety/depression may also indicate an underlying pancreatic malignancy [11,17-20]. Such symptoms in adults (age > 60 years) with lifestyle factors (including heavy alcohol and tobacco consumption, obesity) and on the background of new or long-standing diabetes and chronic pancreatitis, are worrisome [6,11,18].
To accelerate and improve cancer detection rates in the UK, 'electronic cancer decision support tools' (eCDST) have been developed to support primary care clinicians in fast-tracking investigations in cases of suspected cancer [21-23]. Risk prediction algorithms such as QCancer [21-23] combine symptoms data, patient risk factors and laboratory tests to predict a risk of undiagnosed cancers of various anatomical sites (colon, pancreas, renal, gastro-oesophageal and ovarian). These are digitally available for primary care physicians through patient record and data management portals [21-24] and, where higher risk justifies further investigations, could be combined with blood biomarker panels for further risk stratification prior to more invasive workup.
When PDAC is suspected, establishing a diagnosis will involve measurement of the serum marker CA19-9, cross-sectional (computed tomography or magnetic resonance) imaging and histopathology (endoscopic ultrasound guided tissue biopsy; EUS-FNB). CA19-9 is most reliable as a marker of tumour resectability, prognosis and monitoring of disease progression [25,26], but as a diagnostic marker it performs poorly (median sensitivity and specificity of ~80%; AUC = 0.82), particularly in stage I/II disease and in Lewis antigen-negative patients [27,28]. The development of reliable and accurate diagnostic biomarkers is essential for risk stratification and prioritisation of further investigations, as well as justification of invasive interventions where the findings on imaging are equivocal [29].
Recent research has explored blood-based diagnostic biomarkers including proteins, micro-RNAs, circulating tumour cells and DNA methylation patterns, yet these remain unvalidated in clinically representative cohorts [30,31]. Their aberrant expression in both inflammatory and malignant processes further challenges their discriminative properties. Multi-cancer early detection tests like CancerSEEK [32] and Galleri are emerging [33,34]. These analyse circulating DNA for genetic mutations and proteins or methylation patterns associated with cancer. CancerSEEK has shown 67% sensitivity for 12 cancers at 99% specificity, with 72% sensitivity for pancreatic cancer (stages I-III) and 83.7% sensitivity for pancreatic ductal adenocarcinoma (PDAC) detection. However, sensitivity varies across cancer types and additional larger validation studies [34] are needed before considering them for widespread screening [35].
Using serum samples collected from a selected study cohort with benign pancreatic and biliary tract conditions and applying robust machine learning stacked modelling, we therefore developed a novel serum biomarker signature capable of differentiating PC patients from healthy individuals and patients with benign abdominal conditions presenting with non-specific yet concerning symptoms for pancreatic cancer, at higher rates than CA19-9 and other state-of-the-art biomarkers.

Data set characteristics
In the full set of samples collected from the ADEPTS cohort, age at the time of sample collection, 57.44 (range from 19.00 to 93.00) for controls and 69.72 (range from 43.00 to 91.00) for PDAC cases, emerged as a risk factor (OR = 1.06 (95% CI 1.04-1.09), p = 2.47×10⁻⁷) (Table 1). As a predictor in a logistic regression model, age achieved a ROC AUC of 0.73 (95% CI 0.66-0.79), with a cut-off at 61.5 years (calculated using Youden's J statistic). This finding was also observed in both the discovery (Fig  [36]. In our past research, which was focused exclusively on UKCTOCS longitudinal samples, age similarly emerged as a risk factor for PDAC [37]. Furthermore, gender (OR = 2.72 (95% CI 1.46-5.27), p = 0.0015) and ethnicity taken as a one-hot encoded variable (OR = 2.02 (95% CI 1.34-3.03), p = 6.56×10⁻⁴) were also confirmed as significantly associated with an increased risk of PDAC (Table 1). In the whole set of samples collected from the ADEPTS cohort, men had a 2.72-fold risk of PDAC compared to their female counterparts. Individuals of Caucasian ethnicity demonstrated a decreased risk of PDAC in a one versus rest calculation (OR = 0.38 (95% CI 0.20-0.69), p = 0.0018) and no

Development of a PDAC biomarker signature in the presence of confounding conditions
To aid the early detection of this cancer in individuals at risk, we aimed to develop a biomarker signature that could be used to differentiate between suspected PDACs and benign biliary conditions that often overlap in clinical presentation. We applied a uniquely developed ensemble learning model, with a logistic regression stacking layer (see Fig A in S1 Appendix and statistical analysis in the Materials and methods section), to a set of 539 serum samples (493 controls and 46 PDAC cases) which were analysed using the Olink Oncology II panel as well as four additional biomarkers we previously reported on [37]. These included IL6ST, VWF, THBS2 and CA19-9. The oncogenic and prognostic glycolytic enzyme PKM2 was additionally selected based on our past report of its diagnostic utility in biliary tract cancer patients [38-40].
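The idea of diagnosis-specific stacking can be illustrated with a minimal sketch on synthetic data. This is not the authors' implementation: the two diagnosis labels, the two synthetic markers, the helper names (`fit_logistic`, `predict`, `auc`) and the gradient-descent settings are all assumptions made for illustration, and the cross-validation and recursive feature elimination used in the study are omitted. For brevity the stacking layer is also fitted on in-sample base-learner predictions.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # w = [bias, w1, w2, ...]; returns a probability of PDAC.
    return sigmoid(w[0] + sum(wj * v for wj, v in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Plain batch gradient-descent logistic regression (illustrative only)."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)

def sample(mu0, mu1, n):
    # Two hypothetical serum markers with unit-variance Gaussian noise.
    return [[random.gauss(mu0, 1.0), random.gauss(mu1, 1.0)] for _ in range(n)]

# Hypothetical control groups: each differs from PDAC in a different marker.
groups = {
    "healthy": sample(0.0, 2.0, 60),
    "pancreatitis": sample(2.0, 0.0, 60),
}
pdac = sample(2.0, 2.0, 30)

# One specialised base-learner per diagnosis: that diagnosis versus PDAC.
base = {}
for name, ctrl in groups.items():
    base[name] = fit_logistic(ctrl + pdac, [0] * len(ctrl) + [1] * len(pdac))

# Stacking layer: logistic regression on the base-learner probabilities.
X_all = groups["healthy"] + groups["pancreatitis"] + pdac
y_all = [0] * 120 + [1] * 30
meta_X = [[predict(base[name], x) for name in groups] for x in X_all]
stack = fit_logistic(meta_X, y_all, lr=0.5, epochs=2000)
stack_scores = [predict(stack, m) for m in meta_X]

stack_auc = auc(stack_scores, y_all)
base_aucs = {name: auc([row[i] for row in meta_X], y_all)
             for i, name in enumerate(groups)}
print(base_aucs, stack_auc)
```

In this toy setting each base-learner is largely blind to one of the two control groups, so the stacked score would be expected to separate PDAC from the pooled controls better than either base-learner alone, which is the intuition behind combining specialised classifiers.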
The application of stacked ensemble modelling as presented herein bolsters the robustness of predictive outcomes, enhancing the performance of biomarker panels through the incorporation of serum biomarker levels and relevant clinical covariates for distinct diagnostic classes. Each base classifier within the ensemble is designed to provide a specialized distinction between confounding diagnoses and PDAC, thereby establishing a heterogeneous set of classifiers that facilitates the precise identification of PDAC (see statistical analysis section in Materials and methods). Previous studies have attested to the beneficial role of ensemble methods in augmenting early detection of PDAC against only healthy controls [37]. The implementation of stacked (Stack, Fig 2), specialized classifiers, developed within the discovery set, generated a biomarker signature capable of predicting PDAC with an AUC of 0.98 (95% CI 0.98-0.99), sensitivity of 0.99 (95% CI 0.98-1), PPV of 0.92 (95% CI 0.91-0.92) and NPV of 0.99 (95% CI 0.97-1) at 90% specificity. In contrast, CA19-9 taken as a single predictor under a logistic regression model achieved, in the discovery set, an AUC of 0.79 (95% CI 0.66-0.91) (p = 0.0016 under a one-sided bootstrap test comparing the two AUCs), sensitivity of 0.67 (95% CI 0.50-0.83), PPV of 0.32 (95% CI 0.26-0.38) and NPV of 0.97 (95% CI 0.96-0.99) at 90% specificity (see Table C in S1 Appendix). Amongst all biomarkers, CA19-9 demonstrated the most
Figure caption (fragment): (Spec) performance of single marker models, i.e. BMI and Age, in the validation set. H Similar to A but for Gender, Ethnicity and Diabetes. Performances were calculated with the respective single feature models developed in the discovery set. The ROC AUC significance threshold is also represented by a purple dashed line at 0.5. Error bars in figures corresponding to the validation set are the 95% Confidence Intervals (CI), calculated by stratified bootstrapping 2000 times. See Statistical Analysis in Materials and methods for further details and Tables A, B and N in S1 Appendix.
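The confidence intervals quoted throughout are described as coming from stratified bootstrapping with 2000 replicates. A minimal sketch of a percentile interval for the AUC is given below, assuming that cases and controls are resampled separately with replacement so that class sizes are preserved. The function names, the synthetic scores and the percentile method itself are assumptions for illustration; the authors' exact procedure is described in their Materials and methods.

```python
import random

def auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def stratified_bootstrap_ci(scores, labels, n_boot=2000, alpha=0.05, seed=1):
    """Percentile CI for the AUC, resampling within each class."""
    rng = random.Random(seed)
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    stats = []
    for _ in range(n_boot):
        # Stratified resampling: class sizes are identical in every replicate.
        bp = rng.choices(pos, k=len(pos))
        bn = rng.choices(neg, k=len(neg))
        stats.append(auc(bp + bn, [1] * len(bp) + [0] * len(bn)))
    stats.sort()
    lo = stats[round((alpha / 2) * n_boot)]
    hi = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic example scores (hypothetical, not study data).
rng = random.Random(0)
case_scores = [rng.gauss(1.5, 1.0) for _ in range(20)]
control_scores = [rng.gauss(0.0, 1.0) for _ in range(60)]
scores = case_scores + control_scores
labels = [1] * 20 + [0] * 60

point = auc(scores, labels)
lo, hi = stratified_bootstrap_ci(scores, labels)
print(f"AUC = {point:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Resampling within each class matters when cases are rare, as here, because an unstratified bootstrap can produce replicates with very few or no cases.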
A closer examination of the individual performances of each base-learner classifier (Fig 2A and 2B) reveals that the logistic regression stacked ensemble approach has superior performance in both discovery and validation sets. Even the best base-learner, trained on samples diagnosed as "Gastritis/Reflux Disease" (Fig 2A and 2B), was outperformed by the stack model, the logistic regression coefficients of which are delineated in Table E in S1 Appendix. The stack model significantly relies on the "Healthy", "Chronic Pancreatitis", "IgG4 Disease", "Irritable Bowel Syndrome", "Other Biliary Duct Disease", "Sphincter of Oddi Dysfunction", "No Relevant Diagnosis", "Other Cancer" and "Pancreatic Cyst" base-learners. Even though the remaining diagnostic class base-learners, including "Gastritis/Reflux Disease", did not reach statistical significance (at p < 0.05), employing a stack that resorts solely to significant base-learners led to a reduction in generalization capacity: AUC 0.98 (95% CI 0.97-0.99), sensitivity 0.98 (95% CI 0.95-1), PPV 0.92 (95% CI 0.91-0.92), NPV 0.97 (95% CI 0.94-0.99) in the discovery set; AUC 0.93 (95% CI 0.87-0.99), sensitivity 0.82 (95% CI 0.64-0.95), PPV 0.53 (95% CI 0.47-0.57), NPV 0.97 (95% CI 0.95-0.99) in the validation set. Although the differences are not substantial, we retain the full set of base-learners to enhance the generalization capacity for predicting PDAC in unseen data sets and new samples. In fact, under recursive base-learner elimination the best ensemble was always the full set of 16 base classifiers.
The employment of stacked diagnosis-specialized classifiers surpassed the performance of state-of-the-art algorithms such as random forests (RRF) and extreme gradient boosting methods (xgbTree), in terms of AUC, sensitivity, positive predictive value, and negative predictive value at 90% specificity (Fig 2C and 2D); although the AUC of the stacked classifier was only marginally significantly higher than that obtained with RRF (p = 0.040, one-sided) and not significantly higher when compared with xgbTree (p = 0.26, one-sided), the sensitivity values at 90% specificity obtained with the alternative methods were, in fact, significantly lower, p = 0.028 and p = 0.045, respectively. The ensemble also outperformed a logistic regression model with recursive feature elimination (Fig 2C and 2D, and Materials and methods section) that did not rely on ensemble modelling (p = 0.0066, one-sided), further substantiating our choice of machine learning paradigm for facilitating the identification of PDAC cases in a clinical setting where confounding diagnoses may be present, and the prevalence is low.
Fig 2 caption (fragment): learners presented in A and B, as well as of state-of-the-art algorithms (xgbTree, RRF and RFE glm) is reported in the discovery and held-out validation sets, respectively. xgbTree, RRF and RFE glm were trained in the whole discovery set, which contrasts with the ensemble algorithm. Receiver Operating Characteristic (ROC) Area Under the Curve (AUC), Sensitivity (Sens), Positive Predictive Value (PPV) and Negative Predictive Value (NPV) at 90% Specificity (Spec). https://doi.org/10.1371/journal.pcbi.1012408.g002
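Since all headline metrics are reported at a fixed 90% specificity, a small worked example may help show how sensitivity, PPV and NPV follow once the threshold is pinned to the control distribution. The sketch below assumes the threshold is the control score at the target quantile and that a sample is called positive only if its score is strictly above it; the function name and the toy scores are hypothetical.

```python
import math

def metrics_at_specificity(scores, labels, target_spec=0.90):
    """Sensitivity, PPV and NPV at the threshold giving >= target specificity."""
    neg = sorted(s for s, l in zip(scores, labels) if l == 0)
    pos = [s for s, l in zip(scores, labels) if l == 1]
    # Index of the control score at the target quantile; the small epsilon
    # guards against floating-point error in target_spec * len(neg).
    k = math.ceil(target_spec * len(neg) - 1e-9) - 1
    threshold = neg[max(k, 0)]
    tn = sum(s <= threshold for s in neg)   # controls correctly called negative
    fp = len(neg) - tn
    tp = sum(s > threshold for s in pos)    # cases correctly called positive
    fn = len(pos) - tp
    return {
        "threshold": threshold,
        "specificity": tn / len(neg),
        "sensitivity": tp / len(pos),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

# Toy data: 10 controls and 4 cases (hypothetical scores).
control_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
case_scores = [0.5, 0.85, 0.95, 1.2]
scores = control_scores + case_scores
labels = [0] * len(control_scores) + [1] * len(case_scores)

result = metrics_at_specificity(scores, labels)
print(result)
```

With these toy scores the threshold is 0.9, so 9 of 10 controls are negative (specificity 0.9) and 2 of 4 cases are positive (sensitivity 0.5), giving a PPV of 2/3 and an NPV of 9/11. PPV and NPV depend on the case-to-control ratio of the evaluated sample, which is worth keeping in mind when comparing values across differently balanced sets.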
The results with additional subsampling algorithms, i.e., under-sampling of the majority class and SMOTE (see Materials and methods), further reinforce our choice (Table F in S1 Appendix). Only xgbTree benefits from SMOTE, but the result is not consistent between the discovery and the held-out validation sets when we evaluate the positive predictive value. Moreover, synthetic sample generation has been proven to be less efficient in high-dimensional datasets [41]. Further alternatives with adaptive synthetic data generation are necessary to evaluate the advantages of SMOTE in the current problem [42]. For the sake of simplicity, we generated all the subsequent results with the ensemble of base-learners derived with oversampling of the minority class, which generalises better in the held-out validation set (Fig 2).
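For readers unfamiliar with the resampling option retained here, the following is a minimal sketch of random oversampling of the minority class, in which minority rows are duplicated at random until every class matches the size of the largest one. The function name and toy data are hypothetical, and the sketch assumes resampling is applied only to training folds so that duplicated cases never leak into evaluation data.

```python
import random

def oversample_minority(X, y, seed=0):
    """Randomly duplicate rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_max = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sampling with replacement; k is 0 for the majority class.
        extra = rng.choices(rows, k=n_max - len(rows))
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

# Toy training fold: 8 controls and 2 cases (hypothetical values).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = oversample_minority(X, y)
print(y_bal.count(0), y_bal.count(1))
```

Unlike SMOTE, this approach creates no synthetic feature vectors; it only reweights existing minority samples by repetition.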
The comprehensive index signature, incorporating all diagnostic categories, was constituted by 49 features, of which 44
Gene Ontology (GO) and biological pathway enrichment (Kyoto Encyclopaedia of Genes and Genomes; KEGG, Reactome Pathway Database; REAC and WikiPathways; WP) analysis was performed for the selected set of features using the gprofiler2 R package (Fig C in S1 Appendix). Top significant terms for biological processes (BP) included 'circulatory system development', 'blood vessel morphogenesis', 'cell adhesion', 'angiogenesis', 'blood vessel development', 'regulation of cell adhesion', 'positive regulation of cell population proliferation', 'cell-cell adhesion', and 'regulation of developmental process'. Top relevant biological pathways included: 'PI3K-AKT signalling pathway', 'ERBB signalling pathway', 'pathways in cancer', 'proteoglycans in cancer', 'platinum drug resistance', 'prostate cancer', 'type I diabetes mellitus', 'MAPK signalling pathway' and 'focal adhesion'.
The scaled importance of each feature and diagnostic class/classifier is depicted in Fig 3. Notably, not every biomarker was selected by each individualized classifier, highlighting the requirement for an array of diverse predictors, each tailored to specific underlying conditions, to effectively identify PDAC. This is consistent with the idea that heterogeneous ensembles are fundamental for predictive capacity in blind datasets [37,43].
Of the five selected clinical covariates, only Age, Ethnicity, and Gender manifested as significant predictors of PDAC in the validation set, as illustrated in Fig 1 and explained in the data set characteristics subsection (see also Tables A and B in S1 Appendix). It is worth emphasizing that the lack of significant association between certain markers and PDAC in the discovery set does not preclude their inclusion in the signature. These variables were selected due to their contribution to the enhanced robustness and generalization capacity in predicting PDAC during cross-validation with a recursive feature elimination routine (see Materials and methods). A similar trend was verified in prior work focussed on ensemble models for PDAC early detection against healthy controls and further substantiates the need for extensive discovery analysis in the data sets collected for the present project [37].
In our comprehensive analysis, we identified eight features that exhibited relatively elevated scaled importance in distinguishing controls from patients diagnosed with PDAC. These features, as detailed in   and CD160. To rigorously assess the diagnostic utility of these selected features, we devised a reduced index employing the diagnosis-specific ensemble strategy previously delineated. Diabetes and Age were incorporated as clinicodemographic variables. This strategy facilitates a targeted evaluation of the features' collective performance in a clinical context. Detailed performance metrics and analytical outcomes of this reduced index are presented in S1 Appendix and Table 2.
In evaluating the efficacy of our ensemble modelling approach and data analysis protocol, we benchmarked it against biomarker combinations reported in existing literature [40,44-48]. Our comparison pitted our panels and protocol against the recursive feature elimination routine presented above that did not resort to ensemble modelling (RFE glm in Fig 2); the latter selected Diabetes, ABL1, ERBB3, ESM1, EGF, SYND1, PPY, TGFA, VEGFA as the best feature combination. From this set the first five markers were involved in the diagnosis-specific ensemble index (Fig 3), and from the remaining only TGFA and VEGFA are reported in the literature as part of best performing panels [40,44-48]. Despite this overlap, given that the starting point for RFE glm was the same as in our ensemble approach, a greater portion of single markers reported in the literature should have arisen as the most competitive when applied to our data set. This is not borne out in terms of performance, since the diagnosis-specific ensemble modelling still outperformed any other alternative, thus confirming our protocol's added value in distinguishing diagnosed PDACs from benign and healthy controls (Fig 2).
Furthermore, it is critical to acknowledge that the best-performing index detailed in this study significantly deviates from our earlier work [37]. In the previous study, the goal was to develop a combined index that could distinguish undiagnosed PDAC cases from healthy controls years before a cancer diagnosis was made. In contrast, the current study focuses on identifying an analysis strategy or pipeline leading to effective biomarker panels for use in secondary care among at-risk populations. This shift in focus reflects our ongoing efforts to refine diagnostic tools tailored to the specific needs of different clinical contexts.

Application of the full PDAC ensemble signature in symptomatic patients
Our subsequent aim was to explore whether specific clinical manifestations were correlated with PDAC status in our ADEPTS patient cohort, for which such information was available (refer to Fig D and Table G in S1 Appendix). As a similar type of data was not available for the UKCTOCS subset (healthy controls) used in this work, we focused this section on the ADEPTS cohort.

Correlation of the full PDAC ensemble signature with QCancer pancreatic score
In our final analysis, we juxtaposed the performance of our full ensemble classifier PDAC index against the QCancer risk prediction index, a clinical decision support tool available for primary care physicians that integrates a myriad of individual-specific risk factors, including age, sex, ethnicity, clinical measurements, diagnoses, and patient-reported symptoms, into a risk-stratifying point-of-care questionnaire [21]. The 'Today's QCancer' index evaluates an individual's current risk of having an undiagnosed cancer as well as the specific risk for 9 distinct underlying cancer types, including pancreatic ('pancreatic' score) [49,50]. The aim was to determine whether, in combination, the QCancer eCDST and our biomarker index signature would be able to better discriminate PDAC patients in a symptomatic (ADEPTS) cohort or whether it would be redundant. As the current risk threshold set by NICE is at 3% for triggering specialist referrals [51], we opted to assess the combined performance of our index signature and the eCDST at the same or lower cut-off values.
The number of samples for which a QCancer score was computed is illustrated in Fig 6C. Using the diagnostic-specific ensemble model delineated previously, probability scores for samples in both discovery and validation cohorts were used to ascertain the combined ROC AUC for those samples possessing a QCancer score. This amalgamation was imperative, considering the reduced number of samples with an associated QCancer score (Fig 6C). It should be emphasized that no subsequent refinements or training of the algorithm were conducted. The ensemble stack index demonstrated a remarkable performance, achieving an AUC of 0.98 (95% CI 0.97-0.99), a sensitivity of 0.99 (95% CI 0.97-1), a PPV of 0.91 (95% CI 0.90-0.91), and an NPV of 0.99 (95% CI 0.96-1), at 90% specificity. Interestingly, when considering only samples with a QCancer risk above 2 or 2.5, the biomarker and clinical covariate ensemble index exhibited comparatively lower performance (Fig 6A). For a QCancer risk above 3.0, the performance of the index decreases minimally once again, which is expected as the difficulty of correctly singling out cases from confounding controls is increased (Fig 6A). However, the QCancer pancreatic score did exhibit a correlation with the odds of PDAC as determined by the ensemble classifier (R = 0.36, p = 3.4×10⁻⁸, Fig 6D), which highlights an important link between the purely clinical variables recorded for this cohort and the PDAC signature. Most importantly, the stacked index succeeded in attributing odds above 1 to several PDAC cases that would have otherwise escaped detection had a QCancer score above 3 been taken as the risk predictor (Fig 6D). Conversely, when depending exclusively on the QCancer score, and using it to calculate the ROC AUC, the predictive capacity for PDAC in the ADEPTS samples is noticeably diminished in comparison to the performance of the ensemble index (Fig 6B); this was verified in all samples with a QCancer score (p = 1.56×10⁻¹⁸, one-sided bootstrap test comparing AUCs), with a score above 2 (p = 1.24×10⁻¹⁰), above 2.5 (p = 3.68×10⁻¹³) and above 3 (p = 2.33×10⁻⁸). This justifies the PDAC signature as a useful complementary resource for enhanced and accelerated diagnosis in the clinic.
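The comparisons above rely on a one-sided bootstrap test of two AUCs measured on the same samples. The sketch below shows one common form of such a test, in which the observed AUC difference is divided by the standard deviation of stratified bootstrap differences and referred to a standard normal distribution. Whether the authors used exactly this variant is not stated in this section, so the statistic, the function names and the synthetic scores are all assumptions for illustration.

```python
import math
import random
import statistics

def auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_test(s1, s2, labels, n_boot=1000, seed=2):
    """One-sided paired test of H1: AUC(s1) > AUC(s2)."""
    rng = random.Random(seed)
    pos = [i for i, l in enumerate(labels) if l == 1]
    neg = [i for i, l in enumerate(labels) if l == 0]

    def diff(idx):
        # Both predictors are evaluated on the same resampled individuals.
        lab = [labels[i] for i in idx]
        return auc([s1[i] for i in idx], lab) - auc([s2[i] for i in idx], lab)

    observed = diff(pos + neg)
    boot = [
        diff(rng.choices(pos, k=len(pos)) + rng.choices(neg, k=len(neg)))
        for _ in range(n_boot)
    ]
    z = observed / statistics.stdev(boot)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return observed, z, p_value

# Synthetic example: an informative index versus an uninformative score.
rng = random.Random(0)
labels = [1] * 20 + [0] * 60
index_scores = [rng.gauss(2.5, 1.0) for _ in range(20)] + \
               [rng.gauss(0.0, 1.0) for _ in range(60)]
noise_scores = [rng.gauss(0.0, 1.0) for _ in range(80)]

observed, z, p_value = bootstrap_auc_test(index_scores, noise_scores, labels)
print(f"AUC difference = {observed:.2f}, z = {z:.2f}, one-sided p = {p_value:.2g}")
```

Resampling the same individuals for both predictors keeps the comparison paired, which generally gives a more sensitive test than treating the two AUCs as independent.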

Discussion
Our objective was to derive a data analysis strategy and construct a multi-biomarker signature that could effectively differentiate individuals with non-specific yet concerning symptoms attributable to both benign abdominal pathologies and PDAC. CA19-9 tumour marker blood levels are currently used clinically to help confirm PDAC diagnosis in a clinical context (positive findings on imaging, histopathology), prognosticate and monitor recurrence following tumour resections [31]. Its absent expression in Lewis antigen-negative blood group individuals and an overall limited predictive capacity (79-81% test sensitivity and 82-90% specificity at best), especially in the presence of certain inflammatory pancreatico-biliary conditions, have driven researchers to combine it in multi-marker panels to enhance its predictive performance [26,27,31]. In an evolving multi-omics area, reported panels have included proteins, circulating nucleic acids (micro-RNA, cfDNA) or tumour cells, metabolites, and products of alternative DNA splicing and methylations [31,52], developed to differentiate PDAC from healthy controls and those with benign pathologies. Yet, the role of such diagnostic and screening panels in symptomatic cohorts remains unestablished.
The majority of the sampled population in our study is an enriched, symptomatic, secondary care cohort where the prevalence of PDAC was close to 8%, representing figures observed in our hepatobiliary specialised referral centres. By using this target population and their unique set of serum samples provided by the ADEPTS study [44], we were able to develop a biomarker signature in a cohort of patients who were referred to our participating centres (University College London Hospitals, London UK and the Royal Free Hospital, London UK) with various abdominal and hepatobiliary conditions which in symptomatic presentation might overlap with PDAC [17]. Moreover, we included samples from patients with known risk factors for PC (chronic pancreatitis, those with family history of PDAC and cystic lesions of the pancreas, CLPs) and with biliary conditions that are known confounders of CA19-9 (i.e. biliary tract inflammation/obstruction, pancreatitis, CLPs), the only tumour marker clinically applied in the workup and management [26,48,53] of PDAC.
We employed ensemble learning methods, which have achieved impressive accuracy in numerous complex classification tasks [37,43,54,55]. Specifically, we utilized stacking, a form of meta-learning [43], to create a superior-level predictive model based on the predictions of diagnosis-specific base classifiers. These classifiers leveraged a diverse set of features, highlighting the fundamental importance of heterogeneity arising from specific diagnoses when compared against PDAC, an approach previously demonstrated to be effective [37,54]. Moreover, this study enabled us to evaluate the specificity of our general early detection machine learning approach [37] within a relevant symptomatic population, thereby allowing us to address confounding factors that may impact its performance. The use of such diagnostic specialized base-learners was further justified by the data asymmetry between PDAC cases and controls observed in both the discovery and held-out validation datasets.
The performance of this index panel and model development methodology must be appreciated within the context of the complex biology associated with each of the ensembled diagnostic classes, i.e., the challenges associated with biomarker alterations on the background of pancreatico-biliary inflammatory and obstructive pathologies. When applying a reduced, 8-marker signature (CA19-9, VWF, CPE, CTSV, CEACAM1, CD160, Diabetes and Age), comprising features that were selected with relatively high importance across most base learners, the performance was naturally reduced, yet the signature still performed significantly better than CA19-9 as a single marker during discovery. Using the generalized linear model stack, as was done in the case of the full index, the reduced signature predicted PDAC with an AUC of 0.97 (95% CI 0.95-0.98), sensitivity 0.98 (95% CI 0.95-1), PPV 0.92 (95% CI 0.91-0.92) and NPV of 0.98 (95% CI 0.94-1), at 90% specificity (Fig EC in S1 Appendix), values comparable to the full index. During validation, however, the predictive capacity of the reduced signature was significantly lower than that of the full stacked model (Fig ED in S1 Appendix) and only marginally superior to CA19-9 alone across the cohort. In contrast, it still outperformed CA19-9 by a significant margin when predicting PDAC against healthy controls (supplementary text in S1 Appendix).
As validation of its performance, we also applied the full ensemble index signature to the cohort when re-stratified based on presenting symptoms, with no further model refinement. Since the ensemble of classifiers was developed independently of symptomatic data, the aim was to test the signature performance in differentiating PDAC cases from controls by accounting for presenting symptoms which have been linked with repeated primary care consultations up to two years prior to PDAC diagnosis [17] (Fig 4). Enriched by fulfilling certain sociodemographic, clinical and attributable suspicious symptoms (identified using eCDSTs such as the QCancer tool), symptomatic patients would form an ideal cohort for further risk stratification by minimally invasive blood biomarker testing for prioritisation of more invasive (and costly) investigations. Yet, contrary to the full ensemble index, in the cohort used in the current work the QCancer score used as the sole predictor of PDAC did not achieve significant performance in samples above the threshold of 3%. This further motivates the recourse to combined strategies where complementary biomarker panels, such as those identified by ensemble modelling approaches, could improve early detection when used in conjunction with eCDSTs.
In our test subjects, however, only 'Jaundice' (p = 3.22×10⁻¹⁵) and 'Weight Loss' (p = 1.44×10⁻⁶) were significantly associated with PDAC. When testing the diagnostic performance of the full index signature in all symptomatic patients presenting with 'Weight Loss', the signature significantly outperformed CA19-9: AUC signature of 0.95 (95% CI 0.90-1) vs. AUC CA19-9 of 0.74 (95% CI 0.58-0.90) (Fig 5A and Tables 2 and I in S1 Appendix). 'Weight loss' has previously been reported to have the longest diagnostic interval in a prospective primary cohort study (SYMPTOM pancreatic study) assessing symptom trends and associated diagnostic intervals in PC [11]. Attesting to the full index signature's capacity as a rule-out test in such patients is its outstanding negative predictive value compared to that of CA19-9 (0.97, 95% CI 0.75-1 vs. 0.81, 95% CI 0.73-0.9, respectively) (Table 2). Similarly, the index signature outperformed CA19-9 in jaundiced patients (AUC of 0.89 (95% CI 0.79-0.99) vs. CA19-9 AUC of 0.70 (95% CI 0.53-0.86), see also Table I in S1 Appendix), which underscores once again the increased capacity of the ensemble index to better identify PDAC in the presence of a known confounder of CA19-9 [26,48].
While our study provides valuable insights, it is not without limitations. While the observed prevalence of PDAC in this study aligns with secondary care population trends, enhanced specificity and positive predictive value would necessitate larger cohorts with an increased number of cases. Moreover, the sample set representing the 'Healthy' control class warrants expansion to incorporate a more diverse population of both men and women. This control class, derived from the UKCTOCS samples used in a previous study [37], was exclusively comprised of women. Given its superior performance in predicting PDAC, as depicted in Fig 2, the inclusion of male samples within this class could further enhance the breadth of the panel of markers identified in this study. Lastly, although diabetes emerged both as a risk factor and a central clinical covariate in our signature (including in the reduced panel), we must emphasize and recognise the lack of complete (type, duration) data in the UKCTOCS cohort [37]. Nevertheless, diabetes mellitus (in particular of new onset) is an established risk factor and therefore its inclusion as a relevant feature in the signature is of no surprise [6].
While our index was superior in its predictive performance to CA19-9 alone and other biomarker combinations reported in the literature, in addition to compensating for asymmetric binary classes by creating a diagnostic-specific ensemble, its complexity challenges its utilisation in clinic. Yet, in the current era of rapidly evolving assay technologies, the utilization of a complex biomarker signature comprising numerous variables has gained significant relevance. While the complexity of these biomarker signatures may pose analytical challenges, the evolving assay technologies offer the means to effectively harness their potential.
Future enhancements, however, will naturally necessitate the study of larger cohorts and multi-modal data, potentially incorporating a biomarker-contextualized machine learning perspective that accounts for sample-specific aspects related to diagnosis, a strategy employed in other cancer research domains [56]. The utilization of disease trajectory tracking and clinical history analysis [57] may also facilitate the application of advanced deep learning techniques and electronic health data. When combined with ensemble biomarker signatures taken, for example, in a longitudinal context [37,58], these approaches could enhance the estimation of PDAC risk within an enriched symptomatic population.

Ethics statement
For the Accelerated Diagnosis of neuro Endocrine and Pancreatic TumourS (ADEPTS) study, University College London (UCL)/University College London Hospital (UCLH) Research Ethics Committee reference 06/Q0512/106, IRAS Number 234637, NIHR portfolio no. 7343, patients were recruited at gastroenterology/hepatobiliary and surgical clinics at UCLH and the Royal Free Hospitals (RFH), London, UK. All patients recruited to the ADEPTS study provided written informed consent and no data allowing identification of patients was used.
For the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS), Joint UCL/ UCLH Research Ethics Committee A (Ref. 05/Q0505/57), written informed consent for the use of samples in the trial and secondary ethically approved studies was obtained from donors and no data allowing identification of patients was used.

Study design
As our cohort, we used serum samples from the ADEPTS study [44] aimed at detecting pancreatic cancer in patients at an earlier stage. As part of the Early Diagnosis Research Alliance, the ADEPTS study (previously referred to as TRANSlational research in BILiary tract and pancreatic diseases study) commenced in 2018 and included a multicentre prospective blood sample collection from patients with non-specific but concerning symptoms associated with PDAC. Patients were recruited at gastroenterology/hepatobiliary and surgical clinics at UCLH and the RFH, London, UK. Blood samples were collected from subjects with benign hepatobiliary conditions as well as those with PDAC (stages I-IV).
For PDAC patients, tumour staging was performed according to the AJCC 8th edition (TNM) based on cross-sectional imaging and, for those undergoing surgery, based on multidisciplinary team recordings. All included PDAC cases were histologically confirmed by UCLH and RFH local pathologists based on tissue analysis obtained by endoscopic ultrasound guided fine needle biopsies or specimens obtained during surgical resection.
For benign disease controls, patients were selected to include the following diagnoses: chronic pancreatitis, intraductal papillary mucinous neoplasms (IPMN), or benign pancreatic diseases (e.g., serous cystadenomas and pancreatic heterotopia). Patients with acute and chronic pancreatitis, pancreatic cysts, benign biliary duct diseases (e.g., IgG4 disease), liver disease, gastritis/reflux disease, gallstones as well as those with familial history of pancreatic cancer, were also used. Samples also included those collected from patients presenting with nonspecific symptoms which were not otherwise explained by an underlying gastrointestinal pathology (such as non-specific abdominal pain and irritable bowel syndrome) as well as other malignancies. Medical history and confirmation of diagnosis were obtained from hospital medical records and included GP and secondary clinic referral letters. For 45 patients, a QCancer score was available at the time of specialist centre consultations. QCancer calculates the probability of an individual harbouring an existing, yet undiagnosed cancer, by considering their specific risk factors and presenting symptoms. These scores are digitally available for primary care physicians through patient record and data management portals such as EMIS Web and INPS and are designed as clinical decision support tools to aid in the assessment of need for specialist referrals [21-24].
To further represent the healthy population, we also used 72 healthy control samples from UKCTOCS [36], collected as part of a previously reported nested case-control discovery study [37]. The original UKCTOCS dataset was derived from serum samples collected from post-menopausal women, aged between 50 and 74 years, who were recruited between the years 2001 and 2005 [36]. The collection of these samples was conducted in accordance with a specific Standard Operating Procedure [59,60]. For the current work our interest lies only with the UKCTOCS matched non-cancer controls, i.e., with no cancer registry code, from individual women selected based on collection date, age, and centre to minimize variation due to handling and storage. Comprehensive information regarding diabetes status for the selected UKCTOCS participants was either unavailable or incomplete. In addition, data on disease duration was not accessible. Consequently, it was not feasible to stratify samples to discovery and validation sets based on the type of diabetes they may have had. For the purposes of this study, only healthy controls that were matched to PDAC cases, with less than one year to diagnosis, were utilized.
A total of 539 serum samples (493 controls and 46 PDAC cases, see Table 1) were analysed using the Olink multiplex immunoassay Oncology II panel in addition to five in-house markers: Carbohydrate antigen 19-9 (CA19-9), Interleukin 6 Cytokine Family Signal Transducer (IL6ST/IL6RB), von Willebrand factor (VWF), Pyruvate kinase isozymes M1/M2 (PKM/PKM2) and Thrombospondin 2 (THBS2/TSP2). The selection of additional markers, beyond CA19-9, was informed by our preceding research in early detection of PDAC [37,48]. In those studies, a panel of markers was identified due to its demonstrated ability to facilitate the early detection of pancreatic cancer, with a lead time of up to two years prior to diagnosis.

Serum analyte measurements
All ADEPTS [44] samples were randomized for testing. Table J in S1 Appendix summarizes dilution factors and coefficients of variation. CA19-9 was measured using the Mucin PC/CA19-9 ELISA Kit (Alpha Diagnostic International) according to the manufacturer's instructions, using a 1:4 serum dilution. VWF was measured using the Von Willebrand Factor Human ELISA Kit (abcam) at a 1:100 serum dilution. IL6ST/IL6RB was measured using the Quantikine human soluble gp130 kit (R&D Systems), according to manufacturer recommendations, at a 1:100 serum dilution. THBS2/TSP2 was measured using the Quantikine Human Thrombospondin-2 Immunoassay (R&D Systems) at a 1:10 serum dilution. Pyruvate kinase M2 (PKM2) was measured with an ELISA (Cloud-Clone Corp) at a 1:10 dilution.
We outsourced tests using the multiplex immunoassay Oncology II panel from Olink on all samples. This Olink panel measured known cancer antigens, growth factors, receptors, angiogenic factors, and adhesion regulators (as detailed in Table K in S1 Appendix). Identical assays were performed on a subset of samples derived from the UKCTOCS study [59,60].
To bridge the normalized protein expression values from Olink between the UKCTOCS and ADEPTS datasets, we selected a representative sample set of 16 from each cohort and plated them together. Subsequently, a correction was applied to the datasets using the statistical algorithms recommended in the Olink data normalization white paper [61]. This method ensured that the data from different batches and studies were comparable, thereby enhancing the robustness and validity of the findings.
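As an illustration of the bridging idea described above, the following is a minimal sketch in plain Python (the analysis in this work was carried out with the procedure in the Olink white paper [61], which is not reproduced here). It assumes one common form of bridging: for each assay, the offset between cohorts is taken as the median of the paired differences in normalized protein expression (NPX) across the bridging samples, and that offset is added to one cohort. The function names, data layout and toy values are hypothetical.

```python
from statistics import median

def bridge_offsets(ref_bridge, new_bridge):
    """Per-assay offset: median of paired NPX differences (reference - new).

    Both arguments map assay -> {bridging_sample_id: NPX}, measured on the
    same bridging samples in the two cohorts (assumed layout).
    """
    offsets = {}
    for assay, ref_values in ref_bridge.items():
        new_values = new_bridge[assay]
        shared = sorted(set(ref_values) & set(new_values))
        if not shared:
            raise ValueError(f"no shared bridging samples for {assay}")
        offsets[assay] = median(ref_values[s] - new_values[s] for s in shared)
    return offsets

def apply_offsets(npx, offsets):
    """Shift every NPX value of the new cohort onto the reference scale."""
    return {assay: {s: v + offsets[assay] for s, v in samples.items()}
            for assay, samples in npx.items()}

# Toy example with hypothetical NPX values for three bridging samples.
ref = {"CEACAM1": {"b1": 5.0, "b2": 6.0, "b3": 7.0}}
new = {"CEACAM1": {"b1": 4.5, "b2": 5.5, "b3": 6.0}}
offsets = bridge_offsets(ref, new)
adjusted = apply_offsets(new, offsets)
```

The median is used here because it is less sensitive than the mean to a single discordant bridging sample; whether this matches the exact correction applied in the study is an assumption.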

Statistical analysis
The selected set of ADEPTS samples used in this work was partitioned into two distinct sets: a discovery subset, comprising two-thirds of the total sample size, and a held-out validation subset, encompassing the remaining one-third. All algorithms were trained in the discovery subset. Allocation into each subset was performed by stratifying for specific age ranges, diabetes status, PDAC status and control diagnosis class. For the PDAC cases, tumour stage was also used. The age stratification ranges were the following: 18<Age≤28; 29<Age≤38; 39<Age≤48; 49<Age≤58; 59<Age≤68; 69<Age≤78; Age≥79. The samples assigned to the control class comprised benign conditions such as: Sphincter of Oddi dysfunction, Pancreatic Cyst, Other Cancer, Other Biliary Duct Disease, No Relevant Diagnosis, Liver Disease, Irritable Bowel Syndrome, IgG4 Disease, Gastritis/Reflux Disease, Gallstone Disease, Familial Pancreatic Cancer, Chronic Pancreatitis, Acute Pancreatitis, Isolated LFT Derangement and Non-specific Abdominal Pain. We also added a set of healthy control samples collected from a nested study done in UKCTOCS samples used in a previous paper [62]. The controls matched by age to the PDAC cases in the UKCTOCS cohort that had a time to diagnosis of up to one year were selected. The allocation of these controls to the discovery or held-out validation sets was done according to the division used in our previous work [62]. The number of controls and cases collected for this study can be visualized in Fig 1. UKCTOCS controls are identified as 'Healthy'.
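The stratified allocation described above can be sketched as follows. This is a minimal plain-Python illustration, not the procedure used in the study: the function name, the sample dictionaries and the choice of stratification keys in the toy data are hypothetical, and very small strata would need extra handling in practice.

```python
import random
from collections import defaultdict

def stratified_split(samples, strata_keys, discovery_fraction=2 / 3, seed=0):
    """Split samples into discovery and held-out sets within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sample in samples:
        strata[tuple(sample[k] for k in strata_keys)].append(sample)
    discovery, validation = [], []
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)  # random allocation inside the stratum
        n_discovery = round(len(members) * discovery_fraction)
        discovery.extend(members[:n_discovery])
        validation.extend(members[n_discovery:])
    return discovery, validation

# Hypothetical toy cohort: two control diagnoses and PDAC cases.
samples = (
    [{"id": f"g{i}", "diagnosis": "Gallstone Disease", "diabetes": False}
     for i in range(12)]
    + [{"id": f"c{i}", "diagnosis": "Chronic Pancreatitis", "diabetes": True}
       for i in range(9)]
    + [{"id": f"p{i}", "diagnosis": "PDAC", "diabetes": True}
       for i in range(9)]
)
discovery, validation = stratified_split(samples, ["diagnosis", "diabetes"])
```

Splitting inside each stratum keeps the proportions of each diagnosis class and covariate level similar in the two subsets, which is the purpose of the stratification.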
The final discovery/held-out validation split, i.e., with ADEPTS and UKCTOCS samples, put the prevalence of PDAC in the discovery set at close to 8%. The prevalence of PDAC in the resulting validation set was approximately 14%. The held-out validation set was isolated and not used in any stage of model and biomarker signature development. This is akin to having a blinded dataset.
Receiver operating characteristic (ROC) curves were constructed for each model to assess diagnostic performance. The area under the curve (AUC) for the ROC curves was used as the metric. ROC curves were generated with the pROC R package (version 1.18.0, https://cran.r-project.org/web/packages/pROC/index.html). 95% CIs for AUCs were determined by stratified bootstrapping. All AUC confidence intervals crossing 0.5 were considered to be non-significant. P values comparing ROC curves were also calculated using the pROC package, under a one-sided bootstrap approach with 10000 runs.
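To make the AUC and stratified bootstrap steps concrete, here is a minimal plain-Python sketch. It is not the pROC implementation used in this work: it assumes the AUC is computed as the probability that a case scores higher than a control, and that the confidence interval is a simple percentile interval obtained by resampling cases and controls separately. Function names and scores are hypothetical.

```python
import random

def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control
    (ties count one half); equals the area under the ROC curve."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def stratified_bootstrap_ci(case_scores, control_scores, n_boot=2000,
                            alpha=0.05, seed=0):
    """Percentile CI for the AUC; cases and controls are resampled
    separately so every replicate keeps the original class sizes."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        cases = rng.choices(case_scores, k=len(case_scores))
        controls = rng.choices(control_scores, k=len(control_scores))
        stats.append(auc(cases, controls))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical predicted probabilities.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.1, 0.2, 0.3, 0.5, 0.35, 0.15]
point = auc(cases, controls)
lower, upper = stratified_bootstrap_ci(cases, controls)
```

Under the significance rule stated above, an interval whose lower bound falls at or below 0.5 would be treated as non-significant.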
In order to evaluate the association between each of the single markers available for this work, including clinical covariates (see Fig 1 and Table 1), and PDAC status, we created univariate models using a logistic regression model implemented in the logistf R package (https://cran.r-project.org/web/packages/logistf/index.html, version 1.24.1). This approach fits a logistic regression model using Firth's bias reduction method. The reported confidence intervals for odds ratios and tests were based on the profile penalized log likelihood and incorporate the ability to perform tests where contingency tables are asymmetric or contain zeros. The performance of single marker models was also verified in the discovery and held-out validation sets (see Fig B and Tables C and D in S1 Appendix). The same package was also used to verify the association of the presence of symptoms and PDAC status (see Fig 4).
A comprehensive multi-dimensional examination of the collated data was conducted by employing two distinct analytical frameworks. The first was a stacked ensemble algorithm where base-learners were developed according to the same algorithm but in subsets of the discovery set where samples belonging to a specific control diagnosis class were contrasted against the same 24 PDAC cases (see the proportions in Fig 1). The resulting base-learners were then stacked by a logistic regression model (see Table E for the resulting coefficients and Fig A for the stacking procedures, in S1 Appendix). This approach aimed to leverage the predictive power of multiple models, thereby enhancing the robustness and potentially leading to more precise predictive outcomes [37,43,54]. For each base-learner classifier we used a Recursive Feature Elimination (RFE) routine with logistic regression as the fitting algorithm, available through caret (version 6.0-93, https://cran.r-project.org/web/packages/caret/index.html). Because the prevalence of PDAC cases in the whole dataset was low, contrasting the PDACs against the whole set of controls would require random under-sampling of the majority class (here, benign and healthy controls), which would pose a challenge for most algorithms. Therefore, creating an ensemble of classifiers specialised in contrasting a specific diagnostic class against PDAC gave us more balanced subsets, leading to increased performance (Fig 2). For the samples collected from UKCTOCS no symptoms information was available and, therefore, we created a separate classifier associated with this subset of individuals.
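The structure of the diagnosis-specific stack described above can be illustrated with a deliberately simplified plain-Python sketch. It is not the caret-based pipeline used in this work: it omits recursive feature elimination, cross-validation and resampling, fits every model by plain gradient descent, and trains the meta-learner on in-sample base-learner probabilities. All function names, hyperparameters and toy data are hypothetical.

```python
import math

def predict(w, x):
    """Logistic probability for weights w = [bias, w1, ...] and features x."""
    z = w[0] + sum(wj * v for wj, v in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clip to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Unregularised logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def fit_stack(X, y, diagnosis):
    """One base-learner per control diagnosis (each against the same cases),
    then a logistic meta-learner on the base-learner probabilities."""
    case_idx = [i for i, label in enumerate(y) if label == 1]
    names = sorted({d for d, label in zip(diagnosis, y) if label == 0})
    base = {}
    for name in names:
        control_idx = [i for i, label in enumerate(y)
                       if label == 0 and diagnosis[i] == name]
        idx = case_idx + control_idx
        base[name] = fit_logistic([X[i] for i in idx], [y[i] for i in idx])
    Z = [[predict(base[name], x) for name in names] for x in X]
    return base, names, fit_logistic(Z, y)

def predict_stack(model, x):
    base, names, meta = model
    return predict(meta, [predict(base[name], x) for name in names])

# Toy data: each control class differs from the cases on a different marker.
X = [[2.0, 2.0], [2.5, 1.5], [1.8, 2.2], [2.2, 2.4],      # PDAC
     [0.0, 2.0], [0.2, 1.8], [-0.1, 2.1], [0.3, 2.2],     # class A controls
     [2.0, 0.0], [2.1, 0.2], [1.9, -0.2], [2.3, 0.1]]     # class B controls
y = [1] * 4 + [0] * 8
diagnosis = ["PDAC"] * 4 + ["Gallstone Disease"] * 4 + ["Chronic Pancreatitis"] * 4
model = fit_stack(X, y, diagnosis)
p_case = predict_stack(model, [2.1, 2.0])
p_gallstone = predict_stack(model, [0.0, 2.0])
p_pancreatitis = predict_stack(model, [2.0, 0.0])
```

In the toy data no single marker separates the cases from all controls, but each base-learner sees a balanced, well-separated subset, and the meta-learner only flags a sample when the base-learners agree.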
Selection of type and number of base-learners has been studied before in other areas [37,43]. Different approaches to this problem have been put forward that either focus on a greedy search for the best ensemble, or rely on diversity-based metrics to ensure robustness in external datasets [43]. Here, we have chosen to enforce the diagnosis-specific design to ensure that we relied on clinically relevant features and respected the underlying question of ADEPTS [44]. Nevertheless, we also tested ensemble selection by recursively eliminating base-learners and the best ensemble performer was always the full set of 16 reported above, each contrasting the same PDACs against diagnosis-specific controls.
The second analytical framework followed a more traditional application of state-of-the-art algorithms to the whole discovery set. We tested 3 different algorithms under this framework: random forests (RRF, version 1.9.4, https://cran.r-project.org/web/packages/RRF/index.html); extreme gradient boosting trees (xgbTree, version 1.6.0.1, https://cran.r-project.org/web/packages/xgboost/index.html); and a generalized linear model with RFE (RFE glm). It is important to clearly stress the differences between this RFE glm model and the stacked ensemble model reported above. The additional RFE glm model, despite using similar techniques to each base-learner in the ensemble approach, was applied to the whole discovery data set, without division of the control samples into diagnosis classes. This resulted in one model only as opposed to 16, and therefore there is no need for stacking under this additional approach.
All ensemble base-models as well as all the additional state-of-the-art algorithms mentioned above, i.e., xgbTree, RRF and RFE glm, were trained in the discovery subset with leave-one-out cross validation in order to find the optimal set of input features or the optimum hyperparameters (see Table M in S1 Appendix). 1000 random parameter combinations were tested to achieve optimum performance.
We tested 3 subsampling algorithms combined with each of the models: oversampling of the minority class, the Synthetic Minority Oversampling Technique (SMOTE) [41,63] algorithm and under-sampling of the majority class (see Table F in S1 Appendix). The sub-sampling routines were performed within the cross-validation procedure to avoid overfitting [64-66].
The RFE associated with 2 of the algorithms mentioned above was also performed within the cross-validation folds. This reduces data leakage and overfitting because feature selection is performed for each training fold and a rank of potential feature groups is created based on their cross-validation performance [64-67], thus leading to the most robust option.
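The point about performing resampling inside the cross-validation procedure can be sketched as follows, using random under-sampling and leave-one-out cross-validation as the example. This is a minimal plain-Python illustration with hypothetical function names and labels, not the caret configuration used in this work; the essential feature is that the held-out sample is set aside before any resampling takes place.

```python
import random

def undersample(indices, labels, rng):
    """Randomly drop majority-class indices until both classes are equal."""
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))

def loocv_with_undersampling(labels, seed=0):
    """Yield (train_indices, test_index) pairs for leave-one-out CV.

    The test sample is removed first; only the remaining training fold
    is resampled, so the held-out sample never influences the resampling.
    """
    rng = random.Random(seed)
    for test_index in range(len(labels)):
        train = [i for i in range(len(labels)) if i != test_index]
        yield undersample(train, labels, rng), test_index

labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # hypothetical: 3 cases, 7 controls
folds = list(loocv_with_undersampling(labels))
```

Resampling the whole dataset before splitting would let information about the held-out sample shape the training set, which is the form of leakage the within-fold design avoids.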
To verify if the PDAC index developed with the ensemble stacked approach had any association with metrics used in the clinic but not taken into account in any stage of algorithm training, we also gathered the QCancer pancreatic score [21] for individuals in the ADEPTS study (see Fig 6). This allowed us further validation of the diagnosis-based ensemble index and a view of its potential as a complementary measure.
The procedure for assessing feature importance in each base learner was a model-agnostic method based on a simple feature importance ranking measure [68], implemented in the R package vip (version 0.3.2, https://cran.r-project.org/web/packages/vip/index.html). The model-agnostic interpretability, by decoupling the interpretation from the model itself, introduces a level of flexibility that enables its application across any supervised learning algorithm. Despite the algorithm used for each diagnosis-specific classifier being the same, the model-agnostic approach allows us to generalise the computed importances to other work in the literature.
Enrichment analysis for each of the signatures developed was performed with the gprofiler2 R package (version 0.2.1, https://cran.r-project.org/web/packages/gprofiler2/index.html). A threshold for multiple comparison correction under the framework of false discovery rate was instituted at 0.05.

Fig 2. Performance of individual base-learner classifiers, stack ensemble and state-of-the-art algorithms. A Base-learners performance in the discovery set. Each base-learner classifier was developed by training with a recursive feature elimination technique (RFE) and logistic regression (glm) in samples belonging to each specific diagnosis class against the same 24 PDACs in the discovery set. The performance reported in A is, nevertheless, of each classifier in the whole discovery set. The performances reported in B correspond to the base-learners developed in the discovery set but applied to the whole validation set. In C and D the performance of the ensemble stack based on the base- Fig 3, include the biomarkers CA19-9, VWF, CPE, CTSV, CEACAM1,

Fig 3. Features selected per diagnosis class (base-learner classifiers).The scaled importance is calculated within each base-learner (Fig 2A).Selected features are ranked from left to right according to the average scaled importance across base learners.See Fig 1 and Tables B, C and D in S1 Appendix for the univariate predictive performances of each of the markers in the discovery and validation sets.See Materials and methods section for details on model-agnostic algorithm for feature importance calculation.See S1 Data file for the underlying data for the figure. https://doi.org/10.1371/journal.pcbi.1012408.g003

Table 2. Performance summary for selected models in symptomatic patients. The probability values used to calculate the performance metrics were generated with each model developed in the training set and reported in the main text. Probability values for symptomatic patients belonging to the training set and validation set were concatenated to generate the ROC curves. Only ADEPTS samples had symptoms information. A. L. Derang.: Asymptomatic LFT Derangement. B. Pain: Back Pain. C. B. Habit: Change in Bowel Habit. W. Loss: Weight Loss. 95% confidence intervals are provided in parentheses. See also Table H in S1 Appendix for the explicit performance ranks according to model, symptom and metric and Fig 5. Receiver Operating Characteristic (ROC) Area Under the Curve (AUC), Sensitivity (Sens), Positive Predictive Value (PPV) and Negative Predictive Value (NPV) at 90% Specificity (Spec).

Fig 4. Association between symptoms and PDAC. A Number of subjects with each symptom according to PDAC status, case or control. B Association of symptoms with PDAC status; p values were calculated according to a logistic regression model with a bias reduction method. Purple dashed lines correspond to -Log [0.05]. In B dot sizes correspond to odds ratios and are colour coded according to their respective values, i.e., blue if OR<1 and red if OR>1. See also Table I in S1 Appendix. Only samples belonging to the ADEPTS cohort were used as no information about symptoms was available for the UKCTOCS set of samples. https://doi.org/10.1371/journal.pcbi.1012408.g004

Fig 5. Receiver operating characteristic curves for selected models in symptomatic patients. A Only CA19-9. B Full index signature. C Reduced index signature. The probability values used to calculate the performance metrics were generated with each model developed in the discovery set and reported in the main text. Probability values for symptomatic patients belonging to the discovery set and validation set were concatenated to generate the ROC curves. Only ADEPTS samples had symptoms information. A. L. Derang.: Asymptomatic LFT Derangement. B. Pain: Back Pain. C. B. Habit: Change in Bowel Habit. W. Loss: Weight Loss. See also Table 2 for numerical values for area under the curve and other metrics. https://doi.org/10.1371/journal.pcbi.1012408.g005

Fig 6. Prediction of PDAC in patients with specific symptoms and according to QCancer score values. The ensemble stack was selected as the best model according to Fig 2. A Performance of the stack in participants for which a Qscore had been calculated or above a specific threshold, bigger than 2, 2.5 or 3.0. B Performance of the Qscore taken as the predictor of PDAC risk in participants for which a Qscore had been calculated or above a specific threshold, bigger than 2, 2.5 or 3.0. C Number of subjects that had a calculated Qscore or are above a specific threshold, bigger than 2, 2.5 or 3.0. D Correlation between QCancer score and odds ratio of PDAC according to the stacked ensemble. D is in log scale and R stands for the Pearson correlation coefficient and p for the p-value calculated with a t-test. The QCancer score is identified as Qscore in the figure panels. Receiver Operating Characteristic (ROC) Area Under the Curve (AUC), Sensitivity (Sens), Positive Predictive Value (PPV) and Negative Predictive Value (NPV) at 90% Specificity (Spec). https://doi.org/10.1371/journal.pcbi.1012408.g006
1 and Table A in S1 Appendix, ROC AUC 0.74 (95% CI 0.64-0.83), cut-off at 70) and validation sets (Fig 1 and Table B in S1 Appendix, 0.74 (95% CI 0.64-0.82), cut-off at 60), which incorporated not only ADEPTS samples but also healthy control samples collected from UKCTOCS

Table 1. Cohort characteristics. The data set used to develop and test the classifiers is a combination of samples collected from the ADEPTS cohort and selected controls from the UKCTOCS cohort. BMI: Body Mass Index. See Study Design in Materials and methods section for additional details. Odds ratio (OR) and respective 95% confidence intervals are also provided in the p value column.