A Machine Learning Platform to Optimize the Translation of Personalized Network Models to the Clinic.

PURPOSE
Dynamic network models predict clinical prognosis and inform therapeutic intervention by elucidating disease-driven aberrations at the systems level. However, the personalization of model predictions requires the profiling of multiple model inputs, which hampers clinical translation.


PATIENTS AND METHODS
We applied APOPTO-CELL, a prognostic model of apoptosis signaling, to showcase the establishment of computational platforms that require a reduced set of inputs. We designed two distinct and complementary pipelines: a probabilistic approach to exploit a consistent subpanel of inputs across the whole cohort (Ensemble) and a machine learning approach to identify a reduced protein set tailored for individual patients (Tree). Development was performed on a virtual cohort of 3,200,000 patients, with inputs estimated from clinically relevant protein profiles. Validation was carried out in an in-house stage III colorectal cancer cohort, with inputs profiled in surgical resections by reverse phase protein array (n = 120) and/or immunohistochemistry (n = 117).


RESULTS
Ensemble and Tree reproduced APOPTO-CELL predictions in the virtual patient cohort with 92% and 99% accuracy while decreasing the number of inputs to a consistent subset of three proteins (40% reduction) or a personalized subset of 2.7 proteins on average (46% reduction), respectively. Ensemble and Tree retained prognostic utility in the in-house colorectal cancer cohort. The association between the Ensemble accuracy and prognostic value (Spearman ρ = 0.43; P = .02) provided a rationale to optimize the input composition for specific clinical settings. Comparison between profiling by reverse phase protein array (gold standard) and immunohistochemistry (clinical routine) revealed that the latter is a suitable technology to quantify model inputs.


CONCLUSION
This study provides a generalizable framework to optimize the development of network-based prognostic assays and, ultimately, to facilitate their integration in the routine clinical workflow.


INTRODUCTION
In recent years, there has been a transition from making clinical decisions on the basis of macroscopic characteristics of the tumor to molecular-based biomarkers. [1][2][3][4][5] Advances in omic technologies have led to the development of prognostic and predictive molecular signatures for the majority of solid tumors. 6-17 A few transcriptomic-based taxonomies have been commercialized, and their introduction in the clinic is being evaluated. [18][19][20][21] Dynamic systems models, historically relegated to comprehending cancer biology in basic research, have emerged as valuable biomarkers. [22][23][24][25][26][27] Systems models, which primarily are based on ordinary or partial differential equations, have the inherent ability to encode pathway properties that can serve both as prognostic indicators and as screening platforms to inform patient treatment. The network dynamics, and thus model predictions, can be personalized by tuning critical model inputs with patient-specific measurements, such as protein concentrations or expression of genes and microRNAs. However, robust quantification of model inputs requires a large tumor volume for each sample, highly specialized equipment, and time-consuming protocols not suitable for a fast-paced environment such as a clinical histopathology department.
We developed a framework to identify the essential model inputs that require de novo patient-specific quantification for a specific clinical setting, which therefore assists with the integration of pathway-based biomarkers in the routine clinical workflow (Fig 1). As a case study, we selected a mathematical model of apoptosis execution, APOPTO-CELL, that our group developed 28 and showed to be prognostic in colorectal cancer (CRC) 22,26 and glioblastoma multiforme. 29 We developed two methods to apply APOPTO-CELL with a minimal set of protein inputs: Ensemble and Tree. Ensemble estimates the APOPTO-CELL signature by aggregating the results from a collection of APOPTO-CELL simulations performed with a range of concentrations for the proteins that were not quantified. In contrast, Tree uses a decision tree to estimate the APOPTO-CELL signature from a smaller number of protein inputs. The set of proteins that need to be measured is dictated by the decision tree tailored for the individual patient. We validated our framework in an in-house CRC cohort, with protein inputs quantified by high-throughput techniques, such as reverse phase protein array (RPPA) and immunohistochemistry (IHC).

Development and Validation of Minimal Models for APOPTO-CELL With Virtual and Real-World CRC Cohorts
APOPTO-CELL is a mathematical model that takes as input tumor protein expression of APAF1, procaspase-3 (PC3), procaspase-9 (PC9), SMAC, and XIAP and predicts apoptosis sensitivity. The development of the Ensemble and Tree variants of a minimal model for the APOPTO-CELL signature with a virtual patient cohort is described in the Data Supplement.
We validated our minimal models in a real-world cohort from a retrospective collection of patients with stage II to IV CRC, as previously reported. 26 Clinical follow-up of patients is standardized, and every included patient was reviewed every 6 months for 3 years followed by 6-month or annual visits for years 4 and 5. We used disease-free survival (DFS) and overall survival (OS) as clinical end points. DFS was defined as the absence of a suspicious lesion on surveillance computed tomography of thorax, abdomen, and pelvis performed according to guidelines in place at the treating hospital (every 6 months for 2 years followed annually for 3 years or annually for 3 years). OS was defined as the most recent clinical contact with a patient. We focused our study on patients with stage III CRC treated with fluorouracil-based chemotherapy with clean resection margins and follow-up of at least 1 month after surgery; with sufficient bulk tumor material available for quantification of PC3, PC9, SMAC, and XIAP by IHC; and previous measurements of those proteins by RPPA. 26 The workflow that illustrates data handling and patient inclusion is shown in Appendix Figure A1, and clinical baseline characteristics are listed in the Data Supplement uses a probabilistic approach with clinically grounded distributions in place of individualized measurements for a subset of proteins. In contrast, Tree can be used to personalize the proteins to measure for every patient by traversing the classification decision tree to a node with the predicted APOPTO-CELL signature.

Statistical Analysis
We assessed differences among DFS and OS curves using Kaplan-Meier estimates and evaluated statistical significance by log-rank tests. We estimated the relative risk of relapse and death associated with the signatures by unadjusted and two multivariate Cox proportional hazards regression models (Data Supplement). In multivariate model 1, we controlled for clinical covariates previously found to be associated with survival in univariate analyses, 26

Investigating Minimal Models for APOPTO-CELL
We synthetized a virtual patient cohort to develop and evaluate the performance of two variants of a minimal model for APOPTO-CELL: Ensemble and Tree. We previously profiled all five proteins required by APOPTO-CELL in a cohort of patients with stage II/III CRC by quantitative Western blotting (qWB) 22 (Fig 2A, 1). We used these protein profiles to build clinically relevant distributions of each protein input discretized into 20 percentile bins (ie, 20 concentrations; Fig 2A, 2). We built a virtual CRC cohort with one patient for each permutation of the five protein inputs and the 20 concentrations considered, which resulted in approximately 3,200,000 patients (five-dimensional cube with 20 concentrations five proteins ; Fig 2A, 3). We ran one simulation for each virtual patient and computed the corresponding APOPTO-CELL signature using 25% substrate cleavage (SC) as the cutoff (SC less than or equal to 25% v SC greater than 25% for apoptosis-resistant and apoptosissusceptible predictions, respectively). 22,26,28 Next, we examined systematically the predictions landscape obtained by running APOPTO-CELL with an arbitrary subset of the proteins. For each measurement status (available v not available) and for each of five protein inputs, we calculated the APOPTO-CELL signature. These 32 combinations (two-measurement status five proteins ) span the complete set from having all five protein measurements available to having no quantifications. For each virtual patient, we computed both the ground-truth APOPTO-CELL and the Ensemble signatures. The ground-truth APOPTO-CELL signature was obtained by running APOPTO-CELL using all five proteins as input. The Ensemble signature was determined by running multiple simulations (20 concentrations No. unavailable proteins ) using the available inputs and permutations of possible values for the remaining unavailable protein concentrations. For each virtual patient, we enumerated the simulations classified as apoptosis sensitive or apoptosis resistant and used the majority vote as the overall prediction (Fig 2A, 4). For each combination of proteins, we assessed the fractions of virtual patients for whom the Ensemble and ground-truth APOPTO-CELL predictions matched ( Fig 2B). Unavailability of APAF1 expression affected only a very limited number of simulations (less than 1%), which rendered PC3 + PC9 + SMAC + XIAP the optimal quartet of proteins. Among the trios and duos of proteins, PC3 + SMAC + XIAP and SMAC + XIAP ranked highest (fourth and eighth out of 32), which matched ground-truth APOPTO-CELL predictions in 92% and 87% of the simulations, respectively. We deemed the Ensemble signatures conclusive when , we computed the same predictions for more than 75% of the simulations (supermajority vote; Fig 2A, 4). The PC3 + SMAC + XIAP and SMAC + XIAP protein combinations conclusively and correctly categorized 84% and 74% of the virtual patients, respectively. However, we conclusively made incorrect predictions for 3% and 5% of patients (ie, more than 75% of the simulations did not match with ground-truth APOPTO-CELL; Fig 2B). SMAC closely followed by XIAP ranked highest as single proteins and achieved 75% and 73% accuracy, respectively. For all other protein combinations, the ability to make conclusive predictions and to identify apoptosis-resistant virtual patients decreased drastically.
We next sought to identify concentration ranges where apoptosis predictions are driven largely by a single protein ( Fig 2C). For each concentration of the protein of interest, we aggregated by majority vote the APOPTO-CELL signature across 160,000 simulations (20 concentrations four available proteins ). For extreme SMAC (greater than or equal to the 95th percentile) or XIAP (less than the 5th percentile) concentrations, APOPTO-CELL predicted apoptosis sensitivity almost independently of any other protein. Conversely, high XIAP (greater than or equal to the 95th percentile) produced predominantly predictions of apoptosis impairment (86%). The fraction of simulations that predicted apoptosis resistance decreased with increasing concentrations of PC3 and, albeit more moderately, PC9. In contrast, APAF1 expression influenced   APOPTO-CELL predictions only at low concentrations (predominantly less than the 5th percentile). 22,26,28

Development of Classification Decision Trees for Optimal Selection of Model Inputs
Results from Ensemble revealed high prediction accuracy for subpopulations of patients from only a single or a reduced set of inputs. Hence, we examined whether we could optimally select which proteins to measure, personalized to each patient, and quantify additional inputs only when essential.
We trained a binary classification decision tree (Tree) on the 3,200,000 virtual patients using protein concentrations as input features and the ground-truth APOPTO-CELL Tree Size Tree with Gini index and per protein penalty   signature (SC greater than 25% v SC less than or equal to 25%) as target class (Data Supplement). The root and internal nodes include a predictor (protein) and a split point (concentration). The root node contains the predictor and concentration, which yielded the best split. The leaf node encodes the APOPTO-CELL signature and its confidence level. Trees that meet a required prediction confidence can be built by tuning the splitting routine. To build a tree, we tested all possible predictors/proteins and split points/ concentrations and selected recursively the optimal split (protein and concentration) with the Gini index as evaluation metric (Fig 3A). To build trees that would favor re-use of proteins previously evaluated in the internal nodes rather than require proteins not yet in use, we introduced a perprotein penalty in the Gini index metric (penalized Gini index; Fig 3B). By varying the confidence threshold and the per-protein penalty in a parameter scan, we observed an increase in size ( Fig 3C) and accuracy (Fig 3D) when comparing trees built with the default and penalized Gini index metrics (light and dark red solid lines, respectively). Tree outperformed the best (solid line) and individual (single data points) Ensemble approaches (Fig 3D, in gray) previously outlined in Fig 2C. Tree can be customized for a given clinical application. As an illustrative example, we deemed it worth measuring an additional protein only if it would result in an increase of at least 5% in accuracy (best trade-off tree; Fig 3D, 3E).

Validation of Minimal Models in a Cohort of Patients With Stage III CRC, With Protein Inputs Quantified by RPPA
We validated the minimal APOPTO-CELL models in a realworld cohort of patients with stage III CRC treated with fluorouracil-based chemotherapy for whom the expression of PC3 + PC9 + SMAC + XIAP had been previously determined by RPPA. 26 We evaluated the performances of Ensemble for the best double, triple, and quadruple combination of proteins identified in Fig 2C. For each patient, we tested 20 concentration No. unavailable proteins scenarios, which resulted in 20 3 , 20 2 , and 20 1 simulations for the best double, triple, and quadruple protein combinations, respectively, which totaled approximately 1,010,400 evaluations for 120 patients. We observed differences in Kaplan-Meier estimates for OS (P = .04) but not for DFS (P = .24) when comparing patients categorized as apoptosis sensitive or resistant by Ensemble for the best duo combination (Figs 4A and 4E; Data Supplement Table). When testing the best trio and quartet, we found that patients predicted as apoptosis impaired by Ensemble showed reduced DFS and OS compared with those predicted as apoptosis sensitive (P , .05; Figs 4B, 4C, 4F, and 4G; Data Supplement Table). Furthermore, when testing systematically all combinations of proteins, we found a statistically significant association (Spearman ρ = 0.43; P = .02) between Ensemble accuracy and prognostic value (Fig 5).
We observed differences in DFS (P = .054) and OS (P = .01) probabilities when comparing patients categorized by Tree-4P (best trade-off tree built with only the four proteins measured by RPPA and selected as in Fig 3D [Figs  4D and 4H; Data Supplement Table]). These results prognostic value for Ensemble. Association between accuracy (ie, agreement with ground-truth APOPTO-CELL; Fig 2B) Overall log-rank P = .14 Overall log-rank P = .28 Overall log-rank P = .09 closely resembled those obtained by using the best quartet of proteins (0.0% and 3.0% difference in concordance index computed for DFS and OS, respectively) while requiring measurement of, on average, only 2.8 rather than four proteins (Figs 4D and 4H; Data Supplement Table).

Minimal Models Provide Insights Into Patient Risk, Even With a Suboptimal Protein Set Quantified by IHC
We tested the performances of Ensemble and Tree on protein inputs quantified by IHC, a technique used in the clinical pathologic routine. When comparing IHC-and RPPA-based measurements (n = 152; Appendix Fig A1), we found an acceptable quantitative agreement for PC3 (Spearman ρ = 0.24; P = .003) and XIAP (Spearman ρ = 0.28; P , .001), whereas we observed no association for PC9 (Spearman ρ = 0.12; P = .15) and SMAC (Spearman ρ = 0.09; P = .29). RPPA is the gold standard for quantitative proteomics [35][36][37] ; thus, for subsequent analyses, we retained only proteins with significant associations between RPPAand IHC-based measurements. Figs 6A and 6B show representative images for PC3 and XIAP of weak, moderate, and strong IHC staining with validated antibodies (Appendix Fig A2; Data Supplement) and corresponding batchcorrected histoscore distributions (Appendix Fig A3). We did not find statistically significant differences for DFS and OS (P . .05) when examining PC3 and XIAP as a single (Appendix Fig A4; Table).
In exploratory analyses, we investigated the use of protein fingerprinting by IHC for PC3 + XIAP and by RPPA for PC9 + SMAC as inputs for Tree-4P (Figs 6F and 6J; Data Supplement Table), ground-truth APOPTO-CELL (Appendix Figs A6A and A6B; Data Supplement Table), and an enriched apoptosis signature developed to account for PC3-mediated apoptosome-independent signaling (APOPTO-CELL-PC3 26 ; Appendix Figs A6E and A6F; Data Supplement Table). All three classifiers retained prognostic value for DFS and OS (P , .05), which corroborates previous results on the basis of RPPA-only quantifications 26 (Appendix Fig A6C). Moreover, permutation analyses (Data Supplement) indicated that PC3 + XIAP expression by IHC critically contributed to the prognostic value of the groundtruth APOPTO-CELL signature (Appendix Fig A6D).

DISCUSSION
The appeal of dynamic pathway models as biomarkers stems from gaining a mechanistic understanding of the network (dys)regulation and potential therapeutic interventions. 38,39 Network models deliver personalized predictions by tailoring the model skeleton with individualized inputs. However, input quantifications represent a major bottleneck in translating mathematical models into the clinic. Inputs should be measurable from minimal amounts of sample with high-throughput, robust, cost-effective, and clinically amenable techniques equally suited for fresh frozen/ formalin-fixed paraffin-embedded tissues.
We used a mathematical model of caspase-driven apoptosis in CRC (APOPTO-CELL 28 ) with proven prognostic value 22,26 and treatment stratification capabilities 40 as an illustrative example. We built and validated a computational platform to capitalize on the systems-level understanding provided by the network dynamic while optimizing the inputs required. This approach not only reduces the cost for molecular characterization but also limits the exhaustion of clinical material. We developed two computational methods (Ensemble and Tree) with distinct and complementary applications in clinical settings.
We designed Ensemble to identify a consistent inputs subset to quantify for all patients enrolled in a study, whereas Tree provides a personalized input prioritization system. Ensemble could be applied when designing studies with different cohort characteristics, such as different CRC stages and cancer types. Ensemble also could be applied in clinical trials when planning the development Translating Network Biomarkers Into the Clinic of companion assays for clinical testing of inhibitor of apoptosis protein antagonists. Furthermore, it enables the integration of historical data sets. Of note, Ensemble also would provide estimates of loss of accuracy and prognostic power as rationale for optimizing the cost-effectiveness of the model reduction.
In contrast, Tree is the optimal approach for individual patients who come to the clinic, preserving 99% accuracy while requiring 46% fewer quantifications. However, this approach has the overhead necessary to manage allocation of patients through the various quantification paths. Thus, for this method to be valuable and easily adopted in busy and fast-paced clinical settings, a robust handling system would need to accompany the deployment of tree-based approaches.
For some applications, not all model inputs may be detected accurately and reliably by technologies that integrate seamlessly in the clinic. Techniques such as qWB and quantitative polymerase chain reaction may be optimal for protein and transcript profiling in the development, validation, and refinement phases of a dynamic system model cycle. However, correlation of mRNA and protein levels often is poor in tumor tissue, 41,42 and qWB is not a routine biochemistry/histopathology technique in the clinic. We previously showed how APOPTO-CELL inputs could be quantified reliably not only by qWB but also by RPPA. 26 However, integration of an RPPA-based workflow into the clinical environment is also not feasible because RPPA is an exploratory tool used for the collective high-throughput analysis of larger patient cohorts. In contrast, we demonstrated that at least for a subset of the inputs, IHC, a more clinically amenable protein profiling alternative, represents a valid substitute. Ensemble predictions using patientspecific quantifications for PC3 + XIAP outperformed the prognostic value of the same two proteins used as a combinatorial biomarker. These results highlight how Ensemble makes it possible to exploit system-level properties, even with a suboptimal inputs set, instead of falling back to combinatorial static biomarkers. Of note, Ensemble provides an estimate of confidence in the model predictions and thus allows the flagging of patients who require further investigations.
Although our study has the limitation that the proteins under investigation currently are not assessed routinely, the methods developed can be applied to multiple diagnostic assays that investigate oncogenic pathways and proteins that are indeed currently routinely assessed in the clinic. Although many challenges are at play when attempting to bridge the gap between bench and bedside, this work presents actionable methods to assist in the application of systems medicine approaches in the clinic.       Overall log-rank P = .007 and (B) and PC3 expression by IHC (Fig 6A).
Translating Network Biomarkers Into the Clinic