Personalised progression prediction in patients with monoclonal gammopathy of undetermined significance or smouldering multiple myeloma (PANGEA): a retrospective, multicohort study

Summary

Background
Patients with precursors to multiple myeloma are dichotomised as having monoclonal gammopathy of undetermined significance or smouldering multiple myeloma on the basis of monoclonal protein concentrations or bone marrow plasma cell percentage. Current risk stratifications use laboratory measurements at diagnosis and do not incorporate time-varying biomarkers. Our goal was to develop a monoclonal gammopathy of undetermined significance and smouldering multiple myeloma stratification algorithm that utilised accessible, time-varying biomarkers to model risk of progression to multiple myeloma.

Methods
In this retrospective, multicohort study, we included patients who were 18 years or older with monoclonal gammopathy of undetermined significance or smouldering multiple myeloma. We evaluated several modelling approaches for predicting disease progression to multiple myeloma using a training cohort (with patients at Dana-Farber Cancer Institute, Boston, MA, USA; annotated from Nov 13, 2019, to April 13, 2022). We created the PANGEA models, which used data on biomarkers (monoclonal protein concentration, free light chain ratio, age, creatinine concentration, and bone marrow plasma cell percentage) and haemoglobin trajectories from medical records to predict progression from precursor disease to multiple myeloma. The models were validated in two independent validation cohorts from National and Kapodistrian University of Athens (Athens, Greece; from Jan 26, 2020, to Feb 7, 2022; validation cohort 1), University College London (London, UK; from June 9, 2020, to April 10, 2022; validation cohort 1), and Registry of Monoclonal Gammopathies (Czech Republic; from Jan 5, 2004, to March 10, 2022; validation cohort 2). We compared the PANGEA models (with bone marrow [BM] data and without bone marrow [no BM] data) to current criteria (International Myeloma Working Group [IMWG] monoclonal gammopathy of undetermined significance and 20/2/20 smouldering multiple myeloma risk criteria).

Findings
We included 6441 patients, 4931 (77%) with monoclonal gammopathy of undetermined significance and 1510 (23%) with smouldering multiple myeloma. 3430 (53%) of 6441 participants were female. The PANGEA model (BM) improved prediction of progression from smouldering multiple myeloma to multiple myeloma compared with the 20/2/20 model, with a C-statistic increase from 0·533 (0·480–0·709) to 0·756 (0·629–0·785) at patient visit 1 to the clinic, 0·613 (0·504–0·704) to 0·720 (0·592–0·775) at visit 2, and 0·637 (0·386–0·841) to 0·756 (0·547–0·830) at visit 3 in validation cohort 1. The PANGEA model (no BM) improved prediction of smouldering multiple myeloma progression to multiple myeloma compared with the 20/2/20 model with a C-statistic increase from 0·534 (0·501–0·672) to 0·692 (0·614–0·736) at visit 1, 0·573 (0·518–0·647) to 0·693 (0·605–0·734) at visit 2, and 0·560 (0·497–0·645) to 0·692 (0·570–0·708) at visit 3 in validation cohort 1. The PANGEA models improved prediction of monoclonal gammopathy of undetermined significance progression to multiple myeloma compared with the IMWG rolling model at visit 1 in validation cohort 2, with C-statistic increases from 0·640 (0·518–0·718) to 0·729 (0·643–0·941) for the PANGEA model (BM) and 0·670 (0·523–0·729) to 0·879 (0·586–0·938) for the PANGEA model (no BM).

Interpretation
Use of the PANGEA models in clinical practice will allow patients with precursor disease to receive more accurate measures of their risk of progression to multiple myeloma, thus prompting more appropriate treatment strategies.

Funding
SU2C Dream Team and Cancer Research UK.


Introduction
Multiple myeloma is often preceded by two precursor conditions, monoclonal gammopathy of undetermined significance and smouldering multiple myeloma, with current diagnostic criteria differentiating these from symptomatic multiple myeloma, 1–4 as defined by SLiM-CRAB guidelines: clonal bone marrow plasma cells greater than or equal to 60%; serum free light chain (FLC) ratio greater than or equal to 100, provided involved FLC level is 100 mg/L or higher; more than one focal lesion on MRI; hypercalcaemia; renal failure; anaemia; and bone lesions. 5 Various criteria have been developed to stratify patients with precursor disease into risk groups based on predicted probability of progression to multiple myeloma and to identify which patients might benefit from early intervention. The Mayo criteria stratify patients with smouldering multiple myeloma into risk categories according to whether they have no risk factors (low risk), one risk factor (intermediate risk), or two or more risk factors (high risk). The risk factors are an FLC ratio of more than 20, a monoclonal protein concentration of more than 2·0 g/dL, and a bone marrow plasma cell percentage (BMPC%) of more than 20%. 6 This 20/2/20 stratification system was updated by the International Myeloma Working Group (IMWG) to include the fluorescence in-situ hybridisation (FISH) results of t(4;14), t(14;16), gain(1q), and del(13/13q). 7 These models are applied at precursor diagnosis and rely on discrete cutoffs despite inherent variation in biomarkers throughout disease monitoring. 8,9 Consequently, the models are rarely used to restratify patients according to evolving laboratory findings, 8,9 despite improvements to the ability of the 20/2/20 model to prognosticate when applied at discrete timepoints after diagnosis. 10 Current risk stratification criteria are also limited by variation in the availability and measurement of bone marrow biomarkers.
Smouldering multiple myeloma progression risk is often estimated using BMPC%, and the arbitrary cutoff of 10% BMPC is used to dichotomise monoclonal gammopathy of undetermined significance and smouldering multiple myeloma. However, the use of discrete BMPC% categories is limited by heterogeneity of the involved marrow, an absence of early-stage biopsies, and heterogeneous interpretations by pathologists. 11,12 Previous studies have shown that some rates of change of biomarkers more accurately predict progression than a discrete value at a single timepoint. For example, evolving M-protein (monoclonal protein) and haemoglobin concentrations were independent predictors of progression within 2 years for patients with smouldering multiple myeloma. 13 Also, Markov models of longitudinal data enhance predictions of myeloproliferative disease progression. 14 These studies suggest a need for the development and validation of prediction models that incorporate time-varying biomarkers to update risk throughout precursor evolution and to prognosticate time to progression, particularly for haematological diseases that rely heavily on longitudinal serum measurements.
To address this need, we developed the Precursor Asymptomatic Neoplasms by Group Effort Analysis (PANGEA) model, which uses time-varying clinical biomarkers to model how precursor progression risk to multiple myeloma evolves for a single patient over time, both with and without bone marrow biopsies. We assembled a cohort of patients with monoclonal gammopathy of undetermined significance and patients with smouldering multiple myeloma with serial laboratory measurements and we developed multivariate Cox models with time-varying patient profiles to predict precursor progression to multiple myeloma. Our hypothesis is that disease progression from monoclonal gammopathy of undetermined significance or smouldering multiple myeloma to overt multiple myeloma can be anticipated by trends in clinical values that are associated with clonal proliferation and that modelling these changes can improve predictions of progression risk. We strove to develop models with commonly available biomarkers to allow for broad clinical application, and we validated these models in two independent cohorts. This validation illustrates that both PANGEA models (with bone marrow biopsy [BM] and without bone marrow biopsy [no BM]) outperform the prediction accuracy of previous models in multiple cohorts. Finally, we provide an online calculator implementing the PANGEA model that allows clinicians and patients to assess individual risk of progression and consider early therapeutic interceptions.

Research in context

Evidence before this study
Prediction models are used to predict future outcomes through the analysis of large datasets. We searched for evidence of time-varying prediction models in precursor disease through PubMed, Google Scholar, and MEDLINE from database inception to April 31, 2022, in the English language. Terms included in this search were "monoclonal gammopathy of undetermined significance", "MGUS", "smoldering multiple myeloma", "SMM", "multiple myeloma", "progression", "prediction", and "modeling". Results included primarily analyses of current standard risk criteria for precursor disease progression. There were no prediction models that used multivariable, time-varying biomarkers to predict the risk of precursor disease progression to multiple myeloma.

Added value of this study
The PANGEA project is, to our knowledge, the largest international project of time-varying biomarker data on patients with precursors to multiple myeloma. Our findings show that the PANGEA models are more accurate than current precursor progression risk criteria including the International Myeloma Working Group (IMWG) risk stratification for monoclonal gammopathy of undetermined significance and the 20/2/20 risk stratification for smouldering multiple myeloma. These accuracy improvements were also demonstrated in large, independent validation cohorts.

Implications of all the available evidence
The improved accuracy of the PANGEA models over current risk criteria suggests that models that incorporate dynamic measurements of myeloma-specific parameters can improve clinicians' ability to make therapeutic decisions for individual patients. The PANGEA models can be directly accessed in clinic and are appropriate replacements of the IMWG risk stratification criteria for patients with monoclonal gammopathy of undetermined significance and 20/2/20 risk criteria for patients with smouldering multiple myeloma.

Study design
In this retrospective, multicohort study, we included an international cohort of patients with precursor disease to multiple myeloma with serial clinical and biological variables. Patients were identified retrospectively at oncology centres. This study was approved by the DFCI Institutional Review Board and done in accordance with the Declaration of Helsinki. Consent was waived due to the non-invasive nature of this research.

Participants
The PANGEA project included patients with smouldering multiple myeloma and monoclonal gammopathy of undetermined significance within three independent cohorts: the training cohort, which included patients at DFCI (annotated from Nov 13, 2019, to April 13, 2022); validation cohort 1, which included patients at University of Athens (annotated from Jan 26, 2020, to Feb 7, 2022) and patients at UCL (annotated from May 9, 2020, to April 10, 2022); and validation cohort 2, which included patients at RMG (annotated from May 1, 2004, to March 10, 2022). For more information on the cohorts see appendix (p 1).
Patients from all four sites were eligible for inclusion if they were aged 18 years or older and had been diagnosed with non-IgM monoclonal gammopathy of undetermined significance or smouldering multiple myeloma by the IMWG criteria. Patients diagnosed with overt multiple myeloma at diagnosis were excluded from analysis, and patients treated with therapy during their precursor disease course were censored at treatment start dates. Patients were included in analysis until the date of progression per SLiM-CRAB criteria, death, or initiation of treatment. In all three cohorts, patients were selected for analysis from tissue-banking and retrospective monitoring trials for precursor disease states.

Procedures
The time of diagnosis and the first visit (visit 1) coincided in all cohorts (ie, the average time between date of original diagnosis and visit 1 was 0 months for training cohort, validation cohort 1, and validation cohort 2).
We retrieved patient information for total protein, IgA via nephelometry, IgM, IgG, κ-free light chain (FLC) and λ-FLC via Optilite (Binding Site, Birmingham, UK), FLC ratio (involved and uninvolved), calcium, creatinine, albumin, haemoglobin, lactate dehydrogenase, β2-microglobulin, M-protein, and bodyweight from medical records. Serial values were annotated on average at 5 (IQR 3–8) month time intervals from the date of monoclonal gammopathy of undetermined significance or smouldering multiple myeloma diagnosis, censoring at the date of progression to active multiple myeloma, last follow-up, initiation of precursor treatment, or death. We also retrieved data on gender, race, ethnicity, age at diagnosis, height, progression, survival status, immunofixation isotype, and bisphosphonate use. For all bone marrow biopsies, plasma cell percentages were collected from core biopsy samples and FISH results from bone marrow aspirates (appendix p 4).
We built the PANGEA model, a multivariate Cox regression with time-varying biomarkers, by selecting clinically significant predictors of progression (age, FLC ratio, M spike in g/dL, creatinine in mg/dL, and BMPC%) identified using the training cohort. FLC ratio and creatinine concentration were log-transformed to reduce outlier effect. We also evaluated whether biomarker trends correlated with the progression risk and selected decreasing haemoglobin concentration as a categorical trend variable (appendix p 3). We compared the predictive accuracy of this model with those created through backward selection and Bayes information criterion and selected the most accurate model containing the least redundancy.
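A Cox regression with time-varying biomarkers is usually fitted on data arranged in counting-process form, with one row per interval between consecutive visits. The sketch below illustrates that layout together with the log-transformation of FLC ratio and creatinine described above. It is a minimal illustration, not the study's code: the function name, the dictionary keys, and the handling of the final interval are all assumptions made for the example.

```python
import math

def to_intervals(visits, end_month, progressed):
    """Convert one patient's visits into (start, stop, event) rows.

    visits: list of dicts sorted by 'month' (months since diagnosis).
            Keys 'age', 'm_spike', 'flc_ratio', 'creatinine' are hypothetical.
    end_month: month of progression or censoring.
    progressed: True if follow-up ended with progression to multiple myeloma.
    """
    rows = []
    for i, v in enumerate(visits):
        start = v["month"]
        # Each visit's values apply until the next visit (or end of follow-up).
        stop = visits[i + 1]["month"] if i + 1 < len(visits) else end_month
        if stop <= start:
            continue  # skip zero-length intervals
        rows.append({
            "start": start,
            "stop": stop,
            "event": 0,
            "age": v["age"],
            "m_spike": v["m_spike"],
            # Log-transform to reduce the influence of outliers.
            "log_flc_ratio": math.log(v["flc_ratio"]),
            "log_creatinine": math.log(v["creatinine"]),
        })
    # Only the last interval can end in the progression event.
    if rows and progressed:
        rows[-1]["event"] = 1
    return rows
```

Rows in this shape can be passed to any survival library that accepts start/stop intervals; patients who are censored simply have `event` equal to 0 on every row.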
We developed two versions of the PANGEA model (BM and no BM). Our final Cox model (named the PANGEA model [BM]) included age, FLC ratio, M spike concentration in g/dL, creatinine concentration in mg/dL, BMPC%, and the haemoglobin trajectory variable (appendix p 14). We then eliminated all biomarkers that require a bone marrow biopsy and repeated the modelling process (the PANGEA model [no BM]) with four continuous predictors (age, FLC ratio, M spike concentration in g/dL, and creatinine concentration in mg/dL) and the haemoglobin trajectory variable (appendix p 14). The models assume that the hazard of progression to multiple myeloma is a linear function that only depends on a patient's clinical profile and is conditional on expected time to death.
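For orientation, the generic form of a Cox model with time-varying covariates can be written as below. The notation is ours and is an assumption for illustration: the fitted coefficients are reported in the appendix and are not reproduced here. In this standard formulation it is the logarithm of the hazard that is linear in the covariates.

```latex
h_i(t) = h_0(t)\,\exp\!\big(\boldsymbol{\beta}^{\top}\mathbf{x}_i(t)\big),
\qquad
\log\frac{h_i(t)}{h_0(t)} = \boldsymbol{\beta}^{\top}\mathbf{x}_i(t)
```

Here $h_0(t)$ is an unspecified baseline hazard and $\mathbf{x}_i(t)$ collects patient $i$'s most recent values of age, log FLC ratio, M spike concentration, log creatinine concentration, the haemoglobin trajectory variable, and, for the BM model only, BMPC%.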

See Online for appendix
We developed a web application that allows input of patient variables of the PANGEA model (BM and no BM) using the Shiny R package (1.7.1). The resulting PANGEA app outputs a patient's risk of progression using these biomarkers (monoclonal protein, involved over uninvolved FLC ratio, creatinine, haemoglobin trajectory, and age; appendix p 5). Alternatively, if bone marrow data are not available, users can enter all other variables, and patient progression risk will be evaluated using the PANGEA (no BM) model. If longitudinal measurements are available, users can enter variables at multiple timepoints.
The main outcome measure, time to progression, was defined as the time from precursor disease diagnosis per IMWG criteria 4 to multiple myeloma diagnosis per SLiM-CRAB 5 criteria.

Statistical analysis
We used bootstrapping and calibration analyses (appendix pp 16, 21) and Schoenfeld tests, residual plots, and splines of predictors (appendix pp 11, 19–20) to assess the PANGEA models. R (version 4.2.0) was used for all statistical analyses. The average number of timepoints for validation cohort 1 was six and for validation cohort 2 was one; thus, we used validation cohort 1 to validate how the PANGEA model performed for patients with follow-up and validation cohort 2 to validate how the PANGEA model performed at diagnosis (visit 1). When comparing the PANGEA model with the current risk stratification criteria, application of the IMWG 4 or 20/2/20 6 criteria as binary cutoffs at diagnosis will be referred to as the baseline model and restratification by these criteria as discrete variables over time will be referred to as the rolling model. Subcohorts of patients with smouldering multiple myeloma from validation cohort 1 and validation cohort 2 were used for comparative analyses against the baseline and rolling 20/2/20 models. A subcohort of patients with monoclonal gammopathy of undetermined significance from validation cohort 2 was used for comparative analyses against the baseline and rolling IMWG models.
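The distinction between the baseline and rolling comparators can be sketched in a few lines, using the 20/2/20 thresholds given in the Introduction (FLC ratio more than 20, monoclonal protein more than 2·0 g/dL, BMPC% more than 20%). The function names and the tuple layout are hypothetical, and the sketch assumes that each visit carries a usable value for all three biomarkers (for example, the most recent BMPC% carried forward), which the text does not specify.

```python
def risk_20_2_20(flc_ratio, m_protein_g_dl, bmpc_percent):
    """Return the 20/2/20 risk group from the number of risk factors present."""
    n_factors = sum([
        flc_ratio > 20,
        m_protein_g_dl > 2.0,
        bmpc_percent > 20,
    ])
    if n_factors == 0:
        return "low"
    if n_factors == 1:
        return "intermediate"
    return "high"

def baseline_groups(visits):
    """Baseline model: classify once at visit 1 and keep that group."""
    first = risk_20_2_20(*visits[0])
    return [first] * len(visits)

def rolling_groups(visits):
    """Rolling model: reclassify at every visit from that visit's values."""
    return [risk_20_2_20(*v) for v in visits]
```

For a patient whose values rise across three visits, the baseline model keeps the visit 1 group throughout, whereas the rolling model moves the patient into higher groups as thresholds are crossed.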
The C-statistic is a standard metric used to compare prediction models. A C-statistic of 0·5 indicates that the model performs no better than random chance and a C-statistic of 1 indicates perfect prediction. For the PANGEA models, we computed C-statistics for visits 1, 2, and 3 for validation cohort 1 and at visit 1 for validation cohort 2. For the baseline models, we fit a Cox model in the training cohort to estimate the hazard ratios (HRs) for risk groups and computed the Cox linear combination of predictors and C-statistics in the validation cohorts. For the rolling models, we fit a time-varying Cox model in the training cohort to estimate HRs and computed the C-statistics at visits 1, 2, and 3 in validation cohort 1. The C-statistic estimates for validation cohort 1 and validation cohort 2 are representative of model accuracy in two cohorts independent from the training cohort used for developing the PANGEA models.
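To make the metric concrete, the sketch below computes a concordance statistic for right-censored data in the style of Harrell's C. The text does not state which estimator was used, so this choice is an assumption, and the function and argument names are illustrative. A pair of patients is comparable when the one with the shorter follow-up had the event; the pair is concordant when that patient also has the higher predicted risk.

```python
def c_statistic(times, events, risks):
    """Concordance between predicted risks and observed progression times.

    times:  follow-up time for each patient
    events: 1 if the patient progressed at that time, 0 if censored
    risks:  predicted risk score (higher means earlier progression expected)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A model that assigns every patient the same risk scores 0·5 under this definition, matching the interpretation given above.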
To visualise the time to progression for the validation cohorts, we divided patients into quartiles (low, intermediate-low, intermediate-high, and high risk) based on their predicted risk from the PANGEA models. This discretisation is only used when needed for graphical summaries and for comparisons with models that define risk groups. We visualised these groups using Kaplan-Meier curves for time to progression or death (with patients censored at treatment). In these analyses, we included patients who qualified for the PANGEA models by having all necessary biomarker values available at the visit of interest. We explored whether FISH biomarkers could provide additional prediction improvements to the PANGEA model (BM). Due to the frequent absence or failure of FISH testing and the rarity of some cytogenetic alterations, the training cohort available for this analysis was small. Therefore, we selected patients with one or more successful FISH panels and corresponding laboratory datasets, resulting in a subcohort of patients (appendix pp 8–9). We built the PANGEA model (FISH) by selecting significant predictors.
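The quartile grouping and Kaplan-Meier summaries described above can be illustrated with a short, dependency-free sketch. It is not the study's code: the function names are hypothetical, ties in predicted risk are broken by input order, and the estimator is the textbook product-limit formula evaluated at each distinct event time.

```python
def quartile_groups(risks):
    """Label each patient by the quartile of their predicted risk (by rank)."""
    labels = ["low", "intermediate-low", "intermediate-high", "high"]
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    groups = [None] * len(risks)
    for rank, i in enumerate(order):
        groups[i] = labels[rank * 4 // len(risks)]
    return groups

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up time per patient
    events: 1 for progression (or death), 0 for censoring
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        n_events = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve
```

Calling `kaplan_meier` separately on the patients in each quartile group gives one curve per risk group, which is the form of summary used for the graphical comparisons.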

Role of the funding source
The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.
Variations in accuracy between the three modelling processes (significant predictor selection, backward selection, and Bayes information criterion), as measured by C-statistics, were less than 2%. All variables selected for these models were identical except for the PANGEA model (no BM) produced by Bayes information criterion, which incorporated albumin and isotype. We selected the significant predictor model due to its accuracy and succinctness.
The PANGEA model (no BM) included haemoglobin trajectory, FLC ratio, M spike concentration, age, and creatinine concentration as significant progression predictors (appendix p 14). Total protein, κ-FLC or λ-FLC, calcium (corrected for albumin), LDH, and β2-microglobulin concentrations; bisphosphonate use; family history of haematological malignancy; time with disease; race; ethnicity; and sex were not significant indicators of disease progression.
The PANGEA model improved prediction of smouldering multiple myeloma progression to multiple myeloma compared with both 20/2/20 models (baseline and rolling) in validation cohort 1 and validation cohort 2, as indicated by a C-statistic increase of more than 10%. The PANGEA models improved output probabilities of progression for individual patients with smouldering multiple myeloma in validation cohort 1 and validation cohort 2 and patients with monoclonal gammopathy of undetermined significance in validation cohort 2 (figure 3; appendix p 17) when they were artificially stratified into high, high-intermediate, low-intermediate, and low progression risk groups. We compared the predicted risk groups in validation cohort 1, and 58% of patients with smouldering multiple myeloma who eventually had progression to multiple myeloma were reclassified from a 20/2/20 intermediate-risk or low-risk category into a PANGEA (BM) high-risk category (figure 3B). Furthermore, patients who did not have progression to multiple myeloma were often classified with lower risks than those who did progress (figure 3A, 3C). Similarly, 43% of patients with monoclonal gammopathy of undetermined significance who eventually had progression to multiple myeloma were reclassified from an IMWG lower risk category into a PANGEA model (BM) high-risk category (appendix p 17).
Currently, bone marrow biopsies are the primary source of genomic information available from the clinic. Because genomic aberrations have a crucial role in precursor progression, 15,16 we expanded the PANGEA model (BM) to include FISH covariates. The resulting PANGEA model (FISH) used the significant predictors of age, FLC ratio, M spike concentration, creatinine concentration, BMPC%, del(17/17p), gain(1q), del(13/13q), and haemoglobin trajectory (appendix p 18). We also identified MYC rearrangement (8q24) as a significant covariate in a subcohort of 957 patients from the training cohort and validation cohort 1 who were tested for this translocation (appendix pp 8–9). The significance of FISH biomarkers suggests potential for further improvements to the PANGEA model when additional datasets for validation become available.

Discussion
The study of precursor disease created stratification systems, which identify patients at the highest risk of progression to multiple myeloma. However, current monoclonal gammopathy of undetermined significance and smouldering multiple myeloma progression prediction algorithms stratify patients into risk groups using baseline measurements rather than time-varying

We assembled a cohort of patients with precursor multiple myeloma with extensive longitudinal data to develop the PANGEA models, multivariate Cox regressions that use widely available, time-varying biomarkers with and without bone marrow data, to improve predictions of individual patients' progression risk. The PANGEA models incorporate clinical variables beyond typical measures of tumour burden, including creatinine concentration, age, and haemoglobin concentration, in addition to those in the 20/2/20 criteria (M spike concentration, FLC ratio, and BMPC%). The parameters of the PANGEA models are concordant with recent research that found that decreasing haemoglobin is an independent predictor of smouldering multiple myeloma progression to multiple myeloma 19 and decreased renal function at precursor diagnosis is associated with worse outcomes. 20 Research has also shown that incidences of monoclonal gammopathy of undetermined significance, smouldering multiple myeloma, and multiple myeloma increase with age; 2 the PANGEA models capture this distinction by incorporating an age variable. Additionally, dynamic assessment of risk was suggested by Blade and colleagues 21 as early as 1989 and, more recently, shown by the Mayo group with improvements to the 20/2/20 model's ability to prognosticate when reapplied after diagnosis. 10 However, most of these studies have been small relative to the PANGEA project, have failed to include time-varying biomarkers, and have not been validated in external cohorts. 6,7 A crucial difference between PANGEA and the 20/2/20 risk criteria is that the PANGEA models provide patient-specific probabilities of progression. PANGEA allows for improved prognostication, as validation analyses showed a relative precision improvement over current risk criteria.

[Table footnote: Data shown are C-statistic (95% CI), as tested in patients with smouldering multiple myeloma from validation cohort 1 and validation cohort 2. Bootstrapping is shown in the appendix (p 16).]
When models are applied to the same cohort, C-statistics allowed for direct comparison of predictive accuracy. Analysis of the PANGEA model compared with the baseline and rolling 20/2/20 models for patients with smouldering multiple myeloma and the rolling IMWG for patients with monoclonal gammopathy of undetermined significance all showed changes in C-statistic of greater than 10%. This increase in C-statistic was validated by early identification of patients who later progressed to overt multiple myeloma, with 58% of progressors identified as high risk by the PANGEA model and not by the rolling 20/2/20 model (figure 3). Our comparisons to alternative stratification models highlight that the PANGEA models are clinically appropriate, improve prediction accuracy, and capture changes in disease risk after diagnosis.
A crucial goal of this project was to identify the role of bone marrow biopsies in risk prediction. Despite the reliance of current stratification models on BMPC%, many patients with precursors to multiple myeloma do not regularly undergo bone marrow biopsies or forgo them altogether. These patients cannot be adequately assessed by risk criteria that rely on BMPC%. The PANGEA model (no BM) shows that progression risk can be accurately estimated with trends in serum biomarkers. Specifically, both PANGEA models (BM and no BM) outperform the baseline and dynamic models for the IMWG monoclonal gammopathy of undetermined significance and 20/2/20 smouldering multiple myeloma criteria (appendix p 10, table 2). These data suggest that variables derived from bone marrow biopsies are not required to accurately determine progression risk. When bone marrow biopsy data are no longer required and with considerable biological overlap between monoclonal gammopathy of undetermined significance and smouldering multiple myeloma, 15,16,22,23 prediction models that consider these precursor conditions together are advantageous. With this approach, we foresee a transition from coarse, discrete risk groups (monoclonal gammopathy of undetermined significance vs smouldering multiple myeloma risk groups) to a granular spectrum of the precursor population at the individual level. Regardless of a patient's bone marrow status, the PANGEA model can be used via the online PANGEA app to easily calculate progression risk of all precursor patients.
Genomic and epigenetic factors that lead to multiple myeloma progression are also a crucial part of a patient's progression risk. 15,16,24 Studies have shown that monoclonal gammopathy of undetermined significance and smouldering multiple myeloma clones already harbour chromosomal alterations and that progression to multiple myeloma is due to the expansion of clones that are present in early disease stages. 24–26 We built the PANGEA model (FISH), which incorporated sequential cytogenetic data in personalised risk prediction. The PANGEA model (FISH) is novel in that it examines changes in cytogenetic alterations when providing probabilities of disease progression. The PANGEA model (FISH) shows the predictive value of FISH variables and suggests that previously imperceptible clonal tumour evolution might be approximated by clinical cytogenetic results; however, future studies are required to evaluate this model in independent datasets.
Together, PANGEA is a three-tiered model (BM, no BM, and FISH), which can take advantage of complex clinical tests or be readily available for patients with few data. FISH and bone marrow biopsies were included in our analysis because we acknowledge that both physicians and patients will continue to request them; however, patients without bone marrow biopsies and FISH results can receive accurate risk predictions with the PANGEA model (no BM) as it also outperforms existing models.
The PANGEA models are inherently limited by the selected variables and modelling process, our prioritisation of model simplicity and interpretability, and our assumptions on proportional hazards and non-informative censoring. Larger datasets, advanced machine-learning, and extended validation cohorts have the potential to improve accuracy in the future. We plan to evaluate circulating tumour cells, cell-free DNA, immune variables, and other biomarkers to refine risk stratification. We also aim to use prospective cohorts for further validation and we look forward to ethically including more patients with precursors to multiple myeloma who identify as African American, a population with increased prevalence of precursor conditions. The hope is that the PANGEA models dramatically improve how clinicians can inform patients of their personalised risk of developing myeloma and aid decision making for early therapeutic interception, particularly when recommending follow-up testing to monitor time-varying biomarkers. The PANGEA model is freely accessible, using continuous variables available in all clinical settings, enabling its use at both the individual patient level and in clinical trials for the rapid development of therapeutic interventions.

Declaration of interests
from the International Myeloma Society for travel and conference expenses. JR declares honoraria from Sanofi, Janssen, Amgen, GSK, and Bristol Myers Squibb; travel grants from BMS, Janssen, and Amgen; and funding from a consulting or advisory role from Sanofi, Janssen, Amgen, GSK, and BMS. EK reports honoraria from Amgen, Janssen, Takeda, Genesis Pharma, Pfizer, and GSK; travel grants from Janssen; and is an advisory board member at Janssen and Prothena. MAD declares honoraria from Amgen, BMS, Takeda, and Janssen and is an advisory board member at Amgen, BMS, Takeda, and Janssen. CRM reports research funding from GRAIL.
GG declares honoraria for lectures from Society for Neuro-oncology, Society of Tumor Oncology, and MD Anderson; honoraria as a Paul C Zamecnik Chair in Oncology; research funding from IBM and Pharmacyclics; patents, royalties, other intellectual property as Inventor on patent applications related to MSMuTect, MSMutSig, MSIDetect, POLYSOLVER, and SignatureAnalyzer-GPU; and stock and other ownership interests from Founder as a consultant and has privately-held equity in Scorpion Therapeutics. IMG declares honoraria from Celgene, Bristol-Myers Squibb, Takeda, Amgen, Janssen, and Vor Biopharma; consulting or advisory roles at Bristol-Myers Squibb, Novartis, Amgen, Takeda, Celgene, Cellectar, Sanofi, Janssen, Pfizer, Menarini Silicon Biosystems, Oncopeptides, The Binding Site, GSK, AbbVie, Adaptive, and 10xGenomics; and a spouse who is the Chief Medical Officer at Disc Medicine and holds equity in the company. AC, FF, SSF, GG, LT, and IMG have applied for a patent for the application of the PANGEA models described in this paper.

Data sharing
The PANGEA team encourages collaboration to further model development. Data from this project can be made available in aggregate and after deidentification to investigators who submit appropriate proposals approved by the study team. Please direct questions to irene_ghobrial@dfci.harvard.edu.