B cells are associated with survival and immunotherapy response in sarcoma

Soft-tissue sarcomas represent a heterogeneous group of cancer, with more than 50 histological subtypes1,2. The clinical presentation of patients with different subtypes is often atypical, and responses to therapies such as immune checkpoint blockade vary widely3,4. To explain this clinical variability, here we study gene expression profiles in 608 tumours across subtypes of soft-tissue sarcoma. We establish an immune-based classification on the basis of the composition of the tumour microenvironment and identify five distinct phenotypes: immune-low (A and B), immune-high (D and E), and highly vascularized (C) groups. In situ analysis of an independent validation cohort shows that class E was characterized by the presence of tertiary lymphoid structures that contain T cells and follicular dendritic cells and are particularly rich in B cells. B cells are the strongest prognostic factor even in the context of high or low CD8+ T cells and cytotoxic contents. The class-E group demonstrated improved survival and a high response rate to PD1 blockade with pembrolizumab in a phase 2 clinical trial. Together, this work confirms the immune subtypes in patients with soft-tissue sarcoma, and unravels the potential of B-cell-rich tertiary lymphoid structures to guide clinical decision-making and treatments, which could have broader applications in other diseases. Immune profiling of the tumour microenvironment of soft-tissue sarcoma identifies a group of patients with high levels of B-cell infiltration and tertiary lymphoid structures that have improved survival and a high response rate to immune checkpoint blockade therapy.

Soft-tissue sarcomas represent a heterogeneous group of cancer, with more than 50 histological subtypes 1,2 . The clinical presentation of patients with different subtypes is often atypical, and responses to therapies such as immune checkpoint blockade vary widely 3,4 . To explain this clinical variability, here we study gene expression profiles in 608 tumours across subtypes of soft-tissue sarcoma. We establish an immune-based classification on the basis of the composition of the tumour microenvironment and identify five distinct phenotypes: immune-low (A and B), immune-high (D and E), and highly vascularized (C) groups. In situ analysis of an independent validation cohort shows that class E was characterized by the presence of tertiary lymphoid structures that contain T cells and follicular dendritic cells and are particularly rich in B cells. B cells are the strongest prognostic factor even in the context of high or low CD8 + T cells and cytotoxic contents. The class-E group demonstrated improved survival and a high response rate to PD1 blockade with pembrolizumab in a phase 2 clinical trial. Together, this work confirms the immune subtypes in patients with soft-tissue sarcoma, and unravels the potential of B-cellrich tertiary lymphoid structures to guide clinical decision-making and treatments, which could have broader applications in other diseases.
Soft-tissue sarcomas (STSs) comprise many histological subtypes with distinct clinical and biological behaviours. Genetically 'simple' STSs are characterized by translocations that result in fusion proteins and few, if any, other genomic lesions, whereas 'complex' STSs have an unbalanced karyotype and several genomic aberrations 1 . STSs are considered 'non-immunogenic' with a low mutational burden 2 . Among complex tumours, undifferentiated pleomorphic sarcoma (UPS), dedifferentiated liposarcoma (DDLPS) and-to a lesser extent-leiomyosarcoma (LMS) can exhibit durable responses to immune-checkpoint blockade, whereas simple tumours do not respond to PD1 monotherapy or a combination of anti-PD1 and anti-CTLA4 antibodies 3,4 . Few reports investigating the composition of the tumour microenvironment (TME) in different STS histologies have been published [5][6][7] , but a recent study from The Cancer Genome Atlas (TCGA) consortium suggested an association with prognosis 8 .
Here, we developed a new classification of STS, based on the composition of the TME in large cohorts of STS, using the microenvironment cell populations (MCP)-counter method 9 . We found that the B lineage signature-a hallmark of an immune-high class we called E-correlated with an improved survival of patients with STS, in tumours with both high or low infiltration of CD8 + T cells. In an independent cohort, we used immunohistochemistry to validate the high density of B cells and presence of tertiary lymphoid structures (TLS) in class E. Finally, we showed that class E exhibited the highest response rate to PD1 blockade therapy and improved progression-free survival in a multicentre phase 2 clinical trial of pembrolizumab in STS (SARC028) 4,10 .

Immune classification of STS
The TME compositions from four independent discovery primary STS datasets (TCGA SARC, Gene Expression Omnibus (GEO) accessions GSE21050, GSE21122 and GSE30929) (Extended Data Table 1) with publicly available gene expression profiles were analysed by MCP-counter, a gene-expression-based TME deconvolution tool 9 . An immune-based classification of STS was developed from this analysis (Extended Data Fig. 1, Methods) and tumours were assigned to one of five sarcoma immune classes (SICs), labelled A, B, C, D and E, with highly distinct profiles (Fig. 1).
We compared the SIC distribution across histological subtypes and found that most LMS tumours were classified to SICs A and B (Fig. 1a). DDLPS accounted for half of SIC C tumours. Tumours classified as SICs D and E were more evenly distributed across histological subtypes. Application of the predictor of the immune classes (Methods) to other STS histologies from French Sarcoma Group (FSG) cohort (Extended Data Table 1) revealed that all SICs could be identified in each histology (Extended Data Fig. 2a).
The TME composition differs significantly between SICs (Fig. 1b). Three SICs showed homogeneous profiles. SIC A, 'immune desert', was characterized by the lowest expression of gene signatures related to immune cells, as well as low vasculature. SIC C, 'vascularized', was dominated by a high expression of endothelial-cell-related genes. SIC E, 'immune and TLS high', was characterized by the highest expression of genes specific to immune populations such as T cells, CD8 + T cells, natural killer cells and cytotoxic lymphocytes. Notably, a key determinant of SIC E was the high expression of the B lineage signature (P = 1.8 × 10 −29 ). SICs B and D were characterized by heterogeneous but generally immune-low and immune-high profiles, respectively.
The expression of genes associated with T cell or myeloid cell chemotaxis, T cell activation and survival, major histocompatibility complex class I, and regulatory gene signatures was high in SICs D and E, intermediate in SICs B and C, and very low in SIC A (Fig. 1c). Expression of the lymphoid-structures-associated B-cell-specific chemokine CXCL13 was notably high in E tumours, moderate in D tumours, generally low in B and C tumours, and negligible in A tumours.
The expression of immune-checkpoint-related genes ( Fig. 1d) followed that of immune infiltrates, with high expression of the genes encoding PD1, PDL2, CTLA4 and TIM3 (PDCD1, PDCD1LG2, CTLA4 and HAVCR2, respectively) in SIC E followed by SIC D tumours, and low-tovery-low expression in SIC C, B and A tumours. CD274 (which encodes PDL1) was heterogeneously expressed across SICs, whereas LAG3 was expressed at high levels only in SIC E tumours, and its expression was low in all other classes. The above findings were consistent across the four discovery cohorts (Extended Data Fig. 3).

SICs are associated with patient survival
After confirmation that the two cohorts with available survival data (TCGA SARC, n = 213; GSE21050, n = 283) exhibited similar survival patterns (data not shown), the cohorts were pooled to study the clinical outcome of the five SICs (Fig. 2a). Patients with SIC A exhibited the shortest overall survival compared with group D or E patients (P = 0.048 and P = 0.025, respectively). Similarly, among the other STS histologies from the FSG cohort, patients with SIC A had a shorter overall survival than patients with SIC E (Extended Data Upper whisker extends to whichever is minimal, maximum or third quartile plus 1.5× IQR. Lower whisker extends to whichever is maximal, minimum or first quartile minus 1.5× IQR. P values were determined by chi-squared test (c) or two-sided Mann-Whitney tests (d).
The landscape of copy-number variations, assessed on the TCGA SARC cohort, revealed differences between histologies, consistent with previous observations 8 . However, there was no notable difference in copy-number variation between SICs (data not shown).

In situ validation of SIC profiles in tumours
To validate the TME profiles of SICs in situ, we analysed an independent cohort of 93 STS cases (NTUH cohort) (Extended Data Table 1). Seventythree samples passed quality control for transcriptomic analysis using Nanostring nCounter technology. We classified this cohort into the same five SICs (Methods) with the following distribution: A, 16 (21.9%); B, 19 (26.0%); C, 10 (13.7%); D, 17 (23.3%); and E, 11 (15.1%). The NTUH cohort samples exhibited gene-expression-based TME profiles that were similar to that of TCGA SARC and GSE21050 cohorts (Extended Data Fig. 7a).

TLSs are a feature of SIC E tumours
The CXCL13 chemokine, which is associated with the presence of TLSs 12 , was strongly expressed in SIC E tumours (Fig. 1c, Extended Data Fig. 2c). Expression of CXCL13 was highly correlated with that of the TLS-associated 12-chemokine signature 13 (Extended Data Fig. 8a), which suggests that TLSs could be a marker of SIC E. TLSs were defined as a CD20 + B-cell follicle juxtaposed to a CD3 + T cell aggregate containing at least one DC-LAMP + (also known as LAMP3 + ) mature dendritic cell 12,14-16 (Fig. 3a, left). A strong association between SICs and the presence of TLSs was identified (P = 3.13 × 10 −6 ) (Fig. 3c). No TLSs were observed in tumours from SICs A, C and D, and only one tumour from SIC B had one TLS. By contrast, nine out of eleven (82%) SIC E tumours exhibited one or more TLS. All TLSs were intratumoural (Extended Data Fig. 8b), and found at the periphery and in the centre of the tumour in all histologies (Extended Data Fig. 8c, d).

SICs predict patient response to PD1 blockade
We examined whether SICs can predict the patient response to checkpoint blockade therapy. We obtained 47 pre-treatment STS metastasis biopsies from patients enrolled in the SARC028 clinical trial 4 and its expansion cohort 10 (Extended Data Table 1), which evaluated the efficacy of the anti-PD1 monoclonal antibody pembrolizumab in patients with metastatic STS. Of these 47 patients, 1 achieved a complete response, 9 a partial response, 17 stable disease and 20 had progressive disease (Fig. 4a). Pre-treatment tumours were classified into SICs based on gene-expression data. The objective response rate (ORR) (which accounts for complete and partial responses) as evaluated by response evaluation criteria in solid tumours (RECIST) criteria was 21.2% in the overall cohort. SICs, however, showed substantial variation in ORR, with SIC E patients exhibiting the highest ORR (50%, 5 out of 10), followed by SIC D (25%, 3 out of 12) and SIC C (22%, 2 out of 9) (Fig. 4a). A complete response was found only in SIC E, as well as one patient who had a 100% change in target lesions but a non-complete response in non-target lesions and thus did not qualify for a complete response.
Notably, there were no responders within the SIC A (0 out of 5) and B (0 out of 11) groups (Fig. 4a). Overall, SIC E tumours were associated with the highest response rate to pembrolizumab in comparison with tumours from other SICs (P = 0.026, Fig. 4b). Patients with SIC E tumours also exhibited improved progression-free survival compared with patients with SIC A or B tumours (P = 0.023 and P = 0.0069, respectively) (Fig. 4c).

Discussion
This study is, to our knowledge, the most comprehensive analysis of the STS immune TME and the first to evaluate the prognostic effect of immune infiltrates by simultaneously integrating several immune cell populations and malignant cell characteristics. Previous studies have examined the immune profile of STS tumours, but the importance of B cells and TLSs was not investigated. The clinical effect of CD8 + T cells and PD1 expression has yielded controversial results 7,8,[19][20][21][22][23][24][25] . Here, we found the CD8 + T cell signature and PD1 were expressed in class D and E SICs, which are associated with favourable outcomes, providing high infiltration of B cells. The integrative analysis demonstrates that infiltration by B cells is a key discriminative feature of a group of patients with improved survival. This B-cell-high group was found to respond better to PD1 blockade therapy, although this should be validated on a larger cohort. The field of immuno-oncology is rapidly expanding, and is crucial to accurately identify patients who are likely to respond. Here, we propose a classification for STS that is immune-centric with prognostic effect. It defines a group of patients with a better response to anti-PD1 therapy marked by B cells and TLSs. This finding may have broad applications. Sarcomas are considered immune-quiescent tumours, with a low mutational burden. Nevertheless, our data show that some STSs are immunogenic and that this is driven by B cells. Further work is needed to extend these findings to all STS histologies and other cancers. Similarly, the underlying mechanisms require further investigation, but a possible explanation is that TLSs are sites at which anti-tumoral immunity is generated, with B cells instructing T cells-in particular CD8 + T cells-to recognize tumour-associated antigens 26 . It is noteworthy that TLS-rich tumours are more infiltrated by CD8 + T cells. These T cells can become exhausted, explaining the correlation of the expression of immune checkpoints (such as PD1 and LAG3) with TLSs, and why treatment with checkpoint inhibitors may allow productive anti-tumour immunity in TLS-rich tumours. Overall, our findings lay the foundation for a tool to risk-stratify patients with STS and identify those who may be more likely to benefit from immunotherapies, and may be broadly applicable to other malignancies [26][27][28][29][30] .

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-019-1906-8.

Ethics and patients
Patients diagnosed with DDLPS, LMS and UPS were identified and the pathology diagnosis was confirmed by a certified pathologist in National Taiwan University Hospital. The research was approved by the Research Ethics Committee of NTUH (201605061RINA) for this retrospective study. Formalin-fixed paraffin-embedded (FFPE) blocks were retrieved and 4-5-µm-thick slides were taken for immunohistochemistry staining and RNA extraction for Nanostring testing. Other cohorts were previously published 4,8,[31][32][33][34][35] .

Establishing the immune classification of STS
To establish a robust immune classification of STS, publicly available transcriptomic data from TCGA data portal and the GEO repository representing four large and independent patient cohorts were included. Only tumours from the most common histologies of genomically complex STS were included: LMS, UPS and DDLPS. We analysed data from the TCGA SARC 8 (n = 213), GSE21050 31 (n = 283), GSE21122 32 (n = 72) and GSE30929 33 (n = 40) cohorts.

Public transcriptomic data pre-processing
Transcriptomic data were downloaded from the TCGA data portal (SARC cohort) and GEO (accessions GSE21050, GSE21122 and GSE30929). TCGA SARC was restricted to complex genomics sarcomas (UPS, DDLPS and LMS). Normalized TCGA SARC RNA-sequencing data were log 2transformed. Microarray data were normalized using frozen-RMA method 36 from the R package frma. Batch effect was corrected across series using ComBat 37 , with histology as covariate.

Estimation of the TME composition
The TME composition of each tumour was assessed with the MCPcounter tool 9 , which provides abundance scores for eight immune (T cells, CD8 + T cells, cytotoxic lymphocytes, natural killer cells, B cell lineage, monocytic lineage, myeloid dendritic cells and neutrophils), and two stromal populations (endothelial cells and fibroblasts). The scores are based on analysis of transcriptomic markers-that is, transcriptomic features that are strongly, specifically and stably expressed in a unique cell population. These scores are proportional to the abundance of each cell population in the tumour, therefore allowing intersample comparison and large cohort analyses 38

Intracohort immune classifications
The fibroblasts signature was removed from this analysis as all STS tumours exhibited high and homogeneous scores for this cell population, which is consistent with the mesenchymal origin of STS. The signature for CD8 T cells was removed from the analysis for GSE21050, GSE21122 and GSE30929 as it showed very small variation across all samples in these microarray-based cohorts. Unsupervised clustering of samples in each cohort was performed based on the metagene Z-score for the included populations of MCP-counter (Extended Data Fig. 9a-d) using R software, with the Euclidian distance and Ward's linkage criterion, using the gplots package. The TCGA SARC, GSE21050, GSE21122 and GSE30929 cohorts were separated into 6, 9, 7 and 6 groups, respectively. The number of clusters was chosen empirically following the dendrograms shown in Extended Data Fig. 9a-d. Analysis of the intersample variance revealed that much of the explainable variance was already attained at the chosen number of clusters as visualized in Extended Data Fig. 9e-h.

Pan-cohort immune classes
To aggregate the above four intracohort classifications, the transcriptome matrix of each cohort was independently zero-centred for each gene across all samples. Then, we computed the centroids of each class over the whole transcriptome and analysed the Pearson correlations between all the centroids on the set of genes shared across the four cohorts (Extended Data Fig. 9i). From these correlations, we deduced five SICs. The tumours from six remaining cohort-specific clusters shared intermediate/weak correlation patterns to other clusters and were temporarily labelled as 'unclassified'.

Prediction of the immune classes
Centroids of SICs were computed on MCP-counter intraseries Z-scores for T cells, cytotoxic lymphocytes, B cell lineage, natural killer cells, monocytic lineage, myeloid dendritic cells, neutrophils and endothelial cells, on all cohorts. To predict de novo the immune classes of each of the cohorts, MCP-counter Z-scores were computed, and each sample was assigned to the closest immune class based on its Euclidian distance to the related centroids. The SICs labels used are the ones predicted using this method. Principal component analysis of the 608 samples on the MCP-counter scores shows that the intra-SIC homogeneity was improved by this prediction step (Extended Data Fig. 9j, k), as confirmed by supervised tests across SICs (Extended Data Fig. 9l, m).

Gene signatures for the functional orientation
The signatures used to determine the functional orientation of the TME were derived from the literature 39 . The signatures were the following: immunosuppression (CXCL12, TGFB1, TGFB3 and LGALS1), T cell activation (CXCL9, CXCL10, CXCL16, IFNG and IL15), T cell survival (CD70 and CD27), regulatory T cells (FOXP3 and TNFRSF18), major histocompatibility complex class I (HLA-A, HLA-B, HLA-C, HLA-E, HLA-F, HLA-G and B2M), myeloid cell chemotaxis (CCL2), and tertiary lymphoid structures (CXCL13). For each signature, scores were computed as the geometric mean signature expression.

De novo prediction of the immune classes of additional cohorts and other platforms
The predictor described above was adapted to analyse new and independent samples, from Nanostring-analysed FFPE samples. In a first step, SICs were estimated on the NTUH cohort by sorting samples on the B lineage signature, T cells signature then endothelial cell signature and assigning each sample according to the SIC it resembled the most. Similar to as described above, centroids of each SIC on Nanostring data MCP-counter scores Z-scores were computed and samples were reassigned to the SIC they were closest to the centroid of. For new samples from the SARC028 cohort, MCP-counter scores for T cells, cytotoxic lymphocytes, B lineage and endothelial cells were computed and transformed as Z-scores. Distances with Nanostring-defined centroids presented above were computed with Euclidian metric, and samples were assigned to the SIC with the lowest distance.

Nanostring nCounter analysis
The RNA was analysed using the nCounter Technology (Nanostring Technologies) as per the manufacturer's protocol. Data were normalized using the nSolver software (Nanostring Technologies).

Enzymatic and fluorescent multiplexed immunohistochemistry
The FFPE human tumour and control specimens were cut into 3-µm-thick sections. Human FFPE tonsil sections were used as positive controls for CD3, CD4, CD8, CD20, CD21, CD23, CD34, CXCR5, DC-LAMP, PD1, PDL1 and PNAd, placenta sections were used in addition for PDL1 and cerebral cortex tissue was used as a negative control. The specificity of all antibodies was tested by the manufacturers and the specificity of anti-PD1 antibodies was validated in our laboratory on overexpressing cells pellets as previously reported 40 . Antigen retrieval was carried out on a PT-link (Dako) using the EnVision FLEX Target Retrieval Solutions at High pH (Dako, K8004) or Low pH (Dako, K8005). Endogenous peroxidase activity and non-specific Fc receptor binding were blocked with H2O2 3% (Gifrer, 10603051) and Protein Block (Dako, X0909) respectively. The primary and secondary antibodies used for immunohistochemistry and immunofluorescence are summarized in Extended Data Table 2. Immunohistochemistry and immunofluorescence images were independently analysed blindly by three observers (L.L., C.S.-F. and G.L.).

Fluorescent multiplexed immunohistochemistry
For the PD1, CD20 and CD3 3-plex staining, a tyramide system amplification (TSA) was used. The stainings were performed with a Leica Bond RX. The incubation with TSA reagent was performed after the incubation of the horseradish peroxidase (HRP)-conjugated polymer and was followed by antibody stripping at 97 °C for 10 min. This protocol was repeated for the second and third primary antibodies and corresponding polymer incubations. The dilutions used for the TSA are 1:400 for TSA AF488, 1:800 for TSA AF594 and 1:200 for TSA AF647, as per the manufacturer's recommendations. For the CXCR5, CD4 and PD1 3-plex staining, we used a conventional fluorescent-dye conjugated secondary antibody system performed manually (all secondary antibodies were diluted at 1:100). For all the fluorescent stainings, the nuclei were stained with DAPI Solution (Thermo Fisher, 62248) at 2 µg ml −1 for 10 min. After mounting with ProLongTM Gold Antifade Mountant (Thermofisher, P36934), the slides were scanned with a Zeiss Axio Scan.Z1.

Statistical analysis
All statistical analyses were performed using the R software (v. 3.4.4) and the packages survival, gplots, dunn.test and FactoMineR. The relationship between two categorical variables was estimated with the chi-squared test. The relationship between a categorical variable and a quantitative variable was estimated with the Mann-Whitney U test (two categories) or the Kruskall-Wallis test (three or more categories). All tests were two-sided. In cases with three or more categories, pairwise comparisons were carried out with Dunn tests. The relationship between two quantitative variables was estimated with the Pearson correlation. When appropriate, P values were corrected for multiple hypothesis testing with the Bonferroni or Benjamini-Hochberg methods, as specified in the text or figure legends. Survival was analysed with Kaplan-Meier estimates and log-rank tests. No statistical methods were used to predetermine sample size. The experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment unless stated otherwise.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
The transcriptomic datasets analysed in this study can be accessed on the GDC Portal (portal.gdc.cancer.gov, cohort TCGA SARC) and the GEO repository under accession numbers GSE21050, GSE21122 and GSE30929. FSG cohort data are publicly available from ArrayExpress for gastrointestinal stromal tumour with accession code E-MTAB-373, and from the GEO for synovial sarcomas with accession number GSE40021. Myxoid liposarcomas from the FSG cohort are available from the corresponding authors upon reasonable request. Immunohistochemistry and gene expression data related to the NTUH cohort (Fig. 3

Code availability
All code used in this study is available from the corresponding author upon reasonable request.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code

Data collection
Immunohistochemistry images were analysed with HALO 10 software (IndicaLab). Immunofluorescence data were obtained with AxioScan (Zeiss)

Data analysis
Data was analysed with R software (version 3.4.4) and packages gplots, survival and FactoMineR. Custom code was produced in R for the analysis.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability The transcriptomic datasets analysed in this study can be accessed on the GDC Portal (TCGA SARC) and the Gene Expression Omnibus repository (accession numbers GSE21050, GSE21122, GSE30929). Immunohistochemistry, gene expression and clinical-related to NTUH cohorts (Fig. 3, Extended Data Figs. 7 and 8) are available from the corresponding author on reasonable request. The data that support the findings related to Fig. 4 are available from SARC but restrictions apply to the availability of these data, which were used under license for the study. Data are however available upon reasonable request to HAT (HTawbi@mdanderson.org) and with permission of SARC. All code used in this study is available from the authors upon reasonable request.