Whole-exome sequencing of pancreatic cancer defines genetic diversity and therapeutic targets

Pancreatic ductal adenocarcinoma (PDA) has a dismal prognosis and insights into both disease etiology and targeted intervention are needed. A total of 109 micro-dissected PDA cases were subjected to whole-exome sequencing. Microdissection enriches tumour cellularity and enhances mutation calling. Here we show that environmental stress and alterations in DNA repair genes associate with distinct mutation spectra. Copy number alterations target multiple tumour suppressive/oncogenic loci; however, amplification of MYC is uniquely associated with poor outcome and adenosquamous subtype. We identify multiple novel mutated genes in PDA, with select genes harbouring prognostic significance. RBM10 mutations associate with longer survival in spite of histological features of aggressive disease. KRAS mutations are observed in >90% of cases, but codon Q61 alleles are selectively associated with improved survival. Oncogenic BRAF mutations are mutually exclusive with KRAS and define sensitivity to vemurafenib in PDA models. High-frequency alterations in Wnt signalling, chromatin remodelling, Hedgehog signalling, DNA repair and cell cycle processes are observed. Together, these data delineate new genetic diversity of PDA and provide insights into prognostic determinants and therapeutic targets.

P ancreatic ductal adenocarcinoma (PDA) portends a poor prognosis, with a 5-year survival rate of B6% (refs [1][2][3]. This poor outcome is related to multiple factors, including the relatively late stage of diagnosis, many patients presenting with unresectable disease, and therapy recalcitrance resulting in disease recurrence in spite of operable disease and systemic therapy 1 . Genetic analysis of PDA yielded insights related to altered signalling pathways 4,5 ; however, unlike other cancers the number of sequenced PDA genomes has been relatively modest. This reflects the difficulty of sequencing a tumour that is dominated by presence of stromal and inflammatory cells 2,6 . Tumour cellularity in samples subjected to whole-exome or -genome sequencing represents a known barrier in obtaining high quality data; therefore, many studies recommended 460% tumour cell content in the samples 7 . To facilitate mutation detection and remove contaminating non-neoplastic tissue the initial PDA sequencing effort employed patient-derived xenografts and cell lines 5 . This approach resulted in detecting an average of 48 somatic mutations per model sequenced. Subsequent work employed exome sequencing on 99 clinical cases of PDA, and applied the q-pure algorithm to overcome contamination with non-neoplastic tissue, yielding an average of 26 mutations per case 4 . Here 109 micro-dissected PDA cases annotated with outcome data were subjected to whole-exome sequencing. The data from this analyses demonstrates that there is substantial genetic diversity within PDA with a mutation frequency comparable to other solid tumours. Alterations in mutation spectra and burden are associated with hallmark pancreatic cancer risk factors. Increased copy number variation indicative of chromosomal instability is associated with poor outcome. Amplification of the locus involving the MYC oncogene are specifically associated with poor outcome and association with adenosquamous subtype of PDA. In contrast, select mutated genes (for example, RBM10) are associated with longer patient survival. Diversity within the canonical KRAS signalling pathway defined prognostic features of the disease and the potential for targeted therapy of BRAFmutated cases. The mutation and copy number alterations frequently involved important signalling pathways that contribute to tumour development and represent therapeutic targets.

Results
Tissue cell enrichment and mutation burden. To enhance tumour cellularity in PDA patient samples, needle dissection was employed to significantly enrich PDA tumour cells from the surrounding microenvironment (Fig. 1a). This approach was applied to a total of 109 surgically resected PDA cases annotated with clinical outcome and etiological features ( Table 1). The cohort had a median survival of 21 months that is consistent with historical survival of resected PDA 8,9 . Of standard pathological features, grade 3 was associated with poor outcome, while nodal status and adenosquamous subtype trended towards poor outcome ( Supplementary Figs 1 and 2). Enriched tumour cells were utilized as the substrate for whole-exome sequencing (Methods). Average tumour purity exceeded 50% and the average sequencing depth was B50 Â . In prior studies sequencing PDA 4 , 26 mutations were detected per case representing a relatively low mutational burden when compared with the majority of solid tumours ( Supplementary Figs 3 and 4). The average mutation burden observed in our cohort was 67 nonsynonymous events per case, equivalent to multiple other solidtumour types (Fig. 1b, Supplementary Data 1, Supplementary . Sanger sequencing was used to validate 248 genetic events with validation occurring in 92% of cases (Supplementary Data 1). In addition, p53 immunohistochemistry performed across the cohort demonstrated the expected strong correlation between TP53 mutation and stabilization of the protein (Supplementary Fig.  6).
To determine whether deeper sequencing would further increase the number of mutations detected, 21 cases were sequenced to B120 Â depth. The majority of events identified with deeper sequencing had low allelic frequency or coverage depth, and only a small fraction of statistically significant mutations were identified (Supplementary Fig. 7 and Methods).
Determinants of mutation burden and spectrum. Consistent with observations in other solid malignancies [10][11][12] , cases with the highest relative mutational burden harboured genetic lesions in mismatch repair genes, known to be associated with a mutator phenotype (Fig. 1c). These cases also displayed a mutation spectrum consistent with mismatch repair deficiency, T4C transition at CTG tri-nucleotides 11,13 (Fig. 1c and Supplementary  Fig. 8). The overarching mutational spectra of the PDA cohort were dominated by C4T mutations associated with age, and had minimal evidence for enriched APOBEC-associated mutagenesis at TCW tri-nucleotides ( Supplementary Fig. 8). By hierarchical clustering it was apparent that the variable mutational spectra of individual PDA cases specified discrete subtypes (Fig. 1d).
Smoking status was associated with an enrichment of C4A transversions that resembled the mutational 'smoking signature' observed in other cancers for which smoking is a significant risk factor (Fig. 1e, Supplementary Figs 9-11) 12,14 . Smokers had generally poor outcome, with a median survival half that of nonsmokers (Fig. 1f). Thus, etiological events associated with the development of PDA were identifiably associated with the mutational spectrum of the tumours.
Copy number variation and the role of MYC. Affinity propagation clustering (APC) was used to delineate deterministic patterns of commonality associated with copy number alterations (Fig. 2a, left panel) and partitioned cases in discrete subtypes with increasing genetic complexity (Fig. 2a, right panel). Cases with high levels of amplifications/deletions (clusters 5 and 6) were significantly over-represented for alterations of genes involved in DNA break repair, but not TP53 (Fig. 2b). Similar results were observed with standard hierarchical clustering methods based on Euclidean distance ( Supplementary Figs 12 and 13). Both cluster 5 and 6 harboured poor outcome relative to other clusters with fewer copy number changes (Fig. 2c, Supplementary Figs 12 and  13). GISTIC analysis defined significant common regions of amplification and deletion that harbour multiple oncogenes (for example, MYC and CCND1) and tumour suppressors (for example, SMAD4 and CDKN2A) ( Fig. 2d and Supplementary Data 2). Interrogation of amplified and deleted regions for association with survival (Supplementary Data 2) revealed that the amplification of the 8q24 locus, harbouring the MYC oncogene, was uniquely associated with poor outcome (Fig. 2e, Supplementary Data 2). While MYC overexpression has been shown to facilitate the development of pancreatic cancer in mouse models 15 , little analysis has been performed in patient specimens. The amplification of MYC was confirmed by fluorescent in situ hybridization (Fig. 2f). The MYC amplified cases did not have a higher mutation burden or association with other hallmark mutations of PDA; however, amplification was significantly over-represented in the adenosquamous subtype of pancreatic carcinoma ( Supplementary Fig. 14). Evaluation of precursor lesions pancreatic intraepithelial neoplasia (PanIN) associated with invasive disease also revealed MYC amplification ( Supplementary Fig. 15). These data suggest a role of MYC in initiation and progression of this exceedingly aggressive form of ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms7744 PDA. Consistent with this observation, in a MYC-driven mouse model of pancreatic cancer, the tumours exhibited adenosquamous histology and stained positively for p63, an established marker of squamous differentiation ( Supplementary  Fig. 16).
Significantly mutated genes in pancreatic cancer. MutSigCV analysis 16   were detected in six and four conventional PDA cases, respectively. Notably, PDAs harbouring RNF43 and RBM10 alterations did not arise from IPMN and there was no association between these genes and GNAS mutations in this cohort. We also identified a number of 'cancer genes' 18 as significantly mutated in PDA, including BCLAF1 (5% of cases), IRF6 (4% of cases), FLG (10% of cases), AXIN1 (5% of cases), GLI3 (6% of cases) and PIK3CA (4% of cases).
Mutated genes associated with pancreatic cancer prognosis. Selected mutations were associated with pathological features of disease or survival ( Fig. 3b and Supplementary Table 1). RBM10 is critical regulator of alternative splicing that was recently identified as significantly mutated in lung cancer 19,20 . Remarkably, RBM10 mutations were associated with enhanced survival (Fig. 3b), although all RBM10-mutated cases were of high grade, pT3 stage and harboured lymph node metastasis in three of four cases ( Supplementary Fig. 21). Mutations in the chromatin remodelling gene ARID1A trended towards poor survival (Fig. 3b). Immunohistochemical analyses of additional 296 PDA cases demonstrated that ARID1A protein deficiency was significantly associated with poor outcome in this expanded cohort (Fig. 3c, Supplementary Table 2). The loss of ARID1A protein is similarly observed in multiple PDA cell lines, although whether this event is functionally significant is unknown ( Supplementary Fig. 22). Loss of ARID1A is known to induce vulnerability to ARID1B depletion in other cancer models 21 and we could confirm this finding in ARID1A-deficient PDA cell lines, indicating that loss of protein has a functional consequence ( Supplementary Fig. 22). To further explore gene interactions, deterministic clustering was performed followed by analysis of exemplar cases ( Supplementary Fig. 23). This work further reinforced the significance of genetic alterations beyond KRAS and TP53, as coordinate mutation of ARID1A delineated a cluster with worse prognosis (Supplementary Fig. 23). Correlation of mutations with clinicopathologic data revealed specific somatic aberrations associated with PDA grade, presence of lymph node metastasis and adenosquamous histology (Supplementary Table 1).
To comprehensively interrogate significant mutations in PDA clinical cases, MutSigCV analysis was performed using combined data from current and 99 previously reported PDA cases 4 . The combined 208 cases yielded no additional high-frequency mutated genes, however, multiple lower frequency (o2.5% recurrence) cancer-related genes were identified including ATM, ARID2, TGFBR2 and ACVR1B ( Supplementary Fig. 24). This additional discovery is consistent with the added power of combined data analysis being required for a full accounting of cancer genomes 18 .
Genetic diversity in the KRAS pathway. PDA is dominated by mutations in KRAS that were detected in 92% of the cases in our cohort (Fig. 4a). The majority of KRAS mutations occurred in codon 12, although mutations also occurred in codons 13 and 61 ( Fig. 4b and c), consistent with other sequencing studies 4,5 (Fig. 4c). It has been suggested that different KRAS mutations harbour diverse biological activity 22 . Notably, while all mutations affecting codon 12 exhibited similar association with survival ( Supplementary Fig. 25), cases mutated at codon 61 had a remarkably favourable prognosis in this cohort (Fig. 4d). Interestingly, as we observed in the RBM10 mutant tumours, cases harbouring mutations at codon 61 paradoxically exhibited histological and clinical features indicative of poor survival ( Supplementary Fig. 26). When assessed for pERK staining, cases with KRAS mutations at codon 61 exhibited less ERK activation as compared with cases with other KRAS alleles (Fig. 4e). This finding suggests that in PDA, KRAS mutant allele status potentially determines the quality of RAS pathway activation and the prognosis of disease.
While prior sequencing studies have shown that a small subset of PDA cases harbour wild-type KRAS, the genomic driver(s) of these tumours has remained obscure. Here we found that KRAS wild-type PDA exhibit somatic lesions in known oncogenes that are also RAS effector proteins ( Fig. 4a and Supplementary  Figs [27][28][29][30]. Importantly, the PIK3CA mutations and BRAF mutations, occurring in cases with wild-type KRAS, are known to be mechanistically activating and are enriched in other tumour types ( Fig. 4f and Supplementary . In cases with wildtype KRAS, BRAF and PIK3CA a number of cancer-associated genes representing potential drivers were identified (for example, STK11, GNAS, CHEK2 and RB1) ( Supplementary Fig. 28). BRAF V600E mutations, occurring at a frequency of 3%, were mutually exclusive with KRAS mutations (Supplementary Fig. 29). While the expression of BRAF V600E would suggest that it is a driver event based on mouse models 23 , no endogenous model of this form of pancreatic cancer had been developed. One of the BRAF V600E cases (PDA_014) was utilized to develop a cell line that maintained the BRAF mutation ( Supplementary Fig. 30). BRAF V600E PDA cell line was equally sensitive as MNT1 melanoma cells to the FDA-approved BRAF inhibitor PLX-4032, while KRAS driven cells (PL45) are resistant (Fig. 4g-i). Together, these data suggest that genetic analysis of PDA could identify a subset of patients, albeit small, who may benefit from targeted therapy along the KRAS/BRAF axis.
Frequently mutated pathways in pancreatic cancer. In addition to heterogeneity within the core KRAS pathway, we detected multiple additional pathways that were genetically altered at high frequency (420%) in PDA (Fig. 5a, Supplementary Figs  31-33). Specifically, we found that the TGF-pathway is disrupted via a combination of largely mutually exclusive events in addition to the frequent disruption of SMAD4. This includes loss of TGFBR2/TGFBR1, as well as mutations in ACVR1B, which is a newly identified cancer-associated gene in the pathway 18 . Similarly, in concert with frequent deletion of CDKN2A and CDKN2B, CDK4 and CCND1 amplification and RB1 loss was observed in the RB pathway. Among signalling pathways, beta-  Supplementary Fig. 34). Only DNA repair-associated pathways had a trend towards poor outcome, while beta-catenin signalling trended towards improved outcome. In general, the associations between pathways were limited as determined by Pearson's correlation (Fig. 5b). Random Forest and APC clustering were used to define subtypes of PDA based on altered pathways, and demonstrated marked diversity of combinations of deregulated pathways ( Fig. 5c and Supplementary Fig. 35). While tumours with isolated KRAS-pathway alterations alone or in combination   with TP53 have a poor prognosis, tumours with more complex pathway deregulations trended towards even poorer outcome (Fig. 5d). Many of the highlighted pathways represent therapeutic targets that are actionable in preclinical models and, in some cases, in the clinic ( Table 2). For example, BRAF V600E has been shown to be a potential target in melanoma and other cancers. Loss of RNF43 or AXIN1 are associated with sensitivity to porcupine and tankyrase inhibitors targeting the beta-catenin pathway 24 , while deletion of CDKN2A or amplification of CDK4/ CCND1 confer sensitivity to CDK4/6 inhibitors 25 . Deficits in Fanconi Anaemia genes can be targeted by cross-linking agents 26 , a finding readily observed in cell lines harbouring homozygous deletion of FANCF (Supplementary Fig. 36). Similarly, loss of BRCA function is associated with response to PARP inhibitors 27 . A number of these genetic alterations remain actionable in the presence of an activating KRAS mutation and could provide the opportunity for genetically targeted therapy in PDA.

Discussion
In total, the data herein demonstrates that PDA is a genetically diverse disease harbouring mutational burden similar to other solid malignancies. Due to the enrichment approach utilized, we identified multiple significantly mutated genes that have not been previously described in PDA and provide genetic insights into the etiology, prognostic features and potential therapeutic targets. Risk factors for PDA, include smoking and deficits in DNA mismatch repair as exemplified by Lynch Syndrome [28][29][30] . We observed that smoking status in PDA was associated with a mutation spectra similar to other smoking-associated tumours 13,14 . We also found that select cases exhibit a 'mutator' phenotype, suggesting a contribution of mismatch repair deficiency in tumorigenesis 12 .
In the analysis of copy number, we identified many regions that have been previously identified by CGH and other methods 31 . In general, tumors with more copy number alterations (indicative of chromosomal instability) exhibited mutations in DNA break repair genes and trended toward poor prognosis.
Importantly, we discovered that MYC amplification was associated with poor prognosis and adenosquamous histology. We did not identify mutations in UPF1, which has been recently reported to be selectively mutated with high frequency in pancreatic adenosquamous carcinoma 32 . In spite of a comprehensive unbiased survey of amplicons/deleted regions only one locus (MYC) was associated with outcome; however, clearly there is variation in the overall magnitude of copy number alterations, suggesting that there are perhaps distinct drivers for the chromosomal instability observed in PDA. 50%  19%  10%  7%  6%  6%  6%  6%  6%  6%  6%  6%  6%  5%  5%  5%  4%  4%  4%  4%  4%  4%  4%   KRAS  TP53  SMAD4  FLG  ATXN1  ARID1A  COL14A1  CDKN2A  RNF43  RP1L1  ITGAE  GLI3  GNAS  SPTA1  BCLAF1  AXIN1  NIN  RBM10  IRF6  HDAC2  PABPC1  PIK3CA  USP10       Using MutSigCV and pathway analysis, multiple genes and pathways altered at high frequency were identified, a subset of which associated with survival. ARID1A was confirmed as a marker of poorer outcome using an independent cohort, and RBM10 mutation was associated with longer survival. We also uncovered a number of novel albeit rare alterations in signalling pathways important in PDA biology. While CDKN2A loss is common in PDA and contributes to the aberrant cell cycle progression, loss of RB1 or CCND1 amplification has not been previously reported. In PDA tumours harbouring wild-type KRAS, we identified BRAF and PIK3CA mutations at oncogenic hotspots expanding the spectrum of oncogenic drivers in PDA. Importantly, a cell line developed from BRAF V600E-mutated PDA was equally sensitive as melanoma cells to FDA-approved BRAF inhibitor PLX-4032. In addition to these rare, potentially, clinically actionable events, several pathways that were genetically altered at high frequency (420%) in our study could yield selective therapeutic sensitivities. Genetic lesions targeting some facet of DNA repair or chromatin remodelling SWI/SNF pathway occurred in 435% and 42% of PDA cases, respectively. Alteration in WNT/beta-catenin pathway were common and included loss of RNF43 or AXIN1 conferring sensitivity to porcupine and tankyrase inhibitors targeting. A number of these genetic alterations remain actionable in the presence of an activating KRAS mutation and could provide the opportunity for genetically targeted therapy in PDA.

92%
Together this expanded genetic framework of PDA reveals disease complexity that will contribute to future efforts of disease modelling and patient stratification for treatment.

Methods
Tissue collection and microdissection. All tissues were collected under approved protocols of the Thomas Jefferson University Institutional Review Board and UT Southwestern Medical Center Institutional Review Board that require the informed consent of all patients. Cases represented diverse histological subtypes including colloid (n ¼ 4), adenosquamous (n ¼ 11) and ductal carcinoma, NOS (n ¼ 94). The majority of the analysed tumours were stage I (n ¼ 5) or II (n ¼ 97); the remainder were stage III (n ¼ 6) or IV (n ¼ 1). For stage IV case tissue form the primary pancreatic tumour was analysed. About 7-mM thick slides were cut from frozen tissue and lightly stained with haematoxylin. Tumour epithelial cells were dissected manually using a 14 gauge needle under a dissecting microscope (Leica Microsystems). DNA was extracted using the QIAamp DNA Mini Kit. Between 500 ng and 1.5 mg of enriched tumour DNA was recovered from each case. Analysis of KRAS mutation and other tumour-associated mutations demonstrated that average tumour cellularity 450%. Germline DNA was obtained from normal tissue resected at surgery (n ¼ 105) or peripheral blood (n ¼ 4).
DNA quality control and sequencing. Recovered DNA was quantified by Nanodrop and Qubit using standard procedures. Integrity of genomic DNA was confirmed by gel electrophoresis. A total of 109 tumour/normal pairs passed the quality control to move forward for whole-exome sequencing. TruSeq Exome Enrichment (FC-121-1048) and Nextera Exome Enrichment (FC-140-1003) kits were utilized to build the sequencing libraries as per manufacturer's protocol. The DNA was then subjected to paired-end whole-exome sequencing using Illumina HiSeq2500 instruments. The sequence reads were aligned to the human genome (Build-UCSC hg19) using the BWA alignment algorithm 33 . A general sequencing depth of 51.28 Â (95% confidence interval (CI): 49.5 and 53.1) was observed for sequenced cases, and represented 99.26% (95% CI: 99.20-99.31%) of the target sequences. Across the target sequences, 91.51% (95% CI: 90.97-92.05%) had 415 read coverage, and on average 2.178 Â 10 9 bases were sequenced per case. For 21 cases, the sequencing depth was doubled with an average number of 5.523 Â 10 9 bases sequenced per case. The average fold coverage was 123.01 (95% CI: 115.4 À 130.6). Across target sequences, 94.74% (95% CI: 93.41-96.07) had 415 read coverage. With the deeper sequencing relatively few additional mutation were discovered. The increased depth largely served to detect mutations with low sequence coverage and allele frequency. These data are summarized in Supplementary Fig. 7.
Informatics pipeline. Somatic point mutations and INDELS in the tumour tissues compared with their corresponding normal tissues were identified using the MuTect 34 and VarScan2 algorithms 35 , respectively, with default parameter settings. The mutations were annotated using Broad Institute's Oncotator programme. Minimum of 14 reads covering a site in the tumour and 8 in the normal were required for mutation calling. Only INDELS with reference allele counts greater than eight and tumour variant allele counts greater than three were considered.
Genome Analysis Toolkit (v2.4, DepthOfCoverage) was used to calculate the coverage for all sequenced cases 36 . Log ratio of coverage between tumour and normal were normalized by circular binary segmentation using ExomeCNV and DNAcopy packages in R/Bioconductor. To obtain copy number alterations in tumours compared with normal tissue, GISTIC 2.0 module version 6.2 with join segment size ¼ 4 and confidence level ¼ 0.95 was used 37 . SNV and INDEL analyses were summarized into tables using R (R Core Team-2014; R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org).
Computational and statistical analysis. Hierarchical clustering and Kaplan-Meier analyses were performed using the stats and survival packages in R. Associations of genes with features of disease was determined using hypergeometric testing, with Po0.05 considered significant. Survival between two groups was determined using log-rank statistic, with Po0.05 considered significant. Heatmaps for mutation spectrum and pathway oncoprints were generated using the R package pheatmap. All other plots were generated using the graphics package in R.
The MutSigCV algorithm 16 was run locally with default settings to define significantly mutated genes with input MAF files developed from the merge of MuTect and VarScan outputs. To remove any gene paralogue or alignment driven artifacts, the mutations at all positions for high-frequency genes were interrogated by BLAST against the human genome. This approach removed a number of highfrequency false positives, including CDC27 and FRG1. Mutations in significantly mutated genes and all INDELs were manually interrogated by examining BAM files in Integrative Genomics Viewer 38 . Subsequent filtering was employed were genes with at least four nonsilent mutations (43.5%) were retained. A P value cut-off of 0.05 was used to obtain a list of top significantly mutated genes. Subsequent analysis of sequenced paralogues or analogous sequences was used to further restrict the results to 24 genes. A list of MutSigCV-defined genes to Po0.1 with associated P and Q values is provided in Supplementary Data 3. Unsupervised Random Forest clustering approach was used to cluster the pathways 39 . Random Forest package in R was used to obtain the Random forest dissimilarity measure and the data was classified into clusters using the partitioning around medoids implemented in the cluster package in R.
Clustering analysis was performed with the APC algorithm using the 'apcluster' package in R. APC is a deterministic clustering method, which identifies the number of clusters, and cluster 'exemplars' (that is, the cluster centroid or the data point that is the best representative of all the other data points within that cluster) from the data 40 . APC performs clustering by passing messages between the data points. It takes as input a square matrix representing pairwise similarity measures between all data points. The algorithm views each data point as a node in a network and is initialized by connecting all the nodes together where edges between nodes are proportional to Pearson correlations. The algorithm then iteratively transmits messages along the edges, pruning edges with each iteration until a set of clusters and exemplars emerges.
Two real valued messages are passed between nodes. The 'responsibility' message computes how well-suited it is for point i to choose point k as an exemplar, given all the other candidate exemplars, k 0 , and is updated by: The availability message, a(i, k), computes how appropriate it is for point i to select point k as an exemplar, taking into account all the other points for which k is In the above equation, a(i, k) is set to the self responsibility, r(k,k), plus the sum of the positive responsibilities candidate k receives from other points. The entire sum is thresholded at 0, and a negative availability indicates that it is inappropriate for point i to chose point k as an exemplar and the tie is severed. The self-availability, a(k,k), reflects the accumulated evidence that point k is an exemplar and is updated with the following rule, which reflects the evidence that k is an exemplar based on the positive responsibilities sent to k from all points, and is updated by: In the first iteration, all points are considered equally likely to be candidate exemplars, and a(i, k) is set to 0 and s(i, k) is set to the input similarity measure between points i and k. The above rules are then iteratively updated until a clear, stable set of clusters and exemplars emerges. In our implementation of the algorithm, we first ran the algorithm to identify an initial set of exemplars and clusters from the data. The exemplars were then clustered together and this procedure was repeated until no more clusters emerged to identify a hierarchical structure of clusters. Networks were drawn with cytoscape 41 .
To generate mutation peg plots, protein domain information was downloaded from the Pfam database (http://pfam.xfam.org/) and isoforms were matched between data sets with Uniprot IDs. Mutation information for various TCGA tumour data sets were gathered using the cBioPortal R package, 'cgdsr' (cbioportal.org).
To compare the mutational spectra of the PDA_COHORT data set to the TCGA tumour data sets, individual MAF files from the most current TCGA data were downloaded from the TCGA website (http://cancergenome.nih.gov/) and nucleotide substitution information was compared.
Validation. A total of 248 nonsilent mutations from 84 of the 109 tumour/normal tissue pairs were selected to validate by Sanger sequencing. This validation encompassed 132 different genes, including most that were referenced in the text (for example BRAF, BRCA2, RB1, ARID1A). We aimed to validate mutations with allelic frequency of 40.15, recognizing that Sanger sequencing sensitivity is impaired by low mutant allele frequency. Mutations confirmation was 92% (95% CI: 87.64-94.58%). This range is consistent with other studies using MuTect to define mutations 42 , validated mutations are denoted in Supplementary Data 1.
Immunohistochemistry and FISH. ARID1A immunohistochemistry was performed in a clinical laboratory using a BenchMark Ultra autostainer (Ventana Medical Systems). The ARID1A antibody (cat# sc-32761, Santa Cruz Biotechnology-dilution 1:150) specificity was confirmed by negative staining in cases with known mutation in ARID1A. All scores were read blind to outcome, and data were dichotomized based on loss of the homogeneous ARID1A staining observed in normal tissue or known cases with intact ARID1A gene. Association with survival was performed by Kaplan-Meier analysis. Phospho-ERK immunohistochemistry (cat#4370, Cell Signaling Technology-dilution 1:150) was performed on a DAKO autostainer. All scores were read blind to outcome or association with KRAS mutation.
Fluorecence in situ hybridization (FISH) at the MYC locus was performed in a CLIA clinical laboratory using Vysis LSI MYC dual colour 'break apart' rearrangement probe. The probes detect regions directly centromeric and telomeric of the MYC genes, facilitating the detection of amplifications (both colours amplified in proximity) and/or translations (resulting in the separation of the two colours spatially).
Functional studies. The established PDA cell lines PL45, CAPAN2, PL5, ASPC1 and HS7662 were obtained from ATCC and cultured in DMEM supplemented with 10% FBS. The cell line derived from PDA_014 was established from primary tumour fresh tissue that was digested with trypsin and plated on collagen coated plates in Keratinocyte Serum-Free Media supplemented with Bovine Pituitary Extract, EGF and 2% FBS. Tumour cells were expanded and subjected to sequencing analysis, which confirmed the retention of select tumour-associated mutations including BRAF V600E. The melanoma cell line MNT1 was cultured in DMEM supplemented with 10% FBS. Sensitivity to PLX-4032 was determined by flow-cytometry, crystal violet and CTG analysis as we have previously published 25 . For immunoblotting, 30 mg of total protein were resolved by SDS-PAGE and transferred to polyvinylidene fluoride membranes (Immobilon-P, Millipore). Membranes were immunoblotted with anti-ARID1A (SC-32761, Santa Cruz, dilution 1:500), b-actin (SC47778, Santa Cruz, dilution 1:500), anti-phospho-ERK (SC-7383, Cell Signaling dilution 1:500), anti-ERK (SC-94, Cell Signaling, dilution 1:500) and anti-horse radish peroxidase-conjugated antibodies (Jackson ImmunoResearch-dilution 1:2,500). ). Representative immunoblots are shown in Supplementary Fig. 37.