Spatial and temporal homogeneity of driver mutations in diffuse intrinsic pontine glioma

Diffuse Intrinsic Pontine Gliomas (DIPGs) are deadly paediatric brain tumours where needle biopsies help guide diagnosis and targeted therapies. To address spatial heterogeneity, here we analyse 134 specimens from various neuroanatomical structures of whole autopsy brains from nine DIPG patients. Evolutionary reconstruction indicates histone 3 (H3) K27M—including H3.2K27M—mutations potentially arise first and are invariably associated with specific, high-fidelity obligate partners throughout the tumour and its spread, from diagnosis to end-stage disease, suggesting mutual need for tumorigenesis. These H3K27M ubiquitously-associated mutations involve alterations in TP53 cell-cycle (TP53/PPM1D) or specific growth factor pathways (ACVR1/PIK3R1). Later oncogenic alterations arise in sub-clones and often affect the PI3K pathway. Our findings are consistent with early tumour spread outside the brainstem including the cerebrum. The spatial and temporal homogeneity of main driver mutations in DIPG implies they will be captured by limited biopsies and emphasizes the need to develop therapies specifically targeting obligate oncohistone partnerships.

Right: Evolutionary trees reconstructed using CNV-corrected allele frequencies from the deep amplicon sequencing data targeting the candidate genes found in whole exome sequencing.
*We had access to whole exome sequencing data from only one sample for patient DIPG4. The evolutionary tree for this patient was reconstructed from deep amplicon sequencing data, but the frequency values were not corrected for CNV events because whole exome sequencing data were lacking for the remaining samples from this patient. Lacking a robust correction for copy number, we consider this phylogeny less reliable and hence do not discuss it in the main text.
Supplementary Figure 9: Tumor spread outside the brain stem in patients DIPG7, DIPG8, and DIPG9. In DIPG7, spread to the cerebellum and across the brain stem is consistent with an early event in tumor evolution, as it does not harbor additional mutations. In DIPG8, the anatomical spread path towards the frontal lobe is not known and hence is represented as a dashed line (--). Spread to "Medulla 2" in this patient is presumed to happen relatively later than the spread towards the cerebellum since it harbors an additional PTPN1 mutation. The spread towards the medulla and ventricle in DIPG9 is suggested to also happen relatively late in tumor evolution since both samples in these areas share the subclonal ATRX and PPM1D mutations seen in the primary tumor.

clonal in our dataset and in some other reported cases. Similarly, a PTEN mutation was also found in some but not all tumor areas in DIPG2 (Fig. 3). We can only speculate at this time as to why mutations that ultimately activate the PI3K/AKT pathway appear clonal for some of the component genes and subclonal for others. This may be related to the dosage effect of pathway activation, as the loss of the regulatory unit (PIK3R1) leads to increased baseline activity of the wild-type kinase whereas the PIK3CA H1047R mutation has been associated with higher kinase activity and oncogenic potential through cellular reprogramming and induction of stemness properties [24]. Further studies are needed to help ascertain the specific role of these accessory driver mutations in oncohistone tumorigenesis.

The previously unsuspected homogeneity for main driver mutations across the course of the disease we uncover in this study indicates that efforts to cure DIPG should be directed at the oncohistone partnership, as other genetic alterations are generally sub-clonal.
Our findings further indicate that needle biopsies recommended to orient care are representative of the main drivers in DIPG even if the regional heterogeneity of other secondary targetable alterations, such as PIK3CA mutations, may not be fully captured. Based on early tumor spread, efforts to cure DIPG should aim for early systemic tumor control as opposed to regimens focused on the pons.

Methods

Patient samples
All patient samples were collected with informed consent in accordance with the respective Ethics Review Boards of the institutions that provided them. DIPG post-mortem specimen procurement was performed as previously described [19]. Briefly, brainstem and cerebellum were removed en bloc from the whole brain, and dissected into ~9 transverse sections. The cerebral cortex was dissected into ~11 coronal sections. The brainstem, cerebellum, and cerebral cortex sections were alternatively frozen or fixed in formalin. A total of 158 samples were studied by immunohistochemistry and molecular analyses, representing various neuroanatomical locations such as frontal, parietal, temporal, occipital lobes, thalamus, lateral ventricles, hippocampus, midbrain, pons, medulla, and cerebellum (Supplementary Tables 1-3

Immunohistochemistry was performed on 5 μm thick FFPE slides. Briefly, slides were de-paraffinized, processed for epitope retrieval, and detected with DAB using reagents customized for the Leica BOND-MAX automated stainer (Leica Biosystems, Buffalo Grove, IL). Processed slides were probed by immunohistochemical assay for hematoxylin and eosin (H&E), Ki67, H3-K27M, and H3-K27me3 as previously described [12] (Supplementary Fig. 2).

rapid-run mode with 100 bp paired-end reads. Next, adaptor sequences were removed; reads were trimmed for quality using the FASTX-Toolkit. An in-house program was used to ensure that only paired reads were used in further steps of the analysis. We next aligned the reads using Burrows-Wheeler Aligner (BWA) 0.7.7 to hg19 as reference genome. Indel realignment was performed using the Genome Analysis Toolkit (GATK) [29]. We next marked the duplicate reads using Picard and excluded them from further analyses as previously described [6]. The coverage of consensus coding sequence (CCDS) bases was assessed using GATK. The average coverage over all the samples was 70x.
The majority of samples had >90% of CCDS bases covered by at least 10 reads and >83% of CCDS bases covered by at least 20 reads. We called SNVs and short indels using SAMtools mpileup with the extended base alignment quality (BAQ) adjustment (-E) [30,31]. Next, we filtered them for quality so that at least 10% of reads supported each variant call. We used both ANNOVAR [32] and in-house tools to annotate the variants and to identify whether these variants affect protein-coding sequence and if they had

Methylation profiling data were analyzed as previously described [6]. The raw data were subject to quality control and preprocessing utilizing the R package minfi, and normalized for technical variation between the Infinium I and II probes using the SWAN method. We removed probes on sex chromosomes (chrX, Y), those containing SNPs (dbSNP: http://www.ncbi.nlm.nih.gov/SNP/) as well as non-specific probes that bind to multiple genomic locations. Unsupervised hierarchical clustering was performed using average linkage and Pearson rank correlation distance on the top 5,000 most variable probes selected based on standard deviation of beta values (β-values).
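
The probe selection and distance computation used for the methylation clustering can be illustrated with a minimal sketch. It assumes β-values are held in a plain dictionary keyed by probe ID, and it interprets the correlation distance as 1 − r for an ordinary Pearson coefficient. Both are assumptions made for illustration, not the R/minfi workflow used in the study. The function names are hypothetical, and the average-linkage clustering step itself is left to a clustering library.

```python
import math


def std_dev(values):
    """Sample standard deviation of a list of beta values."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def top_variable_probes(beta, n_top=5000):
    """Return the IDs of the n_top probes with the largest standard deviation.

    beta: dict mapping probe ID -> list of beta values, one per sample
    (hypothetical data layout, assumed for this sketch).
    """
    ranked = sorted(beta, key=lambda probe: std_dev(beta[probe]), reverse=True)
    return ranked[:n_top]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_distance_matrix(beta, probes):
    """Pairwise sample distances (1 - Pearson r) over the selected probes."""
    n_samples = len(beta[probes[0]])
    # One profile per sample: its beta values across the selected probes.
    profiles = [[beta[p][s] for p in probes] for s in range(n_samples)]
    return [[1.0 - pearson(a, b) for b in profiles] for a in profiles]
```

The resulting square matrix could then be passed to any average-linkage hierarchical clustering routine. If a rank-based (Spearman) correlation was intended, the profiles would need to be converted to ranks before calling `pearson`.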

Copy Number Variation analysis
To study copy number variations in our samples we developed an in-house program to calculate the deviation of B allele frequency from 50% as well as normalized coverage from whole exome sequencing data (adapted from methods used in FishingCNV [33] and ExomeAI [34]). Different CNV events (duplication, deletion, copy neutral LOH) were called based upon the B allelic imbalance and the status of the normalized coverage as follows: deviation from 50% B allele frequency with an increase in normalized coverage was considered amplification; deviation from 50% B allele frequency with a decrease in normalized coverage was considered deletion; and deviation from 50% B allele frequency with no change in normalized coverage was considered potential copy neutral loss of heterozygosity. We mainly assessed the CNV events at the chromosomal arm level. The results of our CNV detection are presented in Supplementary Fig. 5

We used PhyloWGS [35] to reconstruct the tumor phylogeny, which uses a Bayesian approach to infer cellular frequencies from mutation allele frequencies. It applies a Dirichlet process to cluster mutations with similar cellular frequencies and the tree-structured stick-breaking process to model the clonal evolutionary tree. For multi-region samples from the same patient, we normalized the read counts used for phylogenetic tree construction by copy number counts from CNV analysis. Read counts were corrected for the CNV events (Supplementary Fig. 5; Table 3
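
The CNV calling rules in the Copy Number Variation analysis above can be sketched as a small decision function. This is an illustrative sketch only: the in-house program is not described in detail, so the numeric thresholds, the function names, and the use of the mean absolute B allele deviation over heterozygous SNPs on a chromosome arm are all assumptions introduced here.

```python
# Hypothetical cut-offs: the source does not state numeric thresholds.
BAF_DEVIATION_MIN = 0.10   # minimum mean |BAF - 0.5| to call allelic imbalance
COVERAGE_TOLERANCE = 0.15  # relative coverage change still treated as "no change"


def mean_baf_deviation(bafs):
    """Mean absolute deviation of B allele frequencies from 0.5.

    The absolute value is used because imbalanced heterozygous SNPs split
    into two bands on either side of 0.5, so their plain mean stays near 0.5.
    """
    return sum(abs(b - 0.5) for b in bafs) / len(bafs)


def classify_arm(bafs, normalized_coverage):
    """Classify one chromosome arm from BAF imbalance and coverage.

    bafs: B allele frequencies (0..1) of heterozygous SNPs on the arm.
    normalized_coverage: arm coverage relative to the expected baseline,
    where 1.0 means no change (assumed convention for this sketch).
    """
    if mean_baf_deviation(bafs) < BAF_DEVIATION_MIN:
        return "no call"  # no allelic imbalance, so none of the rules apply
    if normalized_coverage > 1.0 + COVERAGE_TOLERANCE:
        return "amplification"
    if normalized_coverage < 1.0 - COVERAGE_TOLERANCE:
        return "deletion"
    return "copy-neutral LOH"
```

For example, `classify_arm([0.33, 0.67, 0.35, 0.66], 1.5)` falls under the amplification rule, while the same imbalance with unchanged coverage would be reported as potential copy-neutral LOH. Requiring allelic imbalance first mirrors the text, where every event type is defined by a deviation from 50% B allele frequency.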