Genomic Landscape and Immunological Pro le of Glioblastoma in East Asian Patients


 While it is well known that Glioblastoma (GBM) shows profound inter- and intra-tumoral heterogeneity, disparities in molecular features across ancestry groups have been largely overlooked. We collected a large cohort of GBM samples from East Asian patients (EAS-GBM) and performed genomic and transcriptomic analyses of these samples (EAS-GBM, n=443). Further characterization and comparative analysis of the EAS-GBM with the predominantly European-ancestry TCGA GBM dataset (EUR-GBM, n=383) revealed differential genomic and genetic landscape and immunological profile of EAS-GBM from EUR-GBM. EAS-GBM showed an enrichment for NF1, H3F3A, TP53 and ATRX mutations compared to EUR-GBM. Transcriptomic clustering revealed distinct EAS-GBM specific subtypes, namely Proliferative (PL), Neuron-Synaptic (NS), Metabolic (MB) and Immunomodulatory (IM). Notably, the classic subtype of EUR-GBMs with EGFR as its main marker gene was absent in EAS-GBMs, Moreover, the IM subgroup in EAS-GBM with an expression profile that has been previously associated with response to immunotherapy in cancers. Together, our comprehensive characterization revealed the unique genomic and genetic features of EAS-GBMs, providing mechanistic rationale in patient stratification for molecular targeted therapy and drug development in the era of personalized medicine.


Introduction
Glioblastoma multiforme (GBM) is the most common primary malignant tumor in the central nervous system, as well as the leading cause of primary brain cancer-related death. To date, most molecular characterizations of GBM have focused on populations of mostly European ancestry (EUR-GBM), such as The Cancer Genome Atlas (TCGA) program 1,2 . Therefore, genetic and genomic differences between EAS-GBM and EUR-GBM remain underexplored 3 . Consequently it remains unclear which factors in uence outcomes in EAS-GBM. Here we analyzed specimens from 443 individuals with East Asian ancestry by whole-exome sequencing (WES) and RNA sequencing (RNA-Seq). In this work we will address the following speci c questions: What are the differences in genetic drivers between EAS and EUR-GBM? Can we re ne the known GBM expression subtypes with the additional data from EAS-GBM? What could be clinically relevant differences in the immune microenvironment and driver mechanisms between these novel molecular subtypes?
Comparison of EAS-GBM genomes to the 383 patients of European descent in the TCGA GBM cohort revealed signi cant differences in the frequency of well known drivers between EAS and EUR-GBMs and a novel EAS-GBM transcriptomic subgroups. These subgroups include a newly identi ed immune in ltrated subgroup and show potential clinical utility. Our studies provide comprehensive characterizations of the unique genomic and genetic features of EAS-GBMs, laying a foundation for better understanding ethnicity-dependent tumorigenesis and improved clinical strati cation and treatment strategy. (Extended Material 1-2). Our analysis of focal somatic copy number aberrations EAS-GBM con rmed several driver variants known from EUR-GBM including ampli cations at 12q14.1(CDK4), 4q12 (PDGFRA), 2p24.3 (MYCN) and deletions at 9p21.3 (CDKN2A). EAS-GBM also showed signi cant ampli cations at 2q11.1 (harboring ANKRD36C) and 6p21.1 (VEGFA), that have not been observed in EUR-GBM (Extended Figure 1A and Extended Table 2).

Patient characterization
Focusing on signi cantly recurring driver genes in EAS-GBM (as de ned by MutsigCV) we con rmed many of the genes that had been identi ed as drivers in EUR-GBM ( Figure 1C-D, Extended Figure 1B-D), including the most frequently mutated driver genes: TP53, ATRX, PTEN, NF1, and EGFR. Our analysis also showed several potentially novel driver genes exclusively observed in EAS GBM (including SOX1 and MEGF9) which will require further validation. When controlling for IDH status, IDH mut -GBM showed no signi cant differences while the IDH wt EAS-GBM showed a signi cantly higher rate of H3F3A, NF1, TP53, and ATRX mutations and a depletion of EGFR mutations (q=0.07, Figure 1D, Extended Figure 1B-C).
TP53 shows a different mutation pattern in EAS-vs. EUR-GBM In addition to the increased frequency of TP53 mutation in EAS-GBM we also noticed that the location of these SNVs was different. Most of the TP53 variant sites in EUR-GBM scatter evenly in the DNA-binding domain and the oligomerization domain ( Figure 2A- −14 ). No signi cant differences were observed in the oligomerization domain. While the underlying mechanism still needs further elucidation, these data suggest that EAS-GBM is enriched in p53 mutants with different functions than EUR GBM.

Mutational signatures in EAS-GBM show an enrichment for SBS6
We also performed mutational signatures analysis based on the Catalogue of Somatic Mutations in Cancer (COSMIC). Three and four mutational signatures were extracted from EAS and EUR-GBM, respectively ( Figure 2C, Extended Figure 1E). We found that EAS and EUR-GBM both had 20-30% activity of SBS1 (associated with aging 9 ) and SBS3 (associated with de ciency in homologous recombination based DNA repair 9 ) (slightly lower in EAS-GBM, q<0.01 for SBS1 and q=0.05 for SBS3). In contrast, SBS6 was signi cantly enriched in EAS-GBM (q<0.01) ( Figure 2D). SBS6 showed high activity in hypermutant samples likely due to association with defective DNA mismatch repair (MMR) and microsatellite instability ( Figure 2E). PARP inhibitors, reported as effective treatment for HRD and MMR tumors, may potentially bene t GBM patients carrying SBS 3 and 6 10,11 . These data suggest differences in the mutation etiology likely based on differences in environmental exposure and inherent differences in DNA damage response between EAS-and EUR-GBM.
RNA-expression subgroups establish molecular classi cation for EAS-GBMs EUR-GBMs (TCGA) were previously classi ed into three subtypes based on RNA expression: Proneural, Classical and Mesenchymal based on RNA expression clusters 12,13 , which we con rmed when reanalyzing the TCGA data (Extended Materials 3-4). Leveraging our large cohort of RNAseq data from EAS-GBM, we set out to expand this classi cation.
First, we found that in EAS-GBM IDH wt has a general higher immune response compared to IDH mut GBM (p-value=1.6x10 −6 , Extended Materials 5) though both IDH wt and IDH mut EAS-GBMs exhibit substantial heterogeneity in immune signature activities. These data suggested that GBM dichotomy solely based on IDH mutation status is insu cient to discriminate intrinsic characteristics of EAS-GBM. Therefore, we conducted a molecular subtype identi cation for all GBM samples. Unsupervised clustering of 358 EAS-GBM RNAseq samples by non-negative matrix factorization (NMF) consensus clustering divided the cohort into four subtypes (Extended Figures 2A-B). These four clusters were also identi able in t-SNE plots ( Figure 3A). Clustering was consistent after excluding IDH mutant samples (Extended Materials 6), validating the robustness of the methodology we employed and rationality of including both IDH wt and IDH mut . The largest cluster in NMF clustering showed an enrichment for immune related pathways in Metascape pathway analysis. The second largest was enriched in cell cycle, replication and cell division pathways, the third in pathways related to nucleotide metabolism and the smallest cluster was enriched in pathways related to neural synaptic function (Extended Materials 7, 8). We therefore named these clusters: Proliferative (PL), Neuron-Synaptic (NS), Metabolic (MB) and Immunomodulatory (IM). PL GBM and IM GBM showed the largest expression distance within EAS-GBM ( Figure 3B).
Next, we explored whether there are correspondent relationships between EAS and EUR subtypes and subtype taxonomic hierarchies. SubMap results showed that EAS-PL and NS GBM both correspond to EUR Proneural GBM and EAS-IM GBM corresponds to EUR Mesenchymal GBM ( Figures 3C-D). In contrast, EAS-MB GBM and EUR Classical GBM did not correspond to any subtypes in the other ancestry group, suggesting that they are both unique subtypes for their ancestry, respectively. Marker genes accounting for each GBM subtypes were identi ed. A substantial number of overlapping marker genes (n=150 and n=69) were found between EUR Proneural and EAS-PL and EAS-NS, respectively (Extended Figures 2C).
This result supports our previous assertion that both EAS-PL and EAS-NS correspond to EUR Proneural GBM (Extended Figure 2C). Similar expression patterns of previously identi ed EUR proneural GBM maker genes including NF1 and OLIG2 also support this conclusion (Extended Materials 9). The clear separations between EAS-PL and EAS-NS, suggests that EUR Proneural GBM could be a mixed group which did not not separate due to the limited sample size in the previous study. We identi ed novel marker genes for PL-GBM (MYB, MYCN, BRIP1, CCND2) and NS-GBM (WIF1, ATP2B3, GRM3, GRIN2A, KIAA1598, FLT3) ( Figure 3E, G, Extended Figure 2D). Only ve marker genes overlap between EAS-MB and EUR classical (Extended Figure 2C). EUR Classical GBM marker genes like NOTCH3, TP53, GLI2, SMO, were highly expressed in the EUR cohort (Extended Materials 9), but showed low expression in EAS-MB GBM (Extended Figure 2D). This suggests EAS-MB GBM to be a novel subgroup with no correlate in the prior classi cation. Novel marker genes for EAS-MB GBM include ASPSCR1, NTHL1, MAD1L1, GADD45G, TERT. EAS-IM GBM corresponded to the EUR Mesenchymal group as evident by the large number of overlapping marker genes (n=179) between the two groups (Extended Figure 2C). Mesenchymal signature genes like TRADD, CD44, MET and immune related genes containing CTLA4, PDCD1, CD86, CD274, RELB share a similar expression across EAS-IM GBM and EUR Mesenchymal GBM ( Figure 3E, G, Extended Figure 2D, Extended Materials 9). Our results also show that a series of immune related genes (including REL, TGFBR2, COL3A1, COL1A1, SELE, CD274, PDCD1LG2, CCR4, IL21R, IL7R, FAS) are marker genes for IM GBM.
To investigate the underlying biological and clinical application of the new classi cation system, we studied correlations between GBM molecular subtypes and a variety of molecular and clinical parameters such as PRS type, IDH mutation, 1p19q codeletion, MGMT promoter methylation status, and prognosis ( Figure

EAS-GBM shows an immune in ltrated subgroup with biomarkers of immune checkpoint therapy response
As immunotherapy continues to play an ever-more important role in the frontiers of cancer therapeutics, we sought to characterize the immunological pro les of each molecular subtype. We found that immune cell in ltration, activation, and in ammation factor level are all higher in EAS-IM GBM than other subtypes. Immune scores, ESTIMATE scores and stromal scores also suggest that immune response is highest in EAS-IM and EUR Mesenchymal GBM ( Figure 4A, Extended Figure 3A-B). Proneural marker genes, including NF1, NKX2-2, OLIG2, PDGFRA, DLL3 and ASCL1, however show an inverse relationship to the immune in ltration level. Overall our data shows a clear gradient of immune signaling activity with IM-GBM as highest, followed by MB, NF and PL as least immune in ltrated. These data demonstrate different cell fates and signaling axes of the various subtypes.
We also found that HLA genes expressed differently among the four EAS-GBM clusters. EAS-IM GBM generally has the highest HLA gene expression (including HLA-A, HLA-B, HLA-C, HLA-DR). Increased HLA expression could lead to greater antigen presentation and account for the increased immune signatures also observed in EAS-IM GBM. ( Figure 4B, Extended Figure 3C, Wilcoxon tests, all q<0.06). Immune checkpoint blockade therapy target genes, including CTLA4, PDCD1, CD274, and CD86, were highly expressed in EAS-IM GBM relative to other groups (q<2x10 −2 , Extended Materials 11). The IFN-γ/ T-cellin amed gene expression pro le (GEP) has been shown to be predictive of immune checkpoint blockade (ICB) therapy response in clinical trials in multiple cancers [14][15][16][17] . EAS-IM GBMs also had the highest IFNγ/T-cell-in amed GEP scores compared to the other three subtypes (Wilcoxon test: p<0.01, Figure 4C), which suggested that EAS-IM GBM patients could potentially bene t more from ICB therapy.

Immune receptor repertoires by EAS subgroup
Accumulated evidence suggests that patients with higher T cell in ltration tend to bene t from ICB treatment 18 . To assess the diversity of B and T cells, we employed TRUST 19 , which identi es the number of unique clones based on the complementary-determining region 3 (CDR3) of the respective receptor. These data show the TCR CDR3 abundance and diversity to be the highest in EAS-IM GBM, followed by EAS-MB GBM ( Figure 4D, E). Extra TCR counts and clones could suggest that T cells in IM EAS GBM are more activated compared with the other three subtypes. No signi cant difference was observed in TCR diversity among EAS-PL, NS and MB GBM. TCR sequence alignment analysis showed a total of ve unique TCR CDR3 consensus sequences (C1-C5) in the EAS-GBM micro-environment; such amino acid sequences have been used as templates for engineering CAR-T and TCR-T therapies (EAS-GBM TCR/BCR Repertoire: Extended Materials 12, Extended Table 3). γδ T cells, while only a small percentage of total T cells, are closely associated with anti-tumor immune responses 20 . Recent studies reported that Vγ9Vδ2, a major subset (65-95%) of γδ T cells are able to recognize and kill cancer cells through a TCR-dependent manner 21 . Among EAS-GBM, the NS subgroup has the lowest γδ T cell in ltration whereas IM tumors have the most (P < 0.05) (Extended Materials 13).
We next investigated BCR repertoires (including heavy chains: IgG, IgA, IgE, IgM, IgD) in the EAS-GBM tumor microenvironment. Normalized BCR CDR3 counts are also signi cantly higher in the MB and IM GBM relative to the other subgroups ( Figure 4F, P < 0.05). In contrast to TCR diversity, BCR diversity was signi cantly lower in IM GBM compared to the other groups ( Figure 4G). Such restricted clonal diversity but increased BCR abundance could imply that speci c B cells expanded as part of an adaptive immune   Figure 3E). This signi cance is also con rmed by the multivariate Cox regression model by integrating other clinical variates. It is tempting to speculate that HLA-B*54:01 could allow for the ideal presentation of a yet to be identi ed GBM speci c antigen while HLA-A*31, HLA-B*50 or HLA-B*56 completely lack that capacity (Extended Materials 17).
Our analysis revealed that age has signi cant correlation with multiple factors, including stromal and immune score, HLA expression level, MSI and TMB. Stromal score, immune score, HLA expression level and indices of genome instability (GII, Ploidy, GII Deletion, GII Ampli cation) all correlated with each other (Extended Materials 18). We therefore decided to carry out log rank survival analysis and multivariate survival analysis (Extended Figure 3E-3J, Extended Materials 19). A multivariate cox analysis found higher ploidy, GII (Genome Instability Index), MATH (Mutant-Allele Tumor Heterogeneity) scores and H3F3A and PAN3 somatic mutations to be associated with unfavorable prognosis in EAS-GBM patients.
We also found that the NS subtype independently predicted a better prognosis compared to the other three subtypes (Extended Figure 3D, Extended Materials 20) (risk: 0.850, 95% con dence interval: 0.748-0.966, p<0.013). We found no correlation with survival for any of the immune-related indicators (normalized TCR CDR3 counts, TCR clonal diversity and γδ T cell fractions, normalized BCR CDR3 counts, BCR clone diversity, heavy/light chain subtype) (Extended Materials 21-22).
To allow for rapid assessment of EAS-GBM survival probability in the clinic, we established a nomogram scale plot to calculate 1-, 3-and 5-year-survival probabilities (Extended Figure 4A-B). Receiver operating characteristic (ROC) curves of the nomogram were above 0.7 for both 1-and 5-year survival prediction (Extended Figure 4C). We also trained a multiple Cox regression-based machine learning model to predict EAS-GBM patient prognosis based on multi-omics data, including clinical information, driver gene mutation status, genome stability index, immune score, and gene expression pro le (Extended Figure 4D). Patients with a high median risk score were correlated with worse OS as expected (Extended Figure 4E). AUC values also corroborated the high prediction e ciency of our multiple Cox regression model (Extended Figure 4F). The C-index, short for concordance index, is a metric to evaluate the predicted accuracy of our multiple Cox regression model. C-index results indicated that machine learning model integrated multi-omics data (including clinical features, genome index and somatic mutation status) have a signi cantly better performance than models constructed by any single features (Extended Figure 4G).

Treatment opportunity and e cacy in EAS-GBMs
We also evaluated treatment e cacy of different modalities in each GBM subtype using multivariate Cox models including chemotherapy, radiotherapy, age, gender, disease (PRS) type, IDH mutation status, 1p19q codeletion status, and MGMT promoter methylation status ( Figure 5, Extended Materials 23). All EAS-GBM subtypes except NS GBM showed improved outcomes after receiving chemotherapy (all p< 0.05 after controlling for the other factors). Meanwhile, EAS-MB GBM appeared to bene t from radiotherapy signi cantly (p = 0.001). Notably, EAS-NS GBM is not sensitive to either chemotherapy or radiotherapy. These data suggest that the EAS-GBM expression subtypes could in uence clinical management.
It was reported that tumor cells with homologous recombination defects (HRD) are more sensitive to platinum drugs or PARP inhibitors, such as Olaparib 23,24 25 . These results also suggest EAS-GBM molecular subtype rather than TMB could be an ICB response indicator. Interestingly, through driver gene analyses, we found that this absent EUR group is driven by EGFR activating mutations, whose SNVs were found to be depleted in the EAS cohort, indicating a less prominent role for EGFR signaling in EAS GBM than in EUR GBM. Furthermore, EAS-GBM was also characterized by a subgroup that showed immune in ltrate expression signatures. This subgroup contained a third of all EAS-GBM patients and showed expression signatures that have been associated with response to immunotherapy in other cancer types 14,16 . This could suggest that a subset of EAS-GBM have a higher chance of responding to immunotherapy. We also found that our novel four class EAS GBM classi cation matches well with previous hierarchical clustering based on GBM scRNA data 28 . PL, NS, MB and IM are aligned with GBM patients enriched in cell cycle, oligo, hypoxia and immune signaling pathway.

Discussion
The prior transcription based classi cations so far have not made any impact on clinical management.
Analyses of the clinical outcomes in the EAS cohort showed that the different expression subgroups respond differently to the current rst line standard of care: chemotherapy and radiation. While NS EAS-GBM showed no bene t from either therapy, IM and PL GBM showed a bene t from chemotherapy only and MB EAS GBM responded to both chemotherapy and radiation. These ndings highlight a potential direct clinical relevance of the expression subgroups in EAS-GBM.
Overall, we believe the EAS-GBM classi cation proposed in this manuscript dissected out some of its unique genetic and transcriptomic differences, both within itself and in relation to EUR-GBM. Additions from multi-omics modalities could further re ne this classi cation, and add markers that can predict differential responses to treatments, particularly those that are disparate in different ancestry. This work could provide guidance ranging from research design, clinical trial recruitment to daily clinical management of EAS-GBM. According to the World Health Organization (WHO) 2016 and 2021 classi cation of central nervous system tumors, patients in this study were diagnosed by three independent experienced pathologists based on histological characteristics combined with molecular characteristics 29,30 . In this study, glioblastoma were further divided into PRS types, which refers to primary GBM (pGBM), secondary GBM (sGBM) and recurrent glioblastoma (rGBM). pGBM is primary GBM without any clinical or histologic evidence related to less malignant lesions. sGBM in this study refers to glioblastoma that developed from low grade glioma (LGG) by de nition, usually accompanied with IDH mutation. In addition, rGBM refers to a patient whose GBM has been diagnosed to relapse again. Considering the impact of IDH mutation status on the prognosis of GBMs, whole exome sequencing (WES) and RNA sequencing (RNA Seq) were performed on 471 individuals of East Asian descent, which were grouped by EAS IDH1wt and IDH1 mutant groups. IDH1 mutation status in this study was detected by Sanger sequencing independently, 1p19q co-deletion status was checked by Fluorescence in situ hybridization (FISH), MGMT promoter methylation status was determined following previous protocol 31 .

Sample preparation and sequencing
Glioblastoma tissues were snap-frozen in liquid nitrogen immediately after surgical resection and preserved in liquid nitrogen. First, we extracted genomic DNA from the tumor and matched blood samples, and con rmed its high integrity by 1% agarose gel electrophoresis. DNA was then fragmented for quality control, and paired libraries were prepared. For whole exome sequencing, Agilent SureSelect kit v5.4 was used for target capture. Sequencing was performed on the Illumina HiSeq 4000 platform using paired-end sequencing strategy. Total RNA was extracted from each tumor sample, followed by removing tRNA and rRNA. After quality control and quanti cation, mRNA was reverse transcribed into cDNA, then followed by library preparation and sequencing.

Somatic variant identi cation
Short sequence reads were mapped to the human reference genome GRCh37 using BWA-MEM (v.0.7.17) with default parameters. Following GATK best practice, PCR duplicates were rst removed and parameters were rst determined using 50 runs of random starting points with default settings, followed by 300 runs with optimal ranks to acquire the nal NMF clustering solutions. Silhouette analysis was performed to assess the consistency of proposed clustering solutions. Clustering signi cance was evaluated using SigClust, and all class boundaries were proved to be statistically signi cant.

Principal component analysis and clustering
Principal Component Analysis (PCA) was rst employed to detect RNA decomposition of these samples.
Next, PCA were performed again to check correlation between proposed molecular subtypes and batches based on top 3000 variable genes used in NMF clustering, PCA was realized using R built-in function prcomp. Sample points were colored respectively based on their NMF clusters and batches, while shapes were annotated by PRS type. Besides, t-SNE clustering was performed to check grouping e ciency by applying the t-SNE R package. Phylogenetic tree of the EAS cohort was analyzed to plot the genetic distance among different molecular subtypes using the monocle R package (v.0.2.3).

Molecular subgroup mapping across EAS and EUR Cohort
To investigate correlation among molecular subtypes identi ed from EAS and EUR cohorts, an unsupervised subclass mapping method SubMap (https://www.genepattern.org/) was used to identify correspondence or commonality of subtypes from the two cohorts. SubMap analysis was performed on the GenePattern online platform using default settings with a random seed of 1. The intersection of genes selected for NMF clustering (3,000 most variable genes in each cohort) were input to perform analysis. Signi cant correspondences were identi ed with a cutoff of Bonferroni adjusted P < 0.01.

Marker genes selection and pathway enrichment analysis
To identify marker genes for each GBM subtypes, Comparative Marker Gene Selection algorithms on the GenePattern online platform were performed with default settings and a threshold score > 0.5. Top 500 differential expressed genes (DEGs) were selected as marker genes for each subtype, DESeq2 (3.12)   HLA supertype and haplotype identi cation Human leukocyte antigen (HLA) molecules are highly polymorphic. They contain various supertypes (also known as serotypes) and haplotypes even within the same race, and across different races. Different HLA supertypes and haplotypes have shown a signi cant disparity in antigen presenting ability and peptide ligands a nity 34 . Expression of HLA genes among GBM subtypes were obtained and compared from transcriptome TPM value. HLA supertypes and haplotypes were identi ed using two independent software tools, arcasHLA (v.0.2.5) and seq2HLA with default parameters 35,36 . Only haplotypes and supertypes con rmed by the two softwares were retained for future analysis. HLA loss of heterozygous (LOH) was considered as an individual that only one allele gene was detected on his or her allele loci.
Association between HLA status and clinical features were then investigated by univariate and multivariate analysis.
Detection and analysis of TCR, BCR CDR3 sequences from GBMs RNA-Seq data TRUST (v.3.0.1) was applied (https://bitbucket.org/liulab/trust) to extract TCR/BCR sequence from 345 GBMs RNA-seq data 33 . CDR3 sequences with signi cantly fewer reads (less than 3) were excluded in this analysis. The number of total sequencing reads was obtained from each bam le using Samtools, each variable (V), joining (J), or constant (C) region of CDR3 sequences were annotated with the help of TRUST. In order to compare the richness of TCR/BCR among different subtypes, we normalized CDR3s counts by total reads and tumor purity in each sample. Clonotype diversity of T/B cells was estimated by TCR/BCR CDR3s per kilo TCR/BCR reads (CPK) in each sample as previously described 37 . Shannon entropy calculated from the frequencies of TCR β chain and IgH CDR3 sequences is another index evaluating diversity of TCR/BCR, and such an index represents the adaptive ability of patients' immune systems. γδ T cell fraction, which plays a pivotal role in tumor cytotoxic immunity 38 , was estimated by the total number of γ or δ CDR3s divided by the total number of TCR CDR3s in each sample 39