Immuno-genomic classification of colorectal cancer organoids reveals cancer cells with intrinsic immunogenic properties associated with patient survival

The intrinsic immuno-ge7nomic characteristics of colorectal cancer cells that affect tumor biology and shape the tumor immune microenvironment (TIM) are unclear. We developed a patient-derived colorectal cancer organoid (CCO) model and performed pairwise analysis of 87 CCOs and their matched primary tumors. The TIM type of the primary tumor was classified as immuno-active, immuno-exhausted, or immuno-desert. The gene expression profiles, signaling pathways, major oncogenic mutations, and histology of the CCOs recapitulated those of the primary tumors, but not the TIM of primary tumors. Two distinct intrinsic molecular subgroups of highly proliferative and mesenchymal phenotypes with clinical significance were identified in CCOs with various cancer signaling pathways. CCOs showed variable expression of cancer-specific immune-related genes such as those encoding HLA-I and HLA-II, and molecules involved in immune checkpoint activation/inhibition. Among these genes, the expression of HLA-II in CCOs was associated with favorable patient survival. K-means clustering analysis based on HLA-II expression in CCOs revealed a subgroup of patients, in whom cancer cells exhibited Intrinsically Immunogenic Properties (Ca-IIP), and were characterized by high expression of signatures associated with HLA-I, HLA-II, antigen presentation, and immune stimulation. Patients with the Ca-IIP phenotype had an excellent prognosis, irrespective of age, disease stage, intrinsic molecular type, or TIM status. Ca-IIP was negatively correlated with intrinsic E2F/MYC signaling. Analysis of the correlation between CCO immuno-genotype and TIM phenotype revealed that the TIM phenotype was associated with microsatellite instability, Wnt/β-catenin signaling, APC/KRAS mutations, and the unfolded protein response pathway linked to the FBXW7 mutation in cancer cells. However, Ca-IIP was not associated with the TIM phenotype. We identified a Ca-IIP phenotype from a large set of CCOs. Our findings may provide an unprecedented opportunity to develop new strategies for optimal patient stratification in this era of immunotherapy.


Conclusions:
We identified a Ca-IIP phenotype from a large set of CCOs. Our findings may provide an unprecedented opportunity to develop new strategies for optimal patient stratification in this era of immunotherapy.

Background
Understanding the intrinsic immuno-genomic characteristics of cancer cells is essential if we are to develop optimal patient stratification systems for targeted therapy. Moreover, the tumor immune microenvironment (TIM) has received significant attention due to its close association with responses to immunotherapy. However, baseline data are lacking, and few studies have examined the intrinsic immuno-genomic properties of patient-derived cancer cells. Currently, comprehensive characterization of cancers is based on gene expression patterns in bulk cancer tissues, which contain both cancer cells and the TIM. RNA sequencing delineates the global gene expression pattern in primary cancer tissues as it captures cancer cells as well as various cells in the TIM; therefore, gene expression patterns in cancer tissues reflect both these entities [1][2][3]. However, the TIM is "noisy" and can mask intrinsic cancer-specific signatures in bulk primary tissues. Therefore, an analysis based on the detection of purely intrinsic cancer cell-derived signals will highlight the immuno-genomic characteristics of the cancer cells themselves. To acquire such purely intrinsic signals, it is necessary to use well-defined cancer models rather than cancer cell lines, as the former will recapitulate cancer tissues in the absence of the TIM.
Recently, tissue-specific stem cells from several human organs were cultured to generate organoids, which mimic the 3D structure and function of the organ from which they are derived [4][5][6][7]. These organoid models have been developed for several cancers and used successfully to recapitulate the architecture of primary cancer tissues [8][9][10][11]. Organoids derived from colorectal cancer have been shown to maintain the therapeutic responses as well as various genomic characteristics of the primary tumor [12][13][14][15]. However, these studies have included only a small number of organoids, and studies investigating the collective features of colorectal cancer in large numbers of colorectal cancer organoids (CCOs) are still lacking. Pairwise integrative analysis of CCOs and the corresponding cancer tissues with TIM in a large number of samples can provide deeper insights into the purely intrinsic immuno-genomic characteristics of colorectal cancers.
Here, we generated 87 CCOs as part of an organoid biobank project. These CCOs closely mimicked the histological characteristics of their respective primary cancers. We then evaluated the cancer-specific intrinsic immuno-genomic properties of colon cancer cells using this large set of CCOs and their corresponding cancer tissues after confirming that these organoids recapitulated the signatures of the primary cancers at the transcriptome level.

Patient samples
Small (1-4 cm 3 ) sections of colorectal cancer tissue samples were obtained from resected colorectal specimens as part of the biobanking process for colorectal cancer at the Asan Bio-Resource Center of Asan Medical Center (Seoul, Korea). All tissues were used with the patients' consent. The research protocol was approved by the Ethics Committee of Asan Medical Center, and the entire experimental protocol was conducted in compliance with the relevant institutional guidelines. Samples were categorized as tumor or normal tissue based on histopathological assessment. The diagnosis of each patient was confirmed by pathologists at Asan Medical Center. A total of 87 colorectal cancers were used to generate the CCO models used in this study. The clinicopathological features of the 87 patients are summarized in Supplementary Table 1.

Culturing patient-derived cancer organoids
Within 1 h of excision, patient samples were placed in cold Hank's balanced salt solution (HBSS; Lonza, Basel, Switzerland) containing 1× primocin (Invivogen, Hong Kong, China) and transported to the laboratory on ice. Samples were then washed three times with cold HBSS containing antibiotics and cut into 1-2 mm 3 sections using sterile blades. The sectioned samples were incubated (37°C for 40-90 min with intermittent agitation) in DMEM/F12 medium (Gibco, OK, USA) supplemented with 0.2 U/μL collagenase II (Gibco), 1% penicillin/streptomycin (Gibco), and 0.5 mg/mL amphotericin B (2%; Sigma-Aldrich, MO, USA). After incubation, the suspensions were triturated repeatedly by pipetting, centrifuged at 1200 rpm for 5 min, and washed three times with DPBS (Welgene). Next, the suspensions were passed through 100 μm cell strainers (BD Falcon, CA, USA) and centrifuged at 600 rpm for 3 min. The resulting pellet was resuspended in 100 μL minimum basal medium for CCO (serum-free medium [ Then, 200 μL Matrigel (Corning, NY, USA) was added to the remaining 100 μL suspension to establish organoids, and 150 μL of the resulting cell suspension was allowed to solidify in a single well of a 6-well culture plate (Corning) pre-warmed at 37°C for 10 min. After gelation, 3 mL CCO MBM was added to the well, and the medium was changed every 4 days. The organoids were passaged after 1-3 weeks. For passage, a solidified Matrigel drop containing the organoids was harvested in cold DPBS and then centrifuged at 112×g for 3 min at 4°C. The pellets were washed with cold DPBS and centrifuged at 250 rcf for 15 min at 4°C, and the organoids in the pellet were resuspended in 2 mL TrypLE Express (Invitrogen) and incubated for 10 min at 37°C to allow dissociation. After this, 10 mL DMEM/F12 containing 10% FBS was added and the mixture centrifuged at 112×g for 3 min. The pellets were then washed with DPBS, centrifuged at 112 g for 3 min, resuspended in CCO MBM + Matrigel (1:3), and reseeded at a ratio of 1: 3 or 1:4 to form new CCOs.

Tissue microarray construction and immunohistochemical analysis
A tissue microarray was constructed for the primary tumor tissues. Two random cores (2 mm in size) were obtained from formalin-fixed and paraffin-embedded (FFPE) primary tumor tissues after microscopic evaluation. The tissue microarray was subjected to immunohistochemical analysis using antibodies specific for HLA-DR/DP/DQ/DX (

Targeted DNA sequencing and data processing
To obtain CCOs (> passage 3) for genomic analysis, a solidified Matrigel drop containing the CCOs was harvested in cold DPBS and centrifuged at 112×g for 3 min at 4°C. The pellets were washed with cold DPBS and centrifuged at 250×g for 15 min at 4°C. Then, genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Germany). A DNA library was prepared using the SureSelect XT custom kit (Agilent Technology) after checking DNA quality. Pooled libraries were sequenced at the Department of Pathology, Asan Medical Center on an Illumina MiSeq instrument (Illumina, CA, USA) using a targeted gene panel. This nextgeneration sequencing (NGS) system has been approved for clinical NGS testing by the Korean government. The targeted gene panel is approximately 1 Mb in size and contains 33,524 probes targeting 382 genes, including the entire exons of 199 genes, 184 hotspots, and partial introns of eight genes frequently rearranged in cancers [16,17]. Targeted sequencing of CCOs was performed without matched normal tissue.
Sequenced reads were aligned against the human reference genome (National Center for Biotechnology Information build 37) using BWA (0.5.9) with default options [18]. Demultiplexing was performed using the MarkDuplicates tool in the Picard package (Broad Institute, Cambridge, MA; http://broadinstitute.github.io/ picard, last accessed on February 14, 2018) to remove PCR duplicates. The deduplicated reads were realigned at known indel positions using the GATK IndelRealigner tool [19]. Next, the base qualities were recalibrated using the GATK BaseRecalibrator tool. Somatic singlenucleotide variants and short indels were detected with an unmatched normal using Mutect version 1.1.6 and the SomaticIndelocator tool in GATK (Broad Institute) [19][20][21]. Common and germline variants from somatic variant candidates were filtered out using the common dbSNP build 141 (found in > 1% of samples), Exome Aggregation Consortium release 0.3.1 (https://gnomad. broadinstitute.org/), the Korean Reference Genome database (http://coda.nih.go.kr/coda/KRGDB/index.jsp), and an in-house panel of normal variants. Final somatic variants were annotated using Variant Effect Predictor version 79 [22] and converted to the maf file format using vcf2maf (GitHub; https://github.com/mskcc/vcf2maf, last accessed on February 14, 2018). False-positive variants were curated manually using the Integrative Genomics Viewer [23]. Because CCOs comprise pure cancer epithelial cells, variants with a variant allele fraction (VAF) > 0.9 were considered to be biallelic.

Analysis of copy number variation
To determine copy number variation (CNV), the organoid DNA bam file was analyzed using CNVkit (v0.9.0) [24]. In this study, the CNV fraction was defined as the ratio of the region in which the log2 segmentation value is > 0.4 or the log2 segmentation value is < − 0.4 to the region in the exomes. Copy segments were visualized using the copy number package [25].
Whole transcriptome sequencing and data processing Total RNA sequencing was performed for 87 CCOs and matched FFPE primary tumor tissues. For FFPE tissues, manual microdissection of unstained tissue sections was performed for viable tumor areas under a light microscope, using the corresponding H&E slide as a reference. Total RNA was extracted using the RNeasy Mini Kit (Qiagen) and a cDNA library was constructed using the TruSeq RNA Access Library Prep Kit (Illumina), starting with 1 μg of total RNA. All samples passed the cDNA library quality assurance tests (minimum requirement > 5 nM). Paired-end sequencing of 100 nt fragments was performed on the HiSeq 2500 Illumina platform. RNA sequence data from primary cancer tissues and CCOs were processed in a similar manner; raw RNAseq data were processed using the TCGA RNAseq Pipeline (v2) after sequence quality assurance, and the quality of the FastQ files was checked using FastQC (https://github. com/s-andrews/FastQC, v0.7.15). The FastQC "basic statistics" results for all primary cancer tissues and CCOs were "PASS". Primary cancer tissues had the following average basic statistics: total sequence, 50

TIM classification
The immune and stroma classes from the 87 primary cancer tissues were distinguished using the Nearest Template Prediction module [28] implemented in Gene-Pattern (www.broadinstitute.org/genepattern). Each sample was predictively assigned to an "Immunogenic" or "Non-immunogenic" class based on its immune characteristics [1]. The immunogenic class was further divided into a "Normal stroma" and an "Activated stroma" class based on stroma characteristics [2]. Finally, the immunogenic class with normal stroma was classified as immuno-active (Active), the immunogenic class with activated stroma was classified as immuno-exhausted (Exhausted), and the "Non-immunogenic" class was classified as immuno-desert (Desert) [1][2][3].

Profiling of tumor-infiltrating immune cells
To identify infiltrating immune cells, CIBERSORT [29] with LM22 (22 immune cell types) gene signatures was used to analyze RNA expression profiles of the 87 primary colon cancer samples or CCOs. The sum of the scores for the 22 cell types was used as the total immune score [11]. MCPcounter [30] was used to identify 10 cell populations in the TIM.

Cancer signaling pathway analysis
Gene Set Enrichment Analysis (GSEA) v4.0.2 [31] was used to identify alterations in cancer signaling pathways based on RNA expression profiles in the primary tissues or CCOs. The hallmark gene set v7.0 was used as the Molecular Signatures Database, and the GSEA results were considered significant when the FDR q value was < 0.25 and the nominal p-value was < 0.05. Gene Set Variation Analysis (GSVA) [32] was used to score the activation of hallmark gene sets and to calculate the CD8 + T cell exhaustion score using the CD8 + T cell exhaustion gene set [33] for each sample. [34] v1.8.1 (default options) was used to detect gene fusions in CCO RNA sequence data. To filter out false-positive results, fusion fragments per million (FFPM) > 0.15 of total RNAseq reads was applied to the STAR-Fusion results. The results were further filtered based on the number of split reads and spanning reads: without any split reads, there should be at least 20 spanning reads, and one split read requires at least 10 spanning reads. When there are more than two split reads, five or more spanning reads are required. False positives were removed manually when the same fusion was detected multiple times due to mis-mapping. The filtered fusions were visualized as circos plots using chimeraviz [35] v1.8.5 in R.

STAR-Fusion
Variant calling in RNA sequence data RNA sequence reads were mapped against the human reference genome (National Center for Biotechnology Information build 37) using STAR (2.7.3) [36]. Picard Markduplicates, GATK IndelRealigner, and BaseRecalibrator were implemented in the same way as described above for DNA reads. GATK HaplotypeCaller (3.8.0) and GATK Mutect2 (4.0.2) were used to identify mutations in the RNA BAM files. Mutations discovered using HaplotypeCaller were filtered based on the following criteria: FS > 30.0 and QD < 2.0. For variants detected by Mutect2, only the "PASS" from FilterMutectCalls was used. This analysis pipeline can also accurately detect variants in RNA sequences [37]. RNA variants were identified, and only those variants found in CCO samples were analyzed. Sequencing depth at the mutated position was calculated using the "depth" option in samtools.

HLA typing
Seq2HLA [38] (v2.3) was used to identify highresolution HLA molecular types from organoid RNA sequence data. HLA class I alleles were identified with four-digit resolution. HLA-A and HLA-B results (including the ambiguity flag) were mapped to the HLA supertype [39].

Molecular classification based on whole transcriptome data
To identify cancer subgroups with specific signaling pathways, all CCO samples in hallmark v7 gene sets were scored using GSVA [32], based on CCO mRNA expression. Subgroups were identified using k-means clustering analysis and principal component analysis (PCA). The results were visualized using the R package Com-plexHeatmap [42]. Consensus Molecular Subtype (CMS) classification of organoid transcriptome data was performed in R using the CMScaller package, which is suitable for colorectal cancer pre-clinical models [43].

Microsatellite instability (MSI) testing
MSI status was evaluated using polymerase chain reaction (PCR). Fluorescence-labelled primers were used to amplify five microsatellite loci, two mononucleotide repeats (BAT-25 and BAT-26), and three dinucleotide repeats (D5S346, D2S123, and D17S250) in tumors and matched normal samples. MSI status was based on the length of the PCR product within the tumor sample versus in the paired normal sample. Samples with instability in two or more of the five loci were defined as microsatellite instability-high (MSI-H), samples with instability in one of the five loci were defined as microsatellite instability-low (MSI-L) and samples with no instability were defined as microsatellite stable (MSS). Two samples were excluded from MSI-PCR testing and status was determined using our previously reported algorithm based on targeted DNA sequence data [44,45].

Enrichment analysis
The enrichment score (ES) was calculated to determine whether samples with specific binary events were enriched in certain targets with continuous variables across the whole sample [46,47]. The 87 CCOs with corresponding CD8 + T cell immune scores were ordered based on increasing immune scores. Next, the enrichment score for biallelic alterations identified in KRAS in the CCOs was calculated based on the ordered immune scores and then normalized using the Kolmogorov-Smirnov statistics [46,47], as shown below. The enrichment score for KRAS biallelic alterations (KBA) in the organoids was calculated as follows: N = total organoid sample number; G = number of organoid samples with KBA.
The enrichment score had a higher positive score when samples with KBA were consistently ranked at the top of the list of whole samples. The maximum enrichment score was obtained when the n th sample in the KBA was ranked as the top enrichment score among whole samples. The KBA class label was permutated 10, 000 times, and the maximum enrichment score generating background distribution was recorded each time. The permutated P-value was then calculated as follows: The ES for TP53 GOF mutations was calculated in the same manner.
Analysis of public cancer cell line genomics data and differential dependency Cancer cell line genomic data, including mutation calls, were downloaded from the DepMap 19Q4 data release (https://depmap.org/portal) [48]. Annotation of primary disease sites for each cell line can be found in the Dep-Map 19Q4 data release. Genes with dependencies in cell lines with the FBXW7 mutation (hotspot or damaging mutation) were identified and compared with those in cell lines without the FBXW7 mutation. Differences in mean dependency between FBXW7 mutated and nonmutated cell lines were estimated, and the associated Pvalues were computed using the Wilcoxon rank-sum test. In addition, sensitivity of 1001 molecularly annotated human cancer cell lines to 265 drugs was assessed [49]. Cell line mutation data were obtained from the Cell Lines Project v91 (https://cancer.sanger.ac.uk/cosmic).

Analysis of publicly available single cell RNA sequence (scRNAseq) data
Normalized scRNAseq data from 17,469 epithelial cells from 23 colon cancer tissues were downloaded from the Gene Expression Omnibus database (https://www.ncbi. nlm.nih.gov/geo/; accession no. GSE132465) [50]. The Rtsne v0.15 package in R was used for clustering and visualization of the scRNAseq expression data.

Drug testing in CCOs
CCOs were cultured in 24-well plates for 2 weeks, and then harvested and dissociated using TrypLE Express. The dissociated CCOs were mixed with MBM + Matrigel (1:3 ratio) and seeded in 96-well white plates (10 μL of 2 × 10 3 cells/well; Corning). After gelation, 100 μL MBM was added to each well. The CCOs were allowed to grow for 10-14 days until their diameters reached 100 μm. Then, ten concentrations of FH535, ICG-001 (all Selleckchem, TX, USA), and DMSO controls were added every 3 days. After 6 days, the medium was changed to 100 μL MBM per well to measure cell viability, and 100 μL CellTiter-Glo (Promega) was added to each well. The experiments were conducted in triplicate. The plates were agitated for 30 min at room temperature prior to luminescence measurement. IC 50 values were determined using Graph Pad Prism (v5.01).

Statistical analysis
Continuous variables were analyzed using Spearman's correlation analysis. Depending on the normality of data distribution, the Wilcoxon rank-sum test (non-parametric) or Student's t-test (parametric) was used to evaluate the significance of differences in continuous variables between groups. Fisher's exact test or the chi-square test was used to evaluate the significance of the differences between categorical variables. Multivariate Cox proportional hazards regression and logistic regression analyses were also performed. All statistical analyses were performed in R version 3.5.2.

Development of CCOs and genomic characterization
A total of 87 patient-derived CCOs, cultured from resected colorectal cancer tissues ( Fig. 1a and Supplementary Table 1) as part of the cancer organoid biobank project, were used. CCOs showed diverse morphologies under a bright-field microscope, including thick-walled spherical (R), thin-walled bubble-like (RM), compact globular (RF), and budding tubular (RB) shapes (Fig. 1b). H&E staining revealed that CCOs maintained the characteristic histology of their respective primary cancer. IHC showed that the CCOs recapitulate the expression patterns and percentages of positive cells for transcription factor CDX2, differentiation marker cyto-keratin20 (CK20), and carcinoembryonic antigen (CEA); these markers were positive in most cancer cells ( Fig. 1c and Supplementary Fig. 1a). Histological examination revealed that tumor organoids comprised only tumor cells, with no TIM components, such as lymphocytes, fibroblasts, and blood vessel components.
The top 20 mutated genes in the CCOs are shown in Fig. 1d. Although mutations in all genes could not be identified because mutation analysis was based on targeted sequencing, the most frequently mutated genes were APC and TP53 (in 79% of the CCOs), followed by KRAS (in 55% of the CCOs), and FBXW7 (in 24% of the CCOs). These frequencies were similar to those previously reported for colorectal cancer [51]. The frequencies of these known major driver gene mutations were similar to those observed in an independent colorectal cancer cohort at Asan Medical Center (n = 832), which were examined using the same targeted NGS platform. However, other genes showed higher mutation frequencies in CCOs than in the cancer cohort (Fig. 1e).
The higher rate of detection of mutations in CCOs is likely due to the high tumor cell content. Additionally, the genetic variants mainly arose as subclonal events. In fact, the CCOs had widely distributed VAFs (Fig. 1f and Supplementary Fig. 1b), indicating the presence of subclones. These data suggest that organoids have intratumoral heterogeneity, as previously shown [8]. We further examined correlations in VAF between CCOs and primary tumors using RNA sequences of the top 4 most frequent mutations, which mostly overlapped between CCOs and primary tumors (Fig. 1g-h). Interestingly, we observed a significant tendency toward loss-ofheterozygosity (LOH) in TP53 (Fig. 1h). This is consistent with a previous report showing that more than 91% of the TP53 mutations undergo biallelic inactivation due to biallelic mutations or copy-neutral LOH at the DNA and RNA levels [52]. We also observed significant correlations with respect to VAFs between organoid RNA and organoid DNA (Fig. 1i and Supplementary Fig. 2a), as well as between organoid DNA and tissue RNA (Supplementary Fig. 2b-d). With respect to correlations between DNA and RNA, we observed non-linear patterns, especially in the case of nonsense mutations (Fig. 1j); this may be explained by nonsense-mediated decay of mRNA [53]. Various copy number alterations were also present across CCOs. Organoids with MSI-H exhibited subtle copy number alterations ( Supplementary Fig. 3a). A total of 36 fusions were identified in 25 CCOs (Supplementary Fig. 3b). However, no known targetable fusion events were present. The frequency of HLA class I molecular types ( Supplementary Fig. 3c-d) was in good agreement with that previously reported in 5082 Koreans [54].

Oncogenic gene expression signatures are maintained in organoids
Next, we examined whether oncogenic gene expression signatures of the primary tumors were maintained in the organoids. First, we found that genes exhibiting log2 expression of 5 or higher (n = 12,114) among patients were significantly correlated between organoids and primary tissues (Fig. 2a). Most genes had significant positive correlations ( Fig. 2b and Supplementary Fig. 4a). We selected 5000 genes with the highest variance in organoids and calculated the Spearman's correlation coefficient between organoids and primary tumors. We found significant correlation in mRNA expression between organoid and matched tissues in the 87 patients (correlation coefficients range: 0.501-0.71, Fig. 2c). These correlation coefficients were significantly higher in the Desert group, which was characterized by a low level of immune cell infiltration (Fig. 2d).
After confirming that the global expression patterns of the 5000 genes were similar in organoids and primary tumor tissues, we tested correlations using 1090 key genes involved in carcinogenesis pathways from the hallmark gene sets v7; these genes were involved in Wnt/βcatenin, DNA repair, Notch, PI3K/AKT/mTOR, MYC target, P53, and KRAS signaling pathways. The majority (72%) of these genes showed no overlap with the 5000 genes with high variance (Fig. 2e). A significant correlation was identified between the expression of this gene set in organoids and primary tissues in the 87 patients (correlation coefficient range: 0.736-0.941); the correlation coefficients for this gene set were significantly higher than those for the set of 5000 genes with high variance (Fig. 2f-g). Among the genes showing a significant correlation between organoids and primary tissues (e.g. APC, BRAF, CTNNB1, FBXW7, KRAS, MYC, PIK3CA, PTEN, and TP53), APC showed the highest correlation (rho = 0.74, p < 2.2e − 16 ; Spearman's correlation test; Fig. 2h).
We then tested whether cancer signaling pathways were maintained in the organoids. All primary tissues and organoids were scored for the hallmark gene set v7 using GSVA [32]. We found that most cancer signaling pathways tended to show a positive correlation (Fig. 2i) and that there were significant correlations in signaling pathways between organoids and primary tissues, including the WNT/β-catenin, TP53, PI3K/AKT/mTOR, and unfolded protein response signaling pathways (Fig. 2j). In summary, we verified at multiple levels-i.e., total genes, high variance genes, genes involved in key carcinogenesis pathways, important cancer-related genes, and genes involved in cancer signaling pathways-that organoids recapitulate the primary tumors in terms of gene expression and mechanisms of carcinogenesis.
We also found that-as expected-microenvironmental signatures of gene expression were the major factors that distinguished organoids from primary tissues (Supplementary Fig. 4b-c). However, the expression of differentiation markers or stem cell markers of intestinal epithelium [55,56] was also correlated between organoids and primary tumor tissues despite the absence of TIM in organoids ( Supplementary Fig. 4d). Taken together, the gene expression patterns in organoids represent tumor cell-specific signatures (separate from the TIM).

Cancer-intrinsic signaling pathways in CCOs
Gene expression analysis revealed that variable alterations in cancer signaling pathways were present across all 87 CCOs, indicating inter-tumoral heterogeneity (as expected for tumors from different patients). Unsupervised clustering using k-means (Fig. 3a) and principal component analysis (Fig. 3b) identified four subgroups, two of which were designated the high proliferation subgroup, (k1), in which E2F/MYC pathways were activated, and the mesenchymal subgroup, (k4) (Fig. 3a and c). The k2 and k3 subgroups were in a "grey zone", with no specific alterations in gene expression associated with particular pathways. The mesenchymal subgroup (k4) harbored frequent KRAS mutations (Fig. 3d), and the high proliferation subgroup (k1) showed the worst prognosis in terms of recurrence-free survival (Fig. 3e). The  high proliferation subgroup (k1) showed high expression of genes associated with DNA replication stress responses (Fig. 3f), indicating active replication status. IHC was performed in primary tissues to detect Ki67, E2F, and cyclin-E ( Fig. 3g and Supplementary Fig. 5a-b). Ki67 expression in cancer cells derived from these primary tissues (Ki67-IHC) correlated with the expression of Ki67 mRNA in the organoids, although no correlation was identified between Ki67 mRNA expression in organoids and primary tissue containing TIM (Fig. 3h and Supplementary Fig. 5c). Additionally, the Ki67-IHC correlated with E2F-IHC in cancer cells derived from primary tumor tissues (Fig. 3h right). We validated these findings using publicly available scRNAseq data [50] and found that cancer cells expressing Ki-67 and E2F1 overlapped and the percentage of cancer cells expressing Ki-67 varied among the 23 patients and correlated with the percentage of cancer cells expressing E2F1 (Fig. 3i). When the k1 subgroup (high proliferative group) was compared with the k4 subgroup (mesenchymal subgroup), the k1 subgroup in organoids showed IHC features similar to those observed in the corresponding primary tumors (Fig. 3j), suggesting that the high proliferation activities of primary cancers were maintained in the organoids. In addition, at the gene level, increased expression of mesenchymal genes, including vimentin, was identified in the k4 subgroup of CCOs while decreased expression was observed in the k1 subgroup (Fig. 3k). Thus using scRNAseq data, we verified that most of the single cells expressing MKI67 and the cells expressing vimentin do not overlap (Fig. 3l), which suggests that the high proliferation subgroup and mesenchymal subgroup are distinct subgroups in colorectal cancers.
Clustering analysis based on CMS classification [43] of organoids revealed significant correlation (Supplementary Fig. 5d), suggesting that the intrinsic molecular subtypes could represent the distinct colorectal cancer subgroups.

Variable cancer-intrinsic expression of immune-related genes in CCOs
Next, we examined the expression of previously identified immune-related genes [57] (Fig. 4a). CCOs expressed various immune-related genes at high levels with a high standard deviation, including HLA class II genes and genes related to immune checkpoint regulation (Fig. 4b). A subset of CCOs overexpressed PDL1 (CD274), independently of total PDL1 expression levels in tissue samples (Fig. 4c). Overall, PDL1 expression was higher in primary tissues than in organoids (Supplementary Fig. 6a) and tended to be higher in the k4 (mesenchymal) subgroup of organoids (Fig. 4d). Although PDL1 expression in organoids did not correlate with expression in tissues, PDL1 expression in primary tissues correlated with the degree of immune cell infiltration of primary tissues (Supplementary Fig. 6b). These findings indicate that the main source of PDL1 expression in colon tumor tissues is inflammatory cells. However, PDL1 expression varied among organoids; a subset of organoids exhibited relatively high cancerspecific expression of PDL1. The frequency of individual mutated genes did not differ significantly between the high (≥ median) and low (< median) PDL1 expression groups (Fig. 4e). However, organoids showed different gene expression patterns based on the level of PDL1 expression ( Supplementary Fig. 6c). Gene Ontology (GO) enrichment analysis revealed that CCOs with high PDL1 expression were enriched in GO terms associated with interferon signaling (Fig. 4f). CCOs with high PDL1 expression were found to be enriched in genes associated with interferon α/γ response pathways in a GSEA with 50 hallmark gene sets, whereas CCOs with low PDL1 expression tended to be enriched in genes associated with the Wnt/β-catenin signaling pathway (Fig. 4g). These findings suggest that alteration of pathways intrinsic to cancer cells is associated with PDL1 expression levels and presence of cancer cells with intrinsic alteration of immune related pathway such as interferon signaling.

Impact of cancer-intrinsic immune-related gene expression on survival
Next, we classified immune-related genes into four functional categories, i.e., immune checkpoint stimulation, immune checkpoint inhibition, HLA-I, and HLA-II (Fig. 5a). The mean expression values of related genes were used to represent the signature of each functional category. Among them, cancer-intrinsic expression of HLA-I and of immune checkpoint stimulators and inhibitors was not associated with survival in any percentile cut off (Fig. 5b). However, the cancer-intrinsic expression of the HLA-II signature (of the top five highly expressed genes, i.e., DRB1, DQB1, DRA, DRB5, and DPA1, from CCOs), was significantly associated with patient survival (Fig. 5c). Assessment of the prognostic impact of the expression of the five individual HLA-II genes (for the top 5 HLA-II genes from CCOs) also showed similar results ( Supplementary  Fig. 7a). These findings indicate that cancer-cell intrinsic HLA-II expression has a prognostic impact. We also explored the prognostic impact of HLA-II signature in CCOs based on expression levels using 4-quantiles (Fig.  5d). The CCO patient group with no/very low HLA-II signature showed the worst prognosis. Interestingly, the CCO patient groups with low, medium or high HLA-II signatures showed good prognosis, irrespective of HLA-II signature levels (Fig. 5d). These findings indicate that cancer intrinsic HLA-II expression is associated with favorable prognosis. However, higher intrinsic HLA-II     signature level does not mean a better prognosis. Expression of HLA-II DR, DP, and DQ in CCOs correlated with protein expression detected by IHC in cancer cells derived from primary tissues (Fig. 5e). When patients were stratified based on IHC data, patients with higher expression of HLA-II showed better survival, although this was not significant ( Supplementary Fig. 7b). The prognostic impact of HLA-II signature could not be identified upon using primary tissues from two independent sets, i.e., our own ( Fig. 5f) and TCGA colorectal cancer data (Fig. 5g); this suggests that cancer-intrinsic expression of HLA-II plays an important role in predicting patient prognosis because bulk cancer tissue contains TIM that includes antigenpresenting cells such as mononuclear inflammatory cells, which are the main centers of HLA-II expression, and the HLA-II signal from those cells adds to the net HLA-II signature in the bulk cancer tissue.

A subset of CCOs with intrinsic immunogenic properties is associated with improved patient survival
Among the immune-related genes expressed by CCOs, HLA-DQB1 exhibited the most variable expression (Fig.  4b) and appeared to have some clinical impact (Supplementary Fig. 7a). Therefore, we performed unsupervised clustering based on HLA-II genes, which revealed a tendency to cluster based on HLA-II expression levels, with the highest expression being of that of DRB1, DQB1, DRA, DRB5, DPA1, and DPB1 (Fig. 6a). The clustering pattern in our CCO cohort was similar to that of an independent CCO cohort from nine patients [11] ( Fig. 6b). To better stratify the CCOs, we performed kmeans clustering and PCA on our CCO cohort and generated three groups, i.e., high HLA-II expression group (HLA-II-k2 group), medium HLA-II expression group (HLA-II-k1 group), and no/low HLA-II expression group (HLA-II-k3 group) (Fig. 6c). The patients in the HLA-II-k3 group had the worst prognosis, whereas those in the HLA-II-k1 group and HLA-II-k2 groups were predicted to have favorable survival (Fig. 6d). This finding is similar to our finding in Fig. 5d. Therefore, HLA-II-k1 group and HLA-II-k2 group were integrated into a single group classified as Higher HLA-II group while HLA-II-k3 group was classified as Lower HLA-II group for further analysis. The prognostic impact of the Higher HLA-II group was independent of other clinicopathological parameters, such as disease stage, age, intrinsic molecular subgroup (k1 subgroup), and immunogenic TIM (Fig. 6e).
To better characterize the Higher HLA-II group, KEGG pathway analysis was performed on the CCO data. The results revealed that Higher HLA-II CCOs had significant enrichment of intrinsic antigen processing and presentation pathways (Fig. 6f), higher expression of signatures associated with intrinsic immune checkpoint stimulation (Fig. 6g), and higher expression of intrinsic HLA-I (Fig. 6h), despite the fact that the Higher HLA-II group was identified based on clustering of HLA-II expression patterns. This group was not associated with the total number of immune cells infiltration in the TIM of primary tumor tissue ( Fig. 6i and Supplementary  Fig. 8a). Thus, we refer to the Higher HLA-II group as cancer cells with intrinsic immunogenic properties (Ca-IIP) that have a clinical impact (Fig. 6j). To explore underlying molecular characteristics of Ca-IIP, we compared mutational profiles and gene set enrichment analysis (GSEA) in a hallmark gene set. There were no significant differences in the mutational profiles of CCOs with and without Ca-IIP (Fig. 6k). However, significantly enriched cell-cycle regulatory pathways, including MYC and E2F targets for CCOs with no Ca-IIP, were observed (Fig. 6l). The HLA-I supertype was not associated with Ca-IIP ( Supplementary Fig. 8b).

Cancer-intrinsic immuno-genomic alterations associated with TIM phenotypes
To identify cancer cell-intrinsic immuno-genomic alterations associated with the TIM, we classified TIM status as immunogenic or non-immunogenic based on the gene expression profiles of the 87 primary tumor tissues and known expression patterns of marker genes; thus the immunogenic TIM was divided into two additional subtypes, i.e., Exhausted and Active ( Fig. 7a and Supplementary Fig. 9a). The TIM status was validated using different methods, including CIBERSORT immune cell profiling ( Supplementary Fig. 9b), MCP immune and stromal scores ( Supplementary Fig. 9c), and the CD8 + T exhaustion score (as measured by the GSVA algorithm) ( Supplementary Fig. 9d). TIM status was not associated with Ca-IIP (Fig. 7b). Additionally, TIM status was significantly associated with cancer cell-intrinsic expression of HLA-I ( Fig. 7c and Supplementary Fig. 9e) but not of HLA-II (Fig. 7d). Clinically, patients with the active TIM had better survival than the other groups, albeit without statistical significance (Fig. 7e). Based on the TIM status, all four tumors with MSI-H and one hypermutated tumor were classified in the Exhausted group (Fig. 7a). These tumors also had high CD8 + T exhaustion scores (Fig. 7f). With respect to nonsynonymous mutations, there were differences in the frequency of NF1 and remodeling genes SMARCA4, KMT2A, and ATRX based on TIM class (Fig. 7a). There were no significant differences in CNV ( Fig. 7a and Supplementary Fig. 9f) or fusion number ( Supplementary  Fig. 9 g) based on TIM status.
Next, we examined genes that were differentially expressed in the different TIM subgroups (Active, Exhausted, and Desert) and found no significant differences at the single gene level among the TIM subgroups Unsupervised clustering based on HLA-II genes expression in two independent CCO cohorts revealed similar clustering patterns. (C) k-means clustering based on HLA-II genes expression revealed 3 subgroups, which were correlated with HLA-II gene expression levels (Kruskal-Wallis test). HLA-II-k3 subgroup (subgroup with the lowest HLA-II genes expression) resulted in unfavorable prognosis (D) and this prognostic impact was independent of patient age, sex, stage, high proliferative molecular subgroup (k1 subgroup) and tumor immune microenvironment (TIM) status (Multivariate Cox regression analysis) (E). (F) Enriched antigen processing/presentation pathway in CCOs with high HLA-II expression (HLA-II-k1 and HLA-II-k2 subgroups). CCOs belonging to these subgroups had high intrinsic immune-stimulatory gene expression (G) and high HLA class I expression (H). However, the total immune cell infiltration amount in primary tissues was not associated with these subgroups (Wilcoxon rank-sum test) (I). (J) These subgroups (HLA-II-k1 and HLA-II-k2 subgroups) were referred as Ca-IIP group and their characteristics were summarized. (K) There were no significant mutations associated with Ca-IIP (p > 0.05 by logistic regression analysis). (L) Enriched cancer-intrinsic signaling pathways based on Ca-IIP (GSEA analysis) in CCOs. CCO, colorectal cancer organoid; Ca-IIP, Cancer cells with intrinsic immunogenic properties in the organoids (one-way ANOVA test, FDR > 0.25 for all tested genes) ( Supplementary Fig. 10). Therefore, we focused on signaling pathways and found that the Wnt/β-catenin signaling pathway was enriched in the Desert (non-immunogenic) group ( Fig. 8a and Supplementary Fig. 11a-c). The mutation status of APC, a major component of the Wnt/β-catenin signaling pathway, was associated with the Desert group ( Fig. 8b and Supplementary Fig. 11d-f). The KRAS (biallelic) mutation was more common in tumors with decreased numbers of cytotoxic lymphocytes (as determined by MCP) (Fig. 8c) or CD8 + T cells (as determined by CIBER-SORT) (Supplementary Fig. 12a). These biallelic KRAS mutations tended to occur in non-immunogenic cancers ( Supplementary Fig. 12b-d). Colon cancers with the KRAS mutation showed upregulated expression of CXCL3 and PD-L1 in cancer cells, decreased granzyme B levels in cancer tissues, and increased M2 macrophage numbers in the TIM ( Supplementary Fig. 12e), supporting previous findings from a report of a model showing the characteristics of cancers with KRAS mutations [58]. Similarly, the TP53 GOF mutations were enriched in tumors with low CD8 + T cell infiltration ( Fig. 8d and Supplementary Fig. 13a-c). The unfolded protein response pathway was enriched in the Exhausted group ( Fig. 8e and Supplementary Fig. 13d). We also found that the FBXW7 mutation was more common in the Exhausted group (Fig. 8f), and that it correlated significantly with CD8 + T cell exhaustion (Fig. 8g) and the unfolded protein response pathway, based on GSVA (Fig. 8h). Multivariate logistic regression analysis revealed that the association between immuno-exhaustion and FBWX7 mutation was independent of the hypermutated cancer phenotype including MSI-H (Fig. 8i). The FBXW7 mutation was one of the most common mutations associated with exhausted TIM (Fig. 8f) and enrichment of genes associated with the unfolded protein response pathway (Fig. 8). Therefore, we screened the Achilles project dataset to identify candidate targets of synthetic lethality in FBXW7-mutated tumors. We found that CTNNB1 inactivation is a potential target in tumors with the FBXW7 mutation ( Supplementary Fig. 14a). Next, we selected compounds associated with the Wnt/ β-catenin pathway and used CCLE data to analyze responses to approximately 200 drugs. We found that cancer cell lines harboring the FBXW7 mutation had significantly higher sensitivity to the PPAR inhibitor, FH535, than cell lines without the mutation (Supplementary Fig. 14b). Furthermore, only the colorectal cancer type yielded significant results in the synthetic  Fig. 14c). We validated the induction of lethality as the PPAR inhibitor for the FBXW7 mutation using our CCOs ( Supplementary  Fig. 15), thereby demonstrating the utility of CCOs as a preclinical model.

Discussion
Tumors with abundant infiltrating T cells are referred to as "immunogenic" or "hot"; these tumors are associated with a favorable prognosis and better response to immune checkpoint inhibitor-based therapy [59,60]. Therefore, many studies have focused on the TIM. However, the lack of a patient-derived cancer cell model that recapitulates the primary cancer means that relatively few studies have investigated the intrinsic immunogenic properties of cancer cells and their clinical significance. Several studies have shown that CCOs maintain the genetic diversity and treatment response characteristics of the primary tumor [12][13][14][15]. Our study revealed that CCOs recapitulate the global gene expression profiles and pathways that characterize the tumors. Although a limited number of genes were evaluated, we confirmed that mutations in known major driver genes, such as TP53, KRAS, APC, and FBXW7 were maintained The KRAS mutation is enriched in tumors with low cytotoxic lymphocyte infiltration (10,000 permutated test and Wilcoxon rank-sum test one-sided). (D) Association between TP53 GOF mutations and decreased cytotoxic lymphocyte infiltration (10,000 random permutation test and Spearman's correlation test). (E) Upregulated cancer-intrinsic pathways in CCOs with exhausted TIM of primary tumor (GSEA analysis). (F) Frequent FBXW7 mutation in CCOs with exhausted TIM of primary tumor (Fisher's exact test). CD8 + T cell exhaustion score (G) and unfolded protein response pathway score (H) by GSVA analysis based on FBXW7 mutation status (Spearman's correlation test). (I) FBXW7 mutation and hypermutated phenotype including MSI-H were independently associated with exhausted TIM of primary tumor (multivariate logistic regression analysis). GOF, gain-of-function; TIM, tumor immune microenvironment; CCO, colorectal cancer organoid. MSI-H, microsatellite instability-high in the CCOs. Furthermore, we also investigated cancerintrinsic immunogenomic characteristics along with clinical features in a large CCO sample set.
Cancer organoids enable the identification of signatures associated with cancer-intrinsic immuno-genomic alterations, which are often masked when using bulk tumor tissue due to the signatures within the TME. In this study, we identified Ca-IIP using a large number of CCOs. Patients with Ca-IIP showed an excellent prognosis with respect to overall and recurrence-free survival, with HLA-II being the key intrinsic factor associated with patient survival. We found significant correlation between the expression of HLA-II mRNA in CCOs and HLA-II protein expression in cancer cells from primary tissues. However, the impact of HLA-II IHC data in primary tumor tissue on survival was not significant; this might be explained by the discrepancy between the measurement strategies used in IHC and RNA sequencing; 1) different antibody targets used for HLA-II, 2) tissue microarray (2 mm 2 × 2 cores) used for IHC may not represent the entire cancer tissue, 3) low resolution of IHC-based expression measurement (categorical measurement) compared to the continuous RNA sequencing results, and 4) small sample size resulting in low statistical power. Indeed, a previous study that measured the expression of HLA class II molecules using IHC in approximately 1000 colorectal cancer tissue samples has shown that the expression of HLA class II antigens in CRC cells was significantly associated with a favorable clinical course [61]. The results from our study suggest that identification of patients based on cancerintrinsic expression of HLA-II mRNA is a more powerful predictor of survival than identification based on intrinsic expression of HLA-I, intrinsic immune signatures, intrinsic cancer signaling pathways, or TIM status.
Recently, the TIM has received significant attention due to its potential relationship with clinical outcome and response to immunotherapy. As such, we classified colorectal cancers into three microenvironment groups based on immune cell and stromal signatures in primary tumor tissues. Cellular constituents of the TIM are more likely to be induced by intrinsic genomic alterations in tumor cells than by germline genomic characteristics of the host. We identified several genomic alterations in CCOs that were associated with tumor TIM class. All MSI-H tumors belonged to the Exhausted group, which may be associated with a good response to PD1/PDL1 inhibitors because PD1 is an important marker for CD8 + T cell exhaustion. In addition, the Desert group showed activation of Wnt/β-catenin signaling and high expression of Wnt pathway target genes, which is consistent with results from recent studies showing that the activation of Wnt/β-catenin signaling results in immune exclusion [3,62,63]. Our data also suggest that cancerintrinsic genetic alterations, including those in KRAS, TP53, and FBXW7, are linked to TIM status.