Functional genomics identifies five distinct molecular subtypes with clinical relevance and pathways for growth control in epithelial ovarian cancer

Epithelial ovarian cancer (EOC) is hallmarked by a high degree of heterogeneity. To address this heterogeneity, a classification scheme was developed based on gene expression patterns of 1538 tumours. Five, biologically distinct subgroups — Epi-A, Epi-B, Mes, Stem-A and Stem-B — exhibited significantly distinct clinicopathological characteristics, deregulated pathways and patient prognoses, and were validated using independent datasets. To identify subtype-specific molecular targets, ovarian cancer cell lines representing these molecular subtypes were screened against a genome-wide shRNA library. Focusing on the poor-prognosis Stem-A subtype, we found that two genes involved in tubulin processing, TUBGCP4 and NAT10, were essential for cell growth, an observation supported by a pathway analysis that also predicted involvement of microtubule-related processes. Furthermore, we observed that Stem-A cell lines were indeed more sensitive to inhibitors of tubulin polymerization, vincristine and vinorelbine, than the other subtypes. This subtyping offers new insights into the development of novel diagnostic and personalized treatment for EOC patients.

classified into the Stem-B subtype (Suppl. Figs. 4B and 6A); although, these non-serous tumours shared "few molecular similarities". The inclusion of the other histotypes, therefore, may allow the identification of a unique subgroup in serous carcinomas. To see the effect of including the other histologies in our molecular classification (using a standard deviation cutoff of 1.05), we examined the lists of genes that were most variably expressed across the samples: 1) all samples used in the current study (n = 1,538), 2) serous carcinoma samples (n = 1,274) and 3) samples with histotypes other than serous carcinoma (n = 264). These lists showed a significantly higher overlap between all 1,538 and 1,274 serous samples (1,116 overlapped/1,138 genes in total; 98.1%), than between all 1,538 and 264 of the other histology samples (1,089 overlapped/1,311 genes in total; 83.1%). This difference in the gene lists indicates that the overall expression pattern with all 1,538 samples was influenced mainly by variation within the serous cancer samples, but to a lesser extent than that within the other histological types. The other histotypes might have been thus classified into the same molecular category as Stem-B, even though they exhibited distinct biological characteristics. The clinicopathological information obtained with each dataset was neither standardised nor centrally reviewed across the datasets; therefore, it is possible that pathologically misdiagnosed samples were included. However, it is also likely that the variation between serous and the other histologies contributed to the identification of the Stem-B subtype exhibiting "less serous" features. This notion was also supported by the fact that the expression level of WT1 gene, a marker of serous adenocarcinoma of the ovary (Lawrenson & Gayther, 2009) was significantly lower in Stem-B tumours, irrespective of whether they were exhibiting other histotypes or were only serous ovarian carcinoma (Suppl. [ClaNC] samples with survival information) in the validation dataset might be responsible for this discrepancy. However, we also found that the serous samples included in the Stem-B subtype have different overall survival rates between the training and validation datasets, implying possible heterogeneity within the subtype.

Discordance of molecular subtype prediction
Despite recent progress in the development of statistical models for cancer molecular subtyping with expression microarray data, there is still uncertainty about the reproducibility of these various classification algorithms because of the intrinsic nature of subtype identification where the true classification remains unknown (Haibe-Kains et al, 2012).
Previously, Haibe-Kains et al. showed that the concordance between pairs of different classifiers varied from 49-86% (median: 68%) in the molecular subtyping of 5,715 breast cancer samples derived from 36 independent datasets (Haibe-Kains et al, 2012). Similarly, a concordance of 77.5% was observed between a centroid prediction and a k-means clustering in a breast cancer cohort with 412 samples (Calza et al, 2006), showing that the best methods achieve a concordance comparable to the 78.8% we observed in our study. By scrutinising the discrepancy in the subtype classification, we noted high rates of discordant assignments in distinguishing Epi-A from Epi-B or Stem-B, and Epi-B from Mes (Suppl . Table 8). This ambiguity may arise from shared biological properties between some of the subtypes; for example, all three Epi-A, Epi-B and Stem-B subtypes expressed epithelial markers, and many clinical samples with Epi-B or Mes subtype assignment expressed inflammatory cell markers (Fig. 1A;Suppl. Fig. 11A). A similar overlap was indeed observed in a breast cancer cohort between luminal A and luminal B subtypes, which shared a nuclear expression of oestrogen receptors (Calza et al, 2006). In fact, 82% of 489 ovarian cancer expression data of TCGA were assigned to more than one subtype in a cross-validation with ss-GSEA (Verhaak et al, 2013), implying there were transcriptionally overlapped features across the samples. We believe that the observed discordance is due to similarity in the biological properties of the clinical samples.

Classification accuracy of core and non-core samples
The classification accuracy of core samples was worse than non-core samples (Fig. 1D) primarily due to the difference in gene sets used for defining core samples, and the gene sets used to define subtype by BinReg. The gene components for each gene set were largely distinct, with a 9.8% overlap (116/1185 genes). Moreover, the silhouette width for the genes with variable expression was not correlated with that for the BinReg subtype signatures (Spearman correlation rho = 0.0194 and p = 0.4612). These findings are not inconsistent with our observation, in which the core samples were not always predicted as anticipated.

Association of molecular subtypes with patient outcomes
In addition to  . Table 5B).
: The Stem-A or Epi-B subtype was found to be an independent prognostic factor from multiple clinical characteristics for serous ovarian cancer patients.
2) Serous cancer cases with age, stage, grade, metastasis status and molecular subtype status, based on the progression-free survival rate (PFS) (Suppl . Table 5E).
: The molecular subtype was not independently correlated with PFS for serous ovarian cancer.
3) All histology cancer cases with age, stage, grade, metastasis status, surgical status and molecular subtype status, based on OS (Suppl . Table 5C).
: The limited number of cases with complete information prevented us from identifying even the debulking status as an independent prognostic factor with statistical significance. Similarly, none of the molecular subtype was found as an independent prognostic factor. 4) All histology cancer cases with status for debulking surgery or molecular subtypes, based on OS (Suppl . Table 5D).
: When we examined only the debulking surgery status and the molecular subtype, Epi-B and Stem-A were found to be significant prognostic factors independent of the debulking status.
In summary, the Stem-A and Epi-B subtypes are consistently identified as prognostic factors that are independent of multiple clinical parameters and status, which include the status for surgery in the overall survival rate (Table 1; Suppl. Tables 5A, 5B and 5D).

Cell line subtype identification by consensus clustering
Four independent datasets of ovarian cancer cell lines from Duke University (42 cell lines), Kyoto University (37 cell lines), National University of Singapore (34 cell lines) and Lawrence Berkeley National Laboratory (29 cell lines) were analysed (Guan et al, 2007;Matsumura et al, 2011). The data for a total of 142 cell lines were compiled and analysed with the data of 1,142 core clinical samples in consensus clustering, without considering the histological origin of the cell line. Realising that the identified subclass labelling for cell lines did not fully capture the pattern of clinical samples, this labelling was then used as a tentative assignment for cell line subtypes for subsequent clustering analysis. After identification of the subtype-specific marker genes using the "cell-line only" expression data with SAM and ROC (Tusher et al, 2001), a consensus clustering was again performed relying on the selected gene sets (Suppl. Fig. 10A). The approach of the co-clustering analysis (Lowe et al, 2007;Perou et al, 1999;Prat et al, 2010;Virtanen et al, 2002) yielded a stable subtype classification for the cell lines with reasonable similarity to that of the clinical samples (Suppl. Fig. 10C), and achieved a 100% consistency in subtype classification for biological replicates of 28 cell lines that originated from the same source (Suppl. Table 9). This is in contrast to the approach of predicting the cell line subtype with BinReg or ClaNC, where the consistency in subtype classification was 67.9% (19/28 cell lines) amongst the biological replicates (Suppl. Table 9).
Based on this observation, co-clustering was adopted instead of a prediction approach for subtype assignment of the ovarian cancer cell lines. We performed a silhouette analysis and a SigClust (Liu et al, 2008) to confirm the validity of the cell line subtypes ( Fig. 2A). In addition, to confirm the expression similarity between cell lines and clinical samples for each subtype, Spearman correlation map was constructed to measure the closeness in gene expression between the cell lines and clinical samples (Suppl. Fig. 10C). Furthermore, BinReg was adopted to validate the subtype assignment of the cell lines ( Fig. 2B) (Gatza et al, 2010).
A co-consensus clustering of the cell lines and the serous tumour samples was performed, and yielded a result that was highly concordant (137/142, 96.5%) with that of the original analysis. We noted that the five discordant arrays included three biological replicates of the TYK-nu cell line and one of its derivatives (TYK-nu CisR) (data not shown), which were originally grouped into the Mes subtype but were clustered with Stem-A serous tumours when clustering with serous tumours only. Nevertheless, this highly concordant assignment demonstrates the robustness of the co-clustering method and implies a very modest influence of the other histologies when included with serous tumour samples.  (Boyer et al, 1989;Freedman et al, 2000;Mantovani et al, 2008).

Pathways affected in silencing PA-1-specific genes
The ss-GSEA revealed the pathways affected by targeting each of the five PA-1 relevant genes in PA-1, OVCA433 and HeyA8. All siRNA transfections produced significant and appropriate silencing of the genes with 67.3-92.6% (median 83.9%) efficacy (Suppl. Fig.   13A). A total of 534 pathways were commonly altered across the three cell lines (Suppl. Fig.   13B; Suppl. Table 14). GTF3C1 knockdown coincided with the down-regulation of the gene set Reactome RNA Polymerase III Transcription in all three cell lines pertaining to the function of GTF3C1 as a regulatory component of RNA polymerase III transcription machinery (Kovelman & Roeder, 1992). Similarly, as one of the components required for microtubule nucleation (Fava et al, 1999;Moritz et al, 1995;Moritz et al, 1998), TUBGCP4 knockdown resulted in down-regulation of the Microtubule gene set in the transcriptome. these examples indicate that this approach can connect a gene with a pathway as expected from the biological function of the gene. Statistical power plots to distinguish one subtype from the others. The statistical power computation is based on the results of a t-test, with a significance level α = 0.05, and on the assumption that subtype distribution is the same throughout different population sizes.

Suppl. Fig. 9. Three-way split cross-validation using BinReg and ClaNC
Five sets of three-way split cross-validations for subtype classifiers by BinReg and ClaNC using the dataset with 1,538 clinical samples are shown. The data was divided into five random sets of: training set A, (40%, n = 615); training set B, (40%, n = 615) and a testing set (20%, n = 308), such that each sample was used in the testing set exactly once to ensure balance. The classifier was subsequently built using training set A or B, and then used to Note that the Kyoto data included duplicated arrays for the HEY cell line. Also note that the two-round CC method provides consistent subtype assignments for the biological replicates across the 28 cell lines derived from the same source (Duke, Kyoto, and Singapore). C. whereas the hairpin copy numbers were sorted according to hairpin score in the RIGER analysis (Luo et al, 2008), with which binary comparisons were performed for each subtype to obtain subtype-specific cell growth determinant genes. Note that a clear genome-wide distinctive pattern of hairpins across subtypes was detected. The relatively amplified hairpins (red) putatively target the genes that have suppressive effects on cell growth under conventional culture conditions, while genes targeted by the relatively depleted shRNAs (green) may have growth-promoting effects on cells with a given subtype. Red = higher; green = lower copy number counts. B. Effect-size distribution of the subtype-specific amplified or depleted hairpins from the three binary comparisons. x-axis is the effect size, yaxis is the frequency. Abbreviations

Supplemental Tables
Please refer to the Excel file.

Principal component analysis (PCA) was performed on the combined dataset before and after
ComBat standardisation to ensure analysis on the combined dataset was not biased by the presence of technical variations derived from non-biological effects. Expression data of all the probes were used to calculate the principal components, and then the first three principal components comprising less than ~97% variance were visualised in the 3-dimensional scatter plot by Matlab (ver. 7.8.0).

Statistical power analysis
Statistical power is the probability of rejecting the null hypothesis when the alternative hypothesis is true. A statistical power analysis for distinguishing among the subtypes was performed as follows: First, we computed the subtype distribution of the 1,538 ovarian carcinoma  . Last, we computed the statistical power for two-sided t-test using Matlab for each gene, with significance set at α = 0.05. We repeated the procedure for the different number of samples assuming that the subtype distribution is consistent. The statistical power plot for each subtype is given in Suppl. Fig. 2.

Predictive modelling and validation by BinReg
The BinReg parameters determined from training sets A and B for molecular subtype predictive modelling are given in Suppl. Table 17.

Predictive modelling and validation by ClaNC
Silhouette analysis was performed using Matlab (ver. 7.8.0) to identify core samples that best represent their subtypes with a positive silhouette width. Significance analysis of microarrays (SAM) and receiver operating characteristic (ROC) analysis were applied to determine the marker genes for each subtype and to assess their capability for distinguishing one subtype from the others (Tusher et al, 2001). A false discovery rate of zero and area under the curve (Pejovic et al, 2009) threshold of >0.78 (up-regulated in the subtype) or <0.22 (downregulated in the subtype) were used to filter out non-significant genes for SAM and for ROC, respectively. Based on these marker genes, we applied classification of microarrays to nearest centroids (ClaNC) to generate signatures for each subtype, and subsequently the subtype predictive model of the clinical samples (Dabney, 2006). In order to validate the subtype prediction, we adopted for a 10-fold cross-validation to provide a sufficient estimation of the predictive model performance. In the 10-fold cross-validation, the 1,538 epithelial ovarian cancer samples were randomly partitioned into 10 sets, each comprising 153-154 samples (Subramanian & Simon, 2011). One set was used as a validation set (to be predicted), whereas the other 9 sets (1,384 or 1,385 samples) were used to build the predictive model. This process was repeated 10 times such that each set was used as the validation set exactly once. This method minimised bias introduced by the sample order and distribution when assessing the predictive model. Subtype predictions of all the validation sets were combined and compared against the subtype assignment by consensus clustering on all of the 1,538 samples.

Three-way split cross-validation using BinReg and ClaNC
To provide additional assessment on BinReg and ClaNC for ovarian cancer molecular subtype predictors, we performed five random 40-40-20 cross-validations using the two methods, where two mutually exclusive sets comprising 40% (n = 615) of the 1,538 ovarian cancer samples were used as either training set A or B to model a predictor, and the remaining 20% (n = 308) of the samples not in training sets A and B were used as a testing set to assess the predictor performance. To ensure balance in predictor assessment, each sample was used in the testing sample exactly once in the five random 40-40-20 data division. Subsequently, we identified molecular subtypes independently from training set A or B using consensus clustering (with most varying genes), silhouette width analysis, and then build BinReg or ClaNC predictors using core samples, as described in the section "Predictive modelling and validation by BinReg/ClaNC" of Materials and Methods and Suppl. Materials and Methods.
The predictive model from training set A or B was then used to predict the subtype of the samples in the testing set. Concordance was measured by comparing predicted subtype with the original classification from the consensus clustering using all the 1,538 samples. The result is shown in Suppl. Fig. 9.

Cell line panel
The cell-line panel used in the study was a mixture of histology according to the publicly available descriptions (Suppl . Table 11). Microarray analyses for the gene expression of 142 ovarian cancer cell lines were combined from four independent datasets and analysed: (1) Duke University (n = 42: A2008, A2780 cisR, A2780, BG1, C13, Caov-2, Caov-3, CH1, TOV-112D, TOV-21G, UPN251). Out of these 142 cell lines, 28 cell lines originated from the same source (Duke, Kyoto, and National University of Singapore) and served as biological replicates. These 28 cell lines are described in Suppl. Table 9.

Genome-wide RNAi screens for subtype-specific proliferation genes
To assess the reliability of our shRNA screen, we performed RIGER analysis to obtain hairpins that correlated with the TP53 status of the 14 cell lines. The result shows that our systematic functional screening was able to deliver known TP53-downstream genes associated with a TP53 genotype of the cultured cell lines, with statistical significance (data not shown). Subsequently, the same method was applied to identify hairpins specifically relevant for cell growth in a molecular subtype.
A comparison of the Cheung et al. (Cheung et al, 2011) dataset with ours was not feasible because of different experimental designs (i.e., at the endpoint of the screen, we compared the abundance of integrated shRNA sequences across the different molecular subtypes, while Cheung et al. compared the relative abundance to the initial shRNA pool across different cell line lineages) and detection platforms (next-generation sequencing versus microarray, respectively).