DEPTH2: an mRNA-based algorithm to evaluate intratumor heterogeneity without reference to normal controls

Intratumor heterogeneity (ITH) is associated with tumor progression, unfavorable prognosis, immunosuppression, genomic instability, and therapeutic resistance. Thus, evaluation of ITH levels is valuable in cancer diagnosis and treatment. We proposed a new mRNA-based ITH evaluation algorithm (DEPTH2) that requires no reference to normal controls. DEPTH2 evaluates ITH levels based on the standard deviations of absolute z-scored transcriptome levels in tumors, reflecting the asynchronous level of transcriptome alterations relative to the central tendency in a tumor. By analyzing 33 TCGA cancer types, we demonstrated that DEPTH2 ITH was effective in measuring ITH, as shown by its significant associations with tumor progression, unfavorable prognosis, genomic instability, reduced antitumor immunity and immunotherapy response, and altered drug response in diverse cancers. Compared with five other ITH evaluation algorithms (MATH, PhyloWGS, ABSOLUTE, DEPTH, and tITH), DEPTH2 ITH showed a stronger association with unfavorable clinical outcomes. In characterizing other properties of ITH, such as its associations with genomic instability and antitumor immunosuppression, DEPTH2 also displayed competitive performance. DEPTH2 is expected to have a wider spectrum of applications in evaluating ITH than other algorithms.


Background
Intratumor heterogeneity (ITH) refers to the differences in molecular and phenotypic profiles between different tumor cells within a tumor. ITH has significant associations with tumor advancement, prognosis, immune evasion, and therapeutic responses [1]. Many algorithms have been proposed to quantify ITH at the genetic level, such as MATH [2], EXPANDS [3], DITHER [4], and PhyloWGS [5]. However, although genetic ITH is prevalent in human cancers, it cannot delineate the full phenotypic diversity of cancers [6]. This suggests that mRNA ITH could also contribute to the phenotypic diversity of cancers. Thus, a few algorithms have been developed to quantify ITH at the mRNA level, such as DEPTH [6] and tITH [7]. We previously developed DEPTH [6] and demonstrated that it is superior to or comparable with other methods in characterizing the properties of ITH [6].
In this study, we proposed a novel algorithm to score ITH at the mRNA level. Similar to DEPTH, the novel algorithm, termed DEPTH2, measured ITH based on the perturbations of gene expression profiles. However, different from DEPTH and other ITH evaluation methods, DEPTH2 quantified ITH without reference to normal controls. This indicates that DEPTH2 can be applied to any gene expression profiles in tumors, regardless of whether gene expression profiles in normal samples are available. Furthermore, by analyzing transcriptomic profiles in more than 30 cancer types, we demonstrated that DEPTH2 is competitive in characterizing the properties of ITH compared with most established algorithms, including DEPTH.
Song and Wang, Journal of Translational Medicine (2022) 20:150

Algorithm for quantifying ITH without reference to normal controls
Given a normalized gene expression matrix containing m genes and t tumor samples, the DEPTH2 score of tumor sample T is defined as the standard deviation of the absolute z-scored expression values of the m genes in T, where ex(x, y) denotes the expression level of gene x in tumor sample y. In a tumor, if most genes show close absolute z-scored expression values, the tumor will likely have a low DEPTH2 score, namely a low ITH level; otherwise, the tumor will have a relatively high DEPTH2 score. Therefore, the DEPTH2 score reflects the asynchronous level of gene expression alterations relative to the central tendency normalized by standard deviation in all tumors for all genes in the gene expression matrix (Fig. 1). We argued that the asynchronous level was determined by the heterogeneity level of gene expression profiles among different tumor cells within a tumor and thus reflects the ITH level at the mRNA level. The R package for the DEPTH2 algorithm is available at https://github.com/XS-Wang-Lab/DEPTH2 under a GNU GPL open-source license.
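The scoring step can be illustrated with a minimal sketch. It assumes that each gene is z-scored across the t tumors using the gene-wise mean and sample standard deviation, so that the score of tumor T is the standard deviation over genes G_i of |(ex(G_i, T) - mean_j ex(G_i, T_j)) / sd_j ex(G_i, T_j)|. This normalization axis is an assumption drawn from the verbal description above, not a confirmed detail of the published R package. The function name depth2_scores, the dict-of-lists input, and the skipping of constant genes are hypothetical choices made for illustration.

```python
import statistics


def depth2_scores(expr):
    """Return one DEPTH2-style score per tumor.

    expr maps a gene name to a list of expression values, one per tumor,
    with tumors in the same order for every gene (assumed input layout).
    """
    genes = list(expr)
    n_tumors = len(expr[genes[0]])

    abs_z = []  # one row per gene, one column per tumor
    for gene in genes:
        values = expr[gene]
        mu = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD across tumors
        if sd == 0:
            continue  # assumption: a constant gene carries no signal, skip it
        abs_z.append([abs((v - mu) / sd) for v in values])

    # Score of tumor j: sample SD of its absolute z-scores across genes.
    return [
        statistics.stdev([row[j] for row in abs_z])
        for j in range(n_tumors)
    ]
```

A tumor whose genes all sit at the same absolute distance from their cohort means receives a score of zero, which matches the low-ITH case described in the text. The sketch needs at least two non-constant genes and two tumors.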

Evaluation of genomic instability and tumor purity
The tumor mutation burden (TMB) in a tumor was defined as the total number of somatic mutations in the tumor. We obtained copy number alteration (CNA) scores and homologous recombination deficiency (HRD) scores in TCGA cancers from the publication by Knijnenburg et al. [12]. We calculated tumor purity using the ESTIMATE [13] algorithm with the input of gene expression profiles.

Survival analysis
We compared survival prognosis between higher-DEPTH2-score (> median) and lower-DEPTH2-score (< median) tumors in pan-cancer (overall survival (OS), disease-specific survival (DSS), disease-free interval (DFI), and progression-free interval (PFI)) and in individual cancer types (OS and disease-free survival (DFS)). We used Kaplan-Meier (KM) curves to display survival time differences and the log-rank test to assess the significance of survival time differences. The survival analyses were implemented using the function "survfit" in the R package "survival".

[Fig. 1 panel labels: High-ITH tumor; Low-ITH tumor. GEL: gene expression level.]
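
The median split and two-group log-rank comparison described in the survival analysis can be sketched as follows. This is a plain-Python illustration, not the authors' R code. The function names median_split and logrank_test and the 1 = event, 0 = censored coding are assumptions. The P-value uses the chi-square distribution with one degree of freedom, computed through the complementary error function.

```python
import math
import statistics


def median_split(scores):
    """Label each score 'high' (> median), 'low' (< median), or None (== median)."""
    med = statistics.median(scores)
    return ["high" if s > med else "low" if s < med else None for s in scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, P-value).

    events: 1 = event observed, 0 = censored (assumed coding).
    groups: 'high' or 'low'; samples labelled None are ignored.
    """
    data = [(t, e, g) for t, e, g in zip(times, events, groups) if g is not None]
    observed = expected = variance = 0.0

    for t in sorted({u for u, e, _ in data if e == 1}):
        n = sum(1 for u, _, _ in data if u >= t)  # at risk, both groups
        n_high = sum(1 for u, _, g in data if u >= t and g == "high")
        d = sum(1 for u, e, _ in data if u == t and e == 1)  # events at t
        d_high = sum(
            1 for u, e, g in data if u == t and e == 1 and g == "high"
        )
        observed += d_high
        expected += d * n_high / n
        if n > 1:  # hypergeometric variance of d_high at this event time
            frac = n_high / n
            variance += d * frac * (1 - frac) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square, 1 d.f.
```

Samples whose score equals the median are left out, mirroring the strict "> median" and "< median" definitions in the text.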

Gene-set enrichment analysis
We scored the enrichment levels of immune, stemness, and pathway signatures using single-sample gene-set enrichment analysis (ssGSEA) [14]. The ssGSEA scored these features based on the expression levels of their marker or pathway genes. The ratio of two immune signatures in a tumor sample was the geometric mean expression level of all marker genes in an immune signature divided by that in another immune signature (log2-transformed). The marker or pathway gene sets are presented in Additional file 1: Table S2.
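
The signature ratio defined above reduces to a short calculation. The sketch below assumes that the log2 transformation is applied to the ratio of the two geometric means and that expression values are strictly positive. The function name signature_ratio and the gene labels in the usage note are hypothetical.

```python
import math
import statistics


def signature_ratio(expr, markers_a, markers_b):
    """log2 of geometric-mean expression of signature A over signature B.

    expr maps gene name -> expression level in one tumor sample.
    All values must be positive for the geometric mean to be defined.
    """
    gm_a = statistics.geometric_mean([expr[g] for g in markers_a])
    gm_b = statistics.geometric_mean([expr[g] for g in markers_b])
    return math.log2(gm_a / gm_b)
```

For example, with placeholder genes a1 = 8, a2 = 2, b1 = 1, and b2 = 4, the geometric means are 4 and 2, so the ratio is log2(4 / 2) = 1.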

ITH evaluated by other algorithms
We calculated MATH ITH scores using the function "math.score" [2] in the R package "maftools" with the input of "maf" files. We calculated ABSOLUTE ITH scores, namely ploidy scores, using the ABSOLUTE algorithm [15] with the input of "SNP6" files. Both "maf" and "SNP6" files were obtained from the genomic data commons data portal (https://portal.gdc.cancer.gov/). In addition, we obtained the ITH scores calculated by DEPTH [6], PhyloWGS [5] and tITH [7] from their associated publications.

Statistical analysis
In evaluating correlations between ITH scores and other variables, we used Spearman's correlation and reported the correlation coefficient (ρ) and P-value. In comparisons of ITH scores among different classes, we used the one-sided Mann-Whitney U test for two classes and the Kruskal-Wallis test for more than two classes. We used the Benjamini and Hochberg method [16] to calculate the false discovery rate (FDR) to adjust for multiple tests. All statistical analyses were implemented in the R programming environment (version 4.0.2).
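
The Benjamini and Hochberg adjustment mentioned above can be sketched in a few lines. This illustrates the standard step-up procedure only. The authors worked in R, and the function name benjamini_hochberg is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger P-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each P-value is scaled by m / rank, and the running minimum keeps the adjusted values monotone in the original ordering of P-values.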

DEPTH2 scores correlate positively with genomic instability in cancer
Genomic instability may increase TMB and CNA [21]. We found that TMB had a positive correlation with DEPTH2 scores in 10 individual cancer types (P < 0.05) (Fig. 3A). In 18 cancer types, CNA scores correlated positively with DEPTH2 scores (P < 0.05) (Fig. 3A). Because p53 has a critical role in maintaining genomic stability, TP53 mutations may promote genomic instability [22,23]. We found that DEPTH2 scores were remarkably higher in TP53-mutated than in TP53-wildtype tumors in 14 cancer types (P < 0.05) (Fig. 3B). Further, DNA mismatch repair deficiency and HRD are important factors responsible for genomic instability in cancer [21]. We found that the expression levels of three DNA mismatch repair proteins MSH2, MSH6, and PCNA correlated positively with DEPTH2 scores in 12, 14, and 11 cancer types, respectively (P < 0.05) (Fig. 3C). Furthermore, DEPTH2 scores correlated positively with HRD scores [12] in pan-cancer (ρ = 0.05, P = 1.27 × 10⁻⁶) and in 19 individual cancer types (P < 0.05) (Fig. 3D). Taken together, these results suggest a strong positive association between the DEPTH2 ITH level and genomic instability in cancer.

DEPTH2 scores correlate positively with tumor purity
Notably, DEPTH2 scores had significant positive correlations with tumor purity in pan-cancer (ρ = 0.39, P < 0.001) and in 22 cancer types (P < 0.05) (Fig. 5A). This implies that the DEPTH2 ITH level increases with tumor purity. To further show that the DEPTH2 measure does indeed represent ITH between tumor cells within a bulk tumor, we calculated the DEPTH2 scores in normal controls for 30 cancer types with related data available. As expected, DEPTH2 scores were significantly lower in normal controls than in tumor samples in pan-cancer (P = 1.63 × 10⁻¹⁴²) and in all the 28 cancer types (P < 0.05) (Fig. 5B).
To correct for the impact of tumor purity on the relationship between DEPTH2 scores and genomic instability, we divided DEPTH2 scores by proportions of tumor cells in bulk tumors, which were obtained from the TCGA pathological slides data, termed tumor purity-adjusted DEPTH2 scores. Notably, the tumor purity-adjusted DEPTH2 scores showed significant positive correlations with TMB, CNA, and HRD in 5, 11, and 12 cancer types, respectively (P < 0.05) (Additional file 3: Fig. S2A); they correlated inversely with the enrichment scores of CD8+ T cells, NK cells, immune cytolytic activity, and IFN response in 14, 9, 10, and 13 cancer types, respectively (P < 0.05) (Additional file 3: Fig. S2B); they correlated positively with stemness scores in 11 cancer types (Additional file 3: Fig. S2C). Moreover, the tumor purity-adjusted DEPTH2 scores were significantly higher in late-stage versus early-stage tumors, in high-grade versus low-grade tumors, and in metastatic versus primary tumors in 8, 6, and 4 cancer types, respectively (Additional file 3: Fig. S2D). Meanwhile, the tumor purity-adjusted DEPTH2 scores still had significant negative correlations with survival prognosis in diverse cancer types (Additional file 3: Fig. S2E). These results suggest that the significant associations of DEPTH2 scores with antitumor immune signatures, genomic instability, and clinical outcomes in cancer are independent of tumor purity.
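
The purity adjustment described above is a single division, shown here as a small sketch. It assumes that the tumor cell proportion is expressed as a fraction in (0, 1]; slide annotations recorded as percentages would first need dividing by 100. The function name purity_adjusted_score is hypothetical.

```python
def purity_adjusted_score(depth2_score, tumor_cell_fraction):
    """Divide a DEPTH2 score by the tumor cell fraction of the bulk sample.

    tumor_cell_fraction is assumed to lie in (0, 1]; values outside that
    range usually mean the input is a percentage or is missing.
    """
    if not 0 < tumor_cell_fraction <= 1:
        raise ValueError("tumor_cell_fraction must be in (0, 1]")
    return depth2_score / tumor_cell_fraction
```

For instance, a score of 0.6 in a sample with 80% tumor cells becomes 0.6 / 0.8 = 0.75, while a sample made up entirely of tumor cells keeps its score unchanged.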

Proteins with significant expression correlations with DEPTH2 ITH
We found 50 proteins having significantly positive expression correlations with DEPTH2 scores in at least 5 cancer types (FDR < 0.05) (Additional file 1: Table S3).

DEPTH2 scores are associated with drug response in cancer
Among 265 antitumor compounds from the Genomics of Drug Sensitivity in Cancer (GDSC) project (https://www.cancerrxgene.org), 199 (75%) displayed significant correlations of drug sensitivity (IC50 values) with DEPTH2 scores in cancer cell lines (P < 0.05) (Additional file 1: Table S4). This suggests that DEPTH2 ITH correlates with drug response for a wide range of anticancer drugs. Among the 199 compounds, 131 had a significant inverse correlation of IC50 values with DEPTH2 scores, whereas 68 showed a significant positive correlation (Fig. 7A and Additional file 1: Table S4). This indicates that, as the DEPTH2 ITH level increases, the sensitivity of cancer cells increases for some compounds and decreases for others. Interestingly, among the 131 compounds whose IC50 values correlated inversely with DEPTH2 scores, many targeted the cell cycle pathways, including NPK76-II-72-1, KIN001-270, THZ-2-102-1, PHA-793887, AT-7519, THZ-2-49 and CCT007093. In addition, two compounds (Methotrexate and Temozolomide) targeted the DNA replication pathway. These results are consistent with the significant positive correlations between DEPTH2 scores and the enrichment scores of the cell cycle and DNA replication pathways in cancer cell lines (P < 0.05) (Fig. 7B). In contrast, among the 68 compounds whose IC50 values correlated positively with DEPTH2 scores, many targeted pathways upregulated in low-DEPTH2-score tumors: AZ628, CI-1040, RDEA119, PD-0325901, AZD6244 and Trametinib targeted the MAPK signaling pathway, and KIN001-102, CAL-101, GSK2126458, OSI-027, PIK-93, YM201636, GSK690693, MK-2206, AZD8055 and PF-4708671 targeted the PI3K/mTOR signaling pathway. Again, these results are consistent with the significant negative correlations between DEPTH2 scores and the enrichment scores of the MAPK, PI3K-Akt, and mTOR signaling pathways in cancer cell lines (P < 0.05) (Fig. 7B).
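
The direction of each drug association depends on the sign of the Spearman coefficient between IC50 values and DEPTH2 scores across cell lines. A lower IC50 means greater sensitivity, so a negative ρ indicates that cell lines with higher scores tend to be more sensitive. The sketch below computes ρ as the Pearson correlation of average ranks; it omits the P-value, and the function names average_ranks and spearman_rho are assumptions.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

Applied per compound, the sign of spearman_rho(depth2_scores, ic50_values) separates the inverse group from the positive group, with significance assessed separately.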

Associations between different ITH evaluation algorithms
We explored associations between DEPTH2 ITH scores and ITH scores by five other algorithms, including MATH [2], PhyloWGS [5], ABSOLUTE [15], DEPTH [6], and tITH [7]. The five algorithms evaluate ITH at different molecular levels, with MATH, PhyloWGS, and ABSOLUTE at the DNA level, and DEPTH and tITH at the mRNA level. In pan-cancer, DEPTH2 scores had significant positive correlations with ITH scores by all the other algorithms (P < 0.01) and had the strongest correlations with tITH (ρ = 0.58) and DEPTH scores (ρ = 0.49) (Fig. 8). DEPTH2 scores had the weakest correlations with MATH and PhyloWGS scores in pan-cancer (ρ = 0.04). In most individual cancer types, DEPTH2 scores showed significant positive correlations with DEPTH and tITH scores (P < 0.05) (Additional file 1: Table S5). Particularly, DEPTH2 scores had strong positive correlations with DEPTH scores in 20 cancer types and with tITH scores in 5 cancer types (ρ > 0.5). However, there was no individual cancer type in which DEPTH2 scores showed a strong positive correlation with ITH scores by MATH, PhyloWGS, or ABSOLUTE. These results indicate that DEPTH2 scores have stronger correlations with ITH scores by mRNA-based algorithms than by DNA-based algorithms.

Comparisons of DEPTH2 with other ITH evaluation algorithms
We further compared DEPTH2 with the five ITH evaluation algorithms in their correlations with clinical features, genomic instability, antitumor immune response, and tumor purity in the 33 TCGA cancer types. ITH scores by MATH, PhyloWGS, ABSOLUTE, DEPTH, and tITH were negatively correlated with OS time in 3, 2, 4, 5, and 2 cancer types, respectively, compared to DEPTH2 in 10 cancer types (log-rank test, P < 0.05) (Fig. 9A). Additionally, ITH scores by MATH, PhyloWGS, ABSOLUTE, DEPTH, and tITH were negatively correlated with DFS time in 2, 5, 4, 1, and 2 cancer types, respectively, compared to DEPTH2 in 10 cancer types (P < 0.05) (Fig. 9A). ITH scores by MATH, PhyloWGS, ABSOLUTE, DEPTH, and tITH were significantly higher in metastatic than in primary tumors in 3, 3, 2, 2, and 2 cancer types, respectively, compared to DEPTH2 in 5 cancer types (P < 0.05) (Fig. 9A). Furthermore, ITH scores by MATH, PhyloWGS, ABSOLUTE, DEPTH, and tITH were significantly higher in high-grade than in low-grade tumors in 5, 1, 2, 5, and 1 cancer types, respectively, compared to DEPTH2 in 6 cancer types (P < 0.05) (Fig. 9A). Overall, these results suggest that the DEPTH2 ITH has a stronger association with unfavorable clinical outcomes in cancer than the other algorithms' ITH.
Altogether, these results indicate that DEPTH2 is superior to or comparable with the other algorithms in characterizing ITH.

Discussion
In this study, we proposed a new mRNA-based ITH evaluation algorithm (DEPTH2), which is an improved version of DEPTH we developed previously [6]. One major advantage of DEPTH2 over DEPTH and other ITH evaluation algorithms is that DEPTH2 evaluates ITH without reference to normal controls. Hence, DEPTH2 should have a wider spectrum of applications in measuring ITH than DEPTH and most other algorithms. Furthermore, our data showed that DEPTH2 ITH was associated with tumor progression, unfavorable prognosis, genomic instability, reduced antitumor immunity and immunotherapy response, and altered drug response in diverse cancers. This suggests that DEPTH2 is effective in measuring ITH because the DEPTH2 metric reflects the common properties of ITH [1]. Moreover, our data suggest that DEPTH2 ITH likely has a stronger association with unfavorable clinical outcomes (e.g., survival prognosis) in cancer than the ITH evaluated by other algorithms. In characterizing other properties of ITH, such as its associations with genomic instability and antitumor immune evasion, DEPTH2 also displays competitive performance versus other algorithms. DEPTH2 evaluates ITH levels based on the standard deviations of absolute z-scored transcriptome levels in tumors. The closer the absolute z-scored expression values of genes in a tumor are to one another, the lower the tumor's DEPTH2 score is likely to be. Thus, to a great degree, the DEPTH2 score indicates the asynchronous level of transcriptome alterations relative to the central tendency in a tumor. The asynchrony of transcriptome alterations in a tumor is associated with the heterogeneity of gene expression profiles among different tumor cells constituting the tumor. This is the rationale for DEPTH2 to evaluate the ITH level at the mRNA level.
Nevertheless, DEPTH2 has several limitations. First, because bulk tumors often involve non-tumor components, the ITH evaluated by DEPTH2 is likely confounded by non-tumor cells. To overcome this limitation, introducing the variable of tumor purity in the algorithm could be a solution. In addition, an investigation of the DEPTH2 algorithm in single-cell transcriptomes could eliminate the confounding effect of tumor purity. Second, due to differences in RNA-Seq or DNA microarray technology, read mapping, gene expression quantification methods, and the expression values of different genes, there could be large-scale deviations between gene expression values that could affect the generalization of DEPTH2 to a wide variety of data resources. To overcome this limitation, additional normalization strategies for gene expression values are needed, such as min-max scaling. In addition, nonparametric transformations of gene expression values could be a solution. We plan to further improve the DEPTH2 algorithm using these strategies in the future.

Conclusions
DEPTH2 is a new algorithm evaluating ITH levels at the mRNA level. DEPTH2 is superior to or comparable with other methods in characterizing the properties of ITH and is expected to have a wider spectrum of applications in measuring ITH in comparison to most other algorithms. The DEPTH2 ITH may provide new insights into cancer biology, as well as potentially valuable markers for cancer diagnosis and treatment.
Additional file 1: Table S1. A summary of the datasets used in this study. Table S2. The marker or pathway genes of immune signatures, stemness, and pathways. Table S3. Proteins with significant expression correlations with DEPTH2 scores in at least 5 cancer types. Table S4. Spearman correlations between DEPTH2 score and drug sensitivity (IC50 values) of cancer cell lines to 265 compounds. Table S5. Correlations between ITH scores inferred by six different algorithms within individual cancer types.
Additional file 2: Figure S1. Kaplan-Meier curves showing that higher-DEPTH2-score (> median) tumors have worse overall survival and disease-specific survival than lower-DEPTH2-score (< median) tumors in 10 and 10 individual cancer types, respectively. The log-rank test P-values are shown.