This study is conducted on a subset of Caucasian samples from CRC patients enrolled in a randomized phase III trial CALGB/SWOG 80405 due to the limited sample size from other ethnicities. Germline genotyped data was extracted from the peripheral blood of 1,165 patients (Figure S1). RNA-seq data were extracted from the primary tumor tissue of 469 patients, obtained from pre-treatment formalin-fixed paraffin-embedded blocks (Figure S2). The trial was designed to compare cetuximab, bevacizumab, or cetuximab + bevacizumab, each plus chemotherapy as first-line therapy in KRAS wild-type advanced or mCRC. The combined arm of the study (cetuximab + bevacizumab) reduced the efficacy of the treatment and therefore, was discontinued prematurely [13]. The clinical study did not show significant differences in OS of patients treated with bevacizumab versus cetuximab. However, these treatments differ at the molecular levels by targeting distinct biological pathways. Therefore, multiple genes and their interconnected pathways collaborate to influence the response to these therapies.
Here, we first constructed a data-driven network of genes and then augmented the network with genetic information to identify the transcriptomic-causal network based on Mendelian randomization technique [9, 11]. Toward this object, we conducted expression quantitative trait loci (eQTL) analysis and utilized the eQTLs as potential instrumental variables to estimate causal relationships among genes and construct the transcriptomic-causal network. We also included known gene regulatory pathways for identifying the causal network. We performed this analysis using the data from patients treated with either bevacizumab or cetuximab. We then assessed the stability of the network and replicated the edges using 103 samples from the bevacizumab + cetuximab arm of the study (Methods). Furthermore, we replicated the interconnectivity among genes using STRING database.
By mapping identified eGenes (genes with eQTLs) on the network, we identified sub-networks of genes regulated by eQTLs. We linked the network to OS and estimated the effects of genes on OS that were not attributed to the other genes in the study. Finally, we defined gene signatures corresponding to the sub-networks with genes associated with OS. For the analyses reviewed here, we accounted for the influence of the tumor microenvironment by adjusting for immune cell abundances estimated from RNA-seq data. In addition, all the analyses were adjusted for all RAS and BRAFv600E mutation status along with age, gender. The eQTL analysis is adjusted for batch effect correction. We replicated the findings using a subset of data from the cohort under study as well as two external cohorts. Additionally, we explored the relationship between our findings and immune signatures defined in the literature as key inputs to a description of the immune subtypes of cancer. Figure 1 represents the overall workflow of the study.
Genetic regulatory variants of gene expression in CRC tumor tissue
To identify candidate susceptibility genes subject to regulation by genetic variants, we performed a cis-eQTL analysis on 8,301 genes with adequate variation (standard deviation of normalized counts across samples > 0.5) while adjusting for covariates. We assessed the relationship of 33,209,829 cis-eQTL-gene pairs using 350 Caucasian samples with both RNA-seq and genotype data (Fig. 2A). We selected 352 top genes after applying the permutation test (p-value < 0.05) (Figures S4 and S5, Table S1). Enrichment analysis revealed a significant depletion of exons (Fig. 2B), and we observed a high enrichment of cis-eQTLs in bivalent chromatin states associated with enhancer sequences based on the Roadmap Epigenomics Consortium enhancer databases [14] (Fig. 2C). Bivalent enhancer refers to segments of DNA that have both repressing and activating epigenetic regulators in the same region.
Transcriptomic-causal network
We combined the samples of bevacizumab and cetuximab arms to identify the transcriptomic-causal network since RNA seq data were recorded prior to the treatments (Fig. 2A). We identified the network on 8,301 genes and focused on 2,267 edges that passed the stability assessment. Using the 103 patients (treated with bevacizumab + cetuximab), who were initially excluded from the main analysis, we successfully replicated 71% of the interconnectivity among genes (Methods and Figures S7-9). By taking these steps, we reduced the likelihood of false positive discoveries and increased the chances of reproducibility for the identified edges. We then integrated cis-eQTL analysis results for sub-networks identifications and the application of Mendelian randomization technique to identify causal relationships among genes (Methods).
Gene signatures and patient stratification
We investigated the effect of the gene expression levels on OS by integrating the transcriptomic-causal network with Cox-proportional hazard models for the patients receiving bevacizumab (203 samples) and cetuximab (163 samples) separately. In this way, we controlled for the effects of confounding genes identified using the network. This analysis yielded the identification of three gene signatures corresponding to three sub-networks that comprised genes with significant effects (p-value < 0.1) on overall survival (OS) for either bevacizumab or cetuximab treatments (Fig. 3).
One of the sub-networks (Fig. 3A, sub-network 1) includes 4 genes (UAP1L1, IL2RB, RELT, and MYO1G). Among these genes, MYO1G was identified as an eGene. To assess the effect of MYO1G on OS, we controlled for the effect of RELT as confounding gene and observed that MYO1G prolonged, and RELT shortened OS under cetuximab treatment (HR: 0.65 and 1.37, p-value: 0.05, 0.07, respectively) but not under bevacizumab treatment (HR: 0.86 and 1.14, p-value: 0.51, 0.33, respectively), (Fig. 3).
The other sub-network (Fig. 3A, sub-network 2) involved 4 genes (IDO1, GBP2, GBP4, and GBP5), three of which belong to the guanylate-binding protein (GBP) family, including eGene GBP5. We observed that the two directly connected genes IDO1 and GBP4 significantly prolonged OS of patients under bevacizumab treatment (HR: 0.79, 0.63, p-value: 0.018, 0.015, respectively) and not under cetuximab (HR: 0.92, 0.93, p-value: 0.455, 0.692, respectively), (Fig. 3C). In this analysis, we controlled for the confounding role of IDO1 on the GBP4-OS relationship.
The third sub-network (Fig. 3A, sub-network 3) includes 6 genes (BLM, FANCI, UBE2T, SNRPA1, PRC1, and DTL), with BLM being an eGene. We observed that PRC1 and FANCI shortened the OS of patients treated with cetuximab (HR: 1.53, 1.43; p-value: 0.03, 0.04 respectively) but not with bevacizumab (HR: 0.94, 1.31; p-value: 0.71, 0.10), (Fig. 3C).
To facilitate the nomination of a scoring model for prospective testing as a gene signature and to support the visualization of survival plots, we dichotomized the expression level of each gene based on their third quartile and defined beneficial/non-beneficial expression levels with respect to OS specific to each treatment. We then classified patients based on having either beneficial or non-beneficial expression levels in each sub-network (Methods). We estimated the survival function based on Kaplan-Meier estimator using set of genes that collectively form a gene signature. We observed a notable decrease in the median OS from 43.5 to 16.1 months (p-value: 0.0002) for patients with the beneficial vs. non-beneficial levels for both RELT and MYO1G in sub-network 1; from 38.1 months to 13.1 months (p-value: 0.0059) for patients with beneficial vs. non-beneficial levels for both FANCI and PRC1 in sub-network 3 (Fig. 4). For patients with beneficial expression for one gene and non-beneficial expression for other gene in the signatures see Figure S10. Due to the limitations associated with dichotomizing the data in Kaplan-Meier estimators, the statistical power of the analysis for sub-network 2 in the bevacizumab arm was insufficient to detect significant differences among patients.
Immune feature enrichment
The biological functions of the gene signatures were assessed by clustering genes in the corresponding sub-networks (Fig. 5) based on 10 immune signatures known as key inputs to a description of the immune subtypes of cancer. These immune signatures including macrophages [15], lymphocytes [16], TGF-β [17], IFN-γ [18], wound healing [18], and CD8 + T cell [19] measure a final common pathway of antitumor immune activity (cytotoxicity [19]), characteriz the immune microenvironment (T follicular helper, Tfh, cells [20]), and mediate the response to checkpoint inhibitors (B cells and T cells cooperation [21]).
The median value of the genes (Tables S2 and S3) within an immune signature was assigned to each patient except for the cytotoxicity signature where the geometric mean was used, according to Rooney et al. [22]. In addition, we included immunoglobulin G[23] and single gene CD274[24] as prognosis biomarkers. Since in this study, gene expression levels are adjusted for enriched immune cell types, we first investigated the relationship of the immune signatures and the enriched immune cell types (Figure S11). We then clustered the genes in the sub-networks represented as a heat map plot in Fig. 5.
As the heatmap in this figure shows, Sub-network 1 is divided into two clusters: one includes IL2RB and MYO1G and the other includes UAP1L1 and RELT, consistent with what we observed in the OS analysis where MYO1G prolonged OS and RELT shortened OS. Both genes (MYO1G and RELT) in the corresponding signature, showed stronger correlation with macrophages.
All the gene in sub-network 2 are clustered together and represented strong correlation with cytotoxicity, CD8 + T cells, lymphocytes, and macrophages. The two genes (IDO1 and GBP4) in the corresponding signature showed stronger correlation with cytotoxicity (Fig. 5).
On the other hand, sub-network 3 is only related to wound healing and all the genes in this pathway showed correlations with the wound healing (Fig. 5).
Replications
In addition to using 103 samples for replication of edges in the network, we used three external replication cohorts. We replicated the edges in the network and investigated the functional relationships among genes within identified signatures using the STRING database, a biological database of protein–protein interactions [25]. Table 1 represents confidence scores for protein interactions corresponding to the genes in the signatures. Scores above 0.5 are indicative of promising evidence for potential physical interactions.
Table 1
Replication of gene interactions in the identified signatures using the STRING database. The confidence scores greater than 0.5 indicate promising evidence for potential physical interactions of proteins encoded by the genes in the signatures.
Protein
|
Connected
|
Score
|
FANCI → PRC1
|
directly
|
0.63
|
IDO1 → GBP4
|
directly
|
0.52
|
RELT → → → MYO1G
|
within two steps
|
0.65, 0.51, 0.58
|
We used the GSE146889 data from the Gene Expression Omnibus database as an external cohort, which includes 85 paired samples from normal and tumor tissue of colorectal cancer patients. For 21,983 genes measured in this cohort, we performed differential expression analysis. All the genes in sub-networks 1–3 except IL2RB and MYO1G were differentially expressed in this cohort after adjusting for multiple testing using the false discovery rate (FDR) method (FDR < 0.05) (Table 2).
Table 2
Summary result of differential expression analysis over 21,983 genes from the GSE146889 database as an external cohort. FC: fold change; SE: standard error; stat: the value of the test statistic for the gene. The adjusted p-value is based on FDR correction.
Gene symbol
|
log2 (FC)
|
SE log2(FC)
|
stat
|
p-value
|
Adjusted
p-value
|
BLM
|
-1.75
|
0.16
|
-10.98
|
4.60E-28
|
9.49E-27
|
DTL
|
-2.29
|
0.18
|
-12.75
|
3.16E-37
|
1.47E-35
|
FANCI
|
-1.88
|
0.11
|
-16.75
|
5.54E-63
|
4.39E-60
|
GBP2
|
0.49
|
0.12
|
4.27
|
1.99E-05
|
5.09E-05
|
GBP4
|
-0.77
|
0.15
|
-5.18
|
2.25E-07
|
7.15E-07
|
GBP5
|
-0.97
|
0.20
|
-4.88
|
1.05E-06
|
3.11E-06
|
IDO1
|
-2.35
|
0.27
|
-8.62
|
6.97E-18
|
6.01E-17
|
PRC1
|
-2.06
|
0.13
|
-15.96
|
2.44E-57
|
8.49E-55
|
RELT
|
-0.47
|
0.16
|
-2.90
|
3.68E-03
|
6.91E-03
|
SNRPA1
|
-0.77
|
0.06
|
-12.75
|
3.28E-37
|
1.52E-35
|
UAP1L1
|
-0.40
|
0.11
|
-3.62
|
2.91E-04
|
6.41E-04
|
UBE2T
|
-2.43
|
0.16
|
-15.30
|
7.35E-53
|
1.57E-50
|
We also replicated the gene-OS relationships using data from the COAD project of The Cancer Genome Atlas (TCGA), consisting of 27 samples treated with bevacizumab and ten with cetuximab. Thus, we performed the replication analysis for the findings related to the bevacizumab arm and not for cetuximab due to the limited number of samples. We used the Wilcoxon rank-sum test to validate the findings for the follow-up time of 27 patients treated with bevacizumab who exhibited elevated expression levels of both GBP4 and IDO1 or the absence of high expression levels in both genes. The expression levels were dichotomized based on the third quartile, similar to the Kaplan-Meier analysis. We observed a significant difference between these two groups (p-value = 0.038), confirming the association of both GBP4 and IDO1 in sub-network 2 with the overall survival of patients treated with bevacizumab.