Construction of IRGP signatures
In this study, in accordance with the gene expression profiles of OV from the TCGA database, we performed a series of bioinformatics analyses to build a model for prognostic evaluation of IRGPs in OV patients (Fig. 1). Here, a total of 1,039 IRGs from the InnateDB database were obtained. Of these genes, 539,241 IRGPs were established. After removing samples in all datasets with gene pair values of 0 or 1, we left the residual IRGPs and subsequently selected them as first candidate IRGPs to conduct further analysis. Then, a univariate Cox proportional hazards regression model was adopted for the calculation of the relationship between each IRGP and patient survival with a log-rank test p-value < 0.05. A total of 3,765 identified IRGPs were identified (Supplementary Table 2). Moreover, using the LASSO regression model, we further selected 17 IRGPs for the final risk prognostic signature (Table 2, Supplementary Fig. 2).
Table 2
The results of 17 IRGPs using LASSO regression model
IRGPs | Coef | p-value | HR | Low.95CI | High.95CI |
LILRA2 vs P2RY14 | 1.083615 | 0.00014 | 2.955345 | 1.69188 | 5.162343 |
NOD2 vs LILRA2 | -1.15639 | 0.000278 | 0.31462 | 0.168658 | 0.586905 |
CXCL14 vs SHARPIN | 0.877286 | 0.003049 | 2.404364 | 1.345709 | 4.295853 |
TSC22D3_vs_CXCL11 | 1.177209 | 0.005145 | 3.245304 | 1.422665 | 7.403006 |
FOXA2_vs_RCAN1 | -0.68767 | 0.007236 | 0.502744 | 0.30437 | 0.83041 |
PLA2G4A vs SCAF11 | -0.84326 | 0.019819 | 0.430307 | 0.211683 | 0.874725 |
ABCA1 vs MST1R | 0.75507 | 0.027164 | 2.127761 | 1.088905 | 4.157725 |
STAT4 vs IL1R2 | -0.85268 | 0.031899 | 0.426269 | 0.195622 | 0.928859 |
AP3B1 vs BTN3A3 | 0.525508 | 0.040716 | 1.691318 | 1.022447 | 2.797755 |
MID1 vs THBS1 | -1.69541 | 0.069621 | 0.183524 | 0.029397 | 1.145736 |
IFNGR1 vs CASP6 | 0.483861 | 0.250733 | 1.622325 | 0.710478 | 3.704465 |
MSR1 vs CXCL11 | -0.33671 | 0.265347 | 0.714119 | 0.394875 | 1.291463 |
BTN3A2 vs IRF2 | -0.68764 | 0.281319 | 0.502761 | 0.143903 | 1.756516 |
IL1B vs CXCR3 | 0.37003 | 0.284697 | 1.447778 | 0.735004 | 2.851768 |
SNX27 vs CXCL11 | 0.364458 | 0.388464 | 1.439734 | 0.628867 | 3.296136 |
CASP7 vs CXCL11 | 0.274334 | 0.509042 | 1.315654 | 0.582788 | 2.970113 |
BTN3A3 vs TPST1 | -0.00035 | 0.998939 | 0.999646 | 0.59306 | 1.684976 |
Note Coef: coefficient by LASSO analysis. HR: Hazard Ratio. Low.95CI: low 95% confidence interval (CI). High.95CI: high 95% CI. |
The evaluation and validation of IRGP signatures for survival prediction
Based on the above 17 IRGPs, we constructed a prognostic risk model for OV patients. Since the overall survival time of patients was distributed over more than 2 years (Supplementary Fig. 3), the predictive effect of this model on datasets for 1, 3, and 5 years was evaluated. Next, the IRGPs were adopted for the calculation of the risk score for the respective case in the TCGA training group. The average AUC of the training set was 0.869, and the average AUC of the validation set was 0.712 (Supplementary Table 3). In addition, the average AUC of all TCGA datasets was 0.778, and the average AUC of the independent testing set was 0.73 (Fig. 2A-D). By dividing OV patients into low- (Risk-L) and high-risk groups (Risk-H) based on the median risk score, we observed that the Risk-H group exhibited noticeably poorer prognosis than the Risk-L group on the training set, validation set, all TCGA set and independent GEO testing set (log-rank test p-value < 0.05, Fig. 3A-D).
Functional analysis of immune-related gene pairs
In our signature, the 17 IRGPs contained a total of 29 immune genes. These genes were significantly enriched in the “caspase” and “interferon receptor” families (p-value < 0.01, Table 3). The GO annotation results of 29 genes suggested that 4 genes were significantly enriched in interleukin-1 secretion and inflammatory bowel disease biological processes (Fig. 4A). Seven of the 17 IRGPs showed a negative correlation with risk scores, and ten showed a positive correlation (Fig. 4B, Table 4). Here, IRGPs reflected the level of relative expression between two genes. We found that genes related to immune activation/response (P2RY14, CXCL11, CXCR3, MST1R, and NOD2) showed relatively low expression levels (Supplementary Table 4). The relatively highly expressed genes (CASP6, CASP7, IL1B, THBS1, and TPST1) were mainly related to the apoptotic process and inflammation response (Supplementary Table 5). Further enrichment analysis of DEGs in the high- and low-risk groups was performed (Supplementary Table 6). The results suggested that the immune response-related pathways (Toll-like receptor and chemokine signal pathway, and others) were significantly negatively correlated with risk scores; however, the pathways such as p53 signaling pathway and apoptosis had a positive correlation with the risk scores (Fig. 4C), which seems to indicate that samples from the low-risk group may have higher immune activity (or immune activation status).
Table 3
The gene family enrichment results of 29 immune-related genes
Gene family | Genes | p-value | FDR |
Caspases | CASP7/CASP6 | 0.000168 | 0.005042 |
Interferon receptors | IFNGR1 | 0.007737 | 0.232112 |
P2Y receptors | P2RY14 | 0.011584 | 0.347516 |
V-set domain containing | BTN3A2/BTN3A3 | 0.019208 | 0.576234 |
ATP binding cassette subfamily A | ABCA1 | 0.019234 | 0.577029 |
Clathrin/coatomer adaptor, adaptin-like, N-terminal domain containing | AP3B1 | 0.019234 | 0.577029 |
Zinc fingers RANBP2-type | SHARPIN | 0.028087 | 0.842623 |
NLR family | NOD2 | 0.033112 | 0.993349 |
Scavenger receptors | MSR1 | 0.035614 | 1 |
Sorting nexins | SNX27 | 0.038111 | 1 |
Sulfotransferases, membrane bound | TPST1 | 0.048034 | 1 |
Receptor tyrosine kinases | MST1R | 0.05173 | 1 |
Phospholipases | PLA2G4A | 0.054186 | 1 |
Interleukin receptors | IL1R2 | 0.054186 | 1 |
Interleukins | IL1B | 0.055412 | 1 |
Forkhead boxes | FOXA2 | 0.055412 | 1 |
Chemokine ligands | CXCL14 | 0.057858 | 1 |
CD molecules | LILRA2/CXCR3 | 0.092531 | 1 |
Tripartite motif containing | MID1 | 0.117076 | 1 |
SH2 domain containing | STAT4 | 0.123936 | 1 |
Endogenous ligands | CXCL11 | 0.258581 | 1 |
Ring finger proteins | SCAF11 | 0.327935 | 1 |
unknown | IRF2:RCAN1:THBS1:TSC22D3 | 1 | 1 |
Table 4
Analysis of correlation between gene pair values and risk scores
IRGPs | p-value | Pearson correlation | Type |
TSC22D3_vs_CXCL11 | 1.08E-27 | 0.54906008 | Positive |
CASP7_vs_CXCL11 | 4.08E-23 | 0.50599649 | Positive |
SNX27_vs_CXCL11 | 1.42E-19 | 0.467906634 | Positive |
AP3B1_vs_BTN3A3 | 5.90E-16 | 0.42334027 | Positive |
IL1B_vs_CXCR3 | 6.39E-14 | 0.39504485 | Positive |
MSR1_vs_CXCL11 | 1.10E-12 | 0.37640839 | Positive |
LILRA2_vs_P2RY14 | 2.03E-12 | 0.372271702 | Positive |
ABCA1_vs_MST1R | 9.10E-10 | 0.32707845 | Positive |
CXCL14_vs_SHARPIN | 9.39E-07 | 0.264532336 | Positive |
IFNGR1_vs_CASP6 | 2.01E-03 | 0.168462127 | Positive |
PLA2G4A_vs_SCAF11 | 2.69E-04 | -0.198123738 | Negative |
MID1_vs_THBS1 | 1.16E-08 | -0.305853734 | Negative |
FOXA2_vs_RCAN1 | 3.98E-11 | -0.3511514 | Negative |
BTN3A3_vs_TPST1 | 3.63E-11 | -0.35182302 | Negative |
STAT4_vs_IL1R2 | 1.72E-11 | -0.357242428 | Negative |
BTN3A2_vs_IRF2 | 1.73E-12 | -0.37335433 | Negative |
NOD2_vs_LILRA2 | 1.09E-12 | -0.376498688 | Negative |
Figures Legends |
Relationship between prognostic risk signature and clinical features
Using clinical information such as age, tumor stage, and grade from the TCGA database, we analyzed the relationship between the 17-IRGP risk signature and clinical characteristics. Here, samples from the tumor stage II, III and IV groups showed significant differences in risk scores (Fig. 5A), but no significant difference was observed for prognosis of the high- and low-risk group samples in stage II (p-value > 0.05). However, significant differences were shown in stages III and IV, indicating that the model may be more suitable for stage III/IV OV patients (Fig. 5B-D). For tumor grades, G1/G2 and G3/G4 samples had no significant difference in risk scores (Fig. 5E). However, the prognosis of the high- and low-risk group samples showed significant differences in G1/G2 and G3/G4 (Fig. 5F-G). We also did not observe a significant correlation between age and risk scores (Fig. 5H).
The performance of prognostic risk signature in OV subtypes
The TCGA project revealed that surviving gene expression characteristics can predict clinical outcomes and divide OV patients into four transcription subtypes, including differentiated, immunoreactive, mesenchymal, and proliferative (27). We next compared the prognostic performance of our model on these four molecular subtypes. Low- and high-risk groups of the four subtypes were identified to have significant prognostic differences (Fig. 6A-D). In addition, we also found the best prognosis in immunoreactive subtype, Risk-L samples, while the worst prognosis mesenchymal subtype, Risk-H samples. Moreover, OV was divided into four immune subtypes (C1-C4) (28) based on immune molecular tags. We further compared the model's performance on different immune subtypes (the C3 immune subtype had only 3 samples and was not added to the analysis). Among the above three immune subtypes, there were also different survival outcomes between the high- and low-risk groups in both the C1 and C2 immune subtypes (Fig. 6E-G).
Comparison of our prognostic risk signature with other models
Finally, using the same method, we evaluated the AUC values of four existing OV prognostic models at 1, 3, and 5 years. The average AUC values of these 4 models were all < 0.7, which were lower than the AUC of our 17-IRGPs signature, indicating that our model has better prediction performance. Among the 4 models, only the Risk-H and Risk-L groups calculated by the 5-gene signature model have no significant difference in prognosis, and other 3 models showed significant differences in prognosis (Fig. 7A-D). Based on the C-index of above five prognostic models, the 17-IRGPs model has the largest C-index (Fig. 8A), indicating that the overall performance of our model was better than the other four models. The RMST curves of the five models also show significant differences. The risk scores of these models have a very significant relationship with the prognosis (HR > 1, p-value < 0.0001), but we see that the RMST cure of 17-IRGPs was better than the other four models, which has a steeper slope (Fig. 8B), indicating that our model can better evaluate the survival rate in OV.
Moreover, we examined using GSE14764 and GSE26712, separately. As shown in Supplementary Fig. 4, we first used the same method in the GSE14764 dataset to evaluate the AUC values of four existing OV prognostic models. We can see that AUC values of all four existing OV prognostic models are very small. The results of most K-M curves are not statistically significant. Similar results can still be revealed in GSE26712 dataset (Supplementary Fig. 5). Therefore, combining the above analysis results, we can find that in TCGA, GSE14764, or GSE26712 dataset, the AUC values of four existing OV prognostic models are very small, indicating that our 17-IRGPs model can better evaluate the survival rate in OV (Supplementary Fig. 6).