Integrated analysis of gene expression profiles reveals prognostic biomarkers for immunotherapy in cancer

Tumor immunotherapy has emerged as a promising method in cancer treatment, but patient responses vary, necessitating personalized strategies and prognostic biomarkers. This study aimed to identify prognostic factors and construct a predictive model for patient survival outcomes and immunotherapy response. We curated six immunotherapy datasets representing diverse cancer types and treatment regimens. After data preprocessing, patients were stratified based on immunotherapy response. Differential gene expression analysis identified 22 genes consistently dysregulated across multiple datasets.


Abstract
Tumor immunotherapy has emerged as a promising method in cancer treatment, but patient responses vary, necessitating personalized strategies and prognostic biomarkers.This study aimed to identify prognostic factors and construct a predictive model for patient survival outcomes and immunotherapy response.
We curated six immunotherapy datasets representing diverse cancer types and treatment regimens.After data preprocessing, patients were stratified based on immunotherapy response.Differential gene expression analysis identified 22 genes consistently dysregulated across multiple datasets.
Functional analysis provided critical insights, highlighting the enrichment of these dysregulated genes in immune response pathways and tumor microenvironment-related processes.To create a robust prognostic model, we meticulously employed a multistep approach.Initially, the identified 22 genes underwent rigorous univariate Cox regression analysis to evaluate their individual associations with patient survival outcomes.Genes showing statistical significance (p-values < 0.05) at this stage advanced to the subsequent multivariate Cox regression analysis, which aimed to address potential confounding factors and collinearity among genes.From this analysis, we ultimately identified four key genes-ST6GALNAC2, SNORA65, MFAP2, and CDKN2Bthat were significantly associated with patient survival outcomes.
Incorporating these four key genes along with their corresponding coefficients, we constructed a predictive model.This model's efficacy was validated through extensive Cox regression analyses, demonstrating its robustness in predicting patient survival outcomes.Furthermore, our model exhibited promising predictive capability for immunotherapy response, providing a potential tool for anticipating treatment efficacy.
These findings provide insights into immunotherapy response mechanisms and suggest potential prognostic biomarkers for personalized treatment.Our study contributes to advancing cancer immunotherapy and personalized medicine.

Introduction
Tumor immunotherapy has emerged as a promising therapeutic method for various cancer types, activating the power of the immune system to eliminate cancer cells [1].While some patients exhibit remarkable responses to immunotherapy, others experience limited or no benefit, highlighting the urgent need for personalized treatment strategies and prognostic biomarkers [2].Understanding the molecular basis underlying immunotherapy response and resistance is vital for optimizing patient outcomes and advancing the field of cancer immunotherapy.
In recent years, high-throughput technologies, such as RNA sequencing, have enabled comprehensive analysis of gene expression profiles in tumor tissues, providing valuable insights into the intricate interplay between tumor microenvironment and immune system [3].This deeper understanding of the molecular dynamics within tumors and their interaction with the immune system has paved the way for more precise and effective treatment strategies.
The exploration of key prognostic factors in tumor immunotherapy holds immense clinical significance [4].These factors serve as molecular indicators that can guide treatment decisions, enabling oncologists to tailor therapies to individual patients [5].By identifying genes that are consistently dysregulated across diverse cancer types and treatment regimens, our study aims to provide clinicians with valuable insights into patient prognosis and potential response to immunotherapy.This knowledge can assist in selecting the most appropriate treatment options for each patient, thereby optimizing treatment outcomes while minimizing unnecessary side effects.
Our study's endeavor to construct a prognostic model based on four key genes (ST6GALNAC2, SNORA65, MFAP2, and CDKN2B) further underscores the clinical relevance of exploring these factors.These genes, implicated in various cellular processes, including immune modulation and tumor microenvironment interaction, hold the potential to guide treatment decisions by identifying patients who benefit most from immunotherapy.Such precision in treatment selection not only enhances patient care but also contributes to resource optimization within healthcare systems.
In summary, the clinical significance of exploring key prognostic factors in tumor immunotherapy lies in its potential to transform the landscape of cancer treatment.By deciphering the molecular intricacies of immunotherapy response and integrating this knowledge into prognostic models, we pave the way for personalized therapeutic approaches that offer improved outcomes and quality of life for cancer patients.

Data collection
The counts data and clinical information for the six immunotherapy datasets (MEL_GSE78220, MEL_GSE91061, MEL_Nathanson_2017, MEL_PRJEB23709, GBM_PRJNA482620, and STAD_PRJEB25780) were retrieved from the Gene Expression Omnibus (GEO) and ArrayExpress database.The counts data were converted into the Transcript per Million (TPM) format to compare gene expression levels.

Grouping samples based on immunotherapy response and identification of differentially expressed genes
Samples from the six datasets were subset into two groups based on their immunotherapy response status.The response group includes samples showing either complete response (CR) or partial response (PR) to immunotherapy.The non-response group includes samples showing either stable disease (SD) or disease progression (PD) in response to immunotherapy.
To identify genes with varying expression patterns between the immunotherapy response group and the immunotherapy non-response group, the Limma package [6] was employed.The following criteria were used for DEG selection: Log-fold change (logFC) > 0.5 and p-value < 0.05.The DEGs obtained using Limma were categorized as either upregulated or downregulated based on their logFC values.

Robust Rank Aggregation analysis
To further identify genes that consistently play a role across multiple datasets, we utilized the Robust Rank Aggregation (RRA) algorithm.RRA is a powerful method used to combine ranked gene lists from different datasets, enabling the identification of robust and reproducible differentially expressed genes (DEGs) across multiple experiments [7].The RRA analysis was performed using the upregulated and downregulated gene lists from each of the six datasets as input.The algorithm assigns ranks to each gene based on their differential expression significance and then aggregates the ranks across datasets to obtain a unified rank that reflects the overall significance of each gene across all datasets.Genes with a p-value less than 0.05 were considered significant after RRA, indicating that they consistently exhibited differential expression across the multiple datasets.

Gene Ontology and KEGG pathway enrichment analysis
To delve into the relevant biological processes and signaling networks involved in the immunotherapy response, we conducted Gene Ontology (GO) and KEGG pathway enrichment analysis using the genes obtained from the RRA analysis.The enrichment analysis was conducted using the clusterProfile R package [8] with a significance threshold of p-value < 0.05.

Prognostic model construction and evaluation
The MEL_GSE91061 dataset, containing survival outcomes and immunotherapy response status, served as the training set for constructing the model.The RRA-selected gene set underwent univariate Cox regression analysis to evaluate the association between gene expression and outcomes.Genes with a p-value less than 0.05 in the univariate Cox regression were considered significantly associated with survival and progressed to the next analysis.The genes identified from the univariate Cox regression were further assessed using multivariate Cox regression analysis.Multivariate Cox regression was performed to control for potential confounding factors and eliminate genes showing collinearity.The genes obtained from the multivariate Cox regression analysis were utilized to construct the prognostic model.The combination of these genes and their corresponding coefficients formed the basis for the prognostic equation.The survival analysis and ROC curve analysis were used to assess the model's predictive performance.

External validation
The MEL_PRJEB23709 dataset was used for external validation of the prognostic model.Based on the prognostic model developed using the MEL_GSE91061 dataset, the risk score was computed for each sample in the MEL_PRJEB23709 dataset, the samples in the MEL_PRJEB23709 dataset were stratified into different risk groups based on predetermined risk score thresholds.To assess the external predictive capability of the constructed model, survival analysis was conducted on the stratified risk groups.To gain a comprehensive view of the differential gene expression results across the datasets, we constructed UpSet plots for upregulated and downregulated genes (Figures 1A and 1B).RRA analysis of the differential gene expression results identified a subset of genes with consistent expression patterns across datasets.The RRA analysis yielded 12 upregulated genes (LINC00221, CD8A, IFNG, CXCL10, CXCL9, CD274, GBP5, LINC00945, FAM151A, CCL5, CDKN2B, TFAP2A-AS1) and 10 downregulated genes (FBLN1, ZNF256, MIR6826, ALDH1L2, DACT1, ST6GALNAC2, SNORA65, DEFB124, MFAP2, IL13RA2), which showed robust differential expression in response to immunotherapy across the six datasets.The heatmap (Figure 1C) visually presents the fold change of the 22 RRA-derived genes across the six datasets, highlighting their potential significance in the context of immunotherapy response.

Expression of the RRA-derived genes
To investigate the expression patterns of the 22 RRA-derived genes in specific datasets, we focused on three datasets: MEL_PRJEB23709 (Figure 2A), MEL_Nathanson_2017 (Figure 2B), and STAD_PRJEB25780 (Figure 2C).The heatmap visualization allows us to explore the expression profiles of genes in relation to patients' immunotherapy response status.
In the heatmap, patients were categorized based on their response to immunotherapy, and the expression levels of RRA-selected upregulated genes were notably higher in the response group.This observation suggests that these genes might suggest a positive immunotherapeutic outcome.

Enrichment analysis of RRA-derived genes
To uncover the biological processes and signaling cascades linked to the 22 genes identified through the RRA method, we conducted Gene Ontology and KEGG pathway enrichment analyses.The GO enrichment analysis revealed that the 22 genes are significantly enriched in cellular response to external stimuli and immune cell chemotaxis, which are consistent with their potential roles in immune response processes (Figure 3A).These findings suggest that the identified genes may play crucial roles in orchestrating immune responses to external stimuli.The KEGG pathway enrichment analysis indicated that the 22 genes are significantly enriched in several important immune-related pathways, including cell-cytokine interactions, toll-like receptor signaling pathway, and antigen presentation processes (Figure 3B).These findings align with the known involvement of these pathways in mediating immune therapeutic responses.
The enrichment analysis results collectively highlight the functional relevance of the 22 RRA-derived genes in immune response processes and provide valuable insights into the potential mechanisms underlying the immunotherapeutic response.

Prognostic model construction and evaluation
Using the MEL_GSE91061 dataset as the training set, a prognosis model was established, taking into account the 22 genes obtained from RRA.Initially, univariate Cox regression analysis was performed on the 22 genes, resulting in the selection of five genes with p-values less than 0.05.Subsequently, these five genes underwent multivariate Cox regression analysis to eliminate collinearity, ultimately leading to the identification of four genes: ST6GALNAC2, SNORA65, MFAP2, and CDKN2B, for the prognostic model (Figure 4A).The coefficients of the four genes are 0.2357769, 0.2793911, 0.2613965, and 0.3594544, respectively.The scoring model was established by multiplying the coefficient of each gene by its corresponding expression level.
Using this scoring model, we calculated the risk scores for each sample in the training set and stratified patients into high-risk and low-risk groups.Survival assessment indicated that those in the high-risk group exhibited significantly inferior survival prospects (Figure 4B).To assess the forecasting potential of the four-gene prognostic model, we generated receiver operating characteristic (ROC) curves.The AUC values for 1-year, 2-year, and 3-year survival predictions were 0.69, 0.71, and 0.79, respectively (Figure 4C).These results suggest that the model exhibits reasonable accuracy in predicting patient survival at different time points.
Additionally, we investigated the association between the prognostic model and immunotherapy response.Interestingly, we discovered that patients categorized as low-risk exhibited a greater rate of positive response to immunotherapy when contrasted with individuals in the high-risk category (Figure 4D).This observation indicates that the constructed prognostic model not only predicts patient survival but also possesses the ability to predict immunotherapy response.

External validation of the prognostic model
To validate the performance of the prognostic model, we utilized the MEL_PRJEB23709 dataset as the external validation set.Leveraging the model constructed using the training set, we derived risk scores for every validation set sample.By applying the cutoff value derived from the training set, we subset the validation set into low-risk and high-risk groups based on the risk scores.
Examination of survival data from the validation set revealed that patients categorized as low-risk had notably superior survival outcomes in comparison to their high-risk counterparts (Figure 5A).These findings show that the prognostic model's predictive capability extends beyond the training set and holds promise for its applicability in independent patient cohorts.Moreover, we assessed the link between the prognostic model and the outcome of immunotherapy in the validation sample.Remarkably, patients classified as low risk exhibited a notably elevated immunotherapy response rate compared to those categorized as high risk (Figure 5B).This validation of the model's predictive performance in an external dataset further supports its robustness and clinical potential.

Discussion
In this study, we aimed to explore key prognostic factors in tumor immunotherapy and construct a prognostic model to predict patient survival outcomes and immunotherapy response.By comprehensively analyzing six publicly available immunotherapy datasets, we identified a set of 12 upregulated and 10 downregulated genes that consistently exhibited differential expression across multiple datasets, suggesting their potential significance in tumor immunotherapy.
Our findings hold several important implications for tumor immunotherapy and personalized treatment strategies.Firstly, the identification of these differentially expressed genes provides valuable insights into the molecular mechanisms underlying immunotherapy response and resistance.The upregulated genes, including CD8A, IFNG, and CD274 (PD-L1), are known to play critical roles in immune response regulation, T cell activation, and immune checkpoint pathways [9][10][11].Their dysregulation may influence tumor immune escape mechanisms, affecting the efficacy of immunotherapy.
Secondly, the constructed prognostic model, incorporating four key genes (ST6GALNAC2, SNORA65, MFAP2, and CDKN2B), demonstrated promising predictive capability for patient survival outcomes.Among these genes, ST6GALNAC2, MFAP2, and SNORA65 were down-regulated, whereas CDKN2B was upregulated in the immunotherapy response group.These genes have been implicated in diverse biological processes, such as cell adhesion, extracellular matrix remodeling, and cell cycle regulation [12][13][14][15].Their inclusion in the model further highlights their potential prognostic significance and suggests their involvement in tumor progression and response to Submit a manuscript: https://www.tmrjournals.com/mdmtreatment.
The enrichment analysis of the identified genes sheds light on their potential roles in immunotherapy response.The GO enrichment analysis revealed their involvement in cellular response to external stimuli and immune cell chemotaxis, which aligns with their biological functions in immune response processes.KEGG pathway analysis highlighted their enrichment in immune-related pathways, such as cell-cytokine interactions, toll-like receptor signaling, and antigen presentation, further supporting their relevance in immunotherapy response.
Moreover, the findings provide new avenues for future research and therapeutic interventions in tumor immunotherapy.For instance, the downregulated genes, including FBLN1 and IL13RA2, may serve as potential targets for enhancing antitumor immune responses and overcoming resistance to immunotherapy [16,17].Targeting these genes may potentially improve patient outcomes and expand the scope of immunotherapy applications in various tumor types.
Furthermore, the integration of multi-omics data, such as proteomics, epigenomics, and immune profiling, could enhance the comprehensive understanding of the tumor microenvironment and immune landscape, leading to the identification of additional biomarkers and potential therapeutic targets.
The prognostic model developed in this study has the potential to find practical applications in clinical settings.Firstly, it could serve as a valuable tool for clinicians in predicting patient outcomes following immunotherapy.By assessing the expression levels of the four key genes (ST6GALNAC2, SNORA65, MFAP2, and CDKN2B) in tumor samples, clinicians can stratify patients into high-risk and low-risk groups, providing insights into expected survival outcomes and helping in personalized treatment planning.Additionally, our model may aid in the selection of patients who are more likely to respond favorably to immunotherapy.Identifying patients who are predisposed to positive responses can lead to more targeted and effective treatment strategies, potentially reducing unnecessary adverse effects and healthcare costs.
However, it is essential to acknowledge some limitations of this study.Although we validated the model using multiple publicly available datasets, further external validation in larger cohorts and diverse populations is warranted to strengthen the robustness and generalizability of the findings.Additionally, the use of transcriptomic data may not fully capture the complexity of tumor-immune interactions, and the integration with other omics data could enhance the comprehensiveness of future studies.
In conclusion, this study identified a set of genes associated with immunotherapy response and patient prognosis and constructed a prognostic model with potential implications for personalized treatment and outcome assessment in tumor immunotherapy.The integrated analysis of gene expression and enrichment pathways offers a comprehensive view of the underlying biological processes and molecular events relevant to immunotherapy response.Nevertheless, further research and clinical validation are necessary to corroborate and extend these findings, paving the way for their implementation in future immunotherapeutic endeavors.

Figure 1
Figure 1 UpSet plots of differentially expressed genes (DEGs) across six immunotherapy datasets.(A) UpSet plot depicting the intersection of upregulated DEGs in the six datasets.(B) UpSet plot showing the intersection of downregulated DEGs in the six datasets.(C) Heatmap representation of the fold change of the 22 genes derived from the Robust Rank Aggregation (RRA) analysis across the six datasets.

Figure 2
Figure 2 Expression profiles of the RRA-derived genes in specific datasets.(A) Heatmap displaying the expression patterns of the 22 RRA-derived genes in the MEL_PRJEB23709 dataset.(B) Heatmap illustrating the expression profiles of the 22 RRA-derived genes in the MEL_Nathanson_2017 dataset.(C) Heatmap showing the expression levels of the 22 RRA-derived genes in the STAD_PRJEB25780 dataset.

Figure 3
Figure 3 Enrichment analysis of the RRA-derived genes.(A) Gene Ontology (GO) enrichment analysis revealed the significant enrichment of the 22 genes in immune response and cellular response to external stimuli.(B) In the context of KEGG pathway enrichment analysis, the 22 genes exhibited significant enrichment in cell-cytokine interactions, toll-like receptor signaling pathway, and antigen presentation processes.

Figure 4
Figure 4 Prognostic model construction and evaluation.(A) Forest plot displaying the hazard ratios of the four key genes (ST6GALNAC2, SNORA65, MFAP2, and CDKN2B) derived from the multivariate Cox regression analysis.(B) Kaplan-Meier survival curve showing the survival difference between low-risk and high-risk groups based on the prognostic model in the training set.(C) Receiver operating characteristic (ROC) curve analysis evaluating the prognostic model's predictive capability for survival outcomes at 1-year, 2-year, and 3-year time points.(D) Immunotherapy response analysis indicating the association between the prognostic model and immunotherapy response.

Figure 5
Figure 5 External validation of the prognostic model.(A) Kaplan-Meier survival curve illustrating the survival difference between high-risk and low-risk groups based on the prognostic model in the MEL_PRJEB23709 dataset.(B) Immunotherapy response analysis demonstrating the association between the prognostic model and immunotherapy response in the MEL_PRJEB23709 dataset.