Identification of marker genes for TEC by single-cell transcriptome analysis
Based on single-cell sequencing samples from 10 lung adenocarcinoma patients, cells obtained after quality control (QC) were used for subsequent analysis (Supplementary Fig. 1A). The first 2,000 variable genes were then screened for principal component analysis to reduce dimensionality, and cluster analysis was performed using 30 PCs to cluster all cells into 26 cell clusters (Supplementary Fig. 1B). Cells in cluster 19 were annotated as TEC based on the endothelial cell classical marker gene (PECAM1) and the universal TEC marker genes (ACKR1, PLVAP, IGFBP3) obtained from the literature (Zeng et al. 2023) (Fig. 2B). The other clusters of cells were annotated as T cell, B cell, NK cell, DC, monocyte, macrophage, epithelial cell, and fibroblast with the assistance of "singleR" (Fig. 2A). The heatmap shows each cell type's top five marker genes (Fig. 2C). Subsequently, the TEC marker genes were screened using the "FindALLMarkers" function using the Wilcoxon test with the thresholds of logfc.threshold = 1, p_val_adj < 0.05. Finally, the distribution of each cell type among different samples was shown in a stacked histogram (Fig. 2D).
Trajectory analysis of TEC and intercellular communication analysis
To further analyze the developmental process of the TEC subpopulation, we performed trajectory analysis of the TEC subpopulation using the R package Monocle2, which showed that the TEC subpopulation was roughly classified into seven differentiation states (Fig. 2F), and Fig. 2E shows the timeline of cellular differentiation, with the darker color indicating the earlier developmental stage. To explore the interactions between TEC and other cell types in the TME and to reveal the role of TEC in the tumor microenvironment. We used the R package "Cellchat" to analyze the intercellular communication of TEC, and the results showed that there were high intensity and number of interactions between TEC and other types of cells in the TME (Fig. 3A-B). When TEC acted as a signaling source, it generated a wide range of ligand-receptor interactions with various types of cells in the TME (Fig. 3C). When TEC acted as signaling targets, it was mainly able to receive VEGF from epithelial cells, fibroblasts, and myeloid cells and CXCL from epithelial cells and myeloid cells (Fig. 3D-F). The extensive interactions between TEC and various types of cells in the TME may be related to the pro-angiogenic and immunomodulatory effects of TEC.
GO, KEGG functional enrichment analysis of TEC marker genes
GO, KEGG functional enrichment analysis was performed based on 254 TEC marker genes, and the results showed that (Supplementary Fig. 1C) the TEC marker genes were mainly associated with pathways such as angiogenesis, regulation of angiogenesis, endothelial cell migration, extracellular matrix structural constituent, growth factor binding, PI3K-AKT signaling pathway, ECM − receptor interaction and Leukocyte transendothelial migration.
Construction of prognostic model based on TEC marker genes
To investigate TEC's effect on the prognosis of LUAD patients further, we constructed a prognostic model based on 254 TEC marker genes obtained from single-cell analysis. We used the TCGA-LUAD cohort as a training set to construct the prognostic model, and 47 genes associated with the prognosis of LUAD patients were obtained by univariate Cox regression analysis (Fig. 4A). Next, LASSO + stepcox regression analysis was used to construct the prognostic model (Fig. 4B-C). Nine model genes were finally screened: COL4A1,TM4SF1,TSPAN7,EFNB2,LPAR6,SH3BP5,CCDC85B,DPYSL2,RAB11A (Fig. 4D). The risk score was calculated as follows: risk score = (0.291 × COL4A1 expression) + (0.179 × TM4SF1 expression) + (-0.11 × TSPAN7 expression) + (0.213 × EFNB2 expression) + (-0.385 × LPAR6 expression) + (-0.208 × SH3BP5 expression) + (0.243 × CCDC85B expression) + (-0.229 × DPYSL2 expression) + (0.405 × RAB11A expression). Risk scores were sorted from high to low, and patients were categorized into high-risk (n = 250) and low-risk (n = 250) groups according to the median risk score. To explore the prognostic differences between the high and low-risk groups and to assess the model's accuracy in predicting patient prognosis. We performed survival analysis and ROC curve analysis in the training and test sets. The results showed that in the TCGA-LUAD training set, the prognosis of patients in the high-risk group was significantly worse than those in the low-risk group (P < 0.0001). The 1-year, 3-year, and 5-year AUC values of the area over time in the ROC curves were 0.73, 0.7, and 0.69, respectively (Fig. 4E). The significantly worse prognosis of patients in the high-risk group than those in the low-risk group was observed in all GEO test sets, and the GEO cohort ROC curves all had high AUC values (Fig. 4F-H). These results indicate that the prognostic model constructed based on TEC marker genes has high accuracy in predicting the prognosis of LUAD patients in both the training and validation sets.
Construction and validation of nomogram combining clinicopathologic features
To improve the clinical prediction effectiveness of the model, we constructed a nomogram by combining the clinicopathologic features in RiskScore and TCGA-LUAD (Fig. 5A). The calibration curves' results showed that the nomogram's predictions agreed with the actual observations (Fig. 5B). Decision curve analysis (DCA) showed that the nomogram and RiskScore had better net clinical benefits than other clinicopathologic features (Fig. 5C). In addition, Time-ROC analysis in the TCGA-LUAD cohort confirmed that the AUC values of nomogram and RiskScore exceeded the AUC values of other clinicopathologic features (Fig. 5D). These results suggest that the nomogram constructed based on RiskScore is a reliable and accurate tool for predicting the prognosis of patients with LUAD. To determine whether RiskScore was an independent prognostic factor for predicting OS in patients with LUAD, we performed univariate and multivariate Cox regression analyses of RiskScore and clinicopathologic characteristics in the TCGA-LUAD cohort. The results of univariate Cox regression analysis showed (Fig. 5E) that RiskScore was a significant risk prognostic factor for OS (HR > 1, p < 0.001). The results of multivariate Cox regression analysis showed (Fig. 5F) that RiskScore remained a significant risk prognostic factor for OS (HR > 1, p < 0.001). The results of these analyses suggest that RiskScore is an independent risk factor for LUAD patients and accurately predicts their prognosis.
Clinicopathologic analysis of prognostic characteristics of LUAD patients
To explore the differences in the expression of modeling genes between high and low-risk groups, we plotted heat maps containing clinicopathologic factors (Fig. 6A). We found significant differences in the expression of modeling genes between high and low-risk groups. In addition, we assessed the correlation between RiskScore and various clinicopathologic features. In the TCGA-LUAD cohort, we observed that patients with later stages of staging, T-staging, and N-staging had higher risk scores (Fig. 6B). By survival analysis, we found that the RiskScore showed good predictive ability in subgroups with different clinicopathologic characteristics, including age, gender, stage, T, N, and M stage (Fig. 6C-M).
Potential molecular mechanisms underlying risk scores in the Bulk transcriptome
To further explore the molecular mechanisms underlying the association between risk scores and the prognosis of LUAD patients, we performed a functional enrichment analysis. In the GSVA enrichment analysis based on the HALLMARK gene set, we observed that the high-risk group showed higher activity in the pathways related to G2M_CHECKPOINT, E2F_TARGETS, MYC_TARGETS_V2, GLYCOLYSIS, and PI3K_AKT_MTOR_SIGNALING. In comparison, the low-risk group showed higher activity in pathways associated with IL6_JAK_STAT3_SIGNALIN, INTERFERON_GAMMA_RESPONSE, and BILE_ACID_METABOLISM (Fig. 7A). Subsequently, in the GSEA enrichment analysis based on the KEGG gene set, we observed that the high-risk group was highly enriched for pathways such as Cell Cycle, DNA Replication, P53 Signaling Pathway, Ecm-Receptor Interaction, and Ribosome (Fig. 7E). In contrast, the low-risk group was highly enriched for the Intestinal Immune Network For Iga Production, Primary Immunodeficiency, Th1 And Th2 Cell Differentiation, Cytokine- Cytokine Receptor Interaction, B Cell Receptor Signaling Pathway, and other immune-related pathways (Fig. 7F). In addition, to investigate further the potential molecular mechanisms affecting the prognostic differences between the high- and low-risk groups, we performed GO and KEGG enrichment analysis of the differential genes between the two groups. The results of the KEGG enrichment analysis showed that the up-regulated genes in the high-risk group were mainly significantly enriched in Cell cycle, ECM-receptor interaction, PI3K-Akt signaling pathway, HIF-1 signaling pathway, etc. In contrast, the down-regulated genes were mainly enriched in the Intestinal immune network for IgA production, Antigen processing and presentation, Cytokine-cytokine receptor interaction, etc. (Fig. 7B). The GO enrichment results showed that the up-regulated genes in the high-risk group were mainly enriched in the following pathways: organelle fission, nuclear division, and cytokine receptor interaction. Organelle fission, nuclear division, and other processes (Fig. 7C), and down-regulated genes were mainly enriched in the regulation of T cell activation, leukocyte mediated immunity, and other processes (Fig. 7D). The above enrichment analyses showed that patients in the high-risk group were significantly enriched in carcinogenesis-related pathways, while those in the low-risk group were mainly significantly enriched in immunity-related pathways, indicating that the risk scores were closely related to cancer-related biological processes and metabolic pathways.
Genomic mutation landscapes in different risk subgroups
To investigate genomic mutation differences between high and low-risk groups, we mapped the mutation waterfalls of the high and low-risk groups (Fig. 8A-B). We found different mutation profiles between the two groups. The mutation frequency of the top twenty mutated genes was higher in the high-risk group than in the low-risk group. Subsequently, we compared the differences in TMB between the high and low-risk groups and found that the TMB in the high-risk group was higher than that in the low-risk group (Fig. 8C), and the KM curve showed that patients with higher TMB and high-risk scores had the worst prognosis (Fig. 8D). In addition, we analyzed the CNV of nine modeled genes (Fig. 8E). The results showed that CNV gain occurred mainly in COL4A1, EFNB2, TM4SF1, CCDC85B, RAB11A, and SH3BP5, while CNV loss occurred primarily in LPAR6, TSPAN7, and DPYSL2. Finally, we showed the distribution of the nine modeled genes on the chromosomes (Fig. 8F).
Immune landscape in different risk subgroups
Malignant tumors are not just an accumulation of tumor cells but also constitute a microenvironment containing endothelial cells, fibroblasts, structural components, and infiltrating immune cells that influence tumor progression, invasion, metastasis, and outcome(Bremnes et al. 2016). To assess the immune infiltration status of LUAD patients in the high and low-risk groups, we calculated the ImmuneScore, StromaScore, ESTIMATE Score, and TumorPurity using the ESTIMATE algorithm for patients in the high and low-risk groups. The results showed that the ImmuneScore and StromalScore were significantly higher in the low-risk group. At the same time, the TumorPurity was significantly higher in the high-risk group than in the low-risk group (Fig. 9B-E), which indicated that the overall immune level and immunogenicity of the TME in the low-risk group were higher. Subsequently, we quantified the abundance of immune cell infiltration in each LUAD patient using six commonly used immune cell infiltration algorithms (Fig. 9A). The results of multiple algorithms showed that the low-risk group was enriched with higher levels of CD8 + T cell, B cell, DC, and many other immune cells. Similar results were obtained by applying the ssGSEA algorithm for validation (Fig. 9F). In addition, the scores of immune-related functions were obtained in this study using the ssGSEA algorithm, and it was found that the low-risk group exhibited higher activities in APC_co_stimulation, Cytolytic_activity, HLA, and Type_II_IFN_Reponse (Fig. 9G). These results suggest that the poorer prognosis of patients in the high- and low-risk groups may be related to the level of immune infiltration.TEC may affect patients' prognosis by influencing the tumor microenvironment of LUAD. Next, we compared the distribution of TME typing of lung adenocarcinoma that had been reported in the high-risk and low-risk groups. The results showed that there were significant differences in the TME phenotypes of different risk groups (Fig. 9H), in which most of the patients in the low-risk group belonged to the IE and IE/F phenotypes. In contrast, most of the patients in the high-risk group belonged to the F and D phenotypes. For an anticancer immune response to effectively kill cancer cells, a series of stepwise events must be initiated and allowed to proceed and expand iteratively, steps known as the tumor-immune cycle(Chen et al. 2013). We compared the differences in activity between the high-risk and low-risk groups at each step of the immune cycle. Differences were found between the two risk groups in step 1, step 2, step 4, step 5, and step 6. In addition, "step 4-trafficking of immune cells to tumors" was further refined to analyze the recruitment ability of different immune cells in the high and low-risk subgroups. The results showed that the low-risk group was more able to recruit immune cells such as T cells, CD4 + T cells, CD8 + T cells, dendritic cells, and neutrophils (Fig. 9I). Antigen presentation is an important process by which immune cells recognize tumor cells for killing(Jhunjhunwala et al. 2021). We analyzed the differences in the expression of antigen-presenting genes between the two risk groups, and the results showed that the expression of antigen-presenting genes was significantly higher in low-risk groups than in the high-risk group (Fig. 9J). Immune checkpoints play an important role in the process of tumor immunotherapy. Forty-seven immune checkpoint-related genes were obtained from the literature(Danilova et al. 2019), and we analyzed the differences in their expression between the two groups. Twenty-three immune checkpoint genes were found to be significantly upregulated in the low-risk group. A significantly higher expression of the immune checkpoint genes CD276 and TNFSF4 was observed in the high-risk group (Fig. 9K). Higher T-cell dysfunction and rejection (TIDE) prediction scores indicate a higher likelihood of immune escape and a lower likelihood that patients will benefit from immunotherapy. We obtained TIDE scores for TCGA-LUAD patients through the TIDE website and compared the differences in TIDE scores between high and low-risk groups. The results showed (Fig. 9L-O) that the high-risk group had higher TIDE scores and T-cell rejection scores. These analyses suggest that tumor cells in patients in the high-risk group may exhibit more significant immune escape characteristics than those in the low-risk group.
Prediction of immunotherapy efficacy in different risk subgroups
In order to predict the response of patients in high and low-risk groups to immunotherapy, we collected three immunotherapy cohorts: including the IMvigor210 cohort, the GSE78220 cohort, and the immunotherapy cohort of melanoma patients obtained from the article by Whijae Roh et al. First, we performed SubMap analysis (Hoshida et al. 2007) based on the immunotherapy cohort of melanoma patients obtained from the literature, with lower P-values indicating higher similarity in immunotherapy outcomes. The results showed that patients in the low-risk group were sensitive to the effects of PD-1 inhibitors, whereas patients in the high-risk group were insensitive to PD-1 and CTLA4 inhibitors (Fig. 10A). Analysis of the TCGA-LUAD immunotherapy cohort obtained from the TIDE website showed a higher proportion of immunotherapy-nonresponsive patients in the high-risk group (Fig. 10B) and a significantly higher risk score for immunotherapy nonresponsive patients than for immunotherapy-responsive patients (Fig. 10C). Subsequently, we also analyzed the immunotherapy response status of patients in the high and low-risk groups in the IMvigor210 and GSE78220 cohorts.348 patients in the IMvigor210 cohort showed different responses to anti-PD-L1 receptor blockers, including stable disease (SD), partial remission (PR), complete remission (CR), and disease progression (PD)(Mariathasan et al. 2018). We found that the prognosis of patients in the high-risk group was significantly worse than that of patients in the low-risk group (Fig. 10D), and the percentage of patients with SD/PD in the high-risk group was higher than that in the low-risk group (Fig. 10E). The patients with SD/PD also possessed a higher risk score (Fig. 10F). The same results were observed in the GSE78220 cohort, where patients in the high-risk group had a significantly worse prognosis than patients in the low-risk group (Fig. 10G). The percentage of patients with SD/PD was higher in the high-risk group than in the low-risk group (Fig. 10H), and patients with SD/PD also possessed higher risk scores (Fig. 10I). In addition, to confirm our predictive results, we analyzed the IPS scores obtained from the TCIA database. Higher IPS scores, indicating better response to ICI therapy, which includes both PD-1 inhibitor and CTLA4 inhibitor therapy, were categorized into four categories:(1) ips_ctla4_pos_pd1_pos (CTLA4+/PD1 + treatment), (2) ips_ctla4_pos_pd1_neg (CTLA4+/PD1- treatment), (3) ips_ctla4_neg_pd1_pos (CTLA4-/PD1 + treatment), and (4) ips_ctla4_neg_pd1_neg (CTLA4-/PD1- treatment). Our analysis showed that the IPS scores of patients in the low-risk group were all higher than those in the high-risk group (Supplementary Fig. 2A-D), suggesting that patients in the low-risk group may be expected to have better immunotherapy outcomes.
Drug sensitivity analysis in different risk subgroups
To explore chemotherapeutic drugs for patients in high- and low-risk groups, we quantified the sensitivity of each TCGA-LUAD patient to 198 chemotherapeutic drugs using the "oncopredict" algorithm. Chemotherapeutic drugs that differed in sensitivity between high and low-risk groups were compared and screened (Supplementary Fig. 2E). Subsequently, we screened four chemotherapeutic drugs that were highly sensitive for patients in the high-risk group (Supplementary Fig. 2F-I), including AZD6738 (a potent, selective, orally bioavailable inhibitor of ataxia telangiectasia and Rad3-associated ATR kinase)(Wilson et al. 2022). Dasatinib (a multi-target protein tyrosine kinase inhibitor)(Liu et al. 2021). Lapatinib (a dual tyrosine kinase inhibitor)(Maeda et al. 2022). Docetaxel (a chemotherapeutic agent that acts by polymerizing microtubule proteins and inducing cell cycle arrest)(Gupta et al. 2023). These drugs may serve as potential therapeutic options for inhibiting the progression of cancer to malignancy, with the potential to improve the poor prognosis of patients in high-risk groups.
Identification of core genes and experimental validation of expression
To identify the core genes that play a vital role in the high-risk and low-risk groups. We analyzed the expression levels of nine modeled genes in tumors and normal tissues and divided patients into high and low-expression groups based on the median gene expression to explore their impact on the prognosis of LUAD patients. Four core genes were identified by differential analysis and survival analysis. The results of the differential analysis showed that the expression of four genes, CCDC85B (Fig. 11A), DPYSL2 (Fig. 11B), LPAR6 (Fig. 11C), and SH3BP5 (Fig. 11D), differed significantly in tumor and normal tissues. The results of survival analysis showed that LUAD patients with high expression of CCDC85B had a poorer prognosis (Fig. 11E), and LUAD patients with low expression of DPYSL2 (Fig. 11F), LPAR6 (Fig. 11G), and SH3BP5 (Fig. 11H) had a poorer prognosis. Finally, we verified the expression of the four core genes in three cell lines (normal cell line BEAS-2B, lung adenocarcinoma cell lines A549 and H1299) by RT-qPCR (Fig. 11I), and the experimental results were in agreement with the results of bioinformatic analysis.