Glycosylphosphatidylinositol anchor biosynthesis pathway-based biomarker identification with machine learning for prognosis and T cell exhaustion status prediction in breast cancer

As the primary component of anti-tumor immunity, T cells are prone to exhaustion and dysfunction in the tumor microenvironment (TME). A thorough understanding of T cell exhaustion (TEX) in the TME is crucial for effectively addressing TEX in clinical settings and promoting the efficacy of immune checkpoint blockade therapies. In eukaryotes, numerous cell surface proteins are tethered to the plasma membrane via Glycosylphosphatidylinositol (GPI) anchors, which play a crucial role in facilitating the proper translocation of membrane proteins. However, the available evidence is insufficient to support any additional functional involvement of GPI anchors. Here, we investigate the signature of GPI-anchor biosynthesis in the TME of breast cancer (BC)patients, particularly its correlation with TEX. GPI-anchor biosynthesis should be considered as a prognostic risk factor for BC. Patients with high GPI-anchor biosynthesis showed more severe TEX. And the levels of GPI-anchor biosynthesis in exhausted CD8 T cells was higher than normal CD8 T cells, which was not observed between malignant epithelial cells and normal mammary epithelial cells. In addition, we also found that GPI -anchor biosynthesis related genes can be used to diagnose TEX status and predict prognosis in BC patients, both the TEX diagnostic model and the prognostic model showed good AUC values. Finally, we confirmed our findings in cells and clinical samples. Knockdown of PIGU gene expression significantly reduced the proliferation rate of MDA-MB-231 and MCF-7 cell lines. Immunofluorescence results from clinical samples showed reduced aggregation of CD8 T cells in tissues with high expression of GPAA1 and PIGU.


Introduction
Nowadays, breast cancer, as the most prevalent malignant tumor in women, has replaced lung cancer as the cancer with the highest incidence rate worldwide (1).Although numerous advances in early screening, and diagnostic methods, the incidence of BC is increasing yearly, constitutes ~11.7% of all new cancer patients (2).Therefore, comprehending the involvement of diverse oncogenes and risk factors in the initiation and progression of BC assumes significance for early prevention and treatment strategies targeting this disease.
Due to the distinctive immunostimulatory properties exhibited by breast cancer, immunotherapy has gained considerable attention as a promising therapeutic approach for breast cancer (3).In the context of immunotherapeutic therapy, immune cells play a pivotal role as the cellular basis.In certain highly immune-infiltrated tumors, tumor-infiltrating lymphocytes (TILs) can constitute more than 40% of the overall cell population (4).The functional status of T cells is crucial for tumor evasion and immune therapy.However, the benefits of immunotherapy in cancer treatment still remains limited due to T cell functional dysregulation.It is typically characterized by reduced proliferation capacity and impaired effector function of T cells, as well as the excessive expression of various inhibitory receptors (5), which result in a weakened immune response and the escape of tumor cells from immune surveillance (6).CD8 T cell exhaustion, which epitomizes the dysfunctional state of T cells within the TME, is widely recognized as a major obstacle to current anticancer immunotherapy (7).It is imperative to comprehensively investigate the development mechanisms underlying T cell exhaustion in TME and devise novel strategies to avert functional T cell exhaustion in immunotherapeutic approaches.
Glycosylphosphatidylinositol (GPI) anchor biosynthesis is one of the generally exists in eukaryotic protein modification after translation process (8).In simple terms, it provides a "glycosylphosphatidylinositol anchor" for certain intracellular proteins to specifically anchor onto the lipid bilayer (9), thereby enabling their precise localization and engagement in vital biological processes such as cellular signaling, adhesion, and immune responses (10).. Abnormalities in the processes of GPIanchor biosynthesis have been shown to be involved in a variety of diseases, such as paroxysmal nocturnal hemoglobinuria and the syndrome of hyperphosphatemia with mental retardation (11, 12).In addition, more and more studies have found that there are identified abnormalities of GPI-anchor biosynthesis in tumorigenesis and development, including the dysregulated expression of genes involved in GPI biosynthesis and increased synthesis of specific GPI-anchored proteins (GPI-AP) (13).The biosynthesis of GPI-anchored proteins has been found to be closely associated with the regulation of T cell functions in the immune response (14).Studies indicate that GPI-anchored proteins play crucial roles in regulating T cell activation, proliferation, and cell death processes (15).In conditions such as chronic infections, tumor microenvironments, or autoimmune diseases, T cells may undergo functional exhaustion, leading to immune response suppression and reduced therapeutic efficacy (16).These findings suggest that the components of GPI-anchor biosynthesis and the high expression of tumor-associated GPI-AP may play a role in carcinogenesis, providing new ideas for the exploration of tumor markers and targeted therapies.
In the context of breast cancer, the biosynthesis of GPIanchored proteins has also been found to be closely associated with the proliferation, invasion, and metastasis of breast cancer cells (17,18).By intervening in the GPI-anchored biosynthesis pathway, we may potentially impact the survival and invasiveness of breast cancer cells, thereby enhancing treatment efficacy and patient survival rates.In this study, we aimed to explore the link of GPI anchor biosynthesis with T cell exhaustion and tumor prognostic progression in BC.

Collection of raw data
The RNA-seq data and clinical information of patients were obtained from the TCGA and GTEx databases (http://xena.ucsc.edu/).The BC cohort contains 1097 cancer samples and 120 paracancer samples.The pan-cancer dataset contains 10,534 samples covering 33 different types of cancers.All the TCGA files are stored in FPKM standard format.The validation dataset was downloaded from the Metabric database (https://www.cBioPortal.org),which provides RNAseq data and related clinical information of 1918 BC patients.In addition, we downloaded the scRNA-seq profiles of BC from the GEO database (https://www.ncbi.nlm.nih.gov/GSE114727) and selected 11 normal breast single-cell samples and 30 tumor single-cell samples from it for subsequent analysis.

Obtaining the necessary genes for various scoring systems
The Glycosylphosphatidylinositol (GPI) -anchor biosynthesis related genes were obtained from the relevant literature as indicated in Supplementary Table 1 (19).The scoring genes for cytotoxic T lymphocytes (CTL) are sourced from the String database (https:// cn.string-db.org/)(Supplementary Table 2).The chemokine score consists of 41 known chemokines (Supplementary Table 3).All the scores are determined by the expression of scoring genes with the ssGSEA Algorithm (20).The cytolytic activity score (CYT-score), which quantifies the cell-killing function, is based on the geometric mean of gene expressions of GZMA and PRF1 (21).

Pan-cancer prognostic analysis of GPI-anchor biosynthesis score
Package 'GSVA', was employed to calculate GPI-anchor biosynthesis score (GPI-score) for patients in the TCGA pancancer dataset.Differences in GPI-scores between various tumors and normal samples were assessed using the "ggpubr" software package.The association between GPI-scores and prognostic indicators of patients was investigated by univariate Cox regression analysis (uniCox).Prognostic indicators include: overall survival (OS); disease-free interval (DFI); progression-free interval (PFI); and disease-specific survival (DSS) (22).And the Kaplan-Meier method was employed to assess differences in OS across distinct groups (23).

Functional enrichment analysis
The differential gene analysis among various groups was conducted using the "limma" package, and | log2 fold change (FC) |≥ 0.5 and p< 0.05 were the standard for the differential genes.Finally, KEGG and GSEA enrichment analysis of the differential genes was performed on SangerBox portal (https:// www.sangerbox.com)(24).

Immune microenvironment analysis
Apply the "ssGSEA" algorithm to assess the enrichment scores of 22 immune cell types in every individual, while employing the deconvolution method with LM22 immunological feature matrix in CIBERSORT for precise calculation of infiltrating immune cell components (25).In addition, Stromal score and immune score were calculated by "ESTIMATE" algorithm (26).

Consensus clustering
Previous studies have demonstrated that interleukin-2 (IL-2), tumor necrosis factor (TNF), interferon-gamma (IFN-g), and the cytotoxic T lymphocyte (CTL) can serve as quantitative markers for assessing the abundance of T cell exhaustion (27,28).Consensus clustering analysis was performed on the TCGA BC cohort based on the aforementioned characteristics of T cell exhaustion with K-Means clustering algorithm.The R package "ConsensusClusterPlus" was used to determine the number of clusters with the parameters were set as reps=50, maxK=9, pFeature=1, pItem=0.8.And 1000 repetitions were carried out to ensure the stability of the subtype (29).

Processing of the scRNA-seq data
The "Seurat" package was used for quality control of data, which involved excluding cells exhibiting ≥5% expression of mitochondria-related genes and those with fewer than 50 detected genes.Further normalization of the gene expression matrix is conducted based on the top 3000 variable genes.The significant principal components (PCs) were computed by principal component analysis (PCA), and the dimensionality was subsequently reduced for the first 20 PCs.Then, a cluster classification analysis was conducted on all units.With the help of the CellMarker database and FindAllMarkers function, we annotated different cell clusters.There are significant differences in the chromosomal copy number variations (CNV)between malignant cells and normal cells in solid tumors.The InferCNV algorithm is widely used for exploring scRNA-seq data to provide evidence of large-scale chromosomal copy number alterations, thereby effectively identifying malignant cells (30).We utilized the InferCNV algorithm to identify malignant cells within clusters of epithelial cells.The GPI-score in different cell clusters was calculated by the AddModuleScore algorithm implemented in Seurat.The Monocle 2 algorithm was employed for pseudo-time trace analysis, and the plot_pseudotime_heatmap function was utilized to construct heatmaps illustrating the dynamic expression of genes (31).

Establishment of a diagnostic model for T cell exhaustion
We performed ssGSEA analysis on the characteristics of T-cell exhaustion (including IL2, TNF, IFNG, TGFB1, IL10, TCF7, TBX21, TOX expression abundance, CTL score, Chemokines score, and CYT score) to derived a T-cell exhaustion score (TEXscore).Based on the median TE-score, we divided BC patients into two types (Severe exhaustion type and mild exhaustion type).Different exhaustion states are included as the outcomes of the model.The XGBoost algorithm and the Logistic regression algorithm were used for model training, and the TEX-score of patients was used as the gold standard to evaluate the predictive value of these two models for T-cell exhaustion.The ROC analysis for outcome prediction was conducted using pROC's ROC function, and the final AUC result was determined by utilizing pROC's ci function to obtain both AUC and confidence intervals.

Construction of a prognostic model for breast cancer
Integrating GPI-anchor biosynthesis related genes and T cell exhaustion related genes (IL2, TNF, IFNG) as input features, we arranged 101 combinations of 10 algorithms in the training dataset for variable selection and model construction based on a ten-fold cross-validation framework (32).Optimal modeling method selection is based on c-din values, followed by feature gene screening, calculation of the proportion of feature genes in the prognostic model, and subsequent construction of a risk score.4).Breast tumor tissue was processed into 4mm thick paraffin sections and subsequently dewaxed.The sections were then incubated in 0.01M sodium citrate at room temperature for 100 minutes to eliminate endogenous enzyme activity.Subsequently, a solution containing 5% goat serum was applied and incubated at room temperature for 60 minutes, followed by overnight incubation with DAPI (1:100), CD8 (1:100), PIGU (1:100), and GPAA1 (1:100) antibodies, respectively.On the following day, after washing with 0.02% PBST, a secondary antibody solution (concentration of 1:200) was added under light-avoiding conditions and incubated at room temperature for 2 hours.Finally, the sections were washed with 0.02% PBST, mounted, and observed using confocal optical microscopy to visualize the staining results.
Small interfering RNA (siRNA) was designed by Tongyong (Shanghai)Company to knock down PIGU mRNA in BRCA cell lines.Total RNA was extracted by Trizol reagent (RNAiso Plus-9109, Takara), and cDNA was synthesized through PCR thermal cycler (Eppendorf, Germany) using reverse transcription reagent (RR036A, Takara) according to the instructions.An appropriate amount of cDNA was used as a template to verify the efficiency of knock down for qRT-PCR through TB Green dye (RR820A, Takara) on Mastercycler real-time PCR system (LightCycler, Roche).After targeted gene knock down, siNC and siPIGU cells were seeded in 96-well plates and cultivated for 48h.CCK8 reagent (CK04, Dojindo) was added and incubated for 2h.The absorbance was measured at 450nm by microplate reader (Tecan, Switzerland).

Statistical analysis
The t-test was applied to the normally distributed data in this study, while the Wilcoxon rank sum test was used for analyzing the non-normally distributed data.The chi-square test was employed to compare categorical variables.In order to compare the disparities among small sample sizes, a non-parametric test is selected.Pearson correlation analysis is used to measure the strength and direction of the linear relationship between two variables.All the analysis were conducted on the R software (version 4.1.2).And adhering to standard practice, the results are statistically significant with p< 0.05.

Glycosylphosphatidylinositol (GPI) anchor biosynthesis is a prognostic risk factor in breast cancer
We quantified the intensity of GPI-anchored biosynthesis in patients with GPI-scores and evaluated the differences in GPIanchored biosynthesis between cancer and paracancer tissues in the TCGA pan-cancer dataset.Figure 1A shows that there are 20 cancer types exhibit significant changes in GPI-anchored biosynthesis.The levels of GPI-anchored biosynthesis in the tumor tissues of LGG, TGCT, LUAD, READ, BRCA, THCA, LIHC, LUSC, SKCM, PRAD and GBM were higher than those in normal tissues.Conversely, the levels of GPI-anchored biosynthesis in the tumor tissues of DLBC, OV, KIRC, KICH, STAD, ESCA, KIRP, PAAD and ACC were lower than normal tissues.As the result of unicox analysis shown in the forest diagram (Figure 1B), GPI-anchored biosynthesis is a risky factor of prognosis in LGG, HNSC, GBM, CESC and BRCA, while it is a protective factor in READ, THYM, OV, MESO, LUSC and KIRC.
The results from the KM plotter demonstrated that the enhancements to the GPI-anchored biosynthesis was associated with poor prognosis in HNSC, BRCA, LGG and LUAD (Figure 1C).However, in COAD, LIHC, LUSC, MESO, READ and KIRC, the stronger the GPI-anchored biosynthesis, the more favorable the prognosis for patients.Combined the results of the cox regression, K-M survival curve and the expression of GPI-scores in tumor and normal samples, we finally determined that the GPI-anchored biosynthesis may exert a pivotal role in the development of breast cancer.

Functional enrichment analysis of the GPI-anchored biosynthesis
The TCGA BC patients were divided into two cohorts with high and low GPI-anchored biosynthesis based on the best cutoff values of the GPI-score.The heat map illustrates the distinct gene patterns within various patient subgroups (Figure 2A).KEGG functional enrichment analysis of different clusters of patients showed that the patients with high GPI-anchored biosynthesis had a worse prognosis accompanied by a lower immune response, the specific manifestations include the down-regulation of Cytokine-cytokine receptor interaction, Helper T lymphocytes (Th17, Th1 and Th2) cell differentiation, Primary immunodeficiency, and Antigen processing and presentation (Figure 2B).Interestingly, the patients in the high GPIanchored biosynthesis group exhibited augmented metabolic pathways, characterized by the Estrogen signaling pathway and AMPK signaling pathway, as well as the enhanced Fatty acid metabolism and Cortisol synthesis secretion (Figure 2B).In addition, GSEA analysis showed that as the GPI-score increased, protein secretion and estrogen response were elevated, peroxisome metabolism and oxidative phosphorylation pathway were enhanced.However, IL2-STAT5 signaling, IL6-JAK-STAT3 signaling, TNFA signaling, Interferon Gamma response and Inflammatory response were downregulated (Figure 2C).We also examined the abundance of 22 immune cells in different groups.Surprisingly, the group of patients with high GPI-score showed a decrease in adaptive immune function, with a decrease in T cells CD8 , T cells CD4 memory activated, T cells follicular helper, T cells regulatory (Tregs),and T cells gamma delta.It was also accompanied by an increasing of M2 macrophages (tumor-promoting) and monocytes (Figure 2D) (33).Simultaneously, the analysis of ESTIMATE scores and immune checkpoints revealed that patients with higher GPI-anchored biosynthesis exhibited a diminished immune status and response (Figure 2E).Additionally, these patients exhibited decreased stromal cell infiltration in their tumor tissues but elevated levels of tumor purity (Figures 2F, G).

The correlation between GPI-anchored biosynthesis and T-cell exhaustion
GPI-score is negatively correlated with T cell CD8, T cell CD4 memory activation, Treg, B cell memory, and T-cell follicular helical (P<0.001), while showing a positive correlation with M2 macrophages (P<0.001)(Figure 3A).As it is depicted in Figure 3B, the expression of the majority of genes involved in GPI-anchored biosynthesis exhibited a significant negative correlation with diverse immune cell populations.Then, consensus clustering analysis was employed to categorize BC patients based on the specific TEX pathway which includes TNF, IL-2, IFN-g, and CTL.We identified 4 clusters of patients with different Tcell exhaustion status in the TCGA-BC cohort (C1-C4) (Supplementary Figure 1).The heat map also shows the change of other molecular pathways of TEX in different groups, including TGFB, IL10, chemokines, transcription factors (TCF7, TBX21, Tox), CYT scores, and the abundance of infiltrating lymphocytes (Activated B cell, Activated CD4 T cell, Activated CD8 T cell, Effector memeory CD4 T cell, Effector memeory CD8 T cell, Th1, Th2, and Th17) (Figure 3C).Consistently, these characterized pathways of TEX including CYT and CD4 T cell and CD 8 T cell showed a decreasing trend in C1-C4 (Figures 3D-F), suggesting that the four TEX subgroups identified in this study accurately represent the biology of the TEX grading stage.Interestingly, the intensity of GPI-anchored biosynthesis increases with the severity of the TEX (Figure 3G).Based on the KEGG enrichment analysis, we compared the differences in biological function between the groups with the greatest differences in the degree of TEX (C1 and C4), and surprisingly, alterations in GPI-anchored biosynthesis were the most significant change between the two groups, topping the list in terms of magnitude of change (Figures 3H, I).

Single-cell analysis of GPI-anchored biosynthesis in TME
We identified 14 major cell types using annotations from the Cell Marker database, including CD4 T cells, CD8 T cells, Exhausted CD8 T cells, B cells, Epithelial cells, Fibroblasts, Myeloid cells, Innate Lymphocytes (ILCs), Dendritic cells, Monocytes, Neutrophils, Mast cells, Endothelial cells and Macrophages (Figure 4A).Projecting the clinical information of the samples onto various cell subpopulations to trace the cellular origin, the exhausted CD8 T cells were predominantly present in tumor tissues (Figure 4B).Compared with the normal mammary microenvironment, the proportion of various types of cells in TME was significantly altered, in which the proportion of B cells, endothelial cells, macrophages, and monocytes was elevated while the CD4 T cells and CD8 T cells were decreased and replaced by a substantial increase in exhausted CD8 T cells (Figure 4C).This observation suggests that the T-cell exhaustion is a rare occurrence in normal breast tissue, but frequently manifests in tumor tissue.
Then, we defined 127 malignant cells in the tumor epithelial cell population using the InferCNV algorithm (Figures 4D, E; Supplementary Table 5) and compared the differences of GPI-anchored biosynthesis between the malignant epithelial cells and normal breast epithelial cells.Interestingly, the GPI-anchored biosynthesis was instead reduced in malignant epithelial cells (Figure 4F).We compared the alterations of GPI-anchored biosynthesis in other tumor-derived cell populations, we found that remarkably differences could be only observed between CD8 T cells and exhausted CD8 T cells (Figure 4G; Supplementary Figure 2).The GPI-anchored biosynthesis was much stronger in exhausted CD8 T cells compared to normal CD8 T cells.Similarly, GSVA analysis of KEGG biological pathways showed there is no difference in GPI-anchored biosynthesis between malignant and non-malignant epithelial cells (Figure 4H), whereas this difference was evident in CD8 T cells and exhausted CD8 T cells (Figure 4I).

Pseudotime analysis of GPI-anchored biosynthesis related genes in TME
The differentiation trajectories of CD8 T cells and epithelial cells were constructed to observe the process of CD8 T cell exhaustion and metabolic remodeling of GPI-anchored biosynthesis during tumorigenesis.In the pseudotiming analysis, CD8 T cells were initially located along the trajectory path and gradually transitioned into exhausted CD8 T cells (Figures 5A, B).We calculated the dynamic expression levels of GPI-anchored biosynthesis related genes in CD8 T cells and exhausted CD8 T cells along the pseudotemporal transition, and the gene heat map suggested that PIGP, PIGC, PIGU, PIGW, PIGV, and GPAA1 were highly expressed at the end of the trajectory path (Figure 5C), which means that these genes are highly expressed in exhausted CD8 T cells.Two-dimensional map shows the dynamic expression of genes in CD8 T cells (green) and exhausted CD8 T cells (blue) (Figure 5D).These results suggest that the expression of GPIanchored biosynthesis related genes is heterologous during the CD8 T cell state transition.Figures 5E, F represent the pseudotemporal developmental trajectories of epithelial cells.We calculated the GPIscores in the cells and present them in a two-dimensional plot, where we can see that the overall level of GPI-anchored biosynthesis in exhausted CD8 T cells is always higher than that in non-exhausted CD8 T cell, and that metabolic intensity increases as CD8 T cell become more exhausted (Figure 5G).However, the intensity of GPIanchored biosynthesis is lower in tumor epithelial cells than in normal epithelial cells (Figure 5H).Thus, it is worth considering whether GPIanchored biosynthesis related genes could serve as a potential biomarker for predicting T cell exhaustion in BC patients.Interestingly, we did not observe this phenomenon in the metabolic landscape of exhausted CD4 T cells.It is worth noting that GPI-anchored biosynthesis levels are lower in exhausted CD4 T cells compared to non-exhausted CD4 T cells (Supplementary Figure 3).

Machine learning-based model for the diagnosis of T cell exhaustion
Correlation analysis was performed between the GPI-score and TEX-score in Metabric and TCGA BC cohort, with correlation coefficients (r-values) of -0.49 and -0.51 respectively (Figures 6A,  B).Then, a variational analysis was performed in Metabric BC cohorts with the model_profile function in the XGBoost algorithm, assessing the association between the expression of GPI-anchored biosynthesis related genes and the states of T cell exhaustion in BC patients.The Partial Dependence profile demonstrates the impact of individual gene expression on the predictive ability of the XGboost model (Figure 6C). Figure 6D illustrates the magnitude of the impact of each GPI-anchored biosynthesis related gene on the states of T cell exhaustion in Metabric BC patients, it can be observed that PIGQ exerts the most significant influence on severe T cell exhaustion states in patients, followed by GPAA1.6E, F).Our results demonstrated that the GPI-anchored biosynthesis related genes are valuable as a tool to predict whether a patient has developed severe T cell exhaustion.

Establishment of a prognosis signature related to GPI-anchored biosynthesis
To validate the prognostic value of GPI-anchored biosynthesis related genes in BC, we develop a gene signature (GPIS) based on 21 GPI-anchored biosynthesis related genes and the characteristic factors of T cell exhaustion (TNF, IL-2, IFN-g).The Metabric dataset, with its high number of patients, served as the training set, while the TCGA cohort, with its low number of samples, served as the validation set.We employed a ten-fold cross-validation framework to train and validate 101 predictive models, subsequently determining the C-index for each set.Models with C-index less than 0.5 indicate obvious overfitting problems in the construction process and should be excluded.Interestingly, the optimal model was Random Forest (RSF) with the highest average C-index (0.767) Figure 7A.Then, the RandomForestSRC algorithm was utilized to optimize the results of the RSF. Figure 7B represents the weight ratio of the characterized genes in GPIS, which shows that PIGV, PIGU, GPAA1, and PGAP1 are the core genes of the model.By weighting the 13 gene expression values and their regression coefficients in RandomForestSRC, a risk score was obtained for each patient (Figure 7C; Supplementary Table 6).With the optimal thresholds for risk scores determined by the "surviminer" package, patients were classified into high-risk and low-risk groups.K-M curves indicate that patients categorized as high-risk exhibited notably reduced OS duration compared to individuals in the low-risk cohort (Figures 7D, E).ROC analysis measured the discrimination of GPIS, and the 3-year, 4-year and 5year AUC of training dataset were 0.62, 0.64 and 0.65 (Figure 7F).The 1-year, 3-year and 5-year AUC in validation dataset were 0.68, Frontiers in Immunology frontiersin.org0.60 and 0.59 (Figure 7G).The Accuracy of Random Forest model is 0.685, KS score is 0.510, F1 score is 0.645, and Precision is 0.672 (Supplementary Figure 5).

GPAA1 and PIGU are associated with poor prognosis and CD8 T cell infiltration in BC
The multivariable Cox regression analysis revealed that GPAA1, PIGU, PIGS, PIGQ, and PIGW independently serve as risk factors for OS (Figure 7H).In the feature weight analysis of the T cell exhaustion diagnostic model and prognosis model, further comprehensive investigation is necessary for GPAA1 and PIGU.We employed the Timer method from the IOBR software package to conduct immune cell infiltration scoring on 9,406 tumor samples covering 38 cancer types within the TCGA pan-cancer dataset.The heat map reveals a strong correlation between GPAA1 and immune infiltration in 32 cancer types, while PIGU exhibits a significant association with immune infiltration in 33 cancer types (Figure 8A).The infiltration of CD8+T cells in breast cancer exhibits a negative correlation with the expression levels of GPAA1 and PIGU.In addition, the K-M curve indicates that patients with high expression of GPAA1 and PIGU have a worse OS (Figures 8B, C).Compared to normal breast cells (MCF10A), the expression of GPAA1 and PIGU is relatively high in MCF7, MDA-MB-231 and BT-474 cells (Figure 8D).SiRNA was employed to downregulate the expression of PIGU in MCF7 and MDA-MB-231 cells, and the ability of PIGU to effectively retard proliferation of MCF7 and MDA-MB-231 cells was confirmed by CCK8 assay (Figure 8E).We also employed a multiplex immunofluorescence staining technique to detect the expression levels of GPAA1 and PIGU molecules, as well as the degree of CD8 T cell infiltration in 25 local breast cancer tissues.Surprisingly, In multiple microscopic views, the expression of GPAA1 and PIGU in breast cancer cells was negatively correlated with the expression of CD8 molecules in neighboring lymphocytes (Figures 8F, G).When tumor cells high expressed GPAA1 and PIGU, the fluorescence intensity of CD8 molecules in peripheral infiltrating lymphocytes noticeably decreased, indicating a reduction in the number of CD8+ T cells.

Discussion
Our study demonstrates the dysregulation of GPI-anchored biosynthesis to varying extents in the majority of prevalent tumors.Augmented GPI-anchored biosynthes is not only exhibits a correlation with an unfavorable prognosis in breast cancer but also contributes to the exhaustion of CD8 T cells within the TME.Considering the pivotal role of GPI-anchored biosynthesis in cellular processes (34), we quantified 21 genes associated with GPI-anchored biosynthesis to generate a composite score for assessing the extent of GPI-anchored biosynthesis in tumor patients.Our findings suggest that elevated GPI-anchored biosynthesis functions as a prognostic risk factor in BC patients.Further scoring of the immune status and immune cells suggests that the BC patients with high GPI-anchored biosynthesis had lower levels of immunity, as well as the significant down-regulation of IL2, TNF, and IFN-g.Studies have shown that during the persistent infection, CD8 T cells undergo a progressive depletion of their effector function and proliferative capacity (35,36), which is characterized by the upregulation of inhibitory receptors such as PD-1, CTLA4, and TIM-3, as well as impaired production of TNF, IFN-g, chemokines, and degranulation ability (37)(38)(39).Ultimately, the effectiveness of CD8 T cells in inhibiting tumor growth becomes diminished, leading to the immune evasion of tumor cells (40).Interestingly, the characterization of CD8 T cell exhaustion is highly manifested in patients with high GPI-anchored biosynthesis, suggesting that there may be a more severe state of T cell depletion in this patient cluster.This observation motivated us to deep investigate the potential association between CD8 T cell exhaustion levels and GPI-anchored biosynthesis intensity in BC patients.
Referring to the developmental pathway of CD8 T cell exhaustion defined by Beltra et al (41), we identified four distinct patterns of T-cell exhaustion based on the abundance levels of IL-2, TNF, IFN-g, and CTL expression in the BC patients.As anticipated, the intensity of GPI-anchored biosynthesis in patient clusters increased with the exhaustion of T cells, showing a high correlation between the two.To our surprise, single-cell analysis showed that GPI-anchored biosynthesis was only differentiated in CD8 T cells and over-activated with the exhaustion of CD8 T cells, while this phenomenon was not observed in malignant epithelial and normal breast epithelial cells in TME.This suggests that the enrichment of GPI-anchored biosynthetic activity is an important biological change in the progression of CD8 T cells toward exhaustion.
However, existing studies have not revealed whether abnormalities in the GPI anchoring process in tumors specifically affect the exhaustion of CD8 T cells.Katagiri et al. reported that patients with bone marrow failure (BMF) exhibited T cells deficient in GPI-anchored protein (GPI-AP), and this subset of defective T cells demonstrated heightened susceptibility to immunotherapy (42).More and more studies have proposed that co-expression of some GPI-APs on CD8 T cell can help identify the T cell subsets with weakened function and exhaustion during chronic infection.Such as the co-expression of 2B4, CD160, and KLRG1 with PD-1 on HCV-specific CD8 T cells is associated with intermediate differentiation and impaired function (43,44).The increased expression of some GPI-anchored inhibitory receptors on T cell surface associated with progressive CD8 T cell exhaustion in BC may be attributed to the boost of GPI anchor biosynthesis.
Our findings suggest a clear mechanistic link between the GPIanchored biosynthetic pathway and T cell exhaustion, although it has not yet been fully explored.However, based on the existing research results and theoretical hypotheses, we can make some speculations and discuss how GPI-anchored biosynthesis interferes with CD8 T cell exhaustion.1.The role of GPI-anchored proteins in T cell signaling (45): GPI-anchored proteins play a crucial role in signal transduction and cell-cell interactions on the cell membrane.There may exist GPI-anchored proteins that play key roles in T cell activation and maintenance of function.Therefore, abnormalities in GPI-anchored protein biosynthesis may affect the expression or function of these proteins, leading to T cell dysfunction.metabolism and signaling pathways.Abnormal GPI-anchored protein biosynthesis may impact the normal functioning of these metabolic pathways, resulting in T cell dysfunction and exhaustion.4.Association of GPI-anchored proteins with immune checkpoints (47): GPI-anchored proteins may be associated with immune checkpoint molecules, influencing the balance between T cell activation and suppression.Abnormal GPI-anchored protein biosynthesis may alter the expression or function of these immune checkpoints, leading to T cell dysfunction and immune exhaustion.Furthermore, the persistent presence of a metabolic state, such as an inflammatory state, in tumor patients can directly influence the tumor microenvironment.Research indicates that immune cells in the tumor microenvironment can exhibit metabolic plasticity based on changes in the TME (48).For example, MRC1+ CCL18+ macrophages in colorectal cancer show enrichment in oxidative phosphorylation, while macrophages in liver metastasis primarily rely on amino acid metabolism (49).This metabolic adaptability directly impacts the function, activation state, and response of immune cells to the tumor (50).Metabolites derived from glutamine metabolism, for instance, can promote T cell-driven immunosuppressive programs (51).In breast cancer patients, the increased expression of some GPI-anchored inhibitory receptors on the surface of CD8+ T cells during T cell exhaustion may be related to cellular responses to changes in the TME, leading to metabolic reprogramming and potentially promoting increased GPI-anchored protein biosynthesis.
Overall, the potential mechanistic links between GPI-anchored protein biosynthesis and T cell dysfunction may involve multiple aspects, including signal transduction, immune regulation, metabolic regulation, and immune checkpoints.Future research can further explore these mechanistic connections to uncover the specific role of GPI-anchored protein biosynthesis in T cell function regulation, providing a deeper theoretical basis for the development of novel immunotherapy strategies.
In addition to suggesting enhanced GPI-anchored biosynthesis, which is associated with CD8 T cell exhaustion and poor prognosis in BC, we also observed different perturbations of expression of 21 genes involved in GPI-anchored biosynthetic processes during the transformation process of CD8 T cell exhaustion.With the further establishment of BC prediction models for GPI-anchored biosynthesis-related genes, the role of GPAA1 and PIGU are pivotal in T cell exhaustion diagnosis and patient prognosis, suggesting the potential utility of GPI-anchored biosynthesis related genes as biomarkers for diagnosing T cell depletion status and predicting prognosis in BC patients.
In fact, it has been shown that GPAA1 and PIGU are defined as oncogenes in a variety of cancers.For example, GPAA1 facilitates the advancement of gastric cancer by increasing the expression of GPI-anchored proteins and augmenting the ERBB signaling pathway (52).High expression of GPAA1 exacerbates the progression of hepatocellular carcinoma (53).And PIGU was recognized as an oncogenic factor of bladder cancer in 2004 (54).This opens a new door that GPI-anchored bioanabolic activity itself may be tumorigenic.We also found that GPAA1 and PIGU are highly amplified in breast cancer cell lines (55), and inhibiting the expression of PIGU has a negative impact on cell proliferation.Not only that, BC tissues with high expression of GPAA1 and PIGU exhibited suppression of peripheral CD8 T cell infiltration.
In conclusion, our study provides a valuable insight into T cell exhaustion in BC patients.We strongly recommend the inclusion of GPI-anchor biosynthesis metabolic pathway as a potential risk factor for tumorigenesis and progression of T-cell exhaustion.And we advocate for increased attention to be directed toward the aspect of GPI-anchor biosynthesis in future research endeavors.In-depth exploration of the impact of GPI-anchor biosynthesis on T cell function may contributes to uncovering mechanisms of tumor immune tolerance and provides a theoretical basis for the development of novel immunotherapy strategies.

Risk score ¼ on
Coefficient of gene ðnÞ Â Expression of gene ðnÞ2.10Immunofluorescence and cellular experiment25 BC patients were recruited from the First Affiliated Hospital of Wenzhou Medical University (Supplementary Table

2 Functional
FIGURE 2 Functional enrichment analysis of the GPI-anchored biosynthesis.(A) Heat maps of Differential gene in patients with high and low GPI-score(|log2 fold change (FC)|≥1 and p< 0.05).(B) KEGG enrichment analyses of Up-regulated differential genes(up) and down-regulated differential genes(down).(C) GSEA analysis, the first 15 statistically significant pathways were presented.(D) The abundance of 22 immune cells in patients with high and low GPI-score from TCGA databases with CIBERSORT.(E) Expression of 24 immune checkpoints in different subgroups.(F, G) Immune infiltration scores and Tumor Purity score for different groups.(*p< 0.05; **p< 0.01; ***p< 0.001; ****p< 0.0001).

3
FIGURE 3 Characterization of TEX heterogeneity within the TME of BC. (A) The correlation between GPI-score and the abundance of 22 immune cells.(B) The correlation of GPI-anchored biosynthesis related genes with immune cells.(C) TCGA BC patients were divided into four subgroups according to the specific TEX pathway.(D) CYT scores for the four clusters.(E, F) Boxplots depict the relative abundance of infiltrating T lymphocytes across distinct clusters.(G) Expression of GPI-score among the four TEX subgroups.(H) Heat maps showing the first 10 differential pathways in patients with C1 and C4 clusters.(I) The volcano plot displays the log fold change of differential pathways in patients from C1 and C4 clusters.

4
FIGURE 4 The signature of GPI-anchored biosynthesis with Single-cell analysis in TME of BC. (A, B) Dimensional reduction plot of single cells (A) 16 cell types, (B) tissue origins (tumor, normal).(C) The proportion of different types of cells in TME.(D) Identification of malignant epithelial cells by InferCNV algorithm.The CNV of epithelial cells in cluster 1, 3, 4, 8, 9, and 10 was significantly increased, and the tumor-derived epithelial cells in these clusters were defined as malignant epithelial cells.(E) Two-dimensional dot plot of malignant epithelial cells, non-malignant epithelial cells, and others.(F, G) Expression levels of GPI-score in different cell populations.(*p< 0.05; ****p< 0.0001) (H, I) Differential analysis of KEGG biological pathways in different cell populations.

5
FIGURE 5The pseudotime analysis of GPI-anchored biosynthesis related genes in TME.(A, B) Differentiation trajectory of CD8 T cells, colorcoded for pseudotime (A) and sub-cell types (B).The direction of cell differentiation was CD8 T cells to exhibited CD8 T cells.(C) Pseudo-heatmap of genes altered in the differentiation process of CD8 T cells in BC (D) Dynamic changes of the expression of 21 GPI-anchored biosynthesis related genes in CD8 T cells and exhibited CD8 T cells.(E, F) Differentiation trajectory of epithelial cells in BC, colorcoded for pseudotime (E) and sub-cell types (F).(G, H) Two-dimensional plots showing the dynamic expression of GPI-anchored biosynthesis score during the T cell transitions (G) and epithelial cells transitions (H) along the pseudotime.
Finally, XGBoost and Logistic Regression were used to construct a T-cell exhaustion diagnostic model associated with GPI-anchored biosynthesis related genes in two BC cohorts.We evaluated the accuracy of these two models in predicting the occurrence of T-cell exhaustion in patients.The area under the ROC curve represents the accuracy of the model.Compared to the XGBoost algorithm, the Logistic regression model has greater AUC values in both the Metabric and TCGA cohorts, with values of 0.688 and 0.684 respectively (Figures 6G, H), while the AUC values for the XGBoost model are 0.637 and 0.623 (Figures

6
FIGURE 6 Machine learning-based models for the diagnosis of T cell exhaustion.(A, B) The correlation between GPI-score and Tex-score in Metabric (Spearman, r = -0.47,p = 3.2e-107) and TCGA BC queue (Spearman, r = -0.51,p = 1.8e-72).(C) Partial Dependence Plot of XGboost model, part of the graph shows the dependence of the change in T-cell exhaustion prediction on each gene variable.(D) The importance of the features ranked in descending order.(E-H) The ROC curve were established following XGBoost and Logistic regression fed by the GPI-anchored biosynthesis related genes in Metabric and TCGA BC queue.

7
FIGURE 7 Development and validation of GPIS with 101 machine learning methods.(A) The C-index of GPIS for each algorithm combination in Metabric and TCGA datset.(B) RandomForestSRC analysis of the genes obtained in RSF.(C) Coefficients of 13 genes obtained in RandomForestSRC.(D, E) K-M plots for riskscore in Metabric and TCGA sets.(F, G) Time-dependent ROC analysis of GPIS, (F) AUC for 3-, 4-, and 5-year OS in Metabric set.(G) AUC for 1 -, 3 -, and 5-year OS in TCGA set.(H) The multivariable Cox regression analysis of GPIS.