CD276 and the gene signature composed of GATA3 and LGALS3 enable prognosis prediction of glioblastoma multiforme

Glioma is the most common type of primary brain tumor, accounting for 40% of malignant brain tumors. Although a single gene may not be a marker, an expression profiling and multivariate analyses for cancer immunotherapy must estimate survival of patients. In this study, we conducted expression profiling of immunotherapy-related genes, including those in Th1/2 helper T and regulatory T cells, and stimulatory and inhibitory checkpoint molecules associated with survival prediction in 571 patients with malignant and aggressive form of gliomas, glioblastoma multiforme (GBM). Expression profiling and Random forests analysis of 21 immunosuppressive genes and Kaplan-Meier analysis in 158 patients in the training data set suggested that CD276, also known as B7-H3, could be a single gene marker candidate. Furthermore, prognosis prediction formulas, composed of Th2 cell-related GATA transcription factor 3 (GATA3) and immunosuppressive galactose-specific lectin 3 (LGALS3), based on 67 immunotherapy-related genes showed poor survival with high scores in training data set, which was also validated in another 413 patients in the test data set. The CD276 expression helped distinguish survival curves in the test data set. In addition, inhibitory checkpoint genes, including T cell immunoreceptor with Ig and ITIM domains, V-set domain containing T cell activation inhibitor 1, T-cell immunoglobulin and mucin-domain containing 3, and tumor necrosis factor receptor superfamily 14, showed potential as secondary marker candidates. These results suggest that CD276 expression and the gene signature composed of GATA3 and LGALS3 are effective for prognosis in GBM and will help us understanding target pathways for immunotherapy in GBM.


Introduction
Glioma is the most common type of primary brain tumor accounting for 40% of all malignant brain tumors [1].The World Health Organization (WHO) classifies gliomas into grades I-IV by malignancy and overall survival (OS) [1].Glioblastoma multiforme (GBM) is a fast-growing grade IV malignant glioma [1].GBM is the most common brain tumor in adults with a median

Random survival forests analysis
Random survival forests analysis was performed to determine variable importance factors distinguishing expression of genes with survival times, which was estimated by randomly permutating its values and recalculating the predictive accuracy of the model that were expressed as the log rank test statistics using the randomForestSRC package in R [17].

Cox hazards regression analysis
Correlations between gene expression and survival times were evaluated by multivariate analyses.Clinical characteristics were used as an additional variable to execute multivariate analyses.The statistical data was determined by the Cox proportional hazards regression model using JMP (SAS Institute Inc.) [18].

Multivariate correlation coefficient analysis
Correlation among variables were analyzed using the glasso package in R [19,20].

Statistics
Statistical analysis was performed using R software, Bioconductor, and JMP (SAS Institute Inc.) [15].P < 0.05 was considered statistically significant.

The gene signature for prognosis prediction in glioblastoma multiforme
We tested whether the prognosis prediction formula Z 1 derived from the training data set and Z 2 derived from test data set are effective to estimate prognosis in the test data set and training data set, respectively.However, the subgroup with Z 1 � -0.064 (median score) showed no difference in OS (HR = 0.853, P = 0.1554) in the test data set (S7A Fig) .Similarly, the subgroup with Z 2 � -11.118 (median score) also showed no differences in OS (HR = 0.735, P = 0.12) in the test data set (S7B Fig) .Then, we next focused on the common factors, including the GATA3 transcription factor and the LGALS3 β-galactoside-binding protein family, in the both data sets, thereby, constructed the formulas as follows (Fig 4A and 4B The training data set and test data set were divided into the two subgroups by the median scores of Z 3 by 3.22 and Z 4 by 0.102.The subgroups with Z 3 � 3.22 (HR = 1.549,P = 0.0253) and Z 4 � 0.102 (HR = 1.31,P = 0.0157), showed poor prognosis for OS in each data set (Fig 4A and 4B).These results suggest that the gene signature composed of common factors including GATA3 and LGALS3 would be useful to estimate prognosis of the patients with GBM.However, indices of each gene and cutoff values depended on the data set.Thus, the common formula using identical factors, indices, and cutoff values was unable to be constructed.The problem should be solved using a huge data with development of more useful and exact analytical methods in future studies.

Discussion
Naive T cells differentiate to Th1 and Th2 cells [21,22].Th1 cells are characterized by TBX21 and STAT4 expression [21,22].Interferon (IFN)-γ is expressed in Th1 cells and constitutively activates type I IFN-α/β against the growth of glioma cells [23].Further, IL-4 activates Th2 cells, which functions by eliminating extracellular parasites and producing effector cytokines [24].The effector cells of Th2 immunity mainly comprise mast cells, IL-4/IL-5 CD4 + T cells, and B cells [24].GATA3 and STAT6 play pivotal roles in Th2 cells [24].Although the score of Th1 gene signature, including TBX21, IFNG, and IL12RB1/2, makes it difficult to estimate prognosis in GBM, higher score of Th2 gene signature, including GATA3 and IL-4, is associated with poor prognosis in GBM [15].In the study, higher expression of GATA3, IL18R1, and TGFB3 were also  [25].TBX21, GATA3, and retinoic acid receptor (RAR)-related orphan receptor gamma thymus (RORC) stimulate CD4 + cells to differentiate into Th17 cells, defined by IL-17 production [26][27][28].Th17 cells also produce IL-2, which is required for generation and maintenance of Tregs.However, IL-2 inhibits Th17 differentiation [28].Dysregulation of Th17 cells causes malfunction of Tregs by decreasing TGF-β signaling [29,30].In the context of Treg differentiation as described above, higher expression of TGFB1 was also associated with poor prognosis in the study (S4F PD-1 is significantly correlated with genes including CD40, ICOS, IDO1, SATB1 and TGFB1, and other immune checkpoint molecules including CD276, CTLA4, LAG3, and TIM3, which represents an anticancer agent [31].The higher expression of PD-1 is associated with poor prognosis in patients with diffuse gliomas [31].The low score of Th2 gene signature with lower expression of PD-L1, PD-L2, and PD-1 is associated with good prognosis in GBM [15].In the study, lower expression of PD-  CD276 (B7-H3) is an immune checkpoint molecule that belongs to the CD28 family, which plays pivotal roles in T-cell suppression in glioma [32].The higher expression of CD276 showed poor prognosis in glioma patients from CGGA and TCGA [33]; this was also consistent with the present data (Figs 1C, 2E and 3D).The 4IgB7H3 isoform is a candidate of therapeutic target in GBM [34].Isocitrate dehydrogenase (IDH) mutation also seems to influence differential expression of CD276 between the grade II and higher-grade gliomas [35].Gene ontology analysis reveals that CD276 is associated with immune response, cell cycle, cell proliferation, and Toll-like receptor signaling [35].CD276 also discriminates endothelial cells resected from malignant tissues and normal tissues [36].Furthermore, in addition to the advanced colorectal and breast cancers, CD276-positive circulating endothelial cells also occur in higher frequencies in patients with GBM [36].Expression analysis represents the marked increase of GATA3 expression in phosphate-activated glutaminase-expressing GBM cell line and GBM patients [37].In this study, higher expression of GATA3 was associated with poor prognosis in the training data set (S4C Fig) , but not in the test data set (S6Q Fig) .Galectin-3, a glioma-related marker encoded by LGALS3, is a β-galactosidase-binding lectin that is important in cell proliferation, adhesion, and apoptosis [38].Galectin-3 is activated in microglia and macrophages according to the progression of glioma, however, it is not expressed in oligodendrocytic cells representing the early stage of glioma tumorigenesis [38].In this study, lower expression of LGALS3 (galectin-3) and LGALS9 (galectin-9) were associated with a poor prognosis in the test data set (Fig 3G and 3J), but not in the training data set.
In summary, we have demonstrated that a single gene CD276 (B7-H3) and the gene signature composed of GATA3 and LGALS3 would be promising marker candidates for prognoses in GBM.Interestingly, a combination of the expression levels of GATA3 and LGALS3 enables prognosis prediction in GBM, but each gene individually is not a single marker.In addition, indices of each gene in the prediction formulas have distinct eigenvalues based on the data set, which should be further analyzed in future.However, the aim of this study, a detection for diagnosis and/or prognosis marker candidates for GBM, namely, CD276, GATA3, and LGALS3, would have been achieved successfully by using of their expression data, clinical information, and multivariable analyses.Besides, we also found the second candidate of GBM diagnosis/prognosis markers, including TIGIT, HAVCR2, PDCD1, TIGIT, and TNFRSF14.These genes were associated with patients' survival and genetically interacted within a complex network hub, suggesting a possibility of simple diagnosis in GBM.Especially, it is of great importance that higher expression of the genes related to Th2 cells and stimulatory checkpoint molecules and lower expression of the Th1-related genes resulted in worse prognoses in the two independent GBM data sets.In addition, higher expression of the Treg-related genes also tended to show poor prognosis.These results could provide promising marker candidates for cancer immunotherapies, especially involving the inhibitory checkpoint, and would also make way for understanding and developing target therapies and pathways in GBM.
S1 Fig).The training data set constituted of 158 patients from the Glioblastoma Multiforme data set, as shown in S1A Fig.The median OS was 11.25 months (range: 0.16-88.07)(S1C Fig).

Fig 2 .
Fig 2. Prognostic marker candidates for cancer immunotherapy in the training data set of glioblastoma multiform (GBM).(A) Random survival forest analysis for cancer immunotherapy-related genes in GBM.Top 20 of variable importance was shown.(B) Prognosis prediction formula for cancer immunotherapy-related genes in GBM.The Z 1 score is calculated by the expression values of 16 immunosuppressive pathway genes.(C,D) Kaplan-Meier survival analysis using the Z 1 score = 4.11.(C) Overall survival analysis (OS).(D) Disease free survival analysis.HR, hazard ratio.(E-H) Kaplan-Meier survival analysis for representative immunosuppressive pathway genes in GBM.(E) CD276/B7-H3.(F) TIGIT.(G) VTCN1.(H) CD86.Numbers in the parentheses indicate the threshold of gene expression.High and low indicate subgroups with over and under the threshold.OS, overall survival.HR, hazard ratio.Subgroups were divided by the median score of Z 1 and the median expression of genes.https://doi.org/10.1371/journal.pone.0216825.g002
associated with poor prognosis in the training data set (S4C-S4E Fig), as well as CSF2 and IL-3/4/5/6/9/13 in the test data set (S6A, S6C and S6E-S6I Fig).The Th1/ Th2 lineage is developmentally different from the Th17 lineage Fig and S6K Fig).Furthermore, higher expression of CD163, FOXP3, and TGFB3 showed poor prognosis in the training data set (S4A, S4B and S4E Fig), but not in the test data set.Similarly, IL4 expression also showed poor prognosis in the test data set (S6F Fig).
1 and PD-1 ligands were associated with good prognosis in the training data set (Fig 1E) and the test data set (Fig 3H and 3I), respectively.In the study, lower expression of STAT1 for stimulatory checkpoint was associated with poor prognosis (S4K and S6S Figs), suggesting the suppression of stimulatory checkpoint activity as a prognosis marker candidate.

Fig 4 .
Fig 4. Gene signature constituted of Th2 cell-related gene GATA3 and immunosuppressive gene LGALS3 in glioblastoma multiform (GBM).(A) Survival distribution using the Z 3 score in the training data set of GBM.(B) Survival distribution using the Z 4 score in the test data set of GBM.OS, overall survival.HR, hazard ratio.Subgroups were divided by the median scores of Z 3 and Z 4 .https://doi.org/10.1371/journal.pone.0216825.g004

analyses for the genes involved in cancer immunotherapy in the training data set
).Moreover, we tried to develop a prognosis prediction formula using the 21 immunosuppressive genes, whereas a Cox hazard regression analysis returned only CD276 (S2 Fig).Higher expression of CD276 showed poor prognosis (HR = 1.923,P = 0.002) (Fig1C).Besides, higher expression of HAVCR2, PDCD1, TIGIT, and TNFRSF14, and lower expression of VTCN1 also showed poor prognosis (HR > 1.495, P < 0.05) (Fig 1D-1H).These results suggest that CD276 and TIGIT could contribute to prognosis in several analyses and construction of immunosuppressive genetic networks in the training data set of GBM.Second, we analyzed 67 immunotherapy pathway-related genes that include 17 Th1-related genes, 18 Th2-related genes, 21 stimulatory checkpoint genes, 21 inhibitory checkpoint genes, and 14 regulatory T cells (Treg)-related genes, thereby, focusing on the immunosuppressive genes to identify prognosis markers (Fig 2, S1 Table).Since the expression values were expanded in the genes variously categorized, the values were converted into a log scale by log 2 (x+1).Random survival forests analysis returned variable importance of each gene in the training data set (Fig 2A).Especially, TNFRSF18, TNFSF14, TNFSF9, TNFSF18, CD276, GATA3, TNF, STAT4, and TGFB1 were associated with relatively high scores (Fig 2A).Immunosuppressive genes including CD276, VTCN1, and CD96 were ranked by relatively high score of the variable importance (Fig 2A).Cox proportional hazards regression analysis resulted in 16 candidate genes being associated with the effect of variables on the OS (S3 Fig).