Tsukushi is a novel prognostic biomarker and correlates with tumor-infiltrating B cells in non-small cell lung cancer

A recent study reported that tsukushi (TSKU) may be related to the development of lung cancer. However, few studies have examined whether TSKU is associated with prognosis and immune cell infiltration in non-small cell lung cancer (NSCLC). The effect of TSKU expression on prognosis in NSCLC was analyzed in the PrognoScan database and validated in The Cancer Genome Atlas. The composition of tumor-infiltrating cells was quantified from methylation and expression data. We combined the levels of tumor-infiltrating cells with TSKU expression to evaluate patient survival. Analysis of a lung cancer cohort (GSE31210, N=204) demonstrated that high TSKU expression was strongly associated with poor overall survival (P=1.90E-05). The combination of high TSKU expression and low B cell infiltration identified a subgroup of NSCLC patients with poor survival. In addition, the proportion of B cells in NSCLC patients with TSKU hypermethylation was higher than that in patients with TSKU hypomethylation (P<0.001). Overall, high TSKU expression combined with low B cell infiltration may be associated with a poor prognosis in NSCLC patients. TSKU might be a potential prognostic biomarker involved in tumor immune infiltration in NSCLC.


INTRODUCTION
Non-small cell lung cancer (NSCLC) is one of the most common and complex malignant tumors worldwide [1]. Although NSCLC therapy has made significant progress recently, the 5-year overall survival (OS) rate remains low, at only approximately 25% [2][3][4]. Immunotherapy has recently emerged as a promising treatment for many cancers, including NSCLC. Studies have found that tumor-infiltrating lymphocytes (TILs), such as CD8+ T cells and CD3+ T cells, up-regulate the expression of immunomodulatory markers, which may affect the efficacy of immunotherapy and be associated with a poor prognosis in NSCLC [5,6]. DNA methylation plays a critical role in cell lineage specification [7,8], and studies have indicated that DNA methylation can accurately estimate the distribution of cell subtypes in blood [9,10]. DNA methylation may therefore provide specific molecular markers for typing immune cell subtypes, but it has rarely been used to evaluate TILs in tumor tissue. In 2017, Jeschke et al. first identified a methylation of TIL (MeTIL) signature using genome-wide DNA methylation profiling and then transformed the individual methylation values of the MeTIL markers into a score (MeTIL score) that evaluates TIL distributions to predict prognosis in breast cancer patients [11,12]. It is therefore important to determine whether individual genes and their methylation status are related to TILs in tumor tissue and to prognosis in NSCLC.
Tsukushi (TSKU) is a protein-coding gene and a new member of the small leucine-rich repeat proteoglycan (SLRP) family. Previous studies have found that Tsku is involved in multiple cell signaling pathways, including the BMP, FGF, TGF-β, and Wnt pathways [13][14][15], and serves as a principal coordinator by interacting with signaling molecules in different animal tissues [16]. However, few reports have explored the functional significance of TSKU in human cancers. In March 2019, Yamada et al. first reported that TSKU overexpression enhanced cell proliferation and inhibited the epithelial-mesenchymal transition (EMT) in lung cancer cell lines [17]. Despite this functional potential of TSKU in cancer, little is known about whether TSKU is associated with clinical prognosis and tumor-infiltrating immune cells (TIICs) in human cancer. Previous studies have reported that TSKU acts as a modulator of the wound healing process by inhibiting TGF-β secretion from macrophages [18,19]. Moreover, TGF-β is recognized as a pleiotropic cytokine with immunoregulatory properties that activates the differentiation and proliferation of immune cells, including T regulatory cells (Tregs) and T helper 17 (Th17) cells [20,21]. Given the role of TSKU in regulating cytokines involved in immunoregulation during wound healing, we hypothesized that TSKU may be involved in the tumor immune response and affect prognosis in NSCLC.
Therefore, in this study, we analyzed the association between TSKU expression and the prognosis of NSCLC patients. We also evaluated the correlation of TSKU expression with TIIC levels in diverse tumor types. We further explored the relationship between TSKU methylation and the proportion of TIICs in lung cancer.

The expression levels of TSKU in different cancers
Based on the analysis of the Oncomine database, TSKU expression was higher in lung, bladder, brain and CNS, and other cancers than in normal tissues (Figure 1A). Lower expression of TSKU in tumors than in normal tissues was observed in breast, kidney, and liver cancers and sarcoma. The detailed results of TSKU expression in multiple cancer types are summarized in Supplementary Table 1.
To further validate the differential TSKU expression between different tumor and normal tissues, we analyzed TCGA (The Cancer Genome Atlas) data via the TIMER (Tumor Immune Estimation Resource) database. The expression of TSKU was significantly higher in LUAD (lung adenocarcinoma), LUSC (lung squamous cell carcinoma), and READ (rectum adenocarcinoma) datasets than in normal tissues (Figure 1B), while the expression of TSKU was lower in cancer than in normal tissues in BRCA (breast invasive carcinoma), CHOL (cholangiocarcinoma), COAD (colon adenocarcinoma), KICH (kidney chromophobe), KIRC (kidney renal clear cell carcinoma), LIHC (liver hepatocellular carcinoma), and STAD (stomach adenocarcinoma) datasets. These two databases showed consistent results in the differential TSKU expression between tumor and normal tissues in the lung cancer (LUAD and LUSC), BRCA, KICH, KIRC, and LIHC datasets.

Associations between TSKU expression and prognosis in different cancers
We evaluated the impact of TSKU expression on the prognosis of various cancers using PrognoScan (Supplementary Table 2). TSKU expression was significantly associated with prognosis in several cancer types, including lung, head and neck, breast, and soft tissue cancers (Figure 2A-2F). In a lung cancer cohort in PrognoScan (GSE31210, N=204), Kaplan-Meier survival curves showed that patients with high TSKU expression had poorer overall survival (P=1.90E-05) and relapse-free survival (P=6.60E-05) than those with low TSKU expression. In multivariate Cox regression analysis, high TSKU expression was strongly associated with poor overall and relapse-free survival in lung cancer, with HR(OS) of 4.700 (95% CI 2.360-9.360, P=1.10E-05) and HR(RFS) of 3.400 (95% CI 2.030-5.810, P=4.00E-06), respectively. In addition, in another lung cancer cohort (jacob-00182-HLM, N=79), patients with high TSKU expression also showed poorer OS than those with low TSKU expression (P=0.029). Because the sample size of each cancer dataset in PrognoScan is small, we merged the GSE datasets for each cancer type and survival endpoint to perform meta-analyses. The 14 meta-analyses comprised OS for 7 cancer types, DFS (disease-free survival) for 2, DSS (disease-specific survival) for 2, RFS (relapse-free survival) for 2, and DMFS (distant metastasis-free survival) for 1. Among these 14 meta-analyses, high TSKU expression was significantly associated with poorer OS in lung cancer (N=1303, HR=1.260, 95% CI 1.110-1.420) and poorer DFS in colorectal cancer (N=413, HR=1.810, 95% CI 1.000-3.290) (Supplementary Figure 1).
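The pooling model behind these meta-analyses is not stated in this section. As a minimal sketch, assuming a fixed-effect inverse-variance model on the log hazard ratio scale, each cohort's standard error can be recovered from its reported 95% CI and the cohorts combined as below. The function names are hypothetical, and the cohort values in the usage lines are illustrative, not data from this study.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile

def log_hr_and_se(hr, lower, upper):
    """Return (log HR, SE) recovered from a hazard ratio and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(hr), se

def pool_fixed_effect(cohorts):
    """Inverse-variance fixed-effect pooling of (hr, lower, upper) tuples.

    Returns the pooled HR, its 95% CI, and a two-sided Wald P-value.
    """
    num = den = 0.0
    for hr, lo, hi in cohorts:
        b, se = log_hr_and_se(hr, lo, hi)
        w = 1.0 / se ** 2            # weight = inverse variance of log HR
        num += w * b
        den += w
    b_pool = num / den
    se_pool = math.sqrt(1.0 / den)
    p = math.erfc(abs(b_pool / se_pool) / math.sqrt(2))  # two-sided normal P
    return (math.exp(b_pool),
            math.exp(b_pool - Z95 * se_pool),
            math.exp(b_pool + Z95 * se_pool),
            p)

# Illustrative cohort estimates: (HR, lower 95% CI, upper 95% CI)
cohorts = [(1.5, 1.1, 2.0), (1.2, 0.9, 1.6), (1.3, 1.0, 1.7)]
hr, lo, hi, p = pool_fixed_effect(cohorts)
print(f"pooled HR={hr:.3f} (95% CI {lo:.3f}-{hi:.3f}), P={p:.2e}")
```

A random-effects model would widen the pooled interval when cohorts disagree; the fixed-effect form is shown only because it is the simplest correct instance of inverse-variance pooling.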
We further validated the association between TSKU expression and prognosis, as determined by OS and DFS, in 33 cancer types from TCGA data via GEPIA (Gene Expression Profiling Interactive Analysis) (Supplementary Figure 2). Patients with high TSKU expression showed poorer survival than those with low TSKU expression in LUAD (P=0.004), ACC (adrenocortical carcinoma), KIRC, MESO (mesothelioma), PAAD (pancreatic adenocarcinoma), and THCA (thyroid carcinoma). In contrast, patients with low TSKU expression had poorer survival than those with high TSKU expression in DLBC (lymphoid neoplasm diffuse large B-cell lymphoma), PRAD (prostate adenocarcinoma), and UVM (uveal melanoma). Both databases thus indicate that TSKU expression affects the prognosis of some cancers, including lung cancer (LUAD). Moreover, we used the TIMER database to compare the proportions of different TIICs between groups with high and low TSKU expression in NSCLC. Samples with high TSKU expression had lower infiltration levels of B cells and CD4+ T cells than samples with low TSKU expression in both LUAD and LUSC (Figure 3C, 3D).
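GEPIA's grouping rule is not given in this section. A minimal sketch of a high/low survival comparison, assuming samples are split at the median TSKU expression and the two groups compared with a standard log-rank test; `split_by_median` and `logrank_test` are hypothetical helper names and the survival data are illustrative only.

```python
import math
from statistics import median

def split_by_median(expr):
    """Label samples 'high' if expression is above the median, else 'low' (assumed rule)."""
    cut = median(expr)
    return ["high" if e > cut else "low" for e in expr]

def logrank_test(times, events, groups, target="high"):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = event (death), 0 = censored;
    groups: group label per sample. Returns observed and expected events in
    the `target` group, the chi-square statistic (1 df) and its P-value.
    """
    data = list(zip(times, events, groups))
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, ei, _ in data if ei == 1}):
        at_risk = [g for ti, _, g in data if ti >= t]   # risk set at time t
        n = len(at_risk)
        n1 = at_risk.count(target)
        d = sum(1 for ti, ei, _ in data if ti == t and ei == 1)
        d1 = sum(1 for ti, ei, g in data if ti == t and ei == 1 and g == target)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return observed, expected, 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return observed, expected, chi2, p

# Illustrative data: months of follow-up, event flags and TSKU expression
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
tsku = [9.0, 8.5, 8.0, 7.5, 2.0, 1.5, 1.0, 0.5]
groups = split_by_median(tsku)
obs, expct, chi2, p = logrank_test(times, events, groups)
print(f"observed={obs:.0f}, expected={expct:.2f}, chi2={chi2:.2f}, P={p:.4f}")
```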

Correlation between TSKU expression and gene markers of TIICs in lung cancer
When analyzing the relationships between TSKU expression and the marker genes of different immune cells in LUAD and LUSC (Table 1), including CD8+ T cells, T cells (general), B cells, monocytes, TAMs, M1 and M2 macrophages, neutrophils, NK (natural killer) cells, DCs, exhausted T cells, and different subtypes of CD4+ T cells (T helper 1 (Th1) cells, T helper 2 (Th2) cells, follicular helper T (Tfh) cells, Th17 cells, and Tregs), we found that most of the gene markers of B cells and DCs were significantly correlated with TSKU expression, especially CD19, CD20, CD21, and CD40L for B cells and HLA-DPB1, HLA-DQB1, HLA-DRA, and HLA-DPA1 for DCs (Figure 4A-4D).
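TIMER estimates these marker correlations with Spearman's correlation. As a minimal illustration of that statistic, the sketch below computes Spearman's rho as the Pearson correlation of average ranks, so tied values are handled. The function names and the expression vectors are hypothetical, not TCGA data.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical expression of TSKU and one B cell marker across five samples
tsku = [5.1, 6.3, 4.8, 7.2, 6.0]
cd19_expr = [2.0, 1.2, 2.4, 0.9, 1.5]
print(round(spearman(tsku, cd19_expr), 3))
```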

Prognostication of different NSCLC subtypes defined by the combination of TSKU expression and infiltrating B cell (or DC) levels
Tumor-infiltrating lymphocytes, which have been identified as an independent predictor of survival, can affect cancer prognosis [22,23]. We therefore analyzed the impact of TIICs on the prognosis of NSCLC patients and found that, in LUAD, patients with low levels of infiltrating B cells (HR=1.559; 95% CI, 1.179-2.062, Cox P<0.001) and DCs (HR=1.437; 95% CI, 1.041-1.984, Cox P=0.026) had a poorer prognosis than patients with high levels of infiltrating B cells and DCs (Figure 4E). However, the infiltration levels of B cells (HR=0.872; 95% CI, 0.645-1.180, Cox P=0.354) and DCs (HR=0.829; 95% CI, 0.618-1.113, Cox P=0.202) were not significantly associated with prognosis in LUSC (Figure 4F). Based on the association of infiltrating B cell and DC levels with prognosis in LUAD, we further explored whether combining TSKU expression with infiltrating B cell (or DC) levels distinguished NSCLC patients with different prognoses. Patients with high TSKU expression and low infiltrating B cell levels had poorer survival than those with low TSKU expression and high infiltrating B cell levels (HR=2.016; 95% CI, 1.330-3.057, Cox P=0.001) (Figure 4G). A similar result was observed for infiltrating DC levels (HR=1.678; 95% CI, 1.080-2.607, Cox P=0.021) (Figure 4H). Regardless of the disease subtype (LUAD or LUSC), patients with high TSKU expression and low infiltrating B cell levels had poorer survival than those with low TSKU expression and high infiltrating B cell levels. However, the combination of TSKU expression and infiltrating DC levels did not affect prognosis in either the LUAD or the LUSC dataset (Supplementary Figure 3). These data suggest that the combination of high TSKU expression and low infiltrating B cell levels may be associated with a poor prognosis in NSCLC patients.
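The combined grouping can be sketched as two median splits. The cutoffs TIMER applies are not given in this section, so the median rule, the label strings and the helper names below are assumptions. The resulting 0/1 indicator is the kind of binary covariate a Cox model would take when contrasting the two extreme subgroups; the input values are hypothetical.

```python
from statistics import median

def combined_subtypes(tsku, b_cells):
    """Label each sample by median splits of TSKU expression and B cell level."""
    t_cut, b_cut = median(tsku), median(b_cells)
    labels = []
    for t, b in zip(tsku, b_cells):
        t_lab = "TSKU-high" if t > t_cut else "TSKU-low"
        b_lab = "B-high" if b > b_cut else "B-low"
        labels.append(f"{t_lab}/{b_lab}")
    return labels

def extreme_contrast(labels, worse="TSKU-high/B-low", reference="TSKU-low/B-high"):
    """Keep only the two extreme subgroups; return (sample indices, 0/1 covariate)
    with 1 marking the TSKU-high/B-low group."""
    idx, covariate = [], []
    for i, lab in enumerate(labels):
        if lab in (worse, reference):
            idx.append(i)
            covariate.append(1 if lab == worse else 0)
    return idx, covariate

# Hypothetical TSKU expression and B cell infiltration estimates
tsku = [8.1, 3.2, 7.5, 2.9, 6.8, 4.0]
b_cells = [0.05, 0.20, 0.18, 0.22, 0.04, 0.07]
labels = combined_subtypes(tsku, b_cells)
idx, covariate = extreme_contrast(labels)
print(labels)
print(idx, covariate)
```

Mixed groups (for example TSKU-high/B-high) are dropped from this contrast, which mirrors the comparison of the two extreme subgroups described above.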

Correlation between TSKU promoter hypomethylation and elevated TSKU expression in NSCLC
To clarify whether aberrant promoter methylation affects gene expression, we evaluated the correlation between TSKU methylation in the promoter region and TSKU expression. As analyzed by MEXPRESS, a number of probes in the promoter region showed a negative correlation between TSKU methylation and expression in LUAD and LUSC (Supplementary Figure 4). We further analyzed the correlation between TSKU methylation and expression in the TCGA LUAD and LUSC datasets using the MethHC database. Methylation across all CpG sites (probes) in the promoter was significantly negatively correlated with TSKU expression in both LUAD (cor=-0.598, P<0.001) and LUSC (cor=-0.351, P<0.001) (Figure 5A, 5D). In LUAD, individual promoter probes also showed significant negative correlations between methylation and expression, including cg20708135 (cor=-0.598, P<0.001) and cg20886049 (cor=-0.558, P<0.001) (Figure 5B, 5C). A similar trend was observed in LUSC for the cg20708135 (cor=-0.329, P<0.05) and cg20886049 (cor=-0.374, P=0.004) probes (Figure 5E, 5F).

Correlation between TSKU methylation and the proportion of infiltrating immune cells in LUAD and LUSC
We calculated the proportion of infiltrating immune cells in every sample using the EpiDISH (Epigenetic Dissection of Intra-Sample Heterogeneity) algorithm and TCGA Infinium 450K methylation data from the LUAD and LUSC datasets (Figure 6A, 6B). In both datasets, cancer tissues contained higher proportions of infiltrating B cells, NK cells, CD4+ T cells, and granulocytes than of CD8+ T cells and monocytes. Furthermore, the abundance of B cells and CD8+ T cells was significantly higher in cancer tissues than in normal tissues, whereas the level of granulocytes was lower in cancer tissues than in normal tissues (Figure 6C, 6D). We further compared the proportions of different TIICs between groups with high and low TSKU methylation levels in the TCGA LUAD and LUSC samples (Figure 6E, 6F) and found that the proportion of B cells was higher in cancer tissues with TSKU hypermethylation than in those with TSKU hypomethylation. In contrast, the level of monocytes was higher in hypomethylated samples than in hypermethylated samples.
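The statistical test behind this group comparison is not named in this section. A minimal sketch, assuming samples are split at the median TSKU beta value and B cell fractions are compared with a two-sided Mann-Whitney U test (normal approximation, without tie or continuity correction); the helper names and all values are hypothetical.

```python
import math
from statistics import median

def mann_whitney(a, b):
    """Two-sided Mann-Whitney U test, normal approximation,
    no tie or continuity correction. Returns (U for group a, P)."""
    u = 0.0
    for xa in a:
        for xb in b:
            if xa > xb:
                u += 1.0
            elif xa == xb:
                u += 0.5              # ties count as half
    n1, n2 = len(a), len(b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))

def compare_b_cells_by_methylation(tsku_beta, b_fraction):
    """Split samples at the median TSKU beta value (assumed cutoff) and
    compare B cell fractions: hypermethylated vs hypomethylated."""
    cut = median(tsku_beta)
    hyper = [f for m, f in zip(tsku_beta, b_fraction) if m > cut]
    hypo = [f for m, f in zip(tsku_beta, b_fraction) if m <= cut]
    return mann_whitney(hyper, hypo)

# Hypothetical TSKU beta values and estimated B cell fractions per tumor
tsku_beta = [0.12, 0.35, 0.18, 0.62, 0.55, 0.71, 0.09, 0.48]
b_fraction = [0.04, 0.08, 0.05, 0.15, 0.11, 0.19, 0.03, 0.09]
u, p = compare_b_cells_by_methylation(tsku_beta, b_fraction)
print(f"U={u}, P={p:.3f}")
```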

TSKU methylation status and prognosis in different cancers
In light of the significant negative correlation between TSKU methylation and expression, we further analyzed the association between TSKU methylation levels and overall survival in 24 cancer types from TCGA data via the MethSurv database (Supplementary Table 3). Low TSKU methylation was associated with poor prognosis in ACC, BRCA, KICH, LGG (brain lower grade glioma), and PAAD, while high TSKU methylation was associated with good prognosis in KIRC and UCEC (uterine corpus endometrial carcinoma).

DISCUSSION
In this study, we found for the first time that the levels of TSKU methylation and expression significantly correlated with tumor-infiltrating B cell levels in NSCLC. In addition, high TSKU expression combined with low tumor-infiltrating B cell levels may influence the prognosis of patients with NSCLC.
Using the Oncomine and TIMER databases, we found consistent differential TSKU expression between tumor and normal tissues in lung, breast, kidney, and liver cancers (Figures 1A, 1B). We further analyzed the association between TSKU expression and the prognosis of these cancers and found that, among them, high TSKU expression was associated with poor OS only in lung cancer (Figure 2A, 2B; Supplementary Figure 2A-2G). In addition, we found only one previous study on the functional mechanism of TSKU in lung cancer [17], which supports our exploration of the association between TSKU expression and the prognosis of lung cancer patients based on public databases. These results suggest that TSKU may be a potential independent prognostic biomarker in lung cancer.
Previous studies showed that TSKU acts as a modulator of the wound healing process by inhibiting TGF-β secretion from macrophages [18]. Because TGF-β is a pleiotropic cytokine with immunoregulatory properties that activates the differentiation and proliferation of immune cells, TSKU may be involved in immunoregulation and related to immune infiltrating cells. We therefore analyzed the relationship between TSKU and the levels of tumor immune infiltrating cells to explore whether it is associated with the prognosis of lung cancer, and found that high TSKU expression correlated with low B cell and CD4+ T cell infiltration levels in both LUAD and LUSC (Figure 3A-3D). Moreover, we observed correlations between TSKU expression and gene markers of B cells and DCs, suggesting that TSKU expression might play a role in regulating tumor immunity in both LUAD and LUSC (Table 1). Although these correlations were not very strong, low levels of B cell and DC infiltration, particularly of B cell infiltration, were associated with poor prognosis in LUAD (Figure 4E). We further found that the combination of high TSKU expression and low B cell infiltration identified a group of NSCLC patients with poor survival (Figure 4G). These results suggest that co-assessing TSKU expression and B cell infiltration levels may provide a useful assessment of the immunologic state in NSCLC and, in turn, of patient survival.
Previous studies offer clues to the possible mechanisms that may explain why elevated TSKU expression and a low level of infiltrating B cells are associated with poor survival in NSCLC. TSKU, a 37 kDa core protein, is a prototype class IV SLRP that is considered a structural element of the extracellular matrix (ECM) [24]. Like TSKU, decorin (DCN) and biglycan (BGN) are SLRPs whose expression is altered in various cancers with diverse clinical outcomes, and BGN serves as a potential marker of cancer proliferation associated with poor clinical outcome [25][26][27]. Moreover, CD40, a marker in DLBC, is co-expressed with BGN, and its expression is associated with a superior prognosis [28]. A previous study also confirmed that TSKU is more highly expressed in lung cancer tissues (N=62) and cells and promotes cancer cell proliferation [17]. TSKU expression may therefore be related to clinical outcome, possibly through a mechanism in which TSKU regulates B cell function in NSCLC. Nevertheless, the mechanisms by which high TSKU expression leads to poorer survival in NSCLC patients with low levels of infiltrating B cells require further study.
Another important finding of this study was the significant negative correlation between methylation and expression in the TSKU promoter region (probes cg20708135 and cg20886049) (Figures 5A-5F). However, we did not observe a significant association between TSKU methylation and prognosis in NSCLC (Supplementary Table 3). One possible reason is that methylation is not the only factor regulating gene expression: other factors, including copy number alterations, transcription factor production and recruitment, histone modifications, and microRNA expression, may also regulate TSKU expression [29]. In addition, the TSKU probes on the TCGA Illumina Infinium HumanMethylation450 BeadChip are limited and do not cover all CpG sites, which restricts the analysis of effects on prognosis. It is therefore necessary to further explore factors other than methylation that affect TSKU expression. For now, our results preliminarily suggest that TSKU hypomethylation in the promoter region increases TSKU expression and may worsen the clinical outcome of patients. More importantly, we first used methylation levels in patients with NSCLC to evaluate the abundance of six types of TIICs (Figure 6A, 6B). The proportions of B cells and CD8+ T cells were higher in tumors than in normal tissues (Figure 6C, 6D). When cancer tissues were stratified by TSKU methylation level, tumors with TSKU hypomethylation had a lower proportion of B cells (Figure 6E, 6F). These results are consistent with the lower infiltrating B cell levels observed in samples with high TSKU expression.
TILs have been identified as a favorable prognostic marker that plays a critical role in shaping tumor development and determining treatment responses in the tumor microenvironment [30]. Our choice of DNA methylation to estimate the composition and purity of TIICs was based on the following studies. First, a previous study demonstrated that DNA methylation might represent a specific biomarker for distinguishing immune cell subtypes [11]. Additionally, in 2019, Loo Yau et al. found that aberrant epigenomes, including methylation alterations, in cancer cells and infiltrating immune cells play a critical role in driving or mediating tumor progression and provide a vulnerability that may be exploited by epigenetic therapy [31]. Recent studies have often used TCGA DNA methylation data to accurately estimate tumor purity and cellular composition with algorithms such as MethylCIBERSORT, EpiDISH, and CP (constrained projection). EpiDISH showed robust correlations and outperformed both CP and MethylCIBERSORT in estimating mixed cell proportions [32][33][34]. We therefore selected the EpiDISH deconvolution method to evaluate intra-sample heterogeneity for six types of TIICs. Advances in deconvolution methods that estimate both tumor purity and composition from DNA methylation data might reveal potential biomarkers of immunotherapy response and increase our understanding of the contribution of the tumor microenvironment in lung cancer.
In this study, we first evaluated the abundance of six TIICs in LUAD and LUSC methylation data using the EpiDISH algorithm. More extensive studies are needed to determine the generality and feasibility of the EpiDISH method in other tumor tissues. Additionally, whether TSKU promoter methylation affects TSKU expression and clinical outcome should be further validated in large NSCLC patient cohorts.
In summary, TSKU overexpression combined with low infiltrating B cell levels may influence the prognosis of NSCLC patients. Our study provides insights into the potential role of TSKU in tumor immunology and supports its potential as a prognostic biomarker.

Oncomine database analysis
We compared TSKU mRNA levels in multiple cancers with those in the corresponding normal tissues using the Oncomine database (http://www.oncomine.org). The thresholds were set at P=1E-5 and a fold change of 1.5.
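How these two thresholds combine is easy to misread. A minimal sketch, assuming a comparison is retained when P ≤ 1E-5 and the tumor/normal fold change is at least 1.5 in either direction (that is, ≥ 1.5 or ≤ 1/1.5); the helper name is hypothetical, the comparisons are illustrative, and this is not Oncomine's own code.

```python
def passes_thresholds(p_value, fold_change, p_max=1e-5, fc_min=1.5):
    """Return True if a tumor-vs-normal comparison meets both thresholds.

    fold_change is tumor/normal on a linear scale; higher and lower expression
    in tumors both qualify when the change is at least fc_min-fold.
    """
    if p_value > p_max:
        return False
    return fold_change >= fc_min or fold_change <= 1.0 / fc_min

# Illustrative comparisons: (P-value, tumor/normal fold change)
for p, fc in [(2e-7, 2.3), (3e-6, 0.55), (4e-4, 3.0), (1e-8, 1.2)]:
    print(p, fc, passes_thresholds(p, fc))
```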

PrognoScan database analysis
The associations between the expression of TSKU and survival in various types of cancer were analyzed using the PrognoScan database (http://www.abren.net/PrognoScan/) [35]. The significance threshold was a Cox P-value < 0.05.
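For a binary high/low TSKU covariate, a Cox P-value of this kind can be sketched with a univariate proportional-hazards fit. This is not PrognoScan's implementation: it is a minimal Newton-Raphson fit of the Cox partial likelihood with the Breslow approximation for ties, returning the hazard ratio, its 95% CI and the two-sided Wald P. The function names are hypothetical and the survival data are illustrative only.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile

def _score_and_information(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood
    for one covariate, using the Breslow approximation for tied event times."""
    score = info = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        risk = [xi for ti, xi in zip(times, x) if ti >= t]   # risk set at t
        dead = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei == 1]
        w = [math.exp(beta * xi) for xi in risk]
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, risk))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, risk))
        d = len(dead)
        score += sum(dead) - d * s1 / s0
        info += d * (s2 / s0 - (s1 / s0) ** 2)
    return score, info

def cox_univariate(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t|x) = h0(t) * exp(beta * x) by Newton-Raphson.

    Returns (HR, lower 95% CI, upper 95% CI, two-sided Wald P).
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_information(beta, times, events, x)  # at final beta
    se = 1.0 / math.sqrt(info)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return math.exp(beta), math.exp(beta - Z95 * se), math.exp(beta + Z95 * se), p

# Illustrative data: x = 1 for high TSKU expression, 0 for low
times = [2, 4, 6, 8, 3, 5, 7, 9, 10]
events = [1, 1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 1, 1, 1, 0, 0, 0, 0, 0]
hr, lo, hi, p = cox_univariate(times, events, x)
print(f"HR={hr:.2f} (95% CI {lo:.2f}-{hi:.2f}), Wald P={p:.3f}")
```

Newton-Raphson is used because the partial log-likelihood is concave; it does not converge when one group has all events before the other (complete separation), which a production tool must handle separately.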

TIMER database analysis
TIMER is an integrative database for analyzing immune infiltrates across cancer types (https://cistrome.shinyapps.io/timer). It includes information on TIICs in over 10,000 tumor samples across 32 cancer types from TCGA, estimated by applying a statistical deconvolution method to gene expression profiles [36,37]. We first used TIMER to validate the differential TSKU expression between tumor and normal tissues observed in the Oncomine analysis. We then analyzed the correlations between TSKU expression and the abundance of infiltrating immune cells, including B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells, in different cancer tissues, and analyzed the association of TIICs with the prognosis of lung cancer patients. The correlations between TSKU expression and gene markers of TIICs (CD8+ T cells, T cells (general), B cells, monocytes, TAMs, M1 macrophages, M2 macrophages, neutrophils, NK cells, DCs, Th1 cells, Th2 cells, Tfh cells, Th17 cells, Tregs, and exhausted T cells) were estimated by Spearman's correlation [38,39].

MethHC database analysis
The MethHC database (http://awi.cuhk.edu.cn/~MethHC/methhc_2020/php/index.php) integrates data on DNA methylation, gene expression, and the correlations between methylation and gene expression for different TCGA cancers [41]. We analyzed the correlation between differential methylation and expression of TSKU in both LUAD and LUSC datasets using the MethHC database.

MEXPRESS database analysis
MEXPRESS is a data visualization tool designed for the easy visualization of TCGA expression, DNA methylation, and clinical data (http://mexpress.be/) [42]. We analyzed the methylation of TSKU with probes distributed in different regions and visualized the correlation between TSKU methylation and expression via the localization of each probe.

MethSurv database analysis
The MethSurv database (https://biit.cs.ut.ee/methsurv/) performs univariable and multivariable survival analysis based on DNA methylation data from TCGA [43]. We evaluated the associations between methylation levels of TSKU and prognosis in multiple tumor types.

EpiDISH package analysis
EpiDISH is an R package for inferring the proportions of a priori known cell subtypes present in a sample that is a mixture of such cell types. The package also identifies differentially methylated cell types and the direction of their methylation change, and covers six cell subtypes (B cells, CD4+ T cells, CD8+ T cells, NK cells, monocytes, and granulocytes; granulocytes comprise neutrophils and eosinophils) [32,34]. We assessed the proportions of the six tumor-infiltrating cell types in tumor and normal tissues from lung cancer patients by applying the EpiDISH algorithm to TCGA Infinium HumanMethylation450 array data. Based on the abundance of the six immune cell types in each patient, we compared the proportions of different TIICs between groups with high and low TSKU methylation levels in the LUAD and LUSC datasets.
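EpiDISH itself offers several reference-based estimators (RPC, CBS and CP). The sketch below does not reproduce any of them. It swaps in plain non-negative least squares (`scipy.optimize.nnls`) followed by rescaling to sum to one, to illustrate the general idea of reference-based deconvolution from beta values. The function names, the reference matrix and the fractions are made up for illustration.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_fractions(reference, sample_beta):
    """Estimate cell-type fractions of one bulk sample.

    reference: (n_cpgs, n_cell_types) reference beta values;
    sample_beta: (n_cpgs,) beta values of the bulk sample.
    NNLS keeps the weights non-negative; they are then rescaled to sum to 1.
    """
    weights, _residual = nnls(reference, sample_beta)
    total = weights.sum()
    return weights / total if total > 0 else weights

def estimate_fraction_matrix(reference, beta_matrix):
    """Apply estimate_fractions to every sample (column) of a CpG x sample matrix."""
    return np.column_stack([estimate_fractions(reference, beta_matrix[:, j])
                            for j in range(beta_matrix.shape[1])])

# Made-up reference: 6 CpGs x 3 hypothetical cell types
reference = np.array([
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.3],
    [0.1, 0.2, 0.9],
    [0.7, 0.6, 0.1],
    [0.3, 0.9, 0.6],
    [0.8, 0.3, 0.2],
])
true_fractions = np.array([0.5, 0.3, 0.2])
mixture = reference @ true_fractions      # noise-free simulated bulk sample
print(np.round(estimate_fractions(reference, mixture), 3))
```

With real data the reference would hold cell-type-specific CpGs and the mixture would be noisy, so recovered fractions are approximate; the noise-free example only shows that the estimator returns the known mixing proportions.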

Statistical analysis
Immune cell proportions estimated from gene expression data were downloaded from the TIMER database, and HumanMethylation450 data for quantifying immune infiltration were downloaded for the TCGA lung cancer datasets from the NCI GDC. These data were analyzed using R (version 3.5.2) and GraphPad Prism 8.00 software (La Jolla, CA, USA). All P values were two-sided, and P<0.05 was considered statistically significant in all analyses.