FAM72 serves as a biomarker of poor prognosis in human lung adenocarcinoma

FAM72A–D promote the self-renewal of neural progenitor cells. There is accumulating evidence that FAM72 promotes tumorigenicity. However, its effects in lung adenocarcinoma (LUAD) have not been determined. Thus, we evaluated the prognostic value of FAM72A–D in LUAD using bioinformatics approaches. In particular, we evaluated the relationship between FAM72 and LUAD using a wide range of databases and analysis tools, including TCGA, GEO, GEPIA, Metascape, cBioPortal, and MethSurv. Compared with its expression in normal lung tissues, FAM72 expression was significantly increased in LUAD tissues. A univariate Cox analysis showed that high FAM72 expression levels were correlated with a poor OS in LUAD. Additionally, FAM72 expression was independently associated with OS through a multivariate Cox analysis. GO and GSEA revealed enrichment in mitotic nuclear division and cell cycle. Moreover, high FAM72 expression was associated with poor survival. An analysis of immune infiltration showed that FAM72 is correlated with immune cell infiltration. Finally, we found that the methylation level was associated with prognosis in patients with LUAD. In summary, these results indicate that FAM72 is a potential molecular marker for poor prognosis in LUAD and provide additional insight for the development of therapies and prognostic markers.


INTRODUCTION
Lung adenocarcinoma (LUAD) and lung squamous cell carcinoma are the two main types of non-small cell lung cancer (NSCLC) [1]. LUAD is the most common histological type of NSCLC [2]. Lung cancer has become the second most common cancer in both men and women. It accounts for 25% of all cancer-related deaths. During 2007-2013, the 5-year relative survival rate of lung cancer was only 18%. It is usually diagnosed at a relatively advanced stage [3,4]. Currently, the treatment for LUAD mainly includes surgical resection, chemotherapy, radiotherapy, and molecular targeted therapy. However, compared with the steadily increasing survival rates for many cancers, the overall survival (OS) for lung cancer has not improved substantially, primarily owing to the lack of effective therapeutic targets [5]. The most important and only curable treatment for LUAD is surgical resection. However, once distant metastasis occurs, LUAD cannot be cured through surgery. Therefore, there is an urgent need to identify novel molecular mechanisms and effective therapeutic targets for LUAD.
FAM72A-D (Family with sequence similarity 72 member A-D) is composed of four human-specific paralogs (A-D). FAM72 is a protein-coding neural stem cellspecific gene [6,7]. However, under certain conditions, FAM72 may lead to post-mitotic neuronal death and may promote the occurrence and development of cancers in tissues other than neuronal tissues [6,8,9], including breast cancer, prostate cancer, and glioblastoma [6,10,11]. FAM72 is associated with cancer cell division, proliferation, and differentiation. Additionally, high levels of FAM72 are associated with hypomethylation and affect the prognosis in various cancers [11,12]. These results suggest that FAM72 is a prognostic biomarker in cancer. However, it has not been established In this study, we performed the first analysis of the functions of FAM72 in LUAD. In particular, The Cancer Genome Atlas (TCGA) was utilized to analyze the expression of FAM72A-D and its clinicalprognostic value in LUAD. We validated the prognostic value in four independent datasets from the Gene Expression Omnibus (GEO) database. We first analyzed the expression differences of FAM72 in patients with LUAD between tumor tissue and normal tissue and then explored correlations between expression levels and clinicopathological parameters based on univariate and multivariate Cox regression models. We further developed a nomogram to predict prognosis and analyzed its predictive performance. To gain a more indepth understanding of the biological mechanisms underlying the effects of FAM72, we performed a Gene Ontology (GO) analysis and a gene set enrichment analysis (GSEA). Finally, we examined the correlations between FAM72 and mutations, immune infiltration, and methylation and comprehensively explored the link between FAM72 and tumorigenesis.

Clinical characteristics
Data (shown in Table 1) were collected from TCGA, including gene expression data and clinical data. Patient characteristics, including age, gender, pack-years, race, tumor site, epithelial growth factor receptor (EGFR) status, anaplastic lymphoma kinase (ALK) status, kirsten rat sarcoma viral oncogene (KRAS) status, TNM stage, pathological stage, and gene expression, were collected.

Expression status of FAM72 in LUAD tissues
As evaluated by the Wilcoxon rank-sum test, the expression of FAM72A-D was significantly higher in LUAD tissues than in normal tissues (P < 0.001) ( Figure  1A-1D). FAM72A-D was significantly overexpressed in LUAD (P < 0.001), revealing that the expression of FAM72A-D is associated with lung carcinogenesis. As shown in Figure 1E

Association between FAM72 and survival
OS was significantly reduced in patients with high FAM72A-D expression than in patients with low FAM72A-D expression (P ≤ 0.001) (Figure 2A-2D). To verify the association between FAM72A-D expression and OS, we used the GSE13213, GSE30219, GSE41271, and GSE50081 datasets from the GEO database. In these datasets, OS was significantly reduced in patients with high FAM72 expression than in those with low FAM72 expression (P = 0.009, 0.009, 0.038, and 0.005, respectively) ( Figure 2E-2H). We next used a univariate Cox regression model to analyze prognostic factors in LUAD (Table 2). In this analysis, high FAM72 expression levels were correlated with a worse OS. We then performed a multivariate analysis with the Cox regression model. Owing to missing data exceeding 20%, the M stage was not included in this analysis. The results indicated that FAM72A-D expression (all P < 0.01), age, and pathological stage are independently associated with OS. These findings demonstrated that increased FAM72 expression is correlated with a poor OS in LUAD.

Development of a prognostic model based on FAM72 and clinicopathological factors
A nomogram integrating FAM72A-D expression and independent clinical risk factors (age and pathological stage) was constructed ( Figure 3A-3D). A worse prognosis was represented by a higher total number of points on the nomogram. The C-index values were 0.7, 0.7, 0.69, and 0.68 for FAM72A-D based on 1000 bootstrap replicates. The deviation correction line in the calibration plot was close to the ideal curve (i.e., a 45° line), indicating that the prediction results are in good agreement with the observation results ( Figure 4A-4D).

FAM72-related functional enrichment analysis
A GO enrichment analysis of genes with correlated expression revealed various overrepresented terms in the three main functional groups (Figure 5A-5D): cellular component, biological process, and molecular function. In the cellular component category, FAM72 and genes with similar expression patterns were mainly involved in mitotic nuclear division, spindle, cell cycle, and DNA replication. Figure 6A-6D shows an interactive network of the most highly enriched terms (colored by cluster-ID, where distinct colors indicate enriched pathways).

FAM72-related signaling pathways obtained by GSEA
RNA-seq data obtained from TCGA were used to compare groups with high and low FAM72A-D expression. Approximately 20,000 differentially expressed genes were identified between the high and low expression groups. Based on a GSEA of all differentially expressed genes, various significantly enriched signaling pathways were identified, including EGFR signaling, lung cancer poor survival, undifferentiated cancer, proliferation, and cell cycle, as determined by the AGING normalized enrichment score (NES), adjusted P-value, and false discovery rate (FDR) ( Figure 7A-7D). Several pathways enriched in FAM72A-D were related to LUAD, including proliferation, EGFR signaling, undifferentiated cancer, lung cancer poor survival, and cell cycle pathways ( Figure 8A).

Genetic mutations in FAM72 and their associations with OS
We analyzed genetic alterations in FAM72A-D and their associations with OS in LUAD. As shown in Figure 8B, 8C, patients with LUAD showed a high FAM72A-D mutation rate. Among 510 patients with LUAD, genetic alterations were found in 122 patients, with an overall mutation rate of 24% and specific mutation rates of 14%, 17%, 12%, and 18% for the four loci, respectively. Furthermore, genetic alterations in FAM72A-D were associated with a poor OS in patients with LUAD. These results implied that mutations in FAM72A-D could affect prognosis in LUAD.

Correlations between FAM72 expression and immune infiltration
As shown in Table 3, activated CD4 T cells were significantly positively correlated with FAM72A-D expression. In other subsets, we found that FAM72 expression was associated with plasmacytoid dendritic cell, natural killer T cell, monocyte, mast cell, macrophage, immature dendritic cell, eosinophil, activated dendritic cell, type 2 T helper cell, type 17 T helper cell, T follicular helper cell, memory B cell, immature B cell, gamma delta T cell, effector memory CD4 T cell, central memory CD4 T cell, activated CD4 T cell, and activated B cell counts. PDCD1 (PD-1) and CD274 (PD-L1) levels were positively correlated with FAM72A-D expression (Table 4).

Correlation between FAM72 expression and methylation
As shown in Figure 8A, the methylation level of FAM72 is low. Its expression may be related to the hypomethylation level ( Figure 9A-9C). MethSurv was used to evaluate the effect of hypomethylation levels and FAM72 expression on prognosis in LUAD. We discovered that cg09169215, located in a CpG island, was associated with a poor prognosis ( Figure 9D).

DISCUSSION
Owing to the lack of early clinical features of LUAD, a large number of patients develop metastases [3,4]. Currently, the primary clinical treatments for advanced LUAD are chemotherapy, radiotherapy, and targeted therapy; however, these approaches are limited with respect to improving survival. Moreover, the prognosis is poor for stage III or higher in NSCLC, with fewer than 15% of patients surviving 5 years [13]. Most patients eventually die from chemotherapy resistance and cancer progression. Therefore, the identification of new molecular mechanisms is necessary to develop therapeutic targets in LUAD.
Our results indicated that FAM72 may be a prognostic biomarker in LUAD. High FAM72 expression was associated with a poor survival rate. FAM72 mutations were also associated with poor survival. FAM72 has been reported to promote cancer cell proliferation in glioblastoma and multiple myeloma [11,12]. Rahane et al. indicated that the increased expression of mitotic FAM72 in tumor cells is caused by upstream mutations in primitive oncogenes or in tumor suppressor genes, such as EGFR, RAS, BRAF, and TP53, resulting in increased cell proliferation. Silencing neural stem cellspecific FAM72 may prevent cancer cell proliferation [11]. These previous findings suggest that FAM72 is an oncogene.
Similarly, our results demonstrated FAM72 was upregulated in LUAD tissues, supporting its potential role in the development of LUAD. FAM72 paralogs might contribute to tumorigenesis by the activation of centrosome and mitotic spindle formation via mitotic cell cycle genes KIF23, ASPM, CEP55, KIF14, SGO1, CENPE, and BUB1 [11]. In addition, several studies have shown that FAM72 may be a target for tumor therapy [6,[10][11][12]14]. However, the association between FAM72 and LUAD has not been explored to date. Therefore, our results provide the first evidence for its therapeutic and prognostic value in LUAD.
Key terms identified in a GO enrichment analysis, such as mitotic nuclear division, spindle, cell cycle and DNA  Immunotherapy is a promising approach in tumor therapy. The level of T cell immune infiltration is related to the efficacy of immunotherapy and is therefore the focus of current cancer immunotherapy research [15]. A large number of studies have shown that high levels of B cells and T cells, such as adenocarcinoma B cells and breast CD8T, are associated with a better OS in many types of cancer, including LUAD [16]. NK cells are important mediators of anti-tumor immunity and ultimately participate in malignant cell killing [17]. Our results showed that FAM72A-D expression levels are correlated with immune infiltration, including levels of T cells, B cells, eosinophils, NK cells, monocytes, mast cells, macrophages, and dendritic cells. These results suggest that the expression level of FAM72 may indicate the level of immune infiltration, providing a reference for the application of immunotherapy in LUAD. In addition, we found that PDCD1 (PD1) and CD274 (PD-L1) expression levels are positively correlated with FAM72A-D expression levels. High levels of PD-L1 have been detected in many tumors, including NSCLC, and are associated with poor prognosis [18]. However, some studies have AGING indicated that elevated PD-L1 may be related to a better response to immunotherapy. A randomized phase 3 trial of pembrolizumab and docetaxel in patients with PD-L1-positive tumors suggested that the survival benefit was associated with PD-L1 expression (50% expression of PD-L1, HR = 0.53; 1-49% expression, HR = 0.76) [19]. Another study has shown that patients with PD-L1 overexpression have better clinical outcomes after anti-PD-1 therapy than those of patients with low expression [20]. Our results indicated that FAM72 is a candidate predictive biomarker for the efficacy of immunotherapy and is associated with prognosis in patients with LUAD receiving immunotherapy.

AGING
Several studies have demonstrated an inverse relationship between hypomethylation and gene expression in NSCLC [21,22]. Sato et al. indicated that high gene expression induced by hypomethylation is associated with poor prognosis in NSCLC [23,24]. Previous studies have shown that high FAM72 expression is associated with hypomethylation and outcomes in cancer [11,12]. We found that FAM72A-D show hypomethylation and that the hypomethylation level is associated with a poor   AGING prognosis in LUAD. Therefore, FAM72 expression may be epigenetically regulated by DNA hypomethylation, providing an additional prognostic factor.

AGING
Although our findings improve our understanding of the relationship between FAM72A-D and LUAD, our study had some limitations. First, the data were obtained from public databases, and data quality cannot be assessed. Further experimental studies are needed to confirm our results. Second, the specific role of FAM72 in the development of LUAD should be comprehensively evaluated, and a broader range of clinical factors should be considered. These analyses were limited by a lack of some information in public databases. Third, the correlation between FAM72 expression at the mRNA and protein levels needs to be verified by cellular and clinical experiments. Fourth, we cannot determine the direct mechanisms by which FAM72 promotes the development of LUAD, which should be a focus of future research. To further investigate the mechanism underlying the effects of FAM72 in LUAD, we plan to conduct cellular experiments in the near future.

AGING
In summary, the results of the present study partially revealed the roles of FAM72A-D in LUAD, providing a potential therapeutic and prognostic biomarker. In particular, we found that FAM72 may be a molecular marker of poor prognosis in LUAD. FAM72 mutations were associated with poor survival. Additionally, FAM72 may be a predictive biomarker for the efficacy of immunotherapy as well as prognosis in patients with LUAD undergoing immunotherapy. Finally, hypomethylation was associated with high FAM72 expression, providing additional insight into the molecular mechanisms underlying LUAD.

Clinical information from TCGA and GEO databases
Gene expression data (HTSeq-Counts and HTSeq-FPKM), phenotype data, and detailed clinicopathological information for TCGA-LUAD were obtained using the UCSC Xena browser (version: 07-20-2019, https://xenabrowser.net/datapages/). Sequence data were retrieved using the Illumina HiSeq_RNA_Seq platform. HTSeq-FPKM gene expression data were transformed into TPM (transcripts per million reads) for subsequent analyses. TPM yields more similar results to those produced by a microarray approach and facilitates comparisons among samples [25]. The inclusion criteria for data retrieval from the two databases were as follows: (1) pathological diagnosis of adenocarcinoma, (2) complete gene expression data, and (3) complete survival information. GEO datasets used GPL570, GPL6480 and GPL6884 platforms. The probe ID was converted to the gene symbol according to the related annotation file, and the average expression values for multiple probes corresponding to the same gene were calculated. Data used in the study were in accordance with publication guidelines provided by TCGA and GEO. No studies directly involving human participants or animal experiments were included. Ethics approval and informed consent were not required.

Statistical analysis
All statistical analysis was performed using the IBM SPSS statistical package (SPSS 26.0) and R (4.0.2). The Wilcoxon rank-sum test was used to compare the expression of FAM72A-D between LUAD and normal groups. The relationships between clinicopathologic parameters and FAM72A-D expression levels were evaluated with Wilcoxon signed-rank tests.

Construction and evaluation of the nomogram and prognostic model
A nomogram was constructed based on the optimal multivariate Cox regression analysis to predict the 1-year, 3-year, and 5-year survival probabilities. The rms R package (v. 6.0-1, https://cran.r-project.org/web/packages/ rms/index.html) was used to produce a nomogram. The concordance index (C-index) and calibration plot are frequently used to evaluate the quality of nomogram models. The C-index and calibration plot were evaluated using the Hmisc R package (v. 4.4-1, https://cran.rproject.org/web/packages/Hmisc/index.html). In this study, the C-index was used to determine the discrimination ability with 1000 bootstrap replicated. As the C-index increases, the prediction accuracy increases.

AGING
The calibration curve was visually evaluated by mapping the nomogram predictions to observed probabilities, and a 45° line represented optimal predictive values.

Functional enrichment analysis
GEPIA, a database based on the UCSC Xena project (http://xena.ucsc.edu), enables dynamic analyses and visualization of TCGA gene expression profile data [26]. In our study, GEPIA2.0 (http://gepia2.cancer-pku.cn/) was used to find genes with highly correlated expression levels to those of FAM72A-D in LUAD. The correlations between levels of these genes and FAM72A-D were greater than 0.62. Then, the functions of FAM72A-D and genes with correlated expression in TCGA-LUAD were predicted by a GO enrichment analysis, as implemented using Metascape (http://metascape.org) [27].

Gene set enrichment analysis
Expression profiles (HTSeq-Counts) were compared between high and low FAM72A-D expression groups to identify differentially expressed genes using the DESeq2 R package (v. 1.28.1, http://www.bioconductor.org/ packages/release/bioc/html/DESeq2.html) in R (4.0.2). A GSEA of approximately 20,000 differentially expressed genes was performed. GSEA determines whether a set of prior defined genes show statistically significant and consistent differences between two biological states [28,29]. In this study, GSEA was performed using the Molecular Signatures Database (MSigDB) Collection (c2.all.v7.0.entrez.gmt) of the clusterProfiler R package (3.18.0, http://bioconductor.org/packages/release/bioc/ html/clusterProfiler.html) to identify statistically significant pathway differences between high and low FAM72A-D expression groups in LUAD. The expression level of FAM72A-D was used as a phenotype label. Pathway terms with adjusted P-value < 0.05 and FDR q-value < 0.25 were considered significantly enriched.

FAM72A-D mutations and prognosis
cBioPortal (https://www.cbioportal.org) can be used to explore, visualize, and analyze multidimensional cancer genome data [30]. In our study, we analyzed the genomic profiles of FAM72A-D with a z-score threshold of ±1.8. Genetic mutations in FAM72A-D and their association with OS were evaluated.

Analysis of immune infiltration with respect to FAM72A-D expression by ssGSEA
To evaluate correlations between FAM72A-D and levels of immune cell infiltration, the ssGSEA (singlesample Gene Set Enrichment Analysis) method was applied using the GSVA package (v. 1.36.3, http://www.bioconductor.org/packages/release/bioc/htm l/GSVA.html). The immune reference set was obtained from the literature (http://dx.doi.org/10.1016/j.celrep. 2016.12.019) [31]. The relative levels of 28 types of tumor-infiltrating immune cells in the immunocyte signatures, including 782 genes for prediction in individual tissue samples, were evaluated. Based on 28 immunocyte signature genes in the literature, a relative enrichment score for every immunocyte was quantified from the gene expression profile of each tumor sample [32].

Correlation between FAM72A-D expression and methylation
Studies have shown that FAM72 is related to methylation. Therefore, we further analyzed the relationship between FAM72A-D and methylation. Spearman correlation coefficients were determined to evaluate the correlation between FAM72A-D and methylation levels in TCGA-LUAD. For this analysis, MethSurv (https://biit.cs.ut.ee/methsurv/) was used, which is a web tool for univariate and multivariate survival analyses based on DNA methylation biomarkers using TCGA data, containing 25 different types of cancer and 7,358 patients [33].

AUTHOR CONTRIBUTIONS
YL Yu, ZP Wang and JC Li designed this study. YL Yu and QH Zheng contributed to the data collection.