Splicing factor proline- and glutamine-rich is a prognostic biomarker and correlated with clinical pathologic features and immune infiltrates in hepatocellular carcinoma

Background: Hepatocellular carcinoma (HCC) is the fourth leading cause of cancer-related deaths globally. Splicing factor proline- and glutamine-rich (SFPQ) is a multifunctional protein that controls various biological functions. Although SFPQ is a potential therapeutic target and a promising prognostic indicator, its effects and mechanisms in HCC require further investigation. Methods: RNA sequencing data were obtained from the Gene Expression Omnibus, International Cancer Genome Consortium, and The Cancer Genome Atlas databases to analyze SFPQ expression and differentially expressed genes (DEGs). We utilized the LinkedOmics database to identify co-expressed genes. A Venn diagram was constructed to determine the overlapping genes between the DEGs and the co-expressed genes. Functional enrichment analysis was performed on the overlapping genes and DEGs, followed by a protein-protein interaction network analysis and an analysis of immune cell infiltration. The cBioPortal and Tumor Immune Single-cell Hub were utilized to investigate the genetic alterations of SFPQ and to visualize the single-cell transcriptome of the tumor microenvironment. A ceRNA network was established with the assistance of the ENCORI website. Finally, we elucidated the clinical significance of SFPQ in HCC by employing Kaplan-Meier survival analysis, univariate and multivariate Cox regression, and prognostic nomogram models. Results: The expression of SFPQ in HCC tissues was significantly elevated compared to normal tissues. GSEA results indicated that increased expression of SFPQ was associated with pathways related to HCC. The ceRNA network, including SFPQ, hsa-miR-101-3p, AC023043.4, AC124798.1, AC145207.5, and GSEC, was constructed with the assistance of ENCORI. High SFPQ expression was related to a poor prognosis in HCC and its subtypes. Univariate and multivariate Cox regression analysis showed that elevated SFPQ expression is an independent predictive factor. Conclusions: The overexpression of SFPQ may serve as a potential prognostic biomarker, indicating a poor prognosis in HCC.


Introduction
Hepatocellular carcinoma (HCC) is considered one of the most common types of primary liver cancer [1]. In the last several decades, HCC-related mortality has increased more rapidly than deaths associated with any other type of cancer [2]. The majority of people diagnosed with liver cancer are in the middle or late stages of the illness, and their prognosis is poor [3]. Therefore, the search for a more reliable and potent diagnostic biomarker for HCC is necessary.
Splicing factor proline- and glutamine-rich (SFPQ) is a multifunctional protein implicated in the development of various cancers. Previous studies have shown that SFPQ plays a crucial role in post-transcriptional gene silencing [4], DNA double-strand break repair [5], and homologous recombination repair of DNA damage [6]. SFPQ is upregulated in lung cancer mesenchymal stem cells and enhances mesenchymal stem cell proliferation, chemoresistance, and invasion capabilities [7]. The expression and subcellular localization of SFPQ are distinctive features that can serve as diagnostic and treatment-monitoring markers in lung cancer [8]. Additionally, loss of SFPQ/polypyrimidine tract-binding protein-associated splicing factor leads to apoptosis in BRAFV600E-driven colorectal cancer cells [9]. SFPQ is also a significant driver of melanoma, likely because SFPQ-RNA interactions enhance the expression of numerous oncogenic transcripts [10]. SFPQ/p54nrb protects cells from platinum-induced death, ultimately contributing to chemoresistance in epithelial ovarian cancer [11]. However, the prognostic significance and tumor immunology of aberrant SFPQ expression in HCC require further investigation.
This research aims to investigate the association between SFPQ expression and clinicopathological features, as well as its prognostic relevance. The study examines the expression levels of SFPQ and investigates its potential mechanisms through enrichment analysis, immune infiltration analysis, single-cell sequencing analysis, and DNA methylation analysis. Additionally, the establishment of a ceRNA network provides a better understanding of the function and regulatory mechanism of SFPQ. This work provides insights into the processes underlying HCC carcinogenesis and identifies SFPQ as a potential biomarker for the diagnosis and prognosis of HCC.

Source of data and preprocessing
We extracted data from The Cancer Genome Atlas (TCGA) database, which included 419 samples, consisting of 50 para-carcinoma tissues and 369 tumor tissues. For this research, TCGA RNA sequencing data were converted from fragments per kilobase of transcript per million mapped reads (FPKM) format to transcripts per million (TPM). Because TCGA is a publicly accessible database governed by specific regulations, written informed consent had been obtained from all participants before data collection. Additionally, gene expression profiles were sourced from the GSE36376 and GSE64041 datasets in Gene Expression Omnibus and the HCCDB18 dataset from the International Cancer Genome Consortium. Immunohistochemical data on the expression of SFPQ in normal and tumor tissues were obtained from the Human Protein Atlas (HPA) website.
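The FPKM-to-TPM conversion mentioned above can be illustrated with a minimal Python sketch. It assumes the standard per-sample rescaling, TPM_i = FPKM_i / sum_j(FPKM_j) × 10^6; the study itself worked in R, and the function name and toy values below are hypothetical rather than taken from TCGA.

```python
def fpkm_to_tpm(fpkm):
    """Convert one sample's FPKM values to TPM.

    TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6, so each sample sums to one million.
    """
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("sample has no expressed genes")
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


# Toy values for a single sample (illustrative only, not TCGA data).
sample = {"SFPQ": 30.0, "GENE_A": 50.0, "GENE_B": 20.0}
tpm = fpkm_to_tpm(sample)
```

Because the rescaling is applied sample by sample, TPM values are comparable across samples in a way that raw FPKM values are not.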

Co-expression genes and differentially expressed genes in HCC
SFPQ co-expression genes were identified using the LinkedOmics database through the Spearman test. The criteria for co-expression genes were set as false discovery rate (FDR) < 0.05, P < 0.05, and |correlation coefficient| ≥ 0.3. Furthermore, patients with HCC from the TCGA database were categorized into high and low SFPQ expression groups based on the median value of SFPQ expression. Differentially expressed gene (DEG) analysis between these groups was conducted using the R package DESeq2 [12]. The criteria for DEGs were defined as an adjusted P value < 0.05 and |log2-fold change| > 1. The overlapping genes between DEGs and co-expressed genes were identified with the Draw Venn Diagram tool.
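The selection logic described above (median split, Spearman correlation threshold, and overlap with the DEGs) can be sketched in plain Python. This is an illustrative sketch only: LinkedOmics and DESeq2 were the actual tools, the P value and FDR filters are omitted for brevity, and all gene names, values, and function names are hypothetical.

```python
from statistics import mean, median


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def split_by_median(expression):
    """Label samples 'high' if strictly above the median, otherwise 'low'."""
    cutoff = median(expression.values())
    return {s: "high" if v > cutoff else "low" for s, v in expression.items()}


# Toy expression vectors across six samples (hypothetical values).
sfpq = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
others = {
    "GENE_A": [2.0, 4.1, 6.2, 7.9, 10.5, 12.0],
    "GENE_B": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_C": [3.0, 1.0, 2.5, 2.0, 3.5, 1.5],
}
co_expressed = {g for g, v in others.items() if abs(spearman(sfpq, v)) >= 0.3}
degs = {"GENE_A", "GENE_D"}          # hypothetical DEG list
overlap = co_expressed & degs        # the Venn-diagram intersection
groups = split_by_median({"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0})
```

The set intersection at the end is all that the Venn diagram step computes; the diagram itself is only a visualization of it.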

Functional enrichment analysis and gene set enrichment analysis (GSEA)
clusterProfiler 3.6.0 was implemented to conduct gene ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis [13]. Functional enrichment analysis was performed on overlapping genes. Furthermore, GSEA was performed to investigate the biological distinction between patients with high and low SFPQ expression [14]. Enrichment results satisfying P.adj < 0.05 and FDR Q value < 0.25 were considered statistically significant.
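As a rough illustration of what an over-representation analysis computes, the sketch below evaluates a one-sided hypergeometric test for a single term and applies the Benjamini-Hochberg adjustment that underlies an adjusted P value. It is a minimal sketch assuming this common formulation, not the authors' pipeline; the function names and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_term, n_query, n_hits):
    """P(X >= n_hits) when n_query genes are drawn without replacement from
    n_universe genes, of which n_term are annotated to the term."""
    upper = min(n_term, n_query)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_query - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy case: 10 genes in the universe, 5 annotated, 5 queried, all 5 hit.
p_term = hypergeom_enrichment_p(10, 5, 5, 5)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

A term would then be retained when its adjusted value falls below the chosen threshold, such as the 0.05 cutoff stated above.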

Construction of protein-protein interaction (PPI) network
To investigate SFPQ-related proteins, PPI networks associated with SFPQ were constructed independently using the STRING database and the GeneMANIA database. Cytoscape (v3.6.1) with the MCODE plugin was utilized to visualize the PPI network and identify hub genes.

Immune infiltration analysis in HCC
The cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT) algorithm and the single-sample GSEA (ssGSEA) algorithm from the GSVA package in R were utilized to analyze immune infiltration in HCC samples [15]. The Wilcoxon rank-sum test was then employed to compare immune cell levels between the high and low SFPQ expression groups.
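The group comparison can be illustrated with a small Wilcoxon rank-sum (Mann-Whitney U) sketch in Python. It assumes a normal approximation with no tied values and no continuity correction, which keeps the example short; the infiltration scores are made up and the function name is hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Assumes no tied values and applies no continuity correction.
    Returns (U statistic for group_a, z score, two-sided p-value).
    """
    pooled = sorted([(v, "a") for v in group_a] + [(v, "b") for v in group_b])
    rank_sum_a = sum(
        rank for rank, (_, label) in enumerate(pooled, start=1) if label == "a"
    )
    n_a, n_b = len(group_a), len(group_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u_a, z, p_value


# Hypothetical infiltration scores for one immune cell type.
high_sfpq = [0.30, 0.35, 0.40, 0.45, 0.50]
low_sfpq = [0.10, 0.15, 0.20, 0.25, 0.28]
u_stat, z_score, p_val = rank_sum_test(high_sfpq, low_sfpq)
```

With real data containing ties or small groups, an exact or tie-corrected implementation would be preferable to this approximation.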

Genetic alterations and DNA methylation analysis
We utilized the cBioPortal to investigate the SFPQ genetic alterations in the TCGA, Firehose Legacy dataset [16]. Correlation analysis was performed using the MEXPRESS database to further investigate the association between SFPQ expression and DNA methylation [17,18].

Single-cell sequencing analysis
Tumor Immune Single-cell Hub was utilized to examine the heterogeneity of the tumor microenvironment (TME) and visualize the single-cell transcriptome of the TME [19]. Based on four datasets (LIHC_GSE140228_10X, LIHC_GSE140228_Smartseq2, GSE166635, and GSE98638), the expression of SFPQ across multiple cell types in the TME was investigated.

Construction of ceRNA network
The ENCORI website, a database and web platform, offers various computational tools for analyzing the interactions between SFPQ and non-coding RNAs, including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs). Cytoscape was utilized to construct lncRNA-miRNA-mRNA networks.
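A lncRNA-miRNA-mRNA network of this kind is essentially a small graph. The sketch below builds one from the node names reported in the abstract; the edge list, in which every lncRNA is linked to hsa-miR-101-3p, is an assumption made for illustration, and the function names are hypothetical rather than part of ENCORI or Cytoscape.

```python
from collections import defaultdict


def build_network(edges):
    """Return an undirected adjacency map from a list of (node, node) pairs."""
    graph = defaultdict(set)
    for left, right in edges:
        graph[left].add(right)
        graph[right].add(left)
    return dict(graph)


def cerna_triples(graph, lncrnas, mirnas, mrnas):
    """List (lncRNA, miRNA, mRNA) paths that share the same miRNA."""
    triples = []
    for mir in mirnas:
        neighbours = graph.get(mir, set())
        for lnc in sorted(neighbours & set(lncrnas)):
            for gene in sorted(neighbours & set(mrnas)):
                triples.append((lnc, mir, gene))
    return triples


# Node names come from the abstract; the edges themselves are assumed.
LNCRNAS = ["AC023043.4", "AC124798.1", "AC145207.5", "GSEC"]
MIRNAS = ["hsa-miR-101-3p"]
MRNAS = ["SFPQ"]
edges = [(lnc, "hsa-miR-101-3p") for lnc in LNCRNAS]
edges.append(("hsa-miR-101-3p", "SFPQ"))

graph = build_network(edges)
triples = cerna_triples(graph, LNCRNAS, MIRNAS, MRNAS)
```

Each triple corresponds to one competing-endogenous-RNA axis in which a lncRNA and the mRNA share a miRNA.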

The generation of prognostic model
Univariate and multivariate Cox regression analyses were performed to determine whether SFPQ could function as an independent prognostic factor. Clinical parameters, including T stage and tumor status, were included. In addition, the rms and survival packages were used to construct a nomogram and calibration plot for predicting 1-year, 3-year, and 5-year overall survival (OS).

Statistical analysis
Statistical analyses were performed and graphs were generated with R (version 3.6.3). The Wilcoxon rank-sum test was utilized to determine the statistical significance of SFPQ expression between normal and tumor tissues. Survival analysis was performed using the Kaplan-Meier method and the log-rank test, employing the median level of SFPQ expression as the cutoff value. Univariate and multivariate Cox regression, as well as prognostic nomogram models, were utilized to assess the clinical significance of SFPQ in HCC.
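For readers unfamiliar with the Kaplan-Meier method, the following Python sketch shows the product-limit estimator on toy follow-up data. It is illustrative only: the study used R, the log-rank comparison between groups is not shown, and the function name and values are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up durations; events: 1 if death was observed, 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up times in months for one expression group (hypothetical).
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
```

In the analysis described above, one such curve would be estimated for each of the high and low SFPQ expression groups and the two curves compared with the log-rank test.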

Results
Elevated SFPQ expression in tumor samples compared to normal tissues
To investigate the variations in SFPQ expression between tumor and normal tissues, we first evaluated the expression of SFPQ in the TCGA pan-cancer data using the TIMER database (Figure 1). The expression of SFPQ was elevated in eleven cancer types compared to normal tissue, including breast invasive carcinoma, cholangiocarcinoma, colon adenocarcinoma, esophageal carcinoma, head and neck squamous cell carcinoma, liver hepatocellular carcinoma, lung squamous cell carcinoma, prostate adenocarcinoma, rectum adenocarcinoma, stomach adenocarcinoma, and uterine corpus endometrial carcinoma. Conversely, SFPQ expression was low in kidney chromophobe and thyroid carcinoma. These data indicate that SFPQ mRNA is aberrantly expressed in different cancer types. Figure 1 illustrates a significant upregulation of SFPQ expression in HCC tissues compared to normal tissues (P < 0.001). Additionally, the upregulation of SFPQ expression was verified using three datasets: GSE36376, GSE64041, and HCCDB18. The HPA database provides the immunohistochemistry results of SFPQ in HCC tissues and normal tissues, further validating the elevated protein level of SFPQ in HCC tissues. In addition, the HPA database allowed us to compare the 20 genes most significantly associated with an unfavorable prognosis in liver cancer. As shown in Table 1, the expression level of SFPQ ranks fourth among these 20 genes.

Identification of co-expression genes and DEGs in HCC associated with SFPQ
We employed the LinkedOmics database to identify co-expression genes of SFPQ. Applying the criteria FDR < 0.05, P < 0.05, and |correlation coefficient| ≥ 0.3 as thresholds, we identified 3185 co-expression genes. We divided 419 HCC patients into high- and low-SFPQ expression groups based on the median SFPQ expression value. We identified 2036 DEGs (1625 up-regulated and 411 down-regulated) between the high- and low-SFPQ expression groups using a threshold of absolute log-fold change > 1 and P < 0.05 (Figure 2). A total of 713 overlapping genes between DEGs and co-expression genes were identified. The overlapping genes were then subjected to enrichment analysis for further investigation.

Functional annotation of SFPQ-associated differentially expressed genes in HCC
Gene ontology and KEGG enrichment analyses were utilized to investigate the functional significance of the DEGs in high and low SFPQ expression groups (Figure 2). For biological process, nuclear division, chromosome segregation, nuclear chromosome segregation, sister chromatid segregation, and mitotic sister chromatid segregation were the most significant enrichments, accompanied by notable enrichments in the spindle, chromosomal region, condensed chromosome, chromosome, and mitotic spindle at the cellular component level. Molecular function included oxidoreductase activity, monooxygenase activity, microtubule motor activity, and steroid hydroxylase activity. Additionally, the KEGG pathway analysis indicated that SFPQ DEGs are involved in the cell cycle, complement and coagulation cascades, metabolism of xenobiotics by cytochrome P450, and retinol metabolism. Subsequently, GSEA revealed significant differences in multiple liver cancer-associated pathways between the high and low SFPQ expression groups (Figure 3).

Creating protein interaction network
The PPI network of the target protein was constructed using the STRING database and visualized through Cytoscape (v3.6.1) with the MCODE plugin to identify hub genes. As illustrated in Figure 4, the PPI network displayed 141 nodes and 5224 edges, with a score of 74.629. Furthermore, GeneMANIA was utilized to generate and visualize the PPI network. The GeneMANIA analysis revealed 20 genes that are associated with SFPQ (Figure 4), including NONO, CHD3, PSPC1, RAD51D, RAD51, KDM1A, SNRPA, YLPM1, HES1, RPA3, U2AF2, HNRNPM, MATR3, MKI67, TOP1, GSK3B, SND1, LMO7, PINK1, and TNF. The functions of SFPQ and these genes are primarily associated with the regulation of the apoptotic signaling pathway, recombinational repair, negative regulation of the apoptotic signaling pathway, regulation of oxidative stress-induced intrinsic apoptotic signaling pathway, telomere organization, regulation of oxidative stress-induced cell death, and DNA recombination.

Association between SFPQ and immunological infiltration
CIBERSORT and ssGSEA were utilized to investigate the relationship between SFPQ expression and immune infiltration. As shown in Figure 5, the stacked bar chart of 22 immune cell types was generated by CIBERSORT analysis. The CIBERSORT analysis revealed that the high SFPQ expression group exhibited significantly higher proportions of memory B cells, plasma cells, activated memory CD4 T cells, follicular helper T cells, M0 macrophages, and neutrophils. Furthermore, ssGSEA indicated that the high SFPQ expression group showed elevated levels of activated CD4 T cells, activated dendritic cells, effector memory CD4 T cells, natural killer cells, plasmacytoid dendritic cells, T follicular helper cells, and type 2 T helper cells. These observations suggest that high SFPQ expression may be associated with increased T cell immune infiltration and activation, potentially implying a role for SFPQ in regulating the T cell immune response.

Genetic alterations and DNA methylation analysis
The cBioPortal was utilized to investigate the genetic alterations of SFPQ. Genetic alterations of SFPQ occurred in 1.6% of the HCC patients, including mutations, amplification, and deep deletion. In addition, SFPQ was found to undergo missense mutations and a frameshift insertion, specifically the substitution of isoleucine for methionine 579, of threonine for lysine 421, and of lysine for asparagine 511, resulting in the appearance of a premature stop codon and early termination of protein synthesis. MEXPRESS was utilized to investigate the correlation between SFPQ expression and DNA methylation. Four CpG sites (cg24719193 (r = -0.264, P < 0.001), cg23207673 (r = -0.307, P < 0.001), cg24344221 (r = -0.108, P < 0.05), and cg20800296 (r = -0.129, P < 0.01)) showed a negative correlation with SFPQ expression (Figure 7).

Single-cell sequencing analysis
On the basis of four datasets (LIHC_GSE140228_10X, LIHC_GSE140228_Smartseq2, GSE146115, and GSE166635), the single-cell sequencing analysis showed that SFPQ was mainly expressed within Tprolif cells, CD8+ T cells (CD8T), B cells, and Mono/Macro cells. The findings suggest that SFPQ may play a crucial role in the TME of HCC (Figure 8).

Construction of ceRNA network
ENCORI was utilized to predict and analyze the interactions among miRNAs, mRNAs, and lncRNAs, which were then used to construct the ceRNA network. As shown in Figure 9, hsa-miR-101-3p expression exhibited a significant inverse association with SFPQ in HCC (P < 0.001). In addition, hsa-miR-101-3p expression was significantly decreased in HCC tissues compared to adjacent normal tissues (P < 0.001). Furthermore, patients with low expression of hsa-miR-101-3p exhibited a worse prognosis than those with high expression (P < 0.001). These findings indicate that hsa-miR-101-3p may play a role in regulating HCC progression by targeting SFPQ.

Prognostic model of SFPQ in HCC
The nomogram model was constructed to provide prognostic predictions for patients with HCC. The nomogram included T stage, tumor status, and SFPQ. We also evaluated the nomogram's predictive efficacy using time-dependent receiver operating characteristic and calibration curves. The 1-year, 3-year, and 5-year areas under the curve were 0.743, 0.65, and 0.604, respectively (Figure 12).

Prediction of small molecule drugs
The CTD database was utilized to screen for the correlation between SFPQ and potential drugs. As shown in Figure 13, 13 chemicals exhibited a significant inverse association with SFPQ expression, and 12 chemicals exhibited a significant positive association with SFPQ expression. In addition, the reference count of selected chemicals was visualized with the R package ggplot2. PubChem and partcommunity (https://b2b.partcommunity.com/community/) displayed the 3D structures of the top four chemicals (bisphenol A, carbon nanotubes, ethinyl estradiol, and acrolein) with the highest reference count.

Discussion
HCC is one of the most frequent types of primary liver cancer [20]. HCC is often asymptomatic until the cancer has reached an advanced stage, making early diagnosis through monitoring crucial for minimizing HCC-related mortality [21].
In this research, the expression of SFPQ in various cancers was compared. SFPQ was found to be considerably up-regulated in eleven cancers, including breast invasive carcinoma, cholangiocarcinoma, colon adenocarcinoma, esophageal carcinoma, head and neck squamous cell carcinoma, liver hepatocellular carcinoma, lung squamous cell carcinoma, prostate adenocarcinoma, rectum adenocarcinoma, stomach adenocarcinoma, and uterine corpus endometrial carcinoma. Based on GSEA, we examined the functional implications of SFPQ and the potential mechanisms underpinning its influence on HCC development and metastasis. The pathways highly enriched in the high SFPQ expression group were HCC-related pathways.
CIBERSORT and ssGSEA analyses showed that elevated SFPQ expression was significantly correlated with activated memory CD4 T cells, follicular helper T cells, and type 2 T helper cells. In single-cell analysis, SFPQ was highly expressed in Tprolif and CD8T cell clusters. These observations suggest a potential association between elevated SFPQ expression and enhanced T cell immune infiltration and activation, indicating a potential role for SFPQ in regulating the T cell immune response.
Prior studies have indicated that miRNAs and lncRNAs regulate cancer progression through multiple mechanisms [22][23][24]. We utilized ENCORI to identify miRNAs and lncRNAs that potentially control HCC by regulating SFPQ. Hsa-miR-101-3p was downregulated in HCC tissues and negatively correlated with SFPQ expression, and low hsa-miR-101-3p expression was associated with a worse prognosis in HCC patients. Prior studies have shown that miR-101-3p inhibits Beclin-1-mediated autophagy and determines the sensitivity of HCC cells to oxaliplatin [25]. Hepatitis B virus has been reported to downregulate miR-101-3p expression and promote HCC cell proliferation and migration by targeting Rab5a [26].
We identified a correlation between SFPQ expression and OS using Kaplan-Meier survival analysis. Elevated SFPQ expression was associated with a poorer prognosis in the following patient subgroups: T stage 1, 2, and 3; N0; M0; with tumor; male; age > 60; and histologic grade G2 or G3. In addition, the prognostic value of SFPQ in HCC patients was investigated. In multivariate Cox regression analysis, T stage, tumor status, and high SFPQ expression were independent predictors of worse OS. To predict the 1-year, 3-year, and 5-year OS of HCC, a nomogram prognostic model based on SFPQ expression level, T stage, and tumor status was further developed. The nomogram's predictive capacity was assessed using time-dependent receiver operating characteristic curves and calibration plots.
This study offers a comprehensive explanation of the significance of SFPQ in HCC. The ceRNA network identified in this study sheds light on the complex regulatory interactions among SFPQ and its downstream targets, unveiling a novel layer of understanding in the molecular pathways underlying cancer pathogenesis. The implications of this study extend beyond the individual role of SFPQ to reveal a broader network of regulatory relationships that contribute to the multifaceted nature of cancer biology.

Limitations
A limitation of this study is its reliance on public databases without experimental validation. Because the findings are based on existing datasets and computational predictions rather than experimental confirmation, they carry inherent limitations. Future research should complement bioinformatics analyses with wet-lab experiments to validate the findings and strengthen the conclusions drawn from the study.

Conclusions
SFPQ is speculated to be a possible indicator of poor prognosis in HCC.

Figure 1 SFPQ is overexpressed in HCC. (A) Expression of SFPQ in normal tissues vs pan-cancer samples. (B) Unpaired differential expression analysis of SFPQ mRNA in HCC based on TCGA data. (C-E) Verification of the expression of SFPQ in HCC relative to normal tissues in the GSE36376, GSE64041, and ICGC datasets. (F & G) Immunohistochemistry results of SFPQ in HCC tissues and normal tissues. HCC, hepatocellular carcinoma; SFPQ, splicing factor proline- and glutamine-rich; TCGA, The Cancer Genome Atlas; ICGC, International Cancer Genome Consortium.

Figure 3 Enrichment analysis of SFPQ in HCC. Gene set enrichment analysis of SFPQ in HCC. HCC, hepatocellular carcinoma; SFPQ, splicing factor proline- and glutamine-rich; FDR, false discovery rate; NES, normalized enrichment score.

Figure 5 Analysis of the relationship between SFPQ expression and immune infiltration in HCC. (A) Stacked bar chart of 22 immune cell types. (B & C) Differential enrichment scores of 22 immune cell types in SFPQ high/low expression groups as determined by CIBERSORT and ssGSEA analysis. HCC, hepatocellular carcinoma; SFPQ, splicing factor proline- and glutamine-rich; ssGSEA, single-sample gene set enrichment analysis; MDSC, myeloid-derived suppressor cell.

Table 2 Univariate Cox regression and multivariate Cox regression in HCC (continued)
BMI, body mass index; SFPQ, splicing factor proline- and glutamine-rich; CI, confidence interval; OS, overall survival.