Identification of novel biomarkers in Hepatocellular carcinoma by integrated bioinformatical analysis and experimental validation

: Background : Hepatocellular carcinoma (HCC) is one of the most common poorly prognosed virulent neoplasms of the digestive system. In this study, we identified novel biomarkers associated with the pathogenesis of HCC aiming to provide new diagnostic and therapeutic approaches for HCC.Gene expression profiles of GSE62232, GSE84402,GSE121248 and GSE45267 were obtained in GEO database. Differential expressed genes (DEGs)between HCC and normal samples were identified using the GEO2R tool and Venn diagram software.Database for Annotation, Visualization and Integrated Discovery (DAVID) were used to carry out enrichment analysis on gene ontology (GO) and the Kyoto Encyclopaedia of Genes and Genomes pathway (KEGG). The protein-protein interaction (PPI) network of DEGs was constructed by the Search Tool for the Retrieval of Interacting Genes (STRING) and visualized by Cytoscape. Expressions and prognostic values of hub genes were validated through Kaplan-Meier plotter, Gene Expression Profiling Interactive Analysis (GEPIA), the Human Protein Atlas Database (HPA), western blot (WB) and quantitative real-time polymerase chain reaction (qRT-PCR). Additionally, potential small molecule drugs were screened by Connectivity Map (CMAP). Results : A total of 100 overlapped DEGs were detected and results showed 23 of which were up-regulated with the rest being down-regulated. STRING screened the 70 edges and the 199 nodes in the PPI network. Survival analysis showed that aberrant mRNA expression of TOP2A, DTL, ANLN, CDKN3, BUB1B, CDK1, PBK, RRM2, RACGAP1, PRC1, NEK2, ECT2, CCNB1, HMMR, ASPM was significantly associated with a low survival rate. Results of WB and qRT-PCR showed that the expression levels of ANLN, CCNB1, DTL, RACGAP1, RRM2 and TOP2A were all increased in HCC tissues. Furthermore, CMAP predict suggest the 10 most vital small molecule drugs could reverse the progression of HCC. Conclusions : Core DEGs (ANLN, CCNB1, DTL, RACGAP1, RRM2 and TOP2A) with poor prognosis and candidate drugs for HCC treatment were identified through integrated bioinformatic analysis . This study will contribute to providing prognostic biomarker and therapeutic strategies in HCC.


Background
As one of the most frequent virulent cancer, hepatocellular carcinoma (HCC) is the sixth most commonly diagnosed cancer and the fourth leading cause of cancer death worldwide [1,2].During the past few decades, the occurrence of HCC has increased significantly in developed countries [3].Due to the difficulty of early diagnosis, rapid metastasis and high recurrence following surgical resection of HCC, there is still no effective treatment of HCC, the survival of HCC stays generally low and the 5-year survival rate approximately 30% [4][5][6][7].
Currently, gene expression profiling has been employed in medical research for over ten years. Using the gene chip, the genetic alteration can be identified quickly at the transcriptional level, pertaining to potential biomarkers for human diseases [8].
Recently bioinformatic studies showed the value of this method to find meaningful clues and uncover the underlying mechanisms [9][10][11].Therefore, the research on the pathogenesis of HCC and the continuous discovery of new targets is still a hot topic in the medical field. The development of high-throughput gene chip and sequencing technology makes it possible to efficiently study the gene expression profile of HCC and discover the gene expression and key genetic variation of tumor tissues and cells in a specific state.
Bioinformatics can be used to systematically study diseases through gene chip technology, methods of sequence alignment, statistical analysis, visual mapping, biological clustering, biomolecule network and pathway analysis [12][13][14].In present study, the microarray datasets GSE62232, GSE84402, GSE121248 and GSE45267 were selected from the Gene Expression Omnibus (GEO) database to identify DEGs.The Gene Ontology (GO) and Kyoto Encyclopaedia of Genes Genomes (KEGG) pathway analysis were conducted to elucidate DEGs. core genes from the common DEGs were identified through the construction of protein-protein interaction (PPI) network. Subsequently, significant DEGs were imported into GEPIA and Kaplan-Meier plotter to carry out survival analysis. Furthermore, the Human Protein Atlas Database (HPA) database was used to screen the related core genes with poor prognosis and quantitative real-time polymerase chain reaction (qRT-PCR) and western blot (WB) were used to verify the changes in gene expression level. In addition, several potential small molecules were repositioned using the Connectivity Map (CMAP).
The present study provides more accurate and practically reliable biomarkers into the molecular mechanisms, which serves as a reference for clinical studies in HCC. Fig.1 illustrates the workflow of this study.

Identification of DEGs in HCC
The gene expression profiles of GSE62232, GSE84402, GSE121248 and GSE45267 from NCBI-GEO, totalling 213 HCC samples and 100 normal samples was obtained.
The volcano plot showed the up-regulated and down-regulated DEGs in each dataset with the cutoff criterion of P˂0.05 and |log2FC| ≥ 2 (Fig.2). The Venn diagrams showed the 100 overlapping DEGs among the three datasets including 23 significantly up-regulated genes and 77 down-regulated genes (Fig.3).

Enrichment analysis
For regulated DEGs, 10 GO terms as well as the significant molecular pathways altered in HCC were summarised to explore their biological functions (Fig. 4, Table 1).
GO term analysis was carried out via BP, MF and CC. The BP analysis showed that these DEGs were mainly enriched in epoxygenase P450 pathway, oxidation-reduction process, exogenous drug catabolic process and xenobiotic metabolic process. MF analysis indicated that these DEGs were significantly associated with heme binding, oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, iron ion binding and arachidonic acid epoxygenase activity.
Changes in CC were mainly enriched in organelle membrane, extracellular region, midbody and blood microparticle. As shown in Fig.4D, the KEGG analysis results of the DEGs mainly involved in Retinol metabolism, Metabolic pathways, p53 signaling pathway and Steroid hormone biosynthesis.

PPI network construction and module analysis
The DEGs PPI network was established via Cytoscape software, which incorporated the information from the STRING database. After hiding the individual nodes, it consisted of 18 up-regulated genes and 52 down-regulated ones, with 70 nodes and 199 edges (Fig. 5). Subsequently the most meaningful subnetwork from the PPI network was extracted with MCODE and 15 core genes were proposed to be crucial in the signal transduction during the HCC progression.

Analysis and validation of hub genes
The 15 core DEGs We imported into the Kaplan Meier-plotter databases to test their significance in prognosis. TOP2A, DTL, ANLN, CDKN3, BUB1B, CDK1, PBK, RRM2, RACGAP1, PRC1, NEK2, ECT2, CCNB1, HMMR and ASPM were found to significantly worsen the survival (p < 0.05, Fig. 6). The DEGs expression between the normal and HCC samples was compared through GEPIA and results showed that all core genes were expressed higher in HCC samples. (p < 0.05, Fig. 7).
To validate the results, the protein expression level of core genes were identified using HPA database and it is found that ANLN, CCNB1, DTL, RACGAP1, RRM2 and TOP2A were in a higher expression state in HCC tissues compared to normal tissues ( Fig. 8, Table 2).

Evaluation of gene expression and protein expression in HCC
To further verify the expression of ANLN, CCNB1, DTL, RACGAP1, RRM2 and TOP2A genes in HCC tissues and corresponding adjacent normal tissues, six pairs of tumor tissues and corresponding adjacent non-tumorous tissues were quantified by qRT-PCR and WB. The results showed that the average ANLN, CCNB1, DTL, RACGAP1, RRM2 and TOP2A mRNA expression level and protein level in HCC tissues was significantly higher compared with non-tumourous tissues (*p<0.05, compared with adjacent non-tumorous tissues).

Identification of related active small molecules
To find potential candidate drugs that reverse the poor clinical outcome of HCC patients, drug retargeting analysis was carried out through CMAP. The 10 most significant small molecules are listed in Table 3

Discussion
Hepatocellular carcinoma (HCC) is one of the most fatal cancer in the world, whose metastasis and pathology can be attributed to the effect of genetic factors, epigenetic aberrations, endocrinological perturbations and environmental factors [29,30]. To diagnose and provide treatment to HCC, it is of crucial importance to understand its molecular mechanism. The technology of gene expression profiling has been proven to be an effective way for the identification of candidate biomarker and has been used widely in the prediction of HCC therapeutic targets. In this study, we conducted a bioinformatical analysis on 100 DEGs extracted from GSE62232,GSE84402, GSE121248 and GSE45267 to further explore prognostic biomarkers of HCC.
The analysis of GO and KEGG was conducted to understand the interactions among the DEGs.According to the biological process analysis, DEGs were found enriched significantly in oxidation-reduction process, epoxygenase P450 pathway, exogenous drug catabolic process and xenobiotic metabolic process. These findings confirm that reactive oxygen species (ROS) are deemed as the major reason for tumor development and progression [31,32]. Unlike normal cells, cancer cells are often characterized by the ability to produce ROS and the increase of their dependence on the antioxidant defense system [33]. The KEGG pathway enrichment analysis revealed eight significant pathways, including metabolic pathway, chemical carcinogenesis, p53 signaling pathway, and tryptophan metabolism involved in the occurrence and development of HCC. Studies have proved that metabolic pathways are involved in cell transformation, inflammation, tumor survival, proliferation, invasion, angiogenesis, and metastasis of cancer [34]. Metabolic reprogramming of cancer can be promoted by the absence of tumor-inhibiting factors and the activation of oncogenes, thereby enhancing nutrient uptake to provide biosynthetic and energetic pathways [35,36]. It is also reported that p53 signaling pathways have a significant impact on the development of HCC [37,38]. Cell cycle arrest, cell senescence, or apoptosis make up the response of p53 pathways to stress. In addition, the p53 pathways have been reported to encounter a disruption in almost all types of cancer including HCC [39][40][41].
In this study we extracted one module with the most significant degree from the The results suggest that the GEO database is consistent with the trend of gene and protein expression in the experiment, which confirms that we have reliable data analysis and experimental results.
It is reported that ANLN, CCNB1, DTL, RACGAP1, RRM2, and TOP2A are involved in tumorigenesis and proliferation. (a) ANLN is a cytoskeletal scaffolding protein that works as a pivotal medium of cytokinesis [42,43]. It includes the PH domain binding to Septins, Ect2 and RhoA, the C-terminal Anillin homology domain, and the conserved N-terminal actin and myosin binding domains [43,44]. Studies have shown that ANLN is a potential therapeutic target as its overexpression may cause the growth of human cancer. Hence, knockdown of ANLN could inhibit the proliferation of breast cancer and non-small cell lung cancer cell lines [45,46], and ANLN could be used as an independent prognostic indicator for patients with breast cancer [47]. (b) CCNB1 is a crucial protein in the cell cycle [48]. It is reported that effective G2/M transition bioenergy can be provided to cells through CCNB1/Cdk1-mediated phosphorylation, so as to shorten the time of the entire cell cycle [49]. CCNB1 serves as a therapy targeting HCC, which is directly suppressed by miR-144 [50]. It is consistent with our results that CCNB1 has a high expression in HCC and also has a close relationship with the poor prognosis of HCC patients [51].
Often, the prognosis of chemotherapy drug therapy is estimated using an expression of CCNB1. (c) DTL is a key regulator of genome stability and cell cycle progression [52]. DTL could motivate the growth of gastric cancer cells and mammary epithelial cells, and induction of defects in cytokinesis, apoptosis, and chromosomal segregation could silence DTL, thereby impairing the growth of these cells significantly [53]. Previous studies have shown that up-regulated expression of DTL occurs frequently in HCC of invasiveness, and its level is positively correlated with tumor grading and poor survival of patients [54]. (d) RACGAP1 is a cytokinesis-regulatory protein with high overexpression in multiple cancers. It is involved in cytokinesis, differentiation, migration, and other cellular processes [55][56][57], and controls cytokinesis by the formation of central spindlin compounds and the mediation of Rho-dependent signals for actomyosin contractile ring assembly [58,59].
RACGAP1 phosphorylation plays an important role in the regulation of its function during the whole mitosis [60]. It is phosphorylated at Ser387 of the GAP domain by the mitotic kinase Aurora B, that is, to transform RACGAP1 into an active GAP towards RhoA, which plays a vital role in mediating cytokinesis [61]. Recently, it has been reported that there is an association between the expression of RACGAP1 and tumor malignancy [62][63][64][65][66]. (e) Ribonucleotide reductase (RR) is a structural unit required for DNA replication and repair [67]. RR consists of two subunits, namely RRM1 and RRM2. RRM1 represents the relatively constant expression of the protein in the whole course of cell life, while dynamic changes occur upon stimulation at the time of RRM2 protein expression [67,68]. It has been confirmed that RRM2 is involved in the regulation and modification of proteins [69][70][71]. RRM2 is also believed to be an important component of tumor progression [72], a regulator of certain oncogenes [73], and a promising tumor biomarker [74,75]. RRM2 antagonizes sorafenib, an FDA-approved multikinase inhibitor for hepatoma therapy, which may be because it has the ability of partially rescuing hepatoma cells from long-term cytotoxicity induced by sorafenib [76]. (f) TOP2A is involved in the processes of cell cycle progression, chromatid separation, and torsional stress relief to encode DNA topoisomerase II alpha. It is reported that TOP2A is often co-amplified with HER-2 to reduce clinical outcomes in breast cancer and bladder cancer [77,78]. Besides, our findings also show that TOP2A is overexpressed significantly and has an association with a poor prognosis of HCC, which is consistent with results of previous studies [79].
Results of this study confirm that expression of mRNA and protein of ANLN, CCNB1, DTL, RACGAP1, RRM2 and TOP2A in HCC tissues are significantly higher than those in non-tumorous tissues, which could be potential targets for the diagnosis and treatment of HCC.
Based on the overlapping DEGs and CMAP database, several small molecular drugs with potential therapeutic effects on HCC have been identified. The results of this study show that apigenin, trichostatin A and resveratrol have anticancer activity.
Several biological and experimental studies have reported that apigenin as an edible plant-derived flavonoid is a type of anticancer agent. Apigenin could induce apoptosis by up-regulating caspase-3, caspase-8, and TNF-α expressions and activating exogenous caspase-dependent pathways [80]. Moreover, the combination of apigenin and 5-FU could significantly inhibit the growth of HCC xenograft tumors through the mitochondrial apoptotic process [81].
Trichostatin A is a histone deacetylase (HDAC) inhibitor specific in class I and II HDAC enzymes. Some studies indicated that pleiotropic effects of HDAC activity disruption were found in tumour cells, such as apoptosis, cell-cycle arrest, autophagy, decreased angiogenesis, and differentiation [82][83][84]. HDAC inhibition may improve the efficacy of radiation through DNA-repair signaling mechanisms and cell cycle checkpoint perturbations [85]. Resveratrol (3,5,4'-trihydroxystilbene) is a naturally occurring polyphenolic compound [86]. In a dose-dependent manner, it is able to significantly inhibit cell proliferation in HCC cell lines, partly due to inhibition of glycolysis in aerobic HCC cells [87]. Furthermore, the combination of resveratrol and matrine manifests significant anti-proliferative effects by induction of reactive oxygen species and apoptosis, down-regulation of survivin, and disruption of mitochondrial membrane potential [88]. Meanwhile it is worth noting that so far no research has focused on the potential role of thioguanosine, meticrane, felbinac, medrysone, phenoxybenzamine and daunorubicin in HCC.

Conclusions
Based on the four different gene expression profiles, core DEGs between HCC and normal samples was identified. Results showed that ANLN, CCNB1, DTL, RACGAP1, RRM2 and TOP2A play key roles in the progression of HCC. Also a group of potential small molecules were identified in this research presents. These biomarker signatures and candidate drugs may contribute to the development of accurate diagnosis and effective treatment strategies in future studies.

Screening for DEGs
The raw data from the downloaded datasets GSE62232, GSE84402, GSE121248 and GSE45267 were analyzed following integrated transformation and correlation analysis via GEO2R online tools [20]. With |logFC|≥2 while adj. p-value < 0.05, DEGs were identified between normal and HCC samples. Venn diagram software was used to obtain DEGs among the above mentioned three datasets. When logFC < 0, the according DEGs were treated as down-regulated genes; given logFC > 0, we treated the DEGs as up-regulated genes.

GO and KEGG enrichment analysis of DEGs
DAVID (https://david.ncifcrf.gov/) [21] was used to conduct pathway enrichment analysis and functional annotation through both GO and KEGG pathway analysis.Its aim is clarifying the shared DEG's molecular function (MF), cellular component (CC) and biological process (BP). GO is commonly used to identify unique biological features of high throughput genome or transcriptome data by defining genes and its protein or RNA products [22].Meanwhile, as a database connecting high-order functional information to genomic information, KEGG serves as a useful resource for systematic gene function analysis [23]. In all enrichment analysis, the p-value was corrected using the Benjamini-Hochberg's method and set the adjusted p-value threshold at ˂0.05.

Protein-protein interaction (PPI) network construction and module analysis
STRING (https://string-db.org/) [24] is an online database containing DEG-encoded proteins and PPI information.STRING to the DEGs was mapped with the threshold of the minimum required interaction score set at 0.400, representing medium confidence.
The subnet was visualized and analyzed via Cytoscape [25]. the significant modules from the PPI network were filtered using the app MCODE in Cytoscape, where node score cut-off = 0.2, k-core = 2, max. Depth = 100, and degree cut-off = 2. In the functional annotation tool of DAVID, the module is further analyzed through enrichment analysis.

Hub gene RNA sequencing expression and survival analysis
Using the KM plotter (http://kmplot.com/analysis/) [26], the overall survival (OS) between the two eigengenes of the module assigned groups was determined.The hazard ratio (HR) and logarithmic rank p-value at the 95% confidence intervals was calculated. Subsequently, we used the GEPIA platform (http://gepia.cancer-pku.cn/) [27] to evaluate how the hub genes affect patient prognosis. The hub gene expression in the samples of both normal and HCC tissues was analysed utilising the customisable functions offered by the platform.A series of box diagrams were used to show the differences among the risk groups at the gene level. Furthermore, considering that the expression level of poor prognosis genes in cancer tissues would change, we used the human protein atlas (HPA) database (http://www.proteinatlas.org/) for immunohistochemistry analysis and exploration.

RNA extraction and qRT-PCR analysis
Total RNA was extracted from tissues using the Trizol Kit

Western blot analysis
The tissues frozen by liquid nitrogen and ground were lysed and centrifuged at 12,000 g for 15 min at 4°C. Then total protein was extracted from the resulting supernatant, and the concentration was quantified using the bicinchoninic acid (BCA) assay. Equal amounts (20μg) of proteins were separated using 10% sodium dodecyl sulfate-polyacrylamide gel, followed by transfer onto polyvinylidene difluoride membranes. After blocking, the membranes were treated with rabbit monoclonal anti-human β-actin and LHPP antibodies (1:1000 dilution) overnight at 4°C. This was followed by incubation with appropriate HRP-conjugated secondary antibodies at room temperature for 1h, and detection was achieved using an enhanced chemiluminescence (ECL) kit (GE Healthcare, Beijing, China).

Identification of candidate small molecules
The potential candidate small molecules were screened by CMAP (http://www.broadinstitute.org/cmap/) [28]. The overlapping differently expressed probe sets were separated into two groups,including up-regulated and down-regulated.
The corresponding active small molecules possessing the potential to reverse the progression of HCC was matched. The positive connection value (approaching +1) indicates the ability of the small molecule to induce the state of HCC cells, while the negative connection value (approaching -1) indicates that the small molecule can reverse the above cancer cell state. Hypergeometric probability test was used to calculate the association between drugs and colorectal cancer.

Ethics statement
This study was performed with the approval of the institutional ethics committee of

Acknowledgements
Thanks go to the editor and the anonymous reviewers for their comments and suggestions.

Funding
This study was supported by the National natural science foundation of China

Availability of data and materials
The datasets supporting the conclusions of this article are available in public database from GEO, DAVID, STRING, GEPIA, Kaplan-Meier plotter,HPA and CMAP.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Figure S1
The potential transcription factors associated with the expression of hub genes.

Figure S2
A regulatory network of lncRNA-miRNA-mRNA constricted by GCBI.