Identification of abnormally expressed genes and their roles in invasion and migration of hepatocellular carcinoma

Hepatocellular carcinoma (HCC) is the third leading cause of cancer death in the world, mainly attributable to its high migration tendency. Most patients with HCC miss the benefit of radical resection or liver transplantation because of distant metastasis of the tumor. Therefore, identification of diagnostic and therapeutic targets by analyzing abnormally expressed genes associated with tumor invasion and migration in HCC could be useful. In this study, we carried out a meta-analysis of three gene expression profile data sets (GSE45050, GSE84598 and GSE89377), including 112 liver cancer and 77 normal tissue samples, to identify candidate genes and pathways. The three gene expression omnibus (GEO) data sets had 155 differentially expressed genes (DEGs) in common, including 25 up-regulated and 129 downregulated genes. Protein-protein interaction (PPI) networks were generated in the STRING database and explored further in Cytoscape to identify network hubs. Moreover, the statistically significant (p < 0.05) functions and signaling pathways enriched by the shared DEGs were identified. The Kaplan-Meier curve was applied to analyze univariate survival outcomes of the hub genes, which suggested AURKA and CDC20 as independent prognostic factors in HCC. Additionally, we found that AURKA and CDC20 were markedly upregulated in HCC cell lines. Quantitative PCR, western blot, and Immunohistochemistry (IHC) was performed to determine mRNA and protein expression of AURKA and CDC20 in 20 normal liver tissues and 66 HCC tissues from patients with HCC who underwent complete surgical resection in stage 1 to 4. Results showed that expression of AURKA and CDC20 were significantly higher in HCC tissues than normal tissues and were increased from HCC stage 1 to 4. Importantly, through loss of function, we showed that AURKA and CDC20 both significantly promote HCC migration and invasion. In summary, we identified the key candidate DEGs and dysregulated pathways in HCC through bioinformatic analysis and experimental validation. These candidate DEGs and pathways enhance our understanding of the potential pathogenesis of HCC and may hold promise as markers for the diagnosis, treatment, and prognosis of HCC.


INTRODUCTION
Hepatocellular carcinoma (HCC), a common malignant neoplasm in the digestive system, is the third leading cause of cancer mortality worldwide [1] and ranks second in China [2]. Treatment options for HCC are limited by its complexity, heterogeneity, high recurrence rate after surgical resection, and diagnosis being primarily dependent on serological changes during disease progression [3]. Therefore, it is critical to understand the causative molecular mechanisms underlying HCC and any associations between them, to identify novel biomarkers such as target genes that can be used for early diagnosis, disease progression, and personalized therapy.
Recently, RNA sequencing and bioinformatics have been extensively utilized to identify general genomic changes during cancer development and to discover biomarkers for the diagnosis and prognosis of tumors [4]. Gene expression microarrays have been used in many studies to identify differentially expressed genes (DEGs) and pathways associated with HCC [5]. A comprehensive as well as systematic study of the interactions between DEGs and the pathways enriched by them can accurately identify the biological changes that occur during the development of HCC.
With the increasing number of datasets generated by microarray and RNA sequencing technologies, Gene expression omnibus (GEO) has become a significant resource for bioinformatics [6]. Datasets GSE45050, GSE84598 and GSE89377 from GEO were selected for this study. The DEGs shared by the three gene expression datasets were identified using the R package 'LIMMA' from the bioconductor project [7] and Venn diagram software. Subsequently, functional annotation of DEGS by GO categories, including biological processes (BP), cellular components (CC), molecular function (MF), as well as KEGG (Kyoto Encyclopedia of Gene and Genomes) pathway enrichment analysis was carried out using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Thereafter, protein-protein interaction (PPI) networks were generated in the STRING database while Cytotype MCODE (Molecular Complex Detection) was used for the identification of significant modules. Ten hub genes were selected following exploration of the network with the Cytoscape plugin, cytoHubba. The Kaplan-Meier plotter online database was performed to obtain important prognostic information associated with these hub genes (P < 0.05) was studied using the online Kaplan Meier plotter. Expression level of hub DEGs between HCC samples and paired normal samples by gene expression profiling analysis (GEPIA) (P <0.05), 2 hub genes (AURKA and CDC20) were screened out. Although it has been reported that AURKA and CDC20 were overexpressed in HCC [8,9], which was consistent with our results of qPCR, western blot, and IHC, the study whether the upregulation of AURKA and CDC20 is related to the invasion and migration of HCC still needs further elucidation. Unexpectedly, we found that knockdown of AURKA or CDC20 both attenuated the invasion and migration of HCC cell lines. In conclusion, our study is aimed to screen and identify key candidate genes and pathways via bioinformatic analysis and experimental validation of function.

Identification of DEGs in HCC
A total of 112 cases of HCC and 77 cases of normal liver tissue were analyzed in this study. The online tool, GEO2R, identified 902 DEGs in GSE45050, 3499 DEGs in GSE84598 and 491 DEGs in GSE89377. DEGs in two representative samples from each of the three expression profiling datasets are shown in Figure  1A-1C). A total of 155 DEGs were common to the three datasets, including 129 downregulated (logFC < 1) and 25 upregulated (logFC > 1) genes in the HCC samples (Table 1 and Figure 1D).

Gene ontology and KEGG pathway analysis of DEGs in HCC
DAVID was used for functional annotation and pathway enrichment analysis of identified DEGs. In GO analysis, DEGs were identified to be significantly enriched in BP such as cellular response to cadmium ion, oxidation-reduction process, epoxygenase P450 pathway, negative regulation of growth, and cellular response to zinc ion (Table 2), CC such as extracellular region, organelle membrane, extracellular exosome, extracellular space, and blood microparticle (Table 2), and MF terms such as monooxygenase activity, heme binding, serine-type endopeptidase activity, oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen ( Table  2). KEGG pathways analysis showed that the DEGs were primarily enriched in metabolic pathways, complement and coagulation cascades, chemical carcinogenesis, retinol metabolism, and mineral absorption ( Table 2).

Key candidate DEGs and pathways identified through protein-protein interaction network (PPI) and module analysis
The PPI network of DEGs generated in the online STRING database is shown in Figure 1E and the most distinctive module identified in Cytoscape is shown in Figure 1F. Thirty genes that matched the degree criteria AGING (degree >10, that is the node had more than 10 interactions) were identified as hubs and the 10 most significant genes according to the node degree were  SPP2, FETUB, C8A, KLKB1, F9, MBL2, PLG, PTGS2, AURKA, CDC20 (Figure 2A). The full names, abbreviations and physical functions for these hub genes are presented in Table 3. The network of the selected hub genes and their co-expression genes generated in the cBioPortal online platform is shown in Figure 2B. Hierarchical clustering showed that the hub genes could distinctly categorize tumor tissues and normal tissues ( Figure 2C).

Hub gene screening and survival analysis of the cohorts
A correlation analysis between the expression profiles of hub genes and outcomes in liver cancer data sets from TCGA was carried out to study the clinical relevance of the hub genes in HCC. GEPIA analysis identified that HCC patients with genomic alterations in AURKA and CDC20 had poor overall and disease-free survival (Overall survival: AURKA P=0.00022, CDC20 P=3.8e-06; Disease-free survival: AURKA P=0.0012, CDC20 P=0.0026) ( Figure 3). Therefore, hub genes  AURKA and CDC20 may have an important impact on the progression of hepatocellular carcinoma. Moreover, to explore prognostic value of AURKA and CDC20 expression profiles in more tumor types, we analyzed the expression of AURKA and CDC20 in other types of tumors other than HCC and assessed the prognostic value based on TCGA cohort by the tumor-immune system interactions (TISIDB, http://cis.hku.hk/TISIDB/index. php) online database. As shown in Supplementary Figure  1, the high expression of AURKA is not only associated with the poor prognosis of patients with hepatocellular carcinoma, but also associated with the poor prognosis of adrenocortical carcinoma, kidney chromophobe, kidney renal clear cell carcinoma, brain lower grade glioma, and lung adenocarcinoma, etc. In addition, the high expression of CDC20 is related to the poor prognosis of patients with adrenocortical carcinoma, kidney renal clear cell carcinoma, brain lower grade glioma, lung adenocarcinoma, mesothelioma, and pancreatic adenocarcinoma, etc (Supplementary Figure 2).

Hub gene analysis
Gene expression profiles of AURKA and CDC20 were compared between HCC tumor samples and adjacent normal tissues in RNA-sequencing datasets from the TCGA database. The expression of AURKA and CDC20 was significantly higher in 369 HCC tissues compared with 160 normal tissues ( Figure 4A and 4D). Moreover, correlation analysis revealed that AURKA mRNA expression was remarkably correlated with tumor stages and grades ( Figure 4B and 4C). Similarly, CDC20 mRNA expression levels were also associated with clinical HCC stages and grades ( Figure 4E and 4F).
Overall, elevated expression of AURKA and CDC20 mRNA showed a notable association d with advanced clinicopathological parameters in HCC patients.
A total of 100 candidate genes with positive or negative correlation were identified using GSEA. Importantly, GSEA with the hallmark gene sets showed that the significantly enriched pathways for AURKA included cell cycle, DNA replication, endocytosis, P53 signaling pathway, progesterone mediated oocyte maturation, RNA degradation, spliceosome, and ubiquitin mediated proteolysis ( Figure 5A-5H). A heatmap of the transcriptional expression profiles of the 100 significant genes is shown in Figure 5I. The enriched hallmark gene sets enriched in CDC20 included base excision repair, cell cycle DNA replication, mismatch repair,     AGING oocyte meiosis, pyrimidine metabolism, RNA degradation, and spliceosome ( Figure 6A-6H) and the heatmap showing transcriptional expression profiles of the 100 significant genes are shown in Figure 6I.

Abnormal expression of AURKA and CDC20 in human HCC cells lines and clinical samples
To validate the results from bioinformatics analysis, we determined the expression of AURKA and CDC20 in two human HCC cells (Huh7 and SK-Hep-1) as well as normal human hepatic cells (QSG-7701) by qRT-PCR. As shown in Figure 7A and 7B, AURKA and CDC20 mRNA expression was markedly upregulated in both the HCC cell lines. This was further confirmed by western blot which showed high expression levels of AURKA and CDC20 protein in the two HCC cell lines ( Figure 7C). Furthermore, we collected HCC tissues (n=66) and nontumor samples (n=20) from patients with HCC who underwent surgical resection. According to Barcelona Clinic Liver Cancer staging system, these patients have been clinically divided into stage 1 (n = 20), stage 2 (n = 20), stage 3 (n = 20), and stage 4 (n = 6). We firstly validated the mRNA and protein levels of AURKA and CDC20 in each stage of patients with HCC as well as normal liver tissues by qRT-PCR and western blot. As shown in Figure 8A-8C, with the development of liver cancer, the mRNA levels of AURKA and CDC20 were 5-20 folds higher than that in nontumor tissues, and the protein expression of AURKA and CDC20 were also significantly upregulated in HCC tissues than normal tissues. Then, IHC was applied to analyze the proteins levels and the localization of AURKA and CDC20 in 5 paraffin-embedded normal liver tissues and 17 paraffin-embedded archived HCC tissue samples, which included 5 cases of stage 1, 5 Transcriptional level of AURKA expression was found to be significantly higher in 369 HCC tissues compared with 160 normal tissues (p<0.0001). (B) Transcriptional expression of AURKA was significantly correlated with AJCC stages, patients in more advanced stages expressed higher levels of AURKA mRNA. (C) Transcriptional expression of AURKA was significantly correlated with ISUP grade, patients in more advanced grade score expressed higher levels of AURKA mRNA. Highest AURKA mRNA expression was found in stage 3 or grade 4. (D) Transcriptional level of CDC20 expression was found to be significantly higher in 369 HCC tissues compared with 160 normal tissues p<0.0001). (E) Transcriptional expression of CDC20 was significantly correlated with AJCC stages, patients in more advanced stages tended to express higher levels of CDC20 mRNA. (F) Transcriptional expression of CDC20 was significantly correlated with ISUP grade, patients in more advanced grade score expressed higher levels of CDC20 mRNA. Highest CDC20 mRNA expression was found in stage 3 or grade 4.
AGING cases of stage 2, 5 cases of stage 3, 2 cases of stage 4 tumor tissues. Consistent with the results of qRT-PCR and western blot, expression of AURKA and CDC20 were remarkably increased along with the progression of HCC from stage 1 to 4. In contrast, AURKA and CDC20 were weakly detected in 5 cases of nontumor tissues ( Figure  8D). Moreover, both the AURKA and CDC20 were mainly localized in cytoplasm/ membrane of malignant cells. These results were further confirmed by quantitatively analysis of the H-scores of AURKA and CDC20 staining ( Figure 8E). Our results are consistent with previously reported studies: Li et al reported a higher level of CDC20 mRNA in three HCC cell lines (HepG2, SMMC-7721 and Huh7) as compared with the normal cell line (LO2) and in 14 pairs of clinical HCC tissues as compared with normal adjacent tissues from the same patients [9], whereas Jeng et al reported AUKRA overexpression in 61% of HCC cases and eight liver cancer cell lines [8], both of their studies are consistent with our results of qRT-PCR, western blot, and IHC, suggesting the accuracy of our experimental results. Taking together, our results suggest that AURKA and CDC20 are upregulated in HCC, thus suggesting that they may have an important role in the progression of HCC.

In vitro effects of AURKA and CDC20 on HCC migration and invasion
Although Li et al have investigated the effect of CDC20 on cell proliferation and cell cycle of HCC cell lines [9], whether AURKA and CDC20 have a role in the invasion and migration of HCC cells is still unknown. Therefore, we transfected Huh7 and SK Hep-1 cells with siRNA specific for AURKA (siAURKA) and CDC20 (siCDC20), at 50 nM concentration, to study the functions AURKA and CDC20 in HCC. As shown in Figure 7D and 7E, expression of AURKA was significantly silenced in Huh-7 and SK-Hep-1 cells by siAURKA-1 but not siAURKA-2, the CDC20 level was also remarkably down-regulated in Huh-7 and SK-Hep-1 cells by siCDC20-1 but not siCDC20-2. First, a wound healing assay was performed to investigate the roles of AURKA and CDC20 in the migration of HCC cell lines. The results presented in Figure 9 showed that knockdown of AURKA or CDC20 could remarkably impair cell migration compared with normal control cells. Next, we conducted the transwell invasion assay to evaluate the roles of AURKA and CDC20 in invasion of HCC cells. As shown in Figure 10, knockdown of AURKA or CDC20 could markedly AGING inhibit cell invasion. All these results suggested that AURKA and CDC20 may be key factors in the invasion and migration of HCC. Therefore, targeting AURKA or CDC20 can be a promising strategy for treating HCC patients with intrahepatic invasion or distant metastasis, thus ameliorating the prognosis of patients with HCC.

DISCUSSION
HCC is one of the leading causes of cancer-induced death worldwide. Although remarkable advances have been made in investigating the epidemiology, risk factors, and molecular mechanisms of HCC, and translated to the clinic, the prevalence and mortality rate from HCC continue to rise considerably with large numbers of HCC patients in advanced stages [10,11]. Similar to other solid tumors, the development of HCC is also characterized by abnormal genetic and epigenetic alterations. Numerous studies spanning across exome sequencing, transcriptome analysis, and genomic characterization have demonstrated that HCC tumors are heterogeneous at multiple molecular levels, with variable phenotypes and clinical outcomes [12][13][14]. Thus, it is imperative to identify a series of well-selected candidate genes and pathways associated with HCC for early diagnosis, prevention, and treatment. Bioinformatic approaches have led to the identification of several biomarkers in HCC, but few studies have verified bioinformatic analysis by experimentally.

AGING
In this study, we have analyzed three mRNA microarray datasets (GSE45050, GSE84598 and GSE89377) and identified a total of 155 DEGs, including 129 down regulated and 25 up-regulated genes, in HCC. Functional annotation using DAVID identified that the DEGs in HCC were enriched in GO terms such as cellular response to cadmium ion, oxidation-reduction process, extracellular region, extracellular exosome, and heme binding, and KEGG pathways including metabolic pathways, complement and coagulation cascades, and chemical carcinogenesis. The PPI network of DEGs created in the STRING database followed by a module analysis using the Cytoscape software identified ten hub genes, including SPP2, FETUB, C8A, KLKB1, F9, MBL2, PLG, PTGS2, AURKA, CDC20. The GEPIA analysis identified that that HCC patients with genomic alterations in the hub genes AURKA and CDC20 had worse outcomes of overall and disease-free survival.
AURKA belongs to the aurora kinase family, a major regulator of mitotic progression, which is upregulated in multiple human cancers [15][16][17]. AURKA is required for centrosome maturation and mitotic commitment at the late G2 phase, especially for centrosome replication, bipolar spindle formation, chromosomal segregation, and cytokinesis [18][19][20][21]. Abnormal expression of AURKA has been shown to cause chromosome instability and contribute to tumorigenesis [22,23]. In addition, a non-canonical function of AURKA in DNA duplication has also been demonstrated. AURKA promotes the development of cancers by phosphorylating Akt, mammalian target of the rapamycin (mTOR), and mitogen-activated protein kinase (MAPK), and thereby activating epithelialmesenchymal transition (EMT) reprogramming [24]. Moreover, overexpression of AURKA has been observed to be correlated with advanced HCC grades and stages [8], Additionally, AURKA has been shown to activate EMT and reprogram tumor cell stemness, and thereby contribute to cancer metastases [25,26], Thus, the findings from these studies are consistent with the results of our study and suggest that AURKA may indeed be a key regulator of HCC invasion and metastasis.
CDC20 (cell division cycle 20) encodes a regulatory protein which interacts with the anaphase-promoting complex/cyclosome (APC/C) in the cell cycle and is involved in development of solid tumors [27,28]. There is evidence to support that CDC20 is involved in the progression of a variety of malignancies including pancreatic cancer, oral squamous cell carcinoma, gastric carcinoma, cervical cancer [29][30][31][32]. Moreover, CDC20 was shown to be associated with cell proliferation and cell cycle in liver cancer cells [9]. However, the relationship between CDC20 and the metastasis of HCC cells is not yet well understood. Knockdown of both, AURKA and CDC20 by siRNA was observed to attenuate invasion and migration of HCC cell lines. Thus, AURKA and CDC20 can be potentially used as indicators of intrahepatic invasion or distant metastasis in HCC patients.
The aim of our study was to identify the key candidate DEGs and gene regulatory networks based on DEGs in HCC. We identified the hub genes associated with poor prognosis, AURKA and CDC20, and explored their roles in migration and invasion of HCC cell lines. However, there are a few shortcomings in our study. First, the datasets utilized in this study are from the GEO database and not directly obtained from patient samples thus limiting our control on the experimental conditions and the quality of the results. Second, the identification of abnormally expressed genes associated with tumor invasion and migration may be best identified by comparing tissues with and without invasion and metastasis. Third, we have not explored the signaling mechanisms involving AURKA and CDC20 and additional research will be needed to further elucidate the detailed mechanisms by which these key candidate DEGs modulate HCC.
In conclusion, this study has identified key candidate DEGs and hub genes associated with HCC tumors, enhancing our understanding of the underlying molecular mechanisms, and which can be potentially used for early diagnosis, treatment and prognosis of HCC. Additional research is imperative to clearly elucidate the roles hub genes in HCC.

Gene expression datasets
The three datasets analyzed in this study, GSE45050, GSE84598, and GSE89377, were downloaded from NCBI-GEO, a free public database of transcriptional expression datasets. GSE45050 data was obtained with the GPL6244 Platforms (Affymetrix Human Gene 1.0 ST Array, with transcript gene version) and sourced from six liver tumors and ten non-tumor liver tissue samples (including five liver cirrhosis, two fatty changes of liver, and three normal liver tissues), GSE84598 data was based on the GPL10558 platform (Illumina HumanHT-12 V4.0 expression beadchip) and collected from 44 hepatocellular carcinoma and 22 nontumor tissues. GSE89377 data was obtained from the GPL6947 platform (Illumina HumanHT-12 V3.0 expression beadchip), and consisted of 13 normal liver samples, 20 chronic hepatitis, 12 cirrhosis, and 62 liver tumors tissue samples.

Data preprocessing and identification of DEGs
Raw microarray data was preprocessed to eliminate noise ensure integrity. Background correction of probe data, standardization, and summarization was performed by the robust multiarray average analysis algorithm [33] in the affy package of R [34].
DEGs between HCC and non-cancerous samples were identified using GEO2R, an interactive web tool that allows users to compare two or more datasets in a GEO series in order to identify DEGs across experimental conditions. The adjusted P-values (adj. P) and Benjamini and Hochberg false discovery rate were applied to provide a balance between statistically significant genes and false-positives. Probe sets without corresponding gene symbols or genes with more than one probe set were removed or averaged, respectively. Absolute logFC > 1 and P value < 0.05 were considered statistically significant. The raw data in. TXT format were compared using an online Venn software to identify the DEGs common to the three datasets. DEGs with log FC < 0 were considered as down-regulated genes, while the DEGs with log FC > 0 were considered as up-regulated gene.

GO and KEGG pathway enrichment analyses of DEGs
DAVID (http://david.ncifcrf.gov) (version 6.7) is an online biological information database that integrates biological data and analysis tools, and provides a comprehensive set of functional annotation information of genes and proteins for users to extract biological information [35]. KEGG is a database resource for understanding advanced functions and biological systems from large-scale molecular data generated by high-throughput experimental techniques [36]. GO is a major bioinformatics tool for annotating genes and analyzing the biological processes of these genes [37]. We used DAVID to visualize the DEGs enrichment of BP, MF, CC and pathways (P < 0.05).

PPI network and module analysis
DEG-encoded proteins and the PPI were obtained using the online database STRING (available online: http://string-db.org) [38]. An interaction with a combined score > 0.4 was considered statistically significant. Cytoscape (version 3.7.1), a public source bioinformatics software platform, was used to visualize and analyze molecular interaction networks [39]. The plug-in MCODE (version 1.5.1) of Cytoscape can be used to identify the most dense and significant module in PPI networks [40]. MCODE was used with the following criteria: degree cut-off = 2, node score cut-off = 0.2, max depth = 100, and K-score = 2, to cluster the PPI network based on the topology to find densely connected regions.

Survival analysis
The Kaplan-Meier method was used to analyze differences in survival between groups. The primary endpoint was disease-free survival (DFS), the duration from the onset of curative treatment to the date of progression or the start date of the second-line treatment or the date of death, whichever occurs first. The overall survival (OS) as a secondary endpoint was the length of time from the date of diagnosis or first treatment to the date of death or last follow-up. The Kaplan-Meier method was used to estimate and account for the duration of follow-up, the 95% confidence interval (95% CI) and the log-rank test in the separation curve.
The overall score was determined as the sum of the weights of each important central gene.

Selection of hub genes and analysis
The Cytoscape plug-in cytohubba (version 0.1) was used to identify hub genes based on degree thresholds. Genes in the network with degrees ≥10 (a node had 11 or more connections/interactions) were identified as hubs. The online platform cBioPortal (http://www.cbioportal.org) was used to identify the coexpressed genes [41]. ClueGO is a plug-in of Cytoscape that can visualize the nonredundant biological terms for large clusters of genes in a functionally grouped network [42]. The biological process from GO analysis of hub genes was performed and visualized by ClueGO (version 2.5.4) and CluePedia (version 1.5.4), which was a functional extension of ClueGO and a plug-in of Cytoscape [43]. In addition, hierarchical clustering of the hub genes was constructed.

Gene set enrichment analysis (GSEA)
The TCGA database was used with the GSEA method using the Category version 2.10.1 package. For each separate analysis, Student's-t-test statistical score was performed in consistent pathways and the mean of the differential expression genes was calculated. A permutation test with 1000 times was used to identify the significantly changed pathways. The adjusted P value (adj. P) using Benjamini and Hochberg (BH) false discovery rate (FDR) correction was applied by default to handle the false positive results [44]. An adj. P less than 0.01 and FDR less than 0.25 was considered to be statistically significant. Statistical analysis and graphical plotting were carried out using R software (Version 3.6.1).

Patients and tissue samples
20 normal liver tissues and 66 HCC tissues were obtained from patients with HCC who underwent surgical resection at the Eastern Hepatobiliary Surgery Hospital, Second Military Medical University (Shanghai, China). According to Barcelona Clinic Liver Cancer staging system, these patients have been clinically divided into stage 1 (n = 20), stage 2 (n = 20), stage 3 (n = 20), and stage 4 (n = 6). Tissues frozen in liquid nitrogen were used for qRT-PCR and western blot, and tissues embedded in paraffin were used for IHC. Informed consent was obtained from all patients, and the study protocol was approved by our institutional review board of the Second Military Medical University.

Immunohistochemistry (IHC) and scoring
IHC for AURKA and CDC20 expression in normal liver tissues and HCC samples from patients in stage 1-4 was performed using standard methods. Briefly, paraffin sections (4 μm) were deparaffinage and antigen retrieval by citrate buffer. After blocking using bovine serum albumin (1%; BSA) in PBST, tissue sections were incubated with rabbit anti-AURKA or anti-CDC20 diluted 1:100 overnight at 4°C. The secondary polyhorseradish peroxidase (HRP) anti-rabbit IgG antibody (Abcam, China) was incubated in room temperature for 1 hour. The immunostained sections were doubleblindly reviewed and scored by two experienced pathologists. Based on the H-score method, the percentage and staining intensity of positive cells were evaluated by semi-quantitative results. Five visual fields (× 200) were observed on each section, and the total number of cells and cells stained at each intensity were counted in each field. Positive staining intensity: 0 =colorless; 1=light yellow; 2=medium brown; 3=dark brown. The H-score was calculated following the formula: (% of cells stained at intensity category 1x1) + (% of cells stained at intensity category 2x2) + (% of cells stained at intensity category 3x3). H-scores varied from 0 to 300 where 300 represented 100% of cells strongly stained (+++). High CDC20 expression was defined as staining H-scores of cells ≥200.

Wound healing assays
Cells were seeded in six-well plates at suitable density and cultured until 100% confluence. A yellow pipette tip was used to make a straight scratch, simulating a wound. Microscopy based photographs were taken at 0h or 24 h after wounding.

Transwell assay
The transwell invasion assay was performed using 24well transwell chambers (Greiner bio-one, Switzerland). First, the upper ventricular surface of the basal membrane of transwell chamber was coated with Matrigel (BD Bioscience, USA). Thereafter, 2 × 10 5 cells were suspended in 0.2 mL serum-free medium and added to the inserts. The lower compartment was supplemented with 0.5 mL medium containing 20% FBS as a chemoattractant. After incubation for 48 h at 37 °C, the cells on the upper surface of the membrane were carefully removed using a cotton swab and cells on the lower surface were continuously fixed with 100% methanol and then stained with 0.1% crystal violet. Five random visual fields of 200× magnification of each insert were selected and counted for cell numbers under a light microscope (Olympus, Japan).

Statistical analysis
The results are shown as mean ± SD. Differences between two groups were estimated using unpaired Student's t-test. A two-tailed value of P < 0.05 was considered as statistically significant.

Ethics approval
The Ethics Committee of The Third Department of Hepatic Surgery, Eastern Hepatobiliary Surgery Hospital approved the study.

AUTHOR CONTRIBUTIONS
The work presented here is a collaborative effort of all the authors. ZWP and YSX defined the research theme, guided the analyses, interpretation, and presentation. ZJN, QX, and LXG carried out the cell biology and molecular biology experiments, carried out the sequence alignment, and drafted the manuscript. MXN and HZP were involved in data collection, cohort validation, and drafting of the manuscript. ZW and ZCH helped to perform the statistical analysis. All the authors have read and approved the final manuscript.

CONFLICTS OF INTEREST
The authors declare no Conflicts of interest.

FUNDING
This work is supported by Grants from the National Natural Science Foundation of China (No. 81972575)