Deep sequencing and comprehensive expression analysis identifies several molecules potentially related to human poorly differentiated hepatocellular carcinoma

Hepatocellular carcinoma (HCC) that is graded histologically as poorly differentiated has a high recurrence, metastasis and poor prognosis. We sought to determine the regulatory mechanisms of HCC tumorigenesis and to identify molecules closely related to poorly differentiated HCC. High‐throughput sequencing was used to construct microRNA (miRNA) and mRNA expression profiles for poorly differentiated HCC tissues and adjacent tissues. Network analysis was carried out to study miRNA–target interactions. Integrating the miRNA and mRNA data of HCC with four tumor grades from The Cancer Genome Atlas (TCGA) portal enabled the identification of potential closely related molecules for early diagnosis of poorly differentiated HCC. Electronic validation of RNA‐seq data and survival analysis was also performed. In total, 1051 differentially expressed genes and 165 differentially expressed miRNAs were identified between HCC tumor and paired non‐tumorous tissue. Based on 3718 miRNA–target interactions, we established an miRNA–target interaction network; the target genes were mainly involved in bile acid biosynthesis and bile secretion. Integrating expression data of HCC from TCGA indicated that two proteins, TM4SF1 and ANXA2, are convincing indicators for initial diagnosis of poorly differentiated HCC. According to the survival analysis, three proteins, ANXA2, C8orf33 and IGF2BP3, were identified as being associated with the survival time of HCC patients. Moreover, we suggest that hsa‐miR‐1180 may be an effective biomarker for poorly differentiated HCC. Three molecules, TM4SF1, ANXA2 and C8orf33, are potential biomarkers for distinguishing poorly differentiated from well‐differentiated HCC.

Hepatocellular carcinoma (HCC) that is graded histologically as poorly differentiated has a high recurrence, metastasis and poor prognosis. We sought to determine the regulatory mechanisms of HCC tumorigenesis and to identify molecules closely related to poorly differentiated HCC. Highthroughput sequencing was used to construct microRNA (miRNA) and mRNA expression profiles for poorly differentiated HCC tissues and adjacent tissues. Network analysis was carried out to study miRNA-target interactions. Integrating the miRNA and mRNA data of HCC with four tumor grades from The Cancer Genome Atlas (TCGA) portal enabled the identification of potential closely related molecules for early diagnosis of poorly differentiated HCC. Electronic validation of RNA-seq data and survival analysis was also performed. In total, 1051 differentially expressed genes and 165 differentially expressed miRNAs were identified between HCC tumor and paired non-tumorous tissue. Based on 3718 miRNA-target interactions, we established an miRNA-target interaction network; the target genes were mainly involved in bile acid biosynthesis and bile secretion. Integrating expression data of HCC from TCGA indicated that two proteins, TM4SF1 and ANXA2, are convincing indicators for initial diagnosis of poorly differentiated HCC. According to the survival analysis, three proteins, ANXA2, C8orf33 and IGF2BP3, were identified as being associated with the survival time of HCC patients. Moreover, we suggest that hsa-miR-1180 may be an effective biomarker for poorly differentiated HCC. Three molecules, TM4SF1, ANXA2 and C8orf33, are potential biomarkers for distinguishing poorly differentiated from well-differentiated HCC.
Hepatocellular carcinoma (HCC) is the most frequent type of primary liver cancer; it arises from hepatocytes and leads to a high worldwide mortality [1]. The hepatitis B virus (HBV) is the main risk factor associated with HCC, especially in developing countries such as those in Southeast Asia and sub-Saharan Africa [2]. HCC patients are always diagnosed at an advanced stage and miss out on potentially curative options. Although there have been some advances in HCC's diagnosis and therapy, its prognosis has not improved considerably because of a high recurrence rate [3].
The histological types and grades of HCC can be identified by histopathological examination of liver tissues, and there are usually four tumor grades: G1-G4 [4]. To some extent, the histological grading can estimate tumor malignancy and the differentiation level, which corresponds successively to well-differentiated, moderately differentiated, poorly differentiated and undifferentiated HCC cells. Specifically, HCC with a poor differentiation level represents greater malignancy of the tissue, which is related to higher recurrence, metastasis and poorer prognosis [5,6]. Therefore, poor differentiation can be an indicator of poor outcome in patients with HCC. It is important to diagnose the poorly differentiated HCC early, which would provide a significant benefit in treatment options and improve the prognosis.
In the current study, the miRNA and mRNA expression profiles of poorly differentiated HCC were created using high-throughput sequencing, and the miRNA-target interaction regulatory network was constructed. Then, we integrated the miRNA and mRNA data of HCC with four tumor grades (G1-G4) downloaded from The Cancer Genome Atlas (TCGA) portal to identify the stable closely related molecules. Additionally, electronic validation of RNA-seq data and survival analysis were further performed, which was helpful in identifying key molecules in poorly differentiated HCC.

Sample preparation
Three patients were enrolled in this study, and were diagnosed as HCC by imaging and pathology. All patients showed poor histological differentiation and hepatitis B infection. Their HCC tumors and matched adjacent nontumor control tissues were surgically resected. Informed consent was obtained from each patient and the project was approved by the Second Hospital of Dalian Medical University (no. 2017.19). The completed clinical data were obtained, and the clinical and pathological features of the three HCC patients are shown in Table 1.

RNA extraction and RNA-seq
Total RNA was isolated from the tumor and control tissues using Trizol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's protocols. The RNA quality and quantity were verified with a NanoDrop 1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). An mRNA library was constructed using polyA selection using a TruSeq RNA Library Preparation Kit (Illumina, San Diego, CA, USA) and a small RNA library was constructed with the TruSeq Small RNA Sample Prep Kit (Illumina). Using the Hise-q2500/Miseq platform, mRNA and miRNA libraries were sequenced as 100 bp pair-ends and 50 bp single-end reads, respectively. We obtained the high quality reads after removing contaminating and low quality reads.

Identification of differentially expressed genes and miRNAs
The RNA-seq data were used to discover differentially expressed genes (DEGs) and miRNAs. Firstly, the processed reads were mapped to the human reference genome hg19 by the TOPHAT program version 2.0.8 using the default parameters [7,8]. According to the mapped reads, DEGs were identified using the CUFFDIFF tool in the CUFFLINKS program (http://coletrapnell-lab.github.io/cufflinks/cuffdiff/index.html). Moreover, the DEGSEQ package in R was used to determine differentially expressed miRNAs. P < 0.001 and |log fold change (FC)| > 2 were used as the cut-off criteria.
Integrative miRNA-mRNA analysis Differentially expressed miRNA and gene data from matched samples were further integrated. A given miRNA is usually negatively correlated with the expression of its targets. Therefore firstly, miRNA-mRNA pairs that were inversely correlated in expression levels were screened. Then, we extracted the extant miRNA-target relationship by validation or prediction in RNA22, MIRANDA, MIRDB, MIRWALK, PICTAR2 and TARGETSCAN, and the miRNA-target pairs recorded by > 4 algorithms were reserved. A search for inverse correlations between differentially expressed miR-NAs and mRNAs was conducted to obtain the miRNA-target pairs for HCC. Furthermore, miRNA-target pairs were subjected to construct the HCC-specific miRNA-target interaction network, which was visualized using CYTOSCAPE [9].

Pathway analysis of miRNA-targets
We used the web-based DAVID [10] to discover the biological functional interpretation of the miRNA-targets and address the enriched and redundant relationships among many genes in many terms. The listed target genes can be mapped to associated Gene Ontology (GO) terms [11]. Similarly, the target genes were also assigned Kyoto Encyclopedia of Genes and Genomes (KEGG) terms [12] and an enrichment analysis was also carried out.

MRNA and miRNA expression data for HCC from TCGA
To identify the stable miRNA and mRNA biomarker for poorly differentiated HCC, the mRNA and miRNA expression data for patients with HCC were downloaded from the TCGA data portal (http://tcga.cancer.gov; January 2016).
After principal component analysis, we removed the genes and miRNAs that were outliers, and 13 141 genes and 259 miRNAs were used for subsequent study. Genes and miRNAs associated with tumor grade were further identified by a P-value of < 0.05 using the linear by linear association test. Moreover, we used an ANOVA and Tukey's honest significant difference to detect the significant differences across the groups and false positives, and the different averages across the experimental groups were evaluated. Genes and miRNAs were regarded as DEGs and miRNAs with P < 0.05. In order to validate the expression of RNA sequencing data, especially those DEGs and miRNAs between G3 and other groups of HCC, we performed an electronic validation in the TCGA dataset. The selected differential genes were SLC6A13, APOA5, TOP2A, IGF2BP3, NDRG2, CYP4A11, CTH, KIAA1522, TM4SF1 and ANXA2. hsa-mir-942, hsa-mir-769, hsa-mir-1180, hsa-mir-1301, hsa-mir-483, hsa-mir-210 and hsa-mir-18a were validated as differentially expressed miRNAs.

Survivability analysis in HCC
Survivability is an important prognostic indicator in the development of HCC. In order to analyze the survival time of HCC patients, we evaluated the correlation of HCC survival with the expression of overlapped DEGs between RNA sequencing and the TCGA dataset by using the online software CBIOPORTAL for Cancer Genomics (http:// www.cbioportal.org/).

Overview of the RNA-seq data
To discriminate HCC from non-tumorous tissue, we selected three cases of HCC patients who showed poor histological differentiation and hepatitis B infection.  After sequencing of miRNA and mRNA libraries and filtering of raw RNA-seq reads, high quality clean reads were obtained (Table 1). Differential expression analysis was performed in normal-cancerous paired tissue samples. Using the criteria of P < 0.001 and |log FC| > 2, 1051 DEGs were identified between the HCC tumor and the paired non-tumorous tissue, including 492 up-regulated genes and 559 down-regulated genes. Moreover, 165 differentially expressed miRNAs were identified, including 157 up-regulated miRNAs and 8 down-regulated miRNAs. The clustering map of the top 100 DEGs is shown in Fig. 1.

Pathways analysis of miRNA targets
Functional enrichment analysis of miRNA targets showed that these were significantly enriched in the GO terms digestion, bile acid metabolic process, lipid metabolic process and liver development. Moreover, KEGG pathway enrichment analysis revealed that the miRNA targets were significantly associated with primary bile acid biosynthesis, bile secretion, fat digestion and absorption, and fatty acid metabolism (Table 3).

DEGs associated with poorly differentiated HCC
Using mRNA expression data for HCC with various grades of differentiation from the TCGA, differential expression analysis was performed to identify critical genes that can distinguish poorly differentiated HCC from others. A linear association test found that 577 genes were associated with tumor grade of HCC. Of these 577 genes, 51 were differentially expressed in the G3 vs G1 group, 18 were differentially expressed in the G3 vs G2 group, and 22 were differentially expressed in the G3 vs G4 group. Hierarchical clustering analysis indicated that the DEGs in the G3 group can be significantly distinguished from the G1 group (Fig. 3).
To find a stable biomarker for poorly differentiated HCC, the above DEGs were overlapped with DEGs from our sequencing data. This yielded 88 genes associated with tumor grade, in which seven DEGs (C8orf33, TSNAXIP1, TM4SF1, CTH, ANXA2, KIAA1522 and LRRC1) can distinguish HCC of G3 from G1, two DEGs (CYB5A and RRAGD) can distinguish HCC of G3 from G2, and two DEGs (CFHR4 and F13B) can distinguish HCC of G3 from G4 (Fig. 4). Additionally, electronic validation of several genes (SLC6A13, APOA5, TOP2A, IGF2BP3, NDRG2, CYP4A11, CTH, KIAA1522, TM4SF1 and ANXA2) that were differentially expressed between G3 and other grades in the RNA sequencing are shown in Fig. 5. All these genes can distinguish G3 from other groups of HCC, which was consistent with the RNA sequencing result.

Differentially expressed miRNAs associated with poorly differentiated HCC
Using miRNA expression data for HCC with various grades of differentiation from the TCGA, differential expression analysis was performed to identify the critical miRNAs that can distinguish poorly differentiated HCC from others. A linear association test found that 20 miRNAs were associated with tumor grade of HCC, of which there were six miRNAs (hsa-miR-103-1, hsa-miR-103-2, hsa-miR-1180, hsa-miR-151, hsa-miR-30b and hsa-miR-30d) differentially expressed in the G3 vs G1 group, hsa-miR-3074 differentially expressed in the G3 vs G2 group, and no miRNA differentially expressed in the G3 vs G4 group.
To find a stable biomarker for poorly differentiated HCC, the above miRNAs were overlapped with miR-NAs differentially expressed from our sequencing data. This yielded three miRNAs (hsa-miR-210, hsa-miR-1180 and hsa-miR-18a) associated with tumor grade, in which hsa-miR-1180 can distinguish HCC with G3 from G1. Furthermore, electronic validation of several miRNAs (hsa-mir-942, hsa-mir-769, hsa-mir-1180, hsamir-1301, hsa-mir-483, hsa-mir-210 and hsa-mir-18a) that were differentially expressed between G3 and the other grades in the RNA sequencing is shown in Fig. 6. All these miRNAs can distinguish G3 from the other HCC groups, which was consistent with the RNA sequencing result.

Survivability analysis
In this study, four overlapped DEGs (ANXA2, C8orf33, TM4SF1 and IGF2BP3) between RNA sequencing and the TCGA dataset were selected for survival analysis in HCC. According to the results of the survival analysis, the expression of ANXA2 (P = 0.00923), C8orf33 (P = 0.04) and IGF2BP3 (P = 0.00313) was significantly correlated with overall survival time of HCC patients (Fig. 7A-C). However, the expression of TM4SF1 was not remarkably associated with overall survival time of HCC patients (Fig. 7D).  Primary bile acid biosynthesis 9 7.61 9 10 À10 hsa04976 Bile secretion 14 3.56 9 10 À8 hsa04920 Adipocytokine signaling pathway 9 2.59 9 10 À4 hsa04115 p53 signaling pathway 9 2.59 9 10 À4 hsa05200 Pathways in cancer 21 4.8 9 10 À4 hsa04975 Fat digestion and absorption 6 2.537 9 10 À3 hsa04010 MAPK signaling pathway 14 2.0704 9 10 À2 Fig. 1. Clustering map of the top 100 DEGs in RNA-seq. The diagram presents the result of a two-way hierarchical clustering of 100 DEGs and samples. The clustering was constructed using the complete-linkage method together with Euclidean distance. Each row represents a DEG and each column a sample. The DEG clustering tree is shown on the right. The color scale illustrates the relative level of DEG expression: red, below the reference channel; green, higher than the reference.

Discussion
The prognosis for HCC remains poor, which probably results from late diagnosis, metastasis and recurrence [13]. Specifically, compared with other histopathological grades of HCC, the poorly differentiated HCC represents greater malignancy of the tissues, leading to a poor outcome after hepatectomy of patients with HCC [3,5]. To better understand the characteristics of the poorly differentiated HCC, we used RNA-seq to create miRNA and mRNA expression profiles for HCC and non-tumorous tissue. This yielded 1051 DEGs (559 up-regulated and 492 down-regulated) and 165 differentially expressed miRNA (157 up-regulated and 8 down-regulated). An miRNA-target regulatory network was then constructed, which contained 3718 miRNA-target interactions. Pathway analysis indicated that miRNA targets were mainly involved in digestion, bile secretion and liver development.  Tumor grades are commonly estimated by histopathological investigation and represent tumor malignancies. Generally, high-grade tumors or poorly differentiated tissues have a poor prognosis and low survival rate. Due to similar characteristics of tumors with different degrees of differentiation, diagnosing poorly differentiated HCC tumors is critical. To identify an indicator for tumor grade of HCC, especially the poorly differentiated HCC, we downloaded the miRNA and mRNA data of HCC from the TCGA portal and found that 577 genes were linearly correlated with tumor grade of HCC.
TM4SF1, a member of the transmembrane 4 superfamily, is characterized by four hydrophobic and transmembrane domains [14]. Several TM4SF proteins have a strong association with cell motility and are associated with cancer invasion and metastasis [15]. Previous study indicated that transfection and overexpression of TM4SF1 in HeLa cells had no impact on tumor cell proliferation but rather enhanced the migration ability of HeLa cells and contributed to the metastatic phenotype [16]. Using cDNA microarrays, Huang et al. [17] found that the expression of TM4SF1 was significantly increased in HBV-related  HCC tissues over the adjacent non-cancerous tissues. We found that TM4SF1 was up-regulated in HBVrelated HCC tumor tissues. Moreover, the expression of TM4SF1 was positively associated with tumor grade and it can distinguish HCC with G3 from G1. However, we found that it was not remarkably associated with the overall survival time of HCC patients. Anyway, we suggest that TM4SF1 may be a molecule closely related to HCC malignancy, especially to the poorly differentiated HCC.
ANXA2 is a calcium-dependent phospholipid binding protein that is also involved in cell differentiation [18]. ANXA2 was frequently increased in HCC tissues and the expression of ANXA2 was significantly correlated with differentiation degree and metastasis phenotype of HCC [19]. Shi et al. [20] suggested that ANXA2 plays a role in the malignant behavior of HCC cells by remodeling cell motility-associated structures. Immunohistology experiments have also indicated that ANXA2 is correlated with the histological grade of HCC, and ANXA2 staining was more intense in poorly differentiated tissues than well-differentiated tissues [21]. Our results showed that ANXA2 was up-regulated in HCC tissues and its expression was positively associated with tumor grade and could distinguish G3 from G1 of HCC. Moreover, survival analysis showed that the expression of ANXA2 was closely associated with the survival time of HCC. Emerging data suggest that ANXA2 may be a molecule closely related to HCC that can distinguish poorly differentiated from well-differentiated HCC and may be a prognosis factor in the development of HCC.
It should be pointed out that the expression of C8orf33 was disordered in HCC samples compared with normal liver samples [22]. In this study, we found that C8orf33 was down-regulated in HCC and could distinguish G3 from G1 of HCC. Furthermore, the survival analysis showed that the expression of C8orf33 was closely related to the survival time of HCC patients. Our results suggested that C8orf33 is an important differentiation-related molecules in the process of HCC and could be regarded as a crucial factor in the prognosis of HCC. In addition, it was reported that aberrant up-regulation of LRRC1 contributes to human HCC [23]. The nuclear membrane proteome of HCC showed that CYB5A protein was down-regulated and played a role in oncogenesis of HCC [24]. Our finding showed that up-regulation of LRRC1 and downregulation of CYB5A were associated with tumor grade. Further studies are needed to explore the role of LRRC1 in the differentiation degree of HCC.
It is well known that miRNAs play important roles in cancer development and progression of HCC. In our study, we found three miRNAs that were associated with tumor grade of HCC, namely hsa-miR-210, hsa-miR-1180 and hsa-miR-18a. One of them, hsa-miR-1180, caught our attention. Our sequencing data indicated that hsa-miR-1180 was up-regulated in HCC. Moreover, hsa-miR-1180 was positively associated with tumor grade and distinguished HCC of G3 from G1. Previous studies have indicated that has-miR-1180 was significantly increased in HCC tissues and cells, and promoted cell proliferation of HCC by targeting TNIP2 [25,26]. Here, we make the first suggestion that hsa-miR-1180 may be an effective biomarker for poorly differentiated HCC.

Conclusions
Taken together, we constructed the miRNA and mRNA expression profiles of poorly differentiated HCC by high-throughput sequencing. Further integrating expression data of HCC with different degree of differentiation from TCGA led to the discovery that TM4SF1 and ANXA2 are closely related molecules for distinguishing poorly differentiated HCC from well-differentiated HCC. Furthermore, ANXA2, C8orf33 and IGF2BP3 were demonstrated to be associated with the survival time of HCC patients, which suggested that they could be considered as prognosis factors for HCC.