Identification of key genes, pathways and potential therapeutic agents for liver fibrosis using an integrated bioinformatics analysis

Background Liver fibrosis is often a consequence of chronic liver injury, and has the potential to progress to cirrhosis and liver cancer. Despite being an important human disease, there are currently no approved anti-fibrotic drugs. In this study, we aim to identify the key genes and pathways governing the pathophysiological processes of liver fibrosis, and to screen therapeutic anti-fibrotic agents. Methods Expression profiles were downloaded from the Gene Expression Omnibus (GEO), and differentially expressed genes (DEGs) were identified by R packages (Affy and limma). Gene functional enrichments of each dataset were performed on the DAVID database. Protein–protein interaction (PPI) network was constructed by STRING database and visualized in Cytoscape software. The hub genes were explored by the CytoHubba plugin app and validated in another GEO dataset and in a liver fibrosis cell model by quantitative real-time PCR assay. The Connectivity Map L1000 platform was used to identify potential anti-fibrotic agents. Results We integrated three fibrosis datasets of different disease etiologies, incorporating a total of 70 severe (F3–F4) and 116 mild (F0–F1) fibrotic tissue samples. Gene functional enrichment analyses revealed that cell cycle was a pathway uniquely enriched in a dataset from those patients infected by hepatitis B virus (HBV), while the immune-inflammatory response was enriched in both the HBV and hepatitis C virus (HCV) datasets, but not in the nonalcoholic fatty liver disease (NAFLD) dataset. There was overlap between these three datasets; 185 total shared DEGs that were enriched for pathways associated with extracellular matrix constitution, platelet-derived growth-factor binding, protein digestion and absorption, focal adhesion, and PI3K-Akt signaling. In the PPI network, 25 hub genes were extracted and deemed to be essential genes for fibrogenesis, and the expression trends were consistent with GSE14323 (an additional dataset) and liver fibrosis cell model, confirming the relevance of our findings. Among the 10 best matching anti-fibrotic agents, Zosuquidar and its corresponding gene target ABCB1 might be a novel anti-fibrotic agent or therapeutic target, but further work will be needed to verify its utility. Conclusions Through this bioinformatics analysis, we identified that cell cycle is a pathway uniquely enriched in HBV related dataset and immune-inflammatory response is clearly enriched in the virus-related datasets. Zosuquidar and ABCB1 might be a novel anti-fibrotic agent or target.


INTRODUCTION
Hepatic fibrosis is characterized by the pathological accumulation of extracellular matrix (ECM) following chronic liver injury arising from various sources including toxic damage, viral infections, autoimmune conditions, and metabolic or genetic diseases. Patients with advanced liver fibrosis generally have a poor prognosis as they often develop decompensated cirrhosis and hepatocellular carcinoma (Tsochatzis, Bosch & Burroughs, 2014).
For the management of patients with hepatic fibrosis in clinical practice, several wellvalidated clinical practice guidelines and recommendations have been established, including antiviral therapy for patients with chronic hepatitis B or C (Liver, 2017a;Liver, 2017b), a cessation of alcohol consumption in patients with alcoholic liver disease (Liver, 2012), and lifestyle modifications in patients with nonalcoholic fatty liver disease (Djordjevic et al., 2018). However, eliminating the cause of fibrosis is not generally sufficient to halt progression from liver fibrosis to cirrhosis (Feng et al., 2018). Unfortunately, there are no approved anti-fibrotic drugs currently available (Bottcher & Pinzani, 2017). A better understanding of the molecular mechanisms controlling the fibrotic response is thus needed to facilitate the development of new drugs, and to thereby improve patient outcomes.
High-throughput sequencing technology offers an ideal means of profiling large gene expression datasets in order to gain a comprehensive understanding of the mechanisms underlying fibrosis. For example, Chan et al. (2016) found that in cirrhotic liver tissues there is a unique gene expression pattern related to inflammation, the immune response, and cell growth, and with a potential relationship with cancer as well. Using a bioinformatics analysis, many hub genes which are essential to fibrogenesis have been identified. For example, ITGBL1 was identified in an HBV-related fibrosis dataset . LUM, THBS2, FBN1, and EFEMP1 were all identified in a NAFLD-related fibrosis dataset (Lou et al., 2017). TAF1, HNF4A and CALM2 were identified in an HCV-related fibrosis dataset (Ji et al., 2018). COL6A1, COL6A2, COL6A3, PIK3R3, COL1A1, and CCND2, were identified in NAFLD and HCV-related datasets . However, it remains unclear as to whether these pathways and hub genes are unique to individual disease etiologies or are shared between them.
In order to clarify this uncertainty, we integrated three datasets, each pertaining to fibrosis of a different etiology. Using bioinformatics analyses, we thereby sought to identify key genes and pathways of interest, and to screen for therapeutic agents and novel targets with the potential to treat liver fibrosis.

Microarray data
Four gene expression datasets were downloaded from the Gene Expression Omnibus (GEO) database; three were analyzed to identify DEGs, while one was used for validation. Table 1 summarizes the pertinent information for the selected GEO datasets used in this study. GSE6764 (Wurmbach et al., 2007), GSE49541 (Moylan et al., 2014), and GSE84044  represent datasets from patients with liver fibrosis arising from hepatitis C virus (HCV), nonalcoholic fatty liver disease (NAFLD), and hepatitis B virus (HBV), respectively. All three of these gene expression profiles were based on the GPL570 platform. GSE49541 and GSE84044 were derived from two liver fibrosis studies in which tissues with severe fibrosis (F3-F4) and mild fibrosis (F0-F1) were selected. GSE6764 and GSE14323 (Mas et al., 2009) (used for validation) were derived from two liver cancer studies, in which cirrhotic (F4) and normal tissues (F0) were selected.

Identification of differentially expressed genes
Background expression value correction and data normalization were conducted for the raw data in each dataset using an R package (Affy, version 1.52.0). Probes in each data file were then annotated based on the appropriate platform annotation files. Probes without matching gene symbols were removed. In instances where different probes mapped to the same gene, the mean value of all probes mapping to that gene was taken as the final expression value for that gene. Then, the Linear Models for Microarray Analysis R package (limma; version 3.30.11; Ritchie et al. (2015)) was applied for differential expression analysis. Those genes with an adjusted P-value < 0.05 and absolute value of fold-change (FC) >1.5 were deemed to be the DEGs. DEGs overlapping between datasets were obtained using an online Venn analysis tool (http://bioinformatics.psb.ugent.be/webtools/Venn/).

Gene Ontology and pathway enrichment analyses
Gene Ontology (GO) is a commonly used bioinformatics tool that provides comprehensive information on gene function of individual genomic products based on defined features. This analysis consists of three facets: molecular functions (MF), biological processes (BP) and cellular components (CC). The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource for understanding high-level biological functions and utilities. These analyses and annotations are based on the DAVID database (https://david.ncifcrf.gov/), which provides a comprehensive set of functional annotation tools for investigators to explore and understand the biological meaning underlying particular gene lists. In this study, both GO and KEGG analyses of DEGs were performed with a criterion false discovery rate (FDR) < 0.05.

Protein-protein interaction (PPI) network construction and hub gene analysis
In order to analyze the connections among the proteins encoded by identified DEGs, DEGs were uploaded to Search Tool for the Retrieval of Interacting Genes (STRING, https://string-db.org/), a database of known and predicted protein-protein interactions, and the results with a minimum interaction score of 0.4 were visualized in Cytoscape. Furthermore, CytoHubba, a Cytoscape plugin app, providing a user-friendly interface to explore important nodes in biological networks, was utilized with the maximal clique centrality (MCC) method to explore the PPI network for hub genes.

DEGs validation
Another dataset, GSE14323, was used to confirm the validity and disease relevance of identified DEGs. A heat map of the expression of 25 hub genes was developed using the HemI1.0.3.3 software. Statistical difference analysis between the liver cirrhosis group (LC) and normal control group (NC) was performed via student's t -test using SPSS V20.0. P < 0.05 was considered statistically significant. As activation of hepatic stellate cells (HSCs) is considered as a central driver of liver fibrosis, we used a human HSC cell line-LX2 treated with TGF-β1 to represent this activation stage. An expression of 25 hub genes was performed by quantitative real-time PCR assay compared with normal control.

Cell culture and treatment
The LX2 cell line was purchased from Procell Life Science & Technology (Wuhan, China), cultured with Dulbecco Modified Eagle Medium (DMEM)-high glucose supplemented with 10% fetal bovine serum (FBS) and antibiotics (100 U/mL penicillin-G and 100 µg/mL streptomycin), and incubated at 37 • C in 5% CO2 and 95% humidified air. The LX2 cells were seeded in a 10-cm culture dish at a density of 1 * 10 6 for 6 h. After attachment, the LX2 cells were treated with TGF-β1 (R&D systems, Catalog #240-B/CF) at 10 ng/ml concentration or left untreated as normal control for 24 h. Then RNA and proteins were isolated for further use.

Western blot assay
LX2 cells total protein were extracted with ice-cold RIPA lysis buffer. Protein concentration was determined using the BCA Protein Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA). Quantified proteins were separated on SDS-PAGE and transferred onto PVDF membranes (Millipore Corporation, Burlington, MA, USA). After blocking, membranes were incubated with anti-αSMA (1:20,000, ab124964, Abcam, UK) at 4 • C overnight. Then, membranes were washed with TBST and incubated with secondary antibodies for 2 h at room temperature. The anti-GAPDH (1:1,000, CST) was set as internal control. Protein bands were visualized by using ECL equipment (Pierce Chemical, Waltham, MA, USA).

Quantitative real-time PCR assay
RNA was extracted from cell line LX2 by Trizol reagent (Takara, Kusatsu, Japan) by following the manufacturer's instructions. The cDNAs were synthesized with a commercial kit (Takara, Japan). Gene expressions were measured by real-time PCR with CFX Connect TM Real-Time PCR System (Bio-Rad, Hercules, CA, USA). GAPDH was used as an internal control and the relative expression levels of mRNA were calculated using the 2 − Ct method. The primer pairs used in the experiments are listed in Data S1.

Prediction of therapeutic agents and target genes
To discover potential anti-fibrotic agents, the identified 185 DEGs were queried using the Connectivity Map online tool (L1000 platform; https://clue.io/l1000-query). This tool compares queried signatures with a gene expression profile database of several cell lines after treatment with more than 2,000 compounds, most of which are FDA approved. Drugs whose signatures were in opposition to the disease signature were assumed to have therapeutic potential.

Identification of 185 conserved DEGs
As shown in Fig. 1, each dataset was initially analyzed separately to identify DEGs unique to fibrosis of a given origin. 1,563 DEGs were identified in GSE6764 (HCV), 243 DEGs in GSE49541 (NAFLD), and 1,396 DEGs in GSE84044 (HBV) (Data S2). 185 DEGs overlapped across all three datasets, suggesting that these fibrosis-related DEGs may be conserved regardless of disease etiology. Among these 185 DEGs, 174 were up-regulated while only 11 were down-regulated. Interestingly, although the number of DEGs in NAFLD related dataset is relatively small, 94.7% of the 243 DEGs intersect with other datasets.

Functional enrichment analysis of DEGs
In order to compare the differences in gene function among these 3 datasets, GO and KEGG analyses were performed on each dataset (Data S3) and top 10 significant GO_BP and KEGG pathways are shown in Tables 2 and 3, respectively. Unexpectedly, the cell cycle pathway was uniquely enriched in the HBV-related dataset, ranking third among all KEGG pathways for this dataset. When compared with a non-viral fibrosis dataset (GSE49541), those datasets in which fibrosis was of viral origin (GSE6764 and GSE84044) contained DEGs enriched for immune-inflammatory responses, consistent with the distinct role of immunological responses in the initiation and control of local disease in affected individuals.
Next, GO and KEGG analyses were performed on the 185 common DEGs. The GO analysis revealed that most of the proteins encoded by these DEGs were extracellular matrix proteins located in the extracellular space (Fig. 2). The molecular functions (MF) enriched in this dataset were primarily associated with platelet-derived growth-factor binding and extracellular matrix structural constitution, while the enriched biological processes (BP) were primarily those associated with extracellular matrix organization and cell adhesion (Fig. 2). The KEGG analysis revealed that the primary enriched signaling pathways were those associated with ECM-receptor interaction, protein digestion and absorption, focal adhesion, and the PI3K-Akt signaling pathway (Fig. 3). Together, these shared DEGs highlight the central roles of cell-cell adhesion and ECM dysregulation in the development of fibrosis, regardless of the etiological origin of the disease.

PPI network construction and hub gene identification
To better understand which of these shared DEGs were most likely to be the key genes most essential for the development of fibrosis, a PPI network for these 185 common DEGs was built with 105 nodes and 275 edges. 80 of the 185 DEGs were not included in the PPI network (Fig. 4), as interaction score of these 80 genes were less than 0.4. Among the 105 genes in the PPI network, the top 25 genes according to the MCC method were selected using the CytoHubba plugin and are sequentially ordered as follows: COL1A2,  (Fig. 5). These 25 genes were deemed to be the hub genes and were those genes most likely to be essential for fibrogenesis. Most genes encode ECM components, including COL1A2, DCN, and FBN1. Other hub genes play known roles in ECM structural regulation (THBS2, ITGB8, VWF), while some are associated with ECM degradation (ADAMTS2, PCOLCE2, CTSK). This finding is consistent with known fibrogenic mechanisms, and suggests key potential drug targets that are most likely to have effective anti-fibrotic activity when disrupted.

Hub gene validation
In order to extend and validate our findings in a distinct model of human liver fibrosis, these top 25 hub genes were validated in the GSE14323 dataset, in which liver cirrhotic (LC) and normal control tissues (NC) were selected for analysis. Figure 6 displays a heatmap of GSE14323 expression profile data. This expression profile was consistent with the overlapping DEGs identified in the initial three datasets, with 23 of these 25 hub DEGs being up-regulated in cirrhotic patients. Statistical analysis of these genes in the validation dataset is shown in Fig. 7. Differences for all the hub genes between the LC and NC groups were statistically significant with the exception of ITGB8. A cell model of liver fibrosis was also constructed to validate these 25 hub genes. When treated with TGF-β1, the LX2 cells extended more tentacles and expressed more a-SMA protein (One of the markers of hepatic stellate cell activation) (Fig. 8), indicating the cell model was successfully established. Figure 9 displays statistical analysis of 25 hub genes relative expression to GAPDH, 13 of these 25 hub genes was up-regulated significantly, which is consistent with the trend of GEO datasets in this study. However, 4 genes (LUM, THBS2, ITGB8 and SPP1) was down-regulated in TGF-β1 treated cells.

Prediction of potential therapeutic agents and targets
Given the role of 185 DEGs in fibrogenesis, we next wanted to probe for potential therapeutic compounds that might best be suited to target these genes in order to achieve a beneficial therapeutic outcome. To that end the connectivity map L1000 platform, which compiles gene expression profiles associated with a wide range of therapeutic compounds, was used to search for drugs with the potential for therapeutic repurposing as a means of treating liver fibrosis (Data S4). The top 10 compounds are shown in Table 4, and when sequentially ordered by median_score are: Prometon, MK-212, Evodiamine, Zosuquidar, CAY-10415, Caffeic-acid, Budesonide, Rilmenidine, Afatinib, Desloratadine.
Target genes corresponding to each compound (with the exception of prometon) were also listed. Among these target genes, three (HTR2B, ABCB1 and ALOX5) were significantly up-regulated in HBV and HCV datasets, and the mRNA expression levels in each dataset were listed in Table 5. Together, these compounds and target genes provide a promising list for researchers or companies interested in conducting pre-clinical research into the mechanisms of and treatments for fibrosis both in vitro and in vivo.

DISCUSSION
Globally, HBV, HCV, and NAFLD are the most three common causes of liver fibrosis (Altamirano-Barrera, Barranco-Fragoso & Méndez-Sánchez, 2017). In the present study, we integrated datasets that were focused on these three most common causes of liver fibrosis, and in so doing we were able to identify different and common signaling pathways for the fibrogenesis. Transition of hepatic stellate cells (HSCs) from a quiescent to an activated state is a sign of the onset of liver fibrosis, and this process is controlled by E-type cyclins (CcnE1, CcnE2) and their associated cyclin-dependent kinase 2 (Cdk2) (Nevzorova et al., 2012;Ohtsubo et al., 1995). Cyclin E-Cdk2 has long been considered an essential master regulator of progression through the G1 phase of the cell cycle (Hwang & Clurman, 2005). According to our KEGG pathway enrichment results, the cell cycle pathway is uniquely enriched in HBV-related fibrosis dataset, with CcnE2 being significantly up-regulated only in this HBV dataset, but not in the HCV or NAFLD datasets (Table 5). Therapeutic targeting of Cyclin E1 via RNAi has been shown to have robust anti-fibrotic activity in mice (Bangen et al., 2017), and if this technology can be applied clinically in future, we predict that it will be most effective in those patients with chronic hepatitis B. Both HBV and HCV are non-cytopathic viruses, and liver damage in infected individuals is mainly caused by an inflammatory immune response aimed at eliminating the virus (Guidotti & Chisari, 2006), with such persistent inflammation leading to liver fibrosis (Protzer, Maini & Knolle, 2012). In contrast to such fibrosis of viral origin, the production of reactive oxygen species (ROS) and resulting oxidative stress is thought to be a critical factor in NAFLD-associated fibrosis. Although NAFLD is always accompanied by an inflammatory reaction with variations in levels of pro-inflammatory cytokines (Cai et al., 2005), the immune-inflammatory response is not significantly enriched in this NAFLDrelated dataset, whereas it is clearly enriched in the virus-related datasets.
By using a PPI network analysis, we identified 25 hub genes, some of which have been previously reported including LUM, THBS2, FBN1 (Lou et al., 2017), COL6A3, and COL1A1 . These 25 hub genes are crucial to fibrogenesis, and are expressed regardless of etiology. The development of anti-fibrotic drugs should therefore focus on these genes as targets. Connective tissue growth factor (CTGF), one of the 25 identified hub genes, is expressed at very low levels in normal liver tissue but is significantly upregulated in fibrotic liver tissue. Recently, a clinical trial of patients with   HBV-associated liver fibrosis was completed utilizing a human monoclonal antibody reactive to CTGF (NCT01217632). Although this clinical trial was terminated due to an unexpected prominent single-agent effect of entecavir in this patient population, it did show promise.
In this study, we were able to screen for and identify 10 compounds that may have therapeutic activity against liver fibrosis. Among these compounds, evodiamine, cafficacid, and budesonide have all been shown to be effective in animal models or clinical trials (Alferink et al., 2017;Silveira & Lindor, 2014;Yang et al., 2018;Yang et al., 2017). Four agents, MK-212, CAY-10415, afatinib, and desloratadine, have not been tested in vivo, but compounds targeting the same molecules as these four agents have been reported to have the potential to ameliorate liver fibrosis (Boettcher et al., 2012;Ebrahimkhani et al., 2011;Fuchs et al., 2014;Kennedy et al., 2018). Prometon, Zosuquidar, and Rilmeidine have not yet been reported to have relationship with fibrosis.
Among the target genes corresponding to these compounds, we found that HTR2B, ABCB1, and ALOX5 were significantly up-regulated in HBV and HCV related liver fibrosis datasets (Data not show). Stimulation of the 5-hydroxytryptamine 2B receptor (HTR2B) on HSCs by serotonin is required to negatively regulate hepatocyte regeneration, and antagonism of HTR2B has been shown to attenuate fibrogenesis and improve liver function in disease models in which fibrosis was pre-established and progressive (Ebrahimkhani et al., 2011). Interestingly, MK-212, an HTR2B agonist, showed a negative liver fibrosis gene expression profile suggesting potential as an anti-fibrotic agent, although formal experimental testing is needed. Arachidonate 5-lipoxygenase (ALOX5) plays a role in the synthesis of leukotrienes from arachidonic acid, and inhibition of the ALOX5 pathway markedly reduces the number of Kupffer cells in culture and attenuates inflammation and fibrosis in experimental liver disease (Titos et al., 2003). Recently, a clinical study revealed that frequent coffee consumption was inversely correlated with liver stiffness (Alferink et al., 2017), with suggestions that the underlying mechanism may be one related to the inhibition of TGF-β1/Smad3 signaling and the induction of autophagy in HSCs in response to caffeic acid . As an inhibitor of ALOX5, caffeic acid may thus be able to attenuate liver fibrosis via this ALOX5 (Sud'ina et al., 1993) pathway.
ATP Binding Cassette Subfamily B Member 1 (ABCB1) is known for encoding P glycoprotein, which is responsible for decreased drug accumulation in multidrug-resistant cells and often mediates the development of resistance to anticancer drugs, such as Zosuquidar mentioned above. However, there are currently no studies reporting that ABCB1, P glycoprotein or Zosuquidar is associated with liver fibrosis. Some studies have reported P glycoprotein was increased in rat activated HSC (Hannivoort et al., 2008), and its activity was increased by TGF-β (Baello et al., 2014) and endoplasmic reticulum stress (Ledoux et al., 2003), which are considered to be effective activators of HSC. Combining the findings of our research, we infer that ABCB1 might be a novel therapeutic target to liver fibrosis, although this hypothesis need to be verified in further study.