The Effects of Epigallocatechin Gallate (EGCG) on Pulmonary Fibroblasts of Idiopathic Pulmonary Fibrosis (IPF)—A Next-Generation Sequencing and Bioinformatic Approach

Idiopathic pulmonary fibrosis (IPF) is a disabling and lethal chronic progressive pulmonary disease. Epigallocatechin gallate (EGCG) is a polyphenol, which is the major biological component of green tea. The anti-oxidative, anti-inflammatory, and anti-fibrotic effects of EGCG have been shown in some studies, whereas its effects in altering gene expression in pulmonary fibroblasts have not been systematically investigated. This study aimed to explore the effect of EGCG on gene expression profiles in fibroblasts of IPF. The pulmonary fibroblasts from an IPF patient were treated with either EGCG or water, and the expression profiles of mRNAs and microRNAs were determined by next-generation sequencing (NGS) and analyzed with the bioinformatics approach. A total of 61 differentially expressed genes and 56 differentially expressed microRNAs were found in EGCG-treated IPF fibroblasts. Gene ontology analyses revealed that the differentially expressed genes were mainly involved in the biosynthetic and metabolic processes of cholesterol. In addition, five potential altered microRNA–mRNA interactions were found, including hsa-miR-939-5p–PLXNA4, hsa-miR-3918–CTIF, hsa-miR-4768-5p–PDE5A, hsa-miR-1273g-3p–VPS53, and hsa-miR-1972–PCSK9. In summary, differentially expressed genes and microRNAs in response to EGCG treatment in IPF fibroblasts were identified in the current study. Our findings provide a scientific basis to evaluate the potential benefits of EGCG in IPF treatment, and warrant future studies to understand the role of molecular pathways underlying cholesterol homeostasis in the pathogenesis of IPF.


Introduction
Idiopathic pulmonary fibrosis (IPF) is a chronic progressive pulmonary disease characterized by progressive fibrosing interstitial pneumonitis [1][2][3][4][5]. Patients usually present with non-specific symptoms, such as exertional dyspnea and dry cough, and high-pitched fine inspiratory crackles (the so-called Velcro-like crackles) are usually heard in bilateral basal lung fields on auscultation [2]. A high-resolution computed tomography is a key diagnostic tool, which may reveal the pattern of usual interstitial pneumonia (UIP), characterized by "honey-combing" (subpleural multi-layer cystic lesions), traction bronchiectasis, and peripheral alveolar septal thickening, initially involving the basal and peripheral lungs with gradual progression to involve the whole lungs [3]. With increasing awareness in recent decades, incidence has risen over time, and estimated to be 2-30 cases per 100,000 person-years [2,3,6]. IPF has an increasing global burden, affecting about 3 million people worldwide [3]. Causing progressively disabling dyspnea, IPF has a devastating effect on patients' quality of life and is a quite lethal disease [4]. The median survival of untreated IPF patients is around 3-5 years, which is worse than the majority of cancers in subjects with similar demographic characters [3,4,7].
Although the precise pathogenic mechanisms of IPF remain largely unclear, it is believed that fibroblasts play a key role in the development and progression of IPF [1,5,8]. Exaggerated proliferation, migration, and activation of fibroblasts, as well as their resistance to apoptosis, differentiation to myofibroblasts, and secretion of extracellular matrix components, have been proposed as a major pathogenic model of IPF [1,3,[9][10][11]. Inflammation and oxidative stress are also commonly considered as major factors promoting pulmonary fibrosis [12][13][14].
The treatment modalities for IPF are currently limited. Only two medications, including pirfenidone, an anti-fibrotic agent, and nintedanib, a multi-target tyrosine kinase inhibitor, have been shown to be effective in reducing disease progression and improving quality of life [5,[15][16][17][18][19][20][21]. Although these drugs offer hope for IPF patients, not all patients respond to these pharmacotherapies, and no biomarkers are available to predict the treatment response to any drug; therefore, efforts are continuously made to explore potential novel treatment modalities for IPF.
Epigallocatechin gallate (EGCG) is the ester from epigallocatechin and gallic acid [22]. This catechin is the major polyphenol responsible for the biological effects in tea [22,23]. In contrast to black tea, in which the flavan-3-ols are converted to theaflavins and thearubigins during fermentation processing, green tea retains the highest level of EGCG (100 g of dried leaves contain about 7380 mg of EGCG) [22,23]. EGCG is also found in trace amounts in some food sources, such as onions, hazelnuts, plums, and so on [24]. EGCG exhibits a wide range of therapeutic properties, such as anti-oxidative [25], anti-inflammatory [26], and anti-fibrotic effects [27]. It may modulate cell signaling pathways, such as mitogen-activated protein kinase (MAPK), NF-κB, and AMP-activated protein kinase (AMPK) pathways, and may also modulate epigenetic changes, such as DNA methylation and histone acetylation [22,23]. As a well-known mitochondrion-targeting medicinal agent, EGCG may also regulate mitochondrial metabolism, such as mitochondrial biogenesis and mitochondrial bioenergetics, as well as regulate cell cycle and apoptosis via mitochondria-mediated pathways [28].
Some in vitro and in vivo studies have shown the effects of EGCG on fibroblasts, such as attenuating cell proliferation, enhancing antioxidant defense systems, and inhibiting inflammation [27,[29][30][31][32]. However, the effects of EGCG in modulating gene expression in pulmonary fibroblasts have not been systematically investigated. We therefore conducted this study to explore the effect of EGCG on gene expression profiles in fibroblasts of IPF using next-generation sequencing (NGS) and bioinformatic analyses.

Gene Expression Profiling and Microrna Changes in IPF Fibroblasts Treated with EGCG
The primary pulmonary fibroblasts from a patient with IPF were treated with EGCG or water, and RNAs were extracted from the cells and sent for NGS followed by bioinformatic analyses.  shows the volcano plot of differentially expressed genes in EGCG-treated versus control fibroblasts. Genes with q-value <0.25 and >2-fold changes were selected for further analyses, including 16 significantly downregulated and 45 significantly upregulated genes in EGCG-treated versus control fibroblasts ( Figure 1, Table A1). The small RNA-sequencing data generated with NGS were analyzed to identify potentially significant changes in microRNA profiles in EGCG-treated versus control fibroblasts. As shown in Table A2 versus control fibroblasts ( Figure 1, Table A1). The small RNA-sequencing data generated with NGS 94 were analyzed to identify potentially significant changes in microRNA profiles in EGCG-treated 95 versus control fibroblasts. As shown in Table A2, we identified 56 microRNAs with >2-fold changes

Gene Ontology Annotations of the Differentially Expressed Genes in IPF Fibroblasts Treated with EGCG
Gene ontology analysis of the 61 differentially expressed genes was performed using DAVID, including cellular components, biological processes, and molecular functions. The cellular components significantly associated with these genes included endoplasmic reticulum (15 genes) and endoplasmic reticulum membrane (14 genes) ( Table 2). The biological processes significantly associated with these genes included cholesterol biosynthetic process (15 genes), isoprenoid biosynthetic process (6 genes), oxidation-reduction process (11 genes), cholesterol biosynthetic process via lathosterol (3 genes), cholesterol biosynthetic process via desmosterol (3 genes), cholesterol homeostasis (5 genes), and steroid biosynthetic process (4 genes) ( Table 2). The analyses failed to identify any molecular functions significantly associated with these genes. In the genes involved in these significantly associated cellular components and biological processes, PCSK9, which is significantly involved in cholesterol homeostasis, was the only gene targeted by a differentially expressed microRNA, hsa-miR-1972 ( Table 2).

157
Using IPA, the canonical pathways associated with the 61 differentially expressed genes were 158 investigated. As shown in Figure 3, we found that the associated canonical pathways included The protein-protein interaction (PPI) network analysis using the STRING database identified a total of 61 nodes and 172 edges, with PPI enrichment p-value <1.0 × 10 −16 . Using k-means clustering, the network could be further clustered into three clusters (Figure 4a). Within these three clusters, we found a core cluster, in which almost all genes were associated with cholesterol biosynthetic and metabolic processes (Figure 4b). found a core cluster, in which almost all genes were associated with cholesterol biosynthetic and metabolic processes (Figure 4b).

182
The underlying mechanisms of IPF pathogenesis are quite complex, which might include 183 oxidative stress, inflammation, enhanced proliferation and migration of fibroblasts, and so on 184 [2,3,5,7,21]. As EGCG has anti-oxidative, anti-inflammatory, and anti-fibrotic effects, the current 185 study tried to investigate the effects of EGCG on gene expression profiles in IPF fibroblasts. We   mitochondria-involved autophagy, stimulating glucose uptake, and increasing lipid oxidation [22].

202
As EGCG may also block the activity of peroxisome proliferator-activated receptors gamma 203 (PPARγ) via binding to its active site, this polyphenol has been considered a potential anti-obesity

Discussion
The underlying mechanisms of IPF pathogenesis are quite complex, which might include oxidative stress, inflammation, enhanced proliferation and migration of fibroblasts, and so on [2,3,5,7,21]. As EGCG has anti-oxidative, anti-inflammatory, and anti-fibrotic effects, the current study tried to investigate the effects of EGCG on gene expression profiles in IPF fibroblasts. We found 61 differentially expressed genes in EGCG-treated IPF fibroblasts as compared with the control cells. Gene ontology analyses and network analyses revealed that these genes were mainly involved in the biosynthesis and metabolism of cholesterol. In addition, we also identified 56 differentially expressed microRNAs, and further analyses found that five potential altered microRNA-mRNA interactions, including hsa-miR-939-5p-PLXNA4, hsa-miR-3918-CTIF, hsa-miR-4768-5p-PDE5A, hsa-miR-1273g-3p-VPS53, and hsa-miR-1972-PCSK9, might be important changes in response to EGCG treatment in IPF fibroblasts. Among the five genes with potential microRNA-mRNA interactions, PCSK9, which might be upregulated in association with downregulated has-miR-1972, was the only one involved in cholesterol metabolism.
The effects and mechanisms of EGCG have been studied for various diseases. EGCG is considered beneficial for cardiovascular health and may prevent metabolic syndrome due to its effects in preventing or improving atherosclerosis, cardiac hypertrophy, myocardial infarction, diabetes, inflammation, and oxidative stress [34,35]. In type 2 diabetes and obesity, EGCG may modulate muscle homeostasis by increasing the expression of anti-oxidative enzymes, reversing the increased production of reactive oxygen species in skeletal muscle, regulating mitochondria-involved autophagy, stimulating glucose uptake, and increasing lipid oxidation [22]. As EGCG may also block the activity of peroxisome proliferator-activated receptors gamma (PPARγ) via binding to its active site, this polyphenol has been considered a potential anti-obesity compound [36]. Potential neuroprotective effects of EGCG have also been reported [37]. EGCG exhibits multiple immune-modulating effects, regulating intestinal mucosal immune responses, allergic diseases, as well as anticancer immunity [38]. In addition to its effects in inhibiting NF-κB, epithelial-mesenchymal transition, and cell invasion, the anticancer effects of EGCG might also involve the regulation of microRNAs and epigenetic mechanisms responsible for carcinogenesis and cancer progression [39,40].
A few previous studies have investigated the effect of EGCG on pulmonary fibroblasts. EGCG has been shown to inhibit the production of TNF-α, which might play an important role in the pathogenesis of IPF [41]. Sriram et al. have done a series of studies on the effects of EGCG in pulmonary fibrosis using the rat model of pulmonary fibrosis induced by intra-tracheal instillation of bleomycin [27,[29][30][31]. EGCG augmented antioxidant activities and alleviated the bleomycin-induced oxidative stress, inflammation (increased levels of NF-κB, TNF-α, and IL-1β), and alveolar damage [30,31]. The increased glycoconjugates, increased activities of matrix degrading lysosomal enzymes, and ultrastructural changes induced by bleomycin were also attenuated by EGCG, suggesting the potential of EGCG as an anti-fibrotic agent [27]. EGCG also reversed the increased expression of gelatinases, including matrix metalloproteinase (MMP)-2 and MMP-9, TGF-β1, SMADs, and α-smooth muscle actin (α-SMA) induced by bleomycin treatment [29]. The in vitro studies using a fibroblast cell line also revealed that EGCG was capable of reversing the TGF-β1-induced proliferation and activation of fibroblasts [29]. Using a rat model of irradiation-induced pulmonary fibrosis, You et al. revealed that EGCG inhibited irradiation-induced alveolitis and pulmonary fibrosis through the effects such as inhibiting fibroblast proliferation, reducing collagen deposition, and regulating inflammatory cytokines [32].
The roles of cholesterol and lipoproteins in pulmonary diseases have been recognized in a few studies, although the detailed mechanisms remain unclear. Dyslipidemia affects innate and adaptive immunities in the lung [42]. Statins, 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR) inhibitors originally designed as a cholesterol-lowering drug, also have HMGCR-independent properties, such as anti-inflammatory and anti-fibrotic effects. Pitavastatin and simvastatin inhibited TGF-β1-induced production of growth factors and fibrogenic mediators from lung fibroblasts [43,44]. In a mouse study, pravastatin attenuated bleomycin-induced pulmonary fibrosis by reducing the expression of inflammatory and growth factors, such as TGF-β1 and connective tissue growth factor, and the oxidative stress [45]. A large cross-sectional clinical study showed that lower high-density lipoprotein (HDL)-cholesterol was associated with more subclinical interstitial lung disease, shown by more high attenuation areas measured with computed tomography, and more extracellular matrix remodeling, shown by increased serum MMP-7 and surfactant protein-A [46]. Through gene ontology and functional analyses, we found that the EGCG-induced differentially expressed genes mainly involve the biosynthesis and metabolism of cholesterol. Further studies are needed to understand the roles of cholesterol-associated pathways in pulmonary fibrosis.
In this study, five differentially expressed genes were found as the potential targets of corresponding differentially expressed microRNAs, including downregulated PLXNA4 and upregulated CTIF, PDE5A, VPS53, and PCSK9. As a study using the rat model of bleomycin-induced pulmonary fibrosis showed increased expression of PLXNA4 [47], the downregulation of PLXNA4 induced by EGCG might have beneficial effect in treating pulmonary fibrosis. In contrast to our findings that EGCG upregulated PDE5A expression in IPF fibroblasts, PDE5A inhibition by sildenafil improved bleomycin-induced pulmonary fibrosis by reducing oxidative stress [48]. PCSK9 encodes proprotein convertase subtilisin/kexin type 9, which is a regulator of the homeostasis of plasma low-density lipoprotein (LDL)-cholesterol, and is associated with the metabolism of lipid and glucose [49]. Expression of PCSK9 might reverse the abnormal cholesterol accumulation and the development of fibrosis in the liver caused by E2F1 deficiency [50]. Although the roles of these genes in regulating the cell physiology of pulmonary fibroblasts remain largely unknown, these EGCG-induced gene expression alterations might provide potential targets to reverse pulmonary fibrosis and deserve further studies.
Some anti-fibrotic and pro-fibrotic microRNAs have been reported, and some of them might contribute to the pathogenesis of IPF [1,51,52]. The expression of miR-155 in human lung fibroblasts was upregulated by TNF-α and IL-1β and downregulated by TGF-β1; miR-155, which might target keratinocyte growth factor, promoted migration of fibroblasts and enhanced pulmonary fibrosis [53]. In studies using the mice model of bleomycin-induced pulmonary fibrosis, upregulation of miR-155 and downregulation of miR-29 were observed, which correlated with the degree of lung fibrosis [53,54]. The increased expression of miR-155 and decreased expression of miR-29 have been observed in the lungs of IPF patients [51]. In addition, greater expression and localization of miR-34a in pulmonary fibroblasts of IPF have been reported, which might function as an inhibiting mechanism of pulmonary fibrosis via inducing senescence and apoptosis of the fibroblasts [55]. Our findings that EGCG significantly upregulated miR-29b-2-5p and miR-34a-3p and downregulated miR-155-3p in IPF fibroblasts suggested a potential role of EGCG in the treatment of IPF through regulation of these microRNAs.
The dose of EGCG used in this study might be a concern. While most published in vitro studies used 10-100 µM of EGCG [56,57], we chose 25 µM of EGCG. As shown in a few previous studies, this dose of EGCG did not cause significant proliferation inhibition in human fibroblast cell line [29,58] and human colorectal cancer cell lines [59]. In line with these studies, our study showed that 25 µM of EGCG did not significantly alter the expression of genes related to cell proliferation or cell death. In addition, the bioavailability of EGCG is quite poor [56,60]. As shown in a pharmacokinetic study, the peak plasma EGCG level was only about 0.17 µM after drinking two cups of tea [56]. Parenteral routes, such as intravenous injection, might be needed to reach an effective systemic dose. However, a high systemic dose might cause hepatotoxicity [61]. In 2018, the European Food Safety Authority stated that taking ≥800 mg of EGCG daily might increase serum transaminase levels [62]. Further study is needed to determine the optimal systemic dose of EGCG. On the other hand, inhalation of an EGCG aerosol might possibly provide an alternative route of administration to increase EGCG concentration in the lungs while avoiding the potential systemic toxicity [63].
The mRNA and small RNA expression profiles were assessed using NGS as our previous studies [1,[64][65][66][67][68]. In brief, total RNAs were extracted using TRIzol ® Reagent (Thermo Fisher Scientific, Waltham, MA, USA, Catalog No. 15596018) according to the instruction manual. The purified RNAs were quantified at OD 260nm using an ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and qualitatively analyzed using a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA) with RNA 6000 LabChip kit (Agilent Technologies, Santa Clara, CA, USA). Library preparation and deep sequencing were carried out at Welgene Biotechnology Company (Taipei, Taiwan) as the official protocol of Illumina (San Diego, CA, USA).
For transcriptome sequencing, the library was constructed with Agilent's SureSelect Strand Specific RNA Library Preparation Kit followed by AMPure XP Beads size selection. The sequence was directly determined using Illumina's sequencing-by-synthesis (SBS) technology, and the sequencing data were generated by Welgene's pipeline based on Illumina's base-calling program bcl2fastq v2.2.0. For read alignment, HISAT2 [69], a fast and sensitive alignment program for mapping NGS reads to genomes, was used. The new indexing scheme of HISAT2 is based on the hierarchical graph Ferragina-Manzini index (GFM index) [70]. HISAT2 uses the global GFM index and a large set of small GFM indexes providing better performance for splicing junction alignment, which collectively covers the whole genome for rapid and accurate alignment, making HISAT2 an effective tool for transcriptome alignment. Differential expression analysis based on Cuffdiff (Cufflinks 2.2.1) [71] with genome bias detection/correction and Welgene in-house programs was performed. Differential expressed genes of each experiment design were followed by enrichment test for functional assay by clusterProfiler 3.6 [72]. The genes with low expression levels (<0.3 fragment per kilobase of transcript per million mapped reads (FPKM)) in both EGCG-treated and control fibroblasts and the genes with undetected level in either EGCG-treated or control fibroblasts were excluded. The p-values were calculated by Cuffdiff with non-grouped sample using "blind mode", in which all samples were treated as replicates of a single global "condition" and used to build one model for statistical tests [71,73]. The q-values were p-value adjusted with false discovery rate using the method by Benjamini and Hochberg [74]. Genes with q-value < 0.25 (i.e., −log 10 (q-value) > 0.602) and > 2-fold changes were considered significantly differentially expressed.
For small RNA sequencing, samples were prepared using Illumina sample preparation kit according to the TruSeq Small RNA Sample Preparation Guide. After the 3 and 5 adaptors were ligated to the RNA, reverse transcription followed by PCR amplification was performed. The enriched cDNA constructs were size-fractionated and purified on a 6% polyacrylamide gel electrophoresis and the bands containing the 18-40 nucleotide RNA fragments (140-155 nucleotides in length with both adapters) were extracted, which were then sequenced on an Illumina instrument (75 bp single-end reads). Sequencing data was processed with the Illumina software. After trimming and removing low-quality data with Trimmomatics v0.36 [75], the qualified data were analyzed with the miRDeep2 [76] to clip the 3 adapter sequence and discard reads shorter than 18 nucleotides. The reads were then aligned to the human genome from the University of California, Santa Cruz (UCSC). Only reads that mapped perfectly to the genome ≤5 times were used for microRNA detection, because microRNAs usually map to few genomic locations. MiRDeep2 [76] estimates expression levels of known microRNAs, and also identifies novel microRNAs. The microRNAs with low levels (<1 normalized read per million (rpm)) in both EGCG-treated and control fibroblasts were excluded.

Analyses Using microRNA Target Predicting Databases
We used miRmap, an open-source software library providing comprehensive prediction of microRNA targets (http://mirmap.ezlab.org/) (accessed on 19 September 2018) [77], to predict the potential targets of significantly differentially expressed (>2-fold change) microRNAs. A miRmap score represents the repression strength of a microRNA on a target mRNA. Only the microRNA-mRNA pairs with miRmap score >97.0 were considered in the current study.

Database for Annotation, Visualization and Integrated Discovery (DAVID) Database Analysis
The DAVID (version 6.8) (https://david.ncifcrf.gov/) (accessed on 06 February 2019) [78] is a powerful tool for gene functional classification, which integrates many functional annotation databases, such as Gene Ontology (GO) biological process, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. By calculating the similarity of global annotation profiles with agglomeration algorithm method, a list of interesting genes can be classified into clusters based on their related biological functions, signaling pathways, or diseases. The differentially expressed genes were analyzed using the methods as in our previous studies [1,[64][65][66][67][68]. The p-values (adjusted with false discovery rate using the method by Benjamini et al.) <0.05 were taken as the criteria of significance.

Ingenuity Pathway Analysis (IPA)
Ingenuity ® Pathway Analysis (IPA) software (Ingenuity Systems, Redwood City, CA, USA) integrates many research results and performs multiple analyses providing a comprehensive interpretation of big experimental data. We used IPA (version 2.3) to identify the canonical pathways associated with the candidate genes.

Search Tool for the Retrieval of Interacting Genes (STRING)
The STRING database (version 11.0) (https://string-db.org/) (accessed on 7 February 2019) covers 5090 organisms, 24.6 million proteins and >2000 million interactions that provides analysis and integration of direct and indirect protein-protein interactions (PPI), and focuses on functional association [79]. The differentially expressed genes identified were uploaded, and interactions with at least medium confidence (interaction score >0.4) were selected. Network was clustered using k-means clustering to a specified number of clusters.

Conclusions
In summary, our study demonstrated an innovative model to discover the potential effects of a nature compound on gene expression alterations and microRNA changes using an NGS and bioinformatic approach. We identified differentially expressed genes and microRNAs in response to EGCG treatment in IPF fibroblasts. These gene expression changes were mainly involved in the biosynthesis and metabolism of cholesterol, suggesting that EGCG might have effects on IPF treatment through regulation of the cholesterol-associated genes. Our findings provide a scientific basis to evaluate the potential benefits of EGCG in IPF treatment, and warrant future studies to understand the role of molecular pathways underlying cholesterol homeostasis in the pathogenesis of IPF.

Conflicts of Interest:
The authors declare no conflict of interest related to this study.