Transcriptional profiling to identify the key genes and pathways of pterygium

Purpose Pterygium results from a variety of biological pathways that are involved in the formation of ocular surface diseases. However, the exact pathogenesis of pterygium is still unclear. Our study focused on gene expression profiles to better understand the potential mechanisms of pterygium. Methods RNA sequencing experiments were performed on clinical pterygium tissues and normal conjunctival tissues. To identify the hub genes for the development of pterygium, we further conducted weighted gene co-expression network analysis (WGCNA). qRT-PCR was utilized to validate the dysregulation of the most significant differentially expressed genes (DEGs) and key hub genes in the independent subjects. Results A total of 339 DEGs (P-adjusted < 0.05 and log2 fold change [log2FC] ≥ 1.0) were obtained that reached statistical significance with p-values < 0.05. Among them, 200 DEGs were upregulated; these genes were mainly associated with the extracellular matrix and with cell adhesion or migration. In contrast, the 139 downregulated genes were enriched for endocrine and inflammation pathways. With regard to WGCNA, five modules were assigned based on the DEG profiles, and the biological functions of each module were verified with previously published GO terms. The functions included ECM-receptor interactions, the PI3K-Akt signalling pathway and an endoplasmic reticulum (ER)-related pathway. The five hub genes with the highest connectivity in each module and the five most significant DEGs showed dysregulated expression in the independent cohort samples. Conclusions RNA sequencing and WGCNA provided novel insights into the potential regulatory mechanisms of pterygium. The identified DEGs and hub genes, which were classified into two groups according to different functions or signalings, may provide important references for further research on the molecular biology of pterygium.


INTRODUCTION
Pterygium, a common ocular surface disease, is mainly characterized by excessive proliferation of fibrovascular tissue from the conjunctiva to the cornea (Chui et al., 2011). At present, the standard treatment for pterygium is surgical removal, but the recurrence rate is approximately 61-82% after this treatment (Sheppard et al., 2014;Singh, 2017). Epidemiological studies have suggested that environmental factors, including ultraviolet radiation and dust, are involved in the etiology of pterygium (Zhou et al., 2016). Recent molecular studies have also reported that biological pathways including angiogenesis, fibrosis, proliferation and inflammation play contributory roles in the development of pterygium (Larrayoz et al., 2012). In general, the pathology of pterygium involves multiple factors. However, the exact pathogenesis remains unclear (Serra et al., 2018).
Recently, increasing studies on pterygium have focused on transcriptional analyses and have reported the existence of some differentially expressed genes (DEGs) in pterygium patients. Using a DNA microarray experiment, Wong et al. (2006) found that the mRNA levels of a number of genes were altered in primary pterygium. It was noted that the IGFBP3 gene, the function of which is related to the effects of insulin-like growth factor on cells, was significantly decreased in pterygium. Subsequently, other groups have also performed transcriptional profiling studies on pterygium (Lan et al., 2018;Liu et al., 2016;Liu et al., 2017).
RNA sequencing (RNA-Seq) technology, a novel transcriptional profiling tool, has several advantages over other techniques, particularly its sensitivity in terms of measuring gene expression and its ability to detect dynamic changes. With the rapid development of high throughput next generation sequencing (NGS), the discovery of disease-associated genetic variants and genome-wide profiling of expressed sequences and epigenetic marks has become more intensive, thereby permitting systems-based analysis of ocular development and diseases, including pterygium (Chaitankar et al., 2016;Larrayoz et al., 2012). Kim et al. (2016) assessed that the role of neural retina leucine zipper (NRL) in transcript development of rod photoreceptors and its relationship with other transcriptional regulators and effectors by performing microarray hybridization and RNA-Seq on mouse retinal tissues (Kim et al., 2016). Subsequently, another RNA sequence was performed on developing mouse rod photoreceptors in retinal tissues, indicating that NRL could regulate the noncoding transcriptome in developing photoreceptors (Zelinger et al., 2017). Recently, Bang et al. utilized RNA-Seq to identify the expression of complement factors in 20 cases of pterygium and in normal conjunctival tissues. The researchers reported that pterygium size is related to the expression of CFH, C1QB, C1QC and MASP1 genes and the alternative and lectin-binding complement systems may be activated in diseased tissues (Bang et al., 2017). In that study, novel DEGs and potential mechanisms of pterygium were mined from whole transcriptome profiles. Overall, studies on pterygium using RNA-Seq are still relatively scarce; thus, more studies with larger sample sizes are needed.
In the current study, we did the following: (a) explored the transcriptional profiles of pterygium with RNA-Seq; (b) constructed a weighted co-expression network; (c) identified the key hub mRNAs significantly associated with pterygium; and (d) raised new potential mechanisms associated with pterygium.

Patients and specimens
The research protocol was approved by the Institutional Review Board of Yangpu Hospital, and all study participants gave written informed consent. The study was carried out in accordance with the Declaration of Helsinki. All surgical procedures were performed under local anesthesia by the same surgeon. Thirty patients underwent elective pterygium surgery, including 18 males (aged 56 to 77 years, mean age 67 years) and 12 females (aged 40 to 77 years, mean age 62 years). Control tissues, i.e., small rectangular pieces of normal conjunctival tissue, were excised from 32 donor eyes, which were matched for age (42 to 74 years, mean age 64 years), gender (18 males and 14 females), and ethnic background (Chinese). Eight pterygium tissues and 10 normal conjunctival tissues were subjected to RNA-Seq. Ten pterygium tissues and 10 normal conjunctival tissues were used for validation of hub gene expression. To verify the dysregulation of the top five transcripts from the RNA-Seq data, we selected 12 independent pterygium and 12 healthy control samples to examine the mRNA levels. All study participants with pterygia were subjected to slit lamp photography (Canon, Japan) preoperatively to demonstrate the ingrowth of the pterygium onto the cornea. The extension of the pterygium onto the cornea in patients ranged from two mm to four mm. Clinical surgery for pterygium involved conventional excision of the pterygium with autotransplantation of the conjunctiva. The collected samples were transferred to 1.5 ml tubes and stored at −80 • C for further analysis.

RNA-Seq
Total RNA was extracted from the frozen tissues with TRIzol reagent (Invitrogen, CA, US) according to the manufacturer's protocol. The RNA concentration and purity were measured with a NanoDrop ND2000 (Thermo Scientific, MA, USA). The integrity of the total RNA was examined with an Agilent Bioanalyzer 2100 (Agilent Technologies, CA, US). The 260 nm/280 nm ratio was required to be within the range 1.8-2.2, and the RNA integrity number was required to be higher than 7 for the RNA-Seq experiments.
Two micrograms of mRNA were prepared for construction of the RNA-Seq library. First, we removed the ribosomal RNA from the total RNA with an Epicentre Ribo-Zero TM rRNA Removal Kit (Epicentre, WI, USA). Briefly, we hybridized probes to the RNA, and then the mixture was digested with RNase H and DNase I. The RNA was purified with an Agencourt RNAClean XP system. The sequencing libraries were constructed using a NEBNext R Ultra TM Directional RNA Library Prep Kit for Illumina R (NEB, MA, USA). The purified mRNA was fragmented and reverse-transcribed into first-strand cDNA with random hexamers in the presence of actinomycin D. Then, the second-strand cDNA was synthesized with RNase H and DNA polymerase I. After purification, the cDNA was subsequently subjected to adaptor ligation, USER enzyme digestion and PCR library enrichment. The final purified library was measured and quantified with the Bioanalyzer 2100 (Agilent, CA, USA) and sequenced on an Illumina HiSeq 2500 platform.

RNA-Seq data processing and analysis
We utilized the software FastQC to assess the quality of the Illumina reads, which were trimmed with the FASTX-Toolkit. Then, the human assembly GRCh37 was downloaded from the Ensembl database; indexing was conducted with bowtie2, and the quality trimmed reads were mapped to the genome using TopHat (v 2.0.9). HTSeq was used to compute the read counts for each gene in each sample. The data of DEGs was normalized using the transfer matrix method (TMM). Next, the DEGs were screened with the software DESeq2. The p-values were adjusted with the Benjamini-Hochberg method for multiple comparison testing. The significance of DEGs was accepted at an adjusted p-value lower than 0.05.

Gene Ontology and enrichment analysis
To illustrate the biological functions and pathways of the DEGs, we conducted Gene Ontology (GO) and pathway enrichment analyses. The topGO and clusterProfiler packages in R were utilized to detect GO category enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway overrepresentation from the entire database. Adjusted p-values were calculated with the Benjamini-Hochberg method, and the terms with p-values <0.05 were considered to be significant.

Weighted gene co-expression network analysis
To further delineate the functions of the DEGs, we conducted weighted gene co-expression network analysis on the basis of DEG expression in the studied tissues. According to the WGCNA tutorial, the step-by-step network construction approach was used for module identification. Firstly, we selected the suitable soft-thresholding power by testing a set of candidate powers to evaluate the approximate scale-free topology. Subsequently, the soft-thresholding power equal to 10 was used for calculation of the adjacencies. Then, the adjacencies were transformed into a topological overlap matrix for obtaining the corresponding dissimilarity. The hclust function was applied to produce a hierarchical clustering tree of genes. The leaves in the clustering trees corresponds to individual genes, while branches of the clustering trees represent the highly interconnected and co-expressed genes. The package dynamicTreeCut function for branch cutting, which has the advantages of identifying modules that have highly similar gene expression signatures, was used to detect modules in which the genes were highly co-expressed in the dendrogram groups. After that, module trait association analysis was used to find correlations between modules and phenotypes. Then, the function TopHubInEachModule was used to identify the hub gene in each module that had high connectivity in the weighted co-expression network. The figures were created using the igraph package. Finally, to characterize the heterogeneity of gene expression patterns quantitatively, the personalized expression perturbation profiles (PEEPs) algorithm was performed to identify expression changes within each individual.

Validation of the mRNA expression of hub genes by quantitative real-time PCR
Based on the RNA-seq results, we chose LCN1 (lipocalin 1), LTF (lactotransferrin), SCGB2A1 (secretoglobin family 2A member 1), HBA1 (hemoglobin subunit α1) and HBA2 (hemoglobin subunit α2), which were the top five DEGs in the comparison between pterygium and normal tissues. As well as the five hub genes identified from five modules in line with WGCNA analysis, including FN1 (fibronectin 1), ECM1 (extracellular matrix protein 1), GADD34 (growth arrest and DNA damage inducible gene 34), CXCL12 (C-X-C motif chemokine ligand 12) and IQGAP2 (IQ motif containing GTPase activating protein 2) for qRT-PCR validation.Total mRNA from each sample was isolated using TRIzol (Life Technologies, CA, USA), and the purity ratios (260/280 nm and 260/230 nm) were assessed with a NanoDrop instrument (Thermo Scientific, MA, USA). SYBR Premix Ex Taq II (Takara, Shiga, Japan) and synthesized primers sets (Sangon, Shanghai, China) were used for qRT-PCR. The methods for qRT-PCR were followed as described in our earlier publication (Wang et al., 2018). GAPDH was used as the endogenous control gene. The fold changes were calculated using the 2 − Ct method.

Statistical analysis
Data analysis was performed with the R platform and GraphPad Prism. The data are presented as the mean ± SEM. The pterygium and control groups were compared using independent-sample t -tests, and the significance was set at p ≤ 0.05.

Identification of the DEGs
Based on the RNA-Seq data, a total of 339 DEGs were obtained, and these genes reached the threshold p-value of <0.05. Among these DEGs, 200 transcripts were upregulated in the disease group against the health one, while 139 transcripts were found to decrease in pterygium tissues. The top 10 genes with the most obvious expression changes based on adjusted p-values are listed in Table 1. In addition, the relative expression levels are shown in a heatmap plot (Fig. 1A) and a volcano plot (Fig. 1B).

Gene function analysis for the upregulated genes
Using the software topGO and clusterProfiler, we performed GO and KEGG pathway enrichment analysis to find potential biological pathways of interest. Functional annotation of the 200 upregulated genes revealed that genes with higher expression levels were mainly involved in two biological functions: one was related to the extracellular matrix (ECM) and the other was related to cell adhesion or migration ( Table 2). The top 50 GO terms, including biological process, cellular component and molecular process terms, are shown in Table S1.
By means of clusterProfiler analysis, a total of seven KEGG pathways from the KEGG database were found to be significantly enriched (Table 3). These enriched genes were not only significantly associated with focal adhesion (hsa04510, p corrected = 0.00165) but also with the PI3K-Akt signaling pathway (hsa04151, p corrected = 0.00087), which suggested that complex processes and mechanisms underlie the development of pterygium.

Gene function analysis for the downregulated genes
GO analysis of the 139 downregulated genes revealed that these genes were mainly involved in inflammatory functions, including the ''response to external stimulus'' (GO:0009605,   Table S2. We did not find any KEGG pathways from the KEGG database that were significantly enriched because of the limited number of input genes.

WGCNA
To better understand the co-expression patterns of the DEGs, we performed WGCNA using the counts of each gene in the individual samples. First, the soft-thresholding power was calculated to be equal to 10 with the pickSoftThreshold program. By means of the dynamicTreeCut function, five modules were detected using the topological overlap matrix and corresponding dissimilarity approach; the module assignments are shown under the gene dendrogram in Fig. 2A. Then, we combined the module relationships and clinical traits and quantified the module trait associations. The results revealed that the turquoise module was negatively associated with the phenotypic comparison, while the other four modules were positively correlated with the phenotypic comparison (Fig. 2B). Interestingly, the PI3K-Akt pathway was the most significantly enriched pathway in the blue module, which was positively associated with pterygium, while the protein processing in the endoplasmic reticulum (ER) pathway was enriched in the turquoise module, which was negatively associated with pterygium (Fig. 3). We speculate that pterygium is associated with ER stress-related biological activity. We further annotated the biological functions for each module, and the top 50 biological process (BP), cellular component (CC) and molecular function (MF) terms are listed in Table S3. Next, the package TopHubInEachModule was used to provide an easy way to identify the hub gene from every module. Five hub genes were identified from the five modules. These five hub genes were ECM1 in the yellow module (kME = 0.97), IQGAP2 in the green module (kME = 0.969), PPP1R15A (GADD34) in the turquoise module (kME = 0.939), FN1 in the blue module (kME = 0.995) and CXCL12 in the brown module (kME = 0.965). To demonstrate the connectivity of the five hub genes in the network, we plotted the network using the adjacency matrix of the eigengenes in each module and highlighted the hub genes in the network (Fig. 4). By means of the PEEP method, the set of genes which were significantly perturbed in each single subject were shown in Table S4 , which had potential for diagnosis and treatment of pterygium.

Validation of mRNA expression changes in the top DEGs and hub genes in the pterygium
The expression of the five hub genes (ECM1, IQGAP2, FN1, GADD34, CXCL12) was determined by real-time PCR in comparisons between 10 normal conjunctival tissues and 10 pterygium tissues. The housekeeping gene GAPDH was not significantly altered between the two groups (p = 0.697) and was used to normalize the expression of the five genes. From the results of RNA-Seq, LCN1, LTF, SCGB2A1, HBA1 (hemoglobin subunit alpha 1) and HBA2 (hemoglobin subunit alpha 2) ranked as the top five DEGs in the comparison between pterygium and normal tissues. We successfully validated the results of RNA-Seq in an independent sample cohort. The mRNA expressions of SCGB2A1, LTF and LCN1 were significantly decreased in pterygium tissues compared with normal control tissues, while the mRNA levels of HBA1 and HBA2 were higher in the pterygium tissues than in the control tissues (Fig. 5). As represented in Fig. 6, ECM1 gene expression significantly increased in the pterygium group compared with the control group (7.64-fold increase, p = 0.0036). The other three genes, FN1, CXCL12 and IQGAP2, have higher mRNA levels, although the differences were statistically insignificant (FN1: 1.78-fold increase, p = 0.155; CXCL12: 1.25-fold increase, p = 0.24; IQGAP2: 1.274-fold increase, p = 0.283).
The expression of GADD34, on the other hand, was notably decreased in the pterygium group (0.277-fold decrease, p = 0.005).

DISCUSSION
Earlier, several groups have conducted DNA microarray studies on pterygium; however, few studies have used RNA-Seq, which has advantages over DNA microarray analysis (Rai et al., 2018). Bang et al. (2017) utilized RNA-Seq to explore complement factors and found that several factors were dysregulated in pterygium compared to normal conjunctival tissues (Bang et al., 2017). Enlightened by the previous research on gene expression of pterygium, one main feature of this study is to take the lead to integrate RNA-Seq and bioinformatic analysis methods to explore whole genome transcription in pterygium tissues and is the first study attempting to use the WGCNA method to explore the mechanisms of the disease.

Links to cell adhesion and extracellular matrix remodeling
Since pterygium is an ocular surface disease featuring excessive vascular ingrowth and the accumulation of extracellular matrix, the abnormal expression of extracellular matrix proteins could be related to the formation of pterygium.
According to the results of WGCNA performed on the DEGs, we identified five hub genes in five modules, including ECM1, IQGAP2, GADD34, FN1 and CXCL12. Confirmed by qRT-PCR validation, three out of the five genes were associated with cell adhesion and extracellular matrix remodeling. This is consistent with a previous study showing that ECM1, which encodes extracellular matrix protein 1, was found significantly increased in pterygium. ECM1 showed the highest similarity in the yellow module. Earlier, John-Aryankalayil et al. (2006) reported that upregulation of the ECM1 gene plays a key role in pterygium, as has also been validated by several other studies (Naib-Majani et al., 2004;Turner et al., 2007). FN1 encodes for fibronectin 1, which is a crucial glycoprotein in cell adhesion and migration during embryogenesis, wound healing and metastasis; and higher expression of FN1 has been observed in pterygium tissues (Engelsvold et al., 2013b). A previous study examined significant alterations in FN1 through DNA microarray analysis and reported that FN1 serves as a potential regulator of epithelial cell migration, extracellular matrix deposition and the epithelial-mesenchymal transition in pterygium (Engelsvold et al., 2013a). IQGAP2 is a signal-transducing scaffold protein that acts as an integrator of Rho GTPase and Ca 2+ /calmodulin signals associated with cell adhesion and cytoskeletal reorganization. A study also reported that IQGAP2 plays a role in regulating Wnt/ β-catenin and PI3K/Akt signaling (Schmidt et al., 2003). Interestingly, intranuclear accumulation of β-catenin in pterygium tissues has been reported (Kato et al., 2007). However, the functions of IQGAP2 in pterygium have not been addressed. Our study may provide new insights into the role of IQGAP2 in the mechanisms of pterygium pathogenesis.

Links to immunology
Several potential pathogenesis mechanisms for the ingrowth of pterygium have been reported dating back to last century, including immunological mechanisms (Pinkerton, Hokama & Shigemura, 1984) and increased cell stress (Kau et al., 2006). The results of WGCNA showed that CXCL12 and GADD34 were the hub genes in brown module and turquoise modules, respectively. CXCL12 (also called SDF-1) encodes for stromal cell-derived factor-1, an angiogenic chemokine. It plays a role in diverse cellular functions, including embryogenesis, immune surveillance, inflammation responses and so on. CXCL1 promotes angiogenesis through CXCR2 (Miyake et al., 2013) and regulates the recruitment of granulocytes during the inflammatory process (Geiser et al., 1993). Bamdad et al. (2017) reported that upregulation of SDF-1 contributes to pterygium (Bamdad et al., 2017). Kim et al. (2013) reported that the levels of CXCL12 and CXCR4 can be used to determine the severity of pterygium (Kim et al., 2013). GADD34 is a growth cycle protein that could be induced by growth arrest, DNA damage, and other kinds of cell stress. When intracellular proteins cannot fold properly, the disruption of endoplasmic reticulum (ER) physiological function leads to 'endoplasmic reticulum stress'. A long period of ER stress can induce the expression of GADD34 (Reid et al., 2016). Correspondingly, we also found that the ER stress-related pathway plays a contributory role in the development of pterygium according to the RNA-Seq results.
On the basis of RNA-Seq, the top low expressed DEGs were LCN1, LTF and SCGB2A1 (LCN1: 0.094-fold decrease, p < 0.0001; LTF: 0.36-fold decrease, p < 0.0001; SCGB2A1: 0.26-fold decrease, p < 0.0001) , which were associated with immune process. LCN1, encodes a member of the lipocalin family, is believed to be involved in innate immune response. The expression of LCN1 was found to be significantly increased in breast cancer tissues compared with adjacent normal tissues, possibly resulting from more neoantigens in cancer patients and therefore more immune infiltration (Yang et al., 2019). LTF is an important component of the innate immune system. The other three genes have not been reported in previous pterygium studies. However, the protein levels of these three downregulated genes have been reported to be perturbed in dry eye syndrome patients (Perumal et al., 2016). Dry eye is an important risk factor for the formation of primary or recurrent pterygium, and pterygium is also associated with ocular surface instability and dry eye disease (Ozsutcu et al., 2014). Many studies have attempted to explain the relationship between pterygium and dry eye. Based on our study, attention should be paid to genes such as LCN1, LTF and SCGB2A1, which had lower expression levels in pterygium, to determine whether they are directly involved in the development of pterygium and dry eye.
In conclusion, in this study, we strictly selected subjects and matched the demographic information, including age, gender, etc., between the case and control group. A range of genes and proteins were found to be aberrantly expressed in pterygium tissues, including growth factors, matrix metalloproteinases, interleukins, proliferation-related proteins, apoptosis-related proteins, cell adhesion molecules, tight junction proteins and endoplasmic reticulum stress response pathway-related molecules. The current study also confirmed the roles of key biological activities that have been reported in studies on the molecular mechanisms of pterygium.
All in all, this study has mined deeper into the pterygium transcriptome by applying a combination of RNA-Seq and bioinformatic analysis methods for the first time. We speculate that the genes identified to be associated with pterygium mutually interact and form a complex molecular network. It is possible that the significant dysregulation of the hub genes directly perturbs the entire network, contributing to the initiation and development of pterygium. Thus, pharmaceutical interventions targeting these hub genes might be effective for the treatment of pterygium.
There were some limitations to our study. First, the sample size was relatively small, and we also did not classify pterygium based on morphologic and pathologic characteristics. Second, we did not conduct an in vitro cell experiment to verify the potential mechanism of the ER stress-related pathway. Third, the lack of functional validation of certain hub genes requires future research. Last but not least, many important risk factors, such as ultraviolet irradiation or chronic ocular inflammation could directly affect the development of pterygium. Due to the limited information on this, we did not systematically analyze the correlation between risk factors and the transcriptional data.

CONCLUSION
Taken together, our RNA-Seq data confirm the dysregulated genes that have been published in a DNA microarray study. In addition, inflammatory, cell adhesion and extracellular matrix pathways were found to be enriched for DEGs in pterygium. By WGCNA, we identified five key hub genes and a novel biological pathway involved in the development of pterygium.