Dysregulation of pseudogene/lncRNA-hsa-miR-363-3p-SPOCK2 pathway fuels stage progression of ovarian cancer

Objective: Ovarian cancer is one of the most common and lethal cancer types in women. The molecular mechanism of ovarian cancer progression is still unclear. Results: Here, we first reported that expression levels of three genes, GJB2, S100A2 and SPOCK2, were significantly higher in advanced stage than that in early stage of ovarian cancer, and upregulation of them indicated poor prognosis of patients with ovarian cancer. Subsequently, 8, 6 and 20 miRNAs were predicted to target GJB2, S100A2 and SPOCK2, respectively. Among these miRNA-mRNA pairs, hsa-miR-363-3p-SPOCK2 axis was the most potential in suppressing progression of ovarian cancer. Mechanistically, we found that hsa-miR-363-3p-SPOCK2 axis was involved in regulation of actin cytoskeleton. Moreover, 6 pseudogenes and 8 lncRNAs were identified to potentially inhibit hsa-miR-363-3p-SPOCK2 axis in ovarian cancer. Conclusions: Collectively, we elucidate a regulatory role of pseudogene/lncRNA-hsa-miR-363-3p-SPOCK2 pathway in progression of ovarian cancer, which may provide effective therapeutic approaches and promising prognostic biomarkers for ovarian cancer. Materials and methods: Differentially expressed genes (DEGs) in ovarian cancer were first screened using GSE12470, after which DEGs expression were validated using GEPIA. Kaplan-Meier analysis was employed to assess the prognostic values. Potential miRNAs were predicted by seven target prediction databases, and upstream lncRNAs and pseudogenes of hsa-miR-363-3p were forecasted through miRNet or starBase. UALCAN and starBase were used to obtain the co-expressed genes of SPOCK. Enrichment analysis for these co-expressed genes was performed by Enrichr.


INTRODUCTION
It is known to all that RNAs consist of coding RNAs (messenger RNAs, mRNAs) and noncoding RNAs (ncRNAs). In recent years, ncRNAs have been the centerpiece of human genome research [1]. NcRNAs have key implications for human health and dysregulation of ncRNAs causes a variety of human disorders, including cancer [2,3]. There are many types of ncRNAs, including microRNA (miRNA), long ncRNA (lncRNA), pseudogene and circular RNA (circRNA) [4,5]. In 2011, the team of Salmena et al. AGING proposed competing endogenous RNA (ceRNA) hypothesis, which is a regulatory mechanism between mRNAs and ncRNAs [6]. ceRNA mechanism demonstrates that lncRNAs, pseudogenes, circRNAs and mRNAs can cross-talk by competitively binding to shared miRNAs, thereby exerting their biological functions [7]. Increasing ncRNAs have been discovered to act as important tumor promoters or suppressors [8][9][10][11]. Moreover, ncRNAs also serve as potential diagnostic and prognostic biomarkers for cancer [12][13][14][15]. To date, only an extremely limited number of ncRNAs have been characterized functionally in cancer, though thousands of ncRNAs have been annotated.
Ovarian cancer ranks the seventh most common neoplasm among the world and the eighth leading cause of cancer deaths in women [16,17]. Globally, 239,000 new patients were affected by ovarian cancer and 152,000 deaths occurred every year [18]. It's widely known that ovarian cancer can be classified into three subtypes, including epithelial, specialized stromal cell and germ cell tumors, among which epithelial ovarian cancer occupies largest ratio in ovarian cancer. Besides, epithelial ovarian cancer is the most lethal gynecologic malignancy. Different types of epithelial ovarian cancer have distinct mutational spectrum and different prognosis [19][20][21]. To date, surgery and platinum/ taxane chemotherapy combinate as the main treatment for ovarian cancer [22,23]. It's an effective method for early stage ovarian cancer patients, with the five-year survival rates up to 92% [24]. However, in spite of huge advances in various therapies in the past few decades, only 19% ovarian cancer patients are diagnosed at early stage, due to deficiency of typical clinical symptoms and manifestations and fast tumor progression, and the fiveyear survival rates of ovarian cancer are still only below 30% [25]. Hence, it's extremely meaningful to rapidly seek potential biomarkers and therapeutic targets for improving outcomes of ovarian cancer patients.
In this study, we constructed a network linked to stage progression of ovarian cancer by a series of analytic processes ( Figure 1). Firstly, we obtained differentially expressed genes (DEGs)-associated with stage progression of ovarian cancer using GEO dataset, GSE12470. Next, TCGA ovarian cancer cohort was employed to validate expression of these DEGs and perform survival analysis. Subsequently, upstream miRNAs of SPOCK2 were predicted. By expression correlation analysis and survival analysis, the most potential miRNA (hsa-miR-363-3p) binding to SPOCK2 was identified. Finally, the potential upstream dysregulated mechanisms (pseudogenes and lncRNAs) and downstream modulatory pathway were explored. The established pseudogene/lncRNA-hsa-miR-363-3p-SPOCK2 pathway shed novel insight into molecular mechanism of ovarian cancer progression and may provide effective therapeutic targets and promising prognostic biomarkers for ovarian cancer.

Screen of candidate genes associated with stage progression of ovarian cancer
To identify the key genes involved in stage progression of ovarian cancer, GSE12470 dataset was selected to perform differential expression analysis using GEO2R. As described in materials and methods, all cases of GSE12470 were divided into three groups: "Nor" group, "Ear" group and "Adv" group. A lot of differentially expressed genes (DEGs) between "Nor" group and "Adv" group or between "Ear" group and "Adv" group were discovered as shown in Figure 2A and Figure 2B, respectively. A total of 2,555 and 2,310 DEGs were significantly upregulated and downregulated in early ovarian cancer samples compared with normal samples (Supplementary Table 1), respectively. And a total of 921 and 196 DEGs were significantly upregulated and downregulated in advanced ovarian cancer samples compared with early ovarian cancer samples, respectively (Supplementary Table 2). This study aims to find those genes associated with stage progression of ovarian cancer. Therefore, we obtained the upregulated and downregulated DEGs that were commonly appeared in two comparison sets. As presented in Figure 2C-2D and Table 1, 35 upregulated DEGs and 2 downregulated DEGs were finally identified. The 37 DEGs were defined as candidate genes and selected for subsequent analysis.

Expression validation and survival analysis of candidate genes in ovarian cancer
Next, for improving accuracy of analysis, TCGA ovarian cancer samples and GTEx normal data were introduced to validate expression levels of 37 candidate genes via GEPIA database. As shown in Figure 3, among the 37 candidate genes, expression of 26 genes (ADAMDEC1,  C11orf15, CCNA2, CHEK1, CLGN, DAPL1, EGFL6,  FAM181A, FAM83D, FOXA2, GJB2, GPR19,  LRRTM1, LYPD1, MAD2L1, NEIL3, PART1, PNOC,  S100A2, SPOCK2, ST6GALNAC2, TGM1, XPR1, ZBED2, EPS8 and FAXDC2) were in accordance with the results acquired from GSE12470. Notably, only two genes (EPS8 and FAXDC2) were downregulated in cancer tissues compared with normal controls. In the following survival analysis, we focused on the 26 candidate genes. Firstly, TCGA ovarian cancer cohort was introduced to assess the prognostic values (including OS and RFS) of the 26 candidate genes. As shown in Figure 4A, high expression of GJB2, S100A2, SPOCK2, TGM1or EPS8 indicated a poor OS of patients with AGING  patients with  higher expression of ADAMDEC1, CHEK1, EGFL6,  FAM181A, FOXA2, LYPD1, PART1 or XPR1  possessed better OS. For RFS of patients with ovarian  cancer, SPOCK2 and FAXDC2 were unfavorable  prognostic biomarkers but ADAMDEC1, CCNA2,  FAM181A, FOXA2, LRRTM1, NEIL3 and XPR1 were favorable prognostic biomarkers ( Figure 4B). Among these genes, only expression levels of GJB2, S100A2, SPOCK2 and TGM1 were significantly upregulated in cancer samples and their high expression indicated poor prognosis (OS or RFS). The detailed information of survival analysis was listed in Figure 4C and Figure 4D. Subsequently, we introduced the ovarian cancer data showing the DEGs between normal samples (n=10) and early ovarian cancer (n=8). (B) Volcano plot showing the DEGs between early ovarian cancer (n=8) and advanced ovarian cancer (n=35). Note: the black dots represent genes that are not significantly differentially expressed between two groups, and the green dots and red dots represent the downregulated and upregulated genes in early ovarian cancer (compared with normal samples) and advanced ovarian cancer (compared with early ovarian cancer), respectively. |log2FC| > 1 and P-value < 0.05 were set as the cut-off criteria. FC = fold change. (C) The intersection of upregulated DEGs of "Nor vs Ear" and "Ear vs Adv". (D) The intersection of downregulated DEGs of "Nor vs Ear" and "Ear vs Adv". "Nor" vs Ear" represents the differential expression analysis between normal samples and early ovarian cancer. "Ear vs Adv" represents the differential expression analysis between early ovarian cancer and advanced ovarian cancer. AGING other than TCGA to further assess the survival effects of GJB2, S100A2, SPOCK2 and TGM1 on prognosis (OS) by Kaplan-Meier plotter. As presented in Figure 4E-4G, ovarian cancer patients with higher expression of GJB2, S100A2 and SPOCK2 showed better OS, which was identical with the analytic results obtained from TCGA cohort. However, in 206008_at cohort, increased expression of TGM1 linked to favorable OS of ovarian cancer ( Figure 4H), contrary to TCGA cohort. Taken together, GJB2, S100A2 and SPOCK2 might be three key genes in mediating tumor stage progression of ovarian cancer and causing poor prognosis.

Co-expression and enrichment analyses reveal that hsa-miR-363-3p-SPOCK2 axis is involved in regulation of actin cytoskeleton
Co-expression analysis was performed using two databases, namely UALCAN and GEPIA. 100 and 200 co-expressed genes were obtained from UALCAN and GEPIA as listed in Supplementary Table 3. By intersection of two gene sets, we found that 77 coexpressed genes of SPOCK2 were commonly appeared in the two databases ( Figure 6A and Supplementary Table 4). The 77 genes and their correlation coefficients were presented in Table 3. GO functional annotation and KEGG pathway enrichment analysis were introduced to better understand these genes via Enrichr database. The top 10 enriched GO terms were shown in Figure 6B-6D, containing membrane raft organization, membrane raft assembly and positive regulation of protein acetylation in the BP category, actin cytoskeleton, membrane raft and focal adhesion in the CC category, and cadherin binding involved in cell-cell adhesion, protein binding involved in cell-cell adhesion and cadherin binding in the MF category. The top 10 enriched KEGG pathways were presented in Figure 6E, including bacterial invasion of epithelial cells, leukocyte transendothelial migration and regulation of actin cytoskeleton. These findings demonstrate that hsa-miR-363-3p-SPOCK2 axis may regulate actin cytoskeleton and thereby take part in cell adhesion, invasion and migration, and finally suppress progression of ovarian cancer.

DISCUSSION
Ovarian cancer is notorious for its aggressive natural characteristic. Though great developments have been achieved in diagnosis and therapy for ovarian cancer, patients' prognosis is still dismal. Exploration of In this study, based on comprehensive expression analysis and survival analysis, three genes (GJB2, S100A2 and SPOCK2) were identified as key genes which may be associated with progression of ovarian cancer. Expression levels of GJB2, S100A2 and SPOCK2 were increased in ovarian cancer and their upregulation was linked to poor prognosis of patients with ovarian cancer. The 3 genes have been confirmed to act as oncogenes in multiple human cancers and link to cancer progression. Besides, many studies have also demonstrated that the three genes may serve as promising biomarkers for cancer. For example, GJB2 promoted early stage breast cancer progression by regulating cancer stemness [28]; Zhu et al. suggested that GJB2 was a prognostic marker in patients with pancreatic cancer by genome-scale analysis [29]; the group of Sun D found that GJB2 was identified as predictive biomarkers for pancreatic cancer by integrated whole genome microarray analysis and immunohistochemical assay [30]; Masuda et al. indicated that overexpressed S100A2 was a prognostic marker for patients with stage II and III colorectal cancer [31]; Kwon et al. showed that S100A2 was significant for progression of prostate adenocarcinoma [32]. Regrading to SPOCK2, the group of Ren F confirmed that SPOCK2 contributed to the progression of endometrial cancer by modulating the biological behavior of cancer cells [33]. All these reports together with our current analytic results of expression and survival analysis demonstrate that GJB2, S100A2 and SPOCK2 may be three key oncogenes in the progression of ovarian cancer.
miRNAs, a class of identified ncRNA molecules, play important roles in regulating the biological behaviors by suppressing target gene expression [26,27]. Therefore, we intended to seek those miRNAs targeting to GJB2, S100A2 or SPOCK2. Using several online databases, some potential miRNAs, including 8 for GJB2, 6 for S100A2 and 20 for SPOCK2, were predicted. According to the action mechanism of miRNA, these miRNAs should be tumor suppressive miRNAs in ovarian cancer. After performing survival analysis, six miRNA-mRNA pairs (hsa-miR-105-5p-GJB2, hsa-miR-522-3p-GJB2, hsa-miR-421-S100A2, hsa-miR-363-3p-SPOCK2, hsa-miR-362-3p-SPOCK2 and hsa-miR-942-5p-SPOCK2) possessed the most potential functions in ovarian cancer and were selected for subsequent expression correlation analysis. Correlation analysis showed that only hsa-miR-363-3p-SPOCK2 pair existed significant negative relationship. Altogether, hsa-miR-363-3p-SPOCK2 axis was considered as the potential pathway, involving in progression of ovarian cancer. Numerous studies supported that hsa-miR-363-3p functioned as a key suppressor in development of multiple human cancers. For example, hsa-miR-363-3p was found to inhibit tumor growth and metastasis of colorectal cancer via targeting SPHK2 [34]; He et al. suggested that hsa-miR-363-3p acted as a tumor suppressor in osteosarcoma cells by inhibiting PDZD2 [35]. These reports partially supported the accuracy of our bioinformatics analysis. Next, the co-expressed genes of SPOCK2 were acquired using UALCAN and GEPIA databases. Enrichment analysis for these coexpressed genes revealed that they were significantly enriched in actin cytoskeleton-associated pathways. It has been well documented that actin cytoskeleton is closely correlated with cancer invasion and metastasis [36,37]. Hence, hsa-miR-363-3p-SPOCK2 axis may suppress invasion and metastasis through regulation of actin cytoskeleton and finally block stage progression of ovarian cancer. In additional to miRNAs, there are also some other types of ncRNAs, such as lncRNAs and pseudogenes. They can serve as ceRNAs of mRNAs by competitively binding to common miRNAs, thereby participating in human health and disorders, containing cancer [13,38,39]. The upstream pseudogenes of hsa-miR-363-3p-SPOCK2 axis were first obtained by starBase database. Expression differences between ovarian cancer samples and control normal or among various major stages were further determined using GEPIA database. Among 56 predicted pseudogenes, only 6 pseudogenes (RPS26P15, AC004057.1, RPS26P31, RPS26P6, RPS26P3 and RPS26P47) were significantly upregulated in cancer samples (compared to normal samples) and advanced stage of ovarian cancer (compared to early stage ovarian cancer). Expression correlation analysis demonstrated that hsa-miR-363-3p was inversely linked to RPS26P15, AC004057.1, RPS26P31, RPS26P6, RPS26P3 or RPS26P47. By combination of ceRNA mechanism and these analytic findings, the six pseudogenes were identified as the potential regulators of hsa-miR-363-3p-SPOCK2 axis in ovarian cancer. Finally, using starBase and miRNet databases, the upstream regulatory lncRNAs of hsa-miR-363-3p-SPOCK2 axis were also predicted. Eight lncRNAs (SNHG5, DAPK1-IT1, MALAT1, TBX5-AS1, SNHG14, OIP5-AS1, PITPNA-AS1 and XIST) were commonly appeared in the two databases. Among these LncRNAs, MALAT1 and SNHG14 have been demonstrated to function as oncogenes in ovarian cancer. For example, lncRNA MALAT1 promoted proliferation and metastasis of epithelial ovarian cancer via the PI3K-AKT pathway [40]; lncRNA SNHG14 enhanced proliferation and metastasis of ovarian cancer by sponging miR-219a-5p [40]. The rest lncRNAs were also found to function as oncogenes in other types of human cancer. For example, lncRNA SNHG5 facilitated growth and invasion of melanoma by regulating the miR-26a-5p/TRPC3 pathway [41]; lncRNA OIP5-AS1 predicted poor prognosis and regulated cell proliferation and apoptosis in bladder cancer [42]; lncRNA XIST enhanced pancreatic cancer cells invasion through promotion of TGF-β2 expression by targeting miR-141-3p [43]. These reports further supported that the 8 lncRNAs, similar to 6 potential pseudogenes, may also play crucial roles in regulating hsa-miR-363-3p-SPOCK2 axis, thus involving in progression of ovarian cancer.

CONCLUSIONS
In summary, a series of integrated bioinformatics analyses suggest that hsa-miR-363-3p-SPOCK2 axis may play key roles in progression of ovarian cancer by regulating actin cytoskeleton. Furthermore, the potential upstream pseudogenes and lncRNAs of has-miR-363-3p-SPOCK2 axis are successfully identified. The constitutes in this pseudogenes/lncRNAs-hsa-miR-363-3p-SPOCK2 network may be utilized as promising therapeutic targets and prognostic biomarkers in the future.

Microarray
The mRNA microarray profiles of ovarian cancer were downloaded from GEO database (http://www.ncbi.nlm. nih.gov/geo) by searching keywords (("gene" OR "mRNA" [all fields] AND ("ovarian cancer" OR "ovarian carcinoma" OR "EOC" [all fields]) AND "Homo sapiens" [porgn]). Then, the titles and abstracts of datasets were screened, and the full information of the datasets of interest were further evaluated and finally selected according to the following inclusion criteria: (1) selected datasets should be mRNA transcriptome data; (2) the data should be derived from tumor tissues and normal tissues of patients with ovarian cancer; (3) only datasets containing more than 5 cancer samples and 5 normal samples were included; (4) cancer samples in selected datasets should contain early stage and advanced stage ovarian cancer. Finally, only one dataset GSE12470 was included in this study. GSE12470 dataset, which is based on the platform of Agilent-012097 Human 1A Microarray (V2) G4110B, contained 53 tissue samples, involving 10 normal samples, 8 early ovarian cancer samples and 35 advanced ovarian cancer samples.

Differential expression analysis
The differentially expressed genes (DEGs) were obtained by conducting differential expression analysis. GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r), an online analytic tool provided by the GEO database, was employed to perform differential expression analysis. Firstly, we divided the 53 samples into three groups, namely normal group ("Nor", 10 samples), early stage ovarian cancer group ("Ear", 8 samples) and advanced stage ovarian cancer group ("Adv", 35 samples). Then, differential expression analysis was successively done between "Nor" group and "Ear" group, "Ear" group and "Adv" group. |log2FC| > 1 and P-value < 0.05 were set as the cut-off criteria. Finally, VENNY 2.1.0 (http://bioinfogp.cnb.csic.es/tools/venny/index.html) was utilized to draw the Veen diagrams. The DEGs were commonly upregulated or downregulated in "Ear" group (compared with "Nor" group) and "Adv" group (compared with "Ear" group). These DEGs were redefined as significant DEGs. FC = fold change.

Gene Expression Profiling Interactive Analysis (GEPIA) database analysis
GEPIA database (http://gepia.cancer-pku.cn/detail.php), a newly developed interactive web server for analyzing the RNA sequencing expression data, included 9,736 tumors and 8,587 normal samples from The Cancer Genome Atlas (TCGA) and The Genotype-Tissue Expression (GTEx) projects [15,44]. GEPIA database was used to validate the expression levels of significant DEGs in ovarian cancer. |log2FC| > 1 and P-value < 0.05 were set as the cut-off criteria. After entering the gene symbols into the "Gene" column, the expression box plots were automatically generated on the webpage. Similarly, gene expression differences among various major stages were also determined using GEPIA database. Besides, GEPIA database was also introduced to identify the co-expressed genes of SPOCK2 in ovarian cancer.

Kaplan-Meier plotter analysis
Kaplan-Meier plotter is a database that can be capable of assess the effect of 54,000 genes on survival in 21 cancer types, and the miRNA subsystems in this database include 11,000 samples from 20 different cancer types [45,46]. Kaplan-Meier plotter was utilized to evaluate the prognostic values of genes or microRNAs in ovarian cancer. Briefly, genes or microRNAs were first typed into the database. All cases of ovarian cancer were classified into a low expression group and a high expression group based on the median expression value. Subsequently, Kaplan-Meier survival plots were generated on the webpage. The hazard ratio (HR), 95% confidence interval (CI) and logrank P-value were also automatically calculated and displayed on the webpage. A logrank P-value < 0.05 was considered as statistically significant.

starBase database analysis
starBase database is a widely-used open-source platform for studying the ncRNA interactions from CLIP-seq, degradome-seq and RNA-RNA interactome data [47,48]. Herein, starBase database was introduced to analyze the expression correlation between miRNA and gene or pseudogene. R < -0.1 and P-value < 0.05 were set as the cut-off criteria for identifying the significant miRNAgene/pseudogene pairs. starBase database was also used to predict the pseudogenes and lncRNAs that can theoretically bind to hsa-miR-363-3p.

UALCAN database analysis
UALCAN database is a portal for analyzing gene expression and survival effect, which provides easy access to publicly available cancer transcriptome data including ovarian cancer [49,50]. In this study, the database was utilized to obtain the co-expressed genes of SPOCK2 in ovarian cancer. Then, these co-expressed genes were intersected with the co-expressed genes acquired from GEPIA database as mentioned above. The co-expressed genes commonly appeared in the two databases were re-defined as co-expressed genes of interest and were chosen for subsequent enrichment analysis.

Enrichr database analysis
Gene Ontology (GO) functional annotation and KEGG pathway enrichment analysis for the co-expressed genes of SPOCK2 was performed using Enrichr database as we previously described [51][52][53]. Three categories, including biological process (BP), cellular component (CC) and molecular function (MF), were included in GO functional annotation. The top 10 enriched GO items and KEGG pathways were displayed on the webpage and downloaded as images. miRNet database analysis miRNet (https://www.mirnet.ca/), an easy-to-use comprehensive platform integrated data from several miRNA-linked databases (TarBase, miRTarBase, miRecords, miRanda), was employed to predict the potential lncRNAs binding to hsa-miR-363-3p. Subsequently, these lncRNAs were intersected with the lncRNAs obtained from starBase database to acquire the most potential regulatory lncRNAs of hsa-miR-363-3p.

AUTHOR CONTRIBUTIONS
WYL, WMF and PFF designed and conceived this project. WYL performed in silico analysis of the data and wrote the manuscript. BSD and GSZ performed some in silico analysis. PFF and CYD revised the manuscript. All authors have read and approved the final manuscript.

CONFLICTS OF INTEREST
The authors have no conflicts of interest to declare. AGING