The lncRNA GATA3-AS1/miR-495-3p/CENPU axis predicts poor prognosis of breast cancer via the PLK1 signaling pathway

The function of the centromere protein U (CENPU) gene in breast cancer is not well understood. Therefore, we explored the expression profile of CENPU in breast carcinoma to better understand its functions, as well as the relationship between CENPU expression and the prognosis of breast carcinoma patients. Our results indicate that CENPU was expressed at significantly higher levels in cancerous tissues than in normal tissues. Furthermore, CENPU expression correlated significantly with many clinicopathological characteristics of breast cancer, and high CENPU expression predicted poor prognosis in patients with breast cancer. Co-expression analysis identified 180 genes co-expressed with CENPU. Functional annotation indicated that 17 of these genes were involved in the PLK1 signaling pathway, and most of them (16/17) were expressed at significantly higher levels in malignant tissues than in normal controls and correlated with a poor prognosis. Subsequently, we found that four miRNAs, namely hsa-miR-543, hsa-miR-495-3p, hsa-miR-485-3p, and hsa-miR-337-3p, could be regarded as potential regulators of CENPU expression, and five lncRNAs were predicted to potentially bind to these four miRNAs. Combining the results of expression, survival, and correlation analyses with functional experiments demonstrated a link between the lncRNA GATA3-AS1/miR-495-3p/CENPU axis and the prognosis of breast cancer. In conclusion, CENPU could be involved in cell cycle progression through the PLK1 signaling pathway.


INTRODUCTION
Breast cancer is a common malignancy threatening female health globally. The incidence and mortality of breast cancer are predicted to rise substantially within the next 5 to 10 years [1]. As an aggressive and highly heterogeneous malignancy, breast carcinoma has been connected with complex biological events, and its occurrence even at a young age highlights this heterogeneity and complexity [2,3]. Although chemotherapy, surgical resection, and radiotherapy have improved outcomes for breast cancer patients over recent decades, the median survival of patients with metastatic breast cancer remains disappointingly low (around 24 months) [4]. The mechanisms underlying the development and progression of breast cancer remain unclear. Therefore, it is necessary to investigate the underlying molecular events and identify novel therapeutic targets and prognostic biomarkers for the effective management of breast cancer.
The centromere protein U (CENPU) gene is located at 4q35.1 in the human genome. It spans a genomic region of 75.8 kilobases (kb) and consists of 14 exons. Its protein product is also known as myeloid leukemia factor 1 interacting protein (MLF1IP), Cenp-50/PBIP1, or KLIP1 [5]. Previous studies have implicated CENPU in kinetochore assembly, mitotic progression, and chromosome segregation [6,7]. Our previous study demonstrated that CENPU downregulation might inhibit the proliferation of human breast cancer cells [8]. However, the molecular mechanisms underlying that observation remain undetermined. Therefore, it is important to explore the functions of CENPU and its relationship with survival outcomes and pathohistological characteristics in breast carcinoma patients.
Non-coding RNAs, more specifically microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), have been shown to play essential roles in tumor progression [9,10]. miRNAs are ~22-nucleotide-long non-coding RNAs that regulate target gene expression at the post-transcriptional level [11], while lncRNAs are a class of non-coding transcripts implicated in multiple biological events, such as cell differentiation, cell growth, transcriptional and post-transcriptional regulation of gene expression, and immune activation/inactivation [12,13]. In the current study, we also sought to identify miRNAs and lncRNAs that can regulate CENPU expression in breast cancer.

CENPU gene mutations in breast cancer
CENPU gene mutations in breast cancer patients were retrieved from the Catalogue of Somatic Mutations in Cancer (COSMIC) database (https://cancer.sanger.ac.uk/cosmic). As of April 7, 2020, the CENPU gene had been sequenced in specimens from 37,419 patients, and 297 unique samples carrying CENPU mutations were identified (Table 1). Among the identified mutations, 122 were point mutations (six nonsense, 94 missense, and 22 synonymous substitutions), three were frameshift insertions, and one was a frameshift deletion; no in-frame deletions, in-frame insertions, or complex mutations were identified (Table 1). Taken together, these data indicate a low incidence of CENPU mutation in breast carcinoma patients, implying that CENPU mutation is unlikely to account for the differences in CENPU expression.
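The mutation counts above can be cross-checked with a few lines of Python. This is a minimal sketch that uses only the numbers stated in this section; like the text, it assumes the 297 mutated samples are counted against the 37,419 sequenced specimens.

```python
# Counts taken from the COSMIC summary above (v91, retrieved April 7, 2020).
point_mutations = {"nonsense": 6, "missense": 94, "synonymous": 22}
frameshift_mutations = {"insertion": 3, "deletion": 1}
samples_with_mutation = 297
samples_sequenced = 37_419

# The three substitution classes should add up to the 122 point mutations reported.
total_point = sum(point_mutations.values())
total_frameshift = sum(frameshift_mutations.values())

# Fraction of sequenced samples carrying at least one CENPU mutation.
mutation_frequency = samples_with_mutation / samples_sequenced

print(total_point, total_frameshift)  # 122 4
print(f"{mutation_frequency:.2%}")    # 0.79%
```

A frequency below 1% is what supports describing CENPU mutation as rare in this dataset.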

Aberrant CENPU expression in breast carcinoma
Aberrantly high expression of a gene in cancerous tissues is a significant indicator that the gene may serve as a diagnostic or prognostic biomarker [14]. In view of this, we determined CENPU mRNA expression levels in normal and cancerous tissues based on the Human Protein Atlas (HPA) database. CENPU expression was observed in both normal and malignant breast tissues (Figure 1A-1B). Next, we analyzed CENPU expression in tissue samples deposited in The Cancer Genome Atlas (TCGA) and Clinical Proteomic Tumor Analysis Consortium (CPTAC) databases using the online database UALCAN. Both the mRNA and protein levels of CENPU were significantly higher in malignant tissues than in normal tissues (Figure 1C-1D). In addition, our previous study found significantly higher CENPU levels in malignant tissues than in adjacent normal breast tissues [8]. Taken together, these findings imply that CENPU may be a potential biomarker for breast carcinoma.
Subsequently, differences in CENPU expression among breast carcinoma patients with different clinical and pathological parameters were determined using bc-GenExMiner. CENPU expression correlated significantly with patient age, with higher expression in patients aged ≤51 years than in those aged >51 years (Figure 2A). In addition, CENPU expression correlated significantly with Scarff-Bloom-Richardson (SBR) grade (Figure 2B) and Nottingham prognostic index (NPI) score (Figure 2C). Furthermore, CENPU was markedly upregulated in patients with lymph node metastasis (Figure 2D), as well as in estrogen receptor (ER)-negative (Figure 2E), progesterone receptor (PR)-negative (Figure 2F), and human epidermal growth factor receptor 2 (HER2)-positive breast carcinoma patients (Figure 2G). CENPU expression also differed considerably among Hu's subtypes (Figure 2H) and robust single sample predictor classification (RSSPC) subtypes (Figure 2I). Finally, CENPU expression was significantly higher in basal-like breast carcinoma (Figure 2J), triple-negative breast carcinoma (TNBC) (Figure 2K), and basal-like & triple-negative breast carcinoma (Figure 2L) than in non-basal-like breast cancer, non-TNBC, and non-basal-like & non-triple-negative breast carcinoma, respectively.
According to eight studies included in the Oncomine database, CENPU expression was significantly higher in TNBC patients than in non-TNBC patients (fold change > 1.5, gene rank: top 10%) (Table 2). All these data suggest that CENPU is not only an indicator of breast cancer but also a molecular indicator of triple-negative breast carcinoma.
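The two Oncomine thresholds quoted above act as a simple per-study filter. The sketch below applies them to made-up study records; the `DatasetResult` type, the study names, and the values are hypothetical placeholders, not Oncomine output or an Oncomine API, and treating the top-10% rank cut-off as inclusive is an assumption.

```python
from dataclasses import dataclass

@dataclass
class DatasetResult:
    """Hypothetical per-study TNBC vs non-TNBC comparison for one gene."""
    study: str
    fold_change: float      # expression ratio, TNBC / non-TNBC
    rank_percentile: float  # gene's rank within the study as a percentage (lower = more over-expressed)

def passes_thresholds(r: DatasetResult, min_fc: float = 1.5, top_pct: float = 10.0) -> bool:
    # Fold change must exceed 1.5; rank must fall within the top 10% (inclusive, by assumption).
    return r.fold_change > min_fc and r.rank_percentile <= top_pct

results = [
    DatasetResult("study_A", 2.1, 3.0),
    DatasetResult("study_B", 1.3, 5.0),   # fails the fold-change threshold
    DatasetResult("study_C", 1.8, 14.0),  # fails the gene-rank threshold
]
print([r.study for r in results if passes_thresholds(r)])  # ['study_A']
```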

Correlation of CENPU expression with prognosis of breast carcinoma patients
We next explored the prognostic significance of CENPU expression in breast cancer. Two probes related to CENPU (218883_s_at and 229305_at) were retrieved from the Kaplan-Meier plotter database (Figure 3). We found that CENPU expression [...] (Table 3). According to our analysis, CENPU expression level correlated significantly with overall survival (OS), relapse-free survival (RFS), distant metastasis-free survival (DMFS), disease-free survival, and disease-specific survival. Consistently, high CENPU expression was associated with higher hazard ratios in breast carcinoma patients. All these data suggest that CENPU expression is also a prognostic indicator for breast carcinoma patients.

The involvement of CENPU in the PLK1 signaling pathway
To better understand the functions of CENPU, we used three databases (cBioPortal, GEPIA, and UALCAN) to identify genes co-expressed with CENPU. The cBioPortal, GEPIA, and UALCAN databases identified 214, 1001, and 1094 genes co-expressed with CENPU, respectively (Figure 4A), and a Venn diagram showed that 180 of these genes were shared by all three databases. To gain a better understanding of these genes, we conducted gene ontology (GO) annotation and pathway enrichment analyses with the Enrichr database (Figure 4B-4G). For functional annotation, three GO categories were analyzed: biological process, cellular component, and molecular function. For pathway enrichment, signaling pathways included in the NCI-Nature Pathway, Reactome Pathway, and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway databases were analyzed. As presented in Figure 4B-4D, the top two enriched GO terms were DNA metabolic process and DNA replication in the biological process category, spindle and mitotic spindle in the cellular component category, and microtubule binding and tubulin binding in the molecular function category. As shown in Figure 4E-4G, the most enriched pathways in all three pathway databases were cell cycle-related. Notably, the PLK1 signaling pathway was the most enriched signaling cascade in the NCI-Nature Pathway database.
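Taking the overlap of the three co-expression lists is a plain set intersection. A minimal Python sketch, assuming the 180 genes are those present in all three lists; the gene names are hypothetical placeholders, not genes from this study.

```python
def shared_genes(*gene_lists):
    """Genes present in every input list, i.e. the central region of a Venn diagram."""
    if not gene_lists:
        return set()
    return set.intersection(*(set(genes) for genes in gene_lists))

# Hypothetical placeholder lists; the real lists contained 214, 1001 and 1094 genes.
cbioportal = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
gepia = ["GENE_A", "GENE_B", "GENE_C", "GENE_E"]
ualcan = ["GENE_C", "GENE_B", "GENE_A", "GENE_F"]

overlap = shared_genes(cbioportal, gepia, ualcan)
print(sorted(overlap))  # ['GENE_A', 'GENE_B', 'GENE_C']
```

Converting each list to a set first also removes duplicate entries, so a gene listed twice by one portal is still counted once.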
Since the PLK1 signaling pathway is associated with the cell cycle, we next focused on PLK1 signaling-related events. In addition to CENPU, 17 other genes were enriched in the PLK1 signaling cascade (Table 4). The correlation coefficients between these genes and CENPU ranged from 0.32 to 0.69 according to the cBioPortal, GEPIA, and UALCAN databases. Next, we compared the expression levels of these 17 genes in breast carcinoma tissues with those in normal controls using the UALCAN database (Figure 5). All these genes showed significantly higher expression in malignant tissues than in their normal counterparts. We then determined the prognostic significance of these genes in breast cancer using the Kaplan-Meier plotter database and found that high expression levels of 16 genes were associated with poor OS (Figure 5). These results were similar to those observed for CENPU. Gene Set Enrichment Analysis (GSEA) further showed that, in TNBC samples with high CENPU expression, the 17 genes co-expressed with CENPU were enriched in the PLK1 pathway, G2/M cell cycle, cell cycle mitotic, and Reactome cell cycle gene sets (Figure 6A-6D).
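Each portal reports a correlation coefficient between CENPU and a candidate gene across tumor samples. Assuming Spearman's rank correlation (one of the coefficients these portals offer; which one underlies the 0.32-0.69 range is not stated here), the calculation can be sketched in plain Python. The expression values in the example are invented.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to a 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))

cenpu = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical expression across five tumors
partner = [2.0, 1.0, 4.0, 3.0, 5.0]  # hypothetical co-expressed gene
print(round(spearman(cenpu, partner), 3))  # 0.8
```

Ranking first makes the coefficient insensitive to outlying expression values, which is one reason rank correlation is common for RNA-seq data.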
All these data imply that CENPU may be involved in cell cycle progression through the PLK1 signaling pathway.

Identification of key lncRNAs that can potentially regulate the key miRNAs
Growing evidence suggests that lncRNAs can function as competing endogenous RNAs (ceRNAs), regulating mRNAs by competing for shared miRNAs [16]. Based on this principle, we used the starBase database to identify lncRNAs that can potentially bind to the four key miRNAs mentioned above. Five such lncRNAs were identified (Table 7). According to the ceRNA theory, the two lncRNAs whose correlation coefficients had absolute values above 0.1 (GATA3-AS1 and PAXIP1-AS1) were selected for subsequent analyses. Using the GEPIA and starBase databases, we found that only GATA3-AS1 was expressed at significantly higher levels in malignant samples than in their normal counterparts (Figure 8A-8B). We therefore explored the correlation between GATA3-AS1 expression and the survival of breast cancer patients and found that higher GATA3-AS1 expression correlated with worse prognosis (Figure 8C). To further verify these findings in vitro, we carried out loss-of-function analyses in breast carcinoma cell lines. Quantitative reverse-transcription PCR (qRT-PCR) assays demonstrated that both GATA3-AS1 and CENPU were markedly upregulated in four breast cancer cell lines (MDA-MB-468, BT-549, HCC1954, and MCF-7) compared with MCF-10A normal human breast epithelial cells, whereas miR-495-3p expression was lower in the breast cancer cell lines than in the normal breast epithelial cells (Figure 8D-8F). In MCF-7 cells, silencing GATA3-AS1 downregulated CENPU (Figure 8G), upregulated miR-495 (Figure 8H), and suppressed cell proliferation (Figure 8I). Therefore, GATA3-AS1 was defined as a key lncRNA. Combined with the results of the expression, survival, and correlation analyses, these functional experiments demonstrate a link between the lncRNA GATA3-AS1/miR-495-3p/CENPU axis and the prognosis of breast cancer patients (Figure 8J).
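The ceRNA reasoning in this section implies a sign pattern: a lncRNA that sponges a shared miRNA should correlate negatively with that miRNA and positively with the miRNA's target mRNA. The sketch below encodes that pattern together with the |r| > 0.1 cut-off mentioned above. Applying the cut-off to both correlations is an assumption made for illustration, and the candidate names and coefficients are hypothetical.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    """Hypothetical lncRNA with its correlations to the shared miRNA and to CENPU."""
    name: str
    r_with_mirna: float
    r_with_mrna: float

def fits_cerna(c: Candidate, min_abs_r: float = 0.1) -> bool:
    # ceRNA expectation: more lncRNA -> more miRNA sequestered (negative r) -> more target mRNA (positive r).
    return c.r_with_mirna < -min_abs_r and c.r_with_mrna > min_abs_r

candidates = [
    Candidate("LNC_A", r_with_mirna=-0.25, r_with_mrna=0.30),   # fits the pattern
    Candidate("LNC_B", r_with_mirna=0.05, r_with_mrna=0.12),    # miRNA correlation has the wrong sign
    Candidate("LNC_C", r_with_mirna=-0.15, r_with_mrna=-0.20),  # mRNA correlation has the wrong sign
]
print([c.name for c in candidates if fits_cerna(c)])  # ['LNC_A']
```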

DISCUSSION
Although great efforts have been invested in breast cancer biology over the past decades, the disease still poses a serious threat to global public health. ER-/PR-/HER2- breast cancer, also defined as triple-negative breast cancer (TNBC), is a common and life-threatening breast cancer subtype with the worst prognosis [17]. However, the molecular mechanisms underlying TNBC tumorigenesis remain elusive. In this study, we therefore sought a new molecular target for breast cancer treatment. We first analyzed CENPU mutations in breast carcinoma patients and found a relatively low mutation incidence, implying that CENPU mutation might not be the key reason for differences in its mRNA level. We then analyzed the expression profile of CENPU based on the HPA database. Our data indicated that CENPU expression was significantly higher in malignant tissues than in normal tissues. CENPU expression also correlated significantly with the clinical and pathological features of breast carcinoma patients, especially patients with TNBC, implying that CENPU expression is closely linked to breast carcinoma progression. Next, based on the PrognoScan database, we explored the prognostic significance of CENPU expression in breast carcinoma and found that higher CENPU expression predicted a poorer prognosis. Taken together, these data imply that CENPU is a molecular indicator for breast carcinoma, especially for TNBC.
Next, we explored the mechanism underlying CENPU's prognostic value in breast cancer. Co-expression analysis identified 180 genes co-expressed with CENPU. Functional annotation demonstrated that these genes were significantly enriched in the PLK1 signaling pathway. Among these genes, 17 were directly involved in the PLK1 signaling pathway. We then observed that most PLK1 signaling pathway-related genes were aberrantly upregulated in breast cancer, and that high expression of most of these genes closely correlated with a poor prognosis. Combined with our previous findings, we propose that CENPU is involved in the PLK1 signaling pathway.
Subsequently, we explored how CENPU is regulated in breast cancer. miRNAs have previously been reported to act as natural inhibitors of oncogenes and suppressors of tumorigenesis [18]. In this study, we identified four miRNAs, namely hsa-miR-543, hsa-miR-495-3p, hsa-miR-485-3p, and hsa-miR-337-3p, as key miRNAs that can potentially regulate CENPU expression. Our subsequent lncRNA analysis identified GATA3-AS1, which can bind to hsa-miR-495-3p, as a key lncRNA. Correlation analysis then revealed that only the lncRNA GATA3-AS1/miR-495-3p/CENPU axis conformed to the ceRNA theory. Functional analyses were carried out to investigate the molecular events downstream of GATA3-AS1 and the relationships among CENPU, miRNA-495, and GATA3-AS1. The qRT-PCR results revealed that GATA3-AS1 expression was significantly higher in breast cancer cell lines than in normal breast epithelial cells.

AGING
In addition, the expression of GATA3-AS1 was positively correlated with CENPU expression and negatively correlated with that of miRNA-495, which is consistent with our bioinformatics analysis results. Finally, in terms of biological function, GATA3-AS1 knockdown inhibited the proliferation of breast cancer cells. These results suggest that GATA3-AS1 plays a crucial role in tumorigenesis of breast cancer.
In conclusion, using integrated bioinformatics analysis, we identified a novel lncRNA-miRNA-mRNA axis of high prognostic value in breast carcinoma. Further experiments and large-scale clinical trials are required to verify our results.

Analysis based on the COSMIC database
COSMIC is an online database containing information about all types of genomic alterations in human malignancies [19]. Using COSMIC version 91, we summarized CENPU mutations that could potentially affect its expression. All data were retrieved on April 7, 2020.

Analysis based on the HPA database
The HPA database is dedicated to providing immunohistochemical data regarding the expression and distribution of 24,000 identified human proteins in multiple types of cells, cell lines and normal or cancerous tissues [20]. Based on this database, we determined CENPU expression in various normal and cancerous tissues.

Correlations between CENPU levels and clinical and pathohistological characteristics in breast carcinoma
Differences in CENPU expression in breast carcinoma patients with diverse clinical and pathological parameters were investigated based on bc-GenExMiner [21], a database exhibiting gene expression profiles in breast cancer patients. CENPU expression was examined in patients with different ages, SBR grades, Nottingham prognostic index (NPI) scores, lymph node statuses, ER statuses, PR statuses, HER2 statuses, HU's

Analysis based on the Oncomine database
Oncomine is an online discovery platform providing transcriptomic information based on 715 datasets obtained from 86,733 cancerous and normal tissue samples [22]. Using Oncomine version 4.5, gene expression profiles were analyzed and compared between different types and subtypes of malignancy, among cancer patients with different clinicopathological features, and between cancerous and normal tissues.

Analyses based on the Kaplan-Meier plotter and PrognoScan databases
Kaplan-Meier plotter is a database that provides information about the correlation between the expression of a specific gene and the survival of cancer patients [23]. PrognoScan is a newly established database that collects meta-analyses of the prognostic significance of genes in various human malignancies [24]. We previously explored the expression of CENPU in cancerous and normal tissues of breast carcinoma patients using the online database UALCAN [25]. In this research, we investigated the correlations between CENPU expression and OS, RFS, DMFS, or post-progression survival in these patients using the Kaplan-Meier plotter database. In addition, genes co-expressed with CENPU were identified based on the UALCAN database, and their prognostic values were predicted using the Kaplan-Meier plotter database.
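Kaplan-Meier plotter compares survival between patients split into high- and low-expression groups. The underlying statistics, a Kaplan-Meier survival estimate per group and a log-rank test between groups, can be sketched in plain Python as below. This is an illustrative re-implementation, not Kaplan-Meier plotter's code; the survival times are invented, and an established library such as lifelines would normally be preferred for real analyses.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, S(time))] at each distinct event time.

    events[i] is 1 if patient i had the event (e.g. death) and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censorings both leave the risk set
    return curve

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic with 1 df, p-value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for tt, _, g in pooled if g == 0 and tt >= t)
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        d_a = sum(1 for tt, e, g in pooled if g == 0 and tt == t and e)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

high = ([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])    # hypothetical high-expression group
low = ([6, 9, 12, 15, 20], [0, 1, 0, 1, 0])  # hypothetical low-expression group
print([(t, round(s, 3)) for t, s in kaplan_meier(*high)])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
chi2, p = logrank(*high, *low)
```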

Analyses of genes co-expressed with CENPU
We characterized genes that exhibited co-expression with CENPU using the cBioPortal [26], GEPIA [27], and UALCAN databases. To characterize the functions of these genes, GO annotation and pathway enrichment analyses were carried out based on the Enrichr database [28].
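Over-representation analysis of the kind run in Enrichr asks whether a gene set overlaps the query list more than chance would predict; Enrichr's reported p-value is based on Fisher's exact test. The one-sided version of that test equals a hypergeometric upper-tail probability, sketched below with Python's `math.comb`. The pathway size (50) and background size (20,000) in the example are hypothetical, not values from this study.

```python
from math import comb

def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes, K: genes in the annotated set (e.g. a pathway),
    n: genes in the query list, k: genes shared by the query list and the set.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic, one final division

# Hypothetical example: 180 query genes, a 50-gene pathway, 20,000 background genes, 17 shared.
p = overrepresentation_p(k=17, n=180, K=50, N=20_000)
print(f"{p:.2e}")
```

Using exact integer binomial coefficients avoids the underflow that a naive floating-point product could hit for very small p-values.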
To obtain a better understanding of the functions of these genes, we retrieved raw microarray data from the Gene Expression Omnibus (GEO) database. We used dataset GSE142102 (platform GPL17692), obtained from a cohort of 226 African-American female TNBC patients. These 226 samples were divided into CENPU-high and CENPU-low groups using the optimal cut-off value of CENPU expression established for event-free survival (EFS), and GSEA [29] was then performed. Gene sets from the MSigDB "Curated" gene set collection (https://www.gsea-msigdb.org/gsea/msigdb/genesets.jsp) were tested for enrichment in both the CENPU-high and the CENPU-low groups.
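At the core of GSEA is a running-sum enrichment score: walking down genes ranked by their association with the CENPU-high versus CENPU-low phenotype, the score rises at genes in the set and falls at genes outside it. The sketch below implements the weighted score (weight 1, the GSEA default) in plain Python; it omits the permutation test, normalization, and FDR estimation that the GSEA software adds, and the genes and scores in the example are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """GSEA running-sum enrichment score (maximum deviation from zero).

    ranked_genes: genes sorted from most positively to most negatively associated
    with the phenotype; scores: the ranking metric for each gene, in the same order.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("the gene set must overlap the ranked list only partially")
    hit_norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    running = 0.0
    best = 0.0
    for s, hit in zip(scores, in_set):
        # Hits step up in proportion to their weighted score; misses step down uniformly.
        running += (abs(s) ** weight / hit_norm) if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

ranked = ["G1", "G2", "G3", "G4"]  # hypothetical ranking, CENPU-high side first
scores = [2.0, 1.0, -1.0, -2.0]
print(enrichment_score(ranked, scores, {"G1", "G2"}))
```

A positive score means the set concentrates at the CENPU-high end of the ranking; a negative score means it concentrates at the CENPU-low end.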

Analysis of key miRNAs and lncRNAs
We predicted miRNAs and lncRNAs that can potentially regulate CENPU expression using starBase, a comprehensive online resource for predicting miRNA-mRNA and miRNA-lncRNA interactions [30]. We then validated the expression and prognostic values of the predicted RNAs using Gene Expression Profiling Interactive Analysis (GEPIA) (http://gepia.cancer-pku.cn/detail.php), an online server for analysis of RNA-seq data deposited in TCGA and GTEx.

Cell culture and cell transfection
Human breast cancer cell lines MDA-MB-468, BT-549, HCC1954, and MCF-7, and the normal breast epithelial cell line MCF-10A were purchased from ATCC and maintained in Dulbecco's Modified Eagle Medium (Thermo Fisher Scientific) containing 10% fetal bovine serum (HyClone) and 1% penicillin-streptomycin solution at 37°C in 5% CO2. shRNAs targeting GATA3-AS1 and negative control (NC) shRNAs were synthesized by GenePharma and transfected into MCF-7 cells using Lipofectamine 2000 (Invitrogen).

Cell viability assay
Equal numbers of cells (2 × 10³ per well) were seeded into five 96-well plates and cultured for five consecutive days, and 20 μL of 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT, final concentration 5 mg/mL) was added to the wells. The cells were then incubated at 37°C in 5% CO2 for another 4 h, after which the MTT solution was removed. The absorbance of each well at 490 nm was measured, and cell growth curves were plotted using GraphPad Prism 5.0 software.
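Turning replicate OD490 readings into a growth curve amounts to averaging each day's wells and reporting their spread. A minimal sketch using Python's `statistics` module; the readings are invented, and the optional subtraction of blank (medium-only) wells is an assumption about plate layout that the protocol above does not state.

```python
from statistics import mean, stdev

def growth_curve(od490_by_day, blank=0.0):
    """{day: [replicate OD490 readings]} -> [(day, mean, SD)] sorted by day.

    blank: optional background OD (e.g. medium-only wells) subtracted from every reading.
    Each day needs at least two replicates for the SD.
    """
    curve = []
    for day in sorted(od490_by_day):
        corrected = [od - blank for od in od490_by_day[day]]
        curve.append((day, mean(corrected), stdev(corrected)))
    return curve

readings = {  # hypothetical triplicate readings for one group
    1: [0.20, 0.22, 0.21],
    2: [0.35, 0.37, 0.36],
    3: [0.58, 0.61, 0.60],
}
for day, m, sd in growth_curve(readings):
    print(f"day {day}: {m:.3f} ± {sd:.3f}")
```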
Immunoblotting
MCF-7 cell lysates were prepared, and proteins were separated by 10% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and electro-blotted onto polyvinylidene fluoride (PVDF) membranes (Millipore). After blocking in 5% non-fat dry milk dissolved in Tris-buffered saline with Tween (TBST) for 1 h at ambient temperature, the membranes were probed with primary antibodies against the target proteins at 4°C overnight. After three rinses with TBST, the membranes were incubated with horseradish peroxidase (HRP)-conjugated secondary antibodies for 1 h at ambient temperature. Protein bands were developed using the Enhanced Chemiluminescence Western Blotting Substrate kit (Pierce). The primary antibodies used were rabbit anti-CENPU (1:500; Abcam, ab117078) and mouse anti-GAPDH (1:2000; Santa Cruz, sc-32233); the secondary antibodies were goat anti-rabbit IgG (1:2000; Santa Cruz, sc-2004) and goat anti-mouse IgG (1:2000; Santa Cruz, sc-2005).
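The 500-fold and 2000-fold antibody dilutions above translate into stock volumes by simple division. A small helper, assuming a hypothetical 10 mL final working volume (the actual volumes are not reported here):

```python
def stock_volume_ul(working_volume_ml: float, dilution_factor: float) -> float:
    """Microlitres of antibody stock for a 1:dilution_factor solution of the given final volume."""
    return working_volume_ml * 1000 / dilution_factor

print(stock_volume_ul(10, 500))   # 20.0  (1:500, as for the anti-CENPU antibody)
print(stock_volume_ul(10, 2000))  # 5.0   (1:2000, as for anti-GAPDH and the secondaries)
```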

Statistical analysis
At least three replicates were performed for each experiment, and the results were analyzed using GraphPad Prism 5.0 software (GraphPad Software, Inc.). Quantitative data are presented as mean ± S.D. Student's t-test was used to evaluate the statistical significance of differences between groups, and P < 0.05 was regarded as statistically significant.
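Student's t-test with pooled variance, as named above, can be reproduced outside GraphPad Prism. The sketch below computes the t statistic and degrees of freedom in plain Python for two hypothetical triplicate groups (for example, NC-shRNA versus GATA3-AS1 shRNA); for the two-sided P value, `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` gives the same pooled-variance test, can be used.

```python
import math
from statistics import mean, variance

def students_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

control = [1.0, 1.1, 0.9]    # hypothetical relative values, NC-shRNA
knockdown = [0.5, 0.6, 0.4]  # hypothetical relative values, GATA3-AS1 shRNA
t, df = students_t(control, knockdown)
print(f"t = {t:.3f}, df = {df}")  # t = 6.124, df = 4
```

The pooled-variance form assumes similar spread in both groups; when that is doubtful, Welch's test (`equal_var=False` in SciPy) is the usual alternative.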

AUTHOR CONTRIBUTIONS
Shiping Ding and Fang Peng contributed significantly to the conception and design of the study. Yanbo Lv, Genxiang Mao, Shuangyan Lin and Mingyuan Zhao participated in data acquisition. Shuangyan Lin and Mingyuan Zhao also analyzed and interpreted the data. Shuangyan Lin wrote the manuscript; Shiping Ding and Fang Peng carefully revised the manuscript and provided valuable advice.