A novel six-microRNA-based model to improve prognosis prediction of breast cancer

Current tumor-node-metastasis (TNM) stage is unable to accurately predict the overall survival (OS) in breast cancer (BC) patients. This study aimed to construct a microRNA (miRNA)-based model to improve survival prediction of BC. We confirmed 99 differentially expressed miRNAs (DEMs) in 1044 BC samples compared to 102 adjacent normal breast tissues from The Cancer Genome Atlas (TCGA) database. Prognostic DEMs were used to establish a miRNA-based nomogram via Cox regression model. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes analyses (KEGG) were executed to analyze target genes of miRNAs. A six-miRNA signature was screened to effectively distinguish high-risk patients in the primary and validation cohort (all P<0.001). Furthermore, we established a novel prognostic model incorporating the six-miRNA signature and clinical risk factors to predict 5-year OS of BC. Time-dependent receiver operating characteristic analysis suggested that the predictive accuracy of the six-miRNA-based nomogram was distinctly higher than that of TNM stage (0.758 vs 0.650, P<0.001). GO and KEGG pathway analyses showed that the 39 target genes mainly enrichment in protein binding, cytoplasm and MAPK signaling pathway. Our six-miRNA-based model is a reliable prognostic tool for survival prediction and provides information for individualized treatment decisions in BC patients.


INTRODUCTION
Breast cancer (BC) is an enormous public health burden worldwide and ranks as the main cause of cancer deaths in China [1].With the advances of the comprehensive therapeutic strategies, the 5-year overall survival (OS) rate of BC has been improve dramatically.However, BC kills about 1.2 million people in China each year [2].Prognostic evaluation is vital for making appropriate therapeutic decisions and follow-up strategies in BC patients.Currently, tumor-node-metastasis (TNM) stage is a key tool for prognostic assessment and specific treatment choices.However, BC patients at same TNM staging can have very different clinical outcomes.The traditional TNM staging system is mainly on the basis of anatomical information, which is unable to sufficient prediction for prognosis of individual patients because it could not display the biological heterogeneity of BC [3].Therefore, a predictive tool that can integrate molecular biomarkers into the TNM staging system may improve the accuracy of survival prediction for BC patients.
MicroRNAs (miRNAs) are small, non-coding singlestranded RNAs (18-25 nucleotides) and negatively regulate gene expression by base-pair matching with the 3′UTRs of target mRNAs [4].Accumulating AGING evidence shows that miRNAs play critical roles in various physiological and pathological processes, including metabolism, carcinogenesis, and proliferation [4,5].In addition, previous reports have suggested the important prognostic value of miRNAs signature in a variety of cancers [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20].But most of these studies were based on limited number of patients and different miRNAs platforms, lack a normalized standard.Thus, the Cancer Genome Atlas (TCGA) database provides us with a comprehensive catalogue of large-scale miRNAs expression data.Besides, the prognostic value of miRNAs signature to predict 5-year OS of BC patients has not been fully illustrated.With the ability of incorporating diverse independent prognostic variables to provide an individual probability of survival outcome, nomogram is widely applied for cancer prognosis [3,21].
Therefore, this study aimed to construct a novel miRNAbased model to improve survival prediction and effectively pick out the high-risk patients based on TCGA miRNA sequencing data.Such a practical tool has the potential to guide more effective individualized treatment decisions for BC patients.

Baseline characteristics of patients
A total of 984 BC patients from TCGA database were included.The detailed baseline characteristics of the primary and validation cohort were listed in Table 1.
No significant difference of baseline characteristics was displayed between the two independent cohorts in

Development of risk score formula and six-miRNAbased prognostic model
To facilitate the utility of the identified prognostic miRNAs in routine clinical practice, the following formula was developed to generate risk score for each patient: Risk score=(0.289×expressionmiR-549a)+(0.072× expressionmiR-6715a)+(0.026×expressionmiR-4501)+ (0.158× expressionmiR-7974)+(0.068×expressionmiR-4675)-(0.054×expressionmiR-147b).Thus, patients were classified into the low-risk group and the high-risk group via the same median risk score as the cut-off point in the two independent cohorts.The distributions of the miRNA-based risk scores, OS, OS status, and six-miRNA expression profiles of the training cohort and validation cohort are showed in Figure 2. The heat map suggests that the five risky miRNAs (hsa-miR-549a, hsa-miR-6715a, hsa-miR-4501, hsa-miR-7974, hsa-miR-4675) have high expression in the high-risk group, while the one protective miRNA (hsa-miR-147b) exhibits high expression in low-risk group (Figure 2).Besides, compared with the low-risk group, Kaplan-Meier survival analysis shows that the high-risk group has an obvious poorer prognosis (P<0.0001)(Figure 3).
According to the results of univariate and multivariate CPHR analyses (Table 3), the six-miRNA signature and three clinical risk factors (age, TNM stage and ER status) were identified as independent prognostic variables of OS.T stage and N stage were not entered into multivariate CPHR analysis, because they were associated with TNM stage, known as multicollinearity, could lead to spurious associations and unreliable results [21].To construct a more sensitive predictive tool in clinical practice, we built a novel six-miRNAbased prognostic model integrating the six-miRNA signature and three clinical risk factors (age, TNM stage and ER status) to predict 5-year OS of BC patients (Figure 4).The six-miRNA-based nomogram revealed the six-miRNA signature and TNM stage as the largest contribution to 5-year OS, followed by the age and ER status.Each variable was acquired a nomogram score     AGING on the point scale.After calculating the total nomogram score, we could easily obtain the nomogram-predicted probability of 5-year OS for each patient.

Assessment of the six-miRNA-based signature and prognostic model
To test whether the six-miRNA signature could predict OS regardless of stages, we performed risk stratification in patients with TNM stage, T stage, and N stage.The patients with low-risk scores had significantly better OS than patients with high-risk scores in TNM stage II (P=0.00063),TNM stage III (P=0.001),T2 (P=0.00015),T3 (P=0.0076),N1 (P=0.021),N2 (P=0.021) and N3 (P=0.018)(Figure 5).To assess the predictive performance of the six-miRNA-based signature and prognostic nomogram, we conducted a time-dependent ROC curve analysis AGING

Comparison with other prognostic factors
In ROC analysis to compare predictive accuracy of different prognostic factors, the six-miRNA signature suggested higher prognostic accuracy than clinical risk factors, or single miRNA alone (Figure 8A-8B).Thus, the six-miRNA signature can outperform the clinical prognostic features.More importantly, the six-miRNAbased prognostic nomogram had significantly better predictive performance than TNM stage (0.758 vs 0.650, P<0.001) (Figure 8C).

GO and KEGG pathway analyses of predicted target genes
To evaluate the potential function of the six-miRNAs, a total of 39 target genes of the six-miRNAs were predicted using TargetScan, miRTarBase and miRDB database, respectively.GO analysis included molecular function (MF), biological process (BP), and cellular component (CC).The 39 target genes were mainly related with protein binding (MF), transcription and DNA-templated (BP), cytoplasm and nucleus (CC) (Figure 9A).And KEGG pathway analysis revealed that the 39 genes mainly enriched in MAPK signaling pathway, transcriptional misregulation in cancer and cAMP signaling pathway (Figure 9B).

DISCUSSION
A molecular marker-based approach to accurately predict survival in BC patients is urgently needed in the era of precision medicine.Accumulating evidence indicates that miRNAs play a vital role in BC prognosis [10,[22][23][24].In the present study, we confirmed six-miRNA signature that was significantly associated with OS in BC patients based on the TCGA database.Furthermore, this six-miRNA signature enabled to stratify patients into the low-risk and high-risk groups with distinct differences in 5-year OS.Moreover, a novel six-miRNA-based Previous reports about DEMs have indicated that the miRNA-based signature is an important marker for survival or relapse in a variety of cancers [8-10, 12, 14, 23-27].Recently, Gong et al built a miRNA-based classifier to predict relapse in Hormone Receptor-Positive HER2-Negative BC patients [24].However,

AGING
this study has been limited by small sample size and small number of miRNAs screened to mine miRNA expression profiling.In addition, many researches were inconsistent in these sets of prognostic miRNAs because of the heterogeneous of BC and variations in the methods for miRNAs selection.TCGA database provides a robust platform to systematically analyze the large-scale miRNA sequencing data.Consequently, compared with above previous study, total of 1601 miRNAs were initially selected in our study, which could provide a more comprehensive analysis.Besides, the miRNA signature from the TCGA database regarding to the 5-year OS of BC patients has not been reported.
Although the miRNA-based model performs well in BC survival prediction, there are several shortcomings should be acknowledged in the present study.First, experimental studies should be conducted to deeply explore the molecular mechanisms of these miRNAs in the future.Second, the TCGA database lacks some important postoperative variables (chemotherapy, radiotherapy, hormone therapy), thus we could not carry out a comprehensive analysis and identify the low-risk patients to tailor adjuvant chemotherapy.Third, multicenter, large-scale, prospective studies should be performed to validate this predictive tool before Figure 9. Functional enrichment analysis for predicted target genes of the six miRNAs. (A) Gene ontology (GO) enrichment analysis.
(B) Kyoto Encyclopedia of Genes and Genomes analyses (KEGG) enrichment analysis.The x-axis indicates the number of genes, and the y-axis represents the GO terms and KEGG pathway names.The color represents the P-value.
AGING application in routine clinical practice.Fourth, the risk score could accurately discriminate patients with N1, N2 and N3 status.However, the risk score did not accurately discriminate patients with N0 status.Indeed, 21-gene expression Oncotype DX was considered as an accurate molecular tool to discriminate high-risk patients with N0 status and put the indication of adjuvant chemotherapy [28].Thus, the risk score and 21-gene expression Oncotype DX could be used to identify the low-risk patients whether benefit from adjuvant chemotherapy, regardless of N status.
In conclusion, the current study showed a novel, robust six-miRNA-based prognostic model incorporating six-miRNA signature and clinical risk factors to predict 5-year OS in BC patients.The six-miRNA-based nomogram had higher prognostic value than the conventional TNM stage in BC patients.Furthermore, the six-miRNA signature can effectively identify the low risk patients from the high-risk group in BC patients.Therefore, this practical tool has the potential to facilitate individualized treatment decisions for BC patients.

Patients and study design
In this study, the raw counts of BC dataset (Level 3 miRNA expression profiles), including 1044 BC samples and 102 adjacent normal breast tissues were acquired from TCGA data portal in September 2, 2018.The inclusion criteria were included: (1) histologically confirmed invasive BC; (2) both miRNA expression profile and complete survival information available; (3) OS time was more than 1 month.Finally, a total of 984 BC patients with the corresponding clinical features including, age, T stage, N stage, TNM stage, estrogen receptor (ER) status, progesterone receptor (PR) status, human epithelial growth factor receptor 2 (HER2) status were enrolled as primary cohort in this study.And we acquired data from 984 patients randomly assigned 492 patients as the validation cohort based on a computergenerated allocation sequence.Because the application of data abided by the TCGA publication guidelines, the approval of institutional ethics committees was not required.

Identification of potential OS-related miRNAs of BC patients
The miRNA expression profiles were normalized via the R/Bioconductor package of edger [29].We defined a miRNA with FDR <0.05 and |log2FC|≥2 of expression level between the 1044 BC samples and 102 adjacent normal breast tissues as DEMs.Firstly, the univariate CPHR analysis was executed to screen for each DEMs associated with OS.Subsequently, these DEMs with a P<0.05 were selected into multivariate CPHR analysis to identify the independent prognostic miRNAs of OS (P <0.05).

Construction of risk score formula and miRNAbased prognostic model
Prognostic miRNAs which were distinctly associated with OS in the multivariate CPHR analysis (P<0.05) were pointed out to develop the risk score formula.The formula was carried to compute the prognostic risk score for each patient.Using the coefficients obtained from the multivariable CPHR analysis, a risk score formula was built as following: Risk score (miRNA-based classifier) = sum of coefficients × expression level of miRNAs.Moreover, the BC patients were stratified into the high-risk group and the low-risk group via the median risk score as the cutoff value.To provide the oncologists and patients with a quantitative method to achieve individualized survival prediction, we constructed a prognostic nomogram that incorporated both the miRNA-based signature and clinical risk factors using Cox regression model.

Evaluation of risk score formula and miRNA-based prognostic model
To further assess the predictive performance of the miRNA-based classifier and prognostic model, we measured the area under the curve (AUC) based on time-dependent receiver operating characteristic (ROC) analysis [30].Furthermore, stratified analysis was conducted to test whether the miRNA-based classifier was associated with OS independent of stages.In addition, calibration curve was used to evaluate the agreement between model predicted outcome and actual outcome.The predictive accuracy of miRNA-based classifier and prognostic model were compared with other risk factors using ROC analysis.

Target gene prediction and functional enrichment analysis
Potential target genes of prognostic miRNAs were predicted via three online databases, including TargetScan, miRTarBase and miRDB [31][32][33].Thus, we confirmed the overlapping miRNA target genes from the three online databases to perform enrichment analysis.The Database for Annotation, Visualization, AGING and Integrated Discovery 6.8 Bioinformatics Tool (DAVID 6.8) was carried out to Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis.

Statistical analysis
The Mann-Whitney U test and the χ 2 test were implemented to compare the associations of continuous and categorical variables between the primary cohort and validation cohort, respectively.Univariate and multivariate CPHR analyses were executed to screen the independent prognostic variables of OS (P <0.05).Then, we used the Cox regression coefficients to establish a risk score formula and miRNA-based nomogram.For survival analyses, Kaplan-Meier method was carried out to plot survival curves, which were compared using log-rank tests.The predictive accuracy of each variable was tested via time-dependent ROC analysis.Time-dependent ROC curve analysis is extensively applied in biomedical reports for assessing the predictive accuracy of the six-miRNA signature.It is a graphical display which plots sensitivity estimates (probability of a true positive) against one minus specificity (probability of a false positive) of the six-miRNA signature for all possible threshold values.In a time-dependent ROC analysis, the sensitivity and specificity are determined at each time point to guide important medical decisions [34].A volcano plot and heat map were drawn using the "ggplot2" package of R software.The primary end point was OS, which was computed the interval from surgery to the date of death from any cause.A value of P<0.05 was determined statistically significant.All statistical analyses were conducted with Stata/MP, version 14.0 (StataCorp LP, College Station, TX) and R version 3.4.4were applied to the statistical analyses.

Figure 1 .
Figure 1.Volcano plot of 1601 miRNAs in breast cancer patients.Blue color indicates up-regulated expression, and red color represents down-regulated expression.

Figure 2 .
Figure 2. The distribution of risk score, OS, and OS status and the heat map of prognostic six-miRNA signature in the primary cohort (A) and validation cohort (B).The dotted line indicates the cutoff point of the median risk score used to stratify patients into the lowrisk group and high-risk group.OS, overall survival.

Figure 3 .
Figure 3. Kaplan-Meier curves of overall survival for breast cancer patients based on the six-miRNA signature in the primary cohort (A) and validation cohort (B).

Figure 4 .
Figure 4. Six-miRNA-based prognostic model to predict 5-year overall survival in breast cancer patients.
by comparing the respective AUC value.Then, the AUC values of the six-miRNA signature at 5 years were 0.701 (95%CI: 0.633-0.768)and 0.789 (95%CI: 0.715-0.880) in the primary cohort and validation cohort, respectively (Figure6A-6B).And the AUC values of the six-miRNA-based prognostic model at 5 years were 0.758 (95%CI: 0.686-0.830)and 0.777 (95%CI: 0.687-0.867) in the primary cohort and validation cohort, respectively (Figure6C-6D).Importantly, these AUC values revealed that six-miRNA-based signature and prognostic nomogram had favorable discrimination performance for BC patients.In addition, calibration plots of the six-miRNA-based prognostic model fitted well in the training cohort and validation cohort, which indicated good calibration ability (Figure7).

Figure 5 .
Figure 5. Stratified analysis of the six-miRNA signature for breast cancer patients in TNM stage (A), T stage (B), and N stage (C).

Figure 7 .
Figure 7. Calibration plots of the six-miRNA-based prognostic model in the primary cohort (A) and validation cohort (B).

Figure 6 .
Figure 6.Time-dependent receiver operating characteristic curves at 3-, 5-years based on the six-miRNA signature in the primary cohort (A) and validation cohort (B).Time-dependent receiver operating characteristic curves at 3-, 5-years based on the six-miRNAbased prognostic model in the primary cohort (C) and validation cohort (D).
prognostic model combining six-miRNA signature and clinical risk factors was established and validated to improve survival prediction for BC patients.The six-miRNA-based nomogram consisted of four independent prognostic variables, including age, TNM stage, ER status, and six-miRNA signature.The proposed tool was significantly superior to the traditional TNM stage in predicting 5-year OS for BC patients.The AUC value of the six-miRNA-based prognostic model was 0.758, which indicating favorable discrimination performance.Therefore, our six-miRNA-based nomogram might be a vital tool for survival prediction in BC patients, aiding in personalized therapeutic treatment strategies and postoperative counseling.Further bioinformatics analysis helps us understand the biological function of the six OSrelated miRNAs.On the basis of the GO and KEGG pathway analyses, the six-miRNAs may play crucial roles in protein binding, transcription and DNAtemplated, cytoplasm, nucleus, MAPK signaling pathway, transcriptional misregulation in cancer and cAMP signaling pathway.

Figure 8 .
Figure 8. Comparisons of the prognostic accuracy at 5-years using time-dependent receiver operating characteristic curves in the six- miRNA signature with single miRNA (A), the six-miRNA signature with clinical risk factors (B), and the six-miRNA-based prognostic model with six-miRNA signature, TNM stage (C).

Table 2 . Six prognostic miRNAs significantly associated with OS in the primary cohort.
Abbreviations: OS, overall survival; CI, confidence interval.