Identification of Four Hub Genes Involved in Breast Cancer Based on Robust Rank Aggregation and WGCNA Methods

Background: Further elucidation of the molecular mechanisms underlying the occurrence, development and prognosis of breast cancer remains an urgent need. Identifying hub genes involved in its pathogenesis and progression can help to unveil these mechanisms and provide novel therapeutic targets for breast cancer.

Methods: In this study, we systematically integrated robust rank aggregation (RRA), functional enrichment analysis, protein-protein interaction (PPI) network construction and analysis, weighted gene co-expression network analysis (WGCNA), DNA methylation analyses, genomic mutation analyses, gene set enrichment analysis (GSEA) and gene set variation analysis (GSVA) to identify potential hub genes that are highly associated with breast cancer.

Results: We identified a total of 512 robust differentially expressed genes (DEGs) that were significantly associated with breast cancer based on RRA analysis and functional enrichment analysis. CENPL, ISG20L2, MRPL3 and LSM4 were identified as four potential hub genes for breast cancer through WGCNA and a literature search. These four hub genes were upregulated in breast cancer tissues and associated with tumor progression. ROC and Kaplan-Meier analyses indicated that all four hub genes showed good diagnostic performance and prognostic value for breast cancer. Methylation and genomic mutation analyses suggested that the abnormal upregulation of these genes likely resulted from hypomethylation and gene mutations. Moreover, GSEA and GSVA for each potential hub gene revealed that they were all tightly related to the proliferation of tumor cells.

Conclusion: We identified four genes (CENPL, ISG20L2, MRPL3 and LSM4) that likely play key roles in the molecular mechanisms of the occurrence and development of breast cancer. With further study, they may become potential therapeutic targets for breast cancer patients.

Keywords: breast cancer, RRA, WGCNA, hub genes

Full Text
Due to technical limitations, full-text HTML conversion of this manuscript could not be completed. However, the latest manuscript can be downloaded and accessed as a PDF.

Tables
Due to technical limitations, tables are only available as a download in the Supplemental Files section.

Figure 1
Workflow of our study. Gene expression profiles of eight breast cancer related datasets were downloaded from the GEO database and subjected to differential expression analysis. The RRA algorithm was applied to integrate the DEGs from these eight datasets and identify robust DEGs. WGCNA was used to identify hub genes associated with breast cancer in the TCGA_BRCA dataset. Subsequently, novel key genes were validated using multiple datasets and databases. Moreover, GSEA and GSVA for single hub genes were performed on the METABRIC dataset to reveal their potential biological functions and mechanisms in breast cancer.
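The rank-aggregation step in the workflow can be illustrated with a minimal Python sketch of the rho score used by robust rank aggregation. This is not the authors' pipeline: the function names, the toy gene lists, the per-list rank normalization, and the handling of genes missing from a list (treated as normalized rank 1.0) are assumptions made for illustration.

```python
from math import comb


def beta_scores(norm_ranks):
    """P(k-th smallest of m uniform ranks <= observed value), for each k."""
    r = sorted(norm_ranks)
    m = len(r)
    scores = []
    for k, x in enumerate(r, start=1):
        # Binomial tail: at least k of the m ranks fall at or below x.
        p = sum(comb(m, l) * x**l * (1 - x) ** (m - l) for l in range(k, m + 1))
        scores.append(p)
    return scores


def rho_score(norm_ranks):
    """Smallest beta score, Bonferroni-corrected by the number of lists."""
    m = len(norm_ranks)
    return min(1.0, min(beta_scores(norm_ranks)) * m)


def aggregate(ranked_lists):
    """Return (gene, rho) pairs, most consistently top-ranked gene first."""
    genes = set().union(*ranked_lists)
    result = {}
    for g in genes:
        # Assumption: a gene absent from a list gets the worst rank (1.0).
        ranks = [
            (lst.index(g) + 1) / len(lst) if g in lst else 1.0
            for lst in ranked_lists
        ]
        result[g] = rho_score(ranks)
    return sorted(result.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical DEG rankings from three datasets (gene names are placeholders).
toy_lists = [["A", "B", "C", "D"], ["A", "C", "B", "D"], ["B", "A", "D", "C"]]
print(aggregate(toy_lists))
```

A small rho means the gene is ranked near the top more consistently than expected if the lists were unrelated, which is the sense in which the resulting DEGs are "robust".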

Figure 2
Identification of robust DEGs between breast cancer and normal breast tissue in eight datasets downloaded from the GEO database, based on RRA analysis. The heatmap shows the 20 most significant up-regulated and down-regulated genes according to adjusted P values. Each row represents one gene and each column represents one dataset. Red indicates up-regulation and green indicates down-regulation. The numbers in the heatmap squares indicate fold changes (breast cancer vs. normal breast tissue) in each dataset, computed with the "limma" R package.

Identification of the candidate gene module and 102 hub genes for breast cancer in the TCGA_BRCA dataset through WGCNA. a Left, analysis of the scale-free fitting index for various soft-thresholding powers (β); the red line marks a scale-free topology model fit (signed R^2) of 0.90. Right, mean connectivity for various soft-thresholding powers (β range 1-20). b Left, histogram of the frequency distribution of the connectivity k when β = 5. Right, check of the scale-free topology when β = 5; log10(k) and log10(p(k)) are negatively correlated (correlation coefficient 0.97), indicating that the constructed gene network approximates a scale-free topology. c Clustering dendrogram of genes based on the topological overlap dissimilarity (1 - TOM), with the merged gene modules. Seven weighted gene co-expression network modules were constructed and are shown in different colors. d Heatmap of the correlation between module eigengenes and the breast cancer sample trait (Tumor). The numbers in each square indicate the Pearson correlation coefficient (top) and P value (bottom). e Scatter plot of gene significance for "Tumor" versus module membership in the blue module. f Scatter plot of gene significance for "Tumor" versus module membership in the brown module.

Methylation level analyses and genetic alterations of the novel hub genes for breast cancer. The methylation levels of CENPL, ISG20L2, LSM4 and MRPL3 in breast cancer and normal tissues were examined using the DiseaseMeth 2.0 database, based on the 450k (Illumina Infinium HumanMethylation450 BeadChip) platform. e Genetic alterations of CENPL, ISG20L2, MRPL3 and LSM4 were further examined in the cBioPortal database. The four hub genes were altered in 570 (26%) of 2173 breast cancer patients; CENPL and ISG20L2 were the most frequently altered (20%), with gene amplification as the main alteration type.
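The scale-free topology check described in the WGCNA caption (panels a and b) can be sketched as follows. This is a simplified illustration rather than the WGCNA package itself: connectivities are placed in equal-width bins, log10 of the bin frequency p(k) is regressed on log10 of the mean connectivity per bin, and the signed R^2 is reported as -sign(slope) × R^2. The function name, the bin count of 10, the skipping of empty bins, and the synthetic connectivities are assumptions.

```python
from math import log10


def signed_scale_free_r2(connectivity, n_bins=10):
    """Return (signed R^2, slope) of log10 p(k) regressed on log10 k.

    Assumes positive connectivities that are not all identical.
    """
    k_min, k_max = min(connectivity), max(connectivity)
    width = (k_max - k_min) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for k in connectivity:
        i = min(int((k - k_min) / width), n_bins - 1)
        sums[i] += k
        counts[i] += 1

    total = len(connectivity)
    xs, ys = [], []
    for s, c in zip(sums, counts):
        if c > 0:  # simplification: empty bins are skipped
            xs.append(log10(s / c))      # log10 of mean connectivity in bin
            ys.append(log10(c / total))  # log10 of bin frequency p(k)

    # Ordinary least squares on the binned log-log points.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r2 = sxy**2 / (sxx * syy)
    sign = 1.0 if slope > 0 else -1.0
    return -sign * r2, slope


# Synthetic connectivities whose frequencies decay roughly as k^-2.
toy_k = [float(k) for k in range(1, 51) for _ in range(round(10000 / k**2))]
fit, slope = signed_scale_free_r2(toy_k)
print(round(fit, 3), round(slope, 3))
```

In a workflow of this kind, the soft-thresholding power β is usually chosen as the smallest value for which the signed R^2 passes a cutoff such as the 0.90 line in panel a, while mean connectivity remains reasonably high.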

Figure 7
The diagnostic value analysis and validation of four novel hub genes in breast cancer. ROC curve analysis for CENPL, ISG20L2, LSM4 and MRPL3 based on: a TCGA dataset, b GEO dataset. ROC, receiver operating characteristic; AUC, area under the ROC curve.
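As a brief illustration of the AUC reported in such ROC analyses, the sketch below computes it as the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. The function name and the toy values are hypothetical and are not data from the study.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive (1) and negative (0) samples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count pairs where the positive outranks the negative; ties count 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical expression values: label 1 = tumor, label 0 = normal tissue.
expression = [0.1, 0.4, 0.35, 0.8]
is_tumor = [0, 0, 1, 1]
print(roc_auc(expression, is_tumor))
```

An AUC of 0.5 corresponds to no discrimination between tumor and normal tissue, and 1.0 to perfect separation.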

Figure 8
The prognostic value analysis of four novel hub genes in breast cancer based on the METABRIC dataset (a-d) and the TCGA_BRCA dataset (e-h). Expression levels of CENPL, ISG20L2, LSM4 and MRPL3 are significantly associated with the overall survival (OS) of patients with breast cancer (all P < 0.05, HR > 1).
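Survival curves of the kind summarized here are typically drawn with the Kaplan-Meier estimator. The sketch below is a minimal version under stated assumptions: the function name and the toy follow-up data are hypothetical, events are coded 1 for death and 0 for censoring, and no hazard ratio or log-rank test is computed.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with a death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up times; the second patient is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

In a comparison of high- and low-expression groups, one such curve would be estimated per group and the groups compared with a log-rank test or a Cox model, neither of which is shown here.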

Figure 9
Gene set enrichment analysis (GSEA) of potential hub genes in the METABRIC dataset. Tumor cell proliferation related gene sets were significantly enriched in the high-expression group of each hub gene. a CENPL, b ISG20L2, c LSM4 and d MRPL3.
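The enrichment score behind a GSEA plot can be sketched as a weighted running sum over a ranked gene list: the sum rises when a gene belongs to the gene set and falls otherwise, and the score is the largest deviation from zero. The function name, the weight exponent p = 1, and the toy genes and scores are assumptions; significance testing by permutation is omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Maximum deviation from zero of the weighted GSEA running sum.

    ranked_genes and scores must already be sorted by score, descending.
    """
    hits = [g in gene_set for g in ranked_genes]
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    n_miss = len(ranked_genes) - sum(hits)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")

    running = 0.0
    best = 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** p / hit_weight  # step up on a set member
        else:
            running -= 1.0 / n_miss              # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking of genes by differential expression between the
# high- and low-expression groups of a hub gene.
genes = ["g1", "g2", "g3", "g4"]
stat = [2.0, 1.0, -1.0, -2.0]
print(enrichment_score(genes, stat, {"g1", "g2"}))
```

A positive score means the gene set is concentrated at the top of the ranking, that is, in the high-expression group.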