Multi-omics analysis identifies potential mechanisms of AURKB in mediating poor outcome of lung adenocarcinoma

Aurora kinases B (AURKB), which plays a critical role in chromosomal segmentation and mitosis, greatly promotes cell cycle progression and aggressive proliferation of cancers. So far, its role and underlying mechanisms in mediating poor outcome of lung adenocarcinoma (LUAD) remained largely unclear. Analyses on multiple omics data of lung adenocarcinoma cohort in The Cancer Genome Atlas (TCGA) were performed based on AURKB expression, and demonstrated its association with clinical characteristics and the potential of using AURKB as a biomarker in predicting patients’ survival. This study found aberrant alterations of genomics and epigenetics, including up-regulation and down-regulation of oncogenic genes and tumor suppressors, pathways involved in the cell cycle, DNA repair, spliceosome, and proteasome, hypermethylation enrichments around transcriptional start sites, which are all related to AURKB expression. We further discovered the possible role of tumor suppressors DLC1 and HLF in AURKB-mediated adverse outcome of LUAD. To conclude, this study proved AURKB as a potential prognostic factor and therapeutic target for lung adenocarcinoma treatment and provide a future research direction.


INTRODUCTION
Lung adenocarcinoma (LUAD) is the main type of lung cancer. Although epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs) [1] and programmed death-ligand 1/programmed death 1 (PD-L1/PD1) targeting immunotherapy used in combination with or without chemotherapy has been successful [2], the underlying mechanisms leading to poor survival or even unresponsiveness to these therapies in patients with lung adenocarcinoma remain to be elucidated. Aurora kinases B (AURKB), a key modulator of chromosome segregation in mitosis, plays an important role in proliferation and metastasis in many cancers. For instance, the expressions of AURKB and Aurora kinases A (AURKA) have been found to be upregulated by Myc and essential in Myc-driven B-cell lymphomas [3]. Barasertib is effective in the discontinuation of AURKB for the treatment of pediatric acute leukemia [4], and AURKB is also detected in many solid tumors, such as gastric cancer [5,6], glioblastoma [7], bladder cancer [8], and clear cell renal cell carcinoma [9]. As early as 2005, AURKB is found to be correlated with high frequencies of somatic mutations in lung cancer [10]. Researches showed that AURKB is an important target for KRAS, therefore has the potential to serve as a target of KRASinduced lung cancer [11]. Recently, study discovered that AURKB is responsible for cancer resistance to chemical reagents such as cisplatin and paclitaxel [12], and it has been established as a potential target to overcome the acquired resistance to EGFR TKIs when patients do not harbor resistance mutations in the EGFR gene [13].

AGING
Additionally, radiotherapy sensitivity of lung cancer could be enhanced by inhibition of AURKB induced by an herbal drug Daurinol [14]. Therefore, AURKB plays a critical part in lung cancer and may be a vital drug target to reverse the resistance to chemotherapy, targeted therapy, or radiotherapy.
Although some mechanisms of AURKB have been shown above, the important role of AURKB in lung adenocarcinogenesis and aggressiveness leading to poorer survival requires further exploration. Through analysis of the genome-wide mRNAs, microRNAs (miRNAs) and methylation profiles from LUAD project in The Cancer Genomics Atlas (TCGA) and the webbased bio-tool Lung Cancer Explorer (LCE) [15], we showed evidence for the use of AURKB as a prognosis biomarker in LUAD patients, and demonstrated potential genomic and epigenomic mechanisms associated with AURKB expression, explaining how AURKB accelerated the lung adenocarcinoma progression and limited patients' survival.

Overexpression of AURKB in LUAD patients
TCGA-LUAD cohort was used to compare the expression of AURKB in lung adenocarcinoma patients. As shown in Figure 1A, AURKB was overexpressed in LUAD tumor tissues (513 cases) when compared with 59 normal tissues (rank-sum test, p < 0.001). We further examined the correlation between AURKB expression and the clinical features of the 513 lung adenocarcinoma patients. For TNM system, we found that AURKB was significantly low-expressed in tumors with smaller size (T1 stage) when compared with T2 (p<0.001) or T3-4 (p=0.028) ( Figure 1B). Patients without lymph node invasion (N0) showed low expression of AURKB than those with N1 (p=0.011) or N2-3 (p=0.001) ( Figure 1C). No significant differences among patients with or without metastasis (p=0.123, Figure 1D) were detected, which might be due to the limited number of metastatic patients in the LUAD cohort. However, AURKB showed obvious variations in the patients' staging: lower expression in stage I against stage II (p=0.003) or III/IV (p<0.001), as shown in Figure 1E. We also found that AURKB was noticeably high-expressed in patients with a history of smoking (p=0.002, Figure 1F), and that elder or female patients tend to have a relatively lower expression of AURKB (p=0.041, 0.0028, respectively, as shown in Table 1). We additionally compared 7 microarrays and sequencing data sets using the lung cancer explorer (LCE) database, and the results confirmed that AURKB expression was up-regulated in lung adenocarcinoma (p=3.1e-10, Figure 1G).

AURKB indicates a worse outcome in LUAD
As AURKB is high-expressed in lung adenocarcinoma tissues and correlates with more advanced tumor stage, we hypothesized that AURKB might be a prognostic factor in LUAD. Thus, the expression distribution of AURKB and prognosis of each sample were examined. We observed that the expression of AURKB of each individual was (Figure 2A) concentrated, with a median value of 14.33. Based on this, lung adenocarcinoma patients were then divided into AURKB high and AURKB low groups, with 245 samples each group. As shown in Figure 2B, 2C, higher expression of AURKB contributed to significantly worse overall survival (OS, p=0.002) and progressionfree survival (PFS, p=0.013), indicating that AURKB is associated with poor prognosis in patients with lung adenocarcinoma. The data of multivariate cox analysis also showed that AURKB and N staging could be two independent prognostic factors of LUAD (HR, 95%CI: 1.672, 1.045-2.674 for AURKB, 1.897, 1.13-3.184 for N staging, Figure 2D), irrespective of T stage, M stage, smoking history and EGFR mutation status. The outcome data of 21 data sets involving 2912 patients with LUAD further confirmed that AURKB could be a prognostic factor in LUAD (HR, 95%CI: 1.22, 1.13-1.32, Figure 2E).
To evaluate the potential of AURKB as a prognostic marker in blood samples, we obtained the data of 80 LUAD tumor samples and 80 normal control samples from the GEO database (GSE20189). The original data of the chip were downloaded, and the R software package Affy was used to perform data conversion and RMA standardization for obtaining a gene expression profile data set. The expression of AURKB in tumor samples was found to be higher than that of control samples with significant margins (p=0.072, Supplementary Figure 1A). Further analysis of the changes in the expression of AURKB at the early and advanced stages of LUAD showed that AURKB was significantly high-expressed in tumors more advanced after Stage II (p=0.043, Supplementary Figure 1B). Angiogenic factors are key markers of tumor progression. The ssGSEA method was used to evaluate the scores of angiogenic factors in the samples, and the data indicated that the scores of angiogenic factors in tumor samples were significantly higher than in normal samples (p=0.00012, Supplementary Figure 1C). In LUAD patients, the expression of AURKB was greatly positively correlated with the angiogenic factor score (r=0.52, p<0.001, Supplementary Figure 1D). The above results indicated that the expression of AURKB was of a certain consistency in blood samples and tumor tissue samples.

Genome-wide mRNA profiles associated with AURKB expression
To identify the key genes involved in the AURKBmediated poor survival of LUAD and related biological functions or pathways, we compared the expressions of differentially expressed genes (DEGs) between AURKB high group and AURKB low group and examined their correlations with AURKB using the T-test and the rank-sum test, respectively (mRNAs were excluded if over 50% values are empty or zero). Only genes with correlation coefficients larger than 0.3, fold change >2, and the p-value < 0.01 either in the T-test or the rank-sum test were considered as DEGs. As visualized by volcano plot and heatmap ( Figure 3A, 3B), 348 DEGs incorporating 270 upregulated genes and 78 downregulated genes were found. The top 10 upregulated and down-regulated genes identified by fold changes were listed in Table 2. The most correlated mRNA with AURKB was FAM64A, which is a marker of cell proliferation, and has been found to be associated with the aggressive growth of pancreatic cancer and breast cancer [16,17]. Moreover, study indicated that FAM64A may also regulate Th17 cells and promote inflammation-associated cancers [18]. PRAME, an antigen high-expressed in various types of cancers, is involved in cell apoptosis, differentiation, proliferation, and metastasis [19], and it had the greatest fold change in the two groups. CDCA3 is the most significantly differentiated mRNA, and is already proven to be a prognostic factor in NSCLC patients [20]. Additionally, we further used ClusterProfiler [21] R-package to run KEGG enrichment analysis, and found that DEGs were enriched in 19 pathways (p<0.05), especially in cancerrelated pathways including cell cycle, DNA replication, homologous recombination, and p53 signaling pathway ( Figure 3C). These pathways were additionally validated by KEGG pathway enrichment analysis using GSEA. Meanwhile, we found some other pathways (24 pathways in total), such as purine metabolism, pyrimidine metabolism, spliceosome, and proteasome, were related to high expression of AURKB ( Figure 3D). Since AURKB could mediate EGFR TKI resistance as above-mentioned, we also analyzed the relationship Positive values indicate a higher correlation with AURKB high patients, while negative values indicate association with AURKB low patients.  Figure 2A). However, for the AURKB low group, no pathways met our criteria that both p-value and FDR q-value should be less than 0.05. Aberrant DEGs and enriched pathways identified in this study were closely associated with cancer formation and therapy resistance, which was consistent with our previous finding that AURKB mediated poor outcome for LUAD patients and might explain for the internal mechanism of action.

Connections between microRNAs and AURKB
We also found certain DEGs were enriched in the miRNA pathways of LUAD, thus, genome-wide miRNA profile analysis was performed. In a further Pearson correlation analysis with AURKB, 926 miRNAs with at least 30% sample expression values were included. A total of 54 miRNAs met the threshold, with a FDR of less than 0.05 and a correlation coefficient of above 0.3. Specifically, 50 of these miRNAs were positively correlated with AURKB expression, while 4 were negatively correlated with AURKB expression. The distribution relationship of correlation coefficients with FDR is shown in Figure 4A. As displayed by the clustered heatmap ( Figure 4B), only a small number of miRNAs were negatively correlated with AURKB, while most of the miRNAs were positively correlated with AURKB. Among them, the top miRNA with the highest positive correlation coefficient was miR-130b-3p. Previous research has been reported that miR-130b-3p present in exosomes contributes to CXCL12/ CXCR4-induced colorectal cancer metastasis into liver [22] and downregulate PIEZO2, thereby activating the Hedgehog signaling pathway and leading to worsening breast cancer prognosis [23]. Moreover, miR-421 is previously identified as a prognostic biomarker indicating promoted tumor progression of non-small cell lung cancer (NSCLC) [24]. miR-29b-2-5p and miR-101-3p were negatively correlated with AURKB. From reviewing previous studies, miR-29b-2-5p is reported to target CBI-b to inhibit the proliferation of pancreatic cancer cells [25], and miR-101-3p could hinder the growth and metastasis of NSCLC through attenuating MALAT-1 mediated PI3K/AKT signal pathway [26].
Correlated miRNAs were analyzed by the T-test and rank-sum test to compare the expression changes between AURKB high and AURKB low groups. 49 miRNAs were screened based on FDR < 0.05 and fold change > 1.5. Then mirWallk database was used to predict the target genes of these differential miRNAs and to construct mRNA-miRNA regulatory network, as shown in Figure 4C. Further analysis of the relationship between miRNA in the network and EGFR pathway AGING AGING showed that 82% of miRNAs predicted were significantly correlated with EGFR pathway (Supplementary Figure  2B). Based on the scores calculated by the degree algorithm, the following top 10 hub genes were screened: miR-93-5p, RAB3B, miR-17-5p, miR-20a-5p, WNK3, miR-7974, miR-6783-3p, PRR11, GINS4 and miR-128-3p (Table 3). miR-93-5p, which was reported to promote the proliferation of NSCLC cells and is indicative of a poor prognosis, was found to enhance the transcription of several oncogenic genes, such as HMGA2, E2F2, KIF23, POLQ, PRR11, GINS4, WNK3, LYPD6, KIF14, RAB3B and downregulate the expressions of tumor suppressors such as ATP1A2 and CYBRD1. Taken together, these observations provided rational evidence for the prognostic role of AURKB in lung adenocarcinoma.

Whole-genome methylation profiles associated with AURKB
DNA methylation is an important epigenetic mechanism regulating gene expressions through three DNA methyltransferases (DNMT1, DNMT3A, and DNMT3B) and affects cancer cell behaviors. We analyzed differences in the transcription of the three methyltransferases, and as shown in Figure 5A-5C, expressions of the three were significantly higher in AURKB high than AURKB low groups. 476 samples with methylation profiles were included to further examine the methylation differences between the AURKB high and AURKB low groups. Methylation sites were excluded if more than 30% of the samples showed no values or happened to be the cross-reactive CpG sites in the Illumina Infinium Human Methylation 450 microarray. After filling up missing values using the KNN method in the R package 'impute', a total of 195,622 methylation sites were obtained and then compared with the R package 'limma', according to the thresholds of fold change > 1.3 and FDR < 0.05. The volcano map in Figure 5D showed that the hypermethylation sites were significantly more than hypomethylation sites (1900 and 411, respectively). Heatmap of these differential methylation sites is shown in Figure 5E.
By examining the position distribution of the 2311 differential methylation regions (DMR) around the CpG islands, as shown in Figure 5F, all the positions had more hypermethylated DMRs, although the differences were not significant in N_Shelf or S-Shelf (Island: P<0.001, N_Shore: P=0.0019, S_Shore and Others: P<0.001). As expected, most hypermethylated DMRs centered on the CpG islands. Finally, we determined the distance of methylation sites to the transcription starting site (TSS), as presented in Figure 5G, most hypomethylated DMRs and hypermethylated DMRs were on the upstream of TSS (-600 -+200), and hypermethylated DMRs fell to -200 -+200 regions. Generally, hypermethylated sites tended to distribute around TSS, while hypomethylated sites tended to distribute on the upstream of TSS.
We extracted the corresponding genes of these DMRs based on the TSS and ran a KEGG enrichment analysis using R package ClusterProfiler. The data demonstrated that genes corresponding to hypermethylation were enriched into 36 pathways, such as cAMP signaling pathway, axon guidance, viral carcinogenesis, and chemokine signaling pathway (p<0.05, Figure 5H). However, the relative hypomethylated genes were only enriched into two pathways, which were olfactory transduction and viral protein in interaction with cytokine and cytokine receptors ( Figure 5I).

Tumor suppressors DLC1 and HLF negatively correlated with ARUKB and its associated miRNAs and methylation modification
As we have obtained the genome-wide mRNAs, miRNA-regulated genes, and methylation-regulated genes profiles, we then went on to investigate whether these three profiles shared genes in common that may have close crosstalk with AURKB to promote LUAD aggressiveness. As shown in Figure 6A, two genes, DLC1 and HLF, were present in all the three profiles. The differential mRNA expression levels of both genes were further confirmed using paired lung adenocarcinoma tumor and adjacent tissues in TCGA (p<0.001 for both genes, Figure 6B). Gene correlation analysis using GEPIA web-tool (http://gepia.cancerpku.cn/) also confirmed that AURKB was significantly negatively correlated with DLC1 and HLF (R= -0.29, -0.34, respectively, Figure 6C). Survival analysis of included LUAD patients from the TCGA database also showed an apparent difference between HLF high and HLF low patients (p=0.0029), and DLC1 was AGING determined to be a protective factor to the survival of patients with LUAD (p=0.078) (divided by the median value, Figure 6D Figure 6E).

DISCUSSION
Abnormal proliferation of cancer cells plays an essential role in both cancer formation and metastasis [27]. Cancer cells will consume nutrients and reprogram energy metabolism of normal cells, including stromal cells and immune cells, thereby further enhancing cancer survival and aggressive growth [28]. Cell mitosis restricts the fast proliferation of cancer cells, but most cancer cells overexpress mitosis-related proteins such as AURK family member AURKA and AURKB to affect chromosome segregation and cell cycle and cause unequal distribution of genome, producing aneuploid cells in cancers [29]. A retrospective study with a cohort of 132 NSCLC patients confirmed that AURKA could be an independent prognostic factor (HR[95%CI]: 1.81[1.16-2.84]) [30]. Here we identified AURKB as an independent prognostic factor in LUAD patients using the TCGA-LUAD cohort and further confirmed our findings through multiple datasets in LCE.
EGFR wild type and mutated types tend to have a similar expression level of AURKB, although they showed different survival outcomes by application of EGFR-TKIs. However, as mentioned above, the majority of AURKB associated pathways and miRNAs were significantly correlated with EGFR signaling pathway, and AURKB has been reported to be responsible for the therapy resistance to EGFR-TKIs in patients without secondary resistant mutations during progression, suggesting that AURKB could be a vital factor as a downstream effector of EGFR pathway contributing to tumor progression.
We further explore the mechanisms and essential biofunctions or pathways of AURKB in LUAD from the aspects of gene expression, miRNA expression, and methylation profiles. As previously demonstrated, AURKB was correlated with FAM64A, PRAME, CDCA3, etc., which are all highly related to cancers. DEGs upregulated in AURKB high patients were enriched in the cell cycle, DNA replication, homologous recombination, and p53 signaling pathway, which confirmed the role of AURKB in the mitosis. Moreover, we also found that some AURKB-associated genes were also involved in purine metabolism, pyrimidine metabolism, spliceosome, and proteasome pathways, indicating that the fundamental processes of a fast proliferation of LUAD cells were also regulated by AURKB either through nucleotide metabolism, mRNA maturation or posttranslational protein modifications.
For miRNA profiles, in this study, most miRNAs were positively correlated with AURKB expression, which is consistent with enriched pathway 'miRNAs in cancer' and indicated that AURKB may also regulate the genome-wide changes to facilitate LUAD growth through miRNAs. miR-93-5p, a hub miRNA in the profile, was predicted to activate RAB3B, WNK3, GINS4, PRR11 hub genes. Specifically, RAB3B is a member of the ras oncogene family currently remain largely unexplored; WNK3 is a serine-threonine protein kinase functioning as a positive regulator of the transcellular Ca2+ transport pathway and increases cell survival in a caspase 3-dependent pathway [31]. GINS4 was reported to contribute to the poor outcome in lung adenocarcinomas [32]. Recent research found that PRR11 could facilitate F-actin polymerization and disrupt the F-actin cytoskeleton, thereby causing aberrant nuclear lamina assembly and chromatin reorganization in NSCLC [33]. To the best of our AGING knowledge, previous researches have shown many cases of such miRNA-mRNA pairs [34], but the modulation of those genes by miR-93-5p has not been reported yet.
Other epigenetic modifications and methylation changes were also affected by AURKB, as we demonstrated that DNA methyltransferases were remarkably increased in AURKB high patients. In our study, hypermethylation, which reflects transcription repression functions, mostly occured within CpG island around TSS [35]. It was possible that those abnormal methylations might accompany the adverse prognosis. Interestingly, in this research, two tumor suppressors (DLC1 and HLF) were both found to be closely correlated with AURKB and AURKB-associated differentially expressed miRNAs and hypermethylation. DLC1 is a tumor suppressor in many different cancer types, including in lung cancer [36,37] and could inhibit cancer cells through Rho GTPase accelerating proteins (GAP) dependent-and independentmechanisms [38]. HLF, which is high-expressed in liver tissues but low-expressed lung tissues, is initially identified as a protein related to TCF3 driving acute lymphoblastic leukemia [39], but it is also demonstrated to be robustly hypermethylated in NSCLC when compared to normal tissues [40,41]. These two genes were closely related to AURKB through methylation modification and miRNA regulation, however, this remained to be validated by our team.
Smoking is a high risk factor and driving event for lung cancer, as we found here, significantly correlated with higher expression of AURKB in LUAD. To analyze the effect of smoking history on AURKB expression, we compared the profiles of DEGs, differentially expressed miRNAs, and methylation differences between AURKB high and AURKB low groups in smokers and nonsmokers separately, according to the thresholds of a two-fold difference and p <0.01. Generally, at different omics levels, the differences between AURKB high and AURKB low groups in patients with smoking history were greater than that in the non-smoking group (Supplementary Figure 4). Interestingly, >80% varied molecular features in the non-smoking group were commonly found in the smoking group, suggesting that smoking might be a main factor affecting AURKB expression and altering LUAD patient's genome characteristics.
In summary, this study found that AURKB may be an effective factor to predict LUAD patients' survival. Also, personalized drugs with AURKB could be designed to treat patients with AURKB overexpression and to overcome acquired resistance to chemo reagents or targeting agents. Recently, a phase I dose-escalation study showed that Chiauranib could simultaneously inhibits angiogenesis-related kinases, AURKB, and chronic inflammation-related kinase CSF-1R and is tested to be tolerable even in advanced solid tumors, especially in NSCLC (4 in 5 patients with disease control) [42]. However, AURKB inhibitors were largely unavailable in clinical practice, which requires further development of effective AURKB inhibitors. Moreover, the intrinsic regulatory mechanism involving gene expression, miRNAs and methylation through which AURKB could be better used as a potential and safe target should be deeply studied.

Data acquisition and preprocessing
The expression profiles of mRNAs and miRNAs were sequenced by RNA-seq and 450K genome-wide methylation profile.    ). The expressions of miRNAs and mRNAs were calculated using the 2 -ΔΔCT method, with U6 and ACTB served as normalization controls, respectively. All primers used are listed in Table 5.

AURKB overexpression or inhibition in lung adenocarcinoma cells
Briefly, pLenti-CMV-GFP-Puro-AURKB and pPLK/ GFP+Puro-AURKB shRNA were constructed on the backbone of pLenti-CMV-GFP-Puro and pPLK/GFP+ Puro plasmids, respectively, and each was then cotransfected with lentiviral packaging plasmids pMD2.G and psPAX2 (Addgene, USA) into HEK 293T cells to generate lentivirus particles, followed by the transfection of the particles into target cells. The cells were incubated with lentivirus for 24 hours (h) and then replaced with fresh medium. When cells stably grew to proliferative status, cell pellets were collected for protein and total RNA extraction.

Statistical analysis
The chi-squared test was used to examine the correlation between AURKB expression and clinical or pathological variables. The endpoint event overall survival (OS) was defined as the time starting from the occurrence of LUAD to death due to any cause. Progression-free survival (PFS) was defined as the time starting from the onset of LUAD to progression after the first-line treatment or death. Kaplan-Meier analysis with the logrank test was applied to compare the survival differences between two groups of patients, while multivariate cox regression analysis was used to examine potential independent factors related to survival. Student's t-test and false discovery rate (FDR) were used to identify differences in mRNAs, miRNAs, and methylation data between AURKB high and AURKB low expression groups. All the analyses were performed using the R 3.6 software packages.