A Six-Gene Signature Predicts Survival of Adenocarcinoma Type of Non-Small-Cell Lung Cancer Patients: A Comprehensive Study Based on Integrated Analysis and Weighted Gene Coexpression Network

Background and Goals. To identify a multigene signature model for prognosis of non-small-cell lung cancer (NSCLC) patients, we first found 2146 consensus differentially expressed genes (DEGs) in NSCLC overlapped in Gene Expression Omnibus (GEO) and TCGA lung adenocarcinoma (LUAD) datasets using integrated analysis. We constructed a weighted gene coexpression network (WGCN) using the consensus DEGs and identified the module significantly associated with pathological M stage and consisted of 61 genes. After univariate Cox regression analysis and subsequent stepwise model selection by the Akaike information criterion (AIC) and multivariate Cox hazard model analysis, an mRNA signature model which calculated prognostic score was generated: prognostic score = (−0.2491 × EXPRRAGB) + (−0.0679 × EXPRSPH9) + (−0.2317 × EXPRPS6KL1) + (−0.1035 × EXPRXFP1) + 0.1571 × EXPRRM2 + 0.1104 × EXPRTL1, where EXP is the fragments per kilobase million (FPKM) value of the mRNA included in the model. The prognostic model separated NSCLC patients in the TCGA-LUAD dataset into the low- and high-risk score groups with a median prognostic score of 0.972. Higher scores predicted higher risk. The area under ROC curve (AUC) was 0.994 or 0.776 in predicting the 1- to 10-year survival of NSCLC patients. The prognostic performance of this prognostic model was validated by an independent GSE11969 dataset of NSCLC adenocarcinoma with AUC values between 0.822 and 0.755 in predicting 1- to 10-year survival of NSCLC. These results suggested that the six-gene signature functioned as an independent biomarker to predict the overall survival of NSCLC adenocarcinoma.


Introduction
Lung cancer (LC) is one of the leading causes of cancerassociated deaths worldwide [1,2]. Non-small-cell lung cancer (NSCLC) accounts for 85% of all lung cancer cases [3]. Approximately 50% NSCLC is adenocarcinoma and 40% NSCLC belong to squamous cell carcinoma. Complete surgical resection is the most effective therapy for patients in the early stage, and adjuvant chemotherapy (ACT) is the standard treatment for patients in stage II-III. However, the relapse and death rates remain high. A multiparameter molecular signature provides wider insights into the heterogeneous nature of cancer including NSCLC and may more reliably predict survival and benefit from chemotherapy of cancer patients than a single prognostic biomarker and/or staging system. It is important to establish prognostic gene signatures that reflect the nature of NSCLC from multiple tiers of mechanisms.
Prognostic signatures based on gene expression profiles for NSCLC have been generated in several studies. A 14-gene signature can predict survival in resected nonsquamous NSCLC [4], identify patients at high risk of mortality despite small, node-negative lung tumors [5], improve identification of patients at risk for recurrence in early-stage NSCLC [6], and predict benefit from adjuvant chemotherapy for very early stage NSCLC, which is superior over current National Comprehensive Cancer Network (NCCN) criteria at identifying high-risk patients [7]. A 15-gene signature can differentiate high-and low-risk subgroups with significantly different overall survival and is prognostic for both adenocarcinoma and squamous cell carcinoma cases [8]. GeneFx ® Lung is a multigene RNA expression signature that classifies early-stage NSCLC patients as high-risk or low-risk for disease recurrence and predicts overall survival [9]. A 6gene signature including ABCC4, ADRBK2, KLHL23, PDS5A, UHRF1, and ZNF551 is identified to be the independent prognostic factor for overall survival [10]. A 10gene Yin Yang expression ratio signature (YMR) based on two groups of genes with opposing function significantly can separate high-and low-risk patients with stage IA or IB adenocarcinoma and squamous cell carcinomas of all stages. e YMR signature can also predict the benefit of adjuvant chemotherapy in high-risk patients with stage I NSCLC [11]. A 17-gene panel consisting of genes involved in epithelialmesenchymal transition (EMT), hypoxia response, glycometabolism, and epigenetic modifications for non-small-cell lung cancer prognosis has recently been identified through integrative epigenomic-transcriptomic analyses. It can clearly stratify NSCLC patients with significant differences in overall survival [12]. ese gene signatures contribute to preclinical and clinical treatment of NSCLC. However, additional gene signatures are needed for accurate prognosis of NSCLC because of its complexity.
In this study, we first identified genes significantly associated with pathological M stage based on weighted gene coexpression network analysis (WGCNA) using differentially expressed genes overlapped in both Gene Expression Omnibus (GEO) datasets and the TCGA-LAUD dataset. We then selected genes significantly correlated with the overall survival of NSCLC patients among above genes according to a univariate Cox regression analysis and identified a 6-gene signature for prognosis of NSCLC based on a multivariate Cox hazard model analysis. We characterized the prognostic performance of the 6-gene signature using TCGA-LAUD dataset and validated it in an independent GSE dataset of NSCLC adenocarcinoma. Our findings suggested that the 6-gene signature is a prognostic marker for NSCLC adenocarcinoma.

Datasets and Workflow of Data Analysis. A total of 14
Gene Expression Omnibus (GEO) datasets regarding NSCLC were collected, including GSE19188, GSE30219, GSE10072, GSE7670, GSE2514, GSE32863, GSE21933, GSE40275, GSE12472, GSE80796, GSE8500, GSE85841, GSE19027, and GSE11969. e TCGA-LUAD RNAseq data and clinical data (level 3) of the NSCLC and ANT (adjacent normal tissue) samples were downloaded from the TCGA data portal (up to June 29, 2018) ( Table 1) [13][14][15]. ese datasets were processed and analyzed by following the workflow in Figure 1. is workflow was set up based on the published literature [16,17]. Processing and Integrated Analysis.  e 13  raw GEO datasets (GSE19188, GSE30219, GSE10072,  GSE7670, GSE2514, GSE32863, GSE21933, GSE40275,  GSE12472, GSE80796, GSE8500, GSE85841, and GSE19027) were subjected to a quantitative and objective quality control using the MetaQC software package according to standardized mean ranks and principal component analysis (PCA) biplots [18,19]. e resultant GEO datasets were designated as the training set. e training dataset was processed individually using the LIMMA (linear models within the microarray analysis) software package with log 2 transformation and annotated by converting different probe IDs to their respective gene symbols. Duplicate gene expression values were averaged. Datasets from studies that screened differentially expressed mRNAs between human NSCLC and ANT samples were selected and combined. e MetaOmics software package (http://www.pitt.edu/ ∼tsengweb/MetaOmicsHome.htm) was used to integrate and analyze the GEO datasets [18].

GEO Datasets
e DEGs between NSCLC and ANT were identified by analyzing the training set using the MetaDE software package by setting the mean and standard deviation (SD) filter thresholds at 10% to filter minor changes in gene expression levels [20]. Meta-analysis was performed using Fisher's method. e modified t test and the permutation method by summarizing − log (P value) across studies and running 300 permutations (N Permutations � 300) were used to extrapolate the P values [21]. P values less than 0.05 were considered statistically significant for the DEGs. e heatmaps were generated to illustrate the expression patterns of DEGs [20].

TCGA-LUAD Dataset Processing and the Consensus
DEGs.
e TCGA-LUAD dataset was used as the test set to verify the results from the training set. Clinical information (American Joint Committee on Cancer pathological TNM stage, gender, age at initial pathological diagnosis and histological type, especially survival status, and time to latest follow-up) was screened to remove cases with incomplete clinical traits or gene expression information and resulted in 515 cases. e TCGA-LUAD DEGs were analyzed using an empirical Bayes approach within the LIMMA software package. e DEGs of the test set with a |log 2 fold change (FC)| ≥ 0.5 and an adjusted P value less than 0.05 were selected for subsequent analysis. An overlapping gene set by selecting common official gene symbols in both the training and test sets was created as the consensus DEGs.

Weighted Gene Coexpression Network Construction.
e consensus DEGs were subjected to WGCNA [14,22,23]. e scale-free gene coexpression networks were constructed using the clinical features and prognostic information of TCGA-LUAD dataset and the WGCNA software package [15]. e appropriate soft threshold power was automatically estimated and generated for the standard scale-free network. e weighted adjacency matrix was constructed using the power function ADJ mn � |COR mn | β (COR mn � Pearson's correlation between gene m and gene n, where ADJ mn � adjacency between gene m and gene n and β is the soft thresholding parameter, which was used to transform adjacencies and correlations into a topological overlap matrix (TOM). e corresponding dissimilarity was calculated as 1-TOM. Module identification was carried out with the dynamic tree cut method by hierarchically clustering the genes using 1-TOM as the distance measure with a deep split value of 2 and a minimum size cutoff 50 for the resulting dendrogram. Highly similar modules were marked by clustering and merged with a height cutoff 0.25.

Identification of Clinical Feature Modules and Efficacy
Evaluation of Pathological Stage Hub Genes. Module eigengenes (MEs), which are the first principal components in the PCA for each gene module, summarized the expression patterns of all genes into a single characteristic expression profile within a given module. e dynamic decision-making tree, node splitting method, and cluster analysis of the squared Euclidean distance were used to identify MEs related to these clinical features, especially those involved in the progression and prognosis of NSCLC. Modules with similar expression profiles were identified using the dynamic tree cut method. Highly similar modules were merged. Spearman's correlation analysis was carried out to confirm the object module, which was the most relevant module between the MEs and clinical traits. Depending on these, the module that had the highest absolute Spearman's correlation coefficient (PCC) value for the pathological stage and MEs was defined as the pathological stage module.

Identification, Characterization, and Validation of an mRNA Prognostic Model for NSCLC Patients.
Association of genes in the pathological stage modules with survival of NSCLC patients was analyzed using a univariate Cox regression analysis. e genes with P value <0.05 were selected. A stepwise model selection by the Akaike information criterion (AIC) was further performed to avoid overfitting to select a final list of genes. A multivariate Cox hazard model analysis was performed to generate an mRNA prognostic signature model, which calculated the prognostic score as follows: prognostic score � (C × EXP mRNA ), where EXP is the fragments per kilobase million (FPKM) value of the mRNA and C is the regression coefficient for the corresponding mRNA in multivariate Cox hazard model analysis. e median prognostic score of the training dataset was used to differentiate high-risk group and low-risk group. Higher scores predicted higher risk. e prognostic performance of the mRNA signature model was measured using receiver operating characteristic (ROC) curves by comparing the area under the respective ROC curves (AUC). e mRNA signature was examined for its association with patient survival. Finally, the mRNA signature model was validated with an independent data set GSE11969 of NSCLC adenocarcinoma. All reported P values were two-sided. All analyses were carried out via the R/BioConductor (version 3.5.1). Survival curves and ROCs were generated by ggplot2, survival, and survivalROC packages.

Identification of Consensus NSCLC DEGs in the Training
Set and the Test Set. A total of 7 GEO datasets (GSE10072, GSE19188, GSE21933, GSE30219, GSE32863, GSE40275, and GSE7670) were selected from the 13 raw datasets after MetaQC quality control for subsequent analysis (Table 2 and Figure 2(a)). ese datasets contained 590 NSCLC and 280 ANTs and were designated as the training set. In the training set, a total of 7373 DEGs were identified after filtering through the mean and standard deviation (SD) thresholds. A list of 7076 DEGs were subsequently obtained, and we eliminated batch effect after Fisher's t test and 300 permutations ( Figure 2(b)). Hierarchical clustering of the seven datasets in the training set using the 7076 DEGs distinguished NSCLC from ANT samples (Figure 3(a)). In the TCGA-LUAD dataset which contained 20501 mRNAs in 517 NSCLC samples and 59 ANT samples, a total of 3592 DEGs were identified and designated as the test set. e DEGs in the test set distinguished NSCLC from ANT samples in the 2-way hierarchical cluster ( Figure 3(b)). A total of overlapped 2146 DEGs between the training set and the test set were identified and designated as the consensus DEGs ( Figure 3(c)).

Coexpression Network Construction and Identification of Modules Associated with Clinicopathological Features.
We constructed a weighted gene coexpression network (WGCN) using the 2146 consensus DEGs and clinical traits and prognostic information from 515 NSCLC patients in the TCGA-LUAD test set. e results showed that the connections between the genes in the WGCN were in line with a scale-free network distribution (Figures 4(a)-4(d)). Modules with similar expression profiles were identified using the dynamic tree cut method (Figures 5(a) and 5(b)). Highly similar modules were merged ( Figure 5(b)). A total of 14 WGCN modules ranging from 54 to 607 genes in each module were generated ( Figure 5(c)). We further analyzed association of modules with clinicopathological features to identify pathological stage modules using Spearman's correlation and Module eigengenes analysis. e results showed the lightcyan module was most significantly associated with pathological M stage (R � 0.12, P � 0.009), and this module contained 61 genes ( Figure 5(c)).

Construction of an mRNA-Signature Prognostic Model and Characterization of Its Prognostic Performance Using NSCLC-LUAD.
To investigate potential genes significantly associated with the survival of NSCLC patients, we performed a univariate Cox regression analysis using the lightcyan module genes. e results showed that the twelve genes RRM2, RPS6KL1, RTL1, RXFP1, RRM1, RTCD1, RRAGB, RSPH10B2, RRM2B, RSPH9, RXFP2, and RUNX1 were significantly correlated with the overall survival of NSCLC patients (Table 3). We further performed a stepwise model selection by the Akaike information criterion (AIC), and six genes RRAGB, RSPH9, RPS6KL1, RXFP1, RTL1, and RRM2 were selected. Multivariate Cox hazard model analysis of association of these six genes with survival showed that RPS6KL1 and RXFP1 were independent factors associated with good overall survival in NSCLC patients, while RTL1 and RRM2 were independent factors associated with poor overall survival in NSCLC patients (Table 4). We generated an mRNA signature model which calculated the prognostic score: prognostic score � (− 0.2491 × EXP RRAGB ) where EXP is the FPKM value of the mRNA included in the model. We characterized the prognostic performance of the six-mRNA signature model using the TCGA-LUAD dataset. According to prognostic scores, the median score was 0.972 and we separated NSCLC patients into the low-risk group (n � 253) and the high-risk group (n � 253) (Figure 6(a)). e high-risk group had significantly shorter survival time or lower survival  No.
Study   Figure 2 was reproduced from Sun et al. [16].
probabilities than the low-risk group did (HR, 2.298; 95% CI, 1.711-3.086; log-rank test P � 9.181 × 10 − 9 ) (Figure 6(b)). e area under ROC curve (AUC) was 0.994 or 0.776 in predicting the 1-to 10-year survival of NSCLC patients (Figure 6(c)). High expression of RRAGB, RSPH9, RPS6KL1, and RXFP1 and low expression of RTL1 and RRM2 predicted good survival (Figures 6(d)-6(i)). e results suggested that the six-mRNA signature could predict the prognosis of NSCLC-LUAD.   NSCLC, we measured the prognostic performance of this model using the validation dataset GSE11969 of NSCLC adenocarcinoma. e results showed that the patients were divided into the high-risk group (n � 47) and the low-risk group (n � 47) according to the risk scores and the median cutoff point (Figure 7(a)). e high-risk group had significantly shorter survival time or lower survival probabilities than the low-risk group did (HR, 3.286; 95% CI, 1.79-6.03; log-rank test P � 0.00013) (Figure 7(b)).

Validation of the
e area under ROC curve in predicting 1-to 10-year survival of NSCLC was between 0.822 and 0.755 (Figure 7(c)). High expression of RRAGB, RSPH9, RPS6KL1, and RXFP1 and low expression of RTL1 and RRM2 expression predicted good survival of NSCLC patients (Figures 7(d)-7(i)). ese results were consistent with those of the test set (compare Figures 6 and 7), supporting that the six-mRNA signature could predict the prognosis of NSCLC adenocarcinoma.

Relative Expression Levels of RRAGB, RSPH9, RPS6KL1, RXFP1, RRM2, and RTL1 in NSCLC Tissues and the High-and
Low-Risk Groups. Examination of the expression patterns of these signature mRNAs revealed that RRAGB, RSPH9, and RXFP1 mRNA levels were significantly lower and RPS6KL1, RTL1, and RRM2 mRNA levels were significantly decreased in NSCLC of TCGA-LUAD, compared with the normal control (Figure 8(a)). e expression levels of RRAGB, RSPH9, RPS6KL1, and RXFP1 were significantly lower and RTL1 and RRM2 mRNA levels were significantly higher in the high-risk group than those in the low-risk group in NSCLC of TCGA-LUAD (Figure 8(b)). e expression levels of RRAGB, RSPH9, RPS6KL1, RXFP1, and RTL1 mRNA were significantly lower and the RRM2 mRNA level was significantly higher in the high-risk group than those in the low-risk group in NSCLC adenocarcinoma of GSE11969 (Figure 8(c)).  Figure 4 was reproduced from Sun et al. [16].  Figure 5 was reproduced from Sun et al. [16].  e listed genes were selected by the Akaike information criterion (AIC) model from the significant genes after the univariate Cox regression analysis (Table 3). Multivariate Cox regression analysis of association of the listed genes with survival was performed to reveal the independent predictor for survival and generate a prognostic risk score model. β, regression coefficient; HR, hazard ratio; CI, confidence interval; # P < 0.05.

Discussion
In the current study, we have identified a six-gene prognostic model for NSCLC adenocarcinoma by calculating prognostic score using the formula: prognostic score � (− 0.2491 × EXP RRAGB ) + (− 0.0679 × EXP RSPH9 ) + (− 0.2317 × EXP RPS6KL1 ) + (− 0.1035 × EXP RXFP1 ) + 0.1571 × EXP RRM2 + 0.1104 × EXP RTL1 , where EXP is the FPKM value of the mRNA included in the model. Characterization of prognostic performance revealed that the model separates NSCLC patients in the TCGA-LUAD dataset into the lowrisk score group and the high-risk score group with median prognostic score 0.972. Higher scores predicted higher risk. e area under ROC curve (AUC) was 0.994 or 0.776 in predicting the 1-to 10-year survival of NSCLC patients. e expression of each gene in the signature differentiates survival of NSCLC patients. e results were similar in the independent GSE11969 dataset of NSCLC adenocarcinoma. All these results support that the sixgene signature is an independent biomarker for prediction of overall survival of NSCLC adenocarcinoma.
Several expression-based gene signatures for NSCLC prognosis in the previous studies have been identified. e 8gene signature (STAT1, CLU, GTSE1, NUSAP1, ABCA8, TNNT1, ENTPD3, and CPA3) can significantly stratify patients into low-and high-risk groups and predict patients in stage II-III benefiting from adjuvant chemotherapy [24]. e independent prognostic six-protein signature (c-SRC, Cyclin E1, TTF1, p65, CHK1, and JNK1) is identified for ADC and five-protein signature (EGFR, p38α, AKT1, SOX2, and E-cadherin) for SCC [25].  low-risk subgroups with significantly different overall survival and is prognostic for both adenocarcinoma and squamous cell carcinoma cases [8].
e multigene RNA expression signature GeneFx ® Lung classifies early-stage NSCLC patients as high-risk or low-risk for disease recurrence and predicts the overall survival [9]. e 50-gene signature novel scoring system is identified for tumor-infiltrating immune cells with strong correlation with clinical outcome of stage I/II non-small-cell lung cancer [26]. e 6gene signature (ABCC4, ADRBK2, KLHL23, PDS5A, UHRF1, and ZNF551) is identified for independent prognosis of overall survival [10]. e Yin Yang Expression Ratio Signature containing 10 functionally opposing genes (GRM1, IGFBP5, NRAS, and RECQL4 in the Yin group and CRIP2, CD83, GATA2, HOXA5, SOSTDC1, and TNNC1 in the Yang group) significantly separates high-and low-risk patients with stage IA or IB adenocarcinoma and squamous cell carcinomas of all stages and can predict the benefit of adjuvant chemotherapy in high-risk patients with stage I NSCLC [11]. e 14-gene signature (11 cancer-related genes BAG1, BRCA1, CDC6, CDK2AP1, ERBB3, FUT3, IL11, LCK, RND3, SH3BGR, and WNT3A and three reference genes ESD, TBP, and YAP1) predicts survival in resected nonsquamous, non-small-cell lung cancer [4], identifies patients at high risk of mortality despite small, node-negative lung tumors [5], improves identification of patients at risk for recurrence in early-stage none small cell lung cancer [6], and predicts benefit from adjuvant chemotherapy for very early stage NSCLC and is superior over current NCCN criteria at identifying high-risk patients [7]. e 17-gene panel consisting of genes involved in epithelial-mesenchymal transition (EMT), hypoxia response, glycometabolism, and epigenetic modifications for non-small-cell lung cancer prognosis is identified through integrative epigenomictranscriptomic analyses and clearly stratifies NSCLC patients with significant differences in overall survival [12]. Our six-gene prognostic signature for NSCLC adenocarcinoma include RRAGB, RSPH9, RPS6KL1, RXFP1, RRM2, and RTL1 genes which are not contained in the previous gene signatures. erefore, our six-gene signature is likely a novel tool for NSCLC adenocarcinoma prognosis.
Our data indicated that high expression of RRAGB, RSPH9, RPS6KL1, and RXFP1 and low expression of RTL1 and RRM2 predicted good survival. Among these six genes, we found that RRM2 (ribonucleotide reductase regulatory subunit M2 and ribonucleoside-diphosphate reductase subunit M2) is overexpressed in NSCLC tissues. e RRM2 mRNA levels are higher in the high-risk group than those in the low-risk group and may differentiate the survival of NSCLC patients. Studies have shown that RRM2 is known as a marker that may be involved in predicting clinical response to gemcitabine plus docetaxel [27] and predicting the treatment response to platinumbased chemotherapy and survival [28,29] in non-smallcell lung cancer patients. e expression levels of RRM2 and differences between primary tumors and infiltrated regional lymph nodes were correlated with relapse-free survival (RFS) and overall survival (OS) in patients with resectable non-small-cell lung cancer [30]. RRM2 regulates antiapoptotic protein Bcl-2 in head and neck and lung cancers [31,32]. BRCA1-regulated RRM2 expression protects glioblastoma cells from endogenous replication stress and promotes tumorigenicity [33]. RRM2 is regulated by the transforming growth factor beta regulator 4 (TBRG4) gene which affects tumorigenesis in human H1299 lung cancer cells [34]. It is likely that RRM2 plays an essential role in NSCLC development and progression and may serve as a key marker for NSCLC prognosis. e model signature genes RSPH9, RPS6KL1, RXFP1, and RTL1 have been revealed to be involved in cancer   Figure 8: e relative levels of RRAGB, RSPH9, RPS6KL1, RXFP1, RRM2, and RTL1 in NSCLC adenocarcinoma. (a) Compared with the normal control, RRAGB, RSPH9, and RXFP1 mRNA levels were significantly decreased and RPS6KL1, RTL1, and RRM2 mRNA levels were significantly increased in NSCLC of TCGA-LUAD. (b) e expression levels of RRAGB, RSPH9, RPS6KL1, and RXFP1 were significantly lower and RTL1 and RRM2 mRNA levels were significantly higher in the high-risk group than those in the low-risk group of TCGA-LUAD. (c) e expression levels of RRAGB, RSPH9, RPS6KL1, RXFP1, and RTL1 were significantly lower and RRM2 mRNA levels was significantly higher in the high-risk group than those in the low-risk group in NSCLC adenocarcinoma of GSE11969. activity. RSPH9 (radial spoke head 9 homolog) was significantly hypermethylated and downregulated in the hepatocellular carcinoma (HCC) and epigenetic silencing of RSPH9 may be associated with hepatocellular carcinoma [35]. In multivariate regression analysis, hypermethylation of RSPH9 was an independent predictor of non-muscle invasive bladder cancer (NMIBC) recurrence and progression, and RSPH9 could be of value for the assessment of disease recurrence and progression and for clinical decisionmaking regarding treatment [36]. RPS6KL1 (ribosomal protein S6 kinase-like 1) mutation hot spots were confirmed and validated in colorectal cancers with microsatellite instability, which might be used to develop personalized tumor profiling and therapy [37]. RXFP1 (relaxin family peptide receptor 1, relaxin receptor 1, and relaxin/insulin-like family peptide receptor 1) is a G protein-coupled receptor with the extracellular low-density lipoprotein A (LDL-A) module located at the N-terminus. Studies have revealed that RXFP1 is activated by both C1q-tumor necrosis factor-related protein 8 (CTRP8) and relaxin and contributes to growth and invasion of human glioblastoma [38][39][40]. Suppression of RXFP1 inhibits prostate cancer tumorigenesis, growth, and metastasis [41][42][43]. Relaxin 2/RXFP1 Signalling induces cell invasion via the beta-catenin pathway in endometrial cancer [44]. Expression of RXFP1 is decreased in idiopathic pulmonary fibrosis [45] and mediates the effects of miR-144-3p in lung fibroblasts from patients with idiopathic pulmonary fibrosis [46]. RXFP1 protects against airway fibrosis during homeostasis but not against fibrosis associated with chronic allergic airways disease [47]. RTL1 (retrotransposon-like protein 1, retrotransposon Gag like 1, also known as Peg11, paternally expressed 11) is essential for maintenance of the fetal capillaries. Both its loss and its overproduction cause late-fetal and/or neonatal lethality in mice [48]. e Rtl1 promoter is hypermethylated in the placentas with fetal growth restriction. Infants with severe SGA have abnormal placental DNA methylation of CpG1 in the CG4 region of RTL1, suggesting the existence of disturbed epigenetic control in utero [49]. Overexpression of RTL1 in melanoma cells accelerated cutaneous melanoma cell proliferation, promoted the passage of the cell cycle beyond G1 phase, and increased the expression of cell cycle related genes, and RTL1 promotes melanoma cell proliferation by regulating the Wnt/β-Catenin signalling pathway [50]. RTL1 activation serves as a driver of HCC. Overexpression of RTL1 was detected in 30% of analyzed human HCC samples, indicating the potential relevance of this locus as a therapeutic target for patients [51]. RRAGB (Ras related GTP binding B) mediates mTOR (mechanistic target of rapamycin kinase) and TRIM37 (tripartite motif containing 37) pathways related to amino acid-stimulated MTORC1 (MTOR complex 1) signalling and autophagy [52]. In the current study, we found that RRAGB, RSPH9, RPS6KL1, RXFP1, and RTL1 are differentially expressed between NSCLC and ANTs. Expression of RRAGB, RSPH9, RPS6KL1, RXFP1, and RTL1 is significantly associated with survival of NSCLC.

Limitations
(1) e relative expression profiles of RTL1 expression in the TCGA-LUAD and GSE11969 of NSCLC adenocarcinoma are not consistent. is may be due to intervariation between different detection platforms. (2) e six-gene prognostic signature remains to be evaluated for clinical application using multicenter randomized controlled studies and mechanistic investigation using in vivo and in vitro experiments.

Conclusions
In summary, we have identified a six-gene prognostic model for NSCLC adenocarcinoma by calculating prognostic-score using the formula: prognostic score � (− 0.2491 × EXP RRAGB ) + (− 0.0679 × EXP RSPH9 ) + (− 0.2317 × EXP RPS6KL1 ) + (− 0.1035 × EXP RXFP1 ) + 0.1571 × EXP RRM2 + 0.1104 × EXP RTL1 , where EXP is the FPKM value of the mRNA included in the model. is signature and the genes RRAGB, RSPH9, RPS6KL1, RXFP1, RRM2, and RTL1 included in this model are independent biomarkers for prediction of overall survival of NSCLC adenocarcinoma. e role of the signature genes played in NSCLC adenocarcinoma activity and prognosis remains to be investigated in the future.