Identification of Gene Markers for Survival Prediction of Lung Adenocarcinoma Patients Based on Integrated Multibody Data Analysis

We constructed a prognostic risk prediction model for patients with lung adenocarcinoma by integrating clinical, genomic, and transcriptomic data. Blood samples and cancer and paracancerous lung tissue samples were collected from 480 patients with lung adenocarcinoma, and the extracted DNA and RNA samples were sequenced. Clinical information, including age, gender, smoking history, and TNM stage, was collected, and the first follow-up was carried out 3 months after discharge. More than 600 potential SNPs related to the prognosis of lung adenocarcinoma were evaluated with the Cox proportional hazards model, and LASSO analysis yielded 4 prognostic SNPs (rs1059292, rs995343, rs2013335, and rs8078328). The Cox proportional hazards model was also used to evaluate 260 candidate genes related to prognosis, and subsequent analysis yielded 3 prognostic genes (LDHA, SDHC, and TYMS). Based on these 4 SNPs and 3 genes, patients were split into a high-risk group (n = 170) and a low-risk group (n = 170). The overall survival rate of the high-risk group was lower than that of the low-risk group. The prognostic risk index, constructed by combining clinical information with genomic and transcriptomic features, can effectively distinguish the prognosis of patients with lung adenocarcinoma and will provide effective support for precision treatment of these patients.


Background
Despite improvements in our understanding of lung cancer risk, development, and immunologic control, and in therapeutic strategies, lung cancer remains the leading cause of cancer death worldwide [1]. About 80% of lung cancers are non-small-cell lung cancer (NSCLC) [2]. Because lung cancer screening is not widely available in China, early diagnosis and treatment are difficult to achieve [3], and about 75% of NSCLC patients are already at a middle or advanced stage when diagnosed [4]. Lung cancer treatment is not sufficiently standardized or advanced, and only a small proportion of patients can benefit from surgery, resulting in a low 5-year survival rate for NSCLC [5]. Identifying effective molecular markers for the early diagnosis, treatment, and prognostic evaluation of NSCLC is therefore of great significance for improving the overall survival of lung cancer patients. Surgery is the first choice and main treatment for NSCLC; to improve the surgical cure rate and patient survival, chemotherapy and radiotherapy are often added before or after surgery [6]. Recently, researchers have focused on using metal-organic framework fluorescent nanoparticles as carriers to target drug-resistant cancer cells [7]. In addition, a review summarized 3D self-assembled nanostructures, such as peptide hydrogels, graphene, carbon nanotubes (CNTs), and fullerenes, for gene delivery, cancer therapy, and tissue engineering [8]. However, because of tumor heterogeneity, patients receiving the same treatment plan differ significantly in treatment efficacy. The concept of individualized medicine has been proposed to address this problem, but effective clinical markers for distinguishing individual differences between patients are still lacking, which makes individualized treatment difficult to implement.
Although prognostic analyses of the lung adenocarcinoma transcriptome based on the TCGA database, covering miRNA, lncRNA, and mRNA, have been studied extensively, these studies are limited to a single type of omics data, and data supporting the combined use of multiple omics layers to predict the prognosis of lung adenocarcinoma are still lacking [9][10][11].
At present, molecular targeted therapy, which exploits molecular and cellular differences between tumor cells and normal cells by using cell receptors, genes, and regulatory molecules as drug targets, has become a hot spot in clinical research on lung cancer, but few markers have been applied in clinical practice. A single nucleotide polymorphism (SNP) is a DNA sequence polymorphism in a population caused by a single base variation or short fragment insertion in the genome, with the variant reaching a frequency of more than 1% in the population [12]. As SNP research has deepened, SNPs in multiple oncogenes and tumor suppressor genes have been found to be closely related to the occurrence and development of various diseases, including tumors. SNPs can also be used to evaluate the efficacy of tumor chemotherapy. Some SNPs closely related to patient prognosis have been identified in NSCLC, but they are few, and evaluation based on a single site has limited power. Identifying more SNP sites for the prognostic evaluation of NSCLC and formulating joint analysis strategies is therefore of great significance for the individualized treatment of NSCLC.
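The 1% frequency criterion above is usually applied to the minor allele frequency (MAF). The following Python sketch shows that arithmetic for a biallelic site in diploid samples; the function names are illustrative and the genotype counts in the example are hypothetical, not data from this study.

```python
def minor_allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Minor allele frequency of a biallelic site from diploid genotype counts."""
    n_samples = n_ref_hom + n_het + n_alt_hom
    if n_samples == 0:
        raise ValueError("no genotyped samples")
    # Each heterozygote carries one alternate allele, each alt homozygote two.
    alt_freq = (n_het + 2 * n_alt_hom) / (2 * n_samples)
    return min(alt_freq, 1.0 - alt_freq)


def is_polymorphism(maf: float, threshold: float = 0.01) -> bool:
    """Apply the >1% population-frequency criterion for a polymorphism."""
    return maf > threshold


if __name__ == "__main__":
    # Hypothetical genotype counts for one site in 480 individuals.
    maf = minor_allele_frequency(n_ref_hom=400, n_het=72, n_alt_hom=8)
    print(f"MAF = {maf:.4f}; polymorphism: {is_polymorphism(maf)}")
```

Taking the smaller of the two allele frequencies makes the check independent of which allele is labeled as the reference.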

Materials and Methods
2.1. Blood Samples. Blood samples were collected, and cancer and paracancerous lung tissue samples were surgically resected, from patients who were diagnosed with NSCLC and admitted to our hospital from January 2014 to January 2016. Blood was collected as follows: before sample acquisition, inform the patients and their families of the plan and significance of this study, and obtain their agreement and signed "Informed Consent"; ensure that enrolled patients had not undergone blood transfusion or surgery before adjuvant chemotherapy; draw the blood sample after 8 hours of fasting before the operation; collect 10-20 ml of venous blood from the upper extremity in a dedicated EDTA anticoagulant tube and label it; keep the sample briefly in a refrigerator at 4°C and send it to the laboratory as soon as possible for whole genome DNA extraction.
Journal of Nanomaterials
2.2. Plasma Separation. Plasma separation must be completed within 3 hours after blood collection, as follows. Vortex the anticoagulant tube containing the blood sample for 10 seconds, and then centrifuge it at low speed for 8 minutes (1200 g/min). Transfer the plasma to a 1.5 ml centrifuge tube, stopping at least 0.2 ml above the white blood cell layer so that no white blood cells are drawn up, which would affect subsequent use. Centrifuge the tube containing 1.5 ml of supernatant for 5 minutes (8000 g/min), and then divide it into 0.5 ml centrifuge tubes at 0.4 ml per tube; label each tube with the sample research number and store it in a refrigerator at -80°C. The separated blood cells were stored in a refrigerator at 4°C, and whole genome DNA extraction and RNA extraction were performed within 3 days.
2.3. Whole Genome DNA Extraction from Blood Cells. This study used the Omega Whole Blood DNA Extraction Kit, as follows. Set the water bath to 65°C before extraction. Vortex the blood sample tube for 10 seconds and transfer the blood sample to a 15 ml centrifuge tube. Add 3 ml of CL reagent to the sample tube, shake it gently, and transfer it to the centrifuge tube; add a further 4 ml of CL reagent to the centrifuge tube, vortex for 20 seconds to mix, centrifuge for 3 minutes (8000 g/min), and discard the supernatant. Add 5 ml of CL reagent to the centrifuge tube, vortex repeatedly until the precipitate is completely dissolved, centrifuge for 3 minutes (8000 g/min), discard the supernatant, and invert the tube on absorbent paper for 2 minutes to air dry. During the centrifugation steps, prepare the FG&PK working solution (FG reagent : proteinase K = 100 : 1) and mix it thoroughly. After air drying, add 2.5 ml of FG&PK working solution to the centrifuge tube and vortex to mix. Incubate in a 65°C water bath for 30 minutes, shaking the tube horizontally 3 times during incubation to lyse the cells more fully. Add 2.5 ml of precooled isopropanol, shake horizontally for 20 seconds, and watch for the DNA precipitate. Centrifuge for 10 minutes (8000 g/min) and discard the supernatant. Add 2.5 ml of 70% ethanol, vortex for 5 seconds, centrifuge for 3 minutes (8000 g/min), and discard the supernatant; repeat this step once. Let the DNA precipitate dry at room temperature. Add 800 μl of TB elution buffer, vortex at low speed for 5 seconds, and aliquot. Measure the DNA concentration with a NanoDrop 2000 instrument and store the DNA in a refrigerator at -80°C. After the DNA information of the extracted blood sample is recorded in the "Lung Cancer Sample Registration Form," it is entered into the database in full.
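The FG&PK working solution is mixed at a 100 : 1 volume ratio and used at 2.5 ml per sample. The small Python sketch below scales that recipe to a batch of samples; the function name is illustrative, and the 10% pipetting overage is an assumption added for illustration, not part of the protocol described above.

```python
# Recipe from the protocol: FG reagent : proteinase K = 100 : 1, 2.5 ml per sample.
FG_PARTS = 100
PK_PARTS = 1
VOLUME_PER_SAMPLE_UL = 2500.0


def working_solution_volumes(n_samples: int, overage: float = 0.10):
    """Return (FG reagent, proteinase K) volumes in microlitres for a batch.

    `overage` is an assumed extra fraction to cover pipetting loss.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    total_ul = n_samples * VOLUME_PER_SAMPLE_UL * (1.0 + overage)
    parts = FG_PARTS + PK_PARTS
    # Split the total volume in proportion to the 100 : 1 ratio.
    return total_ul * FG_PARTS / parts, total_ul * PK_PARTS / parts


if __name__ == "__main__":
    fg_ul, pk_ul = working_solution_volumes(8)
    print(f"FG reagent: {fg_ul:.1f} ul; proteinase K: {pk_ul:.1f} ul")
```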

2.4. Sample Preparation before the Experiment.
According to the NanoDrop 2000 quantification results, each blood DNA sample was diluted to the experimental concentration (10 ng/μl), and the diluted samples were loaded in order into a 96-well plate at a final concentration of 10 ng/μl and a volume of 20 μl. Four samples were randomly placed in the 96-well plate as replicate test samples using the random number method. The prepared 96-well plate was transferred to a 384-well plate with a multichannel pipette, with 10 ng of DNA per well, and the 384-well plate was centrifuged and placed in a 37°C oven for 30 minutes. When no liquid remained at the bottom of the wells, the plate was ready; it was stored in a refrigerator at 4°C until the experiment was completed.
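The dilution to 10 ng/μl in 20 μl follows C1V1 = C2V2, and the random placement of four replicate samples can be done with any seeded random number generator. The Python sketch below illustrates both steps under stated assumptions: the function names and the stock concentration in the example are hypothetical, the plate is taken to be a standard 8 × 12 layout (wells A1 to H12), and the random choice is interpreted as picking four well positions; the authors' actual random number procedure is not specified.

```python
import random
from typing import List, Optional, Tuple

TARGET_NG_PER_UL = 10.0   # experimental concentration from the protocol
FINAL_VOLUME_UL = 20.0    # volume per well in the 96-well plate


def dilution_volumes(stock_ng_per_ul: float) -> Tuple[float, float]:
    """Return (DNA stock, diluent) volumes in ul from C1 * V1 = C2 * V2."""
    if stock_ng_per_ul < TARGET_NG_PER_UL:
        raise ValueError("stock is below the 10 ng/ul target and cannot be diluted")
    stock_ul = TARGET_NG_PER_UL * FINAL_VOLUME_UL / stock_ng_per_ul
    return stock_ul, FINAL_VOLUME_UL - stock_ul


def plate_wells() -> List[str]:
    """Well IDs of a standard 8 x 12 plate, A1..H12 in row order (assumed layout)."""
    return [f"{row}{col}" for row in "ABCDEFGH" for col in range(1, 13)]


def pick_replicate_wells(n: int = 4, seed: Optional[int] = None) -> List[str]:
    """Randomly choose n distinct well positions for replicate test samples."""
    rng = random.Random(seed)  # seeding makes the plate layout reproducible
    return rng.sample(plate_wells(), n)


if __name__ == "__main__":
    # Hypothetical NanoDrop reading of 50 ng/ul for one sample.
    print(dilution_volumes(50.0))
    print(pick_replicate_wells(seed=2014))
```

Recording the seed alongside the plate map lets the replicate positions be regenerated later.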

2.5. RNA Extraction.
Place the cells in a 1.5 ml centrifuge tube, add 1 ml of Trizol, mix well, and let stand at room temperature for 5 minutes. Add 0.2 ml of chloroform, shake for 15 s, and let stand for 2 min. Centrifuge at 4°C (12000 g × 15 min) and transfer the supernatant to a new tube. Add 0.5 ml of isopropanol, mix gently, let stand at room temperature for 10 min, and centrifuge (4°C, 12000 g × 10 min). Discard the supernatant, add 75% ethanol to the precipitate, centrifuge at 4°C (7500 g × 5 min), and dissolve the RNA in DEPC-treated H2O (65°C for 10-15 min).
2.6. Sequencing. The extracted DNA and RNA samples were sequenced.

2.7. Clinical Information Collection.
This research group developed a comprehensive "Individual Basic Information Registration Form." Through face-to-face interviews, the medical staff in charge of the sample database collected basic patient information, including height, weight, age, gender, ethnicity, and education level, and personal history, including family history of tumors, smoking history, long-term medication history, and occupational exposure history. After the patient was discharged from the hospital, sample library staff collected further data and completed the "Medical Record Summary Table," which records tumor size, number, and location, TNM stage (according to the seventh edition of the TNM staging system), pathological grade (well, moderately, or poorly differentiated), and tumor marker results, as well as the length of hospital stay, operation time and method, whether adjuvant therapy was received after surgery, and its type and dose.
2.8. Follow-Up. The first follow-up was carried out 3 months after the patient was discharged. By telephone, the medical staff asked whether the patient had been rechecked after discharge, including imaging and serological examinations; whether the patient had been readmitted for radiotherapy, chemotherapy, or chemoradiotherapy; whether there had been recurrence or reoperation; and the patient's survival status and, where applicable, time of death. The "Lung Cancer Patient Follow-up Registration Form" was completed after each interview. Follow-up was repeated every 3 months until January 2018. Of the 480 enrolled patients, 40 were lost to follow-up, a loss-to-follow-up rate of 8.3%, which is below 10% and meets the research requirements.

2.9. Prognosis Analysis.
The collected clinical information and prognostic follow-up information were organized. Age, gender, smoking history, TNM stage, and similar information were converted into categorical variables; continuous variables were dichotomized at the median. Survival time was defined as the date of death or last follow-up minus the date of diagnosis, and recurrence time as the date of relapse or last follow-up minus the date of diagnosis, both in months. The organized data were saved in SPSS format for further analysis. The chi-square test and Fisher's exact test were used to analyze whether clinical factors such as age, gender, smoking history, and TNM stage differed between patient groups. The Cox proportional hazards regression model was then used to analyze the impact of these clinical indicators on overall survival (OS) and recurrence-free survival (RFS) and to calculate hazard ratios (HRs) for death and recurrence, 95% confidence intervals (CIs), and P values. In the Cox regression analysis, age, gender, smoking history, and TNM stage were mutually adjusted. A two-sided P < 0.05 was considered statistically significant. Kaplan-Meier survival curves and the log-rank test were used to assess whether overall survival time and recurrence-free survival time differed between patient groups.
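The variable preparation and Kaplan-Meier estimation described above can be sketched in a few lines of Python. This is an illustrative sketch with hypothetical function names, not the SPSS procedure the authors used; converting days to months with an average month length of 30.4375 days and coding values equal to the median as the lower category are both assumptions.

```python
from datetime import date
from statistics import median

# Assumption: months are computed from days using the mean Gregorian month length.
DAYS_PER_MONTH = 30.4375


def survival_months(diagnosis: date, end: date) -> float:
    """Months from diagnosis to death (or relapse) or last follow-up."""
    return (end - diagnosis).days / DAYS_PER_MONTH


def dichotomize_at_median(values):
    """Code values above the median as 1 and all others as 0 (tie rule assumed)."""
    cutoff = median(values)
    return [1 if v > cutoff else 0 for v in values]


def kaplan_meier(times, events):
    """Return a list of (time, S(time)) pairs at each distinct event time.

    events[i] is 1 if patient i died (or relapsed) at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations that share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censorings at t both leave the risk set afterwards.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    print(round(survival_months(date(2014, 1, 1), date(2016, 1, 1)), 2))
    print(dichotomize_at_median([60, 70, 65, 72]))
    print(kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0]))
```

For the example times, the estimated survival drops to about 0.8, 0.6, and 0.3 at months 5, 8, and 12; the patient censored at month 8 is still counted as at risk at that time, following the usual convention.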

Clinical Information and Overall Survival Analysis of NSCLC Patients.
This study included 480 patients with lung adenocarcinoma, with a mean age of 65.58 ± 8.98 years and a median survival time of 23.65 months. There were no significant differences in age, gender, or smoking status among these patients. See Table 1 for details.

Identification of Prognostic Genes of Lung Adenocarcinoma.
The Cox proportional hazards model was used to evaluate 260 candidate genes related to the prognosis of lung adenocarcinoma. Subsequent analysis yielded three prognostic genes (LDHA, SDHC, and TYMS), as shown in Table 3.
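In a Cox proportional hazards model, the hazard for a patient with covariates x is h(t | x) = h0(t) exp(βx), so each candidate marker's coefficient β translates into a hazard ratio HR = exp(β), a Wald 95% CI of exp(β ± 1.96 SE), and a two-sided Wald P value. The Python sketch below shows only this final step; fitting the model to obtain β and SE is assumed to have been done elsewhere, and the function names and example coefficient are hypothetical.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


def hazard_ratio_ci(beta: float, se: float, z: float = Z_95):
    """Return (HR, lower, upper) for a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided Wald P value for H0: beta = 0, i.e. 2 * (1 - Phi(|beta / se|))."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical coefficient and standard error for one candidate gene.
    hr, lower, upper = hazard_ratio_ci(0.5, 0.2)
    print(f"HR = {hr:.2f} (95% CI {lower:.2f} to {upper:.2f}), P = {wald_p_value(0.5, 0.2):.3f}")
```

With β = 0.5 and SE = 0.2, this gives an HR of about 1.65 (95% CI about 1.11 to 2.44) and P of about 0.012. Because the interval is symmetric on the log scale, it is asymmetric around the HR itself.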

Construction and Evaluation of Genetic Prognostic Index.
Based on the risk index calculated from the SNP loci and genes identified above, we divided all patients into a high-risk group (n = 170) and a low-risk group (n = 170). As shown in Figure 1, the overall survival rate of patients in the high-risk group was lower, and the log-rank test indicated that the difference in survival rates between the high-risk and low-risk groups was statistically significant (P < 0.05).
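The text does not give the formula of the prognostic index. A common construction, assumed here, is a linear score Σ βj xj over the selected markers (for example, SNP genotypes coded 0/1/2 and gene expression values), with patients above the median score labeled high risk; this is consistent with, but not confirmed by, the equal group sizes reported. The Python sketch below implements that assumed score together with a two-group log-rank test; the function names, coefficients, and data are hypothetical.

```python
import math
from statistics import median


def risk_scores(features, coefficients):
    """Linear prognostic index for each patient: sum of coefficient * marker value."""
    return [sum(b * x for b, x in zip(coefficients, row)) for row in features]


def split_by_median(scores):
    """Label patients 'high' if their score exceeds the median, otherwise 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def log_rank_test(times, events, groups, group_a="high"):
    """Two-group log-rank test. Returns (chi-square statistic, two-sided P value)."""
    records = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in records if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_groups = [g for ti, _, g in records if ti >= t]
        n = len(at_risk_groups)
        n_a = sum(1 for g in at_risk_groups if g == group_a)
        d = sum(1 for ti, e, _ in records if ti == t and e == 1)
        d_a = sum(1 for ti, e, g in records if ti == t and e == 1 and g == group_a)
        observed_a += d_a
        expected_a += d * n_a / n  # expected group-A events under H0
        if n > 1:
            # Hypergeometric variance of the group-A event count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; test is undefined")
    statistic = (observed_a - expected_a) ** 2 / variance
    # Chi-square with 1 degree of freedom: P = erfc(sqrt(statistic / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical data: two markers per patient and made-up coefficients.
    features = [[0, 1.2], [1, 0.4], [2, 2.5], [0, 0.1], [1, 1.8], [2, 0.9]]
    coefficients = [0.4, 0.7]
    groups = split_by_median(risk_scores(features, coefficients))
    times = [30, 24, 6, 36, 10, 15]
    events = [0, 1, 1, 0, 1, 1]
    print(groups)
    print(log_rank_test(times, events, groups))
```

Two groups with identical event times give a statistic of 0 and P = 1, while clearly separated groups give a small P value. In practice the coefficients would come from the fitted Cox model rather than being chosen by hand.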

Discussion
Many phase III clinical trials and meta-analyses have shown that all studied platinum-based doublet regimens have similar efficacy in the first-line treatment of advanced NSCLC: the response rate is about 15% to 30%, progression-free survival is about 4-6 months, and overall survival is about 8-10 months. Compared with platinum-based doublets, newer combination therapies do not further improve efficacy [13][14][15]. Although the efficacy of first-line chemotherapy has reached a plateau, the prognosis of patients with advanced NSCLC remains very poor. With recent developments in pharmacogenomics, chemotherapy regimens for patients with advanced NSCLC can be tailored according to the expression level or polymorphism of one or several genes, improving efficacy and reducing toxicity.
Because patients with advanced NSCLC are unable to undergo or unsuitable for surgery, radiotherapy or combined radiotherapy and chemotherapy is clinically recommended. Molecular targeted therapy, which exploits molecular and cellular differences between tumor cells and normal cells by using cell receptors, genes, and regulatory molecules as drug targets, has gradually become a new strategy in the clinical treatment of lung cancer; angiogenesis inhibitors and epidermal growth factor receptor inhibitors, for example, are already in clinical use. In addition, biological treatments, including tumor vaccines, cytokines, monoclonal antibodies, and gene therapy, have gradually been developed and translated into clinical applications [16][17][18].
In this study, genomic and transcriptomic data from 480 patients with lung adenocarcinoma were screened, and 4 SNPs and 3 genes significantly related to the prognosis of lung adenocarcinoma were identified and used to construct a prediction model. Using the genetic prognostic index calculated by the model, patients with lung adenocarcinoma were divided into high-risk and low-risk groups, and survival analysis showed a significant difference in survival between the groups. Previous evidence showed that variant-containing genotypes of rs1059292 in the 5′-flanking region of the CD98 gene were significantly associated with an increased risk of death in lung cancer [19]. In addition, Guo et al. found that rs995343 of the MCT2 gene was associated with poor survival in NSCLC patients [20]. Because there is no previous evidence on rs2013335 and rs8078328 in lung cancer, further investigation is required. As for the 3 genes identified in this study, previous studies reported that knockdown of LDHA, SDHC, or TYMS could impede lung cancer cell migration and invasion [21][22][23], suggesting that all three are correlated with the prognosis of lung cancer.
This study also has some limitations. It did not consider somatic mutations, structural variations, methylation, or other molecular alterations in lung adenocarcinoma, which will be analyzed in subsequent studies. It also did not examine the relationship between SNPs and EGFR mutations. After the genetic and clinical indexes were integrated, the HR did not increase significantly compared with either index alone, suggesting that the genetic and clinical indexes may influence each other. Nevertheless, the prediction model constructed from genomic and transcriptomic information can effectively identify lung adenocarcinoma patients with poor prognosis and high risk and, together with clinical prognostic factors, can predict patient prognosis, providing a basis for evaluating the prognostic risk of patients with lung adenocarcinoma.

Data Availability
The data used to support the findings of this study are included within the article.