Identification of nine mutant genes and establishment of three prediction models of organ tropism metastases of non‐small cell lung cancer

Abstract Background Most non‐small cell lung cancer (NSCLC) patients have metastases at the initial diagnosis. However, little is known about which factors are associated with these metastases. This study aims to identify more biomarkers associated with organ tropism metastasis of NSCLC and to establish models for predicting its metastatic organs. Methods We performed targeted next‐generation sequencing (NGS) to detect lung cancer‐related genes in 272 patients with primary advanced NSCLC from Northeast China. We used the Fisher test and multivariate logistic regression analysis to identify metastasis‐related gene mutations and to establish prediction models. Results Mutations of EGFR (p = 0.0003, OR = 2.554) (especially EGFR L858R [p = 0.02, OR = 2.009]), ATM (p = 0.008, OR = 11.032), and JAK2 (p = 0.009, OR = Inf) were positively correlated, and TP53 exon4mut (p = 0.001, OR = 0.173) was negatively correlated, with lung metastasis. Mutations of CSF1R (p = 0.01, OR = Inf), KIT (p = 0.03, OR = 4.746), MYC (p = 0.05, OR = 7.938), and ERBB2 (p = 0.02, OR = 2.666) were positively correlated with pleural dissemination. Mutations of TP53 (p = 0.01, OR = 0.417) were negatively correlated, while those of SMAD4 (p = 0.03, OR = 4.957) were positively correlated, with brain metastasis of NSCLC. Additionally, smoking history (p = 0.004, OR = 0.004) was negatively correlated with pleural dissemination of NSCLC. Furthermore, models for prediction of lung metastasis (AUC = 0.706), pleural dissemination (AUC = 0.651), and brain metastasis (AUC = 0.629) were established. Conclusion Taken together, this study revealed nine mutant genes and smoking history associated with organ tropism metastases of NSCLC and provided three models for the prediction of metastatic organs. These findings may help predict the organs to which NSCLC is likely to metastasize before metastasis develops.


| INTRODUCTION
Lung cancer is the malignant tumor with the second-highest incidence and the highest mortality in the world. 1 Across all stages, only 10% to 20% of lung cancer patients survive more than 5 years after diagnosis in most countries, owing to its fast progression and metastatic potency. 2 Metastases have been estimated to account for approximately 90% of all deaths caused by lung cancer. 3 The preferred metastatic organs of lung cancer include the brain, pleura, bone, liver, adrenal glands and lung. 4 Patients with metastases usually have shorter median overall survival and lower quality of life. The prevention and early diagnosis of metastases can help prolong survival and improve the quality of life of patients. 5 Therefore, it is of great significance to study the molecular mechanisms of metastases and their associated risk factors. Previous studies have revealed that clinical factors such as age, gender, primary site of the tumor, smoking history, and pathology are associated with metastases of lung cancer. [6][7][8][9] A series of gene mutations and related pathways associated with metastases have also been identified. For example, the Wingless and INT-1 (WNT) and mitogen-activated protein kinase (MAPK) signaling pathways and alterations of the EGFR, anaplastic lymphoma kinase (ALK), Kirsten rat sarcoma viral oncogene homolog (KRAS), matrix metalloproteinase (MMP), mesenchymal-epithelial transition (MET) and liver kinase B1 (LKB1) genes have been found to be involved in brain metastasis, 9 the WNT, MAPK and NF-kappaB (NFκB) signaling pathways in bone metastasis, and EGFR mutations in lung metastasis of lung cancer. 10 However, knowledge of the mutated genes and molecular mechanisms associated with lung cancer metastases is still limited, so further studies are needed to identify more key genes.
Clinical next-generation sequencing (NGS) assays include whole genome sequencing (WGS), whole exome sequencing (WES) and targeted next-generation sequencing. With the wide application of clinical NGS in testing cancer patients, an unprecedented amount of information about tumor-related genomic alterations is being accumulated, making it easier to identify more gene mutations associated with lung cancer metastasis. 11 Non-small cell lung cancer (NSCLC) is the most common form of lung cancer, and more than half of patients have distant metastases at the time of diagnosis. 12 The incidence of ipsilateral lung metastasis is estimated at over 60%, of contralateral lung metastasis at 26%-28%, of bone metastasis at 35%-43%, of liver metastasis at 18%-20%, and of adrenal metastasis at 21%-27%. 13 In this study, we identified nine mutant genes and a clinical factor that showed significant correlation with brain, lung, and pleural metastases of NSCLC and established three models for prediction of these metastases, respectively.

| Patients
This retrospective cohort consisted of 272 patients diagnosed with advanced non-small cell lung cancer who were admitted to the Department of Oncology, Liaoning Cancer Hospital and Institute from 2016 to 2020. Essential information, including age, smoking history and gender, was collected for all patients. The pathological types, primary sites, and metastatic sites were then determined by imaging and pathological examination. After diagnosis, the genotypes of 63 lung cancer-associated genes in tumor tissues from the primary sites were determined using targeted NGS.
This study was approved by the Medical Ethics Committee of Liaoning Cancer Hospital and Institute, and signed informed consent was collected from all patients.

| Targeted NGS
DNA was extracted from paraffin-embedded tissues of the primary site using the AllPrep DNA/RNA Mini Kit (Qiagen 80204) according to the manufacturer's instructions. The gDNA libraries were constructed with a KAPA Hyper Prep kit (Kapa Biosystems) according to the manufacturer's protocols. A panel of 63 lung cancer-related genes was used to enrich the gDNA libraries. The custom-designed capture probes of the panel were manufactured by Agilent, USA. The enriched gDNA libraries were amplified with P5/P7 primers, qualified on the 2200 Bioanalyzer (Agilent Technologies), quantified with the Qubit (version 3), and sequenced on the HiSeq X10 platform (Illumina). The NGS data were first processed using trimmomatic-0.36. Sequence reads were aligned against the human reference genome (version GRCh37/hg19) 14 using bwa (version 0.7.10). Samtools (version 1.3.1) and pindel (version 0.2.5b8, 20151210) were used to identify candidate somatic mutations in the targeted regions.

Finally, alignment and sequencing artifacts were filtered using IGV (Integrative Genomics Viewer).

KEYWORDS
gene mutations, NGS, NSCLC, organ tropism metastases, prediction model, targeted therapy

| Statistical analyses
Fisher tests and multiple logistic regressions were performed using the fisher.test and glm functions of R version 4.0.4, respectively. A p-value < 0.05 was considered statistically significant and a p-value < 0.01 statistically very significant. An odds ratio (OR) > 1 indicates a higher risk of metastasis and an OR < 1 a lower risk. The logistic regression models and receiver operating characteristic (ROC) curves were constructed with the pROC package in R version 4.0.4 using our own data and published data from other studies. [15][16][17][18][19] Each point on the ROC curve corresponds to a threshold on the predicted probability of metastasis of a sample. The sensitivity and specificity of each model were determined at an optimized threshold value chosen by Youden's index.
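The univariate test described above can be illustrated with a minimal Python sketch of a two-sided Fisher exact test on a 2x2 table (mutated versus wild-type, metastasis versus no metastasis). This is not the authors' R code: the function names and the table layout are assumptions made for illustration, and the odds ratio shown is the simple cross-product ratio, whereas R's fisher.test reports a conditional maximum likelihood estimate that can differ slightly. The sketch also shows why an OR of Inf appears when no mutated patient falls in the non-metastatic group.

```python
from math import comb, inf

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a = mutated with metastasis, b = mutated without,
    c = wild-type with metastasis, d = wild-type without.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated patients with metastasis
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def sample_odds_ratio(a, b, c, d):
    """Cross-product odds ratio; infinite when b or c is zero."""
    if b * c == 0:
        return inf if a * d > 0 else float("nan")
    return (a * d) / (b * c)

# Hypothetical counts, not taken from the study
print(fisher_exact_two_sided(3, 1, 1, 3), sample_odds_ratio(3, 1, 1, 3))
print(sample_odds_ratio(5, 0, 20, 25))  # no mutated patient without metastasis
```

Using only the standard library keeps the sketch self-contained; for real analyses an established implementation would be preferable.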

| Characteristics of patients
The age range of the 272 enrolled NSCLC patients was 28-97, with a median age of 63. There were 143 (52.57%) male and 129 (47.43%) female patients. One hundred and eight (39.71%) patients had a history of smoking and 159 (58.46%) did not. Histopathological examination showed that the tumors of 210 (77.21%) patients were adenocarcinoma, of 28 (10.29%) squamous cell carcinoma and of 34 (12.5%) other types of NSCLC. The proportion of squamous cell carcinoma patients is lower than usual in this cohort because they lack gene mutations targeted by drugs, which may make statistical analyses based on pathological type unreliable. The primary site of the tumor was in the right lung in 104 (38.24%) patients and in the left lung in 164 (60.29%). Metastases had developed in most of the patients at the initial diagnosis. The tumors of 227 (83.46%) patients were in stage IV and of 45 (16.54%) in stage III. Lung metastasis was detected in 109 (40.07%), pleural dissemination (pleural effusion, pericardial effusion and nodes in the pleura) in 93 (34.19%), brain metastasis in 48 (17.65%), bone metastasis in 78 (28.68%), and liver metastasis in 25 (9.19%) patients (Table 1).

| Prediction models for organ tropism metastases of NSCLC
To further determine the association of the factors identified by univariate analyses with NSCLC metastases, all of them were then included in multivariate logistic analyses. The results confirmed that mutations of EGFR (p = 0.0008, OR = 2.584) and of ATM (p = 0.04, OR = 12.598) were positively correlated, while TP53 exon4mut (p = 0.004, OR = 0.154) and RET (p = 0.07, OR = 0.109) were negatively correlated, with lung metastasis. Mutations of ALK were positively correlated with lung metastasis, but the correlation was not statistically significant (p = 0.08). Additionally, we determined that mutations of KIT (p = 0.02, OR = 6.076) and TP53 exon4mut (p = 0.01, OR = 2.956) were positively and smoking history (p = 0.008, OR = 0.416) was negatively correlated with pleural dissemination. Furthermore, TP53 (p = 0.02, OR = 5.667) was significantly negatively correlated with brain metastasis, while SMAD4 (p = 0.02, OR = 6.076) was significantly positively correlated with it (Table 3). Based on the multivariate logistic analyses, the significant variates were combined to construct three logistic models for prediction of lung metastasis, pleural dissemination and brain metastasis, respectively. The AUC (area under the curve) of the lung metastasis prediction model was 0.706, of the pleural dissemination model 0.651 and of the brain metastasis model 0.629 (Figure 3, Table 4). We then selected the cutoff threshold according to the highest Youden index. The cutoff threshold of the lung metastasis prediction model is 0.484, with a sensitivity of 63.30% and a specificity of 69.94% (Table 4). The thresholds of the pleural dissemination and brain metastasis prediction models are 0.277 (sensitivity of 78.49%, specificity of 42.34%) and 0.19 (sensitivity of 54.17%, specificity of 70.09%), respectively (Figure 3, Table 4).
Note: Red asterisk '*' indicates positive correlation; double red asterisk '**', significant positive correlation; blue asterisk '*', negative correlation; double blue asterisk '**', significant negative correlation. OR is short for odds ratio and CI is short for confidence interval.
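The threshold selection reported here can be sketched in a few lines of Python: compute sensitivity and specificity at each candidate threshold, pick the one that maximizes the Youden index J = sensitivity + specificity - 1, and compute the AUC as the probability that a patient with metastasis receives a higher score than one without. This is an illustrative sketch, not the pROC implementation: the function names and toy data are hypothetical, and observed scores are used directly as candidate thresholds, which is a simplifying assumption. For reference, the reported lung model cutoff corresponds to J = 0.6330 + 0.6994 - 1 = 0.3324.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    A sample is called positive when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points

def youden_cutoff(labels, scores):
    """Pick the point maximizing J = sensitivity + specificity - 1."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)

def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities; 1 = metastasis, 0 = no metastasis
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
threshold, sensitivity, specificity = youden_cutoff(labels, scores)
print(threshold, sensitivity, specificity, auc(labels, scores))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a cohort of a few hundred patients but would be replaced by a rank-based computation for larger data.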

| DISCUSSION
Most NSCLC patients are at advanced stages and have developed metastases at the initial diagnosis. It has been hypothesized that cancer metastasis involves seven steps that correlate with cancer-cell surface proteins, cytokines, chemokines and growth factors in the tumor microenvironment. 19 Additionally, the same authors proposed that the CXCL12/CXCR4 signaling pathway, EIF4EBP1, EGFR, ERBB2 and VEGFR2 are involved in lung cancer brain metastasis, L1CAM-mediated ERK1/2 signaling, EGFR and KRAS in lung cancer bone metastasis, and type-1 insulin-like growth factor receptor (IGF-1R) in lung cancer liver metastasis. 19 In this study we detected the somatic mutations of 272 NSCLC patients (stage III or IV) using targeted NGS. Although the frequency of mutation of ALK (15.07% vs. 5%-10%) and ERBB2 (11.40% vs. 2%-4%) was higher than previously reported, 20 the frequency of ALK fusions (5.15% vs. 7.8%) was not. 21 Through subsequent univariate analyses we identified variates, including nine mutant genes and smoking history, that significantly correlated with the organ tropism metastases of NSCLC. Then we established three models for prediction of lung metastasis (EGFR + TP53 exon4mut + RET + ATM + ALK), pleural dissemination (Smoking + TP53 exon4mut + KIT) and brain metastasis (TP53 + SMAD4), respectively. The AUCs of the three prediction models are all greater than 0.6. Both the specificity and the sensitivity of the lung metastasis prediction model are greater than 60%, and the sensitivity of the pleural dissemination prediction model and the specificity of the brain metastasis model are greater than 70%. These characteristics of the models suggest that a comprehensive assessment of gene mutations and other risk factors will be of great value in predicting organ tropism metastases of NSCLC and that the corresponding organs should be examined during follow-up of patients with these variates.
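To make the structure of such a model concrete, the following Python sketch shows how a logistic model turns a patient's mutation status into a predicted probability that is then compared with a cutoff. In a logistic regression each coefficient is the natural logarithm of the corresponding odds ratio, so the multivariate ORs reported for lung metastasis are converted with log. Several parts are assumptions for illustration only: the intercept is not reported in the text and the value used here is hypothetical, ALK is left out because its OR is not given in the text, and the function and variable names are invented for this sketch.

```python
from math import exp, log

# Multivariate odds ratios reported in the text for lung metastasis.
# ALK belongs to the published model, but its OR is not given in the text,
# so it is omitted from this sketch.
LUNG_ODDS_RATIOS = {
    "EGFR": 2.584,
    "ATM": 12.598,
    "TP53_exon4mut": 0.154,
    "RET": 0.109,
}
HYPOTHETICAL_INTERCEPT = -0.8  # assumption: the fitted intercept is not reported
LUNG_CUTOFF = 0.484            # cutoff threshold reported for the lung model

def predicted_probability(mutated_genes, odds_ratios, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of ln(OR) over mutated genes)))."""
    logit = intercept + sum(log(odds_ratios[g]) for g in mutated_genes)
    return 1.0 / (1.0 + exp(-logit))

def predicts_metastasis(mutated_genes):
    """Compare the predicted probability with the reported cutoff."""
    p = predicted_probability(mutated_genes, LUNG_ODDS_RATIOS, HYPOTHETICAL_INTERCEPT)
    return p >= LUNG_CUTOFF

print(predicts_metastasis(["EGFR"]))           # EGFR-mutated patient
print(predicts_metastasis(["TP53_exon4mut"]))  # TP53 exon4-mutated patient
```

With a different intercept the predicted probabilities shift, so the printed classifications depend on the hypothetical value chosen here and should not be read as the published model's output.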
The lung is a major site of metastasis formation for a variety of malignancies such as breast and colon cancer. Previous studies showed that mutations of KRAS and BRAF were positively correlated with lung metastasis of colorectal cancer and papillary thyroid carcinoma, respectively. 22,23 We found that mutations of EGFR were positively correlated with lung metastasis of NSCLC, which is consistent with previous studies. 24 Additionally, EGFR L858R exhibited a significant positive correlation with lung metastasis while EGFR exon19del did not, which might provide a new way to explain the poorer prognosis of EGFR L858R. Besides, ATM mutations suggested an increase, and TP53 exon4mut a decrease, in the risk of lung metastasis of NSCLC. Olaparib has a significant effect on both castration-resistant prostate cancer (CRPC) and ovarian cancer patients with ATM mutation, 25,26 and a multi-center clinical trial called ORION has been carried out for olaparib in the treatment of advanced NSCLC. In addition, a recent study showed that a metastatic colorectal cancer patient with an ATM loss-of-function mutation benefited from monotherapy with the ATR inhibitor M6620 (VX-970). 27 We can speculate that there will be drugs targeting ATM to treat NSCLC in the future.
So far, only a few mutant genes have been revealed to be associated with pleural dissemination. Guo et al. (2016) reported that some mutated genotypes of EGFR had a positive correlation with pleural dissemination. 13 Our results showed that mutations of TP53 exon4mut, KIT and MYC were positively correlated with pleural dissemination of NSCLC. KIT is a gene encoding a receptor tyrosine kinase, with mutations occurring in about 75%-80% of gastrointestinal stromal tumors (GIST), 9.5% of melanomas, and 3% of lung cancers. Several TKIs such as imatinib, sunitinib and nilotinib were identified to have specificity for KIT. 28 MYC is an oncogene that participates in the regulation of 20% of cancers, so developing drugs that target MYC and its related pathways is a hot topic. 29 However, there remains no drug directly targeting MYC at present. Pleural dissemination of lung cancer usually leads to malignant pleural effusion, which can shorten the survival of patients. Our findings showed that KIT, MYC, and TP53 exon4mut mutations may suggest a high risk of pleural dissemination. Thus, applying drugs that target them or their associated pathways in the treatment of advanced NSCLC might prolong the survival of patients by reducing the risk of pleural invasion. Besides, we found that smoking history was negatively correlated with pleural dissemination, which is consistent with previous studies, 30 but smoking should not be promoted because it can increase the risk of multiple cancers. 31
Lung cancer is prone to brain metastasis and it is estimated that advanced lung cancer contributes to almost half of brain metastasis patients. 32 Many factors, including pathology, age, level of tumor markers (neuron-specific enolase (NSE) and carcinoembryonic antigen (CEA)) and tumor-associated gene mutations, have been identified to be associated with brain metastasis. Previous research showed that activation of the Ras, Wnt, and PIK3A pathways promotes brain metastasis. 32 Alterations of multiple genes such as EGFR, ALK, LKB1, KRAS, HOXB9, LEF1, ANGPT4, PDGFRB, YAP1, and MMP13 have been found to be associated with brain metastasis of NSCLC. 9 Herein, our results showed that mutations of SMAD4 were positively correlated with brain metastasis while mutations of TP53 were negatively correlated. SMAD4 is a transcription factor in the TGF-β/BMP-SMAD4 signaling pathway and participates in the regulation of tissue homeostasis, embryonic development, epithelial-to-mesenchymal transition (EMT) and extracellular matrix remodeling.
The mutation frequency of SMAD4 is estimated at about 50% in pancreatic cancer and about 30% in colon cancer, whereas the frequencies in prostate, breast, liver and lung cancer are lower. 9,33,34 Functional SMAD4 is a tumor suppressor and its inactivation promotes lung cancer metastasis through de-repression of PAK3 by miRNA regulation. 35 However, no studies had shown the contribution of SMAD4 mutation to brain metastasis of lung cancer prior to our research. Although the mutation frequency of SMAD4 is high in pancreatic and colon cancer, there are still no drugs targeting SMAD4. Previous studies have shown that both EGFR exon 19 del and EGFR L858R mutations can increase the risk of brain metastasis of NSCLC. 24,36,37 We also found a positive correlation between mutations of EGFR and the occurrence of brain metastasis, but it was not statistically significant here (p = 0.11). To confirm this correlation, mechanistic studies and surveys with larger sample sizes need to be conducted in the future.
TP53, one of the most frequently mutated tumor suppressor genes in human cancers, encodes the transcription factor p53, which contains the tetramerization motif, the transactivation motif and the DNA-binding domain. 38,39 The frequency of simultaneous mutations of TP53 with EGFR or KRAS is higher than 5% in both Western and Asian lung adenocarcinoma patients, while it is significantly lower in lung squamous cell carcinoma patients of the two major groups. 40 Most of the somatic missense mutations in cancers occur within the DNA-binding domain of p53. 41 Germline TP53 mutations account for about 70% of families with Li-Fraumeni syndrome, which is associated with hereditary predisposition to several cancers including lung adenocarcinoma. 38,42 TP53 p.R175 and p.R248 were identified as the germline mutated sites with the highest variant rates in Chinese tumor patients with Li-Fraumeni syndrome or Li-Fraumeni-like syndrome. 38 So far no site has been determined to be a founder mutation for Chinese tumor patients, while TP53 R337H was identified as a founder mutation in Brazilian patients with adrenocortical tumors. 42 In addition to the initiation and progression of cancers, TP53 mutations also promote metastases by facilitating faster proliferation and evolution of tumors as well as mutations and expression changes of genes related to metastases. 39 Herein, we found that TP53 exon4mut was negatively correlated with lung metastasis but positively with pleural dissemination, and that TP53 exon6mut was positively correlated with bone metastasis. Why TP53 exon4mut is associated with a lower risk of lung metastasis of NSCLC is an interesting issue worthy of further study.

| CONCLUSION
Our study has identified a set of potential predictors and established three prediction models of organ tropism metastases of NSCLC. Although the mechanisms of their involvement in organ tropism metastases need further study, these potential biomarkers could be used as early warning signals to help prevent the occurrence of metastases and could guide targeted therapy of NSCLC in the future.

ACKNOWLEDGMENTS
We thank Junhui Yang and Yingfei Shi of Genetron Health (Beijing) for their help in bioinformatics analysis.

CONFLICTS OF INTEREST
The authors declare that they have no competing interests.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
Approval for the study was obtained from the Institutional Review Board/Ethics Committee of the Medical Ethics Committee of Liaoning Cancer Hospital and Institute and signed informed consents were collected from all patients.

TABLE 4 Sensitivity and specificity at the respective cutoff threshold of models
Note: These models correspond to those in Figure 3.