Prognostic Factors Analysis and Nomogram Construction of Dual Primary Lung Cancer: A Population Study

As a special type of lung cancer, multiple primary lung cancer (MPLC) has unique biological characteristics, and research on it remains limited. The aim of our research was to identify prognostic factors and construct a prognostic nomogram for dual primary lung cancer (DPLC). A population cohort study of patients with DPLC was conducted using data extracted from the Surveillance, Epidemiology, and End Results (SEER) database. Relevant survival variables were identified using the Cox proportional hazard model. A prognostic nomogram was constructed, and its predictive performance was validated in the modeling and validation cohorts. Additionally, propensity score matching (PSM) was applied to evaluate whether surgery affected the overall survival (OS) of this study population. A total of 5411 eligible DPLC patients were included in the study cohort, with a 3-year OS rate of 41.0% and a 5-year OS rate of 27.7%. Age, sex, race, grade, stage, lymph node (LN) metastasis, histological type, primary site, and surgery were identified as prognostic factors of OS. The C-indexes of the established nomogram were 0.70 (95% CI (0.69, 0.71)) in the modeling group and 0.70 (95% CI (0.68, 0.72)) in the validation group, indicating good model discrimination. The AUC and calibration plots for 3- and 5-year OS also supported the good performance of the established nomogram. After 1 : 1 PSM, surgery was associated with a lower risk of death (HR = 0.63, 95% CI: 0.56–0.72) in DPLC patients. A prognostic nomogram with reliable performance was developed to predict 3- and 5-year OS rates, which could help clinicians make more reasonable survival predictions for DPLC patients. For patients without absolute surgical contraindications, surgery should be actively considered.


Introduction
Since 1975, when Martini and Melamed proposed the diagnostic criteria [1] for multiple primary lung cancer (MPLC) to distinguish it from intrapulmonary metastasis, reports of MPLC have been increasing. Despite the continuous improvement of medical technology and systematic treatment of lung cancer, the survival prognosis of patients with MPLC is still not optimistic. Accurate prediction of the survival rate of patients with MPLC is of great significance for clinical treatment decisions. Many clinical factors have been reported to be related to the survival prognosis of MPLC [2][3][4][5], such as age, sex, lymph nodes, tumor stage, and histological type. So far, however, there is still no clear conclusion on the prognostic factors of MPLC patients, and few large multicenter studies have evaluated them in order to make reasonable and accurate predictions of survival prognosis. Although the TNM staging system is the most commonly used method to determine prognosis, it has limitations, and the survival time of patients with the same histological type and the same TNM stage still varies greatly. In addition, as a special type of lung cancer, MPLC has special biological characteristics, and the commonly used TNM staging standard is not suitable for selecting MPLC treatment or judging prognosis. Therefore, a more refined method is required to predict the survival of MPLC patients. A nomogram is a good choice for this purpose. In recent years, nomograms have been widely used to evaluate the prognosis of patients with cancer because they can include various prognostic factors, quantify the effects of these factors on survival prognosis, and visualize the results so as to predict the survival rate of patients [6][7][8]. In this study, we selected dual primary lung cancer (DPLC) patients as the research subjects because the vast majority of MPLC cases are DPLC.
We analyzed the available patient data in the Surveillance, Epidemiology, and End Results (SEER) database. The Cox proportional hazard model was used to identify prognostic factors and develop a prognostic nomogram, establishing a relatively systematic evaluation system to predict the 3-year and 5-year overall survival (OS) rates of patients with DPLC. Propensity score matching (PSM) is considered more suitable for nonrandomized controlled studies because it can reduce potential selection bias [9]. We therefore also used PSM to evaluate the impact of surgery on OS in DPLC patients.

Data Source.
The SEER (Surveillance, Epidemiology, and End Results) database is the authoritative cancer statistics database in the United States, which records the morbidity, mortality, and disease status of millions of patients with malignant tumors in some states and counties (18 registration  in the study cohort must include complete data in the abovementioned variables. The patient selection process is summarized in Figure S1.

Neoplastic Grade and Stage.
For each patient, the grade with poorer differentiation and the later stage of the two tumors (FPLC and SPLC) were taken as the patient's final grade and stage. Grades included I, II, III, IV, and unknown, and all patients were staged as IA, IB, IIA, IIB, IIIA, IIIB, or IV in this study.
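The rule above (keep the worse of the two tumors) can be sketched in a few lines of Python. This is only an illustration: the function name `worse_of` is hypothetical, and the handling of an "unknown" value (fall back to the known tumor, or stay "unknown" if both are unknown) is an assumption, since the text does not state how unknown grades were combined.

```python
# Ordered from best to worst prognosis, as listed in the text.
GRADE_ORDER = ["I", "II", "III", "IV"]
STAGE_ORDER = ["IA", "IB", "IIA", "IIB", "IIIA", "IIIB", "IV"]


def worse_of(first, second, order, unknown="unknown"):
    """Return the worse (later in `order`) of two values.

    Assumption: a value not found in `order` (e.g. "unknown") is ignored
    when the other tumor has a known value.
    """
    known = [value for value in (first, second) if value in order]
    if not known:
        return unknown
    return max(known, key=order.index)


# Example: FPLC is stage IA / grade III, SPLC is stage IIB / grade unknown.
final_stage = worse_of("IA", "IIB", STAGE_ORDER)
final_grade = worse_of("III", "unknown", GRADE_ORDER)
```

With these inputs the patient would be analyzed as stage IIB and grade III.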

Statistical Analysis.
In our study, OS was measured from the diagnosis of SPLC. OS rates for all variables were calculated using the Kaplan-Meier method in SPSS version 23.0 (IBM SPSS Inc., Chicago, IL, USA). Simple random sampling was performed with the sample() function in R version 3.6.0, and patients were randomly divided into the modeling and validation groups at a ratio of 7 to 3, as shown in Table 1. All variables in the modeling group were considered in univariate and multivariate survival analyses using the Cox proportional hazard model. We also used the proportional hazards model to estimate OS hazard ratios for candidate prognostic factors, which included age, sex, race, grade, histological type, primary site, interval (months since FPLC), AJCC stage, AJCC N, surgery, radiation, and chemotherapy. To reduce the interference of confounding factors and improve the predictive accuracy of the nomogram, univariate survival analysis was performed for all variables, followed by multivariate analysis of the statistically significant variables (p < 0.05). We then used all relevant independent prognostic factors of OS to construct a prognostic nomogram for 3- and 5-year OS. The nomogram was developed with the "rms" package. The AUC and C-index were used to evaluate the predictive value of the established nomogram. The C-index ranges from 0.5 to 1.0, and the higher the C-index, the higher the predictive value [10]. Moreover, the performance of the prognostic nomogram was also assessed through internal validation (in the modeling and validation groups). Bootstrapping with 1000 resamples was adopted to decrease overfitting bias. The significance level was α = 0.05. In an ideal calibration curve, the predicted value equals the actual observed value, so the curve lies close to the ideal 45° diagonal line.
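To make the C-index concrete, the following minimal Python sketch computes one common definition (Harrell's concordance index) for right-censored data. It is not the "rms" implementation used in this study; the function name and the simplified handling of tied survival times (such pairs are skipped) are assumptions made for illustration.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the model ranks correctly.

    times:       follow-up time of each patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, higher means higher predicted risk

    A pair (i, j) is comparable when patient i died and did so strictly
    before patient j's follow-up time. Pairs with tied times are skipped,
    which is a simplification.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier death
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not from SEER): the third patient is censored at 15 months.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
perfect = harrell_c_index(toy_times, toy_events, [0.9, 0.7, 0.4, 0.1])
uninformative = harrell_c_index(toy_times, toy_events, [0.5, 0.5, 0.5, 0.5])
```

A model that ranks every comparable pair correctly scores 1.0, while a model that assigns the same risk to everyone scores 0.5, which is why 0.5 is the lower reference value quoted above.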
Finally, PSM was used to minimize the substantial differences in clinical characteristics between the two groups (no surgery group and surgery group), so that the effect of surgery on OS in patients with DPLC could be evaluated more reliably. Propensity scores were calculated in R (version 3.6.0) with the "MatchIt" package. A 1 : 1 matched analysis was performed using the nearest neighbor method with a caliper of 0.05.
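The matching step can be illustrated with a short Python sketch of greedy 1 : 1 nearest neighbor matching without replacement. It is not the "MatchIt" algorithm itself: the function name, the processing order (highest score first), and the use of the caliper as a raw difference in propensity score are assumptions; matching packages may instead scale the caliper by the standard deviation of the score. The propensity scores are taken as given (for example, from a logistic regression of surgery on the covariates).

```python
def match_nearest(treated, control, caliper=0.05):
    """Greedy 1:1 nearest neighbor matching without replacement.

    treated, control: dicts mapping patient id -> propensity score.
    Returns a list of (treated_id, control_id) pairs whose scores differ
    by at most `caliper` (raw score units, an assumption of this sketch).
    """
    available = dict(control)
    pairs = []
    # Assumption: treated patients are processed from highest score down.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Toy scores (not from SEER).
surgery = {"t1": 0.80, "t2": 0.50, "t3": 0.20}
no_surgery = {"c1": 0.78, "c2": 0.52, "c3": 0.40, "c4": 0.90}
matched = match_nearest(surgery, no_surgery)
```

In this toy example "t3" stays unmatched because its closest remaining control differs by 0.20, which exceeds the caliper. Discarding such patients is what makes the matched groups comparable, at the cost of a smaller sample.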

Clinical and Pathological Characteristics of All Patients.
We identified 5411 DPLC patients diagnosed from 2004 to 2015. Their clinicopathological features of FPLC and SPLC  Table 3. As the table shows, 1000 patients (18.5%) had adenocarcinoma in both FPLC and SPLC (named "Aden-aden"), and their 3- and 5-year OS rates were 47.4% and 34.9%, respectively. Most patients (65.7%) in this study had two primary tumors (FPLC and SPLC) on different sides, with a 3-year OS rate of 42.0% and a 5-year OS rate of 28.4%. Stage IA patients accounted for 16.7% of this study population and had higher 3- and 5-year OS rates than patients in other stages (p < 0.001). In addition, 1866 patients received surgical treatment for both FPLC and SPLC (named "Yes-Yes") and 1152 patients received surgery for neither (named "No-No"), accounting for 34.5% and 21.3%, respectively. Surgery was associated with higher OS (p < 0.001).

Analysis of Factors Influencing Survival and Prognosis.
Univariate and multivariate Cox analyses were performed on the 3791 patients in the modeling group, and the results showed that age at diagnosis, gender, race, neoplastic grade, stage, LN metastasis, histological type, location, and surgery were closely related to the survival prognosis of DPLC patients (p < 0.05), as shown in Table 4. To illustrate the relationship between independent risk factors and survival time more clearly, all 5411 DPLC patients were analyzed and survival curves were drawn using the Kaplan-Meier method (Figure 1). Table 4 and Figure 1 show that older age, worse grade, and later stage (except IIIA and IIIB) were linked to worse prognosis. The prognosis of black patients did not differ significantly from that of white patients, and the prognosis of patients of other races was better than that of white patients (p < 0.001). In addition, the prognosis of male patients was worse than that of female patients (p < 0.001), and the prognosis of patients with lymph node metastasis was worse than that of patients without metastasis (p < 0.001). Compared to patients with squamous cell carcinoma in both FPLC and SPLC, patients with adenocarcinoma in both FPLC and SPLC had better prognosis (p = 0.002). The survival prognosis of patients who did not receive surgical treatment for either FPLC or SPLC was worse than that of patients who received surgical treatment for FPLC, SPLC, or both (p < 0.001).

The Predictive Effect of Nomogram on Overall Survival.
The nomogram included all statistically significant prognostic factors in the Cox proportional hazard model: age, sex, race, neoplastic grade, histological type, primary site, stage, LN metastasis, and surgery. The prediction results for 3- and 5-year OS rates are shown in Figure 2. For each feature, the patient's category is projected upward to the points axis to obtain the score of that item. The total points are calculated by adding all the item scores, and the survival rate of the patient is read off by projecting the total points downward. The higher the score, the worse the survival prognosis. This nomogram can be used to predict the survival rate of individual patients according to their own conditions, so as to improve the efficiency and accuracy of the prediction. In this study, the established nomogram was verified with the bootstrap method. The number of bootstrap resamples B was 1000, and the validation results showed that the C-indexes of the modeling and validation groups were 0.70 (95% CI (0.69, 0.71)) and 0.70 (95% CI (0.68, 0.72)), respectively, both of which indicated good predictive value. The ROC curves and AUC also supported this conclusion (Table S1 and Figure S2). Additionally, the calibration curves for 3- and 5-year OS rates in the modeling and validation groups are shown in Figure 3. The calibration curves of both groups were close to the ideal 45° dotted line, indicating good consistency between the predicted and actual observed values.
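The point-and-project procedure described above corresponds to a simple calculation on the underlying Cox model, sketched below in Python. All names are hypothetical, and the log hazard ratios are made-up illustrative values, not the estimates in Table 4 or the scales in Figure 2. The sketch assumes categorical predictors and the usual convention that the variable with the widest effect spans 0 to 100 points.

```python
import math

# Hypothetical log hazard ratios relative to each reference level.
# Illustrative values only; they are NOT the estimates from this study.
LOG_HR = {
    "surgery": {"yes": 0.0, "no": 0.6},
    "sex": {"female": 0.0, "male": 0.3},
}


def points_table(log_hr, max_points=100.0):
    """Rescale log hazard ratios so the widest variable spans 0..max_points."""
    widest = max(max(levels.values()) - min(levels.values())
                 for levels in log_hr.values())
    table = {}
    for var, levels in log_hr.items():
        lowest = min(levels.values())
        table[var] = {level: (beta - lowest) / widest * max_points
                      for level, beta in levels.items()}
    return table


def total_points(table, patient):
    """Add up the points of the patient's category on every variable."""
    return sum(table[var][level] for var, level in patient.items())


def survival_probability(baseline_survival, linear_predictor):
    """Cox model relation: S(t | x) = S0(t) ** exp(linear predictor)."""
    return baseline_survival ** math.exp(linear_predictor)


table = points_table(LOG_HR)
score = total_points(table, {"surgery": "no", "sex": "male"})
```

Because the total points are a linear rescaling of the Cox linear predictor, the bottom axes of a nomogram are just the function `survival_probability` evaluated at the baseline 3- or 5-year survival, which is why a higher total score always maps to a lower predicted OS.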

Propensity Score Matching of Surgery in This Study Population.
In this study, surgery was regarded as a prognostic factor of OS, and the survival prognosis of patients who did not receive surgical treatment for either FPLC or SPLC was worse than that of patients who received surgical treatment for FPLC, SPLC, or both (Table 4 and Figure 1). However, there were significant differences in some variables between the patients with surgery and the patients without surgery in the study cohort, including race, histological type, interval, grade, stage, LN metastasis, radiation, and chemotherapy (Table S2). PSM was therefore used to reduce the differences in these variables between the two groups so as to better evaluate the effect of surgery. The matching effect of the method can be seen in Table S2 and Figure S3. We found 666 paired DPLC patients with nearly balanced variables after 1 : 1 PSM (Table S2).
Before PSM, the median survival time of the patients who received surgical treatment was significantly longer than that of patients without surgery (30 months vs. 15 months), which was consistent with the results after PSM (22 months vs. 15 months). In addition, before PSM, the 3-year OS rates in the surgery and no surgery groups were 45.3% and 24.3%, respectively (p < 0.001). After PSM, the 3-year OS rate was 37.7% in the surgery group and 24.5% in the no surgery group (p < 0.001). Surgery was associated with a lower risk of death in DPLC patients both before (HR = 0.53, 95% CI: 0.49-0.58) and after (HR = 0.63, 95% CI: 0.56-0.72) PSM (Figure 4).
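Median survival times and 3-year OS rates such as those above are read from Kaplan-Meier curves. The following minimal Python sketch shows the estimator on toy data; it is an illustration with hypothetical function names, not the SPSS procedure used in this study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times:  follow-up time of each patient (e.g. months since SPLC diagnosis)
    events: 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Toy data (not from SEER): one patient is censored at 4 months.
toy_curve = kaplan_meier([2, 4, 4, 6, 8], [1, 1, 0, 1, 1])
toy_median = median_survival(toy_curve)
```

Censored patients contribute to the number at risk until they leave follow-up but never count as deaths, which is how the method uses incomplete follow-up without biasing the survival estimate downward.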

Discussion
With the rapid development of medical technology, the improvement of living standards, and the longer survival of lung cancer patients, the detection rate of MPLC continues to increase. Thakur and his colleagues observed that SPLC occurred in 3% of 156,494 patients with primary lung cancers, and the incidence of SPLC was 1.10% per year. The risk did not stabilize over time. The study also found that patients with a history of lung cancer had a higher risk of developing new primary lung cancer than the general population [11]. In recent years, more and more attention has been paid to the survival prognosis of MPLC. However, no large studies have evaluated prognostic factors and constructed a prognostic nomogram for DPLC. In our study, univariate and multivariate Cox regression analyses showed that age at diagnosis, gender, race, neoplastic grade, stage, LN metastasis, histological type, tumor location, and surgery were closely associated with OS of DPLC, as shown in Table 4 and Figure 1. Across all DPLC patients in the study cohort, the 3-year OS rate was 41.0% and the 5-year OS rate was 27.7%. The survival rate was lower than in other studies [12][13][14]. There were several reasons for this difference. Firstly, the cases included in those studies were all surgically treated patients, and surgery can significantly improve the survival rate of DPLC patients, as our study and others concluded [15][16][17][18]. Secondly, the starting point for calculating survival time in those studies was different; in our study, survival time was measured from the diagnosis of SPLC. Thirdly, those studies were small-sample, single-center retrospective studies with obvious selection bias.
It is worth noting that there are still no clear treatment guidelines and plans for MPLC. At present, it is generally agreed that surgical treatment is the first choice for MPLC, and other treatment methods can be combined for lesions that cannot be completely resected. To reduce the influence of confounding factors, the Cox proportional hazard model and PSM were used to evaluate the impact of surgery on the survival of DPLC patients, and the results showed that surgery can improve the long-term survival of DPLC patients.
Furthermore, in our study, age, sex, race, LN metastasis, stage, and neoplastic grade were also regarded as independent prognostic factors for OS of DPLC patients; the same results were also observed in other studies [3][4][5][15]. Tanvetyanon and his colleagues reported that adenocarcinoma was related to better outcomes [2], which was consistent with our research results. Compared to patients with other histological types, patients with "BAC-aden" or with "aden-aden" had better prognosis (p < 0.001). In addition, tumor location was associated with prognosis in DPLC patients in the univariate survival analysis; that is, the prognosis was better when the two tumors were on opposite sides than when they were on the same side. A similar result has been previously reported [2]. However, this conclusion was contrary to that of Ishikawa and his colleagues [19]. In the multivariate survival analysis, tumor location was not statistically significant (p = 0.479) in our study, which was similar to the results of others [14,20].
Whether the time interval between FPLC and SPLC is related to OS of DPLC patients has been controversial. Some studies suggested that the longer the interval, the better the prognosis [21][22][23][24]. Aziz and his colleagues argued that a longer interval was associated with less invasive SPLC [24]. However, other studies did not come to the same conclusion [25,26]. Some of these studies reported that the prognosis of synchronous MPLC was better than that of metachronous MPLC, while a meta-analysis suggested that the time interval had nothing to do with OS of MPLC patients [27]. In our study, the relationship between time interval and prognosis of DPLC patients was reversed between the univariate and multivariate survival analyses. We attribute this to the limited number of fields provided by the SEER database: many other unknown factors could not be included in the analysis, so the interference of confounding factors could not be completely eliminated. That is also why we did not treat the time interval as an independent prognostic factor for DPLC patients.
Table note. Histological type: "Aden-aden" = both first and second primary lung cancer are adenocarcinomas; "Squa-squa" = both first and second primary lung cancer are squamous cell carcinomas; "Squa-aden" = FPLC is squamous cell carcinoma and SPLC is adenocarcinoma; "Aden-squa" = FPLC is adenocarcinoma and SPLC is squamous cell carcinoma; "BAC-aden" = FPLC is bronchioloalveolar cancer and SPLC is adenocarcinoma. Surgery: "Yes-yes" = patients received corresponding treatment for both first and second primary lung cancer; "Yes-no" = patients received corresponding treatment for FPLC and not for SPLC; "No-yes" = patients received corresponding treatment for SPLC and not for FPLC; "No-no" = patients did not receive corresponding treatment for first or second primary lung cancer.
In addition, many studies have demonstrated the benefits of chemotherapy and radiotherapy in MPLC patients [14][28][29][30], and it is generally agreed that chemotherapy and radiotherapy can improve the survival of MPLC patients. However, our study found that radiotherapy and chemotherapy did not confer a survival advantage in the univariate survival analysis, so we did not include them in the multivariate survival analysis. Moreover, the SEER database cannot provide the specific chemotherapy regimen and timing for DPLC patients; some patients declined radiotherapy that their doctors had recommended, and specific radiation information was not available from the SEER database. All of these details are closely related to prognosis. For these reasons, we did not consider chemotherapy and radiotherapy as predictors for DPLC patients.
Our study has the following advantages. Firstly, our study is the first attempt to use a nomogram to predict survival and prognosis of DPLC, including 5411 patients from the SEER database in the study cohort. In recent years, nomograms based on the SEER database have been widely used in studies on a variety of cancers [6,7,31,32]. The SEER database collects a large amount of information from 18 registries distributed throughout the United States, covering about 28 percent of the US population, with a data accuracy of up to 95% [33]. Therefore, it can provide good data support for the construction of clinical prediction models, which is not possible in typical single-center or small-sample studies. Secondly, in this study, all prognostic factors mentioned above were included, and the different categories of each indicator were quantified to construct a relatively systematic and complete evaluation system. The nomogram based on these factors had a C-index of 0.70 (95% CI (0.69, 0.71)) in the modeling group and 0.70 (95% CI (0.68, 0.72)) in the validation group. The calibration curves of the two groups also showed good consistency, as shown in Figure 3. Together, these results indicate that the clinical predictive model had relatively good predictive value. Therefore, we constructed a prognostic predictive model with good performance, which can help doctors evaluate the prognosis of DPLC patients and take corresponding measures.
Certainly, our research also has some shortcomings. First, to make the information on DPLC patients more comprehensive and the study more convincing, we considered the clinicopathological characteristics of FPLC and SPLC at the same time when studying the prognostic factors, which made it more difficult to group each individual prognostic factor precisely, such as the specific type of surgery, lymph node status, tumor location, and size for each primary lung lesion. Second, the SEER database does not provide specific chemotherapy regimens for FPLC and SPLC, which can affect the effectiveness of treatment and are closely related to survival. Third, the database also lacks important information such as family history of lung cancer and smoking, which may be prognostic factors of DPLC. In addition, our study is a retrospective analysis, and patients with incomplete information were removed from the study, which inevitably led to selection bias. Considering the shortcomings of retrospective analysis, further prospective studies are recommended for prognostic factor assessment.

Conclusion
In summary, patients with DPLC have poor prognosis, with a 3-year OS rate of approximately 41.0% and a 5-year OS rate of 27.7%. Age at diagnosis, gender, race, neoplastic grade, stage, LN metastasis, histological type, tumor location, and surgery were identified as prognostic factors of OS in DPLC patients. The nomogram based on these factors has good predictive value. Surgical resection is an effective treatment for patients with DPLC. Thus, for patients without absolute surgical contraindications, surgery should be actively considered.

FPLC: First primary lung cancer
SPLC: Second primary lung cancer
DPLC: Dual primary lung cancer
MPLC: Multiple primary lung cancer
OS: Overall survival
HR: Hazard ratio
ROC: Receiver-operating characteristic
AUC: Area under the curve
C-index: Concordance index
PSM: Propensity score matching
SEER: Surveillance, Epidemiology, and End Results.

Data Availability
The data supporting the results reported in this article are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.

Supplementary Materials
Figure S1: flow chart detailing the selection of the patients in this study. MP-SIR, multiple primary-standardized incidence ratio; ICD, International Classification of Diseases for Oncology; AJCC, American Joint Committee on Cancer. Figure S2: ROC curves. ROC curve analyses were generated to evaluate the predictive value of the established nomogram by the AUC. A and B came from the modeling group (3-year and 5-year OS, resp.). C and D came from the validation group (3-year and 5-year OS, resp.). Figure S3: distribution of propensity scores. The treatment units represent the surgery group, and the control units represent the no surgery group. Each circle represents one patient. The size of the circles for matched patients is proportional to the distance obtained by the propensity score matching procedure. Table S1: C-index and AUC for the established nomogram to predict 3-year and 5-year overall survival.