Assessment of the cancer risk factors of solitary pulmonary nodules

There are no large samples or exact prediction models for assessing the cancer risk factors of solitary pulmonary nodules (SPNs) in the Chinese population. We retrospectively analyzed the clinical and imaging data of patients with SPNs who underwent computer tomography guided needle biopsy in our hospital from Jan 1st of 2011 to March 30th of 2016. These patients were divided into a development data set and a validation data set. These groups included 1078 and 344 patients, respectively. A prediction model was developed from the development data set and was validated with the validation data set using logistic regression. The predictors of cancer in our model included female gender, age, pack-years of smoking, a previous history of malignancy, nodule size, lobulated and spiculated edges, lobulation alone and spiculation alone. The Area Under the Curves, sensitivity and specificity of our model in the development and validation data sets were significantly higher than those of the Mayo model and VA model (p < 0.001). We established the largest sampling risk prediction model of SPNs in a Chinese cohort. This model is particularly applicable to SPNs > 8 mm in size. SPNs in female patients, as well as SPNs featuring a combination of lobulated and spiculated edges or lobulated edges alone, should be evaluated carefully due to the probability that they are malignant.


INTRODUCTION
With increases in the clinical utilization of computed tomography (CT) in recent years, the pulmonary nodule detection rate has increased tremendously [1][2][3].In several large screening studies of lung cancer, the pulmonary nodule detection rate has increased from 8 to 51%.Malignant nodules account for 1.1 to 12% of these nodules [2].A solitary pulmonary nodule (SPN) is a single, round, well-circumscribed radiological opacity ≤ 3 cm in diameter [4].Most SPNs are detected incidentally by chest radiography and CT during investigations of other diseases [5].The great diagnostic challenge facing clinicians in the evaluation of a patient with an SPN is the definitive establishment of whether the nodule is benign or malignant.Fleishner and the American College of Chest Physicians (ACCP) guidelines recommend that nodules 8 mm or smaller undergo mandatory follow-up CT evaluations based on their diameters and associated risk factors.The guidelines also recommend that individuals with nodules larger than 8 mm undergo a diagnostic workup consisting of more invasive diagnostic procedures [4,[6][7][8].Devising effective strategies for managing patients with SPNs depends critically on the pre-test probability of malignancy [9].However, there are no large samples or exact prediction models for assessing the cancer risk factors of SPNs, especially in the Chinese population.Moreover, it is important to highlight that existing studies have not used more advanced imaging modalities to characterize pulmonary nodules in their analyses [5].
The aim of our study was to screen the risk factors for malignant SPNs, establish a risk prediction model

Research Paper
from a sample of consecutive patients (n = 1078) and test the model on data from a separate group of patients (n = 344) with SPNs who underwent CT-guided needle biopsies in our hospital.We also evaluated the accuracy and calibration of the Mayo Clinic and VA models, which were developed using North American or European populations to estimate the probability of malignancy, to determine whether any differences existed between our model and the Mayo and VA models with respect to cancer risk factors.

RESULTS
A total of 1422 consecutive patients with SPNs were enrolled in this study.The development data set included 1078 patients who were evaluated from Jan 1st of 2011 to April 30th of 2015, and a separate validation data set included 344 patients who were evaluated from May 1st of 2015 to March 30th of 2016.All patients enrolled in the study had biopsy pathology results.Twenty patients and four patients in the development and validation data sets, respectively, underwent repeat biopsies, and 186 patients and 24 patients in the development and validation data sets, respectively, had surgical pathology results.

Clinical data
Demographic data: In the development data set (647 men and 431 women aged 17-87 years, mean 55.41 ± 11.94 years), 414 patients had a history of smoking, and 54 patients had a history of cancer (Table 1).Additional patient clinical features are shown in Supplementary Table 1.In the validation data set (196 men and 148 women aged 13-85 years, mean 55.34 ± 11.24 years), 112 patients had a history of smoking, and 16 patients had a history of cancer.There were no significant differences between the two groups with respect to the above data (p > 0.05) (Table 1).
SPN CT characteristics: In the development data set, the average size of the SPNs was 18.43 ± 5.03 mm (ranging from 4.625 mm to 29.965 mm), and 98.42% of SPNs were > 8 mm.In the validation data set, the average size of the SPNs was 18.16 ± 5.05 mm (ranging from 5.87 mm to 29.515 mm), and 97.67% of SPNs were > 8 mm.There were no significant differences between the two groups with respect to SPN size (p > 0.05) (Table 2).

Equations for models estimating the pre-test probability of malignant SPNs
Based on the published literature, we adopted the following predictive mathematical models to estimate the malignant probability, with x varying in accordance with different formulas, such as p = e x ／(1+e x ), where e is the natural logarithm, and qualitative factors, including gender (male = 1, female = 2), previous medical history (1 = yes, 0 = no), edge (1 = yes, 0 = no), and calcification (1 = yes, 0 = no).
We used the following equation in our model:

Complications
Among all the patients enrolled in this study (n = 1422), pneumothorax (16.53%), and hemorrhage of the lung (8.79%) or pleural cavity (4.08%) were the major complications of percutaneous lung biopsy.No cases of fatal air embolism or death occurred (Supplementary Table 2).

DISCUSSION
Currently, the most common cause of cancerrelated death is lung cancer [10].Thus, early diagnosis and identification of pulmonary nodules has become increasingly important.The National Lung Screening Trial (NLST) found that although the rate of pulmonary nodule positivity was 25%, 96% of the nodules evaluated in that study were benign [11].Therefore, the most important first step in the evaluation of pulmonary nodules is to assess the possibility of cancer and then choose the optimum diagnostic methods for evaluating the nodules.Currently, the recommendation for estimating the pretest probability of malignancy in all patients with SPNs, whether qualitatively on the basis of clinical evaluations or quantitatively using validated models, remains valid [7].However, the sample sizes of the existing malignant risk prediction models for SPNs, such as the Mayo model, which included 419 cases [12]; the VA cooperative study, which included 375 cases [13]; the Herder model, which included 106 cases [14]; and the PKUPH model of China, which included only 107 cases, were small [15].The Brock model included 1871 cases; however, most of them were cases involving multiple nodules, and the study comprised a lung-screening population in which less than 5% of nodules were found to be malignant [16].Moreover, all models except the PKUPH model were developed using North American or European populations.Therefore, the most significant advantages of our study were its large SPN sample size and inclusion of a Chinese cohort.
In our model, age, malignancy history, pack-years of smoking and nodule size were the risk factors for malignant SPNs, and these factors were similar to those identified in previous models [12][13][14][15][16].We found that upper lobe location was not a risk factor for malignant nodules in our model-a finding that contrasts with those regarding the risk factors for malignant nodules in the Mayo and Brock models [12,16]-probably because tuberculosis is common in mainland China [17] and predominantly involves the upper lobe.Thus, we believe that the upper lobe is not a risk factor for malignant SPNs in countries with a high tuberculosis burden.In the assessment of malignant lung nodules, nodule size was a significant, but not decisive [18][19][20].However, the margins and contour of pulmonary nodules enable differentiation between benign and malignant nodules.Typical benign lesions have regular, smooth edges, while typical malignant nodules usually have lobulated, spiculated or irregular edges [21].Swensen et al. found that a lobulated edge was an independent risk factor for malignant pulmonary nodules, with a positive predictive value of up to 88-94% [12].In the Mayo and Brock models, spiculation was one of the risk factors associated with malignant pulmonary nodules [12,14]; however, in the same models, as well as in the Herder model, lobulation was not a risk factor associated with malignant pulmonary nodules [12][13][14]16].We found that lobulation was associated with a greater risk for malignant pulmonary nodules than spiculation and that the greatest risk factor for malignant SPNs was an edge characterized by lobulation and spiculation (p < 0.001).Among our 118 SPN cases with lobulation and spiculation, the malignancy rate reached 94.07%, indicating that nodules with lobulation and spiculation greatly warrant increased clinical attention.
The greatest risk factor that was identified in our study but not in other studies, aside from the Brock model, was female gender.In contrast to most Western countries, whose lung cancer death rates are decreasing, China's lung cancer incidence rate is still increasing, as lung cancer remains the most commonly diagnosed cancer and is the leading cause of cancer-related death in this country [22][23][24].Notably, the majority of affected non-smoking patients are women [25].Freedman et al. reported that the incidence of lung cancer was higher in non-smoking women than in non-smoking men, although women were not more susceptible to the carcinogens present in tobacco smoke than men [26].Evaluation of Pulmonary Nodules: Clinical Practice Consensus Guidelines for Asia recommend that practitioners should be aware of the risk of lung cancer caused by high levels of indoor and outdoor air pollution, as well as the high incidence of adenocarcinoma in female non-smokers [17].
NSCLC in non-smokers displays an increased clinical incidence in females and comprises a higher proportion of adenocarcinoma compared with NSCLC in ever smokers, both among surgical patients and among non-resectable advanced-stage patients [27].Therefore, it is necessary to focus more attention on female patients with SPNs.
We established the largest sampling risk prediction model of SPNs in a Chinese cohort.SPNs in females, as well as SPNs featuring lobulated and spiculated edges or lobulation alone, should be evaluated more carefully because of the probability that they are malignant.The biggest advantage of our study was that all the lung nodules were solitary nodules, making it the largest study of SPNs assessing cancer risks in a Chinese cohort.A total of 98.25% of SPNs in our study were > 8 mm in size.Clinicians must assess such nodules immediately to determine if they are benign or have risk factors for malignancy to establish their clinical diagnoses.Our model is particularly applicable to SPNs > 8 mm in size.Almost all potential clinical and advanced imaging risk factors pertaining to malignant SPNs were incorporated into this model, along with a detailed stratification of several internal factors.The limitations of this study were that it was a single-center, retrospective study and that some of the patients had incomplete clinical data.A total of 161 patients (14.94%) with non-diagnostic results in the development data set failed to receive a final diagnosis.
Additionally, the slice thickness of the images of the thorax that were obtained from Jan 1st of 2011 to Dec 30th of 2011 was 3 mm.Although these disadvantages may have led to bias that affected the results of our model, they are unavoidable and common problems in retrospective analyses.In the future, volume doubling times, bloodrelated indicators and genetic examination results will be incorporated into this assessment model to improve its accuracy [28][29][30][31][32][33].

Participants and methods
We retrospectively reviewed the data from consecutive patients with SPNs who underwent CT- ).The datasets were derived from images of the thorax.The slices of the images obtained from Jan 1st of 2011 to Dec 30th of 2011 were 3 mm thick, and the remaining slices were 1 mm thick.The slices of the images obtained after biopsy were 3 mm thick.If the nodule was near the pulmonary hila or a blood vessel, we performed enhanced CT of the lesion before or during the biopsy.An average 1 to 3 needles were used during the puncturing process to successfully obtain the samples.All patients enrolled in this study had pathological diagnostic reports.These reports included clinical data regarding sex, age, chief complaints, cigarette smoking status, previous medical histories and family histories (i.e., extra-thoracic disease history; chronic lung disease history, except cancer history; and cancer history, except lung cancer history), and histopathology.Information regarding the following chest radiological data was collected for all patients: nodule size (average of the maximum length and width) [8,9], edge characteristics (i.e., whether the edges featured spiculated protuberances, lobulation alone, spiculation alone, or lobulation and spiculation; and whether the edges were irregular or smooth) [6], density characteristics (i.e., whether the nodules were solid, purely ground-glass, or partly solid; whether the nodules featured thin cavitations or thickened cavitations; and whether the nodules displayed necrosis and calcification) (Figure 4) and location (i.e., whether the nodules were located in the upper, middle or lower lobe).We obtained approval from our institution to use patient medical records for this study, and patient confidentiality was maintained.

Statistical analysis
All statistical analyses, with the exception of those pertaining to the AUC graphs, which were performed using MedCalc, were conducted using SPSS ver.20.0 (IBM).Significance was set at p < 0.05.Qualitative variables were expressed as absolute frequencies and percentages, and numerical variables were expressed as the mean ± SD.To assess the differences between the development data set and the validation data set, we analyzed numerical variables using independent samples T-tests, and we analyzed qualitative variables using a chisquare test.
We performed logistic regression analysis using potential predictors, including sex, age, chief complaint, cigarette smoking status, previous medical history, family history, nodule size, edge characteristics, density and location, to identify the risk factors for malignant SPNs.A prediction model was developed using the development data set and was validated with the validation data set.In the regression analysis, patients with non-diagnostic results were classified into the non-malignant group.We also compared each patient›s final diagnosis to the probability of malignancy predicted by the Mayo Clinic and VA models.We assessed model accuracy by calculating the AUCs of ROC.When comparing the performances of the two models, we included patients only if a score was available for each model.

Figure 1 :
Figure 1: Histopathology of the development and validation data.* The development data set comprised 24 non-small cell lung cancers, 6 neuroendocrine carcinomas, 4 lymphoma-like epithelial carcinomas, 4 complex carcinomas, 8 carcinoids, 2 sarcoma carcinomas, and 1 lymphoma.The validation data set comprised 19 NSCLCs, 1 neuroendocrine carcinoma, and 1 sarcoma carcinoma.† The development data set comprised 16 sclerosing angiomas, 11 hamartomas, 8 cartilaginous tumors, 2 vascular smooth muscle tumors, 6 inflammatory pseudotumors, and 2 spindle cell tumors.The validation data set comprised 4 sclerosing angiomas, 3 hamartomas, 3 inflammatory pseudotumors, 3 spindle cell tumors, 2 bronchoceles and 2 chondrophymas.§ In the development data set, 29 nodules became markedly smaller within a short time, and 20 cases were stable after more than two years.The validation data set comprised 2 suppurative inflammation lesions and 1 eosinophilic inflammatory lesion.# Conditions that did not have a final diagnosis, including chronic inflammation, fibroplasia and gland hyperplasia.

ROC curves of our model, the Mayo model and the VA model in the development data set
in our hospital from Jan 1st of 2011 to March 30th of 2016 (these data were obtained from the PACSKJLCT 08-SYSTEM, Chongqing, China).With the exception of patients with a history of primary lung cancer, no other patients were excluded from the study.A malignant pathologic diagnosis was based on an examination of tissue obtained via biopsy or surgery.A definitive benign diagnosis was established when a specific benign etiology was confirmed pathologically through biopsy or surgery, when an SPN was found to have been radiographically stable for at least 2 years or when an SPN had been clearly absorbed within a short time.SPNs that did not meet these criteria and patients who did not have follow-up data were classified as undiagnosed.All

ROC curves of our model, the Mayo model and the VA model in the validation data set
divided into two groups, a development data set (assessed from Jan 1st of 2011 to April 30th of 2015) and a validation data set (assessed from May 1st of 2015 to March 30th of 2016).All patients underwent core needle biopsy using an Angiotech SuperCore TM Biopsy Instrument, 16 ga × 9 cm or ×15 cm (TECHNOLOGIES, USA), under the guidance of spiral CT (Siemens, Germany).CT was performed with single-(from Jan 1st of 2011 to Oct 30th of 2015) or 16-detector (from Nov 1st of 2015 to March 30th of 2016) CT scanners (tube voltage 120 kV, tube current 150 mA/ref, pitch 0.8