Evaluation of models for predicting the probability of malignancy in patients with pulmonary nodules

Abstract Objectives: Mathematical prediction models for pulmonary nodules combine demographic and imaging characteristics into a risk score for malignancy, and this score helps guide the management of pulmonary nodules. Many studies have established such models in different populations, but the predictive factors included in each model differ considerably. We hypothesized that applying different models to a local cohort would yield different performance in distinguishing benign from malignant lung nodules, early from late lung cancers, and adenocarcinoma from squamous cell carcinoma. In the present study, we compared four widely used mathematical prediction models. Materials and methods: We performed a retrospective study of 496 patients whose pulmonary nodules were pathologically diagnosed between January 2017 and October 2019. We evaluated the models' performance using the computed tomography findings of 425 patients with malignant nodules and 71 patients with benign nodules. Calibration curves and the area under the receiver operating characteristic curve (AUC) were used to assess each model's predictive performance. Results: In distinguishing benign from malignant nodules, the Peking University People's Hospital model performed best (AUC = 0.63); it also performed best in differentiating early from late lung cancers (AUC = 0.67) and in identifying lung adenocarcinoma (AUC = 0.61). In identifying lung squamous cell carcinoma, the Veterans Affairs model performed best (AUC = 0.69). Conclusions: Geographic disparities are an extremely important influencing factor, and the clinical features included in a mathematical prediction model are key to its precision and accuracy.


Introduction
Pulmonary nodules are common. Owing to the characteristics of pulmonary nodules, computed tomography (CT) imaging is currently the preferred method for detecting pulmonary nodules and screening for early-stage lung cancer in high-risk populations [1]. Pulmonary nodules can be divided into solid nodules and subsolid nodules, and subsolid nodules are usually further divided into pure ground-glass nodules and part-solid nodules. A nodule that completely obscures the underlying lung parenchyma is termed a solid nodule [2]. According to size, pulmonary nodules ≤8 mm in diameter are defined as subcentimeter nodules. A lesion with a diameter >3 cm is defined as a lung mass rather than a nodule. Based on previous research, the lung mass with

Participants
From January 2017 to October 2019, 542 patients with pulmonary nodules at the Central Hospital of Wuhan underwent surgery and received a clear pathological diagnosis. After excluding 46 patients with incomplete data, we analyzed imaging data from the remaining 496 patients. Of these 496 patients, 71 had lung diseases other than lung cancer and 425 had malignant tumors (Table 3). Lung cancer was diagnosed by histopathological or cytopathological examination of excised or biopsy specimens. A benign diagnosis required that the nodule remain stable for more than 2 years or that biopsy or surgical resection yield a clear benign diagnosis [19][20][21].

Variables
All patient information was collected from the hospital information system. The clinical data collected included the patient's name, serial number, age, sex, smoking history (years of smoking, year of quitting), history of lung cancer, and family history of cancer. The nodule characteristics collected comprised calcification, spiculation, lobulation, clear border, air bronchogram sign, ground-glass change, nodule site, and nodule diameter. All CT nodule features were collected from the CT reports. The images were displayed using the lung window setting (width, 1500 HU; level, 600 HU).

Statistical analysis
In the present study, we used SPSS 21.0 software for statistical analysis. All data sets were included in the univariate analysis to determine the factors affecting the probability of malignancy of pulmonary nodules. Independent factors associated with benign versus malignant status were then identified by multivariate logistic regression.
The predictive performance of each original model was evaluated using the area under the receiver operating characteristic curve (ROC-AUC) with its 95% confidence interval. A P value < 0.05 was considered statistically significant. One-way analysis of variance was performed on all observations, with homogeneity of variance as its condition of use. If this assumption was not met, Student's t-test was used [22][23][24][25].
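As an illustration of the ROC-AUC evaluation described above, the following is a minimal sketch in Python rather than the SPSS procedure actually used in this study. It computes the AUC as the probability that a randomly chosen malignant case receives a higher model score than a randomly chosen benign case, and it estimates a 95% confidence interval with a percentile bootstrap, which is an assumed choice of interval method. The function names `auc` and `bootstrap_ci` and the toy scores are hypothetical and are not study data.

```python
import random

def auc(scores, labels):
    """AUC as P(score of malignant case > score of benign case); ties count 0.5.

    labels: 1 = malignant, 0 = benign.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    auc(scores, labels)  # fails early if one class is missing
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples that contain both classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy example (hypothetical predicted probabilities, not study data).
toy_scores = [0.90, 0.80, 0.35, 0.60, 0.40, 0.20]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)  # 7 of 9 malignant/benign pairs are ordered correctly
toy_ci = bootstrap_ci(toy_scores, toy_labels, n_boot=200)
```

The pairwise definition is used here because it needs no external library and makes the meaning of the AUC explicit; for a cohort of several hundred patients it remains fast enough, although a rank-based implementation would scale better.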

A comparison of the four models
Because the Brock University model has performed well in many respects, we expected it to perform well here; however, comparing the diagnostic efficiency of the four models showed that the PKUPH model is more suitable for our patients. In the data that we collected during this period, 46 participants (8.45%) were lost to follow-up. Of the 425 patients (299 men, 126 women) with malignant pulmonary nodules, 150 had lung adenocarcinoma and 56 had lung squamous cell carcinoma. The patients' ages ranged from 29 to 89 years (Figure 1A). In addition, 126 patients had pulmonary nodules between 1.9 and 8 mm in diameter, 219 patients had pulmonary nodules between 8 and 30 mm in diameter, and 151 patients had pulmonary nodules between 30 and 124 mm in diameter (Figure 1B). We entered the collected patient data into each model's formula and calculated the results. We

Evaluation of suitability
In comparing these models, we found that the PKUPH model showed relatively better diagnostic efficiency in distinguishing lung cancer from benign pulmonary nodules, in identifying lung adenocarcinoma, and in distinguishing early from late lung cancer. For squamous cell carcinoma, however, the VA model was more suitable. In the supplementary material, we also provide detailed logistic regression tables for the four models (Tables 4, 5, 6 and 7) and a comparison of the four logistic regression models (Table 8).
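To make concrete how a logistic regression model of this kind turns patient characteristics into a probability of malignancy, a minimal sketch follows. The intercept, coefficients and predictor names are hypothetical placeholders chosen for illustration only; they are not the published values of the Mayo, VA, Brock or PKUPH models, nor the values in Tables 4 to 8.

```python
import math

# Hypothetical intercept and coefficients, for illustration only.
# They are NOT the published values of any of the four models compared here.
INTERCEPT = -4.0
COEFFS = {
    "age": 0.04,          # per year of age
    "diameter_mm": 0.10,  # per millimetre of nodule diameter
    "spiculation": 1.0,   # 1 if spiculation is present, else 0
}

def malignancy_probability(patient, intercept=INTERCEPT, coeffs=COEFFS):
    """Logistic model: p = 1 / (1 + exp(-x)), x = intercept + sum(coef * value)."""
    x = intercept + sum(coeffs[name] * patient[name] for name in coeffs)
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical patient, not taken from the study cohort.
example_patient = {"age": 60, "diameter_mm": 16, "spiculation": 1}
example_p = malignancy_probability(example_patient)
```

Under this form, each model differs only in which predictors it includes and in the coefficient attached to each one, which is why a model fitted in one population may be poorly calibrated in another.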

Discussion
Selecting a mathematical prediction model for pulmonary nodules requires caution: radiologists should consider the epidemiology of lung cancer in their area and validate the model in the local population. Model selection poses many problems in clinical practice, because many factors determine whether a lung nodule is benign or malignant. Different prediction models use different predictive factors, which greatly affects the applicability of each model. Each model has its own advantages, which many studies have described. At the same time, each has limitations, and its applicability in different regions needs to be supported by more data. For example, the Mayo model is based on  The reason for this result lies in differing national conditions (the prevalence of tobacco use and differences in the pattern of tobacco exposure history) together with geographic disparities. Most lung cancers (61%) were diagnosed at stage III or IV; only 21% of cases were diagnosed at stage I. The 5-year survival rate was 57% for stage I and 4% for stage IV. Almost 75% of lung cancer survivors are aged 65 years or older, and more than 60% were diagnosed within the past five years, which reflects the low survival of lung cancer [26][27][28][29][30][31]. In Cancer Statistics 2018, the lifetime probability of being diagnosed with invasive cancer was 39.7% for men and 37.6% for women. In Cancer Statistics 2019, it was 39.3% for men and 37.7% for women. This reflects, to some extent, differences in environmental exposure [28,30]. Among cancers, lung cancer shows the largest geographic variation, reflecting large historical and continuing differences in smoking prevalence between states.
It can also be seen that in the United States the occurrence of lung cancer in men is decreasing (an estimated 121,680 new lung cancer cases in 2018 and 116,440 in 2019), while the incidence in women is rising (an estimated 11,350 new lung cancer cases in 2018 and 11,710 in 2019). It is worth noting that the lung cancer rate among women in China (22.8 per 100,000) does not differ significantly from that in some Western European countries (for instance, France, 22.5 per 100,000), although smoking prevalence differs substantially between the two populations. The incidence and trend of lung cancer vary greatly with gender, age, ethnicity, and socioeconomic status. In the United States, lung cancer mortality is highest among men of lower socioeconomic status, especially in the central and southern regions. Smoking rates are decreasing globally, especially among men, in countries such as the United States, the United Kingdom, and Australia. In countries where smoking started later, however, smoking rates are increasing. Currently, more than 50% of lung cancer deaths each year occur in low- and middle-income countries [4,5,28,30]. Studies have shown that exposure of the central bronchi to low molecular weight polycyclic aromatic hydrocarbons produced by smoking can lead to small cell lung cancer, while exposure of peripheral lung tissue to nitrosamines in tobacco smoke can cause lung adenocarcinoma. All histological types are closely related to smoking, but the relative risk of adenocarcinoma is much lower than that of small cell lung cancer and squamous cell carcinoma. Adenocarcinoma is more common in non-smokers and women, whereas small cell lung cancer and squamous cell carcinoma become more common as the duration of smoking increases. In general, lung cancer in non-smokers differs from that in smokers at the molecular and epigenetic levels. Histologically, cancer in never-smokers also differs from cancer in smokers.
Never-smokers and women are mainly affected by lung adenocarcinoma, whereas squamous cell carcinoma predominates in male smokers [32][33][34][35].
There is increasing evidence to support recommendations for managing these patients, including how to define the risk of progression and how to identify who can be observed with serial imaging. These imaging features also help to distinguish patients who may have early-stage lung cancer and would benefit from local treatment. This is not enough, however, and new models may need to be created to meet the needs of patients with lung nodules and their clinicians in the region. Models are also constantly being improved, depending on which indicators are included in the calculation. Obtaining a more accurate formula is a difficult problem. At the same time, we found that it is not rigorous to consider only smoking status; the duration and amount of smoking should also be considered. Many factors have an influence. Different places may produce different results owing to different geographical environments, living habits, and eating habits. Researchers therefore need to consider and verify whether models developed elsewhere can be used directly. A review of the literature in recent years shows that more and better models have been established. More indicators or more sensitive molecular markers (e.g. CEA and Cyfra21-1) may be added. Evaluating lung nodules is complex and challenging. At present, guidelines advocate a systematic approach based on clinical and radiographic features to evaluate the likelihood of malignancy. An externally validated clinical model of malignancy probability can help to distinguish benign from malignant nodules and can advise clinicians and patients in making management decisions. When applying a model in the clinic, it is important to know the source population of that model. Therefore, it is very important to establish a regional prediction model for benign and malignant pulmonary nodules, which might help doctors to choose and interpret diagnostic tests and reduce the cost and suffering of patients.
The development of radiomics and molecular biomarkers is expected to improve the estimation of the probability of malignancy in the near future.