Computed tomography-based radiomics machine learning models for prediction of histological invasiveness with sub-centimeter subsolid pulmonary nodules: a retrospective study

To improve the accuracy of preoperative diagnoses and avoid over- or undertreatment, we aimed to develop and compare computed tomography-based radiomics machine learning models for the prediction of histological invasiveness in sub-centimeter subsolid pulmonary nodules. Three predictive models based on radiomics were built using three machine learning classifiers to discriminate the invasiveness of the sub-centimeter subsolid pulmonary nodules. A total of 203 sub-centimeter nodules from 177 patients were collected and assigned randomly to the training set (n = 143) or test set (n = 60). The areas under the curve of the predictive models were 0.743 (95% confidence interval [CI]: 0.661–0.824) for the logistic regression, 0.828 (95% CI: 0.760–0.896) for the support vector machine, and 0.917 (95% CI: 0.869–0.965) for the XGBoost classifier models in the training set, and 0.803 (95% CI: 0.694–0.913), 0.726 (95% CI: 0.598–0.854), and 0.874 (95% CI: 0.776–0.972) in the test set, respectively. In addition, the decision curve showed that the XGBoost model added more net benefit within the threshold probability range of 0.06 to 0.93.


INTRODUCTION
Owing to advances in medical imaging technology and the wide application of low-dose computed tomography (CT), the detection of pulmonary nodules has increased drastically over the past decade. Among these, subsolid nodules (SSNs), which appear as ground-glass opacities on a CT scan, have drawn significant attention. The Dutch-Belgian Randomized Lung Cancer Screening Trial (Nederlands Leuvens Longkanker Screenings Onderzoek; NELSON) has shown that SSNs are detected in 3.3% of patients (Scholten et al., 2015). In another study in Chinese medical staff, SSNs diagnosed as adenocarcinoma via histopathology were detected in 2.0% of cases (Zhang et al., 2020). However, the preoperative diagnosis of SSNs remains an unresolved issue, and the accurate diagnosis and detection of early triggers have become a key topic in public health. In the present study, we focused specifically on the subtype of sub-centimeter SSNs.
A pulmonary nodule is defined as an area of increased attenuation of the lung in a CT scan with a diameter of less than 30 mm (Erasmus et al., 2000). Depending on the component of the nodule, it can be classified into two subtypes: a solid or a subsolid nodule. A subsolid nodule is defined as an area that partly or entirely disappears in the mediastinal window of a CT scan (Bueno, Landeras & Chung, 2018; Clark et al., 2015) and presents with a variety of histopathological findings. On the one hand, it can be a benign lesion, such as focal fibrosis, inflammation, bleeding, or a precancerous lesion (e.g., atypical adenomatous hyperplasia [AAH]). On the other hand, it can be adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), or invasive adenocarcinoma (IAC) (Henschke et al., 2002; Hutchinson, Moreira & Ko, 2017; Travis et al., 2015). Usually, AAH and AIS are considered preinvasive lesions, whereas MIA and IAC are regarded as invasive lesions (Travis et al., 2015).
The screening of I-ELCAP (the International Early Lung Cancer Action Program) indicated that 34% of SSNs are malignant, whereas only 7% of solid nodules are malignant (Yip et al., 2015). According to the international multidisciplinary classification of lung adenocarcinoma published in 2015, wedge resection and segmental resection without lymph node dissection are recommended before lesions reach the level of invasive adenocarcinoma. Conversely, for IAC, lobectomy and lymph node dissection are recommended (Goldstraw et al., 2016;Liu et al., 2016). Moreover, Fu et al. (2021a) reported that SSNs with a diameter of more than 10 mm are more often diagnosed as IAC. Therefore, preoperative diagnosis of sub-centimeter SSNs is crucial. To date, numerous studies have focused on SSNs; however, research on sub-centimeter nodules is scarce (Meng et al., 2021;She et al., 2017;Sun et al., 2019;Sun et al., 2020). Therefore, improving the accuracy of preoperative diagnoses of subsolid sub-centimeter nodules and providing a reference for clinicians is particularly important.
At present, preoperative diagnosis methods for sub-centimeter SSNs include (Godoy & Naidich, 2012): (1) CT characteristics, such as diameter, density, spiculation, lobulation, cavity, bubble-like lucency, air bronchogram, vascular breakthrough sign, and pleural indentation sign; (2) positron emission tomography (PET)-CT; and (3) biopsy. The limitation of CT characteristics is the difficulty in achieving interobserver agreement on findings. The limitation of PET-CT is that the maximum standard uptake values of SSNs are generally low (Nomori et al., 2004). Successful biopsy cannot be guaranteed because the diameter of the nodule is small, and the rate of false negatives is high (Ng et al., 2008; Ricciardi et al., 2021). Moreover, biopsy is a highly invasive method. Overall, there is an urgent need for a different approach to provide a reference for preoperative decisions.
In 2012, Lambin et al. developed the concept of radiomics, which overcame the limitations of existing techniques (Lambin et al., 2012; Lambin et al., 2017). Radiomics has since been applied widely to diagnosis, staging, efficacy assessment, and prognosis. Moreover, a considerable number of studies have demonstrated that radiomics is helpful for the preoperative diagnosis of SSNs (Meng et al., 2021; She et al., 2017; Sun et al., 2019; Sun et al., 2020). Therefore, we conducted a retrospective review of patients who had undergone surgery for sub-centimeter SSNs to evaluate the performance of CT-based radiomics machine learning models for discriminating the invasiveness of subsolid sub-centimeter nodules.

Patient selection
In this retrospective study, we obtained the medical records of patients who had undergone surgical resection at our hospital from 2019 to 2021 and whose nodules had been confirmed to be adenocarcinoma spectrum lesions. The cases met the following inclusion criteria: (1) the maximum diameter of the SSN in the largest slice in the CT scan was ≤ 10 mm; (2) the lesion was confirmed as AAH, AIS, MIA, or IAC via histopathology; (3) the preoperative CT scan had a slice thickness of ≤ 1.5 mm; (4) the interval between the preoperative CT scan and surgical resection was ≤ 1 month. The exclusion criteria for the study were: (1) confirmation of histopathological result by biopsy or bronchoscopy; (2) the patient had undergone other therapies for tumors, such as chemotherapy, radiotherapy, and targeted therapy. The cases were divided into two groups depending on histological invasiveness: Group A (AAH/AIS/MIA) and Group B (IAC). The cases were then split into two sets while preserving the ratio of Group A to Group B cases: 70% of cases were assigned to the training set, and 30% of cases were assigned to the test set.
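A split that preserves the group ratio can be sketched as follows. This is a minimal illustration only: the function name `stratified_split`, the random seed, and the per-class rounding rule are assumptions, not the procedure used in the study, so the resulting set sizes may differ slightly from the reported 143 and 60.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class at ~train_fraction."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # random assignment within each group
        n_train = round(train_fraction * len(members))
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Illustrative labels: 0 = Group A (AAH/AIS/MIA), 1 = Group B (IAC).
labels = [0] * 99 + [1] * 104
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each group separately keeps the proportion of invasive lesions nearly identical in both sets, which matters when the two groups are of similar size and the test set is small.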

CT image acquisition
The CT examination was performed on a Definition AS 40-detector (Siemens Healthineers, Erlangen, Germany), a Definition Flash 64-detector and Dual-energy Force (Siemens Healthineers), a Somatom EMOTION 16-detector (Siemens Healthineers), and an IQon 64-detector and Dual-energy Force (Philips Healthcare, Amsterdam, Netherlands). Scans covered the area between the thorax inlet and the bilateral adrenal glands in the supine position. The scanning parameters were as follows: 120 kV, automatic tube current, pitch 0.75–1.5, field of view 336 × 336 mm, and resolution 512 × 512. CT images were interpreted in the lung and mediastinal windows by two experienced pulmonologists who were blinded to the histopathological diagnosis. The following subjective features of the lesion were recorded: (1) component of the lesion (part solid or non-solid), (2) lesion diameter (the maximum diameter in the largest slice), (3) lesion location, and (4) the characteristics of the lesion, including lobulation (absent, present), spiculation (absent, present), bubble-like lucency (absent, present), cavity (absent, present), pleural indentation sign (absent, present), vascular breakthrough sign (absent, present), shape (round or oval, irregular or polygonal), and margin (clear, unclear).

Histopathological analysis
Surgical specimens were analyzed separately by two experienced pathologists blinded to the CT findings, according to the 2015 International Association for the Study of Lung Cancer/American Thoracic Society, the European Respiratory

Nodule segmentation
At present, semi-automatic segmentation remains a relatively reliable method (Owens et al., 2018). Segmentation was performed using 3D Slicer 4.11 (https://slicer.org/). First, CT images were downloaded to the workstation from the picture archiving and communication system. Second, one pulmonologist outlined the margin of the nodule slice-by-slice. The software then automatically calculated the region of interest (ROI) in three dimensions according to the selected region (Fig. 1). In addition, the mean and standard deviation (SD) of the CT attenuation of the lesion were calculated by the software. Finally, another pulmonologist reviewed the processes and results. To reduce the heterogeneity bias caused by different CT scanners or parameters, ROIs were resampled after segmentation.
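For readers unfamiliar with ROI-based statistics, the mean and SD of CT attenuation amount to summary statistics over the voxels inside the segmented mask. The sketch below uses NumPy on a toy volume. The function name `roi_attenuation_stats` is hypothetical, and the use of the population SD (`ddof=0`) is an assumption; the convention used by the segmentation software is not stated in the text.

```python
import numpy as np


def roi_attenuation_stats(volume_hu, mask):
    """Mean and SD (in HU) of the voxels selected by a boolean ROI mask."""
    voxels = volume_hu[mask.astype(bool)]
    if voxels.size == 0:
        raise ValueError("ROI mask is empty")
    # ddof=0 (population SD) is an assumption made for this sketch.
    return float(voxels.mean()), float(voxels.std(ddof=0))


# Toy example: a 4x4x4 volume of lung-like background with a denser core.
volume = np.full((4, 4, 4), -800.0)
volume[1:3, 1:3, 1:3] = -500.0
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True   # eight core voxels
mask[0, 0, 0] = True         # one background voxel caught by the contour

mean_hu, sd_hu = roi_attenuation_stats(volume, mask)
print(mean_hu, sd_hu)
```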

Radiomics feature selection and predictive model construction
Radiomics features extracted from the training set were selected as follows: (1) removal of redundant features with a Pearson correlation coefficient > 0.9; (2) removal of features with p > 0.05 in the Mann-Whitney U test; (3) retention of features selected by the least absolute shrinkage and selection operator (LASSO). Using the retained features, three machine learning models were constructed on the training set with three classifiers to differentiate Group A (AAH/AIS/MIA) from Group B (IAC). The classifiers included a logistic regression (LR) classifier, a support vector machine (SVM) classifier, and an XGBoost (XGB) classifier. After the models were constructed, they were applied to the training and test datasets to calculate the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, and accuracy. Differences in the ROC curves between the training and test sets were assessed using the DeLong test. To compare the clinical benefit between the different models, we performed decision curve analysis (DCA) by calculating the net benefit for a range of threshold probabilities based on all datasets (Vickers et al., 2008).
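The three-step selection can be sketched in Python with NumPy, SciPy, and scikit-learn. This is an illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, the correlation filter is a greedy, order-dependent variant, and LASSO is fitted as a linear model on 0/1 labels with standardized features, which may differ from the software and settings actually used.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def drop_correlated(X, threshold=0.9):
    """Greedily keep a feature only if |r| <= threshold with all kept so far."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


def select_features(X, y, corr_threshold=0.9, alpha=0.05, cv=10, seed=0):
    """Column indices surviving the correlation, U-test, and LASSO steps."""
    kept = drop_correlated(X, corr_threshold)
    # Step 2: keep features that differ between the two groups (p <= alpha).
    kept = [j for j in kept
            if mannwhitneyu(X[y == 0, j], X[y == 1, j],
                            alternative="two-sided").pvalue <= alpha]
    # Step 3: LASSO with k-fold cross-validation; keep non-zero coefficients.
    Xs = StandardScaler().fit_transform(X[:, kept])
    lasso = LassoCV(cv=cv, random_state=seed).fit(Xs, y)
    return [j for j, c in zip(kept, lasso.coef_) if c != 0]


# Synthetic data: feature 0 is informative, feature 1 nearly duplicates it.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 20))
X[:, 0] += 1.5 * y
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)
X[:, 2] += 1.0 * y

selected = select_features(X, y)
print(selected)
```

Running the filters in this order means the comparatively expensive LASSO fit only sees features that are neither redundant nor uninformative in a univariate sense. All three steps should be fitted on the training set only, as described above, to avoid leaking information from the test set.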

RESULTS
The characteristics of the cases are summarized in Table 1. A total of 203 sub-centimeter nodules from 177 patients (49 males, 128 females; mean age: 50.6 ± 12.04 years) were included and assigned randomly to the training set (n = 143) or test set (n = 60). Among the lesions, 104 SSNs were confirmed as IAC and 99 were classified as AAH/AIS/MIA. No significant differences in age, sex, component, location, lobulation, spiculation, bubble-like lucency, cavity, pleural indentation sign, shape, or margin were found between Group A (AAH/AIS/MIA) and Group B (IAC) in either the training or test sets. However, the mean and SD of CT attenuation differed significantly between the two groups (p < 0.05).
In total, 1,781 radiomics features were extracted. After correlation analysis and the Mann-Whitney U tests, 1,027 radiomics features were retained. The LASSO with k-fold (k = 10) cross-validation yielded an optimal λ value of 0.07718 and a corresponding log (λ) value of −2.562. There were 10 radiomics features with non-zero coefficients (Figs. 2 and 3), which included firstorder_Maximum, glszm_LargeAreaLowGrayLevelEmphasis, glszm_LowGrayLevelZoneEmphasis, wavelet-LHL_firstorder_Minimum, wavelet-LHL_glcm_InverseVariance, wavelet-HLL_gldm_DependenceNonUniformity, wavelet-HLH_glcm_Imc2, wavelet-HHH_firstorder_Skewness, wavelet-LLL_firstorder_RootMeanSquared, and squareroot_ngtdm_Strength. Using these 10 radiomics features, three machine learning models were built using the LR, SVM, and XGB classifiers. An overview of the results is provided in Table 2 and Fig. 4. In the training set, the predictive accuracies of the three models were 0.720 for LR, 0.762 for SVM, and 0.881 for XGB. Those in the test set were 0.750 for LR, 0.700 for SVM, and 0.850 for XGB. The predictive performance of the radiomics models in the test set was not significantly different from that in the training set (DeLong tests: p > 0.05). These results indicated that the machine learning models were beneficial for discriminating between IAC and AAH/AIS/MIA in subsolid sub-centimeter pulmonary nodules. The DCA results of the three models are presented in Fig. 5. The decision curve showed that the XGB model offered a greater net benefit than the LR and SVM models within the threshold probability range of 0.06 to 0.93. Among the three predictive models, the XGB model had the best performance.
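To make the decision curve comparison concrete, the net benefit at a threshold probability pt is commonly computed as TP/n − (FP/n) × pt/(1 − pt), where TP and FP are the true and false positives obtained when every case with predicted risk at or above pt is treated as positive. The sketch below implements this standard formula on made-up predictions; the function names and toy data are illustrative assumptions and do not reproduce the study's results.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating cases whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every case regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up labels (1 = IAC) and predicted probabilities for illustration.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

nb_model = net_benefit(y_true, y_prob, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
print(nb_model, nb_all)
```

A decision curve plots this quantity over a grid of thresholds for each model alongside the treat-all and treat-none (net benefit of zero) strategies; a model is considered more useful over the threshold range where its curve lies above the others.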

DISCUSSION
For many years, SSNs were considered one of the imaging manifestations of pneumonia until it was revealed that SSNs can be early-stage lung cancer (Jang et al., 1996; Kuriyama et al., 1999). Because lung cancer is usually asymptomatic in the early stage, early detection is challenging. The global lung cancer mortality rate is the highest of all cancers, with 1.08 million deaths in 2020 (Ferlay et al., 2020). Moreover, there has been a significant increase in the prevalence of adenocarcinoma, which is now the most prevalent type of lung cancer, accounting for 50% of all lung cancer diagnoses (Bray et al., 2018; Nagy-Mignotte et al., 2011; Barta, Powell & Wisnivesky, 2019). Thus, lung adenocarcinoma is a highly relevant topic in public health research. In CT images, lung adenocarcinomas may appear as ground-glass nodules during the earliest stages. In histopathology, AAH, AIS, MIA, and IAC are considered a spectrum of adenocarcinomas (Bray et al., 2018). AAH is characterized by a limited proliferation of mildly to moderately atypical type II pneumocytes and/or Clara cells lining the alveolar walls or respiratory bronchioles. AIS is a localized small adenocarcinoma that grows within pre-existing alveolar structures, without invasion of the stroma, blood vessels, or pleura. MIA is a small, solitary adenocarcinoma with a predominantly lepidic pattern and invasion of no more than 5 mm in the greatest focus. The prognosis of IAC differs significantly from that of MIA and AIS. Following surgery for a preinvasive lesion or MIA, patients have a 5-year survival rate of roughly 100%. In contrast, the progression-free survival rate for patients with IAC is 74.6% (Goldstraw et al., 2016; Liu et al., 2016). Thus, a well-performing predictive model using radiomics features and machine learning may provide a reference for surgeons and the public.
for the training and test sets, respectively, demonstrating that the radiomics machine learning models may provide a reference for surgeons to evaluate the invasiveness of sub-centimeter SSNs accurately and avoid over- or undertreatment. In addition, DCA demonstrated that the XGBoost model was more clinically useful.
Usually, objective CT characteristics are used to evaluate the invasiveness of SSNs in clinical practice (Ricciardi et al., 2021). However, this approach is not reliable for sub-centimeter SSNs. Firstly, the effectiveness of using objective CT characteristics is still controversial. Some studies have suggested the mean and SD of CT attenuation as factors for evaluating the invasiveness of SSNs (She et al., 2017; Kitami et al., 2016), but other studies have reported no correlation between the mean and SD of CT attenuation and invasiveness and have shown that lesion size is the only influencing factor (Fu et al., 2021b). Whether the mean and SD of CT attenuation reflect the invasiveness of SSNs remains inconclusive. Secondly, in contrast to larger nodules, sub-centimeter SSNs lack distinct morphological CT signs, such as lobulation, spiculation, cavity, and bubble-like lucency (Chen et al., 2021; Wu et al., 2017). Similarly, the present analysis showed that these morphological CT signs were not independent predictors of IAC, indicating that nodules without these morphological features should not be ignored. Thirdly, agreement on evaluations of morphological CT signs between clinicians is often low (Zhao et al., 2019).
In contrast to objective CT features, radiomics converts image data into quantitative data using high-throughput mining. The massive data obtained contribute to diversity analysis in higher dimensions (Lambin et al., 2012; Lambin et al., 2017). Radiomics features are considered to quantitatively reflect the heterogeneity of lesions, which can predict tumor behavior (Lambin et al., 2012; Lambin et al., 2017; Nioche et al., 2018; Chen et al., 2020). Previous large-scale studies have confirmed that radiomics features exhibit good diagnostic performance. Chen et al. (2018) identified 76 relevant radiomics features, selected from 750 extracted features, that differed significantly between benign and malignant pulmonary nodules. Hawkins et al. (2016) reported that radiomics could be applied to screen the risk of lung cancer (accuracy = 0.800). In the present study, 1,781 radiomics features were extracted from the ROIs of lesions and then selected using multiple methods. Finally, we identified 10 radiomics features that were more strongly associated with invasiveness than other features. The results demonstrated that the radiomics models reached satisfactory prediction performance, and the DeLong tests between the training and test sets showed stability of the radiomics models, which may help clinicians in selecting appropriate treatment measures.
Further, our models were constructed using three different machine learning classifiers. Machine learning has been applied widely and validated to improve predictive performance. Previous studies have commonly used LR, random forest, and SVM as classifiers to build predictive models (Huang et al., 2016; Parmar et al., 2015; Yang et al., 2015; Ypsilantis et al., 2015). However, in the present study, we selected a relatively novel classifier, XGB, and compared it with the other two classifiers, revealing that the XGB model had the best performance when assessed by ROC curve analysis and DCA. The XGB model may provide a better strategy for diagnosis in clinical practice. In fact, the XGB classifier has been demonstrated to have high flexibility in other fields (Dhaliwal, Nahid & Abbas, 2018; Gumus & Kiran, 2017; Ren et al., 2017; Torlay et al., 2017). In Kaggle data science competitions, more than half of the challenge-winning solutions used XGB models (Chen & Guestrin, 2016). L1 and L2 regularization in XGB helps to prevent the model from overfitting, and the algorithm takes advantage of parallel processing (Tang et al., 2019), which likely contributes to its superiority.
We also acknowledge that our research has limitations. First, the present study design was retrospective and the models were not validated using an external dataset. Therefore, a multicenter clinical trial with a larger sample size and a prospective validation set is needed to determine the generalizability of the models. In addition, our data showed that the mean and SD of CT attenuation, as well as the vascular breakthrough sign in the training set, helped to distinguish the invasiveness of lesions. We next plan to identify more meaningful clinical features to construct a combined model to enhance predictive performance. Moreover, because only surgically resected cases were included, the cases were inevitably skewed toward nodules with malignant morphology. Thus, future studies should include a broader range of cases.