Prediction of the Benign or Malignant Nature of Pulmonary Pure Ground-Glass Nodules Based on Radiomics Analysis of High-Resolution Computed Tomography Images

To evaluate the efficacy of radiomics features extracted from preoperative high-resolution computed tomography (HRCT) scans in distinguishing benign from malignant pulmonary pure ground-glass nodules (pGGNs), we conducted a retrospective study of 395 patients examined from 2016 to 2020. All nodules were randomly divided into training and validation sets in a 7:3 ratio. Radiomics features were extracted using MaZda software (version 4.6), and the least absolute shrinkage and selection operator (LASSO) was employed for feature selection. In the training set, benign and malignant pGGNs differed significantly in sex, mean CT value, margin, pleural retraction, tumor–lung interface, and internal vascular change; a mean CT value model and a morphological features model were then constructed. Fourteen radiomics features were selected by LASSO for the radiomics model. The combined model was developed by integrating all selected radiographic and radiomics features using logistic regression. The AUCs in the training set were 0.606 for the mean CT value model, 0.718 for the morphological features model, 0.756 for the radiomics model, and 0.808 for the combined model. In the validation set, the AUCs were 0.601, 0.692, 0.696, and 0.738, respectively. Decision curve analysis showed that the combined model provided the highest net benefit.


Introduction
Pulmonary ground-glass nodules (GGNs) are lung lesions that show increased opacity on computed tomography (CT) scans without obscuring the underlying bronchial and vascular structures within the lung parenchyma [1]. Histopathologically, GGNs result from the partial replacement of alveolar air, which may be caused by partial filling of the alveoli, thickening of the alveolar walls due to fluid accumulation, cellular infiltration, or fibrosis, partial alveolar collapse, increased capillary blood volume, or a combination of these [2]. Consequently, GGNs can represent benign lesions such as inflammation, hemorrhage, or localized interstitial fibrosis, as well as lung cancer or precancerous lesions. Some benign GGNs may resolve over time or after anti-inflammatory treatment, whereas persistent GGNs often indicate a high risk of lung cancer [3]. Succony et al. [4] reported that 37% of GGNs disappeared on follow-up CT after three months, whereas 10% ultimately developed into invasive lung cancer.
In recent years, with growing health awareness and the widespread use of chest high-resolution CT (HRCT), lung cancers presenting as GGNs have been detected with increasing frequency [5]. Currently, the clinical management of pulmonary GGNs, especially pure ground-glass nodules (pGGNs), mainly relies on follow-up examinations [6,7]. However, long-term follow-up may impose substantial psychological stress and economic burden on patients [8,9]. If the benign or malignant nature of GGNs could be predicted at the initial examination, unnecessary follow-ups could be substantially reduced, and concerns about underdiagnosing lung cancers that appear stable during follow-up would be alleviated. It would also reduce the resection of benign nodules without delaying the diagnosis of lung cancer.
Traditional imaging assessment, which evaluates size, morphology, and margin characteristics, remains the primary method for distinguishing benign from malignant lung nodules [3,10]. However, it often yields inconsistent results and lacks accuracy in determining the nature of GGNs, particularly for nodules with blurred borders, slow growth, or atypical morphology [11].
In 2012, Lambin et al. [12] introduced the concept of radiomics, which has since transformed the analysis and interpretation of medical images and has developed rapidly in recent years. In pulmonary lesions, its applications primarily include distinguishing benign from malignant lesions [13,14], predicting the invasiveness of lung cancer [15,16], predicting gene mutations [17,18], and assessing prognosis [19,20]. However, few studies have applied radiomics to differentiating benign from malignant pGGNs [21].
In this study, we extracted the CT radiomics features of pGGNs, developed a prediction model, and compared it with the radiographic features, aiming to explore the value of radiomics in predicting the malignancy of pGGNs.

Study Population
This retrospective study was approved by the Ethics Committee of our hospital (approval 2023-No.203), and the requirement for informed patient consent was waived.
A comprehensive search was conducted to identify all patients with pulmonary nodules at our institution between 2016 and 2020. The inclusion criteria were as follows: (1) preoperative chest CT scans available in the Picture Archiving and Communication System (PACS) with thin-section lung window images (slice thickness < 1.5 mm); (2) lesions appearing as pure ground-glass nodules; (3) an interval of < 1 month between CT scanning and surgery; and (4) complete pathological data. The exclusion criteria were as follows: (1) poor scan quality with significant image artifacts (e.g., respiratory motion artifacts or artifacts from foreign bodies outside the body) that did not meet post-processing requirements; (2) invasive diagnostic or therapeutic procedures (e.g., biopsy or radiofrequency ablation) performed before CT scanning; and (3) the presence of other concurrent tumors. Based on these criteria, 395 patients were included in this study, comprising 146 males (51.37 ± 12.14 years) and 249 females (52.45 ± 11.43 years), with 128 benign and 267 malignant pGGNs (Figure 1).

All patients were randomly assigned to the training and validation sets in a 7:3 ratio, with 276 cases (89 benign, 187 malignant) in the training set and 119 cases (39 benign, 80 malignant) in the validation set.
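The 7:3 split can be illustrated with a minimal Python sketch. It assumes a simple unstratified shuffle with a hypothetical fixed seed; the paper does not report the random seed or the software used for randomization, and the function name split_7_3 is illustrative only.

```python
import random


def split_7_3(case_ids, seed=2023):
    """Randomly partition case IDs into training (70%) and validation (30%) sets.

    Assumption: an unstratified shuffle with a hypothetical fixed seed.
    """
    ids = list(case_ids)
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(ids)
    n_train = (7 * len(ids)) // 10  # integer arithmetic avoids float rounding
    return ids[:n_train], ids[n_train:]


train_ids, val_ids = split_7_3(range(395))
print(len(train_ids), len(val_ids))
```

A plain shuffle like this reproduces the 276/119 case totals but will not necessarily reproduce the reported per-class counts (89/187 and 39/80); stratifying the shuffle by pathological label would be one way to keep the benign-to-malignant ratio similar in both sets.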

Image Acquisition
All patients underwent routine CT scans on multi-slice spiral CT scanners: TOSHIBA Aquilion (Toshiba Medical Systems, Ōtawara, Japan); Somatom Sensation 64 and Somatom Definition (Siemens Healthineers, Erlangen, Germany); and GE Revolution and Discovery CT 750 HD (GE Healthcare, Chicago, IL, USA). Patients were scanned in the supine position during breath-holding. The scanning range extended from the lung apex to the diaphragm. The scanning parameters were as follows: tube voltage of 100 kV (TOSHIBA) or 120 kV (GE, SIEMENS); automatic tube current; matrix 512 × 512; and field of view of 400 mm (TOSHIBA) or 500 mm (GE, SIEMENS). Thin-section lung window images were obtained using a standard reconstruction algorithm with a slice thickness of 1.0 mm or 1.25 mm, a window width of 1500 HU, and a window level of −600 HU. The images were then imported into MaZda software (version 4.6, http://www.eletel.p.lodz.pl/programy/mazda/ (accessed on 11 January 2021)) in DICOM (Digital Imaging and Communications in Medicine) format for analysis.

Image Analysis and Feature Extraction
The CT imaging characteristics of the pulmonary lesions were evaluated, including mean diameter [(long diameter + short diameter)/2], location (lobe), CT attenuation value, shape (round/oval or irregular), margin (spiculation), tumor-lung interface (clear and smooth, clear and rough, or blurred), pleural retraction (fine linear shadows between the lesion and the pleura), vacuole sign (air-containing low-density areas of 1-3 mm within the lesion), and vascular changes, including external vascular change (vascular cluster sign) and internal vascular change (thickening or distortion). Measurements and assessments were performed by two physicians with 5 and 11 years of experience in diagnostic imaging, and discrepancies were resolved by consensus.
After the images were imported into MaZda, gray-level normalization was performed using the µ ± 3σ method (µ and σ denoting the mean and standard deviation of the gray levels) to reduce the influence of contrast and brightness variations. The lesion images were then reviewed, and the slice showing the largest cross-section of the lesion was selected. The region of interest (ROI) was manually delineated along the lesion contour using the segmentation tool in MaZda to obtain the radiomics features (Figure 2). All segmentations were performed by the physician with 11 years of experience.
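To make the µ ± 3σ step concrete, the sketch below shows one common reading of it: gray levels inside the ROI are clipped to the window [µ − 3σ, µ + 3σ] and rescaled linearly onto a fixed number of gray levels. This is an illustrative re-implementation under stated assumptions, not MaZda's source code; the function name and the choice of 64 gray levels are hypothetical.

```python
import statistics


def normalize_mu_3sigma(roi_values, n_levels=64):
    """Map ROI intensities onto n_levels gray levels using the mu +/- 3 sigma window.

    mu and sigma are the mean and (population) standard deviation of the ROI;
    values outside [mu - 3*sigma, mu + 3*sigma] are clipped to the window edges.
    n_levels=64 is an illustrative assumption, not a setting from the paper.
    """
    mu = statistics.fmean(roi_values)
    sigma = statistics.pstdev(roi_values)
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    if hi == lo:  # constant ROI: no spread to rescale
        return [0] * len(roi_values)
    scale = (n_levels - 1) / (hi - lo)
    # Clip each value into the window, then map it linearly onto 0..n_levels-1.
    return [round((min(max(v, lo), hi) - lo) * scale) for v in roi_values]


# Hypothetical HU values inside a pGGN ROI, including one bright outlier voxel.
roi = [-620.0, -600.0, -580.0, -610.0, -590.0, -605.0,
       -595.0, -600.0, -615.0, -585.0, 200.0]
out = normalize_mu_3sigma(roi)
print(out)
```

Clipping at ±3σ limits how far a few extreme voxels can stretch the gray-level range, which is why this kind of normalization helps reduce brightness and contrast differences between scans.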

Statistical Analysis
GraphPad Prism (version 9, https://www.graphpad.com/ (accessed on 15 April 2022)) and R software (version 4.2.2, https://www.r-project.org/ (accessed on 1 February 2023)) were used for statistical analysis. Normally distributed continuous data were expressed as mean ± standard deviation (M ± SD) and compared using the independent-samples t-test. Non-normally distributed continuous data were expressed as median (interquartile range) [M (Q1, Q3)] and compared using the Mann-Whitney U test. Categorical data were presented as numbers (percentages) and compared using the chi-square test.
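The Mann-Whitney U test is closely related to the ROC analysis used in this study: U divided by the product of the two group sizes equals the AUC obtained when one group's values are used as scores. The sketch below computes U by pairwise comparison with a normal-approximation p value. It omits the tie and continuity corrections that statistical packages (such as R's wilcox.test) may apply, so its p values can differ slightly from those a package would report; the CT values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return U for sample x versus sample y, counting ties as one half."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def mann_whitney_test(x, y):
    """Two-sided Mann-Whitney U test using the plain normal approximation.

    No tie or continuity correction is applied (simplifying assumption).
    """
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p


# Hypothetical mean CT values (HU); not data from this study.
malignant = [-420.0, -455.0, -470.0, -390.0]
benign = [-560.0, -530.0, -470.0, -600.0]
u, p = mann_whitney_test(malignant, benign)
auc = u / (len(malignant) * len(benign))  # probability a malignant value exceeds a benign one
print(u, round(auc, 4), round(p, 4))
```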
The least absolute shrinkage and selection operator (LASSO) was employed for dimensionality reduction and feature selection. LASSO is a regularization method that adds an L1 penalty on the regression coefficients, shrinking many of them to exactly zero; it therefore retains a small number of key features from high-dimensional data, reducing model complexity and mitigating overfitting. The selected features were used to construct a predictive model by logistic regression.
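The LASSO step can be sketched as L1-penalized logistic regression. The sketch below fits it by proximal gradient descent (ISTA), alternating a gradient step on the mean log-loss with soft-thresholding, which is what drives uninformative coefficients to exactly zero. This solver is swapped in for illustration: the paper does not state which R package or solver was used (glmnet, for example, uses coordinate descent) or how the penalty was chosen, and the toy data, lam values, step size, and iteration count are arbitrary assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|w|: shrink z toward 0, returning exactly 0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X, y, lam, step=0.5, n_iter=5000):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label, per case.
        r = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
             for xi, yi in zip(X, y)]
        grad_w = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        grad_b = sum(r) / n
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[-1.0, 0.5], [-0.8, -0.5], [-0.6, 0.5], [-0.4, -0.5],
     [0.4, 0.5], [0.6, -0.5], [0.8, 0.5], [1.0, -0.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = lasso_logistic(X, y, lam=0.05)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(w, selected)
```

In practice the penalty strength is commonly chosen by cross-validation, and features are usually standardized first so that a single penalty treats them comparably.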
Four models were constructed (mean CT value, morphological features, radiomics features, and combined), and the predictive performance of each model was evaluated using receiver operating characteristic (ROC) curves. The clinical net benefit of each predictive model was evaluated using decision curve analysis (DCA). A p value < 0.05 was considered statistically significant.
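Decision curve analysis compares models by their net benefit at a threshold probability pt, commonly defined as NB = TP/n − (FP/n) × pt/(1 − pt), where a case is counted as positive when its predicted probability reaches pt. The sketch below computes this quantity and the "treat all" reference curve ("treat none" has a net benefit of zero). Treating p ≥ pt as positive is an assumption; the paper does not report its DCA implementation or the threshold range plotted, and the probabilities are hypothetical.

```python
def net_benefit(probs, labels, pt):
    """Net benefit of a model at threshold probability pt (0 < pt < 1)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(labels, pt):
    """Net benefit of classifying every nodule as malignant."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def decision_curve(probs, labels, thresholds):
    """Return (pt, model NB, treat-all NB) for each threshold."""
    return [(pt, net_benefit(probs, labels, pt), net_benefit_treat_all(labels, pt))
            for pt in thresholds]


# Hypothetical predicted probabilities and labels (1 = malignant).
probs = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
for pt, nb_model, nb_all in decision_curve(probs, labels, [0.1, 0.25, 0.5]):
    print(f"pt={pt:.2f}  model={nb_model:.3f}  treat-all={nb_all:.3f}")
```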

General Information and CT Imaging Features
Among the 395 patients, there were 267 cases of pulmonary adenocarcinoma or precancerous lesions and 128 cases of benign lesions (including chronic inflammation, focal fibrous tissue proliferation, alveolar epithelial hyperplasia, granulomatous inflammation, and carbon deposition). In the training set, benign and malignant pGGNs differed significantly in mean CT value, sex, margin, tumor-lung interface, pleural retraction, and internal vascular change (all p < 0.05), whereas age, mean diameter, location, shape, external vascular change, and vacuole sign showed no statistically significant differences (all p > 0.05) (Table 1, Figure 3).

Construction and Diagnostic Performance of Predictive Models
Clinical and radiographic features of patients in the training set were analyzed using the chi-square test and the Mann-Whitney U test. A mean CT value model was developed, along with a morphological features model (built by multivariate logistic regression on the statistically significant characteristics other than the mean CT value). In the training set, the area under the curve (AUC) was 0.606 [95% confidence interval (CI) 0.534-0.678] for the mean CT value model and 0.718 (95% CI 0.656-0.781) for the morphological features model. In the validation set, the corresponding AUCs were 0.601 (95% CI 0.486-0.717) and 0.692 (95% CI 0.589-0.795).
A radiomics predictive model was constructed based on the 14 features selected by LASSO. Its AUC was 0.756 (95% CI 0.696-0.815) in the training set and 0.696 (95% CI 0.590-0.802) in the validation set. The mean CT value, the selected morphological features, and the radiomics features were fitted by logistic regression to construct a combined model, which yielded an AUC of 0.808 (95% CI 0.755-0.861) in the training set and 0.738 (95% CI 0.641-0.835) in the validation set (Table 2, Figure 5). The mean CT value model had the lowest predictive performance, the radiomics model outperformed the morphological features model, and the combined model had the highest predictive performance (DeLong test: p < 0.05) (Table 3). Figure 6 illustrates examples of the combined model successfully or unsuccessfully classifying the studied nodules. Meanwhile, DCA showed that the clinical net benefit of the combined model was greater than that of the other three predictive models (Figure 7).
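The DeLong comparison of correlated AUCs can be sketched in pure Python. The sketch follows the structural-component formulation: each model's AUC is split into per-case placement values, their covariance across the two models gives the variance of the AUC difference, and a two-sided p value is taken from the normal distribution. It is an illustrative re-implementation; the paper does not state which implementation was used (R's pROC package, for instance, provides roc.test with method = "delong"), and the example scores are hypothetical.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 if tied, 0 otherwise."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _cov(u, v, mean_u, mean_v):
    """Sample covariance (denominator len - 1) of two equal-length lists."""
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test comparing two correlated AUCs on the same cases.

    labels: 1 = malignant (positive), 0 = benign (negative).
    Returns (auc_a, auc_b, p_value).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos), len(neg)
    aucs, v10, v01 = [], [], []
    for s in (scores_a, scores_b):
        # Placement values: each positive against all negatives, and vice versa.
        v10_r = [sum(_psi(s[i], s[j]) for j in neg) / n for i in pos]
        v01_r = [sum(_psi(s[i], s[j]) for i in pos) / m for j in neg]
        aucs.append(sum(v10_r) / m)
        v10.append(v10_r)
        v01.append(v01_r)
    # 2 x 2 covariance matrix of the two AUC estimates.
    cov = [[_cov(v10[r], v10[t], aucs[r], aucs[t]) / m
            + _cov(v01[r], v01[t], aucs[r], aucs[t]) / n
            for t in range(2)] for r in range(2)]
    var_diff = cov[0][0] + cov[1][1] - 2 * cov[0][1]
    if var_diff <= 0:  # e.g. identical scores: no evidence of a difference
        return aucs[0], aucs[1], 1.0
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    return aucs[0], aucs[1], math.erfc(abs(z) / math.sqrt(2))


# Hypothetical predicted probabilities from two models on the same six nodules.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.1, 0.2, 0.75]
model_b = [0.6, 0.2, 0.9, 0.5, 0.3, 0.4]
auc_a, auc_b, p = delong_test(model_a, model_b, labels)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, p = {p:.3f}")
```

Because both models are scored on the same nodules, their AUCs are correlated; the covariance term is what distinguishes this paired comparison from treating the two AUCs as independent estimates.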

Discussion
Globally, approximately 1.8 million people die from lung cancer each year, making it the leading cause of cancer-related death [23]. Furthermore, lung cancer accounts for the largest number of disability-adjusted life years among cancer patients across all age groups, irrespective of gender [24], indicating that it causes the greatest loss of healthy life years. About 75% of patients already have advanced disease at diagnosis, and the overall 5-year survival rate for patients with advanced lung cancer is only about 20% [25]. In contrast, patients with adenocarcinoma in situ or minimally invasive adenocarcinoma who undergo complete surgical resection at an early stage have a 10-year disease-specific survival rate of 100% [26]. This marked disparity indicates that the key to improving lung cancer survival lies in early detection, diagnosis, and timely surgical intervention.
Previous studies have shown that the CT value of GGNs can be used to distinguish benign from malignant lesions and to assess the invasiveness of lung adenocarcinoma [3,27,28]. Yang et al. [3] suggested that a higher mean CT value in pGGNs favors the diagnosis of malignancy, reporting values of −550 ± 141 HU for malignant lesions versus −645 ± 90 HU for benign ones (p < 0.05). Wang et al. [27] analyzed thin-section CT images of 154 cases with sub-solid nodules and found that pre-invasive and minimally invasive lesions had lower CT values (−396.81 ± 235.20 HU) than invasive lesions (−191.64 ± 206.23 HU, p < 0.001). In this study, the mean CT value differed significantly between benign and malignant pGGNs (−541.56 HU vs. −452.9 HU, p = 0.004). However, in ROC analysis the mean CT value yielded an AUC of only 0.606 (95% CI 0.534-0.678) in the training set and 0.601 (95% CI 0.486-0.717) in the validation set. Thus, although the mean CT value differs between benign and malignant pGGNs, its diagnostic performance on its own is limited.
Traditional CT imaging feature analysis can also be used to determine whether lung nodules are benign or malignant [3,10,29]. Yang et al. [3] performed a multivariate analysis in the pGGN subgroup and found that a well-defined border was a significant predictor of malignancy, with an AUC of 0.705 (95% CI 0.583-0.828). In this study, four HRCT features of pGGNs differed significantly between benign and malignant cases: margin, pleural retraction, tumor-lung interface, and internal vascular change (all p < 0.05). This indicates that morphological features such as margin spiculation, pleural retraction, and internal vascular changes were more common in malignant than in benign pGGNs. These differences may be attributed to the rapid and uneven internal growth of adenocarcinoma, which results in irregular infiltration of the surrounding structures [10,30,31]. The morphological features model established by logistic regression achieved an AUC of 0.718 (95% CI 0.656-0.781) in the training set and 0.692 (95% CI 0.589-0.795) in the validation set, demonstrating moderate diagnostic performance. Although morphological analysis and interpretation of CT images contribute to diagnosis, they depend on the radiologist's expertise and understanding of imaging manifestations. Moreover, when lesions present as pGGNs, the specificity of imaging features is insufficient, which can reduce the accuracy of image interpretation.
Radiomics extracts and analyzes large numbers of quantitative features from medical images and can therefore capture subtle differences in tissue characteristics that may not be discernible to the naked eye. Although benign and malignant lesions may both appear as pure ground-glass opacities on HRCT, the inherent pathophysiological characteristics and marked histological heterogeneity of lung cancer distinguish it from benign lesions. Radiomics analysis can extract and quantify these intrinsic variations in tissue structure, offering a perspective beyond conventional imaging assessment [32,33]. Gong et al. [33] employed radiomics analysis to diagnose ground-glass opacities in four datasets, achieving AUC values of 0.75, 0.55, 0.77, and 0.93, and the accuracy of the radiomics analysis was higher than that of two radiologists (53.1% and 56.3%, respectively). In this study, 14 radiomics features with high diagnostic value were selected to establish the predictive model, which achieved an AUC of 0.756 (95% CI 0.696-0.815) in the training set and 0.696 (95% CI 0.590-0.802) in the validation set. The predictive value of the radiomics model was higher than that of the mean CT value model and the morphological features model.
The radiomics model captures microstructural variations in lesions, whereas clinical and radiographic features primarily reflect macroscopic manifestations and patient background information. Integrating both approaches, as in the combined model, should in principle allow a more comprehensive assessment of pGGNs. In this study, we constructed a combined model and compared it with the radiomics, mean CT value, and morphological features models. The combined model exhibited the highest predictive value in both the training set (AUC = 0.808, 95% CI 0.755-0.861) and the validation set (AUC = 0.738, 95% CI 0.641-0.835) (DeLong test, p < 0.05). Its superior diagnostic efficacy compared with the individual models highlights the potential of radiomics to enhance diagnostic precision.
This study also has some limitations. First, it is a single-center retrospective study, which raises the possibility of data bias. Second, the cases were scanned on different CT scanners, which may have affected the results despite image normalization. To address these limitations and validate our findings more robustly, future studies will involve multicenter collaboration, which would reduce data bias by incorporating a more diverse patient population and a wider range of imaging equipment. Additionally, multiparametric studies incorporating various imaging modalities and clinical parameters could offer a more comprehensive assessment and strengthen the reliability of the predictive models.

Conclusions
In conclusion, we analyzed the clinical data, morphological features, and radiomics features of pGGNs, and developed a combined model that can non-invasively predict the benign or malignant nature of pGGNs.This model has the potential to significantly aid in clinical diagnosis and decision-making processes.

Tomography 2024, 10
Figure 1. The flow chart shows the inclusion and exclusion criteria of the study.

Figure 2. Example of region of interest (ROI) delineation. (a-d) A left lower lobe pure ground-glass nodule (pGGN) was first detected in a 42-year-old male in December 2019 (a). Follow-up in October 2020 showed no change in the nodule (b). (c,d) The ROI and the extracted radiomics features. Postoperative pathology revealed focal fibrous tissue proliferation with inflammatory cell infiltration. (e-h) A right upper lobe pGGN was first detected in a 70-year-old male in December 2017 (e). Follow-up in October 2020 showed an increase in nodule size (f). (g,h) The ROI and the extracted radiomics features. Postoperative pathology confirmed adenocarcinoma (papillary + acinar + lepidic subtypes).

Figure 3. Comparison of the mean CT value between benign and malignant pGGNs in the training set (a) and the validation set (b).

Figure 4. (a) Plot of regression coefficients for each feature; (b) heat map of correlation coefficients for selected radiomics features.

Figure 5. ROC curves of the combined model, mean CT value model, morphological features model, and radiomics model in the training set (a,c) and the validation set (b,d).

Figure 6. Examples of the combined predictive model successfully and unsuccessfully classifying the studied nodules. (a,b) A 60-year-old male with a pure ground-glass nodule (pGGN) in the right lower lobe. The nodule is round, with a clear and smooth tumor-lung interface and no spiculation. Postoperative pathology revealed fibrous tissue hyperplasia with inflammatory cell infiltration. The predictive model classified the nodule as benign. (c,d) A 52-year-old male with a pGGN in the right upper lobe. The nodule is irregularly shaped, with a clear but rough tumor-lung interface, spiculation, pleural retraction, a vacuole sign, and a vessel convergence sign. Postoperative pathology revealed chronic inflammation with stromal fibrous tissue hyperplasia and glandular hyperplasia. The predictive model classified the nodule as malignant. (e,f) A 67-year-old female with a pGGN in the left lower lobe. The nodule is irregularly shaped, with spiculation and internal vascular distortion. Postoperative pathology revealed adenocarcinoma (acinar + lepidic subtypes). The predictive model classified the nodule as malignant. (g,h) A 59-year-old male with a pGGN in the right lower lobe. The nodule is round, with a vacuole sign and a normal vascular course within the lesion. Postoperative pathology revealed adenocarcinoma (lepidic + acinar subtypes). The predictive model classified the nodule as benign.

Figure 7. The decision curve analysis showed that the net benefit of the combined model was greater than that of the other three predictive models.

Table 1. Clinical and morphological features of benign and malignant pure ground-glass nodules in the training set.

Table 2. Predictive performance of the four models in the training and validation sets.

Table 3. DeLong test results between the prediction models in the training set.