CT morphological features and histogram parameters to predict micropapillary or solid components in stage IA lung adenocarcinoma

Objectives This study aimed to construct prediction models based on computerized tomography (CT) signs, histogram and morphology features for the diagnosis of micropapillary or solid (MIP/SOL) components of stage IA lung adenocarcinoma (LUAC) and to evaluate the models’ performance. Methods This clinical retrospective study included image data of 376 patients with stage IA LUAC based on postoperative pathology, admitted to Putian First Hospital from January 2019 to June 2023. According to the presence of MIP/SOL components in postoperative pathology, patients were divided into MIP/SOL+ and MIP/SOL- groups. Cases with tumors ≤ 3 cm and ≤ 2 cm were separately analyzed. Each subgroup of patients was then randomly divided into a training set and a test set in a ratio of 7:3. The training set was used to build the prediction model, and the test set was used for internal validation. Results For tumors ≤ 3 cm, ground-glass opacity (GGO) [odds ratio (OR) = 0.244; 95% confidence interval (CI): 0.103–0.569; p = 0.001], entropy (OR = 1.748; 95% CI: 1.213–2.577; p = 0.004), average CT value (OR = 1.002; 95% CI: 1.000–1.004; p = 0.002), and kurtosis (OR = 1.240; 95% CI: 1.023–1.513; p = 0.030) were independent predictors of MIP/SOL components of stage IA LUAC. The area under the ROC curve (AUC) of the nomogram prediction model for predicting MIP/SOL components was 0.816 (95% CI: 0.756–0.877) in the training set and 0.789 (95% CI: 0.689–0.889) in the test set. In contrast, for tumors ≤ 2 cm, kurtosis was no longer an independent predictor. The nomogram prediction model had an AUC of 0.811 (95% CI: 0.731–0.891) in the training set and 0.833 (95% CI: 0.733–0.932) in the test set. Conclusion For tumors ≤ 3 cm and ≤ 2 cm, GGO, average CT value, and entropy were the same independent influencing factors in predicting MIP/SOL components of stage IA LUAC. The nomogram prediction models have potential diagnostic value for identifying MIP/SOL components of early-stage LUAC.


Introduction
Lung cancer is currently the leading cause of cancer-related deaths, with adenocarcinoma (AC) being the most common pathological type (1,2). In 2011, the International Association for the Study of Lung Cancer, the American Thoracic Society, and the European Respiratory Society (IASLC/ATS/ERS) released a new classification for lung cancer, categorizing invasive lung adenocarcinoma (LUAC) into five types based on their primary pathological components: lepidic, acinar, papillary, solid (SOL), and micropapillary (MIP) (3). Significant differences in invasiveness and prognosis exist among different pathological subtypes (4,5). Lepidic predominant AC is associated with a favorable prognosis, while SOL and MIP predominant AC have poorer prognoses (6). However, previous research has primarily focused on the relationship between the predominant pathological subtype and prognosis, overlooking the impact of minor high-risk pathological components on prognosis. An increasing number of studies have pointed out that the presence of non-dominant MIP and SOL components in early-stage LUAC is also associated with poor prognosis and early recurrence (7,8).
Surgical resection, including lobectomy and sublobar resection, is the most effective treatment for stage IA LUAC (9). Existing studies indicate that for peripheral non-small cell lung cancer ≤ 2 cm in diameter, sublobar resection may achieve perioperative outcomes comparable to lobectomy, but patient selection must be cautious (10,11). Previous studies have shown that sublobar resection is associated with a favorable prognosis for lepidic predominant AC (12). In contrast, lobectomy and systematic lymph node dissection are often considered for SOL and MIP predominant ACs due to their relatively higher rates of lymph node metastasis and tumor recurrence (13,14). This underscores the importance of identifying MIP/SOL components in early-stage LUAC when selecting the surgical approach and assessing prognosis.
Although percutaneous biopsy can detect the pathological subtypes of LUAC preoperatively, it may increase the risk of metastasis (15). Previous studies have demonstrated the correlation between CT images and pathology, as well as prognosis (16,17). However, traditional methods of reviewing images mainly rely on morphological features, resulting in generally limited diagnostic efficacy (18). In recent years, the rapid development of artificial intelligence in the medical field has provided a new approach to solving this problem. Computer-aided diagnostic systems utilize CT histogram technology to analyze the grayscale distribution information of the entire image, thereby improving the accuracy and sensitivity of diagnosis (19). However, the effectiveness of CT histogram parameters in predicting MIP/SOL components of stage IA LUAC still needs validation. This study aims to construct nomogram prediction models by combining preoperative CT morphological features and histogram parameters to explore their value in predicting the presence of MIP/SOL structures in stage IA LUAC.

Study population
This retrospective study included patients diagnosed with stage IA invasive LUAC who underwent thin-section CT scans of the lungs and subsequent surgical pathology confirmation at the First Hospital of Putian City from January 2019 to June 2023. The clinical information and image data were recorded. The inclusion criteria were as follows: patients who 1) were diagnosed with stage IA LUAC confirmed by surgical pathology, with a longest lesion diameter ≤ 3 cm; 2) had clear thin-section lung CT images on which a computer-aided diagnosis system could accurately identify the lesions; and 3) had complete clinical data and laboratory examination results. The exclusion criteria were as follows: patients who 1) received radiotherapy, chemotherapy, targeted therapy, immunotherapy, or radiofrequency ablation before surgery; 2) had an interval between preoperative examination and surgery exceeding one month; 3) had pathological types including mucinous AC, enteric AC, colloid AC, and fetal AC; and 4) had multiple lung nodules containing both MIP/SOL+ and MIP/SOL- nodules. This retrospective study was approved by the hospital's ethics committee (approval number 2023-068). The specific workflow is illustrated in Figure 1.

Chest CT examination and image analysis
The chest thin-section scans were performed using Siemens DECT or Philips 64-slice spiral CT scanners. Patients were placed in a supine position, and volumetric data were collected at the end of a deep inspiration with breath-holding. All parameters were derived from the plain CT images. The scan range extended from the lung apices to the adrenal glands. The scan parameters were set as follows: tube voltage ranged from 100 to 120 kV, tube current was automatically adjusted, matrix size was 512 × 512, slice thickness during acquisition was 5.00 mm, and the reconstructed image slice thickness was either 1.25 mm or 1.00 mm.
The morphological features of CT images were evaluated by two experienced radiologists (with 10 and 11 years of experience, respectively) in a double-blinded manner. Disagreements were resolved through discussion. Morphological features included: 1) Location: divided into left upper lobe, left lower lobe, right upper lobe, right middle lobe, and right lower lobe; 2) Nodule: categorized as ground-glass opacity (GGO)-containing nodules (pure GGO nodules and part-solid nodules) or solid nodules. GGO refers to areas in the lungs that appear hazy or translucent on high-resolution CT scans and do not obscure the underlying bronchial structures or pulmonary blood vessels; 3) Lobulation sign: uneven, scalloped edges around the nodule; 4) Spiculation sign: spiky or linear projections extending from the nodule's edge; 5) Vacuole sign: the presence of small air lucencies within the nodule; 6) Bronchial inflation sign: presence of air-filled bronchial structures within the nodule; 7) Vascular convergence sign: blood vessels converging towards the nodule; 8) Pleural indentation: indentation of the pleura or interlobar fissures near the nodule.
Subsequently, images were imported into a computer-aided diagnostic system (Chest-Lung: 3.3.1, developed by Shukun Network Technology Co., Ltd.), which automatically identified and analyzed nodules' quantitative histogram features. One radiologist collected the following data: 1) CT values: maximum, minimum, average, median, and standard deviation; 2) Grayscale histogram features: kurtosis (reflects the steepness of the gray value distribution in CT images), skewness (reflects the asymmetry of the grayscale distribution in CT images), and entropy (describes the disorder of the grayscale distribution in CT images); 3) Geometric features: compactness (describes how tightly the nodule's shape is packed), sphericity (indicates how spherical the nodule is), and max slice area (measures the largest cross-sectional area of the nodule); 4) Consolidation-to-tumor ratio (CTR): the proportion of the solid component within the entire nodule; 5) Energy: reflects the uniformity and frequency characteristics of the grayscale distribution; 6) 3D long axis (mm); 7) Total nodule volume (mm³); 8) Nodule mass (mg). The other radiologist verified whether the system correctly identified the nodules' range.
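The histogram parameters listed above can be illustrated with a minimal Python sketch. It is not the algorithm of the Chest-Lung software, whose exact definitions are not stated here. The function name `histogram_features`, the 25 HU bin width, the use of non-excess kurtosis, and the definition of energy as the sum of squared bin probabilities (some toolkits instead call this uniformity and define energy as the sum of squared intensities) are all assumptions made for illustration.

```python
import math
from collections import Counter


def histogram_features(hu_values, bin_width=25):
    """First-order features from the CT values (HU) of the voxels in one nodule.

    Assumed definitions for illustration only; a commercial CAD system
    may bin, normalise, or define these quantities differently.
    """
    n = len(hu_values)
    if n < 2:
        raise ValueError("need at least two voxels")
    mean = sum(hu_values) / n
    # Central moments (population form, dividing by n).
    m2 = sum((v - mean) ** 2 for v in hu_values) / n
    m3 = sum((v - mean) ** 3 for v in hu_values) / n
    m4 = sum((v - mean) ** 4 for v in hu_values) / n
    if m2 == 0:
        raise ValueError("all voxels share one value; shape features undefined")
    # Discretise into fixed-width bins and estimate bin probabilities.
    counts = Counter(math.floor(v / bin_width) for v in hu_values)
    probs = [c / n for c in counts.values()]
    return {
        "mean": mean,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,          # 0 for a symmetric histogram
        "kurtosis": m4 / m2 ** 2,            # 3 for a normal distribution
        "entropy": -sum(p * math.log2(p) for p in probs),
        "energy": sum(p * p for p in probs),  # assumed: sum of squared bin probabilities
    }
```

Under these definitions, a nodule whose voxels spread evenly over many bins has high entropy and low energy, whereas a homogeneous nodule concentrated in a few bins shows the opposite pattern.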

Pathological histological classification
According to the 8th edition of the TNM staging standard revised by the International Union Against Cancer, cases of stage IA invasive LUAC were included. Stage IA comprises T1aN0M0, T1bN0M0, and T1cN0M0: T1a indicates a tumor ≤ 1 cm in greatest dimension, T1b refers to a tumor > 1 cm but ≤ 2 cm, and T1c represents a tumor > 2 cm but ≤ 3 cm. N0 indicates no regional lymph node metastasis, and M0 indicates no distant metastasis (20). The pathological subtype classification of all cases of LUAC was determined by intermediate-level pathologists in the pathology department using the 5th edition of the WHO classification of thoracic tumors proposed in 2021. The specific subtype classification was described in increments of 5%, including lepidic, acinar, papillary, micropapillary, and solid types (21). We defined stage IA LUAC with ≥ 5% micropapillary and/or ≥ 5% solid components as MIP/SOL+, while the remaining cases were classified as MIP/SOL-.
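The size cut-offs and the grouping rule described above can be written as a short Python sketch. The function names `t_descriptor` and `mip_sol_group` are hypothetical and are used only to make the rules explicit; they are not part of any software used in the study.

```python
def t_descriptor(diameter_cm):
    """T1 sub-category from the greatest tumor dimension (cm), using the cut-offs quoted above."""
    if diameter_cm <= 0:
        raise ValueError("diameter must be positive")
    if diameter_cm <= 1:
        return "T1a"
    if diameter_cm <= 2:
        return "T1b"
    if diameter_cm <= 3:
        return "T1c"
    raise ValueError("tumors > 3 cm fall outside stage IA")


def mip_sol_group(micropapillary_pct, solid_pct, threshold_pct=5):
    """MIP/SOL+ if either high-risk component reaches the 5% threshold."""
    positive = micropapillary_pct >= threshold_pct or solid_pct >= threshold_pct
    return "MIP/SOL+" if positive else "MIP/SOL-"
```

For example, an 18 mm tumor with approximately 50% micropapillary component gives `t_descriptor(1.8) == "T1b"` and `mip_sol_group(50, 0) == "MIP/SOL+"`, whereas a 15 mm tumor composed only of lepidic and acinar patterns gives `mip_sol_group(0, 0) == "MIP/SOL-"`.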

Statistical analysis
Data analysis was performed using SPSS version 26.0 and R version 4.1.2 statistical software. Normally distributed or approximately normally distributed metric data were presented as mean ± standard deviation (SD) and compared by independent sample t-tests. Skewed metric data were presented as median [interquartile range (IQR)] and compared by Mann-Whitney U tests. Count data were presented as frequency and percentage (%) and compared using Pearson's chi-square test, continuity-corrected chi-square test, or Fisher's exact test. The data were randomly divided into a training set and a test set in a 7:3 ratio using the sample function in R software. LASSO regression was used to select variables in the training set. Logistic regression analysis was performed to select independent predictive factors for MIP/SOL components of stage IA LUAC and construct the models. Nomograms were used to visualize the models. The receiver operating characteristic (ROC) curve and area under the curve (AUC) were used to examine the models' performance. Calibration curves and the Hosmer-Lemeshow goodness-of-fit test were applied to evaluate calibration. Decision curve analysis (DCA) and clinical impact curves (CIC) were performed to evaluate the clinical effectiveness of the models. A p value < 0.05 was considered statistically significant.
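To make the discrimination metrics concrete, the following Python sketch computes the AUC as the probability that a randomly chosen MIP/SOL+ case receives a higher predicted score than a randomly chosen MIP/SOL- case, together with sensitivity and specificity at a given cut-off. The analyses in this study were run in SPSS and R; the function names and the toy data in the usage note are illustrative assumptions, not study data.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)
```

With the toy input `labels = [1, 1, 0, 0]` and `scores = [0.9, 0.4, 0.5, 0.1]`, three of the four positive-negative pairs are ranked correctly, so `roc_auc` returns 0.75. The pairwise loop is quadratic in sample size, which is acceptable for cohorts of a few hundred patients.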

Baseline characteristics
This study included a total of 376 patients with stage IA invasive LUAC (149 males and 227 females, aged 33-84 years). The data were randomly divided into a training set (n = 264) and a test set (n = 112) in a 7:3 ratio. All indicators were comparable between the two datasets (p > 0.05) (Supplementary Table S1).

Independent risk factors for MIP/SOL components in stage IA LUAC with tumors ≤ 3 cm
In the training set, patients were divided into MIP/SOL+ and MIP/SOL- groups based on the presence or absence of micropapillary/solid components. There were 202 cases in the MIP/SOL- group (71 males, median age of 62 [IQR: 53, 67]) and 62 cases in the MIP/SOL+ group (31 males, median age of 61 [IQR: 53, 66]). Clinical data showed statistically significant differences in sex and CEA levels between the two groups (both p < 0.05). There were no statistically significant differences in age, smoking history, family history of lung cancer, presence of cavitation, NSE, CYFRA21-1, and ProGRP levels between the groups (all p > 0.05), as detailed in Table 1. Typical cases from both groups are illustrated in Figure 2.
CT morphological features and histogram parameters were screened using LASSO regression (Figure 3). LASSO regression with five-fold cross-validation was used to determine the optimal penalty coefficient λ. In Figure 3B, lines were drawn at λ (0.044) and λ + SE (0.115), respectively. Then, λ (0.044) was used as the benchmark to select six factors with non-zero regression coefficients as potential predictive factors for MIP/SOL components of stage IA LUAC: GGO, average CT value, 3D long axis, energy, entropy, and kurtosis. The above meaningful clinical and CT characteristics were included in a binary multifactor logistic regression analysis, and the backward stepwise regression method was used to finally screen out four independent predictive factors (all p < 0.05): GGO, average CT value, entropy, and kurtosis (Table 2).
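Because the odds ratio for average CT value is close to 1, its practical size is easier to judge over a larger change in CT value. The Python sketch below converts the odds ratios quoted in this paper for the ≤ 3 cm model into log-odds coefficients and rescales an odds ratio to a chosen change in the predictor. It assumes that each odds ratio refers to a one-unit increase in the predictor (1 HU for average CT value), which is not stated explicitly, and it works from rounded published values, so the results are approximate. The model intercept is not reported here, so no predicted probabilities are computed.

```python
import math

# Odds ratios quoted in the text for the <= 3 cm model (rounded published values).
ODDS_RATIOS = {
    "ggo": 0.244,
    "entropy": 1.748,
    "average_ct_value": 1.002,
    "kurtosis": 1.240,
}


def log_odds_coefficients(odds_ratios):
    """Logistic regression coefficients implied by the odds ratios: beta = ln(OR)."""
    return {name: math.log(value) for name, value in odds_ratios.items()}


def odds_ratio_for_change(odds_ratio, delta):
    """Odds ratio for a change of `delta` units, assuming OR is per one unit."""
    return odds_ratio ** delta


if __name__ == "__main__":
    for name, beta in log_odds_coefficients(ODDS_RATIOS).items():
        print(f"{name}: beta = {beta:.4f}")
    per_100_hu = odds_ratio_for_change(ODDS_RATIOS["average_ct_value"], 100)
    print(f"OR per 100 HU increase in average CT value: {per_100_hu:.2f}")
```

Under the per-unit assumption, an odds ratio of 1.002 per HU corresponds to roughly 1.22 per 100 HU, which is why a coefficient that looks negligible per unit can still matter across the density range spanned by ground-glass and solid nodules.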

Nomogram construction, evaluation, and validation for MIP/SOL components in LUAC with tumors ≤ 3 cm
A nomogram prediction model was constructed based on the results of the multifactor logistic regression analysis. The AUC of the nomogram prediction model in the training set was 0.816 (95% CI: 0.756-0.877), the sensitivity was 0.902, and the specificity was 0.611. The AUC in the test set was 0.789 (95% CI: 0.689-0.889), the sensitivity was 0.818, and the specificity was 0.596. These results indicate that the prediction model has decent discriminatory ability. The calibration curves of the training set and test set show that the predicted probability of the model is close to the actual probability, indicating good calibration. The Hosmer-Lemeshow test yielded χ² = 9.785, p = 0.280 in the training set and χ² = 3.898, p = 0.866 in the test set, indicating good model fit. The DCA and CIC results demonstrated the nomogram prediction model's decent clinical applicability (Figure 4).
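The quantity plotted in a decision curve is the net benefit at each threshold probability. The Python sketch below uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. How the DCA was implemented in R for this study is not described, so the function names and the toy data in the usage note are assumptions for illustration only.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient as if MIP/SOL+."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

With the toy input `labels = [1, 1, 0, 0]` and `probabilities = [0.9, 0.4, 0.5, 0.1]` at a threshold of 0.2, the model's net benefit is 0.4375 versus 0.375 for treating everyone. A model is considered useful over the range of thresholds where its curve lies above both the treat-all and the treat-none (net benefit 0) strategies.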

Construction, evaluation, and validation of prediction models for MIP/SOL components in LUAC with tumors ≤ 2 cm
A total of 293 patients with stage IA invasive LUAC with tumor diameters ≤ 2 cm were included. The data were randomly divided into a training set (n = 208) and a test set (n = 85) in a 7:3 ratio. All indicators were comparable between the two datasets (p > 0.05) (Supplementary Table S2). There were no statistically significant differences in clinical data indicators in the training set (p > 0.05) (Supplementary Table S3). LASSO regression was used to screen CT histogram features and morphological features (Supplementary Figure S1). Finally, λ (0.050) was used as the standard to select five factors with non-zero regression coefficients: GGO, location, average CT value, entropy, and median CT value. Subsequently, through backward stepwise binary multifactor logistic regression analysis, three independent predictive factors (all p < 0.05) were selected: GGO, average CT value, and entropy (Supplementary Table S4). A nomogram prediction model was constructed based on the multifactor logistic regression analysis. The AUC in the training set was 0.811 (95% CI: 0.731-0.891), the sensitivity was 0.667, and the specificity was 0.867. The AUC in the test set was 0.833 (95% CI: 0.733-0.932), the sensitivity was 0.640, and the specificity was 0.917. Calibration curves and Hosmer-Lemeshow test results showed that the model had good calibration. The DCA and CIC curves indicated that the nomogram prediction model has decent clinical applicability (Figure 5).

Discussion
Surgical resection is the most effective treatment for stage IA LUAC; however, postoperative recurrence remains a concern (9,22). Existing studies indicate that high-risk pathological features such as MIP/SOL components, spread through air spaces, lymphovascular invasion, and visceral pleural invasion in stage IA LUAC significantly impact prognosis, making early and accurate identification crucial for optimizing clinical decision-making (23-25). MIP/SOL components are more invasive compared to other subtypes like lepidic, acinar, and papillary (6). They are independent predictors of postoperative recurrence in stage IA LUAC (26). However, there is currently no reliable non-invasive method to accurately identify MIP/SOL components preoperatively. Previous studies have demonstrated the correlation between CT images and pathology, as well as prognosis (16,17). For instance, Kim et al. pointed out that spiculation in CT images is associated with higher recurrence rates and poorer survival rates (16). Similarly, Cai et al. found that GGO components in CT images are related to the lepidic predominant subtype of lung adenocarcinoma (27). Additionally, CT histograms can extract features from CT images that are difficult for the human eye to detect, thereby improving predictive accuracy (19). This study aims to establish predictive models by combining CT morphological features and histogram parameters to help clinicians preoperatively identify MIP/SOL components. The results showed that, for LUACs ≤ 3 cm or ≤ 2 cm, the nomogram models effectively predict MIP/SOL components. Among the predictive factors, GGO, average CT value, and entropy were consistently important.
This study found that stage IA LUACs ≤ 3 cm and ≤ 2 cm containing GGO were less likely to exhibit MIP/SOL components (OR = 0.244, p = 0.001), consistent with the findings of Katsumata et al., who also noted that lesions with GGO are more likely to be low-risk (28). Additionally, a validation study based on data from the Japanese Clinical Oncology Group study JCOG0201 by Hattori et al. found that ground-glass nodules (pure ground-glass nodules and partially solid nodules) had significantly higher 5-year overall survival rates compared to solid nodules (95.1% vs. 81.1%), regardless of the size of the solid component (29). The formation of GGO is likely related to mild infiltration of tumor cells into the alveolar wall, local fibrosis, and scar formation, characteristics associated with low-grade malignancy, slower growth, and lower cellular proliferation activity (27,30). In contrast, MIP and SOL components represent more invasive and malignant subtypes, with active cell proliferation and rapid growth, leading to the formation of solid portions. Therefore, GGOs are less likely to contain MIP/SOL components, highlighting the biological and cytological differences between these pathological subtypes. Additionally, average CT value and entropy were also independent predictive factors for MIP/SOL components in stage IA LUAC in the subgroups of ≤ 3 cm and ≤ 2 cm. The average CT value was significantly higher in the MIP/SOL+ group compared to the MIP/SOL- group (OR = 1.002, p = 0.002). This finding is consistent with the notion that the average CT value reflects the overall density of the lesion, with MIP/SOL components more likely to appear as solid nodules on CT images. Yoshida et al.'s study supports this, showing that MIP components are more common in solid nodules than in pure GGOs or subsolid nodules (31). A meta-analysis further confirmed that the average CT value has good diagnostic performance in predicting the invasiveness of GGOs (32). Additionally, entropy measures the disorder of the grayscale distribution in CT images, with higher values indicating more chaotic and irregular grayscale distributions. This study found that higher entropy values were associated with a higher risk of MIP/SOL components (OR = 1.748, p = 0.004). Qiu et al. also identified entropy as an independent predictor for quantifying the invasiveness of stage IA LUAC using CT texture features (33). This may be due to the higher malignancy of MIP/SOL components, leading to increased tissue heterogeneity and grayscale irregularity in the tumor, resulting in higher entropy values (34).

FIGURE 2 (A, B) Case 1: male, 56 years old, lung AC in the MIP/SOL+ group; irregular solid nodule in the middle lobe of the right lung, with a maximum diameter of about 18 mm, lobulated and spiculated edges, and vacuole signs within the lesion; pathology (HE × 10) showed invasive lung AC, with micropapillary type accounting for approximately 50% and acinar type accounting for approximately 50%. (C, D) Case 2: male, 63 years old, lung AC in the MIP/SOL- group; mixed ground-glass nodule in the upper lobe of the left lung, with a maximum diameter of about 15 mm, lobulated edges, and adjacent pleural depression; pathology (HE × 10) showed invasive lung AC, with approximately 40% lepidic structures and 60% acinar type.

FIGURE 3 Lasso regression analysis diagram in the training set for tumors ≤ 3 cm. (A) Coefficient path of Lasso regression. (B) Lasso regression cross-validation results.
It is worth noting that the results for kurtosis were inconsistent between the ≤ 3 cm and ≤ 2 cm subgroups. In the ≤ 3 cm subgroup, kurtosis was an important predictive factor for MIP/SOL components in stage IA LUAC (OR = 1.240, p = 0.030). Kurtosis describes the steepness of the grayscale value distribution in CT images, with higher values indicating steeper distributions and suggesting denser structures within the nodules. This study found that higher kurtosis was associated with an increased risk of MIP/SOL components, possibly due to the high malignancy, active cell proliferation, and rapid growth of MIP/SOL components, leading to dense cell accumulation. This finding is similar to the results of Alpert et al., who found that kurtosis has statistical significance in distinguishing between different invasive subtypes of LUAC (35). However, in the ≤ 2 cm subgroup analysis, kurtosis did not demonstrate the same predictive capability. We speculate that this may be because smaller tumors have less internal tissue heterogeneity and smaller differences in grayscale distribution, making kurtosis differences less apparent. As tumors grow larger, tissue heterogeneity increases, leading to greater differences in grayscale distribution and making kurtosis a significant predictive factor. The inconsistency in subgroup analysis results highlights the necessity of developing predictive models tailored to different tumor sizes. Our study constructed nomogram models for stage IA lung adenocarcinoma patients based on tumor size (≤ 3 cm and ≤ 2 cm). The ROC curves show that the models have decent diagnostic performance.
Recent studies have highlighted the importance of predictive models in assessing the invasiveness of lung adenocarcinoma and the recurrence of low-risk resected stage I lung adenocarcinoma (36,37). Meanwhile, researchers have developed nomogram models based on radiomic features for predicting the invasiveness of LUAC, demonstrating high sensitivity and specificity (38). However, radiomic data processing is complex, reproducibility is challenging, and clinical applicability is limited (39). The nomogram models proposed in this study, based on CT histogram parameters from a computer-aided diagnosis system, offer simple data acquisition and high stability (40). The DCA and CIC results demonstrated the models' decent clinical applicability. Medical staff can use the nomogram models to preoperatively calculate the probability of MIP/SOL components in each LUAC patient, providing a basis for clinical decision-making and personalized treatment planning.
However, this study still has some limitations. First, the single-center design may have introduced center-specific bias. Second, selection bias may exist because of the retrospective analysis. To strengthen the credibility of the conclusions, more rigorous designs, including multi-center and prospective cohort studies, will be used in the future to further confirm the research results. Additionally, future research will continue to explore the relationship between other high-risk pathological features and CT images in stage IA lung adenocarcinoma.

Conclusion
In summary, the nomogram models established for lung adenocarcinomas ≤ 3 cm and ≤ 2 cm demonstrated decent accuracy and clinical applicability in predicting MIP/SOL components. GGO, average CT value, and entropy were consistent predictors of MIP/SOL components in stage IA lung adenocarcinoma in both the ≤ 3 cm and ≤ 2 cm groups. The nomogram prediction models have potential diagnostic value for the non-invasive identification of MIP/SOL components in early-stage LUAC, providing a valuable tool for clinical decision-making and personalized treatment planning.

FIGURE 4 (A) Model 1, nomogram prediction model for the risk of MIP/SOL in stage IA lung AC with tumors ≤ 3 cm. (B) Training set and test set ROC curves. (C) Training set and test set DCA curves, showing that when the threshold probabilities of the training set and test set are 0.08-0.78 and 0.12-0.78, respectively, the patient's net benefit rate is greater than 0. (D, E) Training set and test set calibration curves. (F, G) Training set and test set CIC curves, indicating that for risk thresholds > 0.40, the model's predictions align closely with the actual high-risk MIP/SOL population. GGO, ground-glass opacity; CT, computed tomography; MIP, micropapillary; SOL, solid.

TABLE 1
Comparison of clinical data indicators between the two groups in the training set for tumors ≤ 3 cm.

TABLE 2
Multivariate logistic regression analysis for predicting MIP/SOL components in tumors ≤ 3 cm.