Prediction of lymphovascular invasion of gastric cancer based on contrast-enhanced computed tomography radiomics

Background Lymphovascular invasion (LVI) is a significant risk factor for lymph node metastasis in gastric cancer (GC) and is closely related to the prognosis and recurrence of GC. This study aimed to establish clinical models, radiomics models and combination models for the diagnosis of GC vascular invasion. Methods This study enrolled 146 patients with GC proved by pathology and who underwent radical resection of GC. The patients were assigned to the training and validation cohorts. A total of 1,702 radiomic features were extracted from contrast-enhanced computed tomography images of GC. Logistic regression analyses were performed to establish a clinical model, a radiomics model and a combined model. The performance of the predictive models was measured by the receiver operating characteristic (ROC) curve. Results In the training cohort, the age of LVI negative (−) patients and LVI positive (+) patients were 62.41 ± 8.41 and 63.76 ± 10.08 years, respectively, and there were more male (n = 63) than female (n = 19) patients in the LVI (+) group. Diameter and differentiation were the independent risk factors for determining LVI (−) and (+). A combined model was found to be relatively highly discriminative based on the area under the ROC curve for both the training (0.853, 95% CI: 0.784–0.920, sensitivity: 0.650 and specificity: 0.907) and the validation cohorts (0.742, 95% CI: 0.559–0.925, sensitivity: 0.736 and specificity: 0.700). Conclusions The combined model had the highest diagnostic effectiveness, and the nomogram established by this model had good performance. It can provide a reliable prediction method for individual treatment of LVI in GC before surgery.


Introduction
Gastric cancer (GC) is one of the most common malignant tumours, with a high incidence in Asia, South America and Africa.It is estimated that there are 1 million new cases of GC and 70,000 deaths globally each year (1,2).According to the World Health Organization classification, gastric adenocarcinoma is the most common type of GC.Most gastric adenocarcinomas are thought to develop through a gradual progression from Helicobacter pyloriinduced chronic gastritis to atrophic gastritis, intestinal metaplasia, dysplasia and ultimately adenocarcinoma (3).At present, the Lauren and Borrmann classifications are commonly used to categorise them.Furthermore, some patients with GC have both intestinal and diffuse types in the GC specimen, which is separated into mixed types (4).The widely used Borrmann classification system divides advanced GC into four types depending on its gross appearance: mass type (type 1), ulcerative type (type 2), infiltrative and ulcerative type (type 3) and diffusely infiltrative type (type 4) (5).These types can be easily defined by endoscopic morphologic assessment before preoperative therapy, as well as by gross histopathologic assessment after surgery.
Lymphovascular invasion (LVI) is defined as the lymphatic vessels and/or blood vessel invasion of malignant tumour cells within the primary tumour and surrounding tissues.Lymphovascular invasion plays an important role in cancer cell spreading and lymph node metastasis, and it is associated with an increased risk of micrometastasis (6).Studies have revealed that LVI is an independent prognostic factor for predicting clinical outcomes for patients with GC (6).Although LVI is considered to be a key prognostic factor of unsatisfactory survival outcomes in various cancers, accurate identification of LVI status before operation is still difficult, as LVI is mainly found through postoperative pathology.
Lambin et al. first put forward the concept of radiomics in 2012 (7).With high-throughput computing, it is now possible to rapidly extract innumerable quantitative features from tomographic images (computed tomography [CT], magnetic resonance [MR] or positron emission tomography [PET] images).Recent research has found that radiomics methods exhibit high accuracy and sensitivity in diagnosing GC.By analysing a variety of imaging features, researchers can more accurately locate and assess tumours.Radiomics contributes to the early detection of lesions, including tiny tumours and early-stage malignancies.This is crucial for improving treatment success rates and survival rates (8,9) and can provide personalised diagnosis and treatment plans for each patient (9,10).Research has found that its advantages are typically non-invasive, causing minimal harm to patients, and it is suitable for repeated monitoring and long-term follow-up.Combining different types of imaging techniques, such as CT, MRI and PET, can provide comprehensive information, aiding in a more thorough understanding of lesions and utilising computer algorithms for image analysis; this enables efficient and automated data processing, thus relieving the workload of healthcare professionals (11).However, it also has some disadvantages, such as (1) some advanced imaging techniques may lead to higher costs, limiting their widespread adoption in certain regions and healthcare systems, (2) radiomics methods face challenges in standardisation; differences in devices and methodologies across studies may affect result consistency and (3) the vast amount of medical imaging data raises privacy concerns, necessitating strict data security measures (12,13).With its development, it has now been applied to any medical research that can image a disease or condition (14).At present, there are few studies on the use of radiomics to investigate LVI of GC.Therefore, this study aims to develop a clinical model, a radiomics model and a combination model based on contrast-enhanced CT to predict LVI of GC and to evaluate the potential clinical applicability of the model.

Patients
The sample size calculation formula was as follows: where n is the required sample size, Z is the Z-score corresponding to the desired confidence level (1.96 for a 95% CI), p is the estimated proportion of the population with a particular characteristic and E is the margin of error.
This study was approved by the Institutional Review Board of our hospital.The informed consent requirement was waived.A total of 423 consecutive patients with GC at the hospital between January 2019 and May 2022 were enrolled in this retrospective study.The inclusion criteria were as follows: (1) patients received radical gastrectomy with R0 resection and D2 lymphadenectomy; (2) contrast-enhanced abdominal CT images were acquired within 7 days before the operation; (3) GC diagnosis was histologically confirmed; (4) the image quality was satisfactory for analysis.The exclusion criteria were as follows: (1) received antitumour treatment before surgery; (2) the lesion was too small to be recognised by CT; (3) combined with other malignant tumours.In total, 146 patients were enrolled.All enrolled patients were randomly allocated to either the training cohort (n = 117) or the validation cohort (n = 29) at a ratio of 8:2.The clinical factors, including age, gender, tumour diameter, differentiation status, Borrmann classification, Lauren type and the test results of four serum biomarkers, namely carbohydrate antigen 125 (CA125), carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA) and alphafetoprotein (AFP), were obtained.An overview of the study's workflow is shown in Figure 1.

Computed tomography examination
All patients were asked to fast for 8 hours, drink 800-1,000 ml of water and practice holding their breath before the CT examination.Patients were administered scopolamine hydrochloride intramuscularly before the examination to reduce gastrointestinal motility artefacts.The CT examination was performed using Revolution CT (GE Medical Systems) and Aquilion ONE CT (CANON Medical Systems) devices.All patients were in the supine position, and the scan covered the upper or the entire abdomen.The scanning parameters were as follows: tube voltage 120 kVp, 375 reference mAs with an automated tube current modulation system, slice thickness 5.0 mm, slice interval 5.0 mm, matrix 512 × 512, rotation time 0.5 s and reconstruction section thickness 1.25 mm.After an intravenous injection of contrast medium (3.0-3.5 ml/s ioversol, 320 mg/ml) via a syringe pump, the arterial phase (AP) and venous phase (VP) scans were acquired following delays of 30-35 s and 70-75 s, respectively.

Region of interest segmentation and feature extraction
The AP and VP images of all patients were imported into the image processing software 3D Slicer 4.11.0 (https://www.slicer.org/) in the DICOM format.Two doctors unaware of the pathology manually delineated the upper and lower consecutive three-layer regions of interest (ROIs) centred on the largest cross-section of the tumour.In case of a disagreement, the two doctors discussed with each other until they reached a consensus.Using a spline interpolation algorithm, all CT images were resampled to the same size (1 × 1 × 1 mm), regardless of the scanner from which they were acquired.A total of 1,702 radiomic features were extracted from the AP and VP CT images for each patient.These features included shape, first order, grey level co-occurrence matrix, grey level size zone matrix, grey level run length matrix, grey level dependence matrix and neighbouring grey tone difference matrix features.

Feature selection
To identify robust and reliable radiomics features, feature selection was performed in the following five steps.First, all the features were centralised and standardised.Second, the intra-class correlation coefficient (ICC) was calculated for the re-segmentation data.All ICC values ≥0.75 were reserved for stable features.Third, the independent sample t-test or Mann-Whitney U test was applied to compare the features between the LVI positive (+) group and the LVI negative (−) group for selecting the potentially important features.Fourth, the least absolute shrinkage and selection operator (LASSO) method was used to reduce the dimension.The variables were selected through the regularisation process, and the complexity was adjusted simultaneously to obtain the optimal feature subset to improve the accuracy and repeatability of the radiomics prediction model.This study tuned the regularisation parameter (l), and tenfold cross-validation was used to select features.The best feature subset was obtained using the minimum variance model principle.Finally, the Pearson correlation coefficient of the retained features was analysed to verify whether there was a high correlation between the retained features (i.e.whether there was collinearity).The radiomics signature (RS) was built based on the selected features, and each patient's corresponding radiomics score was calculated.

Model construction
The RS was built based on the selected features, and each patient's corresponding radiomics score was calculated.Based on the selected features, the radiomics model was established by a multivariate logistic regression algorithm.The statistically significant risk variables (radiomics score and clinical factors) obtained from the univariate logistic regression analysis were then entered into the multivariate analysis to establish the clinical and combined models.A nomogram was generated for the combined model visualisation, graphical evaluation of variable importance and calculation of predictive accuracy.The Hosmer-Lemeshow test was performed to assess the goodness-of-fit of the nomogram.A calibration curve, obtained by plotting the actual LVI probability against the nomogram-predicted probability of LVI, was used to assess the calibration of the nomogram.The decision curve was used to evaluate the clinical utility of the nomogram.

Statistical analysis
The statistical analysis was conducted using the SPSS 25.0 and R 4.2.1 (https://www.R-project.org).A p-value <0.05 was set, indicating a statistically significant difference.The differences between continuous clinical variables (age, diameter, AFP, CEA, CA125 and CA19-9) were evaluated by the independent sample ttests or Mann-Whitney U test; these variables were expressed as mean ± standard deviation or median and interquartile range, as appropriate.The chi-squared test was used to compare dichotomous clinical variables (gender, differentiation, Borrmann classification and Lauren type) between LVI (+) and LVI (−); these variables were expressed as numbers and percentages.Following that, univariate and multivariate logistic regression analyses were performed to identify the relationship between clinical factors and LVI status.The LASSO analysis was conducted using the 'glmnet' package in the R software, and the nomogram, calibration curve and decision curve were made using the 'rms' and 'rmda' packages.Diagnosis efficacy was assessed using the receiver operating characteristic (ROC) curve with the area under the curve (AUC).In the training cohort, the poor differentiation rate of LVI (−) patients was higher than that of LVI (+) patients, and there was a significant difference in Borrmann classification between LVI (+) and LVI (−) (p < 0.05), and there was a significant difference in Lauren type between LVI (+) and LVI (−) (p < 0.05).Other featuressex, age, AFP, CEA, CA125, CA19-9 and diameterhad no statistical difference between groups.In the validation cohort, there was a significant difference in diameter between the two groups (p < 0.05), and the other factors were not statistically significant.Whether in the training or validation cohorts, the tumour diameter of LVI (+) patients was more extensive than that of LVI (−) patients.

Feature selection
A total of 1,702 radiomic features were initially extracted from the segmented CT images of the GC ROI, and 1,389 features with ICC ≥0.75 were retained.After the independent sample t-test or Mann-Whitney U test, 97 features with p <0.05 were retained.To reduce dependency and redundancy, LASSO regression was used to reduce the dimensions of these features (Figures 2A, B).After adjusting the parameter l by tenfold cross-validation, the optimal feature subset containing 8 features was finally obtained for model construction (Table 2, Figure 2C).After the Pearson correlation coefficient analysis, no high correlation was found between the  shows the color depth corresponding to different correlation coefficients, and the higher the color is, the greater the correlation is; as can be seen from the figure, the maximum correlation coefficient between each feature is 0.63, so there is no highly correlated feature pair.
retained features (the maximum correlation coefficient was 0.63); therefore, there was no redundant feature (Figure 2D).The retained 8 imaging features were calculated according to Rad-score, and finally the RS of each patient was obtained (Figure 3A).

Establishment of model and evaluation of diagnostic efficiency
Univariate and multivariate logistic regression determined that the diameter and differentiation were independent risk factors for LVI (Table 3), and these two variables were included in logistic regression to establish the clinical model.In addition, the radiomics model was established based on the RS by using a multivariate logistic regression algorithm, and the combined model was developed by integrating the RS and clinical independent risk factors.In the training cohort, the AUC values of the clinical model, the radiomics model and the combined model were 0.709 (95% CI: 0.613-0.805),0.852 (95% CI: 0.783-0.921)and 0.853 (95% CI: 0.784-0.920),respectively.In the validation cohort, the AUC values of three corresponding models were 0.468 (95% CI: 0.239-0.697),0.638 (95% CI: 0.391-0.881)and 0.742 (95% CI: 0.559-0.925),respectively.The combined model showed a higher predictive capability in the training cohort; therefore, the combined model was chosen as the final model and presented as a nomogram.The specific performances of the models are shown in Table 4.The ROC curves of the three models are illustrated in Figures 3B, C, and the nomogram is presented in Figure 4A.

Calibration and clinical usefulness analysis
The calibration curve analysis demonstrated good agreement between prediction and observation for the nomogram (Figure 4B).The decision curve analysis (DCA) for the radiomics model, the clinical model and the combined model are presented in Figure 4C.The DCA demonstrated the effectiveness of the three models in clinical decisionmaking; the clinical application value of the combined model is seen to be higher than that of the radiomics model and clinical model, and it can provide better net benefits (Figure 4C).

Discussion
Among the clinical features, endoscopic biopsy and tumour markers have become the routine preoperative examination of GC (15,16).However, at present, LVI is mainly found by postoperative pathology, and accurate preoperative identification of LVI status remains difficult.Radiomics can be used for GC screening, particularly for early detection in high-risk populations.Usually, the larger the tumour diameter (17) and the lower the degree of differentiation are, the higher the tumour invasiveness and the higher the risk of LVI.In a study of 2,090 patients, Fujikawan et al. (18) found that the maximum diameter of the tumour is one of the factors affecting LVI, which agrees with the findings of this study.In the study of the relationship between early GC vascular invasion and prognostic factors, Choi et al. ( 19) concluded that the degree of differentiation of early GC was statistically significant between the LVI (−) group and the LVI (+) group.In this study, we demonstrated that the tumour's maximum diameter and differentiation were independent risk factors of GC LVI.This study also found that Borrmann typing is not a factor affecting the positive rate of LVI, which is contrary to the results of some studies (20,21).The five tumour markers included in this study have no statistical significance after statistical analysis, which is different from the results of Ren (22) and Chen (23).The reason may be different research participants (i.e.different populations in different areas) or it may be caused by the small sample size of this study.
Radiomics is an emerging analytical method to improve the accuracy and sensitivity of GC diagnosis by assessing multiple features of medical images.It helps in identifying tumours early, including small or early malignancies, and is important in improving treatment efficiency and patient survival (8,9).Although there are few studies on LVI of GC by radiomics, its   This study has several limitations.First, it is a single-centre retrospective study without multicentre validation; future studies require multicentre data and prospective designs to evaluate current studies.Second, the sample size of this study is small, and the resulting sample error results in the prediction efficiency of the training set being higher than that of the verification set.Third, there is no 3D drawing of the whole tumour when sketching the ROI.Some studies (33,34) conclude that the model of drawing a 3D ROI has better performance; therefore, in future research, we should pay attention to the 3D ROI of the tumour.

Conclusion
Based on the preoperative clinical features of GC and enhanced CT images, the imaging and clinical models for predicting vascular invasion of GC and the combined model combined with imaging and clinical features were established and verified.It was found that the combined model offered the highest diagnostic efficiency.It was found that the DCA analysis result of the nomogram was better, which was beneficial in improving the clinical decision-making efficiency of patients with GC.However, experiments based on multicentre retrospective verification and prospective design of large samples of 3D interest need to be further studied and verified.

FIGURE 2 (
FIGURE 2 (A) LASSO the two vertical dotted lines in figure a represent the logarithm l of the minimum mean square error (lambda.min)(left dashed line) and the logarithmic l (right dashed line) of the minimum distance standard error (lambda.1se);lmabda.min is the best value, and lambda.1se is a model with excellent performance and the least number of independent variables.(B) Each curve in b represents the changing trajectory of each independent variable coefficient, the ordinate is the coefficient value, and the Abscissa is the number of non-zero coefficients in the model at this time.With the continuous increase of the parameters, the coefficient is finally compressed to a variable of 0, which shows that it is more important.The dotted line in the graph is the number of features under the lambda.1separameter.(C) The vertical coordinate of the feature weight diagram represents the feature name, and the Abscissa represents the coefficient value; Coefficients: coefficient.(D) The scale on the right side of the correlation coefficient heat mapshows the color depth corresponding to different correlation coefficients, and the higher the color is, the greater the correlation is; as can be seen from the figure, the maximum correlation coefficient between each feature is 0.63, so there is no highly correlated feature pair.
application is promising.Some articles in imaging science also show that wavelet changes are helpful in imaging analysis(24)(25)(26)(27).The combined model is established based on clinical characteristics and RS, and RS is calculated according to the RAD score.No matter the verification set or the test set, the combined model has the largest AUC value and the highest diagnostic efficiency.Therefore, by combining the maximum diameter and differentiation of the tumour with the RS, the nomogram can be established, and the calibration curve and DCA analysis can be carried out; the results of this procedure are encouraging.Li(28) collected 1,062 patients with GC based on enhanced CT imaging and established an imaging group model and a deep learning model to study the preoperative prediction of LVI status in patients with GC.Unlike the model established by logistic regression in this study, it used other machine learning (ML) models, namely random forest and support vector machine.Yardımcı et al. (29) established eight ML models based on CT to predict LVI for tubular gastric adenocarcinoma, although they did not use the logistic regression method.Meng et al. (30) included the task of predicting LVI in the multicentre comparative study of two-dimensional (2D) and three-dimensional (3D) CT imaging features of GC.The results showed that the AUC of the model established with 3D features in the training verification set was 0.618 and 0.615, which was lower than the AUC values of 0.704 and 0.677 for 2D features.Unlike the above study, this study selected the largest level of the tumour and its upper and lower layers when outlining the ROI.Chen et al. (31) carried out a retrospective analysis of 160 patients with GC.Three predictive models were established according to the imaging characteristics of arterial, venous and biphasic images, and then three RAD scores were obtained by multivariate logistic regression analysis.Four predictive models were established by combining the three RAD scores with clinical risk factors, and a total of seven predictive models were constructed.The nomogram for predicting LVI was established.In this study, the RAD score was established by using the characteristics of biphasic images and then combined with clinical factors, the nomogram was established and the decision curve (which is not found in the above research) was analysed; the results were positive.Fan et al. (32) used different ML classifiers with enhanced CT, PET/CT and clinical variables to establish a model to predict the LVI state of GC before operation.Three-dimensional manual segmentation was used to extract imaging features from PET and VP CT images.This study aimed to extract imaging features from AP and VP images.

FIGURE 4 (
FIGURE 4 (A) The nomogram consists of the maximum diameter of Diameter tumour; the degree of Differentiation; and RS imaging tagging.(B) The calibration curve of the nomogram this chart shows that there is a high consistency between the observation and prediction results of the nomogram.(C) DCA decision curve validation of three model's clinical decision-making.

Table 1
lists the clinical factors of the training and validation cohort patients.A total of 146 patients were included in this study, including 82 patients with LVI (+) and 64 patients with LVI (−) Among them, there were more male (n = 63) than female (n = 19) patients in the LVI (+) group.

TABLE 1
Analysis of clinical characteristics.

TABLE 2
Feature names and weight coefficient results.

TABLE 4
ROC diagnostic performance of the model.

TABLE 3
Univariate and multivariate logistic regression analysis of clinical characteristics.