Preoperative Prediction of Lymphovascular Invasion of Colorectal Cancer by Artificial Neural Network

Background: Lymphovascular invasion (LVI) is considered important for metastasis of colorectal cancer (CRC). However, there is still no effective method to predict LVI before surgery. Our research aimed to construct an artificial neural network (ANN) for the pre-operative prediction of LVI. Methods: We obtained blood indexes and LVI status (confirmed by pathological examination) of 288 CRC patients from a tertiary hospital in China. An ANN and a logistic regression model were constructed based on 185 randomly chosen CRC patients (training group). The remaining 103 CRC patients were used to test the ANN and the logistic model (validation group). Receiver operating characteristic (ROC) curve analysis and decision curve analysis (DCA) were performed to assess the accuracy of the constructed models. Results: In the training group, the area under the curve (AUC) of the ANN was higher than that of the logistic model (0.832 vs 0.692). The ANN correctly predicted 92% of LVI cases, whereas the logistic model predicted only 56%. Similar results were observed in the validation group. Conclusions: Our ANN showed higher accuracy than a conventional linear model. An ANN based on blood indexes may be of value for pre-operative prediction of LVI.

Background
CRC is one of the most common digestive tract tumors; nearly one million people develop CRC every year, with a mortality rate of 33% (1). Recurrence and metastasis are important causes of death from CRC. LVI has been reported to be associated with poor differentiation, unfavorable survival, and lymph node metastasis (2,3). LVI is commonly detected by pathological examination after surgery. However, pathological findings become available only after the operation, so they cannot be used to anticipate metastasis and recurrence at an early stage (4). A method to predict LVI before surgery is therefore urgently needed.
Traditional clinical prediction models are mainly based on linear regression. Although such models work well for small datasets with simple relationships, it is difficult to fit a linear regression model to nonlinear data or to represent highly complex relationships (5). ANNs address these shortcomings of traditional linear regression through self-adaptive, self-organizing, and real-time learning. An ANN is a nonlinear, adaptive information-processing system composed of a large number of interconnected processing units (6). ANNs are increasingly used in biology and medicine (7,8). Our research used common blood indexes from routine preoperative or physical examinations to construct an ANN for pre-operative prediction of LVI, with the aim of supporting early-stage surveillance of CRC.

This study included 408 CRC patients from Shanghai Pudong Hospital who underwent surgery for CRC between April 2017 and April 2020. Patients were excluded if they met any of the following criteria: anticancer treatment received outside Shanghai Pudong Hospital; missing information on lymphovascular invasion, essential baseline characteristics, essential serological parameters, or follow-up; or receipt of any other treatment. A total of 288 eligible CRC patients were finally enrolled in the study (Fig. 1). Clinical data (sex, age, and body mass index), pre-operative blood indexes (alanine aminotransferase, carcinoembryonic antigen, carbohydrate antigen 19-9 (CA199), etc.), and postoperative pathological data (LVI and histologic grade) were obtained from medical records. LVI was determined by D2-40 immunohistochemistry (IHC) combined with hematoxylin-eosin (HE) staining.
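The abstract states that 185 of the 288 eligible patients were randomly assigned to the training group and the remaining 103 to the validation group. The sketch below shows one way to make such a split reproducible with Python's standard `random` module; the helper name `split_cohort`, the seed, and the use of simple (unstratified) randomization are assumptions, since the allocation procedure is not described.

```python
import random


def split_cohort(patient_ids, n_train, seed=2020):
    """Randomly partition patient IDs into training and validation groups.

    The seed is a hypothetical value chosen only to make the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 288 eligible patients: 185 for training, the remaining 103 for validation.
train_ids, valid_ids = split_cohort(range(1, 289), n_train=185)
print(len(train_ids), len(valid_ids))  # 185 103
```

Fixing the seed makes the allocation reproducible; stratifying the shuffle by LVI status would be an alternative design if balanced outcome proportions between the groups were required.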

Construction of logistic regression model
In the training group, the associations between candidate variables (clinical data and pre-operative blood indexes) and LVI were assessed with Spearman correlation analysis, using p < 0.05 as the screening criterion. Variables that passed this screen were then entered into a multivariate logistic regression analysis.
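As an illustration of the univariate screening step, the following sketch computes Spearman's rho between each candidate variable and a binary LVI indicator and keeps variables with p < 0.05. It is not the authors' SPSS/R procedure: the helper names and toy data are hypothetical, ties receive average ranks, and the p-value uses the large-sample normal approximation z = rho * sqrt(n - 1) rather than the t-based or exact methods used by statistical packages.

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho (Pearson correlation of ranks) and an approximate two-sided p-value."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0, 1.0  # a constant variable carries no rank information
    rho = cov / (sx * sy)
    z = rho * math.sqrt(n - 1)  # large-sample normal approximation under H0
    return rho, math.erfc(abs(z) / math.sqrt(2))


def screen_variables(features, lvi, alpha=0.05):
    """Keep variables whose Spearman association with LVI (coded 0/1) has p < alpha."""
    kept = {}
    for name, values in features.items():
        rho, p = spearman(values, lvi)
        if p < alpha:
            kept[name] = (rho, p)
    return kept


# Toy data (hypothetical): marker_a tracks LVI, marker_b does not.
lvi = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] * 3
features = {
    "marker_a": [10 * v + i % 5 for i, v in enumerate(lvi)],
    "marker_b": [i % 7 for i in range(len(lvi))],
}
kept = screen_variables(features, lvi)
```

Only the variables left in `kept` would then be passed to the multivariate model.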

Construction of artificial neural network
Variables found to be significantly related to LVI in the multivariate analysis were used to construct the ANN.
We built a three-layer feedforward ANN with four input nodes, five neurons in the hidden layer, and two output neurons. The network was trained with the error backpropagation learning rule. For both the logistic regression model and the ANN, the output cut-off giving the best balance between sensitivity and specificity was used for classification.
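The network described above can be sketched in plain Python as follows. The architecture (input layer, five hidden neurons, two output neurons, backpropagation training) follows the text; the sigmoid hidden activation, softmax output with cross-entropy loss, stochastic gradient descent, learning rate, weight initialization, epoch count, and the class name `ThreeLayerANN` are all assumptions, since these settings are not reported. `n_in` defaults to the four input nodes stated above and should match the number of (preferably standardized) predictors.

```python
import math
import random


class ThreeLayerANN:
    """Input layer -> sigmoid hidden layer -> softmax output layer (assumed design)."""

    def __init__(self, n_in=4, n_hidden=5, n_out=2, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # Each weight row ends with a bias term, so rows have (fan_in + 1) entries.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def _forward(self, x):
        xb = list(x) + [1.0]
        h = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, xb))))
             for row in self.w1]
        hb = h + [1.0]
        z = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        m = max(z)  # subtract the max for a numerically stable softmax
        e = [math.exp(v - m) for v in z]
        total = sum(e)
        return xb, h, hb, [v / total for v in e]

    def predict_proba(self, x):
        """Activation of output neuron 1, read as the predicted probability of LVI."""
        return self._forward(x)[3][1]

    def train_step(self, x, label):
        xb, h, hb, p = self._forward(x)
        target = [1.0 - label, float(label)]
        # With softmax + cross-entropy, the output-layer error is p - target.
        d_out = [pk - tk for pk, tk in zip(p, target)]
        # Propagate the error back through w2 (bias column excluded) and the sigmoid.
        d_hid = [h[j] * (1.0 - h[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= self.lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= self.lr * d_hid[j] * xb[i]

    def fit(self, xs, ys, epochs=100, seed=1):
        rng = random.Random(seed)
        order = list(range(len(xs)))
        for _ in range(epochs):
            rng.shuffle(order)  # visit training cases in a new random order each epoch
            for i in order:
                self.train_step(xs[i], ys[i])
        return self


# Synthetic, standardized inputs (hypothetical); the label depends on x0 + x1.
rng = random.Random(42)
xs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(150)]
ys = [1 if x[0] + x[1] > 0 else 0 for x in xs]
net = ThreeLayerANN(n_in=4).fit(xs, ys)
```

The output of `predict_proba` would then be dichotomized at the chosen cut-off to call a case LVI-positive or LVI-negative.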

Assessment of logistic regression model and artificial neural network
In both the training and validation groups, the logistic regression model and the ANN were evaluated with ROC curves (Hanley-McNeil method) and characterized by negative predictive value (NPV), positive predictive value (PPV), and likelihood ratios (LR). In addition, DCA was performed to calculate the net benefit of predicting LVI with the logistic regression model and the ANN, respectively.
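Several quantities named in this paragraph can be sketched as follows: the AUC computed as the Mann-Whitney probability that a randomly chosen LVI-positive case receives a higher score than an LVI-negative one, its standard error from the Hanley-McNeil (1982) formula, and the decision-curve net benefit at a threshold probability p_t, NB = TP/n - (FP/n) * p_t / (1 - p_t). The function names and toy scores are hypothetical; comparing two AUCs measured on the same patients additionally requires accounting for their correlation, which this sketch does not do.

```python
import math


def auc_mann_whitney(scores, labels):
    """AUC = P(score of an LVI-positive case > score of an LVI-negative case), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg)), len(pos), len(neg)


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC from the Hanley-McNeil (1982) approximation."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def net_benefit(scores, labels, threshold):
    """Decision-curve net benefit of treating every case whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # hypothetical predicted risks
labels = [1, 1, 0, 1, 0, 1, 0, 0]                   # 1 = LVI present
auc, n_pos, n_neg = auc_mann_whitney(scores, labels)
se = hanley_mcneil_se(auc, n_pos, n_neg)
nb = net_benefit(scores, labels, threshold=0.5)
```

A full decision curve evaluates `net_benefit` over a grid of thresholds and compares it with the treat-all and treat-none strategies.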

Statistical analysis
All analyses were performed using SPSS 23.0 and R 3.5.3. All statistical tests were two-sided, and P < 0.05 was considered statistically significant. Normally distributed continuous variables were compared between groups with the independent t test, while continuous variables with a skewed distribution were compared with the Mann-Whitney U test. Categorical data were compared using the Chi-square test.
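For the categorical comparisons, a 2 x 2 Pearson chi-square test can be written with the standard library alone, because for one degree of freedom the upper-tail probability of the statistic x is erfc(sqrt(x / 2)). The sketch below applies it to the LVI counts reported in the Results (70 of 185 training and 42 of 103 validation patients); the function name is hypothetical, and because the text does not state whether a continuity correction was used, both variants are computed.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]] (1 degree of freedom).

    Returns (statistic, two-sided p-value); yates=True applies the continuity correction.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # A chi-square variable with 1 df is a squared standard normal: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: training, validation; columns: LVI present, LVI absent.
stat, p = chi_square_2x2(70, 115, 42, 61)
stat_nc, p_nc = chi_square_2x2(70, 115, 42, 61, yates=False)
```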

Baseline patient characteristics and pre-operative blood indexes
The characteristics of all patients in the training and validation groups are shown in Table 1.
There was no significant difference in any clinical characteristic or pre-operative blood index between the training and validation groups. The training group included 70 cases of LVI (37.8% of patients) and the validation group 42 cases (40.8%); the proportion of LVI did not differ between the two groups (p = 0.705). We first investigated the relationship of clinical characteristics and blood indexes with LVI using Spearman correlation analysis. Gamma-glutamyl transpeptidase (GGT), neuron-specific enolase (NSE), CA125, CA199, carcinoembryonic antigen (CEA), C-reactive protein (CRP), fibrinogen, and the preoperative largest tumor diameter were associated with LVI (p < 0.05). These variables were then entered into a multivariate logistic regression model, in which only CA125, CA199, and fibrinogen remained independent predictors of LVI (Table 2). CA125, CA199, and fibrinogen were also used to construct the ANN, with the aim of improving on the performance of the logistic regression model.
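The multivariate step that retained CA125, CA199, and fibrinogen can be illustrated with a small maximum-likelihood logistic regression fitted by Newton-Raphson, with two-sided Wald p-values taken from the inverse Fisher information. This is a generic sketch on synthetic data, not the authors' SPSS/R model; the helper names are hypothetical, and the fit does not handle complete separation.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _score_and_information(X, ys, beta):
    """Gradient of the log-likelihood and the Fisher information matrix at beta."""
    k = len(beta)
    p = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]
    grad = [sum((y - pi) * row[j] for row, y, pi in zip(X, ys, p)) for j in range(k)]
    info = [[sum(pi * (1.0 - pi) * row[i] * row[j] for row, pi in zip(X, p))
             for j in range(k)] for i in range(k)]
    return grad, info


def fit_logistic(xs, ys, iterations=25):
    """Multivariable logistic regression by Newton-Raphson.

    Returns (coefficients, two-sided Wald p-values); index 0 is the intercept.
    """
    X = [[1.0] + list(x) for x in xs]  # prepend an intercept column
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        grad, info = _score_and_information(X, ys, beta)
        step = invert(info)
        beta = [b + sum(s * g for s, g in zip(step[i], grad)) for i, b in enumerate(beta)]
    _, info = _score_and_information(X, ys, beta)
    cov = invert(info)  # inverse information = asymptotic covariance of beta
    pvals = [math.erfc(abs(b) / math.sqrt(cov[i][i]) / math.sqrt(2))
             for i, b in enumerate(beta)]
    return beta, pvals


# Synthetic example: the outcome depends on the first predictor only.
rng = random.Random(7)
xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
ys = [1 if rng.random() < 1 / (1 + math.exp(-1.5 * x[0])) else 0 for x in xs]
beta, pvals = fit_logistic(xs, ys)
```

Variables whose Wald p-value stays below 0.05 in the joint model would be reported as independent predictors.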

Comparison of logistic regression model and artificial neural network
We further assessed the performance of the logistic regression model and the ANN in both the training and validation groups using ROC curves (Fig. 2A, Table 3). The optimal cut-off was determined by maximizing Youden's index (sensitivity + specificity - 1). The sensitivity of the ANN was 92% in the training group and 75% in the validation group, significantly higher than that of the logistic regression model (56% and 45%, respectively). However, the high sensitivity of the ANN came at the cost of specificity: the specificity of the ANN was 74% in the training group and 67% in the validation group, lower than that of the logistic regression model (83% and 85%, respectively). Similarly, the ANN showed higher PPV and positive LR but lower NPV and negative LR than the logistic regression model in both the training and validation groups. The AUC of the ANN was 0.839 in the training group and 0.76 in the validation group, significantly higher than that of the logistic regression model (0.692 and 0.682, respectively; training group: P < 0.001, validation group: P < 0.05). In DCA, the ANN also provided a greater net benefit for predicting LVI than the logistic regression model (Fig. 2B).
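The cut-off selection and the summary statistics reported in this paragraph can be sketched as below: every observed score is tried as a threshold, the one maximizing Youden's J = sensitivity + specificity - 1 is kept (ties go to the lowest threshold, an arbitrary choice), and sensitivity, specificity, PPV, NPV, and the positive and negative likelihood ratios are then computed from the resulting 2 x 2 table. The function names and toy data are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cut-off, J) maximizing Youden's J; a case is positive if score >= cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest threshold among ties
            best_cut, best_j = t, j
    return best_cut, best_j


def diagnostic_metrics(scores, labels, cutoff):
    """Sensitivity, specificity, PPV, NPV and likelihood ratios at a given cut-off."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        positive_call = s >= cutoff
        if positive_call and y == 1:
            tp += 1
        elif positive_call:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PPV": tp / (tp + fp) if tp + fp else float("nan"),
        "NPV": tn / (tn + fn) if tn + fn else float("nan"),
        "LR+": sens / (1 - spec) if spec < 1 else float("inf"),
        "LR-": (1 - sens) / spec if spec > 0 else float("inf"),
    }


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # hypothetical model outputs
labels = [1, 1, 0, 1, 0, 1, 0, 0]                   # 1 = LVI present
cut, j = youden_cutoff(scores, labels)
metrics = diagnostic_metrics(scores, labels, cut)
```

Note that PPV and NPV, unlike sensitivity, specificity, and the likelihood ratios, also depend on the prevalence of LVI in the group being evaluated.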

Discussion
Recent studies have shown that LVI can be regulated and promoted by lymphangiogenic growth factors. These include the glycoproteins vascular endothelial growth factor-C (VEGF-C) and VEGF-D, and chemokines may also contribute by attracting tumor cells to lymphatic vessels (9). Lymphangiogenic factors can be released not only by tumor cells but also by tumor-associated activated stroma and immune cells (10). A better understanding of these mechanisms may improve future treatment strategies to inhibit metastatic spread and prolong the survival of CRC patients. The current AJCC guidelines do not include LVI as a prognostic indicator in the TNM staging system for CRC. However, studies have shown that LVI is an independent risk factor for survival in CRC patients (11), and LVI has also been associated with resistance to neoadjuvant chemoradiotherapy (12). Although several studies have constructed models to predict LVI, some of them relied on postoperative pathological data, which limits their value for early prediction (13,14). Other studies have established pre-operative prediction models based on imaging data, including magnetic resonance (MR) and computed tomography (CT) (15,16). However, imaging data are not as easy to obtain as blood indexes, and the results are strongly affected by image quality. In addition, most studies have used linear regression models and few have used ANNs, which can overcome the limitations of linear models through self-adaptive, self-organizing, and real-time learning. Therefore, our research aimed to construct a pre-operative prediction model for LVI based on blood indexes using an ANN.
Our research identified CA125, CA199, and fibrinogen as independent predictors of LVI in CRC patients. CA125 and fibrinogen have also been reported to be associated with LVI in endometrial cancer (17). Based on these three factors, we constructed a logistic regression model and an ANN. The ANN showed a markedly higher AUC and sensitivity than the logistic regression model (AUC: 0.839 vs 0.692; sensitivity: 92% vs 56%). The performance of the ANN was similar to that of a multimodal radiomics model reported in rectal cancer (AUC: 0.839 vs 0.884; sensitivity: 92% vs 93.8%) (15), while blood indexes are faster, simpler, and cheaper to obtain than imaging indexes. The specificity of the ANN (74%) was lower than that of the logistic regression model (83%) but higher than that of the reported multimodal radiomics model (72.7%). Given the high sensitivity, this loss of specificity is acceptable.

Conclusion
Based on readily accessible blood indexes (CA125, CA199, and fibrinogen), we constructed an ANN for preoperative prediction of LVI in CRC patients. The model showed satisfactory performance compared with a conventional logistic regression model. The ANN may be of potential value in predicting metastasis and recurrence in CRC patients.

Declarations
Conflict of Interest
The authors declare that they have no competing interests.

Author Contributions
DW and ZY contributed to the statistical analysis of the data. SW and YQ contributed to the data collection. ZM and HH contributed to the design of the study. All authors read and approved the final version of the manuscript.

Funding
This work was supported by the Pudong Hospital Puxiu Project (No.px201504).

Figure 1
Screening process of enrolled patients