Development and validation of prediction model for early warning of ovarian metastasis risk of endometrial carcinoma

Ovarian metastasis in patients with endometrial carcinoma (EC) not only affects the surgeon's decision making but also has a profound impact on patients' fertility and prognosis. This study aimed to build a machine learning-based prediction model of ovarian metastasis of EC to guide clinical diagnosis and treatment management. We retrospectively collected 536 EC patients treated in Hubei Cancer Hospital from January 2017 to October 2022, and 487 EC patients from Tongji Hospital (January 2017 to December 2020) as an external validation cohort. The random forest model (RFM), gradient boosting (gradient elevator) model (GEM), support vector machine model (SVM), artificial neural network model (ANNM), and decision tree model (DTM) were used to build ovarian metastasis prediction models for EC patients. The predictive efficacy of the 5 machine learning models was evaluated by receiver operating characteristic curve and decision curve analysis. In the screening of candidate predictors, the degree of tumor differentiation, lymph node metastasis, CA125, HE4, Alb, and LH were identified as potential predictors for the ovarian metastasis prediction model in EC patients. Across the training set and internal verification set, the area under curve (AUC) of the prediction models constructed by the 5 machine learning algorithms ranged from 0.729 (95% confidence interval [CI]: 0.674–0.784) to 0.899 (95% CI: 0.844–0.954). Among them, the ANNM had the best prediction effectiveness (training set: AUC: 0.899, 95% CI: 0.844–0.954; internal verification set: AUC: 0.892, 95% CI: 0.837–0.947). Prediction models of ovarian metastasis of EC based on machine learning algorithms can achieve satisfactory prediction efficiency, with ANNM performing best; such models can be used to guide clinicians in diagnosis and treatment and to improve the prognosis of EC patients.


Introduction
Endometrial carcinoma (EC), one of the 3 most common gynecological malignant tumors in the world, is an epithelial malignant tumor that occurs in the endometrium. [1,2] To date, surgery remains the most important treatment for EC, with the aim of preventing tumor metastasis and recurrence. [3] In clinical practice, EC is most commonly treated by total hysterectomy with ovariectomy and pelvic lymph node dissection. Alarmingly, in recent years the disease has been affecting increasingly younger women. About 25% of patients are premenopausal, and up to 5% are under 40 years old. [4,5] In view of this, whether to perform ovariectomy has become a focus of surgical controversy, and the trauma caused by hysterectomy alone also has a great impact on patients' reproductive function.
Previous reports have shown that the probability of ovarian metastasis of EC is 1.7% to 11.0%. [6,7] Given this, identifying the risk factors for ovarian metastasis of EC would help clinicians take early intervention measures and avoid the trauma caused by ovariectomy or surgery. [9] Therefore, building a prediction model for ovarian metastasis of EC from preoperative indicators would help low-risk patients choose other treatment methods and avoid the trauma caused by surgery.
In recent years, machine learning has become a research hotspot in the fields of artificial intelligence and pattern recognition. [10,11] It can learn from and analyze large-scale, complex data through different computing methods, and has considerable application value in clinical disease diagnosis and prognosis evaluation. [11,12] Encouraged by this, this study set out to explore the risk factors of ovarian metastasis of EC and to build a machine learning-based prediction model of ovarian metastasis that combines traditional clinicopathological parameters with serological indicators. The goal is to support risk stratification and refined, individualized management of these patients, and to help improve the prognosis and survival of patients with EC and ovarian metastasis.

Study population
We included 536 EC patients who were treated in Hubei Cancer Hospital (HCH cohort) from January 2017 to October 2022. In addition, we included 487 EC patients from Tongji Hospital (January 2017 to December 2020) who met the inclusion and exclusion criteria as an external validation cohort (TJ cohort). Inclusion criteria: (1) patients with complete clinical data and ≥18 years old; (2) patients who were initially treated with surgery and were pathologically diagnosed with EC after surgery. Exclusion criteria: (1) patients with diseases that affect serum sex hormone levels; (2) patients with liver and kidney dysfunction; (3) patients who received chemotherapy, radiotherapy, hormone therapy, or molecular targeted therapy before surgery. This study complies with the Declaration of Helsinki and was approved at its outset by the Ethics Committee of Hubei Cancer Hospital (LLHBCH2022YN-043) and the Ethics Committee of Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology (TJ-IRB20220556). As this was a retrospective study and the included patient data were anonymous, the requirement for informed consent was waived. The flow chart of patient inclusion, model construction, and verification is summarized in Figure 1.

Clinical data acquisition and quality control
We obtained the patients' clinical information from their electronic medical records, including age, menopausal status, family history of tumor, pathological type, degree of differentiation, lymph node metastasis (LNM), preoperative serum carbohydrate antigen 125 (CA125), preoperative serum human epididymis protein 4 (HE4), preoperative serum albumin (Alb), preoperative D-dimer, preoperative follicle-stimulating hormone, luteinizing hormone (LH), testosterone, progesterone, prolactin and estradiol, and routine blood counts (neutrophil count, lymphocyte count, monocyte count, etc). For data quality control, we used an iterative maximum likelihood procedure (an expectation-maximization scheme) to fill in missing values and estimate parameters. In the first step (the "prediction step"), given the current estimate of the unknown parameters, the missing data in the sufficient statistics are predicted. In the second step (the "estimation step"), the sufficient statistics obtained in the first step are used to update the maximum likelihood estimate of the parameters. The 2 steps are repeated until the results of 2 successive iterations meet the specified convergence criterion.
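The two-step iteration described above can be sketched as follows. This is a minimal illustration only, not the authors' R implementation: it assumes a bivariate normal model with one fully observed variable `x` and one partially observed variable `y`, and the function name `em_impute` and the toy values are hypothetical.

```python
# Illustrative EM imputation under an ASSUMED bivariate normal model.
# Names and data are hypothetical and are not taken from the study.
def em_impute(x, y, tol=1e-10, max_iter=500):
    """x: fully observed values; y: values with None marking missing entries."""
    n = len(x)
    obs = [i for i in range(n) if y[i] is not None]
    mu_x = sum(x) / n
    s_xx = sum((v - mu_x) ** 2 for v in x) / n
    # Starting values: complete-case mean and variance, zero covariance.
    mu_y = sum(y[i] for i in obs) / len(obs)
    s_yy = sum((y[i] - mu_y) ** 2 for i in obs) / len(obs)
    s_xy = 0.0
    for _ in range(max_iter):
        # "Prediction" (expectation) step: expected y and y^2 for missing entries.
        slope = s_xy / s_xx
        resid_var = s_yy - s_xy ** 2 / s_xx
        ey, ey2 = [], []
        for i in range(n):
            if y[i] is None:
                m = mu_y + slope * (x[i] - mu_x)
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(y[i])
                ey2.append(y[i] ** 2)
        # "Estimation" (maximization) step: update the parameter estimates.
        new_mu_y = sum(ey) / n
        new_s_xy = sum(a * b for a, b in zip(x, ey)) / n - mu_x * new_mu_y
        new_s_yy = sum(ey2) / n - new_mu_y ** 2
        change = max(abs(new_mu_y - mu_y), abs(new_s_xy - s_xy), abs(new_s_yy - s_yy))
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < tol:  # convergence criterion on successive estimates
            break
    slope = s_xy / s_xx
    return [mu_y + slope * (x[i] - mu_x) if y[i] is None else y[i] for i in range(n)]

# Toy data: the observed pairs follow y = 2x + 1; two y values are missing.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.0, 5.0, None, 9.0, 11.0, None, 15.0, 17.0]
filled = em_impute(x, y)
```

In this toy case the imputed entries settle on the line defined by the observed pairs; with real clinical variables the model would typically be multivariate and the imputed values would carry residual uncertainty.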

Evaluation criteria for ovarian metastasis of EC
The diagnostic criteria for ovarian metastasis of EC were defined according to the existing literature, namely: (1) ovarian tumors were multinodular, and tumor nodules could be seen in the medulla and ovarian cortex under the microscope; (2) ovarian tumors met 2 or more of the following conditions: bilateral ovarian invasion, fallopian tube involvement, ovarian diameter > 5 cm, vascular invasion, and deep myometrial infiltration. When a cancer focus or an abnormal ovarian shape was visible to the naked eye during the operation, the case was classified as dominant (overt) metastasis. When no abnormality was visible to the naked eye during the operation but postoperative pathological examination confirmed ovarian metastasis, the case was classified as recessive (occult) metastasis. Both dominant and recessive metastasis were counted as ovarian metastasis.

Construction and verification of machine learning model
We randomly divided the patients from Hubei Cancer Hospital into a training set (70%) and an internal verification set (30%), and [...]. [14] In addition, we used the patient data from Tongji Hospital as the external validation cohort, and performed ten-fold cross-validation for model training and validation on the model training dataset and the external validation dataset, respectively. The predictive efficacy of the 5 machine learning models in predicting ovarian metastasis of EC was evaluated by receiver operating characteristic curve and decision curve analysis.

Table 1 abbreviations: BMI = body mass index; DTD = degree of tumor differentiation; FHT = family history of tumor; FIGO = International Federation of Gynecology and Obstetrics; IQR = inter-quartile range; LMR = lymphocyte-to-monocyte ratio; LNM = lymph node metastasis; NAR = neutrophil-to-albumin ratio; NLR = neutrophil-to-lymphocyte ratio; PLR = platelet-to-lymphocyte ratio.
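As a rough illustration of the data partitioning described above, the following sketch performs a random 70/30 split and builds ten cross-validation folds. It is a simplified stand-in for the authors' R workflow; the function names, the seed, and the integer patient identifiers are assumptions made for the example.

```python
import random

def split_train_validation(ids, train_fraction=0.7, seed=2022):
    """Randomly split identifiers into a training and an internal validation set."""
    rng = random.Random(seed)          # the seed is an arbitrary illustrative choice
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def k_fold_indices(ids, k=10, seed=2022):
    """Yield (fit, held_out) pairs so that every identifier is held out exactly once."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        fit = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield fit, folds[i]

patient_ids = list(range(536))         # stand-in identifiers for the HCH cohort
train_ids, valid_ids = split_train_validation(patient_ids)
folds = list(k_fold_indices(train_ids, k=10))
```

With 536 patients this split yields 375 training and 161 validation patients. A stratified split by outcome may be preferable when the event is rare; the text does not state whether stratification was used.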

Statistical analysis
All data processing, statistical analysis, and plotting were carried out in R 4.0.5 (https://cran.r-project.org/). For descriptive analysis, continuous variables were summarized as median (inter-quartile range) and categorical variables as frequency (%). The Pearson correlation coefficient was used to evaluate the correlation between 2 continuous variables. The chi-square test was used to compare categorical variables, and the Wilcoxon rank sum test or t test was used to compare continuous variables. All statistical tests were two-sided. P < .05 was considered statistically significant.
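For readers less familiar with the categorical comparison mentioned above, a Pearson chi-square test on a 2 × 2 table can be sketched as follows. The counts are invented for illustration and are not study data; the authors' analysis was run in R, and whether a continuity correction was applied is not stated, so none is used here.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row = [a + b, c + d]
    col = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row[i] * col[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# HYPOTHETICAL counts: rows = LNM yes/no, columns = ovarian metastasis yes/no.
stat, p = chi_square_2x2([[10, 20], [20, 10]])
```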

Analysis of clinicopathological parameters of EC patients with ovarian metastasis
In the data set used to build the prediction model (i.e., the HCH cohort), 536 patients were randomly divided into a training set (N = 375, 70%) and an internal validation set (N = 161, 30%), as shown in Table 1. The incidence of ovarian [...] S1, Supplemental Digital Content, http://links.lww.com/MD/K136. However, there were significant differences between the 2 groups in tumor differentiation, LNM, CA125, HE4, Alb, and LH (P < .05).

Selection of predictive variables for ovarian metastasis of EC
To screen candidate predictors of ovarian metastasis of EC, we adopted a two-step analysis. First, we used the Pearson product-moment correlation coefficient, as shown in Figure 2A, to analyze the correlation between all eligible variables and the outcome variable (that is, ovarian metastasis coded as a binary variable). The results showed that the degree of tumor differentiation, LNM, CA125, HE4, Alb, and LH were strongly and directly correlated with ovarian metastasis (as measured by the r coefficient). Second, we carried out a weight assignment analysis of the candidate variables with significant positive or negative correlation in the prediction models of the 5 machine learning algorithms (Fig. 2B). The results showed that these candidate variables carried a non-negligible share of the weight in each machine learning prediction model, indicating that the degree of tumor differentiation, LNM, CA125, HE4, Alb, and LH can be used as potential predictors in the ovarian metastasis prediction model for EC patients.
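The first screening step above correlates each candidate variable with a binary outcome. A minimal sketch of that calculation is given here; the CA125 values and outcome labels are invented for illustration, and the helper name `pearson_r` is an assumption rather than anything used by the authors.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# HYPOTHETICAL values: serum CA125 and ovarian metastasis coded as 0/1.
ca125 = [12.0, 18.0, 25.0, 40.0, 95.0, 130.0]
metastasis = [0, 0, 0, 0, 1, 1]
r = pearson_r(ca125, metastasis)
```

When one variable is binary, this quantity is the point-biserial correlation, so its sign indicates whether higher values of the predictor go with metastasis (positive) or with its absence (negative).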

Construction of predictive model for ovarian metastasis of EC based on machine learning algorithm
Following the rules of each machine learning algorithm, we built separate prediction models with RFM, ANNM, DTM, GEM, and SVM. For example, the ANNM was based on the algorithm called "Multilayer Feed-Forward Neural Network." [15] We included the degree of tumor differentiation, LNM, CA125, HE4, Alb, and LH as the input layer, carried out iterative analysis through the "hidden layer," and finally obtained the stratified risk of ovarian metastasis (Fig. 3). In the RFM, based on its "bagging" algorithm, we took the included candidate variables as "branching" nodes and then conducted iterative analysis (Fig. 4A and Table S2, Supplemental Digital Content, http://links.lww.com/MD/K137); that is, the included variables were branched according to "ensemble learning" to produce the final prediction. [16] On this basis, the DTM can be regarded as belonging to the same family of algorithms as the RFM, and it follows the principle of "approximating discrete function values," as shown in Figure 4B. Similarly, SVM and GEM follow their own classification principles. In short, based on different machine learning algorithms, we built practical prediction models that can be used to predict ovarian metastasis in EC patients.
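To make the "bagging" idea mentioned above concrete, the sketch below trains single-split trees (decision stumps) on bootstrap samples and averages their votes. It is a deliberately simplified illustration: a real random forest grows deeper trees and also samples features at each split, and the data, function names, and parameter values here are hypothetical.

```python
import random

def fit_stump(X, y):
    """Find the single-feature threshold rule with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                preds = [1 if sign * (row[f] - t) >= 0 else 0 for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    return best[1:]

def predict_stump(stump, row):
    f, t, sign = stump
    return 1 if sign * (row[f] - t) >= 0 else 0

def fit_bagging(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (sampling rows with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict_bagging(stumps, row):
    """Fraction of stumps voting for the positive class."""
    return sum(predict_stump(s, row) for s in stumps) / len(stumps)

# HYPOTHETICAL two-feature data; the outcome depends on the first feature only.
X = [[1, 5], [2, 3], [3, 6], [4, 2], [6, 4], [7, 7], [8, 1], [9, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
stumps = fit_bagging(X, y)
high = predict_bagging(stumps, [9.5, 3])
low = predict_bagging(stumps, [0.5, 3])
```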

Effectiveness evaluation of prediction models constructed by different algorithms
To further evaluate the robustness and accuracy of the various prediction models in predicting ovarian metastasis of EC patients, we used decision curve analysis to evaluate the effectiveness of the 5 machine learning prediction models. The decision curve plots the estimated standardized net benefit against the probability threshold used to classify an observation as "high risk." [17] As shown in Figure 5, the robustness of ANNM is significantly better than that of the other 4 prediction models in both the training set and the validation set, which shows that ANNM has certain advantages in predicting ovarian metastasis in endometrial cancer patients. We also used the traditional receiver operating characteristic curve to evaluate accuracy. The results showed that the diagnostic effectiveness of ANNM in the training set and the internal verification set was (area under curve [AUC]: 0.899, 95% confidence interval [CI]: 0.844–0.954) and (AUC: 0.892, 95% CI: 0.837–0.947), respectively, followed by the RFM (training set: [AUC: 0.826, 95% CI: 0.771–0.881]; verification set: [AUC: 0.831, 95% CI: 0.776–0.886]), as shown in Table 2. In short, the ovarian metastasis prediction models built with the 5 machine learning algorithms have satisfactory prediction efficiency and can provide risk stratification guidance for EC patients with ovarian metastasis.
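The two evaluation measures used above can be written compactly. The sketch below computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and the net benefit at a threshold as TP/n − FP/n × pt/(1 − pt), standardized by dividing by the outcome prevalence. The labels and scores are toy values, and the function names are assumptions; the confidence intervals reported in the text are not reproduced here.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def net_benefit(labels, scores, threshold):
    """Net benefit of calling patients 'high risk' when score >= threshold."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def standardized_net_benefit(labels, scores, threshold):
    """Net benefit divided by the outcome prevalence."""
    prevalence = sum(labels) / len(labels)
    return net_benefit(labels, scores, threshold) / prevalence

# TOY values, not study data.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
model_auc = auc(labels, scores)
nb_half = net_benefit(labels, scores, 0.5)
snb_half = standardized_net_benefit(labels, scores, 0.5)
```

A decision curve is obtained by evaluating `standardized_net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies.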

External queue verification of prediction models constructed by different algorithms
To further evaluate the prediction effectiveness of the machine learning models in an external cohort, we used the TJ cohort for external verification. As shown in Figure 6A, the prediction models of the 5 machine learning algorithms remained consistently robust in the external cohort. We also used the optimal prediction model (i.e., ANNM) to predict the stratified risk of ovarian metastasis in the TJ cohort, as shown in Figure 6B. Even in the external cohort, ANNM could identify whether patients were at risk of ovarian metastasis, which supports the generalizability of the ovarian metastasis prediction model for EC.

Discussion
EC is characterized by high invasiveness, metastasis, and poor prognosis, and ovarian metastasis is relatively common. [18] Previous epidemiological data showed that the ovarian metastasis rate of EC was between 9.7% and 13.56%. [6,19] Our findings were consistent with these reports: the rate of ovarian implantation and spread was between 8.53% and 11.8%. Therefore, studying the risk factors of ovarian metastasis of EC is of great significance for clinical measures to reduce the probability of ovarian metastasis in EC patients. In this study, prediction models of ovarian metastasis based on machine learning algorithms were constructed from the risk factor indicators of EC patients with ovarian metastasis. The prediction models (especially ANNM) showed high sensitivity and specificity, possibly because combining multiple indicators provides complementary information. This improves the prediction efficiency for ovarian metastasis in EC patients, so the models can be better applied to guide clinical practice. [21]
The results of this study showed that tumor differentiation, LNM, CA125, Alb, HE4, and LH were independent risk factors for ovarian metastasis of EC (P < .05). One possible reason is that malignant tumor cells retain, to a greater or lesser extent, features of differentiation toward normal cells. [22,23] The closer the tumor cells are to normal cells, the higher the degree of differentiation and the lower the degree of malignancy; conversely, the greater the difference from normal cells, the lower the degree of differentiation, the more active the proliferation and growth of cancer cells, and the higher the degree of malignancy. Therefore, the higher the degree of differentiation, the weaker the invasive ability; the lower the degree of differentiation, the stronger the invasive ability and the higher the risk of ovarian metastasis. Under normal circumstances, the LNM rate of EC patients with good differentiation and myometrial infiltration depth < 1/2 is relatively low. [24] However, if EC patients have LNM and lymph node involvement, their risk of ovarian metastasis is greatly increased, because cancer cells from a focus at the uterine fundus can travel along the lymphatic network of the upper broad ligament and through the infundibulopelvic ligament, which anastomoses with the collecting lymphatic vessels below the ovary, and then retrogradely invade the ovary and implant ectopically. [25,26]

Table 2. The receiver operating characteristic curve analyses for ovarian metastasis in each machine learning-based prediction model.

CA125 is a high molecular weight transmembrane glycoprotein encoded by the MUC16 gene, which is rarely or never secreted by the human body under normal conditions. [27,28] The level of CA125 is related to many factors, such as the inflammatory reaction of the body, the progression of malignant tumors, and the amount of ascites. When malignant endometrial lesions appear, the serum level can be abnormally elevated. When cancer cells metastasize to other tissues or organs and continue to proliferate, the secretion of CA125 antigen increases. [29][31] This may be because the endometrial barrier is seriously damaged by cancer cells, which makes it easier for tumor cells to spread to Müllerian duct-derived or mesothelial tissues such as the ovaries and fallopian tubes, thus increasing the expression level of CA125. In addition, HE4 is a secreted glycoprotein encoded by the WFDC2 (whey acidic protein four-disulfide core domain 2) gene. [32] Under physiological conditions, it is mainly expressed in the respiratory and reproductive systems, and its expression level is related to age, smoking, and the menstrual cycle. [32,33] Under the pathological conditions of malignant tumors, HE4 can induce EC cell proliferation and promote tumor cell metastasis and invasion by activating related carriers and signal pathways. In addition, the levels of CA125 and HE4 increase as tumor differentiation decreases, and the serum levels of CA125 and HE4 in patients with LNM are higher than in those without LNM. In summary, this study suggests that serum CA125 and HE4 levels can reflect, to some extent, whether lymph nodes harbor micrometastasis.

Alb has many physiological functions, such as maintaining colloid osmotic pressure in blood vessels, anti-inflammation, anti-oxidation, and scavenging oxygen free radicals. The development of malignant tumors is closely related to metabolic disorders of the human body. [34,35] As the disease progresses, protein catabolism increases and anabolism decreases, leading to negative nitrogen balance. [35] In addition, during the development of cancer, the permeability of microvessels increases and more Alb leaks out, which can further reduce the Alb level. [36] The role of LH is to promote ovulation and luteinization, and to promote luteal secretion of estrogen and progesterone. [37] Estrogen and progesterone can regulate the cytokines and growth factors in the endometrium and act on the endometrium, stroma, and perivascular tissue. [38,39] In vitro studies have confirmed that progesterone can inhibit the release of cytokines from mouse and human EC cells, and that estrogen can induce the activation of nuclear factors and increase the proliferation and migration capacity of Ishikawa cells. [40] In addition, androgen receptors are also expressed in the endometrium at different stages of the menstrual cycle, in postmenopausal atrophic endometrium, in precancerous lesions, and in EC, and may exert an anti-proliferative effect on endometrial cells. [41] However, under the effect of cancer cells, the receptivity of the endometrium and the pelvic microenvironment change significantly and can show periodic changes under the influence of estrogen and progesterone; cancer cell differentiation factors are released and pass retrogradely into the fallopian tube, ovary, and other sites, where ectopic implantation occurs.
In this study, 5 types of ovarian metastasis prediction models based on machine learning algorithms and the available clinical indicators described above can be used for stratified diagnosis and treatment of EC patients. In recent years, machine learning algorithms have been widely used in the medical field because of their characteristics of "iteration" and "progression." [42,43] Interestingly, this study also found that the optimal prediction model, ANNM, can greatly improve the prediction efficiency for ovarian metastasis in EC patients. This may be because, after the weights of the neurons in the network are corrected, the network produces a new output by forward propagation; the error between the actual output and the expected value then drives a new round of weight correction. [44,45] The process of forward propagation and back propagation repeats until the network converges, and the interconnection weights and thresholds obtained at convergence make the output more robust.
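The alternation of forward propagation and back propagation described above can be illustrated with a very small feed-forward network. This is a teaching sketch under stated assumptions (one hidden layer of 3 sigmoid units, cross-entropy loss, per-sample gradient updates, toy two-feature data); it is not the architecture, software, or data used in the study, and all names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward propagation: input -> hidden layer -> output probability."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    out = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, out

def train(X, y, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(X, y):
            h, out = forward(x, W1, b1, W2, b2)
            clipped = min(max(out, 1e-12), 1.0 - 1e-12)   # guard against log(0)
            total -= t * math.log(clipped) + (1 - t) * math.log(1.0 - clipped)
            # Back propagation: output error first, then hidden-layer errors.
            d_out = out - t
            d_hidden = [d_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Weight and threshold (bias) correction.
            for j in range(n_hidden):
                W2[j] -= lr * d_out * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * d_hidden[j] * x[i]
                b1[j] -= lr * d_hidden[j]
            b2 -= lr * d_out
        losses.append(total / len(X))
    return (W1, b1, W2, b2), losses

# TOY data: two binary inputs, outcome positive if either input is present.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 1]
params, losses = train(X, y)
probs = [forward(x, *params)[1] for x in X]
```

The list `losses` records the mean cross-entropy per epoch, so plotting it shows how the error shrinks as the cycle of forward and backward passes repeats.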
Although machine learning algorithms show clinical promise in predicting ovarian metastasis of EC, some limitations should be recognized. First, all the samples in this study were retrospective, and future validation of the prediction models should be conducted in a prospective multicenter cohort. Second, the screening of predictors in this study was limited to clinicopathological parameters and serological indicators. In the future, multi-omics techniques (such as blood and urine proteomics, transcriptomics, etc) are still needed to find better prediction markers and build a more robust prediction model. Third, the sample size of this study was still small, and there was an inevitable risk of bias from the imputation of missing data and from patient selection. In the future, large-sample data will still be required for multi-dimensional verification and optimization.

Conclusion
In conclusion, using multiple machine learning algorithms, we developed a stable and effective prediction model for evaluating ovarian metastasis of EC. In particular, ANNM is a powerful prediction tool that can help determine optimal clinical management strategies.

Figure 1. Flow chart of patient inclusion, model construction and verification.

Figure 2. Screening of candidate variables for prediction model of ovarian metastasis of EC. (A) Correlation analysis between ovarian metastasis and candidate variables; (B) weight assignment of candidate variables in 5 machine learning algorithms.

Figure 3. Prediction model of ovarian metastasis in patients with EC based on ANN algorithm. (A) Construction of ANNM; (B) proportion of weight coefficient distribution of ANNM candidate variables.

Figure 4. Prediction model of ovarian metastasis in patients with EC based on "Decision Tree" algorithm. (A) RFM; (B) DTM.

Figure 5. Effectiveness evaluation of prediction model constructed by machine learning algorithm. (A) Training set; (B) internal validation set.

Figure 6. External cohort verification of prediction model constructed by machine learning algorithm. (A) Verification of robustness and prediction efficiency of prediction model (from TJ cohort); (B) identification efficiency of ANNM in differentiating ovarian metastasis.

Table 1
Comparison of clinicopathological parameters between EC patients with and without ovarian metastasis (from HCH cohort).