A nomogram prediction model of pseudomyxoma peritonei established based on new prognostic factors of HE stained pathological images analysis

Abstract
Background: Pseudomyxoma peritonei (PMP) is a rare malignant clinical syndrome, and its rarity has limited pathology research. This study aims to quantitatively analyze HE-stained pathological images (PIs) and to develop a new predictive model integrating digital pathological parameters with clinical information.
Methods: Ninety-two PMP patients with complete clinicopathological information were included. QuPath was used for quantitative feature analysis of PIs at the tissue-, cell-, and nucleus-level. The correlations of overall survival (OS) with general clinicopathological characteristics and with PIs features were analyzed. A nomogram was established based on independent prognostic factors and evaluated.
Results: Among the 92 PMP patients, there were 34 (37.0%) females and 58 (63.0%) males, with a median age of 57 years (range: 31–76). A total of 449 HE-stained images were obtained for QuPath analysis, which extracted 40 pathological parameters at three levels. Kaplan–Meier survival analysis revealed eight clinicopathological characteristics and 20 PIs features significantly associated with OS (p < 0.05). Partial least squares regression was used to reduce the multicollinear features to four new composite features. Multivariate survival analysis identified the following five independent prognostic factors: preoperative CA199, completeness of cytoreduction, histopathological type, component one at the tissue-level, and tumor nuclei circularity variance. A nomogram was established with an internal validation C-index of 0.795, and calibration plots indicated good prediction performance.
Conclusions: Quantitative analysis of HE-stained PIs can extract new prognostic information on PMP. The nomogram built from five independent prognosticators is the first model integrating digital pathological information with clinical data for improved clinical outcome prediction.


1 | INTRODUCTION
Pseudomyxoma peritonei (PMP) is a malignant clinical syndrome characterized by the accumulation and redistribution of mucus produced by mucinous tumor cells (TCs) in the peritoneal cavity. Its typical clinical manifestations include mucinous ascites, peritoneal implantation, omental cake, and ovarian involvement in women. 1 The overall incidence of PMP is 2-4 per million, the prevalence is 25.1 per million, the male-to-female ratio is 1:1.2-3.4, and the median age is 43-63 years. 2 Currently, cytoreductive surgery (CRS) plus hyperthermic intraperitoneal chemotherapy (HIPEC) is the standard treatment and significantly improves survival. 3 As PMP is a rare clinical tumor syndrome, pathological classification and grading are the routine means of predicting its biological characteristics and clinical outcomes. 4,5 At present, the recognized histopathological typing is mainly based on qualitative assessment of TCs number, tumor nests (TNs) morphology, TCs atypia, mitotic activity, and tumor invasion mode. 6 However, it has long been observed that some patients with low-grade mucinous carcinoma peritonei (LMCP) and no known adverse prognostic factors have a poorer prognosis than some patients with high-grade mucinous carcinoma peritonei (HMCP) and poor prognostic features. 7 Several factors could account for this complexity. First, such assessment is prone to inter-observer differences among PMP pathologists. Second, owing to tumor heterogeneity, some patients have different histopathological types at different tumor sites, or even within the same site, yet only one result appears in the final pathological diagnosis. Finally, the tumor microenvironment can mediate the occurrence and progression of PMP and cause morphological changes at the tissue- or cell-level, which are visible in pathological images (PIs). 8 Therefore, it is necessary to explore new pathological prognostic factors of PMP based on the analysis of in-situ features in PIs.
The development of digital pathology has greatly promoted the application of image analysis in pathology, and tumor pathology has gradually moved from manual qualitative diagnosis to machine learning-associated diagnosis. 9 The techniques for analysis of PIs developed on this basis can help overcome the inconsistency of subjective interpretation and help explore new pathological prognostic features. In the field of solid tumors, a variety of methods for analysis of PIs and for computer-aided diagnosis and prognosis have been proposed, providing a new direction for PMP pathological research. 10,11 This study aims to extract rich quantitative morphological features through digital analysis of PIs and to integrate them with clinical data to establish a prognostic model and explore new pathological prognostic features.

| Patient cohort
A cohort of 92 PMP patients who had no prior surgical history and received CRS + HIPEC at our center from December 2015 to December 2021 was included. Tissue slides, formalin-fixed paraffin-embedded tissue blocks, clinicopathological data, and follow-up information were available. The other inclusion and exclusion criteria were consistent with the CRS + HIPEC criteria. 12 All surgical specimens and HE-stained slides were re-reviewed by two senior pathologists (Yan FC, Gao Y) according to the Peritoneal Surface Oncology Group International (PSOGI) histopathological diagnostic criteria for PMP, and PMP was divided into three types: LMCP, HMCP, and HMCP with signet ring cells (HMCP-S). 6 For the purpose of model establishment, the acellular mucin type was excluded. Overall survival (OS) was the primary endpoint and was defined as the time from clinical diagnosis.

| Tissue slides construction and scanning
All conventional surgical specimens from PMP patients underwent thorough histopathological study with routine HE staining (Dako Hematoxylin, Dako Eosin and Dako Bluing Buffer, catalog number CS701; Dako CoverStainer, Agilent Technologies Inc., USA). PMP specimens were collected from different anatomical regions, and for each patient a senior pathologist (Yan FC) selected the five HE-stained slides showing the most prominent tumor proliferation and aggressive growth.
Tissue slides were subsequently scanned at 200× magnification (0.246 μm/pixel) using a whole-slide scanner (KF-PRO-400 scanner, Jiangfeng, China). Each slide was scanned into a whole-slide image (WSI) in ".kfb" format.

ROI selection
First, the ".kfb" files were converted to ".svs" files using a format converter (Jiangfeng, China). The scanned WSIs were then imported into QuPath (v.0.3.2, University of Edinburgh, UK), an open-source digital image analysis software. The color deconvolution algorithm in QuPath was used for stain separation to give a normalized representation of the hematoxylin and eosin colors in each image, so that subsequent image analysis was affected as little as possible by differences in staining intensity. Commands such as "Add smoothed features" and "Add intensity features" were run to improve the classification accuracy when detecting target areas.
Tumor areas with typical histopathological features, namely regions of interest (ROIs), were randomly selected for subsequent study by two investigators (Ma R, Yan FC), and ROIs of 500 × 500 μm (2029 × 2029 pixels) were then generated within QuPath. ROIs included tumor and stroma areas but excluded necrotic areas and staining artifacts. To ensure adequate representation of each patient and minimize selection bias, the target was a minimum of 25 patches per patient (five images per patient and five patches per image).
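As a quick check on the ROI geometry, the following minimal sketch converts between the physical ROI size and its pixel dimensions. The helper name `um_to_pixels` is illustrative only and is not part of QuPath. Because the scanner resolution is reported to three decimals, the side length computed from 0.246 μm/pixel differs by a few pixels from the reported 2029 pixels, while the pixel size implied by the reported dimensions rounds to the same 0.246 μm/pixel.

```python
def um_to_pixels(length_um: float, um_per_pixel: float) -> float:
    """Convert a physical length in micrometres to a (fractional) pixel count."""
    if um_per_pixel <= 0:
        raise ValueError("um_per_pixel must be positive")
    return length_um / um_per_pixel


roi_side_um = 500.0       # ROI side length reported in the text
reported_side_px = 2029   # ROI side length in pixels reported in the text

# Pixel size implied by the reported ROI dimensions.
implied_um_per_px = roi_side_um / reported_side_px

# Side length obtained from the rounded scanner resolution (0.246 um/pixel).
side_px_from_rounded = um_to_pixels(roi_side_um, 0.246)

print(f"implied pixel size: {implied_um_per_px:.4f} um/pixel")
print(f"side from rounded resolution: {side_px_from_rounded:.1f} px")
```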

| Image features extraction
QuPath was used for tissue segmentation and cell detection, which involved interactive annotation tools (for annotating TNs and stroma) and automatic segmentation commands (for detecting TCs or nuclei). 13 Manual annotation tools such as Brush and Wand were used to mark the TNs and stroma within the ROIs, and the "Cell Detection" command was used to quickly detect nuclei and cells. The detection parameters were unified as follows: background radius, 8 μm; sigma, 1.5 μm; minimum and maximum area, 10 and 400 μm²; intensity threshold, 0.1; maximum background intensity, 2; all other parameters were left at their default values. Any non-cellular, folded, blurry, or defective objects were then manually removed. The detected cells were assigned to four classes (TCs, immune cells, blood cells, and other stromal cells) using the "Object classification" module and annotated accordingly by two investigators (Ma R, Yan FC) using the annotation tools.
The features of TNs, stroma, TCs, all stromal cells, and nuclei were then extracted at the tissue-, cell-, and nucleus-level, respectively. Color features were strongly affected by the PIs themselves and were therefore not included in this study. Since five different WSIs were collected for each patient, the average of each feature over the five images was used as the final feature value.
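The per-patient averaging step can be sketched as follows. This is a minimal illustration, assuming the per-image features have already been exported as simple records; the function name, the record layout, and the feature names `tn_number` and `tn_area_mean` are hypothetical and do not reflect the authors' actual export format.

```python
from collections import defaultdict
from statistics import mean


def average_features_per_patient(records):
    """Average each feature over all images belonging to the same patient.

    records: iterable of (patient_id, {feature_name: value}) tuples,
             one tuple per image.
    Returns {patient_id: {feature_name: mean value across images}}.
    """
    collected = defaultdict(lambda: defaultdict(list))
    for patient_id, features in records:
        for name, value in features.items():
            collected[patient_id][name].append(value)
    return {
        pid: {name: mean(values) for name, values in feats.items()}
        for pid, feats in collected.items()
    }


# Hypothetical values for one patient with two images (illustration only).
example = [
    ("P001", {"tn_number": 10, "tn_area_mean": 200.0}),
    ("P001", {"tn_number": 14, "tn_area_mean": 260.0}),
]
per_patient = average_features_per_patient(example)
```

Averaging at the patient level keeps one row per patient for the survival analysis, at the cost of discarding within-patient variation between slides.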

| Follow-up
Follow-up records included survival status and OS. The last unified follow-up date was May 31, 2023, and the follow-up rate was 100%.
The main steps of feature extraction and prognostic model establishment based on PMP PIs proposed in this study are described in Figure 1.

| Statistical analysis
X-Tile was used to convert continuous variables (including clinicopathological characteristics and image features extracted by QuPath) into categorical variables for subsequent survival analysis. Categorical variables were presented as numbers (percentages), and Spearman correlation analysis was used to assess correlations. Kaplan-Meier survival analysis was used to screen for features with potential prognostic significance, and partial least squares (PLS) regression was then performed for dimensionality reduction of the mutually correlated features. The optimal number of components was determined by cross-validation, using the root mean squared error of prediction (RMSEP) and the variance explained rate for different numbers of components: the smaller the RMSEP and the larger the variance explained rate (closer to one), the better the number of components. 14 Kaplan-Meier survival analysis was used to plot the survival curves of feature subgroups, and the log-rank test was used to assess statistical significance. A Cox proportional hazards regression model was applied to identify independent prognostic factors for OS. After multivariate survival analysis, a nomogram was constructed from the independent predictors using the "rms" R package to visually predict the 1-, 2-, and 3-year OS probability. The variable with the largest influence on OS was assigned a maximum of 100 points, and the other variables were assigned lower maximum values proportional to their impact. The concordance index (C-index) and calibration plots were used to evaluate the performance of the nomogram: the C-index assessed the discrimination ability of the prediction model, and calibration plots assessed the agreement between predicted and actual survival at 1, 2, and 3 years. A two-sided p < 0.05 was considered statistically significant.
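To make the Kaplan-Meier step concrete, the following is a minimal pure-Python sketch of the product-limit estimator. It is not the authors' code (their analysis was run with statistical software), and the follow-up times and event indicators in the example are invented for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


# Invented follow-up data in months (illustration only).
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
```

In this toy example one of five patients dies at month 5 (survival 0.8), one is censored at month 8, and two of the remaining three die at month 12, so the estimate drops to 0.8 × 1/3.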

| PIs feature acquisition
In this study, a total of 2245 patches were selected from 449 WSIs of 92 PMP patients. In total, 1,208,717 cells were detected, of which TCs, stromal cells, immune cells, and blood cells accounted for 31.87%, 47.87%, 11.02%, and 9.24%, respectively.
Forty image parameters of different components were extracted from multiple classes. Based on the feature acquisition approach, these parameters were divided into tissue-level features (including the number, size, and shape of TNs, etc., n = 11) and cell- and nucleus-level features (including density, size, shape, contour, etc., n = 29) (Table S1). The extracted quantitative features were converted into three-category variables using X-Tile.

| Clinical value of morphologic parameters from PIs
The extracted parameters above were entered into the log-rank test. At the tissue-level, univariate survival analysis demonstrated statistically significant differences in OS between the subgroups of the following nine parameters: TNs number (p < 0.001), TNs area average (p < 0.001), TNs area variance (p = 0.001), TNs perimeter average (p < 0.001), TNs area sum (p = 0.029), TNs perimeter sum (p < 0.001), TNs area/perimeter (p < 0.001), TNs/stromal area ratio (p < 0.001), and TNs cell density (p < 0.001) (Table 2).

| Correlation analysis
Since features extracted in the process of machine learning are prone to being correlated with one another, Spearman correlation analysis was carried out on the above 20 parameters (nine at the tissue-level, four at the cell-level, and seven at the nucleus-level). Figure 2 shows the correlation between parameters (super-diagonal), the two-way scatter plots (sub-diagonal), and the histogram of each parameter (diagonal). An absolute correlation coefficient >0.5 was taken to indicate a close relation between two parameters. The results showed multiple significant correlations among the nine parameters at the tissue-level, and multiple significant correlations among the nine parameters at the cell- and nucleus-level other than TCs nuclei circularity variance and pericancerous blood cell density (|r| > 0.5, p < 0.001).
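The screening rule (|r| > 0.5) can be illustrated with a small pure-Python sketch of the Spearman rank correlation. The feature names and values below are invented, the helper names are hypothetical, and the sketch does not compute the p-values reported in the paper.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def closely_related_pairs(features, threshold=0.5):
    """Return (name_a, name_b, rho) for every pair with |rho| > threshold."""
    flagged = []
    for a, b in combinations(features, 2):
        rho = spearman(features[a], features[b])
        if abs(rho) > threshold:
            flagged.append((a, b, rho))
    return flagged


# Invented feature values for five patients (illustration only).
toy = {
    "tn_area_sum": [1.0, 2.0, 3.0, 4.0, 5.0],
    "tn_perimeter_sum": [2.0, 4.1, 5.9, 8.2, 9.8],
    "nuclei_circularity_var": [3.0, 1.0, 5.0, 2.0, 4.0],
}
pairs = closely_related_pairs(toy)
```

Here only the two tissue-level features are flagged as closely related; the nuclear feature has rho = 0.3 with each of them and would be kept as a separate variable.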

| PLS regression
The mutually correlated parameters at each of the above two levels were entered into separate PLS regressions for feature dimensionality reduction. According to the RMSEP and variance explained rate, the PLS regression model at the tissue-level performed better with two components (the RMSEP was relatively small and the variance explained rate exceeded 85.0%). Similarly, the regression model at the cell- and nucleus-level performed better with two components (Figure 3A,B). As a result, four new parameters were obtained by PLS regression: component (COMP) 1 and COMP 2 at the tissue-level, and COMP 3 and COMP 4 at the cell- and nucleus-level (Figure 3C). Table S2 presents the PLS regression coefficients of the four components.
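For reference, the two selection criteria can be written out explicitly. The paper does not state the formulas, so the following uses common textbook definitions as an assumption: for a model with $k$ components, $\hat{y}_i^{(k)}$ denotes the cross-validated prediction for patient $i$, $X$ the centered predictor matrix, and $T_k$, $P_k$ the score and loading matrices of the first $k$ components (all symbols are introduced here for illustration).

```latex
\mathrm{RMSEP}(k) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i^{(k)}\right)^{2}},
\qquad
\mathrm{VarExp}(k) = 1 - \frac{\lVert X - T_k P_k^{\top}\rVert_F^{2}}{\lVert X\rVert_F^{2}}
```

Under these definitions, a good choice of $k$ is one beyond which RMSEP no longer decreases appreciably while the explained variance is already high, which matches the reported choice of two components with more than 85.0% of variance explained.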
3.5 | Clinical value of image features by multivariate analysis

| Multivariate survival analysis
To verify the clinical value of the newly selected image features, these parameters were combined with traditional histopathological characteristics for multivariate survival analysis. To avoid possible multicollinearity between variables in the Cox regression model, Spearman rank correlation analysis was performed again, and it showed no close correlation between image parameters and general pathological characteristics such as histopathological type (|r| < 0.5). Factors significant in the univariate survival analysis (p < 0.05) were then incorporated into the Cox regression model, which identified five independent prognostic indicators: three traditional factors (preoperative CA199, CC score, and histopathological type) and two image features (COMP 1 at the tissue-level and TCs nuclei circularity variance) (Figure 4; Table 5).

| Establishment and evaluation of nomogram
To construct a clinically applicable method for predicting the prognosis of PMP patients, a nomogram incorporating the above five independent prognostic factors was established; its C-index was 0.795 (95% CI: 0.748-0.842). The calibration plots (1000 bootstrap resamples) showed high consistency between predicted and observed 1-year, 2-year, and 3-year survival, indicating good predictive performance of the nomogram (Figure 5).
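The C-index reported here measures how often the model ranks a pair of patients correctly. A minimal sketch of Harrell's concordance index for right-censored data is given below; it is a simplified illustration rather than the implementation in the "rms" package (tied survival times are simply skipped), and the data are invented.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data.

    A higher risk score is assumed to mean shorter expected survival.
    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented data: follow-up (months), event indicator, model risk score.
c_index = harrell_c_index(
    times=[5, 8, 12, 20],
    events=[1, 0, 1, 0],
    risk_scores=[0.9, 0.4, 0.1, 0.6],
)
```

In this toy example four pairs are comparable and three are ranked correctly, giving a C-index of 0.75; a value of 0.5 corresponds to random ranking and 1.0 to perfect discrimination.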

4 | DISCUSSION
In this study, a total of 449 HE-stained WSIs from 92 PMP patients were annotated and characterized quantitatively using QuPath. Forty detailed morphological parameters were extracted at the tissue-, cell-, and nucleus-level. Research on PIs of other tumors has shown that HE-stained PIs contain many potential prognostic features. 15,16 In contrast, in-depth studies of PMP pathology are relatively rare, and only a few investigators have attempted quantitative analysis of PIs. Some studies have shown that TCs density and morphology could affect the prognosis of PMP, [17][18][19] and that stromal components also play an important role in PMP prognosis. 8,20 However, these studies were qualitative or semi-quantitative with relatively simple evaluations and failed to comprehensively analyze the whole PIs. Moreover, most studies focused on the tumor, and the stroma was insufficiently studied. In addition, reliable prediction tools for PMP, such as nomograms, are relatively scarce. [21][22][23] The nomogram in this study also showed good prediction performance (C-index 0.795). Previous studies had the following features: in a multi-center SEER dataset study, some prognostic factors were missing from the dataset and were excluded from the nomogram; 23 in a single-center large-sample study, some important data were missing because of the long time span or patient referral; 21 and other studies had complete data but small samples. 22 Moreover, these prognostic models included only general clinicopathological features. It is therefore important to explore potential features from the abundant information in PIs and to establish a nomogram prediction model to assist clinical decisions for PMP patients.
In this study, we identified five independent prognosticators and developed a nomogram for prognosis prediction and risk assessment in PMP patients. Preoperative CA199 was an independent prognostic factor, consistent with previous studies. The biomarker CA199 can inhibit cell differentiation, promote cytoadherence, and enhance tumor metastasis, and thus reflects the proliferative activity of TCs. 24 Previous studies have also demonstrated that a high preoperative CA199 level is an independent factor for poor prognosis. 25,26 Moreover, Hiraide et al. 27 found that modified FOLFOX6 chemotherapy could decrease the CA199 level, so reducing preoperative CA199 through perioperative chemotherapy may offer a way to improve the prognosis of PMP patients.
Another independent prognosticator was the CC score, which is consistent with our previous study. 28 The CC score is an objective index for evaluating tumor resection in standardized CRS, and previous studies found that PMP patients with a satisfactory CC score had a better prognosis. 22,29 Huang et al. 30 assessed the learning curve of the CRS + HIPEC technique and found that the CC0 completion rate increased as surgical experience accumulated. Bai et al. 31 found that gender, disease duration, anemia, and preoperative CA199 could help predict the CC score in PMP patients. Passot et al. 32 found that preoperative 18F-FDG PET could predict the postoperative CC score of PMP patients. Therefore, standardized CRS + HIPEC, improvement of surgical techniques, and professional preoperative examination could lower the postoperative CC score and thus improve survival.
We also found that histopathological type was an independent prognosticator and, according to the nomogram, had the greatest impact on PMP prognosis. Histopathological classification is essential for assessing tumor behavior. Previous researchers also found that histopathological type significantly affects the prognosis of PMP patients. 33 In contrast, some studies have found no significant correlation between prognosis and histopathological type, 34 and some have even shown that certain PMP patients with a higher malignant grade had a better prognosis than those with better differentiation. 7 These discrepancies may arise from differences in the standardization of PMP classification criteria and from inter-observer variation. Moreover, as mentioned above, PMP PIs may contain abundant information from which new pathological prognostic features can be explored.
In this study, QuPath was used to analyze PMP PIs. Multiple morphological features were extracted at the tissue-level to measure the morphological complexity of TNs structure, and the size, shape, and contour of TCs and nuclei were extracted at the cell- and nucleus-level to quantify the polymorphism of TCs. This approach could help overcome the inconsistency of subjective interpretation and extract features as objectively and reproducibly as possible. Moreover, with digital PIs features as the core, the prognostic model was established and two new prognostic features beyond conventional pathological parameters were found.
Tumor invasion largely depends on the collective behavior of TCs populations, that is, TNs. In-depth study of TNs and TCs can reveal more useful information about tumor development. Our previous study showed that the tumor/stroma area ratio had a significant impact on prognosis, suggesting the clinical value of TNs area. 35 Choudry et al. 19 and Horvath et al. 36 found that PMP patients with medium or high TCs density had a higher risk of disease progression. Bhatt et al. 18 found that the cytological morphology of TCs also affected the prognosis of PMP patients. Our study explored more features on the basis of these studies. The dimension-reduced features COMP 1 and COMP 2 comprehensively evaluated TNs morphology. Survival analysis showed that COMP 1 was an independent prognosticator, again indicating that TNs behavior is closely related to the malignancy of PMP.
In addition, the nomogram showed that TCs nuclei circularity variance had a greater impact on OS than COMP 1. Changes in the TCs nucleus play a leading role in tumor occurrence, development, and metastasis, and morphological changes in PIs offer an easy and intuitive way to detect transformation of the tumor nucleus. 37 Nuclear characteristics in PIs are the basis for distinguishing benign from malignant lesions and for grading many solid tumors. The Papanicolaou smear test is used to diagnose uterine and cervical cancer by detecting changes in nuclear chromatin staining, size, and shape. 38 Whitney et al. 39 found that nuclear shape, texture, and structure could independently predict the risk of recurrence in ER+ breast cancer patients. This study showed that TCs nuclei circularity variance, which mainly evaluates the uniformity of nuclear roundness, was an independent prognosticator: the greater the variation in nuclear roundness, the better the prognosis of PMP patients.
Although some promising results were obtained in this study, we must acknowledge the inherent limitations of this exploratory work. First, the relatively small sample size inevitably reduces the representativeness of the cohort, which could be a source of bias. Second, our nomogram was verified only by internal validation, which is less convincing than external validation. Therefore, further prospective model verification with a larger sample from a multi-center database is necessary. Third, the molecular landscape from genomic data is also important for tumor prognosis, and integrating multi-omics data into the prognostic model would be preferable. Finally, the quantitative study of HE-stained PIs of PMP is still at an exploratory stage, and the number and quality of the sub-visual features extracted may be limited. In the future, better algorithms need to be developed to uncover more potential pathological features.
In conclusion, this study established a prognostic nomogram based on new pathological characteristics of PMP obtained through quantitative analysis of PIs, which represents a first step towards digitalized pathological diagnosis in routine PMP pathology. This work could enhance the efficiency and accuracy of routine clinicopathological diagnosis, significantly reduce manual labor, locate typical pathological ROIs more objectively, and evaluate the impact of pathological features on PMP prognosis, which is conducive to more accurate pathological diagnosis, disease course prediction, and treatment decisions. The keys to further improving the prognosis of PMP in the future may be: (1) new adjuvant therapy to reduce preoperative CA199; (2) more thorough CRS; and (3) improved ability to mine information on the occurrence and development of the PMP tumor itself in order to predict its future biological behavior, which is an important and urgent clinical task.

FIGURE 2
Pairwise correlation and scatter plot matrix of all significant variables in KM analysis. The correlation between variables (super-diagonal), two-way scatter plot (sub-diagonal), and the histogram of each variable (diagonal). Red number: 1: TNs number; 2: TNs area average; 3: TNs area variance; 4: TNs perimeter average; 5: TNs area sum; 6: TNs perimeter sum; 7: TNs area/perimeter ratio; 8: TNs/stromal area ratio; 9: TNs cell density; 10: TCs nuclei area average; 11: TCs nuclei area variance; 12: TCs nuclei perimeter average; 13: TCs nuclei circularity variance; 14: the min caliper average of TCs nuclei; 15: the min caliper variance of TCs nuclei; 16: TCs area variance; 17: TCs perimeter variance; 18: TCs area/perimeter ratio; 19: TCs nuclei area/perimeter ratio; 20: pericancerous blood cell density. Blue *: there was a strong correlation between the two variables (|r| > 0.5, p < 0.001).

Kaplan-Meier analysis showed that 20 features were significantly correlated with OS, and they were fused into four composite features by correlation analysis and PLS regression. Sub-visual pathological features and other clinicopathological indicators related to prognosis were incorporated into Cox regression, which identified five independent prognosticators in three categories: (1) preoperative tumor marker: CA199; (2) operation technique: CC score; (3) tumor biological characteristics: histopathological type, COMP 1 at the tissue-level, and TCs nuclei circularity variance. The nomogram demonstrated that the effect on prognosis, in descending order, was as follows: histopathological type, TCs nuclei circularity variance, CC score, COMP 1, and preoperative CA199. The validation results showed that the C-index reached 0.795, and the calibration curve also showed good prediction performance for 1-, 2-, and 3-year survival.

TABLE 5
Multivariate Cox proportional hazards model in 92 PMP patients.

FIGURE 5
Establishment of the nomogram for PMP patients. (A) Construction of the prognostic nomogram to predict patient survival; (B) calibration plots of the nomogram for predicting 1-, 2-, and 3-year survival probability. CA199, carbohydrate antigen 19-9; CC, completeness of cytoreduction; COMP, component; PMP, pseudomyxoma peritonei.
TABLE 1 Major clinicopathological characteristics of PMP patients.
TABLE 2 Analysis of tissue-level features regarding OS (p < 0.05).
TABLE 3 Analysis of cell-level features regarding OS (p < 0.05).
TABLE 4 Analysis of nucleus-level features regarding OS (p < 0.05).
Abbreviations: OS, overall survival; TCs, tumor cells.