Using an anomaly detection approach for the segmentation of colorectal cancer tumors in whole slide images

Colorectal cancer (CRC) is the second most commonly diagnosed cancer in the United States. Genetic testing is critical in assisting in the early detection of CRC and the selection of individualized treatment plans, which have been shown to improve the survival rate of CRC patients. The tissue slide review (TSR), a tumor tissue macro-dissection procedure, is a required pre-analytical step to perform genetic testing. Due to the subjective nature of the process, major discrepancies in CRC diagnostics by pathologists are reported, and metrics for quality are often only qualitative. Progressive context encoder anomaly detection (P-CEAD) is an anomaly detection approach to detect tumor tissue from whole slide images (WSIs), since tumor tissue is, by its nature, an anomaly. P-CEAD-based CRC tumor segmentation achieves a 71%±26% sensitivity, 92%±7% specificity, and 63%±23% F1 score. The proposed approach provides an automated CRC tumor segmentation pipeline with quantitatively reproducible quality compared with the conventional manual tumor segmentation procedure.


Background
Colorectal cancer (CRC) is the second most frequently diagnosed cancer in the United States for both sexes and is also the second most common cause of cancer-related deaths worldwide. 1,2 Genetic testing is the cornerstone of personalized medicine and is rapidly becoming a necessary tool for prognostication and treatment selection, which have the potential to enhance the 5-year survival rate of CRC patients. 3 According to the most recent NCCN guidelines, 4 the most important factors that influence treatment selection include pathologic staging and prognostic markers, including, but not limited to, MMR status (with reflex for MLH1 promoter methylation or more expanded genomic testing), Her2 immunostain/fluorescent in-situ hybridization, and KRAS, NRAS, BRAF, and NTRK mutations. Next-generation sequencing (NGS) can investigate most of the above mutations and fusions.
It is important to conduct genetic testing during the diagnostic process in CRC, as 5%-15% of cases are caused by inherited cancer susceptibility genes. 5,6 TP53 mutation status correlates with higher stage and influences the overall survival rate. 7 EGFR inhibitor therapies are not effective for CRC patients with positive mutations in KRAS, BRAF, PIK3CA, and PTEN, highlighting the need for an accurate genetic mutation status to ensure successful selection of individualized therapy. Genetic mutation status also impacts CRC survival: CRC patients with a positive mutation of LRP1B have a higher recurrence rate and shorter progression-free survival (PFS) than those with a positive mutation of FAT4. 7 Therefore, CRC genetic testing is critical in improving predictions of CRC prognosis and survival rate.
In conventional clinical CRC patient-care pathways, tumor samples are formalin-fixed and paraffin-embedded into one or more tissue blocks. A guideline by Ballester and Cruz-Correa is used to determine if individuals should undergo genetic testing based on factors such as age at diagnosis of affected family members, personal and family history of colon polyps, and extracolonic cancers. 8 If a patient meets the guideline for genetic testing, a pathologist confirms the best block for testing and annotates tumor regions on H&E-stained slides to ensure selection of the largest and purest viable tumor area, followed by tumor scraping from unstained slides by cytotechnologists. This step ensures the highest possible yield of DNA or RNA from the specimen, with the fewest benign or inflammatory cell contaminants.
The most important factors in ensuring successful NGS testing are preanalytical variables, including selection of invasive tumor, size of invasive tumor, viability of tumor, and purity of tumor (i.e., minimal presence of benign cells, including inflammatory cells). The current clinical workflow for NGS testing, known as tissue slide review (TSR), is completely manual and suffers from significant interindividual variation, leading to discrepancies in tumor annotation. 9 To improve on this process, we propose to employ an artificial intelligence (AI) tumor segmentation algorithm to automatically detect tumor regions from digitized H&E-stained whole slide images (WSIs). This would allow control of multiple pre-analytical variables by selecting the block with the largest and purest tumor surface area and segmenting that area for later tumor recovery and subsequent testing (Fig. 1).

Related work
Image classification is a widely used method for detecting tumor regions in WSIs. This approach labels a WSI as either CRC positive or negative. [11][12][13] On the other hand, image segmentation provides the x and y coordinates of tumor regions in CRC WSIs, which is necessary for the TSR process (rather than a yes/no answer that tumor is present). 14 While a supervised image segmentation approach is promising, acquiring ground-truth annotations from pathologists to train the supervised image segmentation model can be biased, expensive, and time-consuming, making the training process impractical.
To mitigate the need for pathologist-provided annotations, the use of a generative adversarial network (GAN) has been explored for unsupervised anomaly detection. 15 Such a model identifies patterns of pixels that deviate from the pattern established in the training images, without the need for high-quality pixel-level annotations from pathologists. This approach is particularly useful in tumor segmentation, as tumor tissue is a type of anomalous colon tissue. 16,17 The GAN-based anomaly detection algorithm, referred to as GANomaly, 18 is a commonly used unsupervised anomaly detection approach. However, GANomaly is based on the deep convolutional GAN (DCGAN), 19 which is not meant for very high-resolution images such as colon WSIs.
Unlike DCGAN, the progressive GAN, also known as pGAN, is specifically designed for high-resolution image data. 20 In pGAN, the 2 major components, the generator (G) and the discriminator (D), are trained gradually, starting from 4×4 resolution. Layers of increasing resolution are incrementally added to G and D, allowing the model to be progressively trained from 4×4 up to 1024×1024, doubling the resolution at each step, while keeping all existing layers trainable during the entire training process. In addition, to maintain a smooth transition from lower to higher resolutions during the training of G, new layers are faded in smoothly while the current resolution of the image features is doubled using nearest-neighbor filtering. A newly added toRGB layer, which projects the features to the red, green, and blue (RGB) color channels, is blended in with a weight α that increases linearly from 0 to 1. Conversely, another newly added fromRGB layer with the same weight α projects the RGB color images to feature vectors. These features are then passed to a new convolutional layer, and their resolution is halved using average pooling. A similarly smoothed training process is performed for D, in which the input images are downscaled to match the current resolution of the network. This progressive architecture is able to outperform conventional GAN architectures in generating photorealistic high-resolution normal colon WSIs: at relatively low resolution levels it provides a global view of normal colon histology across the entire slide, and at relatively high resolution levels it provides a local view of detailed nuclear morphology patterns. Therefore, the progressive context encoder anomaly detection pipeline (P-CEAD) 21 was proposed for CRC tumor segmentation.
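The fade-in described above can be illustrated with a minimal NumPy sketch. It is not the pGAN implementation: the function names (`linear_alpha`, `upsample_nearest`, `fade_in`, `avg_pool_2x`), the channel-last array layout, and the `fade_steps` parameter are assumptions made for illustration, and the arrays stand in for the RGB outputs of the old and newly added layers.

```python
import numpy as np

def linear_alpha(step, fade_steps):
    """Weight of the newly added layer; rises linearly from 0 to 1 (fade_steps is hypothetical)."""
    return min(1.0, step / fade_steps)

def upsample_nearest(x):
    """Double height and width of an (H, W, C) array by nearest-neighbor repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fade_in(prev_rgb, new_rgb, alpha):
    """Blend the upsampled lower-resolution output with the new layer's output."""
    return (1.0 - alpha) * upsample_nearest(prev_rgb) + alpha * new_rgb

def avg_pool_2x(x):
    """Halve height and width of an (H, W, C) array by 2x2 average pooling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
```

With `alpha = 0` the output is just the upsampled lower-resolution image, and with `alpha = 1` it is entirely the new layer's output, so the network never sees an abrupt change when a resolution level is added.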

Materials and methods
The objective of this research is to automate the process of segmenting CRC tumor regions from WSIs using P-CEAD. P-CEAD is an anomaly detection pipeline based on pGAN. Its training process consists of 3 phases (Fig. 2). 21 In Phase 1, a pGAN architecture is trained using an image inpainting technique 22 exclusively on normal colon WSIs, in order to produce photorealistic normal (non-diseased) colon WSIs. This training phase enables pGAN to learn a reliable reference distribution of normal colon tissue representations by minimizing the error between the input real normal colon WSIs and the generated photorealistic colon WSIs. Since not all pixels in a WSI belong to tissue regions, the Otsu 23 method was used to identify these regions and extract image patches from them. Image patches of 1024×1024 pixels were extracted from tissue regions on the WSIs and then downsampled to 512×512, 256×256, 8×8, and 4×4 pixels. The training data is saved in TFRecord files, 24 with each file containing binary image patch tensors and the corresponding file …
The goal of Phase 2 of the training process is to calculate the normal error reference distribution (NERD). NERD is a multivariate Gaussian distribution of the absolute errors, also known as reconstruction errors, between the input real WSIs and the generated photorealistic WSIs. Because pGAN is trained only on normal colon WSIs during Phase 1, the absolute errors between the input real normal colon WSIs and the generated photorealistic normal colon WSIs should be small. The reconstruction errors between input real CRC WSIs and their generated counterparts are expected to be relatively large, because the GAN never learned to encode features present in anomalous tissue.
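The tissue masking and patch preparation used in Phase 1 can be sketched as follows. This is a hedged illustration, not the published pipeline: it assumes an 8-bit grayscale thumbnail in which tissue is darker than the background, and the helper names (`otsu_threshold`, `tissue_patch_origins`, `resolution_pyramid`), the `min_fraction` parameter, and the use of 2×2 averaging for downsampling are all assumptions.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the 8-bit threshold that maximizes between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of the darker class
    mu = np.cumsum(prob * np.arange(256))      # cumulative intensity mean
    denom = omega * (1.0 - omega)
    between = np.zeros(256)
    valid = denom > 0                          # skip thresholds with an empty class
    between[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(between))

def tissue_patch_origins(tissue_mask, patch, min_fraction=0.5):
    """List (row, col) origins of non-overlapping patches that are mostly tissue."""
    origins = []
    h, w = tissue_mask.shape
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            if tissue_mask[r:r + patch, c:c + patch].mean() >= min_fraction:
                origins.append((r, c))
    return origins

def resolution_pyramid(patch, min_size=4):
    """Halve an (H, W, C) patch repeatedly by 2x2 averaging down to min_size."""
    levels = {patch.shape[0]: patch}
    while patch.shape[0] > min_size:
        h, w, c = patch.shape
        patch = patch.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
        levels[patch.shape[0]] = patch
    return levels
```

A tissue mask is then `gray <= otsu_threshold(gray)`, and only patches whose origin is returned by `tissue_patch_origins` would be passed on to training.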
During Phase 3 of the training, the NERD and the reconstruction errors are used to calculate pixel-level Mahalanobis distances. The goal of this phase is to identify a cut-off threshold that distinguishes normal from CRC tumor pixels in a WSI. If the Mahalanobis distance for a given pixel is higher than the threshold, the pixel is considered an abnormal colon pixel; otherwise, it is considered a normal colon pixel.
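Phases 2 and 3 can be sketched in a few lines of NumPy. The sketch assumes that each pixel contributes a per-channel absolute reconstruction error vector, so the Mahalanobis distance of an error vector e is sqrt((e − μ)ᵀ Σ⁻¹ (e − μ)) with μ and Σ estimated from normal tissue. The function names, the small ridge term added to the covariance, and the quantile rule for choosing the cut-off are assumptions; the text does not state how the threshold is selected.

```python
import numpy as np

def fit_nerd(errors):
    """Fit a multivariate Gaussian to (N, C) reconstruction errors from normal tissue."""
    mean = errors.mean(axis=0)
    cov = np.atleast_2d(np.cov(errors, rowvar=False))
    cov = cov + 1e-6 * np.eye(errors.shape[1])   # ridge for numerical stability (assumption)
    return mean, np.linalg.inv(cov)

def mahalanobis_map(err_img, mean, cov_inv):
    """Per-pixel Mahalanobis distance for an (H, W, C) error image."""
    diff = err_img - mean
    d2 = np.einsum("hwc,cd,hwd->hw", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))

def pick_threshold(normal_distances, quantile=0.99):
    """Hypothetical cut-off rule: a high quantile of distances on held-out normal pixels."""
    return float(np.quantile(normal_distances, quantile))

def binarize(distances, threshold):
    """True where a pixel is considered abnormal (distance above the cut-off)."""
    return distances > threshold
```

Pixels whose error vector lies close to the normal-tissue mean receive a distance near zero and stay below the cut-off, whereas poorly reconstructed pixels are flagged.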
After completing all 3 phases of training, the pGAN model was fed 1024×1024 resolution image patches extracted from tissue regions on a test set of WSIs containing CRC. The reconstruction errors between the input images and the images generated by the trained pGAN were calculated and binarized based on the Mahalanobis distance threshold. Using the shapely package for Python, 25 polygon objects were created around the identified CRC tumor pixels and saved into a GeoPandas dataframe. 25 The comparison between predicted and pathologist-annotated CRC tumor polygon objects was used to calculate a confusion matrix, consisting of pixel-level counts of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) areas of the WSI. TP was defined as the number of pixels within both the predicted and annotated CRC tumor polygons. FP was defined as the number of pixels within the predicted CRC tumor polygons but not within the annotated CRC tumor polygons. TN was defined as the number of pixels outside both the predicted and annotated CRC tumor polygons. FN was defined as the number of pixels outside the predicted CRC tumor polygons but within the annotated CRC tumor polygons. In other words, each pixel is labeled independently: when tumor tissue polygons are annotated, all pixels within the boundary are labeled TRUE and the remaining pixels are labeled FALSE. During the inference stage, we predict an additional polygon set that defines the predicted TRUE regions, with all other pixels being FALSE. Pixels are then compared between the ground truth and the prediction: a pixel that is TRUE in both is counted as a TP, and a pixel that is FALSE in both is counted as a TN. Sensitivity, specificity, and F1 score are derived from these values to provide a quantitative measurement of model performance. The codebase, including the training and inference pipeline, is publicly available via https://github.com/quincy-125/tsr_crc_tumor_seg.
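The pixel-level evaluation can be sketched as below. For brevity, the sketch uses boolean masks in place of the shapely polygons (a rasterized polygon is assumed to give the same TRUE/FALSE labeling), and the function names are illustrative rather than taken from the published codebase.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Pixel-level TP, FP, TN, FN from two boolean masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))     # inside both predicted and annotated regions
    fp = int(np.sum(pred & ~truth))    # predicted tumor, not annotated
    tn = int(np.sum(~pred & ~truth))   # outside both
    fn = int(np.sum(~pred & truth))    # annotated tumor, not predicted
    return tp, fp, tn, fn

def segmentation_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and F1 score; 0.0 when a denominator is empty."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return sensitivity, specificity, f1
```

Computing these values once per slide yields one sensitivity, specificity, and F1 score per WSI, which is the granularity at which the results are reported.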
A total of 277 WSIs scanned by the Aperio GT450 scanner 26 at the Mayo Clinic were used for training and inference (Table 1). Out of these, 140 were normal colon WSIs and 137 were CRC WSIs. All WSIs underwent quality control examination by a senior cytotechnologist and a senior anatomic pathologist. During the training process, the 140 normal colon WSIs were used: 100 for Phase 1, 20 for Phase 2, and the remaining 20 for Phase 3. Model inference was performed using all 137 CRC WSIs. Manual annotations of CRC tumors were required to compute the statistical metrics (i.e., confusion matrix, sensitivity, specificity, and F1 score). Tumor annotations for all 137 CRC WSIs were drawn by pathologists using QuPath. 27
A notable advantage of the P-CEAD-based CRC tumor segmentation pipeline is that its training is fully unsupervised. This eliminates the need for time-consuming and costly pathologist annotations during the training process.
However, our ground truth for CRC tumor annotation, meticulously done by a pathologist, is microscopic and encapsulates large non-tumor areas surrounding the main lesion, inclusive of whitespace regions. These whitespace regions are a source of measured model error, since our model considers only tissue-containing patches and excludes whitespace. Consequently, the manually annotated CRC tumor areas (TP+FN) tend to be larger than the predicted tumor areas (TP+FP), leading to an increase in false-negative predictions (Fig. 4A). One potential solution in future iterations is to remove whitespace regions from the manually annotated areas to reduce the false negatives.
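The proposed whitespace correction can be illustrated with a small sketch. It assumes that a boolean tissue mask is available (for example from Otsu thresholding) and that annotations and predictions have been rasterized to masks of the same shape; the function names are hypothetical, and this is an illustration of the idea rather than an evaluated method.

```python
import numpy as np

def restrict_to_tissue(annotation_mask, tissue_mask):
    """Keep only annotated pixels that actually contain tissue."""
    return annotation_mask & tissue_mask

def false_negative_count(pred, truth):
    """Number of annotated pixels that the prediction misses."""
    return int(np.sum(~pred & truth))

# Toy example: the annotation covers the whole tile, but only the left half is tissue.
annotation = np.ones((4, 4), dtype=bool)
tissue = np.zeros((4, 4), dtype=bool)
tissue[:, :2] = True
prediction = tissue.copy()  # the model can only predict on tissue pixels

fn_before = false_negative_count(prediction, annotation)
fn_after = false_negative_count(prediction, restrict_to_tissue(annotation, tissue))
```

In this toy case, every false negative comes from whitespace inside the annotation, so intersecting the annotation with the tissue mask removes all of them.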
Originally designed as an anomaly detection model, P-CEAD identifies all regions diverging from the norm, which includes inked tissue, inflamed tissue, and malignant areas in a WSI. This could lead the model to classify artifacts such as on-slide annotations as anomalies, thereby increasing the false-positive predictions (Fig. 4B). To mitigate this, we propose adopting Jiang et al.'s 28 ink-removal technique as part of the data preprocessing procedure before model inference in future experiments.
In our P-CEAD-based model, peritumoral changes were included in the predicted CRC tumor areas. As discussed earlier, P-CEAD aims to detect all anomalous tissues, not solely malignant CRC tumors. Hence, the model included benign stromal tissue connected to malignant CRC tumors within the predicted areas, a factor contributing to false positives. For model training, we relied on normal colon WSIs (Section 2). A potential amendment could be to introduce benign tissues into the training set to adjust the normal error reference distribution (NERD), thereby reducing false-positive predictions from non-malignant tissues (Fig. 4C).
The confusion matrix (TP, FP, TN, FN) of our P-CEAD-based model is measured at the pixel level. Although our model produced some pixel-level false predictions, these did not lead to slide-level false predictions. For each of the 137 malignant colon WSIs (positive WSIs), our model found pixel-level TPs. Therefore, our P-CEAD-based model achieved 100% accuracy in predicting the presence of malignant colon tumors at the slide level.
In summary, our P-CEAD model, an unsupervised anomaly detection-based tumor segmentation approach, yielded 71%±26% sensitivity, 92%±7% specificity, and 63%±23% F1 score in segmenting CRC tumors from WSIs. This underscores the value of further exploring the P-CEAD-based tumor segmentation algorithm in other cancer types. To optimize model performance, we recommend adding WSIs with artifacts or non-malignant CRC tumor anomalous tissue to the training data set. This could reduce the misclassification of such tissues as malignant CRC tumors when utilizing the anomaly detection approach of P-CEAD. Further, image preprocessing approaches such as ink removal and whitespace removal using the Otsu method could enhance both quantitative (i.e., reducing false positives and negatives) and qualitative model performance.

Fig. 1. Diagram of the manual workflow of tissue slide review (TSR). The figure includes 10 components. Component (a) is a CRC tumor tissue; (b) is a cut CRC tumor biopsy sample; (c) is a glass slide with the non-stained two-dimensional CRC tumor tissue block section cut from (b); (d) is a glass slide with the H&E-stained two-dimensional CRC tumor tissue block section cut from (b), which is the section adjacent to (c); (e) illustrates the general anatomic pathology practice workflow in which pathologists make cancer diagnoses using a microscope on glass slides; (f) is the pathologist's diagnosis, with a red polygon highlighting the CRC tumor tissue regions from (d); (g) is the black CRC tumor polygon on (c) that has been aligned with the red CRC tumor polygon on (d); (h) illustrates the clinical workflow in which cytotechnologists scrape the CRC tumor tissue on (g); (i) is the NGS device used for genetic testing; (j) is the genetic testing result from the NGS technology. The figure includes two subfigures: (A) biopsy sample preparation pipeline; (B) tissue diagnostics and genetic testing pipeline.

Fig. 2. Training and inference pipeline diagram of P-CEAD in CRC tumor segmentation. Phase 1: pGAN training; Phase 2: calculating NERD; Phase 3: selecting the cut-off Mahalanobis distance threshold; Inference Phase: evaluating P-CEAD performance in CRC tumor segmentation.

Fig. 3. Quantitative measurement results of P-CEAD inference performance in CRC tumor segmentation on 137 CRC tumor WSIs. The statistical metrics include sensitivity, specificity, and F1 score. Each blue dot represents one CRC WSI.

Fig. 4. Qualitative model performance evaluations. The red lines are the boundaries of the ground-truth CRC tumor annotations; the blue lines are the boundaries of the predicted CRC tumor areas. (A) Impact of WSI whitespace on model performance evaluation, with 4 example patches (a1)-(a4). All whitespace areas shown in (a1)-(a4) are included in the manual CRC tumor annotation regions but not in the model prediction regions. (B) Impact of predicted artifacts on model performance evaluation, with 4 example patches (b1)-(b4). (b1) and (b2) are example patches with green on-slide annotation ink that lies within the model prediction regions but outside the manual CRC tumor annotation regions. (b3) and (b4) are example patches with black on-slide annotation ink that lies within the model prediction regions but outside the manual CRC tumor annotation regions. (C) Impact of predicted non-malignant CRC tumor anomalous tissue on model performance evaluation, with 4 example patches (c1)-(c4). In example patches (c1) and (c3), tissue to the left of the red polygon boundary line is included in the manual CRC tumor annotations; tissue to the right of the red polygon boundary line is not included in the manual CRC tumor annotations but is included in the model prediction regions. In the remaining 2 example patches, (c2) and (c4), tissue in the lower regions toward the red boundary line is included in both the annotated and predicted CRC tumor areas; tissue in the upper regions toward the red boundary line is included only in the predicted CRC tumor areas, not in the annotated ones.

Table 1
Data information summary table with WSI type and number of WSIs information regarding each of the 3 training phases and 1 inference phase.