Transformer-based biomarker prediction from colorectal cancer histology: A large-scale multicentric study

Summary
Deep learning (DL) can accelerate the prediction of prognostic biomarkers from routine pathology slides in colorectal cancer (CRC). However, current approaches rely on convolutional neural networks (CNNs) and have mostly been validated on small patient cohorts. Here, we develop a new transformer-based pipeline for end-to-end biomarker prediction from pathology slides by combining a pre-trained transformer encoder with a transformer network for patch aggregation. Our transformer-based approach substantially improves performance, generalizability, data efficiency, and interpretability compared with current state-of-the-art algorithms. After training and evaluating on a large multicenter dataset of over 13,000 patients from 16 colorectal cancer cohorts, we achieve a sensitivity of 0.99 with a negative predictive value of over 0.99 for the prediction of microsatellite instability (MSI) on surgical resection specimens. We demonstrate that training on resection specimens alone reaches clinical-grade performance on endoscopic biopsy tissue, solving a long-standing diagnostic problem.
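To illustrate the two-stage design (a frozen encoder turns each patch into a feature vector, and a second transformer pools those vectors into one slide-level prediction), here is a minimal NumPy sketch of class-token aggregation. It is not the authors' architecture: the class name `ClassTokenAggregator`, the single pre-norm layer, four heads, the MLP width, and the random (untrained) weights are all hypothetical choices for illustration. Because no positional encoding is used, the prediction does not depend on patch order, and the class token's attention over patches is one possible source of a patch-level attention map.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Normalize over the feature (last) axis; learned scale/shift omitted for brevity.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax.
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


class ClassTokenAggregator:
    """Hypothetical one-layer, pre-norm transformer that pools patch features
    through a class token. Weights are random here; in practice they are learned."""

    def __init__(self, dim: int, n_heads: int = 4, n_classes: int = 2, seed: int = 0):
        assert dim % n_heads == 0, "dim must be divisible by n_heads"
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(dim)
        self.n_heads = n_heads
        self.cls = rng.normal(0.0, s, size=(1, dim))          # class token
        self.w_qkv = rng.normal(0.0, s, size=(dim, 3 * dim))  # joint Q/K/V projection
        self.w_out = rng.normal(0.0, s, size=(dim, dim))
        self.w_ff1 = rng.normal(0.0, s, size=(dim, 2 * dim))
        self.w_ff2 = rng.normal(0.0, s, size=(2 * dim, dim))
        self.w_head = rng.normal(0.0, s, size=(dim, n_classes))

    def __call__(self, patch_feats: np.ndarray):
        # patch_feats: (n_patches, dim), e.g. outputs of a frozen pre-trained encoder.
        x = np.concatenate([self.cls, patch_feats], axis=0)  # (n_patches + 1, dim)
        n, d = x.shape
        dh = d // self.n_heads
        q, k, v = np.split(layer_norm(x) @ self.w_qkv, 3, axis=-1)
        # Reshape each to (heads, tokens, head_dim).
        q, k, v = (t.reshape(n, self.n_heads, dh).transpose(1, 0, 2) for t in (q, k, v))
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, tokens, tokens)
        mixed = (attn @ v).transpose(1, 0, 2).reshape(n, d)
        x = x + mixed @ self.w_out                                       # attention residual
        x = x + np.maximum(layer_norm(x) @ self.w_ff1, 0.0) @ self.w_ff2  # MLP residual
        probs = softmax(layer_norm(x[0]) @ self.w_head)  # read out the class token only
        cls_to_patches = attn[:, 0, 1:]                  # (heads, n_patches)
        return probs, cls_to_patches


# Usage: 50 patches with hypothetical 32-dimensional features.
feats = np.random.default_rng(1).normal(size=(50, 32))
probs, attn = ClassTokenAggregator(dim=32)(feats)
print(probs.shape, attn.shape)  # (2,) (4, 50)
```

A class token lets the slide-level representation be learned rather than fixed (as mean or max pooling would be), at the cost of attention that scales quadratically with the number of patches.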

5 Supplementary Tables

Suppl. Table 1: Multi-cohort experiments with statistical endpoints, related to Figure 2. The multi-cohort dataset consists of CPTAC, DACHS, DUSSEL, Epi700, ERLANGEN, FOxTROT, MCO, MECC, MUNICH, QUASAR, RAINBOW, TCGA, and TRANSCOT (all resection cohorts except YCR-BCIP and GUANGZHOU). The models were trained with HistAuGAN stain color augmentation, CTransPath as the feature extractor, and our transformer model with a class token as the aggregation model. The thresholds 0.9, 0.925, and 0.95 were determined on the in-domain test set and used for the evaluation on the external test sets. All results for sensitivity, negative predictive value (NPV), and specificity are averaged over the five folds.
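The evaluation protocol (fix a cut-off on the in-domain test set, then apply it unchanged to external cohorts and average the metrics over folds) can be sketched as follows. This assumes that 0.9, 0.925, and 0.95 denote target sensitivity levels, as in Supplementary Figure 4, where the threshold is chosen so that 0.95 sensitivity is reached; that a case is called positive when its score is at or above the cut-off; and that labels use 1 for the positive class (e.g. MSI). The function names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def threshold_for_sensitivity(scores: Sequence[float], labels: Sequence[int], target: float) -> float:
    """Largest cut-off t such that the rule `score >= t` reaches at least `target` sensitivity."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target sensitivity must lie in (0, 1]")
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("at least one positive case is required")
    # Number of positives that must be called positive; the epsilon guards against
    # floating-point products such as 0.95 * n landing just above an integer.
    k = max(1, math.ceil(target * len(pos) - 1e-9))
    return pos[k - 1]


def sens_spec_npv(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> Tuple[float, float, float]:
    """Sensitivity, specificity, and NPV of the rule `score >= cutoff` (positive label = 1)."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        called_positive = s >= cutoff
        if y == 1:
            if called_positive:
                tp += 1
            else:
                fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    sens = tp / (tp + fn) if tp + fn else nan
    spec = tn / (tn + fp) if tn + fp else nan
    npv = tn / (tn + fn) if tn + fn else nan
    return sens, spec, npv


def evaluate_folds(folds, target: float) -> Tuple[float, ...]:
    """Each fold: (in_domain_scores, in_domain_labels, external_scores, external_labels).
    The cut-off is fixed on the in-domain test set, then applied to the external set."""
    per_fold = [
        sens_spec_npv(ext_s, ext_y, threshold_for_sensitivity(in_s, in_y, target))
        for in_s, in_y, ext_s, ext_y in folds
    ]
    # Average each metric over the folds.
    return tuple(mean(m[i] for m in per_fold) for i in range(3))


# Toy scores, not study data.
in_scores, in_labels = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1], [1, 1, 1, 1, 0, 0]
ext_scores, ext_labels = [0.95, 0.65, 0.75, 0.3, 0.72], [1, 1, 0, 0, 0]
cut = threshold_for_sensitivity(in_scores, in_labels, target=0.75)  # 0.7
print(sens_spec_npv(ext_scores, ext_labels, cut))  # (0.5, 0.3333333333333333, 0.5)
```

The toy example shows why the in-domain threshold matters: a cut-off that achieves the target sensitivity in-domain can fall short of it on a shifted external cohort.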

Supplementary Figure 4: Data efficiency analysis and confusion matrices, related to Figure 4. a-c) AUROC scores depending on the number of patients available for training. The samples were randomly drawn from all resection cohorts with available labels except the external test cohort. a) MSI prediction on YCR-BCIP. b) BRAF prediction on Epi700. c) KRAS prediction on Epi700. d-e) The first column shows the threshold determined on the external test set such that 0.95 sensitivity is reached. The second to fourth columns show fixed thresholds (0.25, 0.5, and 0.75, respectively). The last column shows the threshold determined on the in-domain test set such that 0.95 sensitivity is reached. d) Results of the multi-cohort model evaluated on the biopsy cohort MAINZ. e) Results of the multi-cohort model evaluated on the biopsy cohort YCR-BCIP.
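For the data efficiency analysis in panels a-c, the training set is a random draw of patients from every labeled resection cohort except the external test cohort. A minimal sketch, assuming patient IDs are unique across cohorts and that the caller has already restricted each list to labeled resection patients; the function name, seeds, and cohort contents below are hypothetical.

```python
import random
from typing import Dict, List


def draw_training_subset(
    patients_by_cohort: Dict[str, List[str]],
    external_cohort: str,
    n: int,
    seed: int = 0,
) -> List[str]:
    """Draw n patient IDs uniformly at random from all cohorts except the external one."""
    # Sort cohorts so the pooled order, and hence the sample for a given seed, is reproducible.
    pool = [
        pid
        for cohort, pids in sorted(patients_by_cohort.items())
        if cohort != external_cohort
        for pid in pids
    ]
    if n > len(pool):
        raise ValueError(f"requested {n} patients, only {len(pool)} available")
    return random.Random(seed).sample(pool, n)


# Hypothetical cohorts; in practice only labeled resection patients would be listed.
cohorts = {
    "DACHS": ["d1", "d2", "d3"],
    "TCGA": ["t1", "t2"],
    "YCR-BCIP": ["y1", "y2"],
}
for size in (2, 4, 5):
    subset = draw_training_subset(cohorts, external_cohort="YCR-BCIP", n=size, seed=size)
    print(size, sorted(subset))
```

Sampling patients (rather than slides or patches) keeps each training-set size a count of independent cases, which matches how the x-axis is described in the caption.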
and clinical characteristics of participants | yes
21a Distribution of severity of disease in those with the target condition
21b Distribution of alternative diagnoses in those without the target condition | yes
22 Time interval and any clinical interventions between index test and reference standard
RESULTS
Test results
23 Cross tabulation of the index test results (or their distribution) by the results of the reference standard
24 Estimates of diagnostic accuracy and their precision (such as 95% confidence intervals) | yes
25 Any adverse events from performing the index test or the reference standard
DISCUSSION
26 Study limitations, including sources of potential bias, statistical uncertainty, and generalisability | yes
27 Implications for practice, including the intended use and clinical role of the index test
OTHER INFORMATION
28 Registration number and name of registry
29 Where the full study protocol can be accessed
30 Sources of funding and other support; role of funders | yes

Suppl. Table 4: Patient cohorts used in this study and their characteristics, related to Figure 1. Clinico-pathological data were provided by the respective study principal investigators. In all cases, the TNM version from the original study registry was used. Information about the localization of the tumor was either provided as a binary variable (left-sided vs. right-sided) by the study site or assigned by the authors as follows: the cecum, ascending colon, hepatic flexure, and transverse colon were defined as right-sided tumor locations, whereas the splenic flexure, descending colon, sigmoid colon, and rectum were defined as left-sided.
*Number of patients before dropout of samples.
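The left/right assignment rule in the Suppl. Table 4 caption amounts to a lookup table. A minimal sketch, assuming free-text site names and the output labels "left"/"right"; the function names, string normalization, and the choice to prefer a study-provided label when one exists are illustrative assumptions. Sites not named in the caption map to None rather than being guessed.

```python
from typing import Optional

# Anatomic sites and their side, as listed in the Suppl. Table 4 caption.
RIGHT_SIDED = {"cecum", "ascending colon", "hepatic flexure", "transverse colon"}
LEFT_SIDED = {"splenic flexure", "descending colon", "sigmoid colon", "rectum"}


def tumor_side(site: str) -> Optional[str]:
    """Map an anatomic site string to 'right' or 'left'; None if the site is not listed."""
    key = " ".join(site.lower().split())  # normalize case and whitespace
    if key in RIGHT_SIDED:
        return "right"
    if key in LEFT_SIDED:
        return "left"
    return None


def resolve_side(reported_side: Optional[str], site: Optional[str]) -> Optional[str]:
    """Prefer a left/right label supplied by the study site; otherwise derive it from the site."""
    if reported_side in ("left", "right"):
        return reported_side
    return tumor_side(site) if site else None


print(tumor_side("Transverse colon"), tumor_side("Rectum"))  # right left
```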

Suppl. Table 2: Ablation study on architecture choices, related to Figure 2. The models were trained with the same pre-processing and feature extractor, varying only the architecture of the aggregation model. All models were trained with 5-fold cross-validation.
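The caption does not state how the five folds were formed. The sketch below assumes splits at the patient level, so that slides from one patient never appear in both the training and test partitions; it uses no stratification and no separate validation split, and the function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def patient_kfold(patient_ids: Sequence[str], k: int = 5, seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Return k (train_ids, test_ids) splits; every patient is in exactly one test fold."""
    ids = sorted(set(patient_ids))  # one entry per patient, even with several slides
    if len(ids) < k:
        raise ValueError("need at least k distinct patients")
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # round-robin: fold sizes differ by at most 1
    splits = []
    for i in range(k):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        splits.append((train, folds[i]))
    return splits


# Hypothetical slide-to-patient list: p1 and p3 each contribute two slides.
slide_patients = ["p1", "p1", "p2", "p3", "p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10", "p11", "p12"]
for train, test in patient_kfold(slide_patients, k=5):
    print(len(train), len(test))
```

Keeping the split fixed across aggregation models would make an ablation like Suppl. Table 2 compare architectures on identical folds.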
12a Definition of and rationale for test positivity cut-offs or result categories of the index test, distinguishing pre-specified from exploratory | yes
12b Definition of and rationale for test positivity cut-offs or result categories of the reference standard, distinguishing pre-specified from exploratory
13a Whether clinical information and reference standard results were available to the performers/readers of the index test
13b Whether clinical information and index test results were available to the assessors of the reference standard

*For the MECC cohort, these statistics refer to the cases with available MSI/dMMR status only.