“Watch and wait” (W&W) is a nonstandard, nonoperative approach for patients with locally advanced rectal cancer who have achieved a clinical complete response (cCR) after chemoradiotherapy (CRT), in which surgery is reserved for patients who develop a local regrowth. The W&W approach was introduced into clinical practice by Dr. Angelita Habr-Gama in the early 1990s.1

To date, several large prospective cohort studies from Brazil, the Netherlands, Denmark, and the UK have reported oncologic outcomes for W&W similar to those for standard CRT and total mesorectal excision (TME) surgery, reflecting the favorable biology of patients who achieve cCR (Table 1).

Table 1 Reported outcomes for the watch and wait (W&W) approach

One of the main challenges with W&W is accurate identification of patients who achieve cCR after CRT. Table 2 shows an accepted classification system used to assess the level of response after CRT.8 This system uses digital rectal exam (DRE), endoscopy, and magnetic resonance imaging (MRI) to classify response after CRT as complete, near complete, or incomplete.9

Table 2 Classification of response after chemoradiotherapy (CRT)8

In the original studies reporting W&W, DRE and endoscopy were used to assess for cCR. More recently, however, MRI has been increasingly used in addition to DRE and endoscopy for this assessment. When MRI is part of the evaluation, the tumor regression grade (TRG) is a critical component of the assessment (Table 3).10 The 5-point TRG classification using T2-weighted sequences (T2W) was validated previously by the MERCURY group.10

Table 3 Radiologic MRI classification of tumor regression grade (TRG)9

In the current study, Haak et al.11 evaluated a modified 3-point TRG (poor, intermediate, good) based on risk of residual tumor using both T2W and diffusion-weighted imaging (DWI). For the study, seven expert readers working in different settings each read 62 restaging MRIs and classified them as indicating poor, intermediate, or good responders. The results of the study showed that agreement between the MRI classification based on the interpretation of the reader and the final response outcome (pathology or 2-year follow-up evaluation) was 95–100% for poor, 76–100% for intermediate, and 44–67% for good responders. These results show that when the MRI interpretation denoted a “poor response,” this interpretation was almost always correct. The interrater reliability between pairs of the seven readers using weighted kappa ranged from 0.38 to 0.68 and was best for the most experienced readers (κ = 0.64–0.68).

A main strength of this report is its description of results for seven expert readers in different settings. Consequently, it is more generalizable to the real-world setting than a report of only two expert readers in the same setting. Similar results were found by Lee et al.,12 who compared the 5-point TRG system with a modified 3-point TRG system including DWI. Although comparisons of interrater agreement between studies should be interpreted cautiously, Lee et al. found that the percentage agreement between two readers (with 3 and 8 years of experience) reading 118 MRIs increased from 38.1% with the 5-point TRG to 72.9% with the 3-point TRG, which corresponded to an interrater reliability using a weighted kappa of 0.58 [95% confidence interval (CI) 0.44–0.72] with the 3-point TRG versus 0.34 (95% CI 0.22–0.46) with the 5-point TRG.
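For readers less familiar with these agreement statistics, the following minimal Python sketch shows how percentage agreement and a weighted kappa might be computed for two readers using a 3-point TRG. It is an illustration only: the readings are hypothetical and are not data from Haak et al. or Lee et al., the function names are invented for this example, and linear weights are assumed because neither weighting scheme is specified here.

```python
# Illustrative sketch only: the readings below are hypothetical, not study data.
# Linear weights are an assumption; the studies' weighting scheme is not stated here.

CATEGORIES = ["poor", "intermediate", "good"]  # ordered 3-point TRG


def percent_agreement(reader_a, reader_b):
    """Fraction of cases in which both readers assigned the same category."""
    matches = sum(a == b for a, b in zip(reader_a, reader_b))
    return matches / len(reader_a)


def weighted_kappa(reader_a, reader_b, categories=CATEGORIES):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1)."""
    k = len(categories)
    n = len(reader_a)
    index = {c: i for i, c in enumerate(categories)}

    # Cross-tabulate the two readers' classifications.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(reader_a, reader_b):
        counts[index[a]][index[b]] += 1

    # Marginal proportions for each reader.
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]

    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed += weight * counts[i][j] / n
            expected += weight * row[i] * col[j]

    return 1.0 - observed / expected


# Hypothetical readings of six restaging MRIs by two readers.
reader_a = ["poor", "poor", "intermediate", "good", "good", "intermediate"]
reader_b = ["poor", "intermediate", "intermediate", "good", "intermediate", "intermediate"]

print(f"percent agreement: {percent_agreement(reader_a, reader_b):.2f}")
print(f"weighted kappa:    {weighted_kappa(reader_a, reader_b):.2f}")
```

Unlike raw percentage agreement, the weighted kappa corrects for agreement expected by chance and penalizes a poor-versus-good disagreement more heavily than a disagreement between adjacent categories, which is why the two statistics can differ substantially for the same set of readings.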

These studies are important because they suggest that a 3-point TRG including DWI has fair interrater reproducibility when two or more expert readers in differing settings report MRI. The 3-point TRG also is a relatively attractive option because it is more consistent with the current clinical classification of response after CRT.

Based on their study results, Haak et al.11 suggest that MRI-predicted poor responders should go directly to surgery and that endoscopy can be safely omitted for these patients. This is an interesting suggestion, supported by the high agreement between the MRI interpretation for poor responders and the final response outcome (95–100%). However, it is important to note that the proportion of poor responders based on MRI interpretation ranged from 11 to 37% among the seven readers, indicating that some poor responders were misclassified as intermediate or complete responders. Although the consequence of this misclassification likely is negligible because these patients will undergo DRE and endoscopy, the reported variation in the MRI classification of poor responders among the seven expert readers as well as the overall kappa scores ranging from 0.38 to 0.68 underscore the difficulty of interpreting restaging MRI. Furthermore, adopting this strategy for poor responders in jurisdictions with less access to endoscopy may not lead to similar results because these jurisdictions also are likely to have less access to MRI and expert gastrointestinal (GI) radiologists.

Although the literature offers limited evidence on the agreement and reproducibility of endoscopic evaluation relative to the final response outcome, several studies have reported favorable oncologic outcomes with W&W when only DRE and endoscopy were used to assess for local regrowth. Future studies comparing MRI and endoscopy head-to-head are necessary before the safe omission of endoscopy can be considered. Moreover, to date, worldwide experience with W&W has been documented for only 1009 patients. Therefore, it currently would be premature to omit endoscopy or MRI in assessing response after CRT.13