Loco-regional N staging in Rectal Cancer with Magnetic Resonance Imaging: A Study of Inter- and Intra-Observer Variability


Background: Colorectal cancer is one of the most common tumors in both men and women: in the United States, it is the third leading cause of new cancer cases and cancer-related deaths. The prognosis is directly related to tumor infiltration of the mesorectum and to lymph node metastases. In particular, it is important to define the distance between lymphadenopathy and the mesorectal fascia, as this has repercussions on surgical planning. This study aimed to evaluate the agreement among observers with different levels of abdominal MRI expertise, as well as intra-observer reliability, in defining lymph node size and features.
Methods: In this retrospective study, MRI examinations were performed in 88 patients with rectal adenocarcinoma treated with primary surgery. Four observers (two senior physicians and two junior physicians) analyzed the MRI scans in two sessions 30 days apart and determined the size and morphological pattern of regional lymph nodes. Statistical analysis included the determination of the Fleiss kappa (k) coefficient, Cohen's kappa coefficient, and confidence intervals (CI).
Results: The inter-observer reproducibility for MRI N staging was good among the four physicians (kappa = 0.65; CI 0.45–0.77). Reproducibility between the two senior physicians had a kappa of 0.68 (CI 0.62–1.00), while reproducibility between the two junior physicians had a kappa of 0.61 (CI 0.33–0.89). Inter-observer reproducibility was excellent for mesorectal, inferior mesenteric, and internal iliac lymph nodes (kappa values of 0.89, 0.82, and 0.80, respectively). For the other two nodal stations (superior and middle rectal lymph nodes, sacral lymph nodes), inter-observer reproducibility was good (kappa between 0.70 and 0.77). The intra-observer reproducibility of the overall MRI N staging was excellent for observer B (kappa = 0.85), moderate for observer C (kappa = 0.59), and good for the other two physicians. There was a significant difference in lymph node measurements between the first and second sessions for observer A (p < 0.05). Excellent intra-observer reproducibility was found for mesorectal lymph nodes; the lowest intra-observer reproducibility values were found for presacral and lateral sacral lymph nodes.
Conclusions: Despite the low accuracy of MRI in assessing metastatic lymph node involvement in rectal cancer, this study demonstrates good inter-observer reliability among physicians with different levels of abdominal MRI experience.


Background
Colorectal cancer is one of the most common tumors in both men and women and a leading cause of cancer-related deaths in developing countries, with 447,000 newly diagnosed patients in Europe in 2012. [1] In particular, rectal cancer constitutes an estimated 27 to 58% of all colorectal cases. [2] About 55% of patients with rectal cancer are diagnosed at stage II or III [3], qualifying for multimodal therapy. The adoption of neoadjuvant chemo-radiotherapy in locally advanced rectal cancer reduces the risk of local recurrence, even though no survival benefit is achieved [5], and often leads to impaired functional outcomes when compared to surgery alone [5-7]. Therefore, accurate local staging of rectal cancer is an imperative prerequisite when selecting patients for preoperative treatment, in order to avoid under-treatment and minimize over-treatment.
At present, rectal MRI is the most appropriate imaging modality for local staging of rectal cancer, detecting locally advanced rectal tumors that can be treated with neoadjuvant chemo-radiotherapy (Table 1).
In comparison to its performance for the T stage and involvement of the circumferential resection margin (CRM), MR imaging is less accurate in the detection of lymph node metastasis [8-10], which is an important prognostic factor indicating the need for neoadjuvant chemo-radiotherapy. The presence, number, and precise location of suspicious lymph nodes should be reported, together with their proximity to the mesorectal fascia (MRF), which is important for surgical planning, although it does not confer a poor prognosis in the same manner as that of the primary tumor. [11]
Table 1 Regional lymph node (N) categories.
NX: Regional lymph nodes cannot be assessed
N0: No regional lymph node metastases
N1: Metastasis in one to three regional lymph nodes
N1a: Metastasis in one regional lymph node
N1b: Metastasis in two to three regional lymph nodes
N1c: Tumor deposit(s) in subserosa, mesentery, or non-peritonealized pericolic or perirectal tissues without regional lymph node metastasis
N2: Metastasis in four or more regional lymph nodes
N2a: Metastasis in four to six regional lymph nodes
N2b: Metastasis in seven or more regional lymph nodes
The limited accuracy of MRI in nodal staging is aggravated by the lack of consensus on appropriate criteria to assess lymph node involvement. [12] Using size as the sole criterion yields only moderate accuracy, as 94% of the involved nodes are smaller than 5 mm. On the other hand, lymph nodes measuring greater than 8 mm in the short axis are highly specific for metastatic involvement. [11] Brown et al. were the first to show that the diagnosis of nodal involvement in rectal cancer on MR imaging improves when morphologic features, such as border contour, shape, and signal intensity, are used instead of size criteria alone. If a node was defined as suspicious because of an irregular border, a round shape, or heterogeneous signal intensity, a superior accuracy was obtained with high sensitivity and specificity. [13] This explains why lymph node characterization is more accurate with larger nodes that can be analyzed for their shape, border, and signal intensity.
Kim et al. also demonstrated that new criteria such as a spiculated or indistinct border and a mottled heterogeneous appearance could be useful to predict regional lymph node involvement, showing sensitivities of 45% and 36% and specificities of 100% and 100%, respectively. [14] However, even with these new diagnostic criteria, the sensitivity and specificity of lymph node staging using preoperative MRI remained below 80% in a recent meta-analysis. [10,15] Regional lymph nodes involved in rectal cancer are the mesorectal, superior, middle, and inferior rectal, inferior mesenteric, lateral sacral, presacral, sacral promontory, and internal iliac nodes. Involvement of other lymph node chains is considered distant metastasis.
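As an illustration only, the N categories listed above (Table 1) can be expressed as a small lookup on the number of involved regional lymph nodes. The sketch below is written in Python for convenience; the function name `n_category` and its parameters are hypothetical and are not part of the study protocol or of any staging software.

```python
def n_category(positive_nodes: int,
               tumor_deposits: bool = False,
               assessable: bool = True) -> str:
    """Map a count of involved regional lymph nodes to an N category.

    Hypothetical helper for illustration; the category boundaries are
    those listed in Table 1 of the text.
    """
    if not assessable:
        return "NX"   # regional lymph nodes cannot be assessed
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        # N1c applies to tumor deposits without nodal metastasis
        return "N1c" if tumor_deposits else "N0"
    if positive_nodes == 1:
        return "N1a"
    if positive_nodes <= 3:
        return "N1b"  # two to three nodes
    if positive_nodes <= 6:
        return "N2a"  # four to six nodes
    return "N2b"      # seven or more nodes


if __name__ == "__main__":
    for count in (0, 1, 3, 5, 8):
        print(count, n_category(count))
```

The sketch returns only the most specific subcategory; the parent category (N1 or N2) follows from the first two characters of the returned label.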
This study assessed the intra-observer and inter-observer reliability in the evaluation of suspicious lymph nodes, analyzing their size and morphological features in rectal cancer patients using MR imaging to demonstrate the level of experience needed to use these features.

Patient population recruitment
In this retrospective single-center study, 88 patients with histologically proven rectal adenocarcinoma who underwent primary surgery between April 2017 and June 2019 at the Department of Medical and Surgical Sciences, University Hospital of Foggia, Italy, were consecutively included.
All patients were preoperatively staged by pelvic MRI and underwent primary total mesorectal excision (TME) and lymphadenectomy. Patients with previous bowel resection, and those whose images were affected by artifacts or other problems that might prevent proper evaluation of the rectal wall and mesorectum (tremors, poor definition), were excluded.
This retrospective study was approved by the institutional review board and all patients provided informed consent for their participation in this study.

MR imaging protocol
All MR examinations were performed in the supine position with a 1.5 T magnet (Achieva, Philips Medical System, Eindhoven, The Netherlands) equipped with a 32-element phased-array coil (dStream Torso coil). According to the tumor location on colonoscopy, an appropriate amount (20-80 mL) of ultrasonic gel was introduced into the rectum of each patient, except those with low or large rectal tumors. Unless contraindicated, a dose of 40 mg/4 mL of phloroglucinol was injected intramuscularly approximately 10 minutes before the MR examination to prevent intestinal peristaltic artifacts.
The MRI sequences included conventional sagittal and axial high-resolution T2-weighted images (T2WI), Diffusion-Weighted Images (DWI), and T1 Dynamic Contrast-Enhanced (DCE) images. The primary imaging parameters are listed in Table 2.
Gadopentetic acid dimeglumine (0.2 mL/kg) was used as the contrast agent, injected at a rate of 2 mL/s and followed by a 20 mL saline flush, to improve the detection of tumors and increase the accuracy of MRI for diagnosing T3 tumors and loco-regional spread. [16,17]

Lymph node evaluation
Four radiologists were recruited to determine the size and morphological pattern of lymph nodes in the rectal cancer patients using MRI: two senior physicians (observers A and B) and two junior physicians (observers C and D). Lymph nodes were assessed in terms of size (largest short-axis diameter in mm) and different morphological features (Fig. 1), and the total number of suspected metastatic lymph nodes per patient was noted. Uniform nodes smaller than 9 mm in the largest short axis with homogeneous signal intensity were considered not suspicious.
Criteria for malignant nodes included:
1. Short-axis diameter ≥ 9 mm;
2. Short-axis diameter 5-8 mm and ≥ 2 morphologically suspicious characteristics (i.e., round shape, irregular border, heterogeneous signal intensity);
3. Short-axis diameter < 5 mm and 3 morphologically suspicious characteristics (Fig. 2).
Each radiologist made an initial interpretation of the anonymized examinations, and none of them was aware of the histopathological findings. After an interval of 15-30 days, a second analysis of all pelvic MRI examinations was performed by each physician separately, using the same method and a different order.
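The size and morphology criteria above amount to a simple decision rule, which can be sketched as follows. This is an illustrative Python sketch, not software used in the study: the names `NodeFinding` and `is_suspicious` are hypothetical, and treating every diameter from 5 mm up to (but not including) 9 mm as the intermediate band is an assumption, since the text only states "5-8 mm".

```python
from dataclasses import dataclass


@dataclass
class NodeFinding:
    """One lymph node as read on MRI (hypothetical structure)."""
    short_axis_mm: float
    round_shape: bool = False
    irregular_border: bool = False
    heterogeneous_signal: bool = False

    def suspicious_feature_count(self) -> int:
        # Number of morphologically suspicious characteristics (0 to 3)
        return sum([self.round_shape,
                    self.irregular_border,
                    self.heterogeneous_signal])


def is_suspicious(node: NodeFinding) -> bool:
    """Apply the three malignancy criteria described in the text."""
    features = node.suspicious_feature_count()
    if node.short_axis_mm >= 9:
        return True               # criterion 1: size alone
    if node.short_axis_mm >= 5:
        # criterion 2; assumption: the band covers 5 mm <= d < 9 mm
        return features >= 2
    return features == 3          # criterion 3: small node, all features


if __name__ == "__main__":
    node = NodeFinding(6.0, round_shape=True, irregular_border=True)
    print(is_suspicious(node))
```

Encoding the rule this way makes explicit that a node under 5 mm is called suspicious only when all three morphological features are present.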

Statistical analysis
The main evaluation was the reproducibility of inter- and intra-observer interpretation of MRI nodal assessments, evaluated by the kappa (κ) statistical test. [18] Cohen's kappa coefficient (reproducibility between two observers) and Fleiss' kappa coefficient (reproducibility between four observers) indicated reproducibility of interpretation: κ of 0.81-1.00 is excellent, 0.61-0.80 is good, 0.41-0.60 is moderate, 0.21-0.40 is fair, and κ ≤ 0.20 is poor. [19] The kappa coefficients were recorded with their confidence intervals (CI). Intra-observer reliability was calculated using both the first and second measurements from each observer. The null hypothesis was that κ = 0, i.e., no inter- or intra-observer agreement. The paired-sample t-test was used to assess the differences in measurements between the two image analysis sessions. A p value of less than 0.05 was considered statistically significant. The analyses were performed using GraphPad Prism version 8.0 (GraphPad Software, Inc., San Diego, CA).
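For readers unfamiliar with the two agreement statistics, the following Python sketch shows how Cohen's kappa (two observers) and Fleiss' kappa (several observers) are computed from categorical ratings, together with the interpretation bands given above. It is a minimal illustration under stated assumptions, not the authors' analysis, which was run in GraphPad Prism; the function names and the toy ratings are hypothetical, and confidence intervals are not computed here.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same subjects."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters need the same non-empty set of subjects")
    n = len(rater1)
    # Observed agreement: fraction of subjects with identical labels
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label frequencies
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per subject,
    with the same number of raters for every subject."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this subject
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1))
    p_bar = agreement_sum / n_subjects
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals.values())
    return 1.0 if p_e == 1.0 else (p_bar - p_e) / (1 - p_e)


def interpret(kappa):
    """Interpretation bands from the text (kappa rounded to two decimals)."""
    k = round(kappa, 2)
    if k >= 0.81:
        return "excellent"
    if k >= 0.61:
        return "good"
    if k >= 0.41:
        return "moderate"
    if k >= 0.21:
        return "fair"
    return "poor"


if __name__ == "__main__":
    # Toy N-stage calls for four patients (invented for illustration)
    a = ["N0", "N0", "N1", "N1"]
    b = ["N0", "N0", "N1", "N0"]
    k = cohen_kappa(a, b)
    print(k, interpret(k))
```

In the toy example, the two raters agree on three of four patients (observed agreement 0.75) while chance agreement is 0.50, so kappa is (0.75 - 0.50) / (1 - 0.50) = 0.50, which falls in the moderate band.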

Results
The study included 68 men and 20 women, with a mean age of 64.5 years (range, 45-78). Clinical characteristics of the study group are summarized in Table 3. A total of 396 lymph nodes in 88 patients were evaluated by four observers during two analysis sessions, resulting in a total of 6336 measurements.
The inter-observer reproducibility for overall MRI N staging was good (Table 4). Reproducibility between the four physicians had a kappa of 0.65 (CI 0.45-0.77). Reproducibility between the two senior physicians had a kappa of 0.68 (CI 0.62-1.00), whereas reproducibility between the two junior physicians had a kappa of 0.61 (CI 0.33-0.89). Excellent inter-observer reproducibility was found for mesorectal, inferior mesenteric, and internal iliac lymph nodes, with kappa values of 0.89 (CI 0.78-0.98), 0.82 (CI 0.82-1.00), and 0.80 (CI 0.66-1.00), respectively. For the other two nodal stations (i.e., superior and middle rectal lymph nodes, presacral and lateral sacral lymph nodes), there was good inter-observer reproducibility, with a kappa between 0.70 (CI 0.57-0.97) and 0.77 (CI 0.62-0.96).
The intra-observer reproducibility of interpretation of the overall MRI N staging was excellent for one senior physician (observer B) with a kappa of 0.85 (CI 0.56-0.92), moderate for one junior physician (observer C) with a kappa of 0.59 (CI 0.36-0.78), and good for the other two physicians. Excellent intra-observer reproducibility was found for mesorectal lymph nodes in each observer, with kappa between 0.81 (CI 0.65-0.97) and 1.00 (CI 1.00-1.00). The lowest intra-observer reproducibility values were found for presacral and lateral sacral lymph nodes, with a kappa between 0.57 (CI 0.51-0.77) and 0.63 (CI 0.57-0.87). The results of the agreement for each lymph node station are shown in Table 5. Table 6 shows the results of lymph node measurements among the four observers and image sessions. There was a significant difference in lymph node measurements between the first and second sessions in one observer only (observer A, p < 0.05). Moreover, the mean lymph node diameters measured by observer A during the second image session were significantly larger than those measured by the other observers.

Table 3 Patients characteristics.
Table 4 Inter-observer reproducibility.
Table 5 Results of the agreement for each lymph node station. Columns: nodal stations; inter-observer agreement at first session analysis (two senior physicians, two junior physicians, all four radiologists); intra-observer agreement between first and second analysis session.
Table 6 Summary of lymph node measurements among four observers and image sessions. Note: values are mean ± standard deviation (SD), expressed in millimeters.

Discussion
Preoperative staging of rectal cancer relies heavily on lymph node involvement, which is an important aspect of management, and radiological assessment with MRI plays a key role in this process. In this regard, the current TNM staging distinguishes four categories for parameter N: NX if regional lymph nodes cannot be assessed, N0 if there are no regional lymph node metastases, N1 if no more than three regional lymph nodes are involved (N1a for 1 lymph node, N1b for 2-3 lymph nodes, N1c for tumor deposits), and N2 if at least four are involved (N2a for 4-6 lymph nodes, N2b for 7 or more). The presence of nodal metastases in rectal cancer implies stage III disease and the need for intensified therapy (Table 7). Differences in overall survival have been noted when patients are stratified by the extent of nodal metastasis. [20] Furthermore, it has been proposed that nodal assessment include size and morphologic characteristics of malignancy, including the presence of irregular borders, heterogeneous signal intensity, and round shape. [13,14] According to the latest guidelines, the best MRI sequences for detecting lymph node involvement are FSE T2-weighted sequences [21], without fat suppression, with a large field of view and a slice thickness of at least 3 mm. Although DWI is highly sensitive for lymph node detection in rectal cancer, it is more sensitive to bowel movement and air artifacts and is unable to characterize lymph nodes as benign or malignant. [22] Preoperative staging is useful for identifying patients who can benefit from neoadjuvant treatment and, although MRI is the gold standard, its specificity in detecting metastatic lymphadenopathy is only moderate, especially if the observer has little experience in the abdominal setting. [23] This inexperience has repercussions for both prognosis and the choice of surgical or medical treatment, which is why the importance of sub-specialization in gastrointestinal radiology has been increasingly recognized.
In particular, the specialist sub-branch plays an important role in daily practice together with updated guidelines, new surgical treatment modalities (laparoscopic, robotic, and endoscopic methods), oncological strategies (cytotoxic agents, immunotherapeutic modalities, and target-oriented hormonal treatments), and developments in advanced technology. Fellowship programs lead to official subspecialty certification and are needed to acquire specific skills, but also to pursue academic careers.
When all these aspects are taken into consideration, a different value is attributed to the training received both during and after the residency period.
Our study aims precisely at demonstrating that subspecialization represents an added value for radiologists who report MRI exams performed for gastrointestinal diseases. The results reflect the poor diagnostic accuracy of a less experienced physician (observer A) who, despite having 20 years of experience as a general radiologist, is not able to properly evaluate rectal cancer with MRI because of a lack of dedication to the gastrointestinal tract. In particular, radiologist A showed a lower lymph node staging capacity not only compared to the other senior physician (observer B), even though the latter has 12 years of work experience (according to our evaluation, good inter-observer reproducibility with k = 0.68), but also compared to the two junior observers, C and D. Thus, even a minimum of skill in this field can make a difference in the quality of the radiological report.
However, our study has several limitations. First, the observers came from the same institution, and the agreement among them may have reflected the interpretation style of the department. Second, only four radiologists were enrolled, so it was not possible to evaluate a broader spectrum of training backgrounds and day-to-day practice. Lastly, all observers received specialized instructions before starting their evaluation, whereas normally they would have interpreted imaging studies without specific training in lymph node staging.

Conclusions
In conclusion, the intra- and inter-observer reliability of size and morphological criteria for lymph nodes on MRI depended on observer experience. Specialized observers had good reliability, whereas an observer who was unfamiliar with gastrointestinal radiology showed moderate to low reliability. Therefore, experience in abdominal MRI is necessary to ensure the reliability of rectal MRI in the assessment of suspected pathological lymph nodes.

Figure 2 T2-weighted paraxial pelvic MRI of three patients (A-C) with different morphological patterns. A shows an enlarged presacral lymph node with a round shape, B a mesorectal lymph node with an irregular border, and C an enlarged mesorectal node with a heterogeneous signal.