PET/MR versus PET/CT for locoregional staging of oropharyngeal squamous cell cancer

Background: The value of fluorine-18-fluorodeoxyglucose positron emission tomography/computed tomography (FDG-PET/CT) for TN staging in head and neck cancer (HNC) has been proven in numerous studies. A few studies have investigated the value of FDG-PET/magnetic resonance imaging (MRI) in the staging of HNC; the combined results indicate potential for FDG-PET/MRI, but the scientific evidence remains weak.
Purpose: To compare performance of FDG-PET/CT and FDG-PET/MRI for locoregional staging in patients with oropharyngeal carcinomas.
Material and Methods: Two radiologists independently of each other retrospectively reviewed primary pre-therapeutic FDG-PET/CT and FDG-PET/MRI examinations from 40 individuals with oropharyngeal carcinomas. TN stage and primary tumor size were noted. The results were compared between observers and modalities and against TN stage set at a multidisciplinary conference.
Results: For nodal staging, PET/MRI had slightly higher specificity and accuracy than PET/CT for the most experienced observer. Both methods demonstrated excellent sensitivity (≥ 0.97 and 1.00, respectively), as well as high negative predictive values (≥ 0.95 and 1.00, respectively). No significant differences were found for tumor staging or measurement of maximum tumor diameter. There was a weak agreement (κ = 0.35–0.49) between PET/CT and PET/MRI for T and N stages for both observers. Inter-observer agreement was higher for PET/MRI than for PET/CT, both for tumor staging (κ = 0.57 vs. 0.35) and nodal staging (κ = 0.69 vs. 0.55). The agreement between observers was comparable to the agreement between methods.
Conclusion: PET/MRI may be a viable alternative to PET/CT for locoregional staging (TN staging) and assessment of maximal tumor diameter in oropharyngeal squamous cell cancer.

One drawback of FDG-PET/MRI is that whole-body MRI is more technically challenging and requires more time to perform than whole-body FDG-PET/CT (21). In a recent study, we found that for small HNC tumors (T1-T2) with a clinically negative neck, the risk for distant metastasis is extremely low, and no squamous cell carcinomas (SCCs) in a cohort of 335 cases had subphrenic metastases (22). On the other hand, whole-body imaging with PET/CT in early-stage HNC can trigger excessive investigations of benign abdominal lesions that can cause delays in treatment (23,24). Therefore, an FDG-PET/MRI scan of the head and neck region combined with a CT scan of the thorax may be a sufficient alternative for TNM staging in such patients, on the premise that the TN staging performance of FDG-PET/MRI equals or surpasses that of FDG-PET/CT. The superior soft tissue visualization of MRI offers advantages, especially in the suprahyoid region (25). The superiority of FDG-PET/MRI was also proposed in a comparative study on nasopharyngeal carcinomas, conducted with non-contrast-enhanced FDG-PET/CT versus contrast-enhanced FDG-PET/MRI (26). The aim of the present study was to compare locoregional TN staging as well as assessments of tumor diameter between pretherapeutic FDG-PET/CT and FDG-PET/MRI in a cohort of patients with oropharyngeal squamous cell carcinomas (OPSCCs).

Material and Methods
In a retrospective setting, primary pretherapeutic FDG-PET/CT and FDG-PET/MRI examinations of 40 patients with histologically proven OPSCCs were reviewed by two radiologists (LF and SE) independently of each other. The radiologists had 25 and 5 years of experience of head and neck radiology, respectively.

Participants
The participants consisted of a cohort of patients with OPSCCs enlisted in the ongoing prospective clinical trial Multimodal Monitoring of Radiotherapy Response in Squamous Cell Cancer (MORRIS; NCT02379039). The aim of the MORRIS trial is the prediction of short- and long-term outcome after radiotherapy in SCC. The first 46 patients with OPSCC recruited in the MORRIS trial were evaluated for inclusion in the present project. Six patients were excluded due to incomplete radiology. A total of 40 patients (32 men, 8 women; mean age 64.1 years; age range = 40-84 years) were included in the analysis (Fig. 1 and Supplementary Table 1). The T classifications were in the range of T2-T4a, and the N classifications were in the range of N0-N3. In total, 27 patients were HPV-positive, and eight patients were HPV-negative. HPV status was not known in five patients; these patients were regarded as HPV-negative when setting the N classification.

Image acquisition
PET/CT was acquired on a Discovery 690 64-slice PET/CT scanner (GE Healthcare, Chicago, IL, USA), after 6 h of fasting. Imaging was carried out 1 h after intravenous administration of 4 MBq/kg body weight of 18F-FDG. The PET acquisition was made separately for the head and neck region, with high-resolution reconstruction (SHARP) in the head and neck area. A standard acquisition protocol and reconstruction were applied for the rest of the body. After PET sampling, a diagnostic contrast-enhanced CT was performed of the thorax, abdomen, and neck. The neck was scanned separately with a smaller field of view to enhance in-plane resolution. The CT scans were performed with the patient in the same position as during the initial PET, enabling co-registered volumes. The total scanning time was 45 min.
The day after the PET/CT examination, biopsies were taken from primary tumor sites for histological analysis and assessment of HPV status by p16 expression with immunohistochemistry. The neck was examined with ultrasound, and ultrasound-guided fine-needle aspiration cytology (US-FNAC) was performed on all neck levels with nodes deemed pathological based on ultrasound or PET/CT. Within one week from the PET/CT examination, diagnosis and TNM stage were set at a multidisciplinary conference (MDC), based on all available information from clinical and radiological findings, histopathology of primary tumor biopsies including assessment of HPV status and cytology of US-FNAC from regional lymph nodes.
PET/MRI scanning was always performed after the MDC but before the patient received any other treatment, on average 27 days (range = 17-45 days) after the PET/CT. The PET/MRI scan was carried out on a SIGNA™ PET/MRI 3.0-T scanner (GE Healthcare, Chicago, IL, USA) after 6 h of fasting. Imaging was carried out 1 h after intravenous administration of 4 MBq/kg body weight of 18F-FDG.
The PET/MRI examinations consisted of T1-weighted (T1W), T2-weighted (T2W), and diffusion-weighted imaging (DWI) as well as PET and fusion images. Details on MRI parameters are presented in Supplementary Table 2. From a stack of dynamic contrast-enhanced T1W sequence images, a stack of synthetic contrast-enhanced T1W images was calculated and saved. All images were pseudonymized, and the observers were blinded to patient information.
Before the readings commenced, the observers calibrated their performance by consensus TN staging on five FDG-PET/CT scans from another patient cohort. The TN staging in this study followed the 8th edition of the UICC TNM classification of malignant tumors (27).

Image interpretation
Observer 1 first read all the PET/MRI scans, and after a period of one month read the PET/CT scans. Observer 2 did the opposite. Primary tumor localization, T stage, and largest diameter of primary tumor were noted as well as occurrence of nodal metastases on each level of the neck, as defined by the AJCC (28). The largest diameter of the largest node in each level was noted.
As a third step, 12 cases with a difference in tumor location or considerable difference in tumor staging between PET/CT and PET/MRI were reviewed in consensus. Four patients had undergone surgical intervention with tonsillectomy or extensive biopsies of the primary tumor site between PET/CT and PET/MRI examination, and those cases were excluded from the results of the PET/MRI-based tumor staging (Fig. 1).
The N stage for each observer's readings was set according to the 8th edition of the UICC TNM classification (27).
The outcome from the readings was compared in terms of agreement between FDG-PET/MRI and FDG-PET/CT as well as agreement with the final TN stage as set at the MDC, which served as the gold standard. Finally, intra- and inter-observer variation of the different modalities was assessed.

Statistical analysis
SPSS Statistics software version 27.0.1.0 (IBM Corp., Armonk, NY, USA) was used for the statistical analyses. Interrater reliability was calculated as Cohen's kappa index, and the Wilcoxon matched-pair signed-rank test was used to test for systematic differences between observers or methods in staging and tumor size measurements. P < 0.05 was considered statistically significant.
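As an illustration of the agreement measure, the following is a minimal sketch of Cohen's kappa, assuming the unweighted form applied to categorical stage labels. The function name `cohen_kappa` and the example ratings are hypothetical and are not taken from the study data or from the SPSS procedure actually used.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cases where both raters chose the same category.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement expected from each rater's marginal category frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical N stages from two observers (illustrative only, not study data).
observer_1 = ["N0", "N1", "N1", "N2", "N2", "N0", "N1", "N2"]
observer_2 = ["N0", "N1", "N2", "N2", "N2", "N0", "N0", "N2"]
kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, which is why it can be markedly lower than the raw proportion when one stage dominates.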
Sensitivity, specificity, predictive values, and likelihood ratios were calculated for nodal staging with MDC consensus as the gold standard and neck side as the observational unit. Thus, there were 80 observations in 40 patients per observer and modality.
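The measures listed above follow directly from a 2x2 table of index test result against reference standard. The sketch below shows the standard definitions; the function name `diagnostic_metrics` and the counts are hypothetical, chosen only so that they sum to 80 neck sides, and do not reproduce the study results.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from 2x2 counts.

    Assumes every row and column total is non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # LR+ is unbounded when specificity is 1; LR- when specificity is 0.
        "lr_positive": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "lr_negative": (1 - sensitivity) / specificity if specificity > 0 else float("inf"),
    }


# Hypothetical counts for 80 neck sides (illustrative only, not study data).
metrics = diagnostic_metrics(tp=38, fp=12, fn=2, tn=28)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because each patient contributes two neck sides, the observations are not fully independent; the sketch ignores this clustering.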
The calculation of P values for differences in diagnostic performance was performed using R 4. (29) and for the likelihood ratio as proposed by Gu and Pepe (30).

Ethical approval
Approval from the institutional review board was obtained from the regional ethics committee before the study (EPN 2015/117-31).

Results
The clinical features of the cohort are presented in Supplementary Table 1. For the mapping of nodal neck metastases, both PET/CT and PET/MRI demonstrated good accuracy, excellent sensitivity, and high negative predictive values (NPVs) (Tables 1 and 2). Specificity was slightly higher for PET/MRI than for PET/CT. PET/MRI also exhibited a higher accuracy than PET/CT. The differences between methods in specificity and accuracy, in favor of PET/MRI, were present for both observers, but were statistically significant only for the more experienced observer 1 (Tables 1 and 2). For PET/MRI, there were significant differences between observers in specificity (P = 0.002) and in accuracy (P = 0.009). Negative likelihood ratios were excellent for both methods (0.00-0.05), whereas positive likelihood ratios were moderate (1.86-6.11) (Table 2).
No statistically significant differences in tumor staging or in measurement of maximum tumor diameter were found between modalities or observers.

Discussion
In the present study, we found statistically significant differences between PET/CT and PET/MRI in specificity and accuracy in the nodal staging of HNC for one of two observers. The difference was in favor of PET/MRI, and these findings differ from previous studies that have failed to demonstrate any difference between the methods (5,6,9,17,19,31-33). Most previous studies, however, had small samples or used observer consensus as the gold standard. The scientific evidence for using PET/MRI in the staging of HNCs is scarce, and the results from this study add evidence to the notion that PET/MRI is a valid replacement for PET/CT in such cases.
It should be emphasized that although both observers in this study obtained slightly better results for nodal staging with PET/MRI than with PET/CT in all diagnostic parameters except sensitivity, only the more experienced observer 1 achieved statistically significant differences. MRI has been shown to be superior to CT for obtaining excellent soft tissue contrast and is less sensitive than CT for artifacts from dental hardware. Conventional MRI sequences are also superior to CT for a variety of findings that influence the therapeutic choice such as skull base invasion, perineural spread, detection of retropharyngeal nodes, extranodal spread in metastatic neck nodes, and vascular and lymphatic invasion (21). However, MRI may be more dependent than CT on observer experience as a plethora of technical parameters and possible artifact sources need to be considered in the image interpretation. In general, there was a larger difference between observers than between methods (Table 1). The inter-observer variation was higher for tumor staging than for nodal staging for both modalities. This seems logical, as radiological T staging of oropharyngeal tumors is dependent on involvement of delicate anatomical structures and may be more complicated than nodal staging, which is primarily based on size, multiplicity, and bilaterality (27).
PET/MRI had a slightly lower agreement with the MDC for tumor staging than had PET/CT. This may be due to incorporation bias, as the results from PET/CT formed part of the basis for the gold standard, that is, TN stage set at the MDC. Since the difference was not statistically significant, we believe PET/MRI to be at least equal to PET/CT for tumor staging of OPSCCs. Furthermore, PET/MRI performed slightly better than PET/CT for nodal staging, both in terms of interrater reliability and in specificity and accuracy (Tables 1 and 2, Fig. 2). As our gold standard for nodal staging also relied on the results of US-FNAC and thus probably is closer to the truth, we believe these findings speak in favor of PET/MRI. Samolyk-Kogaczeska and co-workers (16), using a histopathological reference standard, reported better agreement in T staging and higher specificity, sensitivity, positive predictive values (PPVs), and NPVs of lymph node evaluation by PET/MRI than by CT imaging. Our findings indicate that the same may be valid for PET/MRI when compared to PET/CT, as in our study, inter-observer agreement for N staging was substantial for PET/MRI (κ = 0.69) and moderate for PET/CT (κ = 0.55). PET/MRI also demonstrated slightly higher sensitivity, specificity, PPV, and NPV for nodal staging than PET/CT, albeit the differences were not statistically significant.
There are a number of known causes for both false-positive and false-negative evaluations described in the literature. For instance, necrotic nodes sometimes are without elevated FDG uptake and therefore overlooked by the radiologist. Furthermore, small metastatic nodes can be without elevated FDG uptake or suspicious morphological features and may be missed with any imaging modality (21). Many non-metastatic nodes, on the other hand, may show elevated FDG uptake or suspicious features due to inflammatory reaction. In our material, the false-negative findings were scarce whereas the false-positive fraction was significant (Table 1). The sensitivity was higher and the specificity was slightly lower compared to early studies for both PET/CT and PET/MRI (5,11,34) but were comparable to more recent studies (17,19). This is in line with the tradition at our institute where high sensitivity is favored over specificity since the results from PET imaging are routinely validated by US-FNAC. The pattern of overstaging of nodes between PET/CT and PET/MRI was similar, although there were slightly fewer false-positive findings with PET/MRI. Understaging is, in our point of view, more worrisome since false-negative evaluations may have dire consequences for the patient whereas overstaging can be dealt with by follow-up US-FNAC. In two cases, staged as N2b at the MDC, both observers understaged nodes with PET/MRI only. One case was judged with PET/MRI to be a single node (N1), whereas both PET/CT and ultrasound revealed a conglomerate of nodes at the same location, upstaging it to N2b (Fig. 3). In the other case, PET/CT indicated multiple suspicious nodes, and stage N2b was set at the MDC. Even so, the results from US-FNAC were positive only in one location and, in our opinion, N1, as indicated by both PET/MRI and US-FNAC, would be the proper staging. We regard this as a case of incorporation bias.
Another factor speaking in favor of PET/MRI is its inherently lower radiation dose compared with that of PET/CT. One could argue that the dose from PET/CT is negligible compared to the radiation treatment these patients will undergo. However, not all patients examined with PET/CT will undergo subsequent radiation therapy, as robot-assisted surgery is introduced for early stages of oropharyngeal cancer. We argue that the radiation dose from the diagnostic examination does matter, even in patients with a suspected HNC tumor that subsequently turns out to be benign.
The present study has some limitations. First, the gold standard was TN staging as set in an MDC. This staging was partly based on the results of the initial PET/CT, leading to incorporation bias that could favor the estimated accuracy of the PET/CT, especially in T staging. A better gold standard would be histologic evidence from surgical specimens. However, as all patients with OPSCCs at our institute are primarily treated with radiation therapy, surgical validation was not available, and the TN staging set at the MDC was as good a gold standard as we could achieve. The US-FNAC of the neck nodes was performed as a clinical examination, and it was not possible in retrospect to pinpoint the exact node biopsied. Therefore, we used the neck side for further analyses.
Biopsies of primary tumors and US-FNAC of neck nodes were performed between PET/CT and PET/MRI, which could possibly skew the results from the PET/MRI examination.
The limited number of patients included incurs several weaknesses. First, it reduces the validity of the statistical results. The small sample size also prevents any reliable comparison between HPV-positive and HPV-negative OPSCC staging. It can also be questioned how reflective of a normal population this small patient cohort is. There was a skewness towards more men and HPV-positive disease in the cohort (Supplementary Table 1). While the p16 distribution is reflective of the spectrum of patients at our institute, the strong male predominance is not. Due to the inclusion criteria in the underlying MORRIS study, there was also a skewness towards advanced primary tumors.
The PET/MRI examinations were performed using a restricted research protocol. A full clinical protocol would also include T2W fat-saturated images, which are possibly more sensitive for nodes, as well as a T1W spin echo sequence with gadolinium contrast enhancement, which we believe can result in more accurate tumor delineation. On the other hand, in a previous study, Pyatigorskaya et al. (15) found no added value in gadolinium contrast for tumor delineation in head and neck MRI.
In conclusion, PET/MRI may serve as a valid replacement for PET/CT for the locoregional staging of HNC, as it equaled PET/CT for tumor staging and in sensitivity for identifying node-positive necks. PET/MRI was more specific than PET/CT, albeit only for the more experienced observer. Observer experience had, in general, as high an impact on diagnostic outcome as the choice between PET/CT and PET/MRI.