Target volume delineation of anal cancer based on magnetic resonance imaging or positron emission tomography

To compare target volume delineation of anal cancer using positron emission tomography (PET) and magnetic resonance imaging (MRI) with respect to inter-observer and inter-modality variability. Nineteen patients with anal cancer undergoing chemoradiotherapy were prospectively included. Planning computed tomography (CT) images were co-registered with 18F–fluorodexocyglucose (FDG) PET/CT images and T2 and diffusion weighted (DW) MR images. Three oncologists delineated the Gross Tumor Volume (GTV) according to national guidelines and the visible tumor tissue (GTVT). MRI and PET based delineations were evaluated by absolute volumes and Dice similarity coefficients. The median volume of the GTVs was 27 and 31 cm3 for PET and MRI, respectively, while it was 6 and 11 cm3 for GTVT. Both GTV and GTVT volumes were highly correlated between delineators (r = 0.90 and r = 0.96, respectively). The median Dice similarity coefficient was 0.75 when comparing the GTVs based on PET/CT (GTVPET) with the GTVs based on MRI and CT (GTVMRI). The median Dice coefficient was 0.56 when comparing the visible tumor volume evaluated by PET (GTVT_PET) with the same volume evaluated by MRI (GTVT_MRI). Margins of 1–2 mm in the axial plane and 7–8 mm in superoinferior direction were required for coverage of the individual observer’s GTVs. The rather good agreement between PET- and MRI-based GTVs indicates that either modality may be used for standard target delineation of anal cancer. However, larger deviations were found for GTVT, which may impact future tumor boost strategies.


Background
Squamous cell carcinoma of the anal canal is a rare cancer where the primary treatment is combined radiotherapy (RT) and chemotherapy, with surgery being reserved for salvage treatment [1]. Five year survival rate for patients with an early stage disease is around 80-90%, colostomyfree survival of advanced stages (T3-T4) is 50-60%, and about 20% of the patients experience relapse. The main challenge is local control, though a general escalation of the RT treatment dose is associated with acute toxicity and late effects which potentially inhibit treatment and impact quality of life [2]. To reduce toxicities smaller fields and conformal dose delivery, through intensity modulated RT (IMRT), has been introduced [3,4]. As a result there is an increased emphasis on correct and precise delineation of the target volume [5,6].
The RT target volume originates from the gross tumor volume (GTV) and includes regions with possible subclinical disease together with margins accounting for patient motion and setup. RT volumes such as GTV are commonly defined on Computed Tomography (CT) images suitable for dose calculations. Tumor stage is evaluated by tumor diameter and degree of infiltration into normal tissue, often visualized by Magnetic Resonance Imaging (MRI) with its high resolution and softtissue contrast in pelvic tumors [7]. Positron Emission Tomography (PET) provides high sensitivity in detecting presence of tumor, regional nodal status and distant metastatic spread [8], though in general has somewhat lower resolution compared to CT and MRI. The American clinical practice guidelines, National Comprehensive Cancer Network (NCCN), recommend that PET/CT should be considered for RT planning [9]. Ideally CT, MRI and PET would be co-registered and used for target volume definition, with PET aiding tumor localization and MRI aiding tumor border delineation [10]. In practice, even though all modalities may be used during staging and description of tumor extent, the target volume delineator may be presented with PET-CT, MR-CT, or CT only.
To differentiate the dose levels between elective target volumes and gross disease for anal carcinomas simultaneous integrated boost with IMRT has been used with good results [4,11]. A further step in personalized treatment is to treat the target volume as an inhomogeneous structure, with different dose requirements due to cellular density and treatment resistance, and shaping dose distributions accordingly -dose painting. In this context both PET and MRI are evaluated as markers for local treatment resistance and for defining dose painting targets [12,13]. For both squamous cell carcinomas of the head and neck and non-small cell lung cancer, studies have shown that recurrence tends to originate within regions with a high [18F]fluorodeoxyglucose (FDG) uptake [14,15] and dose painting trials with PET have been initiated for these tumor types. However, at the moment no conclusive evidence has been presented [16,17].
In the current study, target volumes have been delineated for a cohort of patients with anal cancer according to either MRI/CT or PET/CT. This has been conducted to compare the existing inter observer variability of each modality with the differences between imaging modalities. The purpose of this analysis is to evaluate the effect of each of the imaging modalities for target delineations in current clinical practice and in the context of future dose painting.

Patients
This prospective study (NCT01937780) includes 19 consecutive patients that were treated for squamous cell carcinoma of the anal canal at Oslo University Hospital between December 2013 and November 2014. Written, informed consent was obtained from all patients and the study was approved by the regional ethical committee. The patients had a median age of 64 years (range 40-88), a mean weight of 65 kg (range 52-91), and 84% were female. Tumor stage was T2/T3/T4 in 53/21/21% of the cases, and 63% of the patients had node positive disease (Table 1). One T0 patient had large nodal metastases after surgery of carcinoma in situ which was treated as a primary tumor. IMRT/VMAT or 3-D conformal radiotherapy was delivered with 46 Gy to the clinical target volume (CTV) together with one or two cycles of concomitant Mitomycin C and 5-fluorouracil depending on disease stage [18]. For T1-2 N0 tumors the primary tumor was given 54 Gy, while for T3-4 or N+ disease the GTV and pathological lymph nodes were given 58Gy. Imaging was performed prior to RT and comprised contrast-enhanced planning CT, T2weighted (T2 W) and diffusion weighted (DW) MRI, and FDG-PET.
Anonymized images were exported to the Eclipse RT planning system (Varian medical systems) where they were used for RT target volume delineation. For each patient the CT images were duplicated creating two cases with corresponding information on clinical examination, anorectoscopy and multidisciplinary team notes. The time between planning CT and PET, and planning CT and MRI acquisition was in median 4 (range 1-26) and 8 (range 3-17) days, respectively. In addition one case included MR imaging reports for MR delineations and the other PET imaging reports for PET delineations. A random order list of these 37 anonymized cases was created to minimize patient recognition and intra observer bias.

Imaging
Medical imaging was conducted according to existing clinical protocols. PET examinations were performed after 6 h of fasting, using a Siemens Biograph 16 PET/ CT scanner (Siemens, Erlangen, Germany). Images were reconstructed with OSEM 2i21s with a 3D PSF, TOF and a Gaussian 2 filter. A 400 × 400 reconstruction matrix was used with 2 mm resolution and 3 mm slice spacing. Images were obtained 1 h (62 min [55-80])

Contouring
Three experienced oncologists (A, B, and C) independently performed the tumor delineations ( Fig. 1). Delineation of lymph nodes was not considered in this work. All contours where defined on the CT image basis used for dose planning, while the auxiliary modality, MRI or PET, was shown as a blended/alternate image. The use of blended images or two modalities in two windows was left to the preference of the oncologist though the images were always relative to the CT. The co registrations were performed in Eclipse using a pelvic limited rigid mutual information registration. Notable shifts in anatomical structures were noted and the quality of the co registration was evaluated in terms of location of soft tissue structures in the anorectal region by each delineator on a scale 1-3 (1:poor, 2: acceptable, 3:good, decimals allowed). To limit the effect of the variable experience the delineators had in using the different modalities a fixed window was used for PET images during PET delineations and DWI was limited to b1500 values with a fixed window during MRI delineations. As such, the PET delineations reflected anatomical structures seen on the CT together with hypermetabolic regions seen on the PET. MRI delineations reflected anatomical structures seen on CT and T2 W-MRI together with hypo-diffusive regions seen on DWI. The degree of how the images were weighted was left to the clinical judgement of the oncologist. If necessary, the oncologists could discuss the tumor extent with a dedicated radiologist or nuclear medicine physician, as in accordance with routine clinical practice. According to international practice on anal cancer [19] the GTV was delineated to include the visible tumor and the circumference of the anal canal and/or rectum when involved, as seen on axial T2 W/DW MR or PET images. Thus two GTVs were defined by each of the three observers for each patient; GTV MRI and GTV PET . Furthermore, the visible macroscopic tumor (GTV T ) on either MRI or PET was delineated as separate structures (GTV T_MRI /GTV T_PET ) . Any shift in DW images relative to T2 W images, possibly due to the image sequence duration and geometric distortions, was adjusted for according to anatomical structures seen in the CT images. Analysis DICOM images and structure sets were exported from the RT planning system and analyzed by in-house software developed in IDL (Exelis Visual Information Solutions). For each patient the total volume and center point of the GTVs and GTV T s were calculated. Three pairwise intersection volumes between doctors (A-B/B-C/C-A) and the three intersection volumes for each doctor between PET and MRI were calculated. The intersection volumes were normalized against the average volume of the two shapes, producing the Dice similarity coefficient [5]. The average Dice coefficient between the three doctor pairs represented the inter-observer variability for a given modality, while the average Dice coefficient for the three doctors between modalities represented the inter-modality variability. Furthermore, coronal, sagittal, and axial slices were reconstructed and a two dimensional Dice analysis was conducted, producing a profile over the mediolateral, anteroposterior and superoinferior directions. For each case the tumor width, as number of slices containing parts of the tumor, was normalized to one allowing an average curve over all patients to be produced. Margins, covering most of the observed variability, were calculated by expanding the median observer contour with an anisotropic kernel until it covered 90% of each individual observer contour.
For statistical analysis, Pearson's correlations between volumes were calculated. Dice distributions were compared using the Wilcoxon Signed-rank test. Analysis of variance (ANOVA) was conducted to evaluate the influence of patients, doctors and imaging modalities. A p value less than 0.05 was considered statistically significant.

Results
Delineations were completed on average in 20 min and the quality of the co-registration to the CT planning basis was deemed acceptable for both PET (2.5/3) and MRI (2.2/3). The most commonly reported issues were the position of the intestines, bladder filling and pelvic rotation. The median delineated GTV volumes of all patients was 27 (range 8-173) cm 3 for PET and 31 (range 10-150) cm 3 for MRI, with corresponding median GTV T volume values of 6 (range 1-80) cm 3 for PET and 11 (range 1-102) cm 3 for MRI. Individual patient data are given in Table 2. For each patient the delineated PET and MRI volumes were highly correlated for both GTV (r = 0.95) and GTV T (r = 0.94).
The impact of observer, imaging modality and individual patient on the absolute volumes was further investigated by ANOVA. Differences between patients in tumor volume was the primary feature in the ANOVA (p < 0.001), indicating that uncertainties in delineation were smaller than differences between patients for all delineators and modalities. The secondary feature was that delineated volumes were significantly larger for MRI compared to PET, both for GTV (F = 4.8, p = 0.032) and GTV T (F = 9.4, p = 0.003), indicating that variance between modalities were greater than between doctors. The third feature was that some doctors produced larger delineations than others for the GTVs (F = 3.2, p = 0.047) while not for the GTV T s (F = 2.7, p = 0.068). The latter indicates that for GTV T the intra delineator variance was comparable to the variance between delineators.
The main offset between center points of the individual delineations and the median volume was in the superoinferior direction with a median of 5 mm for the GTVs, 6 mm for GTV T_PET , and 7 mm for GTV T_MRI . The largest deviations of 2-4 cm originated from disagreement in the extent of whether to include the anorectal wall or not. The anisotropic margins required to ensure 90% coverage between delineation and the median volume were on average [0.9, 0. The distributions of Dice similarity coefficients over the current patient cohort, for inter-observer and inter- Table 2 Mean gross tumor volume (GTV) and active tumor region (GTV T ) for three observers delineating target volumes using PET or MRI. Dice coefficients for inter observer PET, inter observer MRI, and intra observer comparing PET with MRI  Figure 3 displays the Dice contours projected along the mediolateral, anteroposterior and superoinferior directions. In all directions the Dice values were lowest at the contour edges, and the shapes were relatively symmetrical. The superoinferior projection appears jagged as a result of the lower CT resolution in this direction. GTV PET projections show higher values and appear broader than GTV MRI , while the opposite is true for the GTV T s. Inter-modality profiles are lower than the corresponding inter-observer profiles, and the difference is most pronounced for GTV T .

Discussion
Generous margins have traditionally been applied in radiotherapy to correct for uncertainties in biology, movement, setup, delivery and imaging. Uncertainties in patient setup and delivery systems have steadily decreased over the years, while advances in medical imaging has been limited by co-registration issues and patient changes between sessions. The current work has focused on two main issues: how different observers will produce varying delineations for a given modality, and how their information from PET and MRI is interpreted differently.
MRI seemed to consistently produce slightly larger GTV volumes than PET, though the difference was small (median 4 cm 3 ). Other studies have shown similar results, with MRI tending to overestimate GTV for rectal cancer, though clinical significance has yet to be established [20,21]. The ANOVA and Dice distributions both demonstrated that the greatest uncertainty seemed to be the choice of modality and not the choice of observer. Although differences between delineators could be detected the Dice coefficients with a median of 0.8 for PET and 0.74 for MRI showed a high degree of overlap. The difference between delineators was seen as the same delineator repeatedly producing smaller, or larger, volumes than the others. This indicated that they were consistent in their own clinical judgement with low intra observer variability relative to inter observer or inter modality variability.
In the current work, GTV T was delineated to encompass only the visible macroscopic tumor volume, as opposed to GTV which also included the anorectal wall. For PET/CT GTV T_PET will largely be reflected by the hypermetabolic part of the tumor, as CT images provide sub-optimal images quality for this purpose. The hypermetabolic tumor area may be quantified in terms of the PET-based metabolic tumor volume (MTV), which has shown a prognostic role for anal cancer [22]. Since the GTV T_PET includes MTV and is smaller than the GTV it is an attractive target for dose escalation due to smaller dosimetric impact on the surrounding normal tissue than a general escalation. GTV T_MRI was contoured similarly with T2 W-and DW-MRI, thus including the most dense tumor tissue. For GTV T the delineations seemed less consistent with respect to delineators and to how the modalities were interpreted. Dice values decreased to median values about 0.7 for both modalities, with a median inter modality Dice coefficient of 0.56. ANOVA could no longer distinguish delineators indicating that the doctors were less consistent in defining GTV T s compared to GTVs. Thus, local dose escalation based on either GTV T_PET or GTV T_MRI are expected to produce different dose distributions in the patient, but the clinical impact of these differences is not clear.
The differences in center points of the delineated volumes were 1-2 mm in the axial plane and 5-7 mm in the superoinferior direction, which was consistent with the margin required for 90% coverage of inter observer variability. The reason for the wide margin in the superoinferior direction was not only a result of disagreement on which slice to begin and end the tumor but also the lower out-of-plane resolution. No comparable study of anisotropic margins for anal cancer exists, though a recent study of lung cancer reported a required GTV-PTV margin of 3-6 mm to account for target delineation variability [23]. For rectum cancer the greatest margins are required in the anteroposterior direction, though this may be due to rectal motion and possibly the use of isotropic voxels [24]. In the current work, both differences in center of mass positions and margins were almost equal for GTV PET and GTV MRI , with margins somewhat larger than the difference in center points. Both differences in center points and margins where greater for GTV T_MRI than for No previous inter-observer/modality delineation studies on anal cancer have been found in the literature, though it has been shown that PET is a useful supplement to CT [25][26][27]. For rectal cancer, inter-observer variability as measured by Dice coefficients has been shown to decrease when adding FDG-PET (0.81) to CT images (0.77) [28], while a comparison of the concordance index of CT + PET (0.82) and CT + MRI (0.79) did not report significant differences when switching between modalities [29]. These values are similar to our current results where the GTVs have a mean Dice coefficient of typically 0.75-0.8. Thus, the inter observer and inter modality variability in our work an anal cancer delineations seems comparable to that of rectal cancer.

Conclusion
In summary, delineation based on PET or MRI produced similar GTVs for RT planning of anal cancer, though PET appears to have lower inter observer variability in terms of Dice coefficients. However, the deviations between PET and MRI-based delineations were not substantial and may not translate into clinically meaningful differences. Overall the GTV T s display greater variability than the GTVs, both between doctors and modalities. It appears that being presented both PET and MR images are not critical for current GTV delineations, although local dose escalation strategies (dose painting) targeting GTV T may show greater dependence on imaging modality.