Introduction

In head and neck cancer various treatment strategies have been developed to improve outcome. However, it remains difficult to select patients for these intensified treatments despite careful evaluation of clinical factors such as tumour size/stage, lymph node involvement and anatomic subsite. Therefore, identification of novel pretreatment factors that potentially predict treatment response and long-term outcome is of great interest [1]. The development of molecular imaging techniques, such as PET, allows the noninvasive study of the pathophysiology of cancers.

In head and neck cancer there are indications that pretreatment tumour 18F-fluorodeoxyglucose (FDG) uptake may be an independent prognostic factor [1]. Many research groups have studied the incorporation of FDG PET into radiation treatment planning, and several ways of using PET data have been described. Visual interpretation is the most commonly used method [2–5]. This method, however, is susceptible to variations due to the window level settings of the images and is highly operator-dependent. Therefore, more objective methods have been explored. Examples are isocontouring based on a standardized uptake value (SUV) of 2.5 around the tumour [3, 6–8], a fixed threshold of the maximum signal intensity [9–13], or a threshold which is adaptive to the signal-to-background ratio (SBR) [3, 14]. We recently demonstrated that FDG PET may have important consequences for the definition of the gross tumour volume (GTV) of the primary tumour in head and neck cancer, and that the choice of the PET segmentation tool is not trivial [15]. The aim of this study was to assess the prognostic value of the determination of primary tumour volume from CT and FDG PET scans, and various ways of quantifying FDG uptake in patients with head and neck cancer treated with (chemo)radiotherapy, and to provide an overview of the available literature.

Material and methods

Patients

A total of 77 patients (58 men and 19 women; median age 61 years, range 43–86 years) with stage II–IV squamous cell carcinoma of the head and neck area, eligible for primary curative radiotherapy, were prospectively enrolled from June 2003 until July 2006. FDG PET was performed only for research purposes, and did not influence treatment. The tumour characteristics are summarized in Table 1. No information on human papillomavirus relatedness can be provided. The study was approved by the Ethics Committee of the Radboud University Nijmegen Medical Centre and all patients provided informed consent.

Table 1 Tumour characteristics of 77 patients

Treatment

All patients were discussed in a multidisciplinary conference for tumour classification and treatment recommendations. Our protocol recommended treating primary tumour and metastatic lymph nodes to a dose of 68–70 Gy. This was combined with concomitant weekly intravenous cisplatinum 40 mg/m2 for large unresectable tumours. Elective lymph node regions were treated to 44 Gy.

Image acquisition

Before treatment, a CT scan and an FDG PET scan were acquired in radiation treatment position with the patient immobilized using a custom-made rigid mask covering the head, neck and shoulders. Maximum reproducibility in positioning was ensured by the use of additional support systems: a flat scanning bed, customized head support cushion, intraoral mould when indicated, standard cushion supporting the knees, and laser positioning system as previously described [15]. CT scans were acquired using a multislice spiral CT scanner (Philips AcQsim; Philips, Cleveland, OH). Scanning parameters were 130 kV, 120 mAs, slice distance and slice thickness 3 mm, scanning the head and neck area, with intravenous contrast agent. FDG PET scans were acquired using a full-ring dedicated PET scanner (Siemens ECAT Exact 47; Siemens/CTI, Knoxville, TN). Patients with diabetes mellitus were not excluded. However, glucose levels had to be appropriately regulated (glucose level at time of FDG injection <10 mmol/l, no insulin administration before FDG injection). A 3-D emission scan of the head and neck area and a 2-D 68Ge-based transmission scan for attenuation correction were acquired 60 min (median±SD 64±11.4 min) after intravenous injection of 250 MBq FDG (Covidien, Petten, The Netherlands). The acquisition time per bed position was 5 min for emission and 3 min for the 68Ge-based transmission scan, resulting in a total scanning time of 16 min for the two bed positions. Image reconstruction has been described in detail previously [16].

Three-dimensional surface models were automatically derived from both the CT and PET images. These models were anatomically coregistered using an operator-independent iterative closest point algorithm, with an average registration error of 2.0 mm at the centre of the planning area as described previously [17]. SUV was defined as the voxel value of detected activity multiplied by the weight of the patient divided by the activity at the beginning of the scan.
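The SUV definition above can be written as a short calculation. The following Python sketch is illustrative only: the function name, the argument names and the units (kBq/ml, g, kBq) are assumptions made for this example and are not taken from the study software, and the numbers are invented.

```python
def suv(voxel_activity_kbq_per_ml, body_weight_g, activity_at_scan_start_kbq):
    """Body-weight-normalized SUV (g/ml) for a single voxel.

    Follows the definition in the text: detected voxel activity multiplied
    by patient weight, divided by the activity at the beginning of the scan.
    The units are an assumption of this sketch.
    """
    if activity_at_scan_start_kbq <= 0:
        raise ValueError("activity at scan start must be positive")
    return voxel_activity_kbq_per_ml * body_weight_g / activity_at_scan_start_kbq


# Hypothetical example: 10 kBq/ml in a voxel, 70 kg patient,
# 140 MBq of activity at the beginning of the scan.
example_suv = suv(10.0, 70_000.0, 140_000.0)
```

With these invented inputs the voxel has an SUV of 5.0; a voxel whose activity concentration equals the average activity per gram of body weight would have an SUV of 1.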

The CT and the two PET datasets were transferred via DICOM to a Pinnacle3 treatment planning system (Philips Medical Systems, Andover, MA) for target volume definition.

Target volume definition

The primary tumour was delineated on CT and FDG PET images by two experienced radiation oncologists in consensus. The volume of the metastatic lymph nodes was not included. The role of FDG PET in the delineation of metastatic lymph nodes has been analysed previously [18].

On CT images, the GTV (GTVCT) was delineated manually according to current clinical protocols using information gathered from physical examination, available diagnostic work-up imaging (CT and/or MRI, examination under general anaesthesia) and the CT scan in treatment position. When the radiation oncologists were drawing the GTVCT contours, the FDG PET images were blinded.

Five PET-based volumes were obtained using different delineation approaches. The volumes were delineated visually (PETVIS) by contouring the FDG activity that was clearly above normal background activity. Locations with increased FDG uptake were classified as malignant in consensus with an experienced nuclear medicine physician. The other (threshold-based) volumes were obtained using in-house developed software scripts for the Pinnacle3 treatment planning system. Volumes were delineated by applying an isocontour of SUV = 2.5 (PET2.5) around the tumour. Volumes were delineated using two fixed percentage thresholds of 40% (PET40%) and 50% (PET50%) of the maximum signal intensity in the primary tumour (SUVMAX). Finally, volumes were delineated using an adaptive threshold based on the SBR (PETSBR), as developed at Université St. Luc in Brussels, Belgium [14]. Calibration and implementation of the PETSBR method have been described in detail previously [15]. Results obtained by automated delineation algorithms were checked visually before acceptance. A delineation was considered unsuccessful if the resulting volume included significant volumes of tissue that were clearly normal on visual interpretation.
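The fixed-threshold rules described above (SUV = 2.5, and 40% or 50% of SUVMAX) can be sketched as follows. This is a minimal Python illustration and not the in-house Pinnacle3 scripts: the function names are hypothetical, the voxel values are invented, voxels are held in a flat list rather than a 3-D image, and treating voxels exactly at the threshold as inside the volume is an assumption. The adaptive PETSBR method is not sketched because its calibration is not given in this section.

```python
def fixed_suv_mask(suv_values, threshold=2.5):
    """Mark voxels with SUV at or above an absolute threshold (PET2.5-style)."""
    return [value >= threshold for value in suv_values]


def percent_of_max_mask(suv_values, fraction):
    """Mark voxels at or above a fraction of the maximum SUV (PET40%/PET50%-style)."""
    if not suv_values:
        return []
    cutoff = fraction * max(suv_values)
    return [value >= cutoff for value in suv_values]


def mask_volume_cm3(mask, voxel_volume_cm3):
    """Volume of the selected voxels, assuming equally sized voxels."""
    return sum(mask) * voxel_volume_cm3


# Hypothetical SUVs for a handful of voxels in a region around the tumour.
voxels = [1.0, 2.0, 3.0, 4.5, 6.0, 10.0]
pet25 = fixed_suv_mask(voxels)              # cut-off: SUV 2.5
pet40 = percent_of_max_mask(voxels, 0.40)   # cut-off: 40% of SUVMAX = 4.0
pet50 = percent_of_max_mask(voxels, 0.50)   # cut-off: 50% of SUVMAX = 5.0
```

In this toy example the absolute SUV 2.5 rule selects the most voxels and the 50% rule the fewest. A real implementation would additionally restrict the selection to a connected region around the tumour, which is omitted here.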

The mean FDG uptake of each PET-based volume was recorded (SUVmeanVIS, SUVmean2.5, SUVmean40%, SUVmean50%, SUVmeanSBR). This was multiplied by the corresponding volume resulting in the integrated SUV (iSUVVIS, iSUV2.5, iSUV40%, iSUV50%, iSUVSBR).
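The relation between volume, SUVmean and iSUV can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, the voxels are assumed to be equally sized, and the SUV values and voxel volume are invented.

```python
def integrated_suv(suv_values_in_volume, voxel_volume_cm3):
    """Return (volume in cm3, SUVmean, iSUV) for one delineated volume.

    iSUV is computed as SUVmean multiplied by the volume, as defined in the text.
    """
    n_voxels = len(suv_values_in_volume)
    if n_voxels == 0:
        raise ValueError("the delineated volume contains no voxels")
    volume_cm3 = n_voxels * voxel_volume_cm3
    suv_mean = sum(suv_values_in_volume) / n_voxels
    return volume_cm3, suv_mean, suv_mean * volume_cm3


# Hypothetical volume of four voxels of 0.5 cm3 each.
volume, suv_mean, i_suv = integrated_suv([4.0, 6.0, 8.0, 10.0], 0.5)
```

Here the volume is 2.0 cm3, SUVmean is 7.0 and iSUV is 14.0. Because iSUV combines size and uptake, two tumours with the same volume can have different iSUV values.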

Treatment outcome analysis

Follow-up visits included history, inspection of the upper aerodigestive tract and palpation of the neck. Local and regional recurrences were proven by histology and cytology, respectively. Distant metastases were identified either pathologically or radiologically.

Statistics

All statistical analyses were performed using SPSS version 16.0 (SPSS, Chicago, IL). The significance of differences between two categories was established using t-tests or Mann-Whitney U tests, as appropriate. The normality of distributions was assessed using Kolmogorov-Smirnov tests. Variables were entered as continuous variables in Cox regression analyses to avoid the need to establish a cut-off value for local control (LC), regional recurrence-free survival (RRFS), distant metastasis-free survival (DMFS), disease-free survival (DFS) and overall survival (OS). A p-value < 0.05 was a priori considered statistically significant.

Results

Tumour volume measurements

For CT-based primary tumour volume measurements, 77 datasets were available. PETVIS was generated for all 77 patients; the PETSBR segmentation tool resulted in unsuccessful volume definition in two patients. A delineation was considered unsuccessful if the resulting GTV included significant volumes of tissue that were clearly normal on visual interpretation. This was observed in four patients for both PET40% and PET50%, two of whom also had an unsatisfactory PETSBR. The PET2.5 segmentation tool was unsuccessful in 35 patients, including the four patients mentioned. As a consequence, this latter method was not evaluated further. All unsuccessful volume definitions were greatly oversized, being at least 300 cm3, and clearly incorporated benign tissue. An unsuccessful delineation did not correlate with specific tumour subsite or T stage. An example of an inadequate PET2.5 is shown in Fig. 1. The mean absolute tumour volumes for the various methods were 22.7, 21.5, 16.4, 10.5 and 11.2 cm3 for GTVCT, PETVIS, PET40%, PET50% and PETSBR, respectively. GTVCT and PETVIS yielded similar mean absolute volumes, but the threshold-based methods (PET40%, PET50% and PETSBR) yielded volumes that were all smaller than GTVCT (p ≤ 0.0001 for all comparisons). Overlap and mismatch analyses performed in order to evaluate the location of the acquired volumes showed that in 64%, 59%, 29% and 31% of the PETVIS, PET40%, PET50% and PETSBR volumes, respectively, more than 20% of the volume was located outside the GTVCT domain.
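The mismatch criterion used above (more than 20% of a PET-based volume lying outside GTVCT) can be sketched as a simple set operation. This Python example is an illustration only, not the analysis software used in the study: the function name is hypothetical, both volumes are assumed to be sampled on the same coregistered grid of equally sized voxels, and the voxel coordinates are invented.

```python
def fraction_outside(pet_voxels, ct_voxels):
    """Fraction of the PET-based volume that lies outside the CT-based volume.

    Both arguments are collections of (x, y, z) voxel indices on a common grid.
    """
    pet = set(pet_voxels)
    ct = set(ct_voxels)
    if not pet:
        raise ValueError("the PET-based volume contains no voxels")
    return len(pet - ct) / len(pet)


# Hypothetical volumes: five PET voxels, three of which fall inside GTVCT.
pet_volume = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1), (2, 1, 1)}
ct_volume = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)}
mismatch = fraction_outside(pet_volume, ct_volume)
exceeds_20_percent = mismatch > 0.20
```

In this invented case two of the five PET voxels lie outside GTVCT, giving a mismatch of 0.4, so the volume would count towards the percentages reported above.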

Fig. 1

Planning CT image (a), corresponding FDG PET image (b) and fusion image (c) in a patient with T3N2bM0 oropharyngeal carcinoma show differences in target volume definition. Indicated are GTV delineated on the CT image (GTVCT; red, absolute volume of 34.0 cm3) and PET-based GTVs obtained by visual interpretation (PETVIS; light green, volume 33.8 cm3), applying an SUV isocontour of 2.5 (PET2.5; orange), using a fixed threshold of 40% (PET40%; yellow, volume 14.0 cm3) and 50% (PET50%; blue, volume 13.4 cm3) of the maximum signal intensity, and applying an adaptive threshold based on the SBR (PETSBR; dark green, volume 15.0 cm3). PET2.5 was unsuccessful in this patient because of inclusion of large areas of normal background tissue. Note that on this transverse slice PET50% and PETSBR are indistinguishable

Treatment and treatment outcome

The median primary tumour radiation dose was 68 Gy (range 64–72 Gy). Three patients were not treated; one died just prior to radiotherapy, another refused primary radiotherapy, and the third developed distant metastases prior to radiotherapy. After a median follow-up of 46 months (range 2.5–76 months), LC, RRFS, DMFS, DFS and OS at 2 years were 84%, 95%, 86%, 73% and 77%, respectively. Follow-up was at least 24 months or until the patient’s death. After primary treatment, five patients did not achieve complete remission. These patients did not have significantly different CT- or PET-based tumour volumes from the patients who did achieve complete remission. No recurrences were seen in the areas treated with an elective dose.

Prognostic value of CT and PET

Primary tumour volume (PET- or CT-based), SUVmean, SUVMAX and iSUV were not able to predict the likelihood of complete remission. The CT- and PET-based tumour volumes of the patients who did achieve complete remission (n = 69) are shown in Fig. 2. There was a significant difference in the volumes of oral cavity and oropharyngeal tumours as compared to laryngeal and hypopharyngeal tumours (p ≤ 0.004, Mann-Whitney). The values of SUVMAX for oral cavity/oropharyngeal tumours and laryngeal/hypopharyngeal tumours were 9.7 and 10.0, respectively. We analysed LC, RRFS, DMFS, DFS and OS in the 69 patients who achieved complete remission after primary treatment using primary tumour volume (PET- or CT-based), SUVmean, SUVMAX and iSUV as continuous variables in Cox regression survival analyses.

Fig. 2

Box and whisker plot showing 5th and 95th percentiles (whiskers), 25th and 75th percentiles (boxes), and median of CT- and PET-based tumour volumes of oral cavity/oropharyngeal tumours (unfilled boxes) and hypopharyngeal/laryngeal tumours (filled boxes). There was a significant difference in the volumes of oral cavity and oropharyngeal tumours as compared to laryngeal and hypopharyngeal tumours (p ≤ 0.004, Mann-Whitney)

In hypopharyngeal and laryngeal tumours, none of the CT or PET parameters was associated with any of the outcome-related endpoints. SUVMAX and SUVmean also had no prognostic value in oral cavity and oropharyngeal tumours. The other results for oral cavity and oropharyngeal tumours are presented in Table 2. In these head and neck subsites, PETVIS was able to predict LC, whereas the other volume-based methods were not. Both PETVIS and GTVCT were able to predict DMFS, DFS and OS. Furthermore, all iSUV methods were able to predict LC, DMFS, DFS, and OS, albeit sometimes with borderline significance (p-values between 0.051 and 0.055). Figure 3 shows individual data points of GTVCT and PETVIS in relation to LC and DFS of oral cavity/oropharyngeal tumours with a follow-up of at least 24 months. Although the mean values differed significantly, Fig. 3 also shows that there was a large overlap in the volume range between patients with and without recurrence or death, indicating that the discriminative power of GTVCT and PETVIS is limited.

Table 2 Primary tumour volume (PET- or CT-based) and PET-based iSUV as variables in treatment outcome prediction in patients with oral cavity and oropharynx tumours who achieved complete remission (n = 31) after definitive (chemo)radiotherapy. Variables were assessed using Cox regression analysis. The values shown are p-values
Fig. 3

Panels showing GTVCT and PETVIS in relation to LC (a) and DFS (b) of oral cavity/oropharyngeal tumours with a follow-up of at least 24 months. Differences were analysed using the Mann-Whitney U test

Discussion

In this study we assessed the prognostic value of CT- and FDG PET-based primary tumour volume measurements, mean FDG uptake (SUVmean) and maximum FDG uptake (SUVMAX), and iSUV in a large cohort of patients with head-and-neck cancer treated with (chemo)radiotherapy.

Interestingly, PETVIS was able to predict LC of oral cavity and oropharyngeal tumours, but GTVCT was not, even though the mean PETVIS and GTVCT volumes were similar. Other studies have confirmed the lack of prognostic potential of CT-based primary tumour volume in oral cavity and oropharyngeal tumours [33, 34]. Our observation that PETVIS is associated with LC is novel. It remains questionable, however, whether visual assessment can be a reliable prognostic tool given the operator-dependent nature of this method. Both GTVCT and PETVIS were able to predict DMFS, DFS and OS in these subsites. For CT-based primary tumour volume this was also observed by Chao et al. in 31 patients with oropharyngeal cancer treated with definitive (chemo)radiotherapy [35]. Apparently, in oropharynx tumours local radiotherapy response does not depend so much on the primary tumour volume, but possibly more on the biological characteristics of the tumour [36]. On the other hand, these results do suggest that metastatic potential is associated with the primary tumour volume in this head and neck subsite. One other study of 59 patients with stage III–IV head and neck cancer treated with definitive (chemo)radiotherapy found a correlation between PET-based primary tumour volume, using the PET2.5 method, and progression-free survival (PFS) [28]. After further analyses the study also showed that a volume ≥9.3 cm3 was associated with a decreased OS.

All the iSUV methods (the product of the PET-based primary tumour volume and the SUVmean within that volume, reflecting the metabolic volumes) were able to predict LC, DMFS, DFS and OS in oral cavity and oropharynx tumours, albeit sometimes with borderline significance. iSUV is a new variable fully representing the total metabolic activity within a predefined tumour volume. La et al. also found a correlation between iSUV and treatment outcome, albeit based on cumulative volumes of both the primary tumour and the PET-avid lymph nodes [27]. However, they hypothesized that the effect was due to the volume and not the product of volume and SUVmean. In contrast, our data indicate that of all the PET-based volume measurements, only PETVIS had a predictive value, while this was the case for practically all the iSUV methods. This suggests that the product of volume and SUVmean provides a more robust parameter which could possibly be a surrogate for both tumour aggressiveness and the total cancer cell mass.

In hypopharyngeal and laryngeal tumours we found no association between GTVCT or PETVIS and treatment outcome, whereas several studies have demonstrated the prognostic value of CT-determined tumour volume for outcome after definitive radiation therapy for these subsites as well as for nasopharyngeal cancer [37]. We do not have a solid explanation for this observation, except for the fact that we obtained high tumour control rates (LC at 2 years of 86%) compared to several other studies, and consequently relatively few events which would reduce the discriminative power of any pretreatment test. None of the three semiquantitative methods for PET-based tumour volume calculation (PET40%, PET50% and PETSBR) showed an association with outcome in any of the head and neck subsites. It should be noted that all three semiquantitative methods produced significantly smaller variability. This may also reduce discriminative power.

As the absolute volumes of FDG PET-based tumour located outside the GTVCT domain were small, it was not possible to determine whether the exact origin of a recurrence lay outside the GTVCT domain but within the FDG PET-based tumour volume.

In our cohort the SUVMAX of the primary tumour was not able to predict radiation treatment outcome. Table 3 summarizes the results of a literature search for studies examining the role of pretreatment FDG PET SUVMAX in patients with head and neck cancer treated with definitive (chemo)radiotherapy in predicting outcome. Of 15 studies identified, 8 showed that SUVMAX could possibly play a role in predicting radiation treatment response [1, 19–25] and 7 showed that it does not [26–32]. These inconsistencies could be a result of the heterogeneity of treatment modalities, the heterogeneity of tumour sites, the use of several endpoints (i.e. LC, LRF, DFS or OS), the use of various SUVMAX cut-off values, and the use of either the SUVMAX of the primary tumour or the SUVMAX of a metastatic lymph node. It is important to note that of the eight studies demonstrating an association between SUVMAX and outcome, six included substantial numbers of patients who were treated with surgery. Overall, of the 408 patients included in these six studies, 227 (55%) underwent primary surgery. In fact, the study by Brun et al. is the only one indicating that SUVMAX is a prognostic factor in a population treated with definitive (chemo)radiotherapy alone, and using only the SUVMAX of the primary tumour, finding that DFS and OS were worse when SUVMAX was >9.0 [19]. Thus, based on this overview of the literature, an unequivocal conclusion about the predictive role of pretreatment FDG PET SUVMAX in patients with head and neck cancer treated with definitive (chemo)radiotherapy cannot yet be drawn. Possibly, studies of larger cohorts of patients with homogeneous tumour and treatment characteristics, stratified for the various subsites, would be able to establish a role for a SUVMAX cut-off value in future treatment individualization. Ideally these studies should use the same type of treatment and the same definition of treatment outcome.

Table 3 Summary of studies on treatment outcome prediction using SUVMAX from pretreatment FDG PET of patients with head and neck cancer treated with definitive (chemo)radiotherapy

Using pretreatment primary tumour volume based on FDG PET is appealing, and has not yet been extensively reported. In the current study, PETVIS proved to be the only PET-based volume able to predict treatment outcome, and only in the oral cavity and oropharyngeal tumours. It should be noted that the discriminative potential of PETVIS may be limited because of the large overlap between data points of patients with and without recurrence. The volumes generated by semiautomated PET segmentation methods were not useful for outcome prediction.

Thorwarth et al. demonstrated that cumulative FDG PET-based volumes of both the primary tumour and the PET-avid lymph nodes could not predict treatment outcome in a small series of patients with head and neck cancer treated with definitive (chemo)radiotherapy [31]. They generated the PET-based volume by encompassing all voxels showing a higher intensity than 40% of the maximum value. La et al. showed a correlation between DFS and OS of 85 patients with head and neck cancer treated with definitive (chemo)radiotherapy and the FDG PET-based cumulative volumes of both the primary tumour and the PET-avid lymph nodes [27]. They generated the PET-based volume by encompassing all voxels showing a higher intensity than 50% of the maximum value. Recently, Chung et al. showed a correlation between the DFS of 82 patients with pharyngeal cancer treated with definitive (chemo)radiotherapy and the FDG PET-based cumulative volumes of both the primary tumour and the PET-avid lymph nodes [26]. They generated the PET-based volume by encompassing all voxels showing an SUV of ≥2.5, and this volume was a significant prognostic factor for DFS, whereas stage, histological grade and SUVMAX were not. In our cohort, the PET2.5 segmentation method resulted in an unsuccessful delineation in 35 patients, and factors that might explain this finding have been addressed in a previous report [15].

The use of a molecular imaging modality such as FDG PET to identify a robust variable on which prediction of treatment response and long-term outcome can be based remains attractive. Thus far, there is no role for pretreatment FDG PET as a predictor of outcome in head and neck cancer in daily routine, given the inconsistencies between studies and the low levels of evidence. However, this potential application of FDG PET needs further exploration, focusing both on FDG PET-based primary tumour volume and on iSUV and SUVMAX of the primary tumour. Preferably these questions should be incorporated in prospective phase III trials with strict criteria on treatment and outcome parameters. Other research questions are worth considering such as adding the data of a repeat FDG PET scan during treatment to the data acquired by a pretreatment FDG PET scan, and the use of different PET tracers such as 18F-fluoromisonidazole and 3′-deoxy-3′-18F-fluorothymidine, to image hypoxia and tumour cell proliferation, respectively, which are well-known tumour characteristics relevant to radiation response [38].

Conclusion

There are three major findings of this study. First, in oral cavity and oropharyngeal tumours PETVIS was the only volume-based method able to predict LC. Both PETVIS and GTVCT were associated with DMFS, DFS and OS in these subsites. Second, in oral cavity and oropharyngeal tumours the volume- and SUV-derived parameters iSUVVIS, iSUV40%, iSUV50%, iSUVSBR were consistently associated with LC, DMFS, DFS and OS, while SUVmean and SUVMAX were not. Third, in hypopharyngeal and laryngeal tumours, none of the CT and PET parameters was correlated with treatment outcome.

Given the inconsistencies between studies and low level of evidence thus far, there is no role yet for pretreatment FDG PET as a predictor of outcome in head and neck cancer in daily routine. Due to the heterogeneous nature of head and neck cancers, the difficulty in obtaining a large number of patients, and the variation in results, one has to be careful interpreting the results from our and similar studies, as they are based on a relatively low number of events. However, this potential application of FDG PET needs further exploration, focusing both on FDG PET-based primary tumour volume and on iSUV and SUVMAX of the primary tumour. Preferably these questions should be incorporated in prospective phase III trials with strict criteria on treatment and outcome parameters.