Cervical nodal volume for prognostication and risk stratification of patients with nasopharyngeal carcinoma, and implications on the TNM-staging system

We aim to evaluate the quantitative parameters of 18F-FDG PET/CT (metabolic parameters) and MRI (morphologic parameters) for prognostication and risk stratification in nasopharyngeal carcinoma (NPC). 200 (147 males, aged 50 ± 13 years-old, mean ± S.D.) newly diagnosed patients with NPC (TxNxM0) were prospectively recruited. Primary tumor and nodal lesions were identified and segmented for both morphologic (volume, VOL) and metabolic (SUV and MTV) quantification. Independent predictive factors for recurrence free survival (RFS) and overall survival (OS) were morphologic nodal volume (VOL_N, p < 0.001), TNM-stage (p = 0.022), N-Stage (p = 0.024) for RFS, and VOL_N (p = 0.014) for OS. Using Classification and Regression Tree (CART) analysis, three risk-layers were identified for RFS: Stage I/II with VOL_N < 18cc (HR = 1), stage III /IV with VOL_N < 18cc (HR = 2.93), VOL_N ≥ 18cc (HR = 7.84) regardless of disease stage (p < 0.001). For OS, two risk layers were identified: VOL_N < 18cc (HR = 1), VOL_N ≥ 18cc (HR = 4.23) (p = 0.001). The 18cc threshold for morphologic nodal volume was validated by an independent cohort (n = 105). Based on the above risk-classification, 35 patients (17.5%) would have a higher risk than suggested by the TNM-staging system. Thus, morphologic nodal volume is an important factor in prognostication and risk stratification in NPC, and should be incorporated into the staging system, while PET parameters have no advantage for this purpose in our cohort.

PET/CT protocol. 18F-FDG was administered intravenously using a body-weight-adjusted protocol (4.8 MBq/kg) after fasting for at least 6 hours. The serum blood glucose cut-off was 10 mmol/l, and the uptake time was 1 hour. CT acquisitions (120 kVp, 200-400 mA, 0.5 s per CT rotation, pitch 0.984:1, 2.5 mm intervals, with CT contrast medium injected at a dose of 1.5 ml/kg) were followed by PET acquisitions (2 min 30 s per bed position and 6 bed positions per case) covering the base of the skull to the upper third of the thighs. PET images were corrected for attenuation using the CT data and reconstructed with an ordered-subset expectation maximization iterative algorithm (14 subsets and two iterations). The resultant PET images were fused with CT images for subsequent viewing (Advanced Workstation ADW 4.3, GE Healthcare Bio-Sciences, NJ, USA).
Image analysis. PET/CT images were reviewed by consensus (PLK & HY) using an image fusion system (Advanced Workstation ADW 4.3, GE Healthcare Bio-Sciences, NJ, USA). Standardized uptake value (SUV) was normalized by lean body mass for all patients. A fixed threshold of SUV = 2.5 was adopted for the semi-automatic segmentation of primary tumors and nodal metastases. This threshold was selected because it has been found to provide a reasonable correlation between metabolic tumor volume and morphologic volume, and is well accepted 11 . Where necessary, manual adjustment was made by deleting physiological or reactive uptake in structures such as adjacent normal brain, normal salivary glands, Waldeyer's lymphatic ring, etc. so as to avoid over-segmentation (PLK & HY). The metabolic parameters SUVmax and Metabolic Tumor Volume (MTV) of both primary tumors (T) and nodes (N) were recorded as SUVmax_T, MTV_T, SUVmax_N (of the hottest node) and MTV_N, respectively. MRI images of the primary cohort were reviewed by specialized head & neck radiologists (QYA & ADK), blinded to the PET/CT findings, by consensus on a workstation (Extended MR Workspace, Philips Medical Systems, Netherlands B.V., The Netherlands). Cervical nodes were identified based on morphologic criteria of 1) size, using short axis ≥5mm for retropharyngeal nodes, ≥11mm for jugulodigastric nodes and ≥10mm for all other cervical nodes; 2) presence of necrosis; or 3) extracapsular spread 12,13 . Morphologic volumes (VOL) of primary tumors (VOL_T) and all identified nodes (VOL_N) were calculated using the summation-of-areas method, by multiplying the slice thickness by the cross-sectional area delineated on contrast-enhanced T1-weighted images in axial planes.
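The two volume measures described above can be illustrated with a minimal sketch. It assumes that the per-slice areas and the slice thickness are already available from the delineation step, and that a voxel is counted toward MTV when its SUV is at or above the fixed threshold (the inclusive comparison is an assumption, as are all numerical inputs, which are hypothetical and not taken from the study). The manual removal of physiological uptake is not modeled.

```python
def summation_of_areas_volume(areas_mm2, slice_thickness_mm):
    """Morphologic volume in cc: sum of delineated slice areas times slice thickness."""
    volume_mm3 = sum(areas_mm2) * slice_thickness_mm
    return volume_mm3 / 1000.0  # 1 cc = 1000 mm^3


def metabolic_tumor_volume(suv_voxels, voxel_volume_mm3, threshold=2.5):
    """MTV in cc: number of voxels with SUV >= threshold times the voxel volume."""
    n_voxels = sum(1 for suv in suv_voxels if suv >= threshold)
    return n_voxels * voxel_volume_mm3 / 1000.0


# Hypothetical inputs for illustration only.
node_areas_mm2 = [120.0, 310.0, 450.0, 380.0, 140.0]  # one area per axial slice
vol_n = summation_of_areas_volume(node_areas_mm2, slice_thickness_mm=4.0)

node_suv_voxels = [1.0, 2.5, 3.0, 7.2, 2.4]
mtv_n = metabolic_tumor_volume(node_suv_voxels, voxel_volume_mm3=64.0)

print(f"VOL_N = {vol_n:.2f} cc, MTV_N = {mtv_n:.3f} cc")
```

For a patient with several nodes, VOL_N would be the sum of the per-node volumes, since the text defines it over all identified nodes.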
Radiological Staging. Patients were staged using the American Joint Committee on Cancer (AJCC) TNM staging system (7th edition). The primary tumor was assessed on MRI for tumor invasion to assign T-stage. Notably, patients with prevertebral space invasion were classified as T2 stage. For N-stage, all nodal metastases identified using the above-stated method were assessed, with maximal axial length quantified in all 3 standard imaging planes. Specifically, supraclavicular regions for N3b were defined with reference to a previously described guideline 14 . In addition, any node with SUVmax ≥2.5 on PET images that did not meet the morphologic criteria stated above was also taken into account for upstaging the N-stage.
Treatment protocol. All patients received a standardized therapy protocol, i.e. Intensity Modulated Radiation Therapy (IMRT), with or without induction, concurrent and adjuvant chemotherapy, according to their clinical stages, and were followed up by oncologists specialized in the treatment of NPC (DLWK & VHFL).
Chemotherapy. Generally, for stage I/II disease, only IMRT was prescribed, while for stage III to IVB patients, concurrent chemotherapy (cisplatin 100 mg/m² on days 1, 22 and 43) with or without adjuvant chemotherapy (cisplatin 80 mg/m² on day 1 and 5-FU 1000 mg/m² from day 1 to 4 for 3 cycles, starting 4 weeks after the completion of IMRT) was prescribed. For patients with bulky tumors, induction chemotherapy (cisplatin 100 mg/m² on day 1 and 5-FU 1000 mg/m² from day 1 to 5 for 3 weeks) was given with the aim of reducing tumor volume so that a better balance could be achieved between the radical dose delivered to tumors and unnecessary radiation exposure to organs-at-risk.
IMRT planning. For each patient, a thermoplastic cast and a customized mouth-guard were applied during the above-stated imaging and the actual treatment. The planned dose was delivered using the simultaneous accelerated radiation therapy technique (SMART) in 33-35 fractions (5 fractions per week, in consecutive weeks until the designated dose was reached). Briefly, PET/CT images and MRI images were co-registered using a treatment planning system (Eclipse version 8.0 to 10.0 software, Eclipse Treatment Planning System, Palo Alto, CA) for the delineation of targeted volumes, including primary tumors and nodes, against organs-at-risk (OARs) including brainstem, spinal cord, globes, optic nerves, optic chiasm, lenses, temporomandibular joints, temporal lobes, auditory nerves, cochleae, mandible, oral cavity, larynx, parotid glands and vestibules. High risk regions (HRR) including the posterior half of the maxillary sinuses, nasal cavities, parapharyngeal spaces, styloid processes, basiocciput, basisphenoid, clivus, foramina rotunda and ovale, pterygopalatine fossae, pterygomaxillary fissures, infraorbital fissures, cavernous sinuses, and nodal stations (level Ib and level V) were also segmented. A total dose of 70 Gy was delivered to the targeted volumes enlarged by a 5mm margin, while a total dose of 66 Gy was prescribed to HRR with an enlarged margin of 3mm, to account for inevitable motion, microscopic spread and set-up errors. Dose limits were set at 54 Gy for brainstem, optic nerves, and chiasm, and 45 Gy for spinal cord. In some cases of locally advanced disease, the dose limit for brainstem, optic nerves and chiasm could be raised to 60 Gy in order to maintain adequate dose delivery to targeted volumes. For the parotid glands, the mean dose was limited to 26 Gy, while for the lenses and temporal lobes, the dose was minimized on the premise that adequate dose was delivered to the targeted volumes.
Clinical follow-up. Tissue blocks acquired by routine 6-site biopsies during nasoendoscopy (random biopsies at the roofs, lateral walls and posterior walls of the bilateral nasopharynx) were examined microscopically for response assessment of the primary tumor. This assessment was performed at the 8th week after completion of IMRT, and repeated 2-weekly (maximum twice) upon detection of malignant residue in the previous histological investigation. In cases of suspected disease persistence in cervical nodes, ultrasonography (USG) with fine-needle aspiration biopsy (FNAB) was performed. Disease persistence was diagnosed as detectable malignant residue at the 3rd histological inspection (12 weeks after IMRT) or on any USG-guided FNAB, and indicated immediate salvage treatment such as salvage surgery, intracavitary brachytherapy boost, stereotactic radiotherapy, etc. with the aim of reaching complete response. Patients with no positive findings were diagnosed as having complete local remission, and were clinically followed every 3-6 months. Contrast-enhanced head & neck MRI and 18F-FDG PET/CT scans were only indicated upon suspected relapse or metastasis.
Survival endpoints. Follow-up ended upon documentation of death or at the date of censor (Dec. 31, 2015). The primary survival endpoint was overall survival (OS), calculated as the period between initial imaging diagnosis and documentation of death. Recurrences detected at the nasopharynx, neck nodes and distant regions were deemed local failure, regional failure and distant metastasis, respectively. Recurrence-free survival (RFS), as a secondary survival endpoint, was calculated as the period between initial imaging diagnosis and documentation of recurrence or death, whichever occurred earlier. For patients with no reported endpoint events, survival was censored at the date of censor.
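The endpoint definitions above can be sketched as a small helper that turns dates into (time, event) pairs suitable for survival analysis. This is an illustrative sketch only: the function name, the use of days as the time unit, and the example dates are assumptions, and events documented after the date of censor are treated as censored.

```python
from datetime import date


def survival_record(diagnosis, censor_date, death=None, recurrence=None):
    """Return (os_days, os_event, rfs_days, rfs_event) for one patient.

    OS runs from diagnosis to death; RFS runs from diagnosis to recurrence or
    death, whichever is earlier. Patients without an event are censored at
    censor_date.
    """
    os_event = death is not None and death <= censor_date
    os_end = death if os_event else censor_date

    # Candidate RFS events: recurrence and death observed on or before the censor date.
    rfs_candidates = [d for d in (recurrence, death) if d is not None and d <= censor_date]
    rfs_event = bool(rfs_candidates)
    rfs_end = min(rfs_candidates) if rfs_event else censor_date

    return ((os_end - diagnosis).days, os_event, (rfs_end - diagnosis).days, rfs_event)


CENSOR_DATE = date(2015, 12, 31)  # date of censor for the primary cohort

# Hypothetical patient: recurrence one year after diagnosis, alive at censor.
example = survival_record(date(2012, 1, 1), CENSOR_DATE, recurrence=date(2013, 1, 1))
print(example)
```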
Additionally, to verify the positive findings from the prospective cohort, an independent validation cohort was retrospectively reviewed. This cohort consisted of consecutive patients with newly diagnosed NPC (TxNxM0) who received contrast-enhanced head & neck MRI as pre-treatment evaluation within the same period and were treated by standardized protocols at the same institution. Conventional imaging modalities including diagnostic CT, MRI, etc. were used to exclude distant metastasis. MRI images for treatment planning of the validation cohort were reviewed by another team (VV & YH) using the same standard stated above, with volume of nodes and TNM stages recorded. The same survival endpoints were used for analysis, with the date of censor being Jan. 1, 2017.

Statistical analysis. All statistical analysis was done using either SPSS (IBM SPSS Statistics, Version 23, IBM, New York, USA) or R (Version 3.2.3, The R Foundation for Statistical Computing, Vienna, Austria) with the necessary analytical packages, such as rpart, partykit, and survival, installed directly from the Comprehensive R Archive Network (CRAN).
Correlations between volumes from MRI and PET were analyzed using Pearson's correlation. A Cox regression model was adopted for the identification of independent prognostic factors, with the protocol as follows. Kaplan-Meier curves for categorical variables, i.e., gender, T-stage, N-stage, and the overall stage, were plotted and reviewed, and categories with similar survival curves were combined where necessary. Univariable followed by multivariable analyses were applied to all parameters. The proportional hazards assumption was assessed by examining the Schoenfeld residuals. In the multivariable analysis, the presence of multicollinearity was assessed using the variance inflation factor (VIF). Factors suffering from multicollinearity were entered one at a time in the multivariable analysis. Factors that increased the predictive performance were also considered independent predictive factors. Classification and Regression Tree (CART) analysis was conducted to derive a risk stratification rule, with the hazard ratio of each risk layer calculated using a Cox regression model and statistical differences compared using Kaplan-Meier analysis. Specifically, we entered only independent predictive factors in the CART analysis, considering the limited sample size and the possibility of over-fitting, and the tree was pruned using the complexity parameter (CP) at the smallest cross-validated error (x-error). The identified risk layers were validated in an independent validation cohort using Cox regression and Kaplan-Meier analysis. The Benjamini-Hochberg procedure was adopted for multiple comparison correction. p < 0.05 was considered statistically significant.
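For readers unfamiliar with the Benjamini-Hochberg procedure named above, the adjustment can be sketched in a few lines. This is a generic illustration of the step-up adjustment, not the SPSS or R routine used in the study, and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for illustration only.
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
print(adjusted_p, significant)
```

Each raw p-value is scaled by m/rank, and the running minimum ensures that a smaller raw p-value never receives a larger adjusted value than a larger one.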
Data availability statement. The datasets analyzed during the current study are available from the corresponding author on reasonable request.

Results
Patient demographics. A total of 200 patients (147 male, age 50 ± 13 years old, mean ± standard deviation [S.D.]) were eligible for final analysis after excluding 25 patients from the initial recruitment who presented with confounding factors of synchronous malignancies (n = 1), distant metastasis (n = 14), technical data damage (n = 1), or withdrawal from the designated IMRT or loss to follow-up (n = 11). For the validation cohort, 105 patients (74 male, age 54.2 ± 14 years old, mean ± S.D.) were eligible for analysis (see Table 1 for details of patient demographics). Gender, age and all parameters derived from the primary tumor, i.e. T-stage, SUVmax, MTV_T and VOL_T, were not predictive of OS or RFS. For OS, by univariable analysis, overall stage, MTV_N and VOL_N were predictive factors. Multivariable analysis found only VOL_N to be independently predictive of OS. For RFS, by univariable analysis, N-stage, overall stage, VOL_N and MTV_N were predictive factors. Multivariable analysis found overall stage, N-stage and VOL_N to be independent predictive factors of RFS (Table 2).

Scientific Reports | 7: 10387 | DOI:10.1038/s41598-017-10423-w

Comparison of nodal volume based risk layers and TNM stage (7th edition). 66 patients in the primary cohort were identified with the highest risk level (VOL_N ≥ 18cc), using both OS and RFS as end-points, and these patients were staged as overall stage II (n = 11), stage III (n = 24) and stage IV (n = 31) using the current TNM staging system (7th edition) (Fig. 2A). Compared with patients with VOL_N < 18cc in the same stage, patients with VOL_N ≥ 18cc were found to have significantly poorer survival in stage II and/or stage III disease (HR = 3.91 [1.24-12.33], p = 0.012 for stage II/III patients using OS as endpoint [Fig. 2B]; HR = 11.05 [1.15-106.44], p = 0.009 for stage II patients using RFS as endpoint [Fig. 2C]; and HR = 2.81 [1.14-6.93], p = 0.019 for stage III patients using RFS as endpoint [Fig. 2D]). Hence, based on our threshold, 35 out of 128 patients (27.3%) in stages II and III, or 17.5% of the total cohort, would have a higher risk than suggested by the 7th edition of the TNM staging system, which uses a 6 cm measurement for N staging (i.e. N1/N2 vs N3).
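As a worked illustration, the risk layers reported in the abstract and the proportions quoted above can be expressed in a short sketch. The layer labels ("low", "intermediate", "high") and the integer encoding of overall stage are naming assumptions introduced here; the thresholds, hazard ratios and patient counts are those reported in the text.

```python
VOL_N_THRESHOLD_CC = 18.0  # morphologic nodal volume threshold reported in the study


def rfs_risk_layer(overall_stage, vol_n_cc):
    """Three RFS layers: reported HR = 1, 2.93 and 7.84 respectively."""
    if vol_n_cc >= VOL_N_THRESHOLD_CC:
        return "high"  # VOL_N >= 18cc regardless of stage
    return "low" if overall_stage in (1, 2) else "intermediate"


def os_risk_layer(vol_n_cc):
    """Two OS layers: reported HR = 1 and 4.23 respectively."""
    return "high" if vol_n_cc >= VOL_N_THRESHOLD_CC else "low"


# Patients with VOL_N >= 18cc who were staged II or III by the 7th edition.
upstaged = 11 + 24
pct_of_stage_2_3 = round(100 * upstaged / 128, 1)
pct_of_cohort = round(100 * upstaged / 200, 1)

print(rfs_risk_layer(3, 12.0), os_risk_layer(20.5))
print(upstaged, pct_of_stage_2_3, pct_of_cohort)
```

The 31 stage IV patients with VOL_N ≥ 18cc are not counted as upstaged, which is why the numerator is 35 rather than 66.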

Discussion
This study comprehensively evaluated quantitative parameters from staging MRI and PET/CT in a prospective cohort of non-metastatic NPC for prognostication and risk-stratification. We found the cervical nodal volume to be the predominant risk factor of survival (both OS and RFS). These findings were validated in an independent cohort. On the other hand, all parameters related to the primary tumor were not predictive. Moreover, the nodal volume based on MRI was a stronger prognostic factor compared to metabolic tumour volume (MTV). Thus, in our cohort, we found no advantage of using PET/CT for this indication.
There are only a few studies that have evaluated the volume of cervical nodes in NPC, although nodal volume is well accepted to be predictive in other head & neck cancers 15 . For NPC, Wang et al. found that patients with cervical nodes ≥10cc have a worse 5-year loco-regional control (85.0% vs 96.3%, p < 0.05) 16 . Recently, Luo et al. found that the volume of cervical nodes was independently predictive of survival, albeit marginally, in a cohort of T4Nx patients 17 . In our study, metabolic activity measured by SUVmax was found to be predictive only by univariate analysis and was not independently predictive by multivariate analysis. Several prior reports, including one from our own institution on a smaller cohort, have suggested that SUVmax of cervical nodes (and primary tumor) is prognostic using cut-off values of 6.5-7.5 8,18,19 . In this cohort, we identified a stronger quadratic than linear correlation between SUVmax and volume of cervical nodes, indicating a decreasing strength of correlation with SUVmax in larger nodes. We postulate that the presence of necrosis observed in the large lymph nodes confounded SUV values. Thus, the SUVmax of nodes may not be a reliable surrogate marker when nodal disease is large/advanced.
Based on the current TNM staging system, a dimension ≥6 cm upstages the N-stage to N3 and therefore Stage IVb. However, this measurement may not be applicable for imaging based assessment. Li et al. found only 1 out of 749 patients (0.13%) present with nodes reaching this size 10 . Moreover, matted/clustered nodes are not rare in advanced NPC cases, and whether the "6 cm standard" should be applicable to a single node or matted nodes remains ambiguous, resulting in inter-observer variability. Finally, although the axial length is expected to be a surrogate of volume, the unidimensional nature renders it only suitable for lesions with regular shapes 20 . Thus, volume measurements are highly suggested to be incorporated in the TNM staging system. Our findings suggest three risk layers could be incorporated into TNM staging system such that N1 is defined as nodal volume smaller than 18cc with retropharyngeal nodes (regardless of laterality) and/or unilateral neck nodes, N2 as nodal volume smaller than 18cc with bilateral neck nodes, and N3 as nodal volume bigger than or equal to 18cc regardless of laterality. Using the threshold of morphologic nodal volume ≥18cc, 17.5% of patients would have been stratified to a higher risk than suggested by the current TNM staging system. Hence, the volume of 18cc may replace the "6 cm dimension" standard for N3 disease as stipulated by the current TNM system. A working-platform for automated or computer-assisted volume assessment by MRI would be critical in advancing the use of this important parameter as a quantitative measure, and image segmentation techniques which can be implemented in the clinics are on the horizon. On the other hand, metabolic volume of cervical nodes which was found to be highly correlated can be measured in a semi-automated way in the clinical PET workstation, and thus might be used as a surrogate of volume measurement for non-necrotic nodes.
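The proposed volume-based N classification can be written as a simple decision rule. The sketch below is illustrative only: the function name, the argument encoding and the "N0" fallback for patients without nodal disease are assumptions introduced here, while the 18cc threshold and the N1/N2/N3 definitions follow the proposal in the text.

```python
def proposed_n_category(vol_n_cc, retropharyngeal, neck_laterality):
    """Assign the proposed N category.

    vol_n_cc: total morphologic nodal volume in cc.
    retropharyngeal: True if retropharyngeal nodes are involved (any laterality).
    neck_laterality: "none", "unilateral" or "bilateral" neck node involvement.
    """
    if neck_laterality not in ("none", "unilateral", "bilateral"):
        raise ValueError(f"unknown neck_laterality: {neck_laterality!r}")
    if vol_n_cc >= 18.0:
        return "N3"  # volume >= 18cc regardless of laterality
    if neck_laterality == "bilateral":
        return "N2"  # volume < 18cc with bilateral neck nodes
    if retropharyngeal or neck_laterality == "unilateral":
        return "N1"  # volume < 18cc, retropharyngeal and/or unilateral neck nodes
    return "N0"  # assumption: no nodal disease identified


# Hypothetical patients for illustration only.
print(proposed_n_category(6.2, True, "none"))
print(proposed_n_category(11.0, False, "bilateral"))
print(proposed_n_category(23.5, False, "unilateral"))
```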
A body of literature has found primary tumor parameters, volume and metabolic parameters such as SUV, MTV and total lesion glycolysis (TLG), to be prognostic, which is seemingly contradictory to our findings 6,21,22 .   However, many of these studies did not include the analysis of nodal disease, and hence, their findings were not adjusted for nodal status. Primary tumor volume has been associated with regional and loco-regional failure-free survival based on a threshold of relatively high tumor volume, commonly 30-50cc, suggesting that the effect of primary tumor volume on survival is only evident when tumor volume exceeds this high cut-off 6,16 . For our cohort, it is notable that patients with primary tumor volume >48cc comprised only 5% of our cohort. Indeed, in a study of 1197 NPC patients, the impact of primary tumor bulk on OS was only evident when tumor volume was large 22 . In concordance with these findings, Chua et al. found tumor volume in early stage NPC to be not a significant predictor of survival 23 . It has been postulated that one of the factors would be the marked improvements in local control, resulting in the lessening impact of primary tumor features on survival especially when tumors were relatively small 22,24,25 . On the other hand, in locally advanced bulky disease, local control is more likely to be compromised due to inadequate dose delivery (<66.5 Gy) to part of the tumor bulk because of dose limitations arising from proximity with adjacent critical organs 25 . Hence, the failure of radiation delivery itself is the primary prognostic factor. Moreover, in our cohort, the comprehensive imaging work up, clinical follow-up and aggressive salvage treatment may have led to further improving local control, lessening the survival impact of the primary tumor per se.
We recognize the challenges and controversies in the methodology for segmentation of metabolic tumor volume, and that the selection of a fixed threshold for segmentation is arbitrary. We did not adopt adaptive threshold using the Fuzzy Locally Adaptive Bayesian (FLAB) based threshold, or percentage-based thresholds as there would be then no criteria for selection of reportable lesions. Moreover, percentage based thresholds are highly dependent on SUVmax, causing strong collinearity with volume measurement. Also, using percentage-based thresholds, metabolic volume is likely to be overestimated in low SUVmax lesions while underestimated in high SUVmax lesions, resulting in a weak correlation between metabolic volume and morphologic volume 11,26,27 . Hence, a fixed threshold was applied in our cohort. We did not incorporate Total Lesion Glycolysis (TLG) into survival analysis due to collinearity issues.
Our study is limited by the lack of a gold standard for determination of metastatic nodes by histological confirmation which is clinically not feasible. Instead, we have utilized standardized and well-accepted parameters by both MRI and PET/CT for the cohort, acknowledging the inherent limitations in false positive and false negative rates. Also, the validation cohort did not have PET/CT in addition to MRI scans. However, our aim was to verify the significant positive finding that nodal volume measured by MRI was an independent significant predictor for survival and that the 18cc volume threshold was robust in risk stratification.
In conclusion, morphologic nodal volume using MRI is an important factor in prognostication and risk stratification in NPC, and is suggested to be incorporated in the staging system. On the other hand, quantitative PET parameters have no role for this purpose in our cohort. However, in clinical practice, PET/CT has been found useful in detection of suspicious nodal involvement in small volume nodes for radiation treatment planning and for the exclusion of distant metastases in advanced disease.