Validating PET segmentation of thoracic lesions—is 4D PET necessary?

Respiration-induced motion tends to degrade the positron emission tomography (PET) signal, with a consequent loss of image information and unreliable segmentations. This phantom study assesses how widely used semi-automatic PET segmentation methods deviate from stationary PET segmentations for heterogeneous target lesions influenced by motion during image acquisition. Three target lesions each contained two F-18 fluoro-deoxy-glucose (FDG) tracer concentrations, one high and one low relative to the background. Four different tracer concentration arrangements were segmented using three SUV threshold methods (Max40%, SUV40% and 2.5SUV) and a gradient-based method (GradientSeg). Segmentations in static 3D-PET scans (PETsta) specified the reference conditions for the individual segmentation methods, target lesions and tracer concentrations. The motion-included PET images were acquired with a 4D-PET (PET4D) and a 3D-PET (PETmot) scan protocol. Moreover, motion-corrected PET images (PETdeb) were derived from the PETmot images. Segmentations in PET4D, PETmot and PETdeb were compared to the PETsta segmentations according to volume changes (ΔVol) and an error estimate (lowUptakeerror) for the lesion part covering the low tracer concentration. In PET4D images, all segmentation methods provided lowUptakeerror estimates equivalent to PETsta segmentations and, except for the Max40% segmentations, a slight volume expansion. In the PETmot images, the GradientSeg method resulted in an average volume increase of 0.43 and a lowUptakeerror overestimation of 0.33. The most accurate segmentations in PETmot, relative to PETsta, were accomplished by the 2.5SUV and SUV40% methods. In the PETdeb images, the GradientSeg method alone provided segmentations equivalent to segmentations in PETsta images.
The use of FDG with various tracer concentrations revealed, with PETsta images as the reference, that the most consistent segmentations for motion-corrected PET images (PET4D or PETdeb) were achieved using the GradientSeg method. In the absence of PET4D or PETdeb images, the 2.5SUV and SUV40% methods are most consistent with PETsta segmentations.


Introduction
In general, lung cancer is classified into two main categories: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). NSCLC accounts for the majority (85%) and SCLC for the minority (15%) of all lung cancers [1]. For both SCLC and NSCLC, radiotherapy is one treatment strategy, depending on the extent of the cancer as indicated by lymph node involvement and potential metastases according to the TNM definition [2,3]. Hybrid positron emission tomography (PET) and computed tomography (CT) scanners, and the recently introduced PET and magnetic resonance imaging scanners, benefit from minimal misregistration between image acquisitions [4,5]. The combined use of F-18 fluoro-deoxy-glucose (FDG) PET and CT provides high accuracy for the clinical staging of lung cancers [6-8] and lower inter-observer variation for target definition [9]. A high FDG uptake was associated with the preferential site of local relapse of NSCLC in a recent study by Calais et al [10]. Respiratory motion may result in motion artefacts, thus requiring motion-encompassing images in PET/CT scans (4D-PET/CT). With an appropriate number of respiratory phase bins, the 4D-PET/CT scan provides nearly motion-free images and may thus further improve cancer staging and target volume definition in radiotherapy [11]. The use of semi-automatic delineation (segmentation) methods may reduce the inter-observer variation in the clinical target volume (CTV) outline for radiotherapy [12,13]. Some PET segmentation methods for outlining the CTV are intensity-based, such as the standardised uptake value (SUV) or the fixed/adaptive threshold activity value. Other, more sophisticated methods may be established using activity statistics, resulting in segmented clusters, or using spatial gradients in activity for segmenting boundaries [14,15].
Currently, no consensus exists on methods for FDG-PET segmentation in lung cancer, but a threshold of 2.5 SUV or a percentage of the maximum SUV (SUVmax), such as 40%-50%, is widely used [7,8,16]. Using a fixed SUV threshold for target determination still results in variations in target definition due to the partial volume effect, activity recovery and FDG uptake heterogeneities [17,18]. For PET/CT imaging of lung cancer, respiratory motion may additionally result in image intensity changes or eroded intensity gradients. Consequently, respiratory motion could reduce the accuracy of established segmentation methods and therefore introduce uncertainty in the CTV definition.
The main purpose of this work was to construct a phantom enclosing various lesions with heterogeneous PET tracer activities and to assess the effect of motion on commonly used semi-automatic PET segmentation methods relative to the equivalent segmentations in stationary PET images. A secondary purpose was to study the effect of commonly used segmentation methods on post-processed motion-corrected 3D-PET images, created by deconvolving motion-included 3D-PET images with the target position density derived from a 4D-CT scan.

Equipment
The target volumes consisted of three in-house manufactured interlaced double volumes, labelled A, B and C, submerged in a water-filled phantom of size 17 cm×17 cm×18 cm (figure 1). Each of the three target volumes involved high and low PET tracer uptake sub-volumes (highUptake=A1, B1 and C1 and lowUptake=A2, B2 and C2), allowing for different PET tracer uptake ratios relative to a uniform background activity. The sizes of target volumes A and B correspond to a simulation of a lung tumour up to stage T3, whereas target C corresponds to a simulation of a lymph node [3]. The water-filled phantom with target volumes was connected to a QUASAR™ Respiratory Motion Phantom (ModusQA Medical Devices), as illustrated in figure 2. In static mode, no motion was applied to the phantom, whereas in motion mode, a motion with a 15 mm peak-to-peak amplitude (nearly sine profile) at 15 cycles per minute, oriented in the scanner longitudinal z-direction, was applied to the phantom.
A total FDG PET tracer activity of 80-120 MBq (Injected activity) was prepared for the phantom with four different FDG concentrations (ratios) relative to the background activity of SUV=1. The SUV definition for this study was corrected for FDG activity decay at acquisition time (t), with Phantom weight defined as the background water and target volume materials. The target volume uptake ratios (uptakeRatio) were organised as 4:4:1, 3:2:1, 4:2:1 and 8:3:1, where, for instance, a ratio of 4:2:1 represented a highUptake SUV=4, a lowUptake SUV=2 and a background SUV=1. The ratio of 4:4:1 indicated equal uptake (SUV=4) for both high- and lowUptake sub-volumes. The highUptake sub-volumes (A1, B1 and C1) were included in every PET segmentation, whereas the lowUptake sub-volumes (A2, B2 and C2) were estimated to be segmented in approximately half of the uptake ratios, as labelled in table 1.

Figure 1. The target high sub-volumes are denoted by one (A1, B1 and C1) and the target low sub-volumes by two (A2, B2 and C2). Targets A and B consisted of interlaced spheres, whereas target C involved two cones aligned side by side in opposing directions.

Imaging protocols
For each instance of FDG uptake ratio, the individual target volumes were subjected to scan protocols in both static and motion modes using an integrated PET/CT scanner (Discovery STE, GE Healthcare).

Figure 3. Relationship between the PET segmented volume SEG4D,sta,mot,deb and the lowUptake reference volume CTlowUptake used to define the lowUptakeerror quantity. Sub-figure (A) illustrates a situation where the lowUptake sub-volume was partially segmented; sub-figure (B) illustrates an optimal lowUptake segmentation for a homogeneous tracer uptakeRatio=4:4:1.

Static/motion 3D-PET/3D-CT
The 3D-PET scans were acquired using a 2 min acquisition time for a single bed position. The 3D-PET scans were labelled PETsta in static mode and PETmot with phantom motion enabled. The PET scans were acquired in 3D mode and reconstructed to a matrix size of 256×256×47 pixels (voxel size 2.73×2.73×3.27 mm³) with a 3D-OSEM algorithm (2 iterations, 26 subsets) and a post-reconstruction Gaussian filter of 3 mm full width at half maximum. The corresponding CT scans were acquired in helical mode at 120 kVp, 80 mAs. The CT scan length extended marginally beyond the phantom length of 17 cm plus the respiratory motion extent. The 3D-CT scans were labelled CTsta in static mode and CTmot with phantom motion enabled. The CT scans were reconstructed with a slice thickness of 1.25 mm and a matrix size of 512×512 pixels (voxel size 0.977×0.977×1.25 mm³). The CTsta/CTmot scans were used for attenuation correction of the corresponding PETsta/PETmot images.

4D-PET/4D-CT
The 4D-PET (PET4D) scans were acquired using a total PET acquisition time of 12 min for 6 PET bins (2 min per phase bin). The number of phase bins was chosen to obtain nearly motion-free PET images, following Bettinardi et al, who proposed the number of bins needed for motion-free 4D-PET images according to motion amplitude and target size [11]. The 4D-CT (CT4D) scans were acquired in cine axial mode at 120 kV, 80 mAs with a CT gantry rotation time of 0.5 s, a cine time between images of 0.5 s and a total cine duration of 5 s. The 4D-CT was retrospectively re-sorted into 6 phase bins (maximum phase error 5%) using respiratory motion data obtained by the real-time position management system (RPM, Varian Medical Systems). As for the 3D-CT, the CT images were reconstructed with a 1.25 mm slice thickness (512×512 pixels). The matching CT phase bin from CT4D was used for attenuation correction of the individual PET bins in PET4D.
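The phase binning described above can be sketched as follows. This is a minimal illustration that assumes a perfectly regular 4 s cycle (15 cycles per minute), phase 0% at one extreme of the motion, and a cosine position profile with a 15 mm peak-to-peak amplitude. The function names are hypothetical and this is not the sorting algorithm of the scanner or of the RPM system.

```python
import math


def phase_bin(time_s, period_s=4.0, n_bins=6):
    """Phase-bin index (0 .. n_bins-1) for a timestamp, assuming a regular cycle."""
    phase = (time_s % period_s) / period_s  # fraction of the cycle, 0 <= phase < 1
    return min(int(phase * n_bins), n_bins - 1)


def residual_motion_mm(bin_index, n_bins=6, peak_to_peak_mm=15.0, samples=1000):
    """Range of positions visited inside one phase bin.

    Assumes z(phase) = (A / 2) * cos(2 * pi * phase), with A the peak-to-peak amplitude.
    """
    start = bin_index / n_bins
    width = 1.0 / n_bins
    positions = [
        0.5 * peak_to_peak_mm * math.cos(2.0 * math.pi * (start + width * i / samples))
        for i in range(samples + 1)
    ]
    return max(positions) - min(positions)


# Residual motion per bin for the six-bin protocol, under the stated assumptions.
residuals = [residual_motion_mm(b) for b in range(6)]
```

Under these assumptions the bins at the motion extremes retain about 3.75 mm of residual motion, and the bins connecting the extremes about 7.5 mm. The numbers depend on how the bins are aligned with the respiratory trace, which is an assumption of this sketch.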

Motion-correction
In addition, a motion-correction post-processing procedure (motion deblurring) applied to the PETmot scans resulted in PET images labelled PETdeb. The following section provides a more detailed description of the motion-correction procedure.
The motion-corrected PETdeb images were iteratively reconstructed from the PETmot images, comparable to the method described in [19]. The PETdeb volumes were derived by deconvolving PETmot with a target position probability kernel (Kmotion) estimated from the 4D-CT. Kmotion was a normalised position probability kernel estimated from the CT target centre-of-mass position within the 4D-CT. Because the phantom motion trajectory was aligned with the superior-inferior direction (z-axis), the main contribution to the position probability density was along this direction. The 4D-CT acquired with the same FDG uptake ratio was used for the Kmotion determination, using a rigid target propagation from CT phase bin 0% through the remaining five phases. A forward Van Cittert iterative deconvolution was performed with the initial condition $\mathrm{PET}_{deb}^{k=0} = \mathrm{PET}_{mot}$.
For simplification, the deconvolution was applied in 2D space slice by slice; thus the PETmot was resliced within the primary 2D motion plane (x, z). After each iteration a post-processing image correction was applied, limiting $\mathrm{PET}_{deb}^{k+1}$ to the original PETmot minimum and maximum values within a 4 mm radius, thereby reducing the Gibbs phenomenon. A visualisation of the PETsta, PETmot and PETdeb is provided for the uptakeRatio 8:3:1 in figure 4.

Table 1. SUV threshold values among the tracer uptakeRatios and segmentation methods. Grey cells indicate the predicted segmentation of the highUptake volume only, whereas white cells indicate predicted segmentation of both the high- and lowUptake volumes. The non-threshold method GradientSeg was marked not available. The GradientSeg low/highUptake separation was predicted assuming accurate PET static segmentation. For the uptakeRatios of 4:2:1 and 8:3:1, combined with GradientSeg in PET static images, only the highUptake volume was segmented.
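The motion-deblurring procedure described in the motion-correction section can be illustrated with a one-dimensional sketch along the motion axis. It assumes the classic Van Cittert update with unit relaxation, estimate(k+1) = estimate(k) + (PETmot − Kmotion ⊗ estimate(k)), a centred symmetric kernel, edge clamping, and a clipping window given in voxels rather than millimetres. All function names, the iteration count and the example profile are hypothetical; the study applied the deconvolution slice by slice in 2D, which is not reproduced here.

```python
def motion_kernel(offsets_vox):
    """Normalised position probability kernel from per-phase centre-of-mass offsets (voxels)."""
    half = max(abs(o) for o in offsets_vox)
    kernel = [0.0] * (2 * half + 1)
    for o in offsets_vox:
        kernel[o + half] += 1.0 / len(offsets_vox)
    return kernel


def blur(profile, kernel):
    """Convolve a 1D profile with the kernel, clamping indices at the edges."""
    half = len(kernel) // 2
    n = len(profile)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            src = min(max(i - (j - half), 0), n - 1)
            acc += weight * profile[src]
        out.append(acc)
    return out


def van_cittert(pet_mot, kernel, iterations=10, clip_radius=1):
    """Van Cittert deblurring with local min/max clipping against PETmot."""
    estimate = list(pet_mot)  # initial condition: estimate at k = 0 equals PETmot
    n = len(pet_mot)
    for _ in range(iterations):
        reblurred = blur(estimate, kernel)
        estimate = [e + (g - r) for e, g, r in zip(estimate, pet_mot, reblurred)]
        # Limit each value to the PETmot range within the clipping window,
        # which suppresses ringing (Gibbs phenomenon) near sharp edges.
        for i in range(n):
            window = pet_mot[max(0, i - clip_radius): i + clip_radius + 1]
            estimate[i] = min(max(estimate[i], min(window)), max(window))
    return estimate


# Illustrative profile: background SUV 1 with a 5-voxel lesion at SUV 4.
truth = [1.0] * 5 + [4.0] * 5 + [1.0] * 5
kernel = motion_kernel([-1, 0, 0, 1])  # gives [0.25, 0.5, 0.25]
pet_mot = blur(truth, kernel)
pet_deb = van_cittert(pet_mot, kernel)
```

In this toy case the deblurred profile lies closer to the unblurred profile than the motion-blurred one does. The unit-relaxation update is only guaranteed to behave well for kernels such as the symmetric one used here; an off-centre or strongly asymmetric kernel would need a different treatment.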

Image registration
The PET and corresponding CT scans were co-registered using the common frame of reference (DICOM origin) from the PET/CT scan. For each uptakeRatio, a sequence of PET registrations was organised between PETsta, PET4D and PETmot/PETdeb as follows: a registration from PETsta to the PET4D mid-ventilation bin (bin3), and from PET4D (bin3) to PETmot/PETdeb, using the complete PET intensity range for registration with a roughly 2 cm isotropic margin covering the lesions as a volume of interest.

Segmentation
Four different PET segmentation methods were applied to every PET image, PET4D (6 bins), PETsta, PETmot and PETdeb, for each of the three experimental phantom volumes A-C.
I. Max40%: a fixed threshold at 40% of the maximum SUV value, defined as the mean SUV over 1 cm² around the peak intensity pixel. Voxels above the threshold value were segmented as the target volume.
II. SUV40%: a threshold of 40% of the maximum SUV as described above, corrected for the background level (SUVbg).
IV. 2.5SUV: a fixed threshold of 2.5 SUV. The segmentation included voxels with an uptake value above 2.5 SUV.
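The three threshold methods listed above can be sketched as follows. The Max40% and 2.5SUV thresholds follow directly from their descriptions. For SUV40% the sketch assumes the common background-corrected form SUVbg + 0.40 × (SUVmax − SUVbg), which may differ from the expression used in the study. Function names are hypothetical, and the nominal SUV values stand in for measured ones.

```python
def threshold_max40(suv_max):
    """Max40%: 40% of the maximum SUV."""
    return 0.40 * suv_max


def threshold_suv40(suv_max, suv_bg=1.0):
    """SUV40%: 40% of the maximum SUV above background (assumed form)."""
    return suv_bg + 0.40 * (suv_max - suv_bg)


THRESHOLD_2_5SUV = 2.5  # fixed threshold, independent of the maximum SUV


def segment(suv_values, threshold):
    """Mark voxels whose SUV lies above the threshold."""
    return [value > threshold for value in suv_values]


# Nominal uptakeRatio 4:2:1 as three representative voxels:
# background (SUV 1), lowUptake (SUV 2) and highUptake (SUV 4).
voxels = [1.0, 2.0, 4.0]
seg_max40 = segment(voxels, threshold_max40(4.0))   # threshold 1.6
seg_suv40 = segment(voxels, threshold_suv40(4.0))   # threshold 2.2
seg_2_5 = segment(voxels, THRESHOLD_2_5SUV)         # threshold 2.5
```

With these nominal values, only Max40% includes the lowUptake voxel. The sketch also shows why a degraded maximum SUV in motion-blurred images lowers the Max40% and SUV40% thresholds while leaving the 2.5SUV threshold unchanged.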
In the static CT (CTsta) images, the highUptake (CThighUptake) and lowUptake (CTlowUptake) sub-volumes were segmented individually using geometric 3D structures aligned in the image according to the lesion walls. For target volumes A and B, a 3D sphere of 39 mm (internal sphere diameter) was used for segmentation, followed by a Boolean operation to separate the CThighUptake and CTlowUptake sub-volumes. For the cone-shaped target C, the first and last CT slices containing the target were segmented as 2D planar circles (diameters of 7 and 27 mm) connected by linear interpolation. Through the image registrations, the CTlowUptake segmentations were transferred to PET4D (bin3), PETmot and PETdeb. A rigid alignment of CTlowUptake throughout the PET4D was completed manually for the remaining bins.

Data analysis
The PET segmented volumes in the different scan protocols (SEG4D,mot,deb) were compared to the static segmentations (SEGsta) according to the volume change (ΔVol) and an error estimate of the lowUptake sub-volume segmentation (lowUptakeerror). The segmented volume change was defined as the difference in PET segmented volume relative to the static segmentation for an equivalent segmentation method, target and tracer uptakeRatio.
The lowUptakeerror was defined by the volume similarity in terms of the Dice similarity coefficient (DSC) between the PET segmentations and the CTlowUptake volumes labelled A2, B2 and C2 in figure 1.
I. DSClowUptake: the spatial volume similarity between the PET segmented volume and the CTlowUptake for the specific segmentation method, target and uptakeRatio. An illustration of the volume similarity between the PET segmentation and the lowUptake sub-volume is provided in figure 3. As the PET segmented volumes extend outside the lowUptake sub-volume, the DSClowUptake never reaches the maximum value of one, so that 0 ≤ DSClowUptake < 1.
II. lowUptakeerror: a metric quantifying the difference between the DSClowUptake in equation (5) and the DSClowUptake in the static PETsta image, relative to the DSClowUptake measured with equal tracer uptake in both high- and lowUptake sub-volumes, for the matching segmentation method and target.
ΔVol and lowUptakeerror were compared to the corresponding segmentations in the static PETsta images for identical target volumes and uptakeRatios, using a one-sample t-test with a 5% significance level.
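The metrics defined in this section can be sketched as follows. The Dice coefficient is the standard 2|A∩B|/(|A|+|B|). The exact normalisations of ΔVol and lowUptakeerror are assumptions read from the verbal definitions: ΔVol is taken as the volume difference divided by the static volume, and lowUptakeerror as the DSC difference divided by the equal-uptake DSC. Function names and example values are hypothetical.

```python
def delta_vol(vol_seg, vol_seg_sta):
    """Volume change relative to the static segmentation (assumed normalisation)."""
    return (vol_seg - vol_seg_sta) / vol_seg_sta


def dice(seg_voxels, ref_voxels):
    """Dice similarity coefficient between two sets of voxel indices."""
    seg, ref = set(seg_voxels), set(ref_voxels)
    return 2.0 * len(seg & ref) / (len(seg) + len(ref))


def low_uptake_error(dsc, dsc_sta, dsc_equal_uptake):
    """DSC difference to the static image, relative to the equal-uptake (4:4:1) DSC."""
    return (dsc - dsc_sta) / dsc_equal_uptake


# Illustrative voxel sets: the PET segmentation covers the lowUptake
# reference {3, 4} fully but also extends into voxels {1, 2}.
pet_segmentation = {1, 2, 3, 4}
ct_low_uptake = {3, 4}
dsc_example = dice(pet_segmentation, ct_low_uptake)
```

Because the PET segmentation always contains the highUptake sub-volume in addition to any lowUptake voxels, the DSC against the lowUptake reference stays below one, as in the example. Note that `dice` raises a division error if both sets are empty, which a production implementation would need to handle.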

Results
The deviations (mean±1 sd) of the CThighUptake and CTlowUptake volumes from the actual cavity volumes were 2%±4% for targets A and B and −5%±5% for target C. The measured background SUV (mean±1 sd) was 1.00±0.06, including all FDG-PET tracer uptakeRatios. The ΔVol and lowUptakeerror metrics are summarised in tables 2 and 3, respectively. Apart from the Max40% segmentation method, the PET4D scans produced enlarged segmented volumes compared to the PETsta scans, with an increased ΔVol of 2% for SUV40% (p=0.001), 4% for GradientSeg (p=0.03) and 4% for 2.5SUV (p<0.001), including all six PET bins. The lowUptakeerror of the PET4D was not significantly different from the lowUptakeerror of the PETsta for any segmentation method. For the PETmot scan protocol, the ΔVol increased significantly by 15% for the Max40% method (p<0.001) and by 43% for the GradientSeg method (p<0.001) compared to the PETsta images. No significant changes in the ΔVol were observed for the SUV40% and 2.5SUV methods in the PETmot scan protocol. A significant increase in the lowUptakeerror quantity of 0.33 was detected for the PETmot scan protocol using the GradientSeg method (p<0.001). None of the three methods Max40%, SUV40% or 2.5SUV provided a lowUptakeerror in the PETmot protocol significantly different from the lowUptakeerror in the PETsta protocol. In the motion-corrected PETdeb scan protocol, the Max40% (p<0.001) and SUV40% (p=0.03) methods both presented increased segmented volumes, of 10% and 6% respectively, compared to the PETsta scan protocol. The 2.5SUV method presented a lower segmented volume of −4% in PETdeb, a marginally significant difference compared to the PETsta scan protocol (p=0.048).
In the PETdeb scan, the GradientSeg segmentations were not significantly different from the PETsta scan protocol according to both the ΔVol and the lowUptakeerror quantity. The Max40% was the only segmentation method that presented a lowUptakeerror metric significantly different in the PETdeb scan protocol compared to the PETsta (p=0.04).

Table 2. Mean with 95% CI segmented volume differences of the PET4D (all 6 bins), PETmot and PETdeb protocols separated into segmentation methods, including all tracer uptakeRatios and target volumes.

Discussions
This study aimed to establish a method for evaluating the accuracy of PET segmentation methods relative to static PET segmentations, for target lesions subject to both motion and tracer uptake heterogeneity. The study design ensured that the highUptake volume was included in every instance of segmentation. When motion was applied, the activity in the highUptake volume (as well as the lowUptake sub-volume) was partially displaced in the direction of the lowUptake location, simulating an overflow effect into the lowUptake location. For the cone-shaped volume (target C), the overflow effect would be from the wide end of the highUptake cone into the neighbouring lowUptake cone. The use of a phantom with plastic inserts with zero tracer uptake, and motion restricted to a regular pattern in one dimension (superior-inferior), are limitations of the study design that may not be directly equivalent to clinical conditions. Additionally, using the static segmentation as a reference condition for each segmentation method and uptakeRatio revealed an inconsistency in the GradientSeg method for targets containing an SUV of 3. With an uptakeRatio of 3:2:1 the highUptake volume (SUV=3) was segmented using the GradientSeg method, whereas with an uptakeRatio of 8:3:1 the lowUptake sub-volume (SUV=3) was absent from the segmentation. The absence of the lowUptake SUV=3 sub-volume in this instance could be explained by the highUptake SUV of 8, which adds high intensity gradients and thus leads to underestimation of the nearby lowUptake sub-volume. The acquisition time was kept equal between the 3D-PET and the individual 4D-PET bins. This leads to a 4D-PET scan time that is not suitable for all patients intended for lung cancer radiotherapy. In contrast to clinical conditions, the injected activity for this study, with a background of roughly 10 kBq ml⁻¹, led to improved count statistics and hence reduced noise.
The influence of a reduction in injected activity, and hence reduced count statistics, in combination with motion and inhomogeneous target activity has not been addressed in this study. The applied tracer concentrations and the uniform background activity are likely to improve segmentation on images affected by motion. Thus, further investigation of both reduced activity and increased target activity variations is needed to relate the phantom study to clinical conditions. The effect of a cold wall in combination with threshold PET segmentation has previously been addressed in [21] as potentially leading to overestimation of wall-less targets. The effect of irregular respiration patterns on different segmentation methods has been addressed in [22]. Although Carles et al reported no substantial segmentation differences, irregular respiration during PET imaging might eventually lead to degraded PET images in the individual 4D-PET bins due to reduced count statistics. Excluding the Max40% method, the segmented volumes in PET4D were slightly increased compared to the PETsta protocol. The residual motion in the PET bins connecting the inhale and exhale respiratory phases could explain the increased volumes in PET4D. However, the narrow confidence interval of the ΔVol in PET4D points to a scan protocol that was consistent with segmentations in PETsta. The lowUptakeerror quantities in PET4D supported segmentations consistent with the PETsta scan protocol, as no significant deviation was detected. This finding was consistent with the result obtained by Bettinardi et al [11]; thus, for the investigated segmentation methods, a six-bin 4D-PET is considered sufficient for motion-free image acquisition with a motion amplitude of 15 mm.
For the motion-included PETmot scan protocol the segmented volumes were enlarged, except for the 2.5SUV segmentation method. In both PETmot and PETdeb the 2.5SUV method provided segmentations that were equivalent to the PETsta acquisition. In contrast to the other two threshold-based segmentation methods (Max40% and SUV40%), the 2.5SUV method did not involve the maximum intensity, which degrades in motion-disturbed image acquisitions [18]. Eroded SUV values do not necessarily correspond to different segmented volumes, as long as the SUV remains above the threshold value. Changed constellations of the FDG activities in the high- and lowUptake volumes are likely to affect the results in PETmot images, particularly for threshold values close to the target SUV. The Max40% method relies exclusively on the maximum SUV, whereas the SUV40% method also incorporates a background SUV correction. Due to the degraded SUV of PETmot images, a lower absolute threshold SUV value was used for Max40% segmentation compared to the SUV40%, thus segmenting slightly enlarged volumes relative to the SUV40%.
Specifically in the PETmot protocol, the GradientSeg method presented an enlarged segmented volume and an overestimation of the lowUptake sub-volume. Since the GradientSeg method makes use of intensity gradients as segmentation boundaries, the enlarged volume, including the lowUptake sub-volume, was expected as a consequence of motion blur. This finding was supported by the PETdeb scan protocol, in which the segmentations were equivalent to the PETsta scan protocol because the motion correction re-established the image intensity gradients.
Motion correction through PET image deblurring with a position density estimate relies on accurate 4D-CT target detection. Due to the centre-of-mass uncertainties within 4D-CT phases, the position density estimate may potentially result in inaccurate processing of the deblurred PET image. An increased number of bins for 4D-CT may improve the temporal centre-of-mass detection, and high-contrast fiducials could clarify detection within each phase and limit uncertainties in the Kmotion determination. Detecting fiducial centres of mass within a 4D-CT scan is still limited by other factors such as the velocity during CT phase bin acquisition, orientation and fiducial style [23]. Other methods have been suggested to identify and reduce motion blur in PET images without 4D-PET, using either a deformation vector field for correction or deblurring methods [24,25]. However, segmentation in these motion-corrected images was performed using a fixed threshold of the maximum intensity, closely related to the Max40% method. Furthermore, local deblurring must be performed near distinct lesions due to the various respiratory motion trajectories inside the lungs [26].
In radiotherapy, including tissue with low PET activity in the planning target volume (PTV), due to the combination of non-corrected images (PETmot) and a segmentation method that is highly accurate in stationary images (GradientSeg), could potentially lead to an enlarged PTV and consequently increased normal tissue complications [27]. On the other hand, PET images which encompass all possible target positions throughout a full respiration cycle could be used for segmentation for internal target volume (ITV) definition. In contrast to the motion blur of PETmot in the current study, which encompassed all target locations, motion incorporation using super-resolution-corrected PET images showed improved accuracy relative to the 'true' ITV [28].
The main restrictions of the current study are the limited number of different tracer heterogeneity values and target geometries. In particular, heterogeneous PET lesions have been reported to be challenging to segment with threshold-based methods, suggesting that more sophisticated methods should be used instead [29]. The current study exposed the similarity of the segmentation methods relative to a motionless PET acquisition. With segmentation evaluation relative to static PET segmentations, a reliable ground-truth tumour volume description in static PET images is fundamental. A study by Wanet et al revealed high accuracy for the GradientSeg method in 4D-PET compared to the surgical specimens of lung cancers [20]. A closely related study by Yu et al reported an optimal SUV threshold close to 3.0, with a threshold of 2.5 SUV not being significantly different from the surgical specimens [8]. These related studies, which assessed the accuracy of PET segmentations against pathological findings, indicate a general recommendation for the use of the GradientSeg or 2.5SUV segmentation methods. The findings in the present study, identifying the discrepancies in PET segmentation with and without motion correction combined with heterogeneous tracer uptake, indicate that caution should be used with maximum intensities for threshold segmentation. In addition, severe overestimation potentially exists for the GradientSeg method in non-corrected motion-influenced PET images. For motion-affected target lesions in the absence of 4D-PET or 3D-PET motion correction, the 2.5SUV segmentation method is recommended.

Conclusions
This experimental study of target lesions with heterogeneous tracer (FDG) uptake and respiratory motion demonstrated that motion-corrected PET imaging (PET4D or PETdeb) in combination with the GradientSeg method provided the most consistent and accurate PET segmentation amongst the tested segmentation methods. The GradientSeg method should be excluded in the absence of motion correction.