Investigation of Robustness of Radiomics Features Generated with Grey Level Co-occurrence Matrix (GLCM) for Positron Emission Tomography (PET) Image Analysis

Background: Quantification of heterogeneous radiotracer uptake in PET has the potential to be used as a biomarker of prognosis. Textural features accounting for both spatial and intensity information have recently been applied to FDG-PET images and used to predict treatment response. However, textural features have been predicted to depend strongly on volume. Other factors affecting textural features, such as segmentation and quantization, have previously been investigated on clinical data, while image contrast and noise have not been assessed systematically. This study aims to investigate the relationships between textural features and these factors using phantom data. Methods: A phantom with sphere inserts of different volumes was filled with 18F at different sphere-to-background contrasts and scanned for different acquisition durations. All spheres were delineated using three approaches: 1) the exact boundaries based on their known diameters, 2) a 40% fixed threshold and 3) an adaptive threshold. Textural features were derived from the co-occurrence matrix using different quantization levels. Results: Textural features depend on quantization and volume. When using the exact delineation, contrast and scan duration (noise) have a lesser effect on textural features than sphere volume. When applying the same exact regions on the uniform background (no partial volume), the relationships between textural features and volume are comparable to those obtained on the respective spheres, except for correlation. Textural features are indirectly related to noise and contrast via segmentation, with the adaptive threshold being superior to the fixed threshold. Conclusion: Among the six textural features, homogeneity and dissimilarity are the most suitable for measuring PET tumour heterogeneity with quantization 64 if regions are segmented using methods that are robust to noise and contrast variations. To use these textural features as prognostic biomarkers, changes in textural features between baseline and treatment scans should always be reported along with the changes in volumes.


Introduction
PET radiotracer uptake in tumours is often heterogeneous due to different biological characteristics of tumour cells (e.g., cell proliferation, cell death, differential metabolic activity, vascular structure etc.). Heterogeneity defines the aggressiveness and therapeutic resistance of the tumour and makes effective treatment strategies challenging (1).
For this reason, accurate quantification of intra-tumour heterogeneity has the potential to be used as a person-specific tumour staging and prognostic biomarker (2)(3)(4)(5). A large number of quantitative features representing heterogeneity can be extracted from these images, an approach termed radiomics (6). Artificial intelligence (AI) assisted classification of these tumour radiomic features can make tumour staging and prognostic biomarkers more robust (7,8).
Among a number of heterogeneity features (2)(9)(10)(11)(12), textural features (homogeneity, correlation, energy, contrast, dissimilarity and entropy), second order heterogeneity metrics extracted from quantifier based grey level co-occurrence matrices (GLCMs) (13) that account for both spatial and intensity information, have been shown to be capable of staging tumours (14) as well as predicting response (15,16) for FDG PET images at varying levels.
GLCMs are generated using quantized or resampled intensities within a volume of interest (VOI) (16), where intensities are resampled into an integer number of bins, with the number of bins being a power of 2. Textural features extracted from these GLCMs have been reported to be strongly dependent on the metabolically active tumour volume (MATV) using simulated data (17), and this has been confirmed on clinical data (18)(19)(20)(21)(22)(23). Intensity quantization substantially affects the texture indices and thus should be chosen carefully (18,24). Reducing quantization always decreases homogeneity (25), and the prognostic impact of the textural features is influenced by the quantization level (26). Several groups have suggested using either quantization level 32 (18) or 64 (16,21). Quantization level 150 or higher has also been proposed in other studies (17,19). Another study reported no statistically significant differences between quantization levels (16). Three textural features (homogeneity, dissimilarity and entropy) were found to be robust to delineation method and partial volume effects (PVE) (21). A separate study suggested that smoothing and segmentation have only a small effect compared to quantization (24). Relationships of textural features with tumour heterogeneity using different parameters and methods reported in the literature are summarized in supplementary Table S1.
Non-uniform selection of parameters and methods across studies makes the choice of the best textural feature based on MATV, quantization and segmentation challenging, and its relationship with the tumour biological characteristics indistinguishable (18,20).
The relationship between volume and quantization has not been explicitly investigated in these studies. Likewise, the effects of other parameters such as image contrast and noise on segmentation and textural features have not been reported systematically. This study aims to investigate the relationships of textural features with these parameters using phantom data.
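As an illustration of the quantization and GLCM steps described above, the following is a minimal sketch and not the implementation used in this study. It assumes a 2D image, a single voxel offset, equal-width intensity bins and commonly used Haralick-style feature definitions (for example, homogeneity as the sum of p(i,j)/(1+|i-j|)); the function names `quantize`, `glcm` and `texture_features` are hypothetical. Definitions of these features vary between software packages, so values from this sketch are not expected to match the reported results.

```python
import numpy as np

def quantize(image, levels):
    """Resample intensities into `levels` equal-width integer bins (0..levels-1)."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:  # uniform region: every voxel falls into bin 0
        return np.zeros(image.shape, dtype=int)
    q = np.floor(levels * (image - lo) / (hi - lo)).astype(int)
    return np.clip(q, 0, levels - 1)  # the maximum intensity maps to the top bin

def glcm(q, levels, offset=(0, 1)):
    """Symmetric, normalised co-occurrence matrix for one 2D offset (assumed)."""
    dr, dc = offset
    rows, cols = q.shape
    # a[r, c] is paired with b[r, c] = q[r + dr, c + dc]
    a = q[max(0, -dr):rows - max(0, dr), max(0, -dc):cols - max(0, dc)]
    b = q[max(0, dr):rows - max(0, -dr), max(0, dc):cols - max(0, -dc)]
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)  # count co-occurring pairs
    m = m + m.T                                # make the matrix symmetric
    return m / m.sum()

def texture_features(p):
    """Six second order features from a normalised GLCM `p` (assumed definitions)."""
    n = p.shape[0]
    i, j = np.indices((n, n))
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    nz = p[p > 0]  # skip empty cells so that log2 is defined
    if sd_i * sd_j > 0:
        correlation = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j)
    else:
        correlation = float("nan")  # undefined for a single grey level
    return {
        "homogeneity": (p / (1.0 + np.abs(i - j))).sum(),
        "contrast": (((i - j) ** 2) * p).sum(),
        "dissimilarity": (np.abs(i - j) * p).sum(),
        "energy": (p ** 2).sum(),
        "entropy": -(nz * np.log2(nz)).sum(),
        "correlation": correlation,
    }

# Usage: the same noisy region analysed at two quantization levels.
rng = np.random.default_rng(0)
region = rng.random((32, 32))
for levels in (8, 64):
    feats = texture_features(glcm(quantize(region, levels), levels))
    print(levels, round(feats["homogeneity"], 3))
```

The usage lines compare one region at two quantization levels, which is the kind of comparison needed to see how strongly a feature depends on the number of bins.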

Materials and Methods
The torso NEMA phantom (Figure 1) containing six spheres (0.52, 1.15, 2.57, 5.58, 11.49 and 26.52 ml, i.e. cm³) was filled with 18F solutions to yield three different contrasts between the hot spheres and the colder uniform background (2:1, 4:1 and 8:1). The activity ratios between spheres and background are shown in Table 1.
Table 1. Activity concentration of the spheres and background
The reconstructed images were smoothed with a Gaussian filter (width specified as full width at half maximum) after applying decay correction. The details of the full reconstruction can be found in (27).
In a separate scan, data were acquired by replacing the six spheres with two separate spheres with volumes of 8.18 and 18.82 ml, corresponding to 25 and 33 mm diameter respectively (Figure 1). Each of these spheres also contains a smaller sphere of volume 1.15 ml (13 mm diameter) within it to create two separate compartments. The wall thickness was kept at 2 mm. When these two compartments are filled with different levels of activity, they represent a tumour which has a deprived core and a hot rim. 3D printing technology was utilized to create the two hot rim tumours. The background and inner spheres were filled with 564 kBq/ml and the outer rims of the spheres were filled with 2564.5 kBq/ml activity to create a contrast of 4:1 between the hot rim and both the inner core and the background.
All the spheres (both homogeneous and heterogeneous) were delineated using three different segmentation methods. The first volume of interest (VOI_true) was estimated using the calculated boundaries based on the known diameter and position of each sphere. The second delineation method was a fixed threshold set to 40% (I_40T) of the maximum intensity (I_max) within the sphere, giving a VOI denoted VOI_40T (28). Voxels having activity greater than I_40T were included in VOI_40T.
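The 40% fixed threshold rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the image is a NumPy array, the roughly delineated region is supplied as a boolean mask, and the function names `fixed_threshold_voi` and `voi_volume_ml` as well as the synthetic noise-free phantom are hypothetical.

```python
import numpy as np

def fixed_threshold_voi(image, rough_mask, fraction=0.40):
    """Keep voxels inside `rough_mask` whose intensity exceeds fraction * I_max."""
    i_max = image[rough_mask].max()    # maximum intensity within the rough region
    i_thr = fraction * i_max           # I_40T when fraction = 0.40
    return rough_mask & (image > i_thr), i_thr

def voi_volume_ml(mask, voxel_volume_mm3):
    """Volume of a boolean mask in ml (1 ml = 1000 mm^3)."""
    return mask.sum() * voxel_volume_mm3 / 1000.0

# Synthetic, noise-free example: a sphere in a uniform background of 1.0.
zz, yy, xx = np.indices((21, 21, 21))
sphere = (zz - 10) ** 2 + (yy - 10) ** 2 + (xx - 10) ** 2 <= 25
rough = np.zeros_like(sphere)
rough[3:18, 3:18, 3:18] = True         # rough box drawn around the sphere

for hot in (4.0, 2.0):                 # 4:1 and 2:1 contrast
    image = np.where(sphere, hot, 1.0)
    voi, i_thr = fixed_threshold_voi(image, rough)
    print(hot, i_thr, int(voi.sum()), int(sphere.sum()))
```

In this idealised example the 40% threshold at 2:1 contrast (0.8) falls below the background level, so the whole rough box is kept; this simplified setting only illustrates why a fixed threshold can overestimate volume at low contrast.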
The final volume of interest (VOI_A) was estimated using an adaptive threshold based method as described by Schaefer et al. (29), where the threshold intensity (I_A) is given by
$$I_A = \alpha \, I_{70} + \beta \, I_{bg}$$
where I_70 is the mean intensity in a contour containing all voxels with a value greater than 70% of I_max in the sphere and I_bg is the mean background intensity within a sphere of size 26.52 ml located away from all the spheres to avoid the partial volume effect (PVE). Both threshold based methods were applied separately on each roughly delineated VOI containing a sphere to generate the corresponding VOIs. The α and β parameters for the adaptive threshold were calculated using the mean value of the optimal cutoff intensities (I_optimal) of five realizations using all contrasts for each acquisition duration. I_optimal of each hot sphere is calculated using the optimal threshold (T_optimal) and I_max. T_optimal is estimated as the percentage threshold of I_max which provides the best match between the thresholded volume and VOI_true for the uniform sphere phantom.
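The adaptive threshold and its calibration can be sketched as follows, assuming the linear form I_A = α·I_70 + β·I_bg implied by the α and β parameters named above. This is an illustrative sketch only: the function names, the ordinary least-squares fit of α and β, the search grid for T_optimal, the reading of I_optimal as T_optimal × I_max, and the α = β = 0.5 values in the usage lines are all assumptions and are not taken from the study.

```python
import numpy as np

def adaptive_threshold_voi(image, rough_mask, background_mask, alpha, beta):
    """Segment with I_A = alpha * I_70 + beta * I_bg inside `rough_mask`."""
    i_max = image[rough_mask].max()
    above_70 = rough_mask & (image > 0.70 * i_max)
    i_70 = image[above_70].mean()           # mean of voxels above 70% of I_max
    i_bg = image[background_mask].mean()    # mean background intensity
    i_a = alpha * i_70 + beta * i_bg
    return rough_mask & (image > i_a), i_a

def optimal_threshold_fraction(image, rough_mask, true_voxel_count, fractions=None):
    """Fraction of I_max whose thresholded volume best matches the true volume."""
    if fractions is None:
        fractions = np.arange(0.05, 1.0, 0.01)   # assumed search grid
    i_max = image[rough_mask].max()
    counts = np.array([(rough_mask & (image > f * i_max)).sum() for f in fractions])
    best = np.argmin(np.abs(counts - true_voxel_count))
    return float(fractions[best])               # I_optimal is then best * I_max

def fit_alpha_beta(i_optimal, i_70, i_bg):
    """Least-squares fit of I_optimal ~ alpha * I_70 + beta * I_bg (assumed)."""
    design = np.column_stack([i_70, i_bg])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(i_optimal, dtype=float), rcond=None)
    return float(coeffs[0]), float(coeffs[1])

# Synthetic, noise-free example with hypothetical alpha = beta = 0.5.
zz, yy, xx = np.indices((21, 21, 21))
sphere = (zz - 10) ** 2 + (yy - 10) ** 2 + (xx - 10) ** 2 <= 25
rough = np.zeros_like(sphere)
rough[3:18, 3:18, 3:18] = True
background = np.zeros_like(sphere)
background[:2] = True                       # slab away from the sphere

for hot in (2.0, 4.0, 8.0):
    image = np.where(sphere, hot, 1.0)
    voi, i_a = adaptive_threshold_voi(image, rough, background, 0.5, 0.5)
    print(hot, i_a, int(voi.sum()), int(sphere.sum()))
```

Because the threshold moves with both the lesion intensity and the background, it stays between the two levels at every contrast in this idealised example, which is the property a single fixed percentage lacks.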

Results
One representative slice for both homogeneous and heterogeneous spheres for both contrasts and three different acquisition durations (900, 2000 and 7200 seconds) is shown in Figure 2.
All the textural features are dependent on the quantization value. Figure 3 shows the relationships between the mean textural features of five realizations and quantization values for VOI_true. Homogeneity decreases exponentially as the quantization level increases.
The dependency of textural features on sphere volume for contrast 4:1 is shown in Figure 5.
Features for the spheres located in the background also show a dependency on volume.
Homogeneity and entropy increase with volume, whereas contrast, dissimilarity and energy decrease. There are subtle differences between the spheres and backgrounds for homogeneity, contrast and dissimilarity, showing their dependency on the volume edge.
Entropy and energy are robust to the edge, as shown by the very good agreement between the sphere and background. The separation between sphere and background for correlation indicates that it is more dependent on the intensity variations. All the features reach a plateau with increasing volume, at varying rates. The features do not vary with contrast if VOI_true is known (Figure 6).
Volumes estimated from all three methods for all three contrasts are shown in Figure 7.
VOI_40T, estimated using I_40T, is always smaller than VOI_true for 8:1 contrast and high noise. VOI_40T for contrast 4:1 is similar to VOI_true, especially for volumes bigger than 5.58 cm³, and is less dependent on noise, indicating that I_40T is optimal for 4:1 contrast. VOI_40T is always overestimated for contrast 2:1 at all noise levels for volumes less than 5.58 cm³.
For bigger volumes, VOI_40T is underestimated in high noise cases for contrast 2:1. As the contrast goes down, the dependency on the volume and noise increases and the discrepancies between VOI_A and VOI_true become noticeable, especially for smaller volumes and higher noise. Figures 8 and 9 compare the relationships between textural features and acquisition durations for VOI_true with those for VOI_40T and VOI_A respectively, for contrast 2:1. Textural features derived using VOI_40T are significantly different from those of VOI_true for smaller volumes (Figure 8).
As the volume increases, the differences between them reduce. The textural features also vary with the noise, as VOI_40T varies with the noise. With an adaptive segmentation method, all textural features become independent of noise for volumes greater than 2.57 cm³ and match closely with the features generated using VOI_true (Figure 9).
Because of that, all the textural features generated using VOI_true for the smaller heterogeneous sphere agree with those of the homogeneous spheres of similar volume, except correlation, which is dependent on the intensity. Homogeneity, contrast and dissimilarity measures are different for the bigger heterogeneous sphere compared to the homogeneous ones of similar volume.
Despite being 123% bigger than the smaller one, the bigger heterogeneous sphere shows little difference in these three textural features for VOI_true, indicating that the volume effect is compensated by heterogeneity. The high sensitivity of correlation to intensity also makes it less suitable for reporting changes in heterogeneity.
Two threshold based delineation methods (40% fixed and adaptive) were employed to investigate the effects of segmentation on textural features. It was found that a fixed threshold can only be optimal for a certain contrast across all noise levels and remains sensitive not only to contrast and volume but also to noise. With an adaptive approach, the dependency on contrast and noise is reduced. However, in the clinical setting, where the acquisition duration is generally 15 minutes or 900 seconds, the adaptive threshold performance is not optimal, particularly when the volume is small and the contrast is low. The volumes generated using these two methods are substantially different. Since VOIs delineated using the 40% threshold differ from each other, textural features generated using these VOIs also differ even though the actual lesion volumes are the same. However, since VOIs generated using the adaptive threshold match VOI_true, the resulting textural features are closer to the true textural features than those from VOI_40T. These results suggest that texture indices are highly sensitive to the segmentation method due to the erroneous inclusion or exclusion of the boundary or edges of the lesion. These findings are confirmed by the consistent textural measures provided by the different shapes with the same volumes placed in the uniform background. The results are also consistent with previously published findings (18,21,30).
A volume delineated by a robust segmentation method can generate textural features such as homogeneity, contrast and dissimilarity that capture tracer uptake heterogeneity if the volume changes between scans are minimal. Since homogeneity is directly related to volume, it can only be used as a feature of image heterogeneity if the changes of volume and homogeneity are in opposite directions, i.e., if the combined multiplicative change of volume and homogeneity is either zero or negative. On the other hand, as contrast and dissimilarity are inversely related to volume, they can be used as image heterogeneity features if the combined multiplicative change of volume and the respective feature is either zero or positive. Since contrast is approximately two times more sensitive to volume than dissimilarity, homogeneity and dissimilarity are the two textural features that should be used to measure heterogeneity. These two features also provide complementary heterogeneity information which can be used for cross validation.
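The sign rule for combined changes can be written as a small check. This is an illustrative sketch, not part of the study: the function name `reflects_heterogeneity_change` is hypothetical, changes are taken as follow-up minus baseline, and for contrast and dissimilarity the condition is read as applying to the product of the volume change and the change in that feature, which is an interpretation of the text.

```python
def reflects_heterogeneity_change(feature, delta_volume, delta_feature):
    """Apply the sign rule to the product of the volume and feature changes.

    Homogeneity rises with volume, so a feature change that is not explained
    by volume requires a zero or negative product. Contrast and dissimilarity
    fall with volume, so they require a zero or positive product.
    """
    product = delta_volume * delta_feature
    if feature == "homogeneity":
        return product <= 0
    if feature in ("contrast", "dissimilarity"):
        return product >= 0
    raise ValueError(f"no sign rule stated for feature: {feature}")

# Usage with made-up changes between a baseline and a treatment scan.
print(reflects_heterogeneity_change("homogeneity", -2.0, 0.05))    # volume down, homogeneity up
print(reflects_heterogeneity_change("dissimilarity", -2.0, 0.30))  # volume down, dissimilarity up
```

A result of False means the feature moved in the direction that the volume change alone would produce, so the feature change cannot be attributed to heterogeneity by this rule.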

Conclusions
Homogeneous regions appear heterogeneous on PET images as quantified by textural features. Textural features generated using GLCMs depend on quantization, volume and segmentation method. Since these features vary differentially with volume, regions should be segmented using methods that are robust to variations in contrast and noise, using quantization level 64. Small scale heterogeneity phantom studies suggest that homogeneity and dissimilarity are the most suitable textural features to be used as heterogeneity measures where there are combined changes in both heterogeneity and volume due to treatment.
Further investigations are required with more complex heterogeneous phantoms representing real clinical scenarios to fully understand the volume and segmentation effects on the reproducibility of these textural indices. Nonetheless, to use these textural features as prognostic biomarkers, changes in textural features between baseline and treatment scans should always be reported along with the changes in volumes.