Wasserstein-based texture analysis in radiomic studies

doi:10.1016/j.compmedimag.2022.102129

Computerized Medical Imaging and Graphics

Volume 102, December 2022, 102129

https://doi.org/10.1016/j.compmedimag.2022.102129 Get rights and content

Under a Creative Commons license

open access

Highlights

•
Large data cohorts inherently have representative “reference” samples.
•
Radiomics statistical texture features lose the spatial information inherent in highdimensional texture matrices.
•
Proposing spatial texture features considering optimal reference samples.
•
Robust Wasserstein metric from Optimal Mass transport (OMT) theory used in the proposed classification pipeline.

Abstract

The emerging field of radiomics that transforms standard-of-care images to quantifiable scalar statistics endeavors to reveal the information hidden in these macroscopic images. The concept of texture is widely used and essential in many radiomic-based studies. Practice usually reduces spatial multidimensional texture matrices, e.g., gray-level co-occurrence matrices (GLCMs), to summary scalar features. These statistical features have been demonstrated to be strongly correlated and tend to contribute redundant information; and does not account for the spatial information hidden in the multivariate texture matrices. This study proposes a novel pipeline to deal with spatial texture features in radiomic studies. A new set of textural features that preserve the spatial information inherent in GLCMs is proposed and used for classification purposes. The set of the new features uses the Wasserstein metric from optimal mass transport theory (OMT) to quantify the spatial similarity between samples within a given label class. In particular, based on a selected subset of texture GLCMs from the training cohort, we propose new representative spatial texture features, which we incorporate into a supervised image classification pipeline. The pipeline relies on the support vector machine (SVM) algorithm along with Bayesian optimization and the Wasserstein metric. The selection of the best GLCM references is considered for each classification label and is performed during the training phase of the SVM classifier using a Bayesian optimizer. We assume that sample fitness is defined based on closeness (in the sense of the Wasserstein metric) and high correlation (Spearman’s rank sense) with other samples in the same class. Moreover, the newly defined spatial texture features consist of the Wasserstein distance between the optimally selected references and the remaining samples. We assessed the performance of the proposed classification pipeline in diagnosing the coronavirus disease 2019 (COVID-19) from computed tomographic (CT) images. To evaluate the proposed spatial features’ added value, we compared the performance of the proposed classification pipeline with other SVM-based classifiers that account for different texture features, namely: statistical features only, optimized spatial features using Euclidean metric, non-optimized spatial features with Wasserstein metric. The proposed technique, which accounts for the optimized spatial texture feature with Wasserstein metric, shows great potential in classifying new COVID CT images that the algorithm has not seen in the training step. The MATLAB code of the proposed classification pipeline is made available. It can be used to find the best reference samples in other data cohorts, which can then be employed to build different prediction models.

Keywords

Radiomics texture

Optimal mass transport

Reference samples

Spatial texture features

Supervised classification

Wasserstein metric

Bayesian optimization