Elsevier

Medical Image Analysis

Volume 82, November 2022, 102607

Towards bridging the distribution gap: Instance to Prototype Earth Mover’s Distance for distribution alignment

https://doi.org/10.1016/j.media.2022.102607

Highlights

  • Propose Instance to Prototype Earth Mover’s Distance (I2PEMD) for distribution alignment;

  • I2PEMD combines shared prototype learning with EMD estimation;

  • Extensive validation on two typical yet challenging tasks.

Abstract

Despite the remarkable success of deep learning, distribution divergence remains a challenge that hinders performance on many tasks in medical image analysis. A large distribution gap may degrade knowledge transfer across different domains or feature subspaces. To achieve better distribution alignment, we propose a novel module named Instance to Prototype Earth Mover’s Distance (I2PEMD), in which shared class-specific prototypes are progressively learned to narrow the distribution gap across different domains or feature subspaces, and the Earth Mover’s Distance (EMD) is calculated to take cross-class relationships into consideration during embedding alignment. We validate the proposed I2PEMD on two different tasks: multi-modal medical image segmentation and semi-supervised classification. Specifically, in multi-modal medical image segmentation, I2PEMD is explicitly utilized as a distribution alignment regularization term to supervise the model training process, while in semi-supervised classification, I2PEMD works as an alignment measure to sort and cherry-pick the unlabeled data for more accurate and robust pseudo-labeling. Results from comprehensive experiments demonstrate the efficacy of the proposed method.

Introduction

In medical image analysis, anatomical structures are often imaged with a variety of modalities. Images from different modalities can capture complementary information for disease diagnosis and treatment. Therefore, it is important to jointly utilize cross-modality information for better assessment of diseases. However, different imaging mechanisms result in large visual differences, causing substantial feature distribution divergence across modalities. In some cases, even if the image data are collected from the same or similar distributions, the learned features may be biased towards specific feature subspaces due to sampling bias or over-fitting (Wang et al., 2019).

To address the above-mentioned issues, distribution alignment across different domains (e.g., cross-modality) or different feature subspaces (e.g., labeled and unlabeled data collected from the same or similar distributions in semi-supervised learning) has drawn growing attention recently. To bridge the gap between different modalities, early and late fusion strategies are typically utilized. In early-fusion methods, inputs from different modalities are concatenated along the channel dimension before being fed into the network (Pereira et al., 2016, Isensee et al., 2017, Wang et al., 2017, Zhao et al., 2018). In late-fusion methods, paired inputs from different modalities are fed to separate networks that extract modality-specific features, which are then fused at the semantic level to generate the final results (Dolz et al., 2018b, Chen et al., 2018, Dolz et al., 2018a). To mitigate the distribution gap across different feature subspaces, various techniques have been proposed, including adversarial training (Li et al., 2020, Dong and Lin, 2019), consistency regularization (Berthelot et al., 2019), and graph-based label propagation (Zhang et al., 2020, Iscen et al., 2019).
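The difference between the two fusion strategies can be illustrated with a toy sketch. The function names, the list-based data layout, and the use of element-wise averaging as the late-fusion operator are assumptions made purely for illustration; the cited works use convolutional networks and their own fusion designs.

```python
def early_fusion(channels_a, channels_b):
    """Early fusion: stack the channels of both modalities into one input.

    Each argument is a list of channels (each channel is a flat list of
    pixel values); the result is fed to a single network.
    """
    return channels_a + channels_b


def late_fusion(features_a, features_b):
    """Late fusion: combine modality-specific feature vectors.

    The features are assumed to come from two separate networks.
    Element-wise averaging is an assumed, deliberately simple fusion rule.
    """
    return [(a + b) / 2.0 for a, b in zip(features_a, features_b)]


# Hypothetical single-channel inputs from two modalities.
fused_input = early_fusion([[1, 2]], [[3, 4]])       # two-channel input
fused_features = late_fusion([1.0, 3.0], [3.0, 5.0])  # averaged features
```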

From a new perspective of instance-to-prototype matching, in this paper, we address the distribution alignment problem by proposing a novel Instance-to-Prototype Earth Mover’s Distance (I2PEMD). Specifically, I2PEMD progressively learns shared class-specific prototypes for different modalities (or feature subspaces), and calculates the Earth Mover’s Distance (EMD) (Hou et al., 2016) to measure the instance-to-prototype matching degree for loss minimization or cherry-picking pseudo-labeled samples in downstream tasks. In addition, in our proposed I2PEMD, the important ground distance matrix for measuring cross-class relationships is dynamically updated by the learned prototypes, which can better adapt to the learned feature embedding than a fixed prior.
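For readers less familiar with EMD, the distance between two discrete distributions over K classes can be written as the classical transportation problem (Hitchcock, 1941). The formulation below is a sketch in notation introduced here for illustration: p, q, F, D and the prototypes c_i are not taken from the paper, and defining the ground distance as the Euclidean distance between learned prototypes is only one assumed reading of how the prototypes update the ground distance matrix.

```latex
\[
\begin{aligned}
\mathrm{EMD}(p, q) \;=\; \min_{F \ge 0} \; & \sum_{i=1}^{K} \sum_{j=1}^{K} F_{ij}\, D_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{K} F_{ij} = p_i, \qquad \sum_{i=1}^{K} F_{ij} = q_j, \qquad
\sum_{i=1}^{K} p_i = \sum_{j=1}^{K} q_j = 1, \\
\text{with (assumed)}\quad & D_{ij} = \lVert c_i - c_j \rVert_2 .
\end{aligned}
\]
```

Here F_{ij} is the amount of probability mass moved from class i to class j, so classes whose prototypes lie close together are cheap to confuse, while distant classes are expensive.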

Unlike previous studies, the core of our proposed I2PEMD lies in shared prototype learning across different modalities (or feature subspaces) and instance-to-prototype EMD estimation. By explicitly learning shared class-specific prototypes, we can pull the high-level features belonging to the same class closer, mitigating the distribution divergence across different modalities. In addition, by carefully considering the cross-class relationships, I2PEMD leads to a more robust matching mechanism for distribution alignment.
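One common way to learn prototypes progressively is an exponential moving average (EMA) of class-wise feature means. The sketch below assumes this EMA rule, a hypothetical `momentum` parameter, and toy 2-D features; the text does not specify the update actually used. Because the same prototype set is updated with features from both modalities, it illustrates how the prototypes can be shared.

```python
def update_prototypes(prototypes, features, labels, momentum=0.9):
    """Return prototypes after one EMA step using class-wise feature means.

    prototypes: list of K vectors, features: list of N vectors,
    labels: list of N class indices in [0, K).
    """
    updated = [list(c) for c in prototypes]
    for k in range(len(prototypes)):
        members = [f for f, y in zip(features, labels) if y == k]
        if not members:
            continue  # class absent from this batch: keep its prototype
        dim = len(prototypes[k])
        mean = [sum(f[d] for f in members) / len(members) for d in range(dim)]
        updated[k] = [momentum * c + (1.0 - momentum) * m
                      for c, m in zip(prototypes[k], mean)]
    return updated


# Hypothetical batches from two modalities that share one prototype set.
prototypes = [[0.0, 0.0], [0.0, 0.0]]
ct_feats, ct_labels = [[1.0, 0.0], [3.0, 0.0]], [0, 0]
mr_feats, mr_labels = [[0.0, 2.0]], [1]
prototypes = update_prototypes(prototypes, ct_feats, ct_labels, momentum=0.5)
prototypes = update_prototypes(prototypes, mr_feats, mr_labels, momentum=0.5)
```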

Our I2PEMD is a flexible module that can be plugged into many existing frameworks to handle the distribution alignment problem. To demonstrate its effectiveness, we apply I2PEMD to two different tasks, i.e., unpaired multi-modal image segmentation and semi-supervised classification. Extensive experimental results show that our I2PEMD matching mechanism effectively addresses the distribution alignment problem and improves the performance of downstream tasks.

The overall contributions of the proposed I2PEMD are summarized as follows:

  • We propose to address the distribution alignment problem from a new perspective of instance-to-prototype matching. This mechanism can be readily plugged into many different frameworks that require distribution alignment during deep feature representation learning.

  • We propose to combine shared prototype learning with EMD estimation to take into consideration both intra-class compactness and cross-class relationships during distribution alignment.

  • We conduct comprehensive experiments to evaluate the effectiveness of the proposed I2PEMD on both unpaired cross-modality segmentation and semi-supervised classification tasks, achieving superior performance compared with state-of-the-art methods.
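The idea of combining prototypes with EMD can be made concrete in a simplified setting. When the target distribution is a point mass on one class k, all probability mass must be transported to k, so the EMD reduces to the closed form sum_j p_j * D[j][k]. The sketch below uses this special case; the softmax over negative distances, the `temperature` parameter, and the Euclidean ground distance between prototypes are assumptions for illustration and are not claimed to be the estimator used in the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ground_distance(prototypes):
    """Pairwise prototype distances, used as the (assumed) ground distance."""
    return [[euclidean(ci, cj) for cj in prototypes] for ci in prototypes]


def soft_assignment(embedding, prototypes, temperature=1.0):
    """Softmax over negative instance-to-prototype distances (assumed)."""
    logits = [-euclidean(embedding, c) / temperature for c in prototypes]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def emd_to_class(p, D, k):
    """EMD between distribution p and a point mass on class k."""
    return sum(p_j * D[j][k] for j, p_j in enumerate(p))


# Hypothetical prototypes: classes 0 and 1 are close, class 2 is far away.
prototypes = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
D = ground_distance(prototypes)
p = soft_assignment([0.1, 0.0], prototypes)
cost_near = emd_to_class(p, D, 0)  # instance lies next to prototype 0
cost_far = emd_to_class(p, D, 2)   # matching it to class 2 costs far more
```

Because the cost weights each class by its distance to the target prototype, confusing an instance with a neighbouring class is penalized less than confusing it with a distant one, which is the cross-class relationship a plain cross-entropy term ignores.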

Section snippets

Related works

Our work is closely related to the field of distribution alignment, as well as to methods for multi-modal image segmentation and semi-supervised classification. We briefly review the related literature in the following sections.

Method

In this section, we elaborate on the details of the proposed I2PEMD and its applications to unpaired multi-modal segmentation and semi-supervised classification.

Experiments

In this section, we design and conduct comprehensive experiments to demonstrate the effectiveness of the proposed I2PEMD. Specifically, in the task of unpaired multi-modal segmentation, the proposed I2PEMD is utilized to bridge the gap between the CT and MRI domains, mutually benefiting the segmentation performance of both domains. In the task of semi-supervised classification, I2PEMD acts as a measure to select truly confident samples by taking into consideration the cross-class relationships.
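The selection step in the semi-supervised setting can be sketched as a simple sort-and-keep procedure. The function name, the `keep_ratio` parameter, and the rule of keeping a fixed fraction of samples are assumptions for illustration; the per-sample costs stand in for whatever alignment measure is computed for each unlabeled sample.

```python
def select_confident(costs, keep_ratio=0.5):
    """Return indices of the lowest-cost samples, best first.

    costs: one alignment cost per unlabeled sample (lower means the sample
    matches its pseudo-label prototype better).
    """
    order = sorted(range(len(costs)), key=lambda i: costs[i])
    n_keep = max(1, int(len(costs) * keep_ratio))
    return order[:n_keep]


# Hypothetical alignment costs for four unlabeled samples.
costs = [0.9, 0.1, 0.5, 0.05]
selected = select_confident(costs, keep_ratio=0.5)
```

Only the selected samples would then receive pseudo-labels, so poorly aligned samples do not contaminate the next training round.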

Discussion

In this section, we will mainly discuss how our proposed I2PEMD benefits the unpaired multi-modal segmentation task and the semi-supervised classification task through distribution alignment. In the framework of unpaired multi-modal segmentation, I2PEMD functions as a regularization term to directly supervise the training process. Specifically, I2PEMD constrains the model to learn domain-invariant prototypes for CT and MRI inputs. The shared prototypes align the two domains from a global view,

Conclusion

We propose a novel distribution alignment algorithm, where the alignment is achieved by explicit shared prototype learning and consideration of the cross-class relationships during the instance-to-prototype matching. The proposed distribution alignment module can be flexibly plugged into many frameworks to benefit tasks that need to bridge the gap between different domains or feature subspaces. Comprehensive experiments on the unpaired multi-modal segmentation task and the semi-supervised

CRediT authorship contribution statement

Qin Zhou: Methodology, Software, Validation, Figure preparation, Writing – methodology & results. Runze Wang: Software, Validation, Editing & review. Guodong Zeng: Software, Editing & review. Heng Fan: Software, Editing & review. Guoyan Zheng: Conceptualization, Editing & review, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The work was partially supported by Shanghai Municipality Science and Technology Commission, China under grant 20511105205, by the Natural Science Foundation of China under grant U20A20199, and by the National Key R&D Program of China under grant 2019YFC0120603.

References (58)

  • Chen L. et al. MRI tumor segmentation with densely connected 3D CNN
  • Çiçek Ö. et al. 3D U-Net: learning dense volumetric segmentation from sparse annotation
  • Codella N.C. et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC)
  • Cubuk, E.D., Zoph, B., Shlens, J., Le, Q.V., 2020. Randaugment: Practical automated data augmentation with a reduced...
  • Diederik, K., Jimmy, B., et al., 2015. Adam: A method for stochastic optimization. In: International Conference on...
  • Dolz J. et al. IVD-Net: Intervertebral disc localization and segmentation in MRI with a multi-modal UNet
  • Dolz J. et al. HyperDense-Net: a hyper-densely connected CNN for multi-modal image segmentation. IEEE Trans. Med. Imaging (2018)
  • Dong N. et al. Unsupervised domain adaptation for automatic estimation of cardiothoracic ratio
  • Dong J. et al. MarginGAN: Adversarial training in semi-supervised learning
  • Dou Q. et al. Unpaired multi-modal segmentation via knowledge distillation. IEEE Trans. Med. Imaging (2020)
  • Fernando, B., Habrard, A., Sebban, M., Tuytelaars, T., 2013. Unsupervised visual domain adaptation using subspace...
  • Ganin Y. et al. Unsupervised domain adaptation by backpropagation
  • Gong B. et al. Geodesic flow kernel for unsupervised domain adaptation
  • Gopalan R. et al. Domain adaptation for object recognition: An unsupervised approach
  • Hitchcock F.L. The distribution of a product from several sources to numerous localities. J. Math. Phys. (1941)
  • Hou L. et al. Squared earth mover’s distance-based loss for training deep neural networks (2016)
  • Hu, J., Lu, J., Tan, Y.-P., 2015. Deep transfer metric learning. In: Proceedings of the IEEE Conference on Computer...
  • Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q., 2017. Densely connected convolutional networks. In:...
  • Huo Y. et al. SynSeg-Net: Synthetic segmentation without target modality ground truth. IEEE Trans. Med. Imaging (2018)