Towards bridging the distribution gap: Instance to Prototype Earth Mover’s Distance for distribution alignment
Introduction
In medical image analysis, anatomical structures are often imaged with a variety of modalities. Images from different modalities can capture complementary information for disease diagnosis and treatment. Therefore, it is important to jointly utilize cross-modality information for better assessment of diseases. However, different imaging mechanisms result in large visual differences, which cause the feature distributions of different modalities to diverge substantially. In some cases, even if the image data is collected from the same or similar distributions, the learned features may be biased towards specific feature subspaces due to sampling bias or over-fitting (Wang et al., 2019).
To address the above-mentioned issues, distribution alignment across different domains (e.g., cross-modality) or different feature subspaces (e.g., labeled and unlabeled data collected from the same or similar distributions in semi-supervised learning) has drawn growing attention recently. To bridge the gap between different modalities, early and late fusion strategies are typically utilized. In early-fusion methods, inputs from different modalities are concatenated along the color channels before being fed into the network (Pereira et al., 2016, Isensee et al., 2017, Wang et al., 2017, Zhao et al., 2018). In late-fusion methods, paired inputs from different modalities are received by separate networks that extract modality-specific features. The extracted features are then fused at the semantic level to generate the final results (Dolz et al., 2018b, Chen et al., 2018, Dolz et al., 2018a). To mitigate the distribution gap across different feature subspaces, various techniques have been proposed, including adversarial training (Li et al., 2020, Dong and Lin, 2019), consistency regularization (Berthelot et al., 2019) and graph-based label propagation (Zhang et al., 2020, Iscen et al., 2019).
In this paper, we address the distribution alignment problem from the new perspective of instance-to-prototype matching by proposing a novel Instance-to-Prototype Earth Mover’s Distance (I2PEMD). Specifically, I2PEMD progressively learns shared class-specific prototypes for different modalities (or feature subspaces), and calculates the Earth Mover’s Distance (EMD) (Hou et al., 2016) to measure the instance-to-prototype matching degree, which is then used for loss minimization or for cherry-picking pseudo-labeled samples in downstream tasks. In addition, in the proposed I2PEMD, the ground distance matrix, which is important for measuring cross-class relationships, is dynamically updated from the learned prototypes and can therefore adapt to the learned feature embedding better than a fixed prior.
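To make the role of the ground distance matrix concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the ground distance between two classes is the Euclidean distance between their prototypes, that the target is a point mass on a single class (in which case the EMD reduces to a weighted sum of ground distances), and that prototypes are refreshed with a moving average. The function names, the toy prototypes and the `momentum` parameter are all hypothetical.

```python
import math

def ground_distance(prototypes):
    """Pairwise Euclidean distances between class prototypes (K x K matrix)."""
    return [[math.dist(p, q) for q in prototypes] for p in prototypes]

def emd_to_class(probs, ground, target):
    """EMD between a predicted class distribution and a point mass on `target`.

    When one distribution is a point mass, all mass must be moved to that
    class, so the transport cost is sum_k probs[k] * ground[k][target].
    """
    return sum(p * ground[k][target] for k, p in enumerate(probs))

def update_prototype(prototype, feature, momentum=0.9):
    """Moving-average prototype update (an assumption, not taken from the paper)."""
    return [momentum * p + (1.0 - momentum) * f for p, f in zip(prototype, feature)]

# Three toy 2-D prototypes: classes 0 and 1 are close, class 2 is far away.
prototypes = [[0.0, 0.0], [3.0, 4.0], [30.0, 40.0]]
ground = ground_distance(prototypes)

# Two predictions with the same confidence (0.6) in class 0.
near_miss = [0.6, 0.4, 0.0]  # residual mass on the nearby class
far_miss = [0.6, 0.0, 0.4]   # residual mass on the distant class
print(emd_to_class(near_miss, ground, 0))
print(emd_to_class(far_miss, ground, 0))
```

Both toy predictions assign the same probability to class 0, yet the second incurs a much larger cost because its residual mass sits on a class whose prototype is far away. A plain confidence score would not distinguish the two cases, which is the kind of cross-class information the ground distance matrix is meant to carry.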
Unlike previous studies, the core of our proposed I2PEMD lies in shared prototype learning across different modalities (or feature subspaces) and instance-to-prototype EMD estimation. By explicitly learning shared class-specific prototypes, we can pull the high-level features belonging to the same class closer, mitigating the distribution divergence across different modalities. Moreover, by explicitly accounting for cross-class relationships, I2PEMD leads to a more robust matching mechanism for distribution alignment.
Our I2PEMD is a flexible module that can be readily plugged into many existing frameworks to handle the distribution alignment problem. To demonstrate its effectiveness, we apply I2PEMD to two different tasks, i.e., unpaired multi-modal image segmentation and semi-supervised classification. Extensive experimental results demonstrate that the I2PEMD matching mechanism effectively addresses the distribution alignment problem and improves the performance of downstream tasks.
The overall contributions of the proposed I2PEMD are summarized as follows:
- We propose to address the distribution alignment problem from a new perspective of instance-to-prototype matching. This mechanism can be readily plugged into many different frameworks that require distribution alignment during deep feature representation learning.
- We propose to combine shared prototype learning with EMD estimation to take into consideration both intra-class compactness and cross-class relationships during distribution alignment.
- We conduct comprehensive experiments to evaluate the effectiveness of the proposed I2PEMD on both unpaired cross-modality segmentation and semi-supervised classification tasks, achieving superior performance compared with state-of-the-art methods.
Section snippets
Related works
Our work is closely related to the field of distribution alignment, as well as to methods for multi-modal image segmentation and semi-supervised classification. We briefly review the related literature for each in the following sections.
Method
In this section, we elaborate on the details of the proposed I2PEMD and its applications to unpaired multi-modal segmentation and semi-supervised classification.
Experiments
In this section, we design and conduct comprehensive experiments to demonstrate the effectiveness of the proposed I2PEMD. Specifically, in the task of unpaired multi-modal segmentation, the proposed I2PEMD is utilized to bridge the gap between the CT and MRI domains, mutually benefiting the segmentation performance of both domains. In the task of semi-supervised classification, I2PEMD acts as a measure for selecting truly confident samples by taking the cross-class relationships into consideration.
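The sample-selection role described above can be illustrated with a small sketch. It is written under stated assumptions and does not reproduce the selection rule used in the paper: each unlabeled sample is given the class with the highest predicted probability as its pseudo-label, the EMD to that class is computed with prototype-derived ground distances, and the sample is kept only if the cost does not exceed a threshold. The function `select_confident`, the toy prototypes and the `threshold` values are hypothetical.

```python
import math

def select_confident(probs_batch, prototypes, threshold):
    """Return (sample index, pseudo-label) pairs whose EMD cost is within `threshold`.

    The ground distance between classes is taken to be the Euclidean distance
    between their prototypes (an assumption made for this illustration).
    """
    ground = [[math.dist(p, q) for q in prototypes] for p in prototypes]
    selected = []
    for i, probs in enumerate(probs_batch):
        # Pseudo-label: the class with the highest predicted probability.
        label = max(range(len(probs)), key=probs.__getitem__)
        # EMD to a point mass on the pseudo-label.
        cost = sum(p * ground[k][label] for k, p in enumerate(probs))
        if cost <= threshold:
            selected.append((i, label))
    return selected

# Toy prototypes: classes 0 and 1 are close, class 2 is far from both.
prototypes = [[0.0, 0.0], [3.0, 4.0], [30.0, 40.0]]
batch = [
    [0.6, 0.4, 0.0],  # uncertain between two nearby classes
    [0.6, 0.0, 0.4],  # uncertain between two distant classes
    [0.1, 0.1, 0.8],  # fairly confident in class 2
]
print(select_confident(batch, prototypes, threshold=5.0))
print(select_confident(batch, prototypes, threshold=10.0))
```

In this toy batch the first two samples have the same maximum probability, so a confidence threshold alone would treat them identically. The EMD-based criterion keeps the first and rejects the second, because confusion with a distant class is penalized more heavily than confusion with a neighboring one.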
Discussion
In this section, we discuss how the proposed I2PEMD benefits the unpaired multi-modal segmentation task and the semi-supervised classification task through distribution alignment. In the framework of unpaired multi-modal segmentation, I2PEMD functions as a regularization term that directly supervises the training process. Specifically, I2PEMD constrains the model to learn domain-invariant prototypes for CT and MRI inputs. The shared prototypes align the two domains from a global view,
Conclusion
We propose a novel distribution alignment algorithm, in which alignment is achieved by explicit shared prototype learning and by considering the cross-class relationships during instance-to-prototype matching. The proposed distribution alignment module can be flexibly plugged into many frameworks to benefit tasks that need to bridge the gap between different domains or feature subspaces. Comprehensive experiments on the unpaired multi-modal segmentation task and the semi-supervised
CRediT authorship contribution statement
Qin Zhou: Methodology, Software, Validation, Figure preparation, Writing – methodology & results. Runze Wang: Software, Validation, Editing & review. Guodong Zeng: Software, Editing & review. Heng Fan: Software, Editing & review. Guoyan Zheng: Conceptualization, Editing & review, Supervision, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The work was partially supported by Shanghai Municipality Science and Technology Commission, China under grant 20511105205, by the Natural Science Foundation of China under grant U20A20199, and by the National Key R&D Program of China under grant 2019YFC0120603.
References (58)
- et al. A new contrast based multimodal medical image fusion framework. Neurocomputing (2015).
- et al. Semi-supervised learning using adversarial training with good and bad samples. Mach. Vis. Appl. (2020).
- et al. A deep learning model integrating fcnns and crfs for brain tumor segmentation. Med. Image Anal. (2018).
- et al. A review: Deep learning for medical image segmentation using multi-modality fusion. Array (2019).
- et al. Graph chest x-ray classification under extreme minimal supervision.
- Bengio, Y., Louradour, J., Collobert, R., Weston, J., 2009. Curriculum learning. In: Proceedings of the 26th Annual...
- Berthelot, D., Carlini, N., Cubuk, E.D., Kurakin, A., Sohn, K., Zhang, H., Raffel, C., 2020. Remixmatch:...
- et al. Mixmatch: A holistic approach to semi-supervised learning.
- et al. Factorised spatial representation learning: Application in semi-supervised myocardial segmentation.
- et al. Deep class-specific affinity-guided convolutional network for multimodal unpaired image segmentation.