Abstract
Recent large-scale Vision Transformers have driven significant progress in pre-trained models for medical image segmentation. These methods, however, depend on substantial amounts of pre-training data, which are particularly scarce in the medical field. To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view pipeline for accurate and data-efficient self-supervised medical image analysis. Our strategy harnesses multi-view information through two principal components. In the pre-training phase, a masked multi-view encoder is trained concurrently on masked multi-view observations through a range of diverse proxy tasks: image reconstruction, rotation, contrastive learning, and a novel task that employs a mutual learning paradigm. This new task capitalizes on the consistency between predictions from various perspectives, enabling the extraction of hidden multi-view information from 3D medical data. In the fine-tuning stage, a cross-view decoder aggregates the multi-view information through a cross-attention block. Compared with the previous state-of-the-art self-supervised learning method Swin UNETR, SwinMM demonstrates a notable advantage on several medical image segmentation tasks: it smoothly integrates multi-view information, significantly boosting both the accuracy and the data efficiency of the model. Code and models are available at https://github.com/UCSC-VLAA/SwinMM/.
Y. Wang, Z. Li, J. Mei, and Z. Wei contributed equally.
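The mutual learning task described above rewards agreement between predictions made from different views of the same volume. As a rough illustration (not the paper's exact formulation — the symmetric-KL form, function names, and normalization here are our assumptions), such a consistency loss could look like:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mutual_learning_loss(logits_a, logits_b, eps=1e-8):
    """Symmetric KL divergence between per-voxel class predictions
    from two views; it is zero when the views agree exactly and
    grows as their predictions diverge."""
    p = softmax(logits_a)
    q = softmax(logits_b)
    kl_pq = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    kl_qp = np.sum(q * (np.log(q + eps) - np.log(p + eps)), axis=-1)
    return 0.5 * float(np.mean(kl_pq + kl_qp))
```

In training, a term of this form would be added to the reconstruction, rotation, and contrastive objectives, so that gradients push the per-view predictions toward a shared consensus.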
Acknowledgement
This work is partially supported by the Google Cloud Research Credits program.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wang, Y. et al. (2023). SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14222. Springer, Cham. https://doi.org/10.1007/978-3-031-43898-1_47
Print ISBN: 978-3-031-43897-4
Online ISBN: 978-3-031-43898-1