Self-prompting Large Vision Models for Few-Shot Medical Image Segmentation

Conference paper in: Domain Adaptation and Representation Transfer (DART 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14293)

Abstract

Recent advancements in large foundation models have shown promising potential in the medical industry due to their flexible prompting capability. One such model, the Segment Anything Model (SAM), a prompt-driven segmentation model, has shown remarkable performance improvements, surpassing state-of-the-art approaches in medical image segmentation. However, existing methods primarily rely on tuning strategies that require extensive data or prior prompts tailored to the specific task, making it particularly challenging when only a limited number of data samples are available. In this paper, we propose a novel perspective on self-prompting in medical vision applications. Specifically, we harness the embedding space of SAM to prompt itself through a simple yet effective linear pixel-wise classifier. By preserving the encoding capabilities of the large model and the contextual information from its decoder, and by leveraging its interactive promptability, we achieve competitive results on multiple datasets (i.e., an improvement of more than 15% compared to fine-tuning the mask decoder using a few images). Our code is available at https://github.com/PeterYYZhang/few-shot-self-prompt-SAM

Q. Wu and Y. Zhang—Co-first authors.
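To make the pipeline concrete, here is a minimal sketch of the self-prompting idea as we read it from the abstract. It is our reconstruction, not the authors' released code (see the repository above for that): it assumes SAM's official segment_anything package, uses scikit-learn's logistic regression [21] as the linear pixel-wise classifier, and simplifies the alignment between the ground-truth masks and SAM's 64x64 embedding grid.

```python
# Minimal sketch (our reconstruction, not the authors' code): train a linear
# pixel-wise classifier on SAM's frozen image embeddings from k support images.
import numpy as np
import torch
import torch.nn.functional as F
from segment_anything import sam_model_registry, SamPredictor
from sklearn.linear_model import LogisticRegression

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

def embedding_features(image):
    """Per-pixel features (4096 x 256) from SAM's frozen image encoder."""
    predictor.set_image(image)                 # H x W x 3 uint8 RGB array
    emb = predictor.get_image_embedding()      # tensor of shape (1, 256, 64, 64)
    return emb.squeeze(0).permute(1, 2, 0).reshape(-1, 256).cpu().numpy()

def fit_linear_classifier(images, masks):
    """Fit the linear classifier on a few (image, binary mask) pairs.
    Downsampling masks straight to 64 x 64 is a simplifying assumption."""
    feats, labels = [], []
    for img, msk in zip(images, masks):
        feats.append(embedding_features(img))
        small = F.interpolate(torch.from_numpy(msk)[None, None].float(),
                              size=(64, 64))
        labels.append(small.squeeze().numpy().reshape(-1) > 0.5)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(np.concatenate(feats), np.concatenate(labels))
    return clf

def coarse_mask(clf, image):
    """Predict the 64 x 64 coarse mask for a query image."""
    return clf.predict(embedding_features(image)).reshape(64, 64)
```

The coarse mask is then turned into self-generated point and box prompts for SAM's decoder; a sketch of that step accompanies the ablation table in Appendix A.2.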

References

  1. Cai, A., Hu, W., Zheng, J.: Few-shot learning for medical image classification. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12396, pp. 441–452. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61609-0_35

  2. Codella, N.C., et al.: Skin lesion analysis toward melanoma detection: a challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 168–172. IEEE (2018)

  3. Deng, R., et al.: Segment anything model (SAM) for digital pathology: assess zero-shot segmentation on whole slide imaging. arXiv preprint arXiv:2304.04155 (2023)

  4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  5. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  6. Elbatel, M., Martí, R., Li, X.: FoPro-KD: Fourier prompted effective knowledge distillation for long-tailed medical image recognition. arXiv preprint arXiv:2305.17421 (2023)

  7. Feyjie, A.R., Azad, R., Pedersoli, M., Kauffman, C., Ayed, I.B., Dolz, J.: Semi-supervised few-shot learning for medical image segmentation. arXiv preprint arXiv:2003.08462 (2020)

  8. He, S., Bao, R., Li, J., Grant, P.E., Ou, Y.: Accuracy of segment-anything model (SAM) in medical image segmentation tasks. arXiv preprint arXiv:2304.09324 (2023)

  9. Hu, C., Li, X.: When SAM meets medical images: an investigation of segment anything model (SAM) on multi-phase liver tumor segmentation. arXiv preprint arXiv:2304.08506 (2023)

  10. Hu, E.J., et al.: LoRA: low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021)

  11. Jha, D., et al.: Kvasir-SEG: a segmented polyp dataset. In: Ro, Y.M., et al. (eds.) MMM 2020. LNCS, vol. 11962, pp. 451–462. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_37

  12. Ji, W., Li, J., Bi, Q., Li, W., Cheng, L.: Segment anything is not always perfect: an investigation of SAM on different real-world applications. arXiv preprint arXiv:2304.05750 (2023)

  13. Jieyun, B.: Pubic Symphysis-Fetal Head Segmentation and Angle of Progression (Apr 2023). https://doi.org/10.5281/zenodo.7851339

  14. Kirillov, A., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)

  15. Li, J., Zhang, Z., Zhao, H.: Self-prompting large language models for open-domain QA. arXiv preprint arXiv:2212.08635 (2022)

  16. Ma, J., Wang, B.: Segment anything in medical images. arXiv preprint arXiv:2304.12306 (2023)

  17. Makarevich, A., Farshad, A., Belagiannis, V., Navab, N.: MetaMedSeg: volumetric meta-learning for few-shot organ segmentation. arXiv preprint arXiv:2109.09734 (2021)

  18. Mattjie, C., et al.: Exploring the zero-shot capabilities of the segment anything model (SAM) in 2D medical imaging: a comprehensive evaluation and practical guideline. arXiv preprint arXiv:2305.00109 (2023)

  19. Mohapatra, S., Gosai, A., Schlaug, G.: Brain extraction comparing segment anything model (SAM) and FSL brain extraction tool. arXiv preprint arXiv:2304.04738 (2023)

  20. OpenAI: GPT-4 technical report (2023)

  21. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  22. Ramesh, A., et al.: Zero-shot text-to-image generation (2021)

  23. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  24. Singh, R., Bharti, V., Purohit, V., Kumar, A., Singh, A.K., Singh, S.K.: MetaMed: few-shot medical image classification using gradient-based meta-learning. Pattern Recogn. 120, 108111 (2021)

  25. Sun, L., et al.: Few-shot medical image segmentation using a global correlation network with discriminative embedding. Comput. Biol. Med. 140, 105067 (2022)

  26. Tschandl, P., Rosendahl, C., Kittler, H.: The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5(1) (2018)

  27. Wang, R., Zhou, Q., Zheng, G.: Few-shot medical image segmentation regularized with self-reference and contrastive learning. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 514–523. Springer (2022). https://doi.org/10.1007/978-3-031-16440-8_49

  28. Wu, J., et al.: Medical SAM adapter: adapting segment anything model for medical image segmentation. arXiv preprint arXiv:2304.12620 (2023)

  29. Zhang, K., Liu, D.: Customized segment anything model for medical image segmentation. arXiv preprint arXiv:2304.13785 (2023)

  30. Zhou, T., Zhang, Y., Zhou, Y., Wu, Y., Gong, C.: Can SAM segment polyps? arXiv preprint arXiv:2304.07583 (2023)

Acknowledgement

M.E. is partially funded by the EACEA Erasmus Mundus grant. We would like to acknowledge Prof. Xiaomeng Li for revising our manuscript. We would also like to thank Mr. Haonan Wang for providing valuable suggestions on our work.

Author information

Correspondence to Qi Wu.

A Supplementary Materials

A.1 Limitations

Multi-instance Segmentation. The first limitation of our method lies in segmentation tasks with multiple instances. The problem can be attributed to the plugged-in linear classifier: such a simple classifier cannot accurately determine the number of instances, so the spatial information passed to the prompt encoder is incomplete, which limits performance. More advanced training and prompting techniques need to be explored in the future; one simple direction is sketched below.
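As one hedged illustration (a hypothetical workaround, not something evaluated in the paper), the coarse mask could be split into connected components, yielding one self-generated prompt per instance:

```python
# Hypothetical multi-instance workaround (not evaluated in the paper): split
# the 64 x 64 coarse mask into connected components and derive one point + box
# prompt per component, all in embedding-grid coordinates.
import numpy as np
from scipy import ndimage

def per_instance_prompts(mask64):
    labeled, n = ndimage.label(mask64)          # label connected components
    prompts = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labeled == i)
        point = np.array([[xs.mean(), ys.mean()]])                 # (x, y) centroid
        box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])   # x0, y0, x1, y1
        prompts.append((point, box))
    return prompts
```

Each (point, box) pair would then be rescaled to image coordinates and passed through SAM's decoder separately, with the per-instance masks merged afterwards.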

Limitation of Modality Knowledge in Decoder. We also tested our method on datasets of other modalities. For example, we tested it on an ultrasound dataset, Pubic Symphysis-Fetal Head Segmentation and Angle of Progression [13], which contains many high-frequency features. We tested head segmentation on this dataset (see Table 3) and found that the SAM decoder degrades performance. To keep the comparison fair between our method and other fine-tuning methods, we use k = 20 for testing. The results in Table 3 show that our method performs worse than MedSAM and SAMed. Surprisingly, in the examples in Fig. 3, we found that performance is better when using only the linear classifier followed by upscaling (a sketch of this variant is given below). The reason is that the SAM decoder cannot predict an accurate mask under the interference of high-frequency features. This shows that, although the size and position of the instance are important, the classifier also needs basic knowledge of the modality. A possible remedy is to combine our method with a fine-tuned decoder.

Table 3. Results of our method on the ultrasound Pubic Symphysis-Fetal Head dataset in a 20-shot setting. “Linear” denotes the coarse mask generated by the linear pixel-wise classifier; “point+box” means that both the self-generated point and box are used for the final output.

Fig. 3. Some examples of the results of our method on the Pubic Symphysis-Fetal Head dataset. The segmentation results are not satisfactory on ultrasound images, although the score is high. Moreover, the linear classifier even outperforms our full method in some cases. The results show that the original SAM is sensitive to high-frequency perturbations (i.e., edges or noise in ultrasound).
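The decoder-free “Linear” variant referred to above amounts to upsampling the coarse mask directly. A minimal sketch, assuming the 64 x 64 coarse mask from the reconstruction shown after the abstract:

```python
# "Linear" variant: bilinearly upsample the coarse mask to full resolution,
# bypassing SAM's decoder entirely (a minimal sketch of the idea).
import numpy as np
import torch
import torch.nn.functional as F

def upscale_coarse_mask(mask64, img_h, img_w):
    t = torch.from_numpy(mask64.astype(np.float32))[None, None]
    up = F.interpolate(t, size=(img_h, img_w), mode="bilinear",
                       align_corners=False)
    return up.squeeze().numpy() > 0.5   # binary mask at image resolution
```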

A.2 Ablation Study Table

Table 4. Results of the ablation study on Kvasir-SEG. “Linear” means the coarse mask from the linear pixel-wise classifier; “point” means using only the point as the prompt; “box” means using only the bounding box as the prompt; “point+box” means using both the bounding box and the point as prompts. The scores are Dice scores.
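As a hedged illustration of the “point”, “box”, and “point+box” variants, the sketch below continues the reconstruction shown after the abstract; the rescaling from the 64 x 64 embedding grid to image coordinates is a simplified assumption of ours.

```python
# Derive self-generated prompts from the coarse mask and feed them back to
# SAM (continuing the earlier sketch: `predictor` already holds the query
# image because coarse_mask() called set_image on it).
import numpy as np

def prompts_from_coarse_mask(mask64, img_h, img_w):
    ys, xs = np.nonzero(mask64)
    sx, sy = img_w / 64.0, img_h / 64.0        # simplified grid-to-image scaling
    point = np.array([[xs.mean() * sx, ys.mean() * sy]])   # (x, y) centroid
    box = np.array([xs.min() * sx, ys.min() * sy,
                    xs.max() * sx, ys.max() * sy])          # x0, y0, x1, y1
    return point, box

point, box = prompts_from_coarse_mask(mask64, image.shape[0], image.shape[1])

# "point+box" variant; pass only one of the two prompts for "point" or "box".
masks, scores, _ = predictor.predict(
    point_coords=point, point_labels=np.array([1]),
    box=box, multimask_output=False)
```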

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, Q., Zhang, Y., Elbatel, M. (2024). Self-prompting Large Vision Models for Few-Shot Medical Image Segmentation. In: Koch, L., et al. Domain Adaptation and Representation Transfer. DART 2023. Lecture Notes in Computer Science, vol 14293. Springer, Cham. https://doi.org/10.1007/978-3-031-45857-6_16

  • DOI: https://doi.org/10.1007/978-3-031-45857-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45856-9

  • Online ISBN: 978-3-031-45857-6

  • eBook Packages: Computer Science, Computer Science (R0)
