Abstract
Recent advancements in large foundation models have shown promising potential in the medical industry due to their flexible prompting capability. One such model, the Segment Anything Model (SAM), a prompt-driven segmentation model, has shown remarkable performance improvements, surpassing state-of-the-art approaches in medical image segmentation. However, existing methods primarily rely on tuning strategies that require extensive data or prior prompts tailored to the specific task, making it particularly challenging when only a limited number of data samples are available. In this paper, we propose a novel perspective on self-prompting in medical vision applications. Specifically, we harness the embedding space of SAM to prompt itself through a simple yet effective linear pixel-wise classifier. By preserving the encoding capabilities of the large model and the contextual information from its decoder, and by leveraging its interactive promptability, we achieve competitive results on multiple datasets (e.g., an improvement of more than 15% over fine-tuning the mask decoder using a few images). Our code is available at https://github.com/PeterYYZhang/few-shot-self-prompt-SAM
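The self-prompting idea sketched in the abstract, i.e. training a linear pixel-wise classifier on SAM's image embedding and deriving a box prompt from its coarse prediction, could look roughly as follows. This is a toy illustration, not the paper's implementation: the embedding is simulated with random features instead of SAM's image encoder, and the derived box is what would then be passed to SAM's prompt encoder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulate a SAM-style image embedding of shape (256, 64, 64).
# In the real pipeline this would come from SAM's image encoder.
rng = np.random.default_rng(0)
emb = rng.normal(size=(256, 64, 64))
mask = np.zeros((64, 64), dtype=int)
mask[20:40, 24:48] = 1              # toy ground-truth mask for one few-shot image
emb[:, mask == 1] += 1.0            # make foreground pixels separable

# Pixel-wise features: one 256-d vector per embedding location.
X = emb.reshape(256, -1).T          # (4096, 256)
y = mask.ravel()

# Linear pixel-wise classifier trained on the few-shot example.
clf = LogisticRegression(max_iter=1000).fit(X, y)
coarse = clf.predict(X).reshape(64, 64)

# Derive a box prompt from the coarse mask, scaled to SAM's 1024-px input space.
ys, xs = np.nonzero(coarse)
scale = 1024 // 64
box = np.array([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1]) * scale
# `box` would then be fed to SAM's prompt encoder, e.g. predictor.predict(box=box)
```

At test time the same classifier runs on the embedding of an unseen image, and the resulting box (and/or a point sampled from the coarse mask) prompts the frozen SAM decoder.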
Q. Wu and Y. Zhang—Co-first authors.
References
Cai, A., Hu, W., Zheng, J.: Few-shot learning for medical image classification. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12396, pp. 441–452. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61609-0_35
Codella, N.C., et al.: Skin lesion analysis toward melanoma detection: a challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 168–172. IEEE (2018)
Deng, R., et al.: Segment anything model (sam) for digital pathology: assess zero-shot segmentation on whole slide imaging. arXiv preprint arXiv:2304.04155 (2023)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Elbatel, M., Martí, R., Li, X.: FoPro-KD: Fourier prompted effective knowledge distillation for long-tailed medical image recognition. arXiv preprint arXiv:2305.17421 (2023)
Feyjie, A.R., Azad, R., Pedersoli, M., Kauffman, C., Ayed, I.B., Dolz, J.: Semi-supervised few-shot learning for medical image segmentation. arXiv preprint arXiv:2003.08462 (2020)
He, S., Bao, R., Li, J., Grant, P.E., Ou, Y.: Accuracy of segment-anything model (sam) in medical image segmentation tasks. arXiv preprint arXiv:2304.09324 (2023)
Hu, C., Li, X.: When sam meets medical images: An investigation of segment anything model (sam) on multi-phase liver tumor segmentation. arXiv preprint arXiv:2304.08506 (2023)
Hu, E.J., et al.: Lora: low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021)
Jha, D., et al.: Kvasir-SEG: a segmented polyp dataset. In: Ro, Y.M., et al. (eds.) MMM 2020. LNCS, vol. 11962, pp. 451–462. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_37
Ji, W., Li, J., Bi, Q., Li, W., Cheng, L.: Segment anything is not always perfect: an investigation of sam on different real-world applications. arXiv preprint arXiv:2304.05750 (2023)
Jieyun, B.: Pubic Symphysis-Fetal Head Segmentation and Angle of Progression (Apr 2023). https://doi.org/10.5281/zenodo.7851339
Kirillov, A., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)
Li, J., Zhang, Z., Zhao, H.: Self-prompting large language models for open-domain qa. arXiv preprint arXiv:2212.08635 (2022)
Ma, J., Wang, B.: Segment anything in medical images. arXiv preprint arXiv:2304.12306 (2023)
Makarevich, A., Farshad, A., Belagiannis, V., Navab, N.: Metamedseg: volumetric meta-learning for few-shot organ segmentation. arXiv preprint arXiv:2109.09734 (2021)
Mattjie, C., et al.: Exploring the zero-shot capabilities of the segment anything model (sam) in 2d medical imaging: a comprehensive evaluation and practical guideline. arXiv preprint arXiv:2305.00109 (2023)
Mohapatra, S., Gosai, A., Schlaug, G.: Brain extraction comparing segment anything model (sam) and fsl brain extraction tool. arXiv preprint arXiv:2304.04738 (2023)
OpenAI: Gpt-4 technical report (2023)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Ramesh, A., et al.: Zero-shot text-to-image generation (2021)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Singh, R., Bharti, V., Purohit, V., Kumar, A., Singh, A.K., Singh, S.K.: Metamed: few-shot medical image classification using gradient-based meta-learning. Pattern Recogn. 120, 108111 (2021)
Sun, L., et al.: Few-shot medical image segmentation using a global correlation network with discriminative embedding. Comput. Biol. Med. 140, 105067 (2022)
Tschandl, P., Rosendahl, C., Kittler, H.: The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5(1) (2018)
Wang, R., Zhou, Q., Zheng, G.: Few-shot medical image segmentation regularized with self-reference and contrastive learning. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 514–523. Springer (2022). https://doi.org/10.1007/978-3-031-16440-8_49
Wu, J., et al.: Medical sam adapter: adapting segment anything model for medical image segmentation. arXiv preprint arXiv:2304.12620 (2023)
Zhang, K., Liu, D.: Customized segment anything model for medical image segmentation. arXiv preprint arXiv:2304.13785 (2023)
Zhou, T., Zhang, Y., Zhou, Y., Wu, Y., Gong, C.: Can sam segment polyps? arXiv preprint arXiv:2304.07583 (2023)
Acknowledgement
M.E. is partially funded by the EACEA Erasmus Mundus grant. We would like to acknowledge Prof. Xiaomeng Li for revising our manuscript. We would also like to thank Mr. Haonan Wang for providing valuable suggestions to our work.
A Supplementary Materials
A.1 Limitations
Multi-instance Segmentation. The first limitation of our method lies in segmentation tasks with multiple instances. The problem can be attributed to the plugged-in linear classifier: such a simple classifier cannot accurately determine the number of instances, so the spatial information passed to the prompt encoder is incomplete, which limits performance. More advanced training and prompting techniques need to be explored in future work.
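One natural direction for the multi-instance case, sketched below as an assumption rather than part of our method, is to split the classifier's coarse output into connected components and issue one box prompt per component:

```python
import numpy as np
from scipy import ndimage

# Toy coarse mask with two blobs, standing in for the linear classifier's output.
coarse = np.zeros((64, 64), dtype=int)
coarse[5:15, 5:15] = 1
coarse[40:60, 30:50] = 1

# Label connected components; n is the estimated number of instances.
labels, n = ndimage.label(coarse)

boxes = []
for i in range(1, n + 1):
    ys, xs = np.nonzero(labels == i)
    boxes.append([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1])
# Each box could then be sent to SAM's prompt encoder as a separate prompt.
```

This recovers an instance count from the coarse mask, though touching or fragmented instances would still confuse it, which is why more advanced prompting strategies remain necessary.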
Limitation of Modality Knowledge in the Decoder. We also tested our method on datasets of other modalities. For example, we evaluated it on an ultrasound dataset, Pubic Symphysis-Fetal Head Segmentation and Angle of Progression [13], which contains many high-frequency features. We tested head segmentation on this dataset (see Table 3) and found that the SAM decoder degrades performance. For a fair comparison between our method and other fine-tuning methods, we use k = 20 for testing. The results in Table 3 show that our method underperforms MedSAM and SAMed. Surprisingly, in the examples in Fig. 3, we found that performance is better when using only the linear classifier followed by upscaling. The reason is that the SAM decoder cannot predict an accurate mask under the interference of high-frequency features. This suggests that, although the size and position of the instance are important, the classifier also needs basic knowledge of the modality. One possible remedy is to combine our method with a fine-tuned decoder.
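The "linear classifier followed by upscaling" variant mentioned above can be sketched as follows; the 64×64 probability map and the 512×512 image size are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.ndimage import zoom

# Stand-in for the linear classifier's low-resolution probability map
# over the SAM embedding grid.
coarse = np.zeros((64, 64))
coarse[16:32, 16:32] = 1.0

# Upscale to the (assumed) original image resolution and threshold,
# bypassing the SAM decoder entirely.
H, W = 512, 512
up = zoom(coarse, (H / 64, W / 64), order=1)   # bilinear upscaling
final = (up > 0.5).astype(np.uint8)
```

Because no decoder is involved, high-frequency modality artifacts cannot mislead the mask prediction, which matches the behavior we observed in Fig. 3.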
A.2 Ablation Study Table
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wu, Q., Zhang, Y., Elbatel, M. (2024). Self-prompting Large Vision Models for Few-Shot Medical Image Segmentation. In: Koch, L., et al. Domain Adaptation and Representation Transfer. DART 2023. Lecture Notes in Computer Science, vol 14293. Springer, Cham. https://doi.org/10.1007/978-3-031-45857-6_16
DOI: https://doi.org/10.1007/978-3-031-45857-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45856-9
Online ISBN: 978-3-031-45857-6