Self-prompting Large Vision Models for Few-Shot Medical Image Segmentation

Conference paper in: Domain Adaptation and Representation Transfer (DART 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14293)

Abstract

Recent advancements in large foundation models have shown promising potential in the medical industry due to their flexible prompting capability. One such model, the Segment Anything Model (SAM), a prompt-driven segmentation model, has shown remarkable performance improvements, surpassing state-of-the-art approaches in medical image segmentation. However, existing methods primarily rely on tuning strategies that require extensive data or prior prompts tailored to the specific task, making it particularly challenging when only a limited number of data samples are available. In this paper, we propose a novel perspective on self-prompting in medical vision applications. Specifically, we harness the embedding space of SAM to prompt itself through a simple yet effective linear pixel-wise classifier. By preserving the encoding capabilities of the large model and the contextual information from its decoder, and by leveraging its interactive promptability, we achieve competitive results on multiple datasets (i.e., an improvement of more than 15% compared to fine-tuning the mask decoder using a few images). Our code is available at https://github.com/PeterYYZhang/few-shot-self-prompt-SAM

Q. Wu and Y. Zhang—Co-first authors.
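To make the pipeline concrete, here is a minimal sketch of the self-prompting idea as we read it from the abstract. It is our reconstruction, not the authors' released code (see the repository above for that): it assumes SAM's official segment_anything package, uses scikit-learn's logistic regression [21] as the linear pixel-wise classifier, and simplifies the alignment between the ground-truth masks and SAM's 64x64 embedding grid.

```python
# Minimal sketch (our reconstruction, not the authors' code): train a linear
# pixel-wise classifier on SAM's frozen image embeddings from k support images.
import numpy as np
import torch
import torch.nn.functional as F
from segment_anything import sam_model_registry, SamPredictor
from sklearn.linear_model import LogisticRegression

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

def embedding_features(image):
    """Per-pixel features (4096 x 256) from SAM's frozen image encoder."""
    predictor.set_image(image)                 # H x W x 3 uint8 RGB array
    emb = predictor.get_image_embedding()      # tensor of shape (1, 256, 64, 64)
    return emb.squeeze(0).permute(1, 2, 0).reshape(-1, 256).cpu().numpy()

def fit_linear_classifier(images, masks):
    """Fit the linear classifier on a few (image, binary mask) pairs.
    Downsampling masks straight to 64 x 64 is a simplifying assumption."""
    feats, labels = [], []
    for img, msk in zip(images, masks):
        feats.append(embedding_features(img))
        small = F.interpolate(torch.from_numpy(msk)[None, None].float(),
                              size=(64, 64))
        labels.append(small.squeeze().numpy().reshape(-1) > 0.5)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(np.concatenate(feats), np.concatenate(labels))
    return clf

def coarse_mask(clf, image):
    """Predict the 64 x 64 coarse mask for a query image."""
    return clf.predict(embedding_features(image)).reshape(64, 64)
```

The coarse mask is then turned into self-generated point and box prompts for SAM's decoder; a sketch of that step accompanies the ablation table in Appendix A.2.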

References

  1. Cai, A., Hu, W., Zheng, J.: Few-shot learning for medical image classification. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12396, pp. 441–452. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61609-0_35

  2. Codella, N.C., et al.: Skin lesion analysis toward melanoma detection: a challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 168–172. IEEE (2018)

  3. Deng, R., et al.: Segment anything model (SAM) for digital pathology: assess zero-shot segmentation on whole slide imaging. arXiv preprint arXiv:2304.04155 (2023)

  4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  5. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  6. Elbatel, M., Martí, R., Li, X.: FoPro-KD: Fourier prompted effective knowledge distillation for long-tailed medical image recognition. arXiv preprint arXiv:2305.17421 (2023)

  7. Feyjie, A.R., Azad, R., Pedersoli, M., Kauffman, C., Ayed, I.B., Dolz, J.: Semi-supervised few-shot learning for medical image segmentation. arXiv preprint arXiv:2003.08462 (2020)

  8. He, S., Bao, R., Li, J., Grant, P.E., Ou, Y.: Accuracy of segment-anything model (SAM) in medical image segmentation tasks. arXiv preprint arXiv:2304.09324 (2023)

  9. Hu, C., Li, X.: When SAM meets medical images: an investigation of segment anything model (SAM) on multi-phase liver tumor segmentation. arXiv preprint arXiv:2304.08506 (2023)

  10. Hu, E.J., et al.: LoRA: low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021)

  11. Jha, D., et al.: Kvasir-SEG: a segmented polyp dataset. In: Ro, Y.M., et al. (eds.) MMM 2020. LNCS, vol. 11962, pp. 451–462. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_37

  12. Ji, W., Li, J., Bi, Q., Li, W., Cheng, L.: Segment anything is not always perfect: an investigation of SAM on different real-world applications. arXiv preprint arXiv:2304.05750 (2023)

  13. Jieyun, B.: Pubic Symphysis-Fetal Head Segmentation and Angle of Progression (Apr 2023). https://doi.org/10.5281/zenodo.7851339

  14. Kirillov, A., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)

  15. Li, J., Zhang, Z., Zhao, H.: Self-prompting large language models for open-domain QA. arXiv preprint arXiv:2212.08635 (2022)

  16. Ma, J., Wang, B.: Segment anything in medical images. arXiv preprint arXiv:2304.12306 (2023)

  17. Makarevich, A., Farshad, A., Belagiannis, V., Navab, N.: MetaMedSeg: volumetric meta-learning for few-shot organ segmentation. arXiv preprint arXiv:2109.09734 (2021)

  18. Mattjie, C., et al.: Exploring the zero-shot capabilities of the segment anything model (SAM) in 2D medical imaging: a comprehensive evaluation and practical guideline. arXiv preprint arXiv:2305.00109 (2023)

  19. Mohapatra, S., Gosai, A., Schlaug, G.: Brain extraction comparing segment anything model (SAM) and FSL brain extraction tool. arXiv preprint arXiv:2304.04738 (2023)

  20. OpenAI: GPT-4 technical report (2023)

  21. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  22. Ramesh, A., et al.: Zero-shot text-to-image generation (2021)

  23. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  24. Singh, R., Bharti, V., Purohit, V., Kumar, A., Singh, A.K., Singh, S.K.: MetaMed: few-shot medical image classification using gradient-based meta-learning. Pattern Recogn. 120, 108111 (2021)

  25. Sun, L., et al.: Few-shot medical image segmentation using a global correlation network with discriminative embedding. Comput. Biol. Med. 140, 105067 (2022)

  26. Tschandl, P., Rosendahl, C., Kittler, H.: The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5(1) (2018)

  27. Wang, R., Zhou, Q., Zheng, G.: Few-shot medical image segmentation regularized with self-reference and contrastive learning. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 514–523. Springer (2022). https://doi.org/10.1007/978-3-031-16440-8_49

  28. Wu, J., et al.: Medical SAM adapter: adapting segment anything model for medical image segmentation. arXiv preprint arXiv:2304.12620 (2023)

  29. Zhang, K., Liu, D.: Customized segment anything model for medical image segmentation. arXiv preprint arXiv:2304.13785 (2023)

  30. Zhou, T., Zhang, Y., Zhou, Y., Wu, Y., Gong, C.: Can SAM segment polyps? arXiv preprint arXiv:2304.07583 (2023)

Acknowledgement

M.E. is partially funded by the EACEA Erasmus Mundus grant. We would like to acknowledge Prof. Xiaomeng Li for revising our manuscript. We would also like to thank Mr. Haonan Wang for providing valuable suggestions on our work.

Author information

Correspondence to Qi Wu.

A Supplementary Materials

A.1 Limitations

Multi-instance Segmentation. The first limitation of our method lies in segmentation tasks with multiple instances. The problem can be attributed to the plugged-in linear classifier: such a simple classifier cannot accurately determine the number of instances, so the spatial information passed to the prompt encoder is incomplete, which limits performance. More advanced training and prompting techniques need to be explored in the future; one simple direction is sketched below.
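As one hedged illustration (a hypothetical workaround, not something evaluated in the paper), the coarse mask could be split into connected components, yielding one self-generated prompt per instance:

```python
# Hypothetical multi-instance workaround (not evaluated in the paper): split
# the 64 x 64 coarse mask into connected components and derive one point + box
# prompt per component, all in embedding-grid coordinates.
import numpy as np
from scipy import ndimage

def per_instance_prompts(mask64):
    labeled, n = ndimage.label(mask64)          # label connected components
    prompts = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labeled == i)
        point = np.array([[xs.mean(), ys.mean()]])                 # (x, y) centroid
        box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])   # x0, y0, x1, y1
        prompts.append((point, box))
    return prompts
```

Each (point, box) pair would then be rescaled to image coordinates and passed through SAM's decoder separately, with the per-instance masks merged afterwards.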

Limitation of Modality Knowledge in Decoder. We also tested our method on datasets of other modalities. For example, we tested it on an ultrasound dataset, Pubic Symphysis-Fetal Head Segmentation and Angle of Progression [13], which contains many high-frequency features. We tested head segmentation on this dataset (see Table 3) and found that the SAM decoder degrades performance. To keep the comparison fair between our method and other fine-tuning methods, we use k = 20 for testing. The results in Table 3 show that our method performs worse than MedSAM and SAMed. Surprisingly, in the examples in Fig. 3, we found that performance is better when using only the linear classifier followed by upscaling (a sketch of this variant is given below). The reason is that the SAM decoder cannot predict an accurate mask under the interference of high-frequency features. This shows that, although the size and position of the instance are important, the classifier also needs basic knowledge of the modality. A possible remedy is to combine our method with a fine-tuned decoder.

Table 3. Results of our method on the ultrasound Pubic Symphysis-Fetal Head dataset in a 20-shot setting. “Linear” denotes the coarse mask generated by the linear pixel-wise classifier; “point+box” means that both the self-generated point and box are used for the final output.

Fig. 3. Some examples of the results of our method on the Pubic Symphysis-Fetal Head dataset. The segmentation results are not satisfactory on ultrasound images, although the score is high. Moreover, the linear classifier even outperforms our full method in some cases. The results show that the original SAM is sensitive to high-frequency perturbations (i.e., edges or noise in ultrasound).
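The decoder-free “Linear” variant referred to above amounts to upsampling the coarse mask directly. A minimal sketch, assuming the 64 x 64 coarse mask from the reconstruction shown after the abstract:

```python
# "Linear" variant: bilinearly upsample the coarse mask to full resolution,
# bypassing SAM's decoder entirely (a minimal sketch of the idea).
import numpy as np
import torch
import torch.nn.functional as F

def upscale_coarse_mask(mask64, img_h, img_w):
    t = torch.from_numpy(mask64.astype(np.float32))[None, None]
    up = F.interpolate(t, size=(img_h, img_w), mode="bilinear",
                       align_corners=False)
    return up.squeeze().numpy() > 0.5   # binary mask at image resolution
```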

A.2 Ablation Study Table

Table 4. Results of the ablation study on Kvasir-SEG. “Linear” means the coarse mask from the linear pixel-wise classifier; “point” means using only the point as the prompt; “box” means using only the bounding box as the prompt; “point+box” means using both the bounding box and the point as prompts. The scores are Dice scores.
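As a hedged illustration of the “point”, “box”, and “point+box” variants, the sketch below continues the reconstruction shown after the abstract; the rescaling from the 64 x 64 embedding grid to image coordinates is a simplified assumption of ours.

```python
# Derive self-generated prompts from the coarse mask and feed them back to
# SAM (continuing the earlier sketch: `predictor` already holds the query
# image because coarse_mask() called set_image on it).
import numpy as np

def prompts_from_coarse_mask(mask64, img_h, img_w):
    ys, xs = np.nonzero(mask64)
    sx, sy = img_w / 64.0, img_h / 64.0        # simplified grid-to-image scaling
    point = np.array([[xs.mean() * sx, ys.mean() * sy]])   # (x, y) centroid
    box = np.array([xs.min() * sx, ys.min() * sy,
                    xs.max() * sx, ys.max() * sy])          # x0, y0, x1, y1
    return point, box

point, box = prompts_from_coarse_mask(mask64, image.shape[0], image.shape[1])

# "point+box" variant; pass only one of the two prompts for "point" or "box".
masks, scores, _ = predictor.predict(
    point_coords=point, point_labels=np.array([1]),
    box=box, multimask_output=False)
```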

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, Q., Zhang, Y., Elbatel, M. (2024). Self-prompting Large Vision Models for Few-Shot Medical Image Segmentation. In: Koch, L., et al. Domain Adaptation and Representation Transfer. DART 2023. Lecture Notes in Computer Science, vol 14293. Springer, Cham. https://doi.org/10.1007/978-3-031-45857-6_16

  • DOI: https://doi.org/10.1007/978-3-031-45857-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45856-9

  • Online ISBN: 978-3-031-45857-6

  • eBook Packages: Computer Science, Computer Science (R0)
