STA-Former: enhancing medical image segmentation with Shrinkage Triplet Attention in a hybrid CNN-Transformer model

  • Original Paper
Signal, Image and Video Processing

Abstract

Convolutional neural networks (CNNs) are widely used in medical image segmentation, but they are limited in capturing long-range semantic interactions. Transformers, conversely, handle long-range dependencies well but struggle to preserve local semantic details. To combine the strengths of both, we propose STA-Former, a hybrid CNN-Transformer model for medical image segmentation. Our approach rests on three components: (1) We propose the Shrinkage Triplet Attention (STA) module to enhance feature fusion within the decoder. It models spatial and channel interactions in the feature map, computes thresholds across dimensions, and suppresses irrelevant information through soft-thresholding. (2) We present a redesigned hierarchical hybrid CNN-Transformer encoder that connects CNN and Transformer blocks at multiple scales, capturing both long-range and short-range dependencies across feature maps of different scales. (3) Unlike traditional decoders that apply the attention mechanism exclusively to low-level features, our approach uses a multiscale attention hierarchical decoder that leverages feature map correlations at different scales for effective feature fusion. Our method outperforms state-of-the-art methods on three datasets: Synapse multiorgan CT, ACDC cardiac MRI scans, and breast ultrasound images.
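The soft-thresholding step mentioned for the STA module can be illustrated in isolation. The operation shrinks each value toward zero by a threshold tau and zeroes anything whose magnitude is at most tau, i.e. sign(x) * max(|x| - tau, 0). The sketch below is not the STA module itself: the abstract does not state how the thresholds are computed, so the per-channel rule used here (a fixed fraction of the channel's mean absolute value) and the names `soft_threshold`, `shrink_channels`, and `scale` are assumptions made purely for illustration.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; values with |x| <= tau become 0."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def shrink_channels(feature_map, scale=0.5):
    """Soft-threshold each channel with its own threshold.

    feature_map: list of channels, each a flat list of floats.
    scale: hypothetical fixed factor in (0, 1); a trained model would
           typically predict such a factor from the features instead.
    """
    shrunk = []
    for channel in feature_map:
        # Assumed threshold rule: a fraction of the channel's mean magnitude.
        tau = scale * sum(abs(v) for v in channel) / len(channel)
        shrunk.append([soft_threshold(v, tau) for v in channel])
    return shrunk


features = [
    [4.0, -0.5, 0.5, -3.0],      # mean |v| = 2.0, so tau = 1.0
    [0.25, -0.25, 0.25, -0.25],  # mean |v| = 0.25, so tau = 0.125
]
result = shrink_channels(features)
print(result)
```

In the first channel the two small responses fall below the threshold and are set to zero, while the two large responses are kept but reduced by tau. This is the sense in which soft-thresholding suppresses weak, likely irrelevant activations without discarding strong ones.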

Data availability

The authors confirm that the data supporting the findings of this study are available within the article and openly available in the Synapse, ACDC, and BUSI datasets.

Funding

Not applicable.

Author information

Contributions

Yuzhao Liu wrote the main manuscript text. Liming Han, Bin Yao, and Qing Li provided important suggestions for the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Qing Li.

Ethics declarations

Conflict of interest

Not applicable.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Han, L., Yao, B. et al. STA-Former: enhancing medical image segmentation with Shrinkage Triplet Attention in a hybrid CNN-Transformer model. SIViP 18, 1901–1910 (2024). https://doi.org/10.1007/s11760-023-02893-5

  • DOI: https://doi.org/10.1007/s11760-023-02893-5
