
GDTNet: A Synergistic Dilated Transformer and CNN by Gate Attention for Abdominal Multi-organ Segmentation

  • Conference paper
  • In: MultiMedia Modeling (MMM 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14557)


Abstract

As one of the key problems in computer-aided medical image analysis, learning how to model global relationships and extract local details is crucial for improving the performance of abdominal multi-organ segmentation. Although Convolutional Neural Networks (CNNs) are quite mature, their limited receptive field makes it difficult to balance capturing global relationships with preserving local details, especially when stacked into deeper networks. Several recent works have therefore adopted the Vision Transformer, which is based on a self-attention mechanism, for abdominal multi-organ segmentation. However, the Vision Transformer is computationally expensive because it models long-range relationships over all pairs of patches. To address these issues, we propose a novel multi-organ segmentation framework, named GDTNet, based on the synergy of CNN and Transformer for mining global relationships and local details. To achieve this goal, we design a Dilated Attention Module (DAM) that efficiently captures global contextual features and constructs global semantic information. Specifically, we employ a three-branch parallel structure in which a Dilated Transformer, combined with global average pooling under the supervision of Gate Attention, models the global semantic information of multi-scale encoded features. In addition, we fuse each DAM with the DAMs of all previous layers to further encode features across scales. Extensive experiments on the Synapse dataset show that our method outperforms ten state-of-the-art segmentation methods, achieving accurate segmentation of multiple abdominal organs.
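The abstract names two ingredients of the DAM: dilated attention, which reduces cost by letting each query attend only to a subsampled set of positions, and a sigmoid gate that fuses the attended features with globally average-pooled context. The paper's exact formulation is not reproduced on this page, so the following is only an illustrative NumPy sketch of those two mechanisms; all function names, the identity query/key/value projections, and the 1-D token layout are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_self_attention(x, dilation=2):
    """Sparse self-attention: tokens are split into `dilation` strided
    groups, and attention is computed only within each group, so every
    query sees roughly n/dilation keys instead of all n."""
    n, d = x.shape
    out = np.zeros_like(x)
    for offset in range(dilation):
        idx = np.arange(offset, n, dilation)      # strided token subset
        sub = x[idx]                              # identity Q/K/V for brevity
        attn = softmax(sub @ sub.T / np.sqrt(d))  # scaled dot-product attention
        out[idx] = attn @ sub
    return out

def gate_fusion(local_feat, pooled_context):
    """Gate-style fusion: a sigmoid gate derived from globally pooled
    context blends the attended features with that context."""
    gate = 1.0 / (1.0 + np.exp(-pooled_context))  # sigmoid gate, shape (1, d)
    return gate * local_feat + (1.0 - gate) * pooled_context

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))            # 8 patch tokens, dim 4
refined = dilated_self_attention(tokens, dilation=2)
pooled = tokens.mean(axis=0, keepdims=True)     # global average pooling
fused = gate_fusion(refined, pooled)
print(fused.shape)  # (8, 4)
```

With `dilation=2`, each query attends to half the tokens, illustrating how the dilated pattern trades full pairwise attention for a cheaper strided one while the gate reinjects global context.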

C. Zhang and Z. Wang—These authors contributed equally to this paper.



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 62272404 and 62372170, in part by the Open Project of Key Laboratory of Medical Imaging and Artificial Intelligence of Hunan Province under Grant YXZN2022004, in part by the Natural Science Foundation of Hunan Province of China under Grants 2022JJ30571 and 2023JJ40638, in part by the Research Foundation of Education Department of Hunan Province under Grant 21B0172, in part by the Innovation and Entrepreneurship Training Program for China University Students under Grant 202210530002, and in part by the Baidu Program.

Author information


Correspondence to Yuan Zhang or Kai Hu.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, C., Wang, Z., Zhang, Y., Li, X., Hu, K. (2024). GDTNet: A Synergistic Dilated Transformer and CNN by Gate Attention for Abdominal Multi-organ Segmentation. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14557. Springer, Cham. https://doi.org/10.1007/978-3-031-53302-0_4


  • DOI: https://doi.org/10.1007/978-3-031-53302-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53301-3

  • Online ISBN: 978-3-031-53302-0

  • eBook Packages: Computer Science, Computer Science (R0)
