RTUNet: Residual transformer UNet specifically for pancreas segmentation

https://doi.org/10.1016/j.bspc.2022.104173

Highlights

  • Residual transformer block captures the relative position of the pancreas.

  • Dual down-sampling block addresses inaccurate pancreas morphology caused by pooling.

  • Hausdorff distance constraint makes the network focus on the pancreas boundary.

  • Residual transformer UNet is specifically proposed for pancreas segmentation.

Abstract

Accurate pancreas segmentation is crucial for the diagnostic assessment of pancreatic cancer. However, large position changes, high variability in shape and size, and an extremely blurred boundary make pancreas segmentation challenging. To alleviate these challenges, we propose the residual transformer UNet (RTUNet) to fit the nature of the pancreas. Specifically, a residual transformer block is implemented to extract multi-scale features from a global perspective, which captures the high variability in pancreas position. In addition, a dual convolutional down-sampling strategy is leveraged to obtain precise shape and size features of the pancreas within a large receptive field, which prevents the loss of information. We finally propose a Dice Hausdorff distance loss that makes the network focus on the pancreas boundary. Through extensive experiments on the public NIH dataset, we achieved a Dice similarity coefficient (DSC) of 86.25%, which outperforms the state-of-the-art DSC of 85.49%. In addition, our method surpasses the baselines by more than 3.0% on DSC and improves the minimum DSC by 2.93%. Furthermore, ablation studies are performed to prove the effectiveness of each proposed module.

Introduction

Pancreatic cancer, whose five-year survival rate is less than 9%, has a high mortality; early diagnosis of pancreatic cancer is therefore essential to improve patient survival [1], [2]. Accurate pancreas segmentation, as a prerequisite for pancreatic cancer recognition, effectively suppresses the complex background, which helps doctors assess the progress of treatment. Unlike abdominal organs (e.g. the liver, kidney, and spleen), whose segmentation accuracy has exceeded 95% [3], [4], [5], accurate pancreas segmentation is challenging for the following three reasons (Fig. 1): (1) the large inter-slice and intra-slice position variability; (2) the small size and irregular shape, which are hard to capture in a deep neural network; and (3) the blurred boundary caused by density similarities between the pancreas and its surrounding tissues (e.g. stomach wall, duodenum, and small intestine).

Existing methods apply general segmentation frameworks rather than architectures specially designed for the pancreas; such frameworks can also segment other medical images and natural images, but they ignore the anatomical nature of the pancreas. Inspired by this, we propose the residual transformer UNet (RTUNet), shown in Fig. 2, which introduces three blocks to deal with the challenges of large position changes, high morphological variability (shape and size), and the blurred boundary of the pancreas.

The position of the pancreas varies greatly across CT slices; experts usually identify this change from its relative position with respect to surrounding tissues or organs. For example, the head of the pancreas is wrapped by the duodenum and lies near the liver, while the tail of the pancreas lies between the spleen and the kidney. We used self-attention [6] to imitate this human perception and capture pancreas position changes. Specifically, each residual convolution block is followed by a residual transformer block that leverages self-attention to compute the similarity between patch embeddings and obtain new representations of all patches. This can be seen as a process of obtaining relative position from a global view. In addition, our residual transformer blocks capture objects at multiple scales, from global fine-grained low-level features to global coarse-grained high-level features [7]. As is well known, transformer-based vision models need a large amount of data to reach state-of-the-art performance because their learning curve is very steep; we therefore applied a residual architecture to reduce the difficulty of learning.
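For illustration, a minimal PyTorch-style sketch of such a residual transformer block is given below. It assumes per-position tokens from the preceding convolutional feature map, standard multi-head self-attention, and residual connections around the attention and feed-forward sub-layers; the head count, normalization placement, and embedding granularity are our assumptions and may differ from the paper's implementation.

```python
# Minimal sketch of a residual transformer block (assumptions: per-position tokens,
# standard multi-head self-attention, residual connections; the paper's exact
# dimensions and layer ordering may differ). Intended for down-sampled feature maps.
import torch
import torch.nn as nn

class ResidualTransformerBlock(nn.Module):
    def __init__(self, channels, num_heads=4, mlp_ratio=2):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(channels)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels * mlp_ratio),
            nn.GELU(),
            nn.Linear(channels * mlp_ratio, channels),
        )

    def forward(self, x):
        # x: (B, C, H, W) feature map from the preceding residual convolution block
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C) patch/pixel embeddings
        # Self-attention relates every position to all others, so each token's new
        # representation encodes its position relative to the whole slice.
        q = self.norm1(tokens)
        attn_out, _ = self.attn(q, q, q)
        tokens = tokens + attn_out                      # residual connection eases optimization
        tokens = tokens + self.mlp(self.norm2(tokens))  # residual feed-forward sub-layer
        return tokens.transpose(1, 2).reshape(b, c, h, w)
```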

Segmentation networks recognize targets through a large receptive field [8]. However, a large receptive field alone fails to accurately capture the morphological nature of the pancreas, since the relative positions of the internal structures of the pancreas are not preserved during pooling, which is detrimental to the segmentation task [9]. Thanks to its parameter-sharing mechanism, convolution is translation equivariant, which helps identify the target object and reduces the loss of internal relative position information, so that the morphology of the pancreas can be obtained [9]. In RTUNet, we therefore designed a dual convolutional down-sampling block to down-sample the feature maps without losing internal relative positions.
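One plausible reading of such a dual convolutional down-sampling block is sketched below: pooling is replaced by two parallel strided convolutions whose outputs are concatenated. The kernel sizes (3 × 3 and 1 × 1) and the concatenation-based fusion are assumptions rather than the paper's exact design.

```python
# Hypothetical sketch of a dual convolutional down-sampling block: pooling is
# replaced by two parallel strided convolutions whose outputs are concatenated,
# so halving the resolution is learned rather than fixed (kernel sizes and the
# concatenation-based fusion are assumptions, not taken from the paper).
import torch
import torch.nn as nn

class DualConvDownsample(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch // 2, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch // 2),
            nn.ReLU(inplace=True),
        )
        self.branch1x1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch // 2, kernel_size=1, stride=2),
            nn.BatchNorm2d(out_ch // 2),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Both branches halve H and W; concatenation keeps complementary
        # fine (1x1) and contextual (3x3) descriptions of the same region.
        return torch.cat([self.branch3x3(x), self.branch1x1(x)], dim=1)
```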

Apart from the high variability in position and morphology, the pancreas and its surrounding tissues or organs have similar densities, leading to a blurred boundary. In recent years, pancreas segmentation has tended to use the region-based Dice loss function, which lacks attention to the boundary of the segmentation target [10], [11]. Therefore, we leverage the Hausdorff distance as a regularization term for the Dice loss, which makes RTUNet pay more attention to the pancreas boundary.
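A hedged sketch of a Dice loss with a Hausdorff-distance-style boundary term is shown below. It uses a common distance-transform approximation of the Hausdorff penalty; the exact DiceHD formulation in the paper and the weighting `alpha` are assumptions added for illustration.

```python
# Illustrative sketch of a Dice loss regularized by a Hausdorff-distance-style
# boundary term. The HD term is a common distance-transform approximation;
# the paper's exact DiceHD formulation and the weight `alpha` are assumptions.
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt

def soft_dice_loss(pred, target, eps=1e-6):
    # pred, target: (B, H, W) tensors; pred is a sigmoid probability map
    inter = (pred * target).sum(dim=(1, 2))
    union = pred.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def hd_boundary_term(pred, target):
    # Penalize disagreement weighted by the distance to each mask's boundary,
    # so errors far from the true boundary cost more (HD-like behaviour).
    t = target.detach().cpu().numpy()
    p = (pred.detach().cpu().numpy() > 0.5).astype(np.uint8)
    dist_t = np.stack([distance_transform_edt(1 - m) + distance_transform_edt(m) for m in t])
    dist_p = np.stack([distance_transform_edt(1 - m) + distance_transform_edt(m) for m in p])
    dist = torch.from_numpy(dist_t ** 2 + dist_p ** 2).to(pred.device).float()
    return ((pred - target) ** 2 * dist).mean()

def dice_hd_loss(pred, target, alpha=0.1):
    # Dice loss plus the Hausdorff-style boundary regularizer
    return soft_dice_loss(pred, target) + alpha * hd_boundary_term(pred, target)
```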

To evaluate RTUNet, we conducted extensive experiments on the public NIH dataset. The average Dice coefficient of our method reached 86.4%, which is better than existing state-of-the-art methods. We also compared RTUNet with other classic medical image segmentation models. In addition, ablation studies were conducted to evaluate each block. In summary, our main contributions are fourfold:

(1) We propose a residual transformer UNet specifically for pancreas segmentation, which improves segmentation performance.

(2) The residual transformer block was added to capture the relative position of the pancreas through self-attention, and we found that the residual structure reduces the learning difficulty when training the transformer-based neural network.

(3) A dual down-sampling block was designed to alleviate the inaccurate pancreas shape and size features caused by pooling operations.

(4) A Hausdorff distance constraint was added to the region-based Dice loss function to make RTUNet pay more attention to the pancreas boundary, which alleviates the blurred-boundary problem caused by similar densities.


Related work

As research has progressed, various methods have been proposed to segment the pancreas automatically. These methods fall mainly into traditional approaches and deep learning-based approaches. Traditional pancreas segmentation methods are mainly based on heuristics (e.g. thresholding and region growing) or model-driven optimization (e.g. atlases, graph cuts, and simple linear iterative clustering), while deep learning-based pancreas segmentation is data-driven,

Methods

The proposed framework, shown in Fig. 3, contains two steps. (1) In the data preprocessing stage, Hounsfield unit (HU) values are limited to [−200, 340] to increase contrast and suppress irrelevant noise. In addition, through localization and cropping, we obtain input images of uniform size 256 × 256, which reduces the influence of the complex background. (2) The cropped images are fed to RTUNet, and the network outputs segmentation masks that are then recovered to 512 × 512 resolution.
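The following NumPy sketch illustrates this preprocessing under stated assumptions: the HU window [−200, 340] and the 256 × 256 crop come from the text, while the coarse localization that provides the crop center, the normalization to [0, 1], and the paste-back recovery to 512 × 512 are hypothetical details added for illustration.

```python
# Sketch of the described preprocessing: clip HU values to [-200, 340],
# crop a 256x256 region around a (hypothetical) coarse pancreas localization,
# and later paste the predicted mask back into the original 512x512 grid.
import numpy as np

def preprocess_slice(ct_slice, center_rc, hu_range=(-200, 340), crop=256):
    """ct_slice: (512, 512) array in HU; center_rc: (row, col) from a coarse localizer."""
    clipped = np.clip(ct_slice, hu_range[0], hu_range[1])
    # Normalize the clipped window to [0, 1] for the network input (assumption).
    norm = (clipped - hu_range[0]) / (hu_range[1] - hu_range[0])
    r = int(np.clip(center_rc[0] - crop // 2, 0, ct_slice.shape[0] - crop))
    c = int(np.clip(center_rc[1] - crop // 2, 0, ct_slice.shape[1] - crop))
    return norm[r:r + crop, c:c + crop], (r, c)

def restore_mask(pred_mask, origin_rc, full_shape=(512, 512)):
    """Place the 256x256 predicted mask back at its original location."""
    full = np.zeros(full_shape, dtype=pred_mask.dtype)
    r, c = origin_rc
    full[r:r + pred_mask.shape[0], c:c + pred_mask.shape[1]] = pred_mask
    return full
```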

Experiments

Conclusions

In this paper, we proposed the residual transformer UNet (RTUNet) and the Dice Hausdorff distance (DiceHD) loss function to alleviate the challenges of pancreas segmentation in a targeted manner. We first propose to utilize the residual transformer block to capture the external relative position information between the pancreas and its surrounding tissues or organs from a multi-scale global perspective. Through residual transformer-based ablation study, we found that adding a residual structure

CRediT authorship contribution statement

Chengjian Qiu: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing – original draft, Writing – review & editing, Visualization. Zhe Liu: Resources, Validation, Project administration, Funding acquisition, Supervision, Data curation. Yuqing Song: Investigation, Supervision, Funding acquisition. Jing Yin: Writing – review & editing. Kai Han: Writing – review & editing. Yan Zhu: Writing – review & editing. Yi Liu: Project administration, Funding acquisition. Victor S.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61772242, 62276116, 61976106, 61572239), the China Postdoctoral Science Foundation (2017M611737), the Six Talent Peaks Project in Jiangsu Province (DZXX-122), the Jiangsu Province emergency management science and technology project (YJGL-TG-2020-8), and the key research and development plan of Zhenjiang City (SH2020011).

References (46)

  • Antonelli, Michela, et al., The medical segmentation decathlon, Nature Commun. (2022)
  • Vaswani, Ashish, et al., Attention is all you need, Adv. Neural Inf. Process. Syst. (2017)
  • Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer, Multiscale...
  • Jonathan Long, Evan Shelhamer, Trevor Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings...
  • Goodfellow, Ian, et al., Deep Learning (2016)
  • Mo, Juan, et al., Iterative 3D feature enhancement network for pancreas segmentation from CT images, Neural Comput. Appl. (2020)
  • Tam, Tran Duc, et al., Efficient pancreas segmentation in computed tomography based on region-growing
  • Shan, Xiaoying, et al., Threshold algorithm for pancreas segmentation in Dixon water magnetic resonance images
  • Otsu, Nobuyuki, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybern. (1979)
  • Shimizu, Akinobu, et al., Automated pancreas segmentation from three-dimensional contrast-enhanced computed tomography, Int. J. Comput. Assist. Radiol. Surgery (2010)
  • Erdt, Marius, et al., Automatic pancreas segmentation in contrast enhanced CT data using learned spatial anatomy and texture descriptors
  • Oda, Masahiro, et al., Regression forest-based atlas localization and direction specific atlas generation for pancreas segmentation
  • Farag, Amal, et al., A bottom-up approach for automatic pancreas segmentation in abdominal CT scans
