RTUNet: Residual transformer UNet specifically for pancreas segmentation
Introduction
Pancreatic cancer has a high mortality rate, with a five-year survival rate below 9%, so early diagnosis is essential to improve patient survival [1], [2]. Accurate pancreas segmentation is a prerequisite for pancreatic cancer recognition: it suppresses the complex background and helps doctors assess the progress of treatment. Unlike other abdominal organs (e.g. liver, kidney, and spleen), whose segmentation accuracy has exceeded 95% [3], [4], [5], the pancreas remains difficult to segment accurately for three reasons (Fig. 1): (1) large inter-slice and intra-slice position variability; (2) a small size and irregular shape, which are hard for deep neural networks to capture; and (3) a blurred boundary caused by the similar densities of the pancreas and its surrounding tissues (e.g. stomach wall, duodenum, and small intestine).
Existing approaches apply general-purpose segmentation methods rather than methods designed specifically for the pancreas. Because such methods can equally segment other medical images and natural images, they ignore the anatomical nature of the pancreas. Motivated by this, we propose the residual transformer UNet (RTUNet), shown in Fig. 2, which introduces three components to address the large position changes, high morphological variability (shape and size), and blurred boundary of the pancreas.
The position of the pancreas varies greatly across CT slices, and experts usually track this change through its position relative to surrounding tissues or organs. For example, the head of the pancreas is wrapped by the duodenum and lies near the liver, while the tail lies between the spleen and the kidney. We use self-attention [6] to imitate this human perception and capture changes in pancreas position. Specifically, each residual convolution block is followed by a residual transformer block, which uses self-attention to compute the similarity between patch embeddings and produce new representations of all patches. This can be seen as obtaining relative position from a global view. In addition, our residual transformer blocks capture objects at multiple scales, from global fine-grained low-level features to global coarse-grained high-level features [7]. Transformer-based vision models are known to need a large amount of training data to reach state-of-the-art performance because the learning curve is very steep. We therefore apply a residual architecture to reduce the difficulty of learning.
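The idea of computing similarities between patch embeddings and adding the result back to the input can be illustrated with a minimal sketch. It assumes a single attention head, NumPy arrays, and no layer normalization or feed-forward sublayer; the function and parameter names (`residual_self_attention`, `w_q`, `w_k`, `w_v`) are hypothetical, and this is not the authors' exact block.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def residual_self_attention(patches, w_q, w_k, w_v):
    """Single-head self-attention over patch embeddings with a skip connection.

    patches: (num_patches, dim) array of patch embeddings.
    w_q, w_k, w_v: (dim, dim) projection matrices (hypothetical names).
    """
    q, k, v = patches @ w_q, patches @ w_k, patches @ w_v
    # Similarity between every pair of patches, scaled by sqrt(dim).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    attended = weights @ v               # new representation of every patch
    # Residual connection: the block only learns a correction to its input.
    return patches + attended, weights

rng = np.random.default_rng(0)
num_patches, dim = 16, 8
patches = rng.standard_normal((num_patches, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
out, weights = residual_self_attention(patches, w_q, w_k, w_v)
```

Each row of `weights` describes how strongly one patch attends to every other patch, which is one way to read "relative position from a global view". When the value projection contributes nothing, the block returns its input unchanged, which suggests why the skip connection can make the block easier to train.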
Segmentation networks recognize targets through a large receptive field [8]. However, a large receptive field alone fails to capture the morphology of the pancreas accurately, because pooling does not preserve the relative positions of the internal structures of the pancreas, which is harmful to the segmentation task [9]. Owing to its parameter-sharing mechanism, convolution is translation equivariant, which helps identify the target object and reduces the loss of internal relative position information, so that the morphology of the pancreas can be recovered [9]. In RTUNet, we therefore design a dual convolutional down-sampling block that down-samples the feature maps while preventing the loss of internal relative position.
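The contrast between convolution and pooling can be checked numerically. The sketch below is not the dual convolutional down-sampling block, whose layout this excerpt does not specify. It only illustrates the two properties the paragraph relies on, under the assumption of circular (wrap-around) padding so that equivariance holds exactly; `circular_conv2d` and `max_pool_2x2` are hypothetical helper names.

```python
import numpy as np

def circular_conv2d(image, kernel):
    """2D cross-correlation with wrap-around (circular) padding."""
    out = np.zeros(image.shape, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # A negative roll brings pixel (r + i, c + j) to position (r, c).
            out += kernel[i, j] * np.roll(image, shift=(-i, -j), axis=(0, 1))
    return out

def max_pool_2x2(image):
    """Non-overlapping 2x2 max pooling (stride 2)."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
shift = (1, 2)

# Equivariance: convolving a shifted image equals shifting the convolved image.
conv_then_shift = np.roll(circular_conv2d(image, kernel), shift, axis=(0, 1))
shift_then_conv = circular_conv2d(np.roll(image, shift, axis=(0, 1)), kernel)
equivariant = np.allclose(conv_then_shift, shift_then_conv)

# Pooling: a bright pixel at (0, 0) and the same pixel moved to (0, 1)
# fall into the same 2x2 window, so their pooled maps are identical.
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[0, 1] = 1.0
pooling_loses_shift = np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
```

Both flags evaluate to true: the convolution output moves with the input, while max pooling maps two inputs that differ by a one-pixel displacement to the same output, so that displacement can no longer be recovered downstream.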
Apart from the high variability in position and morphology, the pancreas and its surrounding tissues or organs have similar densities, which leads to a blurred boundary. In recent years, pancreas segmentation has tended to use the region-based Dice loss function, which pays little attention to the boundary of the segmentation target [10], [11]. We therefore use the Hausdorff distance as a regularization term for the Dice loss, which makes RTUNet pay more attention to the pancreas boundary.
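A small example may help show why a region-based term alone can miss boundary errors. The sketch below assumes masks are represented as sets of foreground pixel coordinates and combines the two terms as (1 − Dice) + lam · Hausdorff. That combination and the weight `lam` are assumptions for illustration; the excerpt does not give the exact form of the loss, and training would need a differentiable approximation of the Hausdorff term.

```python
import math

def dice_coefficient(pred, target):
    """Dice overlap between two sets of foreground pixel coordinates."""
    if not pred and not target:
        return 1.0
    return 2.0 * len(pred & target) / (len(pred) + len(target))

def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(a, b):
        # Largest distance from a point in `a` to its nearest point in `b`.
        return max(min(math.dist(p, q) for q in b) for p in a)
    return max(directed(pred, target), directed(target, pred))

def dice_hd_loss(pred, target, lam=0.1):
    """Dice loss plus a Hausdorff penalty; `lam` is a hypothetical weight."""
    return (1.0 - dice_coefficient(pred, target)) + lam * hausdorff_distance(pred, target)

# Ground truth: a 3x3 square. Both predictions misplace exactly one pixel.
target = {(r, c) for r in range(3) for c in range(3)}
pred_near = (target - {(2, 2)}) | {(3, 2)}   # stray pixel next to the square
pred_far = (target - {(2, 2)}) | {(8, 8)}    # stray pixel far from the square

same_dice = dice_coefficient(pred_near, target) == dice_coefficient(pred_far, target)
loss_near = dice_hd_loss(pred_near, target)
loss_far = dice_hd_loss(pred_far, target)
```

The two predictions have the same Dice score because they overlap the target in the same number of pixels, yet the far outlier yields a larger Hausdorff distance and therefore a larger combined loss. This is the kind of boundary sensitivity the regularization term is meant to add.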
To evaluate RTUNet, we conducted extensive experiments on the public NIH dataset. Our method reached an average Dice coefficient of 86.4%, which is better than existing state-of-the-art methods. We also compared RTUNet with other classic medical image segmentation models and conducted ablation studies to evaluate each block. In summary, our main contributions are fourfold:
(1) We propose a residual transformer UNet specifically for pancreas segmentation, which improves segmentation performance.
(2) We add a residual transformer block that captures the relative position of the pancreas through self-attention, and we find that the residual structure reduces the difficulty of training the transformer-based network.
(3) We design a dual down-sampling block to alleviate the inaccurate shape and size of the pancreas in feature maps caused by pooling operations.
(4) We add a Hausdorff distance constraint to the region-based Dice loss function so that RTUNet pays more attention to the pancreas boundary, which alleviates the blurred boundary caused by similar densities.
Related work
As research has progressed, various methods have been proposed to segment the pancreas automatically. These methods fall mainly into traditional approaches and deep learning-based approaches. Traditional pancreas segmentation methods are mainly based on heuristics (e.g. thresholding and region growing) or model-driven optimization (e.g. atlases, graph cuts, and simple linear iterative clustering), while deep learning-based pancreas segmentation is data-driven,
Methods
The proposed framework, shown in Fig. 3, contains two steps. (1) In the data preprocessing stage, Hounsfield unit (HU) values are clipped to [−200, 340] to increase contrast and suppress irrelevant noise. In addition, through localization and cropping, we obtain input images of a uniform size of 256 × 256, which reduces the influence of the complex background. (2) The cropped images are fed to RTUNet, and the network outputs segmentation masks, which are then restored to 512 × 512 resolution.
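The preprocessing and restoration steps can be sketched as follows. The HU window and the 256 and 512 sizes come from the text. Everything else is an assumption made for illustration: the crop origin `(top, left)` is taken as given rather than produced by a localization step, intensities are rescaled to [0, 1], and the mask is restored by pasting it back into an empty canvas. The helper names are hypothetical.

```python
import numpy as np

HU_MIN, HU_MAX = -200, 340      # HU window stated in the text
CROP, FULL = 256, 512           # crop size and original slice size

def preprocess_slice(ct_slice, top, left):
    """Clip HU values, then crop a CROP x CROP window at (top, left)."""
    clipped = np.clip(ct_slice, HU_MIN, HU_MAX)
    # Rescaling to [0, 1] is an assumption; the text only states the clipping.
    scaled = (clipped - HU_MIN) / (HU_MAX - HU_MIN)
    return scaled[top:top + CROP, left:left + CROP]

def restore_mask(mask, top, left):
    """Paste a CROP x CROP mask back into an empty FULL x FULL canvas."""
    full = np.zeros((FULL, FULL), dtype=mask.dtype)
    full[top:top + CROP, left:left + CROP] = mask
    return full

rng = np.random.default_rng(0)
ct_slice = rng.integers(-1000, 1000, size=(FULL, FULL))  # synthetic HU values
top, left = 100, 150                                     # hypothetical crop origin

patch = preprocess_slice(ct_slice, top, left)
mask = (patch > 0.5).astype(np.uint8)    # stand-in for the network output
restored = restore_mask(mask, top, left)
```

Here the thresholded patch only stands in for a network prediction so that the round trip from a 512 × 512 slice to a 256 × 256 input and back can be followed end to end.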
Experiments
Conclusions
In this paper, we proposed the residual transformer UNet (RTUNet) and the Dice Hausdorff distance (DiceHD) loss function to address the challenges of pancreas segmentation in a targeted manner. We first proposed using the residual transformer block to capture the external relative position information between the pancreas and its surrounding tissues or organs from a multi-scale global perspective. Through an ablation study of the residual transformer, we found that adding a residual structure
CRediT authorship contribution statement
Chengjian Qiu: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing – original draft, Writing – review & editing, Visualization. Zhe Liu: Resources, Validation, Project administration, Funding acquisition, Supervision, Data curation. Yuqing Song: Investigation, Supervision, Funding acquisition. Jing Yin: Writing – review & editing. Kai Han: Writing – review & editing. Yan Zhu: Writing – review & editing. Yi Liu: Project administration, Funding acquisition. Victor S.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61772242, 62276116, 61976106, 61572239), the China Postdoctoral Science Foundation (2017M611737), the Six Talent Peaks Project in Jiangsu Province (DZXX-122), the Jiangsu Province emergency management science and technology project (YJGL-TG-2020-8), and the key research and development plan of Zhenjiang City (SH2020011).
References (46)
- et al. Mechanistic target of rapamycin in the tumor microenvironment and its potential as a therapeutic target for pancreatic cancer. Cancer Lett. (2020)
- et al. The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 challenge. Med. Image Anal. (2021)
- et al. Attention gated networks: Learning to leverage salient regions in medical images. Med. Image Anal. (2019)
- et al. Multi-atlas pancreas segmentation: atlas selection based on vessel structure. Med. Image Anal. (2017)
- et al. Morphological and multi-level geometrical descriptor analysis in CT and MRI volumes for automatic pancreas segmentation. Comput. Med. Imaging Graph. (2019)
- et al. Spatial aggregation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation. Med. Image Anal. (2018)
- et al. Automatic pancreas segmentation based on lightweight DCNN modules and spatial prior propagation. Pattern Recognit. (2021)
- et al. Loss odyssey in medical image segmentation. Med. Image Anal. (2021)
- et al. Cancer statistics. CA: Cancer J. Clin. (2019)
- et al. Bridging the gap between 2D and 3D contexts in CT volume for liver and tumor segmentation. IEEE J. Biomed. Health Inf. (2021)
- The medical segmentation decathlon. Nature Commun.
- Attention is all you need. Adv. Neural Inf. Process. Syst.
- Deep Learning
- Iterative 3D feature enhancement network for pancreas segmentation from CT images. Neural Comput. Appl.
- Efficient pancreas segmentation in computed tomography based on region-growing
- Threshold algorithm for pancreas segmentation in dixon water magnetic resonance images
- A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern.
- Automated pancreas segmentation from three-dimensional contrast-enhanced computed tomography. Int. J. Comput. Assist. Radiol. Surgery
- Automatic pancreas segmentation in contrast enhanced CT data using learned spatial anatomy and texture descriptors
- Regression forest-based atlas localization and direction specific atlas generation for pancreas segmentation
- A bottom-up approach for automatic pancreas segmentation in abdominal CT scans