RTUNet: Residual transformer UNet specifically for pancreas segmentation

https://doi.org/10.1016/j.bspc.2022.104173

Highlights

  • Residual transformer block captures the relative position of the pancreas.

  • Dual down-sampling block addresses inaccurate pancreas morphology caused by pooling.

  • Hausdorff distance constraint makes the network focus on the pancreas boundary.

  • Residual transformer UNet is specifically proposed for pancreas segmentation.

Abstract

Accurate pancreas segmentation is crucial for the diagnostic assessment of pancreatic cancer. However, large position changes, high variability in shape and size, and an extremely blurred boundary make pancreas segmentation challenging. To alleviate these challenges, we propose the residual transformer UNet (RTUNet) to fit the nature of the pancreas. Specifically, a residual transformer block is implemented to extract multi-scale features from a global perspective, which captures the high variability in pancreas position. In addition, a dual convolutional down-sampling strategy is leveraged to obtain precise shape and size features of the pancreas within a large receptive field, which prevents the loss of information. We finally propose a Dice Hausdorff distance loss that makes the network focus on the pancreas boundary. Through extensive experiments on the public NIH dataset, we achieved a Dice similarity coefficient (DSC) of 86.25%, which outperforms the state-of-the-art DSC of 85.49%. In addition, our method surpasses the baselines by more than 3.0% on DSC and improves the minimum DSC by 2.93%. Furthermore, ablation studies are performed to prove the effectiveness of each proposed module.

Introduction

Pancreatic cancer, whose five-year survival rate is less than 9%, has a high mortality; early diagnosis of pancreatic cancer is therefore essential to improve patient survival [1], [2]. Accurate pancreas segmentation, as a prerequisite for pancreatic cancer recognition, effectively suppresses the complex background, which helps doctors assess the progress of treatment. Unlike abdominal organs (e.g. the liver, kidney, and spleen), whose segmentation accuracy has exceeded 95% [3], [4], [5], accurate pancreas segmentation is challenging for the following three reasons (Fig. 1): (1) the large inter-slice and intra-slice position variability; (2) the small size and irregular shape, which are hard to capture in a deep neural network; and (3) the blurred boundary caused by density similarities between the pancreas and its surrounding tissues (e.g. stomach wall, duodenum, and small intestine).

Existing methods apply general segmentation frameworks rather than architectures specially designed for the pancreas; such frameworks can also segment other medical images and natural images, but they ignore the anatomical nature of the pancreas. Inspired by this, we propose the residual transformer UNet (RTUNet), shown in Fig. 2, which introduces three blocks to deal with the challenges of large position changes, high morphological variability (shape and size), and the blurred boundary of the pancreas.

The position of the pancreas varies greatly across CT slices; experts usually identify this change from its relative position with respect to surrounding tissues or organs. For example, the head of the pancreas is wrapped by the duodenum and lies near the liver, while the tail of the pancreas lies between the spleen and the kidney. We used self-attention [6] to imitate this human perception and capture pancreas position changes. Specifically, each residual convolution block is followed by a residual transformer block that leverages self-attention to compute the similarity between patch embeddings and obtain new representations of all patches. This can be seen as a process of obtaining relative position from a global view. In addition, our residual transformer blocks capture objects at multiple scales, from global fine-grained low-level features to global coarse-grained high-level features [7]. As is well known, transformer-based vision models need a large amount of data to reach state-of-the-art performance because their learning curve is very steep; we therefore applied a residual architecture to reduce the difficulty of learning.
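For illustration, a minimal PyTorch-style sketch of such a residual transformer block is given below. It assumes per-position tokens from the preceding convolutional feature map, standard multi-head self-attention, and residual connections around the attention and feed-forward sub-layers; the head count, normalization placement, and embedding granularity are our assumptions and may differ from the paper's implementation.

```python
# Minimal sketch of a residual transformer block (assumptions: per-position tokens,
# standard multi-head self-attention, residual connections; the paper's exact
# dimensions and layer ordering may differ). Intended for down-sampled feature maps.
import torch
import torch.nn as nn

class ResidualTransformerBlock(nn.Module):
    def __init__(self, channels, num_heads=4, mlp_ratio=2):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(channels)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels * mlp_ratio),
            nn.GELU(),
            nn.Linear(channels * mlp_ratio, channels),
        )

    def forward(self, x):
        # x: (B, C, H, W) feature map from the preceding residual convolution block
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C) patch/pixel embeddings
        # Self-attention relates every position to all others, so each token's new
        # representation encodes its position relative to the whole slice.
        q = self.norm1(tokens)
        attn_out, _ = self.attn(q, q, q)
        tokens = tokens + attn_out                      # residual connection eases optimization
        tokens = tokens + self.mlp(self.norm2(tokens))  # residual feed-forward sub-layer
        return tokens.transpose(1, 2).reshape(b, c, h, w)
```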

Segmentation networks recognize targets through a large receptive field [8]. However, a large receptive field alone fails to accurately capture the morphological nature of the pancreas, since the relative positions of the internal structures of the pancreas are not preserved during pooling, which is detrimental to the segmentation task [9]. Thanks to its parameter-sharing mechanism, convolution is translation equivariant, which helps identify the target object and reduces the loss of internal relative position information, so that the morphology of the pancreas can be obtained [9]. In RTUNet, we therefore designed a dual convolutional down-sampling block to down-sample the feature maps without losing internal relative positions.
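One plausible reading of such a dual convolutional down-sampling block is sketched below: pooling is replaced by two parallel strided convolutions whose outputs are concatenated. The kernel sizes (3 × 3 and 1 × 1) and the concatenation-based fusion are assumptions rather than the paper's exact design.

```python
# Hypothetical sketch of a dual convolutional down-sampling block: pooling is
# replaced by two parallel strided convolutions whose outputs are concatenated,
# so halving the resolution is learned rather than fixed (kernel sizes and the
# concatenation-based fusion are assumptions, not taken from the paper).
import torch
import torch.nn as nn

class DualConvDownsample(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch // 2, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch // 2),
            nn.ReLU(inplace=True),
        )
        self.branch1x1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch // 2, kernel_size=1, stride=2),
            nn.BatchNorm2d(out_ch // 2),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Both branches halve H and W; concatenation keeps complementary
        # fine (1x1) and contextual (3x3) descriptions of the same region.
        return torch.cat([self.branch3x3(x), self.branch1x1(x)], dim=1)
```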

Apart from the high variability in position and morphology, the pancreas and its surrounding tissues or organs have similar densities, leading to a blurred boundary. In recent years, pancreas segmentation has tended to use the region-based Dice loss function, which lacks attention to the boundary of the segmentation target [10], [11]. Therefore, we leverage the Hausdorff distance as a regularization term for the Dice loss, which makes RTUNet pay more attention to the pancreas boundary.
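A hedged sketch of a Dice loss with a Hausdorff-distance-style boundary term is shown below. It uses a common distance-transform approximation of the Hausdorff penalty; the exact DiceHD formulation in the paper and the weighting `alpha` are assumptions added for illustration.

```python
# Illustrative sketch of a Dice loss regularized by a Hausdorff-distance-style
# boundary term. The HD term is a common distance-transform approximation;
# the paper's exact DiceHD formulation and the weight `alpha` are assumptions.
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt

def soft_dice_loss(pred, target, eps=1e-6):
    # pred, target: (B, H, W) tensors; pred is a sigmoid probability map
    inter = (pred * target).sum(dim=(1, 2))
    union = pred.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def hd_boundary_term(pred, target):
    # Penalize disagreement weighted by the distance to each mask's boundary,
    # so errors far from the true boundary cost more (HD-like behaviour).
    t = target.detach().cpu().numpy()
    p = (pred.detach().cpu().numpy() > 0.5).astype(np.uint8)
    dist_t = np.stack([distance_transform_edt(1 - m) + distance_transform_edt(m) for m in t])
    dist_p = np.stack([distance_transform_edt(1 - m) + distance_transform_edt(m) for m in p])
    dist = torch.from_numpy(dist_t ** 2 + dist_p ** 2).to(pred.device).float()
    return ((pred - target) ** 2 * dist).mean()

def dice_hd_loss(pred, target, alpha=0.1):
    # Dice loss plus the Hausdorff-style boundary regularizer
    return soft_dice_loss(pred, target) + alpha * hd_boundary_term(pred, target)
```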

To evaluate RTUNet, we conducted extensive experiments on the public NIH dataset. The average Dice coefficient of our method reached 86.4%, which is better than existing state-of-the-art methods. We also compared RTUNet with other classic medical image segmentation models. In addition, ablation studies were conducted to evaluate each block. In summary, our main contributions are fourfold:

(1) We propose a residual transformer UNet specifically for pancreas segmentation, which improves segmentation performance.

(2) The residual transformer block was added to capture the relative position of the pancreas through self-attention, and we found that the residual structure reduces the learning difficulty when training the transformer-based neural network.

(3) A dual down-sampling block was designed to alleviate the inaccurate pancreas shape and size features caused by pooling operations.

(4) A Hausdorff distance constraint was added to the region-based Dice loss function to make RTUNet pay more attention to the pancreas boundary, which alleviates the blurred-boundary problem caused by similar densities.


Related work

As research has progressed, various methods have been proposed to segment the pancreas automatically. These methods fall mainly into traditional approaches and deep learning-based approaches. Traditional pancreas segmentation methods are mainly based on heuristics (e.g. thresholding and region growing) or model-driven optimization (e.g. atlases, graph cuts, and simple linear iterative clustering), while deep learning-based pancreas segmentation is data-driven,

Methods

The proposed framework, shown in Fig. 3, contains two steps. (1) In the data preprocessing stage, Hounsfield unit (HU) values are limited to [−200, 340] to increase contrast and suppress irrelevant noise. In addition, through localization and cropping, we obtain input images of uniform size 256 × 256, which reduces the influence of the complex background. (2) The cropped images are fed to RTUNet, and the network outputs segmentation masks that are then recovered to 512 × 512 resolution.
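The following NumPy sketch illustrates this preprocessing under stated assumptions: the HU window [−200, 340] and the 256 × 256 crop come from the text, while the coarse localization that provides the crop center, the normalization to [0, 1], and the paste-back recovery to 512 × 512 are hypothetical details added for illustration.

```python
# Sketch of the described preprocessing: clip HU values to [-200, 340],
# crop a 256x256 region around a (hypothetical) coarse pancreas localization,
# and later paste the predicted mask back into the original 512x512 grid.
import numpy as np

def preprocess_slice(ct_slice, center_rc, hu_range=(-200, 340), crop=256):
    """ct_slice: (512, 512) array in HU; center_rc: (row, col) from a coarse localizer."""
    clipped = np.clip(ct_slice, hu_range[0], hu_range[1])
    # Normalize the clipped window to [0, 1] for the network input (assumption).
    norm = (clipped - hu_range[0]) / (hu_range[1] - hu_range[0])
    r = int(np.clip(center_rc[0] - crop // 2, 0, ct_slice.shape[0] - crop))
    c = int(np.clip(center_rc[1] - crop // 2, 0, ct_slice.shape[1] - crop))
    return norm[r:r + crop, c:c + crop], (r, c)

def restore_mask(pred_mask, origin_rc, full_shape=(512, 512)):
    """Place the 256x256 predicted mask back at its original location."""
    full = np.zeros(full_shape, dtype=pred_mask.dtype)
    r, c = origin_rc
    full[r:r + pred_mask.shape[0], c:c + pred_mask.shape[1]] = pred_mask
    return full
```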

Experiments

Conclusions

In this paper, we proposed the residual transformer UNet (RTUNet) and the Dice Hausdorff distance (DiceHD) loss function to alleviate the challenges of pancreas segmentation in a targeted manner. We first propose to utilize the residual transformer block to capture the external relative position information between the pancreas and its surrounding tissues or organs from a multi-scale global perspective. Through residual transformer-based ablation study, we found that adding a residual structure

CRediT authorship contribution statement

Chengjian Qiu: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing – original draft, Writing – review & editing, Visualization. Zhe Liu: Resources, Validation, Project administration, Funding acquisition, Supervision, Data curation. Yuqing Song: Investigation, Supervision, Funding acquisition. Jing Yin: Writing – review & editing. Kai Han: Writing – review & editing. Yan Zhu: Writing – review & editing. Yi Liu: Project administration, Funding acquisition. Victor S.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61772242, 62276116, 61976106, 61572239), the China Postdoctoral Science Foundation (2017M611737), the Six Talent Peaks Project in Jiangsu Province (DZXX-122), the Jiangsu Province emergency management science and technology project (YJGL-TG-2020-8), and the key research and development plan of Zhenjiang City (SH2020011).

References (46)

  • Antonelli, Michela, et al., The medical segmentation decathlon, Nature Commun. (2022)
  • Vaswani, Ashish, et al., Attention is all you need, Adv. Neural Inf. Process. Syst. (2017)
  • Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer, Multiscale...
  • Jonathan Long, Evan Shelhamer, Trevor Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings...
  • Goodfellow, Ian, et al., Deep Learning (2016)
  • Mo, Juan, et al., Iterative 3D feature enhancement network for pancreas segmentation from CT images, Neural Comput. Appl. (2020)
  • Tam, Tran Duc, et al., Efficient pancreas segmentation in computed tomography based on region-growing
  • Shan, Xiaoying, et al., Threshold algorithm for pancreas segmentation in Dixon water magnetic resonance images
  • Otsu, Nobuyuki, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybern. (1979)
  • Shimizu, Akinobu, et al., Automated pancreas segmentation from three-dimensional contrast-enhanced computed tomography, Int. J. Comput. Assist. Radiol. Surgery (2010)
  • Erdt, Marius, et al., Automatic pancreas segmentation in contrast enhanced CT data using learned spatial anatomy and texture descriptors
  • Oda, Masahiro, et al., Regression forest-based atlas localization and direction specific atlas generation for pancreas segmentation
  • Farag, Amal, et al., A bottom-up approach for automatic pancreas segmentation in abdominal CT scans
