WP-UNet: Weight Pruning U-Net with Depth-wise Separable Convolutions for Semantic Segmentation of Kidney Tumours

Background Accurate semantic segmentation of kidney tumours in computed tomography (CT) images is difficult because tumours vary in form and occasionally resemble nearby tissue. The KiTS19 challenge sets the groundwork for future advances in kidney tumour segmentation. Methods We present WP-UNet, a lightweight, small-scale deep network model; it involves few parameters, a quick inference time, and a low floating-point computational complexity. Results We trained and evaluated the model with CT images from 300 patients. The findings demonstrated the superiority of our method, with a training Dice score of 0.98 for the kidney tumour region. The proposed model uses only 1,297,441 parameters and 7.2e FLOPs, three times fewer than those of other network models. Conclusions The results confirm that the proposed architecture is smaller than that of U-Net, involves less computational complexity, and yields good accuracy, indicating its potential applicability in kidney tumour imaging.


Introduction
The American Cancer Society has reported on the prevalence of kidney cancer in both men and women. Overall, the lifetime risk of developing kidney cancer is approximately 1/48 for men and 1/83 for women. The types of kidney cancer in this study were of an advanced stage. Kidney cancers are generally this advanced because the kidneys are situated deep inside the body and cannot be perceived on a physical inspection. Several imaging methods are currently in use to track the growth of kidney tumours. One treatment approach has become increasingly popular because it can selectively remove diseased tissue while retaining stable tissue, and it has been successful in treating small kidney masses. After a precise evaluation of the kidney tumour, details such as the kidney and tumour structure can be collected. In a recent study (Hesamian et al., 2019), it was impossible to derive the essential details from computed tomography (CT) or magnetic resonance imaging scans. Kidney tumours vary in colour, form, and scale, and have a similar appearance to the parenchyma and other nearby tissues. Given these characteristics, segmenting the kidney tumour area is extremely difficult (Sharma, 2017).
Currently, there is an increasing need to deploy deep learning solutions on mobile handheld devices (Vaseli et al., 2019), embedded systems (Karakanis et al., 2020), or machines with minimal resources. An important reason why convolutional neural networks (CNNs) are challenging to train is that they are over-parameterised (Denil, 2013), and they typically require considerable computational power and storage space for training and inference. Deep learning researchers have proposed many strategies for 'pruning' or quantising learned parameters on broad image datasets (LeCun et al., 1990; Alvarez and Salzmann, 2017; Han et al., 2016). Others have concentrated on training compact models from scratch (Howard et al., 2017; Zhang et al., 2017; Qin et al., 2018) by factorising regular convolution layers into depth-wise separable convolution layers for cheaper computation.
Although CNNs have achieved the best results in practical implementations, robustness and accuracy remain challenging. Ronneberger et al. (2015) proposed U-Net for automated medical image segmentation to address these issues. U-Net condenses vital information by reducing the cost function in the first half of the network and generates an image in the second half. Inspired by the U-Net model, we approached the challenge of kidney tumour segmentation by proposing the WP-UNet model. We implemented weight pruning of the U-Net with a depth-wise separable convolution architecture, which refines even tiny regions in the output tumour image. The system precisely separates the tumour regions of the kidney and offers established quantitative and qualitative validity.

Related Works
Several computer-aided diagnosis models and artificial neural networks have been developed to classify and segment renal tumours using CT scans. Linguraru et al. (2011) published a computer-aided method used to examine a collection of CT scans of 43 patients. In this system, tumours were robustly segmented with approximately 80% overlap. The methodology studied morphological variations between various types of lesions. Lee et al. (2017) developed a computer program capable of detecting and identifying small renal masses in CT images. Their tests yielded a specificity of 99.63%. Shah et al. (2017) presented a segmentation approach using machine learning. Yang et al. (2014) created a system to automatically segment CT images of the kidney based on multi-atlas registration.
First, they registered a low-resolution image with a series of higher-resolution images to create a patient-registered image. Next, the kidney tissues were segmented and aligned to achieve the final segmented output.
Various researchers have also experimented with the segmentation of renal tumours using deep learning. Thong et al. (2016) used a patch-wise convolutional kernel to classify the central voxel in 2D patches. The ConvNet then analysed the CT scan data of each kidney tumour slice. Skalski et al. (2016) demonstrated an efficient hybrid level-set approach with elliptical-shape restrictions for kidney segmentation. The RUSBoost algorithm and decision trees were used to differentiate between kidney and tumour structures, serving as a solution to class imbalance and the need for defining additional voxels. Their model achieved an average precision of 92.1%. Wang et al. (2018) defined a CNN-based model for kidney segmentation. They proposed a CNN-based segmentation scheme that integrates bounding box information, and they improved the CNN model by fine-tuning it for each image.
Network prototypes. Deep neural networks are superior in their capacity and ability to generalise. Deep models that learn entirely from data produce excellent results for many tasks when compared with humans. Increasing network depth has driven further advances in neural networks. The use of skip connections makes deep neural networks easier to train. U-Net was originally designed for image segmentation, whereas networks such as VGGNet and ResNet were designed for classification (Linguraru et al., 2011) and have since been used as supervision to further enhance segmentation. Network pruning has been widely studied as a way to compress CNN models (He et al., 2017, 2018). In early work, network pruning proved to be a valid way to reduce network complexity and overfitting (LeCun et al., 1989; Hanson and Pratt, 1989; Hassibi et al., 1993; Strom, 1997). More recently, Han et al. (2015) pruned state-of-the-art CNN models with no loss of accuracy.

Proposed Method
In this section, we propose the WP-UNet model and describe the modified objective function.

Image Pre-processing
All CT images in the training set were resized to 256 × 256 pixels and divided by 255 to normalise the pixel values to the range 0 to 1.
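As a minimal sketch of this step (assuming slices arrive as 2D arrays with intensities already mapped to 0-255; the function name is ours), the resize-and-rescale pipeline can be written as:

```python
import numpy as np

def preprocess_slice(ct_slice, size=256):
    """Resize a 2D CT slice to size x size with nearest-neighbour
    sampling, then scale intensities from [0, 255] to [0, 1]."""
    h, w = ct_slice.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    resized = ct_slice[np.ix_(rows, cols)].astype(np.float32)
    return resized / 255.0

# Example: a synthetic 512 x 512 slice
slice_ = np.random.randint(0, 256, (512, 512))
out = preprocess_slice(slice_)
```

In practice the pipeline may well use bilinear interpolation (e.g. via OpenCV or PIL); nearest-neighbour sampling is used here only to keep the sketch dependency-free.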

Dataset
The KiTS challenge dataset for kidney tumour segmentation was used to assess the performance of WP-UNet. The KiTS dataset (Heller et al., 2019) consists of 210 high-contrast CT scans collected in the preoperative arterial phase. They were chosen from a cohort of subjects who underwent partial or radical nephrectomy (Kutikov et al., 2009) for one or more kidney tumours at the University of Minnesota Medical Center and were eligible for inclusion between 2010 and 2018. The included volumes are characterised by in-plane resolutions ranging from 0.437 to 1.04 mm and slice thicknesses ranging from 0.5 mm to 5.0 mm.
The dataset also provides the ground-truth masks of the kidney tissue and tumours (Figure 1) for each case. Under the guidance of experienced radiologists, a group of medical students manually generated the labels using only the axial projections of the CT scans. A detailed description of the ground-truth segmentation strategy is provided in Heller et al. (2019). The KiTS challenge dataset is provided with shape (number of slices, height, width) in the standard NIfTI format.

WP-UNet Model
Figure 2 shows the detailed architecture of the proposed WP-UNet model. The network has the encoder and decoder structure of the vanilla U-Net (Shen et al., 2015). As suggested by Liu et al. (2018), the input image is first passed into a standard convolution layer and subsequently into the encoder part of the WP-UNet block. Here, to improve the model's generalisation capacity, a depth-wise separable convolutional layer is used, which helps the network select features related to translation invariance with fewer parameters than the standard convolution layer (Karakanis et al., 2020).
WP-UNet encoding is composed of the following four blocks: Block 1: A standard convolution layer with filters, a ReLU activation function, and a batch normalisation layer.
Up-sampling is performed in the decoder section, which combines depth-wise separable convolutions and WP-UNet blocks, as shown in Figure 2. It consists of five blocks: Block 1: A depth-wise separable convolution layer whose features are concatenated with the dropout layer from Block 4 of the encoding path.
Blocks 2, 3, and 4: WP-UNet block and depth-wise separable layer concatenated with corresponding blocks from the encoding path.
Block 5: Two WP-UNet blocks and two depth-wise separable layers, with the last one as the final prediction layer (Figure 2).
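To illustrate why depth-wise separable convolutions shrink the model (the channel counts below are illustrative, not taken from the paper), compare the parameter count of a standard k × k convolution with that of its depth-wise separable factorisation:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depth-wise k x k filter per input channel, followed by a
    1 x 1 point-wise convolution that mixes channels."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 64, 128)       # 3*3*64*128 = 73728
sep = separable_params(3, 64, 128)  # 3*3*64 + 64*128 = 8768
print(std, sep, std / sep)
```

For a 3 × 3 kernel the separable layer needs nearly an order of magnitude fewer parameters, which is the main source of the savings exploited by WP-UNet.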
To improve the model performance and reduce the number of floating-point operations, we added network pruning (Liu et al., 2019) to the proposed architecture, as shown in Figure 4. The output of the pruned WP-UNet model (Han et al., 2016) includes the kidney region, tumour region, and background, as shown in Figure 5.
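Weight pruning of this kind is typically implemented by magnitude thresholding: the smallest-magnitude weights are zeroed, leaving a sparse tensor. A minimal NumPy sketch (the function and the 50% sparsity target are our illustration, not the paper's exact schedule):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of entries with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) > threshold, weights, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 16, 32))       # a mock convolution kernel
pruned = magnitude_prune(w, 0.5)
achieved = float((pruned == 0).mean())    # fraction of zeroed weights
```

Libraries such as the TensorFlow Model Optimization toolkit apply this idea gradually during training under a sparsity schedule rather than in one shot.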

Loss Function
In this study, the Adam optimiser (Kingma and Ba, 2014) is applied, which updates the network weights iteratively over the training data. Adam uses estimates of the first and second moments of the gradients to adapt the learning rate. Following Sabarinathan et al. (2019), the loss function is the sum of the categorical cross-entropy, the Dice loss of channel one (C0), and the Dice loss of channel two (C1), as defined in Eq. (1):

Total loss = L + Dice loss (C0) + Dice loss (C1)    (1)
where L is the cross-entropy loss and the Dice loss of each channel is given by Eq. (2):

Dice loss = 1 − (2 Σ_i y_i p_i + ϵ) / (Σ_i y_i + Σ_i p_i + ϵ)    (2)

where y_i and p_i are the ground-truth and predicted segmented images, respectively. Moreover, to ensure the stability of the loss function, the coefficient ϵ is used.
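A NumPy sketch of this combined loss (our own simplified form: pixel-wise cross-entropy plus one Dice loss per channel, with ϵ for stability; the paper's exact weighting may differ):

```python
import numpy as np

def dice_loss(y_true, y_pred, eps=1e-6):
    """1 - Dice coefficient for one channel; eps stabilises empty masks."""
    inter = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def cross_entropy(y_true, y_pred, eps=1e-7):
    """Categorical cross-entropy averaged over pixels."""
    return float(-np.mean(np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)),
                                 axis=-1)))

# Two-channel toy masks (C0 = kidney, C1 = tumour), perfect prediction
y_true = np.zeros((4, 4, 2))
y_true[:2, :, 0] = 1.0
y_true[2:, :, 1] = 1.0
y_pred = y_true.copy()

total = (cross_entropy(y_true, y_pred)
         + dice_loss(y_true[..., 0], y_pred[..., 0])
         + dice_loss(y_true[..., 1], y_pred[..., 1]))
```

With a perfect prediction, both Dice losses and the cross-entropy vanish, so the total loss is zero, as expected.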

Performance Metrics
The key performance metrics used to measure the WP-UNet performance on the CT scan dataset are explained in this subsection.

Accuracy (AC):
Accuracy measures the percentage of correct predictions and is given as

AC = (TP + TN) / (TP + TN + FP + FN)

where TP = correctly predicted positives, TN = correctly predicted negatives, FP = incorrectly predicted positives, and FN = incorrectly predicted negatives.
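As a quick sketch of this metric (the confusion counts below are illustrative):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified pixels: (TP + TN) over all pixels."""
    return (tp + tn) / (tp + tn + fp + fn)

# e.g. 90 true positives, 50 true negatives, 5 of each error type
ac = accuracy(90, 50, 5, 5)   # 140 / 150
```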

Mean Intersection over Union (Mean IOU):
The mean IOU (Hassibi and Stork, 1993) is a popular evaluation method for semantically segmented images that first determines the IOU for each semantic class and then averages over the classes:

Mean IOU = (1/C) Σ_c TP_c / (TP_c + FP_c + FN_c)

FLOPs: FLOPs counts the number of floating-point multiplications and additions performed by the processor of the computation device. Counting the floating-point operations of a network estimates the complexity of the proposed model.
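The per-class IOU average can be sketched as follows (label maps are assumed to hold integer class IDs; skipping classes absent from both prediction and ground truth is one common convention, not necessarily the paper's):

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes):
    """Mean of per-class intersection-over-union scores."""
    ious = []
    for c in range(num_classes):
        t, p = (y_true == c), (y_pred == c)
        union = np.logical_or(t, p).sum()
        if union == 0:          # class absent everywhere: skip it
            continue
        ious.append(np.logical_and(t, p).sum() / union)
    return float(np.mean(ious))

y_true = np.array([[0, 0], [1, 1]])
y_pred = np.array([[0, 1], [1, 1]])
miou = mean_iou(y_true, y_pred, 2)   # (1/2 + 2/3) / 2
```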

Training
The proposed network was trained with two outputs, namely the kidney and kidney tumour regions. The weight updates were performed using the Adam optimiser with a learning rate of 0.001. The batch size was set to 16, and the total number of epochs was set to 100. Training was based on Keras with a TensorFlow backend on Google Colab, using an NVIDIA T4 GPU (12 GB memory) with a high-memory virtual machine.
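Given the batch size of 16 and the image counts reported in the Results section, the optimiser workload per run can be estimated as follows (the step counts are derived from those figures, not taken from the paper's training logs):

```python
import math

batch_size = 16
epochs = 100
train_images = 35865      # training images reported in the Results section

steps_per_epoch = math.ceil(train_images / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)   # 2242 steps/epoch, 224200 Adam updates
```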

Results
The standard Dice score is used as the evaluation metric for the performance of the proposed WP-UNet model. We employed 35,865 training images and 10,158 validation images in our experiments. Table 1 shows the segmentation results of the proposed WP-UNet model for the training and validation images. From the table, we observe that during training, the proposed method achieves a Dice score of 0.98 for the tumour region. Similarly, the computational resource usage of our network is listed in Table 2. Based on the experimental results, we observe the power of network pruning in the proposed network: because network pruning is added to the proposed architecture, the total numbers of FLOPs and parameters are three times smaller than those of the typical U-Net architecture. Figure 6 shows the qualitative results of the proposed WP-UNet model on the KiTS19 dataset. We used the provided input images and ground-truth images to perform the experiments. The segmented output image is depicted in Figure 5. In the output image, the red-coloured area is the kidney region and the green-coloured part is the kidney tumour. Numerous structures outside the tumour and kidney areas were neglected for simplicity. The final segmented output closely matches the ground-truth image in the quantitative results, which demonstrates the usefulness of the proposed WP-UNet.

Conclusion
Medical image segmentation is an important preliminary step in the identification of kidney organ structure and tumour tissues in CT image scans to aid in illness diagnosis, treatment, and general analysis. Early diagnosis is necessary to help prevent complications that may arise from late detection. However, with the increasing availability of large biomedical data, the workload on nephrologists, radiologists, and other experts in the field has also increased. To provide easier, more accurate, and timely detection, several deep learning methods have been proposed, most of which have proven successful. The U-Net architecture is one such model that is widely accepted among researchers for biomedical image segmentation tasks.
In this study, a weight pruning U-Net (WP-UNet) was proposed for the segmentation of kidney tumour data with limited computational resources. The WP-UNet architecture makes use of depth-wise separable convolutions (Figure 2) and pruning to reduce the parameters and floating-point operations. Moreover, the WP-UNet deep learning method exhibits a faster inference speed than the U-Net method.
Our findings indicated that the proposed WP-UNet architecture yielded satisfactory accuracy. Our system obtained Dice scores of 0.9799 and 0.9599 for the training and validation sets, respectively. The proposed WP-UNet model achieved the best segmentation outcomes in terms of the Dice score and the usage of computational resources. Additionally, WP-UNet is shown to have a faster inference speed on test data and is beneficial in situations wherein rapid and accurate segmentation results are required.

Figures

Figure 1

Table 1 :
Comparison of results between WP-UNet and other models