DRD-UNet, a UNet-Like Architecture for Multi-Class Breast Cancer Semantic Segmentation

Staining of histological slides with Hematoxylin and Eosin is widely used in clinical and laboratory settings, as these dyes reveal nuclear structures as well as cytoplasm and collagen. For cancer diagnosis, these slides are used to recognize tissues and morphological changes. Tissue semantic segmentation is therefore important, and at the same time a challenging and time-consuming task. This paper describes a UNet-like deep learning architecture called DRD-UNet, which adds a novel processing block called DRD (Dilation, Residual, and Dense block) to a UNet architecture. DRD is formed by the combination of dilated convolutions (D), residual connections (R), and dense layers (D). DRD-UNet was applied to the multi-class (tumor, stroma, inflammatory, necrosis, and other) semantic segmentation of histological images from breast cancer samples stained with Hematoxylin and Eosin. The histological images were released through the Breast Cancer Semantic Segmentation (BCSS) Challenge. DRD-UNet outperformed the original UNet architecture and 15 other UNet-based architectures on the segmentation of 12,930 image patches extracted from regions of interest that ranged in size from $1036 \times 1222$ to $6813 \times 7360$ pixels. DRD-UNet obtained the best performance as measured with the Jaccard similarity index and Dice coefficient in a per-class comparison, and with accuracy for the overall segmentation.


I. INTRODUCTION
Image segmentation has been one of the most important tasks in computer vision, pattern recognition, and image analysis. The essence of image segmentation consists of assigning a label or class to each element of the image. Labels can correspond to organs in medical imaging [1], or human bodies in surveillance videos [2]. A plethora of segmentation approaches have been proposed through the years, from traditional image processing techniques [3] to, more recently, deep learning [4]. The fact that segmentation algorithms continue to be published every year illustrates that it is a difficult problem, with many challenges yet to be addressed [5], [6]. The use of deep learning architectures has grown significantly in recent years [7], in part due to the superior results that have been achieved in many areas, medical image analysis being just one of them [8], [9]. Another reason for the popularity of deep learning is that hand-crafted features and analysis are replaced by a learning process that results from training with a large amount of labeled data [10]. Deep learning-based semantic segmentation of biomedical images has grown dramatically since the first publication of UNet by Ronneberger in 2015 [11]. Whilst the standard UNet architecture, sometimes called vanilla UNet [12], has been used in numerous applications, modifications to the architecture have improved results [13]. However, it is considered that there is still room for improvement [14], [15], and any improvement would be welcome in medical environments where decisions like diagnosis or treatment could be influenced by the outcome of previous segmentation or classification steps. Indeed, it has been observed that the survival in cases of colon cancer can be predicted from histology slides [16].
The associate editor coordinating the review of this manuscript and approving it for publication was Essam A. Rashed.
In females, the probability of developing cancer is higher in the breast than in any other organ, at approximately 12.4% [17]. Early detection through screening programs is the best way to improve patient outcome and reduce mortality [18]. As such, mortality has decreased slightly since the 1970s [17], but breast cancer remains a significant concern for females worldwide [19]; thus, any improvement in treatment and diagnosis is to be pursued. Mammography is the first tool for diagnosis of breast cancer, and screening programs have shown significant reductions in breast cancer mortality [20]. However, mammography has limitations and harms: besides the radiation inherent in the procedure, its sensitivity and specificity are limited as compared with magnetic resonance imaging [21], and most importantly, its use can lead to overdiagnosis, which is considered the greatest harm of the technique [22]. Histopathology, or the examination of tissues under a microscope, can provide sufficient information to avoid overdiagnosis and overtreatment of breast cancer [23]. Histopathology provides a huge amount of information [16], and advances in molecular techniques, bioinformatics, and image analysis can provide complementary information to reveal diagnosis, treatment, and survival [24], [25], [26]. Whole slide imaging [27], which scans slides and captures images at different resolutions, has grown in popularity and provides huge datasets with a wealth of information. Besides the disadvantage of capital investment [28], there is the technical challenge of analyzing images that can comprise billions of pixels. To analyze these large images, which can be part of even larger datasets, computational approaches are thus required and can provide new perspectives in clinical practice [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], like grading tissue [39], [40], [41], [42], or proliferation of tumors [43].
This work presents a UNet-like architecture, which provides a multi-class semantic segmentation of breast cancer images stained with H&E. The main contributions of this paper are the following: 1) A novel processing block called DRD, which is a pyramid block with dilated convolutions, a residual connection, and a dense layer, is proposed. The addition of this block improved the segmentation performance on histological images. 2) Sixteen UNet-like architectures were objectively compared with the original UNet used as a baseline. The architecture proposed here, DRD-UNet, outperformed all other architectures.

II. RELATED WORK
Pyramids [44], [45] and quad trees [46] have been used for decades as multiresolution tools that allow the analysis of images at distinct levels of detail. The underlying concept of these techniques is to combine the values of neighboring pixels and subsample them to create a new image, or level of the pyramid, which is smaller in size than the previous one and whose values correspond to a neighborhood in a lower level and the filter or kernel applied. In this way, decisions that are taken at higher levels of the pyramid will have information from larger and larger neighborhoods of pixels. One of the first attempts to use these schemes for visual pattern recognition was the Neocognitron proposed by Fukushima [47]. A few years later, Le Cun and Bengio built on this approach by extending the features through a series of convolutions, thus creating the widely popular convolutional networks [48]. Deconvolutional Neural Networks (DCNN) [49] implement an inverse convolutional model through an upsampling process that increases the resolution of the output, and are extensively used in segmentation tasks; the precursor of DCNN is the Neocognitron proposed by Fukushima. The Fully Convolutional Network (FCN) was the first example of a deconvolutional model; for example, [50] incorporated a dense prediction layer and a fully connected Conditional Random Field.
Convolution with an extended kernel dilation is also known as Atrous convolution [51], and it has proved effective for the segmentation of objects at multiple scales, as well as for weakly supervised semantic segmentation [52].
UNet has been widely used for segmentation in medical imaging, and several modifications and improvements have been proposed [53], [54]. One of the first modifications of UNet was the addition of residual blocks [55] for liver lesion segmentation. ResUNet uses a UNet with residual connections at encoders and decoders and achieved first place in the liver tumor segmentation challenge in 2017. RMS-UNet added a residual block with dilated section convolutions [55], [56] and demonstrated high accuracy over different publicly available datasets. A generative adversarial network (GAN) with a dense UNet-based segmentor and a radiomics discriminator has also recently been proposed for liver lesion segmentation [57]. In brain segmentation, the stack Multi-Connection Simple Reducing Net (SMCSRNET) [58], formed with stacked coder-encoder blocks, improved segmentation with less training time, using a model with a reduced number of parameters. MH-UNet [59], a multi-scale network formed with dense and residual blocks, has been used for multi-organ segmentation. MI-UNet [60] uses a procedure called brain parcellation to generate an input to UNet for brain stroke segmentation in MRI. Spatial weighted UNet has been used for 3D CT brain images [61], with densely connected residual-inception blocks that reduce the number of trainable parameters over the MRI dataset. A Purified and Residual UNet, or P-ResUNet [62], is based on a Dilated Pyramid Block (DPB) and was used for brain tissue segmentation. This block consists of dilations of distances D = 1, 2, 3 in parallel. For retinal vessel segmentation, the images require enhanced contrast for accurate vessel detection, and approaches based on residual attention and supervised UNet have been proposed [63], [64], [65]. The network MI-UNet [66], which consists of two MI-UNets connected into one S-UNet, has been used for brain stroke lesions. For nuclei and cell segmentation, RIC-UNet [67], a network with residual blocks, multi-scale, and channel attention mechanisms, demonstrated superior performance over traditional methods. Residual-dense blocks have also been explored at the bottleneck connection to form D-UNet for lung vessel segmentation [68]. Segmentation of breast tumors has also been explored with different UNet architectures; for example, ResUNet [69] estimates volumetric measurements of breast cancer on magnetic resonance imaging. An architecture called Connected-UNets, which uses two UNets and additional modified skip connections, showed better visual results in segmenting mass lesions in mammograms [70]. Generative networks have also been explored: RDA-UNET-WGAN [71] employs a Residual-Dilated-Attention-Gate-UNet as the generator network. UNet3+, a framework with full-scale skip connections and deep supervision, has also been employed for 6 different architecture segmentation models [72].

III. MATERIALS AND METHODS

A. MATERIALS
The datasets used in this paper were obtained from the Breast Cancer Semantic Segmentation (BCSS) Challenge [73], which is publicly available through the Grand Challenges website (https://bcsegmentation.grand-challenge.org/).
The dataset was extracted from 151 formalin-fixed paraffin-embedded tissues stained with hematoxylin and eosin (H&E), all with histologically confirmed breast cancer, which are part of The Cancer Genome Atlas Program (https://portal.gdc.cancer.gov). Whole-slide images (WSIs) were acquired at magnifications of 40× (n = 138) and 20× (n = 13). Then, regions of interest (ROIs) were selected from the WSIs and manually annotated by pathologists, pathology residents, and medical students through a crowd-sourcing process as described in [73]. The maximum and minimum image sizes are 6813 × 7360 and 1036 × 1222 pixels, respectively. The mean ROI size was $39 \times 10^6$ pixels, which corresponds to 1.18 mm$^2$. A summary of the features of this dataset is presented in Table 1.
From the ROI images, 136 were selected to train the architectures and fifteen were used as a separate test set. Then, from the 136 ROIs, 12,930 non-overlapping pairs of data and label patches of dimensions 256 × 256 pixels were generated, as illustrated in Figure 1.c. The main characteristics of the patches are shown in Table 2.
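The tiling of an ROI into non-overlapping 256 × 256 patches can be sketched as follows. This is a minimal illustration, not the code used in this work: the function name `patch_grid` is hypothetical, and discarding partial tiles at the image border is an assumption, since the handling of borders is not stated in the text.

```python
def patch_grid(height, width, patch=256):
    """Top-left corners (row, col) of non-overlapping patch x patch tiles.

    Assumption: tiles that would extend beyond the image border are discarded.
    """
    rows = height // patch  # number of complete tiles along the vertical axis
    cols = width // patch   # number of complete tiles along the horizontal axis
    return [(r * patch, c * patch) for r in range(rows) for c in range(cols)]


# The smallest and largest ROI sizes reported for the dataset.
smallest = patch_grid(1036, 1222)
largest = patch_grid(6813, 7360)
print(len(smallest), len(largest))
```

Under this assumption, the smallest ROI yields 16 complete patches and the largest yields 728, which gives an idea of how 136 ROIs can produce several thousand training patches.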
The manual annotation of the tissue contained 22 different tissue classes (tumor, stroma, inflammatory, necrosis, glandular secretions, blood, fat, plasma cells, other immune infiltrate, mucoid material, normal acinus or duct, lymphatics, undetermined, nerve, skin, blood vessels, etc.), of which the first four classes (tumor, stroma, inflammatory, necrosis) contained the majority of pixels. All other classes were merged into a single category called other, following the instructions of the BCSS Challenge. After the merger, the proportions of the five classes were tumor 0.3964, stroma 0.3598, inflammatory 0.1031, necrosis 0.0660, and other 0.0747. These proportions suggest a moderate class imbalance and, as such, no measures to compensate for the class imbalance were taken.
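The merging of the annotated classes into five categories can be illustrated with a short sketch. The helper names `merge_class` and `class_proportions` are hypothetical, and working with class names rather than the numeric label codes of the BCSS files is an assumption made only for readability.

```python
# The four classes that are kept; every other annotation is mapped to "other".
MAIN_CLASSES = ("tumor", "stroma", "inflammatory", "necrosis")


def merge_class(name):
    """Map an annotated class name to one of the five categories used here."""
    key = name.strip().lower()
    return key if key in MAIN_CLASSES else "other"


def class_proportions(labels):
    """Fraction of labels in each of the five merged categories."""
    merged = [merge_class(label) for label in labels]
    return {c: merged.count(c) / len(merged) for c in MAIN_CLASSES + ("other",)}


# Toy example with four annotated pixels ("fat" is merged into "other").
print(class_proportions(["tumor", "tumor", "fat", "stroma"]))

# The proportions reported in the text add up to one.
reported = {"tumor": 0.3964, "stroma": 0.3598, "inflammatory": 0.1031,
            "necrosis": 0.0660, "other": 0.0747}
print(round(sum(reported.values()), 4))
```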
A 3D tensor structure (a cube) was used to represent the ground truth (GT), i.e., the expected output of a segmented image. This structure has the same size, in terms of pixels, as its corresponding input image, and consists of five channels in depth, which mark the presence of a pathology at each pixel.
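A ground-truth structure of this kind is commonly obtained by one-hot encoding a label map. The sketch below assumes integer labels 0 to 4 and the channel order listed in `CLASSES`; both are illustrative assumptions, as the text does not specify the encoding.

```python
# Assumed channel order; the actual order used in this work is not stated.
CLASSES = ("tumor", "stroma", "inflammatory", "necrosis", "other")


def one_hot(label_map, num_classes=len(CLASSES)):
    """Convert an H x W map of integer labels into an H x W x C structure.

    Channel c holds 1 where the pixel belongs to class c and 0 elsewhere.
    """
    return [[[1 if pixel == c else 0 for c in range(num_classes)]
             for pixel in row]
            for row in label_map]


# A 2 x 2 toy label map: tumor, stroma / other, inflammatory.
gt = one_hot([[0, 1], [4, 2]])
print(gt[0][1])  # channel vector of the pixel labeled as stroma
```

Each pixel has exactly one active channel, which matches the use of a loss for mutually exclusive classes.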

B. METHODS
Experimental modifications to the UNet architecture were systematically evaluated on the segmentation of the BCSS dataset. The modifications focused on two operations: dilated convolutions (Figure 2.a) and residual connections (Figure 3.a). Convolution is a fundamental operation that traverses an image and extracts the values of a neighborhood of pixels. The neighborhood is normally contiguous, but it is possible to dilate it by a factor of n and thus cover a larger area, as illustrated in Figures 2.b and 2.c. These dilated convolutions are sometimes called Atrous convolutions [56] and produce an increase in the receptive field. The essence of residual connections, also known as skip connections, is to add the output of a processing block to the input of that block. In terms of branches, these would go in parallel, as opposed to a single sequential process, as illustrated in Figure 3.a. Residual connections were introduced at the Computer Vision and Pattern Recognition (CVPR) 2016 conference [74], and have been shown to reduce training errors, overfitting, and vanishing gradient effects [55].
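Both operations can be illustrated with a one-dimensional toy example. The sketch below is not the implementation used in this work: the function names are hypothetical, zero padding is assumed at the borders, and a single fixed kernel stands in for the trainable processing block.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Same-size 1-D convolution (cross-correlation) with zero padding.

    Kernel taps are spaced `dilation` samples apart, so a larger dilation
    covers a wider neighborhood with the same number of weights.
    """
    centre = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - centre) * dilation
            if 0 <= idx < len(signal):  # samples outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def effective_kernel_size(kernel_size, dilation):
    """Extent covered by a kernel of `kernel_size` taps at a given dilation."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def residual_block(signal, kernel, dilation=1):
    """Add the output of the processing block to the input of that block."""
    processed = dilated_conv1d(signal, kernel, dilation)
    return [p + x for p, x in zip(processed, signal)]


impulse = [0, 0, 0, 1, 0, 0, 0]
print(dilated_conv1d(impulse, [1, 1, 1], dilation=2))
print(residual_block(impulse, [1, 1, 1], dilation=2))
print([effective_kernel_size(3, d) for d in (1, 2, 4)])
```

With three taps, dilations of 1, 2, and 4 cover extents of 3, 5, and 9 samples, which is the increase in receptive field described above, obtained without adding weights.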

1) UNET ARCHITECTURE
The UNet [11] is a convolutional neural network that was proposed for semantic segmentation. It is a purely convolutional model, with neither perceptrons nor other types of trainable units, and it is an autoencoder-like (AE) architecture. Following the traditional AE model, the UNet compresses its input image with a cascade of encoding blocks composed of convolutional and max-pooling layers. Then, using a cascade of decoding blocks composed of upsampling and convolutional layers, it decodes the compressed representation of the input and predicts its output. As is usual in deep learning, all convolutions are followed by non-linear ReLU activation functions, and a pixel-wise softmax activation function is applied at the output of the network. As shown in Figure 4, by using as many decoding blocks as there are encoding blocks, the UNet model describes a 'U' shape, hence its name. The key feature of the UNet is that it makes use of skip connections that concatenate the output of the i-th encoding block with its corresponding decoding counterpart. These connections serve two purposes: they mitigate the possibility of encountering vanishing gradient effects, and they increase the chance of exploiting visual patterns that might prove relevant for prediction but that could have been overlooked by the encoding process [11].
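The symmetry between encoding and decoding blocks can be followed by tracking feature-map sizes through the network. The sketch below is only bookkeeping, under several assumptions that are not taken from the text: size-preserving ("same") convolutions, 2 × 2 pooling, a doubling of channels at every level, upsampling that halves the number of channels, and hypothetical defaults of four levels and 64 base filters.

```python
def unet_shapes(input_size=256, depth=4, base_filters=64):
    """Track (spatial size, channels) through a UNet-like encoder and decoder.

    Returns the encoder outputs, the bottleneck, and for each decoding block a
    tuple (size, channels after concatenation, channels after the block).
    """
    encoder = []
    size, channels = input_size, base_filters
    for _ in range(depth):
        encoder.append((size, channels))  # output of the encoding block
        size //= 2                        # 2 x 2 max pooling halves the size
        channels *= 2                     # assumed doubling of channels
    bottleneck = (size, channels)

    decoder = []
    for enc_size, enc_channels in reversed(encoder):
        size *= 2                         # upsampling restores the encoder size
        # Concatenation of upsampled features and the matching encoder output.
        concatenated = enc_channels + enc_channels
        decoder.append((size, concatenated, enc_channels))
    return encoder, bottleneck, decoder


encoder, bottleneck, decoder = unet_shapes()
for level in encoder:
    print("encoder", level)
print("bottleneck", bottleneck)
for level in decoder:
    print("decoder", level)
```

Because each decoding block recovers the spatial size of its encoding counterpart, the concatenation is possible without cropping under these assumptions, and the output has the same size as the 256 × 256 input patch.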

2) UNET MODIFICATIONS
The UNet architecture was systematically modified into different architectures, illustrated in Figure 5 and Table 3. The first five architectures implemented in this work included residual connections [55]. The residual connection was evaluated at the input layer of the encoder (ResUNet-i), at all layers of the encoder (ResUNet-e), and at all layers of the encoder and decoder (ResUNet). A modified ResUNet with batch normalization (BN) added after the convolutional layer was evaluated as ResUNet-BN. Notice that the original UNet does not include the BN layer.
Inception blocks with standard convolutions of size 3 × 3 were modified to include dilated convolutions of size 2 and 4, as illustrated in Figures 2.a-2.c. Next, different convolution dilations were evaluated by using the DPB block at the input layer of the encoder (DPB3-i), at all layers of the encoder (DPB3-e), at all layers of the encoder and decoder (DPB3-a), and with a dilation of D = 5 at the input layer of the encoder (DPB5-i). Combinations of residual connections and Atrous convolutions were also evaluated: dilations of size D = 3 at the input layer of the encoder with residual connections over all the layers of both the encoder and decoder (DPB3-i+Res), and the same combination with an input dilation of D = 4 (DPB4-i+Res). A ResUNet-a block as presented by [75], used for the segmentation of remote sensing data with dilations up to size D = 31, was evaluated at the encoder. Dilation in series was also evaluated at the input (Series-i) and at the encoder and decoder blocks (Series-e/d). RMS-UNet [56] combines residual connections and Atrous convolutions with D = 2, 4 inserted in the residual connection (Figure 3.c). This configuration, reported to achieve high performance with minimal loss and low computational cost, was also evaluated (RMS-UNet). Finally, a block that combines dilation (DPB3), residual, and dense components (DRD), shown in Figure 3.d, is proposed and evaluated in this work.
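The difference between dilations applied in parallel (as in the DPB block) and in series (as in the Series variants) can be quantified through the receptive field. The sketch below assumes 3 × 3 kernels with stride 1 and considers one spatial dimension only; the function names are hypothetical, and the exact layer layout of each evaluated block is given in the figures rather than here.

```python
def branch_extent(kernel=3, dilation=1):
    """Extent covered along one dimension by a single dilated convolution."""
    return kernel + (kernel - 1) * (dilation - 1)


def parallel_receptive_field(dilations, kernel=3):
    """Parallel branches see the same input, so the widest branch dominates."""
    return max(branch_extent(kernel, d) for d in dilations)


def series_receptive_field(dilations, kernel=3):
    """Stacked stride-1 convolutions: each layer adds (kernel - 1) * dilation."""
    field = 1
    for d in dilations:
        field += (kernel - 1) * d
    return field


dilations = (1, 2, 3)
print([branch_extent(3, d) for d in dilations])
print(parallel_receptive_field(dilations))
print(series_receptive_field(dilations))
```

For D = 1, 2, 3, the parallel branches cover 3, 5, and 7 pixels, so the block sees several scales at once with a maximum extent of 7, whereas the same dilations in series accumulate to a receptive field of 13 pixels.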

3) HYPER-PARAMETER EXPLORATION
Hyper-parameters were explored on the baseline UNet and were then kept fixed for all the other architectures. Batch sizes between 1 and 16 were evaluated, and the best results were provided by a mini-batch of size 8. Three optimizers were evaluated: stochastic gradient descent with momentum (sgdm), root mean square propagation (rmsprop), and adaptive moment estimation (adam). Adam provided the best results. Training scenarios of 5, 10, and 30 epochs were evaluated, and 10 epochs were chosen as this provided good results in a fraction of the time. The loss function employed was the Cross Entropy for k Mutually Exclusive Classes [76]. The same hyper-parameters were used for all subsequent variations of the UNet architecture.
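For reference, the cross-entropy loss for mutually exclusive classes can be sketched as follows, using the usual definition $-\sum_i t_i \ln y_i$ over the softmax outputs $y_i$ and one-hot targets $t_i$. The function names are hypothetical, and averaging over pixels is an assumption; the normalization applied by the toolbox used in this work may differ.

```python
import math


def softmax(scores):
    """Pixel-wise softmax over the class scores (numerically stabilized)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, target_one_hot, eps=1e-12):
    """Cross entropy for one pixel with a one-hot (mutually exclusive) target."""
    return -sum(t * math.log(max(p, eps))
                for p, t in zip(probabilities, target_one_hot))


def mean_cross_entropy(pixel_scores, pixel_targets):
    """Average of the per-pixel losses (averaging is an assumption)."""
    losses = [cross_entropy(softmax(s), t)
              for s, t in zip(pixel_scores, pixel_targets)]
    return sum(losses) / len(losses)


tumor = [1, 0, 0, 0, 0]
uninformed = [0.0, 0.0, 0.0, 0.0, 0.0]  # equal scores for the five classes
confident = [5.0, 0.0, 0.0, 0.0, 0.0]   # high score for the correct class
print(cross_entropy(softmax(uninformed), tumor))
print(cross_entropy(softmax(confident), tumor))
```

A prediction that gives equal probability to the five classes has a loss of ln 5, approximately 1.609, and the loss decreases towards zero as the probability assigned to the correct class approaches one.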

4) IMPLEMENTATION DETAILS
All the programming was performed on a computer with an Intel Core i7-7700K processor (CPU at 4.20 GHz), 16 GB RAM, and an Intel graphics P4000 GPU. The platform used was MATLAB® version 2023a (The MathWorks™, Natick, MA, USA) with the Deep Learning Toolbox. The code is publicly available in the GitHub repository: https://github.com/mauOrtRuiz/DRD-UNet.

5) PERFORMANCE ESTIMATION
To assess the performance of the architectures proposed in this work, pixels were classified into four categories: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) when compared against the ground truth, as illustrated in Figure 6. Where these have been calculated on a per-class basis, the sub-index i has been used, i.e., $TP_i$. From these, the following pixel-wise, class-wise metrics were calculated: Dice coefficient, Jaccard similarity index (also known as intersection over union), specificity, and sensitivity. In addition, overall accuracy (i.e., not on a per-class basis) was calculated.
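The metrics can be sketched with their usual definitions, which are assumed here: $\mathrm{Dice}_i = 2TP_i / (2TP_i + FP_i + FN_i)$, $\mathrm{Jaccard}_i = TP_i / (TP_i + FP_i + FN_i)$, $\mathrm{Sensitivity}_i = TP_i / (TP_i + FN_i)$, $\mathrm{Specificity}_i = TN_i / (TN_i + FP_i)$, and overall accuracy as the fraction of correctly labeled pixels. The function names are hypothetical, and images are represented as flat lists of integer labels for simplicity.

```python
def confusion_counts(truth, predicted, cls):
    """Count TP, TN, FP, and FN for one class over paired pixel labels."""
    tp = tn = fp = fn = 0
    for t, p in zip(truth, predicted):
        if p == cls and t == cls:
            tp += 1
        elif p == cls:
            fp += 1
        elif t == cls:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def _ratio(numerator, denominator):
    """Division that returns NaN when a class is absent (zero denominator)."""
    return numerator / denominator if denominator else float("nan")


def class_metrics(truth, predicted, cls):
    """Per-class Dice, Jaccard, sensitivity, and specificity."""
    tp, tn, fp, fn = confusion_counts(truth, predicted, cls)
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": _ratio(tp, tp + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


def overall_accuracy(truth, predicted):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


truth = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
print(class_metrics(truth, predicted, cls=1))
print(overall_accuracy(truth, predicted))
```

In this toy example, class 1 has two true positives, one false positive, and no false negatives, which gives a Dice coefficient of 0.8 and a Jaccard index of 2/3, while four of the six pixels are correct overall.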

IV. RESULTS AND ABLATION ANALYSIS
From the 12,930 pairs of image and label patches, 10,736 were used to train the architectures previously described and 2,194 were used for validation. Figure 7 shows the training and validation loss curves of the previously published architectures and DRD-UNet. Then, the trained models were used to perform semantic segmentation of the fifteen ROIs that had been selected as the test set. The performance of each architecture was measured as described previously. The results for Dice, Jaccard, and Accuracy are presented in Table 4, and the results for Sensitivity and Specificity are presented in Table 5. Results of the semantic segmentation of selected regions of the images are presented in Figures 8 and 9 for all architectures. The sixteen models presented in this work are the result of a series of investigations that constitute an ablation study in themselves, as they incrementally incorporate variants into the base model. Therefore, we analyze our results from such an ablation perspective.
First, we can notice that adding residual connections (ResUNet) increases the number of parameters but seems to reduce the overall performance, except for the class ''Inflammatory'', where the performance increases. This behavior holds regardless of the section of the UNet where the residual connections are placed. Additionally, notice that Batch Normalization (ResUNet-BN) seems to have no consistent impact, sometimes increasing and sometimes lowering the performance, although only marginally.
The performance rises for the classes ''stroma'' and ''necrosis'' when the dilated convolution operation is added to the architecture (P-ResUNet). This suggests that residual connections with dilated convolutions are a suitable combination. However, when the dilated convolution is used as part of the DPB module, it has a null contribution, except for the class ''necrosis''. Such results suggest that the dilated convolution by itself brings little contribution, as opposed to its combination with the residual connections.
Regarding the architectures where the dilation of the convolutions is increased gradually (Series and RMS), these also improve or lower the performance depending on the particular class, but fail to remain consistently better in all cases. It is only the integration of all variants that makes a robust architecture (DRD-UNet), which achieves the highest performance for three classes, as well as on average for all five classes.
In terms of Sensitivity and Specificity, the best results were distributed among the architectures, and in several cases there were draws between several architectures, like the specificity for stroma (DRD-UNet, ResUNet-BN, DPB5-i, Series-i). For necrosis, the specificity results were remarkably close between the best (ResUNet, ResUNet-BN) at 0.99 and DRD-UNet at 0.98. Similarly, the specificity for Other of DRD-UNet was 0.98, while that of DPB3-a was 0.99.
Figures 8 and 9 illustrate the semantic segmentation results with selected sections of the images that contain a representative region of each of the five classes; columns correspond to the classes and rows correspond to the architectures. TP are labeled in white, TN in black, FP in green, and FN in pink. The best results of DRD-UNet are visible in the smaller green and pink regions for tumor, stroma, and other, which correspond to the values of Jaccard and Dice shown in Table 4. For inflammatory and necrosis, the green and pink areas are slightly larger than those of other architectures, but since tumor and stroma are the most common classes, these are the ones that have a greater impact on the overall accuracy, giving DRD-UNet an advantage.

V. DISCUSSION
The addition of the proposed DRD block into DRD-UNet provided better results of Jaccard, Dice, and Accuracy in the fifteen images of the BCSS challenge. Whilst the set is relatively small, the results are encouraging and are worth considering in future experimentation. The results highlight the importance of dilated convolutions to increase the receptive field, for which we used the purified DPB3 block [62] with Atrous convolutions of dilation D = 1, 2, 3. Experimentation revealed there was no significant improvement after adding several residual blocks at different UNet levels. Under the assumption that performance can be extended by adding a deeper layer, a dense block was proposed after the residual connection. The reduction of training loss indicates rapid convergence in terms of training time. Further work can be done to evaluate this operation over different datasets and other types of cancers and stainings.
First, our analysis was performed on a single dataset that contained only triple-negative breast cancer cases. Whilst this is a limitation in the diversity of cancer cases, we consider that for the purposes of comparing a series of architectures, the conclusions in terms of performance should still be valid. However, further studies to confirm the performance of DRD-UNet with larger datasets and with other cancers would be useful to demonstrate the capabilities of the architecture. A closely related observation is that all the images were stained with H&E. Further experimentation with other stainings should be done to explore the capabilities of the architectures compared here. Second, in some cases the differences in performance are quite small. Indeed, for specificity, several implementations provided exactly the same metrics or varied only by 0.01. Third, in this work we focused on the accuracy of the semantic segmentation provided by sixteen architectures derived from the original UNet. Further work could consider the segmentation and classification, not of tissue regions, but of specific cells (e.g., Tumor Infiltrating Lymphocytes [77], [78]), or nuclei (e.g., [79], [80], [81], [82]), which can provide further information to analyze cancer datasets. Further, by using the nuclei as a starting point, it is then possible to determine the extent of the cell and from there isolate and analyze other structures like the extracellular matrix [83].

VI. CONCLUSION
In this work, we presented a UNet-like architecture called DRD-UNet (Dilation, Residual, and Dense block). DRD-UNet was used to perform semantic segmentation of a multi-class breast cancer image dataset provided by the Breast Cancer Semantic Segmentation Challenge. DRD-UNet was compared systematically against the basic UNet architecture and fifteen variations, and provided the best results in terms of overall accuracy, and in terms of Dice coefficient and Jaccard similarity index for three (tumor, stroma, other) of the five classes.

FIGURE 1. Illustration of the histological images and annotations. (a) One representative histological image stained with Hematoxylin and Eosin (H&E) from the Breast Cancer Semantic Segmentation (BCSS) Challenge. (b) A rectangular region of interest is denoted by overlaid colors that correspond to labels of interest for this work: tumor (red), stroma (green), inflammatory (purple), necrosis (blue), and other (gray). (c) A zoom of the region of interest with additional yellow lines that indicate patches of size 256 × 256, which will be extracted for the training and validation of the proposed DRD-UNet and other architectures.
Figure 1.a,b illustrates the selection of an ROI from a WSI, and the annotations associated to the ROI.

FIGURE 4. The UNet architecture as proposed by Ronneberger [11]. It consists of encoder-decoder sections and contains a series of downsampling steps (encoder) obtained by convolutions and downsampling operations, and then upsampling steps (decoder) formed by upsampling plus convolution operations.

FIGURE 5. Illustration of the modifications over the original UNet architecture. (a) Input encoders (blocks shaded in light green) and decoders (blocks shaded in darker green) were replaced by the corresponding block under study. (b) Atrous convolutions in series, up to D = 3, were inserted at the input encoder. (c) Atrous convolutions in series were inserted at every encoder.

FIGURE 6. Illustration of the calculation of performance metrics with a synthetic image with 4 classes. (a) Ground truth. The classes are denoted by different shades and numbers next to them (the numbers are not part of the image). (b) Estimated image. (c) Ground truth and estimated images overlaid. True Positives (TP) are white, True Negatives (TN) are black, False Positives (FP) are dark gray, and False Negatives (FN) are bright gray. (d) Illustration of overall accuracy; pixels correctly estimated are shown in white and incorrect ones are shown in black. (e-h) Illustration of per-class TP, TN, FP, FN. Notice the FN in class 3 and the FP in class 4.

FIGURE 7. Comparison of training loss and validation loss curves for the architectures that have been published, UNet-3 [11], ResUNet-BN [55], P-ResUNet [62], ResUNet-a [75], RMS-UNet [56], and DRD-UNet (ours). (a-f) correspond to training loss curves, (g-l) correspond to validation loss curves. For all cases, the training loss is shown in blue with transparency and a weighted average thicker black line is overlaid. DRD-UNet shows a rapid decrease in training loss, similar to RMS-UNet, and there is no indication of overfitting in the validation curves. The case with the slowest decrease is ResUNet-a, which ends above UNet-3. ResUNet-BN shows a fast decrease in the training loss, with probably the lowest values, but in the validation there is an increase, which suggests some overfitting.

FIGURE 8. Semantic segmentation results of five different classes with the first 8 models evaluated. The first row shows representative input patches that focus on a region where each class is present. The second row shows the classes of the ground truth overlaid on the input patch with colors: necrosis (dark purple), inflammatory (light purple), stroma (green), tumor (red), and other (gray label). Rows 3 to 10 are the results of the first 8 models. TP are labeled in white, TN in black, FP in green, and FN in pink.

FIGURE 9. Semantic segmentation results of five different classes with models 9 to 17. The first row shows representative input patches that focus on a region where each class is present. The second row shows the classes of the ground truth overlaid on the input patch with colors: necrosis (dark purple), inflammatory (light purple), stroma (green), tumor (red), and other (gray label). Rows 3 to 11 are the results of the evaluated models. TP are labeled in white, TN in black, FP in green, and FN in pink.

TABLE 3. Details of the experimental design. A traditional UNet was used as a baseline model, and a series of modifications of the encoder and decoder were applied (ablation studies). All models were trained in the same way: mini-batch of size 8, 10 epochs with a random shuffle of images at every epoch, initial learning rate of 1e-3, and the adaptive moment estimation optimization algorithm. Columns 2-6 indicate, respectively: total number of trainable parameters in millions, number of layers, brief description, figure if applicable, and reference if applicable.

TABLE 4. Performance of sixteen deep learning architectures and the proposed DRD-UNet described in Table 3. Per-class performance is measured with Jaccard Similarity Index and Dice coefficient; per-image performance is measured with accuracy. Best results are highlighted in bold.

TABLE 5. Performance of sixteen deep learning architectures and the proposed DRD-UNet described in Table 3. Per-class performance is measured with Sensitivity and Specificity. Accuracy is repeated from Table 4 for convenience. Best results are highlighted in bold.