Dmbg-Net: Dilated multiresidual boundary guidance network for COVID-19 infection segmentation

Accurate segmentation of infected regions in lung computed tomography (CT) images is essential for the detection and diagnosis of coronavirus disease 2019 (COVID-19). However, lung lesion segmentation has some challenges, such as obscure boundaries, low contrast and scattered infection areas. In this paper, the dilated multiresidual boundary guidance network (Dmbg-Net) is proposed for COVID-19 infection segmentation in CT images of the lungs. This method focuses on semantic relationship modelling and boundary detail guidance. First, to effectively minimize the loss of significant features, a dilated residual block is substituted for a convolutional operation, and dilated convolutions are employed to expand the receptive field of the convolution kernel. Second, an edge-attention guidance preservation block is designed to incorporate boundary guidance of low-level features into feature integration, which is conducive to extracting the boundaries of the region of interest. Third, the various depths of features are used to generate the final prediction, and the utilization of a progressive multi-scale supervision strategy facilitates enhanced representations and highly accurate saliency maps. The proposed method is used to analyze COVID-19 datasets, and the experimental results reveal that the proposed method has a Dice similarity coefficient of 85.6% and a sensitivity of 84.2%. Extensive experimental results and ablation studies have shown the effectiveness of Dmbg-Net. Therefore, the proposed method has a potential application in the detection, labeling and segmentation of other lesion areas.


Introduction
The ongoing pandemic poses significant challenges to public health systems and societies at large. The global tally of confirmed coronavirus disease 2019 (COVID-19) cases reached 770,875,433 as of September 27, 2023 [1]. This figure represents the cumulative number of individuals who have tested positive for the virus across countries and regions worldwide. The worldwide spread of the pandemic has created a need for machine-based tools that detect signs of COVID-19 in diagnostic imagery [2,3]. The COVID-19 outbreak has caused a global public health crisis with unprecedented disruption and long-lasting effects [4]. Moreover, nonlaboratory evaluations, such as computer-aided analysis of chest radiographs (X-ray) or computed tomography (CT) scans, have been implemented to scrutinize the lungs for signs of COVID-19 [5]. Compared to X-ray, CT screening is widely preferred because of its advantages, including a three-dimensional view of the lungs [6,7]. CT enhances the efficiency of medical image analysis conducted by physicians, strengthens their clinical image perception skills and improves patient treatment rates.
Although there have been advancements in the development of intelligent diagnostic systems and treatment methods for COVID-19, numerous challenges still need to be addressed [8]. First, the diverse morphology and varied positioning of infected areas in lung CT images significantly affect feature analysis and information extraction, especially for object boundaries and small targets. Second, infected areas have a wide range of infection characteristics, and the images have low contrast between normal tissue and infected lesions, which makes it difficult to segment unclear structural boundaries.
Benefiting from the rapid advancement of deep learning techniques [9,10], Rani et al. [11] successfully addressed a potential obstacle to feature extraction in chest X-rays by implementing bone suppression and lung segmentation preprocessing techniques. These methods not only overcome the obstacle but also preserve as much spatial information and resolution as possible. Fan et al. [12] developed the Inf-Net network by employing reverse attention and implementing semisupervised learning. Additionally, several works have tried to model the global context from the perspective of boundary information using an encoder-decoder architecture [13,14]. However, these methods fail to offer the flexibility needed for adjusting the receptive fields to match the diverse scales of the intended objects. The fixed receptive fields of the feature map hinder their ability to dynamically accommodate the varying scales of the targets. In contrast, a few studies have shown that learning the appropriate receptive fields and achieving an optimal configuration can enhance the perception of semantic cues associated with various objects [15,16].
Several studies on computer-aided detection and diagnosis have focused on reducing the high false-positive rate [17-19]. More recently, object segmentation methods have leveraged global appearance models to accurately identify and delineate target regions, encompassing both foreground and background. Yan et al. [20] designed a multiscale cascading deep belief network that computes the Fourier spectrum to capture characteristics at multiple scales. The diagnosis network acquires richer and distinctive feature information from the original signal [21,22]. Bougourzi et al. [23] proposed a hybrid loss function to address the challenge of segmenting COVID-19 infection pixels with blurry boundaries. Furthermore, most prior research has concentrated on precision in specific areas while disregarding the boundaries [24-26]. Fan et al. [27] integrated boundary features with the multi-layer output features to generate the final output. Additionally, for the specific tasks associated with medical image segmentation, a deeper understanding of the intrinsic relationships among individual pixels is essential [28,29].
The infected areas in lung CT images are usually scattered with complex backgrounds [30,31]. The edge detection method uses local gradient representation to identify object boundaries and then separates the closed-loop region into objects [32]. This idea is used in network design. However, directly transmitting the coarse low-level features may introduce redundant interference. Several studies have provided evidence that edge information can serve as a valuable constraint in guiding the extraction of features [33,34]. Most of the previous studies focused on regional accuracy but neglected boundary quality. In fact, both clear regions and clear boundaries are important in the segmentation of COVID-19 infection. This study incorporates explicit modelling of edge information within the network to effectively utilize edge cues.
The main contributions of this study are as follows: 1) A novel model, the dilated multiresidual boundary guidance network (Dmbg-Net), is proposed, which can learn more boundary details of lung lesions. In this method, a dilated residual block is used to expand the receptive field. To fully leverage the contextual information, the method allows for exponential enlargement of the receptive field while maintaining resolution and intact coverage. To effectively capture the low-contrast and boundary areas, edge information is generated by an edge-attention guidance preservation (EGP) block, which provides detailed structured information. 2) A loss function is designed to amplify the flow of gradients, which effectively increases the saliency of the pertinent regions and suppresses that of the irrelevant regions. In the encoding path, the Dmbg-Net encoder-decoder framework is built on ResNet-50, which helps the network converge quickly. The results of ablation studies verified the effectiveness and accuracy of the proposed model. In comparative experiments, the proposed method outperforms state-of-the-art convolutional neural networks (CNNs), including U-Net, UNet++ and GFNet.
The remainder of this paper is organized as follows. Section 2 provides an overview and detailed component description of the proposed approach. Section 3 presents the experimental validation and evaluation of COVID-19 infection segmentation tasks using the proposed model. Section 4 discusses the shortcomings of the proposed model relative to existing segmentation networks. Finally, the conclusion is presented in Section 5.

The proposed method
In this section, we begin by providing a detailed overview of our Dmbg-Net, focusing on its network architecture. Subsequently, we cover various network components, including the dilated residual block, edge-attention guidance preservation block and edge aggregation pile block. Furthermore, we clarify how we utilize a receptive field strategy to enhance segmentation accuracy and employ a progressive multi-scale supervision strategy to effectively segment fine-grained lung lesions. Finally, we introduce the loss function utilized by our model.

Structure of Dmbg-Net
The structure and details of the proposed Dmbg-Net are shown in Figure 1. The framework is based on an encoder-decoder architecture. It comprises five primary components: a feature encoder, dilated residual block, edge-attention guidance preservation (EGP) block, feature decoder and edge aggregation pile (EP) block. We have provided comprehensive details of the dilated residual block in Figure 1 for clarity. The model uses ResNet as a backbone network. To further enhance the features learned from ResNet, the dilated residual block adapts the receptive fields to capture contextual information at the most suitable scale. Additionally, the EGP block is specifically designed to effectively handle shape edges, particularly semantic boundaries. Finally, the EP module has been developed to progressively guide the integration of multilevel feature maps, enabling the capture of richer semantic information. CT images are monochromatic, which means that the pixel values in all three channels are identical when the image is uploaded. Depending on the nature of the image, a monochromatic input may possess three channels (RGB image). We employ a 3 × 3 convolutional module with a stride of 1 and padding of 1 to process the input image. Subsequently, batch normalization is applied, followed by rectified linear unit (ReLU) activation. This approach can effectively process monochromatic CT images while preserving high accuracy in medical image segmentation tasks.
The feature extraction stream of our Dmbg-Net architecture consists of a stack of three convolutional layers: a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and another 1 × 1 convolutional layer. On the one hand, we use a skip connection to allow information to flow directly from one layer to another without passing through intermediate layers. The model has the capability to generate high-level features that are specific to each class. On the other hand, adding the max-pooling layer reduces model overfitting and improves its ability to capture global features in the input image. The Dmbg-Net architecture is divided into five layers, with a pooling layer linking each pair of consecutive stages. The first two layers consist of three cascaded convolutional layers, each with a different function. The first layer comprises a standard convolutional layer with a kernel size of 3 × 3. This layer employs a series of filters to extract low-level features. The second layer is a depthwise separable convolutional layer that applies separate filters to each channel of the low-level feature map. Each feature decoder includes a 1 × 1 convolution, a 3 × 3 transposed convolution and a 1 × 1 convolution. Therefore, this model combines hierarchical features from all the convolutional layers into a holistic framework. This holistic framework allows for automatic learning of all parameters, eliminating the need for additional overhead or increased complexity.
Given the significance of semantic relations and boundary constraints in the segmentation task, we present a comprehensive model for COVID-19 infection segmentation. The proposed model emphasizes the decoding process by incorporating global semantics and boundary constraints. In each feature encoder, we jointly consider the dilated convolutions, residual learning and edge guidance module to offer ample multiscale context for feature decoding. The traditional approach for segmenting ground-glass opacities (GGOs) relies on either grey-level information or the Euclidean distance, often leading to erroneous GGO segmentation. To ensure the efficacy of learning, we employ explicit supervision that compares the generated boundary map with the boundary ground truth acquired through the boundary extractor. Specifically, this supervision tackles the challenge of scattered lesion positions and irregular shapes. By leveraging the attributes derived from the feature decoder, we effectively transfer the representations of edge attention to the layers at a higher level to improve the final results.

Dilated residual block
To ensure accurate and complete regions, we propose a dilated residual block, which is composed of dilated convolutions and one residual path. Dilated convolutions, sometimes called atrous convolutions or convolutions with holes, have gained substantial popularity in deep learning. The convolutional kernel weight is set to zero at the corresponding (hole) locations, which is given by Eq (1):
$$ y(p) = (x * \theta_{k,r})(p) = \sum_{j=1}^{k} x(p + r \cdot j)\,\theta(j) \tag{1} $$
where $*$ denotes the convolution operation and $\theta_{k,r}$ convolves the input feature map $x$. For a dilated convolution $\theta_{k,r}$ with a kernel size of $k$, the dilation rate $r$ determines the pattern of these holes. By employing this approach, the receptive field is effectively enlarged without the need to introduce any additional network parameters that require learning.
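To make the "convolution with holes" idea concrete, the following is a minimal one-dimensional sketch in plain Python. It is an illustration only, not the authors' implementation: the function names `dilated_conv1d` and `insert_holes` are hypothetical, no padding is used, and, as in most deep learning frameworks, the operation is implemented as cross-correlation. It shows that a dilated kernel with rate r gives the same output as a dense kernel with r - 1 zeros inserted between its taps, so the receptive field grows without adding weights.

```python
def dilated_conv1d(x, weights, rate):
    """Valid (unpadded) 1D dilated convolution: y[i] = sum_j x[i + rate*j] * w[j]."""
    k = len(weights)
    span = rate * (k - 1) + 1  # extent of the input covered by one output value
    return [
        sum(x[i + rate * j] * weights[j] for j in range(k))
        for i in range(len(x) - span + 1)
    ]


def insert_holes(weights, rate):
    """Build the equivalent dense kernel with rate-1 zeros between taps."""
    dense = []
    for j, w in enumerate(weights):
        dense.append(w)
        if j < len(weights) - 1:
            dense.extend([0.0] * (rate - 1))
    return dense


# Toy signal and a 3-tap kernel (values chosen only for illustration).
x = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0, 29.0]
w = [1.0, -2.0, 1.0]

y_dilated = dilated_conv1d(x, w, rate=2)               # 3 learnable taps, span of 5
y_dense = dilated_conv1d(x, insert_holes(w, 2), rate=1)  # 5-tap kernel with zeros
print(y_dilated, y_dense)
```

Both calls cover five input samples per output value, but the dilated version needs only three learnable weights.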
In state-of-the-art medical image algorithms, dilated convolutions are used to enlarge the receptive field while maintaining spatial information. Different from the existing methods, we utilize different dilated convolution rates to learn information about the contents of various high-level features. The dilated residual block comprises two sets of 3 × 3 convolutions, as in Eq (2):
$$ y = x + \Phi_c\big(\Phi_c(x)\big), \qquad \Phi_c(x) = \mathrm{ReLU}\big(\mathrm{BN}(\theta_{k,2i} * x)\big) \tag{2} $$
where $*$ denotes the convolution operation and $\theta_{k,2i}$ convolves the input feature map $x$. Here, $\theta_{k,2i}$ is a dilated convolution with a kernel size of $k$ and a dilation rate of 2 × $i$ ($i$ denotes the model feature decoder number). $\Phi_c$ denotes a 3 × 3 convolution with $c$ channels, followed by the BatchNorm layer and a ReLU activation function.
Specifically, we employ an encoder architecture that consists of three convolution layers with dilation rates of 2, 4 and 8. As shown in Figure 1, this design results in an effective kernel size of 5, 9 and 17 for each convolution operation at their respective levels. The feature maps of the dilated residual block are combined through a residual connection. This aggregation process ensures seamless integration of the feature maps, enhancing the overall performance of the model. The use of dilated convolutions in the encoder enables the network to capture contextual information from a larger receptive field while preserving spatial resolution, which is important for contextual information processing. After each encoder layer, a max pooling operation with a stride of 2 and a 2 × 2 kernel is applied. This operation decreases the spatial dimensions of the feature maps by half while preserving their essential features. The feature maps processed by these dilated residual blocks have sizes of 88 × 88, 176 × 176 and 352 × 352.
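The effective kernel sizes and feature map sizes quoted above follow from two standard formulas, which the short sketch below evaluates. The helper names are hypothetical; the only inputs taken from the text are the 3 × 3 kernel, the dilation rates 2, 4 and 8, the 352 × 352 input, and the 2 × 2 max pooling with stride 2.

```python
def effective_kernel_size(kernel_size, dilation):
    """Extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def pooled_size(size, kernel=2, stride=2):
    """Spatial size after unpadded pooling."""
    return (size - kernel) // stride + 1


# 3 x 3 kernels with dilation rates 2, 4 and 8.
effective = [effective_kernel_size(3, d) for d in (2, 4, 8)]

# A 352 x 352 input halved by two successive 2 x 2, stride-2 poolings.
sizes = [352]
for _ in range(2):
    sizes.append(pooled_size(sizes[-1]))

print(effective)  # [5, 9, 17]
print(sizes)      # [352, 176, 88]
```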
The bottleneck convolution blocks are depicted in Figure 1. These blocks execute the combined residual transformation. The bottleneck is employed to compel the model to learn a compressed representation of the input data. This compressed representation should exclusively encompass the vital and valuable information needed for the decoder to restore the input. In the initial U-Net architecture, the bottleneck comprises two convolutional layers with a size of 3 × 3, both activated by the rectified linear unit (ReLU) function. As shown in Figure 2, the bottleneck of the multi-scale middle block includes average pooling, convolutional units and max pooling. Each convolution includes a BatchNorm layer and a ReLU activation function. The bottleneck employs a dilation rate of 8 while maintaining a stride of 1 and padding of 8. This architecture facilitates efficient feature extraction from medical images of multiple scales while preserving spatial resolution. By utilizing dilated convolutions in the encoder, we can expand the receptive field of each layer without compromising the spatial resolution. By incorporating feature encoder layers with distinct receptive field sizes, Dmbg-Net facilitates the acquisition of multiscale information at the low level and object level. This unique attribute of our network enhances its ability to capture more boundary information.
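The choice of padding 8 for dilation 8 and stride 1 is what keeps the spatial resolution unchanged. The sketch below applies the usual convolution output-size formula to check this. It assumes a 3 × 3 kernel for the bottleneck convolution and a 44 × 44 input purely for illustration; neither value is stated for the bottleneck in the text, and `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Standard output-size formula for a (dilated) convolution along one axis."""
    effective = dilation * (kernel_size - 1) + 1  # extent of the dilated kernel
    return (size + 2 * padding - effective) // stride + 1


# Assumed 3 x 3 kernel and 44 x 44 input, with the stated dilation 8, stride 1.
same = conv_output_size(44, kernel_size=3, stride=1, padding=8, dilation=8)
shrunk = conv_output_size(44, kernel_size=3, stride=1, padding=0, dilation=8)

print(same)    # 44: padding equal to the dilation preserves resolution
print(shrunk)  # 28: without padding the map loses 16 pixels per axis
```

More generally, for a 3 × 3 kernel with stride 1, setting the padding equal to the dilation rate always preserves the input size.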

Edge-attention guidance preservation block
The foundation of our framework takes significant inspiration from the research of Fan et al. [27]. The authors assumed that combining edge detection and region segmentation approaches could be beneficial. To optimize the feature extraction process in Dmbg-Net, we design an edge-attention guidance preservation (EGP) block that fuses the features of each layer by leveraging the concept of edge detection. These representations are used to guide our proposed approach and improve the precision of the final outcomes. The EGP block consists of five inputs, two outputs and a boundary aggregation (BA) unit component. The encoder layers E1 and E2 enhance the boundary information of the high-level decoder features D2, D3 and D4. Specifically, this block focuses on acquiring boundary context, conserving the distinctive features of local edges within the E1 and E2 layers, and using BA units to aggregate multiscale side outputs from the decoding layer.
To emphasize the significant boundaries in feature decoding, modifications are made to the U-Net network by incorporating edge attention representations. We devise the BA module to incorporate the boundary guidance of low-level features into feature integration. Details of this module are shown in Figure 3. In the BA module, the initial step involves utilizing global average pooling to consolidate the overall contextual details of the inputs. Subsequently, two 1 × 1 convolutional layers, each employing distinct nonlinear activation functions, are employed to assess the significance of each layer and produce the weights across the channel dimension. By utilizing sigmoid functions, we can generate more distinctive features in the output, thereby enhancing the representativeness of the results. Several studies have demonstrated that edge information can offer valuable constraints to direct the extraction of features for segmentation [12,13]. Consequently, because low-level features retain ample edge information, they are incorporated into the proposed model. The resolution of the output feature from the BA unit is matched by upsampling the outputs of the EGP block, feeding them to the 1 × 1 and 3 × 3 convolutional layers and concatenating them. Next, we evaluate the dissimilarity between the generated edge map and the ground truth (GT) edge map. This evaluation is performed using the standard binary cross-entropy (BCE) loss function. The BCE loss of the edge map can be written as Eq (3):
$$ \mathcal{L}_{edge} = -\sum_{x=1}^{w}\sum_{y=1}^{h}\Big[G_e \log(S_e) + (1 - G_e)\log(1 - S_e)\Big] \tag{3} $$
where $(x, y)$ are the coordinates of each pixel in the predicted edge map $S_e$ and edge ground-truth map $G_e$. The $w$ and $h$ parameters represent the dimensions of the respective maps. While obtaining the feature map, this study calculates the gradient of the ground truth of the input image to obtain the edge ground truth of the boundary. The edge-attention guidance preservation block serves as a mechanism for transferring edge information from early encoding layers to high-level layers, where it can be combined with other features to guide segmentation. Using this approach, the model can leverage the strengths of edge detection and attention mechanisms to enhance the efficiency of the image segmentation network.
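The edge supervision described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' code: the edge ground truth is taken as every pixel where a forward-difference gradient of the binary mask is nonzero, and the BCE loss is summed over all pixels. The function names and the toy 4 × 4 mask are hypothetical.

```python
import math


def edge_ground_truth(mask):
    """Mark pixels where the forward-difference gradient of a binary mask is nonzero."""
    h, w = len(mask), len(mask[0])
    edge = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = mask[y][min(x + 1, w - 1)] - mask[y][x]  # horizontal difference
            gy = mask[min(y + 1, h - 1)][x] - mask[y][x]  # vertical difference
            edge[y][x] = 1 if (gx != 0 or gy != 0) else 0
    return edge


def edge_bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy between a predicted edge map and the edge GT, summed over pixels."""
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for s, g in zip(pred_row, target_row):
            s = min(max(s, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= g * math.log(s) + (1 - g) * math.log(1.0 - s)
    return total


# Toy infection mask: a 2 x 2 lesion in a 4 x 4 image.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
edge_gt = edge_ground_truth(mask)

uncertain = [[0.5] * 4 for _ in range(4)]           # uninformative prediction
perfect = [[float(v) for v in row] for row in edge_gt]

loss_uncertain = edge_bce_loss(uncertain, edge_gt)
loss_perfect = edge_bce_loss(perfect, edge_gt)
print(edge_gt, loss_uncertain, loss_perfect)
```

A prediction that matches the edge ground truth drives the loss towards zero, whereas an uninformative map of 0.5 everywhere costs log 2 per pixel.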

Edge aggregation pile block
We design a progressive multiscale supervision strategy for edge aggregation pile (EP) blocks. Details of this module are shown in Figure 1 (right). Incorporating skip connections and leveraging the concatenation of feature decoders, the EP block consists of two inputs, one output and a multiscale supervision (MSS) block, which are described in detail below. As illustrated in Figure 4, the EP blocks consist of two convolutional units with a size of 3 × 3. These units have a dilation rate of 2 × $i$, where $i$ represents the number of feature decoders in the model. The stride is set to 1, and the padding is adjusted to 2 × $i$. The output of the block aims to create segmentation outcomes by utilizing the feature encoder and feature extractor. By incorporating skip connections, the feature decoder can acquire additional information from the encoder, compensating for the information loss caused by pooling and convolutional operations. These techniques enable efficient feature extraction at multiple scales while preserving spatial information and reducing overfitting. Next, a 3 × 3 convolutional layer and a bilinear upsampling module are employed, and then a 1 × 1 convolutional layer is applied, followed by a sigmoid activation function. The proposed model benefits from deep multiscale supervision, as it incorporates channels with various sizes of 176, 88 and 44. These modules allow Dmbg-Net to incorporate edge information at multiple scales and improve its ability to segment complex objects with irregular shapes. With the constraint of deep supervision, we can acquire enhanced feature mapping and generate final predictions. By simultaneously considering the information pertaining to the three high-level attributes D4, D3 and D2, a unified output feature resolution is achieved. Effective guidance helps the network learn missing components and intricate aspects of the perimeter, resulting in more comprehensive and accurate predictions. Hence, optimizing the gradient flow throughout the various layers of the model during the backpropagation process makes it possible to achieve faster convergence.

Loss function
The loss function $\mathcal{L}_{seg}$ is defined as a combination of the weighted IoU loss function $\mathcal{L}_{IoU}^{w}$ and the weighted binary cross entropy (BCE) loss function $\mathcal{L}_{BCE}^{w}$. The loss is calculated as follows:
$$ \mathcal{L}_{seg} = \mathcal{L}_{IoU}^{w} + \lambda \mathcal{L}_{BCE}^{w} $$
where $\lambda$ represents the weight (assigned a value of 1 in this study). The two components of $\mathcal{L}_{seg}$ offer efficient global (image-level) and local (pixel-level) guidance. This approach ensures that the obtained results are reliable. Deep supervision involves computing the loss for both the hidden layers and the overall model, subsequently refining the model by leveraging the aggregated loss value. This study implements deep supervision for four outputs ($\mathcal{L}_{seg}^{i}$, $i$ = 1, 2, 3, 4) and the boundary loss $\mathcal{L}_{edge}$. The corresponding total loss is given as follows:
$$ \mathcal{L}_{total} = \sum_{i=1}^{4} \mathcal{L}_{seg}^{i} + \mathcal{L}_{edge} $$
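A minimal sketch of how such a combined loss could be assembled is given below, in plain Python on flattened probability maps. Several details are assumptions rather than facts from the text: the per-pixel weights default to 1 (the weighting scheme is not specified here), the IoU term uses a smoothing constant of 1, and the total loss is taken as the sum of the four segmentation losses plus the edge loss. All function names are hypothetical.

```python
import math


def weighted_bce(pred, truth, weights, eps=1e-7):
    """Weighted mean of the per-pixel binary cross-entropy."""
    total = 0.0
    for p, g, w in zip(pred, truth, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += w * -(g * math.log(p) + (1 - g) * math.log(1.0 - p))
    return total / sum(weights)


def weighted_iou(pred, truth, weights, smooth=1.0):
    """Weighted soft IoU loss; `smooth` is an assumed stabilising constant."""
    inter = sum(w * p * g for p, g, w in zip(pred, truth, weights))
    union = sum(w * (p + g) for p, g, w in zip(pred, truth, weights)) - inter
    return 1.0 - (inter + smooth) / (union + smooth)


def seg_loss(pred, truth, weights=None, lam=1.0):
    """L_seg = L_IoU^w + lam * L_BCE^w, with lam = 1 as stated in the text."""
    if weights is None:
        weights = [1.0] * len(truth)  # assumption: uniform pixel weights
    return weighted_iou(pred, truth, weights) + lam * weighted_bce(pred, truth, weights)


def total_loss(side_outputs, truth, edge_loss):
    """Deep supervision: sum the segmentation loss of every output, then add the edge loss."""
    return sum(seg_loss(s, truth) for s in side_outputs) + edge_loss


truth = [1, 1, 0, 0]
good = [0.9, 0.8, 0.1, 0.2]  # close to the ground truth
bad = [0.4, 0.3, 0.6, 0.7]   # mostly wrong

loss_good = seg_loss(good, truth)
loss_bad = seg_loss(bad, truth)
loss_total = total_loss([good, good, good, good], truth, edge_loss=0.5)
print(loss_good, loss_bad, loss_total)
```

The IoU term responds to the overlap of the whole map, while the BCE term penalises each pixel independently, which matches the global and local roles described above.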

Image database and evaluation indicators
The labelled CT images are taken from the COVID-19 CT segmentation dataset, composed of 100 axial CT images collected by the Italian Society of Medical and Interventional Radiology [35]. Although only 98 images are available, this dataset is the first open-access resource for segmenting lung infections caused by COVID-19. We extracted 920 high-quality CT images from the COVID-19 CT collection dataset [36], which comprises twenty 3D CT volumes obtained from different COVID-19 patients. To better train the model and obtain a relatively sufficient training sample, the two datasets were combined to obtain 1018 high-quality CT images (98 + 920), which were further divided into 718 training images and 300 test images.
We utilize the Dmbg-Net architecture described in this study for the infected region experiment, incorporating dilated convolutions. We use an optimal receptive field strategy to improve Dmbg-Net, primarily based on an encoder-decoder network. This strategy is contrasted with two well-known segmentation models, U-Net and UNet++, and the most recent model GFNet. The source code of the proposed method is available at https://github.com/pure-sky/Dmbg-Net.
Based on the studies by Fan et al. [12] and Fan et al. [27], we use six widely adopted metrics, i.e., the Dice similarity coefficient (DSC), sensitivity (Sen.), structure measure ($S_\alpha$), enhanced-alignment measure ($E_\phi$), mean absolute error (MAE) and precision (Prec.). The formula provided below represents a novel metric that evaluates the local and global similarity of two binary maps:
$$ E_\phi = \frac{1}{w \times h}\sum_{x=1}^{w}\sum_{y=1}^{h}\phi\big(S_p(x, y), G(x, y)\big) $$
where $w$ and $h$ represent the width and height of the ground-truth map $G$, respectively, and $(x, y)$ represents the coordinates of each pixel in $G$. The symbol $\phi$ is an enhanced alignment matrix. The structure measure is defined as
$$ S_\alpha = (1 - \alpha) \times S_o(S_p, G) + \alpha \times S_r(S_p, G) $$
where $\alpha$ is the balance factor used to control object-aware similarity $S_o$ and region-aware similarity $S_r$. In this study, we employ the same setting as the original work by utilizing the default value ($\alpha$ = 0.5). The primary purpose of the structure measure $S_\alpha$ is to assess the degree of structural similarity between the prediction map and the ground-truth mask. The MAE metric measures the pixel-level error between $S_p$ and $G$, which is written as:
$$ \mathrm{MAE} = \frac{1}{w \times h}\sum_{x=1}^{w}\sum_{y=1}^{h}\big|S_p(x, y) - G(x, y)\big| $$
where $w$ and $h$ represent the width and height of the ground-truth map $G$, respectively, and $(x, y)$ represents the coordinates of each pixel in $G$. Among these indicators, Sen. and $S_\alpha$ can reflect the segmentation integrity, and the DSC, Prec., $E_\phi$ and MAE can evaluate the overall performance. For all metrics except MAE, higher values indicate better performance.
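For reference, the simpler overlap metrics listed above (DSC, Sen., Prec. and MAE) can be computed as in the sketch below, which uses their standard definitions on flattened binary maps. The structure and enhanced-alignment measures are omitted because they require the full definitions from their original papers. The function names and the toy maps are hypothetical.

```python
def confusion_counts(pred, truth):
    """True positives, false positives and false negatives for flat binary maps."""
    tp = sum(1 for p, g in zip(pred, truth) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, truth) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, truth) if p == 0 and g == 1)
    return tp, fp, fn


def dice(pred, truth):
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # two empty maps agree perfectly


def sensitivity(pred, truth):
    tp, _, fn = confusion_counts(pred, truth)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(pred, truth):
    tp, fp, _ = confusion_counts(pred, truth)
    return tp / (tp + fp) if (tp + fp) else 0.0


def mean_absolute_error(pred_prob, truth):
    """Pixel-level error between a probability map and the ground truth."""
    return sum(abs(p - g) for p, g in zip(pred_prob, truth)) / len(truth)


# Toy flattened maps: 2 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]

print(dice(pred, truth), sensitivity(pred, truth),
      precision(pred, truth), mean_absolute_error(pred, truth))
```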

Implementation details and experimental results
The proposed model is implemented with the PyTorch toolbox and trained on a single Quadro RTX 6000 (24 GB) GPU. To ensure fairness in the training process, we apply a consistent resizing technique to all input data, resulting in a standardized size of 352 × 352. Our training approach for Dmbg-Net incorporates a multi-scale strategy [12,34]. We begin by resampling the training images with varying scaling ratios, such as 0.75, 1 and 1.25. Subsequently, we employ the resampled images to train Dmbg-Net, thereby enhancing the overall generalization capabilities of the proposed model. The batch size is set to 4, and the number of epochs is set to 200. The training utilizes the Adam optimizer, with a learning rate of 3e-4. To assess the segmentation capabilities of the Dmbg-Net model proposed in this study, we employ it for COVID-19 infection segmentation. We conduct a comparative analysis against the classical algorithms currently in use. Figure 5 illustrates the visual comparison among the proposed model and other state-of-the-art methods. The original CT images are presented in the first column, while the second column displays the evaluation standard, which represents the manual marking performed by radiologists. From left to right are the results of the proposed method, U-Net [9], UNet++ [37], Attention-UNet [38], FCN [10], Inf-Net [12], GFNet [27], BCS-Net [13] and BS-Net [14]. The proposed method demonstrates advantages in accuracy, completeness and sharpness compared to other techniques. For instance, in the first image, classical medical image segmentation networks, including U-Net, UNet++ and Attention-UNet, frequently fail to effectively mitigate the interference caused by background regions, such as the areas between the left lung and the background. The proposed method successfully overcomes this challenge, ensuring precise and comprehensive segmentation results.
The COVID-19 segmentation network exhibits superior performance in terms of recognition. However, the existing approaches, including Inf-Net, GFNet and BCS-Net, are unable to completely mitigate these interferences. For instance, the suppression of the area above the right lung proves to be inadequate. The proposed Dmbg-Net preserves the structural boundary of the desired area, even in cases where the structural boundary of the image is unclear or exhibits textural variations. Figure 6 is the box plot comparing Dmbg-Net with other state-of-the-art techniques. The training loss curve shows that the proposed method achieves fast convergence. The findings demonstrate that our Dmbg-Net outperforms all existing methods and exhibits superior stability and robustness. Our proposed approach demonstrates superior performance in these areas and showcases a heightened capability to identify intricate details. The eighth row of images validates the accurate detection of infected area boundaries and the effective suppression of extraneous background noise. In addition, the proposed method has a more complete structure and sharper boundaries. In the second image, novel approaches, including GFNet and BS-Net, fail to consistently and comprehensively identify infected lesions in the lower section of the left lung. Conversely, our method successfully detects these regions with precision and completeness. Furthermore, compared with the currently existing methods, including UNet++ and Attention-UNet, our method results in clear boundaries, which are crucial for both academic exploration and practical applications in the early identification of lung lesions.
In CT images, the segmentation results of the U-Net model are generally the least satisfactory, with relatively rough boundaries. On the other hand, FCN-8s demonstrates varying levels of image oversaturation. According to the experimental findings, the segmentation outcomes achieved by Dmbg-Net surpass those of other algorithms, including UNet++. This novel approach exhibits superior segmentation performance and enhances the overall image quality. A quantitative analysis of the segmentation results is crucial because it mitigates the potential biases introduced by subjective factors and ensures a more objective assessment. The quantitative comparisons are reported in Table 1. The Dmbg-Net model outperforms the other models across all five evaluation metrics when tested on the dataset, clearly demonstrating its superiority. Our proposed model's Dice coefficient reaches 85.6%, with a precision of 87.9%, and the segmentation effect is relatively more consistent. For example, compared to Inf-Net, GFNet has a percentage gain of 2.17% in DSC scores, while the proposed method has a percentage gain of 3.01%. In terms of  , peak performance is attained. More precisely, the proposed method exhibits an 8.25% enhancement in performance compared to UNet++ and a 3.90% improvement compared to Attention-UNet. The proposed Dmbg-Net framework demonstrates superior performance in the segmentation of COVID-19 infection. Moreover, compared to the runner-up method BCS-Net, there is a noticeable increase of 3.78% in precision. In terms of quantitative evaluations, our detection capabilities are generally superior, ensuring a high level of accuracy in identifying lung infections.
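The multi-scale training strategy can be illustrated by computing the image sizes produced by the three scaling ratios. The sketch below is an assumption-laden illustration: the text gives only the ratios and the 352 × 352 base size, while rounding each size to the nearest multiple of 32 is a common convention in related segmentation code and is assumed here, not confirmed for Dmbg-Net. The helper name `multiscale_sizes` is hypothetical.

```python
def multiscale_sizes(base_size=352, ratios=(0.75, 1.0, 1.25), multiple=32):
    """Training sizes per scaling ratio, rounded to a multiple of `multiple` (assumed)."""
    return [int(round(base_size * r / multiple) * multiple) for r in ratios]


train_sizes = multiscale_sizes()
print(train_sizes)  # [256, 352, 448]
```

Keeping every size divisible by 32 means that repeated stride-2 downsampling in the encoder always yields integer feature map sizes.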

Fine-grained segmentation in a multiclass dataset
The objective of this research is to develop a framework for segmenting lung infections associated with COVID-19. These lung infection areas have visible imaging manifestations caused by ground glass opacity (GGO), consolidation and pulmonary fibrosis. Segmenting infection regions in multiple classes offers additional support for assisted diagnosis. This study utilizes the dataset provided by Zhang et al. [39], which comprises 150 CT scans. It includes 750 slices featuring a lung field and templates for GGO and consolidation segmentation. In this study, we use 500 slices for training, while 250 slices are allocated for testing. The proposed model demonstrates satisfactory segmentation results for GGOs and consolidation, which are shown in Table 2. These outcomes are compared with those of ENet in Paszke et al. [40] and the two existing methods mentioned in Zhao et al. [32]. Table 2 demonstrates that, compared to alternative approaches, the proposed approach performs well in segmenting fine-grained infection areas, with a higher IOU for GGOs. The proposed method achieves an IOU of 57.45% for GGO segmentation, a relative gain of 9.81% over the second-ranked SCOAT-Net at 52.32%. Multiclass segmentation of lung infection poses a significant challenge due to the subtle variations in imaging manifestations observed between GGOs and consolidation. In addition, the initial design of Dmbg-Net does not cater explicitly to consolidation segmentation but rather aims to identify abnormal regions within the lungs. Nevertheless, the proposed model consistently achieves high accuracy in segmenting both single-object and multi-class infection areas. Because the segmentation method handles variations across infection datasets, the proposed model exhibits robustness and strong generalization capabilities.
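The 9.81% figure quoted for GGO segmentation is a relative gain, not a difference in percentage points, which the short check below makes explicit. The helper name `relative_gain` is hypothetical; the two IOU values are taken from the text.

```python
def relative_gain(new, reference):
    """Relative improvement of `new` over `reference`, in percent."""
    return (new - reference) / reference * 100.0


iou_proposed = 57.45  # IOU of the proposed method on GGOs (%)
iou_scoat = 52.32     # IOU of SCOAT-Net on GGOs (%)

gain = relative_gain(iou_proposed, iou_scoat)
absolute_gap = iou_proposed - iou_scoat
print(round(gain, 2), round(absolute_gap, 2))  # 9.81 (relative %), 5.13 (points)
```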

Ablation study
Ablation experiments are carried out to validate the performance and effectiveness of the key components of the proposed model, namely the dilated residual block and the edge-attention guidance preservation (EGP) block. The results are given in Table 3 and Figure 7.

First, we examine the impact of the dilated residual block. The primary purpose of incorporating dilated convolution layers is to enhance crucial spatial positional features and to establish correlations between pixels, which improves the precision and completeness of the segmentation. Removing the dilated residual block lowers all six metrics in Table 3, with a particularly significant drop in the DSC. In Column d of Figure 7, without the dilated residual block, some small infections in the top left of the first image are not detected, and interference noise in the left lung is not suppressed. The right region of the second image contains visibly incorrect results. The bottom left corner of the third image contains a large falsely segmented area that does not appear when dilated convolution is used. These examples illustrate the effectiveness of the dilated residual block.

Second, to assess the effectiveness of the EGP block, we attach it to the core network without establishing any connections between the boundary and the network. The final output of the network is obtained from the features of the second decoding phase. The EGP block makes full use of the features of the initial encoder layers, E1 and E2, to complement the higher-level decoder features with boundary details. This configuration clearly outperforms the base network, which indicates that edge information is vital to segmentation. As shown in Figure 7, the first image in Column e shows obvious over-segmentation at the boundary, while the right area of the second image has blurred boundaries and obvious false detections. In the upper right section of the third image, interference noise is not suppressed. When the foundational network does not include the EGP block, the DSC, MAE and ϕ values are 84.8%, 1.5% and 90.3%, respectively. With the EGP block attached, the values are 85.6%, 1.4% and 96.4%, respectively. The fourth image without the EGP block in Figure 7 shows the lower portion of the left lung; the infection in the ground truth (GT) is mostly discrete, so its segmentation depends heavily on the boundary information learned by the model, and the infection spreads within the upper half of the lung. Although the Dice coefficient without the edge aggregation pile block is close to that of the proposed model for overall segmentation, the right lung region of the fourth image in Column f is missed in our global segmentation model. The improvements in each of the three metrics demonstrate the effectiveness of the proposed EGP block. Through the integration of multiscale supervision, the segmentation of scattered and inconspicuous lesions is significantly improved by effectively leveraging contextual information.

Discussion
The infection is scattered because GGO affects different areas of the lungs, and its distribution varies across the dataset. Background noise and low contrast make it challenging to detect edge information with traditional edge detection techniques. Dmbg-Net adds an edge-attention guidance preservation block to target regions for feature extraction and uses a dilated residual block to expand the receptive field while extracting multiscale contextual information. To evaluate the proposed model comprehensively, we perform segmentation on two publicly available medical datasets. Dmbg-Net performs better than the other models both in identifying precise borders and on three assessment indicators.
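The effect of dilation on the receptive field can be made concrete with a small calculation. A convolution with kernel size k and dilation rate d covers d(k - 1) + 1 input positions along each axis, so stacking dilated layers enlarges the receptive field without adding weights. The sketch below applies this standard formula; the dilation rates (1, 2, 4) are assumed for illustration only, since this section does not state the rates used in Dmbg-Net.

```python
def effective_kernel(kernel, dilation):
    """Span (in input positions) covered by a dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1


def receptive_field(layers):
    """Receptive field of a stack of convolutions along one axis.

    `layers` is a sequence of (kernel, dilation, stride) tuples, ordered
    from the input towards the output.
    """
    rf, jump = 1, 1  # jump = spacing of adjacent outputs, measured in input pixels
    for kernel, dilation, stride in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride
    return rf


# Three plain 3x3 convolutions versus three dilated ones (rates are hypothetical).
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]
print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

Under these assumed rates, the dilated stack sees a 15 x 15 neighbourhood with the same number of weights as the plain stack, which sees only 7 x 7.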
In this study, we employ the dilated residual block architecture as a substitute for the conventional convolutional operations found in U-Net. By utilizing Dmbg-Net, we effectively augment the resolution of feature maps within the deeper layers, thereby enhancing the receptive fields of the input features. Additionally, the number of model parameters increases slightly to accommodate the restricted target distribution in the affected area; the effect of this change is not statistically significant. In contrast to Bose et al. [16] and Fan et al. [27], we introduce border detection techniques that retain considerable border information in the E1 and E2 features provided by the backbone network. In an encoder-decoder structure, edge detection enhances the sensitivity and precision of lung segmentation by suppressing background noise while generating finer semantic segmentation maps. Dmbg-Net offers great potential for the quick identification of COVID-19 and the quick separation of pulmonary infections, especially for distinguishing the contours and intricate boundaries of small-scale target lesions. The distribution of classes in the second column of Figure 8 indicates an imbalance in the data categories. GGO and consolidation lesions are sparsely represented, while most images predominantly depict unaffected regions of the lungs. The potential features in areas of interest can be seen in the slices shown in Figure 8, and the shape, size, type and location of the infected areas during COVID-19 infection are highly diverse.

Despite the promising results achieved by Dmbg-Net in segmenting infected regions, the current model has certain limitations. First, Dmbg-Net focuses solely on the intact boundary of lung infection in COVID-19 patients. However, in clinical practice, it is often necessary to triage COVID-19 patients before proceeding with further treatment. In future work, we propose integrating an automatic diagnosis system for lung lesions, COVID-19 detection, segmentation and quantification of lung infection into a unified framework. Second, in our multiclass infection segmentation framework, we adapt Dmbg-Net to guide and supervise the multiclass markers associated with different types of lung infections. Due to the scarcity of high-quality labeled data, this approach may result in suboptimal learning performance. We will study a semi-supervised segmentation framework that leverages unlabeled data to reduce the dependence on large amounts of labeled data.

Conclusions
Computer-aided COVID-19 infection segmentation is an effective approach for the early detection and diagnosis of lung lesions. A novel model, Dmbg-Net, is proposed and developed in this work, which uses an encoder-decoder framework with dilated convolution layers. These layers work together to refine boundaries and impose semantic constraints. Dmbg-Net is also applied to the multiclass segmentation of infections. The dilated residual block is designed to select the most crucial encoder features from the perspective of significant spatial information and contextual interdependence. The EGP block is designed to provide edge guidance, which can effectively mitigate unclear boundaries. Ablation studies have verified its effectiveness and accuracy. The proposed

Figure 2. The bottleneck of the multi-scale middle block.

Figure 3. The structure of the boundary aggregation unit.

Figure 4. The structure of the edge aggregation pile block.

Figure 6. Comparison of different methods by the box plot. (a) The box plot includes different approaches of the FCN, UNet++, Attention-UNet, CE-Net and the proposed Dmbg-Net. (b) The training loss curve of the Dmbg-Net.

Figure 7. Comparing the visual aspects of various distinct modules of Dmbg-Net. (a) Images, (b) ground truth, (c) Dmbg-Net, (d) w/o dilated residual block, (e) w/o edge-attention guidance preservation block and (f) w/o edge aggregation pile block.

Figure 8. Visual comparison of lung field, GGO and consolidation segmentation with different examples in multiclass datasets. (a) Images, (b) lung field, (c) GGO and (d) consolidation. Yellow is true positives, red is false positives and green is false negatives.

Table 1. Comparison of different methods on the segmentation of the COVID-19 dataset.

Table 2. Quantitative analysis of various networks for the segmentation of ground glass opacities and consolidations in multiclass datasets.