DMDF-Net: Dual multiscale dilated fusion network for accurate segmentation of lesions related to COVID-19 in lung radiographic scans

The recent disaster of COVID-19 has brought the whole world to the verge of devastation because of its highly transmissible nature. In this pandemic, radiographic imaging modalities, particularly computed tomography (CT), have shown remarkable performance for the effective diagnosis of this virus. However, the diagnostic assessment of CT data is a human-dependent process that requires considerable time from expert radiologists. Recent developments in artificial intelligence have substituted several personal diagnostic procedures with computer-aided diagnosis (CAD) methods that can make an effective diagnosis, even in real time. In response to COVID-19, various CAD methods have been developed in the literature that can detect and localize infectious regions in chest CT images. However, most existing methods do not provide cross-data analysis, which is an essential measure for assessing the generality of a CAD method. A few studies have performed cross-data analysis; nevertheless, their methods show limited results in real-world scenarios because generality issues remain unaddressed. Therefore, in this study, we attempt to address these generality issues and propose a deep learning-based CAD solution for the diagnosis of COVID-19 lesions from chest CT images. We propose a dual multiscale dilated fusion network (DMDF-Net) for the robust segmentation of small lesions in a given CT image. The proposed network mainly utilizes the strength of multiscale deep features fusion inside the encoder and decoder modules in a mutually beneficial manner to achieve superior segmentation performance. Additional pre- and post-processing steps are introduced in the proposed method to address the generality issues and further improve the diagnostic performance. In particular, the concept of post-region-of-interest (ROI) fusion is introduced in the post-processing step, which reduces the number of false positives and provides a way to accurately quantify the infected area of the lung.
Consequently, the proposed framework outperforms various state-of-the-art methods by accomplishing superior infection segmentation results with an average Dice similarity coefficient of 75.7%, Intersection over Union of 67.22%, Average Precision of 69.92%, Sensitivity of 72.78%, Specificity of 99.79%, Enhance-Alignment Measure of 91.11%, and Mean Absolute Error of 0.026.


Introduction
Recently, the global disaster of coronavirus disease 2019 (COVID-19) has afflicted millions of people and triggered a socioeconomic crisis worldwide. According to the figures given by the WHO (World Health Organization, WHO Coronavirus Disease (COVID-19) Dashboard, 2021), as of June 18, 2021, approximately 176,693,988 positive cases of the COVID-19 virus, including 3,830,304 deaths (a 2.17% mortality rate), had been reported globally. Additionally, the advent of different variants of COVID-19 has further caused an alarming situation worldwide owing to their more contagious impact. Regarding the treatment of COVID-19, different experimental vaccines (Kim, Marks, & Clemens, 2021) have completed clinical assessments and are authorized by the European Medicines Agency (EMA) and/or the Food and Drug Administration (FDA). However, mass production and worldwide distribution of the COVID-19 vaccines remain challenging and time-consuming tasks. Until now, preventive measures and early diagnosis have been the only solutions to prevent further spread of this deadly virus. In the context of diagnosis, molecular tests, such as the nucleic acid amplification test (NAAT) and reverse transcription polymerase chain reaction (RT-PCR), are being performed to identify positive cases (Ai et al., 2020). However, these subjective evaluations are performed under strict clinical conditions, which can limit the use of these testing methods in outbreak regions.
Recent studies (Ai et al., 2020; Fang et al., 2020) have found that chest computed tomography (CT) is a cost-effective diagnostic tool for the identification of COVID-19 infection. Fig. 1 shows a few CT images of different patients infected with the COVID-19 virus, with the infected regions indicated by red boundary lines. The quantitative results presented in (Ai et al., 2020) show that the visual evaluation of COVID-19 infection from lung CT images reached a sensitivity of 97% with RT-PCR testing results as the reference. Similar results in (Fang et al., 2020) demonstrated the diagnostic potential of chest radiography in the initial evaluation of the COVID-19 virus. Moreover, the quantitative evaluation of infection progress inside the lung lobes is an important measure for medical treatment. Therefore, accurate segmentation of the infected regions is an important pre-processing step for assessing the severity of COVID-19 infection. However, manual evaluation of a large volume of CT scans is a time-consuming task and increases the workload of healthcare professionals. Recent advances in artificial intelligence technology, especially in the field of medical diagnostics (Hou et al., 2021; Mhiri, Khalifa, Mahjoub, & Rekik, 2020; Xu et al., 2020), have substituted several human-dependent diagnostic approaches with computer-aided diagnosis (CAD) tools. In the present outbreak of COVID-19, these CAD techniques can also support healthcare professionals in making timely and efficient diagnostic decisions using chest CT images. Generally, a CAD method applies a set of artificial intelligence algorithms to analyze the given data, such as CT images, and provides diagnostic results. Recently, a new set of artificial intelligence algorithms, called deep learning, has emerged and has significantly enhanced the diagnostic capabilities of numerous CAD techniques.
These state-of-the-art algorithms can emulate the diagnostic capability of healthcare experts and make effective diagnostic decisions. Recently, convolutional neural networks (CNNs), a variant of deep learning algorithms, have attracted special attention in the development of CAD tools across various medical domains. However, such CNN-based diagnostic methods need to be trained through supervised learning, which requires a large-scale annotated dataset. In the medical domain, data annotation is accomplished by healthcare experts, which requires considerable time and resources. To reduce the requirement for a large-scale training dataset, transfer learning (Krizhevsky, Sutskever, & Hinton, 2012) was adopted to train CNN-based CAD methods. In this training approach, a pre-trained CNN (trained on a huge collection of natural images, such as ImageNet (Deng et al., 2009)) can be reused in the medical domain. The internal structure of a CNN model consists of a series of convolutional and fully-connected (FC) layers, along with other layers such as batch normalization (BN), softmax, classification, and rectified linear unit (ReLU) layers (detailed in (Heaton, 2015)). The convolutional and FC layers contain learnable parameters that are trained with the training dataset.
In response to the current pandemic, various types of CAD methods exist in the literature. These existing methods mainly use chest X-ray and/or CT images to diagnose COVID-19. Initially, most of the existing methods (Minaee, Kafieh, Sonka, Yazdani, & Soufi, 2020; Owais, Baek, & Park, 2021; Rathod & Khanuja, 2021) used deep classification models to make diagnostic decisions. These methods (Minaee et al., 2020; Owais, Baek, et al., 2021; Rathod & Khanuja, 2021) can only classify positive and negative patients without highlighting the infectious regions in a given radiographic scan. Later, new methods were proposed based on deep segmentation networks, which localize the infected regions in a given radiographic scan. However, most of the existing methods lack cross-data analysis, which is a prime indicator for assessing the effectiveness of a CAD method under real conditions. Limited studies (Ma et al., 2021) have performed a cross-data analysis of their methods; however, these methods showed limited results in such analysis. Consequently, in this study, we address the limitations of the existing studies and develop a high-performance CAD method for the efficient and well-localized detection of COVID-19 related findings in chest CT images. The main contributions of our method are as follows.
- We propose a dual multiscale dilated fusion network (DMDF-Net) for the robust segmentation of small lesions in chest CT images. Our designed model utilizes the strength of grouped convolution and multiscale deep features fusion inside the encoder and decoder modules, using multiscale dilated convolution, to achieve better segmentation results with a reduced number of training parameters.
- Additional pre- and post-processing steps are introduced in the proposed method to address the generality issues and obtain superior performance in a real-world setting. Moreover, the post-processing step also provides a way to accurately estimate the proportion of the infected area of the lung (PIAL), which is an essential measure for quantifying the severity of COVID-19 infection.
- Our proposed method achieves state-of-the-art results in cross-data analysis and outperforms various existing methods and recent deep segmentation networks.
- Finally, we make the proposed framework (including the implementation of DMDF-Net and the pre- and post-processing steps) openly available at https://github.com/Owais786786/DMDF-Net.git (accessed on 18 January 2022) for fair comparisons by other researchers.
The remainder of this article is structured as follows. In Section 2, we briefly review the various existing CAD methods for diagnosing COVID-19 infection using chest radiographic scans. Section 3 presents our selected datasets and proposed method. The training/validation settings and quantitative results are provided in Section 4. Finally, a brief discussion and the conclusions for the proposed framework are presented in Sections 5 and 6, respectively.

Related work
In the recent literature, various types of diagnostic methods have been proposed to automatically diagnose COVID-19 from chest radiographic scans. These methods primarily rely on CNN-based classification and segmentation models to make diagnostic decisions. In (Minaee et al., 2020) and related studies, the authors proposed CNN-based CAD frameworks that mainly classify a given radiographic scan as either positive or negative. Additionally, different training schemes were proposed in these studies to optimally train their models using a limited number of training samples. However, such methods were trained to classify only positive and negative cases of COVID-19 without detecting and localizing the lesion regions in a given radiographic image. In contrast, semantic segmentation networks performed well in locating the regions infected with COVID-19 in each radiographic image. However, pixel-level annotated ground truths are required to properly train and validate these segmentation networks. Such data annotation is performed by healthcare professionals, which requires considerable time and resources.
To overcome the constraints of large-scale datasets, different semi-supervised learning and data synthesis methods were proposed in the literature (Jiang, Chen, Loew, & Ko, 2021). These methods can effectively train deep networks with limited training data. For example, (Jiang et al., 2021) proposed an image synthesis framework based on a conditional generative adversarial network (C-GAN) that can generate radiographic data samples (including both COVID-19 positive and negative CT images) for adequate training of deep networks. In addition, the conventional U-Net segmentation model (Ronneberger, Fischer, & Brox, 2015) was trained with and without the synthesized data to demonstrate the efficiency of their data synthesis approach. A subsequent study presented a new version of C-GAN, called CoSinGAN, that can synthesize high-quality CT images by learning from a single data sample. The experimental results showed superior segmentation performance for 2D and 3D U-Net trained on the data synthesized by CoSinGAN compared with previous reference methods. Another study proposed a semi-supervised training scheme that effectively trains a deep segmentation model (Inf-Net) using unlabeled data. A novel randomly selected propagation algorithm was adopted to train Inf-Net using both labeled and unlabeled training data. Moreover, high-level features were aggregated inside Inf-Net to exploit diverse representations of the lesion regions.
Later, (Ma et al., 2021) presented benchmarks for lung lobe and infection segmentation using two radiographic datasets comprising CT images. Different segmentation models were trained and evaluated to achieve the best results, and 3D U-Net was ranked as the best among the reference models. In a comparative study, (Oulefki, Agaian, Trongtirakul, & Laouar, 2021) presented a detailed analysis of traditional machine learning techniques for the automated diagnosis of COVID-19. On a limited number of data samples, the first-ranked machine learning method showed results comparable to a deep CNN model. However, recent comparative studies (Jiang et al., 2021; Li et al., 2020) have shown that deep learning models outperform traditional machine learning methods on multi-source radiographic datasets. Furthermore, (El-Bana, Al-Kabbany, & Sharkas, 2020) developed a multitasking CAD method that comprises a classification and a segmentation model to identify and segment certain types of infections in a given CT image. Initially, a pretrained CNN model was configured to recognize the positive and negative cases of COVID-19. Subsequently, a deep segmentation network (DeepLabV3+ (Chen, Zhu, Papandreou, Schroff, & Adam, 2018)) was included to segment the infectious regions in a given CT image. Similarly, (Zheng et al., 2020) presented a multiscale identification network (MSD-Net) to segment multiclass lesions of different sizes.
In a recent study, (Abdel-Basset, Chang, Hawash, Chakrabortty, & Ryan, 2021) presented a novel segmentation network, FSS-2019-nCov, to overcome the constraints of large-scale training datasets. FSS-2019-nCov contains a dual-path encoder-decoder design that mainly extracts high-level features without changing the channel information, with a pre-trained residual network (ResNet34) configured as the encoder. Later, (Selvaraj, Venkatesan, Mahesh, & Raj, 2021) developed a CAD framework based on the joint connectivity of a classification and a segmentation network, similar to (El-Bana et al., 2020). Additional handcrafted features (i.e., lesion texture and structure information) were also used to efficiently train both networks. Subsequently, (Zhou, Canu, & Ruan, 2021) and a related study proposed segmentation-based CAD solutions for the effective detection of minor infectious regions caused by the COVID-19 virus in CT images.
To deal with multi-plane CT data, (Kesavan, Al Naimi, Al Attar, Rajinikanth, & Kadry, 2021) applied a pre-trained Res-UNet model to identify COVID-19 related lesion regions in lung CT images with various 2D planes (i.e., axial, coronal, and sagittal orientations). In another study, (Munusamy, Muthukumar, Gnanaprakasam, Shanmugakani, & Sekar, 2021) proposed a novel CNN model (FractalCovNet) for detecting COVID-19 infection from heterogeneous radiographic data (i.e., X-ray and CT images). The proposed model was configured to perform the following two tasks: 1) classifying lung X-ray images into COVID-19 positive and negative cases; and 2) recognizing COVID-19 related infectious regions in lung CT images. Subsequently, (Voulodimos, Protopapadakis, Katsamenis, Doulamis, & Doulamis, 2021) performed a comparative analysis of two well-known segmentation models (U-Net and fully convolutional networks (FCNs)) using CT data from COVID-19 patients. The comparative results indicated the following distinctive aspects of FCNs over U-Net: 1) they achieve accurate segmentation despite class imbalance in the dataset; and 2) they perform well even in the case of annotation errors on the boundaries of symptom manifestation areas. (Zheng, Zheng, & Dong-Ye, 2021) performed volumetric segmentation of whole 3D chest CT scans using an enhanced version of U-Net named 3D CU-Net. An attention mechanism was included in the encoder part of the proposed 3D CU-Net to obtain different levels of feature representation. Additionally, a pyramid fusion module with expanded convolutions was introduced at the end of the encoder to combine multiscale context information from the high-level features. Similarly, (Zhao et al., 2021) proposed a dilated dual attention U-Net (D2A U-Net) for accurate detection of COVID-19 related lesion regions in chest CT images. The proposed D2A U-Net utilizes a dual attention strategy to refine feature maps and reduce the semantic gap between different levels of feature maps. Additionally, hybrid dilated convolutions are included in the decoder to achieve larger receptive fields, which improves the decoding process. Finally, Table 1 presents a comparative summary of our proposed method and various existing methods, highlighting the superior aspects and limitations of each study.

Datasets and experimental setup
Two openly available datasets, MosMed (Morozov, Andreychenko, Pavlov, Vladzymyrskyy, Ledikhova, & Gombolevskiy, 2020) and COVID-19-CT-Seg (Jun, Cheng, Yixin, Xingle, Jiantao, & Ziqi, 2021; Ma et al., 2021), were selected to assess the performance of the proposed DMDF-Net and various baseline networks for a fair comparison. The MosMed dataset was made available by municipal hospitals in Moscow, Russia, and includes 50 CT scans of different patients with COVID-19 infection. The entire dataset contains 2,049 images and corresponding ground truths as segmentation masks. All segmentation masks were annotated by medical experts from the Moscow Health Care Department. In each CT image, all the findings related to COVID-19 infection are marked as white '1' pixels in the corresponding annotated mask, whereas all the remaining pixels (other than the lesion regions) are marked as black '0'. The second dataset, COVID-19-CT-Seg, includes 20 CT scans of different patients with COVID-19 infection. This dataset contains 3,520 images and corresponding ground truths as separate segmentation masks of the left lung region of interest (ROI), the right lung ROI, and the infectious regions; thus, COVID-19-CT-Seg provides three separate segmentation ground truths for each CT image. All segmentation masks were annotated by junior data annotators and validated by three medical professionals. Fig. 2 presents a few CT images and their corresponding ground truths from both datasets.
MATLAB (version R2020b) was used to implement and simulate the proposed DMDF-Net and the other baseline models. All experiments were performed on a desktop computer with an Intel Core i7 CPU, an NVIDIA GeForce GTX 1070 GPU, 16 GB of RAM, and the Windows 10 operating system.

Overview of proposed method
As shown in Fig. 3, the proposed CAD framework consists of the following four stages: 1) a data pre-processing step; 2) a lung segmentation network (DMDF-Net-1); 3) an infection segmentation network (DMDF-Net-2); and 4) a post-processing step. In the first stage, the color and contrast of the input CT image I are adjusted according to the training dataset by applying a simple Reinhard transformation (RT) (Reinhard, Adhikhmin, Gooch, & Shirley, 2001). Mathematically, each test image I is transformed into an enhanced image I_T by applying the transformation I_T = τ(I, φ), where τ() denotes the RT mapping function and φ is the mapping parameter that encodes the color and contrast information of the training images. Subsequently, the second and third stages process the enhanced image I_T (obtained after pre-processing) using two independent DMDF-Nets and generate the segmented images of the lung region of interest (ROI), I'_1, and the infectious

regions, I'_2, respectively. Our proposed DMDF-Nets (named DMDF-Net-1 and DMDF-Net-2 in Fig. 3) perform semantic segmentation and classify each pixel of the input CT image as either black '0' or white '1'. In the output of DMDF-Net-1, the white '1' pixels represent the "lung region" and the black '0' pixels correspond to the "background." Similarly, the output of DMDF-Net-2 presents the "infectious region" and the "normal/background region" as white '1' and black '0' pixels, respectively. Finally, the post-processing stage further refines I'_2 (the output of DMDF-Net-2 in the third stage) and generates the final output I' by performing post-ROI fusion (i.e., I' = (I'_1 ∩ I'_2) ∪ I'_1) of both network outputs, as shown in Fig. 3. The final output I' provides well-localized information about the infectious regions inside the lung lobes, which can be further used for the severity assessment of COVID-19 infection. The addition of the post-processing stage reduces the false-positive pixels in I'_2 (the output of DMDF-Net-2) and further provides a way to accurately quantify the severity of COVID-19 infection in terms of the PIAL score. The PIAL score is calculated by dividing the area of the infected region (i.e., the total number of red pixels in the final output image I') by the total area of the lung lobes (i.e., the total number of red and green pixels in the final output image I'), as shown in Fig. 3. The subsequent sections present the detailed design, workflow, and selected training loss of the proposed DMDF-Net.

Table 1
Comparative summary of the proposed and various existing methods (number of CT images used, with the number of test images in parentheses).

Method | Images (test) | Strengths | Limitations
C-GAN and U-Net (Jiang et al., 2021) | 829 (9) | C-GAN overcomes the underfitting issue | Lack of cross-data analysis; limited testing dataset; computationally expensive
CoSinGAN and 2D U-Net | 5,569 (70) | CoSinGAN overcomes the underfitting issue; detects infected areas of various sizes | Data synthesis requires high computation power; limited cross-data performance
Inf-Net and Semi-Inf-Net | 100 (40) | Semi-supervised learning improves the performance | Lack of cross-data analysis; limited testing dataset
3D U-Net (Ma et al., 2021) | 5,569 (70) | Detailed performance analysis and comparison | 3D U-Net requires high computation power; limited cross-data performance
Modified local contrast enhancement (Oulefki et al., 2021) | 275 (22) | Visualizes the progression of disease; computationally cheap | Lack of cross-data analysis; lack of ablation study
InceptionV3 and DeepLabV3+ (El-Bana et al., 2020) | 100 (40) | Joint segmentation and classification framework | Lack of ablation study; lack of cross-data analysis; computationally expensive
MSD-Net (Zheng et al., 2020) | 4,780 (36) | Improves small lesion segmentation; detailed analysis | Lack of cross-data analysis; difficult to distinguish patients with mild symptoms
FSS-2019-nCov (Abdel-Basset et al., 2021) | 939 (69) | Performs optimal training with a limited dataset | Lack of cross-data analysis; data synthesis requires high computation
CNN (Selvaraj et al., 2021) | 80 (N/A) | Performs optimal training with a limited dataset | Lack of ablation study; limited testing dataset; lack of cross-data analysis
DAL-Net | 5,569 (70) | Addresses the generality issue; improves small lesion segmentation | Includes a pre-processing stage
U-Net + attention mechanism (Zhou et al., 2021) | 473 (69) | Improves small lesion segmentation; deals with small lesions | Lack of cross-data analysis; lack of ablation study
Res-UNet (Kesavan et al., 2021) | 200 (10) | Detects infected areas in various 2D planes of lung CT slices | Limited dataset; lack of cross-data analysis
FractalCovNet (Munusamy et al., 2021) | 473 (N/A) | Detects COVID-19 cases using both chest X-ray and CT images | Lack of ablation study; lack of cross-data analysis
U-Net and FCNs (Voulodimos et al., 2021) | 939 (10) | Overcome the effect of class imbalance and annotation errors | Lack of comparison with state-of-the-art models; limited dataset
Improved 3D CU-Net (Zheng et al., 2021) | 5,569 (70) | Performs well in case of uneven distribution of lesions | Lack of ablation study; limited cross-data performance
D2A U-Net (Zhao et al., 2021) | 1 | |

Design of the proposed DMDF-Net
The architecture of our proposed DMDF-Net is designed to meet the following objectives: 1) efficient memory consumption; 2) a low number of trainable parameters; and 3) minimal degradation of segmentation performance. To accomplish these goals, we primarily utilize grouped-convolutional (G-Conv) and dilated convolutional (D-Conv) layers to develop the overall structure of the proposed network. The use of G-Conv layers results in efficient memory consumption and fast processing speed owing to the decreased number of learnable parameters (Heaton, 2015). In detail, a conventional convolutional layer (Heaton, 2015) processes an input tensor I_k ∈ R^(w_k × h_k × d_k) and generates an output tensor I_l ∈ R^(w_k × h_k × d_l) by employing a kernel θ_l ∈ R^(n × n × d_k × d_l) of size n × n. The entire process requires a total processing cost of w_k × h_k × d_k × d_l × n × n (Heaton, 2015). However, a G-Conv layer requires a total processing cost of w_k × h_k × d_k × (n^2 + d_l) for a similar operation, reducing the cost by a factor of approximately n^2. In our network design, most of the G-Conv layers use a kernel size of 3 × 3 (i.e., n = 3). Consequently, the average processing cost of a G-Conv layer is approximately eight to nine times lower than that of a conventional convolutional layer. Additionally, the D-Conv layers yield better segmentation performance owing to their ability to exploit multiscale deep features without substantially increasing the computation cost. The complete layer-wise design of the proposed DMDF-Net is shown in Fig. 4. The network comprises an encoder part followed by a decoder module. The encoder exploits multiscale deep features from the given image and represents them as a 3D tensor that encodes the main features. Subsequently, the decoder module upsamples this 3D tensor (the encoder output) and generates a binary image as the final output.
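To make the cost comparison concrete, the following sketch evaluates both cost expressions for one example layer. Only the two formulas come from the text; the tensor dimensions are assumed values for illustration.

```python
def standard_conv_cost(w, h, d_k, d_l, n):
    # Multiply-accumulate count of a standard n x n convolutional layer:
    # w * h * d_k * d_l * n * n.
    return w * h * d_k * d_l * n * n

def grouped_conv_cost(w, h, d_k, d_l, n):
    # Grouped (depthwise) n x n filtering plus a 1 x 1 projection,
    # i.e., the w_k * h_k * d_k * (n^2 + d_l) expression from the text.
    return w * h * d_k * (n * n + d_l)

# Example tensor dimensions (illustrative assumptions).
w, h, d_k, d_l, n = 72, 88, 96, 96, 3
ratio = standard_conv_cost(w, h, d_k, d_l, n) / grouped_conv_cost(w, h, d_k, d_l, n)
print(round(ratio, 2))  # 8.23: roughly the eight-to-nine-fold saving cited
```

The ratio simplifies to d_l n^2 / (n^2 + d_l), which approaches n^2 = 9 as the output depth d_l grows large relative to n^2, consistent with the "approximately eight to nine times lower" figure above.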
The following subsections provide a detailed explanation of the encoder/decoder structure and workflow.

Encoder design and workflow
To achieve efficient memory utilization and a low number of trainable parameters, we used the basic structural units of MobileNetV2 (Sandler, Howard, Zhu, Zhmoginov, & Chen, 2018) (labeled as A-Block and B-Block in Fig. 4 and Table 2) to develop an efficient encoder design. In addition, a set of four multiscale D-Conv layers (labeled as C-Block in Fig. 4 and Table 2) was included to exploit and fuse a more diversified representation of the input image. The encoder structure includes four A-Blocks, three B-Blocks, one C-Block, and several other layers, as indicated in Fig. 4. Both the A- and B-Blocks consist of the following three layers:
1) Expansion layer: a 1 × 1 convolutional layer that expands the depth of the input tensor I_k ∈ R^(w_k × h_k × d_k) by a factor of 6 and generates an output tensor I_l ∈ R^(w_k × h_k × 6d_k).
2) Feature extraction layer: a 3 × 3 G-Conv layer that exploits the deep features from I_l and produces an intermediate output I_m.
3) Projection layer: a 1 × 1 convolutional layer that reduces the depth of I_m by a factor of 6 and generates the final output tensor I_n ∈ R^(w_k × h_k × d_k) or I_n ∈ R^(w_k/2 × h_k/2 × d_k) (depending on the stride value in the preceding layer).
In addition, a residual connection is included in the B-Block, which differentiates it from the A-Block and prevents the vanishing gradient problem during training (Sandler et al., 2018). Subsequently, the C-Block comprises four parallel D-Conv layers with dilation rate (DR) factors of 1, 6, 12, and 18 (one in each layer). For effective computation, each D-Conv layer is followed by a projection layer (a 1 × 1 convolutional layer) that reduces the depth of its output tensor from 320 to 256 channels. To exploit multiscale features, the four D-Conv layers process the input tensor I_k ∈ R^(w_k × h_k × d_k) and generate four output tensors I_l^1, I_l^2, I_l^3, and I_l^4. Consequently, four projection layers further reduce the depth of these intermediate outputs and generate new output tensors I_m^1, I_m^2, I_m^3, and I_m^4. Ultimately, a depth concatenation layer performs multiscale deep features fusion by combining these four output tensors and provides the final output tensor I_n ∈ R^(w_k × h_k × d_n). Mathematically, the input tensor I_k undergoes the following transformations after passing through these structural blocks:

I_n = F_A-Block(I_k) = h_θm(h_θl(h_θk(I_k)))  (1)
I_n = F_B-Block(I_k) = I_k + h_θm(h_θl(h_θk(I_k)))  (2)
I_n = F_C-Block(I_k) = h*_θk^1(I_k) ⊕ h*_θk^6(I_k) ⊕ h*_θk^12(I_k) ⊕ h*_θk^18(I_k)  (3)

where F_A-Block(), F_B-Block(), and F_C-Block() represent the operations of the A-, B-, and C-Blocks as transfer functions, respectively. In Eqs. (1) and (2), θ_k, θ_l, and θ_m are the training parameters of the expansion layer (1 × 1 convolutional layer), the feature extraction layer (3 × 3 G-Conv layer), and the projection layer (1 × 1 convolutional layer), respectively. Similarly, in Eq. (3), θ_k^1, θ_k^6, θ_k^12, and θ_k^18 are the training parameters of the four parallel D-Conv layers with DR factors of 1, 6, 12, and 18, respectively. The symbol ⊕ denotes the depth-wise feature concatenation operation in Eq. (3), and h_θ() and h*_θ() represent the convolution and dilated convolution operations, respectively. For DR = 1, the dilated convolution h*_θ() performs identically to the standard convolution, conv().
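As an illustration of the C-Block behavior, the sketch below implements a "same"-padded 2-D dilated convolution in NumPy and fuses four parallel branches by depth concatenation. It is a single-channel toy with random data and a shared kernel (all assumptions); the per-branch 1 × 1 projection layers are omitted.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """'Same'-padded 2-D dilated convolution (cross-correlation form).
    With rate=1 this reduces to an ordinary convolution, matching the
    text's remark that h*_theta() equals conv() for DR = 1."""
    n = kernel.shape[0]
    pad = rate * (n // 2)                      # padding that preserves spatial size
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(n):
        for j in range(n):
            # Each kernel tap samples the input at a 'rate'-spaced offset.
            out += kernel[i, j] * xp[i * rate:i * rate + x.shape[0],
                                     j * rate:j * rate + x.shape[1]]
    return out

def c_block(x, kernels, rates=(1, 6, 12, 18)):
    """Toy C-Block: four parallel dilated convolutions fused by depth
    concatenation (projection layers omitted for brevity)."""
    return np.stack([dilated_conv2d(x, k, r) for k, r in zip(kernels, rates)],
                    axis=-1)

rng = np.random.default_rng(0)
x = rng.random((36, 44))                       # single-channel toy feature map
k = rng.random((3, 3))
fused = c_block(x, [k, k, k, k])
print(fused.shape)                             # (36, 44, 4): spatial size kept, depth fused

# An identity kernel (single center tap) returns the input at any rate.
center = np.zeros((3, 3))
center[1, 1] = 1.0
print(np.allclose(dilated_conv2d(x, center, 6), x))  # True
```

Because the padding grows with the dilation rate, all four branches keep the input's spatial size, which is what allows the depth concatenation in Eq. (3) to fuse them directly.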
The complete configuration and parametric details of the encoder module are listed in Table 2. Initially, the input image I_T (obtained after pre-processing) is processed through a stack of multiple layers (including convolutional, BN, and ReLU layers) and transformed into a 3D tensor of size 18 × 22 × 256. In detail, the first 3 × 3 convolutional layer (labeled as Conv 1 in Table 2) explores the input image I_T in both the horizontal and vertical directions and converts it into an output tensor of size 144 × 176 × 32. Subsequently, the second and third convolutional layers (labeled as G-Conv 1 and Conv 2 in Table 2) transform the output of Conv 1 into a tensor of size 144 × 176 × 16. Next, a stack of seven structural blocks (labeled as A-Blocks 1-4 and B-Blocks 1-3 in Table 2) consecutively processes the output of the previous layer/block to obtain a more diverse high-level representation of the input image. Eventually, these seven structural blocks convert the output tensor of Conv 2 into an output of size 18 × 22 × 320. Additionally, C-Block 1 applies multiscale dilated convolution to further explore the output of A-Block 4 at four different scales (with DR factors of 1, 6, 12, and 18) and provides diversified multiscale feature maps of size 18 × 22 × 1024 after performing multiscale deep features fusion. For efficient computation on the decoder side, a projection layer (labeled as Conv 3 in Table 2) transforms these high-level features (the output of C-Block 1) into a low-dimensional space. In detail, the Conv 3 layer reduces the depth of the C-Block 1 output by a factor of 4 and gives a final output tensor of size 18 × 22 × 256, which contains diverse semantic information.
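The spatial bookkeeping above can be sanity-checked with a few lines of arithmetic. The stage count of four stride-2 reductions is inferred from the sizes stated in the text (288 × 352 input down to the 18 × 22 bottleneck), not stated explicitly, so treat it as an assumption.

```python
def after_stride2(h, w, stages):
    # Each stride-2 layer halves both spatial dimensions.
    for _ in range(stages):
        h, w = h // 2, w // 2
    return h, w

# From the 288 x 352 input to the 18 x 22 bottleneck: four stride-2
# stages, i.e., an overall downsampling factor of 16.
print(after_stride2(288, 352, 4))   # (18, 22)

# C-Block 1 fusion: four parallel branches, each projected to 256
# channels, concatenated along depth; Conv 3 then reduces the depth by 4.
branches, proj_depth = 4, 256
fused_depth = branches * proj_depth
print(fused_depth)                  # 1024
print(fused_depth // 4)             # 256: the encoder's final depth
```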

Decoder design and workflow
The decoder part of the proposed DMDF-Net mainly includes two transposed convolutional (TP-Conv) layers, one C-Block (labeled as C-Block 2 in Fig. 4 and Table 2), and several other layers (Conv, G-Conv, softmax, and pixel classification in Table 2). Our main contribution in the decoder is the addition of multiscale D-Conv layers (labeled as C-Block 2 in Fig. 4 and Table 2), which capture a multiscale representation of deep features from the upsampled output of the TP-Conv 1 layer by performing multiscale deep features fusion, providing an additional performance gain. Moreover, a residual connection (extracted from B-Block 1 of the encoder module) is added to the decoded output of the TP-Conv 1 layer (before C-Block 2) to enrich the edge information. Most existing studies have shown that residual skip connections (from encoder to decoder) yield better performance, particularly in the case of small output objects.
Table 2
Complete layer-wise structure, configuration, and parametric information of the proposed DMDF-Net.

The detailed layer-wise configuration of the decoder module is presented in Table 2. Initially, an 8 × 8 TP-Conv layer (labeled as TP-Conv 1 in Table 2) bilinearly upsamples the final encoder output (the tensor of size 18 × 22 × 256 after the Conv 3 layer) with an upsampling factor of 4 and generates an upsampled tensor of size 72 × 88 × 256. Subsequently, a depth concatenation layer combines a residual tensor of size 72 × 88 × 48 (obtained from B-Block 1 and further processed by Conv 4) with the output tensor of TP-Conv 1 and gives a concatenated tensor of size 72 × 88 × 304. Furthermore, a total of four convolutional layers (labeled as G-Conv 2, Conv 5, G-Conv 3, and Conv 6 in Table 2) transform the output of the previous layer into a new output tensor of size 72 × 88 × 320. Consequently, C-Block 2 applies the strength of multiscale dilated convolution, processes the output of the preceding layer (Conv 6) at four different scales (with DR factors of 1, 6, 12, and 18), and provides diversified multiscale feature maps of size 72 × 88 × 1024 after performing multiscale deep features fusion. These feature maps are further projected into a low-dimensional space, from 72 × 88 × 1024 to 72 × 88 × 2, through two convolutional layers (labeled as Conv 7 and Conv 8 in Table 2). Next, a second 8 × 8 TP-Conv layer (labeled as TP-Conv 2 in Table 2) further upsamples the output of Conv 8 using an upsampling factor of 4 and gives a final output tensor of size 288 × 352 × 2. Finally, a pixel classification layer in conjunction with the softmax layer generates the pixel-wise prediction of the given input image as the final output of our model. The softmax layer applies a softmax function (Heaton, 2015) that transforms the output of the TP-Conv 2 layer into class probabilities. Subsequently, the pixel classification layer assigns a class label (either black '0' or white '1') to each pixel of the input CT image and generates a binary image as the final output.
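Functionally, the final softmax and pixel classification layers reduce to a per-pixel argmax over the two output channels. The following sketch (a simplified stand-in for the MATLAB layers, not the authors' code) shows this on a toy 2 × 2 logit map:

```python
import math

def softmax_pixel_classification(logits):
    """logits: H x W nested lists of (background, lesion) channel activations.
    Returns a binary mask: 1 where the lesion channel wins, else 0."""
    mask = []
    for row in logits:
        mask_row = []
        for (bg, fg) in row:
            # Softmax turns the two channel activations into probabilities;
            # the pixel label is the class with the larger probability.
            z = math.exp(bg) + math.exp(fg)
            p_fg = math.exp(fg) / z
            mask_row.append(1 if p_fg >= 0.5 else 0)
        mask.append(mask_row)
    return mask

logits = [[(2.0, -1.0), (0.1, 0.3)],
          [(-0.5, 1.5), (1.0, 1.0)]]
print(softmax_pixel_classification(logits))  # [[0, 1], [1, 1]]
```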

Loss function
To achieve better segmentation of minor lesion regions, a balanced cross-entropy (BCE) loss was selected for training the proposed DMDF-Net. BCE shows better performance than the conventional cross-entropy (CE) loss (Heaton, 2015; Jadon, 2020), mainly in the case of small segmentation objects or lesion regions (Li et al., 2021; Ni, Wu, Tong, Chen, & Zhao, 2020; Roth et al., 2018). Additionally, we took advantage of transfer learning (Krizhevsky et al., 2012) to perform timely and efficient training of the proposed network. The basic structural units of MobileNetV2 (labeled as A-Block and B-Block in Fig. 4) were used to develop the encoder design. Therefore, the initial training parameters of our encoder module (backbone network) were obtained from the MobileNetV2 network, which was originally trained on the ImageNet dataset (Deng et al., 2009) using the conventional CE loss function (Heaton, 2015; Jadon, 2020) and the stochastic gradient descent (SGD) optimization method (Li, 2018). Accordingly, a related variant of the conventional CE loss, named BCE, was selected to sufficiently train the proposed DMDF-Net for the target domain. The mathematical interpretation of the selected BCE loss function is given as follows:

Loss(θ) = −(1/N) Σ_{k=1}^{N} [ β M_k log ψ(I_k; θ) + (1 − β)(1 − M_k) log(1 − ψ(I_k; θ)) ]

where I_k and M_k are the k-th training data sample and its ground-truth mask, respectively. Subsequently, ψ(·), N, and θ represent the proposed DMDF-Net as a transfer function, the total number of data samples, and the initial training parameters, respectively. Finally, β is the class balancing factor between black '0' and white '1' pixels, calculated as the fraction of dominant pixels (i.e., black '0' pixels) in the entire training dataset (Jadon, 2020).
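Assuming per-pixel lesion probabilities and the class balancing factor β defined above, the BCE loss can be sketched in plain Python as follows (an illustrative implementation, not the code used in the paper):

```python
import math

def balanced_cross_entropy(pred, mask, beta):
    """pred: predicted lesion probabilities in [0, 1] (flattened pixels);
    mask: ground-truth labels (0 = background, 1 = lesion);
    beta: fraction of dominant (black) pixels, weighting the rare white class."""
    eps = 1e-12  # numerical guard against log(0)
    total = 0.0
    for p, m in zip(pred, mask):
        total += -(beta * m * math.log(p + eps)
                   + (1.0 - beta) * (1 - m) * math.log(1.0 - p + eps))
    return total / len(pred)

# With beta close to 1 (mostly background), missed lesion pixels are
# penalized far more heavily than false alarms on the background.
loss = balanced_cross_entropy([0.9, 0.2, 0.1], [1, 0, 0], beta=0.9)
```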

Results
In this section, we present a detailed explanation of training, validation, and quantitative results of our method, including a detailed ablation study. Finally, we compare the performance of the proposed DMDF-Net (including both DMDF-Net-1 and DMDF-Net-2) with various state-of-the-art methods.

Training and validation
Based on existing studies (Kandel & Castelli, 2020; Prabowo & Herwanto, 2019), an SGD optimizer with a small learning rate of 0.001 was selected to perform efficient training of the proposed model. Generally, a small learning rate can reach the global minimum, but it requires many epochs to sufficiently train a segmentation network (Johnson & Zhang, 2013), whereas a large learning rate can overshoot the global minimum (Johnson & Zhang, 2013). Therefore, a small learning rate of 0.001 was selected to achieve optimal convergence of the proposed DMDF-Net. Additionally, we used the default settings provided by MATLAB R2020b for the other hyperparameters. The overall training procedure of the proposed DMDF-Net is given in Algorithm 1 as pseudo-code.
7 if (validation accuracy is not increasing) do: stop the training process for the remaining epochs; end
8 end
9 Output: finally trained parameters θ′ // optimal weights of our model

In the first experiment (hereinafter referred to as Exp#1), we performed lung segmentation using 70% (14/20), 10% (2/20), and 20% (4/20) of the COVID-19-CT-Seg data for training, validation, and testing, respectively. For a fair evaluation in Exp#1, five-fold cross-validation was performed. Subsequently, in the second experiment (hereinafter referred to as Exp#2), we performed COVID-19 infection segmentation using two different datasets for training, validation, and testing. Such a cross-data analysis highlights the generality of the proposed framework. In Exp#2, we used 80% (16/20) and 20% (4/20) of the COVID-19-CT-Seg dataset for training and validation, respectively, and 100% (50/50) of the MosMed dataset for testing. In the MosMed dataset, the ground truths for lung segmentation are not given; therefore, we did not perform cross-validation in Exp#2 (i.e., using MosMed in training and COVID-19-CT-Seg in testing). Fig. 5 shows the training/validation accuracies and losses of the proposed DMDF-Net for lung segmentation (Exp#1) and COVID-19 infection segmentation (Exp#2). To avoid overfitting, we included independent validation datasets in the training procedure for both Exp#1 and Exp#2, and selected the best models based on the maximum validation accuracies. Accordingly, DMDF-Net-1 was trained for 15 epochs with a mini-batch size of eight, and DMDF-Net-2 was trained for 13 epochs with the same mini-batch size. For a fair comparison, the same datasets and training protocols (training hyperparameters and loss function) were adopted to evaluate the different baseline models in both experiments.
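The stopping rule in Algorithm 1 can be sketched as follows (a hypothetical reconstruction of the stopping logic only, not the authors' code): keep the weights of the best-validation epoch and stop once the validation accuracy is no longer increasing.

```python
def train_with_early_stopping(max_epochs, val_accuracies):
    """val_accuracies: per-epoch validation accuracies (one entry per epoch).
    Returns (best_epoch, best_accuracy); training stops for the remaining
    epochs once validation accuracy stops increasing."""
    best_epoch, best_acc = 0, float("-inf")
    for epoch, acc in enumerate(val_accuracies[:max_epochs], start=1):
        if acc > best_acc:
            best_epoch, best_acc = epoch, acc
        else:
            break  # validation accuracy not increasing: stop remaining epochs
    return best_epoch, best_acc

print(train_with_early_stopping(10, [0.80, 0.85, 0.84, 0.90]))  # (2, 0.85)
```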
Finally, the quantitative results of the proposed and different baseline models were evaluated using the following seven performance evaluation metrics: the average Dice similarity coefficient (DICE), intersection over union (IoU), average precision (AP), sensitivity (SEN), specificity (SPE), enhanced alignment measure (E_φ), and mean absolute error (MAE).
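For reference, the two primary overlap metrics can be computed from flattened binary masks as follows (a minimal sketch, not the evaluation code used in the paper):

```python
def dice_iou(pred, gt):
    """DICE = 2|P∩G| / (|P| + |G|); IoU = |P∩G| / |P∪G|.
    pred and gt are flattened binary masks (lists of 0/1)."""
    inter = sum(p and g for p, g in zip(pred, gt))
    p_sum, g_sum = sum(pred), sum(gt)
    union = p_sum + g_sum - inter
    # Empty prediction and ground truth count as a perfect match.
    dice = 2.0 * inter / (p_sum + g_sum) if (p_sum + g_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou

dice, iou = dice_iou([1, 1, 0, 0], [1, 0, 1, 0])  # one overlapping pixel
# dice = 2*1/(2+2) = 0.5, iou = 1/3
```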

Testing results (Ablation Studies)
Our proposed framework mainly includes a lung segmentation network (DMDF-Net-1 in Fig. 3) and an infection segmentation network (DMDF-Net-2 in Fig. 3) to segment lung lobes and infectious regions in a given CT image, respectively. Primarily, the output of DMDF-Net-1 is used in the post-processing step to refine the output of DMDF-Net-2 and generate well-localized information about the infectious regions in the CT image. Therefore, accurate segmentation of the lung lobes is important in the proposed framework. Thus, we evaluated the quantitative results of both DMDF-Net-1 and DMDF-Net-2, along with a comprehensive analysis of the pre- and post-processing stages, as an ablation study. Table 3 presents the average segmentation results of the left, right, and both lung lobes (Exp#1) based on DMDF-Net-1. These results (Table 3) also highlight the significance of multiscale deep features fusion using multiscale dilated convolution (the addition of C-Blocks) and of transfer learning in developing and training the proposed network, respectively. In all cases (i.e., segmentation of the left, right, and both lung lobes), DMDF-Net-1 (including C-Blocks) showed superior results in terms of all the performance metrics when trained through transfer learning. In particular, the use of C-Blocks and transfer learning provided average DICE gains of up to 51% across the lung lobe datasets. Moreover, the segmentation results on the "both lung" dataset are higher than those on the "left lung" and "right lung" datasets. This performance difference arises from the similar shape and texture patterns of the two lung lobes: in the "left lung" and "right lung" datasets, it is more challenging for a CNN model to distinguish between left and right lung lobes with similar shape and texture patterns, whereas segmenting both lung lobes together (the "both lung" dataset) is considerably simpler than segmenting each lobe individually. Consequently, the performance of our model on the "both lung" dataset is higher than its individual results on the "left lung" and "right lung" datasets.
The encoder design (backbone network) of the proposed DMDF-Net includes the basic structural units of MobileNetV2. Therefore, we also compared the performance of the proposed encoder design with that of the original MobileNetV2 as a backbone network for lung segmentation (Exp#1), where our encoder showed higher average scores for the left, right, and both lung lobes. Owing to only a slight class imbalance between the two classes (lung ROI and background), the performance difference between our BCE loss and the conventional CE loss is minimal in this experiment. However, we observed a significant performance gain of the BCE loss in COVID-19 infection segmentation (Exp#2). Table 5 presents the quantitative results of the proposed DMDF-Net-2 for COVID-19 infection segmentation (Exp#2), along with a detailed ablation study. It can be observed from Table 5 that the proposed framework shows the best results (75.7%, 67.22%, 69.92%, 72.78%, 99.79%, 91.11%, and 0.026 for average DICE, IoU, AP, SEN, SPE, E_φ, and MAE, respectively) with the addition of the pre- and post-processing stages after performing transfer learning. After training the final proposed DMDF-Net-2 (including C-Blocks) for Exp#2 via transfer learning, the pre- and post-processing stages resulted in additional gains across all metrics. These results show that transfer learning, pre-processing, post-processing, and multiscale deep features fusion using multiscale dilated convolution (C-Blocks) work in a mutually beneficial way to enhance the overall performance of the proposed diagnostic framework. Additionally, Fig. 6 shows the visual outputs of the proposed framework with and without the pre- and post-processing stages for COVID-19 infection segmentation (Exp#2). It can be observed (Fig. 6) that both stages mutually contribute to reducing the number of false-positive and false-negative pixels and to correctly segmenting the lesion regions in a given CT image.
We also compared the performance of the proposed encoder design (in DMDF-Net-2) with that of the original MobileNetV2 as a backbone network for COVID-19 infection segmentation (Exp#2). Table 6 presents the comparative results of the proposed versus original MobileNetV2 for Exp#2. It can be observed (Table 6) that both our proposed design and the original MobileNetV2 show higher results with the pre- and post-processing steps than without them. However, our model (including both pre- and post-processing stages) still outperforms MobileNetV2.

In the post-processing step for Exp#2, the lung ROI (output of DMDF-Net-1) was applied over the output of DMDF-Net-2 to reduce the number of false-positive pixels, which is referred to as post-ROI fusion in this work. To highlight the significance of our post-ROI fusion-based post-processing step, we also evaluated the performance of its counterpart, named pre-ROI fusion. In this pre-ROI fusion-based post-processing step, the lung ROI mask (output of DMDF-Net-1) was applied over the input CT image to obtain the lung ROI image, which was then processed by DMDF-Net-2 to segment the infection regions.

Table 3
Average five-fold performance of the proposed lung segmentation network (DMDF-Net-1) for lung segmentation (Exp#1). These results also highlight the significance of multiscale deep features fusion using multiscale dilated convolution (C-Blocks) and transfer learning in Exp#1 as an ablation study. (unit: %).

Table 7 shows the quantitative performance comparison of pre-ROI fusion versus post-ROI fusion with and without applying the pre-processing step for Exp#2. After training the proposed DMDF-Net-2 through transfer learning (without the data pre-processing stage), post-ROI fusion outperforms pre-ROI fusion (Table 7); the same trend holds after training the proposed DMDF-Net-2 from scratch (i.e., without transfer learning).
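The post-ROI fusion step amounts to an element-wise AND between the infection mask predicted by DMDF-Net-2 and the lung ROI mask predicted by DMDF-Net-1, so any infection pixel falling outside the lung is discarded as a false positive. A minimal sketch (illustrative, not the authors' implementation):

```python
def post_roi_fusion(infection_mask, lung_mask):
    """Keep only infection pixels that fall inside the lung ROI; every
    predicted infection pixel outside the lung is a false positive
    by construction. Masks are H x W nested lists of 0/1."""
    return [[inf & lung for inf, lung in zip(i_row, l_row)]
            for i_row, l_row in zip(infection_mask, lung_mask)]

infection = [[1, 1, 0],
             [0, 1, 1]]
lung      = [[0, 1, 1],
             [0, 1, 1]]
print(post_roi_fusion(infection, lung))  # [[0, 1, 0], [0, 1, 1]]
```

The stray infection pixel in the top-left corner (outside the lung) is removed, while all in-lung detections are preserved.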
Performance differences of our DMDF-Net-2 with no pre- and post-processing (the 1st and 7th rows of Table 7), only pre-processing (the 2nd and 8th rows), combined pre- and post-processing (the 6th and 12th rows), and only post-processing (the 4th and 10th rows) can be observed in Table 7. As shown in this table, combined pre- and post-processing, only pre-processing, only post-processing, and no pre- and post-processing show the 1st to 4th highest accuracies, respectively. Subsequently, Fig. 7 presents the visual output results of pre-ROI fusion versus post-ROI fusion (for Exp#2) after applying the pre-processing step and training DMDF-Net-2 through transfer learning. It can be observed (Fig. 7) that the post-ROI fusion-based post-processing step effectively contributes to reducing the number of false-positive and false-negative pixels and to correctly segmenting the lesion regions in a given CT image.

Table 5
Quantitative results of the proposed infection segmentation network (DMDF-Net-2) for COVID-19 infection segmentation (Exp#2). These results also highlight the significance of transfer learning, pre-processing, post-processing, and multiscale deep features fusion using multiscale dilated convolution (C-Blocks) in Exp#2 as an ablation study. (unit: %).

We further analyzed the effect of the pre- and post-processing stages on the same dataset (COVID-19-CT-Seg) to show the comparative results of two different data distributions. In this experiment, we evaluated the average performance on the COVID-19-CT-Seg dataset with our proposed DMDF-Net-2 by performing five-fold cross-validation. Table 8 shows these comparative results with and without the pre- and post-processing steps. It can be observed (Table 8) that the addition of these stages gave a marginal gain of 0.48%, 0.5%, 0.55%, and 0.06% for average DICE, IoU, AP, and SPE, respectively, in the case of the same dataset (COVID-19-CT-Seg).
In comparison, the effect of the pre- and post-processing stages is significantly higher in the case of the cross-dataset (MosMed), with average gains of 2.79%, 2.44%, 1.54%, and 11.71% for average DICE, IoU, AP, and SEN, respectively. These comparative results (Table 8) show the significant contribution of the pre- and post-processing stages on a cross-dataset (having a different data distribution) and validate the generality of our proposed solution.
Our first dataset, MosMed, comprises a total of 2,049 images (785 infected and 1,264 non-infected slices). The second dataset, COVID-19-CT-Seg, includes a total of 3,520 images (1,843 infected and 1,677 non-infected slices). Owing to the small number of data samples, the ratio of infected to non-infected slices influenced the training of our proposed model by causing an under-fitting problem. To address this problem, we utilized the strength of transfer learning (Krizhevsky et al., 2012) to perform timely and efficient training of our model using a small dataset such as COVID-19-CT-Seg in all the experiments. Tables 3 and 5 show the significant gains of transfer learning in Exp#1 and Exp#2, respectively. In addition, we also observed a class imbalance problem (particularly in the case of the MosMed dataset) owing to the small ratio of infected lung regions in each infected slice, which ultimately resulted in poor testing results (Table 6). This problem was further addressed by using the BCE loss in all the experiments. Tables 4 and 6 show the comparative performance gains of our BCE loss over the conventional CE loss function in Exp#1 and Exp#2, respectively.

Comparisons with the State-of-the-Art methods
In this section, we perform a detailed comparative analysis of the proposed method with state-of-the-art segmentation networks proposed for COVID-19 (Ma et al., 2021; Owais et al., 2021) and for general image segmentation (Badrinarayanan, Kendall, & Cipolla, 2017; Chen et al., 2018; Long, Shelhamer, & Darrell, 2015; Ronneberger et al., 2015; Sandler et al., 2018).

Table 7
Quantitative performance comparison of pre-ROI fusion versus post-ROI fusion with and without applying the pre-processing step for COVID-19 infection segmentation (Exp#2). (unit: %). The best results are shown in boldface.

The proposed diagnostic framework includes DMDF-Net-1 and DMDF-Net-2 to extract the lung ROI (Exp#1) and segment infectious regions (Exp#2) in a given CT image, respectively. Consequently, the performance of DMDF-Net-1 and DMDF-Net-2 is separately compared with different baseline models. In (Ma et al., 2021; Owais et al., 2021), the authors used the same datasets as those selected in our method; therefore, we directly compared our method with the results reported there for both lung segmentation (Exp#1) and COVID-19 infection segmentation (Exp#2). In the case of the other segmentation networks (Badrinarayanan et al., 2017; Chen et al., 2018; Long et al., 2015; Ronneberger et al., 2015; Sandler et al., 2018) proposed for general image segmentation applications, a direct comparison was not possible. Therefore, to make a fair comparison, we evaluated the segmentation results of these models using the same datasets as those selected in this study. These baseline models include U-Net (Ronneberger et al., 2015), DeepLabV3+ (based on ResNet) (Chen et al., 2018), MobileNetV2 (Sandler et al., 2018), SegNet (based on VGG16 and VGG19) (Badrinarayanan et al., 2017), and FCNs (Long et al., 2015). Table 9 shows the comparative results of the proposed DMDF-Net-1 and all these baseline models for the lung segmentation task (Exp#1). It can be observed (Table 9) that the proposed DMDF-Net-1 shows superior results (in terms of average DICE and IoU scores) with a lower number of training parameters compared to the other models.
DAL-Net (Owais et al., 2021) ranked as the second-best network based on the second-highest DICE and IoU scores among all the baseline methods. The proposed DMDF-Net-1 outperforms DAL-Net across all the performance metrics, averaged over the "left lung", "right lung", and "both lung" datasets. Additionally, in a t-test analysis (proposed versus (Owais et al., 2021)), we obtained an average p-value of less than 0.05 (specifically, 0.013), which distinguishes our model from (Owais et al., 2021) with 95% confidence. In addition, the number of training parameters of the proposed network is lower than that of (Owais et al., 2021). To be specific, our DMDF-Net-1 includes approximately 0.8 million fewer parameters than (Owais et al., 2021) (i.e., 5.85 million [proposed] ≪ 6.65 million (Owais et al., 2021)). Table 9 also includes the floating-point operations (FLOPs) and execution speed of our DMDF-Net-1 and the other baseline models. Our proposed DMDF-Net-1 requires 37.45 Giga FLOPs with an average execution speed of 25.64 frames per second (FPS).

Figure 8 shows the final segmentation results of the proposed lung segmentation model (DMDF-Net-1) versus different state-of-the-art segmentation networks (Chen et al., 2018; Long et al., 2015; Ronneberger et al., 2015; Sandler et al., 2018). It can be observed in Fig. 8 that our proposed DMDF-Net-1 and the method of (Sandler et al., 2018) show comparable visual results and outperform the other baseline methods (Long et al., 2015; Ronneberger et al., 2015). However, the overall quantitative performance of our DMDF-Net-1 is better than that of all the baseline methods.

Table 9
Quantitative results of the proposed DMDF-Net-1 compared with different state-of-the-art segmentation networks for the lung segmentation task (Exp#1). The best results are shown in boldface.

Furthermore, Table 10 shows the comparative results of the proposed DMDF-Net-2 versus different baseline models for the infection segmentation task (Exp#2). It can be observed from Table 10 that the proposed DMDF-Net-2 also provides superior performance with a lower number of training parameters compared to the other methods. In contrast, DeepLabV3+ (MobileNetV2) (Sandler et al., 2018) is ranked as the second-best network among the other networks. With the addition of only the pre-processing step, the proposed DMDF-Net-2 outperforms (Sandler et al., 2018). In a t-test analysis (proposed versus (Sandler et al., 2018)), we obtained an average p-value of less than 0.01 (specifically, 0.0072), which distinguishes our model from (Sandler et al., 2018) with 99% confidence.
Similarly, the number of training parameters of the proposed network is also lower than that of (Sandler et al., 2018). To be specific, our DMDF-Net-2 includes approximately 1.9 million fewer parameters than (Sandler et al., 2018) (i.e., 11.7 million [proposed] ≪ 13.56 million (Sandler et al., 2018)). Consequently, these results (Tables 9 and 10) highlight the superior performance of our model. Table 10 also includes the FLOPs and execution speed of our DMDF-Net-2 and the other baseline models. After including both pre- and post-processing steps, our final infection segmentation framework (DMDF-Net-1 and DMDF-Net-2) requires 74.9 Giga FLOPs with an average execution speed of 7.35 frames per second. Figure 9 presents the visual comparative results of our proposed framework and the other state-of-the-art deep segmentation models. Fig. 9a presents the comparative results without the pre- and post-processing stages, Fig. 9b shows the visual outputs of all the methods with only the pre-processing stage, and Fig. 9c visualizes the comparative performance with both pre- and post-processing stages (applying the same lung segmentation network). It can be observed (Fig. 9) that the proposed network generates well-localized segmentation outputs for the input CT images. However, several reference models (Chen et al., 2018; Long et al., 2015; Ronneberger et al., 2015) generate inadequate segmentation results, which are marked as false-positive (i.e., normal regions incorrectly recognized as infectious regions) and false-negative (i.e., infected regions not recognized) pixels in Fig. 9. Nevertheless, (Sandler et al., 2018) showed better performance than (Chen et al., 2018; Long et al., 2015; Ronneberger et al., 2015). However, the average segmentation results (Tables 9 and 10) show a higher performance of our method than that of (Sandler et al., 2018).
Primarily, the superior performance of our method is attained by the addition of C-Blocks in both the encoder and decoder modules, which exploit diverse representations of lung/lesion patterns from the given data by performing multiscale deep features fusion. Moreover, a residual connection (extracted from B-Block 1 of the encoder module) contributes low-level contextual information to the decoding part, refining the edge information of the final output.

Discussion
This section describes the distinctive aspects of the proposed method, along with possible limitations that can influence the diagnostic performance of our system. Finally, it includes a brief roadmap for future work to address these constraints and improve the overall performance.

Principal findings
This study leveraged the strengths of recent deep learning techniques in chest CT image analysis to identify lung lesions associated with COVID-19 infection. The proposed framework mainly includes a lung segmentation network (DMDF-Net-1) and an infection segmentation network (DMDF-Net-2) to extract the lung ROIs and infected areas from a CT image, respectively. The output of DMDF-Net-1 is mainly used in a later post-processing step to improve the infection segmentation results of DMDF-Net-2 and to provide a quantitative evaluation of the infected area in the CT image. Accurate detection and quantification of infected lung regions are essential for measuring infection severity in individual lung lobes and for finding suitable personalized treatments. Fig. 10 presents the infection quantification results of our proposed diagnostic framework for some typical CT images, including both positive (Fig. 10a) and negative (Fig. 10b) cases. The intermediate results (in Fig. 10: after pre-processing, DMDF-Net-1, DMDF-Net-2, and post-processing) further illustrate the diagnostic workflow of the proposed framework. In Fig. 10, the PIAL score represents the quantification of the infectious regions in each CT image, calculated by dividing the area of the infected region by the total area of the lung lobes (i.e., PIAL = 100 × [infected lung area / total lung area]). Furthermore, in our network design, we mainly utilized the strength of grouped convolution and multiscale deep features fusion using multiscale dilated convolution (C-Blocks) to achieve better segmentation results with a reduced number of training parameters (specifically, 5.85 million). The encoder design of the proposed DMDF-Net contains fewer training parameters (specifically, 0.88 million) than the original MobileNetV2 (specifically, 2.24 million). Owing to the optimal size of the encoder design, the average execution time of our model was lower than that of the original MobileNetV2.
Specifically, the average execution speed of our DMDF-Net is approximately 25.64 frames per second, whereas the original MobileNetV2 (which showed the second-best accuracies in Table 10) processes 20.41 frames per second. The average execution speed (in terms of the number of processed frames per second) was determined using the computing environment described in Section 2.1. Consequently, the optimal design of our model achieves state-of-the-art performance on low-cost hardware resources without compromising the overall diagnostic performance.
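The PIAL quantification described above is a simple ratio of mask areas; it can be sketched as follows (illustrative only, not the authors' implementation):

```python
def pial_score(infection_mask, lung_mask):
    """PIAL = 100 * (infected lung pixels / total lung pixels).
    Masks are H x W nested lists of 0/1; only infection pixels inside
    the lung ROI are counted, matching the post-ROI fusion output."""
    lung_area = sum(sum(row) for row in lung_mask)
    infected = sum(sum(i & l for i, l in zip(i_row, l_row))
                   for i_row, l_row in zip(infection_mask, lung_mask))
    return 100.0 * infected / lung_area if lung_area else 0.0

lung      = [[1, 1, 1, 1],
             [1, 1, 1, 1]]  # 8 lung pixels
infection = [[1, 1, 0, 0],
             [0, 0, 0, 0]]  # 2 infected pixels inside the lung
print(pial_score(infection, lung))  # 25.0
```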

Table 10
Quantitative results of the proposed DMDF-Net-2 compared with different state-of-the-art segmentation networks for the infection segmentation task (Exp#2). The best results are shown in boldface.

We also visualized multiscale class activation maps (CAMs) at intermediate layers of the network. The input image of size 288 × 352 was downsampled into four spatial sizes (i.e., 144 × 176, 72 × 88, 36 × 44, and 18 × 22) after passing through the encoder module. Subsequently, the encoded output of size 18 × 22 is upsampled into two spatial sizes (i.e., 72 × 88 and 288 × 352) after passing through the decoder module. The decoded output of size 288 × 352 is the final output of the proposed network. Therefore, a total of five layers were selected for multiscale CAM visualization based on the distinctive spatial sizes of their outputs inside the encoder and decoder modules. Fig. 11 shows the multiscale CAM visualization of the proposed DMDF-Net-1 (lung segmentation network) and DMDF-Net-2 (infection segmentation network) for test CT images. It can be observed (Fig. 11) that the class-specific regions (lung ROIs or infectious regions) become increasingly discriminative after passing through successive layers. Finally, a binary image is obtained as the final output, presenting the "lung/infectious region" and "normal/background region" as white '1' and black '0' pixels, respectively.
Although several online datasets related to COVID-19 are available, most of them target the classification problem. Only a few publicly available segmentation datasets related to COVID-19 exist, and most of these include segmentation masks only for the infectious regions. This study presents a CAD framework for the automatic detection and quantification of COVID-19 related findings in lung CT scans. The proposed method includes an additional post-processing step that also requires a lung segmentation mask for accurate segmentation and quantification of infected lung areas. Therefore, we selected the COVID-19-CT-Seg dataset, which includes the ground truths for both the lung and infection regions of each slice. Second, we aimed to develop a CAD tool that can efficiently segment minor infected regions in the lung. Therefore, we selected the MosMed dataset, which includes minor lung tissue abnormalities associated with COVID-19 (pulmonary parenchymal involvement ≤ 25%) (Morozov et al., 2020). In our future work, we will explore additional segmentation datasets related to the detection and quantification of COVID-19 related findings and develop a more efficient CAD solution.

Limitations and future work
Despite the promising results of the proposed method compared to existing methods, the current research still has some limitations. First, the cross-dataset performance is still limited and can be further improved. Therefore, in future work, we will strive to increase the cross-data performance of the method by including multi-source CT data. Second, the proposed network can only segment lesions associated with COVID-19. In future work, we will collect more datasets, including those for multiple diseases, and propose a new CAD method that can detect and distinguish between COVID-19 and different types of diseases, such as other viral and bacterial infections.

Conclusions
In this study, we proposed a fully automated CAD framework for the effective recognition and quantification of COVID-19 related findings in chest CT images. We mainly proposed a deep segmentation network (named DMDF-Net) with additional pre- and post-processing steps for accurate segmentation of infectious regions in CT images. The pre-processing step was included to address generality issues in a real-world scenario. The post-processing step generates a well-localized ROI of the infectious regions and further provides the quantification of lesion regions in terms of the PIAL score. In detail, our designed network utilizes the strength of grouped convolution and multiscale deep features fusion using multiscale dilated convolution to achieve better segmentation results with a reduced number of learnable parameters (specifically, 5.85 million). The optimal size of our model utilizes low-cost hardware resources and provides effective diagnostic results. DMDF-Net-1 exhibited average DICE scores of 94.86%, 98.52%, and 98.66%, IoU scores of 90.59%, 97.11%, and 97.38%, and E_φ scores of 94.67%, 98.78%, and 98.73% for the segmentation of the left, right, and both lung lobes (Exp#1), respectively. Similarly, DMDF-Net-2 (including both pre- and post-processing steps) exhibited average values of 75.7%, 67.22%, 69.92%, 72.78%, 99.79%, 91.11%, and 0.026 for DICE, IoU, AP, SEN, SPE, E_φ, and MAE, respectively, for COVID-19 infection segmentation (Exp#2). Finally, a detailed comparative study for both Exp#1 and Exp#2 validates the superior results of our method over various state-of-the-art deep segmentation models.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.