1 Introduction

The pandemic of the novel coronavirus disease (COVID-19) is affecting 213 countries and territories around the world, as well as 2 international conveyances [12]. According to the COVID-19 dashboard of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, more than 19,172,505 cases of COVID-19 have been reported (and the number is still increasing), including 716,327 deaths [13]. The high contagiousness of COVID-19 is the reason for the rapid increase in confirmed cases. COVID-19 leads to severe respiratory problems with symptoms including fever, cough and fatigue. These symptoms can develop into severe pneumonia, especially in people with weakened immune systems [34].

Reverse Transcription Polymerase Chain Reaction (RT-PCR) is the most commonly used test to detect the viral RNA, using a nasopharyngeal swab. However, several studies have reported that the RT-PCR test has high false negative rates, so that repeated testing is needed for accurate diagnosis. In addition, the RT-PCR test has availability limitations due to the shortage of manufacturing material, and the testing process is time consuming, which limits rapid and accurate screening [1, 8].

Computed tomography (CT) is an alternative to RT-PCR for COVID-19 screening, as characteristic findings appear in a high proportion of CT scans obtained from infected patients. Compared to other types of tests, CT scanning is considered a promising and efficient tool for the detection and control of COVID-19. CT imaging has been recommended for COVID-19 diagnosis; specifically, chest CT screening has been used as a routine diagnostic tool for pneumonia [1]. Chest CT scanning has demonstrated effectiveness in coronavirus disease diagnosis, including follow-up assessment and disease progression monitoring [29, 33].

Diagnostic studies using CT screening of COVID-19 patients report that infection areas may appear in CT scans before the onset of symptoms. Therefore, for asymptomatic patients, these COVID-19 infection areas can be detected by observing ground-glass opacity (GGO) and pulmonary consolidation signs, which may appear at different stages of the disease [27, 30].

Visual CT imaging analysis can help in COVID-19 diagnosis through approaches that identify the predominant patterns of infection, such as ground-glass opacity (GGO) and pulmonary consolidations. Systems in this field support three different image processing tasks. The first is CT classification, where the patient is classified as having the disease or not [3, 21]. The second is infection detection, where the infection areas are highlighted by bounding boxes. The third is infection area segmentation and disease burden calculation, obtained by applying classification at the pixel level [6].

Manual segmentation of COVID-19 infections is a tedious and time-consuming process. In addition, it greatly depends on the skills of the physician who performs the segmentation task [23, 25]. For COVID-19 infection segmentation, an automated approach is desirable, as it is, ideally, more objective and removes dependence on human skills. Recently, with the advances in computer vision, the development of deep fully convolutional networks (FCNs) has enhanced the performance of semantic segmentation, outperforming other competing approaches in the field of medical imaging [10, 19, 22].

A general deep network for image classification takes an image as input and outputs one label. However, COVID-19 chest CT analysis requires, besides classification, localizing and segmenting the areas of abnormality [10, 20]. Researchers have started to use FCNs for COVID-19 to help clinicians and radiologists in diagnosis and prognosis tasks, succeeding in improving accuracy and reducing inspection time [2].

Deep learning systems have been proposed by many researchers to help combat the rapid spread of COVID-19. Most of the recently proposed systems focus on detecting (classifying) patients infected by COVID-19 using CT screening; this is due to the wide availability of CT scans and the fact that classification does not require radiologist annotations, which are very rare [2, 33].

As an example of classification-based COVID-19 diagnosis systems, COVID-Net, introduced by the authors of [36], is a deep neural network tailored for the detection of COVID-19 cases from chest X-ray images; it is open source and available to the general public. Another study proposed a 3D deep learning system trained on pulmonary CT images to distinguish COVID-19 pneumonia from Influenza-A viral pneumonia and healthy cases [37]. A further weakly-supervised deep learning-based software system was developed using 3D CT volumes to detect COVID-19 [38].

Many deep learning systems have been proposed to assist COVID-19 diagnosis in clinical practice; however, few of them address infection delineation from CT scans [4, 7, 26, 32, 35]. Most of these techniques use a U-net FCN implementation as the backbone of their approach, while some other works propose their own COVID-19-oriented deep networks [7, 39]. The common challenge for most of the proposed methods is the insufficiency of labeled CT scans for deep network training, which cannot be remedied in a short time, as the process of annotating the infection areas is time consuming, expensive, and dependent on radiologist expertise [6].

This paper proposes an automatic deep learning segmentation system to detect and delineate COVID-19 infections in CT scans. The system starts by segmenting the lung organ as the region of interest in the CT scan, and then segments the infections inside it. The system helps physicians to assess the evolution of the disease and to calculate the burden and severity of the infection.

2 Method

2.1 Overview of the proposed framework

Figure 1 presents the flowchart of the proposed segmentation system; it consists of two main steps that are applied sequentially. The first step is lung segmentation from the plain chest CT slices, followed by COVID-19 infection segmentation. The proposed framework relies on two cascaded fully convolutional networks (FCNs). The first FCN is built to segment the lung organ, which is used as the region of interest (ROI) within which the second FCN segments the COVID-19 infection areas. The two FCNs are analyzed and trained using diverse datasets from different public sources.

Fig. 1 Flowchart of COVID-19 infection segmentation pipeline

2.2 COVID-19 datasets

In this paper, publicly available COVID-19 chest CT datasets, collected from different sources, are used to train and test the proposed FCN networks. The training datasets are diverse in terms of COVID-19 infection severity, size, location and contrast, and contain both GGOs from the early stage and pulmonary consolidations from the late infection stage. All CT datasets were labeled with the classes required for training: background, lung organ and COVID-19 infection [5]. All publicly available datasets used in this study are hosted on the MedSeg website [14].

The first open-access COVID-19 dataset was collected by the Italian Society of Medical and Interventional Radiology (SIRM); it is the first open-access COVID-19 dataset for lung infection segmentation [15]. It contains 60 cases with example CXRs and single-slice CT images, and consists of 110 usable axial CT slices of confirmed COVID-19 cases, extracted from more than 60 diverse infected CT scans. The images were segmented by a radiologist using 3 labels: ground-glass opacity, consolidation and pleural effusion. This dataset is used for the testing step of the proposed model.

The second group of public datasets is from Radiopedia; it consists of 9 volumetric COVID-19 chest CT scans with corresponding ground truth [16]. About 373 slices across the 9 scans have been diagnosed as positive and delineated by a radiologist [17, 23]. A third, newly released public dataset was collected and presented by Ma et al.; it consists of 20 annotated COVID-19 chest CT scans, 10 from the Coronacases Initiative - RAIOSS (Radiology AI One-Stop Shop) [18] and another 10 from Radiopedia [23]. The slices in these datasets have 512×512 dimensions, the number of slices per CT scan ranges from 200 to 301, and the pixel spacing and slice thickness range from 1 to 1.5 mm. These CT scans are freely accessible under a CC BY-NC-SA license, and all 20 COVID-19 CT scans were labeled by two radiologists and verified by an experienced radiologist. These datasets are used in the training step of the proposed system.

2.3 COVID-19 infections appearance enhancement

Different challenges confront the segmentation of infection areas inside the lung ROI, such as blurry edges and intensity inhomogeneity inside the infection areas. The COVID-19 infection area has low contrast in chest CT images and does not have a clear boundary with the surrounding tissues. In addition, infection areas have high variability in terms of texture, size and position in CT slices. To improve the detection of the infection areas, the segmented lung (ROI) from the CT image is enhanced using tensor-based Edge Enhancing Diffusion (EED) filtering [24]. EED filtering uses a diffusion tensor to adapt the diffusion to the image structure. The EED filter helps to enhance the contrast, filter the noise to improve intensity homogeneity, and preserve the shape boundaries [24].

To improve the detection and segmentation of the COVID-19 infection areas, EED filtering is used to increase the contrast of the infection areas by enhancing the intensity homogeneity inside these areas while preserving the boundaries with respect to the lung parenchyma. This step aims to improve the FCN training process by helping it extract and learn the main features that differentiate the infection areas from the surrounding tissues. Figure 2 shows the effect of the EED step on a raw CT slice.
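The full tensor-based EED of [24] steers diffusion along a structure-tensor eigendecomposition; as a minimal, self-contained sketch of the underlying idea only, the simpler scalar Perona-Malik scheme below smooths homogeneous regions while slowing diffusion across strong edges. It is a stand-in, not the paper's filter, and the parameter values are illustrative assumptions:

```python
import numpy as np

def edge_preserving_diffusion(img, n_iter=20, kappa=30.0, gamma=0.15):
    """Perona-Malik-style diffusion: iteratively smooths the image while an
    edge-stopping function suppresses diffusion across high-gradient pixels.
    Boundary handling via np.roll (wrap-around) is a simplification."""
    img = img.astype(np.float64).copy()
    for _ in range(n_iter):
        # finite differences toward the four neighbours
        dN = np.roll(img, -1, axis=0) - img
        dS = np.roll(img, 1, axis=0) - img
        dE = np.roll(img, -1, axis=1) - img
        dW = np.roll(img, 1, axis=1) - img
        # conduction coefficients: small where gradients are large (edges)
        cN = np.exp(-(dN / kappa) ** 2)
        cS = np.exp(-(dS / kappa) ** 2)
        cE = np.exp(-(dE / kappa) ** 2)
        cW = np.exp(-(dW / kappa) ** 2)
        img += gamma * (cN * dN + cS * dS + cE * dE + cW * dW)
    return img
```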

Fig. 2 EED enhancement of COVID-19 infection areas

2.4 Training patches extraction

The distribution and location of the infection areas inside the CT scans are unknown, which is a main concern for proper training of the proposed deep learning FCN. In addition, even when a CT slice contains infection areas, the distribution of infection within the slice is heavily skewed, as only a small percentage of the slice may belong to the infection. Therefore, using whole CT slices as training patches could lead to a strong bias toward the background, a common semantic segmentation problem in medical imaging.

To combat this problem, the training patches are extracted with different random sizes from the slices that contain infection, and they represent lung regions extracted only from the lung ROI (not the whole CT slice), as illustrated by the sketch below. As the lung organ appears with different sizes in different slices, the extracted patches come with different resolutions. Figure 3 presents examples of different training patches.
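A minimal sketch of this extraction step, assuming the lung and infection masks are available as binary NumPy arrays (the function name is illustrative, not from the paper):

```python
import numpy as np

def extract_lung_patch(ct_slice, lung_mask, infection_mask):
    """Crop the lung bounding box from a CT slice; slices whose infection
    annotation is empty are skipped, as described above."""
    if infection_mask.sum() == 0 or not lung_mask.any():
        return None  # exclude patches without an annotation mask
    rows = np.any(lung_mask, axis=1)
    cols = np.any(lung_mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    # the crop size varies with the lung's extent in this slice,
    # so patches naturally come out with different resolutions
    return (ct_slice[r0:r1 + 1, c0:c1 + 1],
            infection_mask[r0:r1 + 1, c0:c1 + 1])
```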

Fig. 3 Different training patches with different sizes

2.5 Network architecture: ResDense FCN

In this work, two cascaded deep FCNs are connected sequentially to segment the lung organ and then the COVID-19 infection areas. The backbone of the proposed FCN is a modification of the U-net architecture [31], with 5 levels as shown in Fig. 4. The U-net consists of an encoding path and a decoding path. At each level of the encoding path, three operations are applied: convolution, activation (ReLU) and batch normalization. These operations are applied twice consecutively in each level block, followed by a max-pooling operation before moving to the next level. The kernel size is 3×3 for convolutions and 2×2 for max-pooling. The feature map resolution is halved after each level.

Fig. 4 The proposed FCN

The decoding path of the network recovers the original input size by applying the same sequence of operations (convolution, ReLU, BN) but replacing max-pooling with up-sampling at each level. In addition, the corresponding feature map from the encoding path is concatenated with the input of each decoding level. The last level of the decoding path ends with a 1×1 convolution with a sigmoid activation function to generate the final binary prediction map, trained with the Dice coefficient metric.
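A minimal Keras sketch of these baseline encoder and decoder levels (the helper names are illustrative; filter counts are left as parameters since they are not fully specified in the text):

```python
from tensorflow.keras import layers

def conv_block(x, filters):
    """(Conv 3x3 -> ReLU -> BN) applied twice, as in each U-net level."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.Activation("relu")(x)
        x = layers.BatchNormalization()(x)
    return x

def encoder_level(x, filters):
    """One encoding level: conv block, then 2x2 max-pooling halves resolution."""
    skip = conv_block(x, filters)
    return skip, layers.MaxPooling2D(2)(skip)

def decoder_level(x, skip, filters):
    """One decoding level: up-sample, concatenate the encoder feature, conv block."""
    x = layers.UpSampling2D(2)(x)
    x = layers.Concatenate()([x, skip])
    return conv_block(x, filters)

def output_head(x):
    """Final 1x1 convolution with sigmoid for the binary prediction map."""
    return layers.Conv2D(1, 1, activation="sigmoid")(x)
```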

Increasing the network depth of an FCN is inevitable; however, it leads to the vanishing gradient problem, as stacking more layers washes out the gradient information, which slows down training and degrades performance. Different deep network architectures have been proposed to deal with this problem; among them, DenseNet and ResNet are considered breakthroughs in terms of performance.

In DenseNet, each layer is connected to all subsequent layers: the feature maps generated by different filters in previous layers are concatenated, which makes the model much thicker as channels are joined after every convolution operation [11]. This does not happen in ResNet, where an addition operation merges the block's input identity with the output feature map. In a ResNet block, a shortcut (skip connection) from the input of the block (the identity) bypasses the stacked layers and is added to the output feature map of the block [9]. Figure 5 illustrates the difference in connections between (a) the residual block and (b) the dense block.

Fig. 5 (a) Residual block, (b) Dense block, (c) ResDense block

DenseNet aims to ensure maximum information flow between layers in the network by combining features through concatenation rather than the summation used in ResNet. As a consequence, DenseNet is considered a memory-hungry network: back-propagation requires storing the outputs of all layers, which costs more memory and runs slowly. On the other hand, the addition of tensors is the core idea of ResNet; however, it has been argued that the direct addition of feature maps harms the gradient flow through the network, as it sums up the feature values.

Therefore, the concatenation operation is preferred as it preserves the feature maps, while summation corrupts the feature maps of both the convolution operation and the source of the skip connection. The main contribution of the proposed network architecture is the ResDense block, shown in Fig. 5c. In the proposed ResDense block, the dense connections (concatenations) are placed between residual blocks rather than between convolution layers. In terms of feature map flow and memory, the proposed ResDense block refines the feature values sufficiently through its residual blocks and memorizes the refined feature values intermittently through the dense connections between them.

In this work, the encoder and decoder paths of the proposed FCN are based on ResDense blocks. Each level in the contracting and expanding paths is built using a ResDense block, as shown in Fig. 4. Hence, the depth of the feature map is doubled and concatenated with the block input at the end of each level in the encoding path.
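A minimal Keras sketch of the ResDense idea; the number of residual blocks per level, the 1×1 projection on the shortcut, and the filter counts are interpretive assumptions, as the text does not specify them:

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """Two 3x3 convolutions with an additive identity shortcut (Fig. 5a).
    A 1x1 convolution projects the input so channel counts match."""
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    return layers.Activation("relu")(layers.Add()([shortcut, y]))

def resdense_block(x, filters, n_res=2):
    """ResDense block (Fig. 5c): dense concatenation connections placed
    between residual blocks rather than between single convolutions."""
    features = [x]
    for _ in range(n_res):
        inp = features[0] if len(features) == 1 else layers.Concatenate()(features)
        features.append(residual_block(inp, filters))
    return layers.Concatenate()(features)
```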

2.6 Network implementation and training

The proposed network is trained with annotated 2D patches resized to 256 × 256; the patches are cropped from slices with an original size of 512×512, which means that some training images are scaled up. All CT patches were normalized patch-wise using zero-mean, unit-variance normalization. The ResDense FCN architecture is implemented using Keras with the TensorFlow backend.
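A minimal sketch of this preparation step (assuming scikit-image for resizing; the small epsilon that guards against constant patches is an assumption):

```python
import numpy as np
from skimage.transform import resize  # assumption: scikit-image for resizing

def prepare_patch(patch, size=(256, 256)):
    """Resize a cropped lung patch to 256x256, then apply patch-wise
    zero-mean, unit-variance normalization."""
    patch = resize(patch, size, preserve_range=True).astype(np.float32)
    return (patch - patch.mean()) / (patch.std() + 1e-8)  # eps avoids /0
```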

Two ResDense networks were built to segment the lung and then the infection areas inside it, sequentially. Each network is trained and its parameters updated using the Adam optimizer, with a learning rate starting at 0.0001. The training process is monitored using different criteria: if there is no improvement in the validation loss over two consecutive epochs, the learning rate is reduced by a factor of 0.2; in addition, early stopping is applied, terminating the training when the validation loss has not improved over four consecutive epochs.

The networks are trained with batch sizes of 32 and 16 for lung and infection segmentation, respectively, and 15 percent of the training data is used as the validation set. The soft Dice coefficient loss was used to update the model parameters, and its value on the validation set was used to monitor training convergence. Batch normalization (BN) layers are used in the network design, which helps avoid unrealistic increases or decreases of the generated values among network layers. The final layer of the network uses a pixel-wise sigmoid activation function.
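A minimal Keras sketch of this training configuration (`model`, `train_patches` and `train_masks` are placeholders; the smoothing epsilon in the loss and the epoch budget are assumptions not stated in the text):

```python
import tensorflow as tf

def soft_dice_loss(y_true, y_pred, eps=1e-6):
    """1 - soft Dice, computed directly on predicted probabilities."""
    inter = tf.reduce_sum(y_true * y_pred)
    union = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred)
    return 1.0 - (2.0 * inter + eps) / (union + eps)

callbacks = [
    # reduce the learning rate by a factor of 0.2 after 2 stagnant epochs
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=2),
    # stop training after 4 epochs without validation-loss improvement
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4),
]

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss=soft_dice_loss)
model.fit(train_patches, train_masks,
          batch_size=16,            # 32 for the lung network, 16 for infections
          validation_split=0.15,    # 15% held out for validation
          epochs=100, callbacks=callbacks)
```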

Due to the imbalanced class distribution between lung tissue and COVID-19 infections, several steps were taken to improve the performance of the trained network. First, the soft Dice loss is used in the training process to measure the overlap between the ground-truth patches and the areas labeled as infection by the network inside the lung ROI. In addition, the second FCN is trained only inside the lung ROI, so it learns features that discriminate COVID-19 infections from the lung tissue background only; the training patches are therefore extracted by cropping only the lung part of the CT slice. Besides that, every training patch is ensured to have a corresponding annotation mask, and patches without an annotation mask are excluded from the training process.

2.7 Performance measures

To quantitatively evaluate the proposed system, a group of performance measures is used to assess the segmentation of the COVID-19 infections and the lung organ from CT scans. The first is the Dice coefficient (DSC), an overlap measure that computes the ratio between the correctly segmented class and the average size of the segmentation output (A) and the ground truth (B), as shown in Eq. 1. During the training process, however, the soft Dice loss is used, because the predicted probabilities are used directly instead of being thresholded into a binary mask. To formulate a loss function that can be minimized, we simply use 1−DSC. The soft Dice loss is calculated for each class separately and then averaged to yield a final score.

$$DSC=\frac{2\left|A\cap B\right|}{\left|A\right|+\left|B\right|} \tag{1}$$
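Written per class c over pixels i, with predicted probabilities $p_{c,i}$ and ground-truth labels $g_{c,i}$, one common formulation of the averaged soft Dice loss (the smoothing term $\epsilon$ is an assumption, not stated in the text) is:

$$L_{soft\,Dice}=1-\frac{1}{C}\sum_{c=1}^{C}\frac{2\sum_{i}p_{c,i}\,g_{c,i}+\epsilon}{\sum_{i}p_{c,i}+\sum_{i}g_{c,i}+\epsilon}$$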

The second measure is sensitivity, which gives the ratio of correctly segmented class voxels to the ground truth; it shows the method's ability to segment the intended class voxels correctly. The third measure is specificity, which gives the ratio of correctly segmented non-class voxels to the total number of non-class voxels; it shows how well the method identifies voxels that do not belong to the intended class.
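In terms of true positive (TP), false negative (FN), true negative (TN) and false positive (FP) voxel counts, these two measures are:

$$Sensitivity=\frac{TP}{TP+FN},\qquad Specificity=\frac{TN}{TN+FP}$$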

3 Results and discussion

The datasets are divided into training and testing groups. The training group consists of the CT scans from the Radiopedia and Coronacases Initiative datasets, while the testing group contains the SIRM dataset. The proposed FCNs are trained with 2D labeled slices extracted from the training group, where 3,686 and 2,216 slices are used for lung organ and COVID-19 infection segmentation, respectively. Figure 6 shows the training curve of the network for the COVID-19 infection segmentation step. The training process reached a plateau after thirteen epochs, after which the validation loss could not be improved.

Fig. 6 Learning curves of the ResDense network for COVID-19 infection segmentation

The proposed method is evaluated qualitatively and quantitatively on the diverse test dataset (the SIRM dataset). The trained networks showed impressive performance for both lung and COVID-19 infection segmentation. Figure 7 shows the visual comparison of the lung segmentation results (red) for three different examples against the corresponding ground truths (green), while Fig. 8 shows the corresponding comparison for COVID-19 infection segmentation. From the qualitative evaluation presented in Fig. 8, it can be observed that the segmented infection areas (red) reveal the strong performance of the proposed system.

Fig. 7 Visual lung segmentation comparison: (a) axial slice, (b) ground truth, (c) segmented lung organ

Fig. 8 COVID-19 infection segmentation comparison: (a) axial slice, (b) ground truth, (c) segmented infection areas

The quantitative evaluation confirmed the high performance of the proposed system, as shown by the values of the performance measures in Table 1. The achieved DSC values are 0.96 for lung segmentation and 0.78 for COVID-19 infections. In addition, the system achieved sensitivity and specificity of 0.93 and 0.99 for lung segmentation, and 0.82 and 0.95 for infection segmentation, respectively. The table also presents the effect of the EED step and how it improved the infection segmentation, as explained in Sect. 2.3.

Table 1 Measure values for lung and COVID-19 infection segmentation

For further evaluation, a comparative study was carried out to assess the proposed system with respect to other COVID-19 segmentation approaches. However, it is difficult to establish a fair comparison due to the lack of common public datasets. Table 2 lists the proposed method along with other state-of-the-art COVID-19 infection segmentation methods. Some of these methods used publicly available datasets, while other works used their own private datasets for their analysis. As Table 2 shows, the proposed method achieves a high DSC value compared to the other listed methods.

Table 2 Comparison with other works for COVID-19 infection segmentation

The lack of public annotated training datasets is the main challenge for COVID-19 deep learning segmentation systems, especially for data-hungry deep FCNs that use 3D patches for training [26, 39]. Works based on limited datasets may suffer from over-fitting, as a small number of training datasets leads to biased systems that lack the ability to generalize.

However, some authors have tried to deal with limited datasets by using cross-validation to generate multiple folds from the same small dataset, where each generated fold is used for both training and testing [23, 26]. Ma et al. conducted a study to combat the lack of publicly available datasets for training COVID-19 deep learning systems [23]. They carried out three tasks to evaluate potential annotation-efficient strategies, trying COVID-19 datasets with few annotations, existing non-COVID-19 datasets, and heterogeneous datasets including both COVID-19 and non-COVID-19 CT scans. With the non-COVID-19 datasets, the models almost completely failed to predict COVID-19 infections on the testing set, which highlights that the lesion appearances differ significantly among lung cancer, pleural effusion, and COVID-19 infections in CT scans. The authors then proposed a U-net based deep learning system trained using 20 CT scans, 10 cases from the Coronacases Initiative and 10 cases from Radiopedia; they used 20% of the dataset (4 cases) for training and the remaining 80% (16 cases) for testing [23]. Similarly, Muller et al. used the same dataset as Ma et al., but with inverted percentages (80% for training and 20% for testing), to train their 3D deep networks [26].

Qiu et al. used the SIRM dataset, which consists of 100 axial CT images from 60 patients with COVID-19, to train their COVID-19-specific MiniSeg system. The authors argued that, although the dataset is small, it is diverse, with each patient contributing on average 1.6 axial CT images. They randomly chose 60 images for training their models and another 40 CT images for performance evaluation [28]. On the other hand, some works have been able to use plenty of their own (private) data to train deep learning systems. Yan et al. trained and tested their specifically designed 3D deep system, COVID-SegNet, using chest CT images from 861 patients with confirmed COVID-19, annotated by experts [39].

In general, segmentation of COVID-19 infection from CT is a challenging task, mainly due to the appearance of the infection in CT slices: the infection areas are characterized by high variability in terms of texture, size and position of GGOs and consolidations. Moreover, the low contrast and blurry boundaries of the infection areas impose further challenges on the delineation process. Based on that, the proposed framework includes different steps that combat these challenges, namely the lung segmentation, EED and training patch generation steps. The lung segmentation step is of high importance for infection delineation, as it concentrates the training of the deep FCN on the region of interest. The proposed network is trained using slices that contain information only from the lung region, so it learns features from two classes only: lung tissue (background) and infection areas.

Also, to combat the deficiencies in infection appearance, edge enhancing diffusion (EED) filtering is applied. EED filtering strengthens the contrast between the infection areas and the background by sharpening the infection area boundaries, and it also enhances the intensity homogeneity within the infection areas. This step helps the network learn strong features that discriminate pixels into either infection areas or background.

The proposed system demonstrated the robustness, generalization and scalability of the proposed ResDense network, which is trained using 2D patches from datasets supplied by multiple sources. The network succeeded in learning the lung and infection features, reaching validation Dice losses of 0.012 and 0.134, respectively. Compared to other works [7, 23, 26], this work benefited from training the proposed network with more newly released annotated COVID-19 chest CT scans [5].

4 Conclusion

This paper presented a deep learning system for COVID-19 lung infection segmentation in chest CT scans. The constructed FCN uses the U-net architecture as its backbone, with each level in the encoding and decoding paths built using the proposed ResDense blocks. The feature maps of the infection areas and the lung background flow through the network without significant change of their values, thanks to the concatenation skip connections in the ResDense blocks, which improved network learning and enhanced the segmentation performance. Moreover, the system includes an EED step to improve the appearance of the infection areas in CT slices by enhancing their contrast and intensity homogeneity. The qualitative and quantitative evaluation results demonstrate the effectiveness of the system and its ability to segment COVID-19 infection areas from CT images. The system was trained and validated using diverse datasets from different sources, which demonstrates its ability to generalize and its promise as a tool for automatic COVID-19 infection segmentation in clinical routine.