1 Introduction

Opacity is a term used to describe any area that preferentially absorbs X-rays and therefore appears more opaque than the surrounding area on a radiograph. It does not indicate the size or pathological structure of the abnormality [1]. In other words, opacity refers to any area that appears white on a chest radiograph when it should be darker. On both CT and chest radiographs, normal lungs appear dark compared to surrounding tissues because air has a relatively low density. When air in the lung is replaced by another substance, such as fluid or fibrotic tissue, the density of that area increases. As a result, the tissue appears lighter or gray on a chest radiograph. The term “lung opacity” on a chest radiograph refers to areas in the normally dark-appearing lung that appear denser, hazy, or cloudy [2]. Therefore, areas of opacity are areas that appear gray but should be darker [3, 4].

Figure 1 shows a healthy lung image and an image with lung opacity.

Fig. 1 X-ray images of a healthy lung and a lung with opacity [5]: (a) healthy image and (b) lung opacity image

Fig. 2 Flowchart for the diagnosis of lung opacity

In lung opacity images, haziness can be seen in the areas where labeled boxes are present (referred to as ground-glass opacity) and the usual boundaries of the lungs are lost (referred to as consolidation). Lung opacities are not homogeneous and do not have a clear center or clear boundaries [6]. For this reason, it is difficult to separate them from the rest of the image and segment them properly [5].

Lung opacity is generally benign and resolves spontaneously without complications in patients with short-term illness [7, 8]. The presence of opacity on a chest X-ray image can indicate fluid in the air spaces, thickening of the air space walls, thickening of lung tissue, inflammation, pulmonary edema, damage to and bleeding from blood vessels, cancerous growth, or fibrosis [5, 9, 10, 11]. An increase in the area of opacity also increases the risk of fatal pneumonia. The aim of this research is to detect opacity on X-ray images, reduce the burden on hospitals and healthcare professionals, distinguish between COVID-19, pneumonia, and tuberculosis, and encourage physicians to pay more attention to these areas before the disease reaches the pneumonia stage [12].

The main contributions of this paper are as follows:

  • Deep learning models have been successfully tested on X-ray images. An artificial intelligence system that helps physicians diagnose lung opacity disease through lung images has been developed.

  • A dataset with five different classes has been created from datasets accepted in the literature. This dataset includes the lung opacity, pneumonia, COVID-19, tuberculosis, and normal classes.

  • The dataset differs from those in the literature in the number and consistency of its data. It is a unique dataset created for the five classes mentioned. Its biggest difference from other datasets is that it contains more images in the lung opacity, pneumonia, COVID-19, and normal categories. In addition, as many images as possible have been collected for the tuberculosis class.

  • The three-channel fusion CNN model has been used for the first time in the classification of diseases with lung images.

  • The three-channel fusion CNN model has been proposed as a new architecture that is easy to implement and has shown success in multi-class problems.

Fig. 3 Multi-class model architecture

Table 1 Statistical information of the created dataset
Fig. 4 Samples from the dataset

The remainder of this manuscript is organized as follows:

Section 2 introduces a literature review of lung opacity diagnosis. Section 3 describes the materials and methods implemented in this study. Section 4 presents the experimental results and a comparison with the results in the literature. Section 5 discusses conclusions and future work.

Fig. 5 Lung opacity mask extraction samples

2 Literature Review

Sirazitdinov et al. [12] proposed an ensemble of two convolutional neural networks, Mask R-CNN and RetinaNet, for the localization and detection of lung opacity and other signs of pneumonia. In a study on a dataset of 26,684 images, a recall of 0.793 was obtained.

Senan et al. [13] used two deep learning models, AlexNet and ResNet-50, to diagnose X-ray datasets created from multiple sources. The features extracted by the CNN models were then combined with features from the traditional GLCM and LBP algorithms in a one-dimensional vector per image, which produced more representative features for the individual diseases.

Li et al. [14] proposed the Cov-Net model for the detection of four-class (lung opacity, COVID-19, viral pneumonia, and normal) radiological images. A modified residual network with asymmetric convolution and embedded attention mechanism was used as a backbone of the feature extractor for accurate detection of classes.

Table 2 Features used for classification tasks
Table 3 Average accuracy values for different classes
Fig. 6 Accuracy, loss, and learning rate values for the five-class classification process

Fig. 7 Accuracy, loss, and learning rate values for the four-class classification process

Fig. 8 Accuracy, loss, and learning rate values for the three-class classification process

Fig. 9 Accuracy, loss, and learning rate values for the two-class classification process

Table 4 Recall, precision, and F1 score values for classifications

Fig. 10 Confusion matrix for the five-class dataset (0: COVID-19, 1: lung opacity, 2: normal, 3: pneumonia, 4: tuberculosis)

Fig. 11 Confusion matrix for the four-class dataset (0: lung opacity, 1: normal, 2: pneumonia, 3: tuberculosis)

Fig. 12 Confusion matrix for the three-class dataset (0: lung opacity, 1: normal, 2: tuberculosis)

Fig. 13 Confusion matrix for the two-class dataset (0: lung opacity, 1: normal)

Mergen et al. [15] used deep learning methods for detecting lung abnormalities. First, multi-scale deep reinforcement learning was used for detecting anatomical landmarks. A DenseUNet was trained for lung opacity segmentation.

Rahman et al. [16] used five image enhancement techniques to increase the accuracy of disease diagnosis on a three-class (lung opacity, COVID-19, and healthy) dataset of 18,479 chest X-ray images. They then proposed a new UNet model for lung segmentation. Six different pre-trained CNNs and a shallow CNN model were examined on both normal and segmented images.

Muhammad et al. [17] successfully applied deep learning with CNNs to a five-class (lung opacity, bacterial pneumonia, viral pneumonia, COVID-19, and normal) dataset to increase diagnostic accuracy. To compensate for the scarcity of X-ray images, they proposed a self-augmentation mechanism using reconstruction independent component analysis (RICA).

3 Materials and Methods

The motivation behind this study on the diagnosis and segmentation of lung opacity on chest X-rays is to help physicians identify the condition and follow its progression with a system built on deep learning. The three-channel fusion CNN model has been used to extract the most important distinguishing features from the X-ray images. The images assigned to the lung opacity class have also been segmented using existing Python libraries (OpenCV, matplotlib) to produce masks. The resulting mask images have been stored on a web server so that physicians can compare them with a new X-ray image of the patient taken after a certain period (between 1 and 3 years). The training results will be incorporated into the system using transfer learning to segment the lung opacity class in the model. The flowchart of the study is shown in Fig. 2.

3.1 Model Description

The proposed CNN model is designed as a three-channel model. Classic fusion architectures are used with two, three, or four channels [18, 19]. The basic idea of the fusion architecture is to provide the input image multiple times in multiple stages in order to extract more features [20]. However, repeatedly providing the same image on different channels can cause inconsistency in extracting more features. Therefore, the three channels in our proposed model have been implemented with classical CNN models that have been successful in classification problems. At this stage, the architectures available in TensorFlow were tried in turn, and those with the best results were selected. The MobileNetV2 architecture has been used in the first channel, the InceptionV3 architecture in the second channel, and the VGG19 architecture in the third channel. MobileNetV2 is widely used in image classification and segmentation [21, 22]. It reduces the size and complexity of the network in terms of the number of parameters, and it is therefore preferred for its efficiency. InceptionV3 is a modified version of the Inception family with improvements that include label smoothing and factorized 7x7 convolutions. It is mostly used in image analysis and object detection problems [23]. VGG19 is a deep neural network with multiple convolutional layers. It is useful because of its simplicity, as it is composed of 3x3 convolutional layers stacked on top of each other with increasing depth. Max pooling layers are used to reduce the volume size [24]. In each of the three channels, the transfer of features from the previous layer to the current layer has been supported using the ResNet architecture.

During the transfer to the fully connected layer, the features from the three channels are combined in a concatenation layer and passed to the output layer. Then, lung opacity is detected with a multi-class classifier using the softmax function. Finally, masks are extracted from the images belonging to the lung opacity class with the help of the OpenCV and matplotlib libraries and saved on web servers. The architecture of the proposed model is shown in Fig. 3.
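The fusion step described above can be illustrated with a minimal, dependency-free sketch. It is not the trained model: the three feature vectors stand in for the outputs of the MobileNetV2, InceptionV3, and VGG19 channels, and their sizes (4, 6, and 8), the random weights, and the helper names `dense`, `softmax`, and `fuse_and_classify` are assumptions made only for illustration.

```python
import math
import random


def dense(features, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(channel_features, weights, biases):
    """Concatenate the per-channel feature vectors, then apply softmax."""
    fused = [x for feats in channel_features for x in feats]
    return softmax(dense(fused, weights, biases))


# Class order follows the five-class confusion matrix labels.
CLASSES = ["COVID-19", "lung opacity", "normal", "pneumonia", "tuberculosis"]

rng = random.Random(0)
# Hypothetical feature sizes standing in for the three backbone outputs.
channel_features = [[rng.random() for _ in range(n)] for n in (4, 6, 8)]
fused_len = sum(len(f) for f in channel_features)
# Random, untrained weights: the predicted label below is meaningless.
weights = [[rng.uniform(-1, 1) for _ in range(fused_len)] for _ in CLASSES]
biases = [0.0] * len(CLASSES)

probs = fuse_and_classify(channel_features, weights, biases)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

The point of the sketch is only the data flow: each channel contributes its own feature vector, the vectors are joined end to end, and a single softmax layer turns the fused vector into one probability per class.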

3.2 Dataset

The compiled dataset is a comprehensive collection of the data commonly used in the literature. The number of images belonging to classes that are difficult to learn (such as pneumonia and lung opacity) has been kept as high as possible. First, the publicly available dataset created by Deb and Jha [25] was examined and categorized. Then, images from the COVID-19 Grand Challenge dataset were included in these categorized groups (https://cxr-covid19.grand-challenge.org/Dataset/). Images from the dataset created by Cohen et al. were also added to the pool of data [26]. The publicly available dataset created by Chowdhury et al. was also examined and added to the data pool [16, 27]. Finally, the dataset created by Tawsifur et al. was added to the data pool [28].

The final classes in the five-class categorization are lung opacity, pneumonia, COVID-19, tuberculosis, and normal (healthy). The number of images in each class is shown in Table 1. The created dataset has been made publicly available to researchers [29]. More detailed descriptions of the dataset and a link to the dataset can be found at: https://github.com/turkfuat/covid19-pneumonia-dataset. Sample images used in the study are shown in Fig. 4.

3.3 Image Preprocessing

As the chest X-ray images are obtained from different sources, each image has different sizes, different contrasts, and different light reflections. Hence, the imaging intensity of each image is different. In addition, due to the lack of a certain standard in X-ray imaging and other reasons such as patient movements, noise occurs on the images. In noisy images, the disease diagnostic accuracy of algorithms can be reduced [30]. For this reason, preprocessing algorithms are applied to the images in the dataset [31]. For this purpose, the OpenCV Library in Python is used. If the pixel intensity is less than the specified threshold value, the pixel is set to 0 (black) to prevent it from participating in the computations. The average filter is applied to enhance the images. The contrast of each image is increased to expand the density range. All images are resized to a standard size of 224x224 pixels for deep learning models.
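The preprocessing steps listed above can be sketched without external dependencies as follows. The study uses OpenCV, where `cv2.threshold`, `cv2.blur`, and `cv2.resize` offer roughly corresponding operations; the pure-Python version here only illustrates the logic. The threshold value (20), the 3x3 kernel, min-max contrast stretching, nearest-neighbour resizing, and the order of the steps are assumptions, since the text does not specify them.

```python
def zero_below(img, thresh):
    """Set every pixel below `thresh` to 0 (black)."""
    return [[p if p >= thresh else 0 for p in row] for row in img]


def mean_filter(img, k=3):
    """Average filter; windows are clipped at the image border."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def stretch_contrast(img, lo=0, hi=255):
    """Linearly rescale intensities so they span the range [lo, hi]."""
    flat = [p for row in img for p in row]
    mn, mx = min(flat), max(flat)
    if mx == mn:  # flat image: nothing to stretch
        return [[lo for _ in row] for row in img]
    scale = (hi - lo) / (mx - mn)
    return [[lo + (p - mn) * scale for p in row] for row in img]


def resize_nearest(img, size=224):
    """Nearest-neighbour resize to a square `size` x `size` image."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def preprocess(img, thresh=20, size=224):
    """Threshold, smooth, stretch contrast, and resize (assumed order)."""
    img = zero_below(img, thresh)
    img = mean_filter(img)
    img = stretch_contrast(img)
    return resize_nearest(img, size)


# Toy 4x4 grayscale image with made-up intensities.
toy = [[10, 50, 90, 130],
       [15, 60, 100, 140],
       [18, 70, 110, 150],
       [19, 80, 120, 160]]
result = preprocess(toy)
print(len(result), len(result[0]))
```

Whatever the input size, the output is a 224x224 grid, which is the input size expected by the deep learning models in this study.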

3.4 Lung Opacity Mask Extraction Process

After an image is assigned to the lung opacity class, it is masked so that it can be compared with images obtained one to three years later. These masks will be compared with the new images obtained at follow-up examinations. For this purpose, the images are first converted to DICOM format and then the lungs are segmented using TensorFlow libraries [32, 33]. Some examples of segmentation are shown in Fig. 5.

3.5 Evaluation Metrics

Lung opacity detection is a classification task; therefore, the most fundamental metric that can be selected is the confusion matrix. The confusion matrix technique evaluates the accuracy and performance of the classification algorithm. If the images in the classes of the dataset do not show a balanced distribution, measuring the classification accuracy alone may not be sufficient and may give misleading results [13, 34]. In this study, the performance metrics calculated for the dataset used are defined as accuracy, recall, precision, and F1 score.

Accuracy is a measure of how well the algorithm is able to correctly predict the class of a given sample. It is calculated by dividing the number of correctly classified samples by the total number of predictions made. In other words, it represents the proportion of the total number of predictions that the classifier got right.

Recall is a measure of the performance of a classification model that indicates the proportion of actual positive cases that were correctly predicted by the model. It is particularly useful when the classes are imbalanced, as it gives a more complete picture of the model’s performance on the minority class.

Precision is a measure of the performance of a classification model that indicates the proportion of predicted positive cases that were actually positive. In a confusion matrix, precision is calculated by dividing the number of true positive predictions made by the model by the total number of predicted positive cases.

It is important to note that precision and recall are often trade-offs of each other: Increasing one may result in decreasing the other. As such, it is often useful to consider both precision and recall when evaluating the performance of a classification model. One way to do this is to use the F1 score, which is the harmonic mean of precision and recall.

These metrics are defined in Eq. (1) for accuracy, Eq. (2) for recall, Eq. (3) for precision, and Eq. (4) for the F1 score, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

$$\text{Accuracy} = \frac{\text{TN}+\text{TP}}{\text{TP}+\text{FP}+\text{TN}+\text{FN}} \tag{1}$$

$$\text{Recall} = \frac{\text{TP}}{\text{TP}+\text{FN}} \tag{2}$$

$$\text{Precision} = \frac{\text{TP}}{\text{TP}+\text{FP}} \tag{3}$$

$$F1\,\text{Score} = \frac{2\times (\text{Recall}\times \text{Precision})}{\text{Recall}+\text{Precision}} \tag{4}$$
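As a worked illustration of Eqs. (1) to (4), the sketch below derives the per-class counts from a multi-class confusion matrix in a one-vs-rest manner and then applies the four formulas. The matrix values are made up for illustration and are not results from this study; the convention that rows hold the true class and columns the predicted class, as well as the function name `per_class_metrics`, are assumptions.

```python
def per_class_metrics(cm):
    """Apply Eqs. (1)-(4) to each class of a square confusion matrix.

    Assumes cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class k, predicted as other
        fp = sum(cm[r][k] for r in range(n)) - tp  # other, predicted as class k
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * recall * precision / (recall + precision)
              if recall + precision else 0.0)
        results.append({
            "accuracy": (tn + tp) / total,  # Eq. (1)
            "recall": recall,               # Eq. (2)
            "precision": precision,         # Eq. (3)
            "f1": f1,                       # Eq. (4)
        })
    return results


# Made-up three-class confusion matrix (illustrative counts only).
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 2, 8]]

metrics = per_class_metrics(cm)
# Overall accuracy: correctly classified samples over all samples.
overall_accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))
for k, m in enumerate(metrics):
    print(k, {name: round(value, 3) for name, value in m.items()})
print("overall accuracy:", round(overall_accuracy, 3))
```

For class 0 of this toy matrix, TP = 8, FN = 2, FP = 2, and TN = 18, so recall, precision, and F1 score are all 0.8, while the one-vs-rest accuracy is 26/30.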

4 Results and Discussion

The proposed model is initialized with an input layer of 224x224 pixels. The ReLU and Leaky ReLU activation functions were tested in each channel, and the Adam and stochastic gradient descent (SGD) optimizers were compared. Leaky ReLU and Adam were chosen because they provided the best results. Learning rates of 0.001, 0.003, 0.0001, and 0.0003 were also tested; the best results were obtained with 0.0001, so this value is used for training. The filter size is set to \(5\times 5\) and the maximum pooling to \(2\times 2\). The channels are combined in the concatenation layer. The output layer is designed to be five-class with a softmax activation function. The model was run with different numbers of epochs (60, 80, 100, and 120). Because training did not improve further beyond 100 epochs, the number of epochs was set to 100.
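The options compared above can be written down as a small configuration sketch. It assumes that every combination of activation function, optimizer, and learning rate forms one candidate setting, which the text does not state explicitly; the dictionary keys and variable names are hypothetical and do not correspond to any particular library.

```python
from itertools import product

# Options reported in the text.
ACTIVATIONS = ["relu", "leaky_relu"]
OPTIMIZERS = ["adam", "sgd"]
LEARNING_RATES = [0.001, 0.003, 0.0001, 0.0003]
EPOCH_OPTIONS = [60, 80, 100, 120]

# Assumption: a full grid over the three tuned hyperparameters.
grid = [
    {"activation": a, "optimizer": o, "learning_rate": lr}
    for a, o, lr in product(ACTIVATIONS, OPTIMIZERS, LEARNING_RATES)
]

# Setting reported as giving the best results.
selected = {
    "activation": "leaky_relu",
    "optimizer": "adam",
    "learning_rate": 0.0001,
}

# Fixed settings reported for the final training run.
final_config = dict(
    selected,
    input_size=(224, 224),
    filter_size=(5, 5),
    pool_size=(2, 2),
    epochs=100,
    num_classes=5,
)

print(len(grid), "candidate settings")
print(final_config)
```

Under this assumption there are 2 x 2 x 4 = 16 candidate settings before the number of epochs is varied.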

The GeForce GTX 1050 Ti graphics card is used for all processes. The features used for classification algorithms are shown in Table 2.

The average accuracy of the five-class classification (lung opacity, normal, COVID-19, pneumonia, and tuberculosis) using the three-channel fusion CNN model is 91.71%. The average accuracies of the four-class (lung opacity, normal, pneumonia, and tuberculosis), three-class (lung opacity, normal, and tuberculosis), and two-class (lung opacity and normal) classifications are 87.12%, 92.44%, and 92.52%, respectively. The accuracy results for the different numbers of classes are shown in Table 3.

Table 5 Comparison of the proposed study

Figure 6 shows the accuracy, loss, and learning rate values for the five-class classification process. Accuracy, loss, and learning rate values are shown for four-class classification in Fig. 7, three-class classification in Fig. 8, and two-class classification in Fig. 9.

Table 4 shows the recall, precision, and F1 score performance of the classifiers.

Figure 10 shows the confusion matrix for detecting lung opacity using the five-class three-channel fusion CNN model. The confusion matrices of the four-class, three-class, and two-class classifiers created using the same model are shown in Figs. 11, 12, and 13, respectively.

Table 5 compares the proposed study with similar studies conducted in the literature.

Since Sirazitdinov et al. [12] developed a model for object recognition and used a smaller dataset, their accuracy values were more limited. The AlexNet and ResNet-50 models of Senan et al. [13] are less complex and have simpler structures than our three-channel model. In addition, their dataset includes only viral pneumonia images in the pneumonia category and does not include any images in the tuberculosis class, which leads to a higher classification accuracy. Li et al. [14] reported a performance similar to that of the model we proposed. The success of the Cov-Net model in classification is noteworthy. However, the absence of the tuberculosis class from their four-class classification again raises the accuracy. Muhammad et al. [17] conducted a five-class classification in which they divided the pneumonia class into viral pneumonia and bacterial pneumonia. Their classification accuracy was lower than that of other multi-class classifications.

5 Conclusion and Future Work

Lung opacity, pneumonia, COVID-19, and tuberculosis are often confused with each other. Reliable classification systems are needed for medical conditions of this kind, where the differences are not prominent. Therefore, we have implemented a new approach for the detection and multi-class classification of lung opacity with a three-channel fusion CNN model. In this model, CNN architectures that are successful in classification problems are arranged so that each one works on a separate channel. The individual evaluation of the extracted features, followed by the fusion step, provides a significant advantage in the classification phase. Additionally, in contrast to the studies in the literature conducted with limited and unbalanced datasets, a dataset with a higher number of classes can produce stable results for training and testing. Therefore, our study can serve as a guide not only for lung opacity but also for other medical studies.

In future studies, we plan to perform lung segmentation on a new progressive dataset that we will create for the follow-up of cases detected by classification. We believe that the results can be integrated with the current study to create a web-based lung opacity warning system. In this way, we are optimistic that we can reduce the workload of physicians and medical institutions and provide a more comfortable living environment for patients.