Development of Chest X-ray Image Evaluation Software Using the Deep Learning Techniques

Although the widespread use of digital imaging has enabled real-time image display, images in chest X-ray examinations are still confirmed by the radiologist's eyes. With the development of deep learning (DL) technology, its application should make it possible to immediately determine the need for a retake, which is expected to further improve examination throughput. In this study, we developed software for evaluating chest X-ray images to determine whether a repeat radiographic examination is necessary, based on the combined application of DL technologies, and evaluated its accuracy. The target data were 4809 chest images from a public database. Three classification models (CLMs), for lung field defects, obstacle shadows, and the location of obstacle shadows, and a semantic segmentation model (SSM) for the lung field regions were developed using fivefold cross validation. Each CLM was evaluated using the overall accuracy in the confusion matrix, the SSM was evaluated using the mean intersection over union (mIoU), and the DL technology-combined software was evaluated using the total response time on this software (RT) per image, obtained by summing the RT of each model. The overall accuracies of the CLMs for lung field defects, obstacle shadows, and obstacle shadow location were 89.8%, 91.7%, and 91.2%, respectively. The mIoU of the SSM was 0.920, and the software RT was 3.64 × 10⁻² s. These results indicate that the software can immediately and accurately determine whether a chest image needs to be retaken.


Introduction
A large number of X-ray examinations are performed annually worldwide [1], of which chest X-ray (CXR) examinations account for the largest share [2]. In the past, the X-ray intensity distribution obtained by irradiating the subject was recorded on film, and the image was displayed by developing the film (screen/film system). In recent years, however, digital imaging methods, such as computed radiography (CR), have been increasingly used, and the time required to display the acquired images has been significantly shortened. Although the imaging plates used in CR require time for reading and processing, flat panel detectors (FPDs), which display images in real time, are becoming more widely used.

Subjects
In this study, we used the publicly available dataset "ChestX-ray8" provided by the National Institutes of Health (NIH) Clinical Center [18]. The software was based on 4809 images from this dataset. The purpose of the software was to give radiologists an artificial intelligence-based judgment on whether a CXR image should be retaken during CXR examinations in clinical settings. Therefore, it must be able to pick out, from many CXR images, those that should be retaken.
We used 1000 of 4809 images to create a classification model (CLM). Because each CLM was created independently, we did not consider the overlap of data among the models.
In addition, we created a semantic segmentation model (SSM) to improve the visibility of the lung field region. For this model, we used 1000 chest images, again without considering the overlap of data with the other models. Table 1 shows the breakdown of the 4809 chest images, including duplicates. In a CXR examination, the presence or absence of defects in the lung fields should be checked first when determining whether an image needs to be retaken. This is because if the lung fields to be examined are not fully depicted, there is a risk of missing the disease to be detected.
From the 4809 images, we used 1000 images in which the lung field area was depicted without defects (defined as "OK") and 1000 images in which the lung field area was depicted with defects (defined as "NG") (Figure 1) and created a CLM for the lung field defects.
When this CLM was trained, data augmentation was applied: the brightness was changed in five steps (50%, 75%, 100%, 125%, and 150%) and the scale was changed in three steps (0.8×, 0.9×, and 1.0×). The training data were therefore expanded 15 times.
Data augmentation with scaling was performed to improve the generalization performance. However, if upscaling is applied to images without lung field defects, parts of the lung fields may be cut off, turning them into images with lung field defects. Therefore, only the unscaled (1.0×) and downscaled images were used. In addition, to standardize the data augmentation method, the same procedure was applied to the images with lung field defects.

CLM for Obstacle Shadow
If no lung field defect is found on the CXR image, it is important to check the presence or absence of obstacle shadows and their types to determine whether retaking the image is necessary. This is because a disease may be overlooked if it is hidden behind an obstacle shadow.
There are two major types of obstacle shadows: those of medical devices and those of nonmedical devices. Medical devices, such as pacemakers and tubes, need not be removed during imaging because their removal poses a risk to the patient's safety. On the other hand, the removal of nonmedical devices, such as accessories and underwear, does not pose a risk to the patient's body; therefore, these should be removed before retaking the CXR image.
Of the collected 4809 CXR images, a total of 1000 chest images without any obstacle shadows (defined as "None"); 1000 chest images with medical devices, such as a pacemaker, port, tube, or electrocardiogram device (defined as "Medical devices"); and 1000 chest images with nonmedical devices, such as a belt, underwear, a necklace, or earrings (defined as "Others"), were selected to create a CLM for the obstacle shadows (Figure 2).
When this CLM was trained, data augmentation was applied: the brightness was changed in five steps (50%, 75%, 100%, 125%, and 150%) and the angle was changed in seven steps (−15°, −10°, −5°, 0°, +5°, +10°, and +15°). The training data were therefore expanded 35 times. Patients with medical devices often undergo portable chest radiography while lying in a hospital bed because of their health conditions. In portable chest radiography, the detector is placed between the bed and the patient, and the patient's body may be tilted relative to the image. Therefore, we used data augmentation via rotation to improve the generalization performance with respect to the inclination of the patient's body.

CLM for the Location of Obstacle Shadow
For CXR images that belonged to the "Others" category in the dataset of the CLM for obstacle shadows, the location of the obstacle shadow, inside or outside the lung field, was important in determining whether the CXR examination needed to be retaken. This was because if the obstacle shadows were located inside the lung fields, a disease may be overlooked, whereas if they were located outside the lung fields, they would not affect the diagnosis because the lung fields can be adequately evaluated.
We manually classified the 1000 chest images ("Others") in which a nonmedical device was detected according to the position of the observed obstacle shadows and created two data groups: one group in which the obstacle shadows were located inside the lung fields ("In") and the other group in which the obstacle shadows were located outside the lung fields ("Out"). The numbers of images in the "In" and "Out" data groups were 270 and 730, respectively (Figure 3).
When this CLM was trained, the brightness change in five steps (50%, 75%, 100%, 125%, and 150%) was applied to both classes as a common data augmentation method. To keep the numbers of training images in the two classes close to each other, each "In" image was additionally rotated in 11 steps (−15°, −12°, −9°, −6°, −3°, 0°, +3°, +6°, +9°, +12°, and +15°) and scaled in 2 steps (0.9× and 1.0×), whereas each "Out" image was rotated in 7 steps (−15°, −10°, −5°, 0°, +5°, +10°, and +15°), which increased the "Out" data volume by a total of 35 times. A CLM was then created from the augmented data.
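The expansion factors quoted for the data augmentation can be checked with a few lines of arithmetic. The following Python sketch is illustrative only (the software in this study was written in MATLAB). It assumes that every combination of the listed settings is applied to each image, which reproduces the stated factors of 15 and 35; the factor for the "In" class is not stated in the text and is derived here under that same assumption. The counts ignore the split into training and validation subsets.

```python
from itertools import product

# Augmentation settings as listed in the text.
BRIGHTNESS = [0.50, 0.75, 1.00, 1.25, 1.50]
SCALES_DEFECT = [0.8, 0.9, 1.0]
ANGLES_7 = [-15, -10, -5, 0, 5, 10, 15]
ANGLES_11 = list(range(-15, 16, 3))  # -15, -12, ..., +12, +15
SCALES_IN = [0.9, 1.0]


def expansion_factor(*settings):
    """Number of variants per source image, assuming a full cross product."""
    return len(list(product(*settings)))


factors = {
    "lung field defect CLM": expansion_factor(BRIGHTNESS, SCALES_DEFECT),
    "obstacle shadow CLM": expansion_factor(BRIGHTNESS, ANGLES_7),
    "location CLM (In)": expansion_factor(BRIGHTNESS, ANGLES_11, SCALES_IN),
    "location CLM (Out)": expansion_factor(BRIGHTNESS, ANGLES_7),
}

# Class sizes of the location CLM before augmentation.
n_in, n_out = 270, 730
augmented_in = n_in * factors["location CLM (In)"]
augmented_out = n_out * factors["location CLM (Out)"]

for name, factor in factors.items():
    print(f"{name}: x{factor}")
print(f"In: {augmented_in} images, Out: {augmented_out} images")
```

Under this assumption, the roughly 1:2.7 imbalance between the "In" and "Out" classes is reduced to about 1.16:1 after augmentation, which is consistent with the stated aim of keeping the two classes close in size.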

SSM for the Lung Field Region
If the lung field region can be recognized at a glance when these CLMs are applied, the judgments presented by the CLMs can be easily checked by a human. This will contribute to improving not only the throughput of daily work but also the accuracy of examinations, because it will lead to the establishment of a system of double checks by artificial intelligence and humans.
In this study, we segmented both lung fields, including the mediastinum, on 1000 chest images and constructed an SSM that recognized the lung field region (Figure 4). The mediastinum was included because medical devices, such as corrective fixation devices for scoliosis, and nonmedical devices, such as necklaces, were often located over the mediastinum. In other words, the lung field region was segmented together with the mediastinum so that devices located over the mediastinum are also covered by the segmented region. The model was created by training on data augmented a total of 35 times by changing the brightness in five steps (50%, 75%, 100%, 125%, and 150%) and rotating the images in seven steps (−15°, −10°, −5°, 0°, +5°, +10°, and +15°).

Software Development
In this study, we developed in-house software in MATLAB (The MathWorks, Inc., Natick, MA, USA) to evaluate CXR images by combining the three CLMs and the SSM in a single software package. With this combination of four DL models, the software not only immediately identifies the lung fields (contouring) in the input CXR image but also returns an artificial intelligence response on whether retaking the CXR image is necessary. The classification models were defined according to the procedure that radiological technologists actually use to check CXR images in clinical practice. The decision checks that the entire lung fields are included in the image, that no foreign objects appear in the image, and that, if foreign objects are present, they do not overlap the lung fields, including the mediastinal region, as obstacle shadows. The developed software thus follows the same flow as that used by radiological technologists to determine whether a CXR image needs to be retaken. Figure 5 shows an overview of our software using multiple DL models.
Figure 5. Overview of the CXR image evaluation software.
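The decision flow described in this section can be sketched as a short cascade. The Python code below is a minimal illustration, not the authors' MATLAB implementation: the three classifier arguments are hypothetical callables standing in for the trained CLMs, and the label strings are the class names defined earlier in the text. The SSM is not part of the decision itself; it only provides the lung field contour shown to the user.

```python
from typing import Callable, Tuple

Image = object  # placeholder type; any image representation can be passed


def needs_retake(
    image: Image,
    classify_defect: Callable[[Image], str],    # hypothetical: "OK" or "NG"
    classify_obstacle: Callable[[Image], str],  # hypothetical: "None", "Medical devices", "Others"
    classify_location: Callable[[Image], str],  # hypothetical: "In" or "Out"
) -> Tuple[bool, str]:
    """Return (retake needed?, reason) following the three-step check."""
    # Step 1: a lung field defect always calls for a retake.
    if classify_defect(image) == "NG":
        return True, "lung field defect"

    # Step 2: no obstacle, or a medical device that must stay in place.
    obstacle = classify_obstacle(image)
    if obstacle in ("None", "Medical devices"):
        return False, f"obstacle shadow: {obstacle}"

    # Step 3: a nonmedical device matters only if it overlaps the lung field.
    if classify_location(image) == "In":
        return True, "nonmedical obstacle shadow inside the lung field"
    return False, "nonmedical obstacle shadow outside the lung field"


# Usage with stub classifiers that return fixed labels.
decision = needs_retake(
    image=None,
    classify_defect=lambda _: "OK",
    classify_obstacle=lambda _: "Others",
    classify_location=lambda _: "In",
)
print(decision)
```

Ordering the checks this way means the location CLM only has to run for images in the "Others" class, which matches the way its training data were selected.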

Architecture and Training
In this study, three CLMs and an SSM were created: ResNet-50 [19] was used for the CLMs, and DeepLabv3+ [20] was used for the SSM. Tables 2 and 3 show the training conditions used for the CLMs and the SSM, respectively. Table 4 shows the equipment and software used in the training.
To evaluate the generalization performance accurately, without the results being distorted by overtraining (overfitting) to a particular split, we used fivefold cross validation for training and evaluation. The five subsets were named 1-fold to 5-fold to distinguish them. Because there are many types of obstacle shadows, the subsets for the obstacle shadow CLM were created while taking care that the types of obstacle shadows depicted were not biased among the subsets.
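One common way to keep the subsets unbiased is a stratified split, in which each label is distributed evenly over the folds. The Python sketch below illustrates that idea under stated assumptions: the function names, the fixed random seed, and the use of the three class names as strata are choices made for this example, and the text does not say how the authors actually assigned images. Finer labels (for example, the type of device) could be used as strata in the same way.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, balancing every label."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_validation_splits(folds):
    """Yield (train_indices, validation_indices), holding out one fold at a time."""
    for held_out, validation in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, validation


# Example with the class sizes of the obstacle shadow CLM.
labels = ["None"] * 1000 + ["Medical devices"] * 1000 + ["Others"] * 1000
folds = stratified_folds(labels)
for train, validation in train_validation_splits(folds):
    print(len(train), len(validation))
```

Each of the five models is then trained on four folds and evaluated on the remaining one, so every image is used for evaluation exactly once.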

Evaluating the Created Models
In this study, we evaluated each CLM using the overall accuracy from the confusion matrix and the SSM using the mean intersection over union (mIoU). To evaluate the chest X-ray image evaluation software, which combines three CLMs and one SSM, we measured the response time on this software (RT), that is, the time taken for the input CXR image to pass through each created model and for the system to show a response on whether retaking is necessary. Because each model was created independently, the RT per image of the software was calculated by summing the RT per image of each model. The evaluation indices were defined as follows:

$$\text{Overall accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

$$\text{mIoU} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FP_i + FN_i}$$

$$RT = RT_{CLM1} + RT_{CLM2} + RT_{CLM3} + RT_{SSM}$$

where TP is true positive, FP is false positive, TN is true negative, FN is false negative, N is the number of images, RT_CLM1 is the response time of the lung field defect CLM, RT_CLM2 is the response time of the obstacle shadow CLM, RT_CLM3 is the response time of the obstacle shadow location CLM, and RT_SSM is the response time of the lung field region SSM.
Because each model was trained and evaluated using fivefold cross validation, the mean values of the overall accuracy and mIoU over the five resulting models were used as the evaluation of each model.
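The two accuracy indices named above can be computed as in the following Python sketch. It assumes the standard definitions: overall accuracy is the share of correct predictions in the confusion matrix (which also covers the three-class case), and mIoU is the per-image IoU of the predicted and reference lung field masks averaged over the images. The function names and the toy inputs are made up for illustration and are not data from this study.

```python
def overall_accuracy(confusion):
    """Share of correct predictions; confusion[i][j] counts true class i predicted as j."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    # Assumes at least one mask is non-empty, as is the case for lung fields.
    return tp / (tp + fp + fn)


def mean_iou(pred_masks, true_masks):
    """Average of the per-image IoU values."""
    scores = [iou(p, t) for p, t in zip(pred_masks, true_masks)]
    return sum(scores) / len(scores)


# Toy inputs for illustration only.
toy_confusion = [[9, 1], [2, 8]]
toy_pred = [[1, 1], [0, 0]]
toy_true = [[1, 0], [1, 0]]
print(overall_accuracy(toy_confusion))
print(mean_iou([toy_pred, toy_true], [toy_true, toy_true]))
```

For the fivefold cross validation, the same functions would be applied to each of the five models and the five values averaged.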

CLM for Lung Field Defects
Figure 6 and Table 5 show the results of the fivefold cross validation on 1000 chest images without lung field defects and 1000 chest images with lung field defects. Figure 6 shows the confusion matrix of the 1-fold model as a representative of the five models (1-fold to 5-fold).
These results indicated that the presence or absence of lung field defects could be discriminated with an accuracy of approximately 90%.

CLM for Obstacle Shadow
Figure 7 and Table 6 show the results of the fivefold cross validation on 1000 chest images with a medical device ("Medical devices"), 1000 chest images with a nonmedical device ("Others"), and 1000 chest images without obstacle shadows ("None"). Figure 7 shows the confusion matrix of the 1-fold model as a representative of the five models (1-fold to 5-fold).
Table 6. Accuracy evaluation for each dataset (columns: Model; Overall Accuracy (%)).
These results indicated that the presence or absence and the type of the obstacle shadow could be discriminated with an accuracy of approximately 92%.
Figure 8 and Table 7 show the results of the fivefold cross validation on the 270 "In" and 730 "Out" images, which were classified according to the location of the obstacle shadows. Figure 8 shows the confusion matrix of the 1-fold model as a representative of the five models (1-fold to 5-fold).
These results indicated that it was possible to discriminate with approximately 91% accuracy whether the location of the obstacle shadow was inside or outside the lung field.

SSM for the Lung Field Region
Semantic segmentation of the lung field region was performed on 1000 chest images, and the results evaluated by the fivefold cross validation are shown in Table 8.
The results show that semantic segmentation of the lung field region, including the mediastinum, can be performed with an mIoU of approximately 0.92.

Chest X-ray Image Evaluation Software
The RT per image for each model was used to evaluate the CXR image evaluation software using three CLMs and one SSM. The results are shown in Table 9.
Table 9. Response time of each model on this software (CLM1: lung field defect classification model; CLM2: obstacle shadow classification model; CLM3: obstacle shadow location classification model; SSM: lung field region semantic segmentation model).
The results were as follows: the RT per chest image for the lung field defect CLM was 2.40 × 10⁻³ s; that for the obstacle shadow CLM was 1.85 × 10⁻³ s; that for the obstacle shadow location CLM was 3.25 × 10⁻³ s; and that for the lung field region SSM was 2.89 × 10⁻² s. Therefore, the RT of this software, which is the sum of these RTs, was 3.64 × 10⁻² s. Figure 9 shows the actual screen of the software. For example, the response "Need retaking" was shown for the chest image with the lung field defect, informing the radiographer that a retake is needed. Figure 10 shows the decision evidence for each of the representative images as a heatmap based on occlusion sensitivity [21], which is one type of saliency map. For each feature indicated by a red circle on the image, the heatmap shows strong evidence for the classification in red.
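The total RT can be verified directly from the four per-model values. The short Python sketch below uses the figures reported in the text; `Decimal` is used only to avoid floating-point rounding in the sum, and the dictionary keys are labels chosen for this example.

```python
from decimal import Decimal

# Response time per image of each model, in seconds, as reported in the text.
response_times = {
    "CLM1 (lung field defect)": Decimal("2.40e-3"),
    "CLM2 (obstacle shadow)": Decimal("1.85e-3"),
    "CLM3 (obstacle shadow location)": Decimal("3.25e-3"),
    "SSM (lung field region)": Decimal("2.89e-2"),
}

# The software RT is defined as the sum of the per-model RTs.
total_rt = sum(response_times.values(), Decimal(0))
ssm_share = response_times["SSM (lung field region)"] / total_rt

print(f"Total RT: {total_rt} s")
print(f"Share of the SSM: {float(ssm_share):.1%}")
```

The sum reproduces the reported software RT of 3.64 × 10⁻² s and shows that the SSM accounts for most of it, so any effort to shorten the RT further would start with the segmentation step.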

Discussion
To the best of our knowledge, this software is the first attempt to apply DL techniques to determine whether retaking a CXR image is necessary. In this software, four DL models were combined to form a single system. Because there are few directly related previous studies, the models and software developed in this study are discussed through comparison with studies of QA systems for X-ray images, which are the most relevant to this study, and studies on semantic segmentation of lung field regions not including the mediastinum. Below, we discuss each model and the software and describe the limitations and prospects of this study.
First, we discuss the lung field defect CLM. Although classification techniques have been widely applied to detecting defective products on factory production lines [22], their application to the classification of defects in the lung field region has been little studied. Junhao et al. [17] classified the presence or absence of defects in the lung field region as part of the construction of a QA system for CXR images, using a combination of semantic segmentation and classification techniques. In their study, they attempted to perform QA pixel-wise using semantic segmentation instead of applying image classification directly, because the target area for QA evaluation was small. They reported an accuracy of 92.50% for the image-level classification of the presence or absence of lung field defects, which was slightly higher than that of the present study (89.8%), whereas the pixel-wise examination using semantic segmentation showed an accuracy of 97.96%. In the data used in the present study, there were some images in which the lung field defects occupied only small areas, such as the costophrenic angle. Considering the performance of hardware in actual clinical practice, complex data input may lower throughput because of slower processing. Nevertheless, it will be important to consider input features that allow AI to identify the presence or absence of lung field defects more easily in the future.
Second, we discuss the obstacle shadow CLM. Junhao et al. [17] likewise attempted to classify the presence or absence of artifacts. In their study, the accuracy of the image-level classification of the presence or absence of artifacts was 83.75%, which was lower than the accuracy of 91.7% in the present study.
One reason for this difference in accuracy may be the difference in the number of chest images with artifacts used for training. The fact that the accuracy was better in the present study, in which a larger number of chest images with artifacts were used for training, suggests that further improvement in classification accuracy can be expected with an increase in the number of images in the future. On the other hand, the classification using the semantic segmentation technique showed an accuracy of 94.90%, which was better than the accuracy of the present study.
Although we were unable to identify any studies that directly classified the types of obstacle shadows, Ue-Hwan et al. [23] investigated manufacturer classification, model group identification, and magnetic resonance imaging safety characterization of cardiac implantable electronic devices (CIEDs) using a DL-based algorithm. The overall accuracy rates on the respective internal test datasets were 99.7%, 97.2%, and 98.9%. In that study, images in which the CIED portion had been cropped and resized as preprocessing were used for training and evaluation. Therefore, considering the differences in the purpose of the study and in the image formats, it is difficult to compare the accuracy directly with that of the present study.
Third, we discuss the obstacle shadow location CLM. In this study, we applied data augmentation to 270 chest images in the "In" category and 730 chest images in the "Out" category. The numbers of training data in the two classes were brought close to each other by adjusting the expansion rate of the data augmentation. As a result, although some difference in the number of training data remained between the two classes, the classification accuracy was high without being biased toward either class. This result supports the validity of the data augmentation method used to balance the classes. However, one reason why the overall accuracy of the obstacle shadow location CLM was only 91.2% may be the cases in which the obstacle shadows, such as necklaces, were located at the boundaries of the lung fields or straddled the boundary between the inside and outside of the lung fields. In such cases, it is difficult to determine whether the obstacle shadow is located inside or outside the lung field.
Therefore, such cases may have degraded the accuracy; however, we were not able to visualize the basis for the decisions of the DL model in this study. In the future, we believe that it is necessary to incorporate a technique, such as saliency maps [21,24], that can represent the basis of a judgment as a heat map, and to examine the causes in more detail.
Fourth, we discuss the SSM of the lung field region. In this study, we compared the results of previous studies [11,12] and performed semantic segmentation for the lung field region, including the mediastinum. Among such previous studies, the study on CardioNet by Abbas et al. [25] performed semantic segmentation not only for the lung field but also for the heart and clavicle. Of the CardioNet variants used in that study, CardioNet-B performed semantic segmentation for the lung, heart, and clavicle with mIoU values of 0.9728, 0.9042, and 0.8674, respectively. Thus, when the comparison is limited to the lung, the accuracy was higher than that of the present study. However, the mIoU values of the heart and clavicle were lower than that of the lung. This result suggests that semantic segmentation is more difficult in low-radiolucent tissues than in high-radiolucent tissues, such as the lung field. It also suggests that the accuracy of semantic segmentation of the lung field including the mediastinum tends to be lower than that of semantic segmentation of the lung field without the mediastinum.
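The mIoU values compared above can be illustrated with a small calculation. The following is a minimal sketch, assuming a two-class labeling (0 = background, 1 = lung field) and assuming that mIoU is the unweighted mean of the per-class IoU values; the function names and the toy masks are hypothetical and are not taken from the software of this study.

```python
from typing import Sequence

# A mask is a 2-D grid of class labels (assumed: 0 = background, 1 = lung field).
Mask = Sequence[Sequence[int]]


def class_iou(pred: Mask, truth: Mask, label: int) -> float:
    """IoU of one class: (pixels in both masks) / (pixels in either mask)."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p_hit = p == label
            t_hit = t == label
            inter += p_hit and t_hit
            union += p_hit or t_hit
    # Assumed convention: a class absent from both masks counts as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pred: Mask, truth: Mask, labels: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of the per-class IoU values."""
    return sum(class_iou(pred, truth, label) for label in labels) / len(labels)


# Toy 2 x 4 masks (hypothetical data, for illustration only).
truth_mask = [[0, 1, 1, 0],
              [0, 1, 1, 0]]
pred_mask = [[0, 1, 1, 1],
             [0, 1, 0, 0]]

print(class_iou(pred_mask, truth_mask, 1))  # lung field: 3 shared / 5 in either
print(mean_iou(pred_mask, truth_mask))
```

Because IoU penalizes both missed and spurious pixels, even a few errors along a structure's border lower the value noticeably for small structures, which is consistent with the lower values reported for the heart and clavicle.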
However, the SSM developed in this study had a higher mIoU value than that of a previous study [15] that used DeepLabv3+, as our method does. Considering this finding, we believe that the semantic segmentation and data augmentation methods used in our method are appropriate. However, to further improve the accuracy of semantic segmentation in the future, it is natural to use more data, and it is also necessary to take measures such as those proposed by Johnatan et al. [26] against the reduction in segmentation accuracy caused by the abnormal shadows of lesions, so that the lung field regions of more examinees can be recognized accurately.
Fifth, we discuss the evaluation software for CXR images that combines these DL models. We evaluated the RT per chest image of the software by summing the RTs per chest image of the four DL models. Considering that the image processing time is approximately several seconds for FPD and approximately several tens of seconds for CR, the software RT of 3.64 × 10⁻² s is negligible in comparison, so the total time remains almost the same as that required for conventional imaging operations, and the software is considered able to provide an artificial intelligence-based judgment on whether a retake is necessary. However, because the RT of this software varies depending on the device used, it is important to examine the response time on a PC with the specifications used in actual clinical practice in the future. In addition, the present study did not evaluate FLOPs; only the response time of the software was calculated, which should be taken into consideration in the future. FLOPs are useful for evaluating the performance and efficiency of models [27], but in this study, we were interested in evaluating the time from image input to the display of results in a simple software program. This is because the actual time required to display the results of multiple models in the software is one of the criteria for clinical image confirmation.
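The way the software RT is obtained, as a sum of per-model RTs per image, can be sketched as follows. This is a minimal illustration and not the MATLAB implementation of this study: the four placeholder callables stand in for the three CLMs and the SSM, and all names are hypothetical.

```python
import time
from typing import Callable, Dict, Sequence


def mean_rt_per_image(model: Callable[[object], object],
                      images: Sequence[object]) -> float:
    """Mean wall-clock time (s) that one model needs per image."""
    start = time.perf_counter()
    for image in images:
        model(image)
    return (time.perf_counter() - start) / len(images)


def software_rt(models: Dict[str, Callable[[object], object]],
                images: Sequence[object]) -> Dict[str, float]:
    """Per-model RT per image, plus their sum under the key 'total'."""
    rts = {name: mean_rt_per_image(fn, images) for name, fn in models.items()}
    rts["total"] = sum(rts.values())
    return rts


# Placeholder models (assumption): trivial functions instead of trained networks.
placeholder_models = {
    "lung_field_defect_clm": lambda img: sum(img) > 0,
    "obstacle_shadow_clm": lambda img: max(img) > 200,
    "obstacle_location_clm": lambda img: img[0] > img[-1],
    "lung_field_ssm": lambda img: [int(px > 128) for px in img],
}
# Placeholder "images" (assumption): short lists of pixel values.
dummy_images = [[(i * 37 + j) % 256 for j in range(64)] for i in range(100)]

for name, seconds in software_rt(placeholder_models, dummy_images).items():
    print(f"{name}: {seconds:.2e} s per image")
```

Timing each model separately makes it visible which model dominates the total, which is useful when the measurement is repeated on a PC with clinical specifications.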
This software was created by combining multiple DL models. The images in this study ranged from those of healthy patients to those of patients who were hospitalized and placed under long-term management, including electrocardiograms. Typically, inappropriate X-ray images depicting missing lung fields are retaken, so they rarely remain in the data. Multiple segmentation and classification models are used because an image with a lung field defect is not suitable for detecting lesions, and the radiological technologist must take this into consideration. Therefore, a dedicated model should focus only on the detection of lung field defects. In addition, obstacle shadows may or may not be acceptable depending on their location, so the segmentation model of the lung field, including the mediastinum, needs to delineate the lung field exactly. For these reasons, the software in this study was developed based on the decisions that a radiological technologist makes in X-ray radiography. Each model was created by learning from images that were originally subject to retaking and could not be stored in the picture archiving and communication system, or from images in which a medical device was depicted because of the patient's health condition. Therefore, to make effective use of the original images, each model was created independently without considering the overlap of training data. This makes it difficult to comprehensively evaluate the software that combines the models. However, because each model of this software had an accuracy of approximately 90%, the software is considered able to immediately and accurately provide the radiologist with a decision as to whether a retake is necessary, and to encourage confirmation by the human eye.
Finally, we describe the limitations and prospects of this study. One limitation is that the software was built and evaluated using only CXR images collected from "CXR8" published by the NIH Clinical Center.
Considering that CXR examination is the most common imaging method, it is important for a QA-related system, such as this software, to be trained and evaluated on data taken from several facilities, because this contributes to higher generalization performance. Therefore, we will aim to add more data by utilizing other datasets in the future and attempt to construct software with high generalization performance and accuracy. Although the software developed in this study runs on MATLAB, the models created in MATLAB can be converted to the Open Neural Network Exchange (ONNX) format, which allows for improvements and refinements irrespective of the development framework. Another limitation is that the items considered in this system alone cannot provide appropriate judgments for all cases. For example, we were unable to examine the effects of the scapula on the lung field, as examined by Junhao et al. [17]. Therefore, additional data collection and training will be necessary to apply the method to more situations. In addition, the efficiency of post-imaging work should be considered to improve the overall efficiency of CXR examinations. Oura et al. [28] reported on the QA of CXR images using DL techniques. In that study, DL techniques were applied to four tasks (correction of orientation, correction of angle, correction of left-right reversal, and judgment of the patient's position), and a method to improve the accuracy and efficiency of daily operations was proposed. Therefore, combined application with other QA systems in the future will enable CXR examinations to be performed with higher throughput. For example, a study that developed a computer-aided diagnosis (CAD) system that utilizes a convolutional neural network ensemble to reduce the workload of physicians and radiologists and achieve quick and accurate diagnosis showed a marked improvement in accuracy in the classification of chest X-ray images [29].
In addition, a study that developed a DL-based algorithm to reduce data acquisition time in 3-D X-ray microscopy showed an 8- to 10-fold increase in speed while maintaining image quality, even with several hundred X-ray projections [30]. Studies have thus achieved high accuracy for each of these purposes, and further work is needed on the system in this study that determines whether or not to perform retakes. Since the penetration of X-ray images varies depending on the subject's physique and other factors, it is necessary to be able to automatically control image quality and detect low image quality to improve the efficiency of medical image analysis, as in the system developed by Dovganich et al. [31] to automatically determine the penetration of pulmonary X-ray images. In addition, software and hardware and their efficiency need to be considered for future applications of inference. Other possible improvements include further study of how to build classification and regression models in the development of deep learning models, as shown by Sumathi et al. [32]; early prediction of infectious diseases based on chest X-ray images, as shown by Namburu et al. [33]; and the integration of deep learning algorithms with FPGA hardware for efficient analysis and low power consumption. Based on these improvements, further development of the method in this study is considered possible in the future.

Conclusions
In this study, we constructed software that determines whether a CXR image needs to be retaken by combining three CLMs and an SSM. In addition to the accuracy of approximately 90% for each model, we found that the software can provide a decision on whether a retake is necessary in 3.64 × 10⁻² s. The developed software can provide the decision-making support that a radiologist requires to determine whether a CXR image needs to be retaken, and further improvements to the individual classification models are expected to lead to a system more appropriate for real-world clinical practice. However, further development is needed, including the incorporation of existing techniques that have been presented in many fields.
Author Contributions: K.U. contributed to the data analysis, algorithm construction, and writing and editing of the manuscript. T.Y. and S.I. reviewed and edited the manuscript. H.S. proposed the idea and contributed to the data acquisition, performed supervision and project administration, and reviewed and edited the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The created models in this study are available on request from the corresponding author. The source code of this study is available at https://github.com/MIAlaboratory/CXRevaluation (accessed on 20 April 2023).