Introduction

Since its introduction in the 1970s, the use of computed tomography (CT) has increased in various clinical settings, including emergency departments (EDs)1. In particular, CT-associated ED visits have increased dramatically in patients older than 65 years and in patients with acute abdominal pain (AAP)1. According to a previous study, in patients who visited the ED complaining of AAP, CT may make substantial contributions to diagnosis or disposition decisions and may confirm or exclude alternative diagnoses2. Because physician diagnostic accuracy and confidence increase with CT2, CT plays a critical role in the diagnosis and management of patients with AAP in the ED1,2,3.

Intravenous (IV) contrast agents are widely used in CT examinations and are known to improve sensitivity and specificity for many indications4. However, the risk of adverse events, including allergic reactions and nephropathy, must be considered before administering IV contrast agents5,6. In patients with advanced age and underlying chronic kidney disease, the risk of contrast-associated acute kidney injury is increased7,8. A nationwide survey revealed that over the past 10 years, the proportion of noncontrast-enhanced CT (NECT) among all abdominal CTs increased from 9 to 14%, and the proportion of abdominal NECTs performed in patients over 65 years of age increased from 31 to 41%9. Because not all patients who visit the ED with AAP are candidates for contrast-enhanced CT (CECT), improving the diagnostic performance of NECT is important.

Recently, deep learning algorithms that reduce the dose of contrast agent or synthesize virtual contrast-enhanced images have been technically validated in brain MRI10,11. One research group reported the technical feasibility of synthesizing virtual contrast enhancement of the heart chambers from NECT using a deep learning algorithm12. However, to the best of our knowledge, no previous study has validated the clinical utility of virtual contrast-enhanced abdominal CT synthesized by a deep learning algorithm. The purpose of our study was to investigate the clinical feasibility of deep learning-based synthetic contrast-enhanced CT (DL-SCE-CT) generated from NECT in patients who visited the ED with AAP.

Results

Baseline demographics and clinical characteristics of the test dataset

The mean age of the included patients was 57.3 years (187 male and 166 female patients). The number of patients in each subgroup was as follows: acute pancreatitis (N = 20), acute diverticulitis (N = 21), liver disease (N = 26), biliary disease (N = 23), oncologic condition (N = 42), acute appendicitis (N = 21), bowel obstruction (N = 22), miscellaneous surgical condition (MSC) (N = 35), miscellaneous medical condition (MMC) (N = 59), and nonspecific abdominal pain (NSAP) (N = 84). Miscellaneous surgical conditions included bowel perforation, bowel strangulation, acute mesenteric ischemia, common hepatic artery pseudoaneurysm after pancreatectomy, acute aortic syndrome and ovarian cyst rupture. Miscellaneous medical conditions included urinary tract infection, urolithiasis, enterocolitis, past or active gastrointestinal bleeding, peptic ulcer and intraabdominal abscess requiring percutaneous drainage. NSAP included cases without a demonstrable cause of abdominal pain on CT.

Outcomes of clinical validation of deep learning-based synthetic CT images

Step 1: Review of diagnostic performance of image analysis with NECT alone

With NECT alone, the accuracy of diagnosis ranged from 69.4 to 81.5%, and the accuracy of the disposition decision ranged from 70.1 to 84.4%. The accuracy of both diagnosis and disposition decisions differed according to the dataset and reviewer expertise. Diagnostic accuracy was higher for the experienced radiologists (ERs; 76.5–81.5%) than for the training radiologists (TRs; 69.4–77.1%), and higher in the selectively enrolled dataset with specific diagnoses (Dataset-A, 75.5–87.0%) than in the consecutively enrolled dataset (Dataset-B, 61.4–73.2%). Similarly, the accuracy of the disposition decision was higher for ERs (76.8–84.4%) than for TRs (70.3–75.1%) and higher in Dataset-A (76.0–91.5%) than in Dataset-B (60.1–75.2%). In contrast, confidence in diagnosis and disposition did not differ notably by reviewer expertise (ERs: 3.47–4.07 and 3.83–4.08; TRs: 2.87–4.09 and 3.92–4.19, respectively) or by dataset (Dataset-A: 3.12–4.18 and 3.97–4.26; Dataset-B: 2.55–4.03 and 3.64–4.11, respectively).

Step 2: Review of diagnostic performance of image analysis with the aid of DL-SCE-CT

Table 1 and Fig. 1 summarize the diagnostic performance of the radiologists with and without the aid of DL-SCE-CT. Overall, with the aid of DL-SCE-CT, diagnostic accuracy increased from 69.4–81.0% to 70.5–84.7%. Three of the six radiologists (50%; P = 0.023, 0.012, and < 0.001) showed a significant increase in diagnostic accuracy with DL-SCE-CT, including two of the three TRs. Confidence in diagnosis (from 2.87–4.09 to 2.99–4.50) and disposition (from 3.83–4.19 to 4.11–4.53) also increased, with statistically significant increases in five of the six radiologists (83.3%, P < 0.001) (Figs. 2, 3). The accuracy of the disposition decision did not change significantly for any radiologist. The diagnostic performance of each radiologist with and without the aid of DL-SCE-CT is shown in Supplementary Tables S1 to S6.

Table 1 Accuracy and Confidence of Diagnosis and Disposition Decisions.
Figure 1

Accuracy and confidence of diagnosis and disposition decisions for each radiologist in the 1st and 2nd image review sessions. Diagnostic accuracy tended to increase in the 2nd session (statistically significant increases in three radiologists, two of whom were training radiologists). Disposition accuracy showed no consistent change between the two sessions. Confidence in both diagnosis and disposition decisions increased significantly in five of the six radiologists. ER, experienced radiologist; TR, training radiologist.

Figure 2

A 25-year-old male patient who visited the ED with abdominal pain. CT images show fluid distension of small bowel loops with a transition point at the terminal ileum (arrowhead). The contrast among the bowel wall, visceral fat, and intraluminal fluid is more evident on DL-SCE-CT than on NECT. The patient was admitted for management of a Crohn's disease flare. In this case, all reviewers made the correct diagnosis (small bowel obstruction at the terminal ileum) regardless of DL-SCE-CT. However, two additional radiologists made the correct disposition decision (admission for medical management) after reviewing DL-SCE-CT. Moreover, with the aid of DL-SCE-CT, the mean confidence in the diagnosis and disposition decision increased from 4.17 to 4.50 and from 4.00 to 4.50, respectively.

Figure 3

A 65-year-old female patient who visited the ED with abdominal pain and fever. CT images show intrahepatic duct stones (arrowhead) with dilated upstream bile ducts. The contrast among the liver parenchyma, the fluid within the dilated bile ducts, and the stones within the bile ducts is more evident on DL-SCE-CT than on NECT. The patient was admitted for management of obstructive cholangiohepatitis. In this case, 100.0% (6/6) and 83.3% (5/6) of the radiologists made the correct diagnosis and disposition decision (intrahepatic duct stones with biliary obstruction; admission for medical management), respectively, regardless of DL-SCE-CT. However, with the aid of DL-SCE-CT, the radiologists' mean confidence in the diagnosis and disposition decision improved from 3.83 to 4.00 and from 4.17 to 4.50, respectively.

Subgroup analysis according to disease category

The accuracy and confidence of diagnosis and disposition decisions varied among subgroups and are summarized in Table 2. Disease categories in which more than half of the radiologists showed an increase in confidence in both diagnosis and disposition decisions were oncologic conditions (5/6, 83.3% and 4/6, 66.7%, respectively), MMCs (6/6, 100.0% and 5/6, 83.3%, respectively), and NSAP (5/6, 83.3% and 6/6, 100.0%, respectively). For acute pancreatitis, acute diverticulitis, biliary disease, and acute appendicitis, more than half of the radiologists showed no significant change in confidence in diagnosis and disposition decisions. In the 1st session, confidence in diagnosis and disposition decisions was generally lower in the useful subgroups (oncologic conditions, MMCs, and NSAP; 1.31–4.71 and 2.81–4.77, respectively) than in the less useful subgroups (acute pancreatitis, acute diverticulitis, biliary disease, and acute appendicitis; 3.48–4.90 and 3.70–4.90, respectively). In no subgroup did DL-SCE-CT significantly improve the diagnostic or disposition accuracy of more than half of the radiologists.

Table 2 Subgroup Analysis by Disease Category.

The radiologists determined that the image quality of DL-SCE-CT was sufficient (with moderate limitations for clinical use but no substantial loss of information; mean score 3.33), and the artifact degree was moderate (with preserved diagnostic reliability; mean score 3.25). The image quality score ranged from 1.83 to 4.17, and the artifact degree score ranged from 2.00 to 4.33.

Discussion

In our study, DL-SCE-CT was feasible and helpful for patients visiting the ED with AAP, increasing the radiologists' diagnostic accuracy and their confidence in diagnosis and disposition decisions. DL-SCE-CT was particularly useful in cases with oncologic conditions, MMCs, and NSAP, and for less experienced radiologists. Confidence in diagnosis and disposition decisions increased significantly in five of the six radiologists (83.3%, P < 0.001). In addition, diagnostic accuracy improved significantly in half of the radiologists (P = 0.023, 0.012, and < 0.001). DL-SCE-CT was more helpful for the training radiologists, improving the diagnostic accuracy of two of the three (P = 0.023 and 0.012). Technically, the image quality of DL-SCE-CT was rated as sufficient, with moderate limitations but without substantial loss of information, and the degree of artifact was rated as moderate with preserved diagnostic reliability.

DL-SCE-CT was more helpful in oncologic conditions, MMCs, and NSAP (helpful subgroups) but less useful in the acute pancreatitis, acute diverticulitis, biliary disease, and acute appendicitis subgroups (less helpful subgroups). For diseases in the less helpful subgroups, diagnosis often depends on findings such as fat stranding at the organ-fat interface (e.g., peripancreatic, periappendiceal, or peridiverticular fat stranding) or radiopaque stones (e.g., acute calculous cholecystitis or cholangitis), which are easily detected on NECT. In these subgroups, radiologists showed high confidence in diagnosis and disposition decisions using NECT alone (3.48–4.91 and 3.23–4.95, respectively). In contrast, radiologists showed lower confidence when evaluating the helpful subgroups on NECT (1.31–4.36 and 2.81–4.55, respectively). This finding is meaningful because DL-SCE-CT increased radiologists' confidence precisely in diseases that are difficult to diagnose using NECT alone.

Recently, deep learning-based synthetic medical images have become an active area of research with broad applications across medical disciplines. Attempts have been made to increase patient safety through deep learning-based contrast dose reduction. Some researchers reduced the gadolinium dose used for brain MRI with a deep learning method10, while others synthesized virtual contrast-enhanced images for brain MRI and cardiac CT11,12. Technically, unlike previous algorithms, our algorithm minimized the misalignment caused by subtle breathing and patient movement between the input NECT images and the reference-standard CECT images by using a two-stage approach27. A detailed description of our algorithm is provided in the Methods section.

Moreover, clinical significance must be considered before such technology can be used in real-world settings. AAP is one of the most common reasons for ED visits, accounting for up to 7–10% of all ED visits13,14. Diseases causing AAP range from self-limiting to life-threatening conditions14 and impose large medical and socioeconomic burdens15,16. In particular, the overall burden of AAP and the difficulty of diagnosis increase with advancing age14,16,17,18. Accordingly, the number and proportion of CT-associated ED visits have increased rapidly in elderly patients with AAP1,2,3,19,20,21. However, liberal use of CT is accompanied by an increased risk of ionizing radiation exposure22,23 and of adverse effects from IV contrast agents5,6,8. In this context, our results are valuable because DL-SCE-CT augmented the diagnostic performance of NECT and radiologists' confidence in decision making, although further improvements are warranted.

Our study has several limitations. First, there is inherent selection bias owing to the retrospective design. Moreover, a considerable portion of our study population (56.7%, 200/353) was selectively enrolled, which further increases the risk of selection bias. Second, the improvement was greater in radiologists' confidence than in the accuracy of their decisions. Although increased confidence is a meaningful benefit, DL-SCE-CT must also improve radiologists' actual performance to improve patient outcomes. We hope to achieve this by further improving the quality of synthetic contrast enhancement. Third, confidence increased in both correct and incorrect cases, raising concern that DL-SCE-CT may increase confidence in misdiagnoses or inappropriate management.

Deep learning-based synthetic CT (DLSCT) might be developed and applied in various clinical settings. Synthetic contrast enhancement is only one of many possibilities, and various kinds of image augmentation could potentially improve patient outcomes. Identifying appropriate clinical settings is necessary to develop useful synthetic images. For elderly patients with decreased renal function, both an NECT-based synthetic enhancement method and a method that uses a small dose of contrast agent but mimics a full dose (e.g., as used in brain MRI10) could be developed. Combination with pre-existing contrast dose reduction technologies, such as dual-energy CT, might also be attempted. For particularly radiosensitive populations, such as pediatric patients, deep learning-based denoising algorithms might be especially helpful for facilitating ultralow-dose imaging24,25,26. Therefore, further investigation of potentially useful DLSCT applications is warranted.

Conclusion

In conclusion, according to our preliminary study, DL-SCE-CT is feasible and helps radiologists make more accurate diagnoses and more confident diagnosis and disposition decisions for patients with AAP in the ED. DL-SCE-CT is particularly useful in cases with oncologic conditions, MMCs, or negative CT findings (NSAP), and for less experienced radiologists.

Methods

This retrospective multicenter study was approved by the joint Institutional Review Board of Seoul National University Hospital, Seoul National University Bundang Hospital, and Boramae Medical Center. The Institutional Review Board granted a waiver of informed patient consent due to the retrospective nature of our study. All methods were performed in accordance with the relevant guidelines and regulations.

Study population and image dataset

We trained the conversion model using a pre-existing algorithm that generates DL-SCE-CT from NECT27. The training dataset consisted of 226 consecutive CT examinations (35,414 paired NECT and CECT images) performed in the ED of a tertiary hospital for the evaluation of AAP in January 2019. Two test datasets were then prepared for the clinical validation of DL-SCE-CT. The common inclusion criteria for both datasets were as follows: (1) CT examination performed in the ED for the evaluation of AAP and (2) CT examination consisting of paired NECT and CECT images. Among the CT examinations performed in two institutions (one tertiary and one secondary hospital) from May 2019 to August 2019, one radiologist (S.W.K., with 5 years of experience in abdominal radiology) selected 200 CT examinations as Dataset-A13: examinations with one of the following specific diagnoses (N = 159), namely biliary disease, acute appendicitis, acute diverticulitis, acute pancreatitis, oncologic pain, miscellaneous surgical condition, bowel obstruction, and liver disease, or with nonspecific findings (N = 41). Among the CT examinations performed in another institution (a tertiary hospital) from January 2019 to June 2019, 153 consecutive cases meeting the common inclusion criteria were included as Dataset-B. Both datasets comprised the NECT images and the DL-SCE-CT images subsequently generated from them with the aforementioned conversion model. Figure 4 summarizes the inclusion process of the study population.

Figure 4

Flow diagram of the study design and study population inclusion process. NECT, nonenhanced CT; CECT, contrast-enhanced CT; DL-SCE-CT, deep learning-based synthetic contrast-enhanced CT.

For each case, one radiologist (S.W.K.) meticulously reviewed the CT images and electronic medical records, including the clinicopathologic data and laboratory test results, to determine the most appropriate diagnosis and disposition decisions (admission for medical treatment, admission for surgical treatment, or discharge).

CT techniques

All CT examinations were performed using MDCT scanners with 64–160 detector rows. The acquisition parameters were as follows: tube voltage, 100–120 kVp; tube current, 50–300 mAs; slice thickness, 2.0–3.0 mm; reconstruction interval, 1.0–3.0 mm; pitch, 0.5–1.3; and rotation time, 0.33–0.75 s. Iodinated contrast agent (320 or 350 mg I/mL) was injected using an automatic power injector at a rate of 2.0–5.0 mL/s (total volume, 1.6–2.0 mL/kg). The portal phase acquisition time was determined either by bolus tracking (portal phase scanning began 45–60 s after threshold attenuation [100 HU] was reached in the descending thoracic aorta, or immediately after threshold attenuation [100 HU] was reached in the hepatic parenchyma) or by a fixed delay (90 s after contrast injection).
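
The weight-based injection parameters above combine by simple arithmetic: the total volume is body weight multiplied by the per-kilogram dose, and the injection duration is that volume divided by the flow rate. The Python sketch below only illustrates this arithmetic. The function and parameter names are hypothetical, the range checks simply echo the ranges reported here rather than a validated dosing rule, and real protocols may additionally cap the total volume.

```python
def contrast_protocol(weight_kg: float, dose_ml_per_kg: float, rate_ml_per_s: float):
    """Return (total contrast volume in mL, injection duration in s).

    Hypothetical helper: the bounds below are the ranges reported in the
    study (1.6-2.0 mL/kg, 2.0-5.0 mL/s), not a validated dosing rule.
    """
    if not 1.6 <= dose_ml_per_kg <= 2.0:
        raise ValueError("dose outside the reported 1.6-2.0 mL/kg range")
    if not 2.0 <= rate_ml_per_s <= 5.0:
        raise ValueError("rate outside the reported 2.0-5.0 mL/s range")
    volume_ml = weight_kg * dose_ml_per_kg
    duration_s = volume_ml / rate_ml_per_s
    return volume_ml, duration_s


def iodine_load_g(volume_ml: float, concentration_mg_i_per_ml: float) -> float:
    """Total iodine delivered in grams (e.g. for 320 or 350 mg I/mL agents)."""
    return volume_ml * concentration_mg_i_per_ml / 1000.0


if __name__ == "__main__":
    volume, duration = contrast_protocol(weight_kg=70, dose_ml_per_kg=2.0, rate_ml_per_s=4.0)
    print(volume, duration, iodine_load_g(volume, 350))  # 140.0 35.0 49.0
```

For a hypothetical 70 kg patient at 2.0 mL/kg and 4.0 mL/s, this gives 140 mL injected over 35 s, or 49 g of iodine with a 350 mg I/mL agent.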

Development of DL-SCE-CT image

A conversion model that generates DL-SCE-CT from NECT was developed in a previous study using 23,923 paired NECT and CECT images from 327 CT examinations27. In our study, we adopted the same model with an identical architecture and trained it on our training dataset (226 CT examinations with 35,414 paired NECT and CECT images). Unlike other neural image synthesis tasks, synthetic contrast enhancement of abdominal CT faces a major problem: inevitable misalignment between NECT and CECT images owing to patients' breathing and involuntary movements during examinations. To overcome this misalignment, our model uses a two-stage approach based on a conditional generative adversarial network (cGAN) and a deep convolutional neural network (CNN). In the first stage, a generator (GC→N) that creates synthetic NECT from real CECT was trained adversarially against a discriminator (D) that distinguishes among synthetic NECT, real NECT, and real CECT. This stage, the inverse of our target task, is technically much easier because NECT images, with their relatively uniform unenhanced intensities, are much less patient-specific than CECT images. The resulting synthetic NECT, which is perfectly aligned with the real CECT from which it was generated, is then used in the second stage as the input for training a generator (GN→C) that creates synthetic CECT from NECT. Both generators (GC→N and GN→C) were built on the SPADE architecture, a state-of-the-art method for image-to-image translation. The second generator (GN→C) was then used to create the synthetic images for the test datasets (Datasets A and B). Figure 5 shows the development process.

Figure 5

Schematic diagram of the two-stage approach used for making the conversion model. In the first stage, the generator (GC→N), which generates synthetic NECT from real CECT, is trained adversarially using a conditional generative adversarial network. In the second stage, another generator (GN→C) that generates synthetic CECT from NECT is trained using a deep convolutional neural network. During the second stage of training, synthetic NECT, which is generated from and perfectly aligned with real CECT, is used as input data, resolving the misregistration issue between input data and ground truth (real CECT). NECT, nonenhanced CT; CECT, contrast-enhanced CT; LAdv, adversarial loss; Lrec, reconstruction loss; GC→N, generator that generates synthetic NECT from real CECT; GN→C, generator that generates synthetic CECT from NECT.

Clinical validation of DL-SCE-CT

Step 1: Image analysis using NECT alone

Six reviewers with different levels of expertise participated in two image review sessions (ER1: C.I.S.; ER2: J.H.K.; and ER3: S.J., board-certified experienced abdominal radiologists with 16, 23, and 7 years of experience, respectively; TR1: C.H.R.; TR2: J.C.; and TR3: K.J., training radiologists with 4, 5, and 4 years of experience, respectively). In the first session, the reviewers were asked to report the diagnosis in free-text form and the disposition decision in a three-option multiple-choice format (admission for medical treatment, admission for surgical treatment, or discharge) based on NECT alone. Reviewers knew that the images were from patients who visited the ED with AAP, but no further clinical or laboratory data were provided. The reviewers rated their confidence in the diagnosis and disposition decision on a 5-point scale (1: not confident at all; 2: slightly confident; 3: somewhat confident; 4: fairly confident; and 5: completely confident).

Step 2: Image analysis with the aid of DL-SCE-CT

The second review session began two weeks after the first session ended. In the second session, the reviewers were asked to report the diagnosis and disposition decision after reviewing both NECT and DL-SCE-CT and to rate their confidence on the same 5-point scale. In addition, the reviewers rated the image quality of DL-SCE-CT on a 5-point scale (1: poor image quality, image not usable; 2: restricted image quality, severe limitations for clinical use and clear loss of information; 3: sufficient image quality, moderate limitations for clinical use but no substantial loss of information; 4: good image quality, minimal limitations for clinical use; and 5: excellent image quality, no limitations for clinical use) and the degree of artifact on a 5-point scale (1: artifacts resulting in a nondiagnostic image; 2: severe artifacts resulting in limited diagnostic reliability; 3: moderate artifacts with preserved diagnostic reliability; 4: minimal artifacts with preserved diagnostic reliability; and 5: excellent, without artifacts). For each case, the arithmetic mean of the scores from the six radiologists was used as the representative image quality and artifact degree score.
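
The per-case score described above is a plain arithmetic mean over the six readers. A minimal Python sketch follows; the ratings are invented for illustration and the function name is hypothetical.

```python
from statistics import fmean

# Hypothetical ratings invented for illustration: six integer scores (1-5) per
# case, one from each reviewer, on the image-quality scale described above.
image_quality = {
    "case_001": [3, 4, 3, 3, 4, 3],
    "case_002": [2, 2, 1, 2, 2, 2],
}


def per_case_means(ratings_by_case, n_readers=6):
    """Return {case_id: arithmetic mean of the readers' 1-5 scores}."""
    means = {}
    for case_id, scores in ratings_by_case.items():
        if len(scores) != n_readers or not all(1 <= s <= 5 for s in scores):
            raise ValueError(f"{case_id}: expected {n_readers} scores from 1 to 5")
        means[case_id] = fmean(scores)
    return means


case_means = per_case_means(image_quality)
print({case: round(score, 2) for case, score in case_means.items()})
# {'case_001': 3.33, 'case_002': 1.83}

# Cohort summary: overall mean and range of the per-case scores.
scores = list(case_means.values())
print(round(fmean(scores), 2), round(min(scores), 2), round(max(scores), 2))  # 2.58 1.83 3.33
```

Because each case score is the mean of six integers, it is always a multiple of 1/6, which is consistent with range endpoints such as 1.83 (11/6) and 4.17 (25/6) reported in the Results.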

Statistical analysis

The accuracy of the diagnosis and disposition decisions was compared between the two review sessions using McNemar's test. Confidence in the diagnosis and disposition decisions was compared using the Wilcoxon test. For subgroup analysis, the whole study population was divided into ten subgroups according to the CT diagnosis: acute pancreatitis, acute diverticulitis, liver disease (e.g., acute hepatitis and liver abscess), biliary disease (e.g., acute cholecystitis and acute cholangitis), oncologic condition (e.g., malignant bowel obstruction, malignant biliary obstruction, and ruptured HCC with hemoperitoneum), acute appendicitis, bowel obstruction, miscellaneous surgical conditions, miscellaneous medical conditions, and NSAP. Accuracy and confidence were evaluated separately in each subgroup. A commercially available software package (MedCalc Statistical Software, Version 19.2.1, MedCalc Software) was used for statistical analysis. A P value less than 0.05 was considered statistically significant.
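
The paired comparisons above can be sketched in standard-library Python for one reader's session-1 versus session-2 results on the same cases. This is an illustration, not a reproduction of MedCalc. The text does not state which McNemar variant or Wilcoxon details MedCalc applied, so this sketch assumes the exact (binomial) McNemar test and a paired Wilcoxon signed-rank test with a tie-corrected normal approximation and no continuity correction; P values may therefore differ slightly from the software's output. Function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def mcnemar_exact(correct_1st, correct_2nd):
    """Two-sided exact (binomial) McNemar test for paired correct/incorrect calls.

    Only discordant cases matter: b = correct only in session 1,
    c = correct only in session 2. Under H0, b ~ Binomial(b + c, 0.5).
    """
    b = sum(1 for x, y in zip(correct_1st, correct_2nd) if x and not y)
    c = sum(1 for x, y in zip(correct_1st, correct_2nd) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases, so no evidence of a change
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test (normal approximation, tie-corrected).

    Zero differences are dropped; tied |differences| receive average ranks,
    which is common with 5-point confidence scores.
    """
    diffs = [y - x for x, y in zip(before, after) if y != x]
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_correction = 0
    start = 0
    while start < n:
        end = start
        # Extend the block while the next |difference| ties with this one.
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        t = end - start + 1
        tie_correction += t ** 3 - t
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_correction / 48
    z = (w_plus - mean_w) / sqrt(var_w)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical reader: 5 of 10 cases become correct in session 2, none get worse.
print(mcnemar_exact([False] * 5 + [True] * 5, [True] * 10))  # 0.0625
```

With only five discordant cases, all in the same direction, the exact McNemar P value is 2 × (1/2)^5 = 0.0625, which illustrates why a small number of changed calls per reader may not reach significance at P < 0.05.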