Diagnostic Performance of a Deep Learning-Powered Application for Aortic Dissection Triage Prioritization and Classification

This multicenter retrospective study evaluated the diagnostic performance of a deep learning (DL)-based application for detecting, classifying, and highlighting suspected aortic dissections (ADs) on chest and thoraco-abdominal CT angiography (CTA) scans. CTA scans from over 200 U.S. and European cities, acquired on 52 scanner models from six manufacturers, were retrospectively collected and processed by the CINA-CHEST (AD) (Avicenna.AI, La Ciotat, France) device. The diagnostic performance of the device was compared with the ground truth established by the majority agreement of three U.S. board-certified radiologists. Furthermore, the DL algorithm's time to notification was evaluated to demonstrate clinical effectiveness. The study included 1303 CTAs (mean age 58.8 ± 16.4 years, 46.7% male, 10.5% positive). The device demonstrated a sensitivity of 94.2% [95% CI: 88.8–97.5%] and a specificity of 97.3% [95% CI: 96.2–98.1%]. The application classified positive cases by AD type with an accuracy of 99.5% [95% CI: 98.9–99.8%] for type A and 97.5% [95% CI: 96.4–98.3%] for type B. The application did not miss any type A cases. The device flagged 32 cases incorrectly, primarily due to acquisition artefacts and aortic pathologies mimicking AD. The mean time to process and notify of potential AD cases was 27.9 ± 8.7 s. This deep learning-based application demonstrated strong performance in detecting and classifying aortic dissection cases, potentially enabling faster triage of these urgent cases in clinical settings.


Introduction
Acute aortic syndrome (AAS) represents a spectrum of pathological conditions affecting the thoracic and abdominal aorta. Their prevalence ranges from 0.2% to 0.8%, equating to approximately 2.6–3.5 cases per 100,000 individuals in the general population, with a predilection towards males [1,2]. Aortic dissection (AD) constitutes the predominant subtype within the range of AAS, accounting for 85–95% of cases [3]. AD arises from a tear in the aorta's intimal and medial layers, facilitating blood ingress into the aortic wall and the formation of a secondary circulating pathway within the aorta known as the false lumen [4]. This condition presents as a critical, life-threatening emergency, with mortality rates increasing by approximately 1–2% per hour within the initial 48 h [1]. In cases where prompt surgical intervention is not undertaken, the mortality rate escalates to 58% [5]. Therefore, expedited diagnosis is pivotal for effective patient management [6].
In response to the severity of AD, the Stanford classification system was developed, distinguishing between type A and type B dissections. Type A primarily involves the ascending aorta, whereas type B is limited to the descending aorta distal to the left subclavian artery [1,2,4]. Type A AD demands urgent surgical intervention to mitigate the risk of life-threatening complications. Conversely, type B AD can often be managed conservatively through pharmacological means, particularly focusing on hypertension control, although close monitoring and potential intervention remain essential for optimal patient outcomes [1,2].
The clinical presentation of aortic dissection lacks specificity, which can result in symptoms resembling those of various other medical conditions. This diagnostic ambiguity poses a significant challenge in the accurate identification of aortic dissection, as its manifestations overlap with a diverse range of diagnoses [5]. Misdiagnosis of AD significantly impacts patient survival. Approximately 25% of AD patients receive inappropriate antithrombotic treatment due to initial misdiagnosis, resulting in an almost 50% increase in long-term mortality [7]. Additionally, misdiagnosed patients experience a prolonged interval from symptom onset to surgery (8.6 h) compared to correctly diagnosed patients (5.5 h), further elevating their mortality risk [8]. A systematic literature review demonstrated an AD misdiagnosis rate of 33.8%, indicating that one-third of patients face increased mortality risk solely due to diagnostic inaccuracies [9]. Hence, ensuring accurate diagnostics and AD type classification is imperative for providing appropriate treatment. Importantly, the utilization of prompt and appropriate imaging techniques is associated with enhanced diagnostic accuracy [9].
In recent years, CTA has emerged as the gold standard for assessing aortic dissection, owing to its exceptional precision and effectiveness in identifying this condition [2,3,10]. Despite the high sensitivity and specificity exhibited by CT diagnostics in detecting AD, the escalating demands within the emergency environment and the onset of fatigue among radiologists could elevate error rates and prolong the time required for AD diagnosis. Deep learning (DL) tools for AD detection and prioritization have been shown to expedite the evaluation process for radiologists, consequently reducing the time required for clinical decision making [6]. The current investigation sought to scrutinize the efficacy and applicability of a recently introduced and commercially available DL-based tool for AD case prioritization. Specifically, the objectives of the study were to evaluate the diagnostic accuracy of the device, its capacity to classify aortic dissection types, and its potential to reduce the notification time of AD occurrences. This study aims to contribute valuable insights into the utility of DL technologies in enhancing the efficiency and precision of AD diagnosis and clinical patient management.

DL-Powered Algorithm for AD Detection: Architecture and Training
A commercially available application for AD (FDA approved and CE marked), CINA-CHEST v1.0.2, was provided by Avicenna.AI (La Ciotat, France). In the development of the AD DL-powered application, a two-stage approach was employed. The first algorithm was utilized for the segmentation of the aorta, while the second algorithm was employed for the localization of the AD within it, specifically focusing on the visible intimal flap between the true and false lumens. This methodology was selected due to its effectiveness in maximizing the detection of AD in anatomically consistent regions. Both algorithms were based on convolutional neural networks (CNNs). A hybrid 3D/2D U-Net variant, known for its robust performance in 2D and 3D segmentation tasks and previously published by Chang et al. [11], was used.
Regarding the ground truth used to train the algorithms, the segmentation of the aorta, encompassing both the true and false lumens while excluding necrosis or intramural hematoma, was performed per slice by two expert radiologists. Per-slice segmentation of the visible intimal flap between the false and true lumens was conducted, so that the algorithm could target the localization of the AD.
The training dataset consisted of 649 3D CTA studies sourced from 40 different scanner models, representing various manufacturers (55% GE, 20% Siemens, 10% Philips, and 15% Canon), spanning from January 2019 to December 2022. Of these studies, 25% constituted positive cases of aortic dissection, categorized as type A or type B according to the Stanford classification. The age distribution among positive cases was as follows: 8% in the 18–40 age range, 56% in the 41–70 age bracket, and 36% over 70 years old. Cases with confounding conditions, such as thoracic or abdominal aneurysms, intramural hematoma, calcifications, and post-surgery instances (e.g., presence of stents), were sought out to enrich the training dataset.

Data Selection
The retrospective and observational CTA data acquisition for this study spanned from July 2017 to March 2022 and was conducted using multiple clinical sources. All data were anonymized and supplied by two teleradiology networks located in the USA and France, comprising more than 200 cities and six CT manufacturers. The data were acquired on 52 different scanner models. Among the received data, CTA cases were consecutively preselected according to the recommended requirements (Table 1). The final dataset consisted of 1303 cases.

Ethical Considerations for Data
In this study, we adhered to stringent data ethics and privacy standards. All data utilized for algorithm training and analysis were anonymized and provided by two teleradiology companies based in the United States and France. The U.S. data were de-identified directly by the teleradiology company in accordance with the HIPAA Privacy Rule, specifically 45 CFR § 164.514(e) [12], before being transferred to Avicenna.AI. Additionally, per 45 CFR § 46.101 [13], data where individual subjects could not be identified were exempted from institutional review board (IRB) approval. This exemption was granted based on the criterion that the data posed no risk of identifying individual subjects, thereby supporting our commitment to conducting ethically sound and compliant research.
The European data were de-identified through a provider equipped with an advanced de-identification system, compliant with the General Data Protection Regulation (GDPR) (EU) 2016/679 [14], which outlines the requirements for lawful processing and the protection of personal data. This ensured that all data were rendered non-identifiable, maintaining strict confidentiality and privacy. Informed consent was waived when it was deemed necessary, following national legislation and institutional protocols, before the data were transferred to Avicenna.AI.

The Ground Truth
Two U.S. board-certified expert radiologists with 7 and 6 years of experience in radiology clinical practice independently visually annotated chest and thoraco-abdominal CTAs and determined the cases with suspected ADs. For positive cases, the experts defined the AD type according to the Stanford classification (type A or type B). A third U.S. board-certified expert radiologist with 8 years of experience in radiology clinical practice settled any disagreements. The presence or absence of AD and AD classification by type were determined by majority agreement. Hyperacute, acute, subacute, or chronic ADs were all considered positive. The radiologists also reported the observed confounding factors such as thoracic or abdominal aneurysms, intramural hematoma, calcifications, and post-surgery instances (e.g., presence of stents).
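The adjudication rule described above can be illustrated with a minimal Python sketch. The function name, the label strings, and the handling of a three-way split are assumptions made for illustration; the study does not describe its annotation tooling.

```python
def consensus_label(reader1, reader2, adjudicator=None):
    """Return the ground-truth label for one CTA case.

    Labels are hypothetical strings: "negative", "type A", or "type B".
    If the two primary readers agree, their label is the majority.
    Otherwise the third reader's label settles the case; for a binary
    AD-present/AD-absent question this is also the two-out-of-three majority.
    Assumption: if all three readers differ, the adjudicator's label is used.
    """
    if reader1 == reader2:
        return reader1
    if adjudicator is None:
        raise ValueError("readers disagree and no adjudicator label was given")
    return adjudicator


# Illustrative cases only, not study data.
print(consensus_label("type B", "type B"))              # agreement
print(consensus_label("negative", "type B", "type B"))  # settled by third reader
```

Keeping the third read optional mirrors the workflow in the text, where the third radiologist is consulted only for disagreements.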

Data Processing
The next step entailed processing the same anonymized dataset using the CINA-CHEST (AD) v1.0.2 AI-powered application. The application automatically processed incoming CTAs, displaying notifications of suspected findings (if any) alongside image series information. For cases flagged positive by the application, the AD type (A or B) was also displayed. All results were gathered for assessment. The evaluation was conducted blindly, without access to the results of the U.S. board-certified radiologists. Notifications, AD types (for positive cases), and processing times were recorded for all CTA cases, measured from the end of DICOM reception to positive or negative identification.

Statistical Analysis
The results provided by U.S. board-certified radiologists and those automatically computed by CINA-CHEST (AD) were compared. The sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (ROC AUC) were calculated for the complete dataset. The 95% confidence intervals (95% CI) for sensitivity, specificity, and accuracy were determined using the exact binomial distribution test (Clopper–Pearson). The Matthews correlation coefficient (MCC) was also computed to assess the binary classification quality. Positive and negative predictive values were derived using sensitivity and specificity, accounting for prevalence in the current dataset (10.5%). Subset performance analyses based on imaging acquisition parameters (manufacturer and slice thickness) and patient characteristics (age and sex) were conducted. Moreover, the clinical performance of Stanford classification for AD (type A and type B) was determined. The sensitivity, specificity, and accuracy were computed for each type of AD. Additionally, AD prioritization and triage effectiveness were evaluated based on the standalone per-case processing time of the device (mean ± SD, 95% CI, and median values) for all cases in the database and for true positive cases only. Statistical analyses were performed using MedCalc Statistical Software (v20.015; MedCalc Software Ltd., Ostend, Belgium).
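As an illustration of the exact binomial interval and the MCC named above, the following Python sketch uses only the standard library. It is not the MedCalc implementation used in the study. The counts are an assumption derived from figures reported elsewhere in this article (1303 cases, 129 true positives, 8 false negatives, 32 false positives), with true negatives obtained by subtraction.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1, summed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for k successes out of n trials."""
    def solve(upper_tail, target):
        # upper_tail(p) increases with p, so plain bisection finds the root.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if upper_tail(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha/2. Upper bound: P(X <= k | p) = alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2)
    return lower, upper


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Counts derived from the reported totals (assumption, see lead-in).
TP, FN, FP = 129, 8, 32
TN = 1303 - TP - FN - FP

sens_ci = clopper_pearson(TP, TP + FN)
spec_ci = clopper_pearson(TN, TN + FP)
print(f"sensitivity {TP / (TP + FN):.3f}, 95% CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f}")
print(f"specificity {TN / (TN + FP):.3f}, 95% CI {spec_ci[0]:.3f}-{spec_ci[1]:.3f}")
print(f"MCC {mcc(TP, TN, FP, FN):.3f}")
```

The binomial terms are summed in log space because the binomial coefficients for the negative group (more than 1100 cases) exceed the range of double-precision floats.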

Among the eight missed ADs (FNs), six were complicated cases and were the subject of disagreement between the two truthers. Thus, even visually, the detection of these aortic dissections was not obvious. Four of these dissections were located within the abdominal infrarenal aorta or the distal descending thoracic aorta. They were all related to AD type B.
Among the confounding factors that impacted correct AD identification by the AI-based algorithm was the combination of acquisition artefacts (one scan was noisy, one presented streak artefacts, and one included motion artefacts) and aortic pathologies (intramural hematoma (IMH): four cases; penetrating atherosclerotic ulcer (PAU): two cases; aortic calcifications: two cases; one aneurysm with large mural thrombus; one endoleak with active extravasation of contrast from the graft in the mid-descending thoracic aorta). One additional case was missed due to poor contrast filling. The last scan presented AD only within the last two abdominal slices, since the acquisition stopped at the level of the right kidney, thus preventing the visualization of the entire dissection (Figure 2; for more details, see Supplementary Table S1).
There were 32 false positive cases. The inaccurate identification of dissections, resulting in false positives, stemmed from various factors including inadequate contrast opacification (13 cases), motion artefacts (10 cases), pathology mimicking dissection (i.e., penetrating atherosclerotic ulcer (PAU), intramural hematoma (IMH), and aneurysms; 7 cases), and interference from stent grafts (2 cases).
The reasons for misdiagnoses (FN and FP) are summarized in Table 4.

Stratified Statistical Analysis Results
The stratified statistical analyses are provided for the subgroups of CT scanner makers, image acquisition (slice thickness), age, and sex; the AI-based algorithm's performance across these groups is presented in Table 6. The in-depth stratified statistical analysis revealed sensitivities ranging from 89.2% to 100% and specificities from 96.2% to 100%. Across all categories and within each group, both the sensitivities and specificities consistently surpassed the 89% threshold, and the accuracy for all groups was higher than 95%.
* No stratification was conducted for PNMS and Hitachi Ltd. scanner makers, as only a small number of scans were acquired with these scanners (1 and 4 CT scans, respectively).

Time to Notification Evaluation Results
The application was run with the following hardware specifications: CPU: 8 threads/16 cores at 3.0+ GHz and RAM: 16 GB on Ubuntu 22.04.4 LTS. The time to notification (TTN) was calculated for all 1303 cases. The mean TTN ± SD was 27.9 ± 8.7 s (95% CI: 27.4–28.3 s). The median TTN value was 26.7 s. The mean TTN ± SD for the 129 true positive cases was 35.4 ± 15.4 s (95% CI: 30.8–36.3 s), with a median of 33.3 s.
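For readers who want to see how a confidence interval for a mean TTN relates to the summary statistics, the short Python sketch below applies the normal approximation (mean ± 1.96 × SD/√n) to the all-cases figures reported above. The normal approximation is an assumption; the article does not state which method the statistical software applied, and the inputs here are the rounded published values rather than the raw timings.

```python
import math


def mean_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a mean from summary statistics."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Reported summary statistics for all 1303 cases (seconds).
ttn_lo, ttn_hi = mean_ci(mean=27.9, sd=8.7, n=1303)
print(f"95% CI: {ttn_lo:.2f}-{ttn_hi:.2f} s")
```

With these rounded inputs the interval lands within about 0.1 s of the reported bounds; small differences are expected because the published mean and SD are themselves rounded.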

Discussion
Enhancing the diagnostic capacity of radiologists' AD screenings using non-enhanced CT scans was the objective of two studies [15,17]. The diagnostic performances of the DL-based algorithms used were similar to or slightly better than those of trained radiologists. Hata et al. [15] showed that the sensitivity and specificity of their DL-based application were 91.8% and 88.2%, whereas trained radiologists performed at 90.6% and 94.1%, respectively. Yi et al. [17] sought to improve upon the DL model implemented by Hata et al. [15] and developed a deep integrated model with a sensitivity and specificity of 86.2% and 92.3%, respectively. However, applying their DL-based algorithm to cases obtained from an external clinical center dropped the specificity drastically, to 55.4%. These findings emphasize the importance of including cases from different clinical sources, CT scanner makers, and acquisition parameters for a proper diagnostic performance evaluation. Moreover, both studies mentioned above were conducted with a high prevalence of positive AD cases, which does not occur in clinical routine.
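The effect of prevalence on predictive values can be made concrete with Bayes' rule. The Python sketch below takes the sensitivity and specificity reported for CINA-CHEST (AD) and evaluates them at the study prevalence (10.5%) and at a hypothetical 50% prevalence chosen purely for illustration; the function name is likewise an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


SENSITIVITY, SPECIFICITY = 0.942, 0.973  # values reported in this study

ppv_study, npv_study = predictive_values(SENSITIVITY, SPECIFICITY, 0.105)
ppv_enriched, npv_enriched = predictive_values(SENSITIVITY, SPECIFICITY, 0.50)  # hypothetical

print(f"prevalence 10.5%: PPV {ppv_study:.3f}, NPV {npv_study:.3f}")
print(f"prevalence 50.0%: PPV {ppv_enriched:.3f}, NPV {npv_enriched:.3f}")
```

With identical sensitivity and specificity, the positive predictive value is noticeably higher in the enriched setting, which is why results obtained on high-prevalence datasets do not transfer directly to routine practice.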
Harris et al. [6] evaluated their DL-based tool for AD detection using a multicenter, multiscanner approach on CTA images with a low AD prevalence. The sensitivity and specificity of this application were 87.8% and 96.0%, respectively. In comparison, CINA-CHEST (AD) (Avicenna.AI, La Ciotat, France) underwent an evaluation using scans sourced from multiple clinical sources, various scanner makers and models, and diverse acquisition parameters, with a prevalence approaching real-world disease prevalence. Therefore, the current performance evaluation, conducted under conditions closely resembling real-world clinical practice, showcased a superior diagnostic performance to previously published solutions for AD triage and prioritization. Similar to CINA-CHEST (AD), another deep learning solution for the triage and prioritization of AD, Briefcase for AD (Aidoc Medical, Tel Aviv, Israel), has received regulatory clearance and certification. Briefcase for AD demonstrated a sensitivity of 93.23% (95% CI: 88.70–96.35%) and a specificity of 92.83% (95% CI: 89.35–95.45%) on 499 CTA cases, including 192 positives [18]. CINA-CHEST (AD) therefore outperformed this similar solution. Additionally, CINA-CHEST (AD) is unique among certified and commercially available applications in its ability to classify aortic dissection by Stanford classification types.
AD type identification is a crucial feature of AD screening DL-based applications, as it allows clinicians to promptly sort patients into the emergent surgery clinical pipeline (for type A) and the conservative treatment pipeline (for type B). Deploying a two-step neural network, Huang et al. [16] demonstrated the capacity of a DL-based application to classify AD by type with a sensitivity and specificity of 95.5% and 98.5% for type A and 79.3% and 94.0% for type B. In line with previously published articles, CINA-CHEST (AD) successfully identified all type A AD cases. All eight ADs missed by the application were type B, indicating that automated diagnosis is more difficult for these cases. In fact, as stated by Yi et al. [17], the diagnostic performance for type B is shown to be lower than for type A. This is due to a wider range of dissections, as a larger relative aorta volume is found in the descending aorta than in the ascending aorta. However, this does not adversely affect clinical outcomes because type A ADs are life-threatening and require immediate intervention, making the accurate detection of these cases significantly more critical.
Early surgical intervention for type A aortic dissection (AD) significantly reduces mortality, emphasizing the crucial need to minimize diagnostic delays [19]. Nevertheless, diagnostic delays are observed in nearly 25% of cases [19]. The reason for these delays is the in-hospital diagnostic times, which are twice as long as the hospital arrival times, contributing to the majority of the delay [20]. The analysis of the International Registry of Acute Aortic Dissection revealed a median diagnostic time of 4.3 h for this acute condition, indicating a significant opportunity for improvement [21]. Diagnostic delays are associated not only with misdiagnosis but also with factors such as physician workload, knowledge, and experience, especially when AD presents without obvious clinical manifestations [22]. Younger patients with AD experience longer diagnostic delays due to clinical hesitations and a lack of suspicion among emergency clinicians [23]. Although CT scans are employed to confirm the diagnosis prior to surgery, their use is linked to diagnostic delays that need to be addressed [24,25]. Implementing a decision support system, such as an automated detection and prioritization application, could enhance clinical workflows by reducing both the misdiagnosis rates and diagnostic delays [26].
Harris et al. [6] measured the notification time of the application from image download into the platform to visible notification of the application results. This time was equal to 23.5 ± 21 s. Images flagged as positive were prioritized for readers' evaluation. This prioritization affected the delay time (the time from image receipt to image opening for analysis). Cases flagged as positive had a significantly reduced median delay time (265 s against 660 s). Considering that, according to this study, acute cases must have been addressed within 30 min, the improvement in reducing delays was 20%, representing a significant advancement in the context of emergency radiology [6]. CINA-CHEST (AD) had a similar notification time of 27.9 ± 8.7 s. A similar solution, Briefcase for AD (Aidoc Medical, Tel Aviv, Israel), demonstrated a mean notification time of 38 s [18]. Although CINA-CHEST (AD) has not yet been evaluated in clinical settings, it is hypothesized that the proposed automated solution for AD detection and prioritization could provide substantial clinical benefits by reducing misdiagnoses and diagnostic delay times.
The evaluated application CINA-CHEST (AD) generated a few false positive and false negative cases. The causes of these misdiagnoses were mainly pathologies mimicking aortic dissection, such as penetrating atherosclerotic ulcer (PAU), intramural hematoma (IMH), and aneurysms, as well as acquisition artefacts. For all false positive cases, an alert will be dispatched to the on-call clinician, granting them the opportunity to examine the images and offer the appropriate diagnosis. For all missed positive cases, as no notification will be issued to the clinician, these cases will instead undergo review via the standard care workflow, ensuring accurate diagnosis by the physician.
Our study had some limitations. Primarily, it did not include a direct comparison between the performance of the deep learning algorithm and that of a panel of independent radiologists. Hata et al. [15] and Yi et al. [17] assessed radiologists' performance for AD detection; however, as this evaluation was performed on non-enhanced CT scans, this information is not relevant for CTA images. Second, the retrospective selection of CTA cases may introduce selection bias. Prospective studies would not only demonstrate real-world performance on optimal and sub-optimal CTA scans but would also reveal the supposed clinical benefits regarding time savings for diagnosis in such acute conditions as AD. Moreover, the time saved for radiologists by automated triage applications might become a crucial benefit in the next few years, as the clinician workforce is limited under an ever-increasing workload [27]. Finally, we did not include any clinical parameters in the diagnostic pipeline, unlike Yi et al. [17], who presented an integrated model where aorta morphology was taken into account for AD triage.

Conclusions
To sum up, CINA-CHEST (AD) (Avicenna.AI, La Ciotat) is a DL-based application for performing triage, classification, and prioritization of aortic dissections. Our multinational, multicenter, multiscanner study demonstrated the highest diagnostic performance reported in the literature for this class of devices. The study was performed with a prevalence that approaches real-world clinical data, and the dataset presented a broad distribution among clinical sites, scanner vendors, and acquisition parameters. This illustrates the device's robustness for extensive use across varied datasets and patient demographics. Moreover, the clinical use of this application is associated with a prompt time to notification that may improve the diagnostic speed and accuracy of clinicians in exigent emergency settings.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics14171877/s1, Table S1: Summary of ADs missed by the DL-based application.

Diagnostics 2024, 13

Figure 1. Examples of CINA-CHEST (AD) outputs upon true and false positive AD cases automatically determined by CINA-CHEST (AD). Red boxes are placed by CINA-CHEST (AD) to indicate the localisation of a detected AD. (a) Correct detection of type A AD. (b) Correct detection of type B AD. (c) False-positive identification of a complicated case in the presence of intramural hematoma following aortic repair.

Table 3. CINA-CHEST (AD) confusion matrix and performance data (TP: true positive; TN: true negative; FP: false positive; FN: false negative).

Figure 2. Examples of missed AD cases by CINA-CHEST (AD). (a) Missed subtle type B AD due to a streak artefact. (b) Missed type B AD in the presence of penetrating atherosclerotic ulcer. (c) Missed type B AD in the presence of large intramural hematoma and penetrating atherosclerotic ulcer.

Table 2. The dataset characteristics.

Table 6. Detailed stratified statistical analysis of CINA-CHEST (AD) application.