Artificial intelligence-based iliofemoral deep venous thrombosis detection using a clinical approach

Early diagnosis of deep venous thrombosis is essential for reducing complications, such as recurrent pulmonary embolism and venous thromboembolism. There are numerous studies on enhancing the efficiency of computer-aided diagnosis, but clinical diagnostic approaches have not been considered in the detection of deep venous thrombosis. In this study, we evaluated the performance of an artificial intelligence (AI) algorithm in detecting iliofemoral deep venous thrombosis on computed tomography angiography of the lower extremities, to investigate the effectiveness of using the clinical approach during the feature extraction process of the AI algorithm. To this end, we created synthesized images that reflect practical diagnostic procedures and applied them to the convolutional neural network-based RetinaNet model. We compared and analyzed the performance according to the model's backbone and input data. The performance of the model was as follows: ResNet50 backbone: sensitivity = 0.843 (± 0.037), false positives per image = 0.608 (± 0.139); ResNet152 backbone: sensitivity = 0.839 (± 0.031), false positives per image = 0.503 (± 0.079). The results demonstrate the effectiveness of the proposed method on computed tomography angiography of the lower extremities and in improving the reporting efficiency for critical iliofemoral deep venous thrombosis cases.

Deep venous thrombosis (DVT) most commonly develops in the lower extremities and can cause complications that increase mortality and decrease quality of life 1. The treatment and long-term prognosis of lower-extremity DVT depend on an accurate and timely diagnosis 2. However, diagnosis may be delayed when no radiologist is on duty. Because clinical symptoms and signs have low specificity for DVT, an imaging workup is necessary to confirm or exclude the diagnosis 3.
Medical imaging analysis is typically performed manually and is subject to intra- and inter-observer variability; it also places a burden on radiologists and increases the risk of misdiagnosis. Because of these drawbacks of manual analysis, convolutional neural network (CNN)-based artificial intelligence (AI) algorithms have been used in the medical imaging field as computer-aided diagnosis (CAD) tools 4. Moreover, some studies have proposed AI techniques that improve diagnostic performance by incorporating clinical information or practical diagnostic procedures, and have demonstrated the benefit of such clinical approaches [5][6][7].
Imaging modalities for DVT diagnosis include ultrasonography (US), computed tomography angiography of the bilateral lower extremities (LECTA), magnetic resonance imaging (MRI), and catheter venography. To overcome the limitations of manual DVT analysis, studies using various imaging modalities have shown the potential and efficiency of AI-based CAD systems for DVT diagnosis [8][9][10][11][12]. Among these modalities, LECTA was found to be advantageous: it provides more objective images than US, is easily accessible, and is frequently used to provide information about the extravascular tissues in the bilateral lower extremities and the abdominopelvic region. In clinical practice, to determine accurately whether DVT is present on LECTA, radiologists review the adjacent slices rather than only the single CT slice on which a lesion is suspected. As preliminary studies, we conducted two CNN-based DVT diagnosis studies on LECTA. The first study aimed to investigate

Methods
Data collection. The institutional review board of Gachon University Gil Medical Center (IRB Number: GAIRB2021-225) approved this study, and the requirement for informed consent was waived because of the retrospective study design. All experimental protocols were performed in accordance with the relevant guidelines and regulations and in compliance with the Declaration of Helsinki. The picture archiving and communication system database was searched for LECTA examinations conducted at Gil Medical Center between January 2013 and December 2020, and 583 consecutive LECTA examinations were identified. When a patient underwent multiple LECTA examinations, only the patient's first LECTA scan session was included. Patients with motion or metallic artifacts were excluded, as were cases whose radiologic report did not explicitly mention the presence or absence of iliofemoral DVT. Consequently, 380 LECTA examination sets were excluded. Among the remaining 203 examinations, the 95 sets with iliofemoral DVT on the radiologic report formed the "DVT" group. Likewise, 95 LECTA examinations without iliofemoral DVT on the radiologic report were systematically selected to form the "no DVT" group (Fig. 1). To compare single-slice input with three-slice input (the slice with the lesion plus the slices immediately above and below it), we synthesized one image from three consecutive LECTA slices, reflecting how radiologists review adjacent slices in clinical diagnosis (Fig. 2).
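The study does not state how the three consecutive slices are merged into one image. A common choice, assumed here, is to place the neighboring slices z - 1 and z + 1 and the lesion slice z in the three channels of an RGB-like image, so that a three-channel ResNet backbone receives the z-axis context directly. The NumPy sketch below is hypothetical throughout: the function names, the edge handling, and the Hounsfield-unit window (level 100, width 700) are illustrative placeholders, not values reported in the study.

```python
import numpy as np


def window_hu(slice_hu, level=100.0, width=700.0):
    """Clip a CT slice (Hounsfield units) to a display window and scale to [0, 255].

    Hypothetical window: level/width are placeholders, not the study's settings.
    """
    low, high = level - width / 2.0, level + width / 2.0
    clipped = np.clip(slice_hu.astype(np.float32), low, high)
    return (clipped - low) / (high - low) * 255.0


def synthesize_three_slice(volume_hu, z):
    """Stack slices z-1, z, z+1 of a (Z, H, W) volume into one (H, W, 3) image.

    Assumed scheme: each neighboring slice becomes one channel around the lesion
    slice. Indices are clamped so the first and last slices still give 3 channels.
    """
    depth = volume_hu.shape[0]
    indices = [max(z - 1, 0), z, min(z + 1, depth - 1)]
    channels = [window_hu(volume_hu[i]) for i in indices]
    # Casting to uint8 truncates fractional intensities toward zero.
    return np.stack(channels, axis=-1).astype(np.uint8)


def single_slice_image(volume_hu, z):
    """Assumed one-slice baseline: repeat the lesion slice in all three channels."""
    img = window_hu(volume_hu[z])
    return np.stack([img, img, img], axis=-1).astype(np.uint8)
```

With this (assumed) design, the one-slice and three-slice inputs share the same tensor shape, so the same RetinaNet configuration can be trained on both and only the z-axis context differs between the two experiments.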

LECTA image acquisition.
Deep learning based on convolutional neural networks. We chose the CNN-based RetinaNet model to detect iliofemoral DVT because its loss function and structure provide both time efficiency and high accuracy. RetinaNet is a one-stage deep learning detection model that uses a focal loss function, which addresses the foreground-background class imbalance that is the main drawback of one-stage object detectors 13. RetinaNet combines a feature pyramid network (FPN) with a ResNet backbone 14. The FPN has been used in many detection models in medical imaging because it achieves high detection performance with minimal computational resources; it uses a CNN to extract four multiscale feature maps from a single image [15][16][17]. In RetinaNet, the FPN arranges the features extracted by the ResNet backbone into a pyramid of multiscale feature maps. Two task-specific subnetworks are then applied to the feature map of each pyramid level: one performs bounding-box regression to localize the target object, and the other performs object classification. Because it is a one-stage detector that performs both tasks concurrently, RetinaNet achieves high performance with low computation time.
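The focal loss mentioned above is FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the predicted probability of the true class of an anchor. The NumPy sketch below implements the binary per-anchor form with alpha = 0.25 and gamma = 2, the defaults reported in the original RetinaNet paper; the study does not report which values it used, so these are assumptions. Normalizing by the number of positive anchors also follows the original RetinaNet formulation.

```python
import numpy as np


def focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss summed over anchors, normalized by the positive count.

    y_true: 0/1 label per anchor (1 = DVT foreground); p_pred: predicted
    foreground probability. alpha/gamma are assumed RetinaNet defaults.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(p_pred, dtype=np.float64), eps, 1.0 - eps)
    # p_t is the probability the model assigns to each anchor's true class.
    p_t = np.where(y_true == 1, p, 1.0 - p)
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified anchors.
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    num_pos = max(1.0, float(np.sum(y_true == 1)))
    return float(np.sum(loss) / num_pos)


def cross_entropy(y_true, p_pred, eps=1e-7):
    """Plain binary cross-entropy summed over anchors, for comparison."""
    y_true = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(p_pred, dtype=np.float64), eps, 1.0 - eps)
    p_t = np.where(y_true == 1, p, 1.0 - p)
    return float(np.sum(-np.log(p_t)))
```

For an easy background anchor with p = 0.01 (so p_t = 0.99), the focal term is about 0.75 x 10^-4 times the cross-entropy term, which is how the loss keeps the many easy negatives from dominating training. With gamma = 0 and alpha = 0.5, it reduces to half of ordinary cross-entropy.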
The study was performed in Python 3.6.12 (Python Software Foundation, Wilmington, DE) using the Keras 2.2.5 framework (Keras Global Limited, London, United Kingdom) on an Ubuntu 14.04 operating system (London, United Kingdom) with two NVIDIA Tesla P100 graphics processing units (GPUs; NVIDIA Corporation, Santa Clara, CA) and 512 GB of random access memory. The hyper-parameters were set manually as follows: a batch size of 16, 100 epochs, and a learning rate of 0.0001. The learning rate was reduced by a factor of 0.1 if the validation loss did not decrease for 15 epochs.
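In Keras 2.2.5, this rule matches the built-in callback `keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=15)`; the study does not show its code, so using that callback is an assumption. To make the behavior explicit without a Keras dependency, the class below is a simplified, hypothetical re-implementation of the same plateau logic (no cooldown or minimum learning rate; `min_delta` set to the Keras default of 1e-4), starting from the reported learning rate of 0.0001.

```python
class PlateauLRSchedule:
    """Hypothetical, simplified reduce-on-plateau schedule (no cooldown, no min_lr)."""

    def __init__(self, lr=1e-4, factor=0.1, patience=15, min_delta=1e-4):
        self.lr = lr                # initial learning rate reported in the study
        self.factor = factor        # multiply lr by this on a plateau
        self.patience = patience    # epochs without improvement before reducing
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return the lr for the next epoch."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0  # restart the patience counter after a reduction
        return self.lr


# Example: loss improves once, then stalls; the 15th stalled epoch triggers a reduction.
schedule = PlateauLRSchedule()
lrs = [schedule.step(loss) for loss in [1.0] + [1.0] * 15]
```

Over 100 epochs, a validation loss that stops improving early would trigger several successive tenfold reductions, one every 15 stalled epochs.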
Performance assessment. Performance was assessed on the test data of 38 cases that were held out from training in each fold. The intersection over union (IOU) is an evaluation index that measures the overlap between two areas as the area of their intersection divided by the area of their union. In this study, the two areas are the ground truth (GT), which is the region of interest (ROI) labeled by radiologists, and the prediction area produced by the model. The IOU threshold was set at 0.1: a bounding box predicted by the model was treated as a true positive (TP) if its IOU with the GT was greater than or equal to 0.1, and as a false positive (FP) if the IOU was less than 0.1. A false negative (FN) was declared when the model produced no prediction area for a GT.
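The IOU criterion can be made concrete with a short Python sketch. Boxes are assumed to be (x1, y1, x2, y2) pixel coordinates, and the one-to-one greedy matching (highest-scoring predictions first, each GT box matched at most once) is an assumption borrowed from common detection benchmarks; the study specifies only the 0.1 threshold. The function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_image(pred_boxes, pred_scores, gt_boxes, iou_thr=0.1):
    """Label one image's predictions TP (True) or FP (False) and count FNs.

    Assumed greedy matching: predictions are visited in descending score order,
    and each takes the unmatched GT box with the highest IOU >= iou_thr.
    """
    order = sorted(range(len(pred_boxes)), key=lambda i: pred_scores[i], reverse=True)
    matched_gt = set()
    labels = [False] * len(pred_boxes)  # returned in the original prediction order
    for i in order:
        best_j, best_iou = None, iou_thr
        for j, gt in enumerate(gt_boxes):
            if j in matched_gt:
                continue
            overlap = iou(pred_boxes[i], gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched_gt.add(best_j)
            labels[i] = True
    fn = len(gt_boxes) - len(matched_gt)  # GT boxes that no prediction matched
    return labels, fn
```

The low 0.1 threshold means a prediction only needs to overlap roughly a tenth of the combined area to count as a hit, which rewards localization near the thrombus rather than tight box agreement.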
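The sensitivity (Sn, i.e., recall), FPs per image (FPPI), and precision were used as the evaluation indicators of the models, and the average precision (AP) was calculated as the area under the precision-recall curve. They are defined as follows:

$$\mathrm{Sn} = \frac{TP}{TP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{FPPI} = \frac{FP}{N_{\mathrm{image}}},$$

$$\mathrm{AP} = \int_{0}^{1} p(r)\,dr,$$

where $N_{\mathrm{image}}$ is the number of test images and $p(r)$ is the precision at recall $r$.

(Table 2). However, with the ResNet152 backbone, the model using one-slice data performed best, with an FPPI of 0.456 (± 0.093) and a precision of 0.670 (± 0.047) (Table 3, Fig. 4). Figure 3 shows the free-response operating characteristic (FROC) curves based on the Sn and FPPI values of each model.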
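These indicators can be computed from pooled detections as in the following NumPy sketch. It assumes each predicted box has already been labeled TP or FP by IOU matching, keeps all predictions (no score cut-off), and computes AP with all-point interpolation of the precision envelope; the study does not specify its score threshold or interpolation scheme, and the function name is hypothetical. The cumulative (FPPI, Sn) pairs are the points of an FROC curve such as the one in Figure 3.

```python
import numpy as np


def detection_metrics(scores, is_tp, num_gt, num_images):
    """Sn, precision, FPPI (all predictions kept), AP, and FROC points.

    scores/is_tp: one entry per predicted box pooled over all test images;
    num_gt: total number of GT boxes; num_images: number of test images.
    Assumes at least one prediction. AP uses all-point interpolation (assumed).
    """
    order = np.argsort(-np.asarray(scores, dtype=np.float64), kind="stable")
    tp_flags = np.asarray(is_tp, dtype=bool)[order]
    cum_tp = np.cumsum(tp_flags)   # TPs among the top-k predictions
    cum_fp = np.cumsum(~tp_flags)  # FPs among the top-k predictions
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    fppi = cum_fp / num_images

    # Precision envelope: at each recall, the best precision at that recall or higher.
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    recall_steps = np.concatenate(([0.0], recall))
    ap = float(np.sum((recall_steps[1:] - recall_steps[:-1]) * envelope))

    tp, fp = int(cum_tp[-1]), int(cum_fp[-1])
    fn = num_gt - tp  # GT boxes never matched by any prediction
    return {
        "Sn": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "FPPI": fp / num_images,
        "AP": ap,
        "FROC": list(zip(fppi.tolist(), recall.tolist())),
    }
```

For example, four detections scored 0.9, 0.8, 0.7, 0.6 with labels TP, FP, TP, FP over 3 GT boxes in 2 images give Sn = 2/3, precision = 0.5, FPPI = 1.0, and AP = 5/9.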

Discussion
To detect iliofemoral DVT in LECTA, this study used deep learning-based AI techniques. To incorporate information from the periphery of the lesion along the z-axis, we produced data by synthesizing three successive images centered on the lesion. To validate the three-slice synthesized data, we compared their performance with that of data containing only the slice identified as having a lesion. According to Table 2, the two models that used the proposed synthesized data performed better in terms of Sn and mean AP (mAP, the sum of the AP values divided by the number of classes). However, the models based on one-slice data outperformed those based on three-slice data in terms of FPPI and precision. Adding z-axis peripheral information to a single image increased the detection rate, despite an increase in FP cases. Additionally, because the veins lie in relatively similar locations with respect to the surrounding muscle and bone, we inferred that the deeper ResNet152-based model, with more parameters, performed better on all indicators. As shown in Table 1, using three-slice images for patch images of the same region yielded more TPs and FPs than using one-slice images. Figure 5 shows examples of FP cases. As shown in Fig. 5a, most FP cases occurred in areas whose shape and pixel intensity resembled those of DVT. Because each generated patch covered a relatively small area compared with the entire image, it lacked the peripheral information needed to confirm that the analyzed site was a vein. We therefore presume that the small patch area led the model to mistake objects with a similar shape for DVT. Figure 5b shows a case of successful localization but unsuccessful classification. By creating patch images of the region around the iliofemoral veins, we collected data with various backgrounds.
The generated images have backgrounds in which the muscles and bones have similar shapes and positions, particularly in the thigh region, which accounts for most of the input data. This helps the model locate the objects. However, the number of cases used in this experiment was insufficient to learn the features of the various types of DVT and healthy veins.

Conclusions
Using the proposed synthesized images, the CNN-based models detected iliofemoral DVT on LECTA with higher sensitivity and mAP than the models using one-slice images. These results indicate that an AI model reflecting the practical diagnostic process enables more accurate DVT detection on LECTA. The proposed AI algorithm presents candidate locations together with the probability that DVT is present, which can help radiologists achieve a more accurate diagnosis. Radiologists can first confirm the locations suggested by the model within CT volume data comprising numerous slice images, which can improve their reading efficiency and reduce their workload.
Several directions remain for future work. First, the detection range should be extended to the infrapopliteal veins. This study was limited to the iliofemoral veins; the infrapopliteal deep veins were excluded because of their small diameter and variable location. Infrapopliteal DVT, particularly in high-risk patients, has clinical and diagnostic value because it can propagate to the iliofemoral veins 18. An AI model with a broader detection range could therefore increase physicians' DVT detection rate. Second, the results of the proposed AI model should be compared with radiologists' diagnoses to demonstrate the practical advantages of the model. This study evaluated the proposed method by applying it to models with different backbones and comparing the results; a comparison with radiologists would further establish the benefit of this AI model as a CAD system in clinical diagnosis.

Data availability
The LECTA data used to support the findings of this study are available upon request from the corresponding authors.