Interactive content-based image retrieval with deep learning for CT abdominal organ recognition

Objective. Recognizing the seven most relevant organs in an abdominal computed tomography (CT) slice requires sophisticated knowledge. This study proposed automatically extracting relevant features and applying them in a content-based image retrieval (CBIR) system to provide similar evidence for clinical use. Approach. A total of 2827 abdominal CT slices, including 638 liver, 450 stomach, 229 pancreas, 442 spleen, 362 right kidney, 424 left kidney and 282 gallbladder tissues, were collected to evaluate the proposed CBIR in the present study. Upon fine-tuning, high-level features used to automatically interpret the differences among the seven organs were extracted via deep learning architectures, including DenseNet, Vision Transformer (ViT), and Swin Transformer v2 (SwinViT). Three kinds of annotated images were employed in the classification and query tasks. Main results. The resulting performances included the classification accuracy (94%–99%) and retrieval mAP (0.98–0.99). Considering global features and multiple resolutions, SwinViT performed better than ViT, and ViT in turn benefited from a larger receptive field to outperform DenseNet. Additionally, using hole images yielded almost perfect results regardless of the deep learning architecture used. Significance. The experiment showed that using pretrained deep learning architectures and fine-tuning with enough data can achieve successful recognition of the seven abdominal organs. The CBIR system can provide more convincing evidence for recognizing abdominal organs via similarity measurements, which could lead to additional possibilities in clinical practice.


Introduction
Medical imaging is a fundamental component of modern health care, as exemplified by the widespread development of computational analysis algorithms for clinical diagnosis (Oliveira et al 2016, Gómez-Flores et al 2020). Widely employed imaging modalities in clinical practice include x-ray, computed tomography (CT), and magnetic resonance imaging (MRI) (Kasban et al 2015). CT can be performed rapidly and is often the first-choice initial radiological test and a triage tool for screening. These images, which reflect the specific anatomy or function of different organ types, require accurate interpretation, which demands imaging-subspecialty knowledge and experience. The interpretation of medical images also shows substantial variability across experts (Renard et al 2020), and adequate domain knowledge and clinical experience are necessary to provide a consistent diagnosis. In this situation, the continuing increase in the number of radiological examinations results in a substantial workload and hence reduces the overall quality of patient care (Willemink et al 2020).
Previous studies have demonstrated the usefulness of computer-aided diagnosis for rapidly and objectively distinguishing and assessing various diseases, such as breast cancer, brain cancer, and colorectal cancer. To assist in recognizing organs on abdominal CT, several image segmentation methods have been proposed to automatically determine the locations of specific organs. However, the segmentation accuracy varies across organs, which may lead to an inaccurate understanding of anatomical structures. In terms of the dice similarity coefficient (DICE), the liver achieved values between 0.82 and 0.95, whereas the pancreas ranged from 0.57 to 0.75 (Shen et al 2023); overall, the segmentation performance averaged 0.80. Previous studies have also tried to improve the segmentation accuracy of the gallbladder from 0.83 to 0.85 using a multidimensional network with a circular inference strategy (Li et al 2022). Although segmentation can be helpful for automatically obtaining organ areas, identifying a method reliable enough for practical clinical use is still challenging, and manual postprocessing to adjust the segmented areas is time-consuming and tedious.
This study highlights the critical significance of organ recognition and content-based image retrieval (CBIR) in abdominal CT interpretation. By favoring CBIR over traditional organ segmentation, we leveraged its ability to interpret complex anatomical structures with greater reliability. Given a query image, the system retrieves similar images from diverse case databases, providing comprehensive insights into distinguishing between organs and their similarities. The organ CBIR system holds substantial potential for clinical integration. With advancements in medical imaging technology leading to an increase in radiology examinations, radiologists face an overwhelming workload. Automating the analysis and retrieval of CT images with similar content can guide diagnosis, streamline radiology report generation, and reduce the time-consuming tasks inherent in interpreting complex radiology images. Medical images are complicated: complex imaging parameters and subtle differences between disease states make automated methods necessary for objective lesion analysis. By accelerating radiology workflows, this approach can enhance overall health care quality.
Previous studies applying CBIR to CT images have focused on distinguishing normal from abnormal tissues in specific organs, such as the liver (Wickstrøm et al 2023) or lung (Zhang et al 2022a). Nevertheless, CBIR has not been proposed for recognizing complex anatomical structures in the abdomen. Recognizing the possible appearances of organs in different situations is essential and should be established before further exploring possible abnormalities in clinical diagnosis. Two challenging issues arise in constructing a CBIR system for abdominal CT. The first is whether the user can appropriately express the region of interest. Previous studies have focused on a single organ, such as the liver or lung, because it is difficult to express the intent to query different organs. An abdominal CT slice is composed of multiple organs, as shown in figure 1(a); inputting the whole original slice cannot indicate which organ the user wants to query, so an interactive mechanism should be provided. The second issue is that the classification of different abdominal organs must rely on high-level imaging features. Low-level features, including colors, intensities, and textures, are not capable of distinguishing abdominal organs (Ghahremani et al 2021, Kugunavar and Prabhakar 2021) because sophisticated scanning parameters, shape differences, and interpatient variability generate diverse image compositions and distributions.
With the development of deep learning, more automatic and higher-level image features can be extracted from images for characterization (Lo et al 2021, 2022a). Convolutional neural networks (CNNs) were previously proposed for ischemic stroke assessment (Lo et al 2021), and different polyp types were successfully classified on the basis of CNN features (Lo et al 2022b). After multiple layers of feature extraction, feature selection, and feature mapping, the image features relevant to a specific task can be interpreted without human intervention. To overcome the limited receptive field of CNNs, the vision transformer (ViT), inherited from the transformer previously used in natural language processing, was proposed for global image composition analysis (Dosovitskiy et al 2020). Global information can reveal the spatial properties of organs in a CT slice and can thus be effective in CBIR.
In addition to employing the deep learning architectures mentioned above, the proposed CBIR system provides an interaction mechanism for users to specify which organ in the slice is the target of the query. With this annotation, the CBIR system can better perform image matching for successful retrieval, i.e. more relevant images can be returned. For clinical practice, the CBIR system proposed in this study can automatically generate relevant image features from well-annotated CT image datasets and retrieve similar images and past cases to provide detailed knowledge of a patient's organs or diseases.

Participants and image acquisition
This study received approval from the institutional ethics committee (No. 21MMHIS253e), and the requirement for written informed consent was waived owing to the retrospective, anonymized analysis. Eligible clinical abdominal CT images were retrospectively collected from distinct patients between January 2020 and December 2022 at two different institutions. All contrast-enhanced abdominal CT images were acquired using specific CT scanners: a 128-channel CT scanner (SOMATOM Definition AS; Siemens Healthineers) or a 64-detector row CT scanner (SOMATOM Definition Flash; Siemens Healthineers) at institution 1 and a 256-row multidetector CT scanner (Aquilion One; Toshiba Medical Systems) at institution 2 (as outlined in table 1).
All multidetector contrast-enhanced CT examinations were performed using the institutional routine abdominal protocol. The detailed protocol and parameters are summarized in table 2.
Two senior radiologists reviewed the consecutive CT images, determined the organ types in consensus, and then manually and independently segmented the organ contours on each slice of the abdominal CT scans using LabelMe software. A total of 2827 abdominal CT slices, including 638 liver, 450 stomach, 229 pancreas, 442 spleen, 362 right kidney, 424 left kidney and 282 gallbladder tissues, were collected. Figure 1 shows a CT slice and the corresponding seven organs in different colors.

Content-based image retrieval
CBIR assists users in obtaining images from a previously accumulated image database that are similar to a query image. This approach enables the user to characterize the query image and learn about shared characteristics among the retrieved images, or to understand the differences between images that are similar but belong to different categories, thereby achieving the goal of learning discriminative diagnosis. Whether the user can express the intended meaning of the query image and transform it into the features used for comparison is the key to successful retrieval. A partial inconsistency between the user's intent and the retrieval result is referred to as the semantic gap (Liu et al 2007, Wan et al 2014, Chen et al 2021).
Previously, image features were mostly low-level features, such as brightness, saturation, and the complexity of lines and textures, and were considered global image descriptions. These are referred to as low-level features because they are statistical results computed from the smallest unit of an image: pixels. However, human visual perception of images usually involves high-level features, such as identifying objects in an image. The smaller the semantic gap, the better the user experience and the more likely the user is to achieve the retrieval goal (Liu et al 2007, Owais et al 2019). However, high-level features are difficult to quantify and describe; thus, deep learning, which can automatically generate high-level semantic features, is the most suitable mechanism for producing retrieval features. The transformer (Vaswani et al 2017), another deep learning architecture originally employed for natural language processing, was recently developed into the vision transformer (ViT). ViT divides images into a series of patches to consider global correlations in feature extraction (Dosovitskiy et al 2020); after adding positional information to the tokens produced by the patch transformation, multihead self-attention is utilized to obtain correlation features between patches at different positions. In experiments, ViT indeed achieved higher accuracy than ResNet (Sengar et al 2022, Xin et al 2022, Zhang et al 2022b), and it has achieved substantial accuracy in various medical tasks, including the determination of knee septic arthritis from ultrasound images (Lo and Lai 2023) and prognosis modeling of colorectal cancer patients based on colonoscopy images (Lo et al 2023). Visual tokens are illustrated in figure 2(b).

Deep learning
The visual tokens in the sequence are obtained by separating the original image into nonoverlapping patches. An image of size $H \times W \times C$ is reshaped into a sequence of two-dimensional patches $x_p$ with shape $N \times (P^2 \cdot C)$, where $H$ and $W$ are the height and width, $C$ is the number of channels, and $P \times P$ is the patch size, giving $N = HW/P^2$ patches. After a linear projection $E$ maps each patch to a patch embedding, a class token $x_{class}$ is prepended, and position embeddings $E_{pos}$ are added to the patch embeddings to retain positional information, as shown in formula (1):

$$z_0 = [x_{class};\ x_p^1 E;\ x_p^2 E;\ \ldots;\ x_p^N E] + E_{pos}. \qquad (1)$$
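A minimal sketch of this tokenization, assuming a 224 × 224 three-channel input and 16 × 16 patches (shapes chosen for illustration; not the authors' implementation):

```python
import torch

# Tokenization per formula (1): H x W x C image -> N patch embeddings plus a
# class token, with position embeddings added. N = HW / P^2 = 196 here.
H, W, C, P, D = 224, 224, 3, 16, 768

x = torch.randn(1, C, H, W)                          # one preprocessed CT slice
N = (H * W) // (P * P)                               # 196 patches

patches = x.unfold(2, P, P).unfold(3, P, P)          # (1, C, 14, 14, P, P)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, N, C * P * P)

proj = torch.nn.Linear(C * P * P, D)                 # linear projection E
x_class = torch.zeros(1, 1, D)                       # class token
e_pos = torch.zeros(1, N + 1, D)                     # position embeddings E_pos

z0 = torch.cat([x_class, proj(patches)], dim=1) + e_pos   # formula (1)
assert z0.shape == (1, N + 1, D)                     # (1, 197, 768)
```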
The subsequent encoder consists of many alternating multihead self-attention and multilayer perceptron blocks. Through the embedding layer, the input $x_i$ is converted to $a_i$ and then enters the self-attention layer. Multiplying by three different matrices yields the query ($q$), key ($k$), and value ($v$). The feature vector $v$ is weighted by $w(q, k)$, the inner product of $q$ and $k$; these similarities serve as attention weights that determine which patch combinations are most meaningful. Because the original ViT uses fixed-size patches, it lacks the flexibility to describe objects of different sizes at different positions. Swin Transformer v2 (SwinViT) extends this approach with a shifted-window scheme, similar to convolution, to generate windows with different receptive fields (Liu et al 2022); a relative position bias is then applied within each window before self-attention is performed.
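For reference, the attention computed from these $q$, $k$, and $v$ matrices follows the standard scaled dot-product form of the original transformer (Vaswani et al 2017), where $d_k$ is the key dimension; multihead self-attention applies this in parallel over several learned projections and concatenates the results:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```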
Deep learning architectures indeed enable the automatic generation of high-level features, but they also rely heavily on a learning mechanism that requires a large amount of data. This is especially true for images, where the main objects to be recognized may not be clear in the original image or the background may be too complex. Moreover, identifying relevant features requires analyzing the statistical distribution of a large amount of labeled image data. This demand grows when training multilayer CNNs, as their many parameters and complex structures require more data to learn without overfitting. The ViT paper also noted that outperforming the accuracy of ResNet requires more training data, and SwinViT, which has more layers and parameters than ViT, likewise requires a large amount of image data for model training. However, medical images cannot be obtained in such large quantities, and physicians do not have enough time to perform complete annotations. Therefore, using pretrained parameters combined with fine-tuning on the target images is a more feasible way to construct models.
In the present study, DenseNet201, ViT, and SwinViT with pretrained parameters were utilized for abdominal organ classification and retrieval. The pretrained DenseNet parameters were trained with stochastic gradient descent for 90 epochs with a batch size of 256 on the ImageNet ILSVRC 2012 dataset, which includes 1.3 million images and 1000 classes; the initial learning rate was 0.1 and was decreased by a factor of 10 at epochs 30 and 60 (Huang et al 2017). ViT was pretrained with Adam (β1 = 0.9, β2 = 0.999), a batch size of 4096, and a weight decay of 0.1; the learning rate followed a linear warmup and decay schedule, and stochastic gradient descent with momentum was used for fine-tuning. The weights were trained on ImageNet-21k + ImageNet 2012, where ImageNet-21k contains 14 million images and 21 000 classes and extends the ImageNet ILSVRC 2012 dataset (1.3 million images and 1000 classes) (Kingma and Ba 2014). SwinViT is the SwinV2-B version trained with a batch size of 128 and the AdamW optimizer, which is based on adaptive estimation of first- and second-order moments, with a weight decay of 0.05, for a total of 300 epochs with a warmup process and an initial learning rate of 0.001; the training data came from the ImageNet-1K dataset (1.3 million images and 1000 classes). During fine-tuning of these models, stochastic gradient descent served as the optimizer with a learning rate of 0.01. Because of limited computational resources, the batch size was 16. The number of epochs for all training tasks was set to 10, which was sufficient for convergence. Fivefold cross-validation was applied to evaluate the generalizability of the models.
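For illustration, a minimal fine-tuning sketch matching the stated settings (SGD, learning rate 0.01, batch size 16, 10 epochs); the timm library and the exact model names are assumptions, as the paper does not specify its implementation framework:

```python
import timm
import torch

# Pretrained backbone with a new 7-class head; the model name is an assumed
# timm identifier ("densenet201" and "vit_base_patch16_224" are alternatives).
NUM_ORGANS = 7
model = timm.create_model("swinv2_base_window16_256",
                          pretrained=True, num_classes=NUM_ORGANS)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # per the paper
criterion = torch.nn.CrossEntropyLoss()

def fine_tune(loader, epochs=10):
    """Fine-tune the backbone; batch size 16 is configured in the loader."""
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```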

Image annotation
This study proposed the use of a CBIR system to identify the major organs on CT scans, as shown in figure 1. In a single CT slice, multiple organs may appear simultaneously, and the positional, shape, and textural information of each organ is crucial for distinguishing them from one another. Not every slice contains the same set of organs: depending on the scanning order and organ sizes, some organs appear earlier and differ in shape and texture in the middle or later sections, while other organs appear later. Therefore, when users want to query a specific organ in a slice, they need to define a region of interest (ROI) in the input image. This approach achieves two objectives: first, it specifies the target to be recognized, and second, it strengthens the target while weakening the surrounding similar tissue as background.
As deep learning can interpret high-level features, seven-organ classification models were trained on CT slices marked with different organs, and the trained features were utilized to match query images with target database images. However, it is unclear whether different deep learning architectures can effectively analyze differences in characteristics, including position, shape, and texture, among organs when users specify the target using ROIs. Therefore, three kinds of annotated images were used as the input for classification training and retrieval in the experiment to determine the importance of global features. Figure 3 shows the three labeling approaches employed in classification and retrieval for organ recognition.
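A hedged sketch of how the three query-image types in figure 3 could be generated from a radiologist's polygon annotation; OpenCV and the function below are assumptions, since the paper only states that LabelMe was used for contouring:

```python
import cv2
import numpy as np

def make_query_images(ct_slice, polygon):
    """ct_slice: H x W x 3 uint8 image; polygon: N x 2 int32 contour points."""
    pts = polygon.reshape(-1, 1, 2)

    contour_img = ct_slice.copy()                      # (a) outline on the slice
    cv2.polylines(contour_img, [pts], isClosed=True,
                  color=(0, 0, 255), thickness=2)

    hole_img = ct_slice.copy()                         # (b) organ filled solid red
    cv2.fillPoly(hole_img, [pts], color=(0, 0, 255))

    mask = np.zeros(ct_slice.shape[:2], np.uint8)      # (c) organ area only,
    cv2.fillPoly(mask, [pts], 255)                     #     background zeroed
    organ_img = cv2.bitwise_and(ct_slice, ct_slice, mask=mask)

    return contour_img, hole_img, organ_img
```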

Performance evaluation
The performance evaluation in the experiment included two parts: classification and retrieval. The original dataset was separated into a target image database (80%) and a query image set (20%). The target image database was then used to train a classification model. All slices from the same patient were assigned to either the training set or the test set. In training, fivefold cross-validation was used to show the model's generalizability.
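One way to realize this patient-level constraint is a grouped five-fold split; the sketch below assumes scikit-learn and placeholder arrays, as the actual data pipeline is not published:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Placeholders standing in for the 2827 slices, their organ labels, and the
# originating patient IDs.
slice_ids = np.arange(2827)
labels = np.random.randint(0, 7, size=2827)
patient_ids = np.random.randint(0, 300, size=2827)

gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(slice_ids, labels, groups=patient_ids):
    # No patient contributes slices to both sides of the split.
    assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
```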
In each iteration, one of the five folds was selected to test the model trained on the remaining folds, and the results of the five runs were averaged to obtain the final performance. The three proposed deep learning architectures and the three kinds of ROI-annotated images were evaluated to determine the accuracy of seven-organ recognition. The overall accuracy was obtained by summing the number of correctly classified slices for each organ ($C_{O_i}$) and dividing by the total number of slices for each organ ($T_{O_i}$), as described in formula (2):

$$\mathrm{Accuracy} = \frac{\sum_i C_{O_i}}{\sum_i T_{O_i}}, \quad O_i \in \{\text{liver, stomach, pancreas, spleen, right kidney, left kidney, gallbladder}\}. \qquad (2)$$

In the classification task, high-level features such as shape, texture, and location are learned through a series of layers and utilized to classify the various organs. During the learning process, the model extracts and prioritizes the features most relevant for distinguishing organ types; these features are abstract representations that capture unique characteristics or patterns within the images.
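As a direct transcription of formula (2), a hypothetical helper (the dictionary layout is illustrative) might look as follows:

```python
def overall_accuracy(correct, total):
    """Formula (2): summed correct counts divided by summed slice totals.

    correct, total: dicts mapping each of the seven organ names to the number
    of correctly classified slices and the total number of slices.
    """
    organs = ["liver", "stomach", "pancreas", "spleen",
              "right kidney", "left kidney", "gallbladder"]
    return sum(correct[o] for o in organs) / sum(total[o] for o in organs)
```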
Once the model has been trained, the learned features can be repurposed for retrieval tasks. Instead of being limited to classification, these features can serve as descriptive representations of the images. For retrieval, when a query image is presented, the model calculates similarity scores or distances between the query features and the features of images in the database, enabling the system to retrieve images that share similar features or patterns with the query and thus facilitating CBIR. The benchmark for CBIR considers the ratio of relevant images to retrieved images in a query; the top 10 accuracy indicates how many of the top 10 retrieved images are relevant. In the experiment, a held-out set of query images was employed to evaluate the CBIR. For a cutoff at the top $k$ images, the average precision (AP) and mean average precision (mAP) were calculated via the following formulas (Smith 2001), where $TP$ and $FP$ denote the numbers of relevant and irrelevant images within the top $k$, respectively, $\mathrm{rel}(k)$ is 1 if the image at rank $k$ is relevant and 0 otherwise, and $R_n$ is the number of relevant images:

$$P(k) = \frac{TP}{TP + FP}, \qquad \mathrm{AP} = \frac{1}{R_n}\sum_{k=1}^{10} P(k)\,\mathrm{rel}(k),$$

with the mAP being the mean AP over all queries.
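A minimal sketch of how a top-10 query could be scored with these definitions; cosine similarity and all function names are assumptions, as the paper does not name its similarity measure:

```python
import numpy as np

def average_precision_at_k(query_feat, db_feats, db_labels, query_label, k=10):
    """AP over the top-k retrieved images for one query."""
    # Cosine similarity between the query feature and every database feature.
    sims = db_feats @ query_feat / (
        np.linalg.norm(db_feats, axis=1) * np.linalg.norm(query_feat) + 1e-12)
    top_k = np.argsort(-sims)[:k]                        # ranks 1..k
    rel = (db_labels[top_k] == query_label).astype(float)
    if rel.sum() == 0:
        return 0.0
    precision = np.cumsum(rel) / (np.arange(k) + 1)      # P(k) = TP / (TP + FP)
    return float((precision * rel).sum() / rel.sum())    # normalized over R_n

# mAP is then the mean of these AP values over all query images.
```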

Results
Table 3 shows the classification and retrieval performances obtained with the three deep learning architectures and the three kinds of query images. Both the contour and hole images provide enough information to achieve almost perfect classification accuracy (98%-99%) and retrieval mAP (0.99); in particular, the hole images were slightly better than the contour images. Regarding the deep learning architectures, SwinViT with fine-tuning performed slightly better than ViT (retrieval with organ-image queries) and better than DenseNet201 (classification with organ images). Examining the details of the retrieval results using contour images as the query, table 4 shows the AP values of the seven organs for the three deep learning architectures; the stomach and left kidney appear more challenging than the other organs. In table 5, the left kidney is the most challenging organ when hole images are used as the query. Although the mAP obtained using organ images (table 6) was the lowest among the query-image types, the left kidney remained the most challenging of the seven organs.

Figure 4 shows the best classification and retrieval performances, which were obtained using hole images. After delineating the boundaries of the organs, the inside areas were filled with red to create the hole images. This kind of representation highlights the dominant properties of the organs, such as size, shape, and position. The retrieval mAP in figure 4 is 1.0, and various views of the stomach are present in the retrieval results. Figure 5 shows the disadvantages of using organ images containing only the organ area: using organ images, a pancreas was retrieved for the stomach query of figure 4. Figure 6 shows another example of a stomach query; without anatomical structure information, the left kidney (figure 6(c)) was very close to the stomach query (figure 6(b)).

For general use of the CBIR system, the user may not be familiar with the anatomical structures on abdominal CT, so imprecise delineation of an organ is possible. Figure 7 shows an illustration of using an imprecise outline as a query; the system achieved an AP of 1.0 in the top 10 retrieved images. An independent test set was collected from the MICCAI 2015 Multi-Atlas Abdomen Labeling Challenge (https://www.synapse.org/#!Synapse:syn3193805/wiki/217789). Under institutional review board supervision, abdominal CT scans were randomly selected from a combination of an ongoing colorectal cancer chemotherapy trial and a retrospective ventral hernia study. The scans were captured during the portal venous contrast phase with variable volume sizes (512 × 512 × 85 to 512 × 512 × 198) and fields of view (280 × 280 × 280 mm³ to 500 × 500 × 650 mm³). The in-plane resolution varies from 0.54 × 0.54 mm² to 0.98 × 0.98 mm², and the slice thickness ranges from 2.5 to 5.0 mm. A total of 139 slices were selected from the MICCAI dataset to generate query images of 129 livers, 92 stomachs, 63 pancreases, 56 spleens, 70 right kidneys, 70 left kidneys, and 36 gallbladders. The overall mAP was 0.93 for the MICCAI dataset. An example using a delineated liver as the query is shown in figure 8.

Discussion
This study collected numerous series of CT images from clinical practice to establish a model for real-time recognition of abdominal organs and to present the results through a CBIR system. These images were obtained under clinical diagnostic indications with different CT scanner settings. Several methods for extracting image information for classification have been proposed (Shah et al 2016, Agrawal et al 2022). However, previous disease classification systems targeted only a single disease, so their scope is substantially narrower than that of clinical practice. The characteristics relevant to multiple-organ recognition are subtle and require more sophisticated integration of the available image information. In the experiment, deep learning architectures were proposed to automatically extract image features. Upon fine-tuning the pretrained parameters with sufficiently diverse data, DenseNet, ViT, and SwinViT performed very well, with accuracies above 94% and mAPs above 0.98. Overall, SwinViT achieved the best performance, as it considers the spatial correlations between different patches in an image, which is a relevant factor for recognizing organs in the same slice. Accordingly, organ images, which use information from the organ area only, cannot provide information about adjacent structures, resulting in worse performance than the other two kinds of images. The CBIR system provides a clear view of the similarities within the same organ and across different organs.
Figure 4 shows an overview of the organ retrieval results, demonstrating that the current CBIR system can successfully retrieve different views of an organ. The location correlations between organs played an important role in both classification and retrieval, as shown by the distinctions between the stomach and the left kidney in figures 5 and 6. Unlike previous computer-aided diagnostic studies focused on segmenting and analyzing ROIs, organ recognition depends strongly on global information; accordingly, contour or hole images containing whole anatomical structure information are more suitable for classification and retrieval. Whether the deep learning architecture can benefit from the available anatomical structure is also quite relevant: the reason DenseNet performed worse than the vision transformers may be the limited receptive field of CNNs, and by considering patches of different sizes and locations, SwinViT performed slightly better than ViT.
Focusing on classification accuracy and CBIR metrics, we took extensive steps to evaluate the system's real-world performance. In parallel, it is crucial to recognize the widespread shortage of trained professionals in the field. Our research aims to bridge this gap with artificial intelligence (AI)-driven solutions, validated through rigorous steps such as the use of roughly delineated contours (figure 7) and analysis of an external MICCAI dataset, which yielded a commendable mean average precision of 0.93 (figure 8). This approach not only validates the model's discriminative features but also underscores the potential for AI technologies to supplement diagnostic processes where skilled experts are scarce, highlighting the transformative role of AI in overcoming workforce limitations.
While data-driven methodologies hold promise in shaping the future of radiology, challenges such as patient privacy concerns, laborious annotation processes, limited access to extensive datasets, and the shortage of radiologists limit the development of robust datasets. Deep learning algorithms rely heavily on expansive, carefully annotated datasets for optimal performance; the ideal dataset should encompass a wide array of high-quality images sourced from diverse institutions and geographic regions to ensure the model's adaptability for clinical applications. Considering the potential for overfitting, in which the model fits the training data but may not generalize to new cases, we added additional experiments, including the use of roughly delineated contours as the query image (figure 7), and validated the system on an external MICCAI dataset, achieving a substantial mean average precision of 0.93 (figure 8). Despite these achievements, it is important to recognize the limitations inherent in the small dataset size, potential overfitting concerns, and the need for further external validation, which pose challenges in translating these methodologies into widespread clinical adoption.
Based on deep learning architectures, organ recognition has become promising and has potential use in clinical settings. With the proposed retrieval system, additional explainable meanings were also explored: the differences or similarities among the retrieved images suggest limitations and possible future improvements, and comparing query images with the top 10 similar images would instill more confidence in users who are uncertain at the outset. These findings highlight the usefulness of deep learning in clinical practice. Previous studies have reported large interpatient shape variations; intense contrasts between the liver and adjacent organs, e.g. the stomach, pancreas, and kidney; and the existence of various pathologies (Bobo et al 2018, Liu et al 2019, Zhou et al 2022). In future studies, whether CBIR can retrieve similar images for a query about an uncertain disease could lead to more convincing diagnostic suggestions and subsequent treatment. Furthermore, determining whether CBIR can assist in providing diagnostic suggestions and even improve readers' performance will be the next goal for clinical use.

Conclusions
The volume of medical images is increasing massively. Image interpretation can provide evidence-based diagnosis, teaching, and research. To provide objective and rapid access and use, a CBIR system was proposed in this study for the recognition and management of the abdominal CT organs commonly encountered in clinical practice. Using deep learning architectures, including DenseNet, ViT, and SwinViT, high-level diagnostic features were automatically generated to achieve substantial classification accuracy (94%-99%) and retrieval mAP (0.98-0.99). Using images with different ROI annotations, more explanatory characteristics of organ recognition were obtained. CBIR also provides more convincing evidence via image similarities, which could lead to additional possibilities in clinical practice.

Figure 1 .
Figure 1. Illustrations of (a) a CT slice and (b) the seven organs shown in different colors. In the following classification and retrieval steps, only one organ was outlined in a slice to express the user's intention to query a specific organ.
A CNN is a specific implementation of deep learning in image recognition. The common components of CNNs include convolutional layers, pooling layers, and fully connected layers (Hinton et al 2012). Since 2012, when the CNN-based AlexNet (Krizhevsky et al 2017) won first place in the ImageNet Large Scale Visual Recognition Competition (ILSVRC), the most accurate methods each year have been based on CNNs, including GoogLeNet (Szegedy et al 2015) and ResNet (He et al 2016). The subsequent DenseNet (Huang et al 2017) builds on the ResNet approach and further reuses the features of each layer, reducing the vanishing gradient problem in CNNs. This pushed the number of layers in the network to 201, as shown in figure 2(a).
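Concretely, the feature reuse in DenseNet follows the dense connectivity pattern of Huang et al (2017), in which the $\ell$-th layer receives the concatenated feature maps of all preceding layers, with $H_\ell$ the composite function of the $\ell$-th layer and $[\cdot]$ denoting channelwise concatenation:

```latex
x_{\ell} = H_{\ell}\!\left([x_0, x_1, \ldots, x_{\ell-1}]\right)
```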

Figure 2 .
Figure 2. Illustrations of different deep learning architectures for (a) dense blocks and (b) vision tokens.

Figure 3 .
Figure 3. Three kinds of labeling approaches used in classification and retrieval for organ recognition: (a) contour image, (b) hole image, and (c) organ image.

Figure 4 .
Figure 4. Top 10 retrieval results for a query image using a stomach hole.

Figure 5 .
Figure 5. Stomach image correctly retrieved as the stomach using (a) the hole image but incorrectly retrieved as the pancreas using (b) the organ image.

Figure 6 .
Figure 6. Challenging stomach image for ViT features during retrieval: (a) the top 10 results, (b) the query image, and (c) the incorrectly retrieved organ image showing the left kidney (No. 3 in (a)).

Figure 7 .
Figure 7. Illustration of using an imprecise outline as a query: (a) a well-delineated liver and (b) a query image that included the contour of the liver and unrelated vessels, with an AP of 1.0. The top 10 retrieved images are all relevant, demonstrating a similarity range of 0.93 to 0.96.

Figure 8 .
Figure 8. Illustration of MICCAI slices with the liver delineated as a query.

Table 1 .
Baseline characteristics of patients in the collected database.

Table 3 .
Classification accuracies and top 10 mAPs of different learning networks.

Table 4 .
Top 10 mAPs of the seven organs retrieved from contour images using different deep learning networks.

Table 5 .
Top 10 mAPs of the seven organs retrieved from the hole images using different deep learning networks.

Table 6 .
Top 10 mAPs of the seven organs retrieved from organ images using different deep learning networks.