Performance of a deep learning system for automatic diagnosis of protruding lesions in colon capsule endoscopy: a multicentric study

Colon capsule endoscopy (CCE) represents a landmark in minimally invasive exploration of the colonic mucosa for patients with contraindications for conventional colonoscopy, or for whom the latter exam is unwanted or unfeasible. Colorectal neoplasia is the most common lesion found in CCE. The widespread acceptance of CCE as a non-invasive diagnostic method is particularly important in the setting of colorectal cancer screening. However, reviewing these exams is a time-consuming process, as they generate a large number of frames, with the risk of overlooking important lesions. We aimed to develop an artificial intelligence (AI) algorithm using a convolutional neural network (CNN) architecture for the automatic detection of colonic protruding lesions. A CNN was constructed using an anonymized database of CCE images collected from a total of 124 patients. This database included images from patients with colonic protruding lesions and from patients with normal colonic mucosa or other pathologic findings. A total of 5715 images (2410 of protruding lesions, 3305 of normal mucosa or other findings) were extracted for CNN development. Two image datasets were created and used for training and validation of the CNN. The performance of the CNN was measured by calculating the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and positive and negative predictive values (PPV and NPV, respectively). The AUROC for detection of protruding lesions was 0.99. The sensitivity, specificity, PPV and NPV were 90.0%, 99.1%, 98.6% and 93.2%, respectively. The overall accuracy of the network was 95.3%. The developed deep learning algorithm accurately detected protruding lesions in CCE images. The introduction of AI technology to CCE may increase its diagnostic accuracy and acceptance for the screening of colorectal neoplasia.


Introduction
Capsule endoscopy (CE) is a primary diagnostic tool for the investigation of patients with suspected small bowel disease. Colon capsule endoscopy has recently been introduced as a minimally invasive alternative to conventional colonoscopy for evaluation of the colonic mucosa 1,2 . This system overcomes some of the drawbacks associated with conventional colonoscopy, including the potential for pain, the use of sedation, and the risk of adverse events such as bleeding and perforation 3 . The clinical application of this tool has been most extensively studied in the setting of colorectal cancer screening, particularly for patients with a previous incomplete colonoscopy, or for whom the latter exam is contraindicated, unfeasible or unwanted 4 . Although CCE marks an advance in non-invasive colon evaluation, its sensitivity, particularly for polyp detection, remains suboptimal 5 . Moreover, a single full-length CCE video may produce over 50,000 images, and reviewing these images is a monotonous and time-consuming task, requiring approximately 50 minutes for completion 2 . Furthermore, any given frame may capture only a fragment of a mucosal abnormality, and lesions may be depicted in a very small number of frames. Therefore, the risk of overlooking important lesions is not insignificant 2 .
The combination of enhanced computational power with large clinical datasets has driven the research and development of AI tools for clinical implementation. The application of automated algorithms to diverse medical fields has provided promising results regarding disease identification and classification [6][7][8] . Convolutional neural networks (CNN) are a type of multi-layered deep learning algorithm tailored for image analysis. The application of these technological solutions to small bowel CE has provided promising results in the detection of several types of lesions [9][10][11][12] . The introduction of AI tools for real-time detection of colorectal neoplasia in conventional colonoscopy has suggested a high diagnostic yield for CNN-based algorithms 13 . However, the impact of AI algorithms for detection of colorectal neoplasia in CCE images has been scarcely evaluated. Enhanced reading of CCE images through the application of these tools may improve the diagnostic accuracy of CCE for colorectal neoplasia, which is currently unsatisfactory 2 . Importantly, the implementation of automated algorithms may help to reduce the time required for reading a single CCE exam. The aim of this study was to develop and validate a CNN-based algorithm for the automatic detection of colonic protruding lesions in CCE images.

Study design
A multicentric study was performed for the development and validation of a CNN for automatic detection of colonic protruding lesions. CCE images were retrospectively collected from two different institutions: São João University Hospital (Porto, Portugal) and ManopH Gastroenterology Clinic (Porto, Portugal).
One hundred and twenty-four CCE exams (124 patients), performed between 2010 and 2020, were included.
The full-length videos of all participants were reviewed, and a total of 5715 frames of the colonic mucosa were ultimately extracted.
Inclusion and classification of frames were performed by three gastroenterologists with experience in CCE (MJMS, HC and MMS). A final decision on frame labelling required the agreement of at least two of the three researchers.
This study was approved by the ethics committee of São João University Hospital (No. CE 407/2020). The study was conducted in accordance with the original and subsequent revisions of the Declaration of Helsinki. This study is retrospective and non-interventional in nature; thus, the output provided by the CNN had no influence on the clinical management of any included patient. Any information susceptible of identifying included patients was omitted, and each patient was assigned a random number in order to guarantee effective data anonymization for the researchers involved in CNN development. A team with Data Protection Officer (DPO) certification (Maastricht University) confirmed the non-traceability of the data and its conformity with the General Data Protection Regulation (GDPR).

Colon Capsule Endoscopy Procedure
For all patients, CCE procedures were conducted using the PillCam™ COLON 2 system (Medtronic, Minneapolis, Minnesota, USA). This system comprises three major components: the endoscopic capsule, an array of sensors connected to a data recorder, and software for frame review. The capsule measures 32.3 mm in length and 11.6 mm in width. It has two high-resolution cameras, each with a 172º angle of view. The system frame rate varies automatically between 4 and 35 frames per second, depending on bowel motility. Each frame has a resolution of 512 x 512 pixels. The battery of the endoscopic capsule has an estimated life of ≥ 10 hours 2 . This system was launched in 2009 and has not undergone hardware updates since then; thus, no significant changes in image quality occurred during the study period. The images were reviewed using PillCam™ software version 9.0 (Medtronic, Minneapolis, Minnesota, USA). Each frame was processed in order to remove information allowing patient identification (name, operating number, date of procedure).
Each patient received bowel preparation according to previously published guidelines 14 . Briefly, patients initiated a clear liquid diet on the day preceding capsule ingestion, with fasting during the night before the examination. A polyethylene glycol solution was used in a split dosage (2 liters in the evening before and 2 liters in the morning of capsule ingestion). Prokinetic therapy (10 mg domperidone) was used if the capsule remained in the stomach 1 hour after ingestion, as assessed by real-time image review on the recorder. Two boosters consisting of a sodium phosphate solution were administered after the capsule had entered the small bowel, with a 3-hour interval between them.

Development of the Convolutional Neural Network
A deep learning CNN was developed for automatic detection of colonic protruding lesions. Protruding lesions included all polyps, epithelial tumors, nodes and subepithelial tumors. From the collected pool of images (n = 5715), 2410 showed protruding lesions and 3305 displayed normal mucosa or other mucosal lesions (ulcers, erosions, red spots, angiectasia, varices and lymphangiectasia). This pool of images was split to constitute the training and validation image datasets. The training dataset comprised 80% of the consecutively extracted images (n = 4572). The remaining 20% (n = 1143) were used as the validation dataset, which served to assess the performance of the CNN (Fig. 1).
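The consecutive 80/20 split described above can be sketched as follows; the frame identifiers are placeholders standing in for the real image files, which are not available here.

```python
# Sketch of the 80/20 train/validation split of the 5715 extracted frames.
# Placeholder identifiers are used instead of real image files.
all_frames = [f"frame_{i:04d}" for i in range(5715)]

split_point = int(len(all_frames) * 0.8)   # 80% for training
train_set = all_frames[:split_point]       # consecutively extracted images
validation_set = all_frames[split_point:]  # held out for performance assessment

print(len(train_set), len(validation_set))  # 4572 1143
```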
To create the CNN, we used the Xception model with its weights pre-trained on ImageNet (a large-scale image dataset intended for the development of object recognition software). To transfer this learning to our data, we kept the convolutional layers of the model, removed the last fully connected layers, and attached new fully connected layers sized according to the number of classes used to classify our endoscopic images.
We used two blocks, each having a fully connected layer followed by a dropout layer with a drop rate of 0.3. Following these two blocks, we added a dense layer with a size defined by the number of categories to classify. We applied gradient-weighted class activation mapping to the last convolutional layer 15 in order to highlight the features important for predicting protruding lesions. A learning rate of 0.0001, a batch size of 16, and 100 epochs were set by trial and error. We used the TensorFlow 2.3 and Keras libraries to prepare the data and run the model. The analyses were performed on a computer equipped with a 2.1 GHz Intel® Xeon® Gold 6130 processor (Intel, Santa Clara, CA, USA) and a double NVIDIA Quadro® RTX™ 4000 graphics processing unit (NVIDIA Corporation, Santa Clara, CA, USA).
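A minimal Keras reconstruction of the architecture described above is sketched below. Several details are assumptions rather than reported facts: the sizes of the two fully connected layers (512 and 128) and the choice of Adam as the optimizer are illustrative, and `weights=None` is used here to keep the sketch offline (the study initialized the backbone with ImageNet weights, i.e. `weights="imagenet"`).

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 2  # protruding lesion vs. normal mucosa / other findings

# Xception backbone with its fully connected top removed; global average
# pooling exposes the last convolutional features to the new head.
base = keras.applications.Xception(
    weights=None,              # study used "imagenet"; None avoids the download
    include_top=False,
    pooling="avg",
    input_shape=(512, 512, 3),  # frame resolution reported in the text
)

# Two fully connected blocks, each followed by dropout (drop rate 0.3),
# then a dense layer sized to the number of categories. Hidden sizes are
# illustrative assumptions.
x = layers.Dense(512, activation="relu")(base.output)
x = layers.Dropout(0.3)(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = keras.Model(base.input, outputs)

# Hyperparameters reported in the text: learning rate 1e-4, batch size 16,
# 100 epochs. The optimizer (Adam) is an assumption.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```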

Model performance and statistical analysis
The primary outcome measures included sensitivity, specificity, positive and negative predictive values, and accuracy. Moreover, we used receiver operating characteristic (ROC) curve analysis and the area under the ROC curve (AUROC) to measure the performance of our model in distinguishing between the categories. For each image, the trained CNN calculated the probability of each category (protruding lesions vs. normal colonic mucosa or other findings). A higher probability value translated into greater confidence in the CNN prediction. The software generated heatmaps localizing the features that predicted a class probability (Fig. 2A). The category with the highest probability score was output as the CNN's predicted classification (Fig. 2B). The output provided by the network was compared to the specialists' classification (gold standard). Additionally, the image processing performance of the network was determined by calculating the time required for the CNN to provide output for all images in the validation image dataset. Sensitivities, specificities, and positive and negative predictive values were obtained using one iteration and are presented as percentages. Statistical analysis was performed using Sci-Kit learn v0.22.2 16 .
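The ROC analysis described above can be illustrated with scikit-learn, the library named in the text. The labels and probability scores below are toy values, not study data: 1 marks a protruding lesion and the scores mimic the per-image class probabilities produced by the CNN.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy data: 1 = protruding lesion, 0 = normal mucosa / other findings.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # predicted probability of "protruding lesion"

auroc = roc_auc_score(y_true, y_score)            # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve

print(round(auroc, 2))  # 0.75
```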

Construction of the network
One hundred and twenty-four patients underwent CCE and were enrolled in this study. A total of 5715 frames were extracted, 2410 showing protruding lesions and 3305 showing normal colonic mucosa or other findings. The training dataset comprised 80% of the total image pool. The remaining 20% of frames (n = 1143) were used for testing the model. This validation dataset consisted of 482 (42.2%) images with evidence of protruding lesions and 661 (57.8%) images with normal colonic mucosa/other findings. The CNN evaluated each image and predicted a classification (protruding lesions vs. normal mucosa/other lesions), which was compared with the classification provided by the gastroenterologists. Repeated inputs of data to the CNN resulted in the improvement of its accuracy (Fig. 3).
Overall performance of the network

The confusion matrix between the trained CNN and expert classifications is shown in Table 1. Overall, the developed model had a sensitivity and specificity for the detection of protruding lesions of 90.0% and 99.1%, respectively. The positive and negative predictive values were, respectively, 98.6% and 93.2%. The overall accuracy of the network was 95.3% (Table 1). The AUROC for detection of protruding lesions was 0.99 (Fig. 4).
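The reported metrics can be reproduced from the confusion matrix. Since Table 1 itself is not reproduced here, the cell counts below are approximate reconstructions inferred from the validation-set composition (482 lesion frames, 661 normal/other frames) and the reported percentages.

```python
# Confusion-matrix cells reconstructed (approximately) from the reported
# figures: 482 lesion frames and 661 normal/other frames in the validation set.
tp, fn = 434, 48   # lesion frames classified correctly / missed
tn, fp = 655, 6    # normal/other frames classified correctly / flagged

sensitivity = tp / (tp + fn)                 # 434/482 ≈ 0.900
specificity = tn / (tn + fp)                 # 655/661 ≈ 0.991
ppv = tp / (tp + fp)                         # 434/440 ≈ 0.986
npv = tn / (tn + fn)                         # 655/703 ≈ 0.932
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 1089/1143 ≈ 0.953

print(f"{sensitivity:.1%} {specificity:.1%} {ppv:.1%} {npv:.1%} {accuracy:.1%}")
# 90.0% 99.1% 98.6% 93.2% 95.3%
```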

Discussion
The exploration of AI algorithms for automatic detection of colorectal neoplasia in conventional endoscopic techniques has produced promising results over the last decade. The development and implementation of these systems has recently been endorsed (although with reservations) by the European Society of Gastrointestinal Endoscopy 17 . Furthermore, a recent meta-analysis has suggested that the application of AI models for adenoma and polyp identification may substantially increase the adenoma detection rate and the number of adenomas detected per colonoscopy 18 . These improvements in commonly used performance metrics have been shown not to be affected by factors known to influence detection by the human eye, including the size and morphology of the lesions 18 .
In our study, we developed a deep learning tool based on a CNN architecture for automatic detection of protruding lesions in the colonic lumen using CCE images. This study has several highlights. First, our model demonstrated high levels of performance, with a sensitivity of 90.0%, a specificity of 99.1%, an accuracy of 95.3% and an AUROC of 0.99. Second, our model achieved high sensitivity and negative predictive value, which is paramount for CNN-assisted reading systems designed to lessen the probability of missing lesions while maintaining high specificity. Third, our network had a remarkable image processing performance, being capable of reading 65 images per second.
The precise role of CCE in everyday clinical practice is yet to be defined. So far, most studies have focused on its application to colorectal cancer screening. Although colonoscopy remains the undisputed gold standard, studies have suggested that CCE could be viewed as a non-invasive complement to, rather than a substitute for, conventional colonoscopy, particularly in the setting of a previous incomplete colonoscopy 19 . Current guidelines on colorectal cancer screening list CCE as a valid alternative to colonoscopy for the screening of an average-risk population 14 . Studies comparing the diagnostic yield of CCE with that of another non-invasive screening test, CT colonography, have shown the superiority of CCE 20 . Moreover, following a first positive fecal immunochemical test, CCE may reduce the need for more invasive conventional colonoscopy 21 . In addition, CCE was shown in a population-based study to have a higher uptake rate (an essential parameter in any population-based screening program) compared with conventional colonoscopy 22 . However, the use of CCE is hampered by its purely diagnostic character, the need for a rigorous bowel cleansing protocol, and the time required for reading each CCE exam.
The development of AI tools for detection of colorectal neoplasia in CCE images remains poorly explored. Automatic detection of these lesions is limited by the poor resolution of CCE images combined with the variable morphology, size and color of the lesions. To our knowledge, only two other studies have assessed the potential application of CNN models to CCE images. Yamada et al. were the first to explore the implementation of AI algorithms for the identification of colorectal neoplasia in frames extracted from CCE exams.
Their network was developed using a relatively large pool of CCE images (17,783 frames from 178 patients).
Overall, their algorithm achieved a good performance (AUROC of 0.90) 23 . However, the sensitivity of their model was modest (79%) compared to that of our network. Blanes-Vidal et al. adapted a preexisting CNN (AlexNet) and trained it for the detection of colorectal polyps. The sensitivity, specificity and accuracy of their model were 97%, 93% and 96%, respectively. In our perspective, the development of these technologies should aim to support clinical decisions rather than substitute for the role of the clinician. Therefore, these systems must remain highly sensitive in order to minimize the risk of missing lesions.
Our network demonstrated a high image processing performance (65 frames/second). To date, no comparative value exists for CCE. Nevertheless, this performance exceeds that published for CNNs applied to other CE systems 10,24 . The development of highly efficient networks may, in the near future, translate into shorter reading times, thus overcoming one of the main drawbacks of CCE. Further well-designed studies are required to assess whether a high image processing capacity in experimental settings can be reproduced as enhanced time efficiency in the reading of CCE exams compared with conventional reading. The combination of enhanced diagnostic accuracy and time efficiency may play a pivotal role in widening the indications for CCE and its acceptance as a valid screening and diagnostic tool.
This study has several limitations. First, it is a retrospective study; therefore, further prospective multicentric studies in a real-life setting are desirable to confirm the clinical value of our results. Second, although we included a large number of patients from two distinct medical centers, the number of extracted images is small. We are currently expanding our image pool in order to increase the robustness of our model. The multicentric nature of our work reinforces the validity of our results. Nevertheless, multicentric studies including larger populations are required to ensure the clinical significance of our findings.
In conclusion, we developed a highly sensitive and specific CNN-based model for the detection of protruding lesions in CCE images. We believe that the implementation of AI tools in clinical practice will be a crucial step toward wider acceptance of CCE for non-invasive screening and diagnosis of colorectal neoplasia.