Current Evidence on Computer-Aided Diagnosis of Celiac Disease: Systematic Review

Celiac disease (CD) is a chronic autoimmune disease that occurs in genetically predisposed individuals in whom the ingestion of gluten leads to damage of the small bowel. It is estimated to affect 1 in 100 people worldwide, but is severely underdiagnosed. Currently available guidelines require CD-specific serology and atrophic histology in duodenal biopsy samples for the diagnosis of adult CD. In pediatric CD, and in recent years also in adults, nonbioptic diagnostic strategies have become increasingly popular. In this setting, in order to increase the diagnostic rate of CD, endoscopy itself has been proposed as a case-finding strategy through the use of digital image processing techniques. Research on computer-aided decision support has used databases of video capsule, endoscopy and even duodenal biopsy images. Early automated methods for the diagnosis of celiac disease relied on feature extraction, using spatial domain, transform domain, scale-invariant and spatio-temporal features. More recently, artificial intelligence (AI) techniques, including deep learning (DL) methods such as convolutional neural networks (CNN), used alongside support vector machines (SVM) or Bayesian inference, have emerged as a breakthrough computer technology that can be used for computer-aided diagnosis of celiac disease. In the current review we summarize the methods used in clinical studies for the classification of CD, from feature extraction methods to AI techniques.


INTRODUCTION
Celiac disease (CD) is a systemic autoimmune disease driven by gluten ingestion in genetically susceptible individuals. At some point during their lifetime, some DQ2/DQ8-positive individuals become gluten intolerant and develop an autoimmune reaction in response to dietary gluten, leading to small bowel injury consisting of villous atrophy (VA) and crypt hyperplasia. Although it is one of the most common chronic digestive disorders, with a prevalence of 1% worldwide (Ludvigsson et al., 2016), CD is severely underdiagnosed. This is due to the frequent mislabeling of patients as having irritable bowel syndrome, lack of awareness among medical professionals of the extra-intestinal presentations of the disease (Jinga et al., 2018) and missed opportunities to screen for CD, such as in first-degree relatives, in high-risk groups and, not least, when scoping the upper gastrointestinal tract for unrelated reasons. Undiagnosed CD carries the risk of several complications (nutritional, fertility-related and even malignancy) and reduced quality of life (Fuchs et al., 2018). Although the diagnosis of adult CD is clear-cut (CD-specific serology and sampling of duodenal mucosa by upper gastrointestinal endoscopy) and access to diagnostic tools has improved considerably, CD remains heavily underdiagnosed, with only 15%-20% of patients being detected through current strategies.
In the setting of open-access endoscopy and an increasing number of examinations worldwide (including on-demand procedures), some have considered using endoscopy as an opportunity to detect unsuspected CD, by careful analysis of the small bowel mucosa and recognition of subtle markers of VA. In fact, one study has shown that up to 5% of CD patients had undergone a previous endoscopy in the years before diagnosis, which could be considered a missed opportunity to diagnose the disease earlier (Lebwohl et al., 2012). Thus, endoscopy can be viewed not only as a diagnostic tool to confirm the disease by tissue sampling, but also as a case-finding tool for CD. Some have even proposed random duodenal biopsies during all upper endoscopy examinations, but this approach has shown a low diagnostic yield for CD, a high burden for endoscopists and pathologists, and a lack of cost-effectiveness (Herrod and Lund, 2018). Interest has therefore shifted toward better selection of the patients in whom biopsy sampling should be carried out, based on the detection of markers of VA during endoscopy. However, recognition of changes in the duodenal mucosa can be challenging, especially in the setting of patchy or mild disease (Balaban et al., 2015). To overcome the subjectivity in detecting these endoscopic markers of VA and to better delineate the subtle mucosal changes seen in early CD, some have proposed computer-based processing of endoscopic images for the detection of VA, which could prompt the examiner to take biopsies for the diagnostic protocol of CD.
Endoscopy with biopsy is currently considered the gold standard for the diagnosis of adult CD. Computer-assisted systems for the diagnosis of CD could improve the whole diagnostic work-up by saving costs, time and manpower, and at the same time increase the safety of the procedure by avoiding the biopsy sampling and prolonged sedation associated with the multiple biopsy protocol. Not least, such a nonbiopsy protocol could translate into a longer life for the endoscope, by avoiding wear of the working channel of the scope. In addition, the histological staging of biopsies is subject to a significant degree of intra- and interobserver variability (Weile et al., 2000; Corazza et al., 2007; Mubarak et al., 2011; Arguelles-Grande et al., 2012). A further limitation of endoscopic biopsies for the diagnosis of CD is the possibly patchy distribution of the disease (Bonamico et al., 2004; Hopper et al., 2007): areas affected by CD can lie in the midst of normal mucosa. If biopsies happen to be taken from areas of healthy mucosa, CD could be missed because of sampling error. Therefore, observer-independent diagnostic methods such as computer-assisted diagnosis systems can be useful for improving the accuracy of diagnosis.
Since the first research on computer-assisted systems for the automated diagnosis of CD, which started in 2008 (Vécsei et al., 2008), over 50 publications on the topic using spatial domain, transform domain, scale-invariant and spatio-temporal features have appeared (Hegenbart et al., 2015). Meanwhile, artificial intelligence (AI), machine learning (ML) and deep learning (DL) have emerged as a breakthrough computer technology in this world of big data and computational power based on graphics processing units. In the field of medical imaging, the accumulation of enormous numbers of digital images and medical records has driven a need for AI to deal efficiently with these data, which have also become fundamental resources from which the machine learns by itself. ML and AI techniques have played an important role in the medical field, supporting such activities as medical image processing, computer-aided diagnosis, image interpretation, image fusion, image registration, image segmentation, image-guided therapy, and image retrieval and analysis (Razzak et al., 2018; Yang and Bang, 2019). In this work, we aim to give a comprehensive overview of the research on computer-assisted diagnosis of CD, from classical feature extraction to AI. Several image-processing techniques have been reported so far in the literature, with good diagnostic performance in discriminating CD patients from healthy controls. These image-processing techniques could be applied in real time, during endoscopy, to select patients with a high probability of CD who warrant a full diagnostic work-up including small bowel biopsies. The purpose of our review is to summarize the current evidence on computerized methods for detecting CD, according to their diagnostic accuracy.

METHODS
We conducted the present research according to the principles of the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) (Moher et al., 2015).
A systematic search of the literature was carried out in September 2019 in the PubMed (Medline) database, using the following search criteria: CD (MeSH) and terms referring to computer-aided detection by image processing (computer, digital, image processing, AI, DL, neural network, quantitative assessment or texture features). No restrictions were set on the search with regard to article type, text availability or publication date.
Publications retrieved through this search were assessed for consistency with the topic, according to their title and abstract, by the two first authors. Conflicts arising from independent data extraction according to the inclusion and exclusion criteria were resolved by consensus.
Studies included in our systematic review were required to meet the following inclusion criteria: (i) full-text paper available in English, (ii) original paper describing image-processing techniques for computer-aided diagnosis of CD. We excluded case reports, reviews and descriptive papers that did not validate the described methods on CD patients. We also excluded papers on digital processing of histology images for diagnosing CD. References from the retrieved articles were also checked for a possible match with the review topic, in order to identify potentially relevant publications that could have been missed in the initial search.
For each study included in the systematic review we recorded the following data: first author, year of publication, type of endoscopic image used, study population (CD cases and controls), method tested and diagnostic performance (sensitivity, specificity, diagnostic accuracy).
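As a point of reference for the performance measures recorded above, the following minimal sketch computes sensitivity, specificity and accuracy from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from any of the included studies.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: CD cases classified as CD; fn: CD cases classified as control;
    tn: controls classified as control; fp: controls classified as CD.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only.
sens, spec, acc = diagnostic_metrics(tp=45, fn=5, tn=90, fp=10)
print(f"sens={sens:.2f} spec={spec:.2f} acc={acc:.2f}")
```

Because accuracy depends on the ratio of CD to control samples, sensitivity and specificity are usually easier to compare across databases of different composition.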

RESULTS
The process of study selection for this systematic review is summarized in Figure 1. The search yielded 174 results from 1970 onward, which were evaluated according to the methodology described above. Another six papers were found through other sources. A total of 139 papers were excluded because of irrelevance to the topic (confounding use of search words in the papers) or type of article (review/editorial), and the remaining 41 publications, consisting of original work describing image-processing techniques for computer-aided diagnosis of CD, were analyzed for this review.
Among the published papers, several techniques have been validated for the diagnosis of CD. The first attempts focused on feature extraction methods used for classification, with features falling into four main categories: spatial domain features, transform domain features, scale-invariant features and spatio-temporal features. More recently, AI techniques, in particular DL methods such as convolutional neural networks (CNN), have emerged as a breakthrough computer technology that can be used for computer-aided diagnosis of CD. An overview of the common methods is presented, together with an evaluation of their use in CD diagnosis. Table 1 summarizes the feature extraction methods used in clinical studies for the classification of CD and the overall classification rates (OCR) [sensitivity (sens), specificity (spec) and accuracy (acc)]. Images used for feature extraction are obtained either by standard endoscopy or by video capsule. While some of the studies reported the number of subjects analyzed (healthy and with CD), others reported the number of full images and the number of subimages, obtained as patches from the full images, that were used for training and testing.

Feature Extraction Methods
The spatial domain features that have been used for CD classification are edge shapes, shape curvature histograms (SCH), the gray-level co-occurrence matrix (GLCM), the edge orientation histogram (EOH), local binary patterns (LBP) and extended LBP (ELBP), local ternary patterns (LTP) and extended LTP (ELTP), dynamic estimates of wall motility (Bassotti et al., 1994; Tursi, 2004) (standard deviation), periodicity in brightness, histogram mean level, shape-from-shading, pooling protocol, and statistical and syntactical measurements. Ciaccio et al. proposed several quantitative measurements (statistical and syntactical measurements, motility estimation) on video capsule endoscopy images in order to detect and measure the presence of VA in CD patients (Ciaccio et al., 2016a; Ciaccio et al., 2016b). Ciaccio et al. (2017b) also described and discussed methods for the quantitative detection and analysis of VA in the small intestinal mucosa of CD patients using video capsule endoscopy images, but these remain to be validated in larger samples.
The best results with spatial domain features were obtained by Hegenbart et al. (2011b), who extracted various LBP from standard endoscopic images. In a database consisting of 999 image patches (587 control and 412 CD), overall classification rates varied between 98.04% and 98.93%.
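To illustrate the kind of descriptor involved, the sketch below computes a histogram of basic 3x3 LBP codes for a grayscale patch. It is a minimal illustration of the basic LBP operator only, not the specific LBP variants or parameters used by Hegenbart et al.; the function name and the example patch are hypothetical.

```python
def lbp_histogram(image):
    """Return the 256-bin histogram of basic 3x3 LBP codes of a grayscale image.

    image: list of equal-length rows of integer gray levels.
    Border pixels are skipped because they lack a full 3x3 neighbourhood.
    """
    # Neighbour offsets (row, col), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets one bit.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist


# Hypothetical 4x4 patch (a vertical dark-to-bright edge), illustration only.
patch = [[10, 10, 200, 200],
         [10, 10, 200, 200],
         [10, 10, 200, 200],
         [10, 10, 200, 200]]
hist = lbp_histogram(patch)
```

The resulting histogram is the feature vector that would be passed to a classifier; texture differences between patches show up as differences in which codes dominate.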
For transform domain features, the best results were obtained by Vécsei et al. (2009) using FFT-evolved multiple ring-shape filters applied to standard endoscopic images. The database consisted of 390 image patches (312 control and 79 CD). The algorithm achieved 97% accuracy, 83% sensitivity and 99% specificity in diagnosing CD.
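The general idea of ring-shaped filtering in the Fourier domain can be sketched as follows: the magnitude spectrum of a patch is summed inside concentric frequency rings, giving one feature per ring. This sketch assumes fixed, equally spaced rings and a naive transform suitable only for small patches; it does not reproduce the evolved filters of Vécsei et al., and all function names are hypothetical.

```python
import cmath
import math


def dft2_magnitude(image):
    """Naive 2D discrete Fourier transform magnitude (fine for small patches)."""
    n_rows, n_cols = len(image), len(image[0])
    mag = [[0.0] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            total = 0j
            for r in range(n_rows):
                for c in range(n_cols):
                    angle = -2 * math.pi * (u * r / n_rows + v * c / n_cols)
                    total += image[r][c] * cmath.exp(1j * angle)
            mag[u][v] = abs(total)
    return mag


def ring_energies(image, n_rings):
    """Sum spectral magnitude inside n_rings concentric frequency rings."""
    n_rows, n_cols = len(image), len(image[0])
    mag = dft2_magnitude(image)
    max_radius = math.hypot(n_rows // 2, n_cols // 2)
    energies = [0.0] * n_rings
    for u in range(n_rows):
        for v in range(n_cols):
            # Map indices to signed frequencies so radius 0 is the DC term.
            fu = u if u <= n_rows // 2 else u - n_rows
            fv = v if v <= n_cols // 2 else v - n_cols
            radius = math.hypot(fu, fv)
            ring = min(int(radius / max_radius * n_rings), n_rings - 1)
            energies[ring] += mag[u][v]
    return energies


# A uniform patch has only a DC component, so its energy falls in ring 0.
flat = [[1] * 4 for _ in range(4)]
energies = ring_energies(flat, n_rings=2)
```

In practice a fast Fourier transform would replace the naive transform, and the ring boundaries would be tuned rather than equally spaced.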
The best results for scale-invariant features were obtained by Gadermayr et al. (2016a) using multifractal spectrum features and SIFT Fisher vectors extracted from standard endoscopic images. The database consisted of 676 control image patches and 479 CD image patches from 290 patients (all children). The performance of the method was 98.1% for SIFT Fisher vectors and 96.8% for multifractal spectrum features. When human knowledge was incorporated, performance increased to 98.9%.
The best results for spatio-temporal features were obtained by Ciaccio et al. (2012b) using a dynamic estimate of wall motility (standard deviation) computed on video capsule endoscopic images. The database consisted of 200 frames per patient extracted from 10 control patients and 11 CD patients. Diagnostic performance was high, with 98.2% sensitivity and 96% specificity. Gadermayr et al. compared different endoscopic image configurations [white-light imaging (WLI) (Gasbarrini et al., 2003) and narrow-band imaging (NBI) (Emura et al., 2008; Valitutti et al., 2014)] to find which image data are most accurate for computer-aided CD diagnosis, and the same authors showed (Gadermayr et al., 2016a; Gadermayr et al., 2016b) that a hybrid system incorporating expert knowledge into automated CD diagnosis increased accuracy by 6% (see Table 1).
Spatial domain features are based on the similarity of specific features that are also observed by a human analyst; they are robust and fast to compute, making them suitable for real-time classification. Transform domain features have the advantage of analyzing endoscopic images at multiple scales and orientations; common tools are based on the wavelet, Fourier and Gabor transforms. Scale-invariant features are suitable for analyzing characteristics that are affected by scaling, but are more computationally demanding than the previous features. Spatio-temporal features can capture characteristics that are not directly visible in individual acquired images. In terms of computation, the more complex the feature extraction algorithm, the better suited it is to offline analysis. All the extracted features presented here are then used as inputs to various classifiers, such as support vector machines (SVM), k-nearest neighbors (kNN), Bayes classifiers and random forests. All these classifiers are standard pattern recognition techniques.
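As an example of how extracted features feed a standard classifier, the following sketch implements a plain kNN vote over feature vectors. The feature values, labels and function name are hypothetical and serve only to show the mechanism; the reviewed studies used their own features, distance measures and parameters.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training vectors."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors, for illustration only.
features = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
            (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
labels = ["control", "control", "control", "CD", "CD", "CD"]
prediction = knn_predict(features, labels, query=(0.82, 0.88), k=3)
print(prediction)
```

An odd value of k avoids tied votes in a two-class problem such as CD versus control.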

AI Techniques
AI is the field of computer science that aims to create intelligent machines capable of learning and understanding complex concepts. As a branch of AI, ML deals with intelligent machines that learn by themselves from available data. DL, in turn, refers to a family of ML methods that use neural networks for learning (see Figure 2). These artificial neural networks (ANN) are biologically inspired computing systems that allow computers to learn.
In the field of medical image processing, there are several applications that use AI. Yang and Bang's review (Yang and Bang, 2019) of the applications of AI in gastroenterology, which summarizes clinical studies using AI in the upper and lower gastrointestinal tract, contains a single mention of CD: it presents only the study by Zhou et al., which achieved a sensitivity and specificity of 100%. Seguí et al. (2016) presented a generic feature descriptor for the classification of video capsule endoscopy images. To build the system, they created a large database containing only color images, designed a CNN architecture and performed an exhaustive validation of the proposed method, achieving 96% accuracy. Gadermayr et al. (2018) investigated the capability of state-of-the-art neural network approaches for the diagnosis of CD and proposed pipelines for fully automated patient-wise diagnosis as well as for integrating expert knowledge into the automated decision process.
The availability of big data and computational power has led to the use of AI in medical applications on a large scale. Table 2 summarizes the AI techniques used in clinical studies for the classification of CD. Common neural networks used in studies of CD diagnosis are AlexNet (Wimmer et al., 2016b; Yang and Bang, 2019), GoogLeNet (Zhou et al., 2017), the VGGf net (Wimmer et al., 2016b; Gadermayr et al., 2017; Wimmer et al., 2017; Wimmer et al., 2018), and the VGG16 net (Wimmer et al., 2016b; Wimmer et al., 2018). Only a single study used video capsule images (Zhou et al., 2017), while all the others used standard endoscopic images. Compared with the feature extraction studies, the databases used in AI studies are based on a much larger number of patients [e.g., 353 patients in (Wimmer et al., 2016b)].
The best results on video capsule images were obtained by Zhou et al. using GoogLeNet (Zhou et al., 2017). Although the database was small in terms of the number of patients, 400 images were used for training, leading to 100% specificity and 100% sensitivity. The best results on standard endoscopy images were obtained by Wimmer et al. using a CNN combined with SVM and principal component analysis (PCA) (Wimmer et al., 2016a). They obtained a correct classification rate of 97% on 1661 image patches (986 control and 675 CD).

DISCUSSION AND LIMITATIONS
Even though duodenal biopsy is considered the gold standard for the diagnosis of CD, advanced endoscopic techniques such as chromoendoscopy and water immersion have been investigated as enhanced tools to detect VA. The most notable techniques include the modified immersion technique (MIT) (Gasbarrini et al., 2003) under traditional white-light illumination (denoted as WL MIT), as well as MIT under narrow-band imaging (Emura et al., 2008; Valitutti et al., 2014) (denoted as NBI MIT). These endoscopic techniques were specifically designed to improve the visual confirmation of CD during endoscopy. Other studies have proposed the processing of video capsule images for detecting CD (Ciaccio et al., 2010a; Ciaccio et al., 2010b; Ciaccio et al., 2012a; Ciaccio et al., 2012b; Ciaccio et al., 2013b; Ciaccio et al., 2014b; Ciaccio et al., 2017a). Although video capsule endoscopy is considered a noninvasive technique, its use is relatively limited because of its high cost and the low resolution of the images (de Bruaene et al., 2015). One of the most important issues encountered in endoscopic image analysis is image degradation, such as noise, reflections, blurring and scaling, caused by weak illumination and downsized sensors. Some papers have proposed methods for reducing these degradations (Hegenbart et al., 2011a; Hegenbart et al., 2011b; Gadermayr et al., 2014a).
FIGURE 2 | Machine learning as a branch of artificial intelligence.
Database construction is a critical subject in the AI-based classification of CD using endoscopic imagery. When large databases are not available, one can use data augmentation to artificially increase the number of samples in the database.
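A common and simple form of data augmentation is to generate rotated and mirrored copies of each image patch. The sketch below shows this under the assumption that patch orientation carries no diagnostic meaning, which would need to be checked for a given imaging setup; the function names are hypothetical and the approach is not attributed to any particular study in this review.

```python
def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in image]


def augment(image):
    """Return the 8 rotated and mirrored variants of an image patch."""
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# Hypothetical 2x2 patch, for illustration only.
patch = [[1, 2],
         [3, 4]]
variants = augment(patch)
```

Augmented copies of the same patch are not independent samples, so they should be kept on the same side of any training and testing split.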
A major limitation of using automated processing of duodenal images captured during endoscopy for diagnosing CD is the wide differential diagnosis of VA. Several diseases other than CD can manifest as VA in the small bowel: giardiasis, Helicobacter pylori infection, Whipple's disease, tropical sprue, collagenous sprue, eosinophilic gastroenteritis, common variable immunodeficiency, intestinal lymphoma, Crohn's disease, HIV-enteropathy, or drug-induced enteropathy (Jinga et al., 2017; Schiepatti et al., 2019). Upon detection of VA by digital processing of endoscopy images, CD-specific serology will discriminate CD from other causes of seronegative VA.
Another limitation of computerized image-processing methods is the presence of artifacts caused by air bubbles, residues or secretions in the duodenum. This could be an issue for the selection of the image frames that are processed for the assessment of VA. Not least, the selection of point of interest (POI) regions from the image captured during endoscopy is yet to be automated; because it was done manually during image processing, it could be considered a source of selection bias in current studies. A special concern should also be raised for cases with mild enteropathy, in which changes in the duodenum are subtle, and for those with patchy disease, in which the selected POI could be nonrepresentative.
Another limitation of the current review is the heterogeneity of studies with respect to the cases analyzed, as some of the studies report the number of patients included, while others report the number of images processed.

CONCLUSION
In recent decades, there has been a growing interest in image-processing techniques for the detection of VA. Computer-aided diagnosis of CD by processing images of the small bowel captured during endoscopy is feasible and warrants further development for integration into endoscopy consoles.

AUTHOR CONTRIBUTIONS
AM and DB discussed the research idea. C-CM and MJ proposed the research design. AM and DB performed the literature search and selection of articles. AM and DB drafted the first manuscript. MJ and C-CM critically reviewed the manuscript, acting as guarantors of the paper. All authors approved the final version of the manuscript.

FUNDING
This paper was financially supported by "Carol Davila" University of Medicine and Pharmacy through Contract no. 23PFE/17.10.2018 funded by the Ministry of Research and Innovation within PNCDI III, Program 1 - Development of the National RD system, Subprogram 1.2 - Institutional Performance - RDI excellence funding projects.