Analysis of Endoscopic Evaluation Reliability for Ulcerative Colitis in Histological Remission

The Mayo endoscopic subscore (MES) is a major endoscopic scoring system used to grade mucosal inflammation and disease activity in patients with ulcerative colitis (UC). Using interobserver reliability (IOR), this study clarified which MES parameters are difficult for endoscopic observers to evaluate in UC in histological remission. First, 42 endoscopists in four observer groups evaluated each MES parameter on endoscopic images of 100 cases graded 0 or 1 on the Nancy histological index of histopathological inflammation. Then, IOR was assessed using multiple κ statistics for each MES finding. The IOR among all the observers was “slight” or “fair” for all the parameters, indicating low agreement. The group of experts in UC practice had “moderate” or higher IOR for seven of the nine parameters, whereas the trainee group had “slight” or “fair” IOR for all parameters. When the IOR for each MES parameter was calculated separately for the observer groups, all the groups showed “slight” or “fair” IOR for “Erythema” and “Decreased vascular pattern”. Large differences between endoscopists were thus found in the IOR for the MES parameters in UC in histological remission. Even among UC practice experts, the IOR was low for “Erythema” and “Decreased vascular pattern”.


Introduction
During the management of ulcerative colitis (UC), life events such as school attendance, employment, marriage, pregnancy, and childbirth become possible when long-term maintenance of remission is achieved [1,2]. Endoscopic remission (ER) is important as a short-term therapeutic goal leading to the achievement of long-term therapeutic goals. The Treat to Target strategy, through which treatment is organized to achieve ER, has been proposed for this purpose [3,4].
Endoscopic evaluation is crucially important for UC management and treatment [1]. Several scores have been used to characterize and quantify the endoscopic findings for UC [5][6][7]. Among them, the Mayo endoscopic subscore (MES) presented by Schroeder et al. in 1987 [8] is a major endoscopic scoring system for evaluating the status of mucosal inflammation and disease activity, and it remains the most commonly used endoscopic evaluation scale [9][10][11]. As an index of endoscopic activity based only on endoscopic mucosal findings, the MES is used frequently. Nevertheless, it is a subjective evaluation; observers commonly evaluate the same endoscopic image differently. The objectivity of evaluation has been investigated using various methods. Diagnostic criteria for endoscopy are regarded as more reliable, and are reported to be used more commonly, when the interobserver reliability (IOR) among endoscopists is higher. Travis et al. reported that the components of the ulcerative colitis endoscopic index of severity (UCEIS) show satisfactory intra- and inter-investigator reliability [12]. A systematic review indicated that the sigmoidoscopic component of the MES and the UCEIS showed the most promise as reliable evaluative instruments of endoscopic disease activity [13]. As described above, different studies have positively evaluated the validity and reliability of the endoscopic criteria that are commonly used for UC today. However, in several cases, ER was observed without full achievement of histological remission [14][15][16]. Some reports describe that histological activity and endoscopic activity are correlated [17,18], but patients who have achieved histological remission present endoscopic findings with low activity that must still be assessed. For this reason, a high IOR is necessary for such endoscopic findings to be assessed reliably.
However, no report has described the IOR of each MES parameter in patients who had achieved histological remission.
This study was designed to evaluate the IOR among endoscopists for each MES item used for the endoscopic evaluation of UC cases that achieved histological remission, and thereby to clarify which MES items are difficult to assess.

Study Design and Ethics
This study was approved by the ethics committee of Dokkyo Medical University Hospital (approval no. R-36-7J), conducted in accordance with the ethical principles stipulated in the Declaration of Helsinki, and registered with the University Hospital Medical Network Clinical Trials Registry (R000051904). Regarding the use of endoscopic photographs of patients, individual informed consent was not obtained; instead, a means to opt out was provided. Research information was published on our website so that research participants were notified and were guaranteed an opportunity to decline.

Collection of Endoscopic Images and Histological Evaluation
From the medical chart database, 353 patients treated for UC at the Department of Gastroenterology of Dokkyo Medical University Hospital from 1 January 2018 to 31 December 2019 were identified. Patients were eligible if they had maintained clinical remission for at least 1 year (clinical remission was defined as a partial Mayo score of 3 points or lower [8]; 126 cases in which remission was maintained for less than 1 year were excluded) and had been judged as Grade 0 or 1 on the Nancy histological index [19] of histopathological inflammation at periodic endoscopy (98 cases with Grade 2, 3, or 4 were excluded). Colonoscopic images taken at the time of pathological diagnosis were collected for these patients. Pathological examinations were performed by two pathologists who specialized in the pathology of gastrointestinal diseases, and the degree of inflammation was assessed using the Nancy histological index based on the agreement of those two pathologists. From the remaining cases, 29 cases judged by the principal investigator as having a poor-quality endoscopic image and an ambiguous pathological diagnosis were excluded. Therefore, 100 patients were selected for this study (Figure 1). For each of those 100 cases, MK selected the clearest endoscopic image of the site where the histopathological biopsy was taken. The 100 images were presented to the observer endoscopists without patient information, so that the observers had no access to information on histological activity that could bias the endoscopic evaluation. MK was not included among the observers in this study.
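The exclusion arithmetic in the paragraph above can be checked with a short sketch. The Python below is illustrative only: the function name `selection_flow` and the step labels are hypothetical, and the counts are those stated in the text.

```python
from typing import List, Tuple


def selection_flow(start: int, exclusions: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return the number of cases remaining after each exclusion step."""
    remaining = start
    flow = []
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        if remaining < 0:
            raise ValueError(f"more cases excluded than available at step: {reason}")
        flow.append((reason, remaining))
    return flow


# Counts taken from the text; the labels are shorthand, not the authors' wording.
steps = [
    ("remission maintained for less than 1 year", 126),
    ("Nancy histological index Grade 2, 3, or 4", 98),
    ("poor image quality and ambiguous pathology", 29),
]

flow = selection_flow(353, steps)
for reason, remaining in flow:
    print(f"after excluding '{reason}': {remaining} cases remain")
```

The last step leaves the 100 cases used in the study.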

Observers for IOR Evaluation
The images described above were shared with 42 endoscopists from our department, including trainers and trainees, who evaluated the images for MES classification. These 42 endoscopists were classified into the following four groups based on the number of cases experienced, years of experience, and expertise: group A endoscopists examined at least 200 IBD patients per year and were certified as Board Certified Fellows of the Japan Gastroenterological Endoscopy Society (experts; 5 persons); group B endoscopists were not specialized in IBD treatment but were certified as Board Certified Fellows of the Japan Gastroenterological Endoscopy Society (14 persons); group C endoscopists had at least six years of clinical experience as gastroenterologists but were not certified as Board Certified Fellows of the Japan Gastroenterological Endoscopy Society (16 persons); and group D endoscopists were trainees with fewer than six years of clinical experience as gastroenterologists (7 persons).

Method of Presenting Endoscopic Findings
Endoscopic findings lead to the assignment of an MES [8,20] as MES 0 (normal, inactive disease), MES 1 (erythema, decreased vascular pattern, mild friability), MES 2 (marked erythema, absent vascular pattern, friability, erosions), or MES 3 (spontaneous bleeding, ulceration). Among these findings, mild friability and friability must be evaluated in real time during endoscopy. They were excluded from the selected parameters because evaluating them in a single presented image was expected to be too difficult. The one hundred endoscopic images selected by MK were presented to the observers, who evaluated the presence or absence of nine endoscopic findings: normal (a), inactive disease (b), erythema (c), decreased vascular pattern (d), marked erythema (e), absent vascular pattern (f), erosions (g), spontaneous bleeding (h), and ulceration (i) (Figure 2). The evaluators were not informed that the cases had a Nancy histological index of 0 or 1.
For this study, IOR analysis was conducted for each endoscopic finding (multiple κ statistics).
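The text does not state which multi-rater κ statistic was used. As a minimal sketch, assuming Fleiss' κ applied to one binary finding (present or absent) rated by the same number of observers on every image, the computation could look as follows. The function name `fleiss_kappa` and the toy ratings are illustrative and are not the study's data or code.

```python
import math
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for one binary finding (assumed statistic, see lead-in).

    ratings[i][r] is 1 if rater r judged the finding present on image i,
    and 0 if absent. Every image must be rated by the same number of raters.
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("at least one image is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each image needs the same number (>= 2) of raters")

    # Per-image counts of "present" and "absent" votes.
    present = [sum(row) for row in ratings]
    absent = [n_raters - p for p in present]

    # Observed agreement: mean share of agreeing rater pairs per image.
    pairs = n_raters * (n_raters - 1)
    p_obs = sum(
        (p * (p - 1) + a * (a - 1)) / pairs for p, a in zip(present, absent)
    ) / n_items

    # Chance agreement from the overall share of each category.
    share_present = sum(present) / (n_items * n_raters)
    p_exp = share_present ** 2 + (1 - share_present) ** 2

    if math.isclose(p_exp, 1.0):
        # Only one category was ever used, so kappa is undefined (0/0).
        return float("nan")
    return (p_obs - p_exp) / (1 - p_exp)


# Toy example: 4 images, 3 raters (hypothetical values).
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
kappa = fleiss_kappa(toy)
print(round(kappa, 3))
```

In this sketch each of the nine findings would be scored separately, with one 100 × 42 matrix of ratings per finding for the all-observer analysis and one matrix per group for the group-wise analysis.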

Outcomes
The primary outcome of the present study was the IOR among all observers for the MES parameters used for the endoscopic evaluation of UC. The secondary outcome was a comparison of the IOR for the MES parameters among the four observer groups.

IOR among All Observers for MES Parameters
The IOR values for the MES parameters were calculated for all observers (42 persons) (Table 1). The interobserver κ coefficients for the respective endoscopic features of UC were 0.402 ± 0.003 for normal, 0.389 ± 0.003 for inactive disease, 0.235 ± 0.003 for erythema, 0.215 ± 0.003 for decreased vascular pattern, 0.351 ± 0.003 for marked erythema, 0.399 ± 0.003 for absent vascular pattern, 0.354 ± 0.003 for erosions, 0.1 ± 0.003 for spontaneous bleeding, and 0.212 ± 0.003 for ulceration. Only spontaneous bleeding was evaluated as "slight"; the other parameters were evaluated as "fair".
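The labels "slight", "fair", "moderate", "substantial", and "almost perfect" match the agreement bands of Landis and Koch. The sketch below maps the κ values reported above onto those bands. Two points are assumptions rather than statements from the text: that these bands are the ones the authors applied, and that κ is rounded to two decimals before banding, which is how 0.402 falls in "fair" here. The names `BANDS`, `landis_koch`, and `reported` are illustrative.

```python
# Landis and Koch upper bounds for each agreement band (assumed scale).
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def landis_koch(kappa: float) -> str:
    """Return the agreement band for a kappa value rounded to two decimals."""
    k = round(kappa, 2)  # assumption: banding is applied to the rounded value
    if k < 0:
        return "poor"
    for upper, label in BANDS:
        if k <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Kappa coefficients for all 42 observers, as reported in the text.
reported = {
    "normal": 0.402,
    "inactive disease": 0.389,
    "erythema": 0.235,
    "decreased vascular pattern": 0.215,
    "marked erythema": 0.351,
    "absent vascular pattern": 0.399,
    "erosions": 0.354,
    "spontaneous bleeding": 0.1,
    "ulceration": 0.212,
}

labels = {name: landis_koch(k) for name, k in reported.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Under these assumptions, only spontaneous bleeding is labelled "slight" and the other eight parameters are labelled "fair", in line with the text.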

Comparison of IORs among Observer Groups
The IOR values for the parameters constituting the MES were compared among the four observer groups (Table 2). The κ coefficients of the four observer groups differed. In Group A, the κ coefficient was "moderate" or higher for seven of the nine parameters. In Group B and Group C, the κ coefficients were "moderate" or higher for two of the nine and four of the nine parameters, respectively. In Group D, they were "slight" or "fair" for all the parameters.
Table 2. Interobserver reliability of MES parameters for observer groups.

IORs of MES Parameters by Observer Group
The IORs of the MES parameters were calculated for the respective observer groups (Table 2). Group D was excluded from this comparison because all of its parameters were evaluated as "slight" or "fair".
For "Normal", the κ coefficient was "moderate" in Groups A and C, whereas it was "fair" in Group B. For "Inactive disease", the κ coefficient was "moderate" in Groups A, B, and C. The κ coefficients were low for "Erythema" and "Decreased vascular pattern": they were "fair" or "slight" in all groups. For "Marked erythema", it was "moderate" only in Group A and was "fair" in Groups B and C. For "Absent vascular pattern", it was "moderate" in Groups A and C, but it was "fair" in Group B. For "Erosion", the κ coefficient ranged from "moderate" to "substantial" in Groups A, B, and C. For "Spontaneous bleeding", it was "almost perfect" in Group A, but the result was as low as "slight" in Groups B and C. For "Ulceration", it was "moderate" in Group A, but the result was as low as "fair" or "slight" in Groups B and C.

Meaning of MES
Lower gastrointestinal endoscopy in UC is an important tool for making a diagnosis, elucidating clinical conditions, evaluating treatment, and detecting and monitoring cancer. Endoscopic observations of inflammation in UC are scored using an objective indicator. In actual clinical situations, the Baron index [6,23] and the Matts classification [24] are used. In recent large-scale clinical studies, however, the MES has been used more often [9][10][11].

Difficulties of Endoscopic Diagnosis and IOR in Image Diagnosis
Although the evaluation of the endoscopic findings using MES is important for the treatment selection and follow-up after treatment of UC, difficulty persists in the endoscopic diagnosis of UC: the interobserver agreement rate is unstable. Daperno et al. [25] analyzed the MES agreement rates reported by IBD experts and by IBD non-experts, and found poor results. The respective kappas of the IBD expert group and the IBD non-expert group were 0.53 and 0.71. One report also described that the perfect agreement rate of judgment as MES 0 or MES 1 was 68.2%, even among three endoscopists specializing in IBD [26].
A decrease in MES by at least 1 point is often regarded as an endoscopic improvement; MES 0 or MES 1 is often regarded as signifying endoscopic remission [27,28]. However, recent studies have reported that the relapse rate and the surgery rate are lower for MES 0 than for MES 1 [29][30][31]. In particular, the remission maintenance rate among MES 1 cases was shown to differ according to histological evaluation [15,32]. These problems might result from confusing and complicated parameters for endoscopic evaluation. Therefore, it is particularly important to improve the accuracy of judgment on endoscopic findings in the remission phase. Reportedly, endoscopic activity is correlated with histological activity [17,18], although long-term studies have shown lower hospitalization rates and corticosteroid use rates in histological remission than in endoscopic remission [16]. The period of remission maintenance is extended considerably in cases that have reached histological remission [33]. These findings suggest that histological remission can be a better indicator of remission maintenance than endoscopic remission. One reason for this might be the reliability of the endoscopic findings, i.e., IOR. Patients who have achieved histological remission often show endoscopic findings with low activity, and such low-activity findings might lead to low IOR. In light of that possibility, we investigated the rate of agreement among endoscopic observers on endoscopic findings for UC patients in the histological remission phase. The results could indicate the reliability of evaluations made by endoscopists based on endoscopic data and images.

Significance of Study Results
The results of this study show that the IOR among all the observers was "slight" for "Spontaneous bleeding" and "fair" for the other parameters. This low overall IOR arose because the IOR in Groups B, C, and D was lower than in Group A. Although Group B comprised endoscopists certified as Board Certified Fellows of the Japan Gastroenterological Endoscopy Society, most of them sub-specialized in pancreatobiliary work and did not usually engage in IBD treatment, which might have affected the results. Group C members had no established sub-specialty and therefore engaged in diverse treatments; the small number of cases they experienced might have affected their IOR. Group D consisted of trainees with fewer than six years of clinical experience, and their lower κ coefficients probably reflect their relative lack of experience in endoscopy and the smaller number of cases they had experienced.
Regarding "Spontaneous bleeding", Group A, comprising IBD specialists, had a κ coefficient of 1, whereas the other three groups had a low IOR of "slight". The images presented for this study were endoscopic images of cases in histological remission, in which "Spontaneous bleeding" was not observed. However, observers other than the IBD experts tended to overestimate the findings and to interpret them as showing "Spontaneous bleeding".
By contrast, the parameters "Erythema" and "Decreased vascular pattern" elicited a low IOR of "fair" or "slight", signifying disagreement not only among IBD non-experts but also among IBD experts. These parameters are paired with similarly worded ones: "Erythema" with "Marked erythema", and "Decreased vascular pattern" with "Absent vascular pattern". Moreover, these findings are often observed simultaneously. This mode of expression probably led to the low IOR, which makes these items inappropriate as evaluation parameters. The overall IOR of the MES is likely to improve if these parameters are replaced with more objective ones. Preparing several new expressions and findings, evaluating their resultant IOR, and choosing those that lead to better IOR as evaluation parameters for a modified MES is expected to be effective. Constructing more common and universal diagnostic parameters is desirable not only for IBD experts but for all practitioners, because any endoscopist might be involved in the endoscopic evaluation of IBD in actual clinical situations.

Limitations
There are some limitations to this study. First, the images used for evaluation were not videos but one static image per case. Therefore, mild friability and friability, which are evaluated in real time during endoscopy, could not be included. Second, the study was a single-center study.

Conclusions
Large differences were found among endoscopists in the IOR of the MES parameters used for endoscopic evaluation of UC in the histological remission phase. The results indicate that IOR was low for the parameters "Erythema" and "Decreased vascular pattern", even among experts in UC practice. These MES parameters may therefore be inappropriate as evaluation parameters for endoscopic findings. Future analyses of the IOR of the UCEIS could support development in this field.

Informed Consent Statement:
Individual informed consent was not obtained; instead, a means to opt out was provided. Research information was published on our website so that research subjects were notified and were guaranteed the opportunity to decline.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.