Robust layer segmentation of esophageal OCT images based on graph search using edge-enhanced weights

Automatic segmentation of esophageal layers in OCT images is crucial for studying esophageal diseases and computer-assisted diagnosis. This work aims to improve the current techniques to increase the accuracy and robustness for esophageal OCT image segmentation. A two-step edge-enhanced graph search (EEGS) framework is proposed in this study. Firstly, a preprocessing scheme is applied to suppress speckle noise and remove the disturbance in the esophageal structure. Secondly, the image is formulated into a graph and layer boundaries are located by graph search. In this process, we propose an edge-enhanced weight matrix for the graph by combining the vertical gradients with a Canny edge map. Experiments on esophageal OCT images from guinea pigs demonstrate that the EEGS framework is more robust and more accurate than the current segmentation method. It can be potentially useful for the early detection of esophageal diseases.


Introduction
Optical Coherence Tomography (OCT), which was first demonstrated by the MIT group in 1991 [1], is a powerful medical imaging technique. It can generate high-resolution, non-invasive, 3D images of biological tissues in real time. Initial applications of OCT were mainly in ophthalmology, where the microstructures revealed by OCT facilitated retinal disease diagnosis [2][3][4]. Endoscopic OCT is an important and rapidly growing branch of OCT technology [5]. Combined with flexible fiber-optic endoscopes, OCT is able to image internal luminal organs of the human body with minimal invasiveness. It has been shown that gastrointestinal endoscopic OCT can visualize multiple esophageal tissue layers and pathological changes in a variety of esophageal diseases, such as eosinophilic esophagitis (EoE), Barrett's esophagus (BE) and even esophageal cancer [6][7][8].
Recently, the development of ultrahigh-resolution gastrointestinal endoscopic OCT has enabled imaging of the esophagus with much finer detail and improved contrast [9,10]. Many esophageal diseases manifest as changes in the tissue microstructure, such as changes in the esophageal layer thickness or disruption of the layers. Accurate quantification of the esophageal layered structures from gastrointestinal endoscopic OCT images is potentially very valuable for objective diagnosis of these diseases and assessment of their severity, as well as for the exploration of structure-based biomarkers associated with disease progression [5,11]. For instance, the OCT image of BE has an irregular mucosal surface and may show an absence of the layered architecture [12]; the OCT image of EoE is characterized by increased basal zone thickness in the esophagus [11]. These disease features can be readily detected provided that the esophageal OCT images are accurately segmented.
Traditional manual segmentation is time-consuming and subjective. As a result, computer-aided automatic layer segmentation methods are urgently needed. In the past few years, research on OCT image segmentation has mostly targeted retinal OCT images, and various algorithms have been published [13][14][15][16]. Representative methods can be grouped into the following four categories: A-scan based methods [2,3], active contour based methods [4,17,18,19], machine learning based methods [20,21] and graph based methods [22]. Among these, the graph based method is the most widely used in layer segmentation and has proven quite successful [13,22,23]. Representative frameworks are graph theory and dynamic programming (GTDP) [13] and 3-D graph based segmentation [22]. It is worth mentioning that newly developed deep learning algorithms have also been applied to retinal layer segmentation with great success [24][25][26][27]. Studies on the segmentation of endoscopic OCT images are not as extensive as those on macular images. Representative studies address cardiovascular [28][29][30] and esophageal OCT images [31][32][33][34][35][36]. As reported, the graph based method is also effective in segmenting cardiovascular [30] and esophageal tissue layers [36].
Segmentation of normal esophageal OCT images aims to detect the layered tissue structures. Taking the guinea pig as an example, the layered structure includes the epithelium stratum corneum (SC), epithelium (EP), lamina propria (LP), muscularis mucosae (MM) and submucosa (SM), as illustrated in Fig. 1, which shows the result of our proposed segmentation method. These tissues have a layered architecture similar to that of the retina. Consequently, automatic segmentation of esophageal OCT images has to address some common challenges in OCT image processing, such as speckle noise and motion artifacts [13,36]. Moreover, esophageal OCT images pose some unique challenges resulting from the in vivo environment or the endoscopic setup, including the disturbance from the plastic sheath and mucus, discontinuous boundaries due to non-uniform scanning speed, and irregular bending caused by sheath distortion.
Solutions to some of these common problems, such as speckle noise and irregular bending, have been reported in the literature. Representative speckle noise suppression algorithms include the median filter [3,37], wavelet shrinkage [38], curvelet denoising [39] and the non-linear anisotropic diffusion filter [4,22]. Among these methods, the median filter is not the best performer, but it offers easy parameter setting, simple implementation and robust noise suppression, which make it popular in OCT image denoising; it was therefore adopted in our framework. There are more advanced denoising methods, such as the sparse representation based framework proposed by Fang [40,41]. Since such methods are not easy to implement and may take more computation time than the simple median filter, they were not adopted in this work. The negative effect caused by irregular tissue bending can be reduced by image flattening [20], which is realized by cross-correlation [20] or baseline search [42]. Generally, the baseline-related method performs better, but robust baseline extraction is difficult in esophageal OCT images due to the disturbance of the plastic sheath and mucus. To improve the image quality and remove such disturbance, our study designed a comprehensive preprocessing scheme tailored to the specific problems of esophageal OCT images, thus creating favorable conditions for the subsequent segmentation.
Considering the previously mentioned problems, this study proposes an edge-enhanced graph search (EEGS) framework to automatically segment esophageal tissue layers. The main contributions lie in two aspects. Firstly, a specifically designed preprocessing scheme is proposed to address the challenges in esophageal OCT images (e.g. speckle noise, plastic sheath and mucus disturbances, and boundary distortion). Secondly, an edge-enhanced weight matrix that combines a modified Canny map [43,44] with vertical gradients is employed for the graph search. In this way, local features are preserved while the missing boundaries in shadow regions are interpolated. Different from Yang's work [44], the Canny edge detector used in this study was modified to focus on horizontal features, which is consistent with the esophageal tissue orientation, thus making it more suitable for esophageal layer boundary detection.
The paper is organized as follows. Section 2 introduces the detailed process of the proposed EEGS framework. Section 3 illustrates the advantages of the EEGS framework by segmentation experiments on esophageal OCT images of guinea pigs. Comparisons with the GTDP framework and the clinical potential of EEGS are also included in this section. Discussions and conclusions are presented in Sections 4 and 5, respectively.

Framework for robust esophageal layer segmentation using EEGS
The proposed EEGS method is composed of two major steps: 1) preprocessing and 2) graph search using a weight matrix based on Canny edge detection. The flowchart of the proposed EEGS framework is illustrated in Fig. 2.

Preprocessing
In order to calculate reliable weights that accurately indicate layer boundaries and improve the segmentation performance, we designed a novel preprocessing scheme to deal with the disturbance in esophageal OCT images.

Denoising
In this study, we chose the simple median filter to suppress speckle noise; its effectiveness in OCT image denoising has been demonstrated by numerous studies [3,37]. In addition, the median filter has the advantages of high efficiency and easy parameter setting compared with other popular OCT denoising methods, such as wavelet and diffusion filters. A representative original esophageal image and the image denoised by a 7 × 7 median filter are presented in Fig. 3.
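The denoising step can be illustrated with a minimal sketch. The 7 × 7 window follows the text; the reflective border padding, the function name `median_filter` and the synthetic speckle-like test image are assumptions made only for illustration and are not taken from the original implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(image, size=7):
    """Apply a size x size median filter; borders are padded by reflection (assumption)."""
    pad = size // 2
    padded = np.pad(image, pad, mode="reflect")
    # One (size x size) window per output pixel, then the median of each window.
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))


# Synthetic example: a bright band corrupted by multiplicative, speckle-like noise.
rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[24:40, :] = 1.0
noisy = clean * rng.exponential(1.0, clean.shape) + 0.05 * rng.random(clean.shape)
denoised = median_filter(noisy, size=7)
```

Inside the band, the filtered intensities fluctuate far less than the raw ones, which is the property the later gradient computation relies on.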

Removing plastic sheath
During endoscopic OCT imaging, the probe is protected from biofluid by a plastic sheath. The sheath boundary is so prominent that it causes strong disturbance in the search for esophageal tissue layers. To remove the plastic sheath from the OCT image, its upper bound Pr1 and lower bound Pr2 should be determined first.
In this study, the GTDP algorithm [13] was adopted for the boundary identification of the plastic sheath. GTDP represents the image I as a graph G(V, E), where V denotes the graph nodes that correspond to image pixels and E denotes the edges connecting adjacent nodes. The weight of the edge connecting adjacent pixels a and b was set as
$$w_{ab} = 2 - (g_a + g_b) + w_{min}, \tag{1}$$
where $g_a$ and $g_b$ are the vertical intensity gradients normalized to [0, 1], and $w_{min}$ is the minimum possible weight in the graph. The gradients are calculated by convolving the image with a mask k [36] (Eq. (2)). The path with minimal weight is the potential layer boundary, which was found with the Dijkstra algorithm [45]. Pr1 is the boundary that separates the plastic sheath from the background and possesses the highest intensity contrast. Pr1 therefore has the highest gradient and can be easily located by GTDP. Pr2 can also be determined using GTDP by limiting the search region to the band between Pr1 and 10 pixels below Pr1, since ten pixels is the approximate sheath thickness in this study. The plastic sheath is then removed by shifting the pixels from Pr1 to Pr2, and the empty pixels are filled with a mirror image. The result is illustrated in Fig. 4.
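The graph search described above can be sketched as follows. This is a simplified illustration, not the GTDP implementation of [13]. It makes four assumptions: the edge weight has the form 2 − (g_a + g_b) + w_min, a plain first difference stands in for the mask k, the value of `W_MIN` is arbitrary, and paths may only move to the three neighbors in the next column. All function names are hypothetical.

```python
import heapq
import numpy as np

W_MIN = 1e-5  # minimum edge weight; the value is an assumption


def vertical_gradient(image):
    """Dark-to-bright vertical gradient normalized to [0, 1].
    A first difference is used here in place of the paper's mask k (assumption)."""
    grad = np.zeros_like(image, dtype=float)
    grad[1:, :] = image[1:, :] - image[:-1, :]
    grad -= grad.min()
    peak = grad.max()
    return grad / peak if peak > 0 else grad


def find_boundary(grad):
    """Dijkstra search for the minimum-weight left-to-right path; returns one row per column."""
    n_rows, n_cols = grad.shape
    dist = np.full((n_rows, n_cols), np.inf)
    prev = np.full((n_rows, n_cols), -1, dtype=int)
    dist[:, 0] = 0.0  # every pixel of the first column is a possible start
    heap = [(0.0, r, 0) for r in range(n_rows)]
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c] or c == n_cols - 1:
            continue  # stale heap entry, or the last column has been reached
        for nr in (r - 1, r, r + 1):
            if 0 <= nr < n_rows:
                w = 2.0 - (grad[r, c] + grad[nr, c + 1]) + W_MIN
                if d + w < dist[nr, c + 1]:
                    dist[nr, c + 1] = d + w
                    prev[nr, c + 1] = r
                    heapq.heappush(heap, (d + w, nr, c + 1))
    # Backtrack from the cheapest end point in the last column.
    rows = [int(np.argmin(dist[:, -1]))]
    for c in range(n_cols - 1, 0, -1):
        rows.append(int(prev[rows[-1], c]))
    return rows[::-1]


# Synthetic image: a dark-to-bright edge at row 20 that steps down to row 21 halfway across.
image = np.zeros((40, 30))
image[20:, :15] = 1.0
image[21:, 15:] = 1.0
boundary = find_boundary(vertical_gradient(image))
```

Because high gradients give low weights, the cheapest path follows the strongest dark-to-bright transition. Restricting the rows passed to `find_boundary` is one way to confine the search to a band such as the 10 pixels below Pr1.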

Lumen segmentation
The outer boundary of the esophageal lumen is defined as the baseline (Fig. 1). The baseline is important in this study because it is the basis for the subsequent image flattening and also affects the effectiveness of the subsequent search for the other layer boundaries.
Baseline extraction using GTDP should be straightforward, since the baseline is the most prominent layer boundary in the image once the plastic sheath is removed, as illustrated in Fig. 4(b). Nevertheless, mucus may induce a large error in GTDP, as displayed in Fig. 5(a). Since the SC layer has the highest intensity in the image, it can be used to correct the mucus-influenced baseline. The detailed process is summarized below:
(a) Extract a preliminary baseline BA1 by GTDP as shown in Fig. 5(a).
(b) Find the up-most point that has an intensity higher than a predefined threshold in each column.
(c) Determine whether there is a continuous segment of BA1 above the obtained points. If BA1 is consistent with the obtained points, it is marked as the valid baseline. Otherwise, mark the differing segment as the erroneous region (Fig. 5(b)) and continue with the following steps.
(d) Limit the graph search region for GTDP. As illustrated in Fig. 5(c), in the valid part of BA1, the graph search region is defined around BA1, while in the erroneous part, the graph search is conducted beneath BA1, thus eliminating the negative effects of mucus.
(e) Graph search in the re-defined region to get the final baseline (Fig. 5(c)).
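Steps (b) and (c) of the procedure above can be sketched as follows. The intensity threshold, the pixel tolerance `tolerance` and the minimum run length `min_run` are hypothetical parameters chosen for this example; the text only states that a predefined threshold is used.

```python
import numpy as np


def uppermost_bright_row(image, threshold):
    """Step (b): row of the first pixel in each column brighter than threshold (-1 if none)."""
    mask = image > threshold
    first = np.argmax(mask, axis=0)
    first[~mask.any(axis=0)] = -1
    return first


def erroneous_columns(baseline, bright_rows, tolerance=3, min_run=5):
    """Step (c): flag columns where the preliminary baseline lies above the bright
    tissue surface for at least min_run consecutive columns."""
    above = (bright_rows >= 0) & (bright_rows - baseline > tolerance)
    flagged = np.zeros_like(above)
    run_start = None
    # A trailing False closes any run that reaches the last column.
    for c, is_above in enumerate(np.append(above, False)):
        if is_above and run_start is None:
            run_start = c
        elif not is_above and run_start is not None:
            if c - run_start >= min_run:
                flagged[run_start:c] = True
            run_start = None
    return flagged


# Synthetic example: bright tissue from row 30 downward, faint mucus at rows 18-20.
image = np.zeros((50, 40))
image[30:, :] = 0.9
image[18:21, 10:25] = 0.3
ba1 = np.full(40, 30)
ba1[10:25] = 18  # preliminary baseline pulled upward by the mucus
bright = uppermost_bright_row(image, threshold=0.6)
flagged = erroneous_columns(ba1, bright)
```

In the flagged columns, the second graph search would be restricted to the region beneath BA1, and elsewhere to a band around BA1, as described in step (d).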

Flattening
According to graph search theory, a layer boundary is identified by searching for the minimum-weight path across the graph. When the weights are uniform, the graph search tends to find the shortest geometric path. However, in vivo esophageal OCT images often show steep slopes and irregular bending due to tissue movement and sheath distortion, which make the boundaries of interest follow complex curves. Flattening is an effective solution to this problem. The flattened image is created based on the baseline obtained in the previous section: each column is shifted up or down so that the baseline becomes flat. Empty pixels resulting from the shifting are filled with a mirror image. The final image, shown in Fig. 5(d), is beneficial to the subsequent segmentation.
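A minimal sketch of the column-shifting step is given below. The choice of the target row (the truncated mean of the baseline) and the use of reflective padding as the "mirror image" fill are assumptions; `flatten_image` is a hypothetical name.

```python
import numpy as np


def flatten_image(image, baseline, target_row=None):
    """Shift every column so that the baseline lands on target_row.
    Rows shifted in from outside the image are filled by mirroring (reflect padding)."""
    baseline = np.asarray(baseline, dtype=int)
    n_rows, n_cols = image.shape
    if target_row is None:
        target_row = int(np.mean(baseline))  # assumption: flatten to the mean baseline row
    shifts = target_row - baseline           # positive values move a column downward
    pad = int(np.max(np.abs(shifts)))
    padded = np.pad(image, ((pad, pad), (0, 0)), mode="reflect")
    flat = np.empty_like(image)
    for c in range(n_cols):
        start = pad - shifts[c]
        flat[:, c] = padded[start:start + n_rows, c]
    return flat, shifts


# Synthetic example: a one-pixel bright line that steps down every five columns.
image = np.zeros((40, 20))
baseline = 15 + np.arange(20) // 5
image[baseline, np.arange(20)] = 1.0
flat, shifts = flatten_image(image, baseline)
```

Keeping `shifts` allows boundaries found on the flattened image to be mapped back to the original coordinates by subtracting the per-column shift.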

Esophageal layer segmentation by EEGS
EEGS is composed of the following steps. Firstly, a modified Canny edge detector is designed to create a map showing the main local edges. Secondly, a gradient map in the axial direction is generated using a convolution mask. An edge-enhanced graph combining the gradient and Canny maps is then constructed, and the layer boundaries are extracted by dynamic programming. The detailed realization is described as follows.

Modified Canny edge detection
The Canny edge detector [43] was modified to create an edge-enhanced weight matrix for the subsequent graph search. This process can be summarized by the following steps:
(a) Apply a Gaussian filter to smooth the image.
(b) Calculate the intensity gradients of the smoothed image. The gradient magnitude G and direction α can be determined by
$$G = \sqrt{G_x^2 + G_y^2}, \qquad \alpha = \arctan\left(G_y / G_x\right), \tag{3}$$
where $G_x$ and $G_y$ are the first derivatives in the horizontal and vertical directions, respectively. The gradient magnitude is calculated along the vertical direction since the flattened esophageal tissue layers are distributed horizontally.
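(c) Apply non-maximum suppression to get rid of spurious responses to edge detection. The matrix indicating edges can be described by
$$I_e(x, y) = \begin{cases} 1, & G(x, y) > p,\ G(x, y) > I_i \ \text{and}\ G(x, y) > I_j \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
where p is a pre-defined threshold, and $I_i$ and $I_j$ denote the gradient magnitudes of the pixels in the positive and negative gradient directions, respectively.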
Consequently, a binary matrix $I_e$ indicating image edges can be generated. An example of the edge map $I_e$ overlaid on the original image is shown in Fig. 6(b). By removing vertical edges, the modified Canny detector can better describe esophageal tissue layers, thus creating an edge map more suitable for layer segmentation.
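The three steps can be illustrated with a simplified sketch. It is not the authors' detector. It smooths with a separable Gaussian, keeps only pixels whose gradient points mostly along the vertical direction (one possible reading of "removing vertical edges"), performs non-maximum suppression against the vertical neighbors only, and applies a single threshold p without hysteresis. The values of `sigma` and `p` and all function names are assumptions.

```python
import numpy as np


def gaussian_kernel(sigma=1.0):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()


def smooth(image, sigma=1.0):
    """Step (a): separable Gaussian smoothing with reflective borders."""
    k = gaussian_kernel(sigma)
    padded = np.pad(image, len(k) // 2, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def horizontal_edge_map(image, p=0.1, sigma=1.0):
    """Steps (b) and (c), restricted to horizontally oriented edges."""
    img = smooth(image.astype(float), sigma)
    gy, gx = np.gradient(img)                 # derivatives along rows and along columns
    mag = np.hypot(gx, gy)                    # gradient magnitude
    near_vertical = np.abs(gy) > np.abs(gx)   # gradient mostly up/down -> horizontal edge
    up = np.vstack([mag[:1], mag[:-1]])       # magnitude of the pixel above
    down = np.vstack([mag[1:], mag[-1:]])     # magnitude of the pixel below
    is_peak = (mag >= up) & (mag > down)      # non-maximum suppression along the vertical
    return (near_vertical & is_peak & (mag > p)).astype(np.uint8)


# A horizontal step edge is kept, a vertical step edge is discarded.
horizontal_step = np.zeros((40, 60))
horizontal_step[20:, :] = 1.0
vertical_step = np.zeros((40, 60))
vertical_step[:, 30:] = 1.0
edges_h = horizontal_edge_map(horizontal_step)
edges_v = horizontal_edge_map(vertical_step)
```

On these two test images, the map contains exactly one edge pixel per column next to the horizontal step and none along the vertical step, which is the orientation selectivity the modification aims for.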

Construction of edge-enhanced gradient map
In this study, the edge-enhanced gradient map M is defined by Eq. (5) as a weighted combination of Gr and $I_e$, where Gr denotes the vertical intensity gradient calculated by mask k (Eq. (2)), $I_e$ represents the modified Canny edge map and w is a weight parameter. The combination of Gr and $I_e$ has the following advantages. By using neighboring information, Gr provides complementary search guidance where the Canny detector loses its efficacy. Meanwhile, $I_e$ calculated by the Canny strategy compensates for the lack of local precision of Gr caused by the local smoothing effects of k. As a result, M is able to preserve local details while interpolating information into the shadow regions.

Segmentation by EEGS
The EEGS framework uses GTDP for layer boundary identification. Instead of setting the weight by Eq. (1), the edge weight in EEGS is defined as
$$w_{ab} = 2 - (M^n_a + M^n_b) + w_{min}, \tag{6}$$
where $M^n_a$ and $M^n_b$ are the normalized edge-enhanced map values, calculated by Eq. (5), for the adjacent points a and b.
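The effect of the edge-enhanced map on the graph weights can be illustrated with a small numerical sketch. Two assumptions are made: the map is taken to be the additive combination Gr + w·I_e, which is only one plausible form of the weighted combination, and the edge weight mirrors the GTDP form 2 − (M_a + M_b) + w_min. The values of `w` and `w_min` and the toy arrays are hypothetical.

```python
import numpy as np


def edge_enhanced_map(gr, edge_map, w=0.5):
    """Combine the gradient map with the binary edge map (assumed additive form)
    and normalize the result to [0, 1]."""
    m = gr + w * edge_map
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m, dtype=float)


def edge_weight(m_norm, a, b, w_min=1e-5):
    """Weight of the graph edge between adjacent pixels a and b (row, column tuples)."""
    return 2.0 - (m_norm[a] + m_norm[b]) + w_min


# Toy 3 x 2 patch: the middle row carries a boundary marked in the edge map.
gr = np.array([[0.2, 0.2],
               [0.6, 0.3],
               [0.1, 0.1]])
i_e = np.array([[0, 0],
                [1, 1],
                [0, 0]])
m_norm = edge_enhanced_map(gr, i_e, w=0.5)
on_edge = edge_weight(m_norm, (1, 0), (1, 1))   # step along the marked boundary
off_edge = edge_weight(m_norm, (1, 0), (0, 1))  # step leaving the boundary
```

In this toy case the step along the marked boundary costs about 0.3, against about 0.9 for the step that leaves it, so the minimum-weight path is drawn toward pixels supported by both the gradient and the edge map.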
The extraction of each boundary is realized by performing EEGS iteratively in a limited search area. The area is defined using the previously-identified boundary and the prior knowledge of the tissue layer thickness with a ±20% tolerance [44,46], so that each search region contains one boundary ideally. The prior knowledge can be obtained by manual segmentation. As a result, all of the six boundaries are acquired automatically.
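The limited search area can be sketched as a band below the previously identified boundary. The ±20% tolerance follows the text. The expected thickness of 50 pixels, the image height and the function name `search_region` are hypothetical values chosen for illustration.

```python
import numpy as np


def search_region(prev_boundary, expected_thickness, n_rows, tolerance=0.2):
    """Return per-column top and bottom row limits for the next boundary, and a
    boolean mask of the pixels (graph nodes) allowed in the search."""
    prev_boundary = np.asarray(prev_boundary, dtype=int)
    top = prev_boundary + int(round((1 - tolerance) * expected_thickness))
    bottom = prev_boundary + int(round((1 + tolerance) * expected_thickness))
    top = np.clip(top, 0, n_rows - 1)
    bottom = np.clip(bottom, 0, n_rows - 1)
    rows = np.arange(n_rows)[:, None]
    allowed = (rows >= top[None, :]) & (rows <= bottom[None, :])
    return top, bottom, allowed


# Previous boundary at rows 100-102, next layer expected to be about 50 pixels thick.
top, bottom, allowed = search_region([100, 101, 102], expected_thickness=50, n_rows=300)
```

Only nodes inside `allowed` would take part in the graph search for the next boundary, so that each search region ideally contains a single boundary.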

Experimental data
The proposed EEGS segmentation framework was tested on esophageal OCT images of guinea pigs, which were acquired by an 800-nm ultrahigh-resolution gastrointestinal endoscopic OCT system [9,10,47]. A typical image is illustrated in Fig. 3(a). Some layer boundaries, such as those of the SC, EP and LP layers, can be visually observed, while the MM and SM layer boundaries have low contrast and are difficult to identify. In addition, disturbances such as speckle noise, the plastic sheath and mucus are clearly present in the image.

EEGS performance on OCT images with different challenges
In vivo esophageal OCT images present unique difficulties for layer segmentation resulting from motion artifacts and intrinsic disturbance from the endoscopic equipment itself (such as the plastic sheath). Fig. 7 illustrates several typical challenging images. Specifically, Fig. 7(a) shows an image with irregular bending caused by sheath distortion; Fig. 7(c) has quite weak boundaries in some regions of the MM and SM layers; Fig. 7(e) presents discontinuous boundaries, which might be caused by the non-uniform rotation speed of the endoscope; and in Fig. 7(g) mucus separates the probe from the tissue surface. All of the listed problems are addressed in our EEGS scheme by embedding procedures such as flattening, baseline correction and the Canny-based edge-enhanced strategy. The corresponding segmentation results are shown in Figs. 7(b), 7(d), 7(f) and 7(h). The results show that EEGS is able to identify all the esophageal layers accurately in these cases, which supports the robustness of the proposed method.

Segmentation result analysis of the EEGS framework
To further confirm the effectiveness of the EEGS framework, we compared the proposed method with manual segmentation of three experienced observers. These observers have segmented numerous OCT images from different organs, such as the retina, esophagus and airway, using a freeform (drawing) method implemented in the open-source software ITK-SNAP [48]. Besides, the comparison of the EEGS and GTDP [13,36] was also carried out to prove the advantages of the proposed method. The experimental data is composed of 100 esophageal OCT images, each with 2048 × 2048 pixels acquired from one healthy guinea pig. For a quantitative evaluation, we calculate the thickness of the five esophageal layers.
An intuitive segmentation comparison among EEGS, GTDP and one of the observers (Obs. 1) is shown in Fig. 8. Both EEGS and GTDP are consistent with the manual segmentation for the right portion of the image, where the tissue layers are smooth and little disturbance exists. For the left portion of the image, where distortion occurs, differences between automatic and manual segmentation are visible. There, the EEGS result is closer to Obs. 1 than the GTDP result because the modified Canny map in EEGS enhances the edge details, thus compensating for the loss of precision of the vertical gradients used by GTDP. The unsigned border position differences between the automatic and manual segmentations are listed in Table 1, where borders BD1 to BD6 represent the layer boundaries from the top of the SC layer to the bottom of the SM layer and the data are presented as mean ± standard deviation in micrometers. The EEGS result is closer to the manual segmentation in all cases, which indicates better accuracy in layer boundary identification.
The average layer thicknesses of the 100 esophageal OCT images and the corresponding standard deviations are listed in Table 2. Using each manual segmentation separately as a reference, the differences in layer thickness between the automatic segmentations and the reference are listed in Table 3. Data in bold indicate the automatic segmentation results that are closer to the manual segmentation. The EEGS results are closer to the manual reference values than the GTDP results in all cases, which indicates that the proposed EEGS segments the five esophageal layers more accurately than GTDP. Fig. 9 shows scatter plots indicating the reliability of the thickness measurements using GTDP and EEGS in comparison with the reference annotations from Obs. 1, as well as the corresponding Bland-Altman plots. In Fig. 9, n is the number of points, r denotes the correlation coefficient and LOA represents the limits of agreement with the 95% confidence interval. The EEGS method offers a larger r value and a smaller LOA, which indicates that its result is closer to the reference annotations.
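The evaluation quantities used above can be computed as in the following sketch. The numbers are made-up placeholders, not data from this study. The pixel-to-micrometer factor `um_per_pixel` is a hypothetical parameter, and the limits of agreement are taken as the usual bias ± 1.96 standard deviations of the differences.

```python
import numpy as np


def unsigned_border_difference(auto_rows, manual_rows, um_per_pixel=1.0):
    """Mean and standard deviation of the unsigned border position difference."""
    diff = np.abs(np.asarray(auto_rows, float) - np.asarray(manual_rows, float))
    diff = diff * um_per_pixel
    return diff.mean(), diff.std(ddof=1)


def bland_altman(auto_thickness, manual_thickness):
    """Bias, 95% limits of agreement and correlation coefficient r."""
    a = np.asarray(auto_thickness, float)
    m = np.asarray(manual_thickness, float)
    diff = a - m
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    r = np.corrcoef(a, m)[0, 1]
    return bias, (bias - half_width, bias + half_width), r


# Placeholder values for illustration only.
auto = [21.0, 22.0, 24.0, 22.0, 25.0]
manual = [20.0, 22.0, 25.0, 21.0, 24.0]
mean_diff, std_diff = unsigned_border_difference(auto, manual)
bias, loa, r = bland_altman(auto, manual)
```

A narrower interval `loa` and a value of `r` closer to 1 correspond to the closer agreement with the reference annotations described for Fig. 9.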

Clinical potential of EEGS
To demonstrate the clinical potential, the EEGS framework was employed to segment three sets of 30 guinea pig esophagus images, including two normal conditions and one EoE model [49]. EoE is an esophageal disorder characterized by eosinophil-predominant allergic inflammation in the esophagus [11]. Representative OCT images of guinea pig esophagus segmented by EEGS are presented in Figs. 10(a) to 10(c), and the corresponding thicknesses of the five tissue layers are shown in Fig. 10(d). The SC layer of the EoE model is significantly thicker than in the normal cases, which indicates that our EEGS framework could potentially aid clinical diagnosis [11,50].

Discussions
Our image analyses were performed on a personal computer with an Intel Core i7 2.20 GHz CPU and 16 GB RAM. Using MATLAB, it takes about 12 seconds for EEGS to preprocess and segment an esophageal OCT image of 2048 × 2048 pixels, which is slower than GTDP (about 8 seconds) due to the additional Canny edge detection. This computational efficiency is insufficient for real-time processing. To reduce the segmentation time, more efficient GPU-based programming in C will be adopted in the future. Since the esophageal OCT images were collected successively by the endoscope, the overlapping information in adjacent frames can be used to correct outliers, thus further improving the segmentation accuracy. In addition, the current algorithm requires some a priori knowledge, such as the number of layers to be segmented and the approximate layer thicknesses. Future work will explore adaptive parameter setting methods to perform automatic segmentation with little or no user input.
The esophageal layer segmentation experiments on normal and EoE guinea pig models demonstrate the clinical potential of the EEGS framework. In the future, esophageal OCT images from humans will be collected and studied so that criteria for diagnosing different esophageal diseases can be determined, with the aim of developing an automatic diagnosis system for esophageal diseases.

Conclusions
The main contribution of this paper is proposing the EEGS scheme to accurately segment esophageal layers on OCT images. With reasonable preprocessing before segmentation, the negative effect caused by the OCT imaging system and in vivo motion artifacts is minimized. By introducing Canny edge detection in the construction of the edge-enhanced weight matrix, the local edge information is preserved while the lost information in the shadow region is interpolated. It is worth mentioning that the Canny method utilized in this study focuses on boundaries along the horizontal direction, thus matching the esophageal layer better. Experiments showed that the proposed EEGS method can achieve better esophageal layer segmentation results in accuracy and stability than the GTDP, and it has the potential to be used for diagnosing esophageal diseases.