Automatic and robust segmentation of endoscopic OCT images and optical staining

: We report a generic method for automatic segmentation of endoscopic optical coherence tomography (OCT) images. In this method, OCT images are first processed with L 1 - L 0 norm minimization based de-noising and smoothing algorithms to increase the signal-to-noise ratio (SNR) and enhance the contrast between adjacent layers. The smoothed images are then formulated into cost graphs based on their vertical gradients. After that, tissue-layer segmentation is performed with the shortest path search algorithm. The efficacy and capability of this method are demonstrated by automatically and robustly identifying all five interested layers of guinea pig esophagus from in vivo endoscopic OCT images. Furthermore, thanks to the ultrahigh resolution, high SNR of endoscopic OCT images and the high segmentation accuracy, this method permits in vivo optical staining histology and facilitates quantitative analysis of tissue geometric properties, which can be very useful for studying tissue pathologies and potentially aiding clinical diagnosis in real time.


Introduction
Optical coherence tomography (OCT) is a powerful imaging technology for assessing biological tissue morphology, blood flow, optical and mechanical properties [1][2][3][4]. With the help of innovative endoscopes, OCT can provide high resolution 2D and 3D images of internal luminal organs in vivo and in real time [2][3][4][5][6][7]. To facilitate the extraction and utility of structural and quantitative information from the large amount of OCT imaging data, one important measure is to automatically segment tissue layers of interest from the images. One example of direct clinical relevance is the thickness of retinal layer and nerve fiber layer based on OCT images for glaucoma staging [8,9].
Various automatic or semi-automatic methods have been proposed for OCT image segmentation [10][11][12][13][14][15][16][17]. The widely-used methods are based on the OCT intensity variation, such as the pixel-level edge detection algorithm [10,11], the 1-D intensity peak detection procedure for each A-scan [12], and the support vector machine method based on mean intensity of each layer [13,14]. Furthermore, the intensity variation-based method has been successfully applied to segment retinal layers from 3D OCT data sets [15]. These approaches, however, are intrinsically sensitive to noise and could potentially fail to detect tissue layer boundaries correctly if the boundaries seem to be discontinued in the images due to view blocking, tissue folding, or low contrast resulting from low signal-to-noise ratio (SNR). To overcome the noise issue, graph cut methods have been proposed by taking into account both the edge-based and the region-based terms [16]. However, these methods require a priori knowledge of the tissue structure and their accuracy heavily depends on the choice of the a priori conditions. Active contour segmentation approaches based on the level set theory have also been proposed to find the boundary of each layer by optimizing a cost function [18,19]; however, they are liable to fall into local optima. Recently, methods based on graph theory were developed to segment the retinal layers, promising results were demonstrated with very high efficiency and accuracy [20][21][22][23][24].
So far, most reported OCT image segmentation methods targeted retinal OCT images. To the best of our knowledge, there is no literature reporting an accurate and robust segmentation method for ultrahigh-resolution endoscopic OCT images of luminal organs, such as esophagus and airway. We hypothesize that accurate segmentation of endoscopic OCT images can help quantitatively study tissue integrity and remodeling of luminal organs associated with various diseases, such as Eosinophilic Esophagitis (EoE) and Barrett's esophagus [25][26][27]. Segmentation of endoscopic OCT images has to address some common challenges similar to those encountered in segmenting free-space retinal OCT images, including the intrinsic speckle noise, intensity change with depth, and motion artifacts (associated with in vivo imaging). In addition, endoscopic OCT images come with their own challenges, such as steep layer boundary slopes due to tissue folding, view blocking by mucus or some debris (such as food debris in esophagus), and image distortion caused by nonuniform azimuthal scanning speed.
To address these challenges, we propose an automatic and robust layer segmentation approach for endoscopic OCT images. This approach firstly incorporates L 1 -L 0 norm minimization based de-noising and smoothing methods to reduce the image noise and improve the image contrast; then it numerically attenuates the intensity along the image depth and identifies each layer's boundary by searching for the lowest cost path. This paper is organized as follows: in section 2, the detailed segmentation method is presented; in section 3, the effectiveness and robustness of this method are demonstrated by segmenting in vivo guinea pig esophagus images acquired by our home-built OCT endoscope. Based on the accurate segmentation results, in vivo optical staining histology of guinea pig esophagus is also demonstrated. Finally, a brief discussion is provided along with conclusion in section 4.

Methods
Our segmentation method is primarily based on the gradient information and graph theory. As we have pointed out, gradient-based method is generally sensitive to noise. To address this problem, an L 1 norm minimization based de-noising technique is firstly applied to reduce the image noise [28]. Secondly, in order to mitigate the influence of gradients resulting from intra-layer structures, the intra-layer fine structures are smoothed out by using an L 0 norm minimization based smoothing algorithm [29]. Thirdly, the image signal intensity is numerically attenuated along imaging depth in order to ensure a gradually increasing cost function along depth direction (as elaborated in section 2.3) and facilitate automatic identification of each sequential layer boundaries for later step. Finally, each layer on the image is segmented by calculating the weight of each graph node and searching for the lowest cost path.
Our method is tested on segmenting the endoscopic OCT images of guinea pig esophagus. Figure 1 shows a representative histology image illustrating well-delineated layered structure of guinea pig esophagus [6]. In our method, we firstly segment the stratum corneum (SC) layer and then move on to segment each other layers in sequence (along imaging depth), such as the epithelium (EP), lamina propria (LP), muscularis mucosae (MM), and submucosa (SM), respectively. Since our method does not actually utilize any special property of tissues, it can be generalized for layer segmentation of other luminal organs, such as airway and colon.

L 1 -L 0 norm minimization based de-noising and smoothing methods
In order to obtain reliable image intensity gradient for each layer boundary, we firstly reduce the speckle noise of OCT images. However, it is well known that the general low-pass filters, such as the Gaussian or Butterworth filter [30], will blur the feature edges of the image and weaken the strong gradients. To preserve the strong gradients while minimizing the speckle noise, we adopt the total variation minimization algorithm [28]. In essence, the algorithm searches for a desired noise-free (ideal) image (denoted by u) by minimizing the following variant gradient L 1 norm ( ) J u ε as: where 0 λ > is a regulation parameter to adjust the weight of the total variation, g is the original noisy image, ∇ denotes the 2D gradient operator, ε is a stabilization parameter, and Ω represents the 2D image space domain. The explicit image pixel position ( , ) x y is omitted in g and u for simplicity. It is noted that Eq. (1) is slightly different from the classic total variation formulation by the introduction of ε , which is used to remove potential singularities and stabilize the solution. In our case, we choose 6 1 10 ε − = × . This de-noising procedure is equivalent to seeking for a solution of u that can best resemble the original noisy image g while still maintaining a sparse gradient field. By minimizing the gradient L 1 norm in Eq. (1), the solution of u will be: where ∂Ω is the image boundary of Ω . Equation (2) can be discretized using a fixed point finite difference scheme, which iteratively estimates u and its gradient to obtain the solution, i.e., the desired noise-free image u [31].
Furthermore, we found that the fine structures within each layer have strong gradients, which often interfere severely with the search for the desired inter-layer boundaries. For this end, an L 0 norm minimization based smoothing approach is applied to preserve the salient boundary of each layer and blur the fine structures within each layer. The employed L 0 norm smooth function is: where 0 s λ > is a weight parameter to adjust the smoothing extent, u is the de-noised image, s I refers to the expected smooth image, and 0  denotes L 0 norm. Here is a set operation which counts the number of set elements, x s I ∇ and y s I ∇ are the partial derivatives of s I with respect to x and y, respectively.
The goal of this optimization procedure is to find the smooth image s I whose gradients have the minimal L 0 norm, while preserving the major features of the noise-free image u.
However, it is well known that Eq. (3) is concave due to the calculation of L 0 norm, and thus difficult to optimize [32,33]. To address this challenge, we introduce an auxiliary vector variable , where h D is the partial derivative of D  with respect to x, and v D is the partial derivative of D  with respect to y. By trying to let D  be as close to s I ∇ as possible, Eq.
(3) can be converted into: where 0 d λ > is a scaling parameter to control the proximity between s I ∇ and D  . Then, the optimization of Eq. (4) can be divided into two sub-problems. One is: and the other is: It can be found that Eq. (5) is convex and Eq. (6) can be solved according to Ref [34]. By alternately optimizing Eq. (5) and Eq. (6), the expected smooth image s I can be obtained [31].
Through the above L 1 -L 0 de-noising and smoothing procedures, we can obtain OCT images with lower noise and higher contrast, as shown in Figs. 2(a)-2(c). As seen from the represetative intensity profiles illustrated in Figs. 2(d)-2(f), the fine structures within each layer are effectivly smoothed while the layer boundraies are well preserved after the denoising and smoothing procedures.

Graph path and layer segmentation
By treating each pixel as a graph node and the relationship between neighboring pixels as edges (here we define the edge weight as the summation of neighboring pixels' gradients), the image after de-noising and smoothing can be represented as a graph G(V, E), where V is the set of vertices of nodes and E is the set of edges. If we weigh the graph edges appropriately, these boundaries between adjacent layers could correspond exactly to the preferred graph paths. Accordingly, the problem of layer segmentation is transformed to a problem of searching for the preferred graph paths, which can be solved by dynamic programming method [35].
In principle, the weight of each edge is crucial to the accuracy of layer segmentation. Owing to the L 1 -L 0 norm minimization based de-noising and smoothing procedures, the resulted layer boundaries of OCT images become more pronounced with high gradients. Therefore, we could simply use the vertical gradients (along the imaging depth) to calculate the weight of each edge. To further improve the robustness of the gradient calculation, we use a 5 5 × vertical gradient operator k [30]: where ⊗ denotes convolution operation. To facilitate further processing, the grad matrix of the image can be normalized as: where min( ) ⋅ and max( ) ⋅ are the minimum and maximum functions, respectively. Based on grad Nm , we assign a cost value to each node(i, j), denoted by C(i, j), as: ( , ) 1 ( , ).
Nm C i j grad i j = − (9) In this way, the cost value of each node is reversely proportional to its vertical gradient. Then the edge weight between node(i, j) and node(m, n), denoted by W (i,j)-(m,n) , can be defined as: where ( , ) Neighbor i j denotes the nearest neighboring pixels of node(i, j). In our method we associate each node with its eight nearest neighbors, i.e.

{ }
It can easily be seen that the edge weight is designed to be smaller for edges between two neighboring pixels (nodes) both with larger magnitude of vertical gradients; such two pixels potentially delineate layer boundaries. While for two neighboring pixels within one layer, their vertical gradients tend to be small, and then the edge weight between them tends to be large. Hence, by applying the shortest path search to find the path with minimal weight W (i,j)-(m,n) among its eight neighboring pixels [36][37][38], the task of identifying layer boundaries can be formulated as a problem of seeking the minimal cost graph path, which could be readily solved via the dynamic programming algorithm [39,40]. It is worth noting that a similar shortest path search method was first introduced by Chiu et al. for segmenting retinal structures [21].

Numerical attenuation of depth-dependent image intensity
For endoscopic OCT images of luminal organs (such as esophagus), the imaging beam usually focuses inside the tissue instead of on the tissue surface; thus the surface boundary of the first tissue layer does not always have the highest gradient. This results in a compromised accuracy for the identification of the first layer boundary, which in turn affects the effectiveness of sequential search for other layer boundaries. Theoretically, the influence of the depth-dependent effects of the beam profiles can be numerically compensated [41]; however, it is complex for endoscopic OCT imaging. Instead, we numerically attenuate the image intensity along imaging depth so that the first layer boundary will have the highest gradient and then each sequential layer boundary will have a gradient lower than that of its precedent. To do so, we introduce a depth (y)-dependent attenuation function: s I x y I x y y ϕ = (13) This intensity attenuation can greatly facilitate the identification of the first boundary by using the shortest path searching algorithm since the first boundary (or the tissue surface) will now have the highest gradient in the image. In order to optimize the searching efficiency, we adopt the same method as reported in [20,42] to limit the search area for the segmentation of each layer by using the prior knowledge of the tissue layer thickness with a ± 20% tolerance. This prior knowledge was acquired from manual segmentation of the endoscopic OCT images of the same subject that have a good histological correlation.

Segmentation and optical staining of esophagus images
The performance of the above segmentation method is tested on in vivo OCT images of guinea pig esophagus, which were acquired with an 800-nm ultrahigh resolution endoscopic OCT system [6,7]. Figure 3(a) shows a representative image where some layer boundaries can be visually (but vaguely) identified, such as the boundaries of the epithelium (EP), lamina propria (LP), and muscularis mucosa (MM). In comparison, our segmentation result is shown in Fig. 3(b), where the boundaries of all five layers of interest are successfully identified and the layer thicknesses can be further quantified. Unlike [42] in which the images need to be numerically flattened before segmentation, our method can directly segment original OCT images with undulating layers (and layer boundaries). Furthermore, it can be seen from Fig.  3(b) that even in the presence of dramatic kinks of tissue layers, e.g. the areas marked by red circles, the layer boundaries can still be reliably identified. As aforementioned, OCT images possess a set of unique challenges conveyed by endoscopic setting and luminal organs. Shown in Fig. 4 are exemplary challenging images with steep slopes in the layer boundaries ( Fig. 4(a)), with partial view-blocking (Fig. 4(c)) and with local distortion (Fig. 4(e)). Specifically, steep boundary slopes are usually caused by luminal tissue folding or sudden change of the endoscope position relative to the lumen wall; partial view-blocking generally results from mucus or blood vessels, or food debris for the case of esophagus; image distortion can occur due to non-uniform rotation speed of the endoscope. We tested our method on these images and the corresponding results are shown in Figs. 4(b), 4(d), and 4(f). As seen from the segmentation results, each tissue layer can be accurately identified for all these ill-posed cases, which therefore demonstrated the robustness of the method. Furthermore, in order to show the reproducibility of the current method, 50 sequential OCT images of guinea pig esophagus have been segmented automatically and illustrated in Visualization 1 in the supplementary materials.
Together with the ultrahigh resolution and high SNR of OCT images, the accuracy afforded by the automatic and robust segmentation method offers a unique opportunity to optically code tissue structures with different color to achieve in vivo optically stained histology-alike images. Various color-coding schemes can be chosen to mimic different histological staining effects. Here we use a color scheme similar to (but not exactly the same as) the widely-used H&E staining as an example. We stain each layer by mapping the grayscale intensity to a designated RGB color map. Figures 5(b) and 5(d) show the optically stained OCT images in the Cartesian and polar coordinates, respectively. The image area above the tissue surface including the plastic sheath (as boxed with a red rectangle in Fig.  5(a)) is masked out in both Figs. 5(b) and 5(d). Furthermore, automatic and robust segmentation of OCT images enables an efficient quantitative analysis of the geometric properties of tissues. For example, we can conveniently calculate the average thickness of each layer in terms of optical path length (OPL) for 20 esophagus images acquired from one 10-week old guinea pig, as shown in Table 1. In order to quantitatively validate the accuracy of the proposed method, we recruited two experienced OCT image readers who were blinded to the automatic segmentation results. Both readers manually segmented the same set of 20 OCT images independently using a freeform (drawing) method implemented in MATLAB (The MathWorks Inc.). As shown in Table 1, we found that the automatic method provided similar results as the two readers. In order to demonstrate the clinical potential, our method was deployed to segment two sets of 20 guinea pig esophagus images, including a control and an EoE model. EoE induction with ovalbumin sensitization and challenge was performed following a well-established protocol [43]. Male guinea pigs (Hilltop, Scottsdale, PA) with the same age and roughly the same weight were used. All animals were handled under protocols approved by the Johns Hopkins University Animal Care and Use Committee (ACUC). Two segmented OCT images of esophagus for a guinea pig control and an EoE model were displayed in Figs. 6 (a) and (b), respectively. As shown in Fig. 6 (c), the layer thicknesses can be quantified by our segmentation method and conveniently compared between the control and the EoE model. Together with in vivo optical stained histology-alike images, we believe that an accurate quantitative analysis would potentially facilitate the study of tissue pathologies and aid clinical diagnosis in real time [44,45].

Discussions and conclusions
In summary, an automatic segmentation method with high robustness has been proposed and validated for segmenting endoscopic OCT images. By employing an L 1 -L 0 norm minimization based de-noising and smoothing algorithms, our method works well on low-SNR and lowcontrast images. By introducing depth-dependent digital attenuation of the image intensity, the tissue surface and other layer boundaries can be accurately and sequentially identified. In addition, accurate segmentation of OCT images of ultrahigh resolution and high SNR enables optical staining of OCT images and facilitates the quantitative analysis of tissue geometric parameters. It is worth mentioning that our method can serve as a generic segmentation method and be conveniently adopted for segmenting other endoscopic OCT images (such as airway and colon) and non-endoscopic OCT images (such as retina). This method can also be potentially used to segment histopathological micrographs for patterns/features extraction and identification.
As one of the limitations, the computational efficiency of current algorithm is suboptimal for real time applications. As a benchmark test, CPU timing was carried out on a personal computer with a Windows 7 operating system, an Intel Core i7 at a base processor frequency of 2.4 GHz and 8GB RAM. Algorithm was implemented in MATLAB. Computational time was measured to be ~6.63 seconds per frame (2048 A-lines/frame and 2048 pixels/A-line). In addition, the current algorithm only segments a predefined number of layers, for example, the first 5 layers of the esophagus tissue in our case. Future work and effort can be concentrated on GPU implementation for improving computational speed and algorithm optimization for adaptive selection of the number of layers for segmentation. It is also worth noting that our current optical staining colormap was mainly used for showing the layers segmented by the reported method. More studies and validations on the color coding methods will be carried out in the near future in order to better reflect the histomorphology of tissues. Furthermore, the MATLAB based source scripts for the reported method are available by emailing request to jhu.bme.bit@gmail.com.