Balloon catheter-based radiofrequency ablation monitoring in porcine esophagus using optical coherence tomography

: We present a microscopic image guidance platform for radiofrequency ablation (RFA) using a clinical balloon-catheter-based optical coherence tomography (OCT) system, currently used in the surveillance of Barrett’s esophagus patients. Our integrated thermal therapy delivery and monitoring platform consists of a ﬂexible, customized bipolar RFA electrode array designed for use with a clinical balloon OCT catheter and a processing algorithm to accurately map the thermal coagulation process. Non-uniform rotation distortion was corrected using a feature tracking-based technique, which enables robust, frame-to-frame analysis of the temporal ﬂuctuation of the complex OCT signal. With proper noise calibration, precise delineation of the thermal therapy zone was demonstrated using cumulative complex diﬀerential variance in porcine esophagus ex vivo with the integrated OCT-RFA system, as validated by nitroblue tetrazolium chloride (NBTC) histology. The ability to directly and accurately visualize the thermal coagulation process at high resolution is critical to the precise delivery of thermal energy to a wide range of epithelial lesions.


Introduction
We present a technique for balloon catheter-based radiofrequency ablation monitoring using optical coherence tomography (OCT), thereby enabling the microscopic guidance of thermal therapy of epithelial lesions in a clinical setting. Among epithelial lesions, dysplasia in the esophagus is of special interest because it can develop into a highly lethal cancer called esophageal adenocarcinoma, which has a dismal 5-year survival of ∼ 20%, and as low as 5% for patients diagnosed with late stage disease [1,2]. Clinical studies using OCT have demonstrated its potential to identify early precancerous changes in the esophagus, a metaplastic state called Barrett's esophagus (BE), which arises from chronic acid reflux from the stomach [3][4][5]. In BE, normal squamous epithelium transforms into columnar epithelium with goblet cells and potential presence of submucosal glands. BE can further develop into low-grade dysplasia and eventually high-grade dysplasia.
Due to the significant risk of progression of high-grade dysplastic lesions, physicians typically perform therapies such as radiofrequency ablation (RFA), cryotherapy, endoscopic mucosal resection, photodynamic therapy, and even esophagectomy [6] to remove dysplastic tissue and prevent malignant transformation. Among these, RFA is the most frequently used due to its high effectiveness and minimally invasive nature [7]. However, overtreatment with RFA can result in complications, especially stricture formation, which requires endoscopic esophageal dilation to regain normal esophageal function [8][9][10][11]. RFA is typically guided only by macroscopic surface features through endoscopy, although it is known that conventional endoscopy can lead to missed occult BE regions [12]. The lack of cross-sectional information in conventional endoscopy implies that RFA therapy is currently performed unguided; apart from visual inspection during video endoscopy, there is no pre-arranged therapy plan that identifies the areas and depths that require treatment. In addition, there is no technique to monitor treatment depth in real-time and ensure that the required depth is achieved. Because of these limitations, a conservative approach to RFA is typically used in order to minimize complications. A standardized radiofrequency energy is used in all patients irrespective of natural variations of epithelial thickness and disease distribution [13][14][15]. Because full ablation of the mucosa is not guaranteed, multiple treatment sessions are required, with an average of 3.5 sessions and reaching up to six sessions for large Barrett's segments [16]. The high cost of this approach implies that, despite the evidence showing a 90.5% complete response rate with RFA treatment for low-grade dysplasia in an intention-to-treat analysis [17], treatment is generally reserved only for the advanced high-grade dysplastic state of the disease. In the current work, we focus on the pressing need for better monitoring tools to enable more precise treatment.
In the domain of microscopic imaging, there is a limited number of studies on thermal therapy monitoring. Polarization-sensitive OCT (PS-OCT) can assess thermal damage through the reduced tissue birefringence caused by the denaturation of collagen fibres after therapy. This has been shown in porcine tendon and skin [18], as well as in RFA lesions in porcine myocardium ex vivo including in an integrated PS-OCT/RFA catheter configuration [19][20][21][22]. However, PS-OCT therapy monitoring requires tissue exhibiting high birefringence at baseline, which is not the case for the epithelial layer of the esophagus [23]. Another interesting approach that was explored to visualize RFA lesions in excised porcine atrial tissue was autofluorescence hyperspectral imaging, which offers high spatial resolution but does not provide depth-resolved information [24]. Conventional OCT structural images have further been used to monitor cardiac RF ablation in porcine heart given the increased signal intensity after ablation both in vitro and in vivo [25][26][27], and in the superficial epithelial layer in porcine esophagus ex vivo using a commerical focal RFA probe [28]. However, we have shown that the extent of increased scattering seen in the structural image does not necessarily correlate with the ablation zone boundary, especially as the coagulation boundary extends deeper into the tissue [29]. OCT has been used to monitor the tissue expansion after laser coagulation in the retina; however, the delineation of the injury boundary was not attempted [30][31][32]. For imaging ablation therapy in BE, buried glands were monitored before and after RFA or cryoablation, but not during ablation [33]. In any case, existing RFA probes, which have been developed to be used exclusively with white-light endoscopic imaging, completely obscure the region under treatment and inhibit imaging [34,35].
A previous study demonstrated the potential of using volumetric OCT for detecting buried glands [33], which evade detection in conventional endoscopy and lead to recurrence. For these reasons, the use of OCT in the clinical environment could be greatly facilitated by the development of a comprehensive real-time RFA monitoring system with potential for clinical translation. The OCT structural image could be used to prepare the therapy plan, creating a map of the regions that require treatment together with the target depth. This system would require a specialized RFA probe to allow simultaneous imaging, and would provide real-time depth monitoring of the RFA therapy.
To this end, we have shown that the analysis of the temporal fluctuations of the OCT signal during treatment [36], specifically using complex differential variance (CDV) [37] -previously developed as an angiographic OCT technique-can be used to monitor laser thermal therapy and to provide accurate delineation of the coagulation region in a benchtop configuration in different tissue types [29]. CDV relies on the temporal complex correlation coefficient of the OCT signal, and by performing a frame-to-frame analysis using a cumulative CDV (cCDV) metric, we could clearly delineate the thermal coagulation zone. However, direct translation of this technique to an endoscopic system is not possible due to the open-loop nature of catheter rotation and associated motion artifacts. In most endoscopic applications, scanning is performed with flexible fiber-optic probes which are attached to external motors to perform longitudinal and azimuthal scanning. The need for a flexible probe in a clinical endoscopic setting carries the drawback of dynamic changes in friction given by the geometric configuration of the probe at a given instant, resulting in inaccuracies during scanning known as non-uniform rotation distortion (NURD) [38]. Therefore, a given A-line in the OCT tomogram corresponds to different angular orientations even between adjacent B-scans. Offsets of a few to tens of A-lines between adjacent B-scans are not uncommon in distal-driven clinical catheters, with the correspondence between angular orientation and A-line changing dynamically as a function of time [38]. A frame-to-frame analysis of the complex OCT signal using CDV is, therefore, challenging without proper NURD correction. Previous attempts to correct for NURD artifacts include techniques that require extensive changes to the imaging system [39] or the use of fiducial markers in a highly customized micromotor catheter design which is computationally efficient but with low accuracy due to the limited number of fiducial markers present in our balloon catheter [40]. Importantly, most NURD-correction techniques cannot correct distortions down to the lateral resolution of the system [38,41], a necessary requirement for CDV processing.
This manuscript describes the development of an integrated balloon-catheter-based radiofrequency ablation monitoring system that enables simultaneous imaging and therapy monitoring using the CDV technique with robust NURD artifact correction. Our NURD correction is based on the tracking of individual speckle in intact tissue across sequential frame pairs, enabling the correction of distortion down to the lateral resolution of the system and thus the use of speckle correlation metrics for therapy monitoring. A flexible, bipolar radiofrequency ablation electrode array was fabricated and integrated with the clinical balloon catheter. A prototype integrated OCT-RFA catheter was constructed and tested in porcine esophagus ex vivo. Using novel processing algorithms, NURD effects were reduced dramatically enabling frame-to-frame CDV processing of complex tomograms. Proper SNR calibration followed by cCDV analysis allowed us to clearly visualize the thermal coagulation process, delineate the treatment zone and validate our monitoring system with nitroblue tetrazolium chloride (NBTC) histology staining. If implemented clinically, microscopic image guidance of RFA could reduce the number of treatment sessions, associated cost, and complication rate, thereby lowering the barrier to treat BE at earlier stages of the disease, where the response rate is very high.

Integrated balloon catheter-based OCT-RFA system
In order to enable OCT imaging during RFA therapy, we developed and built a prototype of an integrated OCT-RFA balloon catheter that enables simultaneous imaging and therapy monitoring. First, a finite element model was constructed to determine the optimal geometry of a flexible radiofrequency ablation electrode array that delivers sufficient energy for thermal coagulation of esophageal tissue within a reasonable amount of time while maintaining sufficient spacing for imaging between electrodes. Based on these simulation results, a prototype integrated OCT-RFA balloon catheter (a focal RF ablation probe in a balloon catheter configuration, as illustrated in Fig. 1) was constructed and tested in porcine esophagus ex vivo using a modified clinical OCT system built for Barrett's esophagus patients.  Fig. 1. Schematic of the microscopic thermal therapy monitoring setup integrating a flexible bipolar radiofrequency ablation electrode array with an optical coherence tomography system. Imaging is performed between alternating electrode pairs (+: active electrode, -: dispersive or ground electrode) to enable real-time microscopic image guidance.

Finite element modeling
We designed a focal RFA electrode array using finite-element modeling (COMSOL Multiphysics, Burlington, MA, USA) to optimize energy delivery while providing an electrode separation that enables imaging between consecutive electrode pairs. Figure 2 illustrates the finite element model geometry, which consists of a bipolar radiofrequency electrode array on top of a two-layer tissue slab. The bipolar electrode array consists of a set of copper electrodes. Alternating electrodes (active and dispersive electrodes) are connected to two different connectors -one serving as the ground and another as the RF voltage source. The tissue slab is a simplified model of the esophagus consisting of a surface mucosal layer (epithelium up to lamina propria) with a thickness of 500 µm and the remaining layer is assumed to be mainly a muscular layer (muscularis mucosa down to muscularis propria) with a thickness of 2.5 mm. The submucosa is not modeled explicitly.
Using the COMSOL Multiphysics software, the temperature field over the course of the RF ablation procedure was simulated by solving the following set of equations in the FEM model.
First, the Pennes bioheat equation [42], which governs heat transfer in biological tissue, was solved to derive the temperature field: where ρ is the density of the tissue (kg/m 3 ), C is the specific heat (J/kg·K), k is the thermal conductivity (W/m·K), T is the temperature of the tissue (K), T b is the temperature of blood (K), ρ b is the density of blood (kg/m 3 ), C b is the specific heat of blood (J/kg·K), ω b is the perfusion rate (1/s), Q m is the metabolic heat source term (W/m 3 ), and Q e is the heat from the external source (W/m 3 ). For this ex vivo model, the perfusion term and metabolic heat source term were set to zero. The density ρ and specific heat C of the tissue were set at 1070 kg/m 3 and 3700 J/kg·K, respectively. The thermal conductivities of the mucosal layer of the esophagus (described with subindex 1) and the muscular layer (described with subindex 2) were set as 0.343 and 0.49 W/m·K, respectively [43]. The boundary conditions were set to an ambient temperature of 25°C for all outer boundaries such as the tissue surface and ì n · (k 1 ∇T 1 − k 2 ∇T 2 ) = 0 for all interior boundaries.
To couple the bioheat equation to the RF ablation process, the heat from the external source Q e was set to ì J · ì E where ì J is the current density (A/m 2 ) and ì E is the electric field intensity (V/m) or The voltage distribution can be solved using Laplace's equation where V is the voltage and σ is the electrical conductivity (S/m) of the tissue. The electrical conductivities of the mucosal layer and the muscular layer of the esophagus were set as 0.1778 and 0.4459 S/m, respectively [44], and a downward ramp function was implemented between 95 to 100 o C to account for the significant drop in electrical conductivity approaching zero once the tissue becomes dessicated as it approaches vaporization threshold at 100°C [45].
Here, a quasi-static approach (the tissue is considered source-free and displacement currents are neglected) was employed since the biological tissue is purely resistive at the frequency typically used for RF ablation (∼ 500 kHz) and the voltage was set as the root mean squared (RMS) voltage of the applied RF voltage (V RMS ) [46]. The boundary conditions were set as follows: V = V RMS on active electrodes, V = 0 on dispersive electrodes and outer domain boundary, and ì n · ( ì J 1 − ì J 2 ) = 0 across all other boundaries. The damage integral Ω(r, t), which defines the degree of tissue injury (dimensionless) based on the Arrhenius equation [47], is computed as where C(r, 0) refers to the initial concentration of tissue molecules in the native state, C(r, t) is the remaining concentration at time t after conversion to the damaged state, A is the frequency factor (s −1 ), E a is the activation energy (J/mol) and R is the universal gas constant (8.314 J/mol·K).
Here, A was set to 2.88 · 10 98 s −1 and E a was set to 6.3 · 10 5 J/mol [48]. The necrotic tissue fraction θ d can be expressed as and we define that tissue injury occurs when 63% of tissue has been coagulated. Figure 2(b) shows an example of the temporal evolution of the simulated temperature distribution at the center of the bipolar RF array illustrated in Fig. 2(a) (11 electrodes, each 500 µm wide and 10 mm long, with a d = 500 µm spacing) at an applied voltage of V RMS = 20 V. Given the symmetry of the electrode pairs, the ablation depth is, as expected, uniform across various electrode pairs. The necrotic tissue fraction at 10 s is shown in Fig. 2(c)(i), which corresponds well with the 60°C isotherm in Fig. 2(b) (t = 10 s) and is commonly used in the literature as the threshold for tissue coagulation. As expected, the initial tissue temperature directly affects the necrotic tissue fraction at the end of the RF ablation process, which is demonstrated in Figs. 2(c)(ii) and 2(c)(iii).
Using this model, various applied voltages and electrode spacing configurations were evaluated, as shown in Figs. 2(d) and (e). In general, as the electrode spacing increases, the applied voltage required to achieve tissue coagulation within a given therapy duration also increases. Therefore, the final design of the bipolar RF electrode array, with a spacing of 500 µm, enables imaging between consecutive electrode pairs readily (in contrast to an effective gap of only 250 µm in the clinical BarrX RFA balloon catheter [49]), without imposing very high RF generator requirements, and achieves a desirable ablation depth of at least 500 µm-1 mm within 5-10 seconds of therapy duration at an applied voltage of V RMS = 20-30 V while providing sufficient number of frames for processing.

Integrated catheter configuration
connected to the optical system through a rotary junction on the proximal end. The electrode array consisted of 11 electrodes oriented circumferentially, 5 positive and 6 ground alternating electrodes, each 500 µm wide with 500 µm spacing, in a bipolar RF configuration. The electrodes were 10 mm in length, covering approximately 50 • of the balloon circumference. To drive the bipolar RF electrode, a continuous sinusoidal RF waveform at 500 kHz was produced by a function generator (Stanford Research Systems Model DS345, Sunnyvale, CA) and amplified to the final driving RMS voltage by a linear amplifier (Piezo Systems Model EPA-104, Cambridge, MA), which has a peak voltage rating of ±200 V and maximum current of 200 mA.
Cross-sectional imaging was performed through the gap between the central electrodes, at a fixed longitudinal location (without pull-back). Due to the symmetry of the RF electrode array, the ablation depth is expected to be relatively uniform across the area covered by the RFA probe. Extension to larger treatment areas with multiple electrode arrays is discussed later.

Clinical system specifications
The clinical OCT system was a first-generation NvisionVLE™ Imaging System (NinePoint Medical, Inc., Bedford, MA), with an A-line rate of 50 kHz (3096 samples per A-line) and 4096 A-lines per B-scan [50]. This results in a frame rate of 12.2 B-scans per second at a rotational speed of 750 rpm. The system used a frequency shifter to remove depth degeneracy [51], which did not share a timebase with the acquisition board, inducing random global phase shifts to each acquired A-line. Additional phase jitter from timing-induced errors in the data acquisition board were present [52], introducing phase ramps of random amplitude to each A-line.
We performed cross-sectional imaging of excised porcine esophagus while performing RFA therapy with different voltages and treatment durations. In these proof-of-concept experiments, raw data was recorded and processed offline as explained in the following section.

Processing algorithm
The processing algorithm was designed in order to make it real-time compatible. That is, the results for a given frame depend on the data from that frame and only preceding frames. Figure 3 shows an overview of the major steps in the processing algorithm. The main challenge was the non-uniformity in the rotation of the clinical OCT catheter due to the use of a proximal scanning mechanism and a long flexible drive shaft, and in the implementation of a correction compatible with the complex-valued data of a phase-unstable OCT system. The details of each step are outlined as follows: Here, we define the z axis as depth, x as the in-plane lateral axis and y as the out-of-plane lateral axis. We will use k to denote discrete variables in the z dimension, and l for discrete variables in the x dimension. m will be used for labeling frames in the time dimension. Now, the CDV γ is defined as [37] γ(k, l, m; N) where ζ(k, l, m) is a modified correlation coefficient of the complex OCT signal in an ensemble centered at depth k, lateral position l, and frame m with the signal in frame m − 1 for multiple possible time pairs N where summation over k is carried over the support of the axial weighting window w(k) which defines the ensemble average, and summation over ∆ is used when more than one time pair N is . 3. Overview of the processing algorithm, illustrating the impact of NURD correction and SNR calibration on the ability to delineate the thermal lesion in this challenging setting. Blue (red) highlights regions of high CDV (cCDV). The intensity OCT images were used to globally align the frames using the a fiducial mark [RFA electrode connectors on each side, as indicated with white arrowheads, the bottom one used as fiducial; the active RFA region imaged between consecutive bipolar electrode pairs is indicated by the red star] (a), followed by feature tracking to correct for NURD artifacts (b). Using the NURD corrected frames, the calibrated CDV (c) is computed by extracting a static region within the tissue to determine the SNR calibration curve. The cumulative CDV (d) is computed to reveal the thermal lesion generated by the RFA therapy.
used. In systems with two polarization detection channels p, γ is calculated independently for each channel. For simplicity, we omit the index p in the following equations. The complex correlation term is defined as where F is the complex-valued tomogram and only τ = 0 and τ = 1 are considered in Eq. (6). We call ζ a cross-correlation term, because in a cross-correlation coefficient an ensemble average is necessarily performed. Note that when multiple time pairs are used (N > 1), the summation in Eq. (6) is performed on the absolute value of the correlation coefficients. Throughout this work, w(k) consists of an 21-pixel Hanning window and a single time pair is used (N = 1). Hence we omit the N dependence hereafter, and finally Eq. (6) reduces to The key to using the CDV to highlight tissue coagulation is that the complex correlation term in Eq. (7) is large for all members of the ensemble when the value of F does not change with time. Therefore, the CDV discriminates between areas of stable and dynamically changing OCT signal, assuming the indices (k, l) in Eq. (7) point to the same spatial location in the sample for both frames m − 1 and m. Unfortunately, the presence of NURD invalidates this last assumption in most endoscopic systems.
To better visualize the significant effect of NURD, Figs. 4(a) and 4(b) show the intensity and the phase, respectively, of the tomogram of excised porcine esophagus in a small region of interest (ROI) 1 s after the beginning of the therapy (m = 12). The bright structure at the center of the ROI is the right edge of the electrode. We expect the region outside the electrode, at this early time point, to be completely static across frames, and the region underneath the electrode to show a very minimal change. Figure 4(c) shows the amplitude of ζ(k, l, m), while Fig. 4(d) shows the phase of ζ(k, l, m) -what can be considered the pixel-per-pixel Doppler shift between adjacent frames. Given that therapy had not affected the tissue yet, the value of the amplitude of ζ(k, l, m) should be close to 1 throughout the tomogram. It is clear that there is a mixture of regions with large and small ζ(k, l, m), implying that the signal was correlated in some regions (i.e. the spatial location at the sample is the same for each pixel in both frames), and decorrelated in some other regions (i.e. the spatial location at the sample is different in both frames). This demonstrates that NURD deforms the tomogram in a non-rigid manner. This effect is also seen in Fig. 4(d), where some large-scale phase differences are seen in the regions of high ζ(k, l, m) in Fig. 4(c), while the rest shows a completely decorrelated phase profile. The data in Fig. 4 illustrates why a direct translation of our prior work [29] is severely impaired by NURD. We can define a NURD mapping functionl = M(l, m), which describes the index l in frame m which corresponds to the same sample location of index l in frame m − 1. With knowledge of M(l, m), we can replace F(k, l, m) withF(k, l, m) = F(k,l, m) = F(k, M(l, m), m) in Eq. (7), whereF represents the NURD-corrected complex tomogram. There are two challenges now to compute CDV, first, accurately determining M(l, m), and second, calculatingF.

B-scan alignment and interpolation
The presence of NURD not only produces a deformation in the image, but also implies that each B-scan is not guaranteed to contain exactly 360 • of rotation. The goal of the first step in Fig. 2 is to create B-scans with a complete 360 • azimuthal range. To this end, we defined the start of each frame to be at the left electrode rail (the conductor connecting all the electrodes belonging to one of the poles), and we performed the registration of this feature for each frame, similar to the process in Ref. [38]. This feature is very easy to detect automatically, because of the strong reflection and the shadow cast onto the tissue. For raw frames with less than 360 • , it is necessary to wait for the acquisition of the next frame in order to create a frame with a full rotation. The small delay (80 ms) in this case is unlikely to be detrimental to the performance in a real-time setting. Although this step guarantees that the first and last A-lines in each frame correspond to the same physical location in the sample, now B-scans have a variable number of A-lines. We define the number of A-lines in B-scan m as l max (m) Generally, it is straightforward to transform a B-scan from a given number of A-lines to another by using an interpolation technique. However, in this case, we were working with complex-valued B-scans. From Fig. 4(b), it is clear that the length scale of the phase features in the lateral dimension is much smaller than the speckle size seen in Fig. 4(a), revealing that the phase has been heavily corrupted by the phase jitter of the system. Any interpolation of the complex signal will produce an erroneous result because of these artifacts. For this reason, before interpolation we need to correct for phase jitter. The employed system had Nyquist sampling in the lateral scanning direction, which means that the speckle in adjacent A-lines is partially correlated. Therefore, we compute a phase-jitter corrected tomogram aŝ where we have made explicit the polarization channel index p, and where we define (l = 0, m) ≡ (l = l max (m − 1), m − 1), which guarantees phase stability between B-scans. The sum over the available polarization channels is warranted because the phase jitter is not polarization dependent. This correction is equivalent to the bulk-motion correction used in Doppler OCT [53]. After this step, linear interpolation of the real and imaginary parts -independently-of the complex tomogram was performed to resample B-scans with arbitrary number of A-lines to exactly 4096 A-lines. Figure 5 shows the results from this step, which produced high-quality complex tomograms unaffected by phase jitter. The absence of timing-jitter artifacts after this correction suggest that the frequency shifter was the main responsible for the phase instability in the acquired tomograms.

Determining the NURD mapping function
The function M(l, m) was then determined using feature tracking. We first used the speeded up robust feature [54] (SURF) algorithm, an accelerated variant of the scale-invariant feature transform (SIFT) feature descriptors [55]. The input to the algorithm was the logarithmic intensity of the tomogram without any downsampling, in order to detect individual speckle as features. Then, we used the Kanade-Lucas-Tomasi (KLT) feature tracking algorithm [56] across each pair of frames, which provided the coordinates in each frame for all the speckle detected by SURF. Virtually every speckle in each pair of tomograms is being tracked. For real-time implementation, there exist very fast implementations of these algorithms [57]. Figure 6 shows the feature tracking analysis between frames 24 and 25, a pair in which the non-rigid deformation introduced by NURD is evident. Figure 6(a) shows the individual speckle tracked between the two frames exhibiting a clear collective lateral motion, consistent along depth, due to NURD. The output from feature tracking was a set of coordinates of the tracked speckle (t x (s), t y (s)), where s is an index identifying each tracked speckle, and the corresponding vectors of lateral and axial motion (∆ x (s), ∆ y (s)). Figure 6(b) shows ∆ x (s) as a function of t x (s) for the full frame 25, where the region under the electrode was starting to show early signs of coagulation. The quality of the tracking was in general very high outside the active region, and the total number of tracked points was consistently around 20,000-30,000 per frame [blue dots in Fig. 6(b)]. Under the electrode, because the speckle pattern changes between frames, a number of false positives were seen. Because the size of the electrode is known and the start of each frame corresponds with the beginning of the electrode region, false positives were easily removed by ignoring all tracked points below the electrode surface. The balloon and the electrode surface still provided useful speckle and intensity features to allow feature tracking to work effectively in the electrode region. Additional false positives throughout the frame were identified because, for a given lateral location t x , the lateral motion ∆ x and axial motion ∆ y should be similar for all true positives. Defining∆ y as a moving median average, any points with |∆ y | > 0.25∆ y were discarded as outliers. The remaining valid data points, shown in red in Fig. 6(b), are generally around 10,000-20,000 per frame. A final median and spline filtering was performed, and then the vector of ∆ x was interpolated for all integer lateral locations in order to create the mapping speckle location frame m-1 speckle location frame m speckle motion  Figure 7 shows the results of the NURD correction. We omit the tomogram intensity and phase as changes at this scale are too small to visualize. More importantly, the modified cross-correlation coefficient in Fig. 7(a) shows now a high correlation across all axial and lateral locations, with only some dependence on signal intensity, which will be addressed later. Figure 7(b) shows a very homogeneous phase shift, and highlights the fact that the tissue had some level of bulk axial motion (prevalence of a positive phase difference), which could be caused by the tissue thermal expansion. There was a clear axial gradient of the phase difference in the tissue under the electrode, supporting this hypothesis.

Noise calibration
The initial 10 frames in each dataset, where the RF energy source was off, were used to calibrate the noise effects onto the CDV. To this end, the frames were downsampled 10 times in x and z (A-line and depth, respectively) using linear interpolation to reduce the variability and size of the data, and an additional median filter was applied to further reduce noise. Each frame was divided into 20 segments of equal length along x, and the segment with the lowest average CDV was taken as a reference. The CDV as a function of signal intensity in dB, I log , of the data contained in the 10 resulting segments (one per frame) was used to fit a single exponential function γ = f a exp ( f b I log ). Then, the calculated CDV in all subsequent frames was modified to account for the noise contribution as where f c is a global offset, estimated at f c = 0.1 in the empty region above the tissue.

Display and cumulative CDV
Before displaying the CDV, spatio-temporal filtering was applied to γ SNR , consisting of a running median and mean filter with a kernel size of 1×1×3 px×px×frames and 10×10×3 px×px×frames, respectively. The kernels were shifted in time in order to use only information available up to that point: frame m was filtered using frames [m − 2, m − 1, m]. The CDV cross sections were created using a nearly-isoluminant map (a truncated version of ametrine from Ref. [58]) to map the value of γ SNR onto the hue, and the structural information I onto the luminance. Finally, the B-scan was transformed from polar coordinates into cartesian coordinates, for intuitive display in the clinical setting. Given the use of a single time pair, the final CDV can be displayed at 6.1 frames per second. In all figures below, we indicate the time since turning on the RF source with t and we round to the nearest integer. The cumulative CDV (cCDV) κ in frame m was calculated using In a similar fashion to CDV, cCDV cross-sectional overlays encode κ in the hue (untruncated ametrine colormap) and I in the luminance, and were converted into a cartesian coordinate system. The RFA therapy was configured with V RMS = 20 V at 500 kHz for 10 s. A region with increased CDV (purple) was observed at the center of the electrodes in the epithelium up to the lamina propria (t = 7-8 s) and eventually reaching slightly into the muscularis mucosa (t = 9-10 s). Imaging was performed until after the RF source was turned off at t = 10 s to illustrate the rapid decrease in CDV. ee, edges of electrode; e, epithelium; lp, lamina propria; mm, muscularis mucosa. The two-dimensional color bar indicates the CDV γ SNR range in hue, and the image intensity I in dB in luminance. Scale bars are 1 mm in tissue. See Visualization 1 for the full video.

Histological Analysis
Porcine esophagus was harvested and kept in ice until the ex vivo experiments performed at room temperature. The ablated tissue was kept in ice, dissected, and cut in half at the center for histological validation. Nitroblue tetrazolium chloride (NBTC) histology (with eosin counterstain) was performed after embedding in optimum cutting temperature compound for frozen sections at 10 µm. NBTC staining is used to detect thermal coagulation of tissue [59], which leads to loss of activity of NADH-diaphorase. Figure 8 shows the ability to monitor the thermal coagulation process in the integrated OCT-RFA catheter using our processing algorithm. The rise in CDV was initially limited to the epithelium from t = 7-8 s and reached the muscularis mucosa at the endpoint of the therapy at 10 s with an applied voltage of V RMS = 20 V at 500 kHz. Upon cessation of therapy, the rapid drop in CDV was evident from t = 10 s to 12 s with some residual heating at t = 11 s. The loss of the CDV signal shows one of the limitations of using the instantaneous CDV to visualize the coagulation zone: CDV highlights the active coagulation process, but it is unable to distinguish between intact and already coagulated tissue. The cumulative CDV, illustrated in Fig. 9, demonstrates the ability to more clearly visualize the evolution of the coagulation zone, especially after the RF energy was turned off at t = 10 s. To investigate the impact of the applied RF voltage on the time course of ablation and to determine our ability to detect coagulation well beyond the lamina propria, the RF voltage was increased to V RMS = 25 V at 500 kHz and the same experiment was repeated for an ablation time  Fig. 9. Radiofrequency ablation monitoring using cumulative CDV (counterpart to the experiment in Fig. 8), demonstrating the ability to more clearly visualize the evolution of thermal coagulation process in porcine esophagus ex vivo. A growing region with increased cumulative CDV (red/purple) was observed in the epithelium and eventually reaching slightly into the muscularis mucosa. ee, edges of electrode; e, epithelium; lp, lamina propria; mm, muscularis mucosa. The two-dimensional color bar indicate the cCDV κ range in hue, and the image intensity I in dB in luminance. Scale bars are 1 mm in tissue. See Visualization 2 for the full video. of 10 s. Figure 10 shows that the CDV signal became visible at t = 5-6 s, compared to t ∼ 7 s for 20 V RMS . The residual CDV was also slightly higher at t = 11-12 s after the RF source was turned off at t = 10 s. The cCDV (Fig.11) further suggests that the thermal energy was deposited mostly in the epithelium up to t = 8 s and reached all the way through the muscularis mucosa by the end of the therapy (t = 10-12 s). It is interesting to note that in this voltage setting the CDV started to increase in a columnar fashion, in contrast to the more laterally homogeneous increase seen in the V RMS = 20 V setting. We believe the focal increase in CDV could have been caused by the formation of gas bubbles, as observed in the past in the case of laser therapy [29]. Figure 12 shows the histological validation of the cumulative CDV maps for both power settings at V RMS = 20 V and V RMS = 25 V using NBTC staining (3 trials for each setting). At V RMS = 20 V, the thermal coagulation zone as visualized with cumulative CDV consistently extends beyond the lamina propria (lp) into the muscularis mucosa (mm), without compromising the submucosa (sm) or muscularis propria (mp). Figures 13(c) and (d) show the magnified view of the same dataset shown in trial 1 [ Fig. 12(a)], demonstrating an estimated lesion depth of ∼ 590 µm using the cumulative CDV map, compared to corresponding NBTC histology showing a lesion depth of ∼ 700 µm at the center [ Fig. 13(e)]. For the V RMS = 25 V setting, the results are also very similar: the cumulative CDV map and NBTC histology agree on the injury zone, both suggesting that thermal injury extends all the way through the muscularis mucosa up to the submucosa, although the absolute measured depths (∼ 670 µm by cCDV and ∼ 900 µm by NBTC histology), as shown in Figs. 13(d) and (f), are different. Histological analysis of other datasets shown in Fig. 12 reveal a consistent trend, in which the exact treatment depth delineated by NBTC is higher than the depth revealed by the cCDV map, but more importantly show similar relative injury depth with respect to important anatomical landmarks (lp and sm). We also confirmed in the histology that the injury depth produced by our electrode was fairly uniform in both directions (not shown in Fig. 12). For example, the in-plane extent (along x) of the injury was ∼ 7.5 mm at the V RMS = 20 V setting, and the out-of-plane (along y) depth varied within 680-730 µm when measured every 200 µm in between two electrodes. From these images, it is clear that the injury depth is very similar, when determined with respect to the tissue layers. The difference in depth stems from the fact that during imaging, the esophagus was significantly stretched due to the balloon inflation pressure. Despite the possible shrinkage of tissue during histological preparation, the balloon pressure onto the esophagus is very high; therefore, we expect that all depth measurements during imaging will be lower than in the histology slides. However this is not an important drawback because therapy guidance requires the knowledge of the injury depth as defined with respect to the tissue layers, as the objective is to ablate only up to the muscularis mucosa -without injuring the submucosa or muscularis propria-to reduce the risk of stricture formation, regardless of the specific thickness of the layers at the time. It is also important to note that an hypothetical treatment plan would be created from the OCT imaging of the esophagus in a given patient, and therefore both the pre-treatment imaging as well as the treatment real-time monitoring will be carried out with the esophagus in similar conditions, i.e. under balloon pressure. This implies that the differences in absolute treatment depth found between OCT imaging and histology are not relevant in this scenario.

Discussion
Using our integrated balloon OCT-RFA therapy delivery and guidance system, we demonstrated direct, label-free visualization of the thermal coagulation process during RFA at high-resolution with the NURD-corrected, calibrated CDV as well as the cumulative CDV metric in porcine esophagus ex vivo. We showed the ability to monitor therapy and determine the injury depth with respect to important landmarks from the epithelium all the way through the muscularis mucosa (up to the submucosa), as confirmed by NBTC histology. This is an important finding, because even though coagulated tissue exhibits an increase in scattering and affects the penetration of OCT imaging, we can still determine the ablation zone well beyond the epithelium into the submucosa, which is clinically an important boundary to prevent damage to the muscularis propria and to lower the risk for stricture formation. While the exact lesion depth delineated during imaging in the inflated balloon catheter configuration differed from the NBTC-delineated depth due to the stretching of the esophagus and relative flattening of the layers, it is important to point out that the target depth would be determined based on the location of the submucosa and muscularis propria in the clinical setting. Therefore, it is the relative target depth (relative to all the anatomical layers and landmarks) that is critical to monitor during ablation and the results demonstrate the ability to delineate the relative depth accurately with respect to important target structures. In our current study, we demonstrated cross-sectional imaging at the center of the flexible RFA electrode array given the uniformity of the lesion depth across the electrodes from our COMSOL simulation and histological data. A limitation of the current approach is that simultaneous pull-back is not possible across different bipolar electrode pairs to generate a 3-D ablation map in real-time and the exact ablation depth directly under the electrodes cannot be determined. However, this is can be overcome by designing multiple RFA arrays along the balloon catheter that are individually activated such that the ablation depth can be tailored across a non-uniform lesion. In addition, the width and spacing of the electrodes can be further optimized for clinical application to maximize the imaging window available. For example, in the clinical setting, since the current electrode array covers a 50 • region, azimuthal groups of 8 independent electrodes will cover the full 360 • circumference of the catheter, and 6 groups arranged longitudinally will cover the full length of the catheter. Each electrode will be wired independently, allowing the treatment of different lesions at specific depths, without requiring the balloon catheter to be deflated for relocation onto different regions of the esophagus. To further improve our SNR, a more transparent material for the substrate of the flexible electrode array can be used. Finally, it is important to note that due to the imaging speed of the clinical system employed (50 kHz with an effective frame rate of 12.2 fps), our target therapy duration was set to 10 seconds to provide sufficient number of frames for subsequent processing. In comparison, the clinical Barrx 360 RFA system achieves a target depth of approximately 700-800 µm within 1 s using a pulsed RF ablation scheme [49]. For future in vivo experiments, a faster imaging system will be employed and the therapy duration can be reduced with a higher RF voltage, and pulsed RF ablation schemes will be further explored as well.
A limitation of our technique is the fact that cCDV is a surrogate for tissue coagulation, and not a direct metric. For this reason, different types of tissues will in general exhibit different CDV values under coagulation. The particular CDV value also depends on the frame rate of the OCT system used for monitoring: it is clear that a sufficiently fast system may capture a more progressive change of the speckle pattern during coagulation; therefore, a lower CDV level will be observed. The same is true for the cumulative CDV metric: it is highly dependent on tissue type and OCT system frame rate. We also want to point out that it is not possible to assign a defined tissue state (coagulated/not coagulated) to a specific value of the cCDV, as a one-to-one relationship is not expected to exist due to intrinsic differences in tissue response to therapy. Rather, the inspection of the cCDV two-dimensional map reveals changes in the cCDV that are intuitively mapped to tissue state. Similarly, the colorbar limits in Figs. 8-11 were chosen because they work well for porcine esophagus in our system, but translating this technique to patients will require calibration with the corresponding imaging system. While we have focused on the use of CDV in our study due to its ability to provide better contrast for a given number of measurements and equivalent spatial averaging than intensity-based decorrelation metrics [60], the use of the latter may decrease the computational requirements for real time implementation.
Currently, all the processing is performed after the measurements are obtained. A realtime implementation is required to make this technique useful for monitoring in the clinical environment. There are a number of approaches to speed up the feature tracking step with SURF, including GPU-based and FPGA-based approaches [61,62], which have enabled its use on mobile platforms for image-stitching and other feature matching applications. In addition, the type of motion artifacts in vivo will be different than in our experiments. For instance, heartbeat and respiratory motion artifacts may in general deform the tissue, rather than produce the bulk motion due to tissue expansion that we have observed in our experiments. Full in-vivo motion correction may require more sophisticated algorithms which could be implemented as an extension to our feature-tracking step. However, it is important to note that our technique is only impacted by motion between adjacent B-scans, and these types of physiological motion generally have longer time scales. For this reason, and especially if a faster imaging system is used, we do not believe that heartbeat and respiration will cause significant problems for our processing.
As seen in Figs. 8-13, the contrast between tissue layers can vary greatly, and sometimes it is difficult to identify their boundaries. More consistent contrast should be possible with PS-OCT, because muscle layers have a measurable level of retardation at baseline [23] and can provide complementary contrast to further delineate injury boundary and minimize risk of stricture formation.
Finally, there was a lack of a clear transition from high CDV to low CDV when tissue coagulates, as we observed in the past [29]. This may be due to mechanistic differences between laser thermal therapy and RF ablation, although the consistently high CDV seen here in the epithelium even after the coagulation front has reached the muscularis mucosa is not well understood. For example, RFA generates heat in tissue based on ionic agitation with RF energy which may perturb the origin of the CDV signal microscopically despite achieving thermal coagulation. However, this property actually allowed us to simplify the analysis, discarding the coagulation metric we used in the past [29] and relying exclusively on the cumulative CDV, which correlates well with histological analysis.
Future translation of the current approach to the clinic would enable the concept of conformal thermal therapy with individualized zone and depth control achieved with a segmented electrode array design, which will be able to deliver different total energy in each targeted area. This is similar to the concept of 3D conformal radiation therapy used in cancer treatment, leading to more personalized treatment planning for BE patients.

Conclusion
Clinically, RFA is currently an approved and widely adopted treatment modality for Barrett's esophagus patients with dysplasia. Its current use in low-grade dysplasia and non-dysplastic BE is controversial due to the potential risk of complications and cost of treatment associated with repeated sessions.
Our technique is based on the analysis of the temporal fluctuations of the speckle pattern using CDV when the tissue is under active coagulation. With a new RF electrode design together with a clinically compatible OCT balloon catheter, and by correcting for NURD and motion artifacts, we defined an SNR-calibrated CDV and a cumulative CDV metric for accurately monitoring RFA therapy and injury depth, as confirmed by histological analysis in porcine esophagus ex vivo.
We have presented a new thermal therapy monitoring tool that provides an accurate measure of the thermal coagulation zone, with the potential to increase single-procedure efficacy and reduce complication rate by improving the accuracy of lesion targeting. Clinical translation of this tool may open up the possibility of using RFA in treating Barrett's esophagus with low-grade dysplasia and non-dysplastic Barrett's esophagus when treatment response is highly favorable in the future.

Funding
National Institutes of Health (NIH) (P41 EB015903, K25 EB024595). WCYL was also supported by the Canadian Institutes of Health Research (CIHR) Doctoral Foreign Study Award and Harvard Presidential Scholarship.