Enhanced volumetric visualization for real time 4D intraoperative ophthalmic swept-source OCT

Current-generation software for rendering volumetric OCT data sets based on ray casting results in volume visualizations with indistinct tissue features and sub-optimal depth perception. Recent developments in hand-held and microscope-integrated intrasurgical OCT designed for real-time volumetric imaging motivate development of rendering algorithms which are both visually appealing and fast enough to support real time rendering, potentially from multiple viewpoints for stereoscopic visualization. We report on an enhanced, real time, integrated volumetric rendering pipeline which incorporates high performance volumetric median and Gaussian filtering, boundary and feature enhancement, depth encoding, and lighting into a ray casting volume rendering model. We demonstrate this improved model implemented on graphics processing unit (GPU) hardware for real-time volumetric rendering of OCT data during tissue phantom and live human surgical imaging. We show that this rendering produces enhanced 3D visualizations of pathology and intraoperative maneuvers compared to standard ray casting.


Introduction
Optical coherence tomography (OCT) allows for imaging of the anterior and posterior segments of the eye with micron scale resolution. Previous intraoperative OCT imaging was performed using hand held spectral domain (SD) systems during pauses in surgery [1,2]. To minimize disruptions in surgery, both custom research and commercial SD-OCT systems have been incorporated into surgical microscopes . These spectral domain microscopeintegrated OCT (SD-MIOCT) systems have been limited to live B-scan imaging because the 20-30 kHz A-scan rate SD-OCT engines they incorporate do not support real time volumetric imaging and lack the computational capability for real time volumetric rendering. We have recently reported a swept-source (SS) based MIOCT system capable of real time volumetric acquisition and basic 3D rendering (up to 10 volumes/sec) in both model eyes [24,25] and live human surgery [26][27][28][29][30]. Live volumetric or "4D" OCT visualization can combine depth information that is only inferred in the conventional en-face view through the microscope from indirect cues such as stereoscopy and instrument shadowing, with lateral context that is unavailable in B-scan only SD-OCT imaging. We believe that this combination of information could assist with the real time visualization and evaluation of surgical maneuvers, such as retinal membrane peels, which require axial manipulation of the tissue and visualization of the relative motion of the tool and tissue in the lateral dimensions.
The development of improved volumetric rendering algorithms which generate high visual quality and are scalable in speed are further motivated by the limitations of real time B-scan display of volume data, constant improvements in OCT acquisition speed [31][32][33], as well as new applications which require simultaneous volume rendering from multiple viewpoints. When performing real time volumetric imaging at 100kHz, B-scan display has difficulty conveying volumetric information. Either the volume of B-scans can be played through (at hundreds of B-scans per second) or a single B-scan can be shown. The play through of the first is so fast that it is difficult to perceive and the second ignores the volumetric data. Our MIOCT system also incorporates a novel stereoscopic heads-up display which requires rendering of live volumes from two angularly separated viewpoints [34], and we anticipate that future implementations will require multiple simultaneous monoscopic or stereoscopic visualizations for group viewing.
3D scalar volumes can be rendered in several ways [35]. If the surface or object of interest can be defined or segmented from the volume, a traditional vector graphics pipeline such as OpenGL or DirectX [36] can be used. These hardware accelerated pipelines have the capability to quickly produce high quality surfaces. However, high quality segmentation of OCT volumes is a time consuming and difficult task which has evolved into its own specialty of OCT image processing. Most segmentation algorithms that have been reported to date are optimized for multilayer segmentation instead of fast segmentation of the nerve fiber layer [37][38][39][40][41]. Direct volume rendering relies on generating renders directly from the voxel values. This requires an optical model for how each voxel emits and absorbs light based on its scalar value. These models can be extremely complex and computationally expensive when used to generate photo realistic renders for computer graphics applications. Volumetric medical imaging modalities capable of real-time imaging typically use a simpler approach called ray casting, a computationally convenient single pass absorption plus emission model based on Beer's Law [42]. Basic ray casting has been used in OCT for real time display of OCT data [43], ophthalmic OCT data [44][45][46], in-vivo intrasurgical OCT data [24] and ex-vivo OCT data acquired at up to 51 volumes per second [47]. However, when compared to the quality of volumes rendered in post-processing using software such as Matlab (Mathworks; Natic, MA), AVIZO (FEI Visualization Sciences Group; Burlington, MA), or other commercial volume rendering software [48], ray casting produces renders with surfaces that appear hazy with poor depth definition. In this paper, we report on the development and implementation of an enhanced ray casting model that adds volumetric filtering, edge enhancement, feature enhancement, depth based shading, and Phong lighting while remaining computationally efficient. We have implemented this enhanced model on GPU hardware with sufficient performance to produce high quality volumetric renders at over 60 renders/second. We demonstrate this new rendering pipeline integrated into our real-time, volumetric SS-MIOCT system for significantly improved visualization of volumetric features.

Theory of enhanced volume rendering
Ray casting is a standard computationally convenient absorption plus emission optical model that describes the generation and propagation of light through a scalar volume [42]. A basic ray casting algorithm consists of five steps: projection, thresholding, classification, shading, and compositing. The flow chart in Fig. 1 shows the steps of the basic ray casting algorithm indicated in the square blocks. Our additions to the ray casting algorithm are shown in the oval blocks, and are discussed in the following paragraphs. In basic ray casting, each pixel in the output image (I(x,y)) is computed by projecting a ray (r) from the view point, through the image plane, and into the volume (Fig. 1 right). At M discrete points ( i r ) on the ray, the voxel value is interpolated and thresholded. The color ( () i cr ) and opacity ( () i cr ) of the current location are assigned by piecewise classification and shading transfer functions. The emission from the voxel is defined as its color multiplied by its opacity. The emission from each voxel is attenuated by the opacities of the other voxels between it and the viewer via Beer's Law. The algorithm iterates down the ray and the output color of the pixel in the image is determined by front to back compositing [42,49]: To improve the performance of basic ray casting, we first added volumetric filtering to denoise the volume and to create smooth surfaces. We implemented a 3x3x3 median filter as an edge preserving filter and a 5x5 two-dimensional Gaussian low pass filter operating on Bscans as a smoothing filter. This combination has been used in direct volume rendering of ultrasound volumes [50,51] and OCT volumes in post processing [24]. In our experience (reported below) this filter combination provided enough performance to de-noise the volume, while remaining algorithmically and computationally simple.
After filtering, the volume was thresholded. While thresholding could be done with the shading and classification transfer functions, we found that having a separate thresholding step made interaction with the rendering user interface easier for the end user. Instead of having to precisely manipulate the amplitude and position of multiple points on two transfer functions, having an independent threshold allows the transfer function to remain relatively constant. Instead, only a single threshold value needs to be manipulated.
In basic ray casting, the color and opacity values from the transfer functions are passed directly to the compositing function. In our enhanced ray casting method, we made four modifications (denoted with ovals in Fig. 1) to these values to emphasize specific volumetric features. First, the original opacity ( ( )) i r  was scaled by one plus the scaled (k g1 ) gradient magnitude to add gradient based edge enhancement to the image [42,49]: Gradient based edge enhancement works well for ophthalmic data sets where it is desired to render the interface between transparent regions (i.e., the vitreous of the eye) and a tissue surface. The gradient was estimated using a discrete gradient operator in the x, y, and z directions. Raising the gradient magnitude to the power of k g2 allowed for non-linear tuning of the classification function.
Feature enhancement relies on visual cues provided by the edges of an object. In a 3D scene with multiple objects, much of the relative depth and positional information can be conveyed by drawing edges of the objects. In our volume rendering enhancement, object edges were emphasized by increasing the opacity of voxels where the gradient is orthogonal to the view direction (V) [49]: Here, k f1 and k f2 determine the scaling and non-linear tuning of the transfer function, respectively. Increasing the opacity of edge voxels tended to make them darker, as the opacity threshold is reached faster and less emission is accumulated by the compositing function.
Depth perception was enhanced by including intensity and color based depth queues. Dimming voxels that are further away can add an illusion of depth to the volume render. This can be furthered by adding a slight blue hue to far away voxels as per Eq. (4) [49]: This technique is often used by artists and relies on the tendency for cool colors (such as blue) to fade into the background [49]. k d1 and k d3 weigh the relative contribution of intensity and color based shading, d v is the fractional depth along the ray, c b is the color of the depth based hue shading (blue), and k d2 allows for non-linear depth based shading. Finally, to simulate lighting we added the Phong reflection model. The Phong reflection model is a well-known lighting model which produces the illusion of smooth surfaces at reasonable computational cost. It is the weighted contribution of ambient background lighting, Lambertian reflectance, and specular reflection [42,52]: Here, k p1 , k p2 , and k p3 are the ambient, diffuse, and specular reflection coefficients respectively, k p4 allows for non-linear tuning of specular reflection, N( i r ) is the normalized gradient, L is the lighting direction, and H is the vector in the direction of maximum highlight defined as the normalized sum of the lighting direction and the view direction vectors.

Methods
Volumetric images of both ex vivo porcine and in vivo human retinas were acquired using a 100 kHz A-line scan rate SS-MIOCT system, described previously in [30]. The system used a swept source 1060 nm laser (Axsun Technologies; Bilerca, MA) and a Mach-Zender topology interferometer. Light from the laser was collimated and introduced into the infinity space of the surgical microscope using a custom microscope attachment and a dichroic mirror [5]. The signal was detected with a 1 GHz balanced photoreceiver (PDB481C-AC, Thorlabs; Newton, NJ) and sampled at 800 MS/s with a 1.8 GS/s digitizer (Alazar; Quebec, Canada). Custom CUDA (NVIDIA; Santa Clara CA) and C++ software performed the acquisition and necessary signal processing of the data. Software was benchmarked on an NVIDIA GTX Titan Black GPU with 2880 CUDA cores and 6144 MB of GDDR5 memory. The 6 dB falloff and the axial resolution of the system were measured to be 3.9 mm (Fig. 2) and 7.8 µm in air respectively. Full-depth, complex conjugate resolved non-intraoperative anterior segment volumes were acquired with a long depth of focus sample arm and the same engine [53]. Additional in-vivo volumes of non-sedated pre-term babies with retinopathy of prematurity were acquired at the bedside using a commercial SD-OCT handheld system (Envisu c2300, Bioptigen; Morrisville NC) with a 32 kHz A-line rate. Importing the data from the commercial system into our OCT rendering software required converting the OCT data to the range and format expected by our software. To date, we have imaged over 50 human surgeries at the Duke Eye Center with the SS-MIOCT system and enhanced rendering pipeline, and we have visualized over 30 bedside preterm infant retinal volumes using the enhanced rendering pipeline. All human subjects imaging research was performed under protocols approved by the Duke Medical Center Institutional Review Board. Optical power from the intraoperative MIOCT system was set to 1.6 mW before the start of each imaging session, which is below the ANSI standard for maximum permissible exposure at 1060 nm. When imaging during human surgery, surgical illumination was reduced to ensure that the combination of OCT light and surgical illumination posed no additional hazard when compared to the original surgical illumination.
After being processed by our CUDA software, the volumes were de-noised using custom GPU implementations of median and Gaussian filters. A classic histogram based approach for the median filter performed poorly on current generation GPU hardware (up to CUDA compute capability 3.5) because the CUDA cores had inadequate low latency memory (registers) to support a sufficient number of parallel threads (each thread requiring more than the allowed 256 registers) to leverage the computational power of the GPU [54,55]. Instead, we employed a forgetful selection algorithm [54] to calculate the median. A diagram illustrating forgetful selection is shown in Fig. 3. Forgetful selection relies on the fact that the median of a set of n numbers cannot be the extrema of an n/2 + 1 subset of those numbers. Starting with an n/2 + 1 subset of the data, the extrema of the subset are identified and removed from the subset. An additional value is added to the subset from the set and the algorithm is iterated until only the median remains. While algorithmically more complicated, forgetful selection only requires n/2 + 1 registers. For a 3x3x3 median filter, this allowed all sorting operations to take place within registers while also allowing the CUDA cores to support a sufficient number parallel threads. Each thread was assigned to a pixel in a B-scan of the output volume. The thread iteratively calculated the median in the slow scan direction. Memory access latencies were hidden by shared memory blocking, a standard form of explicit caching [54].
The 5x5 2D Gaussian filter was implemented using the same blocking approach and thread structure as the median filter [55]. Since the Gaussian kernel was small, performing convolution was faster than taking the 2D Fourier transform of the data. Both filters were performed on data saved as 8-bit unsigned integers. Finally, real-time volumetric ray-casting was implemented in CUDA by assigning each pixel in the output image to a separate thread. Data was passed from the filtering kernel to the rendering kernel. In the rendering kernel, each thread projected a ray from the user controlled virtual camera, through the image plane, and into the volume. Using geometry, the thread determined if the ray intersected the volume; if not, the output pixel was set to the background color (black) and the thread terminated. The ray was broken up into a series of discrete intervals and the scalar value at the current point was interpolated from the volume. The modified color and opacity were calculated from the scalar value and passed to the compositing algorithm (discussed in the previous section). In order to reduce computation time, if the accumulated opacity along a thread passed a certain threshold, the thread terminated as there was no significant accumulation of light from voxels further along the ray. Memory accesses were hidden using implicit caching by storing the filtered volume in 3D GPU texture memory. Texture memory was chosen because it supports accelerated trilinear interpolation.
The parameters used in our enhanced rendering engine were empirically determined by optimizing edge enhancement, feature enhancement, depth encoding, and Phong shading in order. Once determined, all but the gradient magnitude (k g1 ) and the lighting direction (L) remained constant for all data shown in this paper. Default values k g1 , L, and the constant values for the rest of the parameters are given in Table 1. We found that that the scaling of the gradient magnitude (k g1 ) and the lighting direction (L) had the most noticeable impact on image quality and our intraoperative software allows the user to further tune these parameter. However, we found that most users only modified the threshold and the transfer function to perform live optimization of the volume render. Rendering of MIOCT data for anterior segment versus retinal surgery required different tuning of algorithm parameters. Unlike for retinal images in which the first surface encountered by a ray was rendered as opaque, in anterior segment images it is important to render the cornea transparent to allow for visualization of deeper structures. To allow for improved visualization of the anterior chamber, the contribution of opacity from edge and feature enhancement (k g1 and k f1 respectively) were reduced by a factor of 10.

Results
In order to demonstrate improvements from our enhanced ray casting pipeline, Fig. 4 shows the sequential addition of steps to the pipeline for live SS-OCT visualization of a simulated surgical maneuver in a porcine eye. As seen in the figure, filtering smoothed the surface and removed opaque voxels in the vitreous that obscured the surface in the original unenhanced render. Edge enhancement made the surface appear sharper and added 3D features to the render. Feature enhancement furthered this effect and added additional texture to the volume. In addition, it increased the contrast between objects at different depths. Depth based shading had the most subtle effect, but increased the contrast between objects at different depths. Phong lighting added additional depth queues and three dimensional features by including information about surface orientation.
When used for anterior segment rendering the retinal rendering settings produced renders in which the surface of the cornea is opaque, blocking visualization of features in the anterior chamber. Figure 5 shows a comparison of basic ray casting, enhanced ray casting with retinal settings, and enhanced ray casting with anterior segment settings for visualization of the anterior segment of the eye in a living patient. Enhanced volume rendering with cornea settings removed substantial spurious noise, rendered the cornea more realistically transparent, and allowed for enhanced visualization of depth features of the iris.  To demonstrate that the enhanced rendering pipeline supported real time performance, the NVIDIA profiler was used to generate a timing diagram (Fig. 6). The timing diagram illustrates the computational performance of the enhanced ray casting and OCT acquisition pipelines generating a 425x600 render (the default render size in our intraoperative software). In the acquisition pipeline, data was acquired and processed in groups of 16-B-scans. For 512x688 B-scans, a group of 16 B-scans took 86 ms to acquire and 18 ms for standard OCT processing. We demonstrate that the enhanced rendering pipeline has sufficient computational throughput to filter the acquired group of B-scans (10.0 ms) and render (2.5 ms) the entire volume before the start of the next acquisition. While this was a 11.8 ms increase in computational cost over basic ray casting (700 μs), the total cost for acquisition, OCT processing, filtering, and display was 30.5 ms, much less than the acquisition time of 86 ms. Furthermore, the enhanced rendering kernel was fast enough to allow for the volume to be interactively manipulated and re-rendered multiple times between acquisitions with no perceived delay to the user. The high speed obtained also enables visualization applications where more than one rendering from multiple viewpoints is desired, such as live stereoscopic display of volumetric SS-MIOCT data. Stereoscopic display requires renders to be offset by a fixed rotation and enhances user perception of depth features in the volume when compared to monoscopic display [56]. With our rendering pipeline, stereoscopic rendering required only an additional 2.5ms. A stereoscopic pair that was used for intraoperative visualization through a novel stereoscopic heads-up display incorporated in the surgical microscope is shown in Fig. 7. Outside of the operating room we have also used stereoscopic rendering combined with a 3D monitor and NVIDIA 3D Vision glasses to enable live group viewing of MIOCT data in a microsurgery research laboratory. Fig. 7. Intraoperative MIOCT volume of forceps peeling a membrane rendered as a stereoscopic pair, with a 9 degree offset. This data was displayed live to the surgeon through a novel stereoscopic heads-up display incorporated into the surgical microscope. Volumes were acquired at 300 A-scans per B-scan and 128 B-scans per volume over a 5mmx5mm region. Volumes were re-rendered for display outside of the OR using the same pipeline as the intraoperative software.
Intraoperative SS-MIOCT provided live visualization of pathology and response to treatment. Figure 8 shows renderings before and after removal of an epiretinal membrane (ERM) adjacent to a macular hole. Removal of the pathological ERM, decrease in macular hole size after ERM removal, and foveal elevations due to inner limiting membrane (ILM) peeling with intraocular forceps are clearly visible in the post-operative volumes. In addition to providing new prospective on pathology, volume visualization provided lateral context that was unavailable to B-scan only imaging. Figure 9 shows intraoperative visualization of the start of an ILM peel around a macular hole using a membrane scraper in a human eye, a common task in macular surgery. Volume rendering allowed for clear visualization of the tool and the macular hole, both of which are evident on both B-scans and volume renderings. However, volume rending allowed for visualization of the groove left in the retina by the surgical tool. This track was difficult to visualize in the B-scans because the depth and width of the track required 3D context to observe.
Finally, we illustrate the use of the enhanced rendering pipeline for a non-surgical OCT, imaging of pre-term babies with retinopathy of prematurity, with a commercial hand held SD-OCT system at the Duke University Hospital. Application of our rendering pipeline allowed for improved visualization of volumetric features and differentiation of the states of abnormal neovascularization and vascular budding (Fig. 10). In particular, volume rendering allows for visualization of the abnormal vascular growth as it extends above the surface of the retina. This was not readily visible during clinlinical examination and the clinical impact of this new information is still being studied [57]. Note that this volume has significant motion artifacts due to the slower scan rate (32 kHz) of the SD-OCT engine.  Volumes were acquired at 300 A-scans per B-scan and 128 B-scans per volume over a 5mmx5mm region. B-scans are tracked to the tool location in post processing. Renders were re-rendered for display outside of the OR using the same pipeline as the intraoperative software. Fig. 10. Visualizations of pathologic neovascularization in retinopathy of prematurity in an infant who was born three months premature and was imaged two months later. Fan-shaped neovascularization (NV) is visible on the back left corner with smaller neovascular buds (NB) at the right. Volumes were acquired with a Bioptigen hand held SD-OCT system at 1000 Ascans per B-scan and 64 B-scans per volume.

Discussion
The introduction of high speed real-time volumetric or "4D" MIOCT systems has motived the development of improved volumetric rendering algorithms which produce images with high visual quality and that are scalable in speed. The methods presented in this manuscript allow for real time visualization of 4D OCT data, which is unavailable to B-scan only display modes. While B-scans can provide better quantitative assessment of depth based features (such as the exact lateral extent of a macular hole), volume visualization enables assessment of the entire OCT volume and reveals lateral features (such as the retinal track and foveal elevations) that are difficult to visualize in a B-scan only display. Since the user only needs to tune a threshold and a single transfer function, the described methods can be easily adapted to meet the needs of clinical applications. For example, in corneal transplants it is sometimes desired to render the cornea opaque [26] and this change can be made dynamically in the operating room. The improved visual quality of our enhanced rendering pipeline has already allowed for new visualization of data for clinical applications, such as 3D renderings of abnormal neovascularization extending above the retinal surface in premature infants and tracks left in the retina by surgical tools. While the exact clinical impact of these applications is still unknown, it is anticipated that high quality visualization of intraoperative MIOCT data will lead to new assessments of pathologies, tool tissue interactions, and potentially aid in the development of surgical techniques as well.
The need for a high speed rendering algorithm is further motivated by constant improvements in swept-source OCT laser technologies, as well as new visualization and display applications. These visualization techniques, such as heads up stereoscopic display, have been shown to improve surgeon performance during simulated surgery [56]. They require generating multiple high quality volume renderings from different viewpoints simultaneously with low latency. To support higher speed systems, the processing and rendering could be implemented on multiple GPUs. The performance of CUDA programs will also scale with constantly improving GPU hardware; for example, the current generation GTX Titan X has at least a 20% increase in computational power (9% more cores and 12% higher clock rate) over the GTX Titan Black that was employed for this study [55]. Since both OCT processing and enhanced rendering fall under the same instruction multiple data programming paradigm, we believe the rendering algorithms described here are scalable for systems that operate at higher speeds, have faster GPU hardware, and require renderings from multiple simultaneous viewpoints.

Conclusions
We have demonstrated an enhanced rendering pipeline that includes volumetric filtering, edge enhancement, feature enhancement, depth based shading, and Phong lighting. Implemented on a current-generation GPU, this pipeline had sufficient computational throughput and low latency to support real time rendering and interactive manipulation of real-time volumetric 100 kHz SS-MIOCT in human surgery. The pipeline is applicable to data from commercial as well as research OCT systems, and is scalable to future generations of ever faster OCT acquisition hardware and increasingly demanding visualization requirements such as real-time stereoscopic visualization.