High resolution, programmable aperture light field laparoscope for quantitative depth mapping

Recent applications have shown that light field imaging can be useful for developing uniaxial three-dimensional (3D) endoscopes. The immediate challenges in implementation are a tradeoff in lateral resolution and acquiring enough depth information in the physically limited environment of minimally invasive surgery. Here we propose using programmable aperture light field imaging in laparoscopy to capture 3D information without sacrificing the camera sensor’s native, high spatial resolution. This hybrid design utilizes a programmable aperture to preserve the conventional laparoscope’s functionality and, upon demand, to compute a depth map for surgical guidance. A working prototype is demonstrated.


Introduction
Conventional laparoscopic systems provide surgeons with a two-dimensional (2D) view of the operative field, which limits depth cues due to the lack of binocular vision and causes loss of accurate depth perception and challenges of eye-hand coordination [1]. On the other hand, 3D laparoscopes offering more accurate depth cues such as binocular vision have gained significant popularity, especially when integrated with robotic surgery. Studies have reported less fatigue and more accurate and faster surgical performances with 3D laparoscopy [2,3].
There exist several different methods for implementing 3D laparoscopy, including dual-sensor stereo, single-sensor stereo, single-sensor 3D imaging via structured light, and uniaxial 3D imaging [4]. Engineering these methods for 3D laparoscopes faces unique design limitations. Dual-sensor stereo systems, which integrate a pair of imaging optics and sensors into the constrained, standardized endoscope housing to acquire binocular images of the surgical field, are one of the popular approaches and have been successfully adopted in commercial systems. Compared to the 4K ultra-high-definition image quality of state-of-the-art 2D laparoscopes, improving optical performance in these systems is still challenging due to the naturally constrained dimensions and effective numerical aperture (NA). Single-sensor stereo systems capture stereo images with a single sensor by means of split-channel optics, which compromises resolution. Single-sensor 3D imaging systems via structured light record distorted structured light to determine the 3D surface profile, but at the cost of an extra projection path. Uniaxial 3D acquisition techniques, which extract 3D depth information using monocular endoscopes with a single optical channel, have the advantage of maintaining a form factor similar to that of monocular endoscopes, and several strategies such as time-of-flight measurement, shape from defocus or shading, and various active illumination methods have been actively investigated [4].
Recently, another uniaxial 3D acquisition method called light field (LF) imaging was applied to minimally invasive surgery, such as LF otoscope [5], laryngoscope [6], and endoscope [7]. Capturing the LF of a surgical field requires recording both the spatial and angular information of the light rays from a 3D object and thus enables an imaging system to digitally refocus post-image capture, extend the depth-of-field, and acquire depth information [8]. In addition, LF capture can be implemented with a simple addition of a microlens array (MLA) to its original monocular imaging optics. The existing LF endoscopes [5][6][7], however, are subject to several major limitations. The most important limitation is their substantially reduced spatial and limited angular resolution due to tradeoffs between ray position and angular sampling. Furthermore, they are often dedicated 3D systems incapable of acquiring high-resolution 2D images, require splitting the imaging path, or are limited to acquiring depth information for a specific environment.
Here we propose a design of a high resolution, programmable aperture light field laparoscope (PALFL) and demonstrate its utility for quantitative depth mapping. By adopting a programmable aperture (PA) instead of an MLA for capturing light fields in a time-multiplexing fashion [9,10], the proposed laparoscope design is able to address the above-mentioned limitations of existing LF endoscopes. In this case, the spatial resolution of the acquired light fields is only subject to the limit of the image sensor and the undesirable tradeoff between spatial and angular resolution is removed. Figure 1 illustrates the schematic layout of our proposed PALFL design, which consists of an objective lens, a 1:1 relay lens group, an eyepiece, a programmable aperture, a focusing lens, and a sensor. The objective lens with a focal length of f obj images the entire field of view (FOV) of an object and forms intermediate image 1 (II1). The 1:1 relay lens is necessary for rigid laparoscopes to extend the insertion length of the imaging probe within the limit of the housing tube diameter and relay the image to outside of the patient's body at intermediate image 2 (II2). To fit the objective lens and relay lens within the standard 10 mm-diameter housing of laparoscopes, the objective lens is designed to be image-space telecentric with its entrance pupil (EP) placed at its front focal point while the relay lens group is designed to be double telecentric. The eyepiece with a focal length of f eye projects the image toward optical infinity for direct viewing or further imaging. At the same time, the eyepiece forms a conjugate image of the objective EP, labeled as "stop", at which the programmable aperture is placed. Opening a given region of the PA component allows the focusing lens, with a focal length of f fl , to image different bundles of rays from the object onto the sensor.

Optical approach
By selectively opening different sub-apertures sequentially (e.g., three instantaneous sub-apertures are highlighted by the Red, Green and Blue pixels in Fig. 1), the sensor captures different light ray angles incident upon the EP from the same object point. As illustrated by the zoomed view at the sensor, depending on the depth of the object of interest, the rays through the different sub-apertures may be imaged at the same pixel when the object depth is optically conjugate to the sensor, or at different pixels when it is either nearer or farther than the conjugate depth. The disparity information recorded by the sub-aperture images is then used for reconstructing the depth map of the object field or refocusing the image at different depths.
One significant advantage of a PALFL design over existing LF endoscopes using an MLA is that the spatial and angular resolutions of the captured LF images are only subject to the limits of the sensor resolution and the pitch of the PA, respectively, while existing LF endoscopes are subject to the tradeoff between the spatial resolution of the images and the angular resolution of ray direction samples. Another noteworthy feature of a PALFL system is its hybrid capability. The system's instantaneous aperture can be switched between a sub-aperture LF capture state and a normal capture state, in which a centered, regular-sized aperture is used to capture a conventional 2D full-resolution, full-FOV image that is the same as that of a conventional laparoscope. This capability provides a surgeon with the option, on demand, to receive guidance through the visualization of depth information.
Another interesting aspect of the PA approach is that the size and pattern of the sub-aperture can be customized based on what is needed. To match the throughput of existing LF endoscopes, sub-apertures can span multiple adjacent pixels in the PA while sensor pixels can be binned. In the case of insufficient illumination, the span can be further extended at the cost of depth-of-field or depth mapping range, and high angular resolution can still be maintained by allowing sequential sub-aperture regions to overlap. The drawback of a sequential capture is the cost of speed, but the ever-increasing frame rates of imaging sensors can well overcome this limitation. Multiplexed light field acquisition [9,10], which uses patterns spanning multiple regions of the PA per frame, can be implemented to increase signal-to-noise ratio and allow for faster frame rates.

Depth mapping resolution
A key aspect of the design of a PALFL system is to achieve adequate depth mapping resolution. This mainly depends on the maximal angular separation of the rays through the centers of the sub-apertures, which establishes the maximal baseline equivalent to a stereo system, and the minimally detectable ray separation of the imaging system. For the convenience of quantifying the depth resolution of different systems, we use the numerical aperture at the nominal working distance, NA WD , in the object space to characterize the maximal angular separation of the sub-apertures, and we use the equivalent sensor spatial resolution in the object space, B obj , of the system to quantify the minimally detectable ray separation. We assume that distinguishing the three separated rays in Fig. 1 and confidently detecting a depth offset from the sensor conjugate depth, L WD , minimally requires a 2-pixel separation (2B at the sensor or 2B obj at L WD ) between the Red and Blue rays on the sensor. A higher depth resolution can be achieved by digitally interpolating pixel data and refining the location of rays that land in between two pixels, but this possibility is not demonstrated here. Using similar triangles with bases located at L WD and the EP and a Taylor series expansion for simplification, the depth resolution of a PALFL design is derived as Eq. (1), where d + and d − represent the absolute distances from the sensor conjugate depth, L WD , to the closest resolvable depths away from and towards the EP, respectively, and D EP is the EP diameter. Given the pixel resolution, B, of the sensor and first-order optics specifications, without considering the effects of diffraction and aberration, B obj and NA WD are defined by Eq. (2). Figure 2 plots the average depth resolution, d, of d + and d − in relation to NA WD for systems of different spatial resolutions in the object space.
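As a rough numerical companion to this analysis, the sketch below evaluates a first-order estimate of the average depth resolution, d ≈ B obj / NA WD . This form is an assumption: it follows from the similar-triangles argument with a 2-pixel criterion and the small-angle approximation NA WD ≈ D EP / (2 L WD ), and it ignores the higher-order terms that distinguish d + from d − . The function names are hypothetical, and the input values are those quoted in the text for Fig. 2 and for the prototype.

```python
def lps_to_b_obj(lps_per_mm):
    """Object-space sample size B_obj in mm, using resolution = 1 / (2 * B_obj)."""
    return 1.0 / (2.0 * lps_per_mm)


def mean_depth_resolution(lps_per_mm, na_wd):
    """First-order estimate of the average depth resolution in mm.

    Assumes d ~ B_obj / NA_WD (2-pixel criterion, small angles).
    This is a simplification, not the paper's full Eq. (1).
    """
    return lps_to_b_obj(lps_per_mm) / na_wd


# (object-space resolution in lps/mm, NA_WD) pairs quoted in the text
cases = {
    "standard optics": (4.0, 0.01),
    "improved optics": (6.0, 0.015),
    "prototype, full aperture": (33.0, 0.022),
}

for label, (lps, na) in cases.items():
    print(f"{label}: d ~ {mean_depth_resolution(lps, na):.2f} mm")
```

Under these assumptions the three cases evaluate to about 12.5 mm, 5.6 mm, and 0.69 mm, in line with the values read from Fig. 2 and quoted for the prototype.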
At a nominal working distance of 50 mm, the NA WD of a standard monocular laparoscope is ~0.003 while the 5 mm baseline of a state-of-the-art stereo laparoscope (with a 12 mm diameter rod) by Intuitive Surgical produces an equivalent NA WD of ~0.05. The object-space spatial resolution here is quantified by the minimally discernible pair of line features per unit distance (lps/mm), equivalent to 1/(2B obj ). The object-space resolution of a commercial laparoscope is 2-6 lps/mm and the diffraction limited resolution of the multi-resolution foveated laparoscope reported in [11] is ~12 lps/mm. Figure 2 suggests that implementing an LF laparoscope using standard laparoscope optics, with a spatial resolution of 4 lps/mm and NA WD less than 0.01, yields a depth resolution worse than 12 mm. The combination of an NA WD of 0.015 and a resolution of 6 lps/mm provides a depth resolution of ~5.5 mm, which can be useful for surgeons to determine the proximity of their surgical tools, but is inadequate for accurately visualizing anatomical structures. Achieving sub-mm depth resolution with the light field method requires substantial improvements in both the object-space resolution and the numerical aperture of standard 2D laparoscopes. On the other hand, achieving this resolution in an LF laparoscope with dimensions like those of a stereo laparoscope seems possible.
An f/2.5 objective lens with a focal length of 1.8 mm from an existing laparoscope developed in [11] was repurposed for this prototype. The diameter of this objective lens group is 5.7 mm, which is small enough to allow space for fiber illumination and lens housing to build a standard 10-mm diameter rigid laparoscope as demonstrated in [11]. The optical system inside the rigid laparoscopic tube in [11] also consists of several groups of relay optics to relay the image of the objective to the distal end of the tube for further imaging.
As the objective and the relay were optimized and custom-made independently, they can be used separately, and relay groups can be added or removed without affecting the optical performance. When building the PALFL bench prototype, we removed the relay optics to simplify the lens design and optical alignment, and only used the objective along with a newly added eyepiece, a PA, and a focusing lens, since the relay optics do not add to or change the imaging function of the system. The objective lens was originally optimized for an L WD of 120 mm, a D EP of 0.8 mm, and lens diameters < 6 mm, resulting in an effective NA WD of ~0.003. However, for this PALFL prototype the objective lens was used at an L WD of 20 mm. Although this distance is short for laparoscopy, it yields an NA WD as large as ~0.022 if the full EP is sampled, which is more comparable to that of stereo laparoscopes. A 10 mm focal length eyepiece built with stock lenses was optimized to provide sufficient performance over a 60° full FOV and expanded the 0.8 mm EP of the objective lens to a 4.4 mm stop where a PA could be inserted. Note that the eyepiece diameter can be much larger than that of the objective and relay because it is outside of the patient's body. These modules were aligned to a commercial focusing lens with a focal length of 25 mm and a 1/3-inch color CCD sensor (1.3 MP Dragonfly2 from Point Grey). The pixel resolution of the sensor is 1280 × 960, and the color pixel size is 3.75 × 3.75 μm². Using Eq. (2), we can estimate that the theoretical spatial resolvability of the system in the object space is ~33 lps/mm. Using Eq. (1), the depth resolution of the prototype can potentially reach ~0.69 mm if the sub-aperture images are sampled at the full aperture and the optics perform at their full resolution. Figure 3(b) shows the prototype.
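The first-order numbers in this paragraph can be cross-checked with a short sketch. It assumes an ideal pupil relay in which the eyepiece magnifies the objective EP by f eye / f obj , and a paraxial estimate NA ≈ D / (2 × distance). Both function names are hypothetical, and the sketch does not model how L WD is referenced to the EP, so the NA values are approximate.

```python
def stop_diameter(d_ep_mm, f_obj_mm, f_eye_mm):
    """Diameter of the EP image at the stop, assuming pupil magnification f_eye / f_obj."""
    return d_ep_mm * f_eye_mm / f_obj_mm


def paraxial_na(aperture_mm, distance_mm):
    """Small-angle numerical aperture for a baseline or pupil seen from a given distance."""
    return aperture_mm / (2.0 * distance_mm)


# Values from the text: 0.8 mm EP, 1.8 mm objective, 10 mm eyepiece
print(f"stop diameter ~ {stop_diameter(0.8, 1.8, 10.0):.2f} mm")          # about 4.44 mm
print(f"NA at 120 mm  ~ {paraxial_na(0.8, 120.0):.4f}")                   # about 0.0033
print(f"NA at 20 mm   ~ {paraxial_na(0.8, 20.0):.4f}")                    # about 0.0200
print(f"stereo, 5 mm baseline at 50 mm ~ {paraxial_na(5.0, 50.0):.3f}")   # 0.050
```

The estimated stop diameter agrees with the quoted 4.4 mm. The paraxial NA at 20 mm comes out slightly below the quoted ~0.022, a difference that plausibly depends on the reference plane used for L WD .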
The objective lens and eyepiece were assembled in a 3D printed opto-mechanical housing, as shown in the grey cylinder. Instead of using a digital PA, a physical iris mounted on a two-axis linear stage was employed. Figure 3(c) illustrates the angular sampling scheme bounded by the stop. The grid of black dots represents the locations that would be sampled sequentially by the pitch of the sub-apertures and determines the angular resolution. The iris, indicated by the red circle with arrows, moves to each sampling location. An illuminated bladder model object is placed near an L WD of 20 mm, as shown in Fig. 3(b). On the image side, the sensor was adjusted to the new conjugate image position.

Prototype and experimental setup
Since the preexisting objective lens was not optimized for this short L WD , aberrations and vignetting were introduced. To minimize degradation of the data due to this issue, an effective LF calibration based on the aberration correction theory presented in [8] was developed and applied after data capture. Similarly, this LF calibration can minimize the impact of aberrations from relay lenses. Since the focus here is the PALFL concept, this calibration is only briefly summarized here and will be thoroughly discussed in a future paper. The calibration process began by calibrating the amount of vignetting across the field of view, capturing the LF data of a flat Lambertian source extending across the full FOV. By comparing the peripheral sub-aperture images to the center one, the vignetting was quantified and minimized via multiplication for future LF data sets. After the vignetting effects were removed, residual aberrations were minimized using an analogous process. The LF data of a checkerboard extending across the full FOV was taken.
By comparing the peripheral sub-aperture images to the center one, the aberrations were quantified and minimized via lateral shifting of pixels for future LF data sets.
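The vignetting step described above can be illustrated with a minimal sketch. It assumes a per-pixel multiplicative gain map, taken as the ratio of the center flat-field image to each peripheral flat-field image. The text only states that the correction was applied via multiplication, so the ratio form, the function names, and the tiny 2 × 2 images are illustrative assumptions.

```python
def vignetting_gain(center_flat, peripheral_flat, eps=1e-6):
    """Per-pixel gain that maps a peripheral flat-field image onto the center one.

    Images are lists of rows of intensities; eps guards against division by zero.
    """
    return [[c / max(p, eps) for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(center_flat, peripheral_flat)]


def apply_gain(image, gain):
    """Multiply an image by a previously computed gain map."""
    return [[v * g for v, g in zip(i_row, g_row)]
            for i_row, g_row in zip(image, gain)]


# Hypothetical flat-field captures of a uniform Lambertian source
center_flat = [[100.0, 100.0], [100.0, 100.0]]
peripheral_flat = [[50.0, 100.0], [80.0, 25.0]]  # darker where vignetted

gain = vignetting_gain(center_flat, peripheral_flat)
corrected = apply_gain(peripheral_flat, gain)
print(gain)       # [[2.0, 1.0], [1.25, 4.0]]
print(corrected)  # matches center_flat
```

The same gain map would then be reused on later LF data sets captured with the same sub-aperture position.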

Figures 4(a) through 4(e) show the captured LF data organized into sub-aperture images bordered in green according to the sampling scheme shown in Fig. 3(c). The captured scene consists of a part of the bladder model and a screwdriver placed in front within the FOV to simulate a laparoscopic surgical tool. For scaling reference, the width of the screwdriver is 3 mm while the background bladder model is minified since it is farther away. The center sub-aperture image, Fig. 4(a), is uncalibrated and in color and was used as a reference for LF calibration. For the peripheral sub-aperture images, Figs. 4(b) through 4(e), the calibrated greyscale results extracted from the green color channel are shown along with white grid lines representing matching locations on the sensor for reference. Each of the original sub-aperture images has a high pixel resolution of 1280 × 960 pixels, which is the same as that of the native sensor. Due to the LF calibration, the FOV of the peripheral sub-apertures was cropped, as seen in these images. Figures 4(f) through 4(i) show magnified images of a small region, marked by a red box on each of the corresponding sub-aperture images, 4(b) through 4(e), respectively. The small but slightly different displacements of the screwdriver relative to the white reference grids in the different sub-aperture images help to visualize the ray separations described in Fig. 1 and validate that the screwdriver is in front of the nominal working distance, L WD .
The optical performance of the built prototype was limited by the quality of the stock lenses in the eyepiece and the use of a generic focusing lens. Therefore, we only utilized the greyscale images converted from the green color channel for further data processing to eliminate the effects of chromatic aberration, and we only used the center five angular samples to avoid severe vignetting and aberration-blurring, which increases significantly for sub-apertures farther from the optical axis. These five samples of sub-aperture images, however, are adequate to demonstrate the minimum data needed to achieve maximum data processing speeds and depth sensitivity from x or y-oriented image features in a PALFL system.
The angular sampling dimensions for the data in Fig. 4 were determined experimentally. A 1 mm diameter iris was found to produce sufficient sub-aperture image quality and depth of field for the object distances of interest. A 0.91 mm pitch between the sub-apertures at the stop provided a balance between sufficient light ray separation at different object depths, low sub-aperture image aberration, and limited aliasing during digital refocusing.
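To make the sampling geometry concrete, the sketch below enumerates iris positions on a square grid at the stop. It assumes a grid centered on the optical axis and keeps only positions where the iris lies entirely inside the 4.4 mm stop. That criterion and the function names are illustrative assumptions, not the exact scheme of Fig. 3(c).

```python
import math


def sample_centers(stop_diameter_mm, iris_diameter_mm, pitch_mm):
    """Square-grid iris centers (x, y) in mm for which the iris fits inside the stop."""
    max_r = (stop_diameter_mm - iris_diameter_mm) / 2.0
    n = int(max_r // pitch_mm)  # grid steps available on each side of the axis
    centers = []
    for iy in range(-n, n + 1):
        for ix in range(-n, n + 1):
            x, y = ix * pitch_mm, iy * pitch_mm
            if math.hypot(x, y) <= max_r:
                centers.append((x, y))
    return centers


def central_cross(centers, pitch_mm):
    """Center sample plus its four nearest neighbors along x and y."""
    limit = pitch_mm * 1.001  # small tolerance for floating-point grid positions
    return [c for c in centers if abs(c[0]) + abs(c[1]) <= limit]


centers = sample_centers(stop_diameter_mm=4.4, iris_diameter_mm=1.0, pitch_mm=0.91)
cross = central_cross(centers, pitch_mm=0.91)
print(len(centers), len(cross))  # 9 grid positions, 5 in the central cross
```

With the quoted dimensions this yields a 3 × 3 grid, from which a five-sample cross provides disparity along both x and y.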
The diffraction limited spatial resolution of the sub-apertures was measured using a 1951 USAF resolution target (groups 0-3) placed at an L WD of 20 mm. Figure 5 shows the center sub-aperture image, a zoomed-in view of groups 2 and 3, and green channel intensity profiles along group 3, elements 3 and 4. The bars in element 3 are clear while in element 4 they begin to diminish. This indicates that the diffraction limited spatial resolution lies between these two elements, at ~10.7 lps/mm. Although the sub-aperture spatial resolution is limited by diffraction, the higher pixel resolution of the sensor is not wasted because it enables more precise measurement of disparity between sub-aperture images and will also be used in the conventional laparoscope mode, where the PA is fully opened and the optical resolution is higher.
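The ~10.7 lps/mm figure can be checked against the standard 1951 USAF target relation, in which the spatial frequency of a given group and element is 2^(group + (element − 1)/6) line pairs per mm. The sketch below applies it to the two elements discussed; taking the midpoint between them is an assumption about how the quoted value was obtained.

```python
def usaf_lp_per_mm(group, element):
    """Spatial frequency of a 1951 USAF target element, in line pairs per mm."""
    return 2.0 ** (group + (element - 1) / 6.0)


g3e3 = usaf_lp_per_mm(3, 3)  # about 10.08
g3e4 = usaf_lp_per_mm(3, 4)  # about 11.31
midpoint = (g3e3 + g3e4) / 2.0

print(f"group 3 element 3: {g3e3:.2f} lp/mm")
print(f"group 3 element 4: {g3e4:.2f} lp/mm")
print(f"midpoint: {midpoint:.1f} lp/mm")  # about 10.7
```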

Data processing
A modified open source code [12] was used to process the calibrated LF data for digital refocusing and to generate depth maps. Figure 6 demonstrates digital refocusing for three image planes corresponding to near, medium, and far object distances. At near focus, the screwdriver is identifiable while the background is blurry. At medium focus, the white protrusion on the bladder model becomes clear. At far focus, the screwdriver and white protrusion are defocused while the pink line features on the right side are beginning to defocus. Because of the minimum angular sampling for this experiment, when refocusing to one extreme depth, the opposite one shows some aliasing, as seen by the edges of the defocused screwdriver when the focus is far. Figure 7(a) was constructed by applying an intensity gradient threshold to Fig. 4(a) to highlight pixels containing strong image features for confident depth estimation. The depth was then estimated at those pixels while the other pixels were nullified. These null regions were interpolated based on the nearest confident depth estimation to construct a full depth map. This strategy reduced noisy depth estimations. Figures 7(b) and 7(c) show full depth maps generated from algorithms based on focus contrast and on correspondence feature matching, respectively. For each object point, the depth estimation is obtained by measuring at the sensor the separation between light rays captured by adjacent sub-apertures (in units of sensor pixels). A negative pixel value indicates the separation occurred in the opposite direction, as shown in the zoomed view of Fig. 1 when comparing the ray separation from near and far images. Greyscale color illustrates that darker is closer and brighter is farther, allowing determination of relative depth.
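The masking and interpolation strategy described for Fig. 7(a) can be sketched on a single image row. This is a simplified illustration, not the modified code of [12]: the central-difference gradient, the threshold value, and the 1D nearest-neighbor fill are assumptions, and the depth values are hypothetical ray separations in sensor pixels.

```python
def confident_mask(row, threshold):
    """Flag pixels whose central-difference intensity gradient meets the threshold."""
    n = len(row)
    mask = []
    for i in range(n):
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, n - 1)]
        mask.append(abs(right - left) / 2.0 >= threshold)
    return mask


def fill_nearest(depth, mask):
    """Replace unconfident depth estimates with the nearest confident one."""
    known = [i for i, ok in enumerate(mask) if ok]
    if not known:
        raise ValueError("no confident pixels to interpolate from")
    return [depth[min(known, key=lambda k: abs(k - i))] for i in range(len(depth))]


# Hypothetical intensity row with two edges, and raw per-pixel depth estimates
intensity = [10, 10, 10, 80, 80, 80, 80, 20, 20, 20]
raw_depth = [9.0, 9.0, -1.0, -1.0, 9.0, 9.0, 0.5, 0.5, 9.0, 9.0]  # 9.0 marks noisy values

mask = confident_mask(intensity, threshold=20)
full_depth = fill_nearest(raw_depth, mask)
print(mask)
print(full_depth)  # noisy values replaced by the nearest edge estimate
```

Only the four pixels adjacent to the two edges pass the threshold, and every other pixel inherits the depth of its nearest edge, which is how the noisy estimates in textureless regions are suppressed.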
Both depth maps identify the correct objects at three different depths, consistent with Fig. 6. However, depending on the image feature characteristics [12] and the error from defocus aliasing, the algorithms perform differently. In the focus contrast map shown in Fig. 7(b), aliasing resulted in inconsistent depth estimation between the screwdriver's edges and body. Aliasing also likely caused the slight inconsistency between the two algorithms in the depth estimation of the farthest depth layer. Therefore, the feature matching algorithm performs better for larger depth ranges. In contrast, for the grey valley and surrounding white region on the left side of the FOV, where aliasing is absent, the focus contrast map provides a smoother depth reconstruction.

Quantitative depth mapping
A lookup table method was created to convert depth maps from units of sensor pixels of ray separation to absolute, quantitative depth values and to validate the depth resolution based on the system design. Figure 8(a) shows the center sub-aperture view of a 45° tilted ruler providing 0.7 mm depth intervals across the vertical FOV. After applying the same LF calibrations as those in the bladder model experiment, a smooth focus contrast depth map was generated, as shown in Fig. 8(b). Based on the measured ray separation, Fig. 8(b) highlights the pixels corresponding to d ± and the L WD of 20 mm. The corresponding pixels were found in Fig. 8(a), and knowing the ruler dimensions, the units were converted to physical depth.
These results were compared to our derived depth resolution study in Sec. 3. Due to the optical performance limitations discussed earlier, we experimentally determined the following prototype specifications. Knowing the real image to object magnification and manufacturer pixel size (B), the sensor spatial resolution in the object space, B obj , of the current prototype was calculated to be 21.3 lps/mm for the center angular samples. Calculating B obj using dimensions known in Fig. 5 yields a similar result. We measured the equivalent D EP from the angular samples shown in Fig. 4 to be 0.345 mm and the equivalent NA WD of the sampled data to be 0.0074 for an L WD of 20 mm. From Eq. (1), d + and d − are 3.6 and 2.7 mm, respectively. Measured from the labeled data points in Figs. 8(a) and 8(b), the ± 1 sensor pixel depths corresponding to d + and d − are separated from the 0 sensor pixel depth on the ruler by + 5 and −4 intervals, respectively. Knowing the depth between each interval on the tilted ruler is 0.7 mm, they correspond to measured depth resolutions of 3.5 mm and 2.8 mm, respectively, resulting in a maximum percent error of 3.7% in comparison to the theoretical values. Because depth estimation may be nonuniform depending on the algorithm used and the variation of an object's texture density, the percent error can fluctuate for different objects throughout the FOV. Nevertheless, the results presented here demonstrate the potential of the PALFL while depth estimation algorithms are continually being improved.
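The arithmetic behind this comparison, together with a minimal lookup-table conversion, can be sketched as follows. The three table entries come from the measured ruler intervals quoted above; the piecewise-linear interpolation between them and the function names are assumptions made for illustration.

```python
RULER_DEPTH_STEP_MM = 0.7  # depth change per ruler interval at a 45 degree tilt


def percent_error(measured, theoretical):
    """Absolute percent error of a measurement relative to the theoretical value."""
    return 100.0 * abs(measured - theoretical) / theoretical


def disparity_to_depth_offset(disparity_px, table):
    """Piecewise-linear lookup from ray separation (px) to depth offset (mm) from L_WD."""
    points = sorted(table.items())
    if not points[0][0] <= disparity_px <= points[-1][0]:
        raise ValueError("disparity outside the calibrated range")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= disparity_px <= x1:
            return y0 + (y1 - y0) * (disparity_px - x0) / (x1 - x0)


measured_far = 5 * RULER_DEPTH_STEP_MM   # +5 intervals, about 3.5 mm
measured_near = 4 * RULER_DEPTH_STEP_MM  # -4 intervals, about 2.8 mm

# Theoretical d+ and d- quoted in the text
err_far = percent_error(measured_far, 3.6)
err_near = percent_error(measured_near, 2.7)
print(f"max percent error: {max(err_far, err_near):.1f}%")  # about 3.7%

# Lookup table: +1 px is farther from the EP, -1 px is nearer
table = {-1: -measured_near, 0: 0.0, 1: measured_far}
print(disparity_to_depth_offset(0.5, table))  # about 1.75 mm beyond L_WD
```

A denser table, built from more labeled ruler intervals, would reduce the reliance on linear interpolation between entries.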

Conclusion
In conclusion, a PALFL was conceptualized to obtain high spatial resolution LF data up to that of the camera sensor for refocusing and quantitative depth mapping, without trading off angular resolution. By taking advantage of the PA's flexibility, this hybrid system integrates the high performance of existing 2D endoscopes with 3D LF imaging. Theory was then developed to analyze, compare, and design laparoscopes regarding adequate depth resolution. A bench-top prototype using an existing laparoscope objective demonstrated proof of concept by performing quantitative depth mapping according to the depth resolution theory. Using our understanding of this prototype, the next generation PALFL will incorporate many improvements. We will optimize the optical system to achieve high performance at its full aperture, incorporate a liquid crystal array in either a transmissive or reflective mode with multiplexed LF acquisition capability to acquire data up to the sensor frame rate, include relay lenses to extend the optical system, and redesign the system to have a working distance and maximum baseline similar to current commercial stereo endoscopes.

Fig. 7. (a) Intensity gradient thresholding of Fig. 4(a) for depth mapping noise reduction. Relative depth reconstruction maps based on (b) depth from focus contrast and (c) depth from multi-view correspondence feature matching.