Developing an optical design pipeline for correcting lens aberrations and vignetting in light field cameras

Light field cameras have been employed in myriad applications thanks to their 3D imaging capability. By placing a microlens array in front of a conventional camera, one can measure both the spatial and angular information of incoming light rays and reconstruct a depth map. The unique optical architecture of light field cameras poses new challenges in controlling aberrations and vignetting during the lens design process. The results of our study show that field curvature can be numerically corrected for by digital refocusing, whereas vignetting must be minimized because it reduces the depth reconstruction accuracy. To address this unmet need, we herein present an optical design pipeline for light field cameras and demonstrate its implementation in a light field endoscope.


Introduction
The light rays captured by an imaging system contain abundant information, which is described by a 7D plenoptic function P(θ, ϕ, λ, t, x, y, z) (θ, ϕ, angular coordinates; λ, wavelength; t, time; x, y, z, spatial coordinates) [1]. A conventional camera acquires only the 2D spatial information (x, y) of an input scene. By contrast, the light field camera measures both spatial (x, y) and angular information (θ, ϕ) [2], where the angular information can be further used to reconstruct a depth map (x, y, z). Due to its superior 3D imaging capability, the light field camera has been employed in various applications such as biomedical imaging [3,4], object recognition [5][6][7], and machine vision [8,9].
There are two types of light field cameras: the unfocused light field (ULF) camera [2,10] and the focused light field (FLF) camera [11]. Figure 1 shows the corresponding schematics. As shown in Fig. 1(a), in a ULF camera, three point objects S1, S2, and S3 are first imaged by the main lens, forming intermediate image points S1', S2', and S3'. These intermediate image points are then reimaged by the microlens array (MLA) onto a detector array. Because the distance from the MLA to the detector array is equal to the focal length of the MLA, the ULF camera essentially images the pupil associated with each microlens. We use (u, v) and (x, y) to denote the Cartesian coordinates at the pupil plane and the MLA, respectively. The captured raw images (M1, M2, and M3 in Fig. 1(a)) can be rearranged as a 4D datacube (x, y, u, v), which is also referred to as a light field (LF) [12]. A 2D x-u slice of the LF is termed an epipolar plane image (EPI). As an example, Fig. 1(b) shows three EPIs associated with points S1, S2, and S3, respectively. The corresponding depths can then be deduced by estimating the slope of the lines in the EPIs. A refocused image at a given depth can be reconstructed from an integral projection of the 4D LF along a trajectory in the EPIs [2]. Reconstructing images at all depths creates a focal stack, and an extended depth of field (DOF) image can be rendered by fusing all the reconstructed images [13].
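To make these operations concrete, the following is a minimal Python/NumPy sketch of EPI extraction and shift-and-sum refocusing. The array LF, the functions epi and refocus, and the parameter alpha are all hypothetical names of our own; the integer-pixel shifts are a simplification of the integral projection described above.

```python
import numpy as np

def epi(LF, y0, v0):
    """Extract the 2D x-u epipolar plane image (EPI) at fixed y = y0, v = v0,
    assuming a 4D light field array indexed as LF[x, y, u, v]."""
    return LF[:, y0, :, v0]

def refocus(LF, alpha):
    """Shift-and-sum refocusing: accumulate each (u, v) view after shifting
    it proportionally to its angular coordinate. alpha selects the refocus
    depth (alpha = 0 reproduces the nominal focal plane)."""
    nx, ny, nu, nv = LF.shape
    u = np.arange(nu) - nu // 2
    v = np.arange(nv) - nv // 2
    out = np.zeros((nx, ny))
    for i in range(nu):
        for j in range(nv):
            view = LF[:, :, i, j]
            # Integer-pixel shift for simplicity; subpixel interpolation
            # would be used in practice.
            out += np.roll(np.roll(view, int(round(alpha * u[i])), axis=0),
                           int(round(alpha * v[j])), axis=1)
    return out / (nu * nv)
```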
Unlike the ULF camera, the FLF camera directly images the object, rather than the pupils, onto the detector array. There are two types of FLF cameras: Keplerian and Galilean [14]. Figure 1(c) shows the schematic of a Galilean FLF camera, in which the spacing (B) between the MLA and the detector is smaller than the focal length of the MLA. By contrast, B is larger than the focal length of the MLA in the Keplerian configuration. The depth information can be derived from the disparities between adjacent perspective images (Fig. 1(d)), and an all-in-focus image can be reconstructed by projecting all the pixels in the raw image back to the intermediate image plane.
Although the depth calibration methods and ray tracing models of light field cameras have been extensively studied [15][16][17][18][19][20][21][22], the optical design of the main lens has yet to be explored. Because of the unique optical architecture of light field cameras, the handling of lens aberrations and vignetting differs significantly from conventional lens design practice [23,24]. To address this unmet need, we systematically analyzed the effect of aberrations and vignetting on the fidelity of reconstructed images and developed a design pipeline for the main lens of light field cameras. While the proposed lens design pipeline is generally applicable to all light field cameras, we focus on a niche application in endoscopy (Section 4: Design example).

Aberrations and vignetting in light field cameras
When designing an imaging lens, although aberrations and vignetting are both unwanted, they are not equally weighted in the tolerancing budget. Here we limit our discussion to third-order Seidel aberrations and ignore defocus and wavefront tilt. Conventional optical design prioritizes the correction of aberrations that increase the spot size at the image plane (i.e., spherical aberration, coma, astigmatism, and field curvature). In particular, when field curvature W220 exists, a flat object plane is imaged onto a curved surface. Because the detector plane is flat, a field-dependent defocus is introduced into the final image. In the peripheral field, the induced blur is often so severe that it overshadows other aberrations. More problematically, field curvature is more difficult to correct than other Seidel aberrations: common approaches such as lens bending/splitting and stop shifting cannot be applied because, in a system free of astigmatism, field curvature depends only on the power and refractive index of the lenses. Therefore, in conventional optical design, field curvature is considered one of the toughest aberrations, and correcting it normally leads to a bulky setup. By contrast, vignetting reduces the irradiance of the image but not its resolution, and it can be numerically corrected for in postprocessing. For this reason, vignetting is of lesser concern than the Seidel aberrations.
Unlike conventional cameras, which capture only the 2D (x, y) information of a scene, light field cameras measure a 4D (x, y, u, v) datacube and derive the depth from light ray angles. Therefore, designing the main lens requires a new standard: the field curvature and vignetting must be assessed in 3D (x, y, z) rather than 2D (x, y). Figure 2 shows a light field camera with field curvature. The object is imaged by the main lens onto a curved surface, as indicated by the black dashed line. The depth of field of the microlens array (MLA), denoted by DRM, determines the depth range of the main lens, while the DRM itself depends on the detector pixel size and the numerical aperture (NA) of the MLA [25]. Provided that the entire curved intermediate image lies within the DRM, the shape of the surface can be recovered through calibration [16]. As a result, the field curvature can be numerically corrected for by digital refocusing, and it can be loosely tolerated in light field cameras.

By contrast, vignetting must be minimized in light field cameras. Because light field cameras estimate depths using the light ray angles, the loss of angular information due to vignetting reduces the number of views in the EPIs. To elaborate on this effect, we performed a simulation using Zemax (Zemax, LLC). Figure 3 shows the shaded model of a ULF camera. The object is a point source. We use a 4F system as the main lens, which consists of two paraxial lenses (f = 15 mm) and a physical stop. The stop is placed at the Fourier plane of the first lens (i.e., its back focal plane). To match the NA of the main lens to that of the MLA, we set the stop diameter to 1.38 mm. To introduce vignetting, we place another aperture of the same diameter 10 mm after the stop. An MLA (f = 0.65 mm, lens pitch = 60 µm) is located at the back focal plane of the second lens, and a detector array is placed at the back focal plane of the MLA. The pixel size of the detector array is 4 µm. We define the vignetting factor η as

η = 1 − E/E_u, (1)

where E and E_u denote the total irradiance received by the detector array with and without vignetting, respectively; η is zero if the image is unvignetted. In the simulation, the point source was placed at the front focal plane of the first lens and scanned along the x-axis over 13 locations from 0 mm to 1.2 mm with a step size of 0.1 mm. At each step, we traced 100,000 light rays to form a raw image and rendered an EPI at v = 0 and y = 0. Figure 4(a) shows three representative raw images at x = 0 mm, 0.6 mm, and 1.2 mm, and their corresponding EPIs.
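For reference, Eq. (1) maps directly onto a few lines of code. A minimal sketch, assuming raw and raw_unvignetted are hypothetical 2D arrays of detector irradiance rendered with and without the vignetting aperture:

```python
import numpy as np

def vignetting_factor(raw, raw_unvignetted):
    """Eq. (1): eta = 1 - E/E_u, where E and E_u are the total irradiance
    received by the detector with and without vignetting, respectively."""
    E = float(np.sum(raw))
    E_u = float(np.sum(raw_unvignetted))
    return 1.0 - E / E_u  # 0 for an unvignetted image
```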
The results indicate that although the slope of the line feature in the EPIs does not change, the number of pixels that form the line (i.e., views) decreases as vignetting increases. The relation between the vignetting factor and the number of views is shown in Fig. 4(b). We calculated the number of views by enumerating the non-zero pixels in the EPI after image binarization. The light field camera reconstructs depth by estimating the slope of line features in EPIs through linear regression. The standard error of the fit can be computed as

SE = √( Σᵢ (bᵢ − b̂ᵢ)² / (n − 2) ) / √( Σᵢ (aᵢ − ā)² ), (2)

where SE is the standard error, n is the number of observations, aᵢ is the independent variable for the i-th observation, ā is its mean, bᵢ is the dependent variable for the i-th observation, and b̂ᵢ is the estimated value of bᵢ. Equation (2) implies that the standard error decreases as the number of observations increases. In light field cameras, vignetting reduces the number of views in EPIs, resulting in a larger regression error and, therefore, a reduced depth accuracy. In particular, when the number of detector pixels associated with a microlens is small, vignetting dramatically increases the regression error.

To further illustrate the effect of vignetting on depth accuracy, we defocused the point source by 6 mm towards the first lens and scanned it under the same conditions. Because the depth of the point source has changed, the line in the EPI is tilted with respect to the vertical axis and is no longer aligned with the detector pixels; as a result, sampling introduces ambiguities. Three representative raw images and the corresponding EPIs at x = 0 mm, 0.6 mm, and 1.2 mm are shown in Fig. 5(a). At each step, we computed the slope of the line in the EPI. The relation between the slope regression error and the vignetting factor is shown in Fig. 5(b). It is worth mentioning that the slope regression error also depends on aberrations and noise. When aberrations exist, the image of a point source is no longer a sharp point, and the shape of the line in the EPI may be distorted. Noise, on the other hand, affects the intensity of the views and the background pixels. In both cases, a sufficient number of views is critical for faithful depth reconstruction. Therefore, vignetting must be minimized in light field cameras.
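The slope estimation and the standard error of Eq. (2) can be reproduced with an ordinary least-squares fit over the nonzero EPI pixels. A minimal sketch, assuming epi_img is a 2D array whose rows index x and columns index u; the binarization threshold is an illustrative choice:

```python
import numpy as np

def epi_slope_and_se(epi_img, rel_threshold=0.5):
    """Binarize the EPI, fit a line x = slope*u + intercept to the nonzero
    pixels, and return the slope with its standard error per Eq. (2)."""
    xs, us = np.nonzero(epi_img > rel_threshold * epi_img.max())
    a = us.astype(float)            # independent variable (view index u)
    b = xs.astype(float)            # dependent variable (spatial index x)
    n = len(a)                      # number of views (observations)
    slope, intercept = np.polyfit(a, b, 1)
    b_hat = slope * a + intercept   # fitted values
    se = np.sqrt(np.sum((b - b_hat) ** 2) / (n - 2)
                 / np.sum((a - a.mean()) ** 2))
    return slope, se
```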
Finally, we validated the effect of vignetting in a real experiment. The optical setup of an unfocused light field camera is shown in Fig. 6(a). We used a 4F system as the main lens, consisting of two 50 mm focal length achromatic doublets (Thorlabs, AC254-050-A-ML). A 4.8 mm diameter stop was placed at the Fourier plane to match the NA of the main lens to that of the MLA. An MLA with a 50 µm pitch was placed at the back focal plane of the second lens, and the spacing between the MLA and the camera (Lumenera, Lt965R) equals the MLA focal length. A flat printed grid pattern, located near the front focal plane of the main lens, was used as the object. An adjustable aperture was positioned 12 mm before the camera, and its diameter was set to 2.8 mm, 4 mm, and 5 mm to create different levels of vignetting. We captured a raw image for each aperture diameter and a baseline image with the aperture removed (i.e., no vignetting). A representative raw image at an aperture diameter of 4 mm and the baseline image are shown in Fig. 6(b), each including two magnified subfields. Compared with the baseline, Area 2 of the raw image at the 4 mm aperture diameter shows vignetted pupils. Next, we calculated the vignetting factor and generated a disparity map for each image, followed by computing the root-mean-squared error (RMSE) of each disparity map. Note that a depth map can be further rendered based on disparity-to-depth calibration [16]. The resultant disparity maps are shown in Fig. 7. The experimental results indicate that the disparity RMSE increases as the vignetting factor increases; depth accuracy is therefore jeopardized when vignetting exists.
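The error metric itself is a plain RMSE between each disparity map and the unvignetted baseline. A minimal sketch, where disparity and reference are hypothetical 2D disparity maps on the same pixel grid:

```python
import numpy as np

def disparity_rmse(disparity, reference):
    """Root-mean-squared error of a disparity map measured under vignetting
    relative to the unvignetted baseline map."""
    diff = np.asarray(disparity, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```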

Lens design for light field cameras
Compared to conventional cameras, light field cameras can tolerate field curvature but are sensitive to vignetting. The field curvature coefficient W220 can be separated into two terms:

W220 = W220p + (1/2)W222, (3)

where W222 is proportional to astigmatism and W220p is the Petzval curvature. Without astigmatism, the field curvature reduces to the Petzval curvature. Because the Petzval curvature depends only on the power and refractive index of the lenses, it is insensitive to most aberration correction methods (e.g., lens bending/splitting, stop shifting). The primary method of flattening the Petzval surface is to add negative-power lenses and create air spaces in between; however, this makes the system bulky and expensive. Therefore, relaxing the tolerance on field curvature can greatly reduce the system complexity and design constraints. For example, if we use a single ball lens as the main lens in a light field camera, all off-axis aberrations would be eliminated [26]. Digitally correcting for the remaining field curvature then provides an ideal route to a large field of view with high resolution.
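The insensitivity of the Petzval curvature to lens bending and stop shifting follows from the thin-lens Petzval sum, which involves only element powers and indices. A minimal sketch with illustrative values (not taken from our design):

```python
def petzval_radius(powers, indices):
    """Thin-lens Petzval sum P = sum(phi_i / n_i); the Petzval radius is
    -1/P (sign conventions vary between texts). Bending or stop shifting
    leaves P unchanged, so only power/index choices can flatten it."""
    P = sum(phi / n for phi, n in zip(powers, indices))
    return -1.0 / P

# Illustrative example: pairing a positive element with a high-index
# negative element reduces |P| and flattens the Petzval surface.
print(petzval_radius(powers=[1 / 50.0, -1 / 120.0], indices=[1.52, 1.76]))
```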
To minimize vignetting in a light field camera, we impose a constraint on the lens aperture:

a ≥ |ȳ| + |y|, (4)

where a is the radius of the aperture, and ȳ and y are the chief ray height and marginal ray height at the aperture position, respectively. In addition, we enforce telecentricity of the main lens in image space. Figure 8 illustrates the proposed optical design pipeline, which differs from conventional practice in two respects. First, field curvature is not a primary design constraint and can be loosely tolerated, while vignetting must be strictly minimized. Second, optimization must be performed in 3D (x, y, z) rather than 2D (x, y): we must account for all object points within both the depth range (z) and the FOV (x, y). In practice, given radial symmetry, it is justified to sample object points only in the y-z plane. During optimization, we assign each (y, z) object point to a system configuration. We then perform ray tracing in each configuration and calculate the corresponding vignetting factor. Lastly, we construct a y-z vignetting factor map and compute its mean, which we use as the metric to evaluate the vignetting of the system, as sketched below.
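The aperture constraint of Eq. (4) and the y-z vignetting metric can be expressed compactly. A minimal sketch, in which trace_point is a hypothetical stand-in for a ray-trace query (e.g., through an optical design program) that returns the detector irradiance with and without vignetting for an object point (y, z):

```python
import numpy as np

def aperture_unvignetted(a, chief_y, marginal_y):
    """Eq. (4): the aperture radius a must cover the sum of the chief and
    marginal ray heights at the aperture plane to pass the beam unclipped."""
    return a >= abs(chief_y) + abs(marginal_y)

def mean_vignetting_metric(trace_point, y_samples, z_samples):
    """Build the y-z vignetting factor map and return it with its mean,
    the scalar metric used to evaluate vignetting during optimization."""
    eta = np.zeros((len(y_samples), len(z_samples)))
    for i, y in enumerate(y_samples):
        for j, z in enumerate(z_samples):
            E, E_u = trace_point(y, z)      # irradiance with/without vignetting
            eta[i, j] = 1.0 - E / E_u       # Eq. (1) per object point
    return eta, float(eta.mean())
```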

Design example
To demonstrate the implementation of the proposed pipeline, we designed the main lens for a light field endoscope using Zemax. The desired specifications are listed in Table 1. We selected a double Gauss lens as the initial configuration to reduce odd aberrations, then scaled the lens down to the required diameter. Next, nine object points within the depth range (z) and the FOV (x, y) were chosen to build the multi-configuration, as summarized in Fig. 9. The working distance (WD) is defined as the distance between an object point and the first surface of the main lens. We inserted a dummy surface after the nominal image plane (where the marginal ray height = 0 mm) in each configuration, which serves as the real image plane. Due to field curvature, defocus is introduced for off-axis object points. During optimization, the location of the dummy surface was set as a variable, and each configuration was optimized independently to compensate for the field-dependent defocus. In this way, the effect of field curvature is excluded from the merit function for image quality optimization. Next, we built the merit function based on the design specifications. The activated operands are summarized in Table 2. The variables consist of the radius of curvature of each surface and the central thickness between adjacent surfaces. Only spherical surfaces are used for each lens element. The optimization process is divided into two steps: local optimization and global optimization. During the local optimization, the paraxial magnification is defined using the operands PMAG, RECI, ABLT, and ABGT; the desired magnification of the main lens is −0.2. We used the operand AXCL to minimize the axial color, while the other aberrations (spherical aberration, coma, astigmatism, distortion, and lateral color) were optimized together to minimize the root-mean-squared (RMS) spot size using the default operand TRAC. In particular, we limited vignetting by enforcing image-space telecentricity: the operand RAID was used to confine the chief ray angle (CRA) at the last surface of the lens. In addition, the semi-diameter of the lens group was limited by the operand MXSD, and the air and glass thicknesses were constrained by the operands MNCA, MXCA, MNEA, MNCG, MXCG, and MNEG. During the global optimization, we made two changes: first, we replaced the operand TRAC with the operand OPDX to minimize the RMS wavefront error; second, the glass type of each element was set as "substitute" for better performance.
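For intuition, a Zemax merit function evaluates to the RMS of the weighted deviations of all active operands from their targets. The sketch below mimics that aggregation with a hypothetical operand list; it is an illustration of the weighting scheme, not the ZOS-API.

```python
def merit_value(operands):
    """Zemax-style merit function: sqrt(sum(w*(v - t)^2) / sum(w)) over all
    active operands, each given as a (value, target, weight) triple."""
    num = sum(w * (v - t) ** 2 for v, t, w in operands)
    den = sum(w for _, _, w in operands)
    return (num / den) ** 0.5

# Hypothetical example: paraxial magnification near -0.2 and a small RMS
# spot radius, with illustrative weights.
print(merit_value([(-0.198, -0.2, 1.0),   # PMAG-like operand
                   (0.004, 0.0, 2.0)]))   # TRAC-like operand (mm)
```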
To meet the length requirement, we further used Hopkins rod lenses as the relay lens. The desired magnification of the relay lens is 1. We started with two thick doublets that are symmetric about the stop; as a result, the relay introduces no coma, distortion, or lateral color. We used the same merit function as that for the main lens, except that object-space telecentricity was enforced to match the pupil. The variables consist of the radius of curvature of each surface and the central thickness between adjacent surfaces. The schematic of the final endoscope is shown in Fig. 10, and the original lens design file is provided in Dataset 1 (Ref. [31]). The effective focal length (EFFL) of the system is 14.6 mm, and the total length (TOTR) is 212 mm. The back focal length is 3 mm, and the paraxial magnification is −0.206. Figure 11 shows the spot diagrams of three configurations at a working distance of 65 mm and object heights of 0 mm, 7 mm, and 10 mm, respectively; the corresponding modulation transfer functions (MTFs) are shown in Fig. 12. Finally, we performed ray tracing to calculate the vignetting factors for all object points within the depth range and the FOV. The result is shown in Fig. 13, where the pixel value represents the normalized percentage of unvignetted rays. The mean of this map is 0.99, implying that, on average, only one percent of the rays are vignetted. The resultant design therefore maximizes the depth reconstruction fidelity.

Conclusion
In this paper, we systematically studied the effect of field curvature and vignetting on light field depth reconstruction accuracy. We showed that field curvature in light field cameras can be loosely tolerated, while vignetting must be minimized to ensure high reconstruction fidelity. To incorporate this finding into the lens design process, we developed a pipeline that optimizes the optical performance of light field cameras in 3D space, facilitating computational refocusing and parallax-based depth estimation. We expect this work to lay the foundation for future light field camera lens design, particularly in biomedical applications where diagnosis and treatment rely heavily on the accuracy of 3D measurements [27][28][29].
Notably, our current optical design pipeline is applicable only to ray-optics models. This premise holds for light field cameras with a relatively small aperture, such as a light field endoscope. For large-NA imaging, to account for the diffraction effects that occur when recording the light field, the design process must instead be adapted to a wave-optics model [30]. Such a study is beyond the scope of the current work, and we leave it for future investigation.