Advanced light-field refocusing through tomographic modeling of the photographed scene

: Recently we have shown that light-ﬁeld photography images can be interpreted as limited-angle cone-beam tomography acquisitions. Here, we use this property to develop a direct-space tomographic refocusing formulation that allows one to refocus both unfocused and focused light-ﬁeld images. We express the reconstruction as a convex optimization problem, thus enabling the use of various regularization terms to help suppress artifacts, and a wide class of existing advanced tomographic algorithms. This formulation also supports super-resolved reconstructions and the correction of the optical system’s limited frequency response (point spread function). We validate this method with numerical and real-world examples.


Introduction
Portable plenoptic cameras offer unique features which do not exist with traditional cameras, such as the possibility to refocus on different planes of the scene after acquisition [1,2], and the ability to extract depth information regarding the photographed objects [3].These features come at the cost of a greatly reduced spatial resolution.As a result, researchers and developers have invested a considerable effort on developing custom and sometimes very specialized algorithms for computational photography techniques, in particular for super-resolution applications.
Higher light-field refocusing quality and resolution are prerequisites for important applications that can benefit from the ability to refocus on multiple distances after acquisition, depth estimation and volume rendering, all from a single light-field.Examples of these applications can be found in fields ranging from industrial quality control and automation processes to autonomous driving.As light-field imaging will be increasingly adopted by these applications, greater importance will be given to quality and richness of details of the refocused images.
Many approaches attempt to obtain spatially super-resolved light-fields from lower resolution acquisitions [4][5][6][7][8][9].The resulting super-resolved light-fields contain finer detailed structures that can deliver higher quality depth-estimation (distance from the camera) of the objects in the scene, and deliver higher quality image reconstruction with traditional refocusing algorithms.These approaches need strong a priori information to work properly, like a map of the distances from the camera of all the objects in the scene, or the knowledge of their composing structures, which limits their applicability.
A different approach is taken in [10] for a compound-eye imaging system, and in [11] for the light-field microscope, where the authors use the redundancy of the acquired data to super-resolve the refocused images, instead of creating new information in the data.In both [10] and [11], the image formation process is modelled as a linear process, and the high-resolution image reconstruction is formulated as the solution to a minimization problem.The limited frequency response of the optical system, modelled as a point spread function (PSF), is taken into account in the image formation model.
As specified in [11], the traditional light-field acquisition scheme, called unfocused light-field (ULF), does not allow to super-resolve images at the acquisition focal distance.In [12] the authors propose a deconvolution-like solution to the problem, while in [13,14], the authors propose a new popular acquisition scheme, which is known as focused light-field (FLF).Traditionally, the ULF and FLF have been treated as two different acquisition setups that needed different algorithms for similar tasks like refocusing.In the ULF case, notable examples of refocusing approaches are: the integration method [15], the Fourier-space method based on the Fourier slice theorem [16], and a volumetric filtered version of the latter [17].In the FLF case, we have mainly the patch method [14], and a more sophisticated super-resolution method that explicitly takes into account the positioning of the pixels on the sensor [18].Recently, in [19], the authors have demonstrated that the algorithms developed for one camera setup can be applied to the other setup.However, the resulting image resolutions and quality depend on the chosen algorithm.
In a recent paper, we described a framework in which a light-field photography image can be interpreted as limited angle cone-beam tomography acquisition, [20].The framework presents an in-depth analysis of how different parametrizations of the ray-tracing model can be formulated so that the well-known back-projection methods for cone-beam tomography can be used to solve the refocusing problem.
In this paper, we propose to replace the back-projection approach introduced earlier by an algebraic model of the cone-beam tomography problem.This new formulation exposes a broad range of more sophisticated tomographic reconstruction techniques, introducing a number of powerful modelling and computational formulations for light field reconstruction.Specifically: • We extend the formulation by unifying the treatment of unfocused light-field (ULF) and focused light-field (FLF) images.
• We formulate the refocusing reconstruction as a convex optimization problem, introduce additional regularization terms to cope with artifacts, and show how advanced iterative algorithms can be used to solve the problem.
• We show how super-resolved refocused images can be computed.
• We correct for the limited frequency response of the optical system (point spread function).
Each of these extensions contributes to a measurable improvement of the refocusing reconstruction quality.

Detailed content description
The introduction of tomographic projection operations naturally allows to express the image formation process as a linear system, similarly to [10,11], where the forward operator is composed for both geometries of: a tomographic forward-projection operator; a down-sampling operator in the case of super-resolved images; and a blurring operator.The blurring operator takes into account the effects of Fraunhofer diffraction on the lenslets [10,12] and the Nyquist limitation on the refocusing resolution [16].
We propose the use of regularizations like the Total Variation minimization (TV-min) or the stationary wavelet transform (SWT) [21,22], instead of the traditional smoothness constraints [23].These advanced constraints allow to effectively suppress artifacts related to the super-resolution reconstruction, while enforcing prior knowledge in the reconstruction, like sharpness of the in-focus objects, or its compressibility in the wavelet domain.We also demonstrate the use of both established and recent customizable algorithms, which are widely used in the tomography community and include the Simultaneous Iterative Reconstruction Technique (SIRT) [24,25], and the Chambolle-Pock (CP) algorithm instances [26,27].The use of consolidated algorithms allows to re-use existing high-performance solutions, while the use of CP algorithm instances allows to have tailored algorithms for the most popular regularization functionals in a straight-forward manner.
The rest of the article is structured as follows.In section 2 we introduce the used formalism, explain the geometric adjustments needed for the ULF and FLF cases, construct the minimization problem, and present the used reconstruction techniques.In section 3 we present the results of the reconstruction methods detailed in section 2 on simulated and real-world examples, and compare them against a representative selection of existing methods.In section 4 we conclude the paper.

Methods
In this section we introduce and discuss the mathematical tools used in the paper.In section 2.2 we establish the relationship between the projection geometry, the optical system's point spread function (PSF) and the image formation.Then in section 2.3 we express the refocusing as a convex optimization problem.Finally in section 2.4 we introduce the regularization terms used in this article.

Conventions
We schematically represent the two typical plenoptic acquisition setups in Fig. 1.They are composed of a main lens (ML), a micro-lenses array (MLA), and a sensor.The camera array setup is not considered explicitly here, but it is equivalent to the setup in Fig. 1(a), with the only difference being that it lacks the ML [20].
We adopt the conventions and representations from [20].In both Figs.1(a) and 1(b) the ML divides the space into the object space which is the photographed scene (left side) and the image space which is the inside of the camera (right side).
The focal lengths of the ML and the lenslets in the MLA are called f 1 and f 2 respectively, and the distance between the natural focal plane in the object space and the ML is called z 0 .In the ULF case (Fig. 1(a)), the MLA-to-sensor distance is called z 2 and is usually set to f 2 , while the ML-to-MLA distance is z 1 .Given the fact that f 2 << z 1 , the ML is considered in this case to be at the optical infinity of the lenslets in the MLA.In the FLF case, we can have two configurations: "Keplerian" and "Galilean" [28].For simplicity, throughout this paper, we only work with a Keplerian setup (represented in Fig. 1(b)), but our approach equally works for the Galilean setup.In Keplerian mode, the image from the ML is formed in front of the MLA, at the distance z 1 , and that has a ML-to-MLA distance equal to z 1 + a, with a > 0. In Galilean mode, the image from the ML is virtual and it is formed behind the MLA, which is placed again at a distance of z 1 + a from the ML, but with a < 0. For an FLF in Keplerian configuration, the MLA-to-sensor distance is called b and satisfies the thin lens equation: In the ULF configuration, the coordinates (u, v) define the positions on the main lens, while the coordinates (s, t) define the positions of each lenslet on the MLA.For the pixels behind each lenslet, we use coordinates (σ, τ), whose origin is on the center of the lenslet footprint.In this setup, there is a one-to-one correspondence with coordinates (u, v) on the main lens, given by: We refer to the collection of pixels behind a lenslet as the micro-image.The pixels sharing the same (σ, τ) position under each lenslet form a sub-aperture -or pinhole -image.This is due to the fact that their intensities originate from the same point (u, v), as if there were a pinhole in that position.The corresponding pinhole is indicated by the yellow apertures in Fig. 1(a).Compared to the ULF case, we showed in [20] that in the FLF case, we observe a shift in the (s, t) coordinates of each sub-aperture image, which depends on its associated (σ, τ) coordinates: where s MLA and t MLA are the coordinates of the center of the lenslets on the MLA, while the (u, v) coordinates become:

Forward model
In [20] we derived the tomographic volumetric forward-projection and back-projection formulas for the ULF, which relate lines in the considered volume of the photographed scene to points in the light-field and vice versa.The complete forward-projection formula is: where M = z 0 /z 1 is the magnification of the recorded pixels in the photographed scene, (s o , t o ) are the transverse coordinates in the object space, (s i , t i ) are the transverse coordinates in the image space, L (s i , t i , u, v) is the computed light-field, E (s o , t o , z) is the function describing the photographed scene, with support and The back-projection (integration refocusing) formula is instead: where E (s o , t o , z) is the produced focal stack, L (s i , t i , u, v) is the recorded light-field, with support and |U| and |V | are the largest sampled coordinates on the ML.Equations ( 7) and ( 8) are also valid for the FLF case when considering the coordinates (s i , t i ) on the plane at distance z 1 inside the camera.From Eqs. ( 3) and ( 4), we deduce that each recorded sub-aperture image presents a displacement equal to ∆s i = (a/b)∆σ and ∆t i = (a/b)∆τ, respectively.Using Eqs. ( 5) and ( 6), we can compute the displacements of the recorded sub-aperture images in the image space, as functions of the (u, v) coordinates: In the object space, the shifts become: The resulting forward-projection equation for the FLF case is simply: and the back-projection equation is: From a tomographic perspective, in the object space, both Eq. ( 7) for the ULF and Eq. ( 13) for the FLF compute the line integral for the pixel (s o , t o , z 0 ) of a projection having source in (u, v, 0), in a cone-beam geometry acquisition.The only difference between the two resides in the definition of the pixel (s o , t o , z 0 ), which is shifted by the amounts (∆s o,u , ∆t o,v , 0) in the FLF configuration.Due to this, they can be both re-interpreted as the application of the integral forward-projection operator 7) and ( 13) into: where the integration is over the coordinates (s o , t o , z), and the details of the geometry are absorbed into the operator A (s i , t i , u, v; s o , t o , z).Equations ( 8) and ( 14) are the adjoint operations of Eqs. ( 7) and ( 13), respectively.So, Eqs. ( 8) and ( 14) can also be re-interpreted both as the application of the integral back-projection operator where the integration is over the coordinates (s i , t i , u, v), and A † is the adjoint of A. The forward-projection matrix A is the matrix representation of the operator where the rows are selected by the set of indexes {s i , t i , u, v}, and the columns are selected by the set of indexes {s o , t o , z}.The back-projection matrix A T is then the matrix representation of the operator where the rows are selected by the set of indexes {s o , t o , z}, and the columns are selected by the set of indexes {s i , t i , u, v}.The matrices A and A T are sparse, and have real non-negative values [29].
In this article, we restrict ourselves to refocusing at a single distance, as opposite to reconstructing an entire volume from one light-field image.This turns Eq. ( 7) into the projection from a slice at position z in the volume to the light-field: and Eq. ( 8) into the projection from the light-field to such slice at position z: where α = z /z 0 , and the support Ω o,z is the slice at position z = z .In Fig. 2(a) we show the geometric representation for Eqs. ( 17) and ( 18) and α = 1.For α 1, the dotted lines remain the same, but the image grid is shifted to the corresponding z = αz 0 position.For the FLF case, Eq. ( 13) becomes: and Eq. ( 14) becomes: Without loss of generality we can identify again A z as the matrix representation of both the forward-projection operators in Eqs. ( 17) and (19), and A T z as the matrix representation of the back-projection operators in Eqs. ( 18) and ( 20).
If we now consider the discretization of the functions L (s i , t i , u, v) and E z (s o , t o ) as the vectors b and x z respectively, the image formation process for one slice is represented by the following linear system: where xz is the vector representation of the considered slice in the photographed scene, and b is the vector representation of the recorded light-field.In Eq. ( 21) we assume that all the recorded light comes from the chosen slice at distance z.
Any object outside the slab defined by the acquisition depth-of-field (DoF) around z, is not reconstructed correctly and appears out of focus in the refocused images.The acquisition DoF is the sum of two factors: a geometric and a diffractive component.The former was also considered in [20] from the point of view of a tomographic acquisition.It is linearly proportional to the lenslet diameter and inversely proportional to the ML numerical aperture (NA).The latter has instead an inverse quadratic proportionality to the ML NA, and it is proportional to the detected photon wavelength.For a more in depth explanation and derivation of the DoF we refer to [30].For translucent objects, this means that if they are in-focus, their shape is correctly refocused, but the objects outside of the DoF that are visible through them are imaged out of focus.
In case of super-resolved reconstructions, we assume that the refocused slice in the photographed scene has a smaller pixel size than the recorded light-field in the (s, t) coordinates.Equation ( 21) can be modified to remove this discrepancy, by adding a down-scaling operation after the projection of the matrix A z .The down-scaling matrix U n bins the value of n × n neighbouring pixels into a single value, where n is the ratio between the recorded light-field resolution and the slice resolution in the (s, t) coordinates.The resulting image formation process from Eq. ( 21) becomes: The optical system's limited frequency response is also introduced in the form of a convolution matrix P z (Toeplitz matrix), applied to the binned images [31].This matrix performs the convolution of the binned images with the total PSF of the optical system.This PSF itself is the convolution of the PSFs generated by the lenses' limited size and the Nyquist limitation on the resolution.The ML and lenslet's PSFs correspond to the Airy functions defined by their size and the captured light wavelength [2,11,32].They do not depend on z, and they are applied over the (s i , t i ) and (u, v) coordinates respectively.The Nyquist limitation on the resolution corresponds to the integration over the corresponding portion to a given sub-aperture image on the ML [16].It has the shape of a flat disk whose width depends on the distance z, refocusing distance z 0 and (u, v) resolution, and it is applied over the (s i , t i ) coordinates.
By introducing the optical system's limited frequency response and super-resolution capabilities, the complete image formation process for one slice from Eq. ( 22) becomes: where U n is the binning matrix for the chosen n super-resolution, P z is the convolutional matrix that applies the optical system's PSF.The matrices A z and A T z can be computed explicitly by using the sets of unit vectors E o = {e o,1 , e o,2 , . ..} and L i = {l i,1 , l i,2 , . ..} respectively.The unit vectors E o correspond each to a volume in the (s o , t o , z) coordinates, where every value is zero, except for the selected point of position (s o,k , t o,k , z k ), which is one.The same goes for the unit vectors L i in the coordinates (s i , t i , u, v).These vectors are used in Eqs. ( 17)-( 20) to compute a volume in the adjoint space.These resulting volumes are the columns of A z and the rows of A T z when using unit vectors E o , while they are the rows of A z and the columns of A T z when using the unit vectors L i .While this procedure can give mathematical insight about the projection operations, for practical applications, it is better to use existing efficient tomographic implementations, like the ASTRA toolbox [33].

Refocusing as a minimization problem
Refocusing is the procedure of retrieving a photograph E z (s o , t o ) for a given refocusing distance z = αz 0 , from the light-field acquired at z 0 .It was proven in [20] that the simple back-projection of the sub-aperture images corresponds to the integration method for non super-resolved photographs described in [1].According to Eq. ( 21), we obtain the stack of refocused images xz , from Eqs. ( 18) and ( 20), as follows: where b is again the vector representation of the recorded light-field.For super-resolved images, based on the forward model from Eq. ( 22), we update Eq. ( 24) to: where U T n is the up-sampling matrix, transpose of the U n down-sampling matrix.The Fourier slice theorem method from [16] uses an approximation ÃT z to the back-projection matrix A T z which is computationally less expensive, resulting in a very similar refocusing problem: When refocusing to only one slice, the method from [17] is equivalent to the one from [16], in the sense that it performs a Gaussian re-sampling of the Fourier space instead of using Kaiser-Bessel functions.This results in just a slightly different approximation matrix ÃT z for Eq. ( 26).In this article, we formulate the refocusing reconstruction as a convex minimization problem, similarly to the high resolution image reconstruction from [10,11].We minimize the residual on the captured light-field, using Eq. ( 23): where the non-negativity constraint imposes the fact that there cannot be negative light scattering.This representation can be solved by existing iterative algorithms like SIRT and Conjugate Gradient Least-Squares (CGLS) [25].The SIRT implementation for the solution of Eq. ( 27) can be found in Appendix A.

Regularization
The recent introduction of reconstruction algorithm classes like Chambolle-Pock [26,27], offers the ability to generate algorithm instances that solve arbitrary problems characterized by convex and possibly non-smooth terms, in a straight-forward manner.This allows to surpass the implementations from [23] and [10], and to augment Eq. ( 27) with non-smooth convex terms like the following: where λ is a weighting parameter, and O is an operator.The l 1 -norm promotes sparsity [34], and we can chose an operator O that returns a sparse representation of the solution vector x when it presents certain characteristics.This can be used to impose prior knowledge on the reconstruction.For instance, natural images present a sparse wavelet decomposition, while this is not the case for white noise [34].Thus, reducing the l 1 -norm of the wavelet decomposition for natural images, reduces the impact of white noise and possibly other sources of noise, like pixel value quantization and pixel surface integration.
Another popular choice for O is the modulus of the point-wise gradient, whose l 1 -norm is known as Total Variation: The TV term promotes isotropic gradient sparsity in the reconstructed images, thus favouring flat regions with sharp boundaries [27].
In this article, we propose to use either the TV term or the translation-invariant wavelet transform (SWT) [21,22].The SWT is an overcomplete wavelet decomposition, that allows to remove the well-known block artifacts that characterize high compression levels of the traditional discrete wavelet transform (DWT), while preserving its sparsity properties.The CP implementation for the solution of Eq. ( 28) with a TV regularization term can be found in Appendix A.

Practical limitations
The model presented in section 2.2 suffers from a few practical limitations.Three main limitations are: lenslets are usually not perfectly aligned with the underlying sensor pixel grid; the ML does not have real pinholes, but it integrates over a finite region; and the ML causes geometric distortion.
The first limitation implies a mismatch of the (σ, τ) coordinates between the pixels under each lenslet.As a result, none of the presented methods, including the one from the literature can be directly applied to the collected data.This is routinely solved by a re-interpolation of the pixel grid from the area underneath each lenslet (lenslet footprint).The resulting re-sampled grid has the desired matching of (σ, τ) coordinates, but it suffers from a small penalty in directional resolution, as large as half of the (u, v) pixel-size.
The second corresponds to the Nyquist limitation on the resolution discussed in section 2.2.It results in a disk blur dependent on the refocusing distance from the original focal distance z 0 , and it can be mitigated through the inclusion of such blurring term in the projection model, as discussed in section 2.2.
The third limitation causes straight lines to be reconstructed as curved lines.This a well-known problem in computer vision, and it is routinely solved by a check-board calibration of the camera, which returns a map of its distortion.Once the camera has been calibrated, the acquired images can be rectified through a common image distortion correction procedure.Reliable software for carrying out this procedure already exists.An example is the "Light Field Toolbox for Matlab" (LFT: http://dgd.vision/Tools/LFToolbox/)from Donald G. Dansereau.
For a more in depth analysis of lenslet alignment with the underlying sensor, and the geometric distortion introduced by the ML, we refer to [35].

Reconstructions
We now address the reconstruction performance and applicability of the proposed methods.We first use a known phantom from [20] (the "logos" test case) to test the reconstruction fidelity of the refocused objects, for different up-scaling factors (Fig. 3).For brevity we only reconstruct the logos example for an ULF camera geometry.We then demonstrate the ability of our approach to reconstruct real-data for both ULF and FLF camera configurations.For the ULF case, we first show reconstructions of two versions of the "Tarot Cards" dataset from the Stanford Light-Field Archive (http://lightfield.stanford.edu/lfs.html),originally acquired by Andrew Adams (data: Fig. 4).The two versions are used for different purposes: one is downscaled and blurred to simulate an MLA-based acquisition setup, and assess the PSF correction and super-resolution performance; the other is the original full-resolution version, which is used to show that the developed methods can work with full-resolution gantry setups.To further test the ULF case, we then show reconstructions of the "Tea" dataset, acquired with the setup in Fig. 5(a) using the Lytro Illum plenoptic camera (data: Fig. 6).
For the FLF case, we show the reconstruction of two datasets called "Letters" (micro-images: Fig. 7(a), sub-aperture images: Fig. 7(b)) and "Flower" (micro-images: Fig. 8(b), sub-aperture images: Fig. 8(c)), that we acquired experimentally with the dedicated setup in Fig. 5(b).The Letters dataset depicts vector art, that is easily recognizable by the human eye, and that is favoured by the TV-min constraint.The Flower dataset depicts natural imagery (https://github.com/pratulsrinivasan/Local_Light_Field_Synthesis), which is more complex and favoured by the SWT-min constraint.b) is due to the fact that FLF data mixes spatial information with angular information at the micro-images [36], and that the lenslets perform an image space inversion of the images that form on their object space focal plane.

Data description
The logos scene is composed of three different objects positioned at 90, 100 and 110 mm respectively from the ML, as displayed in Fig. 3(a).The dataset is composed of 16 × 16 sub-aperture images, each composed of 512 × 256 pixels.The sub-aperture images for this case are presented in Fig. 3(b).The diameter d 1 of the main lens is 10 mm, while each lenslet has a diameter d 2 = 16 µm.The distance z 2 is 50 µm, and the detector pixel size 1µm.The ML-to-MLA distance z 1 is 25 mm, so that the focal plane in the object space is at 100 mm (|M | = 4).In the creation of this dataset we applied the two PSFs due to Fraunhofer diffraction for the ML and the lenslets in the MLA.These PSFs were generated by computing the Fraunhofer diffraction pattern predicted for the lens aperture (d 1 and d 2 for ML and MLA respectively), detected light wavelength, and lens-to-target distance (z 1 and z 2 for ML and MLA respectively).This diffraction pattern was then integrated in steps equal to the underlying pixel size (16 µm and 1µm for ML and MLA respectively).
To enable the evaluation of the up-sampling refocusing reconstruction quality for the logos case, we also created two additional versions of such test case, where we binned (down-sampled) the sub-aperture images along the (s, t) coordinates by 2 and 4. The reconstruction of these down-sampled datasets was performed at the original dataset (s, t) resolution, turning it into a super-resolved reconstruction problem with corresponding up-sampling factors of 2 and 4, respectively.The resulting up-sampled refocused images could then be compared to the original (high resolution) phantom.In general, this means that for increasing sampling factors, the available data (s, t) resolution becomes worse, less data is available, and more operations are needed to produce images with the original (s, t) resolution.This procedure also resulted in the re-scaling of the expected lenslet sizes, detector pixel sizes, and PSF widths, according to the sampling factors.
The Tarot cards dataset has been acquired using a gantry setup (analogous to a camera array setup), which was built using Lego Mindstorms, and a Canon Digital Rebel XTi, with a Canon 10-22 mm lens.The cropped and registered dataset from the archive comes as a collection of 17 × 17 sub-aperture images, with a relatively high (s, t) resolution (1024 × 1024).The focal length of the main lens f 1 was set to 20 mm.We binned the sub-aperture images to a resolution of 256 × 256, and blurred the light-field in the (u, v) coordinates using a different PSF per color channel.The three PSFs have been computed as the result of the Fraunhofer diffraction effects due to the lenslets, with the following parameters: lenslet diameter d 2 = 20 µm, distance z 2 = 37 µm and detector pixel size 1.4 µm.The parameters for the lenslets are the same as the ones in a well known plenoptic camera, commercially available under the name of Lytro ILLUM.The modified Tarots cards dataset is displayed in Fig. 4.
The Tea dataset (Fig. 6) has been acquired using a Lytro Illum plenoptic camera, with the setup in Fig. 5(a).Being the Lytro Illum a proprietary camera, we do not have the complete acquisition parameters.The raw data and metadata were extracted using the proprietary software "Lytro power tools", property of Lytro.This software already performs the rectification and lenslet re-sampling procedures discussed in section 2.5.
The Letters and Flower datasets (Figs. 7 and 8) represent printed vector art and natural imagery, that was then photographed with the dedicated setup in Fig. 5(b), in the laboratories of Imagine Optic (Bordeaux, France).In this setup, we used the AC254-200-A-ML ML and MLA300-14AR-M MLA from Thorlabs, and the Stingray F-145 sensor from Allied Vision.The ML has focal distance f 1 = 200 mm, and the MLA is composed of ∼ 30 × 30 lenslets with f 2 = 18.6 mm, diameter d 2 = 300 µm, and built-in f -number equal to 62.The detector pixel size is 6.45 µm.We chose a z 1 distance of 400 mm, which gave a z 0 distance of 400 mm and |M | = 1.We also chose a distance b = 20.93 mm, which corresponded in a distance a = 167.4mm.The field-of-view (FoV) of the scene is 7.2 mm × 4.8 mm, while the DoF is 2.5 mm [30].The degradation of the micro-image quality observed in both Figs.7 and 8 is due to the PSF of the imaging system, and in particular to the large lenslet f -number, which negatively correlates to the imaging system spatial resolution [30].
For the reconstruction of all these test cases, the chosen regularization weight λ values have been selected empirically by trial and error.In the case of multi-channel data (e.g.RGB), we performed a single-channel reconstruction, where each channel has been reconstructed separately.However, the same λ value was used for the reconstruction of all the channels.

Performance analysis
To evaluate the reconstruction fidelity of the logos test case, we can directly compare the refocused regions with the phantom, and compute the root mean square error (RMSE) of each refocused image using the formula: where the summation is over the selected N pixels, and the vector xz represents the expected solution at the given depth z.Lower values of the RMSE correspond to better refocusing performance.

Logos refocusing
The logos test case is a benchmark on the impact of both the PSF correction and the regularization, with respect to different up-sampling parameters.We compare the refocusing reconstruction results between a few different reconstruction approaches: integration (known also as "shift&sum"), the mathematically equivalent tomographic simple back-projection, Fourier slice theorem, SIRT with and without PSF correction, and CP using the Least Squares (l 2 ) data-divergence term and TV penalization (abbreviated to CP-LS-TV), both with and without PSF correction.The refocusing was performed on the CWI logo at distance 90 mm from the ML, corresponding to α = 0.9 (as defined in section 2.3).As mentioned in section 2.3, the choice of a simple Fourier slice theorem implementation from [16], over a filtered version from [17] is due to the fact that the two versions, for planar refocusing only differ by the introduction of a filtering step that is equivalent to a Gaussian resampling, instead of the Kaiser-Bessel resampling proposed in [16].This means that on a noiseless phantom, the two methods should behave very similarly.
In Fig. 9 we present all the different reconstructions against the all-in-focus phantom, at the original phantom resolution (sampling factor = 1: down-sampling of sub-aperture images = 1, up-sampling of reconstruction = 1).It is apparent from the comparison of Figs.9(b) to 9(d) against Figs.9(e) to 9(h) that iterative methods can define better sharp transitions than traditional one-step refocusing methods, even in the presence of blurring.Then, from the comparison between Figs. 9(e) to 9(f) and Figs.9(g) to 9(h), we see that the PSF correction greatly improves the recovery of sharp boundaries.From the comparison between Fig. 9(g) and Fig. 9(h), by looking close to the edge of the CWI target in the left inset of each figure, it is possible to notice that the additional regularization term helps in reducing the wavy type of artifacts due to the PSF reconstruction.
Figure 10(a) quantifies the performance of the different methods, by plotting the RMSEs of each reconstruction method for the down-sampling factors 1, 2 and 4 of the sub-aperture images in the original light-field, and corresponding up-sampling of the refocusing reconstructions.As also observed for Fig. 9, the one-step refocusing methods present similar reconstruction accuracy across the board, with the difference between the integration and the back-projection approach simply being due to the different interpolation method used in their implementations.Among the iterative methods, the PSF correction can improve the reconstruction accuracy by almost an order of magnitude at the highest resolutions, while it becomes less important with higher detector binning factors, because the PSF effect tends to be substantially reduced [10].This can be observed in particular for the sampling factor equal to 2, where the SIRT and CP-LS-TV reconstructions without PSF correction improve their refocusing performance with respect to the original dataset, because of a reduction in size of the lenslet PSF.The choice of the correct regularization (in this case TV-min) improves the result over non-regularized reconstructions, and it becomes relatively more important at higher sampling factors and with smaller PSFs.Finally, in Fig. 10(b) we can observe the computational time required for every tested method at each down-sampling factor.Even if the focus of this article is not on the computational performance of the proposed method, we believe that it is worth giving an understanding of the performance penalties associated with each addressed aspect.These tests were carried out on a machine equipped with an Intel i7-6700K CPU, 64GB of DDR4 memory, and a NVIDIA GeForce GTX 970 GPU.The fastest method is the back-projection because it is performed using the ASTRA toolbox, which uses GPU acceleration.All the iterative methods also use the GPU acceleration offered by ASTRA for the forward and back-projection, but perform the rest of the operations on the CPU.This imposes data transfers across the PCIe bus (from GPU to CPU and vice versa) at each iteration.We expect to see substantial improvements by running these refocusing routines entirely on the GPU, by both avoiding bottlenecks due to the data transfers, and allowing to accelerate every other operation, including the PSF application.For the same number of iterations, the addition of the TV minimization term has a negligible impact on the refocusing performance.The PSF addition causes the highest performance penalty, which however depends on the size of the PSF: for increasing down-scaling factors, it has a progressively lower impact.The Fourier method is also showing better performance with higher down-sampling factors, because the initial 4D-FFT is performed on smaller data sizes [16].

Tarot cards refocusing
The Tarot cards test case shows the super-resolution and artifact correction potential of the proposed methods, over real-data, and in particular over a well understood and widely used dataset.
We first analyze the results of the downscaled and blurred dataset.In Fig. 11, we can observe the refocusing performance of the most significant reconstruction methods, for the two different up-sampling parameters of 1 and 4. For this test case, we chose to use: back-projection, Fourier slice theorem, SIRT with PSF, and CP-LS with SWT penalization based on Haar wavelets (CP-LS-SWT) and PSF.Again, the back-projection and Fourier based methods deliver comparable refocusing and feature-resolving performance.In Fig. 11(c), for up-scaling equal to 4, the SIRT reconstruction presents a better feature resolving performance than the one-step approaches, but also visible artifacts in the reconstruction.The SWT penalized reconstruction from Fig. 11(d) visibly preserves the higher feature resolving performance of the SIRT approach, while strongly reducing the said artifacts.The choice of the SWT penalization term over the TV term can be understood through the same reasoning detailed in section 2.4.The Tarot Cards dataset presents a variety of features, including shaded regions.These regions break the assumption of sharp boundaries associated to the TV minimization.
We now consider the reconstruction of the unmodified Tarots Cards dataset.It can be deduced from Fig. 12 that the approaches presented in this article allow to correctly reconstruct high resolution gantry data.By comparing the results of the different methods in Fig. 12, no difference is visible between the reconstructions.This is due to the fact that, in this case, the back-projection method already returns a correct reconstruction.In fact, the blur visible in the reconstructions is also present in the sub-aperture images (Fig. 12(a)).These high resolution gantry data reconstructions can be used to quantify the quality of the super-resolved refocusing reconstructions that used the modified Tarots Cards dataset.The measured RMSE values for the reconstructions in Fig. 11, against Fig. 12(b) are the following: 0.0363353, 0.103504, 0.0382393, 0.0169782 for the back-projection, Fourier, SIRT with PSF, and CP-LS-SWT with PSF respectively.These results confirm the visual interpretation carried out earlier in this section.

Tea refocusing
The Lytro Illum plenoptic camera is a popular commercially available compound camera, that allows to easily acquire and extract single-shot light-fields, while maintaining a portable format.In Fig. 13, we show the reconstruction of the Lytro Illum data acquired with the setup in Fig. 5(a).As it was done for the Tarots Cards dataset, we performed reconstructions with up-sampling of 1 and 4, for the following methods: back-projection, Fourier slice theorem, SIRT with PSF, and CP-LS with SWT penalization based on Haar wavelets (CP-LS-SWT) and PSF.
This example shows that the methods developed in this article can be applied to the data acquired with this popular camera.It can be seen from the leaf and castle insets that once again both iterative approaches present a better feature resolving performance than the one-step approaches.Moreover, the SWT regularization is well behaved for the refocusing of natural images, and it helps in reducing artefacts.
In this example, we also focus our attention to the out-of-focus regions of the refocused image.In particular, we refer to the bottom inset, which magnifies an edge from the background.The super-resolved back-projected reconstruction (bottom of Fig. 13(a)) of this region is similar to the corresponding Fourier version (bottom of Fig. 13(b)), but it also presents traces of a wavy pattern.The iterative methods used in Figs.13(c) and 13(d) present a stronger artifact because they use the same back-projection operation as in Fig. 13(a), but iterate it multiple times.The observed artifact is due to inaccuracies in the numerical interpolation of the used back-projection implementation, which is more apparent for higher relative distances from the acquisition focal distance.In fact, this effect is not visible for the Tarot Cards test case (Fig. 12), and it deviates from the expected bokeh effect for out-of-focus regions.To remove it, the back-projection operation could be substituted by a Fourier type of back-projection.

Letters and Flower refocusing
The Letters and Flower test cases assess the ability of our approach to handle real-data produced with the FLF camera geometry.We show that every aspect of our approach, including superresolution and regularization, can be carried over to this case.In Fig. 14, we present the reconstruction of the FLF data from Fig. 7 for the following approaches: back-projection, SIRT without PSF and CP-LS-TV with PSF.For this case, the choice of the TV term is justified by the properties of the imaged objects: the printed letters, with sharp boundaries over a plain background, explicitly satisfy the prior knowledge of the TV minimization.We present both reconstructions with up-sampling equal to 1, and 2. From Fig. 14, it is clearly visible that the use of super-sampling and regularization techniques can improve the reconstruction of the smallest features.
The PSF used for this case is the result of both the theoretically predicted PSF due to Fraunhofer diffraction of the lenslets, and the Nyquist limitation on the resolution for the given refocusing distance (as seen in section 2.2).To compute the Nyquist PSF size, we used a computed refocusing distance z 430 mm, an acquisition focus distance z 0 = 400 mm, and (u, v) resolution of ∼ 0.175 mm. Figure 14(c), despite looking overall sharper than Fig. 14(b), still presents some blur.This is due to the fact that the PSFs were only theoretically computed, based on the estimated distances z, z 0 , and the measured b (which in turn determines a, and the (u, v) resolution).For higher reconstruction accuracy, a thorough calibration of the acquisition setup is required (section 2.5), including the experimental determination of the camera PSF.In Fig. 15, we present the reconstruction of the FLF data from Fig. 8. Similarly to what was done for the Letters test case, we used the following: back-projection, SIRT without PSF and CP-LS-SWT with PSF.The notable difference from the Letters case is the choice of the SWT regularization term, over the TV term.As previously mentioned, natural imagery generally breaks the assumption of flat and sharp regions in the reconstructed images.
In Fig. 15(a), we can see the expected reconstruction with an ideal camera (with infinite bandwidth).However, in practice, as stated in [16], the acquired light-field is convolved with a low pass filter, which has the form of a 4D sinc function.This results in lower than ideal performance for each reconstruction method, but an increasing similarity to Fig. 15(a) indicates increasing reconstruction quality.Moreover, the images in Fig. 15(a), were obtained by downscaling the high resolution image from Fig. 8(a), and not by taking a picture of a printed version, as it was done for Figs.8(b) and 8(c).As a consequence, especially the higher resolution version represents a purely ideal case, because it was not upsampled from its lower resolution version.
Despite requiring a more thorough calibration of the images and PSF, also in this case, the proposed methods improve on the back-projection approach.In particular, the super-resolution reconstruction using the theoretical PSF and SWT regularization allows to visibly recover more of the expected features, than the other considered approaches.

Conclusions and outlook
The proposed tomography based light-field refocusing method is defined in the direct-space, as opposed to previously proposed Fourier space tomography methods.We have shown that this formulation is applicable to both the ULF and FLF camera configuration, and it allows to both easily introduce advanced forward operator modeling, and advanced regularization techniques.This is due to the fact that it reduces the different camera geometries to small differences in the projection implementation, resulting in a unified refocusing strategy for all the available plenoptic acquisition schemes.One of the main results of this aspect is that it provides greater and wider applicability for future developments in the field.As an example, we showed that the introduction of regularization terms in the refocusing reconstruction could be applied to both the ULF and FLF cases.
The TV and SWT regularization techniques also proved to be powerful tools for reducing the artifacts due to super-resolution reconstructions and PSF corrections.Newer and more advanced regularizations could help to further improve the reconstruction quality and push for higher super-resolution refocusing.
The proposed approach will allow in the future to introduce the optical system's aberration modeling in the forward projection operator, in the same fashion as for the PSF.This would result in yet higher refocusing accuracy and super-resolution.We believe that further effort could be devoted to determining the theoretically best achievable super-resolution for the light-field data and geometry at hand, once the PSF and aberrations have been taken into account.On another note, this article further confirms that computational improvements can help to minimize the impact of physical limitations imposed by the manufacturing of smaller optics and detectors with smaller pixel sizes.
Finally, we showed that the proposed method allows to tune the refocusing problem based on the available prior knowledge, but also that multiple advanced algorithms could be used to solve these problems.This means that the newest developments in the reconstruction algorithms could be ported to light-field refocusing from other fields like computed tomography, alongside with the newest developments in convex optimization and regularizations.

Fig. 2 .
Fig. 2. Projection geometry of the two light-field acquisition geometries: (a) unfocused light-field (ULF); (b) focused light-field (FLF).At the distance z 0 all the sub-aperture images coincide on the (s, t) axes for the ULF case, while they experience shifts ∆s = (a/b)∆σ and ∆t = (a/b)∆τ for the FLF case.

Figure 2 (
Figure 2(b) gives a representation of the projection geometry of different sub-aperture images, for the FLF case.Without loss of generality we can identify again A z as the matrix representation of both the forward-projection operators in Eqs.(17) and(19), and A T z as the matrix representation of the back-projection operators in Eqs.(18) and (20).If we now consider the discretization of the functions L (s i , t i , u, v) and E z (s o , t o ) as the vectors b and x z respectively, the image formation process for one slice is represented by the following linear system:

Fig. 3 .Fig. 4 .
Fig. 3. Description of the logos synthetic case: (a) phantom used to simulate light field data, having a VOXEL logo at the acquisition focal distance z = 100 mm, and two CWI logos at the distances of 110 mm, and 90 mm; (b) simulated data, with two sub-aperture images for clarity.

Fig. 6 .
Fig. 6.Raw data from the Tea dataset acquired using a Lytro Illum plenoptic camera, with the setup from Fig. 5(a).

Fig. 7 .
Fig. 7. Raw data for the Letters case: (a) Micro-image representation; (b) Sub-aperture representation.In both cases, two zoomed insets have been shown for clarity.The observed inversion between (a) and (b) is due to the fact that FLF data mixes spatial information with angular information at the micro-images[36], and that the lenslets perform an image space inversion of the images that form on their object space focal plane.

Fig. 8 .
Fig. 8. Raw data for the Flower case: (a) Original (high resolution) image used for this test case; (b) Micro-image representation; (c) Sub-aperture representation.In both cases, two zoomed insets have been shown for clarity.

Fig. 10 .
Fig. 10.Root-Mean-Square-Errors (RMSEs) and computational time for each reconstruction approach from Fig. 9 for the down-sampling factors 1, 2, and 4 of the sub-aperture images (the (s, t) coordinates of the light-field): (a) RMSE; (b) Computational time; (c) Scaling of sub-aperture image resolution and reconstruction up-sampling factor against the sampling factor.

Fig. 11 .Fig. 12 .
Fig. 11.Performance comparison of different refocusing approaches, at different levels of up-scaling (super-resolution) for the Tarot Cards case.First row shows reconstructions at up-sampling = 1, and second row up-sampling = 4.In the columns, instead, we have the different approaches: (a) Back-projection; (b) Fourier slice theorem; (c) SIRT with PSF; (d) Chambolle-Pock using PSF, l 2 data-divergence term, and SWT Haar regularization with λ = 3.
Fig. 14.Performance comparison of different refocusing approaches, at different levels of upscaling (super-resolution) for the Letters case.First row shows reconstructions at up-sampling = 1, and second row up-sampling = 2.In the columns, instead, we have the different approaches: (a) Back-projection; (b) SIRT without PSF; (c) Chambolle-Pock with PSF, l 2 data-divergence term, TV regularization, and λ = 100.

Fig. 15 .
Fig. 15.Performance comparison of different refocusing approaches, at different levels of upscaling (super-resolution) for the flower case.First row shows reconstructions at up-sampling = 1, and second row up-sampling = 2.In the columns, instead, we have the different approaches: (a) Expected reconstruction for an ideal camera with infinite bandwidth optical system response; (b) Back-projection; (c) SIRT without PSF; (d) Chambolle-Pock with PSF, l 2 data-divergence term, SWT regularization, and λ = 5.