Efficient Inversion of Multiple-Scattering Model for Optical Diffraction Tomography

Optical diffraction tomography relies on solving an inverse scattering problem governed by the wave equation. Classical reconstruction algorithms are based on linear approximations of the forward model (Born or Rytov), which limits their applicability to thin samples with low refractive-index contrasts. More recent works have shown the benefit of adopting nonlinear models. They account for multiple scattering and reflections, improving the quality of reconstruction. To reduce the complexity and memory requirements of these methods, we derive an explicit formula for the Jacobian matrix of the nonlinear Lippmann-Schwinger model which lends itself to an efficient evaluation of the gradient of the data- fidelity term. This allows us to deploy efficient methods to solve the corresponding inverse problem subject to sparsity constraints.


Introduction
Optical diffraction tomography (ODT) was introduced in [1] by E. Wolf in the late '60s. It is a microscopic technique that retrieves the distribution of refractive indices in biological samples out of holographic measurements of the scattered complex field produced when the sample is illuminated by an incident wave. This method is of particular interest in biology because, contrarily to fluorescence imaging, it does not require any staining of the sample [2]. It proceeds by solving an inverse scattering problem, where the scattering phenomenon is governed by the wave equation. There is a vast literature on inversion methods going from linearized models (Born, Rytov) [1,3] to nonlinear ones [4,5,6,7]. It is worth noting that the scattering model, along with its associated inverse problem, is generic and not limited to optical diffraction tomography. In particular, it is encountered in many other fields such as acoustics, microwave imaging, or radar applications [8].

From the wave equation to the Lippmann-Schwinger integral equation
Let us consider an unknown object of refractive index n(x) lying in the region Ω ⊆ R D (D ∈ {2, 3}) and being immersed in a medium of refractive index n b , as depicted in Fig. 1. This sample is illuminated by the incident plane wave where the wave vector k ∈ R D specifies the direction of the wave propagation, ω ∈ R denotes its angular frequency, and u 0 ∈ C defines its complex envelope (amplitude). The resulting total electric field u(x, t) satisfies the wave equation where c 3 × 10 8 m/s is the speed of light in free space. Denoting by u(x) the complex amplitude of u(x, t) = Re u(x)e −iωt and substituting it into (2), we obtain the inhomogeneous Helmholtz equation with the propagating constant in free space k 0 = ω/c. The total field u(x) is the sum of the scattered field u sc (x) and of the incident field u in (x), which is itself a solution of the homogeneous Helmholtz equation ∇u in (x) + k 2 0 n 2 b u in (x) = 0. Accordingly, (3) can be rewritten as (see [1]) where f (x) = k 2 0 (n 2 (x) − n 2 b ) defines the scattering potential function. It follows that where g(x) is the Green's function of the shift-invariant differential operator (∇ 2 + k 2 0 n 2 b I). Specifically, g verifies ∇ 2 g(x) + k 2 0 n 2 b g(x) = −δ(x), where δ is the Dirac distribution and the minus sign is a convention used in physics. Under Sommerfeld's radiation condition, g(x) is given by [9, and references therein] There, H is the Hankel function of the first kind. Finally, the total field u(x) is governed by the Lippmann-Schwinger equation n(x) Sources (u in p ) Ω Figure 1: Optical diffraction tomography. A sample of refractive index n(x) is immersed in a medium of index n b and illuminated by an incident plane wave (wave vector k). The interaction of the wave with the object produces forward and backward scattered waves. The forward scattered wave is recorded in the detector plane. Optionally, a second detector plane may record the backward scattered wave (see Section 5).

Inverse ODT problem: prior work
Let the object be illuminated by a series of incident fields u in Records of the resulting total fields u p (x) at positions x m (m ∈ [1 . . . M ]) in the detector plane Γ are denoted y p ∈ C M (see Fig. 1). The objective is then to retrieve the scattering potential function f (x) (i.e., the refractive index n(x)) from the data y p . Pioneering methods were using linear approximations of the model. For instance, assuming that the scattering field is weak compared to the incident one (i.e., u sc u in ), one can interpret the phase of the transmitted wave as the Radon transform of the refractive index and then reconstruct f using the filtered-back-projection algorithm [10,11]. This method ignores the effect of diffraction. The first Born approximation [1] has then been proposed as a refined model. Its validity is however limited to thin samples with weak variations of their refractive index (RI) [12]. A more accurate linearization, less sensitive to the thickness of the sample but still limited to weak RI contrasts, is given by the Rytov approximation [3,13]. It is derived by assuming that the total field has the form u(x) = u in (x)e φ(x) , where φ(x) is a complex phase function. Both Born and Rytov approximations have been originally used to derive direct inversion methods. They were later used within regularized variational approaches to improve the quality of reconstructed images [14,15].
Inversion methods that use a nonlinear model have been shown to significantly improve the accuracy of reconstruction. These include the conjugate-gradient method (CGM) [16,17], the contrast source-inversion method (CSI) [18], the beam-propagation method (BPM) [19], the recursive Born approximation [5], or the hybrid method proposed in [4]. Although still approximate (for instance, they do not properly take reflections into account), they more closely adhere to the model of the physical phenomenon than the linear models, at the price of a higher computational cost. We refer the reader to [2] for additional details concerning existing approximations, regularizations, algorithms, and comparisons.
To address applications with thick samples and large RI contrasts, a better solution is to rely on the exact Lippmann-Schwinger model which accounts for mutiple scattering and reflections. Such an approach has been recently proposed in [6,7] (SEAGLE algorithm). There, the authors tackle the problem from a variational perspective. They minimize a nonconvex objective using the well known fast iterative shrinkage-thresholding algorithm (FISTA) [20]. Their main contribution is to compute the forward model (which itself requires the inversion of an operator) using Nesterov's accelerated gradient-descent (NAGD) method [21] and, more interestingly, to explicitly compute the gradient of the quadratic data-fidelity term as an error-backpropagation of the forward algorithm. However, the bottleneck of their method is its high memory consumption. Indeed, the error-backpropagation strategy requires one to store all the iterates produced during the computation of the iterative forward model. This can be limiting for large 3D volumes.

Contributions
To improve the computational efficiency of solvers such as SEAGLE, we provide an explicit expression for the Jacobian of the nonlinear Lippmann-Schwinger operator. This results in an efficient method to compute the gradient of the data-fidelity term and avoid recoursing to the memory-consuming error-backpropagation strategy. Another advantage is that the computation of the forward model and of the gradient are now decoupled. They can thus be solved using any numerical scheme. Then, considering simulated data, we show that the proposed method results in a significant reduction of both computational time and memory requirements with respect to SEAGLE, at no loss in quality.
In Section 2.1, we formulate the discrete forward model proposed in [7]. Then, the common approach used to solve the inverse problem subject to sparsity constraints is presented in Section 2.2. There, we highlight our main innovation with respect to SEAGLE, which is a new computation of the gradient of the data-fidelity term. It relies on the derivation of the Jacobian of the forward model, which is given by Proposition 3.1. Finally, Sections 4 and 5 are dedicated to numerical comparisons.
2 Solving the inverse problem

Formulation of the forward model
In this section, we review the formulation of the forward model that was proposed by Liu et al. in [7]. Let the region of interest Ω be divided into N ∈ N "pixels". Then, over Ω, we define the discrete version of (7) as where u p ∈ C N , u in p ∈ C N , f ∈ R N are the discrete representations of u p , u in p , and f , respectively. The diagonal matrix diag(f ) ∈ R N ×N is formed out of the entries of f , while G ∈ C N ×N stands for the matrix of the convolution operator on Ω (convolution with g). One can notice that (8) is nonlinear with respect to f . On the other hand, given u in p and f , the computation of u p amounts to inverting the operator (I − G diag(f )). Instead of attempting to compute this inverse directly, the ODT forward model on Ω, for a given f , is defined as This classical quadratic-minimization problem can be solved iteratively using numerous state-of-the-art algorithms (see Section 4.2). Then, from the total field u p (f ) (inside Ω), we get measurements y p on Γ using a different discretizationG ∈ C M ×N of the Green's function (see [7]) where u in p | Γ denotes the restriction of the field u in p to the area Γ.

Common optimization strategy
Following the classical variational approach, the estimation of f ∈ R N from the measurements where D : R N → R measures the fidelity to data, R : R N → R imposes some prior to the solution (regularization), and µ > 0 balances between these two terms. It is customary to consider the data term where ∀p ∈ [1 . . . P ] which is well suited for Gaussian noise. Here, y sc p = (y p − u in p | Γ ) is the scattered measured field at the detector plane Γ and u p (f ) is given by (9). As regularizer R, the combination of total variation (TV) penalty and nonnegativity constraint is used, where i 0 (f ) = {0, if f n ≥ 0 ∀n; +∞, otherwise} and ∂ d denotes the gradient operator along the dth direction. This choice is supported by the facts that we consider situations where n b ≤ n(x) ⇒ f (x) ≥ 0 and that n and, thus, f can be assumed to be piecewise-constant. It Algorithm 1 Accelerated forward-backward splitting.
is worth noting that (11) is nonconvex due to the nonlinearity of the forward operator in (8). However, since D is smooth with respect to f , (11) can be solved by deploying a forward-backward splitting (FBS) method [22] or some accelerated variants [20,23], as presented in Algorithm 1. The gradient of the data-fidelity term D is given by with where J hp (f ) denotes the Jacobian matrix of Algorithm 1 encompasses FISTA [20] for a specific choice of the sequence (α k ) k∈N . Its convergence is guaranteed in the convex case when γ < 1/Lip(∇D), where Lip(∇D) is the Lipschitz constant of ∇D. In the nonconvex case, a local convergence of the classical FBS algorithm can be shown [24]. Although, to the best of our knowledge, there exists no theoretical proof of convergence of accelerated versions for nonconvex function, Algorithm 1 always converged in our experiments.

Computation of J H hp (f )
The computation of J H hp (f ), required at line 5 of Algorithm 1, is challenging. The existence of a closed-form solution is made unlikely by the fact that the forward model in (8) itself requires one to invert an operator. We distinguish two distinct strategies.

SEAGLE: Build an error-backpropagation rule from the NAGD algorithm used
to compute the forward model (9).
2. Ours: Derive an explicit expression of J hp (f ), as given in Section 3 (Proposition 3.1).

Computation of prox γλR
Numerous methods have been proposed to compute the proximity operator of R, [25,26,27]. In SEAGLE, Liu et al. use the algorithm proposed by Beck and Teboulle [25].
Here, we compute it using the popular alternating-direction method of multipliers (ADMM) [28], which is well suited to the minimization of the sum of three convex functions. Moreover, it provides a high modularity for spatial regularization since one can easily change from one regularizer (e.g., TV) to another (e.g., Hessian Shattennorm [29]). Details about the computation of R are provided in Appendix A.

Speedup strategies
The cost of evaluating the forward model with (9) and the gradient ∇D is proportional to the number P of illuminations u in p . However, these computations can easily be parallelized by performing the computation for each illumination (or each element of the sum in (15)) on a separate thread. Moreover, in the spirit of the stochastic gradient-descent algorithm [30], we approximate ∇D as where ω is a subset of [1 . . . P ]. We change ω at each iteration. Such a method is known to spare many computations when ∇D p does not admit a simple-form expression.

Efficient computation of the gradient ∇D
The error-backpropagation strategy used in SEAGLE to compute J H hp (f ) implies that one must store all the forward iterates. This consumes memory resources and compromises the deployment of the method for large 3D data. Instead, Proposition 3.1 reveals that its computation requires one to invert the operator (I − diag(f )G H ). This operator has the same form (and size) that the operator we invert within the forward computation in (9) and both can be computed in a similar way, using an iterative algorithm. Moreover, it allows us to decouple the forward and gradient computation in Algorithm 1, which has the two following advantages: • choice of any iterative algorithm for computing (9) at line 4 of Algorithm 1, and computing J H hp (f ) at line 5 of Algorithm 1 (see Section 4.2); • reduction of the memory consumption (no needs for storing forward iterates).
Proposition 3.1. The Jacobian matrix of the function h p in (17) is given by Proof. We use the Gâteaux derivative in the direction v ∈ R N given by Then, from (8), we get that and Combining (21) and (22), we obtain that Finally, we get that and, thus, that which completes the proof.
4 Algorithm analysis

Memory requirement
In this section, we elaborate on the memory consumption of the proposed method in comparison with SEAGLE. First, let us state that gradient based methods, such as NAGD or CG, have similar memory requirements. It corresponds roughly to three times the size of the optimization variable which is the part that is common to both algorithms. The additional memory requirement that is specific to SEAGLE relies only on the storage of the NAGD iterates during the forward computation. Suppose that K NAGD ∈ N iterations are necessary to compute the forward model with (9) and that the region Ω is sampled over N ∈ N pixels (voxels, in 3D). Since the total field u p (f ) computed by NAGD is complex-valued, each pixel is represented with 16 bytes (double precision for accurate computations). Hence, the difference of memory consumption between SEAGLE and our method is which corresponds to the storage of the K NAGD intermediate iterates of NAGD. Here, we assumed that ∇D was computed by sequentially adding the partial gradients ∇D p associated to the P incident fields. Hence, once the partial gradient associated to one incident angle is computed by successively applying the forward model (NAGD) and the error-backpropagation procedure, the memory used to store the intermediate iterates can be recycled to compute the partial gradient associated to the next incident angle. However, when the parallelization strategy detailled in Section 2.2.3 is used, the memory requirement is mutiplied by the number N Threads ∈ N of threads, so that Indeed, since the threads of a single computer share memory, computing N Threads partial gradients in parallel requires N Threads times more memory.
For illustration, we give in Fig. 2 the evolution of ∆ Mem as a function of N for different values of K NAGD and N Threads . One can see with the vertical dashed lines that, for 3D volumes, the memory used by SEAGLE quickly reaches several tens of Megabytes, even for small volumes (e.g., 128 × 128 × 128), to hundreds of Gigabytes for the larger volumes that are typical of microscopy (e.g., 512 × 512 × 256). This shows the limitation of SEAGLE for 3D reconstruction in the presence of a shortage of memory resources and reinforces the interest of the proposed alternative.

Conjugate gradient vs. Nesterov accelerated gradient descent for (9)
Due to Proposition 3.1, we can compute both (9) and J H hp (f ) using any state-of-theart quadratic optimization algorithm. This contrasts with SEAGLE, where one must derive the error-backpropagation rule from the forward algorithm, which may limit its choice. We now provide numerical evidence that GC is more efficient than NAGD for solving (9). To this end, we consider a circular object (bead) of radius r bead and refractive index n bead immersed into water (n b = 1.333), as presented in Fig. 3 (topleft). In such a situation, an analytic expression of the total field is provided by the Mie theory [31,32]. Hence, at each iteration k, we compute the relative error ε k of the current estimate u k to the Mie solution u Mie as In our experiment, the bead is impinged by a plane wave of wavelength λ = 406 nm. The region of interest is square with a side length of 16λ (see top-left panel of Fig. 3). It is sampled using 1,024 points along each side. We used a fine grid in order to limit the impact of numerical errors related to discretization. The wave source corresponds to the bottom border of this region. Then, as in [6,7], we refer to the refractive index n bead by its contrast with respect to the background medium, defined as max(|f |)/(k 2 0 n 2 b ). We show in Fig. 4 the evolution of k ε0 , which is the number of iterations needed to let the relative error (28) fall below ε 0 = 10 −2 . One can observe that CG is much more efficient than NAGD, in particular for large contrasts. This is not negligible since an evaluation of the forward model is required at each iteration of Algorithm 1 (line 4). Our comparison in terms of a number of iterations is fair because the computational cost of one iteration is the same for both algorithms. Note that the descent step of NAGD was adapted during the iterations following the same rule as in [6,7].
Finally, the solution obtained with the two algorithms for r bead = 3λ and a contrast of 1 are shown in Fig. 3. The analytic Mie solution is also provided for comparison. From these figures, one can appreciate the high accuracy obtained by solving (9), as first demonstrated in [6,7].

Numerical experiments
This section is devoted to numerical experiments that illustrate the two main advantages of the proposed method over SEAGLE, which consist in a reduced computational cost and a reduced memory consumption. The algorithms have been implemented using an inverse-problem library developed in our group [33] (GlobalBioIm: http://bigwww.epfl.ch/algorithms/globalbioim/). Hence, they share the implementation of the overall FISTA algorithm as well as inner procedures such as the computation of the proximity operator of R (see Appendix A). The only difference between the two methods resides in the computation of the forward model in (9) and J H hp (f ). For SEAGLE, this is performed using the NAGD algorithm and an error-backpropagation strategy. For our method, (9) and J H hp (f ) are computed using the CG algorithm, in accordance with Proposition 3.1. Note that no parallelization is used. Reconstructions are performed with MATLAB 9.1 (The MathWorks Inc., Natick, MA, 2000) on a PowerEdge T430 Dell computer (Intel Xeon E5-2620 v3).

Simulation settings
The Shepp-Logan phantom of Fig. 5 has the contrast max(|f |)/(k 2 0 n 2 b ) = 0.2. It is immersed into water (n b = 1.333). The wavelength of the incident plane waves is λ = 406 nm. We consider thirty-one incident angles, from −60 • to +60 • . The sources are placed at the bottom side of the sample, at a distance of 16.5λ from its center. Moreover, we consider two detectors placed on both top and bottom sides of the object, also at a distance of 16.5λ from its center. Hence, the overall region is a square of length 33λ per side. Data are simulated using a fine discretization of this region, with a (1024 × 1024) grid that leads to square pixels of surface (3.223 · 10 −2 λ) 2 . We used a large number of CG iterations to get the accurate simulation mentioned in Section 4.2.
Then, the measurements were extracted from the first and last rows of each total field associated to the incident fields. This lead to a total of (31 × 2 × 1024) measurements. Finally, we defined three ODT problems by downsizing (using averaging) the (31 × 2 × 1024) measurements to grids with size of (31×2×512), (31×2×384), and (31×2×256). This setting corresponds to an ill-posed and highly scattering situation. Moreover, the detector length is only two times larger than the object, which results in a loss of information for large incident angles. This makes the resulting inverse problem challenging.
Then, to compute the gradient (stochastic-gradient strategy), we selected eight angles over the thirty-one that were available and changed this selection at each iteration (see Section 2.2.3).
The NAGD or CG forward algorithms are stopped either after hundred-twenty iterations or when the relative error between two iterates is below 10 −4 . Finally, twohundred iterations of FISTA are performed with a descent step fixed empirically to γ = 5 · 10 −3 . We used the regularization parameter µ = 3.3 · 10 −2 .  [6,7] in terms of running time and memory consumption. The reconstructed refractive-index maps are presented in Fig. 6

Metrics
We compared the two methods in terms of running time and memory consumption, as measured by the peak memory (maximum allocated memory) reached by each algorithm during execution. The outcome is reported in Table 1. Once again, due to the use of our inverse-problem library [33], the comparison of the two methods is fair because their implementations differ only by the forward algorithm and by the computation of J H hp (f ). Moreover, CG and NAGD are implemented in the same fashion since they inherit the same optimization class of our inverse-problem library. Finally, we also provide the SNR of the reconstructed refractive index and observe that the computational gain comes at no cost in quality.

Discussion
Our proposed alternative to SEAGLE allows us to reduce both time and memory. Moreover, we have measured the peak memory difference ∆ Mem between the two methods and superimposed it on the predictions of Fig. 2 where the adequacy between the theoretical curves and the measured points is remarkable. Hence, although our experiments are restricted to 2D data, where the gap between the two algorithms is moderate, the evolution of ∆ Mem for 3D data can be extrapolated from Fig. 2. This shows the relevance of our method when size increases.
The SNR values given in Table 1 as well as the reconstructions presented in Fig. 6 suggest that the two methods perform similarly in terms of quality. This is not surprising since the overall algorithm is the same, the differences residing merely in the computation of the forward model in (9) and the Jacobian J hp (f ). Moreover, one can observe that the quality of reconstruction decreases when the discretization grid becomes coarser. Indeed, the model is insufficiently accurate when the discretization is too poor. For instance, in the case of the (128 × 128) grid, one wavelength unit is discretized using eight pixels, which is clearly detrimental to the accuracy of the forward model.
Reconstructions for the (256×256) problem are presented in Fig. 6 for completeness. Besides the difficulty of the considered scenario, the two methods are able to retrieve most details of the object in comparison with the Rytov approximation. Artifacts are mainly due to the missing-cone problem and to the limited length of the detector. This corroborates the findings of [7].

Real data
We evaluated our method using the FoamDielExt target (TM polarisation) of the Institut Fresnel's public database [34]. The data were collected for the two-dimensional inhomogeneous sample depicted in the left panel of Fig. 7. The permitivity of the ground truth was measured experimentally and is subject to uncertainties [34]. The object is fully contained in a square region of length 15 cm per side, which we discretize using a 256 × 256 grid. Sensors were distributed circularly around the object, at a distance of 1.67 m from its center, and with a step of 1 • . Eight sources, uniformly distributed around the object, were sequentially activated. For each activated source, the sensors closer than 60 • from the source were excluded. Thus, 241 detectors among the 360 available were used for each source. Frequencies from 2 to 10 GHz with a step of 1 GHz are available in the database but we used only the 3 GHz measurements (i.e., λ = 10 cm). The NAGD or CG forward algorithms are stopped either after two-hundred iterations or when the relative error between two iterates is below 10 −6 . Hundred iterations of FISTA are performed with a descent step γ = 5 · 10 −3 . We used the regularization parameter µ = 1.6 · 10 −2 .
In Fig. 7, we see that both methods provide good reconstructions that are essentially indistinguishable (see also SNR values provided in the caption of the figure). This corroborates the simulated numerical experiments of Section 5.1. The main point here is that, for this setting, the proposed method was 15 times faster than SEAGLE.

Conclusion
We have presented a refinement of the SEAGLE algorithm that was recently proposed in [6,7] and that has shown unprecedented reconstructions for difficult configurations. However, the current limitation of SEAGLE is that its memory requirements increase excessively with the size of the problem, particularly in 3D. As an alternative, we have derived the explicit expression of the Jacobian matrix J hp (f ) of the nonlinear Lippman-Schiwnger model and shown that it can be computed in a direct analogy with the computation of the forward model. This approach allows us to drastically reduce the memory consumption and opens the door to 3D reconstruction using desktop computers. Moreover, the proposed method is quite flexible in the sense that it can cope with any iterative algorithm employed to compute either the forward model or J H hp (f ). For instance, the conjugate-gradient algorithm proved its efficiency for this task. It allows a significant decrease of the computational time with respect to SEA-GLE. Finally, these improvements in terms of speed and memory come at no loss in quality.

A Proximity operator of R
In this appendix, we describe how we compute the proximity operator of the regularization term R in (14) using the ADMM algorithm [28]. The proximity operator is defined [35] as the solution of the optimization problem prox µR (v) = arg min for µ > 0. Let us start by reformulating (29) as Algorithm 2 ADMM for solving (29).