Efficient and accurate inversion of multiple scattering with deep learning

Image reconstruction under multiple light scattering is crucial in a number of applications such as diffraction tomography. The reconstruction problem is often formulated as a nonconvex optimization, where a nonlinear measurement model is used to account for multiple scattering and regularization is used to enforce prior constraints on the object. In this paper, we propose a powerful alternative to this optimization-based view of image reconstruction by designing and training a deep convolutional neural network that can invert multiple scattered measurements to produce a high-quality image of the refractive index. Our results on both simulated and experimental datasets show that the proposed approach is substantially faster and achieves higher imaging quality compared to the state-of-the-art methods based on optimization.


Introduction
The problem of reconstructing the spatial distribution of the dielectric permittivity of an unknown object from the measurements of the light it scatters is common in many applications such as tomographic microscopy [1] and digital holography [2].The problem is often formulated as a linear inverse problem by adopting scattering models based on the first Born [3] or Rytov [4] approximations.However, these linear approximations are inaccurate when scattering is strong, which leads to reconstruction artifacts for objects that are large or have high permittivity contrasts [5].For strongly scattering objects, it is preferable to use nonlinear measurement models that can account for multiple light scattering inside the object [6][7][8][9][10][11][12][13][14].
When adopting a nonlinear measurement model, it is common to formulate image reconstruction as an optimization problem.The objective function in the optimization typically includes two terms: a data-fidelity term that ensures that the final image is consistent with measured data, and a regularizer that mitigates the ill-posedness of the problem by promoting solutions with desirable properties [15].For example, one of the most widely adopted regularizers is total variation (TV), which preserves image edges while promoting smoothness [16].TV is often interpreted as a sparsity-enforcing 1 -penalty on the image gradient and has proven to be successful in the context of diffraction tomography with and without multiple scattering [12,13,[17][18][19][20][21].
Despite the recent progress in regularized image reconstruction under multiple scattering, the corresponding optimization problem is difficult to solve.The challenging aspects are the nonconvex nature of the objective and the large amount of data that needs to be processed in typical imaging applications.In particular, when the scattering is strong, the problem becomes highly nonconvex, which negatively impacts both the speed of reconstruction and the quality of the final image [14].
In this paper, we consider a fundamentally different approach to the problem of image reconstruction under multiple scattering.Recently, several results have interpreted multiple scattering as a forward-pass of a convolutional neural network (CNN) [10,13,21].This view inspires us to reconstruct the object by designing another CNN that is specifically trained to invert multiple scattering in a purely data-driven fashion.While our approach is consistent with the recent trend of using deep learning architectures for image reconstruction [22][23][24][25][26][27][28][29][30], it is fundamentally different in the sense that due to multiple scattering our measurement operator is both nonlinear and object dependent (and hence unknown).Our approach is also related to the recent work on reverse photon migration for diffuse optical tomography [31].However, our focus is on diffractive imaging, where the light propagation is assumed to be deterministic, rather than stochastic as in [31].Finally, we extensively validated the proposed method on several simulated and real datasets by comparing the method against recent optimization-based approaches based on the Lippmann-Schwinger (LS) model and the TV regularizer [12,14].Our results show that it is possible to invert multiple scattering by training a CNN, even when imaging strongly scattering objects for which optimization-based approaches underperform.To the best of our knowledge, the results here are the first to show the potential of deep learning to reconstruct high-quality images from multiple scattered light measurements.

Nonlinear diffractive imaging
In this section, we describe the traditional optimization-based approach for nonlinear diffractive imaging.We first review the image reconstruction and then discuss the details of the physical model for multiple scattering.

Nonlinear inverse problem
We consider an imaging inverse problem where the goal is to recover the unknown image x ∈ R N from the noisy measurements y ∈ C M .The measurement operator H : R N → C M models the response of the imaging system and the vector e ∈ C M represents the measurement noise, which is often assumed to be independent and identically distributed (i.i.d.) Gaussian.When the inverse problem is linear, the measurement operator is represented as a measurement matrix H ∈ C M ×N .
In practice, problems such as (1) are often ill-posed; the standard approach for solving them is by formulating an optimization problem where the data-fidelity term ensures that the final image is consistent with measured data and the regularizer R promotes solutions with desirable properties.Two common regularizers for images include the spatial sparsity-promoting penalty R(x) τ x 1 and total variation (TV) penalty R(x) τ Dx 1 , where D is the discrete gradient operator [16,32,33].Two common methods for solving optimization problems of form (2) are FISTA [34] and ADMM [35], both of which were successfully applied to the problem of image reconstruction from scattered light data [11,12,14,20].

Multiple-scattering model
Consider the scattering problem illustrated in Fig. 1, where an object of the permittivity distribution (r) in the bounded domain Ω ⊆ R 2 is immersed into a background medium of permittivity b and illuminated with the incident electric field u in (r).We assume that the incident field is monochromatic and coherent, and it is known inside Ω and at the locations of the sensors.The result of object-light interaction is measured at the location of the sensors as a scattered field u sc (r).The multiple scattering of light can be accurately described by the Lippmann-Schwinger equation [36] inside the image domain where u(r) = u in (r) + u sc (r) is the total electric field, f (r) k 2 ( (r) − b ) is the scattering potential, which is assumed to be real, and k = 2π/λ is the wavenumber in vacuum.The function g(r) is the Green's function, defined as g(r) j 4 H (1)      is the zero-order Hankel function of the first kind.Note that the knowledge of the total-field u inside the image domain Ω enables the prediction of the scattered field at the sensor The discretization and combination of (3) and ( 5) leads to the following matrix-vector description of the scattering problem where x ∈ R N is the discretized scattering potential f , y ∈ C M is the measured scattered field u sc at Γ, u in ∈ C N is the input field u in inside Ω, S ∈ C M ×N is the discretization of the Green's function evaluated at Γ, G ∈ C N ×N is the discretization of the Green's function evaluated inside Ω, • denotes a component-wise multiplication between two vectors, and e ∈ C M models the additive noise at the measurements.Using the shorthand notation A I − Gdiag(x), where I ∈ R N ×N is the identity matrix and diag(•) is an operator that forms a diagonal matrix from its argument, we can formally specify the nonlinear forward model as follows  The recent work has shown that the computation of the operator H(•) can be interpreted as a CNN and that the gradient of the corresponding data-fidelity term can be efficiently evaluated, enabling efficient optimization [12,13,21].

Scattering decoder
We now describe our proposed deep learning approach called Scattering Decoder (ScaDec).

Backpropagation
The general framework of our approach is visually illustrated in Fig. 2. The first-step in the method is backpropagation, which simply transforms the collected data from the measurement domain to the image domain.We define the backpropagation of the measurements generated by the kth transmitter as where vector y k ∈ C M are the measurements consistent with the kth transmitter and collected by M receivers, and matrix P k ∈ C N ×M is the backpropagation operator.Inside the operator P k , matrix S H ∈ C N ×M is the Hermitian transpose of the discretized Green's function S, and u * in,k is the element-wise conjugate of the incident light field emitted by the kth transmitter.The output z k ∈ C N is a complex vector with N elements, which matches the number of pixels in the original image.When the data is collected with multiple transmissions, we define the backpropagation of K transmitters as where vector w ∈ C N is the linear combination of z k and K denotes the number of transmitters.Note that the backpropagation ( 9) does not rely on the actual forward model H(•) in ( 7) which is both nonlinear and object dependent.Remarkably, as we shall see, our simple backpropagation followed by a specific CNN architecture will be sufficient to recover a high-quality image given multiple scattered measurements.Note that since w is a complex vector, we consider its real and complex parts as two distinct feature maps of the object f .Thus, the backpropagation can be viewed as a fixed layer in a CNN with M inputs and two outputs to the subsequent layers (see Fig. 2) [37].The weights inside the layer are characterized by P k 's, and the activation functions for the output nodes are Re(•) and Im(•), respectively.

U-Net decoder
We design the ScaDec model based on the popular U-Net architecture [37], which was recently applied to various image reconstruction tasks such as X-ray CT [27,39].Fig. 3 shows a detailed diagram of the proposed CNN architecture.There are two key properties that recommend U-Net for our purpose.

Multi-resolution decomposition:
The decoder employs a contraction-expansion structure based on the max-pooling and the up-convolution.This means that given a fixed size convolution kernel (3×3 in our case), the effective receptive field of the network increases as the input goes deeper into the network.

Local-global composition:
In each resolution level, the outputs of the convolutional block in the contraction are directly connected and concatenated with the input of the convolutional block in the expansion.The skip connection enables the later layers to reconstruct the feature maps with both the local details and the global texture.
The suitability of U-Net architecture is further corroborated by the results in Section 4, demonstrating the ability of the network to form high-quality images from multiple scattered measurements.

Experimental validation
We now present the results of validating our method on simulated and experimental datasets.We evaluate the data-adaptive recovery capability of ScaDec by selecting datasets that consist of images with nontrivial features that can be well represented by a CNN, but are not well captured by fixed regularizers such as TV.The first simulated dataset consists of synthetically generated piecewise-smooth images with sharp edges and smooth Gaussian regions.The second simulated dataset consists of human face images [40].The experimental dataset is the public dataset provided by the Fresnel Institute [41], which consists of experimental microwave measurements of the scattered electric field from 2D targets consisting of foam and plastic cylinders.

Results on simulated datasets
The two simulated datasets were obtained by using a high-fidelity simulation of multiple scattering with the conjugate-gradient solver.Each of the datasets contains 1548 images, separated into 1500 images for training, 24 images for validation, and 24 images for testing.The physical size of images was set to 18 cm × 18 cm, discretized to a 128 × 128 grid.The background medium was assumed to be air with b = 1 and the wavelength of the illumination was set to λ = 0.84 cm.The measurements were collected from 40 transmissions uniformly distributed along a circle of radius 1.6 m and, for each transmission, 360 measurements around the object were recorded.The simulated scattered data was additionally corrupted by an additive white Gaussian noise corresponding to 20 dB of input signal-to-noise ratio (SNR).
We evaluated the proposed model in two distinct scenarios associated with the weak and strong scattering.Weak scattering corresponds to the regime where first Born approximation is valid.In particular, we defined the permittivity contrast as f max ( max − b )/ b , where max max r∈Ω { (r)}.The permittivity contrast quantifies the degree of nonlinearity in the inverse problem, with higher f max indicating stronger levels of multiple scattering.We regarded the weakly scattering scenario as f max = 10 −6 , whereas the strong   [38] (FB-NN, column 2) and the total variation [17] (FB-TV, column 4), the non-linear method in [14] regularized by imposing non-negativity (LS-NN, column 3) and the total variation (LS-TV, column 5).The values above images show the SNR (dB) of the reconstruction.The first column shows the true images.Each row corresponds to a different scattering scenario, which is denoted above the leading true image.

FB-NN LS-NN FB-TV LS-TV ScaDec Truth
scattering scenario was considered as f max = 5 × 10 −2 .For each scenario, we trained a separate ScaDec architecture using the corresponding training dataset with the reconstruction mean squared error (MSE) as the loss function.For quantitively measuring the quality of the reconstructed image x with respect to the true image x, we used the signal-to-noise ratio (SNR) defined as SNR(x, x) max a,b∈R where higher values of SNR correspond to a better match between the true and reconstructed images.As illustrated in Fig. 4, we observed no issues with the convergence of the training for our architecture and datasets.Note that all the SNR and visual results were obtained on a distinct dataset that does not contain images used in training.Table 1 summarizes the results of comparing ScaDec against the baseline optimization-based methods corresponding to two different priors: nonnegativity constraints on the image and TV.For each prior, we consider the effects of the linearity versus nonlinearity of the measurement model.The linear measurement model is obtained by using the first Born-approximation, while the nonlinear model takes into account multiple scattering by using the full Lippmann-Schwinger equations [12,14].Fig. 5 additionally shows some visual examples of the reconstructed images for each scenario under consideration.Note that the regularization parameters for TV were optimized for the best SNR performance for all the experiments.The results confirm that as the level of scattering increases, the performance under the linear inverse problem formulation based on the first Born approximation degenerates with or without regularization.While TV substantially improves the SNR, it also imposes a piecewise-constant structure, leading to blocky artifacts visible in Fig. 5. On the other hand, the output of ScaDec substantially outperforms the baseline methods and leads to higher SNR values and to more natural looking images free of blocky artifacts.ScaDec also enjoys good stability in terms of performance, where the reconstruction SNR is nearly identical in weakly and strongly scattering regimes.
Finally, the computational cost of ScaDec is extremely low during the reconstruction stage, where each reconstruction corresponds to a simple forward pass through the CNN.In our case, all the optimizationbased methods were run on a pair of 2 CPUs (Intel Xeon processor E5-2620 v4) during testing, while ScaDec was evaluated on a single GPU (NVIDIA GeForce GTX 1080 Ti).We observed that the reconstructing time of ScaDec for a single image was less than 2 seconds in all scenarios, while LS-TV took over 8 and 35 minutes to reconstruct one image in the weakly and strongly scattering cases, respectively.

Results on experimental datasets
For the experimental validation, two 2D settings were considered: FoamDielExtTM and FoamDielIntTM that consist of a fixed foam cylinder and a plastic cylinder located outside or inside of the foam (Fig. 6).In both settings, the objects were placed within a 18 cm × 18 cm square region, discretized to 128 × 128 grid, hence, the pixel size of the reconstructed images was 1.4 mm × 1.4 mm.Total 8 transmitters were uniformly distributed along a circle of radius 1.78 m, emitting electromagnetic wave towards the objects, and the measurements of the scattered wave were recorded by 360 receivers.Though the dataset contains measurements of a range of wave frequencies, we only consider the case of 5 GHz; hence, the wavelength of the transmission is λ = 60 mm.The background medium was air with b = 1.The permittivity contrasts of foam and plastic were measured as f max = 0.45 and f max = 2, respectively [41].
For different settings, we trained the same ScaDec architecture with 6500 pairs of 128 × 128 synthetic images and their simulated scattered measurements.The measurements were generated by computing the multiple scattering measurements governed by Lippmann-Schwinger equations.Each image was synthesized with one centered circle with a lower contrast and another randomly-placed circle with a higher contrast.Furthermore, all measurements were corrupted with an additive white Gaussian noise corresponding to 20 dB of input SNR.The ScaDec was trained for 1000 epochs to minimize the MSE between the true image and the restored image.
Visual results of the reconstructed images of different methods are shown in Fig. 6, where we compare ScaDec against LS-TV and FB-TV.The first column shows the ground truth of the foam cylinder (light blue) and the plastic cylinder (bright yellow) in each setting.The linear model FB-TV dramatically underestimates the permittivity distribution and fails to reconstruct the shape of objects.On the other hand, the nonlinear model of LS-TV produces better reconstructed images by taking into account both the multiple scattering and the piecewise-constant nature of the image.Finally, the proposed method obtains the highest quality reconstruction in terms of both the contrast value and the shape of objects.The edges of the foam and plastic were clear and sharp, and no obvious degradation of the contrast value was observed.Visually, the results of ScaDec look very close to the ground truth, which is due to the capability of the framework to adapt to the features in the training dataset.Remarkably, the experimental results also illustrate the potential of using simulated data for training, and then deploying the trained CNN for image formation from experimental data.

Conclusion
We designed and experimentally demonstrated a deep convolutional neural network for solving a multiple scattering problem in diffraction tomography.The proposed method, called ScaDec, successfully reconstructed high-quality images and outperformed state-of-art optimization-based methods in all scenarios.Remarkably, the method trained on simulated data, also succeeded in reconstructing images from real experimental data consisting of highly scattering objects.One of the key advantages of the proposed approach is that the actual process of image formation is substantially accelerated compared to optimization-based reconstruction methods.These features make ScaDec a promising alternative to optimization based methods and opens rich perspectives for efficient correction of scattering in biological samples.

Figure 1 :
Figure 1: Schematic representation of scattering scenarios considered in this letter.An object of a scattering potential f (r) is illuminated with an input wave u in , which interacts with the object and leads to the scattered field u sc at the sensors.
r a 2 d 3 b 3 z P 2 D p o o S y a D B I h H J t k c V C B 5 C A z k K a M c S a O A J a H m j q 6 n f u g e p e B T e 4 T i G b k A H I f c 5 o 6 i l n n n k I j y g 8 t N b K J d c 1 o + w d D b p m U W 7 Y s 9 g L R M n I 0 W S o d 4 z v 9 x + x J I 2 x a 8 5 t v d y 4 z O M o o m N 0 g q r I Q e e o g a 5 R E 7 U Q Q Y / o G b 2 i N + P J e D H e j Y 9 5 a 8 H I Z w 7 R H x i f P 7 g y l L g = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " S T 2 b Q j 2 0 Y e U T e s D I 5 2 x a 8 5 t v d y 4 z O M o o m N 0 g q r I Q e e o g a 5 R E 7 U Q Q Y / o G b 2 i N + P J e D H e j Y 9 5 a 8 H I Z w 7 R H x i f P 7 g y l L g = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " S T 2 b Q j 2 0 Y e U T e s D I 5 2 x a 8 5 t v d y 4 z O M o o m N 0 g q r I Q e e o g a 5 R E 7 U Q Q Y / o G b 2 i N + P J e D H e j Y 9 5 a 8 H I Z w 7 R H x i f P 7 g y l L g = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " S T 2 b Q j 2 0 Y e U T e s D I 5 O + j R m r G 1 m 4

Figure 2 :
Figure 2: The overview of the proposed approach that first backpropagates the data into a complex valued image and then maps this image into the final image with a CNN.

Figure 3 :
Figure 3: Visual illustration of the proposed learning architecture based on U-Net [37].The input consists of two channels for the real and imaginary parts of the backpropagated vector w ∈ C N .The output is a single image of the scattering potential x ∈ R N .

Figure 4 :
Figure 4: Illustration of the convergence of the training on the dataset of piecewise-smooth objects.The left figure shows the training loss and the right figure shows the validation loss.The horizontal lines on the right show the losses of other algorithms on the same data.

Figure 5 :
Figure 5: Simulated Datasets: Visual comparison of the reconstructed images using the linear model with the first Born approximation regularized by imposing non-negativity[38] (FB-NN, column 2) and the total variation[17] (FB-TV, column 4), the non-linear method in[14] regularized by imposing non-negativity (LS-NN, column 3) and the total variation (LS-TV, column 5).The values above images show the SNR (dB) of the reconstruction.The first column shows the true images.Each row corresponds to a different scattering scenario, which is denoted above the leading true image.

Figure 6 :
Figure 6: Experimental Dataset: Reconstructed images obtained by ScaDec, LS-TV and FB-TV from the data of 2D experimental measurements.The first and second row relate to the setting of FoamDielExtTM and FoamDielIntTM, respectively.The first column shows the ground truth of each setting.The size of all reconstructed images are 128×128 pixels.Note that the colormap for FB-TV is different from the rest because the permittivity contrast was extremely underestimated by FB.

Table 1 :
SNR (dB) comparison of five methods on two datasets