Reference-less measurement of the transmission matrix of a highly scattering material using a DMD and phase retrieval techniques

This paper investigates experimental means of measuring the transmission matrix (TM) of a highly scattering medium, with the simplest optical setup. Spatial light modulation is performed by a digital micromirror device (DMD), allowing high rates and high pixel counts but only binary amplitude modulation. We used intensity measurement only, thus avoiding the need for a reference beam. Therefore, the phase of the TM has to be estimated through signal processing techniques of phase retrieval. Here, we compare four different phase retrieval principles on noisy experimental data. We validate our estimations of the TM on three criteria : quality of prediction, distribution of singular values, and quality of focusing. Results indicate that Bayesian phase retrieval algorithms with variational approaches provide a good tradeoff between the computational complexity and the precision of the estimates.


Introduction
Wave propagation in complex media is a fundamental problem in physics, be it in acoustics, optics, or electromagnetism [1]. In optics, it is particularly relevant for imaging applications. Indeed, when light passes through a multiply scattering medium, such as a biological tissue or a layer of paint, ballistic light is rapidly attenuated, preventing conventional imaging techniques, and random scattering events generate a so-called speckle pattern that is usually considered useless for imaging. Recently, wavefront shaping using spatial light modulators (SLM) has emerged as a unique tool to manipulate multiply scattered coherent light, for focusing or imaging in scattering media [2]. In essence, these methods use the linearity and time-reversal symmetry of the wave propagation, whatever the complexity of the medium, to control the output speckle field, by manipulating the light beam impinging on the scattering sample. Different wavefront shaping approaches rely on digital phase-conjugation [3,4] or iterative algorithms [5], but it is also possible to measure the so-called transmission matrix (TM) of the medium [6], which fully describes light propagation through the linear medium, from the modulator device to the detector. This approach has been particularly efficient for focusing, imaging [7,8] and for studying the transmission modes of the medium [9]. These methods are not only valid for scattering material but can also be applied to other complex transmission system, most notably multimode fibers, turning them into minimal footprint endoscopes [10,11,12,13].
A major limitation of most of these techniques for imaging is their speed. Indeed, the wavefront shaping process must be faster than the stability time of the medium, which can be of only a few milliseconds in biological tissues. Yet, most of the works reported so far have relied on phase modulators which are usually slow (few tens of Hertz for liquid crystal modulators). Micro Electro-Mechanical Systems (MEMS) modulators are much faster, but are usually not phase-only. As a promising alternative for wave shaping in complex media, Digital Micromirror Device (DMD) technology [14] offers binary amplitude modulators (i.e., ON or OFF) operating at > 20kHz, with high pixel counts (10 6 ) and low pitch (around 10 microns), all this at low cost. These binary amplitude modulators have been used as phase modulators, using appropriate diffraction and filtering, e.g. by Lee-type amplitude holography [15,16], as shown on Fig. 1b). While phase control is more effective for wavefront shaping than amplitude control, some works reported on using DMD as genuine binary amplitude modulators for wavefront shaping through opaque scattering media, albeit usually yielding lower overall efficiency than phase modulators for focusing or mode matching [17,18,19]. The DMD configuration can also be optimized using genetic algorithms [20] to maximize the intensity enhancement.
For the measurement of a TM, an additional issue lies in accessing the amplitude and phase of the output field, that in optics usually requires a holographic measurement, i.e. a reference beam, as shown on Fig. 1a). This reference beam can either be co-propagating in the medium [7, 21], or use an external reference arm [22,8]. The phase and amplitude of the measured field can then be extracted by simple linear combinations of interference patterns with a phaseshifted or off-axis reference. This however poses the unavoidable experimental problem of the interferometric stability of the reference arm.
In this work we report on the full measurement of the complex TM of a multiply scattering medium, using a DMD binary amplitude modulator as an SLM, with no reference on the detection side, as shown on Fig. 1c). This approach combines the high-speed and high pixel counts allowed by DMD devices, with the simplicity and robustness of a reference-less optical setup. However, it involves advanced signal processing algorithms for phase retrieval, run on a sufficiently large number of input-output calibration measurements. In this study, we compare the performance of four phase-retrieval algorithms [23,24,25,26], for the estimation of a TM based on actual noisy experimental measurements. We assess their performance as a function of the number of measurements, and compare their relative computational cost. We then show Using the DMD as a spatial phase modulator by displaying amplitude holograms, and using the unmodulated parts of the field as a phase-stable reference; (c) the presented approach where only intensity values are measured.
that the distribution of the singular values of the measured TM varies according to random matrix theory. Finally, we demonstrate that single-or multi-point light focusing can be achieved, using an ∞ -regularization algorithm [27] for determining the optimal DMD binary input pattern. In addition to being an interesting signal processing problem, our approach is particularly relevant for real-life applications of the TM approach, since it allows a simple, fast and robust implementation.

Experimental setup
Our experimental setup, described in Fig. 2, uses a DMD-array from Texas Instrument (1920 × 1080 tilting micromirrors), driven by the DLP V-9500 VIS module (Vialux). The DMD is made of mirrors that can switch between two angular positions separated by 24 • , thus reflecting each pixel either toward a beam dump (pixel OFF) or towards the focusing system (pixel ON). Under Matlab, an amplitude mask is computed and loaded on the DMD. The pattern corresponding to the ON pixels is focused on the surface of a thick scattering medium by means of a f = 100 mm lens L1 (thus the DMD pixels correspond roughly to incidence angles on the sample). The sample is a ∼ 100 microns thick layer of white paint, which is thick enough in order to considerably mix the light on the other side, producing a complex speckle interfering pattern. This speckle pattern is collected through a microscope objective (L2) and detected on a camera (AVT Pike F-100B). In order to measure the TM, we need to send a large series of input patterns (typically a few times the number of input pixels we wish to control), in a time over which the medium can be considered stationary. For this purpose, we use the "high speed" driver provided with the DMD in order to load all the to-be-projected random amplitude masks to the memory of the DMD driver module, and we trigger the display of each mask via a DAQ card (National Instruments, PCI-6221) and a waveform generator. In the same way, in order to be as fast as possible, we also only consider a subregion on the camera of size of 400 × 400 pixels. The overall acquisition rate is 31 images/second. To monitor the stability of the medium, we periodically measure the correlation of the speckle image corresponding to the same input mask. We therefore quantify the stability of the medium, which is better than 98% over the total measurement time (typically around 5 minutes). A 532 nm CW laser is expanded through a telescope in order to obtain an homogeneous beam. Through a rectangular mask, it illuminates the DMD which acts as binary amplitude spatial light modulator. The DMD reflects the light in two different directions corresponding to either ON (unit transmission) or OFF (the light is deviated towards a beam dump). The transmitted pattern is focused by a first lens L1 on the scattering mediumhere a white paint layer -, acting as a thick multiply scattering medium. The transmitted speckle pattern is collected by a microscope objective and is observed through a polarizer P on a CCD camera.

Estimating the TM with intensity-only measurements and binary inputs
The measurement of the TM can be formalized as a calibration problem: given P incoming waves, assumed perfectly known, which model explains at best the observed outputs? In our case, this inverse problem reduces to the well-known problem of phase retrieval. Let x µ ∈ {0, 1} N stand for the binary DMD inputs related to the µ-th acquisition, where N is the number of pixels (mirrors) used on the DMD. We assume that the partial observations of the sole moduli of the transmitted waves (the square root of the camera measured intensities), denoted by y µ ∈ R M + , obey where D is the TM complex-valued transmission matrix characterizing the scattering material, and M is the number of observed pixels on the camera.
Then, adopting a matrix formulation and conjugating-transposing the system, we get where Y = [y 1 , . . . , y P ], X = [x 1 , . . . , x P ] and . H denotes the conjugate-transpose of a matrix/vector. This reveals a "classic" phase retrieval problem: given the matrix of inputs X H , each column of Y H is used to estimate each complex-valued column of D H .

Phase retrieval
The problem of reconstructing a complex vector given only the magnitude of measurements is a non-convex optimization problem notoriously difficult to solve. Many algorithms have been devised in the literature to deal with this problem. We can roughly divide them into three main families:

Bayesian variational approximations
Additionally to the previous notations, we introduce new variables, modeling, on the one hand, the missing phases of the observations, and on the other hand, some acquisition noise. Thus, recalling that we resort to a conjugate-transposition of the matrix system, each absolute-valued measurement y µ , µ ∈ {1 . . . P}, of any row y of Y, is expressed as where θ µ ∈ [0, 2π) stands for its missing conjugate phase, x µi is the ith element of the µth row in X, d * i corresponds to the ith conjugate element in the current estimated row d of D and ω µ is an additive noise, assumed centered isotropic Gaussian (denoted C N in the following) with variance σ 2 . We moreover suppose that the probability distributions for the entries of the matrix and for the missing phases are: and Under these assumptions, the absence of phases in the observations is naturally taken into account in the model since marginalizing on θ µ leads to a distribution on y µ which only depends on the moduli of y µ and ∑ N i=1 x µi d * i . Within model (3)-(5), the recovery of the complex signal d can be expressed as the solution of the following marginalized Maximum A Posteriori (MAP) estimation problem with Because of the marginalization on the hidden variables θ , the direct computation of p(d|y) is however intractable in general. The solutions in [25,26] optimally approximate, in a Kullback-Leibler sense, the posterior joint distribution p(d, θ |y) byq(d, θ ) conditionally to a set of given constraints F :q Depending on F , the minimization (8) gives rise to different approximations.
• In particular, defines a Mean-Field approximation, and problem (8) can be efficiently solved using the "Variational Bayes Expectation-Maximization" (VBEM) algorithm [31]. This is the approach considered in [26], denoted by prVBEM in the rest of this paper.
partitions the variables d (resp. θ ), and α i (resp. β µ ) is the degree of variable node d i (resp. θ µ ), problem (8) refers to the minimization of the Bethe free energy, which can be solved by generalized approximate message passing (GAMP) algorithms, see [32]. This is the approach followed in [25], denoted by prGAMP in the rest of this paper.
We will not detail here the structure of the resulting algorithms. We refer the interested reader to the papers [26] and [25] and the authors' webpage 1 for a practical implementation of the prVBEM algorithm.

Prediction performance
To assess the accuracy of the TM estimated by the considered approaches, we adopt a crossvalidation-like experimental framework. The setup is as follows. We measure the M = 40000 camera pixels stemming from N = 900 DMD mirrors, 50% of them being turned on, the others off at each displayed pattern. The operation is repeated randomly P = 6000 times. Given this dataset, a row of the TM is then learned from p = αN calibration measurements, with α varying in {1, . . . , 6}, and used in a second step to predict the P − p remaining measurements. This estimation is performed on 50 different rows of the TM.
We   As a tradeoff between computational cost and performance, we set the stopping criteria for each algorithm as follows. PhaseCut is run until the target precision drops below 10 −2 . Gerchberg-Saxton stops after 3000 iterations, prGAMP and prVBEM after 200 iterations. Given the complexities of the algorithms, we allow PhaseCut and Gerchberg-Saxon for a higher running time, as shown and further discussed in Fig. 4.
The prediction performance of the algorithms is evaluated according to the mean-square error (MSE) (Fig. 3a) and the normalized cross-correlation (Fig. 3b) between the moduli of the P − p predicted measurements and the actual observed ones. For Gerchberg-Saxton, PhaseCut and prVBEM, the MSE curves present similar behaviors : they decrease monotonically with increasing α. This observation resonates in Fig. 3b, with the general increase tendency of the correlation. Interestingly, we see that for α ≥ 3, that is, for at least 3 times more real measurements than complex unknowns, prVBEM outperforms all other algorithms with a correlation around 0.95.
On the contrary, prGAMP presents a contrasted performance. If it leads to a good correlation (Fig. 3b) between estimated and observed measurements, its MSE remains very high, independently of α (Fig. 3a). Here, the algorithm finds an acceptable solution but only up to a multiplicative factor. While this is not an issue for the focusing experiment considered in Section 4, this could become a limitation in other more complex tasks.
Finally, as previously mentioned, we allow in these experiments more iterations for PhaseCut and Gerchberg-Saxton. In parallel to the performance curves exposed above, Fig. 4 illustrates the corresponding average running time of the 4 considered algorithms. In this figure, we see that prGAMP performs the lowest computational cost, closely followed by prVBEM. Phase-Cut requires a long running time, prohibitive in our application context. It should be noted that the Gerchberg-Saxton algorithm, although relatively slow, still exhibits good performance, especially given its simplicity.
On the basis of these preliminary experiments, for the remaining of this paper, we choose the prVBEM approach and set the number of calibration measurements to p = 4N, as a good tradeoff between performance and computation time. In this case, computing one row of the TM takes about .6 s in Matlab, on a Macbook Air with a 1.7GHz i7 processor, keeping in mind that rows are independent.

Comparison of singular values to Random matrix theory
Interestingly, we can check that the measured TM presents some characteristics as predicted by random matrix theory. One practical way is to verify that the distribution of its normalized singular values obeys the Marcenko-Pastur law [34]. It should be noted that such apparently random signals are the hardest case for phase retrieval, where no specific structure can be taken into account.
In order to reduce the influence of specifics of our experimental setting, we perform the following operations, as in [35]: i) We normalize over the rows and columns, to attenuate the illumination artifacts: residual illumination "by default" on each pixel of the camera for the rows, and inhomogeneous contribution of each DMD mirror on the entire set of camera pixels for the columns.
ii) Because of the size of the speckle grains, two neighboring DMD mirrors may affect the material in the same way, as well, two pixels of the camera will be potentially correlated.
To avoid this effect, we subsample the rows and columns of the matrix.
To draw the empirical spectral density, we then consider the following setup. We subsample the columns of the matrix up to N = 200 and leave the number of rows varying, more precisely M = γN, with γ ∈ {1, . . . , 6}. These sub-matrices thus constitute partitions of the estimated matrix, randomly picked 100 times to average the resulting densities. Fig. 5 compares the experimental curves to the theoretical ones drawn according to the Marcenko-Pastur law. We see that the experiments qualitatively follow the predictions. We remark, however that the larger γ is, the more chances we have to consider the contributions of neighboring correlated pixels. This partly explains the increasing gap between both curves.

Focusing with the DMD
Knowing the TM gives a powerful and flexible tool to control light within the scattering medium [35]. In particular, it can be used to compute which DMD input has to be set, in order to display a given arbitrary pattern at the receiver end. In this section, we demonstrate the special case of focusing light with maximum intensity on a desired pattern (a chosen sparse subset of the output pixels), with the TM measured experimentally as in the section 3. It should be emphasized that we keep the same experimental setup, with the binary DMD as input device. Here, simple inversion methods such as [35] cannot be used, as these require intensity-or phase-modulated inputs.
We propose here to resort to a similar Bayesian variational approach as for the calibration, adapted to the binary nature of the DMD inputs.

Mean-Field-based inversion
Formally, the problem can be expressed as an inverse problem, where, knowing the TM D and the observation y, we look for the DMD input x such as described in (1). Adopting a similar modeling as in previous section, we then assume, for all elements y µ with µ ∈ {1, . . . , M}, where θ µ ∈ [0, 2π) stands for the missing conjugate phase, d µi is the µth element of the ithcolumn in D, x i ∈ {0, 1} corresponds to the state of the i-th DMD pixel and ω µ is an additive noise, assumed centered isotropic Gaussian of variance σ 2 . As in the subsection 3.2, we suppose that the elements θ µ are independently and uniformly distributed in the interval [0, 2π), however, in order to accommodate for binary inputs, we consider here a Bernoulli model for x: Then, within model (9)-(10), we solve the marginalized MAP estimation: with and resort -following the comparison exposed in subsection 3.2 in the Gaussian case -to a Bayesian Mean-Field approximation. The particularization of the algorithm to the Bernoulli model (10) is detailed in the appendix, an implementation is also available on author's webpage. In this section, we assess the performance of the proposed focusing approach through different experiments. The general setting is as follows. The DMD inputs, here taken of dimension N = 1600, are estimated according to the procedure described above from the desired outputs, focusing on 1 to 4 target points. We set the Bernoulli parameters p i to 0.5, noticing that asymptotically half of the DMD pixels are expected to be "ON" ([17]). Finally, the TM, reduced to its rows of interest, is measured as discussed in section 3. Fig. 6 shows an example of the observed output field, corresponding to the estimated DMD configuration, optimized to focus on 3 points. To quantitatively evaluate the focusing performance, we measure the intensity enhancement factor, as:

Experiments and results
where I foc is the intensity inside the target area after spatial binary amplitude modulation is performed, I back is the average background intensity. This value is measured for 100 trials, as a function of the number of calibration measurements used to learn the TM. Two different setups are then considered: the single-point focusing case and the multi-target case.  For each experiment point α ∈ {2, . . . , 6}, such that p = αN calibration measurements are used to compute the TM, we display the boxes related to the phase-conjugation method (blue boxes), and the new Mean-Field technique (red boxes) described in the previous section. As a first observation, we can see that the general dependency with regard to α noticeably resonates with the curve of the prVBEM algorithm in Fig. 3b: there is a clear gap between the performance achieved for α = 2 and for α = 3, while, for α ≥ 3, the intensity enhancement keeps increasing but less significantly.

Focusing on a single point
Interestingly, the Mean-Field approach seems to outperform phase-conjugation, with regard to the mean and maximum values measured, but not always in a statistically significant manner. Focusing on the most favorable case considered here, namely with α = 6, the best intensity enhancement factor lies around 140, to be compared with the ideal expected enhancement given by 1 +

Focusing on multiple points
For this second setup, we are interested in the performance of the proposed algorithm in a context of multiple target points. Additionally to the intensity enhancement, we consider here the missed detection rate, defined as the number of trials (expressed in percentage) failing to focus on at least one of the multiple target points, i.e., the number of trials for which at least one of the T largest intensity peaks in the output image does not match any of the T targets.  The missed detection rate seems however more sensitive to the number of calibration points used to learn the TM: for α = 2 and 2 target points, the algorithm fails with a rate approaching 40%, while for α = 3 and the same number of targets, we keep a reasonable performance (around 10%). In a more general view, these figures greatly highlight the deep relation between the quality of the calibration and the focusing performance.

Conclusion
This paper shows that the full complex-valued transmission matrix of a strongly scattering material can be estimated, up to a global phase factor on each of its rows, with a simple experimental setup involving only real-valued inputs and outputs. In our experiment, the inputs are amplitude modulations on a binary DMD, and the output is the field intensity measured on a CCD camera, that gathers a significant amount of measurement noise. Note that no reference arm is used, that would allow interferometric measurements, but that would make the experimental setup more complex and considerably more unstable.
We here resort to Bayesian phase retrieval techniques, and we have shown that, amongst such techniques, a recently proposed variational approach (VBEM) [26] allows a precise estimation of the transmission matrix, tractable in computational complexity and scalable for large-size signals, provided that we have a sufficiently large number of input-output calibration signals.
Experimental results validate this concept, both in terms of output prediction, distribution of singular values, and in an application of light focusing onto a number of target points in the output plane. It should be emphasized that this estimation of the transmission matrix opens many applications beyond light focusing, may it be for imaging through the scattering material [35,36], or for obtaining information about the scattering material itself.

Appendix: Focusing with a Mean-Field based algorithm
The VBEM algorithm is an iterative procedure which successively updates the factors of the Mean-Field approximation. Particularized to model (9)-(10), this gives raise to the following update equations: q( where y = y µ e ( j arg(y * µ z µ )) I 1 ( 2 σ 2 |y * µ z µ |) and I 0 (resp. I 1 ) stands for the modified Bessel function of the first kind for order 0 (resp. 1). Coming back to problem (11), an approximation of p(x|y) thus simply follows from p(x|y) = θ p(x, θ |y), Using this approximation, the problem is then easy to solve by a simple thresholding operation, i.e.,x i = 1 if q(x i = 1) > 0.5 andx i = 0 otherwise.