Fluorescent wavefront shaping using incoherent iterative phase conjugation

Wavefront shaping correction makes it possible to image fluorescent particles deep inside scattering tissue. This requires determining a correction mask to be placed in both excitation and emission paths. Standard approaches select correction masks by optimizing various image metrics, a process that requires capturing a prohibitively large number of images. To reduce acquisition cost, iterative phase conjugation techniques use the observation that the desired correction mask is an eigenvector of the tissue transmission operator. They then determine this eigenvector via optical implementations of the power iteration method, which require capturing orders of magnitude fewer images. Existing iterative phase conjugation techniques assume a linear model for the transmission of light through tissue, and thus only apply to fully-coherent imaging systems. We extend such techniques to the incoherent case for the first time. The fact that light emitted from different sources sums incoherently violates the linear model and makes linear transmission operators inapplicable. We show that, surprisingly, the non-linearity due to incoherent summation results in an order-of-magnitude acceleration in the convergence of the phase conjugation iteration.


Introduction
One of the core challenges when performing linear fluorescence microscopy inside tissue is the fact that biological tissue is highly scattering at visible wavelengths. This limits the clinical applicability of linear fluorescence microscopy techniques to thin superficial layers, as incoming and outgoing light propagating through the tissue is highly aberrated. In turn, this precludes widespread clinical use for tasks such as vasculature imaging, laser light therapy, and tumor detection.
A promising approach for overcoming the multiple scattering challenge is wavefront shaping correction: if one reshapes the incoming (or outgoing) coherent wavefront, such that its aberration is conjugate to the aberration that will happen inside the tissue, then after propagation the wavefront will focus into a sharp spot inside the tissue.
Adaptive optics techniques [10,18,22] were first used to correct modest aberrations, for example due to imperfect optics or refractive index variations in the tissue. More recently, wavefront shaping techniques [17,57] have shown that it is possible to focus light through thick, highlyscattering layers [48][49][50]55].
Wavefront shaping ideas have found applications in a wide range of imaging modalities, including sound and light, coherent imaging and OCT, and incoherent fluorescence imaging using single-photon and multi-photon excitation. Our interest in this work is wavefront shaping for linear, single-photon fluorescence feedback.
The practical application of wavefront shaping is hindered by the difficulty of finding the wavefront correction to apply. This wavefront correction varies between different tissue layers, and even between different positions inside the same tissue sample. The simplest approach for finding the wavefront correction is to use a so-called guide star [16,20,21,25,26,28,46,[52][53][54]: In this case, scattering arises from a strong single point source inside tissue, and a wavefront sensor [28,48] directly measures the scattered wavefront.
Finding a wavefront shaping correction in the presence of multiple sources is more challenging, and typically involves optimization strategies relying on a variety of feedback mechanisms [4, 7-9, 13-15, 23, 25, 27, 36, 45, 46, 49-52, 55, 56]. This optimization is tractable when the wavefront correction can be described by a small number of parameters (e.g., using Zernike polynomials [11,33]). However, to focus inside thick highlyscattering media, it is desired to use all the degrees of freedom of a modern spatial light modulator (SLM), often in the megapixel range. This is posing non-trivial optimization challenges [14,25,35,50]. Even if we can test every such free parameter only once [13], the very large number of images captured for optimization limits any real-time applicability.
For fully coherent imaging systems, an alternative class of techniques estimating the wavefront correction is iterative phase conjugation. These techniques use the observation that a wavefront shaping correction focusing on a single point inside tissue is an eigenvector of the transmission matrix of the scattering sample [34]. They then find these eigenvectors using an optical implementation of the power method [47], which iterates between sending in a wavefront, measuring the scattered wavefront, and using the measurement as the successive input. Often this procedure converges after a very small number of iterations, leading to an order-of-magnitude acquisition speedup compared to standard optimization approaches. Iterative phase conjugation has found successful applications for sound [37,44] and acousto-optics [39,42], where the propagation is fully coherent. Although not presented this way, a similar iterative scheme was also applied for two-photon fluorescent imaging [32].
An important assumption underlying the coherent iterative phase conjugation scheme is that light scatters only once. This greatly limits its applicability to thin or sparse volumes. Our goal in this work is to develop an iterative phase conjugation approach that is applicable to linear (single-photon) fluorescent imaging. As the emitted light does not excite the tissue or the particles again, by working with fluorescent sources we can greatly relax the single scattering assumption, making our approach applicable to much thicker volumes, in particular tissue.
The primary technical challenge in this setting is that any uncorrected incident wavefront (such as the wavefronts used during the power method) will excite more than one fluorescing point inside the tissue sample, and the excited points will emit light that sums incoherently. Consequently, we cannot model the relation between input excitation and output fluorescent emission using a linear transmission operator, as fully-coherent iterative phase conjugation techniques do. To overcome this challenge, we analyze the incoherent case, and report two findings: First, we show that the same power method procedure as in the fully-coherent case can be used to recover the correction pattern also in the incoherent case. Second, we show that, whereas for the fully-coherent case the power method converges at an exponential rate, for the incoherent case it converges at a doubly-exponential rate. We demonstrate these findings experimentally, focusing light on fluorescent beads attached at the back of chicken breast tissue layers. Our technique achieves wavefront correction after capturing as few as 10 − 30 images, compared to thousands of images captured by existing optimization-based wavefront shaping strategies for fluorescent imaging [7]. Fig. 1 shows our imaging setup. A laser beam illuminates a tissue sample via a microscope objective. A phase SLM at the Fourier plane of the illumination arm modulates the illumination pattern. The modulated laser light excites fluorescent beads at the back of the sample. The emitted light is collected via the same objective, and reflected at a dichroic beam-splitter. A second phase SLM at the Fourier plane of the imaging arm modulates the emitted light. Lastly, the modulated light is measured by the front camera, which captures the images used by our algorithm. The setup includes a second validation camera behind the tissue sample. In our experiments we attached fluorescent beads at the back of the tissue layer, so that the validation camera can image them directly. We emphasize that measurements from this camera are not used by our algorithm, and that we only use the camera for validation purposes, to assess focusing quality and to image an undistorted reference of the bead layout.

Principle
We derive a strategy for efficiently finding a wavefront shaping modulation pattern for the illumination arm, allowing us to focus all light into a single spot inside the tissue sample. Once we have found the modulation pattern, we use the same modulation to also correct the emitted light in the imaging arm. This is possible because, in our linear fluorescent imaging setting, emission and excitation wavelengths are relatively close. Our approach extends to the incoherent imaging case iterative phase conjugation ideas that were previously used with coherent illumination. We begin our presentation by reviewing the coherent case, and then introduce the incoherent one.
Coherent iterative phase conjugation. Consider a set of K scattering (non fluorescent) particles inside a sample, and denote their positions by o 1 , . . . , o K . We denote by u the value of an incoming 2D electric field at the input plane, and by ν a K × 1 vector of the field propagating through the sample at each of the K scatterers. Although u is a 2D field, we reshape it as a 1D vector and relate ν to u as ν = T i u, where T i is the incoming transmission matrix describing coherent light propagation. T i is specific to the tissue sample being tested. Likewise, we denote by T o the back-propagation transmission matrix, describing the light returning from the particles to the sensor. The propagation of light to the particles and back to the sensor is then modeled using the combined transmission matrix Note that (1) offers a simplistic description of light propagation, assuming there is not much light back-scattered from other structures in the medium apart of the listed Our wavefront correction fluorescent microscope setup: A laser beam is exciting fluorescent beads at the back of a tissue layer, and fluorescent emission is scattered again through the tissue, reflects at a dichroic beam-splitter and is collected by a main (front) camera. We place two SLMs in the Fourier planes of both illumination and imaging arms to allow reshaping these wavefronts. A validation camera views the beads at the back of the tissue directly. This camera is not actually used by the algorithm, and is only assessing its success. LP=linear polarizer, BS=beam-splitter, DBS=dichroic beam-splitter, BPF=bandpass filter, L1 . . . L7=lenses, Obj=Objective.
particles o 1 , . . . , o K , and multiple scattering between the particles is negligible. Under fully coherent illumination, the input illumination and the measured speckle intensity are related as Our goal is to find an illumination pattern u that will focus on one of the particles, so that ν is a one-hot vector -non-zero only at a single point o k for some k value. We note that focusing at any of the particles is sufficient for our setting; below, we show that once we focus at one point, we can use the memory effect to focus at nearby ones.
To find a wavefront modulation we need access to T i , but in practice we can only measure T a . The wave conjugation principle states that the returning transmission matrix is the transpose of the incoming one, [38]. With this assumption, consider an illumination field u that, after propagating through the tissue sample, generates a one-hot ν vector. If we focus all light at one particle, then by the wave conjugation principle the returning field is proportional to the incoming one. Therefore, we can express the returning intensity in (2) as where s is a scale factor. That is, a focusing wavefront is an eigenvector of the combined transmission matrix T a .
Consequently, if we can compute eigenvectors efficiently, we can find a wavefront that focuses all the light in a single spot. A common class of numerical algorithms for computing matrix eigenvectors follows the power method [47]. This algorithm relies on the fact that the sequence u, T a u, (T a ) 2 u, (T a ) 3 u . . . converges exponentially-fast to the largest eigenvector of T a . Iterative phase conjugation algorithms [37,39,42,44] do not acquire the full transmission matrix T a , but instead directly measure its optical operation on wavefronts of interest. They begin by illuminating the sample with a random wavefront u 0 , then iteratively measure the resulting output wavefront, and use its conjugate as a successive illumination pattern. That is, at the t-th iteration, the incident wavefront is u (t) = (T a u (t−1) ) * , where * denotes complex conjugation. When measuring intensity images I = |T a u (t) | 2 , computing u (t+1) also requires estimating the phase of the measured intensity pattern.
Using the exponential convergence property of the power method it can be shown [47], and we review the derivation in the supplement, that the energy focused on the k-th particle at the t-th iteration follows a geometric sequence of the form for constants λ k , c k and a normalization factor N (t) we derive in the supplement. (4) implies that the energy at the k-th particle scales exponentially with the iteration number t. Thus, each iteration increases the gap in energy between the strongest and second strongest particles, and the sequence quickly converges to a one-hot ν vector.
Incoherent phase conjugation. The main limitation of coherent iterative phase conjugation is that to describe the propagation using the model of (1) one neglects multiple scattering between the particles, as well as backscattering from any other tissue components. This in turn limits the applicability of the technique to thin or sparse volumes. By using fluorescent emission we remove this restriction, because even in thick tissue it is reasonable to assume that the emitted light does not excite the tissue or the other beads again. Moreover, we show that the incoherent summation of fluorescent emission results in largely accelerated convergence. However, an adaptation of the power method to the incoherent case is not straightforward due to the non-linearity imposed by incoherent emission.
To study the incoherent case we need to adjust the above model in two ways. First, we now mark by o 1 , . . . , o K the positions of the fluorescent particles rather than all scatterers in the volume. We use T i to describe propagation at the excitation wavelength λ i , and T o to describe propagation at the emission wavelength λ o . Despite the small difference between emission and excitation wavelengths, we still assume that T o ≈ T i . Note that T i , T o describe multiple scattering events by other tissue components apart of the listed fluorescent particles o 1 , . . . , o K .
Second, whereas in the coherent case the output wavefront is a linear function of the input, T a u (t) , this linear model no longer holds when incoherently summing light from different emitters. To derive an image formation model for this case, we again use ν = T i u to denote the field arriving at the fluorescent emitters. Fluorescent emission is proportional to the intensity of ν, and the recorded intensity equals an incoherent summation where T o :,k is the k-th column of T o . If we manage to focus and ν is a one-hot vector, then there is only a single non zero term in the summation of (5). Denoting the index of this non zero entry by k o we can express the intensity in (5) as So effectively, when focusing is achieved, (5) reduces to (2), and the measured intensity is equivalent to |T a u| 2 .
Therefore in the incoherent case, a focusing wavefront is still an eigenvector of the transmission operator T a = T o · T i . Motivated by this observation we apply iterative phase conjugation as in the coherent case. As we measure only the intensity of the emitted light, to recover the phase of the wavefront, we use a phase diversity acquisition scheme [30]. We place J = 5 known modulation patterns H j on the phase SLM of the imaging arm. At the t-th iteration, we measure speckle intensity images where is convolution, h j is the Fourier transform of the pattern we placed on the SLM, and |ν t k | 2 is the intensity arriving at the k-th particle in the t-th iteration. We use gradient descent optimization to find a complex wavefront We then use the conjugate of the estimated wavefront as the excitation of the next iteration, and display it on the SLM of the illumination arm. When the intensity image is an incoherent summation from multiple sources there is typically no wavefront minimizing (8) with zero error. Despite this, we show in the supplement that the resulting wavefront is approximately equal to a weighted linear combination of the wavefronts T o :,k generated by the individual sources. Sources with stronger emission receive a higher weight in the reconstruction, which further increases their weight in the next iteration of the algorithm.
In the supplement, we analyze the differences between the coherent and incoherent models, and we show that the incoherent summation results in an asymptotically faster convergence rate. In particular, we prove the following claim.
Claim 1. The convergence of the power iterations in the incoherent case follows a doubly exponential sequence of the form for scalars λ k , c k derived in the supplement.
To understand the difference, we note that in the coherent case of (4), the energy at the different particles scales as λ t k . In the incoherent case, we get another exponential factor, and energy scales as (λ k ) 2 t . Intuitively, this is because the fluorescent emission is proportional to the intensity of the field |ν (t) | 2 arriving at the particles, rather than to the field ν (t) itself. As ν (t) is squared in k | 2 , for different iterations of the iterative phase conjugation algorithm. As predicted by theory, the incoherent case converges into a one-hot vector within a smaller number of iterations (compare 4 incoherent iterations to 15 coherent ones). The x axis of our plot corresponds to scatterer index k, where for ease of visualization we sort these in decreasing order of power. every iteration, the squaring is accumulated into another exponential term.
To visualize the faster convergence, in Fig. 2 we simulated coherent and incoherent power iterations on a random transmission matrix sampled as described in supplement.
In practice, in the hardware implementation described below, our algorithm converged within about 2 − 6 iterations. Accounting for the 5 images used for phase acquisition at each step, our approach can find a wavefront correction pattern using about 10 − 30 image measurements. This provides orders of magnitude speedup compared to recent optimization-based approaches recovering a wavefront shaping correction pattern using a single-photon fluorescent feedback, which requires capturing thousands of images [7].

Results
In our experimental implementation, we use fluorescent microspheres of diameter 200 nm (ThermoFisher Fluo-Spheres dark red), excited and imaged with N A = 0.5 objectives so that the particles are slightly smaller than the diffraction limit. For excitation, we use a 637 nm laser, and to measure emission we use a band-pass filter of center wavelength 680 nm and bandwidth 10 nm. In the main paper, we use as scattering samples chicken breast tissue slices of thickness 200 − 400µm. In the supplement, we also show results using other scattering phantoms, including parafilm and polystyrene beads dispersed in agarose.
For all examples, significant scattering is present, and a standard microscope cannot image the actual source pattern. The beads are attached at the back of the tissue layer, separated only by a 150µm microscope cover glass.
We use two Pluto Holoeye SLMs, and a Prime BSI sC-MOS sensor for imaging fluorescent emission. Fig. 3 visualizes the power iterations of our algorithm from both the main camera and the validation camera. In the beginning the main camera sees a wide speckle pattern, and from the validation camera we can see that a wide speckle pattern reaches the back of the tissue. We also use a band-pass filter on the validation camera to image the beads excited by each modulation pattern. The validation camera confirms that as the algorithm proceeds the illumination reaching the back of the tissue converges into a single spot. Even if we manage to excite a single bead, the emitted light can scatter on its way to the main camera and generate a speckle pattern. In Fig. 3 we first visualize this scattering by showing the views of the main camera when modulation is used in the illumination SLM to focus the excitation, but with no modulation at the imaging arm. In addition, we show what happens if the modulation pattern of each iteration is also placed on the SLM of the imaging arm.
As the iterations proceed and the modulation pattern improves, the imaging SLM refocuses the light emitted from the excited bead into a single sensor spot.
When multiple fluorescing particles are present in the field of view, the algorithm typically converges to the strongest one. However the particle at which the algorithm converges may vary due to multiple reasons such as imaging noise, fluorescence bleaching or local minima of the phase diversity optimization. Convergence can also change if the optimization is initialized with a different speckle pattern.
In Fig. 4 we demonstrate the final iteration of our algorithm on a few additional examples. In Fig. 4(c) we also visualize the actual scattering of the tissue layer. To this end we place the correction mask on the illumination SLM only, bringing all light to excite a single bead. We use no correction on the imaging arm, allowing us to visualize the speckles from this source. This image corresponds to a column of the transmission matrix T o .
In Fig. 5, we visualize the phase diversity acquisition results at the first and last iterations of the algorithm. Fig. 5(a) shows the image from the main camera in both iterations when the imaging SLM applies no correction. Fig. 5(b) shows the intensity of the recovered wavefront at the plane of the camera sensor, which is set conjugate to the plane of the fluorescent sources. The phase diversity optimization attempts to explain the images in Fig. 5(a). However, in the first iteration, the captured speckle image is an incoherent summation from multiple particles, which the optimization objective of (8) attempts to explain with a single coherent wavefront; thus the result is imperfect. In the last iteration, the algorithm excites a single particle, and indeed the estimated wavefront better explains the captured intensity. Finally, Fig. 5

Val. Camera Emission
Val. Camera Excitation Fig. 3: Algorithm convergence. We show the iterations of our power algorithm on two different tissue samples. We demonstrate views via the main camera seeing the front of the tissue with and without the modulation correction, and the validation camera observing fluorescent beads directly. To better appreciate the focusing we used the validation camera to capture both the excitation and emission wavelengths. In the first iteration we see a speckle image, but as power iterations proceed the illumination wavefront converges and focuses on a single bead. When the same modulation pattern is placed at the imaging arm, imaging aberrations are corrected and one can see a sharp image of the excited bead. Note that images in different iterations have very different ranges, and for better visualization each image was normalized to its own maximum.

Reference
Simple ME Tilt-shift ME the retrieved phase in the frequency domain, which is essentially the pattern displayed on the SLM. We note that, as we use a phase-only SLM, we effectively correct only the phase of the wavefront and neglect its amplitude.
Imaging a wide field of view. The recovered modulation pattern is designed to focus at a single particle inside the tissue sample. However, due to the memory effect, the corrections of nearby spots are similar. We demonstrate this experimentally in the second row of Fig. 6: We place a random pattern on the illumination arm, which results in exciting multiple fluorescing particles. We then place the correction pattern recovered for focusing on one of the fluorescing particles on the imaging arm. We observe that, thanks to the memory effect, the camera can image a small neighborhood of particles around the focus points, and not just the single particle the correction pattern corresponds to. To further improve on this, we use the tilt-shift memory effect [5,31] and shift the modulation mask in the Fourier plane. As we explain in the supplement, this shift allows us to focus at nearby regions using the same correction pattern. In the third row of Fig. 6, we image a wider range of particles behind the tissue sample, by scanning 21 × 21 such shifts. We acknowledge that by placing the SLM at a plane conjugate to the sample itself [29,31] rather than in the Fourier plane, we can probably expand the region corrected by a single modulation and reduce the number of required shifts. Even after exploiting the tilt-shift, the extent of the memory effect is limited, and beads at the periphery of the images of the last row in Fig. 6 are either not recovered or strongly aberrated. Imaging beyond this region would require applying another set of power iterations to calibrate a different wavefront modulation, amplifying the importance of the faster convergence of our proposed procedure.

Discussion
We extended iterative phase conjugation algorithms to apply to incoherent fluorescent imaging. Even though the incoherent contribution of different sources alters the linear transmission model from which these algorithms are derived, we show that the non-linear incoherent model accelerates convergence rate, from exponential to doublyexponential. To find a wavefront correction pattern, we need to excite the tissue with a very small number of trial patterns and measure the resulting excitation. The number of measurements is orders of magnitude smaller than that of previous optimization-based techniques. We used the recovered modulation pattern to image fluorescent particles placed behind a tissue sample. However, wavefront correction in thick tissue is spatiallyvarying, and each modulation pattern is only usable for imaging a limited region, with size determined by the memory effect. To image a wider region behind the tissue sample, we have to apply the modulation recovery algorithm multiple times in different sub-regions. This makes it even more important to have fast wavefront shaping algorithms. One way to further reduce the number of acquired images is to use a tilt-shift adaptation of the aberration correction estimated in one region, and initialize with it the power iteration in neighboring sub-regions.
Our current results apply only on a sparse set of fluorescent particles. Increasing the density of the sources is challenging because as speckle contrast decays [2], it is harder for the phase diversity acquisition scheme to recover phase. To alleviate this problem, we could adopt other phase acquisition schemes, such as using a Shack-Hartmann sensor [41].
Another issue which may challenge convergence with a dense continuous fluorescent object is two nearby spots emitting similar power. This is due to the fact that when a transmission matrix contains multiple eigenvectors with the same eigenvalue, power iterations may not separate them, and can converge to a linear combination of the two eigenvectors. We note however that the incoherent convergence rate as analyzed in supplement Eq. (26) depends not only on the actual eigenvalues, but also on the initial excitation T i k,: u (0) . As this excitation is usually a highly varying speckle pattern, there is a better chance to separate between nearby illuminators of similar power.
Our approach also relies on the assumption that the excitation and emission wavelength are close enough so that the excitation and emission transmission matrices are sufficiently similar. Also, as the emitted light contains multiple wavelengths, these can produce somewhat different speckle patterns. There is evidence in the literature that speckle patterns produced by nearby wavelengths are correlated [58], but this similarity degrades as the tissue sample thickness increases [3]. In our ex-perimental implementation, there is a 40 nm gap between the emission and excitation wavelengths. In linear fluorescence imaging, the gap between excitation to emission can be made lower than that, leading to even stronger correlation between the wavefronts.
Relationship to memory-effect techniques. Our work is orthogonal to approaches for imaging fluorescent sources through tissue using speckle statistics, and in particular the memory effect [2,6,24]. Recently, such approaches were successful in imaging a sparse set of fluorescent particles inside tissue, using hundreds [8,59] or even just dozens [12] of images. While the field of view of a wavefront shaping modulation is constrained by the extent of the memory effect, as demonstrated in Fig. 6, approaches based on the memory effect can recover fullframe patterns with much wider field of view. By contrast, memory effect correlations only exist in thin tissue layers, while approaches based on phase conjugation can theoretically achieve larger penetration depths. However, in practice, the penetration depth is greatly constrained by the very weak signal-to-noise ratio of fluorescent emission. Approaches based on phase conjugation make it possible to not only image through scattering, but also focus light inside scattering tissue, a capability that memory effect approaches lack. Focusing inside tissue is important for applications such as laser treatment therapy, confocal microscopy, and STED microscopy. Finally, a modulation recovered from fluorescent sources can also be used to image adjacent non-fluorescent tissue structures.

Acknowledgments
We thank Lucien Weiss and Onit Alalouf for their help preparing experimental data.

Disclosure
The authors declare no conflicts of interest.

Convergence analysis for incoherent transmission
We analyze the convergence of our iterative algorithm, while modeling the incoherent summation of different fluorescent emitters. We show that incoherence leads to asymptotically faster convergence when compared to the coherent case.
Model. We introduce some notation that we will use to rewrite the transmission image formation model in an equivalent form that is more convenient for our analysis. For this, we recall from the main paper our assumption that the transmission matrices for excitation T i and emission T o are transposes of one another, T o = T i . Therefore, the rows of T i equal the columns of T o . We write T i k,: , T o :,k for the k-th row and column of these T i and T o , respectively. We also denote by n k the norm of the k-th row of T i and k-th column of T o , where x is a position on the input or sensor plane, for excitation and emission respectively. n k can be lower than one, because some light emitted by the fluorescent particles scatters at angles higher than the numerical aperture of the objective and does not reach the sensor. We use T i , T o to denote the matrices T i , T o after normalizing their rows and columns, respectively, to have unit-norm: We will account for the norm explicitly in the image formation model. Then, linear fluorescence emission from the k-th location inside tissue is proportional to where we use e k to denote the emission power of the kth particle (for simplicity of exposition, in the main text we have absorbed e k into the unnormalized transmission matrix). Similarly, the measured emission intensity is With this notation we can express the combined transmission operator as where W is a diagonal matrix with non-negative diagonal entries These entries encode the power of the fluorescent emitter at the k-th focus location, as well as the amount of energy transferred on the k-th row and column of T i and T o , respectively.
To further simplify our analysis, we also assume that the wavefonts emitted by different fluorescent particles are sufficiently random, and their correlation is sufficiently small. For the rest of this derivation we will assume ε k, ≈ 0 and can be neglected. We note that the memory effect implies that the rows of the transmission matrix are correlated shifted versions of each other; that is T o x,k ≈ T o x+∆, where ∆ is the displacment between the k, particles. However, even in the presence of ME correlation, at the zero shift we consider in (16), such rows are uncorrelated, as effectively, the row entries are pseudo-random patterns.
With this decorrelation assumption, T i , T o are orthogonal matrices, and the diagonal entries w k of the matrix W in (14) will correspond to the eigenvalues of the transmission operator T a .
Power method under coherent illumination. We start by reviewing the principles of the power method [47] considering a coherent illumination model. For simplicity, we assume that we can measure the phase of wavefronts rather than only their intensities. We will later extend this analysis to the incoherent case.
To apply the power method for the coherent illumination case, we start with a random illumination pattern u 0 . At each iteration, we illuminate the tissue sample with a wavefront u t . The propagation through the sample and back to the sensor produces a wavefront T a u t . Assuming we can measure both amplitude and phase of the resulting wavefront, we update the illumination wavefront as where we normalize u (t+1) to fix the total energy of the excitation pattern u (t+1) at each iteration, as determined by the power of the excitation laser. We note that, in practice, we use a phase-only SLM, and thus we only display the phase of u (t+1) , dropping its amplitude.
To understand the convergence of this algorithm, we denote by β t k the energy scattered from the k-th particle inside the tissue sample at the t-th iteration, With this notation, the operation of the transmission matrix T a on u (t) equals the sum of the columns of T o weighted by β Therefore, the illumination pattern used at the next iteration will equal a weighted combination of the conjugate columns, u (t+1) is then normalized as in (17). Applying T i on u (t+1) can be expressed as a summation over all entries x. Using the decorrelation assumption of (16), and (20) this reduces to A simple recursion implies that β (t) k follows an exponential series of the form where λ k ≡ w k , c k ≡ β 0 k . (23) states that the entries of β (t) scale exponentially as a function of the iteration number. This implies that the gap between the largest entry of β and the next one increases with each iteration. It is easy to show that the sequence converges quickly into a one-hot vector, which is non-zero at a single entry.

Power method under incoherent illumination.
The coherent case is similar to the classical application of the power method. We now make the necessary adaptations for the incoherent case. The convergence rate we achieve is asymptotically faster than the exponential convergence we derived in (23). Throughout this derivation we assume the power of the fluorescent sources is constant during optimization and ignore effects such as blinking or bleaching.
To study the incoherent emission of fluorescent sources, we start by deriving the corresponding image formation model. At the t-th iteration, we excite the tissue sample with a wavefront u, and measure where α (t) k denotes the incoherent equivalent of β (t) k , the energy emerging from the k-th emitter, times the norm of the k-th column. Following the definitions in Eqs. (12), (13) and (15), we use: Our goal is to show that, as in the coherent case, within a small number of iterations, α (t) k converges to a one-hot vector.
At each iteration, our algorithm needs to estimate some phase from the speckle intensity image I (t) . As we mention in the main paper, we use a phase diversity acquisition scheme. As this scheme is based on optimizing a non-linear score, analyzing its convergence is not straightforward. To this end, we start by considering a simpler acquisition scheme based on point diffraction interferometry [1,43]. This scheme is highly-sensitive to noise, and implementing it using weak fluorescent sources is impractical. However, the advantage of this scheme is in providing a closed-form expression for the contribution of different sources to the estimated phase, allowing for simple analysis. We will later use numerical simulations to compare the convergence of phase diversity against the analytical expressions we derive from point diffraction interferometry.

Point diffraction interferometry. Consider the field T o
x,k generated by the k-th fluorescent source at image point x. We decompose it as where τ o k is its complex mean andT o x,k = T o x,k − τ o k . Point diffraction interferometry captures J ≥ 3 images using the SLM in the Fourier plane of the imaging arm. It changes phases at a single spot corresponding to the 0-th (central) frequency. Placing phase φ j at the zero frequency only changes the mean of the signal, and the intensity to be measured at pixel x of the image plane from the k'th source corresponds to where denotes the real component, and α (t) k defined in (25) corresponds to the energy emitted by the k-th fluorescent particle given the current excitation wavefront. In the presence of multiple incoherent sources, we measure the incoherent summation of the intensity speckle patterns produced by each of them, In point diffraction interferometry, we capture J ≥ 3 images with equally-spaced phase shifts φ j = [1 . . . J] 2π J . Standard phase shifting interferometry techniques [19] imply that, by summing measurements with different phase shifts, we can extract Thus, point diffraction interferometry extracts a weighted combination of the wavefronts emerging from all incoherent sources. The weights correspond to the intensity they receive from the previous excitation pattern, weighted by a complex scalar corresponding to the mean τ o k . In the next iteration of the power method, we excite the tissue with the extracted wavefront normalized to have unit energy: Convergence. We now show that the sequence α t k converges to a one-hot vector, which implies that the iterative approach we described above converges. To this end, we note that when we excite the tissue sample with illumination u t+1 (x), we effectively multiply u t+1 (x) by T i . Using the decorrelation assumption in (16), we can write the energy at the k-th fluorescent particle as Thus, using the definition of α k in (25) and ignoring the normalization in (31), we have Using recursion, this leads to where To understand the difference between this result and the coherent case in (23), we note that in the coherent case the leading term converges as λ t k , which is an exponential sequence. In the incoherent case we have another exponent, and the leading term in (34) is of the form (λ k ) 2 t . This is known as a doubly exponential series, which will converge into a one-hot vector much faster than the exponential series of (23).
Phase diversity acquisition. As we mentioned above, the point diffraction interferometry scheme is useful for analysis, as it leads to closed-form expressions. In practice, this approach is very sensitive to noise, and implementing it with weak fluorescent sources is unrealistic. Instead, our implementation uses a phase diversity acquisition scheme [30]. We place J = 5 known modulation patterns H j on the SLM of the imaging arm, and measure speckle intensity images of the form where denotes convolution, and h j is the Fourier transform of the pattern we placed on the SLM. We use gradient descent optimization to find a complex wavefront This optimization is subject to local minima, and it is hard to give any analytic guarantees about its convergence. Below we conduct numerical simulations comparing its empirical convergence to the analytical predictions from point diffraction interferometry.
Numerical evaluation. We sample transmission matrices T o , T i such that each row is a random i.i.d. complex Gaussian random vector. For simplicity all rows have the same mean τ o k . We transform the noise vectors to the Fourier domain and set to zero any frequency above N A = 0.5. In the primal domain, we limit the speckles in a Gaussian window of STD 20µm, as the speckles imaged in our setup have a limited support and do not spread over the full sensor. We assume K = 20 fluorescent sources. We initialize with a uniform excitation and apply the power method as we described above. We simulate phase acquisition with ideal noise-free point diffraction interferometry, and also by solving the phase diversity optimization of (37), which may converge to local optima. In Fig. 7(a-b) we plot the vectors α (t) we obtain in the first three iterations of the algorithm. For ease of visualization, we sort the entries of this vector by decreasing order of w k , so that the maximal eigenvalue is always at k = 1. We also normalize the plotted vectors to sum to 1. With both acquisition schemes, within a small number of iterations α (t) is a one-hot vector. In each case we plot a dashed line with the expected values following the model of (34). Even the point diffraction interferometry values do not match this model precisely, because the transmission matrices we sample have random rows which have low correlation, yet their correlation is not precisely zero as assumed in (16). Despite this difference, the convergence rate of both schemes qualitatively agrees with the model predictions.
To statistically assess the differences between the model of (34) and the empirical phase retrieval results, we sample 50 random transmission matrices and apply on each the first iteration of the power method. For this, we use an initial excitation such that |T i k,: u (0) | is uniform.We measure intensities using the point diffraction interferometry or phase diversity schemes, recover the phase of u (1) , and compute the vector  Phase diversity optimization is often sparser than the model prediction (above the diagonal line) but can also be of lower quality.
To assess the sparsity of this vector, we measure Ideally we want s to be as close to 1 as possible. According to the model in (34) the sparsity of the eigenvalues after one iteration should be equivalent to In Fig. 7(c), we evaluate s pdi and s pd using the point diffraction interferometry and phase diversity schemes for 50 different transmission matrices. For the k-th random transmission matrix, we plot the 2D points (s k o , s k pdi ) and (s k o , s k pd ). The plot demonstrates that, in practice, the sparsity of phase diversity is equivalent or even better than point diffraction interferometry. If s pdi and s pd matched the s o prediction, all points should lie on the diagonal dashed line marked in the figure. We see that, for point diffraction interferometry, s pdi is proportional to s o , but is not exactly equivalent to it, as the decorrelation assumption of (16) does not hold exactly. We obtained the result of phase diversity acquisition using gradient descent optimization, which does not always converge to a global optimum. The plot in Fig. 7(c) illustrates that, in most cases, this solution is actually better than the s o prediction (points above the dashed line), though for some transmission matrices the solution is worse, and the points lie below the dashed lines.

Tilt-shift correction
Below we explain the acquisition of the last row of Fig. 5 in the main paper. Given a wavefront shaping modulation that applies to one fluorescent particle inside the tissue sample, we correct nearby ones using the tilt-shift memory effect. For that, we denote by u o1 x , u o2 x two speckle fields obtained on the sensor plane of our main camera (where x denotes spatial position on this plane), generated by fluorescent particles at o 1 , o 2 . We focus the objective such that the sensor plane is conjugate to the plane containing the fluorescent sources. The tilt-shift memory effect [5,31] implies that, for small displacements, u o1 is correlated with a tilted and shifted version of u o2 : with ∆ = o 2 − o 1 the displacement between the sources. If there was no tilt, and the speckle at the image plane could be explained by pure shift, placing in the Fourier plane the Fourier transform of u o1 x would correct the emission from o 1 and the emission from nearby points o 2 . Given the tilt, the Fourier correction for o 2 should be a shifted version of the Fourier correction of o 1 . To account for this, we place in the Fourier plane of our imaging arm shifted versions of our recovered mask. Fig. 8 illustrates that each such shift allows us to see the fluorescent particles in a different local region. By scanning multiple shifts of the modulation mask, we construct a wider image of the fluorescent particles inside the tissue sample, as shown in the last row of Fig. 8 and in Fig. 5 of the main paper.

Additional results
Most experiments in this paper used chicken breast tissue, whose optical properties have been characterized by [40], reporting an anisotropy parameter g = 0.965 and a mean free path (MFP) around 43.7µm. In practice, these numbers may vary significantly between different tissue slices. We also demonstrate results on two other materials whose optical properties are better characterized. First, as in [31] we used 10µm polystyrene micro-spheres dispersed in agarose. Using Mie theory we compute the anisotropy parameter of this dispersion to g = 0.98. Sample thickness was 500µm and we measured its optical depth as OD = 5.9. Results on this sample are demonstrated in Fig. 9. In addition, we used parafilm, whose optical properties were characterized by [7]. This has an anisotropy g = 0.77 and a MFP around 170µm, where each layer is 120µm thick. In Fig. 10 we demonstrate focusing through one and two parafilm layers. While the OD here is not high, the parafilm has a much wider scattering angle and the speckle spread on the sensor is very wide. As the fluorescent emission is weak in power, for the two layer example the speckle images we measure involved a lot of shot noise and the algorithm convergence was not very stable.

Calibration and alignment
Below we elaborate on various calibration and alignment details. First, to correctly modulate the Fourier transform of the wave, the illumination SLM needs to be at the focal plane of the lens right after it (L2 in the system figure), and the imaging SLM at the focal plane of the lens before it (L5). We do this alignment using another camera focused at infinity. We use this camera to view the SLM through the relevant lens, forming a relay system. We adjust the distance between the SLM to the lens until the calibration camera can see a sharp image of the SLM plane. We also ensure that the distance between the sensor of the main/validation cameras and the lenses L6/L7 attached to them is set such that the cameras focuse at infinity.
A second step of the alignment is to focus the excitation laser and the system camera on the same target plane. In our setup the sample and the objective of the validation camera are mounted on two motorized z-axis (axial) translation stages. We use fluorescent beads with no tissue and adjust the axial distance between the sample and the objective of the main camera (Obj1 in the setup figure) such that the main camera sees a sharply focused image of the bead. Then we adjust the distance of the validation objective (Obj2 in the setup figure) from the beads so that we see a sharp image of the same beads in the validation camera. We then want the laser to generate its sharpest spot on the same plane. Assuming the validation and main camera are focused at the same plane, we adjust the position of the lens L3 until the validation camera sees a sharp laser spot.
After the system has been aligned we need to determine two mappings. The first one is between frequencies to pixels on the SLM. A second, more challenging one is the registration between the two SLMs, so that we can map a pixel on the imaging SLM to a pixel on the illumination SLM controlling the same frequency. We start with a mapping between frequencies to the SLM on the imaging arm. We first put a calibration camera that can image the camera SLM plane directly when it receives light from fluorescent beads. This allows us to see an il-luminated circle on the SLM plane, corresponding to the numerical aperture of the imaging system. The center of this circle gives us a first estimate of the zero (central) frequency of the Fourier transform. Assuming we know the focal length of L5, the SLM pitch and the wavelength of the emitted light, we can map frequencies to SLM pixels using simple geometry. Alternatively we can display on the SLM sinusoidals of various frequencies. This shifts the image on the sensor plane. By measuring the shift resulting from each sinusoidal we can calibrate the mapping between frequencies to SLM pixels. To align between the two SLMs we find a region behind the tissue where a single isolated bead is excited so that optimizing the phase diversity cost provides the correct modulation pattern with a single power iteration. We need to determine how to position this modulation on the SLMs keeping in mind that tilt and shift on these planes may impact the results. For the imaging SLM this is less of an issue because we have already marked the zero frequency and because a tilt of the imaging SLM only shifts the position of the spot on the sensor. However, if the illumination SLM is not registered correctly we may see a sharp spot behind the tissue but it will be shifted from the bead of interest and will not excite it. Thus, we tilt and shift the modulation on the illumination SLM until we see the bead is excited in the validation camera, or alternatively, until the intensity we measure on the main camera (when the modulation correction is on) is maximized. After this is achieved we can fine tune the shift on the imaging SLM, which is equivalent to the position of the zero (central) frequency that we have previously marked by looking at the illuminated circle.
Once the system has been calibrated and aligned the algorithm can proceed as described above. For the phase diversity we use j = 5 random, phase-only, modulation masks H j . We start by sampling values for each SLM pixel independently, but we then low pass the masks H j to limit the spread of the convolution kernels h j in the image domain, so that the limited fluorescent energy is not split between too many sensor pixels. We chose the support of h j to be about the same as the spread of the speckle pattern we observe in an unmodulated image.
To use a recovered modulation pattern to image a larger region of beads we use the tilt shift memory effect. To apply the scan we need to recover the parameter α of (41), determining the ratio between the tilt and shift. For that, after we recover the modulation pattern we place it on the illumination SLM and use the validation camera to view the focused spot. We then adjust the ratio between tilt and shift of the modulation pattern so that we can move the focused spot in the validation camera, while preserving maximal intensity. Iteration 1  Iteration 2  Iteration 3  Iteration 4  Iteration 5 Main Camera No mod.
Main Camera With mod.

Val. Camera Emission
Val. Camera Excitation Fig. 9: Algorithm convergence on an agarose scattering phantom of OD = 5.9. We demonstrate views via the main camera seeing the front of the tissue with and without the modulation correction, and the validation camera observing fluorescent beads directly. To better appreciate the focusing we used the validation camera to capture both the excitation and emission wavelengths. In the first iteration we see a speckle image, but as power iterations proceed the illumination wavefront converges and focuses on a single bead. When the same modulation pattern is placed at the imaging arm, imaging aberrations are corrected and one can see a sharp image of the excited bead. Note that images in different iterations have very different ranges, and for better visualization each image was normalized to its own maximum. Note also the different scale bar in different rows, some rows zoom only on the center of the speckle pattern.

Val. Camera Emission
Val. Camera Excitation Fig. 10: Algorithm convergence on a parafilm scattering phantom. The top example uses a single parafilm layer and the second one images through two such layers. We demonstrate views via the main camera seeing the front of the tissue with and without the modulation correction. We also demonstrate the view from the validation camera observing fluorescent beads directly, with and without a bandpass filter. Note the different scale bar in different rows, some rows zoom only on the center of the speckle pattern.