Dynamic noninvasive imaging through turbid media under low signal-to-noise ratio

In turbid media, scattering of light scrambles the information carried by the incident beam and represents an obstacle to optical imaging. Noninvasive imaging through opaque layers is especially challenging when reliable image reconstruction or dynamic objects are required. We here propose a solution to these problems: rather than using the full point-spread function (PSF) or its Fourier transform (the optical transfer function, OTF), the wave distortions in scattering layers can be characterized, and diffraction-limited imaging performed, using only the phase of the OTF. Based on this understanding, we develop a method that exploits the redundant information in multiple measurements and reliably yields OTF phases within several iterations. This method enables noninvasive imaging through turbid media at low signal-to-noise ratios in the measurements, which is not possible with previous methods. We then demonstrate noninvasive video imaging of a moving object hidden between scattering layers at 25-200 Hz. This imaging approach may inspire many other applications in scattering materials.


Introduction
The ability to look through light-scattering media is important in various fields ranging from life sciences [1] to nano-technology [2]. Among the available modalities, imaging with visible light offers better resolution than microwave and ultrasound imaging and is still considered safe for samples in most scenarios, such as biomedical tissues. In optically-thick random media, wavefront distortions due to multiple light scattering are very complicated, so the wavefront correction methods of conventional adaptive optics [3], which only compensate for low-spatial-frequency perturbations such as atmospheric turbulence and aberrations induced in translucent tissues [5], provide limited improvement [4]. One successful approach in strongly scattering media is to selectively detect ballistic photons that travel through the medium in a straight line [6][7][8]. Unfortunately, its working depth is as shallow as a few mean free paths (MFPs), because the ballistic light flux is exponentially extinguished as the depth increases. Photons that are scattered only once [9] or only slightly, so that the overall propagation direction is maintained, can also be exploited, as in photo-acoustic microscopy [10][11][12], to image target objects located deeper than ten MFPs, albeit with limited resolution.
There are also a number of other innovative strategies that use the scattered waves in full. Pioneering work [4,[13][14][15][16][17][18][19][20] utilizing the spatial degrees of freedom of light has been demonstrated, including wavefront shaping [15][16][17] and phase conjugation [18][19][20], usually combined with a remarkable speckle correlation phenomenon known as the 'memory effect' [21,22] for imaging behind tens-of-microns- to millimeter-thick tissues. These techniques require the presence of a guide-star in the object plane and the use of a spatial light modulator or a digital mirror device to generate the needed phase map. Recently, noninvasive optical imaging strategies that remove those requirements have been proposed and demonstrated [23,24]. They are similar to the speckle interferometry first proposed by Labeyrie [25] in 1970, which aims to yield the diffraction-limited autocorrelation of astronomical objects in spite of atmospheric turbulence. These strategies exploit the speckle correlations caused by the 'memory effect' of scattering and use phase retrieval for image reconstruction. However, in these methods the image-retrieval process is undependable and time-consuming, suffering from stagnation problems, and it deteriorates further at the low signal-to-noise ratios (SNRs) of realistic measurements. This unreliability has become a major bottleneck for looking through scattering media, and it remains a big challenge not only for imaging in weak-light environments but also for dynamic imaging. Thus, present noninvasive methods are still far from realistic application.
To overcome the above difficulties, we propose that, when a strongly scattering system contains sufficient random scatterers to satisfy the ergodic-like condition [26], the wave distortions can be effectively characterized in the Fourier frequency domain using only the Fourier phase of the PSF, i.e. the phase of the optical transfer function (OTF), rather than the full PSF [27]. Once the OTF phase is obtained, diffraction-limited images can be analytically deconvolved without any iterative process. Based on this perspective, we develop a framework, named the multi-frame OTF retrieval engine (MORE), to non-invasively recover the OTF phase of a scattering imaging system. It exploits the redundant information from multiple measurements to address the unreliability of conventional methods, resulting in highly stable and rapid convergence of the reconstruction. Even in low-SNR situations that are inaccessible to previous methods, it still stably and rapidly yields unique solutions despite the noisy data.
It was generally believed that, for a strongly scattering imaging system, the full PSF is necessary for recovering diffraction-limited images [28][29][30][31][32][33]. Since the random fluctuation of a PSF causes error amplification during deconvolution, an iterative process is usually required even after the PSF is precisely known [28,29,[31][32][33]. Recently, there have been works using non-iterative deconvolution for imaging through scattering media: 1) the magnitude of the OTF (the MTF) was replaced with an empirical exponential form, whose exponential coefficient varied between experiments and was determined empirically [34]; 2) multiple PSFs produced by point sources at different axial positions were measured and summed, which flattened the MTF and suppressed the error amplification in the deconvolution [35]. These methods try to minimize the influence of the MTF at the cost of more measurements or calibrations. Our theory is fundamentally different: we completely remove the MTF from the deconvolution and prove that the phase of the OTF is enough to analytically recover the diffraction-limited image. This not only avoids an iterative process, but also reduces the difficulty of characterizing the wave distortions, because only half of the PSF's information is needed.
Since MORE recovers OTF phases, it enables dynamic imaging through turbid media. Moreover, MORE preserves the information on the relative positions and orientations of an object at different moments. This allows us to simply put all the recovered images together to create a video without worrying about image misalignment. We realized video imaging at 25-200 Hz for a time-varying object hidden behind turbid media. As a noninvasive approach, our method requires no calibration or pre-processing: it starts by capturing all the speckle patterns at once, and then reconstructs the images in post-processing.
It is worthwhile to point out that, if the previous imaging methods [23,24] were used for dynamic imaging, each frame would have to be individually reconstructed with a conventional phase retrieval algorithm. Lacking information about the relative positions and orientations, it is very difficult to track the variation and movement of the object to make a video, in addition to the unreliable convergence of those algorithms. Figure 1 depicts the schematic for noninvasive imaging of objects hidden behind a diffuser. An LED with a wavelength of λ = 633 nm and a bandwidth of ∼15 nm illuminates the object through the diffuser. The light travels back from the object and is scattered by the diffuser again, generating a speckle-like intensity pattern on a camera. The diffuser is a piece of 220-grit ground glass. The distance from the object to the diffuser is z ≈ 200 mm. The distance between the diffuser and the camera is d ≈ 300 mm. An aperture with a diameter of D ≈ 6 mm is placed in front of the diffuser to limit the area through which the scattered light passes.

Physical mechanism
Within a 'memory effect' range, spherical waves from nearby points on the object suffer similar distortions and produce shifted but almost identical speckle patterns, S(x, y), on the image plane, which is the PSF of the system. For this isoplanatic system with incoherent illumination, the detected intensity I(x, y) on the camera is a superposition of those speckle patterns, which can be written as a convolution of the object O(x, y) with the PSF:

I(x, y) = O(x, y) ∗ S(x, y), i.e. Ĩ(u, v) = Õ(u, v) S̃(u, v), (1)

where the tilde denotes the two-dimensional Fourier transform, u and v are spatial frequency coordinates, and S̃(u, v) is the OTF of the system. Equation (1) can be rewritten as

Ĩ(u, v) = M̃(u, v) e^{iΦ_S(u,v)}, with M̃(u, v) = |S̃(u, v)| |Õ(u, v)| e^{iΦ_O(u,v)}, (2)

where Φ_O(u, v) is the Fourier phase of the object and Φ_S(u, v) is the phase of the OTF. The MTF, |S̃(u, v)|, acts as a spatial frequency filter on |Õ(u, v)|. In a scattering system with an enormous number of random scatterers, when more than thousands of speckles are captured by the camera's sensor to satisfy the ergodic-like condition [26,36,37], the MTF can be derived as (see appendix A)

|S̃(u, v)| ∝ [δ_D(u, v) + C² (T ⋆ T)(λdu, λdv)]^{1/2}, (3)

where δ_D is a delta-like peak, C is the contrast of the PSF (see appendix A), and T(ξ, η) represents the attenuation of the light intensity at position (ξ, η) on the turbid layer, which is proportional to the squared modulus of the pupil function. As shown in figure 6, with an even light-intensity attenuation over the circular diffuser, the MTF has a gentle slope well below the system's best spatial-frequency resolution f_upper ∼ nD/λd at the image plane [38], where n denotes the refractive index. In the object plane, the best spatial-frequency resolution is ∼nD/λz. Thus, the MTF limits the spatial spectrum range of the object, which is determined by the pupil size, defining the diffraction limit of the image.
To better understand M̃(u, v), we recall that, for a scattering- and aberration-free lens system, the OTF is a real function, which can be formulated in a similar form:

S̃_sf(u, v) = |S̃_sf(u, v)| ∝ (T^{1/2} ⋆ T^{1/2})(λdu, λdv). (4)

Its MTF, |S̃_sf(u, v)| ∝ T^{1/2} ⋆ T^{1/2}, plays the same role, defining the diffraction-limited image, the inverse Fourier transform of M̃_sf(u, v).
Equations (2) and (4) show that the difference between these two systems is that the scattering system has a nontrivial OTF phase, Φ_S(u, v). As long as the OTF phase is determined, the diffraction-limited image can be directly computed:

M(x, y) = F^{−1}{M̃(u, v)} = F^{−1}{Ĩ(u, v) e^{−iΦ_S(u,v)}}, (5)

where F^{−1} denotes the inverse Fourier transform. Since I(x, y) is a bounded real-valued function, its Fourier transform Ĩ(u, v) is Hermitian symmetric. The OTF is likewise Hermitian symmetric, and so is M̃(u, v); its inverse Fourier transform M(x, y) is therefore real-valued. Equation (5) shows that the phase of the OTF alone suffices to directly compute the diffraction-limited image of the scattering system. Without the MTF involved in the deconvolution operation, noise amplification is avoided and no iterative process or equivalent operation is required at all. In the following, we provide an experimental proof of this theory.
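In practice, equation (5) is a one-line operation. The following minimal numpy simulation (our own illustration with an arbitrary object and a random stand-in PSF, not the authors' code) forms a speckle image as I = O ∗ S and then recovers the object using only the OTF phase:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256

# Hypothetical object: a bright rectangle (stand-in for a resolution chart)
obj = np.zeros((N, N))
obj[100:130, 110:150] = 1.0

# Speckle-like PSF of the scattering system (random, non-negative)
psf = rng.random((N, N))

# Isoplanatic imaging: I(x, y) = O(x, y) * S(x, y)  (circular convolution)
speckle = np.real(np.fft.ifft2(np.fft.fft2(obj) * np.fft.fft2(psf)))

# Equation (5): keep only the OTF phase, drop the MTF entirely
otf_phase = np.angle(np.fft.fft2(psf))
M = np.real(np.fft.ifft2(np.fft.fft2(speckle) * np.exp(-1j * otf_phase)))
# M is the diffraction-limited image up to a flat offset contributed by the
# delta-like DC peak of the MTF; no iteration is involved.
```

Because the delta-like peak of the MTF only adds a flat background, the recovered M correlates strongly with the hidden object even though the MTF was never measured.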
In the setup of figure 1, we replaced the object with a resolution test chart (figure 1(b)). The camera-captured speckle pattern is shown in figure 2(b). The PSF of the system, S(x, y), was directly measured with a point source placed in the object plane. We took the Fourier transform of the measured PSF and obtained the OTF phase. Using equation (5), the image of the target is directly computed from the OTF phase and the captured speckle pattern without any iterative process. The result is shown in figure 2(c). The smallest feature that can be clearly resolved is Group 4, Element 5 of the resolution test chart, corresponding to an effective resolution of 2^−(Group+(Element−1)/6+1) mm ≈ 20 μm, in agreement with the expected diffraction-limited resolution of zλ/nD ≈ 18 μm.
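The quoted feature size follows directly from the USAF-1951 chart convention; a quick arithmetic check (plain arithmetic, no experimental data involved):

```python
# USAF-1951: Group g, Element e resolves 2**(g + (e - 1) / 6) line pairs per mm
group, element = 4, 5
lp_per_mm = 2 ** (group + (element - 1) / 6)

# One bar is half a line pair; convert its width to microns
line_width_um = 1000 / (2 * lp_per_mm)
print(line_width_um)  # close to 20 um, matching the text
```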
Previously, in scattering systems, the full PSF was used to deconvolve images with iterative algorithms [14], such as the Richardson-Lucy (RL) algorithm [39,40]. Here, we repeat this operation with the Matlab RL function 'deconvlucy'. Figure 2(d) shows the result after 600 iterations of the RL algorithm. The experiment demonstrates that the OTF phase itself contains enough information to recover the diffraction-limited image when specific noise or source information is lacking. A more detailed comparison of the image qualities obtained with different deconvolution approaches is beyond the scope of this work and will not be discussed here.

Dynamic noninvasive imaging from multiple speckle patterns
3.1. The framework of MORE

Equation (5) suggests that, once the OTF phase is determined, the motion of a dynamic object can be monitored in real time by directly deconvolving the images from the captured speckle patterns. However, in a noninvasive scenario, the PSF or the OTF phase cannot be directly measured. Fortunately, the physical mechanism also reveals that the OTF phase can be computed by exploiting phase retrieval. Recalling equation (2), we have

Φ_I(u, v) = Φ_O(u, v) + Φ_S(u, v), (6)

where Φ_I is the Fourier phase of the speckle pattern. Equation (6) indicates that the Fourier phase of the object, Φ_O(u, v), can be recovered from Ĩ(u, v) using a phase retrieval algorithm such as HIO or ER [41]. The OTF phase is then retrieved with

Φ_S(u, v) = Φ_I(u, v) − Φ_O(u, v). (7)

However, phase retrieval from a single pattern is unreliable and may even fail to give a reasonable solution, especially when the SNR of the measurements decreases. As shown in the experiments, when the data are captured at 200 Hz (5 ms exposure time), the speckle patterns are corrupted by experimental noise, which results in the failure of the HIO-ER. Please see appendices D and E for the performance of the HIO-ER under different SNRs.
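For reference, the single-frame route relies on a standard HIO/ER phase-retrieval loop. Below is a generic textbook sketch of HIO followed by ER (our own illustration, not the implementation of [23,24]; in particular, the Fourier magnitude is taken directly from a synthetic object rather than estimated from a speckle autocorrelation):

```python
import numpy as np

def hio_er(mag, support, n_hio=40, n_er=10, beta=0.9, seed=0):
    """Generic HIO followed by ER: recover an image from its Fourier
    magnitude `mag` and a support mask. A textbook sketch, not the
    authors' implementation."""
    rng = np.random.default_rng(seed)
    g = rng.random(mag.shape) * support            # random start inside support
    for it in range(n_hio + n_er):
        G = np.fft.fft2(g)
        G = mag * np.exp(1j * np.angle(G))         # impose the Fourier magnitude
        g_new = np.real(np.fft.ifft2(G))
        bad = (~support) | (g_new < 0)             # support/non-negativity violations
        if it < n_hio:
            g = np.where(bad, g - beta * g_new, g_new)   # HIO feedback step
        else:
            g = np.where(bad, 0.0, g_new)                # ER: simple projection
    return g
```

The final ER projection guarantees a real, non-negative result confined to the support; whether it is the *correct* object is exactly the reliability problem discussed above.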
To overcome the above challenge, we propose a framework that recovers the OTF phase from multiple patterns. This method is developed in the spirit of local isoplanatism in speckle imaging, i.e., the speckle patterns are generated from different objects (or from the same dynamic object at different moments) but through the same OTF. The OTF is therefore an intrinsic and robust connection between the patterns, and the redundant information across multiple patterns helps to stably and rapidly retrieve the OTF phase. The method is hence termed 'MORE', which implies using more camera frames (patterns) and differentiates it from the single-pattern method above.
As shown below, the framework starts with an initial random guess of the OTF phase and keeps updating it in a few big iterative loops (indexed by k). In each big loop, the multiple frames (indexed by j) are used in succession to correct the guessed OTF phase:
(a) Deconvolve the jth pattern with the current OTF-phase guess Φ_S^{k,j}(u, v): Õ_{k,j}(u, v) = Ĩ_j(u, v) e^{−iΦ_S^{k,j}(u,v)}.
(b) Inverse Fourier transform: O_{k,j}(x, y) = F^{−1}{Õ_{k,j}(u, v)}.
(c) Apply realness and non-negativity constraints to O_{k,j}(x, y), yielding O′_{k,j}(x, y): O′_{k,j}(x, y) = Re{O_{k,j}(x, y)} if (x, y) ∈ Γ and Re{O_{k,j}(x, y)} ≥ 0, and O′_{k,j}(x, y) = 0 otherwise, where Re{} indicates taking the real part, and Γ is the support (the region of the object). See appendix B for the determination of the support.
(d) Fourier transform the constrained image: Õ′_{k,j}(u, v) = F{O′_{k,j}(x, y)}.
(e) Update the OTF phase: Φ_S(u, v) ← Φ_I,j(u, v) − arg{Õ′_{k,j}(u, v)}.
(f) Repeat steps (a) to (e) with the next frame until all the frames are used. (g) Repeat steps (a) to (f) to update the OTF phase until the convergence criterion is reached.
Here, the constraints in step (c) are based on the physical fact that the image must be real and non-negative. O and Õ represent the calculated image and its Fourier transform. arg{} is the phase operator. Ĩ_j(u, v) is the Fourier transform of the jth speckle pattern.
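The update steps above can be sketched in a few lines of numpy. This is our reading of the loop, with hypothetical names; the per-frame mean subtraction is an extra preprocessing step we add to suppress the delta-like DC peak of the MTF:

```python
import numpy as np

def more(frames, support, n_loops=5, phi0=None, seed=0):
    """Sketch of the MORE loop (our reading, not the authors' code).
    `frames` are speckle patterns sharing one OTF; `support` is Gamma."""
    rng = np.random.default_rng(seed)
    shape = frames[0].shape
    # Subtract each frame's mean: the flat illumination background only
    # feeds the delta-like DC peak (assumption-level preprocessing)
    I_fft = [np.fft.fft2(f - f.mean()) for f in frames]
    phi_S = rng.uniform(-np.pi, np.pi, shape) if phi0 is None else phi0.copy()
    for _ in range(n_loops):                       # big loops, index k
        for I_j in I_fft:                          # frames, index j
            # (a)-(b) deconvolve with the current OTF-phase guess
            O = np.fft.ifft2(I_j * np.exp(-1j * phi_S))
            # (c) realness, non-negativity and support constraints
            O = np.real(O)
            O[(O < 0) | ~support] = 0.0
            # (d)-(e) update the OTF phase with this frame
            phi_S = np.angle(I_j) - np.angle(np.fft.fft2(O))
    return phi_S
```

A useful sanity check of the update rule is its fixed-point behavior: if the loop is started at the true OTF phase, the constraints leave the deconvolved images essentially unchanged and the phase estimate stays there.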

Dynamic noninvasive imaging with MORE
Using an independent phase retrieval for each speckle pattern results in an uncertain position shift of each image [42] (see figure 9(c)). As we can see from equation (6), the retrieval process uses the Fourier magnitude of the speckle pattern, which is invariant under any position shift of the object. Unlike phase retrieval applied to each single object separately, MORE recovers the OTF phase, which is then used to deconvolve all the images (see figure 9(b)). By the Fourier shift theorem, if an additional linear phase 2π·u·t is appended to the OTF, all the images reconstructed from the same OTF phase acquire the same spatial shift t, so their relative positions are preserved. So are their relative orientations, which can be explained in polar coordinates. This is crucial for dynamic imaging.
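The claim about relative positions is just the Fourier shift theorem: appending one linear phase ramp to the OTF translates every image reconstructed from it by the same amount. A small self-contained check (generic numpy, unrelated to the experimental data):

```python
import numpy as np

def apply_linear_phase(img, shift):
    """Translate `img` by appending a linear Fourier phase (shift theorem)."""
    u = np.fft.fftfreq(img.shape[0])[:, None]   # cycles/pixel along axis 0
    v = np.fft.fftfreq(img.shape[1])[None, :]   # cycles/pixel along axis 1
    ramp = np.exp(-2j * np.pi * (u * shift[0] + v * shift[1]))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * ramp))
```

Applying the same ramp to two different images shifts both by the identical vector, so their relative displacement, and hence the apparent motion in a video, is unchanged.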
As shown in figure 1, when the object is moving, we captured a sequence of frames. We selected three frames and used MORE to recover the OTF phase, which was then used to directly deconvolve the images of all the frames with equation (5). The recovered images were simply stacked together to create a video without any image alignment process. Since the relative positions and orientations of the images are correctly recovered, the video displays the correct motion of the object. By adjusting the exposure time of the camera, we obtained three videos (400 frames each) at 25 Hz, 100 Hz and 200 Hz (see supplementary Movie1.avi, Movie2.avi and Movie3.avi (https://stacks.iop.org/NJP/22/093046/mmedia)). Figure 3 shows several frames of the video at 25 Hz. Please see appendix F for the details of the experiments and the other video samples.
Note that, in the processing of each video, MORE converged to a faithful OTF phase within 5 big loops (equivalent to 15 separate iterations), taking less than 200 ms (with Matlab code running on an Nvidia Tesla P100 GPU). MORE's convergence is 10- to 1000-fold faster (in terms of iterations) than that of the HIO-ER (see appendices D and E for the comparison).

Imaging under low SNRs
For realistic imaging through turbid media, the detectable light is usually weak, resulting in a low SNR in the measurements. With an exposure time of 500 ms, the HIO-ER can recover reasonable images (see figure 9(c)). When the exposure time is down to 40 ms, the reconstruction with the HIO-ER becomes degraded. When the exposure time is 5 ms, the contrasts of the measured speckle patterns decrease to only ∼7%, and the HIO-ER fails to provide a reasonable results even after 120 000 iterations as shown in figure 4(c) (see appendix E for more details). In contrast, MORE still yields unique stable solutions (figure 4(d)). Moreover, as shown in figure 5, it converges from the first big iterative loop, and stably reaches its global minimum after the fifth big loop (less than 200 ms computation time in total). MORE exhibits a high stability in reconstruction and an ability of non-invasive imaging under a low SNR, making an big step close to a realistic application.

Discussion
Limitations of the speckle-correlation-based imaging method have been discussed in previous work [24]. One restriction is the practical limit on the complexity, N, of an object that can be recovered in one frame; N is defined by the ratio of the object's bright area to the area of the resolution cell [24]. Nevertheless, this problem can be ameliorated by reducing either the size of the localized illumination or the area of the turbid medium from which light is collected.
Another restriction is that, in order to obtain high-fidelity images, the region of the sample illuminated each time must lie within an isoplanatic patch (memory effect range), as in adaptive optics, not just laterally but also axially. When the camera sensor is large enough, the lateral and axial speckle correlation ranges are zλ/πnL_eff [22] and (2λ/πn)(z/D)² [26], respectively, where L_eff is the effective thickness of the medium. Any illumination originating outside this range contributes noise to the measurement and deteriorates the image quality. L_eff = L − l_t if L is larger than the transport mean free path l_t. However, in cases where L < l_t, L_eff will be orders of magnitude smaller than the physical thickness L, as in biological tissues [24] and atmospheric turbulence. On the other hand, if the camera sensor is not large enough, the angular memory effect range θ_ME will be limited by the size of the camera sensor D_cs: θ_ME = D_cs/d.
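Plugging the paper's geometry into these expressions gives a feel for the tolerances. Note that the effective thickness below is a made-up placeholder, since L_eff of the ground-glass diffuser is not stated:

```python
from math import pi

lam, n = 633e-9, 1.0          # wavelength (m), refractive index
z, D, d = 0.2, 6e-3, 0.3      # object-diffuser, aperture, diffuser-camera (m)
L_eff = 50e-6                 # HYPOTHETICAL effective medium thickness (m)

lateral_me = z * lam / (pi * n * L_eff)          # lateral memory-effect range
axial_me = (2 * lam / (pi * n)) * (z / D) ** 2   # axial memory-effect range
print(lateral_me * 1e3, axial_me * 1e3)          # both sub-millimeter here
```

With these numbers the isoplanatic patch is well under a millimeter in both directions, which illustrates why the illuminated region must be kept small.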
Since MORE recovers images from the same OTF, it requires that the turbid medium between the object and the camera stay static during the measurement of one OTF. Thus, the exposure time for each frame must be much shorter than the varying timescale of the scattering medium. For example, for a turbid medium varying on a timescale of ∼100 ms, we can record all the frames with an exposure time of 5 ms and then divide them into groups such that all the frames within each group share the same OTF. Two adjacent groups should share a frame, which can be used to trace the movement of the object.
The operation of MORE has some similarities to ptychography [43], Fourier ptychography [44,45] and super-resolution fluorescence imaging approaches, including speckle correlation imaging with a high-index scattering medium [46], as they all exploit joint information among multiple frames to optimize the iterative phase-retrieval process. Our method, however, differs from the others in that it requires neither overlap in the spatial domain (as described in appendix F) or the Fourier domain, nor a low-resolution image as an intensity constraint. In an iterative process, MORE employs only two to five frames within an isoplanatic patch, whereas all frames are involved in the iterative computation for the other methods. Moreover, our method needs no prior information about the illumination shape, as required in conventional structured illumination microscopy [47]. If combined with the scanning procedure proposed in high-resolution fluorescence imaging with a turbid medium [46], MORE may help to substantially improve the efficiency and fidelity of the image-recovery process.
In this work, we prove that the OTF phase information is sufficient for imaging through optically-thick materials, and that it can be reconstructed in a phase retrieval process. This makes recovering images under low-SNR situations possible. Meanwhile, its reliability and rapid convergence pave the way for real-time dynamic imaging through turbid media. We believe this idea will inspire many other applications with scattering materials, such as super-resolution fluorescence imaging [46].


Appendix A. The magnitude of the OTF

The theoretical upper cutoff frequency of the magnitude of the OTF is f_upper ∼ nD/λd, where n denotes the refractive index behind the turbid medium. Note that, in reality, the system's resolution can also be restricted by other effects, such as the size of the camera's pixels. Moreover, it is worth mentioning that the system's lower cutoff spatial frequency, f_lower, is affected by multiple factors, including the size of the scatterers, the finite area of the detector's sensor, and the uneven illumination background on the camera. In our experiment the last factor, the illumination background on the camera, determines f_lower.
With an enormous number of random scatterers in the turbid medium, the PSF of the system at any one point is the result of a great many independent scattered wave vectors, and is thus a Gaussian random variable [36,37]. In the ideal case where more than thousands of speckles are captured by the camera's sensor, for an arbitrary intensity distribution T(ξ, η) of light transmitted through the scattering layer, the autocorrelation of the PSF is predicted by the van Cittert-Zernike theorem [36]:

R_S(Δx, Δy) = S̄² [1 + C² |μ(Δx, Δy)|²], (A1)

μ(Δx, Δy) = e^{iφ} ∫∫ T(ξ, η) e^{−i(2π/λd)(ξΔx+ηΔy)} dξ dη / ∫∫ T(ξ, η) dξ dη. (A2)

S̄ is the average intensity of the PSF. ξ and η represent the coordinates on the diffuser plane. φ is the phase factor [36]. C is the contrast of the PSF, which is less than 1; its value is influenced by various factors, including the noise level in the measurement and the polarization and bandwidth of the light source. Based on equations (A1) and (A2), if the transmitted light through the medium has a uniform circular cross section, the autocorrelation of the PSF will be a Gaussian-like function (an Airy disk) on a background [26,48]. In the Fourier domain,

|S̃(u, v)|² ∝ δ_D(u, v) + C² (T ⋆ T)(λdu, λdv). (A3)

Thus |S̃(u, v)|², as the Fourier transform of the autocorrelation of the PSF, is also a Gaussian-like function except for a sharp peak in the very middle. In figure 6, curves of |S̃(u, v)|/S̄ are plotted in the v = 0 plane, assuming an even intensity distribution on the circular diffuser, for different values of C. Note that |S̃(u, 0)| has nonzero values only when |u| is below ∼f_upper. In our experiments, the contrast of the PSF is C ≈ 30%.
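The support statement implicit in these relations can be checked numerically: for fully developed speckle behind a circular aperture of radius r, the OTF magnitude must vanish outside the autocorrelation support of T (radius 2r), apart from the central delta-like peak. A toy simulation (arbitrary grid and aperture sizes of our choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
N, r = 256, 20                                   # grid size, aperture radius (pixels)
y, x = np.ogrid[-N//2:N//2, -N//2:N//2]
T = (x**2 + y**2 <= r**2).astype(float)          # uniform circular transmission

# Fully developed speckle: random phases across the illuminated aperture
field = np.fft.fft2(np.sqrt(T) * np.exp(2j * np.pi * rng.random((N, N))))
psf = np.abs(field) ** 2                         # speckle PSF on the camera

# |OTF|^2 is the Fourier transform of the PSF autocorrelation; its support is
# the autocorrelation of T (a disk of radius 2r) plus the delta-like DC peak
otf_mag = np.abs(np.fft.fftshift(np.fft.fft2(psf)))
```

Everything outside the radius-2r disk is numerically zero, in line with equation (A3): the speckle pattern carries no object information beyond the cutoff set by the pupil.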

Appendix B. The determination of a support
The region of the object can be estimated from the power spectrum of a captured pattern. The size of a speckle mode in the discrete Fourier domain is inversely proportional to the extent of the object. In our experiment, the size of a mode is around 17 × 29 pixels. The region of the object Γ is set as a rectangle of (2048/17) × (2048/29) ≈ 128 × 70 pixels, where 2048 × 2048 pixels is the dimension of the captured pattern. We also investigated how the accuracy of the estimated region affects the reconstruction. Figure 7 shows the reconstruction results versus different estimated regions, and figure 8 shows the corresponding convergence curves. These results indicate that: 1) the estimation from the power spectrum is faithful; 2) the reconstruction gradually degrades as Γ becomes smaller than its true size; 3) MORE is less sensitive to an oversized Γ. Even when Γ is set to twice the object's true size, MORE still provides a reasonable result.

Appendix C. Comparison of reconstructions for the data with a 500 ms exposure time
The speckle patterns were captured with a 500 ms exposure time. In order to compare the relative positions and orientations of the recovered images, we recorded each position of the object while measuring it. After all the measurements, we restored the location of each object and took its direct image. Under the 500 ms exposure time, the HIO-ER can recover good images. Nevertheless, we still needed to run more than 10 000 iterations for each object and choose the best result among them. Figure 9(c) shows that, when using the HIO-ER to individually recover an object from a single frame, the relative positions and orientations of different objects are lost. In contrast, MORE preserves their relative positions and orientations, as shown in figure 9(b).

Appendix D. Reconstruction for the data taken at 25 Hz (40 ms exposure time)
We here compare the performance of MORE with that of the HIO-ER, which uses a single speckle pattern to recover an object [23,24]. The speckle patterns were captured at 25 Hz (40 ms exposure time) while the object was moving. Figure 10 shows the convergence curves for objects '2' and '5' in the first 200 iterations of the HIO. The normalized difference between a recovered image O_rec(x, y) and the direct image O_dir(x, y) is calculated as

E = Σ_{x,y} |O_rec(x, y) − O_dir(x, y)|² / Σ_{x,y} |O_dir(x, y)|². (D1)

Note that, at K = 0, all the recovered images are calculated with an initial random guess of the OTF. In each calculation of the difference, the recovered image was rotated to the same orientation as the direct image and shifted to the position of maximal overlap with the direct image. This process guarantees the validity of the comparison. The convergence curves indicate that the reconstructions oscillate around local minima; the convergence curves of '3', '4' and '6' exhibit a similar oscillation. Therefore, it is necessary to choose the best result for an object among all the iterations by looking for the Fourier magnitude closest to |Ĩ_pro(u, v)|, i.e., the minimum of

Σ_{u,v} ( |Õ_rec(u, v)| − |Ĩ_pro(u, v)| )², (D2)

where |Õ_rec(u, v)| is the Fourier magnitude of the recovered image. This selection criterion is not very reliable; most of the time, we manually selected the best results based on prior information (the direct images). Figure 11 exhibits the recovered images at iterations j = 0, 5, 10, 15, 40, 45, as well as the best result chosen from 50 000 iterations for each object. We performed MORE with the same data set and the same initial condition as used in the HIO-ER. Figures 12 and 13 show the convergence speeds and all the recovered images after each big loop, respectively. MORE converges to the correct results from the first big loop, and the convergence reaches its global minimum after the third big loop.
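One concrete way to implement the shift compensation described above is FFT-based cross-correlation. The sketch below is our own illustration (the rotation step used in the paper is omitted for brevity); it returns the normalized difference after the best circular alignment:

```python
import numpy as np

def shift_invariant_diff(rec, direct):
    """Normalized difference after compensating an unknown circular shift.
    The best overlap is found via FFT-based cross-correlation."""
    xc = np.real(np.fft.ifft2(np.fft.fft2(rec) * np.conj(np.fft.fft2(direct))))
    dy, dx = np.unravel_index(np.argmax(xc), xc.shape)   # shift of max overlap
    aligned = np.roll(rec, (-dy, -dx), axis=(0, 1))
    return np.linalg.norm(aligned - direct) / np.linalg.norm(direct)
```

By construction the metric is zero for a perfect reconstruction regardless of any global translation, so it compares images on content rather than position.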
Appendix F. Details of the dynamic imaging experiments

Two dynamic objects were used. One (dynamic object 1) had a bigger size (each numeral ∼5 mm), which was easy to bend and attach to the rotatable cylinder. The other (dynamic object 2) had a much smaller size: each numeral is 0.5 × 0.4 mm, so we were able to investigate the resolution down to the μm level. However, it was difficult for us to make a cylinder of such a small size. We therefore used a moving plate (containing the numerals) and a screen to mimic the behavior of the rotating cylinder: one numeral can be seen at a time in the field of view of the measurement, as shown in figure 18.
For dynamic object 1, we set the distance from the object to the diffuser to z ≈ 2 m and the distance between the diffuser and the camera to d ≈ 200 mm. A sequence of frames was captured by the camera while the cylinder was rotating. We selected three frames corresponding to three different numerals to recover the OTF phase. All the images were then directly computed with this recovered OTF phase using equation (5) of the main text. Figure 19 shows the results.
For dynamic object 2, we reset the experimental parameters to z ≈ 200 mm and d ≈ 300 mm, and performed the dynamic imaging in the same way as for dynamic object 1. After acquiring the images of the object at different moments, we simply stacked them together and generated a video without any image alignment process: each video frame is just a recovered image. Since the relative positions and orientations are preserved, the video displays the real motion of the object. By setting the exposure time of the camera to 40 ms, 10 ms and 5 ms, we obtained video imaging at frame rates of 25 Hz, 100 Hz and 200 Hz, respectively. Figure 20 shows some frames of the videos. Please see supplementary Movie1.avi, Movie2.avi and Movie3.avi for the whole videos.
Note that the dimension of each frame was 2048 × 2048 pixels. The computation time for the OTF phase was less than ∼200 ms on a three-year-old GPU (Nvidia Tesla P100). The reconstruction of each frame involves two Fourier transform calculations, which take several milliseconds on our laptop.