Deep-inverse correlography: towards real-time high-resolution non-line-of-sight imaging

Low signal-to-noise ratio (SNR) measurements, primarily due to the quartic attenuation of intensity with distance, are arguably the fundamental barrier to real-time, high-resolution, non-line-of-sight (NLoS) imaging at long standoffs. To better model, characterize, and exploit these low-SNR measurements, we use spectral estimation theory to derive a noise model for NLoS correlography. We use this model to develop a speckle correlation-based technique for recovering occluded objects from indirect reflections. Then, using only synthetic data sampled from the proposed noise model, and without knowledge of the experimental scenes or their geometry, we train a deep convolutional neural network to solve the noisy phase retrieval problem associated with correlography. We validate that the resulting deep-inverse correlography approach is exceptionally robust to noise, far exceeding the capabilities of existing NLoS systems both in terms of spatial resolution achieved and in terms of total capture time. We use the proposed technique to demonstrate NLoS imaging with 300 µm resolution at a 1 m standoff, using just two 1/8 s exposure-length images from a standard complementary metal oxide semiconductor detector.


INTRODUCTION
Non-line-of-sight (NLoS) imaging recovers hidden objects from light scattered off these objects onto other surfaces in the scene; in essence, it lets us use a rough wall as a mirror. Although the majority of NLoS imaging methods have exploited time-of-travel information [1-12], recent work has demonstrated that spatial correlations in scattered light (speckle) contain sufficient information to image such hidden objects [13-15]. These methods recover the latent image of the hidden object by solving variations of a phase retrieval (PR) problem. While speckle-based methods achieve unprecedented results under high-flux illumination, such approaches struggle to recover a latent image in photon-starved environments, due in large part to the poorly understood noise characteristics of the correlation process. This shortcoming limits existing methods to long acquisition times and small standoff distances d.
In this paper, we examine the NLoS imaging problem illustrated in Fig. 1. The objective is to recover a spatially resolved image of a target hidden behind the corner. To this end, we indirectly illuminate the target using continuous-wave (CW) laser light scattered off a section of the visible wall (dubbed the virtual source), and record the object returns incident on another section of the visible wall (dubbed the virtual detector). This configuration causes the hidden object's albedo, r, to be encoded in the distribution of the speckle pattern incident on the virtual detector. By analyzing spatial correlations in the speckle image, we can estimate the albedo's autocorrelation, r ⋆ r. With this estimate in hand, we use a PR algorithm to recover r using the relationship

F(r ⋆ r) = |F(r)|²,  (1)

where F denotes the Fourier transformation and the square is taken elementwise. Although details differ in how r ⋆ r is encoded and estimated, this basic Fourier relationship is what underlies nearly all correlation imaging techniques [13,15-19], including NLoS correlography [14]. NLoS correlography is described in detail in Section 3.
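The Fourier relation F(r ⋆ r) = |F(r)|² can be sanity-checked numerically. The following sketch (an illustration using numpy, with an arbitrary random array standing in for the albedo r) verifies that the Fourier transform of the zero-padded autocorrelation equals the elementwise-squared Fourier magnitudes:

```python
import numpy as np

# Illustrative check of Eq. (1): F(r * r) = |F(r)|^2 (elementwise square).
# The 32x32 random "albedo" r is a stand-in, not data from the paper.
rng = np.random.default_rng(0)
r = rng.random((32, 32))

# Zero-pad to 63x63 so circular correlation coincides with linear correlation.
R = np.fft.fft2(r, s=(63, 63))
autocorr = np.fft.ifft2(np.abs(R) ** 2).real  # r correlated with itself

# Fourier-transforming the autocorrelation recovers |F(r)|^2.
assert np.allclose(np.fft.fft2(autocorr), np.abs(R) ** 2)
```

The same padded-FFT route is also a natural way to compute r ⋆ r when one needs autocorrelations of finite-support images.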
While correlation imaging techniques provide a novel approach to the challenging NLoS problem, their performance is limited by the same d⁴ intensity falloff that affects all NLoS methods. Note that because of eye safety limitations, this falloff cannot be overcome simply by increasing the laser power. Similarly, reducing the falloff to d² by colocating the illumination source and the hidden object, as was done in previous correlation-based NLoS methods, is not always possible.

Fig. 1. NLoS imaging setup. A camera uses light scattered off of a rough wall, known as a virtual detector, to reconstruct an image of the hidden object. When using a continuous-wave laser, the camera records speckle. Inset: NLoS correlography estimates an object's autocorrelation using speckle images. It then uses this autocorrelation estimate to recover the object's shape by solving a PR problem.
Deep learning has recently achieved state-of-the-art performance on a range of challenging imaging inverse problems, such as super-resolution microscopy [29,30], lensless imaging [31], ghost imaging [32], and imaging through scattering media [33-36]. However, these existing approaches depend on large supervised training datasets and generally use shift-variant loss functions that do not translate to the NLoS correlography problem, Eq. (1). While we tackle the latter issue by introducing translation-invariant loss functions, acquiring training data experimentally is infeasible (consider the large combinatorial space of potential hidden scenes). To lift this limitation, we derive an accurate noise model for the NLoS correlography PR problem, which enables us to generate accurate training data synthetically. With this training data in hand, we propose and validate a learned reconstruction approach to NLoS correlography in the low-flux regime. In doing so, we make the following contributions.
• We use results from spectral density estimation to analyze the distribution of the noise associated with NLoS correlography.
• We propose a new approach for generating PR training data, without the need for experimental acquisition or modeling of scene semantics.
• We propose a new input-output mapping for the PR problem, and we propose and analyze several new, translation-invariant loss functions for learning-based PR.
• We validate our CNN on experimental NLoS imaging data, where it proves to be far faster and more robust than traditional methods; it enables reconstructions at a 300 µm resolution at 1 m standoff, using just two 1/8 s exposure-length images.

A. Strengths and Limitations
In contrast to most NLoS imaging methods, our approach requires only standard CW laser sources and complementary metal oxide semiconductor (CMOS) sensors, solves the reconstruction problem in a fraction of a second, and does not need to know the location of the virtual source and detector. This significantly enhances the utility and practicality of our approach, bypassing the need for ultrafast sources and detectors, computationally expensive techniques for recovering a latent image from measurements, and impractical calibration steps. On the flip side, like other correlation-based methods and unlike time-of-travel-based techniques, our technique is best suited for imaging small isolated objects within the hidden volume; large objects lie outside the range of the memory effect [37] and so do not cause the self-interference upon which correlation-based techniques rely. Likewise, because of the translation invariance of the PR problem, our system is unable to localize the position of the objects within the hidden volume.

RELATED WORK
A. Impulse Non-Line-of-Sight Imaging

Kirmani et al. [1] first described the concept of imaging occluded objects using temporally coded measurements in which short pulses of light are captured propagating through the scene at the speed of light. These transient "light in flight" measurements are the temporal impulse response of light transport, and Abramson [38] first demonstrated a capture system for transient imaging. Velten et al. [2] showed the first experimental NLoS imaging results, using a femtosecond laser and streak camera system to capture transient images. Building on these seminal works, a large body of work has explored impulse NLoS imaging [3-12], much of which is reviewed in Ref. [39]. These methods require detectors capable of high temporal resolution sampling to allow for impulse probing of the temporal light transport in the scene. Although the streak camera setup from Velten et al. [2] is essentially a decade-old technology, it allows for temporal precision of <10 ps. However, the high instrumentation cost and sensitivity of these experimental capture systems have sparked an interest in single photon avalanche diodes (SPADs) as a detector alternative [7,40]. Although SPAD detectors can offer resolution <10 ps [41], comparable to streak camera setups, they typically suffer from low fill factors (around a few percent [42]) and spatial resolution in the kilopixel range [43]. Compared to ubiquitous intensity sensors with pixel arrays of >10 megapixels, SPAD detectors are orders of magnitude less photon-efficient and more expensive.
Recently, a combination of noise-robust algorithms [11,12] and extremely powerful illumination has moved these systems closer to real-time rates; using an illumination source with a peak power nearly 10,000× our own, Ref. [11] produced room-sized reconstructions in under 30 s. Additionally, Ref. [9] demonstrated real-time reconstructions of retroreflective objects; retroreflectors experience only a d² intensity falloff with confocal measurements [7].

B. Correlation Non-Line-of-Sight Imaging
Instead of directly acquiring transient transport measurements, a further line of research explores indirect coding using time-of-flight sensors [44-47]. Time-of-flight cameras capture correlation measurements of amplitude-modulated light, which encode travel time via the phase shift of the amplitude-modulated illumination. Although these cameras are readily available as consumer products, such as Microsoft's Kinect One, existing pixel technology offers only limited modulation bandwidths of around 100 MHz, truncating the effective temporal resolution to the nanosecond range.
Recently, an exciting line of work [13,15,19], loosely based on ideas first developed in [37,48], has explored using correlations in the carrier wave itself, instead of in the amplitude modulation. This approach enables the use of conventional intensity sensors, while offering high modulation bandwidths in the THz range. Although seemingly a solution to the bandwidth and detector limitations of previous methods, existing approaches have been limited to scenes at microscopic scales [15] and to lab setups in which ambient light is completely eliminated and the hidden object and the illumination source are colocated.
In this paper, we demonstrate a method that overcomes these issues by relying on temporally and spatially coherent measurements and a robust reconstruction framework, which together allow us to achieve photon-efficient NLoS imaging in the presence of strong ambient light and at large distances. A comparison of the two methods is provided in Section 3 of Supplement 1.

C. Non-Line-of-Sight Tracking
Recently, a number of methods for the orthogonal task of tracking and/or classifying occluded objects have been demonstrated using intensity measurements only [49-56]. Although these methods also rely on conventional intensity sensors, most are restricted to coarse localization tasks and often assume known object classes and scene geometries or require colocating the hidden object and the illumination source.

D. Traditional Phase Retrieval Algorithms
PR solves the problem of recovering a signal from a measurement of its Fourier magnitude (modulus). Because the phase is lost in the measurement, this inverse problem is ill-posed in general. However, if the measurements are sufficiently oversampled, the phase can, in theory, be perfectly recovered by solving a set of nonlinear equations [57]. Together with an assumption of a nonzero support of the real-valued signal, practical error reduction methods have been designed [20] for a plethora of applications in optics, crystallography, biology, and physics. A popular extension of this algorithm is the hybrid input-output (HIO) method [21] and its various relaxations [22,58]. Recently, two major lines of research have explored alternating direction methods for PR [23,59] and overcoming the nonconvex nature of PR through convex relaxations [60].
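As a concrete reference point for this family of algorithms, a minimal HIO iteration can be sketched in a few lines of numpy. This is a simplified illustration (known rectangular support, nonnegativity constraint, noiseless magnitudes), not the exact implementation used in the paper's experiments:

```python
import numpy as np

# Minimal hybrid input-output (HIO) phase retrieval sketch: alternate a
# Fourier-magnitude projection with a relaxed object-domain update.
def hio(fourier_mag, support, n_iters=200, beta=0.9, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.random(fourier_mag.shape) * support
    for _ in range(n_iters):
        # Fourier-domain step: keep the current phase, impose the magnitudes.
        X = np.fft.fft2(x)
        x_p = np.fft.ifft2(fourier_mag * np.exp(1j * np.angle(X))).real
        # Object-domain step: accept x_p where the constraints hold;
        # elsewhere take a relaxed step x - beta * x_p away from the violation.
        violated = (~support) | (x_p < 0)
        x = np.where(violated, x - beta * x_p, x_p)
    return x

# Toy run: a rectangle with a known (loose) support and noiseless magnitudes.
truth = np.zeros((64, 64)); truth[20:30, 24:40] = 1.0
support = np.zeros((64, 64), dtype=bool); support[8:56, 8:56] = True
rec = hio(np.abs(np.fft.fft2(truth)), support)
```

Note that even in this noiseless toy setting, HIO is only guaranteed to recover the object up to the trivial translation and flip ambiguities of Fourier PR.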

E. CNN-Based Phase Retrieval Algorithms
Only very recently have deep neural networks been explored for solving PR. Most previous attempts to apply convolutional neural networks (CNNs) to PR have been application specific. CNNs have been applied to ptychography [61,62], holography [63], quantitative phase microscopy [64,65], and coherent diffraction imaging (CDI) [66,67]. Among these works, the CDI approaches are the closest to our own.
There have also been attempts to use CNNs as regularizers within a PR optimization problem [68]. Unfortunately, these techniques require accurate initializations to succeed, which are not available in the very low signal-to-noise ratio (SNR) regimes characteristic of NLoS imaging.

NON-LINE-OF-SIGHT CORRELOGRAPHY

A. Principles of Operation
The canonical NLoS imaging geometry of Fig. 1 will be used to introduce and develop the mathematical concepts underlying NLoS imaging. In this setup, a quasi-monochromatic laser source illuminates a portion of the visible wall, which we dub the "virtual source." Laser light scattered by the optically rough virtual source surface propagates towards the hidden object. Due to the coherence of the laser source, the hidden object is illuminated by a fully developed speckle pattern, characterized by randomized constructive and destructive interference. A fraction of the light incident on the object is redirected (reflected and scattered) towards a second section of the visible wall, which we dub the "virtual detector." A camera observing the virtual detector surface records the light reflected and scattered by the object. By scanning the position of the virtual source, a second statistically independent speckle realization is used to indirectly illuminate the hidden object. The corresponding camera image of the virtual detector surface, which is itself a low-contrast speckle pattern, is recorded. Using two or more measurements of this form, the autocorrelation of the hidden object's albedo is estimated, and the albedo is reconstructed by solving a PR problem.

B. Measurement Model
In this section, we provide details of the NLoS correlography measurement process, seeking to identify the (idealized) probability distribution associated with each measurement. We denote the complex-valued optical fields incident on and emitted by the virtual source, hidden object, and virtual detector with the variables E_VS^in, E_VS^out, E_O^in, E_O^out, and E_VD^in, respectively. We index spatial locations on the virtual source, the hidden object, and the virtual detector using x_VS, x_O, and x_VD, respectively. In the interest of mathematical simplicity, we assume that the distances between the virtual source, object, and virtual detector are such that propagation between them can be modeled by the Fraunhofer approximation, i.e., is proportional to a Fourier transformation. The propagation operator may be generalized to Fresnel propagation by absorbing the quadratic phase factors intrinsic to propagation within the fields being propagated, without affecting the outcome of our analysis. We additionally assume that the virtual source surface is illuminated by a collimated beam at normal incidence, so that

E_VS^in(x_VS) = 1.  (2)

The optically rough virtual source scrambles the phase of the light emerging from the virtual source surface, so that

E_VS^out(x_VS) = E_VS^in(x_VS) e^{i θ_VS^out(x_VS)},  (3)

where the phase term θ_VS^out is uniformly distributed over the interval [0, 2π] and the autocorrelation function of the phase term is a Dirac delta function. Thus, for x_VS,1 ≠ x_VS,2, E_VS^out(x_VS,1) and E_VS^out(x_VS,2) are uncorrelated with respect to the distribution of θ_VS^out.
The field emerging from the virtual source then undergoes far-field propagation on its way to the object, which can be modeled by a Fourier transformation. Accordingly, the field incident on the hidden object is

E_O^in = F(E_VS^out),  (4)

where F denotes the Fourier transform operator.
From the central limit theorem, the independence of the fields at different locations of E_VS^out, and the orthonormality of the Fourier transform, we have that, for all x_O, E_O^in(x_O) follows a circular Gaussian distribution with autocorrelation function σ² δ(Δx_O) for some constant σ².
Each location on the object modulates the incoming field according to the albedo r of the hidden object. Thus,

E_O^out(x_O) = √(r(x_O)) E_O^in(x_O),  (5)

and E_O^out(x_O) follows a circular Gaussian distribution with autocorrelation function r(x_O) σ² δ(Δx_O).
The field emerging from the object propagates towards the virtual detector, in accordance with the far-field propagator. Thus,

E_VD^in = F(E_O^out).  (6)

The camera then images the virtual detector. If we assume that the camera has an infinite aperture and the virtual detector has uniform albedo, we have

I(x_VD) = |E_VD^in(x_VD)|².  (7)

One can illuminate different locations on the virtual source to capture independent measurements that follow Eq. (7). Images can additionally be subdivided into nonoverlapping patches and treated independently; e.g., a 2000 × 2000 image can be subdivided into 25 independent 400 × 400 patches [18,69].
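The patch subdivision above is straightforward to implement; a small helper along the following lines (an illustrative sketch, with names of our own choosing) splits a speckle image into non-overlapping patches that can then be treated as independent measurements:

```python
import numpy as np

# Split an image into non-overlapping p x p patches, each treated as an
# independent speckle measurement (e.g., 2000x2000 -> 25 patches of 400x400).
def split_patches(img, p):
    h, w = img.shape
    return [img[i:i + p, j:j + p]
            for i in range(0, h - p + 1, p)
            for j in range(0, w - p + 1, p)]

patches = split_patches(np.zeros((2000, 2000)), 400)
assert len(patches) == 25 and patches[0].shape == (400, 400)
```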

C. Autocorrelation Estimation
Correlography takes measurements that follow the distribution specified by Eq. (7) and recovers the hidden object's albedo using the following two equalities, which are derived in Section 1 of Supplement 1 and are based on the law of large numbers:

(1/N) Σ_{n=1}^{N} |F⁻¹{I_n}|²(Δx) ≈ (r ⋆ r)(Δx) + |F⁻¹{E[I]}|²(Δx)  (8)

and

|F⁻¹{(1/N) Σ_{n=1}^{N} I_n}|²(Δx) ≈ |F⁻¹{E[I]}|²(Δx).  (9)

By subtracting Eq. (9) from Eq. (8), we are left with an unbiased estimate of r ⋆ r, which we denote r̂⋆r. An example of such an estimate is shown in Fig. 2(c).
The autocorrelation is related to the Fourier magnitudes of r through Eq. (1). Thus, by taking the Fourier transform of our estimate of r ⋆ r and then applying a PR algorithm to the result, we can recover an estimate of the albedo r. An example of one such reconstruction, recovered using the HIO algorithm [21], is shown in Fig. 2(d).

Fig. 2. Long-exposure NLoS correlography example. 25 nonoverlapping 400 × 400 speckle subimages were drawn from each of 50 distinct, 1 s exposure-length speckle images. These subimages were then used to estimate the hidden object's autocorrelation (middle) using Eqs. (8) and (9). HIO [21] was then used to reconstruct the object's albedo (right).

Fig. 3. Short-exposure NLoS correlography example. 25 nonoverlapping 400 × 400 speckle subimages were drawn from each of 2 distinct, 1/8 s exposure-length speckle images. These subimages were then used to estimate the hidden object's autocorrelation (middle) using Eqs. (8) and (9). HIO [21] was then used to reconstruct the object's albedo (right).
Figures 3(c) and 3(d) repeat this experiment using two short-exposure measurements, which produce a much noisier autocorrelation estimate: HIO fails to recover any structure in this context. Understanding and overcoming this noise is the key to enabling real-time NLoS imaging with correlography.

CORRELOGRAPHY NOISE MODEL
This section describes the fluctuations of the autocorrelation estimate r̂⋆r due to various sources of noise in the measurement process.

A. Distribution of the Autocorrelation Estimate
In practice, for locations Δx ≠ 0, Eq. (8) is much greater than Eq. (9). Thus, for Δx ≠ 0,

r̂⋆r(Δx) ≈ (1/N) Σ_{n=1}^{N} |F⁻¹{I_n}|²(Δx).  (10)

The expression (1/N) Σ_{n=1}^{N} |F⁻¹{I_n}|² is the average of N i.i.d. random variables. From the central limit theorem, its elements asymptotically follow a Gaussian distribution N(µ(Δx), σ²(Δx)) for some mean µ(Δx) and variance σ²(Δx).
Next, we note that (1/N) Σ_{n=1}^{N} |F⁻¹{I_n}|² is an average of periodograms. In the context of power spectral density (PSD) estimation, averaging together multiple periodograms is known as Bartlett's procedure. Recognizing this fact allows us to rely upon existing theory to analyze r̂⋆r. Bartlett's procedure produces an unbiased estimate of the true PSD, S(Δx). Moreover, if we assume I_n follows a Gaussian distribution, the pointwise variance of this estimate is proportional to (1/N) S²(Δx) [70]. The assumption is justified because I_n follows a noncentral chi-squared distribution with M degrees of freedom, where M denotes the dimension of the signal; for large M, this can be approximated by a Gaussian distribution [71]. In summary, for Δx ≠ 0, we have

r̂⋆r(Δx) ∼ N(S(Δx), (γ/N) S²(Δx)),

where S(Δx) is the PSD of the speckle at Δx and γ is some constant.
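The 1/N variance scaling of Bartlett's procedure can be reproduced with synthetic data. The sketch below (our own toy setup: circular-Gaussian "speckle" with a flat PSD, not the paper's measurements) checks that averaging 64× more periodograms shrinks the pointwise variance of the PSD estimate by roughly 64×:

```python
import numpy as np

# Bartlett's procedure: averaging N periodograms yields a PSD estimate whose
# pointwise variance falls as S^2 / N. Here the signal is synthetic
# circular-Gaussian noise with unit (flat) PSD, so every bin has S = 1.
rng = np.random.default_rng(1)
M = 4096  # samples per realization

def bartlett_psd(N):
    # Average N independent periodograms |F{x_n}|^2 / M.
    fields = (rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))) / np.sqrt(2)
    periodograms = np.abs(np.fft.fft(fields, axis=1)) ** 2 / M
    return periodograms.mean(axis=0)

est_small, est_big = bartlett_psd(4), bartlett_psd(256)
ratio = est_small.var() / est_big.var()  # expect roughly 256 / 4 = 64
assert 30 < ratio < 130
```

Because the true PSD is flat, the spread of the estimate across frequency bins directly reflects its pointwise variance, which is what the final ratio compares.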

B. Sources of Noise
Multiple sources of noise and bias influence the NLoS correlography measurement process, the most important of which are:

1. Finite-sample approximation error: Using a few samples (small N) increases the variance of r̂⋆r. It has no effect on the PSD of the speckle.

2. Photon noise: When dealing with weak, third-bounce signals, Poisson shot noise shows up in the measurement I_n. This shot noise is white and adds a uniform offset to S(Δx).

3. Ambient illumination: The measurements capture light not only from the hidden object, but also from walls, floors, and clutter in the scene. This shows up as both a diffuse background and uncorrelated shot noise in the speckle measurements. The diffuse background adds a peak at S(Δx = 0), and the shot noise adds a uniform offset to S(Δx).

4. Finite apertures: The finite apertures of both the camera and the virtual detector mean that our measurements are low-pass filtered. This band-limits the PSD of the speckle.
Combining these sources of noise, we get

S(Δx) = H(Δx) (r ⋆ r)(Δx) + b,

where H(·) is the aperture's low-pass transfer function and b is an offset accounting for the shot noise (both from the hidden object and from the background). Assuming our camera has a sufficiently large aperture, this model simplifies to

S(Δx) = (r ⋆ r)(Δx) + b.

Combining this result with Eq. (10), with some slight abuse of notation, we have

r̂⋆r(Δx) ∼ N( (r ⋆ r)(Δx) + b, (γ/N) [ (r ⋆ r)²(Δx) + b² ] ).  (11)

In the above equation, we eliminate the variance's cross terms by assuming that, on the support of (r ⋆ r)(Δx), (r ⋆ r)(Δx) ≫ b.
In this expression, the signal-dependent, space-varying noise, with variance (γ/N) (r ⋆ r)²(Δx), is due primarily to finite-sample approximation error. Figure 4(d) illustrates what happens when too few high-SNR measurements are used to estimate r ⋆ r; error type 1 dominates, and shot-noise-like, strongly signal-dependent noise shows up in the estimate.

In contrast, the offset b and the spatially invariant noise, with variance (γ/N) b², are due primarily to shot noise. Figure 4(b) illustrates the case when many low-SNR measurements with significant shot noise components are used to estimate r ⋆ r; error type 2 dominates, and an offset and uniform Gaussian noise appear in the estimate.

Finally, Fig. 4(c) demonstrates that, for a large enough photon budget, there is a Goldilocks zone wherein the shot noise in the measurements is reduced but there are still enough samples to avoid finite-sample approximation error. Photon budgets should be spent so as to operate in this regime. In this paper, we found our photon budget was best spent capturing just two high-resolution speckle images (each broken into 64 smaller image patches, resulting in N = 128).

C. Validating the Noise Model
To further validate the proposed noise model, we inspect a series of experimental autocorrelation estimates, which were formed with a varying number of speckle images and with a variety of exposure times. For each autocorrelation estimate, we consider the statistics of a 20 × 20 patch in the top-left corner of the estimate, which, by visual inspection, contains no signal component. Our model predicts that, across estimates, such regions should be distributed according to N(b, (γ/N) b²). Assuming these regions are ergodic, they should exhibit a similar distribution across the pixels of a single estimate. In Fig. 5, we plot the variance and mean of these patches (across pixels). We observe that as the mean increases, the variance grows quadratically, as predicted. Furthermore, we see that as we reduce the number of speckle images used to form the estimates (N), the ratio between the variance and the mean of the patches grows proportionally to 1/N, as predicted.

LEARNING PHASE RETRIEVAL
As mentioned before, existing PR algorithms are not up to the task of solving the noisy PR problem associated with low-light NLoS correlography. In this section, we describe how we applied a CNN to the problem.

A. Training Datasets
Deep learning is a powerful tool for solving computational imaging problems, but requires vast quantities of training data to succeed.
In the context of NLoS imaging, this training data is very hard to come by experimentally. Therefore, we leverage the noise model developed in the previous section to synthesize training data. We generated training data consisting of (r, r̂⋆r) pairs, where the noisy autocorrelation estimates r̂⋆r were synthesized according to Eq. (11) with b = 70 and γ/N = 0.015, and where the elements of r are scaled such that max(r ⋆ r) = 255. These parameters were chosen by fitting the noise model to the mean (b) and variance ((γ/N) b²) of a 20 × 20 patch from the top-left corner of the autocorrelation estimate formed by a 1/8 s exposure measurement.
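Synthesizing one such training pair can be sketched as follows. The constants b = 70 and γ/N = 0.015 follow the text; the zero-padding, centering, and the dummy sparse "albedo" are our own illustrative choices:

```python
import numpy as np

# Sketch: synthesize a (clean autocorrelation, noisy estimate) pair under the
# noise model of Eq. (11): Gaussian with mean r*r(x) + b and variance
# (gamma/N) * (r*r(x)^2 + b^2).
rng = np.random.default_rng(2)
b, gamma_over_N = 70.0, 0.015  # values quoted in the text

def synthesize_pair(r):
    # Linear autocorrelation of r via zero-padded FFTs, centered for display.
    R = np.fft.fft2(r, s=(2 * r.shape[0] - 1, 2 * r.shape[1] - 1))
    acorr = np.fft.fftshift(np.fft.ifft2(np.abs(R) ** 2).real)
    acorr *= 255.0 / acorr.max()  # scale so that max(r * r) = 255
    std = np.sqrt(gamma_over_N * (acorr ** 2 + b ** 2))
    noisy = acorr + b + rng.normal(size=acorr.shape) * std
    return acorr, noisy

# Dummy sparse 64x64 "albedo" standing in for an edge-map training image.
r = (rng.random((64, 64)) < 0.05).astype(float)
clean, noisy = synthesize_pair(r)
```

In the paper's pipeline the clean r (not the clean autocorrelation) is the regression target; the pair above simply illustrates how the network's noisy input is generated.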
The dataset used for r determines what priors the network learns about the problem, and how well it generalizes to different problems. In this paper, we train a CNN using a dataset of sparse, "unstructured" images (at the SNRs we are interested in, reconstructing dense, "natural image" scenes is infeasible). This dataset was formed by passing the Berkeley Segmentation Dataset 500 [72] through a Canny edge detector and then cropping to form a dataset of roughly 200,000 64 × 64 images with sparse edges. The images in this dataset are connected and sparse, but otherwise lack much structure. See Fig. 6.

B. Loss Function
One challenge in learning PR is that, with Fourier measurement operators, PR is invariant to translations. Thus, training a neural network using a loss that is not invariant, such as the ℓ1 or ℓ2 distance between r̂ and r, would force the network to not only solve the PR problem but also memorize the locations of all the training data. Instead, we consider translation-invariant losses such as ‖r̂ ⋆ r̂ − r ⋆ r‖₁, where r̂ denotes the network's estimate of r. We found that the loss ‖r̂ ⋆ r̂ − r ⋆ r‖₁ converged more quickly than the others, but that all four losses eventually produced similar solutions (the Pearson correlation coefficient may also have been effective [66]). See Figs. S1 and S2 in Supplement 1. We use ‖r̂ ⋆ r̂ − r ⋆ r‖₁ as the loss function throughout the rest of the paper.
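The autocorrelation-based loss takes only a few lines to write down. The numpy sketch below stands in for the paper's PyTorch implementation and uses circular autocorrelations for simplicity; it also demonstrates the translation invariance that motivates the loss:

```python
import numpy as np

# Translation-invariant loss || r_hat * r_hat - r * r ||_1, comparing
# autocorrelations rather than the images themselves.
def autocorr(x):
    # Circular autocorrelation via the Wiener-Khinchin relation.
    return np.fft.ifft2(np.abs(np.fft.fft2(x)) ** 2).real

def ti_l1_loss(r_hat, r):
    return np.abs(autocorr(r_hat) - autocorr(r)).sum()

rng = np.random.default_rng(3)
r = rng.random((64, 64))
shifted = np.roll(r, shift=(5, -9), axis=(0, 1))
other = rng.random((64, 64))

# A (circularly) translated estimate incurs essentially no penalty...
assert ti_l1_loss(shifted, r) < 1e-3
# ...while a genuinely different estimate does.
assert ti_l1_loss(other, r) > 1.0
```

Circular shifts leave the Fourier magnitudes, and hence the autocorrelation, exactly unchanged, which is why the first assertion holds up to floating-point error.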

C. Choosing the Mapping
Although in principle a CNN can learn almost arbitrary mappings between a measurement of r and the signal r, certain mappings are easier to learn than others. Neural networks with skip or residual connections excel at learning identity-like mappings [73]. As illustrated in Fig. 7, the mapping from r ⋆ r to r is much closer to an identity mapping (r ⋆ r and r share similar features) than the mapping from |F(r)| to r is. As such, we found that networks trained to recover r from an estimate of its autocorrelation did much better than those trained to reconstruct it from an estimate of its Fourier magnitudes.

D. CNN Architecture
We used the well-known U-net architecture as our CNN [74]. Our U-net consists of 12 convolutional layers, each with between 64 and 512 channels. The U-net is essentially a convolutional autoencoder with a large number of skip connections between its layers. While originally designed for segmentation, it has been applied successfully to a range of imaging inverse problems, such as the reconstruction of medical images [75] and low-light denoising [76].

E. Implementation and Training Details
Our network was implemented in PyTorch. We used a batch size of 32. We trained for 400 epochs with a learning rate starting at 0.002 and decaying to 0.000001, using the ADAM optimizer [77]. It took a little over a day to train the network using an Nvidia Titan RTX GPU. (Our code is available at [78].)

LOW-LIGHT NLoS CORRELOGRAPHY
In this section, we use simulations and experiments to validate the proposed approach and answer the following question: does the increased noise robustness afforded by the proposed learned PR enable real-time NLoS correlography?

A. Experimental Setup
Our NLoS correlography experimental setup is illustrated in Fig. 8. A steerable, 500 mW, 532 nm CW laser source (Azur Light Systems ALS-532) illuminates the virtual source. We operated the laser at 300 mW. A Canon telephoto macro lens with 180 mm focal length is used to image the virtual detector surface. With this lens, the image sensor (2056 × 2464 pixel Sony IMX 264 monochrome) has a magnification of about 0.5, a pixel size of 3.45 µm, and an active area of 8.47 mm × 7 mm. (We removed the camera's cover glass to reduce internal reflections.) We imaged the 1 cm hidden figures from Figs. 2(a) and 3(a), which had an average fill rate of about 0.2 (they occupy about 0.2 cm²). Our virtual source was 0.5 m from the hidden object, the hidden object was 1 m from the virtual detector, and the virtual detector was 0.8 m from the camera.

B. Radiometric Throughput
Assuming the walls and target have albedo 1 and are perfectly Lambertian (they are not), the radiometric throughput of our system works out to −182 dB. With a 300 mW, 532 nm laser source, this translates to 2.7 million third-bounce photons per second reaching the detector. These calculations can be found in Section D of Supplement 1.

C. Phase Retrieval Algorithms
Here we compare our CNN to the projection-based HIO PR algorithm [21] and the alternating minimization PR algorithm (Alt-Min) from [24]. Additional results with median truncated Wirtinger flow (MTWF) [25], truncated amplitude flow (TAF) [26], truncated Wirtinger flow (TWF) [27], and the alternating direction method of multipliers with a total variation prior (ADMM-TV) [28] can be found in Section E of Supplement 1. These methods perform no better than HIO.
HIO was run for 2000 iterations with a step size β = 0.9. The support was assumed to be 64 × 64, out of 128 × 128 pixels total. Alternating minimization was run for 1000 iterations.

D. Recovery Times
With 2 speckle images, estimating the autocorrelation took 1/10 s. From there, HIO took just under 3 s to reconstruct r. The CNN took a few hundredths of a second. Exposure times, not processing, are the bottleneck in a CNN-based system.

E. Low-Light Imaging Simulations
Using the throughput estimates from the previous section, we simulated NLoS correlography with exposure times between 1/4 and 1/256 s, using a 300 mW CW laser and hidden objects with a reflective area of roughly 0.2 cm². For each of the exposure times, we captured 2 images, each consisting of 64 patches. Our simulation results, presented in Fig. 9(a), demonstrate that, for a given laser power, because it is more robust to noise, the CNN-based method can operate with reduced exposure times, and thus at higher frame rates.

F. Low-Light Imaging Experiments
We applied the CNN-based PR method to experimental NLoS imaging data consisting of multiple low-exposure measurements of the objects from Figs. 2(a) and 3(a). Figure 9(b) demonstrates that the CNN is significantly more robust to noise and offers improved reconstructions across all operating regimes. The CNN offers recognizable reconstructions starting around exposure lengths of 1/16 s. Additional results can be found in Supplement 1.

CONCLUSION
NLoS correlography promises, in theory, to enable real-time NLoS imaging at sub-mm resolutions. However, the limitations of existing PR methods, particularly their sensitivity to noise, prohibit this. More broadly, because of the quartic attenuation of intensity with distance, handling low-flux regimes is arguably the fundamental barrier to real-time NLoS imaging. This paper makes a step towards lifting these limitations.
Specifically, we first analyzed the NLoS correlography noise model. This analysis makes it possible to simulate adequate training data for learning to solve NLoS correlography problems. Using the proposed dataset and new loss function, we then trained a CNN to solve the noisy PR problem. In simulation, we confirmed that the resulting CNN is computationally efficient and exceptionally robust to multiple forms of noise, far exceeding the capabilities of existing algorithms. We validated our approach on experimental NLoS imaging data and successfully reconstructed the shape of small hidden objects from a standoff distance of one meter using just two 1/8 s exposure-length images captured by a conventional CMOS detector, representing a significant step towards real-time, high-resolution NLoS imaging.
See Supplement 1 for supporting content.

† These authors contributed equally to this work.

Fig. 5 .
Fig. 5. Distribution of experimental autocorrelation estimates. Variance versus mean with varying exposure and fixed N = 128 (left) and variance over mean with fixed exposure and varying N (right). As predicted by our model, Eq. (11), the variance of our r ⋆ r estimate grows quadratically with respect to its mean, and the ratio between the variance and the mean grows linearly with respect to 1/N.

Fig. 6 .
Fig. 6. Unstructured training data. Examples of images formed with a Canny edge detector (top) and their associated noisy autocorrelations (bottom).

Fig. 7 .
Fig. 7. Noisy estimates of (a) r ⋆ r, (b) |F(r)|, and (c) the associated r. r ⋆ r and r share similar features, whereas |F(r)| and r, for the most part, do not.

Fig. 8 .
Fig. 8. Experimental setup. Light passes from the laser, to the virtual source, to the hidden object, to the virtual detector, and finally to the camera.

Fig. 9 .
Fig. 9. Simulated and experimental reconstructions with varying exposure lengths. Because it is more robust to noise, the CNN-based method can operate with far less light and thus at higher frame rates than a system relying on traditional PR algorithms like HIO [21] or Alt-Min [24]. (The vertical/horizontal lines that can be observed in the experimental short-exposure autocorrelation estimates are the result of correlated, fixed-pattern read noise between pixels. The 7 and F were measured at different orientations.)