Fast Correlated-Photon Imaging Enhanced by Deep Learning

Correlated photon pairs, carrying strong quantum correlations, have been harnessed to bring quantum advantages to fields ranging from biological imaging to range finding. These inherent non-classical properties support extracting more valid signal to build photon-limited images even at low flux levels, where shot noise becomes dominant as the light source decreases to the single-photon level. Optimization by numerical reconstruction algorithms is possible but requires thousands of photon-sparse frames, and is therefore unavailable in real time. Here, we present experimental fast correlated-photon imaging enhanced by deep learning, demonstrating an intelligent computational strategy for discovering deeper structure in big data. We find that a convolutional neural network can efficiently solve the image inverse problems associated with strong shot noise and background noise (electronic noise, scattered light). Our results close the key gap between imaging speed and image quality by pushing low-light imaging into the regime of real-time, single-photon-level operation, opening an avenue to deep-learning-enhanced quantum imaging for real-life applications.

Correlated-photon imaging, relying on the inherent quantum correlations between entangled photon pairs, has emerged as a novel technique bringing quantum enhancement to a large number of research fields [1][2][3][4][5]. Direct imaging of non-classical correlations can reveal entanglement between position and momentum [6,7] or entanglement among optical angular momentum modes [8]. This new imaging technique, enabled by single-photon-sensitive cameras, can be used to test fundamental quantum physics [9][10][11][12][13][14][15][16][17] and to improve conventional imaging systems in spatial resolution and signal-to-noise ratio (SNR) [18][19][20]. Unfortunately, an imaging system's performance at low light flux is limited by shot noise due to the quantum nature of light. Intensified scientific complementary metal-oxide-semiconductor (I-sCMOS) cameras are able to capture single photons by virtue of image-intensifier technology [21][22][23][24][25]. To extract the single-photon signal from the noise, a reasonable threshold is set to binarize the data in each pixel; a signal above this threshold is registered as one photon [26,27].
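The thresholding step can be sketched as follows. This is a minimal illustration, not the authors' calibration procedure: the noise level, threshold value, and frame contents are hypothetical.

```python
import numpy as np

def binarize_frame(frame, threshold):
    """Register one photon wherever a pixel exceeds the threshold.

    `frame` is a raw intensity readout from an intensified camera; the
    threshold separates single-photon events from the electronic-noise floor.
    """
    return (frame > threshold).astype(np.uint8)

# Toy example: a 4x4 readout with an electronic-noise floor and one photon event.
rng = np.random.default_rng(0)
raw = rng.normal(loc=10.0, scale=2.0, size=(4, 4))  # readout noise (illustrative)
raw[1, 2] += 100.0                                  # a single-photon spike
photons = binarize_frame(raw, threshold=50.0)
print(photons.sum())  # -> 1 registered photon
```

In practice the threshold would be set from the measured pulse-height distribution of the intensifier rather than the fixed value used here.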
Reconstructing such photon-limited images can be cast as solving inverse problems [28,29]. Numerical algorithms [30][31][32] solve these inverse problems by treating Poisson statistics as prior knowledge and performing complex iterative operations, such as least squares, maximum likelihood and convex optimization. However, thousands of frames of raw image data must be collected to form a statistical result of adequate precision, which prevents the reconstruction from running in real time. Machine learning, also known as the "end-to-end" approach, can merge multiple stages into a single neural network [33][34][35][36][37][38]. This suggests that a direct relationship between the original object and the measured ultra-low-light images can be found by learning from large data sets.
Here, we report an experimental demonstration of deep learning for correlated-photon imaging, in which a convolutional auto-encoder (CAE) is trained to extract effective signals from various noise sources.
As a result, the deep-learning algorithm shows superior performance over numerical reconstruction algorithms in image restoration and superresolution at the single-photon level, in particular the ability to achieve high-contrast imaging in real time. Our results suggest emerging deep-learning-enhanced applications in quantum imaging and quantum information processing.
In general, the imaging measurements y are noisy compared with the original image x, owing to imperfections in the imaging system, such as the laser's Poisson statistics, the optical elements' limited size and the camera's low quantum detection efficiency. Moreover, a statistical model describing the forward imaging system becomes ill posed as the number of photons decreases. Regularization is therefore often introduced into a designed numerical algorithm R_reg to search for solutions that match prior knowledge about the objects [30],

R_reg(y) = argmin_x L(Φ(x), y) + h(x),    (1)

where Φ represents the Poisson forward model, L is the loss function (an appropriate measure of error), and h is a regularizer that controls model complexity and reduces overfitting. The choice of regularizer is often based on practical experience. Total variation (TV) regularization has been studied extensively and applied widely in image denoising: by introducing certain constraints it converts denoising into a well-posed problem, ensuring the existence and uniqueness of the optimized image, which makes the method comparatively robust to noise.
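A minimal numpy sketch of this regularized inversion (Eq.(1)) is given below. For simplicity a quadratic data term stands in for the Poisson log-likelihood, the TV term is smoothed so plain gradient descent applies, and the parameters `lam`, `step`, `iters` and `eps` are illustrative choices, not values from the experiment.

```python
import numpy as np

def tv(x, eps=1e-2):
    """Smoothed anisotropic total variation of a 2D image."""
    dxh = np.diff(x, axis=1)
    dxv = np.diff(x, axis=0)
    return np.sum(np.sqrt(dxh**2 + eps)) + np.sum(np.sqrt(dxv**2 + eps))

def tv_denoise(y, lam=0.1, step=0.05, iters=300, eps=1e-2):
    """Gradient descent on  0.5*||x - y||^2 + lam*TV(x)
    (Eq.(1) with a quadratic stand-in for the Poisson data term)."""
    x = y.copy()
    for _ in range(iters):
        dxh = np.diff(x, axis=1, append=x[:, -1:])   # forward differences,
        dxv = np.diff(x, axis=0, append=x[-1:, :])   # replicated boundary
        gh = dxh / np.sqrt(dxh**2 + eps)             # smoothed d|d|/dd
        gv = dxv / np.sqrt(dxv**2 + eps)
        gh_s = np.zeros_like(gh); gh_s[:, 1:] = gh[:, :-1]
        gv_s = np.zeros_like(gv); gv_s[1:, :] = gv[:-1, :]
        # gradient of the data term plus the adjoint of the difference operator
        grad = (x - y) + lam * ((gh_s - gh) + (gv_s - gv))
        x -= step * grad
    return x

# Piecewise-constant test image with additive noise (illustrative only).
rng = np.random.default_rng(3)
clean = np.zeros((16, 16)); clean[4:12, 4:12] = 1.0
noisy = clean + 0.3 * rng.normal(size=clean.shape)
restored = tv_denoise(noisy)
print(f"MSE {np.mean((noisy-clean)**2):.3f} -> {np.mean((restored-clean)**2):.3f}")
```

A production solver would use the Poisson negative log-likelihood and a proximal or primal-dual method instead of smoothed gradient descent, but the structure of Eq.(1) is the same.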
As shown in Fig.1(a), numerical algorithms usually consist of the following process: a preprocessing step rescales the measurement data to fit the inputs of the modeling step, where the reconstruction is converted into a convex optimization program from Eq.(1). After sequential quadratic approximations, we obtain an optimized image corresponding to the original object.
As an effective method for feature extraction and image denoising, deep learning has revolutionized our ability to use computers for challenging tasks. It provides an alternative, the learning algorithm, shown in Fig.1(b). Original objects and their corresponding measurements, {(x_n, y_n)}_{n=1}^{N}, are fed into a neural network as inputs, and the reconstruction algorithm R_learn is trained by optimizing

R_learn = argmin_θ Σ_{n=1}^{N} L(R_θ(y_n), x_n) + h(θ),    (2)

where L is the loss function, h is a regularizer, and θ denotes the trainable parameters of the neural network.
The learning algorithm usually consists of two parts: the encoder maps the features of the input images into the hidden-layer space, and the decoder restores these features to reconstructed images. The internal parameters are adjusted to minimize the loss function of Eq.(2) between the reconstructed and original images. Once the learning step is complete, the neural network serves as the optical inverse-imaging model and recovers new images from their measurements in a straightforward fashion.
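The structure of Eq.(2) can be illustrated with a deliberately tiny model: a single linear layer standing in for the CAE, trained by gradient descent on synthetic clean/noisy pairs with an MSE loss and an L2 regularizer h(θ). All sizes, rates and data here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic pairs: x_n are clean 8-pixel "images", y_n their noisy measurements.
N, D = 200, 8
X = rng.random((N, D))
Y = X + 0.1 * rng.normal(size=(N, D))

# R_learn is a single linear map W here (a stand-in for the CAE); we minimise
#   (1/N) * sum_n ||W y_n - x_n||^2  +  reg * ||W||^2   over the parameters W.
W = np.eye(D)
reg, lr = 1e-3, 0.1
for _ in range(500):
    pred = Y @ W.T
    grad = 2 * ((pred - X).T @ Y) / N + 2 * reg * W   # gradient of loss + regularizer
    W -= lr * grad

mse = np.mean((Y @ W.T - X) ** 2)
print(f"reconstruction MSE on the training pairs: {mse:.4f}")
```

The CAE replaces the linear map with stacked convolutional encoder/decoder layers, but the training objective has exactly this loss-plus-regularizer form.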
Our experimental arrangement is shown in Fig.2(a), and the layout of the CAE network is shown schematically in Fig.2(b). We build the model with three convolutional layers, three max-pooling layers, three deconvolutional layers and three unpooling layers. The images are split into a training set and a test set containing 10521 and 1169 images, respectively, from the Extended MNIST (EMNIST) handwritten-letter database.
Training letters are sequentially displayed on the amplitude-only SLM shown in Fig.2(a), and the corresponding noisy images are captured by the I-sCMOS camera. All input-output data are fed into the network to optimize the parameters of every layer; after training is complete, the performance of the CAE framework is evaluated on the test images (see Methods for details).
We choose the "photon" letter samples to compare the performance of the different reconstruction algorithms, as shown in Fig.3(a)-(d). Direct measurements in the camera plane are very noisy compared with the original objects under low-light conditions. TV regularization is suited to optimizing photon-limited images, so background noise must first be pre-filtered by the photon-counting approach described above; with only a single frame of data, however, this scheme has little effect on image reconstruction. In contrast, the CAE algorithm is very efficient at suppressing both shot noise and electronic noise. We also plot the intensity along one line of the "n" image, as shown in Fig.3(e): sharp edges are well reconstructed by the end-to-end method.
A reasonable interpretation is that the CAE algorithm does not fully learn the forward imaging model, but rather learns how to suppress noise and how to optimize the feature parameters of the training images.
To quantify this improvement, we define the image contrast as C = (⟨I_sig⟩ − ⟨I_bg⟩)/(⟨I_sig⟩ + ⟨I_bg⟩), where ⟨I_sig⟩ and ⟨I_bg⟩ are the mean intensities of the signal and background regions. For ultra-weak signals, TV regularization recovers only a very faint image of the object, with a contrast of 0.4, while the CAE algorithm recovers a better image with an enhanced contrast of 1. This result indicates that the CAE algorithm has an advantage in noise suppression and fast image reconstruction.
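The contrast metric can be computed as below. Note the signal/background definition used here is an assumption (a standard choice consistent with a maximum contrast of 1 when the background is fully suppressed); the mask and the toy pixel values are hypothetical.

```python
import numpy as np

def contrast(img, mask):
    """Contrast C = (<I_sig> - <I_bg>) / (<I_sig> + <I_bg>).

    `mask` marks the signal (object) pixels; all other pixels count as
    background. C = 1 when the background intensity is exactly zero.
    """
    sig = img[mask].mean()
    bg = img[~mask].mean()
    return (sig - bg) / (sig + bg)

# Toy 1D "images": a perfectly denoised reconstruction vs a noisy measurement.
mask = np.array([False, False, True, True, False, False])
clean = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])   # background fully suppressed
noisy = np.array([0.3, 0.3, 1.0, 0.9, 0.4, 0.2])

print(contrast(clean, mask))            # -> 1.0
print(round(contrast(noisy, mask), 2))  # -> 0.52
```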
Further, to verify the robustness of the learning algorithm, we prepare another data set of 6690 handwritten digits from the MNIST database, with 6021 images as a training set and 669 as a test set. In this case, fewer correlated photons illuminate the samples and are recorded by the camera, with ∼ 0.8 photons per pixel on average.
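What such a photon-sparse frame looks like can be simulated with Poisson statistics. The object shape and per-pixel rates below are hypothetical, chosen only so that the frame-averaged rate matches the ~0.8 photons per pixel quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy object: a bright square on a weaker scattered-light background.
obj = np.zeros((28, 28))
obj[10:18, 10:18] = 1.0
sig_rate, bg_rate = 1.5, 0.74        # illustrative rates; frame average ~0.8
rate = np.where(obj > 0, sig_rate, bg_rate)

frame = rng.poisson(rate)            # one photon-sparse camera frame
est = rng.poisson(rate, size=(1000, 28, 28)).mean(axis=0)  # many-frame average

print(f"{frame.mean():.2f} photons per pixel in a single frame")
```

A single frame is dominated by shot noise and barely shows the object, while averaging many frames recovers the underlying rates; this is exactly the accumulation cost that the learned reconstruction avoids.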
We display the digits "1905" in Fig.4(a)-(c). Fewer photons make the raw measurements indistinguishable; interestingly, the reconstructed images still show good contrast. The CAE algorithm thus protects the signals from noise and demonstrates strong robustness.
In addition, to optimize the CAE structure, we construct CAE networks with 5, 7, and 9 layers. Fig.4(d) shows the cost as a function of the number of epochs. After 1000 epochs, the mean-squared error (MSE) between the network output and the handwritten digits drops to 0.25 for 5 layers and becomes steady, whereas for 7 and 9 layers the cost difference becomes small only after 2000 epochs. Using the fewest convolutional layers that still achieve the best denoising is therefore preferable, since it significantly reduces the computational cost and saves computing resources.
We summarize the state-of-the-art single photon imaging experiments, as is shown in Fig.5.
Imaging systems differ across applications, leading to a trade-off between visibility and data-collection time. Compared with passive imaging schemes, active imaging enables higher contrast by using a high-precision nanosecond time gate to filter noise from the signals.
However, intensified CCD and CMOS architectures suffer from low frame rates; as a result, traditional reconstruction algorithms can enhance the image only by collecting thousands of sparse-photon frames, which wastes considerable time [15,18,39]. Another active imaging scheme is based on SPAD cameras, which can count and time-stamp single photons with picosecond resolution. Single-pixel scanning imaging [40], which achieves excellent visibility, acquires a megapixel scan in approximately 20 minutes, while SPAD-array imaging [41] achieves a great improvement but still requires hundreds of seconds. The balance between imaging quality and imaging speed is thus the result of simultaneous improvements in hardware and algorithms. Our deep-learning-based reconstruction algorithm, built on the I-sCMOS camera, realizes fast imaging at second-level speed while keeping high visibility.
In summary, we experimentally demonstrate a fast correlated-photon imaging scheme enhanced by deep learning.

Data availability.
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Experimental details: Each image from the EMNIST and MNIST databases is 28 × 28 pixels. We magnify these images 12 times and display them on the SLM, so the physical size of an image on the SLM is about 2.34 mm × 2.34 mm. The camera is an I-sCMOS camera with a 5.5-megapixel sensor and 10% quantum efficiency at 780 nm.

CAE model:
The convolutional auto-encoder model has three convolutional layers, three max-pooling layers, three deconvolutional layers and three unpooling layers, as shown in Fig.2(b).
The first convolutional layer takes a feature map of size (28 × 28 × 1) and outputs a feature map of size (28 × 28 × 64): it computes 64 filters of size 3 × 3 over its input. In the subsequent max-pooling operation, patches are extracted with a 3 × 3 window and stride 2 over the 28 × 28 input (padded so that the output is half the input size), halving the feature map to size (14 × 14 × 64). After three convolution-and-pooling operations, the data are compressed to a feature map of size (4 × 4 × 32). The decoder is an inverse process that mirrors the encoder.
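The spatial-size bookkeeping through the encoder can be checked with a few lines of Python. The first (64) and last (32) channel counts come from the text; the middle channel count is an assumption for illustration.

```python
import math

def halve(n):
    """Stride-2 pooling: with 'same' padding the output size is ceil(n/2)."""
    return math.ceil(n / 2)

size = 28
channels = [64, 64, 32]   # middle value 64 is an assumption; 64 and 32 are given
for layer, ch in enumerate(channels, start=1):
    # A 'same'-padded 3x3 convolution keeps the spatial size; pooling halves it.
    size = halve(size)
    print(f"after conv+pool {layer}: ({size} x {size} x {ch})")
# 28 -> 14 -> 7 -> 4, ending at (4 x 4 x 32) as quoted for the encoder output
```

Note that 28 → 14 → 7 → 4 only works out if the pooling is 'same'-padded (7 → 4 = ceil(7/2)); an unpadded 3 × 3 window with stride 2 would give 13, not 14, at the first step.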
After three upsampling and deconvolution operations, we obtain a feature map of size (28 × 28 × 1).

FIG. 1. (a) Schematic of the numerical algorithm process. This process optimizes a single-frame image toward a global optimal solution through multiple complex steps such as least squares, maximum likelihood and convex optimization. (b) Schematic of the learning algorithm process. This process builds a network structure that directly connects the input and output images; a large set of training images is then fed into the network to learn features of the imaging system by optimizing joint parameters. Once the training step is finished, optimized images can be rebuilt in real time.

FIG. 5. To achieve high-contrast imaging quality, numerical algorithms require sparse single-photon data per frame, so intensified CCD and CMOS cameras have to accumulate thousands of frames. New imaging devices such as SPAD cameras make fewer photons and high contrast possible, but numerical algorithms remain a barrier to fast imaging. Deep-learning algorithms effectively solve this problem, achieving a win-win for both imaging speed and quality.