Amplitude/Phase Retrieval for Terahertz Holography With Supervised and Unsupervised Physics-Informed Deep Learning

Most neural networks proposed for computational imaging (CI) in the terahertz (THz) bands require a large amount of experimental data to optimize their weights and biases. However, obtaining a sufficient number of ground-truth images for training is challenging in the THz domain due to the requirements of environmental and system stability, as well as the lengthy data acquisition process. To overcome this limitation, this article proposes novel supervised and unsupervised physics-informed deep learning (DL) methods for amplitude and phase recovery by incorporating angular spectrum diffraction theory as prior knowledge. First, we demonstrate that our unsupervised dual network can predict both amplitude and phase simultaneously, overcoming the limitations of previous studies that could only predict phase objects. This is demonstrated using synthetic 2-D image data as well as measured diffraction images. The advantage of unsupervised DL is its ability to be used directly without labeling by human experts. In addition, we address supervised DL, which is a concept of general applicability. We introduce training with a database set of 2-D images taken in the visible spectra range and numerically modified by us to emulate THz images. This approach allows us to avoid the prohibitively time-consuming collection of a large number of THz-frequency images. Furthermore, we employ a combination method that enhances the sharpness of image edges, improves contrast, and effectively aligns the approach with the ground truth. The results obtained using both approaches represent the initial steps toward fast holographic THz imaging with reference-beam-free, low-cost power detection.

Imaging at THz frequencies (0.1-10 THz, wavelengths of 3 mm-30 µm) has received considerable attention in recent years.Application areas for THz imaging explored at present include non-destructive testing [1], quality monitoring [2], security screening [3,4], biomedical imaging [5], and sensing for robotics and vehicle control [6].The applied imaging modali-ties increasingly include coherent holographic approaches [7], variously because they allow to capture of comparatively large scenes with good spatial resolution, offer the possibility of 3D scene reconstruction, may have a relaxed demand on the optical imaging optics, or lend themselves for advanced numerical image processing, e.g., exploiting sparsity effects.However, at lower THz frequencies, and there especially in the sub-1-THz frequency range, which is most relevant for many applications, one is forced to perform coherent detection as a serial rather than a parallel process [8][9][10].The main reason is that power availability at these THz frequencies is a critical issue [11].While detector arrays (with pixel numbers limited to hundreds or at most a few thousand [7]) in principle are available, the limited power makes it difficult to provide a reference beam for multipixel interferometric or heterodyne phase measurements.The need for serial data recording renderscoherent imaging timeconsuming and it is detrimental for image reconstruction, since phase distortions and noise introduced during signal recording affect the data quality strongly, especially in the case of weak signals [12,13].A powerful way to substantially reduce the power requirements, to drastically simplify the measurement system, and to accelerate the data acquisition process would open up, if the rebuilding of the phase information could be done computationally from the amplitude or intensity diffraction pattern obtained with the object-beam alone.Then it would become feasible to revert to array-based, reference-beam-free power detection for THz holography.
Conventional amplitude-phase recovery algorithms rely on iterative processing of diffraction patterns (DPs) recorded at different distances to improve the convergence and reliability of the reconstruction [14][15][16].The cost is the acquisition of large amounts of data and the manual searching for the convergence condition.Recently, rapidly evolving DL techniques offer a novel and efficient way to reconstruct images precisely [17][18][19], including supervised DL methods which require plenty of labeled experimental data [20][21][22], and unsupervised DL methods that are suitable for unlabeled data (such as PhysenNet [23]).However, the expensive acquisition of THz images hinders naive supervised training with experimental data.

arXiv:2212.06725v1 [physics.optics] 13 Dec 2022
This letter proposes two novel physics-informed DL methods -one supervised and the other unsupervised -for phase retrieval in THz holographic imaging, using only visible-light MNIST datasets [24] for training, and incorporating Fresnel diffraction as prior physical knowledge.Physics-informed DL provides a way to integrate mathematical physics models and data into the learning algorithms synergistically,ch that the calculations respect the physics laws symmetries of the modeled system [25,26] and has been proven powerful in tackling inverse problems in physics [27][28][29].The proposed supervised DL method requires longer training, which is however a one-off effort, leads to a high prediction speed after training, and makes the predictions fairly robust against noise in experimentally generated images (see later in Fig. 4(f)).The advantage of the unsupervised method is that it can be used directly both without labeling by human experts and training, and that it achieves simultaneous reconstruction of amplitude and phase, thus largely expanding the working range of previous unsupervised DL methods.Figure 1 displays the flow chart of the proposed supervised phase reconstruction, illustrated for the use with handwritten digits.Lacking a sufficiently large data set recorded with THz radiation, we start from the public MNIST data set which was generated from thousands of photographs taken in the visible spectral range.The images from MNIST are then preprocessed to transpose them into effective THz images, as shown in Fig. 1(a)-(c).The pre-processing includes (i) superimposing the Gaussian field-amplitude profile of a measured THz beam (recorded in the experimental setup discussed below) onto the images to obtain the modified amplitude, and (ii) determining the phase contour according to the object's material and thickness to obtain the modified phase.We then employ the physical model H (see below) to calculate the field-amplitude diffraction pattern of the effective THz image according to the Huygens-Fresnel principle [30].This pattern is fed as input into the convolutional neural network (CNN), as shown in Fig. 1(d).Its architecture is inspired by U-Net [31] and consists of a down-sampling path as well as a symmetric up-sampling path (see Supplementary for a detailed description).The output of the calculations are reconstructed phase images.In a subsequent step, the same network is applied for amplitude retrieval from the diffraction patterns.
The physical model, H, simulates the experimental THz imaging process.If a planar object is illuminated by a beam, the complex-valued field amplitude immediately behind the object can be written as where A 0 and φ 0 are the amplitude and the phase at the object plane.Over a distance d, diffraction reshapes the field to [30] where y is the wave propagation function, λ the wavelength, Ê0 the spatial Fourier transform of E 0 with f x = x/λd and f y = y/λd as the spatial frequencies in the x and y directions.The diffraction pattern is the field's absolute value where H(•) represents the mapping function relating the object to the diffraction pattern A. The challenge is now to achieve an inverse mapping, H −1 (•), such that The supervised method of Fig. 1 A corresponding inverse mapping reconstructs the amplitude information of the object, A 0 .A different strategy is applied with the proposed unsupervised DL method shown in Fig. 2. Without the need for pretraining on large labeled datasets, it works directly on the amplitude A(x, y, z = d), i.e., the propagated diffraction pattern of the object.Unlike PhysenNet, which was developed for visiblerange images and only phase objects [23], the model proposed here is suitable for all holographic systems and has no object restrictions.The field-amplitude diffraction pattern is the only input to the CNN, which is designed to generate estimated phase and amplitude maps simultaneously (see Supplementary for a detailed description).The two output paths share the CNN's front layers in order to ensure that the same object information is analyzed, the path splits in two only at the backend.The model H is then applied to convert the network outputs to an estimated diffraction pattern.The MSE between the input and the estimated diffraction pattern is fed back to the network to optimize the weights and bias values via gradient descent.Thus, the retrieval of the phase is formulated as This will force the calculated diffraction pattern to converge to the measured pattern I, as the iterative process proceeds.The pre-training and blind tests were performed on a PC with a sixteen-core 3.50-GHz CPU and 64 GB of RAM, using an Nvidia GeForce RTX 3080 GPU.On average, the pre-training of the supervised CNN took ∼4 h for 60,000 pairs of training data and 10,000 pairs of validation data during ∼100 training epochs.The inference time of the trained network for a hologram of 72 × 72 pixels was < 0.1 s.The training (optimization process) of the unsupervised NN (with phase and amplitude channel) took ∼1 min during ∼500 epochs.
We demonstrate the performance of the proposed supervised and unsupervised methods with data derived from simulation (THz-emulated MNIST) and experiment.Fig. 3 shows results obtained with the two methods for a randomly chosen MNIST number ("0").Note that the diffraction pattern as shown in Fig. 3(a) is the only input to both methods.In the case of the supervised approach, the networks are first pre-trained to represent the mapping function between the amplitude diffraction pattern and the objects' phases and amplitudes.The corresponding learning curves in Fig. 3(f1) and f(2) show convergence to low validation loss values after ∼50 epochs, indicating that the networks have not become overfitted by the training data set.We obtain a validation loss (MSE) of 0.00018 and 0.00048 after 100 epochs for amplitude and phase, respectively.The reconstructed amplitude and phase of the test object are shown in Fig. 3(d) and (e), with the ground truth for comparison in (b) and (c).The MSE between the ground truth and the reconstruction is 0.00016 and 0.00011 for phase and amplitude, respectively.
As stated above, the proposed unsupervised DL method works directly on diffraction data without prior training.The optimization process can be monitored by the loss curve (MSE between the input diffraction pattern and the one calculated from the CNN-estimated phase and amplitude data) which is displayed in Fig. 3(g).As the optimization goes on, the MSE drops to 10 −6 after 500 epochs, yielding the concurrently updated diffraction, amplitude and phase patterns shown in Fig. 3(h)-(j).It is found that the MSE between the ground-truth phase and the predicted phase is 0.06, while the MSE between the ground-truth amplitude and predicted amplitude is 0.00029.Compared to the supervised DL method, the phase prediction performance is slightly worse, while the amplitude prediction is comparably good.This trend is confirmed with other test objects.In more ambiguous cases, supervised DL is found to be clearly superior to unsupervised DL (see Supplementary).Table 1 summarizes the findings.To validate the proposed methods by experiments, we performed coherent heterodyne THz measurements at 300 GHz with two frequency-locked electrical multiplier chains (see also Supplementary).The objects consisted of thin metal screens with cut-outs in the form of digits.The collimated THz beam illuminated the object, the transmitted diffracted wave was detected coherently 70 mm behind the object by a raster-scanned singlepixel TeraFET detector [32].The detector and the second emitter, which provided the local-oscillator signal, were co-located on a 2D translation stage.From the measured complex-valued diffraction pattern (E d (z = d)), the object's amplitude (A 0 ) and phase (φ 0 ) were reconstructed by back-propagation via the inverse operation of Eq. 2. A 0 , φ 0 represent the ground truth for the DL evaluation of the image represented by A(z = d).

Table 1. Comparison of the two methods
The measured A(z = d) diffraction pattern and the ground truth together with predictions from the DL methods are shown in Fig. 4. We reconstructed the amplitude and phase in three different ways, by unsupervised and supervised DL (Fig. 4(d)-(g)) and by a combination of both in a sequential process where the prediction obtained by supervised DL served as input to the following unsupervised process, yielding the final reconstruction shown in Fig. 4(h), (i).The MSE values (referenced to the ground truth) for the unsupervised method are 0.006 (0.05) for the reconstructed amplitude (phase).The corresponding values for supervised DL are 0.0084 (0.0249).The combined method yields the best MSE values: 0.0035 (0.0230) for amplitude (phase).The image exhibits sharper edges and better contrast, and is closer to the ground truth.We attribute the reason for this considerable improvement to a suppression of the influence of noise, which is present in the measured images, by the supervised DL step.This denoising effect invites further studies in the future.In conclusion, we have proposed supervised and unsupervised deep-learning methods, and a combination thereof, for the reconstruction of 2D terahertz images, starting from amplitude diffraction patterns and deriving the amplitude and phase in the object plane.As it is next to impossible to generate by coherent measurements a THz image database of sufficient size for pre-training of the neural networks, a large set of publicly accessible visible-range images was converted numerically into emulated complex-valued THz images.The neural network trained with these images achieves fast and reliable image reconstruction with both emulated and experimentally measured THz amplitude images.Regarding the proposed unsupervised method, a dual output layer, integrating a physical model of the wave propagation, has been introduced to simultaneously predict the amplitude and phase information.The best reconstruction performance with measured image data is achieved by the sequential application of both methods: The first step with the supervised method eliminates much of the noise of the raw images and provides an already good image estimate for the subsequent unsupervised method.At least for sufficiently simple scenes, our findings confirm the potential of phase retrieval from amplitude images of THz diffraction patterns.

Fig. 1 .
Fig. 1.Illustration of the supervised phase reconstruction model.(a) Random original image from the MNIST data set, (b) modified amplitude and phase image, (c) amplitude diffraction pattern in the THz domain calculated by the physical model.(d) Scheme of the convolutional neural network (CNN) of the DL method.A measured diffraction pattern is the input of the CNN, and the output is a phase image.The mean squared error (MSE) between this phase reconstruction and the ground truth phase (i.e., the modified phase image in (b)) is taken as the loss function to optimize the CNN (see Supplementary for further details).
utilizes a parameterized network function R θ (θ denoting the network weights and bias parameters) to approach the desired inverse mapping H −1 (•) via learning based upon the labeled training set S T = (A k , φ k ), k = 1, 2, ..., K, thus solving the optimization problem of

Fig. 2 .
Fig. 2. Schematic illustration of the pipeline of unsupervised DL.The input to the NN is a measured field-amplitude diffraction pattern, the outputs are estimated phase and amplitude maps, which are then numerically propagated via the model H to simulate the diffraction and measurement processes, yielding an estimated diffraction pattern.The MSE between the true and estimated patterns provides the loss function used to optimize the NN parameters (see Supplementary).

Fig. 3 .
Fig. 3. Results obtained with the testing data set for supervised and unsupervised DL.(a) Diffraction pattern of a digit "0" from the testing data set.(b), (c) Ground truth of the object's amplitude and phase.(d)-(f) Supervised DL: Evolution of the MSE for amplitude (f1) and phase (f2) during pre-training (blue lines) with 60,000 images and testing with 10,000 ones (orange lines), and resulting predictions for the test "0" (d), (e).(g)-(j) Unsupervised DL: Optimization (training) curve (g) and evolution of the estimated diffraction pattern (h), amplitude (i), and phase (j) during optimization.

Fig. 4 .
Fig. 4. Comparison of experimental and reconstruction results.(a) Amplitude diffraction pattern measured at a distance of 70 mm.(b), (c) Ground-truth amplitude and phase.(d)-(g) Reconstructed amplitude and phase maps, obtained by unsupervised DL ((d), (e)), respectively supervised DL ((f), (g)).(h), (i) Amplitude and phase images reconstructed by the sequential combination of the two methods.