Photonics-based 3D radar imaging with CNN-assisted fast and noise-resistant image construction

Abstract: Photonics-based high-resolution 3D radar imaging is demonstrated in which a convolutional neural network (CNN)-assisted back projection (BP) imaging method is applied to implement fast and noise-resistant image construction. The proposed system uses a 2D radar array with each element being a broadband radar transceiver realized by microwave photonic frequency multiplication and mixing. The CNN-assisted BP image construction is achieved by mapping low-resolution images to high-resolution images with a pre-trained 3D CNN, which greatly reduces the computational complexity and enhances the imaging speed compared with basic BP image construction. Besides, by using noise-free or low-noise ground truth images for training the CNN, the CNN-assisted BP imaging method can suppress noise, which helps to generate high-quality images. In the experiment, 3D radar imaging with a K-band photonics-based radar having a bandwidth of 8 GHz is performed, in which the imaging speed is enhanced by a factor of ∼55.3 using the CNN-assisted BP imaging method. By comparing the peak signal-to-noise ratios (PSNRs) of the generated images, the noise-resistant capability of the CNN-assisted BP method is verified.


Introduction
Radar imaging has wide applications in remote sensing, target recognition, geological survey, etc. [1,2]. Currently, the operation bandwidth of traditional radars is constrained by electronic devices and subsystems, which limits the range resolution to the decimeter level. Microwave photonic technology has great potential for breaking through the frequency and bandwidth limitations of traditional radars, and it has been applied in radar systems [3][4][5][6][7]. Previously reported photonics-based radars have achieved a range resolution at the centimeter level, which makes high-resolution 2D and 3D radar imaging possible [8][9][10][11][12][13][14]. Nevertheless, the improved range resolution makes it challenging to construct high-quality images with commonly used radar imaging algorithms, such as the Range-Doppler (RD) algorithm [15]. This is because the target migration may cover multiple range resolution units, making it difficult to implement accurate motion and phase compensation, so the obtained radar images may suffer from defocusing and distortions [16]. Using the time-domain back projection (BP) algorithm avoids the migration problem through accurate coherent accumulation [12]. However, the imaging speed is rather low because the BP algorithm is computationally intensive, and this problem is aggravated for broadband high-resolution imaging as the number of image pixels increases. To improve the imaging speed, a few fast BP algorithms have been proposed, but most of these methods lower the image quality since the fast imaging is realized by undersampling in the azimuth direction. Therefore, a method for implementing fast and high-precision imaging with photonics-based broadband radar is highly desired.
In recent years, deep learning has emerged as a promising technique for signal recovery and image processing. The convolutional neural network (CNN), an important deep learning model, has unique advantages in feature extraction and identification. It has been successfully applied to speckle elimination [17,18], target classification [19,20], and recognition [21] in the field of radar imaging. In [22], we proposed a CNN-assisted BP method that uses a pre-trained CNN to complete the mapping between low-resolution and high-resolution radar images. With this method, the imaging speed can be remarkably improved while keeping the high-precision imaging capability of the BP imaging method. Another advantage of this method is that it is noise resistant if noise-free images are used as the ground truth images when training the CNN. Therefore, this method is a promising solution for implementing fast and high-quality imaging with photonics-based broadband radars. In [22], the fast and noise-resistant imaging capability of the CNN-assisted BP method was verified through simulations of 2D radar imaging. In this paper, we extend the CNN-assisted BP method from 2D imaging to 3D imaging, and experimentally investigate its performance with a photonics-based 3D radar imaging system, which is particularly important for validating the effectiveness of this method in real applications. In addition, a potential problem with the previous CNN-assisted BP method is pointed out, i.e., the noise-resistant property may suppress weak-scattering targets and thus reduce the reliability of the obtained images. A solution to this problem is proposed and investigated.
Figure 1 shows the schematic diagram of the photonics-based 3D radar imaging system, in which a 2D uniform rectangular array consisting of W×H elements is adopted.
Each element of the 2D array is a broadband radar transceiver constructed based on photonic frequency multiplication and frequency mixing [4]. To save cost, the 2D uniform rectangular array can also be constructed by the synthetic aperture radar technique [23,24], which requires only one radar transceiver. To establish the radar transceiver, a continuous-wave intermediate frequency (IF) band linearly frequency modulated (LFM) signal generated by a voltage-controlled oscillator (VCO) is fed to an electrical 90° hybrid coupler. The two obtained signals are sent to the two RF ports of a dual-parallel Mach-Zehnder modulator (DPMZM) to modulate the continuous-wave light generated by a laser diode (LD). By properly setting the bias voltages, the DPMZM works in the frequency quadrupling mode [11], i.e., only the ±2nd-order modulation sidebands are generated. An optical coupler (OC) equally splits the optical signal into two branches. In the upper branch, the optical signal is sent to a photodetector (PD1) to complete optical-to-electrical conversion. The generated LFM signal has a bandwidth that is four times that of the original IF-LFM signal. It is launched into free space through a transmit antenna after being amplified by an electrical amplifier (EA1). The radar echoes collected by the receive antenna are properly amplified by another amplifier (EA2) and applied to a Mach-Zehnder modulator (MZM) to modulate the optical signal from the lower branch of the OC. The obtained optical signal is amplified by an erbium-doped fiber amplifier (EDFA) and sent to another PD (PD2) to implement photonic frequency mixing, which completes the de-chirp processing [11]. An electrical low-pass filter (ELPF) follows to select the desired de-chirped signal by removing the high-frequency interference. Then, the de-chirped signal is sampled by an analog-to-digital converter (ADC) before being sent to a digital signal processing (DSP) unit.

Principle
To illustrate the principle of the basic BP imaging method, we assume the 3D image to be constructed contains M, N, and L pixels along the x, y, and z directions, respectively. Here, the y-axis is the range direction along the radar line of sight (LOS), the x-axis is the azimuth direction, and the z-axis is the elevation direction, as shown in Fig. 1. The 1D range profiles corresponding to different elements of the 2D array are acquired by applying the fast Fourier transform (FFT) to the de-chirped digital signals [11]. By back projecting the range amplitudes to the M×N×L pixels of the imaging area through interpolation, W×H coarse images are obtained, denoted by $R_{wh}(t_{mnl})$, where $t_{mnl}$ is the round-trip time delay between the (w,h)-th element of the 2D array (w = 1, 2, ..., W; h = 1, 2, ..., H) and the image pixel at the coordinate $(x_m, y_n, z_l)$ [22]. Then, all these coarse images are coherently accumulated to get the final 3D image. The amplitude of the image pixel at $(x_m, y_n, z_l)$ can be expressed as
$$I(x_m, y_n, z_l) = \sum_{w=1}^{W}\sum_{h=1}^{H} R_{wh}(t_{mnl})\exp\left(j\frac{2\pi c\,t_{mnl}}{\lambda}\right), \quad (1)$$
where c is the speed of light and λ is the carrier wavelength. In this process, the amount of computation, which is proportional to W×H×M×N×L, is usually quite large and the imaging speed is rather low. Since the photonics-based broadband radar enables high-resolution imaging, the resultant increase in the number of image pixels further raises the computational complexity and lowers the imaging speed. To address this, we propose a CNN-assisted BP imaging method, of which the basic idea is to reduce the number of back projection pixels, and then construct a high-resolution image using a pre-trained CNN. Figure 2 shows the processing flow of the CNN-assisted BP imaging method. First, the imaging area is divided into (M/α) × (N/β) × (L/γ) blocks (α, β, γ, M/α, N/β, and L/γ are positive integers) to get a low-resolution 3D image containing (M/α) × (N/β) × (L/γ) pixels using the basic BP algorithm.
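The coherent accumulation described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `back_projection`, the argument layout, the linear interpolation, and the sign of the phase-compensation term exp(j2πc·t/λ) are assumptions. Running it on a coarser pixel grid gives the low-resolution image used by the CNN-assisted method.

```python
import numpy as np

C = 3e8  # speed of light in m/s (rounded value, assumed)

def back_projection(profiles, delay_axis, elem_pos, pixels, wavelength):
    """Coherently accumulate range profiles onto image pixels.

    profiles   : (E, K) complex range profiles, one row per array element
    delay_axis : (K,) round-trip delay of each range bin in seconds (increasing)
    elem_pos   : (E, 3) element coordinates in meters
    pixels     : (P, 3) pixel coordinates in meters
    wavelength : carrier wavelength in meters
    """
    image = np.zeros(len(pixels), dtype=complex)
    for prof, pos in zip(profiles, elem_pos):
        # Round-trip delay from this element to every pixel
        t = 2.0 * np.linalg.norm(pixels - pos, axis=1) / C
        # Interpolate the complex profile at those delays (real and imaginary parts)
        r = np.interp(t, delay_axis, prof.real) + 1j * np.interp(t, delay_axis, prof.imag)
        # Phase compensation followed by coherent accumulation over elements
        image += r * np.exp(1j * 2 * np.pi * C * t / wavelength)
    return np.abs(image)
```

The loop visits every element once and every pixel once per element, which makes the W×H×M×N×L scaling of the basic BP method explicit.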
Here, the low-resolution image may not have high accuracy, but it should cover all the imaging areas. Then, a pseudo-high-resolution 3D image with M×N×L pixels is obtained by mapping each pixel of the low-resolution image to a random pixel within the block covering the same area, as shown in Fig. 2. The other pixels within the same block are simply set to zero. After normalization processing, the pseudo-high-resolution 3D image is sent to a CNN, which outputs the desired high-resolution 3D image having M×N×L pixels. To avoid destroying the integrity of the input data, a 3D CNN structure is adopted, as shown in Fig. 2. The 3D CNN consists of one 3D input layer, multiple middle layers, and one output layer. The 3D input layer imports the pseudo-high-resolution 3D images into the network. Each middle layer contains a 3D convolution layer for feature extraction, a batch normalization (BN) layer for speeding up the training process and reducing the sensitivity to network initialization, and a rectified linear unit (ReLU) as the nonlinear activation function [17]. The output layer is composed of a 3D convolution layer and a regression layer, which outputs the data of the last middle layer in the same form as the ground truth image and calculates the loss, thus helping to stabilize and speed up the training of the network. All the 3D convolution layers in the 3D CNN use the cubic filter having the same size to extract features from local neighborhoods on 3D feature maps in the previous layer. The 3D CNN is trained using datasets generated by simulations. To make sure it is applicable in different scenarios, the pseudo-high-resolution 3D images are generated considering different conditions in which the number of targets, the target locations, and signal-to-noise ratio (SNR) of the radar echoes are randomly chosen within a specific range. The ground truth images used for training the CNN are the high-resolution 3D images obtained by basic BP method. 
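The construction of the pseudo-high-resolution image can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `pseudo_high_res` is hypothetical, the random position inside each block is drawn uniformly, and the normalization is assumed to be a division by the peak amplitude, which the text does not specify.

```python
import numpy as np

def pseudo_high_res(low_res, alpha, beta, gamma, rng=None):
    """Scatter each low-resolution pixel to a random position inside its block."""
    rng = np.random.default_rng() if rng is None else rng
    m, n, l = low_res.shape
    out = np.zeros((m * alpha, n * beta, l * gamma), dtype=low_res.dtype)
    # Random offset of the single non-zero pixel inside every block (uniform, assumed)
    dx = rng.integers(0, alpha, size=low_res.shape)
    dy = rng.integers(0, beta, size=low_res.shape)
    dz = rng.integers(0, gamma, size=low_res.shape)
    ix, iy, iz = np.indices(low_res.shape)
    # All other pixels in the block stay at zero
    out[ix * alpha + dx, iy * beta + dy, iz * gamma + dz] = low_res
    # Peak normalization (assumed form of the "normalization processing")
    peak = np.abs(out).max()
    return out / peak if peak > 0 else out
```

With a 40×40×40 input and α = β = γ = 4, the output has 160×160×160 pixels, of which exactly one per 4×4×4 block is non-zero.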
When training the 3D CNN, the mean square error (MSE) between the output image and the ground truth image is used as the loss function:
$$L(\Theta) = \frac{1}{MNL}\sum_{m=1}^{M}\sum_{n=1}^{N}\sum_{l=1}^{L}\left(Y^{\Theta}_{m,n,l} - X^{\mathrm{ground}}_{m,n,l}\right)^{2}, \quad (2)$$
where $Y^{\Theta}_{m,n,l}$ and $X^{\mathrm{ground}}_{m,n,l}$ denote the (m, n, l)-th pixel of the network's output image and the corresponding ground truth, respectively. The output image $Y^{\Theta}$ is given by
$$Y^{\Theta} = F\left(X^{\mathrm{input}}; \Theta\right), \quad (3)$$
where F is the 3D CNN's operator on the input image $X^{\mathrm{input}}$ and Θ is the 3D CNN's parameter space (e.g., kernels, weights, and biases). To suppress the gradient explosion caused by a large learning rate while keeping the training as fast as possible, adjustable gradient clipping is adopted in the training process. Once the 3D CNN is trained, it can be used to construct 3D images based on actual experimental data. In this process, the complexity of constructing a low-resolution 3D image is reduced by a factor of α×β×γ compared with directly generating the high-resolution 3D image. By controlling the complexity of the 3D CNN structure, running the CNN takes much less time than building the image with the BP algorithm. Therefore, the imaging speed of the CNN-assisted BP method is enhanced by a factor close to α×β×γ compared with the basic BP method. When the ground truth images used for training the 3D CNN are generated without loading noise to the radar echoes, the CNN-assisted BP imaging method can suppress noise after image construction. This noise-resistant property has been verified through simulations of 2D image construction in our previous work [22].
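As an illustration of the two training ingredients named above, the MSE loss and gradient clipping can be written as below. The function names are hypothetical, and clipping by rescaling the gradient to a fixed L2 norm is one common form of gradient clipping; the exact variant used in training is not detailed in the text.

```python
import numpy as np

def mse_loss(output, ground_truth):
    """Mean square error over all pixels of a 3D image."""
    return np.mean((output - ground_truth) ** 2)

def clip_by_l2_norm(grad, threshold):
    """Rescale the gradient so that its L2 norm does not exceed the threshold."""
    norm = np.linalg.norm(grad)
    if norm <= threshold:
        return grad
    return grad * (threshold / norm)
```

Clipping bounds the size of each parameter update, which is what allows a relatively large initial learning rate to be used without the loss diverging.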

Experiments
An experiment is carried out to investigate the performance of the photonics-based 3D radar imaging system. The radar transceiver is built based on the setup in Fig. 1. The CW light is generated by an LD (TeraXion Inc.) with a wavelength of 1550.12 nm. A VCO (INNO-9205) is used to generate an IF-LFM signal, of which the bandwidth is 2 GHz (4.5-6.5 GHz) and the pulse width is 100 µs with a repetition rate of 5 kHz. The DPMZM (Fujitsu FTM7962EP) has a 3-dB bandwidth of about 28 GHz. The output signal from the DPMZM is divided into two branches by an OC. In the upper branch, a PD (PD1, u2t XPDV2120RA, bandwidth: 40 GHz) is used to implement optical-to-electrical conversion, generating a frequency-quadrupled LFM signal covering a frequency range from 18 GHz to 26 GHz. The LFM signal is amplified by an EA (EA1, SHF 806E, 26 dB gain) and fed to the transmit antenna. The radar echoes collected by the receive antenna are properly amplified by another amplifier (EA2, SHF 806E) and applied to an MZM (EOSPACE Inc.) to modulate the optical signal in the lower branch of the OC. The output signal amplified by an EDFA (Amonics Ltd.) is sent to another PD (PD2, CONQUER Inc., bandwidth: 10 GHz) to complete photonic frequency de-chirping. An ELPF with a 3-dB bandwidth of 95 MHz follows to remove the high-frequency interference. The output signal is sampled by an ADC with a sampling rate of 100 MSa/s. The operation bandwidth of the established radar is 8 GHz (18-26 GHz), which enables a high range resolution of 1.875 cm. In our experiment, a synthetic aperture radar architecture is adopted, in which a single radar transceiver is moved along a two-dimensional sliding track to form a 2D uniform rectangular array. The equivalent 2D array is composed of 50×50 elements (W = H = 50), and it covers an area of 2 m × 2 m. The imaging area is 4 m away from the 2D array.
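The quoted radar parameters can be cross-checked with a short calculation. The sketch below assumes the usual relations for an LFM de-chirp radar, range resolution c/(2B) and beat frequency equal to chirp rate times round-trip delay, with c rounded to 3×10^8 m/s; the function names are illustrative.

```python
C = 3e8  # speed of light in m/s (rounded value, assumed)

def quadrupled_band(if_low_hz, if_high_hz):
    """Frequency range after photonic frequency quadrupling."""
    return 4 * if_low_hz, 4 * if_high_hz

def range_resolution(bandwidth_hz):
    """Theoretical range resolution c / (2B) of an LFM radar."""
    return C / (2 * bandwidth_hz)

def dechirp_beat_frequency(range_m, bandwidth_hz, pulse_width_s):
    """De-chirped beat frequency of a point target at the given range."""
    chirp_rate = bandwidth_hz / pulse_width_s  # Hz per second
    round_trip_delay = 2 * range_m / C         # seconds
    return chirp_rate * round_trip_delay

low, high = quadrupled_band(4.5e9, 6.5e9)          # 18 GHz to 26 GHz
print(range_resolution(high - low))                # 0.01875 m, i.e. 1.875 cm
print(dechirp_beat_frequency(4.0, 8e9, 100e-6))    # about 2.13 MHz at 4 m
```

Under these assumptions a target at the 4 m imaging distance produces a beat note of roughly 2.13 MHz, well inside the ELPF passband and the ADC Nyquist range.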
The dataset used for training the 3D CNN is numerically generated based on a mathematical model that has the same parameters as the experimental scenario. When generating the low-resolution images, the target is a series of point reflectors, of which the total number is randomly chosen from 1 to 25 and the positions are randomly assigned within the imaging area. Besides, additive white Gaussian noise (AWGN) is loaded to the radar echoes, imitating a radar receiver with a random SNR between -55 dB and -14 dB. The corresponding ground truth high-resolution 3D images are generated by the basic BP imaging algorithm without considering the noise. In this way, 500 pairs of low-resolution 3D images and high-resolution ground truths are generated as a dataset. The 3D CNN is set to have five middle layers, in which the five convolution layers have 2, 16, 16, 16, and 4 filters, respectively, and all the filters have a size of 5×5×5. These parameters are optimized to achieve a good tradeoff between a fast imaging speed and a high imaging quality. If a CNN having a more complex structure is applied, the imaging quality may be improved, but the imaging speed will be lowered because of the increased computational complexity. The 3D CNN is trained using Stochastic Gradient Descent with Momentum (SGDM) optimization, and the learning rate is reduced by a factor of 10 every 10 epochs with the initial value being 0.01. Besides, the L2 norm of the gradient is used to enable gradient clipping with a threshold of 0.01. All the numerically generated images are used to train the 3D CNN, and the loss curve during the training process is shown in Fig. 3. As can be seen, the loss converges to an acceptable level after 50 epochs of training. Here, training the CNN takes 21.5 hours using a commercial computer (CPU: i9-9900K 16-core, GPU: GTX 2080Ti, RAM: DDR4 64GB). With the trained 3D CNN, radar image construction with experimentally collected data can be implemented.
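The random scene parameters and the step-wise learning rate schedule described above can be sketched as follows. The function names are hypothetical; uniform sampling of the target positions and of the SNR in dB is an assumption, and the cube side of 0.28 m is taken from the imaging experiment reported in this section.

```python
import numpy as np

def sample_training_scene(rng, cube_side=0.28):
    """Draw one random training scene: point targets and receiver SNR."""
    n_targets = rng.integers(1, 26)  # 1 to 25 point reflectors
    # Target offsets from the cube center, uniformly distributed (assumed)
    positions = rng.uniform(-cube_side / 2, cube_side / 2, size=(n_targets, 3))
    snr_db = rng.uniform(-55.0, -14.0)  # SNR of the simulated radar echoes
    return positions, snr_db

def learning_rate(epoch, initial=0.01, drop=0.1, period=10):
    """Learning rate reduced by a factor of 10 every 10 epochs."""
    return initial * drop ** (epoch // period)

rng = np.random.default_rng(0)
positions, snr_db = sample_training_scene(rng)
print(len(positions), round(snr_db, 1))
print([learning_rate(e) for e in (0, 10, 20, 30, 40)])
```

Over the 50 training epochs this schedule takes the learning rate from 0.01 down to about 1e-6 in five steps.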
To verify the feasibility of the CNN-assisted BP method, three reflective balls (diameter: 3.76 cm) are used as the targets, of which the positions have different values in the range, azimuth, and elevation directions, as shown in Fig. 1. The desired 3D image includes 160×160×160 pixels (M = N = L = 160), covering a cubic space of 0.28 m × 0.28 m × 0.28 m. When using the basic BP imaging method based on Eq. (1), the 3D view of the obtained image is shown in Fig. 4(a), which is the scatterplot of the points with amplitudes larger than 0.1% of the peak amplitude. Figure 4(b) shows the 2D view obtained by maximum value projection onto one of the coordinate planes. When using the CNN-assisted BP method, the low-resolution image obtained by the basic BP method has 40×40×40 pixels, which means the computational complexity is reduced by a factor of 64 (α = β = γ = 4). Here, separating the image area into 40×40×40 pixels ensures that there are at least two samples within a radar resolution cell, obeying the spatial sampling theorem [25]. After generating a pseudo-high-resolution image and sending it to the CNN, a 3D image with 160×160×160 pixels is obtained with its 3D and 2D views shown in Fig. 4(c) and Fig. 4(d), respectively. The results in Fig. 4 indicate that high-resolution image construction can be well implemented with the proposed CNN-assisted BP method. In addition, the targets in the image generated by the CNN-assisted BP method are more focused than those generated by the basic BP method. This is attributed to the noise-resistant property of the CNN-assisted BP imaging method, which suppresses the noise interference circled by the red dotted line in Fig. 4(b). The imaging time required for obtaining Fig. 4(a) and Fig. 4(c) with the same computer as that used for training the CNN is measured to be 407.005 s and 7.360 s, respectively, with a reduction ratio of ∼55.3. Therefore, the CNN-assisted BP method achieves a greatly enhanced imaging speed compared with the basic BP method.
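The numbers in this paragraph can be reproduced with a short calculation. The split of the 7.360 s into a BP part and a CNN part at the end is only a rough estimate that assumes the BP time scales exactly with the number of pixels; it is not a measured value.

```python
# Reduction of back-projection pixels
alpha = beta = gamma = 4
complexity_reduction = alpha * beta * gamma         # 64

# Measured imaging times reported in the text
t_basic_bp = 407.005                                # seconds, Fig. 4(a)
t_cnn_bp = 7.360                                    # seconds, Fig. 4(c)
speedup = t_basic_bp / t_cnn_bp                     # about 55.3

# Spatial sampling of the low-resolution image
pixel_pitch = 0.28 / 40                             # 0.7 cm per low-resolution pixel
samples_per_cell = 0.01875 / pixel_pitch            # about 2.7 samples per resolution cell

# Rough estimate (assumption: BP time is proportional to pixel count)
t_low_res_bp = t_basic_bp / complexity_reduction    # about 6.4 s
t_overhead = t_cnn_bp - t_low_res_bp                # about 1.0 s left for the CNN stage

print(complexity_reduction, round(speedup, 1), round(samples_per_cell, 2))
```

The measured speed-up of ∼55.3 is below the complexity reduction of 64 because the pseudo-high-resolution mapping and the CNN inference add their own run time.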
In the previous demonstration, the scattering intensities of the three balls in Fig. 1 are nearly the same in all directions. To imitate real scenes in which the scattering points usually have different reflection coefficients, a complex U-shaped target composed of a few corner reflectors (size: 2 cm × 2 cm × 2 cm) is applied, as shown in Fig. 5(a). The reflectors are deliberately set to have different orientations, such that the radar echoes reflected from different reflectors have different amplitudes. The desired 3D image still has 160×160×160 pixels, and the low-resolution image used in the CNN-assisted BP imaging has 40×40×40 pixels. In this case, the improvement of imaging speed by using the CNN-assisted BP method is nearly the same as in the previous demonstration. Figure 5(b) and (c) show the imaging results obtained by the basic BP method and the CNN-assisted BP method, respectively. Here, some of the reflectors are missing in the obtained 3D images because of their non-ideal observation direction. Nevertheless, by comparing the two 3D images, it is found that the CNN-assisted BP algorithm can still perform fast imaging with high precision, and the strong scattering points are more focused because of the noise-resistant property. Since the radar echoes collected in the experiment have very low noise, different levels of AWGN are artificially added to the de-chirped digital signals to investigate the noise-resistant capability of the proposed method in more depth. We define the relative SNR, which is the power ratio between the de-chirped signal and the loaded AWGN, to indicate how much noise is loaded. When the relative SNR is 30 dB, the 3D imaging results obtained by the basic BP imaging are shown in Fig. 5(d), in which the background noise severely deteriorates the image quality. When using the CNN-assisted BP imaging method, the obtained image is clear with the background noise well suppressed, as shown in Fig. 5(e).
In obtaining the images of Figs. 5(c) and (e), the CNN used for image construction is trained with noise-free ground truth images, which gives the proposed algorithm a very strong noise-suppression capability. A potential problem is that, as the background noise is suppressed, the targets with weak scattering amplitudes are also suppressed. This problem becomes more serious when the radar echoes suffer from more noise. When the relative SNR is decreased to 25 dB, the basic BP image is shown in Fig. 6(a), in which the target cannot be clearly observed due to the serious noise. The image obtained by the CNN-assisted BP method is shown in Fig. 6(b). As can be seen, although the target can be observed in Fig. 6(b) and most of the background noise is suppressed, the amplitudes corresponding to the weak scattering points circled by the red dotted line in Fig. 6(b) are also suppressed, which reduces the reliability of the obtained images. To solve this problem, we propose to use low-noise ground truth images to train the CNN, instead of using noise-free ground truth images. Specifically, the ground truth images are obtained using numerically generated radar echoes that have a 30-dB higher SNR than the echoes used for constructing the low-resolution images. With this modification, when the relative SNR is 25 dB, the 3D image obtained by the modified CNN-assisted BP method is shown in Fig. 6(c). In this case, although the background noise is slightly stronger than that in Fig. 6(b), the weak scattering points can be well observed, which helps to improve the reliability of the obtained images. In the modified method, when training the CNN, the SNR difference between the radar echoes should be chosen considering both the noise level of the experimentally collected radar echoes and the desired radar imaging quality. In our experiment, the SNR difference of 30 dB achieves a good tradeoff between a strong noise suppression capability and a high fidelity of the weak scatterers.
If the radar detection environment is changed, a re-optimization of the SNR difference is preferred. To quantitatively evaluate the noise-resistant capability of the proposed method, the peak signal-to-noise ratios (PSNRs) of the 3D images generated by the basic BP method and the modified CNN-assisted BP method are compared when the radar suffers from different levels of noise. The PSNR is a commonly used evaluation index in radar denoising and despeckling research [26,27], which is defined by
$$\mathrm{PSNR} = 10\log_{10}\frac{\left[\max\left(X^{\mathrm{ref}}\right)\right]^{2}}{\frac{1}{MNL}\sum_{m=1}^{M}\sum_{n=1}^{N}\sum_{l=1}^{L}\left(X^{\mathrm{test}}_{m,n,l} - X^{\mathrm{ref}}_{m,n,l}\right)^{2}}, \quad (4)$$
in which $X^{\mathrm{test}}$ and $X^{\mathrm{ref}}$ are the image under test and the reference image, respectively. When calculating the PSNR, the 3D image obtained by the basic BP method without loading extra AWGN, i.e., the image in Fig. 5(b), is used as the reference. Since the target contrast in each image is usually of concern in radar applications, the image under test and the reference image are normalized individually according to their maximum amplitude. Figure 7 shows the measured PSNR when the relative SNR decreases from 36 dB to 4 dB with a step of 0.5 dB. In Fig. 7, the PSNR values of the images generated by the modified CNN-assisted BP method are always higher than those of the images obtained by the basic BP method, and the maximum difference between the two curves reaches 12 dB when the relative SNR is 23.5 dB. When the relative SNR is lower than 20 dB, due to the serious noise, the PSNR values of the two methods tend to be flat with slight fluctuations, while the modified CNN-assisted BP method keeps an advantage of about 5 dB. This result verifies the good noise-resistant capability of the proposed method, especially in strong noise scenarios.
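The noise loading and the PSNR evaluation used in this comparison can be sketched as follows. Both function names are hypothetical. The sketch assumes real-valued de-chirped samples, defines the relative SNR as the ratio of mean signal power to noise power, and uses the standard PSNR definition with each image normalized to its own maximum amplitude, so that the peak value is 1.

```python
import numpy as np

def add_awgn(signal, relative_snr_db, rng=None):
    """Add white Gaussian noise to a real signal at a given relative SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / 10 ** (relative_snr_db / 10)
    return signal + np.sqrt(noise_power) * rng.standard_normal(signal.shape)

def psnr(image_test, image_ref):
    """PSNR in dB between two images, each normalized to its maximum amplitude."""
    x_test = np.abs(image_test) / np.abs(image_test).max()
    x_ref = np.abs(image_ref) / np.abs(image_ref).max()
    mse = np.mean((x_test - x_ref) ** 2)
    if mse == 0:
        return np.inf  # identical images
    return 10 * np.log10(1.0 / mse)
```

Sweeping `relative_snr_db` from 36 dB down to 4 dB in 0.5 dB steps, rebuilding the image at each step, and calling `psnr` against the noise-free BP image would produce a curve of the kind shown in Fig. 7.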

Conclusion
We have proposed and demonstrated a CNN-assisted microwave photonic broadband 3D radar imaging method aiming to realize fast and noise-resistant 3D radar imaging. The proposed method uses a CNN to construct high-resolution images based on low-resolution images, which greatly enhances the imaging speed compared with basic BP imaging. Besides, by using noise-free or low-noise ground truth images to train the CNN, the proposed method can suppress noise, which helps to get high-quality images. The performance of the proposed method is investigated through experiments with a photonics-based broadband radar, and the results verify its feasibility and advantage.
Disclosures. The authors declare no conflicts of interest.
Data availability. Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.