Estimating the Magnitude and Phase of Automotive Radar Signals under Multiple Interference Sources with Fully Convolutional Networks

Radar sensors are gradually becoming widespread equipment for road vehicles, playing a crucial role in autonomous driving and road safety. The broad adoption of radar sensors increases the chance of interference among sensors from different vehicles, generating corrupted range profiles and range-Doppler maps. In order to extract the distance and velocity of multiple targets from range-Doppler maps, the interference affecting each range profile needs to be mitigated. In this paper, we propose a fully convolutional neural network for automotive radar interference mitigation. In order to train our network in a real-world scenario, we introduce a new data set of realistic automotive radar signals with multiple targets and multiple interferers. To our knowledge, we are the first to apply weight pruning in the automotive radar domain, obtaining superior results compared to the widely-used dropout. While most previous works successfully estimated the magnitude of automotive radar signals, we propose a deep learning model that can accurately estimate the phase. For instance, our novel approach reduces the phase estimation error with respect to the commonly-adopted zeroing technique by roughly half, from 12.55 degrees to 6.58 degrees. Considering the lack of databases for automotive radar interference mitigation, we release as open source our large-scale data set that closely replicates the real-world automotive scenario for multiple interference cases, allowing others to objectively compare their future work in this domain. Our data set is available for download at: http://github.com/ristea/arim-v2.


I. INTRODUCTION
AUTONOMOUS driving and road safety are very important topics in the effort to reduce the number of traffic accidents and the number of deaths on the road. One of the solutions proposed by automotive companies to build autonomous and safer vehicles is based on scanning the surrounding environment using radar sensors. The most common radar sensors used in the automotive industry are frequency modulated continuous wave (FMCW) / chirp sequence (CS) radars, which transmit sequences of linear chirp signals. The signals transmitted and received by such sensors provide the means to estimate the distance and the velocity of nearby targets (e.g., vehicles, pedestrians or other obstacles). Automotive radar sensors have even been used to detect very small objects (e.g., road debris [1]). However, the growing adoption of radar sensors [2] increases the probability of interference among sensors from different vehicles, generating corrupted and unusable signals. Indeed, radio frequency interference can raise the noise floor by a large margin, to the point where potential targets are completely hidden by noise, thus reducing the sensitivity of target detection methods [3]. In Figure 1, we present a range profile of a radar signal with and without interference, in which some of the targets visible in the clean range profile are masked by the raised noise floor caused by multiple interference sources. In order to be able to detect such targets, the radar interference has to be mitigated. To address this problem, researchers have proposed various techniques ranging from conventional approaches [4]-[11] to deep learning methods [12]-[16].
In this paper, we extend our prior work [12] by designing a novel fully convolutional network (FCN) [17] that (i) can recover the phase along with the magnitude of radar beat signals and (ii) can cope with multiple non-coherent radio frequency (RF) interference sources. Our network takes as input the real part, imaginary part and magnitude of the Short-Time Fourier Transform (STFT) of the beat signal with interference, providing as output the real part, imaginary part and magnitude of the range profile, respectively. Although our network does not directly estimate the phase, it can be trivially computed from the real and imaginary parts. To our knowledge, we are among the few to propose a deep learning model that can accurately estimate the phase, this being a well-known problem, which is often left as future work in related articles [18]. While most deep learning approaches studied radar interference mitigation with a single interference source [11], [12], [14], we aim to address the task under multiple interference sources. To achieve this goal, we generate a large-scale data set that closely replicates the real-world automotive scenario for multiple interference sources, considering up to three interference sources during training and up to six interference sources during inference. We compare our approach with three state-of-the-art methods, one based on zeroing [4], [9] and two based on deep neural networks [12], [13], reporting superior results for various evaluation metrics. In this paper, we also apply weight pruning [19], [20], improving the signal-to-noise ratio contained in the weights of our neural network models. We compare our weight pruning to the widely-adopted dropout [21], showing that the former approach helps the neural model to reach a better convergence point. Furthermore, we release our novel data set as open source, allowing other researchers and engineers to objectively compare their future work on radar interference mitigation.
Our data set is available for download at: http://github.com/ristea/arim-v2. Along with the data set, we also release the code to reproduce our results.
In summary, our contribution is threefold:
• We propose a deep learning model able to mitigate non-coherent RF interference from multiple sources.
• We design a fully convolutional network architecture that outputs clean range profiles, estimating both the magnitude and, indirectly, the phase.
• We introduce a radar interference data set with a wide and realistic range of signal parameter variations as well as multiple interference sources.
We organize the rest of this paper as follows. We present related work on radar interference mitigation in Section II. We describe our method based on fully convolutional networks in Section III. We present our data set composed of generated range profiles in Section IV and we provide a comprehensive set of experimental results in Section V. Finally, we draw our conclusions in Section VI.

II. RELATED WORK
A. CONVENTIONAL METHODS
State-of-the-art interference mitigation methods are usually classified according to the domain in which the interference is mitigated [4]-[9]: polarization, time, frequency, code and space. Polarization-based methods assume the use of cross-polarized antennas between the two interfering radars and the mitigation margin is around 20 dB, but ground reflections or other surrounding targets can severely reduce this margin. Time domain methods include the following approaches: using low transmit duty cycles (to reduce the probability of hitting other receivers), using short receive windows (to reduce the probability of being hit by an interferer), or employing a variable pause between transmitted chirps or a variable chirp slope (to avoid periodic interference). Frequency domain methods involve a division of the authorized operating bandwidth into several sub-bands, such that nearby systems operate in different sub-bands. Radio frequency interference (RFI) mitigation in the coding domain implies the modulation of radar waveforms with a device-specific code (to minimize cross-talk between radars, the codes of different devices should be orthogonal), whereas in the case of space domain techniques, the antenna radiation pattern is adaptively configured to avoid interfering signals.
A particular class of methods are the strategic RFI mitigation techniques [4], which require additional hardware and/or software, yet rely on some of the basic techniques. The classical strategic approaches are: "communicate and avoid" (requires inter-vehicle communication to avoid simultaneous transmission), "detect and avoid" (e.g., detects the interference in a sub-band and changes the operating sub-band of the radar), "detect and repair" (after detection, the measurement with interference is reconstructed), "detect and omit" (after detection, the measurements affected by interference are removed) and "listen before talk" (the radar transmits only when no other transmitting device is detected).
Currently, mitigation of a single FMCW interferer on an FMCW victim radar is quite well understood [22], ongoing research in the field being focused on other scenarios that involve multiple interference sources due to the increasing number of vehicles equipped with radars and the increase in the number of radar systems per vehicle.
Different from all these methods, which rely on algorithms handcrafted by researchers, we propose an approach based on end-to-end learning from data. To this end, we extend the data set and the method proposed in our previous work [12], training deep neural networks for RFI mitigation under multiple interference sources.

B. DEEP LEARNING METHODS
Deep learning techniques have been applied in a vast diversity of tasks with remarkable results, including object detection [23]- [25], speech separation [26] and medical image super-resolution [27]- [29]. One such task is image denoising, where deep learning achieved state-of-the-art results [30], outperforming classical filtering approaches (i.e., median or bilateral filtering). By transforming an arbitrary signal with STFT, we obtain an image-like representation that can eliminate the gap between the task of signal denoising and that of image denoising. Indeed, the interference becomes a noise pattern that is overlapped over the signal's STFT, opening up the possibility to employ novel ways for interference mitigation, previously applied only on images. In this context, we propose to apply fully convolutional networks, a deep learning technique, to transform a noisy STFT image into a clean range profile of an FMCW radar sensor.
To our knowledge, there are only a handful of related works [11]- [15], [18] that employ deep learning models for radar interference mitigation. Most of the existing deep RFI mitigation approaches consider only scenarios with one source of interference. Complex scenarios with multiple sources of interference, which are very likely to happen in daily driving, were only considered by Rock et al. [13]. In [13], the authors proposed a convolutional neural network (CNN) to address RFI, aiming to reduce the noise floor while preserving the signal components of detected targets. Their CNN architecture can be trained using either range processed data or range-Doppler (RD) spectra as inputs. The authors reported promising results, but they still had concerns regarding the generalization capacity on real data. In the experiments, we show that our approach outperforms the model of Rock et al. [13] by a significant margin. Additionally, we demonstrate that our method can generalize to real data.
Another approach that relies on CNNs is proposed in [18]. The authors employed an auto-encoder based on the U-Net architecture [31], performing interference mitigation as a denoising task directly on the range-Doppler map. They surpassed classical approaches, but their method fails to fully preserve the phase information. Similarly, in [14], the network architecture is built upon CNNs, but the authors added residual connections, inspired by the ResNet model [32]. A different interference mitigation method is proposed in [11], which is based on applying a recurrent neural network model with Gated Recurrent Units (GRU) [33] on the time domain signal. The authors reported better performance and lower processing times compared to previous signal processing methods. Similarly, Mun et al. [15] proposed an approach that is also based on GRU, but they added a novel attention block. This approach attains better results than classical methods and the authors empirically show that the attention block brings a performance boost. Nevertheless, the algorithm is not tested on real data or on a large test collection, so there are concerns regarding its generalization capacity.
Relation to preliminary VTC-Fall 2020 version [12]. We recently proposed two novel FCN models in our preliminary work [12], which are able to transform an STFT sample affected by interference into the corresponding clean range profile. The models have the capacity to generalize on real data, but they are not designed to estimate the phase of beat signals. In the current work, we extended our preliminary work presented in [12] by designing a method able (i) to recover the phase of beat signals and (ii) to cope with multiple interference sources. In addition, we employ a new training regime based on weight pruning [19], [20], which is aimed at improving the signal-to-noise ratio of the neural network weights. Moreover, the performance of our novel neural model is highlighted by the experimental results. Indeed, we achieved the best performance on the ARIM-v2 data set and we empirically proved the model's generalization capability. Additionally, we extended the data set proposed in our preliminary work [12] to multiple sources of interference, releasing the first freely available interference mitigation data set for multiple sources of interference.

III. METHOD
A. RADAR SIGNAL MODEL
In FMCW radar solutions, the transmitted signal $s_{TX}(t)$ is a chirp sequence, whose frequency usually follows a sawtooth pattern. In the presence of mutual interference, the receiving antenna collects a mix of two signals, the reflected signal and the interference signal. Consequently, the received signal is defined as follows:
$$s_{RX}(t) = \sum_{i=1}^{N_t} A_i \cdot s_{TX}(t - \tau_i) + \sum_{l=1}^{N_{int}} s_{RFI,l}(t), \tag{1}$$
where $A_i = |A_i| \cdot e^{j\varphi_i}$ is the complex amplitude of target $i$, $\tau_i$ is the propagation delay of target $i$, $N_t$ is the number of targets, $N_{int}$ is the number of interferers and $s_{RFI,l}(t)$ is the signal of interferer $l$. The received signal $s_{RX}(t)$ is further mixed with the transmitted signal and low-pass filtered, resulting in the beat signal $s_b(t)$. After mixing the signal reflected by a point-like target with the transmitted signal, we obtain a signal with constant frequency, whereas by mixing an uncorrelated interference with the transmitted chirp, we obtain a baseband chirp signal (as depicted in a qualitative manner in Figure 2).
The slope of the interference chirp (after mixing) is equal to the difference between the slope of the transmitted signal $k$ and the slope of the interference $k_{RFI,l}$, while its zero-frequency point $t_{RFI,l}$ corresponds to the intersection between the instantaneous frequency laws (IFL) of the transmitted and interference chirps. Based on the time-frequency diagram from Figure 2b, the IFL of the interference chirp in baseband can be written as:
$$f_{RFI,l}(t) = (k - k_{RFI,l}) \cdot (t - t_{RFI,l}), \quad |t - t_{RFI,l}| \le \frac{T_{AAF,l}}{2}, \tag{2}$$
where $T_{AAF,l} = \frac{2 F_M}{|k - k_{RFI,l}|}$ is the duration of the interference, which is limited by the anti-aliasing filter used before sampling. If the slope of the transmitted signal is close to the interference slope, $T_{AAF,l}$ can become much longer than the chirp duration $T$, in which case the actual time extent of the baseband interference is limited by $T$. A similar effect occurs if $t_{RFI,l}$ is near the ends of the repetition interval. Using the introduced notations, the resulting analytical beat signal in the presence of interference is expressed as:
$$s_b(t) = \sum_{i=1}^{N_t} A_i \cdot e^{j 2\pi f_{b,i} t} + \sum_{l=1}^{N_{int}} A_{RFI,l} \cdot p\left(\frac{t - t_{RFI,l}}{T_{AAF,l}}\right) \cdot e^{j\pi (k - k_{RFI,l})(t - t_{RFI,l})^2}, \tag{3}$$
where $f_{b,i}$ is the beat frequency of target $i$, $A_{RFI,l}$ is the complex amplitude of interference signal $l$ and $p(t)$ is the window function described below:
$$p(t) = \begin{cases} 1, & |t| \le \frac{1}{2} \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$
Hence, $s_b(t)$ consists of a sum of complex exponentials (representing the targets) and a sum of interfering signals (baseband chirps). Therefore, the uncorrelated interference appears as a highly non-stationary component on the beat signal's spectrogram, being spread across multiple frequency bins, as opposed to the signal received from targets, which is present only at some frequency values [22]. This explains the general aspect of the magnitude of the STFT presented in Figure 1.
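To make the structure of the beat signal more concrete, the following minimal sketch evaluates a sum of complex exponentials (targets) plus windowed baseband chirps (interferers). It is an illustration under stated assumptions, not the generator used for the data set: the function names, the rectangular shape of the window, the interpretation of F_M as the anti-aliasing cut-off frequency and all numeric values are hypothetical. The sketch also assumes non-coherent interference, i.e., the interferer slope differs from the transmitted slope.

```python
import cmath
import math


def rect_window(t):
    """Rectangular window: 1 inside [-1/2, 1/2], 0 elsewhere (assumed shape)."""
    return 1.0 if abs(t) <= 0.5 else 0.0


def beat_signal(t, targets, interferers, k, f_cutoff):
    """Evaluate a synthetic beat signal at time t (in seconds).

    targets:     list of (complex amplitude, beat frequency in Hz)
    interferers: list of (complex amplitude, slope k_rfi in Hz/s, t_rfi in s)
    k:           slope of the transmitted chirp in Hz/s
    f_cutoff:    anti-aliasing cut-off in Hz (assumed meaning of F_M)
    """
    value = 0j
    # Each target contributes a constant-frequency complex exponential.
    for amp, f_beat in targets:
        value += amp * cmath.exp(2j * math.pi * f_beat * t)
    # Each interferer contributes a baseband chirp limited in time by the filter.
    for amp, k_rfi, t_rfi in interferers:
        slope = k - k_rfi  # must be non-zero (non-coherent interference)
        t_aaf = 2.0 * f_cutoff / abs(slope)
        chirp = cmath.exp(1j * math.pi * slope * (t - t_rfi) ** 2)
        value += amp * rect_window((t - t_rfi) / t_aaf) * chirp
    return value


# Hypothetical example: one target and one interferer.
example_targets = [(1.0 + 0j, 1.0e6)]
example_interferers = [(5.0 + 0j, 2.0e12, 10e-6)]
sample = beat_signal(10e-6, example_targets, example_interferers,
                     k=1.0e12, f_cutoff=1.0e7)
```

With these values, the interference lasts 20 microseconds around its zero-frequency point and vanishes outside that interval, which is the behavior that makes the interference appear as a slanted line of limited extent in the spectrogram.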

B. DATA PREPROCESSING
As shown in Figure 1, we need to compute the discrete STFT in order to disentangle the targets from the interference sources. The following equation shows how to transform a time domain signal into an image using the discrete STFT:
$$\mathrm{STFT}\{x[n]\}(m, k) = \sum_{n=-\infty}^{\infty} x[n] \cdot w[n - mR] \cdot e^{-j \frac{2\pi}{N_x} k n}, \tag{5}$$
where $x[n]$ is the discrete input signal (the sampled version of $s_b(t)$), $w[n]$ is a window function, $N_x$ is the STFT length, $R$ is the hop/step size [34], and $m$ and $k$ index the time frames and the frequency bins, respectively. There is a multitude of window functions proposed in the literature, such as Hann, Blackman and others. We chose to perform the STFT with a Hamming window. Additionally, we scale the STFT (by dividing it by α = 40, which was obtained statistically on the entire training set) in order to have the input data approximately within the range of [−1, 1].
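The preprocessing step can be sketched as a naive discrete STFT with a Hamming window, followed by the division by α. This is only an illustrative sketch: the function names are hypothetical, the window length is assumed to be equal to the STFT length, and a practical implementation would rely on an FFT routine rather than the direct summation used here for clarity.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def stft(x, n_fft, hop, alpha=40.0):
    """Naive discrete STFT: one row per time frame, one column per bin.

    x:      list of complex (or real) samples
    n_fft:  STFT length N_x, also used as window length (assumption)
    hop:    hop size R
    alpha:  scaling constant (the text reports alpha = 40)
    """
    window = hamming(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        segment = [x[start + n] * window[n] for n in range(n_fft)]
        row = []
        for k in range(n_fft):
            acc = 0j
            for n, value in enumerate(segment):
                # The phase is referenced to the absolute index start + n.
                angle = -2.0 * math.pi * k * (start + n) / n_fft
                acc += value * cmath.exp(1j * angle)
            row.append(acc / alpha)
        frames.append(row)
    return frames


# Hypothetical example: a tone located exactly on frequency bin 3.
tone = [cmath.exp(2j * math.pi * 3 * n / 16) for n in range(64)]
spectrogram = stft(tone, n_fft=16, hop=8)
```

In this toy example, every time frame of the spectrogram peaks at bin 3, which corresponds to the horizontal line that a single target produces in the STFT magnitude.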
Since our data samples are now represented as images, we consider convolutional neural networks (CNNs) to model the mapping between input images and clean range profiles, noting that CNNs attain state-of-the-art results on natural images [35]- [37], medical images [28] and artificial images resulted after transforming time domain signals [38].
Our goal is to obtain clean range profiles from the STFT of the beat signal affected by noise and uncorrelated interference. We design a custom FCN architecture to provide the clean range profiles as output (during training, the FCN has to learn to reproduce the ground-truth clean range profiles). For this reason, we perform a Fast Fourier Transform (FFT) of our time domain labels (to obtain the ground-truth clean range profiles) and train our network to map the STFT input to the FFT output (computed in N x points, as the number of STFT frequency bins). The intuition behind choosing the input (STFT) and the output (FFT) domains of our network as presented above is that the spectrum of a beat signal affected by interference is covered in noise and the targets are almost undetectable, as could be seen in Figure 1. The advantage of using STFT is that there are portions in the representation where the targets are visible (the thin horizontal lines in the STFT in Figure 1), even if the signal is affected by multiple sources of interference. We empirically tested our intuition by training the same network architecture (redesigned for the one-dimensional FFT input) and discovered that this approach has convergence issues and unusable results.

C. NEURAL NETWORK MODEL
Our goal is to create a neural network that can mitigate RFI and is able to map a noisy STFT input to the clean FFT output for any given signal, in terms of both magnitude and phase. Therefore, we propose a novel FCN architecture that can meet the above requirement. There are related works which take an STFT as input and give an FFT as output, such as [10], but these are not based on deep learning techniques. To the best of our knowledge, there are no approaches based on deep learning models that transform an STFT input sample affected by interference into a clean FFT range profile.
The novelty of our neural architecture is mainly related to the input and output structures, each consisting of a representation composed of three different channels. The first and third channels of the input are the real and imaginary parts of the STFT, while the channel in the middle is the magnitude of the STFT. In terms of information theory, the second channel is redundant information, which could be mathematically determined from the real and imaginary parts. The motivation behind adding the magnitude of the STFT as input is given by our preliminary FCN models [12], which successfully used it to predict the magnitude of the FFT. Furthermore, the magnitude of an STFT has the most meaningful visual information and can be seen as an attention map [39], [40], which, in our case, is not computed by the network, but offered as an input channel. The output follows a similar design in terms of the number of channels, the only difference being its spatial dimension, as described next. Although our network does not directly compute the phase, it can be computed from the real and imaginary parts. We underline that we have tried various architectures to explicitly output the phase of the FFT, such as having as input channels only the magnitude and phase, but we never obtained convergence. However, it appears that modeling the phase indirectly is achievable. Regarding the magnitude, we can choose between taking the middle output channel directly predicted by the network or computing the magnitude from the real and imaginary parts, as a post-processing step. The results are very close, but slightly better for the former approach. In summary, we take the magnitude predicted as output and compute the phase from the real and imaginary parts. We also noticed that without the magnitude of the STFT as input channel, our model achieves significantly lower performance (see Table 4). Hence, the seemingly redundant magnitude channel is actually of utmost importance.
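The channel layout and the indirect phase computation described above can be sketched as follows. The helper names are hypothetical and the sketch operates on plain lists of complex numbers rather than on network tensors.

```python
import math


def to_channels(spectrum):
    """Split complex values into (real, magnitude, imaginary) channels,
    following the channel order described in the text."""
    real = [z.real for z in spectrum]
    magnitude = [abs(z) for z in spectrum]
    imag = [z.imag for z in spectrum]
    return real, magnitude, imag


def phase_from_output(real, imag):
    """Recover the phase (in radians) from real and imaginary parts."""
    return [math.atan2(im, re) for re, im in zip(real, imag)]


# Hypothetical example with three complex bins.
real, magnitude, imag = to_channels([1 + 1j, -2j, -3 + 0j])
phase = phase_from_output(real, imag)
```

Using the two-argument arctangent keeps the recovered phase in the correct quadrant, which a plain arctangent of the ratio between the imaginary and real parts would not guarantee.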
Our neural model is designed to process an input tensor of size 154 × 2048 × 3 and give an output tensor of size 1 × 2048 × 3. The network progressively reduces the dimension on the vertical axis (154), which corresponds to the number of time bins in which the STFT is computed, to the size 1, while keeping the dimensions on the other axes constant (the number of FFT points, N_x, and the number of channels, respectively).
Our architecture, illustrated in Figure 3, consists of 10 convolutional (conv) layers organized into 4 convolution blocks. Each of the first 2 blocks is composed of 3 conv layers, followed by a max-pooling layer. The third block is formed of 2 conv layers and a max-pooling layer, while the last block has the same number of convolutions as the third, but without any pooling layer. Additionally, each conv layer is followed by leaky Rectified Linear Units (ReLU) [41], except for the last 2 layers. The number of convolutional filters (kernels) is independently established for each block. The number of kernels starts from 32 in the first block, growing by 32 with each subsequent block, ending up at 128 in the last one. Exceptionally, the very last conv layer has only 3 kernels in order to fit the desired number of output channels. We also reduce the kernel size from 13 × 13 in the first block to 9 × 9 in the second block and 5 × 5 in the third block. Regarding the last conv block, we set the kernel size to 5 × 5 in the first conv layer and to 1 × 1 in the last conv layer, respectively. The conv filters are always applied at stride 1, a circular padding being added to preserve the horizontal dimension of the activation maps. The pooling filters are always of size 2 × 1, reducing the size of the activation maps by half on the vertical axis only. Zero padding for the max-pooling layers is added only when we need to make sure that the input activation maps have an even size.
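As a sanity check of the layer configuration, the following sketch tracks the vertical size of the activation maps through the four blocks. It assumes that the convolutions are not padded along the vertical axis, so that a kernel of height k applied at stride 1 shrinks the vertical size by k − 1. This is our reading of the description, since the padding is only stated to preserve the horizontal dimension. The function name is hypothetical.

```python
def vertical_size_trace(height=154):
    """Track the vertical (time-bin) size through the described layers.

    Assumption: convolutions are unpadded along the vertical axis, so a
    kernel of height k at stride 1 shrinks the height by k - 1.
    """
    # Each block: (kernel heights of its conv layers, followed by pooling?)
    blocks = [
        ([13, 13, 13], True),
        ([9, 9, 9], True),
        ([5, 5], True),
        ([5, 1], False),
    ]
    trace = [height]
    for kernels, has_pool in blocks:
        for k in kernels:
            height -= k - 1
            trace.append(height)
        if has_pool:
            if height % 2 == 1:
                height += 1  # zero padding to reach an even size
            height //= 2     # 2 x 1 max-pooling halves the vertical axis
            trace.append(height)
    return trace


sizes = vertical_size_trace()
# sizes == [154, 142, 130, 118, 59, 51, 43, 35, 18, 14, 10, 5, 1, 1]
```

Under this assumption, the vertical size reaches exactly 1 after the first conv layer of the last block, in agreement with the stated output size of 1 × 2048 × 3. Zero padding before pooling is needed only once, when the size is 35.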

D. LOSS FUNCTION
The procedure of learning a neural network model f is cast as an optimization problem, which is typically solved using a gradient-based algorithm that navigates the space of possible sets of weights W the model may use in order to attain a convergence point. Typically, a neural network model is trained using the stochastic gradient descent optimization algorithm, the weights being updated using back-propagation [42]. In the context of an optimization problem, the function used to evaluate a candidate solution (i.e., a set of weights) is referred to as the objective function, or the loss function.
In our case, we employ a custom loss function based on the mean squared error (MSE), in order to properly train the model and achieve optimal results. Our main goal is to recover the targets, which are typically at the upper extremity of the amplitude interval. To make sure that our model gives proper attention to such extreme values, we favor MSE instead of the mean absolute error (MAE). Furthermore, our loss function is designed to adjust the importance of the FFT magnitude in relation to its real and imaginary parts, because estimating the real and the imaginary parts of a complex number is more difficult to achieve compared to estimating its magnitude [18]. We therefore introduce the hyperparameter $\lambda \in \mathbb{R}^+$ to control this importance. Our loss function is formally defined below:
$$\mathcal{L}(y, \hat{y}) = \mathcal{L}_{abs}(y, \hat{y}) + \lambda \cdot \left(\mathcal{L}_{re}(y, \hat{y}) + \mathcal{L}_{im}(y, \hat{y})\right), \tag{6}$$
where $y$ is the true label, $\hat{y} = f(x, W)$ is the label predicted by the model $f$ for the input $x$ associated with label $y$, and each loss term $\mathcal{L}_{\{abs, re, im\}}$ is the MSE applied to the corresponding parts of $y$ and $\hat{y}$, respectively. As explained earlier, the factor $\lambda$ adjusts the importance of the magnitude with respect to the real and the imaginary parts. We stress that the label $y$ is actually the FFT of the clean range profile, being composed of the magnitude, the real part and the imaginary part, respectively.
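For illustration, the loss can be sketched in a few lines of plain Python. The dictionary keys and function names are hypothetical, and the value of λ in the example is arbitrary rather than the one used in our experiments.

```python
def mse(pred, target):
    """Mean squared error between two equally sized lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def interference_loss(pred, target, lam):
    """Magnitude MSE plus lambda times the real and imaginary MSE terms.

    pred, target: dicts with keys 'abs', 're' and 'im' (hypothetical layout),
                  each holding a list of floats
    lam:          positive weight of the real and imaginary terms
    """
    return (mse(pred["abs"], target["abs"])
            + lam * (mse(pred["re"], target["re"])
                     + mse(pred["im"], target["im"])))


# Hypothetical example with two frequency bins.
prediction = {"abs": [1.0, 2.0], "re": [0.0, 0.0], "im": [0.0, 0.0]}
label = {"abs": [1.0, 4.0], "re": [1.0, 1.0], "im": [0.0, 2.0]}
loss = interference_loss(prediction, label, lam=0.5)  # 2.0 + 0.5 * (1.0 + 2.0)
```

Setting λ below 1 reduces the contribution of the real and imaginary terms relative to the magnitude term, while λ above 1 increases it.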

E. WEIGHT PRUNING
Convolutional neural networks have shown major performance improvements in a broad range of domains [28], [36], [37], once the training on powerful graphical processing units was made possible by the technological advancements in parallel processing, gaining orders of magnitude in terms of training time [36]. This also allowed researchers to explore deeper and deeper models [43]- [45], requiring appropriate changes to avoid vanishing and exploding gradients after a certain point, for example by introducing residual blocks [45]. However, a downside of such large models is that they are also likely to capture noise from training data, easily falling into the pitfall of overfitting. The noise learned by a CNN through its weights is not representative for the generic data distribution, inherently leading to high variance and poor performance. Nonetheless, simply reducing the model's capacity would not be a proper solution, because it will lead to the other extreme, underfitting. This problem may occur when the high-order relationships between input and output cannot be captured by a model with reduced capacity. Since it is already proven that CNNs attain better results as the models grow deeper [36], [43]- [45], the main focus in this area of research is to find ways to avoid overfitting for models with higher capacity. One such example is dropout [21]. Moreover, a well-known fact is that noise reduction is a hot topic in the field of signal processing, therefore, a lot of approaches have been proposed by researchers [46]. Both classic signal processing algorithms [47]- [49] and machine learning methods [50]- [52] have been developed in order to mitigate the noise from signals. The noise problem is even more relevant when we refer to denoising solutions based on deep learning, because models have a large capacity and tend to also replicate the noise from label signals, preventing the network from achieving a global optimum.
To solve this problem, we apply a weight pruning [19], [20] method that starts with a conventional training phase, followed by a noise-constrained training phase with the aim of pruning the inner network noise, thus improving its signal-to-noise ratio. The network architecture remains unchanged, requiring no further modifications at testing time. The steps required by weight pruning are formally described in Algorithm 1. The training starts with the random initialization of the weights W from a normal distribution, as usual. In the first stage, we apply the standard training procedure based on gradient descent, until we reach an optimal convergence point. During this stage, the weights W are updated in the negative direction of the gradient ∇f, the update step being controlled through the learning rate η. After the first training stage, we observed that our neural networks contain many weights that are close to zero. When put together, these very small weights can affect the model, acting as some kind of noise learned from the training data. In the next phase, we compute a binary mask M with the aim of clipping the less important weights to zero. In step 4, the weights are first sorted by their magnitude in ascending order. The index of the largest "noisy" weight to be used as threshold is computed in step 5, based on the ratio of noise reduction r given as input. The actual value of the threshold weight is then stored as the pruning threshold. In steps 7 to 11, we build the mask M, assigning 0 to every weight below this threshold and 1 to every other weight. After obtaining the mask M, we can further proceed by training the model using gradient descent. After each weight update, the training algorithm introduces step 14, which removes the weights close to zero. We note that the mask M can also be recomputed at every iteration, but we did not observe any significant improvement in terms of convergence during our preliminary experiments.
To save computational time during training, we decided to compute the mask M only at the beginning. Last, we note that, although Algorithm 1 is based on the standard stochastic gradient descent, the weight update steps are independent of the training regime. Hence, weight pruning [19], [20] can be applied on top of any modern optimization algorithm for neural networks, e.g. Adam [53].
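The second training stage can be sketched on a flat list of weights as follows. This is a minimal illustration and not a transcription of Algorithm 1: the function names are hypothetical, the weights are kept in a plain list instead of network tensors, and we choose to prune the threshold weight itself, which is one possible reading of the description.

```python
def pruning_mask(weights, ratio):
    """Binary mask that zeroes the smallest-magnitude weights.

    ratio: fraction of weights treated as noise (r in the text). If several
    weights share the threshold magnitude, all of them are pruned.
    """
    n_pruned = int(ratio * len(weights))
    if n_pruned == 0:
        return [1] * len(weights)
    magnitudes = sorted(abs(w) for w in weights)
    threshold = magnitudes[n_pruned - 1]  # largest "noisy" magnitude
    # Design choice of this sketch: the threshold weight itself is pruned.
    return [0 if abs(w) <= threshold else 1 for w in weights]


def masked_sgd_step(weights, grads, mask, lr):
    """One gradient descent update followed by re-applying the mask."""
    return [(w - lr * g) * m for w, g, m in zip(weights, grads, mask)]


# Hypothetical example: prune half of four weights, then take one step.
w = [0.5, -0.01, 0.02, -0.9]
mask = pruning_mask(w, ratio=0.5)  # [1, 0, 0, 1]
w = masked_sgd_step(w, grads=[0.1, 0.1, 0.1, 0.1], mask=mask, lr=1.0)
```

Because the mask is computed once and re-applied after every update, the pruned weights remain at zero for the rest of the training, regardless of the optimizer that produces the update.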

1) Relation to dropout
Since weight pruning is designed as a method to prevent overfitting, it can be seen as a competitor to dropout [21], one which may lead to superior results [54]. Dropout is a regularization technique that drops out a certain percentage of neural units randomly, at each iteration. Weight pruning works in a similar way, but instead of dropping units randomly, it chooses the units that have the weights closer to zero. We expect such units to contain noise rather than useful information. We therefore believe that weight pruning is able to preserve (or even improve) the signal-to-noise ratio inside the neural network. Another difference from dropout is that our training regime based on weight pruning is divided into two stages. In the first stage, we allow the network to converge to an optimal point, using only early stopping to prevent overfitting. Weight pruning is applied only in the second stage, enabling convergence to a more generic solution. We compare dropout and weight pruning experimentally, showing that the latter training regime provides superior results.

IV. DATA SET
One of the key factors in the training process of deep models is the data set. It was empirically shown many times before, e.g. [36], [45], that a large database, e.g. ImageNet [55], is essential to enable deep models to attain state-of-the-art results. Therefore, we extend the automotive radar interference mitigation (ARIM) data set proposed in [12] in order to cover complex real-world automotive scenarios that include multiple sources of interference. To the best of our knowledge, there are no other public databases for the interference mitigation task with multiple sources of interference. Our data set is created in the fast FMCW hypothesis, where the beat frequency is usually much larger than the Doppler shift. Each range profile can include both static and dynamic targets, which will practically appear as straight lines in the beat signal's spectrogram.
In this paper, we introduce a novel and complex large-scale database, called ARIM-v2, consisting of 144,000 synthetically generated samples, replicating realistic automotive scenarios. We generated each sample using randomly selected values from the set of realistic parameters enumerated in Table 1. The number of interference sources and the signal-to-noise ratio (SNR) values are selected using a fixed step between the minimum and the maximum values. The other parameters from Table 1 are interpreted as random variables that follow a uniform distribution between the minimum and the maximum values. The amplitude of each target is proportional to the power expected from that particular target. Moreover, we added a random phase to each target to obtain more realistic radar signals.
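The two sampling schemes can be sketched as follows. All parameter names and ranges in this sketch are placeholders chosen for illustration, not the values from Table 1.

```python
import math
import random


def fixed_step_values(minimum, maximum, step):
    """All values from minimum to maximum (inclusive) with a fixed step."""
    count = int(round((maximum - minimum) / step)) + 1
    return [minimum + i * step for i in range(count)]


def sample_parameters(rng):
    """Draw one hypothetical parameter set for a synthetic sample.

    The ranges below are placeholders, not the values from Table 1.
    """
    return {
        # Parameters selected on a fixed-step grid.
        "num_interferers": rng.choice(fixed_step_values(1, 3, 1)),
        "snr_db": rng.choice(fixed_step_values(5, 40, 5)),
        # Parameters drawn from a uniform distribution.
        "sir_db": rng.uniform(-5.0, 40.0),
        "target_phase_rad": rng.uniform(-math.pi, math.pi),
    }


rng = random.Random(0)  # fixed seed, for reproducibility of the example
params = sample_parameters(rng)
```

Selecting the number of interferers and the SNR on a grid makes it straightforward to report results per interference count or per SNR level, while the uniform sampling covers the remaining parameter ranges without bias toward particular values.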
Real data acquisition involves capturing signals with radar sensors that have specific parameters. Even when we consider a particular application, the deployed radar sensor could have distinct values of parameters, which may lead to differences between captured data. For this reason, in our data generation procedure, we considered a set of parameters (i.e., bandwidth, sweep time, sampling frequency and central frequency) that can be adjusted for a specific radar sensor. In this way, our database could be adapted and used in various circumstances. Without loss of generality, we developed the database by setting the acquisition parameters to typical values used in automotive radar sensors. The exact values used for these parameters are listed in Table 2. We underline that, for a different sensor, the database can be regenerated with the parameters of the specific radar system.
One of the greatest advantages regarding synthetically generated data is that we can control the entire process with the purpose of obtaining more complete and relevant information, which may help to develop a better solution. In our case, we have access to the signal with interference as well as access to the clean signal. Hence, we can properly evaluate any interference mitigation algorithm. For example, the clean signals can be used as ground-truth labels when training a machine learning model. Moreover, access to the clean signals provides the means to conduct an objective performance assessment, by comparing the output predicted by the model with the corresponding ground-truth (expected) output. In ARIM-v2, a data sample is composed of:
• a time domain signal without interference;
• a time domain signal with interference;
• a label vector with complex amplitude values in target locations;
• a label vector with the following information: number of sources of interference, SNR, SIR and interference slopes.
We randomly split our data samples into a training set of 120,000 samples and a test set of 24,000 samples. The generated data set will allow future works on RFI mitigation to objectively compare newly developed methods with the state of the art, given that our data set is freely available for download at: http://github.com/ristea/arim-v2.

V. EXPERIMENTS
Since both ARIM and ARIM-v2 databases consist of multiple radar signals (with and without interference) referring to different range profiles, in our experiments, the interference mitigation is performed individually, on each range profile. We consider as label the amplitude and phase of targets from the range profiles obtained by applying FFT on signals without interference.
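The label construction described above can be sketched as follows: the clean time-domain signal is transformed with an FFT, and the amplitude and phase are read at the target locations. The function name `target_labels`, the toy signal and the assumption that the target bins are known in advance are illustrative.

```python
import numpy as np


def target_labels(clean_signal, target_bins):
    """Amplitude (dB) and phase (degrees) of the clean range profile
    at the given target bins."""
    range_profile = np.fft.fft(clean_signal)
    values = range_profile[np.asarray(target_bins)]
    amplitude_db = 20.0 * np.log10(np.abs(values))
    phase_deg = np.degrees(np.angle(values))
    return amplitude_db, phase_deg


n = np.arange(256)
# One hypothetical target exactly on bin 32, with amplitude 2 and a 45-degree phase.
clean = 2.0 * np.exp(1j * (2.0 * np.pi * 32 * n / 256 + np.pi / 4))
amp_db, phase_deg = target_labels(clean, [32])
```

For this toy signal, the FFT value at bin 32 is 512 · exp(jπ/4), so the recovered phase is 45 degrees and the amplitude is 20 · log10(512) dB.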

A. PERFORMANCE MEASURES
Usually, the goal in radar signal processing is to maximize the detection performance. Therefore, a rather intuitive measure is the area under the Receiver Operating Characteristics (ROC) curve, known as AUC for short, which describes the ability to disentangle targets from noise at various thresholds. When computing the AUC, the target detection threshold slides iteratively from the lowest value to the largest value in the range profile, modifying the probability of false alarms. Another performance indicator is the mean absolute error (MAE) in decibels (dB) between the range profile amplitude of targets computed from label signals and the amplitude of targets from predicted signals. However, in radar signal processing not only the target's amplitude is important, but also its phase, because the latter is necessary to estimate other essential parameters (e.g., target velocity) or to perform beamforming. Thus, we also report the MAE in degrees between the range profile phase of targets computed from label signals and the phase of targets from predicted signals.
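A minimal sketch of the threshold sweep behind the AUC is given below, assuming that each bin of the range profile is marked as target or noise. The function name `detection_auc` and the toy profile are hypothetical. The thresholds are visited in decreasing order here, which traces the same ROC curve as sliding from the lowest to the largest value.

```python
def detection_auc(magnitudes, is_target):
    """Area under the ROC curve obtained by sweeping a detection threshold.

    magnitudes: range-profile magnitudes, one per bin.
    is_target: booleans marking the bins that contain a target.
    """
    num_pos = sum(1 for t in is_target if t)
    num_neg = len(is_target) - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("need at least one target bin and one noise bin")
    # ROC points (false alarm rate, detection rate); the first point
    # corresponds to a threshold above the maximum magnitude.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(magnitudes), reverse=True):
        tp = sum(1 for m, t in zip(magnitudes, is_target) if t and m >= threshold)
        fp = sum(1 for m, t in zip(magnitudes, is_target) if not t and m >= threshold)
        points.append((fp / num_neg, tp / num_pos))
    # Trapezoidal integration over the false alarm rate.
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy range profile in which both targets stand clearly above the noise.
profile = [0.2, 0.1, 5.0, 0.3, 4.0, 0.2]
targets = [False, False, True, False, True, False]
auc = detection_auc(profile, targets)
```

Since both targets exceed every noise bin in this toy profile, the resulting AUC is 1. When noise bins rise above the targets, as happens under strong interference, the AUC decreases.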
In summary, we employ the AUC, amplitude MAE, phase MAE and mean SNR improvement (∆SNR), which is computed for the target with the highest amplitude as the difference between the SNR before and after interference mitigation in the range profile.
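The two MAE measures can be sketched as below, taking the complex range-profile values at the target locations as input. The function names are illustrative, and wrapping the phase difference to at most 180 degrees is an assumption of this sketch rather than a detail stated above.

```python
import cmath
import math


def amplitude_mae_db(label_values, predicted_values):
    """Mean absolute error, in dB, between target magnitudes."""
    errors = [
        abs(20.0 * math.log10(abs(ref)) - 20.0 * math.log10(abs(est)))
        for ref, est in zip(label_values, predicted_values)
    ]
    return sum(errors) / len(errors)


def phase_mae_deg(label_values, predicted_values):
    """Mean absolute phase error, in degrees, wrapped to [0, 180]."""
    errors = []
    for ref, est in zip(label_values, predicted_values):
        # The phase of ref * conj(est) is the phase difference,
        # already wrapped to the interval [-pi, pi].
        diff = cmath.phase(ref * est.conjugate())
        errors.append(abs(math.degrees(diff)))
    return sum(errors) / len(errors)


# Two hypothetical targets given as complex values (magnitude, phase).
labels = [cmath.rect(10.0, math.radians(170.0)), cmath.rect(1.0, 0.0)]
preds = [cmath.rect(1.0, math.radians(-170.0)), cmath.rect(1.0, math.radians(10.0))]
```

For these toy values, the amplitude errors are 20 dB and 0 dB (MAE of 10 dB), while the wrapped phase errors are 20 and 10 degrees (MAE of 15 degrees). Without wrapping, the first phase error would be counted as 340 degrees.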

B. HYPERPARAMETER TUNING
Hyperparameter tuning is performed on ARIM-v2, employing the same hyperparameters on ARIM without further tuning. In order to minimize the chance of overfitting in hyperparameter space, we split the ARIM-v2 training set into training and validation, keeping 20% (24,000 samples) for validation, the rest (96,000 samples) being used for training. Regarding our regime based on weight pruning, we trained our model for 100 epochs in the conventional training regime, followed by 20 epochs with weight pruning. The ratio of noise reduction r was validated considering values in the set {0.15, 0.3, 0.45}, the best performance gains being obtained for r = 0.3. We compared the weight pruning regime with conventional training for 120 epochs and dropout for 120 epochs, respectively. The dropout rate was validated in the range [0.1, 0.5], the best rate being 0.25. In all cases, we used mini-batches of 16 samples and the Adam optimizer [53] with a learning rate of 5 · 10^−5 and a weight decay of 10^−5. Regarding the parameter λ in the loss function, we tried out several values ranging from 1 to 20, the best solution being achieved for λ = 10.
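For convenience, the hyperparameters reported above can be gathered in a single configuration object, as in the sketch below. The class and field names are illustrative assumptions; only the numeric values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters reported in this section (field names are illustrative)."""
    train_samples: int = 96_000
    validation_samples: int = 24_000
    batch_size: int = 16
    learning_rate: float = 5e-5
    weight_decay: float = 1e-5
    conventional_epochs: int = 100   # first stage, conventional training
    pruning_epochs: int = 20         # second stage, with weight pruning
    pruning_ratio: float = 0.3       # best r among {0.15, 0.3, 0.45}
    loss_lambda: float = 10.0        # best lambda among values from 1 to 20

    @property
    def total_epochs(self) -> int:
        return self.conventional_epochs + self.pruning_epochs

    @property
    def steps_per_epoch(self) -> int:
        return self.train_samples // self.batch_size


config = TrainingConfig()
```

With these values, the two-stage regime spans 120 epochs in total, matching the 120 epochs used for the conventional and dropout baselines, and each epoch consists of 6,000 mini-batches.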

C. RESULTS OF WEIGHT PRUNING VERSUS COMPETING METHODS
In order to show that weight pruning attains better performance due to its inner network noise reduction principle, we present the results obtained on the ARIM-v2 test set in comparison with a set of competing training regimes. We consider as competing methods the conventional training regime applied on a network with full capacity, the conventional regime applied on a network with half capacity (HC), and the regime known as dropout [21]. The corresponding results are presented in Table 3. We first observe that dropout offers the worst results in terms of all performance metrics, even compared to the conventional training regime. The poor results attained by dropout actually motivated us to seek an alternative training regime, this being the main driver behind using weight pruning. In order to establish the optimal noise reduction ratio r for weight pruning, we performed several validation experiments. However, we observed that weight pruning produces better results than conventional training, irrespective of the considered reduction ratio. We therefore present test results using three different reduction ratios in Table 3. Although weight pruning is generally better than dropout and conventional training, it seems that the best results are achieved for the ratio r = 0.3. Even if the noise reduction ratio of 0.45 gives better results in terms of ∆SNR, the other performance metrics are in favor of the ratio r = 0.3.
Since weight pruning replaces a certain percentage of weights with zero, it can be argued that it is equivalent to simply reducing the model's capacity. We therefore present results using conventional training, while considering a model having 50% of the original FCN capacity. As shown in Table 3, reducing the network's capacity is a sub-optimal solution. In terms of ∆SNR, the difference between our best pruning variant and conventional training with half capacity is 2.92 dB in favor of the former approach. Additionally, an important difference can be observed for the MAE computed on the phase of targets, where the score of the best pruning solution is 6.58 degrees, while the score of the model with half capacity is 3.30 degrees higher. We thus emphasize that weight pruning is not equivalent to reducing the network's capacity, as it attains superior results.

D. RESULTS ON ARIM
On the ARIM data set, which contains one interference source per data sample, we compared our FCN (considering both conventional and weight pruning regimes) with the oracle, the zeroing approach, the CNN proposed by Rock et al. [13] and the FCN models proposed in our earlier work [12]. The results are presented in Table 4. The oracle is a model based on ground-truth labels, which represents an upper bound for the other models. We used the same network architecture proposed by Rock et al. [13], but instead of range-Doppler maps, we trained the network with the STFT of radar signals.
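For readers unfamiliar with the zeroing baseline, the sketch below shows its general principle: time-domain samples detected as affected by interference are set to zero. The detection rule used here (a multiple of the median magnitude) and the toy signals are hypothetical choices for illustration, and may differ from the exact configuration used in our comparison.

```python
import numpy as np


def zeroing(signal, threshold_factor=3.0):
    """Set to zero the time-domain samples whose magnitude exceeds a threshold.

    The threshold (a multiple of the median magnitude) is a hypothetical rule.
    """
    magnitude = np.abs(signal)
    threshold = threshold_factor * np.median(magnitude)
    cleaned = signal.copy()
    cleaned[magnitude > threshold] = 0.0
    return cleaned


n = np.arange(512)
beat = np.exp(2j * np.pi * 0.1 * n)  # one unit-amplitude target
corrupted = beat.copy()
# A strong chirp-like burst affects 60 consecutive samples.
corrupted[200:260] += 20.0 * np.exp(2j * np.pi * 0.0005 * (n[200:260] - 200) ** 2)
restored = zeroing(corrupted)
```

The sketch also makes the limitation of zeroing visible: the useful signal is removed together with the interference, so the longer the interference (or the more interference sources), the larger the part of the signal that is lost.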
A major drawback of the FCN models proposed in [12] is their inability to estimate the phase of signals, which is a mandatory quality, the phase being necessary in subsequent radar signal processing blocks. Even if our method attains a poor performance in terms of ∆SNR compared with the Deep FCN, we outperform all methods regarding the other performance measures. On the test set, our FCN model trained with weight pruning surpasses the Deep FCN by 0.20 dB in terms of target amplitude MAE, as well as the zeroing baseline, by 1.49 degrees in terms of target phase MAE. In addition, we observe that weight pruning leads to slightly better results, sustaining the idea that noisy weights may alter the overall performance of the neural model.
In order to show the necessity and the effectiveness of adding the magnitude channel besides the real and the imaginary parts of the input, we removed the magnitude channel from the input and the output, obtaining an ablated FCN network (no magnitude). Its results are included in Table 4. Without the magnitude channel, we observe that the network's performance drops by a significant margin, attaining weaker results compared with the other methods (even weaker than zeroing). This reinforces the idea that the magnitude channel is a useful input channel, acting as an attention mechanism that helps the network to focus on relevant input locations.
In addition to the quantitative results presented so far, we illustrate a series of qualitative results on the ARIM test set in Figure 4, comparing our approach against the zeroing method. The examples are vertically correspondent and they demonstrate that, in certain conditions, for example when wide-length interference affects the signal, classical approaches, such as zeroing, fail to mitigate the interference and provide unsatisfactory results. In Figure 4, we observe that our model successfully produces signals that are very similar to the labels, while the zeroing method cannot perform the interference mitigation. All parameters are identical except for the parameter k (the ratio between signal and interference slopes), which quantifies the length of interference with respect to the length of signal. More exactly, the closer k is to 1, the longer the interference is, i.e. k = 1 refers to a coherent interference.
TABLE 4: Validation and test results on the ARIM data set (containing only one source of interference per range profile) attained by our model (trained with both conventional and weight pruning regimes) versus the oracle (based on ground-truth labels), the zeroing approach, a state-of-the-art deep learning method [13] and our earlier FCN models [12]. The best results (excluding the oracle) are highlighted in bold. The symbol ↑ means higher values are better and ↓ means lower values are better.
[Figure panel captions: (a) N_int = 2, SIR_min = 0.5 dB. (d), (e), (f) The same parameters as above.]

E. RESULTS ON ARIM-V2
On the ARIM-v2 data set, which contains up to three interference sources per data sample, we compared our FCN (considering both conventional and weight pruning regimes) with the oracle (based on ground-truth labels), the zeroing baseline and the CNN of Rock et al. [13]. The results reported in Table 5 show that our approach provides superior results for all metrics, attaining performance levels quite close to the oracle. The differences between zeroing and our FCN observed on the ARIM data set become clearly higher on the ARIM-v2 data set, because ARIM-v2 simulates a more difficult automotive scenario, in which a conventional method such as zeroing seems to fail to mitigate multiple sources of interference. Our FCN model attains almost half the error reached by zeroing, in terms of target phase MAE. Furthermore, our model estimates the amplitudes of targets 0.86 dB better than zeroing on the test set. Another remarkable difference can be seen for the mean SNR improvement, where our network obtained a score of 15.36 dB, which is better than zeroing by 6.42 dB. In addition, we note that weight pruning leads to consistent improvements for all metrics, which seem to be considerably more pronounced on ARIM-v2 than on ARIM. With or without pruning, our method also surpasses the state-of-the-art CNN of Rock et al. [13].
In terms of the amplitude MAE, even the zeroing baseline outperforms the CNN of Rock et al. [13]. In summary, the quantitative results demonstrate the superiority of our method.
In addition to the quantitative results presented so far, we illustrate a series of qualitative results on the ARIM-v2 test data set in Figure 5, comparing our approach against the zeroing method. Due to the fact that data samples are synthetically generated, we are able to compare the algorithms with the ground-truth signal without interference, allowing us to determine which method provides the desired result after interference mitigation. The plots depicted in Figure 5 are vertically correspondent, meaning that, in the top plot on a column, the interference is mitigated by zeroing, while in the bottom plot, the same interference is mitigated by our FCN model. We handpicked three examples with multiple sources of interference, a type of incident that may occur in a real-life automotive scenario. We observe that, in this particular case, when there are multiple sources of interference, the zeroing approach fails to mitigate the interference and the targets can be barely observed because of the raised noise floor. Our model successfully mitigates the interference, providing an output very similar to the label. Although our model shows similar performance to baseline approaches when signals are affected by an interference with narrow length, in a difficult scenario, with multiple sources of interference or with wide-length interference, our approach clearly outperforms approaches such as zeroing, as shown by the plots presented in Figure 4 and Figure 5.
In order to provide a more detailed picture of our quantitative results, in Figure 6, we illustrate how our approach compares to zeroing in terms of three performance metrics (AUC, amplitude RMSE and phase RMSE) considering one, two and three sources of interference, from top to bottom, respectively. We observe that the differences between our FCN models and zeroing grow along with the number of interference sources, in favor of our approach, considering all performance measures. We notice an important difference when we consider the RMSE on the phase of targets. The zeroing algorithm exhibits poor performance because, when there are multiple sources of interference, a substantial part of the signal is covered by interference. Therefore, we observe a substantial difference between our FCN models and zeroing. In Figure 6, we also illustrate the results of the deep FCN from our preliminary work [12], excluding it from the graphs depicting the phase RMSE, since the deep FCN is not capable of recovering the phase. As the number of interference sources grows, we observe an increasing gap between our FCN+pruning and the deep FCN, in favor of our method. We thus conclude that our current neural model is superior.
In Figure 8, we added some visual results to observe the network's capacity to reduce noise and mitigate radar interference on ARIM-v2. In order to compare STFT data, we processed the output of the network by performing an inverse FFT, followed by an STFT. In this manner, we are able to reconstruct the STFT and compare it with the input STFT. We can observe that for no interference source (N_s = 0), the network acts like a denoising model and does not affect the target. When we feed STFT data affected by interference into the network, we observe that the interference is completely mitigated, even if input data is affected by multiple interference sources (N_s = 3).
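The post-processing step described above (inverse FFT followed by an STFT) can be sketched as follows. The window type, window length and hop size are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def stft(signal, window_length=64, hop=16):
    """Short-time Fourier transform with a Hann window (one frame per row)."""
    window = np.hanning(window_length)
    starts = range(0, len(signal) - window_length + 1, hop)
    frames = np.stack([signal[s:s + window_length] * window for s in starts])
    return np.fft.fft(frames, axis=1)


def range_profile_to_stft(range_profile, window_length=64, hop=16):
    """Map a predicted range profile back to the time domain, then compute its STFT."""
    time_signal = np.fft.ifft(range_profile)
    return stft(time_signal, window_length, hop)


n = np.arange(1024)
beat = np.exp(2j * np.pi * 0.25 * n)  # a single hypothetical target
# The FFT of the clean signal stands in for the network output in this sketch.
spectrogram = range_profile_to_stft(np.fft.fft(beat))
```

In the resulting spectrogram, the single target appears at the same frequency bin in every frame, i.e. as a straight line, which is how targets show up in the beat signal's spectrogram.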
In order to analyze more complex interference scenarios, we synthetically generated samples with a single source of interference, while modeling the multipath propagation of the interference signal. The multipath simulation was performed by adding a delayed copy of the interference signal, with a different amplitude, to the signal affected by interference. The delay was chosen to correspond to path differences of 0.5 and 0.7 meters and the reflection amplitude was set to 0.3 and 0.8, in order to simulate both weak and strong reflectors. The results are shown in Figure 9. We can observe that our model successfully mitigates the interference, even if it corresponds to a multipath propagation.
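The multipath model can be illustrated with the simplified sketch below, in which the reflected interference is a delayed and attenuated copy of the direct interference. The baseband chirp model, the chirp parameters and the function names are assumptions made for illustration; only the reflection amplitude (0.3) and the path difference (0.5 m) are taken from the text.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def interference_chirp(t, start_freq, slope):
    """Complex linear chirp evaluated at times t (in seconds)."""
    return np.exp(2j * np.pi * (start_freq * t + 0.5 * slope * t ** 2))


def add_multipath(t, start_freq, slope, reflection_amplitude, path_difference):
    """Direct interference plus one delayed, attenuated reflection."""
    # The path difference (in meters) is converted to a propagation delay.
    delay = path_difference / SPEED_OF_LIGHT
    direct = interference_chirp(t, start_freq, slope)
    reflected = reflection_amplitude * interference_chirp(t - delay, start_freq, slope)
    return direct + reflected


# Hypothetical sampling rate and chirp parameters.
t = np.arange(1024) / 40e6
multipath = add_multipath(
    t, start_freq=1e6, slope=5e11, reflection_amplitude=0.3, path_difference=0.5
)
```

Because the delay is evaluated analytically on the chirp, it does not need to be an integer number of samples, which matters for path differences as short as those considered here.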

F. GENERALIZATION TO REAL DATA
The major concern regarding training a neural network on synthetically generated samples is the model's capacity to generalize to real data. Therefore, we evaluate the generalization capacity of our FCN on real data, by testing it on real samples collected with two different radar sensors. We underline that our FCN is never trained or fine-tuned on real data samples. In Figure 7, we present qualitative results on nine real samples with interference, comparing our method against zeroing. The first six plots, depicted in Figures 7a to 7f, are generated with real data provided by FAU [18]. We note that the targets are different among the presented signals, showcasing various scenarios. Moreover, the central frequency of the interference source is not always the same, having three distinct values: 76.25 GHz, 76.5 GHz and 76.75 GHz. Looking at the results, it is clear that our network can provide more accurate estimations of the amplitude of targets, being able to mitigate the interference and to reduce the noise floor.
The last three plots, depicted in Figures 7g to 7i, are made on data provided by the NXP company, which were captured with the NXP TEF810X 77 GHz radar transceiver in a couple of outdoor experiments on a two-lane road. The victim radar was mounted on the bumper of a car, while the interfering radar was mounted on a tripod in a fixed location outside the roadway. The main target was another moving car on the road. Besides the car, there were other reflections in the range profiles coming from surrounding targets (e.g., lighting poles, trees). Even if the interference is more visible in these examples, our approach successfully mitigates the interference, providing better results in terms of amplitude of targets compared to the zeroing algorithm.
We highlight that the real data samples are collected with different radar sensors and have distinct central frequencies. Nevertheless, our model is able to mitigate the interference and surpass the baseline method, without any adjustment or fine-tuning. This demonstrates that our model has a good generalization capacity, being applicable to a wide range of radar sensors, without requiring any additional effort. In addition to range profile processing, we tested the capacity of our network to clean real range-Doppler maps, by processing each range profile separately and then concatenating the results. We conducted the range-Doppler experiment on data from the NXP company and tested our method against the zeroing baseline. The results are shown in Figure 10. We can observe that our FCN trained with pruning is able to better clean the range-Doppler map in comparison with the zeroing method.
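The range-Doppler procedure described above (cleaning each range profile separately, then concatenating) can be sketched as below. The function name, the array layout and the use of an identity function in place of the trained network are assumptions of this illustration.

```python
import numpy as np


def range_doppler_map(beat_signals, mitigate):
    """Clean each range profile separately, stack them, then apply the Doppler FFT.

    beat_signals: array of shape (num_chirps, num_samples), one row per chirp.
    mitigate: callable mapping one complex range profile to a cleaned one.
    """
    range_profiles = np.fft.fft(beat_signals, axis=1)
    cleaned = np.stack([mitigate(profile) for profile in range_profiles])
    # The FFT across chirps (slow time) yields the range-Doppler map.
    return np.fft.fftshift(np.fft.fft(cleaned, axis=0), axes=0)


rng = np.random.default_rng(0)
frame = rng.standard_normal((32, 128)) + 1j * rng.standard_normal((32, 128))
# An identity function stands in for the trained network in this sketch.
rd_map = range_doppler_map(frame, mitigate=lambda profile: profile)
```

Since the Doppler FFT relies on the phase evolution across chirps, this procedure only produces a meaningful map when the mitigation step preserves the phase of each range profile.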

G. GENERALIZATION TO MORE INTERFERENCE SOURCES
In real automotive scenarios, a wide range of incidents may cause the radar sensor to fail during driving. A plausible situation could be that, at a specific moment, more interference sources affect the radar antenna. Therefore, we investigate the generalization capacity of our model to mitigate RFI from more sources than it was trained for. To this end, we synthetically generated an additional test data set of 2,400 samples with four, five and six interference sources. We consider our FCN models trained on ARIM-v2 with both conventional and weight pruning regimes, resulting in an out-of-distribution evaluation setting. The results attained by our FCN models are compared with the oracle and the zeroing method. As shown in Table 6, our approach clearly outperforms the zeroing algorithm, being the closest method to the oracle. In terms of target phase MAE, our FCN based on weight pruning attains results that are 12.81 degrees better than zeroing. Moreover, the ∆SNR is almost double for both FCN models compared to the zeroing baseline. Regarding the AUC, a measure which is very important in radar applications because it describes the ability to disentangle targets from noise, our best model has an improvement of 7.8% compared to zeroing. In addition, we notice that weight pruning attains better performance compared to the conventional training regime, even when we test the generalization capacity on out-of-distribution data. This further supports our claim that weight pruning can act as a regularization method.
FIGURE 9: Qualitative results provided by our FCN+pruning model on synthetically generated data for multipath interference propagation. On the first row, there are signals affected by interference, and on the second row, the interference is mitigated with our network. A_mp stands for the amplitude of the reflected interference and d_mp stands for the path difference for the reflected interference with respect to the direct path. In all images, there is a single target (horizontal line) in the same position. Best viewed in color.

H. GENERALIZATION TO MORE TARGETS
Another less expected situation that can occur in real automotive scenarios is generated by the presence of a multitude of targets in the same range profile. To demonstrate the capacity of our network to generalize to such situations, we generated an additional synthetic test set of 2,400 samples, such that each sample contains a randomly chosen number of targets between 5 and 10. Our network trained with weight pruning attains the best performance, as shown in Table 7. Our best model has an improvement of 7.55 dB in terms of ∆SNR in comparison with the zeroing method. Moreover, the MAE of the target's phase is reduced by half for our best model, when we take the zeroing baseline as reference. Once again, the efficiency of weight pruning is highlighted by the results, as it surpasses the conventional training method in terms of all metrics.

VI. CONCLUSION
In this paper, we proposed a novel fully convolutional network capable of estimating both magnitude and phase of automotive radar signals affected by multiple sources of interference. We also introduced a large-scale database of radar signals simulated in realistic and complex settings.
We compared our FCN model with some state-of-the-art methods in a series of comprehensive experiments, showing that the proposed FCN provides superior results. We also released our novel data set to allow objective comparison in future work. To our knowledge, we are the first to establish a benchmark data set for automotive radar interference mitigation with multiple sources of interference. In future work, we aim to modify our FCN or to explore model distillation approaches in order to perform real-time processing on low-cost embedded devices. At the moment, real-time processing is only possible on expensive GPUs.
He published over 90 articles at international peer-reviewed conferences and journals, and a research monograph with Springer. Radu is editor of the journal Mathematics and served as an area chair at ICPR 2020. He received the "Caianiello Best Young Paper Award" at ICIAP 2013 for the paper entitled "Kernels for Visual Words Histograms". Radu also received the "Young Researchers in Science and Engineering" Prize from prof. Rada