Deep unfolding based hyper‐parameter optimisation for self‐interference cancellation in LTE‐A/5G‐transceivers

Deep unfolding is a promising concept that combines the advantages of traditional estimation techniques, such as adaptive filters (AFs), with those of machine learning approaches like artificial neural networks. Focusing on a challenging self-interference problem occurring in frequency-division duplex radio frequency transceivers, namely modulated spurs, we show that deep unfolding enables remarkable performance gains. Based on the hyper-parameter optimisation of several least-mean-squares (LMS) variants and the recursive least-squares (RLS) algorithm, the importance of a well-chosen loss function is highlighted. Especially the variable step-size LMS (VSS-LMS) and the transform-domain LMS (TD-LMS) benefit vastly without increased runtime complexity.

Deep Unfolding for Adaptive Filters: An AF algorithm is fully described by its inference equation

$$\hat{y}_n = f_{\mathrm{Inf}}(\mathbf{x}_n, \mathbf{w}_{n-1}; \boldsymbol{\rho}_n)$$

and its coefficient update

$$\mathbf{w}_n = f_{\mathrm{Upd}}(\mathbf{x}_n, y_n, \mathbf{w}_{n-1}; \boldsymbol{\rho}_n),$$

with the input vector $\mathbf{x}_n \in \mathbb{C}^Q$ and the desired value $y_n \in \mathbb{C}$ at the general iteration index $n$. Unlike classical filtering applications, where $\mathbf{x}_n$ is a delay-line vector, we do not place any restrictions on the input. Moreover, the formulation explicitly includes algorithms like truncated Volterra or spline AFs that are suitable for nonlinear estimation problems. Both the inference and the update function depend on the hyper-parameters $\boldsymbol{\rho}_n$. Interpreting the $N$ iterations of the AF as the layers of a deep network, the hyper-parameters can be optimised by minimising the weighted squared-error loss

$$L(.) = \sum_{n=1}^{N} \tau_n \, |y_n - \hat{y}_n|^2, \qquad (1)$$

where the $\tau_n$ are the iteration-dependent error weights and $N$ is the total number of iterations. In AF applications, $\mathbf{x}_n$ and $y_n$ are commonly realisations of random processes; it is therefore desirable to replace Equation (1) with the mean square error $\bar{L}(.) = E[L(.)]$. The common approximation of the expectation is the sample mean over $R$ different realisations:

$$\bar{L}(.) \approx \frac{1}{R} \sum_{r=1}^{R} L^{(r)}(.), \qquad (2)$$

where the superscript $(r)$ indicates a specific realisation. The described structure is visualised in Figure 1 for a single realisation, highlighting the similarity to a classical ANN. It is important to note that the coefficient initialisation $\mathbf{w}_0$, the error weighting $\tau_n$ and the hyper-parameters $\boldsymbol{\rho}_n$ are independent of $r$. Due to the complex structure of the loss function, an analytic minimisation is typically infeasible, and numerical methods, for instance the simplex or back-propagation algorithms, have to be applied. To limit the complexity of the minimisation for large $N$, it is beneficial to keep the hyper-parameters constant over certain iteration intervals, i.e.

$$\rho_{n,i} = \tilde{\rho}_{m_i,i} \quad \text{for} \quad l_{m_i-1,i} < n \le l_{m_i,i}, \quad m_i = 1, \dots, M_i. \qquad (3)$$
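As a minimal sketch of this unfolded structure, the following Python snippet evaluates the loss of Equation (1) for a generic AF defined by its inference and update functions, and approximates $\bar{L}(.)$ by the sample mean of Equation (2); all function names and signal shapes are illustrative assumptions, not the implementation used in the letter.

```python
import numpy as np

def unfolded_loss(rho, f_inf, f_upd, X, y, tau, w0):
    """Weighted loss of Equation (1) for a single realisation.

    rho   : hyper-parameter vector (held constant over n in this sketch)
    f_inf : inference function  f_inf(x_n, w, rho) -> y_hat_n
    f_upd : update function     f_upd(x_n, y_n, w, rho) -> w_n
    X     : (N, Q) complex input vectors x_n
    y     : (N,)   complex desired values y_n
    tau   : (N,)   iteration-dependent error weights
    w0    : (Q,)   coefficient initialisation (independent of r)
    """
    w, loss = w0, 0.0
    for n in range(len(y)):
        y_hat = f_inf(X[n], w, rho)              # layer n: inference
        loss += tau[n] * abs(y[n] - y_hat) ** 2  # weighted squared error
        w = f_upd(X[n], y[n], w, rho)            # layer n: coefficient update
    return loss

def mean_loss(rho, realisations, f_inf, f_upd, tau, w0):
    """Sample-mean approximation of E[L(.)] over R realisations, Eq. (2)."""
    return np.mean([unfolded_loss(rho, f_inf, f_upd, X, y, tau, w0)
                    for X, y in realisations])

# The loss can then be minimised numerically, e.g. with the simplex
# (Nelder-Mead) algorithm mentioned in the text:
# from scipy.optimize import minimize
# res = minimize(mean_loss, x0=rho_init,
#                args=(realisations, f_inf, f_upd, tau, w0),
#                method="Nelder-Mead")
```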
Note that the interval counts $M_i$ and lengths $l_{m_i,i} - l_{m_i-1,i}$ can be chosen individually for each of the $P$ hyper-parameters ($i = 1, \dots, P$). The $\tilde{\rho}_{m_i,i}$ are the new variables to be optimised. The full potential of the deep unfolding concept is obtained when tuning AFs for hardware implementations. By replacing the analytic inference and update functions with their actual implementations, any numerical inaccuracies, e.g. caused by rounding errors in fixed-point arithmetic, are included in the optimisation. Combined with a sufficiently high number of realisations, the data-driven tuning also helps to avoid instabilities.
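The interval constraint of Equation (3) amounts to expanding a few optimisation variables into a per-iteration sequence. A possible helper, with purely hypothetical boundary indices, is:

```python
import numpy as np

def expand_piecewise(rho_tilde, boundaries, N):
    """Expand the M interval values of one hyper-parameter into a
    per-iteration sequence rho_n according to Equation (3).

    rho_tilde  : (M,) values, one per interval
    boundaries : (M,) interval end indices l_1 < ... < l_M;
                 the last value is held constant up to iteration N.
    """
    rho = np.empty(N)
    start = 0
    for value, end in zip(rho_tilde, boundaries):
        rho[start:end] = value
        start = end
    rho[start:] = rho_tilde[-1]      # constant after the last boundary
    return rho

# Hypothetical example: 3 intervals of increasing length
# rho_n = expand_piecewise([0.5, 0.2, 0.05], [100, 300, 700], N=4384)
```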
Application to Modulated Spur Cancellation: In the following, we investigate a particular linear estimation problem to show that, under challenging conditions, substantial improvements are possible even in such a widely covered field. The cancellation of modulated spur interferences, a class of self-interferences, is relevant in LTE-A/5G transceivers. The main issues are high power differences between simultaneously occurring transmit (Tx) and receive (Rx) signals, combined with an insufficient Tx-Rx isolation and receiver non-idealities. Consequently, the Tx signal leaks into the receiver, where it can create a modulated spur interference that deteriorates the wanted Rx signal. For further details on the effect we refer to [7]. In short, the resulting AF problem resembles the well-known noise cancellation application. However, due to tough signal characteristics, this problem is challenging for low-cost AFs, which are preferred to enable a real-time implementation. First, the signals are typically narrow-band in order to enable resource sharing among different users. Second, the symbol-wise nature of the involved signals leads to non-stationarity at the symbol boundaries [6]. The simplified signal model (neglecting a known frequency offset) in the Rx baseband is

$$y_n = y_{\mathrm{Intf},n} + y_{\mathrm{Rx},n} + \eta_n = \mathbf{h}_{\mathrm{TxL},n}^{H} \mathbf{x}_n + y_{\mathrm{Rx},n} + \eta_n, \qquad (4)$$

with the interference $y_{\mathrm{Intf},n}$, the signal-of-interest $y_{\mathrm{Rx},n}$ and the additive noise $\eta_n$. $\mathbf{h}_{\mathrm{TxL},n}$ is the unknown impulse response of the leakage path and $\mathbf{x}_n$ contains the Tx baseband samples. In order to avoid degradation of the signal-of-interest, we aim to cancel the interference by estimating the leakage path with an AF. Note that both the additive noise and the signal-of-interest act as noise for the cancellation process. We employ deep unfolding with two loss functions to optimise the performance of different AF algorithms and compare them in simulations. The setup follows the considerations of [7]. For the Tx we use an LTE-10 signal with an allocated bandwidth of 2.16 MHz (12 out of 50 resource blocks). The wanted Rx signal is an LTE-10 downlink signal with full bandwidth and a signal-to-noise ratio (SNR) of 10 dB relative to $\eta_n$. The $\mathbf{h}_{\mathrm{TxL},n}$ are finite impulse responses obtained from measured leakage paths. The AF length is set to $Q = 16$. All presented cancellation results are ensemble averages over 6 leakage paths and 50 signal realisations per path. The hyper-parameter optimisation is based on a set of $R = 48$ signal realisations $(\mathbf{x}_n^{(r)}, y_n^{(r)})$, each involving a randomly generated leakage path. To obtain more general performance results and avoid any bias, the realisations for training and testing are different.
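For experimentation, one realisation of the signal model in Equation (4) can be mimicked as follows; note that white Gaussian surrogates replace the LTE-10 signals and measured leakage paths of the actual setup, so this is only a structural stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
N, Q = 4384, 16                      # sample count and AF length as in the letter

def cplx_noise(n, power=1.0):
    """Circularly-symmetric complex Gaussian samples of given power."""
    return np.sqrt(power / 2) * (rng.standard_normal(n)
                                 + 1j * rng.standard_normal(n))

x = cplx_noise(N)                         # surrogate Tx baseband signal
h = cplx_noise(Q, power=1.0 / Q)          # surrogate leakage path h_TxL
y_intf = np.convolve(x, np.conj(h))[:N]   # y_Intf,n = h^H x_n
y_rx = cplx_noise(N)                      # surrogate signal-of-interest
eta = cplx_noise(N, power=10 ** (-10 / 10))  # 10 dB Rx SNR relative to eta
y = y_intf + y_rx + eta                   # observed Rx baseband, Eq. (4)

# Delay-line input vectors x_n = [x[n], ..., x[n-Q+1]] for the AF:
X = np.stack([np.pad(x, (Q - 1, 0))[n:n + Q][::-1] for n in range(N)])
```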

Impact of Loss Function:
The first AF we apply is the N-LMS [3], represented by the inference and update equations

$$\hat{y}_n = \mathbf{w}_{n-1}^{H} \mathbf{x}_n, \qquad (5)$$

$$\mathbf{w}_n = \mathbf{w}_{n-1} + \frac{\mu_{\mathrm{NLMS}}}{\xi + \mathbf{x}_n^{H} \mathbf{x}_n}\, \mathbf{x}_n\, e_n^{*}, \qquad (6)$$

with the error $e_n = y_n - \hat{y}_n$. Besides the regularisation $\xi$, which we fix to $10^{-4}$, the only hyper-parameter of the N-LMS is the constant step-size $\mu_{\mathrm{NLMS}}$. The definition of the cost functions $L^{(r)}_{\mathrm{NLMS}}(\mu_{\mathrm{NLMS}})$ is completed with the error weighting $\tau_n$ for all $N = 4384$ samples (4 LTE-10 symbols) of a single realisation. Two approaches appear promising: one is to weight all $e_n$ equally with $\tau_{1,n} = 1$, which enforces a fast adaptation. The other is to reduce the weighting of early (and high) errors, which prioritises the steady-state performance. We use $\tau_{2,n} = 0.05$ for the first LTE-10 symbol and $\tau_{2,n} = 1$ otherwise. In addition, we mask symbol transitions with $\tau_{2,n} = 0$ to account for the temporary non-stationarity [6]. Both error weightings are depicted in Figure 2. $\bar{L}_{\mathrm{NLMS}}(.)$ is minimised by means of the simplex algorithm for selected interference power levels, quantified by the time- and ensemble-averaged interference-to-carrier-plus-noise ratio

$$\mathrm{ICN} = \frac{\sum_{n=1}^{N} E\left[|y_{\mathrm{Intf},n}|^2\right]}{\sum_{n=1}^{N} E\left[|y_{\mathrm{Rx},n} + \eta_n|^2\right]}. \qquad (7)$$

The initialisation value and the optimisation results for $\mu_{\mathrm{NLMS}}$ using $\tau_{2,n}$ are given in Table 1. As expected, higher step-sizes can be chosen for high ICN values. Moreover, the quite large deviation of the results from the initial value indicates that the optimisation is robust against an unfavourable initialisation.

The second AF we apply is the TD-LMS [3], a variant of the N-LMS with a transformed input and an estimated power normalisation. It is obtained from Equations (5) and (6) by substituting $\mathbf{x}_n$ with $\mathbf{v}_n$:

$$\mathbf{v}_n = \operatorname{diag}\!\left(\mathbf{p}_n^{2}\right)^{-1} \mathbf{u}_n, \quad \mathbf{u}_n = \mathbf{T}\, \mathbf{x}_n, \quad p_{n,k}^{2} = (1-\beta)\, p_{n-1,k}^{2} + \beta\, |u_{n,k}|^{2}, \qquad (8)$$

where $(\cdot)^2$ denotes an element-wise operation and $p_{n,k}$ and $u_{n,k}$ are the $k$-th entries of the vectors $\mathbf{p}_n$ and $\mathbf{u}_n$, respectively. $\mathbf{T} \in \mathbb{C}^{Q \times Q}$ is the discrete cosine transform (DCT) of type II in our case. The TD-LMS has two constant hyper-parameters: $\mu_{\mathrm{TD}}$ and $\beta$. Again, we minimise $\bar{L}_{\mathrm{TD}}(.)$ for both error weightings and provide selected results for $\tau_{2,n}$ in Table 1.
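A direct transcription of Equations (5), (6) and (8) into Python might look as follows; the orthonormal DCT-II matrix and the initialisation of the power estimates are assumptions where the letter gives no details.

```python
import numpy as np

def dct2_matrix(Q):
    """Orthonormal DCT-II matrix T (the transform assumed in the letter)."""
    k = np.arange(Q)[:, None]
    n = np.arange(Q)[None, :]
    T = np.sqrt(2.0 / Q) * np.cos(np.pi * (n + 0.5) * k / Q)
    T[0, :] /= np.sqrt(2.0)
    return T

def n_lms(X, y, mu, xi=1e-4):
    """N-LMS following Equations (5) and (6)."""
    N, Q = X.shape
    w = np.zeros(Q, dtype=complex)
    e = np.empty(N, dtype=complex)
    for n in range(N):
        x = X[n]
        e[n] = y[n] - np.vdot(w, x)                                 # Eq. (5)
        w = w + mu / (xi + np.vdot(x, x).real) * x * np.conj(e[n])  # Eq. (6)
    return e, w

def td_lms(X, y, mu, beta, xi=1e-4):
    """TD-LMS: Equations (5) and (6) applied to v_n of Equation (8)."""
    N, Q = X.shape
    T = dct2_matrix(Q)
    w = np.zeros(Q, dtype=complex)
    p2 = np.ones(Q)                     # assumed initial power estimates
    e = np.empty(N, dtype=complex)
    for n in range(N):
        u = T @ X[n]                    # u_n = T x_n
        p2 = (1 - beta) * p2 + beta * np.abs(u) ** 2
        v = u / p2                      # v_n = diag(p_n^2)^{-1} u_n
        e[n] = y[n] - np.vdot(w, v)
        w = w + mu / (xi + np.vdot(v, v).real) * v * np.conj(e[n])
    return e, w
```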
Similarly to the N-LMS, higher step-sizes are possible for strong interference levels, although all values are smaller compared to $\mu_{\mathrm{NLMS}}$. The power estimation is only slightly affected by the noise level, since it operates on the exact input sequence $\mathbf{u}_n$.
A crucial metric for the modulated spur application is the time- and ensemble-averaged signal-to-interference-plus-noise ratio (SINR) after cancellation, given by

$$\mathrm{SINR} = \frac{\sum_{n=1100}^{N} E\left[|y_{\mathrm{Rx},n}|^2\right]}{\sum_{n=1100}^{N} E\left[|y_{\mathrm{Intf},n} - \hat{y}_n + \eta_n|^2\right]}. \qquad (9)$$

Note that the summation starts at $n = 1100$ to exclude the first symbol, which causes the SINR to mainly represent the steady-state performance.
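In simulation, where the individual signal components are accessible, the metric of Equation (9) can be evaluated as in the following sketch:

```python
import numpy as np

def sinr_db(y_rx, y_intf, eta, y_hat, n0=1100):
    """SINR after cancellation as in Equation (9), time-averaged from
    sample n0 on; assumes simulation access to the signal components."""
    signal = np.sum(np.abs(y_rx[n0:]) ** 2)
    residual = np.sum(np.abs(y_intf[n0:] - y_hat[n0:] + eta[n0:]) ** 2)
    return 10 * np.log10(signal / residual)
```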
In Figure 3, we compare the SINR of the optimised N-LMS and TD-LMS algorithms for both error weightings. Dotted vertical lines indicate the ICN values for which the hyper-parameters of the AFs were optimised. Each grey rectangle indicates an ICN range for which a learned hyper-parameter has been used; this accounts for the fact that the exact ICN is usually unknown. While the difference between the error weightings is small for the N-LMS, in the case of the TD-LMS the lower $\tau_{2,n}$ for $n < 1100$ leads to a smaller $\mu_{\mathrm{TD}}$, which enables a more accurate adaptation. Thus, the steady-state performance improves substantially, with an SINR gain of up to 2 dB.

Steady-State Performance:
Based on the previous results, we continue solely with the superior error weighting $\tau_{2,n}$ and examine two additional AFs, the VSS-LMS and the RLS. The VSS-LMS [4] improves the adaptation rate of the N-LMS by employing an iteration-dependent step-size $\mu_{\mathrm{VSS},n}$ in the standard update Equation (6); the inference Equation (5) coincides with the N-LMS. Many strategies have been proposed for an optimal step-size decay, often relying on a vanishing estimation error. In the case of modulated spur cancellation, the error signal contains the signal-of-interest, and thus the prerequisite of many step-size selection strategies is not fulfilled. However, the deep unfolding concept can still be applied in a straightforward manner. In order to simplify the hyper-parameter optimisation, we allow $\mu_{\mathrm{VSS},n}$ to change only at predefined iterations $l_m$ and keep it constant after the last adjustment. In total, we split the first symbol into $M = 8$ intervals of increasing length and hence have 8 hyper-parameters to train. We choose an initial value of $\mu_{\mathrm{VSS},n} = 0.1$ for all segments.
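A sketch of this segment-wise step-size optimisation, with hypothetical segment boundaries, could look like this:

```python
import numpy as np
from scipy.optimize import minimize

def vss_lms_loss(mu_segments, X, y, tau, boundaries, xi=1e-4):
    """Unfolded loss for the VSS-LMS with a segment-wise constant
    step-size sequence; boundaries are the segment end indices."""
    N, Q = X.shape
    mu_n = np.empty(N)
    start = 0
    for mu, end in zip(mu_segments, boundaries):
        mu_n[start:end] = mu
        start = end
    mu_n[start:] = mu_segments[-1]   # constant after the last adjustment
    w = np.zeros(Q, dtype=complex)
    loss = 0.0
    for n in range(N):
        x = X[n]
        e = y[n] - np.vdot(w, x)
        loss += tau[n] * abs(e) ** 2
        w = w + mu_n[n] / (xi + np.vdot(x, x).real) * x * np.conj(e)
    return loss

# M = 8 segments of increasing length within the first symbol; the
# boundary indices below are illustrative, the initial value 0.1 is
# taken from the letter:
# res = minimize(vss_lms_loss, x0=np.full(8, 0.1),
#                args=(X, y, tau, [8, 24, 56, 120, 248, 504, 800, 1096]),
#                method="Nelder-Mead")
```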
The interval lengths and the optimal step-size sequences for three interference power levels are visualised in Figure 4. Besides the clear dependence of the step-size sequences on the ICN, all curves exhibit a fast decay to very small values, which enables a good steady-state performance. For the highest ICN, the step-size in one segment is very close to zero.

Unlike the LMS variants, the RLS [3] does not require time-dependent hyper-parameters. When using the exponentially-weighted variant, the two major parameters of the RLS are the forgetting factor $\lambda$ and the initialisation factor $\alpha$ of the estimated inverse auto-covariance matrix. We use common values for the initialisation, which implies a large value of $\alpha$. However, the optimal results in Table 1 indicate that an unexpectedly low $\alpha$ is superior. This result is mainly related to the adaptation behaviour, as discussed in the next section.
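For reference, a generic textbook form of the complex-valued exponentially-weighted RLS, with $\mathbf{P}_0 = \alpha \mathbf{I}$ initialising the inverse auto-covariance estimate as we read the role of $\alpha$ here, is sketched below; it is not the authors' implementation.

```python
import numpy as np

def rls(X, y, lam, alpha):
    """Exponentially-weighted RLS with forgetting factor lam and
    initialisation P_0 = alpha * I."""
    N, Q = X.shape
    w = np.zeros(Q, dtype=complex)
    P = alpha * np.eye(Q, dtype=complex)     # inverse auto-covariance estimate
    e = np.empty(N, dtype=complex)
    for n in range(N):
        x = X[n]
        e[n] = y[n] - np.vdot(w, x)          # a-priori error
        Px = P @ x
        k = Px / (lam + np.vdot(x, Px))      # gain vector
        w = w + k * np.conj(e[n])
        P = (P - np.outer(k, np.conj(x)) @ P) / lam
    return e, w
```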
In Figure 5, the best cancellation results of the N-LMS and TD-LMS are compared to those of the VSS-LMS and the RLS. Clearly, the RLS outperforms all other algorithms by almost restoring the receiver SNR of 10 dB, independent of the ICN. However, when considering their complexity, the optimised TD-LMS and even the VSS-LMS feature remarkable improvements over the standard N-LMS. Tuning the hyper-parameters manually could hardly yield a comparable cancellation performance, especially for a high parameter count.
Adaptation Behaviour: Besides the steady-state performance, a fast adaptation is crucial for self-interference cancellation due to fast-changing system parameters, especially in 5G. A universally applicable metric for the adaptation performance is the normalised mean square error (NMSE), defined as

$$\mathrm{NMSE}_n = \frac{E\left[\|\mathbf{h}_{\mathrm{TxL},n} - \mathbf{w}_n\|^2\right]}{E\left[\|\mathbf{h}_{\mathrm{TxL},n}\|^2\right]}. \qquad (10)$$

In Figure 6, we compare the NMSE of all covered AFs for an ICN of 16 dB. The N-LMS and the TD-LMS show a comparable initial NMSE decay, but the TD variant features a clear steady-state improvement; even lower steady-state values are reached by the VSS-LMS, and the RLS attains $-35.9$ dB. All curves exhibit regular spikes at the boundaries of the individual LTE-10 symbols, which is caused by the non-stationarity mentioned previously [6].
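Assuming the NMSE is computed from the coefficient error against quasi-static leakage paths (an assumption about the exact evaluation), an ensemble estimate of Equation (10) is:

```python
import numpy as np

def nmse_db(h_true, w_traj):
    """NMSE_n of Equation (10) estimated over an ensemble.

    h_true : (R, Q)    leakage paths (assumed quasi-static per realisation)
    w_traj : (R, N, Q) coefficient trajectories w_n of the AF
    """
    err = np.sum(np.abs(h_true[:, None, :] - w_traj) ** 2, axis=-1)  # (R, N)
    ref = np.sum(np.abs(h_true) ** 2, axis=-1)[:, None]              # (R, 1)
    return 10 * np.log10(np.mean(err / ref, axis=0))                 # (N,)
```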
As already mentioned above, the low optimal values for $\alpha$, exemplified in Table 1, contradict the basic literature on the RLS. To analyse the effect, we compare the initial NMSE decay for $\alpha = 10^6$ and $\alpha = 1.65$ at an ICN of 16 dB. The corresponding curves are plotted in Figure 7. A large $\alpha$ causes an overshoot of the estimation error within the first 25 samples, whereas a small $\alpha$ leads to a monotonic decrease of the error. Hence, with the optimised $\alpha$, only a minimal number of Rx samples is considerably impaired by the interference.