Per-Wavelet Equalization for Discrete Wavelet Transform Based Multi-Carrier Modulation Systems

The Discrete Wavelet Transform (DWT) has gained attention in the area of Multi-Carrier Modulation (MCM) because it can overcome some well-known limitations of Discrete Fourier Transform (DFT) based MCM systems. Its improved spectral containment removes the need for a cyclic prefix, although appropriate equalization then has to be added because the cyclic convolution property no longer holds. Most DWT based MCM systems in the literature use Time-domain EQualizers (TEQs) to mitigate the channel distortion. In this paper, a Per-Wavelet EQualizer (PWEQ) is proposed which directly maximizes the Signal-to-Interference-plus-Noise Ratio (SINR) per symbol and is applicable to any wavelet family. The proposed PWEQ provides a performance upper bound for the TEQs for DWT based MCM systems. The computational complexity of the PWEQ is reduced by modifying the Filter Bank (FB) structure of the DWT. Simulations are performed to compare the PWEQ performance against the TEQs for DWT based MCM systems and the similar Per-Tone EQualizer (PTEQ) for DFT based MCM systems. The simulations are performed using measured Asymmetric Digital Subscriber Line (ADSL) and G.fast channels with Fejér-Korovkin (FK) wavelets. The proposed PWEQ increases the SINR on the received symbols compared to the TEQs at the cost of increased computational complexity.


I. INTRODUCTION
Discrete Multi-Tone (DMT) modulation has dominated broadband wireline [1], [2], [3], [4] and wireless [5], [6] communications over the last decades. It is a Discrete Fourier Transform (DFT) based Multi-Carrier Modulation (MCM) system with a fairly simple receiver structure where, because of the use of an appropriately long cyclic prefix, channel equalization can be done with a single-tap filter per tone, i.e. one complex multiplication per tone. However, the cyclic prefix length ν has to be equal to or longer than the channel impulse response length, and hence reduces the symbol rate by a fraction ν/(N + ν), where N is the block size. Therefore, a shorter cyclic prefix is often used in combination with a channel shortening filter, referred to as a Time-domain EQualizer (TEQ). The design of these TEQs thus usually aims at shortening the effective channel impulse response and is not directly linked to the Signal-to-Interference-plus-Noise Ratio (SINR) achieved on each tone. To address this, the Per-Tone EQualizer (PTEQ) has been introduced, which uses a multi-tap filter for each tone and aims at maximizing the SINR on each tone. The PTEQ has been shown to provide a performance upper bound for TEQs [7].
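The rate penalty of the cyclic prefix can be made concrete with a short calculation. The sketch below assumes a block of N samples extended by ν prefix samples; the function names and the numerical values are illustrative assumptions, not parameters taken from this paper.

```python
def cp_rate_factor(n_block: int, cp_len: int) -> float:
    """Fraction of the prefix-free symbol rate that remains with a cyclic prefix."""
    return n_block / (n_block + cp_len)


def cp_rate_loss(n_block: int, cp_len: int) -> float:
    """Relative symbol-rate loss nu / (N + nu) caused by the cyclic prefix."""
    return cp_len / (n_block + cp_len)


# Hypothetical block size and prefix length, chosen only for illustration.
N, NU = 512, 32
print(f"remaining rate fraction: {cp_rate_factor(N, NU):.4f}")
print(f"relative rate loss:      {cp_rate_loss(N, NU):.4f}")
```

The two quantities always sum to one, so a shorter prefix directly translates into a higher symbol rate, which is the motivation for channel shortening.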
The first sidelobe in the DFT prototype filter is only 13 dB lower than the main lobe, and consecutive lobes decrease at a rate of only 20 dB/dec [8]. This causes Inter Carrier Interference (ICI) whenever the receiver cannot equalize the channel properly. High sidelobe levels also lead to more tones being affected by a narrowband interferer.
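The 13 dB figure can be checked numerically. As a sketch, the magnitude response of the rectangular DFT prototype is approximated here by sin(πx)/(πx), with x the frequency offset in units of the tone spacing; this large-block approximation and the helper names are assumptions made for illustration.

```python
import math


def sinc_db(x: float) -> float:
    """Magnitude of sin(pi x)/(pi x) in dB; x is the offset in tone spacings."""
    if x == 0.0:
        return 0.0
    return 20.0 * math.log10(abs(math.sin(math.pi * x) / (math.pi * x)))


def first_sidelobe_db(steps: int = 20000) -> float:
    """Peak of the first sidelobe, searched strictly between the nulls at x = 1 and x = 2."""
    return max(sinc_db(1.0 + k / steps) for k in range(1, steps))


def envelope_db(x: float) -> float:
    """Sidelobe envelope 1/(pi x) in dB."""
    return -20.0 * math.log10(math.pi * x)


print(round(first_sidelobe_db(), 2))                      # first sidelobe level
print(round(envelope_db(100.0) - envelope_db(10.0), 2))   # decay over one decade
```

Under this approximation the first sidelobe lies roughly 13.3 dB below the main lobe and the envelope drops by 20 dB per decade of frequency offset.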
Finally, in DFT based MCM systems, the Peak-to-Average Power Ratio (PAPR) of the transmitted signals is known to be relatively high, leading to inefficient use of the amplifier in the transmitter. Many mitigation techniques have been developed to reduce the PAPR based on clipping [9], signal scrambling [10] and recently machine learning [11]. However, each of these methods leads to either a BER increase [12] or a decrease in data rate.
Recently, the Discrete Wavelet Transform (DWT) has been considered as an alternative to the DFT in MCM systems because it can overcome some of its limitations. First, certain wavelet families have been shown to have a lower PAPR [13], which reduces the cost of the MCM system. Secondly, high order wavelets use several overlapping blocks because they have much longer basis functions. A well-designed wavelet can therefore more accurately approximate an ideal bandpass filter, which has an infinitely long impulse response. This improves the robustness against interference and channel distortion, so that the cyclic prefix can also be removed.
Currently, there is no standardized equalization structure for Discrete Wavelet Multi-Tone (DWMT) modulation. Zero-forcing [14] and Minimum Mean Squared Error (MMSE) [15] equalizers have been suggested in the literature. However, they do not directly maximize the SINR achieved on each tone. This problem was solved for powerline communication with the Adaptive Sine modulated/Cosine modulated filter bank Equalizer for Transmultiplexers (ASCET) [16]. Unfortunately, this method is only applicable to cosine modulated M-band wavelets. This severely limits the choice of wavelet family because in general all M-band wavelets that create multiple subbands through a Multi-Resolution Analysis (MRA) could serve as basis functions. Therefore, the main objective of this paper is to derive a generalized equalizer for DWMT modulation similar to the PTEQ for DMT modulation.
The remainder of this paper is outlined as follows. In Section II, the matrix representation of a DWMT system with a TEQ is described. In Section III, the proposed Per-Wavelet EQualization (PWEQ) is derived and an efficient implementation is discussed in Section IV. In Section V, simulation results are presented where the proposed PWEQ is compared to a TEQ for DWMT modulation and PTEQ for DMT modulation. Finally, Section VI summarizes the main conclusions and Section VII discusses the possibilities for future research.

II. DATA MODEL
The data model is split into the three main parts of a communication link, i.e. the transmitter, the channel and the receiver.

A. TRANSMITTER
First, the incoming bitstream is modulated into blocks of $N$ symbols using Amplitude Modulation (AM). It is noted that AM is used (versus Quadrature Amplitude Modulation (QAM) in DMT modulation) because the DWT has real-valued basis functions. Then, the Inverse Discrete Wavelet Transform (IDWT) transforms this block of $N$ symbols into a discrete time-domain signal. This transformation can be calculated efficiently using Mallat's algorithm [17], which corresponds to the Filter Bank (FB) structure in Fig. 1 (2-band example, $M = 2$). The filters $B_1(z), \ldots, B_M(z)$ are the discrete filters associated with the reconstruction wavelet and scaling functions for an $M$-band wavelet transform. Their coefficients are determined through the dilation equation [18], [19].
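The upsample, filter and add operations of such a synthesis FB can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses $M = 2$, Haar filters instead of the FK wavelets used later in the paper, and a uniform tree in which every branch is merged at every level; all function names are hypothetical.

```python
import math


def upsample(x, m):
    """Insert m - 1 zeros between consecutive samples (no trailing zeros)."""
    out = []
    for i, v in enumerate(x):
        if i > 0:
            out.extend([0.0] * (m - 1))
        out.append(v)
    return out


def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def synthesis_level(low, high, b_low, b_high):
    """One 2-band synthesis level: upsample both branches, filter, add."""
    lo = convolve(upsample(low, 2), b_low)
    hi = convolve(upsample(high, 2), b_high)
    return [u + v for u, v in zip(lo, hi)]


def idwt_block(symbols, b_low, b_high):
    """Merge N = 2^n symbol branches pairwise, level by level, into one signal."""
    n = len(symbols)
    if n < 2 or n & (n - 1):
        raise ValueError("block size must be a power of two")
    signals = [[float(s)] for s in symbols]
    while len(signals) > 1:
        signals = [synthesis_level(signals[k], signals[k + 1], b_low, b_high)
                   for k in range(0, len(signals), 2)]
    return signals[0]


R = 1.0 / math.sqrt(2.0)
HAAR_LOW, HAAR_HIGH = [R, R], [R, -R]   # assumed orthonormal Haar filter pair

block = [1.0, -1.0, 3.0, -3.0]          # hypothetical AM symbols
x = idwt_block(block, HAAR_LOW, HAAR_HIGH)
print(len(x), sum(v * v for v in x))    # output length and signal energy
```

Because the Haar pair is orthonormal, the energy of the time-domain block equals the energy of the symbols; longer filters produce an output that is longer than the block size.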
Let the FB response be the mapping from the input signals to the output signal, where each input signal is a unit impulse $\delta[k]$ at $k = 0$ scaled by $X_i[k]$, the $i$th symbol in the $k$th transmitted block. Then this mapping can be described by a matrix $B \in \mathbb{R}^{L_d \times N}$, with $L_d$ the length of the output signal. The matrix $B$ is a product of transformation matrices representing the Multi-Resolution Synthesis (MRS) [19]
$$B = B_n B_{n-1} \cdots B_1 \qquad (1)$$
where $n = \log_M(N)$ and every matrix $B_i$ represents all upsampling, filter and addition operations in the $i$th synthesis level. Mathematically, this is represented by equations (2) and (3), where $G$ is the impulse response length of $B_1(z), \ldots, B_M(z)$ and $b_j^i$ is the $j$th filter coefficient of $B_i(z)$. It can be noticed that each matrix $B_i$ is block-diagonal, with $N/M^i$ diagonal blocks.
The FB in Fig. 1 consists of many sub-FBs. Let $l[i]$ denote the length of the FB response of a sub-FB with the first $i$ synthesis levels. Then $l[i]$ increases with $i$ because every upsampling operation introduces $M - 1$ zeros between every pair of consecutive samples. Furthermore, each filter operation increases the length by $G - 1$ samples. This leads to the following difference equation for the FB response length
$$l[i] = M\,(l[i-1] - 1) + 1 + (G - 1) = M\,l[i-1] + G - M. \qquad (4)$$
Solving this equation with initial condition $l[0] = 1$ leads to
$$l[i] = \frac{(G - 1)(M^i - 1)}{M - 1} + 1. \qquad (5)$$
The total length of the FB response is then
$$L_d = l[n] = \frac{(G - 1)(N - 1)}{M - 1} + 1. \qquad (6)$$
In general, this value is larger than, and not necessarily a multiple of, the block size $N$. Therefore, even though the symbol rate is $N$ times smaller than the sampling rate, consecutive outputs of the FB will overlap. Jointly transforming multiple blocks of symbols stacked in a column vector then requires an overlap-add matrix.
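The growth of the FB response length can be verified with a few lines of code. This is a minimal sketch that assumes upsampling a length-$l$ signal yields $M(l - 1) + 1$ samples and filtering adds $G - 1$ samples, as described above; the closed-form expression in the second function is derived from these two assumptions, and both function names are hypothetical.

```python
def fb_response_length(levels: int, m: int, g: int) -> int:
    """Iterate the length recursion, starting from a single unit impulse."""
    length = 1                           # l[0] = 1
    for _ in range(levels):
        length = m * (length - 1) + 1    # upsampling: m - 1 zeros between samples
        length = length + g - 1          # filtering with a length-g filter
    return length


def fb_response_length_closed(levels: int, m: int, g: int) -> int:
    """Closed form (g - 1)(m^levels - 1)/(m - 1) + 1 of the same recursion."""
    return (g - 1) * (m ** levels - 1) // (m - 1) + 1


# Hypothetical example: 2-band FB, 6-tap filters, N = 2^8 symbols per block.
M, G, LEVELS = 2, 6, 8
print(fb_response_length(LEVELS, M, G), M ** LEVELS)
```

For these values the response is several times longer than the block size $N = M^n$, which is why consecutive FB outputs overlap.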
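Next, let us construct the column vector $X^k$, where the superscript $k$ indicates that this vector contains all transmitted symbols that contribute to the $k$th block of detected symbols $Z[k]$ (see Section II-C, (18)). First, all symbols $X_i[k]$ in the $k$th block of transmitted symbols are stacked into the vector
$$X[k] = \begin{bmatrix} X_1[k] & \cdots & X_N[k] \end{bmatrix}^T. \qquad (7)$$
Secondly, all symbol blocks $\underline{k}, \ldots, k, \ldots, \overline{k}$ on which $Z[k]$ depends are stacked into the vector
$$X^k = \begin{bmatrix} X[\underline{k}]^T & \cdots & X[k]^T & \cdots & X[\overline{k}]^T \end{bmatrix}^T. \qquad (8)$$
Then, this vector can be transformed with the overlap-add IDWT transformation matrix $\bar{B}$ defined in (9).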
Several samples at the beginning and end of this transformation are not valid because of zero-padding artifacts. Since $\underline{k}$ and $\overline{k}$ are the indices of the first and last symbol block, respectively, the first valid sample is the one right after the end of the $(\underline{k} - 1)$th FB response, at index $(\underline{k} - 1)N + L_d + 1$. The last valid sample is the one right before the start of the $(\overline{k} + 1)$th FB response, at index $(\overline{k} + 1)N$. Let $I_x^{\min}$ and $I_x^{\max}$ be the indices of the first and last time-domain samples that have to be valid. Then the maximal $\underline{k}$ and minimal $\overline{k}$ follow from these two conditions as
$$\underline{k} = \left\lfloor \frac{I_x^{\min} - L_d - 1}{N} \right\rfloor + 1 \qquad (10)$$
and
$$\overline{k} = \left\lceil \frac{I_x^{\max}}{N} \right\rceil - 1, \qquad (11)$$
respectively. The relevant time-domain samples $x^k$ are selected from the output of the transformation with $\bar{B}$ by means of a selection matrix $S$ that discards the first $I_x^{\min} - (\underline{k}N + 1)$ and last $\overline{k}N + L_d - I_x^{\max}$ rows
$$x^k = S \bar{B} X^k. \qquad (12)$$
The size of this vector $x^k$ is $I_x^{\max} - I_x^{\min} + 1$, which depends on the FB response, channel impulse response and the equalizer order $T$ (see Section II-C). To conclude, the entire transmitter is represented by the matrix $S\bar{B}$. In the next section, the influence of the channel on these transmitted samples is described.
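The overlap-add step and the range of valid samples can be illustrated as follows. This is a sketch under the assumption that the FB response of block $k$ occupies the samples $kN + 1, \ldots, kN + L_d$ (1-based); the function names and the example response are hypothetical.

```python
def overlap_add(block_outputs, hop):
    """Sum equally long per-block FB responses, each delayed by `hop` samples
    relative to the previous one."""
    if not block_outputs:
        raise ValueError("at least one block is required")
    resp_len = len(block_outputs[0])
    total = [0.0] * (hop * (len(block_outputs) - 1) + resp_len)
    for k, blk in enumerate(block_outputs):
        for j, v in enumerate(blk):
            total[k * hop + j] += v
    return total


def valid_range(k_first, k_last, n, l_d):
    """1-based indices (first, last) of the samples that receive no contribution
    from blocks outside k_first..k_last. If first > last, no sample is valid."""
    first = (k_first - 1) * n + l_d + 1   # right after block k_first - 1 ends
    last = (k_last + 1) * n               # right before block k_last + 1 starts
    return first, last


# Hypothetical example: N = 2 samples per block, FB response of L_d = 5 samples.
response = [1.0, 2.0, 3.0, 2.0, 1.0]
signal = overlap_add([response] * 4, hop=2)
print(signal)
print(valid_range(0, 3, 2, 5))
```

Samples outside the returned range are missing the tails of neighbouring blocks, which is why the selection matrix discards them.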

B. CHANNEL
The channel distorts the transmitted samples through a convolution with the channel impulse response. Moreover, it corrupts the received samples with uncorrelated additive noise. Let $h_i^{ch}$ denote the $i$th channel coefficient (i.e. the channel impulse response at lag $i$), then the convolution is represented by a Toeplitz matrix $H$ (13), where $K$ and $L$ are the channel head and tail, respectively. These are defined through the channel delay, which is estimated by finding the maximal energy in a window, similar to the reference delay used in DMT modulation [7]. In this case, the size of the window is chosen equal to the equalizer length $T$. The received samples $y^k$ can then be expressed as
$$y^k = H x^k + n^k. \qquad (14)$$
Hence, because $x^k$ is itself a linear function of $X^k$, $y^k$ can be represented as a linear function of the transmitted symbols $X^k$ and the noise $n^k$. In the next section, the reconstruction of the transmitted symbols from these received samples is described.
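Writing a convolution as a Toeplitz matrix product can be sketched as below. The example assumes a causal impulse response and keeps only the outputs for which all required input samples are available; the function names and channel taps are hypothetical, and the additive noise term is left out for clarity.

```python
def convolution_matrix(h, n_out):
    """Toeplitz matrix H of size n_out x (n_out + len(h) - 1) such that
    (H x)[j] = sum_i h[i] * x[j + len(h) - 1 - i]."""
    taps = len(h)
    rows = []
    for j in range(n_out):
        row = [0.0] * (n_out + taps - 1)
        for i, hv in enumerate(h):
            row[j + taps - 1 - i] = hv   # reversed taps, shifted by one per row
        rows.append(row)
    return rows


def matvec(a, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(av * xv for av, xv in zip(row, x)) for row in a]


def direct_convolution(h, x):
    """Full linear convolution, used here as a reference."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hv in enumerate(h):
        for j, xv in enumerate(x):
            y[i + j] += hv * xv
    return y


h_ch = [1.0, 0.5, 0.25]                  # hypothetical channel taps
x_tx = [1.0, 2.0, 3.0, 4.0, 5.0]         # hypothetical transmitted samples
H_mat = convolution_matrix(h_ch, len(x_tx) - len(h_ch) + 1)
print(matvec(H_mat, x_tx))
print(direct_convolution(h_ch, x_tx)[len(h_ch) - 1:len(x_tx)])
```

Both printed vectors contain the same samples, which confirms that the matrix product reproduces the fully overlapped part of the convolution.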

C. RECEIVER
At the receiver, the DWT is represented by a matrix $A$ that, analogous to $B$ in Section II-A, is a product of transformation matrices $A_i$, where the matrix $A_i$ represents the decomposition operations in the $(n - i + 1)$st analysis level. Mathematically this is represented by equations (16) and (17), where $a_j^i$ is the $j$th filter coefficient of $A_i(z)$. It can again be noticed that each matrix $A_i$ is block-diagonal. The matrix $A$ maps $L_d$ samples onto a block of $N$ symbols. Because of the overlap-add operation in (9), contributions of multiple symbol blocks will be included in the transformation. However, these contributions lie in the null space of the matrix $A$. This is a result of the M-shift property in wavelet transformations, which leads to orthogonality of the basis functions when shifted over $N$ samples. Alternatively, this can be interpreted from a FB theory point of view as a property of perfect reconstruction FBs.
In the absence of channel distortion and noise, careful synchronization can therefore ensure perfect reconstruction of the transmitted symbols. When the channel does distort the transmitted signal, it destroys the orthogonality of the basis functions, and Inter Block Interference (IBI) as well as ICI will occur. Therefore, a TEQ is often used to reduce the channel distortion by convolving the received signal with a filter $w$. The equalizer aims to reconstruct the transmitted signal, up to a delay. This delay is defined as the channel delay plus a decision delay $\delta$, which is a design parameter. This delay has to be taken into account when synchronizing to (approximately) restore the orthogonality of the basis functions. The transmitted block of $N$ symbols is then reconstructed as
$$Z[k] = A\,Y^k\,w \qquad (18)$$
where $Y^k \in \mathbb{R}^{L_d \times T}$ is a Hankel matrix containing the received samples $y^k$, and $Z[k]$ is defined as
$$Z[k] = \begin{bmatrix} Z_1[k] & \cdots & Z_N[k] \end{bmatrix}^T. \qquad (19)$$
The last relevant received time-domain sample is the one at index $I_y^{\max} = kN + L_d + K + \delta$ due to the FB response, channel delay and decision delay. Furthermore, it follows from (18) that $I_y^{\min} = kN + K + \delta + 2$. Finally, $I_x^{\max} = kN + L_d + K + \delta$ and $I_x^{\min} = kN - L + \delta + 2$ because the channel impulse response in (13) is causal with length $K + L + 1$.
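How a Hankel matrix of received samples turns the TEQ filtering into a matrix product can be sketched as follows. The indexing convention $Y[r][c] = y[r + c]$, the toy 2 × 2 Haar analysis matrix and all names are assumptions made for this illustration.

```python
import math


def hankel_matrix(y, n_rows, n_taps):
    """Hankel matrix with entries Y[r][c] = y[r + c] (constant anti-diagonals)."""
    if len(y) != n_rows + n_taps - 1:
        raise ValueError("need exactly n_rows + n_taps - 1 samples")
    return [[y[r + c] for c in range(n_taps)] for r in range(n_rows)]


def matvec(a, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(av * xv for av, xv in zip(row, x)) for row in a]


def teq_then_transform(analysis, y, w):
    """Z = A (Y w): filter with the TEQ taps first, then apply the analysis matrix."""
    n_rows = len(analysis[0])
    filtered = matvec(hankel_matrix(y, n_rows, len(w)), w)
    return matvec(analysis, filtered)


R = 1.0 / math.sqrt(2.0)
HAAR_A = [[R, R], [R, -R]]      # toy analysis matrix for N = 2, L_d = 2
y_rx = [1.0, 2.0, 3.0]          # hypothetical received samples (L_d + T - 1 of them)
w_teq = [0.5, 0.25]             # hypothetical TEQ taps, T = 2
print(teq_then_transform(HAAR_A, y_rx, w_teq))
```

Each row of $Y^k w$ is a sliding inner product of the taps with $T$ consecutive received samples, so the matrix product is an FIR filtering operation written in block form.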
The complexity of the receiver consists of one DWT and a time-domain TEQ filtering operation. It can be shown [20] that this corresponds to the complexity given in (20) per block of $N$ symbols. The TEQ filter coefficients can be determined through optimization based on a zero-forcing or MMSE criterion. In practical realizations, this is often implemented with the Least Mean Squares (LMS) algorithm. This results in an additional $N\,O(T)$ complexity, which is of the same order as (20). Such optimizations are formulated in the time domain. Hence, there is no guarantee that they result in an optimal SINR for each tone. This problem is addressed in the next section.

III. PER-WAVELET EQUALIZATION
Using the TEQ of the previous section, the $i$th reconstructed symbol is calculated as
$$Z_i[k] = A(i,:)\,Y^k\,w \qquad (21)$$
where $A(i,:)$ denotes the $i$th row of $A$. Because matrix multiplication is associative, the order in which the products are evaluated can be changed [21]. This in particular makes it possible to design a filter for each tone separately, i.e. $w_i$ for tone $i$. The collection of filters $w_i$ ($i = 1, \ldots, N$) is called the PWEQ, similar to the PTEQ in [7]. This then leads to a new expression for the $i$th reconstructed symbol
$$Z_i[k] = A(i,:)\,Y^k\,w_i. \qquad (22)$$
Alternatively, the $i$th reconstructed symbol can be expressed directly as a function of $y^k$ through a reordered representation of the convolution operation (23). The MMSE criterion can then be applied directly to the reconstruction error for every symbol separately (24). Substituting (23) into (24) leads to a quadratic optimization problem. The solution of this optimization problem is calculated by equating the gradient with respect to $w_i$ to zero, which leads to the solution (25), where the correlations are expressed as a function of the known statistics of the transmitted symbols and noise
$$R_{yy} = E\{y^k (y^k)^T\} = E\{(H X^k + n^k)(H X^k + n^k)^T\} = H R_{XX} H^T + R_{nn}. \qquad (26)$$
Here, $R_{XX}$ and $R_{nn}$ are the autocorrelation matrices of the transmitted symbols and noise, respectively. Note that (26) requires the noise to be uncorrelated with the transmitted symbols, and that $H$ is understood here as the complete linear map from $X^k$ to $y^k$, i.e. the channel combined with the transmitter matrix of Section II-A.
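For reference, the minimizer of such a quadratic MMSE problem has the familiar Wiener form. The derivation below is a generic sketch, not a reproduction of the paper's equations (23) to (25); the regressor $u_i^k$ and the symbols $R_{uu,i}$ and $r_{uX,i}$ are notation introduced here for illustration, and $R_{uu,i}$ is assumed to be invertible.

```latex
\begin{aligned}
u_i^k &= (Y^k)^T A(i,:)^T \in \mathbb{R}^{T},
\qquad Z_i[k] = A(i,:)\,Y^k\,w_i = w_i^T u_i^k, \\
J(w_i) &= E\{(w_i^T u_i^k - X_i[k])^2\}
        = w_i^T R_{uu,i}\, w_i - 2\, w_i^T r_{uX,i} + E\{X_i[k]^2\}, \\
R_{uu,i} &= E\{u_i^k (u_i^k)^T\},
\qquad r_{uX,i} = E\{u_i^k X_i[k]\}, \\
\nabla_{w_i} J &= 2 R_{uu,i}\, w_i - 2\, r_{uX,i} = 0
\;\Longrightarrow\; w_i = R_{uu,i}^{-1}\, r_{uX,i}.
\end{aligned}
```

Since $u_i^k$ is a linear function of the received samples, both $R_{uu,i}$ and $r_{uX,i}$ can be written in terms of the second-order statistics of the received samples and the transmitted symbols.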
The PWEQ will always perform better than or equal to any TEQ with the same filter order because the solution space for the TEQs is a subset of the solution space for the PWEQ, and the PWEQ design criterion is directly linked to the performance-determining SINR. Indeed, the objective function of the data rate maximizing TEQ [22] reduces to the PWEQ objective function when separate filters are used for each tone. Therefore, the PWEQ is effectively a generalization of the data rate maximizing TEQ. However, the increase in SINR comes at the expense of an increased computational complexity because $T$ DWTs have to be computed instead of one per block of $N$ symbols (which follows from (23)). This corresponds to a computational complexity per block of $N$ symbols that is given by (27). In practical systems this will again be implemented with a gradient descent algorithm. For LMS, this leads to an additional $N\,O(T)$ complexity. This is significant, but remains smaller than the DWT complexity for high wavelet orders. Therefore, the next section develops a more efficient implementation of the DWT.
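The subset argument can be illustrated with a small experiment. The sketch below is a toy setup and not the simulation of Section V: it assumes $N = 2$ tones with a Haar transform, a hypothetical three-tap channel, and least-squares fits on simulated data in place of the statistical MMSE design; every name in it is an assumption.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(feats, targets):
    """Minimise sum (f . w - t)^2 through the normal equations."""
    t = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) for j in range(t)] for i in range(t)]
    rhs = [sum(f[i] * tv for f, tv in zip(feats, targets)) for i in range(t)]
    return solve(gram, rhs)


def mse(feats, targets, w):
    """Mean squared error of the linear estimate f . w."""
    return sum((sum(fi * wi for fi, wi in zip(f, w)) - t) ** 2
               for f, t in zip(feats, targets)) / len(targets)


def simulate(n_blocks=400, taps=3, seed=1):
    """Return [(TEQ MSE, PWEQ MSE)] per tone for the toy system."""
    rng = random.Random(seed)
    r = 1.0 / math.sqrt(2.0)
    analysis = [[r, r], [r, -r]]            # toy Haar analysis matrix A
    channel = [1.0, 0.6, -0.3]              # hypothetical channel taps
    symbols = [[rng.choice((-1.0, 1.0)) for _ in range(2)] for _ in range(n_blocks)]
    x = []
    for blk in symbols:                     # transmitter: x = A^T X per block
        x.extend(sum(analysis[i][j] * blk[i] for i in range(2)) for j in range(2))
    y = [sum(channel[i] * x[m - i] for i in range(len(channel)) if m - i >= 0)
         + rng.gauss(0.0, 0.01) for m in range(len(x))]
    ypad = [0.0] * (taps - 1) + y
    feats = [[], []]                        # feats[i][k] is row i of A Y^k
    for k in range(n_blocks):
        hankel = [[ypad[2 * k + row + col] for col in range(taps)] for row in range(2)]
        for i in range(2):
            feats[i].append([sum(analysis[i][row] * hankel[row][col] for row in range(2))
                             for col in range(taps)])
    targets = [[blk[i] for blk in symbols] for i in range(2)]
    w_teq = least_squares(feats[0] + feats[1], targets[0] + targets[1])  # one shared filter
    results = []
    for i in range(2):
        w_i = least_squares(feats[i], targets[i])                         # one filter per tone
        results.append((mse(feats[i], targets[i], w_teq), mse(feats[i], targets[i], w_i)))
    return results


for tone, (teq_mse, pweq_mse) in enumerate(simulate()):
    print(f"tone {tone}: TEQ MSE = {teq_mse:.4f}, PWEQ MSE = {pweq_mse:.4f}")
```

Because the shared filter is one particular choice for every per-tone filter, the per-tone error on the fitted data can never exceed the shared-filter error on the same tone.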

IV. FILTER BANK IMPLEMENTATION OF PWEQ
The increase in computational complexity is caused by the calculation of $T$ DWTs. This will be referred to as the sliding DWT, which is calculated as $A Y^k$. The columns of $Y^k$ are time shifted over a delay $l = 0, \ldots, T - 1$. The column corresponding to $l = 0$ can be transformed using the FB in

TABLE 1. The Simulation Parameters for ADSL and G.fast
This operation can be repeated $\log_M(T) - 2$ times, which leads to a complexity reduction by a factor $T/M^i$ in level $i$ of the FB. Combining the number of computations at every level leads to the total computational complexity in the receiver [20]. The reduction of the computational complexity will be larger for increasing $T$. However, the resulting total computational complexity will always be larger than for a TEQ. On the other hand, the simulations in the next section will also show that a PWEQ can often use a lower filter order than a TEQ.

V. SIMULATIONS
In this section the simulated performance of the PWEQ is discussed and compared against an MMSE TEQ for DWMT and a PTEQ for DMT. The simulation aims to determine the SINR on each tone. To achieve this goal, the variance on the reconstructed symbols is determined using Monte Carlo simulations. For PTEQ, the SINR is calculated directly from the second order statistics of the noise and transmitted symbols [21].
Two standardized wireline communication systems, namely Asymmetric Digital Subscriber Line (ADSL) and G.fast, will be simulated in a Single Input Single Output (SISO) setup. Their channel impulse responses have been measured and provided by a Tier I telecommunication operator. Their frequency responses are shown in Figs. 4 and 5. Table 1 contains the simulation parameters for both standards. In this table, $N$ denotes the number of symbols that are modulated independently in one block. For DMT, this is not equal to the block size because of the conjugate flip operation and insertion of a cyclic prefix. This choice for the definition of $N$ ensures equal sub-band width $\Delta f$ for each tone, and results in the symbol rates given in (30) and (31).
In this paper only the Fejér-Korovkin (FK) wavelet family is used because it has superior performance [24] due to its asymptotic approximation of a rectangular frequency response [25]. If all transmitted symbols have equal variance $\sigma_X^2 = \sigma_{X_i}^2$ ($i = 1, \ldots, N$) and are independent, the Power Spectral Density (PSD) of the transmitted signal is flat, $P_x(f) = \sigma_X^2 / f_s$. The noise is assumed to be white and Gaussian with variance $\sigma_n^2$. This variance is chosen such that the noise PSD $P_n(f) = \sigma_n^2 / f_s$ is 100 dB lower than $P_x(f)$.
The remainder of this section is split into two parts. In Section V-A the different systems are compared in terms of SINR per tone. Then, in Section V-B the maximal data rates are estimated from these SINRs.

A. SINR
First, Fig. 6 plots the SINR per tone for the ADSL system with an equalizer length of T = 16. For these settings, there is a significant improvement for the PWEQ compared to the TEQ for FK wavelets of order six (FK-6). Furthermore, increasing the wavelet order to fourteen (FK-14) results in another 10 dB increase of the SINR. However, PTEQ for DMT is still better than PWEQ and TEQ for DWMT.
Next, Fig. 7 shows that increasing the filter order to T = 64 results in comparable performance for all methods, with only a minor decrease in SINR for the TEQ and low order wavelets. In this scenario, the performance is limited by the channel noise instead of channel distortion, which is independent of the applied method. Thus, all methods have similar performance. However, wavelet based methods do not use a cyclic prefix and will therefore be able to achieve slightly higher data rates than DMT.
The results for a 250 m G.fast line and T = 64 are shown in Fig. 8. PWEQ with FK-14 outperforms PTEQ with ν = 128 for certain low frequencies. On the other hand, PWEQ appears to be less robust against sharp dips in the channel frequency response than PTEQ. Increasing the cyclic prefix length to ν = 320 improves the SINR at low frequencies by more than 10 dB. Therefore, the PTEQ with a long prefix performs best for this scenario. The periodic changes in SINR are probably caused by the different spectral overlap between basis functions, which results from the dyadic FB structure in Figs. 1 and 2. When splitting a signal into subband components, the edge of the spectrum overlaps with the adjacent frequency band because it deviates from the ideal brick wall frequency response. Because some subbands will have been at the edge several times and others only once, they have different levels of spectral containment. Fig. 9 plots the results for an equalizer length of T = 256. Again, the performance for high filter orders is comparable for all methods. Here, the shape of the SINR curve does not follow the channel frequency response of Fig. 5, whereas for the ADSL channel the SINR curve in Fig. 7 does follow the frequency response in Fig. 4, except at very low frequencies. Therefore, the performance for G.fast is still limited by residual channel distortion rather than channel noise. This is a consequence of the higher signal amplitude in G.fast because the lines are much shorter. Since both PTEQs have near identical SINR graphs, the shortest ν will result in the highest data rate.
Finally, the results for a 728 m G.fast channel are plotted in Figs. 10 and 11 for T = 32 and T = 128, respectively. As for ADSL, the TEQ performs worse than the PWEQ and increasing the wavelet order boosts performance. Increasing the filter order improves the SINR for TEQ and PWEQ slightly for low frequencies, while it has almost no effect on PTEQ. Overall, PTEQ with ν = 320 is the best method for this channel. To summarize, the PWEQ consistently increases the SINR on the received symbols compared to the TEQ. Increasing the order of the wavelet can improve the performance significantly but also increases the computational complexity. For high filter orders, the PWEQ performance is comparable to the PTEQ in ADSL and short G.fast lines. In long G.fast lines, a PTEQ consistently performs better.

B. CAPACITY
The data rate at constant BER is a good measure of performance to compare different communication systems. However, this depends on the power allocation and bit loading scheme of the DMT and DWMT. Therefore, an estimate of this data rate is made, starting from the Shannon capacity formula [26]. The number of bits $b_i$ that could maximally be loaded onto symbol $i$ with arbitrarily small BER is the Shannon capacity divided by the symbol rate
$$b_i = \frac{C}{F_s} = \frac{W}{F_s} \log_2(1 + \mathrm{SINR}_i) = \alpha \log_2(1 + \mathrm{SINR}_i)$$
where $C$ is the capacity of a subband, $W$ is its bandwidth and $\mathrm{SINR}_i$ is the SINR on tone $i$. The factor $\alpha = W/F_s$ is equal to 1 for DMT and 1/2 for DWMT. This compensates for the factor of two in (30) and (31) so that comparable SINR graphs lead to comparable data rates. However, practical methods cannot reach the Shannon capacity and the SINR is often artificially reduced by a factor SGA to satisfy the BER limit [27]
$$b_i = \alpha \log_2\left(1 + \frac{\mathrm{SINR}_i}{\mathrm{SGA}}\right).$$
This is referred to as the Standard Gap Approximation (SGA) because it assumes the margin to the Shannon capacity, to achieve a desired data rate, does not depend on the number of bits per symbol. In this paper, the value SGA = 15.8 dB is used for uncoded transmission with (Q)AM, including a noise margin of 6 dB. The data rate can then be estimated as $F_s \sum_i b_i$. Tables 2, 3 and 4 list the estimated data rates for all channels and high equalizer orders. For the ADSL and short G.fast systems, the removal of the cyclic prefix results in a higher data rate for PWEQ compared to the PTEQ. However, for long G.fast lines the higher SINR of the PTEQ outweighs this effect and PTEQ achieves the highest data rate.
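The gap-based bit loading estimate can be sketched as follows. The code assumes that SINR values and the gap are given in dB and that the value 15.8 is a dB figure (consistent with the 6 dB noise margin it includes); the function names, SINR values and symbol rate are hypothetical and do not come from Table 1.

```python
import math


def db_to_linear(x_db: float) -> float:
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (x_db / 10.0)


def bits_per_symbol(sinr_db: float, alpha: float, gap_db: float = 15.8) -> float:
    """b_i = alpha * log2(1 + SINR_i / gap), with SINR_i and gap given in dB."""
    return alpha * math.log2(1.0 + db_to_linear(sinr_db) / db_to_linear(gap_db))


def data_rate(sinr_db_per_tone, alpha: float, symbol_rate_hz: float,
              gap_db: float = 15.8) -> float:
    """Estimated data rate F_s * sum_i b_i in bit/s."""
    return symbol_rate_hz * sum(bits_per_symbol(s, alpha, gap_db)
                                for s in sinr_db_per_tone)


# Hypothetical SINR profile and symbol rate, for illustration only.
sinr_profile_db = [45.0, 40.0, 35.0, 30.0]
print(data_rate(sinr_profile_db, alpha=0.5, symbol_rate_hz=8000.0))
```

With α = 1/2 each real-valued DWMT symbol carries half as many bits as a complex DMT tone at the same SINR, which is offset by the higher DWMT symbol rate.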

VI. CONCLUSION
In this paper, a Per-Wavelet EQualizer (PWEQ) is proposed, which is applicable to any type of wavelet and directly maximizes the SINR per symbol. An efficient FB implementation of the sliding DWT has been derived to reduce the computational complexity but it remains significantly higher than that of a TEQ.
By means of simulations, the performance of the proposed equalizer has been investigated for a wireline communication system. It has been verified empirically that the PWEQ is indeed an upper bound on the TEQ performance, for the same number of filter taps. Moreover, the PWEQ shows similar (or nearly similar) performance to the PTEQ implementation of DMT for ADSL and 250 m G.fast lines and high equalizer orders.

VII. FUTURE WORK
The proposed PWEQ can be viewed as a stepping stone towards a truly practical DWMT system, where further complexity reduction is still required. Therefore, future research could focus on the reduction of the computational complexity of PWEQ. This can be achieved by exploiting the fact that each subband signal only covers one Nth of the spectrum. Equivalent baseband processing can therefore be applied, which removes the need for a sliding DWT entirely. This approach was followed in ASCET [16] but can have a negative impact on the SINR because the subband signals are not perfectly bandlimited and aliasing effects will occur. Therefore, choosing a downsampling factor 1 < D < N is advised and its value can be varied to optimize for performance or complexity. The computational complexity of both the filter and the FB implementation of the sliding DWT decreases approximately with the downsampling factor because all delay values in Fig. 3 are multiples of D.