Sidelobe Suppression for Multicarrier Signals via Structured Spectral Precoding

Reducing the large sidelobes of multicarrier signals is crucial to prevent adjacent channel interference. Spectral precoding is an effective approach toward this goal, at the expense of throughput loss due to precoder redundancy; thus, it is of interest to explore alternative precoder designs with improved performance at lower redundancies. We present a novel precoder which minimizes radiated power within a user-selectable frequency region. The structure of the precoding matrix is chosen to allow efficient mitigation of in-band distortion at the receiver by means of iterative and successive interference cancellation, while completely avoiding distortion to protected and pilot subcarriers. By exploiting the low-rank properties of constituent blocks, computational complexity can be significantly reduced with little impact on sidelobe reduction. Simulation results show the benefits of the proposed design, which is particularly effective in redundancy-limited settings targeting high spectral efficiency.

Khawar Hussain, Member, IEEE, Roberto López-Valcarce, Senior Member, IEEE, Francesc Rey, Member, IEEE, Josep Sala-Alvarez, Senior Member, IEEE, and Javier Villares, Senior Member, IEEE

I. INTRODUCTION
Orthogonal frequency division multiplexing (OFDM) has been adopted as the main signaling format in many wireless communications standards, including 5G New Radio [1] and IEEE 802.11ax (Wi-Fi 6) [2], due to its inherent advantages: it is spectrally efficient, provides robustness against channel dispersion, and is well-matched to multiple-input multiple-output (MIMO) operation. Despite these advantages, the power spectral density (PSD) of OFDM signals suffers from large sidelobes, causing high out-of-band radiation (OBR) which results in significant levels of adjacent channel interference. Inserting guard bands by turning off subcarriers is a simple but very inefficient means to address this problem due to the slow decay of sidelobes. Signal filtering [3] and windowing (pulse shaping) [4], [5], [6], [7] are also straightforward, but they reduce the effective length of the cyclic prefix (CP), whereas multiple-choice sequence techniques [8], [9] require the transmission of side information with each symbol, increasing system overhead. Data-dependent techniques have also been proposed, including constellation expansion [10], subcarrier weighting [11], [12], and phase adjustment [13]; they suffer from high online complexity, since they require solving an optimization problem for each OFDM symbol.
Spectral precoding, by which the active subcarriers are modulated by a suitable function of the information symbols, is another approach to reduce OBR [14], [15], [16], [17], [18], [19], [20], [21], [22]. Since it generally introduces in-band distortion, some appropriate decoding may be required at the receiver to mitigate symbol error rate (SER) degradation. Active interference cancellation (AIC) [23], [24], [25], [26], [27], [28], [29], [30] constitutes an exception, as a particular case of spectral precoding in which data symbols are directly mapped to their subcarriers, whereas a few additional cancellation subcarriers, computed as a linear combination of data symbols, are reserved and used for OBR reduction. Although this process is distortionless and hence transparent to the receiver, which merely discards cancellation subcarriers, its effectiveness is limited. In contrast, orthogonal precoders [31], [32], [33], [34] use a precoding matrix with orthonormal columns which does introduce in-band distortion, although its effect can be readily corrected at the receiver without noise enhancement due to the orthonormality property of the precoding matrix. Orthogonal precoders significantly outperform AIC schemes in terms of OBR reduction, but at the price of increased computational complexity at both transmitter and receiver.
With spectral precoding, the difference between the total number of modulated subcarriers and the number of information symbols per block can be thought of as the redundancy of the precoder (e.g., the number of cancellation subcarriers in AIC). Increasing this redundancy results in more degrees of freedom available for OBR reduction, but with the corresponding penalty in spectral efficiency, since fewer data symbols per block can be transmitted. At the other extreme, neither AIC nor orthogonal precoding can provide any OBR reduction with zero redundancy. Thus, it is of interest to seek new low-redundancy precoder designs achieving sufficient sidelobe suppression, possibly at the cost of additional computational complexity, either at the transmitter in order to implement the precoding operation, or at the receiver to compensate for in-band distortion [15], [16], [17], [18], [19], [20], [21]. The designs from [19] and [20] provide a step in this direction: the amount of in-band distortion introduced by the precoder can be controlled at the design step, and mitigated at the receiver by means of iterative decoding, as originally proposed in [15]. Larger distortion levels improve OBR performance, but also require a larger number of iterations at the decoder side.
Motivated by the above considerations, we propose a novel spectral precoder design providing additional flexibility in the tradeoff between OBR reduction and complexity. In particular, and building upon our preliminary work in [20], we combine a precoding block with distortion control on a per-subcarrier basis with a strictly lower triangular band (SLTB) block, whose joint optimization results in improved sidelobe suppression. Moreover, the SLTB structure allows successive interference cancellation (SIC) to be applied at the receiver, which effectively limits further SER degradation. Following [19], [20], [28], and [33], the proposed design aims to directly minimize OBR over a selectable frequency range. The obtained precoding matrices can be computed offline; in addition, some of these matrices have (approximately) low rank, a property which can be exploited to further reduce online computational cost.
The main contributions of the paper are summarized next:
1) A new structure is proposed for the spectral precoding matrix. Active subcarriers are partitioned into data and cancellation subcarriers, as in AIC, but in-band distortion is allowed in order to improve OBR performance. To counteract its effect, an iterative SIC decoding scheme at the receiver is put forward, which motivates the introduction of a lower triangular block at the precoder side.
2) This lower triangular block is combined with an unstructured low-distortion block for further improvement. Appropriate constraints on in-band distortion facilitate the task of the iterative decoder to avoid error propagation and SER degradation. The proposed structure allows for (but does not require) the inclusion of protected subcarriers, e.g., pilots, which remain free of in-band distortion.
3) Based on this structure, the precoder coefficients are computed in order to minimize OBR, defined as the integral of the weighted PSD over a selectable frequency range. Although the resulting problem is convex, the number of variables and constraints is large, so we propose an alternative low-complexity scheme to iteratively seek a suboptimal solution.
4) We provide numerical examples to validate our design.
As it turns out, some of the blocks comprising the precoder can be well approximated by low-rank matrices; similarly, the triangular block of the precoder can be well approximated by a band matrix. These facts result in significant savings in online computational complexity.
The rest of the paper is organized as follows. The signal model is given in Sec. II, and the proposed precoder structure is presented in Sec. III. The optimization problem to obtain the precoder matrices is discussed in Sec. IV, and online complexity is analyzed in Sec. V. Numerical results are provided in Sec. VI, and Sec. VII concludes the paper.
Notation: Vectors and matrices are respectively denoted by boldface lowercase and boldface uppercase symbols. $\|\mathbf{A}\|_F$, $\mathbf{A}^T$ and $\mathbf{A}^H$ respectively denote the Frobenius norm, the transpose, and the conjugate transpose of $\mathbf{A}$. The $n \times n$ identity matrix is denoted as $\mathbf{I}_n$, and its $i$-th column is denoted by $\mathbf{e}_i$. The Euclidean norm of a vector $\mathbf{v}$ and the trace of a square matrix $\mathbf{A}$ are respectively denoted as $\|\mathbf{v}\|$ and $\operatorname{tr}\mathbf{A}$. The Kronecker delta is denoted as $\delta[m]$, and $E\{\cdot\}$ is the expectation operator. Blank blocks in block-partitioned matrices correspond to all-zero blocks.

II. PROBLEM STATEMENT

A. Signal Model
Consider a CP-OFDM signal generated with an IFFT of size $N$ and cyclic prefix size of $N_{cp}$ samples. Let $\mathcal{K} = \{k_1, k_2, \ldots, k_K\}$ denote the set of indices of the $K \leq N$ active subcarriers, and let $x_k[m]$ be the data modulated on the $k$-th subcarrier in the $m$-th OFDM symbol. The baseband samples of the multicarrier signal are then given by
$$ s[n] = \sum_{m=-\infty}^{\infty} \sum_{k \in \mathcal{K}} x_k[m]\, h_P[n - mL]\, e^{j \frac{2\pi}{N} k (n - mL - N_{cp})}, \qquad (1) $$
where $L = N + N_{cp}$ is the symbol length in samples, and the shaping pulse $h_P[n]$ equals 1 for $n = 0, 1, \ldots, L-1$ and zero elsewhere. The (baseband) continuous-time multicarrier signal is obtained as the output of a digital-to-analog converter (DAC) with sampling frequency $f_s = 1/T_s$ and input $s[n]$:
$$ s(t) = \sum_{n=-\infty}^{\infty} s[n]\, h_I(t - nT_s), \qquad (2) $$
where $h_I(t)$ is the impulse response of the interpolation filter in the DAC. The subcarrier spacing, in Hz, is then given by $\Delta_f = \frac{1}{N T_s}$. The allocation of the $K$ active subcarriers is based on four subcarrier types as follows:
• $K_u$ subcarriers are dedicated to sending unprotected data, i.e., data that may be altered by the precoding operation. The unprotected data in the $m$-th symbol are collected in the vector $\mathbf{d}_u[m] \in \mathbb{C}^{K_u}$. We define $\mathbf{S} \in \mathbb{C}^{N \times K_u}$ as the matrix comprising the $K_u$ columns of $\mathbf{I}_N$ whose indices correspond to the location of the unprotected data subcarriers.
• $K_p$ subcarriers are dedicated to sending protected data, i.e., data that should not be distorted by the precoding operation; these are collected in $\mathbf{d}_p[m] \in \mathbb{C}^{K_p}$. Protecting data may be necessary in the presence of legacy users which remain oblivious to the precoding operation at the transmitter, or to send side information about the precoder itself [35]. We define $\mathbf{R}_p \in \mathbb{C}^{N \times K_p}$ as the matrix comprising the $K_p$ columns of $\mathbf{I}_N$ whose indices correspond to the location of the protected data subcarriers.
• $K_t$ subcarriers are dedicated to sending training data (pilots), which are known at the receiver and should not be distorted by the precoding operation either; these are collected in $\mathbf{d}_t[m] \in \mathbb{C}^{K_t}$. We define $\mathbf{R}_t \in \mathbb{C}^{N \times K_t}$ as the matrix comprising the $K_t$ columns of $\mathbf{I}_N$ whose indices correspond to the location of the training subcarriers.
• $K_c$ subcarriers are reserved for OBR reduction. We define $\mathbf{T} \in \mathbb{C}^{N \times K_c}$ as the matrix comprising the $K_c$ columns of $\mathbf{I}_N$ whose indices correspond to the location of these cancellation subcarriers.
Thus, one has $K = K_u + K_p + K_t + K_c$. Note that the allocation matrices $\mathbf{S}$, $\mathbf{R}_p$, $\mathbf{R}_t$, $\mathbf{T}$ are all semi-unitary and pairwise orthogonal.
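The vector modulating the subcarriers in the $m$-th symbol is obtained by linearly precoding the data and the pilots by means of the precoding matrix $\mathbf{G} \in \mathbb{C}^{N \times (K_u+K_p+K_t)}$. Specifically, letting
$$ \mathbf{d}[m] = \begin{bmatrix} \mathbf{d}_u^T[m] & \mathbf{d}_p^T[m] & \mathbf{d}_t^T[m] \end{bmatrix}^T, \qquad \mathbf{R} = \begin{bmatrix} \mathbf{0} & \mathbf{R}_p & \mathbf{R}_t \end{bmatrix}, $$
one has
$$ \mathbf{x}[m] = \mathbf{G}\,\mathbf{d}[m], \qquad \mathbf{G} = \mathbf{S}\mathbf{P} + \mathbf{T}\mathbf{Q} + \mathbf{R}, $$
where $\mathbf{P} \in \mathbb{C}^{K_u \times (K_u+K_p+K_t)}$ and $\mathbf{Q} \in \mathbb{C}^{K_c \times (K_u+K_p+K_t)}$ are parameters to be designed. We assume that the subcarrier allocation, defined by the choice of $\mathbf{S}$, $\mathbf{T}$, $\mathbf{R}_p$ and $\mathbf{R}_t$, is given. Note that the protected and training data are directly mapped to the corresponding entries of $\mathbf{x}[m]$ without distortion: since the allocation matrices are pairwise orthogonal, $\mathbf{R}_p^H \mathbf{x}[m] = \mathbf{d}_p[m]$ and $\mathbf{R}_t^H \mathbf{x}[m] = \mathbf{d}_t[m]$. Also note that the $N - K$ entries of $\mathbf{x}[m]$ with indices not in $\mathcal{K}$ are zero, as they correspond to unmodulated subcarriers.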
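It is assumed that the sequence $\mathbf{d}_t[m]$ modulated on pilot subcarriers is chosen as pseudo-random with sufficiently long repetition period, in order to avoid undesirable line components in the power spectrum. Thus, $\mathbf{d}_t[m]$ behaves approximately as a truly random sequence, and will be regarded as such. In particular, we assume that
$$ E\{\mathbf{d}[m]\,\mathbf{d}^H[m']\} = \mathbf{C}\,\delta[m - m'], $$
where $\mathbf{C}$ is positive definite diagonal, given by
$$ \mathbf{C} = \begin{bmatrix} \mathbf{I}_{K_u} & \\ & \mathbf{C}_{pt} \end{bmatrix}, \qquad \mathbf{C}_{pt} = \operatorname{diag}\left( \gamma_{p,1}^2, \ldots, \gamma_{p,K_p}^2, \gamma_{t,1}^2, \ldots, \gamma_{t,K_t}^2 \right). $$
Thus, unprotected data have unit variance, whereas the $k$-th entries of the protected and training data vectors have variances $\gamma_{p,k}^2$ and $\gamma_{t,k}^2$, respectively. We allow for $\mathbf{C}_{pt} \neq \mathbf{I}$, since it may be desirable to allocate extra power to protected data and pilots. The precoding operation will distort the unprotected data subcarriers whenever $\mathbf{P} \neq \mathbf{J}$, where
$$ \mathbf{J} = \begin{bmatrix} \mathbf{I}_{K_u} & \mathbf{0} \end{bmatrix} \in \mathbb{C}^{K_u \times (K_u+K_p+K_t)}, \qquad (7) $$
since in that case $\mathbf{S}^H \mathbf{x} \neq \mathbf{d}_u$. In general, this in-band distortion will degrade the symbol error rate, unless appropriate measures are taken; typically, these include the application of some decoding algorithm at the receiver exploiting the knowledge of the precoding matrix $\mathbf{G}$ and possibly the finite alphabet property of the entries of $\mathbf{d}$.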
Note that unprecoded transmission with null subcarriers corresponds to the case $\mathbf{P} = \mathbf{J}$ and $\mathbf{Q} = \mathbf{0}$, i.e., the data subcarriers are undistorted and the cancellation subcarriers are all set to zero. The case $\mathbf{P} = \mathbf{J}$ with $\mathbf{Q}$ optimized to yield low OBR corresponds to AIC; this approach completely avoids data distortion and is transparent to the receiver, which simply discards the cancellation subcarriers. We pursue a more general approach in which both $\mathbf{P}$ and $\mathbf{Q}$ are optimized, under appropriate constraints.
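The subcarrier allocation and the precoder structure $\mathbf{G} = \mathbf{S}\mathbf{P} + \mathbf{T}\mathbf{Q} + \mathbf{R}$ can be illustrated with a small NumPy sketch. The toy sizes and subcarrier indices below are hypothetical and chosen only for illustration, and the block layout $\mathbf{R} = [\mathbf{0}\ \mathbf{R}_p\ \mathbf{R}_t]$ is an assumption consistent with the data ordering (unprotected, protected, training). The sketch checks that protected and training symbols pass through undistorted for arbitrary $\mathbf{P}$ and $\mathbf{Q}$, and that $\mathbf{P} = \mathbf{J}$, $\mathbf{Q} = \mathbf{0}$ leaves the unprotected data untouched.

```python
import numpy as np

def selection_matrix(N, idx):
    """Columns of the N x N identity selected by idx (a subcarrier allocation matrix)."""
    return np.eye(N)[:, idx]

# Toy allocation (hypothetical indices, for illustration only).
N = 16
u_idx = [2, 3, 5, 6, 8, 9]   # unprotected data subcarriers
p_idx = [4]                  # protected data subcarrier
t_idx = [7]                  # training (pilot) subcarrier
c_idx = [1, 10]              # cancellation subcarriers

S, Rp, Rt, T = (selection_matrix(N, i) for i in (u_idx, p_idx, t_idx, c_idx))
Ku, Kp, Kt, Kc = len(u_idx), len(p_idx), len(t_idx), len(c_idx)
M = Ku + Kp + Kt             # number of symbols per block

def build_precoder(P, Q):
    """Assemble G = S P + T Q + R, with R = [0  Rp  Rt] (assumed block layout)."""
    R = np.hstack([np.zeros((N, Ku)), Rp, Rt])
    return S @ P + T @ Q + R

rng = np.random.default_rng(0)
P = rng.standard_normal((Ku, M)) + 1j * rng.standard_normal((Ku, M))
Q = rng.standard_normal((Kc, M)) + 1j * rng.standard_normal((Kc, M))
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)
d_u, d_p, d_t = d[:Ku], d[Ku:Ku + Kp], d[Ku + Kp:]

# Arbitrary P and Q: protected and training symbols are never distorted.
x = build_precoder(P, Q) @ d
protected_ok = np.allclose(Rp.T @ x, d_p) and np.allclose(Rt.T @ x, d_t)

# Subcarriers outside the active set stay at zero.
inactive = np.setdiff1d(np.arange(N), u_idx + p_idx + t_idx + c_idx)
inactive_ok = np.allclose(x[inactive], 0)

# Unprecoded case P = J, Q = 0: unprotected data are undistorted as well.
J = np.hstack([np.eye(Ku), np.zeros((Ku, Kp + Kt))])
x0 = build_precoder(J, np.zeros((Kc, M))) @ d
undistorted_ok = np.allclose(S.T @ x0, d_u)
```

Because the allocation matrices are real selections of identity columns, plain transposes stand in for conjugate transposes here.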

B. Power Spectral Density
Let $H_P(e^{j\omega}) = \sum_n h_P[n]\, e^{-j\omega n}$ and $H_I(f) = \int_{-\infty}^{\infty} h_I(t)\, e^{-j 2\pi f t}\, dt$ be the Fourier transforms of the shaping pulse and DAC interpolation filter, respectively. As shown in [37], the CP-OFDM signal $s(t)$ in (2) is cyclostationary with period $LT_s$, and its PSD depends on the precoding matrix $\mathbf{G}$ and on the symbol covariance $\mathbf{C}$. Given a weighting function $W(f) \geq 0$ $\forall f$, the corresponding weighted power $P_W$ is given in (13) in terms of a positive (semi-)definite matrix determined by $W(f)$. To quantify OBR, $W(f)$ can be selected to emphasize certain frequency regions over others; in the simplest case, if $\mathcal{B} \subset \mathbb{R}$ is the set of frequencies over which OBR is to be minimized, one can take $W(f) = 1$ for $f \in \mathcal{B}$ and zero otherwise. Our goal is to minimize $P_W$, given in (13), with respect to $\mathbf{P}$ and $\mathbf{Q}$, and subject to appropriate constraints on the total transmit power as well as on the distortion introduced on the data subcarriers, which has a direct impact on decoding complexity at the receiver.

III. PRECODER STRUCTURE
The structure of the precoder impacts both sidelobe suppression capability and in-band distortion. The latter may lead to SER degradation, depending on the decoding strategy applied at the receiver. In this section we present a decoding scheme which, although suboptimal, has low complexity and suggests a suitable structure for the precoder matrix.
At the receiver end, after time and frequency synchronization, the CP is removed and an $N$-point FFT is applied. From these $N$ samples, those corresponding to the $K_c$ cancellation subcarriers are discarded. Of the remaining ones, the subset of pilot subcarriers can be used for channel estimation and synchronization, since its elements $\mathbf{d}_t$ are known and have not been distorted by the precoding operation. After applying frequency-domain equalization, the vectors $\mathbf{r}_p \in \mathbb{C}^{K_p}$ and $\mathbf{r}_u \in \mathbb{C}^{K_u}$ of protected and unprotected data subcarriers, respectively, are available, and the data vectors $\mathbf{d}_p$, $\mathbf{d}_u$ must be recovered from them. Partition $\mathbf{P}$ as
$$ \mathbf{P} = \begin{bmatrix} \mathbf{P}_u & \mathbf{P}_{pt} \end{bmatrix}, \qquad \mathbf{P}_u \in \mathbb{C}^{K_u \times K_u}, \quad \mathbf{P}_{pt} \in \mathbb{C}^{K_u \times (K_p+K_t)}. $$
For convenience, we also define $\mathbf{d}_{pt} = [\mathbf{d}_p^T \ \mathbf{d}_t^T]^T$. Then, assuming perfect channel estimation and zero-forcing equalization, one has
$$ \mathbf{r}_p = \mathbf{d}_p + \mathbf{w}_p, \qquad (17) $$
$$ \mathbf{r}_u = \mathbf{P}_u \mathbf{d}_u + \mathbf{P}_{pt} \mathbf{d}_{pt} + \mathbf{w}_u, \qquad (18) $$
where $\mathbf{w}_p$, $\mathbf{w}_u$ are the corresponding noise vectors.
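Let $\mathrm{DEC}\{\cdot\}$ be an entrywise operator returning for each entry its closest point in the constellation. Noting from (17) that the protected data symbols can be readily estimated as $\hat{\mathbf{d}}_p = \mathrm{DEC}\{\mathbf{r}_p\}$, we can subtract the effect of the protected and training data from (18) to obtain
$$ \mathbf{r}_u - \mathbf{P}_{pt} \begin{bmatrix} \hat{\mathbf{d}}_p \\ \mathbf{d}_t \end{bmatrix} \approx \mathbf{P}_u \mathbf{d}_u + \mathbf{w}_u, \qquad (19) $$
where it was assumed that $\hat{\mathbf{d}}_p \approx \mathbf{d}_p$. In the sequel, $\mathbf{r}_u$ denotes the observation in (19) after this subtraction. The receiver needs to recover $\mathbf{d}_u$ given $\mathbf{r}_u$ in (19); to this end, the structure of the precoding matrix $\mathbf{P}_u$ has to be conducive to efficient decoding. Thus, we propose to constrain $\mathbf{P}_u$ to be of the form
$$ \mathbf{P}_u = \boldsymbol{\Pi}\left( \mathbf{I}_{K_u} + \boldsymbol{\Theta} + \boldsymbol{\Delta} \right), \qquad (20) $$
with $\boldsymbol{\Pi}, \boldsymbol{\Theta}, \boldsymbol{\Delta} \in \mathbb{C}^{K_u \times K_u}$ such that:
• $\boldsymbol{\Pi}$ is a permutation matrix;
• $\boldsymbol{\Theta}$ is a strictly lower triangular band matrix [38] with bandwidth $b$, i.e., $\Theta_{k\ell} = 0$ for $\ell \geq k$ and for $\ell < k - b$;
• $\boldsymbol{\Delta}$ is a full matrix with small elements (as described below).
The above structure is motivated by the following iterative decoding procedure, which exploits the finite-alphabet property of data. Let $I_{\max}$ be the maximum number of iterations, and initialize $\hat{\mathbf{d}}_u^{(0)} = E\{\mathbf{d}_u\} = \mathbf{0}$. Then, for $i = 1, 2, \ldots, I_{\max}$, we first compute the intermediate variable
$$ \mathbf{z}^{(i)} = \boldsymbol{\Pi}^T \mathbf{r}_u - \boldsymbol{\Delta}\, \hat{\mathbf{d}}_u^{(i-1)}, \qquad (21) $$
and then, since $\boldsymbol{\Theta}$ is strictly lower triangular, apply SIC to obtain the next estimate
$$ \hat{d}_{u,k}^{(i)} = \mathrm{DEC}\Big\{ z_k^{(i)} - \sum_{\ell < k} \Theta_{k\ell}\, \hat{d}_{u,\ell}^{(i)} \Big\}, \qquad k = 1, \ldots, K_u. \qquad (22) $$
The choice of permutation $\boldsymbol{\Pi}$ sets the decoding order in the SIC process (21)-(22), and is assumed fixed. Note that, by the band property of $\boldsymbol{\Theta}$, products $\Theta_{k\ell}\, \hat{d}_{u,\ell}^{(i)}$ are zero for $\ell < k - b$, and need not be computed in (22). By selecting $b$, complexity can be traded off against OBR reduction performance: larger values of $b$ result in more degrees of freedom for $\boldsymbol{\Theta}$, but also increase the number of complex products required in (22). We denote the set of strictly lower triangular $K_u \times K_u$ band matrices with bandwidth $b$ by $\mathcal{L}_b^{K_u}$. The effect of the distortion matrix $\boldsymbol{\Delta}$ is subtracted in step (21) based on the estimates from the previous iteration; thus, in order to limit error propagation, the size of the elements of $\boldsymbol{\Delta}$ should not be too large, as discussed in Sec. IV.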
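The iterative SIC decoding procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the observation model $\mathbf{r}_u = \boldsymbol{\Pi}(\mathbf{I} + \boldsymbol{\Theta} + \boldsymbol{\Delta})\mathbf{d}_u$ plus noise, an intermediate step of the form $\boldsymbol{\Pi}^T\mathbf{r}_u - \boldsymbol{\Delta}\hat{\mathbf{d}}_u^{(i-1)}$, and unit-power QPSK symbols. The function names `dec` and `sic_decode` and all toy parameters are hypothetical.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def dec(v, constellation=QPSK):
    """Entrywise slicer: map each entry to its nearest constellation point."""
    v = np.atleast_1d(v)
    nearest = np.argmin(np.abs(v[:, None] - constellation[None, :]), axis=1)
    return constellation[nearest]

def sic_decode(r_u, Pi, Theta, Delta, b, n_iter=2):
    """Iterative SIC decoding of r_u ~ Pi (I + Theta + Delta) d_u + noise."""
    Ku = r_u.size
    y = Pi.T @ r_u                         # undo the permutation (Pi is real orthogonal)
    d_hat = np.zeros(Ku, dtype=complex)    # initial estimate E{d_u} = 0
    for _ in range(n_iter):
        z = y - Delta @ d_hat              # cancel Delta using the previous iteration
        new = np.zeros(Ku, dtype=complex)
        for k in range(Ku):
            lo = max(0, k - b)             # band property: only b past decisions matter
            interference = Theta[k, lo:k] @ new[lo:k]
            new[k] = dec(z[k] - interference)[0]
        d_hat = new
    return d_hat

# Noise-free toy example (all sizes hypothetical).
rng = np.random.default_rng(1)
Ku, b, eps = 16, 3, 0.005
A = rng.standard_normal((Ku, Ku)) + 1j * rng.standard_normal((Ku, Ku))
Theta = np.tril(A, -1) - np.tril(A, -(b + 1))          # keep subdiagonals 1..b only
Delta = rng.standard_normal((Ku, Ku)) + 1j * rng.standard_normal((Ku, Ku))
Delta *= np.sqrt(eps) / np.linalg.norm(Delta, axis=1, keepdims=True)  # row ICI = eps
Pi = np.eye(Ku)[rng.permutation(Ku)]
d_u = QPSK[rng.integers(0, 4, Ku)]
r_u = Pi @ (np.eye(Ku) + Theta + Delta) @ d_u
d_hat = sic_decode(r_u, Pi, Theta, Delta, b)
```

In this noise-free toy case every row of $\boldsymbol{\Delta}$ has squared norm 0.005, so the residual interference seen by the slicer is at most $\sqrt{0.005 \cdot 16} \approx 0.28$, which is below the QPSK decision margin of about 0.71; the decoder therefore recovers `d_u` exactly.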

IV. PRECODER DESIGN
To avoid unacceptable SER degradation at the receiver when the decoding scheme of Sec. III is adopted, some constraint must be placed on the size of the distortion matrix $\boldsymbol{\Delta}$. To this end, note that the normalized inter-carrier interference (ICI) power in the $k$-th unprotected subcarrier due to this distortion matrix is given by the squared norm of the $k$-th row of $\boldsymbol{\Delta}$, i.e., $\|\boldsymbol{\Delta}^H \mathbf{e}_k\|^2$. Then, it is reasonable to constrain the normalized ICI on a per-subcarrier basis: we set a maximum normalized ICI of $\epsilon_k \ll 1$ for the $k$-th unprotected data subcarrier, i.e., $\|\boldsymbol{\Delta}^H \mathbf{e}_k\|^2 \leq \epsilon_k$. Another issue that needs to be considered is the potential generation of undesirable spectral peaks in the signal passband. This effect is well known for AIC [25], [39], and results from the optimal precoder giving too much gain to the cancellation subcarriers. Hence, to avoid spectral overshoot in our design, we introduce regularization terms to the cost function $P_W$ in (13), penalizing large values of the power of the precoded vector components due to the precoding matrices $\mathbf{P}_{pt}$, $\mathbf{Q}$ and $\boldsymbol{\Theta}$.
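Thus, the regularized design problem (27) is obtained, where $\alpha, \beta, \gamma \geq 0$ are regularization parameters. Their values should be chosen sufficiently large to effectively limit in-band spectral peaks, but not so large as to result in excessive performance loss in terms of OBR. Note that it is not necessary to introduce a regularization term for $\boldsymbol{\Delta}$, as its Frobenius norm is already effectively limited by the per-subcarrier normalized ICI constraints:
$$ \|\boldsymbol{\Delta}\|_F^2 = \sum_{k=1}^{K_u} \|\boldsymbol{\Delta}^H \mathbf{e}_k\|^2 \leq \sum_{k=1}^{K_u} \epsilon_k. $$
The objective and the $K_u$ inequality constraints in (27) are convex quadratic, whereas the remaining constraints are linear. Hence, problem (27) is convex, and it could be tackled with any suitable convex solver. However, this approach quickly becomes impractical as the number of subcarriers increases: the large number of optimization variables and constraints would result in very high computational complexity. This is of particular relevance in dynamic spectrum access (DSA) systems which must reconfigure their transmission parameters as spectrum availability changes over time, so that precoding matrices may have to be frequently recomputed. Due to this, we seek alternative reduced-complexity approaches to approximately solving (27). In particular, we propose to cyclically minimize the objective w.r.t. each of $\{\mathbf{P}_{pt}, \mathbf{Q}\}$, $\boldsymbol{\Delta}$, and $\boldsymbol{\Theta}$ while keeping the remaining variables fixed, as follows. Initialize $\boldsymbol{\Delta}_0 = \mathbf{0}$ and $\boldsymbol{\Theta}_0 = \mathbf{0}$, and then for $j \geq 1$ do:
• Step (28): obtain $\{\mathbf{P}_{pt,j}, \mathbf{Q}_j\}$ by minimizing the objective over $\{\mathbf{P}_{pt}, \mathbf{Q}\}$ with $\boldsymbol{\Delta} = \boldsymbol{\Delta}_{j-1}$ and $\boldsymbol{\Theta} = \boldsymbol{\Theta}_{j-1}$ fixed;
• Step (29): obtain $\boldsymbol{\Delta}_j$ by minimizing the objective over $\boldsymbol{\Delta}$, subject to the normalized ICI constraints, with $\mathbf{P}_{pt,j}$, $\mathbf{Q}_j$ and $\boldsymbol{\Theta}_{j-1}$ fixed;
• Step (30): obtain $\boldsymbol{\Theta}_j$ by minimizing the objective over $\boldsymbol{\Theta} \in \mathcal{L}_b^{K_u}$ with $\mathbf{P}_{pt,j}$, $\mathbf{Q}_j$ and $\boldsymbol{\Delta}_j$ fixed.
At each iteration, the tuple $(\mathbf{P}_{pt,j}, \mathbf{Q}_j, \boldsymbol{\Delta}_j, \boldsymbol{\Theta}_j)$ is feasible for problem (27); therefore, any limit point of the sequence must be feasible, since the feasible set is closed. In addition, the regularized cost function in (27) is decreased (or at most, does not increase) at each of the steps (28)-(30); since this cost function is nonnegative, the sequence of its values necessarily converges. Each of the three subproblems (28)-(30) is addressed in turn in the following subsections.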
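The per-subcarrier normalized ICI constraint and the resulting bound on the Frobenius norm of $\boldsymbol{\Delta}$ can be checked numerically with a short NumPy sketch. The helper `shrink_rows_to_limit` is hypothetical: it simply rescales any violating row back onto its limit, which is a plain Euclidean projection and not the closed-form row update derived in the paper's Appendix A.

```python
import numpy as np

def normalized_ici(Delta):
    """Normalized ICI per unprotected subcarrier: squared norm of each row of Delta."""
    return np.sum(np.abs(Delta) ** 2, axis=1)

def shrink_rows_to_limit(Delta, eps):
    """Rescale rows whose ICI exceeds eps (scalar or per-row array); leave others as is.

    Hypothetical helper for illustration: a Euclidean projection onto the
    constraint set, not the optimal row update of the design algorithm.
    """
    ici = normalized_ici(Delta)
    scale = np.minimum(1.0, np.sqrt(eps / np.maximum(ici, np.finfo(float).tiny)))
    return Delta * scale[:, None]

# Toy example (sizes are arbitrary).
rng = np.random.default_rng(2)
Ku, eps = 8, 0.005
Delta = 0.05 * (rng.standard_normal((Ku, Ku)) + 1j * rng.standard_normal((Ku, Ku)))

Delta_ok = shrink_rows_to_limit(Delta, eps)
ici = normalized_ici(Delta_ok)
frob_sq = np.linalg.norm(Delta_ok, "fro") ** 2   # equals the sum of the row ICIs
```

With a common limit $\epsilon$ for all rows, the squared Frobenius norm of the feasible matrix cannot exceed $K_u \epsilon$, which is why no separate regularization term is needed for $\boldsymbol{\Delta}$.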
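A. Optimization of $\mathbf{P}_{pt}$ and $\mathbf{Q}$
Let $\mathbf{P}_{u,j-1} = \boldsymbol{\Pi}(\mathbf{I}_{K_u} + \boldsymbol{\Theta}_{j-1} + \boldsymbol{\Delta}_{j-1})$ be fixed, so that (28) is rewritten as problem (31). This is a convex quadratic problem, whose solution can be found in closed form as follows. Let us partition the precoding matrices column-wise into the blocks acting on the unprotected data and on the protected and training data, i.e., $\mathbf{P} = [\mathbf{P}_u \ \mathbf{P}_{pt}]$ and $\mathbf{Q} = [\mathbf{Q}_u \ \mathbf{Q}_{pt}]$. In view of (6), the first and third terms of the cost $J$ in (31) can be written as traces of quadratic forms in these blocks. We recall the following properties of the complex gradient: for constant matrices $\mathbf{A}$, $\mathbf{B}$, it holds that $\nabla_{\mathbf{X}} \operatorname{tr}\{\mathbf{X}^H \mathbf{A}\} = \mathbf{A}$, $\nabla_{\mathbf{X}} \operatorname{tr}\{\mathbf{A}^H \mathbf{X}\} = \mathbf{0}$, and $\nabla_{\mathbf{X}} \operatorname{tr}\{\mathbf{X}^H \mathbf{A} \mathbf{X} \mathbf{B}\} = \mathbf{A}\mathbf{X}\mathbf{B}$. With these, the gradients of $J$ with respect to the unknown blocks are obtained in (34)-(36). Since $\mathbf{C}_{pt}$ is invertible, equating (34)-(36) to zero yields the solution to (31), given in (37)-(38). Note that (38) is in fact independent of the iteration index $j$, so it only has to be computed once.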

B. Optimization of ∆
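For fixed $\mathbf{P}_{pt} = \mathbf{P}_{pt,j}$, $\mathbf{Q} = \mathbf{Q}_j$ and $\boldsymbol{\Theta} = \boldsymbol{\Theta}_{j-1}$, problem (29) amounts to minimizing the objective over $\boldsymbol{\Delta}$ subject to the $K_u$ normalized ICI constraints, as stated in (40)-(41). The main hurdle towards efficiently solving (40)-(41) is the large number of inequality constraints related to the per-subcarrier normalized ICI power. Noting that each of these constraints involves a single row of $\boldsymbol{\Delta}$, we propose to sequentially minimize the objective with respect to each of these rows while keeping the remaining $K_u - 1$ rows fixed.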
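To this end, let $\boldsymbol{\delta}_k = \boldsymbol{\Delta}^H \mathbf{e}_k$, and let $\boldsymbol{\Delta}_k = \boldsymbol{\Delta} - \mathbf{e}_k \boldsymbol{\delta}_k^H$, i.e., $\boldsymbol{\Delta}_k$ is obtained by zeroing out the $k$-th row of $\boldsymbol{\Delta}$. With these, let us introduce
$$ \mathbf{G}_{j,k} = \mathbf{S} \begin{bmatrix} \boldsymbol{\Pi}\left( \mathbf{I}_{K_u} + \boldsymbol{\Theta}_{j-1} + \boldsymbol{\Delta}_k \right) & \mathbf{P}_{pt,j} \end{bmatrix} + \mathbf{T}\mathbf{Q}_j + \mathbf{R}, $$
which does not depend on $\boldsymbol{\delta}_k$. Then the precoder matrix can be written as $\mathbf{G} = \mathbf{G}_{j,k} + \mathbf{S}\boldsymbol{\Pi}\mathbf{e}_k \boldsymbol{\delta}_k^H \mathbf{J}$, where $\mathbf{J}$ was defined in (7). Therefore, the problem of optimizing $\boldsymbol{\delta}_k$ while keeping $\boldsymbol{\Delta}_k$ fixed can be stated as in (44): minimize the objective over $\boldsymbol{\delta}_k$ subject to $\|\boldsymbol{\delta}_k\|^2 \leq \epsilon_k$. This convex quadratic problem with a quadratic inequality constraint is highly structured and can be solved in closed form, as shown in Appendix A. In this way, the rows of $\boldsymbol{\Delta}$ are optimized in a sequential fashion. The order of the sequence may be cyclic or random, and the number of passes can be either fixed, or variable subject to some stopping criterion. Note that in (44), $j$ denotes the index of the outer iterations corresponding to (28)-(30), whereas $k$ is the index of the inner iterations corresponding to the row updates with $j$ fixed.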

C. Optimization of Θ
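For fixed $\mathbf{P}_{pt} = \mathbf{P}_{pt,j}$, $\mathbf{Q} = \mathbf{Q}_j$ and $\boldsymbol{\Delta} = \boldsymbol{\Delta}_j$, problem (30) becomes the minimization (45) of the objective over $\boldsymbol{\Theta} \in \mathcal{L}_b^{K_u}$. Note that the constraint $\boldsymbol{\Theta} \in \mathcal{L}_b^{K_u}$ is linear in $\boldsymbol{\Theta}$. Then the objective in (45) can be written in terms of the nonzero elements of $\boldsymbol{\Theta}$, resulting in a convex quadratic problem (see Appendix B). The overall procedure for computing the precoder $\mathbf{G} = \mathbf{S}\mathbf{P} + \mathbf{T}\mathbf{Q} + \mathbf{R}$ is summarized in Algorithm 1.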

V. COMPLEXITY ANALYSIS
Implementation complexity is a critical factor for any OBR reduction method, both at transmitter and receiver. Our design of precoder matrices can be done offline, as it is data-independent. Online complexity, measured in complex multiplications per OFDM symbol (cmults/symb), arises at the transmitter from directly implementing the precoding operation (5), and at the receiver from the iterative decoder (21)-(22).
Interestingly, as shown in Sec. VI, the matrix $\boldsymbol{\Delta}$ obtained by the proposed design tends to have many small singular values, which is reasonable since the constraints on normalized ICI prevent $\boldsymbol{\Delta}$ from being "large". This property motivates the use of a low-rank approximation of $\boldsymbol{\Delta}$ to reduce implementation complexity. The best choice, in the sense of minimizing the squared Frobenius norm of the approximation error, is given by truncating the singular value decomposition (SVD), as per the Eckart-Young theorem [40]. Thus, if the SVD of $\boldsymbol{\Delta}$ is truncated to $r_\Delta < K_u$ principal components, then $\boldsymbol{\Delta} \approx \mathbf{L}_\Delta \mathbf{M}_\Delta^H$ for some $\mathbf{L}_\Delta \in \mathbb{C}^{K_u \times r_\Delta}$ and $\mathbf{M}_\Delta \in \mathbb{C}^{K_u \times r_\Delta}$. The transmitter can then compute (5) using these factors instead of the full matrix $\boldsymbol{\Delta}$, taking into account that the terms involving $\mathbf{d}_t$ can be precomputed and stored. Analogously, at the receiver end, the intermediate step (21) can be implemented with $2 r_\Delta K_u$ cmults/symb per iteration. This results in significant savings, because usually it is possible to take $r_\Delta \ll K_u$ without significantly compromising OBR reduction.
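The low-rank approximation of $\boldsymbol{\Delta}$ by SVD truncation can be sketched in NumPy as follows. The function name `low_rank_factors` and the synthetic test matrix are assumptions made for illustration only; the factorization $\boldsymbol{\Delta} \approx \mathbf{L}_\Delta \mathbf{M}_\Delta^H$ follows the text, with the singular values absorbed into the left factor (one possible convention).

```python
import numpy as np

def low_rank_factors(Delta, r):
    """Rank-r truncated SVD of Delta, returned as factors with Delta ~ L @ M^H."""
    U, s, Vh = np.linalg.svd(Delta, full_matrices=False)
    L = U[:, :r] * s[:r]          # Ku x r, singular values folded into L
    M = Vh[:r, :].conj().T        # Ku x r
    return L, M, s

rng = np.random.default_rng(3)

def crandn(*shape):
    """Complex Gaussian samples (helper for the toy example)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Synthetic matrix with a few dominant singular values (toy stand-in for Delta).
Ku, r = 32, 4
Delta = crandn(Ku, r) @ crandn(r, Ku) / Ku + 1e-3 * crandn(Ku, Ku)

L, M, s = low_rank_factors(Delta, r)
d = crandn(Ku)

# Two thin products (about 2 * r * Ku complex multiplications) replace Ku^2.
fast = L @ (M.conj().T @ d)

# Eckart-Young: the Frobenius error equals the energy of the discarded singular values.
err = np.linalg.norm(Delta - L @ M.conj().T, "fro")
expected_err = np.sqrt(np.sum(s[r:] ** 2))
```

Evaluating `M.conj().T @ d` first and then multiplying by `L` is what brings the cost down to roughly $2 r_\Delta K_u$ complex multiplications, in line with the receiver count quoted above.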

VI. NUMERICAL EXAMPLES
We provide examples of the performance of the proposed design. For reference, we compare the corresponding PSD with those of a plain CP-OFDM system with null subcarriers and no precoding ($\mathbf{P} = \mathbf{J}$ and $\mathbf{Q} = \mathbf{0}$), the standard AIC design ($\mathbf{P} = \mathbf{J}$), and two variations of Orthogonal Precoding to accommodate protected and training subcarriers, termed Plain Orthogonal Precoding (POP) and Extended Orthogonal Precoding (EOP), which are described in Appendix C. In both cases the design criterion is the minimization of the weighted power (13); the structure of EOP is an extension of that proposed in [35], which considered data and pilot subcarriers, to also allow for protected subcarriers not known at the receiver. If all subcarriers are unprotected, POP and EOP reduce to the design from [33].
Footnote 2: Note that the receiver complexity figure of Sec. V may actually be lower if one exploits the finite-alphabet nature of past decisions: for instance, with Quadrature Phase-Shift Keying (QPSK) modulation, the terms $\hat{d}_{u,\ell}^{(i)}$ in (22) belong to $\{1+j, 1-j, -1+j, -1-j\}$, so that $\Theta_{k\ell}\, \hat{d}_{u,\ell}^{(i)}$ can actually be computed with just two real additions; similar considerations apply to the computation of $\boldsymbol{\Delta}\, \hat{\mathbf{d}}_u^{(i-1)}$ in (21). Nevertheless, we do not take such potential savings into account when reporting computational loads.
Assuming an ideal lowpass interpolation filter $H_I(f)$ with cutoff frequency $f_s/2$, three different scenarios are examined for a transmitter with IFFT size $N = 512$ and CP length $N_{cp} = N/16$, which differ in the number and layout of active subcarriers. In all cases, all protected subcarriers have the same power $\gamma_{p,k}^2 = 1.2$, whereas all training subcarriers have power $\gamma_{t,k}^2 = 1.5$. The permutation matrix $\boldsymbol{\Pi} \in \mathbb{C}^{K_u \times K_u}$ is chosen so that the SIC decoder starts with the innermost subcarrier, and then progresses incrementally towards the band edges, alternating between subcarriers below and above the passband center. This decoding order has been found to yield a more uniform distribution of power across subcarriers, resulting in a PSD with reduced in-band spectral peaks. In all designs, the regularization factors are selected by trial and error in order to obtain the best performance in terms of OBR reduction without incurring spectral overshoot. They are reported as normalized values $\bar{\alpha}$, $\bar{\beta}$, $\bar{\gamma}$, expressed in terms of the product of dimensions of the corresponding matrices.
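The center-out decoding order can be illustrated with a small sketch. The exact permutation matrix used in the experiments is not reproduced here, so the construction below is only one ordering consistent with the verbal description: start at the innermost subcarrier and alternate between positions below and above the passband center. Visiting the lower side first, and the function names `center_out_order` and `permutation_matrix`, are assumptions.

```python
import numpy as np

def center_out_order(Ku):
    """Positions 0..Ku-1 visited from the center outwards, alternating below/above."""
    hi0 = Ku // 2        # first position above the center
    lo0 = hi0 - 1        # first position below the center
    order = []
    for step in range((Ku + 1) // 2):
        lo, hi = lo0 - step, hi0 + step
        if lo >= 0:
            order.append(lo)
        if hi < Ku:
            order.append(hi)
    return order

def permutation_matrix(order):
    """Pi such that (Pi @ d)[order[j]] = d[j]: the j-th decoded symbol sits at position order[j]."""
    Ku = len(order)
    Pi = np.zeros((Ku, Ku))
    Pi[order, np.arange(Ku)] = 1.0
    return Pi

order = center_out_order(6)          # six unprotected subcarriers, sorted by frequency
Pi = permutation_matrix(order)
placed = Pi @ np.arange(6)           # which data index lands on each subcarrier position
```

With this convention the first entries of $\mathbf{d}_u$, which the SIC decoder detects first, are mapped to the subcarriers closest to the passband center.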

A. Scenario 1
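We consider a layout with $K = 257$ active subcarriers symmetrically located about the carrier frequency, i.e., $\mathcal{K} = \{-128, \ldots, 0, \ldots, 128\}$. There are $K_p = 4$ protected subcarriers, with indices in $\mathcal{K}_p = \{\pm 10, \pm 20\}$, together with $K_t$ training subcarriers; from $K = K_u + K_p + K_t + K_c$ and the values $K_u = 216$ and $K_c = 6$ used below, it follows that $K_t = 31$. It is assumed that $K_c$ is even, so that $K_u$ is even as well.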
Fig. 1(a) shows the PSD obtained in this setting by the different precoder designs, all of them with $K_c = 6$ cancellation subcarriers; thus, the precoder redundancy is $K_c/K = 2.3\%$ (only the PSD envelope is shown for clarity). The extra power allocated to protected and pilot subcarriers makes their locations within the passband clearly noticeable. The AIC precoding design ($\bar{\beta} = 16$) provides little OBR reduction with respect to the reference scheme using $K_c/2$ null subcarriers at each passband edge, whereas the orthogonal designs (POP and EOP, $\bar{\alpha}' = 0$) perform significantly better. For the proposed design, we initially constrain the normalized ICI power to $\epsilon_k = \epsilon = 0.005$ $\forall k$ and set $b = K_u - 1 = 215$, so that all the degrees of freedom available in the precoding matrix $\boldsymbol{\Theta}$ are used. The proposed design ($\bar{\alpha} = 0.2$, $\bar{\beta} = 0.3$, $\bar{\gamma} = 0.005$) significantly outperforms the other schemes, providing lower PSD levels in the OBR region. The SER obtained with the decoding scheme described in Sec. III is shown in Fig. 1(b), assuming an additive white Gaussian noise (AWGN) channel. It is seen that directly slicing the vector $\mathbf{r}_u$ in (19) yields poor performance (curve labeled "direct" in the figure), due to intercarrier interference introduced by the precoding matrix $\mathbf{P}_u$. The SIC-based decoder effectively counteracts this effect with just two iterations in this case, for both QPSK and 16-QAM modulation.
To explore more computationally efficient alternatives, let us introduce two energy compaction metrics applied to the $K_u \times K_u$ matrices $\boldsymbol{\Theta}$ and $\boldsymbol{\Delta}$, respectively. For the strictly lower triangular $\boldsymbol{\Theta}$, we define the energy compaction over subdiagonals as the energy of all coefficients in subdiagonals 1 through $k$, normalized by the total energy, i.e.,
$$ D_k(\boldsymbol{\Theta}) = \frac{1}{\|\boldsymbol{\Theta}\|_F^2} \sum_{i=1}^{k} \sum_{\ell=1}^{K_u - i} |\Theta_{\ell+i,\ell}|^2, $$
whereas for the matrix $\boldsymbol{\Delta}$, with singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{K_u}$, we measure the energy compaction over its singular values,
$$ F_k(\boldsymbol{\Delta}) = \frac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{K_u} \sigma_i^2}. $$
Note that $D_{K_u-1}(\boldsymbol{\Theta}) = 1$ and $F_{K_u}(\boldsymbol{\Delta}) = 1$ always hold. From Fig. 2, it is seen that the most significant coefficients of $\boldsymbol{\Theta}$ are grouped in the first subdiagonals (for example, the first 60 subdiagonals pack over 90% of the total energy), and that $\boldsymbol{\Delta}$ has a small number of significant singular values. These observations motivate the adoption of a banded structure for $\boldsymbol{\Theta}$ and a low-rank approximation for $\boldsymbol{\Delta}$, in order to reduce computational complexity with a small impact on performance. To check this point, Table I shows the performance loss incurred by replacing $\boldsymbol{\Delta}$ by its best rank-$r_\Delta$ approximation: it is seen that as long as $r_\Delta \geq 7$, the loss is below 1 dB. Next, the precoder was redesigned keeping the normalized ICI power constraint $\epsilon = 0.005$, but with different values of the bandwidth $b \in \{0, 2, 4, 10, 20, 50\}$; the resulting matrix $\boldsymbol{\Delta}$ was replaced by its best rank-7 approximation in all cases with very small performance loss. Fig. 3(a) shows the corresponding PSDs. For $b = 0$ (meaning the triangular component $\boldsymbol{\Theta}$ is absent in the precoder) the proposed design performs better than POP but worse than EOP. However, for $b \geq 2$ the proposed scheme outperforms EOP. The PSD obtained with $b = 50$ is nearly identical to that with $b = 215$.
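The two energy compaction metrics introduced above can be computed with a few lines of NumPy. This is a sketch under assumptions: energy is taken as the squared magnitude of coefficients (for $\boldsymbol{\Theta}$) and as squared singular values (for $\boldsymbol{\Delta}$), and the function names and random test matrices are hypothetical stand-ins for the designed matrices.

```python
import numpy as np

def subdiag_energy_compaction(Theta):
    """D_k for k = 1..Ku-1: cumulative energy of subdiagonals 1..k over the total.

    Assumes Theta is strictly lower triangular, so the subdiagonals carry all its energy.
    """
    Ku = Theta.shape[0]
    energy = np.array([np.sum(np.abs(np.diag(Theta, -k)) ** 2) for k in range(1, Ku)])
    return np.cumsum(energy) / np.sum(energy)

def singular_value_compaction(Delta):
    """F_k for k = 1..Ku: cumulative squared singular values over their total."""
    s = np.linalg.svd(Delta, compute_uv=False)   # returned in descending order
    return np.cumsum(s ** 2) / np.sum(s ** 2)

# Toy matrices (random, for illustration only).
rng = np.random.default_rng(4)
Ku = 12
Theta = np.tril(rng.standard_normal((Ku, Ku)) + 1j * rng.standard_normal((Ku, Ku)), -1)
Delta = rng.standard_normal((Ku, Ku)) + 1j * rng.standard_normal((Ku, Ku))

D = subdiag_energy_compaction(Theta)
F = singular_value_compaction(Delta)

# Smallest bandwidth / rank capturing at least 90% of the energy.
b_90 = int(np.argmax(D >= 0.9)) + 1
r_90 = int(np.argmax(F >= 0.9)) + 1
```

Thresholds such as `b_90` and `r_90` give a quick way to shortlist candidate values of $b$ and $r_\Delta$ before checking the actual OBR loss.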
The convergence of the proposed method in terms of OBR reduction (relative to the OBR achieved with the null subcarriers-based reference scheme) is shown in Fig. 3(b), together with the corresponding values obtained with AIC, POP and EOP. In general, convergence is smooth but takes longer for larger $b$. For all values of $b$, the observed SER behavior of the decoder in the AWGN channel was similar to that in Fig. 1(b), taking two iterations to converge. Table II lists the online complexity of the different designs in this setting, both at the transmitter and at the receiver, together with the attained OBR reduction. With $b = 50$, OBR is 13.4 dB below that of EOP, at the price of increased complexity (by a factor of 4.2 at the transmitter and 6.5 at the receiver, approximately). The value of $b$ can be further decreased to trade off performance and complexity; for example, with $b = 4$ it is possible to achieve an OBR improvement of 8.3 dB over EOP, at about twice its computational cost.
In the next experiment, still with $K_c = 6$, the bandwidth of $\boldsymbol{\Theta}$ was fixed to $b = 10$, and the value of the normalized ICI power constraint $\epsilon_k = \epsilon$ (common to all unprotected subcarriers) was varied as $\epsilon \in \{0, 0.0025, 0.01, 0.015\}$. Note that $\epsilon = 0$ means that the term $\boldsymbol{\Delta}$ is absent in (20); for $\epsilon > 0$, a low-rank approximation of $\boldsymbol{\Delta}$ was adopted using $r_\Delta = 7$. The resulting PSD can be seen in Fig. 4(a), whereas Fig. 4(b) shows the SER of the decoding scheme of Sec. III. There is a clear tradeoff in the selection of $\epsilon$ between OBR improvement and the required number of decoding iterations, a tradeoff which becomes more demanding for denser constellations. For instance, with $\epsilon = 0.015$, the proposed precoder achieves an OBR reduction of 14.4 dB with respect to EOP; however, with 16-QAM the decoder takes about 3-4 iterations to converge, and it presents a 1-dB gap at SER $= 10^{-4}$ with respect to the baseline unprecoded system. Thus, for a given OBR level, the pair $(b, \epsilon)$ can be selected depending on the target SER and TX/RX complexities. Fig. 5 shows the complementary cumulative distribution function (CCDF) of the Peak-to-Average Power Ratio (PAPR) of the multicarrier signals generated in this scenario. The PAPR degradation for the EOP scheme with respect to the unprecoded case with null subcarriers is 0.1 dB, and for the proposed scheme it remains within 1 dB. For fixed $b$ the PAPR seems to be almost insensitive to the choice of $\epsilon$, whereas for fixed $\epsilon$ PAPR degradation becomes worse for low values of $b > 0$. For $b = 0$ (not shown for clarity) the PAPR degradation of the proposed scheme is less than 0.1 dB.
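For readers wishing to reproduce a PAPR CCDF of the kind discussed above, a minimal NumPy sketch follows. It is not the simulation setup of the paper: it uses unprecoded QPSK symbols on a hypothetical small subcarrier layout, ignores the cyclic prefix (which only repeats samples), assumes an even IFFT size, and oversamples by zero-padding the spectrum. The function names `papr_db` and `ccdf` are illustrative.

```python
import numpy as np

def papr_db(symbols_freq, oversample=4):
    """PAPR in dB of each OFDM symbol; rows are length-N frequency-domain vectors in FFT order.

    Assumes N is even. Oversampling is done by zero-padding the middle of the spectrum.
    """
    n_sym, N = symbols_freq.shape
    half = N // 2
    padded = np.zeros((n_sym, oversample * N), dtype=complex)
    padded[:, :half] = symbols_freq[:, :half]      # nonnegative frequencies
    padded[:, -half:] = symbols_freq[:, -half:]    # negative frequencies
    power = np.abs(np.fft.ifft(padded, axis=1)) ** 2
    return 10 * np.log10(power.max(axis=1) / power.mean(axis=1))

def ccdf(values, thresholds):
    """Empirical probability that values exceed each threshold."""
    return np.array([(values > t).mean() for t in thresholds])

# Toy layout (hypothetical): N = 64, active subcarriers -12..12, unprecoded QPSK.
rng = np.random.default_rng(5)
N, n_sym = 64, 200
active = np.arange(-12, 13) % N
X = np.zeros((n_sym, N), dtype=complex)
X[:, active] = (rng.choice([-1, 1], (n_sym, active.size))
                + 1j * rng.choice([-1, 1], (n_sym, active.size))) / np.sqrt(2)

papr = papr_db(X)
thresholds = np.linspace(0, 12, 25)
curve = ccdf(papr, thresholds)

# Sanity check: a single active tone has constant envelope, hence 0 dB PAPR.
tone = np.zeros((1, N), dtype=complex)
tone[0, 1] = 1.0
tone_papr = papr_db(tone)
```

To compare precoders, each row of `X` would be replaced by the corresponding precoded vector $\mathbf{x}[m]$ before calling `papr_db`.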
Next, the number of cancellation subcarriers is reduced to $K_c = 2$ (redundancy $K_c/K = 0.8\%$). We fix b = 10, whereas now $\epsilon_k = \epsilon$ ∈ {0, 0.005, 0.01, 0.015}, and $r_\Delta = 10$ is adopted for ϵ > 0. The resulting PSD and SER curves are shown in Fig. 6. In this low-redundancy setting, the orthogonal precoders cannot substantially improve performance with respect to simply turning off the cancellation subcarriers. The proposed scheme, in contrast, is able to reduce OBR significantly at the expense of increased complexity: for ϵ = 0, the number of cmults/symb at the TX and RX (with a single decoding iteration) is about twice that of EOP, whereas for ϵ > 0 the corresponding complexities increase by factors of 4.4 at the TX and 5.4 at the RX (with two decoding iterations).

B. Scenario 2
In the second scenario, the number of active subcarriers is reduced to There are $K_p = 2$ protected subcarriers, with indices in $\mathcal{K}_p = \{\pm 9\}$, and $K_t = 9$ training subcarriers, with $\mathcal{K}_t = \{0, \pm 6, \pm 12, \pm 18, \pm 24\}$, so that  available results in poor performance of the orthogonal precoders, whereas the proposed design shows much better behavior. In all cases, a low-rank approximation of ∆ with $r_\Delta = 8$ was adopted without noticeable degradation. Larger values of ϵ yield better OBR reduction, but with more decoding iterations needed at the receiver, and with an increasing gap with respect to the unprecoded reference system.
Next, we consider a multipath block-fading channel with exponential power delay profile. Specifically, channel taps are generated as c i.i.d. realizations of a circularly symmetric zero-mean complex Gaussian random variable. In this way, $\delta N_{cp}$ corresponds to both the mean delay and the rms delay spread of the channel. The corresponding frequency-domain channel is normalized to yield $\sum_{k \in \mathcal{K}} |C[k]|^2 = K$. Fig. 8 shows the SER curves for the proposed decoder corresponding to a design with ϵ = 0.005 and b = 4, assuming perfect channel knowledge and zero-forcing equalization; a different channel realization was independently drawn for each OFDM symbol. The proposed decoding scheme is seen to converge in a single iteration with QPSK with a gap smaller than 1 dB with respect to the baseline unprecoded system; with 16-QAM convergence is achieved in 1-2 iterations, and the gap is under 2 dB. SER performance is seen to degrade with increasing delay spread, as expected.
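As an illustration of the channel model just described, the sketch below draws one block-fading realization and applies the normalization $\sum_{k \in \mathcal{K}} |C[k]|^2 = K$. The exact tap-variance expression is not reproduced here, so the profile $\exp(-n/(\delta N_{cp}))$ over $N_{cp}$ taps is an assumption, as are the function name `exp_pdp_channel`, the FFT size, and the set of active subcarriers.

```python
import numpy as np

def exp_pdp_channel(n_fft, n_cp, delta, active_idx, rng):
    """One channel realization; returns C[k] on the active subcarriers."""
    n = np.arange(n_cp)
    # Assumed profile: tap variances decay as exp(-n / (delta * n_cp)).
    variances = np.exp(-n / (delta * n_cp))
    # Circularly symmetric zero-mean complex Gaussian taps.
    taps = np.sqrt(variances / 2) * (rng.standard_normal(n_cp)
                                     + 1j * rng.standard_normal(n_cp))
    freq = np.fft.fft(taps, n_fft)
    # Negative subcarrier indices wrap around to the upper FFT bins.
    c_active = freq[np.mod(active_idx, n_fft)]
    # Normalize so that sum_k |C[k]|^2 = K over the active subcarriers.
    scale = np.sqrt(len(active_idx) / np.sum(np.abs(c_active) ** 2))
    return scale * c_active

rng = np.random.default_rng(3)
active_idx = np.arange(-32, 33)              # illustrative active set (assumption)
c = exp_pdp_channel(n_fft=256, n_cp=16, delta=0.1,
                    active_idx=active_idx, rng=rng)

# Noiseless check of zero-forcing equalization with perfect channel knowledge.
sent = np.ones(len(active_idx), dtype=complex)
equalized = (c * sent) / c
```

Calling `exp_pdp_channel` once per OFDM symbol mimics the independent draw per symbol used for the SER curves.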

C. Scenario 3
Lastly, we consider a scenario with non-contiguous K = 193 active subcarriers, with indices in K The protected subcarriers are at $\mathcal{K}_p = \{\pm 80, \pm 100\}$ ($K_p = 4$), and the training subcarriers are at The $K_c = 4$ cancellation subcarriers (redundancy $K_c/K = 2.1\%$) are placed at the spectrum edges: $\mathcal{K}_c = \{-128, -1, 64, 128\}$. We set b = 4 and $\epsilon_k = \epsilon = 0.005$ for all k, and consider two different spectral weighting functions: • Uniform weight: In this way, more priority is given to low PSD levels over $\mathcal{B}_{in}$. Regularization factors ᾱ = 10, β = 0, γ = 0.2. Results obtained with a low-rank approximation $r_\Delta = 10$ are shown in Fig. 9. The orthogonal precoder designed with a uniform weight yields little OBR reduction over either $\mathcal{B}_{out}$ or $\mathcal{B}_{in}$; with a nonuniform weight, the PSD over $\mathcal{B}_{in}$ is lowered by 10 dB. Nevertheless, the proposed design provides much better performance in this setting. It is seen that OBR reduction over $\mathcal{B}_{in}$ and $\mathcal{B}_{out}$ can be traded off by selecting the spectral weighting function. Regarding SER performance, the gap to the unprecoded reference system in AWGN channel is within 0.3 dB and 1 dB for QPSK and 16-QAM, respectively, with convergence taking place in two iterations in both cases.

VII. CONCLUSION
By allowing in-band distortion, the performance of spectral precoders can be significantly improved. Orthogonal precoders exploit this fact, providing satisfactory performance if sufficient redundancy can be afforded. However, in scenarios targeting high spectrum utilization, this may not be the case, and the proposed precoder design constitutes an alternative. The structure of the precoding matrix is selected with the operation of the decoder in mind, in order to allow compensation of in-band distortion. In this way, sidelobe reduction can be traded off against spectral redundancy, computational complexity, in-band spectral peaks, and error rate degradation, particularly with denser constellations.
The proposed decoding method targets low-complexity implementation and hinges on successive interference cancellation feeding back hard decisions on the data symbols, which is not necessarily optimal. The development of alternative designs based on more sophisticated decoding schemes has the potential to reduce error rate degradation for spectral precoders aiming at aggressive sidelobe reduction, and constitutes an interesting line for future work.

APPENDIX A SOLUTION TO (43)-(44)
In view of (6) and (7), one has $JC = J$ and $JCJ^H = I_{K_u}$. Using these, and introducing $B = \Pi^H S^H A_W S \Pi$ and $F = J G_{j,k}^H A_W S \Pi$, the objective in (43) reads as $G_{j,k}$ does not depend on $\delta_k$, so (43)-(44) can be rewritten as The unconstrained minimizer of the cost in (53) is given by δ then it also constitutes the solution to the constrained problem; otherwise, the constraint must hold with equality. The corresponding Lagrangian, with Lagrange multiplier λ, is Equating the gradient of $\mathcal{L}$ to zero yields $\|F e_k\|$, with s ∈ {−1, +1}. To remove the sign ambiguity, substitute this expression of $\delta_k$ in the objective of (53) to obtain which is minimized over s ∈ {−1, +1} when s = −1. Thus, the solution to (53) can be compactly written as

APPENDIX B SOLUTION TO (45)-(46)
The component of G independent of Θ is given by so that $G = G_j + S\Pi\Theta J$. Since $CJ^H = J^H$ and $JCJ^H = I_{K_u}$, one has k Be k for any $B \in \mathbb{C}^{K_u \times K_u}$. For ease of notation, let us introduce where

Let R and Z be as in (4) and (39) respectively. Consider a precoder $G = ZF + R$, where $F \in \mathbb{C}^{(K_u+K_c) \times (K_u+K_p+K_t)}$ is to be optimized. This structure does not distort the protected and pilot subcarriers: with x = Gd, one has where $F_u \in \mathbb{C}^{(K_u+K_c) \times K_u}$, $F_p \in \mathbb{C}^{(K_u+K_c) \times K_p}$ and $F_t \in \mathbb{C}^{(K_u+K_c) \times K_t}$.

A. Plain Orthogonal Precoder
In this design we fix $F_p = 0$ and $F_t = 0$. Then, at the output of the receiver's zero-forcing equalizer, the vector of $K_u + K_c$ samples at unprotected and cancellation subcarriers satisfies $r_{uc} = F_u d_u + w_{uc}$, with $w_{uc}$ the noise term. Thus, if $F_u$ has orthonormal columns, $F_u^H r_{uc} = d_u + F_u^H w_{uc}$ approximately recovers $d_u$. The weighted power in (13) becomes $P_W = \mathrm{tr}\{G^H A_W G C\} = \mathrm{tr}\{F_u^H Z^H A_W Z F_u\} + \mathrm{tr}\{R_{pt}^H A_W R_{pt} C_{pt}\}$, which is minimized over the set of semiunitary matrices when $F_u$ comprises the $K_u$ least eigenvectors of $Z^H A_W Z$. Since $F_u$ has orthonormal columns, the products $F_u d_u$ (at the transmitter) and $F_u^H r_{uc}$ (at the receiver) can be efficiently performed by resorting to Householder reflectors [41], taking $2K_u K_c + K_c^2$ complex multiplications each.
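The eigenvector selection in the plain orthogonal precoder can be checked numerically. The sketch below is only an illustration: a random Hermitian positive semidefinite matrix stands in for $Z^H A_W Z$, and the sizes are arbitrary assumptions. It confirms that the $K_u$ eigenvectors with the smallest eigenvalues form a semiunitary $F_u$, that the resulting trace equals the sum of those eigenvalues, and that the receiver product $F_u^H (F_u d_u)$ returns $d_u$ in the absence of noise.

```python
import numpy as np

rng = np.random.default_rng(1)
k_u, k_c = 12, 3                       # illustrative sizes (assumption)
n = k_u + k_c

# Stand-in for Z^H A_W Z: a random Hermitian positive semidefinite matrix.
b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
m = b.conj().T @ b

# eigh returns eigenvalues in ascending order, so the first K_u
# eigenvectors are the "least" ones.
eigvals, eigvecs = np.linalg.eigh(m)
f_u = eigvecs[:, :k_u]

def weighted_power(f):
    """tr{F^H M F} for the stand-in matrix M."""
    return np.real(np.trace(f.conj().T @ m @ f))

p_opt = weighted_power(f_u)

# Reference point: a random semiunitary matrix from a reduced QR factorization.
q, _ = np.linalg.qr(rng.standard_normal((n, k_u))
                    + 1j * rng.standard_normal((n, k_u)))
p_rand = weighted_power(q)

# Noiseless decoding: F_u^H F_u = I, so F_u^H (F_u d_u) = d_u.
d_u = rng.choice([-1, 1], k_u) + 1j * rng.choice([-1, 1], k_u)
d_hat = f_u.conj().T @ (f_u @ d_u)
```

The dense products above cost on the order of $K_u (K_u + K_c)$ multiplications; the Householder-based implementation cited in the text is what brings this down to $2K_u K_c + K_c^2$, and it is not reproduced in this sketch.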

B. Extended Orthogonal Precoder
Allowing $F_p$, $F_t$ to be nonzero, one has $r_{uc} = F_u d_u + F_p d_p + F_t d_t + w_{uc}$. The term $F_t d_t$ is known to the receiver, whereas an estimate $\hat{d}_p$ can be readily obtained as shown in Sec. III.
The solution to (65) is such that $F_u$ comprises the $K_u$ least eigenvectors of $Z^H A_W Z$ and $F_{pt} = -(Z^H A_W Z + \alpha' I_{K_u+K_c})^{-1} Z^H A_W R_{pt}$. In this case, exploiting again the properties of Householder reflectors [41], and the fact that the term $F_t d_t$ can be precomputed and stored, the precoding operation at the transmitter takes $K_u(2K_c + K_p) + K_c(K_c + K_p)$ complex multiplications per OFDM symbol, and the same amount at the receiver for decoding.
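The closed form for $F_{pt}$ has the shape of a regularized least-squares solution, which can be verified numerically. The sketch below rests on two assumptions that are not confirmed by the text: the regularization term is taken to be $\alpha' \|F_{pt}\|_F^2$, and $C_{pt}$ is taken to be the identity. Random matrices stand in for $Z$, $R_{pt}$ and $A_W$, and the two helper functions simply evaluate the multiplication counts quoted for the plain and extended orthogonal precoders.

```python
import numpy as np

rng = np.random.default_rng(2)
n_fft, k_u, k_c, k_p, k_t = 32, 10, 2, 2, 3   # illustrative sizes (assumption)
n = k_u + k_c
alpha = 0.1                                   # stands in for alpha'

# Hypothetical stand-ins: Z and R_pt random, A_W Hermitian positive semidefinite.
z = rng.standard_normal((n_fft, n)) + 1j * rng.standard_normal((n_fft, n))
r_pt = rng.standard_normal((n_fft, k_p + k_t)) + 0j
a_half = (rng.standard_normal((n_fft, n_fft))
          + 1j * rng.standard_normal((n_fft, n_fft)))
a_w = a_half.conj().T @ a_half

# F_pt = -(Z^H A_W Z + alpha' I)^{-1} Z^H A_W R_pt, computed with a linear solve.
zh_a = z.conj().T @ a_w
f_pt = -np.linalg.solve(zh_a @ z + alpha * np.eye(n), zh_a @ r_pt)

# Stationarity of tr{(Z F + R)^H A_W (Z F + R)} + alpha' ||F||_F^2 at F = F_pt.
gradient = zh_a @ (z @ f_pt + r_pt) + alpha * f_pt

def cmults_plain(k_u, k_c):
    """Complex multiplications per symbol, plain orthogonal precoder."""
    return 2 * k_u * k_c + k_c ** 2

def cmults_extended(k_u, k_c, k_p):
    """Complex multiplications per symbol, extended orthogonal precoder."""
    return k_u * (2 * k_c + k_p) + k_c * (k_c + k_p)
```

With $K_p = 0$ the extended count reduces to the plain one, so the extra cost of the extended design is $K_p (K_u + K_c)$ multiplications per symbol.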
From this point onwards, we drop the dependence of the vectors x[m], d[m], etc., on the symbol index m, and write simply x, d, etc., unless otherwise specified.
with weight function $W(f) = 1$ for $f \in \mathcal{B}$ and zero elsewhere. Only $K_c = 2$ cancellation subcarriers are available (redundancy $K_c/K = 3.1\%$), with indices in $\mathcal{K}_c = \{\pm 32\}$. The resulting PSD and SER curves in AWGN channel obtained with b = 4 and for four different values of the normalized ICI power ϵ (with $\epsilon_k = \epsilon$ for all k) are shown in Fig. 7. Again, the reduced number of cancellation subcarriers

is the number of columns of $E_k$. Then we can rewrite problem (45)-(46) as min quadratic. Its solution is given by

$$\theta_k = -(M_k + \gamma I_{n_k})^{-1} v_{j,k}, \quad k = 1, \ldots, K_u - 1. \tag{64}$$

APPENDIX C ORTHOGONAL PRECODER DESIGN

[36] subcarriers are dedicated to sending training data (pilots), which should not be distorted by the precoding operation either [36]. Pilots are collected in $d_t[m] \in \mathbb{C}^{K_t}$, and are used for channel estimation and synchronization. Note that $d_t[m]$ is known at the receiver, whereas $d_u[m]$, $d_p[m]$ are not. We define $R_t \in \mathbb{C}^{N \times K_t}$ as the matrix comprising the $K_t$ columns of $I_N$ whose indices correspond to the location of the pilot subcarriers.

Algorithm 1 Structured Spectral Precoder Design
1: Input: subcarrier allocation matrices {S, T, R}, OBR matrix $A_W$, covariance matrix C, permutation matrix Π, bandwidth b, normalized ICI constraints $\{\epsilon_k\}_{k=1}^{K_u}$, regularization factors α, β, γ
2: Initialize $\Delta_0 \leftarrow 0_{K_u \times K_u}$, $\Theta_0 \leftarrow 0_{K_u \times K_u}$, j = 1
3: repeat

multiplications per OFDM symbol (cmults/symb).
• At the receiver end, since $d_t$ is known, the term $P_t d_t$ can be precomputed and stored, so the initial step (19) requires $K_u K_p$ cmults. The intermediate step (21) requires $K_u^2$ cmults/symb per iteration, except for the first iteration which is multiplication-free; whereas the number

(13), if $F_u$ is semiunitary, and assuming $\hat{d}_p \approx d_p$, the receiver computes $F_u^H (r_{uc} - F_p \hat{d}_p - F_t d_t) = d_u + F_u^H F_p (d_p - \hat{d}_p) + F_u^H w_{uc} \approx d_u + F_u^H w_{uc}$, and again $d_u$ is approximately recovered. Let $F_{pt} = [\, F_p \;\; F_t \,]$. It can be readily checked that the weighted power in (13) becomes $P_W = \mathrm{tr}\{G^H A_W G C\} = \mathrm{tr}\{F_u^H Z^H A_W Z F_u\} + \mathrm{tr}\{(Z F_{pt} + R_{pt})^H A_W (Z F_{pt} + R_{pt}) C_{pt}\}$. To avoid spectral peaks due to the term $F_{pt}$, we introduce a regularization term and then solve $\min_{F_u, F_{pt}}$