Synchronization Algorithms and Receiver Structures for Multiuser Filter Bank Uplink Systems

We address the synchronization problem in an uplink multiuser filter bank system. The system differs from orthogonal frequency division multiple access (OFDMA) since it deploys subchannel frequency confined pulses. User multiplexing is still accomplished by partitioning the tones among the active users. Users are asynchronous such that the received signals experience independent time offsets, carrier frequency offsets, and multipath fading. We first consider the synchronization problem in conventional receivers that implement an analysis filter bank with precompensation of the subchannel time and frequency offsets followed by recursive least square linear subchannel equalization. Several correlation metrics that use data training are described. Then, we consider the synchronization problem in a novel multiuser receiver that comprises two efficiently implemented fractionally spaced analysis filter banks. In this receiver, time/frequency compensation can be jointly done for all the users. Despite its lower complexity, we show that it approaches the performance of single-user transmission.


Introduction
In this paper, we consider the synchronization problem for filter bank (FB) modulation in a multiple access uplink wireless channel. In particular, we consider an FB system that uses frequency confined subchannel pulses. The users are multiplexed by partitioning the tones in a frequency division multiple access (FDMA) mode. This system is also referred to as multiuser filtered multitone (FMT) [1,2]. Orthogonal frequency division multiple access (OFDMA) differs from multiuser FMT since it deploys rectangular time-domain pulses that exhibit a sinc frequency response [3]. The multiuser FMT transmitter can be efficiently implemented using an inverse fast Fourier transform (IFFT) followed by low-rate subchannel filtering [1,4,5]. The subchannel frequency confinement makes multiuser FMT more robust than OFDMA in an asynchronous uplink channel, where the signals of distinct users experience independent time offsets from propagation delays, carrier frequency offsets from misadjusted oscillators or Doppler effects from movement, and propagation through multipath fading channels [6][7][8].
Although the synchronization problem in OFDM/OFDMA has received great attention and several results have been obtained, for instance, the algorithms in [9,10], synchronization in FMT systems, and more generally in multiuser FMT, has not been extensively investigated. Synchronization involves the estimation of the users' time and frequency offsets as well as the estimation of the channel impulse response. In [11], a blind scheme has been considered for synchronization in single-user FMT that exploits the redundancy of the oversampled FB. In [12], both time domain and frequency domain algorithms with data training have been investigated. In [13], a non-data-aided timing recovery scheme has been proposed for single-user FB modulation.
The synchronization problem depends on the particular receiver structure adopted. In this paper, we first describe two conventional receiver structures. The first receiver uses an analysis FB that is matched to the individual subchannels after time and frequency compensation. The second receiver deploys an FB where compensation is done at user level, that is, a single value for the time phase and a single value for the carrier frequency offset are deployed for all the subchannels of a given user. Then, symbol-spaced subchannel recursive least square (RLS) equalization is performed [14]. In both cases, we address the problem of estimating the time and frequency offsets. We consider a training approach and devise several metrics that exploit the subchannel separability of FMT deriving from the use of frequency confined subchannel pulses. We also propose an iterative approach where the analysis FB is iteratively matched to the received signals by estimating the time and frequency offsets at the FB output followed by feedback to the input [15].
A drawback of the above receivers is that they have high complexity. In particular, the subchannel-synchronized receiver does not allow for an efficient implementation. On the other hand, the user-synchronized receiver can be implemented via polyphase low-rate filtering followed by a fast Fourier transform. However, one analysis FB per user is required. Further, the synchronization stage is implemented at sampling time, which yields extremely high complexity. Therefore, to reduce the complexity, we propose the use of a novel fractionally spaced analysis FB that allows jointly detecting all the subchannels of the asynchronous users with lower complexity compared to the traditional single-user receiver that requires one synchronous FB per user. In this receiver, the compensation of the time offsets and carrier frequency offsets is jointly performed for all users. The fractionally spaced outputs are processed by subchannel fractionally spaced RLS equalizers. The practical implementation of this multiuser receiver is studied, and a metric for the estimation of the parameters is proposed. Numerical results show that it nearly achieves the performance of single-user FMT. Further, its complexity is significantly lower than that of the conventional receivers both during the synchronization stage and the detection stage.
The paper is organized as follows: we describe the multiuser FMT system model in Section 2. In Section 3, we describe the three receiver structures considered in this paper. In Section 4, we address the synchronization problem and propose metrics for all three receivers. In Section 5, we study the complexity of these receivers. In Section 6, we report several performance results. Finally, the conclusions follow.

Multiuser FMT System Model
We consider a multiuser FB modulation architecture, where multiplexing is performed by partitioning the subchannels among the users (Figure 1). We denote by $g(nT)$ the prototype subchannel pulse, where $T$ is the sampling period. (The sampling period $T$ is assumed to be the time unit and $\mathbb{Z}$ denotes the set of integer numbers.) The subchannel carrier frequency is denoted by $f_k = k/(MT)$, $k = 0, \ldots, M-1$, where $M$ is the total number of tones. The complex baseband transmitted signal of user $u$ is given by (1), where $a^{(u,k)}(\ell T_0)$ is the $k$th subchannel data stream of user $u$, which we assume to belong to the M-QAM constellation set and which has rate $1/T_0$ with $T_0 = NT \geq MT$. In FMT, the prototype pulse has a frequency confined response with Nyquist bandwidth $1/T_0$. The interpolation factor $N$ is chosen to increase the frequency separation between subchannels and to ease the construction of finite impulse response (FIR) pulses that minimize the amount of intercarrier interference (ICI) and multiple access interference (MAI) at the receiver side. Distinct FMT subchannels are assigned to distinct users. Thus, the symbols in (1) are set to zero for the unassigned FMT subchannels, that is, $a^{(u,k)}(\ell T_0) = 0$ for $k \notin K_u$, where $K_u$ denotes the set of $M_u$ subchannel indices that are assigned to user $u$. At the receiver, the complex discrete time received signal is given by (3), where $\tau_i = iT + \Delta_0$ and $\Delta_0$ is a sampling phase. $N_U$ is the number of users, $\Delta_\tau^{(u)}$ is the time offset of user $u$ that is due to asynchronous transmission and/or propagation delays, $\Delta_f^{(u)}$ and $\varphi^{(u)}$ are the carrier frequency and phase offset, $g_{CH}^{(u)}(t)$ is the fading channel impulse response, and $\eta(\tau_i)$ is the zero-mean additive white Gaussian noise contribution. We assume the time/frequency offset to be identical for all the subchannels that are assigned to a given user.
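The transmit model can be illustrated with a minimal direct-form sketch. It assumes that (1) has the usual FMT form, $x^{(u)}(nT) = \sum_{k \in K_u} \sum_{\ell} a^{(u,k)}(\ell T_0)\, g(nT - \ell T_0)\, e^{j2\pi f_k nT}$ with $f_k = k/(MT)$, a causal FIR prototype pulse, and $T$ as the time unit. The function name `fmt_synthesize` and its arguments are hypothetical, and this is the inefficient direct form rather than the IFFT-based realization of [1,4,5].

```python
import cmath

def fmt_synthesize(symbols, pulse, M, N):
    """Direct-form FMT synthesis for one user (illustrative sketch).

    symbols: dict {k: [a_k(0), a_k(1), ...]} holding only the subchannels
             assigned to the user; unassigned tones carry zero symbols
    pulse:   causal FIR prototype pulse g(n), n = 0..len(pulse)-1
    M:       number of tones; tone k is assumed at normalized frequency k/M
    N:       interpolation factor (T0 = N*T)
    """
    num_symbols = max(len(stream) for stream in symbols.values())
    x = [0j] * ((num_symbols - 1) * N + len(pulse))
    for k, stream in symbols.items():
        for ell, a in enumerate(stream):
            for i, g in enumerate(pulse):
                n = ell * N + i  # absolute sample index of this pulse tap
                x[n] += a * g * cmath.exp(2j * cmath.pi * k * n / M)
    return x
```

For example, `fmt_synthesize({0: [1 + 0j]}, [0.5, 1.0, 0.5], M=4, N=5)` returns the prototype pulse itself, since tone 0 applies no modulation.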

Receiver Structures
The base station has to detect all users' signals, which are affected, according to the model in (3), by carrier frequency offsets and propagation delays, as well as by different dispersive channel impulse responses. In this section, we first describe two conventional receiver structures. Then, we propose a novel multiuser receiver that allows lowering the complexity.

Conventional Receivers.
[Figure 1: Multiuser FMT system: transmitter with prototype pulse g(nT), other users, multiuser channel, and FMT receiver.]

In the subchannel-synchronized receiver (SCS-RX), synchronization is done at subchannel level. That is, we deploy an analysis FB where each subchannel filter is matched to the transmit pulse, compensates the frequency offset by an amount $\Delta_f^{(u,k)}$, and adjusts the time phase by an amount $\Delta_\tau^{(u,k)}$. The outputs are then sampled at rate $1/T_0$ and processed by linear subchannel equalization before detection. The synchronization parameters, $\Delta_f^{(u,k)}$ and $\Delta_\tau^{(u,k)}$, have to be estimated. We use a training approach, as explained in the next section. It should be noted that the optimal subchannel sampling phase can vary across the subchannels. This is because the propagation channel frequency selectivity translates into different subchannel equivalent impulse responses. The subchannel output of user $u$ is given by (4), where the superscript $*$ denotes the complex conjugate operation and $\eta^{(u,k)}(mT_0)$ is the sequence of filtered noise samples. The relation (4) has been obtained following the derivation in Appendix A. In (4), we have separated the term that carries the useful data symbol from the term that represents the subchannel intersymbol interference (ISI) contribution, the ICI term that is generated by the subchannels of index $k' \neq k$ of user $u$, and the MAI term that is generated by the subchannels that belong to the other users. These interference terms are the consequence of using a nonperfectly orthogonal FB, the presence of time/frequency offset, and channel frequency selectivity. An interesting characteristic of FMT is that the ICI/MAI contribution is negligible when frequency-confined pulses are used. Ideally, it is null with perfectly confined pulses if the frequency offsets are smaller than half the guard band among subchannels, that is, if condition (5) holds, where $\alpha/T_0$ is the excess band of the prototype pulse. Clearly, some intersymbol interference in each subchannel may be present and can be counteracted with subchannel equalization. In this paper, we consider linear equalizers. In fact, the subchannel equivalent response is not an ideal pulse; it is given by (6). The inner sum in (6) represents the correlation between the prototype pulse and the pulse itself modulated by the residual frequency offset. When this residual is nonzero, the analysis FB has a frequency mismatch with the synthesis FB. Further, the complex exponential factor that weights the useful data symbol in (4) introduces a time-variant rotation of the constellation.
On the other hand, if the estimation of the frequency offset is perfect, the equivalent impulse response is given by (7), where $r_g(iT)$ is the autocorrelation of the prototype pulse. The SCS-RX not only compensates the time offset of a given user, but it also uses an optimal time phase for each subchannel. In fact, $\Delta_\tau^{(u,k)}$ comprises both the propagation delay and the effect of the multipath channel, which moves the position of the peak of the subchannel impulse responses, as (7) shows. Such a peak is the amplitude of the useful data symbol in (4).
The SCS-RX has good performance as it will be shown in Section 6, but it has a significant drawback since it cannot be implemented using an efficient discrete Fourier transform (DFT) polyphase FB. The efficient FMT analysis FB comprises serial-to-parallel conversion of the received sample stream, low-rate subchannel filtering with pulses that are obtained by the polyphase decomposition of the prototype pulse, and finally a DFT [1,4]. A unique time phase for all the subchannels must be used, which does not allow adjusting timing at the subchannel level.
To reduce the complexity of the SCS-RX, we can compensate the time offset with a common value $\Delta_\tau^{(u)}$ and the frequency offset with a common value $\Delta_f^{(u)}$ for all the subchannels of a given user. We refer to this receiver as the user-synchronized receiver (US-RX). Now, one efficient analysis FB per user can be deployed, whose realization can be done as described in [4], or in [5] when the tones are regularly interleaved among the users. Subchannel equalization with symbol-spaced equalizers is performed. Although the US-RX can be implemented in an efficient way, it still requires one FB per user. Further, it suffers from a performance penalty compared to the SCS-RX (Section 6).
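The matched filtering with time and frequency precompensation used by the conventional receivers can be sketched in direct form as follows. This is only an illustration under stated assumptions: the exact expressions are those of (4) and are not reproduced here, the timing correction is restricted to an integer number of samples, the frequency correction is expressed in cycles per sample, and the function name `matched_subchannel_output` is hypothetical. The SCS-RX would call it with per-subchannel corrections, the US-RX with one common pair per user.

```python
import cmath

def matched_subchannel_output(y, pulse, k, M, N, m, delta_tau=0, delta_f=0.0):
    """One output sample z_k(m) of a direct-form analysis filter (sketch).

    y:         received samples at rate 1/T
    pulse:     causal FIR prototype pulse g(n)
    k, M:      tone index and number of tones (tone assumed at k/M)
    N:         interpolation factor, so outputs are spaced by N samples
    delta_tau: timing correction in samples (integer, an assumption)
    delta_f:   frequency correction in cycles per sample
    """
    acc = 0j
    start = m * N + delta_tau
    for i, g in enumerate(pulse):
        n = start + i
        if 0 <= n < len(y):
            # Remove the tone and the estimated frequency offset, then
            # correlate with the conjugate prototype pulse.
            derotate = cmath.exp(-2j * cmath.pi * (k / M + delta_f) * n)
            acc += y[n] * derotate * complex(g).conjugate()
    return acc
```

Because every subchannel may use its own `delta_tau` and `delta_f`, each output requires a separate correlation of this kind, which is why the SCS-RX cannot share a single polyphase DFT structure across subchannels.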

Fractionally Spaced Multiuser Receiver.
To further reduce the complexity and increase the performance, we propose a novel architecture that uses only two fractionally spaced analysis FBs to detect all $M$ signals that are partitioned among the $N_U$ users. The block diagram is depicted in Figure 2. We refer to this receiver as the fractionally spaced multiuser receiver (FS-RX). The FB outputs are processed with fractionally spaced linear subchannel equalizers, whose coefficients are obtained according to the minimum mean square error (MMSE) criterion [16]. The use of a fractionally spaced equalizer allows having a common time phase for all the users. Fine synchronization is not required. Only synchronization at symbol level is required, and this can be done at the output of the bank of filters.
It should be noted that with ideal band-limited pulses, neither ICI nor MAI is present with this receiver either. However, the use of an inexact sampling phase (as a result of imperfect synchronization) may yield increased subchannel ISI, which has to be handled with equalization. Further, a problem to be solved is the joint compensation of the carrier frequency offsets that differ among the users. This is accomplished, in our proposal, by the correction of part of the frequency offset before the FB (precompensation), and part after it (postcompensation). We define $M_3 = Q M_2 = K_3 M$, where $Q$ is a positive integer and $M_2 = \mathrm{l.c.m.}(M, N)$ is the least common multiple of $M$ and $N$. Then, we express the frequency offset as the sum of an integer multiple of $1/(M_3 T)$ and a fractional part $\tilde{\Delta}_f^{(u)}$, that is, $\Delta_f^{(u)} = q^{(u)}/(M_3 T) + \tilde{\Delta}_f^{(u)}$, where we choose the integer $q^{(u)}$ so as to minimize the fractional frequency offset at the output of the receiver FB, as explained in the following. In (9), we have assumed $-K_3/2 \leq q^{(u)} < K_3/2$ such that adjacent FMT subchannels do not completely overlap as a result of the frequency offset. Now, the FS-RX precompensates the integer part of the frequency offset by the estimated value $\hat{q}^{(u)}/(M_3 T)$ before subchannel filtering, and it samples the outputs at rate $2/T_0$, that is, we collect two samples per transmitted subchannel symbol. Therefore, the two subchannel outputs of index $k \in K_u$ belonging to user $u$ are given by (10), where $\varepsilon_q^{(u)}$ is the residual error in the correction of the integer part of the frequency offset. ICI and MAI terms are negligible due to the subchannel spectral containment. Their detailed expression is derived in Appendix A. The subchannel equivalent response is given by (11). The factor $e^{j2\pi(\tilde{\Delta}_f^{(u)} + \varepsilon_q^{(u)}) m T_0}$ that weights the useful data symbol in (10) introduces a time-variant rotation of the constellation. However, it can be estimated and compensated at the subchannel filter output, that is, by postcompensation. The factor $e^{j2\pi(\tilde{\Delta}_f^{(u)} + \varepsilon_q^{(u)}) nT}$ in the inner sum in (11) cannot be compensated, and it yields a frequency mismatch between the transmitted subchannel and the analysis subchannel filter, which is minimized for $\varepsilon_q^{(u)} = 0$, that is, when the estimation of the integer part of the frequency offset is perfect. Therefore, the precompensation of only the integer part of the frequency offset translates into both a subchannel SNR loss and an increased ISI. However, as shown in Section 6, the penalty in performance can be negligible for practical frequency offset values, that is, when $\tilde{\Delta}_f^{(u)} nT$ is small over the duration of the prototype pulse.
The joint correction of the integer part of the frequency offset for all the users can be realized in an efficient receiver implementation. Following the derivation in [4] for the single-user case, if we apply the polyphase decomposition with period $M_3 T$ to the filtering operation in the first line of (10), under the hypothesis of precompensating the frequency offset by $\hat{q}^{(u)}/(M_3 T)$, we obtain (12), with (13) being the polyphase decomposition of the received sample stream.

EURASIP Journal on Wireless Communications and Networking
Equations (12) and (13) suggest the scheme in Figure 2, where each of the two analysis FBs comprises the following steps: (i) serial-to-parallel conversion of the input sample stream, interpolation by a factor $L_3$, and filtering with the polyphase pulses in (14); (ii) computation of an $M_3$-point DFT, and sampling of the DFT outputs with index $K_3 k + \hat{q}^{(u)}$ for $k \in K_u$, which yields the subchannel signals of user $u$ and partly compensates the frequency shift introduced by the integer part of the carrier frequency offset; (iii) combining the signals in (12) for $\delta \in \{0, T_0/2\}$ to obtain a set of $M$ sample streams with sample spacing $T_0/2$; (iv) compensation of the fractional frequency offset (after estimation) by multiplication with $e^{-j\pi \tilde{\Delta}_f^{(u)} m T_0}$; (v) finally, processing of the signals by a fractionally spaced subchannel equalizer.
It should be noted that the correction of the integer part of the frequency offset is done by choosing the appropriate output tone of the $M_3$-point DFT (shifted tone). If we increase $Q$, we reduce the amount of the residual (fractional) frequency offset $\tilde{\Delta}_f^{(u)}$ at the expense of complexity, since the size of the DFT increases.
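The split of the carrier frequency offset into an integer and a fractional part, and the resulting choice of shifted DFT tones, can be sketched as follows. The sketch assumes $M_3 = Q \cdot \mathrm{l.c.m.}(M, N)$ and $K_3 = M_3/M$, takes the offset normalized to the sampling rate (that is, $\Delta_f T$), picks $q$ by rounding to the nearest integer, and wraps negative tone indices modulo $M_3$. All function names are hypothetical.

```python
from math import gcd

def fs_rx_grid(M, N, Q=1):
    """Return (M3, K3), assuming M3 = Q * lcm(M, N) and K3 = M3 / M."""
    M2 = M * N // gcd(M, N)
    M3 = Q * M2
    return M3, M3 // M

def split_frequency_offset(delta_f, M3):
    """Split delta_f (cycles per sample) into q / M3 plus a residual.

    Rounding to the nearest integer minimizes the magnitude of the
    residual (fractional) offset; Python rounds exact ties to even.
    """
    q = round(delta_f * M3)
    return q, delta_f - q / M3

def selected_dft_bins(tones, q, K3, M3):
    """DFT outputs K3 * k + q picked for the tones k of one user."""
    return [(K3 * k + q) % M3 for k in tones]
```

With $M = 32$, $N = 40$, and $Q = 1$, this gives $M_3 = 160$ and $K_3 = 5$. An offset of 0.26 subcarrier spacings then yields $q = 1$ and a residual of 0.06 subcarrier spacings, and a user holding tones 0, 4, and 8 reads DFT outputs 1, 21, and 41.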
In the FS-RX, the parameters to be estimated are the fractional part $\tilde{\Delta}_f^{(u)}$ and the integer part of the frequency offset. This is addressed in the next section.

Synchronization
We deploy a training approach to estimate the time/frequency offsets in the three receiver structures. Each user transmits a frame of data that comprises a known training data portion $a_{TR}^{(u,k)}(\ell T_0)$, $k \in K_u$, $\ell = 0, \ldots, N_{TR} - 1$, that is, a training sequence per subchannel. The training sequence is also used to train the MMSE subchannel equalizer using a recursive least square (RLS) algorithm [14].
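As an illustration of how a short training sequence can train a subchannel equalizer, the following is a minimal sketch of a textbook complex RLS recursion for a symbol-spaced linear equalizer with output $y(m) = \sum_i w_i\, z(m-i)$. It is not taken from [14]; the function name `rls_train`, the initialization constant `delta`, and the default forgetting factor are assumptions.

```python
def rls_train(inputs, desired, num_taps=3, forgetting=1.0, delta=1e-3):
    """Train equalizer taps w so that sum_i w[i] * z(m - i) tracks a(m).

    inputs:  subchannel output samples z(m) over the training portion
    desired: known training symbols a(m), aligned with inputs
    """
    w = [0j] * num_taps
    # P tracks the inverse of the exponentially weighted correlation
    # matrix of the regressor; it starts as (1 / delta) * identity.
    P = [[(1.0 / delta if r == c else 0.0) + 0j for c in range(num_taps)]
         for r in range(num_taps)]
    for m in range(len(desired)):
        # Regressor [z(m), z(m-1), ...], zero before the first sample.
        u = [inputs[m - i] if m - i >= 0 else 0j for i in range(num_taps)]
        uc = [v.conjugate() for v in u]
        Pu = [sum(P[r][c] * uc[c] for c in range(num_taps))
              for r in range(num_taps)]
        denom = forgetting + sum(u[r] * Pu[r] for r in range(num_taps))
        gain = [v / denom for v in Pu]
        err = desired[m] - sum(w[i] * u[i] for i in range(num_taps))
        w = [w[i] + gain[i] * err for i in range(num_taps)]
        uP = [sum(u[r] * P[r][c] for r in range(num_taps))
              for c in range(num_taps)]
        P = [[(P[r][c] - gain[r] * uP[c]) / forgetting
              for c in range(num_taps)] for r in range(num_taps)]
    return w
```

With a forgetting factor of 1, the recursion converges to the regularized least squares solution over the training block, so a few tens of training symbols are typically enough for a 3-tap equalizer when noise is moderate.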
The estimation of the parameters can be done at the input of the analysis FB, at its output, or jointly at the input and at the output. We refer to the first two approaches, respectively, as preestimation and postestimation of the parameters. The third approach can be performed using an iterative procedure. First, we filter the received signal with a bank of filters that is matched to the transmit FB. Second, the time offset and the frequency offset of the user are postestimated at its outputs. Third, we rerun the FB, now precompensating the received signal with the estimated time/frequency offset. The procedure is iteratively repeated. This iterative approach is particularly suited to the SCS-RX and the US-RX. At each iteration, we essentially decrease the frequency mismatch between the synthesis and analysis FBs. Therefore, at the first iteration (when we do not have any a priori knowledge of the time/frequency offset of the desired user) we run the FB in (15) for the channels of user $u$. The outputs are used to compute a correlation metric with known training symbols. In particular, we consider three approaches for the correlation metric, which are described in the next section. The metric allows determining an estimate of the time offset and of the frequency offset, denoted as $\hat{\Delta}_{\tau,it}^{(u,k)}$ and $\hat{\Delta}_{f,it}^{(u,k)}$, respectively. Once these estimates are computed, we rerun the receiver FB. However, now the FB can exploit the knowledge of the estimated time/frequency offset. Thus, for this new iteration we compute the outputs in (16). Now, using the outputs in (16), we can recompute the synchronization metrics in an iterative fashion and estimate the time and frequency offsets for this new iteration.
In the following, we propose several synchronization metrics for the receiver structures herein described. They implement at the FB output an appropriately defined correlation with the training data. The correlation is done in time (along the time dimension for a given subchannel) and/or in frequency (across the subchannels).

Metrics for the Conventional Receivers.
Let us assume that the carrier frequency offsets fulfil the relation $\Delta_f^{(u)} T_0 \sim 0$, $u = 1, \ldots, N_U$, such that they are small compared to the subchannel bandwidth, which is a realistic assumption in a practical system. Then, (15) can be written as in (17) for $m = 0, \ldots, N_{TR} - 1$ and $n = 0, \ldots, N - 1$, that is, in correspondence to the training sequence. In (17), $g_{EQ}^{(u,k)}(nT)$ denotes the equivalent subchannel impulse response that corresponds to (6). From (17), if we assume the training sequence to have good autocorrelation properties, the synchronization metric (18) can be devised. It essentially performs a correlation with lag $KT_0$ of the subchannel outputs divided by the training data. An example of it is shown in Figure 3 for several subchannels of a user in an FMT system that deploys 32 tones and multiplexes 4 users by regularly interleaving the tones across them. The channel has an exponential power delay profile. More details on the system parameters are reported in Section 6. The peak of (18) is in correspondence to the training sequence. In fact, for $n_{max} = \arg\max_n \{|P^{(u,k)}(n)|^2\}$, we obtain (20), where $\arg\max\{\cdot\}$ denotes the function that returns the argument that maximizes the expression. It should be noted that the contribution of ICI and MAI in $I^{(u,k)}(n)$ is small due to the good spectral subchannel containment. The lag $K$ is chosen to take into account the presence of subchannel ISI, and it has to be such that $KT_0$ is larger than the subchannel time dispersion. This is shown in Appendix B. Figure 3 also shows that the peak of the correlation metric differs among subchannels. Metric (18) is used to locate the training sequence and to estimate the time offset of subchannel $k$ of user $u$ as in (21), while it is used to estimate the frequency offset as in (22), where $\arg\{a\}$ returns the phase of the complex number $a$. The frequency offset estimation holds for $|\Delta_f| < 1/(2KT_0)$.
The frequency offsets can then be averaged across the subchannels since, in our assumptions, they do not differ for a given user.
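The frequency offset estimation from the phase of a lag-$K$ correlation can be sketched as follows. The code assumes that the metric has the form $P = \sum_m Z(m+K)\, Z^*(m)$, where $Z(m)$ is the subchannel output divided by the training symbol, and that the estimate is $\arg\{P\}/(2\pi K T_0)$. The exact expressions are those of (18) and (22); the function names below are hypothetical.

```python
import cmath

def lag_correlation(z, train, K):
    """Lag-K correlation of the outputs divided by the training data."""
    d = [zi / ai for zi, ai in zip(z, train)]
    return sum(d[m + K] * d[m].conjugate() for m in range(len(d) - K))

def frequency_offset_estimate(P, K, T0=1.0):
    """Offset from the correlation phase, valid for |df| < 1 / (2*K*T0)."""
    return cmath.phase(P) / (2 * cmath.pi * K * T0)

def average(values):
    """Average the per-subchannel estimates of one user."""
    return sum(values) / len(values)
```

Dividing by the training symbols removes the data modulation, so only the channel gain and the rotation $e^{j2\pi \Delta_f m T_0}$ remain. The phase of $P$ is therefore $2\pi \Delta_f K T_0$, which is unambiguous only within $\pm\pi$ and hence gives the stated estimation range.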
Finally, we point out that if we use the iterative approach described above, at each iteration, the refinement of the frequency offset estimation is such that the residual frequency offset at the bank output decreases.
The metric described above can be applied also to the US-RX. In this case, starting from the estimates given by (21) and (22), we compute the average values in (23). From (23), we obtain a common value for the time phase and for the carrier frequency offset, which are used for all subchannels assigned to user $u$.
In a different method, the outputs $Z^{(u,k)}(mT_0; nT)$ defined in (19) are used to compute the correlation in (24), where $\bar{K} \leq N_{TR} - K$. The metric (24) corresponds to the computation of a correlation over each subchannel of a given user, followed by averaging over the subchannels. If we choose $K = 1$, we just need two training symbols per subchannel, which minimizes the amount of redundancy required for synchronization. Metric (24) is used for time synchronization of user $u$ as in (25), while it is used to estimate the frequency offset as in (26). The frequency offset estimation holds for $|\Delta_f| < 1/(2KT_0)$. This metric is not suitable for the SCS-RX since it gives a common estimate for the offsets of the subchannels of user $u$.
The metric (24) in correspondence to the training sequence is given by (27). Therefore, while in (20) the maximum is in correspondence to the peak of the squared magnitude of the subchannel equivalent response, in (27) the maximum is in correspondence to the sum of the subchannel equivalent responses assigned to a given user. It should be noted that this metric turns out to be effective when the user is allocated a sufficient number of subchannels.
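A user-level metric of this kind can be sketched as follows. The code assumes that the per-subchannel lag-$K$ correlations are summed over the tones of the user (a scaling by the number of tones changes neither the position of the peak nor the phase) and that candidate sampling phases are compared through $|P(n)|^2$. The exact expressions are those of (24) to (26); all names are hypothetical.

```python
import cmath

def user_metric(outputs, training, K):
    """Sum over the user's tones of the lag-K correlation of Z = z / a.

    outputs:  dict {k: [z_k(0), z_k(1), ...]} at one candidate phase
    training: dict {k: [a_k(0), a_k(1), ...]} of known training symbols
    """
    total = 0j
    for k, z in outputs.items():
        d = [zi / ai for zi, ai in zip(z, training[k])]
        total += sum(d[m + K] * d[m].conjugate() for m in range(len(d) - K))
    return total

def best_timing(candidates, training, K):
    """Pick the candidate phase n with the largest |P(n)|^2."""
    scores = {n: user_metric(out, training, K) for n, out in candidates.items()}
    n_max = max(scores, key=lambda n: abs(scores[n]) ** 2)
    return n_max, scores[n_max]

def user_frequency_estimate(P, K, T0=1.0):
    """Common frequency offset of the user from the phase of the peak."""
    return cmath.phase(P) / (2 * cmath.pi * K * T0)
```

Summing across tones is what makes the estimate common to the user: subchannels with a weak equivalent response contribute little, while the total grows with the number of allocated tones.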

Metric for the Fractionally Spaced Multiuser Receiver.
Since the FS-RX processes $T_0/2$-spaced samples, we can derive the synchronization metric (28) starting from (24), provided that we downsample it by a factor $N/2$. Further, this receiver requires an estimate of the integer part $q^{(u)}$ and of the fractional part $\tilde{\Delta}_f^{(u)}$ of the frequency offset. We emphasize that in the efficient implementation of Figure 2, the compensation of the integer part of the frequency offset is accomplished by selecting, for each user of index $u$, the $M_3$-point DFT outputs of index $K_3 k + \hat{q}^{(u)}$ for $k \in K_u$, where $\hat{q}^{(u)}$ is equal to the estimated value of the integer frequency offset.
The metric (28) is used to jointly estimate the integer part of the frequency offset and the symbol timing for user $u$ as in (30). According to (30), we search the peak of the correlation (28) for each of the $K_3$ possible values of the integer frequency offset. The position of the highest peak yields both the estimate $\hat{q}^{(u)}$ and the user timing $\hat{\Delta}_\tau^{(u)} = n_{max}^{(u)} T_0/2$.
It should be noted that (28) can be written in a way similar to (27), provided that $g_{EQ}^{(u,k)}$ is defined as in (11). Therefore, the parameter estimates are chosen to maximize the sum of the squared amplitudes of the useful signals.
Finally, the fractional frequency offset is estimated as in (31). The fractional frequency offset estimation holds for $|\tilde{\Delta}_f| < 1/(2KT_0)$. Since frequency precompensation can be done for a value up to $K_3/(2M_3T) = 1/(2MT)$, the practical FS-RX works for a larger range of carrier frequency offsets than the SCS-RX and the US-RX that use the synchronization metrics described before. It should be noted that the constraint $|\tilde{\Delta}_f| < 1/(2KT_0)$ can be satisfied by increasing $Q$, since $\tilde{\Delta}_f$ decreases according to (8).
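The ranges above can be compared numerically by normalizing them to the subcarrier spacing $1/(MT)$. The sketch below assumes $T_0 = NT$ and $M_3 = Q \cdot \mathrm{l.c.m.}(M, N)$, and it assumes that with a correct integer estimate the fractional part is bounded by half the DFT resolution, $1/(2M_3T)$. The function name `offset_ranges` is hypothetical.

```python
from math import gcd

def offset_ranges(M, N, K, Q=1):
    """Frequency offset limits in units of the subcarrier spacing 1/(M*T).

    Returns (metric_range, precomp_range, worst_fractional):
      metric_range:     1/(2*K*T0), limit of the phase-based estimator
      precomp_range:    1/(2*M*T), limit of the integer precompensation
      worst_fractional: 1/(2*M3*T), largest fractional part left after
                        a correct integer correction (assumption)
    """
    M3 = Q * (M * N // gcd(M, N))
    metric_range = M / (2 * K * N)
    precomp_range = 0.5
    worst_fractional = M / (2 * M3)
    return metric_range, precomp_range, worst_fractional
```

For $M = 32$, $N = 40$, and $K = 3$, the phase-based estimator alone covers about 0.133 subcarrier spacings, while the integer precompensation extends the range to 0.5. With $Q = 1$ the worst-case fractional part is 0.1 subcarrier spacings, which is inside the estimator range, and it shrinks to 0.025 for $Q = 4$.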

Complexity Comparison
To evaluate the complexity of the proposed receiver structures, we consider separately the stage where the synchronization parameters are estimated and the detection stage. They are characterized by a different amount of complexity. This is because while detection may use an efficient implementation for the FB, synchronization for the conventional receivers requires inefficient processing at sampling time $T$. In our analysis, we do not consider the complexity introduced by the subchannel equalizer since it is identical for all structures. Further, we assume that $P = M/N_U$ subchannels are assigned in a regularly interleaved fashion to each user $u$. The pulse has a length of $LN$ coefficients.

Estimation of the Synchronization Parameters.
We now consider the synchronization stage. We have first subchannel filtering and then a subchannel metric. The SCS-RX and the US-RX require processing at sampling time $T$, while the FS-RX processes data streams at rate $2/T_0$. The complexity of the inefficient FB realization for the SCS-RX and the US-RX, expressed as the number of complex operations (sums and multiplications) per second (NOPS), is given by (32). The FS-RX deploys the efficient realization also for the estimation of the parameters. Thus, its complexity is given by (33). Now, we focus on the metrics described in Section 4. The SCS-RX uses the metric (18), which works at sampling time $T$ and requires the number of operations in (34). The metric (24) for the US-RX requires the number of operations in (35). For the FS-RX, the metric (28) works at sampling rate $2/T_0$ and requires the number of operations in (36). Finally, we point out that each synchronization metric requires a different number of calls to the functions $\arg\max\{\cdot\}$ and $\arg\{\cdot\}$. In the SCS-RX, there are $M$ calls to these functions; in the US-RX, there are $N_U$ calls; in the FS-RX, there are $K_3 M$ calls to the function $\arg\max\{\cdot\}$ and $N_U$ calls to the function $\arg\{\cdot\}$.

Detection.
At the detection stage, the SCS-RX not only compensates the time/frequency offset of a given user (already estimated), but it also uses an optimal time phase and frequency offset for each subchannel. For this reason, it can be implemented only using an inefficient high-complexity FB with no polyphase implementation. It corrects the frequency offset at its input, and it has $P$ subchannels (per user), each realized by a mixer and a decimation filter with $LN$ coefficients. It requires the number of operations in (37). The US-RX compensates the time/frequency offset for each user, but it deploys a common sampling phase for all the subchannels of a given user. In this case, the efficient implementation with an FB per user can be deployed and realized with the method in [4,5]. Since the tones are interleaved, we can exploit the discrete Fourier transform (DFT) properties and use a $P$-point DFT, yielding the complexity in (38). In (38), we take into account the fact that only $\min(QM_2, LN)$ coefficients at the FB outputs differ from zero. Their position cyclically shifts in an a priori known fashion, which allows simplifying the computation of the periodic transform.
The FS-RX has a complexity that is essentially equal to (33) plus $2M/T_0$ NOPS due to the correction of the fractional frequency offset. In Table 1, we report a numerical example in terms of complex operations per second per user. We consider the following set of parameters: $M = 32$, $N = 40$, $P = 8$, $L = 12$, $N_{TR} = 30$, $K = 3$, and $\bar{K} = N_{TR} - K$. Further, we consider 4 or 8 users. These parameters are also used in the next section about system performance. Looking at the synchronization stage, the FS-RX shows a significant advantage in terms of complexity with respect to the SCS-RX and the US-RX. This is due to the possibility of deploying an efficient implementation also during the synchronization stage. Looking at the detection stage, the US-RX and the FS-RX have similar complexity, and both are less complex than the SCS-RX.

Performance Results
To evaluate the performance of the proposed synchronization algorithms, we consider an FMT system with $M = 32$ tones. A single user or four asynchronous users with interleaved tone allocation are present. The interpolation factor is $N = 40$. The prototype pulse has duration $12T_0$, and it is designed according to the method in [4], which yields a good frequency confinement with a theoretical bandwidth equal to $1.25/T_0 = 1/(MT)$. To simulate the asynchronous uplink, we assume the carrier frequency offsets to be independent and uniformly distributed in $[-\Delta_f^{max}, \Delta_f^{max}]$, while the time offsets are uniformly distributed in $[0, T_0]$. The user channels are assumed Rayleigh faded with an exponential power delay profile with independent $T$-spaced taps that have average power $\Omega_p \sim e^{-pT/(0.05T_0)}$ with $p \in \mathbb{Z}^+$ and truncation at $-20$ dB. The data symbols and the training sequences belong to the 4-PSK signal set. The training sequences are randomly generated. If, for example, we assume a 20 MHz bandwidth, the channel has an overall duration equal to 500 nanoseconds, and the transmission rate is 32 Mbit/s. The subchannel equalizers have length equal to 3 taps. Clearly, if we increase the number of subcarriers, the subchannel equalizer length may be shortened.
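The numbers quoted for the 20 MHz example can be cross-checked with a short calculation. The sketch assumes that the 20 MHz bandwidth corresponds to the sampling rate $1/T$, that the tap index $p$ starts from 0, that all 32 subchannels carry 4-PSK data (2 bits per symbol), and that training overhead is ignored. The function name `system_numbers` is hypothetical.

```python
import math

def system_numbers(bandwidth_hz=20e6, M=32, N=40, bits_per_symbol=2,
                   decay=0.05, floor_db=-20.0):
    """Derive channel duration and bit rate from the simulation setup."""
    T = 1.0 / bandwidth_hz   # sampling period (assumed equal to 1/bandwidth)
    T0 = N * T               # subchannel symbol period
    # Average tap power in dB drops linearly with the tap index p,
    # since the profile is exp(-p * T / (decay * T0)).
    db_per_tap = -10.0 * math.log10(math.e) * T / (decay * T0)
    taps = 0
    while taps * db_per_tap >= floor_db:
        taps += 1            # keep taps that are at most 20 dB down
    return {
        "taps": taps,
        "channel_duration_s": taps * T,
        "bit_rate": M * bits_per_symbol / T0,
        "subcarrier_spacing_times_T0": N / M,
    }
```

Under these assumptions $T = 50$ ns and $T_0 = 2$ µs, the profile decays by about 2.17 dB per tap, ten taps survive the truncation (500 ns), and the rate is $32 \cdot 2 / T_0 = 32$ Mbit/s, which agrees with the figures in the text. The subcarrier spacing equals $N/M = 1.25$ times $1/T_0$.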
First, we consider the convergence of the synchronization metrics as a function of the training sequence length. In Figures 4 and 5, we plot the standard deviation of the frequency offset estimation error as a function of the training sequence length, respectively, for the metric that is used in the US-RX and in the FS-RX. The estimation error has zero mean. The factor $K$ in the estimator is set equal to 3. The SNR is set to 20 dB, and the curves are reported for two values of the maximum normalized carrier frequency offset $\varepsilon_f$. The curves show that the estimation error standard deviation rapidly decreases with the increase of the training sequence length (herein we consider a length of 10 to 30 symbols). Looking at Figure 4, where the US-RX is considered, we see that in the 4-user case a degradation with respect to the single-user case is obtained. This is justified by the fact that in the multiuser case each user deploys a fraction equal to 1/4 of the total number of subchannels. Therefore, less redundancy in the frequency domain is exploited with respect to the single-user case. For instance, a standard deviation equal to $10^{-5}$ is achieved with training sequences of length 10 for the single-user case and $\varepsilon_f = 0.05$, while the same value is obtained with training sequences of length 23 for the four-user case (Figure 4(a)).
The curves also show that the estimator is robust to a wide range of frequency offsets. In fact, they remain nearly unchanged for the two values of carrier frequency offset herein shown. In Figure 4, we also show the standard deviation of the estimation error using 1 and 2 iterations according to the method described in Section 4. The second iteration provides some small benefit only for large carrier frequency offsets.
In Figure 5, we consider the FS-RX, and we report the standard deviation of the estimation error, where the estimate of $\Delta^{(u)}$ is obtained from the estimation of the integer and fractional parts. Then, the standard deviation is averaged over the users. For both the single-user case and the multiuser case, the standard deviation of the estimation error is larger than that obtained in the US-RX. Some small improvement is obtained by increasing the frequency resolution of the receiver, that is, by increasing Q from 1 to 4. As we discuss in the following, increasing Q is beneficial to the bit error rate (BER) performance for high values of the carrier frequency offset.
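The convergence curves discussed above are obtained by Monte Carlo estimation of the error spread for each training length. The sketch below shows how such a curve can be produced in principle. It does not implement the metrics of the US-RX or the FS-RX: a generic lag-one correlation estimator is swapped in, applied to training samples whose modulation has already been removed, over a flat channel with additive white Gaussian noise. The function names and the number of trials are illustrative assumptions.

```python
import cmath
import math
import random


def lag_one_cfo_estimate(samples):
    """Frequency offset in cycles per symbol from the lag-one correlation phase."""
    acc = sum(samples[n + 1] * samples[n].conjugate()
              for n in range(len(samples) - 1))
    return cmath.phase(acc) / (2.0 * math.pi)


def rms_error(train_len, eps, snr_db, trials, rng):
    """Root mean square estimation error over independent trials."""
    # Unit-power signal: noise variance per real dimension is 1/(2*SNR).
    noise_sigma = math.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)
    sq_sum = 0.0
    for _ in range(trials):
        phase0 = rng.uniform(0.0, 2.0 * math.pi)  # unknown carrier phase
        samples = [
            cmath.exp(1j * (2.0 * math.pi * eps * n + phase0))
            + complex(rng.gauss(0.0, noise_sigma), rng.gauss(0.0, noise_sigma))
            for n in range(train_len)
        ]
        err = lag_one_cfo_estimate(samples) - eps
        sq_sum += err * err
    return math.sqrt(sq_sum / trials)


rng = random.Random(1)
curve = {length: rms_error(length, eps=0.05, snr_db=20.0, trials=400, rng=rng)
         for length in (10, 20, 30)}
for length, value in sorted(curve.items()):
    print(length, value)
```

For a zero-mean error the root mean square value coincides with the standard deviation, which is the quantity plotted in Figures 4 and 5. The toy estimator reproduces only the qualitative trend of a decreasing error with longer training, not the reported values.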
In Figures 6 and 7, we report the average BER as a function of the SNR for a maximum carrier frequency offset, normalized to the subcarrier spacing, of $\varepsilon_f = 0.05$. The ideal curves assume perfect synchronization and channel estimation, while the other curves assume practical synchronization and channel estimation with a single iteration for the SCS-RX and the US-RX. For the ideal case, we use a 3-tap subchannel MMSE equalizer. For the practical case, we use RLS training of the equalizer over the known synchronization sequences. The training sequences have length 30 unless otherwise stated.
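As an illustration of the RLS training step mentioned above, the following minimal sketch trains a 3-tap linear equalizer on a known 4-PSK sequence of length 30 using the standard exponentially weighted RLS recursion. The forgetting factor, the initialization constant `delta`, the noise-free two-tap toy channel, and the helper names are assumptions made for the example and are not taken from the paper.

```python
import cmath
import math
import random


def regressor(received, n, num_taps):
    """Input vector [r[n], r[n-1], ...] with zeros before the first sample."""
    return [received[n - k] if n - k >= 0 else 0j for k in range(num_taps)]


def filter_output(w, x):
    """Equalizer output w^H x."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


def rls_train(received, training, num_taps=3, forgetting=0.99, delta=100.0):
    """Standard complex RLS; returns the tap vector after the training symbols."""
    w = [0j] * num_taps
    # P tracks the inverse of the weighted input correlation matrix.
    P = [[complex(delta) if i == j else 0j for j in range(num_taps)]
         for i in range(num_taps)]
    idx = range(num_taps)
    for n, desired in enumerate(training):
        x = regressor(received, n, num_taps)
        Px = [sum(P[i][j] * x[j] for j in idx) for i in idx]
        denom = forgetting + sum(x[i].conjugate() * Px[i] for i in idx).real
        gain = [v / denom for v in Px]
        error = desired - filter_output(w, x)  # a priori error
        w = [w[i] + gain[i] * error.conjugate() for i in idx]
        xhP = [sum(x[i].conjugate() * P[i][j] for i in idx) for j in idx]
        P = [[(P[i][j] - gain[i] * xhP[j]) / forgetting for j in idx] for i in idx]
    return w


rng = random.Random(0)
symbols = [cmath.exp(1j * (math.pi / 4 + math.pi / 2 * rng.randrange(4)))
           for _ in range(130)]
channel = [1.0, 0.3]  # hypothetical toy subchannel response, noise-free
received = [sum(channel[k] * symbols[n - k]
                for k in range(len(channel)) if n - k >= 0)
            for n in range(len(symbols))]

w = rls_train(received, symbols[:30])  # train on the first 30 symbols
outputs = [filter_output(w, regressor(received, n, 3))
           for n in range(30, len(symbols))]
residual = max(abs(y - s) for y, s in zip(outputs, symbols[30:]))
print(residual)
```

The equalizer is trained only on the first 30 symbols and then held fixed, so `residual` measures the worst-case error on the data symbols that follow.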
First, we consider the ideal curves. The US-RX exhibits an error floor both for 1 user (Figure 6(a)) and 4 users (Figure 7(a)). The best performance is achieved by the SCS-RX, which remains practically unchanged with 1 or 4 asynchronous users. This shows the robustness of the FMT scheme to multiple access interference, which is due to the good subchannel spectral containment. The proposed FS-RX with ideal synchronization nearly achieves the performance of the SCS-RX both with 1 and 4 users. Now, we consider the practical curves, which allow benchmarking the performance of the synchronization parameter estimators (Figures 6(b) and 7(b)). Figure 6(b) shows that with a single user the practical curves are close to the ideal ones for all the receivers. The proposed FS-RX with practical synchronization performs better than the practical SCS-RX in the single-user case. Herein, we consider Q = 1, since we have found no improvement for Q = 4. Further, a single iteration is used by the estimator in the SCS-RX and the US-RX. With four users (Figure 7(b)), the practical FS-RX performs close to the SCS-RX, with a penalty of about 1 dB at BER = $10^{-2}$ and 2 dB at BER = $10^{-3}$. However, it has lower complexity. We also report in Figure 7(b) the BER curve obtained with the FS-RX when it uses synchronization sequences of length 20 instead of 30. The BER penalty is significant only at high SNRs.
Comparing the practical curves with the ideal ones, we see that a higher loss is found for the 4-user case. This is because we have fewer subchannels per user, so when we compute the synchronization metrics we have less redundancy in the frequency dimension.
To understand the dependence on the frequency offset value, we report in Figures 8 and 9 the BER as a function of the maximum carrier frequency offset. The SNR is set to 20 dB. Again, we first analyze the ideal curves. For $N_U = 1$, both the SCS-RX and the US-RX have performance independent of $\varepsilon_f$, since the frequency offset is fully compensated (Figure 8(a)). The US-RX has lower performance than the SCS-RX. For $N_U = 4$, the SCS-RX and the US-RX exhibit a rapid performance degradation only for $\varepsilon_f > 0.04$. This is due to the multiple access interference, which degrades performance when the carrier frequency offsets are such that the subchannels may overlap by more than 1/4 of the excess band of the pulse.
The FS-RX (Figures 8(b) and 9(b)) with ideal synchronization has performance close to the SCS-RX. Increasing the factor Q from 1 to 4 improves performance for $\varepsilon_f > 0.04$. Now looking at the practical curves, for all receivers the performance is close to the ideal curves. Somewhat larger, though still not significant, penalties are seen for the 4-user case. The estimators for the SCS-RX and the US-RX (Figures 8(a) and 9(a)) provide estimates for normalized carrier frequency offsets up to 0.13 using K = 3, while the estimator for the FS-RX does so up to 0.63. It should be noted that in Figure 9(a) the practical curves for $\varepsilon_f = 0.12$ are better than the ideal ones. This can be explained by the fact that for large frequency offsets the MAI from subchannel overlapping can become significant. In particular, a frequency offset equal to $\varepsilon_f = 0.12$ among adjacent subchannels translates into a significant superposition that equals 3/4 of the excess band of the pulse. Therefore, the full compensation of the frequency offsets provided by the SCS-RX and the US-RX may not be the optimal choice. That is, for large frequency offsets an analysis FB that is not perfectly frequency matched can exhibit a higher signal-to-interference ratio at its outputs. This is what happens with the practical receiver, which uses frequency offset estimates that differ from the ideal values.
The overall conclusion is that the FS-RX, with both ideal and practical synchronization, provides performance close to that of the SCS-RX while allowing for an efficient DFT-based implementation.

Conclusions
In this paper, we have discussed a training-based synchronization approach for multiuser FB systems. Two conventional receiver structures and a novel multiuser analysis FB with an efficient implementation have been described. The proposed synchronization metrics are based on a correlation approach that exploits the separability of subchannel signals that belong to different uplink users. An iterative analysis FB synchronization approach has also been considered for the conventional receivers. Simple RLS adaptive subchannel equalization has been used. The practical synchronization algorithms yield performance close to the ideal. The proposed novel multiuser receiver is an attractive solution both for its DFT-based implementation and for its performance with practical estimation of the parameters. With ideal synchronization, it achieves single-user transmission performance. Further, its practical estimator is highly robust over a wide range of time and frequency offsets.

A. Subchannel Output ICI and MAI Terms
In this appendix, we report the derivation of the ICI and MAI terms at the subchannel outputs in the SCS-RX and the FS-RX, which appear in (4) and (10), respectively.
First, we consider the SCS-RX. Substituting (1) and (3) in the first line of (4), we can write