A Practical Scheme for Frequency Offset Estimation in MIMO-OFDM Systems

This paper deals with training-assisted carrier frequency offset (CFO) estimation in multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems. The exact maximum likelihood (ML) solution to this problem is computationally demanding as it involves a line search over the CFO uncertainty range. To reduce the system complexity, we divide the CFO into an integer part plus a fractional part and select the pilot subcarriers such that the training sequences have a repetitive structure in the time domain. In this way, the fractional CFO is efficiently computed through a correlation-based approach, while ML methods are employed to estimate the integer CFO. Simulations indicate that the proposed scheme is superior to the existing alternatives in terms of both estimation accuracy and processing load.


Introduction
Orthogonal frequency-division multiplexing (OFDM) is an attractive modulation technique for wideband wireless communications due to its robustness against multipath distortions and flexibility in allocating power and data rate over distinct subchannels. For these reasons, it is adopted in a variety of applications, including digital audio broadcasting (DAB), digital video broadcasting (DVB), and the IEEE 802.11a wireless local area network (WLAN) [1]. Combining OFDM with the multiple-input multiple-output (MIMO) technology is an effective solution to increase the capacity of practical commercial systems. The deployment of multiple antennas at both the transmitter and receiver ends can be exploited to improve reliability by means of space-time coding techniques and/or to increase the data rate through spatial multiplexing [2].
Similar to single-input single-output (SISO) OFDM, MIMO-OFDM is extremely sensitive to carrier frequency offsets (CFOs) induced by Doppler shifts and/or oscillator instabilities. The CFO destroys orthogonality among subcarriers and must be accurately estimated and compensated for to avoid severe error rate degradations [3]. While CFO recovery is a well-studied problem for single-antenna systems, only a few solutions are available for MIMO-OFDM.
EURASIP Journal on Wireless Communications and Networking
A blind kurtosis-based scheme is presented in [4], while a method for jointly estimating the CFO and MIMO channel is derived in [5] by placing null subcarriers and pilot tones across adjacent OFDM blocks. Unfortunately, these methods are quite complex as they require a large-point discrete Fourier transform (DFT) operation and a computationally demanding line search. Furthermore, they provide the CFO estimate upon observation of several OFDM blocks and, accordingly, are not suited for packet-oriented applications, where synchronization must be completed shortly after the reception of a packet. In order to achieve fast timing and frequency recovery, training sequences with a periodic structure are commonly employed in SISO-OFDM systems [6][7][8]. Extending this approach to MIMO-OFDM, however, is not straightforward as signals emitted from different antennas give rise to multistream interference (MSI) at the receiver station, which may degrade the accuracy of the synchronization algorithms. The detrimental effect of MSI can be alleviated by a careful design of the MIMO preambles. For instance, in [9], it is shown that the performance of the least-squares (LS) channel estimator is optimized if the training sequences at different TX branches are orthogonal and shift-orthogonal for at least the channel length. To meet this requirement, a time-orthogonal design is employed in [10], where different TX antennas transmit their preambles over disjoint time intervals. In this way, however, the preamble length grows linearly with the number of TX branches, thereby increasing the system overhead. The use of chirp-like polyphase sequences is suggested in [11], while a training block composed of repeated PN sequences with good cross-correlation properties is employed in [12]. In both cases, the CFO estimate is obtained by cross-correlating the repetitive parts of the received preambles in a way similar to SISO-OFDM.
This approach is also adopted in [13,14], where the pilot sequences are obtained by repeating Chu or Frank-Zadoff codes with a different cyclic shift applied at each TX antenna. Alternative criteria for MIMO-OFDM preamble design can be found in [15,16].
A subspace-based method for CFO estimation in MIMO-OFDM has recently been proposed in [17]. In this scheme, pilot symbols at different transmit antennas are frequency-division multiplexed (FDM) and placed over equally spaced subcarriers. The resulting preambles are characterized by an inherent periodic structure in the time domain which can be effectively exploited at the receiver to separate signals arriving from different TX antennas. This approach is reminiscent of the multiple-signal-classification (MUSIC)-based frequency recovery scheme employed in [18] in the context of orthogonal frequency-division multiple access (OFDMA). The main advantage with respect to [18] is that in [17], the CFO estimate is obtained with reduced complexity by looking for the roots of a real-valued polynomial function. A root-based approach is also adopted in [19] after writing the CFO metric in polynomial form.
In this paper, the repetitive slots-based CFO estimator discussed in [8] is extended to MIMO-OFDM transmissions. In order to enlarge the frequency acquisition range, however, we decompose the CFO into a fractional part plus an integer part. The fractional CFO is computed first by cross-correlating the repetitive segments of the received preambles in a way similar to [8], while the integer CFO is subsequently estimated by resorting to maximum likelihood (ML) methods. This results in an algorithm of affordable complexity which can estimate large CFOs and whose accuracy attains the relevant Cramér-Rao bound (CRB).
The rest of this paper is organized as follows. Section 2 describes the system model and introduces basic notation. In Section 3, we review the joint ML estimation of the CFO and MIMO channel, while Section 4 is devoted to the training sequences design and CFO recovery scheme. Simulation results are presented in Section 5 and some conclusions are drawn in Section 6.

Notation 1. Matrices and vectors are denoted by boldface letters, with $\mathbf{W}_N$ and $\mathbf{I}_N$ being the DFT matrix and identity matrix of order $N$, respectively. $\mathbf{A} = \mathrm{diag}\{a(n);\ n = 1, 2, \ldots, N\}$ denotes an $N \times N$ diagonal matrix with entries $a(n)$ along its main diagonal, while $\mathbf{B}^{-1}$ is the inverse of a square matrix $\mathbf{B}$. We use $E\{\cdot\}$, $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ for expectation, complex conjugation, transposition, and Hermitian transposition, respectively. The notation $\|\cdot\|$ represents the Euclidean norm of the enclosed vector, while $\mathrm{Re}\{x\}$, $|x|$, and $\arg\{x\}$ stand for the real part, modulus, and principal argument of a complex number $x$. Finally, $[\mathbf{B}]_{k,l}$ denotes the $(k, l)$th entry of a matrix $\mathbf{B}$, while $\tilde{\lambda}$ is a trial value of the unknown parameter $\lambda$.

System Model
We consider a MIMO-OFDM system with $N_T$ transmitting and $N_R$ receiving antennas. We denote by $N$ the number of available subcarriers, which are enumerated from $n = 0$ to $n = N - 1$, and call $\mathbf{c}_i = [c_i(0), c_i(1), \ldots, c_i(N-1)]^T$ the frequency-domain pilot sequence at the $i$th TX antenna. Before transmission, this sequence is converted to the time domain through an inverse discrete Fourier transform (IDFT) operation, and a cyclic prefix (CP) of length $N_g$ is inserted to avoid inter-block interference (IBI). The signal emitted from the $i$th TX branch arrives at the $m$th RX antenna after propagating through a multipath channel with discrete-time impulse response $\mathbf{h}_{m,i} = [h_{m,i}(0), h_{m,i}(1), \ldots, h_{m,i}(L-1)]^T$, where $L$ is a design parameter that depends on the duration of the transmit/receive filters and on the channel delay spread. Since one single oscillator is used for frequency conversion at both ends of the wireless link, the same CFO is assumed for all transmit/receive antenna pairs.

We denote by $\mathbf{x}_m = [x_m(0), x_m(1), \ldots, x_m(N-1)]^T$ the $N$ time-domain samples received at the $m$th RX antenna after CP removal and let $\boldsymbol{\Gamma}(\nu) = \mathrm{diag}\{e^{j2\pi n\nu/N};\ 0 \le n \le N-1\}$, where $\nu$ is the frequency offset normalized by the subcarrier spacing. Assuming ideal timing recovery and $N_g \ge L$, we have

$$\mathbf{x}_m = \boldsymbol{\Gamma}(\nu)\,\mathbf{s}_m + \mathbf{n}_m, \qquad 1 \le m \le N_R, \tag{1}$$

where $\mathbf{n}_m$ is an $N$-dimensional vector of AWGN samples with zero mean and variance $\sigma_n^2$, while $\mathbf{s}_m = [s_m(0), s_m(1), \ldots, s_m(N-1)]^T$ is the useful signal component, which is modeled as

$$\mathbf{s}_m = \sum_{i=1}^{N_T} \mathbf{A}_i\,\mathbf{h}_{m,i}. \tag{2}$$

In (2), we have set $\mathbf{A}_i = \mathbf{W}_N^H \mathbf{C}_i \mathbf{F}_L$, where $\mathbf{C}_i = \mathrm{diag}\{c_i(n);\ 0 \le n \le N-1\}$ collects the pilot sequence emitted by the $i$th TX antenna, while $\mathbf{F}_L$ is the $N \times L$ matrix whose entries are specified in (3).

In Section 3 we show how to exploit vectors $\{\mathbf{x}_m;\ 1 \le m \le N_R\}$ for jointly estimating the CFO $\nu$ and the MIMO channel $\mathbf{H}$, which collects all the impulse responses $\{\mathbf{h}_{m,i}\}$. In doing so, we adopt the FDM training sequences suggested in [17], which optimize the performance of the LS channel estimator thanks to their shift-orthogonality properties [9]. Such sequences are expressed by

$$c_i(n) = \begin{cases} d_i(n'), & n = n'Q + \mu_i,\ \ 0 \le n' \le N/Q - 1, \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$

where $Q$ is a power of two not smaller than $N_T$, $\{\mu_i\}$ are integer parameters satisfying $0 \le \mu_1 < \mu_2 < \cdots < \mu_{N_T} < Q$, and $\{d_i(n')\}$ are pilot symbols with constant squared modulus $|d_i(n')|^2 = Q/N_T$. In this way, the total energy allocated to training amounts to $E_T = N$ and is equally split between the TX antennas.
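The FDM pilot layout described above can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `fdm_pilots`, the small sizes ($N = 64$, $Q = 4$, $N_T = 3$), the offsets `mu`, and the deterministic QPSK-like phases are all assumptions made for illustration, and the squared modulus $Q/N_T$ is taken as the pilot power. It verifies that the antennas occupy disjoint subcarriers and that the total training energy equals $N$.

```python
import cmath
import math

def fdm_pilots(N, Q, mu, num_tx):
    """Frequency-domain pilots: antenna i occupies subcarriers n'Q + mu[i]."""
    amp = math.sqrt(Q / num_tx)          # assumed pilot power |d_i(n')|^2 = Q / N_T
    pilots = []
    for i in range(num_tx):
        c = [0j] * N
        for k in range(N // Q):
            # Unit-modulus QPSK-like phase, chosen deterministically for the demo.
            phase = cmath.exp(1j * math.pi / 2 * ((3 * k + i) % 4))
            c[k * Q + mu[i]] = amp * phase
        pilots.append(c)
    return pilots

N, Q, N_T = 64, 4, 3
c = fdm_pilots(N, Q, mu=[0, 1, 3], num_tx=N_T)

# Subcarriers used by each antenna (FDM: the sets must be pairwise disjoint).
support = [{n for n, v in enumerate(ci) if v != 0} for ci in c]
total_energy = sum(abs(v) ** 2 for ci in c for v in ci)
print(sorted(support[0])[:4], round(total_energy, 6))
```

Each antenna ends up with $N/Q$ pilots of energy $Q/N_T$, so the per-antenna energy is $N/N_T$, which is the equal split mentioned in the text.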

Maximum Likelihood Frequency Estimation
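Given the unknown parameters $(\mathbf{H}, \nu)$, from (1), it turns out that vectors $\{\mathbf{x}_m\}$ are statistically independent and Gaussian distributed with mean $\boldsymbol{\Gamma}(\nu)\mathbf{s}_m$ and covariance matrix $\sigma_n^2\mathbf{I}_N$. Hence, bearing in mind (2), the log-likelihood function (LLF) for $(\mathbf{H}, \nu)$ takes the form

$$\Lambda(\tilde{\mathbf{H}}, \tilde{\nu}) = -N N_R \ln(\pi\sigma_n^2) - \frac{1}{\sigma_n^2}\sum_{m=1}^{N_R}\Big\|\mathbf{x}_m - \boldsymbol{\Gamma}(\tilde{\nu})\sum_{i=1}^{N_T}\mathbf{A}_i\tilde{\mathbf{h}}_{m,i}\Big\|^2. \tag{5}$$

As a consequence of the FDM property of the employed training sequences, we observe that $\mathbf{A}_i^H\mathbf{A}_j = \mathbf{0}$ for $i \ne j$. Using this fact, after neglecting irrelevant terms and positive factors independent of $\tilde{\mathbf{H}}$ and $\tilde{\nu}$, we may rewrite the LLF as

$$\Lambda_1(\tilde{\mathbf{H}}, \tilde{\nu}) = \sum_{m=1}^{N_R}\sum_{i=1}^{N_T}\Big\{2\,\mathrm{Re}\big[\tilde{\mathbf{h}}_{m,i}^H\mathbf{A}_i^H\boldsymbol{\Gamma}^H(\tilde{\nu})\mathbf{x}_m\big] - \tilde{\mathbf{h}}_{m,i}^H\mathbf{A}_i^H\mathbf{A}_i\tilde{\mathbf{h}}_{m,i}\Big\},$$

where we have borne in mind that $\boldsymbol{\Gamma}^H(\tilde{\nu})\boldsymbol{\Gamma}(\tilde{\nu}) = \mathbf{I}_N$. The joint ML estimate of the unknown parameters is the location where $\Lambda_1(\tilde{\mathbf{H}}, \tilde{\nu})$ achieves its global maximum. After standard computations (for a fixed $\tilde{\nu}$, the maximum over $\tilde{\mathbf{h}}_{m,i}$ is attained at $(\mathbf{A}_i^H\mathbf{A}_i)^{-1}\mathbf{A}_i^H\boldsymbol{\Gamma}^H(\tilde{\nu})\mathbf{x}_m$), the CFO estimate is found to be

$$\hat{\nu} = \arg\max_{\tilde{\nu}}\{g(\tilde{\nu})\}, \tag{7}$$

where

$$g(\tilde{\nu}) = \sum_{m=1}^{N_R}\sum_{i=1}^{N_T}\big\|\mathbf{L}_i^H\mathbf{A}_i^H\boldsymbol{\Gamma}^H(\tilde{\nu})\mathbf{x}_m\big\|^2 \tag{8}$$

and $\mathbf{L}_i\mathbf{L}_i^H$ is the following Cholesky decomposition:

$$\mathbf{L}_i\mathbf{L}_i^H = (\mathbf{A}_i^H\mathbf{A}_i)^{-1}. \tag{9}$$

In the sequel, we refer to (7) as the maximum likelihood frequency estimator (MLFE). The following remarks are in order.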
(1) A necessary condition for the existence of $(\mathbf{A}_i^H\mathbf{A}_i)^{-1}$ in the right-hand side of (9) is that $L \le N/Q$. On the other hand, from (4), it follows that $\mathbf{A}_i^H\mathbf{A}_i$ reduces to $N \cdot \mathbf{I}_L$ if $L \le N/Q$. In such a case, the frequency metric simplifies, apart from an irrelevant positive factor, to

$$g(\tilde{\nu}) = \sum_{m=1}^{N_R}\sum_{i=1}^{N_T}\big\|\mathbf{A}_i^H\boldsymbol{\Gamma}^H(\tilde{\nu})\mathbf{x}_m\big\|^2.$$

(2) By invoking the asymptotic efficiency property of the MLFE, the frequency estimate (7) is expected to be unbiased with an accuracy that approaches the corresponding CRB for large data records and sufficiently high signal-to-noise ratios (SNRs). Using the LLF in (5), the CRB is found to take the form given in (12) [19].

Frequency Estimation with Reduced Complexity
The line search required by (8) entails a heavy computational burden. One possible way to reduce the system complexity is indicated in [19], where $g(\tilde{\nu})$ is transformed into a real-valued polynomial function, and the CFO estimate is indirectly obtained by means of a polynomial rooting procedure. In this paper, we follow the alternative approach outlined in [8], by which a periodicity is first introduced in the MIMO training sequences, and CFO recovery is then accomplished by measuring the phase rotations between the repetitive parts of the received preambles. For this purpose, the sequences in (4) are modified so as to simultaneously satisfy the following constraints:

(C1) pilot symbols are equipowered, equispaced in the frequency domain, and modulate distinct subcarriers at different TX antennas according to the FDM principle;
(C2) each vector $\mathbf{W}_N^H\mathbf{c}_i$ ($i = 1, 2, \ldots, N_T$) of time-domain samples is obtained by the repetition of $R$ identical segments, where $R$ is some power of two.
Condition C1 implies that the $N_T$ preambles remain shift-orthogonal in the time domain, which is desirable to enhance the accuracy of the channel estimates, while condition C2 facilitates CFO recovery by ensuring that the preambles are periodic with period $P = N/R$.
To proceed further, let $Q$ be a power of two with $Q \ge N_T$. Then, it can be easily shown that C1 and C2 are simultaneously met if pilot symbols at each TX antenna are equispaced in the frequency domain at a distance of $M = QR$ subcarriers and their positions are shifted by $R$ subcarriers from one TX branch to the next. This amounts to putting

$$c_i(n) = \begin{cases} d_i(n'), & n = n'M + (i-1)R,\ \ 0 \le n' \le N/M - 1, \\ 0, & \text{otherwise}, \end{cases} \tag{14}$$

where we set $|d_i(n')|^2 = M/N_T$ to ensure that the total energy allocated to training is still $E_T = N$. It is worth observing that the use of time-repetitive FDM training sequences for MIMO-OFDM has also been suggested in [16] to make the CRB of the frequency estimates independent of the channel realization. However, our design (14) is more general as it applies to any triple $(N, N_T, L)$, whereas in [16], the number of subcarriers is constrained to be a multiple of $N_T L$. Recalling that in practical OFDM systems $N$ is always a power of two, it turns out that the sequence design in [16] can only be adopted on condition that both $N_T$ and $L$ are powers of two.
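Conditions C1 and C2 can be verified on a toy example. The sketch below is illustrative only and assumes a unitary IDFT, a pilot power of $M/N_T$, small sizes ($N = 64$, $Q = 4$, $R = 4$, $N_T = 3$), and arbitrary unit-modulus pilot phases; the names `idft` and `periodic_fdm_pilots` are hypothetical. It places the pilots of antenna $i$ on subcarriers $n'M + (i-1)R$ and checks that every time-domain preamble repeats with period $P = N/R$.

```python
import cmath
import math

def idft(c):
    """Unitary inverse DFT (O(N^2), adequate for a small demo)."""
    N = len(c)
    return [sum(c[n] * cmath.exp(2j * math.pi * n * k / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]

def periodic_fdm_pilots(N, Q, R, num_tx):
    """Antenna i (0-based) uses subcarriers n'M + i*R with M = Q*R."""
    M = Q * R
    amp = math.sqrt(M / num_tx)          # assumed pilot power |d_i(n')|^2 = M / N_T
    pilots = []
    for i in range(num_tx):
        c = [0j] * N
        for k in range(N // M):
            c[k * M + i * R] = amp * cmath.exp(1j * math.pi / 2 * ((k * k + i) % 4))
        pilots.append(c)
    return pilots

N, Q, R, N_T = 64, 4, 4, 3
P = N // R
time_blocks = [idft(c) for c in periodic_fdm_pilots(N, Q, R, N_T)]

# C2: every preamble must satisfy s(k) = s(k + P); C1 keeps the energy at N.
max_dev = max(abs(s[k] - s[k + P]) for s in time_blocks for k in range(N - P))
energy = sum(abs(v) ** 2 for s in time_blocks for v in s)
print(max_dev < 1e-9, round(energy, 6))
```

The periodicity follows because all pilot indices are multiples of $R$, so each active complex exponential completes an integer number of cycles over $P$ samples.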
As is well known, the use of OFDM preambles composed of $R$ repetitive slots restricts the acquisition range of the CFO estimator to $\pm R/2$ times the subcarrier spacing. To cope with this drawback, we decompose $\nu$ into a fractional part, less than $R/2$ in magnitude, plus an integer part which is a multiple of $R$. The normalized CFO is thus rewritten as

$$\nu = \varepsilon R + \eta R, \tag{15}$$

where $\eta$ is an integer parameter referred to as the integer CFO (ICFO), while $\varepsilon$ is the fractional CFO (FCFO) and belongs to the interval $(-1/2, 1/2]$. Since the transmitted preambles remain periodic after passing through the channel (apart from the presence of thermal noise and from a phase shift induced by the CFO), each vector $\mathbf{x}_m$ of received time-domain samples can be decomposed into $R$ segments $\{\mathbf{x}_m(r);\ r = 0, 1, \ldots, R-1\}$ of length $P$, with

$$\mathbf{x}_m(r) = e^{j2\pi r\varepsilon}\,\mathbf{u}_m + \mathbf{n}_m(r), \qquad r = 0, 1, \ldots, R-1. \tag{16}$$

In (16), $\mathbf{u}_m$ is a $P$-dimensional vector with elements

$$u_m(p) = e^{j2\pi p\nu/N}\,s_m(p), \qquad 0 \le p \le P-1, \tag{17}$$

while $\{\mathbf{n}_m(r);\ r = 0, 1, \ldots, R-1\}$ are statistically independent Gaussian vectors with zero mean and covariance matrix $\sigma_n^2\mathbf{I}_P$.
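The decomposition of the normalized CFO into integer and fractional parts can be made concrete with a few lines of code. This is a small sketch under the convention used in this section, namely $\nu = R(\varepsilon + \eta)$ with $\varepsilon \in (-1/2, 1/2]$; the helper name `split_cfo` and the sample values are assumptions for illustration.

```python
import math

def split_cfo(nu, R):
    """Return (eps, eta) with nu = R * (eps + eta) and eps in (-1/2, 1/2]."""
    eta = math.ceil(nu / R - 0.5)   # integer CFO, in units of R subcarrier spacings
    eps = nu / R - eta              # fractional CFO
    return eps, eta

eps_a, eta_a = split_cfo(18.4, R=8)   # large offset: needs ICFO recovery
eps_b, eta_b = split_cfo(-3.2, R=8)   # |nu| < R/2: eta is zero
print(eps_a, eta_a, eps_b, eta_b)
```

Using `ceil(nu / R - 0.5)` rather than rounding keeps the boundary case $\nu/R = 1/2$ inside the half-open interval $(-1/2, 1/2]$.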

Estimation of the Fractional CFO.
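Our first goal is the estimation of $\varepsilon$ based on the observations $\{\mathbf{x}_m\}_{m=1}^{N_R}$. Inspection of (16) reveals that this task is complicated by the presence of the nuisance vectors $\{\mathbf{u}_m\}$. One possible approach is to consider such vectors as deterministic but unknown parameters and proceed to the joint ML estimation of the parameter set $(\mathbf{u}, \varepsilon)$, with $\mathbf{u} = [\mathbf{u}_1^T\ \mathbf{u}_2^T \cdots \mathbf{u}_{N_R}^T]^T$. This approach has been used in [8] in the context of SISO-OFDM, and its extension to MIMO transmissions leads to the following FCFO metric:

$$q(\tilde{\varepsilon}) = \sum_{m=1}^{N_R}\sum_{r=1}^{R-1}\mathrm{Re}\big\{R_m(r)\,e^{-j2\pi r\tilde{\varepsilon}}\big\}, \tag{18}$$

where $R_m(r)$ is the $rP$-lag sample correlation function evaluated at the $m$th RX branch, that is,

$$R_m(r) = \sum_{n=rP}^{N-1} x_m(n)\,x_m^*(n - rP), \qquad r = 1, 2, \ldots, R-1. \tag{19}$$

The ML estimate of $\varepsilon$ is eventually found by locating the global maximum of $q(\tilde{\varepsilon})$. Unfortunately, no closed-form solution is available except when $R = 2$. The more general case can be approached by an exhaustive search over the interval $\tilde{\varepsilon} \in (-1/2, 1/2]$, which may be cumbersome in practice. For this reason, we suggest a suboptimal but simpler procedure which develops in two steps. In the first step, a coarse FCFO estimate is obtained as

$$\hat{\varepsilon}^{(c)} = \frac{1}{2\pi}\arg\Big\{\sum_{m=1}^{N_R} R_m(1)\Big\}. \tag{20}$$

The rationale behind the above expression is easily understood after substituting (16) into (19). This yields

$$R_m(r) = (R - r)\,\|\mathbf{u}_m\|^2\,e^{j2\pi r\varepsilon} + N_m(r), \tag{21}$$

where $N_m(r)$ is a zero-mean disturbance term collecting signal × noise and noise × noise interactions. Inspection of (21) reveals that, in the absence of noise, the right-hand side of (20) is just the true FCFO. In order to improve the estimation accuracy, $\hat{\varepsilon}^{(c)}$ is refined in the second step by looking for an estimate of the residual error $\Delta\varepsilon = \varepsilon - \hat{\varepsilon}^{(c)}$. For this purpose, we let $R_m^{(c)}(r) = R_m(r)\,e^{-j2\pi\hat{\varepsilon}^{(c)}r}$ and rewrite (18) in the following form:

$$q(\Delta\tilde{\varepsilon}) = \sum_{m=1}^{N_R}\sum_{r=1}^{R-1}\big|R_m^{(c)}(r)\big|\cos\big[\varphi_m^{(c)}(r) - 2\pi\Delta\tilde{\varepsilon}\,r\big], \tag{22}$$

where we have defined $\Delta\tilde{\varepsilon} = \tilde{\varepsilon} - \hat{\varepsilon}^{(c)}$ and $\varphi_m^{(c)}(r) = \arg\{R_m^{(c)}(r)\}$. Setting to zero the derivative of (22) with respect to $\Delta\tilde{\varepsilon}$ and assuming that $\Delta\varepsilon$ is small enough such that $\sin[\varphi_m^{(c)}(r) - 2\pi\Delta\tilde{\varepsilon}\,r] \approx \varphi_m^{(c)}(r) - 2\pi\Delta\tilde{\varepsilon}\,r$, an estimate of $\Delta\varepsilon$ can be computed in closed form as

$$\Delta\hat{\varepsilon} = \frac{1}{2\pi}\,\frac{\sum_{m=1}^{N_R}\sum_{r=1}^{R-1} r\,\big|R_m^{(c)}(r)\big|\,\varphi_m^{(c)}(r)}{\sum_{m=1}^{N_R}\sum_{r=1}^{R-1} r^2\,\big|R_m^{(c)}(r)\big|}. \tag{23}$$

The final FCFO estimate is given by

$$\hat{\varepsilon} = \hat{\varepsilon}^{(c)} + \Delta\hat{\varepsilon}. \tag{24}$$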
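The two-step procedure described above (coarse estimate from the lag-$P$ correlation, then a linearized refinement using all lags) can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the correlations are taken unnormalized, the coarse step uses the phase of the sum of the first-lag correlations over the RX branches, and the refinement is a weighted average of the residual phases with weights $r\,|R_m(r)|$. The function names, the random periodic test signal, and the noise level are hypothetical.

```python
import cmath
import math
import random

def correlations(x, R):
    """rP-lag sample correlations R(r), r = 1..R-1, of one received vector."""
    N = len(x)
    P = N // R
    return [sum(x[n] * x[n - r * P].conjugate() for n in range(r * P, N))
            for r in range(1, R)]

def estimate_fcfo(rx, R):
    """Coarse lag-1 phase estimate followed by a linearized refinement."""
    corr = [correlations(x, R) for x in rx]            # one list per RX branch
    coarse = cmath.phase(sum(c[0] for c in corr)) / (2 * math.pi)
    num = den = 0.0
    for c in corr:
        for r, value in enumerate(c, start=1):
            rotated = value * cmath.exp(-2j * math.pi * coarse * r)
            num += r * abs(rotated) * cmath.phase(rotated)
            den += r * r * abs(rotated)
    return coarse + num / (2 * math.pi * den)

# Toy received signal: a random P-periodic waveform rotated by a CFO of eps*R.
random.seed(1)
N, R, N_R, eps_true, sigma = 64, 8, 2, 0.23, 0.05
P = N // R
rx = []
for _ in range(N_R):
    u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(P)]
    rx.append([u[n % P] * cmath.exp(2j * math.pi * n * eps_true * R / N)
               + sigma * complex(random.gauss(0, 1), random.gauss(0, 1))
               for n in range(N)])

eps_hat = estimate_fcfo(rx, R)
print(abs(eps_hat - eps_true) < 5e-3)
```

Only the first lag is unambiguous over the whole interval $(-1/2, 1/2]$, which is why it drives the coarse step; the higher lags carry more phase rotation per unit of $\varepsilon$ and therefore sharpen the estimate once the ambiguity has been removed.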

Estimation of the Integer CFO.
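If the normalized CFO is guaranteed to be less than $R/2$ in magnitude, the quantity $\hat{\varepsilon}R$ can be regarded as an estimate of $\nu$. Otherwise, $\nu$ is expressed as in (15), and an estimate of the integer offset $\eta$ must be found. This problem is now addressed using ML methods. In order to compensate for the fractional offset, the received samples at each RX branch are first counter-rotated at an angular speed $2\pi\hat{\varepsilon}R/N$. This produces the $N_R$ vectors $\mathbf{z}_m = [z_m(0), z_m(1), \ldots, z_m(N-1)]^T$, with

$$z_m(n) = x_m(n)\,e^{-j2\pi n\hat{\varepsilon}R/N}, \qquad 0 \le n \le N-1. \tag{25}$$

Substituting (1)-(2) into (25) and assuming ideal FCFO compensation, we obtain

$$\mathbf{z}_m = \boldsymbol{\Gamma}(\eta R)\sum_{i=1}^{N_T}\mathbf{A}_i\mathbf{h}_{m,i} + \mathbf{n}'_m, \tag{26}$$

where $\mathbf{n}'_m = \boldsymbol{\Gamma}^H(\hat{\varepsilon}R)\,\mathbf{n}_m$ is the noise contribution, which is statistically equivalent to $\mathbf{n}_m$. Vectors $\{\mathbf{z}_m\}$ are next used to get the joint ML estimate of $(\mathbf{H}, \eta)$. Bearing in mind (26), the corresponding LLF is found to be

$$\Lambda_2(\tilde{\mathbf{H}}, \tilde{\eta}) = -N N_R\ln(\pi\sigma_n^2) - \frac{1}{\sigma_n^2}\sum_{m=1}^{N_R}\Big\|\mathbf{z}_m - \boldsymbol{\Gamma}(\tilde{\eta}R)\sum_{i=1}^{N_T}\mathbf{A}_i\tilde{\mathbf{h}}_{m,i}\Big\|^2, \tag{27}$$

by which, maximizing with respect to $\tilde{\mathbf{h}}_{m,i}$, we obtain

$$\hat{\mathbf{h}}_{m,i} = (\mathbf{A}_i^H\mathbf{A}_i)^{-1}\mathbf{A}_i^H\boldsymbol{\Gamma}^H(\tilde{\eta}R)\,\mathbf{z}_m. \tag{28}$$

In (28), $\mathbf{A}_i^H\mathbf{A}_i$ is an $L \times L$ matrix whose rank is not greater than $\min\{L, N/M\}$. Hence, a necessary condition for the existence of $(\mathbf{A}_i^H\mathbf{A}_i)^{-1}$ is that $L \le N/M$. In such a case, if the pilot sequences are those defined in (14), $\mathbf{A}_i^H\mathbf{A}_i$ is proportional to $\mathbf{I}_L$ and the channel estimate simplifies to the form given in (29). The concentrated likelihood function for $\eta$ is found by substituting (29) into the right-hand side of (27). Neglecting irrelevant terms and positive factors independent of $\tilde{\eta}$, we obtain

$$\psi(\tilde{\eta}) = \sum_{m=1}^{N_R}\sum_{i=1}^{N_T}\big\|\mathbf{A}_i^H\boldsymbol{\Gamma}^H(\tilde{\eta}R)\,\mathbf{z}_m\big\|^2, \tag{30}$$

and the ML estimate of $\eta$ is computed as

$$\hat{\eta} = \arg\max_{|\tilde{\eta}| \le |\eta|_{\max}}\{\psi(\tilde{\eta})\}, \tag{31}$$

where $|\eta|_{\max}$ represents the largest expected value of $|\eta|$, which is determined by the stability of the transmitter and receiver oscillators. Recalling that $\mathbf{A}_i = \mathbf{W}_N^H\mathbf{C}_i\mathbf{F}_L$, after standard manipulations, we may put $\psi(\tilde{\eta})$ in the equivalent form (32), where $\{Z_m(n)\}$ is the repetition with period $N$ of the DFT of $\mathbf{z}_m$, as specified in (33). On the other hand, from (14), we see that symbols $c_i(n)$ are different from zero only when $n = p_i(n')$, where $p_i(n') = n'M + (i-1)R$ are the indices of the pilot subcarriers at the $i$th TX antenna. Function $\psi(\tilde{\eta})$ can thus be rewritten as in (34). Once the ICFO is obtained as indicated in (31), an estimate of the CFO is computed from (15) in the form

$$\hat{\nu} = R(\hat{\varepsilon} + \hat{\eta}). \tag{35}$$

In the sequel, we refer to (35) as the reduced complexity frequency estimator (RCFE).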
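The ICFO search can be illustrated with a short simulation. The sketch below is one plausible realization of a pilot-domain metric and is not taken from the paper: it assumes that the matrix $\mathbf{F}_L$ has entries $e^{-j2\pi nl/N}$, ignores all scale factors, assumes perfect FCFO compensation and no noise, and uses small hypothetical sizes ($N = 64$, $Q = R = 4$, $N_T = N_R = 2$, $L = 3$). For each trial value, it reads the DFT outputs at the shifted pilot positions, strips the pilot symbols, and accumulates the energy of the resulting $L$-tap channel estimates.

```python
import cmath
import math
import random

def dft(x):
    """Plain DFT (O(N^2), adequate for a small demo)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N))
            for n in range(N)]

def icfo_metric(Z_all, d, eta_trial, R, M, L):
    """Energy of the L-tap channel estimates obtained for a trial ICFO."""
    N = len(Z_all[0])
    total = 0.0
    for Z in Z_all:                               # RX branches
        for i, d_i in enumerate(d):               # TX antennas (0-based index i)
            for l in range(L):
                acc = 0j
                for k, sym in enumerate(d_i):
                    p = k * M + i * R             # pilot subcarrier index
                    acc += (sym.conjugate() * Z[(p + eta_trial * R) % N]
                            * cmath.exp(2j * math.pi * l * p / N))
                total += abs(acc) ** 2
    return total

random.seed(7)
N, Q, R, N_T, N_R, L, eta_true = 64, 4, 4, 2, 2, 3, 1
M = Q * R
qpsk = [1 + 0j, 1j, -1 + 0j, -1j]
d = [[math.sqrt(M / N_T) * random.choice(qpsk) for _ in range(N // M)] for _ in range(N_T)]

Z_all = []
for _ in range(N_R):
    S = [0j] * N                                  # noiseless spectrum without CFO
    for i in range(N_T):
        h = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(L)]
        for k, sym in enumerate(d[i]):
            p = k * M + i * R
            S[p] += sym * sum(h[l] * cmath.exp(-2j * math.pi * p * l / N) for l in range(L))
    s = [sum(S[n] * cmath.exp(2j * math.pi * n * t / N) for n in range(N)) / N for t in range(N)]
    z = [s[t] * cmath.exp(2j * math.pi * t * eta_true * R / N) for t in range(N)]
    Z_all.append(dft(z))

eta_hat = max(range(-1, 2), key=lambda e: icfo_metric(Z_all, d, e, R, M, L))
print(eta_hat)
```

In this toy setting a wrong hypothesis makes some antennas look at empty subcarriers or at another antenna's pilots, so it can capture at most part of the received energy, while the correct hypothesis collects the contributions of all TX antennas.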

Remarks.
(1) As mentioned previously, matrix $\mathbf{A}_i^H\mathbf{A}_i$ in (28) is nonsingular provided that $L \le N/M$. This condition is more restrictive than the constraint $L \le N/Q$ that was found in the previous section for MLFE. In particular, recalling that $M = QR$, it turns out that the maximum channel length that RCFE can manage is $R$ times smaller than for MLFE.
(2) Assuming for simplicity that the ICFO has been perfectly estimated, from (35), it follows that $E\{(\hat{\nu} - \nu)^2\} = R^2 \cdot E\{(\hat{\varepsilon} - \varepsilon)^2\}$. Since parameters $(\mathbf{u}, \varepsilon)$ are jointly estimated through ML methods, we expect that $E\{(\hat{\varepsilon} - \varepsilon)^2\}$ asymptotically approaches the corresponding CRB. The latter is provided in [8] and reads

$$\mathrm{CRB}(\varepsilon) = \frac{3\,\sigma_n^2}{2\pi^2 N N_R (R^2 - 1)\,\sigma_s^2}, \tag{36}$$

where $\sigma_s^2$ denotes the average signal power at each RX branch, as defined in (37). The frequency MSE is thus given by

$$E\{(\hat{\nu} - \nu)^2\} = \frac{3\,\sigma_n^2}{2\pi^2 N N_R (1 - 1/R^2)\,\sigma_s^2}. \tag{38}$$

(3) The computational load of RCFE can be assessed as follows. Computing the correlations $\{R_m(r)\}_{r=1}^{R-1}$ in (19) requires a total of $2(R-1)(2N-1)$ real operations (additions plus multiplications) for each RX branch, while $8N_R(R-1)$ operations are needed to obtain $\Delta\hat{\varepsilon}$ in (23). Quantities $Z_m(n)$ in (33) are computed through an $N$-point DFT for each receiving antenna, with a corresponding complexity of $5N_R N\log_2 N$. Finally, evaluating $\psi(\tilde{\eta})$ in (34) needs an additional $8NLN_TN_R/M$ operations for each $\tilde{\eta}$. The overall complexity of RCFE is summarized in the first row of Table 1, where a distinction has been made between the FCFO and ICFO recovery tasks, and we have denoted by $N_\eta = 2|\eta|_{\max} + 1$ the number of hypothesized ICFO values.
(4) Our FCFO recovery algorithm is an improved version of the correlation-based frequency estimator (CBFE) proposed in [12]. Actually, both schemes employ training preambles composed of $R$ repetitive parts and operate in two steps. A coarse estimate $\hat{\varepsilon}^{(c)}$ is first computed by CBFE in a way similar to (20), and it is next refined by evaluating the quantity $\Delta\hat{\varepsilon}$ in (39). The final CFO estimate is obtained as $\hat{\nu}_{\mathrm{CBFE}} = R(\hat{\varepsilon}^{(c)} + \Delta\hat{\varepsilon})$, and its MSE is given by [12]

$$E\{(\hat{\nu}_{\mathrm{CBFE}} - \nu)^2\} = \frac{2\,\sigma_n^2}{\pi^2 N N_R\,\sigma_s^2}. \tag{40}$$

Comparing this result with (38), we see that the loss (in dB) with respect to RCFE is $10\log_{10}[4(1 - 1/R^2)/3]$, which approaches 1.25 dB for large values of $R$. Furthermore, since no ICFO estimation is attempted in [12], the estimation range of CBFE is restricted to $|\nu| \le R/2$, while RCFE can cope with CFOs as large as $\pm N/2$. The overall complexity of CBFE is shown in the third row of Table 1. Compared to FCFO recovery by means of RCFE, the computational saving of CBFE is in the order of $R/3$.
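The operation counts quoted in remark (3) and the dB loss quoted in remark (4) are easy to evaluate numerically. The sketch below simply plugs numbers into those expressions; the function names are hypothetical, the parameter values are the ones used later in the simulations ($N = 1024$, $R = 8$, $Q = 4$, $L = 12$, $N_T = 3$, $N_R = 2$, $N_\eta = 5$), and any term that appears only in Table 1 is not included.

```python
import math

def rcfe_operations(N, R, Q, L, n_tx, n_rx, n_eta):
    """Real operations for FCFO and ICFO recovery, from the counts in remark (3)."""
    M = Q * R
    fcfo = n_rx * 2 * (R - 1) * (2 * N - 1) + 8 * n_rx * (R - 1)
    icfo = 5 * n_rx * N * math.log2(N) + n_eta * 8 * N * L * n_tx * n_rx / M
    return fcfo, icfo

def cbfe_loss_db(R):
    """Loss of CBFE with respect to RCFE, 10*log10[4(1 - 1/R^2)/3]."""
    return 10 * math.log10(4 * (1 - 1 / R**2) / 3)

fcfo_ops, icfo_ops = rcfe_operations(N=1024, R=8, Q=4, L=12, n_tx=3, n_rx=2, n_eta=5)
print(fcfo_ops, icfo_ops, round(cbfe_loss_db(8), 3), round(cbfe_loss_db(10**6), 3))
```

With these values the FCFO stage costs about 5.7 × 10^4 operations and the whole estimator about 2.5 × 10^5, while the loss of CBFE is roughly 1.18 dB at $R = 8$ and tends to 1.25 dB as $R$ grows.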

Simulation Results
Computer simulations have been run to check and extend the analytical results of the previous sections. The simulation scenario is summarized as follows.

Simulation Model.
The investigated MIMO-OFDM system has $N = 1024$ subcarriers and operates in the 5 GHz frequency band. The signal bandwidth is 5 MHz, corresponding to a subcarrier distance of approximately 4.9 kHz. The sampling period is $T_s = 0.2$ microsecond, so that the useful part of each OFDM block has length 0.205 millisecond. Each channel is characterized by $L = 12$ independent Rayleigh fading taps with the exponentially decaying power delay profile given in (41). In (41), the constant $\sigma_h^2$ is chosen such that the channel power is normalized to unity, that is, $E\{\|\mathbf{h}_{m,i}\|^2\} = 1$. A new channel snapshot is generated at each simulation run and kept fixed over the training period. Vectors $\mathbf{h}_{m,i}$ are assumed to be statistically independent for different TX/RX antenna pairs. The training sequences employed by RCFE are given in (14), where we have set $R = 8$ and $Q = 4$. In this way, each TX antenna transmits a total of 32 pilot symbols which are randomly taken from a QPSK constellation with power $|d_i(n')|^2 = 32/N_T$. Parameters $N_T$ and $N_R$ are varied throughout simulations to assess their impact on the system performance.
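The numerical values quoted for the simulation setup follow directly from $N$, $T_s$, $R$, and $Q$. The short check below is only a worked calculation; the variable names are chosen here for illustration.

```python
N = 1024            # number of subcarriers
T_s = 0.2e-6        # sampling period in seconds
R, Q, L = 8, 4, 12  # repetition factor, FDM factor, channel length

M = Q * R                              # pilot spacing in subcarriers
spacing_hz = 1.0 / (N * T_s)           # subcarrier distance
block_s = N * T_s                      # useful OFDM block duration
pilots_per_antenna = N // M            # pilot symbols sent by each TX antenna
channel_fits = L <= N // M             # condition L <= N/M required by RCFE

print(round(spacing_hz, 1), block_s, pilots_per_antenna, channel_fits)
```

The subcarrier distance evaluates to about 4.88 kHz and the block duration to 204.8 microseconds, in line with the rounded figures in the text, and the channel length satisfies the constraint $L \le N/M = 32$.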
Comparisons are made between RCFE, CBFE, and the polynomial-based frequency estimator (PBFE) proposed in [17]. This scheme employs the training sequences defined in (4) and performs initial ICFO recovery by maximizing a cost function $\psi_{\mathrm{PBFE}}(\tilde{\eta})$. After ICFO compensation, the fractional CFO is eventually estimated by looking for the roots of a real-valued polynomial function that is obtained by applying the MUSIC principle. As mentioned in [17], the estimation range of PBFE is $|\nu| \le Q/2$. Its computational requirement is mainly ascribed to the need for evaluating the correlation matrix of the received time-domain samples and is summarized in the second row of Table 1.

Performance Assessment.
Figure 1 compares the performance of the fractional CFO estimators in terms of their MSE $E\{(\hat{\nu} - \nu)^2\}$ versus the signal-to-noise ratio at each receiving antenna. The latter is defined as $\mathrm{SNR} = \sigma_s^2/\sigma_n^2$, where $\sigma_n^2$ is the noise power, and $\sigma_s^2$ is given in (37). Marks indicate simulation results, while solid lines are drawn to ease the reading of the graphs. The number of TX and RX antennas is $N_T = 3$ and $N_R = 2$, respectively. The same training sequences are used for both CBFE and RCFE, while PBFE employs the pilot design specified in (4) with $Q = 32$ and $\{\mu_1, \mu_2, \mu_3\} = \{0, 1, 5\}$. This means that the number of pilot symbols transmitted by each TX antenna is 32 for all the considered schemes. As suggested in [17], the pilot symbols $\{d_i(n')\}$ for PBFE belong to a Chu sequence. The CFO is randomly generated at each simulation run with uniform distribution within the interval $[-0.4, 0.4)$, which corresponds to having $\eta = 0$ and $\varepsilon = \nu/R$. For the time being, we concentrate on the accuracy of the FCFO estimates and assume ideal ICFO recovery for both RCFE and PBFE. We use the average CRB to benchmark the performance of the considered schemes. The latter corresponds to the extended Miller and Chang bound (EMCB) [20] and is obtained by numerically averaging the right-hand side of (12) with respect to the channel statistics.

Inspection of Figure 1 reveals that RCFE outperforms the other schemes, and its accuracy is close to the EMCB at all investigated SNR values. As predicted by the theoretical analysis shown in (38) and (40), the loss of CBFE with respect to RCFE is approximately 1.25 dB. Looking at the system complexity, from Table 1, it turns out that in the considered scenario, RCFE requires a total of 57 500 operations for FCFO recovery, while PBFE and CBFE need 1 156 000 and 24 000 operations, respectively. Combining these figures with the results of Figure 1 indicates that RCFE is superior to PBFE in terms of both estimation accuracy and processing load, while CBFE is a valid solution when limiting the computational requirement is an issue of concern.

Figure 2 illustrates the impact of the number of transmit antennas $N_T$ on the accuracy of RCFE. The simulation scenario is the same as in Figure 1, except that now $N_T = 2$, 3, or 4. As can be seen, the frequency MSE is virtually independent of $N_T$, and the same occurs for the EMCB. Such behavior can be ascribed to the fact that signals emitted by different TX antennas combine incoherently at each RX branch, so that higher values of $N_T$ do not result in a corresponding increase of the array gain. As is well known, array gain exploitation by means of multiple TX antennas requires channel knowledge at the transmitter in conjunction with suitable precoding techniques.

Figure 3 shows how the performance of RCFE is affected by the number $N_R$ of receiving antennas. In this case, $N_T$ is fixed to three while $N_R = 2$, 3, or 4. As predicted by (38), the estimation accuracy improves with $N_R$, and this trend is also evident in the EMCB. The physical reason behind such SNR advantage is that the presence of multiple receiving antennas increases the length of the data record $\mathbf{x} = [\mathbf{x}_1^T, \mathbf{x}_2^T, \ldots, \mathbf{x}_{N_R}^T]^T$ used for CFO recovery. This provides the system with an array gain of $10\log_{10}(N_R)$ dB.

The performance of the ICFO estimators is illustrated in Figure 4 in terms of probability of failure $P_f = \Pr\{\hat{\eta} \ne \eta\}$ versus SNR. Comparisons are made between RCFE and PBFE using the same simulation setup as in Figure 1. The RCFE metric defined in (34) is evaluated for $\tilde{\eta} \in \{-2, -1, 0, 1, 2\}$, while PBFE looks for the maximum of $\psi_{\mathrm{PBFE}}(\tilde{\eta})$ over the set $\tilde{\eta} \in \{-16, -15, \ldots, 15\}$. In this way, the estimation range is $|\nu| \le 20$ for RCFE and $|\nu| \le 16$ for PBFE. As can be seen, for SNR > −10 dB, the best performance is obtained with RCFE. From Table 1, it follows that the total number of operations needed to get the CFO estimate $\hat{\nu}$ is 1 283 000 for PBFE and 252 500 for RCFE, thereby leading to a reduction of the processing load by a factor greater than 5. It is fair to say, however, that the complexity of PBFE can be controlled by a judicious design of parameter $Q$. Specifically, decreasing $Q$ alleviates the computational requirement at the expense of a reduced CFO acquisition range.
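The estimation range and the complexity ratio quoted above can be reproduced with a short calculation. The sketch assumes that the RCFE range is $R(|\eta|_{\max} + 1/2)$, which follows from writing the CFO as $R(\varepsilon + \eta)$ with $|\varepsilon| \le 1/2$; the function name is hypothetical and the operation counts are the figures quoted from Table 1.

```python
def rcfe_range(R, eta_max):
    """Largest |nu| covered when eta spans -eta_max..eta_max and |eps| <= 1/2."""
    return R * (eta_max + 0.5)

range_rcfe = rcfe_range(R=8, eta_max=2)     # trial set {-2, ..., 2}
ops_pbfe, ops_rcfe = 1_283_000, 252_500     # total operations quoted from Table 1
speedup = ops_pbfe / ops_rcfe

print(range_rcfe, round(speedup, 2))
```

With $R = 8$ and five ICFO hypotheses the range evaluates to 20 subcarrier spacings, and the ratio of the quoted operation counts is slightly above 5.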

Conclusions
We have addressed the problem of training-assisted CFO recovery in MIMO-OFDM systems. To reduce the computational burden required by the exact ML solution, we have divided the CFO into a fractional part plus an integer part and have designed FDM pilot sequences that are periodic in the time domain. The fractional CFO is estimated in closed form by measuring the phase rotations between the repetitive parts of the received preambles, while the integer CFO is estimated in a joint fashion with the MIMO channel matrix by resorting to the ML principle. The proposed scheme has affordable complexity and exhibits improved performance with respect to existing alternatives. For these reasons, we believe that it provides an effective approach for frequency synchronization in beyond third generation (3G) wideband MIMO-OFDM transmissions.