EURASIP Journal on Applied Signal Processing 2005:6, 981–988 c ○ 2005 Hindawi Publishing Corporation Iterative Code-Aided ML Phase Estimation and Phase Ambiguity Resolution

As many coded systems operate at very low signal-to-noise ratios, synchronization becomes a very di ﬃ cult task. In many cases, conventional algorithms will either require long training sequences or result in large BER degradations. By exploiting code properties, these problems can be avoided. In this contribution, we present several iterative maximum-likelihood (ML) algorithms for joint carrier phase estimation and ambiguity resolution. These algorithms operate on coded signals by accepting soft information from the MAP decoder. Issues of convergence and initialization are addressed in detail. Simulation results are presented for turbo codes, and are compared to performance results of conventional algorithms. Performance comparisons are carried out in terms of BER performance and mean square estimation error (MSEE). We show that the proposed algorithm reduces the MSEE and, more importantly, the BER degradation. Additionally, phase ambiguity resolution can be performed without resorting to a pilot sequence, thus improving the spectral e ﬃ ciency.


INTRODUCTION
In packet-based communications, frames arrive at the receiver with an unknown carrier phase.When phase estimation (PE) is performed by means of a conventional non-dataaided (NDA) algorithm [1], the resulting estimate exhibits a phase ambiguity, due to the rotational symmetries of the signalling constellation.Phase ambiguity resolution (PAR) can be accomplished by a data-aided (DA) algorithm that exploits the presence of a known pilot sequence in the transmitted data stream [2].The need for PAR can be removed by using differential encoding, which however results in a BER degradation, and requires significant changes to the decoder in case of iterative demodulation/decoding [3].Since a phase ambiguity resolution failure gives rise to the loss of an entire packet, its probability of occurrence should be made sufficiently small.At the same time, the pilot sequence must not be too long as it reduces the spectral efficiency of the system.
Although conventional estimation algorithms perform well for uncoded systems, a different approach needs to be taken when powerful error-correcting codes are used.These codes operate typically at low SNR, making the estimation process more difficult.By exploiting the knowledge of certain code properties, a more accurate estimate may be obtained.In [4], by approximating the log-likelihood function, iterative phase estimation and detection is performed, while [5] uses the so-called extrinsic information after each decoding iteration to perform phase estimation.Similarly, [6] exploits the observation that the magnitude of the extrinsic information depends on the phase error.By changing the turbo decoder, certain types of phase estimation errors can be resolved [7].An EM-based algorithm was proposed in [8] but required certain approximations to operate in coded systems.Apart from these ad hoc methods, a theoretical framework for code-aided estimation was proposed in [9] and applied to phase estimation.In [10], using a factor-graph representation, various phase models were considered and message-passing algorithms for joint decoding and phase estimation were derived.Most of the papers above made no comparisons with conventional estimation algorithms.Furthermore, the problem of PAR was not considered.On the other hand, in [11,12], a form of code-aided PAR was proposed, but assuming perfect phase estimation and using the code structure in an ad hoc fashion.This paper addresses the problem of joint phase estimation and phase ambiguity resolution for a turbo-coded system [13].Based on [9], we make use of the EM algorithm [14] to derive a maximum-likelihood (ML) method for PE and PAR.We make comparisons in terms of mean square estimation error (MSEE) and BER with some known schemes from literature.We go on to show how convergence issues may be dealt with, without any increase in computational complexity, MSEE, and BER.Finally, we demonstrate that although the EM-based PE algorithm does not necessarily yield a substantial gain in terms of BER as compared to a conventional PE algorithm, the EM-based PAR algorithm is mandatory if we wish to avoid long pilot sequences.

SYSTEM DESCRIPTION
The transmitted sequence, denoted by the row vector s, consists of a pilot sequence (p, length L) and an unknown data sequence (a, length N), that is, s = [p a].The data symbols are obtained by mapping a sequence of interleaved coded bits onto a signalling constellation.The received vector is given by where n is a row vector consisting of L + N complex AWGN samples with real and imaginary components having variance σ 2 = N 0 /(2E s ), where E s is the energy per transmitted symbol.The pilot and data symbols are taken from an M-PSK constellation 1 with |p m | = |a n | = 1, for m = 0, 1, . . ., L − 1 and n = 0, 1, . . ., N − 1.The unknown carrier phase θ is in the interval (−π, π).Detection of the data symbols a is based upon the rotated vector re − j θ , with θ denoting an estimate of the carrier phase θ.
We introduce the integer part, k θ , and the fractional part, ε θ , of the phase θ, defined by where |ε θ | < π/M and k θ ∈ {0, 1, . . ., M − 1}.The PE algorithm involves the estimation of the continuous parameter ε θ or θ, whereas PAR refers to the estimation of the discrete parameter k θ .Estimation of ε θ and θ will be denoted by "fractional phase estimation" (FPE) and "total phase estimation" (TPE), respectively, wherever it is appropriate to make such a distinction.

DA total phase estimation
Considering only the observations [r 0 , . . ., r L−1 ] that correspond to the pilot symbols, an ML estimate of θ may be obtained as follows [15].Defining 1 Generalization to other constellations is straightforward.
the ML estimate becomes θML,p = arg max θ C p e − j θ = arg C p . (4) As this phase estimate is in the interval (−π, π), no PAR is required.

NDA fractional phase estimation combined with DA PAR
Note that in (4), the observations [r L , . . ., r N+L−1 ] are not exploited.These observations can be used in a NDA estimator, such as a Viterbi and Viterbi (VV) estimator [1].However, because of the rotational symmetry of the M-PSK constellation, the NDA estimate suffers from an M-fold phase ambiguity, and is to be interpreted as an estimate of the fractional part ε θ rather than the "total" phase θ.Hence, the VV estimator yields [1] εθ The NDA FPE algorithm ( 5) must be combined with a PAR algorithm that estimates the integer part k θ of the phase.A conventional PAR algorithm based upon the pilot sequence is [2] kθ = arg max k∈{0,...,M−1} where εθ is the NDA estimate resulting from (5).

ML estimation through the EM algorithm
Assume we want to estimate a (discrete or continuous) parameter b from an observation vector r in the presence of a so-called nuisance vector a.The ML estimate of b maximizes the log-likelihood function where Often p(r| b) is difficult to calculate.The EM algorithm [14] is a method that iteratively solves (7).Defining the complete data x = [r, a], the EM algorithm breaks up in two parts: the expectation part (9) and the maximization part (10): It has been shown that b(n) converges to a stationary point of the likelihood function under fairly general conditions [14].However, when the initial estimate ( b(0) ) is not sufficiently close to the ML value, the EM algorithm may converge to a local maximum or a saddle point instead of the global maximum of the likelihood function.To avoid these convergence problems, we propose the following solution [16].Assuming we have K initial estimates { b(0) 1 , . . ., b(0) K }, we apply the EM algorithm (( 9)-( 10)) K times, each with a different initial estimate; after convergence, this will result in K tentative estimates { b1 , . . ., bK }.The final estimate of b is the tentative estimate with the largest likelihood: As the computation of the likelihood function p(r|b) is generally intractable, we resort to the following approximation: where Q( bk , bk ) is obtained by evaluating (9) for b = bk and b(n) = bk .Although using (12) instead of (11) may seem somewhat ad hoc, p(r| bk ) and Q( bk , bk ) turn out to have very similar shapes in our situation.For the sake of illustration, we have computed these functions for b = θ through computer simulations for some short random codes.A typical result is shown in Figure 1.
The EM algorithm can easily be extended to acquire the maximum a posteriori (MAP) estimate of b by taking the a priori distribution p(b) into account in (9).

ML phase estimation
We now make use of the EM algorithm for estimating the carrier phase θ.We define the complete data as x = [r, a].Taking (1) into account, we obtain In the appendix, we show that, since a and θ are independent, (9) becomes where C p is given by (3) and wherein  denotes the a posteriori average of the data symbol a i .Here {α l } is the set of constellation points.The quantity µ i (r, θ) can be interpreted as a soft symbol decision: it is a weighted average of all possible constellation points.The a posteriori probabilities in ( 16) can be provided by a MAP decoder.Application of (10) yields the following iterative algorithm for TPE: The algorithm starts with n = 0 from some initial estimate θ(0) .Such an estimate can be obtained either according to (4) or by taking θ(0) = 2π( kθ /M) + εθ with εθ and kθ resulting from ( 5) and ( 6).This initial estimate is used to phase-correct the vector r, which is then fed to the detector which computes the corresponding a posteriori probabilities.From that point on, we can apply the EM algorithm (17).
Generally the true a posteriori symbol probabilities are difficult to compute.For that reason, we resort to a suboptimal scheme whereby the detector consists of a soft-in soft-out (SISO) demapper and a SISO decoder.The latter operates on coded bits, rather than on coded symbols.The decoder incorporates bit interleaving, BCJR decoding [17], and bit deinterleaving.Such an implementation of the EM estimator is shown in Figure 2. Depending on the system's set-up, the detector may iterate between demapping and decoding (as in a BICM-ID scheme [18]).The resulting a posteriori probabilities of the coded bits are then recombined to yield a posteriori probabilities of the coded symbols.

Convergence properties
In this section, we will illustrate some convergence properties of the EM total phase estimation algorithm (17).We first introduce the notion of the normalized phase estimation error e (n) = ( θ(n) − θ)/2π, where θ is the true (unknown) value of the carrier phase and θ(n) the estimated value after n EM iterations.The behavior of the EM TPE algorithm is analyzed based on the evolution of e (n) from one iteration (n) to the next (n + 1).We have carried out computer simulations for a turbo-coded system with QPSK mapping (to be described in more detail in Section 5) to obtain E[e (n+1) ] and E[Q( θ(n) , θ(n) )], where E[•] denotes averaging with respect to the pilot sequence, the coded data symbols, the Gaussian noise, and the carrier phase.The results are shown in Figure 3.Note that these results do not depend on the specific value of n and that we plot E[e (n+1) ] − e (n) , rather than E[e (n+1) ], as a function of e (n) .
In Figure 3a, we plot the measured values of E[e (n+1) ] − e (n) as a function of e (n) .The negative and positive zerocrossings of E[e (n+1) ] − e (n) correspond to the stable and unstable equilibrium points of the EM algorithm.The stable equilibrium points are at e (n) = {−0.5,−0.25, 0, 0.25, 0.5} whereas the unstable equilibrium points are at e (n) = {−0.375,−0.125, 0.125, 0.375}.These equilibrium points are independent of the SNR.Hence, the acquisition range of the EM algorithm for QPSK is |e (0) | < 0.125, corresponding to a maximum allowable initial phase error magnitude of π/4.For larger phase errors, the EM algorithm will (on average) converge to an incorrect stable point.We have verified (results not shown) that for turbo-coded BPSK, the acquisition range is |e (0) | < 0.25, corresponding to a maximum allowable initial phase error magnitude of π/2.
Figure 3b shows measurements of E[Q( θ(n) , θ(n) )] as a function of e (n) .We observe that the previously mentioned stable and unstable equilibrium points correspond to local maxima and minima, respectively.In particular, the stable equilibrium point e (n) = 0 corresponds to the global max- From these two figures, we draw the important conclusion that proper operation of the EM algorithm ( 17) requires an initial estimate θ(0) without phase ambiguity.The DA estimate (4) exhibits no phase ambiguity, but a long pilot sequence is needed to keep the variance of the estimate within acceptable limits.Instead, we propose to apply the EM algorithm with NDA initialization, but with KM rather than one initial estimate: where εθ is obtained from the NDA FPE algorithm ( 5), M denotes the constellation size, and the integer K ≥ 1 is a design parameter.Applying the EM algorithm will result in KM tentative estimates.The final phase estimate is then obtained  according to (12) with b = θ.This way, we can be sure that K initial estimates yield a corresponding initial normalized error e (0) within the acquisition range of the EM algorithm.Strictly speaking, K = 1 is sufficient, but we will point out in the next section the advantage of taking K > 1.In the remainder of this paper, we will denote the EM algorithm with KM initial values by "EM-K." In the case of perfect PAR (i.e., k θ is known), this EM algorithm can easily be specialized into a purely FPE algorithm by retaining from (18) only the K initial estimates closest to 2πk θ /M and applying algorithm (17).Similarly, the EM algorithm can be modified to a PAR algorithm by fixing εθ and then applying (12) with b = k θ .

PERFORMANCE RESULTS
We evaluate the performance of the EM algorithm for PE and PAR when applied to a turbo-coded system with QPSK mapping.The constituent convolutional codes of the turbo code are systematic and recursive with rate 1/2, generator polynomials (21, 37) 8 , and constraint length 5.The turbo code consists of the parallel concatenation of two unpunctured constituent encoders, which yields an overall code rate of 1/3.We perform 10 turbo-decoding iterations per EM iteration to compute the a posteriori symbol probabilities P[a i = α l |r, θ].Codewords consist of 1002 bits (not including pilot bits).The EM algorithm was executed until convergence (i.e., until | θ(n+1) − θ(n) | is sufficiently small) with a maximum of 10 EM iterations.Through simulation, we have made a comparison with conventional schemes from Section 3 and a code-aided scheme from literature.

Computational complexity
The total computational complexity of the joint estimation and decoding algorithm is proportional to KMDI, where M is the size of the constellation, KM is the number of executions of the EM algorithm, D is the decoding time per codeword, and I is the number of EM iterations.From the shape of the curves in Figure 3, we may infer that convergence will occur sooner (i.e., for less EM iterations) when the initial estimate is nearer to the correct value.It may therefore be advantageous to execute the estimation algorithm with more than M initial values but with fewer EM iterations.
To illustrate this, Figure 4 shows, as a function of the number of EM iterations (I), the BER performance of the EM FPE algorithm for turbo-coded QPSK at an SNR of 1 dB.We compare K = 1, K = 2, and K = 4, and also show the BER values corresponding to VV estimation (5) with perfect PAR and to perfect TPE.For a given value of I, the BER performance evidently improves with increasing K.More importantly, for a given computational complexity (i.e., fixed KI), EM-2 and EM-4 yield a comparable BER, and considerably outperform EM-1.Finally, as compared to the BER corresponding to perfect TPE, we observe that for a large number of iterations, EM-1 still results in a significant BER degradation, whereas EM-2 and EM-4 have excellent performance, even for a limited number of iterations.Hence, introducing more initial estimates not only allows us to reduce the number of EM iterations and the computational complexity, but also has the additional advantage that convergence to the correct phase value is highly probable.

Phase estimation
Figure 5a shows the mean square estimation error (MSEE) performance of the VV estimator and the EM estimators, assuming perfect PAR (i.e., k θ is known at the receiver, so that only ε θ needs to be estimated).As a reference, we include the modified Cramér-Rao bound (MCRB) for a known sequence of 1002 bits (=501 QPSK symbols).The MCRB is a lower bound for the MSEE of any unbiased estimator [19].Application of the EM-1 algorithm reduces the MSEE but the MCRB is reached only for SNRs above 2 dB.The EM-2 algorithm is able to further reduce the MSEE and reaches the MCRB for E b /N 0 ≥ 1.5 dB.The different MSEE performances resulting from the EM algorithms indicate that the EM-1 algorithm occasionally converges to an incorrect value due to the (large) initial estimation error.
To see how this translates in BER performance, we refer to Figure 5b.Clearly, the VV FPE algorithm results in a high BER degradation.EM-1 is able to partly reduce this degradation.However, the remaining degradation at BER=10 −4 is still around 0.5 dB.By applying EM-2, we are able to essentially remove any resulting degradation.Note that increasing the number of initial estimates (i.e., increasing K to 3 or more) will further reduce the MSEE but the corresponding reduction of the BER will be barely noticeable.

Phase ambiguity resolution
We first note that the combination of any PAR algorithm with any FPE algorithm will result in degradation at least as large as the degradations of the separate algorithms.For that reason, we will only consider the following schemes (we remind that L is the length of the pilot sequence, expressed in symbols): (i) TPE: EM-2 + init(VV); L: the EM algorithms is executed 2M times with initial estimates given by ( 18); (ii) PAR: CORR + perfect FPE; L: the conventional PAR algorithm (6) under the assumption of perfect knowledge of ε θ ; (iii) PAR: REEN + perfect FPE; L: this algorithm is formally obtained by replacing the soft decisions µ i in (15) by the data symbols obtained by re-encoding the decoded information sequence [12].This approach has roughly the same computational complexity as the EM-1 PAR algorithm; (iv) PAR: EM-hard + perfect FPE; L: this algorithm is formally obtained by replacing the soft decisions µ i from (15) by the nearest (hard) constellation symbol.This can be seen as a code-aided decision-directed PAR algorithm.
In the latter two cases, ε θ is assumed to be known at the receiver.The estimated phase shift (2πk/M) is the one resulting in the largest correlation of the hard symbol decisions (resp., with and without re-encoding) with the rotated received vector (r exp(− j2πk/M)).
Figure 6 shows the BER performance for the various approaches.We see that "PAR: CORR + perfect FPE" requires fairly long pilot sequences to reach acceptable BER performance, thus reducing the spectral efficiency of the system.The re-encoding rule leads to a BER degradation when E b /N 0 is below 2 dB.We now consider the EM algorithm approach.Using hard instead of soft data decisions leads to very high BER for all considered SNR, even under perfect FPE.On the other hand, application of "TPE: EM-2 + init(VV)" leads to a very good performance, even when no pilot sequence is present.

CONCLUSION AND REMARKS
This contribution has considered the problem of phase estimation (PE) and phase ambiguity resolution (PAR) in (turbo-) coded systems.Starting from the ML criterion, we have pointed out how code-aided PE and PAR may be performed iteratively based on the EM algorithm, and how convergence issues may be addressed.We have compared the resulting algorithms with known algorithms (of which some do and some do not take code properties into account) in terms of the mean square estimation error (MSEE) and the BER.Through simulation of a turbo-coded QPSK transmission system, we have shown that (i) code-aided PAR can achieve a very small BER degradation, even in the absence of pilot symbols; (ii) conventional PAR can achieve a very small BER degradation only at the expense of a sufficient number of pilot symbols; (iii) code-aided PE is required to achieve a very small BER degradation.
We should mention that for turbo-coded BPSK transmission (results not reported in this paper), the conventional VV phase estimator (assuming perfect PAR) results in negligible BER degradation, as compared to perfect PE and PAR.

Figure 1 :
Figure 1: Comparison of (a) p(r|θ) and (b) Q(θ, θ) (both up to a multiplicative constant) for a short random code with QPSK mapping.The true value of the carrier phase is 0 radians.

Figure 3 :
Figure 3: Convergence behavior for EM phase estimation.

Figure 4 :
Figure 4: Number of EM iterations versus number of initial values tradeoff (QPSK, E b /N 0 = 1 dB).