Computing the information rate of discrete-time Wiener phase noise channels by parametric Bayesian tracking

: A new upper bound (UB) on the information rate (IR) transferred through the additive white Gaussian noise channel aﬀected by Wiener’s laser phase noise is proposed in the paper. The bound is based on Bayesian tracking of the noisy phase. Speciﬁcally, the predictive and posterior densities involved in the tracking are expressed in parametric form, therefore tracking is made on parameters. This make the method less computationally demanding than known non-parametric methods, e.g. methods based on phase quantization and trellis representation of phase memory. Simulation results show that the UB is so close to the lower bound that we can claim of having virtually computed the actual IR.


Introduction
Multiplicative phase noise is a major source of impairment in radio and optical channels. Recent studies about the phase noise that arises in optical channels and about its effects in coherent optics can be found in [1][2][3]. The basic model of laser phase noise is a Wiener process, [3], [4], where the power spectral density (PSD) of the spectral line is a slope of -20 dB/decade. Several methods have been proposed in the literature to combat the detrimental effects of Wiener phase noise (WPN). Among these methods we cite iterative demodulation and decoding techniques of [5][6][7] and the insertion of pilot symbols (possibly with staged decoding [8].) Note however that single carrier systems affected by non-ideal frequency responses of the channel plus phase noise require specific mitigation techniques [9] that are not addressed in this paper.
The capacity of AWGN channels affected by phase noise with white PSD is studied in [1], [10,11]. Computation of information rates (IRs) for the multiplicative WPN plus AWGN channel, which is a channel with memory and continuous state, is a challenging problem. Monte Carlo approaches based on non-parametric Bayesian tracking (BT) techniques of the hidden phase with memory, e.g. phase state-space quantization [12,13] and particle filtering [14,15], have been recently proposed for computing IRs. In [16] the method of particle filtering is adopted to work out an approximation to the IR.
The method proposed here is based on parametric representation of densities. This leads to efficient computation, since, as shown in the paper, the parameters can be updated in the iteration steps of the BT method by closed-form equations. Conversely, in non-parametric methods, densities are estimated and updated point-wisely in their support space, leading to a more demanding computation.

Channel and Source Model
Let u k i indicate the vector (u i , u i+1 , · · · , u k ) with u k i ∈ U k i , and let U indicate a stationary and ergodic process, U = (U 0 , U 1 , · · · ), whose generic realization is the sequence (u 0 , u 1 , · · · ). When U k i is a continuous set, p(u k i ) is used to indicate the multivariate probability density function (PDF), while when U k i is a discrete set p(u k i ) indicates the multivariate mass probability and |U i | denotes the number of elements in U i .
The k-th output of the channel is where j is the imaginary unit, Y is the complex channel output process, X is the channel complex input modulation process, and Φ is the phase noise process which is assumed to be independent of X and W . We assume that the input process X is made of i.i.d. random variables with zero mean and unit variance. Process W is a complex AWGN process with zero mean and variance SNR −1 . Process Φ is modeled as a Wiener process folded in the range [−π, π) where the frequency noise V is a i.i.d. sequence of Gaussian random variables with zero mean and unit variance, γ is a scalar, and φ 0 is uniformly distributed in [−π, π). Equations (1) can be cast in the framework of continuous-state first-order Markov channels, where the state at time k is φ k . For this class of channels Moreover, from (1) one realizes that the channel is memoryless given the state, therefore Let h(U) denote the relative entropy rate of process U. According to [15], an upper bound The terms h(Φ) and h(Y |X, Φ) can be evaluated in a straightforward manner, see [15]. The two UBs h(Y ) and h(Φ|Y, X ) can be based as in [15] on the normalized Kullback-Leibler (KL) divergence. Assuming that U is stationary and ergodic, one can invoke the Shannon-McMillan-Breiman theorem and the chain rule, gettinḡ where u n 0 is generated according to the actual multivariate PDF p(u n 1 ), and the initial condition u 0 is given. The bound can be extended to the conditional relative entropy rate in a straightforward manner. The auxiliary PDF q(·) can be seen as an approximation to p(·). In this perspective, the KL divergence is a measure of the quality of the fit between p(·) and q(·), and the UB is equal to the actual relative entropy rate when the fit is ideal, that is when q(·) = p(·).

Bayesian Tracking
In the first-order model of phase noise (1) the hidden state at time k is the phase φ k . BT of the hidden state is based on the two-step iteration for k = 1, 2, · · · is the posterior density, z k 1 is the channel measurement up to time k, and the denominator of the above equation is computed by the Chapman-Kolmogoroff equation Note that the state transition probability p(φ k |φ k−1 ) appears in (5) in place of p(φ k |φ k−1 , z k−1 1 ) thanks to the Markovian property (2). Moreover, thanks to (3), the channel transition probability p(z k |φ k ) can be used in place of p(z k |φ k , z k−1 1 ) in (5). The predictive density (and, as a consequence, the posterior density as well) cannot be expressed in closed form and therefore it should be approximated in some way. We adopt parametric BT to obtain approximations to the probability density functions appearing in the relative entropy rates h(Y ) and h(Φ|Y, X ). When h(Φ|Y, X ) is evaluated, one puts Z = (Y, X ) and the auxiliary probability that appears inside the logarithm of (4) is evaluated as where the approximation q(φ k |y k 1 , x k 1 ) to the actual posterior probability is obtained from the parameters worked out by the parametric BT, and g(µ, σ 2 ; x) is a Gaussian density with mean µ and variance σ 2 evaluated in x. When h(Y ) is considered, Z = Y and the approximation q(y k |y k−1 1 ) to p(y k |y k−1 1 ) is obtained from the Chapman-Kolmogoroff equation (6) and used inside the base-2 logarithm of (4).
Two parametric forms are hereafter considered: Tikhonov and Fourier. As expected, Tikhonov parametrization outperforms Fourier parametrization in approximating p(φ k |y k 1 , x k 1 ), because channel and state transition probabilities are very well approximated by Tikhonov distributions of the phase φ k (see (9) and (11)), hence Tikhonov parameters almost represent a sufficient statistics for computing the IRs. On the other hand, Fourier parametrization is well suited to approximate p(φ k |y k−1 1 ), which is a periodic function of period π/2 and with a small bandwidth (hence a few Fourier coefficients completely describe the function) for most constellations of practical use. In general, using a parametrization that is far from representing a sufficient statistics, will give looser bounds on the IRs. See App. A and B for details about Tikhonov and Fourier parametrizations, respectively. As shown in the appendices, all the integrals involved in the evaluation of auxiliary PDF appearing in (4) can be computed in closed form. This makes the proposed method computationally efficient.

Simulation Results
In this section, the new UB is compared to a lower bound worked out by the computationally demanding trellis-based method of [12]. Specifically, the lower bound has been obtained with 128 states. Fig. 1 reports the results obtained with 4-QAM and with γ = 0.125: This is the largest value of γ obtained in the experimental results of [3] and can be regarded as a case of strong phase noise in cases of practical interest. The lower bound is virtually indistinguishable from the UB obtained with parametric BT in the entire range of signal-to-noise ratios (SNRs). A similar analysis holds for the results obtained with 16-QAM and reported in Fig. 1. Also in this case, the UB obtained with parametric BT virtually gives the actual IRs for γ = 0.125 and any SNRs. The figures report also the UB computed with the non-parametric approach (Trellis) for 32 and 64 states: For both constellations we observe that such UBs are not tight and are often looser than the AWGN information curve, that serves as a trivial IR UB. Increasing the number of states of the non-parametric approach helps in getting tighter bounds, at the expense of a much higher computational complexity. When the actual IR saturates (SNR= {10, 20} dB resp. for 4-and 16-QAM), the gap between the 64-state UB and the actual IR is larger for 16-QAM. However, if the gaps are normalized to the maximum IR, they are the same for the two modulation formats. Moreover, observe that getting a good IR estimate at high SNR is much more difficult with a non-parametric approach rather than a parametric one, because in this regime estimating accurately the density tail is crucial [15]: At high SNR the densities become spiky and tails are better captured by a parametric description, with a negligible computational complexity.

Conclusion
A new upper bound (UB) above the information rate transferred through the discrete-time WPN channel has been presented. The results, compared to the lower bound obtained with the computationally demanding non-parametric methods of [12], show that the UB is accurate in all cases of practical interest. Before concluding the paper, a remark is in order about phase noise of order higher than one, which is out of the scope of the present paper and that will be subject of future research. Extension of the parametric-based approach to phase noise with memory of order higher than one, as the one studied in [15], is feasible by proposing an appropriate parametric density. In contrast, quantizing a multidimensional state space according to the approach of [12] would lead to an exponential increase of the number of states of the trellis, making computation unfeasible. Also, non-parametric methods based on particle filtering [14,15] can suffer from the high space dimensionality, especially at high SNR. To get accurate UBs one is led to increase the number of particles, thus increasing the complexity of the computation.

A. Tikhonov Parametrization
The method is based on the approximation of the posterior density at time k − 1 as a Tikhonov density of real variable φ k−1 ∈ [−π, π) and complex parameter ω k−1 where I 0 (·) is the modified Bessel function of the first kind and {·} denotes the real part. The state transition probability is hereafter approximated as Using these approximations in place of the actual probability density functions in (5), and using the properties of Tikhonov densities listed in [18] one has where ω k = ω k−1 /(1+γ 2 |ω k−1 |). The channel transition probability at time k can be approximated with a Tikhonov density of variable φ k : Using (10) and (11) into (5) gives Having computed ω k by parametric BT (10) and (12), one can get the contribution at time k to the entropy rate h(Φ|Y, X ) by taking the logarithm of the auxiliary probability (7). Note that in the BT we assume that the posterior and the predictive densities are Tikhonov, thus getting a close approximation to the actual densities without the need of evaluating integrals numerically. However, in the evaluation of (7), the integral appearing in the denominator cannot be evaluated in closed form if we still adopt the Tikhonov parametrization. Therefore, we adopt a Gaussian density based on the complex parameter ω k obtained by BT as an approximation to the actual posterior density appearing in the right side of (7). Specifically, we put in (7) q(φ k |y k 1 , x k 1 ) = g(∠ω k , |ω k | −1 ; φ k ). This allows to compute in closed form the integral appearing in the denominator of (7): where the last equality follows because the integrals appearing inside the sum are convolutions between Gaussian densities.

B. Fourier Parametrization
We assume that constellation X has a 4-fold symmetry, therefore at time k − 1 it is possible to approximate the posterior density p(φ k−1 |y k−1 1 ) in the interval [−π, π) by the Fourier series where { A (k−1) n } m n=−m are the Fourier coefficients of the auxiliary posterior density at time k − 1, and m is responsible of the accuracy of the approximation. Using (14) and (9) into (5) one has which provides us with the prediction step A
(16) The auxiliary posterior density is found by plugging (15) and (16) Fourier coefficients A (k) n can be obtained from the convolution A (k) n ∝Ã (k) n = m i=−m A (k) i B (k) n−i by suitably normalizing theÃ n 's such that (17) is a density. Note that the truncation for n = −m, . . . , m of the Fourier expansion in the right hand side of (17) might be such that q(φ k |y k 1 ) is negative for some values of φ k . It is easy to show that a sufficient condition for (17) to be non-negative is that for m = 1 is also a necessary condition. At each iteration of the BT we check (18): if it is satisfied then the tracking algorithm can proceed by the normalization A (k) n =Ã (k) n /(2πÃ (k) 0 ), n = −m, . . . , m, otherwise we put A (k) 0 = (2π) −1 and A (k) n =Ã (k) n /(4π m i=1 |Ã (k) i |), n 0. In any case, the contribution at time k to the relative entropy rate h(Y ) is obtained by putting inside the base-2 logarithm appearing in (4) the auxiliary probability q(y k |y k−1 1 ) = π −π q(φ k |y k−1 1 )q(y k |φ k )dφ k = 2πÃ (k) 0 .