Chalmers Publication Library

Improving Soft FEC Performance for Higher-Order Modulations via Optimized Bit Channel Mappings

(2014) "Improving soft FEC performance for higher-order modulations via optimized bit channel mappings". Notice: Changes introduced as a result of publishing processes such as copy-editing and formatting may not be reflected in this document. For a definitive version of this work, please refer to the published source. Please note that access to the published version might require a subscription.

Abstract: Soft forward error correction with higher-order modulations is often implemented in practice via the pragmatic bit-interleaved coded modulation paradigm, where a single binary code is mapped to a nonbinary modulation. In this paper, we study the optimization of the mapping of the coded bits to the modulation bits for a polarization-multiplexed fiber-optical system without optical inline dispersion compensation. Our focus is on protograph-based low-density parity-check (LDPC) codes which allow for an efficient hardware implementation, suitable for high-speed optical communications. The optimization is applied to the AR4JA protograph family, and further extended to protograph-based spatially coupled LDPC codes assuming a windowed decoder. Full field simulations via the split-step Fourier method are used to verify the analysis. The results show performance gains of up to 0.25 dB, which translate into a possible extension of the transmission reach by roughly up to 8%, without significantly increasing the system complexity.
"Capacity limits of optical fiber networks," J.
"A pragmatic coded modulation scheme for high-spectral-efficiency fiber-optic communications," J.
"Next generation FEC for high-capacity communication in optical transport networks," J.
"A discrete-time model for uncompensated single-channel fiber-optical links," IEEE Trans.
"Modeling of the impact of nonlinear propagation effects in uncompensated optical coherent transmission links," J.
"Protograph based LDPC codes with minimum distance linearly growing with block size," in Proc.
"… of protograph-based LDPC convolutional codes over erasure channels," IEEE Trans.
"Time-varying periodic convolutional codes with low-density parity-check matrix," IEEE Trans.
"EXIT-aided bit mapping design for LDPC coded modulation with APSK constellations," IEEE Commun.
"On the mapping of low-density parity-check codes for bit-interleaved …


Introduction
There is currently a large interest in developing practical coded modulation (CM) schemes that can achieve high spectral efficiency close to the ultimate capacity limits of optical fibers [1]. Pragmatic bit-interleaved coded modulation (BICM) in combination with low-density parity-check (LDPC) codes is one of the most popular capacity-approaching CM techniques for achieving high spectral efficiency, due to its simplicity and flexibility [2]. For a BICM system, a helpful abstraction is to think about transmitting data using a single forward error correction (FEC) encoder over a set of parallel binary-input channels, or simply bit channels, with different qualities. This is due to the fact that bits are not protected equally throughout the signal constellation. With this picture in mind, an immediate problem is how to best allocate the coded bits from the encoder to these channels. As a baseline, a random or consecutive/sequential mapping is commonly used in practice. However, by optimizing the mapping strategy, one can improve the system performance at almost no additional complexity. While BICM has been studied for fiber-optical communications by many authors, see, e.g., [3] or [4] and references therein, to the best of our knowledge, optimized bit channel mappings have not yet been studied for such systems. In the following, we use the term "bit mapper" to denote the device that performs the bit channel mapping. We remark that other terms, e.g., "bit interleaver" or "mapping device", are also frequently used in the literature.
In this paper, we address the bit mapper optimization for a BICM system based on LDPC codes in the context of long-haul fiber-optical communications. Our target system operates over a communication link with a lumped amplification scheme and without optical inline dispersion compensation. In general, the signal undergoes a complicated evolution and interacts with amplified spontaneous emission (ASE) noise and co-propagating signals through dispersive and nonlinear effects. For dispersion uncompensated transmission, it has been shown that an additive Gaussian noise (GN) model can be assumed, provided that dispersive effects are dominant and nonlinear effects are weak [5,6]. We use the GN model for our analysis, which accounts for both the ASE noise from inline erbium-doped fiber amplifiers (EDFAs) and nonlinear noise due to the optical Kerr effect.
The starting point for the optimization problem is a fixed modulation format and a given error correction code, i.e., we do not consider the joint design of the modulation, bit mapper, and code. This scenario is often encountered in practice when the modulation and code have been designed separately and/or are predetermined according to some communication standard. Our focus is on protograph-based LDPC codes [7], which are very attractive from a design perspective and allow for a high-speed hardware implementation, suitable for fiber-optical communications [8]. A protograph is a (small) bipartite graph, from which the Tanner graph defining the code is obtained by a copy-and-permute procedure. As one illustrative example for protograph-based codes, we consider the AR4JA protographs developed by researchers from JPL/NASA in [9]. We also consider bit mapper optimization for protograph-based spatially coupled low-density parity-check (SC-LDPC) codes using the windowed decoder (WD) proposed in [10]. SC-LDPC codes, originally introduced as LDPC convolutional codes in [11], have attracted a lot of attention due to their capacity-achieving performance under belief propagation (BP) decoding for a variety of communication channels [12]. SC-LDPC codes can be constructed using protographs and they are considered as viable candidates for future spectrally efficient fiber-optical systems [8].
Most of the literature about bit mapper optimization deals with irregular LDPC codes that are not based on protographs, see e.g., [13,14]. Attempts to improve the performance of BICM systems with protograph-based codes through bit mapper optimization have been previously made in [15][16][17]. In [15], a mapping strategy inspired by the waterfilling algorithm for parallel channels called variable degree matched mapping (VDMM) is presented. This idea is extended in [16], where the authors exhaustively search over all possible nonequivalent connections between protograph nodes and modulation bits showing performance improvements over VDMM. As pointed out in [17], the above approaches are somewhat restrictive in the sense that only certain protographs can be used with certain modulation formats. A more flexible approach is proposed in [17], which is in principle suitable for any protograph structure and modulation but relies on a larger intermediate protograph.
Our optimization of the bit mapper is based on the decoding threshold over the additive white Gaussian noise (AWGN) channel similar to, e.g., [13,14,16], albeit assuming a fixed number of decoding iterations. The decoding threshold divides the channel quality parameter range (in our case the equivalent signal-to-noise ratio (SNR) of the GN model) into a region where reliable decoding is possible and where it is not. In the asymptotic case, i.e., assuming infinite codeword length, density evolution (DE) or one-dimensional simplifications via extrinsic information transfer (EXIT) functions can be used to find the decoding threshold for LDPC codes under BP decoding [18]. Approximate decoding thresholds of protograph-based codes assuming binary modulation can be obtained by using the protograph extrinsic information transfer (P-EXIT) analysis [19]. The approach proposed here relies on a modified P-EXIT analysis which allows for a fractional allocation between protograph nodes and modulation bits. This approach is, to the best of our knowledge, novel in the context of protograph-based codes and different from the approaches described in [15][16][17]. In particular, a fractional allocation allows for an unrestricted matching of protographs and modulation formats and additionally does not suffer from an increased design complexity due to a larger intermediate protograph. We also discuss several ways to reduce the optimization complexity. In particular, we introduce periodic bit mappers for SC-LDPC codes with a WD, which is based on the results we previously presented in [20], where optimized bit mappers are found for (nonprotograph-based) SC-LDPC codes assuming parallel binary erasure channels (BECs) without considering the WD. The use of a WD in this paper is motivated by the reduced complexity and decoding delay with respect to full decoding. Finally, we provide a simulative verification assuming both linear and nonlinear transmission scenarios. 
For the latter case, we use the split-step Fourier method (SSFM) to show that the performance improvements predicted from the AWGN analysis can be achieved for a realistic transmission scenario including nonlinear effects.

Notation
Vectors and matrices are typeset in bold font by lowercase letters $\mathbf{a}$ and capital letters $\mathbf{A}$, respectively. Matrix transpose is denoted by $(\cdot)^{\intercal}$, Hermitian transpose by $(\cdot)^{\dagger}$, and the squared norm of a complex vector by $\|\mathbf{a}\|^2$. $\mathbf{I}_n$ denotes the identity matrix of size $n$. Complex conjugation is denoted by $(\cdot)^*$. $\delta(t)$ is Dirac's delta function, whereas $\delta[k]$ is the Kronecker delta. Convolution is denoted by $*$. $\mathbb{N}_0$, $\mathbb{R}$, and $\mathbb{C}$ denote the set of nonnegative integers, real numbers, and complex numbers, respectively. Random variables and vectors are denoted by capital letters and their realizations by lowercase letters. The probability density function (PDF) of a random variable $Y$ conditioned on the realization of another random variable $X$ is denoted by $f_{Y|X}(y|x)$, and the expected value by $\mathbb{E}[\cdot]$.

Continuous-time channel
We consider transmission of a polarization-multiplexed (PM) signal over a standard single-mode fiber (SSMF) with a lumped amplification scheme as shown in Fig. 1. The optical link consists of $N_\mathrm{sp}$ spans of SSMF, each of length $L_\mathrm{sp}$. The baseband signal in each polarization is generated via linear pulse modulation according to $s_x(t) = \sum_k s_{x,k}\, p(t - k/R_s)$, where $s_{x,k} \in \mathbb{C}$ are the information symbols, $p(t)$ is the real-valued pulse shape, and $R_s$ is the symbol rate. (We give expressions for polarization x only whenever polarization y has an equivalent expression.) The PM signal $\mathbf{s}(t) = (s_x(t), s_y(t))$ is launched into the fiber and propagates according to [21, Ch. 3]

$$ \frac{\partial \mathbf{v}(t,z)}{\partial z} = -\frac{\alpha - g(z)}{2}\,\mathbf{v}(t,z) - j\frac{\beta_2}{2}\frac{\partial^2 \mathbf{v}(t,z)}{\partial t^2} + j\gamma\,\mathbf{v}(t,z)\,\|\mathbf{v}(t,z)\|^2 + \mathbf{w}(t,z), \tag{1} $$

where $\mathbf{v}(t,z)$ is the complex baseband representation of the electric field. The input to the first fiber span and the output signal are $\mathbf{s}(t) = \mathbf{v}(t,0)$ and $\mathbf{r}(t) = \mathbf{v}(t, N_\mathrm{sp}L_\mathrm{sp})$, respectively. In (1), $\alpha$ is the attenuation coefficient, $\beta_2$ the chromatic dispersion coefficient, and $\gamma$ the nonlinear Kerr parameter. The terms $g(z)$ and $\mathbf{w}(t,z) = (w_x(t,z), w_y(t,z))$ model the amplifier gain and the generated ASE noise [22, p. 84]. Each EDFA introduces circularly symmetric complex Gaussian noise with two-sided power spectral density (PSD) $N_\mathrm{EDFA} = (G-1)\,h\,\nu_s\,n_\mathrm{sp}$ [1, eq. (54)] per polarization, where $G = e^{\alpha L_\mathrm{sp}}$ is the amplifier gain, $h$ is Planck's constant, $\nu_s$ the carrier frequency, and $n_\mathrm{sp}$ the spontaneous emission factor. A standard coherent linear receiver is used, consisting of an equalizer, a pulse-matched filter and a symbol-time sampler. This amounts to r

Discrete-time channel
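An approximate discrete-time model for the received samples $\mathbf{r}_k = (r_{x,k}, r_{y,k})$ based on the transmitted symbols $\mathbf{s}_k = (s_{x,k}, s_{y,k})$ is given by $\mathbf{r}_k \approx \zeta\,\mathbf{s}_k + \mathbf{n}_k + \tilde{\mathbf{n}}_k$, where $\zeta \in \mathbb{C}$ [5]. The term $\mathbf{n}_k = (n_{x,k}, n_{y,k})$ accounts for the linear ASE noise with power $P_\mathrm{ASE}$ per polarization, and the term $\tilde{\mathbf{n}}_k$ accounts for the nonlinear noise with power $\eta P^3$ per polarization, where $P$ is the transmit power per polarization (assumed to be equal for both polarizations). Here, $\eta$ is a function of the link parameters $\alpha$, $\beta_2$, $\gamma$, $L_\mathrm{sp}$, $N_\mathrm{sp}$ and the symbol rate $R_s$ [5, eq. (15)], and $|\zeta|^2 = 1 - |\eta| P^2$. The conditional PDF in this model is assumed to be Gaussian according to

$$ f_{\mathbf{R}_k|\mathbf{S}_k}(\mathbf{r}_k|\mathbf{s}_k) = \frac{1}{(\pi P_N)^2} \exp\!\left(-\frac{\|\mathbf{r}_k - \zeta\,\mathbf{s}_k\|^2}{P_N}\right), \tag{2} $$

where $P_N = P_\mathrm{ASE} + \eta P^3$. The equivalent SNR is defined as $\rho \triangleq |\zeta|^2 P / (P_\mathrm{ASE} + \eta P^3)$.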
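To illustrate how the equivalent SNR depends on the launch power, the following minimal Python sketch evaluates $\rho(P)$ as defined above and locates the maximizing power on a grid. The numerical values of `P_ASE` and `ETA` are hypothetical placeholders rather than parameters of the system studied here, $\eta$ is assumed real and positive, and the function names are chosen for illustration only.

```python
def equivalent_snr(p, p_ase, eta):
    """Equivalent SNR rho = |zeta|^2 P / (P_ASE + eta P^3), with |zeta|^2 = 1 - |eta| P^2."""
    zeta_sq = 1.0 - abs(eta) * p**2
    return zeta_sq * p / (p_ase + eta * p**3)


def best_power(p_ase, eta, p_min, p_max, steps=10000):
    """Grid search for the launch power that maximizes the equivalent SNR."""
    grid = [p_min + (p_max - p_min) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: equivalent_snr(p, p_ase, eta))


# Hypothetical values (watts and 1/W^2), chosen only for illustration.
P_ASE = 2e-6
ETA = 500.0

p_opt = best_power(P_ASE, ETA, 1e-5, 5e-3)
rho_opt = equivalent_snr(p_opt, P_ASE, ETA)
```

When the factor $|\zeta|^2$ is close to one, setting the derivative of $P/(P_\mathrm{ASE} + \eta P^3)$ to zero gives $P^3 = P_\mathrm{ASE}/(2\eta)$, which the grid search should reproduce closely.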

Bit-interleaved coded modulation
The transmitted symbols $\mathbf{s}_k$ in each time instant $k$ take on values from a discrete signal constellation $\mathcal{X} \subset \mathbb{C}^2$. Each point in the constellation is labeled with a unique binary string of length $m = \log_2 |\mathcal{X}|$, where $b_i(\mathbf{a})$, $1 \le i \le m$, denotes the $i$th bit in the binary string assigned to $\mathbf{a} \in \mathcal{X}$ (counting from left to right). Consider now the block diagram shown in Fig. 2(a), where the bits $d_{i,k}$ and the associated LLR sign changes can be ignored for now. At each time instant $k$, the modulator $\Phi$ takes $m$ bits $b_{i,k}$, $1 \le i \le m$, and maps them to one of the constellation points according to the binary labeling. We consider two product constellations of one-dimensional constellations labeled with the binary reflected Gray code (BRGC) as shown in Fig. 3, which we refer to as PM-64-QAM and PM-256-QAM. At the receiver, the demodulator $\Phi^{-1}$ computes soft reliability information about the transmitted bits in the form of the log-likelihood ratios (LLRs)

$$ l_{i,k} = \log \frac{\sum_{\mathbf{a} \in \mathcal{X}_{i,0}} f_{\mathbf{R}_k|\mathbf{S}_k}(\mathbf{r}_k|\mathbf{a})}{\sum_{\mathbf{a} \in \mathcal{X}_{i,1}} f_{\mathbf{R}_k|\mathbf{S}_k}(\mathbf{r}_k|\mathbf{a})}, $$

where $\mathcal{X}_{i,u} \triangleq \{\mathbf{a} \in \mathcal{X} : b_i(\mathbf{a}) = u\}$ is the subconstellation where all points have the bit $u$ at the $i$th position of their binary label.
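The LLR computation can be sketched for a single real dimension. The Python fragment below builds a BRGC-labeled 8-PAM constellation (one real dimension of 64-QAM) and evaluates the exact LLRs under Gaussian noise as the log-ratio of likelihood sums over the two subconstellations. The unnormalized amplitudes, the noise variance, and all function names are assumptions made for this illustration; the constellations considered here are products of such one-dimensional constellations.

```python
import math


def brgc(num_bits):
    """Binary reflected Gray code: one bit tuple per constellation point."""
    return [tuple((g >> (num_bits - 1 - i)) & 1 for i in range(num_bits))
            for g in (k ^ (k >> 1) for k in range(2 ** num_bits))]


def pam_points(num_bits):
    """Equally spaced PAM amplitudes -(2^b - 1), ..., -1, +1, ..., 2^b - 1."""
    size = 2 ** num_bits
    return [2 * k - (size - 1) for k in range(size)]


def llrs(r, points, labels, noise_var):
    """LLR per bit position: log of the likelihood sum over points with bit 0
    divided by the likelihood sum over points with bit 1 (Gaussian noise)."""
    out = []
    for i in range(len(labels[0])):
        num = sum(math.exp(-(r - a) ** 2 / (2 * noise_var))
                  for a, lab in zip(points, labels) if lab[i] == 0)
        den = sum(math.exp(-(r - a) ** 2 / (2 * noise_var))
                  for a, lab in zip(points, labels) if lab[i] == 1)
        out.append(math.log(num / den))
    return out


points = pam_points(3)   # 8-PAM amplitudes
labels = brgc(3)         # 3-bit Gray labels, left-most bit first
example = llrs(2.7, points, labels, noise_var=0.5)
```

A positive LLR favors bit 0 and a negative LLR favors bit 1, matching the sign convention of the symmetric Gaussian LLR channel. For very small noise variances the plain sums may underflow, so a practical implementation would work in the log domain.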
A useful way to think about the setup depicted in Fig. 2(a) is to imagine transmitting over a set of parallel bit channels, where one may interpret the conditional distribution of the LLR $f_{L_{i,k}|B_{i,k}}(\cdot|\cdot)$ as a bit channel. In the following, we refer to such a bit channel as an LLR channel; in general, it is not necessarily symmetric. Symmetry can be enforced by adding, modulo 2, independent and identically distributed bits $d_{i,k}$ to the bits $b_{i,k}$ and multiplying the corresponding LLR by $\bar{d}_{i,k} = (-1)^{d_{i,k}}$ (see Fig. 2(a)) [23]. The symmetry condition is an important requirement for the analysis in Section 3.3, where one implicitly relies on the assumption that the all-zero codeword has been transmitted [24, p. 389].
To simplify the analysis, the original bit channels are replaced with parallel symmetric Gaussian LLR channels, as shown in Fig. 2(b), where an LLR channel $f_{L|B}(l|b)$ is called a symmetric Gaussian LLR channel with parameter $\sigma^2$ if $L \sim \mathcal{N}(\sigma^2/2, \sigma^2)$ conditioned on $B = 0$ and $L \sim \mathcal{N}(-\sigma^2/2, \sigma^2)$ conditioned on $B = 1$. In order to find a correspondence between the LLR channels $f_{L_{i,k}|B_{i,k}}(\cdot|\cdot)$ and the parameters $\sigma_i^2$, one may match the mutual information (MI) according to $\sigma_i^2 = J^{-1}(I_i(\rho))^2$, where $I_i(\rho) = I(B_{i,k}; L_{i,k})$ is independent of $k$ and $J(\sigma)$ denotes the MI between the output of a symmetric Gaussian LLR channel and uniform input bits. As an example and to visualize the different bit channel qualities, in Fig. 4 we compare the LLR channels (solid lines, estimated via histograms) with the approximated Gaussian LLR channels (dashed lines) assuming an AWGN channel and two different values of $\rho$ for the three distinct bit channels of PM-64-QAM (see Fig. 3(a)). It can be seen that the actual densities are clearly non-Gaussian. However, the Gaussian approximation is quite accurate for the bit mapper optimization, as shown later, and allows for a major simplification of the analysis, thereby justifying its use.
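Since $J(\sigma)$ has no closed form, a numerical stand-in is needed in practice. The sketch below uses a commonly cited closed-form fit with constants `H1`, `H2`, `H3`; this fit and its constants are an assumption of the example and are not taken from this paper. The helper `matched_variance` (a hypothetical name) then returns the parameter $\sigma^2$ of the symmetric Gaussian LLR channel with a given MI.

```python
import math

# Constants of a commonly used closed-form fit of J (an assumption of this sketch).
H1, H2, H3 = 0.3073, 0.8935, 1.1064


def J(sigma):
    """Approximate MI of a symmetric Gaussian LLR channel with std. deviation sigma."""
    if sigma <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def J_inv(mi):
    """Inverse of the fit above; mi is clipped slightly below 1 to stay finite."""
    mi = min(max(mi, 0.0), 1.0 - 1e-12)
    if mi == 0.0:
        return 0.0
    return (-math.log2(1.0 - mi ** (1.0 / H3)) / H1) ** (1.0 / (2.0 * H2))


def matched_variance(mi):
    """Parameter sigma^2 of the Gaussian LLR channel that has the same MI."""
    return J_inv(mi) ** 2
```

For instance, `matched_variance(I_i)` would be applied to each estimated bit-channel MI to obtain the corresponding $\sigma_i^2$. An exact evaluation of $J$ by numerical integration could be substituted without changing the interface.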
Consider now the case where a binary code $\mathcal{C} \subset \{0,1\}^n$ of length $n$ and dimension $d$ is employed and each codeword $\mathbf{c} = (c_1, \dots, c_n)$ is transmitted using $N = n/m$ symbols $\mathbf{s}_k$. The allocation of the coded bits to the modulation bits (i.e., the different bit channels in Fig. 2(b)) is determined by a bit mapper as shown in Fig. 5, where the vectors $\mathbf{b}_1, \dots, \mathbf{b}_m$ are of length $N$. Our goal is to find good bit mappers for a fixed code and modulation. As a baseline, we consider a consecutive mapper according to $b_{i,k} = c_{(k-1)m+i}$ for $1 \le i \le m$ and $1 \le k \le N$.


Protograph-based LDPC codes
An LDPC code of length $n$ and dimension $d$ is defined via a sparse parity-check matrix $\mathbf{H} = [h_{i,j}] \in \{0,1\}^{c \times n}$, where $c \ge n - d$. There exist different methods to construct "good" LDPC codes, i.e., good matrices $\mathbf{H}$. One popular method is by using protographs [7]. An LDPC code can be represented by using a bipartite Tanner graph consisting of $n$ variable nodes (VNs) and $c$ check nodes (CNs), where the $i$th CN is connected to the $j$th VN if $h_{i,j} = 1$. A protograph is also a bipartite graph defined by an adjacency matrix $\mathbf{P} = [p_{i,j}] \in \mathbb{N}_0^{c' \times n'}$, called the base matrix. Given $\mathbf{P}$, a parity-check matrix $\mathbf{H}$ is obtained by replacing each entry $p_{i,j}$ in $\mathbf{P}$ with a random binary $M$-by-$M$ matrix which contains $p_{i,j}$ ones in each row and column. This procedure is called lifting and $M \ge \max_{i,j} p_{i,j}$ is the so-called lifting factor. Graphically, this construction amounts to copying the protograph $M$ times and subsequently permuting edges. Parallel edges, i.e., $p_{i,j} > 1$, are permitted in the protograph and are resolved in the lifting procedure. The design rate of the code is given by $R = 1 - c/n = 1 - c'/n'$, where $c = c'M$ and $n = n'M$.
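The lifting procedure can be sketched in a few lines of Python. Each entry $p_{i,j}$ is replaced by an $M$-by-$M$ binary block with exactly $p_{i,j}$ ones per row and column, built here as a sum of non-overlapping permutation matrices derived from one random permutation and distinct cyclic offsets. This particular way of drawing the blocks, the toy base matrix, and the function names are assumptions of the example; other random constructions satisfy the same row and column constraint.

```python
import random


def random_block(M, weight, rng):
    """M-by-M binary matrix with exactly `weight` ones in each row and column,
    built as a sum of `weight` non-overlapping permutation matrices."""
    perm = list(range(M))
    rng.shuffle(perm)
    offsets = rng.sample(range(M), weight)   # distinct cyclic offsets
    block = [[0] * M for _ in range(M)]
    for r in range(M):
        for s in offsets:
            block[r][perm[(r + s) % M]] = 1
    return block


def lift(P, M, seed=0):
    """Replace every entry p of the base matrix P by a random M-by-M block of weight p."""
    rng = random.Random(seed)
    rows, cols = len(P), len(P[0])
    H = [[0] * (cols * M) for _ in range(rows * M)]
    for i in range(rows):
        for j in range(cols):
            block = random_block(M, P[i][j], rng)
            for r in range(M):
                H[i * M + r][j * M:(j + 1) * M] = block[r]
    return H


# Toy base matrix with parallel edges (hypothetical, not a matrix from this paper).
P = [[1, 2, 0],
     [0, 3, 1]]
M = 4
H = lift(P, M)
design_rate = 1 - len(H) / len(H[0])   # equals 1 - c'/n'
```

The resulting `H` has $c'M$ rows and $n'M$ columns, so the design rate is unchanged by the lifting.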

AR4JA codes
As one example to illustrate the bit mapper optimization technique, we consider the AR4JA code family defined by the protographs in [9, Fig. 8]. The base matrix $\mathbf{P}^{(\ell)}$ of the AR4JA code ensemble with parameter $\ell \in \mathbb{N}_0$ can be recursively defined as in [17], with $c' = 3$ and $n' = 5 + 2\ell$. VNs corresponding to the second column of the base matrix are punctured, leading to a design rate of $R = (1 - c'/n') \cdot n'/(n' - 1) = (\ell + 1)/(\ell + 2)$.
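As a quick check of the rate expression, substituting three CNs and $5 + 2\ell$ VNs, one of which is punctured, gives:

```latex
R = \left(1 - \frac{3}{5 + 2\ell}\right)\frac{5 + 2\ell}{4 + 2\ell}
  = \frac{2 + 2\ell}{4 + 2\ell}
  = \frac{\ell + 1}{\ell + 2},
\qquad \text{e.g., } \ell = 1 \;\Rightarrow\; R = \tfrac{2}{3}.
```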

Spatially coupled LDPC codes
SC-LDPC codes have parity-check matrices with a band-diagonal structure (for a general definition see, e.g., [12]). For completeness, we briefly review the construction via protographs in [25], [10, Sec. II-B]. The base matrix $\mathbf{P}_{[T]}$ of a $(J, K)$ regular, protograph-based SC-LDPC code with termination length $T$ can be constructed by specifying matrices $\mathbf{P}_i$, $0 \le i \le m_s$, of dimension $J'$ by $K'$, where $m_s$ is referred to as the memory. The matrices are such that $\mathbf{P} = \sum_{i=0}^{m_s} \mathbf{P}_i$ has column weight $J$ and row weight $K$ for all columns and rows, respectively. Given $T$ and the matrices $\mathbf{P}_i$, the base matrix $\mathbf{P}_{[T]}$ is constructed as

$$ \mathbf{P}_{[T]} = \begin{pmatrix} \mathbf{P}_0 & & \\ \mathbf{P}_1 & \ddots & \\ \vdots & \ddots & \mathbf{P}_0 \\ \mathbf{P}_{m_s} & & \mathbf{P}_1 \\ & \ddots & \vdots \\ & & \mathbf{P}_{m_s} \end{pmatrix}_{(T + m_s)J' \times TK'}. $$

From the dimensions of $\mathbf{P}_{[T]}$ one can infer a design rate of $R(T) = 1 - (T + m_s)J'/(TK')$. As $T$ grows large, the rate approaches $R(\infty) = 1 - J'/K'$. Since our goal is not to optimize the code, we rely on base matrices that have been proposed elsewhere in the literature, in particular in combination with a WD which we discuss below. We consider $\mathbf{P}_0 = (2, 2, 2)$ and $\mathbf{P}_1 = (1, 1, 1)$ according to [10, Design rule 1], where $J' = 1$, $K' = 3$, $m_s = 1$, and $R(\infty) = 2/3$.
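The band-diagonal construction and the resulting design rate can be reproduced with a short Python sketch. The function names are hypothetical; the component matrices are the ones quoted above, and the termination length of 30 matches the value used later in the results.

```python
def sc_base_matrix(components, T):
    """Band-diagonal base matrix P_[T]: block column t holds P_0, ..., P_ms
    in block rows t, ..., t + ms. Each component is given as a list of rows."""
    ms = len(components) - 1
    Jp, Kp = len(components[0]), len(components[0][0])
    P = [[0] * (T * Kp) for _ in range((T + ms) * Jp)]
    for t in range(T):
        for i, Pi in enumerate(components):
            for r in range(Jp):
                P[(t + i) * Jp + r][t * Kp:(t + 1) * Kp] = Pi[r]
    return P


def design_rate(P):
    """Design rate 1 - (number of CNs) / (number of VNs) of a base matrix."""
    return 1 - len(P) / len(P[0])


P0 = [[2, 2, 2]]
P1 = [[1, 1, 1]]
P_T = sc_base_matrix([P0, P1], T=30)
rate = design_rate(P_T)   # 1 - 31/90
```

Every column of `P_T` has weight 3, while the first and last rows have a lower weight than the interior rows; this termination effect is the source of the rate loss relative to $R(\infty)$.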

Decoding and asymptotic EXIT analysis
We use a modified version of the P-EXIT analysis as a tool to predict the iterative BP performance behavior of the protograph-based codes [19]. A detailed description of this tool for binary modulation is available in [19] and [24, Algorithm 9.2]. Here, we only describe the necessary modifications to account for the WD and the nonbinary modulations. We start with the former and explain the latter in the next section.
We employ the WD scheme developed in [10]. The WD helps to alleviate the long decoding delays and high decoding complexity of SC-LDPC codes under full BP decoding by exploiting the fact that two VNs are not involved in the same parity-check equation if they are at least $(m_s + 1)K'$ columns apart [10]. The WD restricts message updates to a subset of VNs and CNs in the entire graph. After a predetermined number of decoding iterations, this subset changes and the decoding window slides to the next position. Pseudocode for the modified P-EXIT analysis of SC-LDPC codes accounting for the WD is presented in Algorithm 1. The main difference with respect to BP decoding is the window size parameter $W$, which specifies the number of active CNs in the protograph considered in each window as a multiple of $J'$. The P-EXIT analysis for the standard BP decoder can be recovered from Algorithm 1 by setting $T = 1$, $W = 1$, $J' = c'$, and $K' = n'$.
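For orientation, the following Python sketch implements a plain P-EXIT recursion for full BP decoding, i.e., the special case without a decoding window. It is not a transcription of Algorithm 1: the closed-form fit of $J$, the stopping rule based on the Gaussian error estimate $Q(\sigma_\mathrm{app}/2)$, and all names are assumptions of this example.

```python
import math

H1, H2, H3 = 0.3073, 0.8935, 1.1064   # closed-form fit of J (assumption)


def J(sigma):
    return 0.0 if sigma <= 0 else (1 - 2 ** (-H1 * sigma ** (2 * H2))) ** H3


def J_inv(mi):
    mi = min(max(mi, 0.0), 1 - 1e-12)
    return 0.0 if mi == 0 else (-math.log2(1 - mi ** (1 / H3)) / H1) ** (1 / (2 * H2))


def pexit(B, sigma_ch, l_max=50, p_tar=1e-5):
    """P-EXIT recursion for full BP decoding on base matrix B.
    sigma_ch[j] is the LLR-channel std. deviation of VN class j (0 if punctured).
    Returns (S, iterations), with S = 1 if every VN class reaches p_tar."""
    c, n = len(B), len(B[0])
    I_cv = [[0.0] * n for _ in range(c)]   # CN-to-VN extrinsic MI per edge type
    for it in range(1, l_max + 1):
        # VN-to-CN update: channel plus all incoming edges except one.
        s_cv = [[J_inv(I_cv[i][j]) ** 2 for j in range(n)] for i in range(c)]
        I_vc = [[0.0] * n for _ in range(c)]
        for j in range(n):
            total = sigma_ch[j] ** 2 + sum(B[i][j] * s_cv[i][j] for i in range(c))
            for i in range(c):
                if B[i][j] > 0:
                    I_vc[i][j] = J(math.sqrt(max(total - s_cv[i][j], 0.0)))
        # CN-to-VN update, using the usual duality approximation.
        s_vc = [[J_inv(1 - I_vc[i][j]) ** 2 for j in range(n)] for i in range(c)]
        for i in range(c):
            total = sum(B[i][j] * s_vc[i][j] for j in range(n))
            for j in range(n):
                if B[i][j] > 0:
                    I_cv[i][j] = 1 - J(math.sqrt(max(total - s_vc[i][j], 0.0)))
        # A-posteriori std. deviation per VN class and error estimate Q(sigma / 2).
        s_cv = [[J_inv(I_cv[i][j]) ** 2 for j in range(n)] for i in range(c)]
        worst = 0.0
        for j in range(n):
            s_app = math.sqrt(sigma_ch[j] ** 2
                              + sum(B[i][j] * s_cv[i][j] for i in range(c)))
            worst = max(worst, 0.5 * math.erfc(s_app / (2 * math.sqrt(2))))
        if worst <= p_tar:
            return 1, it
    return 0, l_max
```

As a usage example, `pexit([[3, 3]], [2.6, 2.6], l_max=200)` evaluates a regular protograph with three parallel edges per VN on a comparatively good channel, whereas lowering both channel parameters to 2.0 places the same protograph on a channel whose MI is below the code rate, so that the recursion stalls.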

Asymptotic bit mapper model
Each VN in the protograph represents $M$ VNs in the lifted Tanner graph. Since a VN corresponds to one bit in a codeword, the $n'$ VNs in the protograph give rise to $n'$ different classes of coded bits that are treated as statistically equivalent in the P-EXIT analysis. The bit mapper is modeled asymptotically by an assignment matrix $\mathbf{A} = [a_{i,j}] \in \mathbb{R}^{m \times n'}$, where $a_{i,j}$ denotes the fraction of coded bits from the $j$th VN class that is allocated to the $i$th bit channel. In particular, if $\mathbf{I}(\rho) = (I_1(\rho), \dots, I_m(\rho))$ collects the MIs of the $m$ bit channels, then multiplying $\mathbf{I}(\rho)$ by $\mathbf{A}$ leads to a vector $(\tilde{I}_1, \tilde{I}_2, \dots, \tilde{I}_{n'})$ with the MIs corresponding to the averaged bit channels as seen by the $n'$ VN classes. These averaged bit channels are modeled as symmetric Gaussian LLR channels with parameters $(\sigma_1^2, \dots, \sigma_{n'}^2)$. The P-EXIT analysis for nonbinary modulation is then obtained by changing the initialization step in line 3 of Algorithm 1 and assigning $\sigma_i^2 = J^{-1}(\tilde{I}_i)^2$, where the algorithm takes $\mathbf{A}$ as an additional input to compute $\tilde{I}_i$ as described.
In order to have a valid probabilistic assignment, all columns in $\mathbf{A}$ have to sum to one and all rows in $\mathbf{A}$ have to sum to $n'/m$, i.e., we have $m + n'$ equality constraints in total. The first condition ensures that, asymptotically, all VNs are assigned to a channel, while the second condition ensures that all parallel channels are used equally often. The set of valid assignment matrices is denoted by $\mathcal{A}^{m \times n'} \subset \mathbb{R}^{m \times n'}$. In the case of punctured VNs, the corresponding columns in $\mathbf{A}$ are removed and $n'$ is interpreted as the number of unpunctured VNs.
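The constraints on the assignment matrix and the averaging of the bit-channel MIs can be illustrated as follows. The MI values, the two example matrices, and the function names are hypothetical, and entries are additionally assumed to be nonnegative, as one would expect for fractions.

```python
def is_valid_assignment(A, tol=1e-9):
    """Check nonnegativity, unit column sums, and row sums equal to n'/m."""
    m, n = len(A), len(A[0])
    cols_ok = all(abs(sum(A[i][j] for i in range(m)) - 1.0) <= tol for j in range(n))
    rows_ok = all(abs(sum(row) - n / m) <= tol for row in A)
    nonneg = all(a >= -tol for row in A for a in row)
    return cols_ok and rows_ok and nonneg


def averaged_mi(I, A):
    """MI seen by each VN class: the row vector I multiplied by A."""
    m, n = len(A), len(A[0])
    return [sum(I[i] * A[i][j] for i in range(m)) for j in range(n)]


# Hypothetical bit-channel MIs for m = 3 channels and n' = 4 VN classes.
I = [0.9, 0.7, 0.4]
A_uni = [[1 / 3] * 4 for _ in range(3)]
A_skew = [[1.0, 1 / 3, 0.0, 0.0],
          [0.0, 1 / 3, 1.0, 0.0],
          [0.0, 1 / 3, 0.0, 1.0]]
```

With `A_uni`, every VN class sees the same averaged MI, whereas `A_skew` gives the first VN class exclusive access to the best bit channel while still using all channels equally often.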

Optimization
For a given bit mapper, i.e., for a given assignment matrix $\mathbf{A}$, an approximate decoding threshold $\rho^*(\mathbf{A})$ can be found using Algorithm 1 as follows. Fix a certain precision $\delta$, target bit error probability $p_\mathrm{tar}$, and maximum number of iterations $l_\mathrm{max}$. Starting from some SNR $\rho$ where Algorithm 1 converges to a successful decoding, $S = 1$, iteratively decrease $\rho$ by $\delta$ until the decoding fails. The smallest $\rho$ for which $S = 1$ is declared as the decoding threshold $\rho^*(\mathbf{A})$. For any $\rho \ge \rho^*(\mathbf{A})$, we denote the number of iterations until successful decoding by $l_s(\mathbf{A}, \rho)$.
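The threshold search just described amounts to a simple downward scan. In the sketch below, `decodes` is a hypothetical callable standing in for a run of Algorithm 1 that returns whether $S = 1$ at a given SNR; the toy decoder with a hidden threshold is used only to exercise the loop.

```python
def find_threshold(decodes, rho_start, delta, max_steps=10000):
    """Lower rho in steps of delta until decoding fails; return the smallest
    rho for which decodes(rho) is still True (S = 1)."""
    if not decodes(rho_start):
        raise ValueError("rho_start must allow successful decoding")
    threshold = rho_start
    for k in range(1, max_steps + 1):
        rho = rho_start - k * delta
        if not decodes(rho):
            break
        threshold = rho
    return threshold


# Toy stand-in for Algorithm 1: "decoding" succeeds above a hidden threshold.
toy_decoder = lambda rho: rho >= 3.17
rho_star = find_threshold(toy_decoder, rho_start=5.0, delta=0.1)
```

The returned value lies on the grid defined by the starting SNR and $\delta$, so it overestimates the true threshold by less than $\delta$.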
We are interested in optimizing $\mathbf{A}$ in terms of the decoding threshold for a given protograph and modulation format. The optimization problem is thus

$$ \mathbf{A}_\mathrm{opt} = \operatorname*{argmin}_{\mathbf{A} \in \mathcal{A}^{m \times n'}} \rho^*(\mathbf{A}), \tag{6} $$

where the baseline system realizes a mapping of coded bits to modulation bits such that $a_{i,j} = 1/m$, $\forall i, j$, resulting in identical variances $\sigma_i^2$ for the equivalent bit channels of all VN classes. The corresponding assignment matrix is denoted by $\mathbf{A}_\mathrm{uni}$. The search space $\mathcal{A}^{m \times n'}$ can be regarded as a convex polytope $\mathcal{P}$ in $p = (m - 1)(n' - 1)$ dimensions by removing the last row and column in $\mathbf{A}$, replacing the equality constraints with inequality constraints, and writing the matrix elements in a vector $\mathbf{x} \in \mathbb{R}^p$ according to the prescription $x_{(i-1)(n'-1)+j} = a_{i,j}$ for $1 \le i \le m - 1$ and $1 \le j \le n' - 1$. While the search space is convex, one can show by simple examples that the objective function is nonconvex in $\mathcal{P}$. In the following, we discuss ways to obtain good bit mappers with reasonable effort. We also remark that some of the optimization approaches proposed previously in the context of bit mapper optimization for irregular LDPC codes are not necessarily appropriate in our case due to the higher number of VN classes, i.e., they can be too complex (for example the iterative grid search in [13]) or do not explore the search space efficiently (simple hill climbing approaches as in [14]).
First, as an alternative to directly optimizing the decoding threshold, we iteratively optimize the convergence behavior in terms of the number of iterations until successful decoding as follows. Initialize $\rho$ to the decoding threshold for the baseline bit mapper, i.e., $\rho = \rho^*(\mathbf{A}_\mathrm{uni})$. Find $\mathbf{A}^*$ such that it minimizes the number of decoding iterations until convergence for the given $\rho$, i.e.,

$$ \mathbf{A}^* = \operatorname*{argmin}_{\mathbf{A} \in \mathcal{A}^{m \times n'}} l_s(\mathbf{A}, \rho). \tag{7} $$

For the optimized $\mathbf{A}^*$, calculate the new decoding threshold $\rho^*(\mathbf{A}^*)$. If the threshold did not improve, stop. Otherwise, set $\rho = \rho^*(\mathbf{A}^*)$ and repeat the optimization. The above iterative approach was already used by the authors to find good bit mappers for SC-LDPC codes in [20] for parallel BECs. This approach is largely based on the ideas presented in [27, Sec. IV], where optimized degree distributions for irregular LDPC codes are found. The computational complexity can be significantly reduced compared to the threshold minimization (6). However, it is not guaranteed to be equivalent to a true threshold optimization, i.e., in general $\mathbf{A}_\mathrm{opt} \ne \mathbf{A}^*$.

We employ differential evolution [28] to solve the optimization problem in (7), which has been previously applied by many authors in the context of irregular LDPC codes [24, p. 396]. Differential evolution is a solver for unconstrained optimization problems and we briefly indicate how the algorithm is modified to account for the constrained search space. First, since $\mathcal{A}^{m \times n'}$ can be regarded as a convex polytope, it is straightforward to take uniformly distributed points for the initial population via standard random walk procedures [29]. Second, if the algorithm generates a trial point $\mathbf{x}_t$ that lies outside the polytope, we apply the following randomized bounce-back strategy. Let $\mathcal{L}$ be the line segment connecting $\mathbf{x}_t$ and a random point inside the polytope, and let $\mathbf{x}_i$ be the intersecting point of $\mathcal{L}$ and the boundary of $\mathcal{P}$. We replace $\mathbf{x}_t$ with a point taken randomly from $\mathcal{L}$, such that it lies in $\mathcal{P}$ and has at most a distance $d$ from $\mathbf{x}_i$, where $d$ is the distance between $\mathbf{x}_i$ and $\mathbf{x}_t$. For a detailed description of the algorithm itself and some guidelines regarding the optimization parameter choice, we refer the reader to [28].
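The randomized bounce-back step can be sketched for a polytope written in the generic form $\{\mathbf{x} : \mathbf{G}\mathbf{x} \le \mathbf{h}\}$. This representation, the toy triangle, and the function names are assumptions of the example; the interior point is taken as an input rather than drawn by a random walk.

```python
import random


def bounce_back(x_trial, x_inside, G, h, rng=random):
    """Randomized bounce-back into the polytope {x : G x <= h}.
    x_inside must satisfy all constraints; x_trial violates at least one."""
    d = [t - s for t, s in zip(x_trial, x_inside)]
    # Largest t_b in [0, 1] such that x_inside + t_b * d is still feasible.
    t_b = 1.0
    for row, bound in zip(G, h):
        slope = sum(g * di for g, di in zip(row, d))
        if slope > 0:
            slack = bound - sum(g * s for g, s in zip(row, x_inside))
            t_b = min(t_b, slack / slope)
    # Points within distance |x_i - x_trial| of the boundary point x_i
    # correspond to line parameters t >= 2 * t_b - 1.
    t = rng.uniform(max(0.0, 2 * t_b - 1), t_b)
    return [s + t * di for s, di in zip(x_inside, d)]


def feasible(x, G, h, tol=1e-9):
    """True if x satisfies every inequality of the polytope up to tol."""
    return all(sum(g * xi for g, xi in zip(row, x)) <= b + tol
               for row, b in zip(G, h))


# Toy polytope: the triangle x1 >= 0, x2 >= 0, x1 + x2 <= 1 (illustration only).
G = [[-1, 0], [0, -1], [1, 1]]
h = [0, 0, 1]
x_new = bounce_back([1.5, 0.5], [0.2, 0.2], G, h, random.Random(1))
```

Because the replacement point is drawn on the segment between the interior point and the boundary intersection, it is feasible by convexity, and the lower limit on the line parameter enforces the distance condition.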
The optimization complexity is further reduced by constraining the maximum number of iterations l max . Practical systems commonly operate with a relatively small number of BP iterations. For example, in Sec. 5, we assume 50 BP iterations, and hence the decoding thresholds are optimized for the same number of iterations. In the simulative verification, we have observed that the performance of the finite-length codes assuming 50 BP iterations is generally better using a bit mapper that is also optimized for l max = 50 compared to, say, l max = 1000, although the differences were small.
Additionally, for SC-LDPC codes, we take advantage of the structure of the optimized bit mappers for parallel BECs [20], which show a certain form of periodicity. The optimization complexity can then be reduced by assuming that the optimal solution lies in a lower-dimensional subspace of $\mathcal{P}$, defined by assignment matrices that take on a periodic form as $\mathbf{A} = (\mathbf{A}', \mathbf{A}'', \mathbf{A}'', \cdots, \mathbf{A}'', \mathbf{A}''')$, with $m \times V$ matrices $\mathbf{A}'$, $\mathbf{A}''$, and $\mathbf{A}'''$, where $V$ is the periodicity factor. If $V$ is chosen small enough, the dimensionality of the search space (i.e., $(m - 1)(3V - 1)$) can be substantially reduced, which generally improves the convergence speed of the differential evolution algorithm.
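A periodic assignment matrix of this form can be assembled as shown in the following sketch. The block values and the function names are hypothetical; the example uses three bit channels, a periodicity factor of 3, and 90 protograph VNs.

```python
def periodic_assignment(A_first, A_mid, A_last, n_cols):
    """Concatenate A' | A'' | ... | A'' | A''' column-wise to n_cols columns.
    All three blocks are m-by-V matrices given as lists of rows."""
    m, V = len(A_first), len(A_first[0])
    if n_cols % V != 0 or n_cols < 2 * V:
        raise ValueError("n_cols must be a multiple of V and at least 2V")
    repeats = n_cols // V - 2
    return [A_first[i] + A_mid[i] * repeats + A_last[i] for i in range(m)]


def search_dimension(m, V):
    """Number of free parameters of the periodic form, (m - 1)(3V - 1)."""
    return (m - 1) * (3 * V - 1)


# Hypothetical example: m = 3 bit channels, V = 3, and 90 protograph VNs.
uniform_block = [[1 / 3] * 3 for _ in range(3)]
A = periodic_assignment(uniform_block, uniform_block, uniform_block, 90)
```

For this example the periodic form has 16 free parameters, compared with $(m - 1)(n' - 1) = 178$ for an unrestricted assignment matrix of the same size.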
The methods and complexity reduction techniques described above have been selected to obtain a good trade-off between final performance and design complexity. In certain cases, for example the considered AR4JA code in the next section, it could be possible to further improve the performance at the expense of a higher design complexity by directly targeting the decoding threshold optimization (6) without the need for the iterative optimization (albeit we expect the improvements to be incremental). On the other hand, for the considered SC-LDPC code, the iterative optimization and periodicity assumptions were critical to maintain a reasonable design complexity, which is mainly due to the very large number of protograph VNs.

Results and discussion
In this section, we present and discuss numerical results, and illustrate the performance gains that can be achieved by employing optimized bit mappers. For the baseline systems, we use a consecutive mapping of coded bits to modulation bits. Alternatively, one may use a uniformly random mapping, which has the same expected performance.
In order to show the flexibility of the technique, we consider four different scenarios, combining both modulation formats with one code based on the AR4JA protographs and one SC-LDPC code, where the lifting factor is $M = 3000$ in all cases. For simplicity, the codes are randomly generated without further consideration of the graph structure. The protograph lifting procedure can in principle be combined with standard techniques to avoid short graph cycles that may potentially lead to high error floors [24, Ch. 6.3]. Alternatively, an additional outer algebraic code may be assumed, which removes remaining errors to achieve a required target BER of $10^{-15}$. A rate $R = 2/3$ code based on the AR4JA protograph for $\ell = 1$ is used, which is denoted by $\mathcal{C}_\mathrm{AR4JA}$. For the spatially coupled case with $T = 30$, a code based on the protograph described in Sec. 3.2 is used, which is denoted by $\mathcal{C}_\mathrm{SC}$. For the given value of $T$, the design rate is $R(30) = 0.656$. For the AR4JA code, standard BP decoding is assumed with $l_\mathrm{max} = 50$, while for the SC-LDPC codes, we employ a WD with $W = 5$ and $l_\mathrm{max} = 10$, which again amounts to a total of 50 iterations per decoded bit. We also tried other combinations of $W$ and $l_\mathrm{max}$ with a similar total number of iterations and this combination gave the best performance. For the bit mapper optimization and in particular the P-EXIT analysis, we use the same values for $l_\mathrm{max}$ and $W$, and additionally $p_\mathrm{tar} = 10^{-5}$. The finite-length bit mappers are obtained via the rounded matrix $M\mathbf{A}^*$, from which the index assignment of coded bits to modulation bits is determined.
Notice that in all four scenarios, the approaches in [15][16][17] are either not possible (due to a mismatch between the number of protograph VNs and the number of modulation bits) or not feasible (due to the large complexity of the resulting optimization). As an example, the protograph corresponding to C SC has 90 VNs and can be directly connected to the three distinct bit channels of PM-64-QAM. This leads, however, to a very large number of possible (nonfractional) connections between protograph VNs and modulation bits.

Linear transmission
We start by providing a verification of the proposed optimization technique assuming an AWGN channel. This case is obtained when nonlinear effects are ignored, i.e., γ = 0. In this case, the channel PDF (2) is valid without approximations.
In Fig. 6(a), the predicted bit error rate (BER) of the AR4JA code via the P-EXIT analysis is shown together with Monte Carlo simulations by the dashed and solid lines, respectively. Performance curves for the baseline bit mappers are shown in red and for the optimized ones in blue. As a reference, we also plot the BER-constrained [24, p. 17] generalized mutual information (GMI) for the corresponding spectral efficiency in each figure (the GMI is also referred to as the BICM capacity [30]). For both scenarios, it can be observed that the optimized bit mappers lead to a significant performance improvement. The gains that can be achieved at a BER of $10^{-5}$ are approximately 0.19 and 0.25 dB for PM-64-QAM and PM-256-QAM, respectively. The predicted gains from the P-EXIT analysis for the same BER are slightly smaller, i.e., 0.12 and 0.19 dB, respectively. The deviation of the asymptotic analysis from the actual simulation results is to be expected due to the Gaussian approximation of the LLR densities and the finite lifting factor and, hence, finite block lengths of the codes. However, it is important to observe that, even though the optimization was carried out assuming a cycle-free graph structure, the predicted performance gains are well preserved for the finite-length codes.
Similarly, the performance of the SC-LDPC code is shown in Fig. 6(b). The periodicity factor for the bit mapper optimization was set to V = 3. The observed gains at a BER of $10^{-5}$ are approximately 0.20 dB for PM-64-QAM and 0.25 dB for PM-256-QAM. The solid green curves show the predicted P-EXIT performance for bit mappers that are optimized assuming a larger periodicity factor of V = 6. It can be seen that for both modulation formats, the additional gains are incremental, i.e., for PM-64-QAM the predicted performance curves virtually overlap, while for PM-256-QAM, the difference is roughly 0.01 dB. This suggests that a full optimization of A would be only marginally better than the optimization with V = 3.
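The gains quoted above are horizontal distances between two BER curves at a fixed target BER. As an illustration of how such a gain can be read off sampled curves, the following minimal sketch interpolates each curve in the log-BER domain. The function names and the two synthetic curves are made up for this example and are not the data of Fig. 6; the synthetic optimized curve is simply the baseline shifted by 0.2 dB.

```python
import numpy as np

def snr_at_target_ber(snr_db, ber, target_ber):
    """SNR (dB) at which a monotonically decreasing BER curve crosses target_ber."""
    snr_db = np.asarray(snr_db, dtype=float)
    log_ber = np.log10(np.asarray(ber, dtype=float))
    # np.interp expects increasing x-values; log-BER decreases with SNR,
    # so both arrays are reversed before interpolating.
    return float(np.interp(np.log10(target_ber), log_ber[::-1], snr_db[::-1]))

def gain_db(snr_db, ber_baseline, ber_optimized, target_ber=1e-5):
    """Horizontal gap (dB) between baseline and optimized curves at target_ber."""
    return (snr_at_target_ber(snr_db, ber_baseline, target_ber)
            - snr_at_target_ber(snr_db, ber_optimized, target_ber))

# Synthetic curves (assumption, for illustration only): log10(BER) falls
# linearly with SNR, and the optimized curve is shifted 0.2 dB to the left.
snr = np.linspace(10.0, 12.0, 21)
ber_base = 10.0 ** (-3.0 * (snr - 10.0) - 1.0)
ber_opt = 10.0 ** (-3.0 * (snr + 0.2 - 10.0) - 1.0)

print(round(gain_db(snr, ber_base, ber_opt), 3))  # 0.2
```

Interpolating log10(BER) rather than the BER itself is a deliberate choice here, since waterfall curves are close to linear on a logarithmic BER axis over short SNR intervals.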
From Fig. 6, it appears that the P-EXIT analysis consistently underestimates the finite-length performance improvement for the AR4JA code, while it overestimates the improvement for the SC-LDPC code. This observation does not, however, apply in general and seems to be coincidental. In particular, we also optimized the bit mapper for AR4JA codes of different code rates (results not shown), and the P-EXIT analysis may also overestimate the true performance improvements in that case. Moreover, we would like to stress that a direct comparison between the two codes is difficult because of the slightly different code rates (and hence spectral efficiencies) and the different decoding complexities and delays. A fair comparison between SC-LDPC codes and LDPC block codes is an active area of research and beyond the scope of this paper.

Nonlinear transmission
In this section, we consider a transmission scenario including nonlinear effects, i.e., γ ≠ 0, where the assumed channel PDF (2) is only approximately valid. In particular, we study the potential increase in transmission reach that can be obtained by employing the optimized bit mappers.
We consider a single-channel transmission scenario to keep the simulation time acceptable. In the simulation model, we assume perfect knowledge of the polarization state as well as perfect timing and carrier synchronization. All chosen system parameters are summarized in Table 1. Additionally, we use a root-raised cosine pulse p(t) with a roll-off factor of 0.25. In order to solve (1), we employ the symmetric SSFM with two samples per symbol and a fixed step size of $\Delta = (10^{-4} L_{\mathrm{D}}^2 L_{\mathrm{NL}})^{1/3}$, where $L_{\mathrm{D}} = 1/(|\beta_2| R_s^2)$ and $L_{\mathrm{NL}} = 1/(\gamma P)$ are the dispersive and nonlinear lengths, respectively. The input power that maximizes ρ according to the GN
In Fig. 7, the simulated BER of the PM systems using $C_{\mathrm{AR4JA}}$ and $C_{\mathrm{SC}}$ is shown as a function of the number of fiber spans $N_{\mathrm{sp}}$ by the dashed and solid lines, respectively. Again, curves corresponding to the baseline bit mappers are shown in red, while curves corresponding to the optimized bit mappers are shown in blue. Notice that the SNR decrease (in dB) is not linear in the number of spans, hence the different slopes compared to the curves shown in Fig. 6. For PM-256-QAM, the transmission reach can be increased by roughly one additional span for both codes, at the expense of a slightly increased BER. For example, for $C_{\mathrm{SC}}$, the transmission reach can be increased from 12 to 13 spans, while the BER increases slightly from $10^{-5}$ to $3 \cdot 10^{-5}$. For PM-64-QAM, the increase is roughly 1 span for $C_{\mathrm{AR4JA}}$ and roughly 2 spans for $C_{\mathrm{SC}}$. In fact, these gains can also be approximately predicted from the GN model. For example, for the chosen input power and system parameters, the GN model predicts an SNR decrease of roughly 0.3 dB from $N_{\mathrm{sp}} = 12$ to $N_{\mathrm{sp}} = 13$ and 0.15 dB from $N_{\mathrm{sp}} = 34$ to $N_{\mathrm{sp}} = 35$, i.e., one would expect the performance improvements in the linear transmission scenario to translate into roughly one additional span for PM-256-QAM and one to two additional spans for PM-64-QAM.
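The GN-model reasoning above can be retraced with a few lines of arithmetic. The sketch below is an illustration rather than part of the original analysis: it assumes that, near the operating point, every additional span costs a fixed SNR penalty (a local linearization), it uses the gains and SNR decreases quoted in the text, and it takes 12 and 34 spans as the baseline reaches for PM-256-QAM and PM-64-QAM. The function names are hypothetical.

```python
def extra_spans(gain_db, snr_drop_per_span_db):
    """Spans bought by an SNR gain if each extra span costs a fixed SNR drop."""
    return gain_db / snr_drop_per_span_db

def reach_increase_percent(baseline_spans, added_spans):
    """Relative reach extension in percent."""
    return 100.0 * added_spans / baseline_spans

# Gains from the linear-transmission results and SNR drops from the GN model,
# as quoted in the text (0.20 dB is the SC-LDPC gain for PM-64-QAM).
pm256 = extra_spans(0.25, 0.30)  # PM-256-QAM around 12 spans
pm64 = extra_spans(0.20, 0.15)   # PM-64-QAM around 34 spans

print(f"{pm256:.2f}")  # 0.83, i.e. roughly one span
print(f"{pm64:.2f}")   # 1.33, i.e. one to two spans

print(f"{reach_increase_percent(12, 1):.1f}")  # 8.3
print(f"{reach_increase_percent(34, 1):.1f}")  # 2.9
print(f"{reach_increase_percent(34, 2):.1f}")  # 5.9
```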
This estimate corresponds to an increase of the transmission reach by 3-8%, which is well in line with the simulation results presented in Fig. 7.
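For reference, the fixed SSFM step-size rule quoted in the simulation setup can be evaluated directly from the dispersive and nonlinear lengths. The following is a minimal sketch of that formula only. The numerical parameter values are placeholders chosen for illustration (they are assumptions, not the entries of Table 1), and the function names are hypothetical.

```python
def dispersive_length(beta2, symbol_rate):
    """L_D = 1 / (|beta2| * Rs^2), with beta2 in s^2/m and Rs in baud."""
    return 1.0 / (abs(beta2) * symbol_rate ** 2)

def nonlinear_length(gamma, power):
    """L_NL = 1 / (gamma * P), with gamma in 1/(W m) and P in W."""
    return 1.0 / (gamma * power)

def ssfm_step_size(beta2, symbol_rate, gamma, power):
    """Fixed step size Delta = (1e-4 * L_D^2 * L_NL)^(1/3), in metres."""
    l_d = dispersive_length(beta2, symbol_rate)
    l_nl = nonlinear_length(gamma, power)
    return (1e-4 * l_d ** 2 * l_nl) ** (1.0 / 3.0)

# Placeholder parameters (assumptions for illustration only):
beta2 = -21.7e-27   # s^2/m, i.e. -21.7 ps^2/km
symbol_rate = 28e9  # baud
gamma = 1.3e-3      # 1/(W m), i.e. 1.3 1/(W km)
power = 1e-3        # W, i.e. 0 dBm

step_m = ssfm_step_size(beta2, symbol_rate, gamma, power)
print(f"L_D  = {dispersive_length(beta2, symbol_rate) / 1e3:.1f} km")
print(f"L_NL = {nonlinear_length(gamma, power) / 1e3:.1f} km")
print(f"step = {step_m / 1e3:.2f} km")
```

With these placeholder values the step size comes out at a few kilometres, well below both characteristic lengths. Because the nonlinear length depends on the input power, the step shrinks as the power is increased.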

Conclusion
In this paper, we studied bit mapper optimization for a PM fiber-optical system. Focusing on protograph-based codes, an optimization approach was proposed based on a fractional allocation of protograph bits to modulation bits via a modified P-EXIT analysis. Extensive numerical simulations were used to verify the analysis for a dispersion-uncompensated link in both the linear and nonlinear transmission regimes. The results show performance improvements of up to 0.25 dB, translating into a possible extension of the transmission reach by up to 8%.