Digital backpropagation accounting for polarization-mode dispersion

Digital backpropagation (DBP) is a promising digital-domain technique to mitigate Kerr-induced nonlinear interference. While it successfully removes deterministic signal–signal interactions, the performance of ideal DBP is limited by stochastic e ects, such as polarizationmode dispersion (PMD). In this paper, we consider an ideal full-field DBP implementation and modify it to additionally account for PMD; reversing the PMD e ects in the backward propagation by passing the reverse propagated signal also through PMD sections, which concatenated equal the inverse of the PMD in the forward propagation. These PMD sections are calculated analytically at the receiver based on the total accumulated PMD of the link estimated from channel equalizers. Numerical simulations show that, accounting for nonlinear polarization-related interactions in the modified DBP algorithm, additional signal-to-noise ratio gains of 1.1 dB are obtained for transmission over 1000 km. © 2017 Optical Society of America OCIS codes: (060.0060) Fiber optics and optical communications; (060.1660) Coherent communications. References and links 1. Y. Yamamoto, Y. Kawaguchi, and M. Hirano, “Low-loss and low-nonlinearity pure-silica-core fiber for Cand L-band broadband transmission,” J. Lightw. Technol. 34(2), 321–326 (2016). 2. X. Liu, A. R. Chraplyvy, P. J. Winzer, R. W. Tkach, and S. Chandrasekhar, “Phase-conjugated twin waves for communication beyond the Kerr nonlinearity limit,” Nat. Photon. 7(7), 560–568 (2013). 3. S. L. Jansen, D. van den Borne, P. M. Krummrich, S. Sälter, G.-D. Khoe, and H. de Waardt, “Long-haul DWDM transmission systems employing optical phase conjugation,” IEEE J. Sel. Topics Quantum Electron. 12(4), 505–520 (2006). 4. Y. Gao, J. C. Cartledge, A. S. Karar, S. S.-H. Yam, M. O’Sullivan, C. Laperle, A. Borowiec, and K. Roberts, “Reducing the complexity of perturbation based nonlinearity pre-compensation using symmetric EDC and pulse shaping,” Opt. Express 22(2), 1209–1219 (2014). 5. M. Secondini and E. Forestieri, “On XPM mitigation in WDM fiber-optic systems,” IEEE Photon. Technol. Lett. 26(22), 2252–2255 (2014). 6. R. J. Essiambre, P. J. Winzer, X. Q. Wang, W. Lee, C. A. White, and E. C. Burrows, “Electronic predistortion and fiber nonlinearity,” IEEE Photon. Technol. Lett. 18(17), 1804–1806 (2006). 7. E. Ip, “Nonlinear compensation using backpropagation for polarization-multiplexed transmission,” J. Lightw. Technol. 28(6), 939–951 (2010). 8. G. P. Agrawal, Nonlinear Fiber Optics (Academic, 2013), 5th ed. 9. E. Temprana, E. Myslivets, L. Liu, V. Ataie, A. Wiberg, B. P. P. Kuo, N. Alic, and S. Radic, “Two-fold transmission reach enhancement enabled by transmitter-side digital backpropagation and optical frequency comb-derived information carriers.” Opt. Express 23(16), 20774–83 (2015). 10. R. Maher, T. Xu, L. Galdino, M. Sato, A. Alvarado, K. Shi, S. J. Savory, B. C. Thomsen, R. I. Killey, and P. Bayvel, “Spectrally shaped DP-16QAM super-channel transmission with multi-channel digital back-propagation,” Nat. Sci. Rep. 5, 8214 (2015). 11. D. Lavery, D. Ives, G. Liga, A. Alvarado, S. J. Savory, and P. Bayvel, “The benefit of split nonlinearity compensation for optical fiber communications,” IEEE Photon. Technol. Lett. 28(17), 1803–1806 (2016). 12. D. Rafique and A. D. Ellis, “Impact of signal–ASE four-wave mixing on the e ectiveness of digital back-propagation in 112 Gb/s PM-QPSK systems,” Opt. Express 19(4), 3449–3454 (2011). 13. F. Yaman and G. Li, “Nonlinear impairment compensation for polarization-division multiplexed WDM transmission using digital backward propagation,” IEEE Photon. J. 2(5), 816–832 (2010). 14. G. Gao, X. Chen, and W. Shieh, “Influence of PMD on fiber nonlinearity compensation using digital back propagation,” Opt. Express 20(13), 14406–14418 (2012). 15. K. Goroshko, H. Louchet, and A. Richter, “Fundamental limitations of digital back propagation due to polarization mode dispersion,” in “Proc. of Asia Communications and Photonics Conference (ACP),” (Hong Kong, China, 2015), p. ASu3F.5. 16. G. Liga, T. Xu, A. Alvarado, R. I. Killey, and P. Bayvel, “On the performance of multichannel digital backpropagation in high-capacity long-haul optical transmission,” Opt. Express 22(24), 30053–30062 (2014). 17. G. Liga, C. B. Czegledi, T. Xu, E. Agrell, R. I. Killey, and P. Bayvel, “Ultra-wideband nonlinearity compensation performance in the presence of PMD,” in “Proc. of European Conference on Optical Communication (ECOC),” (Düsseldorf, Germany, 2016), pp. 794–796. 18. N. V. Irukulapati, H. Wymeersch, P. Johannisson, and E. Agrell, “Stochastic digital backpropagation,” IEEE Trans. Commun. 62(11), 3956–3968 (2014). 19. I. Fernandez de Jauregui Ruiz, A. Ghazisaeidi, P. Tran, and G. Charlet, “Impact of polarization mode dispersion on digital nonlinear compensation algorithms in dispersion unmanaged systems,” in “Proc. of Optical Fiber Communication Conference (OFC),” (Anaheim, CA, 2016), p. Th3D.3. 20. I. Fernandez de Jauregui Ruiz, A. Ghazisaeidi, E. Awwad, P. Tran, and G. Charlet, “Polarization e ects in nonlinearity compensated links,” in “Proc. of European Conference on Optical Communication (ECOC),” (Düsseldorf, Germany, 2016), pp. 379–381. 21. K. Goroshko, H. Louchet, and A. Richter, “Overcoming performance limitations of digital back propagation due to polarization mode dispersion,” in “Proc. of International Conference on Transparent Optical Networks (ICTON),” (Trento, Italy, 2016), p. Mo.B1.4. 22. C. B. Czegledi, G. Liga, D. Lavery, M. Karlsson, E. Agrell, S. J. Savory, and P. Bayvel, “Polarization-mode dispersion aware digital backpropagation,” in “Proc. of European Conference on Optical Communication (ECOC),” (Düsseldorf, Germany, 2016), pp. 1091–1093. 23. C. B. Czegledi, G. Liga, D. Lavery, M. Karlsson, E. Agrell, S. J. Savory, and P. Bayvel, “Modified digital backpropagation accounting for polarization-mode dispersion,” to appear in “Proc. of Optical Fiber Communication Conference (OFC),” (Los Angeles, CA, 2017), p. W1G.6. 24. S. J. Savory, “Digital coherent optical receivers: algorithms and subsystems,” IEEE J. Sel. Topics Quantum Electron. 16(5), 1164–1179 (2010). 25. H. Louchet, K. Kuzmin, and A. Richter, “Improved DSP algorithms for coherent 16-QAM transmission,” in “Proc. of European Conference on Optical Communication (ECOC),” (Brussels, Belgium, 2008), p. Tu.1.E.6. 26. M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables (Dover Publications, 1964). 27. R. Bellman, Introduction to Matrix Analysis (McGraw-Hill, 1960). 28. R. Haber and H. Unbehauen, “Structure identification of nonlinear dynamic systems—a survey on input/output approaches,” Automatica 26(4), 651–677 (1990). 29. D. Marcuse, C. R. Manyuk, and P. K. A. Wai, “Application of the Manakov-PMD equation to studies of signal propagation in optical fibers with randomly varying birefringence,” J. Lightw. Technol. 15(9), 1735–1746 (1997). 30. C. B. Czegledi, M. Karlsson, E. Agrell, and P. Johannisson, “Polarization drift channel model for coherent fibre-optic systems,” Nat. Sci. Rep. 6, 21217 (2016). 31. C. H. Prola, J. A. Pereira da Silva, A. O. Dal Forno, R. Passy, J. P. von der Weid, and N. Gisin, “PMD emulators and signal distortion in 2.48-Gb/s IM-DD lightwave systems,” IEEE Photon. Technol. Lett. 9(6), 842–844 (1997). 32. W. Kabsch, “A solution for the best rotation to relate two sets of vectors,” Acta Crystallogr. Sect. A 32(5), 922–923 (1976). 33. A. J. Laub, Matrix Analysis for Scientists & Engineers (Society for Industrial and Applied Mathematics, 2005).


Introduction
The ever-increasing global demand for digital-data tra c has resulted in an accelerated development of the optical networks.One of the key technologies supporting this demand is coherent optical detection, which has reached maturity and is widely deployed.Among the many benefits of this technology, it facilitates the use of polarization multiplexing and the recovery of all degrees of freedom of the optical field, thus improving receiver sensitivity and allowing the use of higher-order modulation formats.Furthermore, it naturally suits the use of digital signal processing (DSP) tools to compensate for channel impairments and hardware imperfections.
Despite the fact that the optical fiber channel is fundamentally nonlinear, historically, the maximum achievable data rates through optical fibers were limited by linear impairments, such as chromatic dispersion (CD) and polarization-mode dispersion (PMD).However, in current systems, these impairments are e ectively mitigated using DSP, and the intensity-dependent Kerr nonlinearity is suggested to be the ultimate obstacle to increase optical fiber transmission rates.Due to the nonlinear e ects, there exists an optimal (e ective) signal-to-noise ratio (SNR) point (where it is recommended to operate), which is a compromise between the additive noise and the nonlinear interference.
Various nonlinear mitigation techniques are currently under investigation, which can be categorized into two groups: optical and digital techniques.Optical solutions such as large e ective area fibers [1] allow higher signal launch powers, thus increasing the available SNR, but transmission is still performed avoiding the nonlinear regime.Other optical solutions include optical phase conjugation using twin waves [2] or mid-span optical phase conjugation [3].The drawback of the former is that it does not utilizes all degrees of freedom of the optical field e ciently, hence the spectral e ciency is reduced.The latter requires a symmetric map of both dispersion and power evolution along the link, thus making it fruitless in deployed systems.Various digital techniques are available in the literature to mitigate fiber nonlinearities, including perturbation-based precompensation [4], Kalman equalization [5], and digital backpropagation (DBP) [6,7], which has proved significantly beneficial in experimental demonstrations.
DBP compensates for the deterministic nonlinear fiber impairments by solving the nonlinear propagation equation using the split-step Fourier (SSF) method [8, Sec.2.4.1] and backpropagating the received optical field with inverted channel parameters.Several variations of DBP have been proposed and demonstrated including transmitter- [9], receiver-side [10], and both transmitterand receiver-side [11] compensation.Having exact knowledge of the fiber parameters, it is believed that the deterministic nonlinear signal-signal interactions are completely removed using DBP and that the performance of a fiber-optical system is limited by the uncompensated stochastic e ects, such as amplified spontaneous emission noise, which leads to signal-noise interactions [12], and PMD, leading to polarization-dependent nonlinear interactions [13][14][15][16][17], which considerably undermine the e ectiveness of DBP.In order to account for the signal-noise interactions, a modified DBP algorithm [18] has been proposed that takes into account the additive noise, getting the performance of the optical fiber channel closer to the fundamental limits.On the other hand, PMD introduces a frequency-dependent delay that accumulates as a random-walk-like process along the fiber length and it is usually compensated for at the receiver after DBP.When applying DBP, the entire reverse propagation is performed with the accumulated PMD over the entire link; therefore the nonlinear compensation is mismatched and its accuracy degrades with the backpropagated distance.However, it has been shown that low-complexity DBP implementations (one or less steps per span) that backpropagate a single channel are marginally a ected by polarization e ects since the nonlinear signal-signal interactions are not entirely removed and dominate the achievable performance [19,20].On the other hand, considering an ideal DBP with su cient number of steps where PMD is an issue, in order to remove its e ects, PMD should be compensated for as it naturally occurs, i.e., in a distributed fashion along the link, rather than doing it once after DBP.It has been shown numerically that compensating for PMD on a per-span basis decreases its impact on DBP significantly [13].However, this approach requires a priori PMD knowledge for every span, which is not known at the receiver.
Recently, a modified DBP algorithm that takes into account PMD was proposed in [21], where an appropriate amount of di erential group delay (DGD) is introduced at the link principal states of polarization (PSP) after each span in the reverse propagation, such that the accumulation of DGD in the forward propagation is reversed.In [22], we recently proposed a modified DBP method that reverses the PMD e ects in the backward propagation by passing the reverse propagated signal through PMD sections.Unlike [21], where the DGD is subtracted along the same PSP every time in the backward propagation, in [22] the PSPs of the backward PMD sections are di erent and aligned using an optimization algorithm such that the Jones matrix modeling the total backward PMD equals the inverse of the Jones matrix modeling the PMD occurring in the forward propagation.
In [23], we proposed a DBP method accounting for PMD, where the backward PMD sections are computed analytically using a first-order linearization approach, which we extend in this work.Compared to [23], herein we detail the analysis of the proposed method and extend the algorithm to wavelength-division-multiplexed (WDM) systems.

Proposed algorithm
The conventional DBP algorithm is modified such that the signal is backpropagated also through PMD sections.These sections are chosen such that, when cascaded, they match the overall PMD accumulated over the entire link, as observed at the receiver but with opposite sign.This method gradually reverses the PMD e ects, mimicking the accumulation of PMD in the forward propagation.The accumulated PMD at the receiver can be obtained from (blind) channel equalizers, such as the constant modulus algorithm (CMA) [24] or the multiple modulus algorithm (MMA) [25].
The inverse of the accumulated PMD at the receiver can be modeled in baseband by a frequency-dependent 2 ⇥ 2 complex-valued Jones matrix J k ( f ), where k denotes the time index.In order to reverse the PMD e ects distributively, we divide J k ( f ) into a product of N PMD + 1 matrices as where In Eq. (1), J k ( f ) is expanded into its Taylor series [26, p. 880] evaluated at f 0 and then approximated to the first order in Eq. (2).The (•) 0 , (•) 00 , . . .operators denote the first, second, . . .derivatives with respect to frequency.In Eq. ( 4), is approximated using the matrix exponential [27, p. 165] defined as which converges for any A 2 C n⇥n , and I is the identity matrix, whereas (•) 1 denotes the inverse operator.Due to the unitary nature of J k ( f ), the inverse can be obtained by the conjugatetranspose operation J 1 k ( f ) = J H k ( f ) for any f and k.Here we introduce the matrix exponential to facilitate the factorization of Eq. (4) in Eq. ( 5).Although higher order terms of ( f f 0 ) are reintroduced in Eq. ( 4) with the wrong coe cient by the matrix exponential, thus introducing another approximation, counterintuitively, this is a better approximation of Eq. (1) than Eq.(2), where the higher order terms are absent.Unfortunately, we were not able to provide an analytic proof of this result, but observed it only numerically.Nevertheless, the approximation in Eq. ( 4) is accurate for small ( f f 0 )J Fig. 1.Schematic of the proposed DBP method, where N DBP is the number of steps used by the DBP algorithm over the entire link and the equalizer is a conventional adaptive channel equalizer such as the CMA or MMA.For brevity, the blocks modeling the amplifiers and attenuation are not shown.
large | f f 0 |, or strong PMD, i.e., large J 0 k ( f 0 ), resulting in a performance penalty.In Eq. ( 5), the exponential matrix is divided into N PMD identical frequency-dependent matrices J s k ( f ).These matrices are inserted equidistantly in the DBP procedure, as illustrated in the top left of Fig. 1, to reverse the PMD e ects in the backward propagation.As a last step, the result should be left-multiplied with J k ( f 0 ) to approximate J k ( f ) as in Eq. ( 5), but as we will see below, we replace J k ( f 0 ) with J c k ( f ) to reduce the e ects of approximations.The approximations involved in deriving Eq. ( 5) hold tightly around f 0 , which is chosen to be in the middle of the signal bandwidth, but the error diverges proportionally to the accumulated DGD as f deviates from f 0 .Therefore, compensating for PMD by applying the approximation Eq. ( 5) instead of the true matrix J k ( f ) leads to a penalty.However, this penalty can be mitigated by replacing J k ( f 0 ) in Eq. ( 5) with a frequency-dependent correction matrix as illustrated in the top right of Fig. 1.Under ideal conditions (no nonlinear e ects), this correction perfectly compensates for the approximations in Eqs. ( 2) and ( 4) and brings back J k ( f ).
The modified DBP algorithm operates in a streaming block-wise fashion processing a block of samples x k at a time and it compensates for PMD using the same Jones matrix J k ( f ) over the entire block x k .Since PMD is a time-varying stochastic process and might not be constant over x k , a residual PMD after DBP might occur due to these temporal variations.This residue, denoted J r k ( f ), is mitigated by the channel equalizer and is periodically fed back to the DBP through a feedback loop to update In this fashion, it is ensured that the processed samples are PMD-free and that the PMD Jones matrix J k ( f ) implemented by the DBP algorithm is periodically updated, tracking the temporal variations of the PMD.
The schematic diagram of the proposed algorithm is shown in Fig. 1.A typical coherent receiver might include several di erent blocks between DBP and the equalizer, such as matched filtering, frequency o set compensation, or resampling, which are skipped here since they are not relevant to this work.The DBP algorithm receives as inputs a block of received samples x k and the PMD residue J r k ( f ), and outputs the processed block of samples y k , which is then further processed by the equalizer and output as z k .Initially, at the first block of samples k = 1, the DBP algorithm operates as in the conventional approach, without compensation for PMD (J 1 ( f ) = I).Thereafter, J r k ( f ) is estimated and fed back by the equalizer, which for k = 1 is the entire PMD of the link since the DBP did not compensate for it.Consequently, PMD will be compensated for  in the DBP routine and J r k ( f ) will consist of only residual PMD.The polarization-dependent nonlinear interactions, and thereby the algorithm's e ciency, are immune to temporal variations of the absolute state of polarization and are sensitive only to the frequency dependence of J k ( f ).Therefore, fluctuations of the state of polarization, which drift much faster than the frequency dependence of J k ( f ), do not have to be fed back by the equalizer, reducing the feedback frequency.
Figure 1 also illustrates the internal structure of the proposed DBP method.The conventional DBP algorithm is confined in the dark gray box, whereas two extra operators are added to account for PMD.The J s k ( f ) operator is nested in the DBP procedure and applied N PMD times, while the J c k ( f ) operator is applied only once at the end.It should be noted that if N PMD is chosen to be equal to the number of spans, the J s k ( f ) operator is applied at the beginning of each span in the backward propagation.In this paper, we chose the Wiener-Hammerstein model [28], but this is not a general requirement.The algorithm can be implemented using the Wiener or the Hammerstein models [28].
As mentioned earlier, J r k ( f ) is obtained from channel equalizers.However, in general, a coherent receiver estimates and cancels PMD using a time-domain complex-valued finite-impulse-response (FIR) filter bank arranged in a butterfly structure.Therefore, in order to calculate the PMD operators J s k ( f ) and J c k ( f ) used by the modified DBP, the time-domain filter coe cients of the equalizer must be converted to the frequency-domain Jones matrix J r k ( f ).Moreover, the derivative J 0 k ( f 0 ) must be calculated numerically in this case from noisy J k ( f ).In the Appendix, we present a method to obtain the discrete-frequency unitary matrix J r k ( f ) from the filter coe cients of the equalizer, and an approach to calculate J 0 k ( f 0 ). Figure 2 illustrates an example of the evolution of the accumulated DGD in the forward propagation and in the backward propagation applying the modified DBP scheme using Eqs.( 6) and (8).As can be seen, di erent frequency components accumulate di erent amounts of DGD in the forward propagation in a random-walk-like fashion.In the backward propagation, we set N PMD = 10, therefore the J s k ( f ) operator is applied at every multiple of 100 km starting from distance 1000 km, whereas J c k ( f ) is applied last, at distance 0 km.The DGD at f 0 , where the first order approximation was performed, decreases after every J s k ( f ) operation with a constant step equal to 1/N PMD of the total DGD at 1000 km.On the other hand, due to the approximations involved in deriving Eq. ( 5), the DGDs at the other four frequencies do not decrease monotonically and can, in some cases, increase.Since the approximations are tighter around f 0 , the DGDs at f 0 ± 25 GHz have a lower residual DGD at 0 km compared to the ones at f 0 ± 50 GHz.This residual DGD is corrected by applying J c k ( f ) at 0 km.As can be seen, this operation has no impact on the DGD at f 0 , since there is no approximation in this case and the accumulated DGD has been successfully removed already at 100 km.The average DGD grows proportionally to the square root of the distance in the forward propagation.We replicated this in our study for the backward direction, i.e., instead of a linear decrease of the argument of the matrix exponential (which quantifies the DGD), we have chosen the decrease rate according to the inverse of the square root.But we observed minor gains compared to the linear decrease and decided not to include it as it involves extra complexity since the J s k ( f ) operators will be di erent w.r.t.distance.Although the evolution of the DGD in the backward propagation does not match its evolution in the forward propagation, as we will see in the Results section, this method provide performance gains over the conventional approach.However, due to this mismatch between the two evolutions, the performance does not match the case without PMD.In the conventional approach, the backward propagation is performed with the DGD frozen to the values accumulated over the entire link, i.e., at 1000 km in Fig. 2.However, we would like to remind the reader that the DGD does not fully quantify PMD, therefore the match between the forward and backward evolution of the DGD cannot be fully regarded as a performance metric.Moreover, the example plotted in Fig. 2 is just one realization of the stochastic process PMD, and for other realizations the forward and backward DGD evolutions will match closely or loosely.
Judging by the step-wise evolution of the DGD in the backward direction, our approach seems to be similar to the one proposed in [21].However, the two methods are di erent.Opposed to [21], where in the backward propagation J s k ( f ) is a diagonal matrix modeling a DGD element applied at the same PSP along the entire link, our approach changes the PSPs with every J s k ( f ) in the backward direction.
In WDM systems, the bandwidth of the signal is large and, as we saw earlier, J s k ( f ) becomes inaccurate for large bandwidths.This inaccuracy can be lessened by, instead of approximating J k ( f ) around one central frequency f 0 , approximating J k ( f ) for each channel individually around their center frequency.The center frequency f 0 of each linearization Eq. ( 6) can be chosen in the middle of each channel's bandwidth.However, this approach introduces discontinuities between the J s k ( f ) operators of di erent channels.For example, if N ch WDM channels with a frequency separation of w are considered, the frequency span (including the guard band) of the ith channel is f 2 [ f i 0 w/2, f i 0 + w/2] for i = 1, . . ., N ch , where f i 0 denotes the central frequency of the ith channel.The discontinuities in J s k ( f ) will emerge at f = f 1 0 + w/2 + iw for i = 0, . . ., N ch 1.These discontinuities a ect the performance and should be removed.To do so, a continuous Js k ( f ) operator can be calculated by first computing the J s k ( f ) operator according to Eq. ( 6) for f 2 Here we chose to apply the linearization Eq. ( 6) once per channel.Nevertheless, this can be applied more than once, but we observed no benefits from doing so in our simulations.Thereafter, we set Js , where in Eqs. ( 9) and ( 10), the recursion is backwards and forwards, respectively.Notice that the correction matrix has to be calculated accordingly in this case by using Js k ( f ) in Eq. ( 8).It should be noted that the linearization of each WDM channel becomes less accurate by making Js k ( f ) continuous in Eqs. ( 9) and (10).However, the continuity of Js k ( f ) is more critical for the performance than having the linearization of each WDM channel intact.Complexity-wise, the computational increase over the conventional DBP is marginal.The calculations of the J s k ( f ), Js k ( f ), J c k ( f ) operators can be performed in parallel to the dataprocessing stream.On the other hand, applying these operators to the data requires a single complex multiplication and can be performed after compensating for the linear e ects in the frequency domain.

Simulation setup
We study through numerical simulations a point-to-point transmission link consisting of an ideal transmitter and coherent receiver, and N s spans of 100 km standard single-mode fiber with: attenuation 0.2 dB/km, CD parameter 17 ps/(nm•km), and nonlinear coe cient 1.2 1/(W•km).Each span is followed by one erbium-doped fiber amplifier with a noise figure of 4.5 dB, compensating for the exact span loss.The transmitted signal consists of N ch polarizationmultiplexed 16-ary quadrature-amplitude modulated channels at 50 Gbaud shaped using a root-raised cosine (RRC) pulse with roll-o factor 0.01 and spaced at 50.1 GHz.The signal was oversampled by a factor of 2 and each channel consists of 2 15 symbols.The signal propagation was simulated by solving the Manakov-PMD equation [29] using the SSF approach with steps of 0.1 km.Static PMD was emulated at every SSF step consisting of a polarization scrambler, which uniformly [30] scatters the state of polarization, and a retardation plate.The DGD introduced by each retardation plate was Gaussian distributed with mean ⌧ p and standard deviation ⌧ p /5 [31]; thus the mean accumulated DGD is ⌧ = p 8N SSF /(3⇡) ⌧ p , where N SSF is the total number of SSF steps.In order to capture the stochastic nature of PMD, 120 fiber realizations were simulated for each set of parameters, except Fig. 4(b) where we simulated 500 realizations to increase the accuracy of the scatter plot.
We consider two receiver DSP setups: i) conventional DBP followed by a linear PMD equalizer, and ii) modified DBP described in the previous section using Eqs.( 6), (8)(9)(10) with N PMD = 10, except in Fig. 4(a) where N PMD is varied.For both setups, we consider the equalizer to operate under perfect PMD knowledge and then we compare this performance with results obtained using the blind MMA equalizer in Fig. 5(b).The length of equalizer's FIR filters was N taps = 31 with two samples per symbol, and the numerical derivative J 0 k ( f 0 ) is averaged over 2/3 of the bandwidth of the signal, i.e., L = 1/(3T f ) (see appendix).DBP is performed with the same SSF step distribution and sampling rate as in the forward propagation for both setups.Full-field DBP is applied, i.e., all channels are backpropagated.DBP is followed by an ideal matched RRC filter applied to the signal, after which the linear-scale SNR is averaged over the two polarizations by comparing the transmitted and received symbols.Namely, the SNR of each polarization is evaluated as the ratio between the variance of the transmitted complex symbols (E[|U| 2 ]) and variance of the noise (E[|U V | 2 ]), where V denotes the received symbols after DSP.Note that this definition of the SNR over one polarization is related to the error vector magnitude (EVM) according to SNR ⇡ 1/EVM 2 .

Results and discussion
In order to study the e ectiveness of the proposed algorithm, we consider di erent PMD scenarios by studying two transmission distances: 1000 (10 ⇥ 100) km and 4000 (40 ⇥ 100) km; two di erent simulated bandwidths: 50 GHz (N ch = 1 channel) and 350 GHz (N ch = 7 channels); and two PMD parameters: 0.04 and 0.1 ps/ p km, corresponding to typical modern optical fibers.These PMD parameters yield an average accumulated DGD of ⌧ = 1.26 ps and ⌧ = 3.16 ps, respectively, for 1000 km, and ⌧ = 2.52 ps and ⌧ = 6.32 ps, respectively, for 4000 km.
Figure 3 shows the achieved performance obtained for the studied scenarios.In Fig. 3(a), the single-channel scenario is considered and, as can be seen, DBP obtains gains of 9.2 and 6.5 dB  compared to CDC for the two transmission distances of 1000 and 4000 km, respectively, when PMD is not considered.This performance degrades in the presence of PMD with parameter 0.04 and 0.1 ps/ p km by 1.2 and 2.9 dB, respectively, for 1000 km, and by 0.7 and 1.9 dB, respectively, for 4000 km.In Fig. 3(b), the seven-channel scenario is studied, where DBP obtains gains of 7.5 and 5.9 dB compared to CDC for the two transmission distances (1000, 4000 km) when PMD is not considered.Same behaviour as above, the performance of DBP degrades in the presence of PMD with parameter 0.04 and 0.1 ps/ p km by 3 and 4.7 dB, respectively, for 1000 km, and by 2.5 and 3.7 dB, respectively, for 4000 km.Without PMD, DBP achieves the highest gains when both the distance and bandwidth are the smallest due to the weaker signal-noise nonlinear interactions.PMD has a stronger impact on the performance as the bandwidth and/or the PMD parameter increases, a phenomenon also observed in [17], where PMD incurred a penalty of 16 dB for a transmission of approximately 1 THz bandwidth.However, it is interesting to notice that the degradation of the DBP performance due to PMD relative to the gain obtained without PMD is constant with respect to the distance.In the single-channel scenario (Fig. 3 reduces the SNR gains of DBP for both distances by approximately 10% and 30% when the PMD parameter is 0.04 and 0.1 ps/ p km, respectively.On the other hand, in the seven-channel scenario (Fig. 3(b)), the gains are reduced by approximately 40% and 60% when the PMD parameter is 0.04 and 0.1 ps/ p km, respectively, irrespective of the distance.The proposed modified DBP algorithm improves the optimum SNR in the single-channel scenario (Fig. 3(a)) by 0.6 and 1.1 dB for 1000 km, and by 0.4 and 0.7 dB for 4000 km compared to the conventional DBP approach when the PMD parameter is 0.04 and 0.1 ps/ p km, respectively.These gains represent 50% and 37% of the SNR reduction due to PMD, regardless of the distance.In the seven-channel scenario (Fig. 3(b)), the modified DBP improves the optimum SNR by 1.1 and 1 dB for 1000 km, and by 0.8 and 0.6 dB for 4000 km, with PMD parameter 0.04 and 0.1 ps/ p km, respectively.Here the gains represent 14% and 12% of the SNR reduction due to PMD.It is likely that the performance of the proposed algorithm decreases for larger bandwidths and higher PMD parameters due to the approximations used to derive Eq. ( 5), which becomes less accurate.
In [22], an average gain of only 0.7 dB (cf.1.1 dB) was obtained in the single-channel configuration for a PMD coe cient of 0.1 ps/ p km and distance 1000 km shown in Fig. 3(a).There the backward PMD sections are chosen quasi-randomly and obtained from an optimization algorithm such that the Jones matrix modeling the total backward PMD equals the inverse of the Jones matrix modeling the PMD occurring in the forward propagation.This leads to large performance fluctuations since the optimization algorithm outputs both sequences of PMD sections that match (almost) exactly the one in the forward propagation, and sequences that are the opposite and perform poorly, lowering the average gain.On the other hand, the performance of the method proposed here does not exhibit these two extremes of poor and ideal, which is an advantage and disadvantage, respectively.It would be desirable to use [22] if one could distinguish between good and bad solutions of the optimization algorithm to maximize performance.
From now on, the presented results are based on the single-channel configuration over 1000 km and PMD parameter 0.1 ps/ p km, except in Fig. 5(a) where the PMD parameter is varied.The achieved SNR at the optimal input power is shown in Fig. 4(a) as a function of N PMD .As can be seen, the performance increases with N PMD , providing a gain of 1.1 dB for N PMD = 10  (same as the number of spans), after which it saturates, providing negligible extra gain.
Figure 4(b) illustrates the scatter plot of the achieved SNR gain by the proposed DBP scheme versus the SNR of the conventional DBP at the optimum input power of 11 and 10 dBm, respectively, for di erent PMD realizations.As can be seen, the gain is positive in the majority of the cases (>90%), and negative gain mainly occurs when the achieved SNR by the conventional DBP is relatively high.From the same figure, it can also be inferred that the worst/best SNR achieved by the proposed method is 2/0.5 dB better than the conventional DBP, thus improving the required SNR margin and lowering the outage probability.
Figure 5(a) shows the average performance of the two schemes as a function of the PMD parameter at the optimal input power for each case.As the PMD parameter increases, the performance of both schemes degrades significantly from 27.6 dB down to 19.6 dB when the PMD coe cient is 1.5 ps/ p km.The proposed DBP method provides a minor gain (0.1 dB) at 0.01 ps/ p km, which increases up to a maximum of 1.2 dB at 0.25 ps/ p km.After this point, the gain declines and becomes 0.1 dB at 1.5 ps/ p km.The performance of DBP in the weak PMD regime is lightly a ected by PMD; therefore the proposed algorithm provides small gains.On the other hand, it is likely that the gain deteriorates at high PMD parameters due to the approximation used to derive Eq. ( 5), which becomes less accurate.However, the proposed DBP method provides gains greater than 0.8 dB over the range of PMD coe cient 0.04-0.4ps/ p km, which covers most of the modern optical fibers.
So far, the presented results were obtained assuming that the equalizer has ideal knowledge of the accumulated PMD.In Fig. 5(b), results obtained using ideal equalization are compared with results based on the blind MMA equalizer and the processing procedure presented in the Appendix.It can be seen that the MMA induces a penalty of about 0.1 dB, which is approximately constant over the entire range of input power and is the same for both schemes.This penalty is likely due to the inability of the equalizer to perfectly estimate the PMD.In general, the equalizer does not always return the true Jones matrix J k ( f ), but a matrix that minimizes the equalizer's cost function.Due to the symmetries of the polarization-multiplexed constellations, the returned matrix by the equalizer might be a ected by phase and polarization ambiguities, therefore the returned matrix is approximately J k ( f ) multiplied with a constant matrix modeling these ambiguities.However, the proposed algorithm is not a ected by these ambiguities since the performance relies only on the dependence of J k ( f ) with frequency.

Conclusions
We studied the achievable performance by ideal full-field DBP in systems with PMD.We saw that the performance of DBP degrades with bandwidth and the strength of PMD.However, interestingly, we observed that the ratio between the achieved gains by DBP considering PMD and the gains obtained without PMD does not change with respect to the transmission distance for all studied PMD parameters and bandwidth.Moreover, in order to improve the performance of DBP in the presence of PMD, we proposed a simple modification of the DBP algorithm to take into account polarization e ects by blindly reversing the PMD e ects in the backward propagation.The algorithm has been proved to work with PMD information obtained from the blind MMA equalizer and provides SNR gains over an ample range of di erent transmission setups.Interesting directions for future research include performance characterization in the presence of a time-varying PMD, as well as a comprehensive study including [21,22] to investigate which of the three methods o ers the best performance for various simulation setups and channel conditions.

Appendix
In this Appendix is presented the methodology of obtaining the PMD residue J r k ( f ) from the FIR filters of the equalizer and a numerical method to calculate the derivative J 0 k ( f 0 ).In the discussion above, the matrices modeling PMD were defined in the continuous-frequency domain.Since DSP operates on finite-length discrete-time or discrete-frequency signals, this analysis is carried in the discrete domain.We distinguish between continuous and discrete functions adopting the notation f (•) and f [•], respectively.
The filter impulse response of the equalizer can be represented by a discrete-time 2 ⇥ 2 complex-valued matrix sampled at time instants nT s , where x and y denote the two degenerate polarizations, T s is the sampling period, and n = 0, . . ., N taps 1, whereas N taps is the length of the FIR filter.Typically, an oversampling of two samples per symbol is employed, i.e., T = 2T s , where T is the symbol duration.The impulse response of the filter is usually two-sided centered at dN taps /2e, where d•e indicates the ceiling function.
The discrete-frequency representation of T eq k [n] can be obtained by taking the element-wise discrete Fourier transform, denoted F {•}, with respect to n which is the discrete representation of J eq k ( f ) sampled with resolution f = 1/(T s N taps ) at frequency instants m f for m = bN taps /2c, . . ., dN taps /2e 1, i.e., J eq k (m f ).Typically, N taps is chosen to be less than 100, resulting in a matrix J eq k [m] with a coarse frequency-domain resolution, i.e., large f .In order to refine this resolution, T eq k [n] can be symmetrically padded with zeros, which increases N taps , before applying Eq. ( 12) such that the desired resolution is obtained.The finer resolution of J eq k [m] also improves the accuracy of the numerical di erentiation, which is discussed below.
However, T eq k [n] may compensate for polarization-dependent losses (PDL), which will be reflected in J eq k [m] through Eq. (12).Considering the unitary nature of PMD, PDL can be removed from J eq k [m] by finding its closest unitary matrix [32] J which is the discrete-frequency representation of J r k ( f ) and can be used in the DBP routine as described in Sec. 2. In Eq. ( 13 (6)(7)(8), the derivative of J k ( f 0 ) is needed.Usually numerical di erentiation is done using the straightforward two-point derivative J 0 k [m] = (J k [m + 1] J k [m])/ f .However, this derivative is very sensitive to noisy signals, but it can be improved by using multipoint di erentiation.In the most general case, the multipoint di erentiation can be approximated as where c l are the filters coe cients.In this paper, we chose c l as so that the biggest weight is given to the samples in the vicinity of J 0 k [m] and then the weight decreases linearly with l.

Fig. 2 .
Fig. 2. The evolution of the accumulated DGD at di erent frequencies versus distance in the forward and backward propagation.The PMD parameter is 0.1 ps/ p km and N PMD = 10.

0+ w/ 2 ]
at the central channel bN ch /2c, where b•c indicates the flooring function.The continuous operator Js k ( f ) of the outer channels is calculated recursively as

Fig. 3 .
Fig. 3. Average SNR versus input power per channel for various setups: ((a)) N ch = 1 and N s = {10, 40}; ((b)) N ch = 7 and N s = {10, 40}.For comparison, the performance of CD compensation (CDC) only and DBP, both without PMD, are shown.The maximum (in dB) of each curve from the plots is summarized in the legend above.

Fig. 4 .
Fig. 4. (a): Average SNR obtained at the optimal input power by the conventional DBP and by the modified DBP scheme with varying number of PMD sections N PMD .(b): Scatter plot of the achieved SNR gain by the proposed DBP modification versus the SNR attained by the conventional DBP obtained over 500 fiber realizations at the optimum input power 11 and 10 dBm, respectively.
and MMA Conv.DBP and ideal eq.Modified DBP and MMA Modified DBP and ideal eq.

Fig. 5 .
Fig. 5. (a): Average SNR and SNR gain versus fiber PMD parameter obtained at the optimum input power for each scenario.Note the y-axis on the right for the gain.(b): Results obtained using the blind MMA equalizer compared to an ideal equalizer for both DBP schemes.
), the columns ofU k [m], V k [m] are the left-, right-singular vectors of J eq k [m], i.e., J eq k [m] = U k [m]⌃ k [m]V H k [m] is the singular-value decomposition [33, p. 35] of J eq k [m].When calculating J s k ( f ) and J c k ( f ) in Eqs.