Evaluation of the computational effort for chromatic dispersion compensation in coherent optical PM-OFDM and PM-QAM systems

Recently, coherent-detection (CoD) polarization-multiplexed (PM) transmission has attracted considerable interest, specifically as a possible solution for next-generation systems transmitting 100 Gb/s per channel and beyond. In this context, enabled by progress in ultra-fast digital signal processing (DSP) electronics, both multilevel phase/amplitude modulation formats (such as QAM) and orthogonal-frequency-division multiplexing (OFDM) formats have been proposed. One specific feature of DSP-supported CoD is the possibility of dealing with fiber chromatic dispersion (CD) electronically, either by post-filtering (PM-QAM) or by appropriately introducing symbol-duration redundancy (PM-OFDM). In both cases, ultra-long-haul fully uncompensated links appear to be possible. In this paper we estimate the computational effort required by CD compensation when using the PM-QAM or PM-OFDM formats. When expressed as the number of operations per received bit, this effort was found to scale logarithmically with link length, bit rate and fiber dispersion, for both classes of systems. We also found that PM-OFDM may have some advantage over PM-QAM, depending mostly on the oversampling needed by the two systems. Asymptotically, for large channel memory and small oversampling, the two systems tend to require the same CD-compensation computational effort. We also show that the effort required to mitigate polarization-related effects can in principle be made small compared to that of CD over long uncompensated links.

© 2009 Optical Society of America

OCIS codes: (060.0060) Fiber optics and optical communications; (060.2330) Fiber optics communications; (060.2360) Fiber optics links and subsystems; (060.4080) Modulation; (060.4510) Optical communications

#101167 $15.00 USD Received 4 Sep 2008; revised 2 Jan 2009; accepted 14 Jan 2009; published 22 Jan 2009 (C) 2009 OSA 2 February 2009 / Vol. 17, No. 3 / OPTICS EXPRESS 1385

References and links

1. R. Noé, “Phase Noise-Tolerant Synchronous QPSK/BPSK Baseband-Type Intradyne Receiver Concept With Feedforward Carrier Recovery,” J. Lightwave Technol. 23, 802-808 (2005).
2. S. Tsukamoto, D. S. Ly-Gagnon, K. Katoh, and K. Kikuchi, “Coherent Demodulation of 40-Gbit/s Polarization-Multiplexed QPSK Signals with 16-GHz Spacing after 200-km Transmission,” in Proc. OFC 2005, PD paper 29, Anaheim (USA), March 6-11, (2005).
3. Y. Han and G. Li, “Coherent optical communication using polarization multiple-input-multiple-output,” Opt. Express 13, 7527-7534 (2005).
4. D. S. Ly-Gagnon, S. Tsukamoto, K. Katoh, and K. Kikuchi, “Coherent Detection of Optical Quadrature Phase-Shift Keying Signals With Carrier Phase Estimation,” J. Lightwave Technol. 24, 12-21 (2006).
5. S. J. Savory et al., “Digital Equalisation of 40 Gbit/s per Wavelength Transmission over 2480km of Standard Fibre without Optical Dispersion Compensation,” in Proc. ECOC 2006, paper Th2.5.5, Cannes (FR), Sept. 24-28, (2006).
6. C. R. S. Fludger, T. Duthel, T. Wuth, and C. Schulien, “Uncompensated Transmission of 86 Gbit/s Polarization Multiplexed RZ-QPSK over 100km of NDSF Employing Coherent Equalisation,” in Proc. ECOC 2006, PD paper Th4.3.3, Cannes (FR), Sept. 24-28, (2006).
7. K. Roberts, “Electronic Dispersion Compensation Beyond 10 Gb/s,” in Proc. of IEEE LEOS Summer Topical Meetings, Portland (USA), paper MA2.3, Jul. 23-25, (2007).
8. G. Charlet et al., “12.8 Tbit/s transmission of 160 PDM-QPSK (160x2x40 Gbit/s) channels with coherent detection over 2550 km,” in Proc. ECOC 2007, paper PD 1.6, Berlin (D), Sept. 16-20, (2007).
9. C. Laperle, B. Villeneuve, Z. Zhang, D. McGhan, Han Sun, and M. O'Sullivan, “WDM Performance and PMD Tolerance of a Coherent 40-Gbit/s Dual-Polarization QPSK Transceiver,” J. Lightwave Technol. 26, 168-175 (2008).
10. C. R. S. Fludger et al., “Coherent Equalization and POLMUX-RZ-DQPSK for Robust 100-GE Transmission,” J. Lightwave Technol. 26, 64-72 (2008).
11. J. Renaudier, G. Charlet, M. Salsi, O. B. Pardo, H. Mardoyan, P. Tran, and S. Bigo, “Linear Fiber Impairments Mitigation of 40-Gbit/s Polarization-Multiplexed QPSK by Digital Processing in a Coherent Receiver,” J. Lightwave Technol. 26, 36-42 (2008).
12. W. Shieh, H. Bao, and Y. Yang, “Coherent Optical OFDM: Theory and Design,” Opt. Express 16, 841-859 (2008).
13. W. Shieh and C. Athaudage, “Coherent Optical Orthogonal Frequency Division Multiplexing,” Electron. Lett. 42, 587-589 (2006).
14. W. Shieh, X. Yi, and Y. Tang, “Transmission Experiment of Multi-Gigabit Coherent Optical OFDM Systems over 1000 km SSMF Fibre,” Electron. Lett. 43, 183-184 (2007).
15. S. L. Jansen, I. Morita, N. Takeda, and H. Tanaka, “20-Gb/s OFDM Transmission over 4160-km SSMF Enabled by RF-pilot Tone Phase Noise Compensation,” in Proc. OFC 2007, Anaheim (CA), paper PDP 15, March 25-29, (2007).
16. S. L. Jansen, I. Morita, T. C. W. Schenck, N. Takeda, and H. Tanaka, “Coherent Optical 25.8-Gb/s OFDM Transmission Over 4160-km SSMF,” J. Lightwave Technol. 26, 6-15 (2008).
17. B. Goebel, B. Fesl, L. D. Coelho, and N. Hanik, “On the Effect of FWM in Coherent Optical OFDM Systems,” in Proc. OFC 2008, paper JWA58, San Diego (CA), Feb. 24-28, (2008).
18. A. J. Lowery and J. Armstrong, “Orthogonal Frequency Division Multiplexing for Dispersion Compensation of Long-Haul Optical Systems,” Opt. Express 14, 2079-2084 (2006).
19. A. J. Lowery, “Improving Sensitivity and Spectral Efficiency in Direct-Detection Optical OFDM Systems,” in Proc. OFC 2008, paper OMM4, San Diego (CA), Feb. 24-28, (2008).
20. S. L. Jansen, I. Morita, and H. Tanaka, “16x52.5-Gb/s, 50-GHz Spaced, POLMUX-CO-OFDM Transmission over 4,160 km of SSMF Enabled by MIMO Processing,” in Proc. ECOC 2007, paper PD 1.3, Berlin (D), Sept. 16-20, (2007).
21. S. L. Jansen, I. Morita, and H. Tanaka, “10x121.9-Gb/s PDM-OFDM Transmission with 2-b/s/Hz Spectral Efficiency over 1,000 km of SSMF,” in Proc. OFC 2008, paper PDP2, San Diego (CA), Feb. 24-28, (2008).
22. Yiran Ma, W. Shieh, and Qi Yang, “Bandwidth-Efficient 21.4 Gb/s Coherent Optical 2x2 MIMO OFDM Transmission,” in Proc. OFC 2008, paper JWA59, San Diego (CA), Feb. 24-28, (2008).
23. E. Yamada et al., “Novel No-Guard-Interval PDM CO-OFDM Transmission in 4.1 Tb/s (50x88.8 Gb/s) DWDM Link over 800 km SMF Including 50-GHz Spaced ROADM Nodes,” in Proc. OFC 2008, paper PDP8, San Diego (CA), Feb. 24-28, (2008).
24. W. Shieh, Q. Yang, and Y. Ma, “107 Gb/s coherent optical OFDM transmission over 1000-km SSMF fiber using orthogonal band multiplexing,” Opt. Express 16, 6378-6386 (2008).
25. H. Bulow, B. Franz, A. Klekkamp, and F. Buchali, “40 Gb/s Distortion Mitigation and DSP-Based Equalisation,” in Proc. ECOC 2007, Berlin, Germany, Sept. (2007).
26. A. V. Oppenheim and R. V. Schafer, Digital Signal Processing, Prentice-Hall Inc., Englewood Cliffs (NJ), pp. 110-113, (1975).
27. S. W. Smith, The Scientist and Engineer’s Guide to Digital Signal Processing, California Technical Publishing, Chapter 18, San Diego (CA), (1997).
28. L. Hanzo, M. Munster, B. J. Choi, and T. Keller, OFDM and MC-CDMA, John Wiley and Sons, Hoboken (NJ), (2003).
29. Xingwen Yi, W. Shieh, and Yan Tang, “Phase Estimation for Coherent Optical OFDM,” IEEE Photon. Technol. Lett. 19, 919-921 (2007).
30. J. H. Winters, “Equalization in Coherent Transmission Systems Using a Fractionally Spaced Equalizer,” J. Lightwave Technol. 8, 1487-1491 (1990).
31. M. G. Taylor, “Coherent Detection Method Using DSP for Demodulation of Signal and Subsequent Equalization of Propagation Impairments,” IEEE Photon. Technol. Lett. 16, 674-676 (2004).
32. E. Ip and J. M. Kahn, “Digital Equalization of Chromatic Dispersion and Polarization Mode Dispersion,” J. Lightwave Technol. 25, 2033-2043 (2007).
33. S. J. Savory, “Digital Filters for Coherent Optical Receivers,” Opt. Express 16, 805-817 (2008).
34. G. Bosco, P. Poggiolini, and M. Visintin, “Performance Analysis of MLSE Receivers Based on the Square-Root Metric,” J. Lightwave Technol. 26, 2098-2109 (2008).
35. P. Poggiolini, G. Bosco, and M. Visintin, “MLSE Receivers and Their Applications in Optical Transmission Systems,” in Proc. of The 20th Annual Meeting of the IEEE LEOS, Lake Buena Vista, Florida (U.S.A.), 21-25 Oct., pp. 216-217, (2007).
36. P. Poggiolini, G. Bosco, Y. Benlachtar, S. J. Savory, P. Bayvel, R. I. Killey, and J. Prat, “Long-Haul 10 Gbit/s Linear and Non-Linear IMDD Transmission over Uncompensated Standard Fiber Using a SQRT-Metric MLSE Receiver,” Opt. Express 16, 12919-12936 (2008).
37. Xingwen Yi, W. Shieh, and Yiran Ma, “Phase Noise Effects on High Spectral Efficiency Coherent Optical OFDM Transmission,” J. Lightwave Technol. 26, 1309-1316 (2008).
38. H. C. Bao and W. Shieh, “Transmission of Wavelength-Division-Multiplexed Channels With Coherent Optical OFDM,” IEEE Photon. Technol. Lett. 19, 922-924 (2007).
39. P. Duhamel and H. Hollmann, “Split-radix FFT algorithm,” Electron. Lett. 20, 14-16 (1984).


Introduction
Optical system research is currently targeting transmission at 100 Gb/s per channel and beyond. At such speeds, all detrimental fiber propagation effects are exacerbated, which poses a number of severe challenges for system implementation. Coherent detection (CoD) has been advocated as a way to cope with these challenges.
CoD has only recently become a practical option, thanks to astounding progress in electronic digital signal processing (DSP). CoD enables advanced modulation formats such as polarization-multiplexed (PM) quadrature amplitude modulation (PM-QAM) [1]-[11] and coherent orthogonal frequency division multiplexing (OFDM) [12]-[17]. The latter can be polarization-multiplexed, too [20]-[23]. OFDM is also being very actively investigated with direct detection (DD); see for instance [18], [19]. In this paper, we concentrate on CoD PM-OFDM, because its comparison with CoD PM-QAM is more balanced. However, many of the results obtained here would also apply, with straightforward adaptations, to some DD-OFDM schemes.
With both PM-OFDM and PM-QAM, thanks to electronic DSP, it is in principle possible to completely avoid optical chromatic dispersion (CD) compensation, or management. The possibility of transmitting over completely uncompensated links holds the promise of substantial capex/opex savings and overall network simplification. Such systems could also be adaptive and therefore cope with optical path re-routing.
In this paper we consider optically uncompensated links and we aim at comparing the computational effort required to carry out CD compensation at the Rx using CoD and either PM-QAM or PM-OFDM.
The most commonly used Tx and Rx structures for the two formats are shown in Figs. 1-2. Concerning Tx DSP, PM-OFDM needs two IFFTs and four DACs, whereas PM-QAM may avoid DSP altogether for simple constellations but may need it for larger ones. Concerning the Rxs, the electro-optical analog front-ends are essentially identical for both formats. Fig. 2 shows only the main CD-compensation blocks.
Previous literature on the computational effort of CD compensation includes [25], where the bit rate was 40 Gb/s and only a relatively limited amount of link dispersion (up to 1000 ps/nm) was assumed to be left uncompensated. In that context, PM-OFDM appeared to require significantly less computational effort than PM-QAM. However, a time-domain implementation was assumed for the PM-QAM FIR filters. In this paper, we assume long-haul fully uncompensated links (i.e., large CD), arbitrary bit rate, and FIR filters implemented through Fast Fourier Transforms (FFTs) and inverse FFTs (IFFTs), using the overlap-and-add algorithm [26], [27].
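To make the FFT-based FIR implementation concrete, the following is a minimal NumPy sketch of overlap-add filtering. The function name, the default block length of 256 and the power-of-two FFT sizing are illustrative choices of ours, not details taken from [26], [27]. Each input block is zero-padded, multiplied in the frequency domain by the precomputed filter spectrum, and the overlapping output tails are summed, which reproduces direct linear convolution.

```python
import numpy as np

def overlap_add_fir(x, h, block_len=256):
    """Filter complex samples x with FIR taps h using FFT-based overlap-add."""
    x = np.asarray(x, dtype=complex)
    h = np.asarray(h, dtype=complex)
    n_taps = len(h)
    # FFT size: smallest power of two that holds one block plus the filter tail
    n_fft = 1 << int(np.ceil(np.log2(block_len + n_taps - 1)))
    H = np.fft.fft(h, n_fft)                       # filter spectrum, computed once
    y = np.zeros(len(x) + n_taps - 1, dtype=complex)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        seg_len = len(block) + n_taps - 1          # linear-convolution length of this block
        seg = np.fft.ifft(np.fft.fft(block, n_fft) * H)
        y[start:start + seg_len] += seg[:seg_len]  # overlapping tails add up
    return y

# Usage: compare against direct time-domain convolution
rng = np.random.default_rng(0)
x = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
h = rng.standard_normal(63) + 1j * rng.standard_normal(63)
print(np.max(np.abs(overlap_add_fir(x, h) - np.convolve(x, h))))  # floating-point round-off level
```

Computing H once and reusing it for every block is what makes the per-sample cost grow with the logarithm of the FFT size rather than linearly with the number of taps.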
CoD PM-QAM and PM-OFDM systems can also use DSP to compensate for other fiber impairments, such as PMD and PDL. Moreover, both formats need to keep track of the overall link birefringence, in order to align the received signal polarization frame with the Rx polarization frame. We evaluate this computational effort too, to find out whether it can be considered negligible compared to that of CD alone. A dedicated section of this paper is devoted to this topic, and suitable DSP for such effects is discussed there. We do not address the computational effort required to carry out channel estimation, or 'identification'. We acknowledge that its burden may be substantial, especially with regard to estimating non-stationary polarization-related effects. On the other hand, this problem was investigated in [25], where the computational effort for channel identification was found to be negligible with respect to the real-time compensation of propagation effects. Finally, we address neither the computational effort needed to carry out digital clock recovery and/or sample interpolation/decimation, nor frame recovery. We consider these specific topics outside the scope of this paper.
As in [25], we chose to express the DSP computational effort in terms of arithmetic operations per transmitted information bit ($OP_b$). Another fundamental assumption is that the same DSP technology is used for both classes of systems and, in particular, that the same FFT/IFFT technology is available to both PM-QAM and PM-OFDM. This makes a fair and meaningful comparison possible. Note that, from an implementation viewpoint, additions and multiplications have different complexities. However, with both formats the bulk of the computational effort consists of FFTs/IFFTs, as we shall see later on. Therefore, for both formats the ratio of additions to multiplications is essentially set by the common FFT/IFFT technology. Consequently, as far as the relative computational effort of PM-QAM and PM-OFDM is concerned, distinguishing additions from multiplications would not change the result.
The paper is structured as follows. In Sect. 2 we estimate the number $N'_{SC}$ of subcarriers needed to keep the PM-OFDM cyclic-prefix overhead at an acceptable level. In Sect. 3 we compute the $OP_b$ needed by PM-OFDM, given the $N'_{SC}$ found in the previous section.
In Sect. 4 we discuss the length $N_F$ of the finite-impulse-response (FIR) filters needed to compensate for CD in PM-QAM. In Sect. 5 we compute the $OP_b$ needed by PM-QAM, based on the $N_F$ found in the previous section.
While estimating $OP_b$ for either PM-OFDM or PM-QAM, we take into account various important implementation aspects such as zero-padding, oversampling and the efficiency of the FIR filter implementation.
In Section 6 we compare the CD-compensation computational effort of PM-QAM and PM-OFDM in certain specific system scenarios. We also derive 'asymptotic' computational effort expressions in the limit of large CD and information-theory-limited values of oversampling. The computational effort required to compensate for or mitigate polarization-related effects, as compared to that of CD, is discussed in depth in Section 7.
Finally, the results are discussed and conclusions are drawn in Section 8. Throughout the paper, the following notation is used:
• M: number of bits per subcarrier and per PM-OFDM symbol, or number of bits per PM-QAM symbol.

Number of Subcarriers for PM-OFDM
In this section we derive the number of OFDM subcarriers $N'_{SC}$ needed to support a given system-design target amount of uncompensated CD.
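In the absence of CD, the bit rate carried by the PM-OFDM channel would be:

$$R_b = M\,N_{SC}\,R_s \tag{1}$$

where $N_{SC}$ is the number of subcarriers and $R_s = T_s^{-1}$ is the OFDM symbol rate, which also equals the subcarrier spacing. However, CD makes the various subcarriers propagate at different group velocities. The absolute value of the group delay difference between the two outermost subcarriers, at frequencies $f_1$ and $f_{N_{SC}}$, is [16]:

$$\Delta\tau_g = \frac{\lambda^2}{c}\,|D|\,L\,\left|f_{N_{SC}} - f_1\right| = \frac{\lambda^2}{c}\,|D|\,L\,(N_{SC}-1)\,R_s \tag{2}$$

Assuming operation at λ = 1550 nm, we have $\lambda^2/c$ = 8.0139 ≈ 8 (the number 8 has dimensions of [(nm²·s)/km]). Henceforth we use the following close approximation of Eq. (2), which replaces $\lambda^2/c$ with 8 and $N_{SC}-1$ with $N_{SC}$:

$$\Delta\tau_g \approx 8\,|D|\,L\,N_{SC}\,R_s$$

Assuming that subcarrier frequencies (and hence $R_s$) are expressed in [THz], and every other quantity follows the units given in the bulleted list at the end of the previous section, $\Delta\tau_g$ conveniently results in [ps].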
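The approximation above can be evaluated with a few lines of Python. This is a minimal sketch: the function names are ours, and the example values (D = 17 ps/(nm·km), a 30-GHz-wide subcarrier comb over 1000 km) are illustrative assumptions rather than the paper's reference parameters. Following the stated units, frequencies are in THz and the result is in ps.

```python
LAMBDA_NM = 1550.0
C_M_PER_S = 299792458.0

def lambda2_over_c(lambda_nm=LAMBDA_NM):
    """lambda^2/c in (nm^2*s)/km: nm^2/(m/s) gives nm^2*s/m, times 1000 m/km."""
    return lambda_nm ** 2 / C_M_PER_S * 1e3

def group_delay_spread_ps(D, L, span_thz, coeff=8.0):
    """Approximate Δτ_g [ps] for D [ps/(nm km)], L [km], spectral span [THz]."""
    return coeff * abs(D) * L * span_thz

print(round(lambda2_over_c(), 4))                 # 8.0139
print(group_delay_spread_ps(17.0, 1000.0, 0.03))  # about 4080 ps
```

The sign of D is irrelevant here, since only the magnitude of the delay spread matters for sizing the cyclic prefix.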
This group delay difference makes the symbols on each subcarrier slip relative to one another, and a cyclic prefix must be added to preserve a suitable common sampling window at the Rx, of duration $T_s$, valid for the OFDM symbols on all subcarriers. This avoids symbol discontinuities and prevents loss of subcarrier orthogonality [28], [12]. The cyclic prefix increases the symbol duration to a new value:

$$T'_s = T_s + \Delta\tau_g \tag{3}$$

Even though the duration of the FFT sampling window at the Rx remains $T_s$, the actual time taken to transmit one symbol is now $T'_s > T_s$, and the actual OFDM symbol rate goes down to:

$$R'_s = \frac{1}{T'_s} = \frac{1}{T_s + \Delta\tau_g} \tag{4}$$

Note that, due to the slow-down of the subcarrier symbol rate, the spectral occupancy of each subcarrier decreases. However, from the viewpoint of the Rx FFT, whose input array of signal samples still spans a time window $T_s$, the subcarriers are at minimum spacing when they are spaced $R_s$, and trying to pull them closer would cause loss of orthogonality. Therefore, even though $R_s$ is no longer the symbol rate, it still remains the minimum frequency separation between adjacent OFDM subcarriers.
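The role of the cyclic prefix can be checked numerically. The sketch below is a generic baseband illustration, not a model of the optical link: a short random FIR channel stands in for CD-induced memory (our assumption), and one OFDM symbol is sent with a prefix of `cp_len` samples. When the prefix is at least as long as the channel memory, the samples in the FFT window form a circular convolution, so each subcarrier is only scaled by the channel response at its frequency and one complex tap per subcarrier undoes it.

```python
import numpy as np

def ofdm_symbol_through_channel(symbols, channel, cp_len):
    """Send one OFDM symbol with a cyclic prefix through an FIR channel and
    return the Rx FFT output computed over the T_s-long sampling window."""
    n = len(symbols)
    tx = np.fft.ifft(symbols)                      # one OFDM symbol, n time samples
    tx_cp = np.concatenate([tx[n - cp_len:], tx])  # prepend the cyclic prefix
    rx = np.convolve(tx_cp, channel)               # linear channel with memory
    window = rx[cp_len:cp_len + n]                 # discard prefix, keep n samples
    return np.fft.fft(window)

rng = np.random.default_rng(1)
n_sc = 64
qpsk = (rng.choice([-1, 1], n_sc) + 1j * rng.choice([-1, 1], n_sc)) / np.sqrt(2)
channel = rng.standard_normal(9) + 1j * rng.standard_normal(9)  # memory: 8 samples
H = np.fft.fft(channel, n_sc)                      # per-subcarrier channel response

rx_ok = ofdm_symbol_through_channel(qpsk, channel, cp_len=8)
rx_short = ofdm_symbol_through_channel(qpsk, channel, cp_len=2)
print(np.allclose(rx_ok / H, qpsk))     # True: one-tap equalization suffices
print(np.allclose(rx_short / H, qpsk))  # False: prefix shorter than the memory
```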
Since the symbol rate has gone down, the total bit rate carried by the OFDM channel also drops to a lower value:

$$R'_b = M\,N_{SC}\,R'_s < R_b \tag{5}$$

This is unacceptable, because the nominal OFDM channel total bit rate must remain $R_b$. To restore it, more subcarriers must be added, i.e., $N_{SC}$ must be increased to a greater $N'_{SC}$. Unfortunately, increasing the number of subcarriers in turn increases $\Delta\tau_g$, which requires a longer cyclic prefix and eventually an even greater $N'_{SC}$, and so on. The problem must therefore be solved self-consistently. First, we remark that restoring the bit rate $R_b$ with $N'_{SC}$ subcarriers requires:

$$R'_s = \frac{R_b}{M\,N'_{SC}} = \frac{N_{SC}}{N'_{SC}}\,R_s \tag{6}$$

Then, we equate the rightmost-hand side of Eq. (4) to the right-hand side of Eq. (6). Rearranging the resulting equation, we obtain an expression for $N'_{SC}$ which contains $\Delta\tau_g$. We then use the approximated form of Eq. (2), evaluated over the $N'_{SC}$ subcarriers actually transmitted, to eliminate $\Delta\tau_g$. Remembering that $T_s = R_s^{-1}$, we obtain the following intermediate result:

$$N'_{SC} = N_{SC}\left(1 + 8\,|D|\,L\,N'_{SC}\,R_s^2\right) \quad\Longrightarrow\quad N'_{SC} = \frac{N_{SC}}{1 - 8\,|D|\,L\,N_{SC}\,R_s^2}$$

Substituting $R_s = R_b/(M\,N_{SC})$ from Eq. (1), we then get:

$$N'_{SC} = \frac{N_{SC}}{1 - \dfrac{8\,|D|\,L\,R_b^2}{M^2\,N_{SC}}} \tag{7}$$

where $N'_{SC}$ is the increased number of subcarriers needed to support the cyclic prefix while keeping $R_b$ constant. Note that in Eq. (7) $N_{SC}$ is essentially an initial guess of the needed number of subcarriers. Once a value for $N_{SC}$ has been chosen, Eq. (7) tells us how many subcarriers $N'_{SC}$ are actually needed to cope with dispersion. However, the dependence on an arbitrary initial guess makes Eq. (7) somewhat unsatisfactory. It would be desirable to eliminate $N_{SC}$ and directly find the actually needed number of subcarriers $N'_{SC}$.
It turns out that this is not possible, because the problem has one degree of freedom which cannot be eliminated. However, this degree of freedom can be attributed to a more meaningful quantity than the arbitrary guess $N_{SC}$.
We therefore define the CD-induced overhead as:

$$k = \frac{N'_{SC}}{N_{SC}} \tag{8}$$

Note that $k \ge 1$. This quantity is crucial because the cyclic prefix has two detrimental effects on the system, both of which are directly expressed in terms of $k$.
One effect is the loss of bandwidth efficiency. We use the symbol $\rho_B$ for the bandwidth efficiency in the absence of cyclic prefix. We have shown that, when using the cyclic prefix, more subcarriers are needed to transmit the same bit rate. Since the spacing among subcarriers must remain the same, the cyclic prefix lowers $\rho_B$ to a new value $\rho'_B$, given by:

$$\rho'_B = \frac{N_{SC}}{N'_{SC}}\,\rho_B = \frac{\rho_B}{k} \tag{9}$$

The other detrimental effect of the cyclic prefix is on system sensitivity. Transmitting more subcarriers requires more power, because the power per subcarrier cannot be lowered without worsening both the per-subcarrier and the global bit error rate (BER). In other words, power is wasted on cyclic-prefix transmission. The resulting optical signal-to-noise-ratio (OSNR) penalty is simply:

$$\Delta\mathrm{OSNR}_{dB} = 10\,\log_{10} k \tag{10}$$

Therefore $k$ is a fundamental system design parameter: by fixing $k$, we set a priori a limit to both the loss of bandwidth efficiency and the system OSNR penalty.
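The two effects of the overhead described above, $\rho'_B = \rho_B/k$ and $\Delta\mathrm{OSNR}_{dB} = 10\log_{10}k$, fit in a tiny helper. The function name is our own; at $k = 2$ it returns half the original efficiency and a penalty of about 3 dB.

```python
import math

def cp_overhead_effects(k, rho_b=1.0):
    """Return (rho_B', OSNR penalty in dB) for a CD-induced overhead k >= 1."""
    if k < 1.0:
        raise ValueError("the overhead k must satisfy k >= 1")
    rho_b_prime = rho_b / k              # more subcarriers at unchanged spacing
    penalty_db = 10.0 * math.log10(k)    # extra power for the extra subcarriers
    return rho_b_prime, penalty_db

print(cp_overhead_effects(2.0))          # (0.5, ~3.01)
```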
Based on (8), simple substitutions allow us to rewrite Eq. (7) as:

$$N'_{SC} = 8\,|D|\,L\,\frac{R_b^2}{M^2}\,\frac{k^2}{k-1} \tag{11}$$

This important equation clearly shows that there is no unique solution for $N'_{SC}$. Rather, there are many possible solutions, depending on the overhead $k$ that we are willing to accept. Note that if we try to minimize the overhead, i.e., make $k$ close to 1, the number of needed subcarriers $N'_{SC}$ diverges. This means that CD has a definite and unavoidable impact on OFDM systems, because some overhead must be accepted for the system to be feasible. Fig. 3 shows a plot of $N'_{SC}$ vs. the OSNR penalty $\Delta\mathrm{OSNR}_{dB}$ defined in (10). The plot is drawn using a set of 'reference system parameters', summarized in Table 1. The plot shows that, to keep the OSNR penalty below 1 and 0.5 dB, about 1900 and 3200 subcarriers are needed at 3000 km, respectively. Typically, for FFT implementation efficiency, these numbers would be rounded up to the next power of two, i.e., 2048 and 4096, respectively. The rounding is almost exact for 0.5 dB penalty at 1000 and 2000 km, where 1024 and 2048 subcarriers would suffice, respectively.

Table 1. Reference system parameters.
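The sketch below evaluates Eq. (11) in the compact form $N'_{SC} = \mu\,k^2/(k-1)$, with $\mu = 8|D|L R_b^2/M^2$, and rounds up to the next power of two. We plug in µ ≈ 310, the channel-memory value quoted in Section 4 for the Table 1 parameters at 3000 km; the helper names are hypothetical. The results land near the 1900 and 3200 subcarriers read from Fig. 3.

```python
import math

def subcarriers_needed(mu, osnr_penalty_db):
    """N'_SC = mu * k^2 / (k - 1), with k recovered from the OSNR penalty in dB."""
    k = 10.0 ** (osnr_penalty_db / 10.0)
    return mu * k * k / (k - 1.0)

def next_power_of_two(n):
    """Smallest power of two >= n (FFT-friendly size)."""
    return 1 << math.ceil(math.log2(n))

for penalty_db in (1.0, 0.5):
    n_sc = subcarriers_needed(310.0, penalty_db)
    print(penalty_db, round(n_sc), next_power_of_two(n_sc))
# about 1898 -> 2048 for 1 dB, and about 3198 -> 4096 for 0.5 dB
```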
Incidentally, Eq. (11) has its minimum for k = 2, i.e., for a 3 dB OSNR penalty and a 50% reduction of the bandwidth efficiency. The plots in Fig. 3 confirm this. However, the OSNR penalty at this minimum is too large for k = 2 to be a solution of practical interest.
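Writing Eq. (11) as $N'_{SC} = 8|D|L\,(R_b^2/M^2)\cdot k^2/(k-1)$, the location of the minimum follows from differentiating the $k$-dependent factor:

$$\frac{d}{dk}\left(\frac{k^2}{k-1}\right) = \frac{2k(k-1) - k^2}{(k-1)^2} = \frac{k(k-2)}{(k-1)^2}$$

For $k > 1$ this vanishes only at $k = 2$, where $k^2/(k-1) = 4$. The smallest possible number of subcarriers is therefore four times the channel-memory factor $8|D|L R_b^2/M^2$, reached at the impractical cost of a 3 dB OSNR penalty.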
These calculations neglect some factors that could increase the number of necessary subcarriers, such as the need for 'pilot tones' ([28], Chapter 14). Pilot tones are used to perform phase estimation and to compensate for the CD-induced phase delay of each subcarrier [29]. They may also help estimate the correction needed to compensate for polarization-related effects. However, over long uncompensated links, the cyclic prefix is by far the leading factor in determining $N'_{SC}$, while pilot tones could be at most a few percent of the total number of subcarriers. Therefore we disregard the increase in $N'_{SC}$ due to pilot tones.
Finally, OFDM sub-banding has been proposed [21], [24] to help decrease the cyclic-prefix overhead. The concept is to divide the transmitted signal into $K_{sub}$ OFDM sub-bands within the same Tx channel, and then to demodulate each sub-band as a separate OFDM signal. This way, each sub-band occupies a smaller bandwidth, and the group delay difference between the extreme subcarriers of a sub-band can be made much smaller than that between the extreme subcarriers of the overall Tx channel (ideally, $K_{sub}$ times smaller). In addition, the speed of DACs and ADCs can be reduced likewise, since each sub-band carries approximately a fraction $K_{sub}^{-1}$ of the payload. This is a very interesting concept, but it also has drawbacks: the Tx and Rx must use $4K_{sub}$ DACs and ADCs (though slower), and must perform $2K_{sub}$ IFFTs and FFTs. Moreover, both the Tx and Rx must use perfectly synchronous RF oscillators and RF mixers to perform sub-band up-conversion and down-conversion. Other modulation/demodulation techniques are also possible, but added complexity is always present in other forms.
In this paper, we elect to restrict the scope of the investigation to the case of a straightforward single-band PM-OFDM signal, leaving more elaborate architectures for future investigation.

Operations per bit for OFDM
The OFDM Tx uses an inverse FFT (IFFT) to create the modulating signals. Since we assume polarization multiplexing, two IFFTs are needed. The minimum order of the Tx IFFT coincides with the number $N'_{SC}$ of necessary subcarriers. A higher-order IFFT can be used to increase the number of time samples per OFDM symbol that the DACs use to create the modulating waveforms. This simplifies the removal of spectral aliases and brings the analog electrical waveforms driving the electro-optical modulators closer to ideal. The increase in IFFT order can be obtained through zero-padding, i.e., by assigning zero-amplitude coefficients to 'ghost subcarriers', which may reside on either side of the payload subcarrier spectrum. We assume that the Tx IFFT is of order $n_{Tx} N'_{SC}$, where the 'oversampling factor' $n_{Tx} \ge 1$ accounts for zero-padding. Note that zero-padding requires faster DACs, and DAC speed is one of the most critical aspects of OFDM implementation. Therefore a key design goal is to keep $n_{Tx}$ as small as possible.
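A minimal NumPy sketch of Tx zero-padding follows, under our own assumption about bin layout: the payload subcarriers (listed from lowest to highest frequency) are centred on DC, and the remaining bins of an IFFT of order $n_{Tx}$ times the number of subcarriers, which sit at the spectrum edges (in the middle of the array in FFT bin order), are set to zero. The output has $n_{Tx}$ times more time samples per OFDM symbol, while an ideal FFT recovers the payload exactly. Function names are hypothetical.

```python
import numpy as np

def ofdm_tx_zero_padded(payload, n_tx):
    """IFFT of order n_tx*N: N payload subcarriers centred on DC, zero-amplitude
    'ghost' subcarriers filling the remaining bins at the spectrum edges."""
    n = len(payload)
    n_fft = int(round(n_tx * n))
    if n_fft < n:
        raise ValueError("oversampling factor n_tx must be >= 1")
    half = n // 2
    bins = np.zeros(n_fft, dtype=complex)
    bins[:n - half] = payload[half:]        # DC and positive frequencies
    bins[n_fft - half:] = payload[:half]    # negative frequencies (wrap around)
    return np.fft.ifft(bins)

def payload_from_samples(samples, n):
    """Inverse operation at an ideal Rx: FFT, then pick the N payload bins."""
    spectrum = np.fft.fft(samples)
    half = n // 2
    return np.concatenate([spectrum[len(samples) - half:], spectrum[:n - half]])

payload = np.exp(1j * np.pi / 4 * (2 * np.arange(16) + 1))   # QPSK-like test symbols
samples = ofdm_tx_zero_padded(payload, n_tx=2)
print(len(samples))                                          # 32 samples for 16 subcarriers
print(np.allclose(payload_from_samples(samples, 16), payload))  # True
```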
The minimum order for the Rx FFT is again $N'_{SC}$, as for the Tx IFFT. Oversampling can be applied at the Rx as well: the FFT would then process more than $N'_{SC}$ samples. This provides some spectral margin against aliasing, specifically to protect the subcarriers at the channel band edges. In the calculations that follow, the FFT is assumed to be performed over $n_{Rx} N'_{SC}$ samples, where $n_{Rx} \ge 1$ accounts for oversampling. Interestingly, the oversampling factors at the Tx and Rx, $n_{Tx}$ and $n_{Rx}$, are independent of each other, and can be separately optimized according to the individual Tx and Rx constraints.
Note that we use the symbol $n_{Rx}$ both in the context of PM-QAM (see Section 4) and of PM-OFDM. In both cases it indicates the oversampling carried out at the Rx, though the specific definition is somewhat different. Since the two contexts of use will always be clearly defined, we keep the same notation.
Oversampling requires faster ADCs. Even though somewhat less critical from a technological viewpoint than DACs, ADCs are challenging too. Here as well, the design goal is to try and keep oversampling to a minimum.
Keeping in mind the above caveats regarding $n_{Tx}$ and $n_{Rx}$, the number of arithmetic operations required to demodulate a single bit of the payload, or $OP_b$, is given by:

$$OP_b = \frac{OP_{s,Tx} + OP_{s,Rx}}{M\,N'_{SC}} \tag{12}$$

where $OP_{s,Tx}$ and $OP_{s,Rx}$ are the total numbers of operations needed to process a full OFDM symbol at the Tx and Rx, respectively, and $M N'_{SC}$ is the number of payload bits carried by one PM-OFDM symbol. Such processing requires computing two IFFTs and two FFTs (one per polarization) over $n_{Tx} N'_{SC}$ and $n_{Rx} N'_{SC}$ complex numbers, respectively.
We assume that the available FFT technology is such that an FFT or IFFT performed over an array of $N$ complex numbers requires a number of operations OP given by:

$$OP = q\,N\log_2 N \tag{13}$$

Note that the asymptotic complexity of the split-radix algorithm [39] is such that $q \approx 4$. The well-known Cooley-Tukey algorithm has a slightly larger operation count, but behaves essentially similarly. In [25], $q = 5$ was conservatively assumed. However, though important from a system design viewpoint, the actual value of $q$ becomes largely irrelevant in a relative comparison of PM-OFDM with PM-QAM, provided that the same FFT technology is available to both systems. Implementation details may also deeply affect the on-chip performance of a given FFT algorithm but, by the same reasoning, they do not affect a comparison between the two formats, as long as both use the same technology.
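For power-of-two $N$, the classic split-radix count is $4N\log_2 N - 6N + 8$ real additions and multiplications. The sketch below, with function names of our own, compares it with the model $q\,N\log_2 N$ and shows the effective $q$ approaching 4 from below as $N$ grows (about 3.02, 3.40, 3.50 and 3.70 for the sizes in the loop). This is why $q \approx 4$ is quoted as the asymptotic value.

```python
import math

def split_radix_ops(n):
    """Real additions + multiplications of a split-radix FFT, n a power of two >= 2."""
    return 4 * n * math.log2(n) - 6 * n + 8

def effective_q(n):
    """q such that split_radix_ops(n) == q * n * log2(n)."""
    return split_radix_ops(n) / (n * math.log2(n))

for n in (64, 1024, 4096, 2 ** 20):
    print(n, round(effective_q(n), 3))
```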
Taking Eq. (13) into account, at the Tx we have:

$$OP_{s,Tx} = 2\,q\,n_{Tx} N'_{SC}\,\log_2\!\left(n_{Tx} N'_{SC}\right) \tag{14}$$

$OP_{s,Rx}$ is obtained by replacing the subscript 'Tx' with 'Rx'. The factor 2 on the right-hand side is due to the fact that two separate IFFTs are needed, one per polarization. Using Eqs. (11), (12) and (14), the following result is found:

$$OP_b = \frac{2q}{M}\left[n_{Tx}\log_2\!\left(n_{Tx} N'_{SC}\right) + n_{Rx}\log_2\!\left(n_{Rx} N'_{SC}\right)\right] \tag{15}$$

with $N'_{SC}$ given by Eq. (11). This estimate of operations per bit is not yet complete, because CD also phase-shifts the symbols on each subcarrier by a different phase factor. Such phase factor can be estimated using pilot tones [29]. Irrespective of how the estimation is done, the Rx FFT output must be multiplied by a complex correction factor, which costs 6 operations per complex multiplication, per polarization. The total is then 12 operations per subcarrier.
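Taking the above into account, the total number of OFDM operations per bit becomes:

$$OP_b = \frac{2q}{M}\left[n_{Tx}\log_2\!\left(n_{Tx} N'_{SC}\right) + n_{Rx}\log_2\!\left(n_{Rx} N'_{SC}\right)\right] + \frac{12}{M} \tag{16}$$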
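A sketch of the PM-OFDM count described above: two Tx IFFTs of order $n_{Tx}N'_{SC}$, two Rx FFTs of order $n_{Rx}N'_{SC}$ and 12 operations per subcarrier for phase correction, all divided by the $M N'_{SC}$ bits of one PM-OFDM symbol. Function and argument names are ours; $q = 4$ is the split-radix value mentioned above, and $N'_{SC}$ is passed in directly (for example, already rounded to a power of two).

```python
import math

def ofdm_ops_per_bit(n_sc, M, n_tx=1.0, n_rx=1.0, q=4.0):
    """Operations per payload bit for PM-OFDM, FFT cost modelled as q*N*log2(N)."""
    ops_tx = 2 * q * n_tx * n_sc * math.log2(n_tx * n_sc)  # two Tx IFFTs
    ops_rx = 2 * q * n_rx * n_sc * math.log2(n_rx * n_sc)  # two Rx FFTs
    ops_phase = 12 * n_sc       # one complex multiply per subcarrier and polarization
    return (ops_tx + ops_rx + ops_phase) / (M * n_sc)

print(ofdm_ops_per_bit(2048, M=4))             # 47.0 with n_Tx = n_Rx = 1
print(ofdm_ops_per_bit(2048, M=4, n_rx=1.25))  # Rx oversampling raises the count
```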

Number of FIR filter taps for PM-QAM
The use of FIR filters to implement fractionally-spaced equalizers (FSEs) to compensate for CD in CoD systems was proposed by Winters back in 1990 [30]. More recently, Taylor reframed the concept in the context of modern DSP [31]. Specifically, CoD PM-QAM needs two complex-coefficient FIR filters (Fig. 2) to compensate for CD (see for instance [32], [33]). The duration $\tau_F$ of the FIR filter impulse response depends on the channel memory, which we will call µ. We express both $\tau_F$ and µ in number of symbol intervals. They are different quantities by definition but, if we constrain the FIR filters to exactly compensate for the channel memory induced by CD, well-known results of signal and filtering theory give $\tau_F \approx \mu$.
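The CD that these FIR filters must undo is commonly modelled as an all-pass filter with quadratic spectral phase. The sketch below uses $\exp(-j\pi\lambda^2 D L f^2/c)$ for the fiber and its complex conjugate for the equalizer. The sign convention, the function name and the example values (28 GBaud at 2 samples per symbol, 100 km of fiber with D = 17 ps/(nm·km)) are our assumptions, since conventions differ between references. Applied in the frequency domain, the conjugate response restores the transmitted samples exactly.

```python
import numpy as np

C = 299792458.0  # speed of light [m/s]

def cd_response(n, fs_hz, D_ps_nm_km, L_km, lambda_nm=1550.0):
    """Fiber CD as an all-pass filter on an n-point FFT grid sampled at fs_hz."""
    f = np.fft.fftfreq(n, d=1.0 / fs_hz)        # baseband frequencies [Hz]
    D = D_ps_nm_km * 1e-6                       # ps/(nm*km) -> s/m^2
    lam = lambda_nm * 1e-9                      # nm -> m
    phase = np.pi * lam ** 2 * D * (L_km * 1e3) * f ** 2 / C
    return np.exp(-1j * phase)

n, fs = 4096, 56e9                              # 2 samples/symbol at 28 GBaud (assumed)
H = cd_response(n, fs, D_ps_nm_km=17.0, L_km=100.0)
rng = np.random.default_rng(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = np.fft.ifft(np.fft.fft(x) * H)              # dispersed signal
x_hat = np.fft.ifft(np.fft.fft(y) * np.conj(H)) # CD-compensated signal
print(np.allclose(x_hat, x), np.allclose(y, x)) # True False
```

The dispersed impulse spreads over roughly $(\lambda^2/c)|D|L$ times the signal bandwidth, which is the channel memory a time-domain FIR implementation of the conjugate response would have to span.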
The channel memory µ depends both on the accumulated dispersion $D \cdot L$ and on the symbol rate $R_s$. It also depends on the actual Tx pulse spectral/temporal shape: the smoother the pulse, the smaller µ and consequently $\tau_F$. Therefore, determining the actual length of the FIR filter is not simple. In fact, it should be done 'a posteriori', based on a penalty constraint: one should gradually reduce the length of the FIR filter until a certain target penalty is incurred. However, different FIR impulse-response synthesis/optimization techniques could be used, which may lead to different results. Also, different Tx pulse shapes intrinsically generate different values of µ and therefore require different FIR lengths.
An estimate of the FIR filter length, Eq. (17), was reported in [32]. The assumptions were: target penalty 2 dB, NRZ pulses obtained by passing square pulses through a 5th-order Bessel filter with bandwidth $0.8\,R_s$. Assuming operation at λ = 1550 nm, Eq. (17) becomes $\tau_F \approx 8.5\,|D|\,L\,R_s^2$ (the number 8.5 has dimensions of [(nm²·s)/km]). Given the many factors that affect the FIR filter length, we decided to compare Eq. (17) with a formula obtained in a quite different way, best-fitted on results from uncompensated systems using an MLSE receiver to mitigate dispersion. The MLSE receiver can be used to estimate the channel memory µ simply by adding memory to the Viterbi processor until the system-required OSNR closely approaches its asymptotic minimum vs. processor memory. This processor memory (in number of symbols) gives an accurate estimate of µ.
In various MLSE systems, both in simulations and experiments [34]-[36], we found that the following law fits the needed processor memory vs. CD well, for smooth NRZ pulses:

$$\mu \approx \frac{\lambda^2}{c}\,|D|\,L\,R_s^2 \tag{18}$$

If we again assume operation at λ = 1550 nm, then $\mu \approx 8\,|D|\,L\,R_s^2$. Since, as mentioned, the FIR filter impulse response duration should ideally match the channel memory, Eq. (18) also gives an estimate of $\tau_F$ for an FIR filter capable of compensating for the given amount of CD.
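A sketch of the 1550-nm form of Eq. (18), $\mu \approx 8|D|L R_s^2$, alongside the 8.5 coefficient quoted for Eq. (17). With D in ps/(nm·km), L in km and $R_s$ in Tbaud (consistent with the THz convention used in Section 2), the result is a pure number of symbols. The example values (D = 17 ps/(nm·km), $R_s$ = 27.75 GBaud, roughly a 111-Gb/s PM-QPSK signal) are our assumptions, not the paper's Table 1.

```python
def channel_memory(D, L, R_s_tbaud, coeff=8.0):
    """mu ~ coeff*|D|*L*R_s^2 symbols; D [ps/(nm km)], L [km], R_s [Tbaud]."""
    return coeff * abs(D) * L * R_s_tbaud ** 2

R_s = 0.02775                                    # 27.75 GBaud, assumed example
mu_18 = channel_memory(17.0, 3000.0, R_s)        # Eq. (18), MLSE-based fit
tau_17 = channel_memory(17.0, 3000.0, R_s, 8.5)  # Eq. (17) coefficient
print(round(mu_18), round(tau_17), round(tau_17 / mu_18 - 1, 4))  # 314 334 0.0625
```

The ratio depends only on the two coefficients, which is why the about 6% gap between the two estimates holds for any link length or symbol rate.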
Eqs. (17) and (18) differ by about 6%. Given the completely different approaches, this agreement is remarkable and supports the general validity of both estimates. The slightly higher value of (17) can be explained by the relatively mild Tx pulse smoothing. More drastic Tx filtering than applied in [32] is possible, as done for instance in [36], and would likely lead to some reduction of $\tau_F$. In the following, we therefore take Eq. (18) as the estimate of µ, and hence of $\tau_F$.
The actual number of taps of the FIR filter needed to compensate for the channel memory would then ideally be $N_F = \tau_F \cdot n_{Rx}$, where $n_{Rx}$ is the number of samples per symbol taken by the Rx analog-to-digital converter (ADC).
The parameter $n_{Rx}$ is critical for system design, and how low it can practically be made is currently being debated. The value $n_{Rx} = 2$ guarantees good performance, whereas $n_{Rx} = 1$ is the theoretical lower limit. Intermediate values, such as $n_{Rx} = 1.5$, are also possible. The lower $n_{Rx}$, the lower the FIR computational effort. However, operating at or close to $n_{Rx} = 1$ may cause large penalties due to aliasing and other problems [32]. The actual penalty also depends on the pulse shape and the resulting spectral occupancy.
Note also that, irrespective of the value of $n_{Rx}$, a digital clock-recovery circuit is mandatory, possibly using an interpolation/decimation stage, that eventually provides a single sample per symbol for the decision. As stated in the introduction, we consider this topic outside the scope of the paper. On the other hand, we point out that CD compensation can and should occur before clock recovery, also because the clock-recovery circuit would not lock onto a highly dispersed signal. Current system prototypes, such as Nortel's PM-QPSK at 43 Gb/s, follow this scheme. As a result, the computational effort of the FIR filters for CD compensation is correctly estimated using $n_{Rx}$ as the number of samples per symbol.
This issue is therefore complex. For now, keeping in mind all the caveats regarding $n_{Rx}$, we conclude that the number of FIR filter taps is estimated to be:

$$N_F \approx \mu \cdot n_{Rx} \qquad (19)$$

If the reference system parameters of Table 1 are substituted into (18), we find that the predicted channel memory at 3000 km is very large: about 310 symbols. Assuming a conservative $n_{Rx} = 2$, Eq. (19) then yields $N_F \approx 620$.
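The tap count of Eq. (19) for the oversampling values discussed above can be tabulated with a few lines. This is only an illustration: rounding up to a whole number of taps is our choice, since the text works with the unrounded product µ · n_Rx.

```python
import math


def fir_taps(mu_symbols, n_rx):
    """Eq. (19): N_F ~ mu * n_Rx, rounded up to a whole number of taps (our choice)."""
    return math.ceil(mu_symbols * n_rx)


# Channel memory of about 310 symbols at 3000 km, as quoted for the reference system.
for n_rx in (1.0, 1.5, 2.0):
    print(f"n_Rx = {n_rx}: N_F = {fir_taps(310, n_rx)}")
```

For n_Rx = 2 this reproduces N_F = 620; the case-study value n_Rx = 1.5 gives 465 taps.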
Comparing Eq. (19) with (11), we remark that the factor $\mu = 8\,|D|\,L\,R_b^2/M^2$ appears in the latter, too. This could be expected, since it is clearly 'channel memory' that drives the need for the cyclic prefix in PM-OFDM as well. Even though µ in (18) was defined and estimated in the context of PM-QAM, we use it in both the PM-QAM and PM-OFDM formulae to simplify the equations and ease comparisons. For instance, Eq. (11) can be rewritten in compact form in terms of µ (Eq. (20)). As a consequence, the formula expressing the number of $OP_b$ for PM-OFDM can be likewise simplified, yielding Eq. (21). Note, however, that µ is expressed in number of symbols assuming single-carrier transmission, not OFDM transmission. It is proportional to the channel memory, but it is not directly related to the OFDM symbol duration. Therefore, when used in the PM-OFDM context, µ should only be viewed as a conventional 'channel memory parameter', or simply as a convenient shorthand for the expression $8\,|D|\,L\,R_b^2/M^2$.

Operations per bit for PM-QAM
FIR filters for CD compensation can be implemented in the 'time domain' (TD), but their computational effort then scales quite unfavorably, proportionally to the number of required filter taps $N_F$ [27]. A straightforward count of the operations per bit of the two FIR filters shown in Fig. 2, assuming TD, leads to Eq. (22). Comparing Eq. (22) with Eq. (21), which gives the computational effort for PM-OFDM, it is immediately seen that Eq. (22) scales as µ, whereas Eq. (21) scales as $\log_2(\mu)$. The difference is striking, and it shows that the TD approach would give PM-OFDM a far lower computational effort than PM-QAM. However, FIR filters can also be implemented in the frequency domain, through the use of FFTs and IFFTs. Special algorithms are needed, because straightforward FFT/IFFT processing would perform a 'circular convolution' rather than a standard (linear) convolution. Perhaps the best-known such algorithm is called 'overlap-and-add' [26], [27], and we assume its use in the following.
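A direct count makes the linear TD scaling concrete. The sketch below is not Eq. (22) itself but a first-principles count under stated assumptions: 6 real operations per complex multiplication and 2 per complex sum (the same unit costs used in the overlap-and-add count below), $N_F = \mu\, n_{Rx}$ taps per filter, and $n_{Rx}/M$ output samples per bit for each polarization.

```python
# Assumed unit costs: a complex product is 4 real multiplications + 2 real
# additions; a complex sum is 2 real additions.
REAL_OPS_PER_COMPLEX_MUL = 6
REAL_OPS_PER_COMPLEX_ADD = 2


def td_ops_per_bit(mu, n_rx, bits_per_symbol):
    """Real operations per bit for two time-domain FIR filters (one per pol.)."""
    n_taps = mu * n_rx  # Eq. (19)
    # Direct-form FIR: n_taps products and (n_taps - 1) sums per output sample.
    ops_per_output_sample = (n_taps * REAL_OPS_PER_COMPLEX_MUL
                             + (n_taps - 1) * REAL_OPS_PER_COMPLEX_ADD)
    samples_per_bit = n_rx / bits_per_symbol  # per polarization
    return 2 * samples_per_bit * ops_per_output_sample


for mu in (10, 100, 310):
    print(mu, td_ops_per_bit(mu, n_rx=2, bits_per_symbol=4))
```

With n_Rx = 2 and M = 4 the count is 158, 1598 and 4958 operations per bit for µ = 10, 100 and 310: roughly tenfold growth for a tenfold increase in channel memory.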
An important point to stress is that FIR implementation through the overlap-and-add algorithm requires 'block processing'. In other words, a certain number P of samples of the incoming signal must be accumulated and then processed together as a block. The minimum value for P is $N_F$, but choosing $P > N_F$ improves the algorithm efficiency. We will come back to the choice of P later. The filter output at each iteration consists of a block of P samples of the dispersion-compensated signal. Note that the duration of an iteration is $P \cdot T_s / n_{Rx}$.
Internally, because of the way the overlap-and-add algorithm operates, the block length is increased from P to $P + N_F$. As a result, at every iteration the algorithm needs to perform:
1. one FFT over $P + N_F$ samples;
2. $P + N_F$ complex multiplications of the FFT output by the channel transfer function;
3. one IFFT over $P + N_F$ samples;
4. $N_F$ complex sums.
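The four steps above can be sketched in NumPy as a generic overlap-and-add FIR. This is an illustration, not the authors' implementation: the taps `h` are arbitrary complex coefficients standing in for a CD-compensating impulse response, and the FFT size $P + N_F$ follows the text ($P + N_F - 1$ would already be enough to avoid circular wrap-around). The usage lines check the result against NumPy's direct linear convolution.

```python
import numpy as np


def overlap_add(x, h, block_len):
    """Linear convolution of x with FIR taps h using FFT block processing.

    Each iteration takes P = block_len new input samples, zero-pads them to
    P + N_F, multiplies by the filter transfer function in the frequency
    domain, and adds the tail that overlaps the next block's output.
    """
    n_taps = len(h)
    n_fft = block_len + n_taps             # P + N_F, as in the text
    H = np.fft.fft(h, n_fft)               # transfer function, computed once
    y = np.zeros(len(x) + n_fft, dtype=complex)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        seg = np.fft.ifft(np.fft.fft(block, n_fft) * H)  # FFT, multiply, IFFT
        y[start:start + n_fft] += seg      # overlapping tails are summed
    return y[:len(x) + n_taps - 1]


rng = np.random.default_rng(0)
x = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
h = rng.standard_normal(31) + 1j * rng.standard_normal(31)
y = overlap_add(x, h, block_len=4 * len(h))   # P = p * N_F with p = 4
print(np.allclose(y, np.convolve(x, h)))      # True
```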
Keeping Eq. (13) in mind, the numbers of operations per iteration corresponding to the above list are, respectively:
1. $q \cdot (P + N_F) \cdot \log_2(P + N_F)$;
2. $6 \cdot (P + N_F)$;
3. $q \cdot (P + N_F) \cdot \log_2(P + N_F)$;
4. $2N_F$.

Two FIR filters must be implemented, one per polarization. The total number of operations per iteration $OP_i$ is then:

$$OP_i = 2\left[\,2q\,(P+N_F)\log_2(P+N_F) + 6\,(P+N_F) + 2N_F\,\right]$$

Thus, $OP_i$ operations are needed to process a whole block of P input samples over both polarizations. Each of these dual-polarization blocks carries $P/n_{Rx}$ symbols, corresponding to $(M \cdot P)/n_{Rx}$ bits. As a result, the number of operations per bit is:

$$OP_b = \frac{n_{Rx}}{M\,P}\,OP_i$$

For convenience, we now express the overlap-and-add block length as $P = pN_F$. Using this relation, then (19) to relate $N_F$ to the system parameters and CD, and finally resorting to the channel memory µ to simplify the notation, we arrive at Eq. (26).
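The count above can be evaluated numerically. The sketch follows the four-item list literally (including $P + N_F$ as the FFT size) with $P = pN_F$ and $N_F = \mu\, n_{Rx}$; it is an illustration and does not attempt to reproduce the exact algebraic form of Eq. (26). The sample inputs (q = 5, n_Rx = 1.5, M = 4, µ ≈ 310) are the case-study values of the next section, and the function name is ours.

```python
import math


def fd_ops_per_bit(mu, n_rx, bits_per_symbol, p, q):
    """Operations per bit of two overlap-and-add FIR filters (one per pol.)."""
    n_f = mu * n_rx                    # Eq. (19): FIR taps
    block_len = p * n_f                # P: new input samples per iteration
    n = block_len + n_f                # internal FFT/IFFT length P + N_F
    fft_ops = q * n * math.log2(n)     # cost of one FFT (or one IFFT)
    ops_per_filter = 2 * fft_ops + 6 * n + 2 * n_f  # items 1-4 of the list
    op_i = 2 * ops_per_filter          # two polarizations
    bits_per_block = bits_per_symbol * block_len / n_rx
    return op_i / bits_per_block


for p in (1.0, 2.0, 7.5, 20.0):
    print(p, round(fd_ops_per_bit(310, 1.5, 4, p, q=5), 1))
```

With these inputs the count is about 107 operations per bit at p = 7.5 and about 158 at p = 1, consistent with the remark below that small values of p make the algorithm less efficient.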

Comparison of PM-OFDM and PM-QAM computational effort for CD compensation
In this section we compare PM-OFDM with PM-QAM, first by trying to establish a realistic case study. Later, we attempt a comparison in more idealized 'asymptotic' conditions, to identify fundamental trends. For the first comparison, we operate in the context of the reference system set-up of Table 1. Specifically, M = 4, so the PM-QAM format we actually assume is PM-QPSK.
The parameters of Table 1 do not cover all of the quantities that appear in (21) and (26), so we need to make further assumptions, which we try to keep realistic and reasonable. Nonetheless, the following comparison cannot be viewed as a general result but, rather, as a specific case study.
We assumed the following.
• The PM-OFDM cyclic-prefix overhead parameter $k$ is 1.122, corresponding to an OSNR penalty of 0.5 dB and a 12.2% bandwidth-efficiency loss.
• The PM-OFDM Tx and Rx oversampling factors are $n_{Tx} = n_{Rx} = 1.25$.
• The PM-QPSK Rx oversampling factor is $n_{Rx} = 1.5$, which appears to be reachable without incurring substantial penalties [32].
• The PM-QPSK overlap-and-add block-length parameter $p$ is set to 7.5, i.e., the total length of the sample block processed by the Rx FFT/IFFT is $P + N_F = (1 + p)N_F = 8.5 \cdot N_F$. This choice of $p$ makes the FFT block length identical for PM-OFDM and PM-QPSK, which makes the comparison fairer because the FFTs and IFFTs used by the two formats then have the same complexity. Note that $p$ could be set as low as 1, allowing for a much smaller block length for PM-QPSK. However, the total number of operations per bit would actually increase, since decreasing $p$ makes the overlap-and-add algorithm less efficient.
• For both systems we choose $q = 5$, as suggested in [25]. As mentioned before, $q$ has little impact on the comparison between PM-QPSK and PM-OFDM, but it must be set to obtain an approximate estimate of the actual total number of operations per bit.

Fig. 4 shows plots of $OP_{b,\text{PM-OFDM}}$ and $OP_{b,\text{PM-QPSK}}$ obtained using the above parameters. The top plot assumes $R_b$ = 111 Gb/s, with $L$ ranging between 10 and 3000 km. Both systems use an identical FFT/IFFT block length of 4000 samples at 3000 km, decreasing linearly to zero as $L$ goes to zero. For the sake of clarity, we refrained from rounding the FFT/IFFT block lengths up to powers of 2.
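Two of the numbers above can be cross-checked with a short sketch, under explicitly assumed inputs. The 0.5 dB penalty is taken as 10·log10(k), i.e., the power spent on the prefix (our assumption). The PM-QPSK FFT/IFFT length is evaluated as $(1 + p)\, n_{Rx}\, \mu$ with D = 17 ps/(nm·km), an assumed standard-fiber value, and with units D in ps/(nm·km), L in km, R_b in Gb/s (also assumed). Under these assumptions the length at 3000 km comes out close to 4000 samples and scales linearly with L.

```python
import math


def channel_memory(length_km, d_ps_nm_km=17.0, rb_gbps=111.0, bits_per_symbol=4):
    """mu ~ 8 |D| L R_b^2 / M^2 at 1550 nm (assumed units, see lead-in)."""
    rs_per_ps = rb_gbps * 1e-3 / bits_per_symbol  # symbols per ps
    return 8.0 * abs(d_ps_nm_km) * length_km * rs_per_ps ** 2


def qpsk_fft_len(length_km, n_rx=1.5, p=7.5):
    """Overlap-and-add FFT/IFFT length P + N_F = (1 + p) * N_F, N_F = mu * n_Rx."""
    return (1 + p) * n_rx * channel_memory(length_km)


k = 1.122  # cyclic-prefix overhead parameter
print(f"bandwidth-efficiency loss: {100 * (k - 1):.1f} %")
print(f"OSNR penalty if taken as 10*log10(k): {10 * math.log10(k):.2f} dB")
for length_km in (10, 1000, 3000):
    print(length_km, round(qpsk_fft_len(length_km)))
```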
As mentioned, PM-QPSK could use a much smaller block length: at 3000 km, assuming the minimum allowed value of $p = 1$, it could be as small as 1000 samples; however, the number of operations per bit would then increase.

The added overhead for the mitigation of polarization-related effects is computed as follows. First, PMD increases the channel memory, so µ should be corrected to take this into account:

$$\mu \approx \mu_{CD} + \mu_{DGD} \qquad (30)$$

where $\mu_{CD}$ is given by (18) and $\mu_{DGD} = \tau_{DGD} \cdot R_b/M$. In turn, $\tau_{DGD}$ is the maximum differential group delay (DGD) that the system is designed to handle, measured in [ps], which sets the channel memory due to PMD. We remind the reader that µ is measured in number of PM-QAM symbols but is also used as a channel memory parameter for PM-OFDM (see end of Sect. 4). In long uncompensated systems, however, $\mu_{DGD}$ can be expected to be small compared to $\mu_{CD}$, so that in many practical cases $\mu \approx \mu_{CD}$.

Considering PM-OFDM, apart from the above amendment to µ, the number and size of the FFT/IFFTs remain unchanged. After executing the FFTs at the Rx, instead of simply multiplying the two output arrays by the CD phase-shift complex correction factor, suitable processing needs to be performed, called either 'butterfly' or multiple-input-multiple-output (MIMO) processing [22], which entails twice the multiplications and two further element-by-element complex sums (i.e., four real sums per frequency bin, see Fig. 5). Once this is factored in, the number of operations per bit for PM-OFDM changes only in the last term of Eq. (21), whose coefficient increases from 12 to 28.

Regarding PM-QAM, a very similar reasoning can in principle be used. At the Rx, FIR filtering must take place in a 'butterfly' fashion as well [10], again also called MIMO [3]. Apart from the correction to µ, the computational effort for the FFTs and IFFTs needed for frequency-domain FIR filtering remains the same as for CD alone.
We now have four transfer functions rather than two, plus two more complex sums. Considering the overall butterfly filtering, we easily obtain a modified version of Eq. (26), in which the only change is in the last two terms in the square brackets, which increase from $3 + 4/M$ to $7 + 8/M$. However, for practical reasons, most proposed PM-QAM Rx implementations keep the CD-compensation stage separate from the polarization-effects mitigation stage. In that case, a separate butterfly filtering stage follows two CD-compensating overlap-and-add FIR filters (Fig. 6). We call this solution 'dual-stage' compensation, as opposed to the 'single-stage' butterfly processing of Fig. 5, and in the following we also evaluate the impact of this alternative filtering architecture.
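Per frequency bin, the butterfly replaces one complex product per polarization with a 2×2 matrix-vector product. The NumPy sketch below shows this and counts real operations per bin with the same unit costs used earlier (6 per complex multiplication, 2 per complex sum), reproducing the coefficient change from 12 to 28. The frequency-flat polarization rotation used to exercise it is a hypothetical test case, not a fiber model.

```python
import numpy as np

REAL_OPS_PER_COMPLEX_MUL = 6  # 4 real multiplications + 2 real additions
REAL_OPS_PER_COMPLEX_ADD = 2


def butterfly(x1, x2, h11, h12, h21, h22):
    """2x2 MIMO ('butterfly') filtering of the two polarization spectra."""
    y1 = h11 * x1 + h12 * x2
    y2 = h21 * x1 + h22 * x2
    return y1, y2


def real_ops_per_bin(butterfly_mode):
    """Real operations per frequency bin, both polarizations together."""
    if butterfly_mode:  # four complex products and two complex sums
        return 4 * REAL_OPS_PER_COMPLEX_MUL + 2 * REAL_OPS_PER_COMPLEX_ADD
    return 2 * REAL_OPS_PER_COMPLEX_MUL  # CD only: one product per polarization


# Hypothetical test: undo a frequency-flat polarization rotation by theta.
rng = np.random.default_rng(1)
s1 = rng.standard_normal(8) + 1j * rng.standard_normal(8)
s2 = rng.standard_normal(8) + 1j * rng.standard_normal(8)
theta = 0.3
c, s = np.cos(theta), np.sin(theta)
x1, x2 = c * s1 - s * s2, s * s1 + c * s2        # rotated spectra
y1, y2 = butterfly(x1, x2, c, s, -s, c)          # inverse rotation
print(np.allclose(y1, s1), np.allclose(y2, s2))  # True True
print(real_ops_per_bin(False), real_ops_per_bin(True))  # 12 28
```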
Stage duplication causes excess computational effort whose exact amount depends on various design aspects. We assume here that all CD is compensated for in the two FIR filters that make up the first stage, so that the second-stage butterfly filter takes care only of polarization-related effects. Since the PMD-induced channel memory is relatively small ($\mu_{DGD}$ is a few symbols), most proposed PM-QPSK/PM-QAM implementations realize the second stage in the time domain, and we assume the same here. The resulting number of operations per bit for each of the four FIR filters needed for the second stage is then $n_{Rx}^2\,(8\mu_{DGD} - 2)/M$. The overall second-stage computational effort, including the two final complex sums, is then:

$$\frac{4\,n_{Rx}^2\,(8\mu_{DGD} - 2)}{M} + \frac{4\,n_{Rx}}{M}$$

Incidentally, this formula yields results in agreement with those presented in [25].
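The second-stage expression above does not depend on link length, which is why its relative weight shrinks as CD grows. The sketch evaluates it for $n_{Rx} = 1.5$, $M = 4$ and $\mu_{DGD} \approx 3.33$ symbols (the value corresponding to the 120 ps DGD assumed in the case study); the function name is ours.

```python
def second_stage_ops_per_bit(n_rx, mu_dgd, bits_per_symbol):
    """Operations per bit of a time-domain 2x2 butterfly (second stage).

    Four FIR filters at n_Rx^2 (8 mu_DGD - 2) / M each, plus the two final
    complex sums, 4 n_Rx / M, as in the expression above.
    """
    per_filter = n_rx ** 2 * (8 * mu_dgd - 2) / bits_per_symbol
    return 4 * per_filter + 4 * n_rx / bits_per_symbol


ops = second_stage_ops_per_bit(1.5, 3.33, 4)
print(f"{ops:.2f} operations per bit, for any link length")  # 56.94
```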
We can now redraw the top plot of Fig. 4 as Fig. 7, using the same system parameters but this time taking the polarization-related computational effort into account. We assumed a conservatively large maximum DGD of $\tau_{DGD}$ = 120 ps, corresponding to a channel memory increase of 3.3 PM-QPSK symbols at 111 Gb/s. Fig. 7 shows that $OP_b$ increases only slightly if polarization effects are compensated for together with CD (single-stage processing), both for PM-OFDM and PM-QPSK. However, if a separate, time-domain second-stage butterfly filter is used for PM-QPSK (dual-stage processing), then the overhead vs. single-stage is quite substantial: about 62% and 54% at 1000 and 3000 km, respectively.

Fig. 7 gives some significant indications. First, the single-stage results show that the overhead due to polarization-related effects is substantial at short distances but almost negligible as soon as the amount of CD becomes significant. At 3000 km, it amounts to just a few percent, both for PM-OFDM and for PM-QPSK. The comparison between single-stage and dual-stage processing for PM-QPSK instead shows a very substantial overhead for the latter. This is striking, given that the second stage has to deal with a very small memory (5 taps per filter, given that $n_{Rx} = 1.5$) compared to the 310 symbols of channel memory dealt with by the CD-compensating first stage. The very different performance is due to the extreme efficiency of FIR filtering in the frequency domain with the overlap-and-add algorithm, as opposed to the great inefficiency of time-domain FIR filtering.
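The 3.3-symbol and 5-tap figures follow from $\mu_{DGD} = \tau_{DGD} \cdot R_b/M$, as the short check below illustrates. Rounding the tap count up and taking $\mu_{CD} \approx 310$ symbols in Eq. (30) are our assumptions for the illustration.

```python
import math


def mu_dgd(tau_dgd_ps, rb_gbps, bits_per_symbol):
    """DGD-induced channel memory in symbols: tau_DGD * R_b / M."""
    rs_per_ps = rb_gbps * 1e-3 / bits_per_symbol  # Gb/s -> symbols per ps
    return tau_dgd_ps * rs_per_ps


mu_pol = mu_dgd(120.0, 111.0, 4)   # 120 ps DGD, 111 Gb/s PM-QPSK
mu_total = 310.0 + mu_pol          # Eq. (30) with mu_CD ~ 310 symbols
taps = math.ceil(mu_pol * 1.5)     # second-stage taps at n_Rx = 1.5 (rounded up)
print(f"mu_DGD = {mu_pol:.2f}, mu = {mu_total:.1f}, taps = {taps}")
```

The DGD term adds only about 1% to the total channel memory, which is why the single-stage overhead is small over long links.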
However, the PM-QPSK dual-stage result of Fig. 7 should be taken with caution. Although it is accurate as far as the number of operations per bit is concerned, the plot gives no indication of circuit complexity. The time-domain second stage typically needs a small number of taps and, as a result, requires a limited number of electronic elements and little circuit floor space. Therefore, whereas the four lower curves of Fig. 7 represent a fair comparison, since they use the same FFT and IFFT technology and a similarly large data block size, the top curve gives a somewhat pessimistic rendering of the PM-QPSK dual-stage solution.
On the other hand, it is quite clear that the best solution to combined CD and polarization-effect compensation for PM-QPSK is the single-stage structure. A potential problem of the single-stage solution, however, lies in the estimation and, above all, in the update speed of the transfer functions of the four FIR filters making up the butterfly structure: the specifications currently being debated for the polarization rotation angular speed are as high as 100 krad/s, resulting in the need to update the transfer functions hundreds of thousands of times per second. Such transfer functions must have the same length as the block length, i.e., their length may be on the order of several thousand elements. Their ultra-fast updating, as required by the specifications on the speed of change of polarization-related effects in the fiber, may therefore be quite challenging. A solution to the transfer-function update-speed problem, which will not be investigated here, could be to implement both the first and the second stage of the PM-QPSK dual-stage structure in the frequency domain, separately. This way, the transfer-function length of the second stage would typically be limited to a few tens of elements and could be updated more easily.
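To put the update-speed requirement in perspective, one can estimate how far the polarization rotates during a single overlap-and-add iteration of duration $P \cdot T_s/n_{Rx}$. The sketch below assumes the case-study values ($N_F \approx 465$ taps at $n_{Rx} = 1.5$, $p = 7.5$, $R_s$ = 27.75 Gbaud) and the 100 krad/s figure quoted above; it is a back-of-the-envelope illustration, not a tracking algorithm.

```python
def block_duration_s(block_len, n_rx, rs_baud):
    """Duration of one overlap-and-add iteration: P * T_s / n_Rx."""
    return block_len / (n_rx * rs_baud)


# Case-study values (assumed): N_F ~ 465 taps at n_Rx = 1.5, p = 7.5.
block_len = 7.5 * 465            # P = p * N_F input samples per iteration
t_block = block_duration_s(block_len, 1.5, 111e9 / 4)
omega = 1e5                      # 100 krad/s polarization rotation speed
rotation_rad = omega * t_block   # rotation accumulated during one block
print(f"{t_block * 1e9:.1f} ns per block, {rotation_rad * 1e3:.1f} mrad per block")
```

Each block lasts about 84 ns, during which the polarization can rotate by roughly 8 mrad at the quoted speed.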

Conclusion
Our appraisal of the computational complexity involved in dispersion compensation for coherent polarization-multiplexed PM-OFDM and PM-QAM systems showed that the two classes of systems have similar behaviors.
PM-OFDM appears to require a somewhat lower computational effort than PM-QAM, but this relative advantage (about 30% less) is independent of bit rate and system accumulated dispersion. Rather, it depends on the different oversampling required by the two systems, which currently appears to be lower for PM-OFDM. As technology and DSP algorithms evolve, the tolerated oversampling factors could be reduced, somewhat altering the results. But even refinements to such parameters would not change the general picture. All in all, we have shown that the fundamental trends of complexity vs. CD-induced channel memory are asymptotically identical for the two formats.
When polarization-related effects are taken into account, we found that the increase in computational effort with respect to CD compensation alone is small for both PM-OFDM and PM-QAM, if a single-stage butterfly structure is used. With PM-QAM, if a dual-stage structure is used, with the second stage being a separate time-domain butterfly filter devoted to polarization effects only, then substantial computational effort overhead may occur, but the required added circuit complexity may be modest (see discussion in Section 7).
As a whole, we believe this study shows that the computational effort for CD and polarization-effects compensation does not seem to be a decisive discriminating factor between PM-OFDM and PM-QAM. A possible competition between the two systems would probably be decided by other aspects.
One such aspect is that PM-QAM systems with simple constellations may not need DACs at the Tx, which are quite challenging to implement at high speed, whereas DACs are mandatory for all PM-OFDM systems. Also, PM-OFDM seems to be quite susceptible to nonlinear effects, in particular to FWM and nonlinear phase noise [37], [38]. The computational effort due to clock/frame recovery and interpolation, especially for PM-QAM, should also be investigated. Both formats need to track birefringence, PMD and PDL dynamically, but this may be somewhat less difficult for PM-OFDM than for PM-QAM. All of these elements form a quite complex picture, and how the pros and cons balance out will need further investigation.