Blind , fast and SOP independent polarization recovery for square dual polarization – M QAM formats and optical coherent receivers

We present both theoretically and experimentally a novel blind and fast method for estimating the State of Polarization (SOP) of a single carrier channel modulated in square Dual Polarization (DP) MQAM format for optical coherent receivers. The method can be used on system startup, for quick channel reconfiguration, or for burst mode receivers. It consists of converting the received waveform from Jones to Stokes space and looping over an algorithm until a unitary polarization derotation matrix is estimated. The matrix is then used to initialize the center taps of the subsequent classical decision-directed stochastic gradient algorithm (DD-LMS). We present experimental comparisons of the initial Bit Error Rate (BER) and the speed of convergence of this blind Stokes space polarization recovery (PR) technique against the common Constant Modulus Algorithm (CMA). We demonstrate that this technique works on any square DP-MQAM format by presenting experimental results for DP–4QAM, –16QAM and – 64QAM at varying distances and baud rates. We additionally numerically assess the technique for varying differential group delays (DGD) and sampling offsets on 28 Gbaud DP–4QAM format and show fast polarization recovery for instantaneous DGD as high as 90% of symbol duration. We show that the convergence time of this blind PR technique does not depend on the initial SOP as CMA does and allows switching to DD–LMS faster by more than an order of magnitude. For DP–4QAM, it shows a convergence time of 5.9 ns, which is much smaller than the convergence time of recent techniques using modified CMA algorithms for quicker convergence. BER of the first 20 × 10 symbols is always smaller by several factors for DP–16QAM and –64QAM but not always for DP– 4QAM. ©2012 Optical Society of America OCIS codes: (060.0060) Fiber optics and optical communications; (060.1660) Coherent communications; (060.2330) Fiber optics communications; (200.4560) Optical data processing. References and links 1. S. J. Savory, “Digital coherent optical receivers: algorithms and subsystems,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1164–1179 (2010). 2. L. Liu, Z. Tao, W. Yan, S. Oda, T. Hoshida, and J. C. Rasmussen, “Initial tap setup of constant modulus algorithm for polarization de-multiplexing in optical coherent receivers,” in Proc. OFC2009, paper OMT2. 3. A. Vgenis, C. S. Petrou, C. B. Papadias, I. Roudas, and L. Raptis, “Nonsingular constant modulus equalizer for PDM-QPSK coherent optical receivers,” IEEE Photon. Technol. Lett. 22(1), 45–47 (2010). 4. R. Maher, D. S. Millar, S. J. Savory, and B. C. Thomsen, “Widely tunable burst mode digital coherent receiver with fast reconfiguration time for 112Gb/s DP-QPSK WDM networks,” J. Lightwave Technol. (submitted). 5. F. Vacondio, O. Rival, Y. Pointurier, C. Simonneau, L. Lorcy, J.-C. Antona, and S. Bigo, “Coherent receiver enabling data rate adaptive optical packet networks,” in Proc. ECOC2011, paper Mo.2.A.4. 6. J. E. Simsarian, J. Gripp, A. H. Gnauck, G. Raybon, and P. J. Winzer, “Fast-tuning 224-Gb/s intradyne receiver for optical packet networks,” in Proc. OFC02012, paper PDPB5. 7. M. Morsy-Osman, M. Chagnon, Q. Zhuge, X. Xu, M. E. Mousa-Pasandi, Z. A. El-Sahn, and D. V. Plant, “Training symbol based channel estimation for ultrafast polarization demultiplexing in coherent single-carrier transmission systems with M-QAM constellations,” in Proc. ECOC2012, paper Mo.1.A.4. #177532 $15.00 USD Received 5 Oct 2012; revised 19 Nov 2012; accepted 19 Nov 2012; published 29 Nov 2012 (C) 2012 OSA 3 December 2012 / Vol. 20, No. 25 / OPTICS EXPRESS 27847 8. G. P. Agrawal, Lightwave Technology: Telecommunication Systems (Wiley-Interscience, 2005). 9. J. P. Gordon and H. Kogelnik, “PMD fundamentals: polarization mode dispersion in optical fibers,” Proc. Natl. Acad. Sci. U.S.A. 97(9), 4541–4550 (2000). 10. D. Godard, “Self-recovering equalization and carrier tracking in two-dimensional data communication system,” IEEE Trans. Commun. 28(11), 1867–1875 (1980). 11. S. J. Savory and F. P. Payne, “Pulse propagation in fibers with polarization-mode dispersion,” J. Lightwave Technol. 19(3), 350–357 (2001). 12. D. Marcuse, C. R. Menyuk, and P. K. A. Wai, “Application of the Manakov-PMD equation to studies of signal propagation in optical fibers with randomly varying birefringence,” J. Lightwave Technol. 15(9), 1735–1746 (1997). 13. C. D. Poole and J. Nagel, “Polarization Effects in Lightwave Systems” in Optical Fiber Telecommunications, Vol. III A (Academic Press, 1997) Chap. 6. 14. E. Ip and J. M. Kahn, “Digital equalization of chromatic dispersion and polarization mode dispersion,” J. Lightwave Technol. 25(8), 2033–2043 (2007). 15. I. Fatadin, D. Ives, and S. J. Savory, “Blind equalization and carrier phase recovery in a 16-QAM optical coherent system,” J. Lightwave Technol. 27(15), 3042–3049 (2009). 16. I. T. Jolliffe, Principal Component Analysis (Springer-Verlag, 2002). 17. D.-S. Ly-Gagnon, S. Tsukamoto, K. Katoh, and K. Kikuchi, “Coherent detection of optical quadrature phaseshift keying signals with carrier phase estimation,” J. Lightwave Technol. 24(1), 12–21 (2006). 18. A. J. Viterbi and A. M. Viterbi, “Nonlinear estimation of PSK-modulated carrier phase with application to burst digital transmission,” IEEE Trans. Inf. Theory 29(4), 543–551 (1983). 19. O. K. Smith, “Eigenvalues of a symmetric 3x3 matrix,” Commun. ACM 4(4), 168 (1961). 20. P. Johannisson, H. Wymeersch, M. Sjödin, A. S. Tan, E. Agrell, P. A. Andrekson, and M. Karlsson, “Convergence comparison of the CMA and ICA for blind polarization demultiplexing,” J. Opt. Commun. Netw. 3(6), 493–501 (2011). 21. V. N. Rozental, T. F. Portela, D. V. Souto, H. B. Ferreira, and D. A. A. Mello, “Experimental analysis of singularity-avoidance techniques for CMA equalization in DP-QPSK 112-Gb/s optical systems,” Opt. Express 19(19), 18655–18664 (2011). 22. P. J. Winzer, A. H. Gnauck, C. R. Doerr, M. Magarini, and L. L. Buhl, “Spectrally efficient long-haul optical networking using 112-Gb/s polarization-multiplexed 16-QAM,” J. Lightwave Technol. 28(4), 547–556 (2010). 23. X. Zhou, and J. Yu, “200-Gb/s PDM-16QAM generation using a new synthesizing method,” in Proc. ECOC2009, paper 10.3.5. 24. P. Johannisson, H. Wymeersch, M. Sjödin, A. S. Tan, E. Agrell, P. Andrekson, and M. Karlsson, “Convergence comparison of CMA and ICA for blind polarization demultiplexing of QPSK and 16-QAM signals,” in Proc. ECOC2009, paper TH.9.A.3. 25. J. Yang, J.-J. Werner, and G. A. Dumont, “The multimodulus blind equalization and its generalized algorithms,” IEEE J. Sel. Areas Comm. 20(5), 997–1015 (2002). 26. X. Zhou, L. Nelson, R. Isaac, P. Magill, B. Zhu, and D. Peckham, “1200km transmission of 50GHz spaced, 5x504-Gb/s PDM-32-64 hybrid QAM using electrical and optical spectral shaping,” in Proc. OFC2012, paper OM2A.2. 27. M. J. Ready and R. P. Gooch, “Blind equalization based on radius directed adaptation,” IEEE Int. Conf. Acoust. Speech Signal Proc., 3, 1699–1702 (1990).


Introduction
Optical coherent receivers are key to higher transmission rates, where information is imprinted in multiple dimensions of the optical waveform.Polarization is one of those dimensions, where different complex waveforms can be independently imprinted on two orthogonal SOPs of the signal at the transmitter.However, at the receiver side, the received SOP is completely unknown.It is mapped onto a different orthogonal basis than the one used at the transmitter and is varying with time and distance.
To blindly untangle the polarization of the incoming waveform in an optical coherent receiver, the CMA algorithm is often used [1].However, CMA has several disadvantages.First, it suffers from the singularity problem where the algorithm converges to a tap-weight setup that produces the same transmitted signal at both equalizer outputs [2,3].Moreover, the convergence speed of the CMA is strongly dependent on the SOP of the received waveform ,and consequently varies a lot with the choice of the initial FIR tap values [2,4,5,6].This has become a major problem and is recently being addressed to comply with the fast transition requirements of emerging data rate adaptive optical packet networks [5,6], burst mode coherent receivers and agile optical network architectures where the receiver can be dynamically reconfigured to quickly drop a wavelength and switch to another channel [4].
Recently, new blind PR techniques have been proposed to speed up the convergence time required for blind polarization rotation estimation.A method based on CMA using 25 different initial CMA tap values, all processing in parallel, is presented in [4].It chooses the test case that provides the lowest CMA error to initialize the central tap of the equalizer.Using such scheme on DP-4QAM format provides a mean convergence time of around 35 ns, with a worst case convergence time of 280 ns [4].The computational requirement of processing independently and in parallel 25 self-adaptating equalizers is very substantial.In [6], a three-stage CMA enables blind recovery in 200 ns, or 11200 symbols at 56 Gbaud, which is hardly compatible with burst-mode operation [5].Some proposed techniques to reduce the convergence time of CMA use training symbols or preample headers.As an example, in [5], a preample of 30 header symbols is used to estimate to polarization rotation matrix and in [7], a training symbol based algorithm using 20 or 40 symbols is proposed to estimate the Jones channel matrix.Such techniques, although providing quick rotation estimates, not only add overhead to the system but also need synchronization with the preample header for polarization matrix estimation, and hence are not true blind approaches.
In this paper, we present a new method for blindly estimating the SOP of the optical waveform for coherent receivers applicable to any polarization multiplexed square-MQAM format.The process consists of converting the received waveform into the three dimensional Stokes space and looping over an algorithm looking for desired Stokes states on the Poincare sphere until a unitary polarization derotation matrix can be estimated.This matrix is then used to initialize the center taps of the subsequent decision directed-LMS polarization equalizer.The method realizes much shorter convergence time then recent methods using modified version of CMA to improve convergence speed.This paper consists of three parts.In Section 2, we present the mapping of Jones constellations to Stokes constellations for different DP-MQAM formats.We also summarize a well known model for waveform polarization rotation and polarization mode dispersion (PMD) in lightwave systems [8].We will use this model to motivate our approach.In Section 3 we introduce the new blind PR method and present in detail its operating algorithm.Section 4 explains the metric of comparison used to confront the convergence time and initial BER of our Stokes space blind method against that of CMA for 3 different modulation format: DP-4QAM, DP-16QAM and DP-64QAM.The experimental test bed is also presented in this section.Finally, Section 5 presents experimental performance comparison for CMA and the proposed method as well as numerical simulation results assessing the new method over varying PMD.Conclusion in found in Section 6.

DP-MQAM formats in Stokes space and modeling of SOP in lightwave systems
An optical coherent receiver collects simultaneously 4 electrical waveforms that represent the real and imaginary part of an optical waveform mapped into an orthogonal basis formed by the optical coherent front end.Those 4 waveforms of dimension 1 × N can be converted into a single waveform of dimension 2 × N which consists of a concatenation of N Jones vectors as in Eq. ( 1), where A x/y-Re/Im are the 4 sampled signals and N is the number of samples.The right hand side of Eq. ( 1) is a different representation of the combination of the 4 waveforms, where the magnitude, the absolute phase, the relative amplitude and sign on x and ŷ and the relative phase of x with respect to ŷ are all embedded in E n , δ n , θ n and φ n , respectively.
The 3-by-1 Stokes representation of Eq. ( 1) is written as As can be seen in Eq. ( 2), the absolute phase information e iδ is lost in the conversion from Jones to Stokes Space.The new blind Stokes space polarization estimation method we present in this paper will benefit from this absolute phase information loss.The reader can refer to [9] for details concerning conversions from Jones to Stokes space.
Different polarization multiplexed modulation formats can be represented in either the Jones or the Stokes space.Because it exhibits the extra phase information i e δ , Jones space is commonly use to represent constellations.Constellations in Stokes space are rarely shown and we depict here in Fig. 1 such constellations for a) DP-4QAM, b) DP-16QAM and c) DP-64 QAM, all of unitary mean signal power.As can be observed, much fewer states exists in this space; for instance, the 4 2 = 16 states of DP-4QAM map to only 4 states in Stokes space.The red rings in Fig. 1 depict the equivalent modulus used in the cost function of the blind CMA polarization derotation algorithm.For every square QAM formats, the modulus has to increase with QAM order [10].For instance, DP-4QAM, -16QAM and -64QAM formats have moduli of 0.5, 0.66 and ~0.69, respectively.As those moduli apply to both polarizations independently, they are depicted as a single ring of radius double those values.
The Stokes space representations of any square DP-MQAM format share a common feature: the most outer Stokes constellation points are all exactly aligned with the Stokes vectors -A  , and 3 A  at increasingly higher powers of 3(√MQAM-1) 2 /(MQAM-1).Our novel blind Stokes space polarization recovery method uses this theoretical alignment and power increase feature of the most outer Stokes constellation points for estimating the received SOP.As will be detailed in the next section, locating the position of those 4 regions provides an estimate of the unknown unitary rotation imprinted in the received signal.
It is out of the scope of this paper to explain the generation of specific polarization rotation matrices, or "unitary" matrices.For detailed explanations, we refer the reader to the paper from Gordon and Kogelnik [9].However, as our blind polarization recovery technique is strongly based on those rotations, the following provides a brief explanation on one of the many possible ways to rotate a single Stokes vector from one position to another on the Poincare sphere.The simplest way to move Â to B is by applying on Â a rotation matrix that spins the Poincare sphere on an axis given by the cross product of Â and B , ˆˆÂ B p × = , and by an angle given by the inverse cosine of the dot product of Â and B , arcos ˆ( ) . The notation ' Â ' means the unitary normalized Stokes vector of ' A  '.
There are actually an infinite number of possible unitary rotation matrices that rotate Â to B .

Modeling of SOP in lightwave systems
As we know, the birefringence in an optical fiber breaks the degeneracy of the 2 eigenmodes that can propagate in a fiber at the same propagation constant.Birefringence gives 2 eigen modes each having its respective propagation constants β j (ω).Up to the first order, one can describe those 2 modes in the Jones-matrix and Stokes-matrix forms [8,11,12] ( ) ( ) where ( ) T are the Pauli spin matrices σ k and are defined in [8,9].The term Δβ 0 produces a differential phase shift, while the term ωΔβ 1 leads to a temporal delay between the two orthogonal eigenvectors of U, which directly leads to PMD.Said differently, Δβ 0 engenders simple rotations, or "spinning" of all the states, independent of frequency, whereas ωΔβ 1 leads to spinning of the components of the waveform as a function of their frequency, inherently resulting in differential delays.In the frequency domain, polarization dispersion manifests as a frequency dependent state of polarization at the output of the fiber [13].One can note that the eigenvectors of U † σ 1 U are, in Stokes space, ± b , such that for constant birefringence, a SOP aligned on ± b wouldn't change.However, the birefringence vector b  has varying orientations and amplitudes along the fiber length and has to be a function of z.For the zero th order, the magnitude of b  is Δβ 0 and for the first order, its magnitude is Δβ 1 .The solution of Eq. ( 3) can be written in the form where W(z) is a function of solely z.If we apply Eq. ( 5) to Eq. (3), we get the 2 equations 0 ( ) ( ( ) ) ( ) 2 If b  was z-independent, the solution to W(z) would simply be 0 where "exp" is the exponential of the S ω  at z = 0.This easily shows mathematically that PMD is simply a polarization rotation that is frequency dependent [13].If PMD was neglected (Δβ 1 = 0) and constant birefringence was assumed, ( , ) A z t  would simply precess around b •as it propagates, at an angular speed proportional to Δβ 0 .However, as b  is z-dependent, we can assume without any loss of generality that birefringence is piece-wise constant along the fiber, that is, b (z j ≤ z ≤ z j+1 ) = b j and that for every z j ≤ z ≤z j+1 , both Δβ 0j and Δβ 1j vary.The solution of ( , ) , in the form of Eq. ( 5), where the total fiber length is divided into N shorter pieces.None of the matrices in Eq. ( 9) commute, so the order of application is paramount.U c (z,ω) is the composite Jones matrix of the whole fiber link [11] and is also a unitary matrix at each frequency.In SMF fibers, Δβ 1 is normally small .If we neglect PMD, all the T(z j ,ω) matrices become the identity I, ( , ) A z ω  becomes ω-independent and the result of the propagation of | A(z,t)⟩ is simply that its SOP wanders around on the Poincare sphere as it propagates, adding no impairment on the initial waveform o S  .One can observe that for a static link without PMD, the SOP is constant at a fix distance and only very slowly varying for small first order birefringence.
The polarization recovery technique that we propose in this paper allows us to estimate blindly the cumulated polarization rotation U c (z,ω) with the hypothesis that the cumulative PMD after distance z is small.Said differently, it estimates the resulting matrix of Eq. ( 9) with the T's assumed to be almost the identity I.

New blind polarization rotation estimation
The technique we present in this paper allows estimating blindly the polarization rotation of the received signal after the optical front end of an optical coherent receiver.This technique can be used for any square DP-MQAM modulation format.In this section, we will explain this new method of blind polarization rotation recovery.
The very first step is to convert the 4 waveforms sampled by the real-time oscilloscope into a concatenation of Jones vectors as explained in Eq. ( 1).Then, the data is resampled at T/2, where T is the symbol duration.The third step is to apply the inverse chromatic dispersion (CD) [14].At this point we make the assumption that we have a 2-by-time matrix of Jones vectors that are simply rotated and noisy versions of the Jones vectors that were transmitted, that is, rotated and noisy versions of Figs.1(a), 1(b) or 1(c) for DP-4QAM, -16QAM or -64QAM modulation formats, respectively.Mathematically, we assume that we have a series of This blind polarization rotation estimation operates differently than other blind PR techniques like CMA.The mode of operation of our technique is different and independent of the steady-state mode of operation for polarization tracking.On the contrary, a technique like CMA can be used from system startup and be kept for steady state operation for some modulation formats like DP-4QAM [1].The CMA algorithm iteratively adapts its coefficients blindly such that they minimize a phase independent cost function [10].The CMA self-recovering equalizer is more often used as pre-convergence to reduce the effects of channel distortions in polarization before switching to the decision-directed Least-Mean Square (DD-LMS) equalizer which requires an acceptable average of good decisions by the slicer [10,15].
In the technique presented in this paper we also share the same end goal of steady state operation in DD-LMS, but the blind technique implemented to get there is different.The proposed SOP estimation process in Stokes space is not an iteratively adapting process like CMA.The process waits until it obtains enough satisfying SOP states, at which point the rotation matrix U ROT is estimated.Only then adaptive processing of new incoming data starts, adapting in decision directed mode.
Detailed steps for obtaining U ROT summarize as follows: 1. Generate Jones vectors as in Eq. ( 1), resample at T/2, apply matched filter/inverse CD 2. Keep only vectors that have power greater than a power threshold P th , convert those to Stokes vectors, save their power, normalize them to unity and store them in a buffer, forming a 3-by-N matrix that we call S 3,N , where N grows by 1 for each new valid entry.
3. For each new entry to the buffer matrix S  1. U ROT is then estimated in the following way: 5.1.Because all vectors stored in S 3,N have high power, a plane can be fit in a least mean square sense to those three dimensional data points.Figure 3 shows how a plane can be fit to a power discretized noisy stokes constellation.The norm ˆhole p of this plane is estimated as being the eigenvector of the smallest eigenvalue of the covariance matrix C COV of S 3,N , where each vector in S 3,N has to be previously rescaled to its respective power, stored in the power vector in Operation 2. This is (C) 2012 OSA also known as the Principal Component Analysis (PCA), or more precisely as the eigenvalue decomposition of the data covariance matrix [16].From PCA, we know that the i th eigenvalue represents the variance of the projection of all vectors stacked in S Â and an angle given by the mean angular deviation of the 3 clusters with respect to angles nπ/2, n = {0,1,2,3}.To remove the periodicity of nπ/2 in the angle φ of Eq. ( 2) in the 3 clusters, we use the 4th power algorithm [17,18].
5.3.The last estimate is about finding the absolute phase e iδ as in Eq. ( 1) allowing estimating the initial phase before beginning processing in DD-LMS mode.For this part, from the Stokes vectors used in 5.2 we keep only the ones that fit within the last 10 ns before Operation 4. was successfully completed.As laser linewidths are typically less than 2.5 MHz in coherent fiber optic systems [4], we assume that the received signal's phase noise can be considered constant within 10 ns.The initial Jones versions of those Stokes vectors have to be used.As the phase e iδ is present in both the x and ŷ part of every Jones state, the number of phase values δ k used to estimate δ is double the number of Stokes vectors used in 5.2.
Prior to estimating e iδ , we need to suppress the intermediate frequency offset.To do so, we use all the Jones vectors created in Operation 1., even the power discarded ones, from system startup until the end of Operation 4. From this time series, we can estimate the frequency offset by raising each Jones vector to the fourth power and identify the peak frequency using a Fast Fourier transform [4].
Obtaining an estimate of e iδ helps the DD equalizer to start in making good decisions.The polarization derotation estimation matrix U ROT is mathematically represented as Here the order of application of the 2 matrices is of paramount importance.The first matrix applied is the one that rotates ˆhole p to either ± 1 A  , the second matrix rotates the three , where ' k ' is the mean in dimension k.We can show that the total computational complexity of computing C COV and the smallest eigenvalue with its eigenvector is small.Normal eigenvalue decomposition are numerically done iteratively which is a computationally intensive process.However, C COV has the advantageous property of being a real symmetric 3-by-3 matrix.The eigenvalues of such matrices can easily be directly calculated, all independently [19].Moreover, computing the eigenvector once an eigenvalue is known is quite trivial and goes as follows.If  1 v is the desired column eigenvector of the eigenvalue λ 1 of matrix C COV , we can compute From the equation we know that the eigenvector  v v = 1.The 4 most power hungry processes required in finding U rot are 1) computing the Stokes vectors that satisfy the power criterion P th , 2) projecting each new Stokes vector onto all previously stored vector, 3) computing C COV and 4) computing λ 1 and  1 v .Without detailed steps, we can show that the entire computational complexity of all those processes require only (3N 2 + 30N/2 + 46) real multiplications and (N 2 + 16N + 16) additions, where N is the number of Stokes vectors stacked in S 3,N .As N is very well below 100 and only 1 symbol every 6 on average are converted to Stokes, those numbers confirm the low complexity.
We model the data obtained after Operation  power discriminator as green dots and the others as blue dots for a) DP-4QAM, b) DP-16QAM and c) DP-64QAM when the noise level gives a BER of 3.8 × 10 −3 .We can clearly observe that the direction that has the smallest variance would be pointing towards the region where less green dots are found, hence the name ˆhole p .The P th radius is the red sphere.For a total mean signal power of 1, one can find a radius P th that leads to a specific power discrimination ratio, i.e. the total number of Stokes points over the number of Stokes points outside the red sphere.Figure 3 depicts the results for a ratio of 100.Signals with higher Signal to Noise Ratio (SNR) need a smaller P th to maintain a power discrimination ratio, and vice versa.As propagation and consequently amplification adds noise to the signal, P th has to be slightly increased with distance in order to maintain a ratio and consequently a good estimate of ˆhole p .Observing the green "caps" on top of Figs.(a), (b) and (c), we understand that signals with higher SNRs exhibit 4 clusters that are more regrouped, providing a better estimate of ˆhole p .As higher order QAM inherently require higher SNR for the same BER, the blind PR technique presented in this paper works very well for higher order QAM modulation formats, which is not true for common blind adaptive PR techniques like CMA, because their inherent cost function is not designed for high order QAM [20].
A major benefit of this blind Stokes space polarization recovery compared to the iteratively adapting CMA algorithm is the avoidance of the singularity problem of CMA [21], where 2 severely rotated inputs can converge to the same output.As we force the SOP to rotate back to the { 2 A  , 3 A  } plane, the x and ŷ outputs presents independent information.
When the derotation matrix U ROT is estimated, its 4 values as used to initialize the center tap of the Decision Directed-Least Mean Square with built-in Phase Lock Loop (DD-LMS + PLL) equalizer that starts right after U ROT 's estimation.
As mentioned in Eq. ( 9), the proposed algorithm compensates for the "zero th " order PMD, or said differently for the global polarization rotations occurring on the signal during propagation.The assumption that the cumulative first order PMD in U c (z,ω) at the receiver is small depends on the fiber type and more precisely on its PMD Q value.Newer fiber types show very little PMD, with maximum PMD Q of 0.04ps/√km.Even after 6400 km, they exhibit a very small mean DGD of around 3.2 ps.Even at a high baud rate of 28 GBaud, this value represents less than 10% of the symbol duration.Much greater instantaneous DGD can be tolerated for digital optical coherent receivers.As the fiber used on the experimental setup has a maximum PMD Q of 0.04ps/√km, experimental results do not assess the proposed polarization recovery technique over large PMD with respect to symbol duration.Consequently, we will present numerical simulations for which the algorithm is tested over varying DGD and sampling offsets in subsection 5.1, after experimental results, and show the accuracy and speed of the algorithm over much larger DGD.

Comparison metric and experimental test bed
We compare our novel blind polarization recovery algorithm in Stokes space only against the CMA [10] scheme.We reason this not only because CMA is well known and widely used, but also because most other blind algorithms do not converge as well.For example, the radius-directed algorithm [15] relies on decisions based on the amplitude, and it has been found that this algorithm does not converge well in practice [22].A recently proposed cascaded three-modulus blind equalization algorithm has shown worse convergence properties than the CMA [23] and can only be used after a channel estimate has been obtained.The same holds true for decision-directed algorithms operating on non-converged data, which depend on most decisions being correct.In [20], the convergence rate of both CMA and the Independent Component Analysis (ICA) is improved compared to the standard CMA by applying computationally expensive algorithms to the data.One method runs three independent computations of blind polarization recovery using different matrices initializations, similar to [4].The other method performs the gradient descent based on all the observed symbols from system startup up to cumulative time k, instead of on single symbol for standard CMA.ICA converges faster but is always more costly than CMA, and even more so for higher order QAM like 16QAM [24].We will consequently compare our technique against the easily implementable CMA algorithm with the objective of switching to DD-LMS + PLL as quickly as possible for steady-state operation.We claim that the time saved to get to convergence translates into saves in computational expenses, as an extra computational effort has to be spent during the additional time required for blind adaptive techniques to converge.During this time, 2 complex cost function errors have to be computed for every excessive required symbol, where the error of CMA (y n R-* n y y n y n ) is more expensive than that of LMS ( ˆn y -y n ). Figure 4 depicts in Stokes space the convergence location of three blind equalization algorithm schemes for DP-16QAM signals, namely (a) the CMA, (b) the Radius Directed Equalizer (RDE) [15] and (c) the Multimodulus Algotithm (MMA) [25].Although considered blind, the MMA algorithm is phase dependent as its cost function minimizes the distance between a square instead of a circle(s) as in CMA (RDE).By comparing the figures in Fig. 4 against the constellations in Stokes space of Fig. 1(b) for DP-16QAM, we understand that the "thickness" in the ± 1 Â dimension as well as the presence of multiple valid rings driven by the cost function have an impact on the speed of convergence of the blind methods for varying initial signal SOP.Even if all those convergence rings align in planes of norm 1 Â , their multiple presence in several planes, e.g.A 1 = 0 (3 rings), A 1 = ± 0.4 (2 rings) and A 1 = ± 0.8 (1 ring), slows the speed at which a rotated SOP is blindly forced back to the desired A 1 = 0 plane.On the contrary, the single ring of CMA forces the SOP to fall more quickly to the A 1 = 0 plane, while leaving it freely rotating within that plane.A similar conclusion due to the thickness of the MMA convergence region in Fig. 4(c) explains its slower blind adaptation speed compared to CMA.Hence, as highlighted in [22], we observed a reduced robustness in filter pre-convergence using the multi-ring CMA and therefore opted for the classic CMA.

Comparison metric
We will use two metrics to assess our method.Firstly, we will compare the convergence time of our technique against that of the CMA having a built-in PLL.We define the convergence time as the number of symbols required by a blind algorithm to switch from blind polarization recovery to DD-LMS + PLL.For our technique, the convergence time is the total number of received symbols before U ROT can be estimated and used to initialize the taps of the subsequent DD-LMS + PLL.For the blind CMA adaptive technique, the convergence time is the number of symbols required before the average number of bits in error is less than 5% on each polarization, evaluated over a total number of 1000 received bits (500 per polarization).We decided on a BER threshold of 5% as we experimentally observe consistent postconvergence of DD-LMS for any QAM order using this value.Yet this BER is a high bound as it is more than double the current best pre-FEC BER of 2.4% for soft-decision FEC [26].
As an example, for DP-4QAM, -16QAM and -64QAM formats, the observation window, in symbols, onto which the BER is calculated is 250, 125, and 84, respectively.These short window lengths allow switching to LMS as soon as feasible, preventing to stay in CMA mode for a longer, undesired period.When the BER reaches such a value, the 2-by-2 multipleinput-multiple-output (MIMO) filter switches its method of adaptation to DD-LMS.For smooth transition from CMA to DD-LMS, the same PLL is used inside both polarization recovery schemes.To decide if we update the phase of the PLL or not, we use a criterion based on the signal power threshold on each polarization, as presented in [27].
To show in a different way the quickness of our new method, we will also compare the mean number of bits in errors using only the first 20 × 10 3 symbols.Again, we compare our Stokes space blind polarization recovery technique against that of CMA.This will show that our matrix estimation in Eq. ( 10), including the important initial phase estimate e iδ , yields an initial BER that is much smaller than what obtained using the common CMA algorithm.

Experimental test bed
We compare performance of our new method against CMA for 3 different modulation formats: DP-4QAM at 28 Gbaud, DP-16QAM at 28 Gbaud and DP-64QAM at 7 Gbaud.This will allow proving experimentally that the technique works for any square MQAM order.The experimental test bed is depicted in Fig. 5.We use an Arbitrary Waveform Generator (AWG) from MICRAM © to generate the multi-level signals applied to a Dual-Parallel Mach-Zehnder (DPMZ) modulator that modulates CW light out of an external cavity laser (ECL) that has a linewidth smaller than 100 kHz.Dual-Polarization is emulated using Polarization Beam Splitters/Combiners and an optical delay line.The single carrier optical signal is then boosted to the desired launch power; we launch at -2.2 dBm for these experiments.The signal then propagates inside an optical recirculating loop which contains 4 spans of 80 km of Corning SMF-28e + LL © and 4 in-line EDFAs.No optical dispersion compensation modules are present.On the receiver side, the signal gets filtered, amplified and re-filter before hitting the optical coherent receiver front end.The Local Oscillator (LO) laser is the same type as the transmitter.The Dual-Polarization 90° Optical Hybrid is from Kylia © , the balanced photodiodes are BPDV2020R's from U 2 T © and the real-time oscilloscope is Agilent © 's DSO-X 93204A sampling at 80 GSa/s with 3dB bandwidth of 33 GHz.The recirculating loop allows propagation by steps of 320 km.

Results
We present in this section the convergence speed and the initial BER for DP-4QAM in Fig. 6, DP-16QAM in Fig. 7 and DP-64QAM in Fig. 8.We compare the BER of the first 20 × 10 3 symbols when a blind CMA + PLL filter starts operating on the received data, after resampling at T/2 and CD removal, against our Stokes Blind Recovery method.As our technique has to discard or buffer incoming data before estimating the polarization derotation matrix whereas CMA outputs decisions as soon as it starts processing, we consider the discarded and buffered Jones symbols when computing the BER of the Stokes space method, for a fair comparison of an equal number of samples starting at the same time.Those initial data are polarization entangled, off-phase and freely moving in both polarization and phase, therefore are very bad data for the slicer.For rapid blind convergence of the CMA algorithm, we used a relatively large adaptation coefficient of μ = 10 −3 for all QAM orders.Even when considering those discarded/buffered symbols, the initial BER of the Stokes Space Blind Recovery method gives better performance than CMA.For DP-4QAM format, some traces had their initial SOP at the receiver fortuitously aligned to that of the PBS axis of the coherent receiver.For example, one capture after 1600 km shows that even if the minimum of 250 symbols where needed in CMA + PLL mode before switching to LMS + PLL (from the need to have at least 1000 bits collected to evaluate the BER), the initial BER of the first 20K symbols is still less than that of the Stokes Space Recovery method that required only 38 symbols to switch to LMS + PLL.The reason is that the first 38 symbols had both their phase and polarization tracked in CMA + PLL whereas they are left wandering in the Stokes Space Recovery algorithm.It is important to note that if we would assume that buffered incoming symbols could be post-processed by the polarization/phase derotation matrix of Eq. ( 10) after its estimation, before feeding them to the slicer, the initial BER of the Stokes Space algorithm would be even smaller, and those rare cases where CMA performed better than Stokes for DP-4QAM would most probably be reversed.
The sawtooth look of the required number of symbols for CMA in Fig. 6(b) for largely varying noise levels exhibits the strong dependence of the speed of convergence of CMA with respect to the received SOP and no dependence on the SNR.Strongly varying convergence speeds are observed at both large and little noise levels.The Stokes space method, however, shows no dependence on the input SOP, and only a slight dependence on the noise level depicted by the slope.This slope is caused by the voluntary increase of the power discriminator P th with distance in order to keep a power discrimination ratio and consequently a good estimate of ˆhole p .For DP-4QAM, we let P th increase as P th = 1 + LoopNo/40, where LoopNo is the loop count.This means that at 6400 km where we reach BER = 3.8 × 10 −3 , P th = 1.5 which is well below P th = 2.18 for a ratio of 100 as in Fig. 3  As demonstrated in the figures, the average initial BER of the Stokes Recovery method can be smaller by an order of magnitude or more compared to that of CMA.This shows the accuracy of our estimation of U ROT : ˆhole p , the angle φ/2 in the 2nd matrices in Eq. ( 10) and the absolute phase e iδ are well estimated.As both the PLL and the MIMO filters start with good initial conditions, locking of those processes happens more quickly.For CMA, if the initial polarization entangling is very severe ( ˆhole p pointing around ± 3 Â or ± 2 Â ), the algorithm will take more time to adaptively blindly untangle the information on both polarizations [4].
The independent PLLs working on x and ŷ output polarizations operate on providing an absolute phase to the Jones vectors on a per symbol basis and help locking the SOP on the { 2 A  , 3 A  } plane to the 2 Â and 3 Â axis, preventing spinning within the plane.
For any square QAM of 4, 16 or 64-ary, at any distance (any noise level), our experimental results show that the number of required symbols needed to estimate U ROT before switching to DD-LMS never exceeds 830.For DP-16QAM, at a BER of 2.2% after 2240 km, a total of only 829 symbols impinging on the receiver were needed to estimate U ROT and start operating in DD-LMS.At 28 Gbaud, this is a very short 29.polarization convergence, which is known to be the process taking the most time to converge [4].For the DP-64QAM format at 7 Gbaud after 640 km at a BER of 1.4%, only 554 symbols were needed, converting into only 79.1 ns.Finally, for DP-4QAM, after 6400 km giving a BER of 0.38%, only 165 symbols were required before successfully switch to DD-LMS + PLL.This converts to a mere 5.9 ns.We can compare those results with recently reported convergence time of blind CMA algorithm applied on DP-4QAM signals, where modification were applied to the CMA in order to decrease its convergence time.In [4] they report a mean convergence time of 40 ns, with a best case of 20 ns and a worst case of 280 ns.In another work [6], the CMA algorithm achieves blind recovery in 200 ns, or 11200 symbols.By comparing our convergence time results, we demonstrate the rapidity of convergence of our novel blind Stokes space polarization recovery technique.
For the DP-16QAM and -64QAM, the initial BER within the first 20 × 10 3 samples is always better; around 3 times lower for DP-64QAM and even more for DP-16QAM.In Figs. 6 to 8, (a), we can appreciate how close the curves of the Stokes space method are to the lower bound of steady-state operation, proving the accuracy of the method's estimation of U ROT .

Testing over PMD
We present in this subsection numerical simulations where the proposed polarization derotation method is assessed over varying instantaneous DGDs, assumed constant for the short period required by the algorithm to find the derotation matrix U ROT .The proposed SOP recovery method is also assessed over different DGD when the sampling instances of the Analog to Digital Converters (ADCs) don't align with the center of the symbols' location, for more realistic received waveforms.We call this sampling offset with respect to the symbols' location the Common Group Delay (CGD).This can also be true for two-fold oversampling as is the case for processing the received waveforms in our simulations.Mathematically, the numerical model is described as follows where τ is the DGD and κ is the CGD.U rand,1 and U rand,2 are random rotation matrices as defined in [9].The two orthogonal eigenvectors of U rand,1 form the basis that will observe a show the accuracy and speed of the proposed polarization untangling process over largely varying DGD Finally, the required number of symbols obtained numerically for instantaneous DGDs smaller than 0.2T match the value obtain for real propagation shown in Fig. 6(b).Simulations give between 200 and 245 required symbols on average for DGDs between 0 and 7.1 ps where experiments show around 200 symbols for the same modulation format and noise level with mean DGD of 0.04ps/√km × √6400km = 3.2 ps.Consequently, similarity of the experimental results with simulation over comparable range of PMD confirms the mathematical model used and validates results for larger DGD.The proposed method would successfully operate for fibers with higher PMD Q like the G.652D.From the Maxwellian distribution of DGD with the sole knowledge of DGD MEAN , we can be compute the probability that the instantaneous DGD be smaller than 0.9T on G.652D fiber after 6400 km with 28 Gbaud signals giving DGD MEAN = 0.448T: a probability of 1-P(DGD>0.9/0.448× DGD MEAN ) = 98.4%.

Conclusion
We presented a novel method for blind estimation of the SOP and polarization rotation for single carrier channels and coherent receivers.This method can be used either on regular system startup or for fast switching burst mode receivers.We showed experimentally that the algorithm outperforms the convergence time for any DP-square-QAM format by about an order of magnitude compared to the standard CMA algorithm.The numbers of symbols required to blindly derotate the received SOP before the self-adapting equalizer can operate in decision-directed-LMS mode is used to assess convergence.Unlike CMA, this blind Stokes space polarization recovery algorithm is completely independent of the input SOP of the signal.We compared experimental results of convergence time and initial BER when the CMA and when the Stokes space technique were used to blindly untangle the receiver SOP for 3 modulation formats of DP-4QAM, -16QAM and -64QAM.After propagation over SMF 28e + LL fiber, we obtain convergence within 5.9 ns at BER = 0.38% after 6400 km for DP-4QAM, 29.3 ns at BER = 2.2% after 2240 km for DP-16QAM and 79.1 ns at BER = 1.4% after 640 km for DP-64QAM.The initial BER of the first 20 × 10 3 symbols when the Stokes space technique is used is always much closer to the steady-state operation than when CMA is used for tap adaptation, for DP-16QAM and -64QAM formats.For DP-4QAM, some fortuitously well aligned input SOP on system startup can give better initial BER when using CMA.However, the Stokes space method outperforms CMA even for DP-4QAM as it is robust to any input polarization alignment.Finally, we simulated PMD testing on our Stokes space polarization recovery technique to assess its robustness over larger PMDs than what provided by the experimental test bed.We demonstrated the accuracy and speed in obtaining the derotation matrix over DGDs as large as 90% of the symbol duration for perfect sampling offset and as large as 30% when the initial central tap of the filter is off by one third of a symbol with respect to the symbols' center location.The speed of convergence using the blind Stokes method on large PMD is still generously smaller than that of CMA on little PMD.

Fig. 1 .
Fig. 1.Mapping of all possible Jones states to Stokes space for theoretical (a) DP-4QAM, (b) DP-16QAM and (c) DP-64QAM.For (d), theoretical DP-64QAM is rotated by a random unitary matrix.The total mean power is always unitary.The gray sphere has a radius of 1.
noise source and U ROT (R ROT ) is an unknown unitary rotation matrix in Jones (Stokes) space.Here, by | A Rx ⟩ we mean the waveform after the three first steps.Figure1(d)shows a randomly rotated constellation of a DP-64QAM signal.As mentioned in the previous section, UROT  is almost constant at a fix distance for fiber with small PMD coefficient.Our goal is to estimate U ROT with the sole knowledge of the time series of | A Rx ⟩.Our technique resides in estimating the location of 3 out of the 4 outer-most states in the Stokes constellation, as depicted in Fig.1, and to realign those estimated corners to their desired location, i.e. along the ± 2 A  and ± 3 A  axes.#177532 -$15.00USD Received 5 Oct 2012; revised 19 Nov 2012; accepted 19 Nov 2012; published 29 Nov 2012

clusters of data points to around ± 2 A  and ± 3 A
 and the last term e iδ estimates the absolute phase.#177532 -$15.00USD Received 5 Oct 2012; revised 19 Nov 2012; accepted 19 Nov 2012; published 29 Nov 2012 (C) 2012 OSA

Fig. 2 .
Fig. 2. Explanation of algorithm in operation 4.: The n th column of (S) 3-N is the green dot.By projecting vectors onto each other, we need at least C-1 other vectors close by, C vectors perpendicular and C opposite to estimate U ROT .The loop in Operation 4 stops and U ROT is estimated as soon as three circles in Fig. 2 that are opposite and perpendicular possess C Stokes vectors.Computing U proj for every new Stokes vector buffered in S 3,N is not a computationally expensive task: U proj-N is simply S 3,N T S 3,N .One can compute U proj-N from U proj-(N-1) by only computing the projection of all the previous vectors stored in S 3,N-1 with the new added vector S  , and append S 3,N T S  to the previous U proj-(N-1) .Because of pre-normalization, U proj is a real symmetric matrix with ones on the diagonal and values between 1 and -1 elsewhere.The number of real multiplication and additions required to compute U proj-N of size N-by-N is 3N(N-1)/2 and N(N-1), respectively.In Operation 5.1., a covariance matrix C COV is computed and the eigenvector of the smallest eigenvalue is extracted.The C COV is expressed as cov , ,

1 v
the determinant of B is zero, all b  's lie in the same plane and consequently  is simply the cross product of any two b  's, which can subsequently be normalized to unit power such that  

Fig. 3 .
Fig. 3. Power filtering representation in Stokes space for several QAM order.Noise loaded such that BER = 3.8 × 10 −3 and P th set such that on average, for every 100 Stokes vectors, 1 has powers exceeding P th .(a) DP-4QAM with SNR of 8.53 dB, P th = 2.19 (b) DP-16QAM with SNR of 15.2 dB, P th = 2.01, (c) DP-64QAM with SNR of 21.1 dB, P th = 2.01.

Fig. 6 .
Fig. 6.(a) BER vs Distance of first 20K symbols for DP-4QAM format at 28 Gbaud comparing CMA and Stokes space Recovery.(b) Number of symbols required before switching filter from blind to DD-LMS + PLL.
(a).In fact, P th = 1.5 represents a theoretical power discrimination ratio of ~5.8 for DP-4QAM at BER = 3.8 × 10 −3 .As roughly 200 symbols were needed for U ROT 's estimation after 6400 km, an average of 35 symbols were stacked in S 3-N and used for the 1st matrix estimate (Operation 5.1), and only 3C = 12 or a little bit more were kept for the 2nd matrix estimate (Operation 5.2) and the phase estimate (Operation 5.3) of U ROT .The BER of the first 20x10 3 symbols shows the accuracy of our Stokes Space estimate in comparison to the steady state DD-LMS + PLL curves which represents the lowest bound for the initial BER of any method taken to get to steady state.

Fig. 7 .
Fig. 7. (a) BER vs Distance of first 20K symbols for DP-16QAM format at 28 Gbaud comparing CMA and Stokes Space Blind Recovery.(b) Number of symbols required before switching from blind to DD-LMS + PLL.

Fig. 8 .
Fig. 8. (a) BER vs Distance of first 20K symbols for DP-64QAM format at 7 Gbaud comparing CMA and Stokes Space Blind Recovery.(b) Number of symbols required before switching from blind to DD-LMS + PLL.

#
177532 -$15.00USD Received 5 Oct 2012; revised 19 Nov 2012; accepted 19 Nov 2012; published 29 Nov 2012 (C) 2012 OSA 3 December 2012 / Vol. 20, No. 25 / OPTICS EXPRESS 27865 When all the conditions in 4. are satisfied, the Stokes vectors saved in S 3,N are used to estimate U ROT .The subset of vectors in S 3,N that made stop the looping algorithm and satisfied 4.1 to 4.3 represent 3 out of the 4 outer-most Stokes constellations corners, introduced in Fig.
3,N onto the i th eigenvector.Hence our choice for ˆhole p .The first rotation matrix in U ROT can be generated by the knowledge of ˆhole