Clustering algorithms for Stokes space modulation format recognition

Stokes space modulation format recognition (Stokes MFR) is a blind method that enables digital coherent receivers to infer the modulation format directly from a received polarization-division-multiplexed signal. A crucial part of Stokes MFR is the clustering algorithm, which largely determines the performance of the detection process, particularly at low signal-to-noise ratios. This paper reports on an extensive study of six clustering algorithms (k-means, expectation maximization, the density-based DBSCAN and OPTICS, spectral clustering, and maximum likelihood clustering) used to discriminate between dual-polarization BPSK, QPSK, 8-PSK, 8-QAM, and 16-QAM. For each clustering algorithm and modulation format under test, we determine the essential performance metrics: minimum required signal-to-noise ratio, detection accuracy, and algorithm complexity.


Introduction
Coherent detection in combination with digital signal processing (DSP) allows advanced signal processing techniques to be implemented, enabling increased spectral efficiency, digital impairment compensation, and in-service optical performance monitoring [1]. This detection scheme linearly maps the amplitude and phase of the optical field onto an electrical, and subsequently digital, signal, which is then processed by DSP algorithms. In principle, a coherent receiver is a software-defined receiver, capable of acquiring and demodulating any modulation format for which appropriate algorithms are implemented. Simultaneously, optical networks are evolving towards dynamic lightpath switching, driven by the paradigms of self-management and self-optimization [2]. For short-lived connections, lasting on the order of seconds, the delay incurred by the control plane may constitute a considerable overhead. Therefore, an autonomous modulation format recognition (MFR) functionality, implemented as a subsystem of a multi-format coherent receiver, would lift the limitation imposed by a slow control plane and allow lightpaths to be switched at a higher rate. It would enable the receiver to demodulate and recover information from signals for which no prior information on the modulation format is available. Another potential application of MFR is optical performance monitoring, where a dedicated device acquires information about the optical channels present in an optical link.
Various approaches to optical MFR can be found in the literature, each with its own limitations: (a) a method based on k-means clustering of the received constellation diagram [3], which introduces a circular dependence between the demodulation algorithms and the MFR subsystem; (b) a direct-detection-based technique employing artificial neural networks trained with features extracted from optical eye histograms [4], which is not resilient to fiber impairments such as chromatic dispersion (CD); (c) a procedure based on the computation of signal cumulants [5], which requires prior polarization demultiplexing; (d) a method based on the distribution of the amplitude histogram [6], which requires prior knowledge of the optical signal-to-noise ratio (OSNR); (e) classification using the received signal intensity histogram [7], which is not compatible with QAM modulation formats; (f) counting the number of clusters (groups of neighboring points) formed in Stokes space using statistical signal processing methods [8], which requires high OSNR; and (g) a hybrid approach combining (f) and (c) [9], which alleviates the OSNR limitation. Of these techniques, the Stokes-space-based MFR (Stokes MFR) subsystem is the most versatile, as it allows modulation classification at an early stage of the DSP chain, before digital polarization demultiplexing [8], thus enabling the subsequent DSP algorithms to be optimized for the detected modulation format. Moreover, owing to the properties of the Stokes parameters, the Stokes MFR subsystem is independent of polarization rotation and mixing, as well as of frequency and phase offsets of the received signal, impairments that are inherent to coherent optical systems. In our previous work (f), we used a Gaussian mixture model combined with variational Bayesian expectation maximization for clustering and cluster counting. However, the OSNR limitation of that method was caused by approximating the noise in Stokes space with a trivariate Gaussian distribution. In fact, the Stokes space transformation considerably distorts the noise probability density function [10], which penalizes our previous approach at low OSNR. Clustering and cluster counting can, however, be accomplished by other algorithms that are better suited to non-spherical or irregular noise distributions.
In this paper we perform an extensive analysis of the following clustering algorithms applicable to Stokes MFR: k-means, expectation maximization (EM), density-based spatial clustering of applications with noise (DBSCAN), ordering points to identify the clustering structure (OPTICS), and spectral clustering. We test the reliability of the algorithms, in terms of the required OSNR, with the following polarization-division-multiplexed (PDM) modulation formats: binary phase shift keying (BPSK), quaternary PSK (QPSK), 8-PSK, 8-ary quadrature amplitude modulation (8-QAM), and 16-QAM. Furthermore, we propose a novel clustering algorithm for Stokes MFR, based on maximum likelihood between the received data and stored features of the targeted formats. We show that the new method provides the best trade-off among the analyzed performance metrics.

Stokes space modulation format recognition
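The Stokes space representation of a signal is obtained, after CD compensation and timing recovery, by computing the Stokes parameters of the received signal components in the x and y polarizations from samples at the ideal symbol instants, according to Eq. (1):
$$S_0 = |x|^2 + |y|^2, \qquad S_1 = |x|^2 - |y|^2, \qquad S_2 = 2\,\mathrm{Re}\{x^* y\}, \qquad S_3 = 2\,\mathrm{Im}\{x^* y\}. \tag{1}$$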
This yields the total signal power ($S_0$) and the polarization state of the received wave ($S_1$, $S_2$, $S_3$). The vector $(S_1, S_2, S_3)^T$, after normalization by $\max(S_0)$, allows the transformed signal to be visualized as a set of points in the Poincaré sphere, i.e., in Stokes space. As can be seen from Eq. (1), the Stokes-space-transformed signal is insensitive to carrier frequency and phase offsets, since the transformation involves only the relative phase difference between the signal polarizations (x and y), while polarization rotation and mixing merely rotate the cluster structure rigidly in Stokes space. Moreover, only the symbol instants are used for this transformation, which therefore requires one complex sample per symbol and does not depend on the pulse shape of the received signal. For each PDM modulation format, a unique "fingerprint" forms in Stokes space: a distinct number of clusters inscribed in a three-dimensional lens-like object [11]. Counting the number of clusters is sufficient to distinguish between many common modulation formats, which is the principle behind Stokes MFR. The top row of Fig. 1 shows the constellation diagrams of five modulation formats: BPSK, QPSK, 8-PSK, 8-QAM, and 16-QAM. The bottom row depicts the corresponding Stokes space representations of the PDM versions of these signals, arbitrarily rotated and possibly translated with respect to the origin of Stokes space.
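To make the mapping concrete, the following minimal NumPy sketch computes normalized Stokes vectors from one complex sample per symbol in each polarization. It assumes the common convention $S_1 = |x|^2 - |y|^2$, $S_2 = 2\,\mathrm{Re}\{x^* y\}$, $S_3 = 2\,\mathrm{Im}\{x^* y\}$ (the sign of $S_3$ differs between texts), and the function name `stokes_space` is our own illustration, not part of any library.

```python
import numpy as np


def stokes_space(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Map symbol-spaced samples of the x and y polarizations to Stokes space.

    Assumed convention: S1 = |x|^2 - |y|^2, S2 = 2 Re{x* y}, S3 = 2 Im{x* y}.
    Returns an (N, 3) array of (S1, S2, S3), normalized by max(S0).
    """
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    s0 = np.abs(x) ** 2 + np.abs(y) ** 2  # total power of each symbol
    s1 = np.abs(x) ** 2 - np.abs(y) ** 2  # power imbalance between the polarizations
    cross = np.conj(x) * y                # depends only on the relative phase of x and y
    s2 = 2.0 * cross.real
    s3 = 2.0 * cross.imag
    return np.stack([s1, s2, s3], axis=1) / s0.max()
```

Multiplying both polarizations by the same time-varying phase factor (a common carrier phase and frequency offset) leaves the output unchanged, because only the powers and the product $x^* y$ enter the computation.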
Misalignment of the received polarization-multiplexed signals with respect to the axes of the receiver translates into an arbitrary rotation of the clusters in the Poincaré sphere around the axis defining linear polarization ($S_1$). In the simplified case of a single birefringent element, the clusters rotate while maintaining their relative positions. However, fiber impairments such as CD, differential group delay (DGD), polarization mode dispersion (PMD), or polarization dependent loss (PDL) spread the points of the Stokes representation, because the fiber rotation matrix is not constant over the signal bandwidth [11] and/or over the time span used for modulation analysis. Techniques for compensating some of these effects in Stokes space have been investigated in the literature [12].

Clustering algorithms
Clustering algorithms, a class of unsupervised learning techniques, are machine learning methods that partition a raw data set into separate groups of points based on similarity measures between samples.
We compare the performance of five clustering algorithms applied to modulation format recognition in Stokes space: k-means, EM, DBSCAN, OPTICS, and spectral clustering. We also introduce a new maximum likelihood method, which performs MFR by measuring the similarity of an unknown signal to a template.
We use the following notation throughout this paper: the data set matrix is denoted by $X$, where $x_i$ is the $i$-th sample, and the size of the data set is denoted by $N$. Regarding clusters, $K$ is the total number of clusters to be found, $k$ is the index of the $k$-th cluster, and $C_k$ is the set of indices of the points assigned to cluster $k$.

K-means
K-means [13] partitions the data set into $K$ clusters, based exclusively on distance measures between the points. The objective of this method is to find the positions of the cluster centroids ($\mu_k$) that minimize the cost function, defined as
$$J = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2. \tag{2}$$
K-means finds the cluster centroids of a data set iteratively. After randomly selecting an initial position for each centroid, the method iterates a two-step procedure until convergence. The first step assigns each sample to its closest centroid, while the second recomputes the centroid locations according to the new assignments. The procedure ends when convergence is reached, i.e., when the value of the cost function no longer changes between successive iterations.
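The two-step iteration described above can be sketched in NumPy as follows. This is a minimal illustration of Lloyd's algorithm, minimizing the sum of squared distances between samples and their assigned centroids; the function names and the optional `init` and `seed` parameters are our additions (so that the initial centroids can be fixed for reproducibility), not part of the original method description.

```python
import numpy as np


def _assign(X, mu):
    """Assign each sample to its closest centroid; also return the cost J."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K) squared distances
    labels = d2.argmin(axis=1)
    return labels, d2[np.arange(len(X)), labels].sum()


def kmeans(X, K, init=None, max_iter=100, seed=0):
    """Lloyd's k-means; returns (labels, centroids, J)."""
    rng = np.random.default_rng(seed)
    if init is None:
        mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)  # random initial centroids
    else:
        mu = np.array(init, dtype=float)
    labels, cost = _assign(X, mu)
    for _ in range(max_iter):
        for k in range(K):                      # recompute each centroid from its members
            if np.any(labels == k):             # an empty cluster keeps its previous centroid
                mu[k] = X[labels == k].mean(axis=0)
        labels, new_cost = _assign(X, mu)       # reassign samples to the closest centroid
        converged = np.isclose(new_cost, cost)  # J unchanged between iterations
        cost = new_cost
        if converged:
            break
    return labels, mu, cost
```

Because `labels` is always recomputed after the centroids move, the returned labels, centroids, and cost are mutually consistent.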

Expectation maximization
EM [14] fits a Gaussian mixture model, i.e., a weighted sum of multivariate normal distributions,
$$p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1, \tag{3}$$
to the input data by estimating its parameters: the means ($\mu_k$), covariance matrices ($\Sigma_k$), and mixing coefficients ($\pi_k$). Here we assume that the noise probability density function is Gaussian. EM usually uses k-means to initialize the model parameters ($\mu_k$, $\Sigma_k$, $\pi_k$). Afterwards, expectation (E) and maximization (M) steps, analogous to the two steps of k-means, are iterated until convergence. In the E step, the probability of each sample belonging to each cluster, also called its responsibility, is computed. In the M step, the model parameters are recomputed using the new responsibility values. The iteration ends when the cost function (the log-likelihood of the data) no longer changes between successive iterations.
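A minimal NumPy sketch of EM for a full-covariance Gaussian mixture is shown below. Several choices are our assumptions rather than details from the text: the initial means are passed in (in practice they could come from a k-means run), the covariances start from the overall data covariance with equal mixing weights, a small diagonal term `reg` keeps the covariances invertible, and responsibilities are computed in the log domain to avoid numerical underflow. The function names are hypothetical.

```python
import numpy as np


def log_gaussian(X, mu, cov):
    """Log-density of a multivariate normal distribution evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)  # squared Mahalanobis distance
    return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))


def em_gmm(X, mu0, max_iter=200, tol=1e-6, reg=1e-6):
    """Fit a K-component Gaussian mixture by EM; returns (labels, mu, cov, pi, log-likelihood)."""
    N, d = X.shape
    mu = np.array(mu0, dtype=float)
    K = len(mu)
    cov = np.array([np.cov(X.T) + reg * np.eye(d) for _ in range(K)])  # start from global covariance
    pi = np.full(K, 1.0 / K)                                            # equal mixing weights
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E step: responsibilities, computed in the log domain to avoid underflow
        log_p = np.stack([np.log(pi[k]) + log_gaussian(X, mu[k], cov[k]) for k in range(K)], axis=1)
        m = log_p.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_p - log_norm)
        ll = log_norm.sum()                    # log-likelihood of the data
        if ll - prev_ll < tol:                 # stop when the cost no longer improves
            break
        prev_ll = ll
        # M step: re-estimate weights, means and covariances from the responsibilities
        Nk = resp.sum(axis=0)
        pi = Nk / N
        mu = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return resp.argmax(axis=1), mu, cov, pi, ll
```

Hard labels are obtained from the largest responsibility of each sample; the fitted covariances are what distinguishes EM from k-means, which implicitly assumes identical spherical clusters.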

DBSCAN
DBSCAN [15] is a density-based method, which clusters the data set based on the neighborhood of each sample. It addresses two recurrent problems associated with k-means and EM: i) the assumption of Gaussian-shaped clusters; and ii) the need to specify the number of clusters prior to the clustering procedure. DBSCAN requires only two parameters: the neighborhood radius ($\varepsilon$) and the minimum number of points required to form a cluster (MinPts).
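For each sample in the data set, the algorithm checks whether the conditions for starting a new cluster or expanding an existing one are met. A sample $x_i$ satisfies these conditions, i.e., it is a core point, if its $\varepsilon$-neighborhood contains at least MinPts samples:
$$\lvert N_\varepsilon(x_i) \rvert \ge \mathrm{MinPts}, \qquad N_\varepsilon(x_i) = \{\, x_j \in X : \lVert x_i - x_j \rVert \le \varepsilon \,\}. \tag{4}$$
If the condition is fulfilled, the method looks for neighboring points that fulfill the same rule; the samples found in this way also belong to the same cluster.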
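The core-point test and cluster expansion described above can be sketched as a brute-force NumPy implementation. It is a minimal illustration assuming Euclidean distances and a full pairwise distance matrix (quadratic in memory, acceptable for a few thousand Stokes points); the function name `dbscan` and its parameters `eps` and `min_pts` (playing the roles of $\varepsilon$ and MinPts) are our own, and noise points are labeled -1.

```python
import numpy as np


def dbscan(X, eps, min_pts):
    """Label each row of X with a cluster index, or -1 for noise."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # eps-neighborhood, self included
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])     # core-point condition
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster                 # an unassigned core point starts a new cluster
        stack = [i]
        while stack:                        # expand through density-reachable points
            j = stack.pop()
            if not is_core[j]:
                continue                    # border points join the cluster but do not expand it
            for q in neighbors[j]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels
```

The number of clusters found, `labels.max() + 1`, is the quantity that Stokes MFR compares against the expected cluster count of each candidate format.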

 
OPTICS
OPTICS [16] is a generalized version of DBSCAN and is therefore also a density-based method. The main difference between them is that, unlike DBSCAN, OPTICS can deal with clusters of different densities, which is particularly useful for the Stokes representation of signals, since the density of the clusters depends on their distance from the center. The output of the algorithm is an ordering of the samples in the data set, from which the corresponding cluster assignments can be extracted. This ordering is built from two variables associated with each data point, the core distance and the reachability distance, which have been extensively explored in the literature. The procedure followed by OPTICS is very similar to that of DBSCAN, with the addition of computing these two variables for each visited point [16].
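A compact NumPy/`heapq` sketch of the ordering step is given below, together with a simple threshold-based extraction that reproduces a DBSCAN-like clustering for any radius `eps_prime` not larger than `eps`. This is a minimal brute-force illustration under our own assumptions (Euclidean distances, the point itself counted among its `min_pts` neighbors, hypothetical function names); it is not necessarily the exact extraction procedure used in the study.

```python
import heapq

import numpy as np


def optics(X, eps, min_pts):
    """Return the OPTICS ordering, reachability distances and core distances of the rows of X."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    kth = np.sort(dist, axis=1)[:, min_pts - 1]      # distance to min_pts-th neighbor (self included)
    core = np.where(kth <= eps, kth, np.inf)          # undefined (inf) for non-core points
    reach = np.full(n, np.inf)
    processed = np.zeros(n, dtype=bool)
    order = []
    for start in range(n):
        if processed[start]:
            continue
        seeds = [(np.inf, start)]                     # priority queue keyed by reachability
        while seeds:
            _, p = heapq.heappop(seeds)
            if processed[p]:
                continue                              # stale queue entry
            processed[p] = True
            order.append(p)
            if np.isinf(core[p]):
                continue                              # only core points expand the ordering
            for q in np.flatnonzero(dist[p] <= eps):
                new_reach = max(core[p], dist[p, q])
                if not processed[q] and new_reach < reach[q]:
                    reach[q] = new_reach
                    heapq.heappush(seeds, (new_reach, q))
    return np.array(order), reach, core


def extract_clusters(order, reach, core, eps_prime):
    """DBSCAN-like labels (noise = -1) for any radius eps_prime <= eps."""
    labels = np.full(len(order), -1)
    cluster = -1
    for p in order:
        if reach[p] > eps_prime:
            if core[p] <= eps_prime:                  # start of a new cluster
                cluster += 1
                labels[p] = cluster
        else:
            labels[p] = cluster                       # density-reachable from the current cluster
    return labels
```

Because the ordering is computed once, clusters of different densities can be inspected through the reachability values without rerunning the neighborhood search.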

Spectral clustering
Spectral clustering [17] is based on linear algebra and, unlike the other algorithms, does not require the assumption of a particular density function. By transforming the received data, spectral clustering produces a clustering structure with better-separated groups, from which the number of clusters is easier to recognize. Several approaches have been proposed in the literature; in our comparison we follow the approach outlined in [17].
The transformation consists of computing the symmetric normalized Laplacian matrix of the data set, which is built from similarity measurements between points, and then computing as many of its eigenvectors as there are clusters to be found. After this mapping, the rows of the matrix formed by these eigenvectors are used as the new samples, to which a simpler clustering algorithm such as k-means is applied to obtain the final result.
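The transformation described above can be sketched as follows, using one common variant in the style of Ng, Jordan and Weiss. The Gaussian similarity with width `sigma`, the row normalization of the eigenvector matrix, and the deterministic farthest-first initialization of the final k-means step are our assumptions for a reproducible illustration (the study initializes randomly); the function name is hypothetical.

```python
import numpy as np


def spectral_clustering(X, K, sigma, n_iter=100):
    """Cluster the rows of X into K groups via a normalized-Laplacian embedding."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian similarity between samples
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)               # eigenvalues in ascending order
    U = vecs[:, :K]                               # K eigenvectors with the smallest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)  # rows = new samples

    # k-means on the embedded samples, with farthest-first initialization
    centers = [U[0]]
    for _ in range(1, K):
        dmin = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[dmin.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([U[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels
```

The dense eigendecomposition of an $N \times N$ matrix dominates the cost of this sketch, which is consistent with spectral clustering being the slowest method in the comparison.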

Maximum-likelihood-based algorithm
As mentioned in section 2, Stokes-space-transformed data may undergo a rotation of the lens-like object inside the Poincaré sphere. After fiber impairments (CD, DGD, PMD, PDL) have been compensated, the received signal approximately maintains the relative positions of the clusters. This makes it possible to identify the modulation format by directly comparing the transformed points, rotated in the Poincaré sphere by an unknown angle, with precomputed ideal locations of the cluster centroids. The comparison is performed by minimizing the Euclidean distance between the two sets of points. This procedure differs from those of the clustering techniques, which rely on determining the cluster count. The method consists of four steps, shown in Fig. 2, which are performed for each tested modulation format. The first step determines the rotation of the clusters around the $S_1$ axis. We use precomputed QPSK Stokes centroids as the reference for this rotation search, since these four centroids constitute a subset of the centroids of all higher-order modulation formats under test (8-PSK, 8-QAM, 16-QAM), while BPSK is treated as a special case with a separate test for two centroids. We define the cost function as the sum of the Euclidean distances between each sample of the received signal $x_i$ and the closest point $y_k(\theta)$ of the set of precomputed QPSK centroids $y$ rotated by the angle $\theta$:
$$J(\theta) = \sum_{i=1}^{N} \min_{k} \lVert x_i - y_k(\theta) \rVert, \qquad \alpha = \arg\min_{\theta} J(\theta). \tag{5}$$
The algorithm selects the angle $\alpha$ for which the cost function attains its minimum. Using these four centroids, we can determine the rotation of the received signal around $S_1$ fairly accurately, given the 90° rotational symmetry of the investigated constellations and their Stokes representations. Additionally, using a reference $y$ with only four centroids reduces the computational complexity. Then, in step two, the precomputed centroids of all tested modulation formats are rotated by the angle $\alpha$ found in the previous step, which brings them into the reference frame of the received signal. Subsequently, in step three, the points of the received signal are assigned to the rotated centroids from step two based on the minimum Euclidean distance. After the assignments have been performed for all modulation formats under test, in step four the best fit is determined according to the silhouette coefficient (as explained in section 3.7), and hence the modulation format is detected. The main advantage of the proposed algorithm is that, unlike most clustering algorithms, in which parameter selection may substantially influence the clustering process, it does not require any initial parameters. However, the result can be degraded if impairment compensation is imperfect and the relative positions of the clusters are not well preserved.
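Steps one to three can be sketched in NumPy as below. The reference centroids are an assumption on our part: for ideal, equal-power PDM M-PSK normalized by $\max(S_0)$, the Stokes centroids lie on a unit ring around $S_1$ at angles $2\pi m / M$, which we use for the QPSK reference and for an 8-PSK template; QAM templates would have to be precomputed separately. The rotation angle is found by a simple grid search over $[0, \pi/2)$, exploiting the 90° symmetry of the QPSK reference, and all function names are hypothetical.

```python
import numpy as np


def psk_ring(M):
    """Hypothetical ideal normalized Stokes centroids of PDM M-PSK: M points on a ring around S1."""
    phi = 2.0 * np.pi * np.arange(M) / M
    return np.stack([np.zeros(M), np.cos(phi), np.sin(phi)], axis=1)


def rotate_about_s1(points, theta):
    """Rotate (S1, S2, S3) points by theta around the S1 axis."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    return points @ R.T


def rotation_cost(X, ref, theta):
    """Sum over samples of the distance to the nearest rotated reference centroid."""
    d = np.linalg.norm(X[:, None, :] - rotate_about_s1(ref, theta)[None, :, :], axis=2)
    return d.min(axis=1).sum()


def find_rotation(X, ref, n_angles=360):
    """Step 1: grid search over [0, pi/2), exploiting the 90-degree symmetry of the QPSK reference."""
    thetas = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    costs = [rotation_cost(X, ref, t) for t in thetas]
    return thetas[int(np.argmin(costs))]


def assign_to_template(X, centroids, alpha):
    """Steps 2 and 3: rotate a format's template by alpha, then assign samples by minimum distance."""
    rotated = rotate_about_s1(centroids, alpha)
    return np.linalg.norm(X[:, None, :] - rotated[None, :, :], axis=2).argmin(axis=1)
```

For example, noisy QPSK points assigned to a rotated 8-PSK template occupy only four of its eight centroids; step four would then compare such assignments across candidate formats using the silhouette coefficient.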

Number of clusters evaluation
Some of the methods analyzed here (k-means, EM, and spectral clustering) require the number of clusters to be found in the data set to be specified in advance. Different methods exist in the literature to compare clustering outputs and select the best choice for the number of clusters.
In this work, the silhouette coefficient has been used. The silhouette compares the tightness of different clustering structures by analyzing the intercluster and intracluster distances [18]. The more compact and well separated the clusters in the data set, the higher the coefficient, and the more likely it is that the number of clusters is correct.
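The coefficient can be sketched with the standard per-sample definition $s(i) = (b(i) - a(i)) / \max\{a(i), b(i)\}$, where $a(i)$ is the mean distance from sample $i$ to the other members of its own cluster and $b(i)$ is the smallest mean distance to the members of any other cluster. The following NumPy version is a minimal brute-force illustration; the function name is hypothetical, and singleton clusters are assigned $s(i) = 0$ by convention.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient of a clustering (values in [-1, 1]; higher is better)."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette is defined only for two or more clusters")
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue                                   # singleton cluster: s(i) = 0 by convention
        a = dist[i, own].sum() / (own.sum() - 1)       # mean intracluster distance (self excluded)
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return s.mean()
```

To select the number of clusters, a method such as k-means would be run for each candidate $K$ and the clustering with the highest mean silhouette kept.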

Numerical simulation
The system architecture used in our simulations is shown in Fig. 3. It consists of a transmitter able to generate five different PDM modulation formats (BPSK, QPSK, 8-PSK, 8-QAM, and 16-QAM), an additive white Gaussian noise (AWGN) channel introducing variable noise power, and a universal Stokes-space-based receiver capable of receiving any of the transmitted modulation formats.
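A minimal sketch of the transmitter and AWGN channel, restricted to PDM M-PSK, is given below. The symbol rate of 28 Gbaud, the 0.1 nm (12.5 GHz) OSNR reference bandwidth, and the function names are our assumptions for illustration. The conversion uses the standard relation for a dual-polarization signal with noise in both polarizations, $\mathrm{SNR} = \mathrm{OSNR} \cdot B_\mathrm{ref} / R_s$, and QAM generation is omitted.

```python
import numpy as np


def osnr_to_snr_db(osnr_db, symbol_rate=28e9, ref_bw=12.5e9):
    """OSNR (in ref_bw) to electrical SNR for a dual-polarization signal: SNR = OSNR * B_ref / R_s."""
    return osnr_db + 10.0 * np.log10(ref_bw / symbol_rate)


def pdm_psk_awgn(M, n_sym, osnr_db, rng):
    """Unit-energy PDM M-PSK symbols per polarization plus complex AWGN; returns (rx, tx)."""
    snr = 10.0 ** (osnr_to_snr_db(osnr_db) / 10.0)
    tx = np.exp(2j * np.pi * rng.integers(0, M, size=(2, n_sym)) / M)  # rows: x and y polarizations
    sigma = np.sqrt(1.0 / (2.0 * snr))                                  # noise std per real dimension
    noise = sigma * (rng.standard_normal((2, n_sym)) + 1j * rng.standard_normal((2, n_sym)))
    return tx + noise, tx
```

The two rows of `rx` correspond to the symbol-spaced x and y samples from which the Stokes parameters are computed; no fiber impairments are modeled, matching the AWGN-only channel.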
The reliability of the clustering algorithms (k-means, EM, DBSCAN, OPTICS, spectral clustering, and maximum likelihood) is analyzed using 500 realizations of 2000 samples each for each modulation format considered. An OSNR range of 5-30 dB (0-15 dB for BPSK) in steps of 0.1 dB has been examined in order to determine the reliability of each clustering method as a function of OSNR. In particular, the minimum OSNR required to correctly recognize the modulation format (i.e., correct recognition in at least 95% of realizations) is a key parameter for assessing the performance of clustering algorithms and the feasibility of MFR in general.
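The reliability and minimum-OSNR metrics defined above reduce to a few lines of NumPy. In this sketch (function names hypothetical), reliability is the fraction of realizations whose recognized format matches the transmitted one, and the minimum required OSNR is the first value on an ascending OSNR grid at which reliability reaches 95%; `None` marks the case where the threshold is never reached.

```python
import numpy as np


def reliability(recognized, actual):
    """Fraction of realizations in which the recognized format equals the transmitted one."""
    return float(np.mean(np.asarray(recognized) == actual))


def min_required_osnr(osnr_grid, rel, threshold=0.95):
    """Lowest OSNR on an ascending grid at which reliability reaches the threshold, else None."""
    ok = np.flatnonzero(np.asarray(rel) >= threshold)
    return None if ok.size == 0 else float(np.asarray(osnr_grid)[ok[0]])
```

Taking the first grid point that meets the threshold matters here because, as noted in the results, reliability is not always monotonic in OSNR.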
In Table 1, the parameters specific to each clustering algorithm used in the comparison are summarized. They were adjusted empirically over several realizations in order to maximize the reliability of the algorithms. K-means and EM are initialized randomly, and spectral clustering is also initialized randomly after the data transformation. The ML algorithm does not require any parameters.

Simulation results
The reliability of the clustering methods for each modulation format is measured as the percentage of correct classifications (cases in which the recognized modulation format agreed with the actual one). Figure 4 shows the results of the reliability analysis over the considered OSNR range. Each panel shows the outcome for a different modulation format: (a) BPSK, (b) QPSK, (c) 8-PSK, (d) 8-QAM, and (e) 16-QAM. The FEC limit, expressed as the OSNR that results in a bit error rate of $3.8 \times 10^{-3}$ at a symbol rate of 28 Gbaud, is indicated by the background color: red corresponds to OSNR below the FEC threshold and green to OSNR above it. The general observation is that most clustering algorithms achieve correct classification at viable OSNR values. However, the methods differ in the lowest OSNR they require to achieve a high detection rate. A surprising result for some algorithms is that reliability deteriorates as the OSNR increases. This is mainly due to their high sensitivity to parameter values, which could not be optimized simultaneously for all tested modulation formats and OSNR ranges and were therefore kept constant for all analyzed cases. Suboptimal initialization of the clusters also contributes to this behavior. Moreover, algorithms that assume isotropic (spherical) distance or probability density functions, in particular k-means and EM, can be further improved for operation in Stokes space [19]. Another interesting observation is that at low OSNR all algorithms except k-means converge to the smallest possible number of clusters, i.e., 2 for BPSK, thereby selecting BPSK by default. This explains the excellent reliability of these algorithms for BPSK at OSNR values as low as 0 dB, significantly below the FEC threshold. On the other hand, k-means tends to cluster noisy signals into the largest possible number of clusters, i.e., 60 for 16-QAM, because more clusters always yield smaller cost function values, at the risk of overfitting.
K-means and OPTICS achieve the best reliability among the conventional clustering algorithms for most modulation formats. However, in some cases, such as 16-QAM with OPTICS, reliability is poor at low OSNR. In such cases, combining both methods (k-means at low OSNR and OPTICS at high OSNR) results in high reliability for multiple modulation formats over the whole OSNR range.
The ML-based algorithm, on the other hand, performs well relative to the FEC limit for all modulation formats. No parameters are required for this method, and it performs very well for all noise powers. Figure 5 shows the minimum OSNR needed by each algorithm, measured as the lowest OSNR value at which reliability exceeds 95%. Missing bars marked with the * symbol represent cases in which clustering did not achieve 95% correct classifications at any OSNR value under test (the range under test is the same as in Fig. 4). By contrast, missing bars for BPSK indicate that the modulation format is already correctly recognized at 0 dB, owing to the preference of most algorithms for a low cluster count, as described earlier. Finally, the relative complexity of each implemented algorithm was estimated as being proportional to its runtime. The results presented in Fig. 6 were obtained by averaging the runtime and normalizing it with respect to the slowest case (spectral clustering with QPSK). Spectral clustering is the slowest method overall for all modulation formats studied, while the density-based and ML-based methods are the fastest. It should be noted that MFR does not need to run continuously. A dedicated side-processor could perform the MFR functionality on demand or periodically. To perform MFR efficiently, modulation format recognition could be triggered only upon network reconfiguration (fiber switching), which may be detected by monitoring the receiver for short loss-of-signal events.

Conclusions
Modulation format recognition is a key functionality of software-defined coherent receivers for future cognitive optical networks, in which the delay introduced by the control plane has to be minimized. Among the different alternatives, Stokes MFR stands out by offering modulation recognition at an early stage of the DSP chain, thus enabling the subsequent DSP algorithms to be optimized.
We have compared the performance of MFR in the proposed architecture using six clustering algorithms (k-means, EM, DBSCAN, OPTICS, spectral clustering, and ML-based) to discriminate between five dual-polarization modulation formats (BPSK, QPSK, 8-PSK, 8-QAM, and 16-QAM) in terms of OSNR performance, accuracy, and complexity. Classification without a priori information is shown to be possible over the whole OSNR range considered if a combination of methods is used. For example, k-means may be used at low OSNR and OPTICS at high OSNR, producing valuable results for all modulation formats except 8-PSK. These two methods offer a very good balance between OSNR performance and complexity, but require some knowledge of the approximate OSNR of the communication link.
Finally, we have proposed a new clustering method based on maximum likelihood, which does not require any initial parameters. The proposed algorithm offers a good trade-off between OSNR performance and complexity even when used alone.

Fig. 1. Constellations of polarization-multiplexed modulation formats (top row) and their corresponding Stokes space representations in the Poincaré sphere (bottom row).

Fig. 2. Visual representation of the steps for the ML-based algorithm, applied to discriminate between BPSK, QPSK and 8PSK polarization multiplexed modulation formats.

Fig. 3. Setup of the simulated optical communications system.Typical stages of DSP are shown with Stokes-based modulation format recognition step highlighted in green.

Fig. 4. Reliability results versus optical signal-to-noise ratio for the clustering algorithms.Average of 500 realizations.Green background represents the OSNR range higher than FEC limit for the corresponding modulation format at BER = 3.8 × 10 3 at 28 Gbaud.

Fig. 6. Relative complexity, evaluated in terms of algorithm runtime, for all clustering algorithms under test.