Automatic Model Determination for Quaternion NMF

Nonnegative Matrix Factorization (NMF) is a well-known method for Blind Source Separation (BSS). Recently, BSS for polarized signals in spectropolarimetric data, containing both polarization and spectral information, was introduced. This information was encoded in 4-dimensional Stokes vectors represented by quaternion numbers. In the proposed Quaternion NMF (QNMF), the common challenge of determining the (usually) unknown number of quaternion signals remained unaddressed. Estimating the number of signals (aka model determination) is important, since an underestimation of this number results in poor source separation and omission of signals, while overestimation leads to extraction of noisy signals without physical meaning. Here, we introduce a method for determining the number of polarized signals in spectropolarimetric data, named QNMFk. QNMFk integrates: (a) Quaternion Alternating Direction Method of Multipliers (QADMM) implemented for QNMF, (b) random resampling of the initial quaternion data, and (c) custom clustering of sets of QADMM solutions with same number of sources, k, needed to estimate the stability of the solutions. The appropriate latent dimension is determined based on the stability of the solutions. We demonstrate that, without any prior information, QNMFk accurately extracts the correct number of signals used to generate synthetic quaternion datasets and a benchmark spectropolarimetric data.


INTRODUCTION
P OLARIZATION is a property of transverse waves that describes oscillations perpendicular to the direction of propagation of the wave. Modern sensors allow us to capture both the spectral signatures and the polarization information of different wave types, such as, electromagnetic waves [1], seismic waves [2], acoustic waves [3], and many others. The polarized wave includes two real-valued signals (u(t), v(t)) which can be represented as a complex function φ(t) ∈ C 2 ; where, φ(t) = u(t) + iv(t) [4]. The complex function, φ(t), describes an elliptical trajectory and the corresponding wave field is called elliptically polarized. When the direction of the arrival of such wave field is known, a standard description of the polarization ellipse can be done by the Stokes parameters, S 0 , S 1 , S 2 , and S 3 , which are measurable quantities [5].
Recently, it has become clear that obtaining both spectral and polarization information often leads to new results and understanding in various applications [6]- [10]. This new experimental technique, that involves recording the Stokes vector (S 0 , S 1 , S 2 , S 3 ) for every pixel of the image to obtain such information, is called spectropolarimetric imaging.
In practice, spectropolarimetric imaging information usually contains mixtures of multiple polarized sources which remain latent (not directly observed). To infer these original sources solely from data, one needs the power of modelfree unsupervised blind source separation (BSS) techniques [11]. BSS separates the source signals from their observed mixtures with no additional information about the original signals [12], [13]. BSS includes factorization methods, such as, Independent Component Analysis [14] and Non-negative Matrix Factorization (NMF) [15], which decompose the observational matrix (containing mixture of the original signals) into a product of two low-rank matrices, called factors. One of these factor matrices contains the original sources and the other -their activities. A unique advantage of NMF is that the extracted latent sources are explainable because they are parts of the original data [16]. The identification of the (usually unknown in practice) number of original sources is an important step in BSS. There are various heuristics used to find this important number. The Automatic Relevance Determination (ARD) method, originally introduced for neural networks by MacKay [17] and then adapted for probabilistic interpretation of NMF by Fevotte and Tan [18] is one of the popular Bayesian techniques for determination of the latent dimensionality.
Recently, Quaternion Non-Negative Matrix Factorization (QNMF) was introduced as a new BSS method for analyzing of wide-band signals and frequency-dependent polarization fields present in spectropolarimetric data [19]. QNMF is a generalization of NMF that is able to work with polarized signals. QNMF relies on quaternion numbers to encode polarization information and the physical constraints of Stokes parameters. However, to obtain meaningful and explainable factors, the introduced QNMF needs prior information of the number of the latent polarized (quaternion) sources in the system -information that is usually unavailable.
In this paper, we introduce a method for estimating the true number, k, of original quaternion sources, whose mixture generates the analyzed data. We refer to this method as QNMFk. QNMFk is a generalization of a recent method for automatic model determination, called NMFk, for quaternion matrices. When applied to a large number of synthetic datasets with predetermined latent dimensions, NMFk has demonstrated a superior performance in comparison to other heuristics [20]. QNMFk determines the latent dimension based on the cluster stability of QNMF solutions of a set of random quaternion matrices with mean value the original observational data. Thus, QNMFk utilizes a specific quaternion bootstrap and determines the unknown number of latent sources based on stability of QNMF solutions (with different number of quaternion sources, k) without prior assumptions. We demonstrate that QNMFk accurately determines the number of latent sources and factors in several synthetically generated polarized signals as well in a previous benchmark quaternion data with predetermined number of polarized sources.

POLARIZATION AND STOKES PARAMETERS
Stokes parameters provide a characterization of polarized waves by means of four intensity parameters S 0 , S 1 , S 2 , S 3 . Here, S 0 is the total intensity of the wave while S 1 , S 2 , S 3 describe the polarization. The degree of polarization P ∈ [0, 1] is given by P = √ where P = 1 corresponds to a completely polarized wave, P = 0 to completely unpolarized wave, and anything in between is deemed partially polarized. Stokes parameters can be represented by a 4-dimensional vector S = (S 0 , S 1 , S 2 , S 3 ) ∈ R 4 known as the Stokes vector. Physical properties of waves lead to algebraic constraints on the space of Stokes vectors. The first constraint (i) S 0 ≥ 0 indicates that there cannot be a negative intensity wave, while the second constraint (ii) S 2 0 ≥ S 2 1 +S 2 2 +S 2 3 , imposes that the total wave intensity is greater than or equal to the polarized components magnitude.

QUATERNIONS
Instead of considering Stokes vectors as elements in R 4 , every Stokes vector can be represented algebraically as a single quaternion number in H. This number completely describes the polarization state and encodes the physical constraints of the parameters. This encoding of the data allows us to represent spectropolarimetric data as an M × N matrix with quaternion elements. Formally, quaternion numbers are defined as the set, [21]. For any q ∈ H, the real part of q is denoted by Re(q) = a while the imaginary part (or vector part) is Im(q) = b i + c j + d k. Quaternions have component-wise addition and component-wise scalar multiplication. The multiplicative group structure is known as the Hamilton product and is given by the distributive law and multiplication of basis elements {1, i, j, k}. The Hamilton product is associative but not commutative. Similarly to the complex numbers, the conjugate of q is defined as the To be admissible as a Stokes vector, the parameters must satisfy constraints (i) and (ii) imposed by the data as stated in the section 2.1. This dictates that we work with quaternions q = a + bi + cj + dk which satisfy the non-negativity constraints S given by S : a > 0 and b 2 + c 2 + d 2 ≤ a 2 . Therefore, we will be interested in a 'non-negative' subset H S ⊂ H given by H S = {q|Re(q) > 0 and |Im(q)| 2 ≤ |Re(q)| 2 }. The identification of Stokes vectors to this constrained set is crucial for our studies as it formally generalizes the notion of 'non-negativity' to quaternion numbers. H S is isomorphic to the set of 2 × 2 non-negative (or Positive Semi-Definite) Hermitian matrices K ∈ + given by the map For an arbitrary quaternion q ∈ H, we can project it to H S by considering its associated 2x2 complex matrix defined above and projecting to K ∈ + . The projection is given by where η i and v i are the i th eigenvalue and eigenvectors. Then, our projection is given by Flamant et al. [19] introduced a QNMF algorithm, which is viable when the underlying number of polarized sources, k, is known beforehand. The original paper relies on a  alternating least squares method to approximate the factorized matrices. Alternatively, one can employ a quaternion alternating direction method of multipliers (QADMM) [22]. From our testing, we found that QADMM provides faster and more consistent results. This is valuable since our QNMFk algorithm, which determines k, requires computing multiple factorizations with different values of k.

QUATERNION NMF
Quaternion Non-negative Matrix Factorization (QNMF) finds the original polarized sources from the observed data matrix, X, where every entry of X is a superposition/mixture of these original polarized sources. This means that if we have M wavelengths and N sensors, for some wavelength λ and sensor u, the polarized signal x(λ, u) is assumed to be linear combination of K < min{4M, N } distinct basis sources w 1 , ..., w K ∈ H S and non-negative activations h 1 , ..., h K ∈ R, i.e., (1)

QUATERNION ADMM UPDATES
ADMM is an optimization procedure used to solve convex problems by breaking the problem into easier subproblems [23].
where χ H S is the characteristic function indicating if the matrix adheres to the Stokes constraint, and χ + is the nonnegative characteristic function.

QNMF WITH AUTOMATIC MODEL DETERMINATION
The QADMM algorithm is adequate for extraction of latent polarized sources when their true number k is known, however this information is not typically known in practice. To determine this (usually unknown) number, we introduce here, QNMFk, a method that integrates: (a) QADMM minimization, (b) random resampling of the initial data, and (c) custom clustering. For the determination of the unknown number of latent features we follow the ideas of a recent NMF algorithm [24]- [26]. Algorithm 4 details this procedure and a flowchart of QNMFk can be seen in Figure 1.

Algorithm 4 QNMFk
Input: X, ρ W , ρ H Output: k for k = 1, . . . , min(4M, N ) − 1 do for r = 1, . . . , R do X r ← Π H S (X + δ r ) W r , H r ← QADM M (X r , k, ρ w , ρ H ) end for C k ← Cluster(k, W 1 , . . . , W R ) s k ← SilhouetteStatistics(C k ) end for Based on the original data, QNMFk creates an ensemble of randomly perturbed quaternion matrices on which it performs QADMM minimization. While QNMFk can explore all possible numbers of quaternion sources, k = 1, ..., min(4M, N ) − 1, in practice this range of investigated k values can be tighter. For each k, a set of R ∼ 100 minimizations is computed with random initial conditions. Thus, each QADMM minimization works on a random quaternion matrix: X r = Π H S (X + δ r ), where each δ r is a small uniformly distributed quaternion random perturbation centered at 0. After we construct the set of R solutions (with the explored k), the k columns of all quaternion feature solutions, {W 1 , . . . , W R }, are subjected to custom clustering into k clusters. This custom clustering is similar to k-means in that it iteratively assigns vectors to centroids but it differs in that it holds that each one of the k clusters must contain exactly one column from each W r matrix (for this k) [20]. These conditions need to be enforced since each QADMM minimization contributes one solution W r with the same number of columns, k, and accordingly it has to supply exactly one element to each of the k clusters. In the clustering, the similarity between the vectors is measured with cosine similarity, defined as cos(θ) = u·v u v for two vectors u and v. After each iteration of clustering, the centroids of each cluster are computed as the mean of assigned vectors and the process is iterated until convergence. Finally, to determine the number of latent features, QNMFk calculates the stability of the clusters for each k, via Silhouette statistics, s k [27]. Silhouette statistics quantifies (a) the cohesion and (b) the separability of clusters. The Silhouettes measure the average similarity between elements belonging to the same cluster (cohesion) and compare it to the similarity of the other clusters (separability). The average Silhouette value, s k , ranges from -1 to +1, where +1 indicates that the vector matches to its assigned cluster well and poorly matches the neighboring clusters, while -1 shows that the vector does not belong to the cluster it was assigned by the clustering. The number of latent quaternion sources, k, is determined as the maximum number, k, of stable clusters. The centroids of the clusters for the determined latent dimension can be used to reconstruct the initial quaternion matrix, X.

EXPERIMENTS
Here, we demonstrate and validate QNMFk on datasets with predetermined patterns and numbers of sources. The application to synthetic data allows the validation of both the latent dimension, as well as correct feature extraction. First, we show QNMFk applied to synthetically constructed data where the polarization orientations contain the six primary polarization orientations. Additionally, we examine QNMFk applied to several randomly generated datasets, as well as to spectropolarimetric data generated in [19]. In all experiments a grid search was conducted to determine the convergence parameters. The convergence parameters for ρ W = ρ H that minimized the objective function after 1000 iterations were selected independently for each latent dimension.

Synthetic Constructed Data
To display and test our methodology, we first apply it to a synthetic matrix indexed by four wavelength channels and one hundred sensors, X ∈ H 4×100 . Each sensor's observed signal was generated as a linear combination of six constructed latent signals. The six latent signals were 100% polarized and correspond to vertical, horizontal, 45 degree, -45 degree, clockwise, and counterclockwise polarization schemes in pairs of wavelength channels. The polarization ellipses of the six generative vectors are depicted in Figure 2a. Random exponential combinations of these six signals were added together to simulate the observed signal of each of the 100 sensors in the matrix. We applied QNMFk to the resulting matrix to investigate latent dimensions k ∈ [1,9].    Figure 2b depicts the silhouette and relative error selection criteria metrics. The results clearly indicate stable clustering at the correct latent dimension, k = 6, as the silhouette score is approximately 1, while at k = 7 the silhouette score drops significantly. To verify that the extracted features are correct Figure 2c depicts the cosine similarities between the recovered features and the generative features, after the recovered features were permuted to correspond with the generative features. The cosine similarities close to 1 on the diagonal indicate that the extracted features are functionally identical to the generative features. The cosine similarities close to zero show the orthogonal nature between the pairs of signals in the synthetic construction.

Synthetic Random Data
To test QNMFk on less structured data we constructed random datasets with predetermined number of sources. This was accomplished by independently synthesizing quaternion feature matrices W ∈ H m×k S and activation matrix H ∈ R k×n + . Each element of W was sampled as a complex 2 × 2 matrix from the complex Wishart distribution CW(I, 2, 2) [28]. This distribution, which is is a generalized multidimensional gamma distribution, was chosen as it has a support of positive semidefinite matrices with a reasonable eigenvalue distribution, and therefore the sampled elements adhere to the physicality constraints of the Stokes parameters. Each element of H was sampled from the uniform distribution U(0, 1) to ensure non-negativity. Figure 1, panel a, depicts the convergence curves of the QADMM implementation for parameter ρ. Figure 1, panels b, c d, e, and f depict the average Silhouette, S, and relative error, R, plots for 5 different latent dimension constructed with M = 50 and N = 100. We see that at the right latent dimensions, the Silhouette is close to 1, and with a small relative error, thus selecting the correct number of predetermined latent sources in each of the synthetic datasets. VOLUME , 2016 (a) Silhouette scores, S, in red, and relative errors, R, in blue, with varying latent dimension for spectropolarimetric data. The identified latent dimension, dashed black line, matches the true latent dimension K = 3.
(b) Spatial activation diagrams of the three polarized sources. The panels show: the originals, reconstructions, and differences. FIGURE 4: Analysis of spectropolarimetric data generated in [19].

Spectropolarimetric Data
Finally, we tested our method on the spectropolarimetric data generated in Ref. [19]. This data is contained in a wavelength by spatial index matrix and is a linear combination of three fully polarized sources. From Figure 4a we see that our method correctly identifies that there are three polarized sources. Figure 4b shows comparison of the ground truth activations with QNMFk extracted activations. Finally, the basis latent quaternion vectors, we found, also coincide with these used to construct the data, showing Pearson correlations > 0.9999 with the ground truth sources.

CONCLUSION
In this work, we introduced and demonstrated the effectiveness of a method for determining the number of latent quaternion sources based on their stability. QNMFk utilizes a QADMM version of previously introduced QNMF minimization, integrated with custom quaternion bootstrapping and clustering algorithm to statistically compare factorization models with different number of latent quaternion sources. We showed that QNMFk correctly identifies the number of latent quaternion sources in multiple synthetic data sets. To further validate the use of quaternion based models, additional research on quaternion domain optimization as well as experiments on real world spectropolarimetric datasets is required. Current results, however, indicate a promising direction for quaternion based modeling in signal processing.
GIANCARLO SANCHEZ is currently pursuing his PhD in Applied Mathematics at Florida International University. He was an intern at Los Alamos National Labratory for the Applied Machine Learning Summer Research program. He is currently pursing research in machine learning algorithms for approximation of high dimensional PDEs.
ERIK SKAU received his PhD in Applied Mathematics from North Carolina State University in 2017. His research expertise includes optimization techniques for matrix and tensor decompositions. Erik is a scientist in the Information Sciences Group as Los Alamos National Laboratory.
BOIAN ALEXANDROV is a scientist at the Theoretical Division in Los Alamos National Laboratory. He has MS in Theoretical Physics, a PhD in Nuclear Engineering and second PhD in Computational Biophysics. Alexandrov is specialized in Big Data analytics, non-negative matrix and tensor factorization, unsupervised learning, and latent features extraction. VOLUME , 2016