The Mueller matrix cone and its application to filtering

We show that there is an isometry between the real ambient space of all Mueller matrices and the space of all Hermitian matrices which maps the Mueller matrices onto the positive semidefinite matrices. We use this to establish an optimality result for the filtering of Mueller matrices, which roughly says that it is always enough to filter the eigenvalues of the corresponding "coherency matrix". We then explain how knowledge of the cone of Hermitian positive semidefinite matrices can be transferred to the cone of Mueller matrices, with special emphasis on optimisation. In particular, we suggest that means of Mueller matrices should be computed within the corresponding Riemannian geometry.


Introduction
In polarisation optics, Mueller matrices are of great importance, as they describe, in a linear fashion, the change of polarisation of light after interaction with a medium. To be a Mueller matrix, a matrix has to satisfy the Stokes criterion, which states that every Stokes vector has to be mapped onto a Stokes vector. Cloude then showed in [9] that Mueller matrices can be associated with Hermitian matrices with non-negative eigenvalues, the so-called coherency or covariance matrices. This was then used for filtering measured matrices in order to make them physically meaningful, i.e. satisfying the Stokes criterion. Moreover, it was shown that the coherency matrix of any non-depolarising Mueller matrix, also known as a Stokes-Mueller matrix, has only one non-zero eigenvalue. Via the spectral decomposition of the coherency matrix, this easily implies that any Mueller matrix is the sum of at most four non-depolarising matrices. In [19] and [21], matrices which can be decomposed into a non-depolarising part and a perfectly depolarising part have been analysed, and in the latter a filtering method was proposed. Optimality of filtering was analysed in [1] using a maximum likelihood method originally developed for quantum process tomography, which, like [13], relies on the Cholesky decomposition of the coherency matrix for filtering. Further results on the optimal filtering of Mueller matrices were derived in [7], [4], [16] and [13]. In [14], the optimality of the Cloude filter was then rigorously proved.
The purpose of this work is to connect the methodologies for filtering measured Mueller matrices to well-established mathematical theories. We show how this can be used to prove a more general theorem about the optimality of filtering of Mueller matrices, which simplifies and generalises part of the results of [14]. Moreover, we review the mathematical theory of the Hermitian positive semidefinite cone and explain, while reviewing existing results, how it gives rise to the differential geometry of the manifold of all Mueller matrices.

Isometry of the ambient space
In this section we explain how a well-known result about the connection between Mueller matrices and Hermitian positive semidefinite matrices establishes an isometry between them. For that, we first restate the theorem which establishes this connection; it implicitly first appeared in [9].
Theorem 1. A real 4 × 4-matrix M = (m_ij) is a Mueller matrix if and only if the Hermitian matrix H = (h_ij) defined by the following linear equations has non-negative eigenvalues. Moreover, if H has only one non-zero eigenvalue, then M is non-depolarising.
Note that we have scaled the results of these linear equations by a factor of 2 in order to simplify the subsequent observations; this of course does not change the validity of the theorem.
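Since equations (1)–(4) are not reproduced here, the following Python sketch uses a hypothetical stand-in for them: H = ½ Σ m_ij σ_i ⊗ σ_j^*, with the Pauli matrices ordered (identity, z, x, y). The factor ½ (instead of the ¼ often used for Cloude's coherency matrix) is our guess at the rescaling by 2 mentioned above, and the text's ordering and sign conventions may differ. The names `coherency` and `is_physical` are illustrative only; the check is the eigenvalue criterion of Theorem 1.

```python
import numpy as np

# Assumed Pauli ordering (identity, z, x, y); a stand-in for equations (1)-(4).
SIGMA = [
    np.eye(2, dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
]
# B[i][j] = (1/2) sigma_i (x) conj(sigma_j), an orthonormal basis of C^{4x4}
B = [[0.5 * np.kron(SIGMA[i], SIGMA[j].conj()) for j in range(4)] for i in range(4)]


def coherency(M):
    """Hypothetical map T: real 4x4 matrix M -> Hermitian 4x4 matrix H."""
    return sum(M[i, j] * B[i][j] for i in range(4) for j in range(4))


def is_physical(M, tol=1e-10):
    """Criterion of Theorem 1: all eigenvalues of H are non-negative."""
    return bool(np.linalg.eigvalsh(coherency(M)).min() >= -tol)


print(is_physical(np.eye(4)))                  # True  (identity)
print(is_physical(np.diag([1.0, 0, 0, 0])))    # True  (ideal depolariser)
print(is_physical(np.diag([1.0, 1, 1, -1])))   # False (S3 -> -S3)
```

The last matrix maps Stokes vectors to Stokes vectors, yet it fails the test, which illustrates that the criterion of Theorem 1 is stronger than the Stokes criterion.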
What, to our knowledge, has not yet been discussed explicitly about the above result and the above equations is the following simple observation. The whole trick is to regard the Mueller matrices and the Hermitian matrices as vectors and then to conclude that the Frobenius inner product coincides with the Hermitian/Euclidean inner product.
Lemma 1. The linear map T : C^{4×4} → C^{4×4} given by equations (1)–(4) is unitary.
Proof. We may assume that we are working in C^16 by taking the canonical bijection from C^{4×4} to C^16. It is now enough to write down the complex 16 × 16-matrix T corresponding to equations (1), (2), (3) and (4). Computing with your favourite solver, one finds that the eigenvalues of T are 1 and −1 and that T†T = I, i.e. T is unitary. The nice thing about unitary operators is that they preserve the Hermitian inner product, i.e. we have ⟨x, y⟩ = ⟨T(x), T(y)⟩ for any x, y ∈ C^{4×4}. Hence, the Hermitian norm (which coincides with the Euclidean norm when all entries are real) is preserved under this map.
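The 16 × 16 matrix of the proof can be assembled numerically and its unitarity checked directly via T†T = I. This Python sketch uses a hypothetical stand-in for equations (1)–(4), namely H = ½ Σ m_ij σ_i ⊗ σ_j^* with Pauli order (identity, z, x, y), and row-major flattening; under this assumed convention the eigenvalues of T depend on how entries are ordered, whereas unitarity does not.

```python
import numpy as np

# Assumed Pauli ordering (identity, z, x, y); a stand-in for equations (1)-(4).
SIGMA = [
    np.eye(2, dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
]
B = [[0.5 * np.kron(SIGMA[i], SIGMA[j].conj()) for j in range(4)] for i in range(4)]


def coherency(M):
    """Hypothetical map T acting on 4x4 matrices."""
    return sum(M[i, j] * B[i][j] for i in range(4) for j in range(4))


# Column 4*i + j of T is the flattened basis matrix B_ij, so that
# T @ M.reshape(16) equals coherency(M).reshape(16) (row-major order).
T = np.column_stack([B[i][j].reshape(16) for i in range(4) for j in range(4)])

# Unitarity: T^H T = I.
print(np.allclose(T.conj().T @ T, np.eye(16)))              # True

# Hence <x, y> = <T x, T y> for arbitrary complex vectors x, y.
rng = np.random.default_rng(0)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
y = rng.standard_normal(16) + 1j * rng.standard_normal(16)
print(np.isclose(np.vdot(x, y), np.vdot(T @ x, T @ y)))     # True
```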
Moreover, by Theorem 1 we know that T maps the set of all Mueller matrices to the set of all positive semidefinite Hermitian matrices. We further investigate some properties of the map T. We denote by C the R-vector space of all Hermitian 4 × 4-matrices and by R the R-vector space of all real 4 × 4-matrices, both with the usual trace inner product.

Lemma 2. The restriction T↾R of the linear map T (as defined in Lemma 1) is a non-singular orthogonal linear transformation from R to C.
Proof. We first consider T as a map from R to the real 4 × 8-matrices (identifying each complex number with an element of R^2). The space of all Hermitian matrices can then be considered a 16-dimensional subspace of R^{4×8}. By Theorem 1, T maps real matrices to Hermitian matrices, and since T is unitary by Lemma 1 (its eigenvalues being 1 and −1), it preserves the real trace inner product. Hence T↾R is orthogonal and, as R and C both have dimension 16, non-singular. Now the next result follows by Theorem 1.

Corollary 1. The map T is an isometry on C^{4×4} (with the Hermitian norm) and an isometry between R^{4×4} and C (with the Euclidean norm) which maps the set of all Mueller matrices onto the set of all positive semidefinite Hermitian matrices.
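Corollary 1 can be checked numerically on examples. The Python sketch below assumes the same kind of hypothetical stand-in for T as a concrete convention (H = ½ Σ m_ij σ_i ⊗ σ_j^*, Pauli order identity, z, x, y), together with its inverse m_ij = ⟨B_ij, H⟩; the helper names are illustrative. It confirms that a real matrix is sent to a Hermitian one of the same Frobenius norm, and that every positive semidefinite matrix is reached from a real matrix.

```python
import numpy as np

SIGMA = [
    np.eye(2, dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
]
# Orthonormal basis B_ij = (1/2) sigma_i (x) conj(sigma_j) (assumed convention)
B = [[0.5 * np.kron(SIGMA[i], SIGMA[j].conj()) for j in range(4)] for i in range(4)]


def coherency(M):
    """Hypothetical T: real 4x4 -> Hermitian 4x4."""
    return sum(M[i, j] * B[i][j] for i in range(4) for j in range(4))


def mueller(H):
    """Inverse map: m_ij = <B_ij, H>, real for Hermitian H."""
    return np.array([[np.trace(B[i][j].conj().T @ H).real for j in range(4)]
                     for i in range(4)])


rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))          # arbitrary real matrix
H = coherency(M)
print(np.allclose(H, H.conj().T))                          # True: Hermitian
print(np.isclose(np.linalg.norm(H), np.linalg.norm(M)))    # True: same norm
print(np.allclose(mueller(H), M))                          # True: T^{-1}(T(M)) = M

# Every positive semidefinite P is T of a real (Mueller) matrix.
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
P = Z @ Z.conj().T
print(np.allclose(coherency(mueller(P)), P))               # True
```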

Optimal filtering revisited
We are now able to translate the following problem into a question about Hermitian matrices. Given a real 4 × 4-matrix M (a Mueller matrix obtained from a measurement), what is the nearest (in terms of the Euclidean distance) physically feasible Mueller matrix? The same can be asked about the nearest non-depolarising matrix to a given measurement. By Lemma 1, Lemma 2 and Corollary 1, both questions translate into the question: what is the nearest positive semidefinite matrix (of rank 1 in the non-depolarising case) to a given Hermitian matrix? The answer to the first question was already given in [14], implicitly by answering the second, but we can now rely on well-established mathematical theory to show this. We will further derive a more general result and apply it to a further case.
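Translated to Hermitian matrices, both nearness questions reduce to operations on the eigenvalues of T(M): clipping negative eigenvalues for the nearest Mueller matrix, and keeping only the largest (clipped) eigenvalue for the nearest non-depolarising matrix. The Python sketch below assumes a hypothetical stand-in for T (H = ½ Σ m_ij σ_i ⊗ σ_j^*, Pauli order identity, z, x, y) and an invented noisy measurement; the function names are illustrative.

```python
import numpy as np

SIGMA = [
    np.eye(2, dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
]
B = [[0.5 * np.kron(SIGMA[i], SIGMA[j].conj()) for j in range(4)] for i in range(4)]


def coherency(M):
    return sum(M[i, j] * B[i][j] for i in range(4) for j in range(4))


def mueller(H):
    return np.array([[np.trace(B[i][j].conj().T @ H).real for j in range(4)]
                     for i in range(4)])


def nearest_mueller(M):
    """Nearest Mueller matrix: clip negative eigenvalues of T(M) to 0."""
    a, U = np.linalg.eigh(coherency(M))          # T(M) = U diag(a) U^H
    return mueller((U * np.clip(a, 0, None)) @ U.conj().T)


def nearest_nondepolarising(M):
    """Nearest matrix whose coherency matrix is PSD of rank <= 1."""
    a, U = np.linalg.eigh(coherency(M))          # eigenvalues in ascending order
    u = U[:, -1]                                 # eigenvector of the largest one
    return mueller(max(a[-1], 0.0) * np.outer(u, u.conj()))


rng = np.random.default_rng(2)
M_meas = np.eye(4) + 0.1 * rng.standard_normal((4, 4))   # hypothetical measurement
M_phys = nearest_mueller(M_meas)
print(np.linalg.eigvalsh(coherency(M_phys)).min() >= -1e-10)   # True
```

Because T↾R is orthogonal, the Euclidean distance between Mueller matrices equals the Frobenius distance between their coherency matrices, which is why the projection can be done entirely on the Hermitian side.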
Notation: by [a] we denote the diagonal matrix with entries a_n ≤ … ≤ a_1, and by U_n the set of all unitary n × n-matrices.
The following is in fact true for any unitarily invariant matrix norm ‖·‖, such as the Hermitian norm. It can be considered a Hermitian version of Theorem 4.5 of [22].
Further, let S_Y be the set
Given some Hermitian A = U†[a]U with a_1 ≥ … ≥ a_n ≥ c and U ∈ U_n, we then have that, for some b ∈ Y, the following holds:
Proof. The proof of Theorem 4.5 in [22] can easily be modified. Conclude in the same way that there exists some
for any x ∈ Y. Since our matrices are Hermitian, we then know by Theorem 2 of [31] that
This theorem, together with Corollary 1, now lets us translate any nearness problem for Mueller matrices into a nearness problem for eigenvalues.
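The step from nearness of Hermitian matrices to nearness of their eigenvalues can be illustrated numerically. We do not know the exact statement of Theorem 2 of [31]; as an assumption, this Python sketch checks the Hoffman–Wielandt inequality ‖A − B‖_F ≥ ‖a − b‖_2 for Hermitian A, B whose eigenvalue vectors a, b are sorted in the same order, and shows equality when B shares the eigenvectors of A. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)


def random_hermitian(n=4):
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (Z + Z.conj().T) / 2


def eigen_gap(A, B):
    """Euclidean distance between the sorted eigenvalue vectors of A and B."""
    return np.linalg.norm(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B))


A, B = random_hermitian(), random_hermitian()
print(np.linalg.norm(A - B) >= eigen_gap(A, B))     # True

# Equality when B = U diag(b) U^H uses the eigenvectors U of A.
a, U = np.linalg.eigh(A)
b = np.sort(rng.standard_normal(4))                 # ascending, like eigh
B_aligned = (U * b) @ U.conj().T
print(np.isclose(np.linalg.norm(A - B_aligned), np.linalg.norm(a - b)))   # True
```

This is why, for a constraint set that only restricts eigenvalues, the nearest matrix can be taken to share the eigenvectors of A, leaving a problem about eigenvalue tuples alone.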
Then the nearest Mueller matrix in M_Y to M, in terms of the Euclidean norm, is the matrix
Hence, we can now easily conclude that the filtering proposed by Cloude [9] is optimal. For this, let M be the measured matrix and let T(M) = U†[a]U have minimal eigenvalue c. We further assume that c < 0, as otherwise no filter has to be applied. Let [a′] be the tuple obtained by setting all negative eigenvalues in [a] to 0; applying Corollary 2 with Y the set of all tuples with non-negative entries then shows that the matrix with coherency matrix U†[a′]U, i.e. the Cloude-filtered matrix, is the nearest Mueller matrix to M. As the set
is the set of all Mueller matrices which can be decomposed into a non-depolarising part and a perfectly depolarising part, the question of the best estimate of a measured matrix by a matrix with this type of decomposition can be answered by applying Corollary 2 with E (which is closed). Now, if a_1 ≥ a_2 ≥ a_3 ≥ a_4 are the eigenvalues of T(M), then b = (a_1, c, c, c) with c = (1/3) Σ_{i=2}^{4} a_i is the best estimate in E. Hence, for a measurement M, the best estimate in E is obtained from T(M) = U†[a]U by replacing [a] with [b] and mapping back via T^{-1}. This also shows that Equation (17) of [21] is in fact the best a priori estimate for the perfectly depolarising part, contrary to what was stated in that paper.
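The estimate b = (a_1, c, c, c) can be sketched in Python. Replacing the three smallest eigenvalues by their mean gives U diag(c, c, c, a_1) U^H = (a_1 − c) u u^H + c I, i.e. a rank-1 part plus a multiple of the identity (the coherency matrix of an ideal depolariser). The sketch assumes a hypothetical stand-in for T (H = ½ Σ m_ij σ_i ⊗ σ_j^*, Pauli order identity, z, x, y) and invented test data; following the formula in the text, c is not clipped at 0.

```python
import numpy as np

SIGMA = [
    np.eye(2, dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
]
B = [[0.5 * np.kron(SIGMA[i], SIGMA[j].conj()) for j in range(4)] for i in range(4)]


def coherency(M):
    return sum(M[i, j] * B[i][j] for i in range(4) for j in range(4))


def mueller(H):
    return np.array([[np.trace(B[i][j].conj().T @ H).real for j in range(4)]
                     for i in range(4)])


def nearest_in_E(M):
    """Best estimate of the form 'non-depolarising part + ideal depolariser'."""
    a, U = np.linalg.eigh(coherency(M))   # ascending: a[3] is the largest (a_1)
    c = a[:3].mean()                      # c = (a_2 + a_3 + a_4) / 3 in the text
    u = U[:, 3]
    # U diag(c, c, c, a_1) U^H = (a_1 - c) u u^H + c I
    H = (a[3] - c) * np.outer(u, u.conj()) + c * np.eye(4)
    return mueller(H)


rng = np.random.default_rng(4)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
M_true = mueller(np.outer(v, v.conj()) + 0.1 * np.eye(4))   # hypothetical E-type matrix
M_meas = M_true + 0.05 * rng.standard_normal((4, 4))         # hypothetical noise
M_est = nearest_in_E(M_meas)
print(np.round(np.linalg.eigvalsh(coherency(M_est)), 4))    # three equal smallest values
```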

Geometry of the semidefinite cone
The reader may wonder why we could so easily compute the nearest Mueller matrix to a given real matrix or, equivalently, the nearest semidefinite matrix to a given Hermitian matrix. This ultimately has to do with the nature of the set of all complex positive semidefinite matrices. It is a thoroughly studied object, known as the complex semidefinite cone (an instance of the more general symmetric cones), and it is used, among other things, in complex semidefinite programming. Many useful properties, such as convexity, are known. In fact, it is a convex cone, so it is closed under non-negative linear combinations, i.e. αH_1 + βH_2 is positive semidefinite whenever H_1, H_2 are positive semidefinite and α, β ≥ 0. By linearity of T, the set of all Mueller matrices is therefore a convex cone as well. It is also known what its interior (all positive definite matrices) and its boundary (all singular positive semidefinite matrices) are. Moreover, there is a Riemannian metric tensor on its interior (see [18] and Chapter 6 of [5]).
Now the practitioner can use this knowledge and take advantage of readily available tools and mathematical theory. For example, take a subset S of Mueller matrices and a function f : S → R one wants to optimise. We have just seen such functions, namely the distance of the Mueller matrices (or of certain subsets of them) to a given measurement M. As before, we can translate the problem by optimising the map f ∘ T^{-1} from T(S) to R instead. This can be done either by finding suitable theory about the semidefinite cone, such as Theorem 2, and then solving the problem directly, or, more generally, by using available tools for solving optimisation problems. As a start, one would transform the complex optimisation problem into a real one (with tools such as YALMIP [23]), although voices have been raised in favour of optimising directly over the complex numbers [15].
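As a small instance of optimising f ∘ T^{-1} over the cone when no closed-form result applies, the Python sketch below fits a Mueller matrix to a measurement under a weighted least-squares objective f(X) = Σ W_ij (X_ij − M_ij)² by projected gradient descent. The swapped-in technique (projected gradient with step 1/L, projection through the PSD cone by eigenvalue clipping), the weights and the data are all our assumptions, as is the stand-in for T (H = ½ Σ m_ij σ_i ⊗ σ_j^*, Pauli order identity, z, x, y). The projection is valid because T↾R is orthogonal.

```python
import numpy as np

SIGMA = [
    np.eye(2, dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
]
B = [[0.5 * np.kron(SIGMA[i], SIGMA[j].conj()) for j in range(4)] for i in range(4)]


def coherency(M):
    return sum(M[i, j] * B[i][j] for i in range(4) for j in range(4))


def mueller(H):
    return np.array([[np.trace(B[i][j].conj().T @ H).real for j in range(4)]
                     for i in range(4)])


def project_to_mueller(X):
    """Euclidean projection onto the Mueller cone, computed on the PSD side."""
    a, U = np.linalg.eigh(coherency(X))
    return mueller((U * np.clip(a, 0, None)) @ U.conj().T)


def f(X, M, W):
    return float(np.sum(W * (X - M) ** 2))


def weighted_fit(M, W, iters=500):
    """Projected gradient descent for f over the Mueller matrices."""
    step = 1.0 / (2.0 * W.max())       # 1/L, L = Lipschitz constant of grad f
    X = project_to_mueller(M)          # feasible start: the Cloude-filtered matrix
    for _ in range(iters):
        X = project_to_mueller(X - step * 2.0 * W * (X - M))
    return X


rng = np.random.default_rng(5)
M_meas = np.diag([1.0, 0.9, 0.9, -0.95]) + 0.05 * rng.standard_normal((4, 4))
W = rng.uniform(0.5, 2.0, size=(4, 4))     # hypothetical per-entry weights
X = weighted_fit(M_meas, W)
print(f(X, M_meas, W) <= f(project_to_mueller(M_meas), M_meas, W))   # True
```

With step 1/L the objective cannot increase from one iterate to the next, so the result is never worse than the Cloude-filtered starting point under the weighted objective; with uniform weights the iteration stays at the Cloude filter.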
In any case, there are many software tools available for computing the optimum of a function on the complex or real semidefinite cone, such as Manopt [8], Pymanopt [28] and SeDuMi [27].
We highlight one approach to characterising the space of semidefinite matrices of some fixed rank, taken from [32] and [30], which is also described in the code of [8] and [28]. If the rank is 1, then this space is in correspondence via T with the non-depolarising Mueller matrices. The differential geometry of the non-depolarising Mueller matrices was already studied in [12]. We now outline the differential geometry of the Hermitian positive semidefinite cone.
A positive semidefinite matrix H ∈ C^{4×4} of rank k can be written as an outer product Y Y† of a full-rank matrix Y ∈ C^{4×k}. Conversely, any such outer product Y Y† is positive semidefinite and of rank k. As in [32], we define an equivalence relation on C^{4×k} by identifying Y U with Y for all unitary k × k-matrices U (as the outer product does not change, i.e. Y Y† = Y U (Y U)†). We denote the manifold of all full-rank matrices in C^{4×k} by C_*^{4×k}. If U(k) denotes the Lie group of all unitary k × k-matrices, the quotient manifold theorem shows that C_*^{4×k}/U(k) is a Riemannian quotient manifold.
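The factorisation H = Y Y† and the invariance under Y ↦ Y U can be sketched in Python. Here a factor Y is obtained from the eigendecomposition of H (one possible choice of representative, not necessarily the one used in [32]), and a random unitary is generated from a QR decomposition; the helper names `factor` and `random_unitary` are hypothetical.

```python
import numpy as np


def factor(H, k):
    """Return Y in C^{4xk} with Y Y^H = H, for a PSD H of rank k."""
    a, U = np.linalg.eigh(H)                       # ascending eigenvalues
    return U[:, -k:] * np.sqrt(np.clip(a[-k:], 0, None))


def random_unitary(k, rng):
    """Unitary k x k matrix from the QR decomposition of a random matrix."""
    Z = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    Q, _ = np.linalg.qr(Z)
    return Q


rng = np.random.default_rng(6)
k = 2
Y = rng.standard_normal((4, k)) + 1j * rng.standard_normal((4, k))
H = Y @ Y.conj().T
V = random_unitary(k, rng)

# Y and Y V represent the same point of the quotient C_*^{4xk} / U(k).
print(np.allclose((Y @ V) @ (Y @ V).conj().T, H))             # True
print(np.linalg.matrix_rank(H, tol=1e-10 * np.linalg.norm(H)))  # 2
```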
One can note a striking similarity to the Cholesky decomposition, which is used in [1] and [13]. In particular, for positive definite matrices the Cholesky decomposition is unique, and C_*^{4×4} can be replaced by the set of triangular matrices with positive real diagonal entries. In the case k < 4, one can find a unique decomposition after a suitable permutation [17], and hence one would end up with a finite-to-one map (bounded by 24, the number of 4 × 4 permutation matrices). For all k, the metric of the manifold is given by the real trace inner product ⟨Y, Z⟩ = Re tr(Y†Z), identifying the complex numbers with R^2. Moreover, when k = 1, we can find a representative of each equivalence class by requiring that the first non-zero entry of the vector c ∈ C^4 is a positive real number. Since C^4 has real dimension 8 and this normalisation removes exactly one real degree of freedom (the phase), the dimension is 7.
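Both normalisations can be illustrated in Python: for k = 1 the phase is fixed by rotating the first non-zero entry onto the positive real axis, and in the positive definite case `numpy.linalg.cholesky` returns a lower triangular factor with positive real diagonal. The helper name `canonical_rank1` and the tolerance used to decide which entries count as non-zero are our own choices.

```python
import numpy as np


def canonical_rank1(c, tol=1e-12):
    """Representative of {c e^{i phi}}: first non-zero entry real and positive."""
    idx = np.flatnonzero(np.abs(c) > tol)[0]
    return c * (np.abs(c[idx]) / c[idx])       # rotate that entry's phase to 0


rng = np.random.default_rng(7)
c = rng.standard_normal(4) + 1j * rng.standard_normal(4)
c_rot = np.exp(0.7j) * c                       # same outer product c c^H
print(np.allclose(canonical_rank1(c), canonical_rank1(c_rot)))   # True

# Full-rank case: Cholesky factor H = L L^H with positive real diagonal.
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = Z @ Z.conj().T + np.eye(4)                 # positive definite
L = np.linalg.cholesky(H)                      # lower triangular
print(np.allclose(L @ L.conj().T, H))          # True
```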
Further, if we identify C with R^2, then the quotient manifold theorem also tells us that the Riemannian manifold of all complex positive semidefinite 4 × 4-matrices of rank k has dimension 2 · 4 · k − k² = 8k − k² (for k = 1 this gives the dimension 7 found above, and for k = 4 it gives 16, the dimension of C). Of course, all this analysis transfers to the Mueller matrices via T^{-1}. This implies that the set of all Mueller matrices is the disjoint union of the quotient manifolds C_*^{4×k}/U(k), 1 ≤ k ≤ 4, and the zero matrix. Furthermore, if the Mueller matrices M are assumed to be the sum of a non-depolarising matrix and an ideal depolariser, so that the corresponding coherency matrices T(M) are the sum of a rank-1 positive semidefinite matrix and a positive multiple of the identity matrix, it is not hard to see that this manifold is the product manifold of the positive real numbers R_+ and the manifold of all complex rank-1 positive semidefinite matrices.
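The dimension count 8k − k² can be confirmed numerically: it equals the rank of the differential of Y ↦ Y Y† at a full-rank Y ∈ C^{4×k}, whose kernel consists of the directions Y A with A skew-Hermitian (k² real dimensions). This Python sketch computes that rank from the singular values of the real Jacobian; the relative threshold 1e-8 is our choice.

```python
import numpy as np


def tangent_dimension(k, seed=0):
    """Rank of the differential of Y -> Y Y^H at a random full-rank Y in C^{4xk}."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((4, k)) + 1j * rng.standard_normal((4, k))
    columns = []
    for r in range(4):
        for s in range(k):
            for unit in (1.0, 1j):                    # real and imaginary directions
                E = np.zeros((4, k), dtype=complex)
                E[r, s] = unit
                D = E @ Y.conj().T + Y @ E.conj().T   # d(Y Y^H) in direction E
                columns.append(np.concatenate([D.real.ravel(), D.imag.ravel()]))
    s_vals = np.linalg.svd(np.array(columns).T, compute_uv=False)
    return int(np.sum(s_vals > 1e-8 * s_vals[0]))


print([tangent_dimension(k) for k in range(1, 5)])   # [7, 12, 15, 16]
print([8 * k - k * k for k in range(1, 5)])           # [7, 12, 15, 16]
```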
What can also be inferred from the above analysis is the following. Putting the above together, we obtain a map F defined as follows
where HPSD is the space of all Hermitian positive semidefinite matrices and M the space of the Mueller matrices. A short calculation shows that F is a homogeneous quadratic polynomial, and hence F(λx) = λ²F(x). Moreover, we can see that 11 is the upper left element of the Mueller matrix. This means that it is almost always enough to study the reduced case of Mueller matrices whose upper left element is 1, since, by homogeneity, any Mueller matrix with non-zero upper left element can be rescaled to this case.
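The homogeneity F(λY) = λ²F(Y) and the rescaling to upper left element 1 can be checked in Python. The sketch takes F(Y) = T^{-1}(Y Y†) with a hypothetical stand-in for T (H = ½ Σ m_ij σ_i ⊗ σ_j^*, Pauli order identity, z, x, y); under this assumed convention the upper left entry equals ‖Y‖_F²/2, while the text's normalisation may differ by a constant factor.

```python
import numpy as np

SIGMA = [
    np.eye(2, dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
]
B = [[0.5 * np.kron(SIGMA[i], SIGMA[j].conj()) for j in range(4)] for i in range(4)]


def mueller(H):
    return np.array([[np.trace(B[i][j].conj().T @ H).real for j in range(4)]
                     for i in range(4)])


def F(Y):
    """F(Y) = T^{-1}(Y Y^H): a Mueller matrix for any Y in C^{4xk}."""
    return mueller(Y @ Y.conj().T)


rng = np.random.default_rng(8)
Y = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
lam = 1.7
print(np.allclose(F(lam * Y), lam ** 2 * F(Y)))    # True: quadratic homogeneity

# Upper left entry is ||Y||_F^2 / 2 here, so this scaling normalises it to 1.
Y1 = np.sqrt(2) * Y / np.linalg.norm(Y)
print(np.isclose(F(Y1)[0, 0], 1.0))                # True
```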
Another question which now arises is that of the mean of two or more matrices. In Euclidean space this is of course just the standard arithmetic mean. On a manifold, however, geodesics might look very different from straight lines, and hence the average of two matrices, i.e. the midpoint of the geodesic between them, might differ significantly from the arithmetic mean. The case of the geometric average of two Mueller matrices was already covered in [10]. The generalisation of this concept is the Riemannian barycentre of matrices A_1, …, A_n, i.e. the matrix X which minimises the function Σ_{i=1}^{n} d(X, A_i)², where d is the distance on the manifold. Again, we can rely on the well-studied area of means of semidefinite linear operators. The study of the mean of two linear operators began with the study of connections of electrical networks [2]. This was followed by more axiomatic studies of general Hermitian operators [20], [25]. Means of more than two matrices have been studied in [3]. In [6], means have been studied for real semidefinite matrices of fixed rank. An exposition of the geometric nature of means can be found in Chapter 6 of [5]. Altogether, this suggests that the mean of multiple Mueller matrices should be computed as the Riemannian geometric mean. In practice, this would be done by transferring them via T to the semidefinite cone and then using an available implementation of the Riemannian mean, such as the tool YALMIP [23].
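The following Python sketch computes such means, assuming the affine-invariant metric on positive definite matrices (our assumption for the metric on the interior of the cone). The geodesic midpoint of A and P is A^{1/2}(A^{-1/2} P A^{-1/2})^{1/2} A^{1/2}, and the barycentre of several matrices is approximated by the standard fixed-point (Riemannian gradient) iteration. Because the metric lives on the interior only, the coherency matrices must be positive definite, so Mueller matrices with singular coherency matrices (e.g. non-depolarising ones) are excluded. The stand-in for T (H = ½ Σ m_ij σ_i ⊗ σ_j^*, Pauli order identity, z, x, y), the helper names and the data are hypothetical.

```python
import numpy as np

SIGMA = [
    np.eye(2, dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
]
B = [[0.5 * np.kron(SIGMA[i], SIGMA[j].conj()) for j in range(4)] for i in range(4)]


def coherency(M):
    return sum(M[i, j] * B[i][j] for i in range(4) for j in range(4))


def mueller(H):
    return np.array([[np.trace(B[i][j].conj().T @ H).real for j in range(4)]
                     for i in range(4)])


def matfun(H, fn):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    H = (H + H.conj().T) / 2                       # remove rounding asymmetry
    a, U = np.linalg.eigh(H)
    return (U * fn(a)) @ U.conj().T


def inv_sqrt(a):
    return 1.0 / np.sqrt(a)


def geometric_mean(A, P):
    """Midpoint of the affine-invariant geodesic between PD matrices A and P."""
    As, Ais = matfun(A, np.sqrt), matfun(A, inv_sqrt)
    return As @ matfun(Ais @ P @ Ais, np.sqrt) @ As


def karcher_mean(mats, iters=200, tol=1e-12):
    """Fixed-point iteration for the Riemannian barycentre of PD matrices."""
    X = sum(mats) / len(mats)                      # start at the arithmetic mean
    for _ in range(iters):
        Xs, Xis = matfun(X, np.sqrt), matfun(X, inv_sqrt)
        G = sum(matfun(Xis @ A @ Xis, np.log) for A in mats) / len(mats)
        X = Xs @ matfun(G, np.exp) @ Xs
        if np.linalg.norm(G) < tol:
            break
    return X


rng = np.random.default_rng(9)
M1 = np.diag([1.0, 0.5, 0.5, 0.5])                 # hypothetical partial depolariser
S = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
S = (S + S.conj().T) / 2
H2 = np.eye(4) + 0.5 * S / np.linalg.norm(S, 2)    # hypothetical PD coherency matrix
M2 = mueller(H2)
H_mean = karcher_mean([coherency(M1), coherency(M2)])
M_mean = mueller(H_mean)                           # Riemannian mean of M1 and M2
print(np.linalg.eigvalsh((H_mean + H_mean.conj().T) / 2).min() > 0)   # True
```

For two matrices the barycentre coincides with the geodesic midpoint, and for commuting (e.g. diagonal) coherency matrices it reduces to the entrywise geometric mean of the eigenvalues, which can differ noticeably from the arithmetic mean.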

Conclusion
We have established a connection between the area of Mueller matrices and the areas of general matrix analysis, Riemannian geometry and optimisation. This was achieved essentially by interpreting existing results and by the simple observation that the real ambient spaces of the Hermitian positive semidefinite matrices and of the Mueller matrices, as well as the objects themselves, map isometrically onto each other. With this new knowledge, we showed how matrix analysis can be used directly to prove an optimality result (see Corollary 2) for the filtering of measured Mueller matrices.
We further reviewed mathematical results about the complex semidefinite cone and noted how they can be combined with our previous results and how this suggests a new mean for Mueller matrices. Of course, such connections have partly been discovered in the past, or general results about semidefinite matrices have been reproved in the special case of Mueller matrices and 4 × 4 Hermitian semidefinite matrices. But our connection makes this precise and provides a way to bring well-established mathematical theories and tools into the polarimetric world. One can also speculate that the analysis established here might bring new insight to quantum optics and quantum information, as they share some mathematical objects [1].
What is still missing is a connection between this analysis and the study of the Lie group structure of invertible Mueller matrices or, more generally, of their semigroup structure. Of course, by our analysis of the geometry it is now easy to compute the tangent space at the identity and therefore the Lie algebra. But this is nothing new: the Lie group and Lie algebra were already studied in [11]. What is still missing is a study of how the geometry of the additive structure of Mueller matrices, which corresponds to parallel optical elements, and the geometry of the multiplicative structure, which corresponds to successive optical elements, interact.