A Bayesian approach for polarimetric data reduction : the Mueller imaging case

In this paper, we extend to the Mueller imaging framework a formerly introduced Bayesian approach dealing with polarimetric data reduction and robust clustering of polarization encoded images in the piecewise constant case. The extension was made possible thanks to a suitable writing of the observation model in the Mueller context that relies on the system’s coherency matrix and Cholesky decomposition such that the admissibility constraints are easily captured. This generalization comes at the cost of nonlinearity with respect to the parameters that have to be estimated. This estimation-clustering problem is tackled in a Bayesian framework where a hierarchical stochastic model based on a Markov random field proposed by Potts is used. This fully unsupervised approach is extensively tested over synthetic data as well as real Mueller images. © 2008 Optical Society of America OCIS codes: (120.5410) Polarimetry; (110.2960) Image analysis; (100.3190) Inverse problems; (000.3860) Mathematical methods in physics. References and links 1. J. Zallat, and Ch. Heinrich, “Polarimetric data reduction: a Bayesian approach,” Opt. Express 15, 83–96 (2007), http://www.opticsinfobase.org/abstract.cfm?URI=oe-15-1-83. 2. J. Zallat, S. Aı̈nouz, and M.-P. Stoll, “Optimal configurations for imaging polarimeters: impact of image noise and systematic errors,” J. Opt. A: Pure Appl. Opt. 8, 807–814 (2006). 3. S. Cloude, and E. Pottier, “Concept of polarization entropy in optical scattering: polarization analysis and measurement,” Opt. Eng. 34, 1599–1610 (1995). 4. G. Golub, and C. Van Loan, Matrix computation (The Johns Hopkins U. Press, Third edition, 1996). 5. G. Winkler, Image analysis, random fields and Markov chain Monte Carlo methods: a mathematical introduction (Springer, Second edition, 2006). 6. P. Green, and S. Richardson, “Hidden Markov models and disease mapping,” J. Am. Stat. Assoc. 97, 1055–1070 (2002). 7. S.-Y. Lu, and R. Chipman, “Interpretation of Mueller matrices based on polar decomposition,” J. Opt. Soc. Am. A 13, 1106–1113 (1996).


Introduction
In a previous paper [1], we reported on a new method addressing polarimetric data reduction in the Stokes imaging framework.Our main contribution there concerned the use of a Bayesian approach ensuring the physical authenticity of polarization encoded images obtained from raw intensity acquisitions while providing a robust clustering of the imaged scene.We showed also that our approach preserved the Stokes channels from image noises and errors induced by the imaging system.The problem of the more complex Mueller imaging case remained pending.It is addressed in the present article.
The importance of Mueller imaging comes from the rich set of information it provides about the local nature of the target which may be of great use in studying number of physical phenomena that arise in many fields of science.Hence, an increasing interest is devoted to the development of new imaging systems that record spatially dependent patterns of polarized light diffusely scattered or transmitted by scenes or samples.
Mueller imaging is defined as the distributed measurement of Mueller matrices attached to each pixel.This is made possible by the use of effective polarization state analyzers (PSAs) that acquire the Stokes vectors corresponding to each pixel in the image while a source lights up the scene through a polarization state generator (PSG).Different configurations using different polarization modulation techniques are used in imaging polarimetry.Whatever the technique used, the principle remains the same: a set of raw intensity images have to be inverted to yield polarization encoded images.This procedure called "polarimetric data reduction" (PDR) is achieved through a linear observation model that relates the two sets of images via the system dependent PSA and PSG matrices.
It is worth noting that a proper choice of the PSA and the PSG matrices plays an important role in PDR [2].It ensures a good conditioning that allows the inversion of the system and minimizes the contamination of polarimetric channels by image noise.However, if one uses the classical PDR techniques, even for well behaved imaging polarimeters [2], the image noise propagates into the estimated Mueller matrices and may prevent their physical admissibility.Indeed, a 4 × 4 real matrix must satisfy some physical constraints to be bona fide.
The contribution of the present work is twofold.Firstly, we propose an original parameterization of the forward model in order to account for the physical admissibility conditions.This leads to a nonlinear observation model.Secondly, we propose a fully unsupervised Bayesian framework (model and algorithm) addressing the estimation problem.We suppose that the sought object is piecewise constant, which corresponds to many situations of practical interest.The Markov random field model originally proposed by Potts will account for the hidden segmentation map holding the spatial prior knowledge.We will also assume the Mueller matrices to be class wise constant.Estimation will take place in the generalized maximum likelihood framework where the a posteriori probability law is optimized jointly with respect to the parameters and the hyperparameters.
The article is organized as follows: the nonlinear parameterization of the forward problem and the stochastic model are presented in section 2. The estimation procedure is detailed in section 3. Simulation results and real data processing are presented and discussed in section 4. Conclusion and perspectives are given in section 5.

Observation model in the Mueller imaging polarimetry framework
A Mueller imaging polarimeter allows the indirect measurement of the Mueller matrix attached to each pixel in the image.This is achieved by illuminating an object by p independent polarization states where p ≥ 4 through a PSG.Each of these states interacts with the object to yield an outgoing Stokes vector that is sensed through the PSA by q independent probing states where q ≥ 4. For each pixel location ( j y , j x ) in the image, the p × q intensity measurements I are related to the pixel's Mueller matrix M by the following linear equation: where A is a p × 4 PSA matrix whose rows constitute the p probing states and W is the 4 × q PSG matrix whose columns represent the q incident polarization states.
It can be shown [2] that Eq. ( 1) can be written in the following form where the underline symbol stands for the operator which maps a p × q matrix into an pq × 1 vector formed by stacking up its columns and ⊗ represents the Kronecker product.The known pq × 16 matrix P, also called "Polarization Measurement Matrix" (PMM), depends on a parameter vector η which controls the combined actions of the PSG and PSA and relates linearly the Mueller parameters vector to the raw measured intensity vector.The multi-component nature of the Mueller matrices image is treated here by seeing the Mueller image as a result of a multi-channel imaging system where the information is conveyed to the user by a multi-channel image data.These channels will be indexed in the following, where appropriate, by j n ∈ [0, 15].Regardless of the physical constraints that must be satisfied by the elements of M, we have the same starting point as with the Stokes imaging framework [1]: the two problems share the same observation model.Systematic calibration errors as well as noises that arise from the use of a CCD photosensitive device as a detector (readout noise, thermal noise, dark current noise, and photon noise among others) are accounted for in the same way as in [1], which leads to the complete observation model: where ρ accounts for kinds of noises and errors having their variances directly proportional to the measured intensity (see [1] for more details) while ε encompasses all remaining noise terms.
The problem is now to estimate M from the observations I while ensuring the physical admissibility of the retrieved Mueller operators attached to each pixel.One can easily understand that the classical approach that relies on the PMM pseudo-inverse matrix is not the right way to achieve this task.Indeed, it necessitates a pixel-wise Mueller validity post checking which is time consuming without giving a proper estimate for pathological pixels.
We propose a new parameterization of the problem, yielding an observation model with simple constraints that allows tackling the estimation in a Bayesian framework.This comes at the cost of nonlinearity.The parameterization is achieved by observing that there is a one-to-one correspondence between a bona fide Mueller matrix M and a Hermitian positive semidefinite matrix H called "system coherency matrix" where {σ k } 3 k=0 , comprising the Pauli matrices and the identity matrix, are the elements for a basis for C 2×2 and m kl are the elements of M [3].The linear equation (4) leads to the following expression: where T is a 16 × 16 invertible complex transformation matrix relating the stacked Mueller matrix to the stacked system's coherency matrix.It was stated that M is physically acceptable if and only if H is positive semi-definite [3].Furthermore, if H is positive semidefinite, it can be written in terms of its Cholesky decomposition as [4] H where Λ is a lower triangular matrix composed of 16 real parameters (λ i ) It is now possible to parameterize the elements of H by {λ i } in Eq. ( 5) and report the expression of M in the observation model of Eq. ( 3) to obtain We must note that the observation model is now a nonlinear function of {λ i } and that λ j , j ∈ [0, 3] must obey a positivity constraint to respect the positive definiteness of H.The estimation problem turns now to the estimation of the real parameters {λ i } that are used to construct H, hence yielding a physically acceptable Mueller matrix.We note also that generalization to the case where the matrix P depends on the site ( j y , j x ) is straightforward.

Underlying segmentation model
The underlying segmentation map will be modeled as a Markov random field, and more precisely using Pott's model [5].Let denote the segmentation image, where K is the number of classes.The prior probability associated to I 0 reads where is the abstract energy measure associated to the field and Z the partition function.Variables t and s denote pixel sites of the image, the ∼ denotes neighboring sites (we used eight neighbors in the present work) and I (.) is the indicator function taking values 1 or 0 whether the condition between parentheses is true or false.This model favors piecewise constant images, the size of the patches being governed by the abstract inverse temperature parameter β .A key problem in using Eq.(10) as a penalty term is the computation of Z as a function of β .We rely here on thermodynamic integration, i.e., on the relation where the expectation is taken with respect to p (I 0 |β ).
From a theoretical point of view, we have From a practical point of view, values of E [−U (I 0 )] are computed using Markov chain Monte Carlo approximations for different values of β .The corresponding set of points is linearly interpolated, thus defining an approximation of d logZ/dβ [6] which is further integrated thus giving an approximation of logZ.Let us mention that the samples of I 0 are generated by alternating Gibbs sampling (pixel-flipping) raster scans of the image with Swendsen-Wang iterations (cluster-flipping).The role of the latter iterations is to improve the mixing properties of the Markov chain.We will not further detail this standard procedure, the reader being referred to [5].
Let Φ be the parameter matrix defined as where the numbers between parentheses stand for the class index.In the estimation procedure to be described, the role of the segmentation map I 0 will be to index the appropriate column k of candidate Mueller class parameters {λ i (k)} for pixels labeled as class k: I 0 ( j y , j x ) = k.This straightforwardly yields the vector H attached to such pixel.

Noise model
The noise will be supposed to be independent and Gaussian distributed according to where μ are the noise parameters (mean and standard deviation) that affect the k th class in the j n channel image.
Let us define Θ as the set Θ = Θ ( j n ) ; j n ∈ [0, 15] , where All these parameters are unknown and will have to be estimated.This general model will in the sequel be restricted to the case of vanishing means.From a modelling point of view, the model with non vanishing means suffers from an identifiability problem: modifying some or all the λ i 's amounts to modifying the μ j 's and we have, at least for now, no information (prior or measured) to be able to make a decision.Furthermore, this assumption does not penalize at all the algorithm performances as observed in practice.

General probabilistic model
The overall probabilistic model may be depicted using a Directed Acyclic Graph (DAG) (see Fig. 1).
For the sake of clarity, we drop the dependency on K, since the only role of K is in defining dimensions.Considering the DAG, we may write: The component terms on the right-hand-side are reckoned as follows: • p (I 0 |β ) is established by Eq. (10) and section 2.2; Fig. 1.The directed acyclic graph representing the dependence relations between the variables.According to usual conventions, square boxes represent fixed or observed quantities whereas circles represent quantities to be estimated.More precisely, β is the abstract inverse temperature parameter of the Markov random field, σ λ is the standard deviation of the λ i 's, ρ is related to the quantum efficiency of the camera, I 0 is the underlying class segmentation map, Φ is the Mueller parameter matrix encompassing the λ i (k)'s, Θ is the matrix encompassing the parameters of the noise distribution, and I is the intensity observations matrix.
• p (Φ|σ λ ) is defined in a class-independent manner as follows: The variance σ 2 λ is chosen sufficiently large in order to have no influence.In other words, the prior p (Φ|σ λ ) is flat compared to the likelihood p (I|Φ, I 0 , ρ, Θ) when the λ i (k) vary.Thus the prior p (Φ|σ λ ) has no influence besides imposing the positivity of λ 0 ,... λ 3 when estimating Φ.The variance σ 2 λ may be estimated using physical considerations or using a set of I's (from this set of I's, we compute the H's in a pseudo-inverse framework and estimate the λ i 's by minimizing H − ΛΛ t 2 ).This would give a set of λ i 's from which we could derive σ 2 λ ; • the data-dependent term p (I|Φ, I 0 , ρ, Θ) is obtained by solving Eq. (8) for ε then using Eq.(15) to express the (joint) probability density function of ε, assuming iid of pixels I ( j y , j x ) conditioned on I 0 : where In other words, the λ i (k) are indexed from Φ (per Eq. ( 14)) based on the class index of the pixel, which in turn is looked-up from the segmentation map I 0 .

Estimation procedure
All available information is contained in distribution (17), from which we will derive the estimates Φ, I 0 , ρ, Θ, β .Let us emphasize that this method is fully unsupervised: there is no free parameter to be chosen or tuned by the user, set aside the number K of classes (see discussion below).Let {ρ, Θ, β } be the set of hyperparameters (i.e., the parameters which are not directly of interest), whereas the parameters are I 0 and Φ.The parameters and hyperparameters will be obtained simultaneously by joint optimization of distribution (17), thus corresponding to generalized maximum likelihood estimation whereas maximum a posteriori would correspond to optimizing the distribution with respect to the parameters for given values of the hyperparameters.
To address the optimization problem, we resort to the Gauss-Seidel scheme: in the global loop, we consider all variables in a row and optimize − log p (I, Φ, I 0 |ρ, Θ, β , σ λ ) with respect to the current variable while the other variables are held The update of I 0 is achieved site by site in a raster scan.The update of Φ is achieved component by component (i.e., we loop on i and k and optimize with respect to λ i (k)).Convergence is reached when I 0 is not modified by an entire raster scan (in fact, we will monitor that I 0 is held constant over several (10 in the present case, see the counter < 10 condition in Algorithm 2-b) raster scans, since even if I 0 is held constant, the other variables may vary, if only a little, which in turn may induce a modification of I 0 ).
Optimization with respect to λ i (k) deserves particular comments.This optimization involves a fourth degree polynomial obtained by extracting the corresponding terms from − log p (I|Φ, I 0 , ρ, Θ)−log p (Φ|σ λ ).This is handled using formal computation: the coefficients of the polynomial are analytically precomputed and the local minima are computed given the numerical values of the coefficients.The updated value of λ i (k) is sought among a) the arguments of the local minima when no positivity constraint is involved or b) the positive arguments of the local minima and zero when a positivity constraint is involved.
The present optimization problem being highly multimodal, particularly with respect to I 0 when the noise level is important, we suggest using the algorithm detailed below which behaves better than straightforward and sole Gauss-Seidel descent.
At first, twenty different runs of the K-means clustering procedure are applied to the image set of m × n dimensional intensity channel vectors, yielding different class segmentation maps.The best resulting map, characterized by minimal distances between the clusters and the class's centroids, is retained as the initial value of I 0 and used to initialize the DAG with the procedure detailed in Algorithm 1-a and Algorithm 1-b.In this phase, a whole scan of I 0 may lead to numerous modifications since the state of the DAG is far from its final (equilibrium) state.We want to avoid such a large movement without updating the rest (i.e., all variables but I 0 ) of the DAG, since it may lead to local optima far from the solution.Thus, we update the other variables of the DAG as soon as a given (reduced) number of site updates are achieved (Algorithm 1-a).This allows reaching a state much closer to the equilibrium than the initial state.We then switch to entire raster scans (Algorithm 1-b) on which we loop until convergence.The role of the mask is to improve the speed of the procedure.The new mask (see the instruction update the mask) is composed of all sites that have just been modified and of the neighbors of those sites.At this stage, the state is not optimal: a better (in the sense of a lower criterion value) solution may be found.This state cannot be reached by updating single sites at a time.This is why we resort to a multiresolution optimization scheme (Algorithm 2-a), where the sites of I 0 are modified blockwise.The other parameters of the DAG remain fixed in this phase, since they are close to their final values.We conclude with a standard Gauss-Seidel procedure (Algorithm 2-b) until convergence.cal pseudo-inverse PDR approach as well as for the Bayesian one (Potts' model) and obtained respectively 110% for the first one and 2.2% for the second one.When the Mueller matrix is estimated with the pseudo-inverse, 90 % (resp.99 %) of the pixels are not physically admissible for σ 2 n = 0.1 (resp.σ 2 n = 0.5).Moreover, for the pseudo-inverse estimate, several Mueller channels display no information (see for example column 3 and row 3 in Fig. 3), whereas structural information appears in all channels for the Bayesian estimate.
We conclude that the introduced Bayesian approach (Potts' model) gives better results in terms of estimated quantities and structural information (composition) of the imaged scene as compared to other approaches (pseudo-inverse and K-means).Moreover, the Markov random field prior plays an important role (see Table 2, where the results significantly degrade for β = 0).The two Bayesian variants (Potts' model and the unilateral model) give nearly the same results in term of RE but Potts' model is noticeably better in term of misclassified pixels (Table 2).Besides, further simulations showed that Potts' model is more robust than the unilateral one with respect to increasing noise variance.
In real situations, for some limited scenario, the optimal number of classes (K opt ) can only be roughly estimated.Usually, it can't be inferred easily by simple observation.Our approach is able to handle this issue.To determine K opt , the optimum of the law in Eq. ( 17) is plotted against Table 2. Mean percentage of misclassified pixels for the Bayesian approach (with β and without (β = 0) Markov random field prior) and for K-means clustering applied to the intensity images.Averaging has been done over noise realizations.Numbers between parentheses represents standard-deviation of related quantities.
Noise variance σ value for which the graph levels off as shown in Fig. 4. In a fully Bayesian framework, we would have K opt = argmax K p (I|K).This would necessitate to resort to Markov chain Monte Carlo methods which is outside the scope of this paper.This is left for future work.
Running the method takes 4 minutes (resp.30 minutes) for K = 4 (resp.K = 8) classes (128 × 128 image, 2.8 GHz standard desktop computer, programmation in C langage).This important variation in CPU time is due to an increase in the number of iterations.

Results against measured data
The proposed model is applied to real measured images acquired by an inline rotating quarterwave plate Mueller imaging polarimeter.The object used to carry out these measurements consists of two dichroic patches (Polaroid) and a transparent thin film (cellophane) sandwiched between two coverglasses.The overall mount is imaged through a narrow band interferential Table 3. Mean values of the relative error RE (in percent) of the four Mueller matrices corresponding to the four classes of Fig. 2. Averaging has been done over noise realizations.The first values correspond to σ 2 n = 0.1, the second values correspond to σ 2 n = 0.5.The K-means were used to derive a segmentation map over which the pseudo-inverse estimate was classwise averaged, yielding the Mk 's.We ensured physical admissibility conditions over those averages according to  filter (10 nm) and a PSA in front of a 12 bits radiometric CCD camera.We point out that the dichroic patches are placed such that their transmission axes are nearly orthogonal.Totally polarized incident light undergoes multiple reflections between the two coverglasses and escapes highly depolarized such that the polarization signature is expected to be relatively weak.This situation was chosen to represent a hard test case that can be encountered in many practical situations such as in polarization imaging of biological tissues.The Mueller image obtained by classical PDR of this object is shown in Fig. 5.At first, we can observe that while the conventional intensity imaging provides little to no information about the object (m 00 channel in the Mueller image), the other Mueller channels provide a rich enhancement to classical imaging.We observe also that the Mueller image is highly contaminated by a noise that hampers the physical interpretation of the interaction mechanisms which would allow inferring the object properties.
The algorithm output is given in Fig. 6 and in Table 4.We point out that all retrieved Mueller matrices are physically valid.This can be verified easily by computing the eigenvalues spectra of their corresponding coherency matrices.Indeed, all the eigenvalues are real positive.In the other hand, Fig. 7 shows the variation of the optimum of Eq. ( 17) versus the number of classes.The shape of this curve indicates that the optimal number of classes is here 5.This is in complete agreement with the a priori knowledge.
Once the Mueller matrices are estimated, one can focus on the physical interpretation of the

Normalized criterion
Optimal number of classes Fig. 7.The optimal number of classes for the true image.We observe that the graph is consistent with the a priori knowledge of the object.results.Several analysis tools are available in the literature.We cite among others the coherency matrix decomposition approach [3] and the polar decomposition [7].For this image we observe that the five obtained matrices depolarize a large amount of any incoming polarized state which is in agreement with the physical situation.We can also compute the product of the two matrices M 2 and M 3 corresponding to the second and third classes (which are weak polarizers oriented orthogonally to each other) to obtain 0.03 0.01 0.00 −0.05 0.00 0.00 0.00 −0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 which is very close to a pure depolarizer in accordance with the physics of the object (transmitting axes nearly orthogonal).Finally, let us mention that running the algorithm takes 2 hours for K = 4 classes (256 × 256 image, 2.8 GHz standard desktop computer, programmation in C langage).

Conclusion and perspectives
In this paper, we present a polarimetric data reduction method addressing piecewise constant images in the Mueller framework imagery.The proposed Bayesian method enforces the physical admissibility conditions while providing a robust clustering of the estimated images.The results obtained are significantly better and more robust than those given by classical pseudoinverse or K-means estimation.This framework will enable the researcher to enhance and deepen the understanding of the object constituents.Extension of the approach to smoothly varying polarization signatures is under investigation.
The software that processes Stokes and Mueller images is available on request for interested readers wishing to process their own images.

Fig. 2 .
Fig. 2. 128-pixels×128-pixels mask used to generate the synthetic Mueller images.Each color (white, light gray, heavy gray and black) corresponds to one label.

Fig. 3 .
Fig. 3.The 16 channels of the synthetic Mueller image obtained by the commonly used pseudo-inverse estimate obtained for the noise of variance σ 2 n = 0.5.

Fig. 4 .
Fig.4.The optimal number of classes for the simulated data.The graph corresponds to σ 2 n = 0.5.

Fig. 5 .
Fig. 5.The Mueller image of the test object as obtained from classical pseudo-inverse PDR.The image in the upper left (m 00 ) element corresponds to a conventional intensity image.

Fig. 6 .
Fig. 6.The final map obtained by the Bayesian approach (Potts' model).Each gray level corresponds to one different class associated to a different Mueller matrix.The circular cross section delimits the footprint of the illuminating beam.Classes (2) and (3) are the dichroic patches and class (4) corresponds to the thin cellophane film.
the minimization being achieved under the admissibility conditions.

Table 4 .
The five retrieved Mueller matrices that correspond to the label map output of our algorithm.These matrices have to be interpreted with the class numbering that is shown in Fig.6.