Optimal discrimination of multiple regions with an active polarimetric imager

: Until now, most studies about polarimetric contrast optimization have focused on the discrimination of two regions (a target and a background). In this paper, we propose a methodology to determine the set of polarimetric measurements that optimize discrimination of an arbitrary number of regions with different polarimetric properties. We show on real world examples that in some situations, a few number of optimized polarimetric measurements can overcome the performance of full Mueller matrix imaging.


Introduction
Active polarimetric imaging systems are useful for gathering information that is not visible on intensity images, and can be useful in such domains as machine vision, remote sensing, biomedical imaging, and industrial control [1][2][3][4][5][6][7][8].However, cost, size and technological complexity of polarimetric imagers depend on the number of images they need to acquire in order to perform their task [9,10].Acquisition time can also be an issue when observing rapidly evolving scenes [11].In this context, a key issue is to evaluate the added value of each acquired image in order to optimize the compromise between complexity, speed, and efficiency of these systems.
In target detection applications, the relevant criterion for quantifying the efficiency of an imaging configuration is the contrast (or discrimination ability) between a target and a background region.Analysis and optimization of the contrast in polarimetric images have been investigated in the radar [12,13] and optics [14][15][16] communities.In particular, in [10], three different polarization imaging modalities (scalar, Stokes and Mueller) have been compared and it has been shown that for discriminating two regions, the optimal strategy consists in performing a single measurement with optimized illumination and analysis polarization states.
Until now, most studies about polarimetric contrast optimization have focused on the discrimination of two regions (a target and a background).In this paper, we address discrimination of an arbitrary number of regions having different polarimetric properties.This problem is more complex than two-region discrimination since it involves new degrees of freedom.In particular, the optimal number of measurements may no longer be equal to one, since several projections may be needed to optimally discriminate a given number of regions.This incurs accrued complexity in the expression of the performance criterion (separability measure) and its optimization.The present study is, to the best of our knowledge, the first one that addresses these issues theoretically and experimentally.
The paper is organized as follows.In section 2, we define our classification method and illustrate it on full Mueller matrix data.These results will serve as a benchmark for the following of the study.In section 3, we address the problem of finding the optimal set of polarimetric measurements when the number of acquisitions is fixed.In particular, we discuss the separability criterion and the numerical method to optimize it.We then illustrate our methodology on a real-world imaging example, and show that in some experimental conditions, a limited number of projections can yield better discrimination results than full Mueller matrix data.

Polarimetric imaging and region discrimination
In this section, we present the basic active imaging principle that we will consider, and the classification algorithm that we will use for multiple target discrimination.We then apply this approach to the classification of full Mueller matrix data.These results will serve as a benchmark for the following of the study.Finally, we discuss the limits and possible improvements of this approach.

Polarimetric measurements
We consider an active polarimetric imaging system that illuminates the scene with light coming from an unpolarized white source.Polarization state in illumination is defined by a Stokes vector S generated thanks to a Polarization State Generator (PSG) (see Fig. 1).In the practical implementation we use, the PSG is composed of two Liquid Crystal Variable Retarders and one polarizer.The output beam allows illuminating the scene uniformly in polarization and intensity.The polarimetric properties of a region of the scene corresponding to a pixel in the image is characterized by its Mueller matrix M. The Stokes vector of the light scattered by this region is S ′ = MS.It is analyzed by a Polarization State Analyser (PSA), which is a generalized polarizer whose eigenstate is the Stokes vector T. As for the PSG, in the experimental setup we use, the PSA is composed of two Liquid Crystal Variable Retarders and one polarizer.The number of photoelectrons measured at a pixel of the sensor is: where the superscript T denotes matrix transposition.In this equation, S and T are unit intensity, purely polarized Stokes vectors, I 0 is a number of photons and η is the conversion efficiency between photons and electrons.The total field of view of the imager, using a 480 × 640 CCD camera, is about 10 o .The measurements are done in a spectral band of 10 nm centered on 640 nm and selected thanks to an interference filter.

Maximum Likelihood (ML) region classification
Our objective in this section is to classify different regions of the scene by using a series of acquisitions of the type defined in Eq. (1 ).For each pixel p of the scene, the measures can be gathered in a vector x p of dimension N defined as: with M p , the Mueller matrix of the region of the scene corresponding to pixel p and i p n , n ∈ [1, N], the intensity associated to the pixel p projected on the polarization states S n and T n .Each pixel of the scene is thus represented by a point in a N-dimensional space, and our goal is to discriminate the different regions in this space.
The choice of the classifier depends on the application and on the statistics of the noise disturbing the images, that can be for example detection noise or spatial fluctuations of the scene.For the sake of simplicity, we will assume that the sum of these perturbations leads to Gaussian statistics for the measured data.The Probability Density Function (PDF) associated with class k is thus: where det returns the determinant of a square matrix, x k and Γ k respectively the mean and covariance matrix of x associated to the class k.From this expression of the PDF, it is possible to define the log-likelihood: The Maximum Likelihood (ML) classifier consists in deciding that a pixel of the scene belongs to class k as: In practice, the mean and covariance matrix of each class k are estimated from a database composed of sets Ω k containing P k training Mueller matrices.For each of these datasets, P k values of x p k , p ∈ [1, P k ], can be computed as in the Eq. ( 2).The estimate of x k and Γ k are obtained by the following formula: We will now illustrate this classification method on full Mueller matrix data.

Classification on full Mueller matrix data and discussion
The full Mueller matrix of the scene is obtained by performing 16 projections of the type of Eq. ( 1) using different couples of illumination and analysis states.These data are then inverted to yield an estimate of the Mueller matrix associated with each pixel p of the scene, which is thus represented by a vector x p of dimension N = 16 (Eq.( 2)).The set of illumination and analysis states is usually chosen to minimize the error propagation during this inversion [17,18].The Mueller matrix contains all the polarimetric properties of the scene and such data has often been used for discrimination by applying different methods of classification [19,20].
Let us illustrate our approach on a real Mueller matrix image.We consider a scene composed of four regions (three regions over a background).The three regions and the background will be denoted t i ,i = {1, 2, 3} and b in the following and are represented in Fig. 2(a).The intensity image of the scene is presented in Fig. 2(b).We can see that the four regions cannot be discriminated on it, since they have similar intensity reflectances.The Mueller matrices and covariances matrices are estimated from the database and these estimates are used to design a ML classifier (see Eq. ( 5)).The classification results are presented in Fig. 3(b).We can see that the different objects are globally well discriminated, although some errors are present.
Our purpose in the next section will be to compare the results obtained with full Mueller data to those obtained by using a smaller number of optimized projections.To have a common reference, we will consider that the total amount of time t 0 available to perform the acquisition is constant.Consequently, the larger the number of projections, the smaller the integration time for each measurement.For example, for the Mueller matrix acquisition, we have to acquire 16 projections thus the acquisition time is t 0 /16 for each image.In Fig. 3(a), it is seen that some Mueller images contain no information useful for discrimination.The time used for acquiring them would thus have been better spent on other more informative projections.To reduce the number of images that need to be acquired, one solution is to estimate only the coefficients of the Mueller matrix containing relevant information to discriminate the regions.This selection of specific coefficients has been the subject of recent works [9] and it has been shown that in general, 4 acquisitions are needed to extract one coefficient.In the next section, we will propose an alternative approach that makes it possible to further reduce the number of acquisitions while keeping, and even improving the classification results.

Discrimination using optimal projections
In this section, our purpose is to determine the optimal set of polarimetric projections that optimizes discrimination capacity.We can note that Eq. ( 1) can be put in the form of a linear projection [21]:  where m is the 16-component vector formed by reading the Mueller matrix M in the lexicographic order: ,and q = T ⊗ S (⊗ denotes the Kronecker product).Using this formulation, each pixel of the scene can be seen as a point in a 16-dimension space defined by each component of the Mueller matrix and, as all the pixels associated with the same object have not exactly the same polarimetric properties, each region is represented by a point cloud in this 16-dimension space.Using a vecteur q, it is then possible to project these points on a direction of the space in which the separability of the regions is maximal.The problem is thus of finding the optimal set of linear projections that maximizes the separability of the different point clouds.This issue has long been solved in pattern recognition: it is the well-known Linear Discriminant Analysis (LDA) [22].The optimal projection vectors are the K − 1 generalized eigenvectors associated to the interclass and intraclass covariances matrices.In our case, it would lead to K − 1 vectors q, that, after having projected the data, allow maximizing the separability of the different regions.However, this solution is valid when the domain of definition of the projection vectors is the set of real valued 16-component vectors with unit norm.In our problem, the vector q = T ⊗ S does not span this space (it has 4 degrees of freedom instead of 15).Consequently, the classical LDA technique cannot be used, and we describe in this section an approach to solve this problem.The key point is to define a tractable separability criterion for multi-region discrimination, which is not obvious.We then briefly describe the optimization algorithm that we use to determine the optimal configuration and illustrate this approach on a real-world imaging scenario.

Separability criterion
The statistics of each class k, and thus the performance of the classifier, depends on the sets of illumination and analysis states S = [S 1 , ..., S N ] and T = [T 1 , ..., T N ].Our goal in the following will be to optimize these states in order to maximize the separability of the different classes.
For that purpose, one has to define a separability criterion, that is, a function C(S , T ) of the illumination and analysis states that quantifies the discrimination performance.The optimal projection parameters will be obtained as: The adequate separability criterion for a multi-class discrimination problem is the Bayesian probability of error, which involves a sum of integrals of the PDF over the decision regions, weighted by the relative importance of the different types of errors (misclassification between pairs of classes) [23].However, this criterion is difficult to calculate and optimize.This is why we will use a separability criterion which is suboptimal, but easier to compute.Recently, it has been shown that for such real-world discrimination tasks as target detection and object segmentation, the Bhattacharyya distance is a good candidate for separability criterion [24].
We have thus decided to use it as contrast criterion in our study.The Bhattacharyya distance (B) is an asymptotic exponent on the probability of error in discrimination problems [25,26].Let us consider two probability density functions (pdf) P k (x) and P l (x).In our case, these pdf correspond to the noise statistics of the pixels in regions k and l.If n denotes the size of the sample, the probability of error in deciding wether the observed data has been generated with P k (x) and P l (x) behaves as exp[-nB] as n tends to infinity [26].Considering our two sets of data defined by their pdf P k (x) and P l (x), the Bhattacharyya distance is defined as: with D the definition domain of P k and P l .The Bhattacharyya distance is thus a scalar value that quantifies the similarity between the pdf P k (x) and P l (x).It belongs to the interval [0; +∞[, is equal to zero when the pdf are identical and infinite when the pdf supports do not overlap.
In our case, the Bhattacharyya distance between two classes k and l has the following expression: We have to note that, if it can be considered that the average values of the projected pixels in each class x k are sufficiently different, we can neglect the second term associated only to the difference of covariance matrices Γ k and Γ l .This simplification is particularly valuable since a large number of iterations have to be done to compute the optimal sets of images.We have checked that in the example of Fig. 4, taking into account the second term does not change the sets of optimal polarization states.
When the number of regions to discriminate is larger than two, there are several possible ways of merging the separability of each pair of classes into a single numerical criterion.We have chosen a min-max approach which consists in maximizing the minimal separation between pairs of classes.It corresponds to maximizing the following criterion: The higher the value of this criterion, the easier will be the discrimination of all classes in the N-dimensional space.Our goal is now to determine the optimal set of parameters (S , T ) maximizing this criterion.

Computational issue for the optimization
Once the optimization criterion has been defined and the number N of projections chosen, one has to search for the optimal combination of projections maximizing the separability.The main characteristic of this problem is the number of parameters that have to be optimized simultaneously.Indeed, each projection depends on 4 parameters, e.g., the azimuth and ellipticity of the illumination and analysis states.Searching for N projections thus involves optimization on 4N parameters, which is quite large even for low values of N and it is thus likely that the separability criterion will have local maxima.It is thus necessary to use an algorithm robust to the presence of local maxima.After having compared different solutions, we have chosen to use the Shuffled Complex Evolution (SCE-UA) Method [27].This algorithm consists in generating different sets consisting of N couples of illumination and analysis polarization states, and is changing them by using a global evolution framework to finally converge to the best set of parameters (S , T ) that leads to the highest separability of the contrast.Concerning our issue, we have verified that in our applications, it converges rapidly to the global maximum researched.

Application to a real-world imaging example
Let us now apply the proposed approach to the scene represented in Fig. 2. We have considered successively that we can acquire one, two or three polarimetric projections.On each of these configurations, we have compared the discrimination results obtained with the classifier presented in Eq. ( 5) on each of these configurations.The obtained images, the associated optimal states and the results of the classification are gathered in Fig. 4, as well as the discrimination result and Bhattacharyya distance obtained with full Mueller matrix data (see Fig. 3), that will serve as a benchmark.
As explained in the previous section, for all the acquisition scenarios, the total measurement time is constant and equal to t 0 .If only one projection is acquired, the integration time is thus t 0 .It can been seen in Fig. 4(a) that the obtained image is indeed much less noisy than the Mueller images in Fig. 3, since the latter corresponds to an integration time that is 16 times smaller for each projection.However, for this 4-region scenario, one projection -although optimal -is clearly insufficient since many classification errors are observed in Fig. 4(a).To visualize the detection efficiency, we have represented the estimation of the PDF (histogram) associated to the four classes of object in Fig. 5.We can observe that the average values of the four classes are well separated (which explains why we can globally discriminate the objects in the scene), but there is also a large overlap between the different PDF that leads to a high number of classification errors as we can see in Fig. 4(a).Consequently, the contrast criterion (C = 2.0) is low compared to that obtained using the full Mueller matrix (C = 6.6).It is thus necessary to increase the number of projections to enhance the classification performance.Let us now consider that we can acquire two projections (see Fig. 4(b) and 4(c).These images are obtained with an integration time equal to t 0 /2.We observe that the optimal projections are different from that obtained by trying to optimize the separability on only one projection (Fig. 4a).Indeed, in the image 4.b, the three regions are separated from the background but are not discriminated between themselves.This discrimination is done thanks to the second image (Fig. 4.c).To visualize the detection efficiency, we have plotted, in Fig. 6, the pixel value distributions of the different objects in the 2-dimensional space defined by the two optimal projections.We can see that the different point clouds are well separated that leads to good discrimination performance, as we can see in Fig. 4(b).Indeed, the obtained value of the separability criterion (C = 9.8) is higher than that obtained with full Mueller matrix data.We can now ask the question: is it possible to increase further the contrast using a third projection ?
The set of three projections that maximize separability is presented in Fig. 4(d,e,f).The integration time for each image is equal to t 0 /3.These images are different from all the previously obtained images and correspond to a different way to discriminate the objects.Indeed the first image discriminates the regions t 1 and t 3 from the background b and the region t 2 .The second image separates the region t 1 from the region t 3 and finally the third image isolates all the regions from the background.This set of images leads to a high value of the separability criterion (C = 13.1) that corresponds to a good separability of the classes in the 3-dimensional space (see Fig. 7) and thus to excellent discrimination results, as we can see in Fig. 4(c).Theses results are better than those obtained with the full Mueller matrix because we acquire only images that contain information relevant to classification: by decreasing the number of images, we increase the integration time associated with each image and thus increase the signal to noise ratio.
The objective is thus to find the best number of images that allows obtaining all the information necessary for the discrimination and also a good signal to noise ratio.Indeed, by increasing the number of images while keeping the same global acquisition time, we may not be able to extract more useful information but since the signal to noise ratio in the images is decreased, this may lead to a decrease of the contrast.To verify this assumption, we have represented in Fig. 8 the evolution of the contrast criterion as a function of the number of optimal images acquired.The integration time for one set of images is kept constant and equal to t 0 ∼ 80ms.We can see in Fig. 8 that the contrast begins by increasing and reaches its maximum for 3 images.This evolution can be explained by the fact that each extra image brings enough new information to compensate the decrease of the signal to noise ratio per image due to the reduction of the integration time.With 3 images, all the information useful for discrimination is gathered.This is in perfect accord with the standard result in Linear Discriminant Analysis (LDA): if K classes have to be discriminated, K −1 linear projection are sufficient to obtain optimal discrimination [22].If the number of acquired images is increased while keeping the total acquisition time constant, the information brought by these new images is no longer sufficient to compensate for the reduction of the signal to noise ratio and the contrast decreases.We can also notice that with 16 optimal projections, we obtain the same contrast as with the raw Mueller matrix data (C = 6.6).

Conclusion
We have proposed a methodology to determine the set of active polarimetric measurements that optimize discrimination of an arbitrary number of regions with different polarimetric properties.
For that purpose, we have considered systems where the illumination and analysis states can vary on the whole Poincaré sphere.Of course, this methodology can also be easily applied to systems with reduced sets of accessible polarization states in illumination and/on in analysis.
We have demonstrated on a real world example that a few number of optimized polarimetric measurements can overcome the performance of full Mueller matrix imaging.The optimal number of measurements is of course highly dependent on the nature of the observed scene.
It is clear that the determination of the optimal projections require some prior knowledge about the polarimetric properties of the scene -i.e. the average and the covariance of Mueller matrices of each class.It can thus not be used in all polarimetric imaging scenarios.However, if such prior knowledge is available, we have demonstrated that it may be preferable to use a few, well selected acquisitions.Of course, research on adaptive strategies that can reduce or suppress the need of prior knowledge about the scene is an interesting perspective of the present work.
The authors thank Ryoichi Horisaki for fruitful discussions and the anonymous reviewers for their comments which were very helpful to improve the quality of this paper.Guillaume Anna's Ph.D thesis is supported by the Délégation Générale pour l'Armement, MRIS domain IMAT.

Fig. 3 .
Fig. 3. (a) Mueller image of the scene with an integration time of t 0 /16 ∼ 5ms.(b) Results of the classification.

# 1 ( 2 9 Fig. 4 .
Fig. 4. Optimal sets of N = 1, 2 and 3 projections maximizing the separability of the objects in the scene.Last column: results of classification using a ML classifier.Last row : classification results obtained with full Mueller matrix data.The polarization states in illumination and analysis are represented by their azimuth (α) and ellipticity (ε): (α S , ε S )(α T , ε T ).

Fig. 5 .Fig. 6 .
Fig. 5.Estimated PDF of the four classes in the scene (estimated on sample of around 300 pixels in each class).This projection corresponds to the results in the first row of the figure 4.

#Fig. 7 .
Fig. 7. Representation of pixels of the four classes in the 3-dimensional space defined by the three images presented in the third row of the figure 4.

Fig. 8 .
Fig.8.Evolution of the contrast criterion in function of the number of optimal images acquired.The integration for one set of images is keeping constant and equal to t 0 ∼ 80ms.