A priori information and optimisation in polarimetry

Polarimetric measurements are designed to obtain information pertaining to the system under study, however noise in the system limits the precision and hence information obtainable. Exploitation of a priori knowledge of the system allows for an improvement in the precision of experimental data. In this vein we present a framework for system design and optimisation based upon the Fisher information matrix, which allows easy incorporation of such a priori information. As such the proposed figure of merit is more complete than the commonly used condition number. Conditions of equivalence are considered, however a number of examples highlight the failings of the condition number under more general scenarios. Bounds on the achievable informational gains via multiple polarimeter arms are also given. Finally we present analytic results concerning error distribution in a Mueller matrix polar decomposition, allowing for a more accurate noise analysis in polarimetric experiments.
© 2008 Optical Society of America
OCIS codes: (110.3055) Information theoretical analysis; (120.5410) Polarimetry; (220.4830) Systems design
#99376 $15.00 USD Received 24 Jul 2008; revised 4 Sep 2008; accepted 8 Sep 2008; published 11 Sep 2008 (C) 2008 OSA 15 September 2008 / Vol. 16, No. 19 / OPTICS EXPRESS 15212

References and links
1. Planets Stars and Nebulae Studied With Photopolarimetry, T. Gehrels ed. (University of Arizona Press 1974).
2. J. Tinbergen, “Interstellar polarization in the immediate solar neighbourhood,” Astron. Astrophys. 105, 53–64 (1982).
3. R. A. Chipman, Polarimetry, Handbook of Optics Vol 2, (McGraw-Hill, New York, 1995).
4. R. M. A. Azzam and N. M. Bashara, Ellipsometry and polarised light, (Elsevier, North Holland, 1987).
5. J. Zallat, S. Aïnouz, and M. Ph. Stoll, “Optimal configurations for imaging polarimeters: impact of image noise and systematic errors,” J. Opt. A: Pure Appl. Opt. 8 807–814 (2006).
6. J. S. Tyo, “Optimum linear combination strategy for an N-channel polarization sensitive imaging or vision system,” J. Opt. Soc. Am. A 15 359–366 (1998).
7. J. S. Tyo, “Design of optimal polarimeters: maximization of signal-to-noise ratio and minimization of systematic error,” Appl. Opt. 41 619–630 (2002).
8. A. Ambirajan and D. C. Look, “Optimum angles for a polarimeter,” Opt. Eng. 34 1651–1658 (1995).
9. M. Smith, “Optimization of a dual-rotating-retarder Mueller matrix polarimeter,” Appl. Opt. 41 2488–2493 (2002).
10. A. De Martino, E. Garcia-Caurel, B. Laude, and B. Drévillon, “General methods for optimized design and calibration of Mueller polarimeters,” Thin Solid Films, 455–456 112–119 (2004).
11. D. Mendlovic and A. W. Lohmann, “Spacebandwidth product adaptation and its application to superresolution: fundamentals,” J. Opt. Soc. Am. A 14 (1997).
12. T. J. Rothenberg, Efficient estimation with a priori information (New Haven, Yale University Press, 1973).
13. I. J. Cox and C. J. R. Sheppard, “Information capacity and resolution in an optical system,” J. Opt. Soc. Am. A 3 1152–1158 (1986).
14. S. Y. Lu and R. A. Chipman, “Interpretation of Mueller matrices based on polar decomposition,” J. Opt. Soc. Am. A 13, 1106–1113 (1996).
15. R. M. A. Azzam, “Division-of-amplitude Photopolarimeter (DOAP) for the simultaneous measurement of all four Stokes parameters of light,” J. Mod. Opt. 29 685–689 (1982).
16. E. Collett, “Automatic determination of the polarization state of nanosecond laser pulses,” U.S. Patent No. 4158506 (1979).
17. H. Mueller, “The foundations of optics,” J. Opt. Soc. Am. 38, 661–661 (1948).
18. V. Delaubert, N. Treps, C. Fabre, H. A. Bachor, and P. Réfrégier, “Quantum limits in image processing,” J. Mod. Opt. 29 685–689 (1982).
19. M. R. Foreman, S. S. Sherif, and P. Török, “Photon statistics in single molecule orientational imaging,” Opt. Express 15 13597–13606 (2007).
20. A. Dubois, K. Grieve, G. Moneron, R. Lecaque, L. Vabre, and C. Boccara, “Ultrahigh-Resolution Full-Field Optical Coherence Tomography,” Appl. Opt. 43, 2874–2883 (2004).
21. B. J. Meers, “Recycling in laser-interferometric gravitational-wave detectors,” Phys. Rev. D 38 2317–2326 (1988).
22. J. W. Goodman, Statistical Optics (Wiley, New York, 2000).
23. A. Bénière, F. Goudail, M. Alouini, and D. Dolfi, “Estimation precision of degree of polarization in the presence of signal-dependent and additive Poisson noises,” J. Europ. Opt. Soc. Rap. Public. 3 08002 (2008).
24. H. Cramér, Mathematical Methods of Statistics, (Princeton Univ. Press. 1946), ISBN 0-691-08004-6.
25. C. Rao, “Information and the accuracy attainable in the estimation of statistical parameters,” Bull. Calcutta Math. Soc. 37 81–89 (1945).
26. R. Fisher, “On the mathematical foundations of theoretical statistics,” Phil. Trans. R. Soc. Lond. 222 309–368 (1922).
27. R. Fisher, “Theory of statistical estimation,” Proc. Cam. Phil. Soc. 22 700–725 (1925).
28. http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html
29. H. H. Barrett, J. L. Denny, R. F. Wagner, and K. J. Myers, “Objective assessment of image quality. II Fisher information, Fourier crosstalk, and figures of merit for task performance,” J. Opt. Soc. Am. A 12 834–852 (1995).
30. L. L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis (Addison-Wesley, 1991).
31. E. W. Barankin, “Locally best unbiased estimates,” Ann. Math. Stat. 20 477–501 (1949).
32. M. F. Kijewski, S. P. Müller, and S. C. Moore, “The Barankin bound: a model of detection with location uncertainty,” Proc. SPIE 1768 153 (1992).
33. S. P. Müller, C. K. Abbey, F. J. Rybicki, S. C. Moore, and M. F. Kijewski, “Measures of performance in nonlinear estimation tasks: prediction of estimation performance at low signal-to-noise ratio,” Phys. Med. Biol. 50 3697–3715 (2005).
34. B. R. Frieden, Physics from Fisher Information: A Unification, (Cambridge University Press 1998).
35. J. S. Tyo, “Noise equalisation in Stokes parameter images obtained by use of variable-retardance polarimeters,” Opt. Lett. 25 1198–1200 (2000).
36. A. Bénière, F. Goudail, M. Alouini, and D. Dolfi, “Precision of degree of polarization estimation in the presence of additive Gaussian detector noise,” Opt. Commun. 278 264–269 (2007).
37. R. Mehra, “Optimal input signals for parameter estimation in dynamic systems–Survey and new results,” IEEE T. Automat. Contr. 19 753–768 (1974).
38. E. Walter and L. Pronzato, “Qualitative and quantitative experiment design for phenomenological models a survey,” Automatica 26 195–213 (1990).
39. S. N. Savenkov, “Optimization and structuring of the instrument matrix for polarimetric measurements,” Opt. Eng. 41 965–972 (2002).
40. V. Bhapkar and C. Srinivasan, “On Fisher information inequalities in the presence of nuisance parameters,” Ann. Inst. Statist. Math 46 593–604 (1994).
41. R. M. A. Azzam and F. F. Sudradjat, “Single-layer-coated beam splitters for the division-of-amplitude photopolarimeter,” Appl. Opt. 44 190–196 (2005).
42. A. D. Whalen, Detection of signals in noise, (Academic Press Inc., New York, 1971).
43. P. Török, M. Salt, E. E. Kriezis, P. R. T. Munro, H. P. Herzig, and C. Rockstuhl, “Optical disk and reader therefor,” Worldwide Patent No. WO 2006/010882 (2006).
44. P. R. T. Munro and P. Török, “Properties of confocal Mueller-matrix polarimeters,” (submitted to Opt. Lett.).
45. B. Laude-Boulesteix, A. De Martino, B. Drévillon, and L. Schwartz, “Mueller Polarimetric Imaging System with Liquid Crystals,” Appl. Opt. 43, 2824–2832 (2004).
46. J. M. Bueno, “Depolarization effects in the human eye,” Vision Research 41 2687–2696 (2001).
47. M. Floc’h, G. Le Brun, J. Cariou, and J. Lotrian, “Experimental characterization of immersed targets by polar decomposition of the Mueller matrices,” Eur. Phys. J. AP 3 349–358 (1998).
48. S. M. Nee, “Error analysis for Mueller matrix measurement,” J. Opt. Soc. Am. A 2


Introduction
Polarimetry is the study and measurement of the polarisation state of light. Although measurement of the state of polarisation of light is often an important objective in itself [1,2], such techniques are also frequently used to obtain information about an optical system, such as its birefringence [3]. One may then subdivide polarimetry into two broad categories: Stokes polarimetry and Mueller polarimetry. The former entails measuring the four Stokes parameters of light [4], whilst the latter is intended to measure the full Mueller matrix of a sample, from which parameters of interest can be inferred. Much effort has been invested in determining optimal configurations of polarimeters in terms of their experimental setup [5–10]; however, invariably little consideration is given to a priori information that an experimenter may have about the system they are studying. Information theory, however, states that if exploited correctly such information can improve the accuracy of any measurements [11,12]. For example, if the position of an object is approximately known the field of view can be reduced, perhaps by using a confocal microscope, giving rise to an increase in the bandwidth and hence resolution of the system [13]. To address this omission we consider how such a priori information can be represented and incorporated into system optimisation. In doing so we find that common optimisation procedures do not necessarily give the optimal polarimeter configuration. Furthermore, our formulation naturally describes the distribution of errors amongst inferred polarisation parameters, such as diattenuation, retardance and depolarisation, as may be obtained from a Lu-Chipman Mueller matrix decomposition [14]. These results could potentially be used for a more accurate noise analysis in polarimetry.
In Section 2 we first describe the system models of Stokes and Mueller polarimeters used throughout this text and discuss the assumed noise model. Section 3 then discusses a priori information, before we consider its use in system optimisation in Section 4. A number of examples are given in Section 5, where we also consider and confirm the approach used when no a priori information is possessed. Finally, in Sections 6 and 7 we discuss how our optimisation strategy can be extended and highlight the inherent description of noise propagation in polarimetry, before concluding in Section 8.

Stokes polarimeter
A division of amplitude polarimeter (DOAP), as originally proposed by Azzam [15], can measure the Stokes vector $\mathbf{S} = (S_0, S_1, S_2, S_3)$ of incident light by projecting it onto at least four independent polarisation basis states using a polarisation state analyser (PSA). As such, the vector of detected intensities $\mathbf{D} = (D_0, D_1, \ldots, D_{N_D-1})$ can be written
$$\mathbf{D} = \mathbf{T}\mathbf{S} \tag{1}$$
where $\mathbf{T}$ is an $N_D \times 4$ matrix, known as the instrument matrix, whose rows correspond to the Stokes vectors of the $N_D$ polarisation basis states (see Fig. 1). It is by variation of the instrument matrix that we aim to optimise a Stokes polarimeter. Each row of the instrument matrix can be deduced from the polarisation elements present in each arm of the polarimeter setup used and is thus known. Given a set of measured intensities it is then possible to deduce the state of polarisation of the incident light by application of the inverse operation, i.e.
$$\mathbf{S} = \mathbf{T}^{+}\mathbf{D} \tag{2}$$
where $\mathbf{T}^{+}$ is the Moore-Penrose pseudoinverse of the instrument matrix. Although a division of wavefront polarimeter (DOWP) [16] can also be used for polarimetric measurements, we neglect this arrangement in this work since it requires that the beam be uniformly polarised and that its intensity profile be known a priori, conditions which are generally not achieved in practice. DOWPs are hence rarely used.
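As a minimal numerical sketch of the forward model and its pseudoinverse inversion described above, the following Python snippet simulates noise-free intensities and recovers the Stokes vector. The instrument matrix (rows $\frac{1}{2}(1, \mathbf{s}_i)$ with $\mathbf{s}_i$ the vertices of a regular tetrahedron), the test Stokes vector and all variable names are illustrative assumptions, not values taken from this work.

```python
import numpy as np

def tetrahedral_instrument_matrix():
    """Hypothetical 4 x 4 instrument matrix: rows are 0.5 * (1, s_i),
    with s_i the vertices of a regular tetrahedron on the Poincare sphere."""
    s = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
    return 0.5 * np.hstack([np.ones((4, 1)), s])

T = tetrahedral_instrument_matrix()
S_true = np.array([1.0, 0.3, -0.2, 0.5])   # illustrative Stokes vector (assumption)

D = T @ S_true                    # noise-free detected intensities, D = T S
S_est = np.linalg.pinv(T) @ D     # inversion with the Moore-Penrose pseudoinverse
```

For a square, full-rank instrument matrix the pseudoinverse coincides with the ordinary inverse; `pinv` is used so that the same line also covers $N_D > 4$.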

Mueller polarimeter
A Mueller polarimeter builds on the principle of a DOAP by the addition of a polarisation state generator (PSG) and a sample to the optical setup, as shown in Fig. 1. The action of the sample on the incident polarised light can be described by a $4 \times 4$ Mueller matrix $\mathbf{M}$ [17] such that the light incident into the DOAP is described by the Stokes vector
$$\mathbf{S} = \mathbf{M}\mathbf{R} \tag{3}$$
where $\mathbf{R}$ is the Stokes vector of the illuminating beam. Since the Mueller matrix has 16 elements, all of which must in general be determined, it is not sufficient to illuminate using a single polarisation state. At minimum, four distinct polarisation states must be used to illuminate the sample. Under these circumstances $\mathbf{R}$ can be written as a $4 \times N_R$ matrix ($N_R \geq 4$) whose columns are the Stokes vectors of the input states. Consequently $\mathbf{D}$ becomes an $N_D \times N_R$ matrix whose columns correspond to the vector of detected intensities for each input polarisation state. We can hence write
$$\mathbf{D} = \mathbf{T}\mathbf{M}\mathbf{R} \tag{4}$$
from which the Mueller matrix can be found using the inverse operations, i.e.
$$\mathbf{M} = \mathbf{T}^{+}\mathbf{D}\mathbf{R}^{+} \tag{5}$$
The incident polarisation matrix R provides additional degrees of freedom by which a Mueller polarimeter can be optimised.
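The Mueller measurement and its inversion can be sketched in the same way. The snippet below is illustrative only: the analyser and generator states (both placed on a regular tetrahedron), the sample (an ideal horizontal linear polariser) and the variable names are assumptions. It also checks the column-stacking identity $\mathrm{vec}(\mathbf{T}\mathbf{M}\mathbf{R}) = (\mathbf{R}^T \otimes \mathbf{T})\,\mathrm{vec}(\mathbf{M})$, which is one common way of writing the measurement as a linear map on the 16 Mueller elements.

```python
import numpy as np

# Hypothetical analyser and generator states on a regular tetrahedron.
s = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
T = 0.5 * np.hstack([np.ones((4, 1)), s])   # N_D x 4 instrument matrix
R = np.hstack([np.ones((4, 1)), s]).T       # 4 x N_R matrix, columns are input Stokes vectors

# Illustrative sample (assumption): ideal horizontal linear polariser.
M_true = 0.5 * np.array([[1, 1, 0, 0],
                         [1, 1, 0, 0],
                         [0, 0, 0, 0],
                         [0, 0, 0, 0]], dtype=float)

D = T @ M_true @ R                                   # N_D x N_R intensity matrix
M_est = np.linalg.pinv(T) @ D @ np.linalg.pinv(R)    # recover the Mueller matrix

# Column-stacking vectorisation: vec(T M R) = (R^T kron T) vec(M).
vec_D = np.kron(R.T, T) @ M_true.flatten(order="F")
```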

Noise model
All practical systems are subject to noise of a stochastic nature. It is this unfortunate fact which limits the precision to which measurements can be made and hence the amount of information we can extract from said measurements. In optical systems noise can be attributed to two sources, which Delaubert et al. [18] refer to as technical and quantum noise. The former arises from poor experimental setup and can in principle be reduced to an arbitrarily low level; quantum noise, however, arising from the discrete nature of light and the stochastic nature of photon arrivals at the detector, cannot. Besides being a fundamental limit to experimental precision, the assumption of quantum noise is also valid for any shot-noise-limited setup, a limit that has been reached in many applications, for example single molecule detection, OCT and astronomy [19–21]. Goodman's discussion of the degeneracy parameter [22] also illustrates that at optical frequencies quantum noise is frequently dominant over other potential fluctuations in the source. So as not to complicate calculations unnecessarily we adopt a semi-classical model of light, whereby before incidence onto the detector light is modelled using traditional electromagnetic theory, whilst the final absorption process is necessarily quantum in nature. Under these assumptions it can be shown [22] that the number of photocounts registered by a photon counting detector is governed by a Poisson probability law
$$f(D|\bar{D}) = \frac{\bar{D}^{D}}{D!}\exp(-\bar{D}) \tag{6}$$
where $f(D|\bar{D})$ denotes the probability density function (PDF) for registering $D$ photocounts and the bar notation is used to highlight that the PDF is dependent on the ensemble average $\bar{D}$ of possible measurements. Since the number of photocounts is proportional to the integrated intensity $W$ received on the detector according to $\bar{D} = \eta W / h\bar{\nu}$, we refer to $\bar{D}$ simply as the detected intensity. Here $\eta$ is the quantum efficiency of the detector, whilst $h\bar{\nu}$ is the average energy of an incident photon.
DOAPs, however, have multiple detectors and it is necessary to consider the stochastic detection process on each. Assuming that the noise on each detector is independent, the joint PDF is
$$f(\mathbf{D}|\bar{\mathbf{D}}) = \prod_{n} \frac{(\bar{D}_n + I_d^n)^{D_n}}{D_n!}\exp\left[-(\bar{D}_n + I_d^n)\right] \tag{7}$$
where an additional term $I_d^n$ has been added to account for other potential additive sources of stray photons on the $n$th detector. The notation $\mathbf{D}$ can be read as either a matrix or a vector of intensities, since the formulation in both cases is identical. A good discussion of such possible noise sources is given in [23]; simple examples are detector dark counts or a passive background.
Although not necessary, we make the simplifying assumption that these additional noise sources affect each detector equally, such that $I_d^n = I_d$.
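The noise model can be illustrated with a short simulation. The sketch below assumes independent Poisson counts on each detector with mean $\bar{D}_n + I_d$; the function name `log_joint_pdf`, the mean intensities and the dark level are hypothetical choices made purely for illustration.

```python
import numpy as np
from math import lgamma

def log_joint_pdf(counts, mean_intensity, dark):
    """Log of the joint Poisson PDF for independent detectors whose
    mean count is mean_intensity + dark (illustrative helper)."""
    lam = np.asarray(mean_intensity, dtype=float) + dark
    counts = np.asarray(counts, dtype=float)
    log_fact = np.array([lgamma(c + 1.0) for c in counts.ravel()]).reshape(counts.shape)
    return float(np.sum(counts * np.log(lam) - lam - log_fact))

rng = np.random.default_rng(0)
D_bar = np.array([120.0, 80.0, 60.0, 40.0])   # assumed mean intensities (photocounts)
I_d = 5.0                                     # assumed additive background per detector

samples = rng.poisson(D_bar + I_d, size=(20000, 4))
sample_mean = samples.mean(axis=0)   # should approach D_bar + I_d
sample_var = samples.var(axis=0)     # Poisson: variance equals the mean
```

The equality of sample mean and variance is the signature of Poisson statistics that the later Fisher information expressions rely on.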

Deterministic systems
Optical measurements are intended to extract information about the system under study; however, as discussed, noise in the system limits the amount obtainable. This relationship can be placed on a more formal basis using the Cramér-Rao lower bound (CRLB) [24,25], which states that $\mathbf{K} \geq \mathbf{J}_{\mathbf{w}}^{-1}$, in the sense that $\mathbf{K} - \mathbf{J}_{\mathbf{w}}^{-1}$ is positive semidefinite. Here and henceforth we assume we are estimating a set of parameters $\mathbf{w}$. $\mathbf{K}$ is then the covariance matrix of our estimate, i.e. the obtainable precision, and $\mathbf{J}_{\mathbf{w}}$ is known as the Fisher information matrix (FIM) [26,27], defined by
$$\mathbf{J}_{\mathbf{w}} = E_{\mathbf{D}}\left[\left(\frac{\partial \ln f}{\partial \mathbf{w}}\right)^{T}\frac{\partial \ln f}{\partial \mathbf{w}}\right] \tag{8}$$
where $E_{\mathbf{D}}[\ldots]$ denotes the expectation or ensemble average with respect to $\mathbf{D}$. In essence, Fisher information is a statistical measure of the dependence of the observed data on the input parameters. A strong dependence implies that we can measure the parameters more precisely. Application of the chain rule to Eq. (8) quickly yields
$$\mathbf{J}_{\mathbf{w}} = \mathbf{G}^{T}\mathbf{J}_{\bar{\mathbf{D}}}\mathbf{G} \tag{9}$$
where $\mathbf{G} = \partial\bar{\mathbf{D}}/\partial\mathbf{w}$ is independent of $\mathbf{D}$. Various authors use different conventions for matrix calculus; throughout this work we adopt the formalism described in [28]. From Eqs. (7) and (8) we arrive at the well known result that the FIM for estimation of the mean of a Poisson random process is given by
$$(\mathbf{J}_{\bar{\mathbf{D}}})_{ij} = \frac{\delta_{ij}}{\bar{D}_i + I_d} \tag{10}$$
where $\delta_{ij}$ is the Kronecker delta. Eqs. (9) and (10) thus define the FIM associated with estimation of a parameter vector $\mathbf{w}$ from intensity measurements in the presence of Poisson noise. We can consider Stokes (or Mueller matrix) polarimeters by allowing $\mathbf{w} = \mathbf{S}$ (or $\mathbf{M}$) to give the associated FIM, where by equating $\mathbf{w}$ to $\mathbf{M}$ we imply a vectorisation operation of $\mathbf{M}$.
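To make the chain-rule construction concrete, the following sketch evaluates the Poisson FIM for the mean intensities and maps it to the Stokes parameters for a Stokes polarimeter, where the Jacobian of the mean intensities with respect to $\mathbf{S}$ is simply the instrument matrix. It assumes the convention $\mathbf{J}_{\mathbf{S}} = \mathbf{T}^T \mathbf{J}_{\bar{\mathbf{D}}} \mathbf{T}$ and uses a hypothetical tetrahedral instrument matrix; the intensities and dark level are illustrative.

```python
import numpy as np

# Hypothetical tetrahedral instrument matrix (assumption, for illustration).
s = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
T = 0.5 * np.hstack([np.ones((4, 1)), s])

S = np.array([1000.0, 300.0, -200.0, 500.0])   # assumed Stokes vector in photocounts
I_d = 10.0                                     # assumed additive background

D_bar = T @ S                                  # mean detected intensities
J_D = np.diag(1.0 / (D_bar + I_d))             # Poisson FIM for the mean intensities
J_S = T.T @ J_D @ T                            # chain rule with G = T
crlb = np.linalg.inv(J_S)                      # lower bound on the covariance of S

# Cross-check: linear propagation of the Poisson variances through T^{-1}.
T_inv = np.linalg.inv(T)
cov_propagated = T_inv @ np.diag(D_bar + I_d) @ T_inv.T
```

For a square invertible instrument matrix the bound coincides with the covariance obtained by propagating the Poisson variances through the inverse, which is a convenient sanity check.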

Stochastic systems
The equations given thus far are valid only for a particular value of $\mathbf{w}$; however, the parameter values may differ between experimental setups or measurements. The experimenter may nevertheless know from an existing model or earlier data that the object being studied belongs to a restricted class, that is to say they possess some a priori information about the parameters being measured. A fibre-optic communication channel provides a good example: it is known that during a measurement window a pulse will either be received or not, with equal probability, representing logical 1 and 0 respectively. Known restrictions on the possible values of $\mathbf{w}$ can be conveniently parameterised using a PDF $f(\mathbf{w})$, which describes the probability of each value of $\mathbf{w}$ occurring. As shown in the accompanying Appendix, when $\mathbf{w}$ can vary the FIM takes the form given in Eq. (11), where $\mathbf{J}^{nr}_{\mathbf{w}}$ is the deterministic FIM given by Eq. (9) and $\mathbf{J}^{r}_{\mathbf{w}}$ depends only on our a priori knowledge via Eq. (46).
Although the CRLB and FIM are useful tools for defining a best obtainable precision, it is worth questioning whether such a limit can practically be achieved. In answer we refer to [29], wherein it is shown that each photon can be considered as an independent measurement of the system. Furthermore, the maximum-likelihood (ML) estimator [30] asymptotically achieves the CRLB as the number of samples increases. We calculate using the Central Limit Theorem that acceptable convergence requires ∼8000 photons, and hence under these conditions we can achieve the CRLB by using an ML estimator. If such a criterion is not met then the CRLB is not the strongest bound and one must resort to alternative bounds, such as the Barankin bound, as discussed in [31–33]. Full exposition of these bounds is beyond the scope of this paper.
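A simple Monte Carlo experiment can illustrate attainment of the bound. The sketch below is not the convergence calculation referred to above; it assumes a square invertible instrument matrix, for which the unconstrained ML estimate of the Stokes vector is obtained by inverting the background-subtracted counts, and compares the empirical covariance of that estimate with the inverse FIM. The instrument matrix, photon numbers and sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical tetrahedral instrument matrix (assumption, for illustration).
s = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
T = 0.5 * np.hstack([np.ones((4, 1)), s])

S_true = 8000.0 * np.array([1.0, 0.3, -0.2, 0.5])   # assumed Stokes vector (photocounts)
I_d = 20.0                                          # assumed known additive background
lam = T @ S_true + I_d                              # Poisson means on the four detectors

counts = rng.poisson(lam, size=(50000, 4))          # repeated simulated experiments

# Unconstrained ML estimate for this linear model: invert the background-subtracted counts.
T_inv = np.linalg.inv(T)
S_hat = (counts - I_d) @ T_inv.T

empirical_cov = np.cov(S_hat, rowvar=False)
crlb = np.linalg.inv(T.T @ np.diag(1.0 / lam) @ T)  # inverse Fisher information matrix
```

In this linear sketch the estimator is unbiased and its covariance matches the bound up to sampling error; for nonlinear parameterisations agreement is expected only asymptotically.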

Channel capacity
Frieden [34] defined a metric $C_{\mathbf{w}}$, known as the Fisher channel capacity, to quantify the ability of a system to transmit Fisher information about the system being studied; mathematically it is given by the trace of the FIM. Parameter estimation can only be accomplished via some degree of data processing on the measured intensities. It is hence sufficient to consider the channel capacity in the context of estimating the mean intensities only, i.e. $\mathrm{tr}(\mathbf{J}_{\bar{\mathbf{D}}})$, since it is impossible for data processing steps to introduce additional information into the system. We consider first a Stokes polarimeter. From Eq. (10) we have
$$\mathrm{tr}(\mathbf{J}_{\bar{\mathbf{D}}}) = \sum_{n=1}^{N_D} \frac{1}{\bar{D}_n + I_d} \tag{12}$$
which, under the constraint $0 < \sum_{n=1}^{N_D} \bar{D}_n \leq S_0$, is a maximum when the intensity is equalised across the detectors, such that $\bar{D}_n = aS_0/N_D$, where $a \leq 1$ is a positive constant. We note that equalising the intensity on each detector also equalises the noise, which is considered a desirable property in optimisation of polarimeters [5,35]. The channel capacity is thus bounded according to Eq. (13). We thus see that although channel capacity initially increases quadratically with the number of detectors, this slows to a linear increase as the additive noise term $I_d$ becomes significant. We note here that if a Gaussian noise model were used, as is frequently done [18,36], the channel capacity would increase linearly with $N_D$. Since Fisher information is additive [34], the Fisher channel capacity for a Mueller polarimeter can be calculated by summing the capacities for each input polarisation state. The result is Eq. (14), where $R_0$ is the intensity of the probing polarisation states. We note that this bound is unlikely to be achieved since equality requires the sample to be perfectly transmitting to all incident polarisation states. The channel capacity for a Mueller polarimeter can be seen to scale only linearly with respect to the number of probing polarisation states. Practically, these results embody the intuitive notion that by performing more measurements, i.e. increasing the sampling, we introduce greater redundancy into our experimental data, hence allowing a better precision in our parameter estimates to be achieved. Henceforth we assume that $N_D = N_R = 4$ since this is the minimum number of measurements required to determine $\mathbf{S}$ or $\mathbf{M}$ uniquely. The associated FIMs are thus $4 \times 4$ and $16 \times 16$ respectively.
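The scaling with the number of detectors can be checked numerically. The sketch below evaluates the trace of the Poisson FIM for the mean intensities when the available intensity is shared equally between the detectors, which gives $N_D^2/(aS_0 + N_D I_d)$; the function name and the numerical values are illustrative assumptions.

```python
import numpy as np

def equalised_capacity(n_det, s0, dark, a=1.0):
    """Trace of the Poisson FIM for the mean intensities when the
    intensity a*s0 is shared equally between n_det detectors."""
    d_bar = np.full(n_det, a * s0 / n_det)
    return float(np.sum(1.0 / (d_bar + dark)))

s0 = 1000.0   # assumed total intensity in photocounts

# Negligible additive noise: doubling the detectors quadruples the capacity.
ratio_no_dark = equalised_capacity(8, s0, 0.0) / equalised_capacity(4, s0, 0.0)

# Dominant additive noise: doubling the detectors only doubles the capacity.
ratio_large_dark = equalised_capacity(8, s0, 1e6) / equalised_capacity(4, s0, 1e6)
```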

Ellipsoids of concentration
Consider a parameter column vector $\mathbf{w}$ (of length $n$) and an estimate of this parameter vector made by an experimenter, denoted $\mathbf{w}_e$. In general there will be a discrepancy between the estimated and true value of $\mathbf{w}$ due to noise in the system, given by
$$\Delta\mathbf{w} = \mathbf{w} - \mathbf{w}_e \tag{15}$$
We note that since $\mathbf{w}_e$ is derived from random experimental data, $\Delta\mathbf{w}$ is a random variable. Subsequent experiments will give a different value for the vector $\Delta\mathbf{w}$. In the $n$-dimensional error space, i.e. the space with coordinate axes given by $\Delta w_1, \Delta w_2, \ldots, \Delta w_n$, each realisation of $\Delta\mathbf{w}$ specifies a single point. The scatter of these points is determined by the variations in the experimenter's estimate $\mathbf{w}_e$, as described by the associated covariance matrix $\mathbf{K}$. Furthermore, the extent of scatter of different errors can be visualised using concentration ellipsoids [30] defined by $\Delta\mathbf{w}^{T}\mathbf{K}^{-1}\Delta\mathbf{w} = c$, the volume of which can be used as a metric of aggregate error. Via the CRLB the FIM defines the smallest possible ellipsoid of concentration, which has a volume of
$$V = V_n\, c^{n/2}\, |\mathbf{J}_{\mathbf{w}}|^{-1/2} \tag{16}$$
where $V_n$ is the volume of the $n$-dimensional unit hypersphere. As such we use the determinant of the FIM as a figure of merit, whereby a larger determinant is preferable. This criterion is known as D-optimality [37,38] and has the additional advantage that $\log|\mathbf{J}_{\mathbf{w}}|$ is closely related to the Fisher channel capacity of the system, such that by maximising $|\mathbf{J}_{\mathbf{w}}|$ we are also maximising the channel capacity. Setting $\mathbf{w} = \mathbf{S}$ or $\mathbf{M}$ and using Eqs. (1), (4) and (9) it is possible to calculate the FIM for estimation of the Stokes parameters or the elements of the Mueller matrix respectively as
$$\mathbf{J}_{\mathbf{S}} = \mathbf{T}^{T}\mathbf{J}_{\bar{\mathbf{D}}}\mathbf{T} \tag{17a}$$
$$\mathbf{J}_{\mathbf{M}} = (\mathbf{R}\otimes\mathbf{T}^{T})\,\mathbf{J}_{\bar{\mathbf{D}}}\,(\mathbf{R}^{T}\otimes\mathbf{T}) \tag{17b}$$
where $\mathbf{J}_{\bar{\mathbf{D}}}$ is of the form of Eq. (11) and $\otimes$ denotes the Kronecker product. Hence
$$|\mathbf{J}_{\mathbf{S}}| = |\mathbf{T}|^{2}\,|\mathbf{J}_{\bar{\mathbf{D}}}| \tag{18a}$$
$$|\mathbf{J}_{\mathbf{M}}| = |\mathbf{T}|^{8}\,|\mathbf{R}|^{8}\,|\mathbf{J}_{\bar{\mathbf{D}}}| \tag{18b}$$
We now restrict to uniform distributions, meaning $\mathbf{J}^{r}_{\bar{\mathbf{D}}} = 0$, so as to illustrate the general behaviour of these figures of merit; the resulting expressions are given in Eqs. (19). Closer inspection of Eqs. (18) reveals there are two factors which influence the amount of information received in an optical experiment. The first of these corresponds to the amount of information acquired during the physical measurement, as described by the $|\mathbf{J}_{\bar{\mathbf{D}}}|$ term. This component also encompasses any a priori information that may be possessed.
The second, and perhaps the more familiar, influence is related to any subsequent data processing used to extract the Stokes vector or Mueller matrix from the measured intensities as per Eqs. (2) and (5). Any noise in the intensity measurements is amplified during data processing, the extent of which is often measured using the condition number of the associated matrices, namely $\mathbf{T}$ (and $\mathbf{R}$). In this paper we define the condition number of the matrix $\mathbf{X}$ as $\kappa_{\mathbf{X}} = \|\mathbf{X}\|_F \|\mathbf{X}^{-1}\|_F$, where $\|\ldots\|_F$ denotes the Frobenius norm. Since the condition number of a matrix is inversely proportional to its determinant (see e.g. [39]), we see that noise amplification is described by the $|\mathbf{T}|$ (and $|\mathbf{R}|$) factors of Eqs. (18).
Frequently the condition number of the instrument matrix (and input polarisation matrix) is used as a figure of merit for polarimeter optimisation [7–10]. Our discussion above, however, has highlighted the inadequacy of this strategy in general, since it gives no regard to potential gains that can be made by improving the precision of the measurement itself or by incorporating a priori knowledge. We thus believe that our informational figure of merit is more holistic in terms of measuring the quality of a polarimeter.
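The two figures of merit can be compared numerically. The sketch below assumes the Stokes FIM takes the form $\mathbf{T}^T \mathbf{J}_{\bar{\mathbf{D}}} \mathbf{T}$ with a deterministic Poisson $\mathbf{J}_{\bar{\mathbf{D}}}$, so that its determinant factorises into $|\mathbf{T}|^2 |\mathbf{J}_{\bar{\mathbf{D}}}|$, and that the concentration ellipsoid is $\Delta\mathbf{w}^T \mathbf{K}^{-1} \Delta\mathbf{w} = c$ with $\mathbf{K} = \mathbf{J}^{-1}$. The helper names, instrument matrix and intensities are illustrative assumptions.

```python
import numpy as np
from math import gamma, pi

def frobenius_condition_number(X):
    """kappa_X = ||X||_F * ||X^{-1}||_F for a square invertible matrix."""
    return np.linalg.norm(X, "fro") * np.linalg.norm(np.linalg.inv(X), "fro")

def ellipsoid_volume(J, c=1.0):
    """Volume of {dw : dw^T J dw <= c}, i.e. the smallest concentration
    ellipsoid when the covariance attains the inverse of J."""
    n = J.shape[0]
    unit_ball = pi ** (n / 2) / gamma(n / 2 + 1)
    return unit_ball * c ** (n / 2) / np.sqrt(np.linalg.det(J))

# Hypothetical tetrahedral instrument matrix and intensities (assumptions).
s = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
T = 0.5 * np.hstack([np.ones((4, 1)), s])
D_bar = T @ np.array([1000.0, 300.0, -200.0, 500.0])
I_d = 10.0

J_D = np.diag(1.0 / (D_bar + I_d))
J_S = T.T @ J_D @ T

det_direct = np.linalg.det(J_S)
det_factored = np.linalg.det(T) ** 2 * np.linalg.det(J_D)   # |T|^2 |J_D|
kappa = frobenius_condition_number(T)
volume = ellipsoid_volume(J_S)
```

The factorisation makes the point of the text explicit: the condition number only probes the $|\mathbf{T}|$ factor, whereas the determinant of the FIM also responds to changes in $\mathbf{J}_{\bar{\mathbf{D}}}$.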

Nuisance parameters
Hitherto it has been assumed that full knowledge of the Stokes vector or Mueller matrix was desired; however, this may not always be the case, as demonstrated in Section 5.3. Such a scenario thus warrants our attention, although we shall not give a full commentary since it is amply discussed in the literature, e.g. [30].
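Let us assume that the parameter vector $\mathbf{w}$ is formed via the concatenation of two vectors $\mathbf{u}$ and $\mathbf{v}$ such that $\mathbf{w} = (\mathbf{u}, \mathbf{v})$. $\mathbf{u}$ is assumed to contain the $p$ parameters that we wish to know, whilst $\mathbf{v}$ contains $q$ parameters that are of little interest, known as nuisance parameters. When nuisance parameters are present the relevant FIM $\mathbf{J}_{\mathbf{u}}$ is given by [40]
$$\mathbf{J}_{\mathbf{u}} = \mathbf{J}_{11} - \mathbf{J}_{12}\mathbf{J}_{22}^{-1}\mathbf{J}_{21} \tag{20}$$
where $\mathbf{J}_{11}$ is $p \times p$, $\mathbf{J}_{12}$ is $p \times q$, $\mathbf{J}_{21}$ is $q \times p$, $\mathbf{J}_{22}$ is $q \times q$ and
$$\mathbf{J}_{\mathbf{w}} = \begin{pmatrix} \mathbf{J}_{11} & \mathbf{J}_{12} \\ \mathbf{J}_{21} & \mathbf{J}_{22} \end{pmatrix}$$
is the FIM were all parameters to be estimated. Accordingly, the matrix determinant on which to optimise the polarimeter setup can be shown via rules of matrix algebra to be
$$|\mathbf{J}_{\mathbf{u}}| = \frac{|\mathbf{J}_{\mathbf{w}}|}{|\mathbf{J}_{22}|} \tag{21}$$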
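The nuisance-parameter FIM is a Schur complement, which can be verified numerically. In the sketch below the full FIM is an arbitrary symmetric positive definite stand-in rather than one derived from a polarimeter, and the function name `information_with_nuisance` is hypothetical. It checks the standard identities $|\mathbf{J}_{\mathbf{u}}| = |\mathbf{J}_{\mathbf{w}}|/|\mathbf{J}_{22}|$ and that $\mathbf{J}_{\mathbf{u}}^{-1}$ equals the leading $p \times p$ block of $\mathbf{J}_{\mathbf{w}}^{-1}$.

```python
import numpy as np

def information_with_nuisance(J, p):
    """Schur complement J11 - J12 J22^{-1} J21: the FIM for the first p
    parameters when the remaining ones are nuisance parameters."""
    J11, J12 = J[:p, :p], J[:p, p:]
    J21, J22 = J[p:, :p], J[p:, p:]
    return J11 - J12 @ np.linalg.solve(J22, J21)

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
J = A @ A.T + 4.0 * np.eye(4)   # arbitrary symmetric positive definite stand-in FIM

p = 3                            # e.g. three parameters of interest, one nuisance parameter
J_u = information_with_nuisance(J, p)

det_ratio = np.linalg.det(J) / np.linalg.det(J[p:, p:])
bound_block = np.linalg.inv(J)[:p, :p]   # CRLB block for the parameters of interest
```

The second identity shows why the Schur complement is the right object: the bound on the parameters of interest is read from the inverse of the full FIM, not from the inverse of $\mathbf{J}_{11}$ alone.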

Examples
Having developed a more suitable framework within which both Stokes and Mueller polarimeters can be optimised, we now give a number of examples to highlight some points of interest. We start by highlighting the circumstances under which our informational figure of merit is equivalent to the condition number; further examples then illustrate that this equivalence does not hold when a priori information is introduced.

Maximal ignorance
Our first example considers the situation in which we are maximally ignorant of the likely incident polarisation states. Initially considering a Stokes polarimeter, our a priori information (or our lack thereof) can be modelled by assuming each polarisation state is equally likely. The associated PDF is thus uniform over the Poincaré sphere. Assuming that all possible incident states have the same intensity $S_0$ and degree of polarisation $P$ (defined as the fraction of the total intensity originating from totally polarised light) it is possible to show […] Hence […] It is thus apparent that to maximise the information obtained we must maximise $|\mathbf{T}|$ or, equivalently, make the condition number as small as possible. It can be shown geometrically that this corresponds to making the volume of the tetrahedron whose vertices on the Poincaré sphere are defined by $\mathbf{T}_i$ a maximum, i.e. making the tetrahedron regular [39]. Although the same conclusion has been previously reached via considerations of the structure of the instrument matrix and noise propagation [7,35,39,41], our derivation based on information theory appears to be new. Since a maximum determinant corresponds to minimal noise amplification, the SNR, given by […], is maximum. If $I_d \ll S_0$ this reduces to the familiar $S_0^{1/2}$ scaling associated with Poisson noise. There are an infinite number of possible instrument matrices, corresponding to rotations of the tetrahedron within the Poincaré sphere about the origin; however, given one optimal instrument matrix (as can easily be found numerically, e.g. [39]) it is possible to find alternative and perhaps more practical instrument matrices by applying a suitable rotation matrix. For a Mueller matrix polarimeter the same result applies; however, we must also maximise the determinant of $\mathbf{R}$, that is to say make the incident polarisation states as orthogonal as possible. For the completely uniform distribution there is no relationship between $\mathbf{T}$ and $\mathbf{R}$.
Finally we note with reference to Fig. 2 that the channel capacity is independent of the choice of instrument matrix and increases with the degree of polarisation as would be expected.
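The geometric statement about the regular tetrahedron can be explored numerically. The sketch below assumes ideal analysers, so that each row of the instrument matrix is $\frac{1}{2}(1, \mathbf{s}_i)$ with $\mathbf{s}_i$ a unit vector on the Poincaré sphere. It compares $|\mathbf{T}|$ for a regular tetrahedron, for a rotated copy, and for randomly drawn sets of four states; all function names are hypothetical.

```python
import numpy as np

def instrument_matrix(directions):
    """Rows 0.5 * (1, s_i) for unit vectors s_i on the Poincare sphere (ideal analysers assumed)."""
    directions = np.asarray(directions, dtype=float)
    return 0.5 * np.hstack([np.ones((len(directions), 1)), directions])

def random_unit_vectors(rng, n):
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def random_rotation(rng):
    """Random proper rotation obtained from a QR decomposition."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0
    return Q

rng = np.random.default_rng(3)
tetra = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

det_regular = abs(np.linalg.det(instrument_matrix(tetra)))
det_rotated = abs(np.linalg.det(instrument_matrix(tetra @ random_rotation(rng).T)))
det_random = max(
    abs(np.linalg.det(instrument_matrix(random_unit_vectors(rng, 4))))
    for _ in range(2000)
)
```

Under these assumptions the regular tetrahedron gives $|\mathbf{T}| = 1/\sqrt{27}$, any rotation of it gives the same value, and no random configuration exceeds it.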

Matched filter
Adopting the opposite extreme to maximal ignorance, we now consider the polarimetric equivalent to the matched filter. Matched filters, which correlate a known signal (or template) with a measured signal so as to determine the presence or absence of the known signal against some background, are frequently encountered in signal processing since they maximise the SNR when the desired signal is present [42]. We here restrict our discussion to that of a Stokes polarimeter in which depolarised light constitutes the background signal.
Denoting the known polarisation state by its Stokes vector $\mathbf{S}_t = (S_{t0}, S_{t1}, S_{t2}, S_{t3})$, our a priori knowledge can be represented by the PDF $f(\mathbf{S}) = \delta(\mathbf{S} - \mathbf{S}_t)$, where $\delta(\mathbf{x})$ is the multi-dimensional Dirac delta function. Since $\mathbf{S}$ is non-random, $\mathbf{J}^{r}_{\bar{\mathbf{D}}}$ is identically zero. Consequently $(\mathbf{J}_{\bar{\mathbf{D}}})_{ij} = \delta_{ij}/(\bar{D}_i + I_d)$, whereby
$$|\mathbf{J}_{\mathbf{S}}| = |\mathbf{T}|^{2}\prod_{i}\frac{1}{\bar{D}_i + I_d}$$
If the additive noise term $I_d$ is zero we see that we can obtain infinite information if $\bar{D}_i = 0$ on a single detector, corresponding to one arm of the polarimeter projecting the incident polarisation on to the basis state $\mathbf{T}_i \propto (S_{t0}, -S_{t1}, -S_{t2}, -S_{t3})$. Note the parallel with conventional matched filters, whereby the filter corresponds to the template reversed in time. This result can be understood by noting that for a given state of polarisation there are only two PSA configurations capable of uniquely identifying that state, namely $\mathbf{T}_i \propto (S_{t0}, \pm S_{t1}, \pm S_{t2}, \pm S_{t3})$, corresponding to diametrically opposite points on the Poincaré sphere. For example, only a horizontal or vertical polariser can unambiguously identify horizontally polarised light (giving a maximum or null intensity respectively). When taking a single measurement it is not in general possible to know which intensity level corresponds to the maximum, whilst a null intensity is clearly identifiable. Furthermore, we have assumed an underlying Poisson process in which noise variations grow as the intensity grows, and hence lower intensities give a better precision.
If present, a depolarised background necessitates a second, distinct polarimeter arm and also results in finite information. The situation is similar for non-zero $I_d$. Additional polarimeter arms improve estimation precision, as discussed earlier in Section 3.3. Although unnecessary, we maintain our assumption that $N_D = 4$ to allow easy comparison. Once more we find there are an infinite number of possible instrument matrices that give rise to a maximum in the information, since it is possible to trade off precision in the intensity measurements (corresponding to higher light levels) against a reduction in the noise amplification associated with data processing, i.e. a smaller condition number. Three possible polarimeter configurations are shown in Fig. 3(a) for a template Stokes vector of $(1, 0, 0, 1)$. The first configuration (shown in green) gives the best condition number possible for a matched polarimeter (and consequently a worse measurement precision), whilst the second (red) shows the opposite case, whereby the volume of the inscribed tetrahedron is significantly smaller, i.e. the condition number is larger, yet the precision of measurement is increased since the total detected intensity is smaller. Practically this arrangement is unsuitable since it is highly sensitive to alignment errors in the PSA. The third configuration (blue) illustrates a more general arrangement.
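The null-intensity argument for the matched arm can be illustrated with a few lines of code. The sketch below assumes an ideal analyser row normalised as $\frac{1}{2}(1, -\mathbf{s}_t)$ and uses the per-detector Poisson information $1/(\bar{D} + I_d)$; the helper names, the partially polarised test state and the dark level are illustrative assumptions.

```python
import numpy as np

def matched_analyser(template):
    """Analyser row proportional to (S_t0, -S_t1, -S_t2, -S_t3), scaled as 0.5 * (1, -s_t)."""
    t = np.asarray(template, dtype=float)
    return 0.5 * np.array([t[0], -t[1], -t[2], -t[3]]) / t[0]

def detector_information(analyser, stokes, dark=0.0):
    """Mean intensity on one detector and its Poisson Fisher information 1 / (D + I_d)."""
    mean_intensity = float(analyser @ stokes)
    total = mean_intensity + dark
    return mean_intensity, (np.inf if total == 0.0 else 1.0 / total)

template = np.array([1.0, 0.0, 0.0, 1.0])      # fully polarised template state
row = matched_analyser(template)

# Template alone, no additive noise: null intensity and unbounded information.
D_null, info_null = detector_information(row, template)

# Partially polarised light (depolarised background) plus dark counts: finite information.
partially_polarised = np.array([1.0, 0.0, 0.0, 0.5])
D_bg, info_bg = detector_information(row, partially_polarised, dark=0.05)
```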

Linear polarimeter
Our final example assumes the polarisation incident into a Stokes polarimeter is restricted to lie on the equator of the Poincaré sphere. This could, for example, correspond to studying the light from a family of polarisers. Such a model could be useful in polarisation multiplexed optical data storage [43,44]. Using the PDF $f(\varepsilon, \theta) = \delta(\varepsilon)/\pi$, where $\varepsilon$ and $\theta$ are angles which define a position on the Poincaré sphere [4], the expectations can be evaluated analytically to give […], where $\alpha_i$ is the equatorial angle on the Poincaré sphere for the $i$th basis Stokes vector of the instrument matrix. Given that we know a priori that the incident polarisation state is linearly polarised, we have no need to estimate $S_3$, since this only describes the ellipticity of the light, and can hence treat it as a nuisance parameter as described in Section 4.2. A linear Stokes polarimeter is thus optimised when […] is maximum, where $\mathbf{S}$ denotes the parameter vector $(S_0, S_1, S_2)$. In agreement with [6], the maximum of this metric occurs when the measurement basis Stokes vectors $\mathbf{T}_i$ are equally spaced around the equator of the Poincaré sphere, as shown in Fig. 3(b). When applied to Mueller polarimeters a similar analysis shows that the optimal input polarisation states should also be equally spaced about the equator of the Poincaré sphere, although their positions need bear no resemblance to those defined by $\mathbf{T}_i$.
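The equal-spacing result can be explored with a simplified stand-in for the metric above. The sketch below does not evaluate the prior-averaged Poisson FIM of the text; instead it assumes equal information weight on every detector, so that the figure of merit reduces to $|\mathbf{T}^T\mathbf{T}|$ for a reduced instrument matrix with rows $\frac{1}{2}(1, \cos\alpha_i, \sin\alpha_i)$. Function names and the random search are illustrative.

```python
import numpy as np

def linear_instrument_matrix(alphas):
    """Rows 0.5 * (1, cos(alpha_i), sin(alpha_i)) for analysers on the equator."""
    alphas = np.asarray(alphas, dtype=float)
    return 0.5 * np.column_stack([np.ones_like(alphas), np.cos(alphas), np.sin(alphas)])

def d_optimality(alphas):
    """Simplified figure of merit |T^T T|, assuming equal weight on every detector."""
    T = linear_instrument_matrix(alphas)
    return float(np.linalg.det(T.T @ T))

# Four analysers equally spaced around the equator of the Poincare sphere.
equal = d_optimality(np.arange(4) * np.pi / 2)

# Random sets of four equatorial angles for comparison.
rng = np.random.default_rng(4)
best_random = max(d_optimality(rng.uniform(0.0, 2.0 * np.pi, 4)) for _ in range(5000))
```

Under this simplification the equally spaced configuration gives $|\mathbf{T}^T\mathbf{T}| = 1/4$, which no randomly drawn configuration exceeds.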

Extension of optimisation results
The above results regarding the optimisation of polarimeters hold not only for inference of the Stokes parameters or elements of a Mueller matrix, but can be further extended to inference of parameters $z$ derived from these quantities. Such a situation may arise when performing a Lu-Chipman polar decomposition [14] on a measured Mueller matrix, as is heavily used in the literature [45–47]. Eqs. (17) generalise to Accordingly the volume of the ellipsoid of concentration as found from $|J_z|$ is modified by a factor of $|\partial w/\partial z|^2$ ($w = S$ or $M$), which is independent of both $T$ and $R$. Its significance for optimisation with respect to the experimental setup is thus null; hence optimisation of these more complicated inference problems reduces to the optimisation procedure previously discussed.
Although in this work we discuss optimisation when the system is limited by quantum noise, the same technique can be used for alternative noise models. Adopting a different noise model only requires a change in the deterministic FIM $J^{nr}_D$. For example, the FIM when assuming a Gaussian noise model is $J^{nr}_D = K^{-1}$.
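The invariance of the optimum under a change of parameters can be checked numerically. The sketch below assumes the standard change-of-variables rule for Fisher information, $J_z = G^T J_w G$ with $G = \partial w/\partial z$ square and invertible, so that $|J_z| = |G|^2 |J_w|$. The two FIMs and the Jacobian are made-up values standing in for two candidate instrument designs.

```python
def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def reparameterise(J_w, G):
    """J_z = G^T J_w G (assumed chain rule for the FIM)."""
    return matmul(transpose(G), matmul(J_w, G))

# Hypothetical FIMs on w for two instrument designs.
J_w_design_1 = [[4.0, 1.0], [1.0, 3.0]]
J_w_design_2 = [[2.0, 0.5], [0.5, 2.0]]
# Hypothetical Jacobian dw/dz; it depends on the sample model, not the instrument.
G = [[1.5, 0.2], [-0.3, 0.8]]

J_z_design_1 = reparameterise(J_w_design_1, G)
J_z_design_2 = reparameterise(J_w_design_2, G)

ratio_w = det2(J_w_design_1) / det2(J_w_design_2)
ratio_z = det2(J_z_design_1) / det2(J_z_design_2)
print(ratio_w, ratio_z)
```

Because both designs pick up the same factor $|G|^2$, their ratio, and hence their ranking, is unchanged by the reparameterisation.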
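A minimal sketch of swapping the noise model is given below. It takes $K$ to be the covariance matrix of the detected intensities (an assumption, since $K$ is defined outside this section), restricts it to be diagonal (uncorrelated detectors), and uses the well-known result that the Fisher information on the mean of a Poisson variable is the reciprocal of that mean. Propagation to the Stokes vector uses the assumed linear model $D = TS$, giving $J_S = T^T J_D T$. The instrument matrix and Stokes vector are hypothetical.

```python
def fim_stokes(T, j_d_diag):
    """J_S = T^T J_D T for a diagonal J_D supplied as a list of its diagonal entries."""
    n = len(T[0])
    return [[sum(T[k][i] * j_d_diag[k] * T[k][j] for k in range(len(T)))
             for j in range(n)] for i in range(n)]

# Hypothetical four-arm instrument matrix (rows are basis Stokes vectors).
T = [[0.5, 0.5, 0.0, 0.0],
     [0.5, -0.5, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.5, 0.0, 0.0, 0.5]]
S = [1000.0, 300.0, 200.0, 100.0]          # hypothetical Stokes vector (photon counts)
D = [sum(t * s for t, s in zip(row, S)) for row in T]

sigma2 = 25.0                               # assumed Gaussian noise variance per detector
J_D_gauss = [1.0 / sigma2] * len(D)         # diagonal of K^{-1}
J_D_poisson = [1.0 / d for d in D]          # signal-dependent Poisson case

J_S_gauss = fim_stokes(T, J_D_gauss)
J_S_poisson = fim_stokes(T, J_D_poisson)
print(J_S_gauss[0][0], J_S_poisson[0][0])
```

Only the line defining the diagonal of $J_D$ changes between the two models; the propagation step is identical.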

Single element systems
Noise propagation in inference problems, that is to say how noise in experimental data manifests itself as errors in the parameters of interest, can also be considered by employing Eqs. (30). Although the mathematics is generally complicated, we give here a result pertaining to polar decomposition of Mueller matrices [14]. Before considering the composite systems for which polar decomposition is relevant, we must first describe noise propagation for single polarisation element systems, namely pure diattenuators, retarders and depolarisers.
A diattenuator is a non-depolarising polarisation element which preferentially transmits particular states of polarisation and has a Mueller matrix of the form [14] where $T_u$ is the transmittance for unpolarised light, $A = (A_1, A_2, A_3)$ is the diattenuation vector, whose magnitude $A$ is known as the diattenuation, and $I$ is the $3 \times 3$ identity matrix. Any decomposition algorithm will need to estimate all four unknown parameters $z = (A_1, A_2, A_3, T_u)$. Lengthy calculations give the derivatives required for evaluation of Eq. (30b) as where $m_{Aij}$ is the $(i, j)$th element of $m_A$. Using the FIM, $J_A$, as calculated from Eqs. (30)–(34), and the CRLB we can calculate the best obtainable precision for estimation of the diattenuation parameters. The error on each parameter will in general be different, a point considered further in [48,49].
Similarly we can consider a pure retarder, which has a Mueller matrix of the general form where the retardance vector defines a retardance axis and has a norm of $R$, known as the retardance, and $\varepsilon_{ijq}$ is the Levi-Civita permutation symbol. Calculation of the FIM, $J_R$, requires the derivatives
The case of a depolariser is however much more difficult to tackle, since in general an eigenanalysis of the system is required to find the pertinent depolarisation parameters. This cannot be described analytically except in some special cases. For example, if it were known a priori that the sample were a pure depolariser with Mueller matrix of the form where $1 - |a|$, $1 - |b|$ and $1 - |c|$ are the principal depolarisation factors, we can easily calculate the derivatives where $M_{\Delta ij}$ is the $(i, j)$th element of $M_\Delta$. The appropriate FIM $J_\Delta$ is then given by substituting Eqs. (39) into Eq. (30b).
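For readers who wish to check such derivatives numerically, the sketch below builds a pure diattenuator in the standard Lu-Chipman form [14], with first row and column $T_u(1, A)$ and lower-right block $T_u\,[\sqrt{1 - A^2}\, I + (1 - \sqrt{1 - A^2})\, \hat{A}\hat{A}^T]$. It then approximates $\partial M/\partial z_k$ for $z = (A_1, A_2, A_3, T_u)$ by central finite differences. The finite-difference step and the function names are illustrative assumptions and do not reproduce the analytic expressions of the text.

```python
import math

def diattenuator(A, Tu):
    """Mueller matrix of a pure diattenuator (Lu-Chipman form), for |A| <= 1."""
    a2 = sum(x * x for x in A)
    root = math.sqrt(1.0 - a2)
    M = [[0.0] * 4 for _ in range(4)]
    M[0][0] = Tu
    for i in range(3):
        M[0][i + 1] = M[i + 1][0] = Tu * A[i]
        for j in range(3):
            m = root if i == j else 0.0
            if a2 > 0.0:
                m += (1.0 - root) * A[i] * A[j] / a2
            M[i + 1][j + 1] = Tu * m
    return M

def jacobian(z, h=1e-6):
    """Central-difference dM/dz_k for z = (A1, A2, A3, Tu).
    Returns a list of four 4x4 matrices; requires |A| < 1 so the square root is smooth."""
    derivs = []
    for k in range(4):
        zp, zm = list(z), list(z)
        zp[k] += h
        zm[k] -= h
        Mp = diattenuator(zp[:3], zp[3])
        Mm = diattenuator(zm[:3], zm[3])
        derivs.append([[(Mp[i][j] - Mm[i][j]) / (2.0 * h) for j in range(4)]
                       for i in range(4)])
    return derivs

z = (0.3, 0.2, 0.1, 0.8)               # hypothetical parameter values
M = diattenuator(z[:3], z[3])
dM = jacobian(z)
polariser = diattenuator((1.0, 0.0, 0.0), 0.5)   # limiting case: ideal linear polariser
print(polariser[0], polariser[1])
```

Since $M$ is linear in $T_u$, the derivative with respect to $T_u$ should equal $M/T_u$, which gives a simple sanity check on the numerical Jacobian.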
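The retarder can be sketched in the same way. The code below assumes the Lu-Chipman form [14] for the $3 \times 3$ block, $m_{R,ij} = \delta_{ij}\cos R + a_i a_j (1 - \cos R) + \sum_k \varepsilon_{ijk} a_k \sin R$, where $a$ is the unit vector along the retardance vector. The sign convention of the Levi-Civita term and the function names are assumptions. The checks at the end do not depend on that sign: the block is a proper rotation, and the retardance is recovered from the trace via $R = \arccos(\mathrm{tr}(M_R)/2 - 1)$.

```python
import math

def levi_civita(i, j, k):
    """Permutation symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def retarder(R_vec):
    """Mueller matrix of a pure retarder with retardance vector R_vec."""
    R = math.sqrt(sum(x * x for x in R_vec))
    a = [x / R for x in R_vec] if R > 0.0 else [0.0, 0.0, 0.0]
    c, s = math.cos(R), math.sin(R)
    M = [[0.0] * 4 for _ in range(4)]
    M[0][0] = 1.0
    for i in range(3):
        for j in range(3):
            m = c if i == j else 0.0
            m += a[i] * a[j] * (1.0 - c)
            m += s * sum(levi_civita(i, j, k) * a[k] for k in range(3))
            M[i + 1][j + 1] = m
    return M

def retardance(M):
    """Recover R from the trace; valid for 0 <= R <= pi."""
    arg = sum(M[i][i] for i in range(4)) / 2.0 - 1.0
    return math.acos(max(-1.0, min(1.0, arg)))

R_vec = (0.4, -0.3, 0.5)               # hypothetical retardance vector
M_R = retarder(R_vec)
R_true = math.sqrt(sum(x * x for x in R_vec))
print(R_true, retardance(M_R))
```

A finite-difference Jacobian with respect to the three components of the retardance vector can then be formed in the same manner as for any other parameterised Mueller matrix.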

Composite systems
The single element results discussed above can be used for noise analysis when the experimenter has a priori knowledge about the structure of the Mueller matrix. If however this is not the case, a Lu-Chipman decomposition is frequently performed so as to parameterise the sample. Fundamental to the Lu-Chipman decomposition is the fact that an arbitrary Mueller matrix can be written as the product of three distinct Mueller matrices corresponding to a depolariser, retarder and diattenuator, i.e. $M = M_\Delta M_R M_A$. Morio and Goudail [50] considered the importance of altering the order in which the product is evaluated and found that different decompositions either gave unphysical results or differed from the Lu-Chipman decomposition only by an appropriate rotation, a difference that is mathematical rather than physical. With these results in mind we adhere to the original formulation, since this ensures physicality and furthermore corresponds to common usage. Calculation of the FIM for a Lu-Chipman decomposition can be achieved by application of the product rule to Eq. (30b), which yields since the structure of the matrices dictates that the cross terms are identically zero. It is important to note that we stack the parameters of interest into a single parameter vector, for example where the order of the diagonal terms depends only on the ordering of the parameters in $z$.
Mathematically the FIMs $J_R$, $J_A$ and $J_\Delta$ are of the same form as the single element FIMs described in the previous section, albeit with a slight modification in the effective input polarisation states and instrument matrix respectively, as can be seen by comparing Eqs. (30b) and (40). This makes physical sense, considering that the Mueller matrix polar decomposition models the system as a cascade of three independent polarisation elements. Once more we give the cautionary note that the derivatives required to calculate $J_\Delta$ cannot be found analytically in general.

Conclusions
In this paper we have addressed the question as to how a priori information we may possess about a system being measured can be used to improve the precision of our measurements. In doing so we used a metric based upon the FIM which takes a holistic approach by including the raw information obtained from a measurement and the noise amplification that occurs during data processing. We have shown the conditions under which our figure of merit is equivalent to the frequently used condition number of the instrument (and incident polarisation) matrix, namely maximal ignorance of the likely values of the parameters of interest. We then proceeded to show that when incorporating a priori knowledge to achieve a resolution improvement the condition number is unsatisfactory and does not give optimal results. This was illustrated by considering the polarimetric equivalent of a matched filter and linear polarimeters. Calculation of the informational figure of merit requires calculation of a FIM, which is also beneficial since it provides a simple description of the noise propagation that may arise during data processing and how the resulting errors are distributed among the parameter values. Specifically we have given results pertaining to a Lu-Chipman polar decomposition since this is frequently used in polarimetric analysis.
Although formulated in terms of estimation of Stokes parameters and Mueller matrix elements in the presence of Poisson noise, optimisation with respect to Fisher information is easily extended to inference of alternative sample parameters and noise models and is thus applicable to a wide variety of optical experiments.
Finally we acknowledge the financial support of the EU via NANOPRIM Contract No. NMP3-CT-2007-033310.

A. Fisher information for random parameters
For a deterministic parameter the FIM is defined by Eq. (8), where $f(D|w)$ is the PDF of $D$ conditioned on the value of $w$. If however $w$ can vary, this definition becomes unsatisfactory since it does not account for our knowledge about the random nature of the parameter, which can be used to improve the precision of any measurement. Instead it is more appropriate to define the FIM in terms of the joint PDF of $D$ and $w$, namely
$$f(D, w) = f(D|w)\, f(w).$$
Taking the logarithm gives
$$L(D, w) = \ln f(D|w) + \ln f(w) \tag{42}$$
so that the modified FIM is defined by where $L_1(D, w) = \ln f(D|w)$ and $L_2(w) = \ln f(w)$, and expectations are now with respect to both $D$ and $w$. Considering each of these terms in turn we have where $E_w[\ldots]$ denotes the expectation with respect to $w$ only and $J^{nr}_w$ is the deterministic (or non-random) Fisher information matrix as defined by Eq. (8).
Adopting a similar treatment of $J_2$ we have where the last step follows from $\int f(D|w)\, dD = 1$. A similar result follows for $J_3$. Finally consider Combining these results we find
$$J_w = E_w[J^{nr}_w] + J^{r}_w \tag{47}$$
We thus see that the Fisher information matrix when trying to estimate a random parameter $w$ is given by the average of the deterministic Fisher information with respect to $w$ plus an additional term arising from our knowledge of the random behaviour of $w$.
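A scalar toy case makes the additive structure of Eq. (47) concrete. The sketch below assumes a Gaussian measurement $D \sim N(w, \sigma^2)$ and a Gaussian prior $w \sim N(\mu, \tau^2)$, which are illustrative choices rather than the Poisson model of the main text. In this case $J^{nr}_w = 1/\sigma^2$ does not depend on $w$, the prior term is $J^{r}_w = 1/\tau^2$, and the posterior-mean estimator attains a mean square error of $1/J_w$, which the Monte Carlo loop checks. All numerical values are hypothetical.

```python
import random

sigma, tau, mu = 0.5, 2.0, 1.0   # hypothetical noise std, prior std, prior mean

J_nr = 1.0 / sigma ** 2          # deterministic FIM; constant in w, so E_w[J_nr] = J_nr
J_r = 1.0 / tau ** 2             # contribution of the Gaussian prior
J_w = J_nr + J_r                 # additive structure of Eq. (47)

rng = random.Random(1)
n_trials = 100000
squared_error = 0.0
for _ in range(n_trials):
    w = rng.gauss(mu, tau)                        # draw the random parameter
    D = rng.gauss(w, sigma)                       # noisy measurement of w
    w_hat = (D * J_nr + mu * J_r) / J_w           # posterior mean estimate
    squared_error += (w_hat - w) ** 2
mse = squared_error / n_trials
print(mse, 1.0 / J_w)
```

The empirical mean square error should agree with $1/J_w$ to within Monte Carlo error, and is smaller than the bound $\sigma^2 = 1/J^{nr}_w$ obtained when the prior is ignored.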
As a final point of interest we note here that if $f(w)$ is uniform then such that $J^{r}_w = 0$ and $J_w = E_w[J^{nr}_w]$.

Fig. 2. Dependence of channel capacity on degree of polarisation for $S_0/I_d = 10^4$.

Fig. 3. Poincaré sphere showing possible polarimeter configurations (a) for a Stokes polarimeter matched to the template Stokes vector (1, 0, 0, 1), i.e. left circularly polarised light, and (b) for a linear polarimeter, assuming the ratio $S_0/I_d = 10^5$. Each arrow denotes the basis Stokes vector of a polarimeter arm.