Optimal birefringence distributions for star test polarimetry

Star test polarimetry is an imaging polarimetry technique in which an element with spatially-varying birefringence is placed in the pupil plane to encode polarization information into the point-spread function (PSF) of an imaging system. In this work, a variational calculation is performed to find the optimal birefringence distribution that effectively encodes polarization information while producing the smallest possible PSF, thus maximizing the resolution for imaging polarimetry. This optimal solution is found to be nearly equivalent to the birefringence distribution that results from a glass window being subjected to three uniformly spaced stress points at its edges, which has been used in previous star test polarimetry setups.

. System layout for a star test polarimetry measurement of an unknown input field E 0 , illustrated for the case in which a single circular polarization component is imaged. The pupil plane is described by a radial coordinate u = sin θ and an azimuthal angle φ. independent fluorescent molecules [9]. Star test polarimetry can also be applied for imaging continuous objects if these are discretized by placing a pinhole array in the object plane of an afocal 4 f relay system with unit magnification [10,11]. In all these applications, the polarization state of each object point can then be deduced from the corresponding PSF in the image plane, provided that the PSFs from different points do not overlap significantly. The spatial resolution of the measurement is thus limited by the size of the PSF. Encoding polarization information in the PSF necessarily implies increasing its size, since more information is being included in it.
The goal of this work is to find the birefringence distribution of the mask in the pupil plane that maximizes spatial resolution by producing the smallest possible PSF while imposing reasonable restrictions to ensure that the object's polarization is effectively encoded in the shape of the PSF. The optimal solution is found to give very similar results to those of the birefringence distribution of the SEO. The solution's statistical performance (i.e., the expected error in the retrieved Stokes parameters) is assessed by calculating the Fisher information matrix for the measurement.

System layout and notation
Consider the system shown in Fig. 1, in which a spatially-varying, transparent, thin birefringent mask with Jones matrix J(u) is placed at the pupil plane of an exit-telecentric imaging system, where u = (u cos φ, u sin φ) is the pupil coordinate normalized such that u ≤ NA, with NA being the system's numerical aperture. Collimated light (e.g., from a distant localized source) with uniform polarization E 0 is then incident on this BM in the plane of an aperture with pupil function A(u) (binary or apodized) and is focused by a lens, producing a polarization-dependent PSF. The input polarization may then be deduced from the shape of the PSF by separating two orthogonal polarization components of the field, denoted by e 1 and e 2 , before forming an image of one (or both) of them. For applications where photons are not scarce, only one of the two images is often necessary, so the extraction of the component in question (say, e 1 ) can be performed with a polarizer. As will be discussed later, however, there are applications in which it is best to use all photons, so instead the two components (e 1 and e 2 ) are separated by using a polarizing beamsplitter or a Nomarski or Wollaston prism and then they are imaged separately [9].
In what follows, all calculations are performed in the circular polarization basis, so that e 1 = (1, 0) and e 2 = (0, 1), corresponding to right-hand circular (RHC) and left-hand circular (LHC) polarizations, respectively. If other output polarization components were analyzed, the results would still be valid after minor modifications. It is assumed that the imaging lens is slow (typically around 0.05 NA) so that the PSFs are imaged onto several pixels. All calculations then assume the paraxial limit, and angle-dependent polarization effects at the lens can be neglected.

Birefringence distribution and point-spread function
We use here the notation introduced in Ref. [12] to describe spatially-varying birefringence, where the Jones matrix of the BM in the circular polarization basis may be written as with Γ(u) being a global phase function, and where Θ(u) ∈ [−π/2, π/2] and Φ(u) ∈ [0, 2π) are the latitude and longitude on the Poincaré sphere of the "slow" eigenpolarization at the pupil point u, and δ(u) represents half the retardance between the two eigenpolarizations. As discussed in Ref. [12], the four quantities in Eq. (2) can be used to construct a four-dimensional unit vector ì q (u) = (q 0 , q 1 , q 2 , q 3 ), and J(u) itself can be considered as a unit quaternion. A particularly simple visualization of the birefringence results from using the 3-vector q = (q 1 , q 2 , q 3 ), which points in the direction joining the two eigenpolarizations in the Poincaré sphere, and whose magnitude encodes the half-retardance δ as |q(u)| = sin δ(u) [12]. Because ì q and − ì q correspond to the same effective birefringence, this vector can always be chosen such that q 0 = (1 − |q| 2 ) 1/2 ≥ 0.
After transmission through the BM, the e 1 and e 2 polarization components of the field are given by e † 1 J(u) E 0 and e † 2 J(u) E 0 , respectively, where the dagger denotes a conjugate transpose. The paraxial PSF I (j) (x) of each polarization component e j is the squared modulus of the Fourier transform of the pupil distribution: where k = 2π/λ is the wavenumber, x is the two-dimensional spatial coordinate in the image plane, and · T denotes a time average in the case of partially polarized light. As shown in Ref. [12], the phase function Γ(u) causes the PSF to increase in size without helping to encode polarization information. Therefore, its optimal value is a constant, assumed here without loss of generality to be zero. The PSF may be rewritten succinctly by introducing the pupil functions and their Fourier transforms Note that G(x) and H(x) represent the output e 1 field component that results from an RHC or LHC polarized incident beam, respectively. As shown in Appendix A, the PSF of each polarization component may then be expressed in terms of the Stokes parameters S 0 , S 1 , S 2 , S 3 of the incident field E 0 in the form where the normalized intensity contributions I (1,2) n (x) are given by As shown in Appendix B, the Fisher information matrix for a measurement of the normalized Stokes vector s = (S 1 , S 2 , S 3 )/S 0 when N photons are detected is given by where, in what follows, we assume that only the image for the first polarization component (e 1 ) is being used, so that µ(x) is defined by and is a unit vector, and therefore |µ| ≤ 1. The case in which both images are used is discussed in Section 9.

Constraints on the BM distribution
As mentioned previously, the primary performance metric for a candidate BM distribution is its impact on the width of the resulting PSF. However, it is necessary to impose some constraints in order to avoid solutions that would be unsuitable for polarimetry. For practical purposes, it is useful for the PSFs corresponding to each polarization component to have the same total power Ψ (j) = ∬ I (j) (x)d 2 x. This makes the signal-to-noise ratio roughly independent of the input polarization, and it ensures that all polarization information encoded by the BM is fully contained within the shape of each intensity distribution rather than the overall power. Consequently, the input polarization can be deduced by analyzing the PSF of a single component, which is guaranteed to contain roughly half of the incident photons when the signal is sufficiently large.
The total power of each polarization component can be found by integrating Eq. (6) over x and applying Parseval's theorem to each term, leading to where β(u) = (−2q 0 q 2 + 2q 1 q 3 , 2q 0 q 1 + 2q 2 q 3 , q 2 0 − q 2 1 − q 2 2 + q 2 3 ) is a unit vector. Then Note that this condition is not satisfied when ì q is constant, ruling out the possibility of a uniform BM. In fact, from Parseval's theorem it is easy to see that µ = β A /Ψ (1) . Therefore, the condition β A = 0 implies that µ = 0, which causes a significant simplification of the Fisher information matrix in Eq. (8), in particular making zero the second term inside the brackets that is guaranteed to be non-positive for the diagonal elements of the matrix.

Derivation of differential equation and boundary conditions
When the BM is introduced into the system, the PSF must increase in size since it now contains polarization information. Because the PSF would provide information about four values (the Stokes parameters) rather than one, the width of I (j) (x) can be expected to be roughly twice as large as that of a system with zero (or uniform) birefringence. Of course, there are many measures for the size of the PSF, each of which would give a slightly different result. The metric chosen here is the RMS irradiance width, which allows a simple variational calculation. It is expected that the optimal birefringence distribution for this measure will be nearly optimal for other measures, such as the full width at half maximum (FWHM) or the Strehl ratio.
Generally speaking, the width of the PSF does not strongly depend on the incident polarization state since the normalized intensity contributions I (1) n (x) (n = 0, 1, 2, 3) have similar compositions. Therefore, for simplicity, the performance may be evaluated in terms of the spread of the average PSF over all possible input polarizations, i.e., over all values of S 1 , S 2 , and S 3 within the range [−S 0 , S 0 ]. This average produces I (1) , the analysis is also valid when the PSFs of both polarization components are used.) As shown in Ref. [12], the squared RMS width r 2 of the PSF for unpolarized light is where ∇ is the gradient with respect to u, ∇ ì q denotes the Frobenius norm of the 2 × 4 matrix ∇ ì q, and κ = k 2 ∬ A 2 d 2 u. (This result can also be derived from Eqs. (6) and (7a).) The term involving |∇A| 2 accounts for diffraction, while the second term is the increase in width due to the birefringence distribution of the BM. Notice that if A represents a hard aperture, the RMS width is not well-defined since ∇A diverges at the edge of the pupil. However, the increment caused by the BM can be well-defined and finite even for a hard-aperture pupil; it is this increment that we seek to minimize. Even for hard apertures, this measure of increase of the PSF width is expected to produce meaningful results, which can be verified by evaluating the FWHM of the solution for comparison. Since ì q is a unit vector, it contains only three free components for optimization. A natural choice is to use the three-dimensional vector q and let q 0 = (1 − |q| 2 ) 1/2 . To find the solutions we use a variational approach: we consider a linear combination of the PSF's width increase and the three constraints, is a vector of Lagrange multipliers and the constant prefactor κ was introduced for future convenience. The variational procedure consists of finding the conditions under which M is stationary under infinitesimal changes in q. For this purpose, consider a variation ì is an infinitesimal change. Since after this change the vector must remain a unit vector, ì must be perpendicular to ì q, so only three of its components are independent parameters, e.g., the zeroth component can be chosen to be a function of the remaining ones according to Let us first analyze the variations of each of the different parts, starting with the PSF width increase. It can be seen from Eq. (12) that ì q → ì q + ì causes the change where we ignored terms quadratic in ì and in the second step we removed the derivatives from n by using integration by parts. The term in square brackets is an edge contribution that is present in the case of hard apertures, where ∂ ⊥ denotes the derivative normal to the edge, and it should be included only if edges are excluded from the region of integration within the second term (since otherwise this edge contribution is already included in this integral). The corresponding variations in the three constrained quantities β A give In Eqs. (14) and (15), 0 can be expressed in terms of the remaining n by using Eq. (13). The substitution of the resulting variations into M causes a variation of the form M → The total variation of M is then insensitive to the infinitesimal changes n at any point inside the pupil only if M 1 = M 2 = M 3 = 0 everywhere inside the pupil, that is, if the following three constraints hold: Note that for hard apertures, these equations are dominated at the edges by the derivatives acting on A 2 , which is discontinuous. However, as mentioned earlier, it is perhaps more convenient to account for variations of M due to the edges by using the edge term in Eq. (14). After following similar steps, it is easy to find three edge constraints that can be written succinctly as This way, Eqs. (16) apply within the smooth regions of the aperture (even infinitesimally close to the edges) and constitute the differential equations for q(u) to be solved, while Eq. (17) applies at the edges, providing appropriate boundary conditions.

Solution ignoring the boundary conditions
In order to solve these equations, it is illustrative to first solve the simpler problem in which the constraints are ignored in the variational calculation (that is, the Lagrange multipliers are set to zero), as are the boundary conditions. In this case, Eqs. (16a) through (16c) can be written compactly as or as a set of several equalities: The latter representation highlights the underlying symmetry between the coordinates of ì q (u) on the Poincaré hypersphere they inhabit [12].
Consider the standard case of a circular hard aperture with pupil function The rotational symmetry of the pupil combined with the symmetries of the differential equations mentioned earlier allow solving the problem through separation of the polar pupil variables u, φ.
The solution becomes particularly simple if we assume that the BM is not birefringent at the pupil's center, i.e., that q = 0 for u = 0. (Note that this choice causes no loss in generality, since the cascading of the resulting BM with plates with uniform birefringence before and/or after it gives access to other solutions.) It is easy to show that a set of solutions to Eq. (18) is given by for integer m, and where b m is a constant. (A derivation of this result in terms of a stereographic projection of the hyperspherical coordinates ì q is given in Ref. [13].) Note that we also made the choice of setting q 3 = 0. Again, there is no loss of generality in this choice, because other solutions can be found through cascading with uniform birefringent plates.
The solutions in Eq. (21) turn out to automatically satisfy the constraints β 1 A = β 2 A = 0. For the third constraint ( β 3 A = 0), we calculate where we used the change of variables v = u/NA ∈ [0, 1] in the last step. The value of this expression is plotted in Fig. 2 whereδ(v) = δ(vNA). Note from this expression that solutions with higher |m| tend to cause larger increases in the size of the PSF. For the solutions in Eq. (21),δ(v) = 1 2 arctan(b m NA |m| v |m| ). The numerical results obtained for each solution are shown in the inset in Fig. 2, where we can see that the solution with |m| = 1, b 1 = 0.625/NA produces the smallest increase in PSF width, which is ∆r = 0.239λ/NA. The resulting half-retardance distribution is thenδ(v) = 2 arctan(0.625v).
While the solutions found above satisfy the constraints β A = 0, the fact that these constraints were ignored at the outset in the variational calculation means that they cannot simultaneously satisfy this constraint and the boundary condition in Eq. (17), which for the hard circular aperture considered here (where the normal derivative ∂ ⊥ reduces to a radial derivative) is given bȳ These solutions are therefore not optimal, but they provide insights that are useful when considering the true optimal solution.

Optimal birefringence distribution
Let us now go back to the general equations (16) while again considering a hard circular pupil. Drawing inspiration from the previous case, we propose a separable solution constrained to the q 1 q 2 plane, namely q(u, φ) = sin δ(u) [cos(mφ), sin(mφ), 0] with m 0, which automatically satisfies the constraints β 1 A = β 2 A = 0. Notice from Eqs. (2) that this ansatz corresponds to q 0 (u) = cos δ(u), Θ = 0, and Φ = mφ. The substitution of this ansatz into Eq. (16c) implies that Λ 1 = Λ 2 = 0, and the resulting form of both Eqs. (16a) and (16b) gives a differential equation for the half-retardance δ, which after changing variables to v = u/NA becomes for v ∈ [0, 1], where the assumption of no birefringence at the pupil center translates into the conditionδ(0) = 0, and the boundary condition in Eq. (24) must be satisfied. Additionally, the solution must satisfy the constraint β 3 A = 0, which implies ∫ 1 0 cos 2δ(v) v dv = 0. Equation (25) must be solved numerically subject to these boundary conditions and constraints by choosing appropriately the initial slope and NA 2 Λ 3 . Like for the case considered in the previous section, the optimal solution corresponds to |m| = 1. This solution is shown in Fig. 3, and it turns out to be very well approximated by the quadratic expression 1.856v − 0.922v 2 (the R-squared of the fit being 0.9996). This solution achieves an increase in the PSF width of ∆r = 0.220λ/NA, which is indeed slightly smaller than that found in the previous section.

Performance evaluation and comparison to SEO
As discussed in the introduction, star test polarimetry is implemented experimentally by using an SEO having three points of symmetric stress. The birefringence distribution near the center of an SEO turns out to be approximately given by the same general form as the previous two solutions, namely q(u, φ) = sin δ(u) [cos(mφ), sin(mφ), 0], with m = −1 and where in this case the half-retardance is given by δ(u) = cu, with c being a measure of the stress in the SEO [6,7]. Given the similarity of this birefringence distribution to the optimal one, the constraints β 1 A = 0  and β 2 A = 0 are also automatically satisfied. The third constraint reduces to where sinc(x) = x −1 sin x. This equation has infinitely many solutions for c, but the one that leads to the smallest PSF width increase is the first one, c = 1.166/NA, shown in Fig. 3. Table 1 provides a comparison of the performance metrics of the BM solutions found in the previous two sections and the SEO. The half-retardance distributions are also summarized. All three solutions correspond to |m| = 1. (The SEO corresponds specifically to m = −1, but the results are the same for m = 1, which corresponds to an SEO followed a uniform half-wave plate.) The optimal solution does outperform the two other solutions not only in RMS width increase ∆r (calculated through Eq. (23)) for which it was optimized, but also for two other performance metrics: the FWHM and the Strehl ratio. However, the differences in these metrics between the optimal solution and the SEO are of only between 5 and 10%. Also provided for comparison are corresponding metrics for a diffraction-limited imaging system where no BM is used (or equivalently where the BM is uniform). We can see that the cost of encoding polarization information into the PSF (on average over all possible polarizations) is an increase in its FWHM of about 60% and a reduction of the Strehl ratio by about 50%.
The functions G(x) and H(x) and the corresponding PSF contributions I (j) n (x) for the optimal BM are shown in Fig. 4. Note that the PSF contributions involve negative contributions for n ≥ 1, but the total measured PSF, which corresponds to the superposition in Eq. (6), is always non-negative as a result of the constraint S 2 0 ≥ S 2 1 + S 2 2 + S 2 3 . For the PSF contributions, the 3 (x) 3 (x) changes by a global sign. That is, the symmetries of the functions G(x) and H(x) are such that the only sign change (±) that has an effect in Eqs. (7) is that in Eq. (7d). The equivalent plots for an SEO are virtually indistinguishable and are therefore not shown. Instead, to appreciate the very subtle differences, plots of the horizontal cross-sections I (1) n (x, 0) for both the optimal BM solution and the SEO are shown in Fig. 5. In practice, the effect of these differences is probably insignificant compared with variations that would arrive, for example, from the pixelization of the detector.
The accuracy of the measurement can be assessed by using the Fisher information matrix in Eq. (8), which given the constraints imposed on the solutions reduces to The inverse of this matrix provides an estimate of the variance in the accuracy of the measure- ments. That is, for a given incident polarizationŝ, the retrieved normalized Stokes parameter vector s is expected (within one standard deviation) to be within an ellipsoid given by Figure 6 shows cross-sections (in red) of these ellipsoids for several incident polarization states, both for the optimal BM and for the SEO, for the case of N = 1500 detected photons. The error ellipsoids are slightly more symmetric with respect to the s 1 -s 2 plane for the optimal BM, but overall the performance is essentially the same. Note that these expected deviations of the normalized Stokes vectors scale as N −1/2 , and that the width of the ellipsoids in the plane normal to the cross-sections is comparable to that in the azimuthal direction.
The fact that all solutions described so far have the form q(u, φ) = sin δ(u) [cos(φ), ± sin(φ), 0], where δ(u) is linear near the center, can be understood in terms of the two circular components of the incident light. Recall that the BM is followed by an RHC polarizer and a Fourier-transforming lens before reaching the detector. Since the birefringence is smallest at the center of the BM and grows away from it, the BM/RHC polarizer combination acts as a rotationally-symmetric apodizer for the incident RHC-polarized light. The incident LHC-polarized component, on the other hand, gets converted into RHC light away from the BM's center, and acquires a phase vortex due to a geometric phase effect. After Fourier transformation by the lens, the field distributions at the detector plane for these components are precisely the distributions G and H (shown in Fig. 5 for the optimal BM), whose forms are a localized central lobe with uniform phase and a slightly larger "donut" with a phase vortex of unit charge at its center, respectively. The radial extent of the latter would increase with |m|, the magnitude of the vortex charge written by the BM on H, so |m| = 1 guarantees the most compact vortex distribution. The superposition of these two fields then gives access to combinations with all relative phases between them, as well as with all relative amplitudes due to their different radial dependence.
The accuracy of a polarimetric measurement is known to increase with the volume within the Poincaré sphere enclosed by a simplex whose corners correspond to the polarization components being measured [14]. The analysis above tells us then that, in theory, each point of the PSF being measured corresponds to the measurement of a different polarization component, and that these cover the complete surface of the Poincaré sphere in a fairly uniform way, so that all the interior of the sphere is enclosed. The uniformity of the coverage of the Poincaré sphere in the s 3 direction following from the variation in the radial direction of the relative weights of the two field components is examined in Fig. 7(a) for both the optimal solution and the SEO. While the uniformity is not perfect, all points are represented in similar amounts and the centroid of the distribution is zero, as guaranteed by the constraint µ 3 = 0. In practice, of course, the sampling of these different polarization components is compromised by detector pixelation, and even if the pixels were to be made very small, this would come at the cost of each detecting less photons. However, it is clear that this type of measurement provides a more "democratic" coverage of the sphere when compared, say, to a polarimeter where six polarization components are measured.

Results when both polarization components are measured
So far we have assumed that only one circular polarization component (e 1 ) emerging from the BM is focused to form an image. However, as mentioned in the introduction, it is possible to instead separately image both circular components and analyze the two resulting PSFs jointly. While doing so requires a more complex system, there are applications in which it is advantageous [9]. One clear advantage is that the number of detected photons is essentially doubled, reducing the uncertainty in the estimation of the Stokes parameters by roughly a factor of 1/ √ 2. Note, however, that when both images are used, µ vanishes automatically (where the bar now indicates weighted integration over both images) because the total number of detected photons is independent of the polarization state. The Fisher information matrix is then given by the simpler form in Eq. (27) without having to impose the constraints introduced in Section 4. While removing these constraints would clearly affect the variational derivation presented earlier, we now show that the results found earlier remain nearly optimal for the case when both components are imaged. Consider again the representative case of nearly unpolarized light (s ≈ 0), for which the uncertainty in the estimation of the Stokes parameters is largest, and for which the Fisher information matrix in Eq. (27) reduces to N times µ m µ n . Given the forms found for I (j) n (x) (antisymmetric in x and symmetric in y for n = 1, symmetric in x and antisymmetric in y for n = 2, and rotationally symmetric for n = 3), the 3 × 3 matrix with components µ m µ n is automatically diagonal. Further, because µ(x) is a unit vector, the trace of this matrix is unity. The uncertainty in the measurement is then minimized if the three diagonal elements have equal magnitude, that is, if µ 2 n = 1/3 for n = 1, 2, 3. It turns out that the optimal BM solution derived in Section 7 nearly achieves this, giving µ 2 1 = µ 2 2 = 0.32 and µ 2 3 = 0.35, while the SEO gives µ 2 1 = µ 2 2 = 0.35 and µ 2 3 = 0.30. The cross-sections of the uncertainty ellipsoids for both the optimal BM solution and the SEO are shown (in blue) in Fig. 6 for the case in which 1500 photons are detected in each PSF, giving a total number of photons of N = 3000. The first thing one notices is that the uncertainties are indeed smaller by a factor of about 1/ √ 2 with respect to those where only the PSF for the e 1 component was used (red). Also, the uncertainties are now exactly symmetric in s 3 , as is the coverage of s 3 shown in Fig. 7(b). Finally, note that the relations µ m µ n = µ 2 n δ mn and µ n = 0, valid for the optimal and SEO solutions (and whether only e 1 or both e 1 and e 2 components are imaged), imply that the four basic PSFs, I (j) n (x), are orthogonal (and almost orthonormal since µ 2 n ≈ 1/3) under the weight 1/I (j) 0 (x). This orthogonality is useful in the retrieval of the parameters from the measured PSFs.

Concluding remarks
A variational calculation was performed to determine the spatial birefringence distribution to be placed at the pupil plane of an imaging system that optimizes the encoding of polarization in the PSF's shape while keeping the PSF size as small as possible. This optimal birefringence pattern was found to be similar (both in distribution and performance) to that found naturally near the equilibrium points of a transparent window under stress: the mask is not birefringent at its center, but its retardance grows with the radial pupil coordinate u, while the orientation of the slow and fast axes rotates with the azimuthal pupil coordinate φ as ±φ/2. (As mentioned earlier, other solutions with equal performance correspond to birefringence distributions that result from the cascading of these solutions with uniform wave plates.) The only small differences were in how the retardance grows with u away from the center, but these cause only small variations in the resulting PSFs. In fact, if the variational derivation had been based on a different measure of PSF size and/or requirements for polarimetric optimality (e.g. a measure of the uniformity of µ 3 in Fig. 7), the details of the optimal δ(u) would change slightly, these variations being comparable to those between any of these optimal solutions and the SEO. That is, the mechanics of stress-induced birefringence provide a simple way to produce a nearly optimal birefingence distribution for imaging polarimetry, without the need for nano-fabrication or combinations of spatial light modulators: by applying a pressure distribution with trigonal symmetry to the edges of a a glass window, a nearly-optimal birefringence mask (the SEO) results.
Since the aim of this work was to show that the optimal BM distribution has very similar characteristics and performance as the SEO, we considered the continuous functional form of the PSFs. The performance of these devices in real applications, however, will also depend on the pixelation and general characteristics of the detector, as well as on the algorithms used for the retrieval of the Stokes parameters from the measured PSFs. Preliminary work on this direction can be found in Ref. [13]. These aspects are particularly important in applications such as fluorescence microscopy [9], where photons are scarce and the PSFs are used to extract information not only about the two-dimensional polarization studied here, but of the three-dimensional polarization (as well as the three-dimensional position) of the light emitted by a fluorophore. For such applications, the near-optimality of the SEO is of central importance.