Multiband Weighting of X-ray Polarization Data

An optimal estimate for Stokes parameters is derived for the situation in X-ray astronomy where the instrument has a modulation factor that varies significantly with energy but the signals are very weak or mildly polarized. For such sources, the band of analysis may be broadened in order to obtain a significant polarization measurement. Optimal estimators are provided for the cases of binned and unbinned data and applied to data such as might be obtained for faint or weakly polarized sources observed using the Imaging X-ray Polarimetry Explorer (IXPE). For a sample situation, the improvement in the minimum detectable polarization is 6-7% using a count weighted root-mean-square of the modulation factor, when compared to a count weighted average. Improving the modulation factor, such as when using a neural network approach to IXPE event tracks, can provide additional improvement up to 10-15%. The actual improvement depends on the spectral shape and the details of the instrument response functions.


INTRODUCTION
X-ray polarimeters are still at an early stage of development where measurements are signal limited across the entire band of sensitivity. Current and planned instruments generally provide sinusoidal signals with fractional half-amplitudes of pµ for 100p% (linear) polarized input, where µ is the modulation factor. Often, p and µ are significantly less than unity, providing weak signals. The minimum detectable polarization (MDP) is frequently used to indicate how low an instrument's sensitivity might be; when the instrument gives p > MDP, the signal should be significant at high confidence. Most calculations of MDP select a single value for µ in order to determine the MDP even though µ may vary significantly with energy within the instrument's bandpass. Here, I examine how one may account for such variation in an optimal fashion. X-ray polarimeters that use the photoelectric effect such as the Imaging X-ray Polarimetry Explorer (IXPE: Weisskopf et al. 2016;O'Dell et al. 2018) or practically any other method including scattering (e.g. X-Calibur: Beilicke et al. 2014;Kislat et al. 2018), provide a signal that is probabilistically related to the input polarization direction. A histogram of event phase angles, ψ, then has a characteristic instrumentdependent modulation factor, µ. Ground-based calibration using sources of known polarization angle are used to determine how µ depends on energy E. Polarimeters are usually characterized by their MDP which depends on µ and is also related to the exposure time, T , and the count rates the instrument records for both the source R S and background R B : and the level of confidence, which we will take to be 99% (Weisskopf et al. 2010 where N = T R S is the expected number of counts in the observation. In general, the modulation factor depends on energy. It is defined as the ratio of the semi-amplitude of the event histogram to the average over phase for fully polarized light. Thus, MDP 99 can be readily computed for a small energy range but it is more difficult to determine what value of µ to use when µ(E) is a strong function of E. One approach would be to use a count-weighted average of µ: where C j are the counts observed in energy bin j and N = j C j Elsner et al. (2012). Here, I compute optimal estimates of the polarization Stokes parameters for cases where µ can vary significantly across the energy range of the detector as long as the Stokes parameters are constant or vary slowly across the entire range.
For high signal observations of strongly polarized sources, the data can be divided into energy bands over which the assumption of µ = constant is appropriate. In this analysis, I will concentrate on the case when the signal is weak, so that one requires a large bandpass in order to detect a polarization signal. As circular polarization is not yet feasibly detectable, I will assume that Stokes V is zero. Furthermore, for simplicity, only the case where the background is negligible and the exposure for each angular bin is the same are considered. Kislat et al. (2015) provided an method to determine weights for the case where an instrument must be rotated to sample the Stokes parameters fully and the sampling is not uniform.

Assumptions
Let i designate one of n angular phase bins of width ∆ψ = 2π/n about ψ i = i∆ψ (i = 0..n − 1) and j indicate the energy bin of width ∆E about E j ( j = 1..J). The instrument provides counts C i j , each with uncertainty σ i j . I assume that the background is negligible for this analysis. Also, for the sake of simplicity so that we may gain a little intuition about the solution, I will assume, for now, that we have large signals in every bin so that we may use χ 2 statistics. To make formulae a little simpler, I will set σ i j to σ for a few of the derivations but then generalize in § 2.7.
Since we are interested in how to combine data from different energy channels, I also assume that the Stokes parameters Q and U are not functions of energy. Otherwise, we would not combine the channels at all. I will not use any formalism involving energy to channel redistribution matrix files (aka RMFs), but instead assume that the energy response functions are narrow compared to the energy bands. This last point is not really restrictive because all we need in the end is a model that describes the counts in each phase-averaged energy bin.

The Model and Fit Statistic
The expected number of counts in bin i, j is where f j is the source flux in suitable units (say ph/cm 2 /s/keV per radian of rotation), µ j is the modulation factor at energy E j , A j is the system effective area, T is the exposure time, and q and u are the fractional Stokes parameters, given by Q = P cos φ 0 = qI and U = P sin φ 0 = uI, respectively. The polarization fraction is then p ≡ P/I = (q 2 + u 2 ) 1/2 and the source phase angle is φ 0 = tan −1 u/q. The electric vector position angle (EVPA) is ϕ = φ 0 /2. We form the fit statistic from the data and the model as To make the solution a little easier to read, I define α j = f j A j T , β j = α j µ j , s i = sin 2ψ i , and c i = cos 2ψ i . At this point, I set σ i j = σ for all i and j, which would apply if the counts in each energy bin were the same and p 1. Eq. 5 then becomes Because the phase angles are chosen uniformly,

Solving for the Flux (Stokes I)
Setting the derivative of χ 2 with respect to α j to zero yields where ∆ = ∆ψ∆E. For this analysis, µ j p 1, so we drop the 2nd term on the right of Eq. 11, obtaininĝ where C . j ≡ i C i j is the expected number of counts in channel j. We also havê Note that the units are appropriate; the factor of 1/2π comes in because the flux was defined per radian of polarization phase space.

Solving for the Polarization (Stokes Q and U)
Similarly proceeding, one may derive the best estimates of q and u: Using the definition of β j and substituting Eq. 12, we havê where the definition of C . j and Eq. 9 are used and Also, where 2 (x) is the 2nd term of the Fourier transform of x i (i.e., the 2θ modulation portion of the phase binned counts weighted by a channel-dependent term). Note that pµ j C . j is the maximum modulated signal in channel j, and µ j C . j is the maximum modulated signal when p = 1. The best estimates of q and u for a single energy channel j are obtained from Eqs 16 and 17 by removing the summation over j, givingq We can check these formulae by modeling with C i j = C . j /n(1 + ηµ j cos 2ψ i ) = C . j /n(1 + ηµ j c i ) as the response of the detector to a signal that is 100η% polarized at an EVPA = ϕ = φ 0 = 0. Substituting the model into Eqs. 21 and 22 givesq j = η andû j = 0, as expected, using Eqs. 8 and 9.

Bandpass Weighting
Solving Eqs. 21 and 22 for i c i C i j and i s i C i j , respectively, then Eqs. 16 and 17 can be rewritten aŝ where Because j W j = 1, the W j values can be identified as weights to apply to the individual bandpass values in order to obtain optimal, global values, as long as the polarization angle and fraction do not depend significantly on energy. The relative weight of channel j compared to channel k is just (µ j /µ k ) 2 (C . j /C .k ) 2 , the ratio of the variances of the total counts in the two channels weighted by the square of the relevant modulation factors.

Stokes Parameter Uncertainties, Binned Case, Constant Uncertainties
The diagonal terms of the Fisher matrix give the parameter uncertainties for f j , q, and u, as the offdiagonal derivatives of Eq. 6 are all zero or very small when pµ 1. Because of the assumption that the bandpasses are arranged to have equal variances, σ 2 , C . j = nσ 2 = N/J, giving In order to relate to the Stokes parameters' uncertainties to MDP 99 , consider the case of a single channel, where J = 1 and µ j = µ, so MDP 99 is then just scaled to the uncertainty in q or u: Due to the symmetry between q and u in the absence of a signal, we can determine the MDP for uniformly weighted polarimeter counts when there are several bandpasses with differing modulation channels: Eq. 31 indicates that the optimal modulation factor to use for a multi-band data set with constant counts per energy bin is the RMS of the set of µ j values, not the mean.

Using Data Uncertainties, Binned Case
While commonly practiced, it is not always appropriate to set spectral bins based on the observed counts in order to obtain a constant number of counts per spectral bin. One reason to avoid such an approach is that the bins may end up too wide to satisfy the requirement that the modulation factor be constant across the bin. So, now we return to the general case where uncertainties depend on the energy bin j. With statistical weighting by 1/σ 2 i j , the weighted sums over c i , s i , and c i s i are no longer identically equal to zero as the uncertainties track the counts per bin, which will correlate with ψ i for pµ j > 0. However, when considering the special case of weakly modulated signals, there can be simple solutions.
The best values of α j , q, and u are found by setting ∇χ 2 = 0. Using Eq. 5 for χ 2 gives These equations will not be tractable in general but we can derive approximate solutions for pµ j 1 and we may drop the c i s i terms that are small compared to the c 2 i and s 2 i terms: where the quantityC j is a variance-weighted average of the counts in channel j. When σ 2 i j = C i j , then C j = n/( i C −1 i j ). The single channel estimates of the q and u values arê Again, we can solve Eqs. 38 and 39 for the terms that sum over C i j /σ 2 i j in the numerators to obtain estimates of q and u in terms of single band estimateŝ where and, as before, j w j = j w j = 1. These are the weights to be applied to single band results to combine them optimally.

Stokes Parameter Uncertainties, Binned Case
Again, uncertainties in the derived values can be estimated under the assumption pµ 1 so that crossterms in the Fisher matrix are negligible. Then, For weak modulations, the counts in channel j are approximately independent of phase bin, so we may approximate σ 2 i j by σ 2 j /n, where σ 2 j ≈ C . j is the variance in total counts in channel j, C . j . NowC j = C . j /n, giving Following the computation of MDP 99 as in § 2.6, MDP 99 in this case is Eq. 50 indicates that the modulation factor to use in Eq. 2 that is optimal for a multi-band data set is the count-weighted RMS of the set of µ j values, j µ 2 j C . j /N, not the count-weighted mean (Eq. 3). The choice to use Eq. 49 or Eq. 50 depends on whether the variances are dominated by the total counts in the energy band; if there is significant non-Poisson variance, then Eq. 49 is will be somewhat more accurate.

WEIGHTING OF UNBINNED DATA
In general, an unbinned likelihood analysis is more appropriate than the use of χ 2 statistics for fitting X-ray event data due to the Poisson nature of the counting statistics. Furthermore, binning details often involves user discretion. Though normally requiring numerical methods to determine model parameters, the assumption of weak modulations makes it possible to obtain good estimates.

Likelihood Formulation
Assume that there are N events, with energies and instrument phases (E i , ψ i ). At energy E i , the modulation factor is µ i , the instrument effective area is A i , and the intrinsic source flux is f i = f (E i ) based on the spectral model of the source. The event density in a differential energy-phase element dEdψ about (E, ψ) is and the log-likelihood for a Poisson probability distribution of events, S = −2 ln L, is Note that eq. 53 has two terms that depend only on the parameters of f E and the remaining interesting term depends only on the parameters of q and u. If there are no parameters in common, in the case where q and u do not depend on the source flux and we are not fitting (yet) for parameters of the spectral model, then the log-likelihood for the polarization parameters is merely S(q, u) = −2 i ln(1 + qµ i cos 2ψ i + uµ i sin 2ψ i ).

Estimating q and u for the Unbinned Case
We can define c i = µ i cos 2ψ i and s i = µ i sin 2ψ i (unlike the case for binned data). Setting ∂S/∂q = 0 and ∂S/∂u = 0 to find the best estimates of q and u gives where w i ≡ (1 +qc i +ûs i ) −1 . It can be shown that i w i = N. These two equations apply under quite general circumstances but require numerical solution. However, forq 1 andû 1, then w i ≈ 1 −qc i −ûs i , so eqs. 55 and 56 become where all sums are implicitly over i. If the source is polarized, then the phase angles are not random, so the sums over the sine or sine-cosine terms are not identically zero. Fortunately, the pair of linear equations can be solved, givingq In keeping with the assumed approximations of small q and u, we expect the sine-cosine terms to be small compared to the sums over s 2 i or c 2 i , givingq It should be simple to compute these sums for all the events in an observation in order to obtain a first estimate of the sizes of the fractional Stokes parameters, q and u. These estimates provide some accounting for the variation of the modulation factor by using it as a weight on each phase term. Furthermore, the solution indicates a simple way to estimate the dependences of q and u on E by merely creating the sums over suitably small energy ranges so one may obtain a non-parametric estimate of the (q, u) energy dependence, assuming that the values are small but measurable.
The RMF does not enter this analysis, making it relatively easy to examine a data set in a preliminary fashion before attempting detailed fitting that might require an iterative fitting method that accounts for energy redistribution. Technically, assigning a value of µ to an event requires knowledge of the event's true energy, which is uncertain because of the probabilistic mapping between the detector signal and energy. However, as long as the modulation factor varies slowly compared to the detector response function, this approach may still be useful and give a valid impression as to how to devise a physical model.

Uncertainties, Unbinned Case
Finally, the uncertainties on q and u are estimated from the 2nd derivatives of eq. 54: and using w i ≈ 1 and w 2 i c i s i ≈ 0, one obtains simple estimates for the Stokes uncertainties: Remember that the c i and s i values have the modulation factor included in this situation, so the denominators are not N/2. To estimate the denominators, we note that whereμ 2 is the count-weighted average of µ 2 . Furthermore, c i and s i should be similarly distributed when p 1, giving Finally, as in § 2.7, we make the connection to MDP 99 : As in the binned case, the modulation factor to use in Eq. 2 that is optimal for unbinned data set is the count-weighted RMS of the set of µ i values, not the count-weighted mean (Eq. 3).

EXAMPLE APPLICATION AND SUMMARY
Armed with an appropriate weighting scheme, it is useful to determine how much the MDP may improve using it. To determine the effect of optimal binning, the results of an observations can be computed using assumed functions for the spectral model, effective area, and µ(E). It is important to get the dependences of these functions on energy approximately correct; the normalizations are not relevant when just selecting a total number of events for each simulated observation. Fig. 1 shows assumed models of the effective area, normalized to 1 at the maximum value, and of µ(E), both for an instrument like IXPE O'Dell et al. (2018). The standard modulation curve for IXPE uses a moment method with ellipticity exclusion (Bellazzini et al. 2003); the data are from A. Di Marco, et al., in preparation (see also Weisskopf et al. 2016). The effective area is approximate (as in Elsner et al. 2012)only the shape is important for this analysis. For a given total number of counts, N, two versions of MDP 99 were computed using the area and modulation curves for each of two assumed spectral models. The spectral model is a power law with spectral index Γ = 0 or 2, where f E ∝ E −Γ . For case 1, MDP 99 is determined using a count weighted modulation factor given by Eq. 3. For case 2, MDP 99 is given by Eq. 50. Thus: where X = 1 for case 1 and 2 for case 2 and the integral in the numerator normalizes the spectrum so that there are N expected counts. For this exercise, N = 10 5 after the ellipticity exclusion, achievable with IXPE for sources with 2-8 keV fluxes of ∼ 10 −11 erg cm −2 s −1 in about 10 days . For Case 1, MDP 99 came out to 3.61% and 4.52% for Γ = 0 or 2, respectively. For the optimum weighting of Case 2, MDP 99 values were 3.41% and 4.21%. Thus, the improvement to the MDPs were 6.0% and 7.2% for the two different spectral slopes. It is important to note that the fractional improvement is independent of N.
One can obtain further improvement in the MDP by increasing the modulation factor. Peirson et al. (2021) show that an approach to measuring IXPE tracks using convolutional neural networks can increase µ(E), as shown in Fig. 1, and thereby improve the subsequent estimate of the MDP. The actual improvement in MDP in this approach depends somewhat on a weighting parameter, λ (cf. Peirson et al. 2021). Using Eq. 71 with the λ = 2 modulation factor curve from Peirson et al. (2021) and assuming N = 10 5 can yield further reductions of MDP 99 by 15% and 8% for the two spectral shapes to 2.97% and 3.89%, respectively. For λ = 1, the improvement is less than a few percent. Peirson et al. (2021) point out that the track uncertainty weight method has no ellipticity exclusion, thus starting with more events than with the moments method, but does reduce the effective number of events by 15-20%.
Clearly, there is an advantage to optimal weighting using the count-weighted RMS of the modulation factor. The actual improvement will depend on the details of the shape of the effective area and modulation factor curves as well as the actual shape of the spectrum. However, this weighting can be generally applied for the best results.