Analysis of Polarimetry Data with Angular Uncertainties

For a track based polarimeter, such as the Imaging X-ray Polarimetry Explorer (IXPE), the sensitivity to polarization depends on the modulation factor, which is a strong function of energy. In previous work, a likelihood method was developed that would account for this variation in order to estimate the minimum detectable polarization (MDP). That method essentially required that the position angles of individual events should be known precisely. In a separate work, however, it was shown that using a machine learning method for measuring event tracks can generate track angle uncertainties, which can be used in the analysis. Here, the maximum likelihood method is used as a basis for revising the estimate of the MDP in a general way that can include uncertainties in event track position angles. The resultant MDP depends solely upon the distribution of track angle uncertainties present in the input data. Due to the physics of the IXPE detectors, it is possible to derive a simple relationship between these angular uncertainties and the energy-dependent modulation function as a step in the process.


INTRODUCTION
X-ray polarimeters that use the photoelectric effect, such as the gas pixel detectors (GPDs) (Costa et al. 2001;Bellazzini et al. 2007;Fabiani et al. 2012;Baldini et al. 2021) in the Imaging X-ray Polarimetry Explorer (IXPE: Weisskopf et al. 2016;O'Dell et al. 2018) provide a signal that is probabilistically related to the nput polarization direction A histogram of event phase angles, ψ, then has a characteristic instrument-dependent modulation factor, µ, defined as the amplitude of the signal modulation for a source of 100% linearly polarized photons. Instrument models and ground-based calibration using sources of known polarization angle are used to determine how µ depends on energy E. Generally, polarimeters are characterized by their minimum detectable polarization (MDP) at a given level of confidence (e.g., 99%) which depends on µ: where T is the exposure time and the count rates the instrument records for both the source R S and background R B (Weisskopf et al. 2010 where N = T R S is the number of counts in the observation. In a previous paper, we showed how to modify Eq. 2 for the case where µ is a strong function of E (Marshall 2021, hereafter Paper I). In this paper, we extend the method to include uncertainties in the event phase angles. Estimating the uncertainties in the phase angles was a major feature of a new advance in handling event tracks based on convolutional neural networks (CNNs, .  showed that a machine learning approach could provide significantly larger modulation factors than previous empirical methods, yielding improvements to the MDP for a given simulated observation. In order to choose networks yielding the smallest MDPs, a weighting term of form σ −λ was applied to the likelihood based on the event uncertainties, σ. However, the weighting term was heuristic, with a free parameter, λ, that was set a posteriori using simulations in order to provide small MDPs. Thus, the eventual MDP values actually depended on the free parameter.  found that λ should be 1.5-2 but that the value of λ that gives the minimum MDP depends on the spectral shape of the source, which determines the number of events with low energy and high σ relative to the number with high energy and low σ. The choice of an optimal CNN then depends on the spectral shape of the source and λ.
Here, I demonstrate that these track angle uncertainties can be directly related to the modulation function. Furthermore, I show how a maximum likelihood analysis of an observed data set can be modified to include the event angle uncertainty information in an objective manner. Using this approach, the MDP is derived for the case of large N and is shown to depend on the distribution of track angle uncertainties present in the data. This distribution necessarily depends on source spectrum, so the MDP will also depend on the source spectrum. However, the process for determining the optimal CNN of a set of possibilities will depend on its accuracy in measuring track angles and not directly on the source spectrum and will not require a weighting term that depends on a free parameter.

BACKGROUND
We follow the unbinned method of our previous paper, (Marshall 2021, hereafter Paper I) in formulating the problem and defining quantities. An observation with the instrument consists of N events with index i, where each event is tagged by its energy, E i , and instrument phase, ψ i . In the original formulation, it was assumed that this phase is the perfectly measured position angle of the event track. At energy E, the instrument response is characterized by a modulation factor is µ E and the instrument effective area is A E = AA(E), where A is a constant with units of cm −2 . For reasons that will be clear later, the intrinsic source flux is defined to be n E = n 0 N (E i ), where the function N (E) is a function giving the spectral shape of the source, and n 0 is the normalization of the spectrum in suitable units (say ph/cm 2 /s/keV per radian of rotation). In this model, the event density in a differential energy-phase element dEdψ about (E, ψ) is where T is the exposure time, and q and u are the fractional Stokes parameters, given by Q = P cos φ 0 = qI and U = P sin φ 0 = uI, respectively. The (linear) polarization fraction is then p ≡ P/I = (q 2 + u 2 ) 1/2 and the source phase angle is φ 0 = tan −1 u/q. The electric vector position angle (EVPA) is ϕ = φ 0 /2. In general, the normalized Stokes parameters q and u are functions of energy and a forward folding approach (such as xspec) is needed to estimate parameters of those functions. For now, I assume that q and u are independent of E or that the bandpass of interest is sufficiently narrow that the polarization fraction and EVPA can be assumed to be constant across the band. For some sources, such an assumption may be required in order to detect a weak polarization signal or to place a limit on the degree of polarization.
The log-likelihood for a Poisson probability distribution of events, S = −2 ln L, is The best estimate of n 0 is independent of q and u: The log-likelihood for the polarization parameters alone is merely the first term of Eq. 5: and the MDP is when N is large, whereμ 2 is the count-weighted average of µ 2 (Paper I).

DERIVING THE MODULATION FACTOR
For a photoelectron polarimeter, the interaction cross section is proportional to sin 2 θ cos 2 ψ, where θ is the polar angle and ψ is the azimuthal angle relative to the input photon's E vector, ϕ. For IXPE, the charge generated by photoelectron interactions in the detector gas drifts in an applied electric field in before collection at the pixelized anode (cf. Bellazzini et al. 2007). Thus, the polar angle is not measured and the azimuthal angle is independent of energy, so the probability distribution of track angles is p(ψ) = 2 cos 2 ψ − ϕ = 1 + cos 2(ψ − ϕ) for 100% polarized X-rays (p = 1); thus, µ = 1 in theory. Noting that this distribution is only satisfied for a perfect polarimeter, we examine the hypothesis that the modulation factor is less than 1 due to track angle measurement uncertainties. A similar hypothesis was examined by Vink & Zhou (2018). They took an empirical approach and used Monte Carlo methods to confirm the relationship between track measurement uncertainties and the modulation function. However, now that a track measurement method has been developed that provides uncertainties for individual tracks , we explore an analytical approach to the modulation function.
We start by assuming that the angle error distribution can be modeled by a single function, G(ψ; ψ , α), where ψ is the track's measured phase angle, ψ is the true phase angle, and α is a generic parameter that characterizes the error distribution. For this study, we consider two functional forms of G: a Gaussian, with α = σ as the standard deviation of the Gaussian, and a von Mises distribution, with α = κ. The von Mises distribution is circular, which may be more appropriate for azimuthal angles. The forms of the two distributions are where I 0 (κ) is the modified Bessel function of order 0. Note that the two distributions are practically equivalent for small σ and large κ, with κ ∼ σ −2 .
We can now compute the probability distribution for the observed track angles by integrating over the distribution of true (unknown) angles: where we find the expected modulation factors depend solely on the characteristic parameters of the input error distributions: for the Gaussian and von Mises distributions, respectively. In Fig. 1, we compare these two distributions, noting that the two are practically identical for small (Gaussian) uncertainties, as expected, and that the general forms are quite similar. Before exercising this development, however, we must recognize that µ is usually given as a function of energy while the distributions of uncertainties at any given energy are not delta functions and are not even unimodal at some energies. See Fig. 4 of  for two examples. While the distribution of σ is approximately Gaussian with a mean of σ = 0.62 rad at 3 keV, the 6.4 keV distribution shows a peak at σ = 0.33 rad and a broad, weaker peak at 0.6 rad. If we designate the distribution of σ for a given energy by ρ(σ; E), normalized so that ρ(σ; E)dσ = 1, we can then estimate the observed modulation function by for the Gaussian case. The von Mises case is analogously determined. For 3 keV, Eq. 12 gives µ = 0.46. Approximating the ρ(σ) distribution at 3 keV as a single Gaussian with mean σ 0 and standard deviation s σ , then µ = 1 For E = 3 keV, σ 0 = 0.62 and s σ = 0.076 and µ increases, but only by 0.5% compared to assuming that ρ(σ) is a delta function at σ 0 . Because s 2 σ σ 2 0 , the result is not sensitive to s σ , changing by less than 0.3% for 10% changes in s σ . The best value of µ using CNNs was just under 0.40, with other methods giving µ as low as 0.32 . While a few percent loss may be expected due to events that interact in the detector window or the gas electron multiplier (GEM), this effect is insufficient to explain the difference between the model and the CNN results and these events are typically cut out of the analysis anyway. We can recover the observed modulation factor by adding a systemic variance term, replacing σ 2 0 with σ 2 0 + σ 2 s ; for σ s = 0.25, the predicted value of µ drops to 0.41. This systemic variance is not to be identified with the "epistemic" variance term added by  to compute total track angle variance. The predicted µ based on the von Mises error distribution may prove to be better, requiring knowledge of the κ distribution characterizing the track measurements which is outside the scope of this paper.
At 6.4 keV, we approximate the ρ(σ) function as the sum of two Gaussians, one with 70% of the events at σ 0 = 0.33 and s σ = 0.076 and 30% with σ 0 = 0.58 and s σ = 0.076. With these values, µ = 0.72, which compares well to the values obtained from the weighted CNN fits, which were 0.70 and 0.76, depending on the weighting parameter. Again, the predicted µ is not sensitive to s σ . A more proper comparison of the prediction would be to the unweighted CNN result, which gives µ = 0.61, significantly below the predicted value. Including an additional variance term with σ s = 0.25 brings the predicted value down to 0.63, much closer to the observed value. Thus, while this method of predicting the modulation factor may be too high by 10-15%, adding an additional variance term may be sufficient to provide an accurate model of the modulation as a function of the track angle uncertainty.
In reality, neither the Gaussian model nor the (monopolar) von Mises model on cos(ψ − ψ ) are likely to be accurate descriptions of the angle error distributions. Both models are appropriate only when the X-ray event interaction point can be reliably identified by the NN. Such will be the case for most long tracks but will not be valid for short tracks (i.e., those with low energies) whose tracks have approximately symmetrically distributed charge. In this case, a dipolar von Mises model on cos 2(ψ − ψ ) is closer to the truth. For this model (e.g. . However, this model cannot hold for all events because high energy tracks to not have equal probability for ψ as ψ + π. Therefore, an alternative angle error distribution might be where f (κ) is a smooth function of κ, such that 0 ≤ f (κ) ≤ 1. For example, taking f (κ) = κ/(1 + κ) gives greater emphasis to the monopolar model for long tracks where κ is large. For G(ψ; ψ , κ) as defined by Eq. 17, the expected modulation factor is Determining f (κ) is outside the scope of this paper, requiring detailed CNN simulations. At this point, we assume that a suitable functional form for G(ψ; ψ , α) can be determined and validated with simulations and lab data. We now proceed to show how the maximum likelihood formalism can be used to incorporate track angle uncertainties.

REVISING THE LIKELIHOOD
Each event now has three important measurable quantities associated with it that relate to the source polarization: energy E, phase angle ψ, and (generically) angle uncertainty α. The event density in this 3D space is where we now include P(α; E) to represent the probability density that an event of energy E has a track angle uncertainty α. This probability is normalized so that the integral over α is unity. The log-likelihood of Eq. 5 is rewritten to be which looks the same as before because P(α; E) is normalized and doesn't depend on the source flux or polarization. However, now µ i = µ (α i ) according to the appropriate choice of error distribution (and any additional variance term) instead of µ i = µ(E i ). At this point, the best estimates of q and u are the same as in Paper I: and where c i ≡ µ i cos 2ψ i , s i ≡ µ i sin 2ψ i , and the approximations hold for large N. Note how these equations are similar to comparable equations derived by Kislat et al. (2015) and Vink & Zhou (2018). Finally, the MDP is as in Paper I. This analysis was based on an unbinned data set, where each event is tagged with uncertainty α i and position angle ψ i . Instead, if we suppose that the data are binned in small intervals ∆α about α j , with C j observed counts in bin j, then the Eq. 26 would instead be written as similar to Eq. 50 of Paper I, except that j represents an uncertainty bin, not an energy bin. Eqs. 28 and 29 apply to uncertainties given by Gaussian and monopolar von Mises distributions, respectively. While the derivation of the best fit polarization parameters does not depends on applying weights to the likelihood, the computation of the MDP does depend on the spectrum of the source via the C j terms. These terms are where the best estimate of n 0 from Eq. 6 is substituted, and gives the fraction of events in uncertainty bin j. Thus, determining the MDP is merely a process of setting the number of counts N in an observation and the assumed spectral shape N (E). Furthermore, once N (or n 0 ) is set, the likelihood for q and u no longer depends directly on the source spectrum, so maximizing the likelihood over a set of CNNs only depends on the values of α i (via µ i ) that are derived for the set of events. Figure 1. Comparison of the modulation factor from a Gaussian model with standard deviation σ to that using a von Mises distribution with parameter κ.

SUMMARY
We have outlined a way to to use the uncertainties in track angle measurements for an instrument such as IXPE. These uncertainties can be obtained by a machine learning method such as the version based on convolutional neural networks by . The uncertainties can be directly related to the instrument modulation function, which is critical to any analysis. Furthermore, the uncertainties can be incorporated in a maximum likelihood method for estimating polarization parameters by computing the distribution of these uncertainties for the data set of interest.
All events are used in the analysis to determine the best fit parameters, obviating the need for estimating an "effective" number of events. Events with extremely large uncertainties would have negligible modulation factors and would contribute very little to the estimation of q and u. Eqs. 28 through 31 show that the MDP depends primarily on events with small uncertainties but all events are usable.
One caveat to this approach is that the angle error distribution should be properly characterized. We have demonstrated how the analysis may proceed for a Gaussian distribution of errors, as provided by the current CNN method, but a von Mises distribution is likely to be more appropriate. Fortunately, the method outlined here can accommodate whatever distribution is found to fit the data. It is important to carry out enough simulations to estimate the G(ψ; ψ , α) (or just µ [α]) and P(α; E) distributions.