Cluster mass estimators from CMB temperature and polarization lensing

Upcoming Sunyaev–Zel'dovich surveys are expected to return $\sim 10^4$ intermediate mass clusters at high redshift. Their average masses must be known to the same accuracy as desired for the dark energy properties. Internal to the surveys, the cosmic microwave background (CMB) potentially provides a source for lensing mass measurements whose distance is precisely known and behind all clusters. We develop statistical mass estimators from six quadratic combinations of CMB temperature and polarization fields that can simultaneously recover large-scale structure and cluster mass profiles. The performance of these estimators on idealized Navarro–Frenk–White (NFW) clusters suggests that surveys with a ∼1′ beam and 10 µK-arcmin noise in uncontaminated temperature maps can make a ∼10σ detection, or equivalently a ∼10% mass measurement, for each set of $10^3$ clusters. With internal or external acoustic scale E-polarization measurements, the ET cross-correlation estimator can provide a stringent test for contaminants on a first detection at ∼1/3 the significance. For surveys that reach below 3 µK-arcmin, the EB cross-correlation estimator should provide the most precise measurements and potentially the strongest control over contaminants.


Introduction
Upcoming surveys for clusters utilizing the Sunyaev–Zel'dovich (SZ) effect in the cosmic microwave background (CMB) as a detection technique hold the promise to measure the properties of dark energy to high precision with $\sim 10^4$ clusters at redshifts out to z ∼ 1. The mean mass and variance of the sample need to be known to an accuracy comparable to that desired for the dark energy equation of state so as not to be a limiting factor. Since the SZ effect is sensitive to the temperature weighted baryon content of the cluster, it does not directly probe the total cluster mass in a model independent way.
Fortunately, the same survey which identifies the clusters in the SZ effect can potentially also constrain their masses through gravitational lensing of the CMB. Utilizing the CMB as a source is also appealing in that its distance is both well determined and sufficiently far to probe even the highest redshift clusters in the sample.
Studies of gravitational lensing of the CMB by clusters [1]–[7] typically focus on making detailed measurements of individual high mass clusters or on statistical reconstructions of average cluster properties. The CMB suffers in the former case when compared with galaxy weak lensing because the background image, in this case the primary CMB, is a Gaussian random field whose properties must be separated from the cluster and its emission. This disadvantage will be substantially reduced in the latter case where the goal is to measure statistical properties rather than unique objects, provided that large samples of clusters are available for the analysis. Surveys such as the South Pole Telescope (SPT) and the Atacama Cosmology Telescope (ACT) should provide samples of $\sim 10^4$ intermediate mass clusters at high redshift, so that a precise and unbiased statistical measurement may become a realistic possibility in the near future. The CMB even possesses some advantages over galaxy lensing, since the source statistical properties and redshift are extremely well determined, and because lensing of the temperature and polarization provide strong consistency checks against possible contamination by cluster emission.
Techniques which use CMB lensing to reconstruct the lensing convergence, κ, have been available for some time [8]–[10], and although these were initially intended as probes of large-scale structure, they can in principle recover the projected mass regardless of its source. This includes reconstructions of large galaxy clusters and associated statistical quantities, such as the cluster-convergence correlation function. On cluster length scales, this is essentially an average density profile and mass measurement (e.g. [11,12]). We will focus on estimators that are quadratic in the observed CMB fields [8,9], since these can be implemented using fast algorithms, and can eliminate contamination by isolating pairs of modes.

DEUTSCHE PHYSIKALISCHE GESELLSCHAFT
However, Amblard et al [13] used simulations to show that the minimum variance quadratic estimator built out of the temperature field is biased if the lensing field is non-Gaussian at the level expected for real structure. Subsequently Maturi et al [6] helped to explain this fact by demonstrating that the reconstructed density field for this estimator is biased low in regions around large clusters.
In this paper, we analyse the origin of the bias and show that it can be nearly eliminated using the increasingly well measured properties of the CMB, and that the modifications we introduce produce only a very modest degradation in the statistical reconstruction of cluster properties. Since the temperature field estimators readily generalize to polarization, these can be used as consistency checks on each other or indeed on any other cluster mass estimates.
The outline of the paper is as follows. In section 2, we review the general construction of quadratic estimators and the underlying approximations upon which they are based. We determine the origin of the biased estimates and show that they are associated with the misestimation of source gradients in the moderate to strong lensing regime. The bias can be eliminated at negligible cost to the signal-to-noise by imposing a strong filter against this and other contaminants. In section 3 we consider the idealized performance of the temperature and polarization estimators. Finally in section 4 we discuss internal consistency checks and robustness of the estimators.

Quadratic estimators
Lensing is a surface brightness conserving remapping of the intrinsic temperature and polarization fields from recombination. Given an unlensed temperature field $\tilde T(\hat n)$ and Stokes $\tilde Q(\hat n)$ and $\tilde U(\hat n)$ fields, where $\hat n$ denotes the angular position on the sky, the lensed fields are given by

$$T(\hat n) = \tilde T(\hat n + \nabla\phi(\hat n)), \qquad [Q \pm iU](\hat n) = [\tilde Q \pm i\tilde U](\hat n + \nabla\phi(\hat n)). \tag{1}$$

Here φ is the deflection potential, ∇φ is the deflection angle, and they are related to the convergence as

$$\nabla^2\phi = -2\kappa. \tag{2}$$

All derivatives throughout are angular derivatives on the sky. We will furthermore employ the flat sky approximation in this paper but all expressions readily generalize to the full sky with the replacement of ordinary derivatives with covariant derivatives and Fourier modes with spherical harmonics [14]–[16]. Hu [8] and Hu and Okamoto [9] derived minimum variance quadratic estimators of φ (or equivalently κ) out of all possible pairs of the temperature and polarization fields. Their derivation relies on two related but independent linearization approximations: the gradient approximation and linearization in the convergence. Only the latter, which is equivalent to considering lensing as a small correction to the source field, requires modification for use in cluster reconstruction.
To see how these approximations enter, let us first consider the case of the temperature field. The analysis readily generalizes to polarization as we shall see.
Fundamentally the temperature estimator is built out of a lensing induced correlation between the temperature field and its gradient [17]. This correlation arises from the gradient approximation, valid when the deflections are small compared with the structure in the unlensed field. This approximation of course does not hold for all Fourier modes in the CMB fields and lens (see e.g. [18]). It need only hold for the modes that are correlated by the reconstruction. The lensed field $T_L(\hat n)$ can be prefiltered in Fourier space to isolate modes for which the gradient approximation is valid

$$T_L(\hat n) = \int \frac{d^2 l}{(2\pi)^2}\, e^{i\mathbf{l}\cdot\hat n}\, W^T_l\, T_{\mathbf l}, \tag{3}$$

where $W^T_l$ is the Fourier filter, the subscript 'L' denotes the lensing-filtered field and $T_{\mathbf l}$ is the Fourier representation of the full temperature field. In the generalization to polarization below, we employ multiple lensed fields in the quadratic construction and so we will also denote the temperature based one as

$$L^T(\hat n) \equiv T_L(\hat n). \tag{4}$$

The lensed temperature field $T_L$ can be approximated by a Taylor expansion

$$T_L(\hat n) \approx \tilde T_L(\hat n) + \left(\nabla\phi\cdot\nabla\tilde T\right)_L(\hat n), \tag{5}$$

where the subscript 'L' on the bracket denotes the same filtering, which we will call the gradient approximation. For the case of cluster lensing, the typical deflections are $\lesssim 1'$ compared with structure in the unlensed CMB with coherence of ∼10′. Equation (5) is therefore an excellent approximation for the full field even in the strong lensing regime. In this case the minimum variance filter can be used [8]

$$W^T_l = \left(C^{TT}_l + N^{TT}_l\right)^{-1}. \tag{6}$$
Here $C^{TT}_l$ is the lensed CMB power spectrum and $N^{TT}_l$ is the noise power spectrum. More generally $W^T_l$ can be chosen to suppress modes which violate the gradient approximation or which are contaminated by foregrounds. Likewise, although the filter can be slightly sub-optimal for cluster mass reconstruction since a cluster is not located at a random point in the sky, the choice does not bias the reconstruction since it simply corresponds to a predetermined relative weighting of the modes.
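As a minimal sketch of how the prefiltering of equation (6) might be applied to a square flat-sky map, the following NumPy snippet weights each Fourier mode by $1/(C^{TT}_l + N^{TT}_l)$. The function names, the callable spectra `cl_tt` and `nl_tt`, and the FFT conventions are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def multipole_grid(npix, pix_arcmin):
    """Return |l| on the 2-D FFT grid of a square flat-sky map."""
    pix_rad = np.deg2rad(pix_arcmin / 60.0)
    freq = 2.0 * np.pi * np.fft.fftfreq(npix, d=pix_rad)
    lx, ly = np.meshgrid(freq, freq, indexing="ij")
    return np.hypot(lx, ly)

def inverse_variance_filter(tmap, pix_arcmin, cl_tt, nl_tt):
    """Apply W_l = 1 / (C_l + N_l) to a map in Fourier space.

    cl_tt and nl_tt are callables returning the lensed signal and the
    noise power at multipole l (hypothetical inputs for this sketch).
    """
    ell = multipole_grid(tmap.shape[0], pix_arcmin)
    weight = 1.0 / (cl_tt(ell) + nl_tt(ell))
    return np.fft.ifft2(weight * np.fft.fft2(tmap)).real
```

Any other predetermined weighting, for instance one that also zeroes foreground-dominated modes, could be swapped in by changing `weight`.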
Given that the gradient approximation induces a correlation with the unlensed temperature gradient, one forms a quadratic estimator by multiplying $T_L$ by a filtered gradient of the lensed temperature field

$$\mathbf{G}^{TT}(\hat n) = \int \frac{d^2 l}{(2\pi)^2}\, e^{i\mathbf{l}\cdot\hat n}\, i\mathbf{l}\, W^{TT}_l\, T_{\mathbf l}, \tag{7}$$

where $W^{TT}_l$ is another Fourier space filter. The filter [8]

$$W^{TT}_l = \tilde C^{TT}_l \left(C^{TT}_l + N^{TT}_l\right)^{-1} \tag{8}$$

yields a minimum variance reconstruction under the fully linearized approximation. Here $\tilde C^{TT}_l$ is the unlensed CMB power spectrum and equation (8) is essentially a Wiener filter for the unlensed gradients (cf [6] for the more problematic Wiener filter for the total gradient).
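The filtered gradient and its product with the lensed field can be sketched in the same flat-sky FFT language. This is only an illustrative outline: the function names and the callable `weight` (standing in for the gradient filter $W^{TT}_l$) are hypothetical, and the normalization of the final estimator is not included.

```python
import numpy as np

def filtered_gradient(tmap, pix_arcmin, weight):
    """Return the two components of IFFT[i l W_l T_l] for a square map.

    `weight` is a callable W(l), e.g. a Wiener filter for the unlensed
    gradients (hypothetical interface for this sketch).
    """
    npix = tmap.shape[0]
    pix_rad = np.deg2rad(pix_arcmin / 60.0)
    freq = 2.0 * np.pi * np.fft.fftfreq(npix, d=pix_rad)
    lx, ly = np.meshgrid(freq, freq, indexing="ij")
    t_l = weight(np.hypot(lx, ly)) * np.fft.fft2(tmap)
    grad_x = np.fft.ifft2(1j * lx * t_l).real  # derivative along axis 0
    grad_y = np.fft.ifft2(1j * ly * t_l).real  # derivative along axis 1
    return grad_x, grad_y

def quadratic_product(grad, lensed):
    """Real-space product of the gradient field and the lensed field."""
    return grad[0] * lensed, grad[1] * lensed
```

The divergence of the product returned by `quadratic_product`, taken with the same Fourier derivative, would then give the unnormalized convergence estimate.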

The gradient filter of equation (8) has to be modified in the presence of rare non-Gaussian structure where lensing effects are moderate to strong. Expanding the product of the filtered gradient and $T_L$ in the gradient approximation produces terms that are linear and quadratic in φ. The second approximation is that the quadratic term in φ is negligible. If so, averaging over realizations of the unlensed temperature field yields a quantity proportional to ∇φ with a proportionality coefficient related to the well-determined two-point function or power spectrum of the unlensed CMB. This quantity is therefore a quadratic estimator of the deflection angles ∇φ or κ in the linear approximation. The quadratic term in φ comes from the change in the temperature gradients due to lensing. Qualitatively, its omission is equivalent to considering lensing as a small perturbation to the unlensed image. Quantitatively it involves the assumption that the deflection angles are small compared with structures in the lens. This approximation is violated around a cluster and the Wiener filter, which is based on the variance of typical regions, does not sufficiently suppress this region in the absence of detector noise.
To see the effect of the quadratic term, consider the lensed temperature field to be filtered for small-scale fluctuations as in equation (6) with $N^{TT}_l = 0$ and likewise take the unlensed temperature gradients $\nabla\tilde T_G \approx$ const. In that case, the quadratic term in the estimator carries a coherent contribution whose strength depends on κ since $\nabla^2\phi = -2\kappa$. As κ becomes O(1) in the interior of the cluster, the estimation will be biased low. In fact for κ > 1 the observed gradient reverses and a second, flipped image of the intrinsic gradient appears at the centre of an azimuthally symmetric cluster. Another way to see why the reconstruction is biased low in the single image regime is that the cluster magnifies the background image and decreases the observed temperature gradient behind it. While this bias is mitigated by the Wiener filtering of equation (8), the filter cannot remove it entirely. Furthermore, a truly optimal filter would require knowledge of the cluster mass to be estimated.
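The sign of the bias can be made explicit with a short worked example. The following is a sketch under the assumptions of the paragraph above (a locally constant unlensed gradient and the remapping of equation (1)); the symbol $\Gamma_{ij}$ for the trace-free part of $\nabla_i\nabla_j\phi$ is introduced here for illustration only.

```latex
\begin{align}
T(\hat n) &= \tilde T(\hat n + \nabla\phi), \qquad \nabla\tilde T \approx \text{const}, \\
\nabla_i T &= \left(\delta_{ij} + \nabla_i\nabla_j\phi\right)\nabla_j\tilde T, \\
\nabla_i\nabla_j\phi &= -\kappa\,\delta_{ij} + \Gamma_{ij}, \qquad \Gamma_{ii} = 0
  \quad (\text{since } \nabla^2\phi = -2\kappa), \\
\nabla_i T &= (1-\kappa)\,\nabla_i\tilde T + \Gamma_{ij}\,\nabla_j\tilde T .
\end{align}
```

At the centre of an azimuthally symmetric lens the trace-free part vanishes by symmetry, so the observed gradient is $(1-\kappa)$ times the intrinsic one: reduced for $0 < \kappa < 1$ and reversed for $\kappa > 1$, in line with the behaviour described above.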
The bias arises because of the overlap in scales between the unlensed gradient field and the lensed temperature field. Maturi et al [6] suggested that it can be further mitigated by a combination of more strongly high pass filtering the lensed temperature field in equation (6) and utilizing the direction of the large-scale gradients. The efficacy of this technique depends on both the assumed signal and the instrument noise.
Instead let us exploit the fact that we have robust prior knowledge about the unlensed CMB spectrum. A handful of well determined cosmological parameters fixes its shape out through the damping tail where the small-scale gradients pick up most of their contribution. Figure 1 shows the unlensed rms gradient as a function of a step function low pass filter $W^{TT}_l$ out to $l = l_G$. Almost all of the gradient comes from $l \lesssim 2000$ due to diffusion damping of the acoustic peaks. On these scales, the lensing correction to the gradient is tiny for a typical cluster. This suggests that if we impose a sharp filter on the gradient to exclude higher multipoles, we lose very little signal and gain a clean separation between the lensed and unlensed structure.

Now let us make these considerations concrete and generalize them to the full set of temperature and polarization fields $X, Y \in \{T, E, B\}$. $E(\hat n)$ and $B(\hat n)$ are real valued fields that are related to the Stokes parameters through a spin-2 rotation in Fourier space by twice the angle $\varphi_{\mathbf l}$, where $\varphi_{\mathbf l}$ is the angle between $\mathbf l$ and the axis which defines Stokes Q. We shall assume that B is absent in the unlensed CMB. The angular space convergence estimators

$$\hat\kappa^{XY}(\hat n) = \int \frac{d^2 l}{(2\pi)^2}\, e^{i\mathbf{l}\cdot\hat n}\, \hat\kappa^{XY}_{\mathbf l} \tag{13}$$

are constructed from Fourier space estimators that are quadratic in the observed fields

$$\frac{\hat\kappa^{XY}_{\mathbf l}}{A^{XY}_l} = -\int d^2\hat n\; e^{-i\hat n\cdot\mathbf{l}}\; \mathrm{Re}\left\{\nabla\cdot\left[\mathbf{G}^{XY}(\hat n)\, L^{Y*}(\hat n)\right]\right\}. \tag{14}$$

The gradient field $\mathbf{G}^{XY}$ is built out of the lensed X field as either a spin-0 or spin-2 field depending on the spin of Y, and $L^Y$ represents the filtered lensed fields. For the polarization fields, they are the Stokes field Q + iU filtered for the E and B components. The two gradient fields and three lensed fields produce six estimators from the temperature and polarization fields for consistency checks. In addition, replacement of the divergence in equation (14) with curl should leave estimators that are consistent with noise.

$A^{XY}_l$ is a normalization coefficient set to return unbiased estimators under the fully linearized approximation. For arbitrary filter functions, it is determined by noting that [9]

$$\langle X_{\mathbf{l}_1} Y_{\mathbf{l}_2} \rangle = f_{XY}(\mathbf{l}_1, \mathbf{l}_2)\, \phi_{\mathbf l},$$

where $\mathbf{l} = \mathbf{l}_1 + \mathbf{l}_2 \neq 0$ and the functions $f_{XY}$ depend on the unlensed power spectra and on the angle $\varphi_{\mathbf{l}_1 \mathbf{l}_2} \equiv \varphi_{\mathbf{l}_1} - \varphi_{\mathbf{l}_2}$. The normalization is given in terms of these quantities and the filters. For the lensed field filter, we retain the choice of [8,9], $W^Y_l = (C^{YY}_l + N^{YY}_l)^{-1}$.

As described above, the first modification is that the gradient weights are set to zero above $l_G = 2000$. Figure 1 shows that this is also a good choice for the E field. For $l \le l_G$, we retain the Wiener filter, using the unlensed XE spectrum for Y = B. Secondly, we distinguish between TE and ET estimators. With the gradient filter employing only $l < l_G$, the scale symmetry between the two fields is broken. It is then advantageous to separate the estimators. The TE estimator uses T for the gradient field and E for the lensed field. If contamination from unpolarized cluster emission is strong then this estimator can be used to eliminate its effects. The ET estimator uses E for the gradient field and T for the lensed field. This estimator places less demanding requirements on the experiments in that only the relatively large E field of the acoustic peaks needs to be measured to high signal-to-noise. Indeed this measurement need not come from the same experiment that measures the cluster signal. Unfortunately, the sample variance of these estimators is also higher given the imperfect correlation between the two fields.
On the other hand, the sample variance of the EB estimator is reduced by the fact that B modes are assumed absent in the unlensed fields [9,10]. In practice the noise of the actual estimator will depend on the B modes contributed by foregrounds and other contaminants. Since the experimental requirements for EB are similar to those of TE, TB, and EE but yield better prospects for constraints, we focus on the EB, ET and TT estimators in the next section.
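The effect of the gradient cut $l_G$ used in this section can be explored with a short numerical sketch. In the flat sky limit the variance of the gradient from modes $l \le l_G$ is $\int d^2l\, l^2 \tilde C_l/(2\pi)^2 \approx \sum_l l^3 \tilde C_l / 2\pi$. The spectrum below is a toy damped power law chosen only for illustration, with a hypothetical damping scale `l_damp`; it is not a real CMB spectrum and the function name is an assumption.

```python
import numpy as np

def rms_gradient(ells, cl_unlensed, l_g):
    """Flat-sky rms gradient from modes l <= l_G.

    G_rms^2 = int d^2l/(2 pi)^2 l^2 C_l  ~  sum_l l^3 C_l / (2 pi)
    """
    keep = ells <= l_g
    return np.sqrt(np.sum(ells[keep] ** 3 * cl_unlensed[keep]) / (2.0 * np.pi))

# Toy damped spectrum, NOT a real CMB spectrum: C_l ~ exp(-(l/l_D)^2) / l^2
ells = np.arange(2, 6001, dtype=float)
l_damp = 1000.0  # hypothetical damping scale
cl_toy = np.exp(-(ells / l_damp) ** 2) / ells ** 2

# Fraction of the total rms gradient retained by a sharp cut at l_G = 2000
fraction = rms_gradient(ells, cl_toy, 2000.0) / rms_gradient(ells, cl_toy, 6000.0)
```

For this toy spectrum `fraction` comes out at about 0.99, which illustrates how a sharp cut can discard very little of the gradient once the spectrum is exponentially damped. The actual curve for the CMB is the one shown in figure 1.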

Idealized examples
To illustrate the performance of the estimators, let us take the idealization that all lenses are NFW [19] dark matter halos and the observed CMB fields have no contaminants aside from white detector noise. We will address more realistic cases in a future work [20]. As we discuss in section 4 these estimators contain many internal cross checks and should in fact remain unbiased for a wide class of contaminants. Contaminants can however contribute substantially to the noise and we will examine here the performance of the estimators as a function of detector noise as a proxy for other contaminants.
The NFW density profile is given by [19]

$$\rho(r) = \frac{\rho_s}{(r/r_s)\,(1 + r/r_s)^2},$$

where the normalization coefficient $\rho_s$ can be expressed in terms of the mass of the halo. We define the mass to be that enclosed at $r_{180}$, defined to be the radius which encloses the mass at an overdensity of 180 times the mean density,

$$M_{180} = \frac{4\pi}{3}\, r_{180}^3\, 180\, \Omega_m \rho_c,$$

where $\rho_c$ is the critical density today. Likewise, we define the concentration of the halo in terms of this radius, $c_{180} = r_{180}/r_s$. The convergence profile for such a halo [21] is proportional to $g(\theta/\theta_s)$, where the projected scale radius is $\theta_s = r_s/D_L$. D is the comoving distance in a flat universe with subscripts 'L' denoting the distance to the lens, 'S' denoting distance to the source, and 'LS' the distance between the lens and source. A primary advantage of the CMB is that the source is behind all clusters and at a well-determined distance $D_S = 14.3$ Gpc. The amplitude of the profile involves a concentration factor that depends on $c_{180}$, and the functional form of the profile is [21]

$$g(x) = \frac{1}{x^2-1}\begin{cases} 1 - \dfrac{2}{\sqrt{1-x^2}}\,\mathrm{arctanh}\sqrt{\dfrac{1-x}{1+x}}, & x < 1, \\[2ex] 1 - \dfrac{2}{\sqrt{x^2-1}}\,\arctan\sqrt{\dfrac{x-1}{x+1}}, & x > 1. \end{cases}$$

For illustrative purposes, we take $c_{180} = 3.2$ and $z_L = 0.7$. For these parameters $\theta_s = 0.94'$ and $\kappa(\theta_s) = 0.1$. Note the low value of the concentration reflects the low $\sigma_8$ fiducial cosmology and is determined by the scaling of [22]. In a higher $\sigma_8$ cosmology, clusters at a fixed mass are more concentrated but correspondingly larger clusters are more abundant. Other numbers are typical of SZ selected clusters with upcoming surveys.

For the simulated reconstruction we consider a fiducial experiment with a $\theta_{\rm FWHM} = 1'$ beam and varying levels of white detector noise $\Delta_T$ on the sky. With $\theta_{\rm FWHM}$ in radians, the beam factor exponentially amplifies the white detector noise on the beam deconvolved sky. We further assume that $\Delta_P = \sqrt{2}\,\Delta_T$. For this fiducial beam, a pixel scale of $\theta_{\rm pix} = 0.2'$ suffices. The field is chosen to be 512 × 512 pixels (100′ × 100′).
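For readers who want to experiment with the halo model, the NFW density and its line-of-sight projection can be sketched as follows. The projected profile is written in the standard analytic form and normalized as $\Sigma/(2\rho_s r_s)$; that normalization, the function names and the default parameters are assumptions of this sketch, and the prefactor that converts it to a convergence is not included.

```python
import numpy as np

def nfw_density(r, rho_s=1.0, r_s=1.0):
    """NFW density rho_s / [(r/r_s) (1 + r/r_s)^2]."""
    x = np.asarray(r, dtype=float) / r_s
    return rho_s / (x * (1.0 + x) ** 2)

def nfw_projected(x):
    """Dimensionless projected NFW profile, Sigma / (2 rho_s r_s).

    x is the projected radius in units of the scale radius.
    Returns a 1-D array even for scalar input.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.full_like(x, 1.0 / 3.0)  # limiting value at x = 1
    lo = x < 1.0
    hi = x > 1.0
    s_lo = np.sqrt(1.0 - x[lo] ** 2)
    out[lo] = (1.0 - 2.0 / s_lo * np.arctanh(np.sqrt((1.0 - x[lo]) / (1.0 + x[lo])))) \
        / (x[lo] ** 2 - 1.0)
    s_hi = np.sqrt(x[hi] ** 2 - 1.0)
    out[hi] = (1.0 - 2.0 / s_hi * np.arctan(np.sqrt((x[hi] - 1.0) / (x[hi] + 1.0)))) \
        / (x[hi] ** 2 - 1.0)
    return out
```

Evaluating `nfw_projected(theta / theta_s)` on a grid of angles gives the shape of the convergence profile up to its overall amplitude.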
We lens realizations of $\tilde C^{TT}_l$, $\tilde C^{EE}_l$ according to the full nonlinear prescription of equation (1) with sixth-order polynomial interpolation of the source image. This procedure allows strong lensing effects to appear at the centre of the cluster. To this lensed image we add a realization of the noise power spectra above. In figure 2, we show the reconstruction with detector noise free ($\Delta_T = 0$) simulations of the TT estimator for $N = 1$, $10$, $10^2$ and $10^3$ stacked clusters. In this detector noise free limit, there is a clear detection with only one cluster. It appears as a concentrated spike on top of a noisy but slowly varying background from chance correlations of the Gaussian random background field. The reconstructed single image lacks the azimuthal symmetry of the lens but symmetry is recovered by stacking images [6]. This asymmetry comes from the fact that the deflections cannot be estimated in the direction orthogonal to the gradient. The estimator however is unbiased once many realizations of the gradient are stacked.
Stacking is equivalent to measuring the cluster-convergence cross-correlation function

$$\xi_{c\kappa}(\theta) = \left\langle \int \frac{d\phi}{2\pi}\, \hat\kappa(\theta, \phi) \right\rangle,$$

where the coordinates are centred at the location of the cluster such that θ is the separation from the centre and φ is the azimuthal angle around the cluster. We assume that the centre of the cluster is known to a negligible fraction of an arcminute through the means by which they are selected. The averaging is over the clusters in the sample. In practice, we evaluate the correlation function at discrete intervals in pixel units by interpolation on the stacked image. Neighbouring bins out to several arcminutes are therefore highly correlated as can be seen directly in figure 2. This correlation is in fact useful for distinguishing the signal from noise.
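The stacking and azimuthal averaging described above can be illustrated with a simple sketch. It averages pixels in annuli rather than interpolating on the stacked image as in the text, and it assumes every map is centred on its cluster at pixel `n // 2`; the function name and arguments are hypothetical.

```python
import numpy as np

def stacked_profile(maps, pix_arcmin, edges_arcmin):
    """Stack cluster-centred maps and average the stack in annuli.

    maps has shape (n_clusters, n, n) with each cluster at pixel n // 2.
    Annular bin averages are used here instead of interpolation.
    """
    stack = np.mean(maps, axis=0)
    n = stack.shape[0]
    iy, ix = np.indices(stack.shape)
    radius = np.hypot(ix - n // 2, iy - n // 2) * pix_arcmin
    nbins = len(edges_arcmin) - 1
    which = np.digitize(radius.ravel(), edges_arcmin) - 1
    ok = (which >= 0) & (which < nbins)
    sums = np.bincount(which[ok], weights=stack.ravel()[ok], minlength=nbins)
    counts = np.bincount(which[ok], minlength=nbins)
    return sums / counts
```

The bin edges should be wide enough that every annulus contains at least one pixel, otherwise the final division is undefined for that bin.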
In figure 3, we show the recovered correlation function from 24 000 detector noise free realizations along with the scatter per 1000 clusters with $M_{14} = 2$, i.e. a mass of $2 \times 10^{14}\, h^{-1} M_\odot$. We have smoothed both the lens and reconstruction with a 1.5 pixel FWHM beam after the fact for comparison. The estimator has no detectable bias at the ∼1–2% level in the well-measured regime within a few scale radii. At five times the fiducial mass or $M_{14} = 10$, a low bias of ∼8–9% develops for the fiducial gradient cut of $l_G = 2000$. For these very rare clusters, the signal is large enough that a more aggressive $l_G = 1000$ yields a bias free reconstruction with sufficient signal-to-noise. Alternately, the bias can be calibrated out with simulations.
For cases with finite noise, we explicitly remove effects below the beam scale for numerical convenience. We low pass filter both the lens and the reconstruction with a step function at $l_{\rm beam} = \sqrt{8 \ln 2}/\theta_{\rm FWHM}$. This also has the desirable effect of making the reconstruction less sensitive to potential contamination near the cluster centre. The filtered correlation function is shown in figure 4 and remains unbiased and well-defined. It is equivalent to measuring the cross power spectrum at multipoles $l < l_{\rm beam}$. Figure 4 also shows that with $\Delta_T = 10$ µK-arcmin, $M_{14} = 2$ and 2.6 are significantly distinguished.
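To get a feel for the numbers, the step-function cutoff can be evaluated directly. The sketch below also includes a noise spectrum on the beam deconvolved sky, written in the commonly used Gaussian-beam form $N_l = \Delta^2 \exp[l(l+1)\theta_{\rm FWHM}^2/(8\ln 2)]$; that form and the function names are assumptions of this example rather than expressions quoted from the text.

```python
import numpy as np

ARCMIN = np.deg2rad(1.0 / 60.0)  # one arcminute in radians

def beam_multipole(fwhm_arcmin):
    """Step-function cutoff l_beam = sqrt(8 ln 2) / theta_FWHM."""
    return np.sqrt(8.0 * np.log(2.0)) / (fwhm_arcmin * ARCMIN)

def deconvolved_noise(ell, delta_uk_arcmin, fwhm_arcmin):
    """White noise on the beam-deconvolved sky for a Gaussian beam.

    Assumed form: N_l = Delta^2 exp[l (l + 1) theta_FWHM^2 / (8 ln 2)],
    with Delta converted from uK-arcmin to uK-radian.
    """
    theta = fwhm_arcmin * ARCMIN
    delta = delta_uk_arcmin * ARCMIN
    return delta ** 2 * np.exp(ell * (ell + 1.0) * theta ** 2 / (8.0 * np.log(2.0)))

l_beam = beam_multipole(1.0)  # roughly 8100 for a 1 arcmin beam
```

With this assumed noise form, $l_{\rm beam}$ is the multipole at which the deconvolved noise has grown by about a factor of $e$ over its white level.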
To quantify this sensitivity, we compute the covariance matrix of the correlation function from the simulations

$$C_{ij} = \left\langle \left[\hat\xi_{c\kappa}(\theta_i) - \xi_{c\kappa}(\theta_i)\right]\left[\hat\xi_{c\kappa}(\theta_j) - \xi_{c\kappa}(\theta_j)\right] \right\rangle,$$

where the discretization again is in bins of the pixel width. To estimate the significance with which two models can be separated, we evaluate

$$\chi^2 = \sum_{ij} \left[\xi^{\rm test}_{c\kappa}(\theta_i) - \xi^{\rm true}_{c\kappa}(\theta_i)\right] C^{-1}_{ij} \left[\xi^{\rm test}_{c\kappa}(\theta_j) - \xi^{\rm true}_{c\kappa}(\theta_j)\right]$$

between a test model $\xi^{\rm test}_{c\kappa}(\theta)$ and the true model $\xi^{\rm true}_{c\kappa}(\theta)$, summed out to θ = 5′. Beyond a few scale radii of the cluster the correlation function is dominated by large-scale structure. In table 1, we show the detection significance, the difference between the fiducial $M_{14} = 2$ lens and zero with the covariance matrix evaluated at zero signal. We have scaled the significance per cluster by $\sqrt{N}$ for $N = 10^3$. At a noise level of 10 µK-arcmin, the significance or signal-to-noise is $\sqrt{\chi^2} \sim 0.4$ per cluster. Above this noise level, only TT and ET can provide significant detections. Although TT has roughly three times the signal-to-noise, ET can still provide a useful check against contamination and systematics for a first detection. Recall that the filter construction involves the oscillatory acoustic TE correlation and strongly rejects spurious signals. At noise levels between 1 and 3 µK-arcmin, the EB estimator has both the signal-to-noise and the ability to reject contaminants to make it competitive with TT. Below 1 µK-arcmin, the EB estimator dominates.
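The χ² comparison and its scaling with the number of stacked clusters can be sketched in a few lines. The function name and arguments are hypothetical, and the covariance is assumed to be the single-cluster covariance of the binned correlation function, so that stacking N independent clusters multiplies χ² by N.

```python
import numpy as np

def delta_chi2(xi_test, xi_true, cov_one_cluster, n_clusters=1):
    """chi^2 between two binned profiles for a stack of n_clusters.

    cov_one_cluster is the covariance of the binned correlation
    function for a single cluster (assumed input of this sketch).
    """
    diff = np.asarray(xi_test, dtype=float) - np.asarray(xi_true, dtype=float)
    return n_clusters * diff @ np.linalg.solve(cov_one_cluster, diff)

# Scaling the per-cluster significance quoted in the text by sqrt(N)
per_cluster = 0.4
stacked_significance = per_cluster * np.sqrt(1000)
```

The last two lines give a stacked significance of about 12.6 for $N = 10^3$, of the same order as the ∼10σ detection discussed in the text.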
In figure 5 we plot the sensitivity to the cluster mass. Specifically we calculate the fractional change in mass that would generate a $\chi^2 = 1$ with $10^3$ clusters. The covariance is evaluated at the true model $M_{14} = 2$ so as to include the sample variance of the unlensed gradients. The same features found for detection significance hold for mass sensitivity. For example at 10 µK-arcmin, TT, ET, and EB provide 13, 43 and 53% mass estimates whereas at 1 µK-arcmin the numbers improve to 5, 14 and 6% respectively. Finally, these sensitivities depend on the beam scale only weakly near the fiducial $\theta_{\rm FWHM} = 1'$ and $\Delta_T = 1$–$10$ µK-arcmin level. For example, at 3 µK-arcmin, improving the beam to 0.5′ has negligible effect on the signal-to-noise. Enlarging the beam to 2′ degrades the TT and ET estimators by ∼30–40% in mass sensitivity. Since the EB estimator is noise dominated at the 1′ scale even without beam, it degrades negligibly.

Robustness
Although the reconstruction has low signal-to-noise per cluster, CMB temperature and polarization cross-correlation measurements will ultimately offer many opportunities for consistency checks as well as automatic rejection of many types of contaminants. A full study of the effects of contaminants such as the residual thermal SZ effect, kinetic SZ effect, point sources etc is beyond the scope of this work but we review here the basic properties that enhance the robustness of the estimators.
Each of the six temperature and polarization estimators involves a correlation between a large ($l < l_G$) scale gradient with a small-scale structure in a temperature or polarization field. Contaminants that do not exhibit such a correlation do not bias the estimators but may increase the noise. The three estimators that involve temperature and polarization (TE, ET, TB) are especially robust. To mimic lensing the two fields must be correlated in the specific oscillating pattern of the acoustic peaks. Likewise the EB estimator provides strong rejection of contaminants given the different parity properties of E and B. Furthermore it does not suffer the reduction in signal of the other three estimators.
The cross-correlation measurement also has further protection against contaminants. Given that the correlation function is azimuthally averaged around the centre of the cluster and that the estimator is weighted by the large-scale gradient, any contaminant that is azimuthally symmetric around the cluster will drop out of the estimator. Thus the symmetric component of the thermal and kinetic SZ effect would not even contribute as excess noise in the estimator.
On the other hand, beyond the idealization of identical NFW clusters at a single redshift employed here, the cluster convergence cross-correlation, while still a well-defined observable, does not directly equate to a measurement of the cluster mass. Rather it provides a measurement of the average mass profile of the clusters in the sample. In general this quantity on its own is not sufficient to interpret the cluster number counts. The widths of the selection function of clusters in mass and redshift must also be sufficiently well understood either a priori or from self-calibration (e.g. [23]). The efficacy of an average profile measurement therefore depends on the manner by which the clusters are selected, e.g. the SZ flux decrement envisaged here, x-ray flux and temperature, optical richness etc. However, the same issues face all lensing mass measurements of clusters that seek to eliminate projection effects through stacking.

Discussion
We have developed mass estimators from six quadratic combinations of CMB temperature and polarization fields that retain their minimum variance characteristic for large-scale structure and provide nearly unbiased reconstructions for rare clusters where lensing effects are moderate to strong. The key difference from previous work [8,9] is that the Wiener filtering for unlensed CMB gradients is augmented with a sharp filter that removes any gradients at $l > l_G = 2000$ that are not part of the source fields from recombination.
This sharp filter should also help prevent false signals from cluster emission such as the thermal and kinetic SZ effect, point source emission as well as other foregrounds. To even appear as excess noise, the contaminant at ∼1′ must have a component that is antisymmetric about the cluster centre. To bias the measurement, the direction of the asymmetry must be correlated with both the polarized and unpolarized contamination at ∼10′ in the same way as the acoustic oscillations in the CMB. Note that projection effects from mass associated with the cluster should be considered part of the signal and can be calibrated by N-body simulations. True contaminants however can substantially increase the noise of the estimators. An evaluation of the efficacy of the filter in the presence of real world contaminants is beyond the scope of this paper but will be treated in future work [20].
Here we have tested the estimators against idealized signal and noise: identical cluster lenses with NFW dark matter profiles in the presence of beam filtered white noise. At a noise level of 10 µK-arcmin and with a 1′ beam, the TT estimator can provide a ∼10σ detection per $10^3$ clusters of $M \sim 2 \times 10^{14}\, h^{-1} M_\odot$ or equivalently a ∼10% average mass measurement. The ET estimator, based on a separate measurement of E-polarization in the acoustic peaks and T measurements around the cluster, can provide an important cross check on a first detection. Since the estimator filters for the TE correlation and anticorrelation of the acoustic peaks it provides a strong discriminant against contamination and systematics at the price of a factor of ∼3 in signal-to-noise. Ultimately with experiments that produce foreground free maps of the polarization to < 3 µK-arcmin, the EB estimator will return the best estimates, with the potential to provide ∼1% measurements.