COSMOLOGICAL PARAMETERS FROM CMB MAPS WITHOUT LIKELIHOOD APPROXIMATION

B. Racine; J. B. Jewell; H. K. Eriksen; I. K. Wehus

doi:10.3847/0004-637X/820/1/31

1. INTRODUCTION

Observations of the cosmic microwave background (CMB) provide a direct image of the early universe (Bennett et al. 2013; Planck Collaboration 2015a), and have revolutionized our understanding of the composition and evolution of the universe as a whole (Hinshaw et al. 2013; Planck Collaboration 2015e). The new insights derive primarily from heroic community-wide efforts in microwave instrumentation, leading to steadily improved detector performance and noise levels. However, with improved data sets follow more stringent requirements on analysis techniques, and optimal statistical CMB analysis has become a rich scientific field in its own right during the last 20 years.

One branch of this community-wide effort has revolved around optimal exploration of the full joint CMB posterior, and among the most successful of such methods is the CMB Gibbs sampler. This framework, which was originally introduced by Jewell et al. (2004), has proved particularly powerful because of its ability to seamlessly and jointly account for astrophysical and instrumental nuisance parameters together with the primary CMB parameters, thereby both mitigating and propagating systematic uncertainties in final science results. This method has already been applied successfully to evaluation of power spectra for COBE (Wandelt et al. 2004), WMAP (Eriksen et al. 2007; Hinshaw et al. 2013), and Planck (Planck Collaboration 2015d), and it has played a central role in separation of astrophysical components for Planck (Planck Collaboration 2015b, 2015c).

By virtue of being a Gibbs sampler, this overall method consists of a series of iterated conditional Markov chain moves, in which one or more parameters are changed at a time, leaving all other parameters fixed. Each conditional parameter move is usually fairly simple, and always associated with a well-established sampling algorithm. For instance, the conditional distribution of the CMB power spectrum is an inverse Wishart distribution, that of the CMB sky map is a multivariate Gaussian (Jewell et al. 2004), while astrophysical foreground conditionals can usually be described in terms of some fairly simple χ² evaluations (Eriksen et al. 2008). Still, the computational cost per sample is certainly non-trivial, with the primary cost being driven by the multivariate Gaussian sky distribution, and can easily amount to several CPU hours per sample, requiring more advanced algorithmic treatment (Eriksen et al. 2004; Seljebotn et al. 2014; Jasche & Lavaux 2015).

In this paper, we revisit the problem of estimation of cosmological parameters and the CMB power spectrum within the Gibbs sampling framework, with the goal of establishing an efficient algorithm in both high and low signal-to-noise regimes, and thereby reducing the overall computational cost of the method. The same problem has been discussed and addressed repeatedly in the literature already (e.g., Eriksen et al. 2004; Jewell et al. 2009), but no definitive and general solution has been presented until now. In Section 2 we present the intuition behind our new algorithm, and we compare the new approach to existing sampling schemes. Then, in Section 3 we formalize the algorithm in standard mathematical notation, before testing it on simplified simulations in Section 4. We conclude in Section 5.

2. INTUITION AND MOTIVATION

As noted already by Eriksen et al. (2004), the most severe complication for estimation of the CMB power spectrum and cosmological parameters with the Gibbs sampling framework concerns the relationship between effective signal-to-noise ratio and Markov chain correlation length: while the width of the full power spectrum posterior is given by both cosmic variance and instrumental noise, the step size of the Markov chain power spectrum in the default algorithm (Jewell et al. 2004; Wandelt et al. 2004) is given by cosmic variance alone.

This problem is illustrated in the top two panels of Figure 1. Each move (illustrated by black arrows for sky map parameters and colored for power spectrum parameters) affects only one parameter at a time. Each arrow therefore points parallel to either coordinate axis. In the high signal-to-noise regime (top left panel), the sky signal is highly constrained, and the corresponding marginal posterior is very narrow. The power spectrum marginal, however, still has significant uncertainty due to cosmic variance, even if the noise contribution is small. However, since the sky map distribution essentially converges to a delta function with increasing signal-to-noise ratio, the joint distribution is nearly uncorrelated between the two directions, and pure Gibbs steps (defined by a Gaussian distribution in the vertical direction and an inverse Wishart distribution in the horizontal direction) are able to navigate the full posterior efficiently.

**Figure 1.** Illustration of the performance of different sampling algorithms in different signal-to-noise regimes.We sketch the exploration of the joint distribution of proposed ${C}_{{\ell }}$ and the signal map's power spectrum ${\sigma }_{{\ell }}=1/(2{\ell }+1)\sum | {s}_{{\ell }m}{| }^{2}$ . In the high signal-to-noise limit (left column), standard Gibbs sampling steps (proposing ${C}_{{\ell }}$ according to the cosmic variance (CV) and solving Equation (10)) achieve both good acceptance rates and correlation lengths. In the low signal-to-noise limit (right column), the cosmic variance, and therefore the step length, is much smaller than the noise contribution to the posterior, and standard Gibbs sampling results in a long correlation length. Joint sampling steps in which the sky map is rescaled by the power spectrum, as proposed by Jewell et al. (2009) and illustrated in the bottom right panel, avoid this problem if one proposes ${C}_{{\ell }}$ 's with a variance that includes noise in addition to cosmic variance. Unfortunately, as illustrated in the bottom left panel, the corresponding signal rescaling does not perform well in the high signal-to-noise regime. When proposing ${C}_{{\ell }}$ according to the cosmic variance and the noise, naive rescaling leads to a large change in the amplitude of the signal map, without correspondingly modifying the phase information of the map, and most steps are rejected in the Metropolis–Hastings acceptance evaluation through a poor effective ${\chi }^{2}$ . This problem is solved by the sampling algorithm introduced in the present paper, in which we exclude the high signal-to-noise Wiener filter component of the signal from the rescaling operation, and only modify the fluctuations around this mean-field map. The net result is an algorithm that works in both low and high signal-to-noise regimes.
Download figure:
Standard image High-resolution image

In the low signal-to-noise regime (top right panel), this is no longer true. In this case, there is significant uncertainty in the true sky signal, and for each sky map value the power spectrum conditional follows an inverse Wishart distribution centered on a value given by the sky map. The joint distribution therefore becomes highly degenerate. To move from one end of the joint distribution to the other, a very large number of orthogonal moves are thus required. Such degeneracies are a well-known problem for the Gibbs sampling algorithm in general.

This problem was first identified and studied by Eriksen et al. (2004), and a first proper attempt at solving it was subsequently presented by Jewell et al. (2009). They introduced a new type of joint Markov chain move that consists of three steps. First, one proposes an arbitrary change to the power spectrum, ${C}_{{\ell }}^{i+1}\leftarrow T({C}_{{\ell }}^{i})$ , where T is some proposal rule. Second, one rescales the corresponding sky map, ${{\boldsymbol{s}}}^{i+1}\leftarrow {{\boldsymbol{S}}}_{i+1}^{1/2}{{\boldsymbol{S}}}_{i}^{-1/2}{{\boldsymbol{s}}}^{i}$ , where ${{\boldsymbol{S}}}_{i}={\boldsymbol{S}}({C}_{{\ell }}^{i})=\langle {{\boldsymbol{ss}}}^{t}\rangle$ is the signal-only covariance matrix. This rescaling operation leaves the quantity ${{\boldsymbol{s}}}^{t}{{\boldsymbol{S}}}^{-1}{\boldsymbol{s}}$ invariant, and one can show that any determinant factors in the corresponding Metropolis–Hastings accept probability cancel, and the final accept ratio is given by χ²'s only (Jewell et al. 2009). The third and final step is therefore to evaluate this accept probability, and apply the Metropolis–Hastings rule.

Intuitively, this algorithm corresponds to diagonal moves in Figure 1, as illustrated in the bottom two panels. With an accept rate given by χ²'s alone, this kind of move works very well in the low signal-to-noise regime (bottom right panel), since the effective χ² does not change appreciably when changing the amplitude of the map. However, in the high signal-to-noise regime the χ² becomes sensitive to the phase information in the sky map, and large map rescaling factors are generally associated with very low accept probabilities; only very short moves are allowed in order to stay within the acceptable region.

In the present paper we solve this problem by introducing a small but critical variation of the previous scheme. First, as detailed by Jewell et al. (2004) and Wandelt et al. (2004), we note that the sky signal map may be decomposed into the sum of a Wiener filter component, $\hat{{\boldsymbol{s}}}$ , and a fluctuation term, $\hat{f}$ , in the form ${\boldsymbol{s}}=\hat{{\boldsymbol{s}}}+\hat{f}$ . The high signal-to-noise information is contained in $\hat{{\boldsymbol{s}}}$ , while the noise-dominated component is described by $\hat{f}$ . Exploiting the intuition described above, we now note that the optimal Markov chain move should leave $\hat{{\boldsymbol{s}}}$ invariant, and rescale only $\hat{f}$ by the power spectrum. Specifically, we introduce a partially rescaled proposal rule in the following form

$\begin{eqnarray}&&{{\boldsymbol{s}}}^{i+1}={\hat{{\boldsymbol{s}}}}^{i+1}+{{\boldsymbol{S}}}_{i+1}^{1/2}{{\boldsymbol{S}}}_{i}^{-1/2}({{\boldsymbol{s}}}^{i}-{\hat{{\boldsymbol{s}}}}^{i})\end{eqnarray} \tag{ 1 }$

$\begin{eqnarray}&&={\hat{{\boldsymbol{s}}}}^{i+1}+{{\boldsymbol{S}}}_{i+1}^{1/2}{{\boldsymbol{S}}}_{i}^{-1/2}{\hat{f}}^{i}\end{eqnarray} \tag{ 2 }$

where ${\hat{{\boldsymbol{s}}}}^{i}$ is the Wiener filtered sky map evaluated with ${C}_{{\ell }}^{i}$ .

This new move is contrasted to the previous joint sampler in the bottom two panels of Figure 1 in terms of yellow versus blue arrows. In the low signal-to-noise regime (right panel), the two perform nearly identically, since ${\boldsymbol{s}}\approx \hat{f}$ . However, they perform very differently in the high signal-to-noise regime: since $\hat{f}$ is very small in the signal-dominated regime, only small changes are proposed to ${\boldsymbol{s}}$ in the new scheme, maintaining a high net accept rate.

3. ALGORITHMS

In this section, we first define necessary notation and review previous CMB Gibbs sampling algorithms, before formalizing the intuition described above into a well-defined and operational algorithm. The technical derivation of the Metropolis–Hastings accept probability is deferred to the Appendix.

Let us start by considering a data model of the form

$\begin{eqnarray}&&{\boldsymbol{d}}={\boldsymbol{A}}{\boldsymbol{s}}+{\boldsymbol{n}},\end{eqnarray} \tag{ 3 }$

where ${\boldsymbol{d}}$ denotes a data vector in pixel space; ${\boldsymbol{A}}$ represents convolution with an instrumental beam, usually represented by a Legendre transform b_ℓ; ${\boldsymbol{s}}$ is a Gaussian signal with covariance ${\boldsymbol{S}};$ and ${\boldsymbol{n}}$ denotes Gaussian instrumental noise with covariance ${\boldsymbol{N}}$ . We further assume that the signal ${\boldsymbol{s}}$ is statistically isotropic, and define its power spectrum as ${C}_{{\ell }}\equiv \langle {s}_{{\ell }m}^{*}{s}_{{\ell }^{\prime} m^{\prime} }\rangle ={C}_{{\ell }}{\delta }_{{\ell }{\ell }^{\prime} }{\delta }_{{mm}^{\prime} }$ , where ${\boldsymbol{s}}={\sum }_{{\ell }m}{s}_{{\ell }m}{{\boldsymbol{Y}}}_{{\ell }m}$ . The signal covariance matrix is then given as ${{\boldsymbol{S}}}_{{\ell }m,{\ell }^{\prime} m^{\prime} }={C}_{{\ell }}{\delta }_{{\ell }{\ell }^{\prime} }{\delta }_{{mm}^{\prime} }$ .

The overall goal of this paper is to characterize the marginal power spectrum posterior $P({C}_{{\ell }}| {\boldsymbol{d}})$ somehow. In the literature, this is conventionally done in several different ways, for instance by adopting either single- ${\ell }$ or binned estimates of the power spectrum. While the algorithm presented here certainly is suitable for such parameterizations as well, we will in the following instead focus directly on cosmological parameters, which are the ultimate goal of any CMB experiment. Furthermore, adopting a high-level parameterization that depends on both low and high multipoles (and therefore both high and low signal-to-noise regimes) puts maximum pressure on the algorithm itself. Note that this method is also naturally suitable for sampling power spectrum coefficients, ${C}_{{\ell }}$ , which can be useful to reveal features or anomalies in the data.

For convenience, we adopt a standard six-parameter ΛCDM model in the following, with baryon density ${{\rm{\Omega }}}_{b}{h}^{2}$ , cold dark matter density ${{\rm{\Omega }}}_{c}{h}^{2}$ , optical depth at reionization τ, amplitude and tilt of the primordial fluctuations A_s and n_s, and the Hubble parameter H₀ as free parameters. To evaluate corresponding power spectra, we employ CAMB³ (Lewis et al. 2000). In the rest of the paper, we will denote this vector of parameters as θ.

Thus, our goal is to map out $P(\theta | {\boldsymbol{d}})$ . To do so, we use Bayes' theorem to write the joint density $P(\theta ,{\boldsymbol{s}}| {\boldsymbol{d}})$ as

$\begin{eqnarray}P(\theta ,{\boldsymbol{s}}| {\boldsymbol{d}}) & = & \displaystyle \frac{P(\theta ,{\boldsymbol{s}},{\boldsymbol{d}})}{P({\boldsymbol{d}})}=P({\boldsymbol{d}}| {\boldsymbol{s}})P({\boldsymbol{s}}| \theta )\displaystyle \frac{P(\theta )}{P({\boldsymbol{d}})}\\ & = & \displaystyle \frac{{e}^{-\frac{1}{2}{({\boldsymbol{d}}-{\boldsymbol{A}}{\boldsymbol{s}})}^{t}{{\boldsymbol{N}}}^{-1}({\boldsymbol{d}}-{\boldsymbol{A}}{\boldsymbol{s}})}}{\sqrt{| {\boldsymbol{N}}| }}\displaystyle \frac{{e}^{-\frac{1}{2}{{\boldsymbol{s}}}^{t}{{\boldsymbol{S}}}^{-1}{\boldsymbol{s}}}}{\sqrt{| {\boldsymbol{S}}| }}\displaystyle \frac{P(\theta )}{P({\boldsymbol{d}})}.\end{eqnarray} \tag{ 4 }$

For clarity, we have dropped the dependence of ${\boldsymbol{S}}$ on θ, as well as any factor of $2\pi$ . Optional priors on θ are described by P(θ), and from now on we will neglect the overall normalization factor, $P({\boldsymbol{d}})$ , often called the evidence. We note that our target distribution is obtained by marginalizing the joint distribution over ${\boldsymbol{s}}$ ,

$\begin{eqnarray}&&P(\theta | {\boldsymbol{d}})=\int P(\theta ,{\boldsymbol{s}}| {\boldsymbol{d}})d{\boldsymbol{s}}.\end{eqnarray} \tag{ 5 }$

3.1. Gibbs Sampling

To recap, the basic idea behind Gibbs sampling is to draw a sample from the joint posterior $P(\theta ,{\boldsymbol{s}}| {\boldsymbol{d}})$ by iteratively sampling from the corresponding conditional probabilities,

$\begin{eqnarray}&&{{\boldsymbol{s}}}^{i+1}\leftarrow P({\boldsymbol{s}}| {\theta }^{i},{\boldsymbol{d}})\end{eqnarray} \tag{ 6 }$

$\begin{eqnarray}&&{\theta }^{i+1}\leftarrow P(\theta | {{\boldsymbol{s}}}^{i+1},{\boldsymbol{d}}).\end{eqnarray} \tag{ 7 }$

As discussed by Jewell et al. (2004) and Wandelt et al. (2004), the former of these distributions may be recognized as a standard multivariate Gaussian by rewriting the exponent in Equation (4) as follows:

$\begin{eqnarray}&&{({\boldsymbol{d}}-{\boldsymbol{A}}{\boldsymbol{s}})}^{t}{{\boldsymbol{N}}}^{-1}({\boldsymbol{d}}-{\boldsymbol{A}}{\boldsymbol{s}})+{{\boldsymbol{s}}}^{t}{{\boldsymbol{S}}}^{-1}{\boldsymbol{s}}\\ &&\qquad ={({\boldsymbol{s}}-\hat{{\boldsymbol{s}}})}^{t}({{\boldsymbol{S}}}^{-1}+{{\boldsymbol{A}}}^{t}{{\boldsymbol{N}}}^{-1}{\boldsymbol{A}})({\boldsymbol{s}}-\hat{{\boldsymbol{s}}}),\end{eqnarray} \tag{ 8 }$

where we defined the mean-field map $\hat{{\boldsymbol{s}}}\equiv {({{\boldsymbol{S}}}^{-1}+{{\boldsymbol{A}}}^{t}{{\boldsymbol{N}}}^{-1}{\boldsymbol{A}})}^{-1}{{\boldsymbol{AN}}}^{-1}{\boldsymbol{d}}$ . Given this expression, we sample the sky signal according to

$\begin{eqnarray}&&P({\boldsymbol{s}}| {\theta }^{i},{\boldsymbol{d}})={ \mathcal N }(\hat{{\boldsymbol{s}}},{({{\boldsymbol{S}}}^{-1}+{{\boldsymbol{A}}}^{t}{{\boldsymbol{N}}}^{-1}{\boldsymbol{A}})}^{-1}),\end{eqnarray} \tag{ 9 }$

where ${ \mathcal N }({\boldsymbol{\mu }},{\boldsymbol{ \mathcal C }})$ is a Gaussian multivariate, with a mean vector ${\boldsymbol{\mu }}$ and a covariance ${\boldsymbol{ \mathcal C }}$ . Specifically, we generate two random vectors, ${{\boldsymbol{\omega }}}_{0}$ and ${{\boldsymbol{\omega }}}_{1}$ , drawn from ${ \mathcal N }(0,1)$ , and solve the equation

$\begin{eqnarray}&&[{{\boldsymbol{S}}}^{-1}+{{\boldsymbol{A}}}^{t}{{\boldsymbol{N}}}^{-1}{\boldsymbol{A}}]{\boldsymbol{s}}={{\boldsymbol{AN}}}^{-1}{\boldsymbol{d}}+{{\boldsymbol{S}}}^{-\frac{1}{2}}{{\boldsymbol{\omega }}}_{0}+{{\boldsymbol{AN}}}^{-\frac{1}{2}}{{\boldsymbol{\omega }}}_{1}.\end{eqnarray} \tag{ 10 }$

Solving Equation (10) in the context of a realistic experiment with anisotropic noise and non-trivial masks can be computationally expensive (Seljebotn et al. 2014; Jasche & Lavaux 2015). However, this is a purely computational problem of algebraic nature, and fully independent of questions regarding Monte Carlo correlation lengths and signal-to-noise levels. In this paper, we therefore circumvent this problem entirely, and consider only an ideal data set in the harmonic domain with uniform noise and full sky coverage. In this particular case, the noise matrix may be described by a noise power spectrum, ${N}_{{\ell }m,{\ell }^{\prime} m^{\prime} }={N}_{{\ell }}{\delta }_{{\ell }{\ell }^{\prime} }{\delta }_{{mm}^{\prime} }$ , and Equation (10) may be solved directly in harmonic space at negligible computational cost:

$\begin{eqnarray}&&{\hat{{\boldsymbol{s}}}}_{{\ell }m}={{\boldsymbol{d}}}_{{\ell }m}\displaystyle \frac{{b}_{{\ell }}}{{N}_{{\ell }}}\sqrt{{C}_{{\ell }}}\displaystyle \frac{{N}_{{\ell }}\sqrt{{C}_{{\ell }}}}{{N}_{{\ell }}+{b}_{{\ell }}^{2}{C}_{{\ell }}}\end{eqnarray} \tag{ 11 }$

$\begin{eqnarray}&&{\hat{{\boldsymbol{f}}}}_{{\ell }m}=\left({{{\boldsymbol{w}}}_{0}}_{{\ell }m}+{{{\boldsymbol{w}}}_{1}}_{{\ell }m}{b}_{{\ell }}\displaystyle \frac{\sqrt{{C}_{{\ell }}}}{\sqrt{{N}_{{\ell }}}}\right)\displaystyle \frac{{N}_{{\ell }}\sqrt{{C}_{{\ell }}}}{{N}_{{\ell }}+{b}_{{\ell }}^{2}{C}_{{\ell }}}.\end{eqnarray} \tag{ 12 }$

Finally, the second conditional distribution in Equation (7) reduces to an inverse Wishart distribution with a well-known sampling algorithm. This particular sampling step, however, will not be needed for the purposes of the current paper, and we therefore refer the interested reader to the referenced papers for further details.

3.2. Joint Sampling by Metropolis–Hastings MCMC

Adopting the above notation, Jewell et al. (2009) introduced the following joint sampling step to mitigate the slow convergence rate discussed in Section 2 in the low signal-to-noise regime:

$\begin{eqnarray}&&{C}_{{\ell }}^{i+1}={C}_{{\ell }}^{i}+\delta {C}_{{\ell }}\end{eqnarray} \tag{ 13 }$

$\begin{eqnarray}&&{{\boldsymbol{s}}}_{{\ell }m}^{i+1}=\sqrt{\displaystyle \frac{{C}_{{\ell }}^{i+1}}{{C}_{{\ell }}^{i}}}{{\boldsymbol{s}}}_{{\ell }m}^{i},\end{eqnarray} \tag{ 14 }$

where $\delta {C}_{{\ell }}$ denotes a random fluctuation drawn from a normal distribution with zero mean and variance given by the sum of cosmic variance and instrumental noise. They then showed that the corresponding Metropolis–Hastings accept probability for such a joint move reduced to the ratio of exponentiated χ²'s,

$\begin{eqnarray}&&A=\mathrm{min}\left(1,\displaystyle \frac{{e}^{-\frac{1}{2}{({\boldsymbol{d}}-{\boldsymbol{A}}{{\boldsymbol{s}}}^{i+1})}^{t}{{\boldsymbol{N}}}^{-1}({\boldsymbol{d}}-{\boldsymbol{A}}{{\boldsymbol{s}}}^{i+1})}}{{e}^{-\frac{1}{2}{({\boldsymbol{d}}-{\boldsymbol{A}}{{\boldsymbol{s}}}^{i})}^{t}{{\boldsymbol{N}}}^{-1}({\boldsymbol{d}}-{\boldsymbol{A}}{{\boldsymbol{s}}}^{i})}}\right).\end{eqnarray} \tag{ 15 }$

Although efficient in the low signal-to-noise regime, this particular move performs very poorly in the high signal-to-noise regime.

To solve this, we propose in this paper a slight—but critical—variation of the above scheme, as discussed in Section 2. Specifically, rather than rescaling the full signal map, we propose to rescale only the fluctuation component, such that

$\begin{eqnarray}&&{\hat{{\boldsymbol{f}}}}_{{\ell }m}^{i+1}=\sqrt{\displaystyle \frac{{C}_{{\ell }}^{i+1}}{{C}_{{\ell }}^{i}}}{\hat{{\boldsymbol{f}}}}_{{\ell }m}^{i},\end{eqnarray} \tag{ 16 }$

or written in terms of ${{\boldsymbol{s}}}^{i+1}$ as in Equation (2),

$\begin{eqnarray}&&{{\boldsymbol{s}}}^{i+1}={\hat{{\boldsymbol{s}}}}^{i+1}+{{\boldsymbol{S}}}_{i+1}^{1/2}{{\boldsymbol{S}}}_{i}^{-1/2}({{\boldsymbol{s}}}^{i}-{\hat{{\boldsymbol{s}}}}^{i}).\end{eqnarray} \tag{ 17 }$

Again, the underlying intuition behind this proposal is to leave the high signal-to-noise component of the sky signal invariant, and modify only the low signal-to-noise component, to which the total χ² is largely insensitive.

We derive the Metropolis–Hastings accept rate for this new proposal in the Appendix, and find it to be given by

$\begin{eqnarray}&&A={\rm{min}}\left[1,\displaystyle \frac{\pi ({\theta }^{i+1})}{\pi ({\theta }^{i})}\displaystyle \frac{w({\theta }^{i}| {\theta }^{i+1})}{w({\theta }^{i+1}| {\theta }^{i})}\displaystyle \frac{P({\theta }^{i+1})}{P({\theta }^{i})}\right],\end{eqnarray} \tag{ 18 }$

where

$\begin{eqnarray}&&\pi ({\theta }^{i+1})={e}^{-\frac{1}{2}{({\boldsymbol{d}}-{\boldsymbol{A}}\hat{{\boldsymbol{s}}})}^{t}{{\boldsymbol{N}}}^{-1}({\boldsymbol{d}}-{\boldsymbol{A}}\hat{{\boldsymbol{s}}})}{e}^{-\frac{1}{2}{\hat{{\boldsymbol{s}}}}^{t}{{\boldsymbol{S}}}^{-1}\hat{{\boldsymbol{s}}}}{e}^{-\frac{1}{2}{\hat{{\boldsymbol{f}}}}^{t}{{\boldsymbol{A}}}^{t}{{\boldsymbol{N}}}^{-1}{\boldsymbol{A}}\hat{{\boldsymbol{f}}}},\end{eqnarray} \tag{ 19 }$

and $w({\theta }^{i+1}| {\theta }^{i})$ denotes the proposal distribution for θ in Equation (13). Note that, for clarity, we dropped the ${\theta }^{i+1}$ dependence of ${\boldsymbol{S}}({\theta }^{i+1})$ , $\hat{{\boldsymbol{s}}}({\theta }^{i+1})$ , and $\hat{{\boldsymbol{f}}}({\theta }^{i+1})$ in the above expression.

The proposal rule w is in principle arbitrary. However, for standard cosmological parameter estimation, as implemented for instance in CosmoMC (Lewis & Bridle 2002), overall faster convergence is achieved when adopting a proposal rule that is close to the underlying marginal posterior distribution. Following CosmoMC and other samplers, we therefore adopt a multivariate Gaussian for w of the form:

$\begin{eqnarray}&&w({\theta }^{i+1}| {\theta }^{i})={e}^{-\frac{1}{2}{({\theta }^{i+1}-{\theta }^{i})}^{t}{{\boldsymbol{C}}}_{{\boldsymbol{\theta }}}^{-1}({\theta }^{i+1}-{\theta }^{i})},\end{eqnarray} \tag{ 20 }$

with a covariance matrix derived by some earlier analysis. (If no such earlier analysis is available, we generate a short chain with a diagonal covariance matrix, and compute a first covariance matrix from that run.)

As in the case of the original method introduced by Jewell et al. (2009), the sampling step introduced above explores by itself only a very limited subspace of the full volume of the sky signal posterior, namely that spanned by a single amplitude rescaling. To explore the full posterior volume, this step must therefore be interleaved with a standard Gibbs step, as described in Equation (8). Overall, the full sampler therefore works as follows:

1.
Propose some initial parameter vector θ⁰, and generate a power spectrum ${C}_{{\ell }}^{0}$ with CAMB. Solve Equations (11) and (12) to obtain the mean-field map $\hat{{\boldsymbol{s}}}({\theta }^{0})$ and the fluctuation map $\hat{{\boldsymbol{f}}}({\theta }^{0})$ .
2.
Propose a new parameter vector θ¹ according to the proposal rule w, and compute the corresponding power spectrum ${C}_{{\ell }}^{1}$ .
3.
Compute the deterministically rescaled fluctuation map ${\hat{{\boldsymbol{f}}}}_{{\ell }m}({\theta }^{1})=\sqrt{{C}_{{\ell }}^{1}/{C}_{{\ell }}^{0}}{\hat{{\boldsymbol{f}}}}_{{\ell }m}({\theta }^{0})$ , and evaluate the accept probability according to Equation (18); accept or reject the proposal according to the usual Metropolis–Hastings rule.
4.
Given the most recent parameter sample, make a standard conditional Gibbs step for the sky map, according to Equation (8), computing both the mean-field and fluctuation maps.
5.
Iterate 2–4.

3.3. Brute-force Direct Sampling

To validate and benchmark our new sampling scheme, we compare it to a case for which we can evaluate the exact posterior at negligible cost, namely a data set with uniform noise and full sky coverage. In this case, the exact marginal parameter posterior, $P(\theta | {\boldsymbol{d}})$ , reads

$\begin{eqnarray}&&{\rm{\Pi }}(\theta | {\boldsymbol{d}})=\displaystyle \frac{{e}^{-\frac{1}{2}{{\boldsymbol{d}}}^{t}{({{\boldsymbol{A}}}^{t}{\boldsymbol{S}}(\theta ){\boldsymbol{A}}+{\boldsymbol{N}})}^{-1}{\boldsymbol{d}}}}{\sqrt{| ({{\boldsymbol{A}}}^{t}{\boldsymbol{S}}(\theta ){\boldsymbol{A}})+{\boldsymbol{N}}| }}.\end{eqnarray} \tag{ 21 }$

To map out this distribution, we use a standard Metropolis sampler with the same proposal distribution, w, as for the joint Gibbs sampler.

4. VALIDATION AND BENCHMARKS

To validate and benchmark our method, we now apply it to a simplified simulation generated as follows. We draw a random Gaussian CMB sky realization from the Planck 2015 best-fit ΛCDM power spectrum (TT + lowP; Planck Collaboration 2015e), and convolve this with a 13' FWHM Gaussian beam. Finally, we add white noise with a power spectrum amplitude of ${N}_{{\ell }}=1.84\times {10}^{-3}\;{\rm{\mu }}{{\rm{K}}}^{2}$ . Both the beam and noise level are chosen to mimic WMAP, in order to probe both the high and low signal-to-noise regimes within our effective multipole range. To be specific, with these parameter choices we find a signal-to-noise ratio per multipole of unity at ${\ell }=900$ .

Next, since we consider only temperature observations in the following, our constraints on τ are very loose. To produce more realistic results, we therefore additionally impose an informative prior of τ = 0.07 ± 0.02.

As our first proposal matrix, we adopt the covariance matrix obtained from the Planck 2015 TT + lowP Markov chains, which are publicly available from the Planck Legacy Archive.⁴ However, due to its higher signal-to-noise ratio and different effective sky realization, this proposal distribution is quite poor. We therefore first generate a set of chains with 50k samples each, and compute the parameter covariance from the latter 40k samples. We additionally rescale the resulting distribution by ${{\boldsymbol{C}}}_{{\boldsymbol{\theta }}}\to 2.4/\sqrt{6}\;{{\boldsymbol{C}}}_{{\boldsymbol{\theta }}}$ , as proposed in Dunkley et al. (2005), for further optimization. Based on this proposal, we run 40 chains with 30k samples each, for both the new joint sampler and the exact sampler. After conservatively removing burn-in, we retain a total of 1M samples for analysis. Note that this is far more than is strictly required, but since each sample is cheap, and our primary concerns here are of algorithmic nature, optimization of chain lengths is not an issue.

We first consider the overall efficiency of the algorithm, as measured in terms of Markov chain correlation lengths. These are shown in Figure 2 for each of the six cosmological parameters, and for both the new (dashed) and the exact (dotted) samplers. Overall, we see that the correlation lengths for the new sampler are roughly twice as long as for the standard sampler, which translates into a computational cost roughly double that of standard samplers. For comparison, we typically find a correlation length of roughly 30–50 with CosmoMC. Thus, the new sampler performs similarly to existing samplers in terms of overall correlation length, within a very small factor. This has never before been achieved within the CMB Gibbs sampling framework.

Next, in Figure 3 we compare the 1D and 2D marginal distributions derived using the two samplers. At least at a visual level, all distributions agree very well. This agreement is quantified in Table 1 in terms of posterior mean and standard deviations for each of the two methods, and the relative difference between the two. We find that the posterior means are identical up to a few thousandths of a σ, while somewhat larger differences are observed for the posterior standard deviation—up to 3% for τ and A_s. The cause of this small discrepancy is still under investigation, although we observe that it vanishes if we loosen or remove the prior on τ. It is thus not an intrinsic feature of the method as such, but rather related to the use of external priors. Since most applications of this type are anyway made without such external priors, and the difference is in either case very small, we defer further discussion and resolution of the issue to a future publication.

Table 1. Summary of Cosmological Parameters

	${{\rm{\Omega }}}_{b}{h}^{2}$	${{\rm{\Omega }}}_{c}{h}^{2}$	τ	A_s	n_s	H₀
New joint sampler	0.02241 ± 0.00035	0.1180 ± 0.0029	0.067 ± 0.019	3.063 ± 0.037	0.972 ± 0.01	68.2 ± 1.5
Exact posterior	0.02241 ± 0.00035	0.1180 ± 0.0029	0.068 ± 0.019	3.063 ± 0.038	0.972 ± 0.01	68.2 ± 1.5
Posterior mean bias $\left(\frac{{\rm{\Delta }}(\mu )}{{\sigma }_{1}}\right)$	−0.002	−0.002	−0.005	−0.006	−0.001	0.003
Posterior rms bias (%)	0.0	0.0	3.2	3.1	0.0	0.1

Note. Posterior mean and standard deviations for each of the six cosmological parameters are shown for the new sampling algorithm in the top row and for the exact calculation in the second row. The third row shows the difference in the posterior means measured in units of σ₁, the rms derived from the new method. The bottom row shows relative difference in the derived posterior RMSs as a percentage. Note that an external prior was applied to τ.

Download table as: ASCII Typeset image

5. CONCLUSIONS

We have introduced a new sampling step for exploring the joint CMB sky signal and power spectrum posterior that is suitable for both high and low signal-to-noise regimes. This new step is very closely related to an already introduced rescaling algorithm (Jewell et al. 2009), but with a very important difference: rather than rescaling the entire sky signal in each iteration, we now propose to rescale only the component with low signal-to-noise fluctuation. As a result, high Metropolis–Hastings accept rates are maintained even in the high signal-to-noise regime.

Our focus in this paper has been on purely statistical–algorithmic aspects of the method, not real-world applications. To make the algorithm useful for such, it needs to be combined with state-of-the-art constrained realization samplers (Eriksen et al. 2004; Seljebotn et al. 2014), in order to solve the computationally expensive map-making step efficiently. Once that task has been completed, it will finally be possible to go from raw sky maps to cosmological parameters without the use of any likelihood approximations whatsoever. In addition, full physical marginalization over astrophysical foreground contamination will be straightforward (Eriksen et al. 2008; Planck Collaboration 2015c).

Finally, we note that while we have considered only CMB temperature analysis in the current paper, the method generalizes naturally to all other fields that employ joint Gibbs sampling as their basic algorithm. Three specific examples include CMB polarization analysis (Larson et al. 2007), large-scale structure analysis (Jasche & Wandelt 2013), and weak lensing analysis (Alsing et al. 2016).

B.R. and H.K.E. thank Charles Lawrence and the Jet Propulsion Laboratory (JPL) for their hospitality during summer 2015, where this project was initiated. We acknowledge the use of the CAMB and HEALPix (Górski et al. 2005) software packages. B.R. acknowledges funding from the Research Council of Norway. This project was supported by the ERC Starting Grant StG2010-257080. Part of the research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with NASA.

APPENDIX: TECHNICAL DETAILS OF THE PARAMETER SAMPLING MCMC STEP

We review the technical details of constructing an MCMC algorithm to sample from the Bayes joint posterior of cosmological parameters θ and CMB signal maps s given data $p(\theta ,{\boldsymbol{s}}| {\boldsymbol{d}})$ , ideally striking a balance between overall computational expense and number of posterior samples. Transition matrices for MCMC algorithms are constructed to be both stationary and irreducible. The latter simply means that any "state" is reachable with a finite (non-vanishing) probability after enough iterations, while the former means that the target distribution is invariant under repeated iterations of the algorithm. These two properties are sufficient to establish convergence in measure to the target distribution when started from some initial probability density.

In this Appendix we concentrate on just the step involving variations in cosmological parameters and deterministic changes in the CMB map—this step allows large jumps in the parameters, and represents a stationary step leaving the Bayes posterior invariant. It is not, however, irreducible—but we interleave standard Gibbs steps in between to produce an overall transition matrix given by the product of the two that is both stationary and irreducible. We now discuss the details of the joint parameter and CMB map step, and derive its accept probability.

A.1. Joint Proposal for Cosmological Parameters and CMB Signal Map

We first assume joint proposals for both CMB maps and parameters of the form

$\begin{eqnarray}&&w({\theta }^{i+1},{{\boldsymbol{s}}}^{i+1}| {\theta }^{i},{{\boldsymbol{s}}}^{i},{\boldsymbol{d}})\\ &&\qquad =w({{\boldsymbol{s}}}^{i+1}| {\theta }^{i+1},{\theta }^{i},{{\boldsymbol{s}}}^{i},{\boldsymbol{d}})w({\theta }^{i+1}| {\theta }^{i},{{\boldsymbol{s}}}^{i},{\boldsymbol{d}}).\end{eqnarray} \tag{ 22 }$

In words, we first generate a proposal for cosmological parameters possibly dependent on the current state of the MCMC chain, followed by a proposal for a new CMB map given the current and proposed parameters θⁱ and ${\theta }^{i+1}$ respectively, the current CMB map, and the data (in the numerical examples presented in this paper, a symmetric proposal for the parameters was used for them, but we proceed with the general case for now).

For the proposal for the new map ${{\boldsymbol{s}}}^{i+1}$ , we consider variations about the new mean-field map

$\begin{eqnarray}&&{\hat{{\boldsymbol{s}}}}^{i+1}={({{\boldsymbol{A}}}^{t}{{\boldsymbol{N}}}^{-1}{\boldsymbol{A}}+{{\boldsymbol{S}}}^{-1}({\theta }^{i+1}))}^{-1}{{\boldsymbol{AN}}}^{-1}{\boldsymbol{d}}\end{eqnarray} \tag{ 23 }$

(note that the mean-field map is a function of information conditioned on when proposing ${{\boldsymbol{s}}}^{i+1}$ , specifically ${\theta }^{i+1}$ and the data ${\boldsymbol{d}}$ ). We consider, for some linear filter ${\boldsymbol{F}}({\theta }^{i+1},{\theta }^{i})$ , general proposals of the form

$\begin{eqnarray}&&{{\boldsymbol{s}}}^{i+1}={\hat{{\boldsymbol{s}}}}^{i+1}+{\boldsymbol{F}}({\theta }^{i+1},{\theta }^{i})({{\boldsymbol{s}}}^{i}-{\hat{{\boldsymbol{s}}}}^{i})+{\beta }^{1/2}\xi ,\end{eqnarray} \tag{ 24 }$

with ξ Gaussian-distributed with unit variance and β a scaling factor controlling its variance. Our family of proposals for $w({{\boldsymbol{s}}}^{i+1}| {\theta }^{i+1},{\theta }^{i},{{\boldsymbol{s}}}^{i},{\boldsymbol{d}})$ is therefore

$\begin{eqnarray}&&{w}_{\beta }({{\boldsymbol{s}}}^{i+1}| {\theta }^{i+1},{\theta }^{i},{{\boldsymbol{s}}}^{i},{\boldsymbol{d}})\\ &&\quad =\displaystyle \frac{1}{{(2\pi \beta )}^{({{\ell }}_{\mathrm{max}}+1/2)}}{e}^{-\displaystyle \frac{\parallel {{\boldsymbol{s}}}^{i+1}-{\hat{{\boldsymbol{s}}}}^{i+1}-{\boldsymbol{F}}({\theta }^{i+1},{\theta }^{i})({{\boldsymbol{s}}}^{i}-{\hat{{\boldsymbol{s}}}}^{i}){\parallel }^{2}}{2\beta }}.\end{eqnarray} \tag{ 25 }$

We take the limit $\beta \to 0$ , which reduces to a δ-function for the map about the deterministic proposal

$\begin{eqnarray}&&{w}_{\beta }({{\boldsymbol{s}}}^{i+1}| {\theta }^{i+1},{\theta }^{i},{{\boldsymbol{s}}}^{i},{\boldsymbol{d}})\;\to {}_{\beta \to 0}\\ &&\qquad \times \delta [{{\boldsymbol{s}}}^{i+1}-{\hat{{\boldsymbol{s}}}}^{i+1}-{\boldsymbol{F}}[{\theta }^{i+1},{\theta }^{i}]({{\boldsymbol{s}}}^{i}-{\hat{{\boldsymbol{s}}}}^{i})].\end{eqnarray} \tag{ 26 }$

Finally, we focus on one choice of filter (used for the numerical results in this paper):

$\begin{eqnarray}&&{\boldsymbol{F}}({\theta }^{i+1},{\theta }^{i})={{\boldsymbol{S}}}^{1/2}({\theta }^{i+1}){{\boldsymbol{S}}}^{-1/2}({\theta }^{i})\end{eqnarray} \tag{ 27 }$

(see the discussion below for a generalization of this choice of filter and its impact on the MCMC algorithm). To satisfy detailed balance, note that we must have

$\begin{eqnarray}&&{{\boldsymbol{s}}}^{i}-{\hat{{\boldsymbol{s}}}}^{i}={\boldsymbol{F}}[{\theta }^{i},{\theta }^{i+1}]({{\boldsymbol{s}}}^{i+1}-{\hat{{\boldsymbol{s}}}}^{i+1}),\end{eqnarray} \tag{ 28 }$

which means that the filter must satisfy

$\begin{eqnarray}&&{\boldsymbol{F}}({\theta }^{i},{\theta }^{i+1})={{\boldsymbol{F}}}^{-1}({\theta }^{i+1},{\theta }^{i})\end{eqnarray} \tag{ 29 }$

(which it does, as can be readily verified).

A.2. Functional Form of the Accept Probability

As reviewed in Jewell et al. (2009), stationary MCMC transition matrices can be constructed by demanding detailed balance, from which the probabilistic rule of accepting or rejecting proposed changes in $(\theta ,{\boldsymbol{s}})$ can be derived. We will require that the probability of rejecting a proposed move when starting from state $({\theta }^{2},{{\boldsymbol{s}}}^{2})$ is equal to the probability of accepting a transition TO the state $({\theta }^{2},{{\boldsymbol{s}}}^{2})$ from some other state:

$\begin{eqnarray}&&{\rm{Prob.}}\;{\rm{of}}\;{\rm{rej.}}\\ &&=\left[\displaystyle \int d({{\boldsymbol{s}}}^{1},{\theta }^{1})\ A[{\theta }^{1},{{\boldsymbol{s}}}^{1}| {\theta }^{2},{{\boldsymbol{s}}}^{2}]\ \delta [({{\boldsymbol{s}}}^{1}-{\hat{{\boldsymbol{s}}}}^{1})\right.\\ &&\quad -{\boldsymbol{F}}({\theta }^{1},{\theta }^{2})({{\boldsymbol{s}}}^{2}-{\hat{{\boldsymbol{s}}}}^{2})]\ w({\theta }^{1}| {\theta }^{2},{{\boldsymbol{s}}}^{2},{\boldsymbol{d}})]P({\theta }^{2},{{\boldsymbol{s}}}^{2}| {\boldsymbol{d}})\\ &&{\rm{Prob.}}\;{\rm{of}}\;{\rm{acc.}}\\ &&=\left[\displaystyle \int d({{\boldsymbol{s}}}^{1},{\theta }^{1})\ A[{\theta }^{2},{{\boldsymbol{s}}}^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1}]\ \delta [({{\boldsymbol{s}}}^{2}-{\hat{{\boldsymbol{s}}}}^{2})\right.\\ &&\quad -{\boldsymbol{F}}({\theta }^{2},{\theta }^{1})({{\boldsymbol{s}}}^{1}-{\hat{{\boldsymbol{s}}}}^{1})]\ w({\theta }^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1},{\boldsymbol{d}})]P({\theta }^{1},{{\boldsymbol{s}}}^{1}| {\boldsymbol{d}}).\end{eqnarray} \tag{ 30 }$

The integration over the δ-function in the latter term is equivalent to

$\begin{eqnarray}&&{\rm{Prob.}}\;{\rm{of}}\;{\rm{acc.}}\\ &&=\left[\displaystyle \int d({{\boldsymbol{s}}}^{1},{\theta }^{1})\ A[{\theta }^{2},{{\boldsymbol{s}}}^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1}]\right.\\ &&\quad \times \displaystyle \frac{\delta [({{\boldsymbol{s}}}^{1}-{\hat{{\boldsymbol{s}}}}^{1})-{\boldsymbol{F}}({\theta }^{1},{\theta }^{2})({{\boldsymbol{s}}}^{2}-{\hat{{\boldsymbol{s}}}}^{2})]}{| {\boldsymbol{F}}({\theta }^{2},{\theta }^{1})| }\\ &&\quad \times w({\theta }^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1},{\boldsymbol{d}})]P({\theta }^{1},{{\boldsymbol{s}}}^{1}| {\boldsymbol{d}})\end{eqnarray} \tag{ 31 }$

where we explicitly note that

$\begin{eqnarray}&&| {\boldsymbol{F}}({\theta }^{2},{\theta }^{1}){| }^{-1}=\displaystyle \frac{| {\boldsymbol{S}}({\theta }^{1}){| }^{1/2}}{| {\boldsymbol{S}}({\theta }^{2}){| }^{1/2}}\end{eqnarray} \tag{ 32 }$

and where the last line follows from ${{\boldsymbol{F}}}^{-1}({\theta }^{2},{\theta }^{1})={\boldsymbol{F}}({\theta }^{1},{\theta }^{2})$ . In order to have the two integrals for the reject and accept probabilities be equal, we can demand that in detail the accept probability satisfies

$\begin{eqnarray}&&P({\theta }^{2},{{\boldsymbol{s}}}^{2}| {\boldsymbol{d}})w({\theta }^{1}| {\theta }^{2},{{\boldsymbol{s}}}^{2},{\boldsymbol{d}})\ A[{\theta }^{1},{{\boldsymbol{s}}}^{1}| {\theta }^{2},{{\boldsymbol{s}}}^{2}]\\ &&\quad =A[{\theta }^{2},{{\boldsymbol{s}}}^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1}]\ \left(\displaystyle \frac{1}{| {\boldsymbol{F}}({\theta }^{2},{\theta }^{1})| }\right)w\\ &&\qquad \times ({\theta }^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1},{\boldsymbol{d}})P({\theta }^{1},{{\boldsymbol{s}}}^{1}| {\boldsymbol{d}}),\end{eqnarray} \tag{ 33 }$

which equivalently leads to the usual rule $A({\theta }^{2},{{\boldsymbol{s}}}^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1})=\mathrm{min}[1,R({\theta }^{2},{{\boldsymbol{s}}}^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1})]$ with the ratio defined as

$\begin{eqnarray}R({\theta }^{2},{{\boldsymbol{s}}}^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1}) & = & \left(\displaystyle \frac{P({\theta }^{2},{{\boldsymbol{s}}}^{2}| {\boldsymbol{d}})}{P({\theta }^{1},{{\boldsymbol{s}}}^{1}| {\boldsymbol{d}})}\right)\left(\displaystyle \frac{| {\boldsymbol{S}}({\theta }^{2}){| }^{1/2}}{| {\boldsymbol{S}}({\theta }^{1}){| }^{1/2}}\right)\\ & & \times \left(\displaystyle \frac{w({\theta }^{1}| {\theta }^{2},{{\boldsymbol{s}}}^{2},{\boldsymbol{d}})}{w({\theta }^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1},{\boldsymbol{d}})}\right).\end{eqnarray} \tag{ 34 }$

Explicitly substituting the form of the joint posterior $P(\theta ,{\boldsymbol{s}}| {\boldsymbol{d}})$ , and using the fact that ${\hat{{\boldsymbol{f}}}}^{1}{{\boldsymbol{S}}}^{-1}({\theta }^{1}){\hat{{\boldsymbol{f}}}}^{1}={\hat{{\boldsymbol{f}}}}^{2}{{\boldsymbol{S}}}^{-1}({\theta }^{2}){\hat{{\boldsymbol{f}}}}^{2}$ , we have

$\begin{eqnarray}R({\theta }^{2},{{\boldsymbol{s}}}^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1}) & = & \left(\displaystyle \frac{P({\theta }^{2})}{P({\theta }^{1})}\right)\left(\displaystyle \frac{{e}^{-\frac{1}{2}{\chi }^{2}[{\hat{{\boldsymbol{s}}}}^{2}]-\frac{1}{2}{\hat{{\boldsymbol{s}}}}^{2}{{\boldsymbol{S}}}^{-1}({\theta }^{2}){\hat{{\boldsymbol{s}}}}^{2}}}{{e}^{-\frac{1}{2}{\chi }^{2}[{\hat{{\boldsymbol{s}}}}^{1}]-\frac{1}{2}{\hat{{\boldsymbol{s}}}}^{1}{{\boldsymbol{S}}}^{-1}({\theta }^{1}){\hat{{\boldsymbol{s}}}}^{1}}}\right)\\ & & \times \left(\displaystyle \frac{{e}^{-\frac{1}{2}{\hat{{\boldsymbol{f}}}}^{2}{{\boldsymbol{A}}}^{t}{{\boldsymbol{N}}}^{-1}{\boldsymbol{A}}{\hat{{\boldsymbol{f}}}}^{2}}}{{e}^{-\frac{1}{2}{\hat{{\boldsymbol{f}}}}^{1}{{\boldsymbol{A}}}^{t}{{\boldsymbol{N}}}^{-1}{\boldsymbol{A}}{\hat{{\boldsymbol{f}}}}^{1}}}\right)\left(\displaystyle \frac{w({\theta }^{1}| {\theta }^{2},{{\boldsymbol{s}}}^{2},{\boldsymbol{d}})}{w({\theta }^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1},{\boldsymbol{d}})}\right),\end{eqnarray} \tag{ 35 }$

where ${\chi }^{2}[{\hat{{\boldsymbol{s}}}}^{2}]={({\boldsymbol{d}}-{\boldsymbol{A}}\hat{{\boldsymbol{s}}})}^{t}{{\boldsymbol{N}}}^{-1}({\boldsymbol{d}}-{\boldsymbol{A}}\hat{{\boldsymbol{s}}})$ . Above and from now on, we drop the ${(.)}^{t}$ notation for the transpose vectors. For the choice of a parameter proposal matrix that is independent of the CMB map and data and symmetric (for example $-\mathrm{log}w({\theta }^{i+1}| {\theta }^{i},{{\boldsymbol{s}}}^{i},{\boldsymbol{d}})$ $=\;-\mathrm{log}w({\theta }^{i+1}| {\theta }^{i})$ $\sim ({\theta }^{i+1}-{\theta }^{i}){{\boldsymbol{C}}}_{{\boldsymbol{\theta }}}^{-1}({\theta }^{i+1}-{\theta }^{i})$ ) the ratio of parameter proposals drops out and we are left with

$\begin{eqnarray}R({\theta }^{2},{{\boldsymbol{s}}}^{2}| {\theta }^{1},{{\boldsymbol{s}}}^{1}) & = & \left(\displaystyle \frac{P({\theta }^{2})}{P({\theta }^{1})}\right)\left(\displaystyle \frac{{e}^{-\frac{1}{2}{\chi }^{2}[{\hat{{\boldsymbol{s}}}}^{2}]-\frac{1}{2}{\hat{{\boldsymbol{s}}}}^{2}{{\boldsymbol{S}}}^{-1}({\theta }^{2}){\hat{{\boldsymbol{s}}}}^{2}}}{{e}^{-\frac{1}{2}{\chi }^{2}[{\hat{{\boldsymbol{s}}}}^{1}]-\frac{1}{2}{\hat{{\boldsymbol{s}}}}^{1}{{\boldsymbol{S}}}^{-1}({\theta }^{1}){\hat{{\boldsymbol{s}}}}^{1}}}\right)\\ & & \times \left(\displaystyle \frac{{e}^{-\frac{1}{2}{\hat{{\boldsymbol{f}}}}^{2}{{\boldsymbol{A}}}^{t}{{\boldsymbol{N}}}^{-1}{\boldsymbol{A}}{\hat{{\boldsymbol{f}}}}^{2}}}{{e}^{-\frac{1}{2}{\hat{{\boldsymbol{f}}}}^{1}{{\boldsymbol{A}}}^{t}{{\boldsymbol{N}}}^{-1}{\boldsymbol{A}}{\hat{{\boldsymbol{f}}}}^{1}}}\right).\end{eqnarray} \tag{ 36 }$

A.3. Generalization to Unity Accept Transitions versus Computational Expense

We note here the interesting generalization of the scheme outlined above which uses deterministic proposals for the CMB signal maps and which leads to unity accept proposals.

We look for a filtering operation ${\boldsymbol{F}}({\theta }^{i+1},{\theta }^{i})$ that leaves invariant

$\begin{eqnarray}&&{\hat{{\boldsymbol{f}}}}^{i+1}({{\boldsymbol{N}}}^{-1}+{{\boldsymbol{S}}}^{-1}({\theta }^{i+1})){\hat{{\boldsymbol{f}}}}^{i+1}={\hat{{\boldsymbol{f}}}}^{i}({{\boldsymbol{N}}}^{-1}+{{\boldsymbol{S}}}^{-1}({\theta }^{i})){\hat{{\boldsymbol{f}}}}^{i}.\end{eqnarray} \tag{ 37 }$

We can therefore set

$\begin{eqnarray}&&{\boldsymbol{F}}({\theta }^{i+1},{\theta }^{i})={({{\boldsymbol{N}}}^{-1}+{{\boldsymbol{S}}}^{-1}({\theta }^{i+1}))}^{-1/2}{({{\boldsymbol{N}}}^{-1}+{{\boldsymbol{S}}}^{-1}({\theta }^{i}))}^{+1/2}.\end{eqnarray} \tag{ 38 }$

This has the desired invariance of the fluctuation map, as well as satisfying the condition required for detailed balance, ${{\boldsymbol{F}}}^{-1}({\theta }^{i+1},{\theta }^{i})={\boldsymbol{F}}({\theta }^{i},{\theta }^{i+1})$ . The determinant factor appearing in the ratio for the accept probability is

$\begin{eqnarray}| {\boldsymbol{F}}({\theta }^{i+1},{\theta }^{i})| & = & \displaystyle \frac{{| {{\boldsymbol{N}}}^{-1}+{{\boldsymbol{S}}}^{-1}({\theta }^{i+1})| }^{-1/2}}{{| {{\boldsymbol{N}}}^{-1}+{{\boldsymbol{S}}}^{-1}({\theta }^{i})| }^{-1/2}}\\ & = & \left(\displaystyle \frac{| {\boldsymbol{S}}({\theta }^{i+1}){| }^{1/2}}{| {\boldsymbol{S}}({\theta }^{i}){| }^{1/2}}\right)\left(\displaystyle \frac{{| {\boldsymbol{N}}+{\boldsymbol{S}}({\theta }^{i})| }^{1/2}}{{| {\boldsymbol{N}}+{\boldsymbol{S}}({\theta }^{i+1})| }^{1/2}}\right)\end{eqnarray} \tag{ 39 }$

(with generalizations for the case when ${{\boldsymbol{N}}}^{-1}$ is singular). The significance of the above is the following—everything in the accept probability ratio R cancels if the proposals for the cosmological parameters are from the "exact" form as discussed in this paper.

While it may appear somewhat paradoxical to consider proposals in cosmological parameters from the exact posterior marginal (therefore making MCMC unnecessary), the idea provides insight into the correctness of the algorithm as well as intuition if approximations to the inverse noise can be made. A well-known approximation to the functional form of the likelihood in CMB analysis has been to take only the diagonal elements in a spherical harmonic basis. The above comments simply show that in the limit that approximations to the noise converge to the true instrument noise (and we can generate proposals to parameters from the approximate posterior) we will have high accept probabilities.

COSMOLOGICAL PARAMETERS FROM CMB MAPS WITHOUT LIKELIHOOD APPROXIMATION

Article metrics

Permissions

Author e-mails

Author affiliations

Dates

ABSTRACT

1. INTRODUCTION

2. INTUITION AND MOTIVATION

3. ALGORITHMS

3.1. Gibbs Sampling

3.2. Joint Sampling by Metropolis–Hastings MCMC

3.3. Brute-force Direct Sampling

4. VALIDATION AND BENCHMARKS

5. CONCLUSIONS

APPENDIX: TECHNICAL DETAILS OF THE PARAMETER SAMPLING MCMC STEP

A.1. Joint Proposal for Cosmological Parameters and CMB Signal Map

A.2. Functional Form of the Accept Probability

A.3. Generalization to Unity Accept Transitions versus Computational Expense

Footnotes

COSMOLOGICAL PARAMETERS FROM CMB MAPS WITHOUT LIKELIHOOD APPROXIMATION

Article metrics

Permissions

Share this article

Author e-mails

Author affiliations

Dates

ABSTRACT

1. INTRODUCTION

2. INTUITION AND MOTIVATION

3. ALGORITHMS

3.1. Gibbs Sampling

3.2. Joint Sampling by Metropolis–Hastings MCMC

3.3. Brute-force Direct Sampling

4. VALIDATION AND BENCHMARKS

5. CONCLUSIONS

APPENDIX: TECHNICAL DETAILS OF THE PARAMETER SAMPLING MCMC STEP

A.1. Joint Proposal for Cosmological Parameters and CMB Signal Map

A.2. Functional Form of the Accept Probability

A.3. Generalization to Unity Accept Transitions versus Computational Expense

Footnotes