BeyondPlanck XII. Cosmological parameter constraints with end-to-end error propagation

We present cosmological parameter constraints estimated with the Bayesian BeyondPlanck (BP) analysis framework. This framework supports seamless end-to-end error propagation from raw time-ordered data to final cosmological parameters. As a first demonstration of the method, we analyze time-ordered Planck LFI observations, combined with selected external data (WMAP 33-61 GHz, Planck HFI DR4 353 and 857 GHz, and Haslam 408 MHz) in the form of pixelized maps that are used to break critical astrophysical degeneracies. Overall, our results are in good agreement with previously reported values from Planck 2018 and WMAP, with the largest relative difference for any parameter amounting to about 1 σ when considering only temperature multipoles in the range 30 ≤ ℓ ≤ 600. Where differences do arise, the BP results are generally slightly closer to the high-ℓ HFI-dominated Planck 2018 results than previous analyses, suggesting slightly less tension between low and high multipoles. Using low-ℓ polarization information from LFI and WMAP, we find a best-fit value of τ = 0.066 ± 0.013, which is higher than the value of τ = 0.051 ± 0.006 derived from Planck 2018 and slightly lower than the value of 0.069 ± 0.011 derived from a joint analysis of official LFI and WMAP products. Most importantly, we find that the uncertainty derived in the BP processing is about 30 % larger than that obtained from the official products, after taking into account the different sky coverage. We argue that this reflects marginalization over a more complete model of instrumental and astrophysical parameters, which results in both more reliable and more rigorously defined uncertainties. We also find that about 2000 Monte Carlo samples are required to achieve robust convergence for a low-resolution CMB covariance matrix with 225 independent modes.


Introduction
The cosmic microwave background (CMB) represents one of the most powerful probes of cosmology available today, as small variations in the intensity and polarization of this radiation impose strong constraints on cosmological structure formation processes in the early universe. The first discovery of these fluctuations was made by Smoot et al. (1992), and during the last three decades massive efforts have been spent on producing detailed maps with steadily increasing sensitivity and precision (e.g., Bennett et al. 2013; de Bernardis et al. 2000; Louis et al. 2017; Sievers et al. 2013; Ogburn et al. 2010; Planck Collaboration I 2020, and references therein). These measurements have led to a spectacularly successful cosmological concordance model called ΛCDM that posits that the universe was created during a hot Big Bang about 13.8 billion years ago; that it was seeded by Gaussian random density fluctuations during a brief period of exponential expansion called inflation; and that it consists of about 5 % baryonic matter, 30 % dark matter, and 65 % dark energy. This model is able to describe a host of cosmological observables with exquisite precision (see, e.g., Planck Collaboration VI 2020), even though it leaves much to be desired in terms of theoretical understanding. Indeed, some of the biggest questions in modern cosmology revolve around understanding the physical nature of inflation, dark matter and dark energy, and billions of dollars and euros are spent on these questions. CMB observations play a key role in all these studies.
The current state-of-the-art in terms of full-sky CMB observations is defined by ESA's Planck satellite mission (Planck Collaboration I 2014), which observed the microwave sky in nine frequency bands, ranging from 30 to 857 GHz, between 2009 and 2013. These measurements imposed strong constraints on the ΛCDM model, combining information from temperature and polarization CMB maps with novel gravitational lensing reconstructions (Planck Collaboration VI 2020). While the Planck instrument stopped collecting data in 2013, the final official Planck data release took place as recently as 2020 (Planck Collaboration Int. LVII 2020), and this clearly testifies to the significant data analysis challenges associated with these types of data. Large-scale polarization reconstruction represents a particularly difficult problem, and massive amounts of effort have been spent aiming to control all significant systematic uncertainties (e.g., Planck Collaboration Int. LVII 2020; Delouis et al. 2019).
The next major scientific endeavor for the CMB community is the search for primordial gravitational waves created during the inflationary epoch (e.g., Kamionkowski & Kovetz 2016). Current theories predict that such gravitational waves should imprint large-scale B-mode polarization in the CMB anisotropies, with a map-domain amplitude no larger than a few tens of nanokelvin on degree angular scales. Detecting such a faint signal requires at least one or two orders of magnitude higher sensitivity than Planck, and correspondingly more stringent systematics suppression and uncertainty assessment.
The main operational goal of the BeyondPlanck project (BeyondPlanck 2022) is to translate some of the main lessons learned from Planck in terms of systematics mitigation into practical computer code that can be used for next-generation B-mode experiment analysis. Among the most important lessons learned from Planck in this respect is the tight connection between instrument characterization and astrophysical component separation. Because any CMB satellite experiment must in practice be calibrated with in-flight observations of astrophysical sources, the calibration is in practice limited by our knowledge of the astrophysical sources in question, and this knowledge must itself be derived from the same data set. Instrument calibration and component separation must therefore be performed jointly, and a non-negligible fraction of the full uncertainty budget arises from degeneracies between the two.
The BeyondPlanck project addresses this challenge by constructing a complete end-to-end analysis pipeline for CMB observations within one integrated framework that does not require intermediate human intervention. This is the first complete approach to support seamless end-to-end error propagation for CMB applications, including full marginalization over both instrumental and astrophysical uncertainties and their internal degeneracies; see BeyondPlanck (2022) and Colombo et al. (2022) for further discussion.
For pragmatic reasons, the current BeyondPlanck pipeline has so far only been applied to the Planck LFI observations, which have significantly lower signal-to-noise ratio than the Planck HFI observations. The cosmological parameter constraints derived in the following are therefore not by themselves competitive in terms of absolute uncertainties as compared with already published Planck constraints. Rather, the present analysis focuses primarily on general algorithmic aspects, and represents a first real-world demonstration of the end-to-end Bayesian framework that will serve as a platform for further development and data integration of different experiments (Gerakakis et al. 2022).
Noting the sensitivity of large-scale polarization reconstruction to systematic uncertainties, we adopt the reionization optical depth τ as a particularly important probe of the stability and performance of the BeyondPlanck framework, and aim to estimate P(τ | d) from Planck LFI and WMAP observations. We also constrain a basic six-parameter ΛCDM model, combining the BeyondPlanck low-ℓ likelihood with a high-ℓ Blackwell-Rao CMB temperature likelihood that for the first time covers the first two acoustic peaks, or ℓ ≤ 600. We eventually also complement this with the Planck high-ℓ likelihood to extend the multipole range to the full Planck resolution, as well as with selected external non-CMB data sets.
The structure of the rest of the paper is as follows: In Sect. 2 we review the global BeyondPlanck data model, posterior distribution, and CMB likelihood, focusing in particular on how cosmological parameters are constrained in this framework. In Sect. 3 we present ΛCDM parameter constraints from BeyondPlanck alone and combined with the Planck high-ℓ likelihood. In Sect. 4 we assess the impact of systematic uncertainties, adopting τ as a reference parameter. In Sect. 5, we provide an assessment of the Monte Carlo convergence of CMB samples. Finally, we summarize our main conclusions in Sect. 6.

Cosmological parameters and BeyondPlanck
We start by introducing the global BeyondPlanck data model in order to show how it couples to cosmological parameters through the Gibbs loop; for a detailed discussion, we refer the interested reader to BeyondPlanck (2022) and references therein. Explicitly, the BeyondPlanck time-ordered data model reads

d_{j,t} = g_{j,t} P_{tp,j} [ B_{pp',j} \sum_c M_{cj}(\beta_{p'}, \Delta_{bp}^j) a_{p'}^c + s_{j,t}^{orb} + s_{j,t}^{fsl} ] + s_{j,t}^{1Hz} + n_{j,t}^{corr} + n_{j,t}^{w},   (1)

where j indicates radiometer; t and p denote time sample and pixel on the sky, respectively; and c refers to a given astrophysical signal component. Further, d_{j,t} denotes the measured data value in units of V; g_{j,t} denotes the instrumental gain in units of V K^{-1}_cmb; P_{tp,j} is the N_TOD × 3N_pix pointing matrix, where ψ is the polarization angle of the respective detector with respect to the local meridian; B_{pp',j} denotes beam convolution; M_{cj}(β_{p'}, Δ_{bp}^j) denotes element (j, c) of an N_det × N_comp mixing matrix, describing the amplitude of component c, as seen by radiometer j, relative to some reference frequency j_0; a_p^c is the amplitude of component c in pixel p, measured at the same reference frequency as the mixing matrix M, and expressed in brightness temperature units; s^orb_{j,t} is the orbital CMB dipole signal in units of K_cmb, including relativistic quadrupole corrections; s^fsl_{j,t} denotes the contribution from far sidelobes, also in units of K_cmb; s^1Hz_{j,t} accounts for electronic interference with a 1 Hz period; n^corr_{j,t} denotes correlated instrumental noise; and n^w_{j,t} is uncorrelated (white) noise. The free parameters in this equation are {g, Δ_bp, n^corr, a^c, β}. All the other quantities are either provided as intrinsic parts of the original data sets, or given as deterministic functions of already available parameters. In addition to the parameters explicitly defined by Eq.
(1), we include a set of hyper-parameters for each free stochastic random field in the model. For instance, for the CMB component map, a_CMB, we define a covariance matrix S, which is taken to be isotropic. Expanding the map into spherical harmonics, a_p = \sum_{ℓm} a_{ℓm} Y_{ℓm}(p), its covariance matrix may be written as

S_{ℓm,ℓ'm'} = ⟨a_{ℓm} a*_{ℓ'm'}⟩ = C_ℓ δ_{ℓℓ'} δ_{mm'},

where C_ℓ is called the angular power spectrum. This function is itself a stochastic field to be included in the model and fitted to the data, and, indeed, the angular CMB power spectrum is one of the most important scientific targets of the entire analysis. We note that this spectral-domain covariance matrix approach applies not only to astrophysical components, but also to instrumental stochastic fields, such as correlated noise (Ihle et al. 2022) and time-dependent gain fluctuations (Gjerløw et al. 2022). In many cases, the power spectrum may be further modelled in terms of a smaller set of free parameters, ξ, defined through some deterministic function, C_ℓ(ξ). For the CMB case, ξ is nothing but the set of cosmological parameters, and the function C_ℓ(ξ) is evaluated using a standard cosmological Boltzmann solver, for instance as implemented in the CAMB code (Lewis et al. 2000). If we now define the full set of free parameters in the data model as ω ≡ {g, Δ_bp, n^corr, a^c, β, C_ℓ(ξ)}, the goal of the current paper is to derive an estimate of the cosmological parameter posterior distribution P(ξ | d), marginalized over all relevant astrophysical and instrumental parameters. In practice, this marginalization is performed by first mapping out the full joint posterior, P(ω | d), as a function of C_ℓ through Monte Carlo sampling, then deriving a C_ℓ-based CMB power spectrum likelihood from the resulting power spectrum samples, and finally mapping out this likelihood with respect to cosmological parameters using the well-established CosmoMC (Lewis & Bridle 2002) code. We describe in Table 1 the cosmological parameters included in our analysis.
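The role of C_ℓ as the harmonic-space covariance can be illustrated with a minimal numpy sketch (the input spectrum below is an arbitrary toy choice, not a physical model): the 2ℓ+1 independent modes at each multipole are drawn from N(0, C_ℓ), and the realization spectrum σ_ℓ scatters about C_ℓ with cosmic variance.

```python
import numpy as np

rng = np.random.default_rng(1)
lmax = 64
ell = np.arange(2, lmax + 1)
C_theory = 1000.0 / (ell * (ell + 1.0))       # arbitrary toy spectrum, not a physical model

# Isotropy means <a_lm a*_l'm'> = C_l delta_ll' delta_mm': the 2l+1 real modes
# at multipole l are independent N(0, C_l) draws.
sigma = np.empty_like(C_theory)
for i, (l, C) in enumerate(zip(ell, C_theory)):
    a_lm = rng.normal(scale=np.sqrt(C), size=2 * l + 1)
    sigma[i] = np.sum(a_lm**2) / (2 * l + 1)  # realization spectrum sigma_l

# sigma_l scatters about C_l with relative cosmic variance ~ sqrt(2/(2l+1))
rel = sigma / C_theory
print(rel.mean())
```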
The rest of this section details the steps involved in establishing the CMB likelihood used in this final step.
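Before turning to those steps, the structure of the data model in Eq. (1) can be illustrated with a toy numpy simulation for a single radiometer, keeping only gain, pointing, mixing, and white noise; all dimensions and numerical values here are hypothetical, not LFI specifications.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy version of the data model structure: d = g * P @ (M @ a) + n_w
npix, ntod, ncomp = 12, 1000, 2

a = rng.normal(size=(ncomp, npix))            # component amplitude maps a^c
M = np.array([[1.0, 0.35]])                   # 1 x ncomp mixing-matrix row for this radiometer
sky = (M @ a).ravel()                         # sky signal per pixel at this frequency

pix = rng.integers(0, npix, size=ntod)        # pointing: pixel observed at each time sample
P = np.zeros((ntod, npix))
P[np.arange(ntod), pix] = 1.0                 # pointing matrix (temperature only)

g = 0.07                                      # gain in V/K (hypothetical value)
n_w = rng.normal(scale=1e-3, size=ntod)       # white noise in V
d = g * (P @ sky) + n_w                       # simulated TOD in V

# Binned map-making: average the calibrated samples falling in each pixel,
# i.e. m = (P^T P)^{-1} P^T (d / g) for this diagonal P^T P.
hits = P.sum(axis=0)
m = (P.T @ (d / g)) / hits
print(np.abs(m - sky).max())                  # small white-noise residual
```

In the real pipeline the free parameters (gain, bandpass, noise, amplitudes, spectral indices) are of course sampled rather than assumed known; this sketch only shows how the terms of Eq. (1) compose.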

The BeyondPlanck posterior distribution and Gibbs sampler
In order to sample from the full global posterior, P(ω | d), we start with Bayes' theorem,

P(ω | d) = P(d | ω) P(ω) / P(d),

where P(d | ω) ≡ L(ω) is called the likelihood, P(ω) is called the prior, and P(d) is a normalization factor usually referred to as the "evidence". Since the latter is independent of ω, we ignore this factor in the following. The exact form of the likelihood is defined by the data model in Eq. (1), which is given as a linear sum of various components, all of which are specified in terms of our free parameters ω. The only term that is not deterministically defined by ω is the white noise, n^w, but this is instead assumed to be Gaussian distributed with zero mean and covariance N_w. We can therefore write d = s_tot(ω) + n^w, where s_tot(ω) is the sum of all model components in Eq. (1), irrespective of their origin, and therefore d − s_tot(ω) ~ N(0, N_w), where N(µ, Σ) denotes a multivariate Gaussian distribution with mean µ and covariance Σ. Thus, the likelihood takes the following form,

L(ω) ∝ exp( −(d − s_tot(ω))^T N_w^{−1} (d − s_tot(ω)) / 2 ).

The priors are less well defined, and the current BeyondPlanck processing uses a mixture of algorithmic regularization priors (e.g., enforcing foreground smoothness on small angular scales; Andersen et al. 2022), instrument priors (e.g., Gaussian or log-normal priors on the correlated noise spectral parameters; Ihle et al. 2022), and informative astrophysical priors (e.g., AME and free-free amplitude priors; Andersen et al. 2022; Colombo et al. 2022). A full summary of all active priors is provided in Sect. 8 of BeyondPlanck (2022).
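For a diagonal white-noise covariance, this likelihood reduces to a simple χ² sum over samples; a minimal sketch with a hypothetical model vector standing in for s_tot(ω):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
s_tot = np.sin(np.linspace(0.0, 20.0, n))     # hypothetical stand-in for s_tot(omega)
N_w = np.full(n, 0.3**2)                      # diagonal white-noise covariance
d = s_tot + rng.normal(scale=0.3, size=n)     # data: signal plus white noise

def minus2lnL(model, data, var):
    """-2 ln L for d - s_tot ~ N(0, N_w), dropping the omega-independent normalization."""
    r = data - model
    return np.sum(r**2 / var)

chi2_true = minus2lnL(s_tot, d, N_w)
chi2_wrong = minus2lnL(np.zeros(n), d, N_w)
assert chi2_true < chi2_wrong                 # the true model fits the data better
print(chi2_true / n)                          # roughly 1 per data point
```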
Since the main topic of this paper is cosmological parameter estimation, we summarize here only the CMB amplitude and power spectrum sampling steps, as defined by Eqs. (10) and (11), and refer the interested reader to BeyondPlanck (2022) and references therein for discussions regarding the other steps.
As shown by Jewell et al. (2004) and Wandelt et al. (2004), the amplitude distribution P(a | d, ω \ a), i.e., the probability of a given the data d and all the model parameters except a, is a multivariate Gaussian with a mean given by the so-called Wiener filter and an inverse covariance matrix given by S(C_ℓ)^{−1} + N^{−1}, where S(C_ℓ) and N now are the total effective signal and noise covariance matrices, respectively. Samples from this distribution may be drawn by solving the following system of linear equations, typically using the Conjugate Gradient method (Shewchuk 1994),

( S^{−1} + \sum_ν M_ν^T N_ν^{−1} M_ν ) a = \sum_ν M_ν^T N_ν^{−1} d_ν + S^{−1/2} η_0 + \sum_ν M_ν^T N_ν^{−1/2} η_ν.   (12)

In this expression, M_ν is called the mixing matrix, and encodes the instrument-convolved spectral energy densities of each astrophysical foreground component, and the η_i's are independent random Gaussian vectors of N(0, 1) variates. For further details on solving this equation, see Eriksen et al. (2008); Seljebotn et al. (2019); BeyondPlanck (2022); Colombo et al. (2022). Sampling from P(C_ℓ | d, ω \ C_ℓ) is much simpler, as this is an inverse gamma distribution with 2ℓ − 1 degrees of freedom for CMB temperature measurements (Wandelt et al. 2004) and a corresponding Wishart distribution for CMB polarization (Larson et al. 2007). The standard sampling algorithm for the former of these is simply to draw 2ℓ − 1 random variates from a standard Gaussian distribution, η_i ∼ N(0, 1), and set

C_ℓ = (2ℓ + 1) σ_ℓ / \sum_{i=1}^{2ℓ−1} η_i²,

where σ_ℓ is the observed power spectrum of the current sky-map sample. The generalization to polarization is straightforward.
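The scalar C_ℓ sampling step can be sketched directly; this is a toy single-multipole demonstration in which σ_ℓ is set to an arbitrary value rather than computed from a map sample.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_Cl(sigma_l, l, rng):
    """Draw C_l from the inverse-gamma conditional P(C_l | sigma_l) for temperature:
    draw 2l-1 standard normals eta and set C_l = (2l+1) sigma_l / sum(eta^2)."""
    eta = rng.normal(size=2 * l - 1)
    return (2 * l + 1) * sigma_l / np.sum(eta**2)

l, sigma_l = 50, 1.0                          # toy inputs
samples = np.array([sample_Cl(sigma_l, l, rng) for _ in range(20000)])
print(samples.mean())                          # close to the analytic mean (2l+1)/(2l-3) * sigma_l
```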
The above only represents a formal summary of the Gibbs algorithm, and in practice we introduce a few important modifications for computational and robustness reasons. The first modification revolves around Galactic plane masking. As shown by Colombo et al. (2022), the BeyondPlanck CMB reconstruction is not perfect along the Galactic plane. To prevent these errors from contaminating the CMB power spectrum and cosmological parameters, we therefore apply a fairly large confidence mask for the actual CMB analysis. At the same time, the Galactic plane does contain invaluable information regarding important global instrumental parameters, for instance the detector bandpasses (Svalheim et al. 2022a), and excluding these data entirely from the analysis would greatly increase the uncertainties on those parameters. For this reason, we run the analysis in two main stages. We first run the above algorithm without a Galactic mask and with S^{−1}_CMB = 0, primarily to estimate the instrumental and astrophysical parameters; this configuration corresponds to estimating the CMB component independently in each pixel, without applying any smoothness prior over the full sky. The cost of setting the power spectrum prior to zero is slightly larger pixel uncertainties than in the optimal case, as the CMB field is now allowed to vary almost independently from pixel to pixel. However, this also ensures that any potential modelling errors remain local, and are not spread across the sky.
Then, once this main sampling process is done, we resample the original chains with respect to the CMB component by looping through each main sample, fixing all instrumental and (most of the) astrophysical parameters, and sampling the CMB-related parameters again; see Colombo et al. (2022) for full details. For low-resolution CMB polarization estimation, for which our likelihood relies on a dense pixel-pixel covariance matrix, the main goal of this stage is simply to obtain more samples of the same type as above, to reduce the Monte Carlo uncertainty in the noise covariance matrix (Sellentin & Heavens 2016). In this case, we therefore simply draw n additional samples from Eq. (10), fixing both instrumental and astrophysical parameters, as well as the CMB a_{ℓm}'s for ℓ > 64. We thereby effectively map out the local conditional distribution with respect to white noise for each main sample on large angular scales. We conservatively draw n = 50 new samples per main sample in this step, although checks performed after the analysis was started indicate that as few as 10 sub-samples achieve an equivalent effect. In any case, since the cost of producing one of these sub-samples is almost two orders of magnitude smaller than producing a full sample, this additional cost is negligible.
This approach is not suitable for high-resolution CMB temperature analysis, since we cannot construct a pixel-pixel covariance matrix with millions of pixels. In this case, we instead use the Gaussianized Blackwell-Rao estimator (Chu et al. 2005; Rudjord et al. 2009), which was also used for CMB temperature analysis up to ℓ ≤ 30 by Planck (e.g., Planck Collaboration V 2020). This estimator relies on proper C_ℓ samples, and we therefore resample the main chains once again, but this time we apply the confidence mask and enable the C_ℓ sampling step; once again, all instrumental and (most of) the astrophysical parameters are fixed at their main chain sample values. Thus, this step includes solving Eq. (12) with an inverse noise covariance matrix that is zero in the masked pixels and a non-local S covariance matrix, and this translates into a very high condition number for the coefficient matrix on the left-hand side (Seljebotn et al. 2019). In fact, the computational cost of a single CMB temperature power spectrum sample is comparable to the cost of a full main sample, and we therefore only produce 4000 of these. Fortunately, as shown in Sect. 5, this is sufficient to achieve good convergence up to ℓ ≈ 700. However, it does not allow us to explore the low signal-to-noise regime above ℓ ≈ 800. For this reason, we conservatively limit the current BeyondPlanck temperature analysis to ℓ ≤ 600, leaving some buffer, and combine with Planck 2018 results at higher multipoles when necessary.

The BeyondPlanck CMB likelihood
The BeyondPlanck CMB power spectrum likelihood is based on two well-established techniques, namely brute-force low-resolution likelihood evaluation on large angular scales for polarization (e.g., Page et al. 2007; Planck Collaboration V 2020), and Blackwell-Rao (BR) estimation for intermediate angular scales for temperature (Chu et al. 2005; Rudjord et al. 2009; Planck Collaboration XI 2016). The main variations are that we employ the signal-to-noise eigenmode compression technique described by Tegmark et al. (1997) and Gjerløw et al. (2015) for the low-resolution likelihood (to reduce the dimensionality of the covariance matrix, and therefore the number of Gibbs samples required for convergence), and that we are now able to use the BR estimator to ℓ ≤ 600, not only ℓ ≤ 200, as was done in Planck 2018. The main reason for this is that in the current scheme the CMB sky map samples are drawn from foreground-subtracted frequency maps (30, 44, 70 GHz, etc.), each with a well-defined white noise term, while in the Planck analysis they were generated from component-separated CMB maps (Commander, NILC, SEVEM, and SMICA; Planck Collaboration IV 2018) with smoothed white noise terms. In this section, we briefly review the mathematical background for each of these two likelihood approximations, and refer the interested reader to the already mentioned papers for further details.

Low-ℓ temperature+polarization likelihood
Starting with the low-resolution case, the appropriate expression for a multivariate Gaussian likelihood reads

L(C_ℓ) ∝ exp( −ŝ_CMB^T (S(C_ℓ) + N)^{−1} ŝ_CMB / 2 ) / \sqrt{|S(C_ℓ) + N|},

where ŝ_CMB represents a CMB-plus-noise map and N is its corresponding effective noise covariance matrix. This expression has formed the basis of numerous exact CMB likelihood codes, going at least as far back as COBE-DMR (e.g., Gorski 1994).
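Numerically, a Gaussian likelihood of this form is typically evaluated through a Cholesky factorization rather than an explicit matrix inverse; a sketch with hypothetical S and N matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100

# Hypothetical signal and noise covariances for a compressed low-resolution data vector
S = np.diag(np.linspace(2.0, 0.5, n))
A = rng.normal(size=(n, n))
Nw = 0.1 * (A @ A.T) / n + 0.05 * np.eye(n)   # positive-definite noise covariance

C = S + Nw
s_hat = np.linalg.cholesky(C) @ rng.normal(size=n)   # one realization of s_CMB plus noise

def minus2lnL(cov, x):
    """-2 ln L = x^T cov^{-1} x + ln|cov|, evaluated stably via a Cholesky solve."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return z @ z + logdet

print(minus2lnL(C, s_hat))
```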
The key novel aspect of the current analysis is simply how ŝ_CMB and N are established. In previous analyses, ŝ_CMB has typically been estimated by maximum-likelihood techniques, while N has been estimated through analytic evaluations that are only able to take into account a rather limited set of uncertainties, such as white and correlated noise; a very simplified template-based foreground model; and simple instrumental models of modes that have poorly measured gains as a consequence of the scanning strategy. In contrast, in the current paper both these quantities are estimated simply by averaging over all available Gibbs samples,

ŝ_CMB = ⟨s_CMB⟩,   N = ⟨(s_CMB − ŝ_CMB)(s_CMB − ŝ_CMB)^T⟩,

where brackets indicate Monte Carlo averages. Thus, in this approach there is no need to explicitly account for each individual source of systematic effects in the covariance matrix; they are all naturally and seamlessly accounted for through the Markov chain samples.
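In code, these Monte Carlo averages are just a sample mean and sample covariance over the chain; the toy samples below stand in for real Gibbs output, with a fixed underlying map plus per-sample scatter.

```python
import numpy as np

rng = np.random.default_rng(5)
n_pix, n_samp = 50, 2000

# Toy Gibbs samples of the CMB map (in a real chain the per-sample scatter
# encodes noise, gain, bandpass, and foreground variations, among others)
true_map = rng.normal(size=n_pix)
samples = true_map + 0.2 * rng.normal(size=(n_samp, n_pix))

s_hat = samples.mean(axis=0)                        # <s_CMB>
resid = samples - s_hat
N = resid.T @ resid / (n_samp - 1)                  # <(s - s_hat)(s - s_hat)^T>

# Whatever scatter is present in the chains ends up in N automatically.
print(np.diag(N).mean())                             # close to the injected 0.2^2 = 0.04
```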
The main challenge associated with this approach regards how many samples are required for N to actually converge. As discussed by Sellentin & Heavens (2016), a general requirement is that n_samp ≫ n_mode, where n_samp is the number of Monte Carlo samples and n_mode is the number of modes in the covariance matrix. To establish a robust covariance matrix, one may therefore either increase n_samp (at the cost of increased computational resources) or decrease n_mode (at the cost of increased final uncertainties). It is therefore of great interest to compress the relevant information in ŝ_CMB into a minimal set of modes that capture as much of the relevant information as possible. In our case, the main cosmological target for the low-resolution likelihood is the optical depth of reionization, τ, and the main impact of this parameter on the C_ℓ power spectrum for Planck LFI happens in polarization at very low multipoles, ℓ ≲ 6-8, due to the limited sensitivity of the instrument (Planck Collaboration V 2020).
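The Sellentin & Heavens requirement can be seen directly: with fewer samples than modes the sample covariance is rank deficient, hence singular and unusable in a Gaussian likelihood. A small numpy demonstration with the 225 modes used here:

```python
import numpy as np

rng = np.random.default_rng(6)
n_mode = 225                                   # number of compressed modes used in the text

def sample_cov_rank(n_samp):
    """Rank of the sample covariance of n_samp Gaussian draws in n_mode dimensions."""
    x = rng.normal(size=(n_samp, n_mode))
    r = x - x.mean(axis=0)
    return np.linalg.matrix_rank(r.T @ r / (n_samp - 1))

# With n_samp < n_mode the rank is limited to n_samp - 1 and the matrix
# cannot be inverted; n_samp >> n_mode restores full rank.
assert sample_cov_rank(100) == 99
assert sample_cov_rank(2000) == n_mode
```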
In practice, we compress the information using the methodology discussed by Tegmark et al. (1997), which isolates the useful modes through Karhunen-Loève (i.e., signal-to-noise eigenmode) compression. Adopting the notation introduced by Gjerløw et al. (2015), we transform the data into a convenient basis through a linear operator of the form s̃ = P ŝ_CMB, where the projection operator P is constructed from a harmonic-space truncation operator P_h that retains only spherical harmonics up to a truncation multipole ℓ_t; a masking operator M; and an eigenvector selection [A]_ε that retains those eigenvectors of a matrix A with a fractional eigenvalue larger than a threshold value ε. Thus, P corresponds to an orthonormal basis on the masked sky that retains primarily multipoles below ℓ_t and with a relative signal-to-noise ratio higher than ε. It is important to note that this projection operator results in an unbiased likelihood estimator irrespective of the specific values chosen for ℓ_t and ε, and the only cost of choosing restrictive values for either is larger uncertainties in the final results. This is fully equivalent to masking pixels on the sky; as long as the mask definition does not exploit information in the CMB map itself, no choice of mask can bias the final results, but only modify the final error bars. In this paper, we adopt a multipole threshold of ℓ_t = 8 and a signal-to-noise threshold of ε = 10^{−6}; we apply the R1.8 analysis mask defined by Planck Collaboration V (2020) (with f_sky = 0.68); and we use the best-fit Planck 2018 ΛCDM spectrum to evaluate S. In total, this leaves 225 modes in P. Determining how many Monte Carlo samples are required to robustly map out the likelihood for this number of modes is one of the key results of Sect. 4.
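A minimal sketch of the Karhunen-Loève step itself, with a hypothetical diagonal signal covariance and white noise (the real operator also applies the harmonic truncation and masking first, which are omitted here): the eigenmodes of the signal-to-noise matrix are computed and only those above the threshold retained.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 80

# Hypothetical signal covariance (signal dies off toward high modes) and white noise
S = np.diag(np.linspace(5.0, 1e-4, n))
Nw = 0.5 * np.eye(n)

# Karhunen-Loeve: eigenmodes of N^{-1/2} S N^{-1/2}, keeping signal-to-noise
# eigenvalues above a threshold eps.
Nsqinv = np.diag(1.0 / np.sqrt(np.diag(Nw)))
lam, V = np.linalg.eigh(Nsqinv @ S @ Nsqinv)
eps = 1e-2
keep = lam > eps
P = (Nsqinv @ V[:, keep]).T                   # compression operator onto the kept modes

d = np.linalg.cholesky(S + Nw) @ rng.normal(size=n)   # one data realization
d_compressed = P @ d
print(int(keep.sum()), "of", n, "modes retained")
```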

High-ℓ temperature likelihood
For the high-ℓ temperature analysis, we exploit the Blackwell-Rao (BR) estimator (Chu et al. 2005), which has been demonstrated to work very well for high signal-to-noise data (Eriksen et al. 2004). This is the case for the BeyondPlanck temperature power spectrum below ℓ ≈ 700, whereas the signal-to-noise ratio for high-ℓ polarization is very low everywhere, even when combining LFI and WMAP data.
In practice, we employ the Gaussianized Blackwell-Rao (GBR) estimator, as presented by Rudjord et al. (2009) and used by Planck (Planck Collaboration V 2020), in order to reduce the number of samples required to achieve good convergence at high multipoles. In this approach, the classical Blackwell-Rao estimator is first used to estimate the univariate C_ℓ likelihood for each multipole separately,

L(C_ℓ) ≈ (1/n_samp) \sum_i P(C_ℓ | σ_ℓ^i),

where σ_ℓ^i ≡ \sum_m |s^i_{ℓm}|² / (2ℓ + 1) is the observed power spectrum of the i'th Gibbs sample CMB sky map, s^i_CMB. This distribution is used to define a Gaussianizing change-of-variables, x_ℓ(C_ℓ), multipole-by-multipole, by matching differential quantiles between the observed likelihood function and a standard Gaussian distribution. The final likelihood expression may then be evaluated as follows,

L(C) ∝ exp( −(x − µ)^T C_x^{−1} (x − µ) / 2 ) |∂C/∂x|^{−1},

where x = {x_ℓ(C_ℓ)} is the vector of transformed input power spectrum coefficients; ∂C/∂x is the Jacobian of the transformation; and the mean µ = {µ_ℓ} and covariance matrix C_x = ⟨(x − µ)(x − µ)^T⟩ are estimated from the Monte Carlo samples after Gaussianization with the same change-of-variables. This expression is by construction exact for the full-sky and uniform-noise case, due to the diagonal form of the noise covariance matrix, for which the full expression factorizes in ℓ. For real-world analyses that include sky cuts, anisotropic noise, and systematic uncertainties it is strictly speaking an approximation, but as shown by Rudjord et al. (2009), it is an excellent approximation even for relatively large sky cuts.

CAMB and CosmoMC
Final cosmological parameters are sampled with CosmoMC (Lewis & Bridle 2002), using the above likelihoods as inputs. This code implements a Metropolis-Hastings algorithm to efficiently probe the full parameter space, using various speed-up and tuning methods (Neal 2005; Lewis 2013). In our analysis, we run eight chains until they reach convergence, as defined by a Gelman-Rubin statistic of R − 1 < 0.01 (Gelman & Rubin 1992), while discarding the first 30 % of each chain as burn-in. This generous burn-in fraction accounts for the way CosmoMC learns an accurate orthogonalization and proposal distribution for the parameters from the sample covariance of the previous samples. In general, quoted error bars correspond to 68 % confidence ranges, except in cases for which a given parameter is consistent with a hard prior boundary (such as the tensor-to-scalar ratio, r), in which case we report upper 95 % confidence limits.
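The convergence criterion can be sketched as follows; this is a toy implementation of the basic one-parameter R statistic, whereas production codes usually apply more refined split-chain variants.

```python
import numpy as np

rng = np.random.default_rng(9)

def gelman_rubin(chains):
    """Basic Gelman-Rubin R for m chains of length n, single parameter."""
    chains = np.asarray(chains)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled posterior-variance estimate
    return np.sqrt(var_hat / W)

chains = rng.normal(size=(8, 5000))            # eight well-mixed toy chains
assert gelman_rubin(chains) - 1 < 0.01         # passes the R - 1 < 0.01 criterion

chains_bad = chains + np.arange(8)[:, None]    # chains stuck at different levels
assert gelman_rubin(chains_bad) - 1 > 0.01     # clearly flagged as unconverged
```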

Six-parameter ΛCDM constraints
We are now ready to present standard ΛCDM cosmological parameter constraints as derived from the BeyondPlanck likelihood, and we compare these with previous estimates from Planck 2018 (Planck Collaboration V 2020). The main results are shown in Fig. 1 and Table 2, where we also report constraints when including CMB lensing and baryon acoustic oscillations (BAO); see Planck Collaboration XVI (2014) and Planck Collaboration XIII (2016) for corresponding Planck analyses.
Overall, we observe excellent agreement between the various cases, and the most discrepant parameter is the optical depth of reionization, for which the BeyondPlanck result (τ = 0.065 ± 0.012) is higher than the Planck 2018 constraint (τ = 0.051 ± 0.006) by roughly 1 σ. In turn, this also translates into a higher initial amplitude of scalar perturbations, A_s, by about 1.5 σ. At the same time, it is important to note that the high-ℓ information from the HFI-dominated Planck 2018 likelihood plays a key role in constraining all parameters (except τ), by reducing the width of each marginal distribution by a factor of typically 5-10. As such, the good agreement seen in Fig. 1 is not surprising, but rather expected from the high level of correlation between the input data sets.
It is therefore interesting to assess the agreement between the various likelihoods using directly comparable data sets, and such a comparison is shown in Fig. 2. In this case, we show constraints derived using only TT information between 30 ≤ ℓ ≤ 600, combined with a Gaussian prior of τ = 0.060 ± 0.015. The solid lines show results for BeyondPlanck (black), Planck 2018 (blue), and WMAP (red), respectively, while the dashed-dotted green line for reference shows the same Planck 2018 constraints as in Fig. 1, derived from the full likelihood.
Taken at face value, the agreement between the three data sets appears reasonable in this directly comparable regime, as the most discrepant parameters are Ω_b h² and H_0, both of which differ by about 1 σ between BeyondPlanck and the Planck 2018 and WMAP results. However, it is important to note that all three of these data sets are nominally cosmic variance limited in the multipole range 30 ≤ ℓ ≤ 600, and therefore one should in principle expect perfect agreement between these distributions, which is obviously not the case. Some of these discrepancies can be explained in terms of different masking, noting that the effective sky fractions of the BeyondPlanck, Planck 2018, and WMAP likelihoods are about 63, 65, and 75 %, respectively. However, as shown by Planck Collaboration V (2020), such small variations are not by themselves large enough to move the main cosmological parameters by as much as 1 σ.
It is therefore likely that the actual data processing pipelines used to model and propagate astrophysical and instrumental systematic errors play a significant role in explaining these differences. In this respect, we make two interesting observations. First of all, we note that the BeyondPlanck pipeline fundamentally differs from the two previous pipelines from a statistical point of view, as it is the first pipeline to implement true end-to-end Bayesian modelling that propagates all sources of astrophysical and instrumental systematic uncertainty to the final cosmological parameters; in comparison, the other two pipelines both rely on a mixture of frequentist and Bayesian techniques that are only able to propagate a subset of all uncertainties. Second, we note that the low-ℓ LFI-dominated BeyondPlanck results are for several parameters more consistent with the high-ℓ HFI-dominated Planck 2018 results than the two previous pipelines; specifically, this is the case for Ω_b h², Ω_c h², and A_s, while for H_0, the Planck 2018 low-ℓ likelihood is slightly closer to its high-ℓ results, with BeyondPlanck and WMAP being identical. Finally, for n_s all three pipelines show comparable agreement with the high-ℓ result in terms of absolute discrepancy, but with different signs; BeyondPlanck prefers a stronger tilt than either Planck 2018 or WMAP. All in all, we conclude that there seems to be slightly less internal tension between low and high multipoles when using the BeyondPlanck likelihood. Still, the main conclusion from this analysis is that all these differences are indeed small in an absolute sense, and subtle differences at the 1 σ level for 30 ≤ ℓ ≤ 600 do not represent a major challenge for the overall cosmological parameters derived from the full Planck 2018 data, as explicitly shown in Fig. 1.
Before concluding this section, we comment on two important cosmological parameters that have been the focus of particularly intense discussion since the Planck 2018 release, namely the Hubble expansion parameter, H_0, and the RMS amplitude of scalar density fluctuations, σ_8. Figure 3 shows two-dimensional marginal distributions for H_0-Ω_m and σ_8-Ω_m, respectively, for various data combinations. Here we see that BeyondPlanck on its own is not able to shed new light on either of the two controversies, due to its limited multipole range. When combined with high-ℓ Planck 2018 information, however, BeyondPlanck prefers a slightly lower mean value of H_0 than Planck 2018, although also with a slightly larger uncertainty. The net discrepancy with respect to Riess et al. (2018) is therefore effectively unchanged. For σ_8, in contrast, BeyondPlanck prefers a higher mean value than Planck, by about 1.5 σ, further increasing the previously reported tension with the late-time cosmic shear and galaxy clustering measurements of Heymans et al. (2021). This difference with respect to Planck is driven by the higher value of τ, as already noted in Fig. 1.

(Figure caption: Marginal BeyondPlanck tensor-to-scalar ratio posteriors derived using BB multipoles ℓ = 2-8, marginalized over the scalar amplitude A_s (gray), and with all ΛCDM parameters fixed to their best-fit values (black). The filled region corresponds to the 95 % confidence interval.)

Table 3: Summary of cosmological parameters dominated by large-scale polarization, and goodness-of-fit statistics. Columns list, from left to right: 1) analysis name; 2) basic data sets included in the analysis; 3) effective accepted sky fraction, f_sky^pol; 4) posterior mean estimate of the optical depth of reionization, τ, with 68 % error bars; 5) upper limit on the tensor-to-scalar ratio, r_BB, at 95 % confidence; 6) χ^2 goodness-of-fit statistic, measured in terms of the probability-to-exceed (PTE); and 7) primary reference.

Large-scale polarization and the optical depth of reionization
As discussed by BeyondPlanck (2022), the main purpose of the BeyondPlanck project was not to derive new state-of-the-art ΛCDM parameter constraints, for which, as we have seen above, Planck HFI data are essential. Rather, the main motivation behind this work was to develop a novel and statistically consistent Bayesian end-to-end analysis framework for past, current, and future CMB experiments, with a particular focus on next-generation polarization experiments. As such, the single most important scientific target of this work is the optical depth of reionization, τ, which serves as an overall probe of the efficiency of the entire framework. In this section, we finally present our main results regarding this parameter.
In the left panel of Fig. 4 we show the marginal posterior distribution for τ as derived from the low-ℓ BeyondPlanck likelihood alone (black curve), and compare this with corresponding previous estimates from the literature (Hinshaw et al. 2013; Planck Collaboration VI 2020; Natale et al. 2020; Pagano et al. 2020). We note, however, that making head-to-head comparisons between all of these is non-trivial, as the reported parameters depend on different assumptions and data combinations.

(Figure caption: All results are evaluated adopting the same series of LFI processing masks as defined by Planck Collaboration V (2020). From top to bottom, the three panels show 1) the posterior τ estimate; 2) the posterior r estimate, expressed as a detection level, in units of σ, with respect to a signal with vanishing B-modes; and 3) the χ^2 PTE evaluated for the best-fit power spectrum in each case.)

The Natale et al. (2020) analysis uses only low-ℓ polarization and marginalizes over only a small set of strongly correlated parameters, i.e., A_s and/or r. Taking into account the fact that Natale et al. (2020) analyze the official LFI and WMAP products jointly, we choose to tune our analysis configuration to theirs, to facilitate a head-to-head comparison for the most relevant case. The corresponding numerical values are summarized in Table 3.
We see that the BeyondPlanck polarization-only estimate is in reasonable agreement with the Natale et al. result based on the official LFI and WMAP products, with an overall shift of about 0.2 σ. However, there are two important differences to note in this regard. First, the BeyondPlanck mean value is slightly lower than the LFI+WMAP value, and therefore in slightly better agreement with the HFI-dominated results. Second, and more importantly, the BeyondPlanck uncertainty is larger than that of LFI+WMAP, despite the fact that its sky fraction is larger (68 versus 54 %). Since the uncertainty on τ scales roughly inversely with the square root of the sky fraction, we can make a rough estimate of what our uncertainty would have been for their analysis setup, namely σ(τ) ≈ 0.013 × √(0.68/0.54) ≈ 0.015.

(Fig. 6 caption: The blue curve shows marginalization over white noise only; the red curve shows marginalization over white noise and astrophysical uncertainties; and, finally, the black curve shows marginalization over all contributions, including low-level instrumental uncertainties, as in the final BeyondPlanck analysis.)
For comparison, the actual Natale et al. (2020) uncertainty is 0.011, or about 30 % smaller. We interpret our larger uncertainty as being due to marginalizing over a more complete set of statistical uncertainties in the BeyondPlanck analysis framework than is possible with the frequentist-style analysis of the official LFI and WMAP data products. As such, this comparison directly highlights the importance of the end-to-end approach. Table 3 also contains several goodness-of-fit and stability tests. Specifically, we first note that the best-fit tensor-to-scalar ratio is consistent with zero, with an upper 95 % confidence limit of r < 0.6. While this is by no means competitive with the current state-of-the-art constraint of r < 0.032 from the combination of BICEP2/Keck and Planck (Tristram et al. 2021a), the absence of strong B-mode power is a confirmation that the BeyondPlanck processing appears clean of systematic errors; these results are in good agreement with the power spectrum results presented by Colombo et al. (2022).
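The sky-fraction rescaling underlying this comparison can be sketched in a few lines of Python. This is our own illustration, not BeyondPlanck code; the function name is ours, and the numbers 0.013, 0.68, and 0.54 are taken from the text.

```python
import math

def rescale_sigma_tau(sigma, fsky_own, fsky_target):
    """Rescale an uncertainty under the assumption sigma ∝ 1/sqrt(f_sky)."""
    return sigma * math.sqrt(fsky_own / fsky_target)

# BeyondPlanck: sigma(tau) = 0.013 at f_sky = 0.68;
# Natale et al. (2020) use f_sky = 0.54.
sigma_rescaled = rescale_sigma_tau(0.013, 0.68, 0.54)
print(round(sigma_rescaled, 4))        # → 0.0146
ratio = sigma_rescaled / 0.011         # compare with the actual Natale et al. value
print(round(ratio - 1, 2))             # → 0.33, i.e., roughly 30 % larger
```

This simple quadratic scaling ignores correlations between masked and unmasked modes, so it should be read as an order-of-magnitude correction only.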
We also note in Table 3 that the impact of removing ℓ = 2 from the analysis is small; the only noticeable effect is to increase the uncertainties on τ and r by about 10 %. This is important, because the BeyondPlanck processing is not guaranteed to have a unity transfer function for this single mode (EE, ℓ = 2): as discussed by Gjerløw et al. (2022), there is a strong degeneracy between the CMB polarization quadrupole and the relative gain parameters, and the current pipeline breaks this by imposing a ΛCDM prior on the single EE ℓ = 2 mode. Although this effect has been explicitly demonstrated through simulations to be small by Brilenkov et al. (2022), it is still comforting to see that this particular mode does not have a significant impact on the final results.
Finally, the sixth column in Table 3 shows the χ^2 probability-to-exceed (PTE), where the main quantity is defined as

χ^2 ≡ ŝ_CMB^t (S^bf + N)^(-1) ŝ_CMB.

For a Gaussian and isotropic random field, this quantity should be distributed according to a χ^2_(n_dof) distribution, where the number of degrees of freedom, n_dof = 225, is in our case equal to the number of basis vectors in ŝ_CMB. The PTE for our likelihood is 0.32, indicating full consistency with the ΛCDM best-fit model⁴ and the sample-based noise covariance matrix.⁵ Figure 5 shows corresponding results for different sky fractions, adopting the series of analysis masks defined by Planck Collaboration V (2020). The tensor-to-scalar ratio is reported in terms of a nominal detection level in units of σ, defined by matching the observed likelihood ratio L(r_bf)/L(r = 0) to that of a standard Gaussian distribution. Overall, we see that all results are largely insensitive to sky fraction, which suggests that the current processing has removed most statistically significant astrophysical contamination (Andersen et al. 2022; Svalheim et al. 2022b). However, we do note that a small B-mode contribution appears at the most aggressive sky coverage of 83 %, and also that the χ^2 PTE starts to fall somewhat above 68 %. For this reason, we conservatively adopt a sky fraction of 68 % for our main results, but note that 73 % would have been equally well justified.
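To make these two statistics concrete, the following sketch (our own illustration, not the BeyondPlanck implementation) estimates a χ^2 PTE by direct Monte Carlo for n_dof = 225, and converts a log-likelihood ratio into the Gaussian-equivalent detection level described above. The observed χ^2 value of 235 is a hypothetical input, not a number from the paper.

```python
import math
import random

def chi2_pte_mc(chi2_obs, ndof, nsim=20000, seed=0):
    """Monte Carlo probability-to-exceed for a chi^2 variable with ndof degrees
    of freedom, built as a sum of squared standard normal draws."""
    rng = random.Random(seed)
    exceed = sum(
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(ndof)) > chi2_obs
        for _ in range(nsim)
    )
    return exceed / nsim

def detection_level_sigma(lnL_bf, lnL_null):
    """Gaussian-equivalent detection level: match L(r_bf)/L(r=0) = exp(z^2/2)."""
    return math.sqrt(2.0 * max(lnL_bf - lnL_null, 0.0))

# A hypothetical observed chi^2 of 235 with 225 degrees of freedom:
print(chi2_pte_mc(235.0, 225, nsim=5000))      # roughly 0.3
print(detection_level_sigma(2.0, 0.0))          # a log-likelihood gain of 2 → 2 "sigma"
```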
Before concluding this section, we return to the importance of end-to-end error propagation, and perform a simple analysis in which we estimate the marginal τ posterior under three different regimes of systematic error propagation. In the first regime, we assume that the derived CMB sky map is entirely free of both astrophysical and instrumental uncertainties, and the only source of uncertainty is white noise. This case is evaluated by selecting one random CMB sky map sample as the fiducial sky, and we do not marginalize over instrumental or astrophysical samples when evaluating the sky map and noise covariance matrix in Eq. (15). In the second regime, we assume that the instrumental model is perfectly known, while the astrophysical model is uncertain. In the third and final regime, we assume that both the instrumental and astrophysical parameters are uncertain, and marginalize over everything, as in the main BeyondPlanck analysis. The results from these calculations are summarized in Fig. 6. As expected, we see that the uncertainties increase when marginalizing over additional parameters. Specifically, the uncertainty of the fully marginalized case is 46 % larger than for white noise alone, and 32 % larger than for the case marginalizing over the full astrophysical model. This calculation further emphasizes the importance of a global end-to-end analysis that jointly takes into account all sources of uncertainty.
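Under the idealized assumption that the three error contributions are independent and add in quadrature (our simplification; the actual posteriors need not be Gaussian), the quoted 46 % and 32 % increases imply an instrumental contribution comparable in size to the white noise itself:

```python
import math

# Relative uncertainties quoted in the text (white noise normalized to 1):
sigma_wn    = 1.00                # white noise only
sigma_full  = 1.46                # full marginalization: 46 % larger than white noise
sigma_astro = sigma_full / 1.32   # full case is 32 % larger than the astro case

# Quadrature decomposition of the two additional contributions:
astro_term = math.sqrt(sigma_astro**2 - sigma_wn**2)
inst_term  = math.sqrt(sigma_full**2 - sigma_astro**2)
print(round(astro_term, 2), round(inst_term, 2))   # → 0.47 0.95
```

In this reading, the low-level instrumental term (0.95) is almost as large as the white-noise term, which is precisely why omitting it leads to significantly underestimated error bars.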

Monte Carlo convergence
As noted in Sect. 2, one important goal of the current paper is to assess how many end-to-end Monte Carlo samples are required to robustly derive covariance matrices and cosmological parameters by Gibbs sampling. We are now finally in a position to answer this question quantitatively, using the results already presented.
Starting with the low-ℓ polarization likelihood, we once again adopt τ as a proxy for overall stability, and show in Fig. 7 the τ estimate as a function of the number of Gibbs samples, n_samp, used to build the low-ℓ likelihood inputs in Eq. (15).⁶ Here we see that the estimates are positively biased for small values of n_samp, with a central value around τ = 0.085. However, the estimates then gradually fall as the Markov chains explore the full distribution. This behaviour can be understood qualitatively as follows: the posterior mean sky map itself converges quite quickly, and stabilizes after only a few hundred samples. However, the τ estimate is derived by comparing the covariance of this sky map with the predicted noise covariance as given by N; any excess fluctuation in ŝ compared to N is interpreted as a positive S contribution. Convergence in N is much more expensive than convergence in ŝ, and this leads to the slow decrease in τ with sample count, as N becomes better constrained by more samples.

⁴ In this paper we denote quantities fixed to a fiducial ΛCDM best-fit value with the superscript "bf".
⁵ We note that this was not the case in the first preview version of the BeyondPlanck results announced in November 2020: in that case the full-sky χ^2 PTE was O(10^-4), and this was eventually explained in terms of gain over-smoothing by Gjerløw et al. (2022) and non-1/f correlated noise contributions by Ihle et al. (2022). Both effects were mitigated in the final BeyondPlanck processing, as reported here.
⁶ Recall that for each main Gibbs chain sample, we additionally draw n = 50 sub-samples to cheaply marginalize over white noise, such that the actual number of individual samples involved in Fig. 7 is correspondingly larger.

(Fig. 8 caption: Gelman-Rubin R - 1 statistic for the Planck TT angular power spectrum, as evaluated from four independent σ_ℓ chains. An R - 1 value lower than 0.1 typically indicates acceptable convergence; we also mark the R - 1 = 10^-2 threshold (dotted black line), representing a more conservative convergence criterion.)
From Fig. 7, we see that the results stabilize only after n samp ≈ 2000 main Gibbs samples, which is almost nine times more than the number of modes in the covariance matrix, n mode = 225. Obviously, this number will depend on the specifics of the data models and datasets in question, and more degenerate models will in general require more samples, but at least this estimate provides a real-world number that may serve as a rule-of-thumb for future analyses.
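The rule of thumb that n_samp must comfortably exceed the number of modes can be illustrated with a toy experiment (our own sketch, using independent unit-variance modes rather than the actual BeyondPlanck noise model): the relative error of a sample covariance over p modes shrinks only as roughly √(p/n).

```python
import numpy as np

def sample_cov_error(n_samples, n_modes, seed=0):
    """Relative Frobenius-norm error of the sample covariance of i.i.d.
    unit-variance Gaussian modes, whose true covariance is the identity."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, n_modes))
    est = np.cov(x, rowvar=False)
    eye = np.eye(n_modes)
    return float(np.linalg.norm(est - eye) / np.linalg.norm(eye))

# With 225 modes, reaching ~10 % accuracy plausibly requires thousands of samples:
for n in (250, 1000, 4000):
    print(n, round(sample_cov_error(n, 225), 2))
```

Real Gibbs chains are correlated rather than independent, so the effective sample size is smaller still; the toy numbers are therefore, if anything, optimistic.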
Finally, to assess convergence for the high-ℓ temperature likelihood, we adopt the Gelman-Rubin (GR) convergence statistic, R, which is defined in terms of the ratio between the between-chain variance and the within-chain variance (Gelman & Rubin 1992). We evaluate this quantity using the four available σ_ℓ chains, including different numbers of samples in each case, ranging from 250 to 4000. The results from these calculations are summarized in Fig. 8. Here we see that convergence is reached rapidly for multipoles below ℓ ≈ 600-800, while multipoles above ℓ ≈ 1000 converge very slowly. We adopt a stringent criterion of R - 1 < 0.01 (dashed horizontal line), and conservatively restrict the multipole range used by BeyondPlanck to ℓ ≤ 600. With these restrictions, we once again see that about 2000 samples are required for convergence.
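A minimal implementation of the GR statistic might look as follows. This is a standard textbook form of the diagnostic (Gelman & Rubin 1992), written by us for illustration, not the actual BeyondPlanck code:

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R statistic for chains of shape (n_chains, n_samples)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled posterior-variance estimate
    return float(np.sqrt(var_hat / W))

# Four well-mixed chains drawn from the same distribution give R - 1 << 0.1:
rng = np.random.default_rng(1)
chains = rng.standard_normal((4, 2000))
print(round(gelman_rubin(chains), 2))        # close to 1 for converged chains
```

Applied per multipole to the four σ_ℓ chains, this yields exactly the R - 1 curves shown in Fig. 8.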

Conclusions
The main motivation behind the BeyondPlanck project is to develop a fully Bayesian framework for global analysis of CMB and related datasets that allows for joint modelling of astrophysical and instrumental effects, and thereby robust end-to-end error propagation. In this paper, we have demonstrated this framework in terms of standard cosmological parameters, which arguably represent the most valuable deliverable of any CMB experiment. We emphasize that this work is primarily algorithmic in nature, and intended to demonstrate the Bayesian framework itself using a well-controlled dataset, namely the Planck LFI measurements; it is not intended to replace the current state-of-the-art Planck 2018 results, which are based on high-sensitivity HFI measurements.
With this observation in mind, we find that the cosmological parameters derived from LFI and WMAP in BeyondPlanck are overall in good agreement with those published by the previous pipelines. When considering the basic ΛCDM parameters and temperature information in the range 30 ≤ ℓ ≤ 600, the typical agreement between the various cases is better than 1 σ, and in the cases where there are discrepancies, the BeyondPlanck results are typically somewhat closer to the high-ℓ HFI constraints than previous results, indicating less internal tension between low and high multipoles.
Overall, the most noticeable difference is seen for the optical depth of reionization, for which we find a slightly higher value of τ = 0.066 ± 0.013 than the Planck 2018 value of τ = 0.051 ± 0.006. At the same time, this value is lower than the corresponding LFI-plus-WMAP result of τ = 0.069 ± 0.011 derived by Natale et al. (2020), which suggests that the current processing has cleaned up more systematic errors than previous LFI processing. Furthermore, and even more importantly, we find that the BeyondPlanck uncertainty is almost 30 % larger than the latter when taking into account the different sky fractions, and we argue that this is due to BeyondPlanck marginalizing over a much richer systematic error model than previous pipelines. Indeed, this result summarizes the main purpose of the entire BeyondPlanck project in a single number. We believe that this type of global end-to-end processing will be critical for the analysis of next-generation B-mode experiments.
A second important goal of the current paper was to quantify how many samples are actually required for convergence in a Monte Carlo-based approach. Based on the current analysis, we find that about 2000 end-to-end samples are needed to achieve robust results. Obviously, introducing additional sampling steps that more efficiently break long Markov chain correlation lengths will be important for reducing this number in the future, but already the current results prove that the Bayesian approach is computationally feasible for past and current experiments.