Bayesian forward modelling of cosmic shear data

We present a Bayesian hierarchical modelling approach to infer the cosmic matter density field, and the lensing and the matter power spectra, from cosmic shear data. This method uses a physical model of cosmic structure formation to infer physically plausible cosmic structures, which accounts for the non-Gaussian features of the gravitationally evolved matter distribution and light-cone effects. We test and validate our framework with realistic simulated shear data, demonstrating that the method recovers the unbiased matter distribution and the correct lensing and matter power spectrum. While the cosmology is fixed in this test, and the method employs a prior power spectrum, we demonstrate that the lensing results are sensitive to the true power spectrum when this differs from the prior. In this case, the density field samples are generated with a power spectrum that deviates from the prior, and the method recovers the true lensing power spectrum. The method also recovers the matter power spectrum across the sky, but as currently implemented, it cannot determine the radial power since isotropy is not imposed. In summary, our method provides physically plausible inference of the dark matter distribution from cosmic shear data, allowing us to extract information beyond the two-point statistics and exploiting the full information content of the cosmological fields.


INTRODUCTION
As light from distant galaxies propagates through the Universe, it is deflected by the gravitational field induced by the large-scale structures. This deflection results in a coherent distortion of observed galaxy images, inducing small changes in the ellipticity of observed galaxies, which is known as cosmic shear. The weak gravitational lensing effect is sensitive to the geometry of the Universe and the growth of cosmic structures, making it a powerful probe to study the matter distribution and the nature of dark matter and dark energy (see e.g. Kilbinger 2015, for a review).
The next-generation surveys like Euclid (Euclid Collaboration et al. 2020), Roman Space Telescope (Spergel et al. 2015) and the Rubin Observatory (LSST Science Collaboration et al. 2009) will provide unprecedented precision in cosmic shear measurements, performing wide-field cosmic shear surveys and measuring large and small scales. Harvesting the information from these data sets will present a challenge. Many of the current cosmic shear analyses focus on extracting information from the correlation function or the associated power spectrum (Kitching et al. 2011;Heymans et al. 2013;Kitching et al. 2014Kitching et al. , 2015Alsing et al. 2016;Kitching et al. 2016;Hildebrandt et al. 2017;Troxel et al. 2018;Hikage et al. 2019;Taylor et al. 2019). These analyses capture the two-point statistics, but they do not fully capture the non-Gaussian information encoded in the filamentary features of the matter distribution (Bernardeau et al. ★ n.porqueres@imperial.ac.uk 1997; Jain & Seljak 1997;van Waerbeke et al. 1999;Schneider & Lombardi 2003;Takada & Jain 2003;Vafaei et al. 2010;Kayo et al. 2013). While some approaches to access the non-Gaussian information are based on measuring high-order correlations (Bernardeau et al. 2003;Pen et al. 2003;Jarvis et al. 2004;Semboloni et al. 2011;van Waerbeke et al. 2013;Fu et al. 2014), peak counts (Jain & van Waerbeke 2000;Dietrich & Hartlap 2010;Maturi et al. 2011;Marian et al. 2012;Pires et al. 2012;Cardone et al. 2013;Lin & Kilbinger 2015;Liu et al. 2015b,a;Kacprzak et al. 2016;Petri et al. 2013;Peel et al. 2017) or using machine learning (Gupta et al. 2018), they rely on summary statistics that do not capture all the information and whose distributions are not well known.
Capturing the full information content of the large-scale structure requires a field-based approach to infer the matter distribution from observations. Alsing et al. (2016Alsing et al. ( , 2017 presented a Bayesian hierarchical inference scheme to jointly infer shear maps and the corresponding power spectra, assuming Gaussian statistics of the shear field. From a Bayesian perspective, assuming a Gaussian distribution for the shear field is a well-motivated prior since it constitutes the maximum entropy prior once the mean and covariance are specified. However, more information coming from physics is available, and the Gaussian assumption is sub-optimal. In this work, we address this limitation by including a gravity model in the Bayesian hierarchical model. For this, we build on the Bayesian Origin Reconstruction from Galaxies (BORG, Jasche & Kitaura 2010a; Jasche & Wandelt 2013a; Lavaux et al. 2019) framework, which employs a physical description of the dark matter dynamics and allows us to sample Figure 1. Hierarchical representation of the BORG inference framework for the analysis of cosmic shear data. Primordial fluctuations ic encoded in a a set of Fourier modes at ≈ 1000 are obtained from the prior ( ic |Ω), where Ω represents the cosmological parameters. These initial conditions are evolved using the gravity model M ( ic ), which provides the evolved density f . The evolved density and the redshift distribution of sources ( ) are then used to compute the convergence field for each tomographic bin , ( , ( )). From the convergence, we compute the cosmic shear 1 , 2 in the flat-sky approximation.ˆ1 ,ˆ2 indicate the observational data. Purple boxes indicate deterministic transition while green boxes are probability distributions. from the initial conditions, which are accurately described by Gaussian statistics. With this more complex data model, we get a better representation of the data, and we can extract information beyond the two-point statistics, exploiting the full information content of the shear fields.
One of the main challenges in the analysis of cosmic shear based on estimating the power spectrum is accounting for the masked regions within the survey area (see, e.g. Chon et al. 2004;Brown et al. 2005;Smith 2006). Our forward modelling approach circumvents these difficulties associated with the survey mask. Although the data do not provide information about the fields in the masked regions, the dynamical model still provides probabilistic information about the shear and density fields that are physically possible in those regions. In our method, the masked regions are treated as pixels with infinite noise, circumventing the need to treat unobserved areas as being cut from the analysis.
The paper is organised as follows. Section 2 describes the data model for the cosmic shear and the likelihood. Section 3 gives an overview of the Bayesian inference framework, BORG, as required for this work. In Section 4, we described the simulated data employed in testing and validating the method. The results are presented in Section 5, showing that the method provides unbiased matter density fields. In Section 6, we discuss the effect of the prior power spectrum in the results. Finally, Section 7 summarises the results.

THE DATA MODEL
The effect of weak gravitational lensing on a source can be described by two sky fields: the spin-2 shear, , which describes the distortion in the shape of the image, and the scalar convergence field, , which describes the change in angular size. These two fields are related to the lensing potential, , by (see e.g. Kilbinger 2015), where 1 and 2 are the components of the shear distortion parallel and at /4 to the coordinate axes, and = + is the complex derivative on the sky, assuming the flat-sky approximation.
To connect the shear fields to the 3D dark matter distribution, we implemented a line-of-sight integration using the Born approximation, integrating along unperturbed paths. First, we generate convergence fields by integrating along the line-of-sight with the corresponding lensing weights as where is the position on the sky, is the comoving distance, lim is the limiting comoving distance, is the final density field and where ( ) is the source galaxy distribution. In our discrete implementation, this becomes The index labels the tomographic bin and the subindices indicate the 2D pixel on the sky, whose size is chosen to include typically many sources. The sum index indicates the slice in the radial direction, at a comoving distance . 2 is the total number of voxels along the radial axis. The voxels have a length of Δ .
is the 3D dark matter overdensity at a scale factor . The comoving radial distance indicates the distance to the source plane. The redshift distribution of sources for each tomographic bin is given by ( ). For this initial study, we assume the distant observer approximation.
In the flat-sky approximation, we can obtain the shear values from the convergence field. On a flat-sky, the shear and the convergence are related in Fourier space. We, therefore, use a discrete Fourier transform (DFT) to obtain the shear values as where ì = ( , ) is the wave-vector written as a complex quantity.
To analyse cosmic shear observations in our Bayesian framework, we now built a likelihood based on this data model. We assume Gaussian pixel noise for the shear, corresponding to a negative loglikelihood, L = − log (ˆ| ), that can then be written as whereˆ=ˆ1 +ˆ2 is the observed data. This is an estimate of the shear in the pixel, with a variance 2 , which is determined from the shape noise and number of sources per pixel as 2 / sources . We note that even if the ellipticity distribution is not Gaussian, provided many sources contribute to each pixel average the noise will become Gaussian according to the central limit theorem. An alternative would  be to sample from the distribution in another level of the hierarchy, but this would be expensive, so we simplify this stage by using summary statistics of estimated shear and their variance. This likelihood is then implemented into the large-scale structure sampler of the BORG framework. The corresponding physical forward modelling approach is illustrated in Fig. 1 and proceeds as follows. Using realisations of the three-dimensional field of primordial fluctuations, the dynamical structure formation model evaluates non-linear realisations of the dark matter distribution, accounting for the light-cone effects inherent to deep observations. Using these 3D dark matter field realisations and the data model, BORG predicts shear fields that are compared to the observed data via the likelihood in equation (7).

THE BORG FRAMEWORK
This work extends the previously developed BORG algorithm to analyse the spatial matter distribution underlying cosmic shear observations. In this section, we provide a summary of the algorithm. A more detailed description of the BORG framework can be found The BORG framework is a Bayesian inference method aiming at inferring the non-linear spatial dark matter distribution and its dynamics from cosmological data sets. The underlying idea is to fit full dynamical gravitational and structure formation models to observations. By using non-linear structure growth models, the BORG algorithm can exploit the full statistical power of high-order statistics of the matter distribution imprinted by gravitational clustering. This dynamical model links the primordial density fluctuations to the present large-scale structures. Therefore, the forward modelling approach allows the translation of the problem of inferring non-linear matter density fields into the inference of the spatial distribution of the primordial density fluctuations, which are well described by Gaussian statistics (Planck Collaboration 2019). The BORG algorithm, therefore, infers the initial matter fluctuations, the dark matter distribution and its dynamical properties from observations. Motivated by inflation theory and observational data, the BORG algorithm employs a Gaussian prior for the initial density contrast at an initial cosmic scale factor of 10 −3 , time for which density perturbations are linearly growing. Initial and evolved density fields are linked by deterministic gravitational evolution mediated by various physics models of structure growth. Specifically, BORG incorporates several physical models based on Lagrangian Perturbation Theory (LPT), fully non-linear particle-mesh models (Jasche & Lavaux 2019), a model based on spatial COmoving Lagrangian Acceleration framework (Leclercq et al. 2020), and a semiclassical analogue to LPT (Porqueres et al. 2020). While in this work, we used LPT to approximately describe gravitational clustering, any of these dynamical models can be straightforwardly employed within the flexible block sampling illustrated in Fig. 1.
At its core, the BORG framework employs MCMC techniques. This method allows inference of the full posterior distribution from which we can quantify the uncertainties in our results. However, the inference of the density field typically involves O (10 7 ) free parameters, corresponding to the discretised volume elements of the initial conditions. To explore this high-dimensional parameter-space efficiently, the BORG framework uses a Hamiltonian Monte Carlo (HMC) method, which exploits the information in the gradients and adapts to the geometry of the problem. We need, therefore, the adjoint gradient of the data model, which transforms the error vector from the likelihood space to the initial conditions. For the case of weak lensing, we derive this gradient in Appendix A. More details about the HMC and its implementation are described in Jasche & Kitaura (2010b) and Jasche & Wandelt (2013b).

THE MOCK DATA
To test the inference framework, we generated mock observations of cosmic shear with 30 sources per arcmin 2 as expected for the Euclid survey, with four tomographic bins with equal number density of sources, and a non-trivial survey mask. In this section, we describe the properties of the synthetic data.
Mock data are constructed by first generating Gaussian initial conditions on a Cartesian grid of size 1ℎ −1 Gpc ×1ℎ −1 Gpc ×4ℎ −1 Gpc  with 128×128×256 voxels. To generate primordial Gaussian density fluctuations we used a cosmological matter power spectrum including the baryonic wiggles calculated according to the prescription provided by Eisenstein & Hu (1998. We further assumed a standard ΛCDM cosmology with the following set of parameters: Ω = 0.31, Ω Λ = 0.69, Ω = 0.049, ℎ = 0.6711, 8 = 0.8, = 0.9624. Here H 0 = 100ℎ km s −1 Mpc −1 . To generate realisations of the non-linear density field, we evolve the Gaussian primordial fluctuations via LPT. This involves simulating displacements for 256 2 × 512 particles in the LPT simulation, accounting for light-cone effects inherent to deep observations. Final density fields are constructed by estimating densities via the cloudin-cell scheme from simulated particles on the Cartesian grid. A cosmic shear field is generated by applying the data model described in Sec. 2, assuming the redshift distributions for tomographic bins shown in Fig. 2. Finally, we added Gaussian pixel-noise to the shear with variance corresponding to 30 sources per arcmin 2 , as expected to be obtained from the Euclid survey (Laureĳs et al. 2011), and with an error on intrinsic ellipticity given by = 0.3 (Kilbinger 2015). This corresponds to a signal-to-noise ratio of / = 0.5. We added a non-trivial survey mask, shown in Fig. 3. Since the data provides no direct information in the masked regions, these are treated as pixels with infinite noise.

RESULTS
Here, we present the results of applying our algorithm to simulated cosmic shear data. We show that our method infers unbiased density fields and corresponding power spectra at all scales considered in this work. We also perform a posterior predictive test for the shear, showing that the inferred densities can explain the data within the noise uncertainty.

The warm-up phase of the sampler
In this first Bayesian approach, we keep the cosmology fixed, and specify a prior on the initial power spectrum. However, the power spectrum of the inferred matter distribution is conditioned by the data, and we can use the posterior ( ) as a diagnostic for the effectiveness of the inference since the power spectrum of the posterior samples may differ from the prior. To monitor the initial warm-up phase of the Markov sampler, we follow a similar approach to our previous works ( : we initialised the Markov chain with an over-dispersed state and traced the systematic drift of inferred quantities towards their preferred regions in the parameter space. Specifically, we initialised the Markov chain with a random Gaussian initial density field scaled by a factor 10 −3 and monitored the drift of corresponding posterior power spectra during the warm-up phase. Figure 4 presents the results of this exercise, showing successive measurements of the posterior power spectrum during the initial warm-up phase. The amplitudes of the posterior power spectrum show a systematic drift towards their fiducial values. By the end of the warm-up phase, the sampler has found an unbiased representation of the initial power spectrum at all Fourier modes considered in this work. Starting the sampler from an over-dispersed state, therefore, provides us with an important diagnostics to test the validity of the sampling algorithm. To test for residual correlations between different Fourier modes, we estimated the covariance matrix of power spectrum amplitudes from our ensemble of posterior samples. Figure 5 shows that the and ensemble mean final (middle-lower panel) density field computed from 500 MCMC samples. Since the information on the radial direction is not very informative, the density fields are projected on the sky, and the different slices of the 3D density field are weighted with the distribution of sources. Comparison between these panels shows that the method recovers the structure of the true projected density field with high accuracy. Right panels show standard deviations of inferred amplitudes of the initial (upper right panel) and final density fields (lower right panel). The regions of high uncertainty correspond to the masked regions, where there are no contributing sources. covariance matrix has a clear diagonal structure, discarding spurious correlations between scales.

Inferred density fields
As discussed above, our method uses a forward modelling approach, fitting a physical dynamical model to shear data and employs an MCMC sampler to explore the parameter space. This provides the full posterior distribution, from which we draw samples of the initial matter fluctuations. Figure 6 shows projections of the true fields, and the ensemble mean and variances of inferred three-dimensional fields. The mean and variance are estimated from 500 samples of the posterior distribution (the correlation length is ≈ 80 samples). To compare the ground truth to the inferred mean density field, we computed the projection of the density fields on the sky since the radial information is not very constraining. In this projection, the different slices of the 3D density field are weighted with the distribution of the lensing sources. A first visual comparison between ground truth and the inferred ensemble mean initial and final density fields shows that the algorithm correctly recovered the large-scale structures from cosmic shear data. As expected, the mean of the initial density samples exhibits a small degree of smoothing, a feature that is known from the Wiener filtering solution for Gaussian fields and Gaussian prior.
The right panels of Fig. 6 show the corresponding standard deviations of the projected densities, which are estimated from the posterior samples. The high uncertainty regions correspond to the masked areas, where there are no contributing sources. While the data do not provide direct information about the density field in these masked regions, the dynamical model still provides probabilistic information about the density fields that are physically plausible in those regions. These results indicate that the method can deal with non-trivial survey masks, and account for the uncertainty in the unobserved areas. The standard deviation of the initial conditions is homogeneous, apart from mask effect, indicating that the dynamical model correctly propagates the information between the primordial matter fluctuations and the final density field.

Posterior predictive tests
Posterior predictions allow testing whether the inferred density fields provide accurate explanations of the data (see, e.g. Gelman et al. 2004). Generally, posterior predictive tests provide good diagnostics about the adequacy of data models in explaining observations and identifying possible systematic problems with the inference. In this section, we predicted the shear and convergence fields as the average computed from 500 posterior samples. Figure 7 presents the result of this test for one tomographic bin, showing that the posterior predicted shear and convergence recover the features of the true fields. The masked regions show higher standard deviation, indicating that the method can account for the uncertainties in the unobserved areas and provide probabilistic information of the physically plausible shear in those regions. While Fig. 7 shows a visual comparison, Fig. 8 shows the residuals between the true and the mean posterior-predicted shear fields. The green line in the plot indicates the noise distribution, showing that the distribution of the Figure 7. Posterior predicted shear and convergence for one tomographic bin. The left column shows the shear data, including noise and masked regions; the second column shows the true shear and convergence fields, and third and fourth columns show the mean and standard deviation of the posterior-predicted shear and convergence, computed from 500 posterior samples. The method recovers the true cosmic shear correctly. The regions with higher standard deviation correspond to the masked regions. Figure 8. Histogram of the residuals computed as the difference between the posterior predicted shear and the true shear. We note that the true shear does not include the noise. The residuals distribution is narrower than the distribution of pixel noise in the data, indicated in green, showing that the method recovers the true shear at sub-noise level with additional constraints from the cosmological prior. residuals is narrower than the noise distribution, and, therefore, the inferred quantities can explain the data at sub-noise level, with the additional constraints coming from the cosmological prior. Figure 9 shows the power spectrum of the posterior-predicted shear, measured as the averaged of predicted shear fields from 500 Figure 10. Distribution of the convergence field for tomographic bins centred at different redshift. For comparison, a Gaussian with the same mean and variance is plotted on top. The convergence distribution is skewed for tomographic bins centered at > 0.5. randomly-drawn samples. The predicted power spectra match the true shear power spectrum at all scales. We note that we computed the true power spectrum from the noiseless data as the posteriorpredicted shear does not contain noise.

Distribution of the convergence field
Previous Bayesian approaches (Alsing et al. 2016) rely on a Gaussian prior for the shear data. The Gaussian prior is well-justified when only the mean and variance are known since it is the least informative prior. It is important to understand that the samples are not Gaussian fields, since they are conditioned on the data, so non-gaussianity in the field may be imposed by the data. However, we can make use of the fact that we have more information available from knowledge of gravitational physics and, for this reason, we include a gravity model in our Bayesian hierarchical model. The advantage here is that we sample from the initial field, which we know to be Gaussian, so rather than relying on an uninformative prior for the final shear fields, we use the correct Gaussian distribution for the initial conditions. What we do not yet do in this model is to vary the prior parameters of the power spectrum (as Alsing et al. (2016) do), and this will be the subject of future work. As described in section 2, we obtained the shear on a flat-sky from the convergence, which is computed from the non-linear density field. Figure 10 presents the distribution of the convergence field for different tomographic bins obtained with our method. While the convergence shows a Gaussian distribution for tomographic bins centred at > 0.5, it is skewed for tomographic bins at lower redshifts. This indicates that the Gaussian approximation is accurate at large redshift, but it is sub-optimal for low-redshift bins.

PRIOR TEST
As discussed in section 5.1, in this approach, we keep the cosmology fixed, and specify a prior on the initial power spectrum. Although our method does not sample the power spectrum, the power spectrum of the posterior samples is conditioned by the data. This means that, if the data require it, the posterior power spectrum deviates from the prior, and the density samples have a power spectrum that differs from the prior. To demonstrate that, we tested our method with mock data generated with ℎ = 0.6, 8 = 0.55 and analyse these data with ℎ = 0.677, 8 = 0.8, which determine the prior power spectrum. Figure 11 presents the evolution of the power spectrum for this test, showing the burn-in phase of the Markov chain. In the plane of the sky, the posterior power spectrum agrees with the truth, demonstrating that the method is sensitive to the true power spectrum even After the burn-in phase, the method recovers the correct matter power on the plane of the sky. However, the data is not constraining in the radial direction and the prior dominates. We note that as currently implemented, isotropy is not a requirement. Figure 12. Histogram of the shear residuals for the prior test. The residuals are computed as the difference between the mean posterior-predicted shear and true shear. The distribution of shear residuals is narrower than the noise distribution, indicated in green.

Figure 13.
Posterior power spectrum of the shear field compared to the power spectrum of the true shear for the prior test. The grey dotted line indicates the shear power spectrum computed using the prior cosmology, which differs from the truth. The bottom plots show the ratio between the posterior and the true power spectrum, showing that, although the prior and the truth differ, the method recovers the true power spectrum of the shear. when this differs from the prior. However, in the line-of-sight direction, the data is not constraining, and the prior dominates. These results are consistent with Simon et al. (2009). As currently implemented, isotropy is not required, and the method cannot determine the radial power accurately. The recovered matter power spectrum, however, suffices to explain the data, as can be seen in Fig. 12 and 13, showing that the shear residuals are below the noise level and the method recovers the correct lensing power spectrum. We note that, for this test, the number of sources per tomographic bin is not the same, but the furthest bin has fewer sources, resulting in a higher noise variance.
In future work, we will extend our Bayesian hierarchical model to jointly sample the cosmological parameters and the density field, using an approach compatible with the one presented in Ramanah et al. (2019). We expect that this future extension of the method will constrain the matter power spectrum, also in the radial direction, through the assumption of isotropy, but we do not anticipate being able to recover the small-scale radial distribution at the field level, because of the width of the lensing kernel and the distance uncertainties. Meanwhile, the test presented here shows that the posterior power spectrum is conditioned by the data despite using a fixed cosmology.

SUMMARY AND DISCUSSION
We have developed a Bayesian physical forward model to infer the matter density field and primordial fluctuations from cosmic shear data. This framework consists of a Gaussian prior for the primordial fluctuations, a dynamical structure formation model that links the initial conditions and the evolved density field, and a likelihood based on a data model of the cosmic shear in the flat-sky approximation.
This forward modelling approach allows us to go beyond the common analyses of cosmic shear based on two-point statistics. While many studies of the cosmic shear focus on the power spectrum or the correlation function, the non-linear dynamics of the large-scale structure encode significant information in higher-order statistics associ-ated with the filamentary structure of the cosmic web. Our dynamical forward model reproduces the filamentary matter distribution and, in this way, allows using every data point, rather than relying on summary statistics that do not capture all the information and whose distributions are not well known. By employing a more accurate gravity model, our method also improves over previous Bayesian hierarchical approaches that assumed a Gaussian distribution of the shear field (Alsing et al. 2016).
We have tested our inference method with simulated data with four tomographic bins, a survey mask, and 30 sources per arcmin 2 as expected for the Euclid survey. These tests demonstrate that our method recovers the unbiased matter distribution and initial matter power spectrum from cosmic shear data. Posterior predictive tests showed that the inferred quantities are known to the sub-noise level, with additional constraints coming from the cosmological prior.
Although our framework currently uses a fixed cosmology, we have shown that the method is sensitive to the true power spectrum when this differs from the prior. While we do not sample the power spectrum, the posterior power spectrum deviates from the prior if the data require it. To illustrate this, we performed a test using different values of 0 and 8 to generate the mock data and to analyse them. In this case, the prior power spectrum, therefore, differs from the true power spectrum. This test demonstrated that our method is sensitive to the underlying cosmology, and the power spectrum of the density samples is conditioned by the data, recovering the true matter power spectrum across the sky and the lensing power spectrum. However, in the radial direction, the data is not informative, and the prior dominates since isotropy is not a requirement. In future work, we will extend our Bayesian hierarchical approach to sample the cosmological parameters. We expect that this extension will constrain the matter power spectrum also in the radial direction through the assumption of isotropy.
To summarise, this work demonstrates the feasibility of detailed and physically plausible inference of the large-scale structure from cosmic shear data. The proposed approach, therefore, improves the shear data model from previous methods by including a physical description of gravity, providing a better representation of the data and allowing us to extract information beyond the two-point statistics.
In future work, we will explore the constraints on cosmology that this approach provides by jointly sampling the initial conditions and the cosmological parameters.