Forecasting the power of Higher Order Weak Lensing Statistics with automatically differentiable simulations

Aims. We present the Di ﬀ erentiable Lensing Lightcone (DLL), a fully di ﬀ erentiable physical model designed for being used as a forward model in Bayesian inference algorithms requiring access to derivatives of lensing observables with respect to cosmological parameters. Methods. We extend the public FlowPM N-body code, a particle-mesh N-body solver, simulating lensing lightcones and implementing the Born approximation in the Tensorﬂow framework. Furthermore, DLL is aimed at achieving high accuracy with low computational costs. As such, it integrates a novel Hybrid Physical-Neural parameterisation able to compensate for the small-scale approximations resulting from particle-mesh schemes for cosmological N-body simulations. We validate our simulations in an LSST setting against high-resolution κ TNG simulations by comparing both the lensing angular power spectrum and multiscale peak counts. We demonstrate an ability to recover lensing C (cid:96) up to a 10% accuracy at (cid:96) = 1000 for sources at redshift 1, with as few as ∼ 0 . 6 particles per Mpc / h. As a ﬁrst use case, we use this tool to investigate the relative constraining power of the angular power spectrum and peak counts statistic in an LSST setting. Such comparisons are typically very costly as they require a large number of simulations, and do not scale well with the increasing number of cosmological parameters. As opposed to forecasts based on ﬁnite di ﬀ erences, these statistics can be analytically di ﬀ erentiated with respect to cosmology, or any systematics included in the simulations at the same computational cost of the forward simulation. Results. We ﬁnd that the peak counts outperform the power spectrum on the cold dark matter parameter Ω c , on the amplitude of density ﬂuctuations σ 8 , and on the amplitude of the intrinsic alignment signal A IA .


Introduction
Weak gravitational lensing by Large Scale Structures (LSS) is one of the key probes to test cosmological models and gain insight into constituents of the Universe.The upcoming stage-IV surveys, such as the Legacy Survey of Space and Time (LSST) of the Vera C. Rubin Observatory Ivezić et al. (2019), the Nancy Grace Roman Space Telescope (Spergel et al. 2015), and the Euclid Mission (Laureijs et al. 2011), will provide measurements of billions of galaxy shapes with unprecedented accuracy, which in turn will lead to tight constraints on dark energy models (e.g.Mandelbaum et al. 2018).
With the increased statistical power of these surveys comes the question of their optimal analysis.Traditional cosmological analysis rely on measurements of the two-point statistics, either the shear two-point correlation functions or its Fourier transform, the lensing power spectrum.However, the two-point statistics are only optimal for Gaussian fields, and do not fully capture the non-Gaussian information imprinted in the lensing signal at the scales that future surveys will be able to access (e.g.infor-Contact: denise.lanzieri@cea.frmation encoded in the peaks and in the filamentary features of the matter distribution).
Recently, machine learning-based methods that broadly fall in the category of Simulation-Based Inference (SBI) (Fluri et al. 2019;Kacprzak & Fluri 2022;Fluri et al. 2021;Jeffrey et al. 2021;Fluri et al. 2022), and Bayesian forward-modeling frameworks (Porqueres et al. 2021;Sarma Boruah et al. 2022) have also been introduced to attempt to fully account for the non-Gaussian content in the weak lensing signal.Unlike the meth-ods described above, these approaches are designed to access the full field-level information content.Even though these techniques are asymptotically theoretically optimal in terms of information recovery, they still suffer from significant limitations.SBI methods are characterized by the absence of an analytical model to describe the observed signal and instead rely on learning a likelihood from simulations.Modern approaches employ deep learning-based density estimation methods to model the likelihood without the need to make any Gaussianity assumptions, and as such have drawn the attention of the community (Alsing et al. 2018;Jeffrey et al. 2021).
A key element common to most of these methodologies is their ability to benefit from gradient information.For example, Porqueres et al. (2021) present a Bayesian hierarchical approach to infer the cosmic matter density field simultaneously with cosmological parameters using the Hamiltonian Monte Carlo (HMC) algorithm to explore the full high-dimensional parameter space.The HMC algorithm exploits the information encoded in the gradients of the joint likelihood function to inform each update step and hence requires the derivatives of the forward model.
As a different class of examples, Makinen et al. (2021) demonstrate that the full information content of a cosmological field can be represented by optimal summaries, allowing for likelihood-free and near-exact posteriors for cosmological parameters.In particular, they use neural networks trained on simulations to maximise the Fisher information, which requires having access to derivatives of the simulation model to compute a Fisher matrix.
One possible way of evaluating these gradients with respect to cosmological parameters is by using numerical differentiation, e.g.computing a finite difference of a given statistic by varying the simulation fiducial values by a small amount.However, numerical differentiation is expensive in terms of computational resources and simulation time, and also requires hyperparameter tuning for the step size used in the finite difference scheme.An alternative option consists in computing the gradient analytically.Both these solutions are faced with limitations: the first approach gets computationally intractable in high dimensions, while the analytic gradients are sometimes impossible to estimate.All of these new techniques for cosmological analyses make the development of fast and differentiable simulations necessary.Böhm et al. (2021) developed MADLens, a CPU-based python package for producing non-Gaussian lensing convergence maps.These maps are differentiable with respect to the initial conditions of the underlying numerical simulations and to the cosmological parameters Ω m and σ 8 .In this paper, we aim to efficiently compute gradients that benefit the development of new inference algorithms for weak lensing surveys.To do this we extend the framework of the FlowPM package Modi et al. (2021) by implementing the Born approximation and simulating lensing lightcones in the Tensorflow framework.TensorFlow is a tensor library that includes the ability to perform automatic differentiation.Automatic differentiation enables us to compute gradients exactly as opposed to finite differences, which only provide approximate gradients.Specifically, TensorFlow implements the backpropagation algorithm to compute gradients, i.e. first it creates a graph (e.g.data structures representing units of computation), then it works backward through the graph by applying the chain rule at each node.Unlike MADlens, our tool is GPU-based and provides derivatives with respect to all the cosmological parameters.There are also differences in the implementation of various functions to improve accuracy and speed.
We validate our simulations against the cosmological Nbody simulations κTNG (Osato et al. 2021) by comparing both the lensing angular power spectrum and multiscale peak counts.
In particular, as a first application, we show how the differentiability of numerical simulations can be exploited to evaluate the Fisher Matrix.Then, we compare the constraining power of two map-based weak lensing statistics: the lensing power spectrum and peak counts and investigate the degeneracy in high dimensional cosmological and nuisance parameter space through Fisher forecasts.
This paper is structured as follows: in 2 we briefly review the weak lensing modeling including the theoretical framework and the summary statistics used in this work.In section 3 we introduce the numerical simulations illustrating the numerical methods used to generate mock WL maps.In section 4 we validate the simulations by comparing the statistics from our simulations and κTNG-Dark ones.The Fisher forecast formalism and the survey and noise setting are shown in section 5. We finally discuss our results and present our conclusions afterward, in section 6 and section 7.

Cosmic shear
Weak gravitational lensing is a powerful probe to infer the distribution of matter density between an observer and a source.The effect of gravitational lensing can be quantified in term of the separation vector x between two light rays separated by an angle θ: with Φ and Φ 0 the gravitational potential along the two light rays, f k (χ) and χ the angular and radial comoving distance.
Formally, the effect of the linearized lens mapping is described by the Jacobian matrix: In the limit of weak-field metric (small Φ), the integral in Equation 1 can be approximated by considering the series expansion in power of Φ and truncating the series at the first term.With these assumptions, given that ∇ ⊥ Φ 0 is not dependent from θ, the Jacobian matrix can be written as: This, also known as Born approximation, corresponds to integrating the potential gradient along the unperturbed ray.If we define the 2D potential, the lensing potential as: the Jacobi matrix can be written as: From the parametrization of the symmetrical matrix A, we can define the spin-two shear γ = (γ 1 , γ 2 ) and the scalar convergence field, κ.Hence, the convergence and the shear are defined as the second derivative of the potential: the two fields γ and κ describe the distortion in the shape of the image, and the change in the angular size, respectively.By combining the 2D Poisson equation with the Equation 7, the convergence κ becomes: where we define the lensing efficiency: Thus, the Born-approximated convergence can be interpreted as the integrated total matter density along the line of sight, weighted by the distance ratios and the normalised source galaxy distribution n(χ)dχ = n(z)dz.

Intrinsic alignments (NLA)
The galaxy ellipticity observed by a telescope can be decomposed in the cosmic shear signal γ and the intrinsic ellipticity of the source int , where the latter is the combination of the alignment term IA and the random component ran .
Different theoretical models have been proposed to describe the physics of Intrinsic Alignments (IA), such as the Non-Linear tidal Alignment model (NLA) (e.g. in Bridle & King (2007)), the tidal torquing model (Hirata & Seljak 2004;Catelan et al. 2001), or the combination of both the Tidal Alignment and Tidal Torquing model (TATT) (Blazek et al. 2019).We model the IA effect using the NLA description (Harnois-Déraps et al. 2021), i.e. assuming a linear coupling between the intrinsic galaxy shapes and the non-linear projected tidal fields s i j : from which the observed ellipticities are computed as: The A IA term in Equation 11 defines the strength of the tidal coupling, C1 is a constant calibrated in Brown et al. (2002), D(z) is the linear growth function and ρ is the matter density.

Lensing Summary Statistics
To extract the cosmological information from the simulated κ−maps, we use two summary statistics for weak lensing observable: the angular power spectrum and the starlet peak counts (Lin et al. 2016).

Angular Cls
Second-order statistics, both in the form of shear 2-point correlation function ξ ± (θ), or its counterpart in Fourier space, the angular power spectrum C , have been widely used to extract the cosmological information from weak lensing surveys.In the Limber approximation, the angular power spectrum of the convergence field for a given tomographic bin can be computed as: where P δ defines the matter power spectrum of the density contrast.
The Intrinsic Alignment (IA) signal adds an excess correlation to the two-point shear correlation function (also known as cosmic shear GG or shear-shear correlation) with two terms: 1) the intrinsic-intrinsic (II) term, tracing the correlation of the intrinsic shape of two galaxies and 2) and the intrinsic-shear coupling (GI) term, describing the correlation between the intrinsic ellipticity of one galaxy with the shear of another galaxy (Kilbinger 2015).The matter power spectra for the IA terms are defined as: where D(z) ≡ D(1 + z) (Harnois-Déraps et al. 2021).Under the Limber approximation the projected angular power spectra for the IA terms become:

Wavelet peak counts
Wavelet Transform The wavelet transform has been widely used in analysing astronomical images due to its ability to decompose astronomical data into components at different scales.This multiscale approach is well-suited for the study of astronomical data since their complex hierarchical structure.A wavelet function ψ(x) is a function that satisfies the admissibility condition: where we indicate with ψ(k) the Fourier transform of ψ(x), with ψ(x)dx = 0 in order to satisfy the admissibility condition.A given signal is decomposed in a family of scaled and translated functions: where ψ a,b are the so-called daughter wavelets, scaled and translated version of the mother wavelet, with a and b scaling and translation parameters.The continuous wavelets transform is defined from the projections of a function f ∈ L 2 (R) onto the family of daughter wavelets.The coefficients of this projection represent the wavelet coefficient, obtained by : with ψ * the complex conjugate of ψ, and ∀a ∈ R + , b ∈ R. In this work, we filter the original convergence maps with the starlet transform, an isotropic and undecimated (i.e.not down-sampled) wavelet transform, suited for astronomical applications where objects are mostly more or less isotropic (Starck et al. 2007).
It decomposes an image c 0 as the sum of all the wavelet scales and the coarse resolution image c J : where J max is the maximum number of scales and w j is the wavelet images showing the details of the original image at dyadic scales with a spatial size of 2 j pixels and j = J max + 1.
The starlet wavelet function is a specific translational invariant wavelet transform: specified by an isotropic scaling function φ, that, for astronomical application, can be defined as a B-spline of order 3: The N-dimensional scaling functions can be built starting from the separable product of N φ 1D : φ(x 1 , x 2 ) = φ 1D (x 1 )φ 1D (x 2 ).Each set of wavelet coefficients w j is obtained as the convolution of the input map with the corresponding wavelet kernel.For a full description of the starlet transform function, see Starck et al. (2007) and Starck et al. (2010).
Peak counts It has been shown that it is necessary to go beyond second-order statistics to fully capture the non-Gaussian information encoded in the peaks of the matter distribution (Bernardeau et al. 1997;Jain & Seljak 1997;van Waerbeke et al. 1999;Schneider & Lombardi 2003).Several studies have shown that the weak-lensing peak counts provide a way to capture information from non-linear structures that is complementary to the information extracted by power spectrum (Lin & Kilbinger 2015;Peel et al. 2017;Ajani et al. 2020;Harnois-Déraps et al. 2021;Zürcher et al. 2022).The peaks identify regions of weak lensing map where the density value is higher, in this way they are particularly sensitive to massive structures.There are two different ways to record weak lensing peaks: as 1) local maxima of the signal-to-noise field or 2) local maxima of the convergence field.In both cases, they are defined as pixels of larger value than their eight neighbors in the image.

Fast and Differentiable Lensing Simulations
Analytical models with which to predict the observed signals do not exist for most higher-order summary statistics.To circumvent this issue, one approach is to rely on generating a suite of numerical simulations.In the following section, we introduce our weak lensing map simulation procedure, including a description of the N-body simulator and the lightcone construction.

FastPM/FlowPM
Numerical simulations provide a practical way to model the highly nonlinear universe and extract cosmological information from observation at different scales.
However, collision-less N-body simulations typically require significant computational effort in terms of time and CPU/GPU power, in particular, computing the gravitational interactions between the particles is typically the most time-consuming aspect and where most of the approximations are done.For that reason, several quasi N-body schemes have been developed to reduce the simulation time and the computational cost of full numerical simulations.Our weak lensing simulation tool is mainly based on the FastPM algorithm (Feng et al. 2019) and its FlowPM (Modi et al. 2021) implementation which provides a fast Particle-Mesh (PM) solver estimating the gravitational forces by computing Fast Fourier Transforms on a 3D grid.

Automatic differentiation through black-box ODE solvers
In this work, we extend the FlowPM approach by implementing the time integration of the Ordinary Differential Equations (ODEs) that describe the gravitational evolution of the particles in the simulation using a black-box ODE integrator.This is in contrast to the leapfrog integration method used in FastPM.One reason for this change is that adaptive ODE solvers are able to automatically adjust the time step of the simulation based on the desired accuracy for the result.Another reason for this approach is that modern automatic differentiation frameworks like Ten-sorFlow provide automatically differentiable solvers which significantly reduce the memory footprint of the simulation when computing the gradients, as will be described below.
We begin by describing the set of equations used in the simulation: with x and v the position and the velocity of the particles, a the cosmological scale factor, E(a) the ratio between the Hubble expansion rate and the Hubble parameter and F the gravitational force experienced by the dark matter particles in the mesh.
To evaluate the gradients of the solution with respect to input cosmological parameters, it is therefore necessary to backpropagate through the ODE solver.Very recently the adjoint sensitivity method Chen et al. (2018); Pontryagin et al. (1962) has gained a lot of attention in the field of deep learning, as it allows to compute these gradients by solving a second ODE backward in time and treat the ODE solver as a black box.
Consider an ODESolve(z(t 0 ), f, t 0 , t 1 , θ), where z(t) is the state variable, f the function modeling the dynamics, t 0 the start time, t 1 the stop time and θ the dynamic parameter.The function F of its output can be differentiated with respect to the input θ.First, we need to compute the adjoint a(t) = ∂F /∂z(t), i.e. the gradient of F respect to the hidden state z(t).Then, we can determine the dynamics of the adjoint through: Finally, we compute the gradients with respect to the parameters θ evaluating a third integral: All the integrals are evaluated in a single call to the ODE solver, and the Jacobian is computed by automatic differentiation.The choice to extend the FlowPM code with the ODE implementation is motivated by the fact that to compute the gradient of the forward model, the original algorithm needs to store all the intermediate steps of the simulations.This induces a memory overhead that scales with the number of time steps in the simulation.In the adjoint ODE approach, this is instead replaced by solving another ODE backward in time when evaluating the gradient.We illustrate the potential of differentiating through ODE solvers, highlighting the fact that the simulations and the gradients presented in this paper are computed using one single GPU for 128 3 particles.

Hybrid Physical-Neural ODE
PM simulations can be used as a viable alternative to full N-body to model the galaxy statistics and create fast realizations of largescale structures at lower computational cost.Nevertheless, these kinds of simulations lack resolution on small scales and are not able to resolve structures with scales smaller than the mesh resolution.To compensate for the small-scale approximations and recover the missing power, we adopt an Hybrid Physical-Neural (HPN) approach presented in Lanzieri et al. (2022).The correction scheme we implement consists in computing the short-range interaction as an additional force parameterized by a Fourierspace Neural Network.This residual force is modeled by applying a learned isotropic Fourier filter acting on the PM-estimated gravitational potential φ PM : where F −1 is the inverse Fourier transform and f θ (a, |k|) is a Bspline function whose coefficients are the output of the Neural Network of parameters θ trained using the CAMELS simulations (Villaescusa-Navarro et al. 2021).In particular, we use a single CAMELS N-body simulation at the fiducial cosmology of h = 0.6711, n s = 0.9624, M ν = 0.0 eV, w = −1, Ω k = 0., Ω m = 0.30, σ 8 = 0.8.We adopt a loss function penalizing both the positions of the particles and the overall matter power spectrum at different snapshot times s: where λ is an hyper-parameter balancing the contributions of the two terms.By comparing the results obtained from different values of λ in the fiducial setting and outside the training regime, we find that λ = 0.1 provided the optimal balance in terms of overall correction and overfit.
In Lanzieri et al. (2022) we tested the robustness of the HPN correction scheme to changes in resolution and cosmological parameters, i.e. we applied the correction parameters found with the setting described above to simulations of larger volume or different Ω m and σ 8 .We observed that most of the missing power that characterizes the matter power spectrum in the PM approximation is still recovered when the HPN correction is applied to simulations different from the ones used to train the Neural Network.So, from these tests, we can conclude that the HPN is still robust to differences among simulation settings.

Differentiable Lensing Simulations
To extract the lens planes and construct the lightcone, we export 11 intermediate states from the N-body simulation of a fixed interval of 205 h −1 Mpc in a redshift range between z = 0.03−0.91.
To recover the redshift range of the lightcone, one unit box is replicated using periodic boundary conditions.First, we generate rotation matrices along the three axes, hence, each snapshot is rotated around each of the three axes, finally, all the particles are randomly shifted along the axes.To obtain the final density field, each snapshot is projected in a 2D plane by estimating its density with a cloud-in-cell (CiC) interpolation scheme (Hockney & Eastwood 1988).After creating a Cartesian grid of coordinates, each slice is interpolated onto sky coordinates.This procedure differs from the one implemented in the MADLens package (Böhm et al. 2021).In MADLens the lightcone is built by translating the redshift of the particles into distances, then the particles are projected onto the convergence map at the proper evolution step corresponding to that distance.

Implementation of Born lensing
We generate the convergence map by integrating the lensing density along the unperturbed line of sight, i.e. applying the Born approximation (Schneider 2006).In particular we discretize the Equation 9 such that it becomes: where the i index runs over the different lens planes, the δi indicates the matter overdensity projected into the lightcone, χ s defines the comoving distance of the source and ∆χ is the width of the lens plane.

Implementation of IA with NLA
We model the effect of IA on the convergence map level following the model proposed by Fluri et al. (2019).This allows us to create pure IA convergence maps to combine with shear convergence maps in order to generate a contaminated signal.Following Harnois-Déraps et al. ( 2021), the Fourier transform of the intrinsic ellipticities can be phrased as: where σ g defines the smoothing scale of a two-dimensional smoothing kernel G 2D , the tilde symbols ∼ refers to the Fourier transformed quantities and k ⊥ denotes the two Fourier wavevector components perpendicular to the line of sight.Combing Equation 11and Equation 31 we can calculate the intrinsic alignment as part of the convergence map: where the index i refers to the i-th redshift bins.

Differentiable Peak Counts
One of the difficulties in estimating derivatives of traditional peak count statistics is that it relies on building an histogram of peak intensities, and histograms, due to the discrete nature of the bins, are not differentiable.However, the underlying idea of peak counting is just to build an estimate of the density distribution of Article number, page 5 of 18 A&A proofs: manuscript no.journal_paper number of peaks as a function of their intensity.Histograms are one way to build such an estimate, and have been historically preferred, but for no particular reason.To circumvent the nondifferentiability of histograms, we prefer here to estimate this density using an alternative method: Kernel Density Estimation (KDE).As a continuous equivalent to an histogram, KDEs are differentiables and can just as well be used to define the peak counts statistic.We define the KDE for the peak counts as: where b w is the smoothing bandwidth parameter, X is the number of peaks in a given bin, and x is the center of each bin.This procedure yields a peak count statistic that is smoothly differentiable with respect to the input map and thus can be used for applications such as Fisher forecasting as discussed later in this paper.

Validating simulations for LSST
In this section, we compare the results from our simulation to other works, including the analytic models for the matter power spectrum Halofit (Smith et al. 2003;Takahashi et al. 2012) and the cosmological N-body simulations, κTNG (Osato et al. 2021).
The κTNG simulations is a suite of publicly available weak lensing mock maps based on the cosmological hydrodynamical simulations IllustrisTNG, generated with the moving mesh code AREPO (Springel 2010).In particular, we use the κTNG-Dark suit of maps based on the corresponding dark matter-only TNG simulations.The simulations have a side length of the box equal to 205 Mpc/h and 2500 3 CDM particles.To model the propagation of light rays and simulate the weak lensing maps, a multiplelens plane approximation is employed.The simulation configuration consists of a map size of 5 × 5 deg 2 , 1024 × 1024 pixels and a resolution of 0.29 arcmin/pixel.For a complete description of the implementation see Osato et al. (2021).
To produce our simulations, we follow the evolution of 128 3 dark matter particles in a periodic box of comoving volume equal to 205 3 (h −1 Mpc) 3 , with initial conditions generated at z=6 using the linear matter power spectrum as implemented by Eisenstein & Hu (1998).In particular, we implement the Eisenstein-Hu transfer function in the Tensorflow framework, in order to compute its gradients automatically.
The actual choice of bins to include in the forecasting is made following the DESC data requirement for the angular power spectra (Mandelbaum et al. 2018), i.e. adopting max,shear = 3000 and min,shear = 300.

HPN validation
To compensate for the small-scale approximations resulting from PM, we applied the HPN approach presented in subsubsection 3.2.1.We show on Figure 1 the power spectrum and the fractional power spectrum of PM simulations before and after the HPN correction compared to analytic Halofit predictions (Smith et al. 2003;Takahashi et al. 2012) for redshift z = 0.03 and z = 0.91.We observe a bias between our measured power spectrum and the theoretical prediction at low k.This reduced power is explained by the small box size of our simulation and the associated reduced number of large-scale modes.At redshift z = 0.91 most of the missing power is recovered by the HPN correction up to k ∼ 1, after which the method overemphasizes the small-scale power.In this article however, we can assume that this effect does not impact the results of the cosmological parameters forecast, since it concerns scales that are beyond the range of frequencies that are taken into account for the analysis.At redshift z = 0.03, the correction model does not improve significantly the results.
In Figure 2 we show an example of our convergence map at z = 0.91, from pure PM simulation (first panel) and the HPN corrected simulation (second panel).The HPN model sharpens structures in the lensing field without introducing any artifacts.
In the upper panel of Figure 3, we present the angular power spectrum computed from our Differentiable Lensing Lightcone (DLL hereafter) complemented by the HPN scheme and a conventional DLL simulation with the same resolution.Both the outputs are compared to the κTNG prediction.In the lower panel of Figure 3 the fractional differences between the convergence power spectra from the two maps and the κTNG are shown.Both the power spectra and ratios are averaged over N = 100 realisations.We can see that the HPN model reduces the relative deviations of the angular power spectra to within 30%.We also observe a perfect match at large scales, since the κTNG and the DLL simulations have the same box size of 205 Mpc/h 3 .

IA validation
In the upper panel of Figure 4, we present the C II and C GG contributes from our DLL simulations compared to theoretical Halofit predictions (Smith et al. 2003).In the lower panel of the same figure, we show the fractional differences between the mentioned contributions.To validate the IA infusion, only for this experiment, we run simulations keeping the term A IA = 1.
As we can see, the fractional difference for the C II term features uncertainty consistent with C GG term, validating our infusion process.The signal is computed for the source redshift z s = 0.91 and is averaged over 100 realizations.The theoretical predictions are computed using the public Core Cosmological Library (CCL, Chisari et al. (2019)).

Lensing C
To quantify the accuracy of the simulations we aim to reproduce the summary statistics from the Dark Matter Only κTNG simulations.We compare the results from the angular power spectrum for different source redshift, just investigating how well we can recover the power spectrum for a given source plane.
The results of the angular power spectrum from the sources redshift z= [0.25,0.46,0.65,0.91,1.30]are shown in the upper panel of Figure 5, as well as the fractional differences between the κTNG and DLL maps in the lower panel.We observe that the differences for z s = 0.91 and z s = 1.30curves are within 10% of accuracy for scales larger than = 1000, within 25% for scales 1000 < < 2000 and within 25% and 45% for scales 2000 < < 3000.For lower source redshifts the deficit of power in our simulations becomes worse.This can be explained considering that a given value of at lower redshift corresponds to smaller scales, some of those below the resolution of our simu-lations.We conclude that, if for z=0.91 and z=1.30we have a general agreement with κTNG, with this specific setting of the model, we can not model correctly cases with sources redshift lower than z=0.91.We want to highlight that the results shown are produced keeping the resolution of the simulations extremely low, and we do not aspect to get the same precision as κTNG.The purpose of these tests, and the overall goal of the paper, is to present a proof of concept of the DLL package and its potential.In practice, we will not work at this resolution.
Nevertheless, note that the simulations presented here already achieve a similar resolution of the MassiveNus simulations (Liu et al. 2018), despite being generated using one single GPU.

Peak counts
We compute the starlet peak counts as wavelet coefficients with values higher than their eight neighbors.We define J max = 7 in Equation 21, this starlet filter applied to our map with a pixel size of 0.29 arcmin, corresponds to a decomposition in 7 maps of resolution [0.59, 1.17, 2.34, 4.68, 9.33, 18.79, 37.38] arcmin and a coarse map.To satisfy the survey requirement and keep the analysis centred in the range = [300, 3000], we consider only the scales corresponding to [9.33, 18.79, 37.38] arcmin.The peaks are counted for 8 linearly spaced bins within the range As for the power spectrum, we compare the peak counts statistic from our map to the one from the κTNG for different redshift bins.We present the results in subsection 4.4.These results are shown in terms of S/N, where the signal to noise is defined as the ratio between the amplitude of wavelet coefficients over the noise expected for our survey choice.At wavelet scale θ = 9.33 arcmin the differences for the z s = 0.91 curves are within the 20% up to S /N = 3, for S /N > 3 the accuracy is between the 20% and the 50%.At larger scale, θ = 18.79 arcmin the accuracy is within the 20%.Finally, at θ = 37.38 arcmin the accuracy is within the 15%, except S /N < 1 where the accuracy decreases up to 28%.The results slightly improve for z=1.30, showing an accuracy within the 35% for scale θ = 9.33 arcmin, within the 10% for θ = 18.79 arcmin and 25% for θ = 37.38 arcmin.As for the power spectrum case, we observe higher dis-Article number, page 7 of 18 A&A proofs: manuscript no.journal_paper  crepancies at lower redshift, hence we can conclude that, with the current setting of our simulation, we can not model correctly such redshift.

Application: Fisher forecast
As an example of application of differentiable simulations, we aim to investigate the degeneracy between the cosmological parameters in high dimensional space and when systematics, such as the intrinsic alignment, are included in the analysis.Thanks to automatic differentiation, taking the derivative through the simulation with respect to the initial cosmological and nuisance parameters is now possible, thus allowing, among other things, for Fisher forecasts.In this section, we briefly introduce the Fisher forecast formalism.We also describe in detail the specific choices for the analysis we use throughout.

Forecast formalism
Fisher forecast is a widely used tool in cosmology for different purposes, e.g.investigate the impact of systematic sources or forecast the expected constraining power of the analysis (Tegmark et al. 1997).It can be thought as a tool to forecast error from a given experimental setup and quantify how much Lower panel: The fractional difference between the theoretical and simulated C II and C GG contributes.We can see that we measure a reduced power spectrum at low compared to the theoretical predictions.This can be explained by the small volume of our simulation and the related low number of large-scale modes.The power spectra and ratios are means over 100 independent map realisations and the shaded regions represent the standard deviation from 100 realisations.information we can extract from it.The Fisher matrix is defined as the expectation value of the Hessian matrix of the negative log-likelihood L(C( ); θ): where we indicate with θ the cosmological parameters or any systematics included in the simulation.If we assume a Gaussian likelihood and a Covariance matrix C i j independent from θ, we can calculate the Fisher matrix as were we indicate as ∂µ ∂θ α the derivatives of the summary statistics w.r.t the cosmological or nuisance parameters evaluated at the fiducial values.So, under the assumption of Gaussian likelihood, the Fisher information matrix provides a lower bound on the expected errors on cosmological parameters.

Analysis choices
To perform our study we use a single source redshift at z=0.91.Specifically, we generate 5000 independent map realisations to  which we add Gaussian noise with mean zero and variance: where we set the shape noise σ e = 0.26, the pixel area A pix = 0.29 arcmin 2 and the galaxy number density n gal = 20 arcmin −2 .We assume a parameter-independent covariance matrix computed as: where N is the number of independent realizations, x r i is the value of the summary statistics in the i th bin for a given realization r, and µ is the mean of the summary statistics over all the realization in a given bin.In addition, we adopt the estimator introduced by Hartlap et al. (2007) to take into account the loss of information caused by the finite numbers of bins and realizations, i.e. we compute the inverse of the covariance matrix as : where C * is the covariance matrix defined in Equation 37.As mentioned before, we want to focus on a fair comparison between the power spectrum and the peak counts method.To be sure we are considering the same scales for both statistics, we apply a wavelet pass-band filter to the maps to isolate particular scales before measuring the power spectrum.We use the same scales used for the Peak counts, i.e. we decompose the noisy convergence map in seven images, we sum back only the three maps corresponding to [9.33, 18.79, 37.38] arcmin and compute the angular C on the resulting image.An example of the C computed for each individual starlet scale image and for the summed image is depicted in Equation 5.2.
For each map, we compute the angular power spectrum and the peak counts by using our own differentiable code implemented in the TensorFlow framework. 1he derivatives with respect to all parameters are evaluated at the fiducial cosmology as the mean of 1500 and 2600 independent measurements for the peak counts and the C respectively.Indeed, while the peak counts reach the convergence with ∼ 1500 simulations, the C proves to be more sensitive to noise and thus, requires more realizations to convergence.In Figure B1 in appendix B we test the stability of the Fisher contours by changing the number of simulated maps used to compute the jacobian.
The priors used in the forecast process, are listed in Table 1 following Zhang et al. (2022).To take into account the par- tial coverage of the sky, we scale the Fisher matrix by the ratio f map / f survey , where f map is the angular extend of our κmap f map = 25 deg 2 and f survey corresponds to the size of the convergence maps for Stage IV-like survey f survey = 15000 deg 2 .

Results
We now compare the relative constraining power of the two statistics described in subsection 2.2 using the Fisher matrix formalism.As mentioned before, our interest is to investigate the sensitivity of the two weak-lensing statistics when systematic, such as the intrinsic alignment, and more cosmological parameters are included in the forecast.The results presented here are obtained from one single source redshift at z=0.91, assuming the fiducial cosmology and survey requirement presented in section 4 and subsection 5.2.The fiducial and priors ranges of the parameters are listed in Table 1.
Figure 8 shows the 2σ contours on the full ΛCDM parameter space and intrinsic alignment term considered for the two analyses.The contours obtained by the angular C analysis are plotted in grey, the ones for the peak counts in yellow.We find that in constraining Ω c , σ 8 and A IA peak counts outperform the power spectrum, while h, n s and Ω b parameters, within the limit of our setting, are not constrained by either and are prior dominated.
This is an interesting result, confirming the higher constraining power of weak-lensing peak counts as found in Ajani et al. (2020), especially considering that the two studies differ in multiple aspects.The most important difference between these two analyses is the parameters they include.Whereas Ajani et al. (2020) derive constraints on the sum of neutrino masses M ν , the total matter density Ω m , and the primordial power spectrum normalization A s , we include the five cosmological parameters of the ΛCDM model and intrinsic alignment amplitude.The constraining power of the peak count statistic keeps being higher even in high dimensional cosmological parameter space and when the intrinsic alignment is included.
The chosen angular scales differ as well.Ajani et al. (2020) consider angular scales in the range l = [300, 5000], while we focus, for both multiscale peak counts and C , on scale approximately corresponding to the range l = [300, 3000].Despite we are neglecting scales > 3000, containing a larger amounts of non-Gaussian information, we find that for mildly non-linear N (0,3) 0.0 scales we are considering, the peak counts statistic still constrains the cosmological parameters the most.Finally, we find that the contours on the galaxy intrinsic alignment are significantly better constrained by the peak counts.

Discussion
In this section, we discuss the limitations of the methodology and results obtained in this paper highlighting in particular strategies for future extensions and applications.
In this work, we only used a single source plane in our Fisher forecast analysis, which does not allow us to evaluate the full impact that IA would have in a tomographic analysis.In particular, we do not have a contribution from the GI term.Many studies have demonstrated that the tomographic analysis can significantly improve constraints on cosmological and IA parameters.(King & Schneider 2003;Heymans et al. 2004;Troxel & Ishak 2015).Although it is straightforward to generalize all the results shown in this paper to the tomographic case, this will require increasing the resolution of the simulation at lower redshifts (as illustrated by Figure 5) in order to model correctly low redshift bins.Since the maximum number of particles we can adopt in a simulation is closely limited to the GPU memory, we are building a distributed implementation of DLL, which will allow us to increase the resolutions of the simulations to the point of modeling correctly even the smaller scales at the lower redshifts.
Another direction for further development is the ray tracing methodology.In our method, we construct the weak lensing maps assuming the Born approximation.However, Petri et al. (2017) shows that for an LSST-like survey, while the Born approximation leads to negligible parameter bias for the power spectrum, it can lead to significant parameter bias for higherorder statistics.Hence, the natural next step will be to implement a ray-tracing algorithm beyond the Born approximation in our pipeline.We aim to adopt the multiple-lens-plane approximation (Blandford & Narayan 1986;Seitz et al. 1994;Jain et al. 2000;Vale & White 2003;Hilbert, S. et al. 2009), i.e. by introducing lens planes perpendicular to the line-of-sight, the deflection experienced by the light rays due to the matter inhomogeneities will be approximated through multiple deflections at the lens planes.More specifically, we will implement the memory-efficient ray-tracing scheme proposed by Hilbert, S. et al. (2009) in the Tensorflow framework.
On the theoretical modeling side, we studied the impact of the intrinsic alignment of galaxies assuming a linear coupling between the intrinsic galaxy shapes and the non-linear projected tidal fields, i.e. adopting the NLA model.This physical description for the IA is only an approximation since it does not take into account the tidal torque field.In future work, we aim to extend the NLA model by implementing the extended δNLA model, described by Harnois-Déraps et al. (2021).
Finally, we presented a tool based on only Dark matter simulations.We note that this would force us to perform conservative scale cuts in the inference analysis to not include scales affected by baryonic effects.A future prospect is to include baryonic effects in the analysis.One possible way applicable to our methodology could be to extend the Hybrid Physical Neural ODE approach and apply more sophisticated models to learn the physics that controls the hydrodynamics simulations.
We expect that the methods illustrated in this paper will be extended to different relevant use-cases.A particularly suitable example is related to the application of algorithms such as the Variational Inference and the Hamiltonian Monte Carlo that are widely used in the Bayesian inference context and were until now excluded due to the lack of derivatives.A further example is provided by Zeghal et al. (2022), which demonstrates that having access to the gradients of the forward model is beneficial to constrain the posterior density estimates.

Conclusions
In this paper, we have presented the Differentiable Lensing Lightcone (DLL) model, a fast lensing lightcone simulator pro-Article number, page 11 of 18 viding access to the gradient.We extended the public FlowPM N-body code, implementing the Born approximation in the Tensorflow framework to create non-Gaussian convergence maps of weak gravitational lensing.To allow DLL to run at low resolution without affecting significantly the accuracy, we complement the FlowPM N-body code with the Hybrid-Physical Neural scheme, a new correction scheme for quasi N-body PM solver, based on Neural Network implemented as a Fourier-space filter.We validate our tool by comparing the C and peak counts statistics against predictions from κTNG simulations.To do this, we run simulations following the evolution of 128 3 particles and we produce weak lensing convergence maps for several redshift sources.We show that, despite being generated at low computation costs, we recover a good match for redshift equal or higher than z = 0.91.To demonstrate the potential of our tool, as a first use case, we exploit the automatic differentiability of the simulations to do Fisher forecast.Thanks to back-propagation, accessing the derivative through the simulations w.r.t. the cosmological parameters and A IA parameter is possible at the same computational cost of the forward simulation.Assuming an LSST-like setting, we simulate weak lensing convergence maps for a single source redshift z = 0.91 and angular extend of 5 • , based on a periodic box of comoving volume equal to 205 h −1 Mpc.We compute the constraints on the resulting convergence maps with the starlet peak counts and use a wavelet-filtered lensing power spectrum as a benchmark for the comparison.Within the limits of the analysis choices made in this study, we obtain the following results: -We confirm that the peak count statistics outperform the twopoint statistics as found in Ajani et al. (2020), even in high dimensional cosmological and nuisance parameter space.-We find the peak counts to provide the most stringent constraints on the galaxy intrinsic alignment amplitude A IA .
To conclude, the framework outlined here can provide many advantages in the context of cosmological parameter inference: it is the first step in the development of fully differentiable inference pipelines for weak lensing, it is a fast tool to further explore the sensitivity of higher-order statistics to systematics.

Fig. 1 .
Fig.1.Matter power spectrum and fractional matter power spectrum of PM simulations before and after using the Hybrid Physical-Neural (HPN) correction model and the theoretical halofit model for redshift z=0.03 (upper panel) and redshift z=0.91 (lower panel).The power spectra and ratios are means over 100 independent map realisations.The shaded regions represent the standard deviation from 100 independent DLL realisations.

Fig. 2 .
Fig. 2. Left panel: Convergence map at source redshift z = 0.91 from DLL, PM only.Right panel: Same convergence map when the HPN correction is applied.

Fig. 3 .
Fig.3.Upper panel: Angular power spectra of PM simulations before and after using the Hybrid Physical-Neural (HPN) correction model compared to the κTNG prediction.Lower panel: fractional angular power spectrum of PM simulations before and after using the Hybrid Physical-Neural (HPN) correction model and the κTNG prediction.The power spectra and ratios are means over 100 independent map realisations and the shaded regions represent the standard deviation from 100 DLL realisations.The spectra are computed for the source redshift z s = 0.91.

Fig. 4 .
Fig. 4. Upper panel: The C II and C GG contributions from theoretical predictions (dashed line) and DLL simulations.Lower panel: The fractional difference between the theoretical and simulated C II and C GG contributes.We can see that we measure a reduced power spectrum at low compared to the theoretical predictions.This can be explained by the small volume of our simulation and the related low number of large-scale modes.The power spectra and ratios are means over 100 independent map realisations and the shaded regions represent the standard deviation from 100 realisations.

Fig. 5 .
Fig.5.Upper panel: Angular power spectra for 5 source redshift from our DLL maps compared to the κTNG predictions.Lower panel: Fractional angular power spectra of DLL simulations and κTNG simulations for different source redshift.The power spectra mean over 100 independent map realisations and the shaded regions represent the standard deviation from 100 independent DLL realisations.

Fig. 6 .
Fig.6.Fractional number of peaks of DLL simulations and κTNG simulations for different sources redshift.The peak counts distributions are shown for each starlet scales resolutions used: 9.34 (upper panel), 18.17 (center panel), 37.38 arcmins (lower panel).The results mean over 100 independent map realisations and the shaded regions represent the standard deviation from 100 independent DLL realisations.

Fig. 7 .
Fig. 7. Example of the filtered C used for the analysis.The colored lines show the C computed on maps with a different resolution of the starlet decomposition.Specifically: the blue line (multiscale map) corresponds to C computed on the summed image, the black dashed line (Original map) corresponds to the standard C computed from a non-filtered map.

Fig. 8 .
Fig.8.2σ contours derived for one single source redshift at z=0.91 and the survey setup presented in section 4. The constraints are obtained by applying the starlet Peak counts (yellow contours) computed on noisy maps filtered using a starlet kernel of[9.33,18.70, 37.38] arcmin together and the wavelet pass-band filter for the C statistics (grey contours) as described in section 4. The dashed black lines are located at the fiducial parameter values.

Fig
Fig.A1.2σ contours derived for one single source redshift at z=0.91 and the survey setup presented in section 4. We compare the Fisher matrix constraints on cosmological parameters and A IA amplitude obtained with the C from our mock maps (orange) and the theoretical C (blue) obtained from the public library jax-cosmo(Campagne et al. 2023).In both cases, the constraints are obtained by applying the wavelet pass-band filter for the C as described in section 4. The dashed black contours are the prior used for the forecasting.
Article number, page 8 of 18 Denise Lanzieri et al.: Forecasting the power of Higher Order Weak Lensing Statistics with automatically differentiable simulations

Table 1 .
Prior and fiducial values used for the forecasting.