DISCO-DJ I: a differentiable Einstein-Boltzmann solver for cosmology

We present the Einstein-Boltzmann module of the Disco-Dj (DIfferentiable Simulations for COsmology — Done with J ax) software package. This module implements a fully differentiable solver for the linearised cosmological Einstein-Boltzmann equations in the Jax framework, and allows computing Jacobian matrices of all solver output with respect to all input parameters using automatic differentiation. This implies that along with the solution for a given set of parameters, the tangent hyperplane in parameter space is known as well, which is a key ingredient for cosmological inference and forecasting problems as well as for many other applications. We discuss our implementation and demonstrate that our solver agrees at the per-mille level with the existing non-differentiable solvers Camb and Class, including massive neutrinos and a dark energy fluid with parameterised equation of state. We illustrate the dependence of various summary statistics in large-scale structure cosmology on model parameters using the differentiable solver, and finally demonstrate how it can be easily used for Fisher forecasting, with a forecast for Euclid as an example. Since the implementation is significantly shorter and more modular than existing solvers, we believe it will be more straightforward to extend our solver to include additional physics, such as additional dark energy and dark matter models, modified gravity, or other non-standard physics in the future.


Introduction
Describing the evolution of perturbations of Einstein's field equations in a cosmological setting is an almost century-old quest and the foundation of inhomogeneous cosmology as much as that of all of extragalactic astrophysics.These perturbations, leading to deviations from the homogeneous Friedmann-Lemaître-Robertson-Walker (FLRW) spacetimes and the growth of structure, are described by the Einstein-Boltzmann equations, which are a set of coupled partial differential equations that describe the evolution of the energy density of the various components of the Universe along with the perturbations to the FLRW metric.First derivations of the equations date back many decades [1,2] and subsequent revisions provide the foundation of all of modern inhomogeneous cosmology [3][4][5].
The Einstein-Boltzmann equations are a cornerstone of modern cosmology, and are the basis for the computation of the cosmic microwave background (CMB) anisotropies, the matter power spectrum, and other observables.The matter power spectrum, or its constituents, the individual transfer functions of the various matter components, is the starting point for all predictions of large-scale structure observables, such as the galaxy n-point spectra, weak lensing spectra, the halo mass function (in analytic models), and in particular also the initial conditions for all cosmological simulations.Building on the linear Einstein-Boltzmann equations, the non-linear evolution of structure is then typically described in the non-relativistic collisionless limit by the Vlasov-Poisson system of equations, which is the basis for all nonlinear perturbation theory and all N -body simulations (e.g.[4,6,7]).
Several famous software packages have been developed that integrate the linearised Einstein-Boltzmann equations.The most commonly used (historically) include the Cosmics/Linger code [5], followed by a version specifically optimised for CMB predictions, the CMBfast code [8], which included the line-of-sight approximation which allowed a dramatically faster computation of CMB spectra.The increasing requirements of the WMAP [9] and Planck [10] CMB missions for efficient codes have led to the highly optimised and accurate CAMB1 [11] and CLASS2 [12,13] codes.A recent addition is PyCosmo3 [14] which relies on automatic translation of symbolic expressions to compiled code.Both Camb and Class feature a large number of optimizations (e.g.approximate asymptotic forms of the equations valid in various regimes, robust high-order implicit integrators), are highly optimised, and are written in compiled functional languages -Fortran (CAMB) and C (CLASS) -but both now have convenient Python interfaces.
Both packages share a common minimal set of implemented equations (see [5,15]), which model the evolution of cold dark matter (CDM), baryons, photons, massless and massive neutrinos, a dark energy fluid, as well as metric perturbations.While already the first codes included these energy components, Camb and Class have by now been extended to include a plethora of additional models, including various dark energy and modified gravity models [16][17][18], and other variations such as scalar field dark matter [19,20], early dark energy [21], and many more.
In this article, we present the Einstein-Boltzmann (EB) module of a new software package, Disco-Dj (DIfferentiable Simulations for COsmology -Done with Jax), that implements a fully differentiable Einstein-Boltzmann solver in the Jax framework [22].This solver is based on the Diffrax Python package [23], which (among others) implements a high-order implicit Runge-Kutta solver for stiff ordinary differential equations (ODEs).All code (including the Diffrax package) is built on top of the Jax framework, which provides automatic differentiation of Python code, and thus allows computing the Jacobian matrix of the Einstein-Boltzmann system as well as functions of its results with respect to all input parameters.This is a key ingredient for many applications relying on gradients, including cosmological inference and the computation of the Fisher matrix computation for forecasting or optimal data compression.
In addition to differentiability, the Jax framework is also optimised for execution on graphics processing units (GPUs), which allows for a significant speed-up of the computation of the Einstein-Boltzmann system, in particular when batching over multiple cosmological models, which is an embarrassingly parallel calculation.
The current version of the EB module of Disco-Dj (referred to as Disco-Eb in what follows) has a very lean and modular code base and is thus easy to extend and modify beyond its current minimal implementation from a programming point of view.In particular, we hope that in the future additional physics modules, such as dark energy models, modified gravity, or other non-standard physics will be added by the community.Eventually, we expect that also other deliverables beyond what is shown in this paper will be added, such as e.g. the CMB anisotropies or the lensing potential.Further, the EB module will be connected to a fast N -body solver, which we are also developing as part of the Disco-Dj package and which allows the inclusion of non-linear effects in the computation of the matter power spectrum and other key statistics and predictions for large-scale structure cosmology.We will present this in a future publication.
The structure of this article is as follows: In section 2 we discuss the implementation of the Einstein-Boltzmann solver in Disco-Dj, giving attention to differentiable solvers, implicit integrators for stiff ODEs, approximation schemes, and the thermal history solver.In section 3 we validate the accuracy of the solver against the commonly used Camb and Class codes.We then discuss various applications of a differentiable Einstein-Boltzmann solver, including the computation of the dependence of various summary statistics on cosmological parameters, and the computation of the Fisher information matrix in section 4. Finally, we summarise and conclude in section 5.An overview of the equations implemented in the Einstein-Boltzmann module of Disco-Dj is given in Appendix A.

Implementation
In this section, we discuss aspects of the implementation of the linear Einstein-Boltzmann module of Disco-Dj.

Overview
The Einstein-Boltzmann equations are a set of non-linear coupled partial differential equations that describe the evolution of the energy density of the various components of the Universe along with the perturbations to the FLRW metric.A fully self-consistent nonlinear solution all the way to late times and small scales is however so far computationally untractable (but see [24] for a second-order EB solver).Instead, in cosmology, one typically splits the evolution into a linear and a non-linear part.The linear part is described by the linearised Einstein-Boltzmann equations.The non-linear part is typically described in the non-relativistic collisionless limit by the Vlasov-Poisson system of equations, which is solved with N -body methods (e.g.[7]), using input from the linear Einstein-Boltzmann solver.
In this work, we focus on the linear Einstein-Boltzmann equations including CDM, baryons, photons, massive and massless neutrinos, and clustering dark energy.The governing equations are stated in full in Appendix A for reference.The same equations are also implemented in the popular Camb and Class solvers.Those solvers also include additional physical models (e.g.early dark energy, dark radiation, etc.) that we do not include (yet).
The new aspect of our work is that we implement the Einstein-Boltzmann equations in the Jax framework, which allows us to compute the Jacobian matrix of the Einstein-Boltzmann system (which is crucial input for implicit ODE solvers) as well as of any function of its results with respect to all parameters.For this reason, in this article, we first discuss the aspect of automatic differentiation in subsection 2.2 below.The linearised Einstein-Boltzmann equations are so-called 'stiff' coupled ODEs, which are most efficiently solved by implicit integrators, an aspect we discuss in subsection 2.3.Solvers such as Camb and Class employ a number of numerical approximation schemes in order to reduce the large number of equations needed to be solved and optimise the time-to-result for a given precision, which is critical for the use in data analysis.We discuss these aspects in subsection 2.4.Finally, we also discuss our choice of thermal history solver and its performance in subsection 2.5.

Automatic differentiation
The autodiff paradigm [25] has seen rapidly increasing interest recently due to the machine learning revolution.Traditionally, the computer-aided computation of derivatives could be subdivided into symbolic and numerical differentiation methods.Symbolic differentiation involves algorithmically determining the derivative of a function as a new symbolic expression by applying differentiation rules such as the chain rule, the product rule, etc.This approach is implemented in software packages such as the popular Mathematica program [26] and the Python module SymPy [27].While these programs are often able to find surprisingly simple expressions for derivatives that would be hard to compute by hand, more complex functions may result in lengthy expressions that are far from computationally optimal when being evaluated with concrete numerical values.On the other hand, numerical differentiation relies on finite difference approximations of derivatives.This introduces a hyperparameter, namely the step size, upon which the results can depend sensitively.In particular, numerical approximations of higher derivatives are oftentimes simply too noisy to be useful in practice.
More recently, the autodiff paradigm (also known as automatic differentiation) has become a popular third contender among computational differentiation methods and is now the de-facto standard in the field of machine learning.Automatic differentiation exploits the fact that programs perform the evaluation of any mathematical function of some inputs as a nested sequence of simple functions such as summation, multiplication, exponentiation, etc.The derivative of each of these building blocks is known, so when the program sequentially propagates the inputs through these blocks, the derivative information can be passed on alongside the computed values simply by using the chain rule.This particular flavour of autodiff is known as forward-mode differentiation.Alternatively, if the chain rule is instead performed starting at the outputs of the function and propagating backwards to the inputs, one obtains reverse-mode differentiation.Which of the two is computationally cheaper mainly depends on the dimensionality of the inputs and outputs (backward-mode for many inputs → few outputs, e.g. when computing a scalar loss function w.r.t. the millions of parameters of a neural network, forward-mode for few inputs → many outputs, e.g. when differentiating a field quantity w.r.t. a cosmological parameter).Keeping track of the derivatives with autodiff only entails a constant computational overhead for each of the function's building blocks.
In our case, the quantity of interest, namely the matter power spectrum P , is obtained as the solution of a system of differential equations, specifically the cosmological Einstein-Boltzmann equations.Thus, the computation of gradients such as ∂P/∂Ω m requires us to differentiate through the solution of the differential equations.This can be done in two different ways: (1) differentiate iteratively through the time steps performed by the numerical solver ("discretise-then-optimise" approach) or (2) set up a continuous "adjoint equation" for ∂P/∂Ω m that then must be solved backwards in time, also with a numerical method ("optimise-then-discretise" approach). 4Generally, the optimise-then-discretise approach yields less accurate derivatives because the numerical discretisation of the backwardsin-time equations required for the computation of the derivatives is not synchronised with that for the forward-in-time equations for the quantities themselves [28]; however, it is more memory-efficient than its discretise-then-optimise counterpart.
In this work, we rely on Google's array-based autodiff Python library Jax [22].Jax conveniently allows writing differentiable programs in high-level Python that are then compiled just-in-time (JIT) to execute efficiently on graphics or tensor processing units (GPUs, TPUs), making use of the fast Xla (Accelerated Linear Algebra) compiler.Syntactically, much of Jax follows the syntax of NumPy, making it easy for users with Python experience to switch from plain NumPy (which runs on central processing units, i.e.CPUs, and is neither compiled nor differentiable) to Jax.It supports forward-and reverse-mode differentiation, the computation of higher-order derivatives such as Hessian matrices, holomorphic differentiation, and the vectorised computation of derivatives, among other useful features.The Jax ecosystem is steadily expanding, and there exist Jax-based packages for machine learning (e.g.[29]), for physics simulations (e.g.[30]), probabilistic programming (e.g.[31,32]), and many other applications, which can be integrated seamlessly with custom Jax code.Importantly, we use the Jax-based Diffrax library for solving differential equations [23], which supports different flavours of "discretise-then-optimise" and "optimise-then-discretise" derivatives.The current version of Disco-EB supports both forward (via Diffrax's DirectAdjoint) and backward (via Diffrax's BacksolveAdjoint) differentiation.Other differentiable frameworks exist; notably e.g.Bolt5 implements a forward-differentiable Einstein-Boltzmann solver in Julia (but is unpublished at the time of writing).
In the past few years, Jax has garnered considerable interest in the cosmology community.For example, Ref. [33] presented a differentiable cosmology library using Jax, Ref. [34] introduced a Jax-based inference framework for cosmology, and the N -body code by Ref. [35] is fully differentiable thanks to its implementation in Jax.It has also been used in the context of estimating weak lensing shear [36] and for modelling the evolution of dark matter haloes [37].

Stiff ODEs and implicit solvers
The Einstein-Boltzmann equations, cf.Appendix A, are a stiff system of ODEs due to the multiple time scales involved in the various physical processes, which makes them ill-suited for explicit time integration schemes [38].Various approximations can be employed (cf.[13], and subsection 2.4) to reduce the stiffness of the system (see also [39]), but these approximations are generally not valid for the entire domain of integration.State-of-the-art solvers therefore use a combined approach of approximation schemes and a powerful high-order implicit time integration scheme [13,40].A main draw-back of implicit solvers is that the Jacobian matrix of the system needs to be computed and inverted at each time step, which is computationally expensive, particularly so for large systems such as the cosmological Einstein-Boltzmann equations.
A main advantage of the autodiff paradigm is that the Jacobian matrix of the system can be computed automatically without specifying it explicitly or having to recourse to finite-difference approximations.For Disco-EB, we adopt the Diffrax package [23] for this purpose, which comes with various high-order implicit and explicit integration and is also built on top of the Jax framework.Specifically, we adopt the highest order implicit solver implemented in Diffrax: the 7-stage Kvaerno5 solver [41], a singly diagonally implicit Runge-Kutta (ESDIRK) method of order 5/4.
Due to the large number of equations, it is in principle critical to exploit the sparse structure of the Jacobian matrix for efficient implicit solvers -such as done by Class which uses sparse storage for the Jacobian.However, sparse matrix operations in Jax are still in an experimental stage and thus, Diffrax does not (yet) support sparse matrices, which is the current performance bottleneck of Disco-EB.
After linearisation of the Einstein-Boltzmann system [5], the equations for individual wave numbers k decouple in Fourier space and can thus be solved individually.The resulting Jacobian for multiple wave numbers thus has a block-diagonal structure.Since such sparse Jacobian matrices can currently not be handled, an optimised joint integration of multiple wave numbers, which leads to a block diagonal Jacobian (since the different modes do not couple), is currently not possible.Storing the full Jacobian for multiple modes is computationally prohibitive.While we are, therefore, currently restricted to integrating mode-by-mode in an embarrassingly parallel way, there is plausibly room for future performance improvements along such lines.In applications where the Einstein-Boltzmann system needs to be solved for multiple sets of cosmological parameters, e.g. in the context of inference tasks, it is straightforward to batch the computation over those sets in order to increase the computational speed, harnessing the highly parallel architecture of GPUs.Currently, we use however the Jax vmap function to vectorise the computation of multiple modes on the GPU.

Approximation methods
In order to speed up the integration of the Einstein-Boltzmann system, for Class and Camb, it has been critical to employ various approximation schemes [13,40] in order to reduce the number of degrees of freedom of the high-dimensional system and to reduce the stiffness of the system.The situation is somewhat different for GPU-Jax-based solvers.While the Jacobian matrix of the system can be computed automatically (which is, in fact, more efficient than a finite difference calculation), the computation of the Jacobian matrix is still computationally expensive, and it is still desirable to reduce the number of degrees of freedom of the system.However, adaptively switching to a reduced set of variables based on a predefined criterion introduces additional algorithmic complexity, which is not desirable in the context of GPU-based solvers.Specifically, we found that neither of the commonly used approximation schemes yielded a O(1) performance improvement, and, while we implemented the tight coupling approximation [5,13] (valid when the baryons are fully ionised and tightly coupled to the photons), and all the approximations used in Class [13] (notably the ultra-relativistic fluid approximation, applying to massless neutrinos at late times, as well as the radiation streaming approximation, applying to massless neutrinos and photons at even later times) in Disco-EB, we do not currently use any of them.
A main performance bottleneck lies in the computation of the Jacobian matrix, which scales quadratically with the number of equations (see Appendix A).The number of equations scales with the number ℓ max of the multipole where the photon and neutrino distribution functions are truncated, and the number of massive neutrino mass bins.Specifically we have equations.The '8' entails the equations for metric+CDM+baryon+dark energy perturbations, ℓ γ,max is the maximum multipole moment of the photon distribution function, ℓ νUR,max is the maximum multipole moment of the massless neutrino distribution function, n q is the number of neutrino momentum bins, and ℓ νmassive,max is the maximum multipole moment of the massive neutrino distribution function.
A key optimisation, therefore, lies in a clever choice of neutrino momentum bins so that n q can be chosen as small as possible.Specifically, [42] found that a 3-5 bin Gauss-Laguerre inspired integral approximation used by Camb (see Appendix A of [42]) works very well in this context.The neutrino momentum bins are chosen so that the following integral is well approximated by a sum over the bins where q is the neutrino momentum, v is the (relativistic) neutrino velocity, ν ℓ is a rescaled function of the ℓ-th multipole moment of the neutrino distribution function, w = −1, 0, 1 (depending on which moment of the distribution function is to be computed, cf.eqs.( 55) of [5]), and K i are the kernel weights approximating the integral.The authors of [42] propose the following very accurate binning schemes: The truncation of the neutrino Boltzmann hierarchy ℓ ν,max = lnumax then determines the number of equations that need to be solved as the product of lnumax and the number of neutrino momentum bins.When neutrinos are sufficiently non-relativistic, lnumax can, in principle, be reduced to a very small number (2 or 3, cf.[42]).
For our first proof-of-concept version of Disco-EB, we decided to leave out all such further approximations, but we expect that a possible factor of 2-3 in speed could be gained by a careful evaluation of the various approximation schemes on integration of the ODEs on GPUs (specifically optimizing how to best change the number of equations integrating directly in the integrator).We leave this for future work.

Thermal history/recombination solver
Currently, we adopt a simplified thermal history solver in Disco-Dj, that is based on the original RecFast model [43].The original implementation of the thermal history in Cosmics (Section 5.8 of [5], which is based on [44] with addition Helium) is not accurate enough to obtain sub-per-cent-level agreement with Camb or Class.All state-of-the-art Einstein-Boltzmann solvers employ at least RecFast [43], or the newer CosmoRec [45] and HyRec [46] (which has been further developed into HyRec2 [47]) solvers.We found however that, since we are predominantly interested in the matter power spectrum and late times (rather than a CMB spectrum), adopting the original RecFast model is accurate enough for our purposes.While the thermal history solver is thus not the main source of error in the late time matter power spectrum, we are planning to eventually add a more accurate recombination solver in Disco-Dj.The current version implements the original RecFast model, that we dub 'RECFASTmini', as well as the much older MB95 model [5].
In Figure 1, we compare the ionization history of the baseline cosmology computed using the simplified models in Disco-EB with the RecFast and HyRec solvers interfaced in Class.The main difference is due to the different treatment of the helium recombination, with a longer phase of He + recombination in RecFast.As we will show below, for late times The main difference is due to details in the treatment of the combined hydrogen-helium recombination.None of these differences (except MB95) have a measurable effect late-time on matter components.
z ≲ 1000, the simplified 'RECFASTmini' model is sufficient to reach agreement of late time spectra with Class or Camb to below one per cent.

Validation and performance
We next validate the performance of our solver against the Camb and Class solvers.First, we compare the precision of our solver for a given set of cosmological input parameters.Second, we demonstrate that our solver is able to compute the Jacobian matrix of the system with respect to all input parameters, and validate the correctness of the Jacobian matrix by comparing it to finite-difference approximations.

Choice of fiducial cosmological model
For all comparisons, unless otherwise indicated, we adopt a flat (extended) ΛCDM cosmology, with the cosmological parameter set shown in Table 1, which is consistent with the Planck 2018 EE+BAO+SN constraints [48].Specifically, we consider the standard extension with inclusion of one massive neutrino, along with a dark energy fluid with parameterised (Chevallier-Polarski-Linder) equation of state w = p/ρ = w 0 + w a (1 − a).We adopt w 0 = −0.99instead of −1 since derivatives w.r.t.w 0 are defined only in the one-sided limit at w = −1, a complication we thus avoid (although defining one-sided derivatives is, in principle, also supported by Jax).  is chosen so that all densities add up to critical and thus Ω k = 0 serves as a constraint, not a parameter.

Comparison with CAMB and CLASS
In this subsection, we validate the performance of the Disco-Dj Einstein-Boltzmann solver against the well-established and commonly used Camb Since Disco-Dj is aimed at large-scale structure studies, we are mainly interested in the post-recombination regime.We use high accuracy settings for all three codes, listed in Table 2 and Table 3, for Camb and Class respectively, to boost the accuracy of the integration results.For Disco-EB, we adopted a maximum ℓ of the hierarchy truncation of 32, as well as the 5-point neutrino momentum integration.Note that we found that Class enables by default some optimizations which can lead to relatively large differences, notably we had to disable some optimizations for massive neutrinos and the dark energy fluid.
In all tests, we employ the cosmological parameters listed in Table 1 as our baseline cosmological model and compute the evolution from adiabatic (isentropic) initial conditions using Disco-EB, Camb, and Class.We compare the resulting power spectra for baryons, CDM, the massive neutrino, as well as the combined total matter (CDM+baryon+massive In all cases, we find excellent agreement between the three codes, with relative differences of at most a few per mille for all quantities of interest, with the sole exception of the massive neutrino perturbations.Generally the difference is similar w.r.t.both Class and Camb, with possibly slightly better agreement between Disco-EB and Class for everything but the massive neutrinos spectrum, and slightly better agreement between Disco-EB and Camb for the massive neutrinos.We find that the spectra of massive neutrino perturbations differ at more than 1 per cent for k ≳ 1 Mpc −1 , where they are suppressed by about 10 −6 relative to the CDM perturbations due to free streaming.In practice, these differences are therefore not significant for the matter power spectrum, and entirely attributable to the chosen numerical precision. A similar comparison at z = 99 (a = 10 −2 ) (shown in Figure 8 in the appendix) yields very similar results with few per-mille differences in all perturbations.Slightly larger differences exist for baryons, presumably due to differences in the recombination solver used.To improve agreement even further at higher z, we would therefore have to improve the quality of the recombination solver, which we leave for future work (see discussion in subsection 2.5).Again, larger differences exist for massive neutrinos in the free-streaming suppressed regime, which are however of no real importance.
All in all, the agreement is therefore excellent, and we conclude that Disco-EB is able to reproduce the results of Camb and Class at the per mille accuracy level for all quantities of interest.We caution that the tiny differences between the results of Camb and Class might still be reduced by more optimal parameter choices.

Gradient of the power spectrum w.r.t. cosmological parameters
Having a fully differentiable Einstein-Boltzmann solver permits taking derivatives with respect to any parameter of the model.In this subsection, we demonstrate this capability by computing the Jacobian matrix of the total matter (CDM+baryon+massive neutrino) density power spectrum at fixed time (a = 1) with respect to all cosmological model input parameters.This is achieved by wrapping the power spectrum calculation in a Jax function, which takes the parameters as an argument, and then calls the jacfwd method of the Jax library.Specifically, let ϑ ∈ R n be the vector of cosmological model input parameters, where n = 12 for our baseline cosmology (see Table 1), and specifically In principle, all physical constants and other parameters, including e.g.numerical parameters, could be added here.Let P (k, a | ϑ) be the vector of power spectrum values at fixed time, then we are interested in computing the logarithmic Jacobian The result of the autodiff computation is shown in Figure 3 as a solid black line.The respective component of the Jacobian is indicated by a label in each plot.For comparison, we also show the finite-difference approximation of the Jacobian computed with Camb and Class (note that we have adopted ℓ max = 32 for Class in this case).Specifically, we compute the finite difference approximation of the Jacobian as which means that for second-order finite differences two calls to the EB solver are needed per model parameter.We show the results for ϵ = 10 −2 in Figure 3.The top panel in each case represents the value of the derivative as a function of the wave number, while the bottom panel shows the relative difference between the Disco-EB autodiff computation and the finite difference solutions using Camb and Class.
We note that the autodiff evaluation and the finite difference approximation agree generally very well to mostly better than one per cent.We found that a good bias-variance tradeoff requires ϵ ≈ 10 −2 , while for smaller values, the finite difference estimates become very noisy.Nonetheless, where visible, the autodiff version shows fewer signs of numerical artefacts.The only significant difference is found for the neutrino mass derivative w.r.t. the finite difference result for Class, which is attributable to the reduced accuracy we used (specifically not setting ncdm fluid approximation as this leads to prohibitive runtimes).We note that the finite difference approximation is computationally expensive, since it requires two calls to the EB solver per parameter, while the autodiff computation is computationally much cheaper.Due to the large impact of optimizations and specific parameter choices on the performance of the EB solvers, we prefer to refrain from a detailed comparison of the computational performance of the autodiff and finite difference approaches.
Finally, in Figure 4, we show full time and space dependence of the derivative of the total matter power spectrum (cdm+baryons+massive neutrino) w.r.t. the model parameters.Each panel shows the colour-coded change in the power spectrum, with red colours indicating an increase and blue a decrease, as a function of scale factor (y-axis) and wave number (xaxis).A white colour indicates that the respective parameter has no impact on the matter power spectrum at that time and scale.For this analysis, we extend this (qualitative) analysis to earlier times (a = 10 −5 ) revealing how the BAO feature propagates to increasingly larger scale prior to decoupling at a ≈ 10 −3 .Also intuitively, the neutrino mass dependence clearly shows when the neutrino becomes non-relativistic at z ≲ 100, while the signature of the dark energy EOS parameters w 0 , w a and c 2 a is restricted to very late times z ≲ 1, with c 2 a clearly impacting only the sound horizon in the dark energy fluid.

Applications
In this section, we demonstrate a few applications of the differentiable Einstein-Boltzmann solver.We note that these are not meant to be exhaustive in any way.Specifically, two main applications of a differentiable solver will be for parameter inference and the construction of emulators, aspects that we postpone to future work.Nonetheless, the cases we highlight here are meant to demonstrate the applicability and integration of the solver for a few select applications relevant to large-scale structure cosmology.

Redshift-space correlators for large-scale structure studies
The matter power spectrum itself is not directly observable but can be constrained e.g. through the clustering of galaxies.Galaxies are, however, biased tracers of the underlying matter distribution (cf.[49] for a review) and their three-dimensional distribution can generally only be determined in redshift space [50].As a next proof-of-concept test, we demonstrate a differentiable calculation of the redshift-space power spectrum and the correlation function multipoles of biased tracers, as this is a common input for large-scale structure studies.

Linear redshift space distortions
The proper motions v ∥ of galaxies parallel to the line-of-sight (LOS) contribute to the measured redshift and lead to the redshift space distortion (RSD) effect.At linear order and in the Newtonian limit, the redshift-space power spectrum can be computed from the matter power spectrum using the Kaiser formula [50].For linearly biased tracers with density contrast δ g it states that δ Kaiser g := b δ m − µ 2 θm H and so where b is the linear bias coefficient (which we fix to b = 2 here), µ := cos α for an angle α between k and the line-of-sight, θ m is the matter velocity divergence, and H = a ′ /a is the conformal-time expansion rate.Further, one needs to assume an irrotational velocity field (v = ∇∇ −2 θ) and the absence of velocity bias (i.e.galaxy velocity θ g = θ m ) for this relation to hold.Commonly, the growth rate f is used to express θ m in terms of δ m , but since we have both fields from the EB solver, it is more convenient to work directly with the velocity divergence.The fields δ m and θ m are readily obtained from Disco-EB.

Correlation function multipoles
Redshift space distortions on large scales are conveniently expressed in terms of a multipole expansion of the power spectrum.These redshift-space power spectrum multipoles are commonly defined as the following expansion in Legendre polynomials [51,52] where µ = cos α is the cosine of the angle α between the wave vector k and the line-of-sight direction n, P (k, µ; z) is the anisotropic power spectrum of the tracer, and L ℓ is the Legendre polynomial of order ℓ = 0, 1, 2, . . . .Then the redshift-space power spectrum multipoles for the Kaiser formula (eq.(4.2), [50]) are given by  Each panel shows the derivative as a function of scale r of the monopole ξ 0 (r) (blue), the quadrupole ξ 2 (r) (green), and the hexadecapole ξ 4 (r) (purple), the respective correlation function multipoles (without derivative applied) are shown as light dashed lines.We have multiplied the hexadecapole ξ 4 by a factor of 30 for visual clarity.In this case the derivative is carried through the FFTLog method and the Einstein-Boltzmann solver.and therefore with P δδ := δ m δ * m = P m , P δθ := δ m θ * m , and P θθ := θ m θ * m the only non-zero multipoles for the linear Kaiser model are Finally, the correlation function multipoles are defined in terms of the power spectrum multipoles via the integral where j ℓ is the spherical Bessel function of order ℓ.These integrals are typically efficiently computed using the FFTLog method [52,53], which can be implemented in Jax in a few lines, and thus allows differentiable computation of the correlation function.
We show the resulting derivatives of the monopole, quadrupole, and hexadecapole w.r.t. the baseline cosmological parameters in Figure 5.The respective derivative is indicated by a label in each plot.We have omitted the dark energy parameters here as they predominantly affect the growth history and thus only the normalisation of the correlation function multipoles.Clearly visible are the BAO features in many derivatives (but not all).The calculation in this case was carried out using 512 modes geometrically distributed between k = 10 −5 Mpc −1 and k = 100 Mpc −1 since the FFTlog transform is relatively noisy with fewer sampling points, and even in this case, some ringing can be observed at r ≳ 200 Mpc.This could likely be improved by additional tweaks to the FFTlog method, which goes beyond our minimal FFTlog implementation, and which we leave for future work.

Two-dimensional power spectrum
In a more realistic analysis, in addition to the Kaiser effect, typically also the Finger-of-God effect and the Alcock-Paczynski effect [54] are considered.
The Alcock-Paczynski (AP) effect.It arises from the fact that the observables redshift and angle on the sky need to be converted to length scales assuming a fiducial cosmology which might not be the true cosmology.The AP effect can be expressed as a modification of the observed wave vector k obs and the observed angle µ obs in terms of the true wave vector k and angle µ as and volume deformation V obs := q 2 ⊥ q ∥ .The deformation parameters are where D A (z) is the angular diameter distance and H(z) = a ′ /a is the conformal Hubble function.The superscript fid indicates the value in the reference (fiducial) cosmology.which leads to a volume deformation The Finger-of-God (FoG) effect.It arises from the fact that galaxies in clusters have a velocity dispersion which leads to a smearing of the observed power spectrum.The effect can be modelled as a Lorentzian damping of the power spectrum in redshift space [55][56][57].
It amounts to a multiplicative factor of the form where σ 2 θ (z) := 1 6π 2 (H fid ) 2 P fid θθ (k, z) dk (4.8) The observed power spectrum.Putting all together, we can write the observed (linear) power spectrum including all mentioned effects as . (4.9) Note that the power spectrum of course always also depends on all physical/cosmological parameters ϑ, which we generally have suppressed in this section to avoid cluttering the notation further.
In Figure 6, we show the logarithmic gradients of the observed matter power spectrum ∂ log P obs /∂ log ϑ i with respect to a key subset of model parameters ϑ as obtained with Disco-Dj.Wave vectors parallel to the LOS are indicated in the 0 • direction, while modes perpendicular are at 90 • and 270 • .The radial coordinate is log k, which is clipped to be between k min = 10 −4 Mpc −1 (the center point) and k max = 10 Mpc −1 (the outer circle).The colour scale indicates a positive dependence of P obs on the parameter (red) or a negative dependence (blue).A white region indicates that the information content due to parameter ϑ i of the power spectrum vanishes.
A few aspects are notable.First, the BAO feature is clearly visible in the derivatives w.r.t.H 0 , Y He , and the DE EOS parameters w 0 and w a , indicating that these parameters have a strong impact on the BAO feature (i.e.causing a shift in the BAO scale).Thirdly, the derivatives w.r.t.Ω m and the bias parameter b reveal a quadrupole pattern in the k-µ plane which is due to their impact on the redshift space distortion effect (Ω m determines the growth rate f = d log D + /d log a, and b and f directly determine the RSD effect in the Kaiser formula).

Fisher-forecasting with differentiable solvers
As a proof-of-concept, we demonstrate an application to compute the Fisher information matrix for forecasting.Fisher forecasts are widely used in cosmology for experimental design [58,59], and they require computing derivatives of the likelihood with respect to the cosmological parameters.Typically, these derivatives are computed with finite differences, which requires careful tuning for convergence [60].Autodiff makes the computation of the Fisher matrix easier and avoids the convergence problem.
We demonstrate the computation of a Fisher forecast for the 'optimistic' spectroscopic Euclid survey with 4 bins as described in [56,57].For a Gaussian likelihood, the elements of the Fisher matrix for a summary statistic mean y(ℓ) are where C(ℓ) is the covariance matrix computed at the fiducial cosmology.Following [56], we account for effective bias, anisotropies due to redshift space distortions, and redshift uncertainty in the observed two-dimensional power spectra, given by eq.(4.9).When considering a tomographic survey, the bias parameter b 1 (z) and redshift error, σ z (z) are of course redshift dependent.Specifically, one has that redshift errors grow roughly like σ z = (1 + z)σ z,0 .
Linear covariance and survey properties.In our simplified proof-of-concept analysis, we assumed the linear approximation of the covariance matrix [56,57]:  where δ D is the Dirac delta distribution and V s (z) is the survey volume for the redshift bin at z.We assume a galaxy density for this bin of The survey area in the sky is A survey = 15000 deg 2 , and the rest of the parameters are set as in Table 3 in Ref. [56] (equivalent to Table 3 in [57]).
Observed power spectrum.In the standard Euclid forecasting [56,57], the observed power spectrum Pobs is the linear spectrum P obs from eq. (4.9) modified in three additional ways: (1) the damping of the BAO feature due to non-linearities is mimicked by a blending of the linear spectrum with a filtered version with BAO wiggles removed.Specifically, as in [57], we apply a 3rd order Savitzky-Golay filter of width ∆ log k ≈ 1.53 to obtain the filtered P obs,nw .This is then blended with the original spectrum to yield Pobs (k, µ; z) := P obs (k, µ; z) e −g(k,µ; z)k 2 + P obs,nw (k, µ; z) 1 − e −g(k,µ; z)k 2 (4.13) with the transition scale Then, (2) the result is multiplied by a Gaussian dispersion kernel (cf.[61, modelling the impact of redshift measurement errors, and (3) receives an additional shot-noise contribution (note that we follow here in notation [57], rather than [56] who absorb the 1/n(z) contribution into an effective survey volume) where δP s (z) = δP bin s is a shot-noise nuisance parameter for each bin (with fiducial value 0), and the redshift error kernel is given by where σ z,0 (1 + z) = σ z is the measurement error on the redshift (we adopt σ z,0 = 0.001 as the fiducial value for the spectroscopic Euclid survey).Similarly to [57] we allow for an additional nuisance parameter correcting the bias, i.e. we replace in eq.(4.9) where b(z) = b bin is the fiducial mean bias per bin and δb(z) = δb bin is an additional nuisance parameter for which the trivial dependence on changes in the normalisation is corrected by the scaling with σ 8 .for each tomographic bin.In the k-space integral, we employ Euclid's 'optimistic' cuts on the wave number k included in the Fisher analysis as k min = 10 −3 h fid Mpc −1 and k max = 0.3 h fid Mpc −1 .We assume the four tomographic bins described in Table 3 of [56].The total Fisher matrix is then computed by summing over the different tomographic bins computation of a Fisher forecast for the Euclid spectroscopic survey.While state-of-the-art modelling would have to take into account also non-linearities and other effects, these results serve as proof-of-concept use cases for our differentiable Einstein-Boltzmann solver.By visualisation of parameter gradients of a model, we see value also of differentiable solvers for educational purposes as they are capable of directly showing the impact of changes in the (cosmological) model parameters on the observables.
Beyond the applications shown here, we believe that a differentiable Einstein-Boltzmann solver will be useful for more efficient likelihood maximization and sampling, the development of likelihood-free inference methods, and in particular the development of differentiable nonlinear models, such as differentiable N -body solvers, or differentiable non-linear perturbation theory, based on e.g. the effective field theory of LSS (see e.g.[65] for a review) or highorder Lagrangian perturbation theory (e.g.[66,67]).We expect that they also allow the development of more efficient emulators based on Einstein-Boltzmann solvers [68][69][70][71], as they allow inclusion of information from the derivatives of the power spectrum with respect to the cosmological model parameters when building the emulator and should thus require much fewer training points.Naturally, the gradient information can be made part of the emulator itself (such as [72]), which would allow for usage of gradient information in the inference process.
Disco-Dj is a work in progress, and we plan to extend it in several directions.Most importantly, this module will serve as the starting point for a comprehensive differentiable package for large-scale structure cosmology, which will include a differentiable n-th order Lagrangian perturbation theory (LPT) module, as well as an N -body code.Several papers in this direction are currently in preparation.We also invite the community to extend the Einstein-Boltzmann solver to include non-standard cosmological models, such as modified gravity and alternative dark energy and dark matter models.
We make the Disco-Dj code publicly available upon publication of this paper under the URL https://github.com/ohahn/DISCO-EB.
CDM.As coordinates are defined by freely falling observers, the equations of motion of collisionless cold dark matter are particularly simple:

Figure 1 :
Figure1: Performance of the simplified recombination solver 'RECFASTmini' in Disco-Dj in comparison with the more accurate RecFast and HyRec solvers used in Class, and the original MB95 model[5] (also available in Disco-Dj).We show the ionization fraction x e − as a function of the scale factor a for the baseline cosmology computed using the simplified model based on the original RecFast (blue), the much older MB95 model (orange dashed), as well the most recent RecFast (red dashed) and HyRec (green) modules in Class.The main difference is due to details in the treatment of the combined hydrogen-helium recombination.None of these differences (except MB95) have a measurable effect late-time on matter components.

Figure 2 :
Figure 2: Comparison at a = 1 of the performance of our differentiable Einstein-Boltzmann solver vs. Camb and Class.The top panel shows the power spectrum of density perturbations in baryons (solid), CDM (dashed), and the 0.06 eV massive neutrino (dot-dashed).The lower panels indicate the relative errors P/P ref − 1 for CDM, baryons, massive neutrinos, CDM+baryons, and total matter = CDM+baryons+massive neutrino.The grey bands indicate 1 per cent deviation (light grey) and 0.5 per cent deviation (darker grey).All EB solvers were run on high-accuracy settings for this comparison, but we note that it might well be possible to improve agreement further with other parameter choices.

l 3 :
max g = 64 l max pol g = 64 l max ur = 64 l max ncdm = 64 use ppf = no radiation streaming approximation = 3 ncdm fluid approximation = 3 ur fluid approximation = 3 reio parametrization = reio none Table Parameters used for the reference high-precision Class runs.Note that for Disco-EB, we used only ℓ max = 32.neutrino) and the 'b + c' (baryon+CDM) spectra to those obtained from Camb and Class at z = 0 in Figure 2.

2 ∂ 2 ∂∂∂∂∂∂∂ 5 0 5 ∂ 4 ∂−∂∂ 2 aFigure 3 :
Figure3: Derivative of the total matter (CDM+baryon+massive neutrino) density power spectrum at fixed time (a = 1) with respect to all cosmological model input parameters, evaluated at the fiducial parameter values.We show the result of the Disco-EB autodiff computation (black) and a second-order finite-difference approximation based on Class (orange) and Camb, computed using eq.(3.3) with ϵ = 10 −2 .The respective derivative is indicated by a label in each plot.The top panel in each case represents the value of the derivative as a function of the wave number, while the bottom panel shows the relative difference between the Disco-EB autodiff computation and the finite difference solutions.

∂∂∂∂10 1 k/Mpc − 1 ∂10 1 k/Mpc − 1 ∂ 2 aFigure 4 :
Figure 4: Time and spatial dependence of the logarithmic derivative of the total matter power spectrum w.r.t. the cosmological parameters.Each panel shows the colour-coded change in the power spectrum in arbitrary normalisation, with red indicating an increase and blue a decrease w.r.t. an arbitrary baseline, as a function of scale factor (y-axis) and wave number (x-axis).Compare with Figure 3 which represents a slice at fixed time a = 1.

Figure 5 :
Figure5: Derivative of the redshift-space correlation function multipoles w.r.t. the cosmological parameters.Each panel shows the derivative as a function of scale r of the monopole ξ 0 (r) (blue), the quadrupole ξ 2 (r) (green), and the hexadecapole ξ 4 (r) (purple), the respective correlation function multipoles (without derivative applied) are shown as light dashed lines.We have multiplied the hexadecapole ξ 4 by a factor of 30 for visual clarity.In this case the derivative is carried through the FFTLog method and the Einstein-Boltzmann solver.

Figure 6 :
Figure 6: Logarithmic derivatives of the two-dimensional redshift-space matter power spectrum w.r.t.model parameters ϑ, ∂ log P obs (k, µ)/∂ log ϑ i .The angular coordinate in each plot is the angle between a mode k and the line-of-sight, while the radial coordinate is k, which is clipped to be between k min = 10 −4 Mpc −1 (the center point) and k max = 10 Mpc −1 (the outer circle).The colour scale indicates a relative increase (red) vs. decrease (blue).

8 Figure 7 : 1 − and 2 −
Figure 7: 1− and 2 − Fisher contours predicted for the 'optimistic' Euclid spectroscopic galaxy clustering survey using a simplified model based on Disco-EB (green) in comparison to comparable predictions from the official forecast [63, blue].

Table 1 :
Baseline flat Planck2018 EE+BAO+SN cosmological parameters used for all tests unless otherwise indicated.The dark energy density Ω DE ).All EB solvers were run on high-accuracy settings for this comparison, but we note that it might well be possible to improve agreement further with other parameter choices.