Galaxy clustering multi-scale emulation

Simulation-based inference has attracted increasing interest in the past few years as a promising approach to model the non-linear scales of galaxy clustering. The common approach using Gaussian processes is to train an emulator over the cosmological and galaxy-halo connection parameters independently at every scale. We present a new Gaussian process model that extends the input parameter space and accommodates a non-diagonal noise covariance matrix. We use this new framework to simultaneously emulate all scales of the non-linear clustering of galaxies in redshift space from the AbacusSummit N-body simulations at redshift $z=0.2$. The model includes nine cosmological parameters, five halo occupation distribution (HOD) parameters and one scale dimension. Accounting for the limited resolution of the simulations, we train our emulator on scales from $0.3~h^{-1}\mathrm{Mpc}$ to $60~h^{-1}\mathrm{Mpc}$ and compare its performance with the standard approach of building one independent emulator for each scale. The new model yields more accurate and precise constraints on cosmological parameters than the standard approach. Since the new model can interpolate across scales, we can also account for the Alcock-Paczynski distortion effect, leading to more accurate constraints on the cosmological parameters.


Introduction
Galaxies are one of the best tracers of the underlying dark matter distribution in our Universe as a function of time. Spectroscopic galaxy surveys have become a powerful probe of the physical laws governing the formation of the large-scale structures, since they provide precise spatial information for millions of galaxies over large volumes. Surveys such as the 2dFGRS (Colless et al. 2003), 6dFGS (Jones et al. 2004), WiggleZ (Drinkwater et al. 2010), VIPERS (Guzzo et al. 2014), SDSS BOSS and eBOSS (Eisenstein et al. 2011; Blanton et al. 2017; Dawson et al. 2013, 2016) have been (and will continue to be) among the strongest tools to understand the accelerated expansion of our Universe and to test models of dark energy or modified theories of gravity (Weinberg et al. 2013; Zhai et al. 2017; Huterer 2023; Hou et al. 2023).
The analysis of spectroscopic galaxy surveys consists of the measurement and modelling of two main physical features present in the two-point function of the density field: the baryon acoustic oscillations (BAO) and the redshift-space distortions (RSD). While the BAO can be well modelled by linear theory, since they manifest as correlations at large linear scales (around 100 h⁻¹Mpc), RSD are more challenging since there is valuable information on small, mildly non-linear scales (below a few tens of h⁻¹Mpc), which are therefore harder to model accurately. The latest measurements of BAO and RSD from SDSS have put the tightest constraints on the growth history to date (Alam et al. 2021; Ross et al. 2015; Howlett et al. 2015; Alam et al. 2017; Bautista et al. 2021; Gil-Marín et al. 2020; de Mattia et al. 2021; Tamone et al. 2020; Hou et al. 2021; Neveux et al. 2020; du Mas des Bourboux, Helion et al. 2020). Ongoing and future galaxy surveys such as Euclid (Laureijs et al. 2011; Collaboration et al. 2022) and the Dark Energy Spectroscopic Instrument (DESI, Collaboration et al. 2016; DESI Collaboration et al. 2022) span deeper and wider fields of view and will measure tens of millions of galaxies. Such samples of galaxies will significantly reduce statistical uncertainties in the two-point functions, requiring higher precision clustering models to derive unbiased cosmological constraints.
Standard perturbative theories attempt to model mildly non-linear scales analytically (Taruya et al. 2010; Reid & White 2011; Carlson et al. 2013; Wang et al. 2014). Even more modern approaches (including loop corrections) such as the effective field theory of large-scale structure (EFTofLSS, Pajer & Zaldarriaga 2013; Carrasco et al. 2014) try to push the model validity to smaller scales, but ultimately such perturbative approaches are limited as the density perturbations on small scales become too large, especially at low redshifts.
An alternative to perturbation theory is to build a simulation-based model to extract information from the small-scale clustering. A recent approach consists in running several N-body simulations for different cosmological models, computing the summary statistics for each, and using them to predict the clustering for a new set of cosmological parameters. This process is commonly referred to as emulation. Brute iterative solving of the dynamical equations in N-body simulations provides a high-fidelity description of the non-linear matter clustering, although it is computationally expensive. In the past years several suites of N-body simulations have been produced for this purpose (DeRose et al. 2019; Villaescusa-Navarro et al. 2020; Maksimova et al. 2021). Several emulators were developed (Kobayashi et al. 2020) and some of these were applied to real data to extract cosmological parameters (Kobayashi et al. 2022; Chapman et al. 2022).
Any statistic derived from the properties of dark matter tracers suffers from cosmic variance. This effect decreases as the volume of the simulation increases, but ultimately every measured observable comes with its error budget. Hence, an emulator should account for the uncertainty of the training set along with the uncertainty of its predictions. A popular emulation algorithm used in previous works for galaxy clustering (Yuan et al. 2022a; Zhai et al. 2022; Kwan et al. 2023) is the Gaussian process (GP). This Bayesian machine learning method with physical interpretation provides not only a prediction for the model but also an associated uncertainty. However, it is well known that GP models scale poorly with the number of training samples N, as training requires building and inverting an N × N matrix many times, each of these operations costing O(N²) memory and O(N³) time. As a consequence, the dimension of the input space is often restricted to the cosmological and galaxy-halo connection parameters. In particular, the standard approach of previous works is to construct an independent emulator for each separation or Fourier mode, ignoring the well-known correlations between scales. In this work we propose a new Gaussian process model that extends the input parameter space, making it possible to interpolate/extrapolate over a wider range of parameters such as scales, redshift bins, selection effects, etc. As a proof of concept, we built an emulator for the two-point correlation function from a redshift z = 0.2 snapshot of the AbacusSummit suite (Maksimova et al. 2021), extending the input parameter space to include the different separation scales. We implemented our new framework on GPU with the JAX library.
Fig. 1. In blue, the 88 cosmologies used to train the emulator; in orange, the 6 cosmologies used as a test set. The parameters are the dark matter density ω_cdm, the baryon density ω_b, the clustering amplitude σ_8, the dark energy equation of state w(a) = w_0 + w_a(1 − a), the Hubble constant h, the spectral tilt n_s, the density of massless relics N_ur and the running of the spectral tilt α_s (Maksimova et al. 2021).
This paper is organised as follows. In section 2 we present the AbacusSummit N-body simulations, the galaxy-halo connection and the observable. In section 3 we describe the Gaussian process framework along with our new model. In section 4 we validate our emulator performance compared to a standard model. Finally, in section 5 we explore the constraining power of our model.

Emulating clustering from N-body simulations
In this section we describe the suite of N-body simulations used in this work. Since these are dark matter-only, we use a halo occupation distribution (HOD) model to populate them with galaxies. Then, we discuss the summary statistics used to evaluate the performance of our emulation technique.

The AbacusSummit suite of simulations
AbacusSummit (Maksimova et al. 2021) is a suite of N-body simulations run with the Abacus N-body code (Garrison et al. 2021) on the Summit supercomputer of the Oak Ridge Leadership Computing Facility. These simulations were designed to match the requirements of the Dark Energy Spectroscopic Instrument (DESI). The suite is composed of hundreds of periodic cubic boxes evolved in comoving coordinates to different epochs, from redshift z = 8.0 to z = 0.1. The base simulations consist of 6912³ particles of mass 2 × 10⁹ h⁻¹M⊙ in a (2 h⁻¹Gpc)³ volume, yielding a resolution of about 0.3 h⁻¹Mpc. The suite provides halo catalogs built by identifying groups of particles with the specialised spherical-overdensity-based halo finder CompaSO (see Hadzhiyska et al. 2021 for more details).
AbacusSummit spans several cosmological models sampled around the best-fit ΛCDM model to Planck 2018 data (Collaboration et al. 2020), with 9 different parameters θ_Ω = {ω_cdm, ω_b, σ_8, w_0, w_a, h, n_s, N_ur, α_s}: the dark matter density ω_cdm, the baryon density ω_b, the clustering amplitude σ_8, the dark energy equation of state w(a) = w_0 + w_a(1 − a), the Hubble constant h, the spectral tilt n_s, the density of massless relics N_ur and the running of the spectral tilt α_s. Flat spatial curvature is assumed for every cosmology.
Figure 1 shows the sampling over the 9-dimensional space of the 88 cosmologies (in blue) that will be used to train the emulator, and the 6 (in orange) that will be used as a test set, all at the base resolution. The test set includes the center of the hypercube, the Planck 2018 ΛCDM cosmology with 60 meV neutrinos; the same cosmology with massless neutrinos; and 4 secondary cosmologies: a low ω_cdm choice based on WMAP7, a wCDM choice, a high N_ur choice, and a low σ_8 choice. The training set provides a wider coverage of the parameter space. In the following we call X_Ω the 88 × 9 matrix containing the cosmological parameters of the training set.
All simulations were run with the same initial conditions, so changes in the measured clustering are only due to changes in cosmology. To test the cosmic variance, AbacusSummit also provides another set of 25 realisations with the same cosmology but different initial conditions, and an additional set of 1400 smaller (500 h⁻¹Mpc)³ boxes with the same cosmology and mass resolution but different phases. The small boxes are used to compute sample covariance matrices of our data vectors. All these simulations were run with the base Planck 2018 cosmology. While snapshots are available for several redshifts, we only use the snapshots at redshift z = 0.2 throughout this work¹.

Halo occupation distribution model
We use the halo occupation distribution (HOD, Zheng et al. 2007) framework to assign galaxies to the dark-matter halos of the simulations. This framework assumes that the number of galaxies N in a given dark matter halo follows a probabilistic distribution P(N|Ω_h), where Ω_h can be a set of halo properties. In this work we assume a standard vanilla HOD, where this distribution only depends on the halo mass, Ω_h = M. Moreover, the occupation is decomposed into the contribution of central and satellite galaxies, ⟨N(M)⟩ = ⟨N_cen(M)⟩ + ⟨N_sat(M)⟩. The central galaxy occupation follows a Bernoulli distribution while the satellite galaxy occupation follows a Poisson distribution. The means of these distributions are defined as:

⟨N_cen(M)⟩ = (1/2) erfc[ log₁₀(M_cut/M) / (√2 σ) ],
⟨N_sat(M)⟩ = [(M − κM_cut)/M₁]^α ⟨N_cen(M)⟩,

with erfc(x) the complementary error function and θ_hod = {M_cut, σ, κ, M₁, α} the HOD parameters: M_cut the minimum mass to host a central galaxy, σ the width of the cutoff profile (i.e. the steepness of the 0-to-1 transition in the number of central galaxies), κM_cut the minimum mass to host a satellite galaxy, M₁ the typical mass to host one satellite galaxy, and α the power-law index of the satellite occupation.
¹ While a z = 0.1 snapshot exists for AbacusSummit, the smaller boxes for covariance matrix estimations were not available.
We use the AbacusHOD implementation (Yuan et al. 2022b), where for each halo, if a random number — drawn from a uniform distribution between 0 and 1 — is smaller than ⟨N_cen⟩, a central galaxy is assigned to the center of mass of the halo with the same velocity. Satellite galaxies do not assume a fixed distribution profile such as the Navarro-Frenk-White profile (Navarro et al. 1997), but rather follow the distribution of the dark matter particles. Specifically, each particle inside a halo has the same probability p = ⟨N_sat⟩/N_p of hosting a satellite galaxy, with N_p the number of particles in the halo. For every particle, a random number between 0 and 1 is drawn; if it is smaller than p, a galaxy is assigned to its position with the same velocity. The abacusutils package implements this halo occupation scheme and is publicly available². To build our training set, we populate each of the 88 boxes with 600 HODs selected using latin hypercube sampling, resulting in 52800 different clustering measurements. For the test set, we use a different set of 20 HODs to populate the 6 test cosmologies. We chose the parameter ranges according to the bright galaxy sample of the DESI one percent survey (Prada et al. 2023). In the following we call X_HOD the 600 × 5 matrix containing the HOD parameters of the training set. Note that using the same HOD sampling for every cosmology results in a grid-structured parameter space X_Ω ⊗ X_HOD, where ⊗ is the Kronecker product. This peculiar data structure will be useful to build the emulator in section 3. Figure 2 illustrates the different forms that the total galaxy occupation takes for the wide range of HOD parametrisations that we use. The black lines show the mean HOD configuration for ⟨N_cen(M)⟩, ⟨N_sat(M)⟩ and ⟨N_tot(M)⟩. In table 1 we summarise the sampling range of all cosmological and HOD parameters.
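As an illustration, the mean occupation functions above and a latin-hypercube draw of the HOD parameters can be sketched in a few lines. This is a minimal sketch, not the paper's pipeline: the parameter ranges and values below are illustrative placeholders, not the ones chosen for the DESI-like sample.

```python
import numpy as np
from scipy.special import erfc
from scipy.stats import qmc

def n_cen(M, M_cut, sigma):
    """Mean central occupation: smooth step from 0 to 1 around M_cut."""
    return 0.5 * erfc(np.log10(M_cut / M) / (np.sqrt(2.0) * sigma))

def n_sat(M, M_cut, sigma, kappa, M_1, alpha):
    """Mean satellite occupation: power law above kappa*M_cut, modulated by <N_cen>."""
    base = np.clip(M - kappa * M_cut, 0.0, None) / M_1
    return base ** alpha * n_cen(M, M_cut, sigma)

# Latin-hypercube sampling of the 5 HOD parameters (illustrative ranges)
sampler = qmc.LatinHypercube(d=5, seed=0)
lower = [10**12.0, 0.1, 0.5, 10**13.0, 0.5]   # M_cut, sigma, kappa, M_1, alpha
upper = [10**13.5, 1.0, 1.5, 10**14.5, 1.5]
X_hod = qmc.scale(sampler.random(n=600), lower, upper)   # 600 x 5 design matrix
```

By construction ⟨N_cen⟩ crosses 0.5 exactly at M = M_cut, and ⟨N_sat⟩ vanishes below κM_cut.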

Observable of interest: the correlation function
To capture the cosmological information in the small-scale clustering of galaxies, we use as an observable the standard two-point correlation function (2PCF). It is defined as ξ(r₁ − r₂) = ⟨δ_g(r₁)δ_g(r₂)⟩, with δ_g(r) the galaxy overdensity at position r. The 2PCF is measured in a catalog by counting pairs of galaxies with the well-known Landy-Szalay estimator (Landy & Szalay 1993):

ξ(s) = (DD − DR − RD + RR) / RR, (3)

where DD, RR, DR and RD are the normalized pair counts of data-data, random-random, data-random and random-data catalogs. The random catalog is used to define the window function of the survey. Because each galaxy has its own peculiar velocity on top of the Hubble flow, the measured positions of galaxies along the line of sight (LoS) are modified by the so-called redshift-space distortions (RSD) as:

d_s = d + v_∥ / (a H(z)),

with d and d_s the true and distorted radial positions, v_∥ the peculiar velocity along the LoS, and H(z) the Hubble parameter at that redshift. Note that because we use snapshots of N-body simulations instead of light cones, the redshift evolution is not simulated and z has the same value of 0.2 for all galaxies. Because of RSD, the 2PCF loses its isotropy and no longer depends only on the relative separation between the galaxy pairs s = |s₁ − s₂|, but on the triangles formed by each pair of galaxies and the observer. Hence the pair counts have to be binned in three dimensions (s, µ, θ), with µ the cosine of the angle between the observer LoS (passing through the midpoint between the galaxies) and the separation vector of the galaxy pair, and θ the angle between the two individual lines of sight. The wide-angle dependence is minor for distant surveys, so we can perform, with a minor loss of information, the projection of Yoo & Seljak (2015), which averages the 2PCF over θ weighted by P(s, µ, θ), a probability distribution proportional to the number of pairs at a given triangle configuration (and thus survey-dependent). Note that the wider the distribution in θ, the more survey-dependent this projection makes the observable. This is equivalent to directly binning the pair counts in (s, µ).
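The RSD mapping above can be sketched in a few lines, assuming comoving positions in h⁻¹Mpc, velocities in km/s, and an illustrative value for H(z)/h; the fixed line of sight, box size and numbers are placeholders, not the paper's values.

```python
import numpy as np

def apply_rsd(pos_los, vel_los, redshift, hubble_z, box_size):
    """Shift comoving positions along the line of sight:
    d_s = d + v_parallel / (a * H(z)), wrapped into the periodic box.
    pos_los in Mpc/h, vel_los in km/s, hubble_z = H(z)/h in km/s/Mpc."""
    a = 1.0 / (1.0 + redshift)
    return (pos_los + vel_los / (a * hubble_z)) % box_size

# A galaxy at 50 Mpc/h moving at +300 km/s along the LoS at z = 0.2
# (H(z)/h ~ 110 km/s/Mpc is an illustrative value, not from the paper)
s = apply_rsd(50.0, 300.0, 0.2, 110.0, 2000.0)
```

A positive line-of-sight velocity moves the galaxy further away in redshift space; the modulo keeps positions inside the periodic box.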
The 2PCF can then be decomposed as a series of Legendre polynomials:

ξ(s, µ) = Σ_ℓ ξ_ℓ(s) L_ℓ(µ), with ξ_ℓ(s) = (2ℓ + 1)/2 ∫₋₁¹ ξ(s, µ) L_ℓ(µ) dµ.

As this work is a proof of concept for our new emulation strategy, we focus on the monopole ξ₀ hereafter. The study of all multipoles is left for future work. The most expensive part of computing the estimator defined in Eq. 3 is the random-random pair counting. As we have cubic boxes with periodic boundary conditions, RR can be computed analytically with a fixed LoS for every pair count, and we can make use of the natural estimator (Peebles 1974):

ξ(s) = DD/RR − 1, (8)

greatly reducing computational time. This methodology could however bias the clustering, as a fixed LoS is equivalent to assuming θ = 0 for each pair of galaxies, i.e. that the observer is infinitely far away. This flat-sky approximation should break down for low-z surveys and large separation bins. We tested this hypothesis with the 25 Planck2018 boxes, using the median HOD.
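A quick way to check the natural-estimator logic is to apply it to an unclustered (Poisson) catalogue in a periodic box, where the analytic RR counts are exact and ξ should vanish within shot noise. This is a self-contained sketch, not the paper's pipeline; the bin edges, box size and point count are arbitrary.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(42)
L, N = 100.0, 2000
pos = rng.uniform(0.0, L, size=(N, 3))      # unclustered catalogue

# Pair counts with periodic boundary conditions
tree = cKDTree(pos, boxsize=L)
edges = np.array([5.0, 10.0, 15.0])
cum = tree.count_neighbors(tree, edges)     # ordered pairs, incl. N self-pairs
dd = np.diff((cum - N) / 2.0)               # unordered pairs per shell

# Analytic random-random counts for a periodic box, then xi = DD/RR - 1
shell_vol = 4.0 / 3.0 * np.pi * np.diff(edges ** 3)
rr = 0.5 * N * (N - 1) * shell_vol / L ** 3
xi = dd / rr - 1.0
```

For a real (clustered) catalogue the same analytic RR applies as long as the box is periodic and the shells stay well below half the box size.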
In figure 3 we show the mean clustering with its standard deviation computed using Eq. 8 with a fixed LoS in red, and using Eq. 3 with a varying midpoint LoS in blue. In the latter case, in order to get a realistic clustering, we place the observer at the center of the box and cut a spherical full-sky footprint from z = 0.15 to z = 0.25 in radial distance. The bottom panel shows the residual using the cosmic variance of the varying LoS, which is of course larger and more realistic. The shaded area corresponds to a 1σ deviation. We see that although the deviation seems to increase as the separation s gets larger, the difference is well within the cosmic variance. For the considered redshift and separation bins, the Peebles estimator gives an unbiased description of the clustering.
Figure 4 shows the clustering measured for every cosmology and HOD using Eq. 8, applying RSD along a fixed LoS chosen to be the z axis of the box. A sample of the train and test sets is shown in blue and orange respectively. We choose the separation range to be a logarithmic binning from the resolution of the simulation, 0.3 h⁻¹Mpc, up to 60 h⁻¹Mpc, where the flat-sky approximation still holds. Hereafter we define X_s as the vector containing those separations (used in section 3). To reduce shot noise, we populate the halos five times using the same HOD model, only changing the random seed, and average their clustering.
Although AbacusSummit covers a large volume, the train and test cosmology boxes are run with the same initial phase, and the measured clustering is still subject to cosmic variance. To quantify the uncertainties on our measurements, we measure the 2PCF on 1400 of the small Planck2018 boxes using the median HOD, rescale for the volume, and build the covariance matrix C_cosmic. The corresponding correlation matrix is shown in figure 5. We see that while the scales below a few h⁻¹Mpc are almost uncorrelated, the large-scale correlations should not be neglected.
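The construction of C_cosmic from a set of mock measurements can be sketched as follows. The data here are synthetic stand-ins, and the 1/V rescaling from the small (500 h⁻¹Mpc)³ boxes to the (2 h⁻¹Gpc)³ boxes is shown as an assumed `volume_ratio` factor.

```python
import numpy as np

def sample_covariance(xi_mocks, volume_ratio=1.0):
    """Sample covariance of 2PCF measurements (rows = mocks, cols = bins),
    rescaled to the target volume assuming covariance scales as 1/V."""
    c = np.cov(xi_mocks, rowvar=False) / volume_ratio
    sig = np.sqrt(np.diag(c))
    corr = c / np.outer(sig, sig)       # correlation matrix, as in figure 5
    return c, corr

# 1400 mock measurements of a 30-bin monopole (synthetic stand-in data)
rng = np.random.default_rng(0)
mocks = rng.multivariate_normal(np.zeros(30), np.eye(30), size=1400)
c_cosmic, corr = sample_covariance(mocks, volume_ratio=(2000.0 / 500.0) ** 3)
```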

Multi-scale Gaussian process Model
In the following sections we give a brief description of the GP formalism and introduce our new model, which considers all scales at once. For a more complete discussion of GPs, see Rasmussen & Williams (2006).

Standard GP regression model
Let D = {(x_i, y_i) | i = 1, ..., n} be a training set composed of n pairs of d-dimensional input parameter vectors x_i and scalar observations y_i. Let X be the n × d matrix concatenating every input vector and Y the corresponding n-dimensional column vector. We suppose that the observed values are given by

y_i = f(x_i) + ε,

where f is the true underlying mapping function we wish to learn and ε is an additive noise following an independent identically distributed Gaussian distribution with mean zero and variance σ_n², i.e. ε ∼ N(0, σ_n² I), where I is the identity matrix. We assume that the mapping function can be described as

f(x) = Φ(x)^⊤ w,

where Φ(x) is a set of n_f feature functions (e.g. polynomials Φ(x) = (||x||, ||x||², ||x||³, ..., ||x||^{n_f})) mapping the d-dimensional input vector into an n_f-dimensional feature space, and w is an n_f-dimensional vector of unknown weights (or parameters). This model is linear in w for any set of feature functions, and hence easier to handle analytically, for instance during training.
The likelihood of observing Y for a set of input parameters X and weights w then follows a Gaussian distribution, p(Y|X, w) = N(Φ(X)^⊤ w, σ_n² I). Choosing a Gaussian prior with zero mean and covariance Σ for the weights and following Bayes' theorem, we can write the posterior distribution over the weights as

p(w|Y, X) = p(Y|X, w) p(w) / p(Y|X),

where p(Y|X) is the evidence or marginal likelihood.
Making n_* predictions f_* for a new set of input vectors X_* (with X_* an n_* × d matrix) requires marginalising over the weight distribution. The evidence is just a normalisation constant. Using the Gaussian properties of the prior and likelihood, an analytical expression for the posterior distribution of the prediction can be derived:

f_* | X, Y, X_* ∼ N( σ_n⁻² Φ_*^⊤ A⁻¹ Φ Y, Φ_*^⊤ A⁻¹ Φ_* ),

where A = σ_n⁻² Φ Φ^⊤ + Σ⁻¹, Φ = Φ(X) and Φ_* = Φ(X_*). This posterior is a normal probability distribution over a function space. Note that making a prediction at some new location of the input space is equivalent to analytically performing a Bayesian inference. Thanks to the linearity of the model and the Gaussian assumptions, the posterior probability can be computed analytically.
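The equivalence between this weight-space posterior and the kernel (function-space) form introduced next can be verified numerically. The sketch below uses polynomial features and a unit prior covariance purely for illustration; both routes should give identical predictive means.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_f, sig_n = 20, 5, 0.1
X = rng.uniform(-1, 1, size=n)
Y = np.sin(3 * X) + sig_n * rng.standard_normal(n)
Xs = np.linspace(-1, 1, 7)                              # prediction points

phi = lambda x: np.vander(x, n_f, increasing=True).T    # features, shape (n_f, n)
Phi, Phis = phi(X), phi(Xs)
Sigma = np.eye(n_f)                                     # prior covariance of the weights

# Weight-space view: mu_* = sig_n^-2 Phi_*^T A^-1 Phi Y, A = sig_n^-2 Phi Phi^T + Sigma^-1
A = Phi @ Phi.T / sig_n**2 + np.linalg.inv(Sigma)
mu_w = Phis.T @ np.linalg.solve(A, Phi @ Y) / sig_n**2

# Function-space view: mu_* = K_* (K + sig_n^2 I)^-1 Y, k(x,x') = phi(x)^T Sigma phi(x')
K = Phi.T @ Sigma @ Phi
Ks = Phis.T @ Sigma @ Phi
mu_f = Ks @ np.linalg.solve(K + sig_n**2 * np.eye(n), Y)
```

The two means agree to machine precision, which is the content of the kernel-trick identity used in the next paragraph.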
Let us now define the kernel k(x, x′) = Φ(x)^⊤ Σ Φ(x′) (note that k(x, x′) returns a scalar). Doing so allows us to switch from the weight-space view to the so-called function-space view. The expressions for the mean and covariance of the model posterior become:

µ_* = K(X_*, X) [K(X, X) + σ_n² I]⁻¹ Y,
Σ_* = K(X_*, X_*) − K(X_*, X) [K(X, X) + σ_n² I]⁻¹ K(X, X_*),

where K(X, X_*) denotes the n × n_* covariance matrix built by evaluating the kernel for the different pairs of inputs (i.e. K_ij = k(x_i, x_*j)). Specifying a family of feature functions and letting their number n_f go to infinity leads to compact forms of kernels. The most popular in the literature (obtained using Gaussian feature functions) is the squared exponential kernel, k(r) = σ² exp(−r²/2); the Matérn kernels are another common choice. These are stationary kernels, functions of the distance between two points in the input space, commonly defined as

r² = Σ_{j=1}^{d} (x_j − x′_j)² / l_j²,

where l is a d-dimensional vector containing characteristic length-scales across each dimension of the input space. A large value for some component of l (compared to the input space variance) describes a smoothly varying signal in the corresponding dimension, thus allowing for a larger range of interpolation. In practice, every kernel is also scaled by an additional hyperparameter σ² to account for the signal amplitude in the output space. Note that sums or products of kernels are also valid kernels. The training of the GP consists in choosing one or more kernels and inferring the hyperparameters θ = {l, σ} from the input parameter matrix X and the corresponding output observations Y. This is equivalent to a two-level hierarchical Bayesian inference where Eq. 14 is rewritten with an explicit dependence on θ. Maximising the posterior probability for new predictions then requires maximising the evidence p(Y|X, θ), which acts as a likelihood in the second level of the hierarchical Bayesian inference. The posterior over the hyperparameters can then be expressed as

p(θ|Y, X) = p(Y|X, θ) p(θ) / p(Y|X),

where p(θ) is the hyper-prior and p(Y|X) is the hyper-evidence. This second-level inference is usually not analytically tractable and requires algorithmic maximisation of the log-likelihood

log p(Y|X, θ) = −(1/2) Y^⊤ K_y⁻¹ Y − (1/2) log |K_y| − (n/2) log 2π, (23)

with K_y = K(X, X) + σ_n² I. The first term on the right-hand side describes the goodness of the fit, while the second term is the complexity penalty, independent of the observations Y, which (partially) allows GP models to avoid overfitting. We recall the standard result for the gradient of the log-likelihood with respect to a given hyperparameter θ_i:

∂/∂θ_i log p(Y|X, θ) = (1/2) tr[ (αα^⊤ − K_y⁻¹) ∂K_y/∂θ_i ], with α = K_y⁻¹ Y. (24)

Training a GP (learning the hyperparameters) requires computing this likelihood and its gradient several times, which involves the inverse and the determinant of the n × n matrix K_y. For a training set larger than a few thousand samples, these operations become computationally prohibitive. For example, in our case described in section 2, this matrix would have n = n_Ω × n_HOD × n_s = 88 × 600 × 30 ∼ 10⁶ rows, where n_Ω, n_HOD and n_s are the numbers of sampled cosmologies, HODs and separations, respectively.
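A minimal numpy sketch of this log marginal likelihood and its gradient, here for a Matérn 3/2 kernel and the amplitude hyperparameter only; the Cholesky factorisation is the O(n³) bottleneck discussed above. All numbers are illustrative.

```python
import numpy as np

def matern32(X, l, sig):
    """Matern 3/2 kernel with per-dimension lengthscales (ARD)."""
    d = np.sqrt((((X[:, None, :] - X[None, :, :]) / l) ** 2).sum(-1))
    return sig**2 * (1 + np.sqrt(3) * d) * np.exp(-np.sqrt(3) * d)

def log_marginal_likelihood(Y, K, sig_n):
    """log p(Y|X,theta) = -1/2 Y^T Ky^-1 Y - 1/2 log|Ky| - n/2 log(2 pi)."""
    n = len(Y)
    Ky = K + sig_n**2 * np.eye(n)
    L = np.linalg.cholesky(Ky)                       # the O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    return -0.5 * Y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(50, 3))
Y = np.sin(X.sum(1))
l = np.array([0.5, 0.5, 0.5])
sig, sig_n = 1.2, 0.1

K = matern32(X, l, sig)
logp = log_marginal_likelihood(Y, K, sig_n)

# Gradient wrt the amplitude sig: dK/dsig = 2 K / sig, and
# d logp / dtheta = 1/2 tr((alpha alpha^T - Ky^-1) dK/dtheta)
Ky = K + sig_n**2 * np.eye(50)
alpha = np.linalg.solve(Ky, Y)
grad = 0.5 * np.trace((np.outer(alpha, alpha) - np.linalg.inv(Ky)) @ (2 * K / sig))
```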

Multi-scale GP model
To overcome the issue of large matrix inversions in the standard GP, we can exploit a particular structure in the sampling of the training data input parameters. The resulting models are known as Kronecker GP models, and we describe them in the following.
Let the input parameter matrix of the training set X be composed of n samples of d dimensions, and suppose it can be decomposed as a Kronecker product X = X₁ ⊗ X₂, where X₁ and X₂ have smaller dimensions (n₁ × d₁ and n₂ × d₂ respectively), with n = n₁ × n₂ and d = d₁ + d₂. This decomposition is only possible if the training set is grid-sampled (e.g. the same HOD parameters are repeated for each set of cosmological parameters), which is sub-optimal, but as a result the covariance kernel can also be computed as a Kronecker product of two lower-dimensional matrices, K = K₁ ⊗ K₂. This structure allows us to exploit the algebraic properties of Kronecker products. Since each K_i is positive definite, it can be written as a principal component decomposition K_i = Q_i Λ_i Q_i^⊤, so that K = (Q₁ ⊗ Q₂)(Λ₁ ⊗ Λ₂)(Q₁ ⊗ Q₂)^⊤, with every Λ_i matrix diagonal. Thus every calculation can be performed in sub-dimensional spaces. Such a model was implemented in the Python library GPy³, but only for Kronecker decompositions into two sub-spaces. Saatci (2011) gives a generalisation of this framework to k decompositions X = X₁ ⊗ ... ⊗ X_k, but valid only for a diagonal uncorrelated noise matrix N = σ² I.
In this work, we extended this framework to general correlated noise covariance matrices of the form N = N₁ ⊗ ... ⊗ N_k. This is important for clustering modelling because the noise covariance matrices are generally far from diagonal, as seen in figure 5. The essential step is to compute the inverse and the determinant of (K + N) in Eq. 23. Assuming a Kronecker-structured training set, we eigendecompose each noise factor as N_i = U_i S_i U_i^⊤ and define the whitened signal matrix K̃ = S^{−1/2} U^⊤ K U S^{−1/2}, with U = U₁ ⊗ ... ⊗ U_k and S = S₁ ⊗ ... ⊗ S_k. Using the properties of the Kronecker product, K̃ is itself the Kronecker product of the projected matrices K̃_i = S_i^{−1/2} U_i^⊤ K_i U_i S_i^{−1/2}, so its eigendecomposition K̃ = Q Λ Q^⊤ can be computed factor by factor. We then have (K + N)⁻¹ = U S^{−1/2} Q (Λ + I)⁻¹ Q^⊤ S^{−1/2} U^⊤, and the determinant in Eq. 23 is simply |K + N| = |S| · |Λ + I|, the product of the eigenvalues. The goodness-of-fit term is computed by iteratively using the kron_mvprod algorithm (described in Saatci 2011), which efficiently evaluates operations of the form α = (A₁ ⊗ ... ⊗ A_k) b, with b a column vector.
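The kron_mvprod routine of Saatci (2011) can be written in a few lines: each factor is applied in turn and the index order is cycled, so the full N × N Kronecker matrix is never formed. A minimal sketch:

```python
import numpy as np

def kron_mvprod(As, b):
    """Evaluate (A_1 kron A_2 kron ... kron A_k) @ b without forming the
    full Kronecker product: O(sum_i n_i * N) work instead of O(N^2)."""
    x = b
    for A in As:
        n = A.shape[0]
        X = x.reshape(n, -1)          # expose the current factor's index
        x = (A @ X).T.reshape(-1)     # apply A, then cycle the index order
    return x                          # after k steps the original order is restored

rng = np.random.default_rng(3)
As = [rng.standard_normal((n, n)) for n in (3, 4, 5)]
b = rng.standard_normal(3 * 4 * 5)
fast = kron_mvprod(As, b)
```

This is what keeps the goodness-of-fit term of the log-likelihood tractable for n ∼ 10⁶ training points.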
We also require the evaluation of the log-likelihood gradients. The first term of Eq. 24 is easily computed, since each hyperparameter is specific to a particular sub-dimension d, hence to a kernel K_d, so that

∂K/∂θ = K₁ ⊗ ... ⊗ (∂K_d/∂θ) ⊗ ... ⊗ K_k,

while the second (trace) term of Eq. 24 can be computed using the cyclic property of the trace operator together with the eigendecompositions above. We provide an implementation of this model based on the GPy framework and on JAX GPU parallelisation for the algebra operations. It is publicly available as the MKGpy⁴ package.

Application to galaxy clustering
To apply this model to our observable, we decompose the parameter space as a gridded structure X = X_Ω ⊗ X_HOD ⊗ X_s. Specifically, as mentioned in section 2.2, we measure the correlation function on the same scales using the same HOD sampling for every cosmology, allowing us to decompose the signal and noise covariances as K = K_Ω ⊗ K_HOD ⊗ K_s and N = N_Ω ⊗ N_HOD ⊗ N_s. We found that the optimal signal kernels (maximising the log-likelihood) are simply two Matérn 3/2 kernels for the cosmological and HOD spaces and a squared exponential kernel for the scale space. This modelling choice leads to 16 signal hyperparameters θ = {l_Ω, l_HOD, l_s, σ}, composed of 9 length-scales for the cosmological parameters l_Ω, 5 for the HOD parameters l_HOD, one for the separations l_s, and one overall signal variance σ. For the noise kernels, we chose N_s to be the fixed covariance matrix of the 2PCF monopole computed from the 1400 small boxes (figure 5). For the cosmological and HOD noise kernels we chose diagonal heteroscedastic kernels of the form N(x, x′) = σ_n(x)² δ_D(x − x′) I, allowing a different variance for every HOD and cosmological configuration. We let σ_n be fully parameterised vectors, leading to 88 noise hyperparameters for the cosmologies and 600 for the HODs. These noise hyperparameters σ_n,Ω and σ_n,HOD are fitted along with the signal hyperparameters θ.
To avoid numerical overflow and ease the hyperparameter learning, we normalise each dimension of the input and output training sets as follows. X_Ω and X_HOD are normalised to lie in the range [0, 1]. Because we use stationary kernels k(x, x′) = k(|x − x′|), we need to make sure that the variance of the signal is roughly constant as a function of the input x, which is not the case along the subspace X_s. For this input dimension we use a logarithmic transformation to flatten the signal before applying the normalisation. We also normalise the output training set Y to be normally distributed around zero with unit variance. Because of the large range of magnitudes spanned by the 2PCF, we also apply a logarithmic transformation beforehand.
The covariance matrix used as N_s must also be projected into the normalised space. To do so we use the error propagation Ñ_s = J N_s J^⊤ (32), with J the Jacobian matrix of the mapping, J_ij = ∂Ỹ_i/∂Y_j, computed for the mean of the 1400 small boxes. Once the emulator is trained, every new input configuration X_test should be normalised using Eqs. 30, and every model prediction µ_f and its covariance Σ_f (Eq. 18) should be transformed back into the original space using Eqs. 31 and 32.
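For a log₁₀-plus-standardisation output transform, the Jacobian is diagonal and the propagation Ñ_s = J N_s J^⊤ reduces to a simple rescaling. The sketch below uses illustrative numbers; in practice the transform parameters come from the training set.

```python
import numpy as np

def to_normalised(xi, mean_log, std_log):
    """Forward output transform: log10, then standardise per bin."""
    return (np.log10(xi) - mean_log) / std_log

def propagate_covariance(cov, xi_ref, std_log):
    """Linear error propagation N~ = J N J^T for the transform above.
    J is diagonal with J_ii = d y~_i / d xi_i = 1 / (xi_i * ln(10) * std_i)."""
    J = np.diag(1.0 / (xi_ref * np.log(10.0) * std_log))
    return J @ cov @ J.T

xi_ref = np.array([120.0, 15.0, 2.5])       # illustrative monopole values
cov = np.diag((0.05 * xi_ref) ** 2)         # 5% diagonal errors
std_log = np.array([0.3, 0.3, 0.3])
cov_norm = propagate_covariance(cov, xi_ref, std_log)
```

A constant fractional error on ξ maps to a constant absolute error in the normalised space, as expected for a logarithmic transform.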
The hyperparameters are optimised using the SciPy gradient-based L-BFGS algorithm with 10 random initialisations, selecting the best-fit hyperparameters yielding the highest likelihood. To compare performances, we train both the standard GP, where each separation is treated independently (as separate emulators), and our new multi-scale GP, employing the same kernels, normalisation and optimisation method.

Emulator performance
In this section we compare the performance of two emulators: first, the independent Gaussian process model (IGP), where each scale s of the correlation function ξ(s) has its own independent GP depending on cosmological and HOD parameters; and second, our multi-scale Gaussian process model (MGP), able to predict the full model for ξ(s) as well as an associated non-diagonal model covariance.

Prediction accuracy
We computed predictions for the test set composed of 20 HODs × 6 cosmologies (see section 2 for details) using both the IGP and MGP models, and compared them with the observed correlation function monopoles.
Figure 6 describes the IGP and MGP performances using the full test set. The grey lines show the relative difference in % between the MGP emulator prediction and the expected values from the test set. The black dotted line corresponds to a 1% deviation. The blue and green lines are the medians of those deviations for the MGP and IGP models respectively, demonstrating sub-percent precision for both models on most scales. However, the correlation functions of our test set are noisy due to cosmic variance and shot noise, so we do not expect the difference ∆ ≡ ξ_emu − ξ_true to be smaller than the estimated uncertainty of ξ_true. The red dashed line shows the median of σ/ξ_true in %, with σ the square root of the diagonal of C_cosmic. A difference smaller than this reference would indicate overfitting of the particular initial phase of the simulations, while larger differences would show that the prediction performance is poor. Both the MGP and IGP models seem to slightly overfit the initial phase on large scales, but follow the expected accuracy on small scales. Note that on the very small scales, the MGP overfits the test set less than the IGP. We see that the relative error of the full sample in grey varies around the cosmic variance expectation. Note that the red dashed line corresponds to the cosmic variance of the central cosmology and HOD; the wide HOD scatter causes large variations in the clustering amplitude and hence in the cosmic variance.

Robustness to cosmic variance
Our models have been trained on correlation functions from simulations run with different cosmologies and HOD parameters, though all starting from the same initial phase. Overfitting this particular phase could potentially bias our clustering predictions on simulations with different initial phases. We tested this issue with 25 phase realisations of AbacusSummit run with the same Planck2018 cosmology and the same set of HOD parameters.
Gaussian processes provide not only a prediction for a model, but also an estimate of the model uncertainty that depends on the input parameters θ. This uncertainty is expressed as a covariance matrix, noted C_emu(θ). When performing inferences on real data, this model covariance has to be added to the measurement uncertainty for the considered scales, C_cosmic, as:

C_tot(θ) = C_cosmic + C_emu(θ). (33)

We deliberately dropped the dependency of C_cosmic on θ, as it is often considered fixed during inferences.
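In code, Eq. 33 amounts to a simple matrix sum re-evaluated at each θ. A minimal numpy sketch, where the array values and names are illustrative and not taken from the paper's pipeline:

```python
import numpy as np

def total_covariance(c_cosmic, c_emu):
    """Eq. 33: fixed measurement covariance plus the
    parameter-dependent emulator covariance C_emu(theta)."""
    return c_cosmic + c_emu

# toy example with 3 separation bins
c_cosmic = np.diag([1.0, 2.0, 3.0])                      # cosmic variance + shot noise
c_emu = np.full((3, 3), 0.1) + np.diag([0.4, 0.4, 0.4])  # correlated model uncertainty
c_tot = total_covariance(c_cosmic, c_emu)
sigma_tot = np.sqrt(np.diag(c_tot))                      # per-scale total standard deviation
```

Because C_emu(θ) is non-diagonal for the MGP, the off-diagonal terms of C_tot carry scale-scale correlations that a diagonal model covariance cannot represent.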
Figure 7 shows the difference ∆ between the 25 measurements and the predictions from both the MGP (blue) and the IGP (green) emulators. The red dashed line displays the cosmic variance level, while the blue and green lines show the total variance (model + cosmic) according to Eq. 33. We see that for intermediate scales ∆ is larger than the cosmic variance level, while using C_tot lowers the deviation to less than one sigma. We also note that for this example the variance of the MGP is larger than that of the IGP; however, figure 7 only presents the diagonal elements of the covariance matrices.
Figure 8 shows the total (model + cosmic) correlation matrix of the MGP. Notice that, compared to C_cosmic alone (Figure 5), the small scales are now significantly correlated. The IGP's predicted covariance matrix is diagonal by construction, so it cannot account for such correlations, potentially affecting cosmological constraints. We investigate this issue in section 5.
The clustering measurements of the 25 mocks are shown in the top panel of figure 9, with the average as a red dotted line. The bottom panel shows the 25 residuals L⁻¹∆ for both the IGP (green) and MGP (blue) predictions, with ∆ = ξ_emu − ξ_true and L⁻¹ the Cholesky decomposition of the inverse of the total covariance C_tot. This residual metric is useful as it takes into account the correlations between scales. The shaded areas correspond to 1σ and 2σ deviations. On most scales, both the MGP and IGP models are able to reproduce the clustering of the different realisations within a 1σ deviation; on all scales, the residuals are within 2σ. It appears that for both models the predicted uncertainty C_emu solves the initial condition phase problem. We check in the next section the impact on the recovered cosmological and HOD parameters.
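The whitened residual used in the bottom panel can be computed with one triangular solve. A minimal sketch, assuming C_tot = L Lᵀ with L lower triangular, so that the plotted quantity is L⁻¹∆ (function and variable names are ours, not the paper's code):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def whitened_residual(xi_emu, xi_true, c_tot):
    """With C_tot = L L^T (L lower triangular), return L^{-1} (xi_emu - xi_true):
    residuals decorrelated across scales, in units of standard deviations."""
    delta = xi_emu - xi_true
    L = cholesky(c_tot, lower=True)
    return solve_triangular(L, delta, lower=True)

# for a diagonal covariance this reduces to delta / sigma
r = whitened_residual(np.array([1.0, 2.0]), np.array([0.0, -1.0]), np.diag([4.0, 9.0]))
```

A shaded band at ±1 and ±2 in these units then directly corresponds to the 1σ and 2σ regions of the figure.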

Cosmology recovery and constraints
In this section we assess the ability of our models to recover unbiased cosmological and HOD parameters. We used the clustering measured on 25 realisations of the Planck2018 cosmology using the same median HOD parameters.

Parameter inference using emulator model
The parameter inferences are performed by running MCMC chains with the library emcee, using flat priors corresponding to Table 1 and a standard Gaussian log-likelihood defined as

ln L(θ) = −(1/2) [ (µ − m(θ))ᵀ C_tot⁻¹(θ) (µ − m(θ)) + ln |C_tot(θ)| ] + const., (34)

where µ is the data vector, m(θ) is the model prediction for a given set of parameters θ = (θ_Ω, θ_hod), and C_tot(θ) is the total (model + cosmic) covariance defined in Eq. 33. Compared to a usual cosmological inference, the terms C_tot⁻¹(θ) and log|C_tot(θ)| must be computed at every iteration because the emulator covariance depends on the parameters. To do so efficiently, we make use of the fast and stable Cholesky decomposition.
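The per-iteration evaluation described above can be sketched as follows: a single Cholesky factorisation of C_tot(θ) yields both the quadratic form and ln|C_tot(θ)| without an explicit matrix inverse (function names are ours, not the paper's implementation):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gaussian_loglike(mu, m_theta, c_tot):
    """Gaussian log-likelihood with a parameter-dependent covariance:
    ln L = -0.5 [ r^T C^{-1} r + ln|C| + N ln(2 pi) ], with r = mu - m(theta).
    One Cholesky factorisation gives both the quadratic form and ln|C|."""
    r = mu - m_theta
    c, low = cho_factor(c_tot, lower=True)
    chi2 = float(r @ cho_solve((c, low), r))
    logdet = 2.0 * np.sum(np.log(np.diag(c)))   # ln|C| = 2 * sum(ln L_ii)
    return -0.5 * (chi2 + logdet + len(mu) * np.log(2.0 * np.pi))

# sanity check against the closed form for an identity covariance
ll = gaussian_loglike(np.array([1.0, 0.0]), np.zeros(2), np.eye(2))
```

A callable of this form can be passed directly as the log-probability function of an emcee `EnsembleSampler`, after adding the flat-prior bounds.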
For a given set of cosmological parameters we can infer the corresponding growth rate f. The uncertainty on the measured growth rate is simply propagated as

σ_f² = Σ_ij (∂f/∂Ω_i) C^Ω_ij (∂f/∂Ω_j),

where C^Ω_ij is the covariance of the inferred cosmological parameters. We use the jacobi python library to compute our model's gradient.
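This propagation can be sketched in a few lines; for illustration we estimate the gradient with a simple central finite difference rather than the jacobi library the paper uses, and the toy function is ours:

```python
import numpy as np

def propagate_uncertainty(f, omega, cov_omega, eps=1e-6):
    """sigma_f^2 = sum_ij (df/dOmega_i) C^Omega_ij (df/dOmega_j),
    with the gradient of the derived parameter f estimated by
    central finite differences (illustrative stand-in for jacobi)."""
    grad = np.zeros_like(omega)
    for i in range(len(omega)):
        up, dn = omega.copy(), omega.copy()
        up[i] += eps
        dn[i] -= eps
        grad[i] = (f(up) - f(dn)) / (2.0 * eps)
    return float(grad @ cov_omega @ grad)

# linear toy model f = 2*x0 + 3*x1: sigma_f^2 = 4*Var(x0) + 9*Var(x1)
var_f = propagate_uncertainty(lambda x: 2 * x[0] + 3 * x[1],
                              np.array([1.0, 1.0]), np.diag([0.1, 0.2]))
```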

Cubic boxes : Average fit
We first tested the ability of both the IGP and MGP models to recover the parameters of interest when fitting a high-precision measurement: the average stack of the 25 correlation function monopoles, corresponding to a volume of 200 (h⁻¹ Gpc)³. We used the likelihood defined in Eq. 34, divided the sample covariance C_cosmic by 25, and ran two inferences for each model, with and without the emulator predicted covariance C_emu.
Figure 10 shows the 68 and 95% confidence regions obtained for the cosmological and HOD parameters. The true values are shown by the dotted lines. We observe several interesting points. First, neglecting the model covariance leads to strongly biased results for the IGP, with very sharp posterior distributions, while the MGP performs significantly better, though yielding multimodal posteriors. Second, using the model covariance allows unbiased results to be recovered for all cosmological parameters within one sigma for both models. Third, while the MGP and IGP contours are consistent, the MGP model gives tighter constraints on the cosmological parameters, and the IGP model gives tighter constraints on the HOD parameters. We also see several local maxima in the posterior distributions, especially in the green IGP contours. Sampling cosmological parameters far from the ones used in the training process of the GPs leads to larger model covariances while still fitting the data points; this can also be seen as the goodness-of-fit term being counter-balanced by the determinant term in the likelihood. Ignoring the model covariance can cause the fitter to converge toward a point that, when considering the total covariance, turns out to be only a local maximum of the log-likelihood.
Figure 11 compares the best fits of the MGP and IGP models, in blue and green respectively. The bottom panel displays the residuals, using the Cholesky decomposition matrix to account for correlations between scales. The one sigma deviation limit is set by the grey shaded area. For the MGP model we obtain a reduced chi-squared of χ²_r = 5.6, and for the IGP χ²_r = 31. These large χ²_r values are due to the large volume probed by the average of 25 simulations. These χ²_r values and the bottom panel of Figure 11 show that the data are much better fitted by the MGP model over all scales.

Cubic boxes : 25 individual fits
To statistically check the accuracy and potential biases of our models, we fit each of the 25 boxes separately, corresponding to a probed volume comparable to current surveys. The cosmic covariance matrices were normalised to account for this change in volume. Figure 12 shows the reduced chi-squared χ²_r and the best-fit growth rate parameter f along with their estimated uncertainties σ_f. The derived cosmological parameters along with their inferred uncertainties are shown in Figure 13, and the same for the HOD parameters in Figure 14. Table 2 reports some summary statistics from those results, where ⟨∆_p⟩ is the difference between the true and best-fit value for parameter p; ⟨σ_p⟩ is the average over the 25 realisations of the (symmetrised) one sigma confidence level; and std(p) is the standard deviation of the estimated parameter p over the 25 fits. We note that the bias in the growth rate measurement is reduced with our MGP model, and that both models give consistent results, recovering the growth rate within one sigma. The reduced chi-squared distributions are similar, but with this sample the IGP model gives on average a better fit, with ⟨χ²_r⟩ = 4.93 against ⟨χ²_r⟩ = 7.36 for the MGP, for dof = (30 − 14) = 16 degrees of freedom. Although it is hard to draw conclusions given the limited size of the sample, in both cases the mean estimated error ⟨σ_f⟩ does not exactly match the standard deviation of the best-fit parameters std(p), but is conservatively larger. We also see that our MGP model yields a smaller dispersion and tighter constraints on f. Figure 13 shows similar results for the cosmological parameters. Both models are able to retrieve the right parameters within one sigma, but the MGP returns a smaller bias, smaller dispersion and tighter constraints for all parameters except the number of ultra-relativistic species N_ur, which is poorly constrained. In particular, the uncertainty on the amplitude parameter σ_8 is reduced by a factor larger than 4, while also reducing the bias on the best-fit value. For the HOD parameters illustrated in Figure 14, we likewise obtain unbiased measurements. Once again the bias and the dispersion of the best-fit values are overall reduced with the MGP, although the IGP gets tighter constraints. Hence, it seems that exploiting the correlations between scales helps constraining the cosmology rather than the HOD parameters, as the HOD signal mostly lies on small scales where the cosmic covariance is nearly diagonal (Figure 5), while all scales are affected by the cosmological parameters.

Table 2: Summary statistics for the fits of the 25 cubic mocks with Planck2018 cosmology and the median HOD. ⟨∆_p⟩ is the bias, the difference between the true and best-fit value for any parameter p. ⟨σ_p⟩ is the average over the 25 realisations of the (symmetrised) one sigma confidence level. std(p) is the standard deviation of the estimated parameter p over the 25 results. Note that f is a derived parameter from the other cosmological parameters.
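The summary statistics reported in Table 2 can be reproduced from the 25 individual fits with a few lines; names are ours, and the sign convention for the bias follows the text (true minus best-fit):

```python
import numpy as np

def summary_stats(best_fits, sigmas, true_value):
    """Per-parameter summary over the 25 fits: mean bias <Delta_p>
    (true minus best-fit), mean symmetrised 1-sigma error <sigma_p>,
    and the scatter std(p) of the best-fit values."""
    best_fits = np.asarray(best_fits, dtype=float)
    mean_bias = float(np.mean(true_value - best_fits))
    mean_sigma = float(np.mean(sigmas))
    scatter = float(np.std(best_fits, ddof=1))  # sample standard deviation
    return mean_bias, mean_sigma, scatter

# toy numbers: 3 fits of a parameter whose true value is 2.0
bias, sigma, scatter = summary_stats([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], true_value=2.0)
```

Comparing ⟨σ_p⟩ against std(p), as done in the text, is what flags the reported error bars as conservative.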

Observational effects : wide angle and AP
We validated the MGP model on cubic mocks similar to the simulations used to train the emulators, and showed that it performs significantly better than the standard IGP. We now wish to test the constraining power and accuracy of the MGP model on more realistic measurements. For each of the 25 realisations, we placed the observer at the center of the box. As the snapshot describes the clustering at redshift z = 0.2, we select galaxies at redshifts 0.15 < z < 0.25 from the observer, so as not to change the observed clustering too much. We consider each survey to cover the full sky. We used the Landy-Szalay estimator of Eq. 3 to compute the correlation functions. The measurement of the 2PCF requires the choice of a specific fiducial cosmology to convert redshifts into comoving distances. Differences between the fiducial and the true cosmology lead to distortions in the measured clustering known as the Alcock-Paczynski (AP) effect (Alcock & Paczynski 1979). This effect is often neglected when building an emulator (Yuan et al. 2022a; Zhai et al. 2022), as it is small on very small scales. Moreover, modelling the AP distortion requires either one extra parameter in the training input space or a continuous model over different scales for interpolation. By construction, our MGP model is better adapted than the IGP to build a continuous model, which can be evaluated at any scale within the training range.
If the choice of fiducial cosmology is not far from the true cosmology, the AP effect can be modelled with two parameters scaling the radial and angular separations. However, since we are only using the monopole of the 2PCF, we only have access to isotropic information. We use the isotropic dilatation parameter α_V described in Moon et al. (2023) to scale the separations as

s → α_V s, with α_V = D_V(z) / D_V^fid(z),

where the isotropic distance is defined as a combination of the Hubble distance (radial) D_H(z) = c/H(z) and the comoving angular distance D_M(z) = (1 + z)D_A(z):

D_V(z) = [ z D_H(z) D_M²(z) ]^(1/3).

We picked cosmology c177 of the AbacusSummit suite as a fiducial cosmology, which is different from the true one (Planck18), resulting in a 2% variation of the isotropic dilatation parameter, α_V ≈ 1.02. We measured the clustering of the 25 realisations of our survey-like mocks using both the true and the c177 fiducial cosmologies. The average monopoles, along with the residuals (using the cosmic variance of the survey-like mocks), are shown in Figure 15. Although the cosmic variance is larger for the survey-like mocks than for the cubic box measurements due to the reduced volume, the AP distortions lead to significant deviations on all scales, up to 10σ for scales of a few h⁻¹ Mpc.
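As an illustration, α_V can be computed from the background of two cosmologies. The sketch below assumes a flat ΛCDM expansion history (a simplification of the paper's extended parameter space) and made-up parameter values, not the actual AbacusSummit c177 ones; the direction of the ratio is a convention:

```python
import numpy as np
from scipy.integrate import quad

C_KMS = 299792.458  # speed of light [km/s]

def d_v(z, h, omega_m):
    """Isotropic distance D_V(z) = [ z * D_H(z) * D_M(z)^2 ]^(1/3) in Mpc,
    for a flat LCDM background (illustrative simplification)."""
    H0 = 100.0 * h
    E = lambda zz: np.sqrt(omega_m * (1.0 + zz) ** 3 + 1.0 - omega_m)
    d_h = C_KMS / (H0 * E(z))                                   # Hubble distance
    d_m = C_KMS / H0 * quad(lambda zz: 1.0 / E(zz), 0.0, z)[0]  # comoving angular distance (flat)
    return (z * d_h * d_m ** 2) ** (1.0 / 3.0)

# ratio of isotropic distances between two (made-up) cosmologies at z = 0.2
alpha_v = d_v(0.2, h=0.6736, omega_m=0.3153) / d_v(0.2, h=0.70, omega_m=0.28)
```

A few-percent shift in h or ω_m at this redshift already produces an α_V a few percent away from unity, comparable to the 2% quoted in the text.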
To assess whether ignoring the AP distortions can lead to biased cosmological constraints, we fit separately the 25 realisations of the AP-distorted clustering, using the MGP model with and without (setting α_V = 1) accounting for the AP distortions. We refer to the former as MGP_αV. In both cases we rescale the sample covariance matrix with the corresponding spherical volume. We also compare the impact of ignoring the AP effect using the MGP and IGP in appendix A.
Figure 16 displays the distributions of the reduced chi-squared, the best-fit growth rate f and their estimated uncertainties σ_f. The best-fit cosmological and HOD parameters with their uncertainties are shown in Figures 17 and 18 respectively. Table 3 shows summary statistics as in Table 2, now for the survey-like mocks. With a significantly larger cosmic variance, the models fit the data better, so the reduced chi-squared distributions are less spread and more centred on one compared to the cubic box fits. The data points are also better described when accounting for the AP distortions: the means of the distributions are ⟨χ²_r⟩ = 0.55 for MGP_αV and ⟨χ²_r⟩ = 2.43 for MGP, for dof = (30 − 14) = 16 degrees of freedom. In both cases we recover the expected growth rate within one sigma, though ignoring the AP effect increases the bias by a factor of ten and gives larger error bars, with best-fit values of f that are also more scattered. We also note that, compared to the previous cubic box fits, the growth rate is more constrained in both cases despite the larger cosmic variance of the survey-like mocks. We see similar results in Figure 17 for the cosmological parameters: considering the AP distortion in the modelling reduces the bias and tightens the constraints, except for the dark energy parameters w_0 and w_a, for which, although MGP_αV gives tighter constraints, the biases are larger than when using MGP. However, as those parameters are weakly constrained, the difference in growth rate measurement accuracy between the two models mainly comes from the ω_m and h parameters. For the HOD parameters shown in figure 18, we obtain unbiased results for all parameters in both cases. However, as the HOD parameters are mostly constrained by the small scales, where the AP effect is less important, the MGP_αV model does not bring any substantial improvement, and even gives on average looser constraints than the MGP without AP modelling.

Table 3: Summary statistics for the fits of the 25 mocks with spherical full-sky footprint from z = 0.15 to z = 0.25, Planck2018 cosmology and the median HOD, where the clustering was measured using cosmology c177 of the AbacusSummit suite as the fiducial cosmology, resulting in an isotropic dilatation parameter α_V ≈ 1.02. ⟨∆_p⟩ is the bias, the difference between the true and best-fit value for any parameter p. ⟨σ_p⟩ is the average over the 25 realisations of the (symmetrised) one sigma confidence level. std(p) is the standard deviation of the estimated parameter p over the 25 results. Note that f is a derived parameter from the other cosmological parameters.

Conclusions
In this work we reviewed the standard methodology for training an emulator model for galaxy clustering at non-linear scales from a suite of dark matter simulations. We introduced a new Gaussian process model that efficiently extends the input parameter space dimension and accounts for correlated noise. As a proof of concept for our model, we emulated the redshift space 2PCF of galaxies over the input space X = X_Ω ⊗ X_HOD ⊗ X_s, allowing an additional interpolation over a continuous range of scales, and making use of the well-known correlations between them, both in the signal and in the noise. We implemented our multi-scale Gaussian process model (MGP) as well as the standard approach, which consists in constructing independent Gaussian processes (IGP) for each separation. As a training set we used 88 cosmologies, 600 HOD and 30 separation scales, all measured in the AbacusSummit suite of N-body simulations. The input parameter space of the resulting models is composed of 9 cosmological parameters, 5 HOD parameters, and the set of separations of the 2PCF.
After validating the predictive accuracy of our trained model on a test set with 6 cosmologies and 20 HOD, we ran inferences using MCMC to check the robustness and constraining power of our models. In a high-precision setting, where we used the average 2PCF of 25 realisations, ignoring the predicted uncertainties of our GP models can lead to biased results, especially with the IGP model. We also demonstrated that both models are able to recover the expected HOD parameters. Marginalising over such parameters, we are able to obtain unbiased cosmological constraints. Our MGP model returns more accurate and precise constraints on the cosmological parameters than the IGP. Moreover, we showed that the Alcock-Paczynski distortions can become important compared to the cosmic variance at intermediate scales for the redshifts we are testing. Neglecting this effect in an emulator model, as has been done in previous work, loosens the constraints and increases the bias on the growth rate measurement.
The parameter space of such emulator models could also be extended to higher dimensions, to interpolate and marginalise over systematic quantities, extended HOD models, selection effects, different redshift bins and more. Some drawbacks however remain for very high-dimensional training sets. First, building the training set is expensive. Second, although a Kronecker GP model with a very large number of training points is fast to train, the predictions of the model covariance can be slow. It is also important to note that for galaxies at lower redshift, building a model would require more realistic simulations reproducing the redshift evolution (light cone), in order to account for wide-angle effects. Applying such a model to real measurements would also first require validation on realistic mocks using different galaxy-halo connection models.
Our MGP model could be used in future works to emulate a larger set of galaxy clustering observables and their cross correlations such as: galaxy and peculiar velocity auto and cross correlations, galaxy clustering and weak lensing, etc. Accurate models of these correlations are essential in the era of high precision measurements from surveys such as DESI or Euclid.
Table A.1: Summary statistics for the fits of the 25 mocks with spherical full-sky footprint from z = 0.15 to z = 0.25, Planck2018 cosmology and the median HOD. ⟨∆_p⟩ is the bias, the difference between the true and best-fit value for any parameter p. ⟨σ_p⟩ is the average over the 25 realisations of the (symmetrised) one sigma confidence level. std(p) is the standard deviation of the estimated parameter p over the 25 results.

Fig. 1: Sampling of the 9-dimensional cosmological space around the Planck2018 ΛCDM cosmology. In blue, the 88 cosmologies used to train the emulator; in orange, the 6 cosmologies used as a test set. The parameters are the dark matter density ω_cdm, the baryon density ω_b, the clustering amplitude σ_8, the dark energy equation of state w(a) = w_0 + w_a(1 − a), the Hubble constant h, the spectral tilt n_s, the density of massless relics N_ur and the running of the spectral tilt α_s (Maksimova et al. 2021).

Fig. 2: Different forms taken by the total galaxy occupation for the 600 HOD parametrisations used. The black lines show the mean HOD configuration for ⟨N_cen(M)⟩, ⟨N_sat(M)⟩ and ⟨N_tot(M)⟩.

Fig. 3: Mean clustering, with its standard deviation, of the 25 Planck2018 boxes using the median HOD. In red, ξ_0 is computed using Eq. 8 with a fixed LoS, and in blue using Eq. 3 with a varying midpoint LoS. In the latter case, in order to get a realistic clustering, we place the observer at the center of the boxes and cut a spherical full-sky footprint from z = 0.15 to z = 0.25. The bottom panel shows the residuals using the cosmic variance of the (realistic) varying LoS.

Fig. 5:
Fig. 4: Emulator train and test sets, in blue and orange respectively. The clustering for every cosmology and HOD is measured using Eq. 8, applying RSD along a fixed LoS that we choose to be the z axis of the box.

Fig. 7:
Fig. 6: Grey lines show the relative difference in % between the MGP emulator predictions and the expected values for the full test set. The blue and green lines are the medians of those deviations for the MGP and IGP models respectively. The black dotted line corresponds to a 1% deviation. The red dotted line shows the median of |σ/ξ_true| in %, with σ² the cosmic variance computed for the Planck2018 cosmology and median HOD.

Fig. 8: Total correlation matrix of the MGP including the model uncertainty prediction, C_tot = C_cosmic + C_emu(θ_Ω, θ_hod). The measurement uncertainty C_cosmic was evaluated for the Planck2018 cosmology with the median HOD using the 1400 small boxes. Including C_emu(θ_Ω, θ_hod) significantly correlates the small scales.

Fig. 9: In the upper panel, the 25 realisations of the Planck2018 cosmology using the median HOD are shown in grey, with the average as a red dotted line. The 25 residuals for both the IGP (green) and MGP (blue) predictions are shown in the bottom panel. L⁻¹ is the Cholesky decomposition of the inverse of C_tot. The grey shaded areas correspond to one and two sigma deviations.

Fig. 10: Posterior distributions of the cosmological and HOD parameters obtained after running the MCMC inference on the average stack of the 25 correlation functions with Planck2018 cosmology and the median HOD. The results obtained using the IGP model with (without) the model covariance prediction are shown in green (red), and those obtained using the MGP model with (without) the model covariance prediction are shown in blue (black). The true parameters are indicated by the black dotted lines.

Fig. 11: The 25 realisations of the Planck2018 cosmology with the median HOD, in grey. The average is shown as a red dotted line and the MGP prediction in blue. The 25 residuals for both the IGP (green) and MGP (blue) predictions are shown in the bottom panel. L⁻¹ is the Cholesky decomposition of the inverse of C_tot.

Fig. 12: Reduced chi-squared, best-fit growth rate and corresponding uncertainties, resulting from the separate fits of the 25 cubic mocks with Planck2018 cosmology and the median HOD. Results using the MGP and IGP models are shown in blue and green respectively. The true parameter values are indicated by a red dotted line.

Fig. 14:
Fig. 13: Best-fit cosmological parameters along with the corresponding uncertainties, for the 25 cubic mocks with Planck2018 cosmology and the median HOD. Results using the MGP and IGP models are shown in blue and green respectively. The true parameter values are indicated by a red dotted line.

Fig. 15: Top panel: mean clustering, with its standard deviation, of the 25 mocks with Planck2018 cosmology, the median HOD and a spherical full-sky footprint from z = 0.15 to z = 0.25. The true and AP-distorted clustering are shown in orange and indigo respectively. The bottom panel shows the residuals using the cosmic variance of the varying-LoS clustering.

Fig. 16: Reduced chi-squared, best-fit growth rate and corresponding uncertainties, resulting from the separate fits of the 25 mocks with spherical full-sky footprint from z = 0.15 to z = 0.25, AP distortion, Planck2018 cosmology and the median HOD. Results using the MGP with and without modelling the AP distortions are shown in cyan and blue respectively. The true growth rate value is indicated by a red dotted line.

Fig. 18:
Fig. 17: Best-fit cosmological parameters along with the corresponding uncertainties, for the 25 mocks with spherical full-sky footprint from z = 0.15 to z = 0.25, AP distortion, Planck2018 cosmology and the median HOD. Results using the MGP with and without modelling the AP distortions are shown in cyan and blue respectively. The true parameter values are indicated by a red dotted line.
Fig. A.1: Best-fit growth rate and corresponding uncertainties, resulting from the separate fits of the 25 mocks with spherical full-sky footprint from z = 0.15 to z = 0.25, AP distortion, Planck2018 cosmology and the median HOD. Results using the MGP and IGP models are shown in blue and green respectively. The true parameter values are indicated by a red dotted line.

Table 1 :
Cosmological and HOD parameter names and ranges sampled to build the emulator training and testing sets. The cosmology bounds come from the smallest hypercube around the AbacusSummit cosmology grid. The HOD bounds come from the minimum and maximum of each parameter within the 600 HOD.