Cosmological gravity on all scales. Part II. Model independent modified gravity N-body simulations

Model-independent constraints on modified gravity models hitherto exist mainly on linear scales [1]. A recently developed formalism presented a consistent parameterisation that is valid on all scales [2]. Using this approach, we perform model-independent modified gravity N-body simulations on all cosmological scales with a time-dependent μ. We present convergence tests of our simulations, and we examine how well existing fitting functions reproduce the non-linear matter power spectrum of the simulations. We find that although there is a significant variation in the accuracy of all of the fitting functions over the parameter space of our simulations, the ReACT [3] framework delivers the most consistent performance for the matter power spectrum. We comment on how this might be improved to the level required for future surveys such as Euclid and the Vera Rubin Telescope (LSST). We also show how to compute weak-lensing observables consistently from the simulated matter power spectra in our approach, and show that ReACT also performs best when fitting the weak-lensing observables. This paves the way for a full model-independent test of modified gravity using all of the data from such upcoming surveys.


Introduction
In the current cosmological paradigm, dark matter (DM) and dark energy (DE), the so-called dark sector, are the dominant components of the energy density of the Universe, but their underlying physics is still unknown.In addition, assuming that General Relativity (GR) (plus the cosmological constant Λ) is the fundamental law of gravity is an extrapolation of many orders of magnitude (from the Solar System regime where GR is well-tested).Furthermore, the cosmological constant and its theoretical foundations are yet to be completely understood [4].Dark matter has also hitherto eluded direct detection efforts [5,6].These problems are key parts of the theoretical motivation for testing gravity.
The modified gravity model space is vast [see [7,8] for reviews of this topic].As a result, in order to constrain significant volumes of this model space, one needs to formulate model-independent parameterisations that cover a large fraction of the model space. 1 There are many cosmological parameterisations of modified gravity [9][10][11][12][13][14], however the vast majority of these are developed in the "linear" regime, where perturbations (particularly density perturbations) are small.These have been constrained with multiple observational probes [15][16][17][18][19][20][21], but no deviation from ΛCDM has been detected universally.However, the restriction to linear scales significantly limits what these parameterisations can achieve 2 .
This restriction represents a significant challenge, since upcoming cosmological surveys such as Euclid3 , the Vera Rubin Observatory4 , the Nancy Roman Space Telescope 5 and the Square Kilometre Array (SKA) 6 will generate a multitude of data on non-linear cosmological scales.Until now, various studies of structure formation were carried out by performing N -body simulations in specific modified gravity models, such as f (R) gravity [24] or Dvali-Gabadadze-Porrati (DGP) gravity [25]; for a review, see [26,27].While these studies are valuable in understanding the behaviour of the respective models in the non-linear regime, they do not allow one to rule out significant regions of the modified-gravity model space.Furthermore, these studies don't allow us to perform a robust null test of GR, unlike modelindependent approaches.
To this end, it is important to establish a model-independent approach towards testing modified gravity on all cosmological scales.Recent work [2] used the Post-Friedmann formalism [28] to design a rigorous approach to modified gravity that is valid on all cosmological scales.Modified gravity models are typically parameterised by two parameters in linear theory.The post-Friedmann approach extends these linear theory parameterisations in a manner that can be consistently simulated and interpreted on all scales.In this work we explore the non-linear phenomenology of this model-independent approach using N -body simulations.Crucially, a careful choice of the parameters ensures that we can run simulations characterised by a single parameter, and include the effect of the other in post-processing (see sections 2 and 5 for more details).
Understanding this behaviour, and being able to capture it using fitting functions is important if we are to use this approach to analyse the data from next generation surveys.These missions will generate a multitude of data, particularly on non-linear cosmological scales.Not only will they measure the matter power spectrum through galaxy clustering, but they will also measure cosmic shear: the apparent distortion of the shapes of galaxies due to the deflection of light by the distribution of mass along the line-of-sight.As such, being able to compute these weak-lensing observables from our simulations and any fitting functions is also of paramount importance.
The plan of this work is as follows.In section 2, we provide a brief review of the Post-Friedmann formalism and discuss some features of our approach and their impact on the simulations, followed by a description of our N -body pipeline in section 3 as well as a discussion on the performance of the different fitting-functions.We present the phenomenology of the simulations in section 4. In section 5, we compute weak-lensing observables in our approach from the simulations and examine the performance of the fitting functions in this context.We present our concluding remarks in section 6.

Modified gravity framework
Cosmological perturbation theory is typically done in the background of the FLRW metric with small perturbations.Perturbation theory enables us to impose model-independent constraints on modified gravity models on linear scales [see for example [9] and references therein].Meanwhile, a lot of progress has been made [29] in constraining GR on local solarsystem scales by expanding the local Minkowski metric to varying powers of 1/c, where c is the speed of light, with the aim of capturing local non-linear effects.Such an expansion is done assuming the Newtonian limit, i.e., the quasi-static approximation (where time derivatives in the metric perturbations are down-weighted).The Post-Friedmann formalism [28] generalises the Newtonian limit to weak-field cosmology, in a manner that applies to all scales.The FLRW metric is expanded with perturbations where 1/c 2 and 1/c 3 terms are considered leading order.The full metric is written as where the spatial Cartesian coordinates are as measured in an Eulerian system of reference.We operate in the Poisson gauge, one of the few gauges that is valid on both linear and non-linear cosmological scales [23].The vector functions B N i and B P i are divergenceless and h ij is a transverse and traceless tensor (see [28] for a detailed explanation of this expansion and the physical interpretation of each term).In this approach, the dominant constituents of the cosmological fluid are pressure-less dust and the cosmological constant.The Einstein equations and the covariant conservation of energy-momentum are used to obtain a closed system of equations.
Recently, the post-Friedmann formalism was studied in the context of validating a parameterisation of modified gravity models on both linear and non-linear scales [2].This work uses the similarities between the linear-perturbative and Newtonian limits, and the lack of an intermediate regime where both limits fail, to create a set of "master" equations that can be applied on all cosmological scales.In this work, the so-called "re-summed potentials" [28] are used to define the parameterisation for modified gravity models that we use for our simulations.As discussed in [2], the re-summed potentials, in combination with the fact that the weak-field approximation is valid on all cosmological scales, show that the same degrees of freedom and dynamics are operating on all cosmological scales.A parameterised Poisson equation that governs the gravitational dynamics of massive particles may be written as where ρ is the background density, ∆ = δ − ȧ a and time.In ΛCDM µ is unity, but in modified gravity models this parameter may in principle be a complicated function of space and time.The work in [2] showed that these parameterised equations are valid on all cosmological scales, and encompass any modified gravity model as long as certain conditions are satisfied, such as the validity of the Newtonian limit of that theory on all non-linear scales.In particular, f (R) [30] fits within this approach.Moreover, the works ( [22,31]) can be re-interpreted in this framework as specifying the functional forms of µ that occur for some commonly studied models with screening mechanisms.A similar and promising extension of the linear theory parameters to a much wider range of scales was developed in [32], which applies to all models that fit into a Parameterised Post Newtonian (PPN) formalism [29].The equivalent µ in this approach has a functional form such that it has scale dependent features on linear scales, but is purely time-dependent on non-linear scales, due to the restriction to theories that fit into the PPN framework.Another promising proposal to address parameterising modified gravity theories on nonlinear scales was proposed in [22].This theoretical setup maps specific modified gravity models to known screening mechanisms via detailed spherical collapse calculations.This formalism was recently implemented to run phenomenological N -body simulations of f (R) gravity without having to solve for the additional dynamical scalar field [31].As discussed above, this work can be re-interpreted as a description of the functional forms of µ in our approach for known theories with screening mechanisms.We note that our approach is less restrictive in theory space than either of the two approaches discussed here, however this comes at the cost of an increased difficulty to pinpoint the type of theory responsible if a deviation is found.
With all these points in mind, in this work we take the first step toward a fully modelindependent treatment of modified gravity with N -body simulations.Various approaches for modelling µ are considered in [2] including, for example, using the known functional forms of µ on linear scales to transition from into the non-linear regime at k NL (traditionally chosen to be the scale at which k 3 P (k) = 1).Such an approach would therefore have an implicit timedependence, depending on when the transition to non-linear structure formation occurs for a given µ.For our simulations, we use the "maximally phenomenological pixels" approach as described in [2], in which we run N -body simulations where µ is a piece-wise constant function composed of independent bins or pixels in time and space which can be varied independently of each other.This allows us to be as agnostic as possible about the functional forms of the modifications to gravity.Note that this approach has been applied successfully in linear theory for modified gravity [1] and also generalised constraints on dark matter [33].Since the evolution of matter is completely determined by the Poisson equation given in eq.(2.6), we can run the N -body simulations with a single modified gravity parameter µ, and only require the second parameter when computing the observables from the simulation outputs (see section 5 for a more detailed discussion of how our simulations can be used to probe both parameters).Since we restrict ourselves to sufficiently sub-horizon scales, the Newtonian approximation allows us to set ∆ = δ, as is typical in Newtonian N -body simulations.
We will focus on models of modified gravity with a purely time-dependent µ.Whilst the full time-and scale-dependent analysis would employ similar methods and techniques outlined in this work, it requires a more involved alteration of the methods used to evolve the N -body simulation.We will present this in future work.More importantly, the purely time-dependent µ case is of considerable interest in its own right for several reasons.The time-dependent case is a bridge between linear and non-linear studies, because the linear growth factor remains purely time-dependent.This allows the non-linear scale dependent phenomenology due to an evolving strength of gravity to be clearly studied separately from the additional complexity introduced even on linear scales when µ is scale dependent.This was shown in [34] for a general class of modified gravity models on linear scales.We will return to this point later in this work (see eq. (2.8) and the discussion following for further details on the utility of maintaining scale independence on linear scales).We leave combined scale-and time-dependent µ simulations to future work.
Previous studies of phenomenological N -body simulations were carried out in [35][36][37][38][39].As discussed in [2], the interpretation of these simulations is not always entirely clear, in terms of how they relate to linear parameterisations, and how they relate to the equations on non-linear scales in particular modified gravity theories.By working from eq. (2.6) and the derivation and framework of [2], we avoid these possible problems, although we note that some of these earlier simulations are justified post-hoc by [2].The works [37,38] restrict themselves to smaller regions of parameter space, while the studies [35,36] are similar to ours in the sense that they concentrate on a purely time-dependent µ, although these simulations were run with a constant value for µ throughout the simulation.For this simplified setup, the authors in [35] derive a fitting function to calculate the power spectrum for arbitrary values of µ (see the appendix in [35] for details).In this work we explore the validity of the fitting function derived in [35], and other fitting functions in the literature, which attempt to reproduce the non-linear matter power spectrum P (k) for modified gravity models.
Our simulations are the first to explore the phenomenology of µ having different values in multiple redshift bins through the simulation.This means that our method allows one to be sensitive to any time variation that µ may assume, irrespective of theoretical prejudices/biases.
As noted, an advantage of a purely time-dependent µ is that the linear growth factor D = δ/a is purely time-dependent, as in ΛCDM.We use this property in section 4 to separate whether phenomenology is sensitive to the era in which the modified growth occurred, and provide analytic checks of the simulations on large scales.It is also useful for comparing to different ΛCDM simulations, and when considering the "pseudo spectrum" in ReACT (see section 3).We solve for the linear growth factor D(z) using where denotes a derivative with respect to the logarithm of the scale factor a and E(z) is the dimensionless Hubble function.

N -body simulations
In this section we present the details of the N -body simulations and code modifications, as well as the tests and consistency checks we performed.We use the publicly available N -body code GADGET-2 [40] as the basis for our N -body pipeline.GADGET-2 and modifications of it have been used extensively in the literature to run dark matter-only simulations of a wide variety [41][42][43].We restrict ourselves to dark matter-only simulations in order to understand modified gravity phenomenology.We present a summary of our numerical pipeline in the flowchart presented in fig. 1.We will now discuss the main branch of this pipeline in detail.
GADGET-2 employs a TreePM algorithm that splits the force computation into a longrange and short-range force.We modify both the long-range and the short-range force in the Flowchart of the numerical pipeline that we implemented.We establish initial conditions at z = 50, which are then evolved up to z = 0 using GADGET-2.Blue panels represent codes that we have modified, while red panels are codes that are used in their publicly available form.The modified gravity parameters are in orange, while ΛCDM parameters are in cream.The central branch represents our numerical simulation pipeline, while the right hand branch denotes our most successful pipeline for fitting the simulations (see section 3.2 for more details).
algorithm.We follow a similar procedure as in [44], where we have a modified Poisson equation with a time-dependent µ parameter, which is determined as a function of the simulation redshift.We now describe the time evolution of µ within our simulations.
For the phenomenological pixels that we are considering, the µ function is split into a set of piece-wise constant redshift bins, i.e., each with a constant value for µ.In order to explore the phenomenology of this approach as well as make contact with earlier work [35], we run a suite of simulations with a simple "1-2-4" hierarchy of pixels.In other words, we split the redshift interval 0 ≤ z ≤ 50 into 1, 2, and 4 redshift bins of equal ΛCDM growth.In order to isolate as far as possible the non-linear behaviour (of the matter power spectrum) from the linear theory behaviour, we choose the values of µ in each bin such that the linear growth factor at z = 0 is equal when each bin is "switched on" exclusively, with µ = 1 in all the other bins (see table 3.1 and fig. 2 for the redshift bins and the respective µ values that correspond to each of them).The widths of each bin are chosen such that, for our cosmological parameters, there is equal growth in each bin in a ΛCDM universe.See Appendix B for full details of this procedure.This allows an additional consistency check for our simulations, since the matter power spectrum at z = 0 for all our simulations should differ from the standard ΛCDM result by a factor [D(z)] 2 .
We use 2nd order Lagrangian perturbation theory [45] to generate our initial conditions at z = 50 for all our simulations, via the 2LPTic code [46].We only consider modifications to gravity from the starting redshift of the simulations.This allows us to use the same initial conditions (and therefore realisations) for modified gravity and ΛCDM, allowing us to focus on the phenomenology due to the modified gravitational evolution.We take advantage of  1.The redshift bins of equal ΛCDM growth according to the procedure described in the text, where we have the same linear growth at z = 0 when one of the bins is "switched on" for one, two and four bin(s).The µ values in each bin such that P (k) at z = 0 when only that bin is turned on, is identical to P (k) at z = 0 in the case when only the reference bin is turned on.these identical initial conditions as this allows us to present our results in terms of ratios of matter power spectra [47] (see Appendix B for further details).Taking ratios of matter power spectra ensures that we cancel out any realisation dependent effects that individual models, i.e., µ values and redshift bins are insensitive to, up to a wavenumber k ∼ 5 h Mpc −1 that was validated by convergence tests (see appendix A for more details).For all our simulations, we use a ΛCDM initial power spectrum created using the CLASS code [48] at z = 50, with the following parameters Ω m = 0.315, Ω Λ = 0.685, h = 0.674 and σ 8 = 0.811.
We also run a separate set of simulations with modified initial conditions such that we obtain ΛCDM matter power spectra with the same linear growth at z = 0 as our modified gravity simulations.These simulations have a modified value of σ 8 = 0.883 and σ 8 = 0.742, for the simulations with µ > 1 and µ < 1, respectively.This allows us to match the linear growth at z = 0 of all our simulations with the corresponding re-scaled ΛCDM simulation.This is important as the ReACT fitting function that we explore later in section 3.2 predicts exactly the ratio of matter power spectra in modified gravity w.r.t. to ΛCDM spectra with matched linear growth at the same redshift.We refer to these ΛCDM spectra as 'pseudo' spectra.These simulations have the same initial seed as our standard ΛCDM simulations.
To measure the matter power spectrum from the output of our simulations, we use the publicly available powmes code [49,50].For notational convenience, we define the ratio S(k) = P (k)/P ΛCDM (k) to be the ratio of the matter power spectrum measured from our simulations to the ΛCDM power spectrum and R(k) = P (k)/P pseudo (k) to be the ratio of the modified gravity matter power spectrum from the simulations to the ΛCDM power spectrum with the same linear growth factor.
To test the convergence of our simulations, we run them increasing the particle numbers for an identical box size.We then plot the ratio of the matter power spectra from each simulation to the power spectrum from the one with highest resolution, which we choose to be our reference.Our reference simulation contains 1024 3 particles in a comoving box of side 250 h −1 Mpc.We demonstrate convergence in our ΛCDM simulations by showing that the power spectrum from the one with the second-highest resolution (512 3 particles) agrees with that of highest resolution up to k = 4.8 h Mpc −1 , shown by the vertical green line in fig.11 (see appendix A for further details).We use a gravitational softening length equal to 2% of the inter-particle-distance, and a comoving mesh with the same number of grid points and particles.
We ensure that µ is a well-defined, continuous function of redshift by smoothing at the bin edges.In order to quantify the effect of such a smoothing procedure on the power spectrum, we first worked with the CLASS code, before implementing an identical algorithm in GADGET-2.Based on these results, the same smoothing procedure was implemented in GADGET-2 (see Appendix B for details).These results indicate that there is very little difference in the simulation results, regardless of whether µ is a smooth, continuous function of redshift or not, for our chosen parameters.
Note that we have primarily discussed the central branch of the flowchart in fig. 1 in this section; we now turn to the right hand side branch of the pipeline in section 3.2.

Predicting the power spectrum on non-linear scales
N -body simulations are too computationally expensive to be used to forecast and perform data analyses for upcoming surveys.Thus in this section we test the performance of various theoretical formulations that attempt to predict the full non-linear matter power spectrum.We do this by comparing their respective accuracy in the reproduction of the ratio of the modified gravity matter power spectra with respect to the ΛCDM matter power spectrum as measured from our simulations.This comparison of ratios allows us to remain independent of realisation-dependent effects.
The results in [35] provide hints that knowledge of the linear power spectrum is insufficient to completely calculate the matter power spectrum on non-linear scales.This has important consequences for the fitting functions.In section 4 we show explicitly that in our framework simulations with identical linear matter power spectra can show significant differences in their non-linear power spectra.
We consider the following formalisms as candidates for predicting the non-linear matter power spectrum: • case 1 -the fitting function obtained in [35]; • case 2 -the standard halo model as in eqs.(C.2),(C.3)with a Sheth-Tormen mass function [51,52]; • case 3 -the halofit fitting procedure [53]; • case 4 -the halo model reaction formalism [3] (ReACT).It is important to note that the halofit prediction (magenta dotted) is actually identical for all the simulations, since they all have identical linear power spectra.In all the panels, we can clearly see the ReACT curve (blue dot-dashed) is consistently within 5% of the black dot-dashed line which indicates our reference, the simulations.
We will now briefly describe each case.
To our knowledge, the only fitting function in the literature based on an arbitrary µ is that presented in [35].As noted earlier, this fitting function was estimated assuming a constant µ through the simulations, which is not the case in our simulation (see appendix C for more details).Nevertheless, we investigate the performance of this fitting function in the context of our simulation results.
As suggested in [35], the halo model provides a potential explanation of the results obtained in the simulations.We implement the standard halo model to predict the matter power spectrum [see appendix C].We also consider the halofit fitting procedure [53][54][55] that is commonly used for a ΛCDM Universe.However, it is well known that the basic implementation of the halo model (the simple addition of the 2-halo and 1-halo terms, or equivalently, the addition of the 1-halo term to the linear matter power spectrum) is inaccurate on quasi-linear scales due to halo correlation effects that require more detailed modelling [56].To circumvent the problems associated with the halo model on quasi-linear scales, the authors in [3,57] modified the halo model.This   can be seen in the following equation where R(z, k) is the so-called halo model reaction and P pseudo is the ΛCDM 'pseudo' power spectrum with identical linear growth to P mg .The reason for the use of the pseudo spectrum is that since it has identical linear clustering with respect to the modified gravity spectrum, the inaccuracy in the transition from linear to non-linear scales is slightly alleviated (see appendix C for a more detailed explanation of the various fitting functions).
The authors also invoke 1-loop perturbation theory corrections to quasi-linear scales for the halo model (see Appendix C for more details).We note that the authors in [3] modify the ΛCDM concentration-mass relation for the dark energy models under consideration (see Appendix C for more details and further discussion of the theoretical formulation of ReACT.See also fig.6 where we consider a further modification of the ReACT formalism).
In figs. 3 and 4, we show the ratio of the quantity R(k) (i.e., the ratio of the modified gravity matter power spectrum to the pseudo ΛCDM matter power spectrum) as predicted by the fitting functions with respect to the same quantity computed from the simulations.We have chosen a subset of our simulations that is representative of the performance of the different fitting functions we consider.Clearly, we see that the fitting function in [35] fails to predict the correct non-linear behaviour when µ is not a constant value throughout the simulation.As expected, the halofit prediction is identical for all the simulations, since they all have identical linear growths.In other words, the simulations where the halofit performs better than ReACT are the cases where modified gravity parameters coincidentally result in a matter power spectrum that is already very close to that of ΛCDM.We show that on average, this accidental success is not typically replicated in the general case.We see that while the ReACT formalism doesn't always predict the non-linear trend perfectly, it appears to be consistently within 5% of the simulations.Since we are only interested in understanding the modified gravity phenomenology, we do not vary ΛCDM parameters in this work.We leave the full validation of ReACT across ΛCDM and modified gravity parameter space to future work.
We use the following least-square statistic to quantify the agreement between the various fitting functions with our simulations where N = 160, with the sampling in Fourier space being approximately logarithmic, as in powmes and k cut is a wavenumber cut-off that we choose.The subscript 'sim' implies the ratio as measured in our simulations and the subscript 'FF' implies the ratio as predicted by the various fitting functions.We employ a cut-off at k cut = 3 h Mpc −1 , which is the theoretical threshold up to which the ReACT formalism is supposed to be accurate [3].Note that this scale is within the region of validity of our simulations [see appendix A], which is We show the χ 2 statistic with both wavenumber cut-offs in fig. 5.This indicates that the ReACT formalism provides the best fit to our simulations, as evidenced by the fact the red line is largest in the majority of the cases presented.Note that since we plot the negative logarithm, a larger bar corresponds to a better fit.We also note that due to the failure of the Cui et al. [35] (see eq. (C.1) in appendix C) even at small k to capture the results of our simulations, we don't include their fitting function in this figure.
We also show the wavenumber k fail at which the quantity R FF (k)/R sim (k) deviates from unity at the level of 3 and 5 %, respectively in the right panel of fig. 5. Note that this probes the validity in k-space of each fitting function, while the χ 2 statistic is an indicator of the accuracy of each fitting function within this region of validity.The combination of k fail and χ 2 throws light on the applicability of a fitting function to future forecasts.
In [35], the authors claim that in order to accurately reproduce the power spectra in simulations using the halo model, one needs to take into account that any modification to µ results in a change in the rate of structure formation, i.e., a change in the concentration parameter associated to haloes.Increasing µ causes haloes to form earlier, leading to lower concentrations in comparison to ΛCDM and vice-versa for decreasing µ.It is also worth noting that in order to accurately reproduce power spectra in f (R) and DGP models, the authors in [3] modify the ΛCDM concentration-mass relationship.Therefore, we would expect that in order to achieve precision in forecasting phenomenological modified gravity, one would need to depart from the concentration-mass relationship observed in ΛCDM.
In anticipation of this, we carry out a simple modification of the ReACT formalism in which we rescale the ΛCDM concentration parameters for all halo masses.This re-scaling is given by c MG vir = c ΛCDM vir /A, where A is a function that would conceivably depend on µ, the redshift at which it is modified and the duration in redshift over which it is modified.In fig.6, we demonstrate that for different values of A, the non-linear tail of the power spectrum is significantly affected.Indeed, we find a better fit for A = 1.6, as can be seen in the right panel of fig. 6.We note that this is only a preliminary examination of this effect and requires rigorous validation in future work, nevertheless we find similar qualitative behaviour to other studies.Clearly, the red bar is larger than the other bars in the majority of the simulations which implies that the ReACT formalism is typically the best fit over the range of scales (note that since we plot the negative logarithm, a larger bar is a better fit).In the right panel, we plot the wavenumber at which the quantity R(k) − 1 is at 3% and 5%, respectively.The results are consistent with the left panel, with the red bar being the largest for the majority of the simulations.This shows that the ReACT formalism allows one to probe deeper into the non-linear regime. .We display the ratio of the power spectra with respect to one of our simulations for several values of the parameter A in the left panel.The solid cyan line (which corresponds to A = 1.6) is the best fit, with the smallest χ 2 parameter, as can be seen by the plot on the right panel.While this behaviour was hinted at in various works in the literature [35], to our knowledge this is the first time that such an analysis has been carried out for the general model-independent case time-varying µ.We leave the analysis of varying A as a function of µ and the resulting rigorous validation of this modification to future work.

Simulation phenomenology
In this section, we discuss the phenomenology from our full suite of simulations with the fourteen different forms of µ(z).Our goal is to understand the variation in the non-linear matter power spectrum as a function of µ(z).In the top row of fig. 7, we present the ratio S(k) (at redshift zero) of the power spectra from all the simulations to the ΛCDM spectrum with the same initial conditions.We note that while the ratio relative to the pseudo-spectrum R(k) is the most appropriate ratio for the fitting functions in the previous section, here we focus on S(k) because it throws light on the different structure formation histories of the simulations with respect to ΛCDM.By design, the lines in these plots are all equal on The ratio of the power spectra from all the simulations to the ΛCDM power spectrum at z = 0. Note that we see a quasi-linear bump in these due to the mismatch in linear growth, which is absent in the other rows.We see the multi-dimensionality of the parameter space in display here since we have 6 different unique non-linear power spectra for the same linear power spectrum as the case where µ is constant throughout the simulation as studied in [35].Middle row : The quantity R(k), the ratio of the matter power spectra from the different simulations to the pseudo ΛCDM power spectrum with the same linear growth.Lower row : Ratio of power spectra at z = 0 from all the simulations to the power spectrum at z = 0 from the reference simulation with a constant µ throughout, with redshift bins according to table 3.1.
linear scales (for small k).We observe two distinct features in S(k), present on quasi-linear scales and non-linear (large k) scales respectively.Firstly, we consistently see a rise (or dip, depending on whether µ > 1 or µ < 1) in power across all simulations on quasi-linear scales at the level of ∼ 20% (at 0.1 h Mpc −1 ≤ k ≤ 1 h Mpc −1 ).This is followed by a "split" behaviour on small (non-linear) scales, where the power depends on the range of redshifts over which the modified gravity effects were switched on, and the simulations with µ = 1 at large redshifts exhibit the opposite behaviour to the case where it is switched on at small redshifts.This is due to varying halo formation times and inter-halo clustering, as we will discuss later in this section.
The quasi-linear behaviour may be attributed to the difference in growth histories between ΛCDM and the different simulations, since the effect of a different growth rate is more pronounced on quasi-linear scales compared to linear scales.We illustrate this using the middle panels of fig. 7, where we compute the ratio of the matter power spectrum in our simulations to the pseudo power spectrum.This quasi-linear feature is significantly diminished, since the pseudo power spectrum has the same linear growth at z = 0, so by construction the increased effects of the changed growth are mostly removed.This is why the pseudo power spectrum is a good comparison point when examining fitting functions.The same quasi-linear behaviour is seen in the bottom panel, where we compute the ratio of the matter power spectra relative to a reference simulation where µ is held constant in redshift.
We now turn to the non-linear split behaviour that we see in all of the rows.On small scales the lines diverge, depending on the redshift range during which µ = 1 in each simulation.In particular, the simulations where µ = 1 at earlier times, continues the trend expected from linear scales, that µ > 1 increases the power and vice versa for µ < 1.However, varying µ at low redshifts introduces the opposite effect to linear theory.For example, µ > 1 at late times actually leads to a lack of power on non-linear scales.This results in "crossing points" with respect to ΛCDM, i.e. scales below which the power spectrum is actually reduced, despite equal initial conditions and µ > 1.This implies that the state of clustering when µ is modified, in terms of how much structure has already formed and on what scales, is important for understanding the final clustering spectrum on small scales at redshift zero.Since we obtain intrinsically different shapes for the power spectra for different values of µ with identical linear growths, the discussion on concentration factors in [35] is insufficient to explain the physics of these simulations.These plots clearly show that any fitting procedures that predict the non-linear matter power spectrum purely from the linear spectrum will fail to capture non-linear effects accurately.
We can examine this small-scale issue in more detail by looking at the redshift evolution of the matter power spectrum in our simulations.In fig.8, we show the ratio of the power spectra from the simulations to the ΛCDM spectrum, at all of our output redshifts.There is a clear transfer of power over time from small scales to large scales.This is demonstrated in the top row where the peak/trough (depending on whether µ > 1 or µ < 1) in the ratio of the power spectra shifts to the left, i.e., to larger scales as one steps forward in redshift (from high redshift to low redshift).In the bottom panels, this transfer of power persists, although we see it manifested as phenomenon that 'evens out' the power over large and small scales.This transfer of power indicates the formation of larger structure in the universe causing power to be deposited on larger scales.Comparing the position and height of this peak/trough provides information on the rate of structure formation and the gravitational interaction at different epochs.On comparing the behaviour of the magenta, black and red lines in the top row with the bottom row, we also see the reinforcement of our earlier result,

S(k)
= 0.746 at 0 z 2.1 Figure 8.The ratio of the power spectra at all the redshifts output from our simulations relative to the ΛCDM power spectrum at the same respective redshifts, for the case where we have 4 bins in µ but the same linear growth at z = 0.These figures highlight the hierarchical nature of structure formation and its varying influence on the power spectrum depending on whether µ is switched on early in the simulation or later.In the former case, we have an early modification of the non-linear power spectrum, the imprint of which can be noticed at z = 0, while in the latter case, the late switching on of µ, introduces interaction between already formed structures resulting in non-intuitive shapes for the matter power spectrum.
that there is an excess of power in the non-linear regime when µ > 1 at high redshift, and vice versa at low redshift (this behaviour is mirrored when µ < 1).We now consider these two cases in more detail.Structure formation in the Universe is hierarchical, with smaller haloes being formed before larger ones.Therefore, modifying µ at early times impacts the power spectrum on the smallest scales as demonstrated in the bottom row of fig.8 where there is a substantial peak in S(k) on small scales at high redshift z = 4 and z = 5.This is due to the fact that only the smallest haloes were in the process of formation.Effectively, the increase in the rate of structure formation at early times causes smaller haloes to be formed earlier.The formation of larger structure only takes place when growth reverts to the ΛCDM rate.As one steps forward in redshift, one sees that this small scale peak is diminished, but not totally destroyed.As larger haloes start to form, mergers begin to take place leading to a gradual shift in power towards larger scales.However, there is still an excess of power at z = 0 on small scales, as a signature of the enhanced structure formation at earlier times.
Modifying µ at late times introduces additional complexity, since at late times, larger structures have started to form.Therefore, increasing/decreasing the strength of gravity massively impacts the interaction between larger haloes, i.e., halo power spectrum as seen by the aforementioned shift in the peak of the power spectrum to the left in the top row of fig. 8. Physically, this means that the interaction between haloes depletes power on smaller scales and deposits power on intermediate/large scales.Therefore, there is a transfer of power from smaller scales to larger scales (and of course the opposite happens when µ < 1).This further validates the behaviour of the orange, dark red and yellow lines in fig.7, where the small scale behaviour is opposite to that on linear scales.We note that the middle rows in fig.8 contain a mix of the two limiting cases described in the above paragraphs.

Computing weak-lensing observables in modified gravity
We now discuss the impact of varying µ in different redshift bins on the observables.In this section we compute the weak-lensing convergence power spectrum from the output of our simulations, and examine the impact of having predictions that are not restricted to linear scales.Furthermore, we also discuss how one can probe our two-parameter family approach to modified gravity from the output of our simulations.
The weak-lensing convergence power spectrum can be computed from the matter power spectrum and the modified gravity parameters from section 2 using the following expression where P δ is the matter power spectrum, η is the second modified gravity parameter that affects the photon geodesics and χ is the comoving angular diameter distance to the source along the line of sight.This equation is derived using the standard procedure of solving for null geodesics in the perturbed FLRW metric, in order to express the convergence in terms of the metric potentials, but we then replace the potentials by substituting the modified Poisson equation in eq.(2.6).We stress that in order to compute the matter power spectrum, one requires no knowledge of η.Therefore, we need only model µ in our N -body simulations.We will return to this point shortly.In practice, the above integral is truncated at the distance χ max corresponding to the maximum source redshift the survey is sensitive to.In this work, we concentrate on the auto-power spectra.The additional complications involved in the analysis of the cross-power spectra requires a full statistical forecast, which is beyond the scope of this work.The function g(z) is a filter function that depends on the redshift distribution of the background galaxies and is typically written as We use the standard expression for the source galaxy redshift distribution given by [58] We adopt a Euclid-like binning of the source number density into 10 equi-populated bins according to eq. ( 5.3) with z 0 = 0.9/ √ 2, α = 2 and β = 3/2, where we have assumed an average source density ng = 30 arcmin −2 [59][60][61][62][63]. Due to the fact that the ReACT formalism is only valid at z < 2.5, we concentrate only on the first tomographic bin, i.e., 0 ≤ z ≤ 0.4.The error bars are obtained by computing the following [63] where f sky = 0.7 is the fractional sky coverage, σ = 0.21 is the variance of the observed ellipticities and ni is the surface galaxy density of each bin.Essentially, we combine the Poisson shot noise contribution with cosmic variance to obtain the total uncertainty on the power spectrum, P κ .Note that since this involves an integration along the line of sight, one needs to compute the matter power spectrum at multiple redshifts and interpolate between them to compute the integral in eq.(5.1).To do this, we make use of the project 2d module in the Cosmosis numerical library [64].We compute both the linear and non-linear matter power spectrum from our simulations at z pk = {0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0} which are then used as input to perform the integral in eq. ( 5.1) assuming the tomographic source redshift distribution given by eq. ( 5.3).Following eq. ( 5.1), we multiply the input power spectra that we are interpolating between by the corresponding value of µ 2 (1 + η) 2 /4 at each redshift.
We compute the convergence power spectra from both linear matter power spectra and the non-linear matter power spectra from our simulations, and compare them with the ΛCDM convergence power spectrum with Euclid-like error bars.On the left hand panel of fig.9, we show the convergence power spectra as computed from the output of one of our simulations.Note that for this plot, we only consider a variation in µ, and we set η to its ΛCDM value (η = 1).We see that on large scales, i.e., small , the modified gravity spectrum is within the ΛCDM error bars, while the spectrum on non-linear scales is outside the error bars.Moreover, the linear spectrum deviates from the non-linear spectrum below = 100.This shows the strong increase in constraining power that can be achieved by using non-linear scales: not only are there many more scales that can be used, but also the errors are typically smaller on those scales.These results emphasise the importance of developing simulations and fitting functions for model-independent modified gravity, as examined in section 3. ( + 1)C bin 1_1 = 0.9 at 0 z 7.01 bin 1_1 = 0.9 at 0 z 7.01 (1 + ) 2 = 1 bin 1_1 = 0.9 at 0 z 7.01 = 1 CDM bin 1_1 Figure 9.In the left panel, we show the weak lensing auto-power spectrum for a particular tomographic bin from ΛCDM (red-dashed) and from the simulations with µ = 0.9 at 0 ≤ z ≤ 7.01 (blue solid).Note that we are using the ΛCDM simulation with rescaled initial conditions at z = 50.In the right panel, we show three different curves with the same µ bin, but with different values of η.The first curve has η = 1 (blue solid).The second has η = 0.8 such that the prefactor µ(1 + η)/2 = 1 as in ΛCDM (green-dotted), such that the modified gravity effects only enter through the modified P δ .Finally, we show the case where µ = 1, but η = 0.8 at 0 ≤ z ≤ 7, which has a different shape, i.e., different scale dependence on non-linear scales.Therefore, we show how our approach may be used by future missions to put constraints on both µ as well as η.We stress that by explicitly going to non-linear scales, we are able to break the degeneracy between η and µ that exists in linear theory. .We show the ratio of the convergence power spectra obtained from the matter power spectra as predicted by the ReACT (blue-dotted) and halofit (green dot-dashed) formalisms with respect to those obtained from the simulation matter power spectra.The superscripts 'FF' and 'sim' imply 'fitting function' and 'simulation', respectively.We immediately see that halofit performs significantly worse than ReACT even at low , where it fails at the level of ∼ 5%.The vertical lines indicate the value of at which the ReACT curve deviates from the simulation curve at the level of 3% and 5%, respectively.We choose two simulations that capture the variation in the cut-offs in our sample of simulations.In the left panel is the simulation with the largest cut-offs in our sample of simulations.In the right panel, we show a simulation with one of the smallest cut-offs.The better performance of the ReACT formalism over halofit in such a model-independent analysis of modified gravity on non-linear scales is clearly demonstrated.We also show the curve with the concentration parameter modified (see fig. 6) which seems to further improve the ReACT fit, see text for details.
We note that comparing our modified gravity parameterisation (with parameters that are only time dependent) to ΛCDM for a single observable is slightly non-trivial, because one has a free choice of which redshift to make the linear matter power spectra agree at.For our weak-lensing comparison, using the lowest redshift tomographic bin, we choose to focus on the case where the two linear spectra are equal at redshift zero (i.e. the ΛCDM simulation with rescaled initial conditions at z = 50).We make this choice for several reasons.It minimises the linear theory difference between the two curves, allowing us to focus on the additional information from non-linear behaviour/scales and, as long as the linear theory behaviour is similar at low redshift, the exact redshift at which they are equal is unimportant since lensing combines information from multiple redshifts.Our conclusions about the relative sizes of errors on difference scales, and the relative numbers of scales that contribute in each regime, are unchanged if a different choice is made.We will extend our analysis to a full parameter forecast for upcoming surveys in future work, which will allow this issue of parameter degeneracies to be examined in detail.
We now generalise our results to using our simulation output to probe the effects of both µ and η.As mentioned before, since the input matter power spectra from the simulations are unaffected by η = 1, we may model the effects of it in post-processing.In other words, the evolution of matter fluctuations is driven solely by µ, but the trajectory of photons through this distribution is affected by both η and µ.Therefore, we can effectively ignore η whilst running our N -body simulations without any loss of generality, essentially getting P δ (k) as a function of µ only.We then use eq.( 5.1) to compute the weak-lensing observables for both parameters.
As shown in eq.(5.1), the effect of varying η is captured solely by the prefactor µ 2 (1 + η) 2 /4.Therefore, one would expect that a variation of η would result in an overall shift of the power spectrum.If one were to concentrate on purely linear scales, this is exactly the same as varying µ due to the fact that a time-dependent µ also simply shifts the matter power spectrum in the vertical direction(s).As a result, there is a degeneracy between time-dependent η and µ in linear theory.By explicitly going to non-linear scales, we break this degeneracy since we have already shown that a time-dependent µ introduces scale-dependent features in the matter power spectrum, and therefore in the weak-lensing convergence spectrum via P δ in eq.(5.1), whereas η introduces no such scale dependence. 7he breaking of this degeneracy is clearly shown in the right panel of eq.(5.1),where we show the ΛCDM convergence spectrum with error bars, along with two different curves obtained from the same simulation, with identical µ(z), but different values of η.The greendotted line is obtained by setting µ 2 (1 + η) 2 /4 = 1 (as in ΛCDM).The solid blue line is obtained directly from the simulation, i.e., η = 1.We see that varying η while keeping µ constant leads to an overall different amplitude of the convergence spectra.We also explicitly show that if one reverts to the ΛCDM value of µ = 1 while keeping the prefactor µ(1 + η)/2 the same as in the blue line, one obtains the same shape for the power spectrum as the ΛCDM case.This can also be seen by comparing the shapes of the blue line and the green-dotted line which have identical µ(z).Thus, it is clear that the scale dependent effects from a timedependent µ breaks the degeneracy between the two parameters that exists on linear scales, allowing the two parameters to be distinguished from each other and from their ΛCDM values.We note that the linear theory degeneracy should persist in both the auto-power spectra and the cross-power spectra.However, we leave the full behaviour of the cross-power spectra as a function of varying η and µ to future work.
These results not only show how weak-lensing observables can be constructed from our simulations for the full (µ, η) parameter space, but also further highlight the importance and gain of using non-linear scales.

Further evaluation of fitting functions
We can extend the earlier analysis of the fitting functions, by examining how well they capture the non-linear behaviour of the weak-lensing observables.We focus on the two fitting functions that performed the best in section 3.2, i.e., the ReACT formalism and the halofit fitting procedure.
In general we find that the better performance of ReACT compared to halofit is further improved when looking at the weak-lensing observables, compared to when directly examining the matter power spectrum.In particular, similarly to section 3.2, we quantify the quality of the fitting functions by calculating the fail value (rather than k fail as in section 3.2) at which the fitting function first fails by 3 (or 5) %.The fail values at which ReACT fails by 3% are in the range 250 ≤ fail ≤ 4000, however halofit typically fails at the 3% level even for very low .For a given fitting function, we find little correlation between the k and values at which the fitting function fails in each simulation, due to the range of k and redshift values at which the matter power spectrum contributes to the weak-lensing convolution.
To illustrate this performance, we plot the results for two representative simulations: one where ReACT performs well to higher , and one where it fails at much lower , approximately corresponding to the best and worst cases we studied.These simulations are shown in fig. 10, where we show the ratio of the convergence power spectra calculated from the matter power spectra predicted by the fitting function and the simulation respectively.These plots show that halofit performs significantly worse than ReACT, failing at the level of ∼ 5% even at low .This poor performance of halofit when computing weak-lensing observables is common across all simulations.In these plots, the vertical lines show the fail values, i.e., where the ReACT curve deviates from the simulation curve at the level of 3% and 5%, respectively.The left panel shows a simulation where the ReACT curve is accurate to ∼ 5000 while on the right, cut-offs are an order of magnitude smaller.These fail values give an indication of the range where ReACT can be used to fairly reliably predict the weak-lensing observables for model-independent modified gravity studies.As described in section 3.2, we expect this performance to be improved by adjusting the concentration parameter.To demonstrate this, we include an additional curve where the concentration parameter has been modified according to the discussion at the end of section 3.2 (see fig. 6 for more details).As expected, it appears that weak-lensing observables can be accurately computed to higher using this process, however we leave a detailed examination of this to future work.
These results, in combination with those in section 3.2, show that phenomenological modified gravity analyses with current data can be carried out without restricting to linear scales (e.g.[9]) or carrying out a linearistaion procedure as in [65].For these purposes, our results show that ReACT is the best performing fitting function (including outperforming halofit), particularly when using weak-lensing observables.In future work we will extend this initial analysis to quantitatively examine a much wider range of parameter space, in preparation for analysing the data for upcoming surveys.

Discussion and Conclusion
In this work, we have presented N -body simulations with a time-dependent strength of gravity µ, based on a framework for examining modified gravity in a model-independent way across all cosmological scales [2].The key results of this paper are a presentation of the phenomenology of these simulations, an evaluation of existing fitting functions for capturing this phenomenology, and a demonstration of the application and importance of this framework for weak-lensing observables.
We modified the GADGET-2 N -body code [40], and ran a series of simulations with piecewise-constant bins in redshift for µ.See section 3 for more details and table 3.1 for the redshift bins and the µ values in each bin.
The only fitting function calibrated from N -body simulations for the matter power spectrum on non-linear scales in phenomenological modified gravity was presented in [35].We investigated the performance of this fitting function, as well as the ΛCDM halofit fitting procedure, the standard halo model of structure formation [see eqs.(C.2),(C.3)]and the halo model reaction [3,57].We compared the ratio of the modified gravity matter power spectrum to the so-called 'pseudo' ΛCDM power spectrum as predicted in the various formalisms to the same ratio measured from our simulations.We quantified the accuracy of each formalism by calculating the least square statistic χ 2 and k fail , the wavenumber of first failure (see fig. 5) for each fitting function.We found that the halo model reaction formalism performed significantly better than the others, with the notable exception being the case where the modified gravity parameters are such that the resulting power spectrum is co-incidentally very similar to the ΛCDM case.We also see qualitative evidence (see fig. 6) supporting the theoretical expectation that to achieve precision in forecasting modified gravity matter power spectra, one needs to modify the ΛCDM concentration-mass relationship.We will extend the investigation presented here to a full parameter space examination and validation of the ReACT approach and the concentration issue in future work.
We present the matter power spectra from our simulations in section 4. Figs.7 and 8 show that a purely time-dependent µ induces scale-dependent features in the matter power spectrum, as reported in [35] for a simpler case.Our results also show that the shape of the power spectrum on quasi-linear as well as non-linear scales depends not only on the value of µ, but also on the redshift at which it is 'switched on' (i.e., different from unity) and the duration of such a modification.Most notably, we see that introducing a modification in µ at early times produces a power spectrum that either lacks (µ < 1) or has excess power on the quasi-linear scales (µ > 1) and vice-versa on non-linear scales, respectively.We also noticed that the peak in the power spectrum relative to ΛCDM shifts to larger scales (lower k) as one steps forward in time (from high redshift to low redshift).
To understand the physics that leads to these results, we first note that modifying µ at different epochs and for different periods of time affects the redshift at which non-linear structure starts to form.Combining this with our results indicates that a transfer of power occurs from smaller to larger scales or vice versa, depending on when and for how long µ is modified.This is due to the fact that at early times, the very first haloes are in the process of forming, which means that changing the rate of structure formation affects the smaller cosmological scales since those are the haloes forming at that time.Therefore, when one returns to ΛCDM structure formation at late times, one sees a marked difference in the power spectrum.However, if one introduces a change in the rate of structure formation at late times, when galaxies have already formed, one is then affecting the rate at which the largest haloes form, via mergers.Therefore, one sees a unique change in both the quasi-linear as well as the non-linear scales of the matter power spectrum, for a given time evolution of µ.
Finally, we show the impact of these phenomenological modifications to gravity on weaklensing observations that will be generated in future experiments like the Euclid satellite (see fig. 9 and the discussion attached).We show that the constraining power of these experiments in the context of modified gravity is strongly dependent on the data obtained from non-linear cosmological scales, both in terms of the number of scales that are accessible and the sizes of the errors on the different scales.We further show how we can use weak-lensing observables to probe the two-parameter µ − η family of modified gravity models using the output of our simulations [see fig.10].We find that by explicitly going to non-linear scales one breaks the degeneracy between these two parameters that exists on linear scales [see right panel of fig.9].We also extend our analysis of the fitting functions to the weak-lensing context, again finding that the ReACT formalism performs the best.This shows that phenomenological modified gravity analyses with current data can be carried out without restricting to linear scales or carrying out a linearisation procedure.
The model-independent approach first elucidated in [2], and further developed here, is not a priori restricted to particular regions of model space or types of theories, and has the potential to be a powerful null test of the ΛCDM+GR paradigm.The simulations and results presented here are an important step towards realising this, and using all of the data in future surveys to put model-independent constraints on the laws of gravity that operate in the Universe.

A Convergence Tests
We now present the procedure we followed to test the convergence of our N -body simulations.In the following discussion we explicitly show convergence with respect to particle number in our simulations.Using the same methods, we also tested convergence with respect to varying box-size, softening length and also different realisations of initial conditions, respectively.
In order to demonstrate convergence in our simulations, we present ratios of the matter power spectrum as this allows one to neglect realisation-dependent effects [47].We run all our simulations in a cosmological box of side 250 Mpc h −1 .In the left panel of fig.11, we compute the ratio of the matter spectrum from ΛCDM simulations with respect to our highest resolution reference simulation, increasing the particle in steps.The green line denotes the k value at which the second-highest resolution simulation (512 3 particles) disagrees with our highest resolution (1024 3 particles) simulation, which is an indicator of the scale up to which our results are reliable.
We now show that we have achieved the same level of convergence in our modified gravity simulations.In the right panel, we show the convergence of our modified gravity The ratio of the power spectra as a function of scale, with each line labelled by the particle number of the simulation they represent.All simulations have a box size of 250 Mpc h −1 comoving.The green vertical line represents the wavelength at which the ratio of the power spectra deviates at the level of 1% from the simulations with the best resolution.This panel shows the level of convergence we obtain in our ΛCDM simulations.Right panel : A similar procedure was carried out for the modified gravity simulations, but with a key difference.For each simulation, we calculate the ratio S(k) of the power spectrum from the simulation to the power spectrum from the ΛCDM simulation.We then plot the ratio of S(k) computed from simulations with 512 3 particles to simulations with 1024 3 particles.The green vertical line is plotted at the identical point as before.Therefore, we show that our modified gravity simulations have similar convergence both with respect to each other as well as with respect to our ΛCDM simulations.Note that on linear scales, all the ratios are within 0.5% of 1, which justifies our choice of µ in each bin.For our production simulations, we use a particle number of 1024 3 (with an equally fine mesh) and a box size of 250 Mpc h −1 .
simulations by plotting a ratio of ratios.We evaluate the ratio of matter power spectra from the simulation with 512 3 particles with the simulation with 1024 3 particles in both our ΛCDM simulations as well as our modified gravity simulations.We then plot S 512 /S 1024 , and show that they agree up to the same wavenumber as the left-hand-side panel, again shown by the green line.The fact that the vertical green line is at an identical wavenumber on both our panels demonstrates that we have achieved convergence for both our ΛCDM simulations as well as our modified gravity simulations.

B Binning in µ
We now describe the procedure we adopted to bin µ in redshift.We reiterate this parameter space is multi-dimensional, which simultaneously covers a large portion of the parameter space while isolating non-linear behaviour is a difficult task.We focus on the latter in this work, as our intention is to understand the phenomenology.As mentioned in section 3, we divide the redshift spanned by our simulations into 2 and 4 bins of equal ΛCDM growth.In other words, in the case where we have 2 redshift bins for µ, we have that D ΛCDM (z 1 )/D ΛCDM (z 2 ) = D ΛCDM (z 2 )/D ΛCDM (z 3 ).Setting z 1 = 0 and z 3 = 50, we obtain D(z 2 ) 2 = D(z 1 )D(z 3 ) , (B.1) from which we can compute z 2 ≈ 7.01.We follow a similar procedure in the case where we have 4 bins in redshift.We consider the bin 0 ≤ z ≤ 7.01 as our reference in the context of choosing the value of µ.We set µ = 1.1 and 0.9, in this bin, respectively, and compute D(0).For all the other bins, we choose µ such that the growth factor at z = 0 is identical to the value we computed for our reference bin.In this way, we obtain the µ values in table 3.1.In order to confirm that these values of µ indeed correspond to identical growth factors at z = 0, we modified the equations of motion solved by CLASS for a time-dependent µ as outlined in [66].In fig. 12 we show that the relative differences in the linear power spectra for all the cases in table 3.1 are within 0.5%, in accordance with the right panel in fig.11, which shows that our modified CLASS code performs as expected.Therefore, we run two sets of seven simulations in which µ = 1 in one, two or four redshift bins, with the matter power spectrum measured from all the simulations being equal on linear scales at z = 0. We now describe the implementation of the µ smoothing in CLASS as well as GADGET-2.We assume that the input to the code includes an array of values for µ as well as an array of redshifts that denote the bins.We then use a combination of two error functions given by We have chosen the width function X = (1/A) ln(∆z bin ) with A being a tolerance parameter that determines the width of the transition and ∆z bin is the bin-width.This choice ensures that the width of the transition naturally becomes shorter at lower redshifts where the growth factor is more sensitive to redshift (where the timestep in the code is small).To estimate the impact of the smoothing, we have carried out two tests.The first test is the ratio of the power spectrum with the smoothed µ to the un-smooth case for different values of A. In this case, we have averaged the ratio over all k values to obtain a "goodness of fit" parameter, which ideally should be equal to zero.The second test was to plot the growth factor as a function of redshift, specifically over the range of redshifts where the smoothing is applied.The results from both these tests in CLASS are shown in fig.13.On implementing this test in GADGET-2, we found that over the range of values we considered for the smoothing parameter, the differences between the smoothed and un-smoothed case were smaller than 0.1%.

C Fitting functions
We now present an overview of the different fitting functions we examined in section 3.2.Note that we also examine the performance of the standard halofit procedure [53][54][55] which we do not describe in detail here.Not that while this is not a detailed examination of these fitting functions, we present the procedure that we followed to actually predict R(k) in order to compare with our simulations.

C.1 Cui et al.
The fitting function derived in [35] is given by R(x, µ) = exp ((1 − µ)B(µ)x C(µ) ) , (C.1) where R(x, µ) as mentioned in the main text is the ratio of the matter power spectrum in modified gravity relative to the ΛCDM pseudo spectrum (with equal linear growth).B(µ) = 0.0429+0.133µ−4 , C(µ) = 0.573 and x = ∆ 2 (k, z, µ = 1) is the dimensionless power spectrum P (k)k 3 /(2π).The main feature of this function is that it predicts R > 1 when µ < 1 and viceversa when µ > 1.In other words, this fitting function assumes that the non-linear evolution is slower when the growth rate is increased due to µ being larger than its value in ΛCDM, and vice versa.While we have observed a similar behaviour in some of our simulations, we find that this is not true in the general case.We demonstrate in fig.14 that one can reproduce with reasonable success the matter power spectrum in the case where µ is constant throughout the simulation time period, although even in this case we see that ReACT outperforms the other fitting functions that we consider.
In contrast to [57], our work is designed to understand the phenomenology while remaining model-agnostic and therefore, we neglect any screening mechanisms and indeed any scale dependent corrections to the Poisson equation given in eq.(2.6).Therefore, in order to reproduce the matter power spectrum in our formalism within the ReACT code, we only modify the µ parameter and set S(a, k) = 0 and indeed all the other modified gravity parameters within the code to their respective values in ΛCDM.This simplifies the numerical implementation of the ReACT while still being able to estimate the non-linear effects of modifiying µ.

Figure 1 .
Figure 1.Flowchart of the numerical pipeline that we implemented.We establish initial conditions at z = 50, which are then evolved up to z = 0 using GADGET-2.Blue panels represent codes that we have modified, while red panels are codes that are used in their publicly available form.The modified gravity parameters are in orange, while ΛCDM parameters are in cream.The central branch represents our numerical simulation pipeline, while the right hand branch denotes our most successful pipeline for fitting the simulations (see section 3.2 for more details).

Figure 2 .
Figure 2. We illustrate the redshift evolution of µ in our simulations.Dotted lines represent the case where µ is constant through the simulation (one bin), dashed lines represent the case where there are two bins in redshift, while dot-dashed lines represent the case where there are four bins in redshift.

Figure 3 .
Figure 3.The ratio of the quantity R(k) as predicted by the various fitting functions (as indicated by the superscript 'FF') with respect to the same quantity computed from the simulations (as indicated by the superscript 'sim').It is clear that the cyan line from the fitting function of[35] fails for all cases.The green dashed curve is the simple application of the halo model as shown in eqs.(C.2) and (C.3).It is important to note that the halofit prediction (magenta dotted) is actually identical for all the simulations, since they all have identical linear power spectra.In all the panels, we can clearly see the ReACT curve (blue dot-dashed) is consistently within 5% of the black dot-dashed line which indicates our reference, the simulations.

Figure 4 .
Figure 4.As in fig.3, but where µ has a different value from GR in one of the earlier redshift bins.

Figure 5 .
Figure 5.In the left panel, we show the values of the least square statistic for the different fitting functions as defined in eq.(3.2).We obtained these values by employing k cut = 3 h Mpc −1 and k cut = 5 h Mpc −1 , which represent the wave number up to which the ReACT formalism is designed to operate and the wave number up to which we have converged results from our simulations, respectively.Clearly, the red bar is larger than the other bars in the majority of the simulations which implies that the ReACT formalism is typically the best fit over the range of scales (note that since we plot the negative logarithm, a larger bar is a better fit).In the right panel, we plot the wavenumber at which the quantity R(k) − 1 is at 3% and 5%, respectively.The results are consistent with the left panel, with the red bar being the largest for the majority of the simulations.This shows that the ReACT formalism allows one to probe deeper into the non-linear regime.

2 = 1 .
Figure 6.We display the ratio of the power spectra with respect to one of our simulations for several values of the parameter A in the left panel.The solid cyan line (which corresponds to A = 1.6) is the best fit, with the smallest χ 2 parameter, as can be seen by the plot on the right panel.While this behaviour was hinted at in various works in the literature[35], to our knowledge this is the first time that such an analysis has been carried out for the general model-independent case time-varying µ.We leave the analysis of varying A as a function of µ and the resulting rigorous validation of this modification to future work.

= 1 .z 50 Figure 7 .
Figure 7. Top row : The ratio of the power spectra from all the simulations to the ΛCDM power spectrum at z = 0. Note that we see a quasi-linear bump in these due to the mismatch in linear growth, which is absent in the other rows.We see the multi-dimensionality of the parameter space in display here since we have 6 different unique non-linear power spectra for the same linear power spectrum as the case where µ is constant throughout the simulation as studied in[35].Middle row : The quantity R(k), the ratio of the matter power spectra from the different simulations to the pseudo ΛCDM power spectrum with the same linear growth.Lower row : Ratio of power spectra at z = 0 from all the simulations to the power spectrum at z = 0 from the reference simulation with a constant µ throughout, with redshift bins according to table 3.1.

Figure 10
Figure 10.We show the ratio of the convergence power spectra obtained from the matter power spectra as predicted by the ReACT (blue-dotted) and halofit (green dot-dashed) formalisms with respect to those obtained from the simulation matter power spectra.The superscripts 'FF' and 'sim' imply 'fitting function' and 'simulation', respectively.We immediately see that halofit performs significantly worse than ReACT even at low , where it fails at the level of ∼ 5%.The vertical lines indicate the value of at which the ReACT curve deviates from the simulation curve at the level of 3% and 5%, respectively.We choose two simulations that capture the variation in the cut-offs in our sample of simulations.In the left panel is the simulation with the largest cut-offs in our sample of simulations.In the right panel, we show a simulation with one of the smallest cut-offs.The better performance of the ReACT formalism over halofit in such a model-independent analysis of modified gravity on non-linear scales is clearly demonstrated.We also show the curve with the concentration parameter modified (see fig.6) which seems to further improve the ReACT fit, see text for details.

z 50 Figure 12 .
Figure 12.The relative difference in the linear power spectra at redshift zero to the reference case (chosen to be the case where µ is constant) from runs of our modified CLASS code.

Figure 13 .
Figure 13.The plot of the growth factor as a function of redshift in the case where µ > 1 (left) and µ < 1 (right) at 2.15 ≤ z ≤ 7.01.