Fast full $N$-body simulations of generic modified gravity: derivative coupling models

We present MG-GLAM, a code developed for the very fast production of full $N$-body cosmological simulations in modified gravity (MG) models. We describe the implementation, numerical tests and first results of a large suite of cosmological simulations for two broad classes of MG models with derivative coupling terms -- the Vainshtein- and Kmouflage-type models -- which respectively features the Vainshtein and Kmouflage screening mechanism. Derived from the parallel particle-mesh code GLAM, MG-GLAM incorporates an efficient multigrid relaxation technique to solve the characteristic nonlinear partial differential equations of these models. For Kmouflage, we have proposed a new algorithm for the relaxation solver, and run the first simulations of the model to understand its cosmological behaviour. In a companion paper, we describe versions of this code developed for conformally-coupled MG models, including several variants of $f(R)$ gravity, the symmetron model and coupled quintessence. Altogether, MG-GLAM has so far implemented the prototypes for most MG models of interest, and is broad and versatile. The code is highly optimised, with a tremendous (over two orders of magnitude) speedup when comparing its running time with earlier $N$-body codes, while still giving accurate predictions of the matter power spectrum and dark matter halo abundance. MG-GLAM is ideal for the generation of large numbers of MG simulations that can be used in the construction of mock galaxy catalogues and accurate emulators for ongoing and future galaxy surveys.


Introduction
The accelerated expansion of our Universe [1,2] is one of the most challenging problems in modern physics, and after decades of attempts to find its origin, we are still far from reaching a clear conclusion. While the current standard cosmological model -Λ Cold Dark Matter (ΛCDM), which assumes that this accelerated expansion is caused by the cosmological constant, Λ -is in excellent agreement with most observational data to date, this model suffers from the well-known coincidence and fine-tuning problems. This suggests that a more fundamental theory is yet to be developed which can naturally explain the small observationally inferred value of Λ. The alternative theoretical models proposed so far can be roughly classified into two categories: one involves some exotic new matter species beyond the standard model of particle physics, the so-called dark energy [3], which usually has a non-trivial dynamics; the other involves modifications of Einstein's General Relativity (GR) on certain (usually cosmic) scales [4][5][6], or introduces new fundamental forces between matter particles 1 . Some leading examples are quintessence [7][8][9][10], k-essence [11,12], coupled quintessence [13], f (R) gravity [14,15] and chameleon model [16][17][18][19], symmetron model [20][21][22], the Dvali-Gabadadze-Porrati braneworld (DGP) model [23], scalar [24,25] and vector [26][27][28] Galileons, Kmouflage [29], massive gravity [e.g., 30], etc.. In modified gravity (MG) models, in addition to a modified, and accelerated, expansion rate that could explain observations, often the law of gravity is also different from GR, which can further affect the evolution of the large-scale structure (LSS) of the Universe. This suggests that we can use various cosmological observations to constrain and test these models [e.g., [31][32][33]. In this sense, the study of MG models can be used as a testbed to verify the validity of GR on cosmological scales, hence going beyond the usual small-scale or local tests of GR [34].
To exploit the next generation of observational data, we need to develop accurate theoretical tools to predict the cosmological implications of various models, in particular their behaviour on small scales which encode a great wealth of information. However, predicting LSS formation on small scales is a non-trivial work because structure evolution has entered the highly non-linear regime here, with a lot of complicated physical processes, such as gravitational collapse and baryonic interactions, being at play. The only tool that could accurately predict structure formation in this regime is cosmological simulations, which follow the evolution of matter through the cosmic time, from some initial, linear, density field all the way down to the highly-clustered matter distribution on small, sub-galactic, scales at late times. Modern cosmological simulation codes, e.g., RAMSES [81], GADGET [82,83], AREPO [84], PKDGRAV [85], SWIFT [86], have been able to employ hundreds of billions or trillions of particles in Giga-parsec volumes [e.g., 85,87,88], and are nowdays indispensable in the confrontation of theories with observational data. In particular, to achieve the high level of precision required by galaxy surveys, one can generate hundreds or thousands of independent galaxy mocks that cover the expected survey volume, based on these simulations. However, this has so far been impossible for MG models, which usually involve highly non-linear partial differential equations that govern the new physics, solving which has proven to be very expensive even with the latest codes, e.g., ECOSMOG [89][90][91][92], MG-GADGET [93], ISIS [94] and MG-AREPO [95,96] (see [97] for a comparison of several MG codes). For example, current MG simulations can take between 2 to O(10) times longer than standard ΛCDM simulations of the same specifications. Obviously, to best explore the future observations for testing MG models, we need a new simulation code for these models with greatly improved efficiency compared with the current generation of codes.
In this paper, we present such a code, MG-GLAM, which is an extension of the parallel particlemesh (PPM) N -body code GLAM 2 [98], where various important classes of modified gravity models are implemented. Efficiency is the main feature of MG-GLAM, which is partly thanks to the efficiency and optimisations it inherits from its base code, GLAM 3 , partly due to optimised numerical algorithms tailored to solve the nonlinear equations of motion in these modified gravity models, and partly thanks to a careful design of the code and data structures to reduce memory footprint of simulations.
Modified gravity models can be classified according to the fundamental properties of their new dynamical degrees of freedom, and the interactions the latter have. Here, we study two classes of MG models which introduce new scalar degrees of freedom that have derivative-coupling interactions: the normal-branch of the DGP [23] braneworld model, which is a representative example of Vainshteintype gravity models, and the Kmouflage model [29]. These models generally introduce a new force (fifth force) between matter particles, but they can both employ screening mechanisms to evade Solar System constraints [29,102] on the fifth force. These two models have been widely studied in recent years and, as we argue below, the implementation of them can lead to prototype MG codes that can be modified to work with minimal effort for other classes of interesting models. In a twin paper [103], we will describe the implementation and analysis of several other classes of MG models, including the coupled quintessence [13], chameleon [16,17] f (R) gravity [104], and symmetron models [20,21], which are examples of conformally coupled scalar fields.
As we will demonstrate below, the inclusion of modified gravity solvers in MG-GLAM adds an overhead to the computational cost of GLAM, and for the models considered in this paper and its twin paper [103], a MG-GLAM run can take about 2-5 times (depending on the resolution) the computing time of an equivalent ΛCDM simulation run using default GLAM. All in all, this makes this new code at least around 100 times faster than other modified gravity simulation codes such as ECOSMOG [89][90][91][92] and MG-AREPO [95,96] for the same simulation boxsize and particle number. In spite of such a massive improvement in speed over those latter codes, it is worthwhile to note that MG-GLAM is not an approximate code: it solves the full Poisson and MG equations, and its accuracy is only limited by the resolution of the PM grid used, which can be specified by users based on their particular scientific objectives. This makes it different from fast approximate simulation codes such as those [105][106][107][108] based on the COmoving Lagrangian Acceleration method (COLA) [109].
The paper is organised as follows. Section 2 presents the theoretical aspects of the modified gravity models studied here. In Section 3 we discuss the numerical implementation of MG-GLAM. The description and results of several code tests are shown in Section 4 and in Section 5 we analyse the nonlinear power spectra and halo mass functions of the first derivative coupling models performed with MG-GLAM. Finally, we summarise the main results and give our conclusions in Section 6.
Throughout this paper, we adopt the usual conventions that Greek indices label all space-time coordinates (µ, ν, · · · = 0, 1, 2, 3), while Latin indices label the space coordinates only (i, j, k, · · · = 1, 2, 3). Our metric signature is (−, +, +, +). We will strive to include the speed of light c explicitly in relevant equations, rather than setting it to 1, given that in numerical implementations c must be treated carefully. Unless otherwise stated, the symbol ≈ means 'approximately equal' or 'equal under certain approximations as detailed in the text', while the symbol means that two quantities are of similar order of magnitude. An overdot denotes the derivative with respect to (wrt) the cosmic time t, e.g.,ȧ ≡ da/dt and the Hubble expansion rate H(a) is defined as H =ȧ/a, while a prime ( ) denotes the derivative wrt the conformal time τ , e.g., a = da / dτ , H(a) ≡ a /a = aH(a). Unless otherwise stated, we use a subscript 0 to denote the present-day value of a physical quantity, an overbar for the background value of a quantity, and a tilde for quantities written in code units.
We note that, since they have a lot in common, including the motivation and the design of code structure and algorithms, this paper has identical or similar texts with its twin paper [103] in the Introduction section, as well as in Sections 3.1, 3.1.1, 3.2 until 3.2.1, 3.2.1, 3.2.2, the last paragraph of 3.2.5, and part of 4.1.

Modified gravity models with derivative coupling terms
In this section we briefly introduce the modified gravity models with derivative coupling terms that are implemented in the MG-GLAM code. We start with the general action of scalar field models in the Einstein frame, where g is the determinant of the metric tensor g µν , M Pl (= 1/ √ 8πG) is the reduced Planck mass, G is Newton's constant, R is the Ricci scalar, K is a general kinetic function which contains nonlinear terms of the derivatives of the scalar field, V (φ) the potential energy of the scalar field φ,ψ (i) m are the matter fields, andĝ µν is the Jordan-frame metric that couples to them.
The Jordan-frame metricĝ µν and Einstein-frame metric g µν are assumed to be related to each other by the following conformal mapping, where A is a function of the scalar field φ. Disformal relations between the two metrics are possible, but they are not considered here. By varying the action Eq. (2.1) with respect to the scalar field, we obtain the following equation of motion where ρ m is the density of non-relativistic matter. We define the coupling strength β(φ) as a dimensionless function of φ: Note the M Pl in this definition, which is because φ has mass dimension 1. For later convenience, we shall define a dimensionless scalar field as Two classes of models of Eq. (2.1) are of particular interest in the literature. The first is what we call 'Vainshtein-type' modified gravity models, which employs the Vainshtein screening mechanism [102] to decouple the scalar field from matter in regions where the second derivatives of the field are large. The second is the 'Kmouflage-type' gravity models, which employs the Kmouflage screening mechanism [29,110] to hide the effect of the scalar field in regions where the field has a large gradient.
In the next subsections we describe the theoretical aspects of both Vainshtein-type and Kmouflagetype gravity models.

Vainshtein-type gravity
An excellent example of Vainshtein-type models is the Galileon model [24] and its covariant extension [25], which is a generic description of self-interacting scalar field models whose Lagrangian is invariant under the Galilean shift, ∂ µ ϕ → ∂ µ ϕ+b µ , with b µ being a constant 4-vector. Simulations of these models have been carried out previously, e.g., [111,112], along with other approaches to studying the nonlinear structure formation in these models, e.g., [113]. In recent years, the vector Galileon, or generalised Proca, theory has attracted attentions, e.g., [114][115][116]. As the Galileon model, these models also employ the Vainshtein screening mechanism to suppress the effect of modified gravity in regions where the second derivative of the field is large. But unlike Galileons, here the dynamical degrees of freedom are the spatial components of some vector field, whose transverse mode plays a negligible role in cosmic structure formation [117] while the longitudinal mode behaves like the Galileon field ϕ (with the difference that the vector field has no dynamics on the background). Simulations of vector Galileons have been recently carried out in [117,118]. These models have rich phenomenology, able to modify the background expansion history as well as the gravitational potential, and hence propagate a modified gravity-or fifth-force between matter particles and affect large-scale structure formation.
In this paper, we consider another class of models that realise the Vainshtein screening mechanism, the Dvali-Gabadadze-Porrati (DGP) [23] brane-world model, as our toy Vainshtein-type gravity model. This choice is for a few reasons. First, the DGP model has been very popular in the literature, being widely used as a testbed for the Vainshtein mechanism. Second, it has great flexibility in terms of the background expansion history (although there is a catch as we will see later), and usually one can make the model have an expansion rate identical to that of ΛCDM, to focus on the anaysis of the effects of the fifth force. Finally and more importantly, owing to its simplicity, this model can be used as a prototype for all Vainshtein-type models, to understand the effects of the screening mechanism; a simulation code model can be easily modified to simulate the Galileon and vector Galileon models, as well as generalised Galileons [119] and kinetic-gravity braiding models [120], which all share a similar equation of motion for the dynamical field.
In the DGP model, the Universe is a four-dimensional 'brane' embedded in a five-dimensional spacetime, or bulk. The total action of the model is written by, where g µν , g, R and G are respectively the metric tensor, the determinant of the metric, the Ricci scalar and the gravitational constant in the 4-D brane, while g (5) , R (5) and G (5) are their equivalents in the 5-D bulk, and S m is the action of the matter fields ψ i which are assumed to be confined on the brane.
A new parameter can be introduced, which is defined as the ratio of G (5) and G and known as the crossover scale, r c , It has the physical meaning of being roughly the scale at which the behaviour of gravity transitions from 4-D standard Einsteinian (r r c ) to 5-D (r r c ), where gravitons could leak into the fifth dimension.
Here we study the normal-branch (nDGP) model, where the variation of the action, Eq. (2.6), yields the modified Friedmann equation in a homogeneous and isotropic universe with Ω rc ≡ c 2 /(4H 2 0 r 2 c ) where c is the speed of light, Ω m is the present-day value of the matter density parameter, the dark energy density parameter Ω DE (a) is defined as Ω DE (a) ≡ 8πGρ DE (a)/3H 2 (a), a is the scale factor and H 0 is the present-day value of the Hubble parameter. The nDGP model on its own cannot lead to an accelerated Hubble expansion, which is why an extra dark energy component has to be added to match observational data: because there is not much a priori requirement on this dark energy component, it is often assumed to have such an equation of state that the overall effect of Eq. (2.8) is to give a ΛCDM expansion history (note that this is not possible if this dark energy component is assumed to be a cosmological constant); also, the dark energy component is assumed to be non-clustering so that its effect is only on the background expansion. In this model, deviations from GR can be characterised in terms of the parameter H 0 r c /c. As we can see from Eq. (2.8) if H 0 r c /c → ∞ then the equation of state of the dark energy component approaches −1 in order to produce a ΛCDM expansion history.
The structure formation in the nDGP model is governed by the Poisson and scalar equations in the quasi-static and weak-field limits: [121], where ϕ is a scalar degree of freedom related to the bending modes of the brane (which describes the position of the brane in the fifth dimension), the total modified gravitational potential Φ is given by Φ = Φ N + 1 2 ϕ with Φ N being the standard Newtonian potential, δρ m = ρ m −ρ m is the perturbation of non-relativistic matter density, and β DGP (a) = 1 + 2H r c 1 +Ḣ In the last expression we have used the above assumption that the nDGP model has the same expansion history as the ΛCDM model, i.e., the Hubble parameter is written as where Ω Λ is the contribution of Λ in the ΛCDM model, defined as Ω Λ ≡ 1−Ω m . Note that throughout this paper we assume that the Universe is spatially flat, and neglect the contribution by radiation unless otherwise stated. From Eq. (2.9), it is straightforward to identify the modified gravity contribution to the gravitational acceleration, If we linearise Eq. (2.10), the two nonlinear terms in the squared brackets vanish and the modified Poisson equation, Eq. (2.9), can be re-expressed as which represents a time-dependent and scale-independent rescaling of Newton's constant. Since β DGP is always positive, the formation of structure is enhanced in this model with respect to ΛCDM.
The linear growth for the matter fluctuations in the nDGP model can be obtained by solving the equation of the linear growth factor, D, 15) where N = ln(a), and 1/3β DGP is the ratio between the strengths of the fifth and standard Newtonian forces in the linear regime, which is scale independent (see derivation below).

Vainshtein screening mechanism
As mentioned above, the nDGP model is a representative class of modified gravity models that feature the Vainshtein screening mechanism [102].
in which for simplicity we have set a = 1, and g N is the Newtonian acceleration caused by the mass M (r) at distance r from the centre, Eq. (2.17).
If we further assume that δρ m is a constant within a radius R and zero outside, then Eq. (2.18) has the physical solution for r ≤ R. In these expressions r V is the Vainshtein radius which is defined as where r S ≡ 2GM (R)/c 2 is the Schwarzschild radius and M (R) ≡ 4π R 0 δρ m (r )r 2 dr is the total mass of the spherical object.
According to Eq. (2.9), the fifth force is given by 1 2 dϕ/dr. Therefore at r r V we have indicating that on scales larger than the Vainshtein radius gravity is enhanced (because β DGP > 0 for the nDGP model) by a scale-independent factor 1/3β DGP . On the other hand, for r, R r V we have indicating that the fifth force is suppressed (or screened), relative to the Newtonian force, well within the Vainshtein radius.

Kmouflage-type gravity
The Kmouflage model [29] is another class of screened modified gravity models, in which V (φ) = 0 and the scalar field satisfies an equation of motion, Eq. (2.3), that takes the following form [122,123]: where K(X) is the kinetic function in Eq. (2.1) which needs to be specified for a given model, which has mass dimension four, A(ϕ) is the coupling function between the scalar field and matter, which in this work we assume to take the exponential form: β Kmo is a constant model parameter, K X = dK(X)/dX for a given function K(X). For convenience, from here on we specify to the dimensionless versions of K(X) and X-which for simplicity are still denoted by the same notations-where the dimensionless K will be defined the dimensional kinetic function K in Eq. (2.1) divided by Λ 4 , and is a dimensionless quantity and Λ is a model parameter of mass dimension 1 related to dark energy.φ is the background value of the scalar field ϕ, ∇ i is raised by the metric δ ij , and the a −2 is because by default X should use the physical derivatives while here we have written things using the comoving derivatives.
In addition to featuring a qualitatively different-and less explored-screening mechanism, the Kmouflage model can also be considered as a natural generalisation of the well-known k-essence model [124,125] by allowing a direct coupling of the k-essence scalar field with matter via the coupling function A(ϕ). Furthermore, the equation of motion in the Kmouflage model, Eq. (2.24), is featured in other models, such as the charged dark matter model proposed in [126] and the covariant models of MOdified Newtonian Dynamics (MOND; e.g., [127,128]). Thus, a simulation code for Kmouflage can be a prototype for simulating these other models. There has been very little work on the simulations of Kmouflage models so far, and in this work we will develop a code to do this 4 .
For convenience, we define a dimensionless parameter λ so that and X can be rewritten more as where we have explicitly included a factor containing the speed of light c. Note that the parameter λ satisfies λ ∼ O(1), because the model parameter Λ is chosen such that it plays the role of accelerating the cosmic expansion at late times, meaning that at low z we have Λ 4 /M 2 Pl ∼ 8πGρ DE /3 ∼ H 2 0 Ω DE . We will describe how to determine the numerical value of λ in the MG-GLAM code later.
A possible choice of the function K(X) that has been studied previously [122,123,130,131] is where the integer n satisfies n ≥ 2 and K 0 is a dimensionless model parameter. In this model, the modified Poisson equation is given by, 30) and the total force on matter particles is given by where r is the particle coordinate, t is the physical time, and dr/dt is the peculiar velocity and The force equation can be rewritten as where x is the comoving coordinate and the ∇ symbol denotes the comoving gradient, with p ≡ a 2ẋ . The linearised version of the full Kmouflage equation of motion, Eq. (2.24), is For completeness, here is the linear growth equation for matter density contrast δ (or the linear growth factor itself) in the Kmouflage model: where denotes the derivative with respect to the conformal time τ , and K X = dK/dX as above-in our case (2.36) Therefore, we can already observe four effects the Kmouflage scalar field has on structure formation: (i) the modified expansion history, cf. a /a; (ii) a fifth force which can (but may not) be screened by the Kmouflage mechanism, described by 2β 2 Kmo /K X ; (iii) a rescaling of the matter density field by A(ϕ) = 1 in the Poisson equation, implying that the matter particle mass is effectively modified; and (iv) a velocity-dependent force 5 described by the term involving (d ln A/dϕ) ϕ δ . The fifth force has a ratio of 2β 2 Kmo /K X to the Newtonian force, and this will be derived explicitly shortly.

The Kmouflage screening mechanism
Similarly to the Vainshtein screening mechanism, let us consider the static and spherically symmetric form of the Kmouflage equation of motion, Eq. (2.24), which can be integrated once to give, The condition for screening, r < r K , can be written as In the linear perturbation regime, we can neglect the contribution to X by the spatial derivatives and therefore K, K X become purely time-dependent quantities, leading to a constant ratio,

Numerical Implementation
This section is the core part of this paper, where we will describe in detail how the different theoretical models of §2 can be incorporated in a numerical simulation code, so that the scalar degree of freedom can be solved at any given time with any given matter density field. This way, the various effects of the scalar field on cosmic structure formation can be accurately predicted and implemented.

The GLAM code
The GLAM code is presented in [98], and is a promising tool to quickly generate N -body simulations with reasonable speed and acceptable resolution, which are suitable for the massive production of galaxy survey mocks. As a PM code, GLAM solves the Poisson equation for the gravitational potential in a periodic cube using fast Fourier Transformation (FFT). The code uses a 3D mesh for density and potential estimates, and only one mesh is needed for the calculation: the density mesh is replaced with the potential. The gravity solver uses FFT to solve the discrete analogue of the Poisson equation, by applying it first in xand then to y-direction, and finally transposing the matrix to improve data locality before applying FFT in the third (z-)direction. After multiplying this data matrix by the Green's function, an inverse FFT is applied, performing one matrix transposition and three FFTs, to compute the Newtonian potential field on the mesh. The potential is then differentiated using a standard threepoint finite difference scheme to obtain the x, y and z force components at the centres of the mesh cells. These force components are next interpolated to the locations of simulation particles, which are displaced using a leapfrog scheme. A standard Cloud-in-Cell (CIC) interpolation scheme is used for both the assignment of particles to calculate the density values in the mesh cells and the interpolation of the forces.
A combination of parameters that define the resolution and speed of the GLAM code are carefully selected. For example, it uses the FFT5 code (the Fortran 90 version of FFTPACK5.1) because it has an option of real-to-real FFT that uses only half of the memory as compared to FFTW. It typically uses 1/2-1/3 of the number of particles (in 1D) as compared with the mesh size-given that the code is limited by available RAM, this is a better combination than using the same number of particles and mesh points. GLAM uses OPENMP directives to parallelise the solver. Overall, the code scales nearly perfectly, as has been demonstrated by tests run with different mesh sizes and on different processors (later in the paper we will present some actual scaling test of MG-GLAM as well, which again is nearly perfect). MPI parallelisation is used only to run many realisations on different supercomputer nodes with very little inter-node communications. Load balance is excellent since theoretically every realisation requires the same number of CPUs.
Initial conditions are generated on spot by GLAM, using the standard Zel'dovich approximation [132,133] from a user-provided linear matter power spectrum P (k) at z = 0. The code backscales this P (k) to the initial redshift z ini using the linear growth factor for ΛCDM with the specified cosmological parameters. Since the Zel'dovich approximation is less accurate at low redshifts [134], the simulation is typically started at an initial redshift z ini ≥ 100.
GLAM uses a fixed number of time steps, but this number can be specified by the user. The standard choice is about 150-200. In this work, we have compared the model difference of the matter power spectra between modified gravity MG-GLAM and ΛCDM GLAM simulations and found that the result is converged with 160 time steps. Doubling the number of steps from 160 to 320 makes negligible difference.
The code generates the density field, including peculiar velocities, for a particular cosmological model. Nonlinear matter power spectra and halo catalogues at user-specified output redshifts (snapshots) are measured on the fly. For the latter, GLAM employs the Bound Density Maximum (BDM; [135,136]) algorithm to get around the usual limitations placed on the completeness of low-mass haloes by the lack of force resolution in PM simulations. Here we briefly describe the idea behind the BDM halo finder, and further details can be found in [136,137]. The code starts by calculating a local density at the positions of individual particles, using a spherical tophat filter containing a constant number N filter (typically 20) of particles. It then gathers all the density maxima and, for each maximum, finds a sphere that contains a mass M ∆ = 4 3 π∆ρ crit (z)R 3 ∆ , where ρ crit (z) is the critical density at the halo redshift z, and ∆ is the overdensity within the halo radius R ∆ . Throughout this work we will use the virial density definition for ∆ given by [138] where Ω m (z) is the matter density parameter at z. To find distinct haloes, the BDM halo finder still needs to deal with overlapping spheres. To this end, it treats the density maxima as halo centres and finds the one sphere, amongst a group of overlapping ones, with the deepest Newtonian potential. This is treated as a distinct, central, halo. The radii and masses of the haloes which correspond to the other (overlapping) spheres are then found by a procedure that guarantees a smooth transition of the properties of small haloes when they fall into the larger halo to become subhaloes of the latter. The latter is done by defining the radius of the infalling halo as max(R 1 , R 2 ), where R 1 is its distance to the surface of the larger, soon-to-be host, central halo, and R 2 is its distance to the nearest density maximum in the spherical shell [min(R ∆ , R 1 ), max(R ∆ , R 1 )] centred around it (if no density maximum exists in this shell, R 2 = R ∆ ). The BDM halo finder was compared against a range of other halo finders in [137], where good agreement was found. MG-GLAM extends GLAM to a general class of modified gravity theories by adding extra modules for solving MG scalar field equations, which will be introduced in the following subsection.

The GLAM code units
Like most other N -body codes, GLAM uses its own internal unit system. The code units are designed such that the physical equations can be cast in dimensionless form, which is more convenient for numerical solutions.
Let the box size of simulations be L and the number of grid points in one dimension be N g . We can introduce dimensionless coordinatesx, momentap and potentialsΦ using the following relations [98] Having the dimensionless momenta, we can find the peculiar velocity, where we assumed that box size L is given in units of h −1 Mpc. Using these notations, we write the particle equations of motion and the Poisson equation as whereδ is the code unit expression of the density contrast δ. From Eqs. (3.2) we can derive the following units, In what follows, we will also use the following definitioñ for the code-unit expression of the speed of light, c. GLAM uses a regularly spaced three-dimensional mesh of size N 3 g that covers the cubic domain L 3 of a simulation box. The size of a cell, ∆x = L/N g , and the mass of each particle, m p , define the force and mass resolution respectively: where N 3 p is the number of particles and ρ crit,0 is the critical density of the universe at present.

Solvers for the extra degrees of freedom
We have seen in §2 that in modified gravity models we usually need to solve a new, dynamical, degree of freedom, which is governed by some nonlinear, elliptical type, partial differential equation (PDE). Being a nonlinear PDE, unlike the linear Poisson equation solved in default GLAM, the equation can not be solved by a one-step fast Fourier transform 6 but requires a multigrid relaxation scheme to obtain a solution. For completeness, we will first give a concise summary of the relaxation method and its multigrid implementation ( §3.2.1). Next, we will specify the practical side, discussing how to efficiently arrange the memory in the computer, to allow the same memory space to be used for different quantities at different stages of the calculation, therefore minimising the overall memory requirement ( §3.2.2), and also saving the time for frequently allocating and deallocating operations. After that, in §3.2.3- §3.2.4, we will respectively discuss how the nonlinear PDEs in Vainshtein-and Kmouflage-type gravity models can be solved most efficiently. In §3.2.5, we will present how to solve the evolution of the cosmic background in the Kmouflage model. Much effort will be devoted to replacing the common Newton-Gauss-Seidel relaxation method by a nonlinear Gauss-Seidel, which has been found to lead to substantial speedup of simulations [140] (but we will generalise this to more models than focused on in Ref. [140]). For the coupled quintessence model, we will also briefly describe how the background evolution of the scalar field is numerically solved as an integral part of MG-GLAM, to further increase its flexibility.

Multigrid Gauss-Seidel relaxation
Let the partial differential equation (PDE) to be solved take the following form: where u is the scalar field and L is the PDE operator. To solve this equation numerically, we use finite difference to get a discrete version of it on a mesh 7 . Since MG-GLAM is a particle-mesh (PM) code, it has a uniform mesh resolution and does not use adaptive mesh refinement (AMR). When discretised on a uniform mesh with cell size h, the above equation can be denoted as where we have added a nonzero right-hand side, f h , for generality (while f h = 0 on the mesh with cell size h, later when we discrete it on coarser meshes needed for the multigrid implementation, f is no longer necessarily zero). Both u h and f h are evaluated at the cell centres of the given mesh. The solution we obtain numerically,û, is unlikely to be the true solution u h to the discrete equation, and applying the PDE operator on the former gives the following, slightly different, equation: Taking the difference between the above two equations, we get is the local residual, which characterises the inaccuracy of the solutionû h (this is because ifû h = u h , we would expectf h = f h and hence there is zero 'inaccuracy'). d h is also evaluated at cell centres. Later, to check if a given set of numerical solutionû h is acceptable, we will use a global residual, h , which is a single number for the given mesh of cell size h. In this work we choose to define h as the root-mean-squared of d h in all mesh cells (although this is by no means the only possible definition). We will call both d h and h 'residual' as the context will make it clear which one is referred to. Relaxation solves Eq. (3.12) by starting from some approximate trial solution to u h ,û h old , and check if it satisfies the PDE. If not, this trial solution can be updated using a method that is similar to the Newton-Ralphson iterative method to solve nonlinear algebraic equationŝ This process can be repeated iteratively, until the updated solution satisfies the PDE to an acceptable level, i.e., h becomes small enough. In practice, because we are solving the PDE on a mesh, Eq. (3.16) should be performed for all mesh cells, which raises the question of how to order this operation for the many cells. We will adopt the Gauss-Seidel 'black-red chessboard' approach, where the cells are split into two classes, 'black' and 'red', such that all the six direct neighbours 8 of a 'red' cell are black and vice versa. The relaxation operation, Eq. (3.16), is performed in two sweeps, the first for 'black' cells (i.e., only updatingû h in 'black' cells while keeping their values in 'red' cells untouched), while the second for all the 'red' cells. This is a standard method to solve nonlinear elliptical PDEs by using relaxation, known as the Newton-Gauss-Seidel method. However, although this method is generic, it is not always efficient, and later we will describe a less generic alternative which is nevertheless more efficient. Relaxation iterations are useful at reducing the Fourier modes of the error in the trial solution u h , whose wavelengths are comparable to that of the size of the mesh cell h. If we do relaxation on a fine mesh, this means that the short-wave modes of the error are quickly reduced, but the long-wave modes are generally much slower to decrease, which can lead to a slow convergence of the relaxation iterations. A useful approach to solve this problem is by using multigrid: after a few iterations on the fine level, we 'move' the equation to a coarser level where the cell size is larger and the longer-wave modes of the error inû h can be more quickly decreased. The discretised PDE on the coarser level is given by where the superscript H denotes the coarse level where the cell size is H (in our case H = 2h), and R denotes the restriction operator which interpolates quantities from the fine level to the coarse level. In our numerical implementation, a coarse (cubic) cell contains 8 fine (cubic) cells of equal volume, and the restriction operation can be conveniently taken as the arithmetic average of the values of the quantity to be interpolated in the 8 fine cells. Eq. (3.17) can be solved using relaxation similarly to Eq. (3.13), for which the numerical solution is denoted asû H . This can be used to 'correct' and 'improve' the approximate solutionû h on the fine level, asû where P is the prolongation operation which does the interpolation from the coarse to the fine levels.
In this work we shall use the following definition of the prolongation operation: for a given fine cell, 1. find its parent cell, i.e., the coarser cell that contains the fine cell; 2. find the seven neighbours of the parent cell, i.e., the coarser cells which share a face (there are The above is a simple illustration of how multigrid works for two levels of mesh resolution, h and H. In principle, multigrid can be and is usually implemented using more than two levels. In this paper we will use a hierarchy of increasingly coarser meshes with the coarsest one having 4 3 cells. There are flexibilities in how to arrange the relaxations at different levels. The most-commonly used arrangement is the so-called V-cycle, where one starts from the finest level, moves to the coarsest one performing relaxation iterations on each of the intermediate levels (cf. Eq. (3.17)), and then moves straight back to the finest performing corrections using Eq. (3.18) on each of the intermediate levels.
Other arrangements, such as F-cycle and W-cycle (cf. Fig. 1), are sometimes more efficient in improving the convergence rate ofû h to u u , and we have implemented them in MG-GLAM as well.

Memory usage
GLAM uses a single array to store mesh quantities, such as the matter density field and the Newtonian potential, because at any given time only one of these is needed. The Newtonian force at cell centres is calculated by finite-differencing the potential and then interpolated to the particle positions. To be memory efficient, GLAM also opts not to create a separate array to store the forces at the cell centres, but instead directly calculates them at the particle positions immediately before updating the particle velocities.
With the new scalar field to be solved in modified gravity models, we need two additional arrays of size N 3 g , where N 3 g is the number of cells of the PM grid (i.e., there are N g cells in each direction of the cubic simulation box). This leads to three arrays. Array 1 is the default array in GLAM, which is The relaxation always starts on the finest level, and the solid lines show how the multigrid solver walks through the different levels, performing Gauss-Seidel relaxation iterations at each level (denoted by the circles), called smoothing. Only one single full cycle is shown for each case. The solver walks over the multigrid levels more times in W-cycle than in F-cycle and V-cycle, and thus it requires fewer cycles in the former case to arrive at a converged solution. However, it is also computationally more expensive. We will compare the performances of the three different arrangements in real cosmological simulations in § 4.3 used to store the density field ρ and the Newtonian potential Φ (at different stages of the simulation). Note that the density field is also needed when solving the scalar field equation of motion during the relaxation iterations, and so we cannot use this array to also store the scalar field. On the other hand, we will solve the Newtonian potential after the scalar field, by when it is safe to overwrite this array with Φ. Array2 is exclusively used to store the scalar field solutionû h on the PM grid, which will be used to calculate the fifth force. Array3 is used to store the various intermediate quantities which are created for the implementation of the multigrid relaxation, such as d h ,û H , Rû h , Rd h , S H and ρ H , the last of which is the density field on the coarser level H , which appears in the coarse-level discrete PDE operator L H .
To be concrete, we imagine the 3D array (Array3) as a cubic box with N 3 g cubic cells of equal size. An array element, denoted by (i, j, k), represents the ith cell in the x direction, jth cell in the y direction and kth cell in the z direction, with i, j, k = 1, · · · , N g . We divide this array into 8 sections, each of which can be considered to correspond to one of the 8 octants that equally divide the volume of the cubic box. The range of (i, j, k) of each section and the quantity stored in that section of Array3 are summarised in the table below: Let us explain this more explicitly. First of all, the whole Array3, of size N 3 g , will be used to store the residual value d h on the PM grid (which has N 3 g cells). From now on, we label this grid by 'level-', and use 'level-( − m)' to denote the grid that are m times coarser, i.e., if the cell size of the PM grid is h, then the cells in this coarse grid have a size of 2 m h. In the table above we have used d to denote the d h on level-, and so on. Note that we always use N g = 2 .
The local residual d h on a fine grid is only needed for two purposes: (1) to calculate the global residual on that grid, h , which is needed to decide convergence of the relaxation, and (2) to calculate the coarse-level PDE operator L H that is needed for the multigrid acceleration, as per Eq. (3.17). This suggests that d h does not have to occupy Array3 all the time, and so this array can be reused to store other intermediate quantities (see the last column of the above table) after we have obtained h .
In our arrangement, Section 1 stores the residual residual Rd , Section 2 stores the restricted density field ρ −1 = Rρ , Sections 3 and 4 store, respectively, the restricted scalar field solution Rû and the coarse-grid scalar field solutionû −1 -the former is needed to calculate S −1 in Eq. (3.17) and to correct the fine-grid solution using Eq. (3.18), which is fixed after calculation, while the latter is updated during the coarse-grid relaxation sweeps 9 . Section 7 stores the coarse-grid source S −1 for the PDE operator L −1 as defined in Eq. (3.17), and finally Section 6 stores the residual on the coarse level, d −1 . Note that all these quantities are for level-( − 1), so that they can be stored in section of Array3 of size (N g /2) 3 . Section 8 is not used to store anything other than d .
We have not touched Section 5 so far -this section is reserved to store the same quantities as above, but for level-( − 2), which are needed if we want to use more than two levels of multigrid. It is further divided into 8 section, each of which will play the same roles as detailed in the table above 10 . In particular, the (sub)Section 5 of Section 5 is reserved for quantities on level-( − 3), and so on. In this way, there is no need to create separate arrays of various sizes to store the intermediate quantities on different multigrid levels which therefore saves memory.
There is a small tricky issue here: as we mentioned above, the local residual d on the PM grid is needed to calculate the coarse-grid source S −1 using Eq. (3.17), thus we will be using the quantity d stored in Array3 to calculate Rd and then write it to (part of) the same array, running the risk of overwriting some of the data while it is still needed. To avoid this problem, we refrain from using the d data already stored in Array3, but instead recalculate it in the subroutine to calculate Rd (this only needs to be done for level-). With a bit of extra computation, this enables use to avoid creating another array of similar size to Array3.
Since Array3 stores different quantities in different parts, care must be excised when assessing these data. There is a simple rule for this: suppose that we need to read or write the quantities on the coarse grid of level-( − m) with m ≥ 1. These are 3-dimensional quantities with the three directions labelled by I, J, K, which run over 1, · · · , 2 −m , and we have where i, j, k = 1, · · · , N g run over the entire Array3.
We can estimate the required memory for MG-GLAM simulations as follows. As mentioned above, the code uses a 3D array of single precision to store both the density field and the Newtonian potential, and one set of arrays for particle positions and velocities. In addition, two arrays are added to store the scalar field solution (Array2) and various intermediate quantities in the multigrid relaxation solver (Array3). In the cosmological simulations described in this paper, we have used double precision for the two new arrays, and we have checked that using single precision slightly speeds up the simulation, while agreeing with the double-precision results within 0.001% and 0.5% respectively for the matter power spectrum and halo mass function. Given its fast speed and its shared-memory nature, memory is expected to be the main limiting factor for large MG-GLAM jobs. For this reason, we assume that all arrays are set to be single precision for future runs, and this leads to the following estimate of the total required memory: where we have used 1 GB = 1024 3 bytes. This is slightly more than twice the memory requirement of the default GLAM code, which is 52 (N p /1000) 3 GB [98].

Implementation of Vainshtein-type gravity models
Having described the code and data structure of MG-GLAM, we next discuss in greater detail how each of the two classes of models studied in this paper is implemented, starting from Vainshtein-type models.
Since ϕ plays the role of the conservative potential of the fifth force ( § 2.1), we can choose the same code unit for it as for the Newtonian potential Φ: We also introduce the code-unit counterpart of the cross-over scale r c as which is consistent with the code unit for comoving coordinate or length. Using the code unit expression for the speed of light c, Eq. (3.7), it can be shown that where R c is a new dimensionless model parameter and Ω rc has been introduced above. We can then recast the DGP equation of motion, Eq. (2.10), in code unit as whereã MG denotes the modified gravity contribution to the gravitational acceleration in code units. For simplicity, in what follows we neglect the tildes in Eq. (3.24). Making the following defining decomposition of the second derivative of the scalar field [112,139], so that∇ i∇j ϕ has zero trace, i.e.,∇ i∇ i ϕ = 0, one can show that Eq. (3.28) has two branches of solutions: Which branch is the physical solution depends on the sign of α and hence of the function β DGP (a).
The requirement is that, as δ → 0, i.e., for a homogeneous density field, we must have a homogeneous scalar field, and so ∇ 2 ϕ → 0. Therefore, the solution can be written as with the function sign(x) = 1 for x ≥ 0 and −1 for x < 0.
The solve it on a discrete mesh, the continuous equation, (3.31), is first discretised as L h ϕ i,j,k = 0, where the operator L h is defined as with where h here denotes the simulation mesh cell size in code units, as introduced in § 3.2.1 (this is the same symbol as used for the dimensionless Hubble constant, but not confusion should arise given the context); i, j, k are the indices of cells in the simulation mesh, with, e.g., (i + 1, j, k) denoting the neighbouring cell to the right of cell (i, j, k), with the same y, z coordinates. This discretisation has second-order accuracy, meaning that its deviation from the true value reduces as O h 2 . This equation can be solved using the multigrid relaxation method described above, for which the code iterates to update the value of ϕ i,j,k in all cells, and at each iteration the field values changes as where we have As mentioned in [141], the operator splitting of Eq. (3.26) and the manipulation of the default discrete DGP equation into the Poisson-equation-like form of Eq. (3.31) are critical for obtaining reasonable convergence properties of the relaxation solver. The latter also makes the code more efficient as there is no need for expensive and approximate Newton-Gauss-Seidel iterations 11 . We will follow the same spirit in designing the relaxation algorithm for Kmouflage-type models next. 11 Eq. (3.34) can be considered as the exact solution of a linear equation for ϕ h,new i,j,k so that there is no need for the Newton iterations, though we note that this linear equation itself is only an approximation to the full DGP equation, (3.31), where Σ depends on the field itself. The key point here is that the discretion of Σ does not depend on ϕ i,j,k but only depends on the field values in neighbouring cells to (i, j, k).

Implementation of Kmouflage-type models
For this model, we define the following code unit for ϕ, Crucially, we note that this unit only applies to the scalar field when we take the spatial derivatives of it, while the time derivative of the scalar field is treated differently. Alternatively, one can understand the ϕ here as the spatial perturbation of the total Kmouflage field, i.e., δϕ = ϕ −φ. In the quasi-static approximation with which we work in this paper, the equations to be solved contain only the spatial derivatives of δϕ and the field value or time derivatives of ϕ ≈φ (because |δϕ| |φ|). Therefore, we opt to use ϕ to also denote δϕ for simplicity, and the context should make it clear which quantity is being referred to.
With this, we get the following expression ofX, which is the code-unit counterpart of X, where, as stated in the introduction, denotes the derivative with respect to the conformal time τ , so that ϕ = aφ. Then, in code units, the equation of motion, Eq. (2.24), can be recast as . (3.39) Here we have evaluated dA(ϕ)/dϕ atφ because the perturbation to the scalar field is generally much smaller than the background valueφ itself, which is of order 0.1 ∼ 1 at late times (see § 4.1). This equation, however, has a potential issue. To see this, let's consider the simple case of a 1D density field, say, which depends only on the x coordinate. Then the equation becomes The second term in the square brackets on the left-hand side is negative in the regime of (3.41) While |φ /H 0 | O 10 −3 at late times, at z > 10 it can be much larger (note that the denominator is H 0 ). For γ > 0, in certain regimes the coefficient of∂ 2 xφ can cross 0, which leads to a singularity. Instead of the model being unphysical in these regimes, this is more likely a consequence of deriving the equation in the quasi-static and weak-field approximations, because even when the coefficient of ∂ 2φ is zero, the left-hand side of Eq. (3.40) should have had terms that involve time derivatives of the field so that the full equation is still physical. As we are mostly interested in the Kmouflage screening mechanism in this work, we circumvent this potential numerical issue by slightly modify Eq. (3.38) to the following form: This should not affect the Kmouflage screening because it mainly takes effect in the highly nonlinear regime, where the spatial term in X (orX) is much larger than the temporary contribution. In the linear regime, when the spatial contribution in X is subdominant, the above equation should also reproduce the perturbation behaviour of the fifth force. Eq. (3.42) is a nonlinear equation inφ. As mentioned towards the end of the last subsection, we also apply the operator splitting of Eq. (3.26) to improve the stability and convergence properties of the relaxation solver for the Kmouflage model. After some manipulation, this leads to the following equivalent form of the Kmouflage equation, where we notice that, after discretisation, only the left-hand side containsφ i,j,k because∇ i∇jφ does not containφ i,j,k , and neither does∇ iφ . The latter is because, at second order accuracy, we have the following discrete version of the scalar field gradient: Therefore, the code-unit equation can be written in the following simplified form: where and Σ 1,2 do not have contribution from the central cell,φ i,j,k , as described just now. This is therefore essentially a linear equation for∇ 2φ . The discrete version of∇ i∇j ϕ∇ i ϕ∇ j ϕ (here we have again neglected the tildes temporarily for simplicity) can be written aŝ As mentioned in § (2.2), the Kmouflage field has 4 effects on cosmological structure formation, and thus we also need to write the other effects in code units. Using the code-unit expressions Eqs. (3.2), (3.7) and (3.36), we can rewrite the force equation, Eq. (2.33), into Consider the linear-theory behaviour of the model, where Eq. (3.45) can be simplified as Meanwhile, the Poisson equation is modified tõ This means that as an approximation we havẽ 51) and the ratio between the fifth force (βc∇φ) and Newtonian gravity (∇Φ N ) is Note that here the Newtonian gravity is the force that already accounts for the particle mass variation. If F N is the standard Newtonian gravity force (no particle mass variation taken into account yet), the ratio would become These agree with the fifth-force-to-Newtonian-gravity ratio used in Eq. (2.35), and so it confirms that the code-unit equations are correct and that the modification to Eq. (3.42) indeed does not change the linear theory evolution of the model.

Kmouflage background cosmology solver
Because Eqs. (3.45, 3.47, 3.48, 3.50) involve various background quantities such asȧ,φ and dφ/da, for any given Kmouflage model we need to solve its background evolution. This is governed by the following equation [130], which is the background part of the Kmouflage equation (2.24): where K XX ≡ d 2 K/dX 2 , along with the modified Friedmann equation (recall that we assume here a flat Universe, k = 0) and the modified Raychaudhuri equation, whereρ r denotes the background density of radiations (we assume that all three species of neutrinos are massless and thus counted as radiation). The Friedmann equation (3.55) containsφ 2 , both explicitly and inside functions ofX, on the right-hand side. Writing where N ≡ ln(a) and for simplicity we have used an over-circle to denote the derivative with respect to N , that equation can be recast, after some manipulation, as (3.59)  (3.59). However, we note that Eqs. (3.58, 3.59) also both depend onφ, so that these equations are coupled. To solve them, we note that for a given time (a or N ) andφ, Eq. (3.58) can be considered as a quadratic (in case of n = 2) or cubic (for n = 3) equation 12 of H 2 /H 2 0 , which can be solved analytically (the expressions of the solutions will not be presented here). This can be substituted into Eq. (3.59) to find H /H 2 0 at the same a (or N ) and for the sameφ. After that,φ, H 2 /H 2 0 and H /H 2 0 at time a or N can be used to calculate˚φ using Eq.  In our calculation we have included both radiation and non-relativistic matter, with 'radiation' including CMB photons with a current temperature of 2.7255 K and 3.046 flavours of massless neutrinos. We defer the implementation of massive neutrinos which couple to the scalar field in a different way from non-relativistic matter in the Kmouflage model, to future works.
We remark that λ is not a free parameter of the model. Rather, once the density parameters Ω m , ρ r0 and H 0 are specified, λ, which roughly quantifies the amount of dark energy in this model, must take some certain value in order to ensure consistency -if λ is too large, the predicted H(a = 1), by solving Eqs. (3.61, 3.58, 3.59) with given initial conditions ofφ andφ, will be larger than the desired (input) value of H 0 , and vice versa. In practice, MG-GLAM starts from a trial value of λ = 1, evolves the above equations from some initial redshift (z i = 10 5 ) to z = 0, and checks if the calculated value of H(a = 1) is equal to the desired value H 0 (within a small relative error of order O 10 −6 ) -if the predicted H(a = 1) value overshoots the desired H 0 , λ is decreased, and vice versa. This process is repeated until we have obtained a good approximation to λ, with the relative error of the predicted H 0 less than 10 −6 . The initial conditions ofφ andφ at z i = 10 5 are not important, as long as their values are sufficiently small (in the MG-GLAM code we set them to be both 10 −30 ). Once the value of λ has been determined in this way, it is stored to be used in other parts of the code; also stored are a large array for the various background quantities such as H,Ḣ,φ andφ -if needed at any time by the Kmouflage field solver of MG-GLAM, these quantities will be linearly interpolated in the scale factor a or N = ln(a).

Numerical code tests
We have performed a series of code tests to check that our MG solvers work correctly following the framework of the ECOSMOG and MG-AREPO codes [96,141]. To this end, we have run low-resolution simulations with box size L = 256 h −1 Mpc and N g = 256 grid cells in each coordinate direction.

Background cosmology tests
Of the two classes of models considered in this work, the nDGP models have an expansion history identical to that of ΛCDM by design, but the Kmouflage models can have non-negligible deviations from ΛCDM in background expansion [130]. Our numerical solver of the background equations have been described in § 3.2.5, and in this subsection we test the reliability of that implementation.
To this end, we have compared the predictions by the numerical Kmouflage background solver in MG-GLAM with the results obtained using a modified version of the CAMB code used in [130]. The results are shown in Fig. 2, where the left panel shows the background Kmouflage field as a function of the scale factor a, and the right panel shows the ratio between the modified expansion rate H MG (a) and that of standard ΛCDM, H GR (a), with the same Ω m and H 0 . As we can see, for both quantities and all models tested here, the two codes agree very well.
In this figure, we have shown the results of fixed n = 2 and β Kmo = 0.2, but varying values of K 0 ; however, we have checked that the same agreement between the two codes hold for other values of n and β Kmo .
We note that in the models studied here, the background scalar field is negative,φ < 0, and decays over time. This has two implications: (i) the direction-dependent force in Eq. (2.33) or Eq. (3.48), −β Kmo dφ dap , points to the direction of the particle's movement, which means that it actually speeds up the particle rather than acting as a 'friction' force; (ii) given that β Kmo > 0 in the models studied here, we have A(φ) = exp (β Kmoφ ) < 1 at late times, which means that the particles contribute less to the Poisson equation, cf. the discussion below Eq. (2.35); equivalently, we can consider this as a decrease of the effective dark matter particle mass over time.
Therefore, we can have a quick discussion about how the 4 effects of the Kmouflage model in structure formation, discussed below Eq. (2.35), depend on the parameter K 0 , when n = 2 and β Kmo is fixed. This may also help us appreciate the complexity of this model when discussing its effects on the halo mass function below.
• varying particle mass: the Kmofulage models have A(φ) < 1 and the smaller K 0 is (we only focus on the cases with K 0 > 1 here), the smaller A(φ) becomes, which reduces the Newtonian force and hence weakens structure formation.
• modified expansion rate: as shown in the right panel of Fig. 2, decreasing K 0 slows down the expansion rate more, which can enhance structure formation. However, even for K 0 = 1 the expansion rate is only ≈ 2% smaller than in ΛCDM, and so this effect is expected to be small.
• direction-dependent force: for fixed β Kmo , the amplitude of this force (for particles moving at the same speed) depends on |dφ/da|, which is clearly larger for smaller K 0 values.
• the fifth force: the ratio between the amplitudes of the fifth and Newtonian forces is 2β 2 Kmo /K X , with K X (X) given in Eq. (2.36). Neglecting the weak dependence of λ on K 0 , we can see that the size of K X is a result of the competition between K 0 and |φ | or equivalently |dφ/da|: but from the left panel of Fig. 2 it is evident that K 0 varies more than (φ ) 2 , and so K X decreases with a decreasing K 0 , making the fifth force force relatively stronger.
Therefore, the effect of varying particle mass works against all the remaining three effects, and which side wins the competition of boosting versus weakening structure formation can only be answered by numerical solutions.

Density field tests
This subsection is devoted to the tests of the multigrid solvers for the nDGP and Kmouflage models, using different density configurations for which the scalar field solution can be solved analytically or using a different numerical code.

Uniform density field tests
For the first test we consider the case where the solution of the scalar field, ϕ, is constant in space. A constant field should be obtained if we choose a homogeneous matter distribution (i.e., the density field is uniform and equal to the cosmological background value). To check this we have setδ i,j,k = 0 and chose a set of random values that follow a uniform distribution in the range [−0.05, 0.05] as initial guesses ofφ i,j,k , then we let the code run until the residual is d ≤ 10 −8 .
The results of this test are shown in the upper left panel of Fig. 3, where the orange (blue) dots represent the initial guess, and the orange (blue) solid line is the numerical solution after relaxation, in the nDGP (Kmouflage) case. In both cases a constant solution is obtained by the code, as expected.

1D density field tests
For our next test, we consider a one-dimensional sine density field (varying in the x direction) given by,δ for nDGP and , (4.2) for Kmouflage, where the model parameters are set as n = 2, K 0 = 1, β Kmo = 0.1, while A = 0.1, K = 4 are extra parameters describing the specific density field. We have checked other parameter values and found similar agreement, but we only present the results for one set of parameters here, to make the plot easier to read.
The analytical solutions of the nDGP and Kmouflage scalar field equations of motion, Eq. (3.24) and Eq. (3.45), for these density fields are, respectively.
The results of this test are shown in the upper right panel of Fig. 3, where the orange (blue) dots correspond to the numerical solution and the orange (blue) solid line represents the analytical solution for the nDGP (Kmouflage) model. The code is able to accurately recover the analytical predictions in both models.

3D spherical overdensity field tests
The 3D spherical tests help us to check that the code is able to solve the nonlinear terms of the nDGP and Kmouflage equations correctly. For the nDGP spherical test we use the code units and a = 1, so that Eqs. (2.19) and (2.20) can be written as  forr ≤R and forr ≥R, wherer is the comoving radial distance from the centre of the spherical overdensity,R is the radius of the latter andδ is the (constant) value of the overdensity insideR, all in code units. Similarly, the Kmouflage equation, (2.37), in code units can be solved (for the special case of n = 2) as dφ dr where g(r) is a function defined as g(r) ≡ γ 2 108f (r) + 20.78460969 27f 2 (r) + 4 γ , (4.8) which is obtained by analytically solving a cubic equation satisfied by dφ/dr, and the function f (r) is defined as For these tests, we place the spherical overdensity in the centre of the grid andr is defined as, where (x,ỹ,z) is the coordinate of a mesh cell in code units, withx,ỹ,z running from 0 to N g . For cells withr ≤R, we setδ to a nonzero value; otherwiseδ = 0.0. We use the values ofR = 0.1N g , δ = 0.5 and H 0 r c = 0.5, 1, 5 for nDGP andR = 0.1N g ,δ = 5000 and K 0 = 1, 10, 100 with n = 2 and β = 0.2 for Kmouflage. In both models, the above analytical solutions are for dφ/dr. We then numerically integrate this quantity to get the radial profiles ofφ. The solutionsφ(r) obtained this way may have a constant shift relative to the numerical solutions obtained by MG-GLAM, which is because the DGP and Kmouflage equations contain only spatial derivatives of the scalar field 14 , and so any solution to these equations shifted by a constant value everywhere would still be a valid solution. Thus, to compare the analytical and numerical solutions, we shift the former so that it has the same peak value as the latter.
The results from these tests are shown in the lower left and right panels of Fig. 3 for the nDGP and the Kmouflage models, respectively. The coloured symbols in the different panels represent the numerical solutions from MG-GLAM and the solid lines are the analytical solutions. We can see that the two agree well, especially at smallr, i.e., close to the centre of the spherical overdensity. Far from the centre, the agreement becomes poorer because the analytical solution does not assume periodicity of the spherical overdensity, while the numerical code uses periodic boundary conditions so that the field sees the overdensities in the replicated boxes as well.

Convergence tests
As mentioned in § 3.2.1, in MG-GLAM we have implemented three different arrangements of the multigrid solver -V-cycles, F-cycles and W-cycles. We have compared the accuracy and computational costs of these arrangements. To do so, we have run a series of smaller simulations for the nDGP model with H 0 r c = 1 and for Kmouflage with n = 2, K 0 = 1 and β Kmo = 0.2. The simulations follow the evolution of 512 3 dark-matter particles in a cubic box of length L = 256 h −1 Mpc with N g = 1024 grid points in each direction. We use 10, 3 and 2 V-cycles (V10, V3 and V2), one F-cycle (F1) and one W-cycle (W1) to test the convergence of the solution. In all cases, within each cycle the code transverse the mesh twice to perform Gauss-Seidel relaxation.
In Fig. 4 we show the relative difference of the nonlinear matter spectrum measured at z = 0 from our test simulations described above for the nDGP (left panel) and Kmouflage (right panel) models where the benchmark case is V10 (black solid line). We find a permille agreement between all the different schemes, and different numbers of cycles used to solve the PDEs, on almost all scales. However, the running time is larger when using more cycles or iterations, i.e., the slowest simulations are those using V10. The F-cycles and W-cycles are more efficient in reducing the residual, which is not surprising given that they walk more times across the fine and coarse multigrid levels. However, they are also slower than V2. As a compromise between accuracy and cost, we have therefore decided to always use V2 in our cosmological runs. It is actually incredible to reach convergence with just two V-cycles (and two Gauss-Seidel passings of the entire mesh in each cycle), for nonlinear equations in the DGP and Kmouflage models.

Scaling tests
To test the parallelisation performance and scalability of MG-GLAM, we have run a series of simulations for the nDGP model with H 0 r c = 1, with varying sizes and/or resolutions. The strong scaling is shown in the left panel of Fig. 5, where we test the speed-up of the code when varying the number of OPENMP threads while fixing the size of the simulation. The test simulations follow the evolution of N 3 p = 256 3 particles in a box of size L = 128 h −1 Mpc with 512 3 grids. We vary the number of threads from 1 to 56 (symbols) and found a nearly perfect agreement with the ideal linear scaling relation (dashed line) when using up to 16 threads. The code also shows good scalability when using up to 56 threads, and the deviation from ideal scaling is likely caused by the fact that the test run has a small size so that the overhead becomes a significant fraction of the total time when using too many threads.
The right panel of Fig. 5 displays the result of the tests with fixed number of OPENMP threads (56), but varying the simulation size. For this test we run five simulations with different number of grid points and DM particles, N g = 256, 512, 1024, 2048 and 4096 (symbols) with N p = N g /2 and L = 512h −1 Mpc in all cases. Again we find a nearly perfect agreement with an ideal linear scaling (dashed line).  These tests suggest that MG-GLAM has excellent scalability, and the running times for the simulations performed in this work can be used to reliably predict the requirement for even larger runs.

Resolution tests
We performed a series of mass and force resolution tests for the nDGP model with H 0 r c = 1. but these are not used in this comparison. The measured nonlinear power spectra at z = 0 are shown in the left panel of Fig. 6, where we have multiplied P m (k) by the wavenumber (k) to enhance any difference on large-scales. We find a good agreement on large-scales, where the measurements of the Np1024Ng2048 and Np2048Np4096 simulations are well within the error bars of the Np1024Ng4096 case. In the right panel of Fig. 6 we confirm a one per cent agreement between all simulations on scales k 1 h Mpc −1 . It also shows that for N g = 4096, increasing N p from 1024 to 2048 does not make a big difference.
The effects of mass and force resolution on the halo mass function (HMF) are shown in Fig. 7. First, we observe an improvement of the completeness of the HMF down to M vir ∼ 10 12 h −1 M for the highest force resolution simulations, i.e., those configurations with ∆x = 0.125 h −1 Mpc or N g = 4096 (see left panel of Fig. 7). In addition, the right panel of Fig. 7 shows the level of agreement between the different configurations. We found that the N g = 4096 cases have a 2% agreement over a large range of masses, 10 12.3 h −1 M < M vir < 10 15 h −1 M , while the N g = 2048 simulations show good convergence (better than 5% agreement) for haloes with mass M vir > 10 12.3 h −1 M . To have complete halo catalogues down to 10 12.5 h −1 M , the resolution of L512Np1024Ng2048 seems to be fine, while to have haloes down to 10 12 h −1 M we need the resolution of L512Np2048Ng4096.

Comparisons with previous simulations
Finally, we compare the dark matter power spectrum and the abundance of dark matter haloes of the nDGP (H 0 r c = 1) model at the present time measured from our MG-GLAM simulations with those from the L = 500 h −1 Mpc simulations presented in [142] ran with the MG-AREPO code [96].
The MG-AREPO simulation follows the evolution of one realisation of 1024 3 particles in a box of size 500 h −1 Mpc, with a force resolution 0.01 h −1 Mpc and mass resolution m p = 9.98 × 10 9 h −1 M . We take advantage of the performance of MG-GLAM to run 10 independent realisations of the same nDGP model, using the same linear theory power spectrum as for the MG-AREPO runs. For the MG-GLAM simulations we use a box of size 512 h −1 Mpc and a mesh with the smaller-scale discrepancy due to the lower force resolution of the MG-GLAM runs), and the P (k) enhancement approaches to the linear theory prediction (solid horizontal grey line) on large scales. MG-GLAM slightly under-predicts the power spectrum enhancement at large, linear scales, and this effect appears to be systematic, which is independent of the simulation box size or resolution. However, we have performed checks by running simulations of the same nDGP model using the ECOSMOG code, and found the same behaviour, which to a less extent also exists in MG-AREPO simulations (the red dashed line here is a particular realisation). In any case, the agreement between these two codes is consistent with that between ECOSMOG and MG-AREPO, cf. Fig. A1 of [96]. The comparison of the cumulative halo mass function enhancement measured from MG-GLAM (solid blue line with error bars) and MG-AREPO (dashed red line) is presented in the right panel of Fig. 8. For the latter we have run the halo finder with the same virial mass overdensity halo definition as adopted for MG-GLAM, to be consistent. We again find a good, percent-level, agreement between the results of both codes, especially for high-mass haloes where the MG-AREPO measurement is well within the MG-GLAM error bars (standard deviation of the 10 realisations). The MG-AREPO prediction appears to be slightly but consistently lower than that of MG-GLAM. Indeed, while in MG-AREPO the nDGP model enhances the abundance of large haloes and reduces it for small haloes, for MG-GLAM the abundance is always enhanced; the latter behaviour is seen in all the ECOSMOG simulations, e.g., Fig. 2 of [143] of the nDGP model. This is unlikely due to the different halo finding algorithms, since [143] does not use the BDM halo finder and yet finds the same behaviour. Rather, we suspect that this small discrepancy between MG-GLAM and MG-AREPO is caused by differences in other code details, such as force calculation.
All in all, we conclude that the MG-GLAM code has passed various tests, and is ready for massive productions of simulations and mock catalogues. We will demonstrate a small-scale-in terms of the very low cost compared to MG-AREPO and ECOSMOG simulations-application in the next section.

Cosmological simulations
As a taster of the MG-GLAM code, we have conducted a large suite of dark-matter only simulations of the nDGP model and a few Kmouflage simulations, to have a quick look at the nonlinear matter For all simulations, we use the same ΛCDM linear perturbation theory power spectrum to generate the initial conditions at z ini = 100 using the on-the-fly algorithm of MG-GLAM. The cosmological parameters are chosen from those reported by the Planck collaboration [144]: The linear matter power spectrum is generated using the CAMB code. The reason we can use the same initial condition for all simulations is that the effect of the scalar field is very weak at z > 100; we have checked that even the strongest Kmouflage model studied in this work only differs from ΛCDM by O (0.1%) in the linear matter power spectrum at z = 100.

Matter power spectrum
The measured power spectra for all 30 nDGP models are displayed in Fig. 9 at z = 0 (left panel) and z = 1 (right panel). The colorbar displays the values of H 0 r c from the strongest (H 0 r c = 0.25; bluest solid line) to the weakest models (H 0 r c = 10, reddest solid line). From the lower subpanels, we see that we can cover a wide range of enhancement amplitudes of the power spectrum, with the relative differences between the nDGP and GR models spanning from ≈ 1% to 40% on large scales at z = 0. At earlier times (z = 1; right panel), the behaviour is qualitatively similar, but the enhancement is generally smaller (≈ 0.5%-25% on large scales) as the fifth force has had less time to take effect. The effect of the Vainshtein screening mechanism is reflected by the decay of the power spectrum enhancement towards 0 at small scales (large k). However, notice that at this resolution, we can only trust the result at k 3h/Mpc, as shown by the comparison between MG-GLAM and MG-AREPO in § 4.6. Should the simulations be run at a higher resolution, we expect the decay to 0 to happen faster at k > 3h/Mpc. This decay is because, according to the halo model [145] of structure formation, the small-scale matter power spectrum is determined by the one-halo term, which in turn depends on the inner density profiles of dark matter haloes; the Vainshtein screening mechanism can effectively suppress the relative strength of the fifth force, cf. Eq. (2.23), inside and near massive bodies such as haloes [146], so that in Vainshtein-type models the halo density profile is close to ΛCDM [118,142,147].
The lower subpanels of Fig. 10 display the relative difference between the measured power spectra of the Kmouflage models and GR. In addition to the results of the full and linearised simulations, we also show in dotted lines the linear-theory predictions at z = 0 (left panel), obtained using the modified version of the CAMB code developed in [130]. In general, we find that the linearised simulations give similar results to those of their full nonlinear counterparts; also, all measurements approach to the linear theory predictions on large scales. This shows that the Kmouflage screening mechanism is not efficient [148] in suppressing the effect of the fifth force in cosmic structure formation. This is related to the way in which screening works in this class of models, which requires |∇ϕ| |φ| ∼ H 0 , a condition that is likely to be satisfied only on small (e.g., sub-galactic) scales. A corollary from this is that, in cosmological simulations, solving the fully nonlinear Kmouflage equation of motion may not be as important as for the other models such as nDGP and f (R) gravity [103].
Since this is the first time that cosmological simulations for the Kmouflage model are conducted, let us comment on the qualitative behaviour shown in the lower subpanels of Fig. 10. Overall, the power spectrum enhancement in this model looks very similar to that in the nDGP model, cf. Fig. 9, but there is a critical difference: here the enhancement becomes negative at small scales, k 2h/Mpc. We have already seen that this can not be due to the Kmouflage screening mechanism -actually, it is due to the lack of screening. Unlike in nDGP, here even inside dark matter haloes particles still feel a strong fifth force which has a nearly constant ratio with the strength of Newtonian gravity, and on top of this the direction-dependent force discussed below Eq. (2.35) can also speed up the particles; the result of these two forces is that particles gain a higher kinetic energy, tend to move into or stay in the outer regions of dark matter haloes and thus reduce the clustering on small scales as compared to ΛCDM. Such distinct behaviours between the nDGP and Kmouflage matter power spectra may offer a potential way to distinguish between them observationally, although that is beyond the scope of this paper.

Halo mass functions
Modified gravity and screening mechanism effects can also be studied by exploring dark matter halo populations. In Figs. 11 and 12 we show the cumulative halo mass function (cHMF), which defines the number density of dark matter haloes more massive that a given halo mass M vir , measured from our BDM halo catalogues at z = 0 (left panels) and z = 1 (right panels). For nDGP the 30 models are colour-coded in the same away as in Fig. 9.
From the lower subpanels of Fig. 11, we see that the abundance of haloes is enhanced by the fifth force, especially at low redshifts and for high-mass haloes. The same behaviour has been found and discussed in previous works, e.g., [142,143,149]. We also notice that the enhancement over ΛCDM is positive for the whole halo mass range, not just for massive haloes, as already discussed in § 4.6. The abundance of haloes is enhanced from ≈ 1 to 250 percent for the different nDGP models. The large increase of high-mass haloes in the less efficiently screened nDGP models (models with H 0 r c < 5) is due to the accretion of surrounding matter around these massive objects thanks to the enhanced gravity force: these objects, often being the dominating object within some large surrounding region, can attract matter from the whole region, including the accretion of smaller haloes to them, and so the fifth force can strongly boost their masses; on the other hand, smaller objects, while also experiencing the fifth force [96], are more likely to meet competitors and so their masses grow less.
On the other hand, the lower subpanels of Fig. 12 show the relative difference of the cHMFs between the Kmouflage models and ΛCDM. In the same figure we compare the predictions from the linearised Kmouflage simulations (dashed lines) with their fully nonlinear counterparts (solid lines). Each pair of Kmouflage simulations produce roughly the same abundances of dark matter haloes, as evident from the overlap between dashed and solid lines in the entire mass range used to measure the cHMFs, confirming that the effects of Kmoulfage screening are marginal. The abundance of massive haloes is enhanced by ≈ 50 percent at z = 1 and ≈ 20 percent at z = 0, consistent with the redshift evolution of the matter power spectrum shown in the lower panels of Fig. 10.
Also, we find that the Kmouflage model produces fewer low-mass haloes than GR, especially at lower redshifts, and we believe this is the consequence of the competition between the four effects of the Kmouflage model, discussed below Eq. (2.36). As we have demonstrated in § 4.1 for a few cases of fixed n and β Kmo , this competition can be complicated and not analytically predictable. As a result, to disentangle the four effects and to rank their relative importance, we need to switch them on and off individually to observe the impact on cosmological observables. While this is apparently an interesting and important thing to do, it is beyond the scope of this paper and so we will leave such a study to future works.
Finally, before concluding this section, it is worthwhile to mention that, at the simulation resolution used here, we can already get the HMF complete down to 10 12.5 h −1 M , as shown in [103,150].

Discussion
In this section we have had an initial taste of the MG-GLAM code, by running a large suite of simulations covering all three classes of models studied in this paper.
One particularly relevant aspect of the MG-GLAM code is its fast speed (cf. § 4.4). The 30 nDGP simulations described in this section have been run using 56 threads with OPENMP parallelisation, and we find that the run time for the majority of them is ∼ 23, 000 seconds, or equivalently 357 CPU hours, roughly 105 times faster than MG-AREPO, and 300 times faster than ECOSMOG, for the same simulation specifications. With such a high efficiency, we can easily ramp up the simulation programme to include many more models and parameter choices, and increase the size and/or resolution of the runs, e.g., using boxes of at least 1h −1 Gpc. The Kmouflage simulations, while having a different screening mechanism, take about 22, 000 seconds each, similar to the nDGP runs. This is not unexpected given that in both models we use the same number of V-cycles and 157 timesteps. As part of the resolution tests in § 4.5, we have also run a few even larger simulations for ΛCDM and N1, e.g., with L = 512 h −1 Mpc, N p = 2048 and N g = 4096. These runs took around 40, 000 seconds for ΛCDM and 116, 000 seconds (wallclock time) for N1, using 128 threads on the SKUN8@IAA supercomputer at the IAA-CSIC in Spain, suggesting that a single run of specification L1000Np2048Ng4096, which would be useful for cosmological (e.g., galaxy clustering and galaxy clusters) analyses should take at most 1.3 days to complete and is therefore easily affordable with existing computing resources.
On the other hand, efficiency should not be achieved at the cost of a significant loss of accuracy. For the runs used here, we have used a mesh resolution of 0.25h −1 Mpc, which is sufficient to achieve percent-level accuracy of the matter power spectrum at k 1hMpc −1 [98], matter power spectrum enhancement at k 3hMpc −1 , and (main) halo mass function down to ∼ 10 12.5 h −1 M [103]. The particle number, N 3 p , in GLAM simulations is normally set according to N p = N g /2, so that in the simulations here we have used 1024 3 particles. However, we have checked that increasing the particle number to 2048 3 has little impact on the halo mass function (cf. § 4.5). We notice that the completeness level of the HMFs here is similar to ECOSMOG runs with the same simulation specifications, suggesting that MG-GLAM is capable of striking an optimal balance between cost and accuracy.

Summary and conclusions
In this paper, along with a companion paper [103], we have presented the MG-GLAM code, which is an extension of the GLAM pipeline [98] that enables very efficient and accurate production of full Nbody simulations in a large variety of modified gravity models, with the ultimate objective of covering all such models of interest. We have focused on the description and numerical implementation of models with derivative coupling terms, while our twin paper [103] explores the conformally coupled scalar field models, including thin-shell screening models such as f (R) gravity and symmetrons, as well as the usual coupled scalar field models.
We studied two classes of derivative coupling models, the Vainshtein-type and the Kmouflagetype gravity models, which employ the Vainshtein and Kmouflage screening mechanism, respectively. As an example of Vainshtein-type models, we considered the nDGP braneworld model, which serves as a prototype for other classes of models such as Galileons, vector Galileons, generalised Galileons and kinetic-gravity braiding models. The Kmouflage models are comparatively new in the context of cosmological simulations, and we have proposed a new numerical algorithm to solve their equations of motion in this work. This algorithm, and its implementation in MG-GLAM, can be easily generalise to simulate other classes of interesting models such as k-essence, MOND, and the scalar [151] or vector [126] dark matter models with non-canonical kinetic terms of the k-essence type and possibly a generic interaction potential.
To implement these models into the parent code GLAM, we have added subroutines to solve the nonlinear partial differential equations that govern the formation of cosmological structures in such models (cf. § 3.2). These nonlinear PDEs are solved using the multigrid Gauss-Seidel relaxation technique, which uses one of three different arrangements of the multigrid solver (V-cycles, F-cycles and W-cycles). In addition, we have included some background cosmology solvers for the Kmouflage model (cf. § 3.2.5). For both classes of models, we have designed the relaxation algorithm to avoid the Newton-Gauss-Seidel iteration commonly used for nonlinear PDEs, which generally slows down the convergence and is sometimes unstable. This is a key to the performance of MG-GLAM, which we find to be 100-300 times faster than earlier modified gravity codes such as MG-AREPO and ECOSMOG for the same mass resolution; the force resolution is lower as MG-GLAM uses a fixed mesh resolution, while the other codes use adaptive mesh refinements; but even with the resolution used in this work, MG-GLAM is able to accurately predict the halo mass function down to ≈ 10 12.5 h −1 M (comparable to the performance of ECOSMOG) and the power spectrum enhancement down to k ≈ 3 h Mpc −1 .
We have performed a series of tests to check that our implementation of the multigrid solvers works correctly, using different density configurations for which we can obtain analytical expressions of the scalar field solution (cf. § 4), and found that the MG-GLAM numerical solutions agree very well with the analytical expectations. We have shown that using only 2 V-cycles, we can reach convergence for the nonlinear equations in the nDGP and Kmouflage models. Also, we have compared the solutions of the background scalar field and the modified expansion rate in the Kmouflage model obtained with MG-GLAM and CAMB [130], finding excellent agreement between both codes. Finally, we have compared the power spectrum enhancement and the abundance of dark matter haloes for one nDGP model (H 0 r c = 1.0) predicted by MG-GLAM and the MG-AREPO code [96]. To do so, we ran 10 independent MG-GLAM realisations (to reduce cosmic variance) and use the L500-N1 simulation presented in [142]. In general, MG-GLAM is able to reproduce the power spectrum enhancement and the abundance of dark matter haloes from those high-resolution simulations with high accuracy.
For the first time, we have been able to run a large suite of nDGP simulations, for 30 models with H 0 r c logarithmically spaced between 0.25 and 10, and carried out the first fully nonlinear N -body simulations for three Kmouflage models with β Kmo = 0.2 and (n = 3, K 0 = 1), (n = 2, K 0 = 1) and (n = 2, K 0 = 0.5). In addition, we have run linearised simulations for each of the Kmouflage models mentioned above. With this large suite of MG simulations we are able to study in great detail the interplay between modified gravity effects and screening mechanism on structure formation, as we have shown in the nonlinear matter power spectra and cumulative halo mass function predictions, Figs. 9-12. Our nDGP simulations clearly demonstrate the effect of Vainshtein screening in the matter power spectrum, and how that evolves with time and depends on H 0 r c . The Kmofulage simulations, on the other hand, indicates that the Kmouflage screening mechanism is is much less efficient in the cosmological regime, as the fully nonlinear and linearised simulations give similar predictions of the matter power spectrum and halo mass function; this agrees with expectations.
The development of MG-GLAM will help in the construction of a large number of galaxy mock catalogues in MG theories for Stage-IV galaxy surveys, such as DESI and Euclid. Owing to its high efficiency and accuracy, this code can be used to perform > O(100) large (L > 1.0h −1 Gpc at least) and high-resolution (m p < 10 10 h −1 M ) simulations for each modified gravity model, with minimal computational cost. These will allow for variations of not only the gravitational but also cosmological parameters, and subsequently the construction of accurate emulators for various physical quantities in different gravity models. This will open up a wide range of possibilities for future works to test gravity using cosmological observations. The prescriptions to populate dark matter haloes with galaxies will be explored in an upcoming paper, as well as a more detailed study of halo properties, including halo clustering, will be left in future works.