A general framework to test gravity using galaxy clusters VI: Realistic galaxy formation simulations to study clusters in modified gravity

We present a retuning of the IllustrisTNG baryonic physics model which can be used to run large-box realistic cosmological simulations with a lower resolution. This new model employs a lowered gas density threshold for star formation and reduced energy releases by stellar and black hole feedback. These changes ensure that our simulations can produce sufficient star formation to closely match the observed stellar and gas properties of galaxies and galaxy clusters, despite having $\sim$160 times lower mass resolution than the simulations used to tune the fiducial IllustrisTNG model. Using the retuned model, we have simulated Hu-Sawicki $f(R)$ gravity within a $301.75h^{-1}{\rm Mpc}$ box. This is, to date, the largest simulation that incorporates both screened modified gravity and full baryonic physics, offering a large sample ($\sim$500) of galaxy clusters and $\sim$8000 galaxy groups. We have reanalysed the effects of the $f(R)$ fifth force on the scaling relations between the cluster mass and four observable proxies: the mass-weighted gas temperature, the Compton $Y$-parameter of the thermal Sunyaev-Zel'dovich effect, the X-ray analogue of the $Y$-parameter, and the X-ray luminosity. We show that a set of mappings between the $f(R)$ scaling relations and their $\Lambda$CDM counterpart, which have been tested in a previous work using a much smaller cosmological volume, are accurate to within a few percent for the $Y$-parameters and $\lesssim7\%$ for the gas temperature for cluster-sized haloes ($10^{14}M_{\odot}\lesssim M_{500}\lesssim10^{15}M_{\odot}$). These mappings will be important for unbiased constraints of gravity using the data from ongoing and upcoming cluster surveys.


INTRODUCTION
Galaxy clusters are the largest gravitationally-bound structures that have been observed in the Universe. They are believed to trace the highest peaks of the primordial density field, and their properties are highly sensitive to cosmological parameters which control the formation and evolution of large-scale cosmological structure. For example, the cluster abundance, which is quantified using cluster number counts, has been used to constrain the matter density parameter Ω M and the linear fluctuation of the density field 8 (e.g., Ade et al. 2016). Clusters can also be used to probe the strength of gravity on the largest scales: for example, a strengthened gravity would result in the formation of a greater number of clusters by the present-day and alter properties such as the temperature of the intra-cluster gas. Therefore, cluster observations can be used to search for departures from General Relativity (GR).
Ongoing and upcoming astronomical surveys are generating vast catalogues using all available means of detection: clustering of galaxies in galaxy surveys (e.g., Lawrence et al. 2007; LSST Science ★ E-mail: m.a.mitchell@durham.ac.uk Collaboration 2009;Laureĳs et al. 2011;DESI Collaboration 2016); X-ray emission produced by the hot intra-cluster gas (Weisskopf et al. 2000;Jansen et al. 2001;Merloni et al. 2012); and the Sunyaev-Zel'dovich (SZ) effect (e.g., Sunyaev & Zeldovich 1972), which is caused by interactions between cosmic microwave background photons and electrons in the intra-cluster gas (e.g., Hasselfield et al. 2013;Ade et al. 2016;Abazajian et al. 2016;Ade et al. 2019). Some of these catalogues will be many times larger than previous data sets, and will offer a means to significantly enhance our understanding of the mechanisms driving the large-scale evolution of the Universe, including the late-time accelerated cosmic expansion.
Before we can use this data to test cosmological models, we must first consider any potential sources of bias that can arise from an incomplete modelling of the cluster properties. For example, many modified gravity (MG) models (see, e.g., Clifton et al. 2012;Joyce et al. 2015;Koyama 2018;Ferreira 2019), which can be used to explain the accelerated cosmic expansion, feature a strengthened gravitational force. This can alter properties such as the intra-cluster gas temperature and the density profile. If these effects are not properly accounted for in tests of these models using the abundance and other properties of galaxy clusters, the inferred constraints may be subject to bias. For example, the cluster mass is often determined using scaling relations that relate the mass with some observable flux obs . In GR, these are often modelled as power laws, but this may be inaccurate for MG. If these effects are not accounted for, then constraints inferred by comparing the observed cluster abundance, d /d obs , with the model-dependent prediction of the halo mass function (HMF), d /d , can be unreliable, since the inferred cluster mass is not correct.
The above effects can be studied and modelled using numerical cosmological simulations which incorporate sub-resolution models for baryonic processes such as star formation, gas cooling, and stellar and black hole feedback (e.g., Schaye et al. 2015;Weinberger et al. 2017;Pillepich et al. 2018a). For the first time, these 'full-physics' models are being incorporated in MG simulations to study the combined effects of the extra gravitational forces and baryonic processes on the properties of galaxies. For example, the simulations (see Arnold et al. 2019;Hernández-Aguayo et al. 2021), which incorporate the IllustrisTNG baryonic physics model (Weinberger et al. 2017;Pillepich et al. 2018a), were run for two popular MG models which feature a strengthened gravity and some screening mechanism to recover GR in high-density regions. These yielded useful insights into, for example, the abundance of disk-shaped galaxies, the power spectra of different matter components, and the stellar and gas properties of galaxies in these models. As an example, it was found that, for MG models with weak or moderate deviations from GR, the impacts of baryons and MG can be modelled separately and added up, which means that full baryonic simulations in MG are not necessary for predicting the matter power spectrum on small scales. Some interesting observations were also made about the number of disk galaxies produced by different models , with a stronger gravity leading to more frequent halo mergers and therefore fewer surviving disk galaxies (though this observation requires further verification using larger boxes). The ( ) simulation data also show that different gravity models could produce very different numbers of small haloes (those less massive than 10 10 ), and hence lead to different amounts and clustering of 21cm-emitting neutral hydrogen . But those simulations are generally aimed at studies of galaxy properties, and so they have very high resolution and small box sizes which are not suitable for a cosmological study. Extending this approach from the galactic regime to the cluster regime is now an important step as we prepare for future cluster tests of gravity.
This paper is part of a series of works which aims to develop a general framework for robust and unbiased tests of gravity us-ing galaxy clusters. An important component of this framework is a mapping of observable-mass scaling relations from GR to ( ) gravity, to accurately predict the latter from the (better-known knowledge of the) former. As mentioned in the above, this is required in order to avoid biased mass estimates. Most interestingly, this can be done using a simple analytical model for the ( ) enhancement of the dynamical mass -in Mitchell et al. (2021d), we tested this mapping using the simulations, and demonstrated that it performs very well for galaxy groups. However, the simulations have a box size of just 62ℎ −1 Mpc ('L62') and a high mass resolution, which as mentioned above is more suited for studying galaxies than galaxy clusters; indeed, these simulations contain only a few cluster-sized objects ( 500 10 14 ) 1 . The L62 predictions of the cluster scaling relations may therefore suffer from poor statistics and be potentially subjected to a significant influence by sample variance. On the other hand, it has been shown that a sufficiently high mass resolution is required for the IllustrisTNG model to generate sufficient levels of star formation to closely match galaxy observations (e.g., Pillepich et al. 2018a). Unfortunately, high-resolution simulations which incorporate both screened modified gravity and full baryonic physics are very expensive to run for larger cosmological volumes, which has made it difficult to study the interplay between baryonic physics and the fifth force at higher masses.
In this work, we will present a retuning of the IllustrisTNG model which can be used to run full-physics simulations at lower resolutions without losing the good agreement with observational data. This retuning was a significant undertaking which involved running over 200 simulations with a reduced box size, and our new model can be used to run low-resolution, large-box simulations for both standard gravity and MG scenarios. We have used this model to run simulations for GR and ( ) gravity with a significantly increased box size of 301.75ℎ −1 Mpc, and in this work we will present the predictions for the observable-mass scaling relations over an extended mass range 10 13 ≤ 500 10 15 . This paper is structured as follows. In Sec. 2, we provide some background on the ( ) gravity model and our general framework. Then, in Sec. 3, we provide a detailed description of the baryonic physics retuning and the large-box simulations, including the agreement with galaxy observations. We present our results for the observable-mass scaling relations in Sec. 4. Finally, we provide a summary of this paper in Sec. 5.

BACKGROUND
In Sec. 2.1, we will introduce the ( ) gravity model studied in this work. Then, in Sec. 2.2, we will describe the effects of the ( ) fifth force on the properties of galaxy clusters and outline our general framework. Throughout this section, we will adopt the unit convention = 1 for the speed of light and Greek indices can take values 0, 1, 2 and 3. Unless stated otherwise, overbars (e.g.,¯) will be used to denote the background value of a quantity, while a subscript 0 will denote the present-day value of a quantity. 1 In this work, we define the halo mass Δ as the total mass within a sphere of radius Δ which is centred on the potential minimum of the halo and encloses an average density of Δ times the critical density at the halo redshift for a flat universe.

Theory
The ( ) gravity model is constructed by adding a nonlinear function, ( ), of the Ricci scalar curvature, , to the integrand of the Einstein-Hilbert action of GR: where is the determinant of the metric tensor , is Newton's constant and L M is the Lagrangian density for matter fields (we will focus on late-time cosmology and therefore on non-relativistic matter). By setting the variation of the action to zero, the modified Einstein field equations can be derived: where is the Einstein tensor, is the stress-energy tensor and is a new tensor which is given by: where is the Riemann tensor, ≡ ∇ ∇ is the d'Alembert operator and ∇ is the covariant derivative compatible with the metric . The quantity ≡ d /d represents the extra scalar degree of freedom which we refer to as the 'scalar field'. This mediates a fifth force which can act on scales smaller than the Compton wavelength of the scalar field: where is the cosmological scale factor. The chameleon screening mechanism (e.g., Khoury & Weltman 2004a,b;Mota & Shaw 2007) is featured by the model in order to suppress the fifth force in highdensity regions, ensuring consistency with Solar System tests of gravity (e.g., . This is brought about by an effective mass of the scalar field, eff = −1 C , which becomes very large in dense regions, significantly reducing the range of the fifth force. The fifth force will therefore only act in regions with shallow gravitational potential, which can include cosmic voids, low-mass haloes and the outer regions of galaxy clusters. In these regions, the total strength of gravity is enhanced by up to a factor of 4/3. In this work, we will focus on the popular Hu-Sawicki (HS) model of ( ) gravity (Hu & Sawicki 2007), which uses the following prescription for the function ( ): where 2 ≡ 8¯M ,0 /3 = 2 0 Ω M , with¯M ,0 the present-day background matter density and 0 the Hubble constant. The model has three free parameters: , 1 and 2 . However, it can be simplified by assuming −¯ 2 for the background curvature, in which case the background scalar field is given by: Assuming that the background expansion history is practicably indistinguishable from ΛCDM, the background curvature is given by: where Ω Λ = 1 − Ω M . Therefore, the background scalar field at a given redshift can be re-expressed as: where 0 is the present-day value of the background scalar field (we will omit the overbar from this quantity throughout this work). We note that the inequality −¯ 2 holds for a realistic choice of cosmological parameter values. Therefore, to a reasonable accuracy we are able to work with just two free parameters: and 0 . We will choose = 1 throughout this work, which is also a common choice in literature, and we will be working with a value | 0 | = 10 −5 (the 'F5' model). The amplitude of the background scalar field is greater at later times, which means that a halo with a given mass is more likely to be unscreened (i.e., it can feel the fifth force) at late times.

Galaxy clusters in ( ) gravity
In this subsection, we describe the main components of our framework to test gravity using galaxy clusters, as illustrated in Fig. 1, including our previous work on cluster scaling relations in ( ) gravity.

Dynamical mass enhancement
In this work, we will refer to two definitions of the cluster mass (or the mass of any massive body whose gravity is of interest): the 'true' mass is the intrinsic mass, and can be inferred using weak lensing; the 'dynamical' mass is the mass that is felt by a nearby massive test particle, and can be measured using properties related to the total gravitational potential of the halo, including the velocity dispersion and thermal properties including the Xray temperature. In GR, the two masses are expected to be equal: GR true = GR dyn (we will therefore neglect the subscript from the GR mass, GR , in what follows); while in ( ) gravity the fifth force will enhance the dynamical mass relative to the true mass: Mitchell et al. (2018), we used dark-matter-only (DMO) simulations to calibrate a general model for the ratio of the dynamical mass to the true mass in HS ( ) gravity. This is accurately described by a tanh fitting formula, with the dynamical mass of low-mass (unscreened) haloes enhanced by a factor of 4/3 and with no enhancement for high-mass haloes which are efficiently screened: We found that the parameter 1 is approximately constant, with a best-fit value of 1 = 2.21±0.01. For the parameter 2 , we obtained the following best-fit relation: 2 = (1.503 ± 0.006) log 10 |¯( )| 1 + + (21.64 ± 0.03).
This parameter represents the logarithmic halo mass above (below) which haloes are expected to be mainly screened (unscreened). Our model, which is an important component of the framework (cf. the red dotted box in Fig. 1), attains a very high accuracy for a wide range of halo masses (10 11 ℎ −1 500 10 15.5 ℎ −1 ) and present-day strengths of the scalar field (10 −6.5 ≤ | 0 | ≤ 10 −4 ). MG Mitchell et al. (2018), we used dark-matter-only simulations to calibrate models for the ( ) enhancements of the halo concentration (blue dotted box) and the dynamical mass (red dotted box). Our model for the concentration is required for conversions between halo mass definitions. In Mitchell et al. (2021d), we showed that our model for the dynamical mass enhancement can be used to convert observable-mass scaling relations from GR to ( ) gravity (green dotted box). These can be used to link the observed cluster abundance to the theoretical halo mass function. In Mitchell et al. (2021a), we used mock cluster catalogues to validate our MCMC pipeline for constraining the amplitude of the present-day background scalar field (brown dotted box). This pipeline will be employed in future works to test gravity.

Halo concentration
In Mitchell et al. (2019), we used an extended suite of DMO simulations to calibrate a general model for the enhancement of the halo concentration in ( ) gravity (blue dotted box in Fig. 1). The concentration is an important parameter of the Navarro-Frenk-White density profile of dark matter haloes (Navarro et al. 1997). Our model can therefore be used to model the effect of the fifth force on the halo density profile, allowing for conversions between halo masses defined with respect to different spherical overdensities. We will not provide the model here, since it is not required for the present study.

Observable-mass scaling relations
It is often difficult and (observationally) expensive to directly measure the dynamical mass of clusters. This can require long exposure times and high-quality spectra and X-ray data. Instead, the cluster mass is often inferred using its one-to-one relationship with the thermal properties of the intra-cluster gas. During cluster formation, the initial gravitational potential energy of the gas is converted into thermal energy through shock heating as it is accreted by the dark matter halo. In GR, this leads to an approximate power-law mapping between the cluster mass and various thermal properties ('mass proxies') such as the gas temperature gas , the Compton -parameter of the SZ effect SZ , the X-ray analogue of the -parameter X and the X-ray luminosity X . These observable-mass scaling relations have been widely studied in the literature using both hydrodynamical simulations (e.g., Le Brun et al. 2017;Truong et al. 2018) and observations (e.g., Ade et al. 2014). In ( ) gravity, the gravitational potential of a halo can be enhanced by up to 1/3 by the fifth force, and consequently the gas temperature may be enhanced by a similar factor, leading to departures from a power-law (e.g., Hammami & Mota 2017) because the departure will depend on the mass of the halo (stronger for low-mass haloes).
In Mitchell et al. (2021d), we tested two methods for mapping between the observable-mass scaling relations in GR and ( ) gravity using the full-physics simulations (the green dotted box in Fig. 1). One of these methods (the 'effective density' approach) had already been proposed and studied using non-radiative simulations by . We also proposed and tested an alternative set of mappings (the 'true density' approach). We found that both methods work well for galaxy groups and low-mass clusters for the -parameters and the gas temperature, even in the presence of the extra baryonic processes found in the full-physics simulations, including star formation, cooling and feedback. However, the simulations only contain haloes with mass 500 10 14.5 , including only a few cluster-sized objects ( 500 10 14 ). In this work, we will be using larger simulations to verify the previous results with a much larger halo sample and an extended mass range 10 13 ≤ 500 10 15 . We will be focusing on the 'true density' mappings, which are summarised below.
Consider a halo in GR which has mass GR and gas temperature GR gas , and a halo in ( ) gravity which has true mass ( ) true , dynamical mass ( ) dyn and gas temperature ( ) gas . If the true masses of these haloes are equal (i.e., if GR = ( ) true ), then the following relation between their gas temperatures is expected: This relation can be understood as follows: as discussed above, the gas temperature is closely related to the total gravitational potential, which is approximately dyn / 500 . Because the dynamical mass in ( ) gravity can be enhanced by up to 1/3 by the fifth force, the gas temperature of the ( ) halo is expected to be enhanced relative to the temperature of the GR halo by the same factor 2 , giving rise to the ( ) dyn / ( ) true factor in Eq. (11). This factor also arises in the relations between the ( ) and GR -parameters, since these are linearly related to the gas temperature: On the other hand, the X-ray luminosity varies as 1/2 gas , therefore the ( ) and GR values are related by a factor of ( ) The -parameters and the X-ray luminosity also depend on the gas density. During the formation of a cluster, matter is drawn from an initially large region, and so the ratio between the gas mass and total mass is expected to be equal to the cosmic baryonic fraction Ω b /Ω M to very good approximation (e.g., White et al. 1993). Therefore, two haloes with the same true mass are expected to have a similar gas content regardless of whether a fifth force is acting or not, and so the gas density dependence of the above observables does not contribute substantial additional factors in Eqs. (12)-(14). In Mitchell et al. (2021d), we validated this assumption for group-sized haloes using the simulations (see Fig. 2 in that work), and showed that the mappings given by Eqs. (11)-(14) can be computed analytically using Eq. (9) for the mass ratio.

Further works
In a recent work (Mitchell et al. 2021a), we tested our general framework by using Markov chain Monte Carlo (MCMC) sampling to constrain the present-day amplitude of the background scalar field, | 0 |, using mock cluster data (brown dotted box in Fig. 1). In future works, we plan to use this sampling pipeline to constrain | 0 | using real cluster data from ongoing and upcoming surveys. Our pipeline is also designed to be easily extendable to other MG models and cluster observables. For example, we recently (Mitchell et al. 2021b) carried out a similar modelling of the cluster properties in the normal-branch Dvali-Gabadadze-Porrati (nDGP) model of gravity (Dvali et al. 2000), and we have also demonstrated the potential in using the thermal and kinetic SZ angular power spectra to probe ( ) gravity and nDGP (Mitchell et al. 2021c). Both of these works were carried out using the L62 simulations, therefore it will be important to validate these results using simulations with much larger box sizes before we use our pipeline to test gravity using real data. This will be another use for the new baryonic model 2 Note that for haloes in GR and ( ) gravity of equal true mass true ≡ 500 , the mass definition guarantees that their radius 500 is the same too.
(see Sec. 3.2), which can be used to run large-box simulations for a range of MG scenarios (in addition to GR).

SIMULATIONS AND METHODS
In Secs. 3.1 and 3.2, we will provide a brief summary of the Illus-trisTNG model and discuss our retuning of the model parameters for lower-resolution simulations. Our large-box simulations, which are used for the main results of this paper, are presented in Sec. 3.3.

The IllustrisTNG model
In this section, we will briefly summarise the main features of the IllustrisTNG subgrid model (Weinberger et al. 2017;Pillepich et al. 2018a) and the -body and hydrodynamical simulation code (Springel 2010), where this model is implemented. The description will be kept concise, with further relevant details given in the next subsection.
In , dark matter and gas are respectively sampled as particles and cells. The gas cells in make up an unstructured, moving Voronoi mesh. The cells are adaptive in the way that they refine (split) and derefine (merge), such that the mass of any cell does not differ by more than a factor of two from the mean. The code uses a tree-particle-mesh algorithm to solve the Poisson equation and a second order finite-volume Godunov scheme on the Voronoi mesh to solve the ideal magneto-hydrodynamics equations, with Powell cleaning used to maintain the divergence constraint of the magnetic field (see Pakmor et al. 2011;Pakmor & Springel 2013). The magnetic field, which dynamically couples to gas through magnetic pressure, is initially seeded at = 127 with a uniform strength of 1.6 × 10 −10 Gauss.
The TNG model employs a subgrid scheme for star formation which is based on the Springel & Hernquist (2003) model: at each simulation timestep, for gas cells which exceed a particular threshold density, a fraction of the gas mass is converted to mass in star particles according to the Kennicutt-Schmidt law (Pillepich et al. 2018a). A star particle represents a population of stars with an initial mass function given by Chabrier (2003). The evolution of these stars and the subsequent chemical enrichment of the surrounding gas is tracked. A portion of the gas mass in star-forming gas cells is also converted into wind particles which are launched in random directions (Pillepich et al. 2018a); these represent galactic winds driven by supernovae. These wind particles will eventually couple to gas cells outside their local dense interstellar medium, resulting in the heating and metal enrichment of the gas. Gas cells also undergo radiative cooling which is modulated by a time-dependent ultraviolet background radiation.
Supermassive black holes are seeded at the centre of friendsof-friends (FOF) groups which exceed a particular mass threshold. These can then grow through a combination of Eddington-limited Bondi gas accretion and black hole mergers. The TNG model employs two types of black hole feedback, depending on the accretion state of the black hole (Weinberger et al. 2017;Vogelsberger et al. 2013): in the low accretion state, a kinetic feedback model is employed which produces black hole-driven winds; while in the high accretion state, a thermal feedback model is employed which heats up the surrounding gas.
It has been shown that a sufficiently high mass resolution is required in order to use the IllustrisTNG model to accurately reproduce the observed properties of galaxy populations (e.g., Pillepich et al. 2018a). For example, a lowered mass resolution means that the gas cells will have larger volumes, resulting in a smoothed out density field which can miss out the high-density peaks where star formation would be highest. The L62 simulations, with 512 3 gas cells, have ∼15 times lower mass resolution than that used to calibrate the TNG model (25ℎ −1 Mpc box with the same number of gas cells). The gas and stellar properties of haloes from the L62 simulations still give a reasonable agreement with observational data, however the lowered resolution means that there is less star formation, resulting in the amplitudes of the stellar mass fraction, the stellar mass function and the star formation rate density (SFRD) being reduced compared to the fiducial TNG results (for example, see Fig. 4 in Arnold et al. 2019).

Baryonic physics fine-tuning
For the present work, running ( ) gravity simulations with a substantial number of galaxy clusters (with masses 10 14 500 10 15.5 ) requires a box size of at least ∼ 300ℎ −1 Mpc, which necessitates going to even lower resolutions than the L62 simulations to remain computationally feasible. In order to make this possible without losing the good agreement with observational data, it is necessary to retune the parameters of the IllustrisTNG model, including parameters which control the density threshold for star formation and the energy released by the stellar and black hole feedback mechanisms.
To recalibrate the baryonic physics at our desired mass resolution, we have run a large number of realisations with a small box size of 68ℎ −1 Mpc ('L68-N256'). These runs have 256 3 dark matter particles with mass 1.35×10 9 ℎ −1 and, initially, the same number of gas cells with mass ∼ 2.6 × 10 8 ℎ −1 on average. The calibration was carried out using GR (although, as we will show, the retuned model works equally well for F5) with the same cosmological parameter values as the simulations: (ℎ, Ω M , Ω b , s , 8 ) = (0.6774, 0.3089, 0.0486, 0.9667, 0.8159), where ℎ = 0 /(100kms −1 Mpc −1 ) and s is the power-law index of the primordial power spectrum. The runs were all started from the same set of initial conditions at redshift = 127. These have been generated using the N-G IC code (e.g., , which uses the Zel'dovich approximation to displace an initially homogeneous particle distribution and obtain an initial density field with a prescribed linear power spectrum. Each of the input particles is then split into a dark matter particle and a gas cell, with the ratio of masses set by the values of the cosmic density parameters Ω M and Ω b . Halo catalogues are constructed using the code (Springel et al. 2001) which is implemented in . This uses the FOF algorithm to identify FOF groups (haloes) and a gravitational un-binding method to locate the bound substructures (subhaloes) of each group. By adjusting the baryonic physics parameters of our calibration runs, we have aimed for reasonable agreement with observational data and empirical constraints for the six galaxy properties shown in Fig. 2, which were also used to calibrate the IllustrisTNG model (Pillepich et al. 2018a). These are: the stellar mass fraction (within halo radius 200c ), with empirical constraints from Behroozi et al. (2013) and Kravtsov et al. (2018); the stellar mass function (subhaloes), with observations from D' Souza et al. (2015), Bernardi et al. (2013), Baldry et al. (2012) and Li & White (2009); the SFRD as a function of redshift, with observations from Behroozi et al. (2013); the gas mass fraction (within halo radius 500c ), with observations from Lovisari et al. (2015), Gonzalez et al. (2013), Pratt et al. (2010) and Sun et al. (2009); the black hole mass versus the stellar mass (subhaloes), with the compilation of observations from McConnell & Ma (2013); and the galaxy size versus the stellar mass (subhaloes), with observational data from Baldry et al. (2012). The results for a selection of our calibration runs are represented by the coloured solid lines. Apart from the SFRD, which is a direct output of the simulations, these lines are generated using mass-binning of either haloes or subhaloes (see the parentheses above). The black lines show predictions from the TNG simulations (e.g., Nelson et al. 2018;Springel et al. 2018;Marinacci et al. 2018;Pillepich et al. 2018b;Naiman et al. 2018) as well as the BAHAMAS and cosmo-OWLS simulations Brun et al. 2014).
The dark blue line in Fig. 2 shows the predictions using the fiducial TNG model at our lowered resolution. Star formation is significantly reduced at this resolution compared to the fiducial TNG resolution, which is used by the 'TNG L25-N512' simulation ('TNG100' has a similar resolution, while 'TNG300' has ∼ 5 times lower resolution). Consequently, the stellar mass fraction, the stellar mass function and the SFRD are significantly lower. The primary objective of our retuning is therefore to achieve a greater amount of star formation in order to obtain a closer match with the observational data. Our changes are described in the sections below, and the effects of these changes are shown in Fig. 2. We note that the calibration runs discussed in this section are only a very small subset of the ∼200 simulations which were run for this calibration study: we provide further details of these simulations and the calibration procedure in Appendix A.

Gravitational softening
In low-resolution simulations, where the gas cells have higher masses, there is a higher risk of two-body heating: this occurs when two particles come close together and incur a significant gravitational boost to their velocities, which, if happening frequently, can raise the internal energy and subsequently the temperature of the gas. We have therefore increased the gravitational softening length to 1/20 times the mean inter-particle separation, which is about twice the length used for the simulations. The gravitational force is dampened when gas cells come within this distance, preventing extreme interactions. This change alone causes an overall reduction of the gas temperature in our simulations, which results in more cool gas that is capable of forming stars: see the orange lines in Fig. 2, which have a greater amplitude than the dark blue lines for the stellar mass fraction and stellar mass function.
We also considered fractions of 1/30 and 1/10 for the softening length. For higher-mass haloes ( 200 10 12 ), we observed that using a larger softening length results in greater star formation (for the reasons discussed above). However, we were unable to significantly boost star formation at lower masses; in fact, we observed that a large softening length (for example, a fraction 1/10 of the mean inter-particle separation) can even lead to less star formation for low-mass haloes. A potential effect of using a larger softening is that the gravitational potential well of haloes effectively becomes shallower. For low-mass haloes, where the gravitational potential well is already shallower than for high-mass haloes, this could potentially lead to a lower density of cold gas (e.g., the gas is now less gravitationally bound) which in turn could reduce star formation. This is a motivation for using the fraction 1/20, for which we never observed the above effect, rather than using larger fractions.
It is evident from Fig. 2 that, while it can increase star formation, changing the gravitational softening length alone is not enough to produce stellar contents that match observational data.

Stellar feedback
For a star-forming gas cell with metallicity , the available wind energy is (Pillepich et al. 2018a): where¯w is a dimensionless free parameter, SNII,51 is the available energy from core-collapse supernovae in units of 10 51 erg, SNII is the number of supernovae per stellar mass that is formed, and w, , w,ref and w, are additional parameters of the model. A wind particle will eventually donate its thermal energy (along with its mass, momentum, and metal content) to a gas cell that is outside its local dense inter-stellar medium. This heats the gas and subsequently reduces the efficiency of star formation (gas must be sufficiently cool in order to form stars).
Star formation efficiency is reduced by our lowered gas cell resolution, therefore reducing the thermal heating of the gas by wind feedback can help to rectify this. We have achieved this in our retuning of the model by reducing the value of¯w from the fiducial TNG value 3.6 to 0.5, which lowers the energy of the winds. As can be seen from the green lines in Fig. 2, this change significantly boosts star formation over a wide range of halo masses. The stellar mass fraction now has a reasonable amplitude for 200 10 12 , while the amplitudes of the SFRD and stellar mass function are much closer to the observational data.
We have also tried varying the wind speed w , which in the IllustrisTNG model is given by (Pillepich et al. 2018a): where w is a dimensionless factor, DM is the local onedimensional velocity dispersion of the dark matter particles and w,min is the minimum wind velocity allowed in the model. For our calibration runs, we tried reducing the w and w,min parameters. This reduces the speed of the wind particles, which now take longer to transfer the thermal energy to the surrounding gas. Gas is therefore heated up at a slower rate, resulting in an increased amount of star formation. We found that reducing these parameters has a similar effect to reducing the¯w parameter, with star formation boosted over a wide halo mass range.
However, we could find no clear advantage in varying the wind speed parameters instead of¯w, or in varying all three parameters in combination. Moreover, it has been impossible to boost star formation sufficiently to get the stellar mass function and fraction matching observational data by varying these parameters. For simplicity, we therefore decided to adjust the stellar feedback using only the¯w parameter.

Star formation model
As discussed in Sec. 3.1, the star formation rate in IllustrisTNG is computed for gas cells using the Springel & Hernquist (2003) model. Stars can only be formed by gas cells which exceed a particular density threshold, which is approximately H ≈ 0.1cm −3 . We will refer to the value of the threshold gas density as ★ in this work. At our reduced resolution, gas cells have a larger volume and therefore a smoothed density which can miss out high-density peaks of cold gas in galaxies. In order to account for this, we have reduced ★ from ≈ 0.1 to a fixed value of 0.08, allowing gas cells with lower density to form stars.
The effect of making this change, in addition to the changes listed above, is shown by the magenta lines in Fig. 2. This further boosts the stellar mass fraction and SFRD, which are now both in good agreement with the TNG100 results for 200 10 12 and the TNG L25-N512 results for 5, respectively, and there is also now a good agreement with the D'Souza et al. (2015), Bernardi et al. (2013) and Baldry et al. (2012) observations of the stellar mass function for ★ 10 11 .

Black hole feedback
In IllustrisTNG, the rate of gas accretion, , by the central supermassive black holes is set by the Eddington-limited Bondi accretion rate (Weinberger et al. 2017): where BH is the black hole mass, represents the ambient density around the black hole, s is the ambient sound speed and r is the black hole radiative efficiency. The mode of feedback depends on whether or not the ratio / Edd exceeds the following threshold: where 0 and are parameters. If / Edd > , the resulting thermal feedback will inject thermal energy into the surrounding gas at a rate therm = f,high r 2 , where f,high is another parameter; and if / Edd < , the resulting kinetic feedback will inject energy into the surroundings at a rate kin = f,kin 2 , where the factor f,kin depends on the ambient density . Both of these feedback modes will reduce the efficiency of star formation in the surrounding gas, either by blowing gas out, so that less gas will exceed the density threshold for star formation, or by heating up gas which, as for stellar feedback, reduces the amount of cool gas capable of forming stars. As discussed above, the star formation efficiency is already reduced by our lowered gas resolution; reducing the overall effect of black hole feedback on star formation therefore provides another means of rectifying this.
For our retuning of the black hole feedback, we have increased r from the fiducial TNG value 0.2 to 0.22. The effect of this change on the overall energy release is quite complex: the energy injected by thermal feedback will be boosted, unless = Edd (i.e., Bondi > Edd ) in which case the r factors will cancel and there will be no effect; on the other hand, from Eq. (17) we see that Edd is lowered if r is increased, and subsequently the ratio / Edd will be greater and there will then be less kinetic feedback. From this discussion, increasing r is therefore expected to increase the heating of the gas by thermal feedback and reduce the blowing out of gas by kinetic feedback: two effects which would have competing impacts on the star formation efficiency. For our calibration runs, we have observed that increasing r from 0.18 to 0.22 slightly boosts the amount of star formation. Therefore, it seems that the reduced blowing out of gas by kinetic feedback has the dominant effect here.
The result of making this final adjustment to the baryonic physics model is shown by the cyan lines in Fig. 2. The stellar mass fraction and stellar mass function are both slightly boosted for highmass haloes. From the upper-right panel of Fig. 2, our model now appears to slightly overshoot the observed stellar mass function at higher masses; this is actually a consequence of sample variance which results from using a small box-size. As we will show in Sec. 3.3, the agreement is very good for the larger 301.75ℎ −1 Mpc box size. The change to r also brings the galaxy size relation into closer agreement with the TNG L25-N512 runs, while the good agreement with observations for the black hole mass relation, the gas mass fraction and the SFRD is unaffected. Overall, the change of r from 0.2 to 0.22 leads to very minor or negligible changes in all the 6 observables plotted in Fig. 2, especially considering the large uncertainties in the calibration data sets, and thus in principle one could do without this change.
We also considered the minimum halo mass for black hole seeding. Increasing this threshold means that, at a given time, black holes will have been growing for a shorter period of time and will consequently have a lower mass. This results in lower accretion and therefore reduces the energy released through feedback. We found that this can significantly boost star formation in higher-mass haloes (which contain larger black holes and are therefore more susceptible to black hole feedback) but has very little effect on the stellar content of low-mass haloes. We found no clear advantage to vary this in addition to the other parameters varied in this work, therefore we adjusted the black hole feedback using only the r parameter.

Summary and further comments
In summary, our retuned baryonic model uses updated parameter values ★ = 0.08,¯w = 0.5 and r = 0.22, in addition to a larger gravitational softening length, to get sufficient star formation.
While this retuning of the baryonic physics parameters has significantly boosted star formation for haloes with total mass 200 10 12 and galaxies with stellar mass ★ 5 × 10 10 , as shown in Fig. 2, it is still unable to give sufficient star formation at lower masses compared to observational data. Therefore, the stellar mass fraction and stellar mass function are both underestimated at the low-mass end, and the SFRD is underestimated at redshifts 5 (when there are only low-mass haloes). We attempted to rectify this by using even lower values of ★ and reduced stellar and black hole feedback, but found that this offered little improvement overall. We even tried switching off feedback entirely, by setting the stellar wind energy to zero (¯w = 0) and by preventing the seeding of black holes: while this resulted in a huge amount of star formation at masses 200 10 12 , there was still insufficient star formation at masses 200 10 11.5 to match observations. Therefore, the only way to have sufficient star formation across the full mass range appears to be by increasing the mass resolution. Interestingly, the BAHAMAS simulations ) are able to achieve sufficient star formation for the full mass range (see the dotted lines in the top panels of Fig. 2) despite having ∼ 3× lower mass resolution than our simulations. The BAHAMAS simulations were run using the -3 code , which uses smoothedparticle hydrodynamics rather than the Voronoi mesh. Perhaps the contrasting treatments of the gas by the two codes could explain the different levels of star formation at these lowered resolutions. One possible way to further boost star formation in low-mass haloes is by having a halo mass dependency for some of the baryonic parameters, but this approach is beyond the scope of this work. On the other hand, we note that our low-resolution simulations are designed primarily for studying galaxy groups and clusters ( 500 10 13 ), for which the predictions of our model appear to be very reasonable.
While retuning these parameters, we came across a number of degeneracies. For example, as discussed above in Sec. 3.2.2, we found that the stellar-induced wind feedback can be lowered by reducing the speed of the winds rather than the wind energy. And, in our final model, we could have instead used a slightly reduced ★ (e.g., ★ = 0.07) and reduced r (e.g., r = 0.18) to achieve similar results. Therefore, we note that different combinations of parameter values could have been used to achieve a similar level of agreement with the observational data.

Large-box simulation
Our full simulation ('L302-N1136'), which has been run using the retuned baryonic model at the same mass resolution as the L68-N256 calibration runs, has a box size of 301.75ℎ −1 Mpc and contains 1136 3 dark matter particles and (initially) the same number of gas cells. The simulation has been run for both GR and F5, the latter using an MG solver which has been implemented in . This computes the highly nonlinear scalar field of ( ) gravity using an adaptively refining mesh, ensuring accurate calculations of the fifth force in high-density regions (for further details, see Arnold et al. 2019).
The red lines in Fig. 3 show the GR predictions of the same six observables as used to calibrate the baryonic model. The results are slightly different compared to the cyan lines in Fig. 2, which use the same baryonic model: the predicted amplitudes of the stellar mass fraction and stellar mass function are slightly lower, which actually improves the high-mass agreement with observations of the latter; and the amplitude of the galaxy size relation is greater for 10 11 ★ 3 × 10 11 , leading to slightly worse agreement with the TNG L25-N512 predictions at these masses. These effects are likely to be a consequence of using a much larger box size, which is less susceptible to sample variance. The L302-N1136 simulation also extends to higher masses ( 500 ∼ 10 15 and ★ ∼ 10 12 ) than the L68-N256 runs. At these masses, the agreement with the observational data in Fig. 3 looks excellent.
The predictions for the F5 model, shown by the green lines in Fig. 3, agree with the GR predictions for the galaxy size and black hole mass relations; however, the amplitudes of the other four observables are slightly boosted in F5 compared to GR. The SFRD is boosted for redshifts 0.5 3; this is consistent with the results for the simulations . The stellar mass fraction and stellar mass function are boosted at 200 ∼ 10 12 and ★ ∼ 10 11 , respectively, and the gas mass fraction is slightly enhanced for masses 500 10 13 . There are a number of possible reasons for the enhanced gas fraction in F5: for example, the stronger gravitational force can lead to more gas being accumulated within 500 by the present-day and less gas being blown away by black hole feedback. These haloes can therefore accommodate a higher level of star formation, as observed in the other panels. The F5 predictions are still in excellent agreement with the observations, therefore it is not necessary to carry out a separate retuning of the baryonic physics for this model.

RESULTS
Using the L302-N1136 simulations (see Sec. 3.3), we have measured the observable-mass scaling relations in GR and F5 for haloes in the mass range 10 13 ≤ 500 10 15 . In Sec. 4.1, we will discuss the relations for the mass-weighted gas temperature¯g as , the SZ -parameter ( SZ ), the X-ray analogue of the -parameter ( X ) and the X-ray luminosity X . These have  Fig. 2. The red shaded regions in the subpanels for the stellar mass fraction, gas mass fraction, black-hole-mass-stellar-mass relation, and stellar-mass-galaxy-size relation indicate the 68% spread of the GR halo data. The red shaded regions for the stellar mass function and SFRD are errors from jackknife resampling, which are barely visible.
all been computed in the same way as in Mitchell et al. (2021d). The mass-weighted gas temperature is computed as follows: where gas, and are the mass and temperature of gas cell , and the summations include all gas cells found in the radial range 0.15 500 < < 500 . This excludes gas cells found in the core region, which we define as the radial range < 0.15 500 , where there can be a significant dispersion in the temperature profiles due to halo mergers, black hole feedback and cooling. The SZ and X parameters and the X-ray luminosity have been computing using the following: ∑︁ e, , X = gas ×¯g as , where T is the Thomson scattering cross section, e is the electron rest-mass, gas is the total gas mass within 500 , and e, and gas, are the number of electrons and the density in gas cell . The summations in Eq. (20) extend over the same radial range as for Eq. (19); in Mitchell et al. (2021d) we have checked the results of using different core region excisions, 0.1-0.2 500 , and found SZ and X to be insensitive to that.
We will test the 'true density' mappings, given by Eqs. (11)-(14), between the F5 and GR scaling relations for redshifts 0 ≤ ≤ 1. We will also discuss scaling relations which do not involve the cluster mass in Sec. 4.2; these can potentially be used to test gravity using galaxy groups and clusters with no requirement to measure or infer the mass.

Observable-mass scaling relations
The top rows of Figs. 4-7 show the F5 and GR scaling relations for redshifts 0, 0.5 and 1, with data points representing individual haloes. At = 0, there are ∼ 8000 GR haloes with 500 > 10 13 , including ∼ 500 cluster-sized haloes with 500 > 10 14 . This is a significant improvement on the L62 simulations, which only had ∼ 100 haloes with 500 > 10 13 at = 0. The curves in the top rows of the figures show the median observable as a function of the mean logarithmic mass computed within mass bins; the 'true density' rescalings of the F5 relations (see Eqs. (11)-(14), where ( ) dyn / ( ) true is given by the analytical tanh formula, Eq. (9)) are indicated by the dashed lines. We use eight mass bins with constant logarithmic width over the range 10 13 ≤ 500 ≤ 10 15.4 . All bins are shown regardless of the halo count. Although there may only be a few haloes in the highest-mass bins, we note that these correspond to high-mass clusters for which scatter in the scaling relations, especially in the model difference between F5 and GR, is expected to be very low. The relative difference between the F5 and GR binned data is shown in the lower panels of the figures.

Temperature scaling relation
The results for the¯g as ( ) scaling relation are shown in Fig. 4. The GR data appears to follow the well-known power-law behaviour for cluster-sized objects; however, the relation appears to curve at lower masses, where processes such as feedback can cause additional gas heating and break the power-law scaling. In F5, haloes are mostly screened from the fifth force for masses 500 10 14.5 at = 0, 500 10 14 at = 0.5 and 500 10 13.5 at = 1, and here the F5 temperature closely follows the GR temperature. At lower masses, the F5 temperature becomes significantly enhanced, as the total gravitational potential of the halo is deepened by the fifth force.
Our rescaling of the F5 data, which we recall involves dividing the temperature by the ratio of the dynamical mass to the true mass (Eq. (11)), can successfully account for this offset at lower masses, restoring a < 7% agreement with the GR relation. However, the rescaled F5 relation now slightly underestimates the GR relation on average. We note that at = 0, this offset appears to be roughly constant for cluster-sized masses; therefore, as long as the GR scaling relation parameters are allowed to vary in MCMC sampling (which can account for small differences in the amplitude), this rescaling is still expected to work well in our constraint pipeline presented in Mitchell et al. (2021a).

SZ and X scaling relations
The SZ ( ) and X ( ) relations are shown in Figs. 5 and 6, respectively. The GR relation appears to follow a weakly broken power-law, with a slightly steeper slope for group-sized haloes ( 500 10 14 ) than for cluster-sized haloes ( 500 10 14 ). Again the low-mass behaviour can be explained by feedback, which, in addition to heating up gas, also blows gas out from the inner regions which in turn can lower the values. For example, in Mitchell et al. (2021d), we observed that the -parameter was lower in the full-physics simulations than in the non-radiative simulations, which did not include feedback.
For lower (unscreened) masses, we observe an enhancement of the F5 relations by up to ∼ 50% compared to GR. This is mostly corrected by our rescaling, after which the agreement is within ∼ 12% for group-sized haloes and is within a few percent on average for cluster-sized objects. This is positive news for the constraint pipeline in Mitchell et al. (2021a), which used this rescaling to model the SZ ( ) relation for clusters in ( ) gravity in the redshift range 0 < < 0.5.

X-ray luminosity scaling relation
The X ( ) relation is shown in Fig. 7. Mitchell et al. (2021d) found that our rescaling based on the ratio between the dynamical and true halo masses was unable to accurately account for the difference between GR and ( ) gravity for this observable. That study was carried out primarily for group-sized haloes, and for these new results the rescaling is again unsuccessful for the mass range 10 13 500 10 14 . The X-ray luminosity varies as 1/2 gas 2 gas . For the 'true density' rescaling, which is applied here, it is assumed that the gas temperature is enhanced by the fifth force while the gas density is unchanged (see Sec. 2.2). This may not be a good approximation in full-physics simulations. For example, it is likely that there are different levels of feedback in F5 and GR. A greater amount of feedback in one model would result in the blowing out of gas and subsequent lowering of the gas density. This is expected to have a much greater effect on X , which varies as 2 gas , than on the other observables considered in this work. For the -parameters, which vary as gas gas , the effects of feedback on the gas density and the temperature can roughly balance out (e.g., Fabjan et al. 2011), allowing our rescaling to perform better for these observables as demonstrated in Figs. 5 and 6. 3) at redshifts 0, 0.5 and 1. The curves correspond to the median temperature and the mean logarithm of the halo mass 500 computed within mass bins. Data has been included for GR (red solid lines) and F5 (green dotted lines). A rescaling to the F5 temperature has been carried out as described in Sec. 4.1, as indicated by the green dashed lines. Data points are displayed, with each point corresponding to a GR halo (red points) or to a halo in F5 (green points), including the rescaling. Bottom row: the relative difference between the F5 and GR curves in the above plots.
While the above is particularly problematic for galaxy groups, which are more susceptible to feedback, our rescaling appears to work reasonably well for cluster-sized haloes in Fig. 7, where the rescaling brings the agreement to within 10% at = 0. However, the X ( ) relation is also highly scattered compared to the other relations considered in this work. For example, the agreement between F5 and GR has a large ∼ 20% fluctuation at = 1 for 500 > 10 14 , even though clusters are completely screened at this redshift.
Based on this discussion, the¯g as ( ), SZ ( ) and X ( ) relations are more suitable than the X ( ) relation for tests of gravity that involve the cluster mass.

X -temperature and X -temperature relations
In Figs. 8 and 9, we show the X (¯g as ) and X (¯g as ) relations, respectively, at redshifts 0, 0.5 and 1 (we have not shown the SZ (¯g as ) relation, since this is very similar to the X (¯g as ) relation). The curves show the median X -parameter and X-ray luminosity, respectively, and the mean logarithmic temperature computed within seven temperature bins, with logarithmic width 0.2, spanning the range 10 −0.4 keV ≤¯g as ≤ 10 1 keV. Haloes in F5 and GR with the same temperature are expected to have a similar dynamical mass; in this case, the F5 haloes would have a lower true mass than the GR haloes, and therefore a lower gas density (e.g., see the 'effective density' rescalings in Mitchell et al. 2021d). This explains why, for X (¯g as ), the amplitude of the F5 relation is suppressed by up to ∼ 40% compared to GR, while for X (¯g as ) the F5 relation is suppressed by up to ∼ 45% (the differences may also be partly due to differences in the levels of feedback in the two models). For both relations, the difference is greater for lower redshifts and lower temperatures, where more haloes are unscreened.
Neither of these relations involve the cluster mass. Therefore, these could potentially be used to test gravity using galaxy groups and clusters without the risk of bias from mass measurements. This demonstrates that, besides their abundances inferred from observables such as SZ and X , the combination of different internal observational properties for a population of galaxy clusters or groups can also offer useful, possibly complementary, constraints on the theory of gravity. We will further explore this direction in future projects.

SUMMARY, DISCUSSION AND CONCLUSIONS
Running large-box cosmological simulations which simultaneously incorporate screened modified gravity and full baryonic physics can be computationally expensive, necessitating the use of lower mass resolutions so that the calculations can involve fewer particles. However, this means that the gas density field is smoothed, resulting in high-density peaks being lost and consequently an overall reduction in star formation. This can result in poor agreement with observations of the stellar and gaseous properties of galaxies, galaxy groups or galaxy clusters.
In this work, we have retuned the IllustrisTNG baryonic model   so that it can be used to run full-physics simulations at a much lower resolution while still retaining a high level of agreement with galaxy observations. Calibrated using runs with a box size of 68ℎ −1 Mpc, 256 3 dark matter particles and, initially, 256 3 gas cells, our model uses updated values for the following TNG parameters (Sec. 3.2): the threshold gas density for star formation, ★ , is reduced from ≈ 0.1 to 0.08; the parameter¯w which controls the energy released by the stellar-driven wind feedback is reduced from 3.6 to 0.5; and the black hole radiative efficiency r is increased from 0.2 to 0.22. In addition to these changes, we have also increased the gravitational softening to a factor 1/20 of the mean interparticle separation. By reducing the heating and blowing out of gas by feedback and two-body interactions, and lowering the threshold density of star formation, these changes boost the amount of star formation at our lowered resolution, resulting in good agreement with observations of galaxy properties including the stellar mass fraction, the stellar mass function, the SFRD and the gas mass fraction (Fig. 2). Using our retuned model, we have run GR and F5 simulations with a box size 301.75ℎ −1 Mpc (Sec. 3.3). The predictions of stellar and gaseous properties in both gravity models show a very good match with galaxy observations, particularly for groupand cluster-sized masses (Fig. 3), which shows that for F5 it is not necessary to further retune the baryonic parameters. Using these simulations, we have studied, for redshifts 0 ≤ ≤ 1 and masses 10 13 ≤ 500 10 15 , the scaling relations between the halo mass and four observable mass proxies (Sec. 4.1): the SZ Compton -parameter SZ and its X-ray analogue X , the mass-weighted gas temperature¯g as , and the X-ray luminosity X .
For the SZ ( ) and X ( ) relations, our mapping between the F5 and GR relations, which involves dividing the F5 -parameter by the ratio of the dynamical mass to the true mass, is accurate to within ∼ 12% for galaxy groups and just a few percent for galaxy clusters. This validates our method for accounting for the effect of the fifth force on the SZ ( ) relation, which is currently used in our ( ) constraint pipeline (Mitchell et al. 2021a). For the¯g as ( ) relation, the same rescaling is again reasonable, with 7% accuracy for the full range of masses. Our rescaling does not work as well for the X ( ) relation, which is likely due to the greater susceptibility of the X-ray luminosity to feedback processes.
We have also shown (Sec. 4.2) that the X -temperature and X -temperature scaling relations can differ in F5 and GR by up to 45%. These relations could potentially be used for large-scale tests of gravity that do not involve measuring the cluster mass, and hence not only eliminating one potential source of uncertainty but also including additional information in the model constraints.
By running large-box full-physics simulations for a range of ( ) gravity field strengths, it will be possible to test our models for the enhancements of the dynamical mass (Eq. (9)) and the halo concentration (Mitchell et al. 2019) in the presence of full baryonic physics over a wide mass range. Our baryonic model can also potentially be used to run large full-physics simulations for other classes of modified gravity and dark energy models, e.g., the nDGP model (Hernández-Aguayo et al. 2021) using the MG solvers implemented in the code, since it is likely that a recalibration of the baryonic parameters will not be necessary unless the model studied is an extreme and differs strongly from the current best-fit ΛCDM (but in that case the model is likely to have already been ruled out by other observations). The application to the nDGP model, which is another popular class of MG models, will make it possible to validate our models for the nDGP enhancements of the halo concentration and the HMF, which were calibrated using DMO simulations, in addition to extending our results for the observable-mass scaling [Colour Online] X-ray analogue of the -parameter as a function of gas temperature for haloes from the full-physics L302 simulation (see Sec. 3) at redshifts 0, 0.5 and 1. The curves correspond to the median luminosity and the mean logarithm of the temperature computed within temperature bins. Data has been included for GR (red solid lines) and F5 (green dotted lines). Data points are displayed, with each point corresponding to a GR halo (red points) or to a halo in F5 (green points). Bottom row: the relative difference between the F5 and GR curves in the above plots. relations to higher masses (Mitchell et al. 2021b). Finally, in our study of the thermal and kinetic SZ angular power spectra in ( ) gravity and nDGP (Mitchell et al. 2021c), we were unable to study larger angular scales ( 500), again due to the relatively small box size of the simulations: this can potentially be rectified by using these larger simulations. These possibilities will be explored in future works.
The ability to run large realistic galaxy and cluster formation simulations for beyond-ΛCDM models will prove highly beneficial for research in this field: not only will this endow us with numerical tools to predict observables, such as cluster properties, that cannot be studied using DMO simulations, but the hydrodynamical simulations enabled by such a tool can be used to quantify the impacts of baryons on various other observables, such as weak lensing and galaxy clustering. The lack of such a quantitative assessment would either restrict the amount of data that can be reliably used in model tests, or lead to biased constraints on models and parameters.

APPENDIX A: BARYONIC PHYSICS CALIBRATION
In Sec. 3.2, we presented our new baryonic model for low-resolution, full-physics cosmological simulations. In particular, we focused on the changes that we made to the IllustrisTNG model and described only a small subset of the ∼200 calibration runs. In this appendix, we will provide a more detailed description of the calibration procedure, including an outline of our simulations and details of the parameter search. Table A1 shows the specifications of the simulations used to tune the baryonic model. The primary goal of the tuning was to find a model that can produce sufficient star formation in lowresolution simulations to match galaxy observations. We studied in detail simulations with three different mass resolutions before settling on the resolution of the L68-N256 simulations, which have already been mentioned in Sec. 3.2.

A1 L100-N256 simulations
To start with, we used simulations with a box size of 100ℎ −1 Mpc, containing 256 3 dark matter particles and (initially) the same number of gas cells (L100-N256). With an average gas cell mass of ∼ 8.3 × 10 8 ℎ −1 , these have 512 times lower mass resolution than the simulations used to calibrate the fiducial TNG model and the same resolution as the BAHAMAS simulations , which were run using -3 ) rather than . We ran ∼100 simulations at this resolution, varying the following baryonic parameters: the threshold gas density for star formation ★ (see Sec. 3.2.3) was varied in the range [0.00, 0.13] cm −3 ; the parameter¯w controlling the stellar wind energy (see Eq. (15)) was varied in the range [0.0, 3.6]; the parameters w and w,min controlling the stellar wind speed (see Eq. (16)) were varied over ranges [0.0, 29.6] and [0, 500] kms −1 , respectively; and the black hole radiative efficiency r (see Sec. 3.2.4) was varied in the range [0.02,0.20]. We also tested gravitational softening lengths in the range 1/40 to 1/10 times the mean inter-particle separation.
These runs provided a very useful insight into the effects of changing each parameter, however all of the tested parameter combinations at the L100-N256 resolution resulted in insufficient star formation within haloes of mass 200 10 13 , and at higher halo masses it was difficult to simultaneously match observations for different galaxy properties. For example, parameter combinations which yielded a sufficiently high stellar mass function typically resulted in the stellar mass fraction being overestimated, and in order to match the SFRD observations it was necessary to set either ★ or the stellar wind energy close to zero. We therefore decided to look at higher resolutions.

A2 L86-N256 simulations
Keeping the dark matter particle number (and initial gas cell number) unchanged, we initially reduced the box size to 86ℎ −1 Mpc (L86-N256), and executed ∼60 runs at this higher resolution. Our best model used a gravitational softening length of 1/20 times the mean inter-particle separation and the following parameter combination: ★ = 0.05cm −3 ,¯w = 0.5, w = 2, w,min = 200 kms −1 and r = 0.15. This gave a reasonable match with high-mass observations of the stellar mass fraction ( 200 2 × 10 12 ℎ −1 ) and stellar mass function ( ★ 10 11 ℎ −1 ), however the agreement was still poor at lower masses and the SFRD was significantly underestimated for redshifts 2. We made some efforts to rectify this. For example, we switched off feedback entirely by setting the wind energy to zero and preventing the formation of black hole particles, and we tried using much lower values of ★ . While these efforts resulted in more star formation at lower masses, it was still not enough to match observational data, and at higher masses and lower redshifts there was now far too much star formation.

A3 L68-N256 and L136-N512 simulations
We finally settled on the 68ℎ −1 Mpc (L68-N256) box size, where we ran a further ∼50 runs to calibrate the final model presented in Sec. 3.2. Our final model, with ★ = 0.08cm −3 ,¯w = 0.5, r = 0.22 and a softening length of 1/20 times the mean interparticle separation, is able to produce sufficient star formation at lower masses and higher redshifts than the above L86-N256 model, and only requires changes to three of the TNG model parameters (we take the TNG values for w and w,min ). The predictions from five of the L68-N256 runs are shown in Fig. 2 to illustrate the effects of each of the changes to the fiducial TNG parameter values.
In order to assess the effects of sample variance, we also ran three simulations with an increased box size of 136ℎ −1 Mpc (L136-N512) and the same mass resolution as the L68-N256 runs. The results from these simulations, which were run using our three most promising baryonic models, indicated that the stellar mass fraction and stellar mass function are slightly reduced in the larger box. This is why we selected the above model, even though it slightly overestimates the stellar mass function in Fig. 2. This paper has been typeset from a T E X/L A T E X file prepared by the author.