Concordance between observations and simulations in the evolution of the mass relation between supermassive black holes and their host galaxies

We carry out a comparative analysis of the relation between the mass of supermassive black holes (BHs) and the stellar mass of their host galaxies at $0.2<z<1.7$ using well-matched observations and multiple state-of-the-art simulations (e.g., Massive Black II, Horizon-AGN, Illustris, TNG and a semi-analytic model). The observed sample consists of 646 uniformly-selected SDSS quasars ($0.2<z<0.8$) and 32 broad-line active galactic nuclei (AGNs; $1.2<z<1.7$) with imaging from Hyper Suprime-Cam (HSC) for the former and Hubble Space Telescope (HST) for the latter. We first add realistic observational uncertainties to the simulation data and then construct a simulated sample in the same manner as the observations. Over the full redshift range, our analysis demonstrates that all simulations predict a level of intrinsic scatter of the scaling relations comparable to the observations which appear to agree with the dispersion of the local relation. Regarding the mean relation, Horizon-AGN and TNG are in closest agreement with the observations at low and high redshift ($z\sim$ 0.2 and 1.5, respectively) while the other simulations show subtle differences within the uncertainties. For insight into the physics involved, the scatter of the scaling relation, seen in the SAM, is reduced by a factor of two and closer to the observations after adopting a new feedback model that considers the geometry of the AGN outflow. The consistency in the dispersion with redshift in our analysis supports the importance of both quasar- and radio-mode feedback prescriptions in the simulations. Finally, we highlight the importance of increasing the sensitivity (e.g., using the James Webb Space Telescope), thereby pushing to lower masses and minimizing biases due to selection effects.


INTRODUCTION
The close correlations between the mass of supermassive black holes (BHs), M BH , and the properties of their host galaxies (e.g., stellar mass, M * ) indicate a physical coupling during their joint evolution (Magorrian et al. 1998;Ferrarese & Merritt 2000;Marconi & Hunt 2003;Häring & Rix 2004;Gültekin et al. 2009). To understand the nature of this connection, considerable efforts have been focused on measuring such correlations using broad-line active galactic nucleus (AGNs) over a range of redshifts, with the intention to determine how and when the correlation emerges and evolves over cosmic time. While an observed evolution has been found since redshift z ∼ 2 in galaxies with stellar mass M * 10 10 M (e.g., Treu et al. 2004;Peng et al. 2006;Treu et al. 2007;Woo et al. 2008;Bennert et al. 2011a;Park et al. 2015) in which the growth of BHs predates that of the host, other studies (e.g., Schramm & Silverman 2013;Sun et al. 2015;?) predict that BHs grow commensurately with galaxies. However, to understand the significance intrinsic evolution, it is necessary to take into account systematic uncertainties and the selection effects (Treu et al. 2007;Lauer et al. 2007;Schulze & Wisotzki 2014;Park et al. 2015;Jahnke et al. 2009;Ding et al. 2020b;Li et al. 2021a).
Various theoretical models have been proposed to explain the origin of the scaling relations. For example, AGN feedback is considered as one of the possible viable mechanisms. During this process, a fraction of the AGN energy is injected into its surrounding gas, which can then regulate the mass growth of the BH and its host. In this scenario, star formation is inhibited by the heating and unbinding of a significant amount of gas. Alternatively, the mass relations can be explained through an indirect connection in which AGN accretion and star formation are fed through a common gas supply (Cen 2015;Menci et al. 2016;Anglés-Alcázar et al. 2017). Actually, even without any physical mechanisms, statistical convergence from galaxy assembly alone (i.e., dry mergers) could instill the observed correlations (Peng 2007;Jahnke & Macciò 2011;Hirschmann et al. 2010). However, as expected from the central limit theorem, a higher dispersion would appear in the scaling relations at high-z compared to what is observed today (e.g., ??).
Numerical simulations provide an opportunity to further understand the connection between BHs and their host galaxies. For example, a comparison of scaling relations has been made using state-of-the-art cosmological hydrodynamical simulation of structure formation (MassiveBlackII) and observational measurements at 0.3 < z < 1 (e.g., DeGraf et al. 2015), which show a positive evolution (i.e., the mass growth of the BH predates that of its host). Further efforts are using large-volume simulations to investigate the scaling relations and find good agreement with the local relation with redshift evolution, including the Magneticum Pathfinder smoothparticle hydrodynamics (SPH) Simulations (Steinborn et al. 2015), the Evolution and Assembly of Galaxies and their Environments (EAGLE) suite of SPH simulations Crain et al. 2015;McAlpine et al. 2016), Illustris moving-mesh simulation Vogelsberger et al. 2014a;Sijacki et al. 2015;Nelson et al. 2015;Li et al. 2019), the Horizon-AGN simulation (Dubois et al. 2014bVolonteri et al. 2016), and the SIMBA simulation (Thomas et al. 2019;Davé et al. 2019). In particular, the M BH -M * relation using BH populations using six large-scale cosmological simulations (i.e, Illustris, TNG100, TNG300, Horizon-AGN, EAGLE, and SIMBA) has been compared with observations in the local universe in a recent study ). However, these comparison works are limited by the observation data in terms of the sample size (<100) and redshift range (i.e., limited to the local universe).
For such comparisons using simulations, it is crucial to consider the systematic uncertainties and selection biases. A direct means to account for these is to apply the same effects and selection to the simulation products and make a forward comparison in the observational plane. In Ding et al. (2020a), a direct comparison has been performed using 32 X-ray-selected AGN at 1.2 < z < 1.7 and a direct comparison with two stateof-the-art simulation efforts, including MassiveBlackII (MBII) and a Semi-analytic Model (SAM, Menci et al. 2014Menci et al. , 2016. The dispersion in the mass ratio between BH mass and stellar mass is significantly more consistent with the MBII prediction (∼ 0.3 dex) favoring the hypothesis of AGN feedback being responsible for a causal link between the BH and its host galaxy.
In this study, we extend our previous work by adding recent measurements of hundreds of Sloan Digital Sky Survey (SDSS) quasars at 0.2 < z < 0.8 based on wide and deep Hyper Suprime-Cam (HSC) imaging from the Strategic Subaru Program, and comparing the observational measurements with that from simulations. Furthermore, we extend the simulated quasar populations by including MBII, SAM, Illustris, TNG100, TNG300, and Horizon-AGN. This paper is structured as follows. In Sections 2 and 3, we describe our observed and simulated samples. A direct comparison is performed and the result is presented in Section 4. The concluding remarks are presented in Section 6.

SDSS/HSC sample
A sample of ∼5000 type-1 SDSS quasars from the DR14 catalog (Pâris et al. 2018) at 0.2 < z < 1 has been imaged by the high-resolution Subaru Strategic Program (SSP) wide area survey (Aihara et al. 2019) using Hyper Suprime-Cam (Miyazaki et al. 2018). With accurate PSF models in five optical bands grizy, twodimensional quasar-host decompositions have been performed (Li et al. 2021b) to obtain the flux and color of each quasar's host galaxy. The state-of-the-art image modeling software lenstronomy (Birrer et al. 2015;Birrer & Amara 2018;Birrer et al. 2021) is adopted to perform the modeling task. This approach is first developed by Ding et al. (2020b) and used to decompose the near-infrared emission of the HST sample (see next section). Having measured the host light in each band, the stellar mass of host galaxy is derived using spectral energy distribution (SED) fitting with CIGALE (Boquien et al. 2019). Simulation tests are also performed to verify the fidelity of the M * measurements. The statistical measurement error on M * is at the ∼0.2 dex level. The values of M BH are determined by Rakshit et al. (2020) which are estimated based on the Hβ-based measurements using the virial method (Peterson et al. 2004;Vestergaard & Peterson 2006). The typical error of M BH are estimated to be 0.4 dex. The mass ranges for the entire sample are log(M * ) ∈ [9.0, 11.5] M , and log(M BH ) ∈ [6.5, 10.0] M ). We refer the reader to Li et al. (2021b) in the Section 4.2 for more details.
To avoid any potential biases related to the selection of the quasars, Li et al. (2021a) isolated 877 sources which are uniformly selected based on their PSF-magnitudes, color cuts using single-epoch SDSS photometry and the value of the measured M * . Specifically, we use the ugri color-selected sample (228 sources) from SDSS I/II (Richards et al. 2002), and the CORE sample from SDSS BOSS (408 sources) and eBOSS (241 sources) surveys (Ross et al. 2013;Myers et al. 2015) (hereafter the uniform sample). These samples are initially selected based on PSF-magnitude cuts of 15 < i < 19.1 (for ugri), and i > 17.8 and g, r < 22.0 (for CORE). Furthermore, a limit on M * is set to assure the detection of the host, especially since the rate and accuracy of detection is higher when M * is increasing, resulting in a final sample of 646 quasars. These selections will be adopted in an equivalent manner to the simulated samples to mitigate selection effects thus allowing the fair comparison.

HST sample
A sample of 32 HST-observed AGN systems across the redshift range 1.2 < z < 1.7 are selected from three deep-survey fields (COSMOS, (E)-CDFS-S, and SXDS). The HST/WFC3 IR camera is used to obtain the high-resolution imaging data (HST program GO-15115, PI: John Silverman) with six position dither pattern and a total exposure time ∼2348 s. The filters F125W (1.2 < z < 1.44) and F140W (1.44 < z < 1.7) were employed, according to the redshift of each target to bracket the 4000Å break. The AGN images are analyzed and decomposed to infer the host galaxies fluxes using the approach developed by D20 based on lenstronomy. The HST ACS/F814W imaging data for 21/32 of the AGNs is also used to infer the host color. The results show that stellar templates of 1 and 0.625 Gyr can match the sample color at z < 1.44 and z > 1.44, respectively (see Figure 5 in D20). These best-fit models are used to estimate the stellar masses of the host galaxies. M BH is determined by Schulze et al. (2018) using near-infrared spectroscopic observations of the broad Hα emission line with the recipe provided by Vestergaard & Peterson (2006), in a consistent manner to that adopted for HSC sample. The mass ranges for the HST sample are log(M * ) ∈ [9.5, 11.0] M , and log(M BH ) ∈ [7.5, 9.0] M ). We refer the reader to D20 for a more detailed description of the analysis.
The measurements of the M BH -M * relations for both the HST and HSC samples are obtained with a consistent approach. Thus, we expect the measurement errors of these two samples to be at a comparable level (i.e., ∆M BH = 0.4 dex, ∆M * = 0.2 dex). Indeed, the two samples are consistent with a lack of evolution in the mass ratio (see Figure 6 of Li et al. 2021a), even though the sample selection is slightly different.

SIMULATIONS AND COMPARISON STRATEGY
We introduce the simulation samples that are adopted in this study. All are based on larger-scale cosmological simulations, except the Semianalytic Model (SAM; see Section 3.5). In Table 1, we summarized the key elements for each hydrodynamic simulation being considered. We note that each simulation adopts either a Chabrier (2003) or a Salpeter (1955) initial mass function (IMF). To make self-consistent comparisons, we ensure that the adopted IMFs for both the simulations and the observations are consistent. The assumed IMF is needed to obtain the mass-to-light ratio and convert the observed luminosity to stellar mass to perform the comparisons. For numerical simulation, the mass is the base material, which is IMF independent.

MassiveBlackII (MBII)
MBII is a high-resolution cosmological hydrodynamic simulation that has a box size of (142.7 cMpc) 3 and 2 × 1792 3 (i.e., dark matter + gas) particles. The simulation is based on smooth particle hydrodynamic (SPH) code P-GADGET, a hybrid version of the parallel code GADGET (Springel 2005). The base cosmology parameters are based on the WMAP7 results (Komatsu et al. 2011). For dark matter and gas, the mass resolutions are 1.57 × 10 7 M and 3.14 × 10 6 M , respectively. The simulation includes a full modeling of gravity plus gas hydrodynamics, with a wide range of subgrid recipes to model the star formation (Springel & Hernquist 2003), BH growth, and the feedback process (Di Matteo et al. 2005).
To model supermassive BH, the initial seed with mass 5 × 10 5 M /h is inserted into halos of mass 5 × 10 10 M /h. Once seeded, BH growth via gas accretion is assigned at a rate ofṀ BH = 4πG 2 M 2 BH ρ/(c 2 s + v 2 BH ) 3/2 where ρ and c s are the density and sound speed of the interstellar medium (ISM) gas at cold phase; v BH is the relative velocity between the BH and its surrounding gas. Note that unlike several previous works, the accretion rate in MBII adopt the prescription in Pelupessy et al. (2007) which does not include any artificial boost factor. The accreted gas is released as radiation at a radiative efficiency of 10%. A fraction of 5% of the radiated energy thermally couples to the surrounding gas as BH (or AGN) feedback (Di Matteo et al. 2005). A mildly super-Eddington (two times Eddington rate) is allowed. Due to resolution limitations, BH dynamics cannot be self-consistently modeled in the simulations. Two BHs are considered to be merged when their separation distance is below the simulation spatial resolution (i.e., the SPH smoothing length) and their relative speeds are lower than the local sound speed of the medium.
Halos are identified using a friends-of-friends (FOF) group finder (Davis et al. 1985). Galaxies are identified with the stellar matter components of subhalos; these subhalos are identified using SUBFIND within the halos (Springel 2005). As a common practice, the stellar mass is obtained by using a 3D spherical aperture of 30 kpc to represent the observed stellar mass. We adopt this definition of stellar masses for all larger-scale cosmological simulations described in the following sections, except for Horizon-AGN which use total mass (see Section 3.4) for details. Using this definition, Pillepich et al. (2018a) has shown that the corresponding stellar mass function is consistent with the observational measure. Even more, the stellar mass using this 3D aperture can achieve good agreement to those measured within the Petrosian radii in observational studies . For further details of MBII simulation, we refer the reader to Khandai et al. (2015).

Illustris
The Illustris Project is another large scale hydrodynamics simulation, introduced in Genel et al. (2014); Vogelsberger et al. (2014a,b); Sijacki et al. (2015); Nelson et al. (2015). The simulation consist of a volume of (106.5 cMpc) 3 (slightly smaller than MassiveBlack II), and was run with the moving Voronoi mesh code Arepo (Springel 2010) with a base cosmology adopted from WMAP9 results (Hinshaw et al. 2013). Besides gravity and gas hydrodynamics, the simulation calculates the astrophysical processes (Vogelsberger et al. 2013;Torrey et al. 2014) that includes gas cooling and star formation (with a density threshold of 0.13 cm −3 , Springel & Hernquist 2003), stellar evolution and chemical enrichment, kinetic stellar feedback by SNe activity, BH growth (accretion and merging), and AGN feedback.
BHs are seeded with an initial mass of 1 × 10 5 M /h when a halo exceeds a mass of 5×10 10 M /h. BHs then grow via accretion described by the Eddington limited Bondi-Hoyle-Lyttleton formalism (α4πG 2 M 2 BH ρ/c 3 s ), as well as mergers with other BHs. The boost factor α = 100 is introduced to account for the unresolved multiphase ISM Booth & Schaye 2009), which is otherwise expected to underestimate the density around the BHs. Lastly, accreting BHs radiate with a bolometric luminosity given by rṀBH c 2 , wherė M BH is the mass accretion rate and r = 0.2 is the radiative efficiency.
The AGN feedback consists of three components, namely quasar-mode, radio-mode and radiative feedback. In the quasar-mode which holds for BHs with Eddington ratio > 0.01, the AGNs deposit 5% (quasarmode feedback efficiency) of their released energy into the surrounding gas as thermal energy. For Eddington ratios < 0.01, the AGN feedback is in radio-mode where the thermal energy is released as hot bubbles with a radius of ∼ 100 kpc at the intervals between which the BH mass grows by a fixed fraction. The energy of the bubbles is given by m r δM BH c 2 where δM BH is the change in BH mass within the last time interval, and m = 0.35 is the radio-mode feedback efficiency. Lastly, the radiative feedback mode is implemented by modifying the heating and cooling rates of the gas in the presence of radiation from all surrounding AGN. Halos and galaxies are identified similar to those of MBII. As in MBII, a 3D 30 kpc spherical aperture is used to obtained the galaxy stellar mass.

IllustrisTNG
The Next Generation Illustris Simulations (Illus-trisTNG, hereafter, TNG) Pillepich et al. 2018a;Naiman et al. 2018;Marinacci et al. 2018;Nelson et al. 2018) are a suite of magnetohydrodynamical simulations of galaxy formation in large cosmological volumes. It builds upon the scientific achievements of the Illustris simulation with improvements upon Illustris by 1) extending the mass range of the simulated galaxies and haloes, 2) adopting an improved numerical and astrophysical modeling, and 3) addressing the identified shortcomings of the previousgeneration simulations. Note that TNG simulations employ a modified version of the Bondi formalism, with c s explicitly including a B term for these magneto-hydro simulations.
The TNG100 and TNG300 have a volume of (100 cMpc) 3 and (300 cMpc) 3 , respectively. The adopted cosmological parameters are updates by the Planck result (Planck Collaboration et al. 2016). The gas cooling and star formation prescriptions are broadly similar to the Illustris model. However signif-icant updates have been made to the stellar feedback model (more details in Pillepich et al. 2018b). BH seeds with initial mass of 8 × 10 5 M /h are placed in Dark matter halos with a mass exceeding 5 × 10 10 M /h. Notably, the seed mass is one order of magnitude higher than in the Illustris simulation. The BH accretion also follows the Bondi-Hoyle-Lyttleton formalism, but without any boost factor (unlike Illustris). Accreting BHs release energy with a radiative efficiency of 0.2 (same as Illustris). The inclusion of the magnetic fields can affect the relationship between the BHs and their host galaxies properties; the M BH -M * mean relation is higher with magnetic fields (Pillepich et al. 2018b).
The AGN feedback occurs in thermal, radio, and radiative modes. For high accretion rates, the feedback implementation is the same as in Illustris i.e., thermal energy is injected in the surroundings of the accreting BHs. However, at low accretion rates, the feedback implementation is substantially different from Illustris. Instead of releasing hot bubbles, this feedback mode in TNG is purely kinetic. In particular, there is a directional injection of momentum along a randomly chosen direction (Weinberger et al. 2017(Weinberger et al. , 2018a. The transition between the two feedback modes is also different from Illustris, and is set by the minimum value of 0.1 and 2 × 10 −3 × (M BH /10 8 M ). Additionally, the radiative feedback implemented in Illustris (summarized in the previous section) is also present in TNG. Lastly, halo and galaxy identification, as well as calculation of galaxy stellar mass, is done in a similar manner to that of Illustris and MBII.

Horizon-AGN
The simulation Horizon-AGN (Dubois et al. 2014b has a volume of 142 cMpc 3 and was generated using the adaptive mesh refinement code Ramses (Teyssier 2002) with a ΛCDM model based on WMAP7 (Komatsu et al. 2011) cosmological results. The dark matter particle mass is 8 × 10 7 M . The stellar particle mass is 2 × 10 6 M and the MBH seed mass is 10 5 M . Adaptive mesh refinement is permitted down to ∆x = 1 kpc, and, if the total mass in a cell becomes greater than 8 times the initial mass resolution, it is performed in a quasi-Lagrangian manner. Collisionless particles (dark matter and star particles) are evolved using a particlemesh solver with a cloud-in-cell interpolation. The simulation includes gas cooling down to 10 4 K (Sutherland & Dopita 1993), and stochastic star formation. Stellar feedback is modeled as mechanical energy injection from Type Ia SNe, Type II SNe and stellar winds, with the metal enrichment from these sources.
Differing from simulations presented above, Horizon-AGN does not use a fixed threshold in the dark matter halo mass to seed BHs. BHs are seeded with a mass of 10 5 M in cells, with gas density above n 0 and stellar velocity dispersion larger than 100 km s −1 . An exclusion radius is imposed so that no BH seed is formed at less than 50 ckpc from an existing BH. After z = 1.5, new BHs are prevented from forming. At these subsequent times, all the progenitors of the M * > 10 10 M galaxies at z = 0 should be formed and seeded with BHs (Volonteri et al. 2016). BH accretion is computed using the Bondi-Hoyle-Lyttleton formalism with a boost factor α = (ρ/ρ 0 ) 2 when the density ρ is higher than the resolution-dependent threshold ρ 0 . Otherwise, the boost factor is fixed as unity (Booth & Schaye 2009).
Horizon-AGN includes two modes of AGN feedback. In the quasar mode (f Edd > 0.01), thermal energy is isotropically released within a sphere of radius a few resolution elements. The energy deposition rate isĖ AGN = 0.015Ṁ BH c 2 . In the radio mode, energy is injected into a bipolar outflow with a velocity of 10 4 km s −1 , to mimic the formation of a jet. The energy rate in this mode iṡ E AGN = 0.1Ṁ BH c 2 . The technical details of BH formation, growth and AGN feedback modeling of Horizon-AGN can be found in Dubois et al. (2012).
We identify galaxies applying the AdaptaHOP structure finder (Aubert et al. 2004;Tweed et al. 2009) to the star particle distribution. Galaxies are identified using a local threshold of 178 times the average matter density, with the local density of individual particles calculated using the 20 nearest neighbors. Only galaxies with more than 50 particles are considered. With this approach for Horizon-AGN, the adopted galaxy mass corresponds to the total stellar mass of a galaxy, which is different than the other hydrodynamic simulations (i.e., within a 3D 30 kpc spherical aperture). This definition of stellar mass is commonly used in Horizon-AGN, whose stellar mass function is known (Kaviraj et al. 2017) to be in good agreement with observations.

Semi-analytic Model (SAM)
We highlight the main points of the simulation with respect to our study; for more detail, a full description of the SAM can be found in Menci et al. (2016) which is based on an earlier semi-analytic model introduced in Menci et al. (2014). The specific version adopted here differs from the one presented in the above papers since it implements a new, detailed description of AGN feedback, as discussed in detail below.
For dark matter halos that merge with a larger halo, the impact of dynamical friction is assessed to define whether the halo will survive as a satellite or sink to the center of the dominant galaxy which increases its mass. The binary interactions (fly-bys and mergers), among satellite sub-halos, are also described by the model. In each halo, we compute the fraction of gas which cools because of the atomic processes and settles into a disk (Mo et al. 1998). The stars are converted from the gas through three channels: (1) quiescent star formation with long timescales: ∼ 1 Gyr; (2) starbursts following galaxy interactions with timescales 100 Myr; (3) the loss of angular momentum triggered by the internal disk instabilities causing the gas inflows to the center, resulting in stimulated star formation (as well as BH accretion). The stellar feedback is also considered by calculating the energy released by the supernovae associated with the total star formation, which returns a fraction of the disk gas into a hot phase. A ΛCDM power spectrum of perturbations with a total matter density parameter Ω 0 = 0.3, a baryon density parameter Ω b = 0.04, a dark energy density parameter Ω Λ = 0.7, and a Hubble constant h=0.7 is adopted.
We assume BH seed M seed = 100 M (Madau & Rees 2001) to be initially present in all galaxy progenitors at the initial redshift z = 15. This constitutes an approximate way of rendering the effect of the collapse of Population III stars. However, the detailed value of M seed has a negligible impact on the final BH masses as long as they remain in the range M seed = 50 − 500 M .
The BH accretion is assumed to follow from the gas instabilities resulting from either galaxy interactions or disk instabilities, and is thus related to star formation channels 2-3: a fraction of the cold gas destabilized during galaxy encounters and through disk instabilities is in fact accreted onto the central BHs (with the remain- Figure 2. Demonstration of AGN selection using MBII. left: distribution of MBH and Eddington ratio for the full (colored squares) MBII sample and individual objects meeting the observed selection criteria (blue circles) for those at redshift z = 0.6. A matched HSC sample is shown by the orange data points. The light-green background cloud shows the intrinsic simulated number density in this parameter space. Note that this is the first step of sample selection. We further use an AGN magnitude cut to assure that the simulation sample has a similar L bol distribution (e.g., see Figure 7-left) and MBH distribution with the observations. We then use the same M * cut to construct the final sample. right: similar to the left panel, but presenting the impact of selection on the HST sample.
ing fraction fueling star formation channels 2-3 described above). Such fractions and the corresponding timescales for accretion are computed as described in Menci et al. (2014, their Section 3).
The SAM adopted here implements a new and improved model for the AGN feedback with respect to the previous versions (Menci et al. 2008). In both versions, the basic assumption is that fast winds with velocity up to 10 −1 c observed in the central regions of AGNs (Chartas et al. 2002;Pounds et al. 2003) result in supersonic outflows that compress the gas into a blast wave terminated by a leading shock front. This moves outward with a lower but still supersonic speed, and sweeps out the surrounding medium. However, while in the earlier version of the SAM (Menci et al. 2016) the blast wave is assumed to expand into an isotropically distributed medium, in the new description of AGN feedback (Menci et al. 2019) the full two-dimensional structure of the gas disk and of the expanding blast wave is followed in detail. The main physical difference is that in the new model the large density of gas along the plane of the disk causes the blast wave expansion to stall in such a direction, while it expands with large velocities in the vertical direction. The resulting strong dependence of the total (integrated over directions) outflow rate on the AGN luminosity L AGN and on the gas content of the galaxy M gas is shown in Figure 1. Such a new AGN feedback model has been tested in detail against a state-of-the-art compilation of observed outflows in 19 galaxies with dif-ferent measured gas and dynamical masses (Fiore et al. 2017), allowing for a detailed, one-by-one comparison with the model predictions. This well tested AGN feedback model allowed us to derive, for each simulated galaxy in the SAM, the outflow expansion and the mass outflow rates in different directions with respect to the plane of the disc.

Application of observational measurement error and selection effects
To make direct comparisons with observations, we add measurement errors and apply the equivalent selection to the simulated samples. We first inject random noise to the simulated catalog to mimic the scatter caused by measurement error. As mentioned above, M * and M BH for HSC and HST samples are measured with a similar approach; thus, their uncertainty levels are expected to be equivalent. We assume the following measurement uncertainties that are added as random noise: ∆M BH = 0.4 dex, ∆M * = 0.2 dex, and ∆L bol = 0.03 dex. Since the selection of the simulation sample based on L bol has limited effects in this study (see the discussion in Section 5 and Figure 7 for details), the AGN variability correction is not considered.
We then apply restrictions on the noise-injected simulation to mimic selection effects as present in the observational data. Since the HSC and HST samples have their own selection function, we apply different selection criteria to the simulation as follows.
Comparing with the HSC sample: (1) The observed sample consists of type-1 AGN, and thus the simulated sample should match the relationship between M BH -L bol as seen in the HSC sample. We use MBII to demonstrate the importance of matching the sample selection (Figure 2-left). (2) The i-band magnitudes of the AGN are bright (see Section 2.1). The specific selection is carried out as follows: a value of AGN i-band magnitude is chosen to make the sample selection such that the L bol distribution is similar to the observations (see Section 5 for details). Since the simulations do not provide the observed AGN magnitude, we adopt a simulated rest-frame magnitude or L 5100 and assume the quasar continuum as a single power-law with an index of α ν = −0.44 (Vanden Berk et al. 2001) to calculate the observed i-band magnitude. (3) Following the HSC selection, we require the M * value to be above a certain level (according to their redshift) to assure an accurate measurement. Finally, the HSC sample is split into three redshift bins for making comparisons, which are 0.2 < z < 0.4, 0.4 < z < 0.6, and 0.6 < z < 0.8.
Comparing with the HST sample: Simulated AGN systems are selected only when they match the M BH -L bol targeting window, which is the same as the observational selection (see Figure 2-right using MBII as an example). Note that the selection of the HST sample has a hard cut on the M BH values (i.e., log(M BH ) between [7.7, 8.6] M ). The HST sample covers the higher redshift range 1.2 < z < 1.7, which is considered as a single redshift bin to make the comparison with the simulations at z = 1.5.

RESULTS
For comparison, the local scaling relation provided by D20 (e.g., M BH = 0.98M * −2.56, Chabrier IMF 1 ) is adopted as the fiducial relation to assess relative offsets and differences in dispersion with redshift. This local (M BH -M * ) relation is derived by fitting measurements for 55 local galaxies as given in Bennert et al. (2011b) and Häring & Rix (2004). For each sample, we focus on the M BH residuals 2 (i.e., the offset to the local relation along the y-axis) and calculate their mean and standard deviation to make comparisons with observations.
In Figure 3, we present the mass scaling relation M BH -M * for both the observations and simulations for direct 1 Since different simulations adopt either a Chabrier or a Salpeter IMF, we use the local relation and M * of the observational data that are based on the same IMF; thus, a comparison between the observations and simulations are self-consistent. 2 The value of the slope for the local sample is close to 1, and thus if taking the M * to calculate the residual for each system, the offset value remains the same.
comparison. For the simulated data, both the initial sample and that with observational effects applied (i.e., noise correction and selection) are shown. Note that offsets can occur for the simulated samples in three cases: an inherent offset, an offset due to selection, and offsets from added noise and selection. For the first, it has been recognized (e.g., Habouzit et al. 2021, Figure 2), and seen in our Figure 3, that the offset values vary over a range of stellar mass, and thus the mean and standard derivation do not represent the entire mass distribution.
In this work, we focus on the last two offset distributions, i.e., considering selection effects with and without noise correction. To aid in visualization of the differences among the various simulations, compared to the observed sample, we show the distribution of offsets (in terms of the ∆logM BH ) as histograms in Figure 4. Each panel presents a different redshift range. In addition, the values of the central offset and scatter are presented in Table 2, both before and after consideration of the noise. Comparisons with the observations are presented in the remainder of this section.

Dispersion
Our results show that almost all simulations can produce scatter which is consistent with the observations across all redshifts examined (Figures 3 and 4) -for the simulated samples at z < 1, this level of scatter is ∼ 0.5 dex, while at z > 1, it is ∼ 0.3 dex. Note that the HST sample z > 1 has a narrow selection window based on M BH (see Figure 2 bottom), causing the observed scatter to be smaller than that of the HSC sample at z < 1. At all redshifts, we recognize that the observed scatter is dominated by measurement uncertainties in the data.
An understanding of how much of the scatter derives from random noise can help us to determine the intrinsic scatter in the scaling relation. To this end, we measure the scatter of the simulation sample without injecting random noise but adopting the same selection window for both z < 1 and z > 1 samples to infer the central offset and scatter. We find that the intrinsic scatter is at a level of ∼ 0.15 − 0.2 dex for both z < 1 and z > 1 (see Table 2). These levels are consistent with the intrinsic scatter as estimated using observation data alone (Ding et al. 2020b;Li et al. 2021a). Furthermore, the intrinsic scatter appears to be independent of redshift since the observations and simulations all follow the observed trend with redshift expected to be due to selection effects ( Figure 5). This suggests that the tight scaling relation may not be the result of a pure stochastic process, i.e., random mergers. However, the scatter is affected by Figure 3. BH mass vs. stellar mass for both the observational (small orange circles) and simulated (small colored circles) samples. Each row pertains to a particular simulation as labeled. The panels, from left to right, show different redshift bins. The black line in each panel indicates the local relation adopted by Ding et al. (2020b). The background cloud (in green and yellow with contours) shows the intrinsic simulation number density before injecting random noise and applying selection effects. The TNG100 and TNG300 appear to present similar results (see discussion section). We note that different samples adopt either a Chabrier or a Salpeter IMF for calculating the stellar mass; thus, the M * values for the observations and the local relation are shifted appropriately. sample selection, and thus these levels can only be taken as an approximation of the true intrinsic scatter.

Global offsets
We examine the offsets to understand whether the simulations deviate or not from the observed scaling relation, with particular attention to changes with redshift. Considering the values given in Table 2 and shown in Figure 5, over the lower redshift range z < 0.6, Illustris and Horizon-AGN predict observed M BH offsets consistent with the observation data (at a level of 0.1 dex). At higher redshift 0.6 < z < 1.5, the simulations SAM, TNG100, TNG300 and Horizon-AGN follow the observed evolution. These results are consistent with the Kolmogorov-Smirnov (KS) test performed using the offset distributions between each simulated sample and the observed sample -the p-values are given in Table 3 showing that Horizon-AGN and Illustris have a good statistical match to the scaling relation at z < 0.6 (i.e., p-value > 0.1), while the TNG100, TNG300 and Horizon-AGN simulation do well at z > 0.6. Except for MBII and Horizon-AGN, we also see that the other simulations have mass offsets that are decreasing from z = 0.2 to z = 0.6, while the observation offsets increase with redshift up to z ∼ 1.5. However, this inconsistency is well below the 1σ scatter level. Overall, we find that the mass correlation between supermassive BHs and their host galaxies is generally consistent between observations and simulations, with some subtle differences that are not at the level of concern for this present study.

Trends with stellar mass
In Figure 6, we investigate how the offset values are correlated with stellar mass. Here we focus on the sample at z ∼ 0.7. The other redshift bins at z < 1, where there is a large observation sample from HSC, show similar trends. We include the intrinsic values from the simulations in the figures to address how the observational effects (i.e., random noise and selection) change the observed scaling relations and offsets. First, considering the observed quasar sample (same in each panel), there is a trend for which BHs have masses further offset from their stellar mass with decreasing stellar mass. This trend is not seen in any of the simulations after noise, and selection effects have been applied. Given the level of uncertainties in the mean offsets of the observed sample, we do not try to interpret this trend any further in this study.
Interestingly, we notice that MBII and Illustris have BHs intrinsically undermassive relative to their galaxies at the lower masses that reach the D20 local scaling re-lation at higher masses. In contrast, TNG and Horizon-AGN have BHs slightly elevated from D20 local relation at most masses. These differences between simulations present two different scenarios, either one where the BHs come later, or coevolution with the two growing in tandem. Considering the former scenario, Illustris shows the strongest trend with stellar mass. In fact, after noise and selection are applied, the simulated sample exhibits very small offsets that agree remarkably well with the observed data. If the BHs are accurately characterized in the simulation, one interpretation is that the observations, including our HSC AGN sample, do not inform us of the true M BH offsets as a function of M * . This result underscores the importance of taking into account errors and selection -without accounting for those, one could erroneously interpret an apparent trend as evolution in the opposite sense as the true one. The most direct way to circumvent these issues is to probe lower masses (M * < 10 10 M ) using a more sensitive instrument, such as the James Webb Space Telescope (Habouzit et al. 2022), across this redshift range (see also Volonteri & Stark 2011).

DISCUSSION
In this study, there are a few issues that may bias the results and thus need to be considered. First, the mass offsets are compared to the observed relation derived in the local universe (D20). In fact, the different simulations could have different mean relations at z = 0 (e.g., Habouzit et al. 2021). As a result, the interpretation of the BH mass offsets of the simulations with redshift, anchored to the D20 local scaling relation, can be different if using the local relation of each individual simulation. In addition, the stellar mass of the Horizon-AGN sample is the total mass, while that for the other hydrodynamic simulations is determined within a 3D 30 kpc spherical aperture 3 . Therefore, in Figure 8, we manually recalibrate the offsets of all simulations so that their mean value at z = 0.3 is fixed to match that of the observations. This enables the evolutionary trend in the offsets of each sample to be clearer. In general, the updated results show that the evolution of the simulations is consistent with our interpretations in Section 4.2. This result can be expected since most of the simulations have demonstrated a good match to the observations at redshift z = 0.3. Note-This table collects the comparison results of the M * -M BH correlations between different simulation at different redshift. The value shows the central position offset to the D20 local scaling relation and the scatters measured around the local relation after applying the offset. A positive offset means the M BH value predicted by the simulation is higher than the local relationship measurement at fixed M * value. The last column shows the corresponding IMF that was adopted to the local anchor to make a fair comparison with the observation. Note that for the observational data, the relative differences between local and high-z measurements are not affected by the IMF assumptions. For the MBII sample, the simulation does not produce the sample at z = 0.5 or z = 0.7, but rather at z = 0.6. We use a Monte Carlo approach with 500 realizations to infer the uncertainties of the values in the table, finding that the uncertainties are within ±0.03.  Figure 3. The predictions from the numerical simulations, given in Table 2, are presented by different colored symbols. The grey horizontal band illustrates the level of dispersion for the local sample (D20).
Regarding the survey volumes, we collected all available samples in the simulations to perform these comparisons. The final sample sizes, used to compare with observation, are not set by design. Despite that the observed samples are not volume limited and the simulated samples are not volume matched, fortunately, the volumes of all the simulations appear to be sufficient, i.e., all simulations have a similar number (or more) of data points to compare with the observations (see Figure 3). In addition, similar results are found with the smaller-and larger-volumes simulations, e.g., TNG100 and TNG300, which reflect the fact that the simulation volume reaches a sufficient size to be effective for the comparisons in the observational plane. Even though, the intrinsic scatter of TNG100 and TNG300 (before the noise/selection) can differ. 4 To avoid any possible bias raised by sample mismatch, we designed our selection (Section 3.6) so that the distributions of L bol and M BH are similar. We also as-sure that the M * spans similar ranges. As a test, we loosened the selection by not requiring equivalent L bol distributions. The result shows that the offset distributions for all simulations are similar (see demonstrations in Figure 7) to those requiring matched L bol distributions. This test implies that our comparison results are stable, even considering the possibility for AGN variability (10 − 20% level flux variation) and the lack of the obscured population being represented in our observed samples.
In the literature (Weinberger et al. 2018b;Habouzit et al. 2021), it has been noted that the mass correlations predicted by TNG300 and TNG100 are not identical. For example, TNG300 appears to have different scatter in M BH at fixed M * from TNG100 at M * > 10 10 M . In addition, BH growth is more efficient in TNG100, thus causing the BH mass function to have higher normalization at the low-mass end. In our work, the differences between TNG100 and TNG300 also exist. However, the differences are milder after selection effects and random noise injection have been applied to the samples.
We considered all known observational effects, including sample selection and random noise injection, and applied these to the data products from the simulations in order to directly compare with observations. Our result shows that the scatters between the observations and simulations are very similar. Considering that the observational and simulated samples are matched to the best of our ability, their intrinsic properties are also similar. However, there may be limitations in our comparisons that may impact the results. For instance, there may be unknown observational effects that have not been properly applied to the simulations. Even so, state-of-the-art simulations likely do not capture all the physical aspects that impact the scatter. For example, the spin of BHs is not modeled in the simulation, which could change the scatter since the spin affects both the accretion rate and the energy that can be released through AGN feedback (Dubois et al. 2014a;Bustamante & Springel 2019;Habouzit et al. 2021) at some level. Even though the spin effect is still not well known and such a study is beyond the scope of this paper, a future effort may be warranted when the simulations incorporate spin.

CONCLUSIONS
We compared the observed scaling relation M BH -M * with the predictions from numerical simulations. The observational data are composed of 626 quasars at 0.2 < z < 0.8 imaged by HSC and 32 X-ray-selected quasars at 1.2 < z < 1.7 imaged by HST. The simulations include an SAM and five hydrodynamic simulations, i.e., MBII, Illustris, TNG100, TNG300, and Horizon-AGN. Figure 6. Comparisons of the offset of MBH (to the D20 local scaling relation) as a function of stellar mass from observational data and the simulations at z ∼ 0.7. In each stellar mass bin (minimum number of objects is larger than 6), we give the mean and standard deviation of the offset values. To consider random noise, we use the average of ten realizations to calculate the mean and standard deviation in each bin. The histograms on the right show the offset distribution with lines marking the mean offsets for the observations and simulations using the full galaxy sample. The green color distributions with contours show the intrinsic simulated sample distribution without random noise and selection effect.
We carried out the comparisons in the observed parameter space to account for uncertainties and selection effects. To achieve this, we first injected random errors with the same observational uncertainty into the simulation and then adopted the same selection condition for the simulated data (see Figure 2). Finally, we adopted the scaling relation from the local universe as our reference and performed comparisons using the scatter of the measurements and their central offset to the D20 local scaling relation. Our main results are summarized as follows: 1. The observed scatter predicted by the simulations is consistent with the observational measurements, i.e., ∼ 0.5 dex at z < 1 and ∼ 0.3 dex at z > 1 (see Figure 5 and Obs. dex, see Table 2), indicating that observational errors dominate the scatter.

SAM
3. Regarding the offsets of the scaling relation from the local one (∆M BH at a given M * ; D20), all simulations generally match the observations with some subtle, yet notable, differences. While Illustris, MBII, and Horizon-AGN show good correspondence with observations at z < 0.6, the comparisons at z > 0.6 are better for SAM, MBII, TNG100, TNG300, and Horizon-AGN. Bridging the gap from z ∼ 0.7 to z ∼ 1.5, TNG100, TNG300, and Horizon-AGN simulations match well the observed evolution of the scaling relation, i.e., the offsets are larger at higher redshift as shown in Figure 5 and Table 2. Four out of six of the simulations have a decreasing mass offset from z = 0.2 to z = 0.6 while the observational mass offset increases with redshift; however, this is well below the observed scatter.
These results are based on samples with stellar masses mainly with the range [9.5, 11.5] M . Note that the values of stellar masses in both the observations and simulations have significant uncertainty (up to a factor of two). For example, the observed M * depends on the assumption of initial mass function and star formation history, while the value of M * in the simulation depends on how it is defined (i.e., total mass or within 30 kpc aperture), and other subgrid models such as SN feedback and AGN feedback. In contrast, the scatter around the mean correlation is a relative quantity, which is less affected by such systematic effect. Thus, in this work we first consider the scatter as a diagnostic criterion to see whether some simulations match the data better than others. Taking (1) and (2), our results suggest that the tightness of the scaling relations has been formed since redshift 1.7, which is in contrast with the scenario of the central limit theorem (Peng 2007;Jahnke & Macciò 2011;Hirschmann et al. 2010) that the scaling relation is a consequence of a stochastic cloud in the early universe with subsequent random mergers thereafter. In this stochastic scenario we expect the scatter of the scaling relations to increase toward higher redshift. In fact, the scatter level in the simulation without adding random noise is consistent with the intrinsic scatter estimations reported in Ding et al. (2020b); Li et al. (2021a) (i.e., 0.35 dex). This level is also not larger than the typical scatter of the local relations reported in the literature (Kormendy & Ho 2013;Gültekin et al. 2009;Reines & Volonteri 2015).
The simulations studied in this work have adopted completely different numerical techniques. Surprisingly, all of them have similar tightness of the intrinsic scaling relation and thus provide good agreement with the observations in terms of the sample dispersion. In fact, the tightness of the scaling relation likely stems from the same physics assumed in these simulations (i.e., AGN feedback). Thus, our result is consistent with the hypothesis 5 that AGN feedback, as a causal link between supermassive BHs and their hosts, plays a key role in establishing the scaling relation.
We can gain more insight into the role of feedback by looking at the SAM model, for which multiple feedback models have been implemented. Ding et al. (2020a) compared the scaling relations obtained with the same HST sample and the SAM simulation but with a different, isotropic, AGN feedback model, and found a larger scatter (∼ 0.7 dex) with respect to the present SAM version (∼ 0.36 dex). We ascribed the change to the following reasons: in the new 2D model for feedback, the wave expansions stalls along the direction of the disk, and the radius where the expansion stops depends strongly on both the gas density of the disk and the AGN luminosity. This means that the opening angle (and hence the fraction of expelled gas) is larger when the gas density is small (because of the lower energy that has to be spent to push the gas outward) and when the AGN luminosity is large (because of the larger energy available to push the blast wave outward). These dependencies are summarized in Figure 1. Both quantities depend on the merging histories and are related, since the AGN luminosity L AGN depends on the available cold gas reservoir M gas . The large efficiency of feedback in galaxies with particularly small M gas (for given L AGN ) or in those with particularly large L AGN (for given M gas ) inhibits the BH growth in all the host galaxies that are outliers with respect to the average relation between M gas and L AGN . This results in a smaller scatter.
5 By stating that our comparisons are 'consistent' (Section 5), we do not mean that our hypothesis is the only plausible interpretation; there may be others. The black line denotes the criterion in the simulation for which a supermassive BH is in a quasar (above) or radio (below) mode.
In theoretical models, AGN feedback is often assumed to consist of two distinct modes: a quasar-heating mode where the BH accretion rates are comparable to the Eddington rate, and a radio-jet mode occurring at low accretion rates (see, e.g., Section 3.4). In high-redshift universe, the cold material in the early universe leads to the vigorous accretion to the supermassive BH, which drives the high accretion rates, and thus the quasar mode dominates the feedback. At low redshift, the star formation and feedback ejection reduce the cold material, leading to a lower accretion rate and a radio-mode-dominating feedback (e.g. Dubois et al. 2012;Volonteri et al. 2016;Weinberger et al. 2018a). Our comparison result shows that the level of intrinsic scatter in the scaling relation at redshifts up to 1.7 is consistent with low redshift (see Table 2), which reveals the fact that at high redshift the AGN feedback described by the quasar mode may be responsible at some level for the tight correlation between supermassive BH and its host galaxy. In Figure 9, we demonstrate that essentially all of the quasars in Horizon-AGN that match the observed samples are at high Eddington rates. After that, the radio-jet mode starts to take control at low redshift by maintaining the tightness of the scaling relation untill the level we observe today.
Our work highlights the importance of applying measurement uncertainty and the effect of selection to the simulated data in order to make direct comparisons with observations. Such comparisons have been made in the local universe (e.g., Habouzit et al. 2021) where the mea-surements are relatively robust and the selection function is broad; thus, it is less crucial to ensure consistency between observations and simulations. However, beyond z > 0.2, the scatter and the central distribution of the scaling relations are dominated by measurement uncertainty and selection effects (see Figures 3 and 6) and a forward modeling in the observational plane becomes essential. Indeed, those effects would hamper our understanding of whether BHs and their hosts coevolve or not. For example, from trends seen with stellar mass in Illustris and MBII (Figure 6), we found that the observations of the M BH offsets as a function of M * show a very different trend from the intrinsic one.
Extending this study to even higher-redshift and lower-mass galaxies (M * < 10 10 M ) will be very beneficial, probing closer to the epoch of formation of massive galaxies and supermassive BHs. The understanding of how and when the tight scaling relation emerged is crucial for testing theoretical models (Volonteri et al. 2021). On the observational side, the James Webb Space Telescope will provide high-quality imaging data of AGNs at redshifts up to z ∼ 7 and beyond. The upcoming measurements will represent stringent tests on the proposed physical mechanisms for the initial formation of supermassive BHs and of their subsequent evolution with galaxies.