Connection between Galaxies and H i in Circumgalactic and Intergalactic Media: Variation according to Galaxy Stellar Mass and Star Formation Activity

This paper systematically investigates the comoving megaparsec-scale intergalactic medium (IGM) environment around galaxies traced by the Lyα forest. Using our cosmological hydrodynamic simulations, we investigate the IGM–galaxy connection at z = 2 by two methods: (i) cross-correlation analysis between galaxies and the fluctuation of Lyα forest transmission (δ F) and (ii) comparison of the overdensity of neutral hydrogen (H i) and galaxies. Our simulations reproduce observed cross-correlation functions (CCFs) between Lyα forest and Lyman-break galaxies. We further investigate the variation of the CCF using subsamples divided by dark matter halo mass (M DH), galaxy stellar mass (M ⋆), and star formation rate (SFR) and find that the CCF signal becomes stronger with increasing M DH, M ⋆, and SFR. The CCFs between galaxies and gas density fluctuation are also found to have similar trends. Therefore, the variation of δ F–CCF depending on M DH, M ⋆, and SFR is due to varying gas densities around galaxies. We find that the correlation between galaxies and the IGM H i distribution strongly depends on M DH as expected from linear theory. Our results support the ΛCDM paradigm, confirming a spatial correlation between galaxies and IGM H i, with more massive galaxies being clustered in higher-density regions.


Introduction
The standard picture of galaxy formation within the gravitational instability paradigm indicates that galaxy formation and evolution is closely linked to the surrounding gas called the circumgalactic medium (CGM) and intergalactic medium (IGM; e.g., Rauch 1998;Mo et al. 2010). The inflowing gas from the IGM provides the fuel for star formation in galaxies and promotes the growth of galaxies and their central supermassive black holes (SMBHs). As important as the inflow is the energetic feedback from massive stars and SMBHs, which blows the gas away into the CGM and IGM. Therefore, determining the megaparsec-scale distribution of gas as a function of time and space is quite important for understanding galaxy formation and evolution.
The connection between the CGM/IGM and galaxies has been studied using Lyα forest absorptions in quasar spectra. The most common method to clarify the CGM/IGM-galaxy connection is cross-correlation analysis between Lyα forest absorption and galaxies (e.g., Adelberger et al. 2003Adelberger et al. , 2005Chen et al. 2005;Ryan-Weber 2006;Faucher-Giguère et al. 2008;Rakic et al. 2011Rakic et al. , 2012Rudie et al. 2012;Font-Ribera et al. 2013;Prochaska et al. 2013;Tejos et al. 2014;Bielby et al. 2017). Alternatively, comparisons of galaxy and neutral hydrogen (H I) overdensities have also been discussed in the literature (Mawatari et al. 2017;Mukae et al. 2017Mukae et al. , 2020. Specific high-density regions with abundant H I gas and highly clustered galaxies have also been studied (e.g., Cai et al. 2016;Lee et al. 2016;Mawatari et al. 2017;Hayashino et al. 2019). All of the above studies have revealed that galaxy distribution correlates with the IGM up to tens of comoving megaparsec (Mpc) scales.
The physical properties of H I gas in the CGM or IGM have been studied in detail theoretically (e.g., Meiksin 2009;Fumagalli et al. 2011;van de Voort & Schaye 2012;van de Voort et al. 2012;Meiksin et al. 2014Meiksin et al. , 2017Rahmati et al. 2015), aided by powerful cosmological hydrodynamic simulations such as EAGLE , Illustris (Genel et al. 2014;Vogelsberger et al. 2014aVogelsberger et al. , 2014bSijacki et al. 2015), and IllustrisTNG (Villaescusa-Navarro et al. 2018;Nelson et al. 2019). Turner et al. (2017) have presented the median H I optical depth versus the line of sight (LoS) or transverse distance around galaxies in an EAGLE simulation and found that the H I optical depths are sensitive to dark matter halo mass. Sorini et al. (2018) have compared the mean Lyα absorption profiles as a function of the transverse distance from galaxies in both observations and simulations and showed a reasonable match between them beyond 2 proper Mpc (pMpc) but significant differences at 0.02-2 pMpc. A subsequent study by Sorini et al. (2020) has investigated the impact of feedback around mock quasars and found that stellar feedback is the dominant driver (rather than active galactic nuclei (AGNs)) in determining the average properties of the CGM.
The correlation between galaxies and IGM morphology has also been examined in the literature. Martizzi et al. (2020) have demonstrated that galaxies with lower stellar masses than the median are in voids and sheets of the IGM, whereas galaxies with higher stellar masses are more likely to be in filaments and knots of the IGM with higher gas densities. In addition, the correlation of mock Lyα forest absorption spectra with galaxy overdensity has been studied and compared with observations (e.g., Stark et al. 2015;Miller et al. 2019). Although these theoretical studies provide further evidence of a strong link between the CGM/IGM and galaxies, there are many aspects that are still unclear, and our understanding is still insufficient.
Why do we need to study the distribution of H I and its bias beyond the well-established linear theory of the ΛCDM model? One of the main reasons is that the bias between baryons, galaxies, halos, and dark matter is nontrivial, and it is of substantial interest for future projects of IGM tomography, such as the Subaru PFS, Euclid, DESI, and 21 cm cosmology by the Square Kilometre Array. Understanding nonlinear bias at intermediate and small scales is becoming increasingly more important for baryon acoustic oscillation studies of precision cosmology. What we are trying to achieve in this paper is to dig deeper into this nonlinear bias between baryons (particularly H I), galaxies, and dark matter, beyond the simple halo bias, and evaluate the impact of limited observational data on the cross-correlation function (CCF). Feedback effects from galaxies can alter the H I distribution in the intergalactic space due to ionization, and it is crucial to use a cosmological hydrodynamic simulation that solves heating and cooling selfconsistently with the impact of the UV background radiation field. As observations can only trace a certain phase of gas, it is unclear if the simple bias of linear theory is applicable to H I. In fact, it has been shown that H I has a complicated scaledependent bias by many authors (e.g., Villaescusa-Navarro et al. 2018;Ando et al. 2019Ando et al. , 2020Sinigaglia et al. 2020). It is thus crucial to examine the IGM H I-galaxy connection more deeply, including the dependence on galaxy mass and galaxy population. Current observational data are still very limited to study the CCF, in terms of both the survey volume of Lyα forest absorptions and the number of galaxies with spectroscopic redshifts (Momose et al. 2021, hereafter M21). In comparison, numerical simulations provide an alternative opportunity to study the IGM H I-galaxy connection, and this paper presents such systematic investigations for the first time. Our results certainly include the effect of halo bias, but they also reflect the intricate nonlinear bias between H I, galaxies, and dark matter, beyond the simple linear theory.
In order to unveil the gas distribution on Mpc scales around galaxies, we systematically investigate it using numerical simulations. We aim to (1) establish the methodology for evaluating the CGM/IGM-galaxy connection, (2) examine its dependence on galaxy properties, and (3) verify the cause of dependences using GADGET3-Osaka cosmological hydrodynamic simulations (Shimizu et al. 2019). What is new in this paper is a systematic investigation of the IGM-galaxy connection depending on several galactic properties. We make subsamples based on galaxies' stellar mass M å , halo mass M DH , star formation rate (SFR), and specific SFR (sSFR = SFR/M å ) and compare the CCFs. Calculating the CCFs in three dimensions taking various galaxy properties into account is also a novel aspect of this work. Studying the IGM-galaxy correlation with the same resolution as the latest observations and comparing the simulation results with them is clearly a worthwhile exercise to check the validity of the ΛCDM model because the application of linear theory to the IGM H I-galaxy connection has not yet been fully investigated. In this study, we particularly focus on statistical comparisons between simulations and observations. For a fair comparison, we use the same parameters as in our companion observational paper (M21). As we describe in more detail in M21, for observations, we use the COSMOS Lyα Mapping And Tomography Observations (CLAMATO), which is publicly available Lyα forest 3D tomography data (Lee et al. 2014(Lee et al. , 2016(Lee et al. , 2018, and several other catalogs in the archives. This paper is organized as follows. We introduce our numerical simulations in Section 2 and the methodology for examining the IGM-galaxy connection in Section 3. The results and discussions are presented in Sections 4 and 5, respectively. Finally, we give our summary in Section 6. In the appendix, we discuss the dependences of our results on redshift width, redshift uncertainty, sample size, and cosmic variance. We note that "cosmic web" and "IGM" are used specifically for those traced by H I gas unless otherwise specified in this paper. In addition, we mainly use h −1 Mpc in comoving units in the following sections.

Simulations
In this paper, we use cosmological hydrodynamic simulations performed with GADGET3-Osaka (Shimizu et al. 2019), which is a modified version of the TreePM smoothed particle hydrodynamics (SPH) code GADGET3 (originally described in Springel 2005). Some physical processes important for galaxy formation such as star formation, supernova (SN) feedback, and chemical enrichment have been implemented and described in detail by Shimizu et al. (2019). Our simulations reproduce various observational results such as the stellar mass function, SFR function, stellar-to-halo-mass ratio, and cosmic star formation history within observational uncertainties at z 2 (I. Shimizu et al., in preparation).
Here, we briefly describe our simulations, which employ N = 2 × 512 3 particles in a comoving volume of ( ) h 100 Mpc 1 3 . The particle masses of dark matter and gas are 5.38 × 10 8 M e and 1.00 × 10 8 h −1 M e , respectively. The gravitational softening length is set to 8 h −1 kpc in comoving units. For SPH particles, the smoothing length is allowed to be 10% of the softening length (800 h −1 pc in comoving units) in regions where baryons dominate over dark matter. This means that at z = 3 the physical maximum resolution for gas may reach 200 pc (proper) in our simulations. In this study, we pay particular attention to the physical properties of gas at r > 100 h −1 kpc from galaxies, and our resolution is sufficient for this study.
Star particles are generated from gas particles when a set of criteria are satisfied. Note that the mass of gas particles changes over time due to star formation and stellar feedback (by SNe and AGB stars). As described in detail in Shimizu et al. (2019), we employ the CELib package (Saitoh 2017), which can treat the time-and metallicity-dependent metal yields and energies from SNe II, SNe Ia, and AGB stars, allowing for a more realistic chemical evolution of simulated galaxies. In order to identify the simulated galaxies, we run a friends-of-friends (FoF) group finder with a comoving linking length of 0.2 in units of the mean particle separation to identify groups of dark matter particles as dark matter halos. We then identify gravitationally bound groups of a minimum of 32 particles (dark matter + SPH + star) as substructures (subhalos) in each FoF group using the SUBFIND algorithm (Springel et al. 2001), which is a standard practice in many simulation works. We regard substructures that contain at least five star particles as our simulated galaxies. Moreover, we define the most massive galaxy in a halo as the central galaxy and the rest as satellite galaxies. We also calculate the virial halo mass (M DH ), which is defined by the total enclosed mass inside a sphere with 200 times the critical density of the universe. This means that the member galaxies (the central and satellite galaxies) in a dark matter halo have the same M DH , even though the substructures can have different subhalo masses as calculated by SUBFIND. Note that each gas (star) particle has some associated physical properties such as mass, SFR, and metallicity. In our SPH simulation, these values are automatically computed following hydrodynamic and gravitational interactions, rather than given by hand, which is a fundamental difference from semianalytic models of galaxy formation. The properties (gas mass, stellar mass M å , and SFR) of a simulated galaxy are defined by the summation of these quantities in each subhalo.
In order to directly compare our simulations and observations, we create the light-cone output of gas particles and galaxies by connecting 10 simulation boxes of different redshifts following our previous work (Shimizu et al. 2012(Shimizu et al. , 2014(Shimizu et al. , 2016. The redshift range of our light-cone output is z ∼ 1.8 to 3.1, which can cover the redshift range of recent Lyα absorption line surveys (e.g., CLAMATO; Lee et al. 2014Lee et al. , 2016Lee et al. , 2018 and future PFS Lyα absorption surveys (Takada et al. 2014). We then randomly shift and rotate each simulation box so that the same objects do not appear multiple times on a single LoS at different epochs.
With this light-cone output, we calculate the Lyα optical depth (τ Lyα ) along the LoS. First, we calculate the important physical quantities, A grid (x), at each grid point x along the LoS, such as the H I density, LoS velocity, and temperature, as follows: where A j , m j , ρ j , and h j are the physical quantity of concern, gas particle mass, gas density, and smoothing length of the jth particle, respectively. W is the SPH kernel function, and r is the distance between LoS grid points and gas particles. For simplicity, the grid size (dl) is set to a constant value of 100 h −1 kpc in comoving units, which corresponds to 10 km s −1 in the velocity space at z ; 2. This is a higher resolution than that of any of the relevant Lyα observations. Then, we calculate the Lyα optical depth τ Lyα (x) using these physical values at each grid point as follows: where e, m e , c, f ij , n H I , and x j are the electron charge, electron mass, speed of light, absorption oscillator strength, H I number density, and jth grid point location, respectively. f is the Voigt profile, and we use the fitting formula of Tasitsiomi (2006) without direct integration. In this study, after making the high-resolution LoS data, we reduce our resolution by coarsegraining the grid size to match the observations: 1024 (=32 2 ) LoSs are drawn with regularly spaced intervals. The mean separation of each LoS is 3.3 h −1 Mpc, which is similar to that in the CLAMATO survey.
Finally, we note that AGN feedback is not considered in the simulation that we use for this paper. We note that the impact of AGN feedback on H I distribution is still under debate. For example, Sorini et al. (2020) have found that AGN feedback has only little impact on the surrounding CGM in general, and SN feedback has more dominant effects. Besides, statistical results of the H I-galaxy connection obtained from the same simulations in this study have shown good agreement with current observations . Some observations have found the proximity effect around quasars (e.g., Mukae et al. 2020;Momose et al. 2021), which requires full radiative transfer calculations to compare them with simulations. Therefore, we defer the study of AGN feedback using radiation transfer to future work.

Methodology
One of the main purposes of this study is to compare results with CLAMATO, which is 3D tomography data of Lyα forest transmission fluctuations (δ F ; see the definition below) over 2.05 < z < 2.55 in 0.157 deg 2 of the COSMOS field (Scoville et al. 2007;Lee et al. 2016Lee et al. , 2018 To produce a similar data cube of δ F from our LoS data, we first evaluate the Lyα transmission fluctuation δ F in each LoS pixel by Ly is the Lyα flux transmission and 〈F z (x)〉 is the cosmic mean transmission. We adopt the following value derived by Faucher-Giguère et al. (2008): because it is used in CLAMATO. Additionally, we use the same setup used in CLAMATO with a Hubble constant h = 0.7 and a redshift coverage of 2.05 z 2.55. In the following two subsections, we present the methods for two analyses: (i) cross-correlation and (ii) overdensity analysis.

Cross-correlation Analysis
Cross-correlation analysis is often used in the literature to characterize the correlation between galaxies and the CGM/ IGM (e.g., Adelberger et al. 2005;Tejos et al. 2014;Croft et al. 2016Croft et al. , 2018Bielby et al. 2017). In this study, we adopt the following definition of the CCF: where ξ δF is the CCF between δF and galaxies, δ g,i and δ ran,j are the values of δ F for pixel i and j at distance r from galaxies and random points (Croft et al. 2016), and N(r) and M(r) are the numbers of pixel-galaxy and pixel-random pairs in the bin with distance r.
To calculate CCFs, we prepare two LoS data with different LoS grid resolutions. One is LoS data with the original resolution of comoving 0.1 h −1 Mpc. The other is lowerresolution data with a coarse-grained grid size of 0.4 h −1 Mpc (as described in Section 2) to match the CLAMATO resolution of 0.5 h −1 Mpc at z = 2.35. We call this latter, lower-resolution data set "LoS-4." For comparison with observations, we use the LoS-4 data set and calculate CCFs at 1-50 h −1 Mpc scales around galaxies.
At the same time, we use the original LoS data set to derive CCFs at r = 0.16-1 h −1 Mpc, because the redshift resolution of LoS-4 data is greater than the smallest radius for CCF calculation. Hereafter, we refer to the scales of r < 1 h −1 Mpc and r 1 h −1 Mpc as the "CGM regime" and "IGM regime" as indicated in Figure 1(e1).
The ξ δF value is evaluated in each shell with a thickness of ( D r h log 1 Mpc) = 0.2 and 0.1 for the CGM and IGM regimes, respectively. We confirm that our CCFs by the LoS and LoS-4 data are smoothly connected at r = 1 h −1 Mpc within the error. We perform jackknife resampling by leaving one object out and calculating the ξ δF value and adopt the jackknife standard error as the error in each shell.
To examine how the CCF varies according to the physical properties of galaxies, we divide the galaxy sample into three to five subsamples according to M å , M DH , SFR, and sSFR. The number of galaxies in each subsample and its name are summarized in Table 1.

Overdensity Analysis
Another analysis that we perform in this paper is a direct comparison of the IGM absorption and galaxy overdensity within cylinders along the LoS direction. This method was originally proposed by Mukae et al. (2017), and we call it "overdensity analysis." We first generate a 2D LoS map by binning the LoS data with Δz = 0.032, which corresponds to 27.9 h −1 Mpc. We then estimate the mean IGM fluctuation 〈δ F 〉 within circles of radii of 4.74 h −1 Mpc centered on the local minima and maxima of the map. The sizes of both Δz and the cylinder radius are chosen to be comparable to the actual observations (see M21). Note that we use a cylinder of Δz = 0.08 (Δz = 0.032) corresponding to a length of 69.7 (27.9) h −1 Mpc and with a radius of 3 (4.74) h −1 Mpc in our observational analysis because the mean separation of our LoS data is 2.35 h −1 Mpc. To determine the pixel positions of the local minima/maxima, we first mark the positions of the local minima/maxima of δ F within a few h −1 Mpc scale and then repeat the same procedure to identify the minima/maxima on even smaller scales.
The galaxy overdensity is also computed together with 〈δ F 〉 in the same cylinders as follows: where N gal and 〈N gal 〉 are the exact number of galaxies and the mean number of galaxies in the cylinder, respectively. Note that we calculate N gal and 〈N gal 〉 for each of the three stellar mass-divided subsamples given in Table 1. Since we first generate a 2D LoS map, the galaxy overdensity computed above can be regarded as the galaxy surface density, and thus we denote it as Σ gal .
To increase the number of data points, we randomly select four different redshift slices. The number of galaxies in each redshift slice is summarized in Table 2. We also perform overdensity analysis for randomly selected positions to examine possible bias due to the positioning of the cylinders (see also Mukae et al. 2017).

Lyα Absorption Fluctuation
The CCFs of all galaxies and subsamples are shown in Figure 1 (top). We also present the sign-flipped CCF plotted in log-scale in Figure 1 (bottom) for the discussion in Sections 4.1.2 and 5.1. We detect a CCF signal up to r ∼ 40 h −1 Mpc, which is in good agreement with observations by Adelberger et al. (2005) and Bielby et al. (2017). We find that there is a clear tendency of the CCF signal depending on the subsample, except for the sSFR subsamples. The trend is that the CCF signal becomes stronger with increasing galaxy masses and SFRs, but the SFR-(v) subsamples do not follow this trend. A turnover radius where ξ δF reaches about zero also shows a trend for galaxies in the M å and M DH subsamples (see Figures 1(a2) and (b2)), which a sample with a stronger CCF signal drops rapidly to zero at a smaller radius-hereafter we call this the "turnover radius." For the SFR samples, however, the CCF signal does not seem to correlate with the turnover radius. On the other hand, the sSFR samples do not show any obvious trend in their CCFs. Nonetheless, the turnover radius of the sSFR samples increases with an increasing sSFR. Likewise, Turner et al. (2017) have demonstrated the halo mass dependence of the median τ H I as a function of distance from their modeled galaxies. Meiksin et al. (2017) have investigated δ F for galaxies in M DH subsamples against the projected impact parameter and found an increase in δ F with halo mass. Observationally, Chen et al. (2005) have measured the two-point cross-correlation ξ ga between Lyα absorbers and absorption line-dominated or emission linedominated galaxies, which are presumably massive early-type and star-forming galaxies, and presented a different amplitude of ξ ga (see also Chen & Mulchaey 2009). Wilman et al. (2007) have also possibly found differences in the cross-correlation signal between absorption line-dominated and emission linedominated galaxies (see their Figure 4), although they did not claim that they were statistically significant detections. To characterize our CCFs, we fit them with a power law: where r 0 and γ are the clustering length and slope. We apply the power-law fitting to the CCFs in Figure 1 (bottom) over 0.1-1 h −1 Mpc and 1-10 h −1 Mpc separately, corresponding to the CGM and IGM regimes. Note that r 0 and γ reflect the amplitude and slope of the CCF, respectively. The best-fit parameters of all the CCFs are presented in Figure 2. Filled and open circles represent the best-fit parameters of the CGM and IGM regimes, respectively. For all galaxies, we obtain the best-fit parameters of (r 0 , γ) = (0.07 ± 0.004, 0.50 ± 0.01) and (0.34 ± 0.03, 1.07 ± 0.04) for the CGM and IGM regimes (see also Figure 2(e)). Several observational studies have performed a power-law fitting to their CCFs between Lyα absorption and galaxies. Tummuangpak et al. (2014) have calculated the CCF between Lyα absorption and LBGs at z ∼ 3 and fit it by a double power law, showing (r 0 , γ) = (0.08 ± 0.04, 0.47 ± 0.10) and (0.49 ± 0.32, 1.47 ± 0.91) for the CGM and IGM regimes used at r = 1.6 h −1 Mpc as a border. A subsequent study of IGM-LBG clustering by Bielby et al. (2017) described their CCF by a single power law with (r 0 , γ) = (0.27 ± 0.14, 1.1 ± 0.2). A power-law fitting to the CCF of a weak H I (N H I < 10 14 cm −2 ) IGM and galaxies at z < 1 has also been attempted and resulted in (r 0 , γ) = (0.2 ± 0.4, 1.1 ± 0.3) (Tejos et al. 2014). Although the fitting range for power-law fitting is different among studies, our best-fit parameters for both the CGM and IGM regimes are comparable to those of previous observations within the error.
We next show the best-fit parameters obtained from all categories. For M å and M DH , they show similar trends in both r 0 and γ of both the CGM and IGM regimes. The clustering length r 0 becomes longer with increasing mass independently of the regime. We remind the reader that r 0 reflects the amplitude of the CCF. Thus, we can confirm particularly in Figure 1 (bottom) that the higher the mass becomes, the stronger the CCF amplitude is. Meanwhile, the slope γ becomes smaller with increasing mass in the CGM regime but is consistent with being constant within the error in the IGM regime. In the literature, Tummuangpak et al. (2014) have calculated the mass-dependent CCFs by dividing their simulated galaxies into two categories of M å > 10 8 and 10 9 h −1 M e . Their double power-law fit for these M å > 10 8 and 10 9 h −1 M e samples have given (r 0 , γ) = (0.10 ± 0.07, 0.46 ± 0.22) and (0.16 ± 0.09, 0.46 ± 0.19) in the CGM regime and (0.51 ± 0.39, 1.25 ± 0.61) and (0.61 ± 0.34, 1.18 ± 0.43) in the IGM regime. Although the differences of r 0 and γ between the M å > 10 8 and 10 9 h −1 M e samples are within the error, a similar trend is confirmed in r 0 . For the SFR sample except SFR-(v), we confirm that r 0 becomes greater with an increasing SFR. This is naturally explained by the fact that our galaxy sample are mostly on the star formation main sequence between SFR and stellar mass. On the other hand, γ has no clear trend in either the CGM or the IGM regime. Although the CCFs of all three sSFR categories are comparable as presented in Figure 1(d1), the best-fit parameters of both r 0 and γ are different from each other. Interestingly, both r 0 and γ show identical trends in both the CGM and IGM regimes, where sSFR-(i) shows the smallest value. We briefly discuss the reason in Sections 5.1 and 5.2.

Gas Density Fluctuation
The δ F value of the LoS data should correlate with the H I gas density at the position. Thus, the variation of CCF amplitudes can be attributed to the variation of the local gas density around galaxies. To verify this hypothesis, we evaluate the CCFs of gas density fluctuations around galaxies defined by where δ ρ is the gas density fluctuation defined by the ratio of the gas density at one LoS pixel ρ to the mean gas density at each redshift of Δz = 0.01, 〈ρ z 〉; ξ δρ is the CCF; and d r i , g and d r j , ran are the gas density fluctuations for pixel i and j at distance r from galaxies and random points. As in Equation (5), N(r) and M(r) are the numbers of pixel-galaxy and pixel-random pairs in the bin with distance r. The CCFs of gas density fluctuations are measured for both total gas and H I (d r gas and d r H I ) and are shown in Figures 3 (top) and 4 (bottom). For comparison to the CCFs of δ F , the log-scale inversion CCFs of δ F (i.e., d -log F ) are also shown in Figure 1 (bottom). Due to several negative values in d r H I , we also present the mean gas density fluctuations around galaxies in Figures 3 (bottom) and 4 (bottom).
First, we start from the CCFs of the total gas density fluctuation d r gas in Figure 3. The overall trends for each category (i.e., M å , M DH , SFR, and sSFR) are almost the same as those of δ F ʼs CCFs. That is, the overall CCFs monotonically decrease with radius, and the CCF signal becomes stronger with increasing galaxy mass (either M å or M DH ) or SFR. However several samples show notable behavior in their CCFs' strength log (sSFR/yr −1 ) - 9 log sSFR 38,078 sSFR-(i) - 10 log sSFR < −9 47,419 sSFR-(ii) -> 10 log sSFR 3949 sSFR- (iii) or shape at a certain radius. For M å -11, M DH -13, and SFR-(i), we find a slight decline of x dr gas at the center. This indicates the decline of the relative total gas density in the vicinity of those galaxies. For SFR-(v) and sSFR-(iii), their CCFs show a convex feature at r = 0.3-2 h −1 Mpc and even have the strongest signal among each category over that radius range. It is noted for M DH -10 that the swelling at r ∼ 0.5 h −1 Mpc may be due to its small sample size. The CCFs of H I gas density fluctuation d r H I are shown in Figure 4. Compared to the H I optical depth distribution on the LoS (Figure 1, top, δ F ), the raw H I gas particles have a slightly discrete distribution (Figure 4, bottom, d r H I ). This is because we consider line broadening based on the Voigt profile (see also Equation (2)). As a result, the CCFs in δ F have a smoother shape than those in d r H I . Similar to the overall trends seen in the CCFs of d r gas , we find that the CCF signal becomes stronger with increasing M å , M DH , or SFR in general. The trend is also about the same as that found in the CCFs of δ F . The consistency of the CCFs' trends is naturally explained by considering Equations (2) and (3), where δ F is proportional to H I number density. We also find that the irregularity identified in the CCFs of d r gas is seen in several samples: a decline of x dr H I at the center in M å -11, M DH -13, and SFR-(i) and a convex profile and the strongest signal at r = 0.3-2 h −1 Mpc in SFR-(v) and sSFR-(iii). In addition to the above irregular CCF shapes, SFR-(v) and sSFR-(iii) show a significant decline of x dr H I values at the center. This suggests that H I densities around galaxies in M å -11, M DH -13, SFR-(i), SFR-(v), and sSFR-(iii) are also relatively low in general. We discuss this in Section 5.2.

Overdensity Analysis
We present the results of the overdensity analysis in Figures 5(a) and (b), which are derived from local minima/ maxima and random positions in the gas density field. The analysis is performed for all galaxies and M å -dependent subsamples of M å -11, M å -10, and M å -9. We evaluate the Σ gal and 〈δ F 〉 values in each of the four redshift slices indicated by open or filled circles colored red or blue. The exact galaxy count used in each redshift slice is shown in Table 2.
First, we find possible anticorrelations between Σ gal and 〈δ F 〉 in Figure 5 We should comment on the effect of outlier data points in M å -10, M å -9, and ALL of Figure 5(a). We repeat Spearman's rank correlation test for all data points but without the outliers and obtain R s = (−0.37, −0.28, −0.32) for the (M å -10, M å -9, ALL) samples. Therefore, weak anticorrelations are still confirmed even without the outliers.
To characterize the 〈δ F 〉-Σ gal distribution, we follow Mukae et al. (2017) and apply chi-square fitting in Figure 5 with the linear model The best-fit parameters of α and β are summarized in Table 2. We find that α ∼ −0.13, which is about the same for all of the four samples within the error, while the β become slightly larger with increasing M å , although they are still similar within the error: β = −0.007 ± 0.003, −0.014 ± 0.005, and −0.020 ± 0.007 for M å -11, M å -10, and M å -9, respectively. Note that the best-fit parameters of anticorrelations for M å -10, M å -9, and ALL without outliers appear to be comparable within the error. We compare the best-fit parameters of the ALL sample to those in Mukae et al. (2017), which are (α, β) = (−0.17 ± 0.06, --+ 0.14 0.16 0.06 ). We find a similarity in α but a large difference in β, showing a much shallower slope for our sample. A shallower slope of the simulated galaxy sample has also been found in Nagamine et al. (2020). It may be attributed to photo-z errors in the observational data. If the photo-z errors are large, then some galaxies would contaminate the sample and the value of Σ gal would be smeared out. As a result, the observed Σ gal only has a narrow dynamic range, which could make the apparent correlation steeper than the real one.
We also examine the result of overdensity analysis based on randomly selected points in order to verify the effect of position bias ( Figure 5(b)). Spearman's rank correlation tests for randomly selected positions yield mild anticorrelations with R s = −0.51 to R s = −0.57. The best-fit parameters of the linear model (α and β) are also comparable to those from Figure 5(a) within the error. Moreover, the trend found in the best-fit α and β as a function of stellar mass is confirmed. Therefore, we  (2) Spearman's coefficient and pvalue. The 〈δ F 〉-Σ gal relation is examined around the local minima and maxima.
(3) The best-fit parameters of the chi-square fitting of the 〈δ F 〉-Σ gal relation examined around the local minima and maxima. (4) Spearman's coefficient and p-value. The 〈δ F 〉-Σ gal relation is examined around random points. (5) The best-fit parameters of the chi-square fitting of the 〈δ F 〉-Σ gal relation examined around random points.
conclude that the position bias of overdensity analysis does not affect the 〈δ F 〉-Σ gal correlation seriously.

Origin of CCF Variations
We present in Section 4.1.1 that the CCF of δ F varies depending on the galaxy mass and SFR. To find the origin of its variation, we also calculate the CCFs of d r gas and d r H I in Section 4.1.2 and find that their signal strengths also depend on M å , M DH , and SFR. This suggests that the different relative gas densities around galaxies cause the variation of the CCFs of δ F . Considering the relation between δ F and H I number density in Equations (2)-(4) a similar trend of CCFs in δ F and d r H I is reasonable. The same trend even for d r gas probably means that the total gas density correlates with H I gas density in general. Therefore, we argue that the variation of δ F CCFs is caused by different gas distributions around galaxies.
We find that not only the δ F CCFs but also the best-fit parameters of their power-law fitting vary depending on the galaxies' mass and SFR (see also Figure 2). In particular, our best-fit clustering lengths r 0 for the IGM regime show an increase with an increasing mass or SFR of galaxies. A dependence of CCF signal strength and its best-fit parameters on H I column density (N H I ) of the CGM and IGM has also been reported in observational studies. For example, Ryan-Weber (2006) has calculated CCFs by splitting their absorber sample into two based on the absorber's N H I and found that the high-N H I subsample shows stronger correlation than the low-N H I subsample. Tejos et al. (2014) have calculated CCFs depending on N H I for galaxies at z < 1 and found a positive correlation between best-fit parameters (both r 0 and γ) and N H I . They have also demonstrated a dramatic change of CCF signals depending on N H I , showing more than a factor of 10 higher CCF signals in the N H I 10 14 cm −2 sample compared to those of the N H I < 10 14 cm −2 sample. Similarly, Bielby et al. (2017) have analyzed the cross-correlation between Lyα absorption with different N H I measurements and LBGs at z = 3 and presented a positive correlation between the best-fit parameters of their CCFs and N H I . These studies have also argued for a relation between the best-fit parameters (r 0 and γ) and gas, particularly H I. Larger r 0 implies a stronger clustering of H I systems around galaxies. On the other hand, for γ, previous studies have suggested the necessity for additional baryonic physics to explain its changes with N H I .
Considering the above discussion, the signal strength and best-fit parameters (r 0 and γ) of the CCF depend on relative gas densities on Mpc scales near the galaxy. If the gas has a high density and clusters around a galaxy, the resultant CCF between Lyα absorbers and the galaxy must have a stronger signal and give larger best-fit parameters for the IGM regime. The variation of CCFs is hence attributed to the different gas density around each galaxy and the strength of the galaxy-IGM connection.

What Types of Galaxies Strongly Correlate with the IGM?
Under the ΛCDM paradigm, massive galaxies are expected to strongly correlate with the underlying dark matter (e.g., Figure 2. Best-fit parameters of the power-law fitting. Filled and open circles represent the best-fit parameters obtained from the CGM and IGM regimes, respectively. When error bars are not indicated, they are smaller than the symbol sizes. Panels (a)-(e): The parameters for the CCFs in the M å , M DH , SFR, and sSFR categories and for the CCF from all galaxies. Mo & White 2002;Zehavi et al. 2005) and hence with IGM as well. In that sense, more massive galaxies should strongly cluster in higher-density regions compared to less massive galaxies. In Section 4.1.1, we confirm that the CCF signal becomes stronger with the increasing mass (both M å and M DH ) of a galaxy. In addition, we find that the turnover radius of  within the error. A shallower slope in the M å -11 sample indicates stronger clustering of massive galaxies around a dense H I IGM.
The above results from both methods (CCF and overdensity analysis) imply that massive galaxies are strongly clustered in high-density regions in the cosmic web, while less massive galaxies have the opposite trend.
The same trend should be true for SFR subsamples by considering the star formation main sequence (e.g., Brinchmann et al. 2004;Daddi et al. 2007;Elbaz et al. 2007;Noeske et al. 2007;Speagle et al. 2014;Schreiber et al. 2015;Tomczak et al. 2016). The overall trend for the SFR samples is that galaxies with higher (lower) SFRs correlate with a higher (lower) gas density of the IGM. However, the SFR-(v) subsample does not follow the trend and even seems to reside in the highest density among all SFR samples at r = 0.3-2 h −1 Mpc (see also Section 4.1.2). Such CCF behavior can be attributed to the halo mass distribution of galaxies in SFR-(v). Figure 6 represents the halo mass distribution of galaxies in all SFR samples. They generally have a single peak, but a mild bimodal distribution is seen in SFR-(v), which has two peaks at M DH ∼ 10 11 and 10 12.5 -10 13 M e . This implies that the host dark matter halos of the SFR-(v) subsample can be roughly divided into two: one is less massive and the other is massive. Given the mass dependence of the IGM-galaxy connection, the strong CCF (i.e., high-density gas) seen at r = 0.3-2 h −1 Mpc of the SFR-(v) subsample reflects high gas density around massive halos.
Our sSFR samples do not show any obvious trends in CCF strength. We find that all three samples have comparable CCFs, although sSFR-(i) gives smaller r 0 and γ in both the CGM and IGM regimes. This can be due to the same reason as what we see in the SFR samples. Similar to the SFR-(v) sample, the sSFR-(iii) sample has a mild bimodal halo mass distribution (see Figure 7). The halo mass distribution of sSFR-(ii) may have a tail extending to higher mass. As a result, these two samples possibly reside in higher-density regions in the IGM compared to sSFR-(i).
Summarizing the above discussion, we conclude that the dark matter halo mass is the most sensitive parameter to determining the baryonic environment around galaxies. Galaxies that are hosted by massive halos are generally located in high-density gaseous environments, resulting in a stronger signal of the CCF.
Finally, we should briefly discuss the declining CCF signal at r < 0.3-0.4 h −1 Mpc around galaxies in the M å -11, M DH -13, SFR-(i), SFR-(v), and sSFR-(iii) subsamples (see Figures 3  and 4), which are hosted by the most massive halos among each category. There are several possible reasons for the lack of H I gas in the central region of massive galaxies. The first possibility is that the gas particles in the central region are blown out to the CGM/IGM by SN feedback. The second possibility is that our feedback prescription without AGN contribution is still inadequate in pushing the gas away into the CGM/IGM for massive galaxies due to their deep gravitational potential. As a result, most gas particles in the central region are consumed by star formation. We need more detailed analysis of our simulation to confirm the reason, but this is beyond the scope of this paper. We will try to address this issue in future work. Figure 5. Overdensity analysis obtained from (a) local minima and maxima of gas density field and (b) random points. The results for M å -11, M å -10, M å -9, and ALL are shown from left to right. Open and filled circles colored red or blue indicate data points from each of the four 2D LoS maps. The best-fit linear regression is shown as a black line with its errors shown by gray shading.

Photo-z versus Spec-z Data and the IGM-Galaxy Connection
In this subsection, we briefly discuss the reliability of using photo-z and spec-z data to investigate the IGM-galaxy connection. We also argue that some precautions must be taken when one applies the two analyses to observational data. We have demonstrated that the cross-correlation method succeeds in identifying variations of the CCF according to galaxy properties. The difficulty for the CCF method is the necessity of relatively accurate redshift measurements for galaxies. As we demonstrate in Appendix B, the CCF signal will be attenuated if the redshift uncertainty is large. According to the general photo-z uncertainties at z ∼ 2 in the literature, σ z = 0.05-0.1 (e.g., Muzzin et al. 2013;Laigle et al. 2016;Straatman et al. 2016), galaxies with only photometric redshift cannot be used for cross-correlation analysis. In general, however, the majority in a given galaxy sample do not have spectroscopic redshift. In that sense, the results of our CCF analysis applied to observational data are biased toward the CGM and IGM around galaxies with spec-z measurements.
On the other hand, galaxies with only photo-z can be used for overdensity analysis that examines the IGM-galaxy connection on large scales (beyond tens of h −1 Mpc). Indeed, photo-z galaxies are used as tracers of large-scale structures (e.g., Nakata et al. 2005;Blake et al. 2007;Trevese et al. 2007;Tanaka et al. 2008;Hayashi et al. 2019). However, there are still the following potential problems in using photo-z galaxies for analysis. First is the contamination of galaxies whose real redshifts are out of the redshift range for 2D LoS data. It must always happen, even if the redshift range is wider than the mean photo-z error. Second is a difficulty in statistically confirming a correlation when the cylinder volume is large (see also Appendix A). Both this study and those in the literature show the presence of an IGM-galaxy connection up to ∼10 h −1 Mpc (e.g., Adelberger et al. 2003Adelberger et al. , 2005Chen et al. 2005;Ryan-Weber 2006;Faucher-Giguère et al. 2008;Rakic et al. 2011Rakic et al. , 2012Rudie et al. 2012;Font-Ribera et al. 2013;Prochaska et al. 2013;Tejos et al. 2014;Bielby et al. 2017). In addition, according to our tests in Appendix A, we suggest that a cylinder length less than Δz = 0.01 (corresponding to 9 h −1 Mpc) might be able to capture large-scale structures in δ F . If either the cylinder length or radius is larger than the above scale, the large-scale structures traced by the cosmic web and galaxies will be attenuated, and thus both 〈δ F 〉 and Σ gal will become close to zero. In that case, the slope of the 〈δ F 〉-Σ gal relation will become indistinguishable.  Based on all these arguments, the cross-correlation method is a promising method to study the variation of the IGM-galaxy connection over a 1 h −1 Mpc scale using actual observational data. On the other hand, overdensity analysis applied to photo-z samples can probe the IGM-galaxy connection on scales beyond several tens of h −1 Mpc.

Summary
We systematically investigate the connection between galaxies and the CGM/IGM, particularly as traced by Lyα forest absorption. In this study, we use cosmological hydrodynamic simulation (Shimizu et al. 2019;Nagamine et al. 2020) and demonstrate the CGM/IGM-galaxy connection using two methods: one is cross-correlation analysis, and the other is the overdensity analysis proposed by Mukae et al. (2017). Using our simulation, we also calculate the CCFs of relative gas density (both total and H I) around galaxies. All parameters for our analyses are chosen to match the observations presented in M21. The main results of this paper are summarized below.
1. We calculate the CCFs between Lyα forest transmission fluctuation (δ F ) and galaxies as shown in Figure 1. The CCF obtained from all galaxies reproduces the one from LBGs in the literature (Adelberger et al. 2005;Bielby et al. 2017). Further investigations based on subsamples divided by the M å , M DH , SFR, and sSFR of simulated galaxies show the following trends and variations in the CCF. For the M å and M DH subsamples, we find a clear trend that the CCF signal becomes stronger with increasing galaxy mass. We also confirm that the turnover radius of CCFs becomes smaller with increasing mass (see Figure 1, bottom), indicating stronger clustering of massive galaxies around density peaks. Additionally, from the best-fit parameters of power-law fitting for the CCFs, the clustering length r 0 and slope γ are found to become longer and steeper with increasing galaxy mass in the IGM regime (r 1 h −1 Mpc; see also Figure 2). For the SFR samples, we find that they tend to have stronger signals with an increasing SFR. Such a trend for SFR samples is also linked to the mass dependence of CCFs, because M å and SFR are almost linearly related to each other through the star formation main sequence of galaxies. However, we do not identify clear trends in the CCFs of sSFR samples. 2. We measure CCFs between gas density fluctuation (d r gas and d r H I ) and galaxies in Figures 3 and 4. The overall trends of the CCFs are similar to those of CCFs in δ F except for SFR-(v) and sSFR-(iii). This indicates that the variation in δ F CCFs reflects different relative gas densities around galaxies; i.e., galaxies with higher mass and SFRs generally reside in higher-density gas and vice versa. For the SFR-(v) and sSFR-(iii) subsamples, we find the strongest CCF signal at r = 0.3-2 h −1 Mpc among all the SFR and sSFR samples. Because the two subsamples have a mild bimodal halo mass distribution with two peaks at M DH ∼ 10 11.3 M e and M DH ∼ 10 12.5 M e , their strongest CCF signals are probably due to highdensity regions where massive host halos reside. We suggest that the observed variation in the CCF is caused by the dependence of gas density (both total and H I) around galaxies.
3. Our overdensity analysis between galaxy overdensity Σ gal and mean IGM fluctuation 〈δ F 〉 is presented in Figure 5. We statistically identify anticorrelations from all subsamples of M å -11, M å -10, M å -9, and ALL. In addition, we find that their slopes decrease with increasing M å , albeit still within the error. This suggests that galaxies in the M å -11 subsample are more strongly correlated with higher-density gas than those in M å -9 in terms of their spatial distribution. 4. Considering all of our results together, we conclude that the mass, particularly the dark matter halo mass, is the most sensitive parameter to determining the Mpc-scale gas density environment around galaxies. Galaxies in massive halos tend to be clustered in higher-density regions of the cosmic web, resulting in a CCF with a higher amplitude, greater r 0 , steeper γ, and shallower anticorrelation between 〈δ F 〉 and Σ gal at r 1 h −1 Mpc.
Overall, our analyses confirm the strong connection between galaxies, dark matter halos, and the IGM, providing further support for the gravitational instability paradigm of galaxy formation within the concordance ΛCDM model. Future observations of CCF studies between galaxies, H I, and metals will provide useful information on the interaction between them and the details of feedback mechanisms that are important for the theory of galaxy formation and evolution such as galactic wind and the associated ejection of metals into the IGM.
By comparing the results of Figures 1 and 2 against predictions of linear perturbation theory, we can infer the mean bias parameters of the Lyα forest and galaxies relative to the underlying dark matter density field (e.g., Croft et al. 2016;Bielby et al. 2017;Kakiichi et al. 2018;Meyer et al. 2019). In addition we can perform cross-checks by computing such bias parameters directly from our simulation output and compare them with those inferred from the linear theory framework. Such bias parameters will further provide additional checks against the gravitational instability paradigm, and we plan to carry out such analyses in further work. 2D LoS Maps To understand the impact on actual observational data, we conduct several tests for generating 2D LoS maps and crosscorrelation analysis with the actual observational data in M21 in mind. In these appendices we briefly show our results. Note that we only calculate the CCFs at r 1 h −1 Mpc for those tests in order to directly compare them with the observational results presented in M21.
We show 2D LoS δ F maps binned by 9, 18, 45, and 90 h −1 Mpc, which correspond to Δz = 0.01, 0.02, 0.05, and 0.1, in Figure 8. Each column indicates a different redshift used for the overdensity analysis in Section 4.2. Figure 8 clearly shows that the intensity of IGM fluctuation becomes attenuated with increasing redshift width for the binning of the LoS data. In addition, the dynamic range of 〈δ F 〉 of the 2D LoS maps binned by more than 45 h −1 Mpc seems to be too narrow to differentiate the environment based on galaxy properties.

Appendix B The Impact of Redshift Measurement Uncertainties on the CCF
The cross-correlation analysis used in this study requires a spec-z sample of galaxies (see Section 5.3). However, spectroscopic redshift measurements are not always available for galaxies in photometric images. Hence, here we test the usability of photo-z galaxies by adding photo-z errors.
We randomly add redshift uncertainties with σ z 0.1 to all galaxies. Then, we recalculate the CCF using the reassigned galaxy redshift z use = z real ± σ z , where z use and z real are the redshift used to calculate the CCF and the original one in our simulation, respectively. This process is carried out 10 times. The CCF of each routine and the mean of 10 tests are shown by thin and thick black lines in Figure 9. We also carry out the same tests with σ z 0.05, σ z 0.02, and σ z 0.01. The actual distances corresponding to the σ z values are (σ z = 0.1, 0.05, 0.02, 0.01) = (90, 45, 18, 9) h −1 Mpc at 〈z〉 = 2.3. In the bottom panel of Figure 9, the ratio of the CCF from all galaxies in red (ξ org ) to the mean of the CCFs from galaxies in consideration of redshift uncertainties (x s z ) is also presented ( We find that all CCF signals become weaker with respect to the original CCF in red, though the scatter of CCF signals among 10 tests is quite small. In particular, the CCF signal becomes insignificant for the data with σ z 0.1 or 0.05. This suggests that galaxy data sets with such large photo-z errors are useless for cross-correlation analysis. However, the other two cases with σ z 0.02 or 0.01 still show some signals at the center, although they are weak. Due to the signal detection, galaxies with σ z 0.02 may be usable for calculating CCFs. Another interesting result from this test is the radius where Ξ becomes approximately unity. Within r < 40 h −1 Mpc, each sample reaches Ξ = 0 at a radius that is equivalent to the actual distance of σ z (see also vertical lines in the bottom panel of Figure 9). These results indicate that a data set with redshift uncertainties less than 1 h −1 Mpc (or σ z ∼ 0.001) would be necessary to obtain a true CCF.
Many high-z galaxies often have photo-z estimates. In the literature, such photo-z uncertainties have been evaluated as σ z = (0.007-0.021) × (1 + z), which corresponds to σ z = 0.023-0.070 at z = 2.35 (e.g., Laigle et al. 2016;Straatman et al. 2016). Considering our tests shown in Figure 9, galaxy data sets with current photo-z measurements only are not useful for cross-correlation analysis. In order to derive an accurate CCF, galaxy samples with good spectroscopic redshifts are needed. If a survey volume is not large enough, the cosmic variance must affect the CCFs (in terms of both amplitude and shape). We evaluate the effect on the CCF by limiting the volume to Δz = 0.1 and 0.05, corresponding to 91 h −1 Mpc and 45 h −1 Mpc in redshift direction. Note that because large-scale fluctuations are missed due to a limited simulation volume, we inevitably underestimate the effect of cosmic variance on large scales. We find that the resultant CCFs scatter around the original one obtained from all galaxies in the entire volume with a wide variation of the power-law γ and r 0 parameters of Equation (9). This suggests that cosmic variance also influences both the slope and the clustering length of the CCF. Therefore, when we compare the CCFs of two different galaxy populations, it would be desirable to match their redshift coverage.

Appendix D CCFs Obtained from Small Sample Size
In Appendix B, we discuss the uncertainty introduced by using photo-z data and the need for more accurate spec-z measurements for cross-correlation analysis. However, the number of galaxies with spec-z measurements is limited, and therefore the derived CCF from spec-z data may suffer from a small sample size and differ from the true signal. Thus, we carry out the following two tests in order to verify the effect of sample size on the resultant CCF.
The first test is to change the completeness of the galaxy sample. We calculate CCFs by randomly selecting 0.1%, 1%, and 10% of galaxies from the entire sample and repeat this procedure 10 times. We find that the resultant CCFs scatter around the original one. Additionally, this scatter becomes smaller with an increasing fraction and negligible in the 10% samples. Therefore, we argue that at least 1% of the total sample must have spec-z measurements to reproduce the true CCF.
Unfortunately, we do not always know the true total number of galaxies in real observations. Therefore as a second test, we examine the effect of using an extremely small sample of randomly selected 5 and 10 galaxies and repeat this step 100 times. We conduct this test twice by changing the galaxy selection method: one is completely random, while the other is to select only galaxies whose 〈δ F 〉 within 1.7 h −1 Mpc of the radius are less than −0.2. The resultant CCFs of both methods show a large dispersion around the original CCF, and the dispersion becomes smaller as the sample size increases from 5 to 10. However, intriguingly, the scatter among 100 CCFs in the latter method becomes smaller than that in the former method. This suggests that the true CCF cannot be obtained from a randomly selected, extremely small sample. On the other hand, a CCF derived from a few galaxies that are located in similar IGM densities (i.e., in a limiting 〈δ F 〉) could still reflect their IGM environments.