The Hierarchical Cosmic Web and Assembly Bias

Abstract

Accurate modeling of galaxy distributions is paramount for cosmological analyses based on galaxy redshift surveys. However, this endeavor is often hindered by the computational complexity of resolving the dark matter halos that host these galaxies. To address this challenge, we propose the development of effective assembly bias models down to small scales.


Introduction
The main goal of modern cosmology is to unveil the nature of the main components of the Universe, such as dark matter and dark energy. For this purpose, galaxy surveys like DESI [1] or EUCLID [2] aim at mapping the large-scale structure of the Universe by observing millions of galaxies over huge volumes. The central purpose of this endeavor is to measure the influence of dark energy on the expansion of the Universe, as well as the effect of dark matter on the evolution of structures. To achieve these objectives, such surveys demand a robust analysis of the observational data by means of a strong theoretical framework and galaxy mock catalogues, which are essential for estimating covariance matrices and studying systematic effects.
Mock-generation methods can be classified into two categories: those reproducing the halo distribution based on analytical and numerical approximations (PEAKPATCH, PINOCCHIO, later versions of PThalos, HALOGEN, ICE-COLA); and those reproducing the distribution of large-scale structure tracers from detailed reference calculations with effective field-level bias models (e.g., the earliest PThalos version, PATCHY, EZmocks, and BAM).
The latter category offers substantial savings in computational resources, as demonstrated in a comparative study by [27]. Moreover, it enables a detailed dissection of the contributions from individual bias components, as elucidated in [14,23]. Importantly, this methodology can likewise be extended to field-level inference studies (see, e.g., [29,30]).
In this study, our approach to modelling structure formation hinges on augmented Lagrangian Perturbation Theory (ALPT) [31], an advancement over linear LPT. ALPT refines the displacement field by decomposing it into long- and short-range components, drawing from second-order LPT (2LPT) and the spherical collapse approximation, respectively. This nuanced treatment allows for a more precise characterisation of the evolving cosmic structures within a one-step gravity solver. We could, however, rely on more accurate and more expensive approaches (e.g., eALPT [32], FASTPM [33], and COLA [34]). The findings of the present study will yield improved results when combined with more sophisticated gravity solvers, as demonstrated in [16].
The advantage of PATCHY over EZmocks and BAM is its explicit, clear analytic bias description, i.e., the description of the intricate relationship between the dark matter density field and its tracers, such as halos, galaxies, and galaxy clusters. However, PATCHY faced challenges in accurately modelling the higher-order statistics of low-mass tracers, particularly halos with masses above approximately 10^8 M_⊙ h^−1 in a cubical volume of 400 h^−1 Mpc side, despite incorporating second-order non-local bias effects [35].
The study by [23], using high-mass halos (masses ≳ 10^12 M_⊙ h^−1 in a 1.5 h^−1 Gpc side cubical volume), showed that a proper modelling of the higher-order statistics requires going at least to third-order non-local bias, and that this can be achieved through the cosmic web classification. These findings were exploited in the context of effective baryonic physics bias mapping and Lyman-α forest modelling (see [36-38]).
This paper extends previous PATCHY versions with a novel hierarchical cosmic web classification, indirectly accounting for both long- and short-range non-local bias at least up to third order. This bias description turns out to be crucial for acquiring a high-fidelity forward model, as tested on an N-body based halo catalogue with a mass cut of ≳ 10^11 M_⊙ h^−1 in a 1 h^−1 Gpc side cubical volume with variance suppression [39].
Our method offers the advantage of streamlining the parametric bias model, consolidating it into a maximum of six parameters that encompass both local deterministic and stochastic bias components. Moreover, the indirect incorporation of non-local bias terms through the hierarchical cosmic web classification enhances the model's comprehensiveness and efficiency. Our approach avoids the explicit inclusion of non-local bias terms, a practice that can lead to negative densities when the expansion is truncated at a specific order. Instead, we segment a large-scale structure catalogue into well-defined subsets, akin to dividing a halo catalogue into mass bins. This segmentation allows us to treat each subset as a distinct catalogue, simplifying the application of a straightforward local bias model to accurately reproduce its characteristics. In this way, we obtain a solid description of the tracer bias on large scales, as stated by renormalised perturbation theory [40,41], and extend it towards small scales with short-range non-local bias terms. The bias model presented in this study could potentially be combined with effective field theory based approaches [42-49] within field-level frameworks (see, e.g., [50]).
While our approach concentrates solely on number counts within a real-space mesh, it is worth noting that a companion paper introduces a sub-grid model designed to allocate positions and velocities within the cells (see the CosmoMIA method [51]).The bias method presented in this paper together with CosmoMIA are being applied to a variety of large-scale structure tracers, such as Luminous Red Galaxies (LRGs), Emission Line Galaxies (ELGs), quasars (QSOs) and Bright Galaxies (BGs), and will be presented in forthcoming publications.
The paper is structured as follows. In section 2, we briefly describe the structure formation theory we use to evolve the density field. Subsequently, we present the bias of dark matter tracers through perturbation theory (PT) and define our analytic model. At the end of that part, we hierarchically classify the cosmic web. In section 4, we present and statistically analyse the test on an N-body reference catalogue. Finally, we discuss the results in section 5.

Theory
In this section, we provide an overview of the analytical expression for the displacement field necessary to calculate the density field. We then recap the theoretical background of modelling the bias of large-scale structure tracers.

Structure formation
On large scales, we rely on Lagrangian perturbation theory (LPT) to simulate structure formation [3]. This is an approximate solution to evolve particles from the primordial density fluctuations until the desired epoch (redshift). In this theory, the action of gravity across cosmic evolution is encoded in the displacement field Ψ(q, z), which relates the initial Lagrangian positions q to the final Eulerian comoving ones x(z) through the following mapping relation:

x(z) = q + Ψ(q, z) .    (2.1)

However, it is well known in the literature that LPT approaches suffer from artificial shell-crossing, as the particles do not interact with each other [52-55]. This problem can be mitigated with Augmented LPT (ALPT) [31], a combination of LPT with the spherical collapse (SC) model [56-58], which models virialisation towards small scales. In particular, we rely on second-order LPT (2LPT) [52,53], an improvement of the linear approximation with non-local tidal field corrections. The ALPT displacement field is then written as the sum of two components,

Ψ_ALPT(q, z) = K(q, r_s) ⊛ Ψ_2LPT(q, z) + [1 − K(q, r_s)] ⊛ Ψ_SC(q, z) ,    (2.2)

where ⊛ denotes convolution and K(q, r_s) is a Gaussian kernel whose radius r_s smoothly separates both regimes; r_s takes the value 6 h^−1 Mpc in this work, as the typical uncertainty in the LPT solution is of the order of 3 h^−1 Mpc [9]. In this way, the LPT solution is gradually substituted by the spherical collapse solution, mitigating artificial shell-crossing. The long-range component is described by the linear growth factor D and the first- and second-order gravitational potential terms ϕ^(1), ϕ^(2):

Ψ_2LPT(q, z) = −D ∇_q ϕ^(1)(q) + D_2 ∇_q ϕ^(2)(q) ,    (2.3)

with D_2 ≈ −(3/7) D² the second-order growth factor. The short-range component reads

Ψ_SC(q, z) = ∇_q ∇_q^−2 { 3 [ (1 − (2/3) D ∇_q² ϕ^(1)(q))^{1/2} − 1 ] } .    (2.4)

For more details, we refer the reader to [31]. The smooth dark matter density field δ_ALPT is finally obtained by applying Cloud-In-Cell (CIC) interpolation to the final positions x(z), including tetrahedra-tesselation [59]. The ALPT method presents a better cross-correlation with N-body simulations than either LPT solutions up to third order or the spherical collapse model applied separately [31,60]. Furthermore, compared to N-body solutions, it is considerably more efficient, while reproducing the power on the very large scales at different redshifts.
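The scale-split combination of the ALPT displacement can be sketched numerically as follows. This is a minimal illustration of the Fourier-space Gaussian split on a periodic cubic mesh (function names and conventions are ours, not the actual WebON implementation):

```python
import numpy as np

def gaussian_kernel(n, boxsize, r_s):
    """Fourier-space Gaussian kernel exp(-k^2 r_s^2 / 2) on an n^3 mesh."""
    k = 2 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    return np.exp(-0.5 * (kx**2 + ky**2 + kz**2) * r_s**2)

def alpt_displacement(psi_2lpt, psi_sc, boxsize, r_s=6.0):
    """Combine long-range (2LPT) and short-range (SC) displacement fields:
    Psi_ALPT = K convolved with Psi_2LPT plus (1 - K) convolved with Psi_SC,
    with K a Gaussian of smoothing radius r_s (convolution done in k-space)."""
    K = gaussian_kernel(psi_2lpt.shape[-1], boxsize, r_s)
    psi_alpt = np.empty_like(psi_2lpt)
    for i in range(3):  # loop over Cartesian components
        long_range = np.fft.ifftn(K * np.fft.fftn(psi_2lpt[i])).real
        short_range = np.fft.ifftn((1.0 - K) * np.fft.fftn(psi_sc[i])).real
        psi_alpt[i] = long_range + short_range
    return psi_alpt
```

For r_s → 0 the kernel tends to unity everywhere and pure 2LPT is recovered, while increasing r_s lets the spherical collapse solution take over on progressively larger scales.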
Our studies have shown that the usage of improved gravity solvers, such as eALPT [32], an extension of ALPT, only makes sense when the resolution of the particle mesh is finer than about 2.5 h^−1 Mpc, since otherwise the calculation of the gravitational potential towards small scales is too inaccurate no matter which gravity solver is used (see also a study comparing ALPT with particle-mesh gravity solvers at resolutions of 2.6 h^−1 Mpc side cells, where a moderate gain is reported [16]). Since we first want to explore the maximal efficiency of the method, we restrict this paper to resolutions of ∼3.9 h^−1 Mpc. Conveniently, we opt for a mesh a factor of 2^4 coarser in each dimension than the original dark matter field, which serves as our reference simulation from the UNIT project [39]. We leave an exploration of our framework with higher resolutions based on eALPT for future work.

The bias of dark matter tracers
Galaxies, galaxy clusters and halos are biased tracers of the dark matter distribution (for a review, see [62]). The relations between different tracers and the underlying dark matter field, named bias relations, are well known to be non-linear, non-local (hence, also scale-dependent) and stochastic (see [13] and references therein). Bias relations also depend on the evolution history of the Universe since the tracer formed, the so-called assembly bias (see, e.g., [63-65]). Assembly bias is also studied through the dependence of the tracer on the cosmic web (see, e.g., [66,67]).
On large scales, where density fluctuations are small, perturbation theory (PT) can be used, following [68], to expand the relation between the tracer number density δ_tr and the dark matter density δ,

δ_tr = Σ_i b_i δ^i ,    (2.5)

where the b_i are the unknown bias parameters. This expansion, however, presents only a dependence on the local density. In order to allow for history dependence and long-range non-locality, one can also include a dependency on the peculiar velocity through ∂_i v_j and on the gravitational potential through the tidal field tensor [23,40]. In this work, we extend local bias models, including long- and short-range non-local bias components through a cosmic web classification. The short-range non-local bias is based on the second-order derivatives of the density field, Γ_ij ≡ ∂_i ∂_j δ. The dependencies we investigate can be summarised as follows,

δ_tr = F[δ, ∂_i v_j, ∂_i ∂_j ϕ, ∂_i ∂_j δ] + ϵ ,    (2.6)

where ϵ describes the stochastic uncertainty arising from the fact that the tracer distribution is a discrete sample of the underlying dark matter distribution. For convenience, following [40], the tidal field tensor can be rewritten by removing its trace,

s_ij ≡ ∂_i ∂_j ϕ − (1/3) δ^K_ij δ ,    (2.7)

where δ^K_ij is the Kronecker delta and we have used ∇²ϕ = δ. Furthermore, the cosmological principle (homogeneity and isotropy) constrains the bias parameters to be constant scalar quantities. Then, one should construct scalars from s_ij. The first-order scalar s_ii is zero by definition, and the second- and third-order terms can be constructed through the tensor contractions s² ≡ s_ij s_ji and s³ ≡ s_ij s_jk s_ki, respectively. Analogously, we can define a traceless tensor for the short-range non-local bias term,

μ_ij ≡ ∂_i ∂_j δ − (1/3) δ^K_ij ∇²δ ,    (2.8)

and its scalars μ² ≡ μ_ij μ_ji and μ³ ≡ μ_ij μ_jk μ_ki. The resulting Taylor expansion of the bias model, in Eulerian coordinates x, now has local (B_L), non-local long-range (B_NL^LR), non-local short-range (B_NL^SR), velocity (B_NL^v) and stochastic (B_ϵ) bias contributions. Following [23] and [40], the expansion can be written as

δ_tr = B_L + B_NL^LR + B_NL^SR + B_NL^v + B_ϵ
     = [c_δ^(1) δ + c_δ^(2) δ² + c_δ^(3) δ³] + [c_s^(2) s² + c_s^(3) s³] + [c_μ^(2) μ² + c_μ^(3) μ³] + B_NL^v + ϵ ,    (2.9)

where the cs are the bias factors and the superscripts (1 → 3) denote terms from first to third order. In the large-scale limit of k → 0, the
observable, renormalised, large-scale bias b_δ is given, according to [40] and up to constant numerical factors in front of each term, by the sum of bias-like terms

b_δ ≃ c_δ^(1) + [c_δ^(3) + c_s^(3) + c_st] σ² ,    (2.10)

where σ² is the variance of the dark matter field at the considered resolution, σ²_ϵ is the corresponding variance of the noise (which renormalises the stochastic term), and c_st relates to a mixed non-local bias term including velocity contributions for θ ≠ δ.
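On a mesh, the tidal scalars s² and s³ can be computed with FFTs by inverting ∇²ϕ = δ in Fourier space. The following is a minimal sketch (our own naming, assuming a periodic box and a zero-mean density contrast), not the code used in this work:

```python
import numpy as np

def tidal_scalars(delta, boxsize):
    """Return the fields s^2 = s_ij s_ji and s^3 = s_ij s_jk s_ki,
    with s_ij = (d_i d_j / nabla^2 - delta^K_ij / 3) delta."""
    n = delta.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                      # regularise the k = 0 mode
    kvec = (kx, ky, kz)
    delta_k = np.fft.fftn(delta)
    s = np.empty(delta.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            # d_i d_j nabla^{-2} delta  ->  k_i k_j / k^2 in Fourier space
            s[..., i, j] = np.fft.ifftn(kvec[i] * kvec[j] / k2 * delta_k).real
        s[..., i, i] -= delta / 3.0        # remove the trace
    s2 = np.einsum("...ij,...ji->...", s, s)
    s3 = np.einsum("...ij,...jk,...ki->...", s, s, s)
    return s2, s3
```

Since s_ij is symmetric, s² is a sum of squares and therefore non-negative in every cell.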

Method
Equation (2.9) encapsulates the principal bias terms of the tracer density field δ tr within a Taylor expansion.However, employing truncated versions of an infinite expansion at a practical level is non-trivial and carries inherent risks.
As depicted in (2.10), accurate reproduction of the large-scale bias necessitates accounting for at least third-order bias contributions. Yet, this does not offer guidance on transitioning from large to small scales to achieve a comprehensive field-level bias description.
As elucidated in [23], the contributions of velocity bias primarily impact small scales and can generally be disregarded for the majority of baryon acoustic oscillation (BAO) and redshift-space distortion (RSD) studies. Consequently, we defer an investigation into extending our bias model to incorporate terms where θ ≠ δ to future endeavors. Instead, we focus on terms where information from the velocity shear is inherently encoded in the tidal field tensor.
Here, we adopt the approach proposed in the aforementioned study and introduce, for the first time in the context of parametric bias models of discrete tracer distributions, a cosmic web classification to indirectly model non-local bias terms. We extend this methodology to also model short-range non-local bias contributions, as elaborated below.
This approach has several advantages, as it permits us to ensure positive-definite tracer density fields and to directly or indirectly include all bias terms (B_L, B_NL^LR, B_NL^SR, B_NL^v and B_ϵ). Our method consists of applying a deterministic (section 3.1) and stochastic (section 3.2) bias model to each of the different regions of the cosmic web, which is hierarchically classified in terms of the gravitational potential ϕ and the density field δ (section 3.3).
The deterministic bias consists, firstly, of a power law [13,69,70] and, secondly, of a threshold bias [13,71,72], describing the non-linear behaviour and allowing us to suppress low-density regions where not enough mass is available to form halos. We extend this picture with a suppression of density peaks, which some galaxies tend to avoid as a result of halo exclusion (see, e.g., [73]).
Finally, the connection between the expected number counts and the discrete realisation of large-scale structure tracers is established through a likelihood, accounting for stochastic biasing [74,75]. As shown by [76], a non-vanishing correlation function introduces deviations from Poissonity [77,78]. This affects the statistics at scales below the mesh resolution and needs to be modelled with non-Poisson likelihoods [13,35,79,80].

Local deterministic bias
Following [13], we can generally write the expected number counts of a certain tracer ⟨ρ_tr⟩ in a given volume element dV as a function of the underlying dark matter density δ,

⟨ρ_tr⟩ = f_tr B(δ) ,    (3.1)

where B(δ) is the deterministic bias relation and f_tr is the normalisation factor, which controls the number density and can be described as

f_tr = N̄_tr / ⟨B(δ)⟩ ,    (3.2)

i.e., by requiring the tracer density field to have the number density N̄_tr of the reference sample. In particular, we define the bias relation as the product of a power law, describing the non-linear behaviour, an exponential damping of high-density regions (ϵ > 0) and a low-density threshold (ϵ′ < 0),

B(δ) = ρ^α exp[−(ρ/ρ_ϵ)^ϵ] exp[−(ρ/ρ′_ϵ)^{ϵ′}] ,  with ρ ≡ 1 + δ ,    (3.3)

where we have extended the model presented in [13] to also account for high-density suppression.
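The deterministic relation and its normalisation can be sketched in a few lines of code (naming is ours; the optional exponential factors are switched on only where a region requires them):

```python
import numpy as np

def deterministic_bias(delta, alpha, rho_e=None, eps=None, rho_ep=None, eps_p=None):
    """B(delta) = rho^alpha * exp[-(rho/rho_e)^eps] * exp[-(rho/rho_ep)^eps_p]."""
    rho = 1.0 + delta
    B = rho**alpha
    if rho_e is not None:     # exponential damping of high densities (eps > 0)
        B = B * np.exp(-((rho / rho_e) ** eps))
    if rho_ep is not None:    # low-density threshold (eps_p < 0)
        B = B * np.exp(-((rho / rho_ep) ** eps_p))
    return B

def expected_counts(delta, nbar_ref, cell_volume, **bias_params):
    """lambda_i = f_tr B(delta_i) dV, with f_tr fixing the mean number density
    of the biased field to the reference value nbar_ref."""
    B = deterministic_bias(delta, **bias_params)
    f_tr = nbar_ref / B.mean()             # normalisation factor
    return f_tr * B * cell_volume
```

By construction, the mean of the returned field equals the reference number density times the cell volume, whatever bias parameters are chosen.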

Stochastic bias
The distribution of halos on a dark matter density field defined on a mesh, characterised by a lower resolution than that needed to resolve the halos, leads to a stochastic process. This can be described by the probability P(N_i|λ_i) of finding a given number of halos N_i in a given cell i with volume dV, where λ_i = ⟨ρ_tr⟩_i dV is the average number of objects in the cell. The Poisson distribution, P(N_i|λ_i) = (λ_i^{N_i}/N_i!) e^{−λ_i}, can account for the discrete nature of galaxies [81,82]. However, it only approximately holds in intermediate-density regions; in general, we find over-dispersion. Non-Poissonian distributions, such as the Negative Binomial (NB), were introduced to account for this in field-level bias modelling [13,30,50]:

P(N_i|λ_i, β) = [Γ(β + N_i) / (N_i! Γ(β))] [β/(β + λ_i)]^β [λ_i/(β + λ_i)]^{N_i} .    (3.4)

The Poisson distribution is recovered in the limit β → ∞, for which the deviation from Poissonity tends to one [16]. One can also model a density-dependent β parameter to account for arbitrary variance expressions (see [35]). In this work, we model the stochastic bias by drawing samples of the deterministic bias model (3.3) from the NB distribution.
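Sampling halo counts from the NB distribution is conveniently done through its Gamma-Poisson mixture representation; a minimal numpy sketch (naming ours):

```python
import numpy as np

def sample_counts(lam, beta, rng):
    """Draw N ~ NB with mean lam and variance lam + lam^2 / beta.
    Mixture form: N ~ Poisson(Lambda) with Lambda ~ Gamma(shape=beta, scale=lam/beta);
    beta -> infinity recovers the Poisson limit."""
    lam = np.asarray(lam, dtype=float)
    return rng.poisson(rng.gamma(shape=beta, scale=lam / beta))
```

Passing the per-cell deterministic expectation λ_i as `lam` yields a discrete over-dispersed tracer realisation on the mesh.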

Non-local deterministic bias: The hierarchical cosmic web classification
Numerical simulations have shown that the gravitational growth of the initial density perturbations leads to a characteristic pattern of structures, dubbed the cosmic web [83,84], very similar to the one observed in the large-scale distribution of galaxies (see, e.g., [85]). Most of the Universe is composed of low-density regions (voids) surrounded by filaments of matter, generating a web pattern. Inspired by the formation of cosmic sheet-like pancake structures [86], [87] proposed a theoretical criterion to classify the different regions of the cosmic web in terms of the particle motion inside the gravitational potential of a dark matter halo, ẍ = −∇ϕ.
The center of mass of a halo is an extremum of the potential, i.e., ∇ϕ = 0, and hence the equation of motion can be linearised there. At these points, the particle motion is completely determined by the tidal field tensor T_ij ≡ ∂_i ∂_j ϕ and, more concretely, by its eigenvalues λ_1 > λ_2 > λ_3. Analogously to the Zel'dovich pancake, we consider regions with three positive eigenvalues as those where gravitational collapse causes matter inflows. On the other hand, we consider orbits unstable (expanding regions) when the three eigenvalues are negative. Attending to this argument, one can define four regions in the cosmic web:

• Knots: λ_1, λ_2, λ_3 > λ_th ,
• Filaments: λ_1, λ_2 > λ_th ≥ λ_3 ,
• Sheets: λ_1 > λ_th ≥ λ_2, λ_3 ,
• Voids: λ_th ≥ λ_1, λ_2, λ_3 .

The dimensionless threshold eigenvalue λ_th, allowing for λ_th ≠ 0, was introduced by [88] to reduce the dependence on the smoothing scale, mesh resolution and mass assignment scheme. This classification can be generalised to the full density field rather than being limited to the centres of halos. As it is essentially defined by the gravitational potential, we dub this classification the ϕ-web (see figure 1). It has been used in simulations both to study the properties of dark matter halos, galaxies and intergalactic gas [36,67,87,89] and to generate halo and galaxy mock catalogues [21,35,90]. Furthermore, it is used in observational analyses [91-93].
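The ϕ-web classification can be computed directly from a gridded density field by solving for the tidal tensor in Fourier space and counting eigenvalues above λ_th. A minimal sketch on a periodic mesh (naming and conventions ours, not the WebON implementation):

```python
import numpy as np

def tidal_eigenvalues(delta, boxsize):
    """Eigenvalues of T_ij = d_i d_j phi with nabla^2 phi = delta,
    sorted in decreasing order (lambda_1 >= lambda_2 >= lambda_3)."""
    n = delta.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                      # regularise the k = 0 mode
    kvec = (kx, ky, kz)
    delta_k = np.fft.fftn(delta)
    T = np.empty(delta.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            # d_i d_j nabla^{-2} delta  ->  k_i k_j / k^2 in Fourier space
            T[..., i, j] = np.fft.ifftn(kvec[i] * kvec[j] / k2 * delta_k).real
    return np.sort(np.linalg.eigvalsh(T), axis=-1)[..., ::-1]

def phi_web(delta, boxsize, lam_th=0.05):
    """Count eigenvalues above lam_th: 3 = knot, 2 = filament, 1 = sheet, 0 = void."""
    lam = tidal_eigenvalues(delta, boxsize)
    return (lam > lam_th).sum(axis=-1)
```

The δ-web sub-classification follows the same pattern with the density field tensor Γ_ij = ∂_i∂_jδ in place of T_ij (i.e., multiplying δ_k by −k_i k_j instead of k_i k_j / k²).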
The gravitational potential is a key ingredient in both the bias model (2.9) and the cosmic web classification. The distribution of halos can be associated with the various regions of the cosmic web by considering the invariants of the tidal field tensor. These invariants, in turn, directly describe the non-local bias terms within halo perturbation theory (for details see [23] and references therein), and are:

I_1 = λ_1 + λ_2 + λ_3 ,  I_2 = λ_1 λ_2 + λ_2 λ_3 + λ_3 λ_1 ,  I_3 = λ_1 λ_2 λ_3 .

Consequently, the invariants can be directly linked to the different regions of the cosmic web (for simplicity, and without loss of generality, for λ_th = 0):

• Knots: λ_1, λ_2, λ_3 > 0 ,
• Filaments: λ_1, λ_2 > 0 ≥ λ_3 ,
• Sheets: λ_1 > 0 ≥ λ_2, λ_3 ,
• Voids: 0 ≥ λ_1, λ_2, λ_3 .

It emerges that both the long-range non-local bias (2.9) and the cosmic web classification are determined by the eigenvalues of the tidal field. In particular, the invariants can be related to the different bias components in halo perturbation theory (see [23]): I_1 = ∇²ϕ = δ, while s² = (2/3) I_1² − 2 I_2 and s³ = (2/9) I_1³ − I_1 I_2 + 3 I_3. The cosmic web classification based on the gravitational potential (the ϕ-web) thus provides information on I_1 ∼ δ, I_2 and I_3, and hence on δ, s², and s³. We therefore propose applying the deterministic (3.3) and stochastic (3.4) bias models to the different ϕ-web regions to indirectly model the long-range non-local bias. Additionally, since the short-range non-local bias (2.9) is determined by the density field tensor, Γ_ij = ∂_i ∂_j δ, we suggest sub-classifying the ϕ-web regions based on the eigenvalues of the density field tensor. Figure 1 illustrates in the upper panels the density field (left) alongside the ϕ-web classification (middle) and the δ-web classification (right). A visual examination of figure 1 reveals the correlation between both classifications and the density field, with the δ-web providing information on small scales. The lower panels in figure 1 show how the gravitational potential traces larger scales (left), how the ϕ-web classification (middle) is quite stable against moderate variations in λ_th, and how the δ-web classification is very noisy when λ_th is set to zero. We have employed an information theory-based approach to represent the ALPT density field using only four distinct values, corresponding to the different cosmic web regions, in order to study the cross-correlation with the halo reference catalogue and obtain a criterion for determining the most suitable value of the λ_th parameter. Following the eigenvalue ordering, we assign the numbers 4, 3, 2 and 1 to, respectively, knots, filaments, sheets and voids, for both the ϕ-web and the δ-web classification. An inverted order of numbers yields anticorrelated fields. The specific values themselves are not critical; what is important is ensuring that the difference between subsequent regions remains consistent, to prevent bias toward any particular region, while preserving the hierarchical order. With this definition, we can define a density contrast based only on those four values for both the long-range, δ_ϕ-web, and short-range, δ_δ-web, components. The cross-correlations between these cosmic-web based fields and the density field are shown in figure 2. The effect of the λ_th parameter on these density fields is to make tighter structures with increasing values of λ_th, but always close to zero. Above 0.2, the filamentary structure starts to disappear and the voids occupy most of the volume.
We propose selecting the threshold value λ_th in terms of the cross-correlation between the cosmic web density fields δ_ϕ-web and δ_δ-web, defined from the ALPT density field, and the UNIT halo reference catalogue (section 4). After exploring a wide range of threshold values, we have determined that λ_th = 0.05 is the most suitable choice. In figure 2, we display the correlation for λ_th = 0.05, along with the lower (λ_th = 0.0) and upper (λ_th = 0.1) limits.
The prominent feature of this plot is that, on small scales, the δ-web exhibits a higher correlation with the reference halos than the ϕ-web. This observation supports the notion that the δ-web classification encompasses information on small scales, effectively modelling the short-range non-local behaviour. Across different thresholds, the cross-correlation with the ϕ-web remains relatively consistent at both large and small scales, with a slight increase in correlation at intermediate scales for higher λ_th values (up to 0.1, beyond which it starts to decrease). However, the correlation with the δ-web displays significant differences between thresholds: at large and intermediate scales, the correlation for λ_th = 0.0 is notably lower than in the other cases. Across almost all scales, the δ-web with λ_th = 0.05 demonstrates the highest correlation with the reference catalogue. In light of these findings, we have opted to utilise λ_th = 0.05 in this study.
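The selection criterion relies on the shell-averaged cross-correlation coefficient C(k) = P_ab(k)/√(P_aa(k) P_bb(k)). A minimal estimator sketch (naming ours, assuming two fields on the same periodic mesh):

```python
import numpy as np

def cross_correlation(field_a, field_b, boxsize, nbins=16):
    """Shell-averaged C(k) between two 3D fields; returns bin centres and C(k)."""
    n = field_a.shape[0]
    fa = np.fft.fftn(field_a - field_a.mean())
    fb = np.fft.fftn(field_b - field_b.mean())
    k = 2 * np.pi * np.fft.fftfreq(n, d=boxsize / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    pab = (fa * np.conj(fb)).real.ravel()     # cross-power per mode
    paa = (np.abs(fa) ** 2).ravel()           # auto-power of field a
    pbb = (np.abs(fb) ** 2).ravel()           # auto-power of field b
    edges = np.linspace(0.0, kmag.max() + 1e-12, nbins + 1)
    idx = np.digitize(kmag, edges) - 1
    kc, C = [], []
    for b in range(nbins):
        m = idx == b
        if m.any() and paa[m].sum() > 0 and pbb[m].sum() > 0:
            kc.append(kmag[m].mean())
            C.append(pab[m].sum() / np.sqrt(paa[m].sum() * pbb[m].sum()))
    return np.array(kc), np.array(C)
```

For identical input fields, C(k) = 1 in every populated shell; decorrelation towards small scales shows up as C(k) dropping below unity at high k.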
Moreover, figure 3 depicts the hierarchical cosmic web classification scheme for the halo reference catalogue used for calibration in section 4. In the subsequent section, we demonstrate how progressively incorporating ϕ-web and δ-web information significantly enhances the 1-, 2-, and 3-point statistics, culminating in the reproduction of the reference catalogue with remarkable accuracy.
As a consequence of our hierarchical cosmic web classification, the various quantities related to the local deterministic and stochastic bias components have to be computed for each sub-region. In particular, the bias parameters depend on the cosmic web region, denoted by the subscript cw: {α_cw, β_cw, ρ_ϵ,cw, ϵ_cw, ρ′_ϵ,cw, ϵ′_cw}, together with the corresponding normalisation f_tr,cw.

Calibration on N-body halo catalog
In this section, we present the application of the method to generate halo mock catalogues.

Reference catalog: UNIT simulation
We employ a gravity-only simulation from the UNIT project, specifically a fixed-amplitude run denoted as UNITSIM1 (U1) [39], which is consistent with the cosmological parameters of the Planck Collaboration 2016 [94], i.e., Ω_m = 0.3089, h ≡ H_0/100 = 0.6774, n_s = 0.9667 and σ_8 = 0.8147 (see table 4 in [94]). The simulation tracks the dynamical evolution of 4096³ dark matter (DM) particles within a periodic box of 1 h^−1 Gpc on a side, starting at scale factor a ≡ 1/(1 + z) = 0.01 (z = 99) with initial conditions generated with FastPM [95]. The DM particles undergo evolution up to redshift z = 0 with the particle-mesh and tree algorithms implemented in the GADGET code [96]. This setup ensures a mass resolution of 1.2 × 10^9 M_⊙ h^−1. Halos were identified with the phase-space halo finder ROCKSTAR [97]. The CONSISTENT TREES algorithm was applied on top of the ROCKSTAR catalogues to enhance the consistency of merger histories through the implementation of dynamic tracking of halo progenitors (for details, please refer to [98]). As for the reference catalogue used in this research, we focus on the halo distribution at z = 1.032 with spherical overdensity masses of M_200c ≳ 10^11 M_⊙ h^−1. This criterion agrees with the expected mass cut at which dark matter halos can host Emission Line Galaxies (ELGs) (as detailed in studies such as [99]). For this reason, we have chosen a halo mass cut of 10^11 M_⊙ h^−1, yielding a reference catalogue containing a total of 8,278,311 distinct halos.

Mock catalog
The first step involves running ALPT with the WebON code [100] in a cubical volume of 1 h^−1 Gpc side to z = 1.032. To this end, we employ a down-sampled version of the original initial conditions (corresponding to the UNITSIM1 simulation, relying on 4096³ cells and particles) on a mesh of 256³ cells and particles. This results in a cell side resolution of approximately 3.9 h^−1 Mpc. We consistently use the same cosmology as UNITSIM1. We define the dark matter density contrast after applying the CIC mass assignment scheme combined with tetrahedra-tesselation [59]. We then use the resulting dark matter field to calibrate the bias parameters. Subsequently, we execute 100 realisations with random initial conditions to ensure a robust statistical analysis.
Furthermore, from the WebON code, we obtain the ϕ-web and δ-web classifications for the ALPT dark matter density field using a threshold value of λ_th = 0.05. This classification is also applied to the halo reference catalogue to determine the number density in each cosmic web region. The classification scheme is illustrated in figure 3, where the two numbers V_fr and N_fr represent the volume and number fractions, respectively, characterising each region of the cosmic web.
In the second step, we apply the bias model described in (3.3) to each of the individual regions. Although initially there are six possible free parameters, {α, β, ρ_ϵ, ϵ, ρ′_ϵ, ϵ′}, we find that we can successfully fit most of the individual regions using only two parameters, {α, β}. For some regions, we require four parameters by incorporating the exponential damping of high-density regions, but in no case do we need all six parameters. The number of parameters used in each region is labeled as n_p in figure 3. In this work, in contrast to [13], we do not need the low-density threshold bias term of (3.3). This is accounted for by the short-range cosmic web classification, which leads to varying bias parameters and number densities in each region and controls the emergence of haloes in expanding low-density regions.
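Schematically, this second step applies an independently parametrised local bias to each of the 16 hierarchical regions; a sketch (naming ours) combining a power-law bias with a per-region normalisation:

```python
import numpy as np

def region_expected_counts(delta, regions, params, nbar_ref, cell_volume):
    """Apply a local power-law bias with per-region parameters.
    regions:  integer label per cell (e.g. 0..15 for the 16 phi/delta-web regions);
    params[r]: dict of bias parameters for region r (here just {'alpha': ...});
    nbar_ref[r]: reference tracer number density measured in region r."""
    rho = 1.0 + delta
    lam = np.zeros_like(delta)
    for r, p in params.items():
        m = regions == r
        B = rho[m] ** p["alpha"]
        f_tr = nbar_ref[r] / B.mean()      # per-region normalisation
        lam[m] = f_tr * B * cell_volume
    return lam
```

Each region thus reproduces its own reference number density by construction, which is the segmentation-as-catalogues idea described in the introduction.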
We are conducting separate studies to investigate which parameters are relevant for each galaxy type.

Results
Figure 4 depicts a visual comparison between the reference and the mock halo catalogue.The disparities observed stem from stochasticity induced by the various realisations below the scale of the down-sampled field.Nevertheless, both fields exhibit striking similarities, and the large-scale structure is clearly discernible and consistent in both panels.
A quantitative analysis is presented in figure 5, where each row illustrates the 1-, 2- and 3-point statistics or, equivalently, the probability density function PDF (left), the power-spectrum P(k) (middle), and the reduced bispectrum Q(θ_12) (right) for the various bias models considered in this study. We progressively include more bias terms from the top to the bottom rows. The summary statistics corresponding to the reference catalogue are represented with black solid lines or black dots with corresponding error bars. The mock catalogue mean over 100 realisations is represented with the red dashed line, and the red shaded area corresponds to the standard deviation region.
The PDF [101], the power-spectrum [102] and the bispectrum [103,104] are powerful tools for extracting cosmological information from the large-scale structure of the Universe by constraining cosmological parameters. The PDF is understood as the number of cells containing a given number of objects. The Fourier-space 2- and 3-point functions, i.e., the power-spectrum and bispectrum, respectively, are defined as

⟨δ(k) δ(k′)⟩ = (2π)³ δ_D(k + k′) P(k) ,  ⟨δ(k_1) δ(k_2) δ(k_3)⟩ = (2π)³ δ_D(k_1 + k_2 + k_3) B(k_1, k_2, k_3) ,

where δ_D is the Dirac delta and the ensemble average stands for k-shell averages, while the reduced bispectrum is defined as

Q(θ_12) = B(k_1, k_2, k_3) / [P(k_1) P(k_2) + P(k_2) P(k_3) + P(k_3) P(k_1)] .

We have focused on a particular configuration of the reduced bispectrum, k_2 = 2 k_1 = 0.2 h Mpc^−1 as a function of the angle θ_12 between k_1 and k_2, which is typically studied, as it is relevant to BAO and RSD analyses. We are using larger-volume simulations based on the AbacusSummit [105] simulation suite to thoroughly study a larger variety of configurations for halos and for each galaxy-type tracer relevant to the DESI and EUCLID collaborations. Such studies will be presented in forthcoming papers. As outlined in previous studies [16,38], the determination of the bias parameters is performed jointly by maximising the likelihood of the 1-point PDF and the power-spectrum. This joint optimisation is crucial, as the skewness of the PDF is constrained by the bispectrum, aiding in breaking degeneracies [14]. In particular, we determine the optimal values of our parameters by sampling their posterior distribution with the affine-invariant emcee Markov Chain Monte Carlo (MCMC) sampler [106,107]. The total log-likelihood is given by

−2 ln L = −2 [ln L_P(k)(ref|θ) + ln L_PDF(ref|θ)] ,

where "ref" denotes the reference, and L_P(k)(ref|θ) and L_PDF(ref|θ) are Gaussian likelihoods (see also Eqs. 6-9 in [16]). We assume no correlation between the PDF and P(k). Also, we assume diagonal covariance matrices, with zero off-diagonal components and diagonal elements given by σ²_P(k) = 4π² P²_ref(k)/(V_box k² Δk) and σ²_PDF = N_n (with N_n the number of cells containing n halos) for the power-spectrum and the PDF, i.e., Gaussian and Poisson uncertainties, respectively.
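The joint log-likelihood can be sketched as follows (naming ours): a Gaussian χ² term for the power-spectrum with the variance given above, plus a Poisson-variance Gaussian term for the PDF. The guard against empty PDF bins is our own addition for numerical safety:

```python
import numpy as np

def log_likelihood(pk_model, pk_ref, k, dk, V_box, pdf_model, pdf_ref):
    """-2 ln L = chi^2_P(k) + chi^2_PDF with diagonal covariances."""
    var_pk = 4.0 * np.pi**2 * pk_ref**2 / (V_box * k**2 * dk)   # Gaussian variance
    chi2_pk = np.sum((pk_model - pk_ref) ** 2 / var_pk)
    var_pdf = np.maximum(pdf_ref, 1.0)   # sigma^2_PDF = N_n; floor avoids 0-division
    chi2_pdf = np.sum((pdf_model - pdf_ref) ** 2 / var_pdf)
    return -0.5 * (chi2_pk + chi2_pdf)
```

This function can be passed directly to an affine-invariant sampler such as emcee, with the bias parameters entering through the model prediction.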
The power-spectrum likelihood assumes a conservative, overestimated variance, since the reference and the calibration catalogues share the same fixed-amplitude and down-sampled initial conditions. However, one has to consider that the down-sampling process introduces stochastic bias, which has some impact on the power-spectrum at low modes. In fact, the true variance is lower than the one obtained when additionally allowing for random initial conditions. Nonetheless, we have checked that this likelihood does not introduce large-scale biases (see the calibration dotted lines in the lowest panel of figure 5). The calibration shows a better agreement in the power-spectrum than the mean of one hundred random realisations.
Through our approach, it is evident how we progressively enhance the fit to the 1-, 2-, and 3-point statistics. With the simplest model, we are far from reproducing the reference catalogue: there are substantial deviations in the PDF and the bispectrum, and the power-spectrum fails particularly towards high wave numbers k (small scales).
The second row of plots demonstrates how incorporating a threshold bias (3.3) improves the statistics, notably by reducing the significant deviation in the bispectrum. However, the statistics are still not well reproduced. Therefore, we decided to incorporate information on the different cosmic web environments (knots, filaments, voids, and sheets) by applying the bias model individually to each region rather than directly to the entire catalogue. This leads to significant improvements (third row): the PDF is much closer to the reference, the power-spectrum is reproduced within a 5 percent deviation up to k = 0.3 h Mpc⁻¹, and only the wings of the bispectrum are not entirely compatible with the UNIT simulation catalogue.
Despite these improvements, we aim for percent-level accuracy and ultimately achieve it by incorporating small-scale information through the hierarchical cosmic web classification. Now, the four regions of the ϕ-web are subclassified based on the density field tensor, which, as demonstrated in section 3.3, indirectly models the short-range non-local bias. The bias model (3.3) is now applied to each of the 16 regions, and the statistical result of combining them to create the entire catalogue is shown in the fourth row of figure 5. The PDF is now reproduced with 5 percent accuracy, the power-spectrum is compatible within 2 percent deviations almost up to k = 0.8 h Mpc⁻¹, and the bispectrum, including the wings, is reproduced within the error bars.
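The 16-region hierarchical classification can be sketched as follows. This is a simplified illustration on a periodic grid, assuming the standard T-web criterion (counting eigenvalues above λ_th; 3/2/1/0 correspond to knots/filaments/sheets/voids); function names and unit conventions are ours, not those of the released code.

```python
import numpy as np

def hessian_eigenvalues(field):
    """Eigenvalues of the Hessian of a periodic 3D scalar field via FFT."""
    n = field.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    kvec = (kx, ky, kz)
    fk = np.fft.fftn(field)
    H = np.empty(field.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            # d^2 f / dx_i dx_j  ->  -k_i k_j f(k) in Fourier space
            H[..., i, j] = np.fft.ifftn(-kvec[i] * kvec[j] * fk).real
    return np.linalg.eigvalsh(H)  # shape (..., 3), ascending order

def solve_potential(delta):
    """Solve nabla^2 phi = delta on a periodic grid (constants absorbed)."""
    n = delta.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0          # avoid division by zero; zero mode set below
    phi_k = -np.fft.fftn(delta) / k2
    phi_k[0, 0, 0] = 0.0
    return np.fft.ifftn(phi_k).real

def hierarchical_web(delta, lam_phi=0.05, lam_delta=0.05):
    """16-region label per cell: 4 phi-web classes (eigenvalues of the tidal
    tensor, i.e. the Hessian of the potential) times 4 delta-web classes
    (eigenvalues of the density Hessian). The class is the number of
    eigenvalues above the threshold: 3 knot, 2 filament, 1 sheet, 0 void."""
    phi_class = np.sum(hessian_eigenvalues(solve_potential(delta)) > lam_phi, axis=-1)
    delta_class = np.sum(hessian_eigenvalues(delta) > lam_delta, axis=-1)
    return 4 * phi_class + delta_class  # integer label in 0..15
```

Each cell then carries a label in 0..15 combining its large-scale (ϕ-web) and short-range (δ-web) environment.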
We find that the low-density threshold bias, previously required in studies utilising the PATCHY code, becomes unnecessary. This is because the short-range δ-web classification already provides a sufficiently comprehensive description of halo formation physics at the scales analysed in our work. As a consequence, the hierarchical cosmic web classification allows us to dramatically reduce the number of parameters n_p used in each region, so that the total number of parameters employed in our study is 32 instead of 96 (= 16 regions × 6 parameters), as indicated in figure 5.
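The per-region treatment can be sketched as below. The power-law form of the deterministic mean and the parameter names are illustrative assumptions, not the exact model of equation (3.3); the point is that each region label carries its own small set of bias parameters.

```python
import numpy as np

def apply_bias_per_region(delta, labels, params, rng=None):
    """Apply a deterministic bias model independently to each cosmic web
    region, then draw stochastic counts around the resulting mean field.
    `params` maps region label -> (nmean, alpha); the deterministic mean in
    a cell is nmean * (1 + delta)**alpha (an illustrative power law).
    The Poisson draw stands in for the full stochastic model."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.zeros_like(delta, dtype=float)
    for region, (nmean, alpha) in params.items():
        mask = labels == region
        # clip to keep the mean non-negative in deeply underdense cells
        mean[mask] = nmean * np.clip(1.0 + delta[mask], 0.0, None) ** alpha
    return rng.poisson(mean)
```

Regions with few cells can share parameters, which is how the total parameter count is reduced from 96 to 32.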
We study the goodness of fit in terms of the χ² test, as shown in table 1, quantifying the fits shown in figure 5. Note that the bispectrum is not used in the fit, but is included for completeness of the analysis. A decrease in χ², or equivalently an improvement in the fit accuracy, is observed as more bias terms are progressively included from the first to the fourth row. Important deviations in the PDF, the power-spectrum and the bispectrum persist in the fit until the non-local bias is included through the hierarchical cosmic web, leading to a remarkable final result in the three statistics, χ² ∼ 0, with rigorous accuracy in the two- and three-point functions.
Given that we use the same down-sampled initial conditions, we expect all summary statistics to be very close to the reference catalogues on large scales. Strictly speaking, the reference comes from fixed-amplitude initial conditions, so the noise of the reference is difficult to quantify. Hence, the chi-squared values do not faithfully quantify how well the mock matches the reference; instead, they serve as quantitative indications of how much the clustering statistics improve with the different statistics of the density field.
It should be emphasised once more that the strength of this approach lies in the ability to treat each subset of the halo catalogue, corresponding to a particular sub-region of the cosmic web, as an independent catalogue. This parallels the practice in clustering analyses where galaxies are divided into categories such as blue and red galaxies. The accuracy of our approach is undoubtedly constrained by the availability of sufficient statistics to effectively constrain the bias parameters of each sub-region.

Table 1. Goodness of fit in terms of the χ² parameter for the PDF, the power-spectrum and the reduced bispectrum in each of the rows shown in figure 5.
Nonetheless, this approach is less susceptible to over-fitting compared to machine learning techniques, as the model (and its degrees of freedom) is constrained by its analytical formulation for both the deterministic and stochastic bias components. One crucial advantage over the BAM approach is that it does not require a kernel to reproduce the power-spectrum. This capability enables us to efficiently apply our field-level bias mapping technique in redshift shells to a dark matter lightcone, such as the ones we have computed with the WebON code.
We are exploring in separate studies transformations of the density field which improve the description of the cosmic web towards smaller scales and allow us to explore such scales in the 2- and 3-point statistics.

Summary and discussion
In this study, we have introduced an innovative approach for generating mock catalogues which extends the deterministic and stochastic bias model by incorporating a hierarchical classification of the cosmic web. This augmentation allows us to effectively model non-local bias, ensuring the positive definiteness of the expected number counts of tracers.
By integrating information from the cosmic web classification, specifically the ϕ-web and δ-web, we significantly enhance the fidelity of our bias model. This comprehensive framework enables us to capture the influence of the large-scale structure on tracer distributions and to accurately reproduce statistical properties such as the probability density function, power-spectrum and bispectrum.
One of the most remarkable findings of this study is the simplification of the bias model as we incorporate the hierarchical cosmic web classification. We observe that each region can typically be modelled with just two parameters, and in some instances only four parameters are required. This reduction in the number of parameters not only streamlines the modelling process but also enhances the interpretability and efficiency of our approach. It underscores the effectiveness of leveraging cosmic web information to capture the complexity of large-scale structure with minimal complexity in the bias model.
Our method offers several key advantages, including its computational efficiency, high accuracy in reproducing reference statistics up to third order, and versatility in its application to different tracers of large-scale structure. In comparison to automatic non-parametric learning methods, e.g., [21], our approach offers the advantage of being less susceptible to overfitting. This is because our bias model, including the stochastic component, is analytical and determined by only a few parameters.
However, further investigation is needed in future work to verify the robustness of the model assumptions, such as the specific deviation from Poissonity set by the negative binomial distribution.This ongoing research will help to ensure the reliability and validity of our approach across a range of cosmological scenarios and observational data sets.
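For concreteness, the negative-binomial stochastic component mentioned above can be sampled as in the sketch below. The β over-dispersion convention, Var = μ + βμ², is one common parameterisation and is our illustrative choice, not necessarily the one adopted in the paper.

```python
import numpy as np

def sample_counts_negative_binomial(mean, beta, rng=None):
    """Draw tracer counts per cell from a negative binomial with the given
    deterministic mean and over-dispersion Var = mean + beta * mean^2.
    beta -> 0 recovers the Poisson limit (switched explicitly here)."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    if beta <= 0:
        return rng.poisson(mean)
    r = 1.0 / beta            # NB shape ("number of failures") parameter
    p = r / (r + mean)        # NB success probability, gives E[N] = mean
    return rng.negative_binomial(r, p)
```

Checking the first two moments of the draws against μ and μ + βμ² is a quick way to verify the parameterisation before calibrating β against a reference catalogue.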
Also, we conducted a series of 200 large-volume light-cone dark matter simulations using the WebON code within cubical volumes of (10 h⁻¹ Gpc)³ [100]. We are currently preparing a series of papers presenting mock galaxy catalogues generated with this method for various tracers of the large-scale structure. We also plan to explore the 4-point statistics of our mock catalogues. Previous studies have shown that accurately reproducing the 1-, 2-, and 3-point statistics lays the foundation for reproducing covariance matrices [108]. Therefore, we believe that our current approach is on the right track towards accuracy in the higher-order statistics. The hierarchical cosmic web classification presented in this study could potentially be applied to environmental studies of galaxies and of assembly bias. Methods for assigning halo or galaxy properties can benefit from the findings of this paper, extending previous approaches with the short-range classification scheme (see [90]). The combination of this bias model with effective field theory can be as straightforward as considering different local bias parameters in the various hierarchical cosmic web regions.
Looking ahead, we anticipate further enhancements by continuing to refine the cosmic web classification with additional terms, such as those based on velocity bias components obtained with fast gravity solvers (e.g., [32]). Additionally, we plan to implement a nonlinear mapping of the density field to imprint missing power from the approximate low-resolution gravity solver and improve the description towards small scales.
This ongoing development promises to advance our understanding of the complex interplay between cosmic web environments and tracer distributions in the Universe. By incorporating these advancements, we aim to further refine our mock catalogue generation process and utilise this framework for field-level inference. This will potentially lead to even more accurate representations of large-scale structure and facilitate deeper insights into cosmological phenomena.

Figure 1. Upper panels: ALPT dark matter density field and the chosen large-scale and small-scale cosmic web fields for this study, with λ_th = 0.05 for both cases. Slices 20 h⁻¹ Mpc thick and 1000 h⁻¹ Mpc on a side at redshift z = 1.032 are shown for the logarithmic density field (upper left panel), the ϕ-web (upper middle panel) and the δ-web (upper right panel). The darker the colour, the higher the density. Lower panels: same as upper panels, but showing the gravitational potential (lower left panel), and the large-scale (lower middle panel) and small-scale (lower right panel) cosmic web fields, with λ_th = 0 for both cases.

Figure 2. Cross power-spectra of the cosmic web density fields δ ϕ−web (solid lines) and δ δ−web (dashed lines) with the halo reference catalogue for different threshold values λ th .

Figure 4. Visual comparison of the reference and a mock halo catalogue. The slices are ∼20 h⁻¹ Mpc thick and 1000 h⁻¹ Mpc on a side. The logarithm of the halo density contrast δ_tr is shown, and darker regions correspond to lower densities. Both fields share the same initial conditions, down-sampled to a resolution of ∼3.9 h⁻¹ Mpc cell side. Below that resolution, stochastic bias dominates and the halo distributions differ.

Figure 5. Statistical comparison of the UNIT simulation reference catalogue (black) and the different mock catalogues (red). The probability density function PDF (left column), the power-spectrum P(k) (middle) and the reduced bispectrum Q(θ₁₂) (right) are shown for a model describing a deterministic and stochastic bias (first row), to which we add a threshold bias (second row) and then, progressively, the ϕ-web and δ-web information (third and fourth rows). The dashed red line is the mean over 100 realisations and the shaded area represents the standard deviation. The grey areas in the sub-panels represent the 5 percent deviation (2 percent is also included in the bottom row). The error bars on the reference bispectrum come from the 100 realisations.