AMBER: A Semi-numerical Abundance Matching Box for the Epoch of Reionization

The Abundance Matching Box for the Epoch of Reionization (AMBER) is a semi-numerical code for modeling the cosmic dawn. The new algorithm is not based on the excursion set formalism for reionization, but takes the novel approach of calculating the reionization-redshift field $z_\mathrm{re}(\boldsymbol{x})$ assuming that hydrogen gas encountering higher radiation intensity are photoionized earlier. Redshift values are assigned while matching the abundance of ionized mass according to a given mass-weighted ionization fraction $\bar{x}_\mathrm{i}(z)$. The code has the unique advantage of allowing users to directly specify the reionization history through the redshift midpoint $z_\mathrm{mid}$, duration $\Delta_\mathrm{z}$, and asymmetry $A_\mathrm{z}$ input parameters. The reionization process is further controlled through the minimum halo mass $M_\mathrm{min}$ for galaxy formation and the radiation mean free path $l_\mathrm{mfp}$ for radiative transfer. We implement improved methods for constructing density, velocity, halo, and radiation fields, which are essential components for modeling reionization observables. We compare AMBER with two other semi-numerical methods and find that our code more accurately reproduces the results from radiation-hydrodynamic simulations. The parallelized code is over four orders of magnitude faster than radiative transfer simulations and will efficiently enable large-volume models, full-sky mock observations, and parameter-space studies. AMBER will be made publicly available to facilitate and transform studies of the EoR.


INTRODUCTION
Cosmic dawn is a fascinating period in the first billion years of the Universe and is a frontier topic for both theoretical and observational explorations and discoveries. The reionization of hydrogen by the first stars, galaxies, and quasars drastically converts the cold and neutral gas into a warm and highly ionized medium. Galaxies most likely provided the bulk of the ionizing photons (e.g. Bouwens et al. 2015a;Finkelstein et al. 2015), but there may have been early contributions from Population III stars (e.g. Wu et al. 2021) and late contributions from active galactic nuclei (AGN; e.g. Madau & Haardt 2015). On large scales, the higher-density regions near radiation sources are generally reionized earlier than the lower-density regions far from sources in this inhomoge-neous process (e.g. Barkana & Loeb 2004;Trac et al. 2008).
Our current understanding of when the Epoch of Reionization (EoR) occurred primarily comes from "A Tale of Two Optical Depths". The Thomson optical depth τ (CMB photons scattering with free electrons) and the Gunn-Peterson optical depth τ GP (Lyman-alpha photons scattering with neutral hydrogen) have long provided two major observational constraints. However, the latest measurements tell a starkly different tale than the early observations (e.g. Fan et al. 2002;Kogut et al. 2003). Planck Collaboration et al. (2020) recently inferred τ = 0.054±0.007 from measurements of the CMB temperature and polarization angular power spectra, implying a late reionization midpoint at redshift z ≈ 7.7 ± 0.6 (e.g. Glazer et al. 2018). There are now multiple evidence of dark Lyα troughs extending down to z ≈ 5.5 in the spectra of high-redshift quasars (e.g. Becker et al. 2015;Bosman et al. 2018;Eilers et al. 2018;Bosman et al. 2021). This suggests that reionization could have ended at z < 6 (e.g. Keating et al. 2020;Nasir & D'Aloisio 2020), later than previously assumed. When and how the EoR occurred exactly (i.e. the reionization history and process) are of primary interest.
There is a renewed emphasis on using radiationhydrodynamic simulations (e.g. Gnedin 2014;Norman et al. 2015;Ocvirk et al. 2016;Semelin et al. 2017;Finlator et al. 2018;Doussot et al. 2019;D'Aloisio et al. 2020;Katz et al. 2021) to model the complex astrophysics of sources and sinks on small scales and using semi-analytical/numerical methods (e.g. Furlanetto et al. 2004;Mesinger & Furlanetto 2007;Zahn et al. 2007;Alvarez et al. 2009;Choudhury et al. 2009;Santos et al. 2010;Mesinger et al. 2011;Battaglia et al. 2013b;Hassan et al. 2016;Xu et al. 2017;Hutter 2018) to make theoretical predictions and mock observations on large scales. Radiative transfer (RT) simulations are usually limited to small box sizes L 100 h −1 Mpc and a single one typically requires hundreds of thousands to millions of CPU hours to run on a supercomputer (e.g. Trac & Gnedin 2011). Fast and accurate approaches with larger box sizes are essential for parameter-space studies. These two approaches in tandem provide our best option in making forward progress in synergy with observations.
A variety of independent approaches are crucial for comparison with and interpretation of observations of the 21cm signal, CMB, Lyα forest, and intensity mapping. However, there has been a lack of development and diversity in semi-analytical/numerical methods. The majority of codes, like the often-used 21cm-FAST , are based on the excursion set formalism (ESF; e.g. Bond et al. 1991;Furlanetto et al. 2004). In Zahn et al. (2011), we compare two RT simulations Trac & Cen 2007) with two ESF methods (Mesinger & Furlanetto 2007;Mesinger et al. 2011). While the simulations statistically agree within 10% at all redshifts, the models agree within 50% only when compared at the same ionization fraction. There are larger statistical differences when compared at the same redshifts because of the disagreement in reionization histories. Previous seminumerical methods have input astrophysical parameters such as the star formation efficiency, photon production efficiency, and radiation escape fraction. The output reionization history is an end product after many complex and uncertain calculations, and is not easily controlled. There should be flexibility to directly choose the reionization history as this is the primary question for studies of the EoR.
In this paper, we present a new semi-numerical Abundance Matching Box for the Epoch of Reionization (AM-BER) code for modeling the cosmic dawn on large scales. AMBER allows users to directly specify the reionization history through input parameters, a useful feature for theoretical and inference studies that is not currently in other semi-numerical methods. Section 2 summarizes the Simulations and Constructions of the Reionization of Cosmic Hydrogen (SCORCH) project that is used to motivate and calibrate AMBER. Section 3 describes the semi-numerical methods in AMBER, including the novel technique for matching the abundance of ionized mass or volume. Section 4 compares AMBER against two other semi-numerical methods, and Section 5 presents some basic results. Further applications will be presented in upcoming work. For example, Chen et al. (2022) models and studies the patchy kinetic Sunyaev-Zel'dovich (KSZ) effect, which is a promising probe of the EoR. We conclude in Section 6 and add supplementary material in the Appendix.

SCORCH
We first summarize the SCORCH project Doussot et al. 2019;Chen et al. 2020) that is used to motivate and calibrate AMBER. SCORCH is designed to provide cosmological simulations, theoretical predictions, and mock observations to facilitate more accurate comparisons with current and future observations. In this section, we describe the N-body simulations (Sec. 2.1), the radiation-hydrodynamic simulations (Sec. 2.2), and the reionization-redshift fields (Sec. 2.3). The simulations are based on the concordance cosmological parameters: Ω b = 0.045, Ω m = 0.3, Ω Λ = 0.7, h = 0.7, σ 8 = 0.8, and n s = 0.96.

N-body Simulations
In SCORCH I , we run 22 highresolution N-body simulations to quantify the abundance of dark matter halos as a function mass M , accretion rateṀ , and redshift z during the EoR. The N-body simulations are run using an updated particleparticle-mesh (P 3 M) code with a hybrid halo finder. Box sizes in the range 10 ≤ L/(h −1 Mpc) ≤ 400 are chosen to focus on the atomic cooling halos and two realizations of each box size are run to reduce sample variance. Each simulation contains N dm = 2048 3 dark matter particles and has a particle mass resolution m p = 8.72 × 10 6 (L/100) 3 h −1 M .
We quantify and fit the halo mass function dn/dM , mass accretion-rate relationṀ (M, z), and halo accretion-rate function dn/dṀ for the redshift range z ≥ 6. The new fit for the halo mass function is 20−40% more accurate compared to Tinker et al. (2008) at the high-mass end hosting currently observable galaxies. We also model and study the galaxy-halo connection by abundance matching observed high-redshift galaxies with simulated dark matter halos. See Trac et al. (2015) for more details.
In SCORCH II (Doussot et al. 2019), we also run a high-resolution P 3 M simulation with N dm = 3072 3 dark matter particles in a comoving box of side length L = 50 h −1 Mpc to generate halo and galaxy catalogs for the radiation-hydrodynamic simulations. A halo finder is run on the fly every 20 million cosmic years to locate dark matter halos and build merger trees. With a particle mass resolution of m p = 3.59 × 10 5 h −1 M , we can reliably measure halo quantities such as mass and accretion rate down to the atomic cooling limit for galaxy formation (T ∼ 10 4 K, M ∼ 10 8 h −1 M ).

Radiation-hydrodynamic Simulations
In SCORCH II (Doussot et al. 2019), we run three radiation-hydrodynamic simulations with the same cosmic initial conditions, same galaxy luminosity functions, but with different radiation escape fraction f esc (z) models. The simulations are designed to have the same Thomson optical depth τ ≈ 0.06, consistent with recent CMB observations (Planck Collaboration et al. 2020), and similar midpoints of reionization 7.5 z 8, but with different evolution of the ionization fractionx i (z).
The radiation-hydrodynamic simulations are run with the RadHydro code, which combines N-body and hydrodynamic algorithms (Trac & Pen 2004) with an adaptive raytracing algorithm (Trac & Cen 2007) to directly and simultaneously solve collisionless dark matter dynamics, collisional gas dynamics, and RT of ionizing photons. The raytracing algorithm has adaptive splitting and merging to improve resolution and scaling. Each simulation has N dm = 2048 3 dark matter particles, N gas = 2048 3 gas cells, N rt = 512 3 RT cells, and up to 12 billion adaptive rays in a comoving box of side length 50 h −1 Mpc. RadHydro has been used to simulate both hydrogen and helium reionization (e.g. Trac et al. 2008;Battaglia et al. 2013b;La Plante et al. 2017;D'Aloisio et al. 2020).
We model high-redshift galaxies using an updated approach that allows us to systematically control the galaxy distributions in the simulations while matching the observed luminosity functions from HST (e.g. Bouwens et al. 2015b;Finkelstein et al. 2015). We populate dark matter halos with galaxies by abundance matching the number densities, where L UV is the galaxy UV luminosity andṀ is the halo mass accretion rate. Connecting the mass accretion rate to the star formation rate allows us to model and study the episodic nature of high-redshift galaxy formation.
The radiation escape fraction is allowed to vary with redshift in a power-law relation, where f 8 is the average escape fraction at z = 8, and a 8 is the power-law slope. Sim (a 8 =) 0 has constant f esc and reionization starts latest but ends earliest out of the three models. Sim 1 has f esc (z) varying linearly with 1 + z and is an intermediate model. Sim 2 has f esc (z) varying quadratically and reionization starts earliest but ends latest. Table 1 summarizes the parameters for the three Rad-Hydro simulations. Some additional parameters will be presented later in the paper: the reionization history parameters z mid , ∆ z , and A z in Sec. 3.1.1, the Weibull coefficients a w , b w , and c w in Sec. 3.1.2, and the radiation mean free path l mfp in Sec. 3.4.1. Each simulation takes approximately half a million CPU hours to run on a supercomputer, and they are infeasible for parameterspace studies. See SCORCH I and II Doussot et al. 2019) for more details.

Reionization-redshift Fields
The reionization-redshift field z re (x) quantifies the timing of reionization as a function of space and has been measured in radiation-hydrodynamic simulations (e.g.  Doussot et al. 2019). Each image is 512 × 512 pixels from a slice that is 50 h −1 Mpc × 50 h −1 Mpc with a thickness of ∼ 100 h −1 kpc. The three simulations show remarkable similarities even with episodic star formation and varying radiation escape fractions. On large scales, higher-density regions near sources are generally reionized earlier than lower-density regions far from sources. Trac et al. 2008;Battaglia et al. 2013b;Kaurov 2016;Doussot et al. 2019) and semi-analytical/numerical models (e.g. Barkana & Loeb 2004;Alvarez et al. 2009;Majumdar et al. 2014). Combined with the density, velocity, and temperature fields, it has been used to model the imprints of reionization on the CMB (e.g. Natarajan et al. 2013;Battaglia et al. 2013a), 21cm (e.g. La Plante et al. 2014La Plante & Ntampaka 2019), and the Lyα forest (e.g. D' Aloisio et al. 2015Aloisio et al. , 2019. We construct the reionization-redshift field for a given RadHydro simulation while it is running. For each gas cell, we save the redshift at which the majority of the mass is ionized in a three-dimensional array. Previously in Battaglia et al. (2013b), we studied both 50% and 90% ionization thresholds and find no significant differences at the resolutions of interest. Statistically, the autocorrelations and cross correlations of the z re (x) fields for these two cases are almost identical. For the vast majority of gas cells, once they start to ionize they quickly become almost fully ionized, but not entirely because of recombinations. Thus, we adopt a nominal 50% ionization value for the RadHydro simulations in SCORCH II (Doussot et al. 2019) and throughout this paper. Figure 1 is a visualization of the reionization-redshift fields z re (x) from the three RadHydro simulations. They show remarkable similarities even with episodic star formationρ (x, z) and varying radiation escape fractions f esc (z). The z re (x) field is highly correlated with the matter density field ρ(x) on large scales since the higherdensity regions near sources are generally reionized earlier than the lower-density regions far from sources (e.g. Barkana & Loeb 2004;Trac et al. 2008). In Battaglia et al. (2013a), we find that these two fields are corre- . Distribution of fractional differences z in the reionization-redshift fields zre(x) for the RadHydro Sims 0 and 2 relative to Sim 1. Prior to abundance matching (solid), the distributions are skewed because of the different reionization histories. After abundance matching to have the same mass-weighted (M) or volume-weighted (V) ionization fractions, the distributions are approximately Gaussian with zero mean and small widths.
lated down to Mpc scales and can be related using a first-order, scale-dependent bias in Fourier space.

Cross Correlations
The reionization-redshift fields are quantitatively very similar when we account for the different reionization histories. To show this, we take the z re (x) fields from Sims 0 and 2 and individually scale the redshift values for each grid cell without changing their spatial rank order such that they have the same ionization fraction  Figure 3. Top: autopower spectra of the RadHydro reionization-redshift fields zre(x) prior to abundance matching. All of the spectra have a characteristic peak on larger scales and a power-law break on smaller scales. Center: the redshift bias bzz is nonunity when crosscorrelating the original zre(x) fields, but is much closer to unity after abundance matching to have the same mass-weighted (M) or volumeweighted (V) ionization fractions. Bottom: the cross correlation rzz shows that the original fields are already very highly correlated on scales k 10 h/Mpc and abundance matching still preserves this correlation.
x 0,2 (z) =x 1 (z) as Sim 1. We are matching the abundance of ionized mass or volume at all redshifts, which we call abundance matching the reionization history (Sec. 3.4.2 for more details). Alvarez & Abel (2012) use a similar technique to rescale the reionization-redshift field in their semi-numerical method. Here we con-sider both mass-weightedx i,M (z) and volume-weighted To compare the z re (x) fields cell by cell, we define the fractional difference field, for Sim 0 or 2 relative to Sim 1 and then calculate the probability distribution function (PDF) of z . Figure  2 shows the distribution of fractional differences for the z re (x) fields at the RT resolution l rt = 98 h −1 kpc, which is approximately ten times smaller than the typical mesh cell size used in semi-numerical models. The difference distributions for the original Sims 0 and 2 are skewed because the reionization histories differ from that of Sim 1. After abundance matching, the distributions are approximately Gaussian with zero mean and small rootmean-square differences σ z = ( z (x) −¯ z ) 2 1/2 0.01. We find very similar results when matching either the mass-weighted or the volume-weighted ionization fractions.
To correlate the fields, we define the normalized redshift field, wherez re is the volume-weighted average value, and Fourier transform to get δ z (k) for all three simulations. Generally, the two-point power spectrum and its dimensionless version are defined as where i = j for autocorrelations and i = j for cross correlations. For comparing two different fields, we quantify with the bias and cross correlation, The field δ i is said to be biased, unbiased, or underbiased relative to δ j for b ij > 1, = 1, and < 1, respectively. Similarly, the fields and their fluctuations are said to be correlated, uncorrelated, and anticorrelated for r ij > 0, = 0, and < 0, with perfect correlation or anticorrelation given by the limits r ij = 1 and = −1, respectively. Figure 3 shows the autopower spectra for the z re (x) fields prior to abundance matching. On larger scales, the dimensionless spectra similarly peak at k ≈ 0.3 − 0.4 h/Mpc. As this is only a factor of 3 from the Rad-Hydro simulation box limit k min = 2π/L, the peak scale may possibly be underestimated. Nonetheless, this suggests that reionization simulations and semi-numerical models must have box lengths > 20 h −1 Mpc in order to capture the relevant large-scale fluctuations and correlations in the reionization-redshift field. The spectra similarly decline in power-law form on intermediate scales until k ≈ 10 h/Mpc, followed by steeper decline on smaller scales down to the RT Nyquist frequency.
We also crosscorrelate Sims 0 and 2 relative to Sim 1 and plot the bias b zz and normalized cross correlation r zz in Figure 3. Using the normalized fields δ z (x) partially adjusts for the different redshift midpoints, but the shorter and longer durations result in lower and higher bias for the original Sims 0 and 2, respectively. After abundance matching, the b zz (k) curves are unity at the box scale and deviate by only 10% at the RT resolution scale. The cross correlations show that the original fields are already very highly correlated, allowing abundance matching to be applied correctly. The r zz (k) curves are exactly unity at the largest scale and remain within a few percent on scales k 10 h/Mpc.
All of these results lead to new ideas for AMBER: there is a spatial order to the reionization process, and abundance matching can be applied to a correlated field to accurately predict the reionization-redshift field.

AMBER
The AMBER code provides a novel semi-numerical method for modeling the cosmic dawn on large scales. The new algorithm is not based on the ESF for directly predicting the ionization fraction field (e.g. Furlanetto et al. 2004), but takes the novel approach of calculating the reionization-redshift field z re (x) assuming that the hydrogen gas encountering higher radiation intensity are photoionized earlier. Redshift values are assigned while matching the abundance of ionized mass according to a given mass-weighted ionization fraction x i (z). The code has the unique advantage of allowing users to directly specify the reionization history through input parameters, a useful feature for theoretical and inference studies. In this section, we describe the major model components: reionization history (Sec. 3.1), Lagrangian perturbation theory (LPT) for the large-scale structure (Sec. 3.2), ESF for the halo mass density field (Sec. 3.3), and abundance matching for the reionizationredshift field (Sec. 3.4).

Reionization History
The reionization historyx i (z) directly affects EoR observables. For example, the integrated Thomson opti- cal depth and the evolution of the global 21cm brightness temperature depend linearly on the ionized electron fractionx e (z) and neutral hydrogen fractionx HI (z), respectively. In Trac (2018), we show that the reionization histories of the RadHydro simulations can be effectively and accurately described by the redshift midpoint, duration, and asymmetry parameters. We will here generalize and improve the approach for AMBER. The reionization history is quantified by the average ionized hydrogen fraction, which can be mass-weighted or volume-weighted. We work with the mass-weighted versionx i,M as the volume-averaged ionized hydrogen number density is given bȳ A common misconception is that the volume-averaged ionized density is instead proportional to the volumeweighted ionization fraction. See Chen et al. (2020) for clarification. We will typically work with mass-weighted fractions and volume-averaged densities and drop the subscripts to simplify the notation.

Midpoint, Duration, and Asymmetry
The redshift midpoint z mid , duration ∆ z , and asymmetry A z are input parameters, which in turn give three ionization points. For the early, middle, and late stages of reionization, let the redshifts z ear > z mid > z lat correspond to the ionization fractionsx ear <x mid <x lat .
We take z mid as the redshift midpoint and define the duration, and asymmetry, The other two redshifts are then given by Instantaneous reionization models would correspond to ∆ z = 0, but reionization is an extended process with finite ∆ z > 0. Symmetric reionization histories would correspond to A z = 1, but reionization simulations typically find that the early stage of reionization takes longer than the later stage such that A z > 1. Table 1 lists the midpoint, duration, and asymmetry parameters for the three RadHydro simulations. In Doussot et al. (2019), we takex mid = 0.50 and present two practical choices for the other ionization points. In the first case, we choose quartile ionization fractions (x ear ,x lat ) = (0.25, 0.75) and ∆ z is analogous to a full width half max. In the second case, we take (x ear ,x lat ) = (0.05, 0.95) and ∆ z more effectively quantifies the whole EoR. We will adopt the latter as the default convention throughout this paper. AM-BER users can specify other values, but extreme choices (e.g.x ear 0.01,x lat 0.99) are not recommended because the start and end of the EoR are not well defined and are difficult to determine precisely.

Analytical Interpolating Function
Once the midpoint, duration, and asymmetry parameters are specified and used to calculate three ionization points, we use an analytical interpolating function to calculate the reionization history. In Trac (2018), we use Lagrange interpolating functions to construct an analytical function for the ionization fraction x i (z) that exactly fits the ionization points. For three points, the interpolating polynomial is a quadratic, which has the advantage of being invertible but has the disadvantage of being nonmonotonic. A log transformation of the variables improves monotonicity over the relevant redshift range, but it is not an ideal solution.
In AMBER, we interpolate the three ionization points with a modified Weibull function (Weibull 1951), where the coefficients a w , b w , c w are all positive values. Note that a w corresponds to z end when x end = 1 and the max function ensures full ionization at lower redshifts. In Appendix A, we show that the coefficients can be easily determined by first solving a nonlinear equation for c w and then substituting its value into algebraic equations for the other two coefficients. We find that solutions exist for the asymmetry range A z 15, which is more than sufficient for parameter space studies. We also experimented with a generalized logistic function (Richards 1959), but find that it is limited to A z 5. The Weibull function can be analytically inverted, which is a useful feature for our abundance matching technique. Table 1 lists the Weibull coefficients for the three Rad-Hydro simulations. The offset a w determines when the function reaches unity, and it decreases when reionization ends later. The divisor b w controls the stretch in redshift, and it increases with the duration ∆ z . The power c w controls the steepness of the function, and it is inversely related to the asymmetry A z . In general, changing one of the reionization shape parameters while holding the other two fixed affects all three Weibull coefficients. Figure 4 compares the evolution of the mass-weighted ionization fraction x i (z) from the RadHydro simulations with the Weibull functions. The analytical parameterizations are excellent matches to the simulated reionization histories. The typical differences are only |∆x i | 0.01, while the maximum differences of |∆x i | 0.02 are found near the end of the EoR, which is not accurately captured in reionization simulations and semi-analytical models. We find that the Weibull function gives slightly better fits than Lagrange interpolation and is valid for a larger range of parameter space.

Lagrangian Perturbation Theory
LPT is used to generate initial conditions for cosmological simulations and semi-analytical methods. LPT is also used in semi-numerical models of reionization to efficiently model the evolved matter distribution in the moderately nonlinear regime because particles can be simply advanced to any given redshift without having to iteratively perform expensive force calculations and time integration like in N-body and hydro simulations. Furthermore, on moderately nonlinear scales, the dark matter and gas distributions are highly correlated and assumed to exactly trace each other. We will here test these assumptions and techniques for evolving the largescale structure and constructing the density and velocity fields in AMBER.
In first-order LPT, otherwise known as the Zel'dovich Approximation (Zel'Dovich 1970), each particle moves in a straight line with displacement and velocity, where q is the Lagrangian coordinate, φ(q) is the Lagrangian potential, D(z) is the density linear growth factor, f (z) is the velocity linear growth factor, and H(z) is the Hubble function. In second-order LPT (e.g. Bouchet et al. 1995;Scoccimarro 1998), each particle moves in a slightly curved trajectory with displacement and velocity, where φ i (q), D i (z), and f i (z) are the usual terms calculated at the ith-order. The spatial information is contained in the Lagrangian potentials, which only have to be computed once, while the time evolution is governed by the growth factors, which are more straightforward to calculate. For modeling the EoR, we find that 2LPT is more accurate than 1LPT at length scales 1 Mpc and redshifts z 5. Third-order LPT (e.g. Bouchet et al. 1995;Scoccimarro 1998) and augmented LPT (e.g. Kitaura & Hess 2013;Neyrinck 2016) have been shown to offer only small improvements at lower redshifts far after the end of the EoR. Hence, we will generally use second-order theory and will simply refer to it as LPT throughout the remainder of this paper.

Density Fields
Density fields are constructed from the LPT particles using techniques from particle-mesh (PM) methods. In N-body simulations, particles are added to a Cartesian mesh to create a density field for solving Poisson's equation with Fast Fourier Transforms (FFT). In large-scale structure analyses, galaxies or halos are added to a grid to create a density field for calculating the power spectrum. Similar techniques are also used in other seminumerical models of reionization, but we will describe an improved implementation for AMBER.
There is a hierarchy of particle assignment schemes, and the first three are called the nearest grid point (NGP), cloud-in-cell (CIC), and triangular-shaped cells (TSC). The mass assignment process interpolates the discrete distribution of particles and can be interpreted as convolving and sampling the underlying density field ρ(x) to produce a discretized density field, where W pm (x) is the PM assignment window function. See Hockney & Eastwood (1988) for a seminal review. The particle assignments introduce effects such as aliasing, smoothing, and shot noise that have to be accounted for (e.g. Jing 2005). While the latter two effects can have analytical corrections, the first is especially problematic. Hockney & Eastwood (1988) have suggested that interlacing two or more staggered meshes can reduce aliasing. Interlacing has been used to improve force calculations in N-body simulations (Couchman 1991), produce more accurate overdensity fields for power spectra estimation (Sefusatti et al. 2016), and recently implemented in the nbodykit package (Hand et al. 2018).
We use the interlacing technique to produce more accurate and robust density fields in AMBER. We construct the first density field ρ 1 (x) as normal, while for the second density mesh ρ 2 (x), the particle positions are shifted by half of a grid cell spacing in each direction by adding the vector ∆x = (l c /2, l c /2, l c /2), which is equivalent to shifting the mesh by the opposite vector. In Fourier space, the two transformed fields are combined into a single, effective field as The twiddle factor e ik·∆x multiplied to the second transformed field accounts for the shifted particle positions. We also deconvolve the effects of smoothing by dividing with the Fourier transform of the assignment window function, where p = 1, 2, and 3 for the NGP, CIC, and TSC schemes, respectively. For comparison, we construct the AMBER and Rad-Hydro density fields at two resolutions, using a 64 3 mesh with cell size l c = 0.8 h −1 Mpc and a 512 3 mesh with cell size l c = 0.1 h −1 Mpc. The fiducial lower resolution is more typical of semi-numerical models, while the higher resolution allows us to test the limits of LPT. The AMBER density fields are efficiently made using equal numbers of LPT particles as mesh cells. The RadHydro density fields are made by simply binning the 2048 3 dark matter particles and the 2048 3 gas cells down to the appropriate size meshes. Figure 5 is a visualization of the matter density fields ρ(x) at redshift z = 8 from RadHydro and AMBER. The LPT particles are assigned using the TSC scheme with interlacing and deconvolution. Both images are shown with the higher-resolution pixels in order to see . Visualization of the matter density fields ρ(x)/ ρ at redshift z = 8 from RadHydro (left) and AMBER (right). Each image is 512 × 512 pixels from a slice that is 50 h −1 Mpc × 50 h −1 Mpc with a thickness of ∼ 1 h −1 Mpc. The RadHydro matter and gas (not shown) distributions are visually indistinguishable on the scales of interest. The AMBER density field, made by assigning the LPT particles using the TSC scheme with interlacing and deconvolution, also closely resembles the RadHydro results.
the large-scale structure more clearly, but have been averaged along the line of sight at the lower resolution. The RadHydro matter and gas (not shown) density fields are visually indistinguishable, while the AM-BER density field also has close resemblance, but minor smoothing is visible in the small-scale, high-density regions. At the lower resolution, the AMBER and Rad-Hydro fields are indistinguishable, though the images appear quite pixelated. Figure 6 compares the one-point PDF of the relative density ρ/ ρ = 1 + δ at redshift z = 8. The RadHydro matter and gas distributions are in very good agreement even at the higher resolution. For AMBER, the TSC assignment scheme with interlacing and deconvolution gives the best agreement, and we only show this case to avoid overcrowding the plot. The results are also in very good agreement at the fiducial resolution, but not at the higher resolution. LPT calculated at second order or even higher order cannot capture the nonlinear shell-crossing on smaller scales and therefore underresolves high-density, collapsed regions. The disagreement in underdense regions is not concerning because it is due to the much smaller number of LPT particles and the different assignment and deconvolution process for AM-BER. We would find the same effects for RadHydro if we use a lower-resolution simulation and not done the simple binning. . One-point probability distribution functions of the density fields at redshift z = 8 from RadHydro and AMBER. The RadHydro matter and gas distributions are identical at the fiducial lower resolution and in very good agreement at the higher resolution. For AMBER, using LPT with the TSC scheme also gives very good agreement at the fiducial lower resolution. The differences in the high-density tails of the PDFs are due to reduced gravitational collapse and imperfect deconvolution for LPT. Figure 7 shows the density autopower spectra ∆ 2 dd (k) at redshift z = 8 and cross correlations relative to the RadHydro matter overdensity δ m . All of the dimensionless power spectra continue to increase in amplitude  Figure 7. Top: autopower spectra of the density fields at redshift z = 8 from RadHydro and AMBER. Center: for RadHydro, the density bias b dd of the gas relative to the matter is unity for k 10 h/Mpc and the suppression on smaller scales is due to Jeans smoothing from the gas pressure. For AMBER, the TSC assignment scheme with interlacing and deconvolution gives more correct power than the NGP and CIC schemes, which tend to overshoot near the Nyquist frequency because of aliasing. Bottom: the cross correlation r dd of the gas relative to the matter is unity nearly down to the smallest scale probed. Assigning LPT particles with the TSC scheme significantly improves the cross correlations. toward smaller scales, typical for cold dark matter dominated models. For the fiducial lower resolution, the power reaches a maximum value of only ∆ 2 dd ≈ 0.2. As this is still in the quasi-linear regime, 2LPT is an accurate and efficient approach for modeling the mass distribution and density field.
The RadHydro matter and gas perfectly trace each other with bias b dd = 1 and cross correlation r dd = 1 for k 10 h/Mpc. The suppression in the gas bias on smaller scales is due to the Jeans smoothing from the gas pressure that opposes gravitational collapse. For AM-BER, the LPT bias starts out at unity on the largest scales, is still within a few percent for k 1 h/Mpc, but drops to ∼ 0.9 by k ∼ 10 h/Mpc. However, the normalized cross correlation is still closer to unity down to smaller scales, showing that the density perturbations are highly in phased even though the amplitudes may differ. We find that using the TSC assignment scheme with interlacing and deconvolution gives more accurate autopower and cross-power spectra. The NGP and CIC results tend to be overbiased and undercorrelated near the Nyquist frequency k Ny = π/∆x because of aliasing. Our results are similar to the previous findings by Sefusatti et al. (2016), who present a thorough analysis of how interlacing can reduce aliasing and improve power spectrum estimation.

Velocity Fields
Velocity fields are constructed similarly to the density fields by assigning the LPT particles to interlaced meshes. For a given mesh cell, the velocity v is a weighted average of the individual velocities v i from all overlapping particles, where w i and w tot are the individual and total mass assignment weights, respectively. With the NGP assignment scheme, we find that some cells can have no overlapping particles and therefore undefined velocities. Both the CIC and TSC schemes work better in practice with no empty cells and missing velocities for our moderately clustered distribution of LPT particles. Figure 8 is a visualization of the line-of-sight component of the velocity fields v(x) at redshift z = 8 from RadHydro and AMBER. The velocity images are constructed using the same approach as for the density images (Fig. 5). The shown images with the higherresolution pixels are visually similar and have an even closer resemblance with lower-resolution pixels. The RadHydro and AMBER velocity fields show large coherence lengths that are actually underestimated because of the moderate simulation box length L = 50 h −1 Mpc. We will discuss this interesting feature in more detail below. Figure 9 compares the one-point PDF of the velocity components v ⊂ (v x , v y , v z ), at redshift z = 8. The RadHydro matter and gas velocity distributions are in excellent agreement, even more so than their density distributions. All of the distributions are Gaussian, and the widths decrease slightly by a few percent for the fiducial lower resolution. For AMBER, the CIC and TSC assignment schemes with interlacing and deconvolution give similarly accurate results, and we only show the lat-ter to avoid overcrowding the plot. The differences in the tails of the PDFs are due to the RadHydro simulations having slightly faster velocities because of virialization in rare, collapsed regions. Figure 10 shows the velocity autopower spectra ∆ vv (k) and the cross correlations relative to the RadHydro matter velocity v m . We individually compare and average over each velocity component The velocity power spectrum declines in amplitude toward smaller scales, unlike the density power spectrum (Fig. 7). In linear theory, the transformed velocity field is related to the matter overdensity field as v(k) ∝ (k/k 2 )δ(k), and therefore the power is predominantly coming from larger scales. This explains the large coherence length seen in the velocity images (Fig. 8). While radiation-hydrodynamic simulations are still computationally too expensive, semi-numerical methods are sufficiently efficient to afford large box sizes of 500 h −1 Mpc, which is necessary to capture the large-scale power and accurately model the velocity fields.
The RadHydro matter and gas velocities perfectly trace each other with bias b vv = 1 and cross correlation r vv = 1 for k 20 h/Mpc, even to smaller scales compared to their densities (Fig. 7). For AMBER, the NGP assignment scheme gives incorrect bias and poor stochasticity when we simply set the velocities to zero where ever it is undefined. The CIC and TSC assignment schemes have similarly accurate bias, but the latter has preferably better cross correlation. For the ve-  Figure 10. Top: autopower spectra of the velocity fields at redshift z = 8 from RadHydro and AMBER. Center: for RadHydro, the velocity bias bvv of the gas relative to the matter is unity for k 20 h/Mpc. For AMBER, the CIC and TSC assignment schemes with interlacing and deconvolution give similarly accurate bias, but the NGP scheme can produce undefined velocities that lead to incorrect bias. Bottom: the cross correlation rvv of the gas relative to the matter is unity nearly down to the smallest scale probed. Assigning LPT particles with the TSC scheme significantly improves the cross correlations. locity bias, the lower-resolution results diverge from the higher-resolution ones at larger scales compared to the density bias. The velocity field is a weighted average of the particle velocities (Eq. 23) and a simple deconvolution using the mass assignment window function (Eq. 22) is only approximately correct.
In AMBER, we adopt the TSC assignment scheme with interlacing and deconvolution as the default method for constructing density and velocity fields from the LPT particles. This approach reduces aliasing and smoothing, and produces the most accurate one-point distributions and two-point correlations. Accurate density and velocity fields are essential components for modeling EoR observables. The 21cm signal and Lyα forest both depend on the neutral hydrogen density n HI (x) and line-of-sight velocity field v los (x). For the CMB, the patchy Thomson optical depth and patchy KSZ effect depend on the electron number density n e (x) and line-of-sight momentum n e (x)v los (x), respectively.

Excursion Set Formalism
The ESF (Bond et al. 1991) is used to model the collapsed mass fraction and the halo mass function in extended Press-Schecter theory (EPS). ESF has the advantage of being orders of magnitude faster than an N-body simulation and halo finding, but it is known to only produce approximately correct results. In the majority of semi-numerical methods for reionization, it is used to model both the collapsed mass and ionization fraction field (e.g. Furlanetto et al. 2004). For AMBER, we use ESF only for modeling the halo mass density field.
We will first summarize ESF and then test its accuracy for modeling the halo mass density field in AMBER. The linearly extrapolated overdensity field δ 0 is filtered to produce where the window function W f is usually taken to be a sharp k-space filter, or a spherical tophat filter, The convolution is rapidly computed in Fourier space after transforming with highly optimized FFTs. In a large-scale region with filtered overdensity δ f , the collapsed fraction of matter in halos above a minimum mass M min is given by the EPS result (Lacey & Cole 1993), The variance of the linear density fluctuations, smoothed on mass scale M s , is calculated as where ∆ 2 lin (k) is the dimensionless linear power spectrum, and W th (k) is the spherical tophat filter with comoving radius R s = [M s /(4πρ 0 /3)] 1/3 . The corresponding collapsed overdensity barrier is given by where δ crit ≈ 1.686 is the spherical collapse value, and D(z) is the linear growth factor. Note that the redshift dependence only explicitly shows up in Equation 32, while the other functions and fields have already been linearly extrapolated to z = 0.
In the original version of ESF (e.g. Bond et al. 1991;Lacey & Cole 1993), the filtering is done in Lagrangian (q) space on the initial overdensity field. In an alternative version that is used in some semi-numerical models of reionization (e.g Mesinger et al. 2011;Zahn et al. 2011), the filtering is done in Eulerian (x) space on the evolved density field. We will refer to these versions as ESF-L and ESF-E, respectively.
When the filter window function W f is chosen to be a spherical tophat function, it is natural to set the filter radius R th to be equal to the comoving radius R s in the variance. For the sharp k-space function, it is unclear how to choose the filter radius R sk . Following Lacey & Cole (1993), we will simply choose For ellipsoidal collapse, Sheth, Mo, & Tormen (2001) propose a moving barrier δ c (σ, z) with three coefficients that can be changed in order to match the halo mass functions from N-body simulations. Their moving barrier is used for determining when halos collapse following the peak-patch approach of Bond & Myers (1996). However, it is not self-consistent to simply use their δ c (σ, z) in Equation 30 to calculate the collapsed fraction. Furthermore, the proposed functional form for δ c (σ, z) may not be valid for all halo mass functions (e.g. Robertson et al. 2009). The Sheth & Tormen (1999) halo mass function corresponding to this moving barrier does not agree with the more recent halo mass functions for the EoR in SCORCH I . Therefore, we will use the spherical overdensity δ c (z) rather than the ellipsoidal collapse barrier.

Halo Mass Density Fields
We construct halo mass density fields with ESF and compare them against results from a high-resolution Nbody simulation. For reference, we use the halo catalog from SCORCH II (Sec. 2.1) and select all halos above a minimum mass M min = 10 8 h −1 M , which corresponds to the atomic cooling limit for galaxy formation at redshift z ≈ 8. To assign the halo mass to interlaced meshes, we choose the CIC scheme as it gives both reduced aliasing and strong correlation with the matter density field. The NGP scheme has the most aliasing, while the TSC scheme has the most stochasticity as it spreads the small halos over too many larger grid cells.
For ESF-L, we calculate the collapsed fraction and mass in Lagrangian space,  Figure 12. Top: autopower spectra of the halo mass density fields for Mmin = 10 8 h −1 M at z = 8 from an N-body simulation and from ESF-L and ESF-E with sharp k-space (sk) and tophat (th) filters. Center: relative to the N-body halos, ESF-L has better halo bias b hh that is closer to unity than ESF-E. Bottom: ESF-L also has better halo cross correlation r hh that is closer to unity than ESF-E.
for LPT particles of mass m p , move the particles to their comoving positions x (Eq. 18), and then assign the collapsed mass to the interlaced meshes (Eq. 21) to obtain the halo mass density field ρ h (x). For ESF-E, we calculate the collapsed fraction field in Eulerian space, and then the halo mass density field is calculated as whereρ m is the average matter density, and the growth factor D(z) reverses the linear extrapolation done earlier. ESF requires that the mass smoothing scale M s be greater than the minimum halo mass M min in Equation 30. The smoothing scale also cannot be lower than the particle mass resolution. For the fiducial lower resolution with 64 3 particles on a 64 3 mesh in the 50 h −1 Mpc box, the particle mass is m p = 4.0 × 10 10 h −1 M and the cubical cell size is l c = 0.8 h −1 Mpc. Therefore, we will choose and vary the smoothing scales such that M s ≥ m p and R s = [M s /(4πρ 0 /3)] 1/3 0.5 h −1 Mpc. Figure 11 is a visualization of the halo mass density fields ρ h (x)/ ρ h for M min = 10 8 h −1 M at z = 8. Here, we normalize each density field using the same mean ρ h taken from the N-body result to preserve the relative differences in ρ h (x). We also use the fiducial smoothing scale M s = m p . The halo images appear more pixelated than the corresponding density and velocity images (Fig. 5,8) because of the lower resolution shown here. The ESF-L images with the sharp k-space and tophat filters are remarkably similar to the N-body result, but the ESF-E images are noticeably different. ESF-E produces relatively higher halo densities in prominent collapsed regions and lower values elsewhere, and this larger range of density contrast can be traced back to using the evolved density field rather than the initial density field for the filtering process. Figure 12 shows the halo autopower spectra ∆ 2 hh (k) and cross correlations. The halo power spectrum is approximately power-law with slope n = d ln P/d ln k ≈ −1.5 at z = 8. In comparison with the matter power spectrum (Fig. 7), we find scale-dependent halo-matter bias b hm (k) with a large-scale value of ≈ 3.7. Similarly, the halo-matter cross correlation r hm (k) is close to unity on the largest scales, but drops below 0.9 for k 1 h/Mpc. Therefore, we cannot accurately model the halo mass density field assuming linear bias, even if we allow for scale dependence.
In cross correlation with the N-body halo mass density field, the ESF-L with the sharp k-space filter gives the best agreement, closely followed by with the tophat filter. Both the bias b hh (k) and cross correlation r hh (k) differ from unity by 0.05 down to the smallest scale shown. The strong agreement is remarkable given that we have adopted nominal choices for the filter radius R sk and collapse barrier δ c without any fine-tuning. ESF-E with either the sharp k-space or tophat filter gives poorer agreement with the N-body results. The larger amplitude in the power spectrum and bias is consistent with the larger range in the density contrast seen in the halo density images (Fig. 11). We now vary the minimum halo mass M min and smoothing mass M s and quantify the mass-weighted average collapse fraction f coll in Figure 13. As M min is increased above the atomic cooling limit while fixing M s = 4 × 10 10 h −1 M and R s = 0.5 h −1 Mpc, we again find that ESF-L with the sharp k-space filter gives the best agreement, followed by with the tophat filter. However, ESF-E overpredicts the average collapse fraction by a factor of 3, with increasingly larger differences toward higher M min . Note that we should expect small differences in the average collapse fraction. In the N-body halo finder , the halo mass M 200 is defined such that the average halo densityρ h is 200 times the average matter densityρ m (z), which is also equal the critical densityρ crit (z) at high redshifts. In ESF, the halo mass is not clearly defined, although in spherical collapse theory, the average halo density is usually 18π 2 ≈ 180 times the critical density at high redshifts or in Einstein-de Sitter cosmologies.
As M s and R s are increased while fixing M min = 10 8 h −1 M , we find that ESF-L gives reliably accurate and stable values for the average collapsed fraction that are within 10% of the N-body halo value. However, ESF-E gives highly variable results that are only in agreement at large smoothing radius R s 10 h −1 Mpc. Similarly, we also find that the power and bias change as the smoothing is varied, which is not found for ESF-L.
The sensitivity of ESF-E to the smoothing and filtering scales is especially troublesome for the seminumerical models of reionization that are based on this approach. For a given ionized region, the filtering scale is adaptively varied until the number of ionizing photons equals the number of hydrogen atoms. For the entire modeled volume, there is a distribution of filtering scales rather than just one value, and as a consequence, there is no one value for the average collapse fraction and large-scale halo bias. Furthermore, as one changes the reionization parameters, the distribution of filtering scales will change, and so too will the halo mass density field and power spectrum, both of which should be independent of the reionization history.
In AMBER, we adopt the ESF-L with the sharp kspace filter as the default choice for modeling the halo mass density field. In the next section, we will discuss that our abundance matching technique does not depend on the overall average density ρ h nor the overall normalization of the halo bias b hm . In future work, we can calibrate the EPS collapsed fraction relation (Eq. 30) using the N-body simulations and halo catalogs from SCORCH I and II. Accurate halo abundance and density fields are required to properly model high-redshift galaxies and quasars as radiation sources.

Abundance Matching
The new idea for AMBER is that there is a spatial order to the reionization process, and abundance matching can be applied to a correlated field to accurately predict the reionization-redshift field (Sec. 2.3) such that the reionization history follows a given massweighted ionization fraction (Sec. 3.1). We emphasize that the abundance matching technique does not depend on the overall normalization of the correlated field of interest. In Trac et al. (2008), we find that the higherdensity regions near sources are generally reionized earlier than the lower-density regions far from sources in our RadHydro simulations. Following up in Battaglia et al. (2013b), we find that the density and reionizationredshift fields are correlated down to Mpc scales. While two natural choices for abundance matching are the matter density field (Sec. 3.2.1) and the halo mass density field (Sec. 3.3.1), the radiation field is more directly relevant to the photoionization of hydrogen.

Radiation Field
The radiation field is specified by the hydrogen photoionization rate, where J ν is the specific intensity, σ HI is the hydrogen photoionization cross-section, and ν HI is the Lyman limit frequency. In our RadHydro simulations, the RT is calculated using an accurate but expensive raytracing algorithm (Trac & Cen 2007). However, this is an unnecessarily costly approach for calculating the radiation field just for abundance matching, which does not depend on the overall normalization. In AMBER, we model the specific intensity field as where S ν is the source function, the 1/(4πr 2 ) term is the inverse-square law for flux, and the e −r/l mfp term is the transmitted fraction over the radiation mean free path l mfp . The source field is related to the specific luminosity density and can be expressed as where ρ sfr is the star formation rate density, ν is the radiative energy per unit star formation rate per frequency, and f esc is the radiation escape fraction (e.g. Madau et al. 1999;Robertson et al. 2010). The star formation rate density field is often modeled using the halo mass density field, where f star and τ star are the star formation efficiency and timescale, respectively (e.g. Cen & Ostriker 1992;Trac & Cen 2007). The source field is proportional to the halo mass density field, while the normalization depends on the product of several astrophysical parameters and functions (f esc , f star , τ star , ν ), which may have redshift dependence that further alters the reionization history. In semi-numerical methods based on the ESF, this combination is often represented by a single efficiency parameter ζ, which is generally assumed to be constant for simplicity. With our abundance matching technique, we do not need to specify the astrophysical quantities to calculate the normalization of the the source field, but instead get to cast them and their effects in terms of the the redshift midpoint, duration, and asymmetry parameters that directly set the reionization history (Sec. 3.1).
In AMBER, we adopt the common assumption S ν (x) ∝ ρ halo (x) and ignore the overall normalization for the source field as it is not necessary for abundance matching. The radiation field is computed quickly using FFTs to perform the convolution in Equation 36. In practice, we can abundance match the specific intensity field instead of the photoionization rate field, as the integration over frequency only changes the overall normalization for the latter.
We use an effective mean free path to account for the attenuation of the radiation field. Our use of a mean free path parameter is more physical than the maximum bubble size imposed in some semi-numerical models based on the ESF. It is more self-consistent with RT theory (e.g. Madau et al. 1999) and in better agreement with observational measurements (e.g. Worseck et al. 2014) extrapolated to higher redshifts for the EoR. It also helps to mitigate uncertainties on radiation sinks. Recently, Davies & Furlanetto (2021) have also introduced an ionizing photon mean free path as an improved alternative to the maximum filtering scale in ESF methods like 21cmFAST. Their implementation is different in that l mfp ultimately shows up in their scale-dependent ionization criterion, whereas we use it to directly compute an attenuated radiation field.

Reionization-redshift Fields
The reionization-redshift field z re (x) is assumed to be correlated with a given field, either the matter density field, halo mass density field, or radiation field. A region with higher matter density, halo density, or radiation intensity is considered to be photoionized earlier and has a higher reionization redshift. The abundance matching technique assigns redshift values such that the reionization history follows a given mass-weighted ionization fractionx i (z), specified with the redshift midpoint, duration, and asymmetry parameters and interpolated with a Weibull function (Sec. 3.1).  Figure 14. Visualization of the reionization-redshift fields zre(x) from the three RadHydro simulations (left) and corresponding abundance matching models using matter density (AM-M), halo mass density (AM-H), and radiation intensity (AM-R) fields. Each image is 64 × 64 pixels from a slice that is 50 h −1 Mpc × 50 h −1 Mpc with a thickness of ∼ 1 h −1 Mpc. Abundance matching using the radiation field more correctly captures that large-scale regions near sources are generally reionized earlier than largescale regions far from sources, and is the adopted approach in AMBER.
We perform the abundance matching on a correlated field at a single redshift for computational efficiency, but it can also be done tomographically using multiple redshift intervals. The correlated field is first constructed at the redshift midpoint z mid and the data array is ranked in descending order using a parallel quicksort. For the nth rank order cell out of a total of N , all cells with indices m ≤ n are considered ionized. The reionization redshift z n is calculated by equating the cumulative mass fraction with the mass-weighted ionization fraction: where the matter overdensity for the mth cell is linearly extrapolated from the redshift midpoint, The linear growth is appropriate for the modest overdensities in lower-resolution semi-numerical models, and because we have chosen the redshift midpoint as the pivot. We construct the abundance matching models using the matter density field (AM-M), halo mass density field (AM-H), and radiation intensity fields (AM-R), and compare them with the RadHydro reionizationredshift fields (Sec. 2.3). For the AM-M models, we  Figure 15. Top: autopower spectra of the reionizationredshift fields from the RadHydro simulations and corresponding abundance matching models using the matter density (AM-M), halo mass density (AM-H), and radiation intensity (AM-R) fields. The AM-R spectra have a characteristic peak similar to the simulations. Center: the redshift bias bzz between the models and simulations for AM-R are much closer to unity than for AM-M and AM-H. Bottom: the AM-R models have improved cross correlation rzz that approach unity on large scales compared to AM-M and AM-H.
use the TSC particle assignment scheme with interlacing and deconvolution to construct the matter density field (Sec. 3.2.1). For the AM-H models, we use the ESF-L with sharp k−space filter for constructing the halo mass density field and set the minimum halo mass to M min = 10 8 h −1 M (Sec. 3.3.1). While this corre-sponds to the threshold for atomic cooling halos, the RadHydro simulations are more complex because of the nonmonotonic star formation efficiency and episodic star formation Doussot et al. 2019). For the AM-R models, we vary the radiation mean free path to find the best agreement between the reionizationredshift fields. Figure 14 is a visualization of the reionization-redshift fields z re (x) from the RadHydro simulations and abundance matching models. While all of the models have the same reionization history for a given simulation, their reionization-redshift fields are visibly different. The AM-R images most closely resemble the simulation results and correctly show that large-scale regions near radiation sources are generally reionized earlier than large-scale regions far from sources. However, the AM-M and AM-H images appear too grainy. The low-density regions just outside of collapsed regions are assigned low redshifts and considered to be reionized late despite their close proximity to the radiation sources. We can possibly improve these results by smoothing the matter density and halo density fields prior to abundance matching. Figure 15 shows the reionization-redshift autopower spectra ∆ 2 zz (k) and cross correlations relative to the RadHydro normalized redshift δ z (Eq. 4). We only show Sims 0 and 2 for clarity. The AM-R power spectra are most similar to the simulation results, rising in power until a characteristic peak scale and then declining in amplitude with k. However, the AM-M and AM-H power spectra generally always increase with k, similar to the mass density (Fig. 7) and halo density power spectra (Fig. 12). The AM-R models are the least biased with b zz ≈ 1 for k 1 h/Mpc, while the AM-M and AM-H biases significantly deviate from unity. Furthermore, the AM-R models have improved cross correlation r zz that approach unity on large scales. The bias and stochasticity on small scales can be attributed to three main factors. In the RadHydro simulations, the episodic star formation and spatially varying mean free path are not accounted for. Furthermore, the simulations and models have differences in smoothing near the grid scale.
In AMBER, we choose to abundance match using the radiation field evaluated at the redshift midpoint. In future work, we can improve the spatial accuracy by incorporating varying mean free paths (e.g. Cain et al. 2021;Davies & Furlanetto 2021). We can also perform the abundance matching tomographically using multiple redshifts.

METHODS COMPARISON
We compare AMBER against two other seminumerical methods using RadHydro Sim 1 as reference. We continue to use the fiducial resolution with 64 3 mesh with cell size l c = 0.8 h −1 Mpc, which is comparable to the typical resolution for semi-numerical methods. In this section, we first summarize the methods (Sec. 4.1 and 4.2), and then compare their reionization histories (Sec. 4.3) and reionization-redshift fields (Sec. 4.4).

21cmFAST
The 21cmFAST code , like the majority of semi-numerical methods, is based on the ESF for reionization (e.g. Furlanetto et al. 2004). It generates the density, ionization, velocity, and spin temperature fields to compute the 21cm brightness temperature given the astrophysical parameters that control the ionizing, Lyα, and X-ray backgrounds. This method has been used widely to model and study the EoR through the 21cm signal (e.g Greig & Mesinger 2017) and CMB (e.g. Mesinger et al. 2012).
We use the original version with three free parameters: ionizing efficiency ζ, minimum halo mass M min , and maximum filter scale R max , because this parameterization is common amongst ESF codes and has been extensively tested and applied. Park et al. (2019) have developed a new version with eight free parameters, allowing additional parameterization of high-redshift galaxy properties. Davies & Furlanetto (2021) have proposed improving R max with two physically motivated prescriptions of the mean free path of ionizing photons. To our knowledge, the newer versions have not yet been tested against RT simulations, and searching the highdimensional parameter space is beyond the scope of our basic comparison.
By default, 21cmFAST uses 64 times as many 1LPT particles as mesh cells for constructing density and velocity fields. For this comparison, 256 3 particles are assigned to a 64 3 mesh using the NGP scheme, but there is significant smoothing near the grid scale because deconvolution is not performed. To achieve the same effective resolution, 21cmFAST would require at least 8 times as many mesh cells, 512 times as many particles, and over 500 times as much memory as AMBER. It also uses the alternative ESF-E version for the collapsed fraction (Sec. 3.3.1), but it does not explicitly compute the halo mass density field.

RLS
The Reionization on Large Scales (RLS; Battaglia et al. 2013b) method is not based on the ESF, but is a parametric approach motivated by and calibrated with previous RadHydro simulations. It provides a parametric approach to map higher-resolution, smallervolume radiation-hydrodynamic simulations onto lowerresolution, larger-volume N-body simulations. This method has been applied to model and study the patchy Thomson optical depth (Natarajan et al. 2013), patchy KSZ effect (Battaglia et al. 2013a), and the 21cm lightcone effect (La Plante et al. 2014).
The reionization-redshift field is found to be highly correlated with the matter density field down to Mpc scales and related using a first-order, scale-dependent bias. In Fourier space, the matter overdensity field δ(k) at the mean redshiftz is first smoothed with a spherical tophat filter W th (k) with radius R th = 1 h −1 Mpc and then multiplied by a bias function b zm (k) to produce the normalized redshift field, The bias function is parameterized as where the normalization is fixed at b RLS = 1/δ c = 0.593 based on analytical models (Barkana & Loeb 2004), while the scale k RLS and slope a RLS are allowed to vary. Finally, the reionization-redshift field is given by where the mean redshiftz is the volume-averaged value of the reionization-redshift field, and it is not exactly equal to the redshift midpoint z mid of either the massweighted or volume-weighted ionization fractions. While the RLS parameters can be varied to change the reionization history and process, there is no direct connection between them and the redshift midpoint, duration, asymmetry, minimum halo mass, and radiation mean free path. Figure 16 compares the mass-weighted ionization fractionsx i,M (z) and volume-weighted ionization fractions x i,V (z) from the models with the simulation. For AM-BER, we directly input the redshift midpoint z mid = 7.85, duration ∆ z = 4.73, and asymmetry A z = 2.35 parameter values. These are calculated for RadHydro Sim 1 at the fiducial resolution using Equations 10 and 11 onx i,M (z). While AMBER uses the mass-weighted parameters by construction, it almost exactly reproduces both the mass-weighted ionization fraction with |∆x i,M | max = 0.01, and volume-weighted ionization fraction with |∆x i,V | max = 0.03. There are two models for 21cmFAST. In the first model (21cmFAST-a), we vary the ionization efficiency and find the best-fit ζ = 7.5 for matching z mid . The minimum halo mass is kept fixed at M min = 10 8 h −1 M to be consistent with the SCORCH galaxy models used in the RadHydro simulations. This model has a more extended reionization history, overestimates the duration by 35%, underestimates the asymmetry by 6%, and has |∆x i,M | max = 0.10 and |∆x i,V | max = 0.17. In the second model (21cmFAST-b), we find best-fit ζ = 22.9 and M min = 1.8 × 10 9 h −1 M for matching both z mid and ∆ z . This model underestimates the asymmetry by 14% and has |∆x i,M | max = 0.04 and |∆x i,V | max = 0.06. Note that the best-fit M min is now ≈ 20 times larger and inconsistent with that in the RadHydro simulations. This complicates the interpretation of the inferred minimum halo mass for galaxy formation when comparing observations with the original version of 21cmFAST and other similar ESF models. The newer version (Park et al. 2019) with additional parameterization for high-redshift galaxy properties will likely provide better agreement, but it requires varying eight free parameters.

Reionization History
There are also two models for the RLS method. The first model (RLS-a) strictly follows Battaglia et al. (2013b). We fit the bias data from RadHydro Sim 1 and obtain k RLS = 2.09 h/Mpc and a RLS = 1.71 when b RLS is set to b B13 = 0.593. In addition, we choosē z = 7.84 to be the same value from the simulation. The fitted function underestimates the large-scale bias, and the best-fit k RLS and a RLS are significantly different from the fiducial values of k B13 = 0.185 h/Mpc and a B13 = 0.564 in Battaglia et al. (2013b). This model increases z mid by 0.19, underestimates ∆ z by 17%, underestimates A z by 25%, and has |∆x i,M | max = 0.11 and |∆x i,V | max = 0.09. In the second model (RLS-b), we fit for all three bias parameters and obtain b RLS = 0.895, k RLS = 0.325 h/Mpc, and a RLS = 0.816. While k RLS and a RLS are now more similar to k B13 and a B13 , the normalization b RLS is 51% larger than the previously adopted b B13 . We also choosez = 7.61 in order to match z mid , but note the difference of 0.24 between them. This model underestimates ∆ z by 16%, underestimates A z by 30%, and has |∆x i,M | max = 0.08 and |∆x i,V | max = 0.07. Note that the RLS method tends to produce more symmetric reionization histories, and it is difficult to produce larger A z values just by varying the free parameters. Figure 17 is a visualization of the reionization-redshift fields z re (x) and Figure 18 shows the reionizationredshift autopower spectra ∆ 2 zz (k) and cross correlations relative to RadHydro Sim 1. For AMBER, we vary the radiation mean free path and obtain l mfp = 3.2 h −1 Mpc when matching the auto-power spectrum on the largest scales. This produces a reionization-redshift field that is visually and statistically in strongest agreement out of all the models. The redshift bias b zz (k) and cross correlation r zz (k) show that AMBER is both the least biased and most correlated with the RadHydro simulation.

Reionization-redshift Field
For 21cmFAST, we can also vary the ESF maximum filter scale R max . It is recommended that R max ≥ 20 h −1 Mpc, but we are limited to R max ≤ 25 h −1 Mpc. The radius of the smoothing sphere cannot be larger than half of the periodic box length otherwise it would wrap around itself. We find no improvement over this narrow range of filtering scales. 21cmFAST-b does better than 21cmFAST-a because we are able to match both z mid and ∆ z . Even then it differs more from Rad-Hydro than either AMBER or RLS. The reionizationredshift image shows noticeable differences near biased sources, and this coincides with the redshift bias b zz (k) being larger than unity by > 0.2 on the largest scale. The redshift cross correlation r zz (k) drops to 0.5 by k ∼ 1 h/Mpc and to zero at the grid Nyquist frequency. The stochasticity may be due to using the NGP scheme for the matter density field, the ESF-E version for the collapsed fraction, and the semi-numerical approach for the ionization fraction field. Again, the newer version of 21cmFAST will likely provide better agreement, but it requires searching an eight-dimensional parameter space.
The RLS method is a parametric approach to constructing the reionization-redshift field, so it is not surprising to see that it is visually and statistically in good agreement. On large scales, RSL-b does better than RLS-a because the bias normalization is allowed to vary beyond the fixed value of b B13 = 0.593. On small scales, both models have significant smoothing because of the spherical tophat filter applied to the overdensity field. In Battaglia et al. (2013b), this implementation choice was made because the linear, deterministic biasing relation, δ z (k) ∝ δ(k), breaks down on scales k 1 h/Mpc. RLS has more stochasticity than AMBER, but less than 21cmFAST. The gradual deviation of the redshift cross correlation r zz (k) from unity reflects the decreasing correlation between the reionization-redshift and matter density fields towards smaller scales.
More studies and comparisons are required to understand the advantages, disadvantages, and appropriate utilization of the various semi-numerical methods. We emphasize that a variety of independent approaches are crucial for theoretical and inference studies of the EoR.

RESULTS
We now present some basic results from AMBER and leave further applications for upcoming work. For example, Chen et al. (2022) models and studies the patchy KSZ effect, which is a promising probe of the EoR. In this section, we study the parameter dependence of the reionization-redshift field (Sec. 5.1) by varying the redshift midpoint, duration, asymmetry, minimum halo mass, and radiation mean free path.
We run AMBER models, each with 2048 3 particles and 2048 3 cells in a comoving box of side length L = 2 h −1 Gpc. For the fiducial model, we use z mid = 8.0, ∆ z = 4.0, A z = 3.0, M min = 10 8 h −1 M , and l mfp = 3.0 h −1 Mpc. In addition, we include a lower and higher value when we vary each parameter one at a time. Each simulation takes approximately 20 CPU hours when run in parallel with 32 to 64 cores. We discuss the code scaling performance in Appendix B. Figure 19 is a visualization of the reionization-redshift fields. Each image is 100 × 100 pixels from a subsection that is 100 h −1 Mpc × 100 h −1 Mpc with a thickness of ∼ 1 h −1 Mpc. We use a divergent color scheme centered at the fiducial redshift midpoint z mid = 8.0 to distinguish between regions that are ionized earlier and later. In general, the higher-density regions near radiation sources are reionized earlier than the lower-density regions farther away on large scales. As a result, a larger fraction of the volume is seen with z re < z mid and the redshift midpoint of the volume-weighted ionization fraction comes later. Figure 20 shows the one-point PDF of the normalized redshift δ z (Eq. 4), reionization-redshift auto-power  Figure 18. Top: autopower spectra of the reionizationredshift fields from RadHydro Sim 1, AMBER, 21cmFAST, and RLS. All of the spectra peak at a characteristic scale that depends on the redshift midpoint z mid . Center: the redshift bias bzz between the models and simulation show that AMBER and RLS-b are the least biased. Bottom: the cross correlations rzz show that AMBER is the most correlated, followed by RLS, while 21cmFAST is the most stochastic. spectra, and cross correlations relative to the matter density field at the redshift midpoint. In general, the PDF(δ z ) are skewed Gaussians except for the symmetric case with A z = 1.0. The reionization-redshift power spectrum ∆ zz (k) first rises in power until a characteristic peak scale and then declines in amplitude with k. The redshift-matter bias b zm (k) asymptotically approaches a constant value towards large scales and declines monotonically with k.

Redshift Midpoint
As the redshift midpoint decreases over the range 7.0 ≤ z mid ≤ 9.0, we see overall later reionization in Figure 19. The one-point PDF(δ z ) becomes wider, the two-point power spectrum ∆ 2 zz (k) increases overall in amplitude, but the redshift-matter bias b zm (k) remains almost constant in Figure 20. In Equation 4 for the normalized redshift δ z , the numerator (z re −z re ) remains approximately constant, but the denominator (1 +z re ) decreases. Therefore, δ z grows by a factor of a re = 1/(1+z re ), whereas the matter overdensity δ grows by a factor D(z mid ) ≈ 1/(1 + z mid ). We find that the normalized redshift grows similarly to the matter overdensity as the bias is only ≈ 5% lower for z mid = 7.0 than for z mid = 9.0. Note that while the radiation field also grows like the matter density, the abundance matching is insensitive to the overall amplitude.

Duration
As the duration increases over the range 2.0 ≤ ∆ z ≤ 6.0, we see a larger range of values in the reionizationredshift images. The PDF(δ z ) has larger variance and the power spectrum increases overall in amplitude. Note that the variance is given by σ 2 z = |δ z (x)| 2 = ∆ 2 zz (k)d ln k. Therefore, we expect the large-scale bias to scale with σ z , which in turn scales with ∆ z . We find that the large-scale bias changes by a factor ≈ 3.1 when the duration increases from ∆ z = 2.0 to ∆ z = 6.0.
The large-scale bias can differ significantly from the single value of b B13 = 1/δ c = 0.593 adopted in the RLS method (Battaglia et al. 2013b). Barkana & Loeb (2004) derived this value assuming that fluctuations in the reionization-redshift field are solely determined by fluctuations in the collapsed mass density field from EPS theory. Relating the growth of ionized regions to the growth of dark matter halos would correspond to a particular duration value. However, the reionization history can depend on other nonconstant factors and the largescale bias can have a range of values in general.

Asymmetry
As the asymmetry increases over the range 1.0 ≤ A z ≤ 5.0, the images show a smaller range of redshift values for regions reionized after the midpoint and a larger range of redshift values for the regions reionized earlier than the midpoint. The PDF(δ z ) becomes skewed and the power spectrum changes shape. There is less power on large scales and more power on small scales as the characteristic peak shifts to smaller scales. For a larger A z at fixed duration, the first portion (z mid < z < z ear ) z re Figure 19. Visualization of the reionization-redshift fields zre(x) when the redshift midpoint z mid , duration ∆z, asymmetry Az, minimum halo mass Mmin, and radiation mean free path l mfp are varied. Each column shows when a given parameter is varied, while the middle row shows to the same fiducial model. The divergent color scheme centered at the fiducial z mid = 8.0 is used to distinguish between regions that are ionized earlier and later. Each image is 100 × 100 pixels from a subsection that is 100 h −1 Mpc × 100 h −1 Mpc with a thickness of ∼ 1 h −1 Mpc.
of the reionization history becomes longer, while the second portion (z lat < z < z mid ) becomes shorter. We have just seen that increasing (decreasing) the duration increases (decreases) the overall power. We find that lengthening the first half portion slightly increases the power on small scales while shortening the second portion more significantly decreases the power on large scales. Correspondingly, the large-scale bias decreases with more asymmetry in the reionization history.

Minimum Halo Mass
Over the minimum halo mass range 10 7 ≤ M min /[h −1 M ] ≤ 10 10 , there are only minor changes to the reionization-redshift field. The PDF(δ z ) only has minor changes since the redshift midpoint, duration, and asymmetry are kept fixed. There is minor suppression in ∆ 2 zz (k) and b zm (k) on small scales. For larger M min , the collapsed fraction decreases and the halo bias in-creases. Correspondingly, this decreases the amplitude and increases the bias of the radiation intensity field (Eq. 36). However, the abundance matching technique is insensitive to the overall normalization of J ν and to the overall normalization of the fluctuating component δ J . The abundance matching only depends on the relative ranking of the field values.
On one hand, varying the minimum halo mass and halo density field will generally change the reionization history and process if astrophysical parameters such as the star formation efficiency, photon production efficiency, and radiation escape fraction are kept fixed. On the other hand, varying these astrophysical terms, which can be functions of mass and redshift, can offset the changes from varying M min and produce the same reionization history. With AMBER, we find that when the redshift midpoint, duration, asymmetry, and radiation mean free path are kept fixed, then the reionization- redshift field has only minor dependence on the minimum halo mass. For certain applications, it may then be possible to ignore M min and reduce the number of free parameters.

Radiation Mean Free Path
As the radiation mean free path increases over the range 1.0 ≤ l mfp /[h −1 Mpc] ≤ 5.0, we see more spatial coherence in the reionization-redshift images. For a larger l mfp , there is a larger coherence length and less small-scale structure in the radiation intensity field used for abundance matching. The PDF(δ z ) only has minor changes since the redshift midpoint, duration, and asymmetry are kept fixed. The reionization-redshift power spectrum changes shape though, having more power on large scales and less on small scales. Increasing l mfp produces relatively larger ionized regions, resulting in the characteristic peak shifting to larger scales.
In the RLS method, Battaglia et al. (2013b) change the duration by varying a RLS and k RLS . But without varying b RLS , they also change the correlation length of ionized regions at the same time. More specifically, they obtain smaller correlation lengths for longer dura-tions. With AMBER, we find longer ∆ z increases the bias, while shorter l mfp decreases the bias. This explains the inverse relationship between the duration and correlation length when b RLS is fixed in the RLS method. AMBER has the flexibility to change ∆ z and l mfp independently.

CONCLUSION
AMBER is a semi-numerical code for modeling the cosmic dawn and EoR. The new algorithm is not based on the ESF for directly predicting the ionization fraction field, but takes the novel approach of calculating the reionization-redshift field z re (x) assuming that the hydrogen gas encountering higher radiation intensity is photoionized earlier. Redshift values are assigned while matching the abundance of ionized mass according to a given mass-weighted ionization fraction x i (z). The code has the unique advantage of allowing users to directly specify the reionization history through the redshift midpoint z mid , duration ∆ z , and asymmetry A z input parameters (Trac 2018). The reionization process is further controlled through the minimum halo mass M min for galaxy formation and the radiation mean free path l mfp for RT.
We implement improved methods for modeling the large-scale structure and constructing density, velocity, halo, and radiation fields, which are essential components for modeling reionization observables. 2LPT (vs 1LPT) is used to efficiently evolve the matter distribution in the moderately nonlinear regime. The TSC (vs NGP or CIC) PM assignment scheme with interlacing and deconvolution produces density and velocity fields that are in excellent agreement with RadHydro simulations (Doussot et al. 2019). ESF-L (vs ESF-E) accurately and robustly produces halo mass density fields compared to N-body simulations . The radiation intensity field is computed quickly using FFTs to convolve the source field with the RT kernel. The interlacing and deconvolution reduce aliasing and smoothing, and thereby improve power spectrum estimation (e.g. Sefusatti et al. 2016). These methods can also be used to improve other semi-numerical and RT algorithms. The AMBER program modules can be adapted and plugged into other codes.
We compare AMBER with two other semi-numerical methods, 21cmFAST ) and the RLS method (Battaglia et al. 2013b), and find that our code more accurately reproduces the results from RadHydro simulations. AMBER has the flexibility to directly match the simulated reionization history x i (z), while the other two methods require running multiple models before finding the best fit. It also produces reionizationredshift fields z re (x) that are the least biased and most correlated with the simulations. The RLS method has more smoothing, while 21cmFAST has more stochasticity. More studies and comparisons are required to understand the advantages, disadvantages, and appropriate utilization of the various semi-numerical methods. We reiterate that a variety of independent approaches are crucial for theoretical and inference studies of the EoR.
We study the parameter dependence of the reionization-redshift field by varying the redshift midpoint, duration, asymmetry, minimum halo mass, and radiation mean free path. As z mid decreases, there is overall later reionization, the PDF(δ z ) becomes wider, the power spectrum ∆ 2 zz (k) increases overall in amplitude, but the redshift-matter bias b zm (k) remains approximately constant. As ∆ z increases, there is more extended reionization, the redshift distribution becomes wider, and both the power spectrum and bias increase overall in amplitude. As A z increases, the reionization history becomes more asymmetric, the redshift distribution is more skewed, and the power spectrum changes shape. Interestingly, the reionization-redshift field is only weakly sensitive to M min when the other parameters are fixed. As l mfp increases, there is more spatial coherence and less small-scale structure in the radiation intensity and reionization-redshift fields, and the power spectrum changes shape due to the relatively larger ionized regions.
AMBER is initially designed for theoretical and inference studies in which the primary interest is controlling or constraining the reionization history and process. With the reionization-redshift field, the evolution of the neutral hydrogen and ionized electron densities can be readily computed. As a first application, Chen et al. (2022) models and studies the imprint of reionization on the CMB through the patchy KSZ effect, which is a promising probe of the EoR. Complementary, the connection between radiation sources and sinks, and the reionization history can be understood using analytical models for the evolution of the massweighted ionization fraction (e.g. Chen et al. 2020) and radiation-hydrodynamic simulations that solve the complex physics.
In future work, we will include additional physics that will enable more realistic predictions of EoR observables. To improve the source models, we will calibrate the ESF halo collapsed fraction against N-body simulations, and include high-redshift galaxies through abundance matching (e.g. Trac et al. 2015). To improve the radiation and reionization-redshift fields, we will incorporate spatially varying radiation mean free paths (e.g. Cain et al. 2021;Davies & Furlanetto 2021), and perform the ionization abundance matching tomographically using multiple redshifts. We will compute the Lyα and X-ray radiation fields for the 21cm signal (e.g. Mesinger et al. 2011;Semelin et al. 2017) and calculate the thermal history for the Lyman alpha forest (e.g. Upton Sanderbeck et al. 2016;D'Aloisio et al. 2019).
AMBER is currently parallelized to run on multi-core, shared-memory nodes. It is over four orders of magnitude faster than radiation-hydrodynamic simulations and will efficiently enable large-volume models, full-sky mock observations, and parameter-space studies. The code will be made publicly available to facilitate and transform studies of the EoR.
We thank Nick Gnedin for initial discussions that motivated the development of AMBER. We thank Anson D'Aloisio, Matt McQuinn, and the anonymous referee for providing comments on the manuscript. We also thank Nick Battaglia, Sasha Kaurov, Emiliano Sefusatti, and Qirong Zhu for helpful discussions. The computational work was performed on the Bridges-2 system at the Pittsburgh Supercomputing Center (PSC) using allocations ast190014p and phy210042p. We thank PSC staff Ken Hackworth and T.J. Olesky for computing support. H.T. acknowledges support from the NSF AI Institute: Planning: Physics of the Future, NSF PHY2020295. R.C. acknowledges support from NSF AST2007390. B. SCALING PERFORMANCE AMBER is written in modern Fortran and parallelized using OpenMP to run on multi-core, sharedmemory nodes. We perform the following scaling tests on Bridges-2 at the Pittsburgh Supercomputing Center (PSC). An Extreme Memory (EM) node has 4 Intel Xeon Platinum 8260M "Cascade Lake" CPUs, 24 cores per CPU, 96 cores per node, and a total of 4 TB of memory. Figure 21 shows the results of the scaling tests. For the strong scaling test, AMBER is run with N part = 2048 3 particles and equal number of mesh cells in a comoving box of side length L = 2 h −1 Gpc. We vary the number of cores N core from 1 to 64 and measure the wall time t(N core ). The speedup is defined as t (s) Figure 21. Top: strong scaling test with a fixed number of Npart = 2048 3 particles for each AMBER run. The measured wall time t is compared against the ideal scaling. Bottom: weak scaling test with a particle/core ratio Npart/Ncore ≈ 512 3 .
and we obtain S = 5.4 (21.8) for N core = 8 (64). The deviation from ideal speedup is mostly due to our parallel quicksort, which is not fully optimized yet. The serial sorting takes about 20% of the total wall time, while the parallel sorting takes 29% (47%) for N core = 8 (64). Nonetheless, a typical AMBER model can be quickly run in under an hour wall time with 32 to 64 cores. For the weak scaling test, we use a particle/core ratio of N part /N core ≈ 512 3 . We vary N core from 1 to 64 and change N part accordingly. The efficiency is defined as and we obtain E = 61% (30%) for N core = 8 (64). Again, the decrease in efficiency is mostly due to our nonoptimal quicksort. The serial sorting takes about 14% of the total wall time, while the parallel sorting takes 27% (47%) for N core = 8 (64). We will improve the parallelization of the algorithm in future work.