Constraining the X-ray heating and reionization using 21-cm power spectra with Marginal Neural Ratio Estimation

Cosmic Dawn (CD) and the Epoch of Reionization (EoR) are epochs of the Universe which host invaluable information about the cosmology and astrophysics of X-ray heating and hydrogen reionization. Radio interferometric observations of the 21-cm line at high redshifts have the potential to revolutionize our understanding of the Universe during this time. However, modeling the evolution of these epochs is particularly challenging due to the complex interplay of many physical processes. This makes it difficult to perform a conventional statistical analysis using likelihood-based Markov-Chain Monte Carlo (MCMC) methods, which scale poorly with the dimensionality of the parameter space. In this paper, we show how Simulation-Based Inference (SBI) through Marginal Neural Ratio Estimation (MNRE) provides a step towards evading these issues. We use 21cmFAST to model the 21-cm power spectrum during the CD-EoR with a six-dimensional parameter space. With the expected thermal noise from the Square Kilometre Array (SKA), we are able to accurately recover the posterior distribution for the parameters of our model at a significantly lower computational cost than conventional likelihood-based methods. We further show how the same training dataset can be utilized to investigate the sensitivity of the model parameters over different redshifts. Our results support the claim that such efficient and scalable inference techniques enable us to significantly extend the modeling complexity beyond what is currently achievable with conventional MCMC methods.


INTRODUCTION
The Cosmic Dawn (CD) marks the formation of the first sources of light, which produced high-energy X-ray and UV radiation. The radiation from these sources heated up the intergalactic medium (IGM) and initiated the Epoch of Reionization (EoR), during which the IGM transitioned from a neutral to an ionized state (Barkana & Loeb 2001; Furlanetto et al. 2006a; Pritchard & Loeb 2012). The astrophysics driving the heating and reionization processes is still poorly understood, with large uncertainties on the properties of the sources which dominantly contributed to these epochs (e.g., their star formation efficiency, ionizing efficiency, and X-ray luminosity). Observations of high-redshift quasar spectra (Becker et al. 2001; Fan et al. 2003; Boera et al. 2019), the electron scattering optical depth from the Cosmic Microwave Background (CMB) (Kaplinghat et al. 2003; Komatsu et al. 2011; Planck Collaboration et al. 2020), and the luminosity function and clustering properties of Lyman-α emitters (Jensen et al. 2012; Dijkstra 2014; Bouwens 2016; Gangolli et al. 2020) currently provide some constraints on the astrophysical evolution of the CD and EoR. The 21-cm line associated with the spin-flip hyperfine transition of the hydrogen atom offers the most promising probe to study these eras.
There are a large number of ongoing radio interferometric experiments, including GMRT (Paciga et al. 2013), HERA (DeBoer et al. 2017), LOFAR (Mertens et al. 2020; Ghara et al. 2020), LWA (Eastwood et al. 2019), MWA (Barry et al. 2019; Li et al. 2019) and PAPER (Kolopanis et al. 2019). These experiments target the detection of the 21-cm signal by quantifying its spatial fluctuations using various Fourier statistics. We get increasingly interesting upper limits on the 21-cm power spectra from these experiments, some of which already enable us to rule out certain astrophysical models (Ghara et al. 2020, 2021; Mondal et al. 2020; Greig et al. 2021; Abdurashidova et al. 2022; The HERA Collaboration et al. 2022). The upcoming SKA (Koopmans et al. 2015; Mellema et al. 2015) is expected to detect the 21-cm power spectrum, and owing to its high sensitivity, it is likely that the SKA will also be able to perform full tomography of the 21-cm signal.
Once the signal is detected, the next goal would be to constrain the parameters of the CD-EoR models to pin down the astrophysics of the early Universe. Modeling the 21-cm signal from the CD-EoR using full radiative transfer simulations (Mellema et al. 2006; Ghara et al. 2015) is computationally expensive and infeasible for parameter inference. To overcome this challenge, various approximate and efficient semi-numerical models are used to model the signal accurately at scales ≥ 1 Mpc (Zahn et al. 2011). The traditional framework used to explore the parameter space is 21CMMC (Greig & Mesinger 2015, 2017, 2018; Park et al. 2019), which uses the semi-numerical 21-cm signal simulator 21cmFAST (Mesinger et al. 2010) and embeds this code in a Markov-Chain Monte Carlo (MCMC) sampler. While 21CMMC is quite powerful for systematically performing parameter inference, it becomes computationally expensive once we take into account the inhomogeneous X-ray heating in the simulations. Alternatively, one can use analytical models of the 21-cm signal during the CD-EoR (Qin et al. 2022; Muñoz 2023). Quite generally, as the dimensionality of the parameter space increases, it takes longer for an MCMC, which samples the full joint posterior, to converge.
To circumvent these problems, machine learning techniques have been explored for various astrophysical and cosmological problems. In the context of 21-cm cosmology, one common approach is to use emulators, which are trained using artificial neural networks to replace the actual simulations. This makes the likelihood evaluations and, consequently, the parameter inference significantly faster (Shimabukuro & Semelin 2017; Kern et al. 2017; Schmit & Pritchard 2017; Tiwari et al. 2022). However, the application of emulators is currently limited to low-order summary statistics. The likelihood can become intractable for higher-order information such as the full 3D 21-cm images. For a tractable likelihood function, the traditional MCMC algorithm can be used to sample from the posterior distribution. When the likelihood itself is intractable, however, techniques such as Approximate Bayesian Computation (ABC) (Toni et al. 2008) can be used to sample from an approximate posterior. This approach uses simulated datasets to avoid likelihood evaluations; however, it requires the introduction of summary statistics, which can significantly affect the quality of the approximation.
These issues can be resolved by performing Simulation-Based Inference (SBI) (Cranmer et al. 2020; Papamakarios et al. 2019; Alsing et al. 2019), where deep learning algorithms, together with ideas from ABC, are used to estimate the posterior distribution. In this work, we apply the Marginal Neural Ratio Estimation (MNRE) algorithm (Miller et al. 2021) using swyft (Miller et al. 2022). It directly estimates the marginal likelihood-to-evidence ratios through neural networks, which makes it much more efficient than sampling the full joint posterior with an MCMC. In addition, MNRE offers the flexibility to ignore large numbers of nuisance parameters, learning only the parameters of interest. It has already been applied to cosmological parameter inference from the CMB power spectra (Cole et al. 2022), reconstructing the halo clustering and halo mass function from N-body simulations (Dimitriou et al. 2022), and gravitational lensing analyses (Coogan et al. 2022).
In this work, we use this framework for astrophysical parameter inference with the 21-cm power spectrum from the CD-EoR. In a recent study, Zhao et al. (2022b) performed reionization parameter inference for the EoR using density estimation likelihood-free inference (DELFI). Their analysis, however, was limited to a two-dimensional parameter space to model the 21-cm signal during the EoR. Here, we extend the parameter space to six dimensions to also include the parameters that govern the inhomogeneous X-ray heating during the CD. In this case, a single 21-cm power spectrum simulation is ∼ 5 times slower than in the former 2D parameter space. This implies that an MCMC for the six-dimensional parameter space would be even slower because of the typical exponential scaling of the required number of samples with the number of parameters. Moreover, such an analysis with conventional methods while also co-varying the cosmic seed of the forward models is pushed out of the realm of feasibility.
Generating the simulated dataset and performing the inference with MNRE are two independent processes within swyft. This allows us to utilize the same training dataset for various applications. To highlight this aspect of swyft, in a worked-out example, we let the neural network determine which set of parameters is sensitive at which redshifts at no extra cost in 21-cm simulations. The distribution of integration time over different redshifts can be considered a proxy to determine which part of the data each parameter is most sensitive to. This could be indicative of possible degeneracies between parameters for more complex astrophysical models of the 21-cm signal.
This paper is organized as follows. In Section 2, we briefly outline the implementation of MNRE using swyft. In Section 3, we describe the 21-cm signal modeling and the parameters of interest. In Section 4, we present the posterior inference and investigate the sensitivity of the model parameters in different redshift ranges. We conclude in Section 5. Throughout this work, we assume a ΛCDM universe with cosmological parameters Ωm = 0.308, Ωb = 0.048, ΩΛ = 0.692, h = 0.678 and σ8 = 0.81 (Planck Collaboration et al. 2016a).

IMPLEMENTATION OF MNRE USING swyft
The probability distribution of model parameters θ for a given observation x follows from Bayes' theorem,

p(θ|x) = p(x|θ) p(θ) / p(x),    (1)

where p(x|θ) is the likelihood of the data x for given parameters θ, p(θ) is the prior probability distribution over the parameters and p(x) is the evidence of the data.
In SBI, the information about the likelihood is implicitly accessed via a stochastic simulator, which maps input parameters θ to data x. We generate sample-parameter pairs from this simulator, {(x1, θ1), (x2, θ2), ...}. Here θi is typically drawn from the prior, so these pairs are drawn from the joint distribution p(x, θ). These pairs are used to train a neural network to approximate the likelihood-to-evidence ratio, a procedure known as Neural Ratio Estimation (NRE) (Hermans et al. 2020a,b; Durkan et al. 2020). Following equation (1), this ratio (which we denote r(x, θ)) can be expressed as

r(x, θ) ≡ p(x|θ) / p(x) = p(x, θ) / [p(x) p(θ)].    (2)

In other words, r(x, θ) is equal to the ratio of the joint probability density p(x, θ) to the product of the marginal probability densities p(x) p(θ). A binary classifier dϕ(x, θ) is then trained to distinguish between jointly-drawn and marginally-drawn pairs. Here ϕ denotes the learnable parameters of the model, which are updated as the model is trained. More precisely, we introduce a binary label y to denote whether a pair was drawn jointly (y = 1) or marginally (y = 0). Strictly speaking, y is a random variable. The output of the classifier (assuming it is trained well) approximates the probability that a sample-parameter pair (x, θ) is drawn jointly (y = 1), i.e.,

dϕ(x, θ) ≈ p(y = 1|x, θ) = p(x, θ) / [p(x, θ) + p(x) p(θ)],    (3)

where we assumed p(y = 0) = p(y = 1) = 1/2. This learning problem is associated with a binary cross-entropy loss function, which is minimized using stochastic gradient descent to find the optimal parameters ϕ of the network. The binary classifier is simply a dense neural network with a few hidden layers. Once the network is trained, it results in

r(x, θ) ≈ dϕ(x, θ) / [1 − dϕ(x, θ)],    (4)

which can be re-written as

p(θ|x) ≈ r(x, θ) p(θ)    (5)

to estimate the posterior probability distribution. This procedure can directly estimate marginal posteriors by omitting model parameters from the network's input, a variant called Marginal Neural Ratio Estimation (MNRE). In this work, we use MNRE as implemented in the software package swyft (Miller et al. 2022).
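The training scheme described above can be illustrated with a deliberately minimal, self-contained sketch. Everything here is a toy stand-in, not the paper's pipeline: a 1D Gaussian simulator, hand-crafted features in place of hidden layers, and plain logistic regression trained on the binary cross-entropy objective to separate jointly-drawn from marginally-drawn pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, sigma=0.1):
    # Toy stochastic simulator: x = theta + Gaussian noise.
    return theta + sigma * rng.standard_normal(theta.shape)

# Jointly-drawn pairs (x_i, theta_i), and marginal pairs made by
# shuffling the parameters so x and theta become independent.
n = 20000
theta = rng.uniform(-1.0, 1.0, n)
x = simulate(theta)
theta_marg = rng.permutation(theta)

def features(x, t):
    # Hand-crafted features standing in for a network's hidden layers.
    return np.stack([np.ones_like(x), x, t, x * t,
                     x**2, t**2, (x - t)**2], axis=1)

X = np.concatenate([features(x, theta), features(x, theta_marg)])
y = np.concatenate([np.ones(n), np.zeros(n)])  # y=1 joint, y=0 marginal

# Logistic "classifier" trained by gradient descent on the
# binary cross-entropy loss, mimicking the NRE objective.
w = np.zeros(X.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - y) / len(y)

def log_ratio(x, t):
    # The classifier logit approximates log r(x, theta).
    return features(np.atleast_1d(float(x)), np.atleast_1d(float(t))) @ w

# The learned ratio should favour parameters consistent with the data:
print(float(log_ratio(0.5, 0.5)[0]), float(log_ratio(0.5, -0.5)[0]))
```

After training, log r(x, θ) is large when θ could plausibly have generated x and small otherwise, which is exactly the quantity equation-wise combined with the prior to approximate the posterior.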

21cmFAST
To model the 21-cm signal and the underlying astrophysics of heating and reionization, we use the publicly available semi-numerical formalism 21cmFAST (Mesinger et al. 2010). We first generate the initial density perturbations at z = 300 on a high-resolution 1024^3 grid. These perturbations are evolved to later redshifts using the Zel'dovich approximation (Zel'dovich 1970). To produce the ionization map, the high-resolution density field is first mapped onto a coarser grid. Then, 21cmFAST uses an excursion-set based formalism (Furlanetto et al. 2004) to identify the ionized regions by comparing the number of ionizing photons with the number of baryons within spheres of decreasing radius Rmin ≤ R ≤ Rmax. Here, Rmin depends on the spatial resolution of the simulation, and Rmax is the maximum horizon for ionizing photons (see Section 3.1.3). A grid point located at (x, z) is considered fully ionized if

ζ fcoll(x, z, R, Mmin) ≥ 1    (6)

for any R, where ζ represents the ionizing efficiency (see Section 3.1.1) and fcoll(x, z, R, Mmin) is the fraction of collapsed matter within a spherical region of radius R centered at (x, z), which depends on the minimum mass for halo formation, Mmin (Press & Schechter 1974; Sheth & Tormen 1999). The cells that do not satisfy Equation (6) are assigned a partial ionization fraction ζ fcoll(x, z, Rmin). The resulting ionization map is then converted into the 21-cm brightness temperature map using (Furlanetto et al. 2006b)

δTb(x, z) ≈ 27 (1 − xHII) (1 + δb) [(1 + z)/10 × 0.15/(Ωm h^2)]^(1/2) (Ωb h^2 / 0.023) (1 − TCMB/TS) [H(z) / (dvr/dr + H(z))] mK,

where xHII is the ionization fraction, δb is the baryon overdensity, Ωm is the matter density, Ωb is the baryon density, h is the Hubble parameter, TS and TCMB are the spin temperature and CMB temperature respectively, and the last term takes into account the velocity gradient dvr/dr along the line of sight.
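For concreteness, the per-cell brightness-temperature expression described above can be evaluated with a short function. This is a sketch of the standard 21cmFAST-style form (Furlanetto et al. 2006b; Mesinger et al. 2010); the function name and the convention of passing the velocity gradient in units of H(z) are our own choices.

```python
import numpy as np

def delta_Tb(x_HII, delta_b, T_S, z,
             Omega_m=0.308, Omega_b=0.048, h=0.678, dvdr_over_H=0.0):
    """21-cm brightness temperature offset [mK] for one cell.

    x_HII       : ionized fraction of the cell
    delta_b     : baryon overdensity
    T_S         : spin temperature [K]
    dvdr_over_H : line-of-sight velocity gradient in units of H(z)
    """
    T_CMB = 2.725 * (1.0 + z)  # CMB temperature at redshift z [K]
    return (27.0 * (1.0 - x_HII) * (1.0 + delta_b)
            * np.sqrt((1.0 + z) / 10.0 * 0.15 / (Omega_m * h**2))
            * (Omega_b * h**2 / 0.023)
            * (1.0 - T_CMB / T_S)
            / (1.0 + dvdr_over_H))

# A fully neutral, mean-density cell at z = 9 with T_S >> T_CMB gives
# the familiar saturated signal of a few tens of mK.
print(delta_Tb(x_HII=0.0, delta_b=0.0, T_S=1e4, z=9.0))
```

Note the limiting behaviours: a fully ionized cell gives δTb = 0, and TS < TCMB flips the sign, producing the 21-cm absorption signal.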
The spin temperature TS can couple to (i) the CMB temperature TCMB, in which case δTb = 0, (ii) the kinetic gas temperature TK through collisional coupling, and (iii) the Lyα color temperature TC through the Wouthuysen-Field coupling (Wouthuysen 1952), where TC ≈ TK. To track the evolution of the gas temperature, 21cmFAST simulates the inhomogeneous heating of the IGM by X-rays by integrating the angle-averaged specific X-ray emissivity (ϵX) along the lightcone for each cell. The specific X-ray emissivity is given as (Mesinger et al. 2010; Greig & Mesinger 2017)

ϵX(x, E, z) = (LX/SFR) [f⋆ Ωb ρcrit,0 (1 + δnl) dfcoll/dt],

where ρcrit,0 is the current critical density, f⋆ is the fraction of baryons in stars, and δnl is the evolved density. The term enclosed in square brackets is the star-formation rate (SFR) density along the lightcone. LX is the specific X-ray luminosity, which is assumed to follow a power law, LX ∝ E^(−αX). The photons below an energy threshold E0 are absorbed by the interstellar medium. The X-ray efficiency is normalized by quantifying an integrated soft-band (< 2 keV) luminosity per SFR,

LX<2 keV/SFR = ∫ from E0 to 2 keV (LX/SFR) dE.

The semi-numerical model adopted in this work consists of six astrophysical parameters which govern the evolution of the 21-cm signal during the CD-EoR. We briefly describe each of these parameters and the adopted priors below.

Ionizing efficiency, ζ
The UV ionizing efficiency of high-redshift galaxies can be expressed in terms of various factors as (Barkana & Loeb 2001; Mesinger et al. 2010)

ζ = fesc f⋆ Nγ/b / (1 + nrec),

where fesc is the fraction of ionizing photons that escape into the intergalactic medium (IGM), f⋆ is the fraction of galactic gas in stars, Nγ/b is the number of ionizing photons produced per baryon in stars and nrec is the average number of times a hydrogen atom recombines. We assume a single population of efficient star-forming galaxies (a constant ionizing efficiency for all galaxies) hosted by haloes of sufficient mass.
The timing and duration of reionization strongly depend on ζ.Large values of ζ will speed up the ionization process if we keep the other parameters fixed.We adopt a flat prior ζ ∈ (10, 100), although an extended range with the upper limit of ζ = 250 has also been studied in Greig & Mesinger (2017) to explore the models where the EoR is driven by rare, very bright galaxies.

Minimum virial temperature of haloes, Tvir^min
The minimum threshold for a halo to host a star-forming galaxy is defined in terms of its virial temperature, Tvir^min. It is related to the mass of the halo (Barkana & Loeb 2001) as

Tvir = 1.98 × 10^4 (μ/0.6) [M / (10^8 h^−1 M⊙)]^(2/3) [Ωm Δc / (Ωm^z 18π^2)]^(1/3) [(1 + z)/10] K,

where μ is the mean molecular weight, Ωm^z = Ωm(z), and Δc = 18π^2 + 82d − 39d^2 with d = Ωm^z − 1. The choice of Tvir^min determines the cut-off in the UV luminosity function. Galaxies hosted within a halo with Tvir < Tvir^min make no contribution to star formation due to internal feedback processes. We note that Tvir^min has a significant impact on both the EoR and the Epoch of Heating (EoH) because, within the 21cmFAST framework, the physics of star formation drives both the X-ray heating and ionization fields.
We adopt a flat prior on Tvir^min ∈ (10^4, 10^6) K. The minimum temperature required for efficient atomic cooling defines our lower limit of Tvir^min = 10^4 K, and the upper limit is consistent with observations of Lyman-break galaxies at high redshifts (Kuhlen & Faucher-Giguère 2012; Barone-Nugent et al. 2014).
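As a worked example of the Barkana & Loeb (2001) Tvir-mass scaling for our fiducial cosmology, the sketch below evaluates Tvir(M, z); the function name is ours and the transcription of the prefactors is our own, so treat the exact numbers as indicative.

```python
import numpy as np

def virial_temperature(M, z, mu=0.6, Omega_m=0.308, Omega_L=0.692):
    """Virial temperature [K] of a halo of mass M [M_sun/h] at redshift z,
    following the Barkana & Loeb (2001) scaling."""
    # Matter density parameter at redshift z, and the overdensity Delta_c.
    Om_z = Omega_m * (1 + z)**3 / (Omega_m * (1 + z)**3 + Omega_L)
    d = Om_z - 1.0
    Delta_c = 18 * np.pi**2 + 82 * d - 39 * d**2
    return (1.98e4 * (mu / 0.6)
            * (M / 1e8)**(2.0 / 3.0)
            * (Omega_m * Delta_c / (Om_z * 18 * np.pi**2))**(1.0 / 3.0)
            * (1 + z) / 10.0)

# A ~10^8 M_sun/h halo at z = 10 sits near the atomic-cooling
# threshold T_vir ~ 10^4 K that defines the prior's lower limit.
print(virial_temperature(1e8, 10.0))
```

The M^(2/3) scaling means the flat prior on Tvir^min spans three orders of magnitude in minimum halo mass.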

Mean free path of the ionizing photons, Rmfp
The physical size of the ionized regions is governed by the distance ionizing photons can propagate through the IGM, which depends on the population of photon-absorbing systems where recombinations take place. To take this effect into account, we define Rmfp as the maximum horizon for ionizing photons.
It has been shown by Greig & Mesinger (2017) that this parameter is only sensitive during the later stages of reionization, when the typical size of the H ii regions approaches Rmfp. We use a flat prior on Rmfp ∈ (5, 25) cMpc, similar to Greig & Mesinger (2015), which is consistent with the sub-grid recombination model of Sobacchi & Mesinger (2014).

Integrated soft-band luminosity, LX<2 keV/SFR

The total integrated soft-band (< 2 keV) luminosity per SFR escaping the host galaxies (LX<2 keV/SFR) controls the efficiency with which X-rays heat the IGM. It sets the timing and duration of the EoH in a manner similar to ζ for the EoR.
For sufficiently large values of LX<2 keV/SFR, the X-rays can also ionize the IGM at the ∼ 10-20% level, in addition to heating it. We use a flat prior on log10(LX<2 keV/SFR) ∈ (38, 42). This range is motivated by population synthesis models at high redshifts (Fragos et al. 2013) and observations of the local population of galaxies (Mineo et al. 2012; Sazonov & Khabibullin 2017).
X-ray energy threshold for self-absorption by the host galaxies, E0

The soft X-rays produced by the host galaxies can be absorbed by the interstellar medium, in which case they can no longer contribute to the heating of the IGM. From simulations of high-z galaxies, it has been shown by Das et al. (2017) that the attenuation of the X-ray profile can be approximated by a step function below an energy threshold E0.
Small values of E0 lead to very efficient and inhomogeneous heating. It has been shown by Pacucci et al. (2014) that the amplitude of the power spectrum for such softer spectral energy distributions (SEDs) is larger by up to an order of magnitude. We adopt a flat prior on E0 ∈ (0.1, 1.5) keV.

X-ray spectral index, αX
The spectral index governs the spectrum that emerges from the X-ray sources and depends on the dominant physical process emitting the X-ray photons. We take a flat prior on αX ∈ (−0.5, 2.5), similar to Greig & Mesinger (2017), to take into account various relevant X-ray SEDs, such as high-mass X-ray binaries (HMXBs), mini-quasars, the host ISM, and supernova remnants.
Our simulations are performed within a (250 cMpc)^3 box on a 128^3 grid. The training data is composed of 20,000 power spectrum samples evaluated at ten different redshifts in the range z ∈ (25, 6). These samples are drawn randomly from the priors. We use 80% of the samples for training, 10% for validation, and 10% for the test dataset. We also vary the cosmic seed in our forward models. The impact of the size of the training data is investigated in Appendix D.
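The prior sampling and dataset split is straightforward; a minimal sketch using the flat prior ranges listed above (the parameter names are our shorthand, and the log10 quantities are sampled directly in log space):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20000

# Flat priors adopted in the text (log10 units where noted).
priors = {
    "zeta":       (10.0, 100.0),
    "log10_Tvir": (4.0, 6.0),
    "R_mfp":      (5.0, 25.0),
    "log10_LX":   (38.0, 42.0),
    "E0":         (0.1, 1.5),
    "alpha_X":    (-0.5, 2.5),
}
theta = np.stack([rng.uniform(lo, hi, n) for lo, hi in priors.values()],
                 axis=1)  # shape (n, 6): one parameter vector per simulation

# 80/10/10 train/validation/test split of the simulation indices.
idx = rng.permutation(n)
train_idx, val_idx, test_idx = np.split(idx, [int(0.8 * n), int(0.9 * n)])
```

Each row of `theta` would then be passed (together with a fresh cosmic seed) to the forward model to produce the corresponding power spectra.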

Telescope noise profile
To simulate the thermal noise, we first estimate the uv coverage for SKA1-Low, assuming 1000 h of observations. The thermal noise is simulated using ps_eor, assuming a system equivalent flux density (SEFD) of 2500 Jy at the central frequency of the observation. The current configuration of SKA1-Low has 512 stations, 224 of which are placed randomly in a circular core of radius 350 m. The remaining 288 stations are distributed among 36 clusters in three spiral arms extending up to a radius of 35 km from the central core. We integrate for 10 seconds per visibility and observe for 6 hours each day with a frequency resolution of 195.3 kHz. These parameters are tabulated in Table 1. This procedure yields the thermal noise σtherm. Note that in Greig & Mesinger (2017), the authors also include a 20% modeling uncertainty on the sampled power spectra to take into account the differences between various semi-numerical and radiative transfer simulations. This can easily be incorporated in our analysis without running any additional 21-cm signal simulations. The impact of including the modeling uncertainty is discussed in Appendix C.
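While the full uv-plane noise simulation is performed with ps_eor, the per-visibility noise level implied by the observing parameters above follows from the standard radiometer equation, σ = SEFD / √(2 Δν τ). The sketch below is only that back-of-the-envelope estimate, not the ps_eor pipeline.

```python
import math

def visibility_noise(sefd_jy, delta_nu_hz, t_int_s):
    # Radiometer equation: rms thermal noise on a single visibility [Jy].
    return sefd_jy / math.sqrt(2.0 * delta_nu_hz * t_int_s)

# Observing parameters quoted in the text: SEFD = 2500 Jy,
# 195.3 kHz channels, 10 s integration per visibility.
sigma = visibility_noise(2500.0, 195.3e3, 10.0)
print(sigma)  # roughly 1.3 Jy per visibility
```

This per-visibility noise is then beaten down by averaging over the many baselines, channels and nights that make up the full 1000 h observation, which is what sets the power spectrum uncertainty σtherm(k, z).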

Mock observation
To form our mock observation, we consider a model with {ζ, log10(Tvir^min), Rmfp, log10(LX), E0, αX} = {30, 4.70, 15, 40.5, 0.5, 1}. It corresponds to the FAINT GALAXIES model from Mesinger et al. (2016); Greig & Mesinger (2017), in which reionization is driven by numerous sources with low ionizing efficiency. In Figure 1, we show the cosmological 21-cm power spectra (black line) from our mock observation at different redshifts. The shaded region represents the 21-cm power spectrum uncertainty. We then draw a random realization from the normal distribution N(0, σ²therm(k, z)) and add it to the cosmological 21-cm power spectrum to form the noisy mock observation (orange line). We restrict our analysis to the k-modes in the range k ∈ (0.1, 0.8) Mpc^−1 to avoid the impact of foreground contamination on large scales and of thermal noise on small scales (Greig & Mesinger 2015, 2017).
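Forming the noisy mock from a model power spectrum is then a one-line operation; a sketch with placeholder arrays (the power-law shapes below are purely illustrative, not our simulated spectra):

```python
import numpy as np

rng = np.random.default_rng(1)
k = np.logspace(-1.3, 0.3, 20)        # k-modes in Mpc^-1 (illustrative grid)
ps_cosmo = 10.0 * (k / 0.1)**0.5      # placeholder cosmological P(k) [mK^2]
sigma_therm = 2.0 * (k / 0.1)**1.5    # placeholder thermal-noise level [mK^2]

# Draw one Gaussian noise realization N(0, sigma_therm^2) per k-mode
# and add it to the cosmological signal to form the noisy mock.
mock = ps_cosmo + rng.normal(0.0, sigma_therm)

# Restrict to 0.1 <= k <= 0.8 Mpc^-1: foreground-contaminated large
# scales and noise-dominated small scales are discarded.
sel = (k >= 0.1) & (k <= 0.8)
k_obs, mock_obs = k[sel], mock[sel]
```

The retained `mock_obs` arrays from each redshift are what get concatenated into the network's input vector.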

RESULTS
In this section, we discuss the application of swyft to obtain the posterior probability distributions of our six-dimensional 21-cm power spectrum model for the simulated mock observation, and we explore the constraints on different astrophysical parameters as a function of redshift in Section 4.1. In Section 4.2, we show how the distribution of integration time over different redshifts can be used as a proxy to find which part of the data each model parameter is sensitive to. These examples emphasize the flexibility of our framework.

Posterior inference with swyft
To obtain the posterior distribution from MNRE, we first concatenate the power spectra from different redshifts into a 1D array. This is then fed as the input to a multi-layer perceptron (MLP) with three layers, each containing 256 neurons. The network is trained with a batch size of 64, and we decay the initial learning rate of 10^−3 by a factor of 0.95 after every epoch. The output of the trained network is the estimated ratios for the parameters of interest. In Figure 2, we show a schematic diagram of the network architecture. Once the network is trained, the ratio estimator allows for very fast MCMC sampling from the approximate posterior.
In Figure 3, we present the posteriors on the astrophysical parameters obtained from swyft for the FAINT GALAXIES model, assuming a 1000 h observation with the SKA. The diagonal panels show the 1D marginalized posterior for each parameter, and the 2D marginals are shown in the lower off-diagonal panels. The dashed lines represent the true values of the parameters. The inferred model parameters and the corresponding 16th and 84th percentiles are tabulated in Table 2.
Consistent with Greig & Mesinger (2017), we are able to tightly constrain all our model parameters except αX, which has a relatively small impact on the amplitude of the 21-cm power spectra. The small degeneracies between ζ − log10(Tvir^min) and E0 − αX are in agreement with Greig & Mesinger (2017), and with the findings of Ewall-Wice et al. (2016) and Kern et al. (2017). In the top right panel of Figure 3, we show the 1σ and 2σ constraints on the mean neutral fraction χ̄HI as a function of redshift z, where the dashed line represents the true ionization history of the model. For this analysis, we use χ̄HI(z) in place of the parameters θ in the network architecture shown in Figure 2. We find tight constraints on the ionization history from the SKA.
Next, we investigate the sensitivity of our model parameters during different redshifts. We perform the parameter inference by dividing the entire redshift range into two bins: (i) zEoH ∈ (25, 12), which corresponds to the X-ray heating, and (ii) zEoR ∈ (11, 6), which corresponds to reionization. Note that this analysis does not require any extra 21-cm power spectra simulations. The same training data can be re-used with a minimal change in the network's architecture, which is not possible for an MCMC analysis. In this case, the MLP takes the power spectra from zEoH (or zEoR) as the input and estimates the ratios for the parameters of interest.
In Figure 4, we show the resulting 1D and 2D marginal posteriors from zEoH (red) and zEoR (blue). The inferred parameters and the corresponding 16th and 84th percentiles are tabulated in Table 2. We find that ζ, log10(Tvir^min) and Rmfp are well constrained by the 21-cm power spectra from zEoR, which is expected, as these parameters play a significant role during reionization. On the other hand, log10(Tvir^min), log10(LX) and E0 are constrained by the power spectra from zEoH. Note that the minimum virial temperature of a halo to host star-forming galaxies, Tvir^min, can be well constrained with either redshift bin because, within 21cmFAST, the galaxies that host the ionizing sources are the same galaxies that are responsible for the X-ray heating. This parameter therefore impacts both the EoH and the EoR.
In the top right panel of Figure 4, we show the constraints on the reionization history from the 21-cm power spectra during zEoH (red) and zEoR (blue). We find that from the 21-cm power spectra at zEoH, we can constrain the neutral fraction reasonably well at z ≥ 12 (zEoH), but they do not provide tight constraints during the intermediate and late stages of reionization. However, with the 21-cm power spectra at zEoR, we can infer the entire reionization history of the FAINT GALAXIES model. The constraints from the 21-cm power spectra at zEoR on the neutral fraction at zEoH come from the fact that throughout our models, the neutral fraction χ̄HI ≈ 1 at z ≥ 12.

Information from different redshifts
In order to study the information content of different redshifts, we consider a toy scenario where we use the distribution of integration time over different redshifts as a proxy to find which part of the data each parameter is most sensitive to. So far in our analysis, we have considered the distribution of integration time to be uniform over redshifts. However, for a fixed total integration time, this distribution can be optimized, since the thermal noise level at redshift z depends on the integration time tz allocated to that redshift.
The optimization is achieved via gradient descent by maximizing the information the network learns about any given parameter from different redshifts. For a fixed total integration time Ttot, we optimize the integration time for each redshift tz such that Ttot = Σz tz. We parameterize tz as

tz = Ttot exp(vz) / Σz′ exp(vz′),

where v is a vector whose length corresponds to the number of redshift bins. Larger components of v correspond to more integration time for that redshift bin. Next, we consider v to be one of the network parameters that is optimized during the training. As we train the classifier (MLP) to learn the 1D posterior for any given parameter, it simultaneously learns the optimal way of distributing the integration time for that parameter. We further obtain the uncertainties on the optimal time distribution through Monte Carlo Dropout (MCD) (Gal & Ghahramani 2015).
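The key constraints on the allocation are that each tz be non-negative and that the total be fixed at Ttot; a softmax over an unconstrained vector v satisfies both by construction. A minimal sketch, assuming that parameterization (the variable names are ours):

```python
import numpy as np

def integration_times(v, T_tot=1000.0):
    # Softmax parameterization: t_z >= 0 and sum(t_z) = T_tot by
    # construction, so gradient descent can optimize v freely.
    w = np.exp(v - v.max())  # subtract max for numerical stability
    return T_tot * w / w.sum()

# v = 0 reproduces the uniform allocation over ten redshift bins.
t_uniform = integration_times(np.zeros(10))

# An illustrative "learned" v that favours the high-redshift bins.
v_opt = np.array([2, 2, 2, 2, 1, 0, 0, 1, 0, 0], dtype=float)
t_opt = integration_times(v_opt)
```

In the full analysis, v sits inside the training graph alongside the classifier weights, so the same backpropagation step that sharpens the 1D posterior also reshapes the time allocation.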
In Figure 5, we show the results from this information content analysis.For each parameter, the top panel shows the uniform time distribution (orange dashed line) and the optimized time distribution (violins), where the uncertainties follow from the MCD.The bottom panel shows the histogram of the 1σ uncertainty interval on the posterior distribution of each parameter from 500 different mock observations drawn randomly from the test dataset assuming the uniform (orange) and optimized (black) time distribution.
We find that the parameters log10(Tvir^min), log10(LX), E0 and αX are assigned a larger integration time at high redshifts (z ≥ 12) after the optimization of the network. This implies that the information on these parameters is contained at high redshifts. These findings are consistent with the posteriors from zEoH and zEoR shown in Figure 4. On the contrary, for the mean free path Rmfp, the network allocates a large integration time at redshift z = 9. This is also in agreement with the analysis in Figure 4, where we found a flat posterior on Rmfp from zEoH, and the constraints only came from zEoR. For each parameter, the histogram of the 1σ uncertainty interval from the optimized integration time distribution tends towards a lower 1σ uncertainty on the posterior distribution, which indicates that the network learns more information about a given parameter from the optimized time distribution than from the uniform time distribution.

Table 2. The inferred parameter values and the associated 16th and 84th percentiles for the posteriors from (i) zEoH + zEoR shown in Figure 3, and (ii) zEoH and zEoR shown in Figure 4.

SUMMARY
In this paper, we performed Simulation-Based Inference through an MNRE algorithm, swyft, to constrain the astrophysical parameters that govern the X-ray heating and reionization during the CD-EoR. We used 21cmFAST to model the 21-cm power spectra during the CD-EoR with a six-dimensional astrophysical parameter space. We showed that this framework is significantly more efficient than conventional likelihood-based methods such as MCMC, which sample the full joint posterior, as it directly learns the marginal posteriors of interest through neural networks.
With training data composed of 20,000 21-cm power spectrum simulations and the expected thermal noise level of the SKA, we were able to constrain the parameters of our model. The 1D and 2D marginal posteriors obtained through MNRE are consistent with earlier studies performed with an MCMC, which required an order of magnitude more samples to converge. We further checked the statistical consistency of the trained network by evaluating the nominal and empirical expected coverage probabilities.
Within swyft, generating the training dataset and performing the MNRE are two independent processes. This feature gives us the flexibility to reuse the simulations and utilize the same training dataset for various applications. To demonstrate this aspect of swyft, we investigated the sensitivity of different parameters over two redshift ranges that correspond to the EoH (zEoH) and the EoR (zEoR). We obtained the posterior probability distribution on the model parameters from zEoH and zEoR at no extra cost in 21-cm power spectrum simulations. An MCMC analysis in this scenario would require a new chain, and the simulations could not be reused efficiently.
We further studied the information content for each parameter from different redshifts by considering a toy scenario where the distribution of the integration time is part of the network parameters, which is optimized during the training. We found the optimized time distribution to be consistent with the posterior probability distributions of the model parameters from zEoH and zEoR. This could be used as an indicator of possible degeneracies for more complex astrophysical 21-cm signal models without running additional simulations. This establishes that with such efficient and scalable inference techniques, one can increase the complexity of the 21-cm model even further, which could otherwise be impractical for likelihood-based approaches.
While our analysis has shown that MNRE is a powerful framework to analyse the 21-cm power spectrum, in reality the 21-cm signal during the CD-EoR is highly non-Gaussian (Shimabukuro et al. 2016; Majumdar et al. 2018; Watkinson et al. 2018), so the 21-cm power spectrum is probably not the optimal summary statistic to use for parameter inference. In future work, we plan to explore higher-order summary statistics such as the 21-cm bispectrum (Tiwari et al. 2022), the morphology of the ionized regions (Gazagnes et al. 2021; Kapahtia et al. 2021) and convolutional neural networks on the 21-cm tomographic images (Gillet et al. 2019; Zhao et al. 2022a).

APPENDIX A: COVERAGE TESTS

The empirical expected coverage probability of the (1 − α) highest posterior density regions (HPDR) for the posterior estimator p(θ|x) is given as (Hermans et al. 2021)

(1/n) Σ from i=1 to n of 1[θ*i ∈ Θ_{p(θ|xi)}(1 − α)],

where the Θ_{p(θ|xi)}(1 − α) function gives the (1 − α) HPDR of p(θ|x) for the mock data xi with ground truth θ*i. We then compare it with the nominal expected coverage probability, which is equal to the confidence level (1 − α). For an estimator with perfect coverage, the empirical coverage probability is equal to the nominal coverage probability, so when we randomly generate n samples (xi, θ*i) ∼ p(x, θ), the ground truth θ*i lies outside the (1 − α) HPDR in a fraction α of the cases (Cole et al. 2022).
We re-parameterize α (α̂) in terms of a new variable z (ẑ), which is the 1 − α/2 (1 − α̂/2) quantile of the standard normal distribution. This means that the (1, 2, 3)σ regions correspond to z = (1, 2, 3) with (1 − α) = (0.6827, 0.9545, 0.9973). The uncertainties on the empirical expected coverage probability follow from the finite number of samples n and are estimated using the Jeffreys interval (Cole et al. 2022). In Figure A1, we show the empirical expected coverage probability for all 1D and 2D marginal posteriors. We find that in all cases they match the nominal coverage to good precision.
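The re-parameterization can be verified directly with the standard normal CDF from the Python standard library, since the confidence level for a given z follows from 1 − α = 2Φ(z) − 1:

```python
from statistics import NormalDist

# z is the (1 - alpha/2) quantile of the standard normal, so the
# confidence level is 1 - alpha = 2*Phi(z) - 1.
Phi = NormalDist().cdf
levels = {zz: 2 * Phi(zz) - 1 for zz in (1, 2, 3)}
# levels -> {1: 0.6827, 2: 0.9545, 3: 0.9973} (to four decimal places)
```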

APPENDIX B: COMPARISON WITH 21CMMC
In this section, we compare the posteriors obtained from MNRE with an MCMC sampling-based method, 21CMMC. As sampling our joint six-dimensional parameter space with the likelihood-based approach is computationally very demanding, we shrink our parameter space to two dimensions, comprising ζ and log10(T_vir^min). We keep the other parameters fixed to {R_mfp, log10(L_X), E_0, α_X} = {15, 40.5, 0.5, 1} and target redshifts z = 10, 9 and 8.
To set up 21CMMC, we use 48 random walkers with 2000 iterations each, generating ∼ 10^5 samples. On the other hand, the training data for MNRE consists of 10^4 simulations. The mock observation with (ζ, log10(T_vir^min)) = (30, 4.70) is generated using a different realization of the density field from the one used in sampling. In Figure B1, we show the posteriors obtained from 21CMMC (red) and MNRE (blue). The 1D and 2D marginal posteriors obtained from MNRE are in good agreement with 21CMMC at a significantly reduced computational cost. These results are consistent with the findings of Zhao et al. (2022b).
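To make the cost comparison concrete, a toy Metropolis-Hastings run with the same 48 walkers × 2000 iterations illustrates why the number of likelihood (and hence simulator) evaluations scales with the chain length. The 2D Gaussian likelihood below is a hypothetical stand-in, not the 21cmFAST forward model:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_like(theta):
    # Toy stand-in for the 21-cm power-spectrum likelihood: a 2D Gaussian
    # centred on illustrative "true" values (zeta, log10 Tvir) = (30, 4.70).
    mu = np.array([30.0, 4.70])
    sigma = np.array([2.0, 0.05])
    return -0.5 * np.sum(((theta - mu) / sigma) ** 2)

# Metropolis-Hastings: every iteration of every walker needs a fresh
# likelihood evaluation, which is what makes likelihood-based sampling
# expensive when each evaluation requires a forward simulation.
n_walkers, n_steps = 48, 2000
chains = []
for _ in range(n_walkers):
    theta = np.array([25.0, 4.5])
    lp = log_like(theta)
    for _ in range(n_steps):
        prop = theta + rng.normal(0.0, [0.5, 0.02])
        lp_prop = log_like(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chains.append(theta)
samples = np.array(chains)   # ~10^5 samples, ~10^5 likelihood calls
```

By contrast, the amortized MNRE network is trained once on 10^4 simulations and can then be evaluated on any observation without further simulator calls.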

Figure 1. The cosmological (black) and noisy (orange) mock power spectra for the FAINT GALAXIES model at different redshifts, for k ∈ (0.1, 0.8) Mpc⁻¹. The shaded region represents the power spectrum uncertainty level.

Figure 2. Illustration of the network architecture. Input data x and parameters θ are mapped to the marginal parameter combinations. The individual ratio estimators are trained with an MLP. The outputs are the estimated ratios r(x, θ) for the marginal posteriors of interest.
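A minimal sketch of the forward pass of one such marginal ratio estimator, using a small numpy MLP in place of the trained network; the dimensions and random weights below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp_logit(x, theta, params):
    """One marginal ratio estimator: an MLP taking the concatenated
    (summary, parameter) vector and returning log r(x, theta)."""
    h = np.concatenate([x, theta])
    for W, b in params[:-1]:
        h = np.maximum(W @ h + b, 0.0)   # ReLU hidden layers
    W, b = params[-1]
    return (W @ h + b).item()            # scalar logit = log ratio

# Illustrative dimensions: 10 power-spectrum bins plus one 1D marginal.
dims = [11, 32, 32, 1]
params = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(m))
          for n, m in zip(dims[:-1], dims[1:])]

x = rng.normal(size=10)       # mock summary statistic
theta = rng.normal(size=1)    # parameter value at which to evaluate
log_r = mlp_logit(x, theta, params)
r = np.exp(log_r)             # estimated posterior-to-prior ratio r(x, theta)
```

In the actual analysis each 1D and 2D marginal of interest has its own estimator, trained jointly on the same simulations; only the forward pass is sketched here.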
efficiency. This set of parameter values results in a reionization history and a Thomson optical depth τ consistent with Planck data (Planck Collaboration et al. 2016b). The mock observation is simulated within a (500 cMpc)³ box on a 256³ grid.

Figure 5. Result of the integration time optimization as a proxy for the information content of ζ, log10(T_vir^min), R_mfp, log10(L_X), E_0 and α_X. Top panel: violins represent the optimized time distribution, and the orange line shows the initial time distribution. Bottom panel: histograms of the 1σ uncertainty interval on the posterior distribution of each parameter for 500 different mock observations drawn randomly from the test dataset, assuming a uniform (orange) and optimized (black) time distribution.

Figure A1. Empirical expected coverage probability (1 − α̂) of the trained network as a function of the confidence level (1 − α) for all 1D and 2D marginal posteriors. In the case of perfect coverage, the purple line coincides with the green dashed line.

Table 1. Observation parameters for the SKA1-Low configuration used in this work to simulate the thermal noise for parameter inference through MNRE.

Table C1. The inferred parameter values and the associated 16th and 84th percentiles for the posteriors shown in Figure C1, with and without including a 10% modeling uncertainty.