Accelerated Bayesian SED Modeling Using Amortized Neural Posterior Estimation

State-of-the-art spectral energy distribution (SED) analyses use a Bayesian framework to infer the physical properties of galaxies from observed photometry or spectra. They require sampling from a high-dimensional space of SED model parameters and take >10–100 CPU hr per galaxy, which renders them practically infeasible for analyzing the billions of galaxies that will be observed by upcoming galaxy surveys (e.g., the Dark Energy Spectroscopic Instrument, the Prime Focus Spectrograph, the Vera C. Rubin Observatory, the James Webb Space Telescope, and the Roman Space Telescope). In this work, we present an alternative scalable approach to rigorous Bayesian inference using Amortized Neural Posterior Estimation (ANPE). ANPE is a simulation-based inference method that employs neural networks to estimate posterior probability distributions over the full range of observations. Once trained, it requires no additional model evaluations to estimate the posterior. We present, and publicly release, SEDflow, an ANPE method for producing the posteriors of the recent Hahn et al. SED model from optical photometry and redshift. SEDflow takes ∼1 s per galaxy to obtain the posterior distributions of 12 model parameters, all of which are in excellent agreement with traditional Markov Chain Monte Carlo sampling results. We also apply SEDflow to 33,884 galaxies in the NASA–Sloan Atlas and publicly release their posteriors.


INTRODUCTION
Physical properties of galaxies are the building blocks of our understanding of galaxies and their evolution. We can determine properties such as stellar mass (M * ), star formation rate (SFR), metallicity (Z), and age (t age ) of a galaxy by analyzing its spectral energy distribution (SED). Theoretical modeling of galaxy SEDs is currently based on stellar population synthesis (SPS) and describes the SED as a composite stellar population constructed from isochrones, stellar spectra, an initial mass function (IMF), a star formation and chemical evolution history, and dust attenuation (e.g. Bruzual & Charlot 2003;Maraston 2005;Conroy et al. 2009, see Walcher et al. 2011Conroy 2013 for a comprehensive review). Some models also include dust and nebular emissions as well as emissions from active galactic nuclei (e.g. Johnson et al. 2021). In state-of-the-art SED modeling, theoretical SPS models are compared to observed SEDs using Bayesian inference, which accurately quantifies parameter uncertainties and degeneracies among them (Acquaviva et al. 2011;Chevallard & Charlot 2016;Leja et al. 2017;Carnall et al. 2018;Johnson et al. 2021;Hahn et al. 2022). The Bayesian approach also enables marginalization over nuisance parameters, which are necessary to model the effects of observational systematics (e.g. flux calibration).
However, current Bayesian SED modeling methods, which use Markov Chain Monte Carlo (MCMC) sampling techniques, take 10−100 CPU hours per galaxy (e.g. Carnall et al. 2019;Tacchella et al. 2021). While this is merely very expensive with current data sets of hundreds of thousands of galaxy SEDs, observed by the Sloan Digital Sky Survey (SDSS; York et al. 2000), DEEP2 (Davis et al. 2003), COSMOS (Scoville et al. 2007), and GAMA (Baldry et al. 2018), it is prohibitive for the next generation of surveys. Over the next decade, surveys with the Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration et al. 2016), the Prime Focus Spectrograph (PFS; Takada et al. 2014), the Vera C. Rubin Observatory (Ivezic et al. 2019), the James Webb Space Telescope (Gardner et al. 2006), and the Roman Space Telescope (Spergel et al. 2015), will observe billions of galaxy SEDs. The task of SED modeling alone for these surveys would amount to tens or hundreds of billions of CPU hours, exceeding e.g. the entire compute allocation of the Legacy Survey of Space and Time (LSST) data release production 1 by at least two orders of magnitude. Recently, Alsing et al. (2020) adopted neural emulators to accelerate SED model evaluations by three to four orders of magnitude -posterior inference takes minutes per galaxy. While this renders current data sets within reach, the next generation data sets will still require tens or hundreds of millions of CPU hours whenever any aspect of the SED model is altered. Furthermore, this still practically precludes rapid analyses of upcoming transient surveys, especially LSST, which will report ∼10,000 alerts per minute 2 .
But Bayesian inference does not require MCMC sampling. Simulation-based inference (SBI) is a rapidly developing class of inference methods that offers alternatives for many applications (see Cranmer et al. 2020, and references therein). Many SBI methods leverage the latest developments in statistics and Machine Learning for more efficient posterior estimation (Papamakarios et al. 2017;Alsing et al. 2019;Hahn et al. 2019;Dax et al. 2021;Huppenkothen & Bachetti 2021;Zhang et al. 2021). Of particular interest for SED modeling is a technique called Amortized Neural Posterior Estimation (ANPE). Instead of using MCMC to sample the posterior for every single galaxy separately, ANPE uses neural density estimators (NDE) to build a model of the posterior for all observable galaxies. Once the NDE is trained, generating the posterior requires only the observed SED and no additional model evaluations.
In this work, we present SEDflow, a method that applies ANPE to Bayesian galaxy SED modeling using the recent Hahn et al. (2022) SED model. We demonstrate that we can derive accurate posteriors with SEDflow and make Bayesian SED modeling fully scalable for the billions of galaxies that will be observed by upcoming surveys. As further demonstration, we apply SEDflow to the optical photometry of ∼33, 000 galaxies in the NASA-Sloan Atlas (NSA). We begin in Section 2 by describing SBI using ANPE. We then present how we design and train SEDflow in Section 3 and describe the NSA observations in Section 4. We validate the accuracy of the posteriors from SEDflow in Section 5 and discuss the implications of our results in Section 6.

SIMULATION-BASED INFERENCE
The goal of Bayesian SED modeling, and probabilistic inference more broadly, is to infer the posterior probability distributions p(θ | x ) of galaxy properties, θ, given observations, x . For a specific θ and x , we typically evaluate the posterior using Bayes' rule, p(θ | x ) ∝ p(θ) p(x | θ), where p(θ) denotes the prior distribution and p(x | θ) the likelihood, which is typically assumed to have a Gaussian functional form: (1) m(θ) is the theoretical model, in our case a galaxy SED model from SPS. C is the covariance matrix of the observations. In practice, off-diagonal terms are often ignored and measured uncertainties are used as estimates of the diagonal terms. Simulation-based inference (SBI; also known as "likelihood-free" inference) offers an alternative that requires no assumptions about the form of the likelihood. Instead, SBI uses a generative model, i.e. a simulation F , to generate mock data x given parameters θ : F (θ ) = x . It uses a large number of simulated pairs (θ , x ) to directly estimate either the posterior p(θ | x ), the likelihood p(x | θ), or the joint distribution of the parameters and data p(θ, x ). SBI has already been successfully applied to a number of Bayesian parameter inference problems in astronomy (e.g. Cameron & Pettitt 2012;Weyant et al. 2013;Hahn et al. 2017;Kacprzak et al. 2018;Alsing et al. 2018;Wong et al. 2020;Huppenkothen & Bachetti 2021;Zhang et al. 2021) and in physics (e.g. Brehmer et al. 2019;Cranmer et al. 2020).

Amortized Neural Posterior Estimation
SBI provides another a critical advantage over MCMC inference methods -it enables amortized inference. For SED modeling using MCMC, each galaxy requires >10 5 model evaluations to accurately estimate p(θ | x ) (Hahn et al. 2022, Kwon et al. in prep.). Moreover, model evaluations for calculating the posterior of one galaxy cannot be used for another. This makes MCMC approaches for SED modeling of upcoming surveys computationally infeasible.
With density estimation SBI, we require a large number (∼10 6 ) of model evaluations only initially to train a neural density estimator (NDE), a neural network with parameters φ that is trained to estimate the density p φ (θ | x ). If the training covers the entire or the practically relevant portions of the θ and x spaces, we can evaluate p φ (θ | x i ) for each galaxy i with minimal computational cost. The inference is therefore amortized and no additional model evaluations are needed to generate the posterior for each galaxy. This technique is called Amortized Neural Posterior Estimation (ANPE) and has recently been applied to a broad range of astronomical applications from analyzing gravitational waves (e.g. Wong et al. 2020;Dax et al. 2021) to binary microlensing lensing (Zhang et al. 2021). For SED modeling, the choice in favor of using ANPE is easy: the entire upfront cost for ANPE model evaluations would only yield posteriors of tens of galaxies with MCMC. ANPE makes two important assumptions. First, the simulator F is capable of generating mock data x that is practically indistinguishable from the observations. In terms of the expected signal, m in Eq. 1, this is the same requirement as any probabilistic modeling approach. But unlike likelihoodbased evaluations, such as conventional MCMC, data generated for SBI need to include all relevant noise terms as well. We address both aspects in Sections 3.2 and 6.1. Second, ANPE assumes that the NDE is well trained: p φ (θ | x ) is a good approximation of p(θ | x ), and therefore of p(θ | x ). We assess this in Section 5.
ANPE commonly employs so-called "normalizing flows" (Tabak & Vanden-Eijnden 2010;Tabak & Turner 2013) as density estimators. Normalizing flow models use an invertible bijective transformation, f , to map a complex target distribution to a simple base distribution, π(z ), that is fast to evaluate. For ANPE, the target distribution is p(θ | x ) and the π(z ) is typically a simple multivariate Gaussian, or mixture of Gaussians. The transformation f : z → θ must be invertible and have a tractable Jacobian. This is so that we can evaluate the target distribution from π(z ) by a change of variable: Since the base distribution is easy to evaluate, we can also easily evaluate the target distribution. A neural network is trained to obtain f and the collection of its parameters form φ. The network typically consists of a series of simple transforms (e.g. shift and scale transforms) that are each invertible and whose Jacobians are easily calculated. By stringing together many such transforms, f provides an extremely flexible mapping from the base distribution. Many different normalizing flow models are now available in the literature (e.g. Germain et al. 2015;Durkan et al. 2019). In this work, we use Masked Autoregressive Flow (MAF; Papamakarios et al. 2017). The autoregressive design (Uria et al. 2016) of MAF is particularly well-suited for modeling conditional probability distributions such as the posterior. Autoregressive models exploit chain rule to expand a joint probability of a set of random variables as products of one-dimensional conditional probabilities: p(x ) = i p(x i | x 1:i−1 ). They then use neural networks to describe each conditional probability, p(x i | x 1:i−1 ). In this context, we can add a conditional variable y on both sides of the equation, p(x | y ) = i p(x i | x 1:i−1 , y ), so that the autoregressive model describes a conditional probability p(x | y ). One drawback of autoregressive models is their sensitivity to the ordering of the variables. Masked Autoencoder for Distribution Estimation (MADE; Germain et al. 2015) models address this limitation using binary masks to impose the autoregressive dependence and by permutating the order of the conditioning variables. A MAF model is built by stacking multiple MADE models. Hence, it has the autoregressive structure of MADE but with more flexibility to describe complex probability distributions. In practice, we use the MAF implementation in the sbi Python package 3 (Greenberg et al. 2019;Tejero-Cantero et al. 2020).

SEDFLOW
In this section, we present SEDflow, which applies ANPE to galaxy SED modeling for a scalable and accelerated approach. For our SED model, we use the state-of-the-art PROVABGS model from Hahn et al. (2022). Although many SED models have been recently used in the literature (e.g. Bagpipes, Carnall et al. 2018;Prospector, Leja et al. 2017;Johnson et al. 2021), we choose PROVABGS because it will be used to analyze >10 million galaxy spectrophotometry measured by the DESI Bright Galaxy Survey (Hahn et al. in prep.). Below, we describe the PROVABGS model, the construction of the SEDflow training data using PROVABGS, and the training procedure for SEDflow.

SED Modeling: PROVABGS
We use the state-of-the-art SPS model of the PROVABGS (Hahn et al. 2022). The SED of a galaxy is modeled as a composite of stellar populations defined by stellar evolution theory (in the form of isochrones, stellar spectral libraries, and an initial mass function) and its star formation and chemical enrichment histories (SFH and ZH), attenuated by dust (see Walcher et al. 2011;Conroy 2013, for a review). The PROVABGS model, in particular, utilizes a non-parametric SFH with a starburst, a non-parametric ZH that varies with time, and a flexible dust attenuation prescription.
The SFH has two components: one based on non-negative matrix factorization (NMF) bases and the other, a starburst component. The SFH contribution from the NMF component is a linear combination of four NMF SFH basis functions, derived from performing NMF (Lee & Seung 1999;Cichocki & Phan 2009;Févotte & Idier 2011) on SFHs of galaxies in the Illustris cosmological hydrodynamical simulation Genel et al. 2014;Nelson et al. 2015). The NMF SFH prescription provides a compact and flexible representation of the SFH. The second starburst component consists of a single stellar population (SSP) and adds stochasticity to the SFH.
The ZH is similar defined using two NMF bases dervied from Illustris. This ZH prescription enables us to flexibly model a wide range of ZHs and, unlike most SED models, it does not assume constant metallicity over time, which can significantly bias inferred galaxy properties (Thorne et al. 2021). The stellar evolution theory is based on Flexible Stellar Population Synthesis (FSPS; Conroy et al. 2009;Conroy & Gunn 2010) with the MIST isochrones (Paxton et al. 2011(Paxton et al. , 2013(Paxton et al. , 2015Choi et al. 2016;Dotter 2016), the Chabrier (2003) initial mass function (IMF), and a combination of the MILES (Sánchez-Blázquez et al. 2006) and BaSeL (Lejeune et al. 1997(Lejeune et al. , 1998Westera et al. 2002) libraries. The SFH and ZH are binned into 43 logarithmically-space time and SSPs are evalulated at each time bin using FSPS. The SSPs are summed up to get the unattenuated rest-frame galaxy SED.
Lastly, PROVABGS attenuates the light from the composite stellar population using the two component Charlot & Fall (2000) dust attenuation model with diffuse-dust (ISM) and birth cloud (BC) components. All SSPs are attenuated by the diffuse dust using the Kriek & Conroy (2013) attenuation curve. Then, the BC component provides extra dust attenuation on SSPs younger than 100 Myr with young stars that are embedded in modecular clouds and HII regions. In total the PROVABGS SED model has 12 parameters: stellar mass (M * ), six SFH parameters (β 1 , β 2 , β 3 , β 4 , t burst , f burst ), two The distribution of SED model parameters, redshift, photometric magnitudes, and uncertainties of the data used to train SEDflow. We plot a subset of the parameters (log M * , β 1 ) and photometric bands for clarity. The training data was constructed by sampling SED model parameters from the prior and forward modeling optical photometry using the PROVABGS and noise models (Section 3). For comparison, we present the distribution of redshift, magnitudes, and uncertainties for galaxies in the NSA catalog (blue). The training set encompasses the observations; thus, SEDflow can be used to infer the posterior for the NSA galaxies.

Training Data
In this section, we describe how we construct the training data for SEDflow using the PROV-ABGS SED model. First, we sample N train model parameters from a prior: θ ∼ p(θ). We use the same priors as Hahn et al. (2022): uniform priors over M * , t burst , f burst , γ 1 , γ 2 , τ BC , τ ISM , n dust with broad conservative ranges and Dirichlet prior over β 1 , β 2 , β 3 , β 4 , chosen to normalize the NMF SFH. For each θ , we also uniformly sample a redshift within the range of the NSA: z ∼ U(0., 0.2). Next, we forward model mock observables. We calculate the rest-frame galaxy SED from PROVABGS and redshift it: F (λ; θ , z). Afterwards, we convolve F with optical broadband filters, R X , to generate noiseless photometric fluxes: The next step of the forward model is to apply noise. We assign photometric uncertainties, σ X , by sampling an estimate of the observed p(σ X |f X ) of NSA galaxies. Then, we apply Gaussian noisê to derive the forward modeled photometric flux. For our estimate of p(σ X |f X ), we use an empirical estimate based on NSA photometry and measured uncertainties. For each of the five optical bands, we separately estimatê as a Gaussian in magnitude-space. µ σ X and σ σ X are the median and standard deviation of σ X as a function of f X that we estimate by evaluating them in f X bins and interpolating over the bins. Any θ that is assigned a negative σ X is removed from our training data. We also remove any training data with f X (θ ) outside the range of NSA photometry. As SBI requires an accurate noise model, one might be concerned about the simplicity of our model. If the noise model is incorrect, estimates of the parameters θ would be biased by an amount that is impossible to predict. We therefore add the noise variances σ X to the conditioning variables of the posterior model, i.e. we train and evaluate p φ (θ | x , {σ X }). This means we merely have to ensure that σ X spans the observed σ X values in order to have a posterior that is robust to our choice of noise model. We discuss this further in Section 6.
In total, we constuct N train = 1, 131, 561 sets of SED parameters, redshift, photometric uncertainties, and mock NSA photometry. In Figure 1, we present the distribution of the training data {(θ , z, σ X ,f X )} (black). We include select SED model parameters (log M * , β 1 ), redshift, photometry in the g and r bands, and photometric uncertainty in the r band. We also include the (z, σ X , f X ) distribution of NSA galaxies (blue), for comparison. The photometry and uncertainties are in magnitudespace. The distribution of the training data spans the distribution of NSA galaxies.

Training SEDflow
For SEDflow, we use a MAF normalizing flow model (Section 2.1) with 15 MADE blocks, each with 2 hidden layers and 500 hidden units. In total, the model has 7,890,330 parameters, φ. We determine this architecture through experimentation. Our goal is to determine φ of the MAF model, p φ (θ | x ), so that it accurately estimates the posterior probability distribution p(θ | x ). θ represent the SED parameters and x = (f X , σ X , z). We do this by minimizing the KL divergence between In practice, we split the training data into a training and validation set with a 90/10 split. Afterwards, we maximize the total log likelihood i log p φ (θ i | x i ) over training set, which is equivalent to minimizing D KL (p || p φ ). We use the Adam optimizer (Kingma & Ba 2017) with a learning rate of 5 × 10 −4 . To prevent overfitting, we evaluate the total log likelihood on the validation data at every training epoch and stop the training when the validation log likelihood fails to increase after 20 epochs. Training our model with a batch size of 50 takes roughly a day on a single 2.6 GHz Intel Skylake CPU. Given our small batch size, we find similar training times when using CPUs or GPUs.

NASA-SLOAN ATLAS
As a demonstration of its speed and accuracy, we apply SEDflow to optical photometry from the NASA-Sloan Atlas 4 (NSA). The NSA catalog is a re-reduction of SDSS DR8 (Aihara et al. 2011) that includes an improved background subtraction (Blanton et al. 2011). We use SDSS photometry in the u, g, r, i, and z bands, which are corrected for galactic extinction using Schlegel et al. (1998). To ensure that the galaxy sample is not contaminated, we impose a number of additional quality cuts by excluding: • objects where the centroiding algorithm reports the position of the peak pixel in a given band as the centroid. The SDSS photometric pipeline can struggle to accurately define the center of objects near the edge or at low signal-to-noise, so these cases are often spurious objects.
• objects that have pixels that were not checked for peaks by the deblender.
• objects where more than 20% of point-spread function (PSF) flux is interpolated over as well as objects where the interpolation affected many pixels and the PSF flux error is inaccurate. The SDSS pipeline interpolates over pixels classified as bad (e.g. cosmic ray).
• objects where the interpolated pixels fall within 3 pixels of their center and they contain a cosmic ray that was interpolated over.
• objects that were not detected at ≥ 5σ in the original frame, that contain saturated pixels, or where their radial profile could not be extracted.
By excluding these objects, we avoid complications from artificats in the photometry that we do not model. For additional details on the quality flags, we refer readers to the SDSS documentation 5 . After the quality cuts, we have a total of 33,884 NSA galaxies in our sample.

RESULTS
Now that we have trained SEDflow, we can estimate the posterior, p(θ | x i ), for any x i = {f X,i , σ X,i , z i }. In practice, we do this by drawing samples from the SEDflow NDE model. Since we use a normalizing flow, this is trivial: we generate samples from the target distribution of the   normalizing flow, a multivariate Gaussian distribution in our case, then we transform the samples using the bijective transformation in Eq. 2 that we trained. The transformed samples follow p φ (θ | x i ) and estimate the posterior, p(θ | x i ). Next, we validate the accuracy of the SEDflow posterior estimates. As a first test, we compare the posterior from SEDflow to the posterior derived from MCMC sampling for a single arbitrarily chosen NSA galaxy in Figure 2 (NSAID = 72). In the top, we present the the posterior distribution of the 12 SED model parameters for the SEDflow posterior (blue) and MCMC posterior (black). The SEDflow posterior is in excellent agreement with the MCMC posterior for all of the SED parameters.
In the bottom of Figure 2, we compare the SEDs of the best-fit parameter values from the SEDflow (blue) and MCMC posteriors (black dotted). We also include the NSA photometric flux of the selected galaxy (red). The best-fit SED from the SEDflow posterior is in good agreement with both the MCMC best-fit SED and the NSA photometry.
The key advantage of ANPE is that it enables accurate Bayesian inference orders of magnitude faster than conventional methods. We derive the MCMC posterior using the Zeus ensemble slicesampler (Karamanis & Beutler 2020) with 30 walkers and 10,000 iterations. 2,000 of the iterations are discarded for burn-in. In total, the MCMC posterior requires >100,000 SED model evaluations. Since each evaluation takes ∼340 ms, it takes ∼10 CPU hours for a single MCMC posterior. Recently, SED modeling has adopted neural emulators to accelerate SED model evaluations (Alsing et al. 2020). In Hahn et al. (2022), for instance, the PROVABGS emulator takes only ∼2.9 ms to evaluate, >100× faster than the original model. Yet, even with emulators, due to the number of evaluations necessary for convergence, an MCMC posterior takes ∼10 CPU minutes. Meanwhile, after training, the SEDflow posterior takes 1 second ->10 4 × faster than MCMC.
The posteriors from SEDflow and MCMC are overall in excellent for NSA galaxies, besides the one in Figure 2. However, we do not know the true SED parameters for these galaxies so to further validate SEDflow, we use test synthetic photometry, where we know the truth. We sample 1000 SED parameters from the prior, {θ test i } ∼ p(θ), and forward model synthetic NSA observations, {x test i }, for them in the same way as the training data (Section 3.2). Afterwards, we generate posteriors for each of x test i using SEDflow: {p(θ | x test i )}. In Figure 3, we present the probability-probability (p-p) plot of the SEDflow posteriors for the test data. The p-p plot presents the cumulative distribution function (CDF) of the percentile score of the true value within the marginalized posterior for each parameter. For true posteriors, the percentiles are uniformly distributed so the CDF is a diagonal (black dashed). Overall, the CDFs for SEDflow lie close to the diagonal for each of the SED parameters. Hence, the SEDflow posteriors are in excellent agreement with the true posteriors.
In Figure 3, we also include the CDFs of the SED parameters for the MCMC posteriors derived for a subset of 100 test observations (gray dotted). Comparing the CDFs from the MCMC posteriors to those of SEDflow, we find that the SEDflow posteriors are actually in better agreement with the true posteriors. This is due to the fact that MCMC posteriors are also only estimates of the true posterior and are subject to limitations in initialization, sampling, and convergence. Posteriors from SEDflow are not impacted by these limations, so the comparison highlights additional advantages of an ANPE approach besides the >10 4 × performance improvement. We examine another validation of the SEDflow posteriors using simulation-based calibration (SBC; Talts et al. 2020). Rather than using percentile scores, SBC examines the distribution of the rank statistics of the true parameter values within the marginalized posteriors. It addresses the limitation that the CDFs only asymptotically approach the true values and that the discrete sampling of the posterior can cause artifacts in the CDFs. In Figure 4, we present SBC of each SED parameter for the SEDflow posteriors (blue) using the 1000 test observations. For comparison, we include the SBC for the MCMC posteriors (gray dotted). Similar to the percentile score, the distribution of the rank statistic is uniform for the true posterior (black dashed). The rank statistic distribution for the SEDflow posteriors are nearly uniform for all SED parameters. Hence, we confirm that the SEDflow posteriors are in excellent agreement with the true posterior.
An advantage of SBC is that by examining the deviation of rank statistics distribution from uniformity, we can determine how the posterior estimates deviate from the true posteriors. For instance, if the distribution has a U-shape where the true parameter values are more often at the lowest and highest ranks, then the posterior estimates are narrower than the true posteriors. If the distribution has a ∩-shape, then the posterior estimates are broader than the true posteriors. Any asymmetry in the distribution implies that the posterior estimates are biased. For the SEDflow posteriors, we find none of these features for any of the SED parameters. Hence, SEDflow provides unbiased and accurate estimates of the true posteriors for all SED parameters. With the accuracy SEDflow validated, we apply it to derive posteriors for all of our NSA galaxies (Section 4). Analyzing all 33,884 NSA galaxies takes ∼12 CPU hours. For each galaxy, we generate 10,000 samples of the 12-dimensional posterior, p(θ | x ). These posteriors on SED parameters represent posteriors on stellar mass, SFH, ZH, and dust content of the NSA galaxies. To maximize the utility of the posteriors further, we use them to derive posteriors on the following additional galaxy properties: SFR averaged over 1Gyr, mass-weighted metallicity, and mass-weighted stellar age (see Eq. 17 in Hahn et al. 2022 for the exact calculation). We publicly release all of the posteriors for our NSA sample as well as all of the software and data used to train and validate SEDflow at https://changhoonhahn.github.io/SEDflow/. 6. DISCUSSION

Forward Model
In the previous section, we demonstrated the accuracy of SEDflow posteriors. Nevertheless, a primary determining factor for the fidelity of SEDflow, or any ML model, is the quality of the training data set and, thus, the forward model used to construct it. Below, we discuss the caveats and limitations of our forward model, which has two components: the PROVABGS SPS model and noise model (Section 3.2).
First, for our noise model, we assign uncertainties to noiseless photometric fluxes based on an empirical estimate of p(σ X | f X ) for each band independently. This is a simplicistic prescription and, as the bottom right panels of Figure 1 (g − σ r and r − σ r ) reveal, there are discrepancies between the magnitude -uncertainty distributions of the training data and observations. Despite these discrepancies, SEDflow provides excellent estimates of the true posterior. This is because we design our ANPE to include σ X as a conditional variable (Section 3.3). The f X − σ X distribution of our training data does not impact the accuracy of the posteriors as long as there are sufficient training data near x to train the NDE in that region.
A more accurate noise model will, in theory, improve the performance of SEDflow because the x -space of the training data will more efficiently span the observations. Fewer training data would be expended in regions of x -space that are devoid of observations. However, for our application, we do not find significantly improved performance when we alter the noise model. This suggests that even with our simplistic noise model, the x -space of observations is covered sufficiently well by the training data. We note that when we decrease N train to below 500,000, SEDflow posteriors are significantly less accurate. A more realistic forward model may reduce this N train threshold for accurate posteriors. However, generating N train ∼1, 000, 000 training SEDs has a negligible computational cost compared to MCMC SED modeling, so we do not consider it necessary to explore this further.
Next, we consider limitations in the PROVABGS SPS model used in our forward model. Our SPS model uses a compact and flexible prescription for SFH and ZH that can describe a broad range of SFHs and ZHs. However, the prescription is derived from simulated Illustris galaxies, whose SFHs and ZHs may be not reflect the full range of SFHs and ZHs of real galaxies. If certain subpopulations of observed galaxies have SFHs and ZHs that cannot be well described by the PROVABGS prescription, they cannot be accurately modeled. Even if the PROVABGS SFH and ZH prescriptions are sufficient, there are limitations in our understanding of stellar evolution.
There is currently no consensus in the stellar evolution, stellar spectral libraries, or IMF of galaxies (e.g. Treu et al. 2010;van Dokkum & Conroy 2010;Rosani et al. 2018;Ge et al. 2019;Sonnenfeld et al. 2019). The PROVABGS model uses MIST isochrones, Chabrier (2003) IMF, and the MILES + BaSeL spectral libraries. These choices limit the range of SEDs that can be produced by the training data. For instance, if galaxies have significant variations in their IMF, assuming a fixed IMF would falsely limit the range of our training data. A more flexible SED model that includes uncertainties in SPS would broaden the range of galaxy SEDs that can be modeled. Data-driven approaches may also enable SED models to be more descriptive (e.g. Hogg et al. 2016;Portillo et al. 2020). Improving SED models, however, is beyond the scope of this work. Our focus is on improving the Bayesian inference framework. In that regard, the limitations of the SED model equally impacts conventional approaches with MCMC.
We encounter the caveats above when we apply SEDflow to the NSA catalog. For a small fraction of NSA galaxies (588 out of 33,884), SEDflow generates posteriors that are outside of the prior volume. This is because the photometry or uncertainties of these galaxies lie outside of the support of the training data and where SEDflow is well trained. They either have higher photometric uncertainties, for a given magnitude, or bluer photometric colors than the training data. Some of these may be observational artifacts or problematic photometry. Nevertheless, SEDflow fails because we cannot construct training data near them with our limited noise and SPS models. Since this only affects a small fraction of the NSA galaxies, we flag them in our catalog and, for completeness, infer their galaxy properties by applying PROVABGS with MCMC sampling. For more details, we refer readers to Appendix A.
To test for limitations of the forward model, we can construct additional tests of posteriors derived from ANPE. For instance, the χ 2 of the best-fit parameter value from the estimated posterior can be used to assess whether the best-fit model accurately reproduces observations. This would only require one additional model evaluation per galaxy. One can also construct an Amortized Neural Likelihood Estimator (ANLE) using the same training data. Unlike the ANPE, which estimates p(θ | f X , σ X , z), the ANLE would estimate p(f X | θ, σ X , z). We can then further validate the posteriors by assessing whether the observed photometry lies within the ANLE distribution. Based on the overall high level of accuracy of SEDflow posteriors, we do not explore these additional tests; however, they can be used to further validate any ANPE posteriors.

Advantages of SEDflow
The primary advantage of SEDflow is its computational speed. This becomes even more pertinent if we want to add additional parameters to address concerns about the choices in current SPS models, described above. To relax these assumptions, SPS models would need to introduce additional parameters that flexibly model these uncertainties (Conroy et al. 2009;Conroy & Gunn 2010). While the dimensionality of current SPS models already makes MCMC methods computational infeasible, ANPE has been applied to higher dimensional applications. For instance, Dax et al. (2021) constructed an accurate ANPE for a 15-dimensional model parameter and 128-dimensional conditional variable spaces. NDE is an actively developing field in ML and new methods are constantly emerging (e.g. Wu et al. 2020;Dhariwal & Nichol 2021). Since ANPE can handle higher dimensionality, we can in the future include additional parameters that model uncertainties in SPS. This will not only improve our SED modeling, but also improve our understanding of stellar evolution and the IMF.
In addition to enabling scalable SED modeling for the next generation galaxy surveys, SEDflow will also enable us to tackle other key challenges in SED modeling. For example, recent works have demonstrated that priors of SED models can significantly impact the inferred galaxy properties (Carnall et al. 2018;Leja et al. 2019;Hahn et al. 2022). Even "uniformative" uniform priors on SED model parameters can impose undesirable priors on derived galaxy properties such as M * , SFR, SFH, or ZH. To avoid significant biases, galaxy studies must carefully select priors and validate their results using multiple different choices. With an MCMC approach, selecting a different prior means reevaluating every posterior and repeating all the SED model evaluations in the MCMC sampling.
For an ANPE approach, the prior is set by the distribution of parameters in the training data. For a new prior, instead of reconstructing the training data, we can resample it in such a way that the parameters follow the new prior. Then, the ANPE model can be re-trained, re-validated on the test data, and re-deployed on observations. Each of these steps require substantially less computational resources than generating a new set of training data or using MCMC methods. Hence, the ANPE approach provides a way to efficiently vary the prior without multiplying computational costs.

SUMMARY AND OUTLOOK
By analyzing the SED of a galaxy, we can infer detailed physical properties such as its stellar mass, star formation rate, metallicity, and dust content. These properties serve as the building blocks of our understanding of how galaxies form and evolve. State-of-the-art SED modeling methods use MCMC sampling to perform Bayesian statistical inference. They derive posterior probability distributions of galaxy properties given observation that accurately estimate uncertainties and parameter degeneracies to enable more rigorous statistical analyses. For the dimensionality of current SED models, deriving a posterior requires 100, 000 model evaluations and take 10 − 100 CPU hours per galaxy. Upcoming galaxy surveys, however, will observe billions of galaxies using e.g. DESI, PFS, Rubin observatory, James Webb Space Telescope, and the Roman Space Telescope. Analyzing all of these galaxies with current Bayesian SED models is infeasible and would require hundreds of billions of CPU hours. Even with recently proposed emulators, which accelerate model evaluations by three to four orders to magnitude, the computation cost of SED modeling would remain a major bottleneck for galaxy studies.
We demonstrate in this work that Amortized Neural Posterior Estimation (ANPE) provides an alternative scalable approach for Bayesian inference in SED modeling. ANPE is a simulation-based inference method that formulates Bayesian inference as a density estimation problem and uses neural density estimators (NDE) to approximate the posterior over the full space of observations. The NDE is trained using parameter values drawn from the prior and mock observations simulated with these parameters. Once trained, a posterior can be obtained from the NDE by providing the observations as the conditional variables without any additional model evaluations.
In this work, we present SEDflow, a galaxy SED modeling method using ANPE and PROV-ABGS, a flexible SED model that uses a compact non-parameteric SFH and ZH prescriptions and was recently validated in Hahn et al. (2022). Furthermore, we apply SEDflow to optical photometry from the NASA-Sloan Atlas as demonstration and validation of our ANPE approach. We present the following key results from our analysis.
• We train SEDflow using a data set of ∼1 million SED model parameters and forward model synthetic SEDs. The parameters are drawn from a prior and the forward model is based on the PROVABGS and noise models. We design the ANPE to estimate p(θ|f X , σ X , z), where f X , σ X , and z are the photometry, photometric uncertainty, and redshift, respectively. For its architecture, we use a MAF normalizing flow with 15 MADE blocks each with 2 hidden layers and 500 hidden units. Training SEDflow requires roughly 1 day on a single CPU. Once trained, deriving posteriors of galaxy properties for a galaxy takes ∼1 second, 10 5 × faster than traditional MCMC sampling. • Posteriors derived using SEDflow show excellent agreement with posteriors derived from MCMC sampling. We further validate the accuracy of the posteriors by applying SEDflow to synthetic observations with known true parameter values. Based on statistical metrics used in the literature (p-p plot and SBC), we find excellent agreement between the SEDflow and the true posteriors. • Lastly, we demonstrate the advantages of SEDflow by applying it to the NASA-Sloan Atlas.
Estimating the posterior of ∼33, 000 galaxies takes ∼12 CPU hours. We make the catalog of posteriors publicly available at https://changhoonhahn.github.io/SEDflow/. For each galaxy, the catalog contains posteriors of all 12 PROVABGS SED model parameters as well as the galaxy properties: M * , average SFR over 1Gyr, mass-weighted metallicity, and mass-weighted stellar age.
This work highlights the advantages of using an ANPE approach to Bayesian SED modeling. Our approach can easily be extended beyond this application. For instance, we can include multiwavelength photometry at ultra-violet or infrared wavelengths. We can also modify SEDflow to infer redshift from photometry. In SEDflow, we include redshift as a conditional variable, since NSA provides spectroscopic redshifts. However, redshift can be included as an inferred variable rather than a conditional one. Then, we can apply SEDflow to infer galaxy properties from photometric data sets without redshift measurements while marginalizing over the redshift prior. If we do not require spectroscopic redshifts, SEDflow can be extended to much larger data sets that span fainter and broader galaxy samples. Conversely, we can use SEDflow to infer more physically motivated photometric redshifts, where we marginalize over our understanding of galaxies rather than using templates.
The ANPE approach to SED modeling can also be extended to galaxy spectra. Constructing an ANPE for the full data space of spectra would requires estimating a dramatically higher dimensional probability distribution. SDSS spectra, for instance, have ∼3, 600 spectral elements. In our approach we include the uncertainties of observables as conditional variables, which doubles the curse of dimensionality. Recent works, however, have demonstrated that galaxy spectra can be represented in a compact low-dimensional space using autoencoders (Portillo et al. 2020, Melchior & Hahn, in prep.). In Portillo et al. (2020), they demonstrate that SDSS galaxy spectra can be compressed into 7-dimensional latent variable space with little loss of information. Such spectral compression dramatically reduces the dimensionality of the conditional variable space to dimensions that can be tackled by current ANPE methods. We will explore SED modeling of galaxy spectrophotometry using ANPE and spectral compression in a following work.
A. TESTING OUTSIDE SEDflow TRAINING RANGE For a small number of NSA galaxies, 588 out of 33,884, SEDflow does not produce valid posteriors. The normalizing flow of SEDflow generates posteriors that are entirely outside of the prior volume because the photometry or uncertainties of the galaxies lie outside of the support of the SEDflow training data (Section 3.2). These galaxies either (1) have unusually high photometric uncertainties that are not accounted for in our noise model or (2) they have photometric colors that cannot be modeled by our SED model. In Figure 5, we present the distribution of photometric magnitudes, uncertainties, and redshift (x = {f X , σ X , z}) of these NSA galaxies. We mark galaxies that are outside the SEDflow training data support (black) due to (1) in orange and (2) in blue.
We classify the galaxies as (1), if they have σ X that is unusually high for a given f X in at least one photometric band: σ X > µ σ X (f X ) + 3σ σ X (f X ) (see Eq. 4). There are 490 galaxies without valid SEDflow posteriors due to (1). Many of them lie well beyond the locus of training data points. The SEDflow estimate of p(θ | f X , σ X , z) is only accurate in regions of x -space where there is sufficient training data. This requirement is not met for these galaxies. In principle, if we use a more conservative noise model than Eq. 4 and construct noisier training data, we can expand the support of SEDflow. SEDflow would then produce sensible posteriors for more NSA galaxies. However, there is an inherent trade-off. For a training data set of fixed size, a more conservative noise model would reduce the amount of training data in x -space where the vast majority of NSA galaxies lie and can reduce the accuracy of the posteriors in these regions. Since, SEDflow fails for only a small fraction of NSA galaxies, we do not explore more conserative noise models in this work.
Next, we examine the galaxies that lie outside of the SEDflow support because they have colors that cannot be modeled by our SED model. In Figure 5, we classify NSA galaxies as (2) if any of their colors (e.g. u − r, u − g, r − z) is bluer than the 99.9% percentile of the training data color. There are 98 galaxies without valid SEDflow posteriors due to (2). The i versus z magnitude panel in particular highlights how a significant number of galaxies are bluer than the training data. The fact that the training data do not span these colors suggests that the PROVABGS SED model may not fully describe all types of galaxies in observations. As we discuss in Section 6, this may be due to limitations in the SFH and ZH prescription in the PROVABGS model or our understanding of stellar evolution. Limitations of the SED model equally impacts conventional MCMC sampling approaches and is beyond the scope of this work. We, therefore, do not examine the issue further. For completeness, we derive posteriors for NSA galaxies, for which SEDflow fails, using PROVABGS with MCMC sampling with the same configuration as Hahn et al. (2022).  Figure 5. The distribution of photometric magnitudes, uncertainities, and redshift of NSA galaxies for which SEDflow does not produce valid posteriors (blue or orange). These galaxies lie outside of the suport of the SEDflow training data (black) so SEDflow cannot accurately estimate their posteriors. We mark the NSA galaxies that have unusually high photometric uncertainties that are not accounted for in our noise model in orange and galaxies that have photometric colors that cannot be modeled by our SED model in blue.