A path towards constraining the evolution of the interstellar medium and outflows in the Milky Way using APOGEE

In recent years, the study of the Milky Way has significantly advanced due to extensive spectroscopic surveys of its stars, complemented by astroseismic and astrometric data. However, it remains disjoint from recent advancements in understanding the physics of the Galactic interstellar medium (ISM). This paper introduces a new model for the chemical evolution of the Milky Way that can be constrained on stellar data, because it combines a state-of-the-art ISM model with a Milky Way stellar disc model. Utilizing a dataset of red clump stars from APOGEE, known for their precise ages and metallicities, we concentrate on the last 6 billion years – a period marked by Milky Way’s secular evolution. We examine the oxygen abundance in the low-𝛼 disc stars relative to their ages and birth radii, validating or constraining critical ISM parameters that remain largely unexplored in extragalactic observations. The models that successfully reproduce the radius – metallicity distribution and the age – metallicity distribution of stars without violating existing ISM observations indicate a need for modest differential oxygen enrichment in Galactic outflows, meaning that the oxygen abundance of outflows is higher than the local ISM abundance, irrespective of outflow mass loading. The models also suggest somewhat elevated ISM gas velocity dispersion levels over the past 6 billion years compared to galaxies of similar mass. The extra turbulence necessary could result from energy from gas accretion onto the Galaxy, supernovae clustering in the ISM, or increased star formation efficiency per freefall time. This work provides a novel approach to constraining the Galactic ISM and outflows, leveraging the detailed insights available from contemporary Milky Way surveys.


INTRODUCTION
Understanding the evolution of galaxies requires studying the relationship between their baryonic components, namely, stars and gas.The connections between these components provide information on the complex interaction between baryons and dark matter, and the galactic lifecyle.This is best studied within our own Galaxy, thanks to several rich, large volume spectroscopic datasets that have collectively mapped more than 10 million stars in the Milky Way (Cui et al. 2012;Majewski et al. 2017;Buder et al. 2021;Gaia Collaboration et al. 2018, 2023).With the help of these datasets, we are now starting to put together the history of our Galaxy on a star by star basis (Galactic archaeology), from in-situ formation of stars in the proto ★ sharda@strw.leidenuniv.nl(PS) † yuan-sen.ting@anu.edu.au(YST) ‡ neige.frankel@utoronto.ca(NF) Milky Way (e.g., Bonaca et al. 2020;Belokurov & Kravtsov 2022;Semenov et al. 2023a) to accretion events and mergers that dumped stars and gas from external galaxies (e.g., Helmi et al. 2018;Helmi 2020;Horta et al. 2020).
Stars not only encode information on the dynamical evolution of the galaxy, but also preserve signatures of the chemistry of the ISM they were born out of.There has been considerable success in using chemical evolution models that lead to insights on the dynamical history, showing that even with implicit influences taken into account, proper modeling can illuminate the evolution of the Milky Way.However, this is complicated by the fact that stars tend to migrate away from their birth location as they age (Sellwood & Binney 2002;Schönrich & Binney 2009b), and this migration can be as strong as a few kpc over a few Gyr (e.g., Minchev & Famaey 2010;Bird et al. 2012;Klačka et al. 2012;Martínez-Barbosa et al. 2015;Frankel et al. 2018;Lian et al. 2022a).In the last decade, significant investments have been made in combining chemical evolution models with ra-dial migration models, to characterize the detailed chemo-dynamical assembly history of the Galaxy (e.g., Schönrich & Binney 2009b,a;Minchev et al. 2013Minchev et al. , 2014;;Kubryk et al. 2015;Spitoni et al. 2015;Vincenzo & Kobayashi 2020;Johnson et al. 2021;Chen et al. 2023b).
However, the prescription for gas-phase abundances used in many of these works is either a parametrization that is not physically motivated, or does not arise from first principles, and therefore cannot reflect the complexity of processes in the interstellar medium (ISM) that sets the gas-phase metallicity.As such, these models cannot be used to explore variations in ISM physics (e.g., Sanders & Binney 2015;Sharma et al. 2021;Spitoni et al. 2021;Lu et al. 2022;Ratcliffe et al. 2023).Further, models that include a prescription for ISM physics either do not incorporate radial variations, or only solve for the mass and metal balance in the ISM, but not energy (e.g., Minchev et al. 2014;Andrews et al. 2017;Weinberg et al. 2017;Grisoni et al. 2017Grisoni et al. , 2018;;Johnson et al. 2021), even though numerous Galactic as well as extragalactic observations have pointed out the role of ISM energetics in modulating star formation and subsequent chemical enrichment (e.g., Salim et al. 2015;Grand et al. 2015;Krumholz et al. 2018;Forbes et al. 2019;Bacchini et al. 2020;Ginzburg et al. 2022;Ostriker & Kim 2022;Ibrahim & Kobayashi 2024;Sharda et al. in preparation).There is very limited work on connecting Galactic ISM physics to Galactic Archaeology, even though chemical evolution is clearly dependent on ISM physics (Weinberg 2023).Similarly, we also lack a bridge between Galactic and extragalactic domains, and the evolutionary history of the Galactic ISM inferred from Milky Way chemical evolution models is not generally validated against extragalactic gas-phase observations that often provide more stringent constraints on ISM physics.
Complementary to the chemical evolution models developed for the Milky Way, there exists a number of sophisticated ISM evolution models designed to study the gas-phase abundances in external galaxies.These non-parameterized ISM models are generally more descriptive in their treatment of ISM physics, with model parameters mostly constrained by (or, naturally reproduce) extragalactic observations (e.g., Forbes et al. 2014Forbes et al. , 2019;;Krumholz et al. 2018;Krumholz & Ting 2018;Sharda et al. 2021aSharda et al. , 2024;;Yates et al. 2021;Ginzburg et al. 2022;Stevens et al. 2023).However, these models have not been extensively tested on the Milky Way (beyond reproducing the time evolution of the oxygen abundance gradient - Sharda et al. 2021a), despite the fact that it is the only galaxy where we can study chemical evolution on a star by star basis.Therefore, the comprehensive treatment of ISM physics that makes these models state of the art has not been investigated using Milky Way observables.Further, as we show in this work, the Milky Way is the only galaxy that can provide insights into some of the parameters in these ISM models that have no constraints from extragalactic observations.As the current and even the upcoming generation of simulations remain far from simulating Milky Way like galaxies in a cosmological context with enough resolution to resolve star formation and feedback (e.g., Emerick et al. 2018;Lahén et al. 2020;Brown & Gnedin 2022;Wibking & Krumholz 2023;Carrillo et al. 2023; see however, Hopkins et al. 2024), there is vast scope for applying semi-analytical ISM models to reproduce Milky Way observables, and infer the evolutionary history of the Galactic ISM.
In this work, we use a combination of a simple forward model for the Milky Way's disc that includes stellar radial migration, and ISM gas-phase metallicity model to predict the radius -metallicity and age -metallicity distributions of stars, which we then compare with the observed distribution of red clump stars from The Apache Point Observatory Galactic Evolution Experiment (APOGEE, Bovy et al. 2014; Majewski et al. 2017) DR14 (Abolfathi et al. 2018). 1 Our aim here is only to provide a qualitative assessment of how such a combination of gas + stellar chemo-dynamical modeling can be a powerful tool to constrain the evolutionary history of the ISM of the Milky Way.We organize the rest of the paper as follows: Section 2 describes the APOGEE data we make use of in this work, Section 3 gives an overview of the stellar disc and ISM chemical evolution models we use, Section 4 introduces the fiducial (or, the base) model we start the analysis with, Section 5 provides the results of the comparison between the APOGEE data and the models, Section 6 presents a discussion of the key features of our analysis, and Section 7 concludes our work.For the purpose of this paper, we use Solar metallicity Z ⊙ = 0.0134, corresponding to 12 + log 10 O/H = 8.69 (Asplund et al. 2021).

DATA
We retrieve a sample of 20100 red clump stars from the APOGEE survey that delivered near-infrared spectroscopic data within  gal ∼ 5 − 14 kpc from the Galactic centre (Bovy et al. 2014).These stars are well known for their precise ages and metallicities (Girardi 2016).Red clump stars were identified by predicting their astroseismic parameters from the spectra with a neural network trained on APOKASC2 data (Pinsonneault et al. 2018), and published in a catalog by Ting & Rix (2019).The same method also derived stellar ages.Following Frankel et al. (2018, figure 1), we only use the sub-sample of these stars which are relatively young (age < 6 Gyr, or equivalently,  ≤ 0.6), have low [/Fe] (where  denotes Type II elements), and are confined within  gal = 1 kpc from the midplane.These stars belong to the thin disc at a time when the Galaxy had a relatively secular evolution.Further, simulations find that the CGM of Milky Way analogues virialized ∼6.5 Gyr ago, and bursty star formation stopped to allow for quiescence, enabling them to form stable gas discs (Yu et al. 2021a;Hopkins et al. 2023;Semenov et al. 2023b;McCluskey et al. 2024).Hereon, we simply refer to the low- disc in the Galaxy as the 'disc'.We additionally sub-select stars in the main APOGEE sample, that were drawn randomly from the underlying distribution rather than specific programmes (e.g., clusters) that would not represent the intrinsic Milky Way distribution.We end up with a subsample of 15714 stars that we use in this work, covering a Galactocentric distance between 4 − 14 kpc.

Metallicity
Stellar metallicity is commonly characterised by the iron abundance, whereas ISM metallicity is referenced using oxygen as most of the iron in the ISM is locked in dust (e.g., Draine 2011) (Vincenzo et al. 2021).We show the footprint of [O/H] for our subsample in Figure 1.As Figure 1 shows, the Milky Way has a clear abundance gradient in [O/H], with stars in the inner regions being more metal rich on average (see also, Lian et al. 2023).Following (Vincenzo et al. 2021), we adopt a conservative uncertainty of  [O/H] = 0.1 dex on the oxygen abundance at birth.The uncertainty on stellar ages we adopt is   = 0.15 dex (Frankel et al. 2020).

OVERVIEW OF THE MODELS
We aim to compare the predictions from our model to observational data from APOGEE.To this end, we build a forward model, produce mock observations, and compare the distributions of the mock observations to those in the APOGEE dataset.We describe in brief the stellar disc model and the ISM metallicity model we use in this work.We predict the joint distribution of observables plus the (latent) stellar birth radius  (or Galactocentric radius in the gas) where the first component is the ISM model that we explore in this work, and the second component is the best-fit stellar disc model combined with the APOGEE selection function as in Frankel et al. (2020).Here, [O/H] is the oxygen abundance,  is the stellar age, ì   is the line of sight velocity (from APOGEE DR14), ì  is the 3D spatial position and ì  is the proper motion (from Gaia DR2 -Gaia Collaboration et al. 2018).
Figure 2 shows a graphical model that depicts the overall structure of the modeling we perform in this work.Essentially, the graphical model describes that we keep parameters of the stellar disc model (that is applied to every star) fixed, while exploring the parameter space of the ISM chemical evolution model.We refer the reader to Frankel et al. (2018Frankel et al. ( , 2020) ) and Sharda et al. (2021aSharda et al. ( , 2024) ) for a detailed description of the two models, respectively.To produce a synthetic distribution of metallicities, ages and present day radii, we first make a mock stellar dataset and then use the ISM model to paint the abundances based on birth radius and age.We then evaluate the performance of the model against the observed radius -metallicity and age -metallicity distributions.Sharda et al. (2021aSharda et al. ( , 2024) ) present a first-principles model to explore the physics of radially-resolved gas-phase metallicities in galaxies.In addition to conventional galactic chemical evolution models, this model brings together key metal transport processes like radial gas flows and metal diffusion with gas accretion and outflows to model the metallicity gradients, which makes it applicable to a wide variety of galaxies both in the local Universe and at high-redshift (Sharda et al. 2021b,c;Gillman et al. 2022;Cheng et al. 2024).The model incorporates a galactic disc model (Krumholz et al. 2018;Ginzburg et al. 2022) to predict the gas-phase metallicity as a function of the galactocentric radius.The galactic disc model has been validated against numerous gas scaling relations in galaxies (e.g., Johnson et al. 2018;Yu et al. 2019;Varidel et al. 2020;Yu et al. 2021b;Girard et al. 2021;Parlanti et al. 2023).

ISM metallicity model
Before we describe details of the model relevant to this work, it is helpful to provide some remarks about the model.Note that this is an equilibrium model, meaning that the metal distribution is assumed to be in steady-state at a given time.At least for the last 6 Gyr, Sharda et al. (2021a) find that the metal equilibration time is less than the molecular gas depletion time, thus it is safe to assume the galactic disc is in metal equilibrium during this period.The model assumes axisymmetry and vertical hydrostatic equilibrium, and therefore does not include azimuthal scatter in metallicity at fixed radius.However, this is not a problem since simulations find that azimuthal scatter only becomes dominant over the radial gradient at  ≳ 1 in Milky Waylike galaxies, corresponding to lookback times ≳ 8 Gyr (Bellardini et al. 2021(Bellardini et al. , 2022)).
At each epoch, the model first solves for mass balance in the disc by equating sources (accretion) and sinks (star formation, transport and radial gas flows, outflows) of gas mass in the disc.Then, we solve for energy balance in the ISM by requiring that the rate of energy injected into the ISM balances the rate of energy dissipation, which sets the gas velocity dispersion.These steps constrain the gasphase properties of the disc, which we then use to find a solution for metallicity by invoking metal equilibrium.
The model finds two solutions for the ISM metallicity, depending on the nature of star formation in the galaxy.Specifically, the galaxy evolution model used as an input to the metallicity model (Krumholz et al. 2018) separates star formation into the Toomre and giant molecular cloud (GMC) regimes, based on the fact that stars do not form in a continuous volume-filling medium in galaxies like the Milky Way.Instead, most star formation in these galaxies occurs in GMCs with densities ∼ 100× higher than the midplane density and are hence dynamically decoupled from the disc (Ceverino et al. 2010).For the GMC regime, and for the Toomre regime, where Z R 0 is the metallicity at the inner edge of the disc  0 ,  = / 0 is the dimensionless radius normalized to the inner edge of the disc, and  1 is a constant of integration that is constrained by the metallicity at the outer edge of the disc (or, the CGM metallicity,  1. the ratio of metal advection (due to radial gas flows), metal production and loss (due to star formation and outflows) and gas accretion (due to metal-poor inflows) to metal diffusion due to turbulence.These dimensionless ratios naturally arise in a chemical evolution model that takes into account all major metal transport processes, and includes mass, metal and energy balance in the ISM.We deduce from equation 3 and equation 2 that if P or A are larger than S, the resulting metallicities will be lower, with flatter metallicity profiles, and vice-versa.We can intuitively understand this as follows: high levels of star formation or weaker outflows (factors that increase S) will lead to the disc being more enriched as well as retaining more metals, thereby increasing the ISM metallicity.Similarly, higher gas turbulence, transport or accretion (factors that enhance P and A) will either efficiently mix metals or dilute the overall metallicity, thereby creating flatter metallicity profiles.
For the purpose of this work, we fix  0 = 3 kpc since 1.) this is the range of the APOGEE data we use, and 2.) the ISM model does not apply to the innermost regions where the bar dominates galactic dynamics.Sharda et al. (2021a) find that for massive local galaxies, Z R 0 ≈ S/A.Here, we have also assumed that the rotation curve index is 0 for the range in  we use, since the rotation curve of the Galaxy flattens out around  ≈ 1 (Bland-Hawthorn & Gerhard 2016).Fractional contribution of radial gas flows to turbulence 1,9 equation 5 0 − 1  g () Gas velocity dispersion

Metal advection and radial gas flows
Following Sharda et al. (2024, equation 23), we express P as the ratio of metal advection to metal diffusion We summarise the definitions and fiducial values of all the parameters in equation 4 in Table 1.We discuss one critical parameter,  ( g ), in detail below.The steady-state radial gas flow that we incorporate in the ISM metallicity model comes from the evolution of the gas velocity dispersion due to turbulent dissipation, star formation feedback, gas accretion, and non-axisymmetric torques (Krumholz & Burkert 2010).Under energy equilibrium, the radial gas flow rate can then be ex-pressed using  ( g ) as (Sharda et al. 2024, equation 17) where  sf is the gas velocity dispersion powered by star formation feedback in a galactic disc.The exact formulation of  sf is provided in Sharda et al. (2024, equation 14); here, we only highlight that it is linearly proportional to the average momentum injected per unit stellar mass formed by a single supernova.The fiducial value we use is 3000  SN km s −1 (e.g., Thornton et al. 1998;Walch & Naab 2015;Kim & Ostriker 2015), where  SN > 1 takes into account the fact that the average momentum scales with the number of supernovae that go off simultaneously in the same patch of the ISM (also known as supernova clustering - Gentry et al. 2017;Hu et al. 2023).In the fiducial model, we set  SN = 1.We will later see the potential importance of supernova clustering to explain the abundance patterns of our APOGEE sample.For Milky Way-like galaxies,  sf ∼ 6 − 15 km s −1 (e.g., Joung et al. 2009;Kim et al. 2011;Krumholz et al. 2018;Varidel et al. 2020;Bacchini et al. 2020).
Gas accretion not only adds mass to the disc, but also injects energy into the disc, depending on the angular momentum of the gas, clumpiness, and where it joins the disc (e.g., Klessen & Hennebelle 2010;Mandelker et al. 2020;Bacchini et al. 2020;Forbes et al. 2023;Jiménez et al. 2023). acc describes the velocity dispersion due to turbulence driven by gas accretion.Sharda et al. (2024, equation 16) provide the exact expression for  acc ; critically, it depends on the efficiency with which accreting gas can convert its kinetic energy into turbulence,  a .To date, this fractional parameter has no constraints from direct observations, and it is not yet known how it varies with redshift, although the general expectation is that it increases with redshift (Ginzburg et al. 2022).The fiducial value Sharda et al. (2024) adopt from Ginzburg et al. (2022) is  a = 0.2 (1 + ).
In energy equilibrium, the rest of the contribution to  g , if any, comes from gas transport via radial gas flows.Transport is switched off if star formation feedback and gas accretion are sufficient to drive the turbulence required; in such cases,  ( g ) = 0, and the galaxy can take any  ≥  min .Values of  ( g ) < 0 are not allowed in equilibrium.

Metal production, star formation and outflows
Now, we consider the dimensionless ratio S that describes the role of metal production and ejection against diffusion.In the GMC regime it is given by and in the Toomre regime, As in Section 3.1.1,we define the symbols and the corresponding values in Table 1, and discuss the key parameters below.
There is some subtlety in selecting an appropriate value for  sf , as  sf ≪ 1 for local spiral galaxies like the Milky Way that lie in the GMC regime (e.g., Krumholz et al. 2012;Saintonge et al. 2017;Catinella et al. 2018).Our fiducial choice is to use observational constraints on the ratio of molecular to molecular + atomic gas in galaxies from Tacconi et al. (2020) and Saintonge & Catinella (2022).Later on, we will also explore theoretical models that can parameterize  sf in terms of local ISM properties (Krumholz et al. 2009;Krumholz 2013).
The fractional parameter  y describes the differential metal enrichment of galactic outflows, and is one of the most important parameters but least constrained in the model. y takes into account the fact that some fraction of newly produced elements (given by the yield ) can be preferentially ejected via galactic outflows, such that the metallicity of galactic outflows can be higher than the ISM metallicity of the galaxy.This can happen due to imperfect metalmixing, or due to metal-rich ejecta from supernovae directly being entrained in an outflow that dumps metals into the CGM.We express it as (Sharda et al. 2024, equation 26) where  w is the outflow mass loading factor and Z w is the metallicity of the outflow normalized to Solar.Following Sharda et al. ( 2024), we explore three models that specify  w as a function of galaxy mass or gas fraction: 1.) from the analytical work of Hayward & Hopkins (2017), 2.) from FIRE-2 simulations (Hopkins et al. 2018) studied by Pandya et al. (2021), and 3.) from EAGLE simulations (Schaye et al. 2015) studied by Mitchell et al. (2020).Our fiducial choice is the former.Despite there being heavy theoretical and observational support (e.g., Mac Low & Ferrara 1999;Peeples & Shankar 2011;Forbes et al. 2014;Emerick et al. 2018;Chisholm et al. 2018;Lopez et al. 2020;Cameron et al. 2021;Sharda et al. 2021b;Vĳayan et al. 2023), almost all chemical evolution models for the Milky Way simply assume  y = 1 across cosmic time, which corresponds to the outflow metallicity being the same as the ISM metallicity (see Chen et al. 2023b for an exception).We leave it as a free parameter in the fiducial model, but later on we will see how models that best match the data prefer  y < 1.

Metal dilution and gas accretion
Accretion of metal-poor gas from the cosmic web naturally dilutes the overall metal content in galaxies.We express the last dimensionless ratio, , that describes the ratio of gas accretion to metal diffusion as where  g,acc is the gas accretion rate onto the disc. g,acc is proportional to the baryonic accretion efficiency of dark matter haloes,  in , as well as the halo concentration index .We neglect the presence of galactic fountains (whereby a galaxy can re-accrete metal-rich gas that it ejected in a past cycle) due to lack of models that selfconsistently include mass, metal and energy exchange between the disc and the CGM via fountains, although recent attempts have been made in this direction (e.g., Pandya et al. 2023;Carr et al. 2023).

Metal diffusion
Once produced, metals also diffuse into the ISM due to turbulence and thermal/magnetorotational instabilities.Although the existence of metal diffusion has been widely acknowledged (Yang & Krumholz 2012;Petit et al. 2015;Krumholz & Ting 2018;Rennehan et al. 2019;Beniamini & Hotokezaka 2020), no Galactic chemical evolution models have incorporated it to date.A key reason for this is the absence of energy balance in these models.The Sharda et al. ( 2024) ISM model we use includes a treatment for turbulent metal diffusion.
The diffusion coefficient in the model is set by the product of the gas scale height and the velocity dispersion (Karlsson et al. 2013, equation 11).We refer the reader to Sharda et al. (2021a, section 2.2) and Sharda et al. (2024, section 1.2) for a detailed discussion of how this is achieved in the model; importantly, we point out that metal diffusion can become dominant over other physical processes in setting metallicity variations in some galaxies.

Stellar disc model
We borrow the Milky Way's low- disc model from Parameters  in and  a are used to describe efficiencies with which gas can be accreted and drive turbulence in the disc, respectively. sf and  gQ describe gas fraction in the disc. w is the mass loading factor,  g,acc is the gas accretion rate, and  is the halo concentration index. g is the gas velocity dispersion, a part of which is driven by star formation feedback ( SF ) and gas accretion ( acc ).A detailed description of all the parameters is available in Table 1.
mentum and radial action of the stars. 3The model describes where and when stars were born, and how they subsequently changed orbits via radial migration (diffusion in angular momentum) and radial and vertical heating (average increase in radial action and scale-height).These aspects are encoded in structural parameters { ì  str }.The model also had, in Frankel et al. (2020), a chemical enrichment component for their fitting purpose, with enrichment parameters ì  chem .But in this work, we replace their chemical enrichment by the one in Section 3.1.
The model parameters for the Milky Way disc model can be split into structural and chemical evolution parameters ì  MW = { ì  str , ì  chem }.We fix ì  str to the best fit of Frankel et al. (2020) and in this work we explore ì  chem with the ISM chemical evolution model against red clump stars data.
Note that the star formation history used by the stellar disc model is not necessarily self-consistent with that produced by the ISM chemical evolution model.For the purpose of this work, we have kept the stellar disc model fixed, so in essence, we treat the two star formation histories independently.However, the star formation rates between the two models are similar (Frankel et al. 2020), and satisfy direct constraints from observations, so we expect this inconsistency to not make a significant impact on our inferences.In any case, this inconsistency is present in any work with chemical evolution models that only compare the chemical -age trends (modelled vs observed) rather than the age distribution of the dataset (predicted by the model vs observed).Here, the star formation history of the stellar disc models is sampled so that the age distribution of the stars in the dataset 3 We could equally use the guiding radius instead of the Galactocentric radius since the low- disc model predicts the joint distribution of   and  gal (Frankel et al. 2020, figure 6), but it would not be more informative for our purpose, since we do not vary the dynamical part of the model.is well reproduced, but it does not talk to the chemical enrichment part.The star formation history from the chemical evolution model is used only to compare the age-metallicity-radius trends.Ideally, we would like to resolve this inconsistency by fully combining the two models, but this is the scope of a future work.

FIDUCIAL MODEL
We begin our analysis by creating a fiducial model.The fiducial model in this context does not mean the model that best describes the data; rather, it is the model where parameters are set to values typical of local spiral galaxies.There is no guarantee that such values also represent the Milky Way.We list the parameter values for the fiducial model in Table 1.The model has three parameters for which there are no constraints:  y , [O/H] CGM , and  max , the last two of which are only used to create boundary conditions for the model.To statistically sample these parameters for each model we run (for reasons which will become clear in Section 5), we use Latin Hypercube Sampling (LHS).LHS is a stratified sampling scheme where points sampled for each variable are uniformly distributed but no two samples share the same combination of values.
To generate an LHS sample, we vary  y between 0.1 − 1, [O/H] CGM between −2 to −0.3, and  max between 13 − 23, essentially spanning the entire range of  birth sampled by the Frankel et al. ( 2020) stellar disc model.We create 10000 samples for each model.As we are interested in learning about the metal entrainment in Galactic outflows, we will later restrict the LHS sampling for  y to different subsets between 0.1 -1.
We begin from  = 2, and evolve the model to  = 0. We set the initial halo mass at  = 2,  h = 3.5 × 10 11 M ⊙ such that the Moster et al. (2013) stellar mass -halo mass relation gives the present day stellar mass in the disc  ★ ≈ 4 × 10 10 M ⊙ .At each epoch, we solve for the evolution of the gas mass (mass balance), gas velocity dispersion (energy balance), and metallicity (metal balance). 4Note that the exact starting redshift does not matter as the model quickly establishes equilibrium within the first few 100 Myr (Ginzburg et al. 2022).We show the evolution of parameters that vary with  in Figure 3.The end product is that we obtain metallicity at each epoch and Galactocentric radius that we pass on to the stellar disc model.

Comparison to APOGEE
In order to make a statistical comparison between the models and the APOGEE sample, we generate radius -metallicity and agemetallicity distributions from the models.We then assess the modeled distributions against the data using simplified versions of four distinct metrics: • Bhattacharya Distance: this metric describes the dissimilarity of two distributions, and is sensitive to the shape as well as the overlap between the distributions (Bhattacharyya 1943). 5In its simplest form, it can be expressed as where I model corresponds to the normalized model histogram value (flattened to 1D) in either the radius -metallicity or the agemetallicity domains and  bins is the total number of bins in these domains.
• KL Divergence: the Kullback-Leibler distance measure describes the cost of approximating one distribution with another, and is sensitive to the peaks and tails of the distributions.It is expressed as (Kullback & Leibler 1951) • Earth Mover's Distance: this metric describes the costs associated with transforming one distribution into another.If two distributions are identical, the cost is 0. It is also known as the Wasserstein metric.It is expressed as (Vasershtein 1969) • Normalized squared residual: We use a form of the normalized squared residual as our last metric that can quantify goodness of the fit of the models, Note that we only apply these metrics if both I data, and I model, ≠ 0. We apply a penalty to bins in radius and age where the model has a non-zero value while the APOGEE sample does not, and vice versa.
The penalty per bin is equivalent to the square root of the bin value, so that bins where a model deviates more from the APOGEE sample are penalized by a larger value.This penalization plays a crucial role in accurately assessing the performance of different models, as it takes into account their shortcomings as well as their successes.
We use multiple metrics because they have their own strengths and weaknesses (e.g., Canale & Dunson 2011;Kurtek et al. 2012).If the model and the data were identical, all the four metrics will yield a value of zero.Therefore, for all the metrics, the best match corresponds to the least value of the metric.These metrics are unavoidably crude, but they are sufficient for our goal to gain a qualitative understanding.A more accurate approach would include fitting the MDF using likelihood-free inference, as we will show in a forthcoming work.
For the remainder of this work, we will use the sum of the four metrics to determine how different models perform, where we have scaled M KL by a factor of 10 to bring its mean magnitude at par with other metrics.The best models are the ones with the least value of the sum, M. In Appendix A, we show this is reasonable since the four different metrics estimate the performance in similar ways (i.e., it is not the case that one metric predicts a significantly different result than another) but there is some scatter that could otherwise bias the result towards models that excel only on a subset of metrics.

Variations in the fiducial model
As we mention above, the fiducial model well describes the ensembleaveraged evolution of local spiral galaxies, but is not guaranteed or tuned to reproduce Milky Way observables.In this sense, the fiducial model is only a starting point for us to explore the parameter space of Milky Way observations.As our objective is to obtain a qualitative understanding of the gas-phase history of the Milky Way, we focus on varying the ISM model parameters, while keeping the stellar disc model fixed.This approach allows us to determine whether changes in a specific ISM parameter align with the desired outcome in terms of replicating the data.Consequently, it serves as a valuable tool for unraveling the influence of each ISM model parameter on the overall metallicity distribution with radius and age.
The model consists of two types of parameters: those with numerical values that can be adjusted (e.g.,  ff ), and those that require selecting a specific functional form (e.g.,  sf ).For the former, we modify the parameter values by increasing or decreasing them to assess if the adjustments enhance agreement with the data.For the latter, we explore alternative functional forms by substituting the fiducial choice with other existing models or observationally-derived functional forms.We then use the metrics we describe in Section 4.1 to evaluate the performance of the variation invoked.We list all the models we create in Table 2.As we shall see below, the result is insensitive to variations in some parameters, but is clearly influenced by others.

PHYSICAL CONSTRAINTS ON ISM PARAMETERS
Following Frankel et al. (2018), we bin the APOGEE data in six bins between 4 − 13 kpc, representing the present-day galactocentric radius of our sample of stars.Similarly, we bin the data in seven age bins spaced 1 Gyr apart.As we show in Appendix B, our results are robust against the number of bins used in either dimension.The left It is not useful to gauge the performance of the fiducial model simply based on the metric value since we do not fit the model parameters.However, we can judge the relative performance of different models by comparing their overall metric values.In fact, we will see below that some variations in the fiducial model produce a much better match with the data, whereas others perform poorly.In particular, we find four variations of the fiducial model to produce the lowest value on our metric (within the uncertainty) -we refer to these as the best models.Below, we club and analyze results of each model we create based on the ISM physics they impact.

Differential metal enrichment of outflows
The first variations we introduce relate to the differential enrichment of galactic outflows, and are encoded in  y .The handful of nearby galaxies where outflow metallicity has been directly measured (Chisholm et al. 2018) suggest that  y increases with stellar mass (Sharda et al. 2021b), implying that low mass galaxies preferentially lose metals via galactic outflows.
y plays a central role in the ISM model, and the impact of variations in other parameters is limited as compared to  y .We thus opt to study different regimes of  y rather than handpicking a few select values or scaling it up/down by some factor (as we do for the rest of the parameters).Models 2, 3, and 4 limit the range over which we vary  y in the fiducial model.The limits we introduce ( y = {0.4,0.7, 1.0}) are rather arbitrary, but capture distinct regimes of metal enrichment  2024).The fiducial values of all ISM parameters are listed in Table 1.
of outflows.We have checked that models with  y ≲ 0.4 produce significantly more metal-poor stars than observed, so we do not discuss them in this work.
We present the resulting model distributions in Figure 5 and report the metric values in Table 2.We find that model 2 (0.4 ≤  y ≤ 1.0) and model 3 (0.7 ≤  y ≤ 1.0) perform slightly better and worse than the fiducial model, respectively.Interestingly, model 4 (0.4 ≤  y ≤ 0.7) performs much better as compared to models 1 − 3, with a metric score roughly half of these models.This is also visible in the distributions for model 4, as we read off from Figure 5.The effect of  y < 1 is that outflows remove a considerable fractions of metals from the disc, thereby making it difficult to produce stars with very high [O/H], which aligns with the observed distributions.We have checked that the best models prefer 0.4 ≤  y ≤ 0.7 regardless of the value of (or variations in) other parameters.

Outflow mass loading factors
Based on the results from models 1 -4, we set a new baseline for creating more models, where we limit 0.4 ≤  y ≤ 0.7.We have confirmed that models with all other variations that follow perform better when  y is restricted to this range as compared to  y < 0.4 or  y > 0.7.In model 5, we set the outflow mass-loading factor,  w (), following results from the FIRE-2 simulations (Pandya et al. 2021).Similarly, for model 6, we scale  w () with stellar mass following the EAGLE cosmological simulations (Mitchell et al. 2020).For reference, at  = 0,  w ≈ 0.3 from FIRE-2 and  w ≈ 1.0 from EAGLE for a galaxy with  ★ ≈ 4 × 10 10 M ⊙ (Sharda et al. 2024, figure 6).
Figure 5 presents the resulting model distributions for model 5.It is difficult to ascertain whether models 5 and 6 perform better than model 4 by visual inspection.The value of the combined metric for models 4, 5 and 6 is 4.11, 4.83, and 4.67, respectively.Given the typical uncertainty of 0.3 associated with these values, we conclude that model 4 (with  w from the analytical model of Hayward & Hopkins 2017) performs better than the others.However, the differences in the value of M between these three models is not as large as models 1 -4 where we varied  y keeping  w fixed.This finding is surprising because the evolution of  w for the range in  ★ we use in this work is different by a factor 2 − 10 between the three scalings (Sharda et al. 2024, figure 6).The relative insensitivity of the model distributions to  w as compared to  y implies that the mass loading of the outflows plays a less critical role in the models as compared to the metal mass loading.For subsequent models, we continue to use  w () from the analytical model of Hayward & Hopkins, because this model also gives  w ( = 0) and  w () in good agreement with direct observations and previous modeling of the Milky Way (Fox et al. 2019;Pan et al. 2019).

Gas accretion
Next, we vary two parameters that control the accretion of gas onto the disc: the halo concentration index  and the baryonic accretion efficiency  in .Models 7 and 8 scale  down and up by 2×, respectively, to examine if tuning this parameter provides a better match against the data.We show the resulting radius -metallicity and age -metallicity distributions in Figure C1 in Appendix C. The value of M is higher than the fiducial model in both cases, indicating a poorer reflection of the data.In particular, we find that the model age -metallicity distribution is flatter than that observed when the concentration index is changed.In particular, model 8 produces more metal-rich stars at younger ages than required, which is reasonable because the dimensionless ratio S increases non-linearly with increasing  via its influence on the rotational velocity   .
Similarly, there is no guarantee that accretion in the Milky Way follows the fits used in the ISM model; in fact, the fit for  in provided by Faucher-Giguère et al. ( 2011) that we use is only valid at  ≥ 2, and  in remains unconstrained at lower  at least for the Milky Way.To test for variations in  in , models 9 and 10 scale it down and up by 2×, respectively.Both these models perform poorly on our metric; model 9 produces a lot more old and metal-rich stars whereas model 10 produces a substantial population of stars at low metallicities in the inner regions of the disc.These trends are expected because A ∝  in , and decreasing A leads to higher [O/H] (see Section 3.1).We thus conclude that variations in  and  in are not required to better reproduce the data.

Gas fraction
Next, we examine parameters that determine the gas fraction in the disc,  gQ and  sf .In model 11, we reduce  gQ by 2× whereas in model 12, we increase it by 1.5×. 6The resulting value of M for both models is greater than 10, indicating a much poorer fit to the data.Since both of these fractional parameters are directly proportional to the dimensionless ratio S, all the stars in model 11 have [O/H] < 0, whereas model 12 produces a lot more metal-rich stars.This is perhaps not surprising given that the fiducial value of  gQ we use is derived from the gas and stellar surface density of the Solar Neighbourhood (McKee et al. 2015).This mismatch therefore indirectly verifies the synergy between APOGEE data and gas-phase observations.
Instead of scaling  sf by a constant factor, we produce a new model (model 13) where  sf () is set by the analytic model of Krumholz et al. (2009) and Krumholz (2013) that describes the fraction of starforming gas as a function of local ISM properties.Model 13 performs the worst among all the models we investigate in this work.It gives a value of  sf < 0.1, and like model 11, does not produce any stars at [O/H] = 0 across the parameter space in radius and age.This is again due to a  sf < 0.1 yielding an unreasonably low S that brings down [O/H].We thus conclude that the fiducial choice for  sf is adequate to describe the fraction of star-forming gas in the Milky Way.

Disc scale height
Given that the ISM model invokes vertical hydrostatic equilibrium, we can also test for the impact of the scale height of the disc on the model distributions.The scale height is given by the combination of the gas surface density and velocity dispersion (Krumholz et al. 2018, equation 24).For this purpose, we turn to the parameters  and  mp since the model assumes that the turbulence crossing time is of the order of the gas scale height.In model 14, we increase  to 3, such that the energy injected in the ISM is radiated away in roughly two crossing times.In model 15 (16), we decrease (increase) the value of  mp to 1 (2). mp = 2 means that the cosmic ray pressure in the disc midplane is equivalent to magnetic pressure, whereas  mp = 1 implies that the thermal pressure is non-negligible.
Models 14 and 15 perform slightly worse than model 4 (the fiducial model with 0.4 ≤  y ≤ 0.7) on our metric, indicating that (1.) our assumption that turbulence is dissipated in a single crossing time is reasonable, and (2.) thermal pressure at the midplane is negligible in the Milky Way, as is expected for massive spiral galaxies.The scale heights we obtain for the disc vary from 0.1 kpc in the inner regions to 0.4 kpc in the outer regions.
We show the distributions for model 16 in Figure 6.The distributions from model 16 are slightly broader as compared to the data, especially for older stars as well as stars located at large radii.Nevertheless, model 16 performs as well as model 4 within the uncertainty.While the dynamic range of  mp is rather limited, the comparable performances of models 4 and 16 imply that cosmic ray pressure plays a non-negligible role in setting the ISM pressure in the disc.Model 16 is one of the four best models we find in our analysis.

Disc self gravity
The self-gravity of the disc determines its stability against gravitational collapse, and is thus an important property that dictates the onset of star formation and metal production.The two model parameters that set the self-gravity of the gas are  min and  Q .In the ISM model, we assume that galaxies can self-regulate  to maintain a balance between star formation, gas accretion, outflows, and transport.Simulations and observations typically find that  min varies between 1 − 3 on orbital timescales (e.g., Leroy et al. 2008;Genzel et al. 2014;Inoue et al. 2016), so we introduce two new models where  min = 1 (model 17) and 3 (model 18).Similarly, we increase  Q to 4 in model 19 to test if a higher Toomre  for gas results in a better match with the data.
From Figure C1, we find that model 17 shows an overabundance of metal-rich stars whereas model 18 shows the opposite.It is not straightforward to find out the reason for these trends because if scaling  min shuts off transport, then  can take on any value equal or larger than  min , and S ∝  −2 whereas P ∝  −2 min .Nevertheless, M > 10 for both models 17 and 18, implying that they can not explain the data.Model 19 performs better than models 17 and 18, however it lags behind model 4.This is expected since both S and A ∝  Q but P ∝  2 Q .These experiments thus suggest that the fiducial values of  min and  Q are adequate to reproduce the observed radiusmetallicity and age -metallicity distributions.

Star formation
To test for the effects of variations in our star formation prescription, we first vary the star formation efficiency per free-fall time by 2×.While observations find  ff ≈ 0.01 across orders of magnitude in gas density and star formation rate (e.g., Krumholz et al. 2012;Salim et al. 2015;Sharda et al. 2018Sharda et al. , 2019)), more precise measurements in resolved clouds reveal an intrinsic scatter in  ff by a factor of 2 (Hu et al. 2022).We find that model 20, where  ff is reduced by 2×, performs somewhat poorly as compared to model 4, as it contains a higher fraction of somewhat metal-poor stars than the data.
However, model 21 where we increase  ff by 2× outperforms model 4, and is one of the four best models.We show the distributions for model 21 in Figure 6.We see that while model 21 nicely captures the extent and shape of the observed radius -metallicity distribution, it does not well reproduce the age -metallicity distribution seen in the data.Nevertheless, the overall value of the metric M is smaller than that of model 4.This result highlights the possibility that stars formed with a slightly higher efficiency per free-fall time in the Galactic disc, which we discuss below in Section 6.
As we mention in Section 3, our ISM model includes two modes of star formation -one where it exclusively occurs in GMCs that are decoupled from the disc, and another where the entire volume of the disc can undergo star formation (the so-called Toomre regime of star  5, but for the models that perform the best on the metric M used in this work.The differences between these models and the fiducial model are listed in Table 2. formation, commonly seen in ULIRGs and starbursts).The transition from one regime to the other is determined by the maximum gas depletion timescale,  sf,max in the model -for large values of  sf,max , star formation occurs in a continuous medium and is dictated by the free-fall time of the gas.On the other hand, if the gas depletion timescale is shorter than the free-fall timescale, stars forms in clouds which are collapsing at the same as the gas is being depleted.These scenarios are depicted by models 22 and 23, where we scale  sf,max down and up by a factor of 2, respectively.A lower value of  sf,max will result in a higher star formation rate, thus, model 22 produces a lot more metal-rich stars whereas model 23 contains no stars with [O/H] > 0. Unsurprisingly, both these models have M > 10, and fail to reproduce the data.This failure is also consistent with observational constraints on  sf,max that find it to be ∼ 2 Gyr for local galaxies (Leroy et al. 2008;Bigiel et al. 2008;Sun et al. 2020).
Finally, in model 24, we switch off the GMC regime, thus mimicking all other chemical evolution models that only consider the Toomre regime (or an even simpler prescription for star formation).The resulting value of M > 40, which is not surprising, as Krumholz et al. (2018) point out the relevance of the GMC regime of star formation for Milky Way-like galaxies.This is critical to take into account because not all the available gas in the Milky Way is forming stars, and most star formation is occurring exclusively in GMCs.

ISM turbulence
One of the key strengths of our ISM model lies in the connection between gas dynamics and metallicity distribution, which, despite longstanding arguments (Tinsley 1980), has not been explored except in a handful of cosmological simulations (e.g., Ma et al. 2017;Hemler et al. 2021).In our ISM model, turbulence can be injected into the ISM by supernova feedback, gas accretion, or radial gas flows.The resulting gas velocity dispersion influences all the three dimensionless ratios: P ∝ ∼  g , S ∝  −2 g , and A ∝  −3 g .In models 25 and 26, we take supernova clustering into account by simply increasing  SN to 2 and 5, respectively.We notice that more intense feedback due to clustering yields a smaller value of M as compared to model 4. We show the distributions for models 25 and 26 in Figure 6.While both models 25 and model 26 can reasonably reproduce the data, we will show later in Section 6 that an excessively large  SN violates observational constraints on the gas velocity dispersion.Hence, our conclusion is that while feedback from clustered supernovae offers a more favorable fit to the APOGEE data, very aggressive feedback is likely implausible as it leads to an overestimation of the gas velocity dispersion.
Similarly, we also investigate if varying the turbulent injection efficiency due to accretion ( a ) has an impact on our results.Models 27 and 28 use a value of  a that is scaled down and up by 2× as compared to the fiducial choice, respectively.While model 27 performs slightly worse than model 4, model 28 performs much better, and is the last of the four best models we find.From Figure 6, we find that the distributions from model 28 are akin to those from model 16 where we took into account the contribution from cosmic ray pressure at the disc midplane.Overall, models 25, 26, and 28 favor a slightly higher level of turbulence ( g ).

Radial gas flows
Our ISM model includes a self-consistent treatment of radial gas flows. 7In equilibrium, radial flows turns off if turbulence present  6).For comparison, also shown are the normalized MDFs for the fiducial model and the model with the worst score on the metric (model 13).A description of all the models is available in Table 2.
in the ISM can be sustained by star formation feedback and gas accretion.We find that some models show radial gas flow and metal advection at some epochs, while it is switched off in other models.Further, radial gas flow is also sensitive to the size of the disc; since we sample over the disc size, we find that gas flows are present for some disc sizes but not for others for the same model.Interestingly, we find that radial gas flows are inactive for all  for the four best models we find in this work.This is not surprising, given that radial gas flows are not always observed in local spiral galaxies (Di Teodoro & Peek 2021).From the point of view of equilibrium models, this is because turbulence can be maintained by accretion and supernova feedback in these galaxies (Krumholz et al. 2018;Bacchini et al. 2020).We discuss this further below in Section 6.

Summary of best models
To summarize, we find four models that outperform the fiducial model.These are model numbers 16, 21, 25 and 28 (see Table 2); they have the least value of the metric M. In model 16, we scaled  mp to 2, thereby including the contribution of cosmic rays to the midplane pressure.In models 21 and 25, we increased the star formation efficiency and supernovae clustering, respectively.Finally, in model 28, we tuned up the level of turbulence injected into the ISM by gas accretion.All these variations provide interesting insights into the ISM history of the Milky Way, which we discuss in detail below.To aid in comparison, we show the normalized MDFs for these models with the APOGEE MDF and the MDF from the fiducial and the worst models in Figure 7.It is immediately obvious that the fiducial model produces a broader MDF than the data, whereas the worst model (model 13) produces a much more metal-poor MDF than the data.The most important result of our analysis is that all the best models prefer 0.4 ≤  y ≤ 0.7, and it seems indispensable to not assume incorporate radial flows that would be necessary to achieve equilibrium after the accreted gas has already been transported down the potential well of the galaxy.
Figure 8. Radial profiles of the star formation rate surface density in the Galaxy, adopted from (Soler et al. 2023).The blue curve corresponds to Σ ★ measured from protostellar clumps in the Hi-GAL survey (Elia et al. 2021(Elia et al. , 2022)).The range curve denotes Σ ★ inferred from massive young stars within 6 kpc of the Sun (Zari et al. 2023).In both cases, the shaded regions denote the 16 th and 84 th percentiles.Overplotted in grey is the range of Σ ★ in the best models.
y = 1 or not include it in chemical evolution models, which is the usual practice.
As we read off from Figure 7, the MDFs for the best models overlap the most with the data MDF, but there is clearly scope for further tuning the model parameters to better reproduce the observed MDF.Given our objective, we have only experimented with very simple variations in the ISM model parameters, and refrained from testing complex, multi-parameter scalings as a function of radius or age.Conversely, we do not expect to reproduce detailed features of the observed MDF.This exercise will be the subject of a forthcoming study.

DISCUSSION
We have seen in Section 5 how varying the ISM parameters impacts the performance of our models against the APOGEE data.We first discuss the common features of the four best models we find in our analysis (summarized in Section 5.10), and argue that they also satisfy constraints from ISM observations for the Milky Way, or, from local spiral galaxies that are expected to evolve in a similar manner to the Milky Way.This is an important sanity check because there is no guarantee that models that well reproduce the stellar distributions do not violate gas-phase observations.Then, we discuss predictions from the best models for parameters that have little to no observational constraints from external galaxies, and provide the first insights into them using Milky Way data.

Consistency with gas-phase observations
Mass flow rates First, we look at the mass flow rates as they set the overall mass balance in the ISM model.The present day star formation rates for the best models is 1 − 3 M ⊙ yr −1 , in excellent agreement with existing multi-wavelength observations (Elia et al. 2022, table 1, and references therein).In addition to the global SFR, we also compare the radial profile of the SFR surface density predicted by the best models with estimates from two different observations in Figure 8.The observational estimates, presented in Soler et al. (2023), are derived from protostellar clumps emitting in the infrared in the Hi-GAL survey (Elia et al. 2021(Elia et al. , 2022) ) that covered a 2 degree strip around the Galactic midplane, and from massive young stars within a 6 kpc × 6 kpc area centered on the Sun (Zari et al. 2023).Our predictions better agree with Σ ★ from the Hi-GAL survey, which is lower than Σ ★ from Zari et al. (2023) by a factor of two.Given the systematic offset between the two datasets (see Soler et al. 2023 for a discussion on possible reasons for the offset) and the qualitative nature of this work, the agreement between the modelled and observed Σ ★ profiles is encouraging, and constitutes an important step in validating the models.We also obtain gas accretion rates of the same magnitude in these models.It is non trivial to compare the model accretion rates with direct observations because the latter decompose the accreting gas into different thermal phases, and at different locations throughout the Galactic halo.Nonetheless, these observations estimate gas accretion rate for the Galaxy to be 0.2 − 3 M ⊙ yr −1 (e.g., Putman et al. 2012;Richter 2012Richter , 2017;;Fox et al. 2014Fox et al. , 2019)), consistent with our models.As we mention in Section 5.2, we also obtain mass loading factors in agreement with estimations for the Milky Way (Fox et al. 2019;Pan et al. 2019), although these measurements suffer from the same problem as the accretion rates.Given the depth of available data, we can confirm that the large-scale mass flow rates for the best models do not violate existing observational constraints.
GMC regime of star formation It is well known that star formation largely occurs in GMCs in the Milky Way (Blitz 1993;Leroy et al. 2008), and we take this into account in the ISM model via the GMC regime of star formation.The importance of the GMC regime is also reflected in all the best models, and in the the large metric value for model 24 where we exclude it.Moreover, the data prefer the fiducial value of the timescale for star formation in the GMC regime,  sf,max , which is adopted from the gas depletion timescale in the Milky Way and nearby local spirals (Leroy et al. 2008;Sun et al. 2020;Kim et al. 2022).Similarly, it is encouraging that models that use the fiducial value of the effective gas fraction in the disc,  gQ , perform better than models where it is varied, as the fiducial value is directly obtained from measurements within the Solar neighborhood (McKee et al. 2015).
Star formation efficiency One of the models that best reproduces the data is model 21, where we set  ff = 0.03, a factor of 2 higher than the typical value (Krumholz et al. 2012).While a factor of 2 variation in  ff is generally not considered significant as it is within observational uncertainties (e.g., Hu et al. 2022), very precise measurements of the gas surface density and star formation rate surface density in nearby Milky Way molecular clouds indeed find  ff ≈ 0.026 (Pokhrel et al. 2021).This consistency between our models that best reproduce the APOGEE data and direct observations of nearby star-forming molecular clouds strongly indicates that stars have formed more efficiently (in a free-fall time) in the Milky Way, at least in its recent history.
Hydrostatic equilibrium We also see that model 16 (with a lower value of  mp ) outperforms other models, although varying  mp has a limited impact on all the metrics.The higher value of  mp is suggestive of the non-negligible role of cosmic ray pressure (as compared to magnetic pressure) in setting hydrostatic equilibrium in the Galactic disc, and is supported by both Galactic observations and theory (Boulares & Cox 1990;Crocker et al. 2021).Furthermore, the scale heights we obtain for the disc (that depends on the gas surface density and velocity dispersion -see equation 24 of Krumholz et al. 2018) vary from 0.1 kpc in the inner regions to 0.4 kpc in the outer regions, in good agreement with those inferred by Bacchini et al. (2019, figure 1) for the Galaxy using classical Cepheids (Chen et al. 2019; see also, Lian et al. 2022b andCantat-Gaudin et al. 2024).
Overall, we find that our best models are consistent with a large set of gas-phase observations of the Milky Way, and can simultaneously reproduce the observed 2D distributions of the metallicity of red clump stars as a function of Galactocentric distance and age.To highlight the predictive power of the models, we next investigate model parameters that are also well constrained by the APOGEE data but have no direct, gas-phase observational constraints as of yet.

Insights into observationally-unconstrained ISM parameters
Three parameters in the ISM model have no direct constraints from observations of the Milky Way or its analogues.These are -1.)preferential metal enrichment of galactic outflows ( y ), 2.) supernovae clustering ( SN ), and 3.) accretion-driven turbulence efficiency ( a ). y dictates the metallicity of galactic outflows, while the other two enhance turbulence in the disc, thereby increasing the gas velocity dispersion.
We concluded in Section 5 that all the best models require 0.4 ≤  y ≤ 0.7.One might worry that any conclusions about  y are potentially sensitive to the assumed IMF-averaged yield , which is uncertain by a factor ∼ 3 for oxygen.This uncertainty arises from our incomplete knowledge of stellar nucleosynthesis (Kobayashi et al. 2020;Romano 2022), which massive stars go supernovae (Heger et al. 2003;Sukhbold et al. 2016), effects of pre-supernova enrichment (Laplace et al. 2021), etc.If the IMF systematically varies as a function of metallicity (as is expected from ISM thermodynamics - Kroupa 2001;Sharda & Krumholz 2022), this will also affect .We have used a constant /Z ⊙ ≈ 2 in this work.If we vary this ratio between 1−3 (e.g., Weinberg 2023, figure 1) and re-consider variations in the fiducial model, we still find 0.4 ≤  y ≤ 0.7 to best describe the data.This implies that in order to reproduce the observed distributions, the ISM model requires the existence of metal-rich outflows (relative to the local ISM) from the disc over the past 6 Gyr.Based on the mass-loading factor we obtain from Hayward & Hopkins (2017), this range of  y corresponds to outflow metallicities 1.2 − 1.8× local ISM metallicity when integrated across the disc.
It is well known that superbubbles seen in nearby spiral galaxies are a result of clustered supernovae explosions (e.g., Mac Low & McCray 1988;Yadav et al. 2017;El-Badry et al. 2019;Watkins et al. 2023).It has been long suspected that superwinds created due to clustered supernova feedback might be driving metal-enriched outflows from the Solar neighbourhood (e.g., Meusinger 1992; see also, Fielding et al. 2018) and impacting subsequent star formation (Higdon & Lingenfelter 2013;Zucker et al. 2022).Our finding that  y < 1 for the Milky Way is also supported by isolated galaxy formation simulations with resolution high enough to resolve metal mixing at the ISM -CGM interface (Vĳayan et al. 2023).These simulations estimate  y < 1 even for Solar Neighborhood conditions.However, considerable uncertainties remain in directly measuring the metallicity of the outflowing gas (e.g., Chisholm et al. 2017), not to mention the added complication due to possible re-accretion of metal-rich gas via Galactic fountains (e.g., Ma et al. 2016;Fox et al. 2016;Anglés-Alcázar et al. 2017;Soler et al. 2020).Nonetheless, our results point that even for a massive galaxy like the Milky Way, the commonly used assumption that the outflow metallicity equals the ISM metallicity may not be correct, and should be revisited in Galactic chemical evolution models.If the Galaxy indeed preferentially expels gas rich in oxygen (more generally, type II elements), this also has implications for the evolution of O/Fe, C/O and N/O ratios as this loss is not taken into account in commonly accepted explanations for the CNO cycle or the − knee (see the review by Romano 2022).
We also infer from Section 5 that models that either include feed- back from clustered supernovae or a higher turbulence injection efficiency from gas accretion better reproduce the APOGEE distributions.Note that large values of both  SN and  a increase the velocity dispersion of the gas.The elevated velocity dispersion (averaged over the last 6 Gyr) is higher than that seen in a typical galaxy of similar mass but within the scatter present in the trend between stellar mass and gas velocity dispersion (Varidel et al. 2020).Higher  g in turn increases P but decreases S and A in the metallicity equation (see equation 4).However, model 26 (with  SN = 5) increases  g to 50 km s −1 at  = 0, which is larger by a factor ∼ 4 than that observed in the Galactic ISM (Kalberla & Kerp 2009;Riener et al. 2020), and even larger than the stellar velocity dispersion of stars younger than 6 Gyr (Mackereth et al. 2019).Therefore, even though model 26 has an M at par with the other best models, it severely violates observational constraints on both stellar and gas-phase velocity dispersions.Together with the fact that model 25 (M = 4.11 ± 0.3) performs similar to model 4 (M = 4.19 ± 0.3), we conclude that the models allow for, at most, some level of supernova clustering to reproduce the radius -metallicity and age -metallicity distributions, but do not require it.It is quite plausible that supernova clustering can simultaneously increase ISM turbulence and metal content of outflows (e.g., Emerick et al. 2018;Fielding et al. 2018;Schneider et al. 2020;Kim et al. 2020;Sharda et al. 2024, in prep.), both of which are preferred by models best describing the APOGEE data.
On the other hand, model 28 with  a () twice the fiducial value yields  g ( = 0) consistent with observations, while also reproducing the observed APOGEE distributions. g in model 28 is higher by 20 per cent than the average trend (Übler et al. 2019;Wisnioski et al. 2019, Wisnioski et al. in preparation) at the highest redshifts we consider.Our findings suggest that gas accretion could have driven a non-negligible amount of turbulence in the Galactic disc over the past 6 Gyr.This has not been investigated before because chemical evolution models typically do not take into account energy balance in the ISM in addition to mass and metal balance.However, there are no direct observational measurements of  a , and simulations have only explored it for dwarf galaxies (Forbes et al. 2023).Another implication of our results is on the age -velocity dispersion relation (AVR) for the Milky Way disc, and whether stars were born hot or were dynamically heated afterwards (e.g., Spitzer & Schwarzschild 1951;Sellwood & Binney 2002).If the ISM in the Galaxy had higher than average velocity dispersion in the past, it would mean that the stars were born hotter and contribution from dynamical heating to their observed velocity dispersions is less prominent than expected (e.g., Bird et al. 2012Bird et al. , 2013;;Leaman et al. 2017;Mackereth et al. 2019;Ting & Rix 2019).
Lastly, we discuss the role of metal diffusion in our models.The diffusion coefficient in the models is proportional to  2 g , and increases linearly with radius under the Toomre stability criterion (see section 2.2 of Sharda et al. 2021a for details).Intuitively, we can understand this as follows: high  g inevitably leads to more metal mixing due to turbulence or other instabilities (Ma et al. 2017;Sharda et al. 2021c), and the outer regions of galaxies have larger scale heights due to flaring (Bacchini et al. 2019), which makes it possible for diffusion to act on larger scales.Thus, diffusion becomes dominant (when any of the three dimensionless ratios P, S and A are less than unity) in redistributing metals at earlier times and at larger radii.
While we explore ISM physics in a lot more detail as compared to other studies, our investigation solely focuses on the oxygen abundance.This trade-off prevents us from leveraging the abundances of various other elements in the APOGEE sample, which can provide valuable insights into different nucleosynthesis pathways and their impact on the ISM (e.g., Chiappini et al. 2001;Romano et al. 2010;Rybizki et al. 2017;Krumholz & Ting 2018;Sharma et al. 2021;Ting & Weinberg 2022).

Which aspects in APOGEE data constrain outflow metallicity and ISM turbulence?
Given that our models find Galactic outflows to be metal enriched and gas velocity dispersion to be slightly elevated, it is worth asking which particular aspects of the APOGEE data we use are ultimately responsible for these conclusions.
To understand this, we revisit the metallicity equations (equation 2 and equation 3), and analyze the impact of the outflow metallicity parameter  y and the gas velocity dispersion ( g ) while keeping other parameters fixed to the fiducial model.We keep the boundary conditions fixed in this exercise as they do not matter for the overall profile.We can directly compare these profiles with the oxygen abundance distribution of the youngest stars (ages < 0.5 Gyr) in the APOGEE database, for which we can neglect radial migration and assume their birth radii is the same as their current location in the disc.We plot the abundance of these stars as a function of their Galactocentric radius (normalized by  0 = 3 kpc) in Figure 9.Note that none of the youngest stars lie within 6 kpc of the Galaxy center in our sample.
We show the resulting model metallicity profiles in Figure 9.It is not surprising that the profiles are non-linear, given the complex interplay between various ISM processes that ultimately sets the metallicity in the models (see Figure 2).Such non-linear profiles are routinely observed in the ISMs of external galaxies, even though the gradient reported is based on a linear fit (e.g., Maiolino & Mannucci 2019;Kewley et al. 2019;Poetrodjojo et al. 2021;Chen et al. 2023a).We first focus on the left panel of Figure 9 which plots model metallicity profiles with all the fiducial model parameters fixed except for  y .We see that model profiles that best encapsulate the extent of the data are those where  y is neither too high nor too low.Now, we fix  y = 0.5 and vary  g from 6 km s −1 (dwarf-galaxy like, Stilp et al. 2013;Moiseev et al. 2015;Ianjamasimanana et al. 2015) to 40 km s −1 (ULIRG-like, Veilleux et al. 2009;Scoville et al. 2015Scoville et al. , 2017)).We read off from the right panel of Figure 9 that trends seen in the data cannot be reproduced by models where  g is either too low or too high, as expected.For example, the metallicity profile corresponding to  g = 6 km s −1 is inverted, which is not observed in local massive spiral galaxies (Sánchez-Menguiano et al. 2016;Belfiore et al. 2017;Poetrodjojo et al. 2021).Variations in boundary conditions do not impact these results.The key observable responsible for driving  y and  g to intermediate values for the youngest stars is the cluster of stars at  gal ≈ 9 − 12 kpc with [O/H] ≈ −0.2.If stars at these distances were more metal rich, our models would suggest a higher  y (i.e.,, an outflow metallicity closer to the ISM metallicity).Extending this analysis to older stars is non trivial because the ISM model metallicity profiles need to be adjusted to take radial migration into account.

Degeneracy between model parameters
Since the metallicity profiles are dictated by the ratios of metal advection, star formation, and gas accretion to metal diffusion (P, S, and A, respectively), model parameters that appear together in these ratios are degenerate with each other. 8For example, we notice from equation 7 that the fractional parameters  sf (fraction of star forming gas in the ISM) and  y (differential enrichment of galactic outflows) only appear in S.These two parameters are thus degenerate -models where either parameter is very low (i.e., close to zero) cannot explain the observed 2D MDFs, and perform the worst on our metric.If  sf is too low or too high,  y can compensate for its effect on S. As the value of  sf is well constrained for Milky Way like galaxies, we have exploited this degeneracy to implicitly place constraints on the value of  y as above.
Similarly, the parameters  gP (pressure in the disc due to gas selfgravity) and  mp (ratio of total to the turbulent pressure in the disc) are also degenerate.If we only vary  gP independent of  gQ , we obtain variations in M in the same direction as when we vary  mp .However, the resulting values of M do not scale together quantitatively since  mp is not a fractional parameter.While the overall impact of varying either  gP or  mp is noticeable, they are well constrained by direct observations and analytical calculations, so the degeneracy between the two is not of concern for our analysis.

CONCLUSIONS
In this work, we use a combination of state of the art stellar disc model and first principles ISM chemical evolution model to reproduce the observed radius -metallicity (oxygen abundance) and age -metallicity distribution of APOGEE red clump stars.We use the stellar, low- disc model of Frankel et al. (2018Frankel et al. ( , 2020) ) to map the present day radius of these stars ( gal ) to their birth radii (), and the ISM chemical evolution model of Sharda et al. (2021aSharda et al. ( , 2024) ) to investigate which ISM parameters are important in reproducing the data.
Our intention is to provide a qualitative understanding of how we can recover information about the evolution of the Galactic ISM and outflows from stellar data.We do so by keeping the stellar disc model parameters fixed and introducing simple variations in the ISM model parameters, and use different metrics to compare the modeled radius -metallicity and age -metallicity distributions with the APOGEE data corrected for selection function effects.The fiducial model that we consider as a starting point for our analysis is inspired from observations of local spiral galaxies.The fiducial model does not reproduce the observations, implying that the ISM history of the Milky Way is somewhat different than a typical  = 0 spiral galaxy.
We first validate models that best reproduce APOGEE data against existing constraints from Galactic as well as extragalactic gas-phase observations.As we discuss in Section 6, all except one of our best models are in good agreement numerous gas-phase observations.The exception is the model that allows for strong supernova clustering, and well reproduces APOGEE data, but at the cost of unusually high gas velocity dispersion not observed in the Milky Way or its analogues.This consistency check thus demonstrates that Galactic chemical evolution models should be validated against extragalactic observations as these observations provide a benchmark to compare the Milky Way with, and often contains information about ISM physics not directly accessible within the Galaxy.
We exploit the successes of the other models best reproducing the APOGEE data to place novel constraints on two key parameters that inform on the evolution of the Galactic ISM and outflows: (i) The parameter  y , which describes the differential enrichment of galactic outflows, is less than unity, implying the presence of metal-rich (relative to the local ISM) Galactic outflows over the last 6 Gyr.Our best models constrain 0.4 ≤  y ≤ 0.7 for the Milky Way, which, assuming a mass loading factor less than unity (Hayward & Hopkins 2017), implies that the metallicity of outflows is 1.2 − 1.8× the local ISM metallicity.This finding is also supported by high resolution simulations modeling the ISM -CGM interface in Solar Neighborhood-like environments (Vĳayan et al. 2023).If true, this means that the commonly used assumption that the outflow metallicity equals the ISM metallicity is not applicable to the Milky Way, as is also supported by observations of nearby galaxies (Chisholm et al. 2018;Sharda et al. 2021b).
(ii) The gas velocity dispersion (averaged over the last 6 Gyr) in the Milky Way is higher than that for a galaxy of similar stellar mass, but is within the scatter present in the stellar massvelocity dispersion relation (Varidel et al. 2020).The elevated dispersion can either be sourced from a higher star formation efficiency per freefall time (as is indeed measured for Milky Way molecular clouds - Pokhrel et al. 2021), increased turbulence due to clustering of supernovae and superbubble formation (which can also enhance outflow metallicity - Kim et al. 2020;Vĳayan et al. 2023), or due to gas accretion (expected to be important for massive galaxies like the Milky Way - Ginzburg et al. 2022).This highlights the importance of considering energy balance in the ISM in addition to mass and metal balance in chemical evolution models.
Both of these findings are novel because to date, no chemical evolution models have simultaneously included differential enrichment of outflows and energy balance in the ISM.The metal enrichment of outflows (which most influences the match between the ISM models and the data) is the least constrained from both Galactic as well as extragalactic observations (e.g., Chisholm et al. 2018).We do not fit the model parameters to reproduce the data, and therefore do not expect to reproduce detailed features of the MDF or the radius -metallicity and age -metallicity distributions.This will be the subject of a future work involving likelihood-free inference to create a model that specifically applies to the Milky Way.Starting with the Milky Way, this work thus shows the power of combining stellar and gas-phase information in galaxies to reveal their detailed evolutionary history by simultaneously considering both the stars and the ISM.2).We scale the KL divergence metric by a factor of 10 so that its mean magnitude is at par with the other metrics.Right panel: Same as the left panel but for the age -metallicity domain.On average, all the metrics yield consistent performance in the context of how a model compares to the data from APOGEE.  .Sum of all four metrics in the age -metallicity domain plotted as a function of that in the radius -metallicity domain for all the models used in this work.On average, the summed metrics lie close to the 1:1 line, indicating that models that exhibit good performance in one domain also excel in the other domain, and vice versa.This ensures that the result is not biased by one domain over the other.number of bins has virtually no effect on the relative M, and models that best reproduce the data remain identical between the two cases.
Note that while it is mathematically straightforward to further increase the number of bins to assess the performance of the models, there is a physical limitation for the regime of applicability of the underlying chemical evolution model: the ISM component is designed to explain chemical enrichment on ∼ kpc scales and does not include additional physics, azimuthal variations (Ho et al. 2018;Kreckel et al. 2019), and stochasticity that acts on smaller scales (Krumholz & Ting 2018;Metha et al. 2024), so using bin sizes ≲ 300 pc will not yield a physically meaningful result.

APPENDIX C: RADIUS -METALLICITY AND AGE -METALLICITY DISTRIBUTIONS FROM ALL OTHER MODELS
Figure C1 plots the radius -metallicity and age -metallicity distributions for all the other models we discuss in the main text.These models perform poorly as compared to models presented in Figure 6.This paper has been typeset from a T E X/L A T E X file prepared by the author.

Figure 1 .
Figure 1.Present day location of the subsample of APOGEE red clump giants that we use in this work, color-coded by [O/H].

Figure 3 .
Figure 3. Variations in the ISM metallicity model parameters as a function of redshift (or lookback time) in the fiducial model.Left axis shows parameters with fractional values (dashed lines); right axis shows the rest (solid lines).Parameters  in and  a are used to describe efficiencies with which gas can be accreted and drive turbulence in the disc, respectively. sf and  gQ describe gas fraction in the disc. w is the mass loading factor,  g,acc is the gas accretion rate, and  is the halo concentration index. g is the gas velocity dispersion, a part of which is driven by star formation feedback ( SF ) and gas accretion ( acc ).A detailed description of all the parameters is available in Table1.

Figure 4 .
Figure 4. Radius -metallicity and age -metallicity distributions for the sample of red clump stars from APOGEE DR17 used in this work (left panels), compared against the fiducial model presented in this work.Fiducial model in this context does not mean the model that best describes the data -it is the starting point for all models.The fiducial values of all ISM parameters are taken from a collection of observations and theoretical calculations, and are listed in Table1.

Figure 5 .
Figure 5. Radius -metallicity and age -metallicity distributions for the sample of red clump stars from APOGEE DR14 used in this work (left panels), compared against the combination of the Frankel et al. (2018) stellar disc model and the fiducial ISM chemical evolution model fromSharda et al. (2024).The fiducial values of all ISM parameters are listed in Table1.

Figure 6 .
Figure 6.Same as Figure5, but for the models that perform the best on the metric M used in this work.The differences between these models and the fiducial model are listed in Table2.

Figure 7 .
Figure 7. Normalized metallicity distribution function (MDF) for the APOGEE data, plotted together with the normalized MDFs from the four best models (models 16, 21, 25, and 28 -see Figure6).For comparison, also shown are the normalized MDFs for the fiducial model and the model with the worst score on the metric (model 13).A description of all the models is available in Table2.

Figure 9 .
Figure 9. Left panel: Oxygen abundance as a function of normalized Galactocentric radius ( 0 = 3 kpc).Grey markers denote the subset of the youngest, low−  red clump stars (ages < 0.5 Gyr) in APOGEE DR14 data.Solid curves denote the metallicity profile predicted by the fiducial model with varying levels of metal enrichment of Galactic outflows, encoded in the fractional parameter  y . y = 1 means that the metallicity of the outflows is equal to the ISM metallicity, whereas  y ≪ 1 means most of the newly produced type II metals are ejected via outflows.Most of the data prefer an intermediate  y , as we discuss in the text.Right panel: Same data as the left panel, but the models shown are for varying levels of ISM turbulence (encoded in the gas velocity dispersion  g ) for fixed  y = 0.5.The data prefer intermediate values of  g .

Figure A1 .
Figure A1.Left panel:Value of the four different metrics used in the main text (Section 4.1) for the radius -metallicity domain for all the models discussed in this work (see Table2).We scale the KL divergence metric by a factor of 10 so that its mean magnitude is at par with the other metrics.Right panel: Same as the left panel but for the age -metallicity domain.On average, all the metrics yield consistent performance in the context of how a model compares to the data from APOGEE.

Figure A2
Figure A2.Sum of all four metrics in the age -metallicity domain plotted as a function of that in the radius -metallicity domain for all the models used in this work.On average, the summed metrics lie close to the 1:1 line, indicating that models that exhibit good performance in one domain also excel in the other domain, and vice versa.This ensures that the result is not biased by one domain over the other.

Figure B1 .
Figure B1.Metric value M for all models relative to the fiducial model, for the default number of bins used in the main text (blue), and for the case where the number of bins is increased by 2× (orange).The relative performance of the models remains the same when more bins are used.
. APOGEE provides both [O/H] and [Fe/H] measurements for the sample we use.Since the ISM model we use is best applicable to  elements, we will use [O/H] for the purpose of our analysis.Additionally, the overall [O/H] measured in the selected sample is expected to reflect the birth abundance (within the uncertainty) because it is not affected by dredge-up sf ,  ff ,  gQ  SF ,  SN ,  sf,max  nt ,  (  g ) Graphical model that shows the different components of the chemical evolution model, with curly braces marking the corresponding stellar and ISM parts.Filled gray circles denote APOGEE observables, solid dots denote parameters that we keep fixed in this work, and colorless ellipses/circles denote parameters we vary.The green ellipses denote the equilibria we consider in the ISM.All the relevant ISM physics (specified by various symbols in bottom left) gets condensed into three key dimensionless ratios -P, S, and A. We use the model to produce synthetic distributions of stars with a given metallicity, current day galactocentric radius, and age that we then compare against APOGEE data.Definitions of all the parameters are listed in Table [O/H] CGM ).We estimate [O/H] in the range  = 1 to  max , where  max is the outer edge of the disc.We sample  max in the range 13 − 23, as we describe further in Section 4.The dimensionless quantities P, S 2 and A respectively describe

Table 2 .
List of all ISM models created from variations in the fiducial model.Model 1 corresponds to the fiducial model, the parameters of which are listed in Table1.Second column lists the modifications applied to the fiducial model.Third column denotes the overall metric score -best models (denoted in bold font) have the lowest score.The typical uncertainty on the combined metric value M is 0.3.Last column denotes the qualitative inference about ISM physics based on M.  y ≤ 0.7,  gQ () → 0.5  gQ ()35.04Variations in  gQ not required 12 0.4 ≤  y ≤ 0.7,  gQ () → 1.5  gQ () y ≤ 0.7,  mp → 1.4  mp 3.98 17 0.4 ≤  y ≤ 0.7,  min → 0.5  min 17.46 Variations in Toomre  not required 18 0.4 ≤  y ≤ 0.7,  min → 1.5  min 16.96 19 0.4 ≤  y ≤ 0.7,  Q → 2  Q 6.72 Variations in  Q not required 20 0.4 ≤  y ≤ 0.7,  ff → 0.5  y ≤ 0.7,  a () → 2  a ()

3.94 panels
of Figure4show the APOGEE radius -metallicity and agemetallicity distributions.The corresponding right panels in Figure4show the same distributions from the fiducial model.We see from these panels that the model distributions are wider and more diffuse than the data distributions.The fiducial model produces more stars at low and high [O/H] at all ages.The overall metric value for the fiducial model is M = 9.95, with an uncertainty ∼ 0.3 that results from a combination of error on the ages and [O/H] as well as random error due to finite LHS sampling (i.e., M changes by approximately 0.3 when the number of LHS samples is either increased or decreased from 10000, or if a different set of 10000 randomly chosen grid points is used).