Flexible Models for Galaxy Star Formation Histories Both Shift and Scramble the Optical Color-M/L Relationship

The remarkably tight relationship between galaxy optical color and stellar mass-to-light ratio ($M_*/L$) is widely used for efficient stellar mass estimates. However, it remains unclear whether this low scatter comes from a natural order in the galaxy population, or whether it is driven by simple relationships in the models used to describe them. In this work we investigate the origins of the relationship by contrasting the derived relationship from a simple 4-parameter SED model with a more sophisticated 14-dimensional Prospector-$\alpha$ model including nonparametric star formation histories (SFHs). We apply these models to 63,430 galaxies at $0.5<z<3$ and fit a hierarchical Bayesian model (HBM) to the population distribution in the $(g-r)$--$\log(M/L_g)$ plane. We find that Prospector-$\alpha$ infers systematically higher $M_*/L$ by 0.12 dex, a result of nonparametric SFHs producing older ages, and also systematically redder rest-frame $(g-r)$ by 0.06 mag owing to the contribution from nebular emission. Surprisingly, the combined effects of the $M_*/L$ and $(g-r)$ offsets produce a similar average relationship for the two models, though Prospector-$\alpha$ produces a higher scatter of 0.28 dex compared to the simple model of 0.12 dex. Also, unlike the simple model, the Prospector-$\alpha$ relationship shows substantial redshift evolution due to stellar aging. These expected and testable effects produce overall older and redder galaxies, though the color--$M_*/L$ relationship is measured only at $0.5<z<3$. Finally, we demonstrate that the HBM produces substantial shrinkage in the individual posteriors of faint galaxies, an important first step toward using the observed galaxy population directly to inform the SED fitting priors.


INTRODUCTION
The stellar mass of a galaxy (M * ) encodes rich information about the formation of the galaxy itself. Stellar mass changes through internal star formation activity and external mergers and galaxy interactions. It is a stable property that allows one to connect together galaxy populations across time when the galaxies themselves have disparate ages, dust contents, sSFRs, and colors. It plays a critical role in our understanding of the evolution of galaxies over cosmic time, as massive galaxies tend to form earlier and quench faster than low-mass galaxies ("downsizing"; Cowie et al. 1996;Gallazzi et al. 2005). Also, M * correlates with many galaxy physical yzl466@psu.edu properties such as star formation rate (SFR), metallicity (Z), and galaxy size. By combining these scaling relationships, M * can provide us with a baseline knowledge of galaxy properties.
Since M * is not an observable, its determination generally relies on spectral energy distribution (SED) fitting of broadband photometry or spectra (e.g., Papovich et al. 2001;Shapley et al. 2001;Pforr et al. 2012;Conroy 2013;Courteau et al. 2014). In SED fitting, the galaxy spectrum is modeled as an assembly of stellar populations of coeval stars with (typically) homogeneous metallicity. In this composite stellar population, the total stellar mass of the galaxy is the integral of the SFH across time plus the effects of stellar mass loss. The SED-fitting process is complex and most informative when performed with large amounts of data as it involves generating template galaxy SEDs and inferring 2 Li et al.
the model parameters by comparing the model SEDs to the data.
In contrast, when we do not have sufficient data to perform informative SED modeling, one alternative is a popular and efficient method to get M * using the rest-frame optical color (CMLR for the color-M * /L relationship). This empirical relationship allows remarkably accurate M * estimates without SED fitting. Bell & de Jong (2001) reported a tight linear relationship between the optical color (B − R) and log M * /L with a scatter of ∼0.2 dex. Since then several studies have further explored the relationship using combinations of color and M/L in different bands based on both stellar population synthesis (SPS) models and the results of performing SED fitting on the observations (e.g., Bell et al. 2003;Portinari et al. 2004, Zibetti et al. 2009Taylor et al. 2011;Into & Portinari 2013;Courteau et al. 2014;McGaugh & Schombert 2014;van de Sande et al. 2015;García-Benito et al. 2019;Ge et al. 2021). These studies confirmed the robustness of M * estimates from a single optical color but revealed a larger scatter in the relationship (∼0.3 dex). In comparison to optical colors, near-infrared (NIR) colors are less predictive for M * /L, which we will discuss in Figure 1 later, and are more sensitive to the modeling of the asymptotic giant branch (AGB) phase of stellar evolution. Because we are able to estimate the approximate color directly from the observations, the optical color-M * /L relationship has been widely used in translating the stellar light to galaxy mass when we know the object redshift, e.g., in dynamical studies (e.g., Nguyen et al. 2020).
The notorious dust-age-metallicity degeneracy actually helps shape this remarkably tight relationship. Increasing the age, metallicity, or including more dust will make the galaxy redder and meanwhile enhance the measured M * /L. An important finding of Bell & de Jong (2001) is that the dust effect on color and M * /L is parallel to the relationship. These stellar population parameters counteract each other's effects on optical color and M * /L and the net result is a relationship with small scatter. In Figure 1 we present the relationships between the priors of color and diffuse dust optical depth (τ 2 ), mass-weighted age derived from SFH (t avg ), and stellar metallicity (stellar Z) for the two SED models adopted in this paper 1 (see Section 2.2 for the details and construction of Figure 1). As we have emphasized, a tight relation between optical color and log(M/L g ) is a fundamental outcome of the stellar physics and dust attenuation models. Figure 1 shows that due to the strong degeneracies among parameters, we cannot infer the age, dust, and metallicity from the optical color (g − r) alone, as indicated by their broad distribution at a given color. However, we can predict the M/L g ratio with high confidence from (g − r). On the contrary, (J − K) color spans a narrow range and is not sensitive to either M/L g or stellar population parameters. This is because the degeneracy works differently for different bands. The shape of the degeneracy makes optical M * /L increase much faster than NIR color as the stellar population parameters vary.
Many components of SED modeling can potentially affect the accuracy of the mass determination from the empirical optical color-M * /L relationship. Such components include the physical model, the initial mass function (IMF), the stellar isochrones, and the stellar spectral libraries. IMF variations can shift the entire relationship to higher or lower M * /L but generally do not influence the variance, as most IMF assumptions only affect the low-mass stars, which emit a small fraction of the total light.
After choosing the same IMF, different choices of SPS models can still introduce a 0.2 dex offset in M * /L as argued by several previous studies (e.g., Zibetti et al. 2009;van de Sande et al. 2015;Ge et al. 2019). The offset is strongest for optically blue stellar populations. There are many possible reasons for the offset such as different treatment of the AGB phase of stellar evolution (Zibetti et al. 2009;Kriek et al. 2010), and the completeness of the covered parameter space of the stellar spectral libraries, particularly for young, metal-poor populations (Choi et al. 2016;Senchyna et al. 2019). However, the influence on the color-M * /L relationship due to SPS models is not the topic of this paper.
In this work, we want to evaluate the role of the simplified prescriptions in SED models which comparatively has not seen as much attention in the literature. Previous work on the color-M * /L relationship mostly relies on relatively simple model assumptions like parametric SFHs, fixed dust attenuation curves, and often assumes fixed metallicity. These simple prescriptions have been used for a long time because they make the model straightforward to fit and easy to interpret. Nevertheless, SED-fitting models have become substantially more complex over these years (see reviews by Walcher et al. 2011;Conroy 2013) in accounting for extra important effects, such as freedom in SFHs, metallicity evolution, nebula emission, dust emission, and AGN, in concert with the development of machinery that cre- ates self-consistent population evolution. It is unclear if the color-M * /L relationship from previous simpler SED models emerges from a substantially more sophisticated framework.In this paper, we employ a sophisticated physical model Prospector-α (Leja et al. 2017), which allows a wide range of physics and has 14 free parameters. By comparing it with a simpler model, we will demonstrate that how we model the galaxy has a large impact on the resultant relationship. Instead of performing a linear regression to the optical color and log M * /L best-fit data like most previous studies, we utilize a hierarchical Bayesian model (HBM) to derive the density distribution of the relationship. Our HBM characterizes the relationship with a population distribution that models the distribution parameter of log M * /L at given color with explicit parameters. The hyperparameters that define the population model are fit upon the individual color and M * /L posteriors from SED fittings. The result of this two-level Bayesian inference is the full posterior distribution of the population parameters. This method naturally corrects for the observational uncertainties of color and M * /L by utilizing the SED-fitting posteriors as weights, without any requirement for assuming Gaussian or uncorrelated posteriors. Traditional χ 2 minimizing can be biased by an uneven data distribution in the measurement plane, especially when the measurement uncertainties of the independent variable are comparable to its intrinsic pop-ulation scatter (Kelly 2007). HBM is likely to be less biased as it naturally distinguishes intrinsic scatter from the measurement errors in its hierarchical structure. Additionally, we pass likelihoods to the population model instead of posteriors by including priors in the weights. In this way we go from an informative prior to an "uninformative"(flat) prior, which is normally not the case in linear regression. Also, because HBM assumes the fit objects coming from the same population and assigns a shared prior distribution (i.e., the population distribution) to the individual color and M * /L estimates, it shrinks the individual fits and makes them cluster around the population mean (Loredo & Hendry 2019). This alludes to one big advantage of the HBM, that we can learn new priors and reapply them to SED fits.
In this work, we will investigate the relationship between optical color (g − r) and M * /L in the g-band M/L g using two contrasting SED models. Our goals are to (1) diagnose how SED model assumptions affect the relationship since the Prospector-α model is very different from previous models; (2) derive a relationship at higher redshifts than previous studies, which allows data-based M * /L estimates when the full machinery of SED fitting is impractical. The paper is organized as follows. In Section 2, we review the sample properties and the key features of our SED models. In Section 3, we introduce the algorithms of our hierarchical Bayesian modeling approach. We present the resultant relation-ships in Section 4, where we also investigate the driving factor for the M * /L and the color offsets between the two SED models. We compare our results to a few previous works and discuss what we learn from the HBM in Section 5. In Appendix A we present a mock test of our HBM. We assume a Chabrier (2003) IMF in our analysis. All colors and M * /L are in the rest-frame band. We use AB magnitudes throughout the paper and adopt the absolute magnitude of the Sun in g-band M g = 5.11 (Willmer 2018).

DATA FROM SED FITTING
In this work, we will contrast the optical color-M * /L relations derived from two SED models. Both models are constructed within the SED-fitting framework Prospector (Leja et al. 2017;Johnson et al. 2021). We select our sample from the 3D-HST photometric catalogs (Skelton et al. 2014). In Section 2.1 we introduce the photometry from the 3D-HST survey and our selection criteria. In Section 2.2 we describe the two SED models used in this work. In Section 2.3 we show the prior distribution of (g − r) and log(M/L g ), which will be used in deriving the hierarchical models.

3D-HST Sample
3D-HST (Skelton et al. 2014) is a space-based grism survey covering ∼900 arcmin 2 in five well-studied extragalactic fields. It provides between 17 (the UDS field) and 44 (the COSMOS field) bands of photometry at wavelengths 0.3-8 µm for hundreds of thousands of galaxies. The survey is supplemented with Spitzer/MIPS 24 µm photometry from Whitaker et al. (2014). The 3D-HST catalogs also provide photometric redshifts from the EAZY SED-fitting code (Brammer et al. 2008). Approximately 30% of galaxies studied in this work have measured spectroscopic redshifts or grism redshifts, which are computed by fitting the photometry and spectrum simultaneously (Momcheva et al. 2016).
In this paper, we adopt a subsample of 3D-HST galaxies from Leja et al. (2019b) and Leja et al. (2020), consisting of 63,430 galaxies. This sample is selected between 0.5 < z < 3.0 above the stellar mass completeness limit of 3D-HST, which is the mass of the leastmassive galaxy detectable. It is critical when performing population-level inference to reduce or eliminate selection effects by working with a mass-complete sample, or alternatively to model the selection function very well. With the redshift cut, the observed photometry covers the rest-frame (g − r) color across the full redshift range. Details of the selection criteria and adjustments to the photometric zero-points from the default 3D-HST catalogs are described in Leja et al. (2019b).

Two Contrasting Physical Models for SED fitting
We use the Prospector galaxy SED-fitting code to fit the 3D-HST photometry. The Prospector inference framework is based on Bayesian forward modeling. The posterior parameter distribution is sampled using the dynamic nested sampling code dynesty (Speagle 2020). For every galaxy in our sample, we fit a simple SED model that mimics the FAST settings (Kriek et al. 2009) as used to derive the stellar population parameters in 3D-HST catalog (Skelton et al. 2014), and a more complex SED model Prospector-α (Leja et al. 2017) to the photometry. The Prospector-α fits have been performed in Leja et al. (2019b) and Leja et al. (2020). We adopt MESA Isochrones and Stellar Tracks (MIST; Dotter 2016; Choi et al. 2016), and MILES stellar spectral libraries (Sánchez-Blázquez et al. 2006) in the Flexible Stellar Population Synthesis (FSPS; Conroy et al. 2009) framework 2 .
The simple model is constructed using basic assumptions that have been widely applied in SED modeling. It has four parameters: the stellar mass formed M * ,formed , the diffuse dust optical depthτ 2 , the galaxy age t age , and the star formation timescale τ . The stellar metallicity is fixed to solar metallicity. We adopt the Calzetti et al. (2000) dust attenuation curve with a flat prior over 0 <τ 2 < 4. Theτ 2 parameter controls the normalization of the attenuation curve. We assume an exponentially declining SFH with a minimum t f of 30 Myr.
The 14-parameter Prospector-α model incorporates many of the recent important improvements in SED models into a single consistent framework. Free stellar and gas-phase metallicity are allowed in the model. The stellar mass-stellar metallicity relationship modified from Gallazzi et al. (2005) is implemented as a prior. We assume a flexible two-component dust attenuation model accounting for birth-cloud and diffuse dust separately. For the diffuse dust component, we add a parameter dust index to enable variations in the shape of the attenuation curve as in Noll et al. (2009), and include the UV dust absorption bump. Dust emission from energy balance is also built in the model. We use a step function nonparametric SFH with seven time bins, which is more capable of capturing the diversity of galaxy SFHs than the exponentially declin-ing SFH (Leja et al. 2019a;Carnall et al. 2019;Lower et al. 2020). The model includes both the nebular line and continuum emission as implemented by Byler et al. (2017), which are important for young stellar populations or high-redshift objects. Mid-infrared dust emission from a dust-enshrouded AGN is also permitted in the model. The priors of Prospector-α model parameters are demonstrated in Leja et al. (2019b).

Priors on Color and Mass-to-light Ratios
To fit a hierarchical model to the optical color-M * /L relationship, we must first infer the SED-fitting priors on both color and M * /L, so we can correct for the priors in the HBM and not be biased by assumptions used in our physical models. Color and M * /L priors are not specified explicitly, but instead are specified implicitly by the choice of priors on SED model parameters including SFH, dust, metallicity, etc. We infer them numerically, and their joint distributions are shown in Figure 1. Their marginal distributions are shown in Figure 2. The resulting prior probability density distributions will be used in the next section for building hierarchical models.
We compare the (g − r) and log(M/L g ) priors and their distribution inferred for the 63,430 galaxies fit in Figure 2. The priors are not closely matched to the data, especially for the simple model. This is because the observed number density of galaxies is dominated by blue galaxies with small M * /L and the number densities of various subpopulations of galaxies are not yet typically taken into account in galaxy SED-fitting priors. Figure 2 shows that the Prospector-α priors describe the data better than the simple model, with a narrower range and a peak closer to the data. This is mostly owing to the different dust priors adopted, since galaxies with a highτ 2 will be redder and correspond to higher mass-to-light ratios. The simple model assumes a uniformτ 2 prior from 0 to 4, following the standard for the field (e.g., da Cunha et al. 2008, Marchesini et al. 2009, Muzzin et al. 2013, Skelton et al. 2014. Whereas the Prospector-α model has a more informativeτ 2 prior, a truncated Gaussian prior with a mean of 0.3 and a standard deviation of 1 between 0 and 4. Consequently, the priors of the simple model typically own more dust content and are redder. We further demonstrate the redshift dependence of the priors in Figure 2. As the redshift decreases, the prior distributions shift to the redder regime. Priors at different redshifts have the same blue end, implying young galaxies in the nearby universe. Overall, the redshift effects on the (g − r) and log(M/L g ) priors are not strong. However, we will show later that the observed redshift dependence of the color-M * /L relationship is strong.
. Prior and data distributions for (g − r) and log(M/Lg). The upper and lower panels show the simple model and Prospector-α respectively. Prospector-α priors describe the data better as the simple priors are too wide. The Prospector-α fits provide systematically higher log(M/Lg) by 0.12 dex and higher (g − r) by 0.06 mag. We show in Section 4.3 and Section 4.4 that the offsets are mainly driven by different mass-weighted age measurements and the inclusion of nebular emission, respectively.

A HIERARCHICAL MODEL FOR THE COLOR-M * /L RELATIONSHIP
In this section, we will construct a hierarchical Bayesian model to fit the relation between the rest-frame (g − r) color and the stellar mass-to-light ratio using the data described in Section 2. Since this work aims to understand the dependence of M * /L on optical colors and the intrinsic scatter in this relation, in the HBM we parameterize the distribution of log(M/L g ) at a given (g − r) with hyperparameters ρ. The model computes posteriors on the population parameters ρ from the individual galaxy fits θ, where θ = {log(M/L g ) i ,(g-r ) i }. Our approach is similar to Leja et al. (2020) in the context of modeling the stellar mass function (see also Nagaraj et al. (2022) for HBMs on the relationship between dust attenuation and galaxy properties). We employ the dynamic nested sampling package dynesty (Speagle 2020) to sample the model posteriors.
We model the probability density function (PDF) of log(M/L g ) at a fixed (g−r) and redshift z with a skewed generalized Student's t (sgt) distribution: Li et al.
where β denotes the beta function and sgn is the sign function defined by: (2) This generalized Student's t-distribution has 5 parameters. µ is the mode of the distribution. λ controls the skewness of the distribution. p and q are the kurtosis parameters. σ accounts for the variance of the distribution. We model the evolution of µ, λ, p, and q with quadratics in color, and, motivated by the redshift dependence in Figure 2, we additionally include a linear dependence on redshift for the location parameter µ, which accounts for potential variations in the slope of the (g − r)-log(M/L g ) relation over different redshifts (e.g., Szomoru et al. 2013).
Here a 0,1,2,3 , b 0,1,2 , c 0,1,2 and d 0,1,2 are the quadratic coefficients. The model has 14 degrees of freedom to control the population-wide behaviors of log(M/L g ) and (g − r). Such flexible formalism is necessary for fitting the skewness and the heavy tails of the log(M/L g ) density distribution. Due to the flexibility in the skew and kurtosis parameters, we can achieve a good fit with a fixed value of the variance parameter σ. This is also motivated by the shape of the observed distribution, where the scatter does not change significantly at different colors. The sgt distribution requires −1 < λ < 1, p > 0, and q > 0. So in practice, we reparametrize the quadratic coefficients b 0,1,2 , c 0,1,2 and d 0,1,2 using the values of λ, p, q at (g − r) = −0.3, 0.5, 1.3, respectively. The reparameterization process is similar to Appendix B of Leja et al. (2020). Using the anchor points such as λ −0.3 , λ 0.5 , λ 1.3 instead of b 0,1,2 makes it easier to set the priors. We are able to satisfy the upper and lower limits of the sgt parameters directly by defining the proper prior range. The anchor points are selected to roughly cover the (g − r) range, and the choice of the specific anchor points should not alter the results.
To summarize, our HBM model has 14 parameters to describe the distribution of the (g − r)-log(M/L g ) relationship. We assume uniform priors for these hyperpa-rameters: a 0 , a 1 , a 2 , a 3 ∼ Uniform(−10, 10), In Appendix A we generate mock data from this population model and validate that we can recover the distribution. Now that we have defined our model for the distribution of M * /L at a fixed color, we describe how to write down the likelihood for our HBM. The HBM is a straightforward extension of standard Bayesian analysis. Here, however, the input data are the posteriors from the individual galaxy fits, and the output is a model for the population distribution in (g − r) and log(M/L g ). By Bayes' theorem, Here, D is the data vector, and P (D) is a normalizing constant that can be ignored in our posterior sampling. P (ρ) is the prior distribution. Let θ i represent the (log(M/L g ) i ,(g-r ) i ) vector for galaxy i. We can therefore rewrite the likelihood P (D|ρ) in terms of θ i : where N is the total number of galaxies in our data. Because the fits are performed independently to each galaxy, where P (θ i |ρ) is the probability density of log(M/L g ) i at (g-r ) i . Here our model probability density is weighted by the likelihood P (D i |θ i ) from the SED fits performed by Prospector. The model likelihood for each galaxy P (D i |ρ) represents the weighted average of the model density distribution over all possible values of θ i .
In order to calculate the integral P (D i |ρ) = P (D i |θ i )P (θ i |ρ)dθ i from Equation 7, we estimate the likelihood P (D i |θ i ) from the posteriors and priors we compute during the SED fits in Section 2. We draw m samples {θ i,1 , . . . , θ i,m } from the posterior P (θ i |D i ) of color-M * /L 7 every galaxy. We choose m = 50 samples. Such a sample size is enough to resolve the posteriors while also allowing the fit to remain computationally tractable. We assign each posterior sample an importance weight: where P (θ i,j ) is the marginalized prior on log(M/L g ) i,j demonstrated in Figure 2. Accordingly, the model likelihood for each galaxy can be expressed in terms of P (θ i,j |ρ) from Equation 1 and w i,j : Therefore, our full log-likelihood becomes

MODEL RESULTS
While Figure 1 shows that our model priors predict a tight relation between the optical color and M * /L, in this section we will use real data to determine whether the observed relationship is similarly tight. We will first compare the (g − r)-log(M/L g ) relationship derived from the simple model and from the sophisticated Prospector-α model as described in Section 2. We will then explore the origin of their M * /L differences and color differences. In the end, we will apply the new color-M * /L relationship from Prospector-α HBM fit as a prior and show new color and M * /L estimates. 4.1. Relation Between (g − r) and the Stellar M/L Ratios Figure 3 shows the (g − r)-log(M/L g ) relation for the simple and Prospector-α models. We present the average relationship from the priors, the individual SED fits, and the distribution derived from our HBM. We marginalize the HBM over redshift when we project the 3D model distribution onto the 2D (g − r)-log(M/L g ) plane. Table 1 lists the posteriors of the 14-parameter Student's t-distribution that is used to describe the distribution of log(M/L g ) at fixed color. The model parameters are overall well constrained except the quadratic parameters of the kurtosis q.
To verify that our HBM fits the data well, we compare the (g − r)-log(M/L g ) posterior distribution from the HBM and the sum of SED-fitting posteriors of the entire sample for the Prospector-α model in Figure 4. Because the HBM input is SED-fitting likelihoods for the whole sample, we need to compare the full posterior distribution instead of point estimates, which do not represent the correlated, high-dimensional posterior distribution on redshift, (g − r), and log(M/L g ). It is challenging to define a "residual" between the data, i.e., SED-fit posteriors, and a best-fit HBM model and nor did the HBM aim to minimize any kind of residuals. We processed the HBM posterior distribution in two steps to make it comparable to the data. First, we weighted the HBM posteriors by the redshift and (g−r) distribution of the data to marginalize the 1D log(M/L g ) distribution at given redshift and color. Second, we convolved the weighted HBM posteriors with the median uncertainties of the observed (g − r) and log(M/L g ) in each grid of Figure 4 using Gaussian kernels. The final product is shown in the left column of Figure 4.
In the right column of Figure 4, we use the fractional difference between our model posteriors described in the last paragraph and the SED-fitting posterior sum to diagnose how well the HBM fits the data. We observe a close match between the model and data distribution density where the most data resides, as shown by the light-yellow color. This result suggests that our HBM works as expected. Since the population model is constrained by the observed galaxies, it should agree with the observed galaxies in the region where most of them reside. The difference is large where there is little data, especially for the upper edge of the relationship where the HBM posterior has a much higher density in the deep blue region than the data posterior sum. The comparison indicates that our HBM describes the main trend of the (g − r)-log(M/L g ) relationship well but is not able to capture the behavior of some outliers.
In Figure 3, we find that the two physical models have similar average relationships from the HBM (see the red and magenta lines in the right column). Though the simple relationship has a larger curvature, in which the slope is steep for blue galaxies and then flattens for red galaxies. In the contrast, Prospector-α has a roughly fixed slope for all galaxies. Based on Equation 3 and the median posteriors of a 0,1,2,3 in Table 1, the mode of the simple relationship is (11) and the mode of the Prospector-α relationship is (12) Readers may use this equation to estimate the (g − r)log(M/L g ) relationship from this work. Note that the (g − r) color here refers to the median from SED-fitting posteriors (see Section 4.4 for a discussion on different methods for color measurement). We report the mode of the relationship instead of the mean here to show Note that this is not the quantity that our HBM minimizes, and we only use this figure to visualize how well our HBM describes the data. The comparison shows that the HBM performs good fits where the most data are, but is not able to capture the behavior of the upper edge of the (g − r)-log(M/Lg) distribution where there is little data.
where most galaxies are. The mean and mode have a typical difference around 0.1 dex because the log(M/L g ) distribution is skewed at given (g − r). We warn the readers that our relationship is untested at z < 0.5. Also, in Section 5.3 we provide a tentative way to reduce the scatter of the relationship using an additional (r − i) color (Equation 13). At fixed color, both models show an increasingly skewed distribution at lower log(M/L g ). This trend is strongest for blue galaxies with a λ parameter closer to -1 (see Table 1). Because the distribution is skewed, it is more challenging to estimate the M * /L of blue galaxies, and can result in a high bias if the skew is unknown.
The most notable difference between the two SED models is the scatter around the average relation. The dispersion of the relationship is significantly larger for the Prospector-α model, which is evident by the larger posterior estimates of the σ parameter in Table 1. As we introduced in Section 1, the relationship exists in the first place because the effects of dust, age, and metallicity canceled out: variations in these parameters move the galaxies more or less parallel to the main relationship. Nevertheless, our results suggest that the net effect of these variations can alter the direction of the change in the (g−r)-log(M/L g ) plane when we consider a more realistic model. Hence we need to be careful about using a simple linear relation to predict log M/L ratios from optical colors.
A distinctive feature of the simple SED fits is the hook at the bluest end. This is due to the similar SFHs of the bluest galaxies in the simple fits, marked by the large t f /t age . As found by Bell & de Jong (2001), recent bursts of star formation will bias M * /L to low values. The exponentially declining SFH suffers strongly from the "outshining" effect (Papovich et al. 2001;Maraston et al. 2010), where the young bright stars outshine the old dim ones in the emitted light. As a result, the exponentially decaying SFH only characterizes the bulk of the most recent star formation and is not sensitive to underlying old populations with recent bursts. The simple fits may cause old star-forming galaxies to look like young starforming galaxies in this sense. The lack of flexibility of the SFH model creates a bias on the (g − r)-log(M/L g ) relationship at the bluest colors.
The principal point to be made here is that we cannot constrain the (g−r)-log(M/L g ) relation with the simple exponentially declining SFHs. The hook at the bluest color also makes the simple fits difficult to model. As shown in Figure 3, our median relation from the HBM is higher than the SED fits for these blue galaxies since the HBM causes regression toward the mean. The diagonal edge of the simple fits is due to the lower bound of t age prior, in which the youngest age allowed in the simple model is 10 Myr.

Redshift Evolution of the Color-M/L Relationships
In addition to the color dependence discussed above, our HBM parameterizes the redshift evolution, prescribed by the a 3 parameter (see Table 1). The bottom panel of Figure 5 demonstrates the redshift evolution of the model relationships from z = 1 to z = 3. The simple model has no perceptible redshift evolution. Conversely, as the redshift increases, the Prospector-α relationship shifts downwards. At given (g − r), low-z galaxies have higher M * /L than those of high-z galaxies because the stars in these galaxies are on average older. The slope of Prospector-α relationship does not vary as a function of redshift, which means the redshift evolution produces a similar effect for galaxies at all colors.
The strong redshift dependence of the Prospector-α (g − r)-log(M/L g ) relationship matches simple expectation from passive stellar aging: it increases the average age of the stars, and thus their M * /L as the age of the universe increases. The simple relationship agrees with the Prospector-α relationship at the blue and red ends at z = 3, but agrees with the main locus of the Prospector-α relationship at z = 1. The weak redshift dependence of the simple model implies that it does not reproduce a simple expectation that stars in galaxies get older as the universe gets older. This is because exponentially declining SFH only models the light that it sees from bright stars and is not sensitive to old stellar pop- ulations as we discussed in Section 4.1. Previous studies such as López-Sanjuan et al. (2019) found no change in the relationship with z, which is also likely due to their use of parametric SFHs.

What Drives the M * /L Offset Between the Two Models
So far we have shown how the optical color-M * /L relationship derived from two SED models differs. According to Leja et al. (2020), Prospector-α provides one of the highest M * /L from fitting the photometry among various SED models, which is still true when fitting the spectrum (Tacchella et al. 2022). The next question is what physics properties drive the differences between the models. Much of the complexity in interpreting the relationship is that the model parameters correlate with each other. To disentangle the degeneracy quantitatively and identify the most relevant driver of the higher inferred M * /L values, we exploit the feature importance measurement provided by random forests (RFs).
We train an RF to predict the log(M/L g ) difference between the Prospector-α model and the simple model. We use the scikit-learn package (Pedregosa et al. 2011) to construct an RF model of 1000 trees. We adopt 80% of our sample as the training set, and 20% as the test set. The training features are extracted from the SED fits, which include the model dif-ferences in the (g − r), diffuse dust optical depthτ 2 , total formed mass-weighted age t avg , and SFR in recent 100 Myr SFR 100 Myr , as well as the Prospector-α measurements of (g − r),τ 2 , t avg , specific star formation rate sSFR 100 Myr , stellar metallicity, and gas-phase metallicity. The endpoint of the RF regression is an accuracy score to reflect the goodness of fit, with 1 being the most accurate. Our selected features can predict ∆ log(M/L g ) of the test set at a mean accuracy score of 91% with a 0.03 dex root-mean-square error. The prediction accuracy is even better if we only consider high signal-to-noise ratio (S/N ) galaxies. This suggests that we have included sufficient training features. Figure 6 shows the importance of each input feature, which is computed by the reduction of the model performance by dropping out the features. Figure 6 immediately demonstrates that the difference in mass-weighted age is the primary driven factor for the log(M/L g ) offsets. t avg is far more predictive than the other features with a relative importance of 35%. This means that Prospector-α fits have systematically higher M/L g ratios mostly because they are older, as a result of using nonparametric SFHs (see Lower et al. 2020 for a discussion on the M * estimates from different SFHs). This agrees with the redshift trend in Figure 5. The bulk of the difference between the model relationships comes from galaxies at lower redshifts because the age differences permitted are larger (Leja et al. 2019b). The principal role of age is also supported by studies based on SPS libraries. For example, Into & Portinari (2013) demonstrated that both colors and M/L ratios increase continuously with SSP age regardless of the metallicities.
Other factors exhibit smaller influences than t avg on ∆ log(M/L g ) estimates, with the dust index being the second important feature. The dust index is a powerlaw modification to the shape of Calzetti et al. (2000) dust attenuation curve (Noll et al. 2009), where Calzetti law has a dust index of 0. Galaxies with a larger dust index have a flatter curve, i.e., less attenuation in UV and more attenuation in NIR. The dust index does not directly affect L g becauseτ 2 is measured within the g band, but will influence M * , and thus M * /L g . As the NIR SED traces closely the bulk of M * from the old, lowmass stars, a large dust index permits galaxies to add a lot of mass due to increased attenuation in NIR without changing optical colors (Salmon et al. 2016;Ma lek et al. 2018).
We find that the dust index is the most distinguishing factor for red galaxies when we repeat the RF analysis for the red subsample. Since dusty galaxies on average have a flatter dust attenuation curve than galaxies with little dust (Chevallard et al. 2013;Salim & Narayanan 2020), having a free dust index allows red galaxies to be either dusty star-forming galaxies with flatter dust attenuation curves (e.g. ultraluminous infrared galaxies), or nearly dust-free quiescent galaxies with steeper curves. In contrast, the simple model assumes a zero dust index for all galaxies. The distinct dust indices between the two models lead to large log(M/L g ) offsets at the red end.
The sSFR also has a second-order effect on ∆ log(M/L g ). The simple fits do not model the dust emission in IR while the IR emission traces the recent star formation. This generally biases the SFRs to lower values since dust-obscured star formation is preferentially missed (Wuyts et al. 2011). Our results confirm that the log(M/L g ) offsets are marginally influenced by metallicity, especially gas-phase metallicity.

Color Systematics
The results discussed so far investigate one component of the relationship, the mass-to-light ratio. The other part, the rest-frame optical color, is usually considered a well-measured quantity. Nevertheless, systematics occurs when we convert the observed photometry at different redshifts to the rest-frame colors. In our work we synthesize colors from the models generated during SED fitting. Some studies and surveys adopt Kcorrections to convert the observed-frame photometry of one given band to the desired rest-frame photometry of the same or a different band (e.g, Hogg et al. 2002;Blanton & Roweis 2007). K-correction from the observed band to its rest-frame counterpart is an effi-cient way to estimate colors at low redshift presumably since the bandpass wavelengths have changed very little. As one goes to high redshift, the dependence of the same band K-corrections on the galaxy SED increases but some surveys try to use an observed bandpass near the blueshifted rest-frame bandpass to approximate. Kcorrections are usually calculated by fitting the observed photometry with linear combinations of a limited number of SED templates (Blanton & Roweis 2007; some also use a single best-fit SSP to calculate K-corrections (Chilingarian et al. 2010)). Although the calculation of K-corrections still involves SED fitting at some level, the resultant color is likely to be less model-dependent than the one from full SED fitting. But unlike K-corrections, colors from SED fitting can be easily calculated in a consistent fashion for galaxies at any redshifts. Also, SED fitting can infer other galaxy properties including ages, metallicities, dust properties, and detailed SFHs more than just colors.
In Figure 7, we contrast the median model restframe colors and spectra of several example galaxies between the two SED model fits. The typical color differences between the models are around 0.1 mag, with the Prospector-α results systematically redder. In the upper left panel, we show galaxies with the typical color differences, where the corresponding g-band and r-band fluxes are shown as triangles. In the upper right panel, we show examples of galaxies with the most extreme color differences (all > 0.5 mag).
The comparison makes it clear that the color offsets between the two models are mostly due to the emission lines. The simple model does not include nebular emission. In particular, if the galaxy has bright UV fluxes and shows no prominent Balmer break, the simple model cannot fit the data well because the Balmer break is its only sharp spectral feature. Conversely, Prospector-α is more flexible in modeling observed photometry with large variations. It will add strong emission lines in this situation, which produces a lower χ 2 value when fitting the photometry as compared to the simple model (see upper right panel of Figure 7). Combining our discussion in the last section, the shift of log(M/L g ) with a median 0.12 dex due to nonparametric SFHs and the shift of (g − r) with a median 0.06 mag due to nebular emission, together move the average Prospector-α relationship in a parallel direction as the average simple relationship.
Remarkably, the galaxies with the extreme color differences in Figure 7 all show the inverse Balmer break feature. The inverse Balmer break comes from strong nebular continuum emission. This feature is uncommon for galaxies at lower redshifts but more likely to 3 × 10 3 4 × 10 3 6 × 10 3 Rest-frame Wavelength [Å] 10 10 10 9 Flux Density (maggies) [OIII] log . The color differences between the two models. The bottom panel shows the distribution of the (g − r) offsets between Prospector-α fits and the simple fits, where the majority of galaxies are redder in Prospector-α. In the top panels, we show example spectra of galaxies with the typical color difference and the extreme color difference, where g-band and r-band model photometries are shown in triangles. Comparison between the two columns in the top panels indicates that the model offset in the rest-frame color mainly comes from the nebula emission.
be prevalent in the early universe, marked by very high SFRs, strong nebular emissions, and very low metallicities. Our finding suggests that a subsample of 3D-HST galaxies are candidates for being extreme emission line galaxies (EELGs, e.g., Maseda et al. 2018). We examine the Hα equivalent width (EW) for 20 galaxies with the most extreme color differences. All of them have Hα EW 400 km s −1 , high sSFR, and low metallicity, consistent with the properties of observed EELGs.

Shrinkage Effects in the HB method
Here we will discuss a key feature of HBMs called shrinkage -this refers to the reduction in posterior size that occurs when applying the population model as a prior to individual fits. In our problem, the shrinkage estimators introduce a decrease in variance around the average (g − r)-log(M/L g ) relationship. Because of the restrictions of the simple model and its curvature at the bluest color that makes it hard to model, we will focus on the Prospector-α model for the shrinkage analysis in this section. We calculate the shrinkage of every galaxy by multiplying the individual SED-fit posterior P (θ i |D) by a factor P (ρ i |D)/P (θ i ), where P (ρ i |D) is the HBM posterior at this galaxy's redshift and P (θ i ) is the prior probability we show in Figure 2. In this way, we are reapplying the HBM posterior as a prior to the individual likelihoods from SED fitting. Figure 8 presents the shrinkage effects on the Prospector-α model. We compare the posterior mean of each galaxy before and after shrinkage, where the original SED fits are plotted in black and the data af-  Figure 8. Comparison of the Prospector-α fits to the HBM prediction after shrinkage for S/N ≥ 19 galaxies, S/N < 19 galaxies, and the entire sample. Galaxies before the shrinkage are indicated in black, and galaxies after shrinkage are shown in green. The error bars represent the median standard deviation of log(M/Lg) over all colors. The model does stronger shrinkage for low S/N galaxies.
ter shrinkage is in green. A close inspection of this shows a more concentrated posterior distribution after shrinkage. The model shifts most galaxies with high log(M/L g ) toward the median relationship. We compare the shrinkage effects between galaxies with different S/N ratios in different panels. The full sample is divided into two subsamples using S/N = 19 as a cut, by which we have a roughly equal number of galaxies in each subsample. It is clear that the individual fits experience stronger shrinkage in the S/N < 19 subsample, where the galaxies are dimmer. Because such faint galaxies have a smaller effect on the inferred hyperparameters and we assume the faint galaxies distribute similarly to the bright galaxies, we are borrowing information from bright galaxies when we perform hierarchical shrinkage on the properties of faint galaxies. This manifests a strength of the HBM that it provides a framework whereby a subset with high-quality data explicitly benefits all of the data.

DISCUSSION
Here we have learned that the chosen SED model has a large effect on the optical color, M * /L, and the relationship between the two. Now we will discuss three outstanding issues. First, we will make a comparison to the color-M * /L relationships in other work. Second, we will discuss the implications of the HBM shrinkage. Finally, we will explore the possibility of constraining a much tighter relationship with an additional color (r−i).

Comparison to Other Color-M * /L Relationships
Prospector-α model predicts higher galaxy masses and smaller SFRs than other SED models ) and we expect it to also produce a different color- Figure 9. Comparison to the (g − i)-log M/Li relationships presented by other works. The black solid line is Prospector-α median relationship from our HBM. The blue and pink shaded regions are the 1σ and 2σ range of the model posteriors respectively. The green line shows the relationship from Bell et al. (2003). The yellow line represents the relationship given by Taylor et al. (2011). These two are empirical relationships derived from SED fitting to data like our work. The Zibetti et al. (2009), Into & Portinari (2013 relationships are based on stellar evolution models. M * /L relationship. In Figure 9, we compare the median Prospector-α (g − i)-log M/L i relationship to two empirical relationships and two theoretical relationships in other works. We switch to (g − i) and log M/L i for consistency with these works. We follow the same procedure in Section 3 to calculate the distribution of log M/L i at given (g − i) by fitting the HBM. We have accounted for the offsets due to the choice of different IMFs.
The relationships from Zibetti et al. (2009), Into & Portinari (2013 are derived from SPS libraries. Zibetti et al. (2009) estimate their relationship by marginalizing over a Monte Carlo library of SPS models, with an SFH composed of an exponentially declining continuum and random SF bursts. Into & Portinari (2013) use the Padova isochrones to show the trends in M/L ratios and colors as a function of ages, metallicities, SFH birthrate parameters. These relationships depend entirely on the physical model and should in fact be compared to our priors. Similar to the priors in Figure 1, these relationships have steeper slopes than our relationship from observational data.
The empirical relationships from Bell et al. (2003), Taylor et al. (2011) are more aligned with our relationship. Bell et al. (2003) fit the relationship using data from the Sloan Digital Sky Survey (SDSS) and the Two Micron All Sky Survey (2MASS). They use an exponentially declining SFH and evolve their dust-free relationship back to z = 0 where they assume the galaxies are 12 Gyr old. Our offset to their relationship can be partly explained by the inclusion of dust, and by the difference in age due to our nonparametric SFH and higher central redshift (see redshift evolution in Figure 5). Taylor et al. (2011) obtained the relation using intermediate-redshift (z < 0.65) galaxies in the Galaxy And Mass Assembly (GAMA) survey with an exponentially declining SFH as well. Their relationship is the closest to ours. The Bruzual & Charlot (2003) SPS models that they choose can introduce a ∼0.05 dex offset in M * with respect to FSPS SPS models used in Prospector-α .
The relationships from literature in Figure 9 are all based on parametric SFHs. We have argued that nonparametric SFHs are crucial to providing the flexibility needed to model M * /L estimates for blue galaxies. Recently, Ge et al. (2021) also use nonparametric SFHs for SED fitting to study the pixel-by-pixel-based color-M/L relationship of MaNGA galaxies. However, their M/L estimates are systematically higher than ours, and we speculate that it is because individual pixels are allowed to have a more bursty SFH. While galaxies represent the integrated light from pixels, the bursty features of pixels are possibly smoothed out. Also note that we assume a continuity SFH prior from Leja et al. (2020) and they assume a much bustier prior from PPXF (Cappellari 2017). Therefore, our relationship cannot be compared to theirs directly; this highlights the great effects of the nonparametric SFH priors on the inferred SFH.
To summarize, there are many ways that Prospector-α color-M * /L relationship may be different than the other empirical results. First, Prospector-α is a sophisticated physical model that adds nebular emission, dust emission, and uses nonparametric SFH. All these model assumptions may affect the relationship. As we have highlighted in Section 4, the nonparametric SFH shifts the whole relationship as the stellar population ages. This means if our sample centers at a different redshift than other works do, the resultant average relationship will be different. Second, each work assumes different model priors. Consequently, the parameter space covered in our (other) model may not be allowed in other (our) models. For example, Bruzual & Charlot (2003) SPS model covers a wider variety of metallicity than ours, which can lead to smaller M * /L for blue galaxies because they are allowed to have lower metallicity. Despite the discrimination between our and other color-M * /L relationships, we do not know which relationship is correct. Our updated relationship between color and M * /L can potentially impose constraints on galaxy dynamical masses (M dyn ). The difference between dynamical and stellar masses often depends on how far from the galaxy center we probe. If we have spatially resolved data to take the galaxy radius into account explicitly, we may correlate the M dyn /L with the M * /L of a galaxy (e.g., Taylor et al. 2010;de Graaff et al. 2021). For example, Taylor et al. (2010) showed that the ratio of M * and M dyn is independent of stellar population parameters and observables such as apparent magnitude and redshift after excluding the effect of galaxy structure. A few studies observe a strong empirical relationship exists between log M dyn /L and color (e.g., van de Sande et al. 2015) similar to log M * /L. Since M dyn is composed of stellar, gas, and dark matter components, our relationship on M * /L can set a lower limit of M dyn /L. Specifically, the relationship can potentially put radiusdependent constraints on M dyn /L at high redshift where the galaxy kinematic studies are challenging. Note that there is much complexity in the interpretation of the dynamical masses such as the galaxy inclination and mass-anisotropy degeneracy, and therefore, one needs to be cautious about linking M dyn with M * .

Learning New SED-fitting Priors from the HBM Shrinkage
While Prospector-α likely allows more freedom in its parameters than galaxies actually occupy, the HBM provides us with one way to update the SED model to make it better describe the real data. A key aspect of the population model is to link galaxies of the same population and facilitate predictions of unobserved galaxies of this population. Potentially, we can infer from the shrinkage estimates how we should tune the priors to  Figure 10. The logarithmic density of galaxies on the (g − r)-log(M/Lg) plane for the priors, the median of individual SEDfitting posteriors, and the median of each object after HBM shrinkage are shown in the left three columns respectively. The right column demonstrates the log(M/Lg) offset after and before the HBM shrinkage. The results shown here are based on the Prospector-α model. We observe a strong shrinkage effect for blue galaxies, but some red faint galaxies are pushed away from the average relationship. make them better fit the underlying population distribution. Deriving new priors from the shrinkage may not be a straightforward process (Loredo & Hendry 2019). But understanding the HBM shrinkage would be a first step to building better priors. Figure 10 shows the probability density function of priors, the median posteriors from individual SED fits, the median estimates of every galaxy after HBM shrinkage in the (g−r)-log(M/L g ) plane, and the difference in log(M/L g ) between the median after shrinkage and the median after SED fitting for each galaxy. For the majority of galaxies, the shrinkage estimates are more tightly constrained to the average relationship than the individual SED fits. A large amount of shrinkage is observed for the blue galaxies where the HBM increases their extremely low log(M/L g ) values. The shrinkage analysis suggests that most galaxies do live within a narrower range than our priors, especially blue galaxies. If we apply this tighter color-M/L relationship from the HBM for future photometric surveys like the Large Synoptic Survey Telescope (LSST), it may enable a more accurate selection of color and redshift for the mass range covered than those from the current model priors (see Figure 2).
Despite the significant shrinkage at the blue end and in the middle color range shown in Figure 10, we observe an opposite effect for some red faint galaxies. For these outliers, the HBM pushes them even farther from the average by lowering their log(M/L g ). This is because the prior is too strong and washes out the effect of HBM posterior. The different results for blue and red galaxies from the shrinkage analysis may imply that our assumption that faint and bright galaxies are drawn from the same population may not be true, and we need more population components in our model. In order to test the efficacy of our model on specific galaxy subpopulations, it will be important to have deeper, higher spectral resolution data across the entire color range or a larger sample size.
The shrinkage analysis also supports our choice of the population model for the distribution of galaxies in the optical color-log M * /L plane. We choose a flexible functional form to model the population distribution as described in Section 3. The purpose of using a flexible model is to avoid biases led by model choices. Adding free parameters into the population model will usually lead to greater shrinkage toward the average relationship as the population model tries to avoid overfitting. Figure 10 only shows a moderate amount of reduction in scatter. Given that the amount of shrinkage is determined by P (ρ i |D)/P (θ i ), if galaxies in fact live far from the SED model predictions or the population model is unnecessarily complex, the amount of shrinkage should be much greater.
In addition to learning from data using the hierarchical model, numerical simulation is another approach to building more informed priors (Pacifici et al. 2013). However, galaxy evolution is very complicated, and simulations need to consider many ingredients, such as the interstellar medium, dark matter halo, active galactic nuclei, stellar feedback, gas cooling, and magnetic fields. It is a challenge to ensure that all the physical parameters we are interested in are well predicted in simulations (Somerville & Davé 2015), especially when considering different subgrid physics (e.g., Crain et al. 2015). As a result, each galaxy formation model encodes a different set of SFHs. What we learn from the simulations will vary widely depending on which simulation we choose. For example, the predictions for the power spectrum density have a large diversity among simulations (Iyer et al. 2020), and M * from SED fitting shows systematic offsets to the true mass from simulated stellar particles (Lower et al. 2020). Unlike simulations, for HBM we are guaranteed to learn how we can update our SED model to describe the data better but suffer from the observation uncertainties and degeneracies. We can utilize the observations and simulations together to better constrain the SED priors.

Constraining the M * /L with Two Colors
So far our discussion has been focused on inferring M * /L from one optical color, but is it possible to significantly reduce the uncertainty of M * /L estimates with one or two more colors? In principle, the derived M * /L will certainly be closer to the SED-fit results when we use more colors, or equivalently, more photometry as this is what the SED models are conditioned upon. Our question is if there is one color that provides significant added information on the M/L g in addition (g − r), so the combination of the two colors can constrain a much tighter relationship with M/L g .
We calculate rest-frame colors from 10 filters in the optical and NIR, including u, g, r, i, z, J, H, K, IRAC1, IRAC2. We examine if any of them are correlated with the log(M/L g ) residuals between individual SED fits and the HBM (g−r)-log(M/L g ) relationship from Equation 12. Of these colors, we find that (r − i) has the strongest correlation with the residuals (see the left column of Figure 11). The Pearson correlation coefficient is 0.44. This correlation indicates that including (r − i) may help reduce the residuals of log(M/L g ) estimates.
We perform a linear regression between (r − i) and the residuals of log(M/L g ) estimates. Readers may use the following relation to evaluate an (r − i) dependent correction to the log(M/L g ) estimates from our HBM. log(M/L g ) residuals = −0.11 + 0.40(r − i).
Note that this relation has not been tested through our full HBM machinery. As a simple test to quantify the improvement of M * /L estimates after adding (r − i), we perform two linear fits on the relationship between color(s) and log(M/L g ) using the median posteriors from SED fits. In one of the linear models, log(M/L g ) has a linear dependence on (g − r) and redshift, and in the other model, log(M/L g ) depends on (g − r), (r − i), and redshift. The two models predict a similar average relationship between (g − r) and log(M/L g ) while the model with (r − i) has a smaller scatter around the regression line. The right column of Figure 11 shows the median and 1σ range of the log(M/L g ) residuals of the linear fits. The linear model with two colors has residuals closer to 0 and a smaller scatter ∼0.15 dex, compared to ∼0.26 dex of the model with only (g − r). Our results imply that it may be worthwhile taking (r − i) into account besides a single color (g − r) to better constrain M * /L. We do not perform a full HBM here since the purpose of this section is only to provide a possible path forward for future analyses.
(r − i) is potentially useful for constraining a tighter color-M * /L relationship because it helps to identify sSFR, which can be degenerate with metallicity or dust at a fixed (g − r). In the left column of Figure 11, we show that both log(M/L g ) residuals and (r − i) correlates with sSFR 100 Myr , which means that sSFR is the major driver for the potential of (r−i) in reducing M * /L residuals. At a fixed (g − r) color, a bluer (r − i) color is associated with younger star-forming galaxies, whereas a redder (r − i) color is typically associated with older quiescent systems. We speculate that (r − i) correlates with sSFR 100 Myr because the r-band photometry likely puts constraints on the Hα emission line, an indicator of the most recent SFR. If this is true, adding (r − i) may only benefit the color-M * /L relationship from SED models including nebular emission.
The right column of Figure 11 indicates that including (r − i) has the largest effect on red galaxies with (g − r) 0.3. So (r − i) may be useful in differentiating the dusty star-forming galaxies with blue (r − i) and the quiescent galaxies with red (r − i) for galaxies with red (g − r) color. This agrees with our finding that (r − i) has a strong correlation withτ 2 . The fact that (r − i) anticorrelates with sSFR and reduces the scatter of log(M/L g ) estimates for red galaxies likely suggests that (r − i) helps distinguish between age and dust.

SUMMARY
The empirical rest-frame optical color-M * /L relationship is greatly influenced by the physical model used in SED fits. In this work, we contrast the (g − r)-log(M/L g ) relationship between a simple 4parameter SED model and a sophisticated 14-parameter Prospector-α model using 63,430 3D-HST galaxies up to z = 3. We utilize a hierarchical Bayesian model to fit the relationship, which allows us to account for the SEDfit posteriors and priors explicitly. We show the distinction between the two model relationships and identify the galaxy properties that drive the difference. Further-  Figure 11. The rest-frame (r − i) color may significantly reduce the uncertainty of M * /L estimates from CMLR. Left: sSFR drives the correlation between (r − i) and log(M/Lg) residuals calculated from the mode of the HBM relationship. The Pearson correlation coefficient is 0.44. Right: the linear fit of log(M/Lg) upon both (g − r) and (r − i) has smaller scatter than the linear model using only (g − r). The shaded regions cover 1σ range of the residuals for the two linear models respectively. Both plots are based on the Prospector-α model. more, by taking advantage of HBM shrinkage, we learn more about the true distribution of the (g − r) color and M/L g . Our main findings are as follows: • The two contrasting SED models derive similar average (g − r)-log(M/L g ) relationships but with significantly different scatters. The more sophisticated Prospector-α model predicts a looser and less skewed relationship in contrast to the simple model (Figure 3), with a 1σ uncertainty of 0.28 dex compared to 0.12 dex. M * /L and (g−r) offsets between the two SED models are mostly attributed to the nonparametric SFHs and nebular emission in Prospector-α, respectively. However, the two effects together shift the average relationship in a parallel direction to higher M * /L and redder color. The simple model is too restricted and not sufficient for describing all galaxies, especially the blue ones because the exponentially decaying SFH does not allow galaxies to have both old stars and optically blue colors.
• Stellar age is the major driver of the difference between two model relationships ( Figure 6). For the Prospector-α model, we find a significant redshift evolution of the relationship ( Figure 5). The average relationship for low-z, older galaxies has higher M * /L than that of high-z, younger galaxies. Prospector-α nonparametric SFH adds more scatter to the relationship through a more flexible treatment of SFHs. We do not recover a redshift evolution in the simple model due to the limitation of exponentially declining SFHs against outshining effects. The shape of the dust attenuation curve is an important driver for the M * /L offset in red galaxies.
• The nebular line and continuum emission enabled in Prospector-α is the main reason for the different color measurements (Figure 7). The objects with extreme color offsets are likely to be EELGs.
• The shrinkage analysis of the HBM motivates us to learn better galaxy SED-fitting priors from the data. Our current priors are slightly wider for most galaxies but should be weaker for some red galaxies ( Figure 10).
Our work shows that optical color-M * /L relationship suffers from systematic errors depending on the choice of the SED model. The skewness of the distribution of M * /L at given color may cause a bias when estimating M * for a specific galaxy using the average relationship. Given the redshift evolution we found, future work needs to be careful when applying the empirical relationship determined locally to high-redshift galaxies.
Our results can be expanded in several ways. The HBM in this work is a path forward to informing better priors for SED fitting using various hierarchical models. Dynamical masses will provide useful constraints to the relationship, and it may help to include dynamical measurements in the HBM. Ideally, if we have a more flexible model such as an HBM fitting both M/L distribution at a given color and color distribution at a given M/L, we can better parameterize the M/L distribution at different colors. Finally, it is possible that the scatter in the relationship can be significantly reduced by adding only one or two more colors such as the r − i color. The recovered PDF is similar to the input one to high precision.