A Machine-learning Approach to Predict Missing Flux Densities in Multiband Galaxy Surveys

We present a new method based on information theory to find the optimal number of bands required to measure the physical properties of galaxies with desired accuracy. As a proof of concept, using the recently updated COSMOS catalog (COSMOS2020), we identify the most relevant wave bands for measuring the physical properties of galaxies in a Hawaii Two-0- (H20) and UVISTA-like survey for a sample of i < 25 AB mag galaxies. We find that with the available i-band fluxes, r, u, IRAC/ch2, and z bands provide most of the information regarding the redshift with importance decreasing from r band to z band. We also find that for the same sample, IRAC/ch2, Y, r, and u bands are the most relevant bands in stellar-mass measurements with decreasing order of importance. Investigating the intercorrelation between the bands, we train a model to predict UVISTA observations in near-IR from H20-like observations. We find that magnitudes in the YJH bands can be simulated/predicted with an accuracy of 1σ mag scatter ≲0.2 for galaxies brighter than 24 AB mag in near-IR bands. One should note that these conclusions depend on the selection criteria of the sample. For any new sample of galaxies with a different selection, these results should be remeasured. Our results suggest that in the presence of a limited number of bands, a machine-learning model trained over the population of observed galaxies with extensive spectral coverage outperforms template fitting. Such a machine-learning model maximally comprises the information acquired over available extensive surveys and breaks degeneracies in the parameter space of template fitting inevitable in the presence of a few bands.


INTRODUCTION
Corresponding author: Nima Chartab, nchartab@carnegiescience.edu
Future ground-based and space-borne observatories, equipped with large-aperture telescopes and sensitive large-format detectors, will provide broadband imaging data for more than a billion galaxies. These data are pivotal to a better understanding of the dark sectors of the Universe (i.e., dark matter and dark energy) as well as the evolution of galaxies and large-scale structures over cosmic time. The challenge, however, is to obtain wide waveband coverage to constrain the spectral energy distributions (SEDs) of millions of galaxies and estimate their redshifts and physical parameters such as stellar masses and star formation rates.
Template fitting is widely used to infer photometric redshifts of galaxies and their physical properties (e.g., Arnouts et al. 1999; Bolzonella et al. 2000; Ilbert et al. 2006). However, theoretical synthetic templates may not be representative of the real parameter space of galaxies. For example, templates can include SEDs that do not have an observational analog. This causes degeneracies in parameter measurements, especially when we reconstruct SEDs with few bands. Many of these degeneracies are mitigated by obtaining data with wide spectral coverage (e.g., with a larger number of wavebands). An example of such a data set is the Cosmic Evolution Survey (COSMOS; Scoville et al. 2007), which has been observed in more than 40 bands from X-ray to radio wavelengths. The wealth of information in this field provides very well-constrained SEDs for galaxies. However, not all surveys have as many photometric bands as the COSMOS field. For instance, Euclid (Laureijs et al. 2011) will rely on near-infrared Y, J, and H bands (960-2000 nm), complemented by optical ground-based observations in u, g, r, i, and z to measure photometric redshifts (Euclid Collaboration et al. 2020). It is therefore instructive to use the extensive data set in the COSMOS field to identify the essential bands that carry most of the information regarding the physical properties of galaxies.
The aim of this study is to transfer the information gained in the COSMOS field to fields such as the Euclid deep fields, where such extensive photometry does not exist. Using the concepts of information theory, we can determine how much information is shared between the bands and use these measurements to identify the most important bands (those that reveal most of the information about the physical properties of galaxies). Using machine learning techniques, we can then predict fluxes in wavebands that are not observed in a survey but share information with the available (observed) bands. This allows us to design future surveys carefully and observe only in the selected wavebands that carry most of the information, which saves a significant amount of observing time.
Machine learning has become popular in recent years to build models based on spectroscopic redshifts (e.g., Carrasco Kind & Brunner 2014; Masters et al. 2017) and to train models based on synthetic templates (e.g., Hemmati et al. 2019) or mock catalogs generated from galaxy simulations (e.g., Davidzon et al. 2019; Simet et al. 2021). These methods are particularly useful because machine learning algorithms can learn more complicated relations given a large and comprehensive training data set (Mucesh et al. 2021). Moreover, these models speed up parameter measurement, which is an important characteristic given the flood of data imminent from upcoming surveys (Hemmati et al. 2019).
In this paper, we develop a new technique based on information theory to quantify the importance of each waveband and identify the essential bands for measuring the physical properties of galaxies. We also develop a machine learning model to predict fluxes in missing bands and thereby improve the wavelength resolution of existing photometric data. To demonstrate the application of these techniques, we apply our methods to a sample of galaxies drawn from the latest version of the COSMOS survey (COSMOS2020; Weaver et al. 2022), analogous to the sample planned for the Euclid deep fields. A new ground-based survey, Hawaii Two-0 (H20; McPartland et al. in prep), has been designed to provide complementary photometric data for the Euclid mission. H20 will provide u-band observations from the MegaCam instrument on the Canada-France-Hawaii Telescope (CFHT) and g-, r-, i-, and z-band imaging from the Hyper Suprime-Cam (HSC) instrument on the Subaru telescope over 20 square degrees of the Euclid deep fields. Spitzer/IRAC observations from the Spitzer Legacy Survey (SLS) are also available in the same fields (Moneti et al. 2021). In this paper, we identify the importance of wavebands for an H20+UVISTA-like survey with wavelength coverage similar to that expected in the Euclid deep fields, incorporating the near-IR YJH bands from UltraVISTA (McCracken et al. 2012) in addition to the H20 and SLS wavebands. We then predict fluxes in the near-IR wavebands using the existing ground-based and mid-IR Spitzer/IRAC observations (H20-like) of the deep fields.
In Section 2, we briefly introduce the COSMOS2020 catalog and use it to build a sample of H20+UVISTA-like galaxies. Section 3 describes the concept of information gain and quantifies the importance of each waveband based on it. In Section 4, we use dimensionality reduction techniques to visualize photometric data in a 2-dimensional space and explore the feasibility of predicting near-IR fluxes based on ugriz and Spitzer/IRAC data. This is followed by Section 5, where we train a machine learning algorithm, a Random Forest model, to predict fluxes in the UVISTA/YJH wavebands using data in wavebands similar to the existing H20. In Section 6, we investigate the accuracy of the photometric redshifts and stellar masses given the limited number of bands available in H20-like and H20+UVISTA-like data. We discuss and summarize our results in Section 7.

DATA
Here we use the updated version of the COSMOS catalog, COSMOS2020, to build a sample of galaxies analogous to those that will be observed in the Euclid deep fields. Compared to the COSMOS2015 catalog (Laigle et al. 2016), COSMOS2020 provides much deeper near-IR and mid-IR (Spitzer) photometric data as well as two independent methods for photometric extraction: the conventional method and a profile-fitting method (The Farmer; J. Weaver et al., in prep.). We use The Farmer photometry, which contains consistent photometric data in 39 bands from FUV to mid-IR including broad, medium, and narrow filters. All the data are reduced to the same scale with appropriate PSFs. Photometric redshifts are calculated using LePhare (Arnouts et al. 1999; Ilbert et al. 2006) with a configuration similar to that described in Ilbert et al. (2013). Given the large number of bands with deep observations, photometric redshift solutions are accurate, reaching a normalized median absolute deviation ($\sigma_{\rm NMAD}$; Hoaglin et al. 1983) of 0.02 for galaxies as faint as i ∼ 25 AB mag (Weaver et al. 2022). The redshifts of galaxies are then fixed to their estimated photometric redshifts, and the stellar masses are estimated. In this paper, we consider COSMOS2020 photometric redshifts and stellar masses as a "ground truth", since spectroscopic redshifts are only available for a limited number of galaxies and using a mixture of photometric and spectroscopic redshifts can bias our sample towards specific populations of galaxies.
We use two sets of wavebands: 1) H20-like bands: A := {u, g, r, i, z, ch1, ch2}, and 2) H20+UVISTA-like bands: B := {u, g, r, i, z, Y, J, H, ch1, ch2}. The u-band observations are conducted with the MegaCam instrument at CFHT, and the other optical bands (g, r, i, and z) are available from Subaru's Hyper Suprime-Cam (HSC) imaging. Spitzer/IRAC channel 1 and 2 (ch1, ch2) data are compiled from all the IRAC observations of the COSMOS field (Moneti et al. 2021). Near-IR photometry in the Y, J, and H bands is obtained from the UltraVISTA survey (McCracken et al. 2012). We select a subset of the COSMOS2020 galaxies that are observed, but not necessarily detected, in all the aforementioned bands and have an i-band AB magnitude ≤ 25 with a 3σ detection. These selection criteria result in 165,807 galaxies out to z ∼ 5.5. Photometric measurements in the COSMOS2020 catalog are not corrected for Galactic extinction. We correct them using the Schlafly & Finkbeiner (2011) dust map. Moreover, some sources have negative fluxes in the desired bands, which is due to the variation of the background flux across the image. We set these fluxes to zero.
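The preparation steps in this section (extinction correction, clipping of negative fluxes, and the i ≤ 25 AB mag, 3σ selection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the catalog: the function names, the toy fluxes, and the extinction coefficients `k_band` are hypothetical, and fluxes are assumed to be in µJy so that the AB zero point is 23.9 mag.

```python
import numpy as np

def deredden_and_clip(flux_ujy, ebv, k_band):
    """Correct fluxes (µJy) for Galactic extinction, then set negative fluxes to zero.

    ebv is E(B-V) per galaxy; k_band is A_band / E(B-V) per band (hypothetical values).
    """
    a_band = np.outer(ebv, k_band)                  # extinction in magnitudes
    corrected = np.asarray(flux_ujy, dtype=float) * 10.0 ** (0.4 * a_band)
    return np.clip(corrected, 0.0, None)

def ab_magnitude(flux_ujy):
    """AB magnitude for fluxes in µJy; non-positive fluxes map to NaN."""
    flux = np.asarray(flux_ujy, dtype=float)
    mag = np.full(flux.shape, np.nan)
    positive = flux > 0
    mag[positive] = 23.9 - 2.5 * np.log10(flux[positive])
    return mag

# Toy catalog: two bands (columns) for three galaxies; the first column plays the i band.
flux = np.array([[1.0, -0.2], [0.5, 3.0], [0.2, 0.1]])
flux_err_i = np.array([0.1, 0.1, 0.1])
ebv = np.array([0.02, 0.0, 0.01])
k_band = np.array([1.7, 0.3])                      # illustrative coefficients only

clean = deredden_and_clip(flux, ebv, k_band)
mag_i = ab_magnitude(clean[:, 0])
snr_i = flux[:, 0] / flux_err_i                    # extinction rescales flux and error alike
selected = (mag_i <= 25.0) & (snr_i >= 3.0)
print(selected)
```

Working in fluxes rather than magnitudes keeps non-detections (zero flux) in the sample, which matches the requirement above that galaxies be observed but not necessarily detected in every band.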

INFORMATION GAIN
Suppose that we do not have any prior information about the redshift distribution of galaxies selected with the criteria described in Section 2. We therefore assume a uniform distribution for the redshift. As an example, if we define four redshift bins ({$z_1$ = (0, 1], $z_2$ = (1, 2], $z_3$ = (2, 3], $z_4$ = (3, 4]}) and want to identify which bin a galaxy belongs to, we can encode the answer in two bits with a balanced binary decision tree. Here, we need to ask two YES/NO questions to identify the bin a galaxy belongs to. However, based on the available observations of COSMOS2020, we know the redshift distribution of galaxies with i ≤ 25 AB mag as background information. We therefore update the decision tree above, considering our prior information about the redshift distribution, to reduce the average number of questions we need to ask to identify the redshift bin of a galaxy. Based on the redshift distribution shown in Figure 1, the probability of a galaxy being in each redshift bin is $P(z_1) = 0.56$, $P(z_2) = 0.32$, $P(z_3) = 0.09$, $P(z_4) = 0.03$. Thus, one possible decision tree to identify the redshift bin of a galaxy can be built as follows:
0 < z ≤ 1? Yes: $z_1$. No: ask the next question.
1 < z ≤ 2? Yes: $z_2$. No: ask the next question.
2 < z ≤ 3? Yes: $z_3$. No: $z_4$.
On average, 0.56 × 1 + 0.32 × 2 + (0.09 + 0.03) × 3 = 1.56 questions (bits) are required to identify the redshift bin of a galaxy. The number of bits (questions) is reduced from 2 to 1.56 when we add information regarding the redshift distribution of galaxies. This decrease shows that we are less surprised when we observe the redshift of a galaxy, given that we know what the redshift distribution looks like.
Figure 2. Mutual information of redshift and wavebands in bits per galaxy. Larger mutual information means that the entropy of the redshift will decrease more if we include the band in photometric redshift measurements, so the band is more important. Here, the u band is the most important, followed by the z band.
Given the above example, the optimal number of bits required to store a variable, called Shannon's entropy (H), is defined as (Shannon 1948)
$$H(X) = -\sum_i P(x_i)\,\log_2 P(x_i), \tag{1}$$
where $x_i$ is a possible outcome of a variable (X) that occurs with probability $P(x_i)$. In this formulation, $-\log_2 P(x_i)$ represents the number of bits required to identify the outcome. Using equation 1, Shannon's entropy of the redshift based on the probabilities in the four bins is 1.45 bits. This means that we can still make our tree more optimal and encode the redshift values in 1.45 bits instead of 1.56. One possible way would be to build the tree to identify the redshifts of two galaxies simultaneously, which makes the average number of questions per galaxy even less than 1.56. However, we do not aim to find the optimal compression algorithm to encode the redshift information. We only use Shannon's entropy to find the maximal compression rate.
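The arithmetic of this example can be checked with a few lines of Python. The sketch below uses the four rounded bin probabilities quoted above; the function names are illustrative. With these rounded inputs the entropy evaluates to roughly 1.46 bits, close to the 1.45 bits quoted in the text, with the small difference presumably coming from rounding of the bin probabilities.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits: H = -sum_i P(x_i) * log2 P(x_i)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def expected_questions(probabilities, depths):
    """Average number of yes/no questions for a tree whose leaf i sits at depth depths[i]."""
    return sum(p * d for p, d in zip(probabilities, depths))

p_bins = [0.56, 0.32, 0.09, 0.03]          # P(z_1) ... P(z_4) from the text
uniform = [0.25] * 4                       # no prior information

h_uniform = shannon_entropy(uniform)                    # balanced tree: 2 bits
n_questions = expected_questions(p_bins, [1, 2, 3, 3])  # sequential tree: 1.56
h_prior = shannon_entropy(p_bins)                       # lower bound on the average

print(f"uniform: {h_uniform:.2f} bits, tree: {n_questions:.2f}, entropy: {h_prior:.2f} bits")
```

The entropy is a lower bound on the average number of questions of any such tree, which is why the sequential tree (1.56 questions) sits between the entropy and the 2 bits of the uniform case.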
In the presence of other information, such as observed fluxes in different bands, the entropy of the redshift decreases even more. The amount of uncertainty (entropy) remaining in X after we have seen Y is called the conditional entropy and is defined as
$$H(X|Y) = -\sum_{x,y} P(x,y)\,\log_2 \frac{P(x,y)}{P(y)}, \tag{2}$$
where $P(x, y)$ is the joint probability distribution at (x, y) and $P(y)$ is the marginal probability of y. Moreover, the mutual information between X and Y (i.e., the amount of uncertainty in X that is removed by knowing Y) is defined as
$$I(X,Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y), \tag{3}$$
where $H(X, Y)$ is the joint entropy of a pair of variables (X, Y). In other words, $I(X, Y)$ is a measure of the amount of information (in bits) one can acquire about X by observing Y. This parameter can be used to identify the waveband that will be most useful for measuring galaxy properties (e.g., redshifts). For instance, the waveband with the highest I(redshift, waveband) carries the most information and decreases the entropy of the redshift the most.
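For discrete variables, the conditional entropy and the mutual information follow directly from a joint probability table. The sketch below is a toy illustration with made-up 2 × 2 tables (rows: two redshift bins, columns: two flux bins); the function names are illustrative, and the identity $H(X|Y) = H(X,Y) - H(Y)$ is used to evaluate the conditional entropy.

```python
import numpy as np

def entropy_bits(p):
    """Entropy in bits of a (possibly multi-dimensional) probability table."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information_bits(joint):
    """Return (I(X,Y), H(X|Y)) in bits for a joint table with X on rows and Y on columns."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    h_x = entropy_bits(joint.sum(axis=1))
    h_y = entropy_bits(joint.sum(axis=0))
    h_xy = entropy_bits(joint)
    h_x_given_y = h_xy - h_y               # uncertainty left in X after seeing Y
    return h_x - h_x_given_y, h_x_given_y

# Flux bin fully determines the redshift bin: one bit gained, no uncertainty left.
mi_dependent, h_cond_dependent = mutual_information_bits([[0.5, 0.0], [0.0, 0.5]])
# Flux bin unrelated to the redshift bin: nothing gained.
mi_independent, h_cond_independent = mutual_information_bits([[0.25, 0.25], [0.25, 0.25]])
print(mi_dependent, mi_independent)
```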
The mutual information in equation 3 is defined for discrete variables. In the case of continuous variables (e.g., redshift, flux, stellar mass), we need to properly discretize the data. Kraskov et al. (2004) (hereafter KSG) introduced a k-nearest neighbor estimator to compute the mutual information of continuous variables. This method estimates the underlying probability distribution of the data by measuring distances to the k-th nearest neighbors of points in the data set. There is nonzero mutual information when some points are clustered in the X-Y space, which allows us to predict y ∈ Y given an x ∈ X coordinate. We refer readers to the original KSG paper for details of the method. Figure 2 shows the mutual information between redshift and each waveband based on the KSG algorithm with k = 100 nearest neighbors. It suggests that, given the sample of i < 25 AB mag galaxies, the u band provides the most information regarding the redshift compared to the rest of the H20+UVISTA-like bands. However, our sample is selected based on i-band magnitudes, so we assume that i-band data are already available. Suppose that, for our sample, u-band fluxes are highly correlated with i-band data. In this case, the u band carries no additional information in the presence of i-band data. To take such an effect into account, we need to compute the conditional mutual information, defined as
$$I(X,Y|Z) = H(X|Z) - H(X|Y,Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), \tag{4}$$
where $I(X, Y|Z)$ is the mutual information of X and Y given that Z is observed. Following the KSG algorithm, we compute the conditional mutual information to sort wavebands based on their importance. We compute I(redshift, waveband | i-band) and choose the waveband with the highest conditional mutual information as the most important band. The conditional mutual information estimates reveal that the r band is the most important waveband given that i-band data are available.
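A k-nearest-neighbor mutual information estimate of this kind can be tried on synthetic data with scikit-learn's `mutual_info_regression`, whose documentation cites Kraskov et al. (2004). This is a sketch under stated assumptions rather than the implementation used in this work: the data are random stand-ins (a Gaussian "target" playing the role of redshift, one informative and one uninformative "band"), and the function returns nats, so the result is converted to bits.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 5000
target = rng.normal(size=n)                          # stand-in for redshift
informative = target + 0.3 * rng.normal(size=n)      # band that traces the target
uninformative = rng.normal(size=n)                   # band unrelated to the target
features = np.column_stack([informative, uninformative])

# k = 100 neighbors, matching the value quoted in the text.
mi_nats = mutual_info_regression(features, target, n_neighbors=100, random_state=0)
mi_bits = mi_nats / np.log(2)                        # convert nats to bits

for name, value in zip(["informative", "uninformative"], mi_bits):
    print(f"I(target, {name}) = {value:.2f} bits")
```

For this Gaussian toy case the exact mutual information of the informative band is $-\tfrac{1}{2}\log_2(1-\rho^2) \approx 1.8$ bits, which gives a reference value for judging the estimator's bias at large k.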
We continue computing the conditional mutual information, I(redshift, waveband | $s_{\rm waveband}$), where $s_{\rm waveband}$ denotes the previously selected wavebands. Figure 3 shows the non-zero conditional mutual information as we select relevant wavebands. We find that for i < 25 AB mag galaxies, the r, u, ch2, and z bands provide most of the information about the redshift, with decreasing importance from the r band to the z band. We repeat these analyses for stellar mass measurements. As shown in Figure 4, we measure the mutual information between stellar mass and each waveband for the whole sample, and in Figure 5, we measure the conditional quantity, $I(\log(M_*/M_\odot), {\rm waveband}\,|\,i\text{-band})$, in bins of redshift. As expected, the role of the short-wavelength bands decreases as we approach higher redshifts. We further compute the important wavebands given the availability of i-band data in Figure 6. We find that the ch2, Y, r, and u bands are the most relevant bands for stellar mass measurements, in decreasing order of importance. One can constrain the redshift and repeat the analysis to find the optimal bands for stellar mass measurements in the desired redshift range given the availability of i-band data.
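The greedy selection described above can be sketched in a few lines. To keep the example self-contained, the code below swaps the KSG estimator for a simple plug-in estimate on quantile-binned data, evaluating equation 4 from joint entropies; this substitution, the function names, the number of bins, and the synthetic "bands" are all assumptions made for illustration. The binned estimate becomes biased as the conditioning set grows, so it only illustrates the logic of the ranking.

```python
import numpy as np

def _joint_entropy(columns, n_bins):
    """Plug-in entropy (bits) of the quantile-binned columns taken jointly."""
    code = np.zeros(columns.shape[0], dtype=np.int64)
    for col in columns.T:
        edges = np.quantile(col, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
        code = code * n_bins + np.searchsorted(edges, col)  # one integer per cell
    counts = np.unique(code, return_counts=True)[1]
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def conditional_mi(x, y, z, n_bins=6):
    """I(X,Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z); z has shape (n, m)."""
    x, y = x[:, None], y[:, None]
    return (_joint_entropy(np.hstack([x, z]), n_bins)
            + _joint_entropy(np.hstack([y, z]), n_bins)
            - _joint_entropy(np.hstack([x, y, z]), n_bins)
            - _joint_entropy(z, n_bins))

def greedy_band_selection(target, bands, given, n_select, n_bins=6):
    """Repeatedly add the band with the largest conditional MI given the bands chosen so far."""
    selected, ranking = list(given), []
    for _ in range(n_select):
        z = np.column_stack([bands[name] for name in selected])
        scores = {name: conditional_mi(target, flux, z, n_bins)
                  for name, flux in bands.items() if name not in selected}
        best = max(scores, key=scores.get)
        ranking.append((best, scores[best]))
        selected.append(best)
    return ranking

# Synthetic stand-ins: "redundant" nearly duplicates the already available band.
rng = np.random.default_rng(1)
n = 20000
base = rng.normal(size=n)
bands = {"given": base,
         "redundant": base + 0.1 * rng.normal(size=n),
         "strong": rng.normal(size=n),
         "weak": rng.normal(size=n),
         "noise": rng.normal(size=n)}
target = base + 1.5 * bands["strong"] + 0.8 * bands["weak"] + 0.3 * rng.normal(size=n)

ranking = greedy_band_selection(target, bands, given=["given"], n_select=2)
print(ranking)
```

In this toy setup the redundant band is passed over even though it is strongly correlated with the target on its own, which is the behavior the conditional mutual information is meant to capture.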
One should note that these conclusions depend on the selection criteria of the sample. This method provides a powerful tool for designing future surveys and quantifying the importance of each waveband. An efficient observation can be conducted by prioritizing the important wavebands identified by the information gain-based method.
Moreover, different waveband fluxes can be inter-correlated for a specific sample of galaxies. For instance, the top left panel in Figure 6 shows that IRAC/ch1 and ch2 provide a comparable amount of information for stellar mass measurements, which suggests that these bands are inter-correlated for our sample with i < 25 AB mag. Figure 7 visualizes the mutual information between different bands. A greater value of mutual information indicates that the wavebands are more correlated. Inter-correlation between wavebands allows us to predict/simulate fluxes of galaxies in missing bands. In the following, we investigate the possibility of predicting/simulating near-IR UVISTA/YJH fluxes based on H20-like data for a sample of galaxies with i < 25 AB mag.

DATA VISUALIZATION
Fluxes of galaxies in N wavebands are used to measure the photometric redshifts and physical parameters of galaxies. For example, the H20-like data with N = 7 bands occupy a 7-dimensional space, where the position of each galaxy is determined by its fluxes in the 7 bands. Therefore, galaxies with similar positions in the N-dimensional space are expected to have similar redshifts and physical parameters if N is large enough to fully sample the observed SED of galaxies. Similarly, it is expected that they will have similar fluxes in an (N + 1)-th waveband. However, showing galaxy fluxes in a high-dimensional space (e.g., a 7-dimensional space) is impossible, and thus we use dimensionality reduction techniques to present them in 2D space such that the information of the higher dimensions is maximally preserved. In this work, we use the Uniform Manifold Approximation and Projection (UMAP; McInnes et al. 2018) technique to visualize our sample in a 2-dimensional space. UMAP is a nonlinear dimensionality reduction technique that estimates the topology of the high-dimensional data and uses this information to construct a low-dimensional representation of the data that preserves structure information on local scales. It also outperforms other dimensionality reduction algorithms such as t-SNE (t-Distributed Stochastic Neighbor Embedding; van der Maaten & Hinton 2008) used in the literature (Steinhardt et al. 2020), since it preserves structures on global scales as well.
In a simple sense, UMAP constructs a high-dimensional weighted graph by extending a radius around each data point and connecting points when their radii overlap. This radius varies locally based on the distance to the n-th nearest neighbor of each point. The number of nearest neighbors (n) is the hyper-parameter in UMAP that should be fixed to construct the high-dimensional graph. Small (large) values of n will preserve more local (global) structures. Once the high-dimensional weighted graph is constructed, UMAP optimizes the layout of a low-dimensional map to be as similar as possible to the high-dimensional graph.
We use the UMAP Python library to map the 7-dimensional flux space of H20-like data to 2 dimensions, considering 50 nearest neighbors to provide a balance between preserving local and global structures. We do not map magnitudes or colors, since non-detected values cannot be handled properly when using them. Multi-waveband fluxes contain all the information regarding colors, but using colors misses information regarding fluxes or magnitudes. Therefore, mapping the fluxes of galaxies to 2 dimensions is a better choice than mapping colors. Since fluxes in different bands have fairly similar distributions, no normalization is needed before applying UMAP. In the case of significantly distinct distributions, normalization is needed to avoid the dominance of a waveband with a larger dynamic range. Figure 8 shows a 2-D visualization of the sample with H20-like bands using the UMAP algorithm. As an example, the mapped data are color-coded by the H-band fluxes (not present in the H20 photometry) in µJy. The smooth transition of the H-band fluxes in the 2D representation in Figure 8 reassures us that galaxies with similar fluxes in H20-like bands also have similar H-band fluxes. Visualized data in Figure 8 show qualitatively that the H-band fluxes are predictable to some extent using H20-like data.
To perform a quantitative assessment of how accurately one can predict fluxes in the UVISTA YJH bands given the H20-like observations, we train a Random Forest (Breiman 2001) model with half of our sample and evaluate the model's performance with the other half. A Random Forest consists of an ensemble of regression trees. The algorithm picks a subsample of the data set, builds a regression tree based on the subsample, and repeats this procedure numerous times. The final value is the average of the values predicted by all the trees in the forest. Averaging numerous decision trees built on subsampled data reduces the variance of the prediction and makes this algorithm far less prone to overfitting than a single tree. Another advantage of this method is that the inputs do not need to be scaled before being fed into the model. In the following section, we train a Random Forest model and evaluate its accuracy.

FLUX PREDICTIONS
We split the sample (described in Section 2) randomly into a training and a test sample. To evaluate whether the training sample is representative, we construct a 2-D projection of the H20-like fluxes similar to Figure 8 for both the training and test samples. Figure 9 shows the 2-D visualizations color-coded by the properties of galaxies (photometric redshift and stellar mass). We find that the training and test samples share the same properties, so the training sample is representative of the galaxies in the COSMOS field. With 82,903 galaxies as a training sample, we build a Random Forest model with 100 regression trees to predict the UVISTA YJH bands from the H20-like band fluxes. We use the Python implementation of the algorithm (Scikit-learn; Pedregosa et al. 2011) with its default parameters to build the model. The true (observed) fluxes in the YJH bands are available in the COSMOS2020 catalog. Using the trained Random Forest model, we then predict the expected fluxes for galaxies not included in the training set, with the re- [...] 2022) found that the median tension between the magnitudes derived from aperture photometry and profile-fitting extraction is ∆ ∼ 0.002 in the Y and J bands and ∆ ∼ 0.02 in the H band for sources brighter than the 3σ depth of each band. Thus, such small offsets in the Random Forest regressor are within the intrinsic uncertainties of the data reduction techniques. Green solid and dashed lines in the subpanels of Figure 10 show the median of ∆ and the 1σ (68%) scatter, respectively. The scatter in the prediction is < 0.17 mag for galaxies brighter than 24 AB mag. This shows that the YJH near-IR observations of UVISTA can be simulated with acceptable accuracy from the available observations of H20 for a sample of galaxies with i < 25 AB mag. Our results remain consistent when we rebuild a new Random Forest with different randomly selected training samples. While our focus in this paper is on the UVISTA/YJH and H20 bands, the method we present is general and directly applicable to other surveys.
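The training and evaluation procedure of this section can be sketched with Scikit-learn as follows. The catalog itself is not reproduced here, so the fluxes below are synthetic stand-ins (seven "observed" bands and three "missing" bands built as noisy combinations of them); only the half-and-half split, the 100 trees, and the otherwise default parameters follow the text. The sign convention ∆ = predicted minus true magnitude and the use of half the 16th-84th percentile range as the 1σ scatter are assumptions of this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_gal = 4000

# Synthetic fluxes in µJy: 7 observed bands and 3 correlated "missing" bands.
brightness = rng.lognormal(mean=1.0, sigma=0.8, size=n_gal)
colors = np.clip(rng.normal(1.0, 0.15, size=(n_gal, 7)), 0.3, None)
observed = brightness[:, None] * colors
weights = rng.uniform(0.05, 0.25, size=(7, 3))
missing = (observed @ weights) * rng.normal(1.0, 0.05, size=(n_gal, 3))

# Half of the sample trains the model, the other half tests it.
obs_train, obs_test, miss_train, miss_test = train_test_split(
    observed, missing, test_size=0.5, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(obs_train, miss_train)            # one forest predicting all three bands
flux_pred = model.predict(obs_test)

# Magnitude residuals per band: median offset and 1-sigma (68%) scatter.
delta_mag = -2.5 * np.log10(flux_pred / miss_test)
median_offset = np.median(delta_mag, axis=0)
p16, p84 = np.percentile(delta_mag, [16, 84], axis=0)
scatter = 0.5 * (p84 - p16)
print("median offset [mag]:", median_offset.round(3))
print("1-sigma scatter [mag]:", scatter.round(3))
```

Fixing `random_state` only makes the toy run reproducible; repeating it with different seeds mirrors the consistency check with different randomly selected training samples mentioned above.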

PHOTOMETRIC REDSHIFT AND STELLAR MASS
In the previous section, we showed that given the observations of the H20 survey, the near-IR observations of UVISTA can be constrained to some extent. In other words, observations of the COSMOS field provide valuable information regarding the distribution of galaxies in the flux space, even if we do not observe galaxies with spectral coverage as extensive as that of the COSMOS field. When we use a template-fitting code with synthetic templates, we usually do not take this constraint into account. There are two approaches to incorporating this information into photometric redshift or physical parameter measurements. First, add a prior to the fluxes in the bands that are not observed in the survey. For instance, when we perform SED fitting using H20-like bands, we can add priors to the YJH bands based on a Random Forest model that is trained on the population of galaxies from the COSMOS observations. Second, train a model based on SED-fitting results calculated with a large number of bands. In this case, when we feed our model with H20-like data, it will decide the best value of a parameter based on both the existence of similar observations in the COSMOS field (information from galaxy populations) and the SED-fitting solution for that galaxy.
In this section, we employ the latter approach to train a model to predict the photometric redshifts and the stellar masses of galaxies based on H20-like and H20+UVISTA-like bands. We train a Random Forest model based on a training sample of observed galaxies. The inputs of the model are H20-like fluxes, and the output is either the photometric redshift or the stellar mass computed from SED fitting over 29 bands available in the COSMOS2020 catalog. We also train another similar model where the inputs are H20+UVISTA-like bands. Figure 11 shows the performance of the trained models on the test sample with 82,904 galaxies. We find that both models recover photometric redshifts and stellar masses with comparable accuracy, with slightly better accuracy when using H20+UVISTA-like inputs. The normalized median absolute deviation ($\sigma_{\rm NMAD}$) of ∆z/(1 + z) is ∼ 0.03 for both models, with a ∼ 4% outlier fraction. Outlier galaxies are defined as galaxies with ∆z/(1 + z) > 0.15. The median absolute deviation of $\log(M_*/M_\odot)$ is ∼ 0.1 dex for both models. We explain this similar performance using the results of Sections 3 and 5. The Random Forest model with H20-like bands comprises most of the information regarding the UVISTA bands, as we trained the model with the population of observed COSMOS galaxies. Therefore, it should recover photometric redshifts and stellar masses nearly as accurately as the model that includes near-IR (YJH) observations.
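The two redshift statistics quoted above can be computed as in the sketch below. The exact convention is not spelled out in the text, so two assumptions are made and labeled in the code: $\sigma_{\rm NMAD}$ is taken as 1.4826 times the median absolute deviation of ∆z/(1 + z) about its median, and outliers are counted using the absolute value of ∆z/(1 + z). The function name and the toy redshifts are illustrative.

```python
import numpy as np

def photoz_metrics(z_pred, z_true, outlier_threshold=0.15):
    """Return (sigma_NMAD, outlier fraction) of dz = (z_pred - z_true) / (1 + z_true)."""
    z_pred = np.asarray(z_pred, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    dz = (z_pred - z_true) / (1.0 + z_true)
    # Assumption: NMAD about the median, scaled to match a Gaussian standard deviation.
    sigma_nmad = 1.4826 * np.median(np.abs(dz - np.median(dz)))
    # Assumption: outliers are counted on |dz|, i.e. in both directions.
    outlier_fraction = np.mean(np.abs(dz) > outlier_threshold)
    return float(sigma_nmad), float(outlier_fraction)

# Toy example: five galaxies, one catastrophic failure.
z_true = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
z_pred = z_true + np.array([0.0, 0.02, -0.03, 0.0, 1.0])
sigma_nmad, outlier_fraction = photoz_metrics(z_pred, z_true)
print(f"sigma_NMAD = {sigma_nmad:.4f}, outlier fraction = {outlier_fraction:.2f}")
```

Because it is built on medians, $\sigma_{\rm NMAD}$ is barely affected by the single catastrophic failure in the toy sample, which is why the outlier fraction is reported alongside it.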
We repeat a similar analysis starting with only i-band data and adding other important bands in the same order as identified in Section 3. Figure 12 shows the normalized median absolute deviation of ∆z/(1 + z) and $\log(M_*/M_\odot)$ as a function of the bands used to measure the parameter. We find that the i, r, u, ch2, and z bands form the minimal set of bands needed to reach an acceptable accuracy of $\sigma_{\rm NMAD}$ = 0.03 in ∆z/(1 + z) for the photometric redshifts of i < 25 AB mag galaxies. For the same sample, the i, ch2, Y, r, and u bands are the optimal bands for stellar mass measurements, reaching an accuracy of $\sigma_{\rm NMAD}$ = 0.15 dex in $\log(M_*/M_\odot)$.

Synthetic templates
In the following, we use UMAP to visualize the photometry of synthetic SED models commonly used in template-fitting procedures. We build a set of theoretical templates using the 2016 version of the Bruzual & Charlot (2003) library, assuming a Chabrier (2003) initial mass function. Star formation histories are modeled with an exponentially declining function (SFR ∝ $e^{-t/\tau}$), where τ is the star formation timescale. Dust attenuation is applied using the Calzetti et al. (2000) law, and solar stellar metallicity is assumed for all templates. We build ∼ 750,000 theoretical templates assuming τ ∈ (0.1, 10) Gyr, t ∈ (0.1, 13.7) Gyr, $A_V$ ∈ (0, 2) mag, and z ∈ (0, 5.5). Here, t and $A_V$ are the stellar age and the extinction in the visual band, respectively. We then calculate the synthetic photometry in both H20-like and H20+UVISTA-like bands by applying the corresponding filter response functions.
Since we have learned the topology of fluxes in the H20-like bands for real observed galaxies in the COSMOS2020 catalog (Figure 8), we can transform the H20-like band fluxes of the synthetic photometry into the learned space. Figure 13 shows the 2-D visualization of the theoretical templates with H20-like bands in that learned space. As an example, data points in the reduced dimension are color-coded by their synthetic H-band fluxes in µJy. Comparing the theoretical templates with the observed data shown in Figure 8 reveals that the model galaxies suffer from degeneracies. In this specific example, templates with similar H20-like fluxes have more diverse H-band fluxes than real observations, which can produce degenerate results when template fitting is performed based on H20-like bands. Adding the information of the COSMOS2020 observations as a prior imposes a strong correlation between the observed and missing bands, as seen in Figure 8, and makes the theoretical templates less degenerate. For example, the dark blue arc on the left side of Figure 13 does not match its observational counterpart. In other words, synthetic templates predict an H-band flux of ∼ 0.1 µJy for galaxies in that vicinity (i.e., the dark blue arc), but real observations show that they have, in fact, an H-band flux of ∼ 10 µJy. This shows that the extra information contained in previous observations can add valuable information to template-fitting analysis.
If one adds a predicted band in the template-fitting procedure, the errors should be assigned based on the 1σ scatter of the predicted flux (dashed green lines in Figure 10). It is particularly important to properly take into account the systematic scatter of the predicted bands in template fitting and ensure that the predicted bands are not over-weighted in the best-template selection. In the following section, we perform a simple template fitting to evaluate the value added by the predicted fluxes. However, it is worth highlighting that the better approach would be to use a machine learning model that is trained on template-fitting results of a galaxy population with well-constrained SEDs, such as COSMOS2020 (Figure 11).

Template-fitting
We perform template fitting for three cases using 1) H20-like bands, 2) H20-like + predicted YJH bands, and 3) H20 + UVISTA-like bands. For this purpose, we split the test sample used in Section 5 in half to obtain a validation set as well as a new test sample. The validation sample is used to measure the 1σ scatter of the predicted flux (similar to the dashed green lines in Figure 10). We assign errors to the predicted fluxes of the new test sample based on the 1σ scatter of the validation sample at a given magnitude. We use the template-fitting code LePhare with the same configuration as Ilbert et al. (2015). This configuration differs from the templates used for the COSMOS2020 redshift measurements. In the COSMOS2020 catalog, the photometric redshifts are measured based on templates employed by Ilbert et al. (2013), followed by stellar masses measured in the same manner as Ilbert et al. (2015) at fixed photometric redshifts, but here we fit both photometric redshifts and stellar masses simultaneously. Figure 14 presents the results of the template fitting for these three cases. We find that the lack of observed near-IR fluxes in template fitting increases the σ_NMAD and outlier fraction by 50% and 80%, respectively. We also find that adding predicted fluxes improves the σ_NMAD and outlier fraction by 10% and 25%, respectively. Predicted fluxes also improve the scatter of the stellar mass measurements by 7%. The improvement in template-fitting results from adding predicted fluxes suggests that observationally driven priors on near-IR fluxes can help reduce both the scatter and the outlier fraction of SED-derived properties. Moreover, we find that adding observed near-IR data significantly (∼50%) improves the template-fitting results, but this is not the case for the Random Forest model shown in Figure 12 (∼10% improvement). This suggests that machine-learning models are able to fully incorporate the information gathered from extensive surveys and avoid the degeneracies in template-fitting parameters that are inevitable when only a few bands are available.
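The two quality metrics quoted above can be computed as in the following minimal sketch. It assumes a commonly used convention that this section does not restate: σ_NMAD = 1.48 × median(|δ − median(δ)|) with δ = Δz/(1 + z), and outliers defined by |Δz|/(1 + z) > 0.15. The function names, the factor 1.48, and the threshold 0.15 are assumptions made for illustration and may differ from the exact definitions used for Figure 14.

```python
from statistics import median


def sigma_nmad(z_pred, z_true):
    """Normalized median absolute deviation of dz / (1 + z).

    Assumed convention: 1.48 * median(|d - median(d)|),
    where d = (z_pred - z_true) / (1 + z_true).
    """
    d = [(zp - zt) / (1.0 + zt) for zp, zt in zip(z_pred, z_true)]
    center = median(d)
    return 1.48 * median([abs(x - center) for x in d])


def outlier_fraction(z_pred, z_true, threshold=0.15):
    """Fraction of objects with |dz| / (1 + z_true) above `threshold` (assumed 0.15)."""
    n_out = sum(
        abs(zp - zt) / (1.0 + zt) > threshold for zp, zt in zip(z_pred, z_true)
    )
    return n_out / len(z_true)


if __name__ == "__main__":
    # Toy numbers, not survey data.
    z_true = [1.0, 1.0, 1.0]
    z_pred = [0.9, 1.0, 1.2]
    print(sigma_nmad(z_pred, z_true), outlier_fraction(z_pred, z_true))
```

The same two functions can be applied to each of the three template-fitting cases to obtain relative changes such as those quoted above.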

DISCUSSION AND SUMMARY
In this paper, we present an information-gain-based method to quantify the importance of wavebands and find the optimal set of bands that need to be observed to constrain the photometric redshifts and physical properties of galaxies. To demonstrate the application of this method, we build a subsample of galaxies from [...] map the color space of theoretical models and used the reduced map as a fast template-fitting technique. In the present work, we use a new technique, UMAP, to create a 2-dimensional representation of a high-dimensional flux distribution. This technique can also be utilized to map the color space of galaxies and study their physical properties (similar to Figure 9), providing an opportunity for further analyses in the future.
Acquiring data for galaxy surveys over wide areas and a range of wavelengths with a large number of wavebands is costly. A new method based on machine-learning algorithms is presented in this paper to supplement present and future surveys in their missing bands with information from previous extensive surveys (e.g., COSMOS). It can be used to optimize the observations of future surveys, as well as to predict the photometry of observatories that have ceased operation (Dobbels et al. 2020).

Figure 1. Redshift distribution for the subset of COSMOS2020 galaxies brighter than i = 25 AB magnitude (3σ). The entropy of the redshift calculated based on the distribution shown in this figure is less than the entropy of a uniformly distributed redshift. In other words, we are less surprised when we observe the redshift of a galaxy given this distribution (prior information).
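As a brief worked statement of the entropy comparison in this caption, assume the redshift range is divided into K bins with probabilities p_k (the binning and the symbols K and p_k are introduced here only for illustration). The Shannon entropy in bits then satisfies

```latex
H(z) = -\sum_{k=1}^{K} p_k \log_2 p_k \;\le\; \log_2 K ,
```

with equality only for the uniform distribution p_k = 1/K, so any peaked redshift distribution carries less entropy than a uniform one over the same bins.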
Figure 3. Conditional mutual information of redshift and wavebands in bits per galaxy. The most relevant bands can be selected based on their conditional mutual information. The sample is selected based on the magnitude of the i-band, which implies that the first selected waveband is the i-band. The top left panel shows the mutual information of redshift and wavebands given that i-band data are available. We therefore select the r-band as the second most relevant band since it provides the most information. In the top right panel, we assume that i- and r-band data are available and find that the u-band would be the third choice. We follow a similar procedure to find the relevant bands in order of their importance. We note that these results depend on the selection criteria. For any new sample of galaxies with a different selection, these results should be remeasured.
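The step-by-step selection described in this caption can be sketched as a greedy loop over conditional mutual information. The sketch below is illustrative only: it assumes the target and the band fluxes have already been discretized into bins and uses simple plug-in entropies, whereas the estimator actually applied to the continuous fluxes is not restated here. All function and variable names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(outcomes):
    """Plug-in Shannon entropy (bits) of a list of hashable outcomes."""
    n = len(outcomes)
    return -sum(c / n * log2(c / n) for c in Counter(outcomes).values())


def conditional_mutual_information(target, band, given):
    """I(target; band | given) = H(t,g) + H(b,g) - H(t,b,g) - H(g), in bits.

    `given` is a list of tuples holding the already selected bands per galaxy.
    """
    tg = list(zip(target, given))
    bg = list(zip(band, given))
    tbg = list(zip(target, band, given))
    return entropy(tg) + entropy(bg) - entropy(tbg) - entropy(given)


def greedy_band_selection(target, bands, first, n_select):
    """Start from `first` and repeatedly add the band with the largest
    conditional mutual information with `target` given the bands chosen so far.

    `bands` maps a band name to its list of binned fluxes (one per galaxy).
    """
    selected = [first]
    while len(selected) < n_select:
        given = list(zip(*(bands[name] for name in selected)))
        scores = {
            name: conditional_mutual_information(target, values, given)
            for name, values in bands.items()
            if name not in selected
        }
        selected.append(max(scores, key=scores.get))
    return selected


if __name__ == "__main__":
    # Toy binned data, not survey data: "r" tracks the target, "u" does not.
    target = [0, 0, 1, 1, 0, 0, 1, 1]
    bands = {
        "i": [0, 1, 0, 1, 0, 1, 0, 1],
        "r": [0, 0, 1, 1, 0, 0, 1, 1],
        "u": [0, 0, 0, 0, 1, 1, 1, 1],
    }
    print(greedy_band_selection(target, bands, first="i", n_select=3))
```

In this toy example the band that reproduces the target is picked immediately after the selection band. Plug-in entropies become biased when many bands are conditioned on with few galaxies per cell, so this is not a substitute for a dedicated estimator on real data.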

Figure 4. Similar to Figure 2 but for the stellar mass. Mutual information of stellar mass and wavebands in bits per galaxy is shown. With more mutual information, the entropy of stellar mass will decrease more if we include the band in the photometric stellar mass measurements, so the band is more important.

Figure 6. Similar to Figure 3 but for the stellar mass. Each panel shows the conditional mutual information of stellar mass and wavebands given that all the previously selected bands are available. We find that for the i-band selected sample, ch2, Y, r, and u bands are the four most relevant bands in decreasing order of importance. The top left panel shows that IRAC data are essential for stellar mass measurements.

Figure 7. Visual representation of the mutual information between different wavebands for a sample of i < 25 AB mag galaxies. The map is colored based on the value of mutual information, with purple representing the most correlated bands and yellow representing the least correlated bands (mostly independent). For instance, the mutual information of ch1 and ch2 quantifies the bits of information about IRAC/ch1 flux obtained by observing IRAC/ch2 flux. It is similar to the correlation coefficient, but it is able to capture non-linear relationships.

Figure 8. 2-D visualization of the sample with H20-like bands using the UMAP technique. The mapped data are color-coded by the H-band fluxes. The smooth gradient of H-band fluxes in the 2-D representation reassures us that galaxies with similar fluxes in H20-like bands have similar H-band fluxes as well.

Figure 9. Similar to Figure 8, but for training (two left panels) and test (two right panels) samples. Maps are color-coded with photometric redshifts and stellar masses. We find that the training and test samples share the same properties, so the randomly selected training sample is representative of the galaxies in the COSMOS field.

[...] results compared in Figure 10. For each band, we compare the predicted magnitudes (Mag_Predicted) with the true observed magnitudes (Mag_True). We find that the Random Forest model predicts unbiased YJH fluxes with high accuracy. The bottom panel in each figure shows the scatter of Mag_Predicted − Mag_True as a function of true magnitudes. With a median magnitude discrepancy (∆) of ∼0.01, we find that the offset is comparable with discrepancies that arise from different methods of photometric data reduction. Weaver et al. (2022) found that the median tension between the magnitudes derived from aperture photometry and profile-fitting extraction is ∆ ∼ 0.002 in the YJ bands and ∆ ∼ 0.02 in the H-band for sources brighter than the 3σ depth of each band. Thus, such small offsets in the Random Forest regressor are within the intrinsic uncertainties of the data reduction techniques. Green solid and dashed lines in the subpanels of Figure 10 show the median of ∆ and the 1σ (68%) scatter, respectively. The scatter in the prediction is < 0.17 mag for galaxies brighter than 24 AB mag. This shows that YJH near-IR observations of UVISTA can be simulated with acceptable accuracy from the available observations of H20 for a sample of galaxies with i < 25 AB mag. Our results remain consistent when we rebuild a new Random Forest with different randomly selected training samples. While our focus in this paper is on the UVISTA/YJH and H20 bands, the method we present is general and directly applicable to other surveys.
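The median offset and 1σ (68%) scatter as a function of true magnitude can be computed along the lines of the following sketch. It assumes that the 1σ scatter is taken as half of the 16th-to-84th percentile range of Mag_Predicted − Mag_True in each magnitude bin; the bin edges, the percentile interpolation, and the function names are illustrative assumptions rather than the exact procedure behind Figure 10.

```python
from statistics import median


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def binned_offset_and_scatter(mag_true, mag_pred, edges):
    """Median offset and 68% scatter of (pred - true) in bins of true magnitude.

    Scatter is assumed to be half of the 16th-84th percentile range.
    Empty bins are skipped.
    """
    stats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        delta = sorted(p - t for t, p in zip(mag_true, mag_pred) if lo <= t < hi)
        if not delta:
            continue
        offset = median(delta)
        scatter = 0.5 * (percentile(delta, 84) - percentile(delta, 16))
        stats.append({"bin": (lo, hi), "offset": offset, "scatter": scatter})
    return stats


if __name__ == "__main__":
    # Toy magnitudes, not survey data.
    mag_true = [22.5] * 101
    mag_pred = [22.5 + (i - 50) / 100 for i in range(101)]
    for row in binned_offset_and_scatter(mag_true, mag_pred, [22.0, 23.0, 24.0]):
        print(row)
```

A per-bin scatter measured this way on a validation sample could also serve as the magnitude-dependent error assigned to predicted fluxes.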

Figure 10. The performance of the Random Forest model on the 82,904 test galaxies not used for the training of the model. The model is trained based on H20-like bands (u, g, r, i, z, ch1, ch2) and predicts UVISTA YJH bands. Bottom panels show the scatter of Mag_Predicted − Mag_True as a function of true magnitudes, and ∆ is the median offset in these scatter plots.

Figure 12. The normalized median absolute deviation of ∆z/(1 + z) (left) and log(M∗/M⊙) (right) as a function of bands used to measure the parameter. As the sample is selected based on the i-band magnitude of galaxies, we start with training a Random Forest model based on only i-band data and then we add other bands following the same order of importance we find in Figures 3 and 6. Red horizontal lines show the scatter of the data relative to their mean value.

Figure 13. Similar to Figure 8, but for synthetic photometric data. The high-dimensional synthetic H20-like data are transformed to the space learned in Figure 8. The map is color-coded by the synthetic H-band fluxes. Existing dissimilarities between this figure and Figure 8 show that synthetic models lack the observed information.