The eROSITA Final Equatorial-Depth Survey (eFEDS): A Machine Learning Approach to Infer Galaxy Cluster Masses from eROSITA X-ray Images

We develop a neural-network-based pipeline to estimate the masses of galaxy clusters with known redshifts directly from X-ray photon information. Our neural networks are trained using supervised learning on simulations of eROSITA observations, focusing in this paper on the Final Equatorial-Depth Survey (eFEDS). We use convolutional neural networks modified to include additional information about the cluster, in particular its redshift. In contrast to existing work, we utilize simulations that include background and point sources to develop a tool directly applicable to observational eROSITA data over an extended mass range, from group-size halos to massive clusters with $10^{13}\,M_\odot < M < 10^{15}\,M_\odot$. Using this method, we provide for the first time neural network mass estimates for the observed eFEDS cluster sample from Spectrum-Roentgen-Gamma/eROSITA observations, and we find performance consistent with weak-lensing-calibrated masses. In this measurement, we do not use weak lensing information; we only use prior cluster mass information through the calibration of the cluster properties in the simulations. On simulated data, we observe a reduced scatter with respect to luminosity- and count-rate-based scaling relations. We comment on the application to other upcoming eROSITA All-Sky Survey observations.


Introduction
Improving our understanding of the mass function of galaxy clusters enables us to improve our inference of key cosmological parameters. These parameters include Ω_m, the density parameter of matter in the Universe, and σ_8, which describes the dispersion of linear density fluctuations. The ongoing eROSITA (extended ROentgen Survey with an Imaging Telescope Array) All-Sky Survey (Predehl et al. 2021) on board the Spectrum-Roentgen-Gamma mission (Sunyaev et al. 2021) will provide us with the largest intra-cluster medium (ICM)-selected galaxy cluster sample to date, which promises to provide tight constraints on cosmology through cluster abundance measurements (Merloni et al. 2012). A key ingredient in this analysis is to understand the cluster masses associated with a selected underlying sample (Bulbul et al. 2019). Traditionally, this is performed with weak lensing (WL) calibrated scaling relations, in the context of the eROSITA cluster census, or using dynamical mass measurements (Mamon et al. 2013; Old et al. 2014, 2015) in situations where the data allow for this approach. In the context of eROSITA, the former procedure has been demonstrated on the Final Equatorial Depth Survey (eFEDS) using the Hyper Suprime-Cam WL mass measurements (see Bahar et al. 2022; Chiu et al. 2022).
Cosmology analyses through cluster abundances detected in X-ray or SZ surveys rely heavily on the availability of external WL mass measurements (Mantz et al. 2015; Bocquet et al. 2019; Grandis et al. 2019). This procedure requires knowledge of cluster masses through WL surveys and introduces bias and scatter in the final cosmology contours if the survey data are not deep enough. Failing to account for these biases and selection differences may affect the final cosmology measurements (Ramos-Ceja et al. 2022). Recently, applications of new machine learning (ML) tools and methods on large astronomy datasets and numerical simulations have presented a promising way to reduce the scatter of such cluster mass calibrations using X-ray images (see Ntampaka et al. 2019; Green et al. 2019; Yan et al. 2020), SZ Compton y-maps (Cohn & Battaglia 2020; Wadekar et al. 2022b,a), and optical data (Ntampaka et al. 2015; Ho et al. 2019; Kodi Ramanah et al. 2020; Ho et al. 2021, 2022).
In this work, we present a method that avoids the explicit knowledge of these WL measurements by using X-ray data and the redshifts of clusters. In spirit, this is the same approach as applying existing scaling relations to a new cluster sample. To calibrate, or, put differently, to train our ML model, we use simulations, and the accuracy of these methods is determined by the cluster model in the training data. To apply this method reliably to new observations, we are interested in training our ML model with a realistic cluster sample, i.e. simulated clusters which represent our knowledge of clusters based on previous observations and which represent the observational setting. In comparison to standard scaling relations, this ML model is more flexible, as it can combine different features in a non-linear model. Furthermore, we consider models that utilize most of the available information, including the observation's energy and spatial information, rather than preprocessed features such as the luminosity (profiles) of a galaxy cluster. Given the success in other domains with similar data structures (such as computer vision tasks on images; Krizhevsky et al. 2017), natural candidates for such models are convolutional neural networks (CNNs). The potential of these methods for estimating galaxy cluster masses has been previously demonstrated in Ntampaka et al. (2019), where a reduced mass scatter compared to luminosity-based methods was reported. In this work, we modify these methods to address a cluster sample with a larger redshift (0.01 < z < 1.3) and mass range (10^13 < M/M_⊙ < 10^15). Additionally, we account for emission from other X-ray sources, e.g. active galactic nuclei (AGN), that are major contaminants in cluster analyses. Here, we present a method where additional filtering for such point sources is not required.
Finally, our neural network (NN) method incorporates a measure of uncertainty alongside the respective mass prediction. To estimate the uncertainty, we assume that the logarithm of our cluster masses is distributed according to an underlying Gaussian distribution with an associated mean and standard deviation. Both can be inferred using the log-likelihood associated with this normal distribution (cf. Section 3 for a detailed description). In addition, to account for the model uncertainty of our NN, we use a frequentist ensemble approach for our final mean and standard deviation. We train and validate our method on simulations of eROSITA galaxy clusters dedicated to the eFEDS observations (Comparat et al. 2020; Liu et al. 2022b; Seppi et al. 2022).
This allows us to apply our method to the eFEDS cluster sample (Liu et al. 2022a) and provide, for the first time, ML mass estimates for cluster observations. When comparing the performance of our mass estimates on the simulations with those obtained from WL-calibrated scaling relations using count-rate measurements (Chiu et al. 2022), we find a reduced scatter. Our results on simulations show a scatter similar to that obtained using idealised luminosity information.
The paper is organized as follows: In Section 2, we describe the respective data products used in this work. Section 3 describes our machine learning approach, and we discuss the results of our numerical work in Section 4. Our conclusions are presented in Section 5.
Throughout this paper, our simulated observations are obtained using a flat ΛCDM cosmology close to that of Planck Collaboration et al. (2020), with H_0 = 67.74 km s^−1 Mpc^−1, Ω_m = 0.3089, Ω_b = 0.048206, and σ_8 = 0.8147, as described in Comparat et al. (2020). Our masses M_500c refer to the mass enclosed in the region with a mean density of 500 times the critical density.

eROSITA X-ray and Simulated Observations
This section presents the data we have used to train and test our machine learning method. We restrict ourselves to the data corresponding to the performance verification mini-survey of eROSITA, eFEDS (Brunner et al. 2022), the data analysis pipeline (Liu et al. 2022b), and the corresponding eFEDS simulations (see Comparat et al. (2019) for the procedure on how AGNs are simulated, Comparat et al. (2020) for how galaxy clusters are painted for M_500c > 10^13.7 M_⊙, and Seppi et al. (2022) for the extension to lower masses, M_500c > 10^13 M_⊙).
We comment on the extension of our method to eROSITA All-Sky survey (eRASS) observations in our conclusions in Section 5.

eROSITA X-ray Images
The 140 deg^2 eFEDS field, designed as a performance verification survey, has a uniform depth of 2.2 ks (1.2 ks after correcting for vignetting), approximately equal to the depth of the final eROSITA All-Sky Survey (Brunner et al. 2022). In this field, a total of 542 cluster candidates were detected with an extent likelihood larger than six and a detection likelihood larger than five (see Liu et al. 2022a, for details). Of these 542 candidates, 477 galaxy groups and clusters were confirmed with follow-up optical data and redshift measurements (Klein et al. 2022). The clusters detected in the point source catalog are excluded from this analysis due to differences in the selection criteria (Bulbul et al. 2022). We use the subsample of 463 optically confirmed clusters which have WL-calibrated features in the range 10^13 M_⊙ < M_500 < 10^15 M_⊙. This selection is applied as it corresponds to the mass range on which our networks are trained, i.e. the cluster sample from the eFEDS simulations described subsequently.
To create X-ray images, we use the eROSITA Standard Analysis Software System (eSASS; Brunner et al. 2022), version eSASSusers_201009. The calibrated event lists are corrected for good time intervals, dead times, corrupted events and frames, and bad pixels. Images are generated in ten equally spaced energy intervals of 205 eV each in the soft band, covering the range 0.25–2.30 keV, using the eSASS tool evtool. Multiple energy bands are selected to maximize the information in the X-ray images, taking advantage of the superb soft-band sensitivity of eROSITA. We keep X-ray photons in a fixed square of 300 pixels (corresponding to 1200′′) centered on the X-ray centroid identified by eSASS.

eROSITA Simulated Images
The mock observations used in this study have the same exposure depth and field area as the eFEDS observations. A method developed in Comparat et al. (2020) and Seppi et al. (2022) is employed to generate the mock photons for our training, validation, and test sets. A full-sky dark matter-only simulation provides the halo sample. Based on the properties of the dark matter halos, the X-ray properties of the sources are painted on using a Gaussian process model that has been fit to previous cluster observations. These properties are then used to generate a source list passed to the SIXTE software (Dauser et al. 2019), which outputs the survey mock photons. It is worth stressing that these include not only cluster photons but also point sources.
Within the eFEDS simulation, 18 realizations of the same eFEDS field are created to have enough sources for statistical analysis (see Liu et al. 2022a). All realizations together contain 148 833 clusters, whereas a single realization contains approximately 8 000 clusters. To train our neural networks on a representative sample, we restrict ourselves to the same thresholds for cluster selection used for the eFEDS catalog (eSASS software version eSASSusers_211214). This gives a final sample of 7 947 clusters. We found that these selection criteria improved our ML performance compared to using more clusters by including those with smaller detection and extent likelihoods. As for the observations, we use X-ray photons in a fixed square of 300 pixels centered on the halo cluster center, noting no performance difference between using the simulated cluster center and the eSASS-detected center.

Neural network input datasets
The respective images are in a standard machine learning data format for images: a three-dimensional array where the first two dimensions carry the spatial information and the third dimension carries the "color" (here, energy) information. We modify the images to make our machine learning pipeline more efficient.
To render the input less sparse and to reduce memory requirements, we scale the boxes of size 300 × 300 × 10 down to 50 × 50 × 10, and we apply Gaussian smoothing in all three directions, including the energy direction. The respective formula can be found in Appendix A.
We do not remove background photons or identified point sources, but we clip the pixel values at 36 to avoid instabilities in our neural network training. An example of such an energy-band image (EBI) can be found in Figure 1. To resolve the ambiguity between a less luminous cluster at low redshift and a highly luminous cluster at high redshift, we also use the redshift information as input to our network. We optimized the spatial region and smoothing used for the EBI, ensuring that in almost all cases the entire cluster is visible in the image.
It is important to stress that these input images are independent of R_500c, as such a selection would automatically include information about the cluster mass.
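The preprocessing above can be sketched as follows. This is a minimal illustration with a hypothetical helper name `make_ebi` and an assumed isotropic Gaussian kernel width; the exact smoothing formula used in the paper is given in Appendix A.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def make_ebi(photon_cube, out_size=50, sigma=1.0, clip_max=36):
    """Build an energy-band image (EBI) from a raw photon cube.

    photon_cube: (300, 300, 10) array of photon counts
                 (two spatial axes, ten energy bands).
    sigma:       assumed smoothing scale applied in all three
                 directions, including the energy direction.
    """
    # Smooth spatially and along the energy axis to de-sparsify the input.
    smoothed = gaussian_filter(photon_cube.astype(float), sigma=sigma)
    # Downscale 300x300 -> 50x50 while keeping the ten energy bands.
    factor = out_size / photon_cube.shape[0]
    ebi = zoom(smoothed, (factor, factor, 1.0), order=1)
    # Clip pixel values at 36 to avoid training instabilities;
    # background and point sources are deliberately NOT removed.
    return np.clip(ebi, 0.0, clip_max)

# Usage: a Poisson-noise cube stands in for a real event image.
ebi = make_ebi(np.random.poisson(0.1, size=(300, 300, 10)))
```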
The respective mass and redshift distributions of clusters in the eFEDS simulations and in our eSASS-selected sample are shown in Figure 2. From all realizations of the simulations, we end up with 7 947 clusters, of which we use 70% for our training, 15% for our validation, and 15% for our test set.

Machine Learning Method
As a proof of concept, we utilize a standard architecture of convolutional and pooling layers followed by at least one dense layer. We provide the network with the source's redshift at the first of these dense layers. To avoid overfitting, we utilize preprocessing layers that perform random rotations and flips during training, efficiently augmenting the training dataset; we found that this significantly improved performance. To enable stable training for a large variety of hyperparameters, we first trained our networks using several standard regression loss functions (e.g. the mean squared error of the logarithmic mass). A few hundred epochs of training are typically sufficient. To assess the performance beyond the loss values, we check the mass scatter on the respective training and validation sets for additional biases. More details on our choices and associated scans can be found in Appendix B.
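The architecture described above can be sketched as follows. This is a minimal PyTorch illustration with assumed layer sizes (the paper's actual hyperparameters are listed in Figure 3 and Appendix B), and `ClusterMassNet` is a hypothetical name; random flips and rotations of the EBI would additionally be applied during training.

```python
import torch
import torch.nn as nn

class ClusterMassNet(nn.Module):
    """Conv/pool blocks on the 50x50x10 EBI, with the redshift
    concatenated at the first dense layer. Outputs the mean and
    log-variance of the Gaussian over log M500."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(10, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 50 -> 25
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 25 -> 12
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 12 * 12 + 1, 128), nn.ReLU(),  # +1 for the redshift
            nn.Linear(128, 2),  # (mu, log sigma^2)
        )

    def forward(self, ebi, z):
        x = self.features(ebi)            # ebi: (batch, 10, 50, 50)
        x = torch.cat([x, z], dim=1)      # inject redshift at the first dense layer
        return self.head(x)

out = ClusterMassNet()(torch.zeros(4, 10, 50, 50), torch.zeros(4, 1))
```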
In our hyperparameter scan, we identify several promising architectures, which we analyze further.
In particular, we train our network from scratch using a negative log-likelihood for each data sample to predict the mean and standard deviation of a Gaussian,

$-\log p_\theta(y_n \mid x_n) = \frac{(y_n - \mu_\theta(x_n))^2}{2\sigma_\theta(x_n)^2} + \log \sigma_\theta(x_n) + \mathrm{const.}\,,$ (1)

where x_n denotes the data, y_n the data label, and µ_θ, σ_θ are our predictions, which depend on the neural network parameters θ. The relevant hyperparameters and the training curve for one of our well-performing models can be found in Figure 3. Finally, to address the systematic uncertainty in the neural network prediction, we opt for an ensemble method as presented in Lakshminarayanan et al. (2016) and leave Bayesian approaches for the future (Gal & Ghahramani 2015). In practice, we repeat the training procedure with N random weight initializations, and the final predictions are calculated via

$\mu_* = \frac{1}{N}\sum_{i=1}^{N} \mu_i\,, \qquad \sigma_*^2 = \frac{1}{N}\sum_{i=1}^{N}\left(\sigma_i^2 + \mu_i^2\right) - \mu_*^2\,.$ (2)

We now turn to a discussion of the results obtained using this approach for mass estimation.
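The likelihood loss and the ensemble aggregation can be written compactly; the combination rule below follows the deep-ensemble prescription of Lakshminarayanan et al. (2016), treating the ensemble as a mixture of Gaussians.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Per-sample negative log-likelihood of the log-mass y
    under a Gaussian with predicted mean mu and std sigma."""
    return 0.5 * np.log(2.0 * np.pi * sigma**2) + (y - mu)**2 / (2.0 * sigma**2)

def ensemble_predict(mus, sigmas):
    """Combine N network predictions (arrays of shape (N, ...)) into
    the mixture mean mu_* and mixture standard deviation sigma_*."""
    mu_star = np.mean(mus, axis=0)
    var_star = np.mean(sigmas**2 + mus**2, axis=0) - mu_star**2
    return mu_star, np.sqrt(var_star)
```

Note that the mixture variance includes both the average predicted variance and the spread of the individual means, so disagreement between ensemble members inflates the final uncertainty.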

Results
To analyze the performance of our neural networks on the eFEDS simulations, we first compare the predicted and actual mass distributions in the simulations. As shown in Figure 4, the predictions on the test set and their scatter follow the ideal slope very closely. We observe a scatter of σ = 0.188 on the test set.
Our mean error prediction matches this value, with a mean uncertainty of σ* = 0.188. As shown in Figure 5, we observe a bias in the mass range 13.0 < log M_500/M_⊙ < 13.5, where we over-predict the mass on average, and in the range 14.5 < log M_500/M_⊙ < 15.0, where we under-predict it. To interpret these biases, we performed two experiments:

1. To improve the quality of our training and test samples, we train with a cluster sample that has detection and extent likelihoods larger than 60. This reduces the scatter on the test set to σ = 0.159 using the same likelihood cuts. In addition, we see a reduction of the bias for high-mass objects. Clearly, this cut reduces the number of available clusters significantly (from 7947 to 1156 in total). It is encouraging that our mean uncertainty also reduces, to σ* = 0.158.
2. To change the relative number of clusters at high and low masses, we weight our samples to effectively generate a uniform mass distribution during training. This ensures, in particular, that the network is more strongly penalized when falsely predicting high-mass clusters. We find that the scatter is slightly increased, but the bias for the high-mass clusters is reduced from −0.177 to −0.097 and for the low-mass clusters from 0.121 to 0.082. This is encouraging, as we only approximately know the observed distribution of cluster masses, and our method should be able to compensate for small differences in the distribution.
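The reweighting in the second experiment can be sketched as follows; this is a simple histogram-based scheme under assumptions of our own (the paper does not specify the exact binning), with the hypothetical helper name `uniform_mass_weights`.

```python
import numpy as np

def uniform_mass_weights(log_masses, n_bins=20):
    """Per-sample training weights that flatten the mass distribution:
    each cluster is weighted inversely to the population of its mass bin,
    so under-represented (e.g. high-mass) clusters are penalized more
    strongly when mispredicted."""
    counts, edges = np.histogram(log_masses, bins=n_bins)
    idx = np.clip(np.digitize(log_masses, edges) - 1, 0, n_bins - 1)
    w = 1.0 / counts[idx]
    return w * len(log_masses) / w.sum()  # normalize to a mean weight of 1
```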
These scatters in the mass predictions have to be compared with the underlying probabilistic cluster model and with the application of scaling relations. First, there is the intrinsic scatter in the data: in our case, the luminosity-mass scaling relation has a scatter of σ = 0.2 (cf. Figure 6 in Comparat et al. (2020)) and the temperature-mass scaling relation has σ = 0.07. We see that the scatter of our method depends on the quality of the dataset, i.e. when selecting clusters with high detection and extent likelihoods we reduce the scatter below that of the luminosity scaling relation. Next, when comparing our scatter with scaling relations, a natural caveat is whether the respective scaling relation provides results similar to a scaling relation calibrated on this cluster sample, e.g. using WL observables. To address this, we utilize scaling relations which have been calibrated for the eFEDS cluster sample. When using the cluster luminosities and the luminosity-mass scaling relation calibrated in Chiu et al. (2022) (cf. their Eq. 67), we recover a scatter of σ = 0.197 on our test dataset with detection likelihood larger than 5 and extent likelihood larger than 6, where we have used the actual luminosities in the simulation. This is comparable to the luminosity scatter in the simulation and 4.8% larger than the scatter we observe for our NN masses. When applied to the higher-quality cluster sample, the scatter reduces to σ = 0.186 but remains significantly above the NN scatter. To apply these scaling relations, we have used an appropriate selection function based on the cluster luminosity and redshift, although we note only a small effect on the ensemble level. In this analysis, the luminosity is normalized with a factor involving the evolution factor E(z) = H(z)/H_0, the pivotal mass M_piv = 1.4 × 10^14 M_⊙, and the pivotal redshift z_piv = 0.35. The scaling-relation parameters as calibrated in that analysis are δ_LX = −0.07, C_SS,LX = 2, and γ_LX = −0.51. For this scaling-relation analysis, a fiducial flat ΛCDM cosmology was used with H_0 = 70 km s^−1 Mpc^−1, Ω_m = 0.3, Ω_b = 0.05, σ_8 = 0.8, and n_s = 0.95. Further, to provide an outlook on the inference of cosmological parameters, the scatter is worse when using the count-rate scaling relation with the measured count-rate on the data with detection likelihood larger than 5 and extent likelihood larger than 6, where we find σ = 0.265. Note that this is without applying the selection function, which, in light of its effect on the luminosity scaling relation, appears to change the mass predictions for this sample only mildly; such a selection function is currently not available for this sample. A more detailed comparison of our systematic uncertainties with those of the scaling relations between count-rate and weak-lensing mass, as discussed in Grandis et al. (2021), is left for the future.
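The evolution factor entering the luminosity normalization can be evaluated directly from the fiducial cosmology quoted above:

```python
import numpy as np

OMEGA_M = 0.3  # fiducial flat LCDM of the scaling-relation analysis

def e_of_z(z):
    """Dimensionless Hubble rate E(z) = H(z)/H0 for a flat LCDM cosmology."""
    return np.sqrt(OMEGA_M * (1.0 + z)**3 + (1.0 - OMEGA_M))

# e.g. the evolution-factor ratio relative to the pivotal redshift z_piv = 0.35
ratio = e_of_z(0.5) / e_of_z(0.35)
```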
For both scaling relations we find a significant reduction in scatter. The amount of reduction depends significantly on the data used for training, and not only on the mass and redshift distribution.
One further advantage of the NN-based mass estimation is that the trained networks use the full morphological information of the input clusters in the X-ray images (Ghirardini et al. 2022), compared to other methods, and are not impacted by line-of-sight structure or by the assumed 3D morphology of the source when estimating masses (ZuHone et al. 2022), nor by the hydrostatic mass bias that is often a problem for X-ray mass measurements (Scheck et al. 2023).
We note that the predicted means and the respective standard deviations do not vary strongly on an ensemble level. In particular, we observe that the ratio of the individual σ-values for each network to the corresponding ensemble prediction σ* is σ/σ* = 0.951 ± 0.039, where we quote single standard-deviation values and average over all clusters in the test sample. On an individual level, we report the clusters with the highest and smallest differences in the predicted masses (see Figure 7). Upon visual inspection, we often find that the largest differences in the predicted masses occur when other bright X-ray sources are present in the EBI and our cluster of interest is a less luminous source.

Fig. 6: NN on eFEDS observed data: Comparison of WL-calibrated mass estimates (using the luminosity scaling relation of Eq. (67) of Chiu et al. (2022)) and masses obtained from our ensemble of neural networks. Left: Respective mass predictions on eFEDS clusters. The uncertainties in the mass predictions are color-coded and correspond to the NN uncertainties. Right: Correlation between predicted masses on eFEDS observations and the measured luminosities as presented in Chiu et al. (2022).
Having seen that our method provides sensible mass estimates on the eFEDS simulations, we now estimate the masses for the eFEDS cluster sample with extent likelihood larger than 6 and detection likelihood larger than 5. We use the ensemble of neural networks trained on data with the same selection criteria. We show the scatter between our NN-predicted masses and the masses obtained using WL-calibrated luminosity scaling relations in Figure 6. We observe that both predictions agree for clusters where our NN ensemble predicts a low uncertainty, σ* < 0.185 (the more visible points correspond to clusters with such a low uncertainty). We note that for the few clusters in the eFEDS cluster sample with masses below the range we have trained on, our neural network ensemble still predicts masses in the regime it was trained on and does not generalize outside of the known regime.
Further, to compare our predicted masses with the luminosity-based mass estimates of the clusters, we show the scatter between the luminosity and our mass estimates on the right of Figure 6. Overall, we find a linear relation close to the slope identified via the WL-calibrated scaling relation. However, we find deviations from the WL-calibrated masses at high masses. Further analysis of the features used by the NN ensemble, ultimately aiming at a data-driven scaling relation, is beyond the scope of this paper.

Conclusions
We have demonstrated that galaxy cluster masses can be estimated using NN ensemble predictions applied to the eFEDS field of eROSITA, both on the respective simulations and on actual observations. Depending on the training data, we observe a significant reduction in scatter in comparison to luminosity-based scaling relations: from σ = 0.186 to σ = 0.159 on a sample with higher detection and extent likelihoods, and from σ = 0.197 to σ = 0.188 on the entire sample. Compared to count-rate-based scaling relations, the improvement is from σ = 0.265 to σ = 0.188. Our approach is applicable to clusters at different redshifts, and we are not required to remove other clusters or point sources from the respective images, mimicking a realistic observational set-up. Going beyond existing NN methods for cluster mass estimation, our method provides uncertainty measurements of the NN-predicted masses for each cluster. Our ML approach can be integrated into a highly developed workflow for estimating cluster masses and their subsequent use for cosmological parameter inference. The interplay with each of these components is important to understand shortcomings and potentials for improved mass estimates in the future:

- WL and other additional measurements: Given the dependence on our simulations of eFEDS clusters, our NN methods do not additionally require WL measurements for (a subset of) the X-ray selected cluster sample. Any constraints, e.g. for a subsequent cosmological analysis, arising from the required availability of WL information can be circumvented. Our model can be easily expanded and improved by adding new features and observations, similar to adding the redshift information to the network. For instance, richness information from optical observations of clusters of galaxies, as being developed in the context of Euclid, promises to improve the NN predictions at the high-mass end (e.g. Euclid Collaboration et al. 2022). As in any other model, adding new multi-wavelength data requires appropriate calibration and will be pursued in future work.
- Simulations: As our ML approach is learned from the underlying data model, it heavily depends on the data used for training. To make our method work, it is crucial that the training data is of sufficient quality. This requires, for instance, that the training data contains clusters in the appropriate mass regime and that the clusters in the training sample are ideally very close to the cluster sample the method is applied to. At this stage, a generalization beyond properties captured by the training data is not guaranteed. Throughout this project, we often encountered performance deterioration when including different cluster samples for training. Addressing the independence of the training data is a clear future goal but, as demonstrated here, can be circumvented by utilizing a dedicated training set. Implicitly, our method depends on the data used to shape the simulations and in particular on the underlying scaling relations. However, we crucially observe that the mass distribution of clusters in the training sample is not of high importance, as showcased when using a uniform distribution of masses (Figure 8). This is particularly encouraging, as it allows for generalization across the different mass distributions of different cosmologies.
- ML vs. known astrophysical features: There are two approaches for predicting masses using ML: either using known astrophysical features, e.g. the measured luminosities or count-rates, as the input (cf. Green et al. (2019)), or using the photon information directly. Here, we explore the latter and demonstrate that it provides competitive mass estimates. Future studies will provide more information on which method ultimately predicts the most accurate mass estimates. It would be very interesting to compare the ML features with previously identified features (e.g. using appropriate dimensional reduction and symbolic regression; see Wadekar et al. (2022b) for work in this direction).
Finally, we summarize the advances of the ML-based mass predictions presented here:

- We demonstrate, for the first time, that meaningful uncertainty measures can be provided alongside ML-based X-ray cluster mass estimates, in particular with neural networks. This is a crucial requirement for integrating ML-based methods into cosmological analyses with cluster counts.
- As our simulations also include clusters with masses as low as 10^13 M_⊙, we are able to demonstrate for the first time that this neural network approach to X-ray cluster mass estimates also works in this mass regime without introducing large biases. A further successful extension to the low-mass regime would be very interesting and could dramatically increase the sample utilized for cosmology; we found that objects with high detection and extent likelihoods provide an avenue forward.
- Instead of a single channel that can only capture the total number of photons, we utilize an input format that also captures the energy information of the photons. Our EBIs enable the neural networks, at least in principle, to utilize energy-dependent information such as the cluster temperature.
One of the immediate next objectives is to apply our method to other eROSITA cluster samples, particularly the upcoming All-Sky Survey data. To make our method applicable to these observations, we need to ensure that the performance does not decrease due to the different exposure times of individual clusters in those samples, as the first All-Sky Survey data is shallower than the eFEDS data used in this work. A further extension to observations from other X-ray telescopes (e.g. XMM-Newton, Chandra), despite being very interesting, would require dedicated datasets to train the ML method appropriately.
As this paper was in its final stage, the preprint Ho et al. (2023), which discusses a similar question, appeared on astro-ph. Our approach differs from Ho et al. (2023) in the use of the simulation datasets for training: the sample of simulated clusters used in this work represents the eROSITA cluster selection. Our method can hence be applied to the eROSITA survey observations and compared with the observational WL mass measurements utilized for the same sample, self-consistently. Through this work, we also provide a clear path toward using ML-based masses in cosmological analyses. Additionally, we successfully utilize a likelihood loss for the first time, enabling uncertainty estimates, a prerequisite for employing ML-based masses in future scaling-relation and cosmology analyses. The eROSITA data shown here were processed using the eSASS software system developed by the German eROSITA consortium.

Fig. 1: An example X-ray image as input to our neural networks, showing all ten bands of the image of a galaxy cluster (SRC_ID: 10006566 from realization 5) of the eFEDS simulations. Each image has a dimension of 50 × 50. Due to our smoothing, the photon values are continuous. This cluster has a mass of log(M_500/M_⊙) = 15 and is located at a redshift of 0.11.

Fig. 2: The simulated cluster sample, with and without the applied filters used for training and evaluation, with the respective redshift and mass distributions. The total number of simulated and filtered clusters is 25031 and 7947, respectively.

Fig. 3: The relevant hyperparameters and the training curve for one of our well-performing models.

Fig. 4: NN on simulations: Overview of mass estimation on eFEDS simulations using our ensemble of 30 convolutional neural networks trained with our likelihood loss from Eq. (1) (cf. Figure 3 for the hyperparameters). Left: The mass scatter between predicted mean masses and simulation masses on the training set; the colors indicate our predicted standard deviation. Middle: Our mean mass predictions on the test set. Right top: The distribution of our error log_10(µ*/M_⊙) − log_10(M^true_500c/M_⊙). Right bottom: The distribution of our error estimates, which shows a mean uncertainty of 0.189 on the test set.

Fig. 5: NN biases on simulations: The mass predictions on eFEDS simulation test sets for different mass ranges, to evaluate the respective biases between NNs trained on the three different datasets described in the main text.

Fig. 7: All ten channels of the EBI of the galaxy clusters with the highest and lowest ∆µ = µ_max − µ_min, where µ_max (µ_min) corresponds to the highest (smallest) mass predicted by the ensemble members. Top: This cluster, located as usual at the center of the EBI, has a mass of log M_500/M_⊙ = 13.959 and is at redshift z = 0.211 in the simulations (object with SRC_ID: 10003763 from realization 18). The lowest and highest predicted mean values in our ensemble are µ_min = 13.323 and µ_max = 14.587, respectively. For illustration purposes, to make the actual cluster visible, we have clipped the photon values at 1. The final mass prediction is log M_500/M_⊙ = 14.079 ± 0.305. Bottom: An example of a cluster with little difference among the ensemble (SRC_ID: 10006556 from realization 9); it has a mass of log M_500/M_⊙ = 13.873 and is at redshift z = 0.305. We obtain µ_max = 14.095 and µ_min = 13.994, and our final prediction for this cluster is log M_500/M_⊙ = 14.042 ± 0.161.

Fig. 8: NN trained on a uniform cluster sample: Overview of mass estimation on eFEDS simulations using our deep ensemble trained with class weights such that the effective mass distribution is close to uniform, illustrating the robustness to uncertainty in the underlying mass distribution. Left: The mass scatter between predicted means and simulation masses on the training set; the colors indicate our predicted standard deviation. Middle: Our mass predictions on the test set. Right: Comparison of the residuals and the distribution of standard deviations in our test-set mass predictions between our normal training procedure, with masses distributed as in Fig. 2, and our weighted clusters. The distribution of our error estimates shows a mean uncertainty of 0.228.
E.B., A.L., V.G., X.Z., and C.G. acknowledge financial support from the European Research Council (ERC) Consolidator Grant under the European Union's Horizon 2020 research and innovation programme (grant agreement CoG DarkQuest No 101002585).