CHEX-MATE: A non-parametric deep learning technique to deproject and deconvolve galaxy cluster X-ray temperature profiles



Introduction
Galaxy clusters are ideal probes of the large-scale structure of the Universe (Holder et al. 2001; Planck Collaboration XXIV 2016; Bocquet et al. 2015; Sereno et al. 2017; Abbott et al. 2022). X-ray observations of the hot gas in the intracluster medium (ICM), which constitutes the dominant baryonic component in galaxy clusters, provide us with a useful tool for identifying and studying these objects. Shallow, wide-field X-ray surveys by ROSAT (ROentgen SATellite) and eROSITA (extended ROentgen Survey with an Imaging Telescope Array) have now discovered thousands of clusters (e.g. Piffaretti et al. 2011; Klein et al. 2022, and references therein). In recent years, the detailed X-ray follow-up of samples extracted from these surveys has exploited the high spatial resolution of Chandra and the large field of view and sensitivity of the X-ray Multi-Mirror Mission (XMM-Newton) to investigate the morphological, structural, and scaling properties of the cluster population (e.g. Lovisari & Maughan 2022; Kay & Pratt 2022, and references therein).
The X-ray-derived radial temperature and density profiles are key ingredients for deriving the thermodynamic properties of the ICM and, under the assumption of hydrostatic equilibrium, the total mass profile in galaxy clusters (Böhringer et al. 2007; Pratt et al. 2010; Ettori et al. 2010, 2013; Eckert et al. 2022). These X-ray studies have revealed the presence of two distinct types of clusters: cool cores (CCs), characterised by dense, low-temperature cores, and non-cool cores (NCCs), which exhibit a relatively flat central density and temperature. Various morphological parameters have been introduced to analyse X-ray images and to link these to the dynamical behaviour of galaxy clusters and to the presence or absence of a low-temperature core, providing insights into their structural characteristics, internal dynamics, and evolutionary stages (Rasia et al. 2013; Campitiello et al. 2022). Although it is now well established that active galactic nucleus (AGN) feedback plays a major role in suppressing ICM cooling in cluster cores, the reason for the CC and NCC dichotomy is still not fully understood (Rasia et al. 2015; Barnes et al. 2018).
X-ray observations give access to the projected (2D) density and temperature profiles of the ICM. The latter is obtained from fitting a thermal model to the spectra extracted in concentric annuli about a given centre (usually the X-ray peak or centroid). For further scientific applications, these must then be deprojected to obtain the 3D profiles. If needed, the effect of the instrumental point spread function (PSF) can be taken into account in the deprojection step. While the deprojected (3D) gas density in shells can be easily estimated from the X-ray surface brightness (Croston et al. 2006; Bartalucci et al. 2017; Ghirardini et al. 2019a), the deprojection of ICM temperature profiles is not trivial. This is partly due to the need for sufficient photon counts to build and fit the spectrum, leading to the temperature profiles having significantly coarser angular resolution than the density.
The relationship between the observed 2D temperature profile, $T_{\rm 2D}$, and the originating 3D temperature profile, $T_{\rm 3D}$, can be expressed in matrix form as
$$T_{\rm 2D} = C_{\rm PSF} \otimes C_{\rm proj} \otimes T_{\rm 3D}, \tag{1}$$
where $\otimes$ denotes the matrix product. Assuming a cluster is spherically symmetric and that the 3D temperature profile is defined in concentric spherical shells, the $(i, j)$th element of the matrix $C_{\rm proj}$ encodes the projection effect of the $j$th 3D shell onto the $i$th 2D annulus on the plane of the sky. The 2D annuli may have the same or different radii to the 3D shells. We note that $C_{\rm PSF}$ is a second matrix that describes the effect of the finite instrumental PSF. Its $(k, i)$th element contains the fraction of counts from the $i$th 2D annulus that are redistributed by the telescope into the $k$th observed 2D annulus. If there are $n$ model 3D shells, and correspondingly $n$ model 2D annuli, plus $m$ observed annuli, then the dimensions of $C_{\rm proj}$ and $C_{\rm PSF}$ are $n \times n$ and $m \times n$, respectively. If the PSF is ignored, then the dimensions of $C_{\rm proj}$ should change to $m \times n$.
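The matrix form of Eqn. 1 can be illustrated with a minimal sketch. The matrix entries below are made-up numbers, and normalising each row to unit sum (so that every row acts as a weighted average of temperatures) is an assumption of this sketch rather than a prescription from the text; the helper names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def normalise_rows(matrix):
    """Scale each row to unit sum so that it acts as a weighted average."""
    return [[x / sum(row) for x in row] for row in matrix]


# Illustrative values only: n = 3 model shells/annuli, m = 2 observed annuli.
# C_proj is n x n and upper triangular: annulus i only sees shells j >= i.
c_proj = normalise_rows([[1.0, 0.6, 0.2],
                         [0.0, 1.0, 0.5],
                         [0.0, 0.0, 1.0]])
# C_PSF is m x n: it mixes the n model annuli into the m observed annuli.
c_psf = normalise_rows([[0.8, 0.3, 0.1],
                        [0.2, 0.7, 0.9]])


def forward_model(t_3d):
    """Apply T_2D = C_PSF (C_proj T_3D) to a 3D temperature vector."""
    return matvec(c_psf, matvec(c_proj, t_3d))


t_2d = forward_model([3.0, 5.0, 4.0])       # hypothetical temperatures in keV
isothermal = forward_model([4.0, 4.0, 4.0])  # stays isothermal after projection
```

With row-normalised matrices an isothermal 3D profile maps onto an isothermal 2D profile, which is a convenient sanity check for any implementation of the forward operator.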
The fitting of projected parametric models of the 3D temperature profiles to both observed and simulated 2D data has been widely used in the literature (De Grandi & Molendi 2002; Pizzolato et al. 2003; Ascasibar & Diego 2008; Bulbul et al. 2010; Gaspari et al. 2012; Ghirardini et al. 2019b). Initially, these were polytropic models that assumed a simple relationship between the density and the temperature distribution ($T \propto \rho^{\gamma-1}$), but this does not fully capture all the complexities of real galaxy clusters, especially the central regions of CC clusters. The quality of recent data has necessitated more complicated models, perhaps the most widely used being that proposed by Vikhlinin et al. (2006):
$$T_{\rm 3D}(r) = T_0\,\frac{x + \tau}{x + 1}\,\frac{(r/R_t)^{-a}}{\left[1 + (r/R_t)^{b}\right]^{c/b}}, \tag{2}$$
where $x = (r/R_{\rm cool})^{a_{\rm cool}}$ and $\{T_0, \tau, R_t, R_{\rm cool}, a_{\rm cool}, a, b, c\}$ are the model parameters. In the framework of the Representative XMM-Newton cluster structure survey (REXCESS; Böhringer et al. 2007) and Following the most massive galaxy clusters over cosmic time (M2C; Bartalucci et al. 2019) projects, Démoclès et al. (2010) and Bartalucci et al. (2018) developed a non-parametric-like deconvolution approach. In these studies, the Vikhlinin et al. (2006) parametric model was used to perform the PSF correction and deprojection in order to estimate the temperature at the weighted radii of the 2D annular binning scheme. The 3D uncertainties were then computed consistently from the 2D errors, and random temperatures were drawn within these uncertainties to compute the temperature derivatives, which were used in the hydrostatic equilibrium total mass computations. However, parametric approaches are not fully satisfactory since, with a limited number of parameters, a single model can fail to capture features in the temperature profile due to shock fronts, edges, mergers, and the presence of cool cores. Moreover, a high degree of degeneracy between the parameters could be present. The Vikhlinin et al. (2006) parametric temperature model, which was developed for cool core systems, is a complex eight-parameter model, four of which correspond to the cool-core component. It is therefore not well suited to highly disturbed NCC clusters, which have flatter central temperature profiles instead of declining cool cores. Furthermore, for typical X-ray data quality, it exhibits a high degree of degeneracy between its parameters, leading to poorly constrained model parameters and to results that depend on the prior choices in MCMC fitting schemes. Recently, Gianfagna et al. (2021), using a sample drawn from high-resolution numerical simulations, found that the Vikhlinin et al. (2006) parametric model could only provide a good fit to 50% of their sample in the range [0.1-1] $R_{500}$.
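For reference, the eight-parameter model of Vikhlinin et al. (2006) can be written as a short function. This is a minimal sketch: the functional form is the standard published one, $T(r) = T_0\,\frac{x+\tau}{x+1}\,\frac{(r/R_t)^{-a}}{[1+(r/R_t)^b]^{c/b}}$ with $x = (r/R_{\rm cool})^{a_{\rm cool}}$, while the function name, argument order, and numerical values are purely illustrative.

```python
def vikhlinin_temperature(r, t0, tau, r_t, r_cool, a_cool, a, b, c):
    """3D temperature at radius r for the Vikhlinin et al. (2006) model.

    r, r_t and r_cool must share the same units (e.g. R_500); t0 sets the
    temperature scale and tau = T_min / T_0 controls the central drop.
    """
    x = (r / r_cool) ** a_cool
    cool_core = (x + tau) / (x + 1.0)                    # central cool-core term
    outer = (r / r_t) ** (-a) / (1.0 + (r / r_t) ** b) ** (c / b)  # outer decline
    return t0 * cool_core * outer


# Illustrative parameter values only (not fitted to any cluster).
flat = vikhlinin_temperature(1.0, 5.0, 1.0, 1.0, 0.1, 2.0, 0.0, 2.0, 0.0)
core = vikhlinin_temperature(1e-6, 5.0, 0.5, 1.0, 0.1, 2.0, 0.0, 2.0, 0.0)
```

Setting `tau = 1` and `a = c = 0` switches off both the cool-core and the outer terms, leaving an isothermal profile; four of the eight parameters (`tau`, `r_cool`, `a_cool`, and the normalisation interplay with `t0`) act mainly on the core, which is why the model is poorly constrained for NCC systems.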
Model-independent direct spectral deprojection methods offer an alternative and are commonly used to deconvolve the 3D temperature profiles. This can involve the onion-skin technique (Fabian et al. 1981; David et al. 2001; Johnstone et al. 2005; Russell et al. 2008; Lakhchaura et al. 2016), where the 3D layers are successively built up from the outside in. However, this approach is strongly dependent on the choice of the outermost bin because it is necessary to take into account the contribution to the emission from the shells outside the outermost annulus used for the analysis. Alternatively, isothermal models can be fitted to each annular spectrum and then the matrix method (i.e. Eqn. 1) can be used to deproject (e.g. Ettori et al. 2002). Ignoring the PSF effect, and assuming that the observed projected spectra consist of a linear combination of isothermal emission models weighted by the projected emission measure, the equation for temperature profiles simplifies to
$$T_{{\rm 2D},k} = \frac{\sum_j w_{k,j}\,T_{{\rm 3D},j}}{\sum_j w_{k,j}}. \tag{3}$$
Here, $T_{{\rm 3D},j}$ and $T_{{\rm 2D},k}$ are the 3D and 2D temperatures at the $j$th 3D spherical shell and $k$th 2D observed annulus, respectively, and the weights, $w_{k,j}$, consist of the emission measure contribution of the spherical shells onto the observed annuli (e.g. Mathiesen & Evrard 2001). However, such model-independent approaches are often unstable if the data are noisy because Eqn. 1 is an inverse problem, meaning that any noise becomes greatly amplified by the deconvolution procedure. In addition, the simplistic emission measure weighting has been found to be inaccurate when applied to X-ray observations. In particular, it has been demonstrated that in the presence of multi-temperature gas, $w$ is more appropriately expressed as a non-linear combination of density and temperature (Mazzotta et al. 2004; Vikhlinin 2006), further complicating the deconvolution procedure.
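The emission-measure weighting can be sketched as follows, under several assumptions that are not taken from the text: the 2D annuli share the radii of the 3D shells, the gas density is constant within each shell, and no emission is modelled beyond the outermost shell (precisely the limitation of onion-skin methods noted above). The function names and numbers are hypothetical; the shell-annulus intersection volume is the standard geometric expression for a spherical shell cut by a cylindrical annulus.

```python
import math


def shell_annulus_volume(r_in, r_out, big_r_in, big_r_out):
    """Volume of the shell [r_in, r_out] seen through the annulus [big_r_in, big_r_out]."""
    def term(r, big_r):
        return max(r * r - big_r * big_r, 0.0) ** 1.5
    return (4.0 * math.pi / 3.0) * (
        term(r_out, big_r_in) - term(r_out, big_r_out)
        - term(r_in, big_r_in) + term(r_in, big_r_out))


def project_temperature(edges, density, t_3d):
    """Emission-measure-weighted 2D temperatures, w_kj = n_j^2 V_kj."""
    n = len(t_3d)
    t_2d = []
    for k in range(n):
        weights = [density[j] ** 2
                   * shell_annulus_volume(edges[j], edges[j + 1], edges[k], edges[k + 1])
                   for j in range(n)]
        t_2d.append(sum(w * t for w, t in zip(weights, t_3d)) / sum(weights))
    return t_2d


# Illustrative values: three shells with edges in units of R_500.
edges = [0.1, 0.3, 0.6, 1.0]
density = [3.0, 2.0, 1.0]
t_3d = [3.0, 5.0, 4.0]
t_2d = project_temperature(edges, density, t_3d)
```

Because only the outermost shell intersects the outermost annulus in this setup, the last 2D temperature equals the last 3D temperature, while the inner annuli are contaminated by all overlying shells.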
Machine learning (ML) techniques have emerged as powerful tools for predicting key features of data and for solving inverse problems to reconstruct (deconvolve) signals, images, etc., from observations. ML techniques have also been applied to the study of galaxy clusters. Ntampaka et al. (2015) developed an ML algorithm based on Support Distribution Machines to reconstruct dynamical cluster masses using the velocity distribution of cluster members from simulations, achieving a reduction in the scatter between the predicted and true mass by a factor of two compared to standard methods. More complex ML approaches have led to similarly significant improvements in the mass estimates (Armitage et al. 2019; Calderon & Berlind 2019).
Using deep learning techniques, Convolutional Neural Network (CNN) models have also been used to infer the dynamical mass of galaxy clusters (Ho et al. 2019; Ramanah et al. 2020; Ho et al. 2021; de Andres et al. 2022). In particular, Yan et al. (2020) used mock datasets of stellar mass, soft X-ray flux, bolometric X-ray flux, and Compton y-parameter images as input to train a CNN model to infer the mass of galaxy clusters, and Gupta & Reichardt (2020, 2021) trained CNN models to estimate cluster masses using mock SZ and cosmic microwave background (CMB) lensing maps. Ferragamo et al. (2023), using a combination of an auto-encoder and a random forest regression technique on a sample of 73,138 mock Compton-y parameter maps from the hydrodynamical simulations of the Three Hundred Project (Cui et al. 2018), were able to reconstruct the 3D gas mass profile and total mass in galaxy clusters with a scatter of about 10% with respect to the true values. de Andres et al. (2022) and Ho et al. (2022) ...
In this work, we show the first use of neural networks, trained on numerical simulations, to deproject the X-ray temperature profiles of galaxy clusters. Our technique is based on that proposed by Bobin et al. (2019, 2023), where a so-called Interpolatory Autoencoder (IAE) neural network is built to model the 3D temperature profiles by learning a non-linear interpolatory scheme from a limited set of example profiles called 'anchor points'. The main advantage of the IAE neural network is that it is able to capture the intrinsic low-dimensional, non-linear nature of the profiles even when the training sample is not large. This is crucial, as a small sample size can otherwise pose several challenges to the effectiveness of a deep learning algorithm. The model is trained and tested with a set of 315 simulated temperature profiles, in the radial range of [0.02-2] $R_{500}$, from the Three Hundred Project (Cui et al. 2018). A robust temperature deconvolution scheme, which makes use of an efficient regularisation term in the likelihood along with Markov chain Monte Carlo (MCMC) sampling, is then introduced to fit the trained IAE model. The technique is then applied to a pilot sample of X-ray temperature profiles from the CHEX-MATE project (Cluster HEritage project with XMM-Newton: Mass Assembly and Thermodynamics at the Endpoint of structure formation; CHEX-MATE Collaboration 2021).
The paper is organised as follows. Section 2 discusses in detail the simulations used in training the IAE model for temperature profiles. In Section 3 we present the IAE model, and Section 4 deals with model training and the learning-based deconvolution technique. The performance of the deconvolution algorithm is tested with simulations in Section 5, while in Section 6 we apply our approach for the first time to a representative sample of 28 galaxy clusters from the first data release (DR1 hereafter, Rossetti et al. 2023, in prep.) of the CHEX-MATE sample. Finally, in Section 7, we summarise our work. Throughout this work, we adopt a flat ΛCDM model with $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_{\rm m} = 0.3$ and $\Omega_\Lambda = 0.7$. Further, $E(z)$ is the ratio of the Hubble constant at redshift $z$ to its present value, $H_0$, and $h_{70} = H_0/(70\ {\rm km\,s^{-1}\,Mpc^{-1}}) = 1$.
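For the adopted flat ΛCDM cosmology, $E(z)$ takes the standard form $E(z) = \sqrt{\Omega_{\rm m}(1+z)^3 + \Omega_\Lambda}$ (radiation neglected). A minimal sketch, with a hypothetical function name:

```python
import math


def e_of_z(z, omega_m=0.3, omega_lambda=0.7):
    """Dimensionless Hubble parameter E(z) = H(z) / H_0 for flat LCDM."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + omega_lambda)


e_today = e_of_z(0.0)      # equals 1 by construction
e_sample = e_of_z(0.33)    # at the redshift used for the simulated sample
```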

Simulations
In this work, training of the neural network is undertaken using the gas mass-weighted 3D temperature profiles, $T_{\rm 3D}$, of galaxy clusters from the Three Hundred Project (Cui et al. 2018; Ansarifard et al. 2020). These simulations are based on the 324 Lagrangian regions centred on the $z = 0$ most massive galaxy clusters selected from the MultiDark dark-matter-only MDPL2 simulation (Klypin et al. 2016), carried out with the cosmological parameters from the Planck mission (Planck Collaboration XIII 2016). MDPL2 is a periodic cube of comoving size equal to 1.48 Gpc containing $3840^3$ dark matter particles. The selected regions were resimulated with the inclusion of baryons using the code GADGET-X (Beck et al. 2016). To treat the baryonic physics, several processes were included: metallicity-dependent radiative cooling, the effect of a uniform time-dependent UV background, a sub-resolution model for star formation from a multi-phase interstellar medium, kinetic feedback driven by supernovae, metal production from SN-II, SN-Ia and asymptotic-giant-branch stars, and AGN feedback (Rasia et al. 2015).
In the present work, we ignore the redshift dependence of the profiles, if any, and only consider the simulated sample at a fixed redshift of $z = 0.33$, which is the average redshift of the CHEX-MATE sample. However, we consider a mass range of $M_{500} > 10^{14}\,M_\odot$, allowing us to build a library covering the full mass range of the CHEX-MATE sample. This left us with 314 clusters in the simulated sample.
The temperature profiles were derived in 48 fixed logarithmically spaced radial bins in the range [0.02-2] $R_{500}$ (Ansarifard et al. 2020). The lowest radial limit of 0.02 $R_{500}$ was chosen since it encloses approximately 100 gas particles for the simulated sample, which we call the precision threshold condition, thus ensuring that the analysis is statistically robust and that the results are not affected by numerical fluctuations in the gas properties at small radii (Rasia et al. 2015). The 3D mass-weighted temperature in a given shell $i$, $T_{{\rm 3D},i}$ (i.e. the $i$th element of the $T_{\rm 3D}$ vector), was calculated by weighting the temperature of the $p$th gas particle ($T_p$) using its gas mass ($m_p$) as a weighting function $w$,
$$T_{{\rm 3D},i} = \frac{\sum_p w_p\,T_p}{\sum_p w_p}, \qquad w_p = m_p. \tag{4}$$
In this calculation, no attempt was made to exclude low-temperature sub-clumps in the outskirt regions of the clusters; however, only particles with temperature > 0.3 keV were considered.
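The binning and the mass-weighted average can be sketched as follows. The particle arrays are made-up numbers and the function names are hypothetical; only the 48 logarithmic bins over [0.02-2] $R_{500}$ and the 0.3 keV cut come from the text.

```python
def log_spaced_edges(r_min=0.02, r_max=2.0, n_bins=48):
    """Edges of n_bins logarithmically spaced radial bins (units of R_500)."""
    ratio = r_max / r_min
    return [r_min * ratio ** (i / n_bins) for i in range(n_bins + 1)]


def mass_weighted_temperature(masses, temperatures, t_min=0.3):
    """Mass-weighted temperature of one shell, ignoring particles below t_min (keV)."""
    kept = [(m, t) for m, t in zip(masses, temperatures) if t > t_min]
    total_mass = sum(m for m, _ in kept)
    if total_mass == 0.0:
        return None  # empty shell: no temperature can be assigned
    return sum(m * t for m, t in kept) / total_mass


edges = log_spaced_edges()
# Three hypothetical gas particles; the 0.1 keV one is discarded by the cut.
t_shell = mass_weighted_temperature([1.0, 3.0, 2.0], [4.0, 8.0, 0.1])
```

Here the two retained particles give (1 × 4 + 3 × 8) / 4 = 7 keV.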
We estimated the projected 2D temperature profiles ($T_{\rm 2D}$) along the line of sight ($l$) using the 3D gas density ($\rho$) and temperature profiles ($T_{\rm 3D}$). The 2D temperature profiles were estimated in pre-defined logarithmically spaced annular bins by first considering the classical emission-measure weights ($C = C_{\rm proj}$, see Eqn. 3):
$$T_{\rm 2D} = C \otimes T_{\rm 3D} \equiv \frac{\int w\,T_{\rm 3D}\,{\rm d}l}{\int w\,{\rm d}l}, \tag{5}$$
where $w = \rho^2$ (e.g. Mathiesen & Evrard 2001). We produced several versions of the $T_{\rm 2D}$ profiles. First, the $T_{\rm 2D}$ profiles were estimated in the same radial bins as those of the $T_{\rm 3D}$ (48 bins) by using a matrix $C$ of dimension 48 × 48 ($C_{48,48}$). We also estimated $T_{\rm 2D}$ in a coarser binning scheme to reproduce typical radial sampling from present-day X-ray observatories such as XMM-Newton and Chandra. These have either twelve or six logarithmic bins reaching only up to $R_{500}$, corresponding to matrices of dimension 12 × 48 ($C_{12,48}$) and 6 × 48 ($C_{6,48}$), respectively. We also considered a more complex case where we use the spectroscopic-like weighting proposed by Mazzotta et al. (2004) to generate the 2D temperature profiles using the binning schemes discussed above. In this case, apart from the normalisation, the matrix elements of $C$ simply change to $C_{i,j} \rightarrow C_{i,j}\,T_{{\rm 3D},j}^{-3/4}$ (or equivalently, the weights change to $w = \rho^2\,T_{\rm 3D}^{-3/4}$), where $T_{{\rm 3D},j}$ is the mass-weighted 3D temperature in the $j$th bin.
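The difference between the two weighting schemes can be seen in a toy example. This sketch assumes equal-volume gas elements along the line of sight, so that the weights reduce to $\rho^2$ (emission measure) or $\rho^2 T^{-3/4}$ (spectroscopic-like); the function name and values are illustrative.

```python
def weighted_temperature(density, temperature, spectroscopic_like=False):
    """Weighted mean temperature with w = rho^2 or w = rho^2 T^(-3/4)."""
    weights = [rho ** 2 * (t ** -0.75 if spectroscopic_like else 1.0)
               for rho, t in zip(density, temperature)]
    return sum(w * t for w, t in zip(weights, temperature)) / sum(weights)


# Two gas phases of equal density at 2 keV and 8 keV (hypothetical values).
density = [1.0, 1.0]
temperature = [2.0, 8.0]
t_ew = weighted_temperature(density, temperature)
t_sl = weighted_temperature(density, temperature, spectroscopic_like=True)
```

The spectroscopic-like estimate is biased towards the cooler phase, so `t_sl` falls below the emission-measure value `t_ew`; for isothermal gas the two schemes coincide.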
In many clusters in the simulated sample, the temperature profiles in the first few inner bins (typically 0-13 radial bins corresponding to radii between ≈ [0.02-0.07] $R_{500}$) were noisy (i.e. having < 100 gas particles). For such systems, the 2D profiles were estimated without considering such bins.
Figure 1 shows the observed scaled 2D temperature profiles of the Planck SZ sample (Planck Collaboration XI 2011) and the XMM-Newton DR1 sample (Rossetti et al. 2023, described in detail in Sect. 6.2). These are compared to 50 randomly drawn 2D temperature profiles from the Three Hundred Project using emission measure (left panel) and spectroscopic-like (right panel) weighting schemes and an observation-like convolution matrix, $C_{12,48}$. Both observational and simulated temperature profiles were scaled by the average 2D temperature ($T_{\rm X}$) in the radial range of [0.15-0.75] $R_{500}$. Figure 2 shows the distribution of the clusters in the simulated sample, Planck sample and DR1 sample on the basis of $T_{\rm X}$. These two figures illustrate three points that will be critical for the following study:
1. In common with a number of works over the last 20 years (e.g. De Grandi & Molendi 2002; Vikhlinin et al. 2006; Pratt et al. 2007; Leccardi & Molendi 2008; Ghirardini et al. 2019a), the structural similarity in the observed temperature profiles is clearly visible in Fig. 1. The central regions are characterised by a large spread, due to a mixed population of cool core and disturbed systems, while beyond the central 0.15 $R_{500}$ the profiles all decline in a similar fashion.
2. The simulated profiles follow the same general trend as the observed profiles. The average trend and 1-σ dispersion of the simulations are consistent with those of the CHEX-MATE DR1 sample. The simulated temperature profiles are on average slightly hotter in the centre compared to the Planck SZ sample. This may be related to the fact that there are more low-mass clusters in the simulated sample compared to the Planck SZ sample. Such low-mass clusters are expected to be more strongly affected by AGN feedback, potentially leading to higher temperatures in the central region (Iqbal et al. 2018). Alternatively, the higher central temperatures in the simulations may simply be due to the fact that the sample has a large number of NCC clusters.
3. Overall, the observed temperature profiles are well represented by the simulated sample. This fact will be key to a successful training stage of the IAE model, which relies on identifying underlying trends in the data that would not otherwise be found. We note that the simulated profiles do not have to precisely match the observed data: as we will see, the most important point is that they reproduce the overall structure and diversity of the observed profiles, which is what our IAE model learns.
We further classified the simulated clusters using three schemes. This is important to quantify how well the IAE model reconstructs the radial temperature distribution for different types of objects and profile shapes.

CC and NCC classification
Firstly, we classify the profiles as CC and NCC by visual inspection. The objective here is simply to select simulated profiles that mimic those of observed cool-core-like clusters with a central temperature drop, and non-cool-core clusters that display an almost isothermal central temperature profile. The profiles which show a decreasing trend towards the cluster centre (positive temperature gradient) were classified as CC clusters. We identify about one-third of the clusters as belonging to the CC class. In Fig. 3, grey lines in the left and right panels show the 3D temperature profiles ($T_{\rm 3D}$) of CC and NCC clusters, respectively.

Dynamical classification
Clusters in these simulations were classified on the basis of their intrinsic dynamical state (relaxed or disturbed) using a variety of estimators (Rasia et al. 2013). The two important intrinsic estimators are $f_s = M_{\rm sub}/M_{\rm tot}$, the fraction of cluster mass ($M_{\rm tot}$) included in substructures ($M_{\rm sub}$), and $\Delta_r = |r_\delta - r_{\rm cm}|/R_{\rm ap}$, which is the measure of the offset between the central density peak ($r_\delta$) and the centre of mass ($r_{\rm cm}$) of the cluster, normalised to the aperture radius $R_{\rm ap}$. Both estimators were computed at $R_{500}$. Both $f_s$ and $\Delta_r$ are expected to be lower than 0.1 for relaxed objects (Cialone et al. 2018; De Luca et al. 2021). These two dynamical parameters can be combined (Rasia et al. 2013) to give the so-called relaxation parameter $\chi_{\rm D}$:
$$\chi_{\rm D} = \frac{1}{2}\left(\frac{\Delta_r - \Delta_{r,{\rm med}}}{|\Delta_{r,{\rm quar}} - \Delta_{r,{\rm med}}|} + \frac{f_s - f_{s,{\rm med}}}{|f_{s,{\rm quar}} - f_{s,{\rm med}}|}\right). \tag{6}$$
Here $\Delta_{r,{\rm med}}$ and $f_{s,{\rm med}}$ are the medians of the $\Delta_r$ and $f_s$ distributions, respectively, and $\Delta_{r,{\rm quar}}$ and $f_{s,{\rm quar}}$ are the first or the third quartiles, depending on whether the parameters of a specific cluster are smaller or larger than the median. According to this definition, clusters with $\chi_{\rm D} < 0$ are classified as relaxed, and clusters with $\chi_{\rm D} > 0$ are classified as disturbed. The left panel of Fig. 4 shows the histogram of $\chi_{\rm D}$ values. The cyan and magenta hatched regions represent the 20 most relaxed clusters and 20 most disturbed clusters, respectively. We will refer to these sub-samples as MR20 and MD20 hereafter. In the top panel of Fig. 3, we show the corresponding temperature profiles of the MR20 clusters (left panel) and the MD20 clusters (right panel) with cyan and magenta lines, respectively. It is interesting to note that only a few of the most relaxed objects are also categorised as CC clusters. Visual inspection of emissivity maps shows, as expected, that $\chi_{\rm D}$ is clearly linked to the overall gas morphology, as also found in Campitiello et al. (2022).
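A relaxation parameter of this kind can be sketched in a few lines. The text fixes the ingredients (offsets from the median, normalised by the distance to the first or third quartile), while the overall factor of 1/2 (a simple average of the two terms), the use of `statistics.quantiles` with its default method, and the sample values are assumptions of this sketch.

```python
import statistics


def normalised_offset(value, population):
    """Offset from the median in units of the distance to the relevant quartile."""
    q1, med, q3 = statistics.quantiles(population, n=4)
    quartile = q1 if value < med else q3
    return (value - med) / abs(quartile - med)


def relaxation_parameter(delta_r, f_s, all_delta_r, all_f_s):
    """chi_D < 0: relaxed; chi_D > 0: disturbed (assumed averaging of both terms)."""
    return 0.5 * (normalised_offset(delta_r, all_delta_r)
                  + normalised_offset(f_s, all_f_s))


# Hypothetical distributions of the two estimators for seven clusters.
all_delta_r = [0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14]
all_f_s = [0.01, 0.03, 0.05, 0.07, 0.09, 0.11, 0.13]
chi_relaxed = relaxation_parameter(0.04, 0.03, all_delta_r, all_f_s)
chi_disturbed = relaxation_parameter(0.12, 0.11, all_delta_r, all_f_s)
```

A cluster sitting at the first quartile of both distributions gets a value of −1, and one at the third quartile of both gets +1, so the sign separates relaxed from disturbed systems as described above.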

Structural classification
To enable a better assessment of the performance of the IAE model for temperature profile reconstruction, we also classified the 3D temperature profiles based directly on their smoothness. Bumps in the temperature profiles are usually associated with complex astrophysical processes such as merger shocks, gas condensation, the presence of cold substructures, sloshing, and turbulence, all of which affect the temperature in a given annulus. To measure the degree of bumpiness of the 3D temperature profiles, we used the starlet wavelet transform, which is widely used in component separation in astrophysical images (Starck et al. 2007), to split each profile into its smooth and non-smooth components. Using this technique, the 3D temperature profile $T_{\rm 3D}(r)$ can be decomposed into a $J + 1$ coefficient set $W = \{w_1, ..., w_J, T_J\}$, as a superposition of the form
$$T_{\rm 3D}(r) = T_J(r) + \sum_{j=1}^{J} w_j(r), \tag{7}$$
where $T_J$ is a smooth (coarse resolution) version of the original temperature profile and $w_j$ represents the structure in the temperature profile on scale $2^{-j}$.
Figure 5 shows the starlet decomposition for one of the clusters in the Three Hundred Project which exhibits a complex shape in the range [0.5-1] $R_{500}$. The cluster is experiencing a major merger and there is an enhancement of the temperature due to the propagation of a shock in this region. We use the starlet transform with $J = 2$, which we have found to be the optimal configuration to measure the non-smoothness, yielding a decomposition into a smooth temperature component and two additional non-smooth components, $w_1(r)$ and $w_2(r)$. We then define the root mean square deviation, $\chi_{\rm S}$, of the difference between the true and smooth temperature profiles in the radial range of [0.08-1] $R_{500}$ as a measure of the non-smoothness of the temperature profiles,
$$\chi_{\rm S} = \sqrt{\frac{1}{u}\sum_{i=1}^{u}\left[T_{\rm 3D}(r_i) - T_J(r_i)\right]^2}, \tag{8}$$
where $u$ is the number of data points in the range of [0.08-1] $R_{500}$, and the lower limit of 0.08 $R_{500}$ corresponds to the radius at which all clusters satisfy the precision threshold condition. The temperature profiles were first scaled (normalised) by the average mass-weighted temperature in the radial range of [0.15-0.75] $R_{500}$ before applying the decomposition operator to calculate $\chi_{\rm S}$. The right-hand panel of Fig. 4 shows the distribution of $\chi_{\rm S}$ for the full sample, which follows an approximately log-normal distribution. The green and orange hatched regions represent the 20 smoothest profiles and 20 most irregular profiles, respectively, based on the $\chi_{\rm S}$ criterion. We will refer to these sub-samples as MS20 and MI20 henceforth. In the bottom panel of Fig. 3, we show the corresponding temperature profiles of the MS20 (left panel) and MI20 profiles (right panel) with green and orange lines, respectively. Here also, only a few of the clusters with the smoothest profiles are categorised as CC clusters. The correlation between $\chi_{\rm D}$ and $\chi_{\rm S}$ is shown in
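The smooth/non-smooth split and the resulting non-smoothness measure can be sketched with a one-dimensional starlet (à trous) transform. This is a minimal illustration, assuming the usual B3-spline kernel [1, 4, 6, 4, 1]/16 and mirrored boundaries; the boundary treatment, the function names, and the toy profile are choices of this sketch, not details taken from the text.

```python
import math

B3_KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def smooth_b3(signal, step):
    """Convolve with the B3-spline kernel, with `step` samples between taps."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for offset, h in zip((-2, -1, 0, 1, 2), B3_KERNEL):
            idx = i + offset * step
            while idx < 0 or idx >= n:          # mirror at both boundaries
                idx = -idx if idx < 0 else 2 * (n - 1) - idx
            acc += h * signal[idx]
        out.append(acc)
    return out


def starlet(signal, n_scales):
    """Return ([w_1, ..., w_J], T_J) such that signal = T_J + sum_j w_j."""
    coarse = list(signal)
    details = []
    for j in range(n_scales):
        smoother = smooth_b3(coarse, 2 ** j)
        details.append([c - s for c, s in zip(coarse, smoother)])
        coarse = smoother
    return details, coarse


def non_smoothness(profile, coarse):
    """Root mean square deviation between a profile and its smooth component."""
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(profile, coarse)) / len(profile))


# Toy scaled temperature profile with a bump (illustrative values only).
profile = [1.00, 1.05, 1.10, 1.40, 1.10, 1.00, 0.90, 0.80, 0.70, 0.60]
details, coarse = starlet(profile, 2)
chi_s = non_smoothness(profile, coarse)
rebuilt = [c + w1 + w2 for c, w1, w2 in zip(coarse, details[0], details[1])]
```

In practice the sum in `non_smoothness` would be restricted to the points inside [0.08-1] $R_{500}$, after scaling the profile by its mean temperature.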

Neural network model for learning 3D temperature profiles
The deconvolved temperature profile can in principle be obtained by solving the following classical inverse problem
$$T_{\rm 2D} = C \otimes T_{\rm 3D} + N, \tag{9}$$
where $C$ is a non-linear operator (matrix) which represents the observational and instrumental effects (projection, PSF, etc.) and $N$ represents the statistical properties of the noise. The standard way of solving Eqn. 9 is to consider least squares regression with some regularisation $R$,
$$T^{\rm fit}_{\rm 3D} = \underset{T}{\rm argmin}\ \left\|T_{\rm 2D} - C \otimes T\right\|^2 + R(T), \tag{10}$$
where $T^{\rm fit}_{\rm 3D}$ is the best-fitting model profile for $T_{\rm 3D}$, which is obtained by optimising the above relation with respect to $T$. However, Eqn. 9 is an ill-posed (non-linear) problem, and using standard non-parametric methods does not result in a unique and stable solution. Therefore, one has to resort to advanced deconvolution techniques. In this work, we propose one such algorithm that makes use of neural networks to model the temperature profiles, and whose framework will be explained below. A learning-based regularisation procedure for direct deconvolution using the trained neural network is discussed in Sect. 4.
Our approach is based on manifold learning, which stems from the manifold hypothesis, which suggests the existence of a lower dimensional manifold on which real-world data lie (Fefferman et al. 2013). This is evidently the case for galaxy cluster temperature profiles, which clearly display some degree of regularity, as seen in Fig. 4. The goal is then to find the lower dimensional manifold by learning the underlying structure of the data. When one has access to a large training set (from observations and/or simulations), it may be possible to make use of machine learning (deep learning) methods to build an underlying manifold. However, this becomes quite difficult when available training samples are sparse, as is the case for cluster temperature profiles. In such cases, rather than learning the underlying manifold structure, Bobin et al. (2023) proposed the Interpolatory AutoEncoder (IAE), which learns to travel on a manifold by way of interpolation between a limited number of anchor points that belong to it.
We assume that any temperature profile in a training set $\{T_i\}_{i=1,...,n}$, where $n$ represents the total number of elements in the set, can be interpolated from a small set of $d$ anchor points $\{T^e_a\}_{e=1,...,d}$ using an appropriate metric $\Pi$,
$$T_i \approx \Theta(\Lambda_i) = \underset{T}{\rm argmin}\ \sum_{e=1}^{d} \lambda^i_e\,\Pi(T, T^e_a), \tag{11}$$
where $\Theta$ is called the barycentre. The elements of the vector $\Lambda_i = \{\lambda^i_e\}_{e=1,...,d}$ are the barycentric weights ($\sum_{e=1}^{d} \lambda^i_e = 1$), which are the quantities to be optimised. If we consider the metric $\Pi$ to be Euclidean, then
$$\Pi(T, T^e_a) = \left\|T - T^e_a\right\|^2_2. \tag{12}$$
The above equation reduces $\Theta(\Lambda_i)$ to an orthogonal projection onto the span of anchor points $T^e_a$, that is
$$\Theta(\Lambda_i) = \sum_{e=1}^{d} \lambda^i_e\,T^e_a. \tag{13}$$
The problem then reduces to finding (optimising) barycentric weights such that the barycentre $\Theta$ accurately reconstructs any input temperature profile in the training sample.
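The step from a Euclidean metric to a linear combination of anchor points can be made explicit. The short derivation below assumes that the barycentre is defined as the minimiser of the weighted sum of squared Euclidean distances to the anchor points, with weights summing to one; this definition is an assumption of the sketch.

```latex
\begin{aligned}
\Theta(\Lambda) &= \underset{T}{\operatorname{argmin}}\; \sum_{e=1}^{d} \lambda_e \left\| T - T_a^e \right\|_2^2 , \qquad \sum_{e=1}^{d} \lambda_e = 1, \\
0 &= \nabla_T \sum_{e=1}^{d} \lambda_e \left\| T - T_a^e \right\|_2^2
   = 2 \sum_{e=1}^{d} \lambda_e \left( T - T_a^e \right)
   = 2 \left( T - \sum_{e=1}^{d} \lambda_e T_a^e \right), \\
\Rightarrow\quad \Theta(\Lambda) &= \sum_{e=1}^{d} \lambda_e T_a^e .
\end{aligned}
```

Since the Hessian is $2\sum_e \lambda_e\,\mathbb{1} = 2\,\mathbb{1}$, the stationary point is a minimum even when some individual weights are negative.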
However, if the profiles are non-linear, with varying amplitudes and shapes, as is the case with the temperature profiles in galaxy clusters, the standard metric $\Pi$ may not reconstruct an appropriate barycentric representation. Our method, therefore, uses the approach proposed by Bobin et al. (2019, 2023), in which a data-driven metric is constructed using a deep learning neural network that is well adapted to build physically relevant barycentres of anchor points. We introduce an auto-encoder (Vincent et al. 2010) inspired neural network model which learns to transport points (temperature profiles in our case) onto the underlying manifold using a non-linear interpolation scheme between the anchor points.
The structure of the neural network we are considering is shown in the left-hand panel of Fig. 6. It consists of an encoder ($\Phi$), that takes an input, and a decoder ($\Psi$), that generates the desired output. The role of the encoder is to transform the input data into a lower-dimensional representation, while the decoder is responsible for mapping the lower-dimensional data back into the original space. By performing these mappings, auto-encoders are able to learn the underlying structure of the data. In contrast to standard auto-encoders, our model training is performed by minimising the error between the input and the reconstructed training sample according to the Euclidean distance onto the manifold spanned by the anchor points in the encoder (feature space).
More precisely, for the encoder $\Phi$, the representation $\Phi(T_i)$ of the input profile $T_i$ (belonging to the training set) is expressed in terms of the barycentre, $\Theta$, in feature space, as an orthogonal projection onto the span of the encoded anchor points $\Phi(T^e_a)$, following Eqn. 13:
$$\Theta(\Lambda_i) = \sum_{e=1}^{d} \lambda^i_e\,\Phi(T^e_a). \tag{14}$$
The barycentric weights are constrained to sum to one so as to avoid certain scaling indeterminacies, and are not necessarily constrained to be positive like actual barycentric weights, which potentially allows us to extrapolate beyond the affine hull of the encoded anchor points. More precisely, the barycentric weights for the $n$ elements in the training sample are computed as follows:
$$\Lambda_i = \underset{\Lambda\,:\,\sum_e \lambda_e = 1}{\rm argmin}\ \left\|\Phi(T_i) - \sum_{e=1}^{d} \lambda_e\,\Phi(T^e_a)\right\|^2_2, \tag{15}$$
which can be approximated by taking the solution to the least-squares problem followed by a rescaling of the barycentric weights in order to make them sum to one.
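The two-step approximation described above (unconstrained least squares, then rescaling so that the weights sum to one) can be sketched in plain Python. The encoded vectors below are made-up numbers, the helper names are hypothetical, and the normal equations are solved with a small Gauss-Jordan routine for self-containment; a real implementation would use a linear algebra library.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def barycentric_weights(encoded_profile, encoded_anchors):
    """Least-squares weights on the anchors, rescaled to sum to one."""
    gram = [[dot(ai, aj) for aj in encoded_anchors] for ai in encoded_anchors]
    rhs = [dot(ai, encoded_profile) for ai in encoded_anchors]
    lam = solve_linear(gram, rhs)
    total = sum(lam)
    return [x / total for x in lam]


# Two hypothetical encoded anchor points in a 3-dimensional feature space.
anchors = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
target = [0.25, 1.25, 0.75]            # equals 0.25 * anchor_1 + 0.75 * anchor_2
weights = barycentric_weights(target, anchors)
scaled = barycentric_weights([2.0 * t for t in target], anchors)
```

The rescaling step makes the weights insensitive to the overall amplitude of the encoded profile, which is the scaling indeterminacy mentioned above; the weights remain free to be negative.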
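Once the optimal barycentric weights ($\Lambda_i$) are computed for each element $T_i$ of the training sample, the approximations (i.e. the barycentres) go back through the decoder $\Psi$ to reproduce the input as $T_i \approx \Psi\left(\sum_{e=1}^{d} \lambda^i_e\,\Phi(T^e_a)\right)$. The learning stage reduces to estimating the weights and biases of the layers of $\Phi$ and $\Psi$ using an appropriate cost function that minimises the error between the input, $T_i$, and the output, $\Psi\left(\sum_{e=1}^{d} \lambda^i_e\,\Phi(T^e_a)\right)$, so that
$$\min_{\Phi,\Psi}\ \sum_{i=1}^{n}\ \mu\left\|T_i - \Psi\left(\sum_{e=1}^{d} \lambda^i_e\,\Phi(T^e_a)\right)\right\|^2_2 + \left\|\Phi(T_i) - \sum_{e=1}^{d} \lambda^i_e\,\Phi(T^e_a)\right\|^2_2. \tag{16}$$
In the training stage, we thus learn the non-linear interpolation scheme that best approximates the training samples in feature space, and the mapping between the barycentres and real space.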
The parameter $\mu$ controls the trade-off between these two objectives. In the evaluation phase only the decoder $\Psi(\Theta)$, which embeds the mapping between the barycentric weights and 3D temperature profiles, is used. As shown in the right panel of Fig. 6, the decoder is used as a generative model that is parameterised by the barycentric weights, $\Lambda$ (for convenience we drop the subscript 'i' from now on). This model can easily be convolved to fit the observed 2D temperature profile so as to recover the true (3D) temperature profile. From now on, we refer to the decoder as the IAE model. The number and choice of anchor points and the model training will be discussed in the following Section.

Model training
We use a JAX (Bradbury et al. 2018) implementation to develop and train the IAE model. As a training sample, we use 200 randomly drawn $T_{\rm 3D}$ profiles from the full sample of 315 extracted from the Three Hundred Project simulations.
Each profile in the training sample is first normalised so that its entries sum to 1. The model is trained at the same fixed radial binning as that of the $T_{\rm 3D}$ profiles in the [0.02-2] $R_{500}$ radial range. For the training stage, several configurations were tested, among which the following choices were found to perform the best:
- Network architecture: Both the encoder and the decoder are multi-layer perceptron (MLP) neural networks, which are composed of 2 layers, each of which has a number of hidden units equal to the input signal dimension (i.e. 48). We employ a smooth and non-monotonic Mish activation function to introduce non-linearity and enhance the learning capacity of our deep neural network model. Since the IAE model employs a barycentre transformation of the training sample in encoder space to achieve dimensionality reduction, in this work we only focus on a specific architecture with a fixed number of neurons per layer, corresponding to the dimension of the input samples. Further exploration of more general architectures is left for future work. For both encoder and decoder, the output $Z^{l+1}$ of layer $l$ can be expressed as
$$Z^{l+1} = {\rm Mish}\left(W^l Z^l + b^l\right) + \varepsilon_l\,Z^l. \tag{17}$$
Here, the first term represents the standard output of the neural network, with $W$ and $b$ defined as the weight matrix and bias vector, respectively. The second term represents skip connections (He et al. 2015; Huang et al. 2016), also known as residual connections. The skip connection acts by partially re-injecting $Z$ up to a layer-dependent scalar factor $\varepsilon$. The residual injection factors are typically chosen to be small for low-level layers and larger for deeper layers. This approach helps mitigate the vanishing gradient phenomenon, which is commonly encountered during the training of deep networks. For each layer $l$ of the encoder and decoder, we adopt the functional form of $\varepsilon_l$ used in Bobin et al. (2023), which is set by a constant factor $\varepsilon_0$. By using skip connections with re-injection and layer-dependent scaling, the model can leverage both the direct information flow from earlier layers and the higher-level abstractions learned by the deeper layers, which can lead to improved performance and better training in deep neural networks.
Notes: The output of the encoder is first transformed into the barycentre of the anchor points using Eqn. 14.
-Cost function: The cost function defined in Eqn. 16 is composed of two terms. The first term measures the reconstruction error in real space, and the second term defines the error in feature space. The parameter µ tunes the trade-off between these two terms. An accurate IAE model relies on both a low reconstruction error (i.e. the first term of the training loss) and an efficient interpolation scheme in feature space. It has been emphasised in Bobin et al. (2023) that the second term helps improve the training process by constraining the feature space. In addition, depending on the problem and data at hand, it can help to increase the model accuracy by reducing the interpolation error in feature space, which in turn can reduce noise propagation at inference. In the present case, we noted that the trained model is not particularly sensitive to µ, which we set to 10 000 to minimise the reconstruction error in real space.
-Training hyper-parameters: The batch size (the number of training profiles processed together before updating the neural network weights) is fixed to 32. The optimisation is performed by back-propagation using the standard Adam solver (Kingma & Ba 2014) with a step size of $10^{-3}$ and 25 000 epochs. It is customary to further regularise the model by adding noise to the training samples, which limits over-fitting. To this end, Gaussian noise with zero mean and a standard deviation of $2 \times 10^{-3}$ is added to the samples at the training stage.
The batch normalisation was achieved by normalising the input batch to a global mean of 0 and a standard deviation of 1. Finally, we fix the residual parameter ($\varepsilon_0$) to 0.1.
-Number of anchor points: Anchor points serve as the basis on which temperature profiles are reconstructed. Training with a small number of anchor points results in smoother (more regular) profiles; conversely, a large number of anchor points increases the model-to-data fidelity. Thus, the choice of the number of anchor points used during the training stage is essentially equivalent to choosing a regularisation parameter. For our study, the number of anchor points is fixed at five. These are generated by first dividing the training sample into five groups using a k-means clustering algorithm. The anchor points are then taken to be the central points (centroids) of these five groups. By using five anchor points, we ensure that the model-to-data residual remains below 10% over the observable radial range of ≈ [0.02-1] R 500, as shown in Sect. 5, and at the same time avoid possible biases that could be introduced if the observations were shallow (bias-variance problem). Figure 7 shows the anchor points used in the neural network model. In Sect. 5.1.2, we discuss the effect of increasing the number of anchor points.
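The anchor-point construction described above can be illustrated with a minimal sketch, assuming a plain Lloyd-type k-means with Euclidean distance applied to the normalised profiles. The function name `kmeans_anchor_points`, the random initialisation, and the toy log-normal profiles are illustrative assumptions; they are not taken from the released IAE code.

```python
import numpy as np

def kmeans_anchor_points(profiles, n_anchors=5, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids from distinct, randomly chosen profiles.
    idx = rng.choice(len(profiles), size=n_anchors, replace=False)
    centroids = profiles[idx].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every profile to every centroid.
        d2 = ((profiles[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for k in range(n_anchors):
            members = profiles[labels == k]
            if len(members) > 0:  # keep the old centroid if a group empties
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy stand-in for the 200 training profiles on 48 radial bins.
rng = np.random.default_rng(1)
toy = rng.lognormal(mean=0.0, sigma=0.3, size=(200, 48))
toy /= toy.sum(axis=1, keepdims=True)  # entries of each profile sum to 1
anchors, labels = kmeans_anchor_points(toy, n_anchors=5)
```

Because each centroid is a mean of normalised profiles, the anchor points in this sketch inherit the unit-sum normalisation of the training sample.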
Table 1 provides a comprehensive summary of our neural network architecture, along with the optimal hyper-parameters used in the study. For our implementation, we used publicly available source code hosted on a GitHub repository (Bobin et al. 2019, 2023).
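For illustration, the layer rule described in the architecture item (an activated affine map plus the re-injected layer input) can be sketched in a few lines. NumPy is used here as a stand-in for JAX, the helper names are hypothetical, and a constant factor of 0.1 is assumed for every layer because the layer dependence adopted from Bobin et al. (2023) is not reproduced in this sketch.

```python
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * np.tanh(np.logaddexp(0.0, x))  # logaddexp(0, x) = softplus(x)

def residual_layer(z, W, b, eps):
    """Activated affine map plus the layer input re-injected with weight eps."""
    return mish(z @ W + b) + eps * z

def mlp_forward(z, params, eps_per_layer):
    """Apply the residual layers in sequence; params is a list of (W, b)."""
    for (W, b), eps in zip(params, eps_per_layer):
        z = residual_layer(z, W, b, eps)
    return z

# Two layers of 48 units, matching the input dimension quoted in the text.
rng = np.random.default_rng(0)
n = 48
params = [(rng.normal(0.0, n ** -0.5, size=(n, n)), np.zeros(n)) for _ in range(2)]
batch = rng.uniform(size=(32, n))            # one batch of 32 profiles
encoded = mlp_forward(batch, params, eps_per_layer=[0.1, 0.1])  # assumed constant eps
```

With all weights set to zero the activated term vanishes and only the re-injected input survives, which is the property that keeps gradients from vanishing in deeper stacks.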
Since our simulated sample is small, we use the term 'validation' to refer to testing of the model performance on simulated data (Sect. 5) before using it on real-world data, where the 3D temperature profiles are not directly available. We therefore used the training sample itself to evaluate the convergence of the cost function. Specifically, we monitored the cost function during training and found that it flattened after approximately 25 000 iterations. At this stage, we considered the training sufficiently converged and terminated it.
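The data handling at the training stage (unit-sum normalisation, batches of 32, and additive Gaussian noise of standard deviation 2 × 10^-3) can be sketched as follows. This is a minimal illustration rather than the released training loop: the function names are hypothetical, and it is an assumption that the noise is added after normalisation and redrawn for every batch.

```python
import numpy as np

def normalise(profiles):
    """Scale each profile so that its entries sum to 1."""
    return profiles / profiles.sum(axis=1, keepdims=True)

def noisy_minibatches(profiles, batch_size=32, noise_std=2e-3, rng=None):
    """Yield shuffled batches for one epoch, with fresh Gaussian noise added."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(profiles))
    for start in range(0, len(profiles), batch_size):
        batch = profiles[order[start:start + batch_size]]
        yield batch + rng.normal(0.0, noise_std, size=batch.shape)

# Toy stand-in for the 200 training profiles on 48 radial bins.
rng = np.random.default_rng(0)
train = normalise(rng.lognormal(0.0, 0.3, size=(200, 48)))
batches = list(noisy_minibatches(train, rng=rng))
```

With 200 profiles and a batch size of 32, one epoch in this sketch consists of six full batches and a final batch of eight profiles.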

Model fitting
The IAE model is tested (fitted) on the validation sample, which consists of the remaining 115 galaxy clusters that were not used in the training stage. We have verified that the validation sample is representative of the full sample: about one-third of the validation clusters have cool cores, and the fractions of relaxed/disturbed clusters and smooth/irregular profiles are similarly distributed in the training and validation samples.
We employ Markov Chain Monte Carlo (MCMC) analysis to estimate the parameters of the IAE models and use the publicly available emcee python package (Foreman-Mackey et al. 2013) for this purpose.The parameter estimation is undertaken on all the IAE parameters: the five anchor point weights Λ = [λ 1 , λ 2 , λ 3 , λ 4 , λ 5 ], and the amplitude (normalisation) parameter α.
The deconvolved temperature profile can be obtained from the trained non-parametric IAE model by minimising a log-likelihood composed of two terms, in which T val is a temperature profile (2D or 3D) in the validation sample to be fitted, Σ o is the error covariance matrix, and the model term is the corresponding convolved IAE-predicted profile. Tr and ⊤ denote the trace and transpose of a matrix, respectively. The first term is the mean proximity term, with Γ controlling its overall contribution to the likelihood. It forces the solution to be a barycentre of the example profiles (i.e. it searches for the best approximation of the input signal with respect to the learned model/network). We find that Γ in the range 0.1-1 generally provides good results, and we therefore fix it to 1. $\bar{\Lambda}$ (the mean value of the Λs) and Σ t (the covariance matrix of the Λs) are computed from the training set by generating 100 Monte Carlo realisations of each cluster with log-normal noise, which are then fitted with the IAE model using the Adam optimiser. This cost-effective regularisation strategy is introduced to avoid model extrapolation (physically unrealistic results), and gives a robust and effective deconvolution algorithm. The second term is the standard likelihood for additive Gaussian noise. We used flat prior distributions; Tab. 2 lists the prior ranges of all parameters. We used GetDist (Lewis 2019) with the chains generated by emcee to produce 2D contours and marginal posteriors.
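A two-term objective of the kind described above can be sketched as follows. The quadratic forms used for both terms are an assumption consistent with the verbal description (a Γ-weighted proximity of Λ to its training mean, plus a Gaussian data term), not a transcription of the paper's equation. The `decode` callable is a hypothetical stand-in for the trained IAE decoder; in the toy usage it is a simple linear combination of anchor profiles.

```python
import numpy as np

def neg_log_likelihood(lam, alpha, decode, C, t_val, sigma_o_inv,
                       lam_mean, sigma_t_inv, gamma=1.0):
    """Assumed form: gamma * proximity term + Gaussian data term."""
    d_lam = lam - lam_mean
    proximity = gamma * (d_lam @ sigma_t_inv @ d_lam)
    # Model prediction: amplitude times decoded profile, then convolution by C.
    resid = t_val - C @ (alpha * decode(lam))
    data = resid @ sigma_o_inv @ resid
    return proximity + data

# Toy setup: five anchors on 48 bins, identity convolution (the 3D-3D case).
rng = np.random.default_rng(0)
anchors = rng.uniform(0.5, 1.5, size=(5, 48))
decode = lambda lam: lam @ anchors          # hypothetical linear decoder
lam_true = np.full(5, 0.2)
t_val = 3.0 * decode(lam_true)              # noiseless profile with alpha = 3
args = (decode, np.eye(48), t_val, np.eye(48), lam_true, np.eye(5))
best = neg_log_likelihood(lam_true, 3.0, *args)
worse = neg_log_likelihood(lam_true + 0.1, 3.0, *args)
```

A function of this shape, with its sign flipped and flat priors added, is what a sampler such as emcee would evaluate for each walker position (Λ, α).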
The IAE model testing was undertaken by fitting it to the T 3D and T 2D profiles built in Sect. 2. For simplicity, we ignore the PSF in the testing phase. We tested and validated our model by considering three fitting cases:
1. 3D-3D fit with fine binning: the T 3D profiles are directly fitted to recover the best-fitting 3D profiles from the IAE model. The goal in this case is to assess the ability of the IAE model to reproduce the input 3D temperature profile shape. As there is no projection, C in Eqn. 19 is simply an identity matrix of size 48 × 48 (C 48,48).
2. 2D-3D fit with fine binning: the T 2D profiles, defined on the same radial grid as the IAE model, are fitted with the projected IAE model in order to reconstruct the 3D profiles.
3. 2D-3D fit with coarse binning: as in case 2, but with T 2D profiles at an observation-like resolution of twelve or six radial bins up to R 500.
For case 3, which seeks to mimic the typical characteristics of 2D temperature profiles measured with current X-ray satellites, we assume that the uncertainties increase linearly with radius. Based on our previous experience with XMM-Newton and Chandra observations, we assume temperature profile uncertainties that increase from 5% to 25% in the [0.02-1] R 500 radial range for the 12-bin profiles, and from 10% to 30% for the 6-bin profiles. We built a diagonal error covariance matrix, Σ o, using this approximation. This was then incorporated in the likelihood, where it acts as a weighting function that gives more weight to the inner regions in the fit. In general, regardless of whether the errors increase monotonically, the inclusion of errors in the likelihood leads to an overall improvement in the fit. For cases 1 and 2 (fine binning), we do not consider errors in the likelihood, and Σ o is therefore a unit matrix. Both model training and fitting a single profile with MCMC can be completed within a few minutes on a 16-core CPU.
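The diagonal error covariance used for the observation-like case can be sketched as below, assuming fractional uncertainties that are interpolated linearly in radius between the innermost and outermost bins. The function name, the log-spaced bin centres, and the flat 5 keV toy profile are illustrative assumptions.

```python
import numpy as np

def linear_error_covariance(radii, profile, frac_inner, frac_outer):
    """Diagonal covariance with fractional errors rising linearly with radius."""
    radii = np.asarray(radii, dtype=float)
    profile = np.asarray(profile, dtype=float)
    # 0 at the innermost bin, 1 at the outermost bin.
    s = (radii - radii[0]) / (radii[-1] - radii[0])
    frac = frac_inner + (frac_outer - frac_inner) * s
    sigma = frac * profile
    return np.diag(sigma ** 2)

radii12 = np.geomspace(0.02, 1.0, 12)       # bin centres in units of R500 (assumed)
t2d = np.full(12, 5.0)                      # toy flat 2D profile
cov12 = linear_error_covariance(radii12, t2d, 0.05, 0.25)
weights = 1.0 / np.diag(cov12)              # inverse-variance weights in the fit
```

The inverse-variance weights decrease outwards, which is how the likelihood ends up giving more weight to the inner regions.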
In objects where the temperature profiles in the first few inner bins were not reliable (i.e. having < 100 gas particles), these bins were excluded from the fit. However, no such constraint was applied during the training stage, as one expects the network to learn only the fundamental structure of the data rather than the noise.

Model evaluation
In this Section, we discuss the robustness of the non-parametric IAE model reconstruction using different schemes.We check the performance of our model with respect to the radial binning, which is important since the number of radial bins corresponding to the observations is much lower compared to the resolution of the temperature profiles in the simulated sample.We also consider different weighting schemes in the fit.The model is tested with the 115 temperature profiles in the validation sample.
The performance of the model was evaluated by comparing the original 3D and 2D temperature profiles with those recovered from the IAE model.For each case, we calculated the median fractional residual and its associated 1-σ dispersion (16th-84th percentile range) at three scaled radii (0.02 R 500 , R 500 , and 2 R 500 ), and over the full radial range.These results are presented in Table 4, and each case is discussed in more detail below.
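The summary statistics quoted in this Section can be computed along the following lines. This is a sketch under two assumptions: the fractional residual is defined as (recovered − true)/true, so that negative values correspond to an underestimate, and the quoted ± value is taken as half the 16th-84th percentile range.

```python
import numpy as np

def residual_summary(t_true, t_rec, axis=0):
    """Median fractional residual and half the 16th-84th percentile range."""
    frac = (t_rec - t_true) / t_true
    p16, p50, p84 = np.percentile(frac, [16, 50, 84], axis=axis)
    return p50, 0.5 * (p84 - p16)

# Toy check: 115 profiles on 48 bins, recovered 2% too high everywhere.
rng = np.random.default_rng(0)
t_true = rng.uniform(2.0, 8.0, size=(115, 48))
t_rec = 1.02 * t_true
median_per_bin, disp_per_bin = residual_summary(t_true, t_rec)          # per radius
median_all, disp_all = residual_summary(t_true, t_rec, axis=None)       # all radii
```

Passing `axis=0` gives the statistics at each radial bin across clusters, while `axis=None` pools all clusters and radii, as for the values quoted over the full radial range.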
5.1. 3D-3D reconstruction of temperature profiles

Overall performance
We first consider the simplest case, corresponding to the 3D-3D fit with fine binning, where we directly fitted the IAE model to the intrinsic 3D gas mass-weighted temperature profiles (T 3D ), ignoring projection effects.The left hand panel of Fig. 8 shows the fractional residuals (∆T 3D /T 3D ) between the input (true) and recovered temperature profiles for all the individual clusters in the validation sample.The median fractional residual profile along with 1-σ dispersion (16th-84th percentile range) are also plotted.
The median fractional residual profile is found to be close to zero throughout the radial range: at radii 0.02 R 500, R 500, and 2 R 500, the values are −0.010 ± 0.060, 0.010 ± 0.051, and −0.020 ± 0.120, respectively. Moreover, the median fractional residual over the full radial range is found to be −0.001 ± 0.042. The 1-σ dispersion in the fractional residuals is nearly constant at around ±5%, except beyond 1.5 R 500.
Within the validation sample, the fractional residuals of the 20 most relaxed / disturbed clusters (MR20 / MD20) are displayed at the top in the right panel of Fig. 8, while the 20 most smooth / irregular profiles (MS20 / MI20) are shown at the bottom.In all cases, the median fractional residuals are again consistent with zero.The 1-σ dispersion in fractional residuals over all radii for the MD20 (MI20) sub-sample is ±0.045 (±0.053), which is larger, as expected, compared to the dispersion of ±0.032 (±0.029) found in the MR20 (MS20) sub-sample.This conclusion is supported by the fact that the histogram of the residuals of the MR20 (MS20) sub-sample is more peaked at zero, and hence is narrower compared to the MD20 (MI20) subsample.In general, we find that for disturbed clusters and for irregular profiles, the IAE model smooths out the sharp small scale variations in the 3D temperature profiles.

Anchor point weights, λ i
We have shown above that the IAE model is able to recover the average shape of the 3D profiles with high accuracy. In this context, it is interesting to consider how the anchor point weights, λ, change according to the characteristics of the profile under consideration. Figure 9 shows the temperature profiles of the most relaxed / disturbed clusters in the validation sample, classified according to the χ D criterion discussed in Sect. 2.2, and of the most regular / irregular profiles in the validation sample, classified according to the χ S criterion introduced in Sect. 2.3. The reconstructed median temperature profile and fractional residuals obtained with the IAE using MCMC are also shown. The IAE model produces smoother profiles on small scales by ignoring the fluctuations on such scales. At large scales, the IAE model is able to reproduce the underlying structure of the input temperature profiles. The bottom left hand panel shows the fractional residuals, which can be seen to be less than 5% over most of the radial range. In Appendix A.2, the top panel of the [...]
We also tested the effect on the IAE model of increasing the number of anchor points. We found that the model fidelity can be improved by increasing the number of anchor points, and that the choice of 20 anchor points reduces the residuals significantly. In Appendix A. [...] found to be 0.002 ± 0.030, about 25% smaller than for the fiducial IAE model obtained with five anchor points. However, the usefulness of this higher dimensional model is limited to simulations only. The temperature profiles that can be obtained from current X-ray satellites generally have temperature data at around 8-15 points for typical deep observations. Using the IAE model with 20 anchor points in such cases would result in over-fitting and/or large variance.

2D-3D reconstruction of temperature profiles with fine 2D binning
We now discuss the efficiency of the IAE model when fitting the 2D (projected) temperature profiles, defined at the same radial grid as in the previous case and at which the IAE model is defined (2D-3D fit with fine binning case).Here, the 3D IAE model is convolved with the standard emission-measure weighting matrix.The resulting projected model is then fitted to the input 2D temperature profiles, in order to reconstruct the 3D temperature profiles.
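The emission-measure projection used here can be illustrated with a minimal sketch, assuming concentric spherical shells and projected annuli that share the same edges, weights proportional to n² times the volume of each shell seen through each annulus, and no emission beyond the outermost shell. The function names and the toy density profile are illustrative assumptions and do not reproduce the paper's implementation.

```python
import numpy as np

def shell_annulus_volumes(edges):
    """V[i, j]: volume of spherical shell j intersected by annulus i."""
    edges = np.asarray(edges, dtype=float)
    lo, hi = edges[:-1], edges[1:]

    def f(r, R):
        # (r^2 - R^2)^(3/2), set to zero where the sphere lies inside the cylinder.
        return np.clip(r[None, :] ** 2 - R[:, None] ** 2, 0.0, None) ** 1.5

    V = f(hi, lo) - f(hi, hi) - f(lo, lo) + f(lo, hi)
    return (4.0 * np.pi / 3.0) * V

def emission_measure_projection(edges, density):
    """Row-normalised matrix C such that T_2D = C @ T_3D."""
    w = shell_annulus_volumes(edges) * np.asarray(density) ** 2
    return w / w.sum(axis=1, keepdims=True)

edges = np.linspace(0.0, 2.0, 49)            # 48 shells, in units of R500 (toy grid)
r_mid = 0.5 * (edges[:-1] + edges[1:])
n_e = 1.0 / (1.0 + (r_mid / 0.1) ** 2)       # toy density profile (assumption)
C = emission_measure_projection(edges, n_e)
t3d = 5.0 - 1.5 * r_mid                      # toy declining 3D profile
t2d = C @ t3d
```

In this construction C is upper triangular, since an annulus only sees shells at equal or larger radius, and its last row reduces to a single unit entry, so the projected and intrinsic temperatures coincide in the outermost bin.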
Since projection results in smoother 2D temperature profiles, washing out fluctuations at small scales, one expects the 3D reconstruction obtained from the 2D profile to be more regular compared to what was found in the previous section.It is also important to note that projection effects are dominant in the inner regions (especially in CC clusters), which can introduce degeneracy into the reconstructed 3D temperature profiles in the central region.However, both the 2D and 3D profiles of CC clusters will always display a central temperature dip.Thus one can expect a larger scatter in the 3D reconstructed temperature profiles in the central regions, as compared to the 3D-3D fitting case.
In Fig. 10, we show the ensemble plot of fractional residuals of the 2D (top panel) and 3D (bottom panel) temperature profiles for the validation sample (left panel) and sub-samples (right panel).The fractional residuals in 2D space (where the fitting is actually performed) are smaller compared to the 3D temperature residuals, as expected.
For the 2D fit, we find median fractional residuals at radii 0.02 R 500, R 500, and 2 R 500 to be 0.009 ± 0.027, 0.004 ± 0.040, and −0.018 ± 0.095, respectively. The median of fractional residuals for the full sample and over the entire radial range is found to be −0.002 ± 0.027. Unlike in the 3D-3D case, where the dispersion around the median was slightly larger in the outer regions only, here it also increases towards the centre, as expected from the arguments given above. The dispersion is about ±10% at the first bin.
For the 3D reconstruction, we find median fractional residuals at 0.02 R 500, R 500, and 2 R 500 of 0.021 ± 0.110, 0.014 ± 0.052, and −0.018 ± 0.095, respectively. The median of fractional residuals for the full sample and over the entire radial range is found to be −0.003 ± 0.045. Moreover, as in the 3D-3D case, the histograms of the fractional residuals over all radii for the MR20 and MS20 sub-samples are more narrowly peaked than those of the MD20 and MI20 sub-samples, indicating again that the profiles of more relaxed clusters, or intrinsically smoother 2D temperature profiles, are in general reconstructed with higher fidelity.
In the left panel of Fig. 11, we show the recovered temperature profiles for the extreme cases of the most relaxed / disturbed cluster and the most smooth / irregular profiles in the validation sample. As in the 3D-3D case, the difference between the input and recovered temperature profiles is less than 5% over most of the radial range. In Appendix A.2, the bottom panel of the [...] Here also, all the parameters are well constrained. The comparison with the equivalent parameter contours for the 3D-3D case, also shown on the plot, shows that, understandably, the 2D-3D reconstruction has slightly larger contour intervals than the 3D-3D one.

2D-3D reconstruction of temperature profiles with an observation-like binning
So far we have tested the IAE model only with high resolution simulated temperature profiles. However, real observed 2D temperature profiles have much lower spatial resolution and fewer data points, and are generally measured only up to R 500. In this Section, we test the accuracy with which the IAE model recovers simulated temperature profiles with resolutions similar to those of current X-ray observations (2D-3D fit with coarse binning case). First, we consider a case where we fitted 2D temperature profiles with a resolution similar to that expected from moderately deep X-ray observations. In such observations, we normally expect around twelve annular data points, limited to within R 500. We also impose more realistic errors on the 2D temperature profiles: they are assumed to increase linearly with radius from 5% in the innermost bin to 25% in the outermost bin. Later in this Section, we also consider a fitting case with 2D temperature profiles defined at only six radial points within R 500, with errors ranging from 10% to 30% from the innermost to the outermost radial bin.
Fig. 12: Fractional residuals for 115 clusters in the validation sample with IAE for the 2D-3D fit (coarse binning) using 2D temperature profiles defined at twelve radial bins up to R 500. Colour coding is the same as in Fig. 10. When given 2D temperature profiles with a binning scheme typical for moderately deep X-ray observations, the IAE model can still reconstruct 3D temperature profiles with fractional differences of about 5% throughout the 2D fitting range (i.e. [0.02-1] R 500).

Twelve bin case
In Fig. 12, we show the ensemble plot of the 2D and 3D fractional residuals for the 2D-3D fit with the coarse binning case, by considering twelve 2D temperature data points within R 500 .
Even with the lower resolution, we find that within the 2D fitting range (i.e. up to R 500), the 3D fractional residuals are still close to zero, with a 1-σ dispersion of about ±5%, as in the previous cases. The median 3D fractional residuals at radii 0.02 R 500, R 500, and 2 R 500 are found to be 0.003 ± 0.071, −0.010 ± 0.064, and −0.070 ± 0.185, respectively. The median of fractional residuals for the full sample and over the entire radial range is found to be −0.006 ± 0.051. Beyond R 500, where no 2D temperature data were available to fit, and thus where the constraints on the 3D reconstruction are due only to projection effects, the scatter increases with radius, reaching a 1-σ dispersion of ±20% at the last bin (2 R 500). Moreover, beyond 1.5 R 500, the 3D temperature profiles are underestimated by about 7%. However, it is important to mention that the true 3D temperature profiles mainly lie within the 1-σ dispersion of the reconstructed temperature profiles.
As before in the fine binning case, the dispersion in the 2D fractional residual is much smaller compared to the 3D reconstruction.
For the 2D fit, we find median fractional residuals at radii 0.02 R 500 and R 500 to be 0.001 ± 0.008 and −0.026 ± 0.073, respectively. The median of fractional residuals for the full sample and over the entire radial range is found to be −0.002 ± 0.026 for the 2D profiles, similar to that found in the 2D-3D fit with the fine binning case. Since we assumed that the errors increase radially outwards, as in real observations, which puts more weight on the inner regions in the fit, the constraints in the inner region are better than in the 2D-3D fit with the fine binning case. For comparison, Fig. A.5 in Appendix A.3 shows the 3D fractional residuals for the case where we do not consider error bars in the fit. Here, we find that the scatter is increased in the inner regions compared to both the 2D-3D fit with fine binning (previous case) and the coarse binning case (present case).
As in the previous cases, the histogram of the residuals of the MR20 (MS20) sub-sample has a stronger peak around zero and reduced wings compared to the MD20 (MI20) sub-sample. For example, the 1-σ dispersion in 3D fractional residuals over all radii for the MD20 (MI20) sub-sample is found to be ±0.055 (±0.065), and for the MR20 (MS20) sub-sample it is ±0.041 (±0.036).
In the left hand panel of Fig. 13, we show the IAE recovered temperature profiles of the most relaxed and disturbed cluster and of the most regular and irregular profile in the validation sample. As in the previous cases, here also the difference between the input and recovered temperature profiles is less than 5% in the 2D fitting range of [0.02-1] R 500. Beyond R 500, as expected, the residuals can be high. In Appendix A.3, the top panel of [...]
Fig. 13: [...] using 2D temperature profiles defined at twelve (left panel) and six (right panel) radial bins up to R 500. Errors in the 2D temperature profiles are assumed to increase linearly with radius from 5% (10%) in the innermost bin to 25% (30%) in the outermost bin for the twelve (six) bin case. Colour coding is the same as in Fig. 11.
Fig. 14: Fractional residuals for 115 clusters in the validation sample with IAE for the 2D-3D fit (coarse binning) using 2D temperature profiles defined at six radial bins up to R 500. Colour coding is the same as in Fig. 10. For simplicity, we have not shown the sub-sample cases. Even when given 2D temperature profiles with a binning scheme typical for shallow X-ray observations, the IAE model can still reconstruct 3D temperature profiles with fractional differences of about 5% throughout the 2D fitting range (i.e. [0.02-1] R 500).

Six bin case
The 2D and 3D fractional residuals for a fit considering only six data points in the range [0.02-1] R 500, with errors increasing linearly from 10% in the innermost bin to 30% in the outermost bin, are shown in Fig. 14. We find that the median 2D and 3D fractional residuals are still consistent with zero in the 2D fitting range. However, as expected, the 1-σ dispersion is larger than in the previous cases, and the temperature profiles are underestimated by about 8% beyond 1.5 R 500 (where there are no 2D data). We find median 2D fractional residuals at radii 0.02 R 500 and R 500 of −0.006 ± 0.022 and −0.022 ± 0.070, respectively. For the 3D reconstruction, we find median fractional residuals at 0.02 R 500, R 500, and 2 R 500 of 0.05 ± 0.128, −0.004 ± 0.090, and −0.080 ± 0.235, respectively. The median of fractional residuals for the full sample and over the entire radial range is found to be −0.008 ± 0.038 and −0.014 ± 0.075 for the 2D and 3D profiles, respectively. In the right panel of Fig. 13, we show the temperature profiles of the most relaxed and disturbed cluster and of the most regular and irregular profile in the validation sample. We find that even with only six data points in the fit, the IAE is still able to recover the 3D temperature profiles with residuals of less than 10% over most of the cluster region. However, the confidence intervals of the reconstructed profiles and IAE parameters, shown in the bottom panel of [...]
For comparison, Table 4 provides the median fractional residuals obtained for the different fitting schemes discussed in this Section. Similarly, Table 5 shows the best-fitting parameters of the IAE model obtained with MCMC for the different cases. One can see that as we go from the high resolution simulated profiles to lower resolution observational-like profiles, the dispersion in fractional residuals and parameter estimates increases.
We also checked the performance of the model with other binning schemes and found the performance of the IAE model to be robust. In particular, we checked the performance by considering five 2D data points up to 0.5 R 500 in the fit. We find that the IAE model is able to reproduce the results with an average fractional difference of about 5% up to 0.5 R 500, which then increases with radius and becomes about 10% at R 500 and 25% at 2 R 500. We also considered an IAE model with 20 anchor points, applied to the two observation-like cases, and found that its performance is very similar to that of our fiducial five-parameter IAE model, unlike in the 3D-3D case where it is found to have better performance. This implies that increasing the number of anchor points does not necessarily increase the model fidelity for these cases, as one must also have higher resolution input 2D temperature profiles for the model to be fitted against.
Notes: Numbers in the brackets represent the optimal priors found in this work.

2D-3D reconstruction of temperature profiles with spectroscopic-like weighting
In the previous Sections, we focused only on the 3D temperature reconstruction from the IAE model using 2D temperature profiles derived with standard emission-measure weights (Mathiesen & Evrard 2001). In this Section, we consider the more complex spectroscopic-like weighting (Mazzotta et al. 2004), which has a stronger dependence on the 3D temperature profiles. This makes the deconvolution a more complicated problem, and it is therefore important to check the accuracy of the IAE model in this case. In Fig. 15, we show the fractional residuals between the input and IAE-recovered temperature profiles, in 2D and 3D, for the 2D-3D fit with twelve data points in the range [0.02-1] R 500. We find median fractional residuals at radii 0.02 R 500 and R 500 of 0.002 ± 0.008 and −0.027 ± 0.065, respectively, for the 2D profiles. For the 3D reconstruction, we find median fractional residuals at 0.02 R 500, R 500, and 2 R 500 of 0.040 ± 0.072, −0.003 ± 0.065, and −0.060 ± 0.180, respectively. We see that, on average, there is a small but noticeable 4% overestimation of the 3D temperature profiles in the first 4 radial bins. This could be caused by the presence of dense and cold substructures, which in the simulated objects could lower the central value of the 3D spectroscopic-like temperature in the innermost region, where the impact of this formulation is the strongest (see e.g. Fig. 3 of Rasia et al. 2014). Similarly, beyond R 500 the temperature profiles are underestimated by 8% on average. This effect could also play a role in the central mismatch: since the convolution is temperature dependent, the slight overestimation in the first few innermost bins may also be linked to the underestimation of the temperature profiles in the outermost bins. This suggests the importance of accurately estimating the temperature profiles beyond the 2D fitting range. A more detailed treatment is beyond the scope of this paper, and we leave the investigation of these possible explanations to future work. However, we do find that the median residual is consistent with zero over the full radial range of [0.02-2] R 500 and, as in the previous cases, for the majority of the clusters the true 3D temperature profiles lie within the 1-σ dispersion of the IAE-recovered temperature profiles. The median of fractional residuals for the full sample and over the entire radial range is found to be −0.003 ± 0.038 and −0.003 ± 0.075 for the 2D and 3D profiles, respectively.
Notes: The errors are given at 1-σ level. The reconstructed 2D and 3D temperature profiles at the last point (i.e. 2 R 500) are by construction identical for the 2D-3D fit with fine binning case. The fine binning cases represent high resolution simulated temperature profiles, and the coarse binning cases represent lower resolution observational-like temperature profiles. The constraints on the residuals become weaker with decreasing resolution. Even in the coarse binning cases, the residuals remain consistently below 5% within the fitting range (i.e. [0.02-1] R 500). In all the cases, the MS20 and MR20 sub-samples provide the tightest constraints across the full radial range.
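The difference between the two weighting schemes can be illustrated with a short sketch. It assumes the commonly quoted form of the spectroscopic-like weight of Mazzotta et al. (2004), w ∝ n² T^(-3/4), against the emission-measure weight w ∝ n²; the function name and the two-phase toy gas are illustrative assumptions.

```python
import numpy as np

def weighted_temperature(T, n, volume, kind="em"):
    """Weighted mean temperature of gas elements.

    kind="em": emission-measure weights, w = n^2 V
    kind="sl": spectroscopic-like weights, w = n^2 T^(-3/4) V (assumed form)
    """
    T = np.asarray(T, dtype=float)
    w = np.asarray(n, dtype=float) ** 2 * np.asarray(volume, dtype=float)
    if kind == "sl":
        w = w * T ** -0.75
    return np.sum(w * T) / np.sum(w)

# Two gas phases of equal density and volume at 2 keV and 8 keV.
T = [2.0, 8.0]
n = [1.0, 1.0]
vol = [1.0, 1.0]
t_em = weighted_temperature(T, n, vol, kind="em")
t_sl = weighted_temperature(T, n, vol, kind="sl")
```

Because the spectroscopic-like weight itself depends on temperature, cooler gas along the line of sight pulls the projected value down, and the projection operator is no longer independent of the 3D profile being reconstructed.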

Comparison of IAE model to a parametric model
In this Section, we use the validation sample of 115 clusters to compare the non-parametric results from the IAE model to those obtained from a parametric temperature model. We first obtain the best-fitting 3D temperature from the Vikhlinin et al. (2006) model (Eqn. 2), considering the prior range on each parameter given in Table 3, using the same binning schemes as for the IAE model in the previous sections, and assuming a spectroscopic-like weighting scheme. Temperature profiles were first scaled by T X before fitting them to the parametric model, so as to bring the parameter T 0 to a comparable scale. We find that in the 2D-3D (or 3D-3D) fine binning case, the 3D reconstruction is poor compared to the observational-like cases, where the fitting is weighted according to the errors, which increase with radius. We also tried to fit the temperature profiles in log space, which could address any heteroscedasticity and stabilise the variance over the large radial range. However, this still did not improve the model reconstruction in the 2D-3D (or 3D-3D) fine binning case. This indicates that such a parametric model struggles to accurately capture the true underlying patterns in the noiseless data, or when the noise covariance is negligible. By weighting the fitting according to the errors, which reflect the inherent uncertainties in the data and which increase with radial distance, the model can better adapt to the complexities of the noiseless data, resulting in improved performance. The significant improvement achieved by incorporating the error covariance can be seen in Fig. 16. Even with coarse resolution, as discussed in the next paragraph, the fit improves markedly when a realistic error covariance is considered during the fitting process.
Another reason for the sub-optimal performance of the parametric model is its highly non-linear nature and the strong degeneracy between its parameters. This results in poor constraints on the parameters, and the reconstructed 3D temperature profiles can depend strongly on the choice of fitting priors. These arguments are illustrated in Fig. 16. The top panel of Fig. 16 shows the dispersion for the 2D-3D fine and coarse binning cases with prior ranges of a = 0-0.6 and c = 0-4, parameters which have a significant effect on the profiles in the central and outer regions, respectively. We find, for the 2D-3D fine binning case, that the 3D reconstructed temperature profiles obtained from this parametric fitting have a large bias in both the central and outer regions, with median fractional residuals of about 30% and 11% at the first and last bin, respectively. For observational-like binning, with a weighted fit, the bias in the central regions becomes consistent with zero; however, there is still a bias beyond R 500, which increases outwards and reaches a median fractional residual of about 18%. We find that the optimal priors for parameters a and c are a = 0-0.1 and c = 1-4, respectively, leading to a minimal bias in the central and outer regions, respectively. This is shown in the bottom panel of Fig. 16, where one finds a median consistent with zero, but with slightly larger dispersion compared to the IAE model for the observational-like cases. In the outer regions, however, the dispersion in the 2D-3D fine binning case is barely consistent with zero for the parametric model.
Notes: The errors are given at 1-σ level. As in Table 4, the constraints decrease in strength as we go from the fine binning cases to the observation-like coarse binning cases.
Considering the optimal priors for the a and c parameters discussed above, the left panel of Fig. 17 shows the reconstruction of the 3D temperature profiles with the IAE and parametric models for typical CC and NCC clusters in the simulated sample with observational-like binning having twelve bins. While the CC profile is recovered well by both models, the reconstruction is poor in the central region for the parametric fit to the NCC case, and would require larger values of a to improve the fit in the central region. Similarly, in the right panel of Fig. 17, we show the 3D reconstruction of two complex profiles. These two clusters are experiencing ongoing merger shocks. Here one sees that, in such scenarios, the parametric model performs poorly compared to the IAE model, being unable to capture the true underlying structure of the data. We find that even widening the prior ranges on a and c did not significantly improve the parametric fit for such complex profiles. The accurate estimation of the shape of the temperature profile is vital since the estimation of total mass profiles depends on it.
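For reference, the parametric model used in this comparison can be written compactly in code. The sketch below assumes the published functional form of Vikhlinin et al. (2006), a cool-core term multiplied by a broken power law with eight parameters; Eqn. 2 of this paper may use a different but equivalent parametrisation. The function name and all parameter values are illustrative choices, not fitted results.

```python
import math

def vikhlinin_temperature(r, T0, r_cool, a_cool, T_min, r_t, a, b, c):
    """3D temperature at radius r (r, r_cool and r_t share the same units)."""
    x = (r / r_cool) ** a_cool
    cool_core = (x + T_min / T0) / (x + 1.0)   # central temperature drop
    # broken power law: inner slope -a, outer slope -(a + c), transition near r_t
    outer = (r / r_t) ** (-a) / (1.0 + (r / r_t) ** b) ** (c / b)
    return T0 * cool_core * outer

# Illustrative (not fitted) values: radii in units of R500, T in units of T_X.
params = dict(T0=1.2, r_cool=0.05, a_cool=1.5, T_min=0.5,
              r_t=0.6, a=0.05, b=2.0, c=1.5)
radii = [0.02 * (2.0 / 0.02) ** (i / 11) for i in range(12)]  # 12 log-spaced bins

# Varying c changes mostly the outer bins; varying a changes mostly the inner ones.
for c_value in (1.0, 4.0):
    trial = dict(params, c=c_value)
    outer_bins = [round(vikhlinin_temperature(r, **trial), 3) for r in radii[-3:]]
    print("c =", c_value, "outer bins:", outer_bins)
```

In this form a sets the inner logarithmic slope and c the steepening beyond r_t, which is why the prior ranges on these two parameters control the central and outer behaviour of the reconstructed profiles.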

Modifications to the IAE model
Although the Three Hundred Project provides us with one of the highest resolution hydrodynamical simulation samples to date, due to numerical issues, the thermal profiles could only reliably be estimated above 0.02 R 500 for most of the galaxy clusters in the sample. The number of available 2D annular temperature data points and their radial distribution will depend on the object mass and luminosity, the presence or absence of a cool core, and the depth of the observation⁴. From our experience of X-ray analysis of typical observations of local (z < 0.5) massive (M 500 > 10^14 M ⊙) galaxy clusters available in the XMM-Newton or Chandra archives, we find that for many objects, one is generally able to obtain some temperature data points interior to 0.02 R 500 (corresponding to 20 − 40 ′′ at z = 0.05 and 5 − 10 ′′ at z = 0.3 for typical cluster masses).
Therefore, in order to make the best use of the available data, one needs to look for an optimal extrapolation of the IAE model that is able to reconstruct the temperature profiles robustly even in the very central regions. To build an IAE model that is suitable for application to such observations, we first extrapolated the simulated temperature profiles to 0.005 R 500 by fitting a Vikhlinin et al. (2006) parametric model in the inner regions (up to 0.5 R 500). We then re-trained the IAE model in the full radial range of [0.005-2] R 500 with the simulated dataset, augmented by the parametric model extrapolation in the very central regions.

Observed sample
We then use this updated IAE model on the latest CHEX-MATE Data Release 1 (DR1) sample (Rossetti et al. 2023) to deconvolve the temperature profiles. The DR1 sample is a 'technical but representative' sub-sample, which was built to test our pipeline for the extraction and reconstruction of the radial temperature and density profiles. It is composed of 30 clusters, whose distribution in mass, redshift, and Planck signal-to-noise ratio (S/N) reflects the properties of the CHEX-MATE parent sample. In Appendix A.4, Table A.1 provides the details of all the clusters in the DR1 sample. For data reduction and analysis, we used the XMM-Newton Science Analysis System (SAS), version 16.1. We refer to Bartalucci et al. (2023) for details on the data reduction procedures (calibration, standard pattern cleaning, removal of noisy MOS CCDs, and light-curve filtering) and on the detection of contaminating sources. From the EPIC images in the 0.7-1.2 keV band, we extracted both mean and median surface brightness radial profiles, centered on the peak and on the centroid within R 500. For the temperature profile, we extract spectra in concentric annuli centered on the surface brightness peak, using the MOS-spectra and PN-spectra ESAS tools (Snowden et al. 2008) embedded in SAS. For each region, we perform a joint fit of the MOS1, MOS2, and PN spectra with an absorbed thermal model, to which we add a model for all the background components (Galactic foregrounds, CXB, cosmic-ray particle background, residual soft protons). We estimate priors for the parameters of this background model that are allowed to vary within their uncertainty during the joint fit with the cluster parameters, running the Markov Chain Monte Carlo method within XSPEC (see Rossetti et al. 2023, for more details). In this work, two clusters (PSZ2 G046.88+56.48 and PSZ2 G057.78+52.32) that require background treatments using offset observations were not considered in the analysis.
[Figure caption fragment: ... et al. (2006) models. Solid lines show the best fit for the data. We see that both our non-parametric and parametric approaches provide tight and accurate constraints on the average temperature of clusters.]

Method
For deconvolution of these observed profiles, we assume that the 3D temperature profiles can be represented by the IAE model, convolved with a response matrix C = C PSF ⊗ C proj, which simultaneously takes into account projection and PSF redistribution. The projection matrix, C proj, is built by using the DR1 density profiles from Duffy et al. (2023, in prep.), derived using the non-parametric deconvolution algorithm of Croston et al. (2006). More details of the derivation of the density profiles can be found in Croston et al. (2008) and Pratt et al. (2022). C PSF is constructed as in Croston et al. (2006), using the PSF model of Ghizzardi (2001) as a function of the energy and angular offsets, the parameters of which can be found in EPIC-MCT-TN-011⁵ and EPIC-MCT-TN-012⁶.
The IAE model was then projected, taking into account the spectroscopic-like weighting scheme proposed by Mazzotta et al. (2004), and fitted to the observed 2D profiles. In future work, we will examine the more complex Vikhlinin (2006) weighting scheme, which is more robust for lower temperature clusters/groups, and compare the results to other weighting schemes.
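The projection step can be illustrated with a short sketch. It assumes the standard geometric volume of a spherical shell seen through a cylindrical annulus, emission-measure weights proportional to n² V, and the Mazzotta et al. (2004) spectroscopic-like weights proportional to n² T^(−3/4) V. PSF redistribution and emission beyond the outermost shell are ignored here, and the function names, bin edges and toy profiles are illustrative assumptions rather than the pipeline used in this work.

```python
import math

def shell_annulus_volume(R_in, R_out, s_in, s_out):
    """Volume of the spherical shell [R_in, R_out] seen through annulus [s_in, s_out]."""
    def cap(R, s):
        # (R^2 - s^2)^(3/2), clipped to zero when the sightline misses the sphere
        return max(R * R - s * s, 0.0) ** 1.5
    return (4.0 * math.pi / 3.0) * (
        cap(R_out, s_in) - cap(R_out, s_out) - cap(R_in, s_in) + cap(R_in, s_out)
    )

def project_temperature(edges, density, temperature, weighting="spectroscopic-like"):
    """Project a 3D profile defined in shells onto annuli sharing the same edges."""
    n_bins = len(edges) - 1
    projected = []
    for i in range(n_bins):            # annulus index
        num = den = 0.0
        for j in range(i, n_bins):     # only shells at or outside the annulus contribute
            vol = shell_annulus_volume(edges[j], edges[j + 1], edges[i], edges[i + 1])
            w = vol * density[j] ** 2              # emission-measure weight
            if weighting == "spectroscopic-like":
                w *= temperature[j] ** -0.75       # extra factor T^(-3/4)
            num += w * temperature[j]
            den += w
        projected.append(num / den)
    return projected

edges = [0.0, 0.1, 0.2, 0.4, 0.7, 1.0]                     # units of R500 (illustrative)
mids = [0.5 * (a + b) for a, b in zip(edges, edges[1:])]
density = [(1.0 + (r / 0.1) ** 2) ** -1.0 for r in mids]   # toy beta-model, beta = 2/3
temperature = [1.10, 1.20, 1.10, 0.90, 0.70]               # toy 3D profile, units of T_X
t_sl = project_temperature(edges, density, temperature)
t_em = project_temperature(edges, density, temperature, weighting="emission-measure")
print([round(t, 3) for t in t_sl])
print([round(t, 3) for t in t_em])
```

Because the spectroscopic-like weights favour cooler gas along the line of sight, the projected temperature in each annulus is never higher than the emission-measure-weighted value for the same 3D profile.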

Estimation of profiles
In Fig. 18, we show the 3D temperature profiles reconstructed using the IAE model and the Vikhlinin et al. (2006) 8-parameter parametric model for a typical NCC and a typical CC cluster in the DR1 sample. In general, we find that with the annular resolution of the present 2D profiles, both models produce similar reconstructed 3D temperature profiles. However, the parameters of the Vikhlinin et al. (2006) model are poorly constrained, and the final reconstructed temperature profiles (especially the inner and outer regions) may depend on the chosen priors.
Figure 19 shows the 3D temperature profiles of the clusters in the DR1 sample obtained with the IAE model, scaled by the average temperature (T X) in the [0.15-0.75] R 500 region. We find that the fractional dispersion is about 22% in the inner region; it first decreases with radius and attains a minimum value of 3% at around 0.5 R 500. It then starts to increase with radius, achieving a maximum value of 22% in the outer regions. Also plotted in the sub-panel is the ratio of the 3D temperature profiles recovered with the IAE and parametric models. One finds that within the radial range of [0.1-1] R 500, the difference between the IAE and Vikhlinin et al. (2006) models is less than 10%. The difference between them can be as high as 25% in the inner and outer regions. However, on average both models predict very similar profiles, with a difference of less than 2% over the entire radial range of [0.005-2] R 500.
As a consistency check, we compared the values of the average temperature in the [0.15-0.75] R 500 region. Figure 20 shows the observed T X compared to T X,model, the temperature derived from a projection of the 3D non-parametric IAE and parametric models in the same annulus. Fitting a straight line to the (T X,model, T X) points, one finds the slope for the IAE and parametric models to be 1.01 ± 0.01 and 1.01 ± 0.02 respectively.

Estimation of derivatives
While non-parametric models offer greater flexibility in modelling complex patterns and relationships, one requires a large amount of data to accurately estimate derivatives. Small irregularities in the profiles often amplify the noise in the derivatives. Therefore, it is often desirable to apply some degree of smoothing to the profiles to have accurate derivatives in the non-parametric approaches. As can be seen from Fig. 19, the reconstructed 3D temperature profiles from the IAE model have a reasonably smooth underlying structure. We find that the direct computation of numerical derivatives of individual profiles derived from the MCMC chains using spline interpolation, without applying any smoothing, usually provided a good estimate of the logarithmic derivatives and corresponding 1-σ interval. Nonetheless, we sometimes found the derivative estimates to be noisy, particularly beyond the 2D fitting range. This noise can be attributed to logarithmic binning, which can create sparsity in the outer regions. Another potential cause is small spikes between consecutive radii in the temperature profiles, which the model inherits from the simulations themselves in the inner regions owing to the limited resolution there. We, therefore, choose to apply a very minimal smoothing, such that only the sharp discontinuities, if any (usually small in magnitude), on local scales (2-3 radial bins) are affected/corrected and the general non-linear structure is preserved. We use the algorithm developed by Cappellari et al. (2013), which implements the one-dimensional locally linear weighted regression of Cleveland (1979)⁷. It uses a tri-cube weighting function with weights (1 − u^3)^3, where u is the distance from the local point R under consideration (normalised by the size of the neighborhood), and a smoothing parameter f, which is the fraction of neighborhood points to be considered in the local fit around R. Increasing the value of f enlarges the neighborhood of influential points, leading to smoother profiles. For our case, we apply modest smoothing with f = 0.15.
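The smoothing step described above can be sketched in a few lines. This is a minimal, dependency-free illustration of locally linear regression with tri-cube weights, not the Cappellari et al. (2013) implementation; the function name, the choice of smoothing in ln r, and the toy profile with one artificial spike are assumptions made for the example, and distinct abscissae are assumed.

```python
import math

def loess_1d(x, y, frac=0.15):
    """Locally linear regression with tri-cube weights (minimal sketch)."""
    n = len(x)
    k = max(3, math.ceil(frac * n))          # neighbours used in each local fit
    smoothed = []
    for x0 in x:
        # the k nearest neighbours of x0; the farthest one sets the bandwidth h
        nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:k]
        h = abs(x[nearest[-1]] - x0)
        w = [(1.0 - (abs(x[i] - x0) / h) ** 3) ** 3 for i in nearest]
        sw = sum(w)
        xm = sum(wi * x[i] for wi, i in zip(w, nearest)) / sw
        ym = sum(wi * y[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (x[i] - xm) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (x[i] - xm) * (y[i] - ym) for wi, i in zip(w, nearest))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        smoothed.append(ym + slope * (x0 - xm))   # weighted line evaluated at x0
    return smoothed

n_bins = 48
r = [0.02 * 100.0 ** (i / (n_bins - 1)) for i in range(n_bins)]  # 0.02-2 R500, log-spaced
ln_r = [math.log(v) for v in r]
clean = [1.1 * v ** -0.1 / (1.0 + v ** 2) ** 0.3 for v in r]     # toy smooth profile
noisy = list(clean)
noisy[10] += 0.05                                                 # one artificial spike
smooth = loess_1d(ln_r, noisy, frac=0.15)
print(round(noisy[10] - clean[10], 4), round(smooth[10] - clean[10], 4))
```

With f = 0.15 and 48 bins, each local fit uses eight points, so an isolated spike is damped while a locally linear trend is reproduced unchanged.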
Figure 21 shows the corresponding logarithmic derivatives of the temperature profiles of the two clusters discussed in the previous sub-section. Here also, both the IAE and the parametric models produce consistent profiles. Furthermore, for the IAE model, the profiles obtained with and without applying the smoothing on the temperature profiles are also consistent with each other. This can also be seen in the bottom panel, where the ratio between the reconstructed 3D temperature profiles with and without smoothing is seen to deviate from unity by less than 1% over most of the radial range. Figure 22 shows the logarithmic derivatives of the 3D temperature profiles of the clusters in the DR1 sample obtained with the IAE model. Also, in the bottom panel, we show the difference in logarithmic derivatives derived from the IAE and parametric models (∆). We find that, although the dispersion in the difference increases with radius, the difference is consistent with zero throughout the radial range. While it is difficult to quantify this difference in the inner region, since logarithmic derivatives are close to zero, in the range [0.5-2] R 500 the difference in logarithmic derivatives between the IAE and the parametric model can be more than 20%. The impact of this on the total mass estimate is not straightforward but is expected to be about 5%-30%.
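For illustration, the logarithmic derivative d ln T / d ln r can be estimated by finite differences in log space, as sketched below. The analysis in this work uses spline interpolation of profiles drawn from the MCMC chains; the simpler differencing scheme, the function name and the toy power-law profile here are assumptions made to keep the example self-contained.

```python
import math

def log_derivative(r, T):
    """d ln T / d ln r from finite differences in (ln r, ln T)."""
    u = [math.log(v) for v in r]
    y = [math.log(v) for v in T]
    n = len(u)
    deriv = []
    for i in range(n):
        # central difference in the interior, one-sided at the two ends
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        deriv.append((y[hi] - y[lo]) / (u[hi] - u[lo]))
    return deriv

radii = [0.02 * 100.0 ** (i / 47) for i in range(48)]   # 0.02-2 R500, log-spaced
profile = [v ** -0.5 for v in radii]                    # toy power law, slope -0.5
slopes = log_derivative(radii, profile)
print(round(slopes[0], 6), round(slopes[24], 6), round(slopes[-1], 6))
```

The hydrostatic mass at a given radius scales with the sum of the logarithmic slopes of density and temperature, so a given error in d ln T / d ln r matters most where the density slope is shallow.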

Discussion and conclusions
Classical statistical modelling techniques can be sensitive to inaccuracies and may lead to poor performance if the data are complex (non-linear) and/or have a dynamic structure. Data-driven (model-agnostic) deep-learning techniques are now becoming increasingly popular. They make use of the topology to learn the underlying structure of the data, and have often been found to give superior performance in terms of accuracy and precision when the underlying structure of the data is non-linear. However, one typically requires a massive dataset and vast computational resources to train the neural network, limiting their applicability for some scenarios. In this paper, we demonstrate the first use of deep learning techniques to build a model of galaxy cluster temperature profiles and apply this model to the problem of temperature profile deprojection. Using a non-linear interpolatory scheme with five anchor points (temperature profiles) allows us to have frugal learning with a sparse training set, and the neural network is able to uncover the lower dimensional non-linear manifold of the data by way of mapping between latent space and real space.
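The structure of the interpolatory scheme can be sketched with a toy example. In the sketch below, the trained encoder and decoder networks are replaced by an element-wise logarithm and exponential, which are stand-ins chosen only to make the example runnable; the five anchor profiles are invented, and the barycentric weights are assumed to sum to one. Only the overall pattern (encode the anchors, take their weighted barycentre in latent space, decode) reflects the scheme described in the text.

```python
import math

def encode(profile):
    """Stand-in for the trained encoder (assumption: element-wise log)."""
    return [math.log(t) for t in profile]

def decode(latent):
    """Stand-in for the trained decoder (assumption: element-wise exp)."""
    return [math.exp(z) for z in latent]

def iae_model(weights, anchors):
    """Decode the barycentre of the encoded anchor profiles."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("barycentric weights are assumed to sum to one")
    encoded = [encode(a) for a in anchors]
    n_bins = len(encoded[0])
    barycentre = [sum(w * e[k] for w, e in zip(weights, encoded)) for k in range(n_bins)]
    return decode(barycentre)

# Five invented anchor profiles on four radial bins (units of T_X).
anchors = [
    [0.8, 1.1, 1.0, 0.7],
    [1.2, 1.1, 0.9, 0.6],
    [1.0, 1.0, 0.9, 0.8],
    [0.9, 1.2, 1.0, 0.5],
    [1.1, 1.0, 0.8, 0.7],
]
profile = iae_model([0.4, 0.3, 0.1, 0.1, 0.1], anchors)
print([round(t, 3) for t in profile])
```

In a deconvolution fit, the weights play the role of the free parameters: each MCMC step proposes a set of weights, the model returns a 3D profile, and that profile is projected and compared with the observed 2D data.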
The resulting Interpolatory Auto-Encoder (IAE) model is trained and evaluated in the radial range of [0.02-2] R 500 using a simulated dataset of 315 temperature profiles from the Three Hundred Project. We then implement a new deconvolution scheme using efficient and cost-effective learning-based regularisation to achieve a stable and accurate reconstruction of the 3D temperature profiles by optimising the latent parameters (barycentric weights) of the anchor points using MCMC. Moreover, the deconvolution algorithm can be easily extended to include the instrumental PSF effect. We test the IAE with a set of different deconvolution schemes with respect to the resolution, projection, and quality of the data. We find that, in general, the IAE model can recover unbiased 3D temperature profiles in the fitting range. The performance of the IAE model to recover the true temperature profiles can be summarised as follows:
- We first considered the simplest case, where we tested the efficiency of the IAE model in directly fitting the high resolution simulated 3D temperature profiles, defined in 48 fixed radial bins in the range [0.02-2] R 500, the resolution with which the IAE model is trained. We find that in this case, the reconstruction of temperature profiles from the IAE model is robust, with the median fractional residuals centered around zero and a 1-σ dispersion (determined by the 16th and 84th percentile range of fractional residuals) of about ±5% over most of the radial range. The dispersion in the outskirts is somewhat larger (about ±10%). This can be interpreted as being due to the complex nature of the ICM as a result of merging/accretion processes that are dominant there. Moreover, dispersion in the fractional residuals for the sub-sample of 20 most relaxed clusters (MR20) and smooth temperature profiles (MS20) is about 35% smaller compared to the sub-sample of 20 most disturbed clusters (MD20) and irregular temperature profiles (MI20). We find that the model fidelity can be further improved by increasing the number of anchor points in the IAE model. However, since observed temperature profiles are generally of much lower resolution, increasing the complexity of the model is undesirable as it could lead to overfitting.
- We then considered a case where we fitted the high resolution simulated 2D temperature profiles to the IAE model using classical emission measure weights. Here too we find the median fractional residual is centered around zero with a 1-σ dispersion of about ±5% over most of the radial range. In the first few innermost bins, however, we find that the dispersion is increased to about ±10%. This is understandable since the projection operation introduces a degeneracy in the 3D temperature profiles which is significant in the inner regions, i.e. the mapping between input 2D temperature profiles and IAE reconstructed 3D temperature profiles is not as strong as the mapping between input 3D temperature profiles and IAE reconstructed 3D temperature profiles. However, this degeneracy can be mitigated to a large extent in the observational-like cases since the 2D temperature profiles in the inner bins have relatively smaller errors associated with them as compared to the rest of the radial bins. Moreover, as in the previous case, the distribution of the fractional residuals over all radii for the MR20 (MS20) sub-sample is narrowly peaked compared to the MD20 (MI20) sub-sample.
- We next considered observation-like fitting cases, with typical temperature profile data quality such as would be obtained from the XMM-Newton or Chandra satellites. We first considered a case where we fit 2D temperature profiles defined at twelve radial points and up to R 500 only, mimicking the profile expected from moderately deep X-ray exposures. We find that in the 2D fitting range, i.e. [0.02-1] R 500, with the relatively low resolution input 2D temperature profiles, the performance of the IAE model is negligibly degraded. However, beyond R 500, where we do not consider any 2D data in the fit, the 1-σ dispersion in the 3D reconstruction increases with radius and becomes about ±20% in the last bin. The 3D median fractional residual is found to be close to zero over most of the radial range, except beyond 1.5 R 500 where it is underestimated by about 7%. We also considered a case where we use only six 2D temperature data points in the fit and find that the IAE is still able to provide an unbiased estimate of the reconstructed temperature profile, albeit with a slightly larger uncertainty.
- We considered a more realistic temperature-dependent spectroscopic-like weighting scheme (Mazzotta et al. 2004) in the deprojection. We find that there is a small bias of about 4% excess in the fractional residual in the innermost few bins, in addition to the underestimation in the outer regions as in the previous case.
- We also compared the IAE model with a parametric temperature model. With the high resolution hydrodynamical simulated temperature profiles, the parametric model based on Vikhlinin et al. (2006) showed poor performance when the realistic error covariance matrix is ignored in the fit. Including the error covariance matrix improved the fit. The non-linearity and parameter degeneracy of the parametric model also contributed to sub-optimal performance, making the 3D reconstruction dependent on the choice of priors. In contrast, the IAE model performed better, particularly in complex cases with ongoing merger shocks, demonstrating its superior adaptability to diverse data scenarios.
- Finally, in a first application to X-ray data, we built an augmented version of the IAE model in the radial range [0.005-2] R 500. The data augmentation was necessary because the simulated profiles did not have sufficient resolution to probe the very core regions that are accessible to good quality X-ray data. The augmentation step was achieved by extrapolating the simulated profiles to lower radii (below ≈0.02 R 500) by fitting them to the Vikhlinin et al. (2006) parametric model in the range ≈ [0.02-0.5] R 500. We then used this updated IAE model to reconstruct the 3D temperature profiles and logarithmic derivatives of the representative (DR1) sample of galaxy clusters drawn from the CHEX-MATE project. The resulting non-parametric IAE profiles were compared to those derived from parametric deprojection and deconvolution. We find that, in such observational cases where the typical number of annular data points is much smaller than in the simulations, the difference between the IAE and parametric model is less than 10% over most of the observed region. However, in the inner and outer regions, the difference between them can be as high as 25%. Moreover, the results from the Vikhlinin et al. (2006) parametric model, especially in the inner and outer regions, depend on the priors chosen for the parameters, as they are very poorly constrained during the fit.
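The summary statistics quoted in the list above (the median fractional residual and the 1-σ dispersion taken from the 16th and 84th percentiles) can be computed per radial bin as in the sketch below. The sign convention (recovered minus true, divided by true), the linear-interpolation percentile and the toy profiles are assumptions made for illustration.

```python
def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def residual_summary(true_profiles, recovered_profiles):
    """Per radial bin: (median, 16th, 84th percentile) of (recovered - true) / true."""
    n_bins = len(true_profiles[0])
    summary = []
    for k in range(n_bins):
        frac = [(rec[k] - tru[k]) / tru[k]
                for tru, rec in zip(true_profiles, recovered_profiles)]
        summary.append((percentile(frac, 50), percentile(frac, 16), percentile(frac, 84)))
    return summary

# Toy sample: three clusters, two radial bins (units of T_X).
truth = [[1.00, 0.80], [1.10, 0.70], [0.90, 0.75]]
recovered = [[1.02, 0.76], [1.08, 0.73], [0.93, 0.75]]
for median, p16, p84 in residual_summary(truth, recovered):
    print(round(median, 4), round(p16, 4), round(p84, 4))
```

A negative median in the outer bins would correspond to the underestimation reported beyond 1.5 R 500 under this sign convention.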
It should be noted that the inner regions of the clusters, which involve processes such as AGN feeding/feedback, gas condensation, sloshing, etc., are complex and may not be accurately represented by current state-of-the-art cosmological simulations. Moreover, the augmentation of the central regions of the training set using the extrapolation of a parametric model could potentially introduce bias in the underlying model recovered from the IAE. Despite these limitations, we believe that the IAE model provides higher-fidelity results compared to traditional parametric modelling, as demonstrated in this study. As the size and quality of both X-ray observations and simulations are set to improve in the coming years, the robustness of the IAE will also be enhanced, resulting in a much lower scatter. Our future plan is to perform network training and testing on different sets of simulations so as to have a larger training and validation sample. This will potentially also help us to understand the systematics, if any, in the IAE model inherited from the particular set of numerical simulations used for training. For example, De Luca et al. (2021) showed that the dynamical state of clusters in the Three Hundred Project varies with redshift: the relaxed clusters decrease in number from redshift z = 0 to z = 1. It remains to be seen if issues such as possible redshift dependence have any impact on learning. This effect, in principle, can be taken into account by training the model using simulated clusters across a large redshift range.
Another important step in improving the deconvolution scheme will be to force the neural network model to learn the features shared between simulations and real data using transfer/adversarial learning (Ganin et al. 2016). This will essentially mitigate the biases inherited by the neural network model from simulations. Moreover, we expect that, with an upgraded IAE model, the reconstruction of 3D temperature profiles beyond the observational range of R 500 will be significantly improved due to an increase in the size of the training sample. We further plan to implement a more robust model extrapolation technique in future work.
The usefulness of the IAE is not limited to the estimation of the temperature of galaxy clusters. We further plan to use the IAE interpolatory technique to recover the underlying density, pressure and hence dark matter profiles in galaxy clusters. An important extension of this will be to train a neural network to estimate the total mass profiles of galaxy clusters directly from the thermal profiles of the ICM without considering the hydrostatic equation. Another interesting prospect for our work will be to implement the deconvolution technique in SZ and lensing data, to recover a robust model of galaxy clusters. This will further help us to understand the biases introduced in calibrating the mass and scaling relations for cosmological studies. Such studies might also be used to assess more robustly relative density/temperature fluctuations, hence constraining turbulence and related parameters (Mach number, injection scale, etc). Our methodology can also be implemented in other areas of astrophysics and cosmology. In fact, the IAE scheme has already been implemented in a source separation algorithm to tackle physical hyper-spectral data (Gertosio et al. 2023).
One of our immediate plans is to apply the proposed deconvolution technique to the most recent high quality CHEX-MATE X-ray sample of clusters (CHEX-MATE Collaboration 2021), and compare to other approaches such as those used in Bartalucci et al. (2018) (semi-parametric reconstruction) and Eckert et al. (2022) (multi-scale non-parametric reconstruction). The comparison of the estimated logarithmic derivatives will be instructive since these are closely related to the shape of the mass profiles of clusters. Our ultimate goal will be to test the ΛCDM predictions on the total mass distribution in galaxy clusters using a new and sophisticated fully non-parametric approach.

Fig. 1: Comparison of the observed 2D temperature profiles, scaled as a function of R 500 and T X, the temperature in the [0.15-0.75] R 500 region. The thin grey lines show 50 randomly selected simulated 2D temperature profiles from the Three Hundred Project, extracted with an observation-like annular binning resolution, derived using emission measure (left panel) and spectroscopic-like (right panel) weighting schemes. The thin red lines show individual profiles in the Planck Collaboration XI (2011) sample. For better visibility, the error bars corresponding to the observed profiles are not shown. The regions enclosing thick black and red lines show the 1-σ dispersion (16th-84th percentile range) of the temperature profiles of the full simulated sample and the Planck sample respectively. The regions enclosing the thick blue lines show the 1-σ dispersion of the CHEX-MATE DR1 sample. Scaled by R 500 and T X, both the emission measure and spectroscopic-like derived 2D simulated temperature profiles become somewhat self-similar.

Fig. 3: Classification of temperature profiles in the Three Hundred Project. Left panel: Grey lines show the visually classified CC clusters. Cyan and green lines show the 20 most relaxed clusters (top panel) and 20 most smooth profiles (bottom panel). Right panel: Grey lines show the visually classified NCC clusters. Magenta and orange lines show the 20 most disturbed clusters (top panel) and 20 most irregular profiles (bottom panel).

Fig. 4: Distribution of clusters in the Three Hundred Project as a function of the χ D (Eqn. 6) and χ S (Eqn. 8) criteria. The hatched cyan and magenta regions show the 20 most relaxed clusters and the 20 most disturbed clusters respectively, based on the χ D criterion. The hatched green and orange regions show the 20 most regular profiles and the 20 most irregular profiles respectively, based on the χ S criterion.

Fig. 6: Design of the neural network used in this work. Left panel: Neural network used in the training stage. Φ and Ψ represent the encoder and decoder respectively. T i are the elements of the training set and T e a are the elements of the anchor set. Φ(T i ) and Φ(T e a ) are the representations of T i and T e a , respectively, in the encoder (feature) space. Θ(Λ i ) = Θ([λ i 1 , ..., λ i d ]) is the Euclidean barycentric representation of Φ(T i ) in terms of d anchor points Φ(T e a ), which is fed to the decoder. Ψ(Θ) is the reconstructed output of the decoder. The network is trained by minimising the error between the input T i and output Ψ(Θ) temperature profiles. Right panel: Neural network (IAE model) of temperature profiles, where λ 1 , ..., λ d are the input parameters and IAE([λ 1 , ..., λ d ]) is the output temperature profile. The decoder is not required in any step here.

Fig. 7: The five anchor points (example profiles), T e a , where e runs from 1 to 5, used in the IAE model.

Fig. 8: Fractional residuals for 115 clusters in the validation sample with IAE for the 3D-3D fit. The three horizontal dashed black lines represent zero and ±5% fractional residuals; the vertical dashed black lines represent R 500. Left panel: The grey lines show the individual fractional residuals of all the clusters. The solid black line and shaded black region show the median and 1-σ dispersion of the fractional residual distribution, respectively. The histogram shows the distribution of fractional residuals over all radii. Right panel: The cyan and magenta lines in the top panel show the fractional residuals of the MR20 and MD20 sub-samples, respectively. The green and orange lines in the bottom panel show the fractional residuals of the MS20 and MI20 sub-samples, respectively. Shaded regions show the corresponding 1-σ dispersion of the fractional residual distribution. The histograms show the distribution of fractional residuals over all radii. Regions enclosed by the solid black lines show the 1-σ dispersion of the fractional residual of the full validation sample. The IAE model can reconstruct 3D temperature profiles with a fractional difference of about 5% across nearly the full radial range.

Fig. 10: Fractional 2D and 3D residuals for 115 clusters in the validation sample with IAE for the 2D-3D fit (fine binning). The three horizontal dashed black lines represent zero and ±5% fractional residuals; the vertical dashed black lines represent R 500. Left panel: Grey lines show the individual 2D (top panel) and 3D (bottom panel) residuals of all the clusters. The solid black line and shaded black region in the left panels show the median and 1-σ dispersion of the 2D (top panel) and 3D (bottom panel) residual distribution, respectively. The histogram shows the distribution of residuals over all radii. Right panel: The cyan and magenta lines show the 2D (top panel) and 3D (bottom panel) residuals of the MR20 and MD20 sub-samples respectively. Green and orange lines show the 2D (top panel) and 3D (bottom panel) residuals of the MS20 and MI20 sub-samples respectively. Shaded regions show the corresponding 1-σ dispersion of the residual distribution. Regions enclosed by the solid black lines show the 1-σ dispersion of the median residual of the full validation sample. The histograms show the distribution of residuals over all radii. When given 2D profiles as input, the IAE model can reconstruct 3D temperature profiles with a fractional difference of about 5% across nearly the full radial range.

Fig. A.2 shows the corresponding posterior distribution of the parameters of the IAE model obtained using MCMC. The parameters are seen to be well-constrained, and as anticipated the relaxed cluster profile (or the most regular profile) has tighter constraints compared to the most disturbed cluster (or the most irregular profile), which has relatively larger contour levels. Figure A.3 of Appendix A.2 shows the comparison of temperature profiles and the reconstructed temperature profiles of 20 example clusters in the validation sample. We also tested the effect on the IAE model of increasing the number of anchor points. We found that the model fidelity can be improved by increasing the number of anchor points and that the choice of 20 anchor points reduces the residuals significantly. In Appendix A.2, Fig. A.4 shows the recovered ensemble plot of fractional residuals using the IAE model with 20 anchor points for the full validation sample, and for the different sub-samples. There is a significant improvement in the average fractional residual in all the cases. The median of the fractional residuals for the full sample over the entire radial range is


Fig. 13: Results for the most relaxed and most disturbed clusters, and for the smoothest and most irregular profiles, with the 2D-3D fit (coarse binning).

Fig. A.6 shows the corresponding posterior distribution of the parameters. The confidence intervals for the IAE model parameters are larger than in the fine-binning cases (i.e. cases 1 and 2). However, we were still able to place relatively good bounds on the parameters, whose posterior distributions are nearly Gaussian. Figure A.7 in Appendix A.3 compares the true 2D and 3D temperature profiles with the reconstructed temperature profiles of 20 example clusters in the validation sample for the twelve-bin case.
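As a rough illustration of how confidence intervals of this kind are commonly read off an MCMC chain, the sketch below summarises each parameter by its median and a central 68% interval. The function name `credible_interval`, the synthetic Gaussian chain, and its parameter values are hypothetical stand-ins; they are not the IAE parameters or the chains used in this work.

```python
import numpy as np

def credible_interval(chain, level=0.68):
    """Median and lower/upper half-widths of the central credible interval.

    chain: array of shape (n_samples, n_params) holding posterior samples.
    """
    lo, med, hi = np.percentile(
        chain, [50.0 * (1.0 - level), 50.0, 50.0 * (1.0 + level)], axis=0
    )
    return med, med - lo, hi - med

# Hypothetical chain: two independent Gaussian parameters (illustration only).
rng = np.random.default_rng(0)
chain = rng.normal(loc=[0.3, 0.5], scale=[0.05, 0.10], size=(20000, 2))

med, minus, plus = credible_interval(chain)
# For a nearly Gaussian posterior the two half-widths are almost equal.
print(med, minus, plus)
```

Comparing the lower and upper half-widths gives a quick, informal check of how close a marginal posterior is to Gaussian; a corner plot remains the more complete diagnostic.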

Fig. 15: Fractional residuals for 115 clusters in the validation sample with IAE for the 2D-3D fit (coarse binning) using spectroscopic-like 2D temperature profiles defined at twelve radial bins up to R_500. For simplicity, we have not shown the subsample cases.
Fig. A.6 in Appendix A.3, are larger compared to previous cases. Finally, Fig. A.8 in Appendix A.3 compares the true 2D and 3D temperature profiles with the reconstructed temperature profiles of 20 clusters in the validation sample for the six-bin case.

Fig. 16: The 1-σ dispersion in the 3D fractional differences obtained with MCMC for the priors provided in Table 3 for the Vikhlinin et al. (2006) parametric model (Eqn. 2). In the figure, we consider the 2D-3D fine binning case and the 2D-3D observational-like coarse binning cases with twelve and six bins. The top panel shows the results with prior ranges of a = 0 − 0.6 and c = 0 − 4, while the bottom panel presents the results with prior ranges of a = 0 − 0.1 and c = 1 − 4. The regions enclosed by cyan and magenta lines in the bottom panel show the corresponding dispersion recovered with the IAE model for the observational-like cases with twelve and six bins, respectively.

Fig. 17: Comparison of model recovery for CC and NCC clusters. Left panel: Comparison of the 3D temperature profiles of typical CC and NCC clusters in the Three Hundred Project sample recovered with the IAE and parametric models using twelve 2D annuli within R_500 (points with error bars). The dashed lines show the true 3D temperature profiles. The solid lines and shaded regions show the reconstructed 3D temperature profiles with 1-σ dispersion obtained with the IAE model. The dotted lines are the 3D temperature profiles recovered with the Vikhlinin et al. (2006) parametric model. For better visibility, the 1-σ dispersion for the parametric model is not shown. Right panel: 3D temperature profile reconstruction with the IAE and parametric models for two complex cases in the Three Hundred Project. For better visibility, the 2D profiles and the 1-σ dispersion are not shown. For both figures, the bottom panel shows the fractional difference between the true and recovered 3D profiles. For NCC and CC clusters, the IAE model and the parametric model with optimal priors give comparable reconstructions, but the former performs slightly better. For the complex cases, the IAE model is more accurate in uncovering the profile shapes.

Fig. 19: Scaled 3D temperature profiles of the DR1 sample recovered with the IAE model. The bottom panel shows the ratio of the 3D temperature profiles recovered with the IAE model to those recovered with the parametric model. For better visibility, the error bars corresponding to the individual profiles are not shown. The black lines and grey shaded regions represent the median and 1-σ dispersion of the sample. The difference between the IAE model and the parametric model can be as high as 20%, although the average ratio between them remains close to unity.

Fig. 20: Left panel: Comparison of the observed T_X and the best-fit T_{X,model} obtained with the non-parametric IAE and parametric Vikhlinin et al. (2006) models. Solid lines show the best fit to the data. Both the non-parametric and parametric approaches provide tight and accurate constraints on the average temperature of clusters.
, which uses the parametric Article number, page 20 of 32 A. Iqbal et al.: Deprojection and deconvolution of temperature profiles with deep learning

Fig. 21: Comparison of the logarithmic derivatives of the 3D temperature profiles of a typical NCC (PSZ2 G050.40+31.17) and CC (PSZ2 G057.92+27.64) cluster in the DR1 sample recovered with the IAE and parametric models. Solid lines and the associated shaded regions show the median and 1-σ dispersion obtained with MCMC. The region enclosed by the dashed lines represents the 1-σ dispersion when no smoothing is applied to the profiles derived from the MCMC chain. The bottom panel shows the ratio of the median 3D temperature profiles obtained using IAE with and without smoothing.

Fig. 22: Logarithmic derivatives of the 3D temperature profiles of the DR1 sample recovered with the IAE model. The bottom panel shows the difference between the profiles recovered with the IAE model and the parametric model. For better visibility, the error bars corresponding to the individual profiles are not shown. The black lines and grey shaded regions represent the median and 1-σ dispersion of the sample.

Fig. A.7: Left panel: Comparison of the 20 simulated 2D temperature profiles (solid points with errors) and the reconstructed 2D temperature profiles obtained using the IAE model (solid lines), with the fit performed in the range [0.02-1] R_500 using twelve 2D temperature bins. The shaded regions represent the 1-σ dispersion of the reconstructed 2D temperature profiles. The smaller subplots show the residuals of the fit. Right panel: Solid lines and shaded regions show the corresponding reconstructed 3D temperature profiles and their 1-σ dispersion, respectively. The dashed lines show the true 3D mass-weighted temperature profiles.

Table 1: Details of the neural network architecture and hyper-parameters used in this work.

Table 2: Flat priors used for the IAE model parameters.

2. 2D-3D fit with fine binning: we fitted the 2D projected temperature profiles with the IAE model convolved with a projection matrix C. In this case, we wish to assess how well the IAE model recovers the intrinsic 3D temperature profile when only 2D projected data are available. We used the same 2D radial logarithmic binning as that of the T_3D profiles, meaning that C has dimensions of 48 × 48 (C_{48,48}). For this testing phase, we assume standard emission measure weighting to calculate the elements of C.
3. 2D-3D fit with coarse binning: the 2D projected temperature profiles with coarse logarithmic radial binning of twelve or six points up to R_500 were fitted with the IAE model convolved with the matrix C. Here, the goal is to assess the ability of the IAE model to recover the intrinsic 3D temperature profile when only a coarse 2D projected profile, similar to that obtained from present-day observations, is available. In this case, C has dimensions of 12 × 48 (C_{12,48}) and 6 × 48 (C_{6,48}) for the 2D temperature profiles with twelve and six bins, respectively. As above, we use standard emission measure weighting to calculate the elements of C. In Sect. 5.4, we will also consider the Mazzotta et al. (2004) temperature-dependent spectroscopic-like weights.
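To make the role of the projection matrix more concrete, the sketch below shows one way an emission-measure-weighted matrix with 12 × 48 elements could be assembled and applied to a 3D profile. It is only an illustration under stated assumptions, not the implementation used in this work: the function names, the β-model density, the toy temperature profile, and the neglect of emission beyond the outermost 3D radius are all hypothetical choices. The geometric weights are the volumes of spherical shells intersected by cylindrical annuli along the line of sight.

```python
import numpy as np

def sphere_in_cylinder(r, R):
    """Volume of a sphere of radius r lying inside an infinite cylinder of
    radius R whose axis (the line of sight) passes through the sphere centre."""
    inner = np.clip(r**2 - R**2, 0.0, None)  # zero when the cylinder encloses the sphere
    return 4.0 * np.pi / 3.0 * (r**3 - inner**1.5)

def projection_matrix(r_edges, R_edges, density):
    """Emission-measure-weighted projection matrix of shape (n_2D, n_3D).

    r_edges: 3D shell edges; R_edges: 2D annulus edges;
    density: gas density in each 3D shell (arbitrary units).
    """
    V = sphere_in_cylinder(r_edges[None, :], R_edges[:, None])
    # Volume of shell j seen through annulus i (inclusion-exclusion on the edges).
    vol = V[1:, 1:] - V[:-1, 1:] - V[1:, :-1] + V[:-1, :-1]
    vol = np.clip(vol, 0.0, None)            # guard against rounding noise
    w = density[None, :]**2 * vol            # emission measure ~ n^2 dV
    return w / w.sum(axis=1, keepdims=True)  # each row sums to one

# Hypothetical binning: 48 shells and 12 annuli, log-spaced in [0.02, 1] R_500.
r_edges = np.logspace(np.log10(0.02), 0.0, 49)
R_edges = np.logspace(np.log10(0.02), 0.0, 13)
r_mid = np.sqrt(r_edges[:-1] * r_edges[1:])

density = 1.0 / (1.0 + (r_mid / 0.1)**2)     # assumed beta model (beta = 2/3)
C = projection_matrix(r_edges, R_edges, density)

T3d = 5.0 * (1.0 + r_mid)**-0.5              # toy 3D temperature profile (keV)
T2d = C @ T3d                                # projected 2D profile, 12 bins
```

Because each row of C is normalised, an isothermal 3D profile projects onto the same temperature in every annulus, which is a convenient sanity check. Spectroscopic-like weights would additionally multiply the emission measure by a temperature-dependent factor, which makes C depend on the profile being fitted.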

Table 5: Best-fit results for the IAE parameters derived with MCMC for the fitting schemes and samples considered in Secs. 5.1, 5.2 and 5.3.
sample recovered with the IAE and parametric models. Solid lines and the associated shaded regions show the median and 1-σ dispersion of the reconstructed 3D temperature profile obtained with MCMC. Regions enclosed by the dashed lines represent the corresponding 1-σ dispersion of the 2D temperature profiles fitted to the observed 2D data (black dots). In line with our results from simulations for the observational-like cases, we find that the IAE model and the parametric model with optimal priors generate comparable profiles for NCC and CC clusters.