A Robust Model for Flux Density Calculations of Radio Halos in Galaxy Clusters: Halo-FDCA

Here we present Halo-FDCA, a robust open source Python package for modeling and estimating total flux densities of radio (mini) halos in galaxy clusters. Radio halos are extended ( ~200 - 1500 kpc in size) synchrotron emitting sources found in galaxy clusters that trace the presence of cosmic rays and magnetic fields in the intracluster medium (ICM). These sources are centrally located and have a low surface brightness. Their exact origin is still unknown but they are likely related to cosmic rays being re-accelerated in-situ by merger or sloshing driven ICM turbulence. The presented algorithm combines the numerical power of the Markov Chain Monte Carlo routine and multiple theoretical models to estimate the total radio flux density of a radio halo from a radio image and its associated uncertainty. This method introduces a flexible analytic fitting procedure to replace existing simplistic manual measurements prone to biases and inaccuracies. It allows to easily determine the properties of the emission and is particularly suitable for future studies of large samples of clusters.


Introduction
Galaxy clusters are massive gravitationally bound systems consisting of hundreds to thousands of individual galaxies and have total masses of the order of ∼ 10 14−15 M ⊙ . However, most of the baryonic matter in galaxy clusters is in the form of a hot dilute plasma that permeates the cluster's volume and that is called intracluster medium (ICM). The ICM emits thermal bremsstrahlung at X-rays wavelengths (Forman and Jones, 1982;Sarazin, 1986).
In the radio band, particularly at lower frequencies, extended radio emission is observed in some galaxy clusters (for a recent review see van Weeren et al., 2019). This diffuse emission is not directly associated with individual cluster member galaxies and indicates the presence of cosmic rays (CR) and magnetic fields in the ICM. This radio synchrotron emission generally has a steep spectrum, with ≲ −1 ( ∝ where is the spectral index). Cluster-scale emission in galaxy clusters has been classified into two main categories: radio (mini) halos and relics. The focus of this work are radio (mini) halos which are centrally located and their morphologies approximately follow the X-ray emission from the ICM. They are thought to trace particles accelerated by turbulence in the ICM. Giant radio halos extending over Mpc-scales are thought to trace turbulence induced by energetic cluster merger events (Brunetti et al., 2001;Petrosian, 2001). In dynamically relaxed clusters, diffuse radio emission covering the central region of the cluster can be found in form of radio mini-halos (e.g., Gitti et al., 2002;Giacintucci et al., 2017). These sources are smaller in size compared to halos (100s kpc vs Mpc-scale), and they are believed to trace turbulent motions in cluster cores injected by sloshing or by the central AGN (e.g, Mazzotta and Giacintucci, 2008;ZuHone et al., 2013). The precise acceleration mechanisms that operate in the ICM are still a point of debate, although some consensus exists on the global processes (e.g, Brunetti and Jones, 2014).
Radio halos and mini-halos are generally morphologically smooth, although there are several examples of irregular shapes thanks to the advent of increasingly high sensitive observations that unveil substructures within the diffuse emission itself (e.g., Gendron-Marsolais et al., 2017;Botteon et al., 2020a). Examples of known and well-studied clusters with central bright and extended radio emission are the Coma cluster, Abell 2744, and the Perseus cluster (e.g., Giovannini et al., 1993;Sijbring and de Bruyn, 1998;Brown and Rudnick, 2011;Pearce et al., 2017). Currently, more than 100 diffuse radio sources in the ICM are known and their number is increasing rapidly due to recent advances of low-frequency radio telescopes and surveys (e.g., Jonas, 2009;Norris et al., 2011;van Haarlem et al., 2013;Duchesne et al., 2017;Hurley-Walker et al., 2017;Shimwell et al., 2017;Shimwell et al., 2019;Gupta et al., 2017;de Gasperin et al., 2021).
Using large samples of radio halos we can study their statistical properties, such as how the total power and emissivity in the cluster's volume scale with cluster mass. These quantities form an important basis to test theoretical models for the formation of diffuse sources in the ICM. Using a fitted flux density profile, the total flux, surface brightness, and size of a halo can be accurately estimated. Currently, measured values in the literature often adopt a boundary of the radio emission based on a certain contour level determined by the noise of the radio image. This means that the sizes and integrated flux densities do depend on the map noise and grow when deeper observations are available. The main objective in this work is to develop a robust model to fit the flux density profiles of radio halos in galaxy clusters which results in a generic algorithm functioning with a little user input, limiting the biases introduced by manual measurements.
Our Halo-FDCA (Halo-Flux Density CAlculator) builds on the work of Murgia et al. (2009); Bonafede et al. (2017). It improves upon previous works as flux density profiles are fitted directly to 2D images instead of first radially averaging them. It also adds additional models that can take into account more complex, asymmetric shapes of the diffuse emission. It uses the Markov Chain Monte Carlo method to provide a robust assessment of the associated uncertainties. The method is generic and can be applied both to radio halos and mini-halos 1 . The code is freely available as an open source project, coded in Python, on GitHub 2 . This paper is organized as follows in Sect. 2 we describe the underlying theoretical models for the surface brightness distribution that we fit. This is followed by Sect. 3 where we describe the implementation of the Python code. We show three examples and results of the fitting in Sect. 4. We end with a discussion and conclusions in Sects. 5 and 6

Analytic Fitting Models
There are already existing methods to measure and estimate the flux density of a radio halo. The most commonly used method is to manually measure the flux in regions around the radio halo by following a certain radio surface brightness ("contour") level. Often, values around 2 rms or 3 rms are used, where rms is the root mean square noise in the radio image. The image noise used here is calculated using the findrms function taken from the LOFAR ddf-pipeline 3 .
A second method is to fit a certain functional form to the radio emission to determine its flux density. Murgia et al. (2009) did this by first azimuthally averaging the surface brightness in concentric spherical annuli and then fitting a spherically symmetric exponential model to the one-dimensional radial surface brightness profile of the form where is a characteristic -folding distance to the halo centre and 0 is the central surface brightness. Since the introduction of this model, it has been used a number of times for flux calculations of halos (e.g. Orrú et al., 2007;Vacca et al., 2014). There are however examples of halos that do not follow this idealized circular distribution: some may have elongated morphologies, e.g. the Toothbrush cluster (van Weeren et al., 2012;Rajpurohit et al., 2018) and El Gordo (Lindner et al., 2014;Botteon et al., 2016), while others show even asymmetric distributions, e.g. MACS J0717.5+3745  and Abell 2744 . It has been suggested to compose a more general and robust model to fit profiles to a wide variety of radio halos .
The main advantage of fitting models to the radio halo emission is that the integrated flux density should not be directly dependent anymore on the depth of the observations and choice of "contour" level. Measuring the flux density 1 For simplicity, in the following we shall use only the term "halo". 2 https://github.com/JortBox/Halo-FDCA 3 https://github.com/mhardcastle/ddf-pipeline within a region delimited by a certain rms contour level generally leads to larger values if this methodology is applied to another image of the same cluster with a lower noise level (e.g. a deeper follow-up observation), as the halo may appear larger. Vice versa, a high noise level image of the same halo would lead to a smaller measured radius and lower flux density estimate. Using models, we obtain a proper measure of the size of the halo that can be defined and reproduced. The sizes of halos have been commonly measured using a rms contour level, leading to different sizes depending on the noise map.
To construct the analytic models, we start by defining the unconvolved model denoted as ( ), with = ( , ) Cartesian image coordinate vector. The convolved model that is actually fitted to a data image with a certain synthesized beam shape is given by For the beam shape, we take (3) as defined in Condon & Ransom (2016). In this definition, denotes the Full-Width at Half-Maximum (FWHM) of the beam major axis, the FWHM of the beam minor axis and the counterclockwise shape rotation. This same rotation transforms a traditional coordinate system ( , ) to Furthermore, FWHM = √ 8 ln 2, with the Gaussian standard deviation. These are the same quantities as commonly defined in radio observations.
In general, we can define an exponential model to be where ( ) is a surface brightness and ( ) is the function that takes different forms depending on the complexity of the model chosen by the user in Halo-FDCA. We will now introduce the three main models implemented in the program.
Circular. In the case of the circular model, this function will take the simple form ( ) = | |∕ , where | | denotes the length of positional vector . Coordinate offsets for the different models are implemented by simply replacing by − with the halo centre (or halo location; = ( 0 , 0 ) ). This form then depicts a 4 parameter model with parameter set = { 0 , 0 , 0 , }.
Increasing complexity for this model is done analogously to how a two-dimensional circular Gaussian can be generalized. Both a standard two-dimensional Gaussian and ( ) are Gaussian types with different form factor . A circular Gaussian is defined as  , 0, 0, 75, 50, 120, 90, ∕4}. where 0 is in mJy arcsec −2 and all the radii are in pixel units. This image was created using equation (9). with = 0.5 and the amplitude. In our exponential case, = 0 and √ 2 would be replaced with . In this discussion, we explicitly work with a general form that includes the form factor. The presented models and algorithm allow for fitting the factor as an extra parameter to be able to acquire an insight into the form factor of radio halos. Based on the needs of the user, can be adopted as an optional parameter. By default, = 0 and will not be treated as a fitting parameter by Halo-FDCA. Including , the circular model becomes Ellipsoid. So far the model discussed here (with = 0) is the same as in Murgia et al. (2009). From here, a few generalizations are introduced. The model can be extended by separating the two principal axes into two terms and by allowing for rotation with respect to the coordinate system. Such a model is defined via The set of parameters for this six dimensional model is Skewed. The next generalization is the construction of a skewed model that allows for an off-center maximum of the intensity distribution. To construct a skewed model, the euclidean plane is divided into its four primary quadrants. The skewed model is then made up of functions defined on a single quadrant only, this is visualized in Fig. 1. In theory, one must be cautious and think carefully about exactly where the function is defined and how these functions connect to each other at the edge of a quadrant (the = 0 and = 0 lines).
The division is made because, in this way, every quadrant can have its own exponential function with two -folding radii at the edge. The first quadrant for example, has radii for the positive and -axis. To make the overall distribution 'smooth', the functions of each quadrant have equal radii where their edges coincide. Furthermore, all 4 sub-functions are required to have an equal 0 .
We define new parameters + , which is defined as the radius in the positive direction and − , which is defined as the radius in the negative direction. In a similar fashion, + and − are defined. Equation (9) provides the mathematical representation of the function. It can also be defined differently but this form simplifies the implementation in the algorithm.
Contrary to previous models, this one is piece-wise continuous and consists of four functions only defined on a specific quadrant of the euclidean plane. This makes integrating this function somewhat more complex. The eight parameter function is given by ( ) produces the most general exponential profile.
( ) and ( ) account for the offset and rotation of the elliptical shape. The four piece-wise functions are used to introduce skew to the function. The fitting Parameter set for this model The most general form of the exponential (Equation (9)) reduces to Equation (8) by setting + = − ⇒ and + = − ⇒ . This can further be simplified to a 5 parameter model by setting = 0. Equation (7) will be found when + = − = + = − ⇒ and = 0.

Profile Integrals
The total halo flux density can be calculated by integrating the analytic functions from the former section. The total flux of a radio halo at frequency is given by There might be cases where integrating up to infinity is not desired. In that case it is possible to integrate up to an arbitrary distance . It is important to note that the flux density is integrated from the unconvolved model rather than from the convolved one. This can be done because the integral over both functions are equal to each other thanks to the fact that the beam is normalized. An integral over one quadrant with -folding radii and can be expressed in terms of the lower incomplete gamma function ( , ): where for simplicity, 0.5 + = . To ensure convergence, we set > 0. In the above expression, denotes the radius of integration in units of e-folding radius . When integrating up to infinity, however, the limit → ∞ is taken such that where Γ is the complete Gamma function, whose definition can be deduced from the equation above. With the 8 parameter skewed function, takes the form: The extended derivation of the integral over Equation (5) with Equation (9) can be found in Appendix A. From there, simplifying that expression will give us the integrals for the simpler functions. Conventionally, analytic models are not integrated up to infinity, but adopt a radius of three times the found -folding radius (see Murgia et al., 2009). In that case, one would take = 3 and thus (3) ≃ 0.8 (∞) for = 0 (this can be checked by plugging in = 3 in footnote 6 in Appendix A). This is independent of found parameters, which means the flux as integrated to 3 is 80% of the flux with integration to infinity for = 0 for any obtained set of parameters.
From the total flux of a certain model, the radio power can be calculated. From , the power is given by where is the luminosity distance. To calculate values in physical units (e.g. kpc), a flat ΛCDM cosmology with Ω Λ = 0.7, Ω = 0.3, 0 = 70 km s −1 Mpc −1 is assumed in Halo-FDCA. A conversion from an observed frequency obs to the desired frequency can be made using ∝ . The code has a built-in function to retrieve the radio power.

Input data for the fitting
To properly separate radio halo emission from contaminating sources sufficient spatial resolution is required. The images used for the radio halo profile fitting are expected to be free from other radio sources. Two approaches can be taken to achieve this. This first approach is to subtract all discrete sources from the visibility data by building a model of the contaminating sources in the field using appropriate uv-cuts. This approach is commonly used and can be successful as long as there are no extended radio sources embedded within the radio halo emission.
A second approach is to mask all regions affected by radio sources other than the radio halo. The code is capable of fitting halos where significant portions are left out by a mask. The algorithm simply extrapolates the model over the regions where data is censored. In practice, the two methods must be combined so that the compact radio sources are subtracted and the extended sources are masked. Halo-FDCA accepts optional DS9 (Joye and Mandel, 2003) region files to mask areas where the subtraction of radio sources has been inaccurate, or where other types of diffuse emission or residual calibration artifacts are present. To investigate further the influence of masking parts of the image, we run the code on a simple radio halo applying different masks. This is discussed in Section 4.3 Finally, the radio halo emission in the input images is required to be deconvolved and restored, otherwise, the flux measurements can be highly overestimated in particular for arrays with dense inner uv-coverage and sparser coverage in the outer parts of the uv-plane (such as the GMRT and LOFAR). To achieve deep deconvolution, multi-scale clean (Cornwell, 2008;Offringa and Smirnov, 2017) is advised. The description of pipeline and the up-to-date list of required and optional parameters can be found on the Halo-FDCA GitHub page ( https://github.com/JortBox/Halo-FDCA).

Markov Chain Monte Carlo Algorithm
For fitting a flux density profile, a Markov Chain Monte Carlo (hereafter MCMC) simulation is used within Halo-FDCA. This was done in Python 3+ using the emcee module (Foreman-Mackey et al., 2013), which is based on the theory described in Goodman and Weare (2010). This package is a tool for probabilistic data analysis and model fitting using Bayesian inference to, in this case, find model parameters that maximize a likelihood function . We adopt the following definition for the log-likelihood function Here, ( ) is the image containing the radio halo and denotes the total number of pixels. Bayesian inference is a statistical approach to parametric fitting. It is based on the belief of an event happening. Finding best-fit parameters for any parametric model in Bayesian inference can be done by solving Bayes' theorem with , Θ random variables. Furthermore ℙ(Θ = ) is called the prior and ℙ(Θ = | = ) the posterior. In other words, the equation reads that given the data , the probability that a parameter set describes the model is equal to the probability of the data given the model, times the probability of that set of parameters can exist, divided by the probability that the data is observed. Given a model and independent and identically distributed data , the best fit parameters are found by maximizing In this last equation, the term corresponding to ℙ( = ) was dropped since it is a constant independent of the model parameters, thus not affecting the location of the maximum in equation (14).
This definition for the likelihood function is derived from assuming that the data is described by ( ) = [ ( ;̂ ) + ( )] * , where is the underlying noise map: which is normally independent and identically distributed (iid) around zero. This approximation makes the definition of the likelihood function relatively easy. How close the actual noise distribution is to a normal distribution depends on the image quality. Equation (18) implies that independent data points are required in order to fit models than the data. Generally, pixels values in radio images are correlated and cannot be considered independent values. To assure the

Parameters
Rotating and regridding Figure 3: This flowchart gives a schematic representation of the key processes in the code. The boxes within the larger blue cluster represent processes directly related to the MCMC run. The algorithm takes compact source subtracted radio images and, in principle, returns the MCMC sampler chain which is the distribution of best fit parameters. With this chain and the information of the MCMC settings, all the relevant results such as radio flux and power can be calculated.
independence of data points, data images are rotated and regridded based on the image beam such that pixel areas are equal to the beam area. In practice, new pixel sizes are calculated through In this procedure, the total flux density is preserved. These procedures are included in the flowchart in the next section ( Fig. 3) The total process of rotating and regridding images is visualized in Fig. 2. Regridding inevitably results in using images with different aspect ratios than the original data due to the often elliptical beam shape. An extreme case of this is for the example reported in Section 4.2.3 for the Phoenix cluster, where the beam shape is very elongated. The emcee code works by exploring the parameter space efficiently with walkers. These walkers jump from value to value using a certain move algorithm. By default, the Stretch Move is used by emcee. An MCMC run samples the parameter space and results in a chain with all the sample information in it. From these chains, one can determine the resulting maximum likelihood parameters and their uncertainties. The technical and theoretical descriptions of emcee are outside the scope of this paper, it can be found in Foreman- Mackey et al. (2013) and Goodman and Weare (2010).

Model Selection and Parameter Dependence
Every model will in practice result in different flux density measurements. The question then arises which model is the best one to describe the data. In this case, a useful indicator is the reduced 2 ( 2 red ). The 2 red is a frequently used value that provides insight into the 'goodness' of a fit, with 1 typically indicating a good fit. If 2 red < 1, the model is over-fitting and if 2 red > 1, the model is under-fitting. In this case 2 red is defined as

Image cropping and initial parameter guesses
In this section, we provide an overview of the main processes of Halo-FDCA, such as pre-fitting methods and probability priors. This section is summarised in the flowchart in Fig. 3.
Before running the MCMC algorithm, the data is slightly processed as preparation with the aim of decreasing run time. Because a convolved version of the analytic model is fitted to the image, one likelihood function evaluation involves two two-dimensional Fourier transforms. Models also have to be regridded at every likelihood evaluation.
Since the halo often only covers a small part of the radio image, one way to shorten run times and increase the chance of convergence consists in simply cropping the image. A second advantage of using cropped images is that some contaminating sources present in images are removed and do not have to be masked manually. This step is shown in Figure 3 as "Decrease FoV".
When an image is given in input to Halo-FDCA, the program does not immediately know where the halo is located. Together with the location as provided by the user, a preliminary fit can be performed to approximate the location and size of the halo on the image. This fit is performed with the simplest model (Eq. (1)) without convolution. The sole purpose of this first fit is to get an idea of the approximate size, surface brightness, and location of the halo. Based on these findings, the image is cropped (when possible) to a size of 8 times the approximated radius, which is enough to prevent any loss of relevant data. The radius and location are fitted again in this cropped image to set up an initial parameter guess for the MCMC run.
The Pre-MCMC Fit is the first time the actual convolved model is fitted to the input data. It takes the simple initial guesses provided by the preliminary fit. For the skewed model, this means equal -folding radii in all directions and a rotation of zero. The results of this pre-MCMC fit are taken as the fairly accurate initial guesses for the Markov-chain Monte Carlo algorithm. The pre-MCMC fit is also the stage at which the mask (a DS9 region file) is taken into account. With such a mask, specific regions in the data can be ignored by the algorithm.

Priors
Bayesian inference allows us to work with priors, which can be defined as ℙ[( ;̂ )], the chance of the model for a certain set of parameters. Using the Indicator function from statistics, we can define the prior to be where 's are given by These are the basic constraints on the parameters. max and max denote the length of the working image in both dimensions. The last inequality could also be set to 0 ≤ < 2 but we stick to the first because parameter solution sets are not unique when 0 ≤ < 2 instead of 0 ≤ < . In practice though, an angle prior − ∕4 ≤̂ < 5 ∕4 is chosen to prevent MCMC confusion around = 0 = 2 . Inequality five makes sure that what is mathematically defined as the major (x) and minor (y) axis remains associated with that axis. An extra prior requires > such that the major axis is always on the x-axis. This requirement limits the parameter freedom and allows for a unique parameter solution.
One can also choose to adopt a prior on the form factor when it is used as an extra parameter. It must be greater than −1∕2 to ensure a converging flux density integral (by default = 0). There is no upper boundary on this extra parameter.

Output
An MCMC simulation deploys walkers to explore the possible parameter space. After a fitting run, the walker chain (all values explored by a walker) is used to estimate the parameters. The way these walkers behaved and moved through the parameter space is the primary indication of whether the algorithm converged and detected a radio halo. Two types of figures are in that case important; the walker and corner plots. An example of each of these plots is shown in Figure B.1 and B.2, respectively.
The walker plot shows in what way the parameter space is sampled. The figure shows a fit for the elliptical model. A total of 200 walkers explore a certain set of parameters 1200 times. The walker plot can be used to visualize the burn-in time: the number of steps that need to be taken before the algorithm settles on a value. The walker plot then also indicates whether walkers settle on one specific value or rather jump between multiple values that have similar likelihood values. Both example figures show that the fitting procedure converged and settled on a set of optimal parameters with a certain uncertainty.

Results, fitting simulated data and examples
The first part of the results will be dedicated to fitting simulated data, where we will show the performance of the method as a function of surface brightness relative to the noise. This will be investigated by injecting artificially created halos into the algorithm to see how well true parameters can be retrieved. In the second part, we will show the application of this algorithm to three radio halos observed with the LOw-Frequency ARray (LOFAR) and the Karl G. Jansky Very Large Array (VLA). We will finally briefly study the influence of different masking on the flux density estimates for a fourth radio halo.

Fitting artificially injected halos
To demonstrate the code is reliable, images with artificially created radio halo sources were used for fitting. The diffuse sources were modelled to be elliptical with a fixed set of parameters. This allows us to check whether the code is able to recover the true parameters of a halo. An artificial halo is given by ( , ) = ( ( , ) + ( )) * where is the modelled noise map. The beam parameters are: { , , } = {32 arcsec , 27 arcsec , 54.2 deg} The gray line is the one-to-one optimal relation between the two. Ideally, obtained flux densities need to lie on this line. The red line indicates the point at which the maximum surface brightness 0 drops below 3 rms . Middle panel: relation between true and measured 0 . The accuracy of the obtained 0 decreases significantly below 3 rms and the code starts to overestimate its value. Right panel: relation between true injected flux density and obtained e-folding radii minus the true injected radius. and we set the artificial halo at a distance of = 0.3. The map noise is generated each time by making an empty image and filling it with random Gaussian noise  (0, 1). The noise map is then scaled to match rms = 0.14 Jy arcsec −2 The total map is then convolved with the known image beam. In total, 40 artificial halos of identical shape and varying flux density (or equivalently, varying 0 ) are constructed to test reproducibility as a function of halo surface brightness. The halo that is injected is based on a rotated elliptical model with parameters: = 500 kpc, = 200 kpc, = 2 ∕3. 0 varies based on the flux density. Flux densities range from 1.67 − 100 mJy ( 0 is then calculated via Equation (8)).
Modelled halos corresponding to = 11.75 mJy and = 100 mJy are shown in Figure 4.
The flux density, 0 and radii values obtained running the algorithm are presented in Figure 5. It shows that the code is behaving as expected: it is able to recover well the flux densities for the high surface brightness halos, while it fails

Measured halo size [kpc]
Semi-major axis at 3σrms Semi-minor axis at 3σrms Figure 6: Left panel: relation between true injected and measured flux density using MCMC.
In this panel, the flux densities with radii corresponding to the ( ) = 3 rms contour are also shown with grey symbols. This is the method that has often been adopted in the literature. The dashed gray line is the one-to-one optimal relation between the two. The red line indicates the point at which the maximum surface brightness 0 drops below 3 rms . So from here = 0 for the ( ) = 3 rms contour measurements. Right panel: relation between true injected surface brightness divided by the noise and obtained halo size up to ( ) = 3 rms .
to settle at the 4 lowest flux density values. This is not surprising since the maximum surface brightness of these halos is well below the 3 rms level. Interestingly, the figure shows accurate flux density estimates at low surface brightness despite the high uncertainties of the other parameters. This is possibly due to a correlation between maximum surface brightness and radii. This is also seen in Figure 5, where at low surface brightness, radii seem to be underestimated (leading to lower flux densities) while 0 is being overestimated (leading to higher flux densities) resulting in an accurate flux density where both effects are canceled out.
Conventionally, sizes of radio halos are measured visually or up to 3 rms contours (e.g. Cassano et al., 2007). These noise-based size estimations dramatically influence the measured flux density. Figure 6 stresses the inaccuracies resulting from this. The left panel shows both the flux estimated as in Figure 5 in black and the flux density with a radius corresponding to 3 rms . It is clear that this estimation is accurate for very bright halos but underestimates the flux for faint halos and it naturally drops to zero flux as the maximum surface brightness reaches 3 rms . The right panel shows the halo size (up to the 3 rms contour) for all the artificial halos. The figures illustrate how basing results on manual/visual measurements can bias the outcome.

Three examples
We now use two giant radio halos (Abell 2744 and RXC J1825.3+3026) and a radio mini-halo (Phoenix Cluster) with known flux densities as examples to show how Halo-FDCA compares to manual measurement methods.
The radio halo in Abell 2744 was first identified by Giovannini et al. (1999) van Weeren et al., 2014). Recently, Timmerman et al. (2020) studied this system using deep multi-frequency multi-configuration VLA observations. In the following, we will be using the same images as published in the mentioned works. We note that the uncertainty presented here only takes into account uncertainties introduced by the fitting process and we ignore flux-scale uncertainties. Figure 7 shows all three clusters as they were used during fitting. These figures show the contamination due to compact sources or partially subtracted extended radio galaxies that are left out before a mask is applied during fitting. An important aspect of the fitting is checking whether there is a statistically significant result. The figures and table shown in this section help motivate Halo-FDCA converged on the expected emission. The signal-to-noise ratio, ∕ (defined as the flux density value divided by the uncertainty) and 2 red provide a handle on what model is best to use. The skewed model might describe bright, extended, or irregular halos better (especially with deep, high signal halo images) while the circular model is a safe choice for faint halos.
We performed the fitting on a local 96 core (four AMD EPYC 7401 24-Core, 2.00 GHZ CPUs) node with a total of 512 GB of internal memory. For fitting Abell 2744 (384×384 pixels 2 ) for instance, the code took 20 minutes to run. This is with multiprocessing turned on. Running the code without using a Python parallelization increases run-times to over 2.5 hours.

Abell 2744
For this object, a 1.5 GHz VLA image was used. The initial image was 384×384 pixels 2 with scale 4 arcsec pixel −1 . Beam characteristics for this observation are = 13, 23 arcsec, = 13, 23 arcsec and = −80.77 deg. The noise of the original image is rms = 18.10 Jy beam −1 . Due to regridding, the uncorrelated image had a size of 60×60 pixels 2 with rms = 15.44 Jy beam −1 . We fitted three models to the image: the circular, rotated elliptical and skewed model. The general results for all three models are shown in Figure 8. Accompanying this figure, Table 1 provides numerical results. Table 1 shows that all three models result in almost identical flux density values within one standard deviation, which suggest no preference for any of the three models regarding the flux density. The same is true for the maximum surface brightness 0 values. Based on the 2 red value, the skewed model is favored over the other two, the skewed model is also the one with the smallest relative uncertainty.
More detailed figures for every model are shown in Fig.  9 also points to the skewed model as the best description of this halo since the residual image for the bottom row shows the least amount of over-subtraction by the model within the 2 rms region. This under-or over-subtraction is mainly caused by cluster substructure or possible inaccurate subtraction of discrete sources embedded in the diffuse emission. The residual images are thus potentially useful for studying substructure.
Visually, the halo extends up to approximately two times the obtained radii. The flux density is slightly above measurements from Pearce et al. (2017), who found a flux density of 1.5 GHz = 45.1±2.3 mJy. They manually drew a circular region around the emission and integrated the emission within that. The higher value reported here is probably due to the model being extrapolated to infinity well below the noise level and not being cut based on visual appearance. This causes the value to be about 20% higher than when using 3 as the integration limit. See the Discussion for some more comments on the choice of integration radii. Furthermore, in Pearce et al. (2017) the flux density in the masked regions was not extrap-olated using, contrary to what is done by Halo-FDCA.

RXC J1825.3+3026
For this object, LOFAR 144 MHz observations, published in Botteon et al. (2019) are used. The initial image is 613×613 pixels 2 large with a 6 arcsec pixel −1 scale. The regridded image size is 55×55 pixels 2 with rms = 277 Jy beam −1 . This regridded image was obtained using = = 1 arcmin and = 0 as beam characteristics. The model reconstruction of the radio halo for the circle, ellipse and skewed model are shown in Figure 10. Skewed S144MHz = 243 ± 6 mJy  results for these fits are again shown in Table 1. Figure 10 shows the algorithm suggesting quite elongated radio emission in RXC J1825.3+3026. The skewed model even points to the existence of a low surface brightness extension in the lower right corner of the radio halo. Looking at the 2 red value in Table 1, this situation is preferable, even though it has the highest relative uncertainty. From the table, it is clear that the estimated flux density varies significantly per chosen model.
This value is higher than reported by Botteon et al. (2019) ( 144 = 163 ± 47 mJy), who however reported the flux density of the radio halo from a rectangular region roughly defined by the 3 sigma level contours in the cluster center, excluding its SE extension. Our code demonstrates it can recover low-surface brightness emission that may contribute to the final flux density of the halo.

Phoenix cluster
For the last example, we use an image of the mini-halo located in the Phoenix cluster, first reported by van Weeren et al. (2014) using 610 MHz GMRT data. Here we use 1.5 GHz VLA data from Timmerman et al. (2020). The initial im- age is 324×324 pixels 2 with a noise of rms = 13.57 Jy beam −1 . The image has a 0.2 arcsec pixel −1 to sky scale. During the fitting, an image of 11×52 pixels 2 with 10.42 Jy beam −1 was used. This image is rather rectangular due to the elongated image beam with = 4, 82 arcsec by = 1, 04 arcsec and = 176.4 deg. The results for all models are shown in Figure 12 and the fitted parameters are again listed in Table 1.
The used image has a strong contaminating source right at the centre of the mini-halo, which is carefully masked out as shown in figure 12. The detailed results per fit are shown in Figure 13. This figure highlights the effect an elongated beam has on regridding.
Halo-FDCA returns a flux density between 8.1 − 8.4 mJy over the different models. Despite the small size of the minihalo (there are only a few beams across the halo in one direction) and central contamination, runs for all models result in precise flux densities that are in agreement with each other. The 2 red values for both the circular and skewed model show values very close to one, while the elliptical model turns out slightly over-subtracted in the sense that the model is locally larger than the data (visualized in residual images). This resulted in a relatively high flux density for this model. As shown in Figure 13, the skewed model appears to have the most over-subtraction within 2 rms of the diffuse emission. We therefore adopt an estimated flux density of 1.5 GHz = 8.15 ± 0.18 mJy (statistical uncertainties only) with = {39 ± 2, 21 ± 2, 28 ± 1, 20 ± 1} kpc. This agrees within the  uncertainties with the value of 8.5±0.9 reported by Timmerman et al. (2020) which was obtained by fitting a Gaussian profile.

The influence of masking
In Section 3 we discussed how to separate the radio halo emission from contaminating sources, either by masking extended sources or by using compact source subtracted images made with a uv-cut.
Here, we perform a test to show how the different methods affect the fitting. For this we used 144 MHz LOFAR data of MCXC J1036.1+5713 from Osinga et al. (2020). The compact source subtracted image from this cluster is quite clean for fitting purposes (see Figure 14, right panel), while the unsubtracted data shows some compact sources embedded in the halo (see Figure 14, left panel). We estimated the flux density of the halo in both images using two different masks for the left panel of Figure 14 to assess the influence of the adopted mask on the results. The fit results are shown in Figure 15. The compact source subtracted flux density is 13.96 ± 0.63 mJy, the flux density for the 'narrow' mask is 15.32 ± 0.70 mJy and the flux density for the 'broad' mask is 14.76 ± 0.75 mJy. This shows that the flux density is not significantly influenced by the specific mask.

Discussion
Along with the Bayesian methodology, the Halo-FDCA code introduces more generalized equations to describe the surface brightness of radio halos in galaxy clusters than the model proposed by Murgia et al. (2009). This generalization included the introduction of skewed asymmetric models and the form factor . For the sake of comparison, this extra parameter was not used in the examples. The inclusion of this parameter option could prove to be useful in cases the user desires more model shape flexibility. This option could also be used with a sample of halos to study the distribution of .
When estimating the flux densities, we have made extensive use of masks, especially for the Phoenix cluster. While the pipeline itself mostly functions without user-based biases, masking can introduce such biases as they are arbitrarily drawn. Although we did not extensively study the influ-  ence of masking on the estimation of the flux density, we expect that in most cases the masking does not significantly influence the results based on the test discussed in Section 4.3. In cases like MCXC1036.1+5713 and the Phoenix cluster, where contamination is present at the halo centre, it is advised to make sure all contamination is masked out. The code can extrapolate the model where data is missing but will falsely include contamination if it is not masked out properly. It is better to mask out too much than too little. The uncertainty on the flux density takes into account the level of masking adopted, becoming larger when larger areas are masked.
The choice for a standardized integration radius is to be made by the user. Although infinity is a natural choice from a mathematical point of view, it is not necessarily the best choice if accuracy is desired and the aim is to compare different radio halos. In particular, for radio halos detected at low signal to noise, the extrapolation to infinity might introduce unnecessary large uncertainties since most flux density comes from parts of the radio halo that are below the noise level of the map. Beyond a certain radius from the center of the halo, the model is essentially not constrained by the data in this case. Therefore, it might be better to intentionally integrate up to distance since this value relies less on extrapolation. For example, Murgia et al. (2009) adopted 3 where the surface brightness is 5% of its peak value (for the simple exponential model). This might be even too far out for barely detected halos. The flexible code allows the user to adopt their own choice of integration radius in units of .
The merging of two clusters, each with its own radio halo might introduce a situation where two radio halos overlap and joint fitting is necessary. Currently, Halo-FDCA is not able to fit more than one radio halo "component" but if required the code can be expanded to provide such functionality, keeping the overall framework intact. This would also allow the fitting of a smaller mini-halo component embedded in a larger scale diffuse component, possible examples of such a case are the clusters Abell 2142  and PSZ1G139.61+24.20 . Additionally, the overlap of a radio "bridge" (e.g., Botteon et al., 2018Botteon et al., , 2020bGovoni et al., 2019) with a radio halo might require adding another model component. However, determining what the proper functional form is for fitting a radio bridge requires more investigation.

Conclusions
We presented Halo-FDCA, a fitting algorithm based on Bayesian inference to accurately estimate the flux density of radio halos in galaxy clusters. Bayesian inference methods adopted here find parameters that best fit the data based on the conditional probability of observed data given the model. The implementation used here makes use of Markov Chain Monte Carlo fitting.
A total of three different models are included for fitting. These models include circular, elliptical and skewed exponential distributions. Halos are detected and identified by likelihood estimation rather than subjective judgment, which should allow for a better comparison of radio images from the same objects but at different resolutions and depths. The algorithm also includes a flexible option to mask any desired portion of the image.
The presented models for flux density estimation can be of great importance for deriving accurate e statistical prop-  erties of large samples of radio halos which should help us to better understand the origin of these enigmatic sources.

A. Exponential Profiles and Their Integrals
The total flux density of ( , ) is written as The skewed model is integrated here to derive a general expression for the total flux density. For this calculation, , 0 and 0 are set to zero since those parameters only transpose and rotate the distribution and do not actually change its shape. This simplifies Eq. (22) To make matters simpler, an ellipse with major and minor axis corresponding to the -folding radii of a certain quadrant are integrated over that quadrant only. This results in the evaluation of four integrals which all integrate over the elliptic exponential equation in a specific quadrant. This looks like Where corresponds to the elliptical model. It is important to note that every function that is appearing, has different -folding radii. Formally, limits have to be taken to let the functions approach zero, That kind of notation is omitted here because, in the end, it will give the same result as when it would be included.
Another step to simplify the integrals themselves is to note that the elliptic function is symmetric in every quadrant and has the exact same shape in all four of them. Using this the integrals can be taken over all the real numbers again with a factor of 1∕4 in front to account for the fact that only one quadrant is taken into account. This results in We continue to evaluate one integral with function which has -folding radii and . Writing out ( , ) yields We transform to elliptical polar coordinates ( , ) where we substitute = cos and = sin . It can be checked that the Jacobian for this coordinate transformation is such that = . This is the point where the radius of integration can be specified, since is a radial coordinate. This value can in theory be different for every quadrant, but it is for this application highly advised to stick to one value for for every integral. The integral now becomes where in the last line, ≡ 1 + 2 6 . The resulting integral is a standard one that can be evaluated fairly easily using = as a substitution. After substitution, the integral becomes This integral is generally known as the lower incomplete gamma function ( , ), defined as The expression for one of the integrals now becomes Now, we only have to add the four integrals with different -folding radii and equal together to obtain the final expression: walkers and 1200 evaluations. This particular example shows walker movements while fitting an elliptical model to Abell 2744. It can be seen in panel 2 and 6 for instance, that it takes some time before the algorithm has settled on a certain value. This is after around 200 steps. The default burn-in setting is a quarter of the total number of steps (red line), thus only values evaluated after step n/4 are used in estimating the parameters. walkers and 1200 evaluations. This particular example shows walker movements while fitting a skewed model to Abell 2744. Blue lines indicate initial guesses resulting from the pre-MCMC fit that are passed to emcee. Histograms showing the distribution of maximum likelihood parameters. The two-dimensional scatter plots show how every parameter solution is correlated. The solutions of 0 − and 0 − are in fact slightly correlated as illustrated by the elongated scatter distribution. This figure shows that a pre-MCMC fit does not always provide the most accurate initial guess (in this case for 0 , 0 and + ). The code was nevertheless able to diverge from the initial guess and find a more likely model.