Starduster: A multi-wavelength SED model based on radiative transfer simulations and deep learning

We present Starduster, a supervised deep learning model that predicts the multi-wavelength SED from galaxy geometry parameters and star formation history by emulating dust radiative transfer simulations. The model is comprised of three specifically designed neural networks, which take into account the features of dust attenuation and emission. We utilise the Skirt radiative transfer simulation to produce data for the training data of neural networks. Each neural network can be trained using ~ 4000 - 5000 samples. Compared with the direct results of the Skirt simulation, our deep learning model produces ~ 0.005 mag and ~ 0.1 - 0.2 mag errors for dust attenuation and emission respectively. As an application, we fit our model to the observed SEDs of IC4225 and NGC5166. Our model can reproduce the observations, and provide reasonable measurements of the inclination angle and stellar mass. However, some predicted geometry parameters are different from an image-fitting study. Our analysis implies that including a constraint at (rest-frame) ~ 40 micron could alleviate the degeneracy in the parameter space for both IC4225 and NGC5166, leading to broadly consistent results with the image-fitting predictions. Our SED code is publicly available and can be applied to both SED-fitting and SED-modelling of galaxies from semi-analytic models.


INTRODUCTION
Theoretical spectral energy distribution (SED) models play an important role in understanding galaxy formation. They have two main applications. The models can be fit to an observed SED to infer galaxy parameters such as dust mass, stellar mass and star formation rate (e.g da Cunha et al. 2008;Han & Han 2014;Leja et al. 2017). Secondly, we can also apply the models to producing large samples of mock SEDs using galaxy parameters obtained from simulations, and compare the derived statistical properties such as luminosity functions and colour-magnitude relations with observations (e.g. Qiu et al. 2019;Yung et al. 2019;Vogelsberger et al. 2020). The two applications use the theoretical SED models inversely, and can therefore be put into the same framework, leading to a better connection between observations and simulations. For example, the SED code developed by Robotham et al. (2020) was applied to both SED-fitting ) and SEDmodelling of galaxies from a semi-analytic model (Lagos et al. 2019. There is a gap in the comparisons between observations and simulations via theoretical SED models. Currently, most SED models use phenomenological parameters to calculate dust attenuation and emission, e.g. attenuation curve slope and dust emission spectral index, which are not direct predictions of galaxy formation simulations. This problem is typically addressed by employing empirical relations, which link dust parameters to some relevant galaxy properties (e.g. Cullen et al. 2017;Qiu et al. 2019;Hahn et al. 2021). However, the extra degrees of freedom introduced by the empirical relations weaken the constraining power of observations, leading to a difficult interpretation of the results. For instance, by adjusting the free parameters in an empirical dust model, Hahn et al. (2021) found that the UV and optical colour-magnitude relations can be reproduced by three hydrodynamical simulations that predict different intrinsic galaxy properties. An additional problem is that multi-wavelength SED models (e.g da Cunha et al. 2008;Leja et al. 2017;Boquien et al. 2019;Robotham et al. 2020) assume the same dust attenuation curve to estimate both the UV to optical attenuation and the en-ergy absorbed by dust. This is unrealistic, since the dust emission spectrum does not depend on inclination angle, while the UV to optical attenuation does. Doore et al. (2021) found that including inclination dependence in SED-fitting affects the estimation of stellar mass.
The above issues can be overcome using dust radiative transfer simulations (e.g. Fioc & Rocca-Volmerange 2019; Camps & Baes 2020;Narayanan et al. 2021, see Steinacker et al. 2013 for a review). These simulations estimate dust attenuation and emission by solving the dust radiative transfer equation given the star-dust geometry and dust extinction curves. The dependence of the dust attenuation on inclination angle is explicitly taken into account by these simulations. However, they are computationally expensive, and therefore are not widely used in SED-fitting or producing large mock galaxy catalogues (however see Camps et al. 2018).
The main purpose of this work is to develop a supervised deep learning model that emulates radiative simulations. Our model is trained using data generated by radiative simulations, and predicts the FUV to FIR SED from galaxy geometry parameters and star formation history. As complementary, some studies build the model from the opposite direction, i.e. deriving the galaxy and dust properties from a SED. Such models can be found in Lovell et al. (2019); Dobbels et al. (2020); Gilda et al. (2021). This work also gives an example of using the deep learning model in SED-fitting. An application of our SED model to a semi-analytic model will be presented in future work. This paper is organised as follows. Section 2 describes our deep learning model. Section 3 demonstrates an application of the model to SED-fitting. Finally, this work is summarised in Section 4. This paper uses concepts from deep learning directly, e.g. the fully connected layer and the multi-layer perceptron. The reader is referred to Goodfellow et al. (2016) for a nice introductory book on deep learning.
Our code, named as starduster, is available on GitHub 1 under a GPLv3 License and version 0.1.0-beta is archived in Zenodo (Qiu & Kang 2022). The implementation of the deep learning model includes many technical details and hyperparameters. It is tedious to present all of them. The reader is referred to our public code repository for the details. We start by describing the geometry model of a galaxy, which serves as the input of radiative transfer simulations. We assume that a galaxy is comprised of a stellar disk, a stellar bulge and a dust disk. Their density profiles are given by the following parametric forms: where S n is the Sérsic function: Our dust attenuation model is designed to compute the dust obscured stellar luminosity. With the geometry model described in the previous section and the linearity of the radiative transfer equation, we can express the dust attenuated luminosity as where L disk λ , L bulge λ are the intrinsic luminosities, and k disk λ , k bulge λ are the corresponding dust attenuation curves. We note that the dust attenuation curves can be obtained by simulations that contain only one stellar component, which can be run separately and are independent of the intrinsic luminosity. Accordingly, our approach first carries out two sets of simulations that predict k disk λ and k bulge λ , and then uses the resulting data to train two neural networks.
The design of the proposed neural network is illustrated in Figure 1. To understand the motivation of this design, we show some attenuation curves of the disk component as a heat map in Figure 2. These data are produced in the next section. In order to highlight the shape, these curves are normalised so that the value at the shortest wavelength is equal to one. We suggest that these curves can be decomposed into five components, Table 1. Summary of the input parameters to run the skirt dust radiative transfer simulation.

Symbol
Description Value Unit θ Inclination angle Beta distribution with α = β = 0.5 a r disk Stellar disk radius Log-uniform distribution from -2.0 to 2.5 kpc h disk /r disk Stellar disk height scaled by stellar disk radius Fixed at 0.1 r bulge Stellar bulge radius Log-uniform distribution from -0.5 to 1.5 kpc r dust /r disk Dust disk radius scaled by stellar disk radius Log-uniform distribution from -0.7 to 0.7 h dust /r disk Dust disk height scaled by stellar disk radius Fixed at 0.1 -Σ dust Dust column density b Log-uniform distribution from 3.5 to 7.5 M /kpc 2 η tot Intrinsic bolometric luminosity Log-uniform distribution from 6 to 14 L α B/T Intrinsic Bulge to total luminosity ratio Beta distribution with α = β = 0.5 a Inclination angle is scaled to [0, π 2 ] using linear transformation. b Dust column density is defined by Σ dust = m dust /(2πr 2 dust ).  Figure 1. Schematic diagram of the dust attenuation neural network described in Section 2.2.1. The neural network is designed to explicitly take into account the features of the attenuation curves generated by the skirt radiative simulation. We suggest that these curves can be decomposed into five components, i.e. a monotonically decreasing baseline, a trough, and three bumps. These features can be seen in Figure 2. i.e. a monotonically decreasing baseline, a trough, and three bumps. Our neural network takes into account these features explicitly. We can construct a sequence of data that have one of these features in the following way. Firstly, a monotonically increasing sequence can be obtained by applying a cumulative sum to an array of positive data. Then, by changing the sign, we can create a monotonically decreasing sequence. Finally, we can generate a sequence that has only one peak or trough by multiplying a monotonically increasing sequence and a monotonically decreasing sequence together. We in-troduce a renormalisation module as a final step in the neural network. This step ensures those modules which represent the features of the curves have similar scales.

Data preparation
We utilise the skirt 3D Monte Carlo dust radiative transfer simulation (Baes et al. 2011;Camps & Baes 2015 to generate training and validation data. Specifically, we use version 9 of skirt introduced in Camps & Baes (2020).
We run the skirt radiative transfer simulation using the geometry model described in Section 2.1. In ad- Figure 2. Heat map of dust attenuation curves produced by the skirt radiative simulation. These curves are generated in Section 2.2.2, which are used to train the dust attenuation neural work of the stellar disk. The curves are such normalised that the value at the shortest wavelength is equal to one.
dition to those defined in equation (1) -(3), the input parameters also include dust mass and inclination angle. We sample these parameters according to Table 1.
In addition, we adopt the Draine & Li (2007) dust grain mixture, and launch 10 7 photon packages for each run. The dust emission luminosity is not calculated for these simulations, since we are only interested in the dust attenuation curves. The output wavelength grid ranges from 0.09 µm to 100 µm, with 300 bins in logarithmic scale.
Data preprocessing is an essential step in the machine learning context. We scale all input parameters into [−1, 1]. Specifically, we apply f (θ) = 2 cos(θ) − 1 for the inclination angle, and f (x) = 2 log x−log xmin log xmax−log xmin − 1 for other parameters. The output curves are noisy due to the Monte Carlo nature of the simulations. To reduce the noise, we smooth the attenuation curves using a Savitzky-Golay filter. Furthermore, we exclude the data for k(λ = 100 µm) < 0.01, since they are extremely noisy and badly behaved. Table 2 shows the number of samples in each data set.

Training
We adopt the following loss function for the dust attenuation neural networks To minimise the loss function, we employ an Adam optimiser with a batch size of 256. We run the optimiser for 3000 epochs and change the learning rate dynamically according to Smith & Topin (2019). We train the neural network four times with different maximum learning rates at 2.5 × 10 −4 , 5 × 10 −4 , 1 × 10 −3 and 2 × 10 −3 , and choose the best result from these runs. Figure 3 illustrates the training results. The median error on the dust attenuation curves is roughly 3 × 10 −3 mag for both the stellar disk and bulge models. We will discuss the uncertainties in more detail in Section 2.4, where we combine the dust attenuation and emission neural networks.

Classifiers
As mentioned in Section 2.2.2, we exclude some badly behaved samples from the training data, and consequently the obtained neural networks should not be applied to the excluded parameter space. For this reason, we train two classifiers for the disk and bulge models respectively to identify whether the input parameters are in the effective region. Each classifier is a multi-layer perceptron with one hidden layer. Both classifiers are trained using the same method as in Section 2.2.3.

Neural network
Our dust emission neural network aims to predict the dust emission luminosity. A major difficulty is that the dependence of the dust emission luminosity on the intrinsic SED is non-linear, and therefore the training data must include the intrinsic SED. While the intrinsic SED can be derived from star formation history and a simple stellar population library, it is almost impossible to fully sample star formation history. Fortunately, we have recognised that the dust emission luminosity depends on the intrinsic SED only via the dust absorption fraction, which is a linear function with respect to the intrinsic SED. This is the key point behind the design of our dust emission neural network. In general, the dust emission luminosity also relies on dust compositions, and in this work, we assume a universal dust composition.
The proposed dust emission neural network is illustrated in Figure 4. To understand our approach, we write the dust emission luminosity as where η tot is the intrinsic bolometric luminosity, ξ tot abs is the dust absorption fraction, and Y dust λ is the normalised luminosity. We first focus on the modelling of the absorption fraction. The linearity of the energy balance equation suggests that the absorption fraction can be decomposed into two parts with where ξ disk abs and ξ bulge abs are the fractions of energy contributed by the stellar disk and bulge but absorbed by dust respectively, and α B/T is the intrinsic bulge to total luminosity ratio. We suggest that ξ disk abs and ξ bulge abs can have the following functional forms where A disk abs and A bulge abs are the mean dust attenuation functions. They can be modelled using neural networks. Different from the dust attenuation curves defined in Equation (6), these two functions are independent of inclination angle. In principle, they can be derived from k disk λ and k bulge λ . However, this approach is not adopted in this work, since it introduces an intermediate training step, resulting in extra errors. Instead, we make the dust absorption fractions linearly depend on the star formation history parameters 2 by using discrete star formation history, in which case the normalised intrinsic SED can be written as with where c i,k is the fractional luminosity in i-th metallicity and k-th stellar age bins, and Y SSP λ (τ, Z) is the normalised spectrum of simple stellar population (SSP) with stellar age τ and metallicity Z. Substituting equations (14) and (15) into equations (12) and (13) yields with The above equations suggest that once the neural network can predict ζ disk abs and ζ bulge abs , the total absorption fraction can be obtained using the linear combination. Hence, when preparing the training data, there is no need to sample the star formation history of the stellar disk and bulge, i.e. c disk i,k and c bulge i,k . Instead, these parameters can be provided by the one hot encoding. This significantly reduces the dimensions of the problem.
In terms of the neural network to predict Y dust λ , we point out that the dependence of the dust luminosity on the intrinsic SED is only via the absorption coefficient in accordance with the energy balance equation. In other words, we can write where Φ geometry is the parameter set describing the geometry model exclusive of the inclination angle. The functional form of Y dust λ is modelled using a neural network.
As illustrated in Figure 5, we introduce a specific module to reproduce the shape of the dust emission luminosity. The FIR emission is modelled by a Planckian mixture, which is a weighted sum of some Planck distributions with different temperatures. The resulting spectrum is then modified to take into account the other emission features at MIR wavelengths using a transfer module. The design of the transfer module is demonstrated in Figure 4. The key idea behind the module is probability conservation.

Data preparation
To prepare the data, we use skirt to perform a set of simulations with both stellar disk and bulge components. We apply the same method as in Section 2.2.2 to sample the required parameters for the simulations. Moreover, each run requires two SSP spectra as input for the disk and bulge components respectively. They are randomly chosen from a library. We construct the library using the Flexible Stellar Population Synthesis model (Conroy et al. 2009;Conroy & Gunn 2010) with a Chabrier (2003) initial mass function. Nebular line and continuum emissions are added following Byler et al. (2017). The raw SSP spectra are gridded using Equation (15) with six stellar age bins and six metallicity bins. The simulations require two additional parameters, i.e. the intrinsic bolometric luminosity η tot and bulge to total luminosity ratio α B/T . We sample them according to Table 1.
For other settings of the simulations, we assume that dust grains are not in local thermal equilibrium, and use the stochastic heating algorithm to calculate the dust emission luminosity. To be consistent with the dust attenuation model, we adopt the Draine & Li (2007) dust grain mixture. The output wavelength grid uses 500 logarithmic bins and ranges from 0.09 µm and 10,000 µm. Finally, we launch 10 7 photon packages for each run.  (22) and (23). The training method is described in Section 2.3.3.
. We adopt the same method as in Section 2.2.2 to preprocess the input parameters. In addition, we use the classifiers obtained in Section 2.2.4 to exclude badly behaved data. We find that the data exclusion can lead to better training results 3 . The number of samples in the data set is given by Table 2. We have tested that increasing the sample size only gives limited improvement in the training results.

training
To train the dust emission neural work, we define the following loss function The normalised dust emission spectrum can be treated as a probability distribution, and it is common to quantify the distance between two probability distributions using the Kullback-Leibler (KL) divergence. The dust emission luminosity is represented as a one dimensional array in the code, and we compute the KL divergence using the trapezoid rule. Moreover, the total absorption fraction by definition ranges from 0 to 1. Accordingly, we can treat it as a probability value and quantify the difference between ξ true abs and ξ pred abs using the discrete KL divergence. In terms of the optimisation strategy, we adopt the same method as in Section 2.2.3.
For the best result, we calculate the KL divergences for each sample, and demonstrate the corresponding distributions in Figure 5. The median KL divergence is ∼ 7 × 10 −4 for the normalised emission spectra and ∼ 4 × 10 −5 for the dust absorption fraction. A more intuitive analysis of the uncertainties will be carried out in the next section, where we combine the dust attenuation and emission models.

Combining dust attenuation and emission
In this section, we integrate the dust attenuation and emission models, and estimate the overall errors. Our multi-wavelength SED model takes the parameters listed in Table 1 and the star formation history parameters defined in Equations (14) -(17) as input. To obtain the stellar luminosity, for both the disk and bulge components, we first integrate the SSP library with the star formation history parameters and then apply the dust attenuation curves obtained by the deep learning model. Since attenuation curves are smooth, we can interpolate between them, allowing for a finer wavelength grid for the SSP library instead of the one adopted in the model training. This approach includes more emission line features from the SSP library. Dust emission luminosity is then added to the stellar luminosity to obtain a multiwavelength SED. Linear interpolation is adopted in the calculations above.
To prepare the validation data, we sample the fractional luminosity c disk i,k and c bulge i,k using a Dirichlet distri-   Table 1. To be consistent, we run the skirt simulation using the same configuration as in Section 2.2.2 and Section 2.3.2. Our validation set contains 2048 samples. Figure 6 illustrates nine example SEDs predicted by skirt and our deep learning model. They are given as black and red lines respectively. We find good consistency between them. In Figure 7, we demonstrate the errors in AB magnitude in twenty filters from FUV to FIR. We only compare the results for M true AB < −10. In the left panel, the errors in the bands are dominated by the dust attenuation model, which are at ∼ 0.005 mag with a ∼ −0.006 systematic offset. In terms of the dust emission model, as shown in the right panel, the predicted magnitudes in the MIR to FIR bands deviate from the fiducial values by ∼ 0.1 mag to 0.2 mag. The uncertainties in WISE -3.4 and WISE -4.6 are smaller at ∼ 0.03 mag.

APPLICATIONS
As an application, we fit our deep learning model to the observed SED of IC4225 and NGC5166. We  ATLAS (Eales et al. 2010). The used data are plotted in Figure 8. For the present work, we do not adopt the uncertainties from the observations. The observed errors are assumed to be Gaussian in flux. However, as discussed in Section 2.4, our deep learning model also introduces some uncertainties, which are Gaussian-like in magnitude. Some of these errors may be even larger than the observations. In addition, the skirt simulation itself also introduces ∼ 0.05 mag numerical errors as suggested by Camps et al. (2018). Hence, for the fittings of IC4225 and NGC5166, we employ a loss function that assumes Gaussian errors in magnitude.
Based on the results illustrated in Figure 7, we assume σ = 0.1 for the WISE -12, WISE -22, and IRAS -100 bands, and σ = 0.2 for all Herschel bands. For the other bands, we adopt σ = 0.05 to take into account the errors from the skirt simulation. In addition, according to the left panel of Figure 7, we also add a ∼ −0.006 mag offset to those bands. However, we find that this correction has a small influence on the results. Since we ignore the observational errors, to avoid double weighting, we do not use the fluxes in IRAS -25, IRAS -100 and Planck bands for NGC5166, which overlap the existing data points. In addition, we assume the distances of IC4225 and NGC5166 are 73.9 Mpc and 62.6 Mpc respectively according to DG15. The free parameters of the fitting model are the input parameters of our deep learning model. As listed in Table 1, we have six parameters describing the geometry and two parameters specifying the intrinsic bolometric luminosity of the stellar disk and bulge. As introduced in Section 2.3.1, we can have thirty-six parameters for the star formation history. Such degrees of freedom, however, may be unnecessary for the SED fitting. Instead, we use three parameters to describe the fractional luminosity of three stellar age bins and one parameter to specify the overall metallicity. The reduced stellar age bins are obtained by merging every two bins. The corresponding bin edges can be found in Table 3. The metallicity is applied using linear interpolation. This is a typical discrete star formation history model used by several recent SED-fitting studies (Leja et al. 2017;Boquien et al. 2019;Robotham et al. 2020). We adopt this model for both the stellar disk and bulge. We note that the fractional luminosity parameters c k should satisfy To tackle this constrain, we assume that each input parameter satisfies 0 < a k < 1, and apply the following transformation If each a k is uniformly distributed, the transformed parameters will follow a flat Dirichlet distribution. Overall, our SED-fitting model contains 15 free parameters. We utilise the particle swarm optimisation (PSO) (Shi & Eberhart 1998) to minimise the loss function defined  (2015). Solid lines with empty circles are the best-fitting results of our deep learning SED model. For the black lines, no parameter is fixed, while for the blue and magenta lines, bulge radius is fixed. In particular, when fitting the magenta lines, we fix the bulge radius at the same value as De Geyter et al. (2015). The dashed lines are obtained by running the skirt simulation using the corresponding best-fitting parameters listed in Table 3. The model free parameters and the fitting method are described in Section 3. As discussed in Section 3, observational uncertainties are not considered in the fittings, which may be smaller than the intrinsic errors of our deep learning model. by Equation (24). We run the optimiser with 32 particles for 2000 iterations. We find that the parameter space is complex and contains multiple local minima. To ensure the results are stable, we perform the optimisation 10 times for each galaxy, and choose the best-fit from these runs.
The best-fitting SEDs are demonstrated in Figure 8, and the best-fitting parameters are given in Table 3. We find that our SED model can fit the observations well. To test our deep learning model, we run the skirt simulation using the best-fitting parameters, and plot the results in Figure 8 as dashed lines. The SEDs predicted by our deep learning model are well consistent with the simulation results. Table 4 compares some relevant quantities with DG15, who carried out an image fitting analysis on these galaxies using fitskirt (De Geyter et al. 2013). Our model also predicts that both IC4225 and NGC5166 are edgeon galaxies, consistent with the galaxy images. Our predicted galaxy properties are broadly consistent with DG15 but with some anomalies. For IC4225, the stellar disk radius is overestimated by a factor of 2. For NGC5166, our results overestimate the dust disk radius and the dust mass by a factor of 3 and 4 respectively. An additional issue is that for NGC5166, our model cannot fit the fluxes in the WISE -22 and IRAS -70 bands. This behaviour was also found by DG15. Their results show even poorer consistency with the observations at MIR to FIR wavelengths. The issue is known as the dust energy balance problem, which was reported by several image studies (e.g. De Looze et al. 2012;Mosenkov et al. 2016Mosenkov et al. , 2018. Moreover, for both galaxies, the predicted bulge radius is significantly underestimated and reaches the minimum value in our model 4 .
To understand the discrepancies, we examine the impact of bulge radius on predicted SEDs. To examine the dependency in the SED on bulge radius, we perform two additional fittings for each galaxy, doubling the bulge radius and fixing the bulge radius at the same value with DG15. The additional results can also be found in Figure 8, Tables 3 and 4. Figure 8 shows that increasing the bulge radius lowers the fluxes at 20 µm − 80 µm. This trend is very obvious for NGC5166, in which case the predicted SED has a constraint at 70 µm. In Tables 3 and 4, while no apparent trend can be found in the changes of the best-fitting parameters, the results become more consistent with DG15, particularly for IC4225. The relative sizes of the dust disk, stellar disk and bulge can affect the dust optical depth significantly, resulting in different behaviours of dust attenuation and emission. Our results imply that these parameters are highly degenerated.
We propose that including information at 20 µm − 80 µm could alleviate the degeneracy in the geometry parameters. To verify this, we carry out a fitting for each galaxy with mock data at 40 µm. The mock data are estimated from the best-fitting results of using the same bulge radius as DG15. We use a tophat filter centred at 40 µm with a width of 10 µm to compute the flux. In addition, for NGC5166, we treat the flux in WISE -22 as an outlier and exclude it in the fitting. We show the additional results in Figure 8, Tables 3 and 4. When including the mock data at 40 µm, we obtain better constraints for the bulge radius and achieve better consistency with DG15 for the geometry parameters. In particular, for IC4225, the results with mock data and fixed bulge radius are quite consistent. As illustrated in Figure 9, the obtained SEDs in both cases overlap completely. The additional fittings present above imply that fluxes at 20 µm − 80 µm are related to bulge radius and could reduce the degeneracy in the parameter space of our SED model. However, this point should be tested using a large galaxy sample. These fluxes may be observed by Herschel in galaxies at higher redshifts.
Some implications can also be found by comparing the estimated stellar mass and dust mass among the different fittings of each galaxy. We have carried out four different fittings for both IC4225 and NGC5166, leaving all parameters free, fixing the bulge radius and including mock data. As demonstrated in Table 1, while the variations in stellar mass, dust mass and dust absorption fraction are small, geometry parameters such as dust disk radius show large differences. Our predicted stellar mass is slightly overestimated compared with DG15, who measured the stellar mass using magphys (da Cunha et al. 2008). This difference is consistent with the study by Doore et al. (2021), who found that inclination independent SED models could underestimate stellar mass. In addition, the different choices of the simple stellar population library could also impact the measurements of the stellar mass. In terms of the dust mass, while our predicted value for IC4225 is broadly consistent with DG15, there is a disagreement for NGC5166. A potential reason could be that both fitskirt and our SED model cannot reproduce the flux in WISE -22, which introduces systematic errors in the fittings. Figure 9. Best-fitting SEDs of IC4225 and NGC5166 with mock data at 40µm, which are shown as orange lines. The mock data is computed using the best-fitting SEDs with fixed bulge radius, which are shown as magenta lines. For IC4225, the orange line overlaps the magenta line. The back lines correspond to the best-fitting results without fixed parameters. The dashed lines are obtained by running the skirt simulation using the corresponding best-fitting parameters listed in Table 3. The model free parameters and the fitting method are described in Section 3. As discussed in Section 3, observational uncertainties are not considered in the fittings, which may be smaller than the intrinsic errors of our deep learning model.