Machine learning assisted identification of grey-box hot metal desulfurization model

ABSTRACT Hot metal desulfurization operates as an extraction process in steel production after the blast furnace. Mathematical modeling of the process provides the basis for control and optimization solutions. Owing to complex dynamics, an exhaustive physico-chemical description of the process is computationally infeasible. Thus, a model with a gray–box structure is considered. In the model, a neural network model for sulfide capacity of the slag is implemented in series with the reaction model. The network is trained using an independent data set. The parameters for the reaction model are identified using genetic algorithm with real-encoded population and hybrid selection and recombination operators. The gray-box model parameter distributions are estimated using Metropolis Hastings Algorithm. This study suggests that a sufficiently low mean absolute error can be achieved for the end sulfur content. The relative contribution of the permanent contact reaction to overall reaction rate under industrial conditions remains uncertain.


Introduction
Sulfur is a major impurity in a final steel product.The hot metal desulfurization serves to remove sulfur from the metal melt, while the system is still in reductive conditions, i.e. in the hot metal stage [1] ].In steel, sulfur causes brittleness, and decreases the weldability and corrosion resistance.The final sulfur content is product-dependent.In hot metal desulfurisation, the initial sulfur content is around 0.03-0.045wt-%.In steel production, the sulfur is removed to varying extent in iron production (blast furnace, BF), hot metal pretreatments (e.g.hot metal desulfurization, HMD), in primary metallurgy (basic oxygen furnace, BOF) or secondary metallurgy (ladle furnace, LF). [2]Hot metal desulfurisation can be carried out either with the lance injection technique, the Kanbara reactor process with mechanical stirring, or in ladle furnace process. [2]In the lance injection technique, sulfur is extracted from the metal phase by injecting a fine-grade desulfurisation reagent in an inert carrier-gas, typically nitrogen (N 2 ).The lance is immersed in the hot metal to a constant depth. [2]This study considers calcium oxide (CaO) and calcium carbonate (CaCO 3 ) based reagent mixes.
In lance injection, the reagent injection can be controlled easily.The process allows the co-injection of reaction-enhancing compounds, such as ones that decompose into a gas in the melt conditions, thereby improving the scattering of the fine-grade particles. [3]Owing to the large interfacial area, the reactions occurring in the lance-injection are fast.The available interfacial area is dependent on the particle size of the reagent. [1]The injection gas keeps the metal bath well-mixed, but it is plausible that a variety of the nominal particle-metal surface area is not used effectively as the particles tend to be entrapped in the carrier-gas plume, ergo do not get into straight contact with the sulfur that is dissolved in the hot metal. [3,4]Furthermore, the utilization ratio of the reagent still decreases owing to the assumably short residence time of the particles in the hot metal, denoted here as t res . [1]athematical models can be used in various tasks, such as in process control and design activities.Often, the models are used either to explain or to predict the process behavior.The explanatory models contain usually exhaustive phenomenological descriptions, making them computationally infeasible to be used in fast decision-making, such as in process control activities.Often, the model structures are illustrated with grayscale.In the grayscale, the white-box models are constructed by making argumentative assumptions based on physical principles.These models are usually computationally intensive, especially spatial variance of the system is considered.The models which have a generic structure are called the black-box models.The identification of the black-box models is often solely data-based.Drawing predictions with the black-box models is computationally less intensive than their training.Grey-box models are a mixture of black and white-box models, having a deterministic structure based on physical principles, but also parameters that are estimated from the data.The computational complexity of the graybox models is a compromise between their explanatory power.The parameters might be dimensionless, or have a physical meaning.Because of their structure, the gray-box models can be used also for explanatory purposes.These models are also often capable to draw reasonable predictions. [5]The gray-box model can also use a mechanistic white-box model as the main model, and a submodel with a black-box structure and or the black-box model can compensate for the white-box model outputs. [6,7]he previous studies on modeling hot metal desulfurization cover a vast number of mathematical models for the process, which have been exhaustively reviewed in our previous study.The most common model type is of white-box structure. [8]Of these, the exhaustive descriptions accounting for the spatial dynamics in the heat, momentum and mass transfer can act only as decision support tools because of their computational complexity.The computationally light approaches have been presented as a lumped parameter approach, ignoring the spatial variance.In addition, it is a well-known issue that the reduced lumped parameter models do not describe well the variance of the process outputs, and an exhaustive description is computationally extremely intensive.These issues make the white-box approach generally ill-suited.Very recently, the present authors proposed a black-box model for a carbide-based desulfurization process implemented as a neural network, selected with a genetic algorithm. [9]However, the black box model structure does not enable studying the proposed fundamental research questions.Thus, a new approach with a dynamic gray-box model structure is proposed.The comparable models in the literature are summarized in Table 1.Most of the models are implemented for injectionbased desulfurization, except for Refs., [18,19] which consider the Kanbara process.The most common models tend to be of blackbox structure.The gray-box models are favored over the blackbox structure in studies that consider explanatory analysis to some extent.The modeling errors are given as reported by the authors or are calculated based on the given data using the mean absolute error (MAE).The furthermost column on the right also presents the sample sizes in these studies.The errors are only reported in the comparable cases, ergo in the cases that consider the end content of sulfur as the output variable.
Despite the attempts to study the injection phenomena both on laboratory and industrial scales, [8] the t res as well as the fraction of particles that remain entrapped in the carrier-gas (f P ) under industrial conditions remain an enigma.Thus, the models often include unquantified uncertainties.In this study, a mathematical model with a gray-box structure is identified to quantify these effects as well as the uncertainties, reflecting these to the literature gaps.The model identification is carried out using machine learning algorithms, namely real-coded genetic algorithms (RCGA), Metropolis-Hasting's algorithm and neural networks.The identification process is divided into two separate sections: black-box model training and gray-box model identification.The overall model has a serial architecture, in which the pre-trained blackbox model-based estimator is used to predict the sulfide capacity (C S )and the gray-box models estimate the sulfur trajectory during the treatment.

Materials and methods
In powder injection, the overall reaction rate is assumed to be mainly a sum of the reagent-metal reaction (the transitory contact reaction) and the slag-metal reaction (the permanent contact reaction).In addition, there exists a hypothetical reaction mechanism between the hot metal and the particles that remain entrapped in the carrier gas.However, this mechanism is often ignored in the model descriptions because of the plausibly slow reaction rate and because the current literature does not provide any distinct evidence of its occurrence. [1,8,20]Thus, the overall rate is the summation of the reaction rates that constitute mechanisms i and ii (R tot = R i + R ii ).There are some uncertainties concerning the mechanisms, especially the magnitudes of t res and f P .The overall model structure is presented in Fig. 1.Each of the desulfurization treatments is simulated within time interval of t 0 -t max .In the time integration, an implicit Euler method [21] was used to ensure a stable convergence and small truncation error even with longer integration time steps.

The transitory contact reaction
When a CaO-based reagent is used in the injection, the transitory desulfurization reaction is as follows [2] In this reaction, the sulfur [S] is dissolved in the metal phase, whereas the solid calcium oxide particles <CaO> acts as the main desulfurizer forming a solid calcium sulfide <CaS> and oxygen that dissolves instantly in the metal phase [O].][24][25] Based on the literature, some of these are difficult to be quantified, thus generating several degrees of freedom in the system.Several models are available to describe the rate of transitory reaction (see the available review [8] ).In these models, the sulfur extraction capacity of the individual reagent particles is limited using the concept of microkinetic efficiency.Considering this Black-box PLS model (CaC 2 consumption) N.A. [15]   Black-box PLS model (CaC 2 consumption) N.A. [16]   Black-box PLS model (CaC 2 consumption) N.A. [17]   Black-box MLR model (CaC 2 consumption) MAE = 15.3 ppm (n = 15) [18]   Black-box NN model (t stir , v stir., CaO consumption) N.A. property, the integrated form of the rate law for a single reagent particle of a size d p,i is given as [1,10,26,27] where R i;i is the reaction rate of the transitory reaction for an individual particle size class, L S;CaO is the distribution ratio of sulfur between the <CaO> and the hot metal (-), _ m r;i is the injection rate of the reagent (kg/min) of a particle size-class i, ρ is the density (kg/m 3 ), m is the mass (kg), β M is the mass-transfer coefficient of the metal side at the particle-metal interface (m/s), d p;i is the particle diameter (m) of size-class i and [%S] is the sulfur content in the metal phase.The overall rate for transitory contact reaction is a sum of size-class-specific reaction rates.As indicated above, from the rate-determining properties, the values oft res and f p;i are not well quantified in the literature.
A further discussion of the identification of these properties are given in the following sub-sections.

Describing the f P,i
The term 1-f P,i has been suggested to be significantly less than 1, at least in the case of CaC 2 reagents. [22]It is proposed that this property is dependent of the particle diameter, flowrate of the injection gas, and the co-or mixed injection of additives that generate gas in the system when decomposing, such as CaCO 3 . [20,23]It is suggested that the decomposition reaction improves the particle scattering, and potentially increases the interfacial area between the reagent and the metal. [24]In the injection system, the dependency of particle size and the penetration behavior that is proportional to the contact probability can be described using the critical Weber number (We C ).The We C defines the velocity that is needed for a projectile to have to overcome the surface forces that are defined based on the properties of the penetration interface.In other words, u C is defined as the velocity of an object that is needed to overcome the surface forces in the metal-particle interface.The increase in u C plausibly increases the probability of entrapment of the injected reagent particles in the gas bubbles. [28]This phenomenon is analyzed in more detail in [1] and. [28] The contact probability of particles also depends on the flow scheme (coupled or non-coupled flow) that prevails in the injection lance.The coupled flow is a scheme, where the velocity difference between the particles and the superficial gas velocity is small.As this difference increases, the flow scheme is transitioned into a non-coupled flow that is more unstable. [29]Given these premises, the observed particle behavior might be stochastic.Thus, the f p;i is described with a parameterized model, and given as where Q tot is the overall gas volume accounting both the injection gas and the CO 2 formed with the decomposition of the CaCO 3 in the melt conditions (T = 0.8T M (K), p = p 0 + ρ M gh lance ).The total gas volume in the hot metal conditions is computed assuming an ideal gas behavior.

The residence time (t res Þ
As the flow field formed because of the carrier-gas injection, the exact velocities and thus the t res of the particles in the melt are complex to solve accurately.The identified values of this parameter depend on the assumptions made on the studied system.In addition, the particles might be able to recirculate from the slag phase to the melt, and by that make the system even more complex.It is suggested that the t res is between 39 and 78 s depending on the rate-controlling mechanism. [1]On the other hand, a t res of 25.83 s has also been suggested for the CaO particles. [30]On the other hand, using the flow field correlations presented in [26] the t res is around 2.3 s.A very recent study [31] used t res of 0.6 s in the simulations.However, their validation data consisted of only two heats. [31]With these considerations, it is obvious that in identification the t res should be treated as a probability distribution, not as a single value.Here, the t res is expressed as a mean value for all size classes and is left to be identified from the data.The residence time that is used in Equation 2 is taken as the minimum of the current time (t) and the t res .The magnitude of the t res would be beneficial to be known as it determines the relative contribution of the reaction rates to satisfy the expression for the overall rate.It has been estimated that the rate of permanent contact reaction is half of the rate of the transitory contact reaction, [11] whereas some studies suggest that the overall rate is essentially defined by the rate of the permanent contact reaction. [26]Again, these properties depend on the chosen t res , but also on the assumed particle size distribution (PSD) parameter, which defines the interfacial area for the reaction.

Particle size class specific feedrate ð _
m r;i Þ Rosin-Rammler-Sperling distribution (RRS) is a mathematical model for the mass or volume fraction-based cumulative PSD (PSD).The general form of the model function for particle size class i is [32] where w p;i is the cumulative mass fraction and c 1 and c 2 are the distribution parameters.The distribution parameters are identified using the characteristic distribution parameters d 10 , d 25 , d 50 , d 80 and d 90 with a method of least squares.With these considerations, the mass flowrate of an individual particle size class is where _ m r is the overall mass flowrate of the reagent.

The permanent contact reaction
The rate of slag-metal reaction is considered to be [26] where A is the surface area of the ladle opening (m 2 ), V M is the volume of the hot metal (m 3 ), and β S is the mass transfer coefficient (m/s).Along with the available specific surface area and mass-transfer rate, the overall rate of the permanent contact reaction is defined by the magnitude of the thermodynamic driving force. [33]The magnitude of this driving force is approximated using the equilibrium sulfur partition ratio (L S ), given as [26] where L S is the partition ratio and (%S) is the wt-% of sulfur in the slag phase.As seen in Equation 7, the L S reflects the distribution of sulfur between the slag and metal phases.The partition ratio is related to the C S that is derived based on the desulfurization equilibria of either slag-metal or slag-gas reactions.The C S describes the slag's sulfur extraction capacity.The C S is considered independent of the activity of oxygen (a H O ½ � Þ in the system.C S practically defines the thermodynamic driving force of the permanent contact reaction. [1]The computational details are well presented in Ref. [1] Usually, a logistic expression for the L S is used, such as the following [34] where f H S ½ � is the Henrian activity coefficient of sulfur in the metal phase.To predict the sulfur partition ratio between the slag and the metal using the C S , or vice versa, the a H O ½ � needs to be measured or estimated using the reaction equilibrium approach. [1]In the gray-box model, the a H O ½ � is assumed to be determined either based on C -CO or Si -SiO 2 equilibrium reactions, given by where {} and () are the notations for species in gas and slag phases, respectively.A common assumption is that the partial pressure of carbon monoxide (p CO ) equals to 1 and the Fe is present as a pure solute, a R Fe = 1.Using these assumptions and a Henrian standard state (infinitely diluted solutions), the a H O ½ � in Henrian standard state is given by either of the following relations where K is the equilibrium constant,γ R i is the Raoultian activity coefficient, x SiO 2 is the molar fraction SiO 2 .The K CO f g and are given in. [35,36]The Raoultian activity coefficient for (SiO 2 ) is calculated using a regular solution model. [37]In all cases, the activity coefficients for the components dissolved in the metal phase are estimated using the Wagner-Lupis-Elliott formalism. [38]ractical observations have revealed that in the operating conditions of the studied process, the slag is initially liquid.After approximately 800 kg of solid CaO reagent has been injected in the melt, the slag forms a heterogeneous system with a high fraction of non-dissolved CaO particles, as the as their mass percentage the slag can be as high as 75-80 wt-% at the end of the treatment.The estimation of the evolution of the slag composition during injection was carried out with simple mass-balance computations.The solubility of CaO in the slag phase at the given temperature (1350°C) is relatively low.In addition, the approximation of the ladle flow field during the injection is computationally very intensive.Owing to these properties, the mass-transfer coefficient for the slag-metal reaction is described using a semi-empirical correlation [39] where τ is a coefficient describing the relation of the mass-transfer coefficient and the gas flowrate Q tot through the surface area of the ladle opening (A).The Q tot is computed using the conditions prevailing in the ladle surface [39] (T = 1523.15K, p atm = 101325 Pa).The dispersion of the metal droplets in to the slag phase, or vice versa, would drastically increase the rate of permanent contact reaction through the metal droplets with a large interfacial area. [1]owever, this approach is ignored because of the low empirical evidence of it, especially in the case of heterogeneous slag phase.

Grey-box model identification
In this study, the considered model is implemented using a gray-box structure.The principal gray-box architecture is based on the overall white-box model of which parameters are identified using the RCGA the standard errors of the model parameters are estimated using the Metropolis Random Walk Algorithm.In addition, a neural network model is used to estimate the log 10 (C S ), making the overall model architecture a serial gray-box.The black and gray-box model identification as well as the overall model architecture is presented in Fig. 2. The formulated gray-box model is essentially a parameterized version of the white-box model presented in Fig. 1.

Grey-box model parameter identification
The model parameter identification is divided into two steps (1) Estimation of the expected value of the model parameter vector with a RCGA (2) Estimation of the uncertainty of the parameter vector using the Metropolis-Hastings algorithm The objective of the parameter identification is two-fold.The primary objective is to identify parameters that yield reasonably accurate predictions and allows the use of the model for predictive purposes in on-line applications.The secondary objective is to draw an estimate of the model parameter uncertainties and by that to evaluate the explanatory power of the model.To simplify the parameter identification objective function space and the analysis, the training and the testing data sets are kept constant in the identifications.

Real-coded genetic algorithm
The RCGA is a well-established method for the minimization of complex and multimodal functions, which are common properties of the least-squares functions that are based on noisy data sets and non-linear models.Especially, the exploration properties of the genetic algorithms make them superior compared to gradientbased methods, even though the metaheuristic algorithms tend to suffer from poor convergence near the local optimum.There is a plethora of other available metaheuristic population-based algorithms available in the literature that perform equally well as the RCGA. [40]The RCGA implemented in this study consists of the five basic operators of the classical genetic algorithm.These are selection, crossover, mutation, elitism and population shifting.The operations are implemented as a hybrid, which means that the population members are treated differently.[43] The implementation details are summarized in Table 2.

The metropolis hastings algorithm
The Metropolis Hasting (MH) algorithm belongs to a family of Monte Carlo Markov Chain (MCMC) methods, which are designed to assess probability distributions in a frequentist manner. [47]A general application of the algorithm is in the estimation of the probability distribution of the model parameters, allowing the estimation of the expected parameter values and the standard errors.The basic idea of the algorithm is to draw an estimate of the parameter distribution by making use of an assumed posterior distribution of the model parameters and the modeling error.The algorithm generates a chain of parameters using a random walk and the defined forms of the assumed posterior distributions.However, the algorithm demands an initial guess to initialize the computations, which is often complex to define without a priori information of the parameter values.The algorithm itself does not explore the parameter space efficiently, which is why the RCGA is used to identify the expected value of the nonlinear model parameters, and the MH algorithm is used to estimate the standard errors and the confidence intervals.In this study, the parameters and the modeling error were assumed to follow normal and Fisher's distribution, respectively.An exhaustive description of the applied algorithm is given in. [49,50]

The neural network model training for sulfide capacity
In this study, a shallow neural network model is used as the model basis to estimate the C S .The training procedure follows the principles of Extreme Learning Machine (ELM), which allows a significantly shorter training time than the traditional backpropagation algorithm, without significantly trading off the generalization and the estimation capabilities of the network. [51]As the C S dataset is relatively small, the use of more complex models such as random forests, XGBoost or deep neural networks was not considered further.In addition, as one of the key objectives is to demonstrate a model that is suited to be used on-line, the fast training of the applied sub-models is rather pragmatic.The computational details of the shallow neural networks are presented in multiple references, and thus not repeated here.

Shallow neural network training data
Numerous models are available for predicting the C S .The most referenced models usually lack data on CaO-SiO 2 -Na 2 O slag systems, [34,52,53] thus making the model residual correlated with the Na 2 O content.This is problematic as the Na 2 O is a major component in the studied industrial slag data (see the next section).Therefore, dataset 3 is collected to cover the slag system considered in this study.The data extraction was carried out manually.The data set consists of 487 observations and 9 potential input variables.These kinds of multivariate datasets might suffer from sparsity, which needs to be addressed during the model selection.The referenced studies are presented in Table 3 with the corresponding slag systems.The system temperature in the selected studies varies in the range of 1200-1700°C, so it covers well the studied industrial conditions.The L S and the C S were determined as described in the previous sections and equations 16 and 17.A more throughout the description of this method can be found in. [1]The comprehensive experimental details can be found in the referenced studies.
Similarly, the a H O ½ � is needed to estimate the C S that is then to be used as the reference vector in the model training.The a H O ½ � is considered to be determined as the minimum value by the Table 2. Operators for the genetic algorithm.

Grey-box model identification data
A real-life industrial data set was used for the identification of the gray-box model.The considered dataset dimensions are 39 rows (observations) and more than 20 columns (input variables).The data consists of the mass-based composition of the hot metal and the slag phase and the temperature of the hot metal.These properties are available before and after the treatment.In addition, the data was combined with the available injection parameters (gas and mass flowrate of the reagent) as well as with the PDS's of the used reagents.The average compositions of the metal and slag phases are given in Table 4.
The average temperature in the data is 1623 K. On average, the m M is around 80 tonnes.The slag compositions were not available for all treatments.The missing values were replaced using mean and regression imputation techniques.The regression imputation was carried out by using an empirical relation for (CaO) wt-% = 0.3878T(fjC) − 501.3, whereas the rest of the slag components were imputed using mass-balance calculations.The volume-based and corresponding characteristic PDS's were measured using a laser diffraction method.The values are reported as a percentage (X) below diameter (d), ergo d X .The values for the characteristic particle sizes (d 10 , d 25 , d 50 , d 80 , d 90 ) are extracted from Ref. [67] The characteristic PSD's of the available reagents are given in Table 5 with corresponding estimated RRS distribution parameters.The R 2 for the RRS fit was consistently high (R 2 >0.95) for each reagent.

Neural network identification
The considered neural network was identified using molar fractions of CaO, SiO 2 , Al 2 O 3 , MgO and Na 2 O as well as the temperature (T) in Kelvins as input variables.The variable selection was based on the available slag component measurements in the process data set, whereas the rest of the available variables were ignored to avoid inconsistencies in predictions because of the missing values.The number of hidden neurons was set to be 8 as it was found to minimize the cross-validated error and to provide a good estimate of the C S on the cross-validation set.The performance of this model was found reasonably good, as the test set predictions resulted in the coefficient of determination R 2 = 0.88 and mean absolute error (MAE) of 0.25 (-).The estimated effects of the input variables on the C S were similar to the reference data.The C S of the slag phase was found positively dependent on the Na 2 O, CaO/SiO 2ratio and on the T, as well as on the Na 2 O/SiO 2 ratio.These dependencies are consistent with the and similar as were observed in the training data using univariate estimators.The interpolation capability of the model was validated roughly using a response surface analysis.The response surface seemed smooth; thus, the provided training data could be considered adequate for the purpose and the model can be considered to interpolate well.

Grey-box model parameter identification
The parameter identification with the genetic algorithm was repeated several times with different population sizes in order to achieve a reliable convergence.Consequently, a suitable population size was found to be (n pop ) of 100 and the maximum number of generations (gen nax ) of 200.The actual parameter identification phase was repeated with 10 different initial populations.The elite individual of the population in the final generation was then used as the initialization for the MH algorithm.The chain length for each of the repetitions was set to 500.The 95% confidence interval (CI 95 ) was then computed with the quantile method using the quantiles 0.025 and 0.975.
The estimated parameters with corresponding 95% confidence intervals (CI 95 ) are summarized in Tables 6 and 7.The corresponding probability distributions as well as the parity  plots for the parameter values are presented in Fig. 3.In the figure it can be seen that the values for the t res agree well with the literature (see section 2.1.2).However, the multimodality of the objective function reflects the large uncertainty of the model parameters.Thus, the parameters cannot be treated as crisp values, but rather as multimodal probability distributions.This finding agrees well with the findings presented, [1] in which is suggested that especially the t res depends greatly on the modeling assumptions.To exemplify, the magnitude of the transitory reaction rate is dependent on the product of the β M and the t res .However, these values themselves are interdependent, as the mass-transfer rate is partially defined by the metalphase velocity, which further defines the t res .In addition, both of these values are dependent on the particle size class, complicating the system a bit more.To build a more solid foundation for explicit conclusions, further investigations are needed, e.g.computational fluid dynamics with discrete particle (CFD-DP) modeling or physical modeling.The 95% CI shows that four of the five effects are estimated such that their statistical significance can be concluded.However, the parameter (b 1 ) that describes the effect of u c on the contact probability of the particles remains uncertain, as the CI is distributed on both sides of 0. In addition, the parameter seems to be bimodal, or even trimodal, indicating the existence of different operating areas of the model.This can be further associated to the dominance of either the transitory or the permanent contact reaction in the overall kinetics and the uncertainty of the main contributing mechanism (see Fig. 4).The reasoning can be observed from Fig. 3; the transitory reaction parameters are to some extent correlated with the permanent contact rate parameters.Thus, the contribution of the slag phase to the reaction rate is uncertain, even though its mathematical description seems to explain the results well.However, in our previous study, it was found that the PSD parameters only explain this data set very well in a model structure that ignores the permanent contact reaction. [66]It is plausible that to identify the effect of permanent contact reaction on the overall rate, a data set in which only the carrier gas is injected without the reagent.However, collecting this kind of dataset in normal operation might not be feasible in an industrial-scale process.Another reason could be that the laboratory-scale data, and thus the neural network could overestimate the C S in the industrial scale and by that increase the relative contribution of the permanent contact reaction.What comes to t res of the particles, it is of the same magnitude as is proposed in the literature (see for example. [1,26,30]The t res also correlated not only with the τ but also with b 0 and b 1 , which is in agreement with Oeters, [1] who proposed that the t res depends greatly on the modeling assumptions.

Model validation and simulations
The model validation was carried out using stratified crossvalidation.This was carried out reagent-wise.The stratified cross-validation was used to ensure the existence of each reagent treatment both in the training and testing dataset.The division was made such that 24 treatments were used for identification and 15 treatments for testing.The test set predictions were carried out by using the parameter vectors identified with the RCGA.The results are presented in Table 7.
From the table, it can be seen that the model predicts the end sulfur contents well, as both R 2 and MAE are reasonably good for each parameter set.Owing to the multimodality of the objective function, the RCGA suits well for parameter identification in the final application of the model.
The sensitivity of the model for model parameters was then evaluated using the identified parameter distributions.The simulation results were computed for 5000 parameter sets, and the corresponding time-variant max, min, and 95% CI values for the sulfur content were recorded.Consequently, 75000 simulated sulfur trajectories were computed.In this case, 95% CI was again computed with the quantile method.The simulation envelopes for six exemplified treatments are presented in Fig. 4. It can be seen from the simulation results that the model is not sensitive to parameter changes, provided that the parameters are not treated as independent.A single-parameter sensitivity analysis would provide drastically different results, because for example the β M and t res cannot be considered independent (see Equation 2).
Similarly, by changing the contribution of the permanent contact reaction (τ), the overall rate would be changed drastically because the overall rate is constant and is formed as the sum of these two individual mechanisms.This issue can be well observed in Fig. 5, in which the contribution of the transitory reaction on the overall rate is presented as a percentage.It can be seen that there are multiple almost equally good solutions for estimating the end content of sulfur, but for the explanatory analysis, the modeling result is quite uncertain.The 95% CI for the transitory reaction contribution is case-dependent and depends on the sulfur content.In some cases, the contribution might be as high as 100%, whereas good predictions can be drawn with a contribution of only 60-70%.The 100% percent relative contribution reflects well our previous observations, [67] which suggest that the slag phase rate could be ignored for this data.Obviously, the confidence intervals and the distributions for the permanent contact reaction is R ii = 100-R i (%), and by that has similar uncertainty characteristics as does the transitory contact reaction.It is rather interesting to see that regardless of the mechanistic uncertainty, the model is capable of estimating the end sulfur contents reasonably well.
The effect of particle size and the mass percentage of mixed CaCO 3 within the reagent on the 1-f P is in presented Fig. 6.The figure shows that the effect of CaCO 3 mixed with the reagent on the contact probability can be seen as significant, but the effect of particle size on the contacted fraction is uncertain, which can be observed from the spread of the confidence interval.This particular issue also reflects the identification results concerning parameter b 1 and its confidence interval; based on this data, the effect appears either positive or negative.Referring to the work of Nakano and Ito (2016), a higher penetration velocity decreases the contact probability.However, with these assumptions, and thus would rule out the negative parameter values.This issue at least in practical applications solved by setting physical constraints to the parameter values by assuming the dependence a priori (b 1 >0).

Practical implications
The practical applications of the models are two-fold.The model speed allows its use online in control applications, and on the other hand, the mechanistic implementation of the model allows it to be used in the desulfurization reagent design and quantification of the reagent efficiency.A one 8 min simulation takes 0.064 s on a standard laptop using a simulation time-step of 1 (s) with implicit Euler time integration.This corresponds to 7500 simulations in realtime and thus can be considered suitable for on-line use.However, the parameter identification process with RCGA would have to be carried out off-line.
To exemplify the use of the model in process design, a simulated scenario is formulated.In the scenario, two different PDS's with d 90 = 117 and d 90 = 428.5 μm are considered.The simulation parameters were otherwise kept constant with respect to the treatment time, slag composition, and injection parameters.In the simulations, the identified parameter distributions were used to estimate both the expected value and the confidence interval on the effect of d 90 particle diameter of the reagent on the rate of desulfurization and end content of sulfur.The considered simulation time was 8 minutes and the reagent injection rate was 100 kg/min.The simulated trajectories are presented in Fig. 7.It can be seen that the more finegride PSD yields an increased rate and efficiency of the desulfurization reagent, as the end content of sulfur is estimated to be [S] = 42-72 ppm with a fine-grade reagent, compared to [S] = 93-116 ppm with the more coarse-grade reagent.The mean difference between the distributions is 48 ppm, and the difference between the two end-point sulfur contents appears to be statistically significant (p-value < 0.01) with 95% confidence level.Thus, the identified model implies that using a reagent with a finer-grade PSD increases the rate of desulfurization in this data, regardless of the effect of the slag phase.This effect is in good agreement with our previous studies. [27,67]

Conclusions
In this study, a well-performing mathematical model with a gray-box structure is proposed to estimate the time-variant sulfur content in injection-based hot metal desulfurization.The performance of the gray-box model was found good for the test set on average (R 2 = 0.92, MAE = 13 ppm) using different parameter combinations identified with the real-coded genetic algorithm.The explanatory performance of the model remained uncertain, as the model parameters were found to correlate to some extent.Overall, the modeling results were consistent with the literature.Owing to the fast computational time and accuracy, the model could be implemented for online use, provided that the off-line estimation of the parameters is carried out reliably.

Figure 1 .
Figure 1.The overall grey-box model structure.

Figure 2 .
Figure 2. The training of the black-box model and the use of the pre-trained black-box model as a part of the overall grey-box model.

1
Notes: *) X/Y = Fractions of population treated with either of the operators, MPT = Mäkinen-Periaux-Toivanen, n TS = number of individuals selected as tournament pairs, n pop = number of individuals in the population, p c = Crossover probability, p mut = Mutation probability, p map = Mapping probability.

Figure 3 .
Figure 3.The parity plots and the histograms of the estimated model parameters.

Figure 4 .
Figure 4.The simulated sulfur trajectories for 6 randomly selected treatments with corresponding confidence intervals (min, max and 95% limits).

Figure 5 .
Figure 5.The reagent rates for 6 randomly selected treatments with corresponding confidence intervals (min, max and 95% limits).

Table 1 .
Comparable studies in the literature.The figures of merit are reported in the cases in which the end point of sulfur ([S] end ) is used as the output variable or the predictions have been reported.

Table 4 .
The average values before the studied treatments.

Table 5 .
The characteristic PSD's of the reagents with corresponding RRS distribution parameters.The values are given in micrometers (μm).

Table 6 .
The CI 95 of the estimated parameters.

Table 7 .
The estimated model parameters with the RCGA and corresponding figures of merit computed for the test set.