MATHEMATICAL AND NUMERICAL RESULTS FOR QUALITY CONTROL OF HOT METAL IN BLAST FURNACE

The blast furnace is a highly complex industrial unit producing hot metal from iron oxides. Measuring, modeling and exploring its behaviour are crucial tasks, owing to the difficult measurement and control problems associated with the unit. To maintain high efficiency, the present work proposes new adaptive algorithms for data-driven methods. These methods are classified into supervised and unsupervised algorithms, well known in optimization problems such as regression, classification and clustering. To expose their limitations, a comparative study of the proposed techniques is presented, and the results are validated on real data from the ArcelorMittal Annaba steel processes, demonstrating the feasibility and effectiveness of the proposed models and numerical procedures.


INTRODUCTION
The blast furnace is an extremely complicated nonlinear system consisting of several specific elements, such as the loading equipment, the cooling circuit, the hot blast production plant, and the large cylindrical shaft furnace.
Its principal operation is as follows: the solid raw materials, iron ore and coke, are charged in successive alternating layers from the top of the furnace. Under their own weight, they gradually descend toward the bottom, heating up until they melt, while the hot combustion gases rise through the column of burden materials. The molten material, consisting of cast iron and slag, flows into the crucible. At the end of combustion, the melt separates into two phases, molten slag on one side and molten hot metal on the other, which stratify according to their specific masses. The slag is drained through the slag tap hole, which is located higher than the cast iron tap hole.
The complexity of the heat and mass transfer processes, combined with a wide variety of chemical reactions and high pressures, makes blast furnace modeling an extremely difficult problem. Despite researchers' best efforts to solve this type of problem, challenges remain.
Prediction and control of product quality by virtual sensing technology is a key real-time tool and has been used extensively in many manufacturing processes (see Zhang (2017) [13] and Zhou et al. (2017) [15]). Such techniques are usually split into two main groups: data-driven techniques and first-principle models. For the blast furnace it is very hard to construct a first-principle model, owing to the large size of the reactor and the complex internal operating environment, whereas data-driven models do not need prior knowledge of the full operation of the process and are built directly from process data, Ge et al. (2017) [2]. They have therefore been applied effectively to various industrial processes; typical examples are the time series models of Saxen (2013) and multivariate statistical approaches (PLS, PCR).
For an overview of data-driven discrete-time models for dynamic prediction of the hot metal silicon content, the reader is referred to Gao et al. (2011) [1] and Saxen et al. (2013) [8], who proposed a model to predict the change of the thermal state of the blast furnace hearth with support vector machines, while Ling J. et al. (2011) [5] developed an adaptive model using sliding-window smooth support vector regression (SVR) for the nonlinear blast furnace system.
Numerous dynamic systems, such as the blast furnace, are characterized by nonlinear dynamic behavior, so that nonlinear models are necessary. Indeed, it has been shown that SVR and Neural Networks (NN) can approximate continuous nonlinearities, and they have been applied to the modeling of complex nonlinear systems, whose complexity is often due to the high number of weights in the network; in addition, the model identification principle relies on conventional Recursive Least Squares (RLS) and its adaptive version. More details about these methods can be found in, for example, [1], [2] and [9] and the references therein.
In this study, we propose a new approach to the measurement quality control of the blast furnace using robust nonlinear modeling and identification techniques known as data-driven methods, with the aim of reducing the modeling and identification errors, concerning the structure and parameters of the model, that enter the uncertainty budget.

BIG DATA AND DATA DRIVEN METHODS
Big Data refers to the huge amount of quasi-infinite information collected from different databases. These raw data are usually of differing natures, characterising the multi-scale behaviour of the system under consideration. Although the quantity of stored information is important, many other aspects matter just as much, such as the accuracy, variety (format, nature, and type) and value (perspectives and impact) of the data, as well as the absence of noise, bearing in mind that outliers, missing and incorrect values, and data sparsity can all introduce noise.
Big Data analysis faces many challenges alongside the issues above, such as data quality and verification, high dimensionality, distributed and heterogeneous data sources, data visualization and testing, and the ability to develop scalable algorithms. In order to derive patterns from large-scale data and exploit the value of big data analysis, we need to apply and adapt methods such as machine learning paradigms and algorithms, including deep learning. Machine learning is an important field and a research frontier in artificial intelligence, and, together with computational power, it plays a significant part in big data analysis. Its task is to analyse large amounts of data at all levels, whether simple or complex, and to deliver more accurate results and decisions faster. Machine learning models can be categorized into two forms. The first is supervised learning, where the input data are used to train reliable data representations that can be extended to new data in the same big data application area; it splits into two main groups, classification and regression. Neural Networks (NN) and Support Vector Machines (SVM) can serve as classification or regression methods to obtain the best prediction or to explore different linear or nonlinear models, simply by changing the structural parameters, such as the kernel for the SVM or the activation function and number of layers in the NN. Various robust algorithms, from gradient descent to Levenberg-Marquardt, are used to minimize the modeling errors for these model types.
The second is unsupervised learning, in which there is no modeling error to supervise and no direct learning algorithm tracking a model output. Principal Component Analysis (PCA), dimensionality reduction and clustering belong to unsupervised learning, while Partial Least Squares (PLS) is a supervised projection method often used for the same purpose. Generally, we use these methods to extract features from noisy data as a pre-processing step; the pre-processed data are then used as inputs for the supervised learning step, as sketched below.
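A minimal sketch of this two-stage workflow using scikit-learn (assumed available): an unsupervised step (PCA) extracts features from noisy data, and a supervised step (SVR) is then fit on them. The data and hyperparameters are purely illustrative.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                      # noisy process measurements
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

model = make_pipeline(StandardScaler(),             # scale raw variables
                      PCA(n_components=5),          # unsupervised feature extraction
                      SVR(kernel="rbf", C=10.0))    # supervised regression
model.fit(X, y)
print(model.predict(X[:3]))                         # sanity-check predictions
```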
Note that this section is not meant to cover Big Data exhaustively, but to give a brief overview of its key concepts and challenges, which are detailed in the next sections.

PLS technique.
The key step in process modeling is to identify the most important input variables and forecast the process response from the collected data. However, the high dimension and collinearity of these data make it difficult to build a robust process model. The need to explain the process quality from such data has led to the development of multivariate analysis for complex processes. To reach this goal we tried different strategies, among them the PLS technique. A supervised dimension reduction methodology introduced in 1983 by Herman Wold [10], it is used to pick out an optimal subset of input variables called "latent variables": its purpose is to construct new predictor variables as linear combinations of the original ones, summarized in a matrix X, and of a vector y of response variables (class labels). It focuses on maximising the covariance between the factors extracted from the input and output process sets.
Let X ⊂ R^m and Y ⊂ R^n be the m- and n-dimensional spaces of the input and output variables, respectively.
From N observed samples of each x ∈ X and y ∈ Y, we obtain two variable blocks X ∈ R^{N×m} and Y ∈ R^{N×n}, and write the general formulation as

X = T P^T + ε_X,    Y = U Q^T + ε_Y,

where T and U ∈ R^{N×S} are the factor matrices of score and latent variables, respectively; P ∈ R^{m×S} and Q ∈ R^{n×S} are loading matrices; and ε_X ∈ R^{N×m} and ε_Y ∈ R^{N×n} are error terms.
In PLS regression, the optimization criterion is the maximization of the Tucker criterion of inter-battery factor analysis: we try to simultaneously maximize the variance of t^(1) = Xw^(1) and the correlation between t^(1) and Y. We are therefore looking for a normalized vector w^(1) maximizing

⟨Xw^(1), Y⟩ = (w^(1))^T X^T Y,   subject to ‖w^(1)‖^2 = 1.   (*)

To obtain the expression of w^(1), we solve the constrained optimization problem (*) using the method of Lagrange multipliers, with λ ∈ R^+:

L(w^(1), λ) = (w^(1))^T X^T Y − λ((w^(1))^T w^(1) − 1).

Setting the first derivatives of L with respect to λ and w^(1) to zero, we write

(w^(1))^T w^(1) = 1,   (2.7)
X^T Y = 2λ w^(1).   (2.8)

By multiplying (2.8) by (w^(1))^T and using (2.7), we get (w^(1))^T X^T Y = 2λ. Setting θ = 2λ ∈ R, by symmetry we obtain

X^T Y = θ w^(1).   (2.9)

From (2.8) and (2.9), we have

X^T Y Y^T X w^(1) = θ^2 w^(1).

Consequently, w^(1) is the eigenvector associated with the eigenvalue θ^2 of the matrix X^T Y Y^T X, and the maximization of ⟨Xw^(1), Y⟩ amounts to taking θ^2 as the largest eigenvalue of X^T Y Y^T X.
We can therefore deduce an expression for w^(1) and the associated eigenvalue.
From the constraint (*) we have ‖w^(1)‖ = 1, so that

w^(1) = X^T Y / ‖X^T Y‖.

In order to know whether the first component t^(1) = Xw^(1) sufficiently explains the explanatory and the endogenous variables, we perform the two regressions of X on t^(1) and of Y on t^(1) to get p^(1) and c^(1). Other weight vectors are computed iteratively by the PLS method: once w^(1) and c^(1) are available, the score vectors can be computed by t^(1) = Xw^(1) and u^(1) = Y c^(1), and the loadings, i.e. the first columns of P and Q, by

p^(1) = X^T t^(1) / ((t^(1))^T t^(1)),    q^(1) = Y^T u^(1) / ((u^(1))^T u^(1)).

The data matrices X and Y are then deflated by subtracting their rank-one approximations,

X ← X − t^(1) (p^(1))^T,    Y ← Y − t^(1) (c^(1))^T,

and the new X and Y are used to compute w^(2) and c^(2) in the same way. This process is repeated until the residuals are small enough or a predefined number of weight vectors {w^(1), ..., w^(k)} and {c^(1), ..., c^(k)} has been obtained (see [1,11]).
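A quick numerical check of this derivation, assuming a single response column in Y: the closed-form weight vector w^(1) = X^T Y / ‖X^T Y‖ should coincide (up to sign) with the eigenvector of X^T Y Y^T X associated with its largest eigenvalue. Data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
Y = rng.normal(size=(50, 1))

w1 = X.T @ Y
w1 /= np.linalg.norm(w1)                 # the closed-form weight vector

M = X.T @ Y @ Y.T @ X                    # symmetric positive semidefinite
vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
v = vecs[:, [-1]]                        # eigenvector of the largest eigenvalue

print(np.allclose(abs((v.T @ w1).item()), 1.0))   # same direction up to sign
```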

SVR technique.
Support vector regression was introduced by Vapnik in the 1990s (see [4]); it has since become popular and widely applied in many fields. It is firmly grounded in the framework of statistical learning theory, which has led to a large body of developments; SVR is strongly recommended for solving various classification and prediction problems, and a key step is the construction of the Lagrange function from the objective function, converting the minimization problem into its dual [9]. Suppose the training sample set is given by

{(x_i, y_i)}_{i=1}^L,

where, for all i = 1, ..., L, x_i ∈ R^n is the input of the training sample and y_i ∈ R is the target value.
Our goal is to determine a function that can approximate future values accurately. The model is then given by

f(x) = w^T φ(x) + b,   (2.16)

where w ∈ R^n is the weight (coefficient) vector, b ∈ R is a constant, and φ is a map assumed to be a nonlinear transformation from R^n to a higher-dimensional feature space.
The aim is to find the weight vector w and the bias b such that the output can be estimated by minimizing the regression risk

R(f) = C Σ_{i=1}^L Γ(f(x_i) − y_i) + (1/2) ‖w‖^2,   (2.17)

where Γ(·) is a cost function and C is a constant.
Minimizing this cost function is equivalent to a convex optimization problem with a soft-margin loss function, represented as follows:

min_{w, b, ξ, ξ*}  (1/2) ‖w‖^2 + C Σ_{j=1}^L (ξ_j + ξ*_j)
subject to  y_j − w^T φ(x_j) − b ≤ ζ + ξ_j,
            w^T φ(x_j) + b − y_j ≤ ζ + ξ*_j,
            ξ_j, ξ*_j ≥ 0,  j = 1, ..., L,   (2.18)

where ζ is the insensitivity parameter and ξ_j, ξ*_j are slack variables introduced to relax the optimization constraints. The vector w can be written in the form

w = Σ_{j=1}^L (α_j − α*_j) φ(x_j).   (2.20)

By substituting equation (2.20) into (2.16), the generic equation can be rewritten as

f(x) = Σ_{j=1}^L (α_j − α*_j) k(x_j, x) + b.   (2.21)

Here k(x_j, x) = ⟨φ(x_j), φ(x)⟩ is the kernel function. Kernel functions allow dot products in the high-dimensional feature space to be computed from low-dimensional input data without knowing the transformation φ explicitly. The most widely used cost function is the ζ-insensitive loss function, which has the form

Γ(e) = 0 if |e| ≤ ζ,   Γ(e) = |e| − ζ otherwise.   (2.22)

By solving the quadratic optimization problem (2.18), the regression cost in equation (2.17) and the ζ-insensitive function in (2.22) are minimized. We write the dual problem as

max_{α, α*}  −(1/2) Σ_{i,j=1}^L (α_i − α*_i)(α_j − α*_j) k(x_i, x_j) − ζ Σ_{j=1}^L (α_j + α*_j) + Σ_{j=1}^L y_j (α_j − α*_j)
subject to  Σ_{j=1}^L (α_j − α*_j) = 0,  α_j, α*_j ∈ [0, C],   (2.23)

where α*_j and α_j are the Lagrange multipliers, solutions of the quadratic problem, which act as forces pushing the predictions towards the target values y_j.
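A hedged sketch of ζ-insensitive SVR with an RBF kernel, using scikit-learn (assumed available): its `epsilon` parameter plays the role of ζ above, and `dual_coef_` stores the differences (α_j − α*_j) for the support vectors. The data and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 6.0, 80))[:, None]
y = np.sin(x).ravel() + 0.1 * rng.normal(size=80)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.05)   # C and epsilon control the trade-off
svr.fit(x, y)

print(len(svr.support_))                        # number of support vectors
print(svr.predict(x[:3]))                       # predictions from (2.21)
```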

Neural Network technique. Neural networks were introduced by McCulloch and Pitts in 1943 and made popular by Hopfield [3]. As a nonlinear mapping between the input and output sets, a NN emulates the human brain, which is characterized by adaptation and self-organization. Moreover, NNs have the ability to learn from experience and to generalize from previous samples to solve new problems. They are composed of a number of very simple processing elements known as neurons (see Diagram 1). A neuron typically consists of four components: (1) input data, (2) a group of weights, (3) a weighted summer (node) and (4) a nonlinear activation function φ, such as the sigmoid; note that each neuron has its own activation function. The inputs x_i are connected to the nodes in the input layer and the outputs y_p are taken from the output layer. Between the input and output layers there are one or more hidden layers. All the nodes in one layer are connected to all the nodes in the next one. The connection strength between the output of a node i and a node j is given by a weight W_{ij}. The weights are regression coefficients to be estimated from sample data. The bias term is comparable to the intercept of a conventional regression model, i.e. it adds flexibility to the learning.
The model of a single neuron can be written as

y = φ(Σ_{i=1}^N w_i x_i + b),

where x_i is the i-th input value, y the model output, w_i the weight connecting input i with the neuron, and φ the nonlinear activation function. Different activation functions can be used, among which the sigmoid, the hyperbolic tangent and the Gaussian.
In order to find the optimal architecture of a neural network, many different approaches exist.
These methods are usually quite complex and difficult to implement. Moreover, there are no hard and fast rules for choosing the number of nodes and hidden layers to be used in an application; usually some trial and error is required to determine the combination that minimizes the error. The connection weights are then adjusted automatically using efficient nonlinear optimization algorithms, notably the basic back propagation (BBP) training algorithm [11]. The weights W are changed by an amount ΔW according to the formula

ΔW = −η ∂E/∂W,

where the parameter η is the learning rate and E is a cost function quantifying the difference between the known values of the approximated function over the discrete data set and their corresponding NN approximations, written in the form

E = (1/2) Σ_{n=1}^N e_n^2,

where N is the number of error terms.
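To make the BBP update concrete, here is a minimal NumPy sketch of a one-hidden-layer network trained by gradient descent on the (sample-averaged) error E; the architecture, learning rate and synthetic data are illustrative choices, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])              # synthetic target

W1 = rng.uniform(-0.5, 0.5, (3, 8)); b1 = np.zeros(8)
W2 = rng.uniform(-0.5, 0.5, (8, 1)); b2 = np.zeros(1)
eta = 0.1                                        # learning rate eta
phi = np.tanh                                    # activation function phi

for epoch in range(2000):
    h = phi(X @ W1 + b1)                         # hidden-layer outputs
    y_hat = (h @ W2 + b2).ravel()                # linear output neuron
    e = (y_hat - y) / len(X)                     # sample-averaged modeling errors
    gh = (e[:, None] @ W2.T) * (1.0 - h**2)      # backprop: tanh' = 1 - tanh^2
    W2 -= eta * (h.T @ e[:, None]); b2 -= eta * e.sum()
    W1 -= eta * (X.T @ gh);         b1 -= eta * gh.sum(axis=0)

print(np.mean((y - (phi(X @ W1 + b1) @ W2 + b2).ravel())**2))  # final MSE
```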
The widely used Levenberg-Marquardt (LM) optimization algorithm can be considered as a trust-region modification of the Gauss-Newton (GN) method or, more robustly, as a bridge between GN and the gradient descent algorithm. It has been readily shown in many cases that it converges even when the error surface is much more complex than in the quadratic situation. The LM is similar to the BBP in the sense that it requires the gradient vector, whereas the LM computes the Jacobian in addition. The LM update can be represented as

W_{k+1} = W_k − (J^T J + µ I)^{−1} J^T ε,

where J is the Jacobian, I is the identity matrix, ε the total error for all patterns, and µ a learning parameter that has to be adjusted several times at each iteration so that the step with the greatest error reduction is selected. When µ is very large, the LM algorithm becomes steepest descent (BBP), whereas when it equals zero it becomes the Gauss-Newton method. If we consider a system defined by measured input and output data X(t) ∈ R and y(t) ∈ R, then, to take the variability of X(t) and y(t) into account, a sliding window of width N is introduced and the input-output data become X(t − N : t) and y(t − N : t); see Table 1 and Figure 1. Table 1 defines the process parameters (input and output data), and the model structure given in Figure 1 shows the influence of the input space on the output. The model inputs are the natural gas flow rate, the hot wind flow rate and the oxygen purity, as shown in Figure 1 (Fig. 1a-Fig. 1c), while the model output is the temperature of the hot metal.

TABLE 1. Process parameters
Inputs X: natural gas flow rate; hot wind flow rate; oxygen purity
Output y: temperature of the pig iron

Our goal is to find an input/output mathematical model y = f(X_1, ..., X_n).
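A hedged sketch of a single LM iteration, under the assumption that the surrounding model supplies residual and Jacobian routines (the callables below are hypothetical):

```python
import numpy as np

def lm_step(W, residuals, jacobian, mu):
    eps = residuals(W)                            # error vector (epsilon)
    J = jacobian(W)                               # Jacobian of the errors w.r.t. W
    H = J.T @ J + mu * np.eye(W.size)             # damped Gauss-Newton matrix
    W_new = W - np.linalg.solve(H, J.T @ eps)     # candidate update
    if np.sum(residuals(W_new)**2) < np.sum(eps**2):
        return W_new, mu / 10.0                   # error reduced: move toward Gauss-Newton
    return W, mu * 10.0                           # error grew: move toward steepest descent
```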

Numerical Experimentation.
In this section we present the numerical algorithms used and the results obtained by the techniques discussed in the previous sections.

Numerical Algorithms.
FIGURE 1. The model of experimental inputs and outputs

Algorithm 1. PLS Algorithm
Input: two matrices X and Y; an arbitrary weight vector w with ‖w‖ = 1.
Output: weight vectors w and residuals, loading vector S and score vectors (t, u).
Step 1. Choose a starting vector u_1, usually one of the columns of Y.
Step 2. Calculate the X-weights: w_1 = X^T u_1 / ‖X^T u_1‖.
Step 3. Calculate the X-scores: t_1 = X w_1.
Step 4. Calculate the Y-weights: L_1 = Y^T t_1 / ‖Y^T t_1‖.
Step 5. Update the Y-scores: u_1 = Y L_1.
Step 6. Update S based on t: S = X^T t_1 / (t_1^T t_1).
Step 7. Find the regression coefficient for the inner relation: b_1 = u_1^T t_1 / (t_1^T t_1).
Step 8. Calculate the residuals: ε_X = X − t_1 S^T, ε_Y = Y − b_1 t_1 L_1^T.
Step 9. Continue with the next component until there is no more significant information.

Algorithm 2. SVR Algorithm
Initialize the weights W^0_{ij} ∈ [−0.5, 0.5] and define the computing loop as:
For k = 1 : L_k
Step 1: Acquire the inputs/outputs.
Step 2: Compute the model output ŷ from (2.21).
Step 3: Compute the modeling error: e(k) = y(k) − ŷ(k).
Step 4: If the error is acceptable, keep the current model; else, adjust the SVR weights using the recursive Quadratic Programming algorithm.

Algorithm 3. NN Algorithm
The model structure can be defined as f → NN and its identification is simplified by the following steps:
Initialize the network weights W^0_{ij} ∈ [−0.5, 0.5] and define the computing loop as:
For k = 1 : L_k
Step 1: Acquire the inputs/outputs.
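To illustrate Algorithm 2, here is a hedged sliding-window sketch built on scikit-learn's SVR: at each time step the model is refit on the last `window` samples before predicting the new one. The window width, hyperparameters and function name are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def adaptive_svr(X_stream, y_stream, window=50):
    preds = []
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
    for k in range(window, len(y_stream)):
        Xw = X_stream[k - window:k]              # Step 1: acquire window data
        yw = y_stream[k - window:k]
        model.fit(Xw, yw)                        # adjust the SVR: refit on the window
        y_hat = model.predict(X_stream[[k]])[0]  # Step 2: model output
        e = y_stream[k] - y_hat                  # Step 3: modeling error e(k)
        preds.append(y_hat)
    return np.asarray(preds)
```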

Results and Comments. The pig iron temperature prediction results, as functions of the oxygen flow rate, the oxygen purity and the hot wind flow rate for the blast furnace process, are given in Figure 3, from top to bottom, as follows. The temperature prediction is carried out by the PLS model and its corresponding adaptive version. The latter gives better results, with a prediction error lower than that of the conventional PLS. This is quite natural, since the adaptive model performs an immediate correction by computing a new projection at each iteration.
Since the SVR approach is based on the Vapnik algorithm, its adaptive version consists in executing this conventional algorithm at each time step in a sliding-window mode.

CONCLUSION
In this paper we explored three techniques and their adaptive versions, presenting the temperature predictions and their corresponding error estimates. The pig-iron temperature predictions as functions of the oxygen flow rate, the oxygen purity and the hot wind flow rate for the blast furnace process are solutions of (2.16); the results are shown in Table 2, and the SVR is more competitive than the PLS. The results of predicting the temperature of the cast iron with the SVR approach are obtained using a modified version based on the Vapnik algorithm [12]. An adaptive version of the latter is detailed in Algorithm 2 together with Diagram 2 and, as in the PLS case, it also gives better results, although a high uncertainty is noticed. Indicators for the proposed approaches are given in Table 2 and Figure 2. We therefore conclude that the adaptive model-based approaches are more precise and can be recommended; in particular, they lead to a reduced uncertainty range in prediction compared to the partial least squares model, because all the input changes are taken into account, whereas the NN model may not be recommended, particularly in the case of noisy data. Furthermore, the models can be combined with a sensitivity analysis by use of the Monte Carlo simulation technique.