A Hybrid Ensemble Model Based on ELM and Improved AdaBoost.RT Algorithm for Predicting the Iron Ore Sintering Characters

As energy efficiency becomes increasingly important to the steel industry, the iron ore sintering process is attracting more attention because it consumes the second largest amount of energy among the iron and steel making processes. The present work proposes a prediction model for the iron ore sintering characters. A hybrid ensemble model that combines the extreme learning machine (ELM) with an improved AdaBoost.RT algorithm is developed for the regression problem. First, the factors that affect solid fuel consumption, gas fuel consumption, burn-through point (BTP), and tumbler index (TI) are ranked by attribute weightiness using the RReliefF method. Second, the ELM network is selected as the ensemble predictor because of its fast learning speed and good generalization performance. Third, an improved AdaBoost.RT is established to overcome the limitation of the conventional AdaBoost.RT by dynamically self-adjusting the threshold value. Then, an ensemble of ELMs is built with the improved AdaBoost.RT to obtain better precision than an individual predictor. Finally, the hybrid ensemble model is applied to predict the iron ore sintering characters using production data from the No. 4 sintering machine at Baosteel. The results show that the proposed model is effective and feasible for the practical sintering process. In addition, by analyzing the first superior factors, the energy efficiency and sinter quality could be clearly improved.


Introduction
The energy consumption of iron and steel enterprises in China is about 10% of the total energy consumption of the country. As a typical part of iron and steel making processes, the iron ore sintering process accounts for 10%∼15% of the energy consumption of an iron and steel enterprise. Figure 1 shows the schematic view of the iron ore sintering process flow. A mixture of iron ore, fuel (coke breeze), flux (limestone/lime), return fines, and other additives is granulated with water in a rotary drum. These granulated mixtures are continuously charged together with bed layer material to form a thick bed of approximately 800 millimeters on a moving sinter strand. The sinter bed is ignited under the ignition hood, and the heat is drawn into the bed by air sucked through the wind boxes under the strand. The holding furnace is used to maintain the heat absorbed by the surface of the sinter bed. The vertical sintering speed is controlled by the strand speed and gas flow rate to ensure that burn-through occurs just before the end of the strand, where the sinter cake is discharged. The hot sinter cake is crushed and then cooled in a circular cooling system. The cold sinter is sieved, and the undersize portion (the return fines, smaller than 5 mm) is delivered back to the raw materials feed system. The oversize portion is then transferred to the blast furnace, and the sieved fraction of the desired size (10-16 mm) is used as the bed layer material.
In order to study the cause-effect relationships of the sintering process, expert systems [1][2][3][4] and prediction packages [5][6][7][8] have received particular attention because they learn processes directly from historical data and can reliably represent nonlinear relations. The solid fuel consumption, the gas fuel consumption, the burn-through point (BTP), and the tumbler index (TI) are the four characters that reflect the iron ore sintering performance. The location of the BTP is the number of the wind box that reaches the highest temperature. The quality of the sinter cake is determined by the BTP. For instance, if the BTP is located ahead of the normal point, the productivity of the sinter bed is much lower than the designed level. On the other hand, if the BTP lags behind, the raw material is less burned and thus the return fines increase. Scholars have made efforts to predict these characters. Wang et al. [9], for instance, constructed a data-driven energy consumption model, taking the parameters of the whole manufacturing process as the variables. Chen et al. [10] established a prediction model based on BP neural networks, with an accuracy of 91% for the comprehensive carbon efficiency. Wu et al. [11] developed a BTP prediction model using support vector machines (SVM), taking the bed height, ignition temperature, and strand speed as the input variables. Shang et al. [12] developed an empirical dynamic model of the BTP through genetic programming. Kumar et al. [13] developed an iron ore sinter properties prediction model with a statistical analysis software system, with accuracies of 79% for mean particle size (MPS), 91% for TI, and 76% for reduction degradation index (RDI) prediction. Umadevi et al. [14] developed a sinter quality analysis model to appraise how each key factor affects the sintering drum strength.
In the actual sintering practice, the solid fuel and gas fuel consumption reflect the energy efficiency, the BTP reflects the process performance and TI reflects the property of the sinter products. Hence, the present work focuses the prediction models on these four characters.
In order to apply ensemble methods to regression problems, Solomatine and Shrestha proposed the AdaBoost.RT algorithm [15], where the letters R and T stand for regression and threshold, respectively. The absolute relative error (ARE) is used as the criterion to demarcate samples into correct and incorrect predictions. If the ARE of a sample is less than the threshold ϕ, the prediction for this sample is marked as correct; otherwise, it is regarded as incorrect. This approach is similar to projecting the regression problem onto a classification problem. The threshold ϕ needs to be selected initially, and it is a key factor affecting the performance of the ensemble machine. According to Shrestha and Solomatine's experiments [16], the ensemble model is stable while the value of ϕ is between 0 and 0.4. In order to determine the threshold value, Tian and Mao [17] presented a modified AdaBoost.RT algorithm that uses a self-adaptive modification mechanism driven by the trend of the prediction error at each iteration. This approach performed well in predicting the temperature of molten steel in a ladle furnace, but the initial value of ϕ still needs to be fixed manually. Zhang et al. [18] established a robust AdaBoost.RT that considers the standard deviation of the approximation errors to determine the threshold. The absolute error (AE) is used to demarcate samples as either well or poorly predicted in this approach.
This method performed well on regression problems from the UCI machine learning repository, but the relative factor λ has to be optimized initially. In this study, a method that dynamically self-adjusts the value of ϕ is used instead of the invariable ϕ to improve the original AdaBoost.RT algorithm. The present work proposes to optimize process control by means of an ensemble predictor, which improves the energy efficiency of the sintering process. Since there are many dependent or independent factors in the sintering production process, it is difficult and even impossible to develop mechanistic or detailed first-principle models that encompass all the factors, and more input variables do not necessarily yield better model performance. The RReliefF algorithm is adopted to extract the input attributes that are significant for the sintering characters. Then, the root mean square error (RMSE) is calculated in order to validate the attribute selection. The optimal parameter configuration of the proposed model is identified by the lowest value of RMSE. Finally, a boosting-enhanced ELM with the improved AdaBoost.RT is presented.

Extreme Learning Machine.
An ELM [19] is an efficient learning algorithm for single hidden layer feedforward neural networks (SLFNs) used to solve classification and regression problems. Compared with conventional neural networks, ELM is easy to use and can theoretically achieve a globally optimal solution with much faster learning speed and good generalization performance. The weights between the input layer and the hidden layer are selected randomly, and the output weights are determined analytically during the learning process [20]. A schematic of an ELM network with n nodes in the input layer, L nodes in the hidden layer, and m nodes in the output layer is shown in Figure 2. The mechanism of ELM with multiple outputs is briefly described as follows.
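For N arbitrary samples $(x_i, t_i)$, $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T$ is the input vector and $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T$ is the expected output. The output function of ELM with L hidden nodes and activation function h(x) is as follows:
$$f_L(x) = \sum_{j=1}^{L} \beta_j h_j(x) = h(x)\beta, \tag{1}$$
where $\beta_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm}]^T$ is the weight vector between the hidden and output layers, $\beta = [\beta_1, \ldots, \beta_L]^T$, and $h(x) = [h_1(x), \ldots, h_L(x)]$ is the output vector of the hidden layer and denotes the feature mapping in ELM. The feature mapping h(x) is known to users with respect to the input x:
$$h(x) = [G(w_1, b_1, x), \ldots, G(w_L, b_L, x)], \tag{2}$$
where $w = [w_1, w_2, \ldots, w_L]^T$ is the weight vector between the input and hidden layers, $b_j$ is the threshold of the jth hidden node, and $G(w_j, b_j, x) = h(w_j \cdot x + b_j)$ is the output of the jth hidden node. The objective of ELM is to calculate the weight vector β that minimizes both the output weights and the training errors:
$$\min_{\beta,\,\xi}\; L_P = \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\|\xi_i\|^2 \quad \text{s.t.} \quad h(x_i)\beta = t_i^T - \xi_i^T, \quad i = 1, \ldots, N, \tag{3}$$
where C is a regularization coefficient and $\xi_i = [\xi_{i1}, \ldots, \xi_{im}]^T$ is the training error with respect to the input vector $x_i$. The training of ELM is equivalent to solving the dual optimization problem according to the KKT theorem:
$$L_D = \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\|\xi_i\|^2 - \sum_{i=1}^{N}\left(h(x_i)\beta - t_i^T + \xi_i^T\right)\alpha_i, \tag{4}$$
where $\alpha_i$ is the Lagrange multiplier that corresponds to the ith training sample. In order to reduce the computational complexity of ELM, two solutions of the dual optimization problem can be obtained, depending on the scale of the training set.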
(1) The training set is not huge: the matrix $HH^T$ (size: N × N) is used in this case:
$$\beta = H^T\left(\frac{I}{C} + HH^T\right)^{-1} T, \tag{5}$$
where
$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} \tag{6}$$
and $T = [t_1, t_2, \ldots, t_N]^T$ is the expected output vector of the training set. H in equation (6) is the hidden layer output matrix of the ELM network. Then, the final ELM output function is
$$f(x) = h(x)\beta = h(x)H^T\left(\frac{I}{C} + HH^T\right)^{-1} T. \tag{7}$$
(2) The training set is huge: the matrix $H^TH$ (size: L × L) is used in this case:
$$\beta = \left(\frac{I}{C} + H^TH\right)^{-1} H^T T. \tag{8}$$
In this case, the final ELM output function is
$$f(x) = h(x)\beta = h(x)\left(\frac{I}{C} + H^TH\right)^{-1} H^T T. \tag{9}$$
These two ways of calculating the output weight vector reduce the computational cost of ELM. For applications with a small training set (N < L), the output of equation (7) can be used to speed up training, while the output of equation (9) is typically used in large-scale data applications.
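The two regularized solutions for the output weights can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the uniform range used for the random hidden weights, and the default values of `n_hidden` and `C` are all hypothetical choices made for the example.

```python
import numpy as np

def init_hidden(n_inputs, n_hidden, rng):
    """Draw input weights w_j and thresholds b_j at random (they are never trained).
    The uniform range [-1, 1] is an assumption made for this sketch."""
    W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    return W, b

def hidden_output(X, W, b):
    """Hidden layer output matrix H (N x L) with a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def solve_beta(H, T, C, small_sample):
    """Regularized output weights beta (L x m) for targets T (N x m)."""
    N, L = H.shape
    if small_sample:
        # N x N system: beta = H^T (I/C + H H^T)^-1 T
        return H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)
    # L x L system: beta = (I/C + H^T H)^-1 H^T T
    return np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)

def elm_fit_predict(X_train, T_train, X_test, n_hidden=20, C=1.0, seed=0):
    """Train one ELM and return its predictions on X_test."""
    rng = np.random.default_rng(seed)
    W, b = init_hidden(X_train.shape[1], n_hidden, rng)
    H = hidden_output(X_train, W, b)
    # Pick whichever linear system is smaller.
    beta = solve_beta(H, T_train, C, small_sample=len(X_train) < n_hidden)
    return hidden_output(X_test, W, b) @ beta
```

Both branches of `solve_beta` return the same weights up to rounding; the choice only affects the size of the linear system that has to be solved.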

Improved AdaBoost.RT Algorithm.
In order to determine the value of the threshold ϕ effectively, a novel improvement of AdaBoost.RT is proposed in the present work. We embed statistical theory related to the regression capability of the weak learner into the AdaBoost.RT algorithm. A method that dynamically self-adjusts the value of ϕ is used instead of the invariable ϕ. Consider a training data set with m samples $(x_1, y_1), \ldots, (x_m, y_m)$, in which $y_i \in R$ is the output. The sample weights are initialized with a uniform distribution:
$$D_1(i) = \frac{1}{m}, \quad i = 1, \ldots, m. \tag{10}$$
Hence, in the first training of the weak learning machine (WL), each sample receives an equal weight.
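In each subsequent iteration with index t + 1 (1 ≤ t < T), each sample weight $D_{t+1}(i)$ is determined by the fraction error $\varepsilon_t^n$ (n is the power coefficient, e.g., linear, square, or cubic) produced by the preceding iteration with respect to the sample $(x_i, y_i)$:
$$D_{t+1}(i) = D_t(i) \times \begin{cases} \varepsilon_t^n, & \text{if } \mathrm{ARE}_t(i) \le \phi, \\ 1, & \text{otherwise}. \end{cases} \tag{11}$$
The absolute relative error (ARE) for each training sample is used as the criterion for demarcating samples into correct and incorrect predictions. In order to do so, a constant threshold value ϕ is introduced in the AdaBoost.RT algorithm. The predictions of the weak learner $f_t$ for those samples whose ARE exceeds ϕ are considered erroneous:
$$\mathrm{ARE}_t(i) = \left|\frac{f_t(x_i) - y_i}{y_i}\right| > \phi.$$
Hence, the error rate $\varepsilon_t$ in equation (11) is given by
$$\varepsilon_t = \sum_{i:\,\mathrm{ARE}_t(i) > \phi} D_t(i).$$
The new weights $D_{t+1}(i)$ are normalized to ensure that $D_{t+1}$ constitutes a probability distribution:
$$D_{t+1}(i) \leftarrow \frac{D_{t+1}(i)}{\sum_{j=1}^{m} D_{t+1}(j)}.$$
The final ensemble output hypothesis of AdaBoost.RT is
$$f_{\mathrm{fin}}(x) = \frac{\sum_{t=1}^{T} \log\left(1/\varepsilon_t^n\right) f_t(x)}{\sum_{t=1}^{T} \log\left(1/\varepsilon_t^n\right)}.$$
We propose a method that dynamically self-adjusts the value of ϕ to improve AdaBoost.RT by embedding statistical theory related to the regression capability of the weak learner into the training of the ensemble predictor.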
The set of erroneous samples in our proposed method is defined by equation (17), in which $\mu_t$ and $\sigma_t$ are the expected value and the standard deviation of the weak learner's predictions for the training set in the tth network. The weak learner's error rate is then computed over this set, and the weights $D_{t+1}(i)$ are updated accordingly. The proposed method overcomes the limitation of the original AdaBoost.RT, in which the threshold value is set empirically.
The critical threshold used in the boosting process becomes self-adaptive to the performance of each individual weak learner on the input data samples. Therefore, the proposed improvement of the AdaBoost.RT algorithm is capable of producing the final hypothesis as an optimally weighted ensemble of the weak learners.
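The boosting loop described in this section can be sketched as follows. This is a minimal illustration of the original AdaBoost.RT procedure under stated assumptions: the weak learner `fit_linear` is a placeholder least-squares model rather than an ELM, training samples are resampled according to the weights, and the `threshold` argument accepts either a constant ϕ or a hypothetical callable that returns a threshold from the weak learner's training predictions. The paper's own self-adjusting rule based on μ_t and σ_t is not reproduced here; such a rule could be passed in through that callable.

```python
import numpy as np

def fit_linear(X, y):
    """Placeholder weak learner: ordinary least squares with an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xq: np.column_stack([Xq, np.ones(len(Xq))]) @ coef

def adaboost_rt(X, y, fit_weak, T=10, threshold=0.1, n=2, seed=0):
    """AdaBoost.RT sketch; requires y != 0 because ARE divides by y."""
    rng = np.random.default_rng(seed)
    m = len(y)
    D = np.full(m, 1.0 / m)                    # uniform initial weights
    learners, votes = [], []
    for _ in range(T):
        idx = rng.choice(m, size=m, replace=True, p=D)   # resample by D_t
        f_t = fit_weak(X[idx], y[idx])
        pred = f_t(X)
        are = np.abs((pred - y) / y)           # absolute relative error
        phi = threshold(pred) if callable(threshold) else threshold
        wrong = are > phi                      # erroneous samples
        # Clip so that log(1 / beta) below stays finite and positive.
        eps = float(np.clip(D[wrong].sum(), 1e-10, 1.0 - 1e-10))
        beta = eps ** n                        # fraction error
        D = np.where(wrong, D, D * beta)       # shrink well-predicted samples
        D = D / D.sum()                        # renormalize to a distribution
        learners.append(f_t)
        votes.append(np.log(1.0 / beta))
    votes = np.array(votes)

    def ensemble(Xq):
        preds = np.stack([f(Xq) for f in learners])      # (T, n_query)
        return votes @ preds / votes.sum()               # weighted average
    return ensemble
```

Passing a different `fit_weak`, for example a function that trains one ELM and returns its predictor, turns the same loop into a boosted ELM ensemble.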

RReliefF-Based Feature Selection.
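The RReliefF algorithm [21] estimates the relevance of the attributes for solving a regression problem. RReliefF assigns a weight W[A] to each attribute A based on how well it distinguishes similar target values:
$$W[A] = \frac{P_{\mathrm{diffC}|\mathrm{diffA}}\, P_{\mathrm{diffA}}}{P_{\mathrm{diffC}}} - \frac{\left(1 - P_{\mathrm{diffC}|\mathrm{diffA}}\right) P_{\mathrm{diffA}}}{1 - P_{\mathrm{diffC}}},$$
where $P_{\mathrm{diffA}}$ = P(different value of A | nearest instances), $P_{\mathrm{diffC}}$ = P(different targets | nearest instances), and $P_{\mathrm{diffC}|\mathrm{diffA}}$ = P(different targets | different value of A and nearest instances).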
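A compact way to estimate these weights is sketched below. It is a simplified illustration rather than the reference implementation: every instance is visited once instead of being sampled at random, the k nearest neighbours are weighted equally, distances are Manhattan distances on range-scaled attributes, and every attribute and the target are assumed to be non-constant. The function name `rrelieff` and the default `k` are choices made for this example.

```python
import numpy as np

def rrelieff(X, y, k=10):
    """Return one RReliefF-style weight per attribute (higher = more relevant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n_attr = X.shape
    # Scale attributes and target to [0, 1] so that differences lie in [0, 1].
    Xs = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    ys = (y - y.min()) / (y.max() - y.min())
    n_dc = 0.0                     # accumulates "different target"
    n_da = np.zeros(n_attr)        # accumulates "different attribute"
    n_dcda = np.zeros(n_attr)      # accumulates "different target and attribute"
    for i in range(m):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf           # exclude the instance itself
        nn = np.argsort(dist)[:k]  # k nearest neighbours
        d_a = np.abs(Xs[nn] - Xs[i])          # (k, n_attr)
        d_c = np.abs(ys[nn] - ys[i])          # (k,)
        n_dc += d_c.sum() / k
        n_da += d_a.sum(axis=0) / k
        n_dcda += (d_c[:, None] * d_a).sum(axis=0) / k
    return n_dcda / n_dc - (n_da - n_dcda) / (m - n_dc)
```

Sorting the attributes by the returned weights in descending order gives a weightiness sequence of the kind used for the sintering characters.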

Hybrid Prediction Model of Iron Ore Sintering
In the present work, the RReliefF algorithm is used to select the input attributes for the ELM networks. The features influencing the corresponding sinter characters are first ranked through RReliefF. Then, we package the ranked features into sets of the top 5, 10, 15, and 20 features, which serve as the input feature sets of the ELM network. Lastly, the root mean square errors (RMSE) of the model for the different input feature sets are compared with each other, and the feature set with the lowest RMSE is considered the optimal variable group for the ELM networks. In this way, the virtual model of sintering is optimized to predict the sintering characters.
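The package-wise selection described above amounts to a small search over cumulative feature sets. The sketch below assumes a caller-supplied function `rmse_of` (a hypothetical name) that trains the model on a given list of features and returns its testing RMSE; the feature labels in the usage example are illustrative.

```python
def best_feature_package(ranked, rmse_of, sizes=(5, 10, 15, 20)):
    """Evaluate the top-5, top-10, ... feature packages and keep the best one.

    ranked  : feature names ordered from most to least relevant
    rmse_of : callable mapping a list of features to a testing RMSE
    """
    scores = {size: rmse_of(ranked[:size]) for size in sizes}
    best_size = min(scores, key=scores.get)   # lowest RMSE wins
    return ranked[:best_size], scores

# Usage with a stand-in scoring function in place of a real model:
ranked = [f"x{i}" for i in range(1, 21)]
chosen, scores = best_feature_package(ranked, lambda feats: abs(len(feats) - 15) + 1.0)
```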
Since the weight vector $w_j$ between the input nodes and hidden nodes and the threshold $b_j$ of the jth hidden node are selected randomly in the original ELM network, it does not have good stability and generalization capability for data with a small sample size. With this in mind, we propose an ensemble model that combines the improved AdaBoost.RT with ELM to improve the generalization capability of the original ELM algorithm. Here, the improved AdaBoost.RT is used as the ensemble method, and the ELM is the weak learner. The hybrid intelligent model is shown in Figure 3.

Variables and Data.
A full year of operational data from the No. 4 sintering machine at Baosteel was collected for modeling purposes. After removing the blank data, we apply the 3-sigma rule to deal with abnormal values caused by measurement error. In applications, if the repeated measurement data satisfy
$$|x_i - \bar{x}| > 3\sigma,$$
then $x_i$ is considered an abnormal value and is rejected, where $x_i$ is the ith measurement value, $\bar{x}$ is the average of all measurement values, and σ is the standard deviation of the $x_i$ sequence.
This is the Pauta criterion of measurement error theory. The total data available for modeling is thereby reduced to 270 data sets, which are used for training and testing the network model. Descriptive statistics for the burdening indexes ($x_1 \sim x_5$), operating indexes ($x_6 \sim x_{14}$), chemical components ($x_{15} \sim x_{20}$), and the four sinter characters ($Y_1 \sim Y_4$) are tabulated in Table 1. From these 270 data sets, 220 are selected to serve as the training set for the ensemble ELM model. The remaining 50 data sets are used as the testing set and are never used during the training of the prediction model. The values of the error parameters, including the RMSE and the Pearson correlation coefficient r, are calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - y_i'\right)^2},$$
$$r = \frac{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)\left(y_i' - \bar{y}'\right)}{\sqrt{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2 \sum_{i=1}^{m}\left(y_i' - \bar{y}'\right)^2}},$$
where $y_i$ is the actual value, $y_i'$ is the predicted value, $\bar{y}$ is the average of the actual values, $\bar{y}'$ is the average of the predicted values, and m is the size of the testing set.
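The data cleaning rule and the two error measures named in this paper (RMSE and the Pearson correlation) are easy to check on small inputs. The following sketch uses only the standard library; the function names are illustrative, and the use of the population standard deviation in the 3-sigma rule is an assumption, since the text does not say which estimator was used.

```python
import math

def pauta_filter(values):
    """Keep the values that pass the 3-sigma (Pauta) criterion."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (an assumption made for this sketch).
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [v for v in values if abs(v - mean) <= 3 * sigma]

def rmse(actual, predicted):
    """Root mean square error over paired actual and predicted values."""
    m = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / m)

def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    m = len(actual)
    mean_a = sum(actual) / m
    mean_p = sum(predicted) / m
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

Note that a single outlier can only exceed 3σ when the sample is reasonably large, because the outlier itself inflates σ; with about ten points or fewer the rule rejects nothing.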

The Weightiness Sequence of the Attributes.
We applied the RReliefF algorithm to rank the attributes based on their merit scores. The weights are plotted in Figure 4. The 20 attributes for each character ($Y_1$, $Y_2$, $Y_3$, and $Y_4$) can be ranked as tabulated in Table 2. It can be seen from Table 2 that the first superior factors of the four sintering characters are bed permeability, ignition density, dolomite, and limestone, respectively. The first five attributes in the weightiness order are taken as the first superior factor package, used as the input of the ELM network prediction model, and the RMSE is calculated. Then, the second five attributes in the weightiness sequence are combined with the first five and input into the ELM network model, and so on until all 20 attributes have been chosen. Lastly, the RMSE values of these attribute sets are compared with each other, and the attribute set with the lowest RMSE is considered the optimal attribute group for the ELM network prediction model.

Model Parameters Selection.
In the proposed prediction model framework, five user-specified parameters need to be selected to achieve the best generalization performance. For the ELM network, the sigmoid function ... Thirty trials of simulations have been conducted, and the performance of each (C, L) combination is verified using the average RMSE in testing. The best performing combinations of (C, L) are selected for each number of input nodes, as presented in Figure 5. Figure 5 shows that the RMSE of the testing sets varies with the number of input nodes. To keep the network computationally efficient, the number of input nodes with the lowest RMSE is chosen. Therefore, the ELM network uses 20 input attributes for solid fuel consumption, 15 for gas fuel consumption, 5 for BTP, and 15 for TI.
In addition, for the improved AdaBoost.RT-based ensemble ELM, the number of ELM networks needs to be determined. According to Occam's razor, simpler models may capture the underlying structure better and may have better predictive performance than excessively complex models, which are affected by statistical noise. Therefore, the number of weak learners (T) need not be very large. In this paper, the number of ELM networks is set to 5, 10, 15, 20, 25, and 30, and the optimal value is the one that gives the best average RMSE in testing. Besides, the fraction error $\varepsilon_t^n$ should be optimized with n = 1, 2, 3. Table 3 shows examples of setting both T and $\varepsilon_t^n$ for our simulation of the solid fuel consumption ($Y_1$), with the best parameter combination selected according to Figure 5.
As illustrated in Table 3, the cubic fraction error performs better than the other two fraction errors. The RMSE is less sensitive to the fraction error as long as the number of weak learners T is larger than 15. The ensemble model reaches its best performance with $\varepsilon_t^3$ and T = 15, so we set T = 15 and $\varepsilon_t^3$ for the ensemble model in the solid fuel consumption ($Y_1$) experiment. The best performing combinations of user-specified parameters selected for each sintering character are presented in Table 4.

Model Implementation.
Thirty trials of simulations have been conducted, and the averages serve as the "final results." The testing errors are given in Table 5. The accuracy is also calculated to describe the performance of the proposed model and is tabulated in Table 5; a prediction is counted as accurate if it deviates from the real value by no more than ±2 kg for solid fuel consumption, ±0.1 m³ for gas fuel consumption, ±0.5 for BTP, and ±1.5% for TI. The predicted values, denoted by asterisks, and the actual values, denoted by circles, are displayed in Figures 6-9.
As we can see in Figures 6-9, the predicted values are close to the corresponding actual ones in most cases. From Table 5, the Pearson correlation between actual and predicted values in the gas fuel consumption experiment is 0.9383, the highest among the four character experiments. The results indicate that the accuracy of the hybrid ensemble prediction model based on ELM and the improved AdaBoost.RT is satisfactory for the production process of iron ore sintering.
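The accuracy measure used in this section, the share of predictions that fall within a fixed tolerance of the real value, can be written as a short helper. The function name and the sample numbers below are illustrative and are not taken from the Baosteel data.

```python
def hit_rate(actual, predicted, tol):
    """Fraction of predictions whose absolute error does not exceed tol."""
    hits = sum(abs(a - p) <= tol for a, p in zip(actual, predicted))
    return hits / len(actual)

# Illustrative values with a tolerance of 2 units:
rate = hit_rate([50.0, 52.0, 55.0, 48.0], [51.0, 55.0, 54.5, 48.0], tol=2.0)
```

Here three of the four predictions lie within the tolerance, so `rate` equals 0.75.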

Application
In order to determine how the four sintering characters change with the input attributes, a sensitivity analysis of the first superior factors is conducted with the proposed model. The output value of each sintering character is calculated as its first superior factor is adjusted across its variation interval, while the other input attributes are kept fixed at their average values. Figure 10 shows the influence of the four first superior factors on the four sintering characters. The permeability of the bed is a key factor that influences the sintering rate. The bed permeability of the sintering machine is given by the following equation [22]:
$$\frac{F}{W \cdot L} = \kappa \left(\frac{P_v}{H}\right)^{n},$$
where F denotes the air flow rate (Nm³/h), W denotes the pallet width (m), L denotes the strand length (m), κ denotes the bed permeability ((Nm³/h/m²)/(kPa/m)ⁿ), $P_v$ denotes the suction pressure (kPa), H denotes the bed height (m), and n is an empirical factor (0.55∼0.65).
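As a quick numerical check of the permeability relation, the sketch below assumes the form implied by the stated units of κ, namely F / (W · L) = κ (P_v / H)^n, and solves it either for κ or for F. The function names and the numbers in the usage lines are illustrative only.

```python
def bed_permeability(F, W, L, P_v, H, n=0.6):
    """kappa from air flow F (Nm3/h), pallet width W (m), strand length L (m),
    suction pressure P_v (kPa), and bed height H (m).
    Assumes F / (W * L) = kappa * (P_v / H) ** n."""
    return (F / (W * L)) / (P_v / H) ** n

def air_flow(kappa, W, L, P_v, H, n=0.6):
    """Inverse relation: air flow rate for a given permeability."""
    return kappa * (P_v / H) ** n * W * L

# Illustrative numbers: with n = 0.5, (4 / 1) ** 0.5 = 2, so kappa = 10 / 2 = 5.
kappa = bed_permeability(F=100.0, W=2.0, L=5.0, P_v=4.0, H=1.0, n=0.5)
```

With suction pressure and bed height held fixed, the air flow rate scales linearly with κ in this relation, which is the dependence the sensitivity discussion relies on.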
Hence, high bed permeability allows faster air flow through the bed and thus faster sintering. An increase in bed permeability accelerates the propagation of the heat front through the bed. The fast air flow results in lower thermal efficiency and higher energy consumption. Therefore, the solid fuel consumption increases with the bed permeability index. Ignition density is defined as the fuel gas (usually coke oven gas, COG) used during the igniting process. The purpose of ignition is to heat the sinter mixture that has been placed on the pallet to the semimolten state. The solid fuel in the mixture on the surface of the bed is ignited by the igniting gas. Then, sintering proceeds from top to bottom under the suction of the wind boxes. Thus, igniting is essential in iron ore sintering, and the gas fuel consumption increases as the ignition density increases.
Magnesium ferrite forms when dolomite is added to the sinter mixture and lowers the reducibility. Compared with CaO, MgO leads to an increase in the liquidus temperature of the melt phase. Therefore, the sintering period increases with dolomite. This results in the BTP lagging behind; thus, the BTP increases with dolomite addition. The strength of the sinter depends on the properties and morphology of the sinter. A high content of calcium ferrite in the iron ore sinter, in general, improves the tumbler strength of the sinter. The calcium ferrite phase increases with the addition of limestone. Therefore, the TI increases with limestone.
Through adjusting the first superior variables for the solid fuel, gas fuel, BTP, and TI, the optimization of the energy consumption and the control of the sinter quality

Conclusions
An integrated predictive model that combines feature selection with an ensemble method is proposed for sintering. The RReliefF algorithm, a mathematical method that ranks the weightiness of the many attributes in the iron ore sintering system, can distinguish the parameters with the greatest influence on energy consumption and sinter quality from the complicated factors. An improved AdaBoost.RT is proposed that uses the statistical distribution of the weak learners' predicted values to dynamically determine the threshold. The virtual prediction model of the sintering process, which combines the improved AdaBoost.RT with the ELM network, simulates the sintering process in close agreement with the production data from the steel making plant.
Adopting the ensemble ELM model, we can construct the solid fuel consumption prediction model with a prediction accuracy of 96%, the gas fuel consumption prediction model with a prediction accuracy of 96%, the BTP prediction model with a prediction accuracy of 94%, and the TI prediction model with a prediction accuracy of 90%. These are satisfactory for the production process of iron ore sintering. The improved AdaBoost.RT algorithm can improve the performance on regression problems when the output value $y_i \neq 0$ and the average value of predictions $\mu \neq 0$. If the true value of the sample comes to 0, the absolute relative ...

Data Availability
The data used to support the findings of this study were supplied by Baosteel under license and so cannot be made freely available.