An Incremental Learning Ensemble Strategy for Industrial Process Soft Sensors

1 School of Electrical Engineering & Automation and Key Laboratory of Advanced Electrical Engineering and Energy Technology, Tianjin Polytechnic University, Tianjin, China; 2 State Key Laboratory of Process Automation in Mining & Metallurgy, Beijing 100160, China; 3 Beijing Key Laboratory of Process Automation in Mining & Metallurgy Research Fund Project, Beijing 100160, China; 4 School of Economics and Management, Tianjin Polytechnic University, Tianjin, China


Introduction
In industrial processes, plants are usually heavily instrumented with a large number of sensors for process monitoring and control. However, many process parameters still cannot be measured accurately because of high temperature, high pressure, complex physical and chemical reactions, large delays, and similar conditions. Soft sensor technology provides an effective way to solve these problems. The original and still most dominant application area of soft sensors is the prediction of process variables that can be determined only at low sampling rates or through off-line analysis. Because these variables are often related to product quality, they are very important for process control and quality management. Additionally, soft sensors are usually applied for online prediction during production.
Currently, with the continuous improvement of automation in industrial production, large amounts of industrial process data can be measured, collected, and stored automatically. These data provide strong support for the establishment of data-driven soft sensor models. Meanwhile, with the rapid development and wide application of big data technology, soft sensor technology is already widely used and plays an essential role in the development of industrial process detection and control systems. Artificial intelligence and machine learning, as the core technologies, are receiving increasing attention. A traditional machine learning algorithm generally refers to a single learning machine model that is trained on a training set and then used to predict unknown samples. However, single learning machine models have defects that they cannot overcome by themselves, such as unsatisfactory accuracy and generalization performance, especially for complex industrial processes. Specifically, in supervised machine learning, a model's hypothesis is produced from labeled instances and used to predict new incoming instances. When multiple hypotheses are aggregated to support the final decision, the approach is called ensemble learning. Compared with a single learning machine model, the ensemble learning technique is beneficial for improving quality and accuracy. Therefore, an increasing number of researchers are studying how to improve the speed, accuracy, and generalization performance of ensemble algorithms instead of developing strong learning machines.
Ensemble algorithms were originally developed for solving binary classification problems [1], and then AdaBoost.M1 and AdaBoost.M2 were proposed by Freund and Schapire [2] for solving multiclass classification problems. Thus far, there are many different versions of boosting algorithms for solving classification problems [3][4][5][6][7], such as boosting by filtering and boosting by subsampling. However, in regression problems the output is continuous, so a prediction cannot be judged as simply correct or incorrect as in classification. To solve regression problems using ensemble techniques, Freund and Schapire [2] extended AdaBoost.M2 to AdaBoost.R, which projects the regression sample into a classification data set. Drucker [8] proposed the AdaBoost.R2 algorithm, which is an ad hoc modification of AdaBoost.R. Avnimelech and Intrator [9] extended the boosting algorithm to regression problems by introducing the notion of weak and strong learning as well as an appropriate equivalence theorem between the two. Feely [10] proposed the BEM (big error margin) boosting method, which is quite similar to AdaBoost.R2. In BEM, the prediction error is compared with a preset threshold value, BEM, and the corresponding example is classified as either well or poorly predicted. Shrestha and Solomatine [11] proposed AdaBoost.RT, with the idea of filtering out the examples whose relative estimation errors are higher than a preset threshold value. However, the value of the threshold is difficult to set without experience. To solve this problem, Tian and Mao [12] presented a modified AdaBoost.RT algorithm that adaptively modifies the threshold value according to the change in RMSE. Although ensemble algorithms can enhance the accuracy of soft sensors, they still cannot exploit the additional information contained in newly arriving data.
In recent years, with the rapid growth of data size, a fresh research perspective has arisen to deal with the large amount of unknown but important information contained in incoming data streams in various fields. How can we obtain methods that quickly and efficiently extract information from constantly incoming data? Batch learning is impractical in this setting, and the algorithm needs to be capable of real-time processing because industrial process data are updated in real time. The idea of incremental learning helps to solve this problem. A learning machine with this capability can learn new knowledge from new data sets and retain old knowledge without accessing the old data set. Thus, the incremental learning strategy can greatly increase the processing speed for new data while also saving storage space. Ensemble learning methods can be improved by combining the characteristics of the ensemble strategy and incremental learning, which is an effective and suitable way to solve the problem of stream data mining [13][14][15]. Learn++ is a representative ensemble algorithm with the ability of incremental learning. This algorithm was designed by Polikar et al. based on AdaBoost and supervised learning [16]. In Learn++, the new data are also assigned sample weights, which are updated according to the classification results at each iteration. Then, a newly trained weak classifier is added to the ensemble classifier. Building on the Learn++.CDS algorithm, Ditzler and Polikar proposed Learn++.NIE [17] to improve the classification performance on minority classes. Most research on incremental learning and ensemble algorithms focuses on classification, while research on regression is still scarce. Meanwhile, a limitation of ensemble approaches is that they cannot address the essential problem of incremental learning well, namely, accumulating experience over time and then adapting and using it to facilitate the future learning process [18][19][20].
In industrial production, increasingly many intelligent methods are used in soft sensors as artificial intelligence develops rapidly. However, the practical performance of soft sensors in industrial production is still unsatisfactory. Their common shortcomings are unsatisfactory and unstable prediction accuracy and poor online updating ability, which make it difficult to cope with the variety of changes in industrial production processes. Therefore, in this paper, we mainly focus on how to add incremental learning capability to the ensemble soft sensor modeling method, and we hope to provide useful suggestions for enhancing both the generalization and the online application abilities of soft sensors for industrial processes. Aiming at the demands of soft sensors in industrial applications, a new detection strategy based on an ensemble of multiple learning machines is proposed to improve the accuracy of soft sensors built on intelligent algorithms. Additionally, in practical production applications, acquiring the information in new production data is expensive and time consuming. Consequently, it is necessary to update the soft sensor incrementally to accommodate new data without compromising the performance on old data. In practice, most traditional intelligent prediction models for industrial processes neglect updating. Some models use traditional updating methods that retrain the model either on all production data or on the new data only, discarding the old data. Such methods are not good enough because some previously acquired performance is lost when new information is learned [21]. Against this background, we present a new incremental learning ensemble strategy with better incremental learning ability to establish the soft sensor model, which can learn additional information from new data while preserving previously acquired knowledge. The update does not require the original data that were used to train the existing model.
In the rest of this paper, we first describe the details of the incremental learning ensemble strategy (ILES), which involves the strategy of updating the weights, the ensemble strategy, and the strategy of incremental learning for real-time updating. Then, we design experiments to test the performance of the ILES for industrial process soft sensors. A soft sensor model for the sizing percentage is built with the ILES for the sizing production process. The parameters of the ILES are discussed. We also compare the performance of the ILES with that of other methods. To verify the general applicability of the new algorithm, three test functions are used to test the improvement in the predictive performance of the ILES. Finally, we summarize our conclusions and highlight future research directions.

The Incremental Learning Ensemble Strategy
Industrial processes need soft sensors with good accuracy and online updating performance. Here, we focus on incorporating the idea of incremental learning into the ensemble regression strategy to achieve better soft sensor performance. A new ensemble strategy for industrial process soft sensors, called the ILES, is proposed; it combines the ensemble strategy with the idea of incremental learning. The ILES enhances the accuracy of soft sensors by aggregating multiple sublearning machines according to their training errors and prediction errors. Additionally, during the iteration process, incremental learning is added to obtain the information in new data by updating the weights. This is beneficial for enhancing the real-time updating ability of online soft sensors for industrial processes. The details of the ILES are shown in Algorithm 1.
Strategy of Updating the Weight. In each iteration $k$ ($k = 1, 2, \ldots, K$), the initial distribution $D_1(i) = 1/m(k)$ assigns the same value to every sample $(x_i, y_i)$. This means that, at the beginning, all samples have the same chance of being included in the training subdataset or the testing subdataset. In the subsequent iterations, the distribution is calculated as $D_t(i) = w_t(i) / \sum_{i=1}^{m} w_t(i)$ for every sublearning machine (in each iteration $t$). In contrast to the traditional AdaBoost.R, a testing subdataset is added here to test the learning performance in each iteration, which helps to ensure the generalization performance of the ensemble soft sensor. The distribution is then changed according to the training and testing errors at the end of each iteration. Here, the training subdataset $TR_t$ and the testing subdataset $TE_t$ are randomly chosen according to $D_t$ (for example, by using the roulette method). The sublearning machine is trained on $TR_t$, and a hypothesized soft sensor $f_t: x \rightarrow y$ is obtained. Then, the training error and testing error of $f_t$ can be calculated as follows: The error rate of $f_t$ on $S_k = TR_t + TE_t$ is defined as follows: If $\varepsilon_t > \Phi$, the submodel is regarded as unqualified and suspicious, and the hypothesis $f_t$ is discarded. Otherwise, the power coefficient is calculated as $\beta_t = \varepsilon_t^{n}$ (e.g., linear, square, or cubic). Here, $\Phi$ ($0 < \Phi < 1$) is the coefficient of determination. After $t$ ($t = 1, 2, \ldots, T_k$) iterations, the composite hypothesis $F_k$ can be obtained from the $t$ hypothesized soft sensors (sublearning machines) $f_1, f_2, \ldots, f_t$. The training subdataset error, the testing subdataset error, and the error rate of $F_k$ are calculated similarly to those of $f_t$. In the same way, if $E_k > \Phi$, the hypothesis $F_k$ is discarded. At the end of the iterations, according to the error rate $\varepsilon_t$, the weight is updated as follows: In the next iteration, $TR_t$ and $TE_t$ are chosen again according to the new distribution, which is calculated from the new weight $w_{t+1}$. During the above iterations, the updating of the weights depends on the training and testing performance of the sublearning machines on different data. Therefore, data with large errors receive a larger share of the distribution. This means that the "difficult" data have more chances to be trained on until the information in the data is obtained. In addition, the sublearning machines or hypothesized soft sensors are retained selectively based on their performance. Therefore, the final hypothesized soft sensors are well qualified to be aggregated into the composite hypothesis. This strategy is very effective for improving the accuracy of ensemble soft sensors.
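One round of the resampling and reweighting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the error and weight-update equations are not reproduced in this text, so the sketch borrows the thresholded relative-error rule of AdaBoost.RT [11], in which samples whose absolute relative error exceeds a threshold count as poorly predicted and the weights of well-predicted samples are shrunk by the power coefficient. All function names, the `delta` parameter, and the toy data are hypothetical.

```python
import random


def normalize(weights):
    """Turn non-negative sample weights into a distribution that sums to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def roulette_split(dist, n_draws, rng):
    """Roulette-wheel selection: draw n_draws indices with probability dist.

    Drawing is with replacement, so the training subset holds the distinct
    indices that were hit; every remaining index goes to the testing subset.
    """
    indices = range(len(dist))
    train = sorted(set(rng.choices(indices, weights=dist, k=n_draws)))
    test = [i for i in indices if i not in train]
    return train, test


def error_rate(dist, y_true, y_pred, delta):
    """Assumed rule: add up the distribution mass of poorly predicted samples.

    A sample counts as poorly predicted when its absolute relative error
    exceeds delta (targets must be nonzero).
    """
    poorly = [abs((p - y) / y) > delta for y, p in zip(y_true, y_pred)]
    return sum(d for d, bad in zip(dist, poorly) if bad), poorly


def update_weights(weights, poorly, eps, n=2):
    """Assumed rule: shrink well-predicted samples by beta = eps ** n."""
    beta = max(eps, 1e-12) ** n  # floor keeps weights positive when eps == 0
    return [w if bad else w * beta for w, bad in zip(weights, poorly)]


rng = random.Random(0)
weights = [1.0, 1.0, 1.0, 1.0]
dist = normalize(weights)
train_idx, test_idx = roulette_split(dist, n_draws=3, rng=rng)

y_true = [10.0, 10.0, 10.0, 10.0]
y_pred = [10.1, 10.2, 12.0, 9.9]  # stand-in for a sublearning machine's output
eps, poorly = error_rate(dist, y_true, y_pred, delta=0.06)
new_dist = normalize(update_weights(weights, poorly, eps))
```

In this toy run only the third sample is poorly predicted, so after the update it carries most of the distribution and is more likely to be drawn in the next round. A full implementation would also discard the hypothesis and redraw the subsets whenever the error rate exceeds the rejection threshold.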
Here, the better hypotheses are aggregated with larger chances. Therefore, the best performance of the ensemble soft sensor is ensured based on these sublearning machines. Then, the training subdataset error and testing subdataset error of $F_k$ can be calculated similarly to the error of $f_t$.
Output: Obtain the ensemble soft sensor model according to $F_k$:
Algorithm 1: Incremental learning ensemble strategy.
After $T_k$ hypotheses are generated for each subdataset, the final hypothesis $f_{fin}$ is obtained by the weighted majority voting of all the composite hypotheses.
When new data arrive continuously during the industrial process, new subdatasets are generated (the $(K+1)$th, the $(K+2)$th, and so on). Based on a new subdataset, a new hypothesized soft sensor can be trained in a new iteration. The new information in the new data is obtained and added to the final ensemble soft sensor according to (7). With the added incremental learning strategy, the ensemble soft sensor is updated on the basis of the old hypotheses. Therefore, the information in the old data is retained, and the increment of information from the new data is achieved. Overall, in the ILES, the ensemble strategy improves the prediction accuracy efficiently by using the changed distribution: the new distribution makes the ILES pay more attention in every iteration to the "difficult" data with large errors. Because the "difficult" data are learned more intensively, more information can be obtained. Therefore, the soft sensor model is built more completely, and the prediction accuracy is improved. Moreover, the data used to train the sublearning machines are divided into a training subdataset and a testing subdataset. The testing error is used in the following steps: the weight update and the composite hypothesis ensemble. Therefore, the generalization of the soft sensor model based on the ILES can be improved efficiently, especially compared with traditional algorithms. Additionally, when new data are added, the ILES, with its incremental learning ability, can learn the new data in real time without giving up the old information from the old data. The ILES saves the information of the old sublearning machines that have been trained, but it does not need to save the original data. In other words, only a small amount of new production data needs to be saved, which saves storage space. Furthermore, the ILES may also save time compared with the traditional updating method. This benefit is attributed to the retention of the old $F_k$ and the sublearning machines in the composite hypotheses (7).
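The incremental update described above, in which previously trained hypotheses are kept and a new one is added for each new subdataset without revisiting old data, can be sketched as follows. Because the aggregation equation (7) is not reproduced in this text, the sketch assumes a weighted average with voting weights $\log(1/\beta)$, as is common in AdaBoost-style regression ensembles. The class name, method names, and toy models are hypothetical.

```python
import math


class IncrementalEnsemble:
    """Keeps trained hypotheses and their voting weights, never the raw data."""

    def __init__(self):
        self.models = []  # callables mapping an input x to a prediction
        self.votes = []   # assumed voting weight log(1 / beta) per model

    def add(self, model, beta):
        """Register a hypothesis; a smaller beta (error) gives a larger vote."""
        if not 0.0 < beta < 1.0:
            raise ValueError("beta must lie strictly between 0 and 1")
        self.models.append(model)
        self.votes.append(math.log(1.0 / beta))

    def predict(self, x):
        """Weighted average of the predictions of all stored hypotheses."""
        total = sum(self.votes)
        return sum(v * m(x) for v, m in zip(self.votes, self.models)) / total


ensemble = IncrementalEnsemble()
ensemble.add(lambda x: 2.0 * x, beta=0.1)        # trained on old subdataset 1
ensemble.add(lambda x: 2.0 * x + 1.0, beta=0.1)  # trained on old subdataset 2
before = ensemble.predict(1.0)

# New production data arrive: train one more hypothesis on the new subdataset
# only, then add it. The old models stay untouched and the old data are not needed.
ensemble.add(lambda x: 2.0 * x + 2.0, beta=0.01)
after = ensemble.predict(1.0)
```

The update cost is one training run on the new subdataset plus one list append, which is the source of the time and storage savings claimed for the strategy.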

Experiments
In this section, the proposed ILES is tested in the sizing production process for predicting the sizing percentage. First, the influence of each parameter on the performance of the proposed algorithm is discussed. Then, real industrial process data are used to establish the soft sensor model and to verify the incremental learning performance of the algorithm. Finally, to demonstrate its generalization performance, three test functions are used to verify the improvement in prediction performance. The methods are implemented in MATLAB, and all the experiments are performed on a PC with an Intel Core 7500U CPU (2.70 GHz for each single core) and the Windows 10 operating system.
Sizing Production Process and Sizing Percentage. The double-dip and double-pressure sizing processes are widely used in textile mills, as shown in Figure 1. The sizing percentage plays an important role in achieving good sizing quality. In addition, control of the sizing agent in warp sizing is essential for improving both productivity and product quality. Online detection of the sizing percentage is a key factor for successful sizing control during the sizing process. The traditional sizing detection methods, namely, instrument measurement and indirect calculation, are either expensive or insufficiently accurate. Soft sensors provide an effective way to predict the sizing percentage and to overcome these shortcomings. According to the mechanism analysis of the sizing process, the factors that influence the sizing percentage are slurry concentration, slurry viscosity, slurry temperature, the pressure of the first grouting roller, the pressure of the second grouting roller, the position of the immersion roller, the speed of the sizing machine, the cover coefficient of the yarn, the yarn tension, and the drying temperature [22]. In the following soft sensor modeling process, the inputs of the soft sensors are the nine influencing factors, and the output is the sizing percentage.
Experiments for the Parameters of the ILES. Here, we select the ELM as the sublearning machine of the ILES because of its good performance, such as fast learning speed and simple parameter choices [22, 23]; the appendix reviews the process of the ELM. Then, experiments with different parameters of the ILES are done to study how the performance of the ILES changes with the parameters.
First, experiments are done to assess the performance of the ILES algorithm with different values of $T_k$. Here, $T_k$ increases from 1 to 15. Figure 2 shows the training errors and the testing errors for different values of $T_k$. As $T_k$ increases, the training and testing errors decrease. When $T_k$ increases to 7, the testing error is the smallest. However, when $T_k$ increases to more than 9, the testing error becomes larger again, while the training errors decrease only slightly. Therefore, we conclude that the performance of the ILES is best when $T_k$ is 7. AdaBoost.R and the ILES are also compared regarding the testing errors with different numbers of ELMs in Figure 3. Although the RMSE means of AdaBoost.R and the ILES are different, their performance trends are similar as the number of ELMs increases. Here, the RMSE is described as $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}$, where $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, and $N$ is the number of samples.
Second, we discuss the impact of changes in the parameters $\Delta$ and $\Phi$ on the ILES performance. The experiments demonstrate that when $\Delta$ is too small, it is difficult for the ELM to achieve the preset goal, and the iteration is difficult to stop. $\Delta$ should also be no larger than 80 percent of the average of the relative errors of the ELMs; otherwise $F_k$ cannot be obtained. Furthermore, the value of $\Delta$ determines the number of "better samples", that is, the samples whose predicted results reach the expected precision standard. When $\Delta = 0.06$ and $\Phi = 0.15$, the model has the best performance (the RMSE is 0.3084).
Experiments for the Learning Process of the ILES. To establish the soft sensor model based on the ILES, a total of 550 observations of real production data were collected from Tianjin Textile Engineering Institute Co., Ltd., of which 50 were selected randomly as testing data. The remaining 500 observations are divided into two data sets according to the time of production: the earlier 450 are used as the training data set, and the later 50 are used as the update data set. The inputs are the 9 factors that affect the sizing percentage, and the output is the sizing percentage. The parameters of the ILES are $K = 9$, $T_k = 7$, $\Delta = 0.06$, and $\Phi = 0.15$. That is to say, the 450 training data are divided into 9 subdatasets $S_1 \sim S_9$, and the number of ELMs is 7. According to the needs of sizing production, the predictive accuracy of the soft sensors is defined as $\mathrm{accuracy} = N_s / N_w \times 100\%$, where $N_s$ is the number of tests with an error < 0.6 and $N_w$ is the total number of tests. The learning process is similar to the update process of OS-ELM [24], an online model that can update its network parameters based on newly arriving data without retraining on historical data. Therefore, the accuracy of the ILES learning process is also compared with that of OS-ELM. The two columns on the right side of Table 2 show the changes in the soft sensor accuracy during the learning process of the ILES and OS-ELM. It can be seen that the stability and accuracy of the ILES are superior to those of OS-ELM.
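The two evaluation measures used in these experiments can be computed as in the following sketch. The RMSE is the standard root-mean-square error. The accuracy is taken here to be the percentage of test predictions whose absolute error is below 0.6, which is an assumption based on the verbal definition above. The function names and the toy values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    squared = [(p - y) ** 2 for y, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(y_true, y_pred, tol=0.6):
    """Percentage of predictions whose absolute error is below tol."""
    hits = sum(1 for y, p in zip(y_true, y_pred) if abs(p - y) < tol)
    return 100.0 * hits / len(y_true)


measured = [10.0, 11.0, 12.0, 13.0]   # toy sizing percentages
predicted = [10.5, 11.0, 13.0, 13.1]  # toy soft sensor outputs

score_rmse = rmse(measured, predicted)
score_acc = accuracy(measured, predicted)
```

Reporting both measures is useful because the RMSE is dominated by the largest errors, whereas the tolerance-based accuracy counts how often the prediction is good enough for production use.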
Comparison. In this experiment, we used 10-fold cross validation to test the model's performance. The first 500 data are randomly divided into 10 subdatasets $S_1 \sim S_{10}$. The remaining 50 data are used as the update data set $S_{11}$. The single subdataset from $S_1 \sim S_{10}$ will be retained as the … can achieve 0.2. However, the testing errors of the prediction models of AdaBoost.R and the ILES are smaller than that of the single ELM. This result means that the ensemble methods have better generalization performance. Table 3 shows the performance of the prediction models based on different methods after updating. The results of the comparison experiments show that the soft sensor based on the new ILES has the best accuracy and the smallest RMSE. This result is attributed to the use of the testing subdataset in the ensemble strategy and to the incremental learning strategy during the learning process of the ILES algorithm. Overall, the accuracy of the soft sensor meets the needs of actual production processes. Moreover, the incremental learning performance supports the application of industrial process soft sensors in practical production.
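The 10-fold protocol described above can be sketched as follows. This is a generic illustration of k-fold index splitting, not the authors' MATLAB code; the function name and the fixed seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, held_out) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random assignment to folds
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i, held_out in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, held_out


# 500 samples split into 10 subdatasets; each is held out exactly once.
splits = list(k_fold_indices(500, 10))
```

Each of the 10 rounds trains on 450 samples and evaluates on the remaining 50, and the reported figures are then averaged over the rounds.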

Experiments for the Performance of the ILES by Test Functions. To verify the general applicability of the algorithm, three test functions are used to test the improvement in prediction performance. These test functions are Friedman#1, Friedman#2, and Friedman#3. Table 4 shows the expression of each test model and the value range of each variable. Friedman#1 has a total of 10 input variables, of which five are associated with the output variable and the other five are independent of it. The Friedman#2 and Friedman#3 test functions describe the impedance and the phase change of an AC circuit.
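For reference, the three benchmark functions can be written as in the sketch below. The expressions and variable ranges follow the standard definitions of the Friedman benchmarks in the literature and are assumed to match Table 4, which is not reproduced in this text. The noise terms are omitted, and all function names are hypothetical.

```python
import math
import random


def friedman1(x):
    """x holds 10 inputs in [0, 1]; only the first five affect the output."""
    return (10.0 * math.sin(math.pi * x[0] * x[1])
            + 20.0 * (x[2] - 0.5) ** 2
            + 10.0 * x[3]
            + 5.0 * x[4])


def _reactance(x2, x3, x4):
    """Shared term x2 * x3 - 1 / (x2 * x4) of the two circuit functions."""
    return x2 * x3 - 1.0 / (x2 * x4)


def friedman2(x1, x2, x3, x4):
    """Impedance: sqrt(x1^2 + (x2 * x3 - 1 / (x2 * x4))^2)."""
    return math.hypot(x1, _reactance(x2, x3, x4))


def friedman3(x1, x2, x3, x4):
    """Phase angle: arctan((x2 * x3 - 1 / (x2 * x4)) / x1).

    atan2 gives the same value for x1 > 0 and avoids dividing by zero.
    """
    return math.atan2(_reactance(x2, x3, x4), x1)


def sample_circuit_inputs(rng):
    """Draw (x1, x2, x3, x4) uniformly from the standard benchmark ranges."""
    return (rng.uniform(0.0, 100.0),
            rng.uniform(40.0 * math.pi, 560.0 * math.pi),
            rng.uniform(0.0, 1.0),
            rng.uniform(1.0, 11.0))


rng = random.Random(0)
x_circuit = sample_circuit_inputs(rng)
y_impedance = friedman2(*x_circuit)
y_phase = friedman3(*x_circuit)
```

Because half of the Friedman#1 inputs are irrelevant and the two circuit functions are strongly nonlinear, these benchmarks probe whether a regression method generalizes beyond the sizing data.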
Through continuous debugging, the parameters of each algorithm are determined as shown in Table 5. For every test function, a total of 900 samples are generated, of which approximately 78% (700 samples) are selected as training samples, 11% as updating samples, and 11% as testing samples, according to the needs of the different test models. That is to say, the 700 training data are divided into 7 subdatasets $S_1 \sim S_7$. Figures 5-7 show the predicted results for Friedman#1, Friedman#2, and Friedman#3 with soft sensor models based on different methods (time consumed: 227 s). The comparison of the performances of the different soft sensors is shown in Table 6, which shows that the soft sensor model based on the ILES has the best performance.

Conclusions
An ILES algorithm is proposed for better accuracy and incremental learning ability for industrial process soft sensors.The sizing percentage soft sensor model is established to test the performance of the ILES.The main factors that influence the sizing percentage are the inputs of the soft sensor model.

Figure 3: The performance trends of the ILES and AdaBoost.R with different parameters $T_k$.

Table 1: The RMSE of the ILES with different parameters $\Delta$, $\Phi$ ($T_k = 7$).
If $\Phi$ is too small, the ELM soft sensor model ($f_t$) will not be sufficiently obtained. If $\Phi$ is too large, the "bad" ELM model ($f_t$) will be aggregated into the final composite hypothesis $f_{fin}(x)$. Then, the accuracy of the ILES cannot be improved. The relationships among $\Delta$, $\Phi$, and RMSE are shown in Table 1.

Table 2: The changes in the soft sensor accuracy during the learning process of the ILES.

Table 3: The performance of the soft sensor model based on different methods with 10-fold cross validation.

Table 5: Parameters of the algorithmic performance test.