Data-Driven Optimal Control for Pulp Washing Process Based on Neural Network

The pulp washing process is multivariate, time-delayed, and nonlinear. Considering the difficulties of modeling and optimal control in the pulp washing process, this paper proposes a data-driven operational-pattern optimization method to model and optimize the process. The most important quality indexes of pulp washing performance are the residual soda in the washed pulp and the Baume degree of the extracted black liquor. Because these indexes are difficult to model and to measure online, two-step neural networks and multivariate logistic regression are used to establish prediction models of residual soda and Baume degree. The mathematical model of the washing process can then be identified, and the indexes can meet the production requirements. With the targets of better product quality, low cost, and low energy consumption, a multiobjective problem is solved by an ant colony optimization algorithm based on the optimized operational-pattern database. The results indicate that the theoretical analysis is sound and the practical application is feasible: an optimization control system has been designed for the pulp washing process, and the practical results show that pulp production increased by 20% and water consumption decreased by nearly 30%. The method is therefore effective for the pulp washing process.


Introduction
Pulp washing is an important part of the pulp-making process. Its main purpose is to wash soluble inorganic and organic substances out of the pulp and to obtain black liquor of high concentration [1].
The most important quality indexes for evaluating pulp washing performance are the residual soda in the washed pulp and the Baume degree of the resulting thick black liquor. The conductivity of the residual soda extracted from the washed pulp is a measure of dissolved solid content; the residual soda in the washed pulp should be as low as possible. The Baume degree denotes the consistency of the thick liquor obtained in the first stage of the pulp washer [2]; in production it should be as high as possible. The two indexes therefore conflict. From the perspective of pulp washing, more water is needed to wash the pulp clean, but the added water dilutes the black liquor. From the perspective of chemical recovery, a higher black liquor concentration and temperature are required in order to reduce steam consumption during evaporation and condensation [3], so as little washing water as possible should be used. Given these two contradictory requirements, the key is to maintain a balance between the amount of washing water and the desired pulp cleanliness.
To address these issues, several classical control schemes have been proposed, including black liquor consistency control, residual soda loss control, dilution factor optimization control, multicomponent control, and model-based optimization control. Because it does not take the loss of residual soda into account, online measurement of the Baume degree has been implemented only in some small production mills [4]. To overcome the disadvantage of single-variable control, a residual soda predictor was constructed based on predictive control [5]. As these approaches are limited to the measurement and control of a single parameter, more attention should be given to modeling and optimization control of the whole washing process.
Model-based optimal control and multicomponent control [6] play a great role in the optimization of operation and control. Model-based optimization control can analyze the relationships between variables according to the model. In recent years, data-driven modeling has been widely used. In this method, the current state is matched against a large amount of process data through online learning and calculation of the control quantity, and the various static qualities required by the system can then be obtained. The characteristics of the sample data are the main criterion in data-driven modeling, which means that the data speak for themselves [7][8][9]. Data-driven modeling can convert high-dimensional data into low-dimensional data without losing important information. Pulping enterprises produce and store a large amount of data on production parameters, equipment, and processes every day; these data carry information on process changes, equipment operation, fluctuations in working conditions, and more. In the washing process, there is a wealth of online and offline measurement data, such as temperature, washing drum pressure difference and vacuum degree, pulp layer thickness, pulp concentration, pulp species, pulp hardness, the amount of fresh water added, and the number of washes.
In view of this, and aiming at the characteristics of the pulping process, this paper proposes a data-driven operational-pattern optimization method to model and optimize the pulp washing process. Based on mechanism analysis, the basic concepts of the data-driven operational pattern for the pulp washing process are described. The data-driven prediction models of residual soda and Baume degree are established by a PCA-BP (principal component analysis, backpropagation) neural network and multivariate logistic regression. Based on the modeled indexes, an overall evaluation model of washing quality is proposed to analyze the large volume of industrial operation data. Then, an optimized operational-pattern database is constructed using fuzzy clustering and a pattern matching algorithm. Finally, with the targets of better product quality, low cost, and low energy consumption, the optimum operational pattern is obtained from the optimized operational-pattern database by an ant colony optimization algorithm. Based on the hardware of Siemens S7-400 PLC and the software of WinCC 6.0 & Step 7, an optimization control system has been designed for the pulp washing process. A practical application in a paper mill in Shandong province, China, shows that pulp production increased by 20% and water consumption decreased by nearly 30%, which supports the effectiveness of the method.
The main contribution of this paper is to solve the soft sensor modeling problem and to optimize the pulp washing process. The soft sensor models of the residual soda and the Baume degree were obtained by a two-step neural network. The multiobjective optimization of high yield, low cost, and low consumption in the pulp washing process was realized. A systematic and targeted optimization scheme for the pulp washing process was provided. The rest of the paper is organized as follows. Section 2 describes the crafts and operation mode of the pulp washing process. The two-step neural network identification and the mathematical models of residual soda and Baume degree based on the least squares method are given in Section 3. Multiobjective optimization for the washing process is given in Section 4. Optimization results and the application of this study in a paper mill are described in Section 5. Section 6 concludes the paper.

Crafts and Mode Description of the Pulp Washing Process

Crafts of the Pulp Washing Process.
Currently, multistage countercurrent washing is widely used, either in several washing machines in series (vacuum washing machines) or in one machine divided into several washing sections (horizontal belt washing machines). In this arrangement, the pulp is discharged from the last section, and the washing water enters at the last section and flows in the opposite direction to the pulp. The countercurrent washing process is briefly illustrated in Figure 1.
In this sequence of washers, the pulp and the washing medium are generally arranged to flow countercurrent to each other. Fresh water is typically used to wash the pulp sheet on the last-stage washer. The filtrate is pulled through the pulp on the preceding washer. This helps to minimize dilution of the liquor separated from the pulp and facilitates the recovery of recyclable cooking chemicals. Hence, it is desirable that the washing losses be kept as low as possible while using a minimum of washing water.

Operation Mode Description of the Pulp Washing Process.
In the pulp washing control system, six different washing stages are required to obtain the necessary pulp cleanliness. Subsystems for consistency, level, flow, temperature, drum rotation speed, and multiobjective optimization are involved.
The first five subsystems form a basic control hierarchy that guarantees steady running of the countercurrent washing process and ensures that the washed pulp has a residual soda as low as possible and that the thick black liquor in the first-stage filtrate tank has a Baume degree as high as possible.
The data of complex industrial processes mainly include input conditions, state parameters, operation parameters, and technical indexes [10,11]. For the pulp washing process, the process data can be defined according to the operation mode description as follows:
(i) The input condition of the pulp washing process at time $t$ is $r(t) = [r_1(t), r_2(t), r_3(t), r_4(t), r_5(t)]$, where $r_1(t)$, $r_2(t)$, $r_3(t)$, $r_4(t)$, and $r_5(t)$ indicate pulp species, pulp thickness, pulp hardness, cooking method, and washing times, respectively.
(ii) The state parameters of the pulp washing process at time $t$ are $s(t) = [s_1(t), s_2(t)]$, where $s_1(t)$ and $s_2(t)$ indicate the Baume degree of black liquor and residual soda, respectively.
(iii) The operation parameters of the pulp washing process at time $t$ are $q(t) = [q_1(t), q_2(t), q_3(t)]$, where $q_1(t)$, $q_2(t)$, and $q_3(t)$ indicate pulp concentration, pulp flow, and water flow, respectively.
(iv) The technical index of the pulp washing process at time $t$ is $o(t) = [o_1(t), o_2(t), o_3(t)]$, which indicates the target required to be achieved in the production process, where $o_1(t)$, $o_2(t)$, and $o_3(t)$ indicate pulp washing cleanliness, pulp output, and consumption cost, respectively.
Finally, the operation mode of the pulp washing process is established, and it can be described as $p = [r^T, s^T, q^T]^T = [r_1, \ldots, r_5, s_1, s_2, q_1, \ldots, q_3]^T$. It is composed of the input condition, state parameters, and operation parameters.
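The operation mode vector defined above can be illustrated with a minimal Python sketch. The class names, field names, and numeric values are hypothetical and chosen only for illustration; in particular, encoding categorical quantities such as pulp species and cooking method as numeric codes is an assumption, not something stated in this paper.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class InputCondition:          # r(t)
    pulp_species: float        # assumed numeric code for a category
    pulp_thickness: float
    pulp_hardness: float
    cooking_method: float      # assumed numeric code for a category
    washing_times: float


@dataclass(frozen=True)
class StateParameters:         # s(t)
    baume_degree: float
    residual_soda: float


@dataclass(frozen=True)
class OperationParameters:     # q(t)
    pulp_concentration: float
    pulp_flow: float
    water_flow: float


def operation_mode(r, s, q):
    """Stack r, s and q into p = [r1..r5, s1, s2, q1..q3]."""
    return list(astuple(r)) + list(astuple(s)) + list(astuple(q))


# Made-up values, for illustration only.
r = InputCondition(1.0, 12.0, 16.0, 2.0, 4.0)
s = StateParameters(8.5, 1.9)
q = OperationParameters(30.0, 180.0, 95.0)
p = operation_mode(r, s, q)
```

Keeping the three groups as separate records makes it straightforward to treat $q$ as the decision variables while $r$ and $s$ describe the working condition.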

Two-Step Neural Network Modeling
The core idea of the operation optimization of the pulp washing process is to optimize the pulp washing quality. As the two contradictory pulp washing quality indexes (residual soda and the Baume degree of black liquor) are difficult to measure directly, a prediction model should be established. Neural networks are highly effective for modeling, classification, and prediction of nonlinear systems [12]. The neural network modeling method was used to establish the network structure between the main components and the quality variables, and the soft measurement model of the pulp washing process was obtained.
Generally speaking, the result of washing is a function of many variables, such as the amount of wash water, feed consistency, air entrainment, sheet formation, wash water distribution, and discharge consistency. Most of these variables are interrelated, and an improvement in one variable may well have a favorable or unfavorable impact on the others. The primary objectives for achieving the best washing result are the lowest feed consistency, optimum mat formation with a uniform basis weight, a uniform shower liquor distribution, the highest discharge consistency, and minimal air content in the feed to the washer [13]. The PCA method was used to preprocess the process variables, providing noise reduction, dimension reduction, and elimination of negative correlation [14]. This helps to reduce the complexity of the neural network.

Input Variable Selection Based on PCA.
A large number of experiments have shown that the factors influencing the washing quality of pulp include pulp concentration, pulp flow rate, water addition, thickness of the pulp layer, vacuum degree, pulp species, washing water temperature, pulp hardness, and pulping method. The dynamic PCA method was used to screen these variables for establishing the mathematical model of the pulp washing process. The calculation steps are as follows:
Step 1. Initialize the sample database. In order to fully reflect the dynamics of the process, appropriate process data are added and a new sample database is set up. Standardize the new sample database and recalculate the corrected variance until the mean variance no longer changes; the correction value of each variable is then obtained, and the initialized sample database is established.
Step 2. Apply PCA to the sample database. The load matrix and eigenvalue vector were obtained by PCA of pulp concentration, flow, dilution water, pulp thickness, water temperature, and pulp hardness. Find and eliminate the variable corresponding to the largest absolute coefficient in the first load vector. Repeat the process until just three variables are left. The number of principal variables was determined by significant correlation [15]: given the significance level $\alpha = 1\%$, the critical value of the correlation coefficient is 0.487; the largest absolute value of each column of the component matrix is compared with 0.487, and when the largest absolute value in column $k + 1$ is less than 0.487, $k$ principal components can be extracted. In this paper, $k = 3$.

Mathematical Problems in Engineering
Step 3. According to multiple analyses, the contribution of the three components reaches 98% and inlet pulp consistency, inlet pulp flow, and hot clean water input flow are considered to be the most important variables in pulp washing.
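The component-count rule in Step 2 can be sketched with NumPy as follows. This is a minimal illustration on synthetic data, not the mill data: the function name, the synthetic factor structure, and the reuse of the 0.487 threshold (which in the paper is tied to its own sample size) are assumptions. PCA is carried out on the correlation matrix, which is equivalent to PCA on standardized variables.

```python
import numpy as np


def count_principal_components(X, threshold=0.487):
    """Return (k, contribution) for a data matrix X with one sample per row."""
    R = np.corrcoef(X, rowvar=False)            # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # re-sort in descending order
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)       # component matrix, one column per PC
    col_max = np.abs(loadings).max(axis=0)      # largest |loading| in each column
    below = np.nonzero(col_max < threshold)[0]  # columns failing the threshold
    k = int(below[0]) if below.size else X.shape[1]
    contribution = float(eigvals[:k].sum() / eigvals.sum())
    return k, contribution


# Synthetic example: six measured variables driven by three hidden factors.
rng = np.random.default_rng(0)
f = rng.standard_normal((500, 3))
noise = 0.05 * rng.standard_normal((500, 6))
X = np.column_stack([f[:, 0], f[:, 0], f[:, 0], f[:, 1], f[:, 1], f[:, 2]]) + noise
k, contribution = count_principal_components(X)
```

On this synthetic set the rule extracts three components whose cumulative contribution exceeds 98%, which mirrors the kind of result reported in Step 3 but says nothing about the actual process data.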
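The amount of residual soda in the final washed pulp and the Baume degree in the first-stage filtrate tank have typically been used as output indicators in pulp washing. Therefore, the input and output of the pulp washing process can be described as $X = [x_1, x_2, x_3]^T$ and $Y = [y_1, y_2]^T$, where $x_1$, $x_2$, and $x_3$ are the three input variables selected above and $y_1$ and $y_2$ are the residual soda and the Baume degree, respectively.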

Two-Step Neural Network.
The pulp washing process is not a steady-state process for which a model can be identified accurately and directly. In this paper, a two-step identification method is employed. The basic idea of this method is as follows.
First, using the dynamic information of the process (for example, the step response), a dynamic model is obtained by a multilayer feedforward neural network. Second, the obtained dynamic model is used to produce a steady-state data collection that serves as the training sample for the final steady model. Finally, another multilayer feedforward NN is trained to approximate the steady characteristic of the process, which yields the final steady NN model of the process.

Neural Network Dynamic Model Identification.
The dynamic NN mathematical model describing the pulp washing process is given by equation (1), where $d$ is the length of input samples ($d = 3$) and $r_1$ and $r_2$ are the delay factors of residual soda and Baume degree, respectively. According to a one-year field survey conducted in the pilot paper mill, the two delay factors are determined as $r_1 = 8$ and $r_2 = 6$.
Substituting $d = 3$, $r_1 = 8$, and $r_2 = 6$ into equation (1), the number of independent variables is 21 for $y_1$ and 19 for $y_2$. In the actual pulp washing process, residual soda has more dynamic influence factors than Baume degree. After training the neural networks, the numbers of dynamic independent variables for residual soda and Baume degree are set to 24 and 16, respectively; these are the sizes of the input layers of the two neural networks. The structure parameters of the dynamic NN models for residual soda and Baume degree are given in Table 1.
The identification procedure is as follows:
Step 1. Initialize the weights of the dynamic NNs; random numbers uniformly distributed in the interval [−1, 1] are usually used.
Step 2. Construct the input vector $X(k)$ according to $r$ and $d$, and compute the network output $\hat{y}(k)$ and the error $e(k) = y(k) - \hat{y}(k)$.
Step 3. Modify the weights using a suitable learning algorithm.
Step 4. Shift the elements of $x_1(k)$, $x_2(k)$, $x_3(k)$, and $y(k)$; terminate the training process if $|e(k)| < \varepsilon$, or return to Step 2 if $|e(k)| \ge \varepsilon$, where $\varepsilon$ is the permitted error.
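Step 2 builds the input vector $X(k)$ from delayed samples. The sketch below shows one way to do this in Python under an assumed layout: $d$ past outputs followed by $d$ delayed samples of each input. This layout is an assumption made for illustration; it is not the paper's equation (1), and its regressor count therefore differs from the counts quoted above. The function name `build_regressors` is hypothetical.

```python
def build_regressors(x, y, d, r):
    """Build lagged input vectors X(k) and targets y(k).

    x: list of [x1, x2, x3] per time step; y: list of outputs per time step.
    Assumed layout of X(k): y(k-1), ..., y(k-d), then for each input i the
    samples x_i(k-r), ..., x_i(k-r-d+1), where r >= 1 is the delay factor.
    """
    first_k = r + d - 1                      # earliest k with every lag available
    rows, targets = [], []
    for k in range(first_k, len(y)):
        past_y = [y[k - j] for j in range(1, d + 1)]
        past_x = [x[k - r - j][i] for i in range(len(x[0])) for j in range(d)]
        rows.append(past_y + past_x)
        targets.append(y[k])
    return rows, targets


# Toy series whose values reveal which time index each entry came from.
x = [[t, 10 * t, 100 * t] for t in range(20)]
y = [float(t) for t in range(20)]
rows, targets = build_regressors(x, y, d=3, r=8)
```

Each row can then be fed to the network in Step 2, and the "shift" in Step 4 corresponds to moving from one row to the next.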
To obtain a relatively accurate mathematical model, the standard BP (backpropagation) algorithm, the BP algorithm with momentum and adaptive learning rate, and the Levenberg-Marquardt (L-M) optimization-based BP algorithm are used to train the NNs mentioned above. It is found that training with the L-M optimization-based BP algorithm gives the best dynamic model. The training parameters of the NN dynamic models for residual soda and Baume degree are listed in Table 2. The error curves in the training process of residual soda and Baume degree are shown in Figures 4 and 5. The simulation results show that the model converges quickly and does not readily become trapped in local optima. The learning curves on the training data are shown in Figures 6 and 7. The true curve reflects the dynamic characteristics of the pulp washing process, and the identification curve obtained by the neural network reflects its good self-learning ability.
The generalization performance on validation data, shown in Figures 8 and 9, describes the performance of the dynamic neural network models.

Neural Network Stationary Model Identification.
After the dynamic neural network has fully learned the pulp washing process, simulation data can be generated as samples for the steady-state model, and only the dominant inputs are used to train the steady-state neural network model. The steady NN mathematical model of the pulp washing process can be described as follows:

$$y_1 = f_1(x_1, x_2, x_3), \qquad y_2 = f_2(x_1, x_2, x_3),$$

where $x_1$ is inlet pulp consistency (kg/m$^3$), $x_2$ is inlet pulp flow (m$^3$/h), $x_3$ is hot clean water input flow (m$^3$/h), $y_1$ is residual soda in the final washed pulp (g/L), $y_2$ is the Baume degree in the first-stage filtrate tank (°Bé), and $f_1$ and $f_2$ are the steady models of residual soda and Baume degree trained by the BP neural network. A three-layer BP NN is also employed for steady model identification. The structure of the neural network model is shown in Figure 10.
The corresponding model parameters of the steady NNs for residual soda and Baume degree are listed in Table 3. Two steady data collections are generated from the NN dynamic models of residual soda and Baume degree by continually changing the values of the input variables. Then, two data collections of 150 samples each are obtained for the neural networks; each consists of 30 sets of simulation data of actual production and 120 sets of field data collected by the DCS (distributed control system).
These data are used as samples for identifying the NN steady models. The 150 samples are divided into two parts: 100 sets (including the 30 sets of simulation samples) are used to train the steady NN, and the other 50 sets are used to test the generalization ability of the trained NN. For the training of the NN steady models, the same three methods as above are employed, and the L-M optimization-based BP algorithm is again found to be the best. The training parameters of the NN steady models for residual soda and Baume degree are listed in Table 4. The BP neural networks were trained on classified samples, which improved learning efficiency.
The learning curves after neural network training are shown in Figures 11 and 12. They show that the employed BP neural network fits well and is robust. The generalization curves of the neural network are shown in Figures 13 and 14. The results show that the L-M training function works effectively and accurately in data correction and has good generalization performance.

Data Preprocessing Based on Pattern Clustering.
As is well known, the learning and training of a neural network depend on the sample data. In the actual pulping process, the original measurement data inevitably contain errors due to the influence of the measurement device, measurement environment, measurement method, and human factors [17]. Therefore, it is difficult to preprocess the data with traditional mathematical statistics methods.
In view of this, the pattern clustering method is used in this paper to preprocess the initial training sample sets. This method can detect faulty data and carries out a weighted average of the measurement data in the same mode according to the measurement time, which reflects the time variability of the washing process and reduces the random error to a certain extent. The principle of the process data preprocessing method based on pattern clustering is shown in Figure 15; it is divided into three parts: pattern clustering, elimination of faulty data, and weighted average filtering. The original measurement data are divided into an input space and an output space. In a data sample, $X_s$ lies in the input space and $Y_s$ lies in the output space, and the original data carry the mapping information from the input space to the output space.
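The three stages named above (pattern clustering, elimination of faulty data, and weighted average filtering) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: clustering uses the first member of each class as its centre, faulty data are detected by their distance from the class median, the time weights grow linearly with sample order, and the output is taken as a scalar. The thresholds `delta_x` and `delta_y` play the roles of $\delta X$ and $\delta Y$; all sample values are made up.

```python
from statistics import median_low


def preprocess(samples, delta_x, delta_y):
    """samples: list of (t, x, y), with x a list of inputs and y a scalar output.

    Returns one (x_average, y_average) pair per cluster.
    """
    clusters = []                                   # each cluster is a list of samples
    for s in sorted(samples, key=lambda item: item[0]):
        for c in clusters:
            leader = c[0][1]                        # first member acts as the centre
            if max(abs(a - b) for a, b in zip(s[1], leader)) < delta_x:
                c.append(s)
                break
        else:
            clusters.append([s])                    # no close cluster: open a new one

    result = []
    for c in clusters:
        y_ref = median_low([y for _, _, y in c])    # always an actual member value
        kept = [s for s in c if abs(s[2] - y_ref) <= delta_y]  # drop faulty data
        weights = [i + 1 for i in range(len(kept))]            # newer samples weigh more
        total = sum(weights)
        x_avg = [sum(w * s[1][j] for w, s in zip(weights, kept)) / total
                 for j in range(len(kept[0][1]))]
        y_avg = sum(w * s[2] for w, s in zip(weights, kept)) / total
        result.append((x_avg, y_avg))
    return result


samples = [
    (0, [30.0, 180.0], 1.9),
    (1, [30.1, 180.2], 2.0),
    (2, [30.0, 180.1], 9.0),    # gross error in the output
    (3, [35.0, 200.0], 1.5),
]
result = preprocess(samples, delta_x=0.5, delta_y=0.5)
```

In this toy run the first three samples fall into one class, the third is discarded as faulty, and the remaining two are averaged with the later measurement weighted more heavily.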
Based on this method, the sample data are listed in the left part of Table 5, and the clustering results are shown in the right part of Table 5. The 40 sample data are aggregated into 34 sample data (see Table 6 for the data distribution after pretreatment). This dataset is used to test the clustering effect of the improved algorithm. The cluster analysis data are applied to the neural network, and 150 sets of data are obtained as the samples for identifying the steady-state neural network model. The 150 sets of data were randomly divided into two groups: 100 sets were used to train the steady-state neural network, and the other 50 sets were used to test the generalization ability of the trained network. The learning curve and generalization curve after training are shown in Figures 16 and 17. Algorithm analysis and experimental results show that the improved algorithm has better detection performance, better learning performance, and stronger generalization ability. The steady-state data generated by the steady-state neural network model provide a reliable data source for the specific mathematical models of residual soda and the Baume degree of black liquor. By least-squares fitting of 200 sets of data from the steady-state model, the mathematical models for the residual soda and Baume degree are established; the residual soda model is

$$y_1 = 0.27x_1 - 0.0013x_2 - 0.0078x_3 + 1.923.$$

Regression analysis is carried out for the parameter estimation, and the results are shown in Tables 7 and 8.
Regression analysis shows that the linear models explain 87% and 85% of the variation in residual soda and Baume degree, respectively, and the fitting deviation approximately follows a normal distribution. The predicted values accurately reflect the changes in these two indexes, and this accuracy benefits actual production.
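The least-squares fitting step can be illustrated with NumPy. The sketch below generates noise-free synthetic data from the residual soda model quoted above and checks that an ordinary least-squares fit recovers its coefficients. The input ranges and the function name `fit_linear_model` are assumptions for illustration; real steady-state data would give a coefficient of determination below 1, as reported in the regression analysis.

```python
import numpy as np


def fit_linear_model(X, y):
    """Least-squares fit of y = a1*x1 + a2*x2 + a3*x3 + b.

    Returns (coefficients [a1, a2, a3, b], coefficient of determination R^2).
    """
    A = np.column_stack([X, np.ones(len(X))])        # append the intercept column
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coeffs
    r2 = 1.0 - residual @ residual / np.sum((y - y.mean()) ** 2)
    return coeffs, float(r2)


# 200 synthetic samples; the input ranges are made up for illustration.
rng = np.random.default_rng(1)
X = rng.uniform([20.0, 100.0, 50.0], [40.0, 300.0, 150.0], size=(200, 3))
y1 = 0.27 * X[:, 0] - 0.0013 * X[:, 1] - 0.0078 * X[:, 2] + 1.923
coeffs, r2 = fit_linear_model(X, y1)
```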

Operation Mode Optimization of Pulp Washing Process

Condition Judgment.
The idea of the comprehensive optimal control of the washing process is to take the stability of the comprehensive working condition as the control objective and to find the best operating parameters, i.e., the pulp concentration, the pulp flow rate, and the amount of water, by an optimal control method. Therefore, a new rigorous optimization model must be established. The residual soda and the Baume degree of black liquor reflect the working state of the pulp washing process. In order to control the dynamic process of pulp washing, the prediction models of residual soda and Baume degree are applied to judge the working state of pulp washing quality through a working condition index $S$. In this index, 1.923 and 7.59 are the target values of residual soda and Baume degree of black liquor, and $k_1$ and $k_2$ are the weights, generally taken as 0.4 and 0.6. According to the calculated $S$ value, the comprehensive operating condition is classified into four intervals: excellent, good, medium, and poor. If the condition is optimal, the current parameters are maintained; otherwise, the operation parameters are adjusted through the optimization model.
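A possible form of such a condition judgment is sketched below in Python. Both the formula and the interval limits are assumptions made only for illustration: the index is taken here as a weighted sum of relative deviations from the two target values, and the grade boundaries are hypothetical. Only the targets (1.923 and 7.59) and the weights (0.4 and 0.6) come from the text.

```python
def condition_index(y1, y2, k1=0.4, k2=0.6, y1_target=1.923, y2_target=7.59):
    """ASSUMED form of S: weighted sum of relative deviations from the targets."""
    return (k1 * abs(y1 - y1_target) / y1_target
            + k2 * abs(y2 - y2_target) / y2_target)


def grade(s, bounds=(0.05, 0.10, 0.20)):
    """Map S to one of four intervals; the bounds are hypothetical."""
    labels = ("excellent", "good", "medium", "poor")
    for limit, label in zip(bounds, labels):
        if s < limit:
            return label
    return labels[-1]


s_on_target = condition_index(1.923, 7.59)   # predictions exactly on target
```

With predictions exactly on target the assumed index is zero and the condition is graded "excellent"; larger deviations move the grade toward "poor", at which point the operating parameters would be re-optimized.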

Comprehensive Optimization Control of Pulp Washing Process.
The comprehensive optimization control framework of the pulp washing process is shown in Figure 18, which can be summarized as follows:

(a) Optimize the mode library when adding condition samples and then recombine the samples [18].
(b) Perform fuzzy matching between the samples and the clustering centers under the current working conditions by similarity coefficient.
(c) Search for similar samples and predict $y_1$ and $y_2$ by the neural network. If the operating condition index $S$ is judged to be optimal, the operating parameters $x_1$, $x_2$, and $x_3$ are added to the optimization mode database, and the current operation parameters are maintained. If nonoptimal, reduce the number of samples for operation optimization and adjust the operation parameters.
The quality of pulp washing is ensured by the judgment of the working condition index $S$. On this basis, the production efficiency is improved, and the comprehensive optimization control of the pulp washing process is formed.

Optimization Model of Pulp Washing Process.
In order to optimize the pulp washing process, a multiobjective optimization model of high quality, high yield, and low consumption is established. In this objective model, $f_1(DF)$ is the consumption cost, which is related to the dilution factor $DF$; $x_{1\max}$ and $x_{2\max}$ are the maximum inlet pulp consistency and inlet pulp flow; $f_2(x_1, x_2)$ is the deviation of the pulp output; and $f_3(x_3)$ is the water consumption.

[Figure 15: Flow of data preprocessing based on pattern clustering.]

Solution of the Optimization Model.
By constructing an evaluation function, the multiobjective optimization problem [18] is transformed into a single-objective problem with weights $\omega_1 \in [0, 1]$, $\omega_2 \in [0, 1]$, and $\omega_3 \in [0, 1]$, where $\omega_1 + \omega_2 + \omega_3 = 1$. The weights reflect the importance of the respective objectives, and their values are set as design parameters during operation in a paper mill.

The optimization model is a nonlinear multiobjective optimization problem with linear constraints, and the ant colony algorithm can be regarded as a distributed multiagent system [19]. In this paper, the ant colony optimization algorithm is adopted, and the optimal operation mode is obtained after 10 iteration steps. An unconstrained optimization model based on a penalty function is established as the corresponding nonlinear unconstrained optimization model of the pulp washing process.

[Figure 18: Framework of comprehensive optimization control.]

The iteration step $\lambda$ in this model can be obtained either by fixing a constant or by optimizing each part. The penalty factor $\sigma$ increases with the number of infeasible solutions. The specific algorithm is as follows:
(i) Given the initial point $X^{(0)}$, the initial penalty factor $\sigma_1 > 0$ ($\sigma_1 = 1000$ in the actual calculation), the magnification factor $C > 1$ (e.g., $C = 2$), and the allowable error $\xi > 0$, set $k = 1$.
(ii) Taking $X^{(k-1)}$ as the initial point, obtain the unconstrained minimum $X^{(k)}$ by solving $\min G(X, \sigma_k)$.
(iii) If the penalty term is smaller than $\xi$, stop; $X^{(k)}$ is the approximate minimum of the original problem. Otherwise, set $\sigma_{k+1} = C\sigma_k$ and $k = k + 1$, and return to step (ii).
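Steps (i) to (iii) describe an exterior penalty method, which can be sketched in Python on a toy problem. The sketch below is illustrative only: it uses a one-dimensional problem (minimize $x^2$ subject to $x \ge 1$) rather than the washing model, a quadratic penalty term, and a ternary search as the inner unconstrained minimizer in place of the ant colony algorithm used in this paper. The function names and the search bracket width are assumptions; $\sigma_1 = 1000$ and $C = 2$ follow the text.

```python
def ternary_min(g, lo, hi, iters=200):
    """Minimise a convex one-dimensional function g on the interval [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def penalty_method(f, violation, x0, sigma=1000.0, growth=2.0, xi=1e-4, max_outer=50):
    """Exterior penalty method: minimise G(x, sigma) = f(x) + sigma * violation(x)**2,
    multiplying sigma by `growth` until the penalty term is smaller than xi."""
    x = x0
    for _ in range(max_outer):
        def penalised(z, s=sigma):
            return f(z) + s * violation(z) ** 2
        # Step (ii): search around the previous point X(k-1).
        x = ternary_min(penalised, x - 10.0, x + 10.0)
        # Step (iii): stop once the penalty term is below the allowable error.
        if sigma * violation(x) ** 2 < xi:
            return x, sigma
        sigma *= growth
    return x, sigma


# Toy problem: minimise x^2 subject to x >= 1 (constrained optimum at x = 1).
x_opt, sigma_final = penalty_method(
    f=lambda x: x * x,
    violation=lambda x: max(0.0, 1.0 - x),
    x0=0.0,
)
```

For this toy problem the minimizer of the penalized function is $\sigma/(1+\sigma)$, so the iterates approach the constrained optimum from the infeasible side as $\sigma$ grows, which is the characteristic behavior of an exterior penalty scheme.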

Optimization Results.
Residual soda and Baume degree before and after optimization were compared in a paper mill in Shandong province, China. The results are shown in Table 9. After optimization, the residual soda and Baume degree meet the process requirements (the residual soda is not higher than 2.2 g/L; the Baume degree is between 7.0 and 9.4 °Bé) [21,22]. The average residual soda decreased, and the Baume degree increased to 9.3 °Bé. The concentration and flow of the inlet pulp approach the upper limit of the production index, the pulp yield is increased by 20%, and the water consumption is decreased by nearly 30%. Meanwhile, by optimizing DF, the total cost is reduced from 13.2 to 10.9, which brings an economic benefit for pulping enterprises.

Application.
Based on the hardware of Siemens S7-400 PLC and software of WinCC 6.0 & Step 7, an optimization control system has been designed for the pulp washing process.
The DCS structure is shown in Figure 19. This system is a three-level control system. The soft measurement model and correction model of residual soda and Baume degree are embedded into the DCS in the optimization station. The main interface of the pulp washing process in the engineer station is shown in Figure 20.
Through trial operation in the pilot paper mill, one week of operating results is compared with the results before optimization, as shown in Figures 21 and 22. Before optimization, the residual soda and Baume degree cannot meet the process requirements at the same time and fluctuate greatly. After optimization, both are kept at relatively stable values, and a higher Baume degree is achieved together with a lower residual soda. This resolves the incompatibility of the two indexes. In the actual production process, long-term monitoring of the dynamic deviation shows that the model has high prediction accuracy.

Conclusions
In order to realize the multiobjective optimization of high yield, low cost, and low consumption in the pulp washing process, an operation mode optimization method for the pulp washing process is proposed. The optimal control has been successfully operated in some paper mills in China, and remarkable economic benefits have been achieved. The multiobjective optimization subsystem can effectively balance the contradiction between the residual soda and Baume degree. In addition, the optimization subsystem provides the engineer with a set of recommended optimal operation parameters (including pulp concentration, pulp flow, and hot water flow), which increases the flexibility of pulp washing scheduling. Importantly, the outlet pulp increases substantially and the hot water consumption is decreased to the lower limit. The total cost of the pulp washing process is reduced, which brings an economic benefit for pulping enterprises.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.