A Novel Input Variable Selection and Structure Optimization Algorithm for Multilayer Perceptron-Based Soft Sensors

A novel optimization algorithm for multilayer perceptron (MLP) based soft sensors is proposed in this paper. The proposed approach integrates input variable selection and hidden layer optimization of the MLP into a constrained optimization problem. The nonnegative garrote (NNG) is implemented to perform the shrinkage of input variables and the optimization of the hidden layer simultaneously. The optimal garrote parameter of the NNG is determined by combining cross-validation with the Hannan-Quinn information criterion. The performance of the algorithm is demonstrated on an artificial dataset and a practical application to the desulfurization process of a thermal power plant. Comparative results demonstrate that the developed algorithm builds simpler and more accurate models than other state-of-the-art soft sensor algorithms.


Introduction
In complex industrial processes, important process parameters that influence product quality or energy consumption need to be monitored and controlled in real time and with high accuracy. However, some of them are difficult to measure directly with hardware sensors due to the limitations of existing field conditions [1][2][3]. Soft sensors achieve the mathematical modeling of these hard-to-measure parameters through auxiliary variables that are easy to measure [4,5]. Basically, there are two categories of soft sensor techniques: mechanism analysis-based approaches and data-driven approaches. The mechanism analysis-based approaches require an accurate understanding of the inherent mechanism of complex industrial processes, which is very difficult to obtain. Data-driven algorithms provide advanced alternatives based on statistical inference and machine learning techniques [6,7]. In recent years, data-driven soft sensors including principal component regression (PCR), partial least squares (PLS) regression, support vector machines (SVM), extreme learning machines (ELM), and artificial neural networks (ANNs) have been widely studied [8][9][10][11][12].
Due to their powerful nonlinear modeling competence, ANNs have become the most popular nonlinear modeling techniques. There are a variety of ANNs, such as convolutional neural networks (CNN) [13], generative adversarial networks (GAN) [14], radial basis networks, and recurrent neural networks (RNN) [15], each of which has its own characteristics and advantages. Among them, the multilayer perceptron (MLP) is the most widely used technique for nonlinear soft sensing owing to its outstanding nonlinear mapping capability and convenience of application. Heidari et al. [16] built an accurate predictive model of nanofluid viscosity with an MLP. Shen et al. [17] presented an MLP-based recursive sliding mode dynamic surface control scheme for a fully actuated surface vessel with uncertain dynamics and external disturbances. In [18], an MLP was applied to predict the water content of biodiesel and diesel blends in terms of temperature and composition.
With the rapid development of process automation, more and more variables are involved in the modern process industry. Redundant input variables increase the model complexity, prolong the training time, and decrease the predictive accuracy of the model [19,20]. Variable selection technology provides a good solution to this problem and has therefore been extensively studied [1,21,22]. Guo et al. [23] proposed an input variable selection method for a feedforward neural network (FNN) using the partial autocorrelation function and successfully forecasted the wind speed. Fock [24] proposed a new algorithm for the selection of input variables, in which the global sensitivity analysis technique was used to select the optimal input variables. Adil et al. [25] presented a new variable selection algorithm that used a heuristic method and minimum redundancy maximum relevance, and the experimental results showed better accuracy than other algorithms. In [26], a neural network-based soft sensor was developed to predict effluent concentrations in a biological wastewater treatment plant, in which principal component analysis (PCA) was implemented to select optimal input variables.
Nonnegative garrote (NNG) is a linear coefficient shrinkage approach based on penalty likelihood function. In recent years, it has been widely used in the variable selection of ANNs [27]. Sun et al. [28] utilized the NNG to compress the input weights of the MLP to achieve nonlinear variable selection, and the superiority of the proposed algorithm was proved through two artificial dataset examples and a real industrial application. In [29], a local search strategy was incorporated into the NNG-MLP to improve its performance. However, these algorithms only consider the selection of input variables and ignore the optimization of the internal structure of the MLP network. Actually, the redundant nodes of hidden layers worsen the performance of MLP as the redundant input variables do and even lead to overfitting of the model. Pan et al. [30] proposed a novel approach of simplifying the structure of deep neural network through regularization of network architecture. Anbananthen et al. [31] presented a pruning procedure, by which redundant links were deleted from the trained network. Monika and Venkatesan [32] designed a divisive ANN clustering algorithm to prune the neurons of the hidden layer of MLP, which promoted model accuracy. Fan et al. proposed an algorithm that utilized the least absolute shrinkage and selection operator (LASSO) to perform the selection of input variables and the optimization of the hidden layer of MLP, named dLASSO [33]. However, the variable selection and hidden layer optimization of dLASSO are independent of each other, which may cause the omission of the optimal solution.
According to our investigation, few existing methods deal with the redundancy of input variables and hidden layers of ANN models synchronously. In this paper, a novel algorithm that performs global dimension reduction and structure simplification for MLP-based soft sensors is proposed by elaborately combining NNG and MLP. The MLP is implemented to cope with the nonlinear dynamics of industrial processes, and the NNG is devised to conduct the selection of the input variables and the simplification of the hidden layer. To the best of our knowledge, this algorithm is an innovative design of a penalty function-based strategy for globally optimizing the structure of ANNs. The effectiveness of the developed algorithm is validated on an artificial dataset and in an application to a practical industrial process to provide informative analysis. The remainder of this paper is organized as follows. The background theories of the approach are reviewed in Section 2. Section 3 describes the detailed principles and development of the proposed algorithm. The simulation results and analysis of the artificial datasets and the practical industrial process are presented in Section 4. Finally, some concluding remarks are given in Section 5.

Theoretical Background
The architecture of the three-layer MLP discussed in this paper is demonstrated in Figure 1; it is composed of an input layer, a hidden layer, and an output layer. The number of neurons in the input layer depends on the number of variables (columns) of the input dataset, while the number of hidden neurons is usually chosen by trial and error. The mathematical expression of the studied MLP is

y = f( ∑_{h=1}^{q} v_h g( ∑_{i=1}^{p} w_{hi} x_i ) ),   (1)

where g(·) and f(·) denote the activation functions of the hidden and output layers, respectively, x = [x_1, x_2, ..., x_p] is the vector of input variables, w_{hi} and v_h are the weights of the hidden and output layers, and y is the output variable. For the linear regression problem,

y = ∑_{i=1}^{p} β_i x_i + ε,   (2)

where β = [β_1, β_2, ..., β_p]^T is the vector of magnitude coefficients and ε is the random error. Breiman proposed a constraint consisting of the summation of shrinkage coefficients c = [c_1, c_2, ..., c_p] and imposed it on the ordinary least squares (OLS) regression model [34]:

min_c ‖Y − ∑_{i=1}^{p} c_i β̂_i X_i‖², subject to c_i ≥ 0 and ∑_{i=1}^{p} c_i ≤ s,   (3)

in which β̂ represents the coefficient vector of the OLS estimate and s is the garrote parameter. X is the input dataset, in which each column X_i corresponds to a candidate input variable, and Y ∈ R^n is the dataset of the output variable. In [28], the NNG algorithm was devised to select the input variables of the MLP by imposing c on the input layer:

y = f( ∑_{h=1}^{q} v_h g( ∑_{i=1}^{p} w_{hi} c_i x_i ) ),   (4)

and equation (3) is consequently reformulated as

min_c ∑_{k=1}^{n} ( Y_k − f( ∑_{h=1}^{q} v_h g( ∑_{i=1}^{p} w_{hi} c_i X_{ki} ) ) )², subject to c_i ≥ 0 and ∑_{i=1}^{p} c_i ≤ s.   (5)
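Breiman's garrote step in equation (3) can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it computes the OLS estimate on toy data and then solves the constrained problem with SciPy's general-purpose SLSQP solver; the data, dimensions, and the value of s are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy data: y depends only on the first two of four candidate inputs.
n, p = 200, 4
X = rng.normal(size=(n, p))
Y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)

# Step 1: ordinary least squares estimate beta_hat.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Step 2: nonnegative garrote, equation (3): shrink each coefficient by a
# factor c_j >= 0 with sum(c_j) <= s, minimizing the residual sum of squares.
def nng(s):
    obj = lambda c: np.sum((Y - X @ (c * beta_hat)) ** 2)
    cons = [{"type": "ineq", "fun": lambda c: s - np.sum(c)}]
    res = minimize(obj, x0=np.full(p, s / p), bounds=[(0.0, None)] * p,
                   constraints=cons, method="SLSQP")
    return res.x

c = nng(s=1.5)  # a tight garrote parameter forces some c_j toward zero
print(np.round(c, 3))
```

With s this small, the two irrelevant inputs receive c_j ≈ 0 (i.e., they are deleted from the model), while the informative inputs keep most of their coefficient mass.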

Design of Global Optimization for MLP.
In this study, a global optimization algorithm for MLP-based soft sensors, called GNNG-MLP, is proposed to reduce the redundancy of the input and hidden layers simultaneously. The primary strategy of the proposed algorithm is to design a nonlinear quadratic optimization expression with an NNG constraint that imposes shrinkage coefficients on the input and hidden layers of the MLP. The GNNG-MLP is implemented with continuous adjustment of the garrote parameter. The schematic diagram of the proposed algorithm is illustrated in Figure 2, in which the nodes x_2 and h_2 have null impact on the model and will be removed from the MLP; the weight lines connected to them also become invalid. The proposed algorithm is divided into two phases. In the first phase, a well-trained MLP network is obtained with the conventional MLP training algorithm. In the second phase, a set of shrinkage coefficients is imposed on the input and hidden layers of the obtained MLP. Consequently, the expression of the MLP is reformulated as follows:

y = f( ∑_{h=1}^{q} c_h^H v_h g( ∑_{i=1}^{p} w_{hi} c_i^I x_i ) ),   (6)

where c^I = [c_1^I, ..., c_p^I] and c^H = [c_1^H, ..., c_q^H] are the shrinkage coefficients of the nodes of the input and hidden layers, respectively. The optimal coefficients c*_I and c*_H are obtained by solving the following formula:

min_{c^I, c^H} ∑_{k=1}^{n} ( Y_k − f( ∑_{h=1}^{q} c_h^H v_h g( ∑_{i=1}^{p} w_{hi} c_i^I X_{ki} ) ) )², subject to c_i^I ≥ 0, c_h^H ≥ 0, and ∑_{i=1}^{p} c_i^I + ∑_{h=1}^{q} c_h^H ≤ s,   (7)

where c*_i = 0 indicates that the input variable x_i is removed from the MLP and c*_h = 0 means that the hidden node h_h is excluded from the model. Equation (7) is a nonlinear quadratic optimization problem with constraints that can be solved with the trust-region reflective optimization algorithm [35]. After that, the optimal predictive model of the MLP is given by

y = f( ∑_{h=1}^{q} c_h^{H*} v_h g( ∑_{i=1}^{p} w_{hi} c_i^{I*} x_i ) ).   (8)
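Equation (7) can be illustrated with a small numerical sketch. This is a stand-in under stated assumptions, not the paper's implementation: a tiny "pretrained" MLP with a tanh hidden layer and linear output is fixed by hand (with one deliberately useless input and one useless hidden node, mimicking x_2 and h_2 in Figure 2), and the shrinkage coefficients are found with SciPy's SLSQP solver in place of the trust-region reflective algorithm of [35].

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p, q = 300, 3, 4  # samples, input nodes, hidden nodes

# A hand-made "pretrained" MLP (tanh hidden, linear output).  Input x_2 and
# hidden node h_2 are made useless by zero weights, as in Figure 2.
W = rng.normal(size=(q, p)); W[:, 2] = 0.0; W[2, :] = 0.0  # hidden weights
v = rng.normal(size=q);      v[2] = 0.0                    # output weights

X = rng.normal(size=(n, p))
Y = np.tanh(X @ W.T) @ v + 0.01 * rng.normal(size=n)

# Equation (6): garrote coefficients c_I on inputs and c_H on hidden nodes.
def predict(c):
    cI, cH = c[:p], c[p:]
    return np.tanh((X * cI) @ W.T) @ (cH * v)

def objective(c):
    return np.sum((Y - predict(c)) ** 2)

# Equation (7): nonnegative coefficients with a shared budget s.
s = 5.0
res = minimize(objective, x0=np.full(p + q, s / (p + q)),
               bounds=[(0.0, None)] * (p + q),
               constraints=[{"type": "ineq", "fun": lambda c: s - np.sum(c)}],
               method="SLSQP")
cI, cH = res.x[:p], res.x[p:]
print(np.round(cI, 2), np.round(cH, 2))  # c_i = 0 / c_h = 0 -> node pruned
```

Because the budget s is smaller than p + q, the optimizer spends it on the informative nodes and drives the coefficients of the useless input and hidden node toward zero, which is exactly the pruning behavior sketched in Figure 2.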

Determination of Parameter s.
The choice of the parameter s is very important for the developed algorithm because it directly affects the extent of shrinkage on the MLP structure. s = 0 implies that all input variables and hidden nodes will be eliminated. When s ≥ p + q, all input variables and hidden nodes will be completely preserved. Therefore, the value of s directly determines the number of neurons and influences the performance of the MLP.
This paper adopts an enumeration approach to select the optimal s from the vector S = [s_1, s_2, ..., s_u]. Herein, s_1 is set to a constant close to zero, and s_u is set to p + q. The other values of S are evenly distributed between s_1 and s_u.
In this paper, the Hannan-Quinn information criterion (HQ) [36], which balances the accuracy and complexity of a model, is adopted as the model evaluation criterion. It is formulated as

HQ = n ln( (1/n) ∑_{j=1}^{n} (y_j − ŷ_j)² ) + 2k ln(ln n),   (9)

where n denotes the number of data samples, k represents the number of input variables, and y and ŷ are the actual and predicted values of the output variable, respectively. Considering the randomness of ANNs, the V-fold cross-validation (CV) method is used to validate the model. The execution is described as follows. First, the whole dataset is evenly separated into V subdatasets. Second, a single subdataset is taken as the validation dataset, and the other V−1 subdatasets are used as the training dataset to obtain the trained MLP. The procedure is repeated V times, and the V results are averaged to give the ultimate estimate. In this work, s is chosen by V-fold CV with HQ, whose pseudocode is shown in Algorithm 1.
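The selection loop of Algorithm 1 can be sketched as follows. This is a simplified stand-in: instead of solving the garrote problem inside each fold, s here simply controls how many of the leading inputs an OLS fit may use, which is enough to show the interplay of the HQ criterion, the V-fold split, and the enumeration grid S.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 6
X = rng.normal(size=(n, p))
Y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=n)

def hq(y, y_hat, k):
    # Hannan-Quinn criterion: m * ln(RSS/m) + 2k * ln(ln m)
    m = len(y)
    return m * np.log(np.mean((y - y_hat) ** 2)) + 2 * k * np.log(np.log(m))

V = 5
folds = np.array_split(rng.permutation(n), V)
S = np.linspace(1, p, p)  # enumeration grid s_1 ... s_u

cv_hq = []
for s in S:
    k = int(round(s))     # stand-in: s controls how many inputs survive
    scores = []
    for v in range(V):
        val = folds[v]
        tr = np.concatenate([folds[j] for j in range(V) if j != v])
        beta, *_ = np.linalg.lstsq(X[tr][:, :k], Y[tr], rcond=None)
        scores.append(hq(Y[val], X[val][:, :k] @ beta, k))
    cv_hq.append(np.mean(scores))

s_star = S[int(np.argmin(cv_hq))]
print("optimal s:", s_star)
```

Because only the first two inputs carry signal, the HQ penalty 2k ln(ln n) outweighs the negligible fit improvement of extra variables, and the CV loop settles on s* at the two-variable model.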


The Computational Procedure of the Proposed Algorithm.
In this paper, a global optimization algorithm for the MLP is developed. The advancement of the proposed algorithm is that it not only deals with the redundancy of the input variables but also simplifies the internal structure of the MLP. The overall computation flow of the algorithm is described as follows: Step 1. Initialization: obtain a trained MLP with the training dataset {X, Y}.
Step 2. Impose the NNG coefficients on the input and hidden nodes of the MLP.
Step 3. Perform Algorithm 1 to obtain the optimal s as s * .
Step 4. Acquire the shrinkage coefficient c * I and c * H by solving equation (7) with parameter s * .
Step 5. Update the weights of the input and hidden nodes by substituting c*_I and c*_H into equation (8).
Step 6. Remove the columns whose corresponding coefficient c*_i = 0 from {X, Y}, and delete the hidden nodes whose corresponding coefficient c*_h = 0.
Step 7. Output the optimized MLP.
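Steps 5 and 6 above can be sketched as a small helper. The function name and array shapes are illustrative assumptions, not from the paper: the garrote coefficients are folded into the weight matrices, and inputs or hidden nodes with zero coefficients are dropped.

```python
import numpy as np

def prune_mlp(W, v, c_I, c_H, tol=1e-6):
    # Step 5: fold the garrote coefficients into the weights, so the pruned
    # network needs no extra scaling at prediction time.
    W_scaled = W * c_I[None, :]   # scale each input column by c_I
    v_scaled = v * c_H            # scale each hidden node's output weight
    # Step 6: drop inputs with c_i = 0 and hidden nodes with c_h = 0.
    keep_in = np.where(c_I > tol)[0]
    keep_hid = np.where(c_H > tol)[0]
    return W_scaled[np.ix_(keep_hid, keep_in)], v_scaled[keep_hid], keep_in

# Example: 3 inputs, 4 hidden nodes; input 1 and hidden node 1 are pruned.
W = np.ones((4, 3)); v = np.ones(4)
c_I = np.array([1.0, 0.0, 0.8]); c_H = np.array([0.5, 0.0, 1.0, 1.0])
W2, v2, kept = prune_mlp(W, v, c_I, c_H)
print(W2.shape, v2.shape, kept)  # (3, 2) (3,) [0 2]
```

Returning `keep_in` lets the caller also drop the corresponding columns of the dataset {X, Y}, as Step 6 requires.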

Experimental Setting.
ALGORITHM 1: Pseudocode for the choice of s.
Begin
  For i = 1 to u
    s ← S(i);
    For v = 1 to V
      Train the MLP on the training folds;
      Solve equation (7) with s to get c*_I, c*_H;
      Get the new MLP by equation (8);
      Compute HQ(v) with the validation dataset {X_v, Y_v};
    End for
    CV_HQ(i) ← mean(HQ);
  End for
  Output the optimal s* with the minimum CV_HQ;
End

In this paper, comprehensive simulations are implemented to verify the performance of the proposed algorithm, including comparisons with other state-of-the-art variable selection algorithms such as SBS-MLP [37], NNGEO-MLP [29], and dLASSO-MLP [38]. All algorithms are simulated under the same settings. The MLP structure in each case is a typical three-layer configuration, in which the activation functions of the hidden and output layers are the hyperbolic tangent and linear functions, respectively. The initial number of hidden nodes is determined by some trial runs. Training and testing data take up 80% and 20% of the overall dataset, respectively. 5-fold CV is employed in the algorithm. The performance of the involved algorithms is assessed with the following five measures.
(1) MSE: the mean square error between the predicted and the actual values on the testing dataset, MSE = (1/n_t) ∑_{i=1}^{n_t} (y_i − ŷ_i)². (2) A_R²: the adjusted coefficient of determination, based on R² = 1 − ∑ (y_i − ŷ_i)² / ∑ (y_i − ȳ)², where ȳ is the mean value of the output variable. (3) FS+: the number of irrelevant variables wrongly selected. (4) FS−: the number of relevant variables wrongly removed. (5) Neurons: the number of hidden nodes retained in the final model.
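Under the assumption that FS+ and FS− count wrongly selected and wrongly removed variables (as their usage in the results suggests), the measures can be computed as follows; the data values are invented for the example.

```python
import numpy as np

def mse(y, y_hat):
    # (1) mean square error on the testing dataset
    return np.mean((y - y_hat) ** 2)

def adj_r2(y, y_hat, k):
    # (2) adjusted coefficient of determination; y_bar is the output mean
    n = len(y)
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def fs_counts(selected, relevant):
    # (3) FS+: irrelevant variables selected; (4) FS-: relevant ones missed
    selected, relevant = set(selected), set(relevant)
    return len(selected - relevant), len(relevant - selected)

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(mse(y, y_hat))                    # ≈ 0.025
print(adj_r2(y, y_hat, k=1))            # ≈ 0.97
print(fs_counts([0, 1, 5], [0, 1, 2]))  # (1, 1)
```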

Simulation Results of Artificial Dataset.
In this subsection, a nonlinear model proposed in [28] is applied to generate artificial datasets. The input dataset X was produced from a multivariate normal distribution with covariance matrix Σ, in which the covariance between two different variables (columns) is Σ_{i,j} = ρ^{|i−j|}, ∀i ≠ j. The model is built on the relevant variables X_1 ∈ R^{1000×10}, white Gaussian noise ε, and the coefficient vector β = [3.0, 1.5, 2.0, 4.0, 0.5, 1.3, −2.6, −3.5, −5.1, 2.0]^T. Besides the relevant variables, an irrelevant dataset X_2 ∈ R^{1000×40} is produced to make this case a problem of selecting 10 relevant variables out of 50. Table 1 presents the statistical results of the artificial dataset with different algorithms after 20 runs. In this case, ρ of the covariance matrix is set to 0.8, which generates a dataset with high correlation between different variables. According to the numerical comparison of MSE and A_R², GNNG-MLP has the highest prediction accuracy among all algorithms. Furthermore, its FS+ is the smallest, which indicates that GNNG-MLP selects fewer irrelevant variables than the other approaches. By comprehensive comparison of FS+ and FS−, it can be concluded that our algorithm selects relevant variables more precisely. Besides, the statistical results on neurons show that GNNG-MLP can effectively remove redundant nodes and thereby improve the performance of the model. It can be found from the results that our algorithm solves the problems of input variable and model redundancy simultaneously.
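The dataset construction can be sketched as follows. Since the exact nonlinear response of [28] is not reproduced in the text, a linear stand-in response built from X_1·β is used here; only the input-generation step (the ρ^|i−j| covariance) follows the paper directly.

```python
import numpy as np

def make_inputs(n=1000, p_rel=10, p_irr=40, rho=0.8, seed=0):
    rng = np.random.default_rng(seed)
    p = p_rel + p_irr
    # Covariance matrix with Sigma_{i,j} = rho^|i-j| between columns.
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    return rng.multivariate_normal(np.zeros(p), Sigma, size=n)

beta = np.array([3.0, 1.5, 2.0, 4.0, 0.5, 1.3, -2.6, -3.5, -5.1, 2.0])
X = make_inputs()
# Stand-in response: linear in X1 = X[:, :10] plus white Gaussian noise.
# (The paper's actual response is a nonlinear function of these terms.)
rng = np.random.default_rng(1)
y = X[:, :10] @ beta + rng.normal(size=len(X))
print(X.shape, y.shape)  # (1000, 50) (1000,)
```

Varying `rho` reproduces the collinearity sweep of Figure 3: the closer two columns are in index, the more strongly they are correlated.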
In addition, the capability of the different algorithms is further compared by changing the value of the collinearity ρ. Figure 3 shows the comparison of the five indicators for different values of ρ. It can be seen that GNNG-MLP consistently yields the lowest MSE, meaning that our algorithm always has the best accuracy. The number of hidden layer nodes with GNNG-MLP is always the lowest, which proves the efficiency of our approach in reducing redundancy. Moreover, our algorithm also performs best on the other indicators in most cases, which demonstrates that our algorithm has the best stability.

Application to an Actual Desulfurization Process of a Power Plant.
In this section, the developed algorithm was applied to forecast the SO2 emissions of a desulfurization process in a thermal power plant in China. The structural diagram of the process is shown in Figure 4. This power plant adopts limestone-gypsum wet flue gas desulfurization technology, which includes an SO2 absorption system, a flue gas system, and a compressed air system. The technology mainly uses lime and limestone to absorb SO2 by chemical reactions that are shown as follows:

CaCO3 + SO2 + (1/2)H2O ⟶ CaSO3·(1/2)H2O + CO2,   (11)
CaO + SO2 + (1/2)H2O ⟶ CaSO3·(1/2)H2O.   (12)

The limestone slurry entering the primary absorption tower is dissolved in the absorption tower slurry pool. By adjusting the amount of limestone slurry entering the absorption tower or the concentration of the slurry discharged from it, the pH value of the absorption tower slurry pool is maintained between 5.5 and 6.5 to ensure limestone dissolution and SO2 absorption. The original flue gas first enters the primary absorption tower, passes through the spray zone in countercurrent, is fully contacted with the slurry to absorb SO2, and then enters the secondary absorption tower, where the remaining SO2 and other harmful components in the flue gas are absorbed in the spray zone. Finally, the dust is removed by a wet dust collector, and the flue gas is discharged through the chimney. The two absorption towers adopt almost the same structure, which is demonstrated in Figure 5. Table 2 presents the statistical results of 20 runs with different soft sensor algorithms. It can be found that GNNG-MLP has better prediction accuracy with a smaller number of neurons than the other approaches. This result shows that GNNG-MLP can improve the accuracy of the model by simplifying the internal structure of the MLP. Figure 6 shows the comparison of the predicted and actual values of the target variable with our algorithm. Obviously, the proposed algorithm can effectively track the dynamic changes of the target variable.
In order to further prove the accuracy of the proposed algorithm, error comparisons between the measured and the predicted SO2 concentration with different algorithms are presented in Figure 7. The results show that the error of GNNG-MLP is the lowest and lies within the range [−4.2, 4.2] in most instances, which meets the requirements of field operation. The performance of the developed soft sensor fully complies with industrial demand standards.
Besides, comparative analyses based on the statistical results of variable selection and the actual industrial operating experience are given. Figure 8 presents the frequency of input variable selection over 100 runs. It can be found from Figure 8 that variable 13 is included in all solutions, and variables 17 and 30 are selected more than 80 times.

Mathematical Problems in Engineering
According to these statistics, the input variable most relevant to the output variable is variable 13. According to the system manual, variable 13 is the SO2 concentration of the flue gas at the #9-1AT outlet, which equals the SO2 concentration at the #9-2AT inlet. Obviously, this variable is highly related to the SO2 concentration of the final emission. Variable 17, the limestone slurry flow to #9 AT, has a selection frequency of 90%. It can be seen from formulas (11) and (12) that the CaO and CaCO3 in the limestone slurry absorb the released SO2. Therefore, variable 17 is included in the optimal solution. Variable 30 is the pH value of the slurry in tower 9-2. The slurry absorbs more SO2 when the SO2 concentration in the flue gas is relatively high; as a result, a large number of hydrogen ions are generated, and the pH value decreases.

Conclusions
This paper proposed a new optimization algorithm for MLP-based soft sensors with the NNG. The advantage of this algorithm is that it can simultaneously perform the selection of the input layer and the optimization of the hidden layer of the MLP and is therefore more likely to reach the globally optimal model. The simulation results on the artificial datasets demonstrate that GNNG-MLP has obvious advantages in both the number of neurons and the generalization performance of the model. In addition, the algorithm is applied to forecast the SO2 emission in a desulfurization process to verify the readings of the online analyzer. Comprehensive results and comparisons prove that the developed soft sensor has remarkable model simplicity and accuracy. The proposed soft sensor can be further implemented for the optimization and control design of the desulfurization process.

Data Availability
The data used to support the findings of this study are currently under embargo, while the research findings are commercialized. Requests for data 24 months after publication of this article will be considered by the corresponding author.

Conflicts of Interest
The authors declare no conflicts of interest.