Comparing the Performance of Several Multivariate Control Charts Based on Residual of Multioutput Least Square SVR (MLS-SVR) Model in Monitoring Water Production Process

Water is a basic human need, and it must be treated before it can be used. Water quality control at the Tirtanadi Water Treatment Plant is still univariate, although in theory the water quality characteristics are correlated and, because the process is continuous, also autocorrelated. In this study, quality control is performed on three main water quality characteristics, namely acidity (pH), chlorine residual (ppm), and turbidity (NTU), using several multivariate control charts based on Multioutput Least Square Support Vector Regression (MLS-SVR) residuals. MLS-SVR modelling is used to remove the autocorrelation. The inputs of the MLS-SVR model are determined from the significant lags of the Partial Autocorrelation Function (PACF), which in this study is the first lag. With these inputs and the optimal combination of hyper-parameters, the MLS-SVR model produces residuals that are no longer autocorrelated. The residuals are used to develop the Hotelling's T², Multivariate Exponentially Weighted Moving Average (MEWMA), and Multivariate Cumulative Sum (MCUSUM) control charts. In phase I, we found that the process is statistically in control. Meanwhile, in phase II, the monitoring results show several out-of-control observations.


Introduction
Clean water is a basic human need, and it must be treated before it can be used. Acidity (pH), chlorine residual, and turbidity are the three main characteristics used to monitor water quality. Currently, water quality control at the Tirtanadi Water Treatment Plant is carried out univariately for each quality characteristic. In theory, however, the three characteristics are correlated, and they are also autocorrelated because the water treatment and purification process runs continuously. Since the three water quality characteristics are correlated, this study proposes using multivariate control charts.
A control chart is a tool for describing a quality characteristic measured from samples [1]. Multivariate control charts can be grouped into three types, namely Shewhart [2][3][4][5][6][7], Multivariate Cumulative Sum (MCUSUM) [8][9][10][11], and Multivariate Exponentially Weighted Moving Average (MEWMA) [12][13][14][15]. Conventional multivariate control charts assume that the observations are mutually independent [1]. The water quality data, however, show serial dependence, or autocorrelation, which causes false alarms if it is not addressed. Autocorrelation is a condition in which successive observations are related [16]. Autocorrelation between observations inflates the type I error rate and leads to misleading conclusions about the control state of the process. Two procedures can be employed to monitor autocorrelated data. The first is to monitor the autocorrelated data using a conventional control chart with modified control limits. The second is to develop a control chart based on residuals. This approach fits a time-series model to the data to capture the relationship between successive observations, and the residuals of the model are then used to monitor the process. Because the residuals of an adequate model are approximately independent, they can be monitored with conventional charts. Besides time-series methods, residuals can also be obtained with the Multioutput Least Square Support Vector Regression (MLS-SVR) method [17].
In this study, the quality control of the water production process is carried out using several types of multivariate control charts, namely Hotelling's T², MEWMA, and MCUSUM. The results of the three charts are compared to see which control chart performs best. The remaining parts of this paper are organized as follows: Section 2 presents the MLS-SVR algorithm. Section 3 discusses the methodology of this research. Section 4 presents the results and discussion. Finally, Section 5 presents the conclusions.
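The serial dependence described above can be checked numerically before any chart is built. The following is a minimal sketch, assuming a univariate series per quality characteristic: it computes the sample autocorrelation function and, through the Durbin-Levinson recursion, the sample PACF that the abstract uses to choose the model inputs. The function names `sample_acf` and `sample_pacf` are illustrative and do not come from the study.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    c0 = np.dot(x, x)
    lags = range(1, max_lag + 1)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / c0 for k in lags])

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = np.ones(max_lag + 1)
    phi_prev = np.array([])  # coefficients of the order k-1 autoregression
    for k in range(1, max_lag + 1):
        num = r[k] - np.dot(phi_prev, r[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, r[1:k])
        phi_kk = num / den
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
    return pacf

# Toy check: an alternating series has a strongly negative lag-1 correlation.
toy = np.array([1.0, -1.0] * 4)
print(sample_pacf(toy, 1)[1])
```

A lag is usually treated as significant when its partial autocorrelation falls outside the approximate 95% band of ±1.96/√n, where n is the series length. At lag 1 the PACF coincides with the ACF.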

Multioutput least square support vector regression (MLS-SVR)
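In this section, the MLS-SVR algorithm is presented. The basic Least Square SVR algorithm only learns the mapping from the input to a single output, whereas MLS-SVR learns the mapping to several outputs simultaneously. Let $\{(\mathbf{x}_i, \mathbf{y}_i)\}_{i=1}^{n}$ be the training data with input $\mathbf{x}_i \in \mathbb{R}^{d}$ and output $\mathbf{y}_i = (y_{i1}, \ldots, y_{im})^{T}$, where $i = 1, 2, \ldots, n$ indexes the $n$ samples and $j = 1, 2, \ldots, m$ indexes the $m$ output variables. The outputs are collected in the matrix $\mathbf{Y} = [y_{ij}] \in \mathbb{R}^{n \times m}$. The MLS-SVR algorithm uses a kernel function to perform a nonlinear mapping $\varphi: \mathbb{R}^{d} \rightarrow \mathbb{R}^{n_h}$, where $n_h$ is the dimension of a higher-dimensional Hilbert space.
MLS-SVR solves the problem by looking for the parameters $\mathbf{W} = (\mathbf{w}_1, \ldots, \mathbf{w}_m)$, with $\mathbf{w}_j = \mathbf{w}_0 + \mathbf{v}_j$, and $\mathbf{b} = (b_1, \ldots, b_m)^{T}$ that minimize the objective function with the following constraint:

$$\min_{\mathbf{w}_0, \mathbf{V}, \mathbf{b}} J(\mathbf{w}_0, \mathbf{V}, \boldsymbol{\Xi}) = \frac{1}{2}\mathbf{w}_0^{T}\mathbf{w}_0 + \frac{1}{2}\frac{\gamma'}{m}\,\mathrm{trace}(\mathbf{V}^{T}\mathbf{V}) + \frac{\gamma''}{2}\,\mathrm{trace}(\boldsymbol{\Xi}^{T}\boldsymbol{\Xi}) \quad \text{subject to} \quad \mathbf{Y} = \mathbf{Z}^{T}\mathbf{W} + \mathbf{1}_n\mathbf{b}^{T} + \boldsymbol{\Xi} \tag{1}$$

where $\mathbf{Z} = (\varphi(\mathbf{x}_1), \ldots, \varphi(\mathbf{x}_n))$, $\mathbf{V} = (\mathbf{v}_1, \ldots, \mathbf{v}_m)$, $\boldsymbol{\Xi} \in \mathbb{R}^{n \times m}$ is the matrix of errors, and $\gamma', \gamma'' > 0$ are regularization parameters. The optimization in equation 1 can be made into a Lagrange function as follows:

$$L(\mathbf{w}_0, \mathbf{V}, \mathbf{b}, \boldsymbol{\Xi}, \mathbf{A}) = J(\mathbf{w}_0, \mathbf{V}, \boldsymbol{\Xi}) - \mathrm{trace}\left(\mathbf{A}^{T}\left(\mathbf{Z}^{T}\mathbf{W} + \mathbf{1}_n\mathbf{b}^{T} + \boldsymbol{\Xi} - \mathbf{Y}\right)\right) \tag{2}$$

where $\mathbf{A} = (\boldsymbol{\alpha}_1, \ldots, \boldsymbol{\alpha}_m) \in \mathbb{R}^{n \times m}$ is a matrix containing the Lagrange multipliers.
Furthermore, to simplify the solution above, the partial derivatives of equation 2 are set to zero and $\mathbf{W}$ and $\boldsymbol{\Xi}$ are eliminated, so that the following equation is formulated:

$$\begin{bmatrix} \mathbf{0}_{m \times m} & \mathbf{P}^{T} \\ \mathbf{P} & \mathbf{H} \end{bmatrix} \begin{bmatrix} \mathbf{b} \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} \mathbf{0}_m \\ \mathbf{y} \end{bmatrix} \tag{3}$$

where $\mathbf{P} = \mathrm{blockdiag}(\mathbf{1}_n, \ldots, \mathbf{1}_n) \in \mathbb{R}^{nm \times m}$, $\mathbf{H} = \boldsymbol{\Omega} + \frac{1}{\gamma''}\mathbf{I}_{nm} + \frac{m}{\gamma'}\mathbf{Q}$, $\boldsymbol{\Omega} = \mathbf{1}_{m \times m} \otimes \mathbf{K}$, $\mathbf{Q} = \mathrm{blockdiag}(\mathbf{K}, \ldots, \mathbf{K})$, $\mathbf{K} = \mathbf{Z}^{T}\mathbf{Z}$ is the kernel matrix, $\boldsymbol{\alpha} = \mathrm{vec}(\mathbf{A})$, and $\mathbf{y} = \mathrm{vec}(\mathbf{Y})$.
Equation 3 can be solved through two sets of linear equations, $\mathbf{H}\boldsymbol{\eta} = \mathbf{P}$ and $\mathbf{H}\boldsymbol{\nu} = \mathbf{y}$, each of which has the positive definite coefficient matrix $\mathbf{H}$. The solution is $\mathbf{b}^{*} = (\mathbf{P}^{T}\boldsymbol{\eta})^{-1}\boldsymbol{\eta}^{T}\mathbf{y}$ and $\boldsymbol{\alpha}^{*} = \boldsymbol{\nu} - \boldsymbol{\eta}\mathbf{b}^{*}$, so that the MLS-SVR decision function for output $j$ can be obtained as follows:

$$f_j(\mathbf{x}) = \sum_{i=1}^{n}\sum_{k=1}^{m} \alpha^{*}_{ik} K(\mathbf{x}, \mathbf{x}_i) + \frac{m}{\gamma'}\sum_{i=1}^{n} \alpha^{*}_{ij} K(\mathbf{x}, \mathbf{x}_i) + b^{*}_j, \quad j = 1, 2, \ldots, m \tag{4}$$

In this study, the Radial Basis Function (RBF) kernel is used, which can be written as follows:

$$K(\mathbf{x}_i, \mathbf{x}_k) = \exp\left(-\sigma \lVert \mathbf{x}_i - \mathbf{x}_k \rVert^{2}\right) \tag{5}$$

with $\sigma > 0$. The RBF kernel was chosen because of its ease of use and efficient computation time. In the next step, the optimal hyper-parameters $(\gamma', \gamma'', \sigma)$ are selected using the grid search method with the Mean Square Error (MSE) criterion as follows:

$$\mathrm{MSE} = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(y_{ij} - \hat{y}_{ij}\right)^{2} \tag{6}$$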
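The fitting procedure described in this section can be sketched in a few lines of NumPy. This is a minimal sketch, assuming the commonly published MLS-SVR linear system with a shared weight vector plus output-specific deviations. It is not the code used in the study. The names `gamma1` (penalty on the output-specific weights), `gamma2` (penalty on the errors), and `sigma` (RBF width) are hypothetical labels for the three hyper-parameters, and the grid search scores the in-sample MSE because the text does not state whether a hold-out set was used.

```python
import itertools
import numpy as np

def rbf_kernel(A, B, sigma):
    """K[i, j] = exp(-sigma * ||A[i] - B[j]||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sigma * d2)

def mlssvr_fit(X, Y, gamma1, gamma2, sigma):
    """Solve the MLS-SVR system; returns alpha (n x m) and b (m,)."""
    n, m = Y.shape
    K = rbf_kernel(X, X, sigma)
    # H = Omega + I / gamma2 + (m / gamma1) * blockdiag(K, ..., K)
    H = (np.tile(K, (m, m)) + np.eye(n * m) / gamma2
         + (m / gamma1) * np.kron(np.eye(m), K))
    P = np.kron(np.eye(m), np.ones((n, 1)))   # block-diagonal ones, (n*m, m)
    y = Y.reshape(-1, order="F")              # stack the outputs column by column
    eta = np.linalg.solve(H, P)               # first system:  H eta = P
    nu = np.linalg.solve(H, y)                # second system: H nu = y
    b = np.linalg.solve(P.T @ eta, eta.T @ y)
    alpha = (nu - eta @ b).reshape(n, m, order="F")
    return alpha, b

def mlssvr_predict(X_new, X_train, alpha, b, gamma1, sigma):
    """Shared part + output-specific part + bias, for every output."""
    m = alpha.shape[1]
    KA = rbf_kernel(X_new, X_train, sigma) @ alpha
    return KA.sum(axis=1, keepdims=True) + (m / gamma1) * KA + b

def grid_search(X, Y, grid1, grid2, grid_sigma):
    """Return (mse, gamma1, gamma2, sigma) with the smallest in-sample MSE."""
    best = None
    for g1, g2, s in itertools.product(grid1, grid2, grid_sigma):
        alpha, b = mlssvr_fit(X, Y, g1, g2, s)
        pred = mlssvr_predict(X, X, alpha, b, g1, s)
        mse = np.mean((Y - pred) ** 2)
        if best is None or mse < best[0]:
            best = (mse, g1, g2, s)
    return best

# Synthetic stand-in data (not the plant data): 30 samples, 3 inputs, 3 outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
Y = np.column_stack([np.sin(X[:, 0]), X[:, 1] ** 2, X[:, 2]])
Y = Y + 0.05 * rng.normal(size=(30, 3))
alpha, b = mlssvr_fit(X, Y, gamma1=1.0, gamma2=10.0, sigma=0.5)
residuals = Y - mlssvr_predict(X, X, alpha, b, gamma1=1.0, sigma=0.5)
```

In this formulation the training residuals equal `alpha / gamma2`, which gives a quick consistency check on an implementation. Because an in-sample MSE tends to favour weak regularization, a validation split may be preferable in practice.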

Dataset
The data analyzed are obtained from water samples processed at the Tirtanadi Water Treatment Plant from August 1, 2020, to January 24, 2021. The data are daily measurements of the reservoir water quality and are split into two phases. Phase I covers August 1 to October 31, 2020, and phase II covers November 1, 2020, to January 24, 2021.

Data structure and variables
The data structure of this research is presented in Table 1. Meanwhile, the specification of each quality characteristic is tabulated in Table 2.

Research steps
The steps taken in this research are defined as follows:
1. Formulate the problem and determine the research objectives.
2. Collect and input the water quality data.
3. Find the optimal hyper-parameters of the MLS-SVR model in phase I.
4. Calculate the residuals of the MLS-SVR model for phase I.
5. Calculate the residuals of the MLS-SVR model for phase II using the optimal hyper-parameters from phase I.
6. Monitor the residuals of the water quality data using three control charts, namely Hotelling's T², MEWMA, and MCUSUM.
7. Compare the performance of the three charts.
8. Draw conclusions and suggestions.
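The data preparation behind the residual steps above can be sketched as follows. This is a minimal sketch, assuming one observation per calendar day (92 days in phase I and 85 days in phase II) and lagged values of all three characteristics as model inputs. The helper name `make_lagged` is hypothetical, and the series below is synthetic rather than the plant data.

```python
import numpy as np

def make_lagged(data, lag=1):
    """Build inputs from lags 1..lag of every column and the matching targets."""
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or len(data) <= lag:
        raise ValueError("data must be 2-D with more rows than `lag`")
    n = len(data)
    # Column block j holds the observations j steps before each target row.
    X = np.hstack([data[lag - j:n - j] for j in range(1, lag + 1)])
    Y = data[lag:]
    return X, Y

# Synthetic stand-in for the daily (pH, chlorine residual, turbidity) series.
rng = np.random.default_rng(0)
series = rng.normal(size=(92 + 85, 3))

X, Y = make_lagged(series, lag=1)
n1 = 92 - 1                      # phase I targets: days 2..92 (day 1 has no lag)
X_phase1, Y_phase1 = X[:n1], Y[:n1]
X_phase2, Y_phase2 = X[n1:], Y[n1:]
print(X_phase1.shape, X_phase2.shape)
```

Under this layout the first phase II input is the last phase I observation. Whether the study handled the phase boundary this way is not stated, so the split is an assumption.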

Descriptive statistics
In this subsection, the descriptive statistics for phases I and II are presented. The descriptive statistics of the production water quality in phase I are displayed in Table 3, while Table 4 presents those for phase II. Tables 3 and 4 show that the mean of each variable meets the specification, which suggests that the process meets the requirements. However, a control chart is still needed to determine whether the process is in control or whether there are out-of-control observations.

Monitoring results for phase I
... with an MSE of 0.0107. Figure 1 compares the original values with the MLS-SVR predictions in phase I. The figure shows that the predicted values are almost the same as the original values. Figure 2 shows the monitoring results for phase I using the three types of control charts. The MCUSUM chart is constructed using k = 0.5, while the MEWMA chart is constructed using λ = 0.25. None of the control charts signals an out-of-control observation, which indicates that the process in phase I is statistically in control.
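The three monitoring statistics can be computed from the residual matrix as sketched below. This is a minimal sketch, assuming the individual-observation Hotelling's T², the MEWMA with its exact time-varying covariance, and Crosier's form of the MCUSUM. The text does not state which MCUSUM variant or which control limits were used, so the limits are left to the reader and all function names are illustrative.

```python
import numpy as np

def hotelling_t2(R, mu, S):
    """T2_i = (r_i - mu)' S^{-1} (r_i - mu) for every residual row r_i."""
    D = np.asarray(R, dtype=float) - mu
    return np.einsum("ij,jk,ik->i", D, np.linalg.inv(S), D)

def mewma(R, mu, S, lam=0.25):
    """MEWMA statistic with the exact covariance of the smoothed vector."""
    D = np.asarray(R, dtype=float) - mu
    S_inv = np.linalg.inv(S)
    z = np.zeros(D.shape[1])
    out = np.empty(len(D))
    for i, d in enumerate(D, start=1):
        z = lam * d + (1.0 - lam) * z
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        out[i - 1] = z @ S_inv @ z / scale
    return out

def mcusum(R, mu, S, k=0.5):
    """Crosier-type MCUSUM statistic (assumed variant)."""
    D = np.asarray(R, dtype=float) - mu
    S_inv = np.linalg.inv(S)
    s = np.zeros(D.shape[1])
    out = np.empty(len(D))
    for i, d in enumerate(D):
        v = s + d
        c = np.sqrt(v @ S_inv @ v)
        # Shrink the cumulative sum toward zero, or reset it when c <= k.
        s = v * (1.0 - k / c) if c > k else np.zeros_like(v)
        out[i] = np.sqrt(s @ S_inv @ s)
    return out

# Synthetic phase I residuals (not the study's residuals).
rng = np.random.default_rng(0)
res1 = rng.normal(size=(91, 3))
mu_hat = res1.mean(axis=0)
S_hat = np.cov(res1, rowvar=False)
t2 = hotelling_t2(res1, mu_hat, S_hat)
ew = mewma(res1, mu_hat, S_hat, lam=0.25)
cu = mcusum(res1, mu_hat, S_hat, k=0.5)
```

In phase II the same `mu_hat` and `S_hat` estimated from the phase I residuals would be reused, and an observation is flagged when its statistic exceeds the chosen control limit, for example `(cu > h).sum()` for a user-supplied limit `h`.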

Monitoring results for phase II
The monitoring results of the phase II control charts are displayed in this subsection. The phase II data were taken from November 1, 2020, to January 24, 2021, and are treated as the testing data. Figure 3 shows that the predicted values of some quality characteristics have difficulty following the original data, particularly for acidity and turbidity. Figure 4 shows the monitoring results for phase II using the three types of control charts. As in phase I, the MCUSUM chart is constructed using k = 0.5, and the MEWMA chart is constructed using λ = 0.25. The monitoring results show that all of the control charts send out-of-control signals. The number of out-of-control signals is summarized in Table 5. According to the summary, none of the control charts finds an out-of-control observation in phase I. Meanwhile,

Conclusions
In this research, the water quality data are analyzed using multivariate control charts, namely Hotelling's T², MCUSUM, and MEWMA. Because the water quality data are autocorrelated, the conventional multivariate charts are not appropriate when applied to the raw data. To handle this issue, the MLS-SVR method is employed to calculate the residuals of the water quality data, and these residuals are used to construct the Hotelling's T², MEWMA, and MCUSUM control charts. The monitoring results show that none of the control charts finds an out-of-control signal in phase I. In phase II, the MCUSUM chart finds more out-of-control observations than the other charts.