An on-line framework for monitoring nonlinear processes with multiple operating modes

A multivariate statistical process monitoring scheme should be able to describe multimodal data. Multimodality typically arises in process data due to varying production regimes. Moreover, multimodality may influence how easy it is for process operators to interpret the monitoring results. To address these challenges, this paper proposes an on-line monitoring framework for anomaly detection, where an anomaly may indicate either a fault occurring and developing in the process or the process moving to a new operating mode. The framework incorporates the Dirichlet process, which is an unsupervised clustering method, and kernel principal component analysis with a new kernel specialized for multimode data. A monitoring model is trained using data obtained from several healthy operating modes. When on-line, if a new healthy operating mode is confirmed by an operator, the monitoring model is updated using data collected in the new mode. Implementation issues of this framework, including parameter tuning for the kernel and the selection of anomaly indicators, are also discussed. A bivariate numerical simulation is used to demonstrate the anomaly detection performance of the monitoring model. The ability of this framework to update the model and detect anomalies in new operating modes is demonstrated on data from an industrial-scale process using the PRONTO benchmark dataset. The examples also demonstrate the industrial applicability of the proposed framework.


Introduction
Changes in production regimes and market demands may result in a process running in several different healthy operating modes. The data collected in such processes are multimodal [1] . An important industrial requirement when monitoring such processes is that it must be possible to detect when a process is in one of several healthy operating modes and when it is developing a fault. Key to this is an anomaly indicator that detects anomalies in the data with respect to any of the known healthy operating modes.
To meet this requirement, the monitoring model should be able to account for all of the known healthy modes. However, new operating modes may emerge as the production regimes change. Therefore, whilst a detected anomaly might be symptomatic of a developing fault, it may also indicate that the process has moved into a new healthy operating mode. In the latter case the monitoring model should be updated so that an anomaly is only reported when the process deviates away from any of the healthy modes, including the newly considered one.
The paper proceeds as follows. Section 2 gives an overview of relevant literature and describes the industrial requirements for a method of anomaly detection for multimode processes. Section 4 describes the methods used for training and tuning the monitoring model and discusses the choice of anomaly indicators and the tuning of parameters. Section 5 presents a flowchart summarizing the framework, showing the steps involved and the on-line model updating. A numerical example with multimode and nonlinear data is presented in Section 6, showing the performance of monitoring model training. In the same section the monitoring framework with anomaly detection and model update is tested using the PRONTO experimental benchmark dataset. The dataset is used to simulate a case where the process moves to a previously unseen operating mode and where a fault develops. Future directions for improving this framework are discussed in Section 7. The paper ends with the conclusions in Section 8.

Background
Faults are undesirable operating conditions which may develop into process failures. The development of a fault often leads to process variables deviating from their equivalent healthy values, resulting in anomalies in process data. Multivariate Statistical Process Monitoring (MSPM) approaches attempt to detect these anomalies by building monitoring models based on data recorded during periods of healthy operation.

Fault detection in multimodal processes
According to a recent review paper [3], data-driven multimode process monitoring approaches are evaluated according to false and missed alarm rates and other metrics. However, the interpretability of the monitoring results obtained by these approaches for multimode processes has been afforded less attention. For example, approaches based on building multiple monitoring models to describe the different operating modes [3] typically output several anomaly indicator values. Even if, for a given sample, the anomaly indicators of two or more modes have similar values, the associated extent of the anomalous behaviour indicated in each of the modes can vary significantly. An additional decision-level fusion step, which combines the multiple anomaly indicators to determine the occurrence of a fault, has been proposed to account for this issue [4][5][6].
A multimode process may not be linear, and its multimode nature means there are ranges of values of the variables that are not encountered in normal operations. Hence nonlinear MSPM approaches [7][8][9][10][11] , especially Kernel Principal Component Analysis (KPCA) [8,10] , have been used to build single monitoring models for multimode processes. Nevertheless, nonlinear algorithms are not guaranteed to work well on multimodal data. For example, the Radial Basis Function (RBF), which is the most widely used kernel function in KPCA, can handle nonlinear data. However, a previous investigation [12] showed that RBF kernels perform badly when applied to multimodal data. Therefore, the kernel should be specified properly so as to make KPCA a good candidate for building a single monitoring model. One example is the Non-stationary Discrete Convolution (NSDC) kernel [12] , which achieves a better monitoring model for multimode data than the RBF kernel.

Anomaly indicators
Another task in building a single monitoring model is to enhance the interpretability of the monitoring result by achieving an anomaly indicator that is useful for process operators. Operators need to detect the presence of faults with acceptable levels of false and missed alarms. This requires an anomaly indicator that has the same interpretation regardless of the mode in which the process is operating.
A common observation from the anomaly indicators achieved by various nonlinear monitoring techniques is that, for nonlinear processes with a single operating mode, the same magnitude of the anomaly indicator represents the same level of anomaly in the process and the control limit does not change over time [13][14][15][16] . Such behaviour is desired in practice because it is easy for operators to determine if the process has anomalies by inspecting the anomaly indicator. This may not be guaranteed in multimodal process monitoring, however, because the multiple operating modes often result in significantly different steady states of the process variables and, if the process is also nonlinear, different static process models in each mode. For example in [17,18] , the control limit changes as the operating mode changes in the process and the same magnitude of the anomaly indicator does not indicate the same level of anomaly if the mode has changed. The NSDC kernel is designed to describe multimodal data accurately whilst achieving the behaviour that the same magnitude of the anomaly indicator represents the same level of anomaly in the process [12] . This paper aims to achieve a monitoring approach that gives an anomaly indicator that has consistent behaviour regardless of changes in steady states or the static models caused by varying healthy operating modes.

Data clustering and on-line updating
The NSDC-KPCA approach requires the training data to be clustered a priori. When clustering the historical data used for training, the number of operating modes existing in the data may not be known. Data-driven clustering methods such as K-means [19] and expectation maximization [20] typically need the number of clusters a priori [21]. In contrast, the Dirichlet Process (DP), which is a non-parametric Bayesian method [22], can automatically determine the number of clusters when clustering the data. After applying DP, the data are organized to be used in NSDC-KPCA.
The off-line training step of the proposed framework assumes that all training data are from healthy operating modes. This step uses DP to automatically determine the number of modes in the training data and to label the training samples accordingly. This step is unsupervised, in that it does not require information about which mode each data sample belongs to. The training step then employs NSDC-KPCA in order to reduce false and missed alarms. Unlike multiple-model approaches such as [23,24] and [17], the NSDC-KPCA approach does not require the classification of a test sample or the fusion of the monitoring results. Moreover, the magnitude of an anomaly indicator achieved by NSDC-KPCA reflects the level of anomalies in the data with respect to the known healthy operating modes regardless of which mode the process is running in. In the on-line update step, new data collected from new healthy operating modes can be incorporated via the DP-NSDC-KPCA approach, allowing the monitoring model to be updated on-line. Preliminary results of such a framework were presented in [25]. This paper extends [25] in the following aspects:
1. Demonstrating the performance of monitoring modelling and anomaly detection of the proposed framework on a simulated multimode dataset with a nonlinear mode;
2. Considering the influence of the kernel width of the NSDC kernel on the monitoring model built by NSDC-KPCA and proposing a tuning strategy for the kernel width;
3. Proposing an anomaly indicator that suits the NSDC-KPCA approach;
4. Applying the framework with properly tuned kernel widths and the new anomaly indicator to both the numerical simulation and the PRONTO dataset.

Glossary of terms
Acronyms used in this paper are summarized in Table 1 .

Training of the monitoring model
This section will introduce the DP-GMM approach for clustering unlabelled data obtained from multiple healthy operating modes. The section will also describe the NSDC-KPCA approach for building monitoring models. The following bivariate algebraic example with four modes is used to illustrate the behaviour of the DP-GMM and NSDC-KPCA approaches: Mode 1: where e_11 ∼ N(0, 2.25) and e_12 ∼ N(0, 0.25).

Mode 4:
x_1 = e_41 + 9, where e_41 ∼ N(0, 1) and e_42 ∼ N(0, 0.25). N denotes a Gaussian distribution and U a uniform distribution. The model relating x_1 and x_2 is nonlinear in Mode 3. The data generated from this example are visualized in Fig. 1.
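The data-generating scheme above can be sketched in Python. Modes 1 and 4 use the noise terms given in the text; the equations for Modes 2 and 3 are not reproduced here, so Mode 3 uses a hypothetical quadratic relationship and the x_2 component of Mode 4 uses a placeholder offset, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # samples per mode

# Mode 1: zero-mean Gaussian noise with the variances given in the text
mode1 = np.column_stack([rng.normal(0, 1.5, n),       # e_11 ~ N(0, 2.25)
                         rng.normal(0, 0.5, n)])      # e_12 ~ N(0, 0.25)

# Mode 3 (nonlinear, illustrative placeholder): x_2 a quadratic function of x_1
x1 = rng.uniform(2, 6, n)                             # U denotes uniform
mode3 = np.column_stack([x1, 0.5 * x1**2 + rng.normal(0, 0.5, n)])

# Mode 4: x_1 shifted by +9 as given in the text; the x_2 equation is not
# recoverable here, so a shifted-noise placeholder is used
mode4 = np.column_stack([rng.normal(0, 1.0, n) + 9,   # e_41 ~ N(0, 1)
                         rng.normal(0, 0.5, n) + 5])  # placeholder offset

X = np.vstack([mode1, mode3, mode4])
print(X.shape)  # (600, 2)
```

Scatter-plotting X reproduces the qualitative picture of well-separated clusters with one nonlinear mode.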

Dirichlet process Gaussian mixture models for clustering
The Dirichlet Process Gaussian Mixture Model (DP-GMM) assumes that the data to be clustered are samples drawn from Gaussian Mixture Models (GMMs) [26] , and applies DP for inferring a GMM for the data.

Gaussian mixture models
An m-dimensional random variable x following a Gaussian mixture model with J components can be described as

p(x) = \sum_{j=1}^{J} π_j N(x | μ_j, Σ_j),  (5)

where the j-th component is an m-dimensional multivariate Gaussian distribution with mean μ_j and covariance Σ_j. A sample x is assumed to be drawn from the j-th component with mixture probability π_j.

DP-GMMs
A multimodal training dataset X is assumed to have samples of x which follow the GMM model in Eq. (5). The objective of DP-GMM clustering is to use X to determine the number of components J, the mean μ_j and covariance Σ_j associated with each component, and the component from which each sample in X is drawn. Such inference is done in an iterative manner with two steps: the inference of the means μ_j and covariances Σ_j of the Gaussian components, and the inference of the labels of the training samples indicating from which component each sample is drawn. All the parameters are assumed to follow certain distributions and these distributions are updated using observations, by applying the Dirichlet distribution and the Dirichlet Process.
First, the vector of mixture probabilities π = {π_1, …, π_J} is assumed to follow the Dirichlet distribution [27]. The authors of [28,29] have shown that the means μ_j and covariance matrices Σ_j follow a discrete distribution G(·) generated from a Dirichlet Process specified by the base distribution G_0 and the concentration parameter α. A DP-GMM model for μ_j, Σ_j and π_j can be written as

G ∼ DP(α, G_0),  (μ_j, Σ_j) ∼ G,  π ∼ Dir(α/J, …, α/J),  (6)

where DP and Dir represent the Dirichlet Process and the Dirichlet distribution, respectively.
The inference step for μ_j and Σ_j is to find the updated distribution G(·) using the data from the j-th Gaussian component. From a Bayesian perspective, the base distribution G_0 is the prior distribution of μ_j and Σ_j, while G(·) is the posterior distribution to be updated given the samples in X. The following Normal Inverse Wishart (NIW) distribution [30] is selected as G_0 because it is a conjugate prior of the multivariate Gaussian distribution:

(μ_j, Σ_j) ∼ NIW(u_0, κ_0, ν_0, Λ_0),  (7)

where Σ_j follows an Inverse Wishart distribution and μ_j given Σ_j follows a Gaussian distribution. The parameters of this NIW distribution are Θ_0 = {u_0, κ_0, ν_0, Λ_0}, where ν_0 and κ_0 are positive values. The inference step for μ_j and Σ_j involves calculating the posterior distribution G(·) and sampling μ_j and Σ_j from the updated G(·). The updated G(·) also has the NIW formulation in Eq. (7), but the parameter vector is updated using X. For the dataset to be clustered, the inference step for the labels of samples uses a vector I = {I_1, …, I_N} ∈ R^{1×N}, which denotes the cluster indices for the samples in X. I_i takes values from 1 to J and indicates that the i-th sample x_i is drawn from the I_i-th Gaussian component.

Inference
According to Theorem 2 of [29], the distribution of the GMM given the training samples is the stationary distribution of the Markov chain, and convergence to the stationary distribution is independent of the initial assumptions. The inference steps below are based on Gibbs sampling following [31]. Gibbs sampling infers I and (μ_j, Σ_j) by iteratively drawing samples of I and (μ_j, Σ_j) from their joint distribution. The joint distribution is updated whenever a new inference of I or (μ_j, Σ_j) becomes available.
where I is the cluster index vector for the dataset with x* excluded and n_j is the number of samples in the j-th cluster. A new cluster is created for x* with a probability proportional to the concentration parameter α, which influences the likelihood of generating a new cluster and is kept constant in this paper. After the assignment is updated for every sample in X, empty clusters are removed. This step enables DP to generate new clusters and remove empty clusters so that the number of clusters can be determined automatically. d. Updating the hyperparameters of the NIW distribution: the following Bayesian formulation estimates the posterior distribution of the parameters μ_j and Σ_j:

p(μ_j, Σ_j | X_j, Θ_0) ∝ p(X_j | μ_j, Σ_j) p(μ_j, Σ_j | Θ_0),  (10)

where X_j denotes the samples clustered into the j-th cluster. Θ_0 denotes the parameters of the prior NIW distribution, u_0, κ_0, ν_0 and Λ_0, and Θ^(j) those of the posterior distribution, i.e. u^(j), κ^(j), ν^(j) and Λ^(j). Since the NIW distribution is a conjugate prior for the multivariate Gaussian distribution with unknown mean and covariance, Θ^(j) can be obtained by comparing the posterior distribution in Eq. (10) with the standard NIW formulation, leading to Eq. (11), in which x̄_j = \sum_i x_i / n_j is the mean of the samples in X_j and n_j is their number. These steps, except the initialization step, are carried out iteratively until the estimated GMM converges to the true distribution of the data [29]. One sign of convergence is that the number of clusters does not change over several iterations. In practice, a maximum number of iterations is usually set and the iteration stops after reaching this maximum.
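The update equations of Eq. (11) are not reproduced above, but because the NIW prior is conjugate they take the standard form; the following sketch implements the standard NIW conjugate update (with Λ denoting the NIW scale matrix), which should match Eq. (11) up to notation:

```python
import numpy as np

def niw_posterior(Xj, u0, kappa0, nu0, Lambda0):
    """Standard conjugate NIW update given the samples Xj assigned to
    cluster j. Notation mirrors the text: u, kappa, nu and the scale
    matrix Lambda."""
    n = Xj.shape[0]
    xbar = Xj.mean(axis=0)
    S = (Xj - xbar).T @ (Xj - xbar)          # scatter about the cluster mean
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    u_n = (kappa0 * u0 + n * xbar) / kappa_n # prior mean pulled towards xbar
    d = (xbar - u0).reshape(-1, 1)
    Lambda_n = Lambda0 + S + (kappa0 * n / kappa_n) * (d @ d.T)
    return u_n, kappa_n, nu_n, Lambda_n

# Minimal usage with a weak prior
rng = np.random.default_rng(1)
Xj = rng.normal([2.0, -1.0], 0.3, size=(50, 2))
u_n, kappa_n, nu_n, Lambda_n = niw_posterior(
    Xj, u0=np.zeros(2), kappa0=1.0, nu0=4.0, Lambda0=np.eye(2))
print(kappa_n, nu_n)  # 51.0 54.0
```

With a weak prior (small κ_0) the posterior mean u_n is dominated by the cluster sample mean, as expected.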

Parameter estimation for Gaussian components
The inference procedure assigns the data to a certain number of clusters and obtains posterior estimates of the mean vectors and covariance matrices of these clusters. However, in order to reduce the influence of the initialization of the NIW distribution, the mean μ_j and the covariance matrix Σ_j of the j-th component are re-estimated from the clustered data [32]:

μ̂_j = (1/n_j) \sum_{x_i ∈ X_j} x_i,   Σ̂_j = (1/(n_j − 1)) \sum_{x_i ∈ X_j} (x_i − μ̂_j)(x_i − μ̂_j)^T.

In summary, DP-GMM determines the number of Gaussian-distributed clusters in the data, estimates the means and covariance matrices of these clusters, and labels the data. The information obtained by DP-GMM will be used in KPCA-based modelling to train the monitoring model. Fig. 2 and Table 2 present the data clusters and the parameter estimates for the numerical example, respectively, obtained by DP-GMM.
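The per-cluster re-estimation step can be sketched as follows, assuming the cluster labels produced by DP-GMM are available:

```python
import numpy as np

def reestimate_clusters(X, labels):
    """Re-estimate each Gaussian component from the clustered data,
    reducing the influence of the NIW initialization."""
    params = {}
    for j in np.unique(labels):
        Xj = X[labels == j]
        mu_j = Xj.mean(axis=0)               # sample mean of cluster j
        Sigma_j = np.cov(Xj, rowvar=False)   # unbiased sample covariance
        params[j] = (mu_j, Sigma_j)
    return params

# Usage on two synthetic clusters
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
labels = np.repeat([0, 1], 100)
params = reestimate_clusters(X, labels)
print(len(params))  # 2
```

The means and covariances returned here play the role of μ̂_j and Σ̂_j in the NSDC kernel below.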
Kernel principal component analysis

The kernel function k(x_i, x_j) of two samples x_i and x_j calculates the (i, j)-th entry of the kernel matrix K, where K ∈ R^{N×N} is assumed to be the covariance matrix of the projected kernel features in the kernel space. The centered kernel matrix K̃ is obtained as

K̃_{ij} = K_{ij} − K̄_{i,row} − K̄_{j,col} + K̄,

where K̄_{i,row} is the mean of the i-th row, K̄_{j,col} is the mean of the j-th column, and K̄ is the mean of the entire kernel matrix K. Similarly to the standard PCA formulation, eigenvalue decomposition is applied to K̃ to obtain the eigenvalues λ_1, λ_2, …, λ_N and the eigenvectors v_1, …, v_N ∈ R^{N×1}:

K̃ v_i = λ_i v_i.

Assuming that the first N_1 kernel Principal Components (PCs) are retained after KPCA, the kernel principal vector of the i-th sample, y_i, is calculated as

y_i = [v_1, …, v_{N_1}]^T K̃_{i,col},

where K̃_{i,col} is the i-th column of the centered kernel matrix K̃. More details of the KPCA formulation can be found in [33]. The kernel PCs y and the KPCA model depend on the kernel function k and, as a result, k should be selected such that it can adequately describe the multimodal data. The proposed framework uses KPCA with the novel NSDC kernel to build a single monitoring model for multimode process data.
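The centering, eigendecomposition and projection steps can be sketched from a precomputed kernel matrix; the helper below is an illustrative implementation of this standard KPCA recipe, not the authors' code:

```python
import numpy as np

def kpca(K, n_pcs):
    """KPCA from a precomputed kernel matrix K (N x N): center K,
    eigendecompose it, and return the first n_pcs kernel PCs for every
    training sample."""
    N = K.shape[0]
    one = np.ones((N, N)) / N
    K_tilde = K - one @ K - K @ one + one @ K @ one  # double centering
    eigvals, eigvecs = np.linalg.eigh(K_tilde)
    order = np.argsort(eigvals)[::-1]                # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Kernel PCs: project the centered kernel columns on the eigenvectors
    Y = (eigvecs[:, :n_pcs].T @ K_tilde).T
    return Y, eigvals[:n_pcs], eigvecs[:, :n_pcs]

# Usage with a simple RBF kernel on toy data
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)
Y, lam, V = kpca(K, n_pcs=3)
print(Y.shape)  # (50, 3)
```

Swapping the RBF matrix for a matrix computed with a multimode kernel leaves the rest of the procedure unchanged, which is what makes the kernel the key design choice.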

Non-stationary discrete convolution kernel
This section briefly reviews the formulation of the NSDC kernel [12], which aims to build a monitoring model that can account for any multimodality in a dataset. Both the RBF kernel and the NSDC kernel assume the following nonlinear regression problem with basis functions:

z = \sum_{h=1}^{H} w_h f_h(x; θ_h),

where x ∈ R^{m×1} is the vector of original measured variables and z is the output in a nonlinear feature space. There are H basis functions, of which f_h is the h-th basis function with parameters θ_h and weight w_h. Assuming two samples x and x* have outputs z and z*, respectively, the covariance of z and z* defines the kernel function, where Σ is the covariance matrix of x and δ controls the performance of the kernel function.
In contrast, in the NSDC kernel only samples in the training set X are chosen as centers of the radial basis functions. Given that c ∈ X and X has been clustered into J clusters by DP-GMM, the kernel function becomes the form in Eq. (20), where n_j is the number of samples in the j-th cluster and c^(i) ∈ X. The NSDC kernel uses the covariance matrix of the j-th cluster estimated by DP-GMM, Σ̂_j, to incorporate the localized property of each operating mode. In NSDC-KPCA, the kernel matrix in Eq. (13) is computed with the kernel function defined in Eq. (20).
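To illustrate the idea of a cluster-aware kernel, the sketch below assumes a sum-of-Gaussian-bumps form in which each training sample contributes a Gaussian with its cluster's covariance scaled by δ. This is an assumed form for illustration only; the exact Eq. (20) should be taken from [12]:

```python
import numpy as np

def gauss_bump(x, c, Sigma):
    """Gaussian density value at x centered on c with covariance Sigma."""
    d = x - c
    inv = np.linalg.inv(Sigma)
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * Sigma))
    return float(np.exp(-0.5 * d @ inv @ d) / norm)

def multimode_kernel(x, x_star, centers, labels, covs, delta):
    """Illustrative cluster-aware kernel inspired by the NSDC idea (NOT the
    published Eq. (20)): each training center c^(i) in cluster j contributes
    a product of Gaussian bumps with weight 1/n_j and covariance
    delta * Sigma_j, which keeps the kernel positive semi-definite."""
    total = 0.0
    for j, Sigma_j in covs.items():
        idx = np.where(labels == j)[0]
        S = delta * Sigma_j
        for i in idx:
            total += (gauss_bump(x, centers[i], S)
                      * gauss_bump(x_star, centers[i], S)) / len(idx)
    return total

# Two well-separated clusters; a point near a cluster scores higher than a
# point far from all clusters
rng = np.random.default_rng(5)
centers = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(8, 1, (30, 2))])
labels = np.repeat([0, 1], 30)
covs = {0: np.eye(2), 1: np.eye(2)}
a = np.array([0.0, 0.0])    # near cluster 0
b = np.array([50.0, 50.0])  # far from all clusters
print(multimode_kernel(a, a, centers, labels, covs, 2.0) >
      multimode_kernel(b, b, centers, labels, covs, 2.0))  # True
```

Because k(x*, x*) here depends on how close x* is to the training clusters, it is not constant, which matches the remark below that k(x*, x*) ≠ 1 in general for the NSDC kernel.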
Although the original formulation of the NSDC kernel is supervised because the data clusters X j and their covariance matrices need to be known a-priori [12] , the inclusion of the DP-GMM step into the framework renders the approach unsupervised as the training data is clustered and the parameters are estimated automatically.

Anomaly detection
When applying KPCA to anomaly detection, the anomaly indicators are calculated using the kernel PCs. An anomaly indicator is expected to change monotonically as the level of anomaly increases. For multimodal process monitoring, the same magnitude of anomaly indicators should imply the same level of anomaly for different operating modes.
T² and SPE are examples of anomaly indicators which are widely used in MSPM. For a sample x*, T² is defined as

T²_* = y_*^T Λ_{N_1}^{−1} y_*,

where y* ∈ R^{N_1×1} are the kernel PCs of x* obtained by KPCA and Λ_{N_1} is an N_1 × N_1 diagonal matrix with the first N_1 eigenvalues as its diagonal elements. Under the KPCA formulation the SPE is usually defined as [35]

SPE_* = ‖Φ(x̃_*) − Φ_0‖² − y_*^T y_*,

where x̃_* is the centered x*, Φ(x̃_*) is the unknown projection of x* to the kernel space before applying PCA-based dimension reduction, and Φ_0 is the center of Φ(x̃_i) for i = 1, …, N. Although Φ(x̃_*) and Φ_0 cannot be obtained directly, their second-order norms can be calculated using the kernel matrix: K_{**} = k(x*, x*) is the variance of x* in the kernel space and K̄_* is the mean of the kernel vector between x* and the training samples. It should be noted that, unlike for the RBF kernel, k(x*, x*) is not necessarily equal to 1 when using the NSDC kernel. SPE may be superior to T² as an anomaly indicator when the RBF kernel is used, since Φ(x̃) has infinite dimensions while y* only has finite dimension. As a result, SPE_* increases monotonically as x* gradually deviates away from the training data [36]. However, when using NSDC-KPCA neither T² nor SPE is ideal for anomaly detection, as the indicators do not increase monotonically with the level of anomaly. More specifically, multiple samples that are anomalous to different extents may have the same value of T² or SPE. For the numerical example in Fig. 3(a), the two samples highlighted (red asterisks) have the same T² value even though one is located within the training data cluster while the other is far away from the training data and is likely to be an anomaly. The shaded region defined by the contour is not suitable for anomaly detection because false alarms may be triggered when test samples are inside the training cluster. The same behaviour may be observed for SPE, as shown in Fig. 3(b).
The reason is that, in the NSDC formulation, the dimension of the feature space is finite, in contrast to the infinite-dimensional feature space induced by the RBF kernel. A sample x* is detected as an anomaly if the following detection criterion holds:

SPE_* > SPE_UCL,  (25)

where SPE_UCL is the upper control limit of the revised SPE estimated from the training data. The distribution of the SPE in KPCA does not have an analytic form because of the kernel transformation. In practice, SPE_UCL can be defined using the empirical reference distribution of SPE values for healthy training data [37]. This paper uses the (100 − η)% percentile of the SPEs obtained on the training data as SPE_UCL, with confidence level η%.
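The empirical control limit and the detection criterion of Eq. (25) can be sketched as follows (the chi-square draw below merely stands in for real training SPE values):

```python
import numpy as np

def spe_control_limit(spe_train, eta=5.0):
    """Empirical SPE upper control limit: the (100 - eta)% percentile of
    the SPE values computed on healthy training data."""
    return np.percentile(spe_train, 100.0 - eta)

def detect(spe, ucl):
    """Detection criterion of Eq. (25): flag samples whose SPE exceeds
    the upper control limit."""
    return spe > ucl

rng = np.random.default_rng(4)
spe_train = rng.chisquare(3, size=1000)   # stand-in for training SPEs
ucl = spe_control_limit(spe_train, eta=5.0)
alarms = detect(np.array([0.5, 3.0, 25.0]), ucl)
print(alarms)
```

Only the clearly anomalous SPE value (25.0) exceeds the 95% empirical limit in this toy run.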

Tuning the NSDC kernel
The kernel width δ of the NSDC kernel influences the performance of the NSDC-KPCA approach. One common choice for the kernel width of RBF kernels is δ = rm [38,39], where m is the dimension of the input variable space and r is a scaling factor. However, [36] showed that a proper kernel width for RBF kernels may also be related to the distances between samples in the training dataset. This paper applies the tuning strategy proposed in [36]. The strategy trains the monitoring model for several candidate values of the kernel width δ. The false alarm rates are then calculated by applying the trained monitoring models to a cross-validation dataset generated using the model presented at the beginning of Section 4. Fig. 4 plots the false alarm rates achieved for various δ values. The optimal value is taken to be δ = 5.4 because it achieves a reasonably low false alarm rate, close to 1%. Details of this strategy are discussed in [36]. Fig. 5 compares the monitoring models built by NSDC-KPCA with various values of the kernel width by visualizing the detection contours. The detection contours for the numerical example are obtained by connecting the samples in the variable space whose SPE values equal SPE_UCL. Any sample located outside the detection contour will be detected as an anomaly. The contour achieved with δ = 0.01 in Fig. 5(a) is over-fitted and will lead to increased false alarms. The contour achieved with δ = 1000 in Fig. 5(b) is under-fitted and cannot describe the dataset sufficiently, leading to increased missed alarms. The optimal kernel width δ = 5.4 results in the contour in Fig. 5(c), which suits the multimode data well. Fig. 6 summarizes this framework. In the off-line training step, the DP-GMM approach clusters the unlabelled training data from multiple healthy operating modes.
The labelled data, the number of clusters, and the means and covariances of each cluster are used for tuning the kernel width of the NSDC kernel and then for training the KPCA model. The NSDC kernel in Eq. (20) calculates the kernel matrix from the training data, for which the labelled data, the tuned kernel width δ, the number of clusters, and the covariance of each cluster are needed. After the PCA dimension reduction in the kernel space, the monitoring model and the upper control limit of the SPE are obtained.
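The width-tuning strategy described above can be sketched as a grid search; `train_model` and `false_alarm_rate` are hypothetical placeholder callables standing in for the NSDC-KPCA training and cross-validation evaluation steps:

```python
import numpy as np

def tune_kernel_width(deltas, train_model, false_alarm_rate, target=0.01):
    """Grid search for the kernel width, following the strategy of [36]:
    train a model per candidate delta, evaluate the false alarm rate on a
    cross-validation set of healthy data, and pick the delta whose rate is
    closest to the target (e.g. 1%)."""
    rates = np.array([false_alarm_rate(train_model(d)) for d in deltas])
    best = int(np.argmin(np.abs(rates - target)))
    return deltas[best], rates[best]

# Toy usage: a synthetic rate curve stands in for real cross-validation
deltas = np.array([0.01, 0.1, 1.0, 5.4, 100.0, 1000.0])
fake_rates = {0.01: 0.30, 0.1: 0.12, 1.0: 0.04,
              5.4: 0.011, 100.0: 0.0, 1000.0: 0.0}
best, rate = tune_kernel_width(deltas, lambda d: d, lambda d: fake_rates[d])
print(best)  # 5.4
```

With the synthetic curve above the search lands on δ = 5.4, mimicking the outcome reported for the numerical example.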

The monitoring framework
In the on-line monitoring step, the monitoring model is applied to the on-line data to obtain the SPE without clustering. If the SPE does not exceed its control limit, the process is considered to be operating in a healthy mode. If the SPE exceeds the control limit, an anomaly is detected, meaning that the process has moved away from the operating modes seen in the training data. One then needs to refer to expert knowledge, obtained for example by consulting the process operator, to determine whether this is a new healthy operating mode of the process. If a new mode is confirmed, the data from this new mode are included in the training data. As shown in Fig. 6, the model update step incorporates the new data which are confirmed to be from the new operating mode. The DP-NSDC-KPCA approach will again cluster the data, identify the clusters in the data, and train the monitoring model in the model update step. Otherwise, if there is no evidence that the process has moved to a new healthy mode, the framework concludes that there is a fault in the process.
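The on-line decision logic described above can be condensed into a single function; `operator_confirms_new_mode` is a hypothetical callable standing in for the expert-knowledge check:

```python
def monitor_step(spe, ucl, operator_confirms_new_mode):
    """One on-line decision following the flowchart described in the text:
    SPE below the UCL means healthy operation; SPE above the UCL means
    either a new healthy mode (operator confirms, triggering a model
    update) or a fault."""
    if spe <= ucl:
        return "healthy"
    if operator_confirms_new_mode():
        return "new mode: update model with new data"
    return "fault detected"

print(monitor_step(0.5, 1.0, lambda: False))  # healthy
print(monitor_step(2.0, 1.0, lambda: True))   # new mode: update model with new data
print(monitor_step(2.0, 1.0, lambda: False))  # fault detected
```

In a deployment, the "update model" branch would re-run the DP-NSDC-KPCA training on the augmented dataset before monitoring resumes.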
A question arises about the simultaneous occurrence of a fault and a mode change. The assumption in the model update step is that it is unlikely for operators to switch the process to a new healthy mode if there is already a fault detected in the process. Therefore, if the expert knowledge confirms the new operating mode, the data collected from the initial period of this mode are assumed to be healthy and can be incorporated in the monitoring model.

Simulation model
The illustrative numerical example presented in Section 4 is used to validate the performance of DP-NSDC-KPCA. In addition to the training data, Fig. 7 also visualizes the two test sequences used for testing the anomaly detection performance. Test sequence 1 (marked with yellow "+" in Fig. 7) has noisy data samples moving from Mode 1 to Mode 4. Test sequence 2 (marked with purple "×" in Fig. 7) has data samples moving from Mode 1 via Mode 2 to Mode 3, which is a nonlinear mode. For clarity, every 5th sample of Test sequences 1 and 2 is plotted in Fig. 7. The monitoring models are trained using the RBF-KPCA and DP-NSDC-KPCA approaches, respectively, and are then used for anomaly detection, following the procedure presented in Fig. 6. For each monitoring model, the 95th percentile of the SPEs of the training data is selected as the upper control limit SPE_UCL. Since no new mode is assumed in the test sequences, the model update step is not tested. Fig. 8 presents the monitoring contours achieved by RBF-KPCA and DP-NSDC-KPCA with various values of the kernel width δ. The monitoring contour is obtained by connecting the points in the variable space which have SPE values equal to SPE_UCL. Therefore, samples located outside the contour will be detected as anomalous and samples inside the contour will be labelled as healthy.

Performance
Two samples, S_1 and S_2, are used to illustrate the performance of the monitoring contours. S_1 is located among the samples in Mode 1, indicating that it is likely to be a healthy sample, while S_2 is located far away from any known modes, indicating that it is likely to be an anomaly. Fig. 8(a) shows that the monitoring contours achieved by RBF-KPCA are over-fitted when δ = {0.2, 0.5, 1}, especially for Mode 1. In those cases, S_1 is always outside the monitoring contours and is incorrectly labelled as anomalous. When δ = 5, the model is under-fitted and the potentially anomalous sample S_2 is labelled as healthy. The result shows that RBF-KPCA may not be able to adequately describe the multimodal data regardless of the tuning of its kernel width. On the other hand, the DP-NSDC-KPCA method performs better than RBF-KPCA because S_1 is inside the monitoring contours and S_2 is outside the contours when δ = {2, 5, 8} in Fig. 8(b). This indicates that the DP-NSDC-KPCA method can achieve monitoring models which are sensitive to anomalies whilst being robust to small variations in healthy samples.
The anomaly detection performance of the proposed approach for the two test sequences is shown in Figs 9 and 10. The kernel width δ used for NSDC-KPCA is 5 and the confidence level of SPE_UCL is 1%. In order to relate the results in the trend plots of SPE to the scatter plots of the data, several samples are marked. The detection contours in Figs 9(b) and 10(b) are obtained from the SPE_UCLs in Figs 9(a) and 10(a). In Fig. 9(b), the samples between S_1,1 and S_1,2 do not belong to any known modes, and the SPE exceeds its control limit in Fig. 9(a), resulting in these samples being identified as anomalous. Similar behaviour is observed for the samples after S_1,3. On the other hand, it can be seen that the rest of the samples in the test sequence, e.g. the samples between S_1,2 and S_1,3, belong to known modes and the associated SPE does not exceed its control limit. For Test sequence 2, the samples between S_2,1 and S_2,2, the samples between S_2,3 and S_2,4, and the samples after S_2,5 are identified as anomalous.
When compared with RBF-KPCA, NSDC-KPCA can build better monitoring models and can generate better monitoring contours for multimodal data. The results also demonstrate that the magnitude of the SPE obtained using NSDC-KPCA is sensitive to the mode changes and the same control limit can be used for anomaly detection in several operating modes. In addition, the SPE increases monotonically as the test sample deviates from the known modes. This behaviour may also be useful for estimating the development of a fault.

Process description
The PRONTO benchmark dataset [40] is used to emulate an on-line implementation environment for the proposed approach. This dataset was collected from a pilot-scale multiphase flow facility with several operating modes specified by the water and air flow rates. Artificial faults were induced in various operating modes, so data-driven monitoring algorithms can be tested using the data collected in the faulty scenarios. A detailed description of the experiments and dataset is given in [41]. In order to validate the proposed framework, the data from four different operating modes are used. Table 3 presents the air and water flow rates for the four modes. Since the proposed framework is for static process monitoring and does not consider the temporal correlation in the data, it is possible to re-arrange the data from each mode to simulate the mode switches that may be observed in a multimode process. In this paper 11 variables are used for monitoring, which are the same as those presented in Table 1 of [25]. Figs 11 and 12 plot the data used for training and testing, respectively. The training set only contains data from Mode 1 and Mode 2, while the test set has data from all four modes shown in Table 3, of which the modes New 1 and New 2 are not included in the off-line training step. Moreover, a developing air blockage fault was introduced by gradually reducing the valve opening of the air inlet pipeline in mode New 1. Fig. 13 shows the sequence of the mode changes and the valve opening for fault simulation in the test set. Both the training and the test sets mainly consist of steady-state data from multiple operating modes. Short transition periods may exist between the modes. It may be observed from Figs 11 and 12 that the steady states of the process variables differ across the operating modes. Therefore, the data sets are suitable for validating the proposed monitoring framework.

Performance
The fault detection performance of the proposed framework is compared against that of the monitoring model trained off-line using NSDC-KPCA without model update. The off-line monitoring result shows that, although the monitoring model is sensitive to anomalies caused by mode changes (A1 and A2 in Fig. 14(a)) and is robust to variations within the modes known from the training data (R1 in Fig. 14(a)), there is no way to incorporate the data from a new operating mode. As a result the monitoring model is unable to distinguish between a new healthy operating mode and a fault in that mode. The proposed framework also detects anomalies when the process has moved to a new operating mode. In addition, however, its model update step incorporates the data collected from a new mode once the new mode is confirmed to be healthy operation (U1 and U2 in Fig. 14). DP-GMM can automatically determine the number of operating modes in the data, as shown in Fig. 14(b). The fault induced in the new mode can be detected as it develops (D1 in Fig. 14) even though this mode did not exist in the dataset used for off-line training. Fig. 14 also shows short periods when false alarms are triggered, such as the samples between 1050 and 1090 and the samples between 1360 and 1410. The first set of false alarms is triggered by oscillation in the data due to controller overshoot when the set points changed; the overshoot can be observed in Fig. 12. The second occurs because the model update step U2 used data from when the process had switched to New 2 but had not yet reached steady state (samples 1224-1324 in Fig. 12), so the updated model does not fully apply to the period afterwards when the process has stabilized. Since transitions between modes are not considered in the framework, it is recommended to use data collected from steady states for model update.
In on-line implementation, false alarms caused by transition periods may be suppressed by alarm management techniques.
It may also be noticed that, after model update, there is no change in the SPE UCL (red dashed line) in Fig. 14. Moreover, the magnitudes of the SPE obtained in the various healthy operating modes are similar. This is an important result because NSDC-KPCA achieves a single model for multiple modes in which the same SPE value indicates the same level of anomaly in every mode. Such an SPE facilitates the decision-making of process operators because they can inspect the SPE and determine whether there is a fault in the process without needing to specify the current operating mode. Hence this example demonstrates that the proposed framework provides an anomaly indicator that is useful for process operators.

Discussion
The DP-NSDC-KPCA approach used for building monitoring models in this framework is motivated by practical considerations in designing and applying monitoring methods. Historical data from process operations are usually unlabelled and the operating modes are often unknown. Since DP-based clustering can automatically determine the number of clusters, the off-line training procedure is unsupervised and the on-line model update can easily incorporate new data from on-line operations. The clustering method assumes that data samples from different operating modes do not overlap significantly with each other; if they do overlap, the clustering may be less accurate. Moreover, the transition periods between modes may be identified by DP-based clustering as individual modes.
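The ability of DP-based clustering to determine the number of modes without supervision can be illustrated with scikit-learn's `BayesianGaussianMixture`, which implements a truncated Dirichlet-process Gaussian mixture. This is a generic sketch on synthetic two-mode data, not the paper's implementation; the 0.05 weight threshold for counting active components is an arbitrary choice for the illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Two well-separated synthetic operating modes
X = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(150, 2)),
    rng.normal([4.0, 4.0], 0.3, size=(150, 2)),
])

# Truncated DP mixture: n_components is only an upper bound; components
# that receive negligible posterior weight are effectively switched off.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.01,  # small prior favours few clusters
    max_iter=500,
    random_state=0,
).fit(X)

# Count the modes that actually receive appreciable weight
active = int((dpgmm.weights_ > 0.05).sum())
```

Because the number of active components is inferred rather than fixed, new data appended during model update can introduce an additional cluster without any change to the training code, which is what makes the on-line update step of the framework straightforward.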
In order to guarantee that the process is operating efficiently and to minimize the likelihood of faults developing into failures, decisions about process operation and maintenance should be made according to the existence and severity of faults in the process. Therefore, as stated in Section 4.2.2, a desirable anomaly indicator exceeds the control limit when a fault occurs and changes monotonically as the fault develops. In the framework, the fault detection performance is improved by appropriate tuning of the NSDC kernel, and the revised SPE proposed in Section 4.2.2 aims to fulfil the requirement of monotonicity. Additionally, the decision step is made easier if the same magnitude of the anomaly indicator indicates the same level of fault severity in all modes. The SPE trend in Fig. 14 demonstrates that the DP-NSDC-KPCA approach, which builds a single monitoring model for multiple modes, fulfils this expectation.
This behaviour of the anomaly indicator achieved by the DP-NSDC-KPCA approach will also facilitate fault diagnosis and prognosis. For example, contribution plot-based methods can pinpoint the location and cause of a fault by identifying the process variables that contribute most to the anomaly indicator; the magnitudes of variable contributions can only be compared if the magnitude of the anomaly indicator means the same thing across operating modes. As for fault prognosis, the trend of the anomaly indicator can reflect fault development only if the same magnitude of the indicator corresponds to the same level of fault severity in every mode.
A future research direction will be to extend this framework to dynamic process monitoring. The dynamic extension requires the description of the process dynamics, to which auto-regressive models or state-space models may apply. For example, [42] provided a way of modelling nonlinear and dynamic processes. The extension also requires the dynamic adaptation of the DP-NSDC-KPCA framework with respect to the dynamic process models. The aim will be to cope with process dynamics and transition periods between modes whilst maintaining the unambiguous behaviour of the anomaly indicators achieved by the monitoring framework.

Conclusions
This paper presented an on-line monitoring framework for multimode and nonlinear processes. The unsupervised training step uses DP-GMM clustering to cluster the training data from multiple operating modes and implements NSDC-KPCA to build a single monitoring model. The tuning of the kernel width of the NSDC kernel has been considered and the SPE was chosen as the anomaly indicator. The framework has the following advantages. The monitoring result is easy to interpret because the monitoring model and the anomaly indicator are robust to mode changes, while the monitoring model remains sensitive to anomalies that deviate from the known modes. Data from new operating modes observed during process operation can be incorporated to update the monitoring model, making it possible to detect faults in the new operating modes.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.