A Non-Parametric Q-Residual Fault Detection Approach Using Unsupervised LSTM-KDE

It is well known among practitioners that the majority of data collected from industrial process plants is unlabeled. If utilized, this historical data can provide vital information about the process plant condition. Learning from unlabeled datasets, this study proposes an unsupervised LSTM-KDE approach to detect faults in industrial process plants. A residual-based fault detection framework is used, with long short-term memory (LSTM) as the main pattern learner for the nonlinear and multimode conditions that typically appear in process plants. Furthermore, kernel density estimation (KDE) is used to determine the threshold value under the non-parametric conditions of unlabeled data. The LSTM-KDE approach is then evaluated on numerical data as well as the Tennessee Eastman process plant dataset. Its performance is also compared to principal component analysis (PCA), local outlier factor (LOF), and auto-associative kernel regression (AAKR). The experimental results indicate that the LSTM-KDE fault detection approach has better learning performance and accuracy than the other approaches.


INTRODUCTION
Process monitoring has become a crucial stage in the manufacturing industry. One of the motivations is to ensure personnel safety and optimum output by monitoring the entire engineering system. The modern manufacturing plant, equipped with up-to-date data acquisition tools such as the Internet of Things (IoT), supervisory control and data acquisition (SCADA), cloud computing, and networks, collects an abundance of informative datasets. Hence, massive collections of signals are available to provide vital indications of the current plant condition, drawing more attention towards data-driven fault detection in monitoring industrial processes. With this, high-volume and multivariate datasets are inevitable, given the multiple components and sensor networks in the manufacturing system. Multivariate statistical process monitoring (MSPM) approaches such as principal component analysis (PCA) and partial least squares (PLS) were seen as prominent approaches to cater to the multivariate condition, but their limitation lies in linear and Gaussian assumptions, which are unfit for today's industrial systems (K. Wang, Zhou, and Wang 2020). Since the collected signal relies heavily on time series conditions, adding the temporal relation would increase the reliability of the fault detection approach (Ammann, Michau, and Fink 2020). In addition, time series analysis also enables the future prediction of faulty cases in process monitoring rather than present-time analysis only, extending condition monitoring to preventive measures. Other than that, much of the evidence presented by researchers focuses on supervised and semi-supervised approaches to fault detection (Alrifaey, Lim, and Ang 2021; Belagoune et al. 2021; Dong, Ma, and Liu 2019; Jin et al. 2020; Li et al. 2020; Monostori et al. 2016; Shadi, Ameli, and Azad 2022; Wang et al. 2018). However, fewer studies were found for the unsupervised approach, despite the majority of industrial monitoring data being unlabeled. Therefore, a non-parametric time-series unsupervised fault detection approach will be beneficial considering the current industrial process plant environment.
Practically, the observation signal in process monitoring usually comes with a time series, an unknown distribution model, and unknown parameter values. At the same time, threshold parameters such as the upper control limit (UCL) and the lower control limit (LCL) are important to distinguish the nominal signal condition from faulty occurrences. Therefore, the two main factors that affect the efficiency of a non-parametric time series fault detection approach are the time series model, which depends on the choice of the model f, and the unknown data distribution, which heavily impacts the determination of the threshold value.
Data-driven fault detection model selection can be divided into three parts: MSPM, shallow artificial intelligence (shallow AI), and deep learning approaches. The limitations of MSPM were already discussed in the first paragraph. The shallow AI approach, which usually leans towards application-specific problem-solving and handcrafted feature extraction, shows satisfactory implementation in (Nagarajan, Kayalvizhi, and Karthikeyan 2016; Ren et al. 2018; Shao et al. 2023; Tang et al. 2018). However, considering the massive datasets, nonlinearity, and temporal relations that come together in engineering systems, deep learning approaches, which provide more abstract learning, are found suitable as fault detection models. Recently, among many deep learning approaches, the authors found three main architectures selected for unsupervised fault detection: deep belief network (DBN), autoencoder (AE), and long short-term memory (LSTM).
The DBN architecture consists of multiple restricted Boltzmann machine (RBM) models. In fault detection, DBN was used as the model f in (Ren et al. 2018) and (Anaissi and Zandavi 2019) due to its capability to capture nonlinear patterns in process plants. Implicitly, DBN was seen as a good feature extractor considering the complex signal analysis in engineering systems. However, little discussion of temporal relations was seen in either study. Similar results for unsupervised AE fault detection were seen in (K. Wang et al. 2020) and (Xiang et al. 2020). (K. Wang et al. 2020) argued that the accuracy of AE fault detection is affected by noise in the dataset and that the selection of the activation function influences the detection margin; the study then proposed a deviation degree penalty to overcome this. Improvements in terms of fault alarm rate were seen, but this result does not reflect the time series analysis that is needed in a fault detection monitoring system. On the other hand, (Chen et al. 2020) developed a new AE architecture with a gated recurrent unit (GRU) neural network. The notion is to combine the feature extraction and correlation analysis of AE with a time series GRU model in order to improve the reconstruction loss in the normal state, so that there is an obvious difference between normal and faulty signals. However, the AE-GRU output distribution was assumed to be normal. Likewise, (Amarbayasgalan et al. 2020) chose a multivariate normal distribution as the prior data distribution in training a sliding window variational autoencoder fault detection approach. Another perspective was given in (Calabrese et al. 2020), which explained that the optimal subsequent time series window was generated using an autoregressive (AR) model, learning with an autoencoder, and determining the threshold value using density-based spatial clustering of applications with noise (DBSCAN). However, using a clustering approach requires a detailed analysis of whether a grouped boundary is considered 'normal' or 'faulty'; personnel experienced with the data domain are needed at this stage (Jiang et al. 2018). Furthermore, a study in (Jiang et al. 2018) proposed another fault detection framework based on a sliding window AE with kernel density estimation (KDE) as the method for determining threshold values.
Focusing on temporal relations or time series analysis, more attention is given to the recurrent neural network (RNN) family. Recently, LSTM, bidirectional LSTM (BiLSTM), and GRU were among the selected architectures for unsupervised fault detection (UFD). (Machado et al. 2022) proposed LSTM-AE for detecting multiple types of faulty signals at oil wells, whereas (J. Zhao et al. 2020) used BiLSTM with the same intention but under slow and fast dynamic system conditions. The same goes for (Kukkala, Thiruloga, and Pasricha 2020), which used GRU to detect multiple intrusion attacks in a manner suitable for embedded systems. However, the threshold determination in (Machado et al. 2022), (J. Zhao et al. 2020), and (Kukkala et al. 2020) is under a Gaussian assumption, using the F1 score, a Gaussian segment model, and a maximum reconstruction score, respectively. Other than that, (Yu and Yan 2021) modified the LSTM architecture to better capture the dynamics of complex systems and used KDE for fault detection. Furthermore, (Yu, Liu, and Ye 2021) studied lower-dimensional feature representations of nonlinear complex systems using convolutional LSTM autoencoders (CLSTM-AEs) and KDE. From the perspective of early fault detection, (B. Zhao et al. 2020) solved the weak nonlinear system problem using CNN-LSTM, where the decision boundary, a nonlinear output frequency response functions (NOFRFs) indicator, is specific to the application of the study area. On the other hand, (Liu et al. 2020) proposed a multilayer LSTM with isolation forest (iForest) for detecting early warnings; however, the study did not mention how the iForest threshold value was selected. Furthermore, (B. Wang, Liu, et al. 2020) developed a sliding window LSTM with a low-pass infinite impulse response (IIR) filter for large-noise conditions, smoothing residuals for threshold determination with the mean and standard deviation. The same concern was raised in (Ellefsen et al. 2019), where uncertainty in maritime components resulted in a noisy dataset, and LSTM was proposed as one of the methods for learning this type of dataset; however, in that study, the threshold determination is based specifically on the velocity and acceleration of the dataset. The real-time efficiency of LSTM was proven in (B. Wang, Peng, et al. 2020), and the scalability issue with small applications was studied in (Silva et al. 2022). Other than that, a GRU-Gaussian mixture VAE approach was developed to cover multimodal systems (Guo et al. 2020). In addition, LSTM was also used in the vibration analysis of a motor, but the threshold determination is specific to the application itself (Principi et al. 2019).
From the literature, it is found that the LSTM family gives satisfactory performance as a UFD approach but places less focus on threshold determination. The studies in (Guo et al. 2020; Kukkala et al. 2020; Yu and Yan 2021; J. Zhao et al. 2020) decided the decision boundary between 'normal' and 'faulty' under a Gaussian assumption, although in practice, under an unsupervised approach, no distribution conditions are known, whereas (Kourti and MacGregor 1995; B. Wang, Liu, et al. 2020; B. Wang, Peng, et al. 2020; B. Zhao et al. 2020) proposed application-specific threshold determination. It is also found that the studies in (B. Zhao et al. 2020) and (Yu et al. 2021) use a non-parametric kernel density estimation statistical approach, but their evaluation does not cover multimode datasets. Thus, this study contributes to this area as follows:
1. An unsupervised non-parametric time series fault detection framework for a non-parametric distribution dataset in a complex engineering system using the LSTM model.
2. Threshold determination via kernel density estimation for non-parametric residual distributions in complex engineering systems.
3. A comparison of several models with the proposed unsupervised fault detection framework.

Statistical Process Control Chart Fault Detection
In a multivariable process control system, an n × m data matrix X consists of n observations of m variables, which are usually statistically monitored using the T² and the Q (or SPE) statistical process control charts. The upper and lower threshold limits of the control charts are calculated as in Equation (1) (Cacciarelli and Kulahci 2022; Gajjar and Palazoglu 2016; Kourti and MacGregor 1995; Reinartz, Kulahci, and Ravn 2021), where F_α(a, b) is the F distribution with a and b degrees of freedom. Values of T² > T²_α and Q > Q_α are considered out of bounds. In Section 1, we discussed that the T² and Q distributions in earlier studies were usually assumed in a supervised manner, but under unsupervised conditions an unknown distribution comes into play.
The T² and Q statistics are calculated based on Equations (2) and (3) (Cacciarelli and Kulahci 2022; Gajjar and Palazoglu 2016; Reinartz et al. 2021), respectively. The Q statistic relies on the reconstruction error e between the learned model f of the data X and the estimated value, while T² relies on the estimated covariance matrix of the m variables. Thus, the selection of the model f reflects the ability of the model to learn the signal pattern and best represent the data X under normal conditions.
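The Q (SPE) statistic described above can be sketched as follows: the per-observation sum of squared reconstruction errors between the data X and the model's estimate. This is a minimal illustration, not the paper's own code; the function name and example values are ours.

```python
import numpy as np

def q_residual(x, x_hat):
    """Squared prediction error (SPE / Q residual) per observation.

    x, x_hat: (n_samples, m_variables) arrays of observed and
    model-estimated values; the residual e = x - x_hat is squared
    and summed over the m variables for each time step.
    """
    e = np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)
    return np.sum(e ** 2, axis=1)

# Example: three observations of two variables
x = np.array([[1.0, 2.0], [0.5, 1.5], [2.0, 0.0]])
x_hat = np.array([[0.9, 2.1], [0.5, 1.5], [1.0, 1.0]])
q = q_residual(x, x_hat)
```

Any point whose Q value exceeds the chart's upper limit (or falls below the lower limit) is flagged as out of bounds.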
The criteria that affect the selection of the model f were also discussed in Section 1, but in summary, they are listed below:
1. A multivariable signal with various types of signals is not limited to a steady-state signal only. Some of the variables are in highly noisy, random, or sinusoidal conditions with an unknown distribution.
2. The faulty types cover steps, random variation, slow drift, sticking, constant position, and unknown source types.
3. Not only spatial relations but also temporal relations exist in the signals of the system.

Long Short-Term Memory (LSTM)
LSTM is considered part of the recurrent neural network (RNN) family (see Fig. 1). Initially, the normal data matrix x_t and the previous state h_{t-1} pass through the forget gate f_t. At the forget gate, the information flow is controlled by an element-wise sigmoid function as described in Equation (4), where W_f and U_f represent the weights of x_t and h_{t-1}, and b_f is the bias vector.
At the input gate, two steps of learning take place. The normal data matrix x_t and the previous state h_{t-1} are fed into the sigmoid function of the input gate i_t (see Equation (5)) for input learning, as well as into the candidate cell state c̃_t (see Equation (6)) with the tanh activation function. After that, the cell state c_t is updated using Equation (7).
From Equation (7), the "memory" held by c_t is affected by the forget gate f_t. A value of f_t close to 0 or 1 indicates whether the previous cell state c_{t-1} will be forgotten or retained. If forgotten, only the current memory i_t c̃_t is carried to the next LSTM cell, and vice versa. Lastly, at the output gate, the output o_t is determined by the input data matrix x_t and the previous hidden state h_{t-1}, and the hidden state h_t is then updated accordingly.
W_f, W_i, W_c, W_o and U_f, U_i, U_c, U_o represent the weights, and b_f, b_i, b_c are the bias vectors at the respective input, output, and cell states.
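The gate equations above can be sketched as a single LSTM cell update in NumPy. This is a minimal illustration of the standard LSTM recurrence, not the paper's implementation; the function signature and the dictionary layout of the weights are our own choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell update following Equations (4)-(10).

    W, U, b are dicts keyed by 'f', 'i', 'c', 'o' holding the input
    weights, recurrent weights, and biases of each gate.
    """
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                          # cell state update, Eq. (7)
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate
    h_t = o_t * np.tanh(c_t)                                    # hidden state
    return h_t, c_t
```

In practice, a deep learning library evaluates these same equations internally; the sketch only makes the gate-by-gate information flow explicit.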

Kernel Density Estimation for non-parametric threshold determination
Kernel density estimation (KDE) is a non-parametric approach for estimating the unknown distribution of a dataset E = {e_1, e_2, e_3, …, e_n}. The probability density at a given point e_i under the KDE is calculated from a kernel function K(·) and a bandwidth h (Jaya et al. 2021). In this study, the Gaussian kernel function is used. In selecting the optimum value of h, the Silverman bandwidth rule is used (see Equation (12)), h = 0.9 min(σ, IQR/1.34) n^(−1/5), where σ is the sample standard deviation and IQR is the interquartile range. Referring to Equation (1), the threshold value was initially determined based on a percentile of the F_α(a, b) distribution. Due to the non-parametric nature of the KDE distribution, the upper and lower limits are instead calculated from the confidence interval of the KDE distribution (Węglarczyk 2018). The confidence interval for the upper and lower limits of the KDE is given in Equation (13) below.
Here e_l and e_u are the lower and upper limits of the confidence interval with probability 1 − α. To realise the integral under the curve of the estimated density, the numerical trapezoid rule (Yu et al. 2018) is used as stated below.
The computational implementation of the trapezoid rule is influenced by the selection of the trapezoid size. In this study, the trapezoid size was tuned heuristically, since each situation creates a different KDE distribution and requires a different trapezoid size for the 1 − α condition.
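The steps above can be combined into one routine: fit a Gaussian-kernel KDE with the Silverman bandwidth, integrate the density with the trapezoid rule, and read off the (α/2, 1 − α/2) quantiles as the lower and upper limits. This is a minimal sketch under those assumptions, not the paper's exact algorithm; the function name, the grid construction, and the default grid size are ours.

```python
import numpy as np

def kde_thresholds(residuals, alpha=0.05, n_grid=150):
    """Upper/lower control limits from a Gaussian-kernel KDE of the residuals.

    Bandwidth follows Silverman's rule, h = 0.9 min(sigma, IQR/1.34) n^(-1/5);
    the (alpha/2, 1-alpha/2) quantiles are located by accumulating the
    density with the trapezoid rule.
    """
    r = np.asarray(residuals, dtype=float)
    n = r.size
    sigma = r.std(ddof=1)
    iqr = np.subtract(*np.percentile(r, [75, 25]))
    h = 0.9 * min(sigma, iqr / 1.34) * n ** (-1 / 5)   # Silverman bandwidth

    # Gaussian KDE evaluated on an evenly spaced grid
    grid = np.linspace(r.min() - 3 * h, r.max() + 3 * h, n_grid)
    diff = (grid[:, None] - r[None, :]) / h
    density = np.exp(-0.5 * diff ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

    # Cumulative integral via the trapezoid rule, normalised to 1
    cdf = np.concatenate(
        [[0.0], np.cumsum(np.diff(grid) * (density[1:] + density[:-1]) / 2)])
    cdf /= cdf[-1]
    lower = grid[np.searchsorted(cdf, alpha / 2)]
    upper = grid[np.searchsorted(cdf, 1 - alpha / 2)]
    return lower, upper
```

The grid size plays the role of the heuristically tuned trapezoid size: a finer grid gives a more accurate integral at higher cost.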

NON-PARAMETRIC LSTM FAULT DETECTION FRAMEWORK
A data-driven fault detection approach usually has two stages of process flow. The first is the training phase, which relates to normal dataset learning. Considering the criteria in Section 2.1, this study proposes the LSTM network as the learning model f. The accuracy of this stage is important, as the reconstruction error will reflect the final distribution of the system. Initially, a normal data matrix X goes through normalisation pre-processing. After that, the dataset is rearranged into input and target output in a time-series manner by looking back over a certain period of time, w. The arranged dataset is then trained using a stacked LSTM network, and the Q parameters of the control chart are determined. The distribution of Q is modelled using the KDE distribution, and the trapezoid rule is used to determine the upper and lower boundaries to complete the normal control chart, as shown in Fig. 2.
In the second stage, faulty data is fed into the trained LSTM network after going through pre-processing and lookback functions similar to the first stage. The Q values for the faulty dataset are determined, checking whether, at a certain time, the condition is 'faulty' or not based on the upper and lower limits from the analysis in the first stage. Fig. 3 illustrates the fault detection phase. Tables 1 and 2 present the LSTM-KDE fault detection framework in algorithmic form.
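The pre-processing of the training phase, min-max normalisation followed by the lookback rearrangement, can be sketched as below. The helper names are ours, not from the paper; the sketch only shows how a (T, m) series becomes (window, target) pairs for the LSTM.

```python
import numpy as np

def minmax_scale(data):
    """Min-max scaling per variable to the range [0, 1]."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / (hi - lo)

def make_lookback(data, w):
    """Rearrange a (T, m) series into (input, target) pairs: each input
    is the window x_{t-w..t-1}, the target is x_t."""
    X, y = [], []
    for t in range(w, len(data)):
        X.append(data[t - w:t])
        y.append(data[t])
    return np.array(X), np.array(y)

# With w = 8 (as used later in the paper), 300 samples of 3 variables
# yield 292 training windows of shape (8, 3).
series = minmax_scale(np.random.rand(300, 3))
X, y = make_lookback(series, w=8)
print(X.shape, y.shape)  # (292, 8, 3) (292, 3)
```

The resulting (samples, timesteps, features) tensor is the shape a stacked LSTM layer expects; the same two helpers are reused on the faulty data in the detection phase.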

Performance Measure
In determining the ability of the model f to learn the structure or pattern of the normal condition, the mean squared error (MSE) is used, since it represents the reconstruction error of the normal condition of the system. A lower MSE means a more accurate learning model.

EXPERIMENTAL SETUP

Numerical Example
The common responses that usually occur inside engineering systems are (a) nonlinear conditions and (b) varying operating modes. In this section, these two common responses were produced and illustrated using a numerical example (Yu et al. 2018). The multivariate nonlinear process signal for normal conditions was generated based on mathematical expression (19), where the input was set within the boundary [0.01, 2] and e_1, e_2, and e_3 are the noise of each condition with a normal distribution of mean 0 and variance 0.01. As for faulty signals, two types were produced: a linearly increasing condition was injected into the generated data of x_1 from samples 101-270 by adding 0.01(k − 100) to the x_1 value of each sample in this range, where k is the sample number (faulty 1). In addition, a step response of 1.5 was added to x_2 from sample 101 onwards (faulty 2).
As for the varying operating mode condition, the system was described using the mathematical expression in Equation (20), where the two source signals follow normal distributions with the mean and variance values stated in Equations (21), (22), and (23). The data was generated as 100 samples for each mode, resulting in 300 samples of normal data for the training phase. Meanwhile, 200 test samples were generated for the faulty condition. In the first faulty condition (faulty 1), the system was initially running in Mode 2, and a drifting error of 0.04(k − 100) was applied to the second source signal from the 101st through the 200th samples, where k denotes the serial number of the test samples. In addition, for the second faulty condition (faulty 2), a step signal of 2 was added to the first source signal in Mode 1.
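The fault-injection scheme of the nonlinear example can be sketched as follows. Since the paper's generating Equation (19) is not reproduced in the text, a placeholder nonlinear signal stands in for the normal data; only the drift and step injections (0.01(k − 100) on x_1 for samples 101-270, a step of 1.5 on x_2 from sample 101) follow the description above, and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in normal data: Equation (19) is not reproduced in the paper's
# text, so a placeholder nonlinear process is used here; noise is
# N(0, 0.01), i.e. standard deviation 0.1, as in the paper.
t = np.linspace(0.01, 2, 300)
X = np.column_stack([t ** 2, np.sin(3 * t), t]) \
    + rng.normal(0.0, np.sqrt(0.01), (300, 3))

k = np.arange(1, 301)  # 1-based sample numbers

# Faulty 1: linear drift 0.01*(k - 100) added to x1 for samples 101-270
faulty1 = X.copy()
mask = (k >= 101) & (k <= 270)
faulty1[mask, 0] += 0.01 * (k[mask] - 100)

# Faulty 2: step of 1.5 added to x2 from sample 101 onward
faulty2 = X.copy()
faulty2[k >= 101, 1] += 1.5
```

The drift reaches 0.01 × 170 = 1.7 at sample 270, which is large relative to the 0.1 noise standard deviation, so a well-calibrated threshold should eventually flag it.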

TENNESSEE EASTMAN PROCESS PLANT(TE)
The TE benchmark process plant has been considered one of the most challenging industrial engineering problems in monitoring systems as well as in fault detection (Downs and Vogel 1993; Xiao et al. 2022). Many studies have used the TE process plant as the standard benchmark in the fault detection and diagnosis area, such as (Cacciarelli and Kulahci 2022; Downs and Vogel 1993; Gajjar and Palazoglu 2016).
Recently, an extended version of the TE process plant was published (Reinartz et al. 2021), with data shared at https://data.dtu.dk/articles/dataset/Tennessee_Eastman_Reference_Data_for_Fault_Detection_and Decision_Support_Systems/13385936/1, which is in line with present plant systems. Compared to the existing TE process plant dataset, which focuses only on Mode 1 of the system, the extended dataset covers Modes 1 through 6. Each mode has fifty-four (54) measured and manipulated variables. For this study, only thirty-three (33) variables were used, following (Cacciarelli and Kulahci 2022) (see Table 3). Furthermore, each mode also contains twenty-eight (28) faulty conditions with steps, random variation, sticking, and unknown fault types, as shown in Table 4. To evaluate the proposed LSTM fault detection framework, three other fault detection approaches were used. The details of each approach are given below:
1. The PCA+KDE approach was inspired by (Jaya et al. 2021), which argues that fault detection with multivariable condition control charts is better approximated using the KDE approach to reduce high false alarms under non-parametric data distributions. Furthermore, PCA is considered the most prominent approach and is still, to date, being chosen for fault detection, making it an appropriate comparison model to evaluate the proposed framework.
2. The LOF approach follows (Benkő, Bábel, and Somogyvári 2022), which defines a faulty condition in terms of distance instead of distribution. That study also emphasises unsupervised as well as nonlinear conditions, which is in line with our study.
3. Another approach chosen as a comparison method is auto-associative kernel regression (AAKR) with a KDE fault detection approach (Yu et al. 2018). That study focuses on the multimode approach from a regression perspective and explains that the non-parametric distribution depends on the multimode dataset, which confirms the importance of this study.

Evaluation of model, f
The selection of the model f reflects the ability of the model to learn the signal. The aim is to analyse the MSE value of the reconstruction error approach proposed in Section 3. Based on Section 2, there are a number of deep learning approaches within the scope of this study. However, considering the nonlinear and multimode characteristics of engineering systems, this section further analyses the impact of four LSTM-based models f: stacked LSTM, BiLSTM, LSTM-AE, and CNN-LSTM. The numerical multimode dataset was used to evaluate the performance of each selected model. The dataset was preprocessed using min-max scaling, which brings the values between 0 and 1. After that, the data were split into training input and output by the lookback function, using the samples from t − w up to t − 1 to predict the next output x_t. The value of w in this study was set to eight, the number of epochs to 100, and the number of neurons to nine; the optimizer used is RMSprop via the TensorFlow Python library.
Table 5 shows the average accuracy and loss of each selected model f over 100 epochs. From the observation of accuracy in Table 5, CNN-LSTM showed the lowest accuracy of all at 0.8913, followed by LSTM-AE at 0.9289, leaving the choice between BiLSTM and stacked LSTM at 0.9513 and 0.9397 respectively. Comparing the two remaining approaches in terms of ΔAcc, the difference between training accuracy and validation accuracy, stacked LSTM shows the better result at 0.1646 compared to BiLSTM. BiLSTM was interpreted as overfitting, given the high difference between its training and validation accuracy. Thus, this experiment concludes that a stacked LSTM architecture will be used as the main model f for unsupervised fault detection.

Fig. 4. KDE distribution for Tennessee Eastman process plant mode 1
This section describes the KDE distribution shape of the proposed non-parametric LSTM-KDE fault detection framework. Fig. 4 shows the Tennessee Eastman process plant Mode 1, selected as a visual representation of the KDE distribution in this study. Referring to the algorithm in Table 1, the number of kernel points for this distribution was set to n = 150 with α = 0.05. From the algorithm, the upper and lower threshold limits are 0.9935 and 0.2604 respectively, represented as vertical lines in Fig. 4.

Evaluation of LSTM fault detection performance based on KDE threshold approach.
This section describes the performance of the proposed non-parametric LSTM-KDE fault detection framework. The generated dataset described in the numerical example of Section 3.2 was used to evaluate the accuracy of the LSTM-KDE approach compared to the selected models in Section 3.3. Tables 6 and 7 show the performance indices of each method. For both multimode and nonlinear conditions, fault 1 illustrates an amplitude increment fault, whereas fault 2 indicates a linear increment fault.
For multimode fault 1 in Table 7, the lowest accuracy is AAKR+KDE with only 0.395, followed by LSTM+KDE with 0.4136 and LOF with 0.5; the highest is PCA+KDE with 0.605. For the nonlinear condition of fault 1, LOF has the lowest accuracy of 0.33, followed by LSTM+KDE at 0.48, PCA+KDE at 0.74, and AAKR+KDE at 0.787.
As for multimode fault 2, PCA+KDE, LOF, and AAKR+KDE have nearly the same values of 0.53, 0.5, and 0.54, whereas LSTM+KDE shows the overall highest value of 0.6701. For nonlinear fault 2, the lowest accuracy is PCA+KDE with 0.65, then AAKR+KDE with 0.747 and LOF with 0.76, and the highest is LSTM+KDE at 0.804. This indicates that the LSTM+KDE framework is able to detect linear increment faults better than the other methods. However, performance differs for multimode and amplitude increment faults: the overlapping learning between multimode conditions and steady-state increments makes these faulty conditions difficult to detect. This is further discussed based on Fig. 5, which illustrates the Q residual value of the LSTM-KDE approach for multimode and nonlinear conditions. The figure shows clear detection of the linear increment fault (fault 2) from sample 100 onwards based on the calculated threshold value, but not for the fault 1 condition. For multimode fault 1, the Q residual signal response does not represent the steady-state increment at all, so there is no faulty signal indication, which is in line with the low detection rate. In the nonlinear case, the steady state is observed, as seen in the fault 1 nonlinear panel; however, the threshold value is not accurate for fault detection in this region. Thus, further improvement is needed for the LSTM-KDE framework to work under both nonlinear and multimode conditions for all fault signals.

Evaluation of fault detection performance in large scale plant: Tennessee Eastman process plant
In the training phase, each mode has 2001 data points. The normal signal of each faulty dataset from time 1 to time 500 was grouped into the normal dataset, and preprocessing was done using min-max scaling to the range 0 to 1. After that, the data were fed into the non-parametric framework of Section 3. However, the number of stacked LSTM neurons at each layer was changed to thirty-three (33), the value of w was set to twenty (20), and the other hyperparameters remain the same as in Section 4.1.
The remaining data were used in the testing phase: data from 501 to 600 are normal, and data from 600 to 2001 are faulty. The performance indices of accuracy, FPR, and FNR were calculated at the last stage and are shown in Tables 8 through 11. We note that in Mode 2, IDV 17, 18, and 28 could not be extracted; the simulation therefore has no results for these three fault types, and they are not included in the average calculation.
Further analysis was made on the performance indices of false positive rate (FPR) and false negative rate (FNR). In an FPR situation, a signal is indicated as faulty when the real output is in a normal condition, which wastes resources when a fault does not actually exist. The FNR, on the other hand, reports a normal condition when a faulty signal has appeared; this is dangerous when considering interconnected process plants, as it promotes damage to machines. Thus, it is best to keep both indices as low as possible. Referring back to the study of the TE process plant in (Reinartz et al. 2021), which shows fault detection based on PCA, it is explained that the ARL0 is not always in a nominal state, which impacts the T² chart; the main reason is the assumption that observations are uncorrelated in time. At the same time, it is also mentioned in (Gajjar and Palazoglu 2016) that the T² control chart is not reliable, and that study modifies the loss function. This study proposes a stacked LSTM framework to cover the time-correlated control chart while focusing on the SPE or Q residual response for fault detection. Using SPE measurements, LSTM shows significant improvement compared to PCA in all modes except Mode 5, in which PCA shows better performance, whereas AAKR always falls in the middle between the other two approaches. From the 28 faulty conditions labelled IDV 1-28, faults 1, 2, 4, 6, 7, 8, 10, 11, 13, 14, 17, 18, 19, 20, 25, 26, and 27 show a good result of more than 90%, but concern is raised for faults 3, 5, 9, 12, 15, 16, 21, 22, and 23. This is in line with the discussion in (Reinartz et al. 2021), which noted that faults 3 and 15 are difficult to catch, as are the multiple fault types of random variation and sticking in 9, 15, 20, and 23. However, the proposed LSTM framework is able to improve on faults 13 and 18, which in (Reinartz et al. 2021) are also labelled as difficult-to-detect faults. Considering the SPE Equation (3), the matrix decomposition of highly correlated, multivariable data might jeopardize fault detection accuracy. For example, in IDV 1, where significant individual variables are out of control, there is a highly visible separation between faulty and non-faulty conditions (Cacciarelli and Kulahci 2022), making the determined threshold value ample for fault detection, whereas other faulty conditions involve fewer variables, shadowing the boundary between faulty and non-faulty detection. This individual signal condition, as well as the correlation of variables, is not included in this study and might become the next focus for improving the LSTM framework.

CONCLUSION
A time-series fault detection study based on the LSTM framework was proposed. The main motivation is to overcome the assumption of temporally uncorrelated observations in previous studies, such as (Cacciarelli and Kulahci 2022; Reinartz et al. 2021). In the LSTM framework, KDE threshold determination was introduced, since the majority of gathered datasets have an unknown distribution while most studies assume the collected signal is Gaussian (Guo et al. 2020; Kukkala et al. 2020; Machado et al. 2022; Silva et al. 2022; B. Wang, Liu, et al. 2020; J. Zhao et al. 2020). The study compared four types of LSTM variants as the main learning model and selected stacked LSTM as the best model for fault detection. The study also analysed numerical representations of multimode and nonlinear conditions, as well as the real representation of the Tennessee Eastman process plant, for the proposed LSTM fault detection framework in comparison to the PCA+KDE, LOF, and AAKR+KDE approaches. The majority of results show the dominance of the LSTM approach, but for future work, improvements must be made to cater to the high correlation and individual signal contributions to fault detection accuracy, as well as the challenge of multimode conditions in the LSTM fault detection framework.

Fig. 1 .
Fig. 1.The diagram of an LSTM cell of the lth layer at time t.
The performance measures that closely relate to fault detection are fault detection accuracy, false positive rate (FPR), and false negative rate (FNR), based on the confusion matrix. Fault detection accuracy is based on the number of accurate fault detections, whereas FPR measures the condition where a 'non-faulty' signal is estimated as 'faulty' (a type I error).

Fig. 2 .
Fig. 2. Training phase

FNR, on the other hand, measures the condition where a 'faulty' signal is estimated as 'non-faulty' (a type II error). FNR values are important to keep as low as possible, since a high missed-detection rate leaves equipment in the process plant open to damage.
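The three indices can be computed from binary labels as below. This is a minimal sketch assuming the standard confusion-matrix definitions (FPR = FP/(FP+TN), FNR = FN/(FN+TP), with 1 denoting 'faulty'); the function name is ours.

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, FPR, and FNR from binary labels (1 = faulty, 0 = normal).

    Standard confusion-matrix definitions are assumed:
    FPR = FP / (FP + TN)  -- normal samples flagged as faulty (type I)
    FNR = FN / (FN + TP)  -- faulty samples missed by the detector (type II)
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return acc, fpr, fnr
```

Here `y_pred` would come from comparing each Q residual against the KDE upper and lower limits.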

Table 3 .
Monitoring variables in TE process plant

Table 4 .
Faulty condition in TE process plant

Table 5 .
Average accuracy and loss for 100 epochs

Table 7 .
Performance indices for multimode condition

Tables 10 and 11 show the results for these indices. For the FPR index, the lowest rate goes to AAKR with 0.20, followed by PCA with 0.23, then LOF with 0.25, and finally LSTM with 0.52. As for the FNR index, LSTM shows the best performance with 0.19, followed by PCA at 0.28 and AAKR at 0.30, with the lowest performance from LOF at 0.49.

Table 8 .
Performance index (accuracy) for Group 1 faulty condition